Zhihao Zhan

I'm the Autonomous Driving Algorithm Leader at TopXGun (Nanjing) Robotics Ltd. in China, where I lead a small team working mainly on SLAM, Perception, Planning, and 3D Reconstruction for UAVs.

I hold an M.Sc. in Multimedia Information Technology from City University of Hong Kong, where I was advised by Prof. CHAN Ho Man, and a B.Eng. in Electronic Information Engineering from Nanjing Tech University.

Mail: zhihazhan2-c [at] my [dot] cityu [dot] edu [dot] hk

LinkedIn  /  Scholar  /  Github

profile photo

Research

I'm interested in SLAM, Computer Vision, Deep Learning and Generative AI.

*: equal contribution; †: corresponding author(s)

Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment
Zhihao Zhan*, Wang Pang*, Xiang Zhu*, and Yechao Bai
the 17th International Conference on Signal Processing Systems (ICSPS), 2025
arXiv

We rethink video super-resolution by introducing a method based on the Diffusion Posterior Sampling framework, combined with an unconditional video diffusion transformer operating in latent space. The diffusion transformer functions as a space-time model. We argue that a powerful model that learns the physics of the real world can handle a wide range of motion patterns as prior knowledge, eliminating the need for explicit estimation of optical flow or motion parameters for pixel alignment.

Fusion-Ortho A Multi-Sensor Fusion Approach for Rapid Orthoimage Generation in Large-Scale UAV Mapping
Jialei He*, Zhihao Zhan*, Zhituo Tu, Xiang Zhu, and Jie Yuan
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, Oral
arXiv

In this paper, we utilize multi-sensor data to overcome the limitations of conventional orthoimage generation methods in terms of temporal performance, system robustness, and geographic reference accuracy. A prior-pose-optimized feature matching method is introduced to improve matching speed and accuracy, reducing the number of required features and providing precise references for the Structure from Motion (SfM) process. The proposed method remains robust in low-texture scenes such as farmland, where feature matching is difficult.

VDM-MD Image Motion Blur Removal in the Temporal Dimension with Video Diffusion Models
Wang Pang*, Zhihao Zhan*, Xiang Zhu*, and Yechao Bai
IEEE International Conference on Image Processing (ICIP), 2025
website / arXiv / code

We propose a novel single-image deblurring approach that treats motion blur as a temporal averaging phenomenon. Our core innovation lies in leveraging a pre-trained video diffusion transformer model to capture diverse motion dynamics within a latent space. This approach sidesteps explicit kernel estimation and effectively accommodates diverse motion patterns.

Services

Conference Reviewer: IROS (2025).


This website is adapted from Jon Barron's template.