Research
I'm interested in SLAM, Computer Vision, Deep Learning and Generative AI.
*: equal contribution; †: corresponding author(s)
Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment
Zhihao Zhan*, Wang Pang*, Xiang Zhu*, and Yechao Bai†
17th International Conference on Signal Processing Systems (ICSPS), 2025
arXiv
We rethink video super-resolution by introducing a method built on the Diffusion Posterior
Sampling framework, combined with an unconditional video diffusion transformer operating in latent
space. The diffusion transformer serves as a space-time generative model. We argue that a powerful
model that learns the physics of the real world can handle a wide range of motion patterns as prior
knowledge, eliminating the need for explicit estimation of optical flow or motion parameters for
pixel alignment.
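A minimal sketch of the Diffusion Posterior Sampling idea this work builds on, written with hypothetical names (denoiser, decode, A, sigma_t); the actual model, noise schedule, and guidance weighting in the paper may differ, and the x0 estimate and update rule below are deliberately simplified.

import torch

def dps_video_sr_step(x_t, t, y_lr, denoiser, decode, A, sigma_t, guidance_scale=1.0):
    """One reverse-diffusion step with a data-consistency (DPS) correction.

    x_t      : noisy latent video at timestep t
    y_lr     : observed low-resolution video (the measurement)
    denoiser : unconditional video diffusion transformer predicting noise
    decode   : latent -> pixel-space decoder
    A        : known degradation operator (e.g. spatial downsampling)
    """
    x_t = x_t.detach().requires_grad_(True)

    # Unconditional prior step: predict the clean latent from the noisy one.
    eps = denoiser(x_t, t)
    x0_hat = x_t - sigma_t * eps            # simplified x0 estimate

    # Data-consistency term: distance between the decoded estimate and the observation.
    residual = y_lr - A(decode(x0_hat))
    loss = residual.pow(2).sum()

    # Posterior guidance: push the latent toward measurement consistency.
    grad = torch.autograd.grad(loss, x_t)[0]
    x_prev = x0_hat - guidance_scale * grad  # schedule-dependent terms omitted
    return x_prev.detach()

Because the degradation operator handles alignment implicitly through the learned space-time prior, no optical flow or motion parameters appear anywhere in the loop.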
A Multi-Sensor Fusion Approach for Rapid Orthoimage Generation in Large-Scale UAV Mapping
Jialei He*, Zhihao Zhan*, Zhituo Tu, Xiang Zhu†, and Jie Yuan†
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, Oral
arXiv
In this paper, we utilize multi-sensor data to overcome the limitations of conventional orthoimage
generation methods in terms of temporal performance, system robustness, and geographic reference
accuracy. A prior-pose-optimized feature matching method is introduced to enhance matching speed and
accuracy, reducing the number of required features and providing precise references for the
Structure from Motion (SfM) process. The proposed method exhibits robustness in low-texture scenes
like farmlands, where feature matching is difficult.
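An illustrative sketch of prior-pose-guided matching under stated assumptions: the function, parameter names, and the use of a single rough depth prior are hypothetical, not the paper's released implementation. The idea shown is to project keypoints from one image into the other using a pose prior (e.g. from GNSS/IMU) and search for matches only inside a small window around the predicted location.

import numpy as np

def guided_match(kps1, desc1, kps2, desc2, K, R_rel, t_rel, depth_prior, radius=20.0):
    """Match features between two images using a prior relative pose.

    kps1, kps2   : lists of (u, v) keypoint coordinates
    desc1, desc2 : corresponding descriptor arrays
    K            : camera intrinsic matrix
    R_rel, t_rel : prior relative rotation and translation (image 1 -> image 2)
    depth_prior  : rough scene depth used to back-project keypoints
    """
    matches = []
    for i, (kp, d1) in enumerate(zip(kps1, desc1)):
        # Back-project with the prior depth, transform with the prior pose, re-project.
        p_cam = depth_prior * (np.linalg.inv(K) @ np.array([kp[0], kp[1], 1.0]))
        p2 = K @ (R_rel @ p_cam + t_rel)
        u, v = p2[:2] / p2[2]

        # Restrict candidates to keypoints near the predicted location.
        cand = [j for j, q in enumerate(kps2)
                if (q[0] - u) ** 2 + (q[1] - v) ** 2 < radius ** 2]
        if not cand:
            continue
        # Keep the candidate with the smallest descriptor distance.
        j_best = min(cand, key=lambda j: np.linalg.norm(desc2[j] - d1))
        matches.append((i, j_best))
    return matches

Restricting the search window is what lets the method use fewer features while keeping matches reliable, which in turn speeds up and stabilizes the SfM stage.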
Image Motion Blur Removal in the Temporal Dimension with Video Diffusion Models
Wang Pang*, Zhihao Zhan*, Xiang Zhu*, and Yechao Bai†
IEEE International Conference on Image Processing (ICIP), 2025
website / arXiv / code
We propose a novel single-image deblurring approach that treats motion blur as a temporal averaging
phenomenon. Our core innovation lies in leveraging a pre-trained video diffusion transformer to
capture diverse motion dynamics in latent space. The approach sidesteps explicit blur-kernel
estimation and accommodates a wide variety of motion patterns.
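A toy sketch of the temporal-averaging measurement model implied by this view, with hypothetical names: a blurred frame is taken to be the mean of the sharp frames a video diffusion model can generate, so the same posterior-sampling guidance as in the super-resolution sketch above can be reused with this operator.

import torch

def blur_operator(frames):
    """Forward model: a motion-blurred image as the temporal average of sharp frames.

    frames : tensor of shape (T, C, H, W), a short sharp video clip
    returns: tensor of shape (C, H, W), the simulated blurry observation
    """
    return frames.mean(dim=0)

def data_consistency_loss(frames_hat, y_blur):
    """Guidance term comparing the averaged reconstruction with the blurry input."""
    return (blur_operator(frames_hat) - y_blur).pow(2).sum()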
Services
Conference Reviewer: IROS (2025).
This website is adapted from Jon Barron's template.