MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips

Project Video Code

Perceiving Systems

Zicong (Alex) Fan

Guest Scientist

Most RGB-based hand-object reconstruction methods rely on object templates, while template-free methods typically assume full object visibility. This assumption often breaks in real-world settings, where fixed camera viewpoints and static grips leave parts of the object unobserved, resulting in implausible reconstructions. To overcome this, we present MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited viewpoint variation. Our key insight is that, despite the scarcity of paired 3D hand-object data, largescale novel view synthesis diffusion models offer rich object supervision. This supervision serves as a prior to regularize unseen object regions during hand interactions. Leveraging this insight, we integrate a novel view synthesis model into our hand-object reconstruction framework. We further align hand to object by incorporating visible contact constraints. Our results demonstrate that MagicHOI significantly outperforms existing state-of-the-art hand-object reconstruction methods. We also show that novel view synthesis diffusion priors effectively regularize unseen object regions, enhancing 3D hand-object reconstruction.

Author(s):	Shibo Wang and Haonan He and Maria Parelli and Christoph Gebhardt and Zicong Fan and Jie Song
Links:	Project Video Code
Book Title:	Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Year:	2025
Month:	October

BibTeX Type:	Conference Paper (inproceedings)

Event Name:	ICCV
Event Place:	Honolulu
State:	Published
URL:	https://byran-wang.github.io/MagicHOI/

BibTeX

@inproceedings{wang2024Magichoi,
  title = {{MagicHOI}: Leveraging {3D} Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  abstract = {Most RGB-based hand-object reconstruction methods rely on object templates, while template-free methods typically assume full object visibility. This assumption often breaks in real-world settings, where fixed camera viewpoints and static grips leave parts of the object unobserved, resulting in implausible reconstructions. To overcome this, we present MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited viewpoint variation. Our key insight is that, despite the scarcity of paired 3D hand-object data, largescale novel view synthesis diffusion models offer rich object supervision. This supervision serves as a prior to regularize unseen object regions during hand interactions. Leveraging this insight, we integrate a novel view synthesis model into our hand-object reconstruction framework. We further align hand to object by incorporating visible contact constraints. Our results demonstrate that MagicHOI significantly outperforms existing state-of-the-art hand-object reconstruction methods. We also show that novel view synthesis diffusion priors effectively regularize unseen object regions, enhancing 3D hand-object reconstruction.},
  month = oct,
  year = {2025},
  author = {Wang, Shibo and He, Haonan and Parelli, Maria and Gebhardt, Christoph and Fan, Zicong and Song, Jie},
  url = {https://byran-wang.github.io/MagicHOI/},
  month_numeric = {10}
}

Research

Departments

Max Planck Research Groups

Start-Up Teams

Research Groups

People

Contact

Our Institute

Our History

Career

Doctoral Programs

Training

Service Units

Central Scientific Facilities

Workshops

Campus Services

Impact

Cooperation

Partners and Initiatives