Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning
Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we introduce ProxyCap, a human-centric proxy-to-motion learning scheme to learn world-space motions from a proxy dataset of 2D skeleton sequences and 3D rotational motions. Such proxy data enables us to build a learning-based network with accurate world-space supervision while also mitigating the generalization issues. For more accurate and physically plausible predictions in world space, our network is designed to learn human motions from a human-centric perspective, which enables the understanding of the same motion captured with different camera trajectories. Moreover, a contact-aware neural motion descent module is proposed in our network so that it can be aware of foot-ground contact and motion misalignment with the proxy observations. With the proposed learning-based solution, we demonstrate the first real-time monocular full-body capture system with plausible foot-ground contact in world space even using hand-held moving cameras.
| Author(s): | Hongwen Zhang, Yuxiang Zhang, Liangxiao Hu, Jiajun Zhang, Hongwei Yi, Shengping Zhang, and Yebin Liu |
| Book Title: | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Pages: | 1954 - 1964 |
| Year: | 2024 |
| Month: | September |
| Day: | 16 |
| BibTeX Type: | Conference Paper (inproceedings) |
| Address: | Piscataway, NJ |
| DOI: | 10.1109/CVPR52733.2024.00191 |
| Event Name: | CVPR 2024 |
| Event Place: | Seattle, USA |
| State: | Published |
| URL: | https://doi.org/10.1109/CVPR52733.2024.00191 |
| Electronic Archiving: | grant_archive |
BibTeX
@inproceedings{proxycap:cvpr:2024,
title = {Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we introduce ProxyCap, a human-centric proxy-to-motion learning scheme to learn world-space motions from a proxy dataset of 2D skeleton sequences and 3D rotational motions. Such proxy data enables us to build a learning-based network with accurate world-space supervision while also mitigating the generalization issues. For more accurate and physically plausible predictions in world space, our network is designed to learn human motions from a human-centric perspective, which enables the understanding of the same motion captured with different camera trajectories. Moreover, a contact-aware neural motion descent module is proposed in our network so that it can be aware of foot-ground contact and motion misalignment with the proxy observations. With the proposed learning-based solution, we demonstrate the first real-time monocular full-body capture system with plausible foot-ground contact in world space even using hand-held moving cameras.},
pages = {1954--1964},
address = {Piscataway, NJ},
month = sep,
year = {2024},
author = {Zhang, Hongwen and Zhang, Yuxiang and Hu, Liangxiao and Zhang, Jiajun and Yi, Hongwei and Zhang, Shengping and Liu, Yebin},
doi = {10.1109/CVPR52733.2024.00191},
url = {https://doi.org/10.1109/CVPR52733.2024.00191},
month_numeric = {9}
}
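A minimal LaTeX usage sketch for citing this entry, assuming the BibTeX above is saved in a file named references.bib (the filename and bibliography style are illustrative, not prescribed by the record):

% Assumes the entry above is stored in references.bib (illustrative name).
\documentclass{article}
\begin{document}
ProxyCap~\cite{proxycap:cvpr:2024} demonstrates real-time monocular full-body capture in world space.
\bibliographystyle{ieeetr}  % any standard style works; ieeetr shown as an example
\bibliography{references}
\end{document}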