Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning
Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we introduce ProxyCap, a human-centric proxy-to-motion learning scheme to learn world-space motions from a proxy dataset of 2D skeleton sequences and 3D rotational motions. Such proxy data enables us to build a learning-based network with accurate world-space supervision while also mitigating the generalization issues. For more accurate and physically plausible predictions in world space, our network is designed to learn human motions from a human-centric perspective, which enables the understanding of the same motion captured with different camera trajectories. Moreover, a contact-aware neural motion descent module is proposed in our network so that it can be aware of foot-ground contact and motion misalignment with the proxy observations. With the proposed learning-based solution, we demonstrate the first real-time monocular full-body capture system with plausible foot-ground contact in world space even using hand-held moving cameras.
| Author(s): | Hongwen Zhang, Yuxiang Zhang, Liangxiao Hu, Jiajun Zhang, Hongwei Yi, Shengping Zhang, and Yebin Liu |
| Book Title: | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Pages: | 1954 - 1964 |
| Year: | 2024 |
| Month: | September |
| Day: | 16 |
| BibTeX Type: | Conference Paper (inproceedings) |
| Address: | Piscataway, NJ |
| DOI: | 10.1109/CVPR52733.2024.00191 |
| Event Name: | CVPR 2024 |
| Event Place: | Seattle, USA |
| State: | Published |
| URL: | https://doi.org/10.1109/CVPR52733.2024.00191 |
| Electronic Archiving: | grant_archive |
BibTeX
@inproceedings{proxycap:cvpr:2024,
title = {Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we introduce ProxyCap, a human-centric proxy-to-motion learning scheme to learn world-space motions from a proxy dataset of 2D skeleton sequences and 3D rotational motions. Such proxy data enables us to build a learning-based network with accurate world-space supervision while also mitigating the generalization issues. For more accurate and physically plausible predictions in world space, our network is designed to learn human motions from a human-centric perspective, which enables the understanding of the same motion captured with different camera trajectories. Moreover, a contact-aware neural motion descent module is proposed in our network so that it can be aware of foot-ground contact and motion misalignment with the proxy observations. With the proposed learning-based solution, we demonstrate the first real-time monocular full-body capture system with plausible foot-ground contact in world space even using hand-held moving cameras.},
pages = {1954--1964},
address = {Piscataway, NJ},
month = sep,
year = {2024},
author = {Zhang, Hongwen and Zhang, Yuxiang and Hu, Liangxiao and Zhang, Jiajun and Yi, Hongwei and Zhang, Shengping and Liu, Yebin},
doi = {10.1109/CVPR52733.2024.00191},
url = {https://doi.org/10.1109/CVPR52733.2024.00191},
month_numeric = {9}
}
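A minimal LaTeX usage sketch for citing this entry, assuming the BibTeX above is saved in a file named references.bib (the filename and bibliography style are illustrative, not prescribed by the record):

% Assumes the entry above is stored in references.bib (illustrative name).
\documentclass{article}
\begin{document}
ProxyCap~\cite{proxycap:cvpr:2024} demonstrates real-time monocular full-body capture in world space.
\bibliographystyle{ieeetr}  % any standard style works; ieeetr shown as an example
\bibliography{references}
\end{document}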