Monocular, One-Stage, Regression of Multiple 3D People
This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.
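The body-center-guided sampling step described above can be illustrated with a small NumPy sketch. It is not the released ROMP code. It assumes a single-channel Body Center heatmap of shape `(H, W)` and a Mesh Parameter map of shape `(C, H, W)`. Peaks are found with a simple 3x3 local-maximum test plus a score threshold, and each person's parameter vector is read from the parameter map at that person's center pixel. The function names `local_max_3x3` and `sample_mesh_params`, as well as the `threshold` and `max_people` values, are hypothetical choices made for this example.

```python
import numpy as np


def local_max_3x3(heatmap: np.ndarray) -> np.ndarray:
    """Return the maximum over each pixel's 3x3 neighbourhood (borders padded with -inf)."""
    padded = np.pad(heatmap.astype(float), 1, mode="constant", constant_values=-np.inf)
    h, w = heatmap.shape
    # Stack the nine shifted views of the padded map and take the element-wise max.
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(np.stack(shifts, axis=0), axis=0)


def sample_mesh_params(center_heatmap: np.ndarray,
                       param_map: np.ndarray,
                       threshold: float = 0.25,
                       max_people: int = 32):
    """Hypothetical center-guided sampling.

    center_heatmap: (H, W) body-center confidence per pixel.
    param_map:      (C, H, W) per-pixel mesh parameter vectors.
    Returns (centers [N, 2] as (y, x), scores [N], params [N, C]).
    """
    # A pixel counts as a body center if it is a 3x3 local maximum above the threshold.
    is_peak = (center_heatmap == local_max_3x3(center_heatmap)) & (center_heatmap > threshold)
    ys, xs = np.nonzero(is_peak)
    scores = center_heatmap[ys, xs]
    # Keep the most confident centers first, up to max_people.
    order = np.argsort(-scores)[:max_people]
    ys, xs, scores = ys[order], xs[order], scores[order]
    # Gather the C-dimensional parameter vector at every selected center.
    params = param_map[:, ys, xs].T
    return np.stack([ys, xs], axis=1), scores, params
```

For example, a heatmap with confident peaks at `(2, 3)` and `(5, 6)` and a weaker neighbouring response at `(5, 5)` yields two people. The `(5, 5)` response is suppressed by the local-maximum test. Because every pixel already carries a full parameter vector, no per-person crop or second network stage is needed in this sketch, which mirrors the one-stage idea the abstract describes.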
| Field | Value |
|---|---|
| Author(s): | Sun, Yu and Bao, Qian and Liu, Wu and Fu, Yili and Black, Michael J. and Mei, Tao |
| Book Title: | Proc. International Conference on Computer Vision (ICCV) |
| Pages: | 11159--11168 |
| Year: | 2021 |
| Month: | October |
| Publisher: | IEEE |
| BibTeX Type: | Conference Paper (inproceedings) |
| Address: | Piscataway, NJ |
| DOI: | 10.1109/ICCV48922.2021.01099 |
| Event Name: | International Conference on Computer Vision 2021 |
| Event Place: | virtual (originally Montreal, Canada) |
| State: | Published |
| ISBN: | 978-1-6654-2812-5 |
BibTeX
@inproceedings{ROMP:ICCV:2021,
title = {Monocular, One-Stage, Regression of Multiple {3D} People},
booktitle = {Proc. International Conference on Computer Vision (ICCV)},
abstract = {This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.},
pages = {11159--11168},
publisher = {IEEE},
address = {Piscataway, NJ},
month = oct,
year = {2021},
author = {Sun, Yu and Bao, Qian and Liu, Wu and Fu, Yili and Black, Michael J. and Mei, Tao},
doi = {10.1109/ICCV48922.2021.01099},
month_numeric = {10}
}