Data Infrastructure for Scaling up Human Understanding and Modelling to the Real World (Talk)
Human sensing and modelling are fundamental tasks in vision and graphics with numerous applications. However, due to prohibitive collection costs, existing datasets are often limited in scale and diversity. This talk shares two of our recent works that tackle data scarcity. First, with advances in sensors and algorithms, paired data can be obtained from an inexpensive set-up and an automatic annotation pipeline. Specifically, we demonstrate this data collection solution by introducing HuMMan, a large-scale multimodal 4D human dataset. HuMMan has several appealing properties: 1) multimodal data and annotations, including color images, point clouds, keypoints, SMPL parameters, and textured meshes; 2) a popular mobile device included in the sensor suite; 3) a set of 500 actions designed to cover fundamental movements; 4) support for multiple tasks such as action recognition, pose estimation, and parametric human recovery. Second, synthetic data can serve as a scalable complement to real data. We build GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a highly diverse set of subjects, actions, and scenarios.
Biography: Zhongang Cai is a second-year Ph.D. student at S-Lab, Nanyang Technological University, advised by Prof. Ziwei Liu and Prof. Chen Change Loy, and a Senior Algorithm Researcher at SenseTime Research. Previously, he received his bachelor's degree from Nanyang Technological University. His research interests include building systems and algorithms that perceive, reconstruct, and generate humans. Homepage: https://caizhongang.github.io/.
3D human pose estimation
3D human reconstruction
synthetic data