Optimizing Human Pose and Shape

Human inference via optimization. (Left) SMPLify estimates configurations of the SMPL body model from 2D body joints detected in images. (Middle) SMPLify-X estimates SMPL-X from whole-body 2D landmarks; note the expressive face and fingers. (Right) Bodies fitted with SMPLify-X (yellow) penetrate 3D objects; PROX (gray) extends SMPLify-X to use a 3D scan of the scene, encouraging contact between bodies and objects while discouraging interpenetration.

While data-driven methods that directly regress 3D humans from 2D images are widely popular, optimization-based methods continue to play an important role. Although typically slower than regression, optimization requires no training data, can be quickly adapted to new problems, and produces results that are well aligned with the image. In our view, the two approaches are not competing but complementary.

Optimization-based approaches directly fit a 3D body model such as SMPL to image observations (e.g., detected joint locations, edges, silhouettes, or semantic segmentations). We introduced the first such method, SMPLify [Bogo et al. 2016], which optimizes SMPL pose and shape to minimize the 2D error between detected joints and the projected SMPL joints. Because estimating 3D from 2D is inherently ambiguous, SMPLify added a pose prior trained on motion capture (mocap) data and a term that discourages self-penetration.
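The structure of such an objective can be illustrated with a small, self-contained sketch, assuming NumPy. It is not SMPLify itself: a hypothetical linear "body model" (`TEMPLATE` plus `BASIS @ theta`) stands in for SMPL, a zero-mean Gaussian penalty on `theta` stands in for the pose prior learned from mocap, and the self-penetration term is omitted. In this sketch the data term weights each detected joint by its detector confidence and passes the 2D residual through a Geman-McClure robust penalty (`gmof`), so that a few bad detections cannot dominate the fit; the minimization is plain gradient descent with central finite differences.

```python
import numpy as np

# Hypothetical toy "body model": 5 joints driven linearly by 3 pose parameters.
# It only stands in for SMPL, which has pose, shape, and skinning.
rng = np.random.default_rng(0)
J, P = 5, 3
TEMPLATE = rng.normal(size=(J, 3)) * 0.3 + np.array([0.0, 0.0, 3.0])  # about 3 m from the camera
BASIS = rng.normal(size=(J * 3, P)) * 0.1


def body_joints(theta):
    """3D joint positions (J, 3) in camera coordinates for parameters theta (P,)."""
    return TEMPLATE + (BASIS @ theta).reshape(J, 3)


def project(points, focal=1000.0, center=(320.0, 240.0)):
    """Pinhole projection of (J, 3) camera-space points to (J, 2) pixel coordinates."""
    return focal * points[:, :2] / points[:, 2:3] + np.asarray(center)


def gmof(residual, sigma=100.0):
    """Geman-McClure robust penalty, applied element-wise to pixel residuals."""
    r2 = residual ** 2
    return sigma ** 2 * r2 / (sigma ** 2 + r2)


def energy(theta, detected, confidence, w_prior=10.0):
    """Confidence-weighted robust 2D joint error plus a Gaussian stand-in pose prior."""
    residual = project(body_joints(theta)) - detected
    e_data = np.sum(confidence[:, None] * gmof(residual))
    e_prior = w_prior * np.dot(theta, theta)
    return e_data + e_prior


def fit(detected, confidence, steps=2000, lr=2e-5, eps=1e-5):
    """Gradient descent on the energy, using central finite differences for the gradient."""
    theta = np.zeros(P)
    for _ in range(steps):
        grad = np.zeros(P)
        for i in range(P):
            step = np.zeros(P)
            step[i] = eps
            grad[i] = (energy(theta + step, detected, confidence)
                       - energy(theta - step, detected, confidence)) / (2 * eps)
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    true_theta = np.array([0.5, -0.3, 0.4])
    detected = project(body_joints(true_theta))   # synthetic, noise-free "detections"
    confidence = np.ones(J)
    theta_hat = fit(detected, confidence)
    print("estimated:", theta_hat, "true:", true_theta)
```

In practice one would compute gradients with automatic differentiation rather than finite differences, and a real body model makes the problem far less well conditioned than this toy, which is exactly why the prior terms matter.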

With SMPLify-X [Pavlakos et al. 2019], we extended this concept to estimate the expressive SMPL-X model by fitting it to 2D landmarks from OpenPose. SMPLify-X introduced several improvements, including a gender classifier so that the estimated body shapes better match the image. We also introduced VPoser, a better pose prior based on a variational autoencoder (VAE) and trained on AMASS, and we improved the interpenetration detection.
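One practical benefit of a learned latent pose prior is that the fit can run over the latent code instead of raw joint angles: the pose is produced by the decoder, and a penalty on the size of the latent code acts as the prior. The sketch below illustrates this idea under a strong simplifying assumption, namely that the decoder is a hypothetical linear map `W @ z + b` rather than a trained neural network. With a linear decoder and a direct pose observation, the latent fit reduces to a ridge-regression solve. The 63-D pose and 32-D latent sizes are illustrative only, and NumPy is assumed.

```python
import numpy as np


def fit_latent_pose(pose_obs, W, b, lam=1.0):
    """Fit a latent code z so that the decoded pose W @ z + b matches pose_obs.

    Minimizes ||W z + b - pose_obs||^2 + lam * ||z||^2. W (D, d) and b (D,)
    form a hypothetical linear decoder standing in for a learned VAE decoder;
    lam * ||z||^2 plays the role of the prior on the latent code.
    """
    d = W.shape[1]
    A = W.T @ W + lam * np.eye(d)
    z = np.linalg.solve(A, W.T @ (pose_obs - b))
    return z, W @ z + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, d = 63, 32                                   # illustrative sizes only
    W, _ = np.linalg.qr(rng.normal(size=(D, d)))    # decoder with orthonormal columns
    b = 0.1 * rng.normal(size=D)                    # decoder "mean pose"
    noisy_pose = W @ rng.normal(size=d) + b + 0.5 * rng.normal(size=D)
    z, pose = fit_latent_pose(noisy_pose, W, b)
    print("distance to observation:", np.linalg.norm(pose - noisy_pose))
```

Because the returned pose always comes out of the decoder, components of the observation that the decoder cannot produce are discarded, and `lam` shrinks the latent code toward zero, that is, toward the decoder's mean pose `b`.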

Because images with ground-truth human pose and shape are hard to obtain, these optimization methods provide critical pseudo ground truth for training deep regression networks. For example, we use SMPLify-X to obtain SMPL-X fits to images and use these to train ExPose [Choutas et al. 2020]. With SPIN [], we showed that an even tighter integration of regression and optimization is valuable and synergistic. SPIN uses a regressor to initialize SMPLify, which then runs for a few optimization steps to improve the fit. The improved fits are then used to retrain the regressor. Repeating this in a loop yields progressively better training data and a better regressor. This training approach is now widely used.
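The regression-optimization loop described above can be sketched on a toy linear problem, assuming NumPy. Everything here is a hypothetical stand-in: `X` plays the role of image features, `A` maps body parameters to observed "2D keypoints" `y`, the regressor is a linear map fitted by least squares, and `refine` runs a few gradient steps of a fitting objective initialized at the regressor's output. Overwriting an example's pseudo ground truth only when the new fit explains its 2D data better is an illustrative choice made for this sketch.

```python
import numpy as np


def data_error(theta, y, A):
    """Per-example squared error between predicted and observed 'keypoints'."""
    return np.sum((theta @ A.T - y) ** 2, axis=1)


def refine(theta0, y, A, steps=5, lr=0.1, lam=1e-3):
    """A few gradient steps on sum_i ||A theta_i - y_i||^2 + lam ||theta_i||^2,
    starting from the regressor's predictions theta0 (one row per example)."""
    theta = theta0.copy()
    for _ in range(steps):
        grad = 2.0 * (theta @ A.T - y) @ A + 2.0 * lam * theta
        theta = theta - lr * grad
    return theta


def train_in_the_loop(X, y, A, rounds=10):
    """Alternate: regress, refine by optimization, update pseudo ground truth, retrain."""
    n, p = X.shape[0], A.shape[1]
    R = np.zeros((X.shape[1], p))            # deliberately poor initial linear regressor
    pseudo_gt = np.zeros((n, p))
    best_err = np.full(n, np.inf)
    for _ in range(rounds):
        fitted = refine(X @ R, y, A)         # the regressor initializes the optimizer
        err = data_error(fitted, y, A)
        better = err < best_err              # overwrite only where the new fit explains y better
        pseudo_gt[better] = fitted[better]
        best_err[better] = err[better]
        R, *_ = np.linalg.lstsq(X, pseudo_gt, rcond=None)  # retrain on the improved labels
    return R, pseudo_gt


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p, k, f = 200, 4, 10, 6
    A, _ = np.linalg.qr(rng.normal(size=(k, p)))   # maps parameters to "keypoints"
    theta_true = rng.normal(size=(n, p))           # never shown to the training loop
    X = theta_true @ rng.normal(size=(p, f))       # toy "image features"
    y = theta_true @ A.T                           # toy "2D keypoint detections"
    R, _ = train_in_the_loop(X, y, A)
    print("regressor error:", np.mean(np.sum((X @ R - theta_true) ** 2, axis=1)))
```

With only five refinement steps per round, a single round closes only part of the gap to the best fit. Because each round starts from the retrained regressor, however, the pseudo ground truth, and with it the regressor, keeps improving from round to round.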

The basic SMPLify(-X) approach is easily adapted to new problems, making it a foundational tool in our research. For example, we extended it to fit multiple views and to use silhouettes [Huang et al. 2017], which we exploited to create the AGORA [] and SPEC-MTP [Kocabas et al. 2021] datasets. We use it with aerial vehicles to solve jointly for camera extrinsics and body pose in multi-view images [Saini et al. 2019]. We adapted it to RGB-D images by adding a depth loss and scene-contact constraints to the objective function, enabling the creation of the PROX dataset [Hassan et al. 2019]. We also added constraints related to self-contact and used them to create the training and test data for TUCH [Müller et al. 2021].
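Extending the objective in this way largely amounts to adding terms. The sketch below, assuming NumPy, sums the squared reprojection error over several calibrated views and adds a one-sided penalty for points below a floor plane. The floor term is a deliberately simplified, hypothetical stand-in for the scene and contact terms used for PROX, which rely on a 3D scan of the scene rather than a single plane. To keep the example short, free 3D joint positions replace the body model, the camera extrinsics are assumed known, and the gradient is approximated by central finite differences.

```python
import numpy as np


def project(points, R, t, focal=1000.0, center=(320.0, 240.0)):
    """Pinhole projection of (N, 3) world points into a camera with extrinsics (R, t)."""
    cam = points @ R.T + t
    return focal * cam[:, :2] / cam[:, 2:3] + np.asarray(center)


def multiview_energy(points, cameras, detections, floor_normal, floor_offset, w_floor=1e4):
    """Squared reprojection error summed over views, plus a one-sided floor penalty."""
    energy = 0.0
    for (R, t), detected in zip(cameras, detections):
        energy += np.sum((project(points, R, t) - detected) ** 2)
    signed_dist = points @ floor_normal + floor_offset   # negative means below the floor
    energy += w_floor * np.sum(np.minimum(signed_dist, 0.0) ** 2)
    return energy


def fit_points(init, cameras, detections, floor_normal, floor_offset,
               steps=200, lr=2e-6, eps=1e-6):
    """Gradient descent on the multi-view energy with central finite differences."""
    x = init.astype(float).ravel()

    def f(v):
        return multiview_energy(v.reshape(-1, 3), cameras, detections,
                                floor_normal, floor_offset)

    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            step = np.zeros_like(x)
            step[i] = eps
            grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
        x -= lr * grad
    return x.reshape(-1, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    joints = rng.uniform(-0.5, 0.5, size=(5, 3))              # synthetic 3D joints
    rot_y90 = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
    cameras = [(np.eye(3), np.array([0.0, 0.0, 4.0])),         # frontal view
               (rot_y90, np.array([0.0, 0.0, 4.0]))]           # side view
    detections = [project(joints, R, t) for R, t in cameras]
    floor_normal, floor_offset = np.array([0.0, 1.0, 0.0]), 1.0  # toy floor: plane y = -1
    init = joints + 0.1 * rng.normal(size=joints.shape)
    estimate = fit_points(init, cameras, detections, floor_normal, floor_offset)
    print("max error:", np.abs(estimate - joints).max())
```

A single view cannot pin down depth, whereas the second, orthogonal view constrains it; the floor term stays at zero for points above the plane and only pushes back on points that sink below it.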

Members

Javier Romero (Affiliated Researcher, Perceiving Systems)

Ahmed Osman (Perceiving Systems)

Dimitris Tzionas (Guest Scientist, Perceiving Systems)

Publications

SPEC: Seeing People in the Wild with an Estimated Camera. Kocabas, M., Huang, C. P., Tesch, J., Müller, L., Hilliges, O., Black, M. J. In Proc. International Conference on Computer Vision (ICCV), pp. 11015-11025, IEEE, Piscataway, NJ, October 2021.

On Self-Contact and Human Pose. Müller, L., Osman, A. A. A., Tang, S., Huang, C. P., Black, M. J. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9985-9994, IEEE, Piscataway, NJ, June 2021.

Monocular Expressive Body Regression through Body-Driven Attention. Choutas, V., Pavlakos, G., Bolkart, T., Tzionas, D., Black, M. J. In Computer Vision - ECCV 2020, Part X, pp. 20-40, Lecture Notes in Computer Science, vol. 12355 (Editors: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M.), Springer, Cham, August 2020.

Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles. Saini, N., Price, E., Tallamraju, R., Enficiaud, R., Ludwig, R., Martinović, I., Ahmad, A., Black, M. J. In Proc. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 823-832, IEEE, October 2019.

Resolving 3D Human Pose Ambiguities with 3D Scene Constraints. Hassan, M., Choutas, V., Tzionas, D., Black, M. J. In International Conference on Computer Vision (ICCV), pp. 2282-2292, October 2019.

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image. Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A. A. A., Tzionas, D., Black, M. J. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10975-10985, June 2019.

Unite the People: Closing the Loop Between 3D and 2D Human Representations. Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M. J., Gehler, P. V. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4704-4713, IEEE, Piscataway, NJ, July 2017.

Towards Accurate Marker-less Human Shape and Pose Estimation over Time. Huang, Y., Bogo, F., Lassner, C., Kanazawa, A., Gehler, P. V., Romero, J., Akhter, I., Black, M. J. In International Conference on 3D Vision (3DV), pp. 421-430, 2017.

Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M. J. In Computer Vision – ECCV 2016, pp. 561-578, Lecture Notes in Computer Science, Springer International Publishing, October 2016.