Multi-Camera Capture

Top row: (left) Bodies are represented by a part-based graphical model in space and time. (middle) Messages between parts are represented by particles. (right) Non-parametric belief propagation computes message products. Bottom row: We segment multi-camera images and fit bodies to them. (a) Articulated template models. (b) Input silhouettes. (c) Segmentation. (d) Contour labels assigned to each person. (e) Estimated surface. (f) Estimated 3D models with embedded skeletons.

Members

Perceiving Systems
Director
Perceiving Systems
Perceiving Systems
Affiliated Researcher

Publications

Human Pose Estimation: New Benchmark and State of the Art Analysis (Conference Paper)
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.
In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 3686-3693, IEEE, June 2014

Markerless Motion Capture of Multiple Characters Using Multi-view Image Segmentation (Article)
Liu, Y., Gall, J., Stoll, C., Dai, Q., Seidel, H., Theobalt, C.
Transactions on Pattern Analysis and Machine Intelligence, 35(11):2720-2735, 2013
Capturing the skeleton motion and detailed time-varying surface geometry of multiple, closely interacting people is a very challenging task, even in a multicamera setup, due to frequent occlusions and ambiguities in feature-to-person assignments. To address this task, we propose a framework that exploits multiview image segmentation. To this end, a probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Given the articulated template models of each person and the labeled pixels, a combined optimization scheme, which splits the skeleton pose optimization problem into a local one and a lower dimensional global one, is applied one by one to each individual, followed by surface estimation to capture detailed nonrigid deformations. We show on various sequences that our approach can capture the 3D motion of humans accurately even if they move rapidly, if they wear wide apparel, and if they are engaged in challenging multiperson motions, including dancing, wrestling, and hugging.

Coupled Action Recognition and Pose Estimation from Multiple Views (Article)
Yao, A., Gall, J., van Gool, L.
International Journal of Computer Vision, 100(1):16-37, October 2012

Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation (Article)
Sigal, L., Isard, M., Haussecker, H., Black, M. J.
International Journal of Computer Vision, 98(1):15-48, Springer Netherlands, May 2011
We formulate the problem of 3D human pose estimation and tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely connected body parts. In particular, we model the body using an undirected graphical model in which nodes correspond to parts and edges to kinematic, penetration, and temporal constraints imposed by the joints and the world. These constraints are encoded using pairwise statistical distributions that are learned from motion-capture training data. Human pose and motion estimation is formulated as inference in this graphical model and is solved using Particle Message Passing (PaMPas). PaMPas is a form of non-parametric belief propagation that uses a variation of particle filtering that can be applied over a general graphical model with loops. The loose-limbed model and decentralized graph structure allow us to incorporate information from "bottom-up" visual cues, such as limb and head detectors, into the inference process. These detectors enable automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking people in multi-view imagery using a set of calibrated cameras and present quantitative evaluation using the HumanEva dataset.
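To make the idea of particle-based message passing between loosely connected parts more concrete, here is a minimal sketch in Python. It is not the PaMPas implementation from the paper: it assumes a toy setting with two parts whose poses are single 1D positions, a Gaussian pairwise potential with a fixed offset, and a Gaussian detector likelihood. All function names and parameters (`message_value`, `update_belief`, `offset`, `sigma_pair`, `sigma_obs`) are hypothetical and chosen only for illustration. The sketch samples candidate positions for one part from the message sent by its neighbor, then weights them by the local observation, which is the particle-filter-style step applied to a single edge of a graph.

```python
import math
import random


def gaussian(x, mean, sigma):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def message_value(x_j, particles_i, weights_i, offset, sigma_pair):
    """Evaluate m_{i->j}(x_j) = sum_k w_k * psi(x_i^k, x_j).

    psi is an assumed Gaussian pairwise potential: part j is expected
    to sit `offset` away from part i, with spread `sigma_pair`.
    """
    return sum(
        w * gaussian(x_j, x_i + offset, sigma_pair)
        for x_i, w in zip(particles_i, weights_i)
    )


def sample_from_message(particles_i, weights_i, offset, sigma_pair, n, rng):
    """Draw n samples from the mixture that the message defines."""
    sources = rng.choices(particles_i, weights=weights_i, k=n)
    return [rng.gauss(x_i + offset, sigma_pair) for x_i in sources]


def update_belief(particles_i, weights_i, offset, sigma_pair,
                  observation, sigma_obs, n, rng):
    """Approximate the belief at part j with weighted particles.

    Candidates are proposed from the incoming message and weighted by
    an assumed Gaussian detector likelihood around `observation`.
    """
    candidates = sample_from_message(
        particles_i, weights_i, offset, sigma_pair, n, rng
    )
    weights = normalize(
        [gaussian(x, observation, sigma_obs) for x in candidates]
    )
    return candidates, weights


def weighted_mean(particles, weights):
    """Mean of a weighted particle set (weights assumed normalized)."""
    return sum(p * w for p, w in zip(particles, weights))


if __name__ == "__main__":
    rng = random.Random(0)
    # Particles for part i (say, a torso), clustered around position 0.
    torso = [rng.gauss(0.0, 0.1) for _ in range(500)]
    torso_w = normalize([1.0] * len(torso))
    # Part j (say, an upper arm) is expected one unit away; a detector
    # reports it at 1.2.
    arm, arm_w = update_belief(torso, torso_w, 1.0, 0.2, 1.2, 0.2, 2000, rng)
    print("estimated arm position:", weighted_mean(arm, arm_w))
```

A full model would take the product of messages from several neighbors at each node and iterate over a graph with loops and temporal edges; this sketch covers only a single message on a single edge.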