Header logo is


2018


Thumb xl learn etc
Deep Reinforcement Learning for Event-Triggered Control

Baumann, D., Zhu, J., Martius, G., Trimpe, S.

In Proceedings of the 57th IEEE International Conference on Decision and Control (CDC), pages: 943-950, 57th IEEE International Conference on Decision and Control (CDC), December 2018 (inproceedings)

al ics

arXiv PDF DOI Project Page Project Page [BibTex]

2018


arXiv PDF DOI Project Page Project Page [BibTex]


Thumb xl imgidx 00326
Customized Multi-Person Tracker

Ma, L., Tang, S., Black, M. J., Gool, L. V.

In Computer Vision – ACCV 2018, Springer International Publishing, Asian Conference on Computer Vision, December 2018 (inproceedings)

ps

PDF Project Page [BibTex]

PDF Project Page [BibTex]


Thumb xl dip final
Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time

Huang, Y., Kaufmann, M., Aksan, E., Black, M. J., Hilliges, O., Pons-Moll, G.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 37, pages: 185:1-185:15, ACM, November 2018, Two first authors contributed equally (article)

Abstract
We demonstrate a novel deep neural network capable of reconstructing human full body pose in real-time from 6 Inertial Measurement Units (IMUs) worn on the user's body. In doing so, we address several difficult challenges. First, the problem is severely under-constrained as multiple pose parameters produce the same IMU orientations. Second, capturing IMU data in conjunction with ground-truth poses is expensive and difficult to do in many target application scenarios (e.g., outdoors). Third, modeling temporal dependencies through non-linear optimization has proven effective in prior work but makes real-time prediction infeasible. To address this important limitation, we learn the temporal pose priors using deep learning. To learn from sufficient data, we synthesize IMU data from motion capture datasets. A bi-directional RNN architecture leverages past and future information that is available at training time. At test time, we deploy the network in a sliding window fashion, retaining real time capabilities. To evaluate our method, we recorded DIP-IMU, a dataset consisting of 10 subjects wearing 17 IMUs for validation in 64 sequences with 330,000 time instants; this constitutes the largest IMU dataset publicly available. We quantitatively evaluate our approach on multiple datasets and show results from a real-time implementation. DIP-IMU and the code are available for research purposes.

ps

data code pdf preprint video DOI Project Page [BibTex]

data code pdf preprint video DOI Project Page [BibTex]


Thumb xl sevillagcpr
On the Integration of Optical Flow and Action Recognition

Sevilla-Lara, L., Liao, Y., Güney, F., Jampani, V., Geiger, A., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 281-297, Springer, Cham, October 2018 (inproceedings)

Abstract
Most of the top performing action recognition methods use optical flow as a "black box" input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations to better understand how these affect a state-of-the-art action recognition method. Furthermore, we fine tune two neural-network flow methods end-to-end on the most widely used action recognition dataset (UCF101). Based on these experiments, we make the following five observations: 1) optical flow is useful for action recognition because it is invariant to appearance, 2) optical flow methods are optimized to minimize end-point-error (EPE), but the EPE of current methods is not well correlated with action recognition performance, 3) for the flow methods tested, accuracy at boundaries and at small displacements is most correlated with action recognition performance, 4) training optical flow to minimize classification error instead of minimizing EPE improves recognition performance, and 5) optical flow learned for the task of action recognition differs from traditional optical flow especially inside the human body and at the boundary of the body. These observations may encourage optical flow researchers to look beyond EPE as a goal and guide action recognition researchers to seek better motion cues, leading to a tighter integration of the optical flow and action recognition communities.

avg ps

arXiv DOI [BibTex]

arXiv DOI [BibTex]


Thumb xl cover
Deep Neural Network-based Cooperative Visual Tracking through Multiple Micro Aerial Vehicles

Price, E., Lawless, G., Ludwig, R., Martinovic, I., Buelthoff, H. H., Black, M. J., Ahmad, A.

IEEE Robotics and Automation Letters, Robotics and Automation Letters, 3(4):3193-3200, IEEE, October 2018, Also accepted and presented in the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). (article)

Abstract
Multi-camera tracking of humans and animals in outdoor environments is a relevant and challenging problem. Our approach to it involves a team of cooperating micro aerial vehicles (MAVs) with on-board cameras only. DNNs often fail at objects with small scale or far away from the camera, which are typical characteristics of a scenario with aerial robots. Thus, the core problem addressed in this paper is how to achieve on-board, online, continuous and accurate vision-based detections using DNNs for visual person tracking through MAVs. Our solution leverages cooperation among multiple MAVs and active selection of most informative regions of image. We demonstrate the efficiency of our approach through simulations with up to 16 robots and real robot experiments involving two aerial robots tracking a person, while maintaining an active perception-driven formation. ROS-based source code is provided for the benefit of the community.

ps

Published Version link (url) DOI [BibTex]

Published Version link (url) DOI [BibTex]


Thumb xl interpolation
Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation

Wulff, J., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 567-582, Springer, Cham, October 2018 (inproceedings)

Abstract
The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic n losses. Proxy tasks can overcome these issues, and start by training a network for a task for which annotation is easier or which can be trained unsupervised. The trained network is then fine-tuned for the original task using small amounts of ground truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN unsupervised for temporal interpolation. Such a network implicitly estimates motion, but cannot handle untextured regions. By fi ne-tuning on small amounts of ground truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fi elds. Using this unsupervised pre-training, our network outperforms similar architectures that were trained supervised using synthetic optical flow.

ps

pdf arXiv DOI Project Page [BibTex]

pdf arXiv DOI Project Page [BibTex]


Thumb xl alice
First Impressions of Personality Traits From Body Shapes

Hu, Y., Parde, C. J., Hill, M. Q., Mahmood, N., O’Toole, A. J.

Psychological Science, 29(12):1969-–1983, October 2018 (article)

Abstract
People infer the personalities of others from their facial appearance. Whether they do so from body shapes is less studied. We explored personality inferences made from body shapes. Participants rated personality traits for male and female bodies generated with a three-dimensional body model. Multivariate spaces created from these ratings indicated that people evaluate bodies on valence and agency in ways that directly contrast positive and negative traits from the Big Five domains. Body-trait stereotypes based on the trait ratings revealed a myriad of diverse body shapes that typify individual traits. Personality-trait profiles were predicted reliably from a subset of the body-shape features used to specify the three-dimensional bodies. Body features related to extraversion and conscientiousness were predicted with the highest consensus, followed by openness traits. This study provides the first comprehensive look at the range, diversity, and reliability of personality inferences that people make from body shapes.

ps

publisher site pdf DOI [BibTex]

publisher site pdf DOI [BibTex]


Thumb xl bmvc pic
Human Motion Parsing by Hierarchical Dynamic Clustering

Zhang, Y., Tang, S., Sun, H., Neumann, H.

In Proceedings of the British Machine Vision Conference (BMVC), pages: 269, BMVA Press, 29th British Machine Vision Conference, September 2018 (inproceedings)

Abstract
Parsing continuous human motion into meaningful segments plays an essential role in various applications. In this work, we propose a hierarchical dynamic clustering framework to derive action clusters from a sequence of local features in an unsuper- vised bottom-up manner. We systematically investigate the modules in this framework and particularly propose diverse temporal pooling schemes, in order to realize accurate temporal action localization. We demonstrate our method on two motion parsing tasks: temporal action segmentation and abnormal behavior detection. The experimental results indicate that the proposed framework is significantly more effective than the other related state-of-the-art methods on several datasets.

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl coma faces
Generating 3D Faces using Convolutional Mesh Autoencoders

Ranjan, A., Bolkart, T., Sanyal, S., Black, M. J.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11207, pages: 725-741, Springer, Cham, September 2018 (inproceedings)

Abstract
Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they can not capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured over 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We also show that, replacing the expression space of an existing state-of-the-art face model with our autoencoder, achieves a lower reconstruction error. Our data, model and code are available at http://coma.is.tue.mpg.de/.

ps

Code (tensorflow) Code (pytorch) Project Page paper supplementary DOI Project Page Project Page [BibTex]

Code (tensorflow) Code (pytorch) Project Page paper supplementary DOI Project Page Project Page [BibTex]


Thumb xl person reid.001
Part-Aligned Bilinear Representations for Person Re-identification

Suh, Y., Wang, J., Tang, S., Mei, T., Lee, K. M.

In European Conference on Computer Vision (ECCV), 11218, pages: 418-437, Springer, Cham, September 2018 (inproceedings)

Abstract
Comparing the appearance of corresponding body parts is essential for person re-identification. However, body parts are frequently misaligned be- tween detected boxes, due to the detection errors and the pose/viewpoint changes. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which gen- erates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses two feature maps to an image descriptor. We show that it results in a compact descriptor, where the inner product between two image descriptors is equivalent to an aggregation of the local appearance similarities of the cor- responding body parts, and thereby significantly reduces the part misalignment problem. Our approach is advantageous over other pose-guided representations by learning part descriptors optimal for person re-identification. Training the net- work does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demon- strating its superiority over the state-of-the-art methods on the standard bench- mark datasets including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.

ps

pdf supplementary DOI Project Page [BibTex]

pdf supplementary DOI Project Page [BibTex]


Thumb xl persondetect  copy
Learning Human Optical Flow

Ranjan, A., Romero, J., Black, M. J.

In 29th British Machine Vision Conference, September 2018 (inproceedings)

Abstract
The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SpyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research.

ps

video code pdf link (url) Project Page Project Page [BibTex]

video code pdf link (url) Project Page Project Page [BibTex]


Thumb xl nbf
Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation

(Best Student Paper Award)

Omran, M., Lassner, C., Pons-Moll, G., Gehler, P. V., Schiele, B.

In 3DV, September 2018 (inproceedings)

Abstract
Direct prediction of 3D body pose and shape remains a challenge even for highly parameterized deep learning models. Mapping from the 2D image space to the prediction space is difficult: perspective ambiguities make the loss function noisy and training data is scarce. In this paper, we propose a novel approach (Neural Body Fitting (NBF)). It integrates a statistical body model within a CNN, leveraging reliable bottom-up semantic body part segmentation and robust top-down body model constraints. NBF is fully differentiable and can be trained using 2D and 3D annotations. In detailed experiments, we analyze how the components of our model affect performance, especially the use of part segmentations as an explicit intermediate representation, and present a robust, efficiently trainable framework for 3D human pose estimation from 2D images with competitive results on standard benchmarks. Code is available at https://github.com/mohomran/neural_body_fitting

ps

arXiv code Project Page [BibTex]


Thumb xl joeleccv18
Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220, pages: 713-731, Springer, Cham, September 2018 (inproceedings)

avg ps

pdf suppmat Video Project Page DOI Project Page [BibTex]

pdf suppmat Video Project Page DOI Project Page [BibTex]


Thumb xl sample3 merge black
Learning an Infant Body Model from RGB-D Data for Accurate Full Body Motion Analysis

Hesse, N., Pujades, S., Romero, J., Black, M. J., Bodensteiner, C., Arens, M., Hofmann, U. G., Tacke, U., Hadders-Algra, M., Weinberger, R., Muller-Felber, W., Schroeder, A. S.

In Int. Conf. on Medical Image Computing and Computer Assisted Intervention (MICCAI), September 2018 (inproceedings)

Abstract
Infant motion analysis enables early detection of neurodevelopmental disorders like cerebral palsy (CP). Diagnosis, however, is challenging, requiring expert human judgement. An automated solution would be beneficial but requires the accurate capture of 3D full-body movements. To that end, we develop a non-intrusive, low-cost, lightweight acquisition system that captures the shape and motion of infants. Going beyond work on modeling adult body shape, we learn a 3D Skinned Multi-Infant Linear body model (SMIL) from noisy, low-quality, and incomplete RGB-D data. We demonstrate the capture of shape and motion with 37 infants in a clinical environment. Quantitative experiments show that SMIL faithfully represents the data and properly factorizes the shape and pose of the infants. With a case study based on general movement assessment (GMA), we demonstrate that SMIL captures enough information to allow medical assessment. SMIL provides a new tool and a step towards a fully automatic system for GMA.

ps

pdf Project page video extended arXiv version DOI Project Page [BibTex]

pdf Project page video extended arXiv version DOI Project Page [BibTex]


Thumb xl eccv pascal results  thumbnail
Deep Directional Statistics: Pose Estimation with Uncertainty Quantification

Prokudin, S., Gehler, P., Nowozin, S.

European Conference on Computer Vision (ECCV), September 2018 (conference)

Abstract
Modern deep learning systems successfully solve many perception tasks such as object pose estimation when the input image is of high quality. However, in challenging imaging conditions such as on low resolution images or when the image is corrupted by imaging artifacts, current systems degrade considerably in accuracy. While a loss in performance is unavoidable we would like our models to quantify their uncertainty in order to achieve robustness against images of varying quality. Probabilistic deep learning models combine the expressive power of deep learning with uncertainty quantification. In this paper, we propose a novel probabilistic deep learning model for the task of angular regression. Our model uses von Mises distributions to predict a distribution over object pose angle. Whereas a single von Mises distribution is making strong assumptions about the shape of the distribution, we extend the basic model to predict a mixture of von Mises distributions. We show how to learn a mixture model using a finite and infinite number of mixture components. Our model allow for likelihood-based training and efficient inference at test time. We demonstrate on a number of challenging pose estimation datasets that our model produces calibrated probability predictions and competitive or superior point estimates compared to the current state-of-the-art.

ps

code pdf [BibTex]

code pdf [BibTex]


Thumb xl vip
Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera

Marcard, T. V., Henschel, R., Black, M. J., Rosenhahn, B., Pons-Moll, G.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11214, pages: 614-631, Springer, Cham, September 2018 (inproceedings)

Abstract
In this work, we propose a method that combines a single hand-held camera and a set of Inertial Measurement Units (IMUs) attached at the body limbs to estimate accurate 3D poses in the wild. This poses many new challenges: the moving camera, heading drift, cluttered background, occlusions and many people visible in the video. We associate 2D pose detections in each image to the corresponding IMU-equipped persons by solving a novel graph based optimization problem that forces 3D to 2D coherency within a frame and across long range frames. Given associations, we jointly optimize the pose of a statistical body model, the camera pose and heading drift using a continuous optimization framework. We validated our method on the TotalCapture dataset, which provides video and IMU synchronized with ground truth. We obtain an accuracy of 26mm, which makes it accurate enough to serve as a benchmark for image-based 3D pose estimation in the wild. Using our method, we recorded 3D Poses in the Wild (3DPW ), a new dataset consisting of more than 51; 000 frames with accurate 3D pose in challenging sequences, including walking in the city, going up-stairs, having co ffee or taking the bus. We make the reconstructed 3D poses, video, IMU and 3D models available for research purposes at http://virtualhumans.mpi-inf.mpg.de/3DPW.

ps

pdf SupMat data project DOI Project Page [BibTex]

pdf SupMat data project DOI Project Page [BibTex]


Thumb xl fict 05 00018 g003
Visual Perception and Evaluation of Photo-Realistic Self-Avatars From 3D Body Scans in Males and Females

Thaler, A., Piryankova, I., Stefanucci, J. K., Pujades, S., de la Rosa, S., Streuber, S., Romero, J., Black, M. J., Mohler, B. J.

Frontiers in ICT, 5, pages: 1-14, September 2018 (article)

Abstract
The creation or streaming of photo-realistic self-avatars is important for virtual reality applications that aim for perception and action to replicate real world experience. The appearance and recognition of a digital self-avatar may be especially important for applications related to telepresence, embodied virtual reality, or immersive games. We investigated gender differences in the use of visual cues (shape, texture) of a self-avatar for estimating body weight and evaluating avatar appearance. A full-body scanner was used to capture each participant's body geometry and color information and a set of 3D virtual avatars with realistic weight variations was created based on a statistical body model. Additionally, a second set of avatars was created with an average underlying body shape matched to each participant’s height and weight. In four sets of psychophysical experiments, the influence of visual cues on the accuracy of body weight estimation and the sensitivity to weight changes was assessed by manipulating body shape (own, average) and texture (own photo-realistic, checkerboard). The avatars were presented on a large-screen display, and participants responded to whether the avatar's weight corresponded to their own weight. Participants also adjusted the avatar's weight to their desired weight and evaluated the avatar's appearance with regard to similarity to their own body, uncanniness, and their willingness to accept it as a digital representation of the self. The results of the psychophysical experiments revealed no gender difference in the accuracy of estimating body weight in avatars. However, males accepted a larger weight range of the avatars as corresponding to their own. In terms of the ideal body weight, females but not males desired a thinner body. With regard to the evaluation of avatar appearance, the questionnaire responses suggest that own photo-realistic texture was more important to males for higher similarity ratings, while own body shape seemed to be more important to females. These results argue for gender-specific considerations when creating self-avatars.

ps

pdf DOI [BibTex]

pdf DOI [BibTex]


Thumb xl aircap ca 3
Decentralized MPC based Obstacle Avoidance for Multi-Robot Target Tracking Scenarios

Tallamraju, R., Rajappa, S., Black, M. J., Karlapalem, K., Ahmad, A.

2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages: 1-8, IEEE, August 2018 (conference)

Abstract
In this work, we consider the problem of decentralized multi-robot target tracking and obstacle avoidance in dynamic environments. Each robot executes a local motion planning algorithm which is based on model predictive control (MPC). The planner is designed as a quadratic program, subject to constraints on robot dynamics and obstacle avoidance. Repulsive potential field functions are employed to avoid obstacles. The novelty of our approach lies in embedding these non-linear potential field functions as constraints within a convex optimization framework. Our method convexifies nonconvex constraints and dependencies, by replacing them as pre-computed external input forces in robot dynamics. The proposed algorithm additionally incorporates different methods to avoid field local minima problems associated with using potential field functions in planning. The motion planner does not enforce predefined trajectories or any formation geometry on the robots and is a comprehensive solution for cooperative obstacle avoidance in the context of multi-robot target tracking. We perform simulation studies for different scenarios to showcase the convergence and efficacy of the proposed algorithm.

ps

Published Version link (url) DOI [BibTex]

Published Version link (url) DOI [BibTex]


Thumb xl mazen
Robust Physics-based Motion Retargeting with Realistic Body Shapes

Borno, M. A., Righetti, L., Black, M. J., Delp, S. L., Fiume, E., Romero, J.

Computer Graphics Forum, 37, pages: 6:1-12, July 2018 (article)

Abstract
Motion capture is often retargeted to new, and sometimes drastically different, characters. When the characters take on realistic human shapes, however, we become more sensitive to the motion looking right. This means adapting it to be consistent with the physical constraints imposed by different body shapes. We show how to take realistic 3D human shapes, approximate them using a simplified representation, and animate them so that they move realistically using physically-based retargeting. We develop a novel spacetime optimization approach that learns and robustly adapts physical controllers to new bodies and constraints. The approach automatically adapts the motion of the mocap subject to the body shape of a target subject. This motion respects the physical properties of the new body and every body shape results in a different and appropriate movement. This makes it easy to create a varied set of motions from a single mocap sequence by simply varying the characters. In an interactive environment, successful retargeting requires adapting the motion to unexpected external forces. We achieve robustness to such forces using a novel LQR-tree formulation. We show that the simulated motions look appropriate to each character’s anatomy and their actions are robust to perturbations.

mg ps

pdf video Project Page Project Page [BibTex]

pdf video Project Page Project Page [BibTex]


no image
Nonlinear decoding of a complex movie from the mammalian retina

Botella-Soler, V., Deny, S., Martius, G., Marre, O., Tkačik, G.

PLOS Computational Biology, 14(5):1-27, Public Library of Science, May 2018 (article)

Abstract
Author summary Neurons in the retina transform patterns of incoming light into sequences of neural spikes. We recorded from ∼100 neurons in the rat retina while it was stimulated with a complex movie. Using machine learning regression methods, we fit decoders to reconstruct the movie shown from the retinal output. We demonstrated that retinal code can only be read out with a low error if decoders make use of correlations between successive spikes emitted by individual neurons. These correlations can be used to ignore spontaneous spiking that would, otherwise, cause even the best linear decoders to “hallucinate” nonexistent stimuli. This work represents the first high resolution single-trial full movie reconstruction and suggests a new paradigm for separating spontaneous from stimulus-driven neural activity.

al

DOI [BibTex]

DOI [BibTex]


Thumb xl thesis cover2
Model-based Optical Flow: Layers, Learning, and Geometry

Wulff, J.

Tuebingen University, April 2018 (phdthesis)

Abstract
The estimation of motion in video sequences establishes temporal correspondences between pixels and surfaces and allows reasoning about a scene using multiple frames. Despite being a focus of research for over three decades, computing motion, or optical flow, remains challenging due to a number of difficulties, including the treatment of motion discontinuities and occluded regions, and the integration of information from more than two frames. One reason for these issues is that most optical flow algorithms only reason about the motion of pixels on the image plane, while not taking the image formation pipeline or the 3D structure of the world into account. One approach to address this uses layered models, which represent the occlusion structure of a scene and provide an approximation to the geometry. The goal of this dissertation is to show ways to inject additional knowledge about the scene into layered methods, making them more robust, faster, and more accurate. First, this thesis demonstrates the modeling power of layers using the example of motion blur in videos, which is caused by fast motion relative to the exposure time of the camera. Layers segment the scene into regions that move coherently while preserving their occlusion relationships. The motion of each layer therefore directly determines its motion blur. At the same time, the layered model captures complex blur overlap effects at motion discontinuities. Using layers, we can thus formulate a generative model for blurred video sequences, and use this model to simultaneously deblur a video and compute accurate optical flow for highly dynamic scenes containing motion blur. Next, we consider the representation of the motion within layers. Since, in a layered model, important motion discontinuities are captured by the segmentation into layers, the flow within each layer varies smoothly and can be approximated using a low dimensional subspace. We show how this subspace can be learned from training data using principal component analysis (PCA), and that flow estimation using this subspace is computationally efficient. The combination of the layered model and the low-dimensional subspace gives the best of both worlds, sharp motion discontinuities from the layers and computational efficiency from the subspace. Lastly, we show how layered methods can be dramatically improved using simple semantics. Instead of treating all layers equally, a semantic segmentation divides the scene into its static parts and moving objects. Static parts of the scene constitute a large majority of what is shown in typical video sequences; yet, in such regions optical flow is fully constrained by the depth structure of the scene and the camera motion. After segmenting out moving objects, we consider only static regions, and explicitly reason about the structure of the scene and the camera motion, yielding much better optical flow estimates. Furthermore, computing the structure of the scene allows to better combine information from multiple frames, resulting in high accuracies even in occluded regions. For moving regions, we compute the flow using a generic optical flow method, and combine it with the flow computed for the static regions to obtain a full optical flow field. By combining layered models of the scene with reasoning about the dynamic behavior of the real, three-dimensional world, the methods presented herein push the envelope of optical flow computation in terms of robustness, speed, and accuracy, giving state-of-the-art results on benchmarks and pointing to important future research directions for the estimation of motion in natural scenes.

ps

Official link DOI Project Page [BibTex]


Thumb xl animage2mask3
Assessing body image in anorexia nervosa using biometric self-avatars in virtual reality: Attitudinal components rather than visual body size estimation are distorted

Mölbert, S. C., Thaler, A., Mohler, B. J., Streuber, S., Romero, J., Black, M. J., Zipfel, S., Karnath, H., Giel, K. E.

Psychological Medicine, 48(4):642-653, March 2018 (article)

Abstract
Background: Body image disturbance (BID) is a core symptom of anorexia nervosa (AN), but as yet distinctive features of BID are unknown. The present study aimed at disentangling perceptual and attitudinal components of BID in AN. Methods: We investigated n=24 women with AN and n=24 controls. Based on a 3D body scan, we created realistic virtual 3D bodies (avatars) for each participant that were varied through a range of ±20% of the participants' weights. Avatars were presented in a virtual reality mirror scenario. Using different psychophysical tasks, participants identified and adjusted their actual and their desired body weight. To test for general perceptual biases in estimating body weight, a second experiment investigated perception of weight and shape matched avatars with another identity. Results: Women with AN and controls underestimated their weight, with a trend that women with AN underestimated more. The average desired body of controls had normal weight while the average desired weight of women with AN corresponded to extreme AN (DSM-5). Correlation analyses revealed that desired body weight, but not accuracy of weight estimation, was associated with eating disorder symptoms. In the second experiment, both groups estimated accurately while the most attractive body was similar to Experiment 1. Conclusions: Our results contradict the widespread assumption that patients with AN overestimate their body weight due to visual distortions. Rather, they illustrate that BID might be driven by distorted attitudes with regard to the desired body. Clinical interventions should aim at helping patients with AN to change their desired weight.

ps

doi pdf DOI Project Page [BibTex]


Thumb xl plos1
Body size estimation of self and others in females varying in BMI

Thaler, A., Geuss, M. N., Mölbert, S. C., Giel, K. E., Streuber, S., Romero, J., Black, M. J., Mohler, B. J.

PLoS ONE, 13(2), Febuary 2018 (article)

Abstract
Previous literature suggests that a disturbed ability to accurately identify own body size may contribute to overweight. Here, we investigated the influence of personal body size, indexed by body mass index (BMI), on body size estimation in a non-clinical population of females varying in BMI. We attempted to disentangle general biases in body size estimates and attitudinal influences by manipulating whether participants believed the body stimuli (personalized avatars with realistic weight variations) represented their own body or that of another person. Our results show that the accuracy of own body size estimation is predicted by personal BMI, such that participants with lower BMI underestimated their body size and participants with higher BMI overestimated their body size. Further, participants with higher BMI were less likely to notice the same percentage of weight gain than participants with lower BMI. Importantly, these results were only apparent when participants were judging a virtual body that was their own identity (Experiment 1), but not when they estimated the size of a body with another identity and the same underlying body shape (Experiment 2a). The different influences of BMI on accuracy of body size estimation and sensitivity to weight change for self and other identity suggests that effects of BMI on visual body size estimation are self-specific and not generalizable to other bodies.

ps

pdf DOI Project Page [BibTex]


Thumb xl coregpatentfig
Co-Registration – Simultaneous Alignment and Modeling of Articulated 3D Shapes

Black, M., Hirshberg, D., Loper, M., Rachlin, E., Weiss, A.

Febuary 2018, U.S.~Patent 9,898,848 (misc)

Abstract
Present application refers to a method, a model generation unit and a computer program (product) for generating trained models (M) of moving persons, based on physically measured person scan data (S). The approach is based on a common template (T) for the respective person and on the measured person scan data (S) in different shapes and different poses. Scan data are measured with a 3D laser scanner. A generic personal model is used for co-registering a set of person scan data (S) aligning the template (T) to the set of person scans (S) while simultaneously training the generic personal model to become a trained person model (M) by constraining the generic person model to be scan-specific, person-specific and pose-specific and providing the trained model (M), based on the co registering of the measured object scan data (S).

ps

text [BibTex]


Thumb xl hmrteaser
End-to-end Recovery of Human Shape and Pose

Kanazawa, A., Black, M. J., Jacobs, D. W., Malik, J.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
We describe Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground truth 2D annotations. However, the reprojection loss alone is highly underconstrained. In this work we address this problem by introducing an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes. We show that HMR can be trained with and without using any paired 2D-to-3D supervision. We do not rely on intermediate 2D keypoint detections and infer 3D pose and shape parameters directly from image pixels. Our model runs in real-time given a bounding box containing the person. We demonstrate our approach on various images in-the-wild and out-perform previous optimization-based methods that output 3D meshes and show competitive results on tasks such as 3D joint location estimation and part segmentation.

ps

pdf code project video Project Page [BibTex]

pdf code project video Project Page [BibTex]


no image
L4: Practical loss-based stepsize adaptation for deep learning

Rolinek, M., Martius, G.

In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), pages: 6434-6444, (Editors: S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett), Curran Associates, Inc., 2018 (inproceedings)

al

Github link (url) Project Page [BibTex]

Github link (url) Project Page [BibTex]


Thumb xl publication alife 2018
Systematic self-exploration of behaviors for robots in a dynamical systems framework

Pinneri, C., Martius, G.

In Proc. Artificial Life XI, pages: 319-326, MIT Press, Cambridge, MA, 2018 (inproceedings)

Abstract
One of the challenges of this century is to understand the neural mechanisms behind cognitive control and learning. Recent investigations propose biologically plausible synaptic mechanisms for self-organizing controllers, in the spirit of Hebbian learning. In particular, differential extrinsic plasticity (DEP) [Der and Martius, PNAS 2015], has proven to enable embodied agents to self-organize their individual sensorimotor development, and generate highly coordinated behaviors during their interaction with the environment. These behaviors are attractors of a dynamical system. In this paper, we use the DEP rule to generate attractors and we combine it with a “repelling potential” which allows the system to actively explore all its attractor behaviors in a systematic way. With a view to a self-determined exploration of goal-free behaviors, our framework enables switching between different motion patterns in an autonomous and sequential fashion. Our algorithm is able to recover all the attractor behaviors in a toy system and it is also effective in two simulated environments. A spherical robot discovers all its major rolling modes and a hexapod robot learns to locomote in 50 different ways in 30min.

al

link (url) DOI Project Page [BibTex]

link (url) DOI Project Page [BibTex]


Thumb xl yanzhang clustering
Temporal Human Action Segmentation via Dynamic Clustering

Zhang, Y., Sun, H., Tang, S., Neumann, H.

arXiv preprint arXiv:1803.05790, 2018 (article)

Abstract
We present an effective dynamic clustering algorithm for the task of temporal human action segmentation, which has comprehensive applications such as robotics, motion analysis, and patient monitoring. Our proposed algorithm is unsupervised, fast, generic to process various types of features, and applica- ble in both the online and offline settings. We perform extensive experiments of processing data streams, and show that our algorithm achieves the state-of- the-art results for both online and offline settings.

ps

link (url) [BibTex]

link (url) [BibTex]


Thumb xl motion segmentation tracking clustering teaser
Motion Segmentation & Multiple Object Tracking by Correlation Co-Clustering

Keuper, M., Tang, S., Andres, B., Brox, T., Schiele, B.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018 (article)

ps

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Thumb xl smalrteaser
Lions and Tigers and Bears: Capturing Non-Rigid, 3D, Articulated Shape from Images

Zuffi, S., Kanazawa, A., Black, M. J.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Animals are widespread in nature and the analysis of their shape and motion is important in many fields and industries. Modeling 3D animal shape, however, is difficult because the 3D scanning methods used to capture human shape are not applicable to wild animals or natural settings. Consequently, we propose a method to capture the detailed 3D shape of animals from images alone. The articulated and deformable nature of animals makes this problem extremely challenging, particularly in unconstrained environments with moving and uncalibrated cameras. To make this possible, we use a strong prior model of articulated animal shape that we fit to the image data. We then deform the animal shape in a canonical reference pose such that it matches image evidence when articulated and projected into multiple images. Our method extracts significantly more 3D shape detail than previous methods and is able to model new species, including the shape of an extinct animal, using only a few video frames. Additionally, the projected 3D shapes are accurate enough to facilitate the extraction of a realistic texture map from multiple frames.

ps

pdf code/data 3D models Project Page [BibTex]

pdf code/data 3D models Project Page [BibTex]


Thumb xl featured pic
Learning equations for extrapolation and control

Sahoo, S. S., Lampert, C. H., Martius, G.

In Proc. 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 2018, 80, pages: 4442-4450, http://proceedings.mlr.press/v80/sahoo18a/sahoo18a.pdf, (Editors: Dy, Jennifer and Krause, Andreas), PMLR, 2018 (inproceedings)

Abstract
We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.

al

Code Arxiv Poster Slides link (url) Project Page [BibTex]

Code Arxiv Poster Slides link (url) Project Page [BibTex]


Thumb xl sab
Robust Affordable 3D Haptic Sensation via Learning Deformation Patterns

Sun, H., Martius, G.

Proceedings International Conference on Humanoid Robots, pages: 846-853, IEEE, New York, NY, USA, 2018 IEEE-RAS International Conference on Humanoid Robots, 2018, Oral Presentation (conference)

Abstract
Haptic sensation is an important modality for interacting with the real world. This paper proposes a general framework of inferring haptic forces on the surface of a 3D structure from internal deformations using a small number of physical sensors instead of employing dense sensor arrays. Using machine learning techniques, we optimize the sensor number and their placement and are able to obtain high-precision force inference for a robotic limb using as few as 9 sensors. For the optimal and sparse placement of the measurement units (strain gauges), we employ data-driven methods based on data obtained by finite element simulation. We compare data-driven approaches with model-based methods relying on geometric distance and information criteria such as Entropy and Mutual Information. We validate our approach on a modified limb of the “Poppy” robot [1] and obtain 8 mm localization precision.

al

DOI Project Page [BibTex]

DOI Project Page [BibTex]

2015


Thumb xl zhou
Exploiting Object Similarity in 3D Reconstruction

Zhou, C., Güney, F., Wang, Y., Geiger, A.

In International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
Despite recent progress, reconstructing outdoor scenes in 3D from movable platforms remains a highly difficult endeavor. Challenges include low frame rates, occlusions, large distortions and difficult lighting conditions. In this paper, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by locating objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows us to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. We evaluate our approach with respect to LIDAR ground truth on a novel challenging suburban dataset and show its advantages over the state-of-the-art.

avg ps

pdf suppmat [BibTex]

2015


pdf suppmat [BibTex]


Thumb xl philip
FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation

Lenz, P., Geiger, A., Urtasun, R.

In International Conference on Computer Vision (ICCV), International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
One of the most popular approaches to multi-target tracking is tracking-by-detection. Current min-cost flow algorithms which solve the data association problem optimally have three main drawbacks: they are computationally expensive, they assume that the whole video is given as a batch, and they scale badly in memory and computation with the length of the video sequence. In this paper, we address each of these issues, resulting in a computationally and memory-bounded solution. First, we introduce a dynamic version of the successive shortest-path algorithm which solves the data association problem optimally while reusing computation, resulting in faster inference than standard solvers. Second, we address the optimal solution to the data association problem when dealing with an incoming stream of data (i.e., online setting). Finally, we present our main contribution which is an approximate online solution with bounded memory and computation which is capable of handling videos of arbitrary length while performing tracking in real time. We demonstrate the effectiveness of our algorithms on the KITTI and PETS2009 benchmarks and show state-of-the-art performance, while being significantly faster than existing solvers.

avg ps

pdf suppmat video project [BibTex]

pdf suppmat video project [BibTex]


Thumb xl grassmanteaser
Scalable Robust Principal Component Analysis using Grassmann Averages

Hauberg, S., Feragen, A., Enficiaud, R., Black, M.

IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), December 2015 (article)

Abstract
In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.

ps sf

preprint pdf from publisher supplemental Project Page [BibTex]

preprint pdf from publisher supplemental Project Page [BibTex]


Thumb xl intrinsicdepth teaser1
Intrinsic Depth: Improving Depth Transfer with Intrinsic Images

Kong, N., Black, M. J.

In IEEE International Conference on Computer Vision (ICCV), pages: 3514-3522, International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
We formulate the estimation of dense depth maps from video sequences as a problem of intrinsic image estimation. Our approach synergistically integrates the estimation of multiple intrinsic images including depth, albedo, shading, optical flow, and surface contours. We build upon an example-based framework for depth estimation that uses label transfer from a database of RGB and depth pairs. We combine this with a method that extracts consistent albedo and shading from video. In contrast to raw RGB values, albedo and shading provide a richer, more physical, foundation for depth transfer. Additionally we train a new contour detector to predict surface boundaries from albedo, shading, and pixel values and use this to improve the estimation of depth boundaries. We also integrate sparse structure from motion with our method to improve the metric accuracy of the estimated depth maps. We evaluate our Intrinsic Depth method quantitatively by estimating depth from videos in the NYU RGB-D and SUN3D datasets. We find that combining the estimation of multiple intrinsic images improves depth estimation relative to the baseline method.

ps

pdf suppmat YouTube official video poster Project Page Project Page [BibTex]

pdf suppmat YouTube official video poster Project Page Project Page [BibTex]


Thumb xl bogo iccv2015 teaser
Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences

Bogo, F., Black, M. J., Loper, M., Romero, J.

In International Conference on Computer Vision (ICCV), pages: 2300-2308, December 2015 (inproceedings)

Abstract
We accurately estimate the 3D geometry and appearance of the human body from a monocular RGB-D sequence of a user moving freely in front of the sensor. Range data in each frame is first brought into alignment with a multi-resolution 3D body model in a coarse-to-fine process. The method then uses geometry and image texture over time to obtain accurate shape, pose, and appearance information despite unconstrained motion, partial views, varying resolution, occlusion, and soft tissue deformation. Our novel body model has variable shape detail, allowing it to capture faces with a high-resolution deformable head model and body shape with lower-resolution. Finally we combine range data from an entire sequence to estimate a high-resolution displacement map that captures fine shape details. We compare our recovered models with high-resolution scans from a professional system and with avatars created by a commercial product. We extract accurate 3D avatars from challenging motion sequences and even capture soft tissue dynamics.

ps

Video pdf Project Page Project Page [BibTex]

Video pdf Project Page Project Page [BibTex]


Thumb xl thumb3
3D Object Reconstruction from Hand-Object Interactions

Tzionas, D., Gall, J.

In International Conference on Computer Vision (ICCV), pages: 729-737, International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
Recent advances have enabled 3d object reconstruction approaches using a single off-the-shelf RGB-D camera. Although these approaches are successful for a wide range of object classes, they rely on stable and distinctive geometric or texture features. Many objects like mechanical parts, toys, household or decorative articles, however, are textureless and characterized by minimalistic shapes that are simple and symmetric. Existing in-hand scanning systems and 3d reconstruction techniques fail for such symmetric objects in the absence of highly distinctive features. In this work, we show that extracting 3d hand motion for in-hand scanning effectively facilitates the reconstruction of even featureless and highly symmetric objects and we present an approach that fuses the rich additional information of hands into a 3d reconstruction pipeline, significantly contributing to the state-of-the-art of in-hand scanning.

ps

pdf Project's Website Video Spotlight Extended Abstract YouTube DOI Project Page [BibTex]

pdf Project's Website Video Spotlight Extended Abstract YouTube DOI Project Page [BibTex]


Thumb xl splitbodieswebteaser2
SMPL: A Skinned Multi-Person Linear Model

Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M. J.

ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1-248:16, ACM, New York, NY, October 2015 (article)

Abstract
We present a learned model of human body shape and pose-dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex-based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. We quantitatively evaluate variants of SMPL using linear or dual-quaternion blend skinning and show that both are more accurate than a Blend-SCAPE model trained on the same data. We also extend SMPL to realistically model dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines and we make it available for research purposes.

ps

pdf video code/model errata DOI Project Page Project Page [BibTex]

pdf video code/model errata DOI Project Page Project Page [BibTex]


Thumb xl thumb
Proceedings of the 37th German Conference on Pattern Recognition

Gall, J., Gehler, P., Leibe, B.

Springer, German Conference on Pattern Recognition, October 2015 (proceedings)

ps

GCPR conference website [BibTex]

GCPR conference website [BibTex]


Thumb xl teaser
Towards Probabilistic Volumetric Reconstruction using Ray Potentials

(Best Paper Award)

Ulusoy, A. O., Geiger, A., Black, M. J.

In 3D Vision (3DV), 2015 3rd International Conference on, pages: 10-18, Lyon, October 2015 (inproceedings)

Abstract
This paper presents a novel probabilistic foundation for volumetric 3-d reconstruction. We formulate the problem as inference in a Markov random field, which accurately captures the dependencies between the occupancy and appearance of each voxel, given all input images. Our main contribution is an approximate highly parallelized discrete-continuous inference algorithm to compute the marginal distributions of each voxel's occupancy and appearance. In contrast to the MAP solution, marginals encode the underlying uncertainty and ambiguity in the reconstruction. Moreover, the proposed algorithm allows for a Bayes optimal prediction with respect to a natural reconstruction loss. We compare our method to two state-of-the-art volumetric reconstruction algorithms on three challenging aerial datasets with LIDAR ground truth. Our experiments demonstrate that the proposed algorithm compares favorably in terms of reconstruction accuracy and the ability to expose reconstruction uncertainty.

avg ps

code YouTube pdf suppmat DOI Project Page [BibTex]

code YouTube pdf suppmat DOI Project Page [BibTex]


Thumb xl sap2015
Perception of Strength and Power of Realistic Male Characters

Wellerdiek, A. C., Breidt, M., Geuss, M. N., Streuber, S., Kloos, U., Black, M. J., Mohler, B. J.

In Proc. ACM SIGGRAPH Symposium on Applied Perception, SAP’15, pages: 7-14, ACM, New York, NY, September 2015 (inproceedings)

Abstract
We investigated the influence of body shape and pose on the perception of physical strength and social power for male virtual characters. In the first experiment, participants judged the physical strength of varying body shapes, derived from a statistical 3D body model. Based on these ratings, we determined three body shapes (weak, average, and strong) and animated them with a set of power poses for the second experiment. Participants rated how strong or powerful they perceived virtual characters of varying body shapes that were displayed in different poses. Our results show that perception of physical strength was mainly driven by the shape of the body. However, the social attribute of power was influenced by an interaction between pose and shape. Specifically, the effect of pose on power ratings was greater for weak body shapes. These results demonstrate that a character with a weak shape can be perceived as more powerful when in a high-power pose.

ps

PDF DOI Project Page [BibTex]

PDF DOI Project Page [BibTex]


Thumb xl dynateaser
Dyna: A Model of Dynamic Human Shape in Motion

Pons-Moll, G., Romero, J., Mahmood, N., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 34(4):120:1-120:14, ACM, August 2015 (article)

Abstract
To look human, digital full-body avatars need to have soft tissue deformations like those of real people. We learn a model of soft-tissue deformations from examples using a high-resolution 4D capture system and a method that accurately registers a template mesh to sequences of 3D scans. Using over 40,000 scans of ten subjects, we learn how soft tissue motion causes mesh triangles to deform relative to a base 3D body model. Our Dyna model uses a low-dimensional linear subspace to approximate soft-tissue deformation and relates the subspace coefficients to the changing pose of the body. Dyna uses a second-order auto-regressive model that predicts soft-tissue deformations based on previous deformations, the velocity and acceleration of the body, and the angular velocities and accelerations of the limbs. Dyna also models how deformations vary with a person’s body mass index (BMI), producing different deformations for people with different shapes. Dyna realistically represents the dynamics of soft tissue for previously unseen subjects and motions. We provide tools for animators to modify the deformations and apply them to new stylized characters.

ps

pdf preprint video data DOI Project Page Project Page [BibTex]

pdf preprint video data DOI Project Page Project Page [BibTex]


Thumb xl invgraphicsdemo
The Informed Sampler: A Discriminative Approach to Bayesian Inference in Generative Computer Vision Models

Jampani, V., Nowozin, S., Loper, M., Gehler, P. V.

In Special Issue on Generative Models in Computer Vision and Medical Imaging, 136, pages: 32-44, Elsevier, July 2015 (inproceedings)

Abstract
Computer vision is hard because of a large variability in lighting, shape, and texture; in addition the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could then, in principle, explain the observation. While intuitively appealing, generative models for computer vision have largely failed to deliver on that promise due to the difficulty of posterior inference. As a result the community has favored efficient discriminative approaches. We still believe in the usefulness of generative models in computer vision, but argue that we need to leverage existing discriminative or even heuristic computer vision methods. We implement this idea in a principled way in our informed sampler and in careful experiments demonstrate it on challenging models which contain renderer programs as their components. The informed sampler, using simple discriminative proposals based on existing computer vision technology achieves dramatic improvements in inference. Our approach enables a new richness in generative models that was out of reach with existing inference technology.

ps

arXiv-preprint pdf DOI Project Page [BibTex]

arXiv-preprint pdf DOI Project Page [BibTex]


Thumb xl objs2acts
Linking Objects to Actions: Encoding of Target Object and Grasping Strategy in Primate Ventral Premotor Cortex

Vargas-Irwin, C. E., Franquemont, L., Black, M. J., Donoghue, J. P.

Journal of Neuroscience, 35(30):10888-10897, July 2015 (article)

Abstract
Neural activity in ventral premotor cortex (PMv) has been associated with the process of matching perceived objects with the motor commands needed to grasp them. It remains unclear how PMv networks can flexibly link percepts of objects affording multiple grasp options into a final desired hand action. Here, we use a relational encoding approach to track the functional state of PMv neuronal ensembles in macaque monkeys through the process of passive viewing, grip planning, and grasping movement execution. We used objects affording multiple possible grip strategies. The task included separate instructed delay periods for object presentation and grip instruction. This approach allowed us to distinguish responses elicited by the visual presentation of the objects from those associated with selecting a given motor plan for grasping. We show that PMv continuously incorporates information related to object shape and grip strategy as it becomes available, revealing a transition from a set of ensemble states initially most closely related to objects, to a new set of ensemble patterns reflecting unique object-grip combinations. These results suggest that PMv dynamically combines percepts, gradually navigating toward activity patterns associated with specific volitional actions, rather than directly mapping perceptual object properties onto categorical grip representations. Our results support the idea that PMv is part of a network that dynamically computes motor plans from perceptual information. Significance Statement: The present work demonstrates that the activity of groups of neurons in primate ventral premotor cortex reflects information related to visually presented objects, as well as the motor strategy used to grasp them, linking individual objects to multiple possible grips. PMv could provide useful control signals for neuroprosthetic assistive devices designed to interact with objects in a flexible way.

ps

publisher link DOI Project Page [BibTex]

publisher link DOI Project Page [BibTex]


Thumb xl silviateaser
The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose

Zuffi, S., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 3537-3546, June 2015 (inproceedings)

Abstract
We propose a new 3D model of the human body that is both realistic and part-based. The body is represented by a graphical model in which nodes of the graph correspond to body parts that can independently translate and rotate in 3D as well as deform to capture pose-dependent shape variations. Pairwise potentials define a “stitching cost” for pulling the limbs apart, giving rise to the stitched puppet model (SPM). Unlike existing realistic 3D body models, the distributed representation facilitates inference by allowing the model to more effectively explore the space of poses, much like existing 2D pictorial structures models. We infer pose and body shape using a form of particle-based max-product belief propagation. This gives the SPM the realism of recent 3D body models with the computational advantages of part-based models. We apply the SPM to two challenging problems involving estimating human shape and pose from 3D data. The first is the FAUST mesh alignment challenge (http://faust.is.tue.mpg.de/), where ours is the first method to successfully align all 3D meshes. The second involves estimating pose and shape from crude visual hull representations of complex body movements.

ps

pdf Extended Abstract poster code/project video DOI Project Page [BibTex]

pdf Extended Abstract poster code/project video DOI Project Page [BibTex]


Thumb xl img displet
Displets: Resolving Stereo Ambiguities using Object Knowledge

Güney, F., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 4165-4175, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
Stereo techniques have witnessed tremendous progress over the last decades, yet some aspects of the problem still remain challenging today. Striking examples are reflecting and textureless surfaces which cannot easily be recovered using traditional local regularizers. In this paper, we therefore propose to regularize over larger distances using object-category specific disparity proposals (displets) which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The proposed displets encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel based CRF framework and demonstrate its benefits on the KITTI stereo evaluation.

avg ps

pdf abstract suppmat [BibTex]

pdf abstract suppmat [BibTex]


Thumb xl img sceneflow
Object Scene Flow for Autonomous Vehicles

Menze, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 3061-3070, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which can't be handled by existing methods.

avg ps

pdf abstract suppmat DOI [BibTex]

pdf abstract suppmat DOI [BibTex]


Thumb xl ijazteaser
Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction

Akhter, I., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 1446-1455, June 2015 (inproceedings)

Abstract
The estimation of 3D human pose from 2D joint locations is central to many vision problems involving the analysis of people in images and video. To address the fact that the problem is inherently ill posed, many methods impose a prior over human poses. Unfortunately these priors admit invalid poses because they do not model how joint-limits vary with pose. Here we make two key contributions. First, we collected a motion capture dataset that explores a wide range of human poses. From this we learn a pose-dependent model of joint limits that forms our prior. The dataset and the prior will be made publicly available. Second, we define a general parameterization of body pose and a new, multistage, method to estimate 3D pose from 2D joint locations that uses an over-complete dictionary of human poses. Our method shows good generalization while avoiding impossible poses. We quantitatively compare our method with recent work and show state-of-the-art results on 2D to 3D pose estimation using the CMU mocap dataset. We also show superior results on manual annotations on real images and automatic part-based detections on the Leeds sports pose dataset.

ps

pdf Extended Abstract video project/data/code poster DOI Project Page Project Page [BibTex]

pdf Extended Abstract video project/data/code poster DOI Project Page Project Page [BibTex]


Thumb xl jonasteaser
Efficient Sparse-to-Dense Optical Flow Estimation using a Learned Basis and Layers

Wulff, J., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 120-130, June 2015 (inproceedings)

Abstract
We address the elusive goal of estimating optical flow both accurately and efficiently by adopting a sparse-to-dense approach. Given a set of sparse matches, we regress to dense optical flow using a learned set of full-frame basis flow fields. We learn the principal components of natural flow fields using flow computed from four Hollywood movies. Optical flow fields are then compactly approximated as a weighted sum of the basis flow fields. Our new PCA-Flow algorithm robustly estimates these weights from sparse feature matches. The method runs in under 300ms/frame on the MPI-Sintel dataset using a single CPU and is more accurate and significantly faster than popular methods such as LDOF and Classic+NL. The results, however, are too smooth for some applications. Consequently, we develop a novel sparse layered flow method in which each layer is represented by PCA-flow. Unlike existing layered methods, estimation is fast because it uses only sparse matches. We combine information from different layers into a dense flow field using an image-aware MRF. The resulting PCA-Layers method runs in 3.6s/frame, is significantly more accurate than PCA-flow and achieves state-of-the-art performance in occluded regions on MPI-Sintel.

ps

pdf Extended Abstract Supplemental Material Poster Code Project Page Project Page [BibTex]