Publications

Perceiving Systems Conference Paper 3D2PM – 3D Deformable Part Models Pepik, B., Gehler, P., Stark, M., Schiele, B. In Proceedings of the European Conference on Computer Vision (ECCV), 356-370, Lecture Notes in Computer Science, (Editors: Fitzgibbon, Andrew W. and Lazebnik, Svetlana and Perona, Pietro and Sato, Yoichi and Schmid, Cordelia), Springer, Firenze, October 2012 pdf video poster BibTeX

Perceiving Systems Conference Paper A naturalistic open source movie for optical flow evaluation Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J. In European Conf. on Computer Vision (ECCV), 611-625, Part IV, LNCS 7577, (Editors: A. Fitzgibbon et al.), Springer-Verlag, October 2012
Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.
pdf dataset youtube talk supplemental material BibTeX
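
The evaluation described above ranks methods by flow accuracy; the standard measure on such benchmarks is average endpoint error (EPE). A minimal sketch with an invented function name and toy arrays (the dataset's own evaluation code lives on the linked website):

```python
import numpy as np

def average_epe(flow_est, flow_gt):
    """Average endpoint error between two flow fields of shape (H, W, 2)."""
    diff = flow_est - flow_gt
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()

# Toy check: an estimate off by (1, 0) at every pixel has an EPE of 1.0.
gt = np.zeros((4, 4, 2))
est = gt + np.array([1.0, 0.0])
print(average_epe(est, gt))  # 1.0
```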

Perceiving Systems Conference Paper Coregistration: Simultaneous alignment and modeling of articulated 3D shape Hirshberg, D., Loper, M., Rachlin, E., Black, M. In European Conf. on Computer Vision (ECCV), 242-255, LNCS 7577, Part IV, (Editors: A. Fitzgibbon et al.), Springer-Verlag, October 2012
Three-dimensional (3D) shape models are powerful because they enable the inference of object shape from incomplete, noisy, or ambiguous 2D or 3D data. For example, realistic parameterized 3D human body models have been used to infer the shape and pose of people from images. To train such models, a corpus of 3D body scans is typically brought into registration by aligning a common 3D human-shaped template to each scan. This is an ill-posed problem that typically involves solving an optimization problem with regularization terms that penalize implausible deformations of the template. When aligning a corpus, however, we can do better than generic regularization. If we have a model of how the template can deform then alignments can be regularized by this model. Constructing a model of deformations, however, requires having a corpus that is already registered. We address this chicken-and-egg problem by approaching modeling and registration together. By minimizing a single objective function, we reliably obtain high quality registration of noisy, incomplete, laser scans, while simultaneously learning a highly realistic articulated body model. The model greatly improves robustness to noise and missing data. Since the model explains a corpus of body scans, it captures how body shape varies across people and poses.
pdf publisher site poster supplemental material (400MB) DOI BibTeX
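
The chicken-and-egg interplay between registration and model learning can be mimicked in one dimension by alternating between aligning toy "scans" and refitting the shared template (all data invented; the paper instead minimizes a single joint objective over articulated 3D scans):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "scans": one shared 1-D template, shifted per scan, plus noise.
true_template = np.array([0.0, 1.0, 4.0, 9.0])
shifts = rng.normal(0.0, 2.0, size=5)
scans = true_template[None, :] + shifts[:, None] + rng.normal(0.0, 0.01, (5, 4))

# Alternate between (a) aligning each scan to the current template and
# (b) refitting the template from the aligned scans.
template = scans[0].copy()
for _ in range(20):
    est_shifts = (scans - template[None, :]).mean(axis=1)  # best shift per scan
    template = (scans - est_shifts[:, None]).mean(axis=0)  # refit the template
    template -= template.mean()            # remove the global-shift ambiguity

print(np.round(template, 2))  # recovers the (centered) template
```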

Perceiving Systems Article Coupled Action Recognition and Pose Estimation from Multiple Views Yao, A., Gall, J., van Gool, L. International Journal of Computer Vision, 100(1):16-37, October 2012 publisher's site code pdf BibTeX

Perceiving Systems Conference Paper Lessons and insights from creating a synthetic optical flow benchmark Wulff, J., Butler, D. J., Stanley, G. B., Black, M. J. In ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation, 168-177, Part II, LNCS 7584, (Editors: A. Fusiello et al.), Springer-Verlag, October 2012 pdf dataset poster youtube BibTeX

Perceiving Systems Conference Paper Lie Bodies: A Manifold Representation of 3D Human Shape Freifeld, O., Black, M. J. In European Conf. on Computer Vision (ECCV), 1-14, Part I, LNCS 7572, (Editors: A. Fitzgibbon et al.), Springer-Verlag, October 2012
Three-dimensional object shape is commonly represented in terms of deformations of a triangular mesh from an exemplar shape. Existing models, however, are based on a Euclidean representation of shape deformations. In contrast, we argue that shape has a manifold structure: For example, summing the shape deformations for two people does not necessarily yield a deformation corresponding to a valid human shape, nor does the Euclidean difference of these two deformations provide a meaningful measure of shape dissimilarity. Consequently, we define a novel manifold for shape representation, with emphasis on body shapes, using a new Lie group of deformations. This has several advantages. First we define triangle deformations exactly, removing non-physical deformations and redundant degrees of freedom common to previous methods. Second, the Riemannian structure of Lie Bodies enables a more meaningful definition of body shape similarity by measuring distance between bodies on the manifold of body shape deformations. Third, the group structure allows the valid composition of deformations. This is important for models that factor body shape deformations into multiple causes or represent shape as a linear combination of basis shapes. Finally, body shape variation is modeled using statistics on manifolds. Instead of modeling Euclidean shape variation with Principal Component Analysis we capture shape variation on the manifold using Principal Geodesic Analysis. Our experiments show consistent visual and quantitative advantages of Lie Bodies over traditional Euclidean models of shape deformation and our representation can be easily incorporated into existing methods.
pdf supplemental material youtube poster eigenshape video code BibTeX
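
The abstract's central claim — that linear averaging can leave the space of valid deformations while group composition cannot — shows up already with 2-D rotations; a toy illustration (not the paper's actual triangle-deformation group):

```python
import numpy as np

def rot(theta):
    """2-D rotation matrix, a toy stand-in for a valid shape deformation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

A, B = rot(0.4), rot(1.1)

# Group composition stays on the manifold: the product is again a rotation.
C = A @ B
print(np.allclose(C, rot(1.5)))  # True

# The Euclidean average of two rotations is NOT a rotation: its determinant
# drops below 1, so linear shape statistics can leave the space of valid
# deformations.
M = 0.5 * (A + B)
print(np.linalg.det(M))  # cos^2(0.35), about 0.88 < 1
```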

Perceiving Systems Conference Paper A framework for relating neural activity to freely moving behavior Foster, J. D., Nuyujukian, P., Freifeld, O., Ryu, S., Black, M. J., Shenoy, K. V. In 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), 2736-2739, IEEE, San Diego, August 2012 pdf BibTeX

Perceiving Systems Conference Paper Pottics – The Potts Topic Model for Semantic Image Segmentation Dann, C., Gehler, P., Roth, S., Nowozin, S. In Proceedings of 34th DAGM Symposium, 397-407, Lecture Notes in Computer Science, (Editors: Pinz, Axel and Pock, Thomas and Bischof, Horst and Leberl, Franz), Springer, August 2012 code pdf poster BibTeX

Perceiving Systems Empirical Inference Probabilistic Numerics Conference Paper Quasi-Newton Methods: A New Direction Hennig, P., Kiefel, M. In Proceedings of the 29th International Conference on Machine Learning, 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, July 2012
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.
website+code pdf URL BibTeX
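
The abstract's reading of quasi-Newton methods as fitting a local quadratic can be seen in the classical BFGS update, whose secant condition forces the new curvature estimate to agree with the observed gradient change (a standard textbook form, not the paper's nonparametric extension):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Classical BFGS update of a Hessian estimate B from step s and gradient
    change y; afterwards the secant condition B_new @ s = y holds, i.e. the
    local quadratic model is consistent with the observed gradient change."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# For a quadratic f(x) = 0.5 x^T H x, the gradient change along any step s
# is exactly y = H s, so the update "learns" curvature along s.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)
s = np.array([1.0, -0.5])
y = H @ s
B = bfgs_update(B, s, y)
print(B @ s, y)  # equal: the secant condition holds
```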

Perceiving Systems Article DRAPE: DRessing Any PErson Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M. J. ACM Trans. on Graphics (Proc. SIGGRAPH), 31(4):35:1-35:10, July 2012
We describe a complete system for animating realistic clothing on synthetic bodies of any shape and pose without manual intervention. The key component of the method is a model of clothing called DRAPE (DRessing Any PErson) that is learned from a physics-based simulation of clothing on bodies of different shapes and poses. The DRAPE model has the desirable property of "factoring" clothing deformations due to body shape from those due to pose variation. This factorization provides an approximation to the physical clothing deformation and greatly simplifies clothing synthesis. Given a parameterized model of the human body with known shape and pose parameters, we describe an algorithm that dresses the body with a garment that is customized to fit and possesses realistic wrinkles. DRAPE can be used to dress static bodies or animated sequences with a learned model of the cloth dynamics. Since the method is fully automated, it is appropriate for dressing large numbers of virtual characters of varying shape. The method is significantly more efficient than physical simulation.
YouTube pdf talk BibTeX
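
The "factoring" of clothing deformation described above can be sketched as an additive linear model with separate shape and pose bases (bases, sizes, and names here are invented for illustration; DRAPE learns its factored model from physics simulations):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy factored garment model: vertex offsets decompose additively into a
# shape-driven term and a pose-driven term.
n_verts, n_shape, n_pose = 100, 4, 6
mean_garment = rng.normal(size=3 * n_verts)
W_shape = rng.normal(size=(3 * n_verts, n_shape))
W_pose = rng.normal(size=(3 * n_verts, n_pose))

def drape(beta, theta):
    """Garment vertices for body-shape coefficients beta and pose theta."""
    return (mean_garment + W_shape @ beta + W_pose @ theta).reshape(n_verts, 3)

verts = drape(np.ones(n_shape), np.ones(n_pose))
print(verts.shape)  # (100, 3)
```

Because the two terms are additive, shape-driven wrinkle patterns can be reused across poses, which is the property that makes synthesis cheap compared to re-running a physical simulation.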

Perceiving Systems Conference Paper From pictorial structures to deformable structures Zuffi, S., Freifeld, O., Black, M. J. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 3546-3553, IEEE, June 2012
Pictorial Structures (PS) define a probabilistic model of 2D articulated objects in images. Typical PS models assume an object can be represented by a set of rigid parts connected with pairwise constraints that define the prior probability of part configurations. These models are widely used to represent non-rigid articulated objects such as humans and animals despite the fact that such objects have parts that deform non-rigidly. Here we define a new Deformable Structures (DS) model that is a natural extension of previous PS models and that captures the non-rigid shape deformation of the parts. Each part in a DS model is represented by a low-dimensional shape deformation space and pairwise potentials between parts capture how the shape varies with pose and the shape of neighboring parts. A key advantage of such a model is that it more accurately models object boundaries. This enables image likelihood models that are more discriminative than previous PS likelihoods. This likelihood is learned using training imagery annotated using a DS “puppet.” We focus on a human DS model learned from 2D projections of a realistic 3D human body model and use it to infer human poses in images using a form of non-parametric belief propagation.
pdf sup mat code poster BibTeX
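
The pictorial-structures machinery that DS extends can be sketched as a score combining unary appearance terms with pairwise "spring" penalties on part offsets (all names and values invented; the paper's contribution is replacing rigid parts with low-dimensional shape-deformation spaces):

```python
import numpy as np

def ps_score(positions, unary, pairs, expected_offsets, stiffness=1.0):
    """Toy pictorial-structures score: sum of unary part scores minus a
    quadratic penalty for each pair deviating from its expected offset."""
    score = sum(u(p) for u, p in zip(unary, positions))
    for (i, j), off in zip(pairs, expected_offsets):
        d = positions[j] - positions[i] - off
        score -= stiffness * float(d @ d)
    return score

# Two parts expected to sit one unit apart horizontally.
unary = [lambda p: 0.0, lambda p: 0.0]
pairs = [(0, 1)]
offsets = [np.array([1.0, 0.0])]
good = ps_score([np.zeros(2), np.array([1.0, 0.0])], unary, pairs, offsets)
bad = ps_score([np.zeros(2), np.array([2.0, 0.0])], unary, pairs, offsets)
print(good, bad)  # the well-articulated configuration scores higher
```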

Perceiving Systems Conference Paper Teaching 3D Geometry to Deformable Part Models Pepik, B., Stark, M., Gehler, P., Schiele, B. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3362-3369, IEEE, Providence, RI, USA, June 2012, oral presentation pdf DOI BibTeX

Perceiving Systems Article Visual Orientation and Directional Selectivity Through Thalamic Synchrony Stanley, G., Jin, J., Wang, Y., Desbordes, G., Wang, Q., Black, M., Alonso, J. Journal of Neuroscience, 32(26):9073-9088, June 2012
Thalamic neurons respond to visual scenes by generating synchronous spike trains on the timescale of 10–20 ms that are very effective at driving cortical targets. Here we demonstrate that this synchronous activity contains unexpectedly rich information about fundamental properties of visual stimuli. We report that the occurrence of synchronous firing of cat thalamic cells with highly overlapping receptive fields is strongly sensitive to the orientation and the direction of motion of the visual stimulus. We show that this stimulus selectivity is robust, remaining relatively unchanged under different contrasts and temporal frequencies (stimulus velocities). A computational analysis based on an integrate-and-fire model of the direct thalamic input to a layer 4 cortical cell reveals a strong correlation between the degree of thalamic synchrony and the nonlinear relationship between cortical membrane potential and the resultant firing rate. Together, these findings suggest a novel population code in the synchronous firing of neurons in the early visual pathway that could serve as the substrate for establishing cortical representations of the visual scene.
preprint BibTeX
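
Why synchronous thalamic spikes drive cortical targets so effectively can be seen with a minimal leaky integrate-and-fire neuron: the same total input charge crosses threshold when bunched in time but not when spread out (all parameters illustrative, not from the paper's model):

```python
import numpy as np

def lif(input_current, dt=1e-3, tau=0.02, v_thresh=0.3, v_reset=0.0):
    """Minimal leaky integrate-and-fire neuron; returns the spike count."""
    v, spikes = 0.0, 0
    for i_t in input_current:
        v += dt * (-v / tau + i_t)   # leaky integration of the input current
        if v >= v_thresh:
            spikes += 1
            v = v_reset
    return spikes

# Same total charge, delivered as synchronous bursts vs. spread evenly.
sync = np.zeros(1000); sync[::100] = 500.0
spread = np.full(1000, 5.0)
print(lif(sync), lif(spread))  # the bursts spike; the steady drive never does
```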

Perceiving Systems Conference Paper A Geometric Take on Metric Learning Hauberg, S., Freifeld, O., Black, M. J. In Advances in Neural Information Processing Systems (NIPS) 25, 2033-2041, (Editors: P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger), MIT Press, 2012
Multi-metric learning techniques learn local metric tensors in different parts of a feature space. With such an approach, even simple classifiers can be competitive with the state-of-the-art because the distance measure locally adapts to the structure of the data. The learned distance measure is, however, non-metric, which has prevented multi-metric learning from generalizing to tasks such as dimensionality reduction and regression in a principled way. We prove that, with appropriate changes, multi-metric learning corresponds to learning the structure of a Riemannian manifold. We then show that this structure gives us a principled way to perform dimensionality reduction and regression according to the learned metrics. Algorithmically, we provide the first practical algorithm for computing geodesics according to the learned metrics, as well as algorithms for computing exponential and logarithmic maps on the Riemannian manifold. Together, these tools let many Euclidean algorithms take advantage of multi-metric learning. We illustrate the approach on regression and dimensionality reduction tasks that involve predicting measurements of the human body from shape data.
PDF Youtube Suppl. material Poster BibTeX
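
The idea of measuring length under locally varying metric tensors can be sketched by integrating a polyline's length through a position-dependent metric field (the field here is invented for illustration; the paper learns such tensors from data and computes true geodesics):

```python
import numpy as np

def metric(x):
    """Toy position-dependent metric tensor: the band |y| < 0.5 is expensive
    to cross; elsewhere the metric is Euclidean."""
    return np.diag([10.0, 10.0]) if abs(x[1]) < 0.5 else np.eye(2)

def path_length(points):
    """Riemannian length of a polyline under the local metric field."""
    total = 0.0
    for a, b in zip(points[:-1], points[1:]):
        d = b - a
        total += np.sqrt(d @ metric((a + b) / 2) @ d)
    return total

p, q = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
straight = np.linspace(p, q, 50)                       # cuts through the band
detour = np.array([p, [-1.0, 1.0], [1.0, 1.0], q])     # goes around it
print(path_length(straight), path_length(detour))
```

Under the learned metric the longer Euclidean path is the shorter Riemannian one, which is why geodesics rather than straight lines are the right notion of interpolation and distance here.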

Perceiving Systems Book Chapter An Introduction to Random Forests for Multi-class Object Detection Gall, J., Razavi, N., van Gool, L. In Outdoor and Large-Scale Real-World Scene Analysis, 7474:243-263, LNCS, (Editors: Dellaert, Frank and Frahm, Jan-Michael and Pollefeys, Marc and Rosenhahn, Bodo and Leal-Taixé, Laura), Springer, 2012 code for Hough forest publisher's site pdf BibTeX

Perceiving Systems Book Consumer Depth Cameras for Computer Vision - Research Topics and Applications Fossati, A., Gall, J., Grabner, H., Ren, X., Konolige, K. Advances in Computer Vision and Pattern Recognition, Springer, 2012 publisher's site BibTeX

Perceiving Systems Conference Paper Destination Flow for Crowd Simulation Pellegrini, S., Gall, J., Sigal, L., van Gool, L. In Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams, 7585:162-171, LNCS, Springer, 2012 pdf BibTeX

Perceiving Systems Conference Paper From Deformations to Parts: Motion-based Segmentation of 3D Objects Ghosh, S., Sudderth, E., Loper, M., Black, M. In Advances in Neural Information Processing Systems 25 (NIPS), 2006-2014, (Editors: P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger), MIT Press, 2012
We develop a method for discovering the parts of an articulated object from aligned meshes of the object in various three-dimensional poses. We adapt the distance dependent Chinese restaurant process (ddCRP) to allow nonparametric discovery of a potentially unbounded number of parts, while simultaneously guaranteeing a spatially connected segmentation. To allow analysis of datasets in which object instances have varying 3D shapes, we model part variability across poses via affine transformations. By placing a matrix normal-inverse-Wishart prior on these affine transformations, we develop a ddCRP Gibbs sampler which tractably marginalizes over transformation uncertainty. Analyzing a dataset of humans captured in dozens of poses, we infer parts which provide quantitatively better deformation predictions than conventional clustering methods.
pdf supplemental code poster URL BibTeX
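
The distance-dependent CRP at the heart of the method can be sketched directly: each point links to another with probability decaying in distance, and clusters are the connected components of the link graph (a generic one-dimensional sketch, not the paper's mesh model with its affine-transformation likelihood):

```python
import numpy as np

def ddcrp_sample(positions, alpha=1.0, scale=1.0, rng=None):
    """Draw one partition from a distance-dependent CRP over 1-D positions."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(positions)
    links = np.empty(n, dtype=int)
    for i in range(n):
        w = np.exp(-np.abs(positions - positions[i]) / scale)  # decay f(d)
        w[i] = alpha                      # self-link opens a new cluster
        links[i] = rng.choice(n, p=w / w.sum())
    # Union-find over the link graph gives the induced partition.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i in range(n):
        parent[find(i)] = find(links[i])
    return np.array([find(i) for i in range(n)])

pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
labels = ddcrp_sample(pts, scale=0.5)
print(labels)  # tightly spaced points tend to share a label
```

Because the number of self-links is not fixed in advance, the number of clusters — here, object parts — is itself inferred, which is the "potentially unbounded number of parts" property the abstract describes.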

Perceiving Systems Book Chapter Home 3D body scans from noisy image and range data Weiss, A., Hirshberg, D., Black, M. J. In Consumer Depth Cameras for Computer Vision: Research Topics and Applications, 99-118, Chapter 6, (Editors: Andrea Fossati and Juergen Gall and Helmut Grabner and Xiaofeng Ren and Kurt Konolige), Springer-Verlag, 2012 BibTeX

Perceiving Systems Conference Paper Interactive Object Detection Yao, A., Gall, J., Leistner, C., van Gool, L. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3242-3249, IEEE, Providence, RI, USA, 2012 video pdf BibTeX

Perceiving Systems Conference Paper Latent Hough Transform for Object Detection Razavi, N., Gall, J., Kohli, P., van Gool, L. In European Conference on Computer Vision (ECCV), 7574:312-325, LNCS, Springer, 2012 pdf BibTeX

Perceiving Systems Conference Paper Layered segmentation and optical flow estimation over time Sun, D., Sudderth, E., Black, M. J. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1768-1775, IEEE, 2012
Layered models provide a compelling approach for estimating image motion and segmenting moving scenes. Previous methods, however, have failed to capture the structure of complex scenes, provide precise object boundaries, effectively estimate the number of layers in a scene, or robustly determine the depth order of the layers. Furthermore, previous methods have focused on optical flow between pairs of frames rather than longer sequences. We show that image sequences with more frames are needed to resolve ambiguities in depth ordering at occlusion boundaries; temporal layer constancy makes this feasible. Our generative model of image sequences is rich but difficult to optimize with traditional gradient descent methods. We propose a novel discrete approximation of the continuous objective in terms of a sequence of depth-ordered MRFs and extend graph-cut optimization methods with new “moves” that make joint layer segmentation and motion estimation feasible. Our optimizer, which mixes discrete and continuous optimization, automatically determines the number of layers and reasons about their depth ordering. We demonstrate the value of layered models, our optimization strategy, and the use of more than two frames on both the Middlebury optical flow benchmark and the MIT layer segmentation benchmark.
pdf sup mat poster BibTeX
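
The depth-ordering idea above can be sketched as compositing: each pixel takes the motion of the front-most layer whose support mask covers it (a toy with constant per-layer flows; the paper estimates masks, flows, and the ordering jointly over image sequences):

```python
import numpy as np

def composite_flow(masks, flows):
    """masks: list of (H, W) bools, front to back; flows: per-layer 2-vectors."""
    h, w = masks[0].shape
    out = np.zeros((h, w, 2))
    assigned = np.zeros((h, w), dtype=bool)
    for mask, flow in zip(masks, flows):
        sel = mask & ~assigned        # visible portion of this layer
        out[sel] = flow
        assigned |= mask
    return out

front = np.zeros((2, 4), dtype=bool); front[:, :2] = True   # left half
back = np.ones((2, 4), dtype=bool)                          # everywhere
flow = composite_flow([front, back], [np.array([1.0, 0.0]), np.array([0.0, 1.0])])
print(flow[0, 0], flow[0, 3])  # front layer's motion on the left, back's on the right
```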

Perceiving Systems Conference Paper Local Context Priors for Object Proposal Generation Ristin, M., Gall, J., van Gool, L. In Asian Conference on Computer Vision (ACCV), 7724:57-70, LNCS, Springer-Verlag, 2012 pdf DOI BibTeX

Perceiving Systems Conference Paper Metric Learning from Poses for Temporal Clustering of Human Motion López-Méndez, A., Gall, J., Casas, J., van Gool, L. In British Machine Vision Conference (BMVC), 49.1-49.12, (Editors: Bowden, Richard and Collomosse, John and Mikolajczyk, Krystian), BMVA Press, 2012 video pdf BibTeX

Perceiving Systems Conference Paper Motion Capture of Hands in Action using Discriminative Salient Points Ballan, L., Taneja, A., Gall, J., van Gool, L., Pollefeys, M. In European Conference on Computer Vision (ECCV), 7577:640-653, LNCS, Springer, 2012 data video pdf supplementary BibTeX

Perceiving Systems Conference Paper Real-time Facial Feature Detection using Conditional Regression Forests Dantone, M., Gall, J., Fanelli, G., van Gool, L. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2578-2585, IEEE, Providence, RI, USA, 2012 code pdf BibTeX

Perceiving Systems Conference Paper Sparsity Potentials for Detecting Objects with the Hough Transform Razavi, N., Alvar, N., Gall, J., van Gool, L. In British Machine Vision Conference (BMVC), 11.1-11.10, (Editors: Bowden, Richard and Collomosse, John and Mikolajczyk, Krystian), BMVA Press, 2012 pdf BibTeX

Perceiving Systems Conference Paper Spatial Measures between Human Poses for Classification and Understanding Hauberg, S., Pedersen, K. S. In Articulated Motion and Deformable Objects, 7378:26-36, LNCS, (Editors: Perales, Francisco J. and Fisher, Robert B. and Moeslund, Thomas B.), Springer Berlin Heidelberg, 2012 Publishers site BibTeX