Header logo is


2018


Thumb xl imgidx 00326
Customized Multi-Person Tracker

Ma, L., Tang, S., Black, M. J., Gool, L. V.

In Computer Vision – ACCV 2018, Springer International Publishing, Asian Conference on Computer Vision, December 2018 (inproceedings)

ps

PDF Project Page [BibTex]

2018


PDF Project Page [BibTex]


Thumb xl sevillagcpr
On the Integration of Optical Flow and Action Recognition

Sevilla-Lara, L., Liao, Y., Güney, F., Jampani, V., Geiger, A., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 281-297, Springer, Cham, October 2018 (inproceedings)

Abstract
Most of the top performing action recognition methods use optical flow as a "black box" input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations to better understand how these affect a state-of-the-art action recognition method. Furthermore, we fine tune two neural-network flow methods end-to-end on the most widely used action recognition dataset (UCF101). Based on these experiments, we make the following five observations: 1) optical flow is useful for action recognition because it is invariant to appearance, 2) optical flow methods are optimized to minimize end-point-error (EPE), but the EPE of current methods is not well correlated with action recognition performance, 3) for the flow methods tested, accuracy at boundaries and at small displacements is most correlated with action recognition performance, 4) training optical flow to minimize classification error instead of minimizing EPE improves recognition performance, and 5) optical flow learned for the task of action recognition differs from traditional optical flow especially inside the human body and at the boundary of the body. These observations may encourage optical flow researchers to look beyond EPE as a goal and guide action recognition researchers to seek better motion cues, leading to a tighter integration of the optical flow and action recognition communities.

avg ps

arXiv DOI [BibTex]

arXiv DOI [BibTex]


Thumb xl iros18
Towards Robust Visual Odometry with a Multi-Camera System

Liu, P., Geppert, M., Heng, L., Sattler, T., Geiger, A., Pollefeys, M.

In International Conference on Intelligent Robots and Systems (IROS) 2018, International Conference on Intelligent Robots and Systems, October 2018 (inproceedings)

Abstract
We present a visual odometry (VO) algorithm for a multi-camera system and robust operation in challenging environments. Our algorithm consists of a pose tracker and a local mapper. The tracker estimates the current pose by minimizing photometric errors between the most recent keyframe and the current frame. The mapper initializes the depths of all sampled feature points using plane-sweeping stereo. To reduce pose drift, a sliding window optimizer is used to refine poses and structure jointly. Our formulation is flexible enough to support an arbitrary number of stereo cameras. We evaluate our algorithm thoroughly on five datasets. The datasets were captured in different conditions: daytime, night-time with near-infrared (NIR) illumination and night-time without NIR illumination. Experimental results show that a multi-camera setup makes the VO more robust to challenging environments, especially night-time conditions, in which a single stereo configuration fails easily due to the lack of features.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl interpolation
Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation

Wulff, J., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 567-582, Springer, Cham, October 2018 (inproceedings)

Abstract
The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic n losses. Proxy tasks can overcome these issues, and start by training a network for a task for which annotation is easier or which can be trained unsupervised. The trained network is then fine-tuned for the original task using small amounts of ground truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN unsupervised for temporal interpolation. Such a network implicitly estimates motion, but cannot handle untextured regions. By fi ne-tuning on small amounts of ground truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fi elds. Using this unsupervised pre-training, our network outperforms similar architectures that were trained supervised using synthetic optical flow.

ps

pdf arXiv DOI Project Page [BibTex]

pdf arXiv DOI Project Page [BibTex]


Thumb xl bmvc pic
Human Motion Parsing by Hierarchical Dynamic Clustering

Zhang, Y., Tang, S., Sun, H., Neumann, H.

In Proceedings of the British Machine Vision Conference (BMVC), pages: 269, BMVA Press, 29th British Machine Vision Conference, September 2018 (inproceedings)

Abstract
Parsing continuous human motion into meaningful segments plays an essential role in various applications. In this work, we propose a hierarchical dynamic clustering framework to derive action clusters from a sequence of local features in an unsuper- vised bottom-up manner. We systematically investigate the modules in this framework and particularly propose diverse temporal pooling schemes, in order to realize accurate temporal action localization. We demonstrate our method on two motion parsing tasks: temporal action segmentation and abnormal behavior detection. The experimental results indicate that the proposed framework is significantly more effective than the other related state-of-the-art methods on several datasets.

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl ianeccv18
Learning Priors for Semantic 3D Reconstruction

Cherabier, I., Schönberger, J., Oswald, M., Pollefeys, M., Geiger, A.

In Computer Vision – ECCV 2018, Springer International Publishing, Cham, September 2018 (inproceedings)

Abstract
We present a novel semantic 3D reconstruction framework which embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our network architecture requires only a moderate number of parameters while keeping a high level of expressiveness which enables learning from very little data. Experiments on real and synthetic datasets demonstrate that our network achieves higher accuracy compared to a purely variational approach while at the same time requiring two orders of magnitude less iterations to converge. Moreover, our approach handles ten times more semantic class labels using the same computational resources.

avg

pdf suppmat Project Page Video DOI Project Page [BibTex]

pdf suppmat Project Page Video DOI Project Page [BibTex]


Thumb xl coma faces
Generating 3D Faces using Convolutional Mesh Autoencoders

Ranjan, A., Bolkart, T., Sanyal, S., Black, M. J.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11207, pages: 725-741, Springer, Cham, September 2018 (inproceedings)

Abstract
Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they can not capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured over 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We also show that, replacing the expression space of an existing state-of-the-art face model with our autoencoder, achieves a lower reconstruction error. Our data, model and code are available at http://coma.is.tue.mpg.de/.

ps

Code (tensorflow) Code (pytorch) Project Page paper supplementary DOI Project Page Project Page [BibTex]

Code (tensorflow) Code (pytorch) Project Page paper supplementary DOI Project Page Project Page [BibTex]


Thumb xl person reid.001
Part-Aligned Bilinear Representations for Person Re-identification

Suh, Y., Wang, J., Tang, S., Mei, T., Lee, K. M.

In European Conference on Computer Vision (ECCV), 11218, pages: 418-437, Springer, Cham, September 2018 (inproceedings)

Abstract
Comparing the appearance of corresponding body parts is essential for person re-identification. However, body parts are frequently misaligned be- tween detected boxes, due to the detection errors and the pose/viewpoint changes. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which gen- erates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses two feature maps to an image descriptor. We show that it results in a compact descriptor, where the inner product between two image descriptors is equivalent to an aggregation of the local appearance similarities of the cor- responding body parts, and thereby significantly reduces the part misalignment problem. Our approach is advantageous over other pose-guided representations by learning part descriptors optimal for person re-identification. Training the net- work does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demon- strating its superiority over the state-of-the-art methods on the standard bench- mark datasets including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.

ps

pdf supplementary DOI Project Page [BibTex]

pdf supplementary DOI Project Page [BibTex]


Thumb xl persondetect  copy
Learning Human Optical Flow

Ranjan, A., Romero, J., Black, M. J.

In 29th British Machine Vision Conference, September 2018 (inproceedings)

Abstract
The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SpyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research.

ps

video code pdf link (url) Project Page Project Page [BibTex]

video code pdf link (url) Project Page Project Page [BibTex]


Thumb xl nbf
Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation

(Best Student Paper Award)

Omran, M., Lassner, C., Pons-Moll, G., Gehler, P. V., Schiele, B.

In 3DV, September 2018 (inproceedings)

Abstract
Direct prediction of 3D body pose and shape remains a challenge even for highly parameterized deep learning models. Mapping from the 2D image space to the prediction space is difficult: perspective ambiguities make the loss function noisy and training data is scarce. In this paper, we propose a novel approach (Neural Body Fitting (NBF)). It integrates a statistical body model within a CNN, leveraging reliable bottom-up semantic body part segmentation and robust top-down body model constraints. NBF is fully differentiable and can be trained using 2D and 3D annotations. In detailed experiments, we analyze how the components of our model affect performance, especially the use of part segmentations as an explicit intermediate representation, and present a robust, efficiently trainable framework for 3D human pose estimation from 2D images with competitive results on standard benchmarks. Code is available at https://github.com/mohomran/neural_body_fitting

ps

arXiv code Project Page [BibTex]


Thumb xl joeleccv18
Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220, pages: 713-731, Springer, Cham, September 2018 (inproceedings)

avg ps

pdf suppmat Video Project Page DOI Project Page [BibTex]

pdf suppmat Video Project Page DOI Project Page [BibTex]


Thumb xl sample3 merge black
Learning an Infant Body Model from RGB-D Data for Accurate Full Body Motion Analysis

Hesse, N., Pujades, S., Romero, J., Black, M. J., Bodensteiner, C., Arens, M., Hofmann, U. G., Tacke, U., Hadders-Algra, M., Weinberger, R., Muller-Felber, W., Schroeder, A. S.

In Int. Conf. on Medical Image Computing and Computer Assisted Intervention (MICCAI), September 2018 (inproceedings)

Abstract
Infant motion analysis enables early detection of neurodevelopmental disorders like cerebral palsy (CP). Diagnosis, however, is challenging, requiring expert human judgement. An automated solution would be beneficial but requires the accurate capture of 3D full-body movements. To that end, we develop a non-intrusive, low-cost, lightweight acquisition system that captures the shape and motion of infants. Going beyond work on modeling adult body shape, we learn a 3D Skinned Multi-Infant Linear body model (SMIL) from noisy, low-quality, and incomplete RGB-D data. We demonstrate the capture of shape and motion with 37 infants in a clinical environment. Quantitative experiments show that SMIL faithfully represents the data and properly factorizes the shape and pose of the infants. With a case study based on general movement assessment (GMA), we demonstrate that SMIL captures enough information to allow medical assessment. SMIL provides a new tool and a step towards a fully automatic system for GMA.

ps

pdf Project page video extended arXiv version DOI Project Page [BibTex]

pdf Project page video extended arXiv version DOI Project Page [BibTex]


Thumb xl eccv pascal results  thumbnail
Deep Directional Statistics: Pose Estimation with Uncertainty Quantification

Prokudin, S., Gehler, P., Nowozin, S.

European Conference on Computer Vision (ECCV), September 2018 (conference)

Abstract
Modern deep learning systems successfully solve many perception tasks such as object pose estimation when the input image is of high quality. However, in challenging imaging conditions such as on low resolution images or when the image is corrupted by imaging artifacts, current systems degrade considerably in accuracy. While a loss in performance is unavoidable we would like our models to quantify their uncertainty in order to achieve robustness against images of varying quality. Probabilistic deep learning models combine the expressive power of deep learning with uncertainty quantification. In this paper, we propose a novel probabilistic deep learning model for the task of angular regression. Our model uses von Mises distributions to predict a distribution over object pose angle. Whereas a single von Mises distribution is making strong assumptions about the shape of the distribution, we extend the basic model to predict a mixture of von Mises distributions. We show how to learn a mixture model using a finite and infinite number of mixture components. Our model allow for likelihood-based training and efficient inference at test time. We demonstrate on a number of challenging pose estimation datasets that our model produces calibrated probability predictions and competitive or superior point estimates compared to the current state-of-the-art.

ps

code pdf [BibTex]

code pdf [BibTex]


Thumb xl beneccv18
SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images

Coors, B., Condurache, A. P., Geiger, A.

European Conference on Computer Vision (ECCV), September 2018 (conference)

Abstract
Omnidirectional cameras offer great benefits over classical cameras wherever a wide field of view is essential, such as in virtual reality applications or in autonomous robots. Unfortunately, standard convolutional neural networks are not well suited for this scenario as the natural projection surface is a sphere which cannot be unwrapped to a plane without introducing significant distortions, particularly in the polar regions. In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural network models to the omnidirectional case. We demonstrate the effectiveness of our method on the tasks of image classification and object detection, exploiting two newly created semi-synthetic and real-world omnidirectional datasets.

avg

pdf suppmat Project Page [BibTex]


Thumb xl vip
Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera

Marcard, T. V., Henschel, R., Black, M. J., Rosenhahn, B., Pons-Moll, G.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11214, pages: 614-631, Springer, Cham, September 2018 (inproceedings)

Abstract
In this work, we propose a method that combines a single hand-held camera and a set of Inertial Measurement Units (IMUs) attached at the body limbs to estimate accurate 3D poses in the wild. This poses many new challenges: the moving camera, heading drift, cluttered background, occlusions and many people visible in the video. We associate 2D pose detections in each image to the corresponding IMU-equipped persons by solving a novel graph based optimization problem that forces 3D to 2D coherency within a frame and across long range frames. Given associations, we jointly optimize the pose of a statistical body model, the camera pose and heading drift using a continuous optimization framework. We validated our method on the TotalCapture dataset, which provides video and IMU synchronized with ground truth. We obtain an accuracy of 26mm, which makes it accurate enough to serve as a benchmark for image-based 3D pose estimation in the wild. Using our method, we recorded 3D Poses in the Wild (3DPW ), a new dataset consisting of more than 51; 000 frames with accurate 3D pose in challenging sequences, including walking in the city, going up-stairs, having co ffee or taking the bus. We make the reconstructed 3D poses, video, IMU and 3D models available for research purposes at http://virtualhumans.mpi-inf.mpg.de/3DPW.

ps

pdf SupMat data project DOI Project Page [BibTex]

pdf SupMat data project DOI Project Page [BibTex]


Thumb xl aircap ca 3
Decentralized MPC based Obstacle Avoidance for Multi-Robot Target Tracking Scenarios

Tallamraju, R., Rajappa, S., Black, M. J., Karlapalem, K., Ahmad, A.

2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages: 1-8, IEEE, August 2018 (conference)

Abstract
In this work, we consider the problem of decentralized multi-robot target tracking and obstacle avoidance in dynamic environments. Each robot executes a local motion planning algorithm which is based on model predictive control (MPC). The planner is designed as a quadratic program, subject to constraints on robot dynamics and obstacle avoidance. Repulsive potential field functions are employed to avoid obstacles. The novelty of our approach lies in embedding these non-linear potential field functions as constraints within a convex optimization framework. Our method convexifies nonconvex constraints and dependencies, by replacing them as pre-computed external input forces in robot dynamics. The proposed algorithm additionally incorporates different methods to avoid field local minima problems associated with using potential field functions in planning. The motion planner does not enforce predefined trajectories or any formation geometry on the robots and is a comprehensive solution for cooperative obstacle avoidance in the context of multi-robot target tracking. We perform simulation studies for different scenarios to showcase the convergence and efficacy of the proposed algorithm.

ps

Published Version link (url) DOI [BibTex]

Published Version link (url) DOI [BibTex]


Thumb xl teaser image
Probabilistic Recurrent State-Space Models

Doerr, A., Daniel, C., Schiegg, M., Nguyen-Tuong, D., Schaal, S., Toussaint, M., Trimpe, S.

In Proceedings of the International Conference on Machine Learning (ICML), International Conference on Machine Learning (ICML), July 2018 (inproceedings)

Abstract
State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time-series data. Fully probabilistic SSMs, however, unfortunately often prove hard to train, even for smaller problems. To overcome this limitation, we propose a scalable initialization and training algorithm based on doubly stochastic variational inference and Gaussian processes. In the variational approximation we propose in contrast to related approaches to fully capture the latent state temporal correlations to allow for robust training.

am ics

arXiv pdf Project Page [BibTex]

arXiv pdf Project Page [BibTex]


Thumb xl andrease teaser 2
Robust Dense Mapping for Large-Scale Dynamic Environments

Barsan, I. A., Liu, P., Pollefeys, M., Geiger, A.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

Abstract
We present a stereo-based dense mapping algorithm for large-scale dynamic urban environments. In contrast to other existing methods, we simultaneously reconstruct the static background, the moving objects, and the potentially moving but currently stationary objects separately, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments. We use both instance-aware semantic segmentation and sparse scene flow to classify objects as either background, moving, or potentially moving, thereby ensuring that the system is able to model objects with the potential to transition from static to dynamic, such as parked cars. Given camera poses estimated from visual odometry, both the background and the (potentially) moving objects are reconstructed separately by fusing the depth maps computed from the stereo input. In addition to visual odometry, sparse scene flow is also used to estimate the 3D motions of the detected moving objects, in order to reconstruct them accurately. A map pruning technique is further developed to improve reconstruction accuracy and reduce memory consumption, leading to increased scalability. We evaluate our system thoroughly on the well-known KITTI dataset. Our system is capable of running on a PC at approximately 2.5Hz, with the primary bottleneck being the instance-aware semantic segmentation, which is a limitation we hope to address in future work.

avg

pdf Video Project Page Project Page [BibTex]

pdf Video Project Page Project Page [BibTex]


Thumb xl meta learning overview
Online Learning of a Memory for Learning Rates

(nominated for best paper award)

Meier, F., Kappler, D., Schaal, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018, accepted (inproceedings)

Abstract
The promise of learning to learn for robotics rests on the hope that by extracting some information about the learning process itself we can speed up subsequent similar learning tasks. Here, we introduce a computationally efficient online meta-learning algorithm that builds and optimizes a memory model of the optimal learning rate landscape from previously observed gradient behaviors. While performing task specific optimization, this memory of learning rates predicts how to scale currently observed gradients. After applying the gradient scaling our meta-learner updates its internal memory based on the observed effect its prediction had. Our meta-learner can be combined with any gradient-based optimizer, learns on the fly and can be transferred to new optimization tasks. In our evaluations we show that our meta-learning algorithm speeds up learning of MNIST classification and a variety of learning control tasks, either in batch or online learning settings.

am

pdf video code [BibTex]

pdf video code [BibTex]


Thumb xl learning ct w asm block diagram detailed
Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks

Sutanto, G., Su, Z., Schaal, S., Meier, F.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

am

pdf video [BibTex]

pdf video [BibTex]


Thumb xl despoina paper teaser
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials

Paschalidou, D., Ulusoy, A. O., Schmitt, C., Gool, L., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRF) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.

avg

pdf suppmat Video Project Page code Poster Project Page [BibTex]

pdf suppmat Video Project Page code Poster Project Page [BibTex]


Thumb xl hmrteaser
End-to-end Recovery of Human Shape and Pose

Kanazawa, A., Black, M. J., Jacobs, D. W., Malik, J.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
We describe Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground truth 2D annotations. However, the reprojection loss alone is highly underconstrained. In this work we address this problem by introducing an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes. We show that HMR can be trained with and without using any paired 2D-to-3D supervision. We do not rely on intermediate 2D keypoint detections and infer 3D pose and shape parameters directly from image pixels. Our model runs in real-time given a bounding box containing the person. We demonstrate our approach on various images in-the-wild and out-perform previous optimization-based methods that output 3D meshes and show competitive results on tasks such as 3D joint location estimation and part segmentation.

ps

pdf code project video Project Page [BibTex]

pdf code project video Project Page [BibTex]


no image
Enhanced Non-Steady Gliding Performance of the MultiMo-Bat through Optimal Airfoil Configuration and Control Strategy

Kim, H., Woodward, M. A., Sitti, M.

In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages: 1382-1388, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


no image
On Time Optimization of Centroidal Momentum Dynamics

Ponton, B., Herzog, A., Del Prete, A., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 5776-5782, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
Recently, the centroidal momentum dynamics has received substantial attention to plan dynamically consistent motions for robots with arms and legs in multi-contact scenarios. However, it is also non convex which renders any optimization approach difficult and timing is usually kept fixed in most trajectory optimization techniques to not introduce additional non convexities to the problem. But this can limit the versatility of the algorithms. In our previous work, we proposed a convex relaxation of the problem that allowed to efficiently compute momentum trajectories and contact forces. However, our approach could not minimize a desired angular momentum objective which seriously limited its applicability. Noticing that the non-convexity introduced by the time variables is of similar nature as the centroidal dynamics one, we propose two convex relaxations to the problem based on trust regions and soft constraints. The resulting approaches can compute time-optimized dynamically consistent trajectories sufficiently fast to make the approach realtime capable. The performance of the algorithm is demonstrated in several multi-contact scenarios for a humanoid robot. In particular, we show that the proposed convex relaxation of the original problem finds solutions that are consistent with the original non-convex problem and illustrate how timing optimization allows to find motion plans that would be difficult to plan with fixed timing † †Implementation details and demos can be found in the source code available at https://git-amd.tuebingen.mpg.de/bponton/timeoptimization.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Thumb xl yiyi paper teaser
Deep Marching Cubes: Learning Explicit Surface Representations

Liao, Y., Donne, S., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Existing learning based solutions to 3D surface prediction cannot be trained end-to-end as they operate on intermediate representations (eg, TSDF) from which 3D surface meshes must be extracted in a post-processing step (eg, via the marching cubes algorithm). In this paper, we investigate the problem of end-to-end 3D surface prediction. We first demonstrate that the marching cubes algorithm is not differentiable and propose an alternative differentiable formulation which we insert as a final layer into a 3D convolutional neural network. We further propose a set of loss functions which allow for training our model with sparse point supervision. Our experiments demonstrate that the model allows for predicting sub-voxel accurate 3D shapes of arbitrary topology. Additionally, it learns to complete shapes and to separate an object's inside from its outside even in the presence of sparse and incomplete ground truth. We investigate the benefits of our approach on the task of inferring shapes from 3D point clouds. Our model is flexible and can be combined with a variety of shape encoder and shape inference techniques.

avg

pdf suppmat Video Project Page Poster Project Page [BibTex]

pdf suppmat Video Project Page Poster Project Page [BibTex]


Thumb xl teaser andreas
Semantic Visual Localization

Schönberger, J., Pollefeys, M., Geiger, A., Sattler, T.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, eg, in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

avg

pdf suppmat Poster Project Page [BibTex]

pdf suppmat Poster Project Page [BibTex]


Thumb xl eigval gradpen
Which Training Methods for GANs do actually Converge?

Mescheder, L., Geiger, A., Nowozin, S.

International Conference on Machine learning (ICML), 2018 (conference)

Abstract
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distributions lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn high-resolution generative image models for a variety of datasets with little hyperparameter tuning.

avg

code video paper supplement slides poster Project Page [BibTex]


no image
Collectives of Spinning Mobile Microrobots for Navigation and Object Manipulation at the Air-Water Interface

Wang, W., Kishore, V., Koens, L., Lauga, E., Sitti, M.

In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages: 1-9, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


Thumb xl david paper teaser
Learning 3D Shape Completion from Laser Scan Data with Weak Supervision

Stutz, D., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, full supervision is required which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, ie, learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach is able to generalize to other object categories as well.

avg

pdf suppmat Project Page Poster Project Page [BibTex]

pdf suppmat Project Page Poster Project Page [BibTex]


no image
Endo-VMFuseNet: A Deep Visual-Magnetic Sensor Fusion Approach for Endoscopic Capsule Robots

Turan, M., Almalioglu, Y., Gilbert, H. B., Sari, A. E., Soylu, U., Sitti, M.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1-7, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Endosensorfusion: Particle filtering-based multi-sensory data fusion with switching state-space model for endoscopic capsule robots

Turan, M., Almalioglu, Y., Gilbert, H., Araujo, H., Cemgil, T., Sitti, M.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1-8, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


Thumb xl benvisapp
Learning Transformation Invariant Representations with Weak Supervision

Coors, B., Condurache, A., Mertins, A., Geiger, A.

In International Conference on Computer Vision Theory and Applications, International Conference on Computer Vision Theory and Applications, 2018 (inproceedings)

Abstract
Deep convolutional neural networks are the current state-of-the-art solution to many computer vision tasks. However, their ability to handle large global and local image transformations is limited. Consequently, extensive data augmentation is often utilized to incorporate prior knowledge about desired invariances to geometric transformations such as rotations or scale changes. In this work, we combine data augmentation with an unsupervised loss which enforces similarity between the predictions of augmented copies of an input sample. Our loss acts as an effective regularizer which facilitates the learning of transformation invariant representations. We investigate the effectiveness of the proposed similarity loss on rotated MNIST and the German Traffic Sign Recognition Benchmark (GTSRB) in the context of different classification models including ladder networks. Our experiments demonstrate improvements with respect to the standard data augmentation approach for supervised and semi-supervised learning tasks, in particular in the presence of little annotated data. In addition, we analyze the performance of the proposed approach with respect to its hyperparameters, including the strength of the regularization as well as the layer where representation similarity is enforced.

avg

pdf [BibTex]

pdf [BibTex]


Thumb xl smalrteaser
Lions and Tigers and Bears: Capturing Non-Rigid, 3D, Articulated Shape from Images

Zuffi, S., Kanazawa, A., Black, M. J.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Animals are widespread in nature and the analysis of their shape and motion is important in many fields and industries. Modeling 3D animal shape, however, is difficult because the 3D scanning methods used to capture human shape are not applicable to wild animals or natural settings. Consequently, we propose a method to capture the detailed 3D shape of animals from images alone. The articulated and deformable nature of animals makes this problem extremely challenging, particularly in unconstrained environments with moving and uncalibrated cameras. To make this possible, we use a strong prior model of articulated animal shape that we fit to the image data. We then deform the animal shape in a canonical reference pose such that it matches image evidence when articulated and projected into multiple images. Our method extracts significantly more 3D shape detail than previous methods and is able to model new species, including the shape of an extinct animal, using only a few video frames. Additionally, the projected 3D shapes are accurate enough to facilitate the extraction of a realistic texture map from multiple frames.

ps

pdf code/data 3D models Project Page [BibTex]

pdf code/data 3D models Project Page [BibTex]


no image
Unsupervised Contact Learning for Humanoid Estimation and Control

Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 411-417, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
This work presents a method for contact state estimation using fuzzy clustering to learn contact probability for full, six-dimensional humanoid contacts. The data required for training is solely from proprioceptive sensors - endeffector contact wrench sensors and inertial measurement units (IMUs) - and the method is completely unsupervised. The resulting cluster means are used to efficiently compute the probability of contact in each of the six endeffector degrees of freedom (DoFs) independently. This clustering-based contact probability estimator is validated in a kinematics-based base state estimator in a simulation environment with realistic added sensor noise for locomotion over rough, low-friction terrain on which the robot is subject to foot slip and rotation. The proposed base state estimator which utilizes these six DoF contact probability estimates is shown to perform considerably better than that which determines kinematic contact constraints purely based on measured normal force.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Learning Task-Specific Dynamics to Improve Whole-Body Control

Gams, A., Mason, S., Ude, A., Schaal, S., Righetti, L.

In Hua, IEEE, Beijing, China, November 2018 (inproceedings)

Abstract
In task-based inverse dynamics control, reference accelerations used to follow a desired plan can be broken down into feedforward and feedback trajectories. The feedback term accounts for tracking errors that are caused from inaccurate dynamic models or external disturbances. On underactuated, free-floating robots, such as humanoids, high feedback terms can be used to improve tracking accuracy; however, this can lead to very stiff behavior or poor tracking accuracy due to limited control bandwidth. In this paper, we show how to reduce the required contribution of the feedback controller by incorporating learned task-space reference accelerations. Thus, we i) improve the execution of the given specific task, and ii) offer the means to reduce feedback gains, providing for greater compliance of the system. With a systematic approach we also reduce heuristic tuning of the model parameters and feedback gains, often present in real-world experiments. In contrast to learning task-specific joint-torques, which might produce a similar effect but can lead to poor generalization, our approach directly learns the task-space dynamics of the center of mass of a humanoid robot. Simulated and real-world results on the lower part of the Sarcos Hermes humanoid robot demonstrate the applicability of the approach.

am mg

link (url) [BibTex]

link (url) [BibTex]


no image
An MPC Walking Framework With External Contact Forces

Mason, S., Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1785-1790, IEEE, Brisbane, Australia, May 2018 (inproceedings)

Abstract
In this work, we present an extension to a linear Model Predictive Control (MPC) scheme that plans external contact forces for the robot when given multiple contact locations and their corresponding friction cone. To this end, we set up a two-step optimization problem. In the first optimization, we compute the Center of Mass (CoM) trajectory, foot step locations, and introduce slack variables to account for violating the imposed constraints on the Zero Moment Point (ZMP). We then use the slack variables to trigger the second optimization, in which we calculate the optimal external force that compensates for the ZMP tracking error. This optimization considers multiple contacts positions within the environment by formulating the problem as a Mixed Integer Quadratic Program (MIQP) that can be solved at a speed between 100-300 Hz. Once contact is created, the MIQP reduces to a single Quadratic Program (QP) that can be solved in real-time ({\textless}; 1kHz). Simulations show that the presented walking control scheme can withstand disturbances 2-3× larger with the additional force provided by a hand contact.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2012


Thumb xl screen shot 2015 08 23 at 13.56.29
Towards Multi-DOF model mediated teleoperation: Using vision to augment feedback

Willaert, B., Bohg, J., Van Brussel, H., Niemeyer, G.

In IEEE International Workshop on Haptic Audio Visual Environments and Games (HAVE), pages: 25-31, October 2012 (inproceedings)

Abstract
In this paper, we address some of the challenges that arise as model-mediated teleoperation is applied to systems with multiple degrees of freedom and multiple sensors. Specifically we use a system with position, force, and vision sensors to explore an environment geometry in two degrees of freedom. The inclusion of vision is proposed to alleviate the difficulties of estimating an increasing number of environment properties. Vision can furthermore increase the predictive nature of model-mediated teleoperation, by effectively predicting touch feedback before the slave is even in contact with the environment. We focus on the case of estimating the location and orientation of a local surface patch at the contact point between the slave and the environment. We describe the various information sources with their respective limitations and create a combined model estimator as part of a multi-d.o.f. model-mediated controller. An experiment demonstrates the feasibility and benefits of utilizing vision sensors in teleoperation.

am

DOI [BibTex]

2012


DOI [BibTex]


Thumb xl sankaran iros 20121
Failure Recovery with Shared Autonomy

Sankaran, B., Pitzer, B., Osentoski, S.

In International Conference on Intelligent Robots and Systems, October 2012 (inproceedings)

Abstract
Building robots capable of long term autonomy has been a long standing goal of robotics research. Such systems must be capable of performing certain tasks with a high degree of robustness and repeatability. In the context of personal robotics, these tasks could range anywhere from retrieving items from a refrigerator, loading a dishwasher, to setting up a dinner table. Given the complexity of tasks there are a multitude of failure scenarios that the robot can encounter, irrespective of whether the environment is static or dynamic. For a robot to be successful in such situations, it would need to know how to recover from failures or when to ask a human for help. This paper, presents a novel shared autonomy behavioral executive to addresses these issues. We demonstrate how this executive combines generalized logic based recovery and human intervention to achieve continuous failure free operation. We tested the systems over 250 trials of two different use case experiments. Our current algorithm drastically reduced human intervention from 26% to 4% on the first experiment and 46% to 9% on the second experiment. This system provides a new dimension to robot autonomy, where robots can exhibit long term failure free operation with minimal human supervision. We also discuss how the system can be generalized.

am

link (url) [BibTex]

link (url) [BibTex]


Thumb xl paperfig
Lie Bodies: A Manifold Representation of 3D Human Shape

Freifeld, O., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 1-14, Part I, LNCS 7572, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional object shape is commonly represented in terms of deformations of a triangular mesh from an exemplar shape. Existing models, however, are based on a Euclidean representation of shape deformations. In contrast, we argue that shape has a manifold structure: For example, summing the shape deformations for two people does not necessarily yield a deformation corresponding to a valid human shape, nor does the Euclidean difference of these two deformations provide a meaningful measure of shape dissimilarity. Consequently, we define a novel manifold for shape representation, with emphasis on body shapes, using a new Lie group of deformations. This has several advantages. First we define triangle deformations exactly, removing non-physical deformations and redundant degrees of freedom common to previous methods. Second, the Riemannian structure of Lie Bodies enables a more meaningful definition of body shape similarity by measuring distance between bodies on the manifold of body shape deformations. Third, the group structure allows the valid composition of deformations. This is important for models that factor body shape deformations into multiple causes or represent shape as a linear combination of basis shapes. Finally, body shape variation is modeled using statistics on manifolds. Instead of modeling Euclidean shape variation with Principal Component Analysis we capture shape variation on the manifold using Principal Geodesic Analysis. Our experiments show consistent visual and quantitative advantages of Lie Bodies over traditional Euclidean models of shape deformation and our representation can be easily incorporated into existing methods.

ps

pdf supplemental material youtube poster eigenshape video code Project Page Project Page Project Page [BibTex]

pdf supplemental material youtube poster eigenshape video code Project Page Project Page Project Page [BibTex]


Thumb xl coregteaser
Coregistration: Simultaneous alignment and modeling of articulated 3D shape

Hirshberg, D., Loper, M., Rachlin, E., Black, M.

In European Conf. on Computer Vision (ECCV), pages: 242-255, LNCS 7577, Part IV, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional (3D) shape models are powerful because they enable the inference of object shape from incomplete, noisy, or ambiguous 2D or 3D data. For example, realistic parameterized 3D human body models have been used to infer the shape and pose of people from images. To train such models, a corpus of 3D body scans is typically brought into registration by aligning a common 3D human-shaped template to each scan. This is an ill-posed problem that typically involves solving an optimization problem with regularization terms that penalize implausible deformations of the template. When aligning a corpus, however, we can do better than generic regularization. If we have a model of how the template can deform then alignments can be regularized by this model. Constructing a model of deformations, however, requires having a corpus that is already registered. We address this chicken-and-egg problem by approaching modeling and registration together. By minimizing a single objective function, we reliably obtain high quality registration of noisy, incomplete, laser scans, while simultaneously learning a highly realistic articulated body model. The model greatly improves robustness to noise and missing data. Since the model explains a corpus of body scans, it captures how body shape varies across people and poses.

ps

pdf publisher site poster supplemental material (400MB) Project Page Project Page [BibTex]

pdf publisher site poster supplemental material (400MB) Project Page Project Page [BibTex]


Thumb xl sintelworkshop
Lessons and insights from creating a synthetic optical flow benchmark

Wulff, J., Butler, D. J., Stanley, G. B., Black, M. J.

In ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation, pages: 168-177, Part II, LNCS 7584, (Editors: A. Fusiello et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

ps

pdf dataset poster youtube Project Page [BibTex]

pdf dataset poster youtube Project Page [BibTex]


Thumb xl tripod seq 16 054 part 3d vis
3D2PM – 3D Deformable Part Models

Pepik, B., Gehler, P., Stark, M., Schiele, B.

In Proceedings of the European Conference on Computer Vision (ECCV), pages: 356-370, Lecture Notes in Computer Science, (Editors: Fitzgibbon, Andrew W. and Lazebnik, Svetlana and Perona, Pietro and Sato, Yoichi and Schmid, Cordelia), Springer, Firenze, October 2012 (inproceedings)

ps

pdf video poster Project Page [BibTex]

pdf video poster Project Page [BibTex]


Thumb xl sinteleccv2012crop
A naturalistic open source movie for optical flow evaluation

Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 611-625, Part IV, LNCS 7577, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.

ps

pdf dataset youtube talk supplemental material Project Page Project Page [BibTex]

pdf dataset youtube talk supplemental material Project Page Project Page [BibTex]


Thumb xl bottlehandovergrasp
Task-Based Grasp Adaptation on a Humanoid Robot

Bohg, J., Welke, K., León, B., Do, M., Song, D., Wohlkinger, W., Aldoma, A., Madry, M., Przybylski, M., Asfour, T., Marti, H., Kragic, D., Morales, A., Vincze, M.

In 10th IFAC Symposium on Robot Control, SyRoCo 2012, Dubrovnik, Croatia, September 5-7, 2012., pages: 779-786, September 2012 (inproceedings)

Abstract
In this paper, we present an approach towards autonomous grasping of objects according to their category and a given task. Recent advances in the field of object segmentation and categorization as well as task-based grasp inference have been leveraged by integrating them into one pipeline. This allows us to transfer task-specific grasp experience between objects of the same category. The effectiveness of the approach is demonstrated on the humanoid robot ARMAR-IIIa.

am

Video pdf DOI [BibTex]

Video pdf DOI [BibTex]


Thumb xl embs2012
A framework for relating neural activity to freely moving behavior

Foster, J. D., Nuyujukian, P., Freifeld, O., Ryu, S., Black, M. J., Shenoy, K. V.

In 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), pages: 2736 -2739 , IEEE, San Diego, August 2012 (inproceedings)

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl screen shot 2012 06 25 at 1.59.41 pm
Pottics – The Potts Topic Model for Semantic Image Segmentation

Dann, C., Gehler, P., Roth, S., Nowozin, S.

In Proceedings of 34th DAGM Symposium, pages: 397-407, Lecture Notes in Computer Science, (Editors: Pinz, Axel and Pock, Thomas and Bischof, Horst and Leberl, Franz), Springer, August 2012 (inproceedings)

ps

code pdf poster [BibTex]

code pdf poster [BibTex]


Thumb xl thumb hennigk2012
Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

In Proceedings of the 29th International Conference on Machine Learning, pages: 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, ICML, July 2012 (inproceedings)

Abstract
Four decades after their invention, quasi- Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

ei ps pn

website+code pdf link (url) [BibTex]

website+code pdf link (url) [BibTex]


Thumb xl frompstods2
From pictorial structures to deformable structures

Zuffi, S., Freifeld, O., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3546-3553, IEEE, June 2012 (inproceedings)

Abstract
Pictorial Structures (PS) define a probabilistic model of 2D articulated objects in images. Typical PS models assume an object can be represented by a set of rigid parts connected with pairwise constraints that define the prior probability of part configurations. These models are widely used to represent non-rigid articulated objects such as humans and animals despite the fact that such objects have parts that deform non-rigidly. Here we define a new Deformable Structures (DS) model that is a natural extension of previous PS models and that captures the non-rigid shape deformation of the parts. Each part in a DS model is represented by a low-dimensional shape deformation space and pairwise potentials between parts capture how the shape varies with pose and the shape of neighboring parts. A key advantage of such a model is that it more accurately models object boundaries. This enables image likelihood models that are more discriminative than previous PS likelihoods. This likelihood is learned using training imagery annotated using a DS “puppet.” We focus on a human DS model learned from 2D projections of a realistic 3D human body model and use it to infer human poses in images using a form of non-parametric belief propagation.

ps

pdf sup mat code poster Project Page Project Page Project Page Project Page [BibTex]

pdf sup mat code poster Project Page Project Page Project Page Project Page [BibTex]


Thumb xl screen shot 2012 03 22 at 17.51.07
Teaching 3D Geometry to Deformable Part Models

Pepik, B., Stark, M., Gehler, P., Schiele, B.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3362 -3369, IEEE, Providence, RI, USA, June 2012, oral presentation (inproceedings)

ps

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]