Autonomous Vision Publications

Conference Paper (Autonomous Vision): Learning Priors for Semantic 3D Reconstruction. Cherabier, I., Schönberger, J., Oswald, M., Pollefeys, M., Geiger, A. In Computer Vision – ECCV 2018, Springer International Publishing, Cham, September 2018.
We present a novel semantic 3D reconstruction framework that embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our network architecture requires only a moderate number of parameters while retaining a high level of expressiveness, which enables learning from very little data. Experiments on real and synthetic datasets demonstrate that our network achieves higher accuracy than a purely variational approach while requiring two orders of magnitude fewer iterations to converge. Moreover, our approach handles ten times more semantic class labels using the same computational resources.
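The core unrolling idea can be pictured with a toy sketch (hypothetical code, not the authors' implementation): a fixed number of update steps, each applying the same shared interaction weights `W` to a neighborhood average of per-voxel semantic scores, so that optimization depth is traded for very few learned parameters.

```python
import numpy as np

def unrolled_refinement(volume, W, n_iters=8, step=0.1):
    """Toy unrolled optimization: refine per-voxel semantic scores
    `volume` (X, Y, Z, C) by repeatedly mixing each voxel with the
    average of its six axis-aligned neighbors through one shared
    interaction matrix `W` (C, C). Every iteration reuses `W`, which
    is what keeps the parameter count moderate."""
    v = volume.copy()
    for _ in range(n_iters):  # fixed unroll depth
        # edge-padded average of the six axis-aligned neighbors
        p = np.pad(v, ((1, 1), (1, 1), (1, 1), (0, 0)), mode="edge")
        neigh = (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] +
                 p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] +
                 p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2]) / 6.0
        v = v + step * (neigh @ W - v)  # same shared weights every step
    return v
```

In the paper the updates come from a multi-scale variational energy; here a single linear neighborhood interaction stands in for that regularizer.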

Conference Paper (Autonomous Vision): RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials. Paschalidou, D., Ulusoy, A. O., Schmitt, C., Van Gool, L., Geiger, A. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2018.
In this paper, we consider the problem of reconstructing a dense 3D model from images captured from different views. Recent methods based on convolutional neural networks (CNNs) allow learning the entire task from data. However, they do not incorporate the physics of image formation, such as perspective geometry and occlusion. Classical approaches based on Markov random fields (MRFs) with ray potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. We therefore propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piecewise-trained baseline, hand-crafted models, and other learning-based approaches.
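The occlusion reasoning that a ray potential makes explicit can be sketched in a few lines (an illustrative simplification, not RayNet's implementation): along one ray with per-voxel occupancy probabilities o_i, the probability that voxel i is the first occupied, and hence visible, voxel is o_i * prod_{j&lt;i} (1 - o_j).

```python
import numpy as np

def ray_visibility(occupancy):
    """Probability that each voxel along a ray is the first occupied one:
    vis_i = o_i * prod_{j<i} (1 - o_j). This occlusion term is what the
    MRF ray potential encodes explicitly and a plain CNN ignores."""
    o = np.asarray(occupancy, dtype=float)
    free_before = np.concatenate(([1.0], np.cumprod(1.0 - o)[:-1]))
    return o * free_before

def expected_ray_depth(occupancy, depths):
    """Expected depth observed along the ray under those visibility weights."""
    w = ray_visibility(occupancy)
    return float(np.sum(w * np.asarray(depths, dtype=float)))
```

A fully occupied first voxel makes everything behind it invisible, which is exactly the hard-occlusion behavior the sketch reproduces.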

Conference Paper (Perceiving Systems, Autonomous Vision): Semantic Multi-view Stereo: Jointly Estimating Objects and Voxels. Ulusoy, A. O., Black, M. J., Geiger, A. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, 4531-4540, IEEE, Piscataway, NJ, USA, July 2017.
Dense 3D reconstruction from RGB images is a highly ill-posed problem due to occlusions, textureless or reflective surfaces, as well as other challenges. We propose object-level shape priors to address these ambiguities. Towards this goal, we formulate a probabilistic model that integrates multi-view image evidence with 3D shape information from multiple objects. Inference in this model yields a dense 3D reconstruction of the scene as well as the existence and precise 3D pose of the objects in it. Our approach is able to recover fine details not captured in the input shapes while defaulting to the input models in occluded regions where image evidence is weak. Due to its probabilistic nature, the approach is able to cope with the approximate geometry of the 3D models as well as input shapes that are not present in the scene. We evaluate the approach quantitatively on several challenging indoor and outdoor datasets.
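How image evidence and an object shape prior trade off per voxel can be illustrated with a minimal Bayesian fusion sketch (hypothetical, assuming independent binary occupancy evidence; the paper's probabilistic model is far richer): the prior dominates exactly where image evidence is uninformative.

```python
def fuse_evidence(image_prob, shape_prob):
    """Fuse per-voxel occupancy evidence from images with a shape prior:
    posterior odds = image odds * prior odds. When image_prob == 0.5
    (weak image evidence, e.g. an occluded region), the result falls
    back to the shape prior; when shape_prob == 0.5, the image wins."""
    num = image_prob * shape_prob
    den = num + (1.0 - image_prob) * (1.0 - shape_prob)
    return num / den
```

For example, an occluded voxel with image_prob = 0.5 and shape_prob = 0.8 yields a posterior of 0.8, i.e. the reconstruction defaults to the input model where image evidence is weak, as the abstract describes.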

Conference Paper (Perceiving Systems, Autonomous Vision): Patches, Planes and Probabilities: A Non-local Prior for Volumetric 3D Reconstruction. Ulusoy, A. O., Black, M. J., Geiger, A. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3280-3289, June 2016.
In this paper, we propose a non-local structured prior for volumetric multi-view 3D reconstruction. Towards this goal, we present a novel Markov random field model based on ray potentials in which assumptions about large 3D surface patches such as planarity or Manhattan world constraints can be efficiently encoded as probabilistic priors. We further derive an inference algorithm that reasons jointly about voxels, pixels and image segments, and estimates marginal distributions of appearance, occupancy, depth, normals and planarity. Key to tractable inference is a novel hybrid representation that spans both voxel and pixel space and that integrates non-local information from 2D image segmentations in a principled way. We compare our non-local prior to commonly employed local smoothness assumptions and a variety of state-of-the-art volumetric reconstruction baselines on challenging outdoor scenes with textureless and reflective surfaces. Our experiments indicate that regularizing over larger distances has the potential to resolve ambiguities where local regularizers fail.
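The planarity assumption behind such a patch-level prior can be illustrated with a least-squares sketch (a hypothetical helper, not the paper's inference machinery): fit a plane to a depth patch and score how far the patch deviates from it; a low residual supports treating the patch as a single planar surface.

```python
import numpy as np

def plane_fit_residual(depth_patch):
    """Fit a plane z = a*x + b*y + c to a depth patch by least squares
    and return the RMS residual. Small residuals indicate the patch is
    well explained by one plane, the kind of large-surface hypothesis
    a non-local planarity prior rewards."""
    h, w = depth_patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    z = depth_patch.ravel()
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return float(np.sqrt(np.mean((z - A @ coef) ** 2)))
```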

Conference Paper (Perceiving Systems, Autonomous Vision): Towards Probabilistic Volumetric Reconstruction using Ray Potentials. Ulusoy, A. O., Geiger, A., Black, M. J. In 3rd International Conference on 3D Vision (3DV), 10-18, Lyon, October 2015.
This paper presents a novel probabilistic foundation for volumetric 3D reconstruction. We formulate the problem as inference in a Markov random field, which accurately captures the dependencies between the occupancy and appearance of each voxel, given all input images. Our main contribution is an approximate, highly parallelized discrete-continuous inference algorithm to compute the marginal distributions of each voxel's occupancy and appearance. In contrast to the MAP solution, marginals encode the underlying uncertainty and ambiguity in the reconstruction. Moreover, the proposed algorithm allows for a Bayes-optimal prediction with respect to a natural reconstruction loss. We compare our method to two state-of-the-art volumetric reconstruction algorithms on three challenging aerial datasets with LIDAR ground truth. Our experiments demonstrate that the proposed algorithm compares favorably in terms of reconstruction accuracy and the ability to expose reconstruction uncertainty.
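The difference between marginals and a single MAP solution can be pictured with a small sketch (hypothetical; it assumes weighted occupancy configuration samples rather than the paper's parallel message-passing inference): marginals come with a per-voxel entropy that exposes exactly the ambiguity a MAP estimate hides.

```python
import numpy as np

def occupancy_marginals(samples, weights):
    """Per-voxel occupancy marginals from weighted binary configuration
    samples, plus the binary entropy (in bits) of each marginal as an
    uncertainty map. Entropy near 1 flags ambiguous voxels; a MAP
    solution would silently commit to one of the two explanations."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    m = np.tensordot(w, np.asarray(samples, dtype=float), axes=1)
    eps = 1e-12
    entropy = -(m * np.log2(m + eps) + (1.0 - m) * np.log2(1.0 - m + eps))
    return m, entropy
```

Thresholding the marginal at 0.5 gives the Bayes-optimal binary prediction under a per-voxel 0-1 loss, while the entropy map reports where that decision was a coin flip.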