
Physically Consistent Object-Level Scene Reconstruction

Example result by CIR-phys (right). Left: ground truth. Middle: baseline deep learning-based approach.

Members

Embodied Vision
Max Planck Research Group Leader
Embodied Vision
Embodied Vision
Research Associate
Embodied Vision
Embodied Vision

Publications

Article. Event-based Non-Rigid Reconstruction of Low-Rank Parametrized Deformations from Contours. Xue, Y., Li, H., Leutenegger, S., Stueckler, J. International Journal of Computer Vision (IJCV), 2024 (Published)
Visual reconstruction of fast non-rigid object deformations over time is a challenge for conventional frame-based cameras. In recent years, event cameras have gained significant attention due to their bio-inspired properties, such as high temporal resolution and high dynamic range. In this paper, we propose a novel approach for reconstructing such deformations using event measurements. Under the assumption of a static background, where all events are generated by the motion, our approach estimates the deformation of objects from events generated at the object contour in a probabilistic optimization framework. It associates events to mesh faces on the contour and maximizes the alignment of the line of sight through the event pixel with the associated face. In experiments on synthetic and real data of human body motion, we demonstrate the advantages of our method over state-of-the-art optimization and learning-based approaches for reconstructing the motion of human arms and hands. In addition, we propose an efficient event stream simulator to synthesize realistic event data for human motion.

Conference Paper. Physically Plausible Object Pose Refinement in Cluttered Scenes. Strecke, M., Stueckler, J. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2024, to appear (To be published)

Conference Paper. Physics-Based Rigid Body Object Tracking and Friction Filtering From RGB-D Videos. Kandukuri, R. K., Strecke, M., Stueckler, J. In Proceedings of the International Conference on 3D Vision (3DV), 2024 (Published)
Physics-based understanding of object interactions from sensory observations is an essential capability in augmented reality and robotics. It enables capturing the properties of a scene for simulation and control. In this paper, we propose a novel approach for real-to-sim which tracks rigid objects in 3D from RGB-D images and infers physical properties of the objects. We use a differentiable physics simulation, which can model contact and friction for arbitrary mesh-based shapes, as the state-transition model in an Extended Kalman Filter and in this way estimate physically plausible trajectories. We demonstrate that our approach can filter position, orientation, and velocities while concurrently estimating the coefficient of friction of the objects. We analyze our approach on various sliding scenarios in synthetic image sequences of single objects and colliding objects. We also demonstrate and evaluate our approach on a real-world dataset. We make our novel benchmark datasets publicly available to foster future research in this novel problem setting and comparison with our method.
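To make the idea of filtering a friction coefficient alongside the motion state concrete, the following is a minimal sketch, not the method of the paper. It assumes a 1D block sliding on a plane with Coulomb friction, a state of position, velocity, and friction coefficient, noisy position measurements, and a hand-derived Jacobian in place of a differentiable physics simulation. All function names and parameter values are hypothetical.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def step(x, dt):
    """Toy sliding-block dynamics (assumed stand-in for a physics simulation)."""
    p, v, mu = x
    v_next = v - mu * G * dt           # Coulomb friction decelerates the block
    moving = v_next > 0.0              # friction cannot reverse the motion
    return [p + v * dt, v_next if moving else 0.0, mu], moving


def predict(x, P, dt, q_diag):
    """EKF prediction for the state x = [position, velocity, friction]."""
    x_pred, moving = step(x, dt)
    # Jacobian of step() with respect to the state, derived by hand here
    F = [
        [1.0, dt, 0.0],
        [0.0, 1.0 if moving else 0.0, -G * dt if moving else 0.0],
        [0.0, 0.0, 1.0],
    ]
    FP = [[sum(F[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    # P_pred = F P F^T + Q with a diagonal process noise Q
    P_pred = [
        [sum(FP[i][k] * F[j][k] for k in range(3)) + (q_diag[i] if i == j else 0.0)
         for j in range(3)]
        for i in range(3)
    ]
    return x_pred, P_pred


def update(x, P, z, r):
    """EKF update with a scalar position measurement z, i.e. H = [1, 0, 0]."""
    s = P[0][0] + r                        # innovation variance
    K = [P[i][0] / s for i in range(3)]    # Kalman gain
    innovation = z - x[0]
    x_upd = [x[i] + K[i] * innovation for i in range(3)]
    P_upd = [[P[i][j] - K[i] * P[0][j] for j in range(3)] for i in range(3)]
    return x_upd, P_upd


def run_filter(true_mu=0.3, steps=80, dt=0.01, noise_std=0.005, seed=0):
    """Simulate a sliding block and filter its state from noisy positions."""
    rng = random.Random(seed)
    truth = [0.0, 3.0, true_mu]            # ground-truth state
    x = [0.0, 3.0, 0.1]                    # initial guess with a wrong friction value
    P = [[0.01, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.25]]
    q_diag = [1e-8, 1e-6, 1e-8]
    for _ in range(steps):
        truth, _ = step(truth, dt)
        z = truth[0] + rng.gauss(0.0, noise_std)
        x, P = predict(x, P, dt, q_diag)
        x, P = update(x, P, z, noise_std ** 2)
    return x, P


x_est, P_est = run_filter()
print("estimated friction coefficient:", round(x_est[2], 3))
```

In this toy setting the friction coefficient becomes observable through the curvature of the position trajectory, which is why position measurements alone are enough to correct the initial guess.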

Conference Paper. Learning-based Relational Object Matching Across Views. Elich, C., Armeni, I., Oswald, M. R., Pollefeys, M., Stueckler, J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2023 (Published)
Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can benefit from reasoning on the level of objects. While keypoint-based matching can yield strong results for finding correspondences between images with small to medium viewpoint changes, matching semantically at the object level becomes advantageous for large viewpoint changes. In this paper, we propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images. We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network. We demonstrate our approach in a large variety of views on realistically rendered synthetic images. Our approach compares favorably to previous state-of-the-art object-level matching approaches and achieves improved performance over a pure keypoint-based approach for large viewpoint changes.

Article. Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors. Elich, C., Oswald, M. R., Pollefeys, M., Stueckler, J. Computer Vision and Image Understanding (CVIU), 220, July 2022 (Published)
Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose PriSMONet, a novel approach based on Prior Shape knowledge for learning Multi-Object 3D scene decomposition and representations from single images. Our approach learns to decompose images of synthetic scenes with multiple objects on a planar surface into their constituent scene objects and to infer their 3D properties from a single view. A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image. Using differentiable rendering, we train our model to decompose scenes from RGB-D images in a self-supervised way. The 3D shapes are represented continuously in function-space as signed distance functions which we pre-train from example shapes in a supervised way. These shape priors provide weak supervision signals to better condition the challenging overall learning task. We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
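As a brief illustration of what a signed distance function is, the sketch below evaluates analytic SDFs for a sphere and an axis-aligned box and combines them into a two-object scene. It is only an assumed stand-in: the abstract describes shape functions that are learned from example shapes, whereas these are closed-form. The function names and the sign convention (negative inside, positive outside) are choices made for this example.

```python
import math


def sdf_sphere(point, center, radius):
    """Signed distance from a 3D point to the surface of a sphere."""
    return math.dist(point, center) - radius


def sdf_box(point, half_extents):
    """Signed distance to an axis-aligned box centered at the origin."""
    q = [abs(c) - h for c, h in zip(point, half_extents)]
    outside = math.sqrt(sum(max(d, 0.0) ** 2 for d in q))  # distance when outside
    inside = min(max(q), 0.0)                              # negative depth when inside
    return outside + inside


def sdf_scene(point):
    """Two hypothetical objects combined by taking the minimum distance."""
    return min(
        sdf_sphere(point, (2.0, 0.0, 0.0), 0.5),
        sdf_box(point, (1.0, 1.0, 1.0)),
    )


for query in [(0.0, 0.0, 0.0), (1.25, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(query, sdf_scene(query))
```

The object surface is the zero level set of the function, so a continuous shape can be queried at any point without committing to a fixed voxel or mesh resolution.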

Conference Paper. Event-based Non-Rigid Reconstruction from Contours. Xue, Y., Li, H., Leutenegger, S., Stueckler, J. In Proceedings of the British Machine Vision Conference (BMVC), 2022 (Published)
Visual reconstruction of fast non-rigid object deformations over time is a challenge for conventional frame-based cameras. In this paper, we propose a novel approach for reconstructing such deformations using measurements from event-based cameras. Our approach estimates the deformation of objects from events generated at the object contour in a probabilistic optimization framework. It associates events to mesh faces on the contour and maximizes the alignment of the line of sight through the event pixel with the associated face. In experiments on synthetic and real data, we demonstrate the advantages of our method over state-of-the-art optimization and learning-based approaches for reconstructing the motion of human hands.

Conference Paper. DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation. Wang, R., Yang, N., Stückler, J., Cremers, D. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2020), pp. 11067-11073, IEEE, Piscataway, NJ, May 2020, arXiv:1904.10097 (Published)