Publications

Embodied Vision Conference Paper Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry Li, H., Stueckler, J. In 2024 IEEE International Conference on Robotics and Automation (ICRA 2024), 1631-1637, Piscataway, NJ, August 2024 (Published)
Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as a dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but can even improve tracking accuracy.

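As a rough illustration of forward prediction under control inputs, the sketch below integrates a kinematic single-track (bicycle) model with explicit Euler steps. This is a deliberate simplification and not the singularity-free dynamics variant described in the abstract; the names `State`, `step`, `rollout`, and the `wheelbase` parameter are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float    # position in the ground plane [m]
    y: float
    yaw: float  # heading angle [rad]


def step(state, speed, steer, wheelbase, dt):
    """One explicit Euler step of the kinematic single-track ODE (assumed model)."""
    return State(
        x=state.x + speed * math.cos(state.yaw) * dt,
        y=state.y + speed * math.sin(state.yaw) * dt,
        yaw=state.yaw + speed / wheelbase * math.tan(steer) * dt,
    )


def rollout(state, controls, wheelbase, dt):
    """Predict a trajectory for a sequence of (speed, steering angle) inputs."""
    trajectory = [state]
    for speed, steer in controls:
        state = step(state, speed, steer, wheelbase, dt)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Drive a gentle left turn for one second.
    path = rollout(State(0.0, 0.0, 0.0), [(1.0, 0.1)] * 10, wheelbase=0.5, dt=0.1)
    print(path[-1])
```

In an online calibration setting, a parameter such as `wheelbase` would be treated as an unknown and estimated jointly with the state; here it is simply a fixed input.
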
Embodied Vision Conference Paper Analytical Uncertainty-Based Loss Weighting in Multi-Task Learning Kirchdorfer, L., Elich, C., Kutsche, S., Stuckenschmidt, H., Schott, L., Köhler, J. M. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2024, to appear (To be published)

Embodied Vision Article Attention Normalization Impacts Cardinality Generalization in Slot Attention Krimmel, M., Achterhold, J., Stueckler, J. In Transactions on Machine Learning Research (TMLR), 2024 (Published)
Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image segmentation and object tracking in videos, is a deep learning component which performs unsupervised object-centric scene decomposition on input images. It is based on an attention architecture, in which latent slot vectors, which hold compressed information on objects, attend to localized perceptual features from the input image. In this paper, we demonstrate that design decisions on normalizing the aggregated values in the attention architecture have considerable impact on the capabilities of Slot Attention to generalize to a higher number of slots and objects than seen during training. We propose and investigate alternatives to the original normalization scheme which increase the generalization capabilities of Slot Attention to varying slot and object counts, resulting in performance gains on the task of unsupervised image segmentation. The newly proposed normalizations represent minimal and easy-to-implement modifications of the usual Slot Attention module, changing the value aggregation mechanism from a weighted mean operation to a scaled weighted sum operation.

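The difference between a weighted mean and a scaled weighted sum can be made concrete with a small pure-Python sketch. It assumes the softmax is taken over the slot axis, as in the usual Slot Attention module, and uses a constant scale of 1 / (number of inputs) for the sum variant; that constant and the function name `slot_updates` are assumptions for illustration, not the paper's exact choices.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def slot_updates(logits, values, mode, scale=None, eps=1e-8):
    """Aggregate input values into one update vector per slot.

    logits[i][k]: attention logit between input feature i and slot k.
    values[i]:    value vector of input feature i.
    """
    n_inputs, n_slots, dim = len(values), len(logits[0]), len(values[0])
    # Softmax over the slot axis: slots compete for each input feature.
    attn = [softmax(row) for row in logits]
    if scale is None:
        scale = 1.0 / n_inputs  # assumed constant, chosen for this sketch
    updates = []
    for k in range(n_slots):
        column = [attn[i][k] for i in range(n_inputs)]
        weighted_sum = [
            sum(column[i] * values[i][d] for i in range(n_inputs))
            for d in range(dim)
        ]
        if mode == "weighted_mean":
            # Divide by the total attention mass the slot received.
            factor = 1.0 / (sum(column) + eps)
        elif mode == "scaled_sum":
            # Fixed scaling, independent of the attention mass.
            factor = scale
        else:
            raise ValueError(f"unknown mode: {mode}")
        updates.append([factor * s for s in weighted_sum])
    return updates


if __name__ == "__main__":
    values = [[1.0]] * 4
    for n_slots in (2, 4):
        logits = [[0.0] * n_slots for _ in range(4)]
        print(n_slots,
              slot_updates(logits, values, "weighted_mean")[0],
              slot_updates(logits, values, "scaled_sum")[0])
```

With uniform attention, the weighted mean yields the same update regardless of the slot count, while the scaled sum shrinks as more slots share the inputs, so the update magnitude carries information about how much of the image a slot explains.
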
Embodied Vision Article Event-based Non-Rigid Reconstruction of Low-Rank Parametrized Deformations from Contours Xue, Y., Li, H., Leutenegger, S., Stueckler, J. International Journal of Computer Vision (IJCV), 2024 (Published)
Visual reconstruction of fast non-rigid object deformations over time is a challenge for conventional frame-based cameras. In recent years, event cameras have gained significant attention due to their bio-inspired properties, such as high temporal resolution and high dynamic range. In this paper, we propose a novel approach for reconstructing such deformations using event measurements. Under the assumption of a static background, where all events are generated by the motion, our approach estimates the deformation of objects from events generated at the object contour in a probabilistic optimization framework. It associates events to mesh faces on the contour and maximizes the alignment of the line of sight through the event pixel with the associated face. In experiments on synthetic and real data of human body motion, we demonstrate the advantages of our method over state-of-the-art optimization and learning-based approaches for reconstructing the motion of human arms and hands. In addition, we propose an efficient event stream simulator to synthesize realistic event data for human motion.

Embodied Vision Conference Paper Examining Common Paradigms in Multi-Task Learning Elich, C., Kirchdorfer, L., Köhler, J. M., Schott, L. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2024, to appear (To be published)

Embodied Vision Technical Report Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators Baumeister, F., Mack, L., Stueckler, J. CoRR abs/2409.13228, 2024, Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2025 (Submitted)
Few-shot adaptation is an important capability for intelligent robots that perform tasks in open-world settings such as everyday environments or flexible production. In this paper, we propose a novel approach for non-prehensile manipulation which iteratively adapts a physics-based dynamics model for model-predictive control. We adapt the parameters of the model incrementally with a few examples of robot-object interactions. This is achieved by sampling-based optimization of the parameters using a parallelizable rigid-body physics simulation as a dynamic world model. In turn, the optimized dynamics model can be used for model-predictive control using efficient sampling-based optimization. We evaluate our few-shot adaptation approach in several object pushing experiments in simulation and with a real robot.

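Sampling-based parameter adaptation from a few interactions can be sketched with the cross-entropy method on a toy problem. The example below swaps the rigid-body physics simulator for a closed-form 1D sliding model (stopping distance of a pushed block) and fits a single friction coefficient; the model, the function names, and all hyperparameters are assumptions for illustration, not the authors' setup.

```python
import random
import statistics

G = 9.81  # gravitational acceleration [m/s^2]


def slide_distance(v0, mu):
    """Toy stand-in for a simulator: distance a block slides from speed v0."""
    return v0 ** 2 / (2.0 * mu * G)


def loss(mu, interactions):
    """Squared error between predicted and observed sliding distances."""
    return sum((slide_distance(v0, mu) - d) ** 2 for v0, d in interactions)


def fit_friction(interactions, mean=0.5, std=0.3,
                 n_samples=64, n_elite=8, n_iters=20, seed=0):
    """Cross-entropy method over the friction coefficient."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Sample candidates; each one could be evaluated in parallel.
        samples = [min(max(rng.gauss(mean, std), 0.01), 2.0)
                   for _ in range(n_samples)]
        samples.sort(key=lambda mu: loss(mu, interactions))
        elites = samples[:n_elite]
        # Refit the sampling distribution to the best candidates.
        mean = statistics.fmean(elites)
        std = max(statistics.pstdev(elites), 1e-6)
    return mean


if __name__ == "__main__":
    true_mu = 0.3
    # Three observed pushes: (initial speed, measured sliding distance).
    data = [(v0, slide_distance(v0, true_mu)) for v0 in (0.5, 1.0, 1.5)]
    print(fit_friction(data))
```

Because each candidate is scored independently, the inner evaluation is the part that a parallelizable simulator would accelerate.
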
Embodied Vision Technical Report Learning a Terrain- and Robot-Aware Dynamics Model for Autonomous Mobile Robot Navigation Achterhold, J., Guttikonda, S., Kreber, J. U., Li, H., Stueckler, J. CoRR abs/2409.11452, 2024, Preprint submitted to Robotics and Autonomous Systems Journal. https://arxiv.org/abs/2409.11452 (Submitted)
Mobile robots should be capable of planning cost-efficient paths for autonomous navigation. Typically, the terrain and robot properties are subject to variations. For instance, properties of the terrain such as friction may vary across different locations. Properties of the robot, such as payload or wear and tear, may also change, e.g., causing changing actuator gains or joint friction. Autonomous navigation approaches should thus be able to adapt to such variations. In this article, we propose a novel approach for learning a probabilistic, terrain- and robot-aware forward dynamics model (TRADYN) which can adapt to such variations and demonstrate its use for navigation. Our learning approach extends recent advances in meta-learning forward dynamics models based on Neural Processes for mobile robot navigation. We evaluate our method in simulation for 2D navigation of a robot with unicycle dynamics with varying properties on terrain with spatially varying friction coefficients. In our experiments, we demonstrate that TRADYN has lower prediction error over long time horizons than model ablations which do not adapt to robot or terrain variations. We also evaluate our model for navigation planning in a model-predictive control framework and under various sources of noise. We demonstrate that our approach yields improved performance in planning control-efficient paths by taking robot and terrain properties into account.

Embodied Vision Conference Paper Physically Plausible Object Pose Refinement in Cluttered Scenes Strecke, M., Stueckler, J. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2024, to appear (To be published)

Embodied Vision Conference Paper Physics-Based Rigid Body Object Tracking and Friction Filtering From RGB-D Videos Kandukuri, R. K., Strecke, M., Stueckler, J. In Proceedings of the International Conference on 3D Vision (3DV), 2024 (Published)
Physics-based understanding of object interactions from sensory observations is an essential capability in augmented reality and robotics. It enables capturing the properties of a scene for simulation and control. In this paper, we propose a novel approach for real-to-sim which tracks rigid objects in 3D from RGB-D images and infers physical properties of the objects. We use a differentiable physics simulation as the state-transition model in an Extended Kalman Filter which can model contact and friction for arbitrary mesh-based shapes and in this way estimate physically plausible trajectories. We demonstrate that our approach can filter position, orientation, and velocities, and can concurrently estimate the coefficient of friction of the objects. We analyze our approach on various sliding scenarios in synthetic image sequences of single objects and colliding objects. We also demonstrate and evaluate our approach on a real-world dataset. We make our novel benchmark datasets publicly available to foster future research in this novel problem setting and comparison with our method.

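The idea of estimating a friction coefficient inside a Kalman-type filter can be shown by augmenting the state with the unknown parameter. The sketch below replaces the differentiable physics simulation with a 1D sliding block whose velocity is measured directly; since this toy transition is linear in the state, the Jacobian is constant and the filter reduces to a standard Kalman filter. The function name `ekf_step` and the noise settings `q` and `r` are assumptions for this example.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def ekf_step(x, P, z, dt, q=1e-6, r=1e-4):
    """One predict/update cycle for the state x = [velocity, friction mu].

    Transition: v' = v - mu * G * dt, mu' = mu. Measurement: z = v + noise.
    """
    v, mu = x
    a = -G * dt  # Jacobian F = [[1, a], [0, 1]]

    # Predict the state and covariance: P_pred = F P F^T + Q.
    x_pred = [v + a * mu, mu]
    fp00, fp01 = P[0][0] + a * P[1][0], P[0][1] + a * P[1][1]
    fp10, fp11 = P[1][0], P[1][1]
    pp00 = fp00 + a * fp01 + q
    pp01 = fp01
    pp10 = fp10 + a * fp11
    pp11 = fp11 + q

    # Update with measurement matrix H = [1, 0].
    s = pp00 + r                  # innovation variance
    k0, k1 = pp00 / s, pp10 / s   # Kalman gain
    y = z - x_pred[0]             # innovation
    x_new = [x_pred[0] + k0 * y, x_pred[1] + k1 * y]
    P_new = [
        [(1.0 - k0) * pp00, (1.0 - k0) * pp01],
        [pp10 - k1 * pp00, pp11 - k1 * pp01],
    ]
    return x_new, P_new


if __name__ == "__main__":
    true_mu, dt, v = 0.3, 0.01, 5.0
    x, P = [5.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(100):
        v -= true_mu * G * dt      # simulated ground truth
        x, P = ekf_step(x, P, v, dt)
    print(x)
```

The friction estimate becomes observable only through its effect on the velocity, which is why the cross-covariance between the two state entries drives the parameter update.
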
Embodied Vision Ph.D. Thesis Investigating Shape Priors, Relationships, and Multi-Task Cues for Object-level Scene Understanding Elich, C. ETH Zürich, Zurich, 2024 (Published)
Humans are proficient at intuitively identifying objects and reasoning about their diverse properties from complex visual observations. Despite significant advances in artificial intelligence, computers have yet to achieve a comparable level of understanding, which is crucial for effective reasoning about tasks and interactions within an environment. In this thesis, we explore the benefits of various visual cues when dealing with key challenges in scene understanding, specifically focusing on weak supervision, finding view correspondence, and paradigms for simultaneously learning multiple tasks. We begin by investigating cues that reduce the need for full supervision. In particular, we propose an approach for learning multi-object 3D scene decomposition and object-wise properties from single images with only weak supervision. Our method utilizes a recurrent encoder to infer a latent representation for each object and a differentiable renderer to obtain a training signal. To guide the training process and constrain the search space of possible solutions, we leverage prior knowledge through pre-trained 3D shape spaces. Subsequently, we investigate the benefits of reasoning about relations between objects to learn more distinct object representations that allow for matching object detections across viewpoint changes. To address this, we introduce an approach that employs graph neural networks to learn matching features based on appearance as well as inter- and cross-frame relations. We conduct comparisons with keypoint-based methods and propose a methodology to combine these approaches, aiming to achieve overall improved performance. Finally, we consider the challenge of multi-task learning and analyze related paradigms in the context of basic single-task learning. 
In particular, we study the impact of the choice of optimizer, the role of gradient conflicts, and the effects on the transferability of features learned through either learning setup on common image corruptions. Our findings reveal surprising similarities between single-task and multi-task learning, suggesting that methods and techniques from one field could be advantageously applied to the other.

Embodied Vision Ph.D. Thesis Methods for Learning Adaptive and Symbolic Forward Models for Control and Planning Achterhold, J. M. Eberhard Karls Universität Tübingen, Tübingen, 2024 (Published)
Learning-based methods for sequential decision making, i.e., methods which leverage data, have shown the ability to solve complex problems in recent years. This includes control of dynamical systems, as well as mastering games such as Go and StarCraft. In addition, these methods often promise to be applicable to a wide variety of problems. A subclass of these methods are model-based methods. They leverage data to learn a model which allows predicting the evolution of a dynamical system to control. In recent research, it was shown that these methods, in contrast to model-free methods, require less data to be trained. In addition, model-based methods allow re-using the dynamics model when the task to be solved has changed, and straightforward adaptation to changes in the system’s dynamics. One particular focus of this thesis is on learning dynamics models which can data-efficiently adapt to changes in the system’s dynamics, as well as the efficient collection of data to adapt a learned model. In this regard, two novel methods are presented. In the application domain of autonomous robot navigation, in which both parameters of the robot and the terrain are subject to change, a novel method comprising an adaptive dynamics model is presented and evaluated on a simulated environment. A further advantage of model-based methods is the ability to incorporate physical prior knowledge for model design. In this thesis, we demonstrate that leveraging physical prior knowledge is advantageous for the task of tracking and predicting the motion of a table tennis ball, respecting its spin. However, model-based methods, in particular planning with learned models, have to cope with certain challenges. For long prediction horizons, which are required if the effect of an action is apparent only far in the future, model errors accumulate. In addition, model-based planning is commonly computationally intensive, which is problematic if high-frequency, reactive control is required. 
In this thesis, a method is presented to alleviate these problems. To this end, we propose a two-layered hierarchical method. Model-based planning is only applied on the higher layer on symbolic abstractions. On the lower layer, model-free reactive control is used. We show successful application of this method to board games which can only be interacted with through a robotic manipulator, e.g., a robotic arm, which requires high-frequency reactive control.