Publications

Embodied Vision Ph.D. Thesis Investigating Shape Priors, Relationships, and Multi-Task Cues for Object-level Scene Understanding Elich, C. ETH Zürich, Zurich, 2024 (Published)
Humans are proficient at intuitively identifying objects and reasoning about their diverse properties from complex visual observations. Despite significant advances in artificial intelligence, computers have yet to achieve a comparable level of understanding, which is crucial for effective reasoning about tasks and interactions within an environment. In this thesis, we explore the benefits of various visual cues when dealing with key challenges in scene understanding, specifically focusing on weak supervision, finding view correspondence, and paradigms for simultaneously learning multiple tasks. We begin by investigating cues that reduce the need for full supervision. In particular, we propose an approach for learning multi-object 3D scene decomposition and object-wise properties from single images with only weak supervision. Our method utilizes a recurrent encoder to infer a latent representation for each object and a differentiable renderer to obtain a training signal. To guide the training process and constrain the search space of possible solutions, we leverage prior knowledge through pre-trained 3D shape spaces. Subsequently, we investigate the benefits of reasoning about relations between objects to learn more distinct object representations that allow for matching object detections across viewpoint changes. To address this, we introduce an approach that employs graph neural networks to learn matching features based on appearance as well as inter- and cross-frame relations. We conduct comparisons with keypoint-based methods and propose a methodology to combine these approaches, aiming to achieve overall improved performance. Finally, we consider the challenge of multi-task learning and analyze related paradigms in the context of basic single-task learning. 
In particular, we study the impact of the choice of optimizer, the role of gradient conflicts, and how well features learned in either setup transfer to images with common corruptions. Our findings reveal surprising similarities between single-task and multi-task learning, suggesting that methods and techniques from one field could be advantageously applied to the other.
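
The notion of a gradient conflict between tasks can be made concrete with a small sketch. It assumes a common convention, not necessarily the one used in the thesis: two task gradients with respect to shared parameters conflict when their cosine similarity is negative. The function names and the toy gradient vectors are hypothetical.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


def gradients_conflict(g1, g2):
    """Assumed convention: the gradients conflict if they point in opposing directions."""
    return cosine_similarity(g1, g2) < 0.0


# Toy gradients of two task losses with respect to the same shared parameters.
grad_task_a = [1.0, 0.0]
grad_task_b = [-1.0, 0.5]
conflict = gradients_conflict(grad_task_a, grad_task_b)
```

Under this convention, a step along `grad_task_a` alone would increase the loss of the second task to first order, which is the situation such analyses typically try to quantify.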

Embodied Vision Ph.D. Thesis Methods for Learning Adaptive and Symbolic Forward Models for Control and Planning Achterhold, J. M. Eberhard Karls Universität Tübingen, Tübingen, 2024 (Published)
Learning-based methods for sequential decision making, i.e., methods that leverage data, have proven capable of solving complex problems in recent years. Examples include the control of dynamical systems and mastering games such as Go and StarCraft. In addition, these methods often promise to be applicable to a wide variety of problems. Model-based methods form a subclass of these methods. They leverage data to learn a model that predicts the evolution of the dynamical system to be controlled. Recent research has shown that these methods require less training data than model-free methods. In addition, model-based methods allow the dynamics model to be reused when the task to be solved changes, and they can be adapted straightforwardly to changes in the system’s dynamics. One particular focus of this thesis is on learning dynamics models that adapt data-efficiently to changes in the system’s dynamics, and on the efficient collection of data for adapting a learned model. In this regard, two novel methods are presented. For autonomous robot navigation, in which parameters of both the robot and the terrain are subject to change, a novel method comprising an adaptive dynamics model is presented and evaluated in a simulated environment. A further advantage of model-based methods is the ability to incorporate physical prior knowledge into the model design. In this thesis, we demonstrate that leveraging physical prior knowledge is advantageous for tracking and predicting the motion of a table tennis ball while accounting for its spin. However, model-based methods, in particular planning with learned models, have to cope with certain challenges. For long prediction horizons, which are required if the effect of an action becomes apparent only far in the future, model errors accumulate. In addition, model-based planning is commonly computationally intensive, which is problematic if high-frequency, reactive control is required.
To alleviate these problems, this thesis proposes a two-layered hierarchical method. Model-based planning is applied only on the higher layer, on symbolic abstractions. On the lower layer, model-free reactive control is used. We show successful application of this method to board games that can only be played through a robotic manipulator, e.g., a robotic arm, which requires high-frequency reactive control.
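
The accumulation of model errors over long prediction horizons, mentioned in the abstract above, can be illustrated with a toy example. This is a minimal sketch under strong assumptions that are not taken from the thesis: a scalar linear system `x[t+1] = gain * x[t]`, and a learned model whose gain is off by one percent. All names and values are hypothetical.

```python
def rollout(x0, gain, horizon):
    """Iterate x[t+1] = gain * x[t] for `horizon` steps and return all states."""
    states = [x0]
    for _ in range(horizon):
        states.append(gain * states[-1])
    return states


def prediction_errors(x0, true_gain, learned_gain, horizon):
    """Absolute gap between the learned model's open-loop rollout and the true one."""
    true_states = rollout(x0, true_gain, horizon)
    predicted_states = rollout(x0, learned_gain, horizon)
    return [abs(p - t) for p, t in zip(predicted_states, true_states)]


# A one-percent error in the learned gain compounds with every predicted step.
errors = prediction_errors(x0=1.0, true_gain=1.0, learned_gain=1.01, horizon=50)
```

In this toy setting the one-step error is about 0.01, while the error after 50 steps is many times larger, which is one motivation for restricting model-based planning to short symbolic plans and leaving fast corrections to a reactive controller.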

Embodied Vision Ph.D. Thesis Object-Level Dynamic Scene Reconstruction With Physical Plausibility From RGB-D Images Strecke, M. F. Eberhard Karls Universität Tübingen, Tübingen, 2023 (Published)
Humans have the remarkable ability to perceive and interact with objects in the world around them. They can easily segment objects from visual data and have an intuitive understanding of how physics influences objects. By contrast, robots are so far often constrained to tailored environments for a specific task, due to their inability to reconstruct a versatile and accurate scene representation. In this thesis, we combine RGB-D video data with background knowledge of real-world physics to develop such a representation for robots.

Our contributions can be separated into two main parts: a dynamic object tracking tool and optimization frameworks that allow for improving shape reconstructions based on physical plausibility. The dynamic object tracking tool "EM-Fusion" detects, segments, reconstructs, and tracks objects from RGB-D video data. We propose a probabilistic data association approach for attributing the image pixels to the different moving objects in the scene. This allows us to track and reconstruct moving objects and the background scene with state-of-the-art accuracy and robustness towards occlusions.

We investigate two ways of further optimizing the reconstructed shapes of moving objects based on physical plausibility. The first of these, "Co-Section", includes physical plausibility by reasoning about the empty space around an object. We observe that no two objects can occupy the same space at the same time and that the depth images in the input video provide an estimate of observed empty space. Based on these observations, we propose intersection and hull constraints, which we combine with the observed surfaces in a global optimization approach. Compared to EM-Fusion, which only reconstructs the observed surface, Co-Section optimizes watertight shapes. These watertight shapes provide a rough estimate of unseen surfaces and could be useful as initialization for further refinement, e.g., by interactive perception. In the second optimization approach, "DiffSDFSim", we reason about object shapes based on physically plausible object motion. We observe that object trajectories after collisions depend on the object's shape, and extend a differentiable physics simulation for optimizing object shapes together with other physical properties (e.g., forces, masses, friction) based on the motion of the objects and their interactions. Our key contributions are using signed distance function models for representing shapes and a novel method for computing gradients that models the dependency of the time of contact on object shapes. We demonstrate that our approach recovers target shapes well by fitting to target trajectories and depth observations. Further, the ground-truth trajectories are recovered well in simulation using the resulting shape and physical properties. This enables predictions about the future motion of objects by physical simulation.

We anticipate that our contributions can be useful building blocks in the development of 3D environment perception for robots. The reconstruction of individual objects as in EM-Fusion is a key ingredient required for interactions with objects. Completed shapes such as those provided by Co-Section offer useful cues for planning interactions like grasping objects. Finally, the recovery of shape and other physical parameters using differentiable simulation as in DiffSDFSim allows simulating objects and thus predicting the effects of interactions. Future work might extend the presented work to interactive perception of dynamic environments by comparing these predictions with observed real-world interactions to further improve the reconstructions and physical parameter estimates.
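
Two ideas from the abstract above, signed distance functions as shape models and the observation that no two objects can occupy the same space, can be sketched on a toy example. The code below is an illustration only and is not the formulation used in Co-Section or DiffSDFSim: it assumes analytic spheres as shapes and a hypothetical penalty that is positive only at points lying inside both shapes.

```python
import math


def sphere_sdf(point, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(point, center) - radius


def overlap_penalty(point, sdf_a, sdf_b):
    """Hypothetical intersection penalty: nonzero only where the point is inside both shapes."""
    return max(0.0, -sdf_a(point)) * max(0.0, -sdf_b(point))


# Two unit spheres whose centers are one unit apart, so they overlap.
def shape_a(p):
    return sphere_sdf(p, (0.0, 0.0, 0.0), 1.0)


def shape_b(p):
    return sphere_sdf(p, (1.0, 0.0, 0.0), 1.0)


inside_both = overlap_penalty((0.5, 0.0, 0.0), shape_a, shape_b)
inside_a_only = overlap_penalty((-0.5, 0.0, 0.0), shape_a, shape_b)
```

Summing such a penalty over sampled points gives a quantity that an optimizer could drive towards zero. The constraints in the thesis additionally use observed empty space from depth images, which this sketch does not model.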