Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group



Haptic Intelligence Robotics Embodied Vision Conference Paper ISyHand: A Dexterous Multi-finger Robot Hand with an Articulated Palm Richardson, B. A., Grüninger, F., Mack, L., Stueckler, J., Kuchenbecker, K. J. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (Humanoids), 720-727, Seoul, South Korea, September 2025. Benjamin A. Richardson, Felix Grüninger, and Lukas Mack contributed equally to this publication. (Published)

Haptic Intelligence Embodied Vision Robotics Conference Paper Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing Mack, L., Grüninger, F., Richardson, B. A., Lendway, R., Kuchenbecker, K. J., Stueckler, J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 12401-12407, Atlanta, USA, May 2025 (Published)
Accurate 3D pose estimation of grasped objects is an important prerequisite for robots to perform assembly or in-hand manipulation tasks, but object occlusion by the robot's own hand greatly increases the difficulty of this perceptual task. Here, we propose that combining visual information with binary, low-resolution tactile contact measurements from across the interior surface of an articulated robotic hand can mitigate this issue. The visuo-tactile object-pose-estimation problem is formulated probabilistically in a factor graph. The pose of the object is optimized to align with the two kinds of measurements using a robust cost function to reduce the influence of outlier readings. The advantages of the proposed approach are first demonstrated in simulation: a custom 15-DOF robot hand with one binary tactile sensor per link grasps 17 YCB objects while observed by an RGB-D camera. This low-resolution in-hand tactile sensing significantly improves object-pose estimates under high occlusion and also high visual noise. We also show these benefits through grasping tests with a preliminary real version of our tactile hand, obtaining reasonable visuo-tactile estimates of object pose at approximately 12.9 Hz on average.
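
The abstract above mentions a robust cost function that reduces the influence of outlier readings, without naming the specific loss. As a minimal sketch of the general idea, the following assumes a Huber loss and a toy 2D translation-alignment problem solved by iteratively reweighted least squares. The function names, the `delta` threshold, and the synthetic data are illustrative assumptions and are not taken from the paper, which optimizes a full object pose in a factor graph.

```python
import numpy as np

def huber_weights(residual_norms, delta):
    """IRLS weights for the Huber loss: 1 inside delta, delta/|r| outside."""
    w = np.ones_like(residual_norms)
    mask = residual_norms > delta
    w[mask] = delta / residual_norms[mask]
    return w

def robust_translation(model_pts, measured_pts, delta=0.05, iters=20):
    """Estimate a translation t that aligns model_pts + t with measured_pts
    under a Huber cost (assumed loss; toy stand-in for pose alignment)."""
    t = np.zeros(model_pts.shape[1])
    for _ in range(iters):
        residuals = measured_pts - (model_pts + t)
        w = huber_weights(np.linalg.norm(residuals, axis=1), delta)
        # For a pure translation, the weighted least-squares solution
        # is the weighted mean of the point-wise offsets.
        t = (w[:, None] * (measured_pts - model_pts)).sum(axis=0) / w.sum()
    return t

# Synthetic data: 50 points, a known offset, small noise, five gross outliers.
rng = np.random.default_rng(0)
model = rng.uniform(-1, 1, size=(50, 2))
true_t = np.array([0.3, -0.2])
measured = model + true_t + rng.normal(scale=0.01, size=model.shape)
measured[:5] += 2.0

t_ls = (measured - model).mean(axis=0)          # plain least squares
t_robust = robust_translation(model, measured)  # Huber-weighted estimate
```

In this toy setting the plain least-squares estimate is pulled toward the outliers, while the Huber-weighted estimate stays close to the true offset because each outlier's weight shrinks with its residual.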

Embodied Vision Conference Paper Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry Li, H., Stueckler, J. In 2024 IEEE International Conference on Robotics and Automation (ICRA 2024), 1631-1637, Piscataway, NJ, August 2024 (Published)
Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as a dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but can even improve tracking accuracy.
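
To illustrate what forward prediction with a single-track model under given control inputs can look like, here is a minimal sketch that integrates the textbook kinematic single-track (bicycle) model with explicit Euler steps. This is a simplification chosen for illustration, not the singularity-free, differentiable dynamics variant described in the abstract. The wheelbase value, the step size, and all function names are assumptions.

```python
import math

def single_track_step(state, v, steer, wheelbase, dt):
    """One Euler step of the kinematic single-track model.

    state = (x, y, yaw), referenced at the rear axle;
    v is the forward speed, steer the front steering angle in radians.
    """
    x, y, yaw = state
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / wheelbase * math.tan(steer) * dt
    return (x, y, yaw)

def rollout(state, controls, wheelbase=0.3, dt=0.01):
    """Predict a trajectory for a sequence of (speed, steering) controls."""
    trajectory = [state]
    for v, steer in controls:
        state = single_track_step(state, v, steer, wheelbase, dt)
        trajectory.append(state)
    return trajectory

# One second of driving straight, and one second with constant steering.
straight = rollout((0.0, 0.0, 0.0), [(1.0, 0.0)] * 100)
turning = rollout((0.0, 0.0, 0.0), [(1.0, 0.2)] * 100)
```

In an online-calibration setting, quantities such as `wheelbase` would be treated as parameters to estimate rather than as fixed constants.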

Embodied Vision Conference Paper Analytical Uncertainty-Based Loss Weighting in Multi-Task Learning Kirchdorfer, L., Elich, C., Kutsche, S., Stuckenschmidt, H., Schott, L., Köhler, J. M. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2024, to appear (To be published)

Embodied Vision Conference Paper Examining Common Paradigms in Multi-Task Learning Elich, C., Kirchdorfer, L., Köhler, J. M., Schott, L. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2024, to appear (To be published)

Embodied Vision Conference Paper Physically Plausible Object Pose Refinement in Cluttered Scenes Strecke, M., Stueckler, J. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2024, to appear (To be published)

Embodied Vision Conference Paper Physics-Based Rigid Body Object Tracking and Friction Filtering From RGB-D Videos Kandukuri, R. K., Strecke, M., Stueckler, J. In Proceedings of the International Conference on 3D Vision (3DV), 2024 (Published)
Physics-based understanding of object interactions from sensory observations is an essential capability in augmented reality and robotics. It enables capturing the properties of a scene for simulation and control. In this paper, we propose a novel approach for real-to-sim which tracks rigid objects in 3D from RGB-D images and infers physical properties of the objects. We use a differentiable physics simulation as the state-transition model in an Extended Kalman Filter, which can model contact and friction for arbitrary mesh-based shapes and in this way estimate physically plausible trajectories. We demonstrate that our approach can filter position, orientation, and velocities while concurrently estimating the coefficient of friction of the objects. We analyze our approach on various sliding scenarios in synthetic image sequences of single objects and colliding objects. We also demonstrate and evaluate our approach on a real-world dataset. We make our novel benchmark datasets publicly available to foster future research in this novel problem setting and comparison with our method.
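
The idea of filtering motion while concurrently estimating a friction coefficient can be sketched with a very small extended Kalman filter. The example below is an assumption-laden toy: a hand-written 1D sliding-block model stands in for the differentiable physics simulation, the Jacobian is derived by hand rather than by automatic differentiation, and only noisy velocity is observed. All names, noise levels, and initial values are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2

def f(x, dt):
    """Toy transition: block sliding on flat ground, state x = [velocity, friction]."""
    v, mu = x
    return np.array([max(v - mu * G * dt, 0.0), mu])

def jacobian_f(x, dt):
    """Hand-derived Jacobian of f with respect to the state."""
    v, mu = x
    if v - mu * G * dt <= 0.0:  # block at rest: velocity no longer depends on the state
        return np.array([[0.0, 0.0], [0.0, 1.0]])
    return np.array([[1.0, -G * dt], [0.0, 1.0]])

H = np.array([[1.0, 0.0]])  # only the velocity is observed

def ekf_step(x, P, z, dt, Q, R):
    """One EKF predict/update cycle for a scalar velocity measurement z."""
    F = jacobian_f(x, dt)
    x_pred = f(x, dt)
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + (K @ (z - H @ x_pred)).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# Simulate one second of sliding with noisy velocity measurements.
rng = np.random.default_rng(0)
dt, true_mu, v_true = 0.01, 0.3, 4.0
x = np.array([4.0, 0.1])          # initial guess with a wrong friction value
P = np.diag([0.1, 1.0])
Q = np.diag([1e-8, 1e-8])
R = np.array([[1e-4]])
for _ in range(100):
    v_true = max(v_true - true_mu * G * dt, 0.0)
    z = v_true + rng.normal(scale=0.01)
    x, P = ekf_step(x, P, z, dt, Q, R)

mu_estimate = x[1]
```

While the block is sliding, this toy model is linear in the state, so the filter behaves like a standard Kalman filter; the Jacobian only changes once the block comes to rest.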

Embodied Vision Learning and Dynamical Systems Empirical Inference Conference Paper Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts Achterhold, J., Tobuschat, P., Ma, H., Büchler, D., Muehlebach, M., Stueckler, J. In Conference on Learning for Dynamics and Control, 211:878-890, Proceedings of Machine Learning Research, (Editors: Nikolai Matni, Manfred Morari and George J. Pappas), PMLR, June 2023 (Published)

Embodied Vision Conference Paper Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model Guttikonda, S., Achterhold, J., Li, H., Boedecker, J., Stueckler, J. In Proceedings of the European Conference on Mobile Robots (ECMR), 2023 (Published)
In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper, we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.

Embodied Vision Conference Paper Learning-based Relational Object Matching Across Views Elich, C., Armeni, I., Oswald, M. R., Pollefeys, M., Stueckler, J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2023 (Published)
Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can benefit from reasoning on the level of objects. While keypoint-based matching can yield strong results for finding correspondences for images with small to medium view point changes, for large view point changes, matching semantically on the object-level becomes advantageous. In this paper, we propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images. We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network. We demonstrate our approach in a large variety of views on realistically rendered synthetic images. Our approach compares favorably to previous state-of-the-art object-level matching approaches and achieves improved performance over a pure keypoint-based approach for large view-point changes.

Embodied Vision Autonomous Motion Movement Generation and Control Conference Paper Visual-Inertial and Leg Odometry Fusion for Dynamic Locomotion Dhédin, V., Li, H., Khorshidi, S., Mack, L., Ravi, A. K. C., Meduri, A., Shah, P., Grimminger, F., Righetti, L., Khadiv, M., Stueckler, J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2023 (Published)
Implementing dynamic locomotion behaviors on legged robots requires a high-quality state estimation module. Especially when the motion includes flight phases, state-of-the-art approaches fail to produce reliable estimation of the robot posture, in particular base height. In this paper, we propose a novel approach for combining visual-inertial odometry (VIO) with leg odometry in an extended Kalman filter (EKF) based state estimator. The VIO module uses a stereo camera and IMU to yield low-drift 3D position and yaw orientation and drift-free pitch and roll orientation of the robot base link in the inertial frame. However, these values have a considerable amount of latency due to image processing and optimization, while the update rate is quite low, which is not suitable for low-level control. To reduce the latency, we predict the VIO state estimate at the rate of the IMU measurements of the VIO sensor. The EKF module uses the base pose and linear velocity predicted by VIO, fuses them further with a second high-rate IMU and leg odometry measurements, and produces robot state estimates with a high frequency and small latency suitable for control. We integrate this lightweight estimation framework with a nonlinear model predictive controller and show successful implementation of a set of agile locomotion behaviors, including trotting and jumping at varying horizontal speeds, on a torque-controlled quadruped robot.

Embodied Vision Conference Paper Event-based Non-Rigid Reconstruction from Contours Xue, Y., Li, H., Leutenegger, S., Stueckler, J. In Proceedings of the British Machine Vision Conference (BMVC), 2022 (Published)
Visual reconstruction of fast non-rigid object deformations over time is a challenge for conventional frame-based cameras. In this paper, we propose a novel approach for reconstructing such deformations using measurements from event-based cameras. Our approach estimates the deformation of objects from events generated at the object contour in a probabilistic optimization framework. It associates events to mesh faces on the contour and maximizes the alignment of the line of sight through the event pixel with the associated face. In experiments on synthetic and real data, we demonstrate the advantages of our method over state-of-the-art optimization and learning-based approaches for reconstructing the motion of human hands.

Embodied Vision Conference Paper Learning Temporally Extended Skills in Continuous Domains as Symbolic Actions for Planning Achterhold, J., Krimmel, M., Stueckler, J. In Proceedings of the 6th Conference on Robot Learning (CoRL), 205:225-236, Proceedings of Machine Learning Research, 2022 (Published)
Problems which require both long-horizon planning and continuous control capabilities pose significant challenges to existing reinforcement learning agents. In this paper we introduce a novel hierarchical reinforcement learning agent which links temporally extended skills for continuous control with a forward model in a symbolic discrete abstraction of the environment’s state for planning. We term our agent SEADS for Symbolic Effect-Aware Diverse Skills. We formulate an objective and corresponding algorithm which leads to unsupervised learning of a diverse set of skills through intrinsic motivation given a known state abstraction. The skills are jointly learned with the symbolic forward model which captures the effect of skill execution in the state abstraction. After training, we can leverage the skills as symbolic actions using the forward model for long-horizon planning and subsequently execute the plan using the learned continuous-action control skills. The proposed algorithm learns skills and forward models that can be used to solve complex tasks which require both continuous control and long-horizon planning capabilities with high success rate. It compares favorably with other flat and hierarchical reinforcement learning baseline agents and is successfully demonstrated with a real robot.

Embodied Vision Conference Paper DiffSDFSim: Differentiable Rigid-Body Dynamics With Implicit Shapes Strecke, M., Stückler, J. In 2021 International Conference on 3D Vision (3DV 2021), 96-105, December 2021 (Published)

Embodied Vision Conference Paper Explore the Context: Optimal Data Collection for Context-Conditional Dynamics Models Achterhold, J., Stueckler, J. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021), 130, JMLR, Cambridge, MA, April 2021, preprint CoRR abs/2102.11394 (Published)
In this paper, we learn dynamics models for parametrized families of dynamical systems with varying properties. The dynamics models are formulated as stochastic processes conditioned on a latent context variable which is inferred from observed transitions of the respective system. The probabilistic formulation allows us to compute an action sequence which, for a limited number of environment interactions, optimally explores the given system within the parametrized family. This is achieved by steering the system through transitions that are most informative for the context variable. We demonstrate the effectiveness of our method for exploration on a non-linear toy problem and two well-known reinforcement learning environments.

Embodied Vision Conference Paper Tracking 6-DoF Object Motion from Events and Frames Li, H., Stueckler, J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021 (Published)

Embodied Vision Conference Paper Where Does It End? - Reasoning About Hidden Surfaces by Object Intersection Constraints Strecke, M., Stückler, J. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 9589-9597, IEEE, Piscataway, NJ, June 2020, preprint CoRR abs/2004.04630 (Published)

Embodied Vision Conference Paper DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation Wang, R., Yang, N., Stückler, J., Cremers, D. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 11067-11073, IEEE, Piscataway, NJ, May 2020, arXiv:1904.10097 (Published)

Embodied Vision Conference Paper Learning to Adapt Multi-View Stereo by Self-Supervision Mallick, A., Stückler, J., Lensch, H. In Proceedings of the British Machine Vision Conference (BMVC), 2020, preprint https://arxiv.org/abs/2009.13278 (Published)

Autonomous Learning Embodied Vision Conference Paper Sample-efficient Cross-Entropy Method for Real-time Planning Pinneri, C., Sawant, S., Blaes, S., Achterhold, J., Stueckler, J., Rolinek, M., Martius, G. In Conference on Robot Learning 2020, 2020 (Published)
Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x fewer samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.
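
For readers unfamiliar with the Cross-Entropy Method as a trajectory optimizer, the following is a minimal sketch of the plain baseline algorithm. It does not implement the paper's additions (temporally-correlated actions and memory). The toy cost function, the population size, the elite count, and all names are assumptions made for illustration.

```python
import numpy as np

def rollout_cost(actions, x0=0.0, target=1.0):
    """Hypothetical toy task: a 1D integrator x += a.

    Cost is the squared distance of the final position to the target
    plus a small penalty on control effort. Works on a single action
    sequence or on a batch of sequences (last axis = time).
    """
    x_final = x0 + actions.sum(axis=-1)
    return (x_final - target) ** 2 + 0.01 * (actions ** 2).sum(axis=-1)

def cem_plan(cost_fn, horizon=10, pop=64, n_elite=8, iters=20, seed=0):
    """Plain CEM: sample action sequences, keep the elites, refit a Gaussian."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon)
    std = np.ones(horizon)
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, horizon))
        costs = cost_fn(samples)
        elite = samples[np.argsort(costs)[:n_elite]]
        # Refit an independent Gaussian per time step to the elite set.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean

plan = cem_plan(rollout_cost)
plan_cost = rollout_cost(plan)
zero_cost = rollout_cost(np.zeros(10))
```

Each iteration here draws independent noise per time step and starts from scratch, which is the sampling inefficiency the abstract refers to.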

Embodied Vision Conference Paper EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association Strecke, M., Stückler, J. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 5864-5873, IEEE, October 2019 (Published)

Embodied Vision Conference Paper Learning to Disentangle Latent Physical Factors for Video Prediction Zhu, D., Munderloh, M., Rosenhahn, B., Stückler, J. In Pattern Recognition - Proceedings of the German Conference on Pattern Recognition (GCPR), Springer International, September 2019 (Published)