Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group




Haptic Intelligence Autonomous Learning Empirical Inference Conference Paper Adding Internal Audio Sensing to Internal Vision Enables Human-Like In-Hand Fabric Recognition with Soft Robotic Fingertips Andrussow, I., Solano, J., Richardson, B. A., Martius, G., Kuchenbecker, K. J. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (Humanoids), 373-380, Seoul, South Korea, September 2025 (Published)
Distinguishing the feel of smooth silk from coarse cotton is a trivial everyday task for humans. When exploring such fabrics, fingertip skin senses both spatio-temporal force patterns and texture-induced vibrations that are integrated to form a haptic representation of the explored material. It is challenging to reproduce this rich, dynamic perceptual capability in robots because tactile sensors typically cannot achieve both high spatial resolution and high temporal sampling rate. In this work, we present a system that can sense both types of haptic information, and we investigate how each type influences robotic tactile perception of fabrics. Our robotic hand's middle finger and thumb each feature a soft tactile sensor: one is the open-source Minsight sensor that uses an internal camera to measure fingertip deformation and force at 50 Hz, and the other is our new sensor Minsound that captures vibrations through an internal MEMS microphone with a bandwidth from 50 Hz to 15 kHz. Inspired by the movements humans make to evaluate fabrics, our robot actively encloses and rubs folded fabric samples between its two sensitive fingers. Our results test the influence of each sensing modality on overall classification performance, showing high utility for the audio-based sensor. Our transformer-based method achieves a maximum fabric classification accuracy of 97% on a dataset of 20 common fabrics. Incorporating an external microphone away from Minsound increases our method's robustness in loud ambient noise conditions. To show that this audio-visual tactile sensing approach generalizes beyond the training data, we learn general representations of fabric stretchiness, thickness, and roughness.
DOI BibTeX

Empirical Inference Autonomous Learning Conference Paper SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models Sancaktar, C., Gumbsch, C., Zadaianchuk, A., Kolev, P., Martius, G. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:52745-52777, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), International Conference on Machine Learning, July 2025 (Published) arXiv Project website URL BibTeX

Autonomous Learning Empirical Inference Conference Paper Zero-Shot Offline Imitation Learning via Optimal Transport Rupf, T., Bagatella, M., Gürtler, N., Frey, J., Martius, G. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:52345-52381, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)
Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.
arXiv URL BibTeX

Autonomous Learning Miscellaneous Emergence of natural and robust bipedal walking by learning from biologically plausible objectives Schumacher, P., Geijtenbeek, T., Caggiano, V., Kumar, V., Schmitt, S., Martius, G., Haeufle, D. F. iScience, 28(4):112203, April 2025 (Published)
Humans show unparalleled ability when maneuvering diverse terrains. While reinforcement learning (RL) has shown great promise for musculoskeletal simulation in the development of robust controllers, complex behaviors are only achievable under extensive use of motion data. We demonstrate that the combination of a recent RL algorithm with a biologically plausible reward is capable of learning controllers for 4 different musculoskeletal models and achieves locomotion with up to 90 muscles without demonstrations. Our controllers generalize to diverse and unseen terrains, while only a single adaptive objective function is needed for training. We validate our findings on four models in two different simulators. The RL agents perform robustly with complex 3D models, where reflex-controllers are difficult to apply, and produce close-to-natural motion. This is a first step for the motor control, biomechanics, and rehabilitation communities to generate complex human movements with RL, without using motion data or simple unrepresentative models.
DOI URL BibTeX

Empirical Inference Autonomous Learning Conference Paper Advancing Out-of-Distribution Detection via Local Neuroplasticity Canevaro, A., Schmidt, J., Marvi, M. S., Yu, H., Martius, G., Jordan, J. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Autonomous Learning Conference Paper On the Transfer of Object-Centric Representation Learning Didolkar, A. R., Zadaianchuk, A., Goyal, A., Mozer, M. C., Bengio, Y., Martius*, G., Seitzer*, M. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) URL BibTeX

Haptic Intelligence Autonomous Learning Empirical Inference Miscellaneous Demonstration: Minsight - A Soft Vision-Based Tactile Sensor for Robotic Fingertips Andrussow, I., Sun, H., Martius, G., Kuchenbecker, K. J. Hands-on demonstration presented at the Conference on Robot Learning (CoRL), Munich, Germany, November 2024 (Published)
Beyond vision and hearing, tactile sensing enhances a robot's ability to dexterously manipulate unfamiliar objects and safely interact with humans. Giving touch sensitivity to robots requires compact, robust, affordable, and efficient hardware designs, especially for high-resolution tactile sensing. We present a soft vision-based tactile sensor engineered to meet these requirements. Comparable in size to a human fingertip, Minsight uses machine learning to output high-resolution directional contact force distributions at 60 Hz. Minsight's tactile force maps enable precise sensing of fingertip contacts, which we use in this hands-on demonstration to allow a 3-DoF robot arm to physically track contact with a user's finger. While observing the colorful image captured by Minsight's internal camera, attendees can experience how its ability to detect delicate touches in all directions facilitates real-time robot interaction.
BibTeX

Autonomous Learning Conference Paper Active Fine-Tuning of Generalist Policies Bagatella, M., Hübotter, J., Martius, G., Krause, A. October 2024 (Submitted) BibTeX

Autonomous Learning Robotics Conference Paper Learning Diverse Skills for Local Navigation under Multi-constraint Optimality Cheng, J., Vlastelica, M., Kolev, P., Li, C., Martius, G. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 5083-5089, October 2024 (Published)
Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.
Website DOI URL BibTeX

Empirical Inference Autonomous Learning Conference Paper Learning to Control Emulated Muscles in Real Robots: A Software Test Bed for Bio-Inspired Actuators in Hardware Schumacher, P., Krause, L., Schneider, J., Büchler, D., Martius, G., Haeufle, D. In Proceedings 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob), 806-813, IEEE, 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob), September 2024 (Published) arXiv DOI URL BibTeX

Autonomous Learning Robotics Article Identifying Terrain Physical Parameters from Vision: Towards Physical-Parameter-Aware Locomotion and Navigation Chen, J., Frey, J., Zhou, R., Miki, T., Martius, G., Hutter, M. IEEE Robotics and Automation Letters, 9(11):9279-9286, August 2024 (Published)
Identifying the physical properties of the surrounding environment is essential for robotic locomotion and navigation to deal with non-geometric hazards, such as slippery and deformable terrains. It would be of great benefit for robots to anticipate these extreme physical properties before contact; however, estimating environmental physical parameters from vision is still an open challenge. Animals can achieve this by using their prior experience and knowledge of what they have seen and how it felt. In this work, we propose a cross-modal self-supervised learning framework for vision-based environmental physical parameter estimation, which paves the way for future physical-property-aware locomotion and navigation. We bridge the gap between existing policies trained in simulation and identification of physical terrain parameters from vision. We propose to train a physical decoder in simulation to predict friction and stiffness from multi-modal input. The trained network allows the labeling of real-world images with physical parameters in a self-supervised manner to further train a visual network during deployment, which can densely predict the friction and stiffness from image data. We validate our physical decoder in simulation and the real world using a quadruped ANYmal robot, outperforming an existing baseline method. We show that our visual network can predict the physical properties in indoor and outdoor experiments while allowing fast adaptation to new environments.
DOI URL BibTeX

Autonomous Learning Miscellaneous Directed Exploration in Reinforcement Learning from Linear Temporal Logic Bagatella, M., Krause, A., Martius, G. August 2024 (In revision)
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning, as it allows describing objectives beyond the expressivity of conventional discounted return formulations. Nonetheless, recent works have shown that LTL formulas can be translated into a variable rewarding and discounting scheme, whose optimization produces a policy maximizing a lower bound on the probability of formula satisfaction. However, the synthesized reward signal remains fundamentally sparse, making exploration challenging. We aim to overcome this limitation, which can prevent current algorithms from scaling beyond low-dimensional, short-horizon problems. We show how better exploration can be achieved by further leveraging the LTL specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process, thus enabling a form of high-level value estimation. By taking a Bayesian perspective over LDBA dynamics and proposing a suitable prior distribution, we show that the values estimated through this procedure can be treated as a shaping potential and mapped to informative intrinsic rewards. Empirically, we demonstrate applications of our method from tabular settings to high-dimensional continuous systems, which have so far represented a significant challenge for LTL-based reinforcement learning algorithms.
URL BibTeX
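The shaping idea sketched in the abstract above follows the classic potential-based scheme of Ng et al. (1999). As an illustration (the potential values and function name below are hypothetical, not taken from the paper), turning an automaton-level value estimate Φ into a dense intrinsic reward can look like this:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping (Ng et al., 1999): adding
    gamma * phi(s') - phi(s) densifies a sparse reward while leaving
    the set of optimal policies unchanged."""
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical potential over three automaton states, higher when fewer
# accepting transitions of the LDBA remain to be taken.
phi = {0: 0.0, 1: 0.5, 2: 1.0}.get

print(shaped_reward(0.0, 0, 1, phi))  # a zero env reward becomes 0.495
```

Because the shaping term telescopes along trajectories, this densification is safe: it changes the learning signal, not the task.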

Autonomous Learning Conference Paper Zero-Shot Object-Centric Representation Learning Didolkar, A., Zadaianchuk, A., Goyal, A., Mozer, M., Bengio, Y., Martius, G., Seitzer, M. August 2024 (Accepted)
The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.
URL BibTeX

Autonomous Learning Article Sensing multi-directional forces at superresolution using taxel value isoline theory Sun, H., Spiers, A., Lee, H., Fiene, J., Martius, G. August 2024 (Accepted)
Robots can benefit from a good sense of touch to perceive their interaction with the world. However, contacts are complex phenomena that involve tactile sensing devices, contact objects, and the complex directional (normal and shear) force motions in-between. To advance tactile sensor research, we propose a comprehensive theory that unites these components, providing insights for sensor designs, explaining performance drops due to shear forces, and suggesting application scenarios with various contact objects. Our theory, based on sensor isolines, achieves superresolution sensing performance using only a few sensing units, avoiding the need for dense layouts. Through analysis of the sensor perception field and force sensitivity from a structural perspective, along with the influences of contact object sizes, we also explore the effects of different force directions: normal, tangential shear, and radial shear forces. The theoretical model covers all these aspects and predicts a system-level inherent accuracy loss introduced by shear forces compared to pure normal forces. To validate our theory, we developed Barodome, a 3D sensor capable of predicting contact locations and decoupling shear forces from normal forces. The sensor's performance confirms the significant impact of shear forces on performance, alongside normal forces. The observed 0.5 mm drop in the real sensor's performance (normal and shear forces) closely matches the theoretical prediction of 0.33 mm. Overall, our theory offers valuable guidance for future tactile sensor designs, informing various design choices and enhancing the development of advanced robotic touch …
URL BibTeX

Autonomous Learning Conference Paper Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints Kolev, P., Vlastelica, M., Martius, G. In Seventeenth European Workshop on Reinforcement Learning, August 2024 (Accepted)
While many algorithms for diversity maximization under imitation constraints are online in nature, many applications require offline algorithms without environment interactions. Tackling this problem in the offline setting, however, presents significant challenges that require non-trivial, multi-stage optimization processes with non-stationary rewards. In this work, we present a novel offline algorithm that enhances diversity using an objective based on Van der Waals (VdW) force and successor features, and eliminates the need to learn a previously used skill discriminator. Moreover, by conditioning the value function and policy on a pre-trained Functional Reward Encoding (FRE), our method allows for better handling of non-stationary rewards and provides zero-shot recall of all skills encountered during training, significantly expanding the set of skills learned in prior work. Consequently, our algorithm benefits from receiving a consistently strong diversity signal (VdW), and enjoys more stable and efficient training. We demonstrate the effectiveness of our method in generating diverse skills for two robotic tasks in simulation: locomotion of a quadruped and local navigation with obstacle traversal.
URL BibTeX

Autonomous Learning Conference Paper Causal Action Influence Aware Counterfactual Data Augmentation Urpi, N. A., Bagatella, M., Vlastelica, M., Martius, G. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:1709-1729, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Autonomous Learning Conference Paper LPGD: A General Framework for Backpropagation through Embedded Optimization Layers Paulus, A., Martius, G., Musil, V. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:39989-40014, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Autonomous Learning Conference Paper Learning with 3D rotations, a hitchhiker’s guide to SO(3) Geist, A. R., Frey, J., Zhobro, M., Levina, A., Martius, G. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:15331-15350, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)
Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.
URL BibTeX
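As a concrete example of the kind of guidance such a survey covers: a widely used continuous alternative to Euler angles and quaternions is the 6D representation, which recovers a rotation matrix from two raw columns via Gram-Schmidt orthonormalization. A minimal NumPy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def rotation_from_6d(x):
    """Recover a rotation matrix from the continuous 6D representation
    (two raw matrix columns) via Gram-Schmidt orthonormalization."""
    a, b = x[:3], x[3:]
    r1 = a / np.linalg.norm(a)          # normalize the first column
    b = b - np.dot(r1, b) * r1          # project out the r1 component
    r2 = b / np.linalg.norm(b)          # orthonormal second column
    r3 = np.cross(r1, r2)               # right-handed third column
    return np.stack([r1, r2, r3], axis=1)

R = rotation_from_6d(np.array([1.0, 0.1, 0.0, 0.0, 1.0, 0.2]))
# R is orthogonal (R.T @ R = I) with determinant +1, i.e. R lies in SO(3)
```

The map is continuous in the 6D input, which is exactly the property that makes it friendlier to gradient-based optimization than discontinuous parameterizations.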

Autonomous Learning Conference Paper Modelling Microbial Communities with Graph Neural Networks Ruaud, A., Sancaktar, C., Bagatella, M., Ratzke, C., Martius, G. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:42742-42765, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Autonomous Learning Article PaSTS: An Operational Dataset for Domestic Solar Thermal Systems Ebmeier, F., Ludwig, N., Martius, G., Franz, V. H. June 2024 (Accepted)
Solar thermal systems play an important role in the decarbonization of the domestic heating sector, yet there exist no publicly available datasets of such systems. Therefore, this paper presents the PaSTS dataset, a unique collection of operational data from domestic Solar Thermal Systems (STS) manufactured by Ritter Energie and marketed under the Paradigma brand. Unlike previous research that primarily relied on simulated or unpublished experimental data, this dataset is derived from the service team at Ritter Energie, offering a realistic reflection of the challenges commonly faced in the field. This paper provides a comprehensive dataset overview, emphasizing its application in anomaly and fault detection tasks within STS and establishes the dataset as the first of its kind. Given the inherent complexities of fault detection in STS, we elaborate on the expert system-based fault detection mechanism currently in …
URL BibTeX

Autonomous Learning Conference Paper Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks Khajehabdollahi, S., Zeraati, R., Giannakakis, E., Schäfer, T. J., Martius, G., Levina, A. In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 (Published) URL BibTeX

Autonomous Learning Conference Paper Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics Gumbsch, C., Sajid, N., Martius, G., Butz, M. V. In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 URL BibTeX

Empirical Inference Autonomous Learning Conference Paper Multi-View Causal Representation Learning with Partial Observability Yao, D., Xu, D., Lachapelle, S., Magliacane, S., Taslakian, P., Martius, G., von Kügelgen, J., Locatello, F. The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published) arXiv BibTeX

Autonomous Learning Conference Paper Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision Mattamala, M., Frey, J., Libera, P., Chebrolu, N., Martius, G., Cadena, C., Hutter, M., Fallon, M. April 2024 (Accepted)
Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes. In this work, we present Wild Visual Navigation (WVN), an online self-supervised learning system for visual traversability estimation. The system is able to continuously adapt from a short human demonstration in the field, only using onboard sensing and computing. One of the key ideas to achieve this is the use of high-dimensional features from pre-trained self-supervised models, which implicitly encode semantic information that massively simplifies the learning task. Further, an online supervision-generation scheme enables concurrent training and inference of the learned model in the wild. We demonstrate our approach through diverse real-world deployments in forests, parks, and grasslands. Our system is able to bootstrap the traversable terrain segmentation in less than 5 min of in-field training time, enabling the robot to navigate in complex, previously unseen outdoor terrains.
URL BibTeX

Autonomous Learning Conference Paper Generating Realistic Arm Movements in Reinforcement Learning: A Quantitative Comparison of Reward Terms and Task Requirements Charaja, J. P., Wochner, I., Schumacher, P., Ilg, W., Giese, M., Maufroy, C., Bulling, A., Schmitt, S., Martius, G., Haeufle, D. F. Proceedings 2024 10th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob), 562-568, IEEE, February 2024 (Published)
The mimicking of human-like arm movement characteristics involves the consideration of three factors during control policy synthesis: (a) chosen task requirements, (b) inclusion of noise during movement execution and (c) chosen optimality principles. Previous studies showed that when considering these factors (a-c) individually, it is possible to synthesize arm movements that either kinematically match the experimental data or reproduce the stereotypical triphasic muscle activation pattern. However, to date no quantitative comparison has been made on how realistic the arm movement generated by each factor is; as well as whether a partial or total combination of all factors results in arm movements with human-like kinematic characteristics and a triphasic muscle pattern. To investigate this, we used reinforcement learning to learn a control policy for a musculoskeletal arm model, aiming to discern which combination of factors (a-c) results in realistic arm movements according to four frequently reported stereotypical characteristics. Our findings indicate that incorporating velocity and acceleration requirements into the reaching task, employing reward terms that encourage minimization of mechanical work, hand jerk, and control effort, along with the inclusion of noise during movement, leads to the emergence of realistic human arm movements in reinforcement learning. We expect that the gained insights will help in the future to better predict desired arm movements and corrective forces in wearable assistive devices.
DOI URL BibTeX
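The reward terms listed in the abstract above combine into a single scalar objective for the RL agent. The sketch below is a hypothetical illustration of such a composite reward; the function name and weights are ours, not values from the paper:

```python
def movement_reward(pos_err, vel_err, mech_work, jerk, effort,
                    w=(1.0, 0.1, 1e-3, 1e-4, 1e-3)):
    """Hypothetical composite reward: reward meeting the position and
    velocity requirements of the reaching task while penalizing
    mechanical work, hand jerk, and control effort.
    Weights w are illustrative placeholders."""
    w_pos, w_vel, w_work, w_jerk, w_eff = w
    tracking = -(w_pos * pos_err**2 + w_vel * vel_err**2)
    penalty = w_work * abs(mech_work) + w_jerk * jerk**2 + w_eff * effort**2
    return tracking - penalty

# Perfect tracking with zero effort scores 0; any error or effort lowers it.
```

The relative weights encode the trade-off the paper studies: which combination of effort-like penalties yields kinematics and muscle patterns that match human data.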

Autonomous Learning Article Machine learning of a density functional for anisotropic patchy particles Simon, A., Weimar, J., Martius, G., Oettel, M. Journal of Chemical Theory and Computation, 2024 (Accepted)
Anisotropic patchy particles have become an archetypical statistical model system for associating fluids. Here we formulate an approach to the Kern-Frenkel model via classical density functional theory to describe the positionally and orientationally resolved equilibrium density distributions in flat wall geometries. The density functional is split into a reference part for the orientationally averaged density and an orientational part in mean-field approximation. To bring the orientational part into a kernel form suitable for machine learning techniques, an expansion into orientational invariants and the proper incorporation of single-particle symmetries is formulated. The mean-field kernel is constructed via machine learning on the basis of hard wall simulation data. Results are compared to the well-known random-phase approximation which strongly underestimates the orientational correlations close to the wall. Successes and shortcomings of the mean-field treatment of the orientational part are highlighted and perspectives are given for attaining a full density functional via machine learning.
DOI URL BibTeX

Autonomous Learning Conference Paper SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models Sancaktar, C., Gumbsch, C., Zadaianchuk, A., Kolev, P., Martius, G. In The Training Agents with Foundation Models Workshop at RLC, 2024 (Published)
Exploring useful behavior is a keystone of reinforcement learning (RL). Existing approaches to intrinsic motivation, following general principles such as information gain, mostly uncover low-level interactions. In contrast, children’s play suggests that they engage in semantically meaningful high-level behavior by imitating or interacting with their caregivers. Recent work has focused on using foundation models to inject these semantic biases into exploration. However, these methods often rely on unrealistic assumptions, such as environments already embedded in language or access to high-level actions. To bridge this gap, we propose SEmaNtically Sensible ExploratIon (Sensei), a framework to equip model-based RL agents with intrinsic motivation for semantically meaningful behavior. To do so, we distill an intrinsic reward signal of interestingness from Vision Language Model (VLM) annotations. The agent learns to predict and maximize these intrinsic rewards using a world model learned directly from intrinsic rewards, image observations, and low-level actions. We show that in both robotic and video game-like simulations Sensei manages to discover a variety of meaningful behaviors. We believe Sensei provides a general tool for integrating feedback from foundation models into autonomous agents, a crucial research direction as openly available VLMs become more powerful.
URL BibTeX

Autonomous Learning Conference Paper On Imitation in Mean-field Games Ramponi, G., Kolev, P., Olivier, P., He, N., Laurière, M., Geist, M. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 1-12, Curran Associates Inc., NeurIPS, December 2023 (Published)
We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs presents new challenges compared to single-agent IL, particularly when both the reward function and the transition kernel depend on the population distribution. In this paper, departing from the existing literature on IL for MFGs, we introduce a new solution concept called the Nash imitation gap. Then we show that when only the reward depends on the population distribution, IL in MFGs can be reduced to single-agent IL with similar guarantees. However, when the dynamics is population-dependent, we provide a novel upper-bound that suggests IL is harder in this setting. To address this issue, we propose a new adversarial formulation where the reinforcement learning problem is replaced by a mean-field control (MFC) problem, suggesting progress in IL within MFGs may have to build upon MFC.
DOI URL BibTeX

Autonomous Learning Conference Paper Goal-conditioned Offline Planning from Curious Exploration Bagatella, M., Martius, G. In Advances in Neural Information Processing Systems 36, December 2023 (Published)
Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.
URL BibTeX

Autonomous Learning Conference Paper Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities Zadaianchuk, A., Seitzer, M., Martius, G. In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), Advances in Neural Information Processing Systems 36, December 2023
Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.
arXiv Website OpenReview URL BibTeX
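One ingredient of the approach above is a patch-wise similarity target computed across adjacent frames. As a minimal illustrative sketch (not the paper's implementation; the function name and array shapes are our assumptions), the similarity matrix between self-supervised patch features at times t and t+1 could be computed as:

```python
import numpy as np

def temporal_feature_similarity(feats_t, feats_t1):
    # Cosine similarity between every patch feature at time t and every
    # patch feature at time t+1; rows and columns index image patches.
    a = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    b = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    return a @ b.T  # shape: (num_patches_t, num_patches_t1)
```

In the loss described above, such a matrix would serve as the prediction target that encodes semantic and temporal correlations between patches.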

Autonomous Learning Conference Paper Improving Behavioural Cloning with Positive Unlabeled Learning Wang, Q., McCarthy, R., Bulens, D. C., McGuinness, K., O’Connor, N. E., Sanchez, F. R., Gürtler, N., Widmaier, F., Redmond, S. J. 7th Annual Conference on Robot Learning (CoRL), November 2023 (Accepted) BibTeX

Autonomous Learning Conference Paper Regularity as Intrinsic Reward for Free Play Sancaktar, C., Piater, J., Martius, G. In Advances in Neural Information Processing Systems 36 (NeurIPS), September 2023 (Published)
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model’s epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
URL BibTeX
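To make the regularity objective above concrete, here is a hypothetical sketch of one way such a reward could be operationalized: as the negative entropy of discretized pairwise object distances, so that layouts in which the same relative distance repeats often score higher. The choice of relation (pairwise distance) and the binning are our assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from collections import Counter

def regularity_reward(positions, bin_size=0.1):
    # Negative entropy of discretized pairwise distances: the more often the
    # same relative distance repeats, the higher (closer to zero) the reward.
    n = len(positions)
    bins = [round(float(np.linalg.norm(positions[i] - positions[j])) / bin_size)
            for i in range(n) for j in range(i + 1, n)]
    counts = np.array(list(Counter(bins).values()), dtype=float)
    p = counts / counts.sum()
    return float(np.sum(p * np.log(p)))  # <= 0; zero for perfectly regular layouts
```

Under this sketch, an equilateral triangle of objects (all pairwise distances equal) attains the maximal reward of zero, while irregular scatters are penalized.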

Haptic Intelligence Autonomous Learning Empirical Inference Article Minsight: A Fingertip-Sized Vision-Based Tactile Sensor for Robotic Manipulation Andrussow, I., Sun, H., Kuchenbecker, K. J., Martius, G. Advanced Intelligent Systems, 5(8):2300042, August 2023, Inside back cover, DOI: 10.1002/aisy.202370035 (Published)
Intelligent interaction with the physical world requires perceptual abilities beyond vision and hearing; vibrant tactile sensing is essential for autonomous robots to dexterously manipulate unfamiliar objects or safely contact humans. Therefore, robotic manipulators need high-resolution touch sensors that are compact, robust, inexpensive, and efficient. The soft vision-based haptic sensor presented herein is a miniaturized and optimized version of the previously published sensor Insight. Minsight has the size and shape of a human fingertip and uses machine learning methods to output high-resolution maps of 3D contact force vectors at 60 Hz. Experiments confirm its excellent sensing performance, with a mean absolute force error of 0.07 N and contact location error of 0.6 mm across its surface area. Minsight's utility is shown in two robotic tasks on a 3-DoF manipulator. First, closed-loop force control enables the robot to track the movements of a human finger based only on tactile data. Second, the informative value of the sensor output is shown by detecting whether a hard lump is embedded within a soft elastomer with an accuracy of 98%. These findings indicate that Minsight can give robots the detailed fingertip touch sensing needed for dexterous manipulation and physical human–robot interaction.
DOI BibTeX

Autonomous Learning Article Offline Diversity Maximization under Imitation Constraints Marin, V., Jin, C., Martius, G., Kolev, P. Reinforcement Learning Journal, 3:1377-1409, July 2023 (Published)
There has been significant recent progress in the area of unsupervised skill discovery, utilizing various information-theoretic objectives as measures of diversity. Despite these advances, challenges remain: current methods require significant online interaction, fail to leverage vast amounts of available task-agnostic data and typically lack a quantitative measure of skill utility. We address these challenges by proposing a principled offline algorithm for unsupervised skill discovery that, in addition to maximizing diversity, ensures that each learned skill imitates state-only expert demonstrations to a certain degree. Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery to maximize a mutual information objective subject to KL-divergence state occupancy constraints. Furthermore, we demonstrate the effectiveness of our method on the standard offline benchmark D4RL and on a custom offline dataset collected from a 12-DoF quadruped robot for which the policies trained in simulation transfer well to the real robotic system.
Website DOI URL BibTeX

Autonomous Learning Conference Paper Backpropagation through Combinatorial Algorithms: Identity with Projection Works Sahoo, S., Paulus, A., Vlastelica, M., Musil, V., Kuleshov, V., Martius, G. In Proceedings of the Eleventh International Conference on Learning Representations, May 2023 (Accepted)
Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined; a meaningful replacement is therefore crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to continuous problems, or interpolating the loss landscape with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach that exploits the geometry of the discrete solution space by treating the solver as a negative identity on the backward pass, and we further provide a theoretical justification. Our experiments demonstrate that this straightforward, hyper-parameter-free approach competes with previous, more complex methods on numerous tasks, such as backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we substitute the previously proposed problem-specific and label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.
OpenReview arXiv PDF URL BibTeX
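The negative-identity idea above can be illustrated with a toy solver. This is a hypothetical minimal sketch, not the paper's code: in practice the rule would be implemented as a custom autograd function in a deep learning framework, and the argmin stands in for an arbitrary combinatorial solver.

```python
import numpy as np

def solver_forward(cost):
    # A combinatorial "solver": hard argmin returned as a one-hot indicator.
    # Its true derivative is zero almost everywhere (and undefined at ties).
    y = np.zeros_like(cost)
    y[np.argmin(cost)] = 1.0
    return y

def solver_backward(grad_output):
    # Negative-identity surrogate: propagate -dL/dy as if it were dL/dcost,
    # so gradient descent lowers the cost of solutions the loss favors.
    return -grad_output
```

A downstream loss that wants the first element selected produces a negative gradient on that element's cost, pushing the solver toward choosing it on the next forward pass.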

Autonomous Learning Empirical Inference Conference Paper Benchmarking Offline Reinforcement Learning on Real-Robot Hardware Gürtler, N., Blaes, S., Kolev, P., Widmaier, F., Wüthrich, M., Bauer, S., Schölkopf, B., Martius, G. In Proceedings of the Eleventh International Conference on Learning Representations, The Eleventh International Conference on Learning Representations (ICLR), May 2023 (Published)
Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
Website arXiv Code URL BibTeX

Autonomous Learning Empirical Inference Conference Paper DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems Schumacher, P., Haeufle, D. F., Büchler, D., Schmitt, S., Martius, G. In The Eleventh International Conference on Learning Representations (ICLR), May 2023 (Published)
Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous movements despite their vast number of muscles. Reinforcement learning (RL) on large musculoskeletal models, however, has not been able to show similar performance. We conjecture that ineffective exploration in large overactuated action spaces is a key problem. This is supported by our finding that common exploration noise strategies are inadequate in synthetic examples of overactuated systems. We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space covering exploration within seconds of interaction. By integrating DEP into RL, we achieve fast learning of reaching and locomotion in musculoskeletal systems, outperforming current approaches in all considered tasks in sample efficiency and robustness.
arXiv PDF Website URL BibTeX

Autonomous Learning Empirical Inference Conference Paper Bridging the Gap to Real-World Object-Centric Learning Seitzer, M., Horn, M., Zadaianchuk, A., Zietlow, D., Xiao, T., Simon-Gabriel, C., He, T., Zhang, Z., Schölkopf, B., Brox, T., Locatello, F. In Proceedings of the Eleventh International Conference on Learning Representations, The Eleventh International Conference on Learning Representations (ICLR), May 2023 (Published)
Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature.
Code Website URL BibTeX

Autonomous Learning Conference Paper Efficient Learning of High Level Plans from Play Armengol Urpi, N., Bagatella, M., Hilliges, O., Martius, G., Coros, S. In International Conference on Robotics and Automation, May 2023 (Accepted)
Real-world robotic manipulation tasks remain an elusive challenge, since they involve both fine-grained environment interaction, as well as the ability to plan for long-horizon goals. Although deep reinforcement learning (RL) methods have shown encouraging results when planning end-to-end in high-dimensional environments, they remain fundamentally limited by poor sample efficiency due to inefficient exploration, and by the complexity of credit assignment over long horizons. In this work, we present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL to achieve long-horizon complex manipulation tasks. We leverage task-agnostic play data to learn a discrete behavioral prior over object-centric primitives, modeling their feasibility given the current context. We then design a high-level goal-conditioned policy which (1) uses primitives as building blocks to scaffold complex long-horizon tasks and (2) leverages the behavioral prior to accelerate learning. We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks and learns policies that can be easily transferred to physical hardware.
arXiv Website Poster BibTeX

Autonomous Learning Conference Paper Pink Noise Is All You Need: Colored Noise Exploration in Deep Reinforcement Learning Eberhard, O., Hollenstein, J., Pinneri, C., Martius, G. In Proceedings of the Eleventh International Conference on Learning Representations, The Eleventh International Conference on Learning Representations (ICLR), May 2023
In off-policy deep reinforcement learning with continuous action spaces, exploration is often implemented by injecting action noise into the action selection process. Popular algorithms based on stochastic policies, such as SAC or MPO, inject white noise by sampling actions from uncorrelated Gaussian distributions. In many tasks, however, white noise does not provide sufficient exploration, and temporally correlated noise is used instead. A common choice is Ornstein-Uhlenbeck (OU) noise, which is closely related to Brownian motion (red noise). Both red noise and white noise belong to the broad family of colored noise. In this work, we perform a comprehensive experimental evaluation on MPO and SAC to explore the effectiveness of other colors of noise as action noise. We find that pink noise, which is halfway between white and red noise, significantly outperforms white noise, OU noise, and other alternatives on a wide range of environments. Thus, we recommend it as the default choice for action noise in continuous control.
URL BibTeX
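The colored-noise family discussed above is parameterized by the spectral exponent beta, with white noise at beta = 0, pink noise at beta = 1, and red (Brownian/OU-like) noise at beta = 2. A common way to sample such a sequence is to shape the spectrum of Gaussian white noise; the sketch below (our illustration, not the paper's reference implementation) does this with a real FFT:

```python
import numpy as np

def colored_noise(beta, n_steps, rng):
    """Sample a temporally correlated noise sequence with a 1/f^beta spectrum.

    beta = 0: white noise; beta = 1: pink noise; beta = 2: red noise.
    """
    white = np.fft.rfft(rng.standard_normal(n_steps))
    freqs = np.fft.rfftfreq(n_steps)
    freqs[0] = freqs[1]  # avoid dividing by zero at the DC component
    noise = np.fft.irfft(white / freqs ** (beta / 2), n=n_steps)
    return noise / noise.std()  # unit variance, comparable to white action noise
```

In an off-policy agent, one such sequence per action dimension would replace the i.i.d. Gaussian samples normally added to the policy's actions during exploration.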

Autonomous Learning Haptic Intelligence Empirical Inference Article Predicting the Force Map of an ERT-Based Tactile Sensor Using Simulation and Deep Networks Lee, H., Sun, H., Park, H., Serhat, G., Javot, B., Martius, G., Kuchenbecker, K. J. IEEE Transactions on Automation Science and Engineering, 20(1):425-439, January 2023 (Published)
Electrical resistance tomography (ERT) can be used to create large-scale soft tactile sensors that are flexible and robust. Good performance requires a fast and accurate mapping from the sensor's sequential voltage measurements to the distribution of force across its surface. However, particularly with multiple contacts, this task is challenging for both previously developed approaches: physics-based modeling and end-to-end data-driven learning. Some promising results were recently achieved using sim-to-real transfer learning, but estimating multiple contact locations and accurate contact forces remains difficult because simulations tend to be less accurate with a high number of contact locations and/or high force. This paper introduces a modular hybrid method that combines simulation data synthesized from an electromechanical finite element model with real measurements collected from a new ERT-based tactile sensor. We use about 290,000 simulated and 90,000 real measurements to train two deep neural networks: the first (Transfer-Net) captures the inevitable gap between simulation and reality, and the second (Recon-Net) reconstructs contact forces from voltage measurements. The number of contacts, contact locations, force magnitudes, and contact diameters are evaluated for a manually collected multi-contact dataset of 150 measurements. Our modular pipeline's results outperform predictions by both a physics-based model and end-to-end learning.
DOI BibTeX

Autonomous Learning Article Discovering causal relations and equations from data Camps-Valls, G., Gerhardus, A., Ninad, U., Varando, G., Martius, G., Balaguer-Ballester, E., Vinuesa, R., Diaz, E., Zanna, L., Runge, J. Physics Reports, 1044:1-68, 2023
Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws, and principles that are invariant, robust, and causal has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventions on the system under study. With the advent of big data and data-driven methods, the fields of causal and equation discovery have developed and accelerated progress in computer science, physics, statistics, philosophy, and many applied fields. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for data-driven causal and equation discovery, point out connections, and showcase comprehensive case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised by the efficient exploitation of observational data and simulations, modern machine learning algorithms, and their combination with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.
DOI BibTeX

Autonomous Learning Article Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition Franca, F. D., Virgolin, M., Kommenda, M., Majumder, M., Cranmer, M., Espada, G., Ingelse, L., Fonseca, A., Landajuela, M., Petersen, B., Glatt, R., Mundhenk, N., Lee, C., Hochhalter, J., Randall, D., Kamienny, P., Zhang, H., Dick, G., Simon, A., Burlacu, B., et al. arXiv, 2023 URL BibTeX

Haptic Intelligence Autonomous Learning Empirical Inference Miscellaneous A Sequential Group VAE for Robot Learning of Haptic Representations Richardson, B. A., Kuchenbecker, K. J., Martius, G. 1-11, Workshop paper (8 pages) presented at the CoRL Workshop on Aligning Robot Representations with Humans, Auckland, New Zealand, December 2022 (Published)
Haptic representation learning is a difficult task in robotics because information can be gathered only by actively exploring the environment over time, and because different actions elicit different object properties. We propose a Sequential Group VAE that leverages object persistence to learn and update latent general representations of multimodal haptic data. As a robot performs sequences of exploratory procedures on an object, the model accumulates data and learns to distinguish between general object properties, such as size and mass, and trial-to-trial variations, such as initial object position. We demonstrate that after very few observations, the general latent representations are sufficiently refined to accurately encode many haptic object properties.
URL BibTeX

Autonomous Learning Conference Paper Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation Sancaktar, C., Blaes, S., Martius, G. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 24170-24183, Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022 (Published) arXiv Videos OpenReview URL BibTeX

Empirical Inference Autonomous Learning Robust Machine Learning Conference Paper Embrace the Gap: VAEs Perform Independent Mechanism Analysis Reizinger*, P., Gresele*, L., Brady*, J., von Kügelgen, J., Zietlow, D., Schölkopf, B., Martius, G., Brendel, W., Besserve, M. Advances in Neural Information Processing Systems (NeurIPS 2022), 35:12040-12057, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022, *equal first authorship (Published) arXiv PDF URL BibTeX

Autonomous Learning Conference Paper Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations Li, C., Vlastelica, M., Blaes, S., Frey, J., Grimminger, F., Martius, G. Proceedings of the 6th Conference on Robot Learning (CoRL), Conference on Robot Learning (CoRL), December 2022 (Accepted)
Learning agile skills is one of the main challenges in robotics. To this end, reinforcement learning approaches have achieved impressive results. These methods require explicit task information in terms of a reward function or an expert that can be queried in simulation to provide a target control output, which limits their applicability. In this work, we propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations, enabling successful skill acquisition where reference or expert demonstrations are not easily accessible. Moreover, we show that by using a Wasserstein GAN formulation and transitions from demonstrations with rough and partial information as input, we are able to extract policies that are robust and capable of imitating demonstrated behaviors. Finally, the obtained skills, such as a backflip, are tested on an agile quadruped robot called Solo 8 and faithfully replicate the hand-held human demonstrations.
arXiv Videos Project URL BibTeX