Publications

Departments

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group



Perceiving Systems Article Localization and recognition of human action in 3D using transformers Sun, J., Huang, L., Hongsong Wang, C. Z. J. Q., Islam, M. T., Xie, E., Zhou, B., Xing, L., Chandrasekaran, A., Black, M. J. Nature Communications Engineering, 13(125), September 2024 (Published)
Understanding a person’s behavior from their 3D motion sequence is a fundamental problem in computer vision with many applications. An important component of this problem is 3D action localization, which involves recognizing what actions a person is performing, and when the actions occur in the sequence. To promote the progress of the 3D action localization community, we introduce a new, challenging, and more complex benchmark dataset, BABEL-TAL (BT), for 3D action localization. Important baselines and evaluating metrics, as well as human evaluations, are carefully established on this benchmark. We also propose a strong baseline model, i.e., Localizing Actions with Transformers (LocATe), that jointly localizes and recognizes actions in a 3D sequence. The proposed LocATe shows superior performance on BABEL-TAL as well as on the large-scale PKU-MMD dataset, achieving state-of-the-art performance by using only 10% of the labeled training data. Our research could advance the development of more accurate and efficient systems for human behavior analysis, with potential applications in areas such as human-computer interaction and healthcare.
paper DOI BibTeX

Physics for Inference and Optimization Article Similarity and economy of scale in urban transportation networks and optimal transport-based infrastructures Leite, D., De Bacco, C. Nature Communications, September 2024 (Published)
Designing and optimizing the structure of urban transportation networks is a challenging task. In this study, we propose a method inspired by optimal transport theory to reproduce the optimal structure of public transportation networks, using little input information. Contrary to standard approaches, it does not assume any initial backbone network infrastructure, but rather extracts this directly from a continuous space using only a few origin and destination points. Analyzing a set of urban rail, tram and subway networks, we find a high degree of similarity between simulated and real infrastructures. By tuning one parameter, our method can simulate a range of different networks that can be further used to suggest possible improvements in terms of relevant transportation properties. The outputs of our algorithm naturally provide a principled quantitative measure of similarity between two networks that can be used to automate the selection of similar simulated networks.
Preprint Code Paper DOI URL BibTeX

Perceiving Systems Conference Paper AWOL: Analysis WithOut synthesis using Language Zuffi, S., Black, M. J. In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (Published)
Many classical parametric 3D shape models exist, but creating novel shapes with such models requires expert knowledge of their parameters. For example, imagine creating a specific type of tree using procedural graphics or a new kind of animal from a statistical shape model. Our key idea is to leverage language to control such existing models to produce novel shapes. This involves learning a mapping between the latent space of a vision-language model and the parameter space of the 3D model, which we do using a small set of shape and text pairs. Our hypothesis is that mapping from language to parameters allows us to generate parameters for objects that were never seen during training. If the mapping between language and parameters is sufficiently smooth, then interpolation or generalization in language should translate appropriately into novel 3D shapes. We test our approach with two very different types of parametric shape models (quadrupeds and arboreal trees). We use a learned statistical shape model of quadrupeds and show that we can use text to generate new animals not present during training. In particular, we demonstrate state-of-the-art shape estimation of 3D dogs. This work also constitutes the first language-driven method for generating 3D trees. Finally, embedding images in the CLIP latent space enables us to generate animals and trees directly from images.
Paper URL BibTeX

Perceiving Systems Article EarthRanger: An Open-Source Platform for Ecosystem Monitoring, Research, and Management Wall, J., Lefcourt, J., Jones, C., Doehring, C., O’Neill, D., Schneider, D., Steward, J., Krautwurst, J., Wong, T., Jones, B., Goodfellow, K., Schmitt, T., Gobush, K., Douglas-Hamilton, I., Pope, F., Schmidt, E., Palmer, J., Stokes, E., Reid, A., Elbroch, M. L., et al. Methods in Ecology and Evolution, 13, British Ecological Society, September 2024 (Published)
1. Effective approaches are needed to conserve the planet's remaining wildlife and wilderness landscapes, especially concerning global biodiversity conservation targets. Here, we present a new software system called EarthRanger: an open-source platform built to help monitor, research and manage ecosystems. 2. EarthRanger consists of seven main components (Core Server, API, Storage, Gundi, Web App, Mobile App, Ecoscope) that provide functionality for data (i) aggregation & collection, (ii) storage & management, (iii) real-time and post hoc analysis, (iv) visualisation and (v) dissemination. The mobile application provides field-based data recording and visualisation tools. EarthRanger may be deployed for single project use or can aggregate across multiple geographies as a centralised hub. EarthRanger can be used to collect standardised tracking data (e.g. from wildlife collars, vehicles and ranger patrols) and configurable event information (e.g. a singular recording with associated user-defined attribute information such as a wildlife sighting or encounter with a poacher). 3. Since development began in 2015, the platform has (at the time of writing) been deployed at over 500 sites across 70 countries and with myriad configurations and objectives. EarthRanger has improved the ability to monitor data feeds and manage conservation-related operations in real time. For instance, the deployment of EarthRanger by African Parks has led to the removal of over 50,000 snares, steady population growth of key species of concern and near cessation of poaching. In Liwonde's protected area, enhanced mitigation efforts supported by EarthRanger reduced the number of deaths from wildlife conflict by more than 91%. EarthRanger is also providing a platform to enhance standardisation, aggregation, transfer and long-term storage of ecological information and promote collaboration between groups conducting protected area management and ecology and biodiversity research.
pdf DOI BibTeX

Perceiving Systems Conference Paper GraspXL: Generating Grasping Motions for Diverse Objects at Scale Zhang, H., Christen, S., Fan, Z., Hilliges, O., Song, J. In European Conference on Computer Vision (ECCV 2024), Part XXVI:386-403, LNCS, Springer Cham, September 2024 (Published) Code Video Paper DOI URL BibTeX

Autonomous Learning Robotics Article Identifying Terrain Physical Parameters from Vision - Towards Physical-Parameter-Aware Locomotion and Navigation Chen, J., Frey, J., Zhou, R., Miki, T., Martius, G., Hutter, M. IEEE Robotics and Automation Letters, 9(11):9279-9286, August 2024 (Published)
Identifying the physical properties of the surrounding environment is essential for robotic locomotion and navigation to deal with non-geometric hazards, such as slippery and deformable terrains. It would be of great benefit for robots to anticipate these extreme physical properties before contact; however, estimating environmental physical parameters from vision is still an open challenge. Animals can achieve this by using their prior experience and knowledge of what they have seen and how it felt. In this work, we propose a cross-modal self-supervised learning framework for vision-based environmental physical parameter estimation, which paves the way for future physical-property-aware locomotion and navigation. We bridge the gap between existing policies trained in simulation and identification of physical terrain parameters from vision. We propose to train a physical decoder in simulation to predict friction and stiffness from multi-modal input. The trained network allows the labeling of real-world images with physical parameters in a self-supervised manner to further train a visual network during deployment, which can densely predict the friction and stiffness from image data. We validate our physical decoder in simulation and the real world using a quadruped ANYmal robot, outperforming an existing baseline method. We show that our visual network can predict the physical properties in indoor and outdoor experiments while allowing fast adaptation to new environments.
DOI URL BibTeX

Autonomous Learning Miscellaneous Directed Exploration in Reinforcement Learning from Linear Temporal Logic Bagatella, M., Krause, A., Martius, G. August 2024 (In revision)
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning, as it allows describing objectives beyond the expressivity of conventional discounted return formulations. Nonetheless, recent works have shown that LTL formulas can be translated into a variable rewarding and discounting scheme, whose optimization produces a policy maximizing a lower bound on the probability of formula satisfaction. However, the synthesized reward signal remains fundamentally sparse, making exploration challenging. We aim to overcome this limitation, which can prevent current algorithms from scaling beyond low-dimensional, short-horizon problems. We show how better exploration can be achieved by further leveraging the LTL specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process, thus enabling a form of high-level value estimation. By taking a Bayesian perspective over LDBA dynamics and proposing a suitable prior distribution, we show that the values estimated through this procedure can be treated as a shaping potential and mapped to informative intrinsic rewards. Empirically, we demonstrate applications of our method from tabular settings to high-dimensional continuous systems, which have so far represented a significant challenge for LTL-based reinforcement learning algorithms.
URL BibTeX

Autonomous Learning Conference Paper Zero-Shot Object-Centric Representation Learning Didolkar, A., Zadaianchuk, A., Goyal, A., Mozer, M., Bengio, Y., Martius, G., Seitzer, M. August 2024 (Accepted)
The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.
URL BibTeX

Autonomous Learning Article Sensing multi-directional forces at superresolution using taxel value isoline theory Sun, H., Spiers, A., Lee, H., Fiene, J., Martius, G. August 2024 (Accepted)
Robots can benefit from a good sense of touch to perceive their interaction with the world. However, contacts are complex phenomena that involve tactile sensing devices, contact objects, and the complex directional (normal and shear) force motions in-between. To advance tactile sensor research, we propose a comprehensive theory that unites these components, providing insights for sensor designs, explaining performance drops due to shear forces, and suggesting application scenarios with various contact objects. Our theory, based on sensor isolines, achieves superresolution sensing performance using only a few sensing units, avoiding the need for dense layouts. Through analysis of the sensor perception field and force sensitivity from a structural perspective, along with the influences of contact object sizes, we also explore the effects of different force directions: normal, tangential shear, and radial shear forces. The theoretical model covers all these aspects and predicts a system-level inherent accuracy loss introduced by shear forces compared to pure normal forces. To validate our theory, we developed Barodome, a 3D sensor capable of predicting contact locations and decoupling shear forces from normal forces. The sensor's performance confirms the significant impact of shear forces on performance, alongside normal forces. The observed 0.5 mm drop in the real sensor's performance (normal and shear forces) closely matches the theoretical prediction of 0.33 mm. Overall, our theory offers valuable guidance for future tactile sensor designs, informing various design choices and enhancing the development of advanced robotic touch …
URL BibTeX

Organizational Leadership and Diversity Conference Paper Gig work in organizations: Trends and perspectives from Human Resource Management professionals Singh, V., Keplinger, K., Tursunbayeva, A., Di Lauro, S. In Proceedings of the 84th Annual Meeting of the Academy of Management, https://doi.org/10.5465/AMPROC.2024.14769symposium, Chicago, USA, August 2024 (Published)
The gig economy has expanded beyond platform-based work and is also transforming standard organizations that are accustomed to stable employment arrangements and long-term-oriented HRM practices. The shift towards gig workers and blended teams disrupts standard HR practices due to the short-term, transactional nature of gig work. This research investigates the implications of gig work on HRM practices in standard organizations. Specifically, we 1) examine the trends and perspectives of HR professionals on the use of gig work in standard organizations, 2) investigate whether HR professionals apply standard HRM practices for gig workers, and 3) conduct a longitudinal analysis of HRM perspectives applicable to gig workers before and after the COVID-19 pandemic. To achieve these research objectives, we employ natural language processing techniques to analyze more than 500 YouTube videos of HR professionals offering their opinions about gig work. The findings suggest that despite the widely held notion that gig workers are ‘self-managed’, various HRM practices are utilized in the context of gig work.
Gig work and HRM DOI URL BibTeX

Embodied Vision Conference Paper Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry Li, H., Stueckler, J. In 2024 IEEE International Conference on Robotics and Automation (ICRA 2024), 1631-1637, Piscataway, NJ, August 2024 (Published)
Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as a dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but can even improve tracking accuracy.
preprint supplemental video code datasets DOI URL BibTeX

Autonomous Learning Conference Paper Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints Kolev, P., Vlastelica, M., Martius, G. In Seventeenth European Workshop on Reinforcement Learning, August 2024 (Accepted)
While many algorithms for diversity maximization under imitation constraints are online in nature, many applications require offline algorithms without environment interactions. Tackling this problem in the offline setting, however, presents significant challenges that require non-trivial, multi-stage optimization processes with non-stationary rewards. In this work, we present a novel offline algorithm that enhances diversity using an objective based on Van der Waals (VdW) force and successor features, and eliminates the need to learn a previously used skill discriminator. Moreover, by conditioning the value function and policy on a pre-trained Functional Reward Encoding (FRE), our method allows for better handling of non-stationary rewards and provides zero-shot recall of all skills encountered during training, significantly expanding the set of skills learned in prior work. Consequently, our algorithm benefits from receiving a consistently strong diversity signal (VdW), and enjoys more stable and efficient training. We demonstrate the effectiveness of our method in generating diverse skills for two robotic tasks in simulation: locomotion of a quadruped and local navigation with obstacle traversal.
URL BibTeX

Haptic Intelligence Miscellaneous Adapting a High-Fidelity Simulation of Human Skin for Comparative Touch Sensing Schulz, A., Serhat, G., Kuchenbecker, K. J. Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (Published) BibTeX

Empirical Inference Conference Paper Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals Ortu*, F., Jin*, Z., Doimo, D., Sachan, M., Cazzaniga, A., Schölkopf, B. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers:8420-8436, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024, *equal contribution (Published) arXiv URL BibTeX

Haptic Intelligence Ph.D. Thesis Engineering and Evaluating Naturalistic Vibrotactile Feedback for Telerobotic Assembly Gong, Y. University of Stuttgart, Stuttgart, Germany, August 2024, Faculty of Engineering Design, Production Engineering and Automotive Engineering (Published)
Teleoperation allows workers on a construction site to assemble pre-fabricated building components by controlling powerful machines from a safe distance. However, teleoperation's primary reliance on visual feedback limits the operator's efficiency in situations with stiff contact or poor visibility, compromising their situational awareness and thus increasing the difficulty of the task; it also makes construction machines more difficult to learn to operate. To bridge this gap, we propose that reliable, economical, and easy-to-implement naturalistic vibrotactile feedback could improve telerobotic control interfaces in construction and other application areas such as surgery. This type of feedback enables the operator to feel the natural vibrations experienced by the robot, which contain crucial information about its motions and its physical interactions with the environment. This dissertation explores how to deliver naturalistic vibrotactile feedback from a robot's end-effector to the hand of an operator performing telerobotic assembly tasks; furthermore, it seeks to understand the effects of such haptic cues. The presented research can be divided into four parts. We first describe the engineering of AiroTouch, a naturalistic vibrotactile feedback system tailored for use on construction sites but suitable for many other applications of telerobotics. Then we evaluate AiroTouch and explore the effects of the naturalistic vibrotactile feedback it delivers in three user studies conducted either in laboratory settings or on a construction site. We begin this dissertation by developing guidelines for creating a haptic feedback system that provides high-quality naturalistic vibrotactile feedback. These guidelines include three sections: component selection, component placement, and system evaluation. We detail each aspect with the parameters that need to be considered. 
Based on these guidelines, we adapt widely available commercial audio equipment to create our system called AiroTouch, which measures the vibration experienced by each robot tool with a high-bandwidth three-axis accelerometer and enables the user to feel this vibration in real time through a voice-coil actuator. Accurate haptic transmission is achieved by optimizing the positions of the system's off-the-shelf sensors and actuators and is then verified through measurements. The second part of this thesis presents our initial validation of AiroTouch. We explored how adding this naturalistic type of vibrotactile feedback affects the operator during small-scale telerobotic assembly. Due to the limited accessibility of teleoperated robots and to maintain safety, we conducted a user study in the lab with a commercial bimanual dexterous teleoperation system developed for surgery (Intuitive da Vinci Si). Thirty participants used this robot equipped with AiroTouch to assemble a small stiff structure under three randomly ordered haptic feedback conditions: no vibrations, one-axis vibrations, and summed three-axis vibrations. The results show that participants learn to take advantage of both tested versions of the haptic feedback in the given tasks, as significantly lower vibrations and forces are observed in the second trial. Subjective responses indicate that naturalistic vibrotactile feedback increases the realism of the interaction and reduces the perceived task duration, task difficulty, and fatigue. To test our approach on a real construction site, we enhanced AiroTouch using wireless signal-transmission technologies and waterproofing, and then we adapted it to a mini-crane construction robot. A study was conducted to evaluate how naturalistic vibrotactile feedback affects an observer's understanding of telerobotic assembly performed by this robot on a construction site.
Seven adults without construction experience observed a mix of manual and autonomous assembly processes both with and without naturalistic vibrotactile feedback. Qualitative analysis of their survey responses and interviews indicates that all participants had positive responses to this technology and believed it would be beneficial for construction activities. Finally, we evaluated the effects of naturalistic vibrotactile feedback provided by wireless AiroTouch during live teleoperation of the mini-crane. Twenty-eight participants remotely controlled the mini-crane to complete three large-scale assembly-related tasks in the lab, both with and without this type of haptic feedback. Our results show that naturalistic vibrotactile feedback enhances the participants' awareness of both robot motion and contact between the robot and other objects, particularly in scenarios with limited visibility. These effects increase participants' confidence when controlling the robot. Moreover, there is a noticeable trend of reduced vibration magnitude in the conditions where this type of haptic feedback is provided. The primary contribution of this dissertation is the clear explanation of details that are essential for the effective implementation of naturalistic vibrotactile feedback. We demonstrate that our accessible, audio-based approach can enhance user performance and experience during telerobotic assembly in construction and other application domains. These findings lay the foundation for further exploration of the potential benefits of incorporating haptic cues to enhance user experience during teleoperation.
BibTeX

Haptic Intelligence Article Fingertip Dynamic Response Simulated Across Excitation Points and Frequencies Serhat, G., Kuchenbecker, K. J. Biomechanics and Modeling in Mechanobiology, 23(4):1369-1376, August 2024 (Published)
Predicting how the fingertip will mechanically respond to different stimuli can help explain human haptic perception and enable improvements to actuation approaches such as ultrasonic mid-air haptics. This study addresses this goal using high-fidelity 3D finite element analyses. We compute the deformation profiles and amplitudes caused by harmonic forces applied in the normal direction at four locations: the center of the finger pad, the side of the finger, the tip of the finger, and the oblique midpoint of these three sites. The excitation frequency is swept from 2.5 to 260 Hz. The simulated frequency response functions (FRFs) obtained for displacement demonstrate that the relative magnitudes of the deformations elicited by stimulating at each of these four locations greatly depend on whether only the excitation point or the entire finger is considered. The point force that induces the smallest local deformation can even cause the largest overall deformation at certain frequency intervals. Above 225 Hz, oblique excitation produces larger mean displacement amplitudes than the other three forces due to excitation of multiple modes involving diagonal deformation. These simulation results give novel insights into the combined influence of excitation location and frequency on the fingertip dynamic response, potentially facilitating the design of future vibration feedback devices.
DOI BibTeX

Empirical Inference Article Leveraging Task Structures for Improved Identifiability in Neural Network Representations Chen*, W., Horwood*, J., Heo, J., Hernández-Lobato, J. M. Transactions on Machine Learning Research, August 2024, *equal contribution (Published) URL BibTeX

Haptic Intelligence Robotics Miscellaneous Modeling Shank Tissue Properties and Quantifying Body Composition with a Wearable Actuator-Accelerometer Set Rokhmanova, N., Martus, J., Faulkner, R., Fiene, J., Kuchenbecker, K. J. Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (Published) BibTeX

Empirical Inference Conference Paper Modelling Variability in Human Annotator Simulation Wu*, W., Chen*, W., Zhang, C., Woodland, P. C. Findings of the Association for Computational Linguistics (ACL), 1139-1157, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Moûsai: Efficient Text-to-Music Diffusion Models Schneider, F., Kamal, O., Jin, Z., Schölkopf, B. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers:8050-8068, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024 (Published) URL BibTeX

Perceiving Systems Article Re-Thinking Inverse Graphics with Large Language Models Kulits, P., Feng, H., Liu, W., Abrevaya, V., Black, M. J. Transactions on Machine Learning Research, August 2024 (Published)
Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Successfully disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the environment. This complexity limits the ability of existing carefully engineered approaches to generalize across domains. Inspired by the zero-shot ability of large language models (LLMs) to generalize to novel contexts, we investigate the possibility of leveraging the broad world knowledge encoded in such models to solve inverse-graphics problems. To this end, we propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM, that autoregressively decodes a visual embedding into a structured, compositional 3D-scene representation. We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training. Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research.
pdf URL BibTeX

Empirical Inference Conference Paper CausalCite: A Causal Formulation of Paper Citations Agrawal, I., Jin, Z., Mokhtarian, E., Guo, S., Chen, Y., Sachan, M., Schölkopf, B. Findings of the Association for Computational Linguistics (ACL), 8395-8410, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper A Sparsity Principle for Partially Observable Causal Representation Learning Xu, D., Yao, D., Lachapelle, S., Taslakian, P., von Kügelgen, J., Locatello, F., Magliacane, S. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:55389-55433, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Physics for Inference and Optimization Conference Paper A causality-inspired plus-minus model for player evaluation in team sports De Bacco, C., Wang, Y., Blei, D. In Proceedings of Machine Learning Research, Conference on Causal Learning and Reasoning, July 2024 (Published) Paper DOI URL BibTeX

Empirical Inference Conference Paper Accuracy on the wrong line: On the pitfalls of noisy data for OOD generalisation Sanyal, A., Hu, Y., Yu, Y., Ma, Y., Wang, Y., Schölkopf, B. ICML 2024 Next Generation of AI Safety Workshop (Oral), July 2024 (Published) arXiv PDF BibTeX

Empirical Inference Conference Paper All-in-one simulation-based inference Gloeckler, M., Deistler, M., Weilbach, C. D., Wood, F., Macke, J. H. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:15735-15766, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper Allocation Requires Prediction Only if Inequality Is Low Shirali, A., Abebe, R., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, The Forty-First International Conference on Machine Learning (ICML), July 2024, *equal contribution (Published)
Algorithmic predictions are emerging as a promising solution concept for efficiently allocating societal resources. Fueling their use is an underlying assumption that such systems are necessary to identify individuals for interventions. We propose a principled framework for assessing this assumption: Using a simple mathematical model, we evaluate the efficacy of prediction-based allocations in settings where individuals belong to larger units such as hospitals, neighborhoods, or schools. We find that prediction-based allocations outperform baseline methods using aggregate unit-level statistics only when between-unit inequality is low and the intervention budget is high. Our results hold for a wide range of settings for the price of prediction, treatment effect heterogeneity, and unit-level statistics’ learnability. Combined, we highlight the potential limits to improving the efficacy of interventions through prediction.
ArXiv URL BibTeX

Autonomous Learning Conference Paper Causal Action Influence Aware Counterfactual Data Augmentation Urpi, N. A., Bagatella, M., Vlastelica, M., Martius, G. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:1709-1729, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper Causal Inference from Competing Treatments Stoica, A., Nastl, V. Y., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, The Forty-First International Conference on Machine Learning (ICML), July 2024 (Published)
Many applications of RCTs involve the presence of multiple treatment administrators -- from field experiments to online advertising -- that compete for the subjects' attention. In the face of competition, estimating a causal effect becomes difficult, as the position at which a subject sees a treatment influences their response, and thus the treatment effect. In this paper, we build a game-theoretic model of agents who wish to estimate causal effects in the presence of competition, through a bidding system and a utility function that minimizes estimation error. Our main technical result establishes an approximation with a tractable objective that maximizes the sample value obtained through strategically allocating budget on subjects. This allows us to find an equilibrium in our model: we show that the tractable objective has a pure Nash equilibrium, and that any Nash equilibrium is an approximate equilibrium for our general objective that minimizes estimation error under broad conditions. Conceptually, our work successfully combines elements from causal inference and game theory to shed light on the equilibrium behavior of experimentation under competition.
ArXiv URL BibTeX

Empirical Inference Conference Paper Detecting and Identifying Selection Structure in Sequential Data Zheng, Y., Tang, Z., Qiu, Y., Schölkopf, B., Zhang, K. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:61498-61525, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for ODEs Beck, J., Bosch, N., Deistler, M., Kadhim, K. L., Macke, J. H., Hennig, P., Berens, P. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:3305-3326, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Diffusive Gibbs Sampling Chen*, W., Zhang*, M., Paige, B., Hernández-Lobato, J. M., Barber, D. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:7731-7747, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? Opedal, A., Stolfo, A., Shirakami, H., Jiao, Y., Cotterell, R., Schölkopf, B., Saparov, A., Sachan, M. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:38762-38778, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget Dorner, F. E., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, The Forty-First International Conference on Machine Learning (ICML), July 2024 (Published)
We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.
ArXiv URL BibTeX
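
The trade-off described in this abstract can be illustrated with a small exact calculation. The sketch below is not the paper's proof or code. It assumes a simplified model: each noisy label is flipped independently with probability `rho`, classifier A alone is right on a fraction `p_only_a_right` of points and classifier B alone on `p_only_b_right`, and the majority vote uses three labels per point. All function names and numeric values are hypothetical.

```python
from math import comb

import numpy as np


def majority_flip_rate(rho: float, k: int) -> float:
    """Chance that a majority vote over k (odd) independent noisy labels is wrong,
    when each individual label is flipped with probability rho."""
    return sum(
        comb(k, j) * rho**j * (1 - rho) ** (k - j)
        for j in range(k // 2 + 1, k + 1)
    )


def prob_pick_better(n_samples: int, flip_rate: float,
                     p_only_a_right: float, p_only_b_right: float) -> float:
    """Exact probability that classifier A (assumed to be the better one) agrees
    with the reference labels more often than B on n_samples points, when each
    reference label is wrong with probability flip_rate. Ties count as a coin flip."""
    # Per point, D = 1[A matches label] - 1[B matches label] is -1, 0, or +1.
    # D is nonzero only where the two binary classifiers disagree.
    p_plus = p_only_a_right * (1 - flip_rate) + p_only_b_right * flip_rate
    p_minus = p_only_b_right * (1 - flip_rate) + p_only_a_right * flip_rate
    step = np.array([p_minus, 1.0 - p_plus - p_minus, p_plus])

    # Distribution of the sum of D over all points, built by repeated convolution.
    pmf = np.array([1.0])
    for _ in range(n_samples):
        pmf = np.convolve(pmf, step)

    # pmf[i] is P(sum of D == i - n_samples).
    return float(pmf[n_samples + 1:].sum() + 0.5 * pmf[n_samples])


if __name__ == "__main__":
    budget = 300          # total number of noisy labels we can buy (assumed)
    rho = 0.3             # per-label flip probability (assumed)
    p_a, p_b = 0.15, 0.05  # A alone right vs. B alone right (assumed)

    single = prob_pick_better(budget, majority_flip_rate(rho, 1), p_a, p_b)
    voted = prob_pick_better(budget // 3, majority_flip_rate(rho, 3), p_a, p_b)
    print(f"one label on {budget} points:       {single:.4f}")
    print(f"3-vote majority on {budget // 3} points: {voted:.4f}")
```

Under these assumptions, spending the same budget on more points with one label each identifies the better classifier more often than the three-vote scheme, in line with the direction of the result stated in the abstract. The numbers are only an illustration of one parameter setting.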

Haptic Intelligence Master Thesis Estimating Contact Forces Across Soft Capacitive Tactile Sensors Using Machine Learning Tiwari, A. Saarland University, Saarbrücken, Germany, July 2024, M.Sc. in Embedded Systems (Published)
Robots have become an essential part of the modern world, playing a crucial role in applications from manufacturing to healthcare. Despite significant advancements, the operational range of robots remains relatively narrow, often limited to controlled environments and simple, predetermined tasks. Tactile sensors show promise in broadening this range by enhancing a robot's performance in fine manipulation tasks. These sensors enable robots to perceive contact, providing a more nuanced understanding of their environment in real time. The challenge, however, lies in deriving meaningful and interpretable insights from these sensors, such as contact location and force, which are crucial for dexterous manipulation tasks. To address this challenge, this thesis develops machine learning-based software that achieves precise real-time contact location and force sensing across the entire surface of a grid-based soft capacitive tactile sensor, enabling rapid and straightforward deployment and facilitating transferability to other sensor instances, all while retaining the advantageous attributes of capacitance technology. Machine learning models were trained using data captured by indenting the sensor surface and measuring the sensor responses and the applied normal forces. Convolutional neural networks (CNNs) were selected for their low prediction errors in contact force estimation with the collected dataset. Two distinct models were developed: one for estimating contact forces at a single point and another for estimating normal force distributions. The transferability of the trained models across different sensor instances was evaluated and improved. The single point contact force estimation model's practical utility was demonstrated through real-time closed-loop control of a Franka Emika Panda robot arm through two specific tasks: tactile servoing in 1D and active object centering in 2D. 
This research contributes to enhancing the accessibility of soft tactile sensors in robotic applications through machine learning and demonstrates that this approach can improve the capabilities of tactile sensors.
BibTeX

Empirical Inference Conference Paper Geometry-Aware Instrumental Variable Regression Kremer, H., Schölkopf, B. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:25560-25582, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Implicit meta-learning may lead language models to trust more reliable sources Krasheninnikov, D., Krasheninnikov, E., Mlodozeniec, B. K., Maharaj, T., Krueger, D. Proceedings of the 41st International Conference on Machine Learning, 235:25534-25559, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Improving Neural Additive Models with Bayesian Principles Bouchiat, K., Immer, A., Yèche, H., Rätsch, G., Fortuin, V. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:4416-4443, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Robust Machine Learning Conference Paper InfoNCE: Identifying the Gap Between Theory and Practice Rusak, E., Reizinger, P., Juhos, A., Bringmann, O., Zimmermann, R. S., Brendel, W. In July 2024 (Published) BibTeX

Social Foundations of Computation Conference Paper Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks Zhang, G., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, The Forty-First International Conference on Machine Learning (ICML), July 2024 (Published)
We examine multi-task benchmarks in machine learning through the lens of social choice theory. We draw an analogy between benchmarks and electoral systems, where models are candidates and tasks are voters. This suggests a distinction between cardinal and ordinal benchmark systems. The former aggregate numerical scores into one model ranking; the latter aggregate rankings for each task. We apply Arrow's impossibility theorem to ordinal benchmarks to highlight the inherent limitations of ordinal systems, particularly their sensitivity to the inclusion of irrelevant models. Inspired by Arrow's theorem, we empirically demonstrate a strong trade-off between diversity and sensitivity to irrelevant changes in existing multi-task benchmarks. Our result is based on new quantitative measures of diversity and sensitivity that we introduce. Sensitivity quantifies the impact that irrelevant changes to tasks have on a benchmark. Diversity captures the degree of disagreement in model rankings across tasks. We develop efficient approximation algorithms for both measures, as exact computation is computationally challenging. Through extensive experiments on seven cardinal benchmarks and eleven ordinal benchmarks, we demonstrate a clear trade-off between diversity and stability: The more diverse a multi-task benchmark, the more sensitive to trivial changes it is. Additionally, we show that the aggregated rankings of existing benchmarks are highly unstable under irrelevant changes.
ArXiv Code URL BibTeX
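
The distinction between cardinal and ordinal aggregation in this abstract can be made concrete with a toy example. The sketch below is not the authors' code or their sensitivity measure. It assumes a mean-score rule as the cardinal system and a Borda count as the ordinal system, and the models, tasks, and scores are invented for illustration.

```python
import numpy as np


def cardinal_ranking(scores: np.ndarray, names: list[str]) -> list[str]:
    """Rank models by their mean score across tasks (rows = models, columns = tasks)."""
    means = scores.mean(axis=1)
    return [names[i] for i in np.argsort(-means, kind="stable")]


def ordinal_ranking(scores: np.ndarray, names: list[str]) -> list[str]:
    """Rank models by Borda count: on each task a model earns one point for
    every model it beats. Assumes no tied scores within a task."""
    # argsort of argsort turns each column into within-task ranks (0 = worst).
    points = scores.argsort(axis=0).argsort(axis=0).sum(axis=1)
    return [names[i] for i in np.argsort(-points, kind="stable")]


# Hypothetical scores: A beats B on three tasks, B beats A on two.
# Model C is weak overall but lands between B and A on the last two tasks.
names = ["A", "B", "C"]
scores = np.array([
    [0.90, 0.90, 0.90, 0.60, 0.60],  # A
    [0.80, 0.80, 0.80, 0.70, 0.70],  # B
    [0.10, 0.10, 0.10, 0.65, 0.65],  # C
])

if __name__ == "__main__":
    print("without C, cardinal:", cardinal_ranking(scores[:2], names[:2]))
    print("without C, ordinal: ", ordinal_ranking(scores[:2], names[:2]))
    print("with C,    cardinal:", cardinal_ranking(scores, names))
    print("with C,    ordinal: ", ordinal_ranking(scores, names))
```

In this toy setup, adding C leaves the mean-score order of A and B unchanged but flips their order under the Borda count, which is the kind of sensitivity to an irrelevant model that the abstract attributes to ordinal systems.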

Autonomous Learning Conference Paper LPGD: A General Framework for Backpropagation through Embedded Optimization Layers Paulus, A., Martius, G., Musil, V. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:39989-40014, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Autonomous Learning Conference Paper Learning with 3D rotations, a hitchhiker’s guide to SO(3) Geist, A. R., Frey, J., Zhobro, M., Levina, A., Martius, G. In Proceedings of Machine Learning Research, Proceedings of the Forty-First International Conference on Machine Learning, 235:15331-15350, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), Forty-First International Conference on Machine Learning, July 2024 (Published)
Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.
URL BibTeX
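
As a rough illustration of why the choice of rotation representation matters, the sketch below contrasts two representations that are widely used in the literature. The abstract above does not name specific representations, so both are assumptions made here: unit quaternions, where `q` and `-q` describe the same rotation, and a six-number representation that is mapped to a rotation matrix by Gram-Schmidt orthonormalization. Function names are hypothetical.

```python
import numpy as np


def rotation_from_quaternion(q: np.ndarray) -> np.ndarray:
    """Rotation matrix from a quaternion (w, x, y, z); the input is normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def rotation_from_6d(v: np.ndarray) -> np.ndarray:
    """Rotation matrix from six numbers, read as two 3D vectors that are
    orthonormalized by Gram-Schmidt. The two vectors must not be parallel."""
    a1, a2 = v[:3], v[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1      # remove the component along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)              # completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)


if __name__ == "__main__":
    q = np.array([0.5, 0.5, 0.5, 0.5])
    # Two different quaternion targets encode one rotation, which can make
    # a regression target ambiguous when rotations are a model output.
    print(np.allclose(rotation_from_quaternion(q), rotation_from_quaternion(-q)))

    R = rotation_from_6d(np.array([1.0, 2.0, 3.0, -1.0, 0.5, 2.0]))
    # Any non-degenerate six-vector yields a valid rotation matrix.
    print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```

Which of these, or of the many other options, suits a given model depends on the considerations the abstract lists, such as whether rotations appear in the input or the output.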

Perceiving Systems Ph.D. Thesis Modelling Dynamic 3D Human-Object Interactions: From Capture to Synthesis Taheri, O. University of Tübingen, July 2024 (Accepted)
Modeling digital humans that move and interact realistically with virtual 3D worlds has recently emerged as an essential research area, with significant applications in computer graphics, virtual and augmented reality, telepresence, the Metaverse, and assistive technologies. In particular, human-object interaction, encompassing full-body motion, hand-object grasping, and object manipulation, lies at the core of how humans execute tasks and represents the complex and diverse nature of human behavior. Therefore, accurate modeling of these interactions would enable us to simulate avatars to perform tasks, enhance animation realism, and develop applications that better perceive and respond to human behavior. Despite its importance, this remains a challenging problem, due to several factors such as the complexity of human motion, the variance of interaction based on the task, and the lack of rich datasets capturing the complexity of real-world interactions. Prior methods have made progress, but limitations persist as they often focus on individual aspects of interaction, such as body, hand, or object motion, without considering the holistic interplay among these components. This Ph.D. thesis addresses these challenges and contributes to the advancement of human-object interaction modeling through the development of novel datasets, methods, and algorithms.
BibTeX