Publications

Empirical Inference Ph.D. Thesis Advances in Probabilistic Methods for Deep Learning Immer, A. ETH Zurich, Switzerland, September 2024, CLS PhD Program (Published) BibTeX

Perceiving Systems Conference Paper Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects Fan, Z., Ohkawa, T., Yang, L., Lin, N., Zhou, Z., Zhou, S., Liang, J., Gao, Z., Zhang, X., Zhang, X., Li, F., Zheng, L., Lu, F., Zeid, K. A., Leibe, B., On, J., Baek, S., Prakash, A., Gupta, S., He, K., et al. In European Conference on Computer Vision (ECCV 2024), 428-448, LNCS, Springer Cham, September 2024 (Published)
We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the head movement. To this end, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios intractable with state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community’s knowledge foundation and facilitate future hand studies on egocentric hand-object interactions.
Paper Leaderboard DOI BibTeX

Perceiving Systems Conference Paper Explorative Inbetweening of Time and Space Feng, H., Ding, Z., Xia, Z., Niklaus, S., Fernandez Abrevaya, V., Black, M. J., Zhang, X. In European Conference on Computer Vision (ECCV 2024), 378-395, LNCS, Springer Cham, September 2024 (Published)
We introduce bounded generation as a generalized task to control video generation to synthesize arbitrary camera and subject motion based only on a given start and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without additional training or fine-tuning of the original model. This is achieved through the proposed new sampling strategy, which we call Time Reversal Fusion, that fuses the temporally forward and backward denoising paths conditioned on the start and end frame, respectively. The fused path results in a video that smoothly connects the two frames, generating inbetweening of faithful subject motion, novel views of static scenes, and seamless video looping when the two bounding frames are identical. We curate a diverse evaluation dataset of image pairs and compare against the closest existing methods. We find that Time Reversal Fusion outperforms related work on all subtasks, exhibiting the ability to generate complex motions and 3D-consistent views guided by bounded frames.
Paper Website DOI URL BibTeX

Social Foundations of Computation Conference Paper Fairness in Social Influence Maximization via Optimal Transport Chowdhary, S., De Pasquale, G., Lanzetti, N., Stoica, A., Dorfler, F. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), September 2024 (Published)
We study fairness in social influence maximization, whereby one seeks to select seeds that spread a given information throughout a network, ensuring balanced outreach among different communities (e.g. demographic groups). In the literature, fairness is often quantified in terms of the expected outreach within individual communities. In this paper, we demonstrate that such fairness metrics can be misleading since they ignore the stochastic nature of information diffusion processes. When information diffusion occurs in a probabilistic manner, multiple outreach scenarios can occur. As such, outcomes such as "in 50% of the cases, no one in group 1 receives the information and everyone in group 2 receives it, while in the other 50% the opposite happens", which are always largely unfair, are classified as fair by a variety of fairness metrics in the literature. We tackle this problem by designing a new fairness metric, mutual fairness, that captures variability in outreach through optimal transport theory. We propose a new seed selection algorithm that optimizes both outreach and mutual fairness, and we show its efficacy on several real datasets. We find that our algorithm increases fairness with only a minor decrease (and at times, even an increase) in efficiency.
ArXiv URL BibTeX
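The failure mode described in the abstract can be checked with a small simulation. The sketch below (illustrative only, not the authors' code) implements the 50/50 example from the abstract: an expectation-based metric sees both groups equally well off, even though every single realization is maximally unfair.

```python
import random

random.seed(0)

# Sketch of the abstract's 50/50 example: in every diffusion outcome,
# exactly one of two groups is fully reached and the other not at all,
# each case occurring with probability 1/2.
def sample_outreach():
    """Return (fraction reached in group 1, fraction reached in group 2)."""
    return (1.0, 0.0) if random.random() < 0.5 else (0.0, 1.0)

samples = [sample_outreach() for _ in range(100_000)]
exp_g1 = sum(s[0] for s in samples) / len(samples)
exp_g2 = sum(s[1] for s in samples) / len(samples)

# Expected outreach is ~0.5 for both groups, so a metric based only on
# expectations labels the process fair -- yet the per-outcome gap
# |g1 - g2| equals 1 in every realization (maximal unfairness).
worst_gap = min(abs(s[0] - s[1]) for s in samples)  # 1.0: no fair outcome exists
```

A distribution-aware metric such as the paper's mutual fairness is designed to distinguish this process from one that genuinely reaches both groups in the same realization.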

Haptic Intelligence Empirical Inference Optics and Sensing Laboratory Software Workshop Article Fiber-Optic Shape Sensing Using Neural Networks Operating on Multispecklegrams Cao, C. G. L., Javot, B., Bhattarai, S., Bierig, K., Oreshnikov, I., Volchkov, V. V. IEEE Sensors Journal, 24(17):27532-27540, September 2024 (Published)
Application of machine learning techniques on fiber speckle images to infer fiber deformation allows the use of an unmodified multimode fiber to act as a shape sensor. This approach eliminates the need for complex fiber design or construction (e.g., Bragg gratings and time-of-flight). Prior work in shape determination using neural networks trained on a finite number of possible fiber shapes (formulated as a classification task), or trained on a few continuous degrees of freedom, has been limited to reconstruction of fiber shapes only one bend at a time. Furthermore, generalization to shapes that were not used in training is challenging. Our innovative approach improves generalization capabilities, using computer vision-assisted parameterization of the actual fiber shape to provide a ground truth, and multiple specklegrams per fiber shape obtained by controlling the input field. Results from experimenting with several neural network architectures, shape parameterization, number of inputs, and specklegram resolution show that fiber shapes with multiple bends can be accurately predicted. Our approach is able to generalize to new shapes that were not in the training set. This approach of end-to-end training on parameterized ground truth opens new avenues for fiber-optic sensor applications. We publish the datasets used for training and validation, as well as an out-of-distribution (OOD) test set, and encourage interested readers to access these datasets for their own model development.
DOI BibTeX

Perceiving Systems Conference Paper Generating Human Interaction Motions in Scenes with Text Control Yi, H., Thies, J., Black, M. J., Peng, X. B., Rempe, D. In European Conference on Computer Vision (ECCV 2024), 246-263, LNCS, Springer Cham, September 2024 (Published)
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model, emphasizing goal-reaching constraints on large-scale motion-capture datasets. We then enhance this model with a scene-aware component, fine-tuned using data augmented with detailed scene information, including ground plane and object shapes. To facilitate training, we embed annotated navigation and interaction motions within scenes. The proposed method produces realistic and diverse human-object interactions, such as navigation and sitting, in different scenes with various object shapes, orientations, initial body positions, and poses. Extensive experiments demonstrate that our approach surpasses prior techniques in terms of the plausibility of human-scene interactions, as well as the realism and variety of the generated motions.
pdf project DOI URL BibTeX

Empirical Inference Autonomous Learning Conference Paper Learning to Control Emulated Muscles in Real Robots: A Software Test Bed for Bio-Inspired Actuators in Hardware Schumacher, P., Krause, L., Schneider, J., Büchler, D., Martius, G., Haeufle, D. In Proceedings 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob), 806-813, IEEE, 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob), September 2024 (Published) arXiv DOI URL BibTeX

Perceiving Systems Article Localization and recognition of human action in 3D using transformers Sun, J., Huang, L., Hongsong Wang, C. Z. J. Q., Islam, M. T., Xie, E., Zhou, B., Xing, L., Chandrasekaran, A., Black, M. J. Nature Communications Engineering, 13(125), September 2024 (Published)
Understanding a person’s behavior from their 3D motion sequence is a fundamental problem in computer vision with many applications. An important component of this problem is 3D action localization, which involves recognizing what actions a person is performing, and when the actions occur in the sequence. To promote the progress of the 3D action localization community, we introduce a new, challenging, and more complex benchmark dataset, BABEL-TAL (BT), for 3D action localization. Important baselines and evaluation metrics, as well as human evaluations, are carefully established on this benchmark. We also propose a strong baseline model, i.e., Localizing Actions with Transformers (LocATe), that jointly localizes and recognizes actions in a 3D sequence. The proposed LocATe shows superior performance on BABEL-TAL as well as on the large-scale PKU-MMD dataset, achieving state-of-the-art performance by using only 10% of the labeled training data. Our research could advance the development of more accurate and efficient systems for human behavior analysis, with potential applications in areas such as human-computer interaction and healthcare.
paper DOI BibTeX

Physics for Inference and Optimization Article Similarity and economy of scale in urban transportation networks and optimal transport-based infrastructures Leite, D., De Bacco, C. Nature Communications, September 2024 (Published)
Designing and optimizing the structure of urban transportation networks is a challenging task. In this study, we propose a method inspired by optimal transport theory to reproduce the optimal structure of public transportation networks using little input information. Contrary to standard approaches, it does not assume any initial backbone network infrastructure, but rather extracts this directly from a continuous space using only a few origin and destination points. Analyzing a set of urban rail, tram, and subway networks, we find a high degree of similarity between simulated and real infrastructures. By tuning one parameter, our method can simulate a range of different networks that can be further used to suggest possible improvements in terms of relevant transportation properties. Outputs of our algorithm naturally provide a principled quantitative measure of similarity between two networks that can be used to automate the selection of similar simulated networks.
Preprint Code Paper DOI URL BibTeX

Perceiving Systems Conference Paper AWOL: Analysis WithOut synthesis using Language Zuffi, S., Black, M. J. In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (Published)
Many classical parametric 3D shape models exist, but creating novel shapes with such models requires expert knowledge of their parameters. For example, imagine creating a specific type of tree using procedural graphics or a new kind of animal from a statistical shape model. Our key idea is to leverage language to control such existing models to produce novel shapes. This involves learning a mapping between the latent space of a vision-language model and the parameter space of the 3D model, which we do using a small set of shape and text pairs. Our hypothesis is that mapping from language to parameters allows us to generate parameters for objects that were never seen during training. If the mapping between language and parameters is sufficiently smooth, then interpolation or generalization in language should translate appropriately into novel 3D shapes. We test our approach with two very different types of parametric shape models (quadrupeds and arboreal trees). We use a learned statistical shape model of quadrupeds and show that we can use text to generate new animals not present during training. In particular, we demonstrate state-of-the-art shape estimation of 3D dogs. This work also constitutes the first language-driven method for generating 3D trees. Finally, embedding images in the CLIP latent space enables us to generate animals and trees directly from images.
Paper URL BibTeX

Perceiving Systems Article EarthRanger: An Open-Source Platform for Ecosystem Monitoring, Research, and Management Wall, J., Lefcourt, J., Jones, C., Doehring, C., O’Neill, D., Schneider, D., Steward, J., Krautwurst, J., Wong, T., Jones, B., Goodfellow, K., Schmitt, T., Gobush, K., Douglas-Hamilton, I., Pope, F., Schmidt, E., Palmer, J., Stokes, E., Reid, A., Elbroch, M. L., et al. Methods in Ecology and Evolution, 13, British Ecological Society, September 2024 (Published)
1. Effective approaches are needed to conserve the planet's remaining wildlife and wilderness landscapes, especially concerning global biodiversity conservation targets. Here, we present a new software system called EarthRanger: an open-source platform built to help monitor, research and manage ecosystems.
2. EarthRanger consists of seven main components (Core Server, API, Storage, Gundi, Web App, Mobile App, Ecoscope) that provide functionality for data (i) aggregation & collection, (ii) storage & management, (iii) real-time and post hoc analysis, (iv) visualisation and (v) dissemination. The mobile application provides field-based data recording and visualisation tools. EarthRanger may be deployed for single project use or can aggregate across multiple geographies as a centralised hub. EarthRanger can be used to collect standardised tracking data (e.g. from wildlife collars, vehicles and ranger patrols) and configurable event information (e.g. a singular recording with associated user-defined attribute information such as a wildlife sighting or encounter with a poacher).
3. Since development began in 2015, the platform has (at the time of writing) been deployed at over 500 sites across 70 countries and with myriad configurations and objectives. EarthRanger has improved the ability to monitor data feeds and manage conservation-related operations in real time. For instance, the deployment of EarthRanger by African Parks has led to the removal of over 50,000 snares, steady population growth of key species of concern and near cessation of poaching. In Liwonde's protected area, enhanced mitigation efforts supported by EarthRanger reduced the number of deaths from wildlife conflict by more than 91%. EarthRanger is also providing a platform to enhance standardisation, aggregation, transfer and long-term storage of ecological information and promote collaboration between groups conducting protected area management and ecology and biodiversity research.
pdf DOI BibTeX

Perceiving Systems Conference Paper GraspXL: Generating Grasping Motions for Diverse Objects at Scale Zhang, H., Christen, S., Fan, Z., Hilliges, O., Song, J. In European Conference on Computer Vision (ECCV 2024), Part XXVI:386-403, LNCS, Springer Cham, September 2024 (Published) Code Video Paper DOI URL BibTeX

Autonomous Learning Robotics Article Identifying Terrain Physical Parameters from Vision - Towards Physical-Parameter-Aware Locomotion and Navigation Chen, J., Frey, J., Zhou, R., Miki, T., Martius, G., Hutter, M. IEEE Robotics and Automation Letters, 9(11):9279-9286, August 2024 (Published)
Identifying the physical properties of the surrounding environment is essential for robotic locomotion and navigation to deal with non-geometric hazards, such as slippery and deformable terrains. It would be of great benefit for robots to anticipate these extreme physical properties before contact; however, estimating environmental physical parameters from vision is still an open challenge. Animals can achieve this by using their prior experience and knowledge of what they have seen and how it felt. In this work, we propose a cross-modal self-supervised learning framework for vision-based environmental physical parameter estimation, which paves the way for future physical-property-aware locomotion and navigation. We bridge the gap between existing policies trained in simulation and identification of physical terrain parameters from vision. We propose to train a physical decoder in simulation to predict friction and stiffness from multi-modal input. The trained network allows the labeling of real-world images with physical parameters in a self-supervised manner to further train a visual network during deployment, which can densely predict the friction and stiffness from image data. We validate our physical decoder in simulation and the real world using a quadruped ANYmal robot, outperforming an existing baseline method. We show that our visual network can predict the physical properties in indoor and outdoor experiments while allowing fast adaptation to new environments.
DOI URL BibTeX

Autonomous Learning Miscellaneous Directed Exploration in Reinforcement Learning from Linear Temporal Logic Bagatella, M., Krause, A., Martius, G. August 2024 (In revision)
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning, as it allows describing objectives beyond the expressivity of conventional discounted return formulations. Nonetheless, recent works have shown that LTL formulas can be translated into a variable rewarding and discounting scheme, whose optimization produces a policy maximizing a lower bound on the probability of formula satisfaction. However, the synthesized reward signal remains fundamentally sparse, making exploration challenging. We aim to overcome this limitation, which can prevent current algorithms from scaling beyond low-dimensional, short-horizon problems. We show how better exploration can be achieved by further leveraging the LTL specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process, thus enabling a form of high-level value estimation. By taking a Bayesian perspective over LDBA dynamics and proposing a suitable prior distribution, we show that the values estimated through this procedure can be treated as a shaping potential and mapped to informative intrinsic rewards. Empirically, we demonstrate applications of our method from tabular settings to high-dimensional continuous systems, which have so far represented a significant challenge for LTL-based reinforcement learning algorithms.
URL BibTeX
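The abstract's idea of treating estimated values as a "shaping potential" builds on the standard potential-based reward shaping construction. The sketch below shows that generic mechanism only; the placeholder potentials are hypothetical and stand in for the paper's LDBA-derived value estimates.

```python
# Generic potential-based reward shaping (Ng et al., 1999) -- a sketch of
# the mechanism the abstract refers to, not the paper's LDBA-specific code.
def shaped_reward(r, phi_s, phi_s_next, gamma=0.99):
    """Augment a (possibly sparse) reward r with gamma*Phi(s') - Phi(s)."""
    return r + gamma * phi_s_next - phi_s

# With gamma = 1 the shaping terms telescope along a trajectory: the total
# added reward is Phi(s_T) - Phi(s_0), so optimal policies are preserved
# while intermediate steps receive an informative learning signal.
phi = [0.0, 0.5, 0.9, 1.0]   # hypothetical potentials along one path
rewards = [0.0, 0.0, 1.0]    # sparse task rewards (goal reached at the end)
total = sum(shaped_reward(r, phi[t], phi[t + 1], gamma=1.0)
            for t, r in enumerate(rewards))
# total = sum(rewards) + phi[-1] - phi[0] = 1.0 + 1.0 - 0.0 = 2.0
```

The point of the construction is that dense intrinsic rewards can be injected without changing which policies are optimal.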

Autonomous Learning Conference Paper Zero-Shot Object-Centric Representation Learning Didolkar, A., Zadaianchuk, A., Goyal, A., Mozer, M., Bengio, Y., Martius, G., Seitzer, M. August 2024 (Accepted)
The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.
URL BibTeX

Autonomous Learning Article Sensing multi-directional forces at superresolution using taxel value isoline theory Sun, H., Spiers, A., Lee, H., Fiene, J., Martius, G. August 2024 (Accepted)
Robots can benefit from a good sense of touch to perceive their interaction with the world. However, contacts are complex phenomena that involve tactile sensing devices, contact objects, and the complex directional (normal and shear) force motions in-between. To advance tactile sensor research, we propose a comprehensive theory that unites these components, providing insights for sensor designs, explaining performance drops due to shear forces, and suggesting application scenarios with various contact objects. Our theory, based on sensor isolines, achieves superresolution sensing performance using only a few sensing units, avoiding the need for dense layouts. Through analysis of the sensor perception field and force sensitivity from a structural perspective, along with the influences of contact object sizes, we also explore the effects of different force directions: normal, tangential shear, and radial shear forces. The theoretical model covers all these aspects and predicts a system-level inherent accuracy loss introduced by shear forces compared to pure normal forces. To validate our theory, we developed Barodome, a 3D sensor capable of predicting contact locations and decoupling shear forces from normal forces. The sensor's performance confirms the significant impact of shear forces on performance, alongside normal forces. The observed 0.5 mm drop in the real sensor's performance (normal and shear forces) closely matches the theoretical prediction of 0.33 mm. Overall, our theory offers valuable guidance for future tactile sensor designs, informing various design choices and enhancing the development of advanced robotic touch …
URL BibTeX

Organizational Leadership and Diversity Conference Paper Gig work in organizations: Trends and perspectives from Human Resource Management professionals Singh, V., Keplinger, K., Tursunbayeva, A., Di Lauro, S. In Proceedings of the 84th Annual Meeting of the Academy of Management, https://doi.org/10.5465/AMPROC.2024.14769symposium, Chicago, USA, 84th Annual Meeting of the Academy of Management, August 2024 (Published)
The gig economy has expanded beyond platform-based work and is also transforming standard organizations that are accustomed to stable employment arrangements and long-term-oriented HRM practices. The shift towards gig workers and blended teams disrupts standard HR practices due to the short-term, transactional nature of gig work. This research investigates the implications of gig work on HRM practices in standard organizations. Specifically, we 1) examine the trends and perspectives of HR professionals on the use of gig work in standard organizations, 2) investigate whether HR professionals apply standard HRM practices for gig workers, and 3) conduct a longitudinal analysis of HRM perspectives applicable to gig workers before and after the COVID-19 pandemic. To achieve these research objectives, we employ natural language processing techniques to analyze more than 500 YouTube videos of HR professionals offering their opinions about gig work. The findings suggest that despite the widely held notion that gig workers are ‘self-managed’, various HRM practices are utilized in the context of gig work.
Gig work and HRM DOI URL BibTeX

Embodied Vision Conference Paper Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry Li, H., Stueckler, J. In 2024 IEEE International Conference on Robotics and Automation (ICRA 2024), 1631-1637, Piscataway, NJ, IEEE International Conference on Robotics and Automation (ICRA 2024), August 2024 (Published)
Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry (VIO). Our method calibrates and adapts the dynamics model online to improve the accuracy of forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In experiments, we demonstrate that ST-VIO can not only adapt to wheel or ground changes and improve the accuracy of prediction under new control inputs, but can even improve tracking accuracy.
preprint supplemental video code datasets DOI URL BibTeX
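For intuition, the kinematic single-track ("bicycle") model is the simplest member of the model family the paper builds on. The sketch below (with a hypothetical wheelbase and forward-Euler integration, not ST-VIO's calibrated, singularity-free dynamics variant) illustrates how such an ODE model predicts motion from control inputs.

```python
import math

# Illustrative kinematic single-track model -- a simpler relative of the
# dynamics model that ST-VIO calibrates online. State: (x, y, yaw);
# controls: forward speed v [m/s] and steering angle delta [rad].
L_WB = 0.3  # hypothetical wheelbase [m], chosen only for illustration

def step(state, v, delta, dt):
    """Advance the state by one forward-Euler step of duration dt."""
    x, y, yaw = state
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v * math.tan(delta) / L_WB * dt
    return (x, y, yaw)

# Driving straight (delta = 0) for 1 s at 1 m/s moves the vehicle 1 m along x.
state = (0.0, 0.0, 0.0)
for _ in range(100):
    state = step(state, v=1.0, delta=0.0, dt=0.01)
```

In the paper, the parameters of a differentiable model of this kind are optimized jointly with the VIO state variables, so the forward predictions adapt online to wheel and terrain changes.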

Autonomous Learning Conference Paper Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints Kolev, P., Vlastelica, M., Martius, G. In Seventeenth European Workshop on Reinforcement Learning, August 2024 (Accepted)
While many algorithms for diversity maximization under imitation constraints are online in nature, many applications require offline algorithms without environment interactions. Tackling this problem in the offline setting, however, presents significant challenges that require non-trivial, multi-stage optimization processes with non-stationary rewards. In this work, we present a novel offline algorithm that enhances diversity using an objective based on Van der Waals (VdW) force and successor features, and eliminates the need to learn a previously used skill discriminator. Moreover, by conditioning the value function and policy on a pre-trained Functional Reward Encoding (FRE), our method allows for better handling of non-stationary rewards and provides zero-shot recall of all skills encountered during training, significantly expanding the set of skills learned in prior work. Consequently, our algorithm benefits from receiving a consistently strong diversity signal (VdW), and enjoys more stable and efficient training. We demonstrate the effectiveness of our method in generating diverse skills for two robotic tasks in simulation: locomotion of a quadruped and local navigation with obstacle traversal.
URL BibTeX

Haptic Intelligence Miscellaneous Adapting a High-Fidelity Simulation of Human Skin for Comparative Touch Sensing Schulz, A., Serhat, G., Kuchenbecker, K. J. Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (Published) BibTeX

Empirical Inference Conference Paper Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals Ortu*, F., Jin*, Z., Doimo, D., Sachan, M., Cazzaniga, A., Schölkopf, B. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1, Long Papers:8420-8436, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024, *equal contribution (Published) arXiv URL BibTeX

Haptic Intelligence Ph.D. Thesis Engineering and Evaluating Naturalistic Vibrotactile Feedback for Telerobotic Assembly Gong, Y. University of Stuttgart, Stuttgart, Germany, August 2024, Faculty of Engineering Design, Production Engineering and Automotive Engineering (Published)
Teleoperation allows workers on a construction site to assemble pre-fabricated building components by controlling powerful machines from a safe distance. However, teleoperation's primary reliance on visual feedback limits the operator's efficiency in situations with stiff contact or poor visibility, compromising their situational awareness and thus increasing the difficulty of the task; it also makes construction machines more difficult to learn to operate. To bridge this gap, we propose that reliable, economical, and easy-to-implement naturalistic vibrotactile feedback could improve telerobotic control interfaces in construction and other application areas such as surgery. This type of feedback enables the operator to feel the natural vibrations experienced by the robot, which contain crucial information about its motions and its physical interactions with the environment. This dissertation explores how to deliver naturalistic vibrotactile feedback from a robot's end-effector to the hand of an operator performing telerobotic assembly tasks; furthermore, it seeks to understand the effects of such haptic cues. The presented research can be divided into four parts. We first describe the engineering of AiroTouch, a naturalistic vibrotactile feedback system tailored for use on construction sites but suitable for many other applications of telerobotics. Then we evaluate AiroTouch and explore the effects of the naturalistic vibrotactile feedback it delivers in three user studies conducted either in laboratory settings or on a construction site. We begin this dissertation by developing guidelines for creating a haptic feedback system that provides high-quality naturalistic vibrotactile feedback. These guidelines include three sections: component selection, component placement, and system evaluation. We detail each aspect with the parameters that need to be considered. 
Based on these guidelines, we adapt widely available commercial audio equipment to create our system called AiroTouch, which measures the vibration experienced by each robot tool with a high-bandwidth three-axis accelerometer and enables the user to feel this vibration in real time through a voice-coil actuator. Accurate haptic transmission is achieved by optimizing the positions of the system's off-the-shelf sensors and actuators and is then verified through measurements. The second part of this thesis presents our initial validation of AiroTouch. We explored how adding this naturalistic type of vibrotactile feedback affects the operator during small-scale telerobotic assembly. Due to the limited accessibility of teleoperated robots and to maintain safety, we conducted a user study in the lab with a commercial bimanual dexterous teleoperation system developed for surgery (Intuitive da Vinci Si). Thirty participants used this robot equipped with AiroTouch to assemble a small stiff structure under three randomly ordered haptic feedback conditions: no vibrations, one-axis vibrations, and summed three-axis vibrations. The results show that participants learn to take advantage of both tested versions of the haptic feedback in the given tasks, as significantly lower vibrations and forces are observed in the second trial. Subjective responses indicate that naturalistic vibrotactile feedback increases the realism of the interaction and reduces the perceived task duration, task difficulty, and fatigue. To test our approach on a real construction site, we enhanced AiroTouch using wireless signal-transmission technologies and waterproofing, and then we adapted it to a mini-crane construction robot. A study was conducted to evaluate how naturalistic vibrotactile feedback affects an observer's understanding of telerobotic assembly performed by this robot on a construction site.
Seven adults without construction experience observed a mix of manual and autonomous assembly processes both with and without naturalistic vibrotactile feedback. Qualitative analysis of their survey responses and interviews indicates that all participants had positive responses to this technology and believed it would be beneficial for construction activities. Finally, we evaluated the effects of naturalistic vibrotactile feedback provided by wireless AiroTouch during live teleoperation of the mini-crane. Twenty-eight participants remotely controlled the mini-crane to complete three large-scale assembly-related tasks in the laboratory, both with and without this type of haptic feedback. Our results show that naturalistic vibrotactile feedback enhances the participants' awareness of both robot motion and contact between the robot and other objects, particularly in scenarios with limited visibility. These effects increase participants' confidence when controlling the robot. Moreover, there is a noticeable trend of reduced vibration magnitude in the conditions where this type of haptic feedback is provided. The primary contribution of this dissertation is the clear explanation of details that are essential for the effective implementation of naturalistic vibrotactile feedback. We demonstrate that our accessible, audio-based approach can enhance user performance and experience during telerobotic assembly in construction and other application domains. These findings lay the foundation for further exploration of the potential benefits of incorporating haptic cues to enhance user experience during teleoperation.
BibTeX

Haptic Intelligence Article Fingertip Dynamic Response Simulated Across Excitation Points and Frequencies Serhat, G., Kuchenbecker, K. J. Biomechanics and Modeling in Mechanobiology, 23(4):1369-1376, August 2024 (Published)
Predicting how the fingertip will mechanically respond to different stimuli can help explain human haptic perception and enable improvements to actuation approaches such as ultrasonic mid-air haptics. This study addresses this goal using high-fidelity 3D finite element analyses. We compute the deformation profiles and amplitudes caused by harmonic forces applied in the normal direction at four locations: the center of the finger pad, the side of the finger, the tip of the finger, and the oblique midpoint of these three sites. The excitation frequency is swept from 2.5 to 260 Hz. The simulated frequency response functions (FRFs) obtained for displacement demonstrate that the relative magnitudes of the deformations elicited by stimulating at each of these four locations greatly depend on whether only the excitation point or the entire finger is considered. The point force that induces the smallest local deformation can even cause the largest overall deformation at certain frequency intervals. Above 225 Hz, oblique excitation produces larger mean displacement amplitudes than the other three forces due to excitation of multiple modes involving diagonal deformation. These simulation results give novel insights into the combined influence of excitation location and frequency on the fingertip dynamic response, potentially facilitating the design of future vibration feedback devices.
DOI BibTeX

Empirical Inference Article Leveraging Task Structures for Improved Identifiability in Neural Network Representations Chen*, W., Horwood*, J., Heo, J., Hernández-Lobato, J. M. Transactions on Machine Learning Research, August 2024, *equal contribution (Published) URL BibTeX

Haptic Intelligence Robotics Miscellaneous Modeling Shank Tissue Properties and Quantifying Body Composition with a Wearable Actuator-Accelerometer Set Rokhmanova, N., Martus, J., Faulkner, R., Fiene, J., Kuchenbecker, K. J. Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (Published) BibTeX

Empirical Inference Conference Paper Modelling Variability in Human Annotator Simulation Wu*, W., Chen*, W., Zhang, C., Woodland, P. C. Findings of the Association for Computational Linguistics (ACL), 1139-1157, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Moûsai: Efficient Text-to-Music Diffusion Models Schneider, F., Kamal, O., Jin, Z., Schölkopf, B. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers:8050-8068, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024 (Published) URL BibTeX

Perceiving Systems Article Re-Thinking Inverse Graphics with Large Language Models Kulits, P., Feng, H., Liu, W., Abrevaya, V., Black, M. J. Transactions on Machine Learning Research, August 2024 (Published)
Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Successfully disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the environment. This complexity limits the ability of existing carefully engineered approaches to generalize across domains. Inspired by the zero-shot ability of large language models (LLMs) to generalize to novel contexts, we investigate the possibility of leveraging the broad world knowledge encoded in such models to solve inverse-graphics problems. To this end, we propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM, that autoregressively decodes a visual embedding into a structured, compositional 3D-scene representation. We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training. Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research.
pdf URL BibTeX

Empirical Inference Conference Paper CausalCite: A Causal Formulation of Paper Citations Agrawal, I., Jin, Z., Mokhtarian, E., Guo, S., Chen, Y., Sachan, M., Schölkopf, B. Findings of the Association for Computational Linguistics (ACL), 8395-8410, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper A Sparsity Principle for Partially Observable Causal Representation Learning Xu, D., Yao, D., Lachapelle, S., Taslakian, P., von Kügelgen, J., Locatello, F., Magliacane, S. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:55389-55433, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Physics for Inference and Optimization Conference Paper A causality-inspired plus-minus model for player evaluation in team sports De Bacco, C., Wang, Y., Blei, D. In Proceedings of Machine Learning Research, Conference on Causal Learning and Reasoning, July 2024 (Published) Paper DOI URL BibTeX

Empirical Inference Conference Paper Accuracy on the wrong line: On the pitfalls of noisy data for OOD generalisation Sanyal, A., Hu, Y., Yu, Y., Ma, Y., Wang, Y., Schölkopf, B. ICML 2024 Next Generation of AI Safety Workshop (Oral), July 2024 (Published) arXiv PDF BibTeX

Empirical Inference Conference Paper All-in-one simulation-based inference Gloeckler, M., Deistler, M., Weilbach, C. D., Wood, F., Macke, J. H. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:15735-15766, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper Allocation Requires Prediction Only if Inequality Is Low Shirali, A., Abebe, R., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, The Forty-First International Conference on Machine Learning (ICML), July 2024 (Published)
Algorithmic predictions are emerging as a promising solution concept for efficiently allocating societal resources. Fueling their use is an underlying assumption that such systems are necessary to identify individuals for interventions. We propose a principled framework for assessing this assumption: Using a simple mathematical model, we evaluate the efficacy of prediction-based allocations in settings where individuals belong to larger units such as hospitals, neighborhoods, or schools. We find that prediction-based allocations outperform baseline methods using aggregate unit-level statistics only when between-unit inequality is low and the intervention budget is high. Our results hold for a wide range of settings for the price of prediction, treatment effect heterogeneity, and unit-level statistics’ learnability. Taken together, our results highlight the potential limits to improving the efficacy of interventions through prediction.
ArXiv URL BibTeX

Autonomous Learning Conference Paper Causal Action Influence Aware Counterfactual Data Augmentation Urpi, N. A., Bagatella, M., Vlastelica, M., Martius, G. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:1709-1729, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper Causal Inference from Competing Treatments Stoica, A., Nastl, V. Y., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, The Forty-First International Conference on Machine Learning (ICML), July 2024 (Published)
Many applications of RCTs involve the presence of multiple treatment administrators -- from field experiments to online advertising -- that compete for the subjects' attention. In the face of competition, estimating a causal effect becomes difficult, as the position at which a subject sees a treatment influences their response, and thus the treatment effect. In this paper, we build a game-theoretic model of agents who wish to estimate causal effects in the presence of competition, through a bidding system and a utility function that minimizes estimation error. Our main technical result establishes an approximation with a tractable objective that maximizes the sample value obtained through strategically allocating budget on subjects. This allows us to find an equilibrium in our model: we show that the tractable objective has a pure Nash equilibrium, and that any Nash equilibrium is an approximate equilibrium for our general objective that minimizes estimation error under broad conditions. Conceptually, our work successfully combines elements from causal inference and game theory to shed light on the equilibrium behavior of experimentation under competition.
ArXiv URL BibTeX

Empirical Inference Conference Paper Detecting and Identifying Selection Structure in Sequential Data Zheng, Y., Tang, Z., Qiu, Y., Schölkopf, B., Zhang, K. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:61498-61525, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for ODEs Beck, J., Bosch, N., Deistler, M., Kadhim, K. L., Macke, J. H., Hennig, P., Berens, P. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:3305-3326, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Diffusive Gibbs Sampling Chen*, W., Zhang*, M., Paige, B., Hernández-Lobato, J. M., Barber, D. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:7731-7747, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? Opedal, A., Stolfo, A., Shirakami, H., Jiao, Y., Cotterell, R., Schölkopf, B., Saparov, A., Sachan, M. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:38762-38778, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published) URL BibTeX