Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Haptic Intelligence Article Towards Semi-Automated Pleural Cavity Access for Pneumothorax in Austere Environments L’Orsa, R., Lama, S., Westwick, D., Sutherland, G., Kuchenbecker, K. J. Acta Astronautica, 212:48-53, November 2023 (Published)
Astronauts are at risk for pneumothorax, a condition where injury or disease introduces air between the chest wall and the lungs (i.e., the pleural cavity). In a worst-case scenario, it can rapidly lead to a fatality if left unmanaged and will require prompt treatment in situ if developed during spaceflight. Chest tube insertion is the definitive treatment for pneumothorax, but it requires a high level of skill and frequent practice for safe use. Physician astronauts may struggle to maintain this skill on medium- and long-duration exploration-class missions, and it is inappropriate for pure just-in-time learning or skill refreshment paradigms. This paper proposes semi-automating tool insertion to reduce the risk of complications in austere environments and describes preliminary experiments providing initial validation of an intelligent prototype system. Specifically, we showcase and analyse motion and force recordings from a sensorized percutaneous access needle inserted repeatedly into an ex vivo tissue phantom, along with relevant physiological data simultaneously recorded from the operator. When coupled with minimal just-in-time training and/or augmented reality guidance, the proposed system may enable non-expert operators to safely perform emergency chest tube insertion without the use of ground resources.
DOI BibTeX

Empirical Inference Article Variational Causal Dynamics: Discovering Modular World Models from Interventions Lei, A., Schölkopf, B., Posner, I. Transactions on Machine Learning Research, November 2023 (Published) URL BibTeX

Robotic Materials Patent Hydraulically Amplified Self-healing Electrostatic Actuators Keplinger, C. M., Acome, E. L., Kellaris, N. A., Mitchell, S. K. (US Patent 11795979B2), October 2023
An electro-hydraulic actuator includes a deformable shell defining an enclosed internal cavity and containing a liquid dielectric, first and second electrodes on first and second sides, respectively, of the enclosed internal cavity. An electrostatic force between the first and second electrodes upon application of a voltage to one of the electrodes draws the electrodes towards each other to displace the liquid dielectric within the enclosed internal cavity. The shell includes active and inactive areas such that the electrostatic forces between the first and second electrodes displaces the liquid dielectric within the enclosed internal cavity from the active area of the shell to the inactive area of the shell. The first and second electrodes, the deformable shell, and the liquid dielectric cooperate to form a self-healing capacitor, and the liquid dielectric is configured for automatically filling breaches in the liquid dielectric resulting from dielectric breakdown.
URL BibTeX

Perceiving Systems Ph.D. Thesis Neural Shape Modeling of 3D Clothed Humans Ma, Q. October 2023 (Published)
Parametric models for 3D human bodies play a crucial role in the synthesis and analysis of humans in visual computing. While current models effectively capture body pose and shape variations, a significant aspect has been overlooked – clothing. Existing 3D human models mostly produce a minimally-clothed body geometry, limiting their ability to represent the complexity of dressed people in real-world data sources. The challenge lies in the unique characteristics of garments, which make modeling clothed humans particularly difficult. Clothing exhibits diverse topologies, and as the body moves, it introduces wrinkles at various spatial scales. Moreover, pose-dependent clothing deformations are non-rigid and non-linear, exceeding the capabilities of classical body models constructed with fixed-topology surface meshes and linear approximations of pose-aware shape deformations. This thesis addresses these challenges by innovating in two key areas: the 3D shape representation and deformation modeling techniques. We demonstrate that, the seemingly old-fashioned shape representation, point clouds – when equipped with deep learning and neural fields – can be a powerful tool for modeling clothed characters. Specifically, the thesis begins by introducing a large-scale dataset of dynamic 3D humans in various clothing, which serves as a foundation for training the models presented in this work. The first model we present is CAPE: a neural generative model for 3D clothed human meshes. Here, a clothed body is straightforwardly obtained by applying per-vetex offsets to a pre-defined, unclothed body template mesh. Sampling from the CAPE model generates plausibly-looking digital humans wearing common garments, but the fixed-topology mesh representation limits its applicability to more complex garment types. To address this limitation, we present a series of point-based clothed human models: SCALE, PoP and SkiRT. The SCALE model represents a clothed human using a collection of points organized into local patches. The patches can freely move and deform to represent garments of diverse topologies, unlocking the generalization to more challenging outfits such as dresses and jackets. Unlike traditional approaches based on physics simulations, SCALE learns pose-dependent cloth deformations from data with minimal manual intervention. To further improve the geometric quality, the PoP model eliminates the concept of patches and instead learns a continuous neural deformation field from the body surface. Densely querying this field results in a highresolution point cloud of a dressed human, showcasing intricate clothing wrinkles. PoP can generalize across multiple subjects and outfits, and can even bring a single, static scan into animation. Finally, we tackle a long-standing challenge in learning-based digital human modeling: loose garments, in particular skirts and dresses. Building upon PoP, the SkiRT pipeline further learns a shape “template” and neural field of linear-blend-skinning weights for clothed bodies, improving the models’ robustness for loose garments of varied topology. Our point-based human models are “interplicit”: the output point clouds capture surfaces explicitly at discrete points but implicitly in between. The explicit points are fast, topologically flexible, and are compatible with existing graphics tools, while the implicit neural deformation field contributes to high-quality geometry. This thesis primarily demonstrates these advantages in the context of clothed human shape modeling; future work can apply our representation and techniques to general 3D deformable shapes and neural rendering.
download Thesis DOI BibTeX

Perceiving Systems Conference Paper Optimizing the 3D Plate Shape for Proximal Humerus Fractures Keller, M., Krall, M., Smith, J., Clement, H., Kerner, A. M., Gradischar, A., Schäfer, Ü., Black, M. J., Weinberg, A., Pujades, S. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 487-496, Springer, Cham, MICCAI, October 2023 (Published)
To treat bone fractures, implant manufacturers produce 2D anatomically contoured plates. Unfortunately, existing plates only fit a limited segment of the population and/or require manual bending during surgery. Patient-specific implants would provide major benefits such as reducing surgery time and improving treatment outcomes but they are still rare in clinical practice. In this work, we propose a patient-specific design for the long helical 2D PHILOS (Proximal Humeral Internal Locking System) plate, used to treat humerus shaft fractures. Our method automatically creates a custom plate from a CT scan of a patient's bone. We start by designing an optimal plate on a template bone and, with an anatomy-aware registration method, we transfer this optimal design to any bone. In addition, for an arbitrary bone, our method assesses if a given plate is fit for surgery by automatically positioning it on the bone. We use this process to generate a compact set of plate shapes capable of fitting the bones within a given population. This plate set can be pre-printed in advance and readily available, removing the fabrication time between the fracture occurrence and the surgery. Extensive experiments on ex-vivo arms and 3D-printed bones show that the generated plate shapes (personalized and plate-set) faithfully match the individual bone anatomy and are suitable for clinical practice.
Project page Code Paper Poster DOI URL BibTeX

Empirical Inference Article A taxonomy and review of generalization research in NLP Hupkes, D., Giulianelli, M., Dankers, V., Artetxe, M., Elazar, Y., Pimentel, T., Christodoulopoulos, C., Lasri, K., Saphra, N., Sinclair, A., Ulmer, D., Schottmann, F., Batsuren, K., Sun, K., Sinha, K., Khalatbari, L., Ryskina, M., Frieske, R., Cotterell, R., Jin, Z. Nature Machine Intelligence, 5(10):1161-1174, October 2023 (Published) DOI BibTeX

Empirical Inference Article Artificial Intelligence in Oncological Hybrid Imaging Feuerecker, B., Heimer, M. M., Geyer, T., Fabritius, M. P., Gu, S., Schachtner, B., Beyer, L., Ricke, J., Gatidis, S., Ingrisch, M., Cyran, C. C. Nuklearmedizin, 62(5):296-305, October 2023 (Published) DOI BibTeX

Haptic Intelligence Intelligent Control Systems Conference Paper Enhancing Surgical Team Collaboration and Situation Awareness through Multimodal Sensing Allemang–Trivalle, A. In Proceedings of the ACM International Conference on Multimodal Interaction, 716-720, Extended abstract (5 pages) presented at the ACM International Conference on Multimodal Interaction (ICMI) Doctoral Consortium, Paris, France, October 2023 (Published)
Surgery, typically seen as the surgeon's sole responsibility, requires a broader perspective acknowledging the vital roles of other operating room (OR) personnel. The interactions among team members are crucial for delivering quality care and depend on shared situation awareness. I propose a two-phase approach to design and evaluate a multimodal platform that monitors OR members, offering insights into surgical procedures. The first phase focuses on designing a data-collection platform, tailored to surgical constraints, to generate novel collaboration and situation-awareness metrics using synchronous recordings of the participants' voices, positions, orientations, electrocardiograms, and respiration signals. The second phase concerns the creation of intuitive dashboards and visualizations, aiding surgeons in reviewing recorded surgery, identifying adverse events and contributing to proactive measures. This work aims to demonstrate an innovative approach to data collection and analysis, augmenting the surgical team's capabilities. The multimodal platform has the potential to enhance collaboration, foster situation awareness, and ultimately mitigate surgical adverse events. This research sets the stage for a transformative shift in the OR, enabling a more holistic and inclusive perspective that recognizes that surgery is a team effort.
DOI BibTeX

Perceiving Systems Conference Paper Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance Feng, H., Kulits, P., Liu, S., Black, M. J., Fernandez Abrevaya, V. In Proc. International Conference on Computer Vision (ICCV), International Conference on Computer Vision, October 2023 (Published)
We address the problem of fitting a parametric human body model (SMPL) to point cloud data. Optimization based methods require careful initialization and are prone to becoming trapped in local optima. Learning-based methods address this but do not generalize well when the input pose is far from those seen during training. For rigid point clouds, remarkable generalization has been achieved by leveraging SE(3)-equivariant networks, but these methods do not work on articulated objects. In this work we extend this idea to human bodies and propose ArtEq, a novel part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. Specifically, we learn a part detection network by leveraging local SO(3) invariance, and regress shape and pose using articulated SE(3) shape-invariant and pose-equivariant networks, all trained end-to-end. Our novel pose regression module leverages the permutation-equivariant property of self-attention layers to preserve rotational equivariance. Experimental results show that ArtEq generalizes to poses not seen during training, outperforming state-of-the-art methods by ~44%in terms of body reconstruction accuracy, without requiring an optimization refinement step. Furthermore, ArtEq is three orders of magnitude faster during inference than prior work and has 97.3% fewer parameters. The code and model are available for research purposes at https://arteq.is.tue.mpg.de.
arxiv project URL BibTeX

Haptic Intelligence Ph.D. Thesis Gesture-Based Nonverbal Interaction for Exercise Robots Mohan, M. University of Tübingen, Tübingen, Germany, October 2023, Department of Computer Science (Published)
When teaching or coaching, humans augment their words with carefully timed hand gestures, head and body movements, and facial expressions to provide feedback to their students. Robots, however, rarely utilize these nuanced cues. A minimally supervised social robot equipped with these abilities could support people in exercising, physical therapy, and learning new activities. This thesis examines how the intuitive power of human gestures can be harnessed to enhance human-robot interaction. To address this question, this research explores gesture-based interactions to expand the capabilities of a socially assistive robotic exercise coach, investigating the perspectives of both novice users and exercise-therapy experts. This thesis begins by concentrating on the user's engagement with the robot, analyzing the feasibility of minimally supervised gesture-based interactions. This exploration seeks to establish a framework in which robots can interact with users in a more intuitive and responsive manner. The investigation then shifts its focus toward the professionals who are integral to the success of these innovative technologies: the exercise-therapy experts. Roboticists face the challenge of translating the knowledge of these experts into robotic interactions. We address this challenge by developing a teleoperation algorithm that can enable exercise therapists to create customized gesture-based interactions for a robot. Thus, this thesis lays the groundwork for dynamic gesture-based interactions in minimally supervised environments, with implications for not only exercise-coach robots but also broader applications in human-robot interaction.
BibTeX

Social Foundations of Computation Conference Paper Is Your Model Predicting the Past? Hardt, M., Kim, M. P. In Proceedings of the Third ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), ACM, October 2023 (Published)
When does a machine learning model predict the future of individuals and when does it recite patterns that predate the individuals? In this work, we propose a distinction between these two pathways of prediction, supported by theoretical, empirical, and normative arguments. At the center of our proposal is a family of simple and efficient statistical tests, called backward baselines, that demonstrate if, and to what extent, a model recounts the past. Our statistical theory provides guidance for interpreting backward baselines, establishing equivalences between different baselines and familiar statistical concepts. Concretely, we derive a meaningful backward baseline for auditing a prediction system as a black box, given only background variables and the system’s predictions. Empirically, we evaluate the framework on different prediction tasks derived from longitudinal panel surveys, demonstrating the ease and effectiveness of incorporating backward baselines into the practice of machine learning.
URL BibTeX

Empirical Inference Perceiving Systems Conference Paper One-shot Implicit Animatable Avatars with Model-based Priors Huang, Y., Yi, H., Liu, W., Wang, H., Wu, B., Wang, W., Lin, B., Zhang, D., Cai, D. In Proc. International Conference on Computer Vision (ICCV), 8940-8951, International Conference on Computer Vision, October 2023, *equal contribution (Published)
Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can easily reconstruct the body geometry and infer the full-body clothing from a single image, we leverage two priors in ELICIT: 3D geometry prior and visual semantic prior. Specifically, ELICIT introduces the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with the CLIP-based pre-trained models. Both priors are used to jointly guide the optimization for creating plausible content in the invisible areas. In order to further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar.Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT has outperformed current state-of-the-art avatar creation methods when only a single image is available. Code will be public for reseach purpose at https://github.com/huangyangyi/ELICIT
arXiv code project DOI BibTeX

Perceiving Systems Empirical Inference Conference Paper Pairwise Similarity Learning is SimPLE Wen, Y., Liu, W., Feng, Y., Raj, B., Singh, R., Weller, A., Black, M. J., Schölkopf, B. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), International Conference on Computer Vision, October 2023 (Published)
In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). PSL subsumes a wide range of important applications, such as open-set face recognition, speaker verification, image retrieval and person re-identification. The goal of PSL is to learn a pairwise similarity function assigning a higher similarity score to positive pairs (i.e., a pair of samples with the same label) than to negative pairs (i.e., a pair of samples with different label). We start by identifying a key desideratum for PSL, and then discuss how existing methods can achieve this desideratum. We then propose a surprisingly simple proxy-free method, called SimPLE, which requires neither feature/proxy normalization nor angular margin and yet is able to generalize well in open-set recognition. We apply the proposed method to three challenging PSL tasks: open-set face recognition, image retrieval and speaker verification. Comprehensive experimental results on large-scale benchmarks show that our method performs significantly better than current state-of-the-art methods.
URL BibTeX

Robust Machine Learning Conference Paper Scale Alone Does not Improve Mechanistic Interpretability in Vision Models Zimmermann, R. S., Klein, T., Brendel, W. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 57876 - 57907, Curran Associates Inc., NeurIPS, October 2023 (Published) NeurIPS Proceedings DOI URL BibTeX

Haptic Intelligence Miscellaneous Seeking Causal, Invariant, Structures with Kernel Mean Embeddings in Haptic-Auditory Data from Tool-Surface Interaction Khojasteh, B., Shao, Y., Kuchenbecker, K. J. Workshop paper (4 pages) presented at the IROS Workshop on Causality for Robotics: Answering the Question of Why, Detroit, USA, October 2023 (Published)
Causal inference could give future learning robots strong generalization and scalability capabilities, which are crucial for safety, fault diagnosis and error prevention. One application area of interest consists of the haptic recognition of surfaces. We seek to understand cause and effect during physical surface interaction by examining surface and tool identity, their interplay, and other contact-irrelevant factors. To work toward elucidating the mechanism of surface encoding, we attempt to recognize surfaces from haptic-auditory data captured by previously unseen hemispherical steel tools that differ from the recording tool in diameter and mass. In this context, we leverage ideas from kernel methods to quantify surface similarity through descriptive differences in signal distributions. We find that the effect of the tool is significantly present in higher-order statistical moments of contact data: aligning the means of the distributions being compared somewhat improves recognition but does not fully separate tool identity from surface identity. Our findings shed light on salient aspects of haptic-auditory data from tool-surface interaction and highlight the challenges involved in generalizing artificial surface discrimination capabilities.
Manuscript URL BibTeX

Perceiving Systems Conference Paper AG3D: Learning to Generate 3D Avatars from 2D Image Collections Dong, Z., Chen, X., Yang, J., Black, M. J., Hilliges, O., Geiger, A. In Proc. International Conference on Computer Vision (ICCV), 14916-14927, International Conference on Computer Vision (ICCV), October 2023 (Published)
While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars from abundant unstructured 2D image collections. However, learning realistic and complete 3D appearance and geometry in this under-constrained setting remains challenging, especially in the presence of loose clothing such as dresses. In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator and integrating an efficient and flexible articulation module. To improve realism, we train our model using multiple discriminators while also integrating geometric cues in the form of predicted 2D normal maps. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance. We validate the effectiveness of our model and the importance of each component via systematic ablation studies.
project pdf code video DOI URL BibTeX

Empirical Inference Article CROCODILE - Incorporating medium-resolution spectroscopy of close-in directly imaged exoplanets into atmospheric retrievals via cross-correlation Hayoz, J., Cugno, G., Quanz, S. P., Patapis, P., Alei, E., Bonse, M. J., Dannert, F. A., Garvin, E. O., Gebhard, T. D., Konrad, B. S., Sartori, L. F. Astronomy & Astrophysics, 678, October 2023 (Published) DOI BibTeX

Perceiving Systems Conference Paper D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field Yang, X., Luo, Y., Xiu, Y., Wang, W., Xu, H., Fan, Z. In Proc. International Conference on Computer Vision (ICCV), 9122-9132, International Conference on Computer Vision, October 2023 (Published)
Realistic virtual humans play a crucial role in numerous industries, such as metaverse, intelligent healthcare, and self-driving simulation. But creating them on a large scale with high levels of realism remains a challenge. The utilization of deep implicit function sparks a new era of image-based 3D clothed human reconstruction, enabling pixel-aligned shape recovery with fine details. Subsequently, the vast majority of works locate the surface by regressing the deterministic implicit value for each point. However, should all points be treated equally regardless of their proximity to the surface? In this paper, we propose replacing the implicit value with an adaptive uncertainty distribution, to differentiate between points based on their distance to the surface. This simple "value to distribution" transition yields significant improvements on nearly all the baselines. Furthermore, qualitative results demonstrate that the models trained using our uncertainty distribution loss, can capture more intricate wrinkles, and realistic limbs.
Code Homepage URL BibTeX

Perceiving Systems Software Workshop Conference Paper DECO: Dense Estimation of 3D Human-Scene Contact in the Wild Tripathi, S., Chatterjee, A., Passy, J., Yi, H., Tzionas, D., Black, M. J. In Proc. International Conference on Computer Vision (ICCV), 8001-8013, International Conference on Computer Vision, October 2023 (Published)
Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically-plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. In contrast, we focus on inferring dense, 3D contact between the full body surface and objects in arbitrary images. To achieve this, we first collect DAMON, a new dataset containing dense vertex-level contact annotations paired with RGB images containing complex human-object and human-scene contact. Second, we train DECO, a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate vertex-level contact on the SMPL body. DECO builds on the insight that human observers recognize contact by reasoning about the contacting body parts, their proximity to scene objects, and the surrounding scene context. We perform extensive evaluations of our detector on DAMON as well as on the RICH and BEHAVE datasets. We significantly outperform existing SOTA methods across all benchmarks. We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images. The code, data, and models are available at https://deco.is.tue.mpg.de/login.php.
Project Video Poster Code Data DOI URL BibTeX

Perceiving Systems Conference Paper SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation Athanasiou, N., Petrovich, M., Black, M. J., Varol, G. In Proc. International Conference on Computer Vision (ICCV), 9984-9995, International Conference on Computer Vision, October 2023 (Published)
Our goal is to synthesize 3D human motions given textual inputs describing multiple simultaneous actions, for example ‘waving hand’ while ‘walking’ at the same time. We refer to generating such simultaneous movements as performing ‘spatial compositions’. In contrast to ‘temporal compositions’ that seek to transition from one action to another in a sequence, spatial compositing requires understanding which body parts are involved with which action. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as “what parts of the body are moving when someone is doing the action <action name>?”. Given this action-part mapping, we automatically create new training data by artificially combining body parts from multiple text-motion pairs together. We extend previous work on text-to-motions synthesis to train on spatial compositions, and introduce SINC (“SImultaneous actioN Compositions for 3D human motions”). We experimentally validate that our additional GPT-guided data helps to better learn compositionality compared to training only on existing real data of simultaneous actions, which is limited in quantity.
website code paper-arxiv video BibTeX

Perceiving Systems Conference Paper TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis Petrovich, M., Black, M. J., Varol, G. In Proc. International Conference on Computer Vision (ICCV), 9488-9497, International Conference on Computer Vision, October 2023 (Published)
In this paper, we present TMR, a simple yet effective approach for text to 3D human motion retrieval. While previous work has only treated retrieval as a proxy evaluation metric, we tackle it as a standalone task. Our method extends the state-of-the-art text-to-motion synthesis model TEMOS, and incorporates a contrastive loss to better structure the cross-modal latent space. We show that maintaining the motion generation loss, along with the contrastive training, is crucial to obtain good performance. We introduce a benchmark for evaluation and provide an in-depth analysis by reporting results on several protocols. Our extensive experiments on the KIT-ML and HumanML3D datasets show that TMR outperforms the prior work by a significant margin, for example reducing the median rank from 54 to 19. Finally, we showcase the potential of our approach on moment retrieval. Our code and models are publicly available.
website code paper-arxiv video URL BibTeX

Autonomous Learning Conference Paper Regularity as Intrinsic Reward for Free Play Sancaktar, C., Piater, J., Martius, G. In Advances in Neural Information Processing Systems (NeurIPS, Advances in Neural Information Processing Systems 36, September 2023 (Published)
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model’s epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
URL BibTeX

Organizational Leadership and Diversity Article Hooked on artificial agents: a systems thinking perspective Ðula, I., Berberena, T., Keplinger, K., Wirzberger, M. Frontiers in Behavioral Economics, 2:1223281, September 2023 (Published)
Following recent technological developments in the artificial intelligence space, artificial agents are increasingly taking over organizational tasks typically reserved for humans. Studies have shown that humans respond differently to this, with some being appreciative of their advice (algorithm appreciation), others being averse toward them (algorithm aversion), and others still fully relinquishing control to artificial agents without adequate oversight (automation bias). Using systems thinking, we analyze the existing literature on these phenomena and develop a conceptual model that provides an underlying structural explanation for their emergence. In doing so, we create a powerful visual tool that can be used to ground discussions about the impact artificial agents have on organizations and humans within them.
Hooked on artificial agents DOI URL BibTeX

Empirical Inference Article A historical perspective of biomedical explainable AI research Malinverno, L., Barros, V., Ghisoni, F., Visonà, G., Kern, R., Nickel, P. J., Ventura, B. E., Šimić, I., Stryeck, S., Manni, F., Ferri, C., Jean-Quartier, C., Genga, L., Schweikert, G., Lovrić, M., Rosen-Zvi, M. Patterns, 4(9), September 2023 (Published) DOI BibTeX

Empirical Inference Conference Paper Certified private data release for sparse Lipschitz functions Donhauser, K., Lokna, J., Sanyal, A., Boedihardjo, M., Hönig, R., Yang, F. TPDP 2023 - Theory and Practice of Differential Privacy, September 2023 (Published) arXiv URL BibTeX

Empirical Inference Master Thesis Efficient Sampling from Differentiable Matrix Elements Kofler, A. Technical University of Munich, Germany, September 2023 (Published) BibTeX

Empirical Inference Conference Paper How to make semi-private learning more effective Pinto, F., Hu, Y., Yang, F., Sanyal, A. TPDP 2023 - Theory and Practice of Differential Privacy, September 2023 (Published) arXiv URL BibTeX

Social Foundations of Computation Conference Paper Incentivizing Honesty among Competitors in Collaborative Learning and Optimization Dorner, F. E., Konstantinov, N., Pashaliev, G., Vechev, M. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS), September 2023 (Published)
Collaborative learning techniques have the potential to enable training machine learning models that are superior to models trained on a single entity’s data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as firms that each aim to attract customers by providing the best recommendations. This can incentivize dishonest updates that damage other participants' models, potentially undermining the benefits of collaboration. In this work, we formulate a game that models such interactions and study two learning tasks within this framework: single-round mean estimation and multi-round SGD on strongly-convex objectives. For a natural class of player actions, we show that rational clients are incentivized to strongly manipulate their updates, preventing learning. We then propose mechanisms that incentivize honest communication and ensure learning quality comparable to full cooperation. Lastly, we empirically demonstrate the effectiveness of our incentive scheme on a standard non-convex federated learning benchmark. Our work shows that explicitly modeling the incentives and actions of dishonest clients, rather than assuming them malicious, can enable strong robustness guarantees for collaborative learning.
arXiv URL BibTeX

Haptic Intelligence Miscellaneous NearContact: Accurate Human Detection using Tomographic Proximity and Contact Sensing with Cross-Modal Attention Garrofé, G., Schoeffmann, C., Zangl, H., Kuchenbecker, K. J., Lee, H. Extended abstract (4 pages) presented at the International Workshop on Human-Friendly Robotics (HFR), Munich, Germany, September 2023 (Published) BibTeX

Empirical Inference Article Neural Causal Structure Discovery from Interventions Ke*, N. R., Bilaniuk*, O., Goyal, A., Bauer, S., Larochelle, H., Schölkopf, B., Mozer, M. C., Pal, C., Bengio, Y. Transactions on Machine Learning Research, September 2023, *equal contribution (Published) URL BibTeX

Empirical Inference Article Simulation-based inference for efficient identification of generative models in computational connectomics Boelts, J., Harth, P., Gao, R., Udvary, D., Yáñez, F., Baum, D., Hege, H., Oberlaender, M., Macke, J. H. PLOS Computational Biology, 19(9):1-28, September 2023 (Published) DOI BibTeX

Perceiving Systems Conference Paper Synthetic Data-Based Detection of Zebras in Drone Imagery Bonetto, E., Ahmad, A. 2023 European Conference on Mobile Robots (ECMR), 1-8, IEEE, ECMR, September 2023 (Published)
Nowadays, there is a wide availability of datasets that enable the training of common object detectors or human detectors. These come in the form of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. On the other hand, uncommon scenarios, like aerial views, animals, like wild zebras, or difficult-to-obtain information, such as human shapes, are hardly available. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and advanced research areas such as target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector can not identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use this to train a YOLO detector from scratch. Through extensive evaluations of our model with real-world data from i) limited datasets available on the internet and ii) a new one collected and manually labelled by us, we show that we can detect zebras by using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open-source at https://eliabntt.github.io/grade-rr.
Generation code pdf DOI URL BibTeX

Organizational Leadership and Diversity Conference Paper Constructing and deconstructing bias: modeling privilege and mentorship in agent-based simulations Smith, A., Heuschkel, S., Keplinger, K., Wu, C. Conference on Cognitive Computational Neuroscience, 10.32470/CCN.2023.1257-0, Conference on Cognitive Computational Neuroscience, Oxford, UK, Conference on Cognitive Computational Neuroscience, August 2023 (Published)
Bias exists in how we pick leaders, who we perceive as being influential, and who we interact with, not only in society, but in organizational contexts. Drawing from leadership emergence and social influence theories, we investigate potential interventions that support diverse leaders. Using agent-based simulations, we model a collective search process on a fitness landscape. Agents combine individual and social learning, and are represented as a feature vector blending relevant (e.g., individual learning characteristics) and irrelevant (e.g., race or gender) features. Agents use rational principles of learning to estimate feature weights on the basis of performance predictions, which are used to dynamically define social influence in their network. We show how biases arise based on historic privilege, but can be drastically reduced through the use of an intervention (e.g. mentorship). This work provides important insights into the cognitive mechanisms underlying bias construction and deconstruction, while pointing towards real-world interventions to be tested in future empirical work.
CCN2023 DOI URL BibTeX

Robotic Materials Patent High Strain Peano Hydraulically Amplified Self-Healing Electrostatic (HASEL) Transducers Keplinger, C. M., Wang, X., Mitchell, S. K. (US Patent App. 18/138,621), August 2023
High strain hydraulically amplified self-healing electrostatic transducers having increased maximum theoretical and practical strains are disclosed. In particular, the actuators include electrode configurations having a zipping front created by the attraction of the electrodes that is configured orthogonally to a strain axis along which the actuators. This configuration produces increased strains. In turn, various form factors for the actuator configuration are presented including an artificial circular muscle and a strain amplifying pulley system. Other actuator configurations are contemplated that include independent and opposed electrode pairs to create cyclic activation, hybrid electrode configurations, and use of strain limiting layers for controlled deflection of the actuator.
URL BibTeX

Haptic Intelligence Software Workshop Autonomous Motion Conference Paper Augmenting Human Policies using Riemannian Metrics for Human-Robot Shared Control Oh, Y., Passy, J., Mainprice, J. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 1612-1618, Busan, South Korea, August 2023 (Published)
We present a shared control framework for teleoperation that combines the human and autonomous robot agents operating in different dimension spaces. The shared control problem is an optimization problem to maximize the human's internal action-value function while guaranteeing that the shared control policy is close to the autonomous robot policy. This results in a state update rule that augments the human controls using the Riemannian metric that emerges from computing the curvature of the robot's value function to account for any cost terms or constraints that the human operator may neglect when operating a redundant manipulator. In our experiments, we apply Linear Quadratic Regulators to locally approximate the robot policy using a single optimized robot trajectory, thereby preventing the need for an optimization step at each time step to determine the optimal policy. We show preliminary results of reach-and-grasp teleoperation tasks with a simulated human policy and a pilot user study using the VR headset and controllers. However, the mixed user preference ratings and quantitative results show that more investigation is required to prove the efficacy of the proposed paradigm.
DOI BibTeX

Learning and Dynamical Systems Empirical Inference Conference Paper Causal Effect Estimation from Observational and Interventional Data Through Matrix Weighted Linear Estimators Kladny, K., von Kügelgen, J., Schölkopf, B., Muehlebach, M. Conference on Uncertainty in Artificial Intelligence, 216:1087-1097, Proceedings of Machine Learning Research, (Editors: Evans, Robin J. and Shpitser, Ilya), PMLR, August 2023 (Published) URL BibTeX

Empirical Inference Article Chasing rainbows and ocean glints: Inner working angle constraints for the Habitable Worlds Observatory Vaughan, S. R., Gebhard, T. D., Bott, K., Casewell, S. L., Cowan, N. B., Doelman, D. S., Kenworthy, M., Mazoyer, J., Millar-Blanchaer, M. A., Trees, V. J. H., Stam, D. M., Absil, O., Altinier, L., Baudoz, P., Belikov, R., Bidot, A., Birkby, J. L., Bonse, M. J., Brandl, B., Carlotti, A., et al. Monthly Notices of the Royal Astronomical Society, 524(4):5477-5485, August 2023 (Published) DOI BibTeX

Haptic Intelligence Perceiving Systems Article Learning to Estimate Palpation Forces in Robotic Surgery From Visual-Inertial Data Lee, Y., Mat Husin, H., Forte, M., Lee, S., Kuchenbecker, K. J. IEEE Transactions on Medical Robotics and Bionics, 5(3):496-506, August 2023, Young-Eun Lee and Haliza Mat Husin contributed equally to this publication (Published)
Surgeons cannot directly touch the patient's tissue in robot-assisted minimally invasive procedures. Instead, they must palpate using instruments inserted into the body through trocars. This way of operating largely prevents surgeons from using haptic cues to localize visually undetectable structures such as tumors and blood vessels, motivating research on direct and indirect force sensing. We propose an indirect force-sensing method that combines monocular images of the operating field with measurements from IMUs attached externally to the instrument shafts. Our method is thus suitable for various robotic surgery systems as well as laparoscopic surgery. We collected a new dataset using a da Vinci Si robot, a force sensor, and four different phantom tissue samples. The dataset includes 230 one-minute-long recordings of repeated bimanual palpation tasks performed by four lay operators. We evaluated several network architectures and investigated the role of the network inputs. Using the DenseNet vision model and including inertial data best-predicted palpation forces (lowest average root-mean-square error and highest average coefficient of determination). Ablation studies revealed that video frames carry significantly more information than inertial signals. Finally, we demonstrated the model's ability to generalize to unseen tissue and predict shear contact forces.
DOI BibTeX

Haptic Intelligence Autonomous Learning Empirical Inference Article Minsight: A Fingertip-Sized Vision-Based Tactile Sensor for Robotic Manipulation Andrussow, I., Sun, H., Kuchenbecker, K. J., Martius, G. Advanced Intelligent Systems, 5(8):2300042, August 2023, Inside back cover, DOI: 10.1002/aisy.202370035 (Published)
Intelligent interaction with the physical world requires perceptual abilities beyond vision and hearing; vibrant tactile sensing is essential for autonomous robots to dexterously manipulate unfamiliar objects or safely contact humans. Therefore, robotic manipulators need high-resolution touch sensors that are compact, robust, inexpensive, and efficient. The soft vision-based haptic sensor presented herein is a miniaturized and optimized version of the previously published sensor Insight. Minsight has the size and shape of a human fingertip and uses machine learning methods to output high-resolution maps of 3D contact force vectors at 60 Hz. Experiments confirm its excellent sensing performance, with a mean absolute force error of 0.07 N and contact location error of 0.6 mm across its surface area. Minsight's utility is shown in two robotic tasks on a 3-DoF manipulator. First, closed-loop force control enables the robot to track the movements of a human finger based only on tactile data. Second, the informative value of the sensor output is shown by detecting whether a hard lump is embedded within a soft elastomer with an accuracy of 98\%. These findings indicate that Minsight can give robots the detailed fingertip touch sensing needed for dexterous manipulation and physical human–robot interaction.
DOI BibTeX

Physical Intelligence Article Reconfigurable Innervation of Modular Soft Machines via Soft, Sticky, and Instant Electronic Adhesive Interlocking Yoon, J., Byun, J., Park, M., Kim, H., Kim, W., Yoon, J., Cho, K., Hong, Y. Advanced Intelligent Systems, 5(8), August 2023 (Published) DOI URL BibTeX

Empirical Inference Conference Paper Socially Responsible Machine Learning: A Causal Perspective Moraffah, R., Karimi, A., Raglin, A., Liu, H. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5819-5820, Association for Computing Machinery, August 2023 (Published) DOI BibTeX

Perceiving Systems Conference Paper Synthesizing Physical Character-scene Interactions Hassan, M., Guo, Y., Wang, T., Black, M. J., Fidler, S., Bin Peng, X. In ACM SIGGRAPH 2023 Conference Proceedings, SIGGRAPH, August 2023 (Published)
Movement is how people interact with and affect their environment. For realistic virtual character animation, it is necessary to realistically synthesize such interactions between virtual characters and their surroundings. Despite recent progress in character animation using machine learning, most systems focus on controlling an agent's movements in fairly simple and homogeneous environments, with limited interactions with other objects. Furthermore, many previous approaches that synthesize human-scene interaction require significant manual labeling of the training data. In contrast, we present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters that perform scene interaction tasks in a natural and life-like manner. Our method is able to learn natural scene interaction behaviors from large unstructured motion datasets, without manual annotation of the motion data. These scene interactions are learned using an adversarial discriminator that evaluates the realism of a motion within the context of a scene. The key novelty involves conditioning both the discriminator and the policy networks on scene context. We demonstrate the effectiveness of our approach through three challenging scene interaction tasks: carrying, sitting, and lying down, which require coordination of a character's movements in relation to objects in the environment. Our policies learn to seamlessly transition between different behaviors like idling, walking, and sitting. Using an efficient approach to randomize the training objects and their placements during training enables our method to generalize beyond the objects and scenarios in the training dataset, producing natural character-scene interactions despite wide variation in object shape and placement. The approach takes physics-based character motion generation a step closer to broad applicability.
video arXiv ACM paper pdf BibTeX

Haptic Intelligence Miscellaneous The Role of Kinematics Estimation Accuracy in Learning with Wearable Haptics Rokhmanova, N., Pearl, O., Kuchenbecker, K. J., Halilaj, E. Abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Knoxville, USA, August 2023 (Published) BibTeX