Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group




Empirical Inference Conference Paper On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series Kuznetsova*, R., Pace*, A., Burger*, M., Yèche, H., Rätsch, G. Proceedings of the 3rd Machine Learning for Health Symposium (ML4H), 225:268-291, Proceedings of Machine Learning Research, (Editors: Hegselmann, S. and Parziale, A. and Shanmugam, D. and Tang, S. and Asiedu, M. N. and Chang, S. and Hartvigsen, T. and Singh, H.), PMLR, December 2023, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper SE(3) Equivariant Augmented Coupling Flows Midgley*, L. I., Stimper*, V., Antorán*, J., Mathieu*, E., Schölkopf, B., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:79200-79225, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)
Coupling normalizing flows allow for fast sampling and density evaluation, making them the tool of choice for probabilistic modeling of physical systems. However, the standard coupling architecture precludes endowing flows that operate on the Cartesian coordinates of atoms with the SE(3) and permutation invariances of physical systems. This work proposes a coupling flow that preserves SE(3) and permutation equivariance by performing coordinate splits along additional augmented dimensions. At each layer, the flow maps atoms’ positions into learned SE(3) invariant bases, where we apply standard flow transformations, such as monotonic rational-quadratic splines, before returning to the original basis. Crucially, our flow preserves fast sampling and density evaluation, and may be used to produce unbiased estimates of expectations with respect to the target distribution via importance sampling. When trained on the DW4, LJ13 and QM9-positional datasets, our flow is competitive with equivariant continuous normalizing flows, while allowing sampling two orders of magnitude faster. Moreover, to the best of our knowledge, we are the first to learn the full Boltzmann distribution of alanine dipeptide by only modeling the Cartesian positions of its atoms. Lastly, we demonstrate that our flow can be trained to approximately sample from the Boltzmann distribution of the DW4 and LJ13 particle systems using only their energy functions.
arXiv URL BibTeX
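The abstract credits coupling flows with fast sampling and density evaluation. The sketch below illustrates why, assuming a generic affine coupling layer in NumPy: half of the coordinates pass through unchanged and parameterize an invertible scale-and-shift of the other half, so the inverse is exact and the log-determinant of the Jacobian is a cheap sum. The conditioner here is a hypothetical fixed random linear map standing in for a neural network. The layer is neither SE(3)- nor permutation-equivariant, and it does not use the augmented coordinate splits or rational-quadratic splines described above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical conditioner weights: a fixed random linear map in place of a neural network.
W_s = rng.normal(scale=0.1, size=(2, 2))
W_t = rng.normal(scale=0.1, size=(2, 2))

def conditioner(x_a):
    """Return log-scales and shifts for the transformed half, computed only from x_a."""
    log_s = np.tanh(x_a @ W_s)  # bounded log-scales for numerical stability
    t = x_a @ W_t
    return log_s, t

def coupling_forward(x):
    """Affine coupling: keep the first half, scale and shift the second half."""
    x_a, x_b = x[:, :2], x[:, 2:]
    log_s, t = conditioner(x_a)
    y_b = x_b * np.exp(log_s) + t
    # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
    log_det = log_s.sum(axis=1)
    return np.concatenate([x_a, y_b], axis=1), log_det

def coupling_inverse(y):
    """Exact inverse: the first half is unchanged, so the conditioner output can be recomputed."""
    y_a, y_b = y[:, :2], y[:, 2:]
    log_s, t = conditioner(y_a)
    x_b = (y_b - t) * np.exp(-log_s)
    return np.concatenate([y_a, x_b], axis=1)

x = rng.normal(size=(5, 4))
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
print(np.allclose(x, x_rec))  # True
```

Stacking such layers with alternating splits yields an expressive flow; making the split respect SE(3) and permutation symmetry is the part this sketch deliberately leaves out.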

Empirical Inference Conference Paper Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent Lin*, J. A., Antorán*, J., Padhy*, S., Janz, D., Hernández-Lobato, J. M., Terenin, A. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:36886-36912, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Spuriosity Didn’t Kill the Classifier: Using Invariant Predictions to Harness Spurious Features Eastwood*, C., Singh*, S., Nicolicioiu, A. L., Vlastelica, M., von Kügelgen, J., Schölkopf, B. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:18291-18324, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Stochastic Predictive Control for Legged Robots Gazar, A. University of Tübingen, Germany, December 2023 (Published) DOI BibTeX

Perceiving Systems Article FLARE: Fast learning of Animatable and Relightable Mesh Avatars Bharadwaj, S., Zheng, Y., Hilliges, O., Black, M. J., Fernandez Abrevaya, V. ACM Transactions on Graphics (TOG), 42(6):204:1-204:15, ACM, New York, NY, USA, December 2023 (Published)
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems. While 3D meshes enable efficient processing and are highly portable, they lack realism in terms of shape and appearance. Neural representations, on the other hand, are realistic but lack compatibility and are slow to train and render. Our key insight is that it is possible to efficiently learn high-fidelity 3D mesh representations via differentiable rendering by exploiting highly-optimized methods from traditional computer graphics and approximating some of the components with neural networks. To that end, we introduce FLARE, a technique that enables the creation of animatable and relightable mesh avatars from a single monocular video. First, we learn a canonical geometry using a mesh representation, enabling efficient differentiable rasterization and straightforward animation via learned blendshapes and linear blend skinning weights. Second, we follow physically-based rendering and factor observed colors into intrinsic albedo, roughness, and a neural representation of the illumination, allowing the learned avatars to be relit in novel scenes. Since our input videos are captured on a single device with a narrow field of view, modeling the surrounding environment light is non-trivial. Based on the split-sum approximation for modeling specular reflections, we address this by approximating the pre-filtered environment map with a multi-layer perceptron (MLP) modulated by the surface roughness, eliminating the need to explicitly model the light. We demonstrate that our mesh-based avatar formulation, combined with learned deformation, material, and lighting MLPs, produces avatars with high-quality geometry and appearance, while also being efficient to train and render compared to existing approaches.
Paper Project Page Code DOI URL BibTeX
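FLARE animates its mesh with learned blendshapes and linear blend skinning (LBS) weights. As a minimal sketch of the standard LBS formula alone, each posed vertex is a weighted sum of its rest position transformed by every bone: v' = Σ_k w_k (R_k v + t_k). The bone transforms and weights below are made-up values, not learned ones, and blendshapes are omitted.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def linear_blend_skinning(vertices, weights, rotations, translations):
    """Deform rest-pose vertices with per-bone rigid transforms.

    vertices:     (V, 3) rest-pose positions
    weights:      (V, K) skinning weights, each row summing to 1
    rotations:    (K, 3, 3) bone rotations
    translations: (K, 3) bone translations
    """
    # Transform every vertex by every bone; result has shape (K, V, 3).
    per_bone = np.einsum("kij,vj->kvi", rotations, vertices) + translations[:, None, :]
    # Blend the K candidate positions of each vertex with its skinning weights.
    return np.einsum("vk,kvi->vi", weights, per_bone)

# Two hypothetical bones: bone 0 stays put, bone 1 rotates 90 degrees about z.
vertices = np.array([[1.0, 0.0, 0.0]] * 3)
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
rotations = np.stack([np.eye(3), rotation_z(np.pi / 2)])
translations = np.zeros((2, 3))
posed = linear_blend_skinning(vertices, weights, rotations, translations)
```

The three vertices land at (1, 0, 0), (0, 1, 0), and (0.5, 0.5, 0). The last one, blended half-and-half, sits closer to the origin than its rest position, a well-known volume-loss artifact of linear blending.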

Empirical Inference Article Data-Efficient Learning via Minimizing Hyperspherical Energy Cao, X., Liu, W., Tsang, I. W. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):13422-13437, November 2023 (Published) DOI BibTeX

Robotic Materials Article Electrochemically Controlled Hydrogels with Electrotunable Permeability and Uniaxial Actuation Benselfelt, T., Shakya, J., Rothemund, P., Lindström, S. B., Piper, A., Winkler, T. E., Hajian, A., Wågberg, L., Keplinger, C., Hamedi, M. M. Advanced Materials, 35(45):2303255, Wiley-VCH GmbH, November 2023
The unique properties of hydrogels enable the design of life-like soft intelligent systems. However, stimuli-responsive hydrogels still suffer from limited actuation control. Direct electronic control of electronically conductive hydrogels can solve this challenge and allow direct integration with modern electronic systems. An electrochemically controlled nanowire composite hydrogel with high in-plane conductivity that stimulates a uniaxial electrochemical osmotic expansion is demonstrated. This materials system allows precisely controlled shape-morphing at only −1 V, where capacitive charging of the hydrogel bulk leads to a large uniaxial expansion of up to 300%, caused by the ingress of ≈700 water molecules per electron–ion pair. The material retains its state when turned off, which is ideal for electrotunable membranes as the inherent coupling between the expansion and mesoporosity enables electronic control of permeability for adaptive separation, fractionation, and distribution. Used as electrochemical osmotic hydrogel actuators, they achieve an electroactive pressure of up to 0.7 MPa (1.4 MPa vs dry) and a work density of ≈150 kJ m−3 (2 MJ m−3 vs dry). This new materials system paves the way to integrate actuation, sensing, and controlled permeation into advanced soft intelligent systems.
DOI URL BibTeX

Autonomous Learning Conference Paper Improving Behavioural Cloning with Positive Unlabeled Learning Wang, Q., McCarthy, R., Bulens, D. C., McGuinness, K., O’Connor, N. E., Sanchez, F. R., Gürtler, N., Widmaier, F., Redmond, S. J. 7th Annual Conference on Robot Learning (CoRL), November 2023 (Accepted) BibTeX

Haptic Intelligence Article Towards Semi-Automated Pleural Cavity Access for Pneumothorax in Austere Environments L’Orsa, R., Lama, S., Westwick, D., Sutherland, G., Kuchenbecker, K. J. Acta Astronautica, 212:48-53, November 2023 (Published)
Astronauts are at risk for pneumothorax, a condition where injury or disease introduces air between the chest wall and the lungs (i.e., the pleural cavity). In a worst-case scenario, it can rapidly lead to a fatality if left unmanaged and will require prompt treatment in situ if developed during spaceflight. Chest tube insertion is the definitive treatment for pneumothorax, but it requires a high level of skill and frequent practice for safe use. Physician astronauts may struggle to maintain this skill on medium- and long-duration exploration-class missions, and the procedure is ill-suited to pure just-in-time learning or skill-refreshment paradigms. This paper proposes semi-automating tool insertion to reduce the risk of complications in austere environments and describes preliminary experiments providing initial validation of an intelligent prototype system. Specifically, we showcase and analyse motion and force recordings from a sensorized percutaneous access needle inserted repeatedly into an ex vivo tissue phantom, along with relevant physiological data simultaneously recorded from the operator. When coupled with minimal just-in-time training and/or augmented reality guidance, the proposed system may enable non-expert operators to safely perform emergency chest tube insertion without the use of ground resources.
DOI BibTeX

Empirical Inference Article Variational Causal Dynamics: Discovering Modular World Models from Interventions Lei, A., Schölkopf, B., Posner, I. Transactions on Machine Learning Research, November 2023 (Published) URL BibTeX

Robotic Materials Patent Hydraulically Amplified Self-healing Electrostatic Actuators Keplinger, C. M., Acome, E. L., Kellaris, N. A., Mitchell, S. K. (US Patent 11795979B2), October 2023
An electro-hydraulic actuator includes a deformable shell defining an enclosed internal cavity and containing a liquid dielectric, and first and second electrodes on first and second sides, respectively, of the enclosed internal cavity. An electrostatic force between the first and second electrodes upon application of a voltage to one of the electrodes draws the electrodes towards each other to displace the liquid dielectric within the enclosed internal cavity. The shell includes active and inactive areas such that the electrostatic forces between the first and second electrodes displace the liquid dielectric within the enclosed internal cavity from the active area of the shell to the inactive area of the shell. The first and second electrodes, the deformable shell, and the liquid dielectric cooperate to form a self-healing capacitor, and the liquid dielectric is configured for automatically filling breaches in the liquid dielectric resulting from dielectric breakdown.
URL BibTeX
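To give a feel for the electrostatic force that draws the electrodes together, the sketch below uses the textbook ideal parallel-plate approximation F = ε0 εr A V² / (2 d²). This is an assumption for illustration only. The patent text does not give a force model, and a real actuator with shell films and a deforming cavity deviates from this idealization. The numbers in the example (5 kV, 1 cm² electrodes, a 100 µm gap, relative permittivity 2.2) are hypothetical.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m

def parallel_plate_force(voltage, area, gap, rel_permittivity=1.0):
    """Attractive force (N) between two ideal parallel-plate electrodes.

    voltage in V, area in m^2, gap in m; fringing fields are ignored.
    """
    return EPSILON_0 * rel_permittivity * area * voltage**2 / (2.0 * gap**2)

force = parallel_plate_force(voltage=5000.0, area=1e-4, gap=1e-4, rel_permittivity=2.2)
print(round(force, 2))  # 2.43
```

Because the force scales with V²/d², doubling the voltage or halving the gap each quadruple it. In this idealization, the pull therefore strengthens as the electrodes approach one another.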

Perceiving Systems Ph.D. Thesis Neural Shape Modeling of 3D Clothed Humans Ma, Q. October 2023 (Published)
Parametric models for 3D human bodies play a crucial role in the synthesis and analysis of humans in visual computing. While current models effectively capture body pose and shape variations, a significant aspect has been overlooked: clothing. Existing 3D human models mostly produce a minimally clothed body geometry, limiting their ability to represent the complexity of dressed people in real-world data sources. The challenge lies in the unique characteristics of garments, which make modeling clothed humans particularly difficult. Clothing exhibits diverse topologies, and as the body moves, it introduces wrinkles at various spatial scales. Moreover, pose-dependent clothing deformations are non-rigid and non-linear, exceeding the capabilities of classical body models constructed with fixed-topology surface meshes and linear approximations of pose-aware shape deformations. This thesis addresses these challenges by innovating in two key areas: the 3D shape representation and deformation modeling techniques. We demonstrate that a seemingly old-fashioned shape representation, point clouds, can be a powerful tool for modeling clothed characters when equipped with deep learning and neural fields. Specifically, the thesis begins by introducing a large-scale dataset of dynamic 3D humans in various clothing, which serves as a foundation for training the models presented in this work. The first model we present is CAPE: a neural generative model for 3D clothed human meshes. Here, a clothed body is straightforwardly obtained by applying per-vertex offsets to a pre-defined, unclothed body template mesh. Sampling from the CAPE model generates plausible-looking digital humans wearing common garments, but the fixed-topology mesh representation limits its applicability to more complex garment types. To address this limitation, we present a series of point-based clothed human models: SCALE, PoP and SkiRT.
The SCALE model represents a clothed human using a collection of points organized into local patches. The patches can freely move and deform to represent garments of diverse topologies, enabling generalization to more challenging outfits such as dresses and jackets. Unlike traditional approaches based on physics simulations, SCALE learns pose-dependent cloth deformations from data with minimal manual intervention. To further improve the geometric quality, the PoP model eliminates the concept of patches and instead learns a continuous neural deformation field from the body surface. Densely querying this field results in a high-resolution point cloud of a dressed human, showcasing intricate clothing wrinkles. PoP can generalize across multiple subjects and outfits, and can even bring a single, static scan into animation. Finally, we tackle a long-standing challenge in learning-based digital human modeling: loose garments, in particular skirts and dresses. Building upon PoP, the SkiRT pipeline further learns a shape “template” and a neural field of linear-blend-skinning weights for clothed bodies, improving the models’ robustness for loose garments of varied topology. Our point-based human models are “interplicit”: the output point clouds capture surfaces explicitly at discrete points but implicitly in between. The explicit points are fast, topologically flexible, and compatible with existing graphics tools, while the implicit neural deformation field contributes to high-quality geometry. This thesis primarily demonstrates these advantages in the context of clothed human shape modeling; future work can apply our representation and techniques to general 3D deformable shapes and neural rendering.
download Thesis DOI BibTeX

Perceiving Systems Conference Paper Optimizing the 3D Plate Shape for Proximal Humerus Fractures Keller, M., Krall, M., Smith, J., Clement, H., Kerner, A. M., Gradischar, A., Schäfer, Ü., Black, M. J., Weinberg, A., Pujades, S. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 487-496, Springer, Cham, MICCAI, October 2023 (Published)
To treat bone fractures, implant manufacturers produce 2D anatomically contoured plates. Unfortunately, existing plates only fit a limited segment of the population and/or require manual bending during surgery. Patient-specific implants would provide major benefits such as reducing surgery time and improving treatment outcomes, but they are still rare in clinical practice. In this work, we propose a patient-specific design for the long helical 2D PHILOS (Proximal Humeral Internal Locking System) plate, used to treat humerus shaft fractures. Our method automatically creates a custom plate from a CT scan of a patient's bone. We start by designing an optimal plate on a template bone and, with an anatomy-aware registration method, we transfer this optimal design to any bone. In addition, for an arbitrary bone, our method assesses if a given plate is fit for surgery by automatically positioning it on the bone. We use this process to generate a compact set of plate shapes capable of fitting the bones within a given population. This plate set can be pre-printed in advance and readily available, removing the fabrication time between the fracture occurrence and the surgery. Extensive experiments on ex-vivo arms and 3D-printed bones show that the generated plate shapes (personalized and plate-set) faithfully match the individual bone anatomy and are suitable for clinical practice.
Project page Code Paper Poster DOI URL BibTeX

Empirical Inference Article A taxonomy and review of generalization research in NLP Hupkes, D., Giulianelli, M., Dankers, V., Artetxe, M., Elazar, Y., Pimentel, T., Christodoulopoulos, C., Lasri, K., Saphra, N., Sinclair, A., Ulmer, D., Schottmann, F., Batsuren, K., Sun, K., Sinha, K., Khalatbari, L., Ryskina, M., Frieske, R., Cotterell, R., Jin, Z. Nature Machine Intelligence, 5(10):1161-1174, October 2023 (Published) DOI BibTeX

Empirical Inference Article Artificial Intelligence in Oncological Hybrid Imaging Feuerecker, B., Heimer, M. M., Geyer, T., Fabritius, M. P., Gu, S., Schachtner, B., Beyer, L., Ricke, J., Gatidis, S., Ingrisch, M., Cyran, C. C. Nuklearmedizin, 62(5):296-305, October 2023 (Published) DOI BibTeX

Haptic Intelligence Intelligent Control Systems Conference Paper Enhancing Surgical Team Collaboration and Situation Awareness through Multimodal Sensing Allemang-Trivalle, A. In Proceedings of the ACM International Conference on Multimodal Interaction, 716-720, Extended abstract (5 pages) presented at the ACM International Conference on Multimodal Interaction (ICMI) Doctoral Consortium, Paris, France, October 2023 (Published)
Surgery, typically seen as the surgeon's sole responsibility, requires a broader perspective acknowledging the vital roles of other operating room (OR) personnel. The interactions among team members are crucial for delivering quality care and depend on shared situation awareness. I propose a two-phase approach to design and evaluate a multimodal platform that monitors OR members, offering insights into surgical procedures. The first phase focuses on designing a data-collection platform, tailored to surgical constraints, to generate novel collaboration and situation-awareness metrics using synchronous recordings of the participants' voices, positions, orientations, electrocardiograms, and respiration signals. The second phase concerns the creation of intuitive dashboards and visualizations, aiding surgeons in reviewing recorded surgery, identifying adverse events and contributing to proactive measures. This work aims to demonstrate an innovative approach to data collection and analysis, augmenting the surgical team's capabilities. The multimodal platform has the potential to enhance collaboration, foster situation awareness, and ultimately mitigate surgical adverse events. This research sets the stage for a transformative shift in the OR, enabling a more holistic and inclusive perspective that recognizes that surgery is a team effort.
DOI BibTeX

Perceiving Systems Conference Paper Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance Feng, H., Kulits, P., Liu, S., Black, M. J., Fernandez Abrevaya, V. In Proc. International Conference on Computer Vision (ICCV), International Conference on Computer Vision, October 2023 (Published)
We address the problem of fitting a parametric human body model (SMPL) to point cloud data. Optimization-based methods require careful initialization and are prone to becoming trapped in local optima. Learning-based methods address this but do not generalize well when the input pose is far from those seen during training. For rigid point clouds, remarkable generalization has been achieved by leveraging SE(3)-equivariant networks, but these methods do not work on articulated objects. In this work we extend this idea to human bodies and propose ArtEq, a novel part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. Specifically, we learn a part detection network by leveraging local SO(3) invariance, and regress shape and pose using articulated SE(3) shape-invariant and pose-equivariant networks, all trained end-to-end. Our novel pose regression module leverages the permutation-equivariant property of self-attention layers to preserve rotational equivariance. Experimental results show that ArtEq generalizes to poses not seen during training, outperforming state-of-the-art methods by ~44% in terms of body reconstruction accuracy, without requiring an optimization refinement step. Furthermore, ArtEq is three orders of magnitude faster during inference than prior work and has 97.3% fewer parameters. The code and model are available for research purposes at https://arteq.is.tue.mpg.de.
arxiv project URL BibTeX

Haptic Intelligence Ph.D. Thesis Gesture-Based Nonverbal Interaction for Exercise Robots Mohan, M. University of Tübingen, Tübingen, Germany, October 2023, Department of Computer Science (Published)
When teaching or coaching, humans augment their words with carefully timed hand gestures, head and body movements, and facial expressions to provide feedback to their students. Robots, however, rarely utilize these nuanced cues. A minimally supervised social robot equipped with these abilities could support people in exercising, physical therapy, and learning new activities. This thesis examines how the intuitive power of human gestures can be harnessed to enhance human-robot interaction. To address this question, this research explores gesture-based interactions to expand the capabilities of a socially assistive robotic exercise coach, investigating the perspectives of both novice users and exercise-therapy experts. This thesis begins by concentrating on the user's engagement with the robot, analyzing the feasibility of minimally supervised gesture-based interactions. This exploration seeks to establish a framework in which robots can interact with users in a more intuitive and responsive manner. The investigation then shifts its focus toward the professionals who are integral to the success of these innovative technologies: the exercise-therapy experts. Roboticists face the challenge of translating the knowledge of these experts into robotic interactions. We address this challenge by developing a teleoperation algorithm that can enable exercise therapists to create customized gesture-based interactions for a robot. Thus, this thesis lays the groundwork for dynamic gesture-based interactions in minimally supervised environments, with implications for not only exercise-coach robots but also broader applications in human-robot interaction.
BibTeX

Social Foundations of Computation Conference Paper Is Your Model Predicting the Past? Hardt, M., Kim, M. P. In Proceedings of the Third ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), ACM, October 2023 (Published)
When does a machine learning model predict the future of individuals and when does it recite patterns that predate the individuals? In this work, we propose a distinction between these two pathways of prediction, supported by theoretical, empirical, and normative arguments. At the center of our proposal is a family of simple and efficient statistical tests, called backward baselines, that demonstrate if, and to what extent, a model recounts the past. Our statistical theory provides guidance for interpreting backward baselines, establishing equivalences between different baselines and familiar statistical concepts. Concretely, we derive a meaningful backward baseline for auditing a prediction system as a black box, given only background variables and the system’s predictions. Empirically, we evaluate the framework on different prediction tasks derived from longitudinal panel surveys, demonstrating the ease and effectiveness of incorporating backward baselines into the practice of machine learning.
URL BibTeX
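The abstract does not spell out how backward baselines are computed. As a loose illustration of the black-box setting it mentions, where only background variables and the system's predictions are available, the sketch below measures how much of a system's output a linear regression on background variables alone can reproduce. The function name, the linear model, the R² statistic, and the simulated data are all assumptions for illustration, not the authors' test statistics.

```python
import numpy as np

def r_squared_from_background(background, predictions):
    """Hypothetical audit: fraction of variance in `predictions` explained by a
    least-squares linear fit on `background` variables alone."""
    X = np.column_stack([np.ones(len(background)), background])  # add intercept
    coef, *_ = np.linalg.lstsq(X, predictions, rcond=None)
    residuals = predictions - X @ coef
    return 1.0 - residuals.var() / predictions.var()

rng = np.random.default_rng(0)
n = 2000
background = rng.normal(size=(n, 3))  # simulated variables fixed before the outcome
new_info = rng.normal(size=n)         # simulated information absent from the background

# Two hypothetical black-box systems.
recites_past = background @ np.array([1.0, -0.5, 0.2]) + 0.1 * new_info
uses_new_info = 0.1 * background[:, 0] + new_info

print(round(r_squared_from_background(background, recites_past), 2))
print(round(r_squared_from_background(background, uses_new_info), 2))
```

The first system's score is close to 1, meaning its outputs are almost entirely recoverable from the past. The second scores close to 0. A real audit would need the statistical guidance the paper describes for interpreting such numbers.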

Empirical Inference Perceiving Systems Conference Paper One-shot Implicit Animatable Avatars with Model-based Priors Huang, Y., Yi, H., Liu, W., Wang, H., Wu, B., Wang, W., Lin, B., Zhang, D., Cai, D. In Proc. International Conference on Computer Vision (ICCV), 8940-8951, International Conference on Computer Vision, October 2023, *equal contribution (Published)
Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can easily reconstruct the body geometry and infer the full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT introduces the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with pre-trained CLIP-based models. Both priors are used to jointly guide the optimization for creating plausible content in the invisible areas. In order to further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT outperforms current state-of-the-art avatar creation methods when only a single image is available. Code will be made publicly available for research purposes at https://github.com/huangyangyi/ELICIT
arXiv code project DOI BibTeX

Perceiving Systems Empirical Inference Conference Paper Pairwise Similarity Learning is SimPLE Wen, Y., Liu, W., Feng, Y., Raj, B., Singh, R., Weller, A., Black, M. J., Schölkopf, B. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), International Conference on Computer Vision, October 2023 (Published)
In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). PSL subsumes a wide range of important applications, such as open-set face recognition, speaker verification, image retrieval and person re-identification. The goal of PSL is to learn a pairwise similarity function assigning a higher similarity score to positive pairs (i.e., a pair of samples with the same label) than to negative pairs (i.e., a pair of samples with different labels). We start by identifying a key desideratum for PSL, and then discuss how existing methods can achieve this desideratum. We then propose a surprisingly simple proxy-free method, called SimPLE, which requires neither feature/proxy normalization nor angular margin and yet is able to generalize well in open-set recognition. We apply the proposed method to three challenging PSL tasks: open-set face recognition, image retrieval and speaker verification. Comprehensive experimental results on large-scale benchmarks show that our method performs significantly better than current state-of-the-art methods.
URL BibTeX
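The stated goal of PSL, scoring positive pairs above negative pairs, can be checked directly. The sketch below measures the fraction of (positive pair, negative pair) combinations that a similarity function orders correctly, assuming cosine similarity on fixed feature vectors. It is an evaluation criterion that reflects that goal, not SimPLE's training objective, and the features and labels are hypothetical toy data.

```python
import numpy as np
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_ranking_accuracy(features, labels):
    """Fraction of (positive pair, negative pair) combinations in which the
    positive pair receives the strictly higher similarity score."""
    positive, negative = [], []
    for i, j in combinations(range(len(labels)), 2):
        score = cosine_similarity(features[i], features[j])
        if labels[i] == labels[j]:
            positive.append(score)
        else:
            negative.append(score)
    positive, negative = np.array(positive), np.array(negative)
    # Compare every positive score with every negative score via broadcasting.
    return float((positive[:, None] > negative[None, :]).mean())

features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = [0, 0, 1, 1]
print(pairwise_ranking_accuracy(features, labels))  # 1.0
```

A score of 1.0 means a single threshold could separate all positive pairs from all negative pairs. In open-set settings such as face verification, that threshold must hold for identities never seen during training.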

Robust Machine Learning Conference Paper Scale Alone Does not Improve Mechanistic Interpretability in Vision Models Zimmermann, R. S., Klein, T., Brendel, W. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 57876-57907, Curran Associates, Inc., NeurIPS, October 2023 (Published) NeurIPS Proceedings DOI URL BibTeX

Haptic Intelligence Miscellaneous Seeking Causal, Invariant, Structures with Kernel Mean Embeddings in Haptic-Auditory Data from Tool-Surface Interaction Khojasteh, B., Shao, Y., Kuchenbecker, K. J. Workshop paper (4 pages) presented at the IROS Workshop on Causality for Robotics: Answering the Question of Why, Detroit, USA, October 2023 (Published)
Causal inference could give future learning robots strong generalization and scalability capabilities, which are crucial for safety, fault diagnosis and error prevention. One application area of interest consists of the haptic recognition of surfaces. We seek to understand cause and effect during physical surface interaction by examining surface and tool identity, their interplay, and other contact-irrelevant factors. To work toward elucidating the mechanism of surface encoding, we attempt to recognize surfaces from haptic-auditory data captured by previously unseen hemispherical steel tools that differ from the recording tool in diameter and mass. In this context, we leverage ideas from kernel methods to quantify surface similarity through descriptive differences in signal distributions. We find that the effect of the tool is significantly present in higher-order statistical moments of contact data: aligning the means of the distributions being compared somewhat improves recognition but does not fully separate tool identity from surface identity. Our findings shed light on salient aspects of haptic-auditory data from tool-surface interaction and highlight the challenges involved in generalizing artificial surface discrimination capabilities.
Manuscript URL BibTeX
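Kernel mean embeddings compare two distributions through the maximum mean discrepancy (MMD), the distance between their embeddings in a reproducing kernel Hilbert space. The sketch below is a generic unbiased MMD² estimator with a Gaussian kernel. It assumes synthetic 2D samples standing in for haptic-auditory features. The workshop paper's actual kernel, bandwidth, and features are not stated here. The example shows why aligning means only removes first-moment differences: a difference in spread survives centering.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x and y."""
    n, m = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Drop diagonal terms (each sample paired with itself) to remove bias.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()

rng = np.random.default_rng(1)
a = rng.normal(size=(300, 2))              # hypothetical reference features
b = rng.normal(size=(300, 2)) + 2.0        # same shape of distribution, shifted mean
c = rng.normal(scale=2.0, size=(300, 2))   # same mean, larger spread

print(mmd_squared(a, b))                                      # large: mean shift
print(mmd_squared(a - a.mean(axis=0), b - b.mean(axis=0)))    # near zero after alignment
print(mmd_squared(a - a.mean(axis=0), c - c.mean(axis=0)))    # still positive: spread differs
```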

Perceiving Systems Conference Paper AG3D: Learning to Generate 3D Avatars from 2D Image Collections Dong, Z., Chen, X., Yang, J., Black, M. J., Hilliges, O., Geiger, A. In Proc. International Conference on Computer Vision (ICCV), 14916-14927, International Conference on Computer Vision (ICCV), October 2023 (Published)
While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars from abundant unstructured 2D image collections. However, learning realistic and complete 3D appearance and geometry in this under-constrained setting remains challenging, especially in the presence of loose clothing such as dresses. In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator and integrating an efficient and flexible articulation module. To improve realism, we train our model using multiple discriminators while also integrating geometric cues in the form of predicted 2D normal maps. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance. We validate the effectiveness of our model and the importance of each component via systematic ablation studies.

Empirical Inference Article CROCODILE - Incorporating medium-resolution spectroscopy of close-in directly imaged exoplanets into atmospheric retrievals via cross-correlation Hayoz, J., Cugno, G., Quanz, S. P., Patapis, P., Alei, E., Bonse, M. J., Dannert, F. A., Garvin, E. O., Gebhard, T. D., Konrad, B. S., Sartori, L. F. Astronomy & Astrophysics, 678, October 2023 (Published)

Perceiving Systems Conference Paper D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field Yang, X., Luo, Y., Xiu, Y., Wang, W., Xu, H., Fan, Z. In Proc. International Conference on Computer Vision (ICCV), 9122-9132, International Conference on Computer Vision, October 2023 (Published)
Realistic virtual humans play a crucial role in numerous industries, such as the metaverse, intelligent healthcare, and self-driving simulation. Yet creating them at scale with high levels of realism remains a challenge. The use of deep implicit functions has opened a new era of image-based 3D clothed human reconstruction, enabling pixel-aligned shape recovery with fine details. Since then, the vast majority of works locate the surface by regressing a deterministic implicit value for each point. However, should all points be treated equally regardless of their proximity to the surface? In this paper, we propose replacing the implicit value with an adaptive uncertainty distribution to differentiate between points based on their distance to the surface. This simple "value to distribution" transition yields significant improvements on nearly all the baselines. Furthermore, qualitative results demonstrate that models trained with our uncertainty distribution loss can capture more intricate wrinkles and more realistic limbs.

Perceiving Systems Software Workshop Conference Paper DECO: Dense Estimation of 3D Human-Scene Contact in the Wild Tripathi, S., Chatterjee, A., Passy, J., Yi, H., Tzionas, D., Black, M. J. In Proc. International Conference on Computer Vision (ICCV), 8001-8013, International Conference on Computer Vision, October 2023 (Published)
Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. In contrast, we focus on inferring dense, 3D contact between the full body surface and objects in arbitrary images. To achieve this, we first collect DAMON, a new dataset containing dense vertex-level contact annotations paired with RGB images containing complex human-object and human-scene contact. Second, we train DECO, a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate vertex-level contact on the SMPL body. DECO builds on the insight that human observers recognize contact by reasoning about the contacting body parts, their proximity to scene objects, and the surrounding scene context. We perform extensive evaluations of our detector on DAMON as well as on the RICH and BEHAVE datasets. We significantly outperform existing state-of-the-art methods across all benchmarks. We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images. The code, data, and models are available at https://deco.is.tue.mpg.de/login.php.

Perceiving Systems Conference Paper SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation Athanasiou, N., Petrovich, M., Black, M. J., Varol, G. In Proc. International Conference on Computer Vision (ICCV), 9984-9995, International Conference on Computer Vision, October 2023 (Published)
Our goal is to synthesize 3D human motions given textual inputs describing multiple simultaneous actions, for example ‘waving hand’ while ‘walking’. We refer to generating such simultaneous movements as performing ‘spatial compositions’. In contrast to ‘temporal compositions’, which seek to transition from one action to another in a sequence, spatial composition requires understanding which body parts are involved in which action. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as “what parts of the body are moving when someone is doing the action <action name>?”. Given this action-part mapping, we automatically create new training data by artificially combining body parts from multiple text-motion pairs. We extend previous work on text-to-motion synthesis to train on spatial compositions, and introduce SINC (“SImultaneous actioN Compositions for 3D human motions”). We experimentally validate that our additional GPT-guided data helps to better learn compositionality compared to training only on existing real data of simultaneous actions, which is limited in quantity.
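As a hedged illustration of the data-creation step described above (combining body parts from two text-motion pairs via an action-to-body-part mapping), the sketch below assumes a toy representation in which each motion is a dict from a coarse body-part name to a per-frame value track. `ACTION_PARTS` is a hypothetical stand-in for the GPT-3 answers, and the rule that the second action's parts overwrite the base motion's is an assumption, not necessarily the procedure used in the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy representation: body-part name -> per-frame values.
Motion = Dict[str, List[float]]

# Hypothetical action -> body-part mapping standing in for the GPT-3 answers.
ACTION_PARTS: Dict[str, Set[str]] = {
    "walking": {"left_leg", "right_leg"},
    "waving hand": {"right_arm"},
}


def compose(base: Motion, other: Motion, other_action: str) -> Motion:
    """Take the parts used by `other_action` from `other`; keep all remaining parts from `base`."""
    parts = ACTION_PARTS[other_action]
    # Truncate every track to the shortest one so the composed clip is consistent.
    n = min(len(track) for track in list(base.values()) + list(other.values()))
    return {
        part: (other if part in parts else base)[part][:n]
        for part in base
    }
```

Usage: `compose(walk_clip, wave_clip, "waving hand")` keeps the legs of a walking clip and replaces its right arm with the arm track of a waving clip, producing a synthetic ‘waving hand’ while ‘walking’ sample.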

Perceiving Systems Conference Paper TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis Petrovich, M., Black, M. J., Varol, G. In Proc. International Conference on Computer Vision (ICCV), 9488-9497, International Conference on Computer Vision, October 2023 (Published)
In this paper, we present TMR, a simple yet effective approach for text-to-3D human motion retrieval. While previous work has treated retrieval only as a proxy evaluation metric, we tackle it as a standalone task. Our method extends the state-of-the-art text-to-motion synthesis model TEMOS and incorporates a contrastive loss to better structure the cross-modal latent space. We show that maintaining the motion generation loss, along with the contrastive training, is crucial to obtaining good performance. We introduce a benchmark for evaluation and provide an in-depth analysis by reporting results on several protocols. Our extensive experiments on the KIT-ML and HumanML3D datasets show that TMR outperforms prior work by a significant margin, for example reducing the median rank from 54 to 19. Finally, we showcase the potential of our approach on moment retrieval. Our code and models are publicly available.
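The abstract does not state the exact form of the contrastive loss. As an assumption, the sketch below uses a symmetric InfoNCE objective, a common choice for cross-modal retrieval, over cosine similarities of paired text and motion embeddings with a hypothetical temperature `tau`. It also computes the median rank metric quoted above: for each text query, the 1-based rank of its paired motion among all candidates, then the median over queries.

```python
import math
from statistics import median
from typing import List

Vec = List[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_matrix(texts: List[Vec], motions: List[Vec]) -> List[List[float]]:
    """sim[i][j] = cosine similarity between text i and motion j; pair i is (text i, motion i)."""
    return [[cosine(t, m) for m in motions] for t in texts]


def _cross_entropy_diag(mat: List[List[float]], tau: float) -> float:
    """Mean cross-entropy in which each row's diagonal entry is the positive."""
    total = 0.0
    for i, row in enumerate(mat):
        logits = [s / tau for s in row]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(mat)


def info_nce(sim: List[List[float]], tau: float = 0.1) -> float:
    """Symmetric InfoNCE: average of the text-to-motion and motion-to-text terms."""
    transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (_cross_entropy_diag(sim, tau) + _cross_entropy_diag(transposed, tau))


def median_rank(sim: List[List[float]]) -> float:
    """Median over queries of the 1-based rank of the paired motion (ties favor the pair)."""
    ranks = [1 + sum(1 for s in row if s > row[i]) for i, row in enumerate(sim)]
    return median(ranks)
```

Lower median rank is better (1 means the paired motion is always retrieved first), which is the sense in which a drop from 54 to 19 is an improvement.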

Autonomous Learning Conference Paper Regularity as Intrinsic Reward for Free Play Sancaktar, C., Piater, J., Martius, G. In Advances in Neural Information Processing Systems (NeurIPS), Advances in Neural Information Processing Systems 36, September 2023 (Published)
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the variety of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model’s epistemic uncertainty as an intrinsic reward. In doing so, we observe the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
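As an assumption-laden sketch, not the paper's exact definition, regularity can be scored as the negative entropy of the distribution of pairwise relations between objects: arrangements in which the same relation recurs, such as a tower, score higher than scattered ones. The sketch assumes 2-D object positions, uses discretized relative offsets as a hypothetical relation, and a hypothetical grid `resolution`.

```python
import math
from collections import Counter
from itertools import permutations
from typing import List, Tuple

Pos = Tuple[float, float]


def regularity(positions: List[Pos], resolution: float = 1.0) -> float:
    """Negative entropy of the histogram of discretized pairwise offsets (higher = more regular)."""
    # Hypothetical relation: offset from object a to object b, snapped to a grid.
    keys = [
        (round((b[0] - a[0]) / resolution), round((b[1] - a[1]) / resolution))
        for a, b in permutations(positions, 2)
    ]
    counts = Counter(keys)
    total = len(keys)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return -entropy


tower = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]      # repeated unit offsets
scattered = [(0.0, 0.0), (3.0, 1.0), (7.0, 5.0)]  # every offset distinct
```

Here the tower repeats the offsets (0, 1) and (0, -1), so its relation histogram is more concentrated and its score is higher than that of the scattered arrangement, loosely matching the towers reported to emerge during free play.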

Organizational Leadership and Diversity Article Hooked on artificial agents: a systems thinking perspective Ðula, I., Berberena, T., Keplinger, K., Wirzberger, M. Frontiers in Behavioral Economics, 2:1223281, September 2023 (Published)
Following recent technological developments in the artificial intelligence space, artificial agents are increasingly taking over organizational tasks typically reserved for humans. Studies have shown that humans respond to this in different ways: some appreciate the agents' advice (algorithm appreciation), others are averse toward them (algorithm aversion), and still others fully relinquish control to artificial agents without adequate oversight (automation bias). Using systems thinking, we analyze the existing literature on these phenomena and develop a conceptual model that provides an underlying structural explanation for their emergence. In doing so, we create a powerful visual tool that can be used to ground discussions about the impact artificial agents have on organizations and humans within them.

Empirical Inference Article A historical perspective of biomedical explainable AI research Malinverno, L., Barros, V., Ghisoni, F., Visonà, G., Kern, R., Nickel, P. J., Ventura, B. E., Šimić, I., Stryeck, S., Manni, F., Ferri, C., Jean-Quartier, C., Genga, L., Schweikert, G., Lovrić, M., Rosen-Zvi, M. Patterns, 4(9), September 2023 (Published)

Empirical Inference Conference Paper Certified private data release for sparse Lipschitz functions Donhauser, K., Lokna, J., Sanyal, A., Boedihardjo, M., Hönig, R., Yang, F. TPDP 2023 - Theory and Practice of Differential Privacy, September 2023 (Published)

Empirical Inference Master Thesis Efficient Sampling from Differentiable Matrix Elements Kofler, A. Technical University of Munich, Germany, September 2023 (Published)

Empirical Inference Conference Paper How to make semi-private learning more effective Pinto, F., Hu, Y., Yang, F., Sanyal, A. TPDP 2023 - Theory and Practice of Differential Privacy, September 2023 (Published)

Social Foundations of Computation Conference Paper Incentivizing Honesty among Competitors in Collaborative Learning and Optimization Dorner, F. E., Konstantinov, N., Pashaliev, G., Vechev, M. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS), September 2023 (Published)
Collaborative learning techniques have the potential to enable training machine learning models that are superior to models trained on a single entity’s data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as firms that each aim to attract customers by providing the best recommendations. This can incentivize dishonest updates that damage other participants' models, potentially undermining the benefits of collaboration. In this work, we formulate a game that models such interactions and study two learning tasks within this framework: single-round mean estimation and multi-round SGD on strongly-convex objectives. For a natural class of player actions, we show that rational clients are incentivized to strongly manipulate their updates, preventing learning. We then propose mechanisms that incentivize honest communication and ensure learning quality comparable to full cooperation. Lastly, we empirically demonstrate the effectiveness of our incentive scheme on a standard non-convex federated learning benchmark. Our work shows that explicitly modeling the incentives and actions of dishonest clients, rather than assuming them to be malicious, can enable strong robustness guarantees for collaborative learning.

Haptic Intelligence Miscellaneous NearContact: Accurate Human Detection using Tomographic Proximity and Contact Sensing with Cross-Modal Attention Garrofé, G., Schoeffmann, C., Zangl, H., Kuchenbecker, K. J., Lee, H. Extended abstract (4 pages) presented at the International Workshop on Human-Friendly Robotics (HFR), Munich, Germany, September 2023 (Published)

Empirical Inference Article Neural Causal Structure Discovery from Interventions Ke*, N. R., Bilaniuk*, O., Goyal, A., Bauer, S., Larochelle, H., Schölkopf, B., Mozer, M. C., Pal, C., Bengio, Y. Transactions on Machine Learning Research, September 2023, *equal contribution (Published)

Empirical Inference Article Simulation-based inference for efficient identification of generative models in computational connectomics Boelts, J., Harth, P., Gao, R., Udvary, D., Yáñez, F., Baum, D., Hege, H., Oberlaender, M., Macke, J. H. PLOS Computational Biology, 19(9):1-28, September 2023 (Published)