Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group



Empirical Inference Perceiving Systems Conference Paper Controlling Text-to-Image Diffusion by Orthogonal Finetuning Qiu*, Z., Liu*, W., Feng, H., Xue, Y., Feng, Y., Liu, Z., Zhang, D., Weller, A., Schölkopf, B. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:79320-79362, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks has become an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy, which characterizes the pairwise relationship between neurons on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
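A minimal NumPy sketch of the property the abstract relies on, assuming the common formulation in which a frozen pretrained weight matrix W (one neuron per column) is multiplied by a learned orthogonal matrix R, here parameterized through the Cayley transform of a skew-symmetric matrix. The sizes, the initialization, and the use of a full (rather than structured) R are assumptions for illustration, and the COFT radius constraint (keeping R close to the identity) is not implemented.

```python
import numpy as np


def cayley(skew: np.ndarray) -> np.ndarray:
    """Map a skew-symmetric S to an orthogonal R = (I + S)(I - S)^-1."""
    eye = np.eye(skew.shape[0])
    # I - S is always invertible for skew-symmetric S (its eigenvalues are purely imaginary).
    return (eye + skew) @ np.linalg.inv(eye - skew)


def hyperspherical_energy(weight: np.ndarray) -> float:
    """Sum of inverse pairwise distances between unit-normalized neurons (columns)."""
    unit = weight / np.linalg.norm(weight, axis=0, keepdims=True)
    diffs = unit[:, :, None] - unit[:, None, :]      # (d, n, n): u_i - u_j
    dists = np.linalg.norm(diffs, axis=0)             # (n, n) pairwise distances
    off_diag = ~np.eye(dists.shape[0], dtype=bool)
    return float(np.sum(1.0 / dists[off_diag]))


rng = np.random.default_rng(0)
d, n = 8, 5
w_pretrained = rng.normal(size=(d, n))    # frozen weight, one neuron per column
params = 0.1 * rng.normal(size=(d, d))    # hypothetical trainable parameters
skew = params - params.T                  # skew-symmetric by construction
rotation = cayley(skew)
w_finetuned = rotation @ w_pretrained     # orthogonal update W' = R W

print(hyperspherical_energy(w_pretrained), hyperspherical_energy(w_finetuned))
```

Because R rotates every neuron by the same orthogonal map, all pairwise angles (and hence the energy) are unchanged, while the weights themselves do change.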

Empirical Inference Master Thesis Denoising Representation Learning for Causal Discovery Sakenyte, U. Université de Genève, Switzerland, December 2023, external supervision (Published)

Social Foundations of Computation Poster Do Personality Tests Generalize to Large Language Models? Dorner, F. E., Sühr, T., Samadi, S., Kelava, A. Socially Responsible Language Modelling Research (SoLaR) Workshop, The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS), December 2023, *equal contribution (Published)
With large language models (LLMs) appearing to behave increasingly human-like in text-based interactions, it has become popular to attempt to evaluate various properties of these models using tests originally designed for humans. While re-using existing tests is a resource-efficient way to evaluate LLMs, careful adjustments are usually required to ensure that test results are even valid across human sub-populations. Thus, it is not clear to what extent different tests’ validity generalizes to LLMs. In this work, we provide evidence that LLMs’ responses to personality tests systematically deviate from typical human responses, implying that these results cannot be interpreted in the same way as human test results. Concretely, reverse-coded items (e.g. “I am introverted” vs “I am extraverted”) are often both answered affirmatively by LLMs. In addition, variation across different prompts designed to “steer” LLMs to simulate particular personality types does not follow the clear separation into five independent personality factors observed in human samples. In light of these results, we believe it is important to pay more attention to tests’ validity for LLMs before drawing strong conclusions about potentially ill-defined concepts like LLMs’ “personality”.
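To make the reverse-coding issue concrete, here is a small sketch under assumed conventions (a 5-point Likert scale and a hypothetical tolerance of one point); it is an illustration of the scoring logic, not the paper's analysis code.

```python
LIKERT_MAX = 5  # assumed scale: 1 = strongly disagree ... 5 = strongly agree


def score_item(response: int, reverse_coded: bool) -> int:
    """Flip reverse-coded items so that a higher score always means 'more extraverted'."""
    return (LIKERT_MAX + 1 - response) if reverse_coded else response


def is_contradictory(resp_forward: int, resp_reversed: int, tolerance: int = 1) -> bool:
    """Flag an item pair whose scores disagree after reverse coding.

    A consistent respondent gives similar scores to 'I am extraverted'
    and to the flipped answer for 'I am introverted'.
    """
    forward = score_item(resp_forward, reverse_coded=False)
    flipped = score_item(resp_reversed, reverse_coded=True)
    return abs(forward - flipped) > tolerance


print(is_contradictory(5, 5))  # True: agreeing with both items scores 5 vs 1
print(is_contradictory(5, 1))  # False: agree with one, disagree with the other
```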

Social Foundations of Computation Book Fairness and Machine Learning: Limitations and Opportunities Barocas, S., Hardt, M., Narayanan, A. MIT Press, December 2023 (Published)
An introduction to the intellectual foundations and practical utility of the recent work on fairness and machine learning. Fairness and Machine Learning introduces advanced undergraduate and graduate students to the intellectual foundations of this recently emergent field, drawing on a diverse range of disciplinary perspectives to identify the opportunities and hazards of automated decision-making. It surveys the risks in many applications of machine learning and provides a review of an emerging set of proposed solutions, showing how even well-intentioned applications may give rise to objectionable results. It covers the statistical and causal measures used to evaluate the fairness of machine learning models as well as the procedural and substantive aspects of decision-making that are core to debates about fairness, including a review of legal and philosophical perspectives on discrimination. This incisive textbook prepares students of machine learning to do quantitative work on fairness while reflecting critically on its foundations and its practical utility.
• Introduces the technical and normative foundations of fairness in automated decision-making
• Covers the formal and computational methods for characterizing and addressing problems
• Provides a critical assessment of their intellectual foundations and practical utility
• Features rich pedagogy and extensive instructor resources
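As a rough illustration of the statistical measures mentioned above, the sketch below computes per-group acceptance rates (the quantity behind demographic parity, or independence) and true/false positive rates (behind equalized odds, or separation) on a tiny made-up dataset. It is not taken from the book, and the group labels and data are hypothetical.

```python
def group_rates(y_true, y_pred, group):
    """Per-group acceptance rate, true-positive rate and false-positive rate."""
    rates = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        preds = [y_pred[i] for i in idx]
        on_pos = [y_pred[i] for i in idx if y_true[i] == 1]  # predictions for true positives
        on_neg = [y_pred[i] for i in idx if y_true[i] == 0]  # predictions for true negatives
        rates[g] = {
            "acceptance": sum(preds) / len(preds),
            "tpr": sum(on_pos) / len(on_pos) if on_pos else float("nan"),
            "fpr": sum(on_neg) / len(on_neg) if on_neg else float("nan"),
        }
    return rates


def gap(rates, key):
    """Largest difference in a rate across groups (0 means the criterion holds exactly)."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_rates(y_true, y_pred, group)
print(gap(rates, "acceptance"), gap(rates, "tpr"), gap(rates, "fpr"))  # 0.5 0.5 0.5
```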

Empirical Inference Conference Paper Flow Matching for Scalable Simulation-Based Inference Wildberger*, J., Dax*, M., Buchholz*, S., Green, S. R., Macke, J. H., Schölkopf, B. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:16837-16864, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)

Perceiving Systems Article From Skin to Skeleton: Towards Biomechanically Accurate 3D Digital Humans Keller, M., Werling, K., Shin, S., Delp, S., Pujades, S., Liu, C. K., Black, M. J. ACM Transactions on Graphics (TOG), 42(6):253:1-253:15, ACM New York, NY, USA, December 2023 (Published)
Great progress has been made in estimating 3D human pose and shape from images and video by training neural networks to directly regress the parameters of parametric human models like SMPL. However, existing body models have simplified kinematic structures that do not correspond to the true joint locations and articulations in the human skeletal system, limiting their potential use in biomechanics. On the other hand, methods for estimating biomechanically accurate skeletal motion typically rely on complex motion capture systems and expensive optimization methods. What is needed is a parametric 3D human model with a biomechanically accurate skeletal structure that can be easily posed. To that end, we develop SKEL, which re-rigs the SMPL body model with a biomechanical skeleton. To enable this, we need training data of skeletons inside SMPL meshes in diverse poses. We build such a dataset by optimizing biomechanically accurate skeletons inside SMPL meshes from AMASS sequences. We then learn a regressor from SMPL mesh vertices to the optimized joint locations and bone rotations. Finally, we re-parametrize the SMPL mesh with the new kinematic parameters. The resulting SKEL model is animatable like SMPL but with fewer, biomechanically realistic degrees of freedom. We show that SKEL has more biomechanically accurate joint locations than SMPL, and the bones fit inside the body surface better than previous methods. By fitting SKEL to SMPL meshes, we are able to “upgrade” existing human pose and shape datasets to include biomechanical parameters. SKEL provides a new tool to enable biomechanics in the wild, while also providing vision and graphics researchers with a better constrained
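The regression step can be pictured with a linear joint regressor fit by least squares, in the spirit of SMPL's vertex-to-joint regressor. This is a simplified sketch on synthetic data: it predicts joint locations only (not bone rotations), and the plain unconstrained least-squares fit is an assumption rather than the SKEL training procedure.

```python
import numpy as np


def fit_joint_regressor(vertices: np.ndarray, joints: np.ndarray) -> np.ndarray:
    """Fit a linear map R (n_joints x n_verts) with joints[f] ≈ R @ vertices[f].

    vertices: (n_frames, n_verts, 3) mesh vertices
    joints:   (n_frames, n_joints, 3) target joint locations
    """
    n_frames, n_verts, _ = vertices.shape
    n_joints = joints.shape[1]
    # Each (frame, coordinate) pair is one row; x, y and z share the same R.
    x = vertices.transpose(0, 2, 1).reshape(n_frames * 3, n_verts)
    y = joints.transpose(0, 2, 1).reshape(n_frames * 3, n_joints)
    r_transposed, *_ = np.linalg.lstsq(x, y, rcond=None)  # solves x @ R.T ≈ y
    return r_transposed.T


rng = np.random.default_rng(0)
true_r = rng.random((4, 30))
true_r /= true_r.sum(axis=1, keepdims=True)     # hypothetical: joints as weighted vertex averages
verts = rng.normal(size=(20, 30, 3))            # 20 synthetic "poses" of a 30-vertex mesh
joints = np.einsum("jv,fvc->fjc", true_r, verts)
estimated = fit_joint_regressor(verts, joints)
print(np.abs(estimated - true_r).max())
```

With noiseless synthetic targets and more rows (20 × 3) than vertices (30), the least-squares fit recovers the generating regressor up to floating-point error.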

Empirical Inference Conference Paper Generalized Bayesian Inference for Scientific Simulators via Amortized Cost Estimation Gao*, R., Deistler*, M., Macke, J. H. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:80191-80219, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)

Autonomous Learning Conference Paper Goal-conditioned Offline Planning from Curious Exploration Bagatella, M., Martius, G. In Advances in Neural Information Processing Systems 36, December 2023 (Published)
Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.
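One generic way to combine local value estimates into globally consistent ones is to treat trusted short-range estimates as edge lengths in a graph over visited states and take shortest paths. The sketch below does this with Floyd–Warshall on a toy chain of states; reading the learned value as a step-count distance, the threshold `max_local`, and the example data are all assumptions, and the paper's actual aggregation scheme may differ in its details.

```python
import numpy as np


def aggregate_distances(local_dist: np.ndarray, max_local: float) -> np.ndarray:
    """Floyd-Warshall over a state graph whose edges are trusted local estimates.

    local_dist[i, j]: learned estimate of steps from state i to state j
    (e.g. derived from a goal-conditioned value function). Estimates above
    max_local are treated as unreliable and dropped before aggregation.
    """
    dist = np.where(local_dist <= max_local, local_dist, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(dist.shape[0]):
        # Relax every pair (i, j) through intermediate state k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


true_steps = np.abs(np.subtract.outer(np.arange(5), np.arange(5))).astype(float)
local = true_steps.copy()
local[local > 1] = 10.0   # hypothetical artifact: long-range estimates are badly off
result = aggregate_distances(local, max_local=1.5)
print(result[0, 4])       # 4.0, recovered by chaining trusted one-step edges
```

A value function can then be read off the aggregated distances, for instance as the negative distance to the goal.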

Empirical Inference Conference Paper Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures Eschenhagen, R., Immer, A., Turner, R., Schneider, F., Hennig, P. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:33624-33655, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023 (Published)

Empirical Inference Conference Paper Learning Layer-wise Equivariances Automatically using Gradients van der Ouderaa, T., Immer, A., van der Wilk, M. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:28365-28377, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023 (Published)

Empirical Inference Conference Paper Learning Linear Causal Representations from Interventions under General Nonlinear Mixing Buchholz*, S., Rajendran*, G., Rosenfeld, E., Aragam, B., Schölkopf, B., Ravikumar, P. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:45419-45462, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)

Empirical Inference Conference Paper Meta-learning families of plasticity rules in recurrent spiking networks using simulation-based inference Confavreux*, B., Ramesh*, P., Goncalves, P. J., Macke, J. H., Vogels, T. P. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:13545-13558, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)

Empirical Inference Article Multimodal learning in clinical proteomics: enhancing antimicrobial resistance prediction models with chemical information Visonà, G., Duroux, D., Miranda, L., Sükei, E., Li, Y., Borgwardt, K., Oliver, C. Bioinformatics, 39(12), December 2023 (Published)

Empirical Inference Conference Paper Neural Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning Munkhoeva, M., Oseledets, I. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:60712-60723, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023 (Published)

Empirical Inference Conference Paper Nonparametric Identifiability of Causal Representations from Unknown Interventions von Kügelgen, J., Besserve, M., Liang, W., Gresele, L., Kekić, A., Bareinboim, E., Blei, D., Schölkopf, B. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:48603-48638, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023 (Published)

Empirical Inference Conference Paper Nonparametric Teaching for Multiple Learners Zhang, C., Cao, X., Liu, W., Tsang, I. W., Kwok, J. T. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:7756-7786, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023 (Published)

Autonomous Learning Conference Paper Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities Zadaianchuk, A., Seitzer, M., Martius, G. In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), Advances in Neural Information Processing Systems 36, December 2023
Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.
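A sketch of what a temporal feature similarity target could look like, assuming frozen patch features from two frames: cosine similarities between patches at time t and time t+k are turned into a softmax distribution per source patch, and a predictor is trained with cross-entropy against it. The temperature, shapes, and random features are illustrative assumptions, and details such as how the paper treats negative similarities are not reproduced.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    shifted = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def similarity_targets(feat_t: np.ndarray, feat_tk: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Row i: distribution over patches at t+k, weighted by similarity to patch i at t."""
    a = feat_t / np.linalg.norm(feat_t, axis=1, keepdims=True)
    b = feat_tk / np.linalg.norm(feat_tk, axis=1, keepdims=True)
    cosine = a @ b.T                                 # (patches at t, patches at t+k)
    return np.exp(log_softmax(cosine / temperature, axis=1))


def similarity_loss(pred_logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy between target rows and the predicted rows."""
    return float(-(targets * log_softmax(pred_logits, axis=1)).sum(axis=1).mean())


rng = np.random.default_rng(0)
feat_t, feat_tk = rng.normal(size=(16, 32)), rng.normal(size=(16, 32))  # hypothetical features
targets = similarity_targets(feat_t, feat_tk)
uniform_loss = similarity_loss(np.zeros((16, 16)), targets)
```

Cross-entropy is minimized when the predicted distribution equals the target, so a perfect predictor scores no worse than the uniform baseline computed above.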

Empirical Inference Conference Paper On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series Kuznetsova*, R., Pace*, A., Burger*, M., Yèche, H., Rätsch, G. Proceedings of the 3rd Machine Learning for Health Symposium (ML4H), 225:268-291, Proceedings of Machine Learning Research, (Editors: Hegselmann, S. and Parziale, A. and Shanmugam, D. and Tang, S. and Asiedu, M. N. and Chang, S. and Hartvigsen, T. and Singh, H.), PMLR, December 2023, *equal contribution (Published)

Empirical Inference Conference Paper SE(3) Equivariant Augmented Coupling Flows Midgley*, L. I., Stimper*, V., Antorán*, J., Mathieu*, E., Schölkopf, B., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:79200-79225, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)
Coupling normalizing flows allow for fast sampling and density evaluation, making them the tool of choice for probabilistic modeling of physical systems. However, the standard coupling architecture precludes endowing flows that operate on the Cartesian coordinates of atoms with the SE(3) and permutation invariances of physical systems. This work proposes a coupling flow that preserves SE(3) and permutation equivariance by performing coordinate splits along additional augmented dimensions. At each layer, the flow maps atoms’ positions into learned SE(3) invariant bases, where we apply standard flow transformations, such as monotonic rational-quadratic splines, before returning to the original basis. Crucially, our flow preserves fast sampling and density evaluation, and may be used to produce unbiased estimates of expectations with respect to the target distribution via importance sampling. When trained on the DW4, LJ13 and QM9-positional datasets, our flow is competitive with equivariant continuous normalizing flows, while allowing sampling two orders of magnitude faster. Moreover, to the best of our knowledge, we are the first to learn the full Boltzmann distribution of alanine dipeptide by only modeling the Cartesian positions of its atoms. Lastly, we demonstrate that our flow can be trained to approximately sample from the Boltzmann distribution of the DW4 and LJ13 particle systems using only their energy functions.
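The idea of applying a transform in an SE(3)-invariant basis can be illustrated with a hand-built frame. In this sketch the frame is constructed by Gram–Schmidt from three hypothetical augmented points rather than learned, and a fixed affine map stands in for a learned spline; it demonstrates rotation and translation equivariance only and does not handle permutation equivariance.

```python
import numpy as np


def local_frame(points: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Orthonormal frame from the first three points via Gram-Schmidt.

    Returns (origin, basis); the rows of basis are the three frame axes.
    """
    origin = points[0]
    e1 = points[1] - origin
    e1 = e1 / np.linalg.norm(e1)
    v2 = points[2] - origin
    e2 = v2 - (v2 @ e1) * e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)  # right-handed, so the frame follows proper rotations
    return origin, np.stack([e1, e2, e3])


def coupling_layer(atoms: np.ndarray, aug: np.ndarray, scale: float = 1.1, shift: float = 0.2) -> np.ndarray:
    """Transform atom positions in a frame built from the (unchanged) augmented points."""
    origin, basis = local_frame(aug)
    coords = (atoms - origin) @ basis.T   # invariant coordinates
    coords = scale * coords + shift       # stand-in for a learned invertible transform
    return coords @ basis + origin        # back to the original basis


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one axis to get a proper rotation (det = +1)
    return q


rng = np.random.default_rng(0)
atoms = rng.normal(size=(4, 3))
aug = rng.normal(size=(4, 3))   # hypothetical augmented coordinates, one per atom
rot, trans = random_rotation(rng), rng.normal(size=3)

out = coupling_layer(atoms, aug)
out_moved = coupling_layer(atoms @ rot.T + trans, aug @ rot.T + trans)
print(np.allclose(out_moved, out @ rot.T + trans))  # True: rotate-then-transform equals transform-then-rotate
```

Because the frame depends only on the augmented points, which the layer leaves untouched, the layer stays invertible: rebuild the frame from `aug`, undo the affine map, and map back.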

Empirical Inference Conference Paper Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent Lin*, J. A., Antorán*, J., Padhy*, S., Janz, D., Hernández-Lobato, J. M., Terenin, A. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:36886-36912, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)

Empirical Inference Conference Paper Spuriosity Didn’t Kill the Classifier: Using Invariant Predictions to Harness Spurious Features Eastwood*, C., Singh*, S., Nicolicioiu, A. L., Vlastelica, M., von Kügelgen, J., Schölkopf, B. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36:18291-18324, (Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (Published)

Empirical Inference Ph.D. Thesis Stochastic Predictive Control for Legged Robots Gazar, A. University of Tübingen, Germany, December 2023 (Published)

Perceiving Systems Article FLARE: Fast Learning of Animatable and Relightable Mesh Avatars Bharadwaj, S., Zheng, Y., Hilliges, O., Black, M. J., Fernandez Abrevaya, V. ACM Transactions on Graphics (TOG), 42(6):204:1-204:15, ACM New York, NY, USA, December 2023 (Published)
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems. While 3D meshes enable efficient processing and are highly portable, they lack realism in terms of shape and appearance. Neural representations, on the other hand, are realistic but lack compatibility and are slow to train and render. Our key insight is that it is possible to efficiently learn high-fidelity 3D mesh representations via differentiable rendering by exploiting highly-optimized methods from traditional computer graphics and approximating some of the components with neural networks. To that end, we introduce FLARE, a technique that enables the creation of animatable and relightable mesh avatars from a single monocular video. First, we learn a canonical geometry using a mesh representation, enabling efficient differentiable rasterization and straightforward animation via learned blendshapes and linear blend skinning weights. Second, we follow physically-based rendering and factor observed colors into intrinsic albedo, roughness, and a neural representation of the illumination, allowing the learned avatars to be relit in novel scenes. Since our input videos are captured on a single device with a narrow field of view, modeling the surrounding environment light is non-trivial. Based on the split-sum approximation for modeling specular reflections, we address this by approximating the pre-filtered environment map with a multi-layer perceptron (MLP) modulated by the surface roughness, eliminating the need to explicitly model the light. We demonstrate that our mesh-based avatar formulation, combined with learned deformation, material, and lighting MLPs, produces avatars with high-quality geometry and appearance, while also being efficient to train and render compared to existing approaches.
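Linear blendshapes followed by linear blend skinning, as mentioned in the abstract, can be written in a few lines. This is a generic sketch in the style of SMPL/FLAME-type models; the array names, shapes, and toy values are assumptions, and FLARE's learned components (which predict the blendshapes and skinning weights) are not modeled.

```python
import numpy as np


def blend_and_skin(template, blendshapes, coeffs, weights, rotations, translations):
    """Apply linear blendshapes, then linear blend skinning.

    template:     (V, 3) rest-pose vertices
    blendshapes:  (K, V, 3) per-vertex offset bases
    coeffs:       (K,) blendshape coefficients (e.g. expression parameters)
    weights:      (V, J) skinning weights, each row summing to 1
    rotations:    (J, 3, 3) per-joint rotations
    translations: (J, 3) per-joint translations
    """
    shaped = template + np.tensordot(coeffs, blendshapes, axes=1)   # (V, 3)
    # Transform every vertex by every joint: (J, V, 3).
    per_joint = np.einsum("jab,vb->jva", rotations, shaped) + translations[:, None, :]
    # Blend the per-joint results with the skinning weights: (V, 3).
    return np.einsum("vj,jva->va", weights, per_joint)


template = np.eye(3)                                        # three toy vertices
blendshapes = np.full((1, 3, 3), 0.1)                       # one hypothetical offset basis
coeffs = np.array([2.0])
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])    # rows sum to 1
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
rotations = np.stack([np.eye(3), rot_z])
translations = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
posed = blend_and_skin(template, blendshapes, coeffs, weights, rotations, translations)
```

Vertex 0 is bound entirely to the identity joint, vertex 2 entirely to the rotated joint, and vertex 1 averages the two.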

Empirical Inference Article Data-Efficient Learning via Minimizing Hyperspherical Energy Cao, X., Liu, W., Tsang, I. W. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):13422-13437, November 2023 (Published)

Robotic Materials Article Electrochemically Controlled Hydrogels with Electrotunable Permeability and Uniaxial Actuation Benselfelt, T., Shakya, J., Rothemund, P., Lindström, S. B., Piper, A., Winkler, T. E., Hajian, A., Wågberg, L., Keplinger, C., Hamedi, M. M. Advanced Materials, 35(45):2303255, Wiley-VCH GmbH, November 2023
The unique properties of hydrogels enable the design of life-like soft intelligent systems. However, stimuli-responsive hydrogels still suffer from limited actuation control. Direct electronic control of electronically conductive hydrogels can solve this challenge and allow direct integration with modern electronic systems. An electrochemically controlled nanowire composite hydrogel with high in-plane conductivity that stimulates a uniaxial electrochemical osmotic expansion is demonstrated. This materials system allows precisely controlled shape-morphing at only −1 V, where capacitive charging of the hydrogel bulk leads to a large uniaxial expansion of up to 300%, caused by the ingress of ≈700 water molecules per electron–ion pair. The material retains its state when turned off, which is ideal for electrotunable membranes as the inherent coupling between the expansion and mesoporosity enables electronic control of permeability for adaptive separation, fractionation, and distribution. Used as electrochemical osmotic hydrogel actuators, they achieve an electroactive pressure of up to 0.7 MPa (1.4 MPa vs dry) and a work density of ≈150 kJ m⁻³ (2 MJ m⁻³ vs dry). This new materials system paves the way to integrate actuation, sensing, and controlled permeation into advanced soft intelligent systems.

Autonomous Learning Conference Paper Improving Behavioural Cloning with Positive Unlabeled Learning Wang, Q., McCarthy, R., Bulens, D. C., McGuinness, K., O’Connor, N. E., Sanchez, F. R., Gürtler, N., Widmaier, F., Redmond, S. J. 7th Annual Conference on Robot Learning (CoRL), November 2023 (Accepted)

Haptic Intelligence Article Towards Semi-Automated Pleural Cavity Access for Pneumothorax in Austere Environments L’Orsa, R., Lama, S., Westwick, D., Sutherland, G., Kuchenbecker, K. J. Acta Astronautica, 212:48-53, November 2023 (Published)
Astronauts are at risk for pneumothorax, a condition where injury or disease introduces air between the chest wall and the lungs (i.e., the pleural cavity). In a worst-case scenario, it can rapidly lead to a fatality if left unmanaged and will require prompt treatment in situ if developed during spaceflight. Chest tube insertion is the definitive treatment for pneumothorax, but it requires a high level of skill and frequent practice for safe use. Physician astronauts may struggle to maintain this skill on medium- and long-duration exploration-class missions, and the procedure is not suited to pure just-in-time learning or skill-refreshment paradigms. This paper proposes semi-automating tool insertion to reduce the risk of complications in austere environments and describes preliminary experiments providing initial validation of an intelligent prototype system. Specifically, we showcase and analyse motion and force recordings from a sensorized percutaneous access needle inserted repeatedly into an ex vivo tissue phantom, along with relevant physiological data simultaneously recorded from the operator. When coupled with minimal just-in-time training and/or augmented reality guidance, the proposed system may enable non-expert operators to safely perform emergency chest tube insertion without the use of ground resources.

Empirical Inference Article Variational Causal Dynamics: Discovering Modular World Models from Interventions Lei, A., Schölkopf, B., Posner, I. Transactions on Machine Learning Research, November 2023 (Published)

Robotic Materials Patent Hydraulically Amplified Self-healing Electrostatic Actuators Keplinger, C. M., Acome, E. L., Kellaris, N. A., Mitchell, S. K. (US Patent 11795979B2), October 2023
An electro-hydraulic actuator includes a deformable shell defining an enclosed internal cavity and containing a liquid dielectric, first and second electrodes on first and second sides, respectively, of the enclosed internal cavity. An electrostatic force between the first and second electrodes upon application of a voltage to one of the electrodes draws the electrodes towards each other to displace the liquid dielectric within the enclosed internal cavity. The shell includes active and inactive areas such that the electrostatic forces between the first and second electrodes displace the liquid dielectric within the enclosed internal cavity from the active area of the shell to the inactive area of the shell. The first and second electrodes, the deformable shell, and the liquid dielectric cooperate to form a self-healing capacitor, and the liquid dielectric is configured for automatically filling breaches in the liquid dielectric resulting from dielectric breakdown.
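For intuition about the electrostatic force described in the claim, a parallel-plate idealization (uniform field, electrodes of area A separated by a gap d filled with a dielectric of relative permittivity ε_r, constant applied voltage V) gives the standard result below. Real actuators of this kind, with shell films in series with the liquid and electrodes that close progressively, deviate from this idealization, so the formula is only an order-of-magnitude guide.

```latex
\begin{aligned}
U &= \tfrac{1}{2} C V^2, \qquad C = \frac{\varepsilon_0 \varepsilon_r A}{d},\\
F &= \left.\frac{\partial U}{\partial d}\right|_{V} = -\frac{\varepsilon_0 \varepsilon_r A V^2}{2 d^2}
\quad \text{(negative: the electrodes attract)},\\
p &= \frac{|F|}{A} = \frac{\varepsilon_0 \varepsilon_r}{2}\left(\frac{V}{d}\right)^2 .
\end{aligned}
```

The quadratic dependence on V/d is why such actuators need high voltages across thin gaps to generate useful pressure on the liquid dielectric.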

Perceiving Systems Ph.D. Thesis Neural Shape Modeling of 3D Clothed Humans Ma, Q. October 2023 (Published)
Parametric models for 3D human bodies play a crucial role in the synthesis and analysis of humans in visual computing. While current models effectively capture body pose and shape variations, a significant aspect has been overlooked: clothing. Existing 3D human models mostly produce a minimally-clothed body geometry, limiting their ability to represent the complexity of dressed people in real-world data sources. The challenge lies in the unique characteristics of garments, which make modeling clothed humans particularly difficult. Clothing exhibits diverse topologies, and as the body moves, it introduces wrinkles at various spatial scales. Moreover, pose-dependent clothing deformations are non-rigid and non-linear, exceeding the capabilities of classical body models constructed with fixed-topology surface meshes and linear approximations of pose-aware shape deformations. This thesis addresses these challenges by innovating in two key areas: the 3D shape representation and deformation modeling techniques. We demonstrate that a seemingly old-fashioned shape representation, point clouds, can be a powerful tool for modeling clothed characters when equipped with deep learning and neural fields. Specifically, the thesis begins by introducing a large-scale dataset of dynamic 3D humans in various clothing, which serves as a foundation for training the models presented in this work. The first model we present is CAPE: a neural generative model for 3D clothed human meshes. Here, a clothed body is straightforwardly obtained by applying per-vertex offsets to a pre-defined, unclothed body template mesh. Sampling from the CAPE model generates plausible-looking digital humans wearing common garments, but the fixed-topology mesh representation limits its applicability to more complex garment types. To address this limitation, we present a series of point-based clothed human models: SCALE, PoP and SkiRT.
The SCALE model represents a clothed human using a collection of points organized into local patches. The patches can freely move and deform to represent garments of diverse topologies, unlocking the generalization to more challenging outfits such as dresses and jackets. Unlike traditional approaches based on physics simulations, SCALE learns pose-dependent cloth deformations from data with minimal manual intervention. To further improve the geometric quality, the PoP model eliminates the concept of patches and instead learns a continuous neural deformation field from the body surface. Densely querying this field results in a high-resolution point cloud of a dressed human, showcasing intricate clothing wrinkles. PoP can generalize across multiple subjects and outfits, and can even bring a single, static scan into animation. Finally, we tackle a long-standing challenge in learning-based digital human modeling: loose garments, in particular skirts and dresses. Building upon PoP, the SkiRT pipeline further learns a shape “template” and neural field of linear-blend-skinning weights for clothed bodies, improving the models’ robustness for loose garments of varied topology. Our point-based human models are “interplicit”: the output point clouds capture surfaces explicitly at discrete points but implicitly in between. The explicit points are fast, topologically flexible, and compatible with existing graphics tools, while the implicit neural deformation field contributes to high-quality geometry. This thesis primarily demonstrates these advantages in the context of clothed human shape modeling; future work can apply our representation and techniques to general 3D deformable shapes and neural rendering.
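A toy version of densely querying a deformation field defined on the body surface: sample points uniformly by area on a triangle mesh, then displace each point by a field evaluated there. The field here is a hypothetical constant offset along the face normal, standing in for PoP's learned, pose-conditioned network, and the flat square mesh is illustrative only.

```python
import numpy as np


def sample_surface(vertices, faces, n_points, rng):
    """Uniformly sample points (and their face normals) on a triangle mesh, by area."""
    tri = vertices[faces]                                        # (F, 3, 3)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    bary = np.stack([1 - r1, r1 * (1 - r2), r1 * r2], axis=1)    # (N, 3)
    points = np.einsum("nk,nkc->nc", bary, tri[face_idx])
    normals = cross[face_idx] / np.linalg.norm(cross[face_idx], axis=1, keepdims=True)
    return points, normals


def toy_displacement_field(points, normals):
    """Hypothetical stand-in for a learned field: push points 1 cm along the normal."""
    return 0.01 * normals


rng = np.random.default_rng(0)
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])    # unit square split into two triangles
points, normals = sample_surface(vertices, faces, 500, rng)
clothed = points + toy_displacement_field(points, normals)
```

Increasing `n_points` simply queries the field more densely, which is the sense in which such a representation yields a point cloud of arbitrary resolution.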

Perceiving Systems Conference Paper Optimizing the 3D Plate Shape for Proximal Humerus Fractures Keller, M., Krall, M., Smith, J., Clement, H., Kerner, A. M., Gradischar, A., Schäfer, Ü., Black, M. J., Weinberg, A., Pujades, S. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 487-496, Springer, Cham, October 2023 (Published)
To treat bone fractures, implant manufacturers produce 2D anatomically contoured plates. Unfortunately, existing plates only fit a limited segment of the population and/or require manual bending during surgery. Patient-specific implants would provide major benefits, such as reducing surgery time and improving treatment outcomes, but they are still rare in clinical practice. In this work, we propose a patient-specific design for the long helical 2D PHILOS (Proximal Humeral Internal Locking System) plate, used to treat humerus shaft fractures. Our method automatically creates a custom plate from a CT scan of a patient's bone. We start by designing an optimal plate on a template bone and, with an anatomy-aware registration method, we transfer this optimal design to any bone. In addition, for an arbitrary bone, our method assesses whether a given plate is fit for surgery by automatically positioning it on the bone. We use this process to generate a compact set of plate shapes capable of fitting the bones within a given population. This plate set can be printed in advance and kept readily available, removing the fabrication time between the fracture occurrence and the surgery. Extensive experiments on ex-vivo arms and 3D-printed bones show that the generated plate shapes (personalized and plate-set) faithfully match the individual bone anatomy and are suitable for clinical practice.
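The plate-set step resembles a set-cover problem: given which plate shapes fit which bones (the kind of output a fit assessment like the one described would produce), choose few plates that together cover every bone. The sketch below uses the classic greedy set-cover heuristic; this is a swapped-in technique for illustration, not necessarily the paper's procedure, and the fit table is made up.

```python
def greedy_plate_set(fits: dict[str, set[str]], bones: set[str]) -> list[str]:
    """Greedy set cover: repeatedly pick the plate that fits the most uncovered bones."""
    uncovered = set(bones)
    chosen = []
    while uncovered:
        best = max(fits, key=lambda plate: len(fits[plate] & uncovered))
        gained = fits[best] & uncovered
        if not gained:
            raise ValueError(f"no plate fits bones: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical fit table: plate -> bones it was judged to fit.
fits = {
    "p1": {"b1", "b2"},
    "p2": {"b2", "b3", "b4"},
    "p3": {"b4", "b5"},
    "p4": {"b1", "b5"},
}
bones = {"b1", "b2", "b3", "b4", "b5"}
print(greedy_plate_set(fits, bones))  # ['p2', 'p4']
```

Greedy set cover is not guaranteed to find the smallest set, but it comes with a well-known logarithmic approximation bound and is simple to run on a fit table of any size.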

Empirical Inference Article A taxonomy and review of generalization research in NLP Hupkes, D., Giulianelli, M., Dankers, V., Artetxe, M., Elazar, Y., Pimentel, T., Christodoulopoulos, C., Lasri, K., Saphra, N., Sinclair, A., Ulmer, D., Schottmann, F., Batsuren, K., Sun, K., Sinha, K., Khalatbari, L., Ryskina, M., Frieske, R., Cotterell, R., Jin, Z. Nature Machine Intelligence, 5(10):1161-1174, October 2023 (Published)

Empirical Inference Article Artificial Intelligence in Oncological Hybrid Imaging Feuerecker, B., Heimer, M. M., Geyer, T., Fabritius, M. P., Gu, S., Schachtner, B., Beyer, L., Ricke, J., Gatidis, S., Ingrisch, M., Cyran, C. C. Nuklearmedizin, 62(5):296-305, October 2023 (Published)

Haptic Intelligence Intelligent Control Systems Conference Paper Enhancing Surgical Team Collaboration and Situation Awareness through Multimodal Sensing Allemang–Trivalle, A. In Proceedings of the ACM International Conference on Multimodal Interaction, 716-720, Extended abstract (5 pages) presented at the ACM International Conference on Multimodal Interaction (ICMI) Doctoral Consortium, Paris, France, October 2023 (Published)
Surgery, typically seen as the surgeon's sole responsibility, requires a broader perspective that acknowledges the vital roles of other operating room (OR) personnel. The interactions among team members are crucial for delivering quality care and depend on shared situation awareness. I propose a two-phase approach to design and evaluate a multimodal platform that monitors OR members, offering insights into surgical procedures. The first phase focuses on designing a data-collection platform, tailored to surgical constraints, to generate novel collaboration and situation-awareness metrics using synchronous recordings of the participants' voices, positions, orientations, electrocardiograms, and respiration signals. The second phase concerns the creation of intuitive dashboards and visualizations that aid surgeons in reviewing recorded surgeries, identifying adverse events, and contributing to proactive measures. This work aims to demonstrate an innovative approach to data collection and analysis that augments the surgical team's capabilities. The multimodal platform has the potential to enhance collaboration, foster situation awareness, and ultimately mitigate surgical adverse events. This research sets the stage for a transformative shift in the OR, enabling a more holistic and inclusive perspective that recognizes surgery as a team effort.

Perceiving Systems Conference Paper Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance Feng, H., Kulits, P., Liu, S., Black, M. J., Fernandez Abrevaya, V. In Proc. International Conference on Computer Vision (ICCV), International Conference on Computer Vision, October 2023 (Published)
We address the problem of fitting a parametric human body model (SMPL) to point cloud data. Optimization-based methods require careful initialization and are prone to becoming trapped in local optima. Learning-based methods address this but do not generalize well when the input pose is far from those seen during training. For rigid point clouds, remarkable generalization has been achieved by leveraging SE(3)-equivariant networks, but these methods do not work on articulated objects. In this work we extend this idea to human bodies and propose ArtEq, a novel part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. Specifically, we learn a part detection network by leveraging local SO(3) invariance, and regress shape and pose using articulated SE(3) shape-invariant and pose-equivariant networks, all trained end-to-end. Our novel pose regression module leverages the permutation-equivariant property of self-attention layers to preserve rotational equivariance. Experimental results show that ArtEq generalizes to poses not seen during training, outperforming state-of-the-art methods by ~44% in terms of body reconstruction accuracy, without requiring an optimization refinement step. Furthermore, ArtEq is three orders of magnitude faster during inference than prior work and has 97.3% fewer parameters. The code and model are available for research purposes at https://arteq.is.tue.mpg.de.
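The pose regression module relies on self-attention being permutation-equivariant. The NumPy sketch below checks that property for a generic single-head attention layer without positional encodings, using random weights; it is not ArtEq's actual module, and it only illustrates the permutation property, not how ArtEq uses it to preserve rotational equivariance.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention, no positional encoding.

    x: (N, D) set of N input tokens; wq, wk, wv: (D, D) projection matrices.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    # Row-wise softmax, shifted by the row max for numerical stability
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
perm = rng.permutation(5)

# Equivariance: attention(P x) equals P attention(x)
lhs = self_attention(x[perm], wq, wk, wv)
rhs = self_attention(x, wq, wk, wv)[perm]
print(np.allclose(lhs, rhs))  # True
```

The property holds because permuting the tokens permutes both the rows and columns of the score matrix; a row-wise softmax commutes with that, so the weighted sum over values is permuted the same way.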

Haptic Intelligence Ph.D. Thesis Gesture-Based Nonverbal Interaction for Exercise Robots Mohan, M. University of Tübingen, Tübingen, Germany, October 2023, Department of Computer Science (Published)
When teaching or coaching, humans augment their words with carefully timed hand gestures, head and body movements, and facial expressions to provide feedback to their students. Robots, however, rarely utilize these nuanced cues. A minimally supervised social robot equipped with these abilities could support people in exercising, physical therapy, and learning new activities. This thesis examines how the intuitive power of human gestures can be harnessed to enhance human-robot interaction. To address this question, this research explores gesture-based interactions to expand the capabilities of a socially assistive robotic exercise coach, investigating the perspectives of both novice users and exercise-therapy experts. This thesis begins by concentrating on the user's engagement with the robot, analyzing the feasibility of minimally supervised gesture-based interactions. This exploration seeks to establish a framework in which robots can interact with users in a more intuitive and responsive manner. The investigation then shifts its focus toward the professionals who are integral to the success of these innovative technologies: the exercise-therapy experts. Roboticists face the challenge of translating the knowledge of these experts into robotic interactions. We address this challenge by developing a teleoperation algorithm that can enable exercise therapists to create customized gesture-based interactions for a robot. Thus, this thesis lays the groundwork for dynamic gesture-based interactions in minimally supervised environments, with implications not only for exercise-coach robots but also for broader applications in human-robot interaction.

Social Foundations of Computation Conference Paper Is Your Model Predicting the Past? Hardt, M., Kim, M. P. In Proceedings of the Third ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), ACM, October 2023 (Published)
When does a machine learning model predict the future of individuals, and when does it recite patterns that predate the individuals? In this work, we propose a distinction between these two pathways of prediction, supported by theoretical, empirical, and normative arguments. At the center of our proposal is a family of simple and efficient statistical tests, called backward baselines, that reveal whether, and to what extent, a model recounts the past. Our statistical theory provides guidance for interpreting backward baselines, establishing equivalences between different baselines and familiar statistical concepts. Concretely, we derive a meaningful backward baseline for auditing a prediction system as a black box, given only background variables and the system’s predictions. Empirically, we evaluate the framework on different prediction tasks derived from longitudinal panel surveys, demonstrating the ease and effectiveness of incorporating backward baselines into the practice of machine learning.
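As a rough illustration of auditing a black-box predictor with only background variables and its predictions, the Python sketch below measures how much of the variance in the predictions is explained by the mean prediction within each background category (an R²-style score). This is a hypothetical simplification that assumes a single discrete background variable; the paper's backward baselines and their statistical theory are more general, and the function name is invented for illustration.

```python
from collections import defaultdict

def background_explained_fraction(background, predictions):
    """Fraction of prediction variance explained by background variables alone.

    background:  list of hashable background values (e.g., a group label)
    predictions: list of the system's numeric predictions, same length
    Returns 1 - (within-group variance / total variance); values near 1 mean
    the predictions are largely determined by the background variables.
    """
    n = len(predictions)
    mean = sum(predictions) / n
    total = sum((p - mean) ** 2 for p in predictions)
    if total == 0:
        return 1.0  # constant predictions are trivially explained
    groups = defaultdict(list)
    for z, p in zip(background, predictions):
        groups[z].append(p)
    # Best predictor of the system's output from Z alone: the group mean
    group_mean = {z: sum(ps) / len(ps) for z, ps in groups.items()}
    residual = sum((p - group_mean[z]) ** 2 for z, p in zip(background, predictions))
    return 1 - residual / total
```

A score near 1 would suggest the system is largely reciting what the background already determines; a score near 0 means the predictions vary within background groups.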

Empirical Inference Perceiving Systems Conference Paper One-shot Implicit Animatable Avatars with Model-based Priors Huang, Y., Yi, H., Liu, W., Wang, H., Wu, B., Wang, W., Lin, B., Zhang, D., Cai, D. In Proc. International Conference on Computer Vision (ICCV), 8940-8951, International Conference on Computer Vision, October 2023, *equal contribution (Published)
Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from specific large-scale 3D human datasets so that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can easily reconstruct the body geometry and infer the full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT introduces the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with CLIP-based pre-trained models. Both priors are used to jointly guide the optimization for creating plausible content in the invisible areas. To further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT outperforms current state-of-the-art avatar creation methods when only a single image is available. Code will be made public for research purposes at https://github.com/huangyangyi/ELICIT

Perceiving Systems Empirical Inference Conference Paper Pairwise Similarity Learning is SimPLE Wen, Y., Liu, W., Feng, Y., Raj, B., Singh, R., Weller, A., Black, M. J., Schölkopf, B. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), International Conference on Computer Vision, October 2023 (Published)
In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). PSL subsumes a wide range of important applications, such as open-set face recognition, speaker verification, image retrieval, and person re-identification. The goal of PSL is to learn a pairwise similarity function that assigns a higher similarity score to positive pairs (i.e., a pair of samples with the same label) than to negative pairs (i.e., a pair of samples with different labels). We start by identifying a key desideratum for PSL, and then discuss how existing methods can achieve this desideratum. We then propose a surprisingly simple proxy-free method, called SimPLE, which requires neither feature/proxy normalization nor angular margin and yet is able to generalize well in open-set recognition. We apply the proposed method to three challenging PSL tasks: open-set face recognition, image retrieval, and speaker verification. Comprehensive experimental results on large-scale benchmarks show that our method performs significantly better than current state-of-the-art methods.
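To make the PSL goal concrete (positive pairs should score higher than negative pairs), here is a generic pairwise logistic loss over all distinct pairs in a mini-batch, using an unnormalized inner-product similarity and a scalar threshold `b`. It is an illustrative sketch of pairwise similarity learning in general, not SimPLE's exact objective; the function name, the logistic form, and the threshold parameter are assumptions.

```python
import numpy as np

def pairwise_logistic_loss(features, labels, b=0.0):
    """Mean logistic loss over all distinct pairs in a batch.

    features: (N, D) embeddings (deliberately not normalized)
    labels:   (N,) integer class labels as a NumPy array
    b:        scalar threshold separating positive from negative scores
    """
    sim = features @ features.T                   # inner-product similarity
    same = labels[:, None] == labels[None, :]     # True for positive pairs
    y = np.where(same, 1.0, -1.0)                 # +1 positive, -1 negative
    off_diag = ~np.eye(len(labels), dtype=bool)   # drop self-pairs
    margins = y[off_diag] * (sim[off_diag] - b)
    # log(1 + exp(-margin)), computed stably with logaddexp
    return np.mean(np.logaddexp(0.0, -margins))
```

The loss is small when every positive pair scores above `b` and every negative pair below it, which is exactly the ordering the abstract describes; in practice `b` could be learned jointly with the embedding network.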