Publications

Empirical Inference Conference Paper Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding Pace, A., Yèche, H., Schölkopf, B., Rätsch, G., Tennenholtz, G. The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published) arXiv BibTeX

Autonomous Learning Conference Paper Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks Khajehabdollahi, S., Zeraati, R., Giannakakis, E., Schäfer, T. J., Martius, G., Levina, A. In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 (Published) URL BibTeX

Perceiving Systems Article Exploring Weight Bias and Negative Self-Evaluation in Patients with Mood Disorders: Insights from the BodyTalk Project Meneguzzo, P., Behrens, S. C., Pavan, C., Toffanin, T., Quiros-Ramirez, M. A., Black, M. J., Giel, K., Tenconi, E., Favaro, A. Frontiers in Psychiatry, 15, Sec. Psychopathology, May 2024 (Published)
Background: Negative body image and adverse body self-evaluation represent key psychological constructs within the realm of weight bias (WB), potentially intertwined with the negative self-evaluation characteristic of depressive symptomatology. Although WB encapsulates an implicit form of self-critical assessment, its exploration among people with mood disorders (MD) has been under-investigated. Our primary goal is to comprehensively assess both explicit and implicit WB, seeking to reveal specific dimensions that could interconnect with the symptoms of MDs. Methods: A cohort comprising 25 MD patients and 35 demographically matched healthy peers (with 83% female representation) participated in a series of tasks designed to evaluate the congruence between various computer-generated body representations and a spectrum of descriptive adjectives. Our analysis delved into multiple facets of body image evaluation, scrutinizing the associations between different body sizes and emotionally charged adjectives (e.g., active, apple-shaped, attractive). Results: No discernible differences emerged concerning body dissatisfaction or the correspondence of different body sizes with varying adjectives. Interestingly, MD patients exhibited a markedly higher tendency to overestimate their body weight (p = 0.011). Explicit WB did not show significant variance between the two groups, but MD participants demonstrated a notable implicit WB within a specific weight rating task for BMI between 18.5 and 25 kg/m2 (p = 0.012). Conclusions: Despite the striking similarities in the assessment of participants’ body weight, our investigation revealed an implicit WB among individuals grappling with MD. This bias potentially assumes a role in fostering self-directed negative evaluations, shedding light on a previously unexplored facet of the interplay between WB and mood disorders.
paper paper DOI URL BibTeX

Social Foundations of Computation Conference Paper Fairness Rising from the Ranks: HITS and PageRank on Homophilic Networks Stoica, A., Litvak, N., Chaintreau, A. In Proceedings of the Association for Computing Machinery (ACM) Web Conference 2024, ACM, The 2024 ACM Web Conference, May 2024 (Published)
In this paper, we investigate the conditions under which link analysis algorithms prevent minority groups from reaching high-ranking slots. We find that the most common link-based algorithms using centrality metrics, such as PageRank and HITS, can reproduce and even amplify bias against minority groups in networks. Yet, their behavior differs: on the one hand, we empirically show that PageRank mirrors the degree distribution for most of the ranking positions and can equalize the representation of minorities among the top-ranked nodes; on the other hand, through a novel theoretical analysis supported by empirical results, we find that HITS amplifies pre-existing bias in homophilic networks. We find the root cause of bias amplification in HITS to be the level of homophily present in the network, modeled through an evolving network model with two communities. We illustrate our theoretical analysis on both synthetic and real datasets and present directions for future work.
ArXiv URL BibTeX
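The contrast between the two ranking algorithms can be sketched in a few lines of plain Python (an illustrative toy on a tiny two-community graph, not the paper's evolving network model):

```python
# Illustrative sketch (not the paper's exact model): power-iteration
# PageRank and HITS on a tiny directed graph, showing how the two
# rankings are computed and can weigh minority nodes differently.

def pagerank(adj, d=0.85, iters=100):
    """adj[u] = list of nodes u links to; damping factor d."""
    nodes = list(adj)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in nodes}
        for u in nodes:
            targets = adj[u] if adj[u] else nodes  # dangling: spread uniformly
            share = d * pr[u] / len(targets)
            for v in targets:
                nxt[v] += share
        pr = nxt
    return pr

def hits_authority(adj, iters=100):
    """Normalized HITS authority scores via alternating updates."""
    nodes = list(adj)
    auth = {u: 1.0 for u in nodes}
    hub = {u: 1.0 for u in nodes}
    for _ in range(iters):
        auth = {v: sum(hub[u] for u in nodes if v in adj[u]) for v in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5
        auth = {v: a / norm for v, a in auth.items()}
        hub = {u: sum(auth[v] for v in adj[u]) for u in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5
        hub = {u: h / norm for u, h in hub.items()}
    return auth

# Majority community {0, 1, 2} links mostly within itself (homophily);
# minority community {3, 4} also links toward the majority.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
pr = pagerank(graph)
auth = hits_authority(graph)
```

Both scores rank node 0 (the most linked-to majority node) first here; the paper's point is how the tail of the two rankings treats minority nodes as homophily varies.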

Haptic Intelligence Robotics Miscellaneous GaitGuide: A Wearable Device for Vibrotactile Motion Guidance Rokhmanova, N., Martus, J., Faulkner, R., Fiene, J., Kuchenbecker, K. J. Workshop paper (3 pages) presented at the ICRA Workshop on Advancing Wearable Devices and Applications Through Novel Design, Sensing, Actuation, and AI, Yokohama, Japan, May 2024 (Published)
Wearable vibrotactile devices can provide salient sensations that attract the user's attention or guide them to change. The future integration of such feedback into medical or consumer devices would benefit from understanding how vibrotactile cues vary in amplitude and perceived strength across the heterogeneity of human skin. Here, we developed an adhesive vibrotactile device (the GaitGuide) that uses two individually mounted linear resonant actuators to deliver directional motion guidance. By measuring the mechanical vibrations of the actuators via small on-board accelerometers, we compared vibration amplitudes and perceived signal strength across 20 subjects at five signal voltages and four sites around the shank. Vibrations were consistently smallest in amplitude—but perceived to be strongest—at the site located over the tibia. We created a fourth-order linear dynamic model to capture differences in tissue properties across subjects and sites via optimized stiffness and damping parameters. The anterior site had significantly higher skin stiffness and damping; these values also correlate with subject-specific body-fat percentages. Surprisingly, our study shows that the perception of vibrotactile stimuli does not solely depend on the vibration magnitude delivered to the skin. These findings also help to explain the clinical practice of evaluating vibrotactile sensitivity over a bony prominence.
URL BibTeX

Perceiving Systems Empirical Inference Conference Paper Ghost on the Shell: An Expressive Representation of General 3D Shapes Liu, Z., Feng, Y., Xiu, Y., Liu, W., Paull, L., Black, M. J., Schölkopf, B. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published)
The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight shapes as well as thin, open surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parameterize open surfaces by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.
Home Code Video Project BibTeX
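The core "islands on a watertight template" idea can be illustrated in a toy form: keep only the faces of a closed mesh where a scalar field defined on the surface is negative, which yields an open mesh with a boundary loop. This is a hedged sketch of the concept, not the authors' differentiable representation:

```python
# Toy illustration of the G-Shell idea (not the authors' code): an open
# surface is carved out of a watertight template mesh by a scalar field
# on the surface -- faces where the field is negative are kept,
# producing a non-watertight mesh with a boundary.

# Watertight template: a unit octahedron (6 vertices, 8 triangles).
verts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
         (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]

# Stand-in for the manifold signed distance field: height above the equator.
field = lambda v: v[2]  # negative on the lower hemisphere

# Keep faces whose centroid lies in the field's negative region.
open_faces = [f for f in faces
              if sum(field(verts[i]) for i in f) / 3.0 < 0]

def boundary_edges(face_list):
    """Edges used by exactly one face -- nonempty iff the mesh is open."""
    count = {}
    for f in face_list:
        for a, b in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            e = (min(a, b), max(a, b))
            count[e] = count.get(e, 0) + 1
    return [e for e, c in count.items() if c == 1]

assert boundary_edges(faces) == []  # the template itself is watertight
print(len(open_faces), len(boundary_edges(open_faces)))
```

The four surviving lower-hemisphere faces form an open surface whose boundary is the equator, while the template stays closed.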

Empirical Inference Conference Paper Identifying Policy Gradient Subspaces Schneider, J., Schumacher, P., Guist, S., Chen, L., Häufle, D., Schölkopf, B., Büchler, D. The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published) arXiv BibTeX

Autonomous Learning Conference Paper Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics Gumbsch, C., Sajid, N., Martius, G., Butz, M. V. In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 URL BibTeX

Empirical Inference Autonomous Learning Conference Paper Multi-View Causal Representation Learning with Partial Observability Yao, D., Xu, D., Lachapelle, S., Magliacane, S., Taslakian, P., Martius, G., von Kügelgen, J., Locatello, F. The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published) arXiv BibTeX

Empirical Inference Conference Paper Open X-Embodiment: Robotic Learning Datasets and RT-X Models Open X-Embodiment Collaboration (incl. Guist, S., Schneider, J., Schölkopf, B., Büchler, D.). IEEE International Conference on Robotics and Automation (ICRA), 6892-6903, May 2024 (Published) arXiv DOI URL BibTeX

Empirical Inference Conference Paper Out-of-Variable Generalization for Discriminative Models Guo, S., Wildberger, J., Schölkopf, B. The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published) arXiv BibTeX

Empirical Inference Perceiving Systems Conference Paper Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization Liu, W., Qiu, Z., Feng, Y., Xiu, Y., Xue, Y., Yu, L., Feng, H., Liu, Z., Heo, J., Peng, S., Wen, Y., Black, M. J., Weller, A., Schölkopf, B. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), The Twelfth International Conference on Learning Representations, May 2024 (Published)
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
Home Code HuggingFace project URL BibTeX
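The butterfly idea behind the parameter savings can be sketched in plain Python: a d x d orthogonal matrix is built as a product of log2(d) sparse factors, each a set of independent 2x2 rotations over FFT-style index pairs, giving O(d log d) parameters instead of the O(d^2) of a dense orthogonal matrix. This is a minimal sketch in the spirit of BOFT, not the authors' implementation:

```python
import math

def butterfly_orthogonal(angles, d):
    """Build an orthogonal d x d matrix from butterfly rotation factors.

    angles: list of log2(d) levels, each a list of d//2 rotation angles.
    """
    mat = [[float(i == j) for j in range(d)] for i in range(d)]  # identity
    stride = 1
    for level_angles in angles:
        factor = [[float(i == j) for j in range(d)] for i in range(d)]
        k = 0
        for start in range(0, d, 2 * stride):   # FFT-style index pairing
            for off in range(stride):
                i, j = start + off, start + off + stride
                c, s = math.cos(level_angles[k]), math.sin(level_angles[k])
                factor[i][i], factor[i][j] = c, -s
                factor[j][i], factor[j][j] = s, c
                k += 1
        mat = [[sum(factor[r][t] * mat[t][c] for t in range(d))
                for c in range(d)] for r in range(d)]
        stride *= 2
    return mat

d = 8
angles = [[0.1 * (l + 1) + 0.01 * k for k in range(d // 2)]
          for l in range(3)]  # log2(8) = 3 levels, only 12 parameters
Q = butterfly_orthogonal(angles, d)

# Orthogonality check: Q^T Q should be the identity.
err = max(abs(sum(Q[r][i] * Q[r][j] for r in range(d)) - (i == j))
          for i in range(d) for j in range(d))
```

Because each factor is a disjoint union of 2x2 rotations, every factor is exactly orthogonal, and so is their product.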

Empirical Inference Conference Paper Skill or Luck? Return Decomposition via Advantage Functions Pan, H., Schölkopf, B. The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published) arXiv BibTeX

Empirical Inference Conference Paper Some Intriguing Aspects about Lipschitz Continuity of Neural Networks Khromov*, G., Singh*, S. P. The Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (Published) arXiv BibTeX

Empirical Inference Conference Paper Stochastic Gradient Descent for Gaussian Processes Done Right Lin*, J. A., Padhy*, S., Antorán*, J., Tripp, A., Terenin, A., Szepesvari, C., Hernández-Lobato, J. M., Janz, D. The Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (Published) arXiv BibTeX

Empirical Inference Conference Paper Targeted Reduction of Causal Models Kekić, A., Schölkopf, B., Besserve, M. ICLR 2024 Workshop on AI4DifferentialEquations In Science, May 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper Test-Time Training on Nearest Neighbors for Large Language Models Hardt, M., Sun, Y. In The Twelfth International Conference on Learning Representations (ICLR 2024), May 2024 (Published)
Many recent efforts augment language models with retrieval, by adding retrieved data to the input context. For this approach to succeed, the retrieved data must be added at both training and test time. Moreover, as input length grows linearly with the size of retrieved data, cost in computation and memory grows quadratically for modern Transformers. To avoid these complications, we simply fine-tune the model on retrieved data at test time, using its standard training setup. We build a large-scale distributed index based on text embeddings of the Pile dataset. For each test input, our system retrieves its neighbors and fine-tunes the model on their text. Surprisingly, retrieving and training on as few as 20 neighbors, each for only one gradient iteration, drastically improves performance across more than 20 language modeling tasks in the Pile. For example, test-time training with nearest neighbors significantly narrows the performance gap between a small GPT-2 and a GPT-Neo model more than 10 times larger. Sufficient index quality and size, however, are necessary. Our work establishes a first baseline of test-time training for language modeling.
ArXiv Code URL BibTeX

Perceiving Systems Article The Poses for Equine Research Dataset (PFERD) Li, C., Mellbin, Y., Krogager, J., Polikovsky, S., Holmberg, M., Ghorbani, N., Black, M. J., Kjellström, H., Zuffi, S., Hernlund, E. Nature Scientific Data, 11, May 2024 (Published)
Studies of quadruped animal motion help us to identify diseases, understand behavior and unravel the mechanics behind gaits in animals. The horse is likely the best-studied animal in this aspect, but data capture is challenging and time-consuming. Computer vision techniques improve animal motion extraction, but the development relies on reference datasets, which are scarce, not open-access and often provide data from only a few anatomical landmarks. Addressing this data gap, we introduce PFERD, a video and 3D marker motion dataset from horses using a full-body set-up of over 100 densely placed skin-attached markers and synchronized videos from ten camera angles. Five horses of diverse conformations provide data for various motions from basic poses (e.g., walking, trotting) to advanced motions (e.g., rearing, kicking). We further express the 3D motions with current techniques and a 3D parameterized model, the hSMAL model, establishing a baseline for 3D horse markerless motion capture. PFERD enables advanced biomechanical studies and provides a resource of ground truth data for the methodological development of markerless motion capture.
paper DOI URL BibTeX

Haptic Intelligence Robotic Materials Miscellaneous Three-Dimensional Surface Reconstruction of a Soft System via Distributed Magnetic Sensing Sundaram, V. H., Smith, L., Turin, Z., Rentschler, M. E., Gonzalez Welker, C. Workshop paper (3 pages) presented at the ICRA Workshop on Advancing Wearable Devices and Applications Through Novel Design, Sensing, Actuation, and AI, Yokohama, Japan, May 2024 (Published)
This study presents a new method for reconstructing continuous 3D surface deformations for a soft pneumatic actuation system using embedded magnetic sensors. A finite element analysis (FEA) model was developed to quantify the surface deformation given the magnetometer readings, with a relative error between the experimental and the simulated sensor data of 7.8%. Using the FEA simulation solutions and a basic model-based mapping, our method achieves sub-millimeter accuracy in measuring deformation from sensor data with an absolute error between the experimental and simulated sensor data of 13.5%. These results show promise for real-time adjustments to deformation, crucial in environments like prosthetic and orthotic interfaces with human limbs.
URL BibTeX

Empirical Inference Conference Paper Towards Meta-Pruning via Optimal Transport Theus, A., Geimer, O., Wicke, F., Hofmann, T., Anagnostidis, S., Singh, S. P. The Twelfth International Conference on Learning Representations (ICLR), May 2024 (Published) arXiv BibTeX

Empirical Inference Conference Paper Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion Meterez*, A., Joudaki*, A., Orabona, F., Immer, A., Rätsch, G., Daneshmand, H. The Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (Published) arXiv BibTeX

Empirical Inference Conference Paper Transformer Fusion with Optimal Transport Imfeld*, M., Graldi*, J., Giordano*, M., Hofmann, T., Anagnostidis, S., Singh, S. P. The Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (Published) arXiv BibTeX

Social Foundations of Computation Conference Paper Unprocessing Seven Years of Algorithmic Fairness Cruz, A. F., Hardt, M. In The Twelfth International Conference on Learning Representations (ICLR 2024), May 2024 (Published)
Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation. Interpreting our findings, we recall a widely overlooked theoretical argument, present seven years ago, that accurately predicted what we observe.
ArXiv Code URL BibTeX
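The postprocessing baseline discussed above amounts to choosing one decision threshold per demographic group on a base model's scores. A minimal sketch (equalizing selection rates for simplicity; the paper's baseline equalizes error rates, and "unprocessing" inverts this step to compare methods on a common footing):

```python
# Minimal sketch of score postprocessing: pick one threshold per group
# so that both groups are selected at the same target rate.

def group_threshold(scores, rate):
    """Smallest threshold accepting at least `rate` of the scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(rate * len(ranked)))
    return ranked[k - 1]

def postprocess(scored, target_rate=0.5):
    """scored: list of (score, group); returns per-group thresholds."""
    groups = {}
    for s, g in scored:
        groups.setdefault(g, []).append(s)
    return {g: group_threshold(ss, target_rate) for g, ss in groups.items()}

# Group "b" gets systematically lower scores from the base model.
scored = [(0.9, "a"), (0.8, "a"), (0.6, "a"), (0.4, "a"),
          (0.5, "b"), (0.35, "b"), (0.3, "b"), (0.1, "b")]
thresholds = postprocess(scored, target_rate=0.5)
decisions = [(s >= thresholds[g], g) for s, g in scored]
rate = lambda g: sum(d for d, gg in decisions if gg == g) / 4
```

A single global threshold of 0.55 would accept three "a" candidates and one "b" candidate; the per-group thresholds (0.8 and 0.35) select both groups at the same 50% rate without retraining the base model.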

Autonomous Learning Conference Paper Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision Mattamala, M., Frey, J., Libera, P., Chebrolu, N., Martius, G., Cadena, C., Hutter, M., Fallon, M. April 2024 (Accepted)
Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes. In this work, we present Wild Visual Navigation (WVN), an online self-supervised learning system for visual traversability estimation. The system is able to continuously adapt from a short human demonstration in the field, using only onboard sensing and computing. One of the key ideas to achieve this is the use of high-dimensional features from pre-trained self-supervised models, which implicitly encode semantic information that massively simplifies the learning task. Further, an online supervision-generation scheme enables concurrent training and inference of the learned model in the wild. We demonstrate our approach through diverse real-world deployments in forests, parks, and grasslands. Our system is able to bootstrap the traversable terrain segmentation in less than 5 min of in-field training time, enabling the robot to navigate in complex, previously unseen outdoor terrains.
URL BibTeX
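The online self-supervision loop can be sketched schematically: a small head is trained on features from a frozen pre-trained model, with positive labels generated from terrain the robot actually drove over. Everything below (the feature function, label source, and hyperparameters) is a made-up toy stand-in, not the WVN code:

```python
import math
import random

def frozen_features(patch):
    """Stand-in for pre-trained embeddings: simple patch statistics."""
    return [sum(patch) / len(patch), max(patch) - min(patch), 1.0]

def predict(w, feats):
    """Logistic traversability score for one feature vector."""
    z = sum(wi * fi for wi, fi in zip(w, feats))
    return 1.0 / (1.0 + math.exp(-z))

def online_update(w, feats, label, lr=0.5):
    """One SGD step of logistic regression -- the concurrent training."""
    err = predict(w, feats) - label
    return [wi - lr * err * fi for wi, fi in zip(w, feats)]

random.seed(0)
w = [0.0, 0.0, 0.0]
for _ in range(500):
    traversable = random.random() < 0.5
    # Self-supervised label: terrain the robot drove over is traversable.
    patch = ([random.uniform(0.6, 1.0) for _ in range(16)] if traversable
             else [random.uniform(0.0, 0.4) for _ in range(16)])
    w = online_update(w, frozen_features(patch), float(traversable))

smooth = predict(w, frozen_features([0.8] * 16))
rough = predict(w, frozen_features([0.2] * 16))
```

Because the frozen features already separate the two terrain types, a tiny head suffices, which is the point the abstract makes about pre-trained self-supervised features simplifying the learning task.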

Empirical Inference Master Thesis Algorithmic Compositional Learning of Language Models Thomm, J. ETH Zurich, Switzerland, April 2024 (Published) BibTeX

Haptic Intelligence Robotic Materials Miscellaneous Cutaneous Electrohydraulic (CUTE) Wearable Devices for Multimodal Haptic Feedback Sanchez-Tamayo, N., Yoder, Z., Ballardini, G., Rothemund, P., Keplinger, C., Kuchenbecker, K. J. Extended abstract (1 page) presented at the IEEE RoboSoft Workshop on Multimodal Soft Robots for Multifunctional Manipulation, Locomotion, and Human-Machine Interaction, San Diego, USA, April 2024 (Published) BibTeX

Empirical Inference Miscellaneous Evidence for eccentricity in the population of binary black holes observed by LIGO-Virgo-KAGRA Gupte, N., Ramos-Buades, A., Buonanno, A., Gair, J., Miller, M. C., Dax, M., Green, S. R., Pürrer, M., Wildberger, J., Macke, J. H., Romero-Shaw, I. M., Schölkopf, B. April 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper ImageNot: A Contrast with ImageNet Preserves Model Rankings Salaudeen, O., Hardt, M. April 2024 (Submitted)
We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other aspects. We show that key model architectures developed for ImageNet over the years rank identically when trained and evaluated on ImageNot to how they rank on ImageNet. This is true when training models from scratch or fine-tuning them. Moreover, the relative improvements of each model over earlier models strongly correlate in both datasets. We further give evidence that ImageNot has a similar utility as ImageNet for transfer learning purposes. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This stands in contrast with absolute accuracy numbers that typically drop sharply even under small changes to a dataset.
ArXiv BibTeX
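The "rankings transfer" claim is naturally checked with a rank correlation between per-model accuracies on the two datasets. A small sketch with Spearman's formula; the accuracy numbers below are made up for illustration only (see the paper for real results):

```python
def ranks(xs):
    """Rank of each value in ascending order (assumes no tied values)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman rank correlation via the sum-of-squared-rank-differences
    formula (valid without ties)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical accuracies of five architectures on the two datasets.
imagenet_acc = [0.69, 0.76, 0.79, 0.84, 0.87]
imagenot_acc = [0.41, 0.48, 0.52, 0.57, 0.60]
rho = spearman(imagenet_acc, imagenot_acc)  # identical ranking gives 1.0
```

An identical model ordering yields rho = 1.0 even though the absolute accuracies differ sharply, which mirrors the paper's distinction between relative rankings and absolute numbers.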

Empirical Inference Conference Paper PILLAR: How to make semi-private learning more effective Pinto, F., Hu, Y., Yang, F., Sanyal, A. 2nd IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 110-139, April 2024 (Published) DOI BibTeX

Empirical Inference Article SimReadUntil for benchmarking selective sequencing algorithms on ONT devices Mordig, M., Rätsch, G., Kahles, A. Bioinformatics, 40(5):btae199, April 2024 (Published) DOI URL BibTeX

Empirical Inference Article VIPurPCA: Visualizing and Propagating Uncertainty in Principal Component Analysis Zabel, S., Hennig, P., Nieselt, K. IEEE Transactions on Visualization and Computer Graphics, 30(4):2011-2022, April 2024 (Published) DOI BibTeX

Haptic Intelligence Miscellaneous CAPT Motor: A Strong Direct-Drive Rotary Haptic Interface Javot, B., Nguyen, V. H., Ballardini, G., Kuchenbecker, K. J. Hands-on demonstration presented at the IEEE Haptics Symposium, Long Beach, USA, April 2024 (Published)
We have designed and built a new motor named CAPT Motor that delivers continuous and precise torque. It is a brushless ironless motor using a Halbach-magnet ring and a planar axial Lorentz-coil array. This motor is unique as we use a two-phase design allowing for higher fill factor and geometrical accuracy of the coils, as they can all be made separately. This motor outperforms existing Halbach ring and cylinder motors with a torque constant per magnet volume of 9.94 (Nm/A)/dm3, a record in the field. The angular position of the rotor is measured by a high-resolution incremental optical encoder and tracked by a multimodal data acquisition device. The system's control firmware uses this angle measurement to calculate the two-phase motor currents needed to produce the torque commanded by the virtual environment at the rotor's position. The strength and precision of the CAPT Motor's torque and the lack of any mechanical transmission enable unusually high haptic rendering quality, indicating the promise of this new motor design.
URL BibTeX
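The commutation step described in the abstract (computing two-phase currents from the measured rotor angle to produce a commanded torque) follows the standard sine/cosine scheme for a two-phase motor. A hedged sketch, where the torque constant and pole-pair count are made-up placeholders, not the CAPT Motor's specifications:

```python
import math

K_T = 0.5        # torque constant, Nm/A (hypothetical placeholder)
POLE_PAIRS = 8   # magnet pole pairs (hypothetical placeholder)

def phase_currents(torque_cmd, rotor_angle):
    """Currents for two phases offset by 90 electrical degrees."""
    theta_e = POLE_PAIRS * rotor_angle  # electrical angle from encoder
    i = torque_cmd / K_T
    return i * math.cos(theta_e), i * math.sin(theta_e)

def torque(i_a, i_b, rotor_angle):
    """Torque produced by the two Lorentz-coil phases at this angle."""
    theta_e = POLE_PAIRS * rotor_angle
    return K_T * (i_a * math.cos(theta_e) + i_b * math.sin(theta_e))

# The commanded torque is reproduced at any rotor position, since
# cos^2 + sin^2 = 1 makes the torque ripple-free in this ideal model.
for angle in (0.0, 0.3, 1.7):
    i_a, i_b = phase_currents(0.2, angle)
    assert abs(torque(i_a, i_b, angle) - 0.2) < 1e-12
```

In the real device this loop runs in firmware against the optical encoder reading; the sketch only shows why two quadrature phases give smooth, position-independent torque.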

Perceiving Systems Ph.D. Thesis Self- and Interpersonal Contact in 3D Human Mesh Reconstruction Müller, L. University of Tübingen, Tübingen, March 2024 (Published)
The ability to perceive tactile stimuli is of substantial importance for human beings in establishing a connection with the surrounding world. Humans rely on the sense of touch to navigate their environment and to engage in interactions with both themselves and other people. The field of computer vision has made great progress in estimating a person’s body pose and shape from an image, however, the investigation of self- and interpersonal contact has received little attention despite its considerable significance. Estimating contact from images is a challenging endeavor because it necessitates methodologies capable of predicting the full 3D human body surface, i.e. an individual’s pose and shape. The limitations of current methods become evident when considering the two primary datasets and labels employed within the community to supervise the task of human pose and shape estimation. First, the widely used 2D joint locations lack crucial information for representing the entire 3D body surface. Second, in datasets of 3D human bodies, e.g. collected from motion capture systems or body scanners, contact is usually avoided, since it naturally leads to occlusion which complicates data cleaning and can break the data processing pipelines. In this thesis, we first address the problem of estimating contact that humans make with themselves from RGB images. To do this, we introduce two novel methods that we use to create new datasets tailored for the task of human mesh estimation for poses with self-contact. We create (1) 3DCP, a dataset of 3D body scan and motion capture data of humans in poses with self-contact and (2) MTP, a dataset of images taken in the wild with accurate 3D reference data using pose mimicking. Next, we observe that 2D joint locations can be readily labeled at scale given an image, however, an equivalent label for self-contact does not exist. 
Consequently, we introduce (3) discrete self-contact (DSC) annotations indicating the pairwise contact of discrete regions on the human body. We annotate three existing image datasets with discrete self-contact and use these labels during mesh optimization to bring body parts supposed to touch into contact. Then we train TUCH, a human mesh regressor, on our new datasets. When evaluated on the task of human body pose and shape estimation on public benchmarks, our results show that knowing about self-contact not only improves mesh estimates for poses with self-contact, but also for poses without self-contact. Next, we study contact humans make with other individuals during close social interaction. Reconstructing these interactions in 3D is a significant challenge due to the mutual occlusion. Furthermore, the existing datasets of images taken in the wild with ground-truth contact labels are of insufficient size to facilitate the training of a robust human mesh regressor. In this work, we employ a generative model, BUDDI, to learn the joint distribution of 3D pose and shape of two individuals during their interaction and use this model as prior during an optimization routine. To construct training data we leverage pre-existing datasets, i.e. motion capture data and Flickr images with discrete contact annotations. Similar to discrete self-contact labels, we utilize discrete human-human contact to jointly fit two meshes to detected 2D joint locations. The majority of methods for generating 3D humans focus on the motion of a single person and operate on 3D joint locations. While these methods can effectively generate motion, their representation of 3D humans is not sufficient for physical contact since they do not model the body surface. Our approach, in contrast, acts on the pose and shape parameters of a human body model, which enables us to sample 3D meshes of two people.
We further demonstrate how the knowledge of human proxemics, incorporated in our model, can be used to guide an optimization routine. For this, in each optimization iteration, BUDDI takes the current mesh and proposes a refinement that we subsequently consider in the objective function. This procedure enables us to go beyond state of the art by forgoing ground-truth discrete human-human contact labels during optimization. Self- and interpersonal contact happen on the surface of the human body, however, the majority of existing art tends to predict bodies with similar, “average” body shape. This is due to a lack of training data of paired images taken in the wild and ground-truth 3D body shape and because 2D joint locations are not sufficient to explain body shape. The most apparent solution would be to collect body scans of people together with their photos. This is, however, a time-consuming and cost-intensive process that lacks scalability. Instead, we leverage the vocabulary humans use to describe body shape. First, we ask annotators to label how much a word like “tall” or “long legs” applies to a human body. We gather these ratings for rendered meshes of various body shapes, for which we have ground-truth body model shape parameters, and for images collected from model agency websites. Using this data, we learn a shape-to-attribute (A2S) model that predicts body shape ratings from body shape parameters. Then we train a human mesh regressor, SHAPY, on the model agency images wherein we supervise body shape via attribute annotations using A2S. Since no suitable test set of diverse 3D ground-truth body shape with images taken in natural settings exists, we introduce Human Bodies in the Wild (HBW). This novel dataset contains photographs of individuals together with their body scan. Our model predicts more realistic body shapes from an image and quantitatively improves body shape estimation on this new benchmark.
In summary, we present novel datasets, optimization methods, a generative model, and regressors to advance the field of 3D human pose and shape estimation. Taken together, these methods open up ways to obtain more accurate and realistic 3D mesh estimates from images with multiple people in self- and mutual contact poses and with diverse body shapes. This line of research also enables generative approaches to create more natural, human-like avatars. We believe that knowing about self- and human-human contact through computer vision has wide-ranging implications in other fields as for example robotics, fitness, or behavioral science.
download Thesis DOI BibTeX

Empirical Inference Poster Koopman Spectral Analysis Uncovers the Temporal Structure of Spontaneous Neural Events Shao, K., Xu, Y., Logothetis, N., Shen, Z., Besserve, M. Computational and Systems Neuroscience Meeting (COSYNE), March 2024 (Published) URL BibTeX

Haptic Intelligence Conference Paper Expert Perception of Teleoperated Social Exercise Robots Mohan, M., Mat Husin, H., Kuchenbecker, K. J. In Companion of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), 769-773, Boulder, USA, March 2024, Late-Breaking Report (LBR) (5 pages) presented at the IEEE/ACM International Conference on Human-Robot Interaction (HRI) (Published)
Social robots could help address the growing issue of physical inactivity by inspiring users to engage in interactive exercise. Nevertheless, the practical implementation of social exercise robots poses substantial challenges, particularly in terms of personalizing their activities to individuals. We propose that motion-capture-based teleoperation could serve as a viable solution to address these needs by enabling experts to record custom motions that could later be played back without their real-time involvement. To gather feedback about this idea, we conducted semi-structured interviews with eight exercise-therapy professionals. Our findings indicate that experts' attitudes toward social exercise robots become more positive when considering the prospect of teleoperation to record and customize robot behaviors.
DOI BibTeX

Neural Capture and Synthesis Conference Paper GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar Kabadayi, B., Zielonka, W., Bhatnagar, B. L., Pons-Moll, G., Thies, J. In International Conference on 3D Vision (3DV), March 2024 (Published)
Digital humans and, especially, 3D facial avatars have attracted considerable attention in recent years, as they are the backbone of several applications like immersive telepresence in AR or VR. Despite this progress, facial avatars reconstructed from commodity hardware are incomplete and miss parts of the side and back of the head, severely limiting the usability of the avatar. This limitation in prior work stems from their requirement of face tracking, which fails for profile and back views. To address this issue, we propose to learn person-specific animatable avatars from images without assuming access to precise facial expression tracking. At the core of our method, we leverage a 3D-aware generative model that is trained to reproduce the distribution of facial expressions from the training data. To train this appearance model, we only assume a collection of 2D images with the corresponding camera parameters. For controlling the model, we learn a mapping from 3DMM facial expression parameters to the latent space of the generative model. This mapping can be learned by sampling the latent space of the appearance model and reconstructing the facial parameters from a normalized frontal view, where facial expression estimation performs well. With this scheme, we decouple 3D appearance reconstruction from animation control to achieve high fidelity in image synthesis. In a series of experiments, we compare our proposed technique to state-of-the-art monocular methods and show superior quality while not requiring expression tracking of the training data.
Video Webpage Code arXiv BibTeX

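The control scheme in the GAN-Avatar abstract hinges on a learned mapping from 3DMM expression parameters to the generator's latent space. A minimal forward-pass sketch of such a mapping network follows; the dimensions, weights, and two-layer MLP form are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 50 expression coefficients -> 512-D latent code.
EXPR_DIM, HIDDEN, LATENT_DIM = 50, 256, 512

# A small two-layer MLP stands in for the learned mapping network;
# in practice its weights would be trained on (expression, latent) pairs
# obtained by sampling the generator and re-estimating expressions.
W1 = rng.standard_normal((EXPR_DIM, HIDDEN)) * 0.02
W2 = rng.standard_normal((HIDDEN, LATENT_DIM)) * 0.02

def expression_to_latent(expr_params):
    """Map 3DMM expression parameters to a generator latent code."""
    h = np.maximum(expr_params @ W1, 0.0)  # ReLU hidden layer
    return h @ W2

# A neutral expression (all-zero coefficients) maps to some latent code
# that the generator would decode into a neutral-face image.
z = expression_to_latent(np.zeros(EXPR_DIM))
```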
Rationality Enhancement Article Gamification of Behavior Change: Mathematical Principle and Proof-of-Concept Study Lieder, F., Chen, P., Prentice, M., Amo, V., Tošić, M. JMIR Serious Games, 12, JMIR Publications, March 2024 (Published)
Many people want to build good habits to become healthier, live longer, or become happier but struggle to change their behavior. Gamification can make behavior change easier by awarding points for the desired behavior and deducting points for its omission.
DOI URL BibTeX
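The point mechanism the abstract describes, awarding points for a desired behavior and deducting points for its omission, can be sketched in a few lines. The behavior names and point values below are illustrative assumptions, not figures from the study.

```python
def score_day(performed_behaviors, target_behaviors,
              reward=10, penalty=5):
    """Return the day's point total for a set of target behaviors.

    Points are awarded when a target behavior was performed and
    deducted when it was omitted (values here are illustrative).
    """
    total = 0
    for behavior in target_behaviors:
        if behavior in performed_behaviors:
            total += reward   # desired behavior performed
        else:
            total -= penalty  # desired behavior omitted
    return total

# Example: two of three target habits completed today.
points = score_day({"exercise", "meditate"},
                   ["exercise", "meditate", "read"])
# 10 + 10 - 5 = 15
```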

Social Foundations of Computation Article Integration of Generative AI in the Digital Markets Act: Contestability and Fairness from a Cross-Disciplinary Perspective Yasar, A. G., Chong, A., Dong, E., Gilbert, T., Hladikova, S., Mougan, C., Shen, X., Singh, S., Stoica, A., Thais, S. Workshop on Generative AI + Law (GenLaw), LSE Legal Studies Working Paper, The Fortieth International Conference on Machine Learning (ICML) 2023, March 2024 (Published)
The EU’s Digital Markets Act (DMA) aims to address the lack of contestability and unfair practices in digital markets. But the current framework of the DMA does not adequately cover the rapid advance of generative AI. As the EU adopts AI-specific rules and considers possible amendments to the DMA, this paper suggests that generative AI should be added to the DMA’s list of core platform services. This amendment is the first necessary step to address the emergence of entrenched and durable positions in the generative AI industry.
URL BibTeX

Empirical Inference Article Learning Graph Embeddings for Open World Compositional Zero-Shot Learning Mancini, M., Naeem, M. F., Xian, Y., Akata, Z. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3):1545-1560, IEEE, New York, NY, March 2024 (Published) DOI BibTeX

Perceiving Systems Conference Paper Physically plausible full-body hand-object interaction synthesis Braun, J., Christen, S., Kocabas, M., Aksan, E., Hilliges, O. In International Conference on 3D Vision (3DV 2024), 3DV, March 2024 (Published)
We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting. While recent advancements have addressed specific facets of human-object interactions, a comprehensive physics-based approach remains a challenge. Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may result in artifacts. In contrast, our proposed method embraces reinforcement learning (RL) and physics simulation to mitigate the limitations of data-driven approaches. Through a hierarchical framework, we first learn skill priors for both body and hand movements in a decoupled setting. The generic skill priors learn to decode a latent skill embedding into the motion of the underlying part. A high-level policy then controls hand-object interactions in these pretrained latent spaces, guided by task objectives of grasping and 3D target trajectory following. It is trained using a novel reward function that combines an adversarial style term with a task reward, encouraging natural motions while fulfilling the task incentives. Our method successfully accomplishes the complete interaction task, from approaching an object to grasping and subsequent manipulation. We compare our approach against kinematics-based baselines and show that it leads to more physically plausible motions.
arXiv Project Page Github YouTube BibTeX
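The reward design named in the abstract above, an adversarial style term combined with a task reward, can be sketched schematically. The weights, the Gaussian task kernel, and the function names below are illustrative assumptions; the paper's actual reward terms and coefficients may differ.

```python
import numpy as np

def combined_reward(style_score, task_error,
                    w_style=0.5, w_task=0.5, sigma=0.25):
    """Combine an adversarial style score with a task-tracking reward.

    style_score: discriminator output in [0, 1]; higher means the
                 motion looks more like the reference data.
    task_error:  distance between the hand/object and its 3D target.
    The weights and Gaussian kernel are illustrative choices: the task
    term is 1 at zero error and decays smoothly as error grows.
    """
    r_task = np.exp(-task_error**2 / (2 * sigma**2))
    return w_style * style_score + w_task * r_task

# Perfect tracking with a fully "natural" motion yields the maximum reward;
# large tracking error drives the task term toward zero.
r_best = combined_reward(style_score=1.0, task_error=0.0)
```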

Conference Paper SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras Velasquez, H. C., Hewitt, C., Aliakbarian, S., Baltrušaitis, T. In International Conference on 3D Vision (3DV 2024), 3DV, March 2024 (Accepted)
Our work addresses the problem of egocentric human pose estimation from downwards-facing cameras on head-mounted devices (HMD). This presents a challenging scenario, as parts of the body often fall outside of the image or are occluded. Previous solutions minimize this problem by using fish-eye camera lenses to capture a wider view, but these can present hardware design issues. They also predict 2D heat-maps per joint and lift them to 3D space to deal with self-occlusions, but this requires large network architectures which are impractical to deploy on resource-constrained HMDs. We predict pose from images captured with conventional rectilinear camera lenses. This resolves hardware design issues, but means body parts are often out of frame. As such, we directly regress probabilistic joint rotations represented as matrix Fisher distributions for a parameterized body model. This allows us to quantify pose uncertainties and explain out-of-frame or occluded joints. This also removes the need to compute 2D heat-maps and allows for simplified DNN architectures which require less compute. Given the lack of egocentric datasets using rectilinear camera lenses, we introduce the SynthEgo dataset, a synthetic dataset with 60K stereo images containing high diversity of pose, shape, clothing and skin tone. Our approach achieves state-of-the-art results for this challenging configuration, reducing mean per-joint position error by 23% overall and 58% for the lower body. Our architecture also has eight times fewer parameters and runs twice as fast as the current state-of-the-art. Experiments show that training on our synthetic dataset leads to good generalization to real world images without fine-tuning.
Home Dataset BibTeX
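The matrix Fisher distribution mentioned in the SimpleEgo abstract places a density over rotation matrices, p(R) ∝ exp(tr(FᵀR)), parameterized by an unconstrained 3×3 matrix F that a network head can regress directly; larger singular values of F correspond to higher concentration (lower uncertainty). A minimal numpy sketch of recovering the distribution's mode, assuming this standard parameterization (the concentration value in the example is illustrative):

```python
import numpy as np

def matrix_fisher_mode(F):
    """Mode of a matrix Fisher distribution over SO(3).

    The density is p(R) ∝ exp(tr(F^T R)); with the SVD F = U S V^T,
    the mode is U diag(1, 1, det(U) det(V)) V^T, i.e. the proper
    rotation nearest to F (the det factor rules out reflections).
    """
    U, _, Vt = np.linalg.svd(F)
    D = np.diag([1.0, 1.0, np.linalg.det(U) * np.linalg.det(Vt.T)])
    return U @ D @ Vt

# Illustrative, highly concentrated parameter matrix: the mode is
# the identity rotation, predicted with low uncertainty.
F = np.eye(3) * 5.0
R = matrix_fisher_mode(F)
```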