Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group




Empirical Inference Conference Paper Flow Matching for Scalable Simulation-Based Inference Wildberger*, J. B., Dax*, M., Buchholz*, S., Green, S. R., Macke, J. H., Schölkopf, B. ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, July 2023, *equal contribution (Published) URL BibTeX

Empirical Inference Optics and Sensing Laboratory Software Workshop Conference Paper Glare Removal for Astronomical Images with High Local Dynamic Range Bastelaer, M., Kremer, H., Volchkov, V., Passy, J., Schölkopf, B. IEEE International Conference on Computational Photography (ICCP), 1-11, IEEE, July 2023 (Published) DOI BibTeX

Empirical Inference Conference Paper Homomorphism AutoEncoder — Learning Group Structured Representations from Observed Transitions Keurti, H., Pan, H., Besserve, M., Grewe, B. F., Schölkopf, B. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:16190-16215, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), PMLR, July 2023 (Published) arXiv URL BibTeX

Haptic Intelligence Miscellaneous Improving Haptic Rendering Quality by Measuring and Compensating for Undesired Forces Fazlollahi, F., Taghizadeh, Z., Kuchenbecker, K. J. Work-in-progress paper (1 page) presented at the IEEE World Haptics Conference (WHC), Delft, the Netherlands, July 2023 (Published) BibTeX

Physics for Inference and Optimization Article Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data De Bacco, C., Contisciani, M., Cardoso-Silva, J., Safdari, H., Theuerkauf, D. B., Sweet, T., Young, J., Koster, J., Ross, C. T., McElreath, R., Redhead, D., Power, E. A. Journal of the Royal Statistical Society: Series A, 186(3):355-375, July 2023 (Published) Code Preprint DOI URL BibTeX

Perceiving Systems Ph.D. Thesis Learning Clothed 3D Human Models with Articulated Neural Implicit Representations Chen, X. July 2023 (Published)
3D digital humans are important for a range of applications including movie and game production, virtual and augmented reality, and human-computer interaction. However, existing industrial solutions for creating 3D digital humans rely on expensive scanning devices and intensive manual labor, preventing their broader application. To address these challenges, the research community has focused on learning 3D parametric human models from data, aiming to automatically generate realistic digital humans based on input parameters that specify pose and shape attributes. Although recent advancements have enabled the generation of faithful 3D human bodies, modeling realistic humans that include additional features such as clothing, hair, and accessories remains an open research challenge. The goal of this thesis is to develop 3D parametric human models that can generate realistic digital humans including not only human bodies but also additional features, in particular clothing. The central challenge lies in the fundamental problem of how to represent non-rigid, articulated, and topology-varying shapes. Explicit geometric representations like polygon meshes lack the flexibility needed to model varying topology between clothing and human bodies, and across different clothing styles. On the other hand, implicit representations, such as signed distance functions, are topologically flexible but still lack a robust articulation algorithm. To tackle this problem, we first introduce a principled algorithm that models articulation for implicit representations, in particular the recently emerging neural implicit representations, which have shown impressive modeling fidelity. Our algorithm, SNARF, generalizes linear blend skinning for polygon meshes to implicit representations and can faithfully articulate implicit shapes to any pose. SNARF is fully differentiable, which enables learning skinning weights and shapes jointly from posed observations.
By leveraging this algorithm, we can learn single-subject clothed human models with realistic shapes and natural deformations from 3D scans. We further improve SNARF’s efficiency with several implementation and algorithmic optimizations, including using a more compact representation of the skinning weights, factoring out redundant computations, and custom CUDA kernel implementations. Collectively, these adaptations result in a 150-fold speedup while preserving accuracy, thereby enabling the efficient learning of 3D animatable humans. Next, we go beyond single-subject modeling and tackle the more challenging task of generative modeling of clothed 3D humans. By integrating our articulation module with deep generative models, we have developed a generative model capable of creating novel 3D humans with various clothing styles and identities, as well as geometric details such as wrinkles. Lastly, to eliminate the reliance on expensive 3D scans and to facilitate texture learning, we introduce a system that integrates our differentiable articulation module with differentiable volume rendering in an end-to-end manner, enabling the reconstruction of animatable 3D humans directly from 2D monocular videos. The contributions of this thesis significantly advance the realistic generation and reconstruction of clothed 3D humans and provide new tools for modeling non-rigid, articulated, and topology-varying shapes. We hope that this work will contribute to the development of 3D human modeling and pave the way for new applications in the future.
download DOI BibTeX
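Linear blend skinning (LBS), which SNARF generalizes, deforms a canonical point x with per-bone rigid transforms B_i and skinning weights w_i(x): x' = Σ_i w_i(x) B_i x. Articulating an implicit shape needs the inverse direction: given a posed point x', find a canonical x with LBS(x) = x'. The sketch below is a toy illustration under stated assumptions, not the thesis's implementation: it uses two bones in 2D, a hypothetical hand-written sigmoid weight field (SNARF learns this field), and plain Newton iterations with a finite-difference Jacobian from a single initialization, whereas SNARF also has to handle multiple candidate correspondences.

```python
import numpy as np

def rot2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def make_transform(theta, t):
    # 3x3 homogeneous rigid transform in 2D (rotation theta, translation t)
    T = np.eye(3)
    T[:2, :2] = rot2d(theta)
    T[:2, 2] = t
    return T

def skinning_weights(x):
    # Hypothetical smooth weight field: bone 0 dominates for x[0] < 0, bone 1 for x[0] > 0
    w1 = 1.0 / (1.0 + np.exp(-4.0 * x[0]))
    return np.array([1.0 - w1, w1])

def lbs(x, transforms):
    # Forward LBS: x' = sum_i w_i(x) * (B_i @ [x, 1]); weights sum to 1, so the last coord stays 1
    xh = np.append(x, 1.0)
    w = skinning_weights(x)
    blended = sum(wi * (T @ xh) for wi, T in zip(w, transforms))
    return blended[:2]

def inverse_lbs(x_posed, transforms, x_init, iters=50, tol=1e-10, eps=1e-6):
    # Newton's method on f(x) = lbs(x) - x_posed with a finite-difference Jacobian
    x = np.array(x_init, dtype=float)
    for _ in range(iters):
        f = lbs(x, transforms) - x_posed
        if np.linalg.norm(f) < tol:
            break
        J = np.zeros((2, 2))
        for j in range(2):
            dx = np.zeros(2)
            dx[j] = eps
            J[:, j] = (lbs(x + dx, transforms) - lbs(x, transforms)) / eps
        x = x - np.linalg.solve(J, f)
    return x

bones = [make_transform(0.0, [0.0, 0.0]), make_transform(np.pi / 6, [0.1, 0.0])]
x_canonical = np.array([0.5, 0.2])
x_posed = lbs(x_canonical, bones)
x_recovered = inverse_lbs(x_posed, bones, x_init=x_posed)
print(np.allclose(x_recovered, x_canonical, atol=1e-6))
```

Because the weights depend on the canonical point, the inverse has no closed form, which is why an iterative root search is used; the same forward function is what makes the whole pipeline differentiable.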

Empirical Inference Ph.D. Thesis Learning and Testing Powerful Hypotheses Kübler, J. M. University of Tübingen, Germany, July 2023 (Published) BibTeX

Empirical Inference Conference Paper Membership Inference Attacks against Language Models via Neighbourhood Comparison Mattern, J., Mireshghallah, F., Jin, Z., Schölkopf, B., Sachan, M., Berg-Kirkpatrick, T. Findings of the Association for Computational Linguistics (ACL), 11330-11343, (Editors: Rogers, A. and Boyd-Graber, J. L. and Okazaki, N.), Association for Computational Linguistics, July 2023 (Published) DOI BibTeX

Haptic Intelligence Intelligent Control Systems Miscellaneous Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test: Code Khojasteh, B., Solowjow, F., Trimpe, S., Kuchenbecker, K. J. Code published as a companion to the journal article "Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test" in IEEE Transactions on Automation Science and Engineering, July 2023 (Published) DOI BibTeX

Haptic Intelligence Conference Paper Naturalistic Vibrotactile Feedback Could Facilitate Telerobotic Assembly on Construction Sites Gong, Y., Javot, B., Lauer, A. P. R., Sawodny, O., Kuchenbecker, K. J. In Proceedings of the IEEE World Haptics Conference (WHC), 169-175, Delft, the Netherlands, July 2023 (Published)
Telerobotics is regularly used on construction sites to build large structures efficiently. A human operator remotely controls the construction robot under direct visual feedback, but visibility is often poor. Future construction robots that move autonomously will also require operator monitoring. Thus, we designed a wireless haptic feedback system to provide the operator with task-relevant mechanical information from a construction robot in real time. Our AiroTouch system uses an accelerometer to measure the robot end-effector's vibrations and uses off-the-shelf audio equipment and a voice-coil actuator to display them to the user with high fidelity. A study was conducted to evaluate how this type of naturalistic vibration feedback affects the observer's understanding of telerobotic assembly on a real construction site. Seven adults without construction experience observed a mix of manual and autonomous assembly processes both with and without naturalistic vibrotactile feedback. Qualitative analysis of their survey responses and interviews indicated that all participants had positive responses to this technology and believed it would be beneficial for construction activities.
DOI BibTeX

Empirical Inference Conference Paper On Data Manifolds Entailed by Structural Causal Models Dominguez-Olmedo, R., Karimi, A., Arvanitidis, G., Schölkopf, B. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:8188-8201, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), PMLR, July 2023 (Published) URL BibTeX

Empirical Inference Conference Paper On the Identifiability and Estimation of Causal Location-Scale Noise Models Immer, A., Schultheiss, C., Vogt, J. E., Schölkopf, B., Bühlmann, P., Marx, A. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:14316-14332, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), PMLR, July 2023 (Published) URL BibTeX

Empirical Inference Conference Paper On the Relationship Between Explanation and Prediction: A Causal View Karimi, A., Muandet, K., Kornblith, S., Schölkopf, B., Kim, B. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:15861-15883, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), PMLR, July 2023 (Published) URL BibTeX

Empirical Inference Robust Machine Learning Conference Paper Provably Learning Object-Centric Representations Brady*, J., Zimmermann*, R. S., Sharma, Y., Schölkopf, B., von Kügelgen, J., Brendel, W. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:3038-3062, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), JMLR, Cambridge, MA, July 2023, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels Immer, A., van der Ouderaa, T. F. A., van der Wilk, M., Rätsch, G., Schölkopf, B. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:14333-14352, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), PMLR, July 2023 (Published) URL BibTeX

Haptic Intelligence Robotics Miscellaneous Strap Tightness and Tissue Composition Both Affect the Vibration Created by a Wearable Device Rokhmanova, N., Faulkner, R., Martus, J., Fiene, J., Kuchenbecker, K. J. Work-in-progress paper (1 page) presented at the IEEE World Haptics Conference (WHC), Delft, the Netherlands, July 2023 (Published)
Wearable haptic devices can provide salient real-time feedback (typically vibration) for rehabilitation, sports training, and skill acquisition. Although the body provides many sites for such cues, the influence of the mounting location on vibrotactile mechanics is commonly ignored. This study builds on previous research by quantifying how changes in strap tightness and local tissue composition affect the physical acceleration generated by a typical vibrotactile device.
BibTeX

Empirical Inference Conference Paper Temporal Label Smoothing for Early Event Prediction Yèche*, H., Pace*, A., Rätsch, G., Kuznetsova, R. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:39913-39938, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), PMLR, July 2023, *equal contribution (Published) arXiv URL BibTeX

Empirical Inference Conference Paper The Hessian perspective into the Nature of Convolutional Neural Networks Singh, S. P., Hofmann, T., Schölkopf, B. Proceedings of the 40th International Conference on Machine Learning (ICML), 202:31930-31968, Proceedings of Machine Learning Research, (Editors: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato and J. Scarlett), PMLR, July 2023 (Published) URL BibTeX

Haptic Intelligence Miscellaneous The Influence of Amplitude and Sharpness on the Perceived Intensity of Isoenergetic Ultrasonic Signals Gueorguiev, D., Rohou-Claquin, B., Kuchenbecker, K. J. Work-in-progress paper (1 page) presented at the IEEE World Haptics Conference (WHC), Delft, the Netherlands, July 2023 (Published) BibTeX

Haptic Intelligence Miscellaneous Toward a Device for Reliable Evaluation of Vibrotactile Perception Ballardini, G., Kuchenbecker, K. J. Work-in-progress paper (1 page) presented at the IEEE World Haptics Conference (WHC), Delft, the Netherlands, July 2023 (Published) BibTeX

Rationality Enhancement Conference Paper Toward a normative theory of (self-)management by goal-setting Singhi, N., Mohnert, F., Prystawski, B., Lieder, F. Proceedings of the Annual Meeting of the Cognitive Science Society, Annual Meeting of the Cognitive Science Society, July 2023 (Published) DOI URL BibTeX

Haptic Intelligence Miscellaneous Vibrotactile Playback for Teaching Manual Skills from Expert Recordings Gourishetti, R., Hughes, A. G., Javot, B., Kuchenbecker, K. J. Hands-on demonstration presented at the IEEE World Haptics Conference (WHC), Delft, the Netherlands, July 2023 (Published) BibTeX

Empirical Inference Conference Paper When Does Aggregating Multiple Skills with Multi-Task Learning Work? A Case Study in Financial NLP Ni, J., Jin, Z., Wang, Q., Sachan, M., Leippold, M. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers:7465-7488, (Editors: Rogers, A. and Boyd-Graber, J. L. and Okazaki, N.), Association for Computational Linguistics, July 2023 (Published) DOI BibTeX

Empirical Inference Conference Paper World Models for Math Story Problems Opedal, A., Stoehr, N., Saparov, A., Sachan, M. Findings of the Association for Computational Linguistics (ACL), 9088-9115, (Editors: Anna Rogers, Jordan L. Boyd-Graber and Naoaki Okazaki), Association for Computational Linguistics, July 2023 (Published) URL BibTeX

Empirical Inference Conference Paper ALERT: Adapt Language Models to Reasoning Tasks Yu, P., Wang, T., Golovneva, O., AlKhamissi, B., Verma, S., Jin, Z., Ghosh, G., Diab, M., Celikyilmaz, A. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 1:1055-1081, (Editors: Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki), Association for Computational Linguistics, July 2023 (Published) DOI URL BibTeX

Haptic Intelligence Miscellaneous CAPT Motor: A Strong Direct-Drive Haptic Interface Javot, B., Nguyen, V. H., Ballardini, G., Kuchenbecker, K. J. Hands-on demonstration presented at the IEEE World Haptics Conference (WHC), Delft, the Netherlands, July 2023 (Published) BibTeX

Robotic Materials Patent Capacitive Self-Sensing for Electrostatic Transducers with High Voltage Isolation Correll, N., Ly, K. D., Kellaris, N. A., Keplinger, C. M. (US Patent App. 17/928,453), June 2023
Transducer systems disclosed herein include self-sensing capabilities. In particular, electrostatic transducers include a low voltage electrode and a high voltage electrode. A low voltage sensing unit is coupled with the low voltage electrode of the electrostatic transducer. The low voltage sensing unit is configured to measure a capacitance of the electrostatic transducer, from which displacement of the electrostatic transducer may be calculated. High voltage drive signals received by the high voltage electrode during actuation may be isolated from the low voltage sensing unit. The isolation may be provided by dielectric material of the electrostatic transducer, a voltage suppression component, and/or a voltage suppression module comprising a low impedance ground path. In the event of an electrical failure of the transducer, the low voltage sensing unit may be isolated from high voltages.
URL BibTeX
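The abstract states that displacement may be calculated from the measured capacitance. A minimal sketch of that step, assuming an ideal parallel-plate model C = ε0·εr·A/d (no fringing fields, uniform dielectric), inverts the model for the electrode gap d. The function names, the 1 cm² area, the relative permittivity of 3.0 and the 100 µm rest gap are hypothetical values chosen for illustration, not parameters from the patent application.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m

def gap_from_capacitance(capacitance_f, area_m2, rel_permittivity):
    """Invert the parallel-plate model C = eps0 * eps_r * A / d for the gap d (metres)."""
    if capacitance_f <= 0:
        raise ValueError("capacitance must be positive")
    return EPSILON_0 * rel_permittivity * area_m2 / capacitance_f

def displacement_from_capacitance(c_rest, c_now, area_m2, rel_permittivity):
    """Change in electrode gap (metres) relative to rest; negative means the electrodes moved closer."""
    return (gap_from_capacitance(c_now, area_m2, rel_permittivity)
            - gap_from_capacitance(c_rest, area_m2, rel_permittivity))

# Hypothetical transducer: 1 cm^2 electrodes, eps_r = 3.0, 100 um rest gap
area, eps_r = 1e-4, 3.0
c_rest = EPSILON_0 * eps_r * area / 100e-6
print(displacement_from_capacitance(c_rest, 2 * c_rest, area, eps_r))  # about -5e-05 m
```

Since C is inversely proportional to d in this model, doubling the measured capacitance corresponds to halving the gap. A real transducer would likely need a calibrated capacitance-to-displacement map rather than this idealized formula.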

Neural Capture and Synthesis Conference Paper High-Res Facial Appearance Capture from Polarized Smartphone Images Azinovic, D., Maury, O., Hery, C., Nießner, M., Thies, J. In Proceedings 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16836-16846, Vancouver, CA, CVPR, June 2023 (Published) DOI URL BibTeX

Deep Models and Optimization Conference Paper Resurrecting Recurrent Neural Networks for Long Sequences Orvieto, A., Smith, S. L., Gu, A., Fernando, A., Gulcehre, C., Pascanu, R., De, S. In Proceedings of the Eleventh International Conference on Learning Representations, ICLR, June 2023 (Published) URL BibTeX

Rationality Enhancement Article A Computational Process-Tracing Method for Measuring People’s Planning Strategies and How They Change Over Time Jain, Y. R., Callaway, F., Griffiths, T. L., Dayan, P., He, R., Krueger, P. M., Lieder, F. Behavior Research Methods, 55:2037-2079, June 2023 (Published)
One of the most distinctive and impressive feats of the human mind is its ability to discover and continuously refine its own cognitive strategies. Elucidating the underlying learning and adaptation mechanisms is very difficult because changes in cognitive strategies are not directly observable. One important domain in which strategies and mechanisms are studied is planning. To enable researchers to uncover how people learn how to plan, we offer a tutorial introduction to a recently developed process-tracing paradigm along with a new computational method for inferring people’s planning strategies and their changes over time from the resulting process-tracing data. Our method allows researchers to reveal experience-driven changes in people’s choice of individual planning operations, planning strategies, strategy types, and the relative contributions of different decision systems. We validate our method on simulated and empirical data. On simulated data, its inferences about the strategies and the relative influence of different decision systems are accurate. When evaluated on human data generated using our process-tracing paradigm, our computational method correctly detects the plasticity-enhancing effect of feedback and the effect of the structure of the environment on people’s planning strategies. Together, these methods can be used to investigate the mechanisms of cognitive plasticity and to elucidate how people acquire complex cognitive skills such as planning and problem-solving. Importantly, our methods can also be used to measure individual differences in cognitive plasticity and to examine how different types of (pedagogical) interventions affect the acquisition of cognitive skills.
DOI URL BibTeX

Embodied Vision Learning and Dynamical Systems Empirical Inference Conference Paper Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts Achterhold, J., Tobuschat, P., Ma, H., Büchler, D., Muehlebach, M., Stueckler, J. In Conference on Learning for Dynamics and Control, 211:878-890, Proceedings of Machine Learning Research, (Editors: Nikolai Matni, Manfred Morari and George J. Pappas), PMLR, June 2023 (Published) preprint code URL BibTeX

Empirical Inference Conference Paper Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification Kim, Y., Kim, J. M., Jeong, J., Schmid, C., Akata, Z., Lee, J. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3408-3417, IEEE, June 2023 (Published) DOI URL BibTeX

Empirical Inference Article Classifying the unknown: Insect identification with deep hierarchical Bayesian learning Badirli, S., Picard, C. J., Mohler, G., Richert, F., Akata, Z., Dundar, M. Methods in Ecology and Evolution, 14(6):1515-1530, June 2023 (Published) DOI BibTeX

Perceiving Systems Conference Paper Detecting Human-Object Contact in Images Chen, Y., Kumar Dwivedi, S., Black, M. J., Tzionas, D. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 17100-17110, CVPR, June 2023 (Published)
Humans constantly contact objects to move and perform tasks. Thus, detecting human-object contact is important for building human-centered artificial intelligence. However, there exists no robust method to detect contact between the body and the scene from an image, and there exists no dataset to learn such a detector. We fill this gap with HOT ("Human-Object conTact"), a new dataset of human-object contacts for images. To build HOT, we use two data sources: (1) We use the PROX dataset of 3D human meshes moving in 3D scenes, and automatically annotate 2D image areas for contact via 3D mesh proximity and projection. (2) We use the V-COCO, HAKE and Watch-n-Patch datasets, and ask trained annotators to draw polygons for the 2D image areas where contact takes place. We also annotate the involved body part of the human body. We use our HOT dataset to train a new contact detector, which takes a single color image as input, and outputs 2D contact heatmaps as well as the body-part labels that are in contact. This is a new and challenging task that extends current foot-ground or hand-object contact detectors to the full generality of the whole body. The detector uses a part-attention branch to guide contact estimation through the context of the surrounding body parts and scene. We evaluate our detector extensively, and quantitative results show that our model outperforms baselines, and that all components contribute to better performance. Results on images from an online repository show reasonable detections and generalizability.
Project Page Paper Code DOI URL BibTeX
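The first HOT data source annotates contact automatically "via 3D mesh proximity and projection". A minimal sketch of that idea, under assumptions of our own: body vertices closer than a threshold (a hypothetical 2 cm) to any scene point count as in contact, and those vertices are projected with a pinhole camera (hypothetical intrinsics `K`, points already in the camera frame) and splatted into a binary image mask. The actual pipeline likely rasterizes mesh faces and handles occlusion, which this brute-force vertex version does not.

```python
import numpy as np

def contact_vertices(body_verts, scene_points, threshold):
    """Boolean mask of body vertices within `threshold` of any scene point (brute-force search)."""
    d = np.linalg.norm(body_verts[:, None, :] - scene_points[None, :, :], axis=-1)
    return d.min(axis=1) < threshold

def project(points, K):
    """Pinhole projection of camera-frame 3D points (N, 3) to pixel coordinates (N, 2)."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def contact_mask(body_verts, scene_points, K, image_hw, threshold=0.02):
    """Binary 2D contact map marking pixels hit by projected in-contact vertices."""
    h, w = image_hw
    mask = np.zeros((h, w), dtype=bool)
    in_contact = contact_vertices(body_verts, scene_points, threshold)
    uv = np.round(project(body_verts[in_contact], K)).astype(int)
    # Keep only projections that land inside the image
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    mask[uv[valid, 1], uv[valid, 0]] = True
    return mask

K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
body = np.array([[0.0, 0.0, 2.0], [0.5, 0.5, 2.0]])
scene = np.array([[0.0, 0.0, 2.01]])
print(np.argwhere(contact_mask(body, scene, K, (64, 64))))  # [[32 32]]
```

Per-pixel body-part labels could be obtained the same way by splatting each contact vertex's part index instead of `True`.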

Empirical Inference Conference Paper Editing a Woman’s Voice Costello, A., Fedorova, E., Jin, Z., Mihalcea, R. International Conference on the Science of Science and Innovation (ICSSI), June 2023 (Published) URL BibTeX

Haptic Intelligence Article Generating Clear Vibrotactile Cues with a Magnet Embedded in a Soft Finger Sheath Gertler, I., Serhat, G., Kuchenbecker, K. J. Soft Robotics, 10(3):624-635, June 2023 (Published)
Haptic displays act on the user's body to stimulate the sense of touch and enrich applications from gaming and computer-aided design to rehabilitation and remote surgery. However, when crafted from typical rigid robotic components, they tend to be heavy, bulky, and expensive, while sleeker designs often struggle to create clear haptic cues. This article introduces a lightweight wearable silicone finger sheath that can deliver salient and rich vibrotactile cues using electromagnetic actuation. We fabricate the sheath on a ferromagnetic mandrel with a process based on dip molding, a robust fabrication method that is rarely used in soft robotics but is suitable for commercial production. A miniature rare-earth magnet embedded within the silicone layers at the center of the finger pad is driven to vibrate by the application of alternating current to a nearby air-coil. Experiments are conducted to determine the amplitude of the magnetic force and the frequency response function for the displacement amplitude of the magnet perpendicular to the skin. In addition, high-fidelity finite element analyses of the finger wearing the device are performed to investigate the trends observed in the measurements. The experimental and simulated results show consistent dynamic behavior from 10 to 1000 Hz, with the displacement decreasing after about 300 Hz. These results match the detection threshold profile obtained in a psychophysical study performed by 17 users, where more current was needed only at the highest frequency. A cue identification experiment and a demonstration in virtual reality validate the feasibility of this approach to fingertip haptics.
DOI BibTeX

Perceiving Systems Conference Paper Generating Holistic 3D Human Motion from Speech Yi, H., Liang, H., Liu, Y., Cao, Q., Wen, Y., Bolkart, T., Tao, D., Black, M. J. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 469-480, CVPR, June 2023 (Published)
This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately. The separated modeling stems from the fact that face articulation strongly correlates with human speech, while body poses and hand gestures are less correlated. Specifically, we employ an autoencoder for face motions, and a compositional vector-quantized variational autoencoder (VQ-VAE) for the body and hand motions. The compositional VQ-VAE is key to generating diverse results. Additionally, we propose a cross-conditional autoregressive model that generates body poses and hand gestures, leading to coherent and realistic motions. Extensive experiments and user studies demonstrate that our proposed approach achieves state-of-the-art performance both qualitatively and quantitatively. Our novel dataset and code are released for research purposes at https://talkshow.is.tue.mpg.de.
project SHOW code TalkSHOW code arXiv paper BibTeX
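The core operation of any VQ-VAE, including the compositional one mentioned above, is replacing each encoder output with its nearest codebook vector. The sketch below shows that lookup with NumPy. Reading "compositional" as separate codebooks for body and hand latents is our assumption for illustration, and the helper `quantize_body_and_hands` is hypothetical; training details such as straight-through gradients and codebook updates are omitted.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each row of z (N, D) to its nearest entry of codebook (K, D) under squared L2 distance.

    Returns the code indices (N,) and the quantized vectors (N, D).
    """
    # ||z - e||^2 = ||z||^2 - 2 z.e + ||e||^2, evaluated for all (row, code) pairs at once
    d = (np.sum(z ** 2, axis=1, keepdims=True)
         - 2.0 * z @ codebook.T
         + np.sum(codebook ** 2, axis=1))
    idx = np.argmin(d, axis=1)
    return idx, codebook[idx]

def quantize_body_and_hands(z_body, z_hands, body_codebook, hand_codebook):
    # Assumption: body and hand latents use separate codebooks and are quantized independently
    return vector_quantize(z_body, body_codebook), vector_quantize(z_hands, hand_codebook)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
idx, zq = vector_quantize(np.array([[0.1, 0.2], [4.0, 4.5]]), codebook)
print(idx)  # [0 2]
```

The discrete indices are what an autoregressive model, such as the cross-conditional one described in the abstract, can then predict token by token.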

Perceiving Systems Conference Paper High-Fidelity Clothed Avatar Reconstruction from a Single Image Liao, T., Zhang, X., Xiu, Y., Yi, H., Liu, X., Qi, G., Zhang, Y., Wang, X., Zhu, X., Lei, Z. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 8662-8672, CVPR, June 2023 (Published)
This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the high accuracy of optimization-based methods with the efficiency of learning-based methods, we propose a coarse-to-fine approach to high-fidelity clothed avatar reconstruction (CAR) from a single image. In the first stage, we use an implicit model to learn the general shape of a person in canonical space in a learning-based way; in the second stage, we refine the surface detail by estimating the non-rigid deformation in posed space through optimization. A hyper-network is used to generate a good initialization, which greatly accelerates the convergence of the optimization. Extensive experiments on various datasets show that the proposed CAR successfully produces high-fidelity avatars for arbitrarily clothed humans in real scenes.
Code Paper Homepage Youtube URL BibTeX

Haptic Intelligence Article In the Arms of a Robot: Designing Autonomous Hugging Robots with Intra-Hug Gestures Block, A. E., Seifi, H., Hilliges, O., Gassert, R., Kuchenbecker, K. J. ACM Transactions on Human-Robot Interaction, 12(2):1-49, June 2023, Special Issue on Designing the Robot Body: Critical Perspectives on Affective Embodied Interaction (Published)
Hugs are complex affective interactions that often include gestures like squeezes. We present six new guidelines for designing interactive hugging robots, which we validate through two studies with our custom robot. To achieve autonomy, we investigated robot responses to four human intra-hug gestures: holding, rubbing, patting, and squeezing. Thirty-two users each exchanged and rated sixteen hugs with an experimenter-controlled HuggieBot 2.0. A microphone and a pressure sensor in the robot's inflated torso collected data from the subjects' demonstrations, which were used to develop a perceptual algorithm that classifies user actions with 88% accuracy. Users enjoyed robot squeezes regardless of the action they performed, valued variety in the robot response, and appreciated robot-initiated intra-hug gestures. From average user ratings, we created a probabilistic behavior algorithm that chooses robot responses in real time. We implemented improvements to the robot platform to create HuggieBot 3.0 and then validated its gesture perception system and behavior algorithm with sixteen users. The robot's responses and proactive gestures were greatly enjoyed. Users found the robot more natural, enjoyable, and intelligent in the last phase of the experiment than in the first. After the study, they felt more understood by the robot and thought robots were nicer to hug.
DOI BibTeX
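The abstract describes a probabilistic behavior algorithm built from average user ratings. One simple way such a rule could work, assumed here rather than taken from the article, is to sample the robot's response with probability proportional to its average rating for the detected user gesture. The ratings table below is an invented placeholder, not data from either study, and `choose_response` is a hypothetical name.

```python
import random

# Placeholder ratings invented for illustration; NOT values from the study.
# AVG_RATINGS[user_gesture][robot_response] = mean rating of that robot response
AVG_RATINGS = {
    "hold":    {"hold": 3.0, "rub": 2.0, "pat": 2.0, "squeeze": 3.0},
    "rub":     {"hold": 2.0, "rub": 3.0, "pat": 2.0, "squeeze": 3.0},
    "pat":     {"hold": 2.0, "rub": 1.0, "pat": 3.0, "squeeze": 4.0},
    "squeeze": {"hold": 2.0, "rub": 2.0, "pat": 2.0, "squeeze": 4.0},
}

def choose_response(user_gesture, ratings=AVG_RATINGS, rng=random):
    """Sample a robot response with probability proportional to its average rating."""
    options = ratings[user_gesture]
    responses = list(options)
    weights = [options[r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]

rng = random.Random(0)
print(choose_response("pat", rng=rng))
```

Sampling rather than always picking the top-rated response keeps some variety in the robot's behavior, which the abstract reports users valued; with the placeholder "pat" row above, a squeeze would be chosen about 40% of the time.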

Perceiving Systems Conference Paper Instant Multi-View Head Capture through Learnable Registration Bolkart, T., Li, T., Black, M. J. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 768-779, CVPR, June 2023 (Published)
Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow, and commonly address the problem in two separate steps: multi-view stereo (MVS) reconstruction followed by non-rigid registration. To simplify this process, we introduce TEMPEH (Towards Estimation of 3D Meshes from Performances of Expressive Heads) to directly infer 3D heads in dense correspondence from calibrated multi-view images. Registering datasets of 3D scans typically requires manual parameter tuning to find the right balance between accurately fitting the scans’ surfaces and being robust to scanning noise and outliers. Instead, we propose to jointly register a 3D head dataset while training TEMPEH. Specifically, during training we minimize a geometric loss commonly used for surface registration, effectively leveraging TEMPEH as a regularizer. Our multi-view head inference builds on a volumetric feature representation that samples and fuses features from each view using camera calibration information. To account for partial occlusions and a large capture volume that enables head movements, we use view- and surface-aware feature fusion, and a spatial transformer-based head localization module, respectively. We use raw MVS scans as supervision during training, but, once trained, TEMPEH directly predicts 3D heads in dense correspondence without requiring scans. Predicting one head takes about 0.3 seconds with a median reconstruction error of 0.26 mm, 64% lower than the current state-of-the-art. This enables the efficient capture of large datasets containing multiple people and diverse facial motions. Code, model, and data are publicly available at https://tempeh.is.tue.mpg.de.
project video paper sup. mat. poster BibTeX
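The TEMPEH abstract mentions minimizing "a geometric loss commonly used for surface registration" and reports a median reconstruction error. The abstract does not name the loss, so as an assumed stand-in the sketch below uses a symmetric chamfer distance between predicted mesh vertices and scan points, plus a median scan-to-vertex distance as a crude proxy for a point-to-surface error metric. Function names are hypothetical and the brute-force nearest-neighbour search is only suitable for small point sets.

```python
import numpy as np

def nearest_distances(a, b):
    """For each point in a (N, 3), the Euclidean distance to its nearest point in b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1)

def chamfer_loss(pred_verts, scan_points):
    """Symmetric mean nearest-neighbour distance between predicted vertices and scan points."""
    return (nearest_distances(pred_verts, scan_points).mean()
            + nearest_distances(scan_points, pred_verts).mean())

def median_scan_to_mesh_error(scan_points, pred_verts):
    """Median distance from scan points to the nearest predicted vertex (vertex-based proxy)."""
    return float(np.median(nearest_distances(scan_points, pred_verts)))

scan = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pred = scan + np.array([0.0, 0.0, 0.001])  # prediction offset by 1 mm (units in metres)
print(chamfer_loss(pred, scan), median_scan_to_mesh_error(scan, pred))
```

A median, unlike a mean, is insensitive to the few large errors typical of noisy or partial MVS scans, which is one reason it is a convenient summary statistic for this kind of evaluation.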

Neural Capture and Synthesis Perceiving Systems Conference Paper Instant Volumetric Head Avatars Zielonka, W., Bolkart, T., Thies, J. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 4574-4584, CVPR, June 2023 (Published)
We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. INSTA models a dynamic neural radiance field based on neural graphics primitives embedded around a parametric face model. Our pipeline is trained on a single monocular RGB portrait video that observes the subject under different expressions and views. While state-of-the-art methods take up to several days to train an avatar, our method can reconstruct a digital avatar in less than 10 minutes on modern GPU hardware, which is orders of magnitude faster than previous solutions. In addition, it allows for the interactive rendering of novel poses and expressions. By leveraging the geometry prior of the underlying parametric face model, we demonstrate that INSTA extrapolates to unseen poses. In quantitative and qualitative studies on various subjects, INSTA outperforms state-of-the-art methods in rendering quality and training time.
pdf project video code face tracker code dataset DOI URL BibTeX