Publications

Perceiving Systems Ph.D. Thesis Whole-Body Motion Capture and Beyond: From Model-Based Inference to Learning-Based Regression Huang, Y. University of Tübingen, December 2022 (Published)

Though effective and successful, traditional marker-less Motion Capture (MoCap) methods suffer from several limitations: 1) they presume a character-specific body model, so they do not permit a fully automatic pipeline or generalization over diverse body shapes; 2) they do not track the objects humans interact with, although human-object interaction is ubiquitous in reality; 3) they rely heavily on a sophisticated optimization process, which needs a good initialization and strong priors and can be slow. We address all of these issues in this thesis, as described below. First, we propose a fully automatic method to accurately reconstruct a 3D human body from multi-view RGB videos, the typical setup for MoCap systems. We pre-process all RGB videos to obtain 2D keypoints and silhouettes. Then we fit the SMPL body model to the 2D measurements in two successive stages. In the first stage, the shape and pose parameters of SMPL are estimated frame by frame, sequentially. In the second stage, a batch of frames is refined jointly with an extra DCT prior. Our method naturally handles different body shapes and challenging poses without human intervention. Next, we extend this system to support tracking of the rigid objects the subjects interact with. Our setup consists of 6 Azure Kinect cameras. First, we pre-process all the videos by segmenting humans and objects and detecting 2D body joints. Here we adopt the SMPL-X model to capture body and hand pose. The model is fitted to 2D keypoints and point clouds. Then the body poses and object poses are jointly updated with contact and interpenetration constraints. With this approach, we capture a novel human-object interaction dataset with natural RGB images and plausible body and object motion information. Lastly, we present the first practical and lightweight MoCap system that needs only 6 IMUs. Our approach is based on bi-directional RNNs.
The network can make use of temporal information by jointly reasoning about past and future IMU measurements. To handle the data scarcity issue, we create synthetic data from archival MoCap data. Overall, our system runs ten times faster than traditional optimization-based methods and is numerically more accurate. We also show that it is feasible to estimate which activity the subject is doing by observing only the IMU measurements from a smartwatch worn by the subject. This can be useful for a high-level semantic understanding of human behavior, but it also alerts the public to potential privacy concerns. In summary, we advance marker-less MoCap by contributing the first automatic yet accurate system, extending MoCap methods to support rigid object tracking, and proposing a practical and lightweight algorithm using 6 IMUs. We believe our work makes marker-less and IMU-based MoCap cheaper and more practical, and thus closer to end users for daily use.
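
The second fitting stage described above refines a batch of frames jointly under a DCT prior. The abstract does not give the prior's exact form, so the following is only a minimal sketch under an assumed, commonly used variant: each pose-parameter trajectory over the batch is transformed with an orthonormal DCT-II, and the energy in coefficients above a cutoff is penalized, so jittery trajectories cost more than smooth ones. The function names and the cutoff `keep` are hypothetical.

```python
import math
from typing import List


def dct_ii(signal: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1D signal (naive O(n^2) loop, fine for short batches)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def dct_smoothness_penalty(trajectories: List[List[float]], keep: int) -> float:
    """Sum of squared DCT coefficients with index >= keep, over all trajectories.

    Each trajectory is one (hypothetical) pose parameter sampled over the frames
    of a batch; the `keep` lowest-frequency coefficients are not penalized.
    """
    penalty = 0.0
    for traj in trajectories:
        coeffs = dct_ii(traj)
        penalty += sum(c * c for c in coeffs[keep:])
    return penalty


if __name__ == "__main__":
    frames = 16
    smooth = [math.sin(2 * math.pi * t / frames) for t in range(frames)]
    jittery = [s + (0.3 if t % 2 else -0.3) for t, s in enumerate(smooth)]
    print("smooth:", dct_smoothness_penalty([smooth], keep=4))
    print("jittery:", dct_smoothness_penalty([jittery], keep=4))
```

Because the orthonormal DCT preserves energy, this penalty is simply the part of a trajectory's energy that sits in high frequencies; in a full fitting pipeline such a term would be added, with some weight, to the 2D keypoint and silhouette data terms.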

Haptic Intelligence Autonomous Learning Empirical Inference Miscellaneous A Sequential Group VAE for Robot Learning of Haptic Representations Richardson, B. A., Kuchenbecker, K. J., Martius, G. 1-11, Workshop paper (8 pages) presented at the CoRL Workshop on Aligning Robot Representations with Humans, Auckland, New Zealand, December 2022 (Published)

Haptic representation learning is a difficult task in robotics because information can be gathered only by actively exploring the environment over time, and because different actions elicit different object properties. We propose a Sequential Group VAE that leverages object persistence to learn and update latent general representations of multimodal haptic data. As a robot performs sequences of exploratory procedures on an object, the model accumulates data and learns to distinguish between general object properties, such as size and mass, and trial-to-trial variations, such as initial object position. We demonstrate that after very few observations, the general latent representations are sufficiently refined to accurately encode many haptic object properties.
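
The model above accumulates data over a sequence of exploratory procedures on the same object and refines a general, object-level latent representation. The abstract does not state the aggregation rule; a common choice in group VAEs, assumed here rather than taken from the paper, is to combine the per-observation Gaussian posteriors over the shared latent by a precision-weighted product, so the object posterior tightens as observations arrive. The encoder outputs and the names `DiagGaussian` and `accumulate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGaussian:
    mean: List[float]
    var: List[float]  # per-dimension variances


def product_of_gaussians(posteriors: List[DiagGaussian]) -> DiagGaussian:
    """Precision-weighted product of diagonal Gaussians over a shared latent."""
    dim = len(posteriors[0].mean)
    precision = [0.0] * dim
    weighted = [0.0] * dim
    for p in posteriors:
        for d in range(dim):
            precision[d] += 1.0 / p.var[d]
            weighted[d] += p.mean[d] / p.var[d]
    var = [1.0 / prec for prec in precision]
    mean = [w * v for w, v in zip(weighted, var)]
    return DiagGaussian(mean, var)


def accumulate(per_step: List[DiagGaussian]) -> List[DiagGaussian]:
    """Object-level posterior after each exploratory procedure in the sequence."""
    return [product_of_gaussians(per_step[: t + 1]) for t in range(len(per_step))]


if __name__ == "__main__":
    # Hypothetical encoder outputs for one object explored three times (2D latent).
    steps = [
        DiagGaussian([0.9, -0.2], [0.5, 1.0]),
        DiagGaussian([1.1, 0.1], [0.5, 1.0]),
        DiagGaussian([1.0, 0.0], [0.5, 1.0]),
    ]
    for t, g in enumerate(accumulate(steps), start=1):
        print(t, [round(m, 3) for m in g.mean], [round(v, 3) for v in g.var])
```

The object-level variance shrinks with every procedure, which is one simple way to realize "refined after very few observations"; the trial-to-trial latents the paper also models (such as initial object position) are left out of this sketch.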

Empirical Inference Article A survey of algorithmic recourse: contrastive explanations and consequential recommendations Karimi, A., Barthe, G., Schölkopf, B., Valera, I. ACM Computing Surveys, 55(5), Association for Computing Machinery (ACM), December 2022 (Published)

Empirical Inference Conference Paper Active Bayesian Causal Inference Toth, C., Lorch, L., Knoll, C., Krause, A., Pernkopf, F., Peharz*, R., von Kügelgen*, J. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:16261-16275, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022, *shared last author (Published)

Empirical Inference Conference Paper Amortized Inference for Causal Structure Learning Lorch, L., Sussex, S., Rothfuss, J., Krause, A., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:13104-13118, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Physics for Inference and Optimization Article Anomaly detection and community detection in networks Safdari, H., De Bacco, C. Journal of Big Data, 9, 122, December 2022 (Published)

Empirical Inference Conference Paper AutoML Two-Sample Test Kübler, J. M., Stimper, V., Buchholz, S., Muandet, K., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:15929-15941, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Perceiving Systems Conference Paper Capturing and Animation of Body and Clothing from Monocular Video Feng, Y., Yang, J., Pollefeys, M., Black, M. J., Bolkart, T. In SIGGRAPH Asia 2022 Conference Papers (SA '22), Association for Computing Machinery, New York, NY, SIGGRAPH Asia 2022 (SA '22), December 2022 (Published)

We propose SCARF (Segmented Clothed Avatar Radiance Field), a hybrid model combining a mesh-based body with a neural radiance field. Integrating the mesh into the volumetric rendering in combination with a differentiable rasterizer enables us to optimize SCARF directly from monocular videos, without any 3D supervision. The hybrid modeling enables SCARF to (i) animate the clothed body avatar by changing body poses (including hand articulation and facial expressions), (ii) synthesize novel views of the avatar, and (iii) transfer clothing between avatars in virtual try-on applications. We demonstrate that SCARF reconstructs clothing with higher visual quality than existing methods, that the clothing deforms with changing body pose and body shape, and that clothing can be successfully transferred between avatars of different subjects.
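
SCARF integrates a mesh-based body into the volumetric rendering of a clothing radiance field. The abstract does not spell out the compositing rule, so the sketch below assumes standard NeRF-style alpha compositing along one camera ray, with the body mesh treated as an opaque surface: clothing samples in front of the mesh hit are composited front to back, samples behind it are discarded, and any remaining transmittance takes the mesh color. Sample densities, colors, and the `render_ray` signature are illustrative assumptions; the differentiable rasterizer that would supply the mesh hit depth and color is not modeled.

```python
import math
from typing import List, Optional, Tuple

Color = Tuple[float, float, float]


def render_ray(
    depths: List[float],
    densities: List[float],
    colors: List[Color],
    step: float,
    mesh_hit: Optional[float],
    mesh_color: Color,
    background: Color = (0.0, 0.0, 0.0),
) -> Color:
    """Composite clothing samples front to back, then an opaque body surface.

    `depths` are sorted sample positions along the ray; each sample is assumed to
    cover [t, t + step], truncated where the ray hits the body mesh (if it does).
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for t, sigma, c in zip(depths, densities, colors):
        if mesh_hit is not None and t >= mesh_hit:
            break  # everything behind the body surface is occluded
        seg_end = t + step if mesh_hit is None else min(t + step, mesh_hit)
        alpha = 1.0 - math.exp(-sigma * (seg_end - t))
        for k in range(3):
            out[k] += transmittance * alpha * c[k]
        transmittance *= 1.0 - alpha
    # Remaining light lands on the opaque mesh, or on the background if the ray misses it.
    final = mesh_color if mesh_hit is not None else background
    return (
        out[0] + transmittance * final[0],
        out[1] + transmittance * final[1],
        out[2] + transmittance * final[2],
    )


if __name__ == "__main__":
    body = (0.8, 0.6, 0.5)   # hypothetical color from the body mesh
    cloth = (0.1, 0.2, 0.7)  # hypothetical clothing radiance
    depths = [0.9, 1.0, 1.1, 1.2]
    for density in (0.0, 5.0, 50.0):
        print(density, render_ray(depths, [density] * 4, [cloth] * 4, 0.1, 1.15, body))
```

With zero clothing density the pixel shows the body mesh color; as density in front of the surface grows, the clothing color dominates, while samples behind the surface never contribute.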

Empirical Inference Conference Paper Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis Perry, R., von Kügelgen*, J., Schölkopf*, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 10904-10917, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022, *shared last author (Published)

Physical Intelligence Article Control of Two-Degree-of-Freedom Inertial Appendages of a Small-Scale Jumping Robot for Enhanced Terrestrial and Aerial Maneuverability Hong, C., Tang, D., Quan, Q., Cao, Z., Wang, C., Sitti, M., Deng, Z. IEEE/ASME Transactions on Mechatronics, 28(3):1754-1765, December 2022

Autonomous Learning Conference Paper Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation Sancaktar, C., Blaes, S., Martius, G. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 24170-24183, Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022 (Published)

Physical Intelligence Patent DRY ADHESIVES AND METHODS FOR MAKING DRY ADHESIVES Metin Sitti, M. M. B. A. December 2022, US Patent App. 17/895,334, 2022

Empirical Inference Conference Paper Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models Uppal, K., Kim, J., Singh, S. Proceedings of The 1st Gaze Meets ML workshop in conjunction with NeurIPS 2022, 210:219-240, Proceedings of Machine Learning Research, (Editors: Lourentzou, Ismini and Wu, Joy and Kashyap, Satyananda and Karargyris, Alexandros and Celi, Leo Anthony and Kawas, Ban and Talathi, Sachin), PMLR, December 2022 (Published)

Empirical Inference Conference Paper Differentially Private Language Models for Secure Data Sharing Mattern, J., Jin, Z., Weggenmann, B., Schölkopf, B., Sachan, M. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 4860-4873, (Editors: Yoav Goldberg and Zornitsa Kozareva and Yue Zhang), Association for Computational Linguistics, The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), December 2022 (Published)

Empirical Inference Conference Paper Direct Advantage Estimation Pan, H., Gürtler, N., Neitz, A., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:11869-11880, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022, *also at 15th European Workshop on Reinforcement Learning (EWRL 2022) (Published)

Empirical Inference Conference Paper Efficient identification of informative features in simulation-based inference Beck, J., Deistler, M., Bernaerts, Y., Macke, J. H., Berens, P. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:19260-19273, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Empirical Inference Autonomous Learning Robust Machine Learning Conference Paper Embrace the Gap: VAEs Perform Independent Mechanism Analysis Reizinger*, P., Gresele*, L., Brady*, J., von Kügelgen, J., Zietlow, D., Schölkopf, B., Martius, G., Brendel, W., Besserve, M. Advances in Neural Information Processing Systems (NeurIPS 2022), 35:12040-12057, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022, *equal first authorship (Published)

Empirical Inference Conference Paper Exploring the Latent Space of Autoencoders with Interventional Assays Leeb, F., Bauer, S., Besserve, M., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:21562-21574, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Empirical Inference Conference Paper Function Classes for Identifiable Nonlinear Independent Component Analysis Buchholz, S., Besserve, M., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:16946-16961, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Empirical Inference Article Generalized Few-Shot Video Classification With Video Retrieval and Feature Generation Xian, Y., Korbar, B., Douze, M., Torresani, L., Schiele, B., Akata, Z. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):8949-8961, December 2022 (Published)

Empirical Inference Conference Paper Interventions, Where and How? Experimental Design for Causal Models at Scale Tigas, P., Annadani, Y., Jesson, A., Schölkopf, B., Gal, Y., Bauer, S. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:24130-24143, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Empirical Inference Conference Paper Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations Immer, A., van der Ouderaa, T. F. A., Rätsch, G., Fortuin, V., van der Wilk, M. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:12449-12463, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Autonomous Learning Conference Paper Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations Li, C., Vlastelica, M., Blaes, S., Frey, J., Grimminger, F., Martius, G. Proceedings of the 6th Conference on Robot Learning (CoRL), Conference on Robot Learning (CoRL), December 2022 (Accepted)

Learning agile skills is one of the main challenges in robotics. Reinforcement learning approaches have achieved impressive results toward this goal. However, these methods require explicit task information in the form of a reward function or an expert that can be queried in simulation to provide a target control output, which limits their applicability. In this work, we propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations, enabling successful skill acquisition where reference or expert demonstrations are not easily accessible. Moreover, we show that by using a Wasserstein GAN formulation and transitions from demonstrations with rough and partial information as input, we are able to extract policies that are robust and capable of imitating demonstrated behaviors. Finally, the obtained skills, such as a backflip, are tested on an agile quadruped robot called Solo 8 and faithfully replicate hand-held human demonstrations.
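
The method above infers a reward from demonstration transitions with a Wasserstein GAN formulation. The following is a deliberately minimal, hypothetical illustration rather than the authors' architecture: a linear critic D(x) = w . x on transition features is trained by gradient ascent on the WGAN critic objective E_demo[D] - E_policy[D], with weight clipping as in the original WGAN, and the critic score is then used as a reward. Feature values, learning rate, clip bound, and all function names are assumptions.

```python
from typing import List, Sequence


def mean_vector(batch: Sequence[Sequence[float]]) -> List[float]:
    dim = len(batch[0])
    return [sum(x[d] for x in batch) / len(batch) for d in range(dim)]


def train_linear_critic(
    demo: Sequence[Sequence[float]],
    policy: Sequence[Sequence[float]],
    steps: int = 100,
    lr: float = 0.05,
    clip: float = 1.0,
) -> List[float]:
    """Gradient ascent on E_demo[D(x)] - E_policy[D(x)] with D(x) = w . x.

    Weight clipping keeps the critic Lipschitz, as in the original WGAN.
    For a linear critic the gradient of the objective w.r.t. w is constant.
    """
    grad = [d - p for d, p in zip(mean_vector(demo), mean_vector(policy))]
    w = [0.0] * len(grad)
    for _ in range(steps):
        w = [min(max(wi + lr * gi, -clip), clip) for wi, gi in zip(w, grad)]
    return w


def critic_reward(w: Sequence[float], transition: Sequence[float]) -> float:
    """Reward for a transition: the critic's score D(x)."""
    return sum(wi * xi for wi, xi in zip(w, transition))


if __name__ == "__main__":
    # Hypothetical 2D transition features, e.g. (trunk height change, pitch rate).
    demo = [[1.0, 0.2], [0.9, 0.1]]
    policy = [[0.1, 0.2], [0.0, 0.1]]
    w = train_linear_critic(demo, policy)
    print(w, critic_reward(w, demo[0]), critic_reward(w, policy[0]))
```

Transitions that resemble the demonstrations score higher. In adversarial imitation methods the critic is typically a neural network updated alongside the policy, and the policy is trained with reinforcement learning on the critic-derived reward.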

Empirical Inference Conference Paper Learning Random Feature Dynamics for Uncertainty Quantification Agudelo-España, D., Nemmour, Y., Schölkopf, B., Zhu, J. 2022 IEEE 61st Conference on Decision and Control (CDC), 4937-4944, IEEE, 61st IEEE Conference on Decision and Control (CDC 2022), December 2022 (Published)

Empirical Inference Autonomous Learning Conference Paper Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks Wochner, I., Schumacher, P., Martius, G., Büchler, D., Schmitt, S., Haeufle, D. Proceedings of the 6th Conference on Robot Learning (CoRL), 205:1178-1188, Proceedings of Machine Learning Research, (Editors: Liu, Karen and Kulic, Dana and Ichnowski, Jeff), PMLR, December 2022 (Published)

Empirical Inference Conference Paper Logical Fallacy Detection Jin, Z., Lalwani, A., Vaidhya, T., Shen, X., Ding, Y., Lyu, Z., Sachan, M., Mihalcea, R., Schölkopf, B. Findings of the Association for Computational Linguistics: EMNLP 2022, 7180-7198, (Editors: Goldberg, Yoav and Kozareva, Zornitsa and Zhang, Yue), Association for Computational Linguistics, The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), December 2022 (Published)

Empirical Inference Conference Paper Maximum Mean Discrepancy Distributionally Robust Nonlinear Chance-Constrained Optimization with Finite-Sample Guarantee Nemmour, Y., Kremer, H., Schölkopf, B., Zhu, J. IEEE 61st Conference on Decision and Control (CDC), 5660-5667, December 2022 (Published)

Haptic Intelligence Ph.D. Thesis Multi-Timescale Representation Learning of Human and Robot Haptic Interactions Richardson, B. University of Stuttgart, Stuttgart, Germany, December 2022, Faculty of Computer Science, Electrical Engineering and Information Technology (Published)

The sense of touch is one of the most crucial components of the human sensory system. It allows us to safely and intelligently interact with the physical objects and environment around us. By simply touching or dexterously manipulating an object, we can quickly infer a multitude of its properties. For more than fifty years, researchers have studied how humans physically explore and form perceptual representations of objects. Some of these works proposed the paradigm through which human haptic exploration is presently understood: humans use a particular set of exploratory procedures to elicit specific semantic attributes from objects. Others have sought to understand how physically measured object properties correspond to human perception of semantic attributes. Few, however, have investigated how specific explorations are perceived. As robots become increasingly advanced and more ubiquitous in daily life, they are beginning to be equipped with haptic sensing capabilities and algorithms for processing and structuring haptic information. Traditional haptics research has so far strongly influenced the introduction of haptic sensation and perception into robots but has not proven sufficient to give robots the necessary tools to become intelligent autonomous agents. The work presented in this thesis seeks to understand how single and sequential haptic interactions are perceived by both humans and robots. In our first study, we depart from the more traditional methods of studying human haptic perception and investigate how the physical sensations felt during single explorations are perceived by individual people. We treat interactions as probability distributions over a haptic feature space and train a model to predict how similarly a pair of surfaces is rated, predicting perceived similarity with a reasonable degree of accuracy. Our novel method also allows us to evaluate how individual people weigh different surface properties when they make perceptual judgments. 
The method is highly versatile and presents many opportunities for further studies into how humans form perceptual representations of specific explorations. Our next body of work explores how to improve robotic haptic perception of single interactions. We use unsupervised feature-learning methods to derive powerful features from raw robot sensor data and classify robot explorations into numerous haptic semantic property labels that were assigned from human ratings. Additionally, we provide robots with more nuanced perception by learning to predict graded ratings of a subset of properties. Our methods outperform previous attempts that all used hand-crafted features, demonstrating the limitations of such traditional approaches. To push robot haptic perception beyond evaluation of single explorations, our final work introduces and evaluates a method to give robots the ability to accumulate information over many sequential actions; our approach essentially takes advantage of object permanence by conditionally and recursively updating the representation of an object as it is sequentially explored. We implement our method on a robotic gripper platform that performs multiple exploratory procedures on each of many objects. As the robot explores objects with new procedures, it gains confidence in its internal representations and classification of object properties, thus moving closer to the marvelous haptic capabilities of humans and providing a solid foundation for future research in this domain.
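
The first study above treats each interaction as a probability distribution over a haptic feature space and predicts how similarly a pair of surfaces is rated, with individual people weighting surface properties differently. The abstract does not name the distance or model, so this minimal sketch assumes per-feature 1D Gaussians, a symmetric KL divergence per feature, person-specific feature weights, and an exponential mapping from weighted distance to a similarity in (0, 1]. Feature names, weights, and values are hypothetical.

```python
import math
from typing import Dict, Tuple

# Per-feature (mean, variance) of one exploration's feature distribution.
Exploration = Dict[str, Tuple[float, float]]


def symmetric_kl_1d(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """KL(p||q) + KL(q||p) for two 1D Gaussians; the log-variance terms cancel."""
    (mp, vp), (mq, vq) = p, q
    d2 = (mp - mq) ** 2
    return 0.5 * ((vp + d2) / vq + (vq + d2) / vp - 2.0)


def predicted_similarity(a: Exploration, b: Exploration, weights: Dict[str, float]) -> float:
    """Map a person-specific weighted distance to a similarity in (0, 1]."""
    distance = sum(w * symmetric_kl_1d(a[f], b[f]) for f, w in weights.items())
    return math.exp(-distance)


if __name__ == "__main__":
    surface_a = {"roughness": (0.5, 0.05), "friction": (0.3, 0.05)}
    surface_b = {"roughness": (0.5, 0.05), "friction": (0.6, 0.05)}
    friction_focused = {"roughness": 0.1, "friction": 1.0}
    roughness_focused = {"roughness": 1.0, "friction": 0.1}
    print(predicted_similarity(surface_a, surface_b, friction_focused))
    print(predicted_similarity(surface_a, surface_b, roughness_focused))
```

Fitting the weights for one participant, for example by regressing predicted against rated similarity, is what would let such a model expose how that person weighs each property.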

Physical Intelligence Medical Systems Article Multifunctional 3D-Printed Pollen Grain-Inspired Hydrogel Microrobots for On-Demand Anchoring and Cargo Delivery Lee, Y., Kim, J., Bozuyuk, U., Dogan, N. O., Khan, M. T. A., Shiva, A., Wild, A., Sitti, M. Advanced Materials, 35(10):2209812, December 2022 (Published)

Empirical Inference Conference Paper Neural Attentive Circuits Weiss*, M., Rahaman*, N., Locatello, F., Pal, C., Bengio, Y., Schölkopf, B., Li, E. L., Ballas, N. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:7741-7754, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022, *equal contribution (Published)

Empirical Inference Conference Paper Optimal Binary Classification Beyond Accuracy Singh, S., Khim, J. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 18226-18240, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022 (Published)

Dynamic Locomotion Article Physically Modelling Fluid- and Soft-tissue Mechanics of Lumbosacral Intraspinal Mechanosensing in Avians Mo, A., Kamska, V., Bribiesca-Contreras, F., Hauptmann, J., Daley, M., Badri-Spröwitz, A. arXiv, December 2022 (Submitted)

The lumbosacral organ (LSO) is a lumbosacral spinal canal morphology that is universally and uniquely found in birds. Recent studies suggested an intraspinal mechanosensor function that relies on the compliant motion of soft tissue in the spinal cord fluid. It has not yet been possible to observe LSO soft tissue motion in vivo due to the limitations of imaging technologies. As an alternative approach, we developed an artificial biophysical model of the LSO and characterized the dynamic responses of this model when entrained by external motion. The parametric model incorporates morphological and material properties of the LSO. We varied the model's parameters to study the influence of individual features on the system response. We characterized the system in a locomotion simulator, producing vertical oscillations similar to avian trunk motions. We show how morphological and material properties effectively shape the system's oscillation characteristics. We conclude that external oscillations could entrain the soft tissue of the intraspinal lumbosacral organ during locomotion, consistent with recently proposed sensing mechanisms.
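
The LSO model above is entrained by external vertical oscillations, and its response is shaped by morphological and material parameters. As a deliberately simplified stand-in (an assumption, not the authors' parametric model), the sketch treats the soft tissue as a single lumped mass on a spring and damper inside a canal that is shaken vertically, integrates the tissue's motion relative to the canal, and checks the steady-state amplitude against the textbook base-excitation formula. The mass `m`, stiffness `k`, damping `c`, and forcing values are illustrative.

```python
import math


def simulate_relative_amplitude(m, k, c, base_amp, omega, dt=1e-4, t_end=25.0, t_settle=20.0):
    """Steady-state amplitude of tissue motion relative to a vertically shaken canal.

    Integrates m*z'' + c*z' + k*z = m*base_amp*omega**2*sin(omega*t), which is the
    base excitation y(t) = base_amp*sin(omega*t) written in relative coordinates z = x - y.
    """
    z, v, t = 0.0, 0.0, 0.0
    peak = 0.0
    for _ in range(int(round(t_end / dt))):
        forcing = m * base_amp * omega ** 2 * math.sin(omega * t)
        a = (forcing - c * v - k * z) / m
        v += a * dt  # semi-implicit Euler: velocity first, then position
        z += v * dt
        t += dt
        if t >= t_settle:  # ignore the start-up transient
            peak = max(peak, abs(z))
    return peak


def analytic_relative_amplitude(m, k, c, base_amp, omega):
    """Closed-form steady-state relative amplitude for harmonic base excitation."""
    return m * base_amp * omega ** 2 / math.hypot(k - m * omega ** 2, c * omega)


if __name__ == "__main__":
    # Illustrative lumped parameters: natural frequency 10 rad/s, damping ratio 0.1.
    m, k, c, base_amp = 1.0, 100.0, 2.0, 0.01
    for omega in (5.0, 10.0, 15.0):
        sim = simulate_relative_amplitude(m, k, c, base_amp, omega)
        ref = analytic_relative_amplitude(m, k, c, base_amp, omega)
        print(f"omega={omega:5.1f}  simulated={sim:.5f}  analytic={ref:.5f}")
```

Changing `k` or `c` in this toy model shifts and flattens the resonance peak, which is the kind of dependence on material properties that the study characterizes with its physical model.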

Empirical Inference Conference Paper Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks Kristiadi, A., Eschenhagen, R., Hennig, P. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 30333-30346, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Empirical Inference Conference Paper Posterior and Computational Uncertainty in Gaussian Processes Wenger, J., Pleiss, G., Pförtner, M., Hennig, P., Cunningham, J. P. Advances in Neural Information Processing Systems 35, 10876-10890, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022 (Published)

Empirical Inference Conference Paper Probable Domain Generalization via Quantile Risk Minimization Eastwood, C., Robey, A., Singh, S., von Kügelgen, J., Hassani, H., Pappas, G. J., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:17340-17358, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published)

Empirical Inference Proceedings Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI) Biester, L., Demszky, D., Jin, Z., Sachan, M., Tetreault, J., Wilson, S., Xiao, L., Zhao, J. Association for Computational Linguistics, December 2022 (Published)

Physical Intelligence Article Programmable aniso-electrodeposited modular hydrogel microrobots Zheng, Z., Wang, H., Demir, S. O., Huang, Q., Fukuda, T., Sitti, M. Science Advances, 8(50):eade6135, December 2022 (Published)

Empirical Inference Autonomous Learning Conference Paper Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World Gürtler, N., Widmaier, F., Sancaktar, C., Blaes, S., Kolev, P., Bauer, S., Wüthrich, M., Wulfmeier, M., Riedmiller, M., Allshire, A., Wang, Q., McCarthy, R., Kim, H., Baek, J., Kwon, W., Qian, S., Toshimitsu, Y., Michelis, M. Y., Kazemipour, A., Raayatsanati, A., et al. Proceedings of the NeurIPS 2022 Competitions Track, 220:133-150, Proceedings of Machine Learning Research, (Editors: Ciccone, Marco and Stolovitzky, Gustavo and Albrecht, Jacob), PMLR, December 2022 (Published)

Perceiving Systems Ph.D. Thesis Reconstructing Expressive 3D Humans from RGB Images Choutas, V. ETH Zurich, Max Planck Institute for Intelligent Systems and ETH Zurich, December 2022 (Published)

To interact with our environment, we need to adapt our body posture and grasp objects with our hands. During a conversation, our facial expressions and hand gestures convey important non-verbal cues about our emotional state and intentions towards our fellow speakers. Thus, modeling and capturing 3D full-body shape and pose, hand articulation and facial expressions are necessary to create realistic human avatars for augmented and virtual reality. This is a complex task due to the large number of degrees of freedom for articulation, body shape variance, occlusions from objects and self-occlusions from body parts (e.g. crossing our hands), and variation in subject appearance. The community has thus far relied on expensive and cumbersome equipment, such as multi-view cameras or motion capture markers, to capture the 3D human body. While this approach is effective, it is limited to a small number of subjects and indoor scenarios. Using monocular RGB cameras would greatly simplify the avatar creation process, thanks to their lower cost and ease of use. These advantages come at a price, though, since RGB capture methods need to deal with occlusions, perspective ambiguity and large variations in subject appearance, in addition to all the challenges posed by full-body capture. In an attempt to simplify the problem, researchers generally adopt a divide-and-conquer strategy, estimating the body, face and hands with distinct methods using part-specific datasets and benchmarks. However, the hands and face constrain the body and vice versa, e.g. the position of the wrist depends on the elbow, shoulder, etc.; the divide-and-conquer approach cannot exploit this constraint. In this thesis, we aim to reconstruct the full 3D human body using only readily accessible monocular RGB images. As a first step, we introduce a parametric 3D body model, called SMPL-X, that can represent full-body shape and pose, hand articulation and facial expression.
Next, we present an iterative optimization method, named SMPLify-X, that fits SMPL-X to 2D image keypoints. While SMPLify-X can produce plausible results if the 2D observations are sufficiently reliable, it is slow and sensitive to initialization. To overcome these limitations, we introduce ExPose, a neural network regressor that predicts SMPL-X parameters from an image using body-driven attention, i.e. by zooming in on the hands and face after predicting the body. From the zoomed-in part images, dedicated part networks predict the hand and face parameters. ExPose combines the independent body, hand, and face estimates by trusting them equally. This approach, though, does not fully exploit the correlation between parts and fails in the presence of challenges such as occlusion or motion blur. Thus, we need a better mechanism to aggregate information from the full-body and part images. PIXIE uses neural networks called moderators that learn to fuse information from these two image sets before predicting the final part parameters. Overall, the addition of the hands and face leads to noticeably more natural and expressive reconstructions. Creating high-fidelity avatars from RGB images requires accurate estimation of 3D body shape. Although existing methods are effective at predicting body pose, they struggle with body shape. We identify the lack of proper training data as the cause. To overcome this obstacle, we propose to collect internet images from fashion-model websites, together with anthropometric measurements. At the same time, we ask human annotators to rate images and meshes according to a pre-defined set of linguistic attributes. We then define mappings between measurements, linguistic shape attributes and 3D body shape. Equipped with these mappings, we train a neural network regressor, SHAPY, that predicts accurate 3D body shapes from a single RGB image. We observe that existing 3D shape benchmarks lack subject variety and/or ground-truth shape.
Thus, we introduce a new benchmark, Human Bodies in the Wild (HBW), which contains images of humans and their corresponding 3D ground-truth body shape. SHAPY shows how we can overcome the lack of in-the-wild images with 3D shape annotations through easy-to-obtain anthropometric measurements and linguistic shape attributes. Regressors that estimate 3D model parameters are robust and accurate, but often fail to tightly fit the observations. Optimization-based approaches tightly fit the data by minimizing an energy function composed of a data term that penalizes deviations from the observations and priors that encode our knowledge of the problem. Finding the balance between these terms and implementing a performant version of the solver is a time-consuming and non-trivial task. Machine-learned continuous optimizers combine the benefits of both regression and optimization approaches. They learn the priors directly from data, avoiding the need for hand-crafted heuristics and loss term balancing, and benefit from optimized neural network frameworks for fast inference. Inspired by the classic Levenberg-Marquardt algorithm, we propose a neural optimizer that outperforms classic optimization, regression and hybrid optimization-regression approaches. Our proposed update rule uses a weighted combination of gradient descent and a network-predicted update. To show the versatility of the proposed method, we apply it to three other problems, namely (i) full-body estimation from 2D keypoints, (ii) full-body estimation from head and hand locations provided by a head-mounted device, and (iii) face tracking from dense 2D landmarks. Our method can easily be applied to new model fitting problems and offers a competitive alternative to well-tuned traditional model fitting pipelines, both in terms of accuracy and speed. To summarize, we propose a new and richer representation of the human body, SMPL-X, that is able to jointly model the 3D human body pose and shape, facial expressions and hand articulation.
We propose methods (SMPLify-X, ExPose and PIXIE) that estimate SMPL-X parameters from monocular RGB images, progressively improving the accuracy and realism of the predictions. To further improve reconstruction fidelity, we demonstrate how we can use easy-to-collect internet data and human annotations to overcome the lack of 3D shape data and train a model, SHAPY, that predicts accurate 3D body shape from a single RGB image. Finally, we propose a flexible learnable update rule for parametric human model fitting that outperforms both classic optimization and neural network approaches. This approach is easily applicable to a variety of problems, unlocking new applications in AR/VR scenarios.
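
The neural optimizer above is described as inspired by Levenberg-Marquardt, with an update rule that combines a gradient-descent step and a network-predicted update. The sketch below is not the thesis implementation; under stated assumptions it shows (a) a classic Levenberg-Marquardt loop for a toy two-parameter curve fit, y = a * exp(b * x), with the usual hand-tuned damping schedule that a learned optimizer aims to replace, and (b) a hypothetical `blended_step` in which the predicted update and the per-parameter weights are plain inputs rather than network outputs.

```python
import math


def residuals_and_jacobian(params, xs, ys):
    """Residuals r_i = a*exp(b*x_i) - y_i and Jacobian rows (dr/da, dr/db)."""
    a, b = params
    r, jac = [], []
    for x, y in zip(xs, ys):
        e = math.exp(b * x)
        r.append(a * e - y)
        jac.append((e, a * x * e))
    return r, jac


def cost(params, xs, ys):
    r, _ = residuals_and_jacobian(params, xs, ys)
    return 0.5 * sum(ri * ri for ri in r)


def lm_step(params, xs, ys, lam):
    """Solve (J^T J + lam * diag(J^T J)) delta = -J^T r for the two parameters."""
    r, jac = residuals_and_jacobian(params, xs, ys)
    h00 = sum(j0 * j0 for j0, _ in jac)
    h01 = sum(j0 * j1 for j0, j1 in jac)
    h11 = sum(j1 * j1 for _, j1 in jac)
    g0 = sum(j0 * ri for (j0, _), ri in zip(jac, r))
    g1 = sum(j1 * ri for (_, j1), ri in zip(jac, r))
    a00, a11 = h00 * (1.0 + lam), h11 * (1.0 + lam)
    det = a00 * a11 - h01 * h01
    return (-g0 * a11 + g1 * h01) / det, (-g1 * a00 + g0 * h01) / det


def levenberg_marquardt(params, xs, ys, lam=1e-2, iters=100):
    """Classic LM with the usual hand-tuned damping schedule."""
    params = tuple(params)
    current = cost(params, xs, ys)
    for _ in range(iters):
        d0, d1 = lm_step(params, xs, ys, lam)
        candidate = (params[0] + d0, params[1] + d1)
        new = cost(candidate, xs, ys)
        if new < current:
            params, current, lam = candidate, new, lam / 10.0  # accept: closer to Gauss-Newton
        else:
            lam *= 10.0  # reject: shorter step, closer to scaled gradient descent
    return params


def blended_step(params, grad, predicted_delta, weight, lr):
    """Hypothetical learned-optimizer update: per-parameter mix of a gradient
    step and a predicted update (weight and predicted_delta are plain inputs here)."""
    return tuple(
        p + w * (-lr * g) + (1.0 - w) * d
        for p, g, d, w in zip(params, grad, predicted_delta, weight)
    )


if __name__ == "__main__":
    xs = [0.25 * i for i in range(9)]
    ys = [2.0 * math.exp(-1.3 * x) for x in xs]
    print(levenberg_marquardt((1.0, 0.0), xs, ys))
```

The accept/reject damping schedule and the fixed blending weight are exactly the kind of hand-crafted choices that, according to the abstract, the learned optimizer replaces with data-driven predictions.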


Empirical Inference Conference Paper Relational Proxies: Emergent Relationships as Fine-Grained Discriminators Chaudhuri, A., Mancini, M., Akata, Z., Dutta, A. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 31145-31157, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022 (Published)

Empirical Inference Learning and Dynamical Systems Conference Paper Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization Das, A., Schölkopf, B., Muehlebach, M. Advances in Neural Information Processing Systems 35, 6749-6762, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), 36th Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022 (Published)

Empirical Inference Conference Paper Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse Noci*, L., Anagnostidis*, S., Biggio*, L., Orvieto*, A., Singh*, S. P., Lucchi, A. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:27198-27211, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022, *equal contribution (Published)

Empirical Inference Conference Paper Truncated Proposals for scalable and hassle-free simulation-based inference Deistler, M., Gonçalves*, P. J., Macke*, J. H. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:23135-23149, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022, *equal contribution (Published)

Empirical Inference Conference Paper When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment Jin*, Z., Levine*, S., Gonzalez*, F., Kamal, O., Sap, M., Sachan, M., Mihalcea, R., Tenenbaum, J., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 28458-28473, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022, *equal contribution (Published)

Dynamic Locomotion Ph.D. Thesis Mechanical Design, Development and Testing of Bioinspired Legged Robots for Dynamic Locomotion Sarvestani, L. A. Eberhard Karls Universität Tübingen, Tübingen, November 2022

Robotic Materials Patent Hydraulically Amplified Self-healing Electrostatic Transducers Harnessing Zipping Mechanism Keplinger, C. M., Acome, E. L., Kellaris, N. A., Mitchell, S. K., Morrissey, T. G. (US Patent 11486421B2), November 2022

Abstract

Hydraulically-amplified, self-healing, electrostatic transducers that harness electrostatic and hydraulic forces to achieve various actuation modes. Electrostatic forces between electrode pairs of the transducers, generated upon application of a voltage to the electrode pairs, draw the electrodes in each pair towards each other to displace a liquid dielectric contained within an enclosed internal cavity of the transducers, driving actuation in various manners. The electrodes and the liquid dielectric form a self-healing capacitor, whereby the liquid dielectric automatically fills breaches in the liquid dielectric resulting from dielectric breakdown. Due to the resting shape of the cavity, a zipping mechanism allows the electrodes to be selectively actuated to a desired extent by controlling the supplied voltage.
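For intuition about why controlling the voltage sets the extent of zipping, one can use the idealized parallel-plate approximation. This is an assumption for illustration and not a force model stated in the patent: electrodes of overlapping area $A$, separated by a gap of thickness $d$ filled with a dielectric of permittivity $\varepsilon$, held at voltage $V$.

```latex
C(d) = \frac{\varepsilon A}{d}, \qquad
U = \tfrac{1}{2} C(d)\, V^{2}, \qquad
F = \tfrac{1}{2} V^{2} \left| \frac{\mathrm{d}C}{\mathrm{d}d} \right|
  = \frac{\varepsilon A V^{2}}{2 d^{2}}
```

Because the attractive force scales with $1/d^{2}$, the electrodes close first where the cavity geometry brings them nearest, and the closed region can advance as $V$ is raised; in this simplified picture, the supplied voltage therefore determines how far the electrodes zip.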


Empirical Inference Conference Paper Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval Chaudhuri, A., Mancini, M., Chen, Y., Akata, Z., Dutta, A. 33rd British Machine Vision Conference (BMVC), BMVA Press, November 2022 (Published)

Social Foundations of Computation Algorithms and Society Conference Paper Anticipating Performativity by Predicting from Predictions Mendler-Dünner, C., Ding, F., Wang, Y. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), Curran Associates, Inc., The Thirty-Sixth Annual Conference on Neural Information Processing Systems (NeurIPS), November 2022 (Published)

Abstract

Predictions about people, such as their expected educational achievement or their credit risk, can be performative and shape the outcome that they are designed to predict. Understanding the causal effect of predictions on the eventual outcomes is crucial for foreseeing the implications of future predictive models and selecting which models to deploy. However, this causal estimation task poses unique challenges: model predictions are usually deterministic functions of input features and highly correlated with outcomes, which can make the causal effects of predictions on outcomes impossible to disentangle from the direct effect of the covariates. We study this problem through the lens of causal identifiability. Despite the hardness of this problem in full generality, we highlight three natural scenarios where the causal effect of predictions can be identified from observational data: randomization in predictions, overparameterization of the predictive model deployed during data collection, and discrete prediction outputs. Empirically, we show that, when our identifiability conditions hold, standard variants of supervised learning that predict from predictions by treating the prediction as an input feature can find transferable functional relationships that allow for conclusions about newly deployed predictive models. These positive results fundamentally rely on model predictions being recorded during data collection, underscoring the importance of rethinking standard data collection practices to enable progress towards a better understanding of social outcomes and performative feedback loops.
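The first identifiability scenario, randomization in predictions, can be illustrated with a small simulation in which the prediction is simply treated as an extra input feature. This is a sketch under assumptions chosen for illustration, not the paper's estimator: a hypothetical deployed score `0.8 * x`, a hypothetical linear outcome `1.5 * x + 2.0 * pred` with no outcome noise, and ordinary least squares as the learner that "predicts from predictions".

```python
import random
from typing import List, Tuple


def deploy_predictions(xs: List[float], noise_scale: float, rng: random.Random) -> List[float]:
    """Hypothetical deployed model: a fixed linear score, optionally randomized."""
    return [0.8 * x + noise_scale * rng.uniform(-1.0, 1.0) for x in xs]


def simulate_outcomes(xs: List[float], preds: List[float]) -> List[float]:
    """Hypothetical performative outcome: depends on the covariate and the prediction."""
    return [1.5 * x + 2.0 * p for x, p in zip(xs, preds)]


def fit_two_feature_ols(xs: List[float], preds: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least squares for y ~ a * x + b * pred (no intercept) via the 2x2 normal equations."""
    sxx = sum(x * x for x in xs)
    spp = sum(p * p for p in preds)
    sxp = sum(x * p for x, p in zip(xs, preds))
    sxy = sum(x * y for x, y in zip(xs, ys))
    spy = sum(p * y for p, y in zip(preds, ys))
    det = sxx * spp - sxp * sxp
    if abs(det) <= 1e-9 * sxx * spp:
        # Predictions are a deterministic function of x: the two effects cannot be separated.
        raise ValueError("covariate and prediction are collinear; effect not identifiable")
    a = (spp * sxy - sxp * spy) / det
    b = (sxx * spy - sxp * sxy) / det
    return a, b


def estimate_effect(noise_scale: float, n: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Simulate one round of data collection and fit the outcome on (x, prediction)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    preds = deploy_predictions(xs, noise_scale, rng)
    ys = simulate_outcomes(xs, preds)
    return fit_two_feature_ols(xs, preds, ys)


if __name__ == "__main__":
    print(estimate_effect(noise_scale=0.5))  # close to (1.5, 2.0)
```

When `noise_scale` is zero, the recorded predictions are an exact linear function of `x`, the normal equations become singular, and the sketch refuses to attribute the outcome to either input, mirroring the non-identifiability the abstract describes; with randomized predictions, both coefficients are recovered.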


Empirical Inference Article Automated imaging-based abdominal organ segmentation and quality control in 20,000 participants of the UK Biobank and German National Cohort Studies Kart, T., Fischer, M., Winzeck, S., Glocker, B., Bai, W., Bülow, R., Emmel, C., Friedrich, L., Kauczor, H. U. K. T., Kröncke, T., Mayer, P., Niendorf, T., Peters, A., Pischon, T., Schaarschmidt, B. M., Schmidt, B., Schulze, M. B., Umutle, L., Völzke, H., Küstner, T., et al. Scientific Reports, 12(1):article no. 18733, November 2022 (Published)

Empirical Inference Conference Paper Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment Ma, Y., Chen, Y., Akata, Z. The 33rd British Machine Vision Conference Proceedings, BMVA Press, The 33rd British Machine Vision Conference (BMVC 2022), November 2022 (Published)
