Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group



Embodied Vision Ph.D. Thesis Object-Level Dynamic Scene Reconstruction With Physical Plausibility From RGB-D Images Strecke, M. F. Eberhard Karls Universität Tübingen, Tübingen, 2023 (Published)
Humans have the remarkable ability to perceive and interact with objects in the world around them. They can easily segment objects from visual data and have an intuitive understanding of how physics influences objects. By contrast, robots are so far often constrained to tailored environments for a specific task, due to their inability to reconstruct a versatile and accurate scene representation. In this thesis, we combine RGB-D video data with background knowledge of real-world physics to develop such a representation for robots.
Our contributions can be separated into two main parts: a dynamic object tracking tool and optimization frameworks that allow for improving shape reconstructions based on physical plausibility. The dynamic object tracking tool "EM-Fusion" detects, segments, reconstructs, and tracks objects from RGB-D video data. We propose a probabilistic data association approach for attributing the image pixels to the different moving objects in the scene. This allows us to track and reconstruct moving objects and the background scene with state-of-the-art accuracy and robustness to occlusions.
We investigate two ways of further optimizing the reconstructed shapes of moving objects based on physical plausibility. The first of these, "Co-Section", includes physical plausibility by reasoning about the empty space around an object. We observe that no two objects can occupy the same space at the same time and that the depth images in the input video provide an estimate of observed empty space. Based on these observations, we propose intersection and hull constraints, which we combine with the observed surfaces in a global optimization approach. Compared to EM-Fusion, which only reconstructs the observed surface, Co-Section optimizes watertight shapes. These watertight shapes provide a rough estimate of unseen surfaces and could be useful as initialization for further refinement, e.g., by interactive perception.
In the second optimization approach, "DiffSDFSim", we reason about object shapes based on physically plausible object motion. We observe that object trajectories after collisions depend on the object's shape, and extend a differentiable physics simulation for optimizing object shapes together with other physical properties (e.g., forces, masses, friction) based on the motion of the objects and their interactions. Our key contributions are using signed distance function models for representing shapes and a novel method for computing gradients that models the dependency of the time of contact on object shapes. We demonstrate that our approach recovers target shapes well by fitting to target trajectories and depth observations. Further, the ground-truth trajectories are recovered well in simulation using the resulting shape and physical properties. This enables predictions about the future motion of objects by physical simulation.
We anticipate that our contributions can be useful building blocks in the development of 3D environment perception for robots. The reconstruction of individual objects as in EM-Fusion is a key ingredient required for interactions with objects. Completed shapes as the ones provided by Co-Section provide useful cues for planning interactions like grasping of objects. Finally, the recovery of shape and other physical parameters using differentiable simulation as in DiffSDFSim allows simulating objects and thus predicting the effects of interactions. Future work might extend the presented works for interactive perception of dynamic environments by comparing these predictions with observed real-world interactions to further improve the reconstructions and physical parameter estimations.
DOI URL BibTeX

Modern Magnetic Systems Article Pump probe x-ray microscopy of photo-induced magnetization dynamics at MHz repetition rates Gerlinger, K., Pfau, B., Hennecke, M., Kern, L., Will, I., Noll, T., Weigand, M., Gräfe, J., Traeger, N., Schneider, M., Günther, C. M., Engel, D., Schütz, G., Eisebitt, S. Structural Dynamics, 10(2):024301, American Institute of Physics, Melville, NY, 2023 (Published) DOI BibTeX

Modern Magnetic Systems Article Quantifying the spin-wave asymmetry in single and double rectangular Ni80Fe20 microstrips by TR-STXM, FMR, and micromagnetic simulations Pile, S., Ney, A., Lenz, K., Narkowicz, R., Lindner, J., Wintz, S., Förster, J., Mayr, S., Weigand, M. IEEE Transactions on Magnetics, 59(11), Published by the Institute of Electrical and Electronics Engineers for the Magnetics Group, New York, NY, 2023 DOI BibTeX

Materials Article Retention of dissolved organic matter during podzolisation: Testing processes in laboratory experiments and at the submicron scale Krettek, A., Höschen, C., Richter, G., Schweizer, S., Thilo, R. Geoderma Regional, 32:e00606, Elsevier Science, Amsterdam, 2023 (Published) DOI BibTeX

Modern Magnetic Systems Article Seeding and emergence of composite skyrmions in a van der Waals magnet Powalla, L., Birch, M. T., Litzius, K., Wintz, S., Yasin, F. S., Turnbull, L. A., Schulz, F., Mayoh, D. A., Balakrishnan, G., Weigand, M., Yu, X., Kern, K., Schütz, G., Burghard, M. Advanced Materials, 35(12):2208930, Wiley-VCH, Weinheim, 2023 (Published) DOI BibTeX

Modern Magnetic Systems Materials Article Site-selective substitution and resulting magnetism in arc-melted perovskite ATiO3-delta (A = Ca, Sr, Ba) Yoon, S., Xie, W., Xiao, X., Checchia, S., Coduri, M., Schützendübe, P., Widenmeyer, M., Ebbinghaus, S. G., Balke, B., Weidenkaff, A., Schütz, G., Son, K. Journal of the American Ceramic Society, 106(11):6778-6786, American Ceramic Society, Westerville, OH, USA, 2023 (Published) DOI BibTeX

Modern Magnetic Systems Article Skyrmion and skyrmionium formation in the two-dimensional magnet Cr2Ge2Te6 Powalla, L., Birch, M. T., Litzius, K., Wintz, S., Satheesh, S., Weigand, M., Goering, E., Schütz, G., Burghard, M. Physical Review B, 108(21), American Physical Society, Woodbury, NY, 2023 DOI BibTeX

Modern Magnetic Systems Article Spatially-resolved dynamic sampling of different phasic magnetic resonances of nanoparticle ensembles in a magnetotactic bacterium Magnetospirillum magnetotacticum Feggeler, T., Lill, J., Günzing, D., Meckenstock, R., Spoddig, D., Efremova, M. V., Wintz, S., Weigand, M., Zingsem, B. W., Farle, M., Wende, H., Ollefs, K. J., Ohldag, H. New Journal of Physics, 25(4):043010, IOP Publishing, Bristol, 2023 (Published) DOI BibTeX

Empirical Inference Technical Report Synchronizing Machine Learning Algorithms, Realtime Robotic Control and Simulated Environment with o80 Berenz, V., Widmaier, F., Guist, S., Schölkopf, B., Büchler, D. Robot Software Architectures Workshop (RSA) 2023, ICRA, 2023 (Published)
Robotic applications require the integration of various modalities, encompassing perception, control of real robots and possibly the control of simulated environments. While state-of-the-art robotic software solutions such as ROS 2 provide most of the required features, flexible synchronization between algorithms, data streams and control loops can be tedious. o80 is a versatile C++ framework for robotics which provides a shared memory model and a command framework for real-time critical systems. It enables expert users to set up complex robotic systems and generate Python bindings for scientists. o80's unique feature is its flexible synchronization between processes, including traditional blocking commands and the novel "bursting mode", which allows user code to control the execution of the lower process control loop. This makes it particularly useful for setups that mix real and simulated environments.
arxiv poster URL BibTeX

Conference Paper Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations Zadaianchuk, A., Kleindessner, M., Zhu, Y., Locatello, F., Brox, T. In International Conference on Learning Representations, 2023 URL BibTeX

Dynamic Locomotion Conference Paper Upside down: affordable high-performance motion platform Pradhan, N. M. S., Frank, P., Mo, A., Badri-Spröwitz, A. Proceedings: ISR Europe 2023, 412-418, VDE, Stuttgart, ISR Europe 2023 - 56th International Symposium on Robotics, 2023 (Published)
Parallel robots are capable of high-speed manipulation and have become essential tools in industry. The proximal placement of their motors and the low weight of their end effectors make them ideal for generating highly dynamic motion. Therefore, parallel robots can be adopted for motion platform designs, as long as end effector loads are low. Traditional motion platforms can be large and powerful enough to generate accelerations of multiple g, but such designs tend to be expensive and bulky. Similar but smaller motion platforms feature a small work range with reduced degrees of freedom (DoFs) and a limited payload. Here we seek a medium-sized affordable parallel robot capable of powerful and high-speed 6-DoF motion in a comparably large workspace. This work explores the concept of a quadruped robot flipped upside-down, with the motion platform fixed between its feet. In particular, we exploit the high-power dynamic brushless actuation and the four-leg redundancy when moving the motion platform. We characterize the resulting motion platform by tracking sinusoidal and circular trajectories with varying loads. Dynamic motions in 6 DoFs up to 10 Hz and ± 10 mm amplitude are possible when moving a mass of 300 grams. We demonstrate single-axis end-effector translations up to ± 20 mm at 10 Hz for higher loads of 1.2 kg. The motion platform can be replicated easily with 3D printing and off-the-shelf components. All motion platform-related hardware and the custom-written software required to replicate it are open-source.
youtube github arxiv DOI URL BibTeX

Theory of Inhomogeneous Condensed Matter Article Versatile Microfluidics Separation of Colloids by Combining External Flow with Light-Induced Chemical Activity Bekir, M., Sperling, M., Muñoz, D. V., Braksch, C., Böker, A., Lomadze, N., Popescu, M. N., Santer, S. Advanced Materials, 35(25):2300358, Wiley-VCH, Weinheim, 2023 (Published) DOI BibTeX

Perceiving Systems Article Viewpoint-Driven Formation Control of Airships for Cooperative Target Tracking Price, E., Black, M. J., Ahmad, A. IEEE Robotics and Automation Letters, 8(6):3653-3660, 2023 (Published)
For tracking and motion capture (MoCap) of animals in their natural habitat, a formation of safe and silent aerial platforms, such as airships with on-board cameras, is well suited. In our prior work we derived formation properties for optimal MoCap, which include maintaining constant angular separation between observers w.r.t. the subject, a threshold distance to it, and keeping it centered in the camera view. Unlike multi-rotors, airships have non-holonomic constraints and are affected by ambient wind. Their orientation and flight direction are also tightly coupled. Therefore, a control scheme for multicopters that assumes independence of motion direction and orientation is not applicable. In this letter, we address this problem by first exploiting a periodic relationship between the airspeed of an airship and its distance to the subject. We use it to derive analytical and numeric solutions that satisfy the formation properties for optimal MoCap. Based on this, we develop an MPC-based formation controller. We perform theoretical analysis of our solution, study the boundary conditions of its applicability, and present extensive simulation experiments and a real-world demonstration of our control method with an unmanned airship.
publisher's site DOI BibTeX

Embodied Vision Autonomous Motion Movement Generation and Control Conference Paper Visual-Inertial and Leg Odometry Fusion for Dynamic Locomotion Dhédin, V., Li, H., Khorshidi, S., Mack, L., Ravi, A. K. C., Meduri, A., Shah, P., Grimminger, F., Righetti, L., Khadiv, M., Stueckler, J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2023 (Published)
Implementing dynamic locomotion behaviors on legged robots requires a high-quality state estimation module. Especially when the motion includes flight phases, state-of-the-art approaches fail to produce reliable estimates of the robot posture, in particular the base height. In this paper, we propose a novel approach for combining visual-inertial odometry (VIO) with leg odometry in an extended Kalman filter (EKF) based state estimator. The VIO module uses a stereo camera and IMU to yield low-drift 3D position and yaw orientation and drift-free pitch and roll orientation of the robot base link in the inertial frame. However, these values have a considerable amount of latency due to image processing and optimization, while the update rate is quite low, which is not suitable for low-level control. To reduce the latency, we predict the VIO state estimate at the rate of the IMU measurements of the VIO sensor. The EKF module uses the base pose and linear velocity predicted by VIO, fuses them further with a second high-rate IMU and leg odometry measurements, and produces robot state estimates with high frequency and small latency suitable for control. We integrate this lightweight estimation framework with a nonlinear model predictive controller and show successful implementation of a set of agile locomotion behaviors, including trotting and jumping at varying horizontal speeds, on a torque-controlled quadruped robot.
preprint video DOI URL BibTeX
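The fusion scheme sketched in the abstract above rests on the standard Kalman predict/update cycle: high-rate IMU data drives the prediction step, while lower-rate corrections (VIO, leg odometry) arrive as measurement updates. As a rough illustration only, here is the generic linear predict/update cycle on a toy constant-velocity model; all dimensions, matrices, and noise values are hypothetical and not taken from the paper.

```python
import numpy as np

# Toy linear Kalman filter: many fast prediction steps, one slow correction.
# State x = [position, velocity]; only position is measured.

def predict(x, P, F, Q):
    """Propagate the state estimate x and covariance P through dynamics F."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Correct the prediction with a measurement z observed through H."""
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

dt = 0.01                              # hypothetical high-rate step (100 Hz)
F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity dynamics
H = np.array([[1.0, 0.0]])             # position-only measurement
Q = 1e-4 * np.eye(2)                   # process noise
R = np.array([[1e-2]])                 # measurement noise

x, P = np.zeros(2), np.eye(2)
for _ in range(100):                   # 100 high-rate prediction steps
    x, P = predict(x, P, F, Q)
x, P = update(x, P, np.array([1.0]), H, R)  # one low-rate correction
```

The same predict-often, correct-rarely pattern is what lets an estimator output high-frequency, low-latency states between slow VIO updates; an EKF differs only in linearizing nonlinear dynamics and measurement models at each step.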

Modern Magnetic Systems Article ZIF-8 pellets as a robust material for hydrogen cryo-adsorption tanks Balderas-Xicohténcatl, R., Villajos, J. A., Casabán, J., Wong, D., Maiwald, M., Hirscher, M. ACS Applied Energy Materials, 6(18):9145-9152, American Chemical Society, Washington, DC, 2023 (Published) DOI BibTeX

Empirical Inference Article normflows: A PyTorch Package for Normalizing Flows Stimper, V., Liu, D., Campbell, A., Berenz, V., Ryll, L., Schölkopf, B., Hernández-Lobato, J. M. Journal of Open Source Software, 8(86):5361, The Journal of Open Source Software, 2023 (Published)
Normalizing flows model probability distributions through an expressive tractable density (D. Rezende & Mohamed, 2015; Esteban G. Tabak & Turner, 2013; Esteban G. Tabak & Vanden-Eijnden, 2010). They transform a simple base distribution, such as a Gaussian, through a sequence of invertible functions, which are referred to as layers. These layers typically use neural networks to become very expressive. Flows are ubiquitous in machine learning and have been applied to image generation (Grcić et al., 2021; Kingma & Dhariwal, 2018), text modeling (Wang & Wang, 2019), variational inference (D. Rezende & Mohamed, 2015), approximating Boltzmann distributions (Noé et al., 2019), and many other problems (Kobyzev et al., 2021; Papamakarios et al., 2021). Here, we present normflows, a Python package for normalizing flows. It allows users to build normalizing flow models from a suite of base distributions, flow layers, and neural networks. The package is implemented in the popular deep learning framework PyTorch (Paszke et al., 2019), which simplifies the integration of flows in larger machine learning models or pipelines. It supports most of the common normalizing flow architectures, such as Real NVP (Dinh et al., 2017), Glow (Kingma & Dhariwal, 2018), Masked Autoregressive Flows (Papamakarios et al., 2017), Neural Spline Flows (Durkan et al., 2019; Müller et al., 2019), Residual Flows (Chen et al., 2019), and many more. The package can be easily installed via pip and the code is publicly available on GitHub.
JOSS GitHub DOI URL BibTeX
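To illustrate the invertible-layer idea that packages like normflows build on, here is a minimal Real NVP-style affine coupling layer written in plain PyTorch. This is a didactic sketch, not the normflows API: the class name, conditioner size, and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Real NVP-style coupling: the first half of the input conditions an
    affine transform of the second half, which keeps the layer invertible
    and its log-Jacobian-determinant cheap to compute."""

    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        # Small conditioner net producing scale and shift for the second half.
        self.net = nn.Sequential(
            nn.Linear(self.half, 32), nn.ReLU(),
            nn.Linear(32, 2 * (dim - self.half)),
        )

    def forward(self, x):
        """Map x -> z; return z and the per-sample log|det Jacobian|."""
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales well-behaved
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

    def inverse(self, z):
        """Exactly undo forward(); the conditioner sees the untouched half."""
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)

layer = AffineCoupling(dim=4)
x = torch.randn(8, 4)
z, log_det = layer(x)
x_rec = layer.inverse(z)                       # invertibility check
```

The forward pass returns the per-sample log-determinant needed for the change-of-variables density, and `inverse` undoes the transform exactly, which is what makes both sampling and density evaluation tractable when such layers are stacked on a simple base distribution.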

Perceiving Systems Article SmartMocap: Joint Estimation of Human and Camera Motion Using Uncalibrated RGB Cameras Saini, N., Huang, C. P., Black, M. J., Ahmad, A. IEEE Robotics and Automation Letters, 8(6):3206-3213, 2023 (Published)
Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem. Existing methods either need calibrated cameras or calibrate them relative to a static camera, which acts as the reference frame for the mocap system. The calibration step has to be done a priori for every capture session, which is a tedious process, and re-calibration is required whenever cameras are intentionally or accidentally moved. In this letter, we propose a mocap method which uses multiple static and moving extrinsically uncalibrated RGB cameras. The key components of our method are as follows. First, since the cameras and the subject can move freely, we select the ground plane as a common reference to represent both the body and the camera motions unlike existing methods which represent bodies in the camera coordinate system. Second, we learn a probability distribution of short human motion sequences (~1sec) relative to the ground plane and leverage it to disambiguate between the camera and human motion. Third, we use this distribution as a motion prior in a novel multi-stage optimization approach to fit the SMPL human body model and the camera poses to the human body keypoints on the images. Finally, we show that our method can work on a variety of datasets ranging from aerial cameras to smartphones. It also gives more accurate results compared to the state-of-the-art on the task of monocular human mocap with a static camera.
publisher site Pre-print DOI URL BibTeX

Perceiving Systems Article AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time Fang, H., Li, J., Tang, H., Xu, C., Zhu, H., Xiu, Y., Li, Y., Lu, C. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):7157-7173, 2023 (Published) DOI URL BibTeX

Conference Paper CaPhy: Capturing Physical Properties for Animatable Human Avatars Su, Z., Hu, L., Lin, S., Zhang, H., Zhang, S., Thies, J., Liu, Y. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 14104-14114, 2023 (Published) DOI URL BibTeX

Human Aspects of Machine Learning Social Foundations of Computation Unpublished Challenging the validity of personality tests for large language models Sühr, T., Dorner, F., Samadi, S., Kelava, A. arXiv, 2023 (Submitted)
With large language models (LLMs) like GPT-4 appearing to behave increasingly human-like in text-based interactions, it has become popular to attempt to evaluate personality traits of LLMs using questionnaires originally developed for humans. While reusing measures is a resource-efficient way to evaluate LLMs, careful adaptations are usually required to ensure that assessment results are valid even across human subpopulations. In this work, we provide evidence that LLMs’ responses to personality tests systematically deviate from human responses, implying that the results of these tests cannot be interpreted in the same way. Concretely, reverse-coded items (“I am introverted” vs. “I am extraverted”) are often both answered affirmatively. Furthermore, variation across prompts designed to “steer” LLMs to simulate particular personality types does not follow the clear separation into five independent personality factors found in human samples. In light of these results, we believe that it is important to investigate tests’ validity for LLMs before drawing strong conclusions about potentially ill-defined concepts like LLMs’ “personality”.
BibTeX

Neural Capture and Synthesis Conference Paper ClipFace: Text-guided Editing of Textured 3D Morphable Models Aneja, S., Thies, J., Dai, A., Niessner, M. In Proceedings of SIGGRAPH 2023 Conference Papers, 1-11, ACM SIGGRAPH Conference, 2023 (Published) DOI URL BibTeX

Human Aspects of Machine Learning Social Foundations of Computation Conference Paper Do personality tests generalize to Large Language Models? Sühr, T., Dorner, F., Samadi, S., Kelava, A. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems, Ernest N. Morial Convention Center, New Orleans, Louisiana, Socially Responsible Language Modelling Research (SoLaR) Workshop at NeurIPS, 2023 (Published) URL BibTeX

Conference Paper Dynamic Point Fields Prokudin, S., Ma, Q., Raafat, M., Valentin, J., Tang, S. In 2023 (Published) BibTeX

Empirical Inference Conference Paper Effective Bayesian Heteroscedastic Regression with Deep Neural Networks Immer, A., Palumbo, E., Marx, A., Vogt, J. E. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 1-24, Curran Associates Inc., NeurIPS, 2023 (Published) DOI URL BibTeX

Conference Paper Imitator: Personalized Speech-driven 3D Facial Animation Thambiraja, B., Habibie, I., Aliakbarian, S., Cosker, D., Theobalt, C., Thies, J. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 20564-20574, 2023 (Published) DOI URL BibTeX

Perceiving Systems Conference Paper InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds Jiang, T., Chen, X., Song, J., Hilliges, O. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (Published) DOI URL BibTeX

Perceiving Systems Ph.D. Thesis Monocular 3D Shape and Pose Estimation for Humans and Animals Rueegg, N. 2023 (Published)
Accurately estimating the 3D shape and pose of humans and animals from images is a key problem in the computer vision field. These estimates have numerous potential applications in areas including virtual reality, health monitoring, sports analysis, and robotics. Although in recent years significant progress has been made on monocular 3D human reconstruction, research on animals has lagged behind, largely due to the scarcity of 3D scans and motion capture data, which hinders the development of expressive shape and pose priors. Exploiting such priors is a common approach for addressing the inherent ambiguities that arise when attempting to predict 3D articulated pose from 2D data. Additionally, the extreme appearance variability and frequent occlusions that occur with quadrupeds present further challenges for accurate 3D shape and pose recovery. With 3D animal reconstruction in mind, our goal is to advance monocular 3D shape and pose estimation for cases where data is hard to obtain. We begin by demonstrating a conceptually innovative solution to a problem setting with very little to no labeled data. Specifically, we learn the underlying relationship between a 3D parametric model and a set of unlabelled (no keypoints, no segmentation masks) images which show the object of interest. Our solution involves designing a chain of two unsupervised cycles that connect representations at three levels of abstraction – image, segmentation and finally a 3D mesh. We prove the feasibility of our approach on synthetic as well as real data for humans. Subsequently, we investigate the potential for enhanced results by leveraging 2D data that is readily available. Using the representative class of dogs as an example, we start with the key insight that animal class – or breed – is directly related to shape similarity. There is significant intra-class variability, but in general dogs of the same breed look more alike than dogs with different breed affiliation. 
A triplet loss, together with a classification loss, enables us to learn a structured latent shape space, which in turn enhances 3D dog shape estimation results at test time. Finally, we focus on 3D pose estimation. We show how a different cue, namely contact, can reduce the requirement for either images with 3D ground truth or expressive pose priors, both of which are unavailable for most animal species. We learn to predict 3D poses that are consistent with ground contact. To that end, we define losses pulling contact vertices towards a common, estimated ground plane and a constraint penalizing interpenetration of the floor. This results in significant advances compared to the previous state of the art. Furthermore, if desired, our predicted ground contact labels can be used in a test-time optimization loop, enhancing 3D shape and pose recovery even further.
download pdf DOI BibTeX
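The breed-aware shape space described in the abstract above is trained with a triplet loss: embeddings of same-breed dogs (anchor, positive) are pulled together while a different-breed dog (negative) is pushed at least a margin away. As a small sketch of that idea, using PyTorch's built-in triplet loss; the random tensors stand in for features from a hypothetical shape encoder, and all sizes and margins are arbitrary, not those of the thesis.

```python
import torch
import torch.nn as nn

# Triplet margin loss: encourages d(anchor, positive) + margin < d(anchor, negative).
triplet = nn.TripletMarginLoss(margin=1.0)

# Stand-in embeddings from a hypothetical shape encoder (batch of 16, 64-dim).
anchor = torch.randn(16, 64, requires_grad=True)    # a dog of some breed
positive = anchor + 0.1 * torch.randn(16, 64)       # same breed: near the anchor
negative = torch.randn(16, 64)                      # different breed: unrelated

loss = triplet(anchor, positive, negative)
loss.backward()                                     # gradients shape the latent space
```

In training, such a loss would be combined with a breed-classification loss so that the latent space clusters by breed while remaining discriminative.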

Perceiving Systems Conference Paper On Fairness in Face Albedo Estimation Feng, H., Bolkart, T., Tesch, J., Black, M. J., Abrevaya, V. In Proceedings of SIGGRAPH 2022 Talks, 1-2, SIGGRAPH, 2023 (Published) DOI URL BibTeX

Perceiving Systems Conference Paper OpenScene: 3D Scene Understanding with Open Vocabularies Peng, S., Genova, K., Jiang, C. M., Tagliasacchi, A., Pollefeys, M., Funkhouser, T. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 815-824, IEEE Xplore, 2023 (Published) DOI URL BibTeX

Empirical Inference Article Quantitative three-dimensional imaging of chemical short-range order via machine learning enhanced atom probe tomography Li, Y., et al. Nature Communications, 14:7410, 2023 (Published)
Chemical short-range order (CSRO) refers to atoms of specific elements self-organising within a disordered crystalline matrix to form particular atomic neighbourhoods. CSRO is typically characterized indirectly, using volume-averaged or through-projection microscopy techniques that fail to capture the three-dimensional atomistic architectures. Here, we present a machine-learning enhanced approach to break the inherent resolution limits of atom probe tomography, enabling three-dimensional imaging of multiple CSROs. We showcase our approach by addressing a long-standing question encountered in body-centred-cubic Fe-Al alloys that see anomalous property changes upon heat treatment. We use it to evidence non-statistical B2-CSRO instead of the generally expected D03-CSRO. We introduce quantitative correlations among annealing temperature, CSRO, and nano-hardness and electrical resistivity. Our approach is further validated on modified D03-CSRO detected in Fe-Ga. The proposed strategy can be generally employed to investigate short/medium/long-range ordering phenomena in different materials and help design future high-performance materials.
DOI URL BibTeX

Perceiving Systems Conference Paper Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition Guo, C., Jiang, T., Chen, X., Song, J., Hilliges, O. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12858-12868, 2023 (Published)
We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos. Reconstructing humans that move naturally from monocular in-the-wild videos is difficult. Solving it requires accurately separating humans from arbitrary backgrounds. Moreover, it requires reconstructing a detailed 3D surface from short video sequences, making it even more challenging. Despite these challenges, our method does not require any ground-truth supervision or priors extracted from large datasets of clothed human scans, nor do we rely on any external segmentation modules. Instead, it solves the tasks of scene decomposition and surface reconstruction directly in 3D by modeling both the human and the background in the scene jointly, parameterized via two separate neural fields. Specifically, we define a temporally consistent human representation in canonical space and formulate a global optimization over the background model, the canonical human shape and texture, and per-frame human pose parameters. A coarse-to-fine sampling strategy for volume rendering and novel objectives are introduced for a clean separation of the dynamic human and the static background, yielding detailed and robust 3D human reconstructions. The evaluation of our method shows improvements over prior art on publicly available datasets.
DOI URL BibTeX

Perceiving Systems Ph.D. Thesis Whole-Body Motion Capture and Beyond: From Model-Based Inference to Learning-Based Regression Huang, Y. University of Tübingen, December 2022 (Published)
Though effective and successful, traditional marker-less Motion Capture (MoCap) methods suffer from several limitations: 1) they presume a character-specific body model, thus they do not permit a fully automatic pipeline and generalization over diverse body shapes; 2) no objects humans interact with are tracked, while in reality interaction between humans and objects is ubiquitous; 3) they heavily rely on a sophisticated optimization process, which needs a good initialization and strong priors. This process can be slow. We address all the aforementioned issues in this thesis, as described below. Firstly we propose a fully automatic method to accurately reconstruct a 3D human body from multi-view RGB videos, the typical setup for MoCap systems. We pre-process all RGB videos to obtain 2D keypoints and silhouettes. Then we fit the SMPL body model into the 2D measurements in two successive stages. In the first stage, the shape and pose parameters of SMPL are estimated frame-wise sequentially. In the second stage, a batch of frames are refined jointly with an extra DCT prior. Our method can naturally handle different body shapes and challenging poses without human intervention. Then we extend this system to support tracking of rigid objects the subjects interact with. Our setup consists of 6 Azure Kinect cameras. Firstly we pre-process all the videos by segmenting humans and objects and detecting 2D body joints. We adopt the SMPL-X model here to capture body and hand pose. The model is fitted to 2D keypoints and point clouds. Then the body poses and object poses are jointly updated with contact and interpenetration constraints. With this approach, we capture a novel human-object interaction dataset with natural RGB images and plausible body and object motion information. Lastly, we present the first practical and lightweight MoCap system that needs only 6 IMUs. Our approach is based on Bi-directional RNNs. 
The network can make use of temporal information by jointly reasoning about past and future IMU measurements. To handle the data scarcity issue, we create synthetic data from archival MoCap data. Overall, our system runs ten times faster than traditional optimization-based methods, and is numerically more accurate. We also show it is feasible to estimate which activity the subject is performing by observing only the IMU measurements from a smartwatch worn by the subject. This not only can be useful for a high-level semantic understanding of human behavior, but also alerts the public to potential privacy concerns. In summary, we advance marker-less MoCap by contributing the first automatic yet accurate system, extending MoCap methods to support rigid object tracking, and proposing a practical and lightweight algorithm via 6 IMUs. We believe our work makes marker-less and IMU-based MoCap cheaper and more practical, thus closer to end-users for daily usage.
download Thesis DOI BibTeX

Haptic Intelligence Autonomous Learning Empirical Inference Miscellaneous A Sequential Group VAE for Robot Learning of Haptic Representations Richardson, B. A., Kuchenbecker, K. J., Martius, G. 1-11, Workshop paper (8 pages) presented at the CoRL Workshop on Aligning Robot Representations with Humans, Auckland, New Zealand, December 2022 (Published)
Haptic representation learning is a difficult task in robotics because information can be gathered only by actively exploring the environment over time, and because different actions elicit different object properties. We propose a Sequential Group VAE that leverages object persistence to learn and update latent general representations of multimodal haptic data. As a robot performs sequences of exploratory procedures on an object, the model accumulates data and learns to distinguish between general object properties, such as size and mass, and trial-to-trial variations, such as initial object position. We demonstrate that after very few observations, the general latent representations are sufficiently refined to accurately encode many haptic object properties.
URL BibTeX

Empirical Inference Article A survey of algorithmic recourse: contrastive explanations and consequential recommendations Karimi, A., Barthe, G., Schölkopf, B., Valera, I. ACM Computing Surveys, 55(5), Association for Computing Machinery (ACM), December 2022 (Published) arXiv DOI URL BibTeX

Empirical Inference Conference Paper Active Bayesian Causal Inference Toth, C., Lorch, L., Knoll, C., Krause, A., Pernkopf, F., Peharz*, R., von Kügelgen*, J. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:16261-16275, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022), December 2022, *shared last author (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Amortized Inference for Causal Structure Learning Lorch, L., Sussex, S., Rothfuss, J., Krause, A., Schölkopf, B. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35:13104-13118, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (Published) arXiv URL BibTeX