Publications

Empirical Inference Conference Paper Neural Lyapunov Redesign Mehrjou, A., Ghavamzadeh, M., Schölkopf, B. Proceedings of the 3rd Conference on Learning for Dynamics and Control (L4DC), 144:459-470, Proceedings of Machine Learning Research, (Editors: Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.), PMLR, 3rd Annual Conference on Learning for Dynamics and Control (L4DC), June 2021 (Published) URL BibTeX

Perceiving Systems Conference Paper On Self-Contact and Human Pose Müller, L., Osman, A. A. A., Tang, S., Huang, C. P., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 9985-9994, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
People touch their face 23 times an hour; they cross their arms and legs, put their hands on their hips, etc. While many images of people contain some form of self-contact, current 3D human pose and shape (HPS) regression methods typically fail to estimate this contact. To address this, we develop new datasets and methods that significantly improve human pose estimation with self-contact. First, we create a dataset of 3D Contact Poses (3DCP) containing SMPL-X bodies fit to 3D scans as well as poses from AMASS, which we refine to ensure good contact. Second, we leverage this to create the Mimic-The-Pose (MTP) dataset of images, collected via Amazon Mechanical Turk, containing people mimicking the 3DCP poses with self-contact. Third, we develop a novel HPS optimization method, SMPLify-XMC, that includes contact constraints and uses the known 3DCP body pose during fitting to create near ground-truth poses for MTP images. Fourth, for more image variety, we label a dataset of in-the-wild images with Discrete Self-Contact (DSC) information and use another new optimization method, SMPLify-DC, that exploits discrete contacts during pose optimization. Finally, we use our datasets during SPIN training to learn a new 3D human pose regressor, called TUCH (Towards Understanding Contact in Humans). We show that the new self-contact training data significantly improves 3D human pose estimates on withheld test data and existing datasets like 3DPW. Not only does our method improve results for self-contact poses, but it also improves accuracy for non-contact poses. The code and data are available for research purposes at https://tuch.is.tue.mpg.de.
project arXiv poster video code DOI BibTeX
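The contact-constraint idea in the abstract above can be illustrated with a minimal toy penalty that pulls annotated vertex pairs into contact (a hypothetical sketch; the vertex pairs and the loss form are invented here, not the SMPLify-XMC implementation):

```python
import numpy as np

def contact_loss(vertices, contact_pairs):
    """Toy self-contact penalty: for each (i, j) pair of vertices annotated
    as touching, penalize their squared Euclidean distance.
    `vertices` is an (N, 3) array; `contact_pairs` is a list of index pairs."""
    loss = 0.0
    for i, j in contact_pairs:
        loss += np.sum((vertices[i] - vertices[j]) ** 2)
    return loss

# Two vertex pairs: one already in contact, one 1 unit apart.
verts = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0]])
print(contact_loss(verts, [(0, 1), (2, 3)]))  # -> 1.0
```

In an optimization loop such a term would be added to the usual data and prior terms, so that poses with the annotated contacts are preferred.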

Intelligent Control Systems Conference Paper On exploration requirements for learning safety constraints Massiani, P., Heim, S., Trimpe, S. In Proceedings of the 3rd Conference on Learning for Dynamics and Control, 905-916, Proceedings of Machine Learning Research (PMLR), Vol. 144, (Editors: Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie), PMLR, 3rd Annual Conference on Learning for Dynamics and Control (L4DC), June 2021 (Published)
Enforcing safety for dynamical systems is challenging, since it requires constraint satisfaction along trajectory predictions. Equivalent control constraints can be computed in the form of sets that enforce positive invariance, and can thus guarantee safety in feedback controllers without predictions. However, these constraints are cumbersome to compute from models, and it is not yet well established how to infer constraints from data. In this paper, we shed light on the key objects involved in learning control constraints from data in a model-free setting. In particular, we discuss the family of constraints that enforce safety in the context of a nominal control policy, and expose that these constraints do not need to be accurate everywhere. They only need to correctly exclude a subset of the state-actions that would cause failure, which we call the critical set.
URL BibTeX
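The critical-set notion from the abstract above can be illustrated with a toy sketch (hypothetical; the states and actions are invented): a learned constraint is sufficient for safety as long as it excludes the failure-causing state-actions, regardless of how it behaves elsewhere.

```python
def make_safety_filter(critical_set):
    """Toy illustration of the critical-set idea: a constraint only needs to
    correctly exclude the state-actions known to cause failure under the
    nominal policy; it may be arbitrarily inaccurate elsewhere.
    `critical_set` is a set of (state, action) pairs."""
    def is_allowed(state, action):
        return (state, action) not in critical_set
    return is_allowed

critical = {("near_cliff", "step_forward")}
allowed = make_safety_filter(critical)
print(allowed("near_cliff", "step_forward"))  # False: in the critical set
print(allowed("near_cliff", "step_back"))     # True: not failure-causing
```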

Physics for Inference and Optimization Article Optimal Transport in Multilayer Networks for Traffic Flow Optimization Ibrahim, A. A., Lonardi, A., Bacco, C. D. Algorithms, 14(7):189, June 2021 (Published)
Modeling traffic distribution and extracting optimal flows in multilayer networks is of the utmost importance to design efficient, multi-modal network infrastructures. Recent results based on optimal transport theory provide powerful and computationally efficient methods to address this problem, but they are mainly focused on modeling single-layer networks. Here, we adapt these results to study how optimal flows distribute on multilayer networks. We propose a model where optimal flows on different layers contribute differently to the total cost to be minimized. This is done by means of a parameter that varies with layers, which allows us to flexibly tune the sensitivity to traffic congestion of the various layers. As an application, we consider transportation networks, where each layer is associated with a different transportation system, and show how the traffic distribution varies as we tune this parameter across layers. We show an example of this result on the real, 2-layer network of the city of Bordeaux with bus and tram layers, where we find that in certain regimes, the presence of the tram network significantly unburdens the traffic on the road network. Our model paves the way for further analysis of optimal flows and navigability strategies in real, multilayer networks.
Code Preprint DOI BibTeX
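A toy sketch of the layer-dependent cost idea described above (an assumed functional form; the cost used in the paper may differ): each layer contributes a congestion term whose exponent is tuned per layer.

```python
import numpy as np

def multilayer_cost(flows, lengths, gammas):
    """Toy layer-weighted transport cost: layer `a` contributes
    sum_e len_e * |flow_e| ** gamma_a, so gamma_a tunes how strongly
    congestion on that layer is penalized (gamma < 1 rewards consolidating
    traffic onto few edges; gamma = 1 is congestion-neutral).
    `flows` and `lengths` map layer name -> array over that layer's edges."""
    total = 0.0
    for layer, gamma in gammas.items():
        total += np.sum(lengths[layer] * np.abs(flows[layer]) ** gamma)
    return total

flows = {"bus": np.array([2.0, 2.0]), "tram": np.array([4.0])}
lengths = {"bus": np.array([1.0, 1.0]), "tram": np.array([1.0])}
# bus: 2 + 2 = 4; tram: 4**0.5 = 2; total = 6.0
print(multilayer_cost(flows, lengths, {"bus": 1.0, "tram": 0.5}))
```

Lowering a layer's gamma makes concentrated flows on that layer cheaper, which is one way to express different congestion sensitivities per transportation mode.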

Empirical Inference Ph.D. Thesis Optimization Algorithms for Machine Learning Raj, A. University of Tübingen, Germany, June 2021 (Published) BibTeX

Empirical Inference Conference Paper Orthogonal Over-Parameterized Training Liu, W., Lin, R., Liu, Z., Rehg, J., Paull, L., Xiong, L., Song, L., Weller, A. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7251-7260, Computer Vision Foundation / IEEE, CVPR, June 2021 (Published) URL BibTeX

Perceiving Systems Conference Paper Populating 3D Scenes by Learning Human-Scene Interaction Hassan, M., Ghosh, P., Tesch, J., Tzionas, D., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 14703-14713, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
Humans live within a 3D space and constantly interact with it to perform tasks. Such interactions involve physical contact between surfaces that is semantically meaningful. Our goal is to learn how humans interact with scenes and leverage this to enable virtual characters to do the same. To that end, we introduce a novel Human-Scene Interaction (HSI) model that encodes proximal relationships, called POSA for “Pose with prOximitieS and contActs”. The representation of interaction is body-centric, which enables it to generalize to new scenes. Specifically, POSA augments the SMPL-X parametric human body model such that, for every mesh vertex, it encodes (a) the contact probability with the scene surface and (b) the corresponding semantic scene label. We learn POSA with a VAE conditioned on the SMPL-X vertices, and train on the PROX dataset, which contains SMPL-X meshes of people interacting with 3D scenes, and the corresponding scene semantics from the PROX-E dataset. We demonstrate the value of POSA with two applications. First, we automatically place 3D scans of people in scenes. We use a SMPL-X model fit to the scan as a proxy and then find its most likely placement in 3D. POSA provides an effective representation to search for “affordances” in the scene that match the likely contact relationships for that pose. We perform a perceptual study that shows significant improvement over the state of the art on this task. Second, we show that POSA’s learned representation of body-scene interaction supports monocular human pose estimation that is consistent with a 3D scene, improving on the state of the art. Our model and code are available for research purposes at https://posa.is.tue.mpg.de.
project pdf poster video DOI BibTeX
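The body-centric representation described above can be sketched as a simple per-vertex feature map (a hypothetical illustration; the vertex indices and label set are made up, and POSA learns these features with a VAE rather than setting them by hand):

```python
import numpy as np

NUM_VERTICES = 10475                 # SMPL-X mesh vertex count
LABELS = ["floor", "chair", "wall"]  # toy semantic scene categories

contact_prob = np.zeros(NUM_VERTICES)              # (a) P(contact) per vertex
semantics = np.zeros((NUM_VERTICES, len(LABELS)))  # (b) per-vertex label scores

# Mark some vertices (indices invented) as sitting contact on a chair:
seat_verts = np.arange(100, 120)
contact_prob[seat_verts] = 0.9
semantics[seat_verts, LABELS.index("chair")] = 1.0

print(contact_prob[110], LABELS[semantics[110].argmax()])  # 0.9 chair
```

Because the features live on the body mesh rather than in the scene, the same map can be queried against any new scene geometry, which is what makes the representation generalize.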

Rationality Enhancement Poster Promoting metacognitive learning through systematic reflection Becker, F., Lieder, F. The first edition of the Life Improvement Science Conference, June 2021 (Published)
Human decision-making is sometimes systematically biased toward suboptimal decisions. For example, people often make short-sighted choices because they don't give enough weight to the long-term consequences of their actions. Previous studies showed that it is possible to overcome such biases by teaching people a more rational decision strategy through instruction, demonstrations, or practice with feedback. The benefits of these approaches tend to be limited to situations that are very similar to those used during the training. One way to overcome this limitation is to create general tools and strategies that people can use to improve their decision-making in any situation. Here we propose one such approach, namely directing people to systematically reflect on how they make their decisions. In systematic reflection, past experience is re-evaluated with the intention to learn. In this study, we investigate how reflection affects how people learn to plan and whether reflective learning can help people discover more far-sighted planning strategies. In our experiment, participants solve a series of 30 planning problems where the immediate rewards are smaller and therefore less important than long-term rewards. Building on Wolfbauer et al. (2020), the experimental group is guided by four reflection prompts asking participants to describe their planning strategy, the strategy's performance, and their emotional responses, insights, and intentions to change their strategy. The control group practices planning without reflection prompts. Our pilot data suggest that systematic reflection helps people to more rapidly discover adaptive planning strategies. Our findings suggest that reflection is useful not only for helping people learn what to do in a specific situation but also for helping people learn how to think about what to do. In future work, we will compare the effects of different types of reflection on the subsequent changes in people's decision strategies.
Developing apps that prompt people to reflect on their decisions may be a promising approach to accelerating cognitive growth and promoting lifelong learning.
BibTeX

Perceiving Systems Article Red shape, blue shape: Political ideology influences the social perception of body shape Quiros-Ramirez, M. A., Streuber, S., Black, M. J. Nature Humanities and Social Sciences Communications, 8:148, June 2021 (Published)
Political elections have a profound impact on individuals and societies. Optimal voting is thought to be based on informed and deliberate decisions; yet it has been demonstrated that the outcomes of political elections are biased by the perception of candidates’ facial features and the stereotypical traits voters attribute to these. Interestingly, political identification changes the attribution of stereotypical traits from facial features. This study explores whether the perception of body shape elicits similar effects on political trait attribution and whether these associations can be visualized. In Experiment 1, ratings of 3D body shapes were used to model the relationship between perception of 3D body shape and the attribution of political traits such as ‘Republican’, ‘Democrat’, or ‘Leader’. This allowed analyzing and visualizing the mental representations of stereotypical 3D body shapes associated with each political trait. Experiment 2 was designed to test whether the political identification of the raters affected the attribution of political traits to different types of body shapes. The results show that humans attribute political traits to the same body shapes differently depending on their own political preference. These findings show that our judgments of others are influenced by their body shape and our own political views. Such judgments have potential political and societal implications.
pdf on-line sup. mat. sup. figure author pdf DOI BibTeX

Intelligent Control Systems Article Structured learning of rigid-body dynamics: A survey and unified view from a robotics perspective Geist, A. R., Trimpe, S. GAMM-Mitteilungen, 44(2):e202100009, Special Issue: Scientific Machine Learning, June 2021 (Published)
Accurate models of mechanical system dynamics are often critical for model-based control and reinforcement learning. Fully data-driven dynamics models promise to ease the process of modeling and analysis, but require considerable amounts of data for training and often do not generalize well to unseen parts of the state space. Combining data-driven modeling with prior analytical knowledge is an attractive alternative as the inclusion of structural knowledge into a regression model improves the model's data efficiency and physical integrity. In this article, we survey supervised regression models that combine rigid-body mechanics with data-driven modeling techniques. We analyze the different latent functions (such as kinetic energy or dissipative forces) and operators (such as differential operators and projection matrices) underlying common descriptions of rigid-body mechanics. Based on this analysis, we provide a unified view on the combination of data-driven regression models, such as neural networks and Gaussian processes, with analytical model priors. Furthermore, we review and discuss key techniques for designing structured models such as automatic differentiation.
DOI BibTeX
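A minimal sketch of the structured-model idea surveyed above, assuming a single pendulum: the analytical rigid-body terms (inertia, gravity) are kept, and a learned component models only the residual effects the physics leaves out, such as friction.

```python
import numpy as np

def pendulum_inverse_dynamics(q, qd, qdd, m=1.0, l=1.0, g=9.81,
                              learned_residual=lambda q, qd: 0.0):
    """Toy structured inverse dynamics for a point-mass pendulum:
    tau = (analytical inertial torque) + (analytical gravity torque)
          + (data-driven residual, e.g. friction)."""
    inertia = m * l**2 * qdd          # analytical inertial term
    gravity = m * g * l * np.sin(q)   # analytical gravity term
    return inertia + gravity + learned_residual(q, qd)

# A viscous-friction term stands in for a trained regressor here:
tau = pendulum_inverse_dynamics(q=0.0, qd=2.0, qdd=1.0,
                                learned_residual=lambda q, qd: 0.1 * qd)
print(tau)  # 1.0 (inertia) + 0.0 (gravity at q=0) + 0.2 (residual) = 1.2
```

In practice the residual would be a neural network or Gaussian process, and keeping the analytical terms is what gives the model its data efficiency and physical integrity.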

Rationality Enhancement Article Toward a Formal Theory of Proactivity Lieder, F., Iwama, G. Cognitive, Affective, & Behavioral Neuroscience, 42:490-508, Springer, June 2021 (Published)
Beyond merely reacting to their environment and impulses, people have the remarkable capacity to proactively set and pursue their own goals. But the extent to which they leverage this capacity varies widely across people and situations. The goal of this article is to make the mechanisms and variability of proactivity more amenable to rigorous experiments and computational modeling. We proceed in three steps. First, we develop and validate a mathematically precise behavioral measure of proactivity and reactivity that can be applied across a wide range of experimental paradigms. Second, we propose a formal definition of proactivity and reactivity, and develop a computational model of proactivity in the AX Continuous Performance Task (AX-CPT). Third, we develop and test a computational-level theory of meta-control over proactivity in the AX-CPT that identifies three distinct meta-decision-making problems: intention setting, resolving response conflict between intentions and automaticity, and deciding whether to recall context and intentions into working memory. People's response frequencies in the AX-CPT were remarkably well captured by a mixture between the predictions of our models of proactive and reactive control. Empirical data from an experiment varying the incentives and contextual load of an AX-CPT confirmed the predictions of our meta-control model of individual differences in proactivity. Our results suggest that proactivity can be understood in terms of computational models of meta-control. Our model makes additional empirically testable predictions. Future work will extend our models from proactive control in the AX-CPT to proactive goal creation and goal pursuit in the real world.
Toward a formal theory of proactivity DOI URL BibTeX
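The mixture result described above can be sketched as follows (a toy illustration with invented numbers, not the authors' fitted model): predicted response frequencies are a weighted combination of the proactive and reactive model predictions, with the weight expressing an individual's degree of proactivity.

```python
import numpy as np

def mixture_prediction(p_proactive, p_reactive, w):
    """Toy mixture model: blend the proactive and reactive control models'
    predicted response probabilities with proactivity weight w in [0, 1]."""
    return w * np.asarray(p_proactive) + (1 - w) * np.asarray(p_reactive)

# Hypothetical correct-response probabilities on two trial types:
p_pro = np.array([0.95, 0.70])
p_re = np.array([0.80, 0.90])
print(mixture_prediction(p_pro, p_re, w=0.5))  # w=0.5 gives 0.875 and 0.8
```

Fitting w per participant would then give the behavioral measure of proactivity that the abstract describes.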

Empirical Inference Conference Paper Uncertainty-Based Biological Age Estimation of Brain MRI Scans Armanious, K., Abdulatif, S., Shi, W., Hepp, T., Gatidis, S., Yang, B. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1100-1104, IEEE, June 2021 (Published) DOI BibTeX

Perceiving Systems Conference Paper We are More than Our Joints: Predicting how 3D Bodies Move Zhang, Y., Black, M. J., Tang, S. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 3371-3381, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
A key step towards understanding human behavior is the prediction of 3D human motion. Successful solutions have many applications in human tracking, HCI, and graphics. Most previous work focuses on predicting a time series of future 3D joint locations given a sequence of 3D joints from the past. This Euclidean formulation generally works better than predicting pose in terms of joint rotations. Body joint locations, however, do not fully constrain 3D human pose, leaving degrees of freedom (like rotation about a limb) undefined. Note that 3D joints can be viewed as a sparse point cloud. Thus the problem of human motion prediction can be seen as a problem of point cloud prediction. With this observation, we instead predict a sparse set of locations on the body surface that correspond to motion capture markers. Given such markers, we fit a parametric body model to recover the 3D body of the person. These sparse surface markers also carry detailed information about human movement that is not present in the joints, increasing the naturalness of the predicted motions. Using the AMASS dataset, we train MOJO (More than Our JOints), which is a novel variational autoencoder with a latent DCT space that generates motions from latent frequencies. MOJO preserves the full temporal resolution of the input motion, and sampling from the latent frequencies explicitly introduces high-frequency components into the generated motion. We note that motion prediction methods accumulate errors over time, resulting in joints or markers that diverge from true human bodies. To address this, we fit the SMPL-X body model to the predictions at each time step, projecting the solution back onto the space of valid bodies, before propagating the new markers in time. Quantitative and qualitative experiments show that our approach produces state-of-the-art results and realistic 3D body animations. The code is available for research purposes at https://yz-cnsdqz.github.io/MOJO/MOJO.html.
code arXiv DOI BibTeX
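The latent-frequency idea can be illustrated with a plain discrete cosine transform of a toy marker trajectory (a generic sketch, not MOJO's learned model): low-frequency coefficients capture the gross motion, while high-frequency ones carry the detail that sampling can reintroduce.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II of a 1D signal (one marker coordinate over time)."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return (x * np.cos(np.pi / N * (n + 0.5) * k)).sum(axis=1)

def idct2(X):
    """Inverse of dct2 above (a DCT-III with matching scaling)."""
    N = len(X)
    n = np.arange(N)
    k = np.arange(1, N)
    return X[0] / N + 2.0 / N * (
        X[1:] * np.cos(np.pi / N * (n[:, None] + 0.5) * k)
    ).sum(axis=1)

traj = np.sin(np.linspace(0, 2 * np.pi, 16))  # toy marker trajectory
coeffs = dct2(traj)                           # per-frequency representation
assert np.allclose(idct2(coeffs), traj)       # lossless round trip
```

Working in this frequency space makes it easy to keep or perturb specific frequency bands of a motion while preserving the full temporal resolution on reconstruction.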

Perceiving Systems Conference Paper AGORA: Avatars in Geography Optimized for Regression Analysis Patel, P., Huang, C. P., Tesch, J., Hoffmann, D. T., Tripathi, S., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 13463-13473, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
While the accuracy of 3D human pose estimation from images has steadily improved on benchmark datasets, the best methods still fail in many real-world scenarios. This suggests that there is a domain gap between current datasets and common scenes containing people. To obtain ground-truth 3D pose, current datasets limit the complexity of clothing, environmental conditions, number of subjects, and occlusion. Moreover, current datasets evaluate sparse 3D joint locations corresponding to the major joints of the body, ignoring the hand pose and the face shape. To evaluate the current state-of-the-art methods on more challenging images, and to drive the field to address new problems, we introduce AGORA, a synthetic dataset with high realism and highly accurate ground truth. Here we use 4240 commercially-available, high-quality, textured human scans in diverse poses and natural clothing; this includes 257 scans of children. We create reference 3D poses and body shapes by fitting the SMPL-X body model (with face and hands) to the 3D scans, taking into account clothing. We create around 14K training and 3K test images by rendering between 5 and 15 people per image using either image-based lighting or rendered 3D environments, taking care to make the images physically plausible and photoreal. In total, AGORA consists of 173K individual person crops. We evaluate existing state-of-the-art methods for 3D human pose estimation on this dataset and find that most methods perform poorly on images of children. Hence, we extend the SMPL-X model to better capture the shape of children. Additionally, we fine-tune methods on AGORA and show improved performance on both AGORA and 3DPW, confirming the realism of the dataset. We provide all the registered 3D reference training data, rendered images, and a web-based evaluation site at https://agora.is.tue.mpg.de/.
dataset pdf video DOI BibTeX

Perceiving Systems Conference Paper BABEL: Bodies, Action and Behavior with English Labels Punnakkal, A. R., Chandrasekaran, A., Athanasiou, N., Quiros-Ramirez, M. A., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 722-731, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
Understanding the semantics of human movement -- the what, how and why of the movement -- is an important problem that requires datasets of human actions with semantic labels. Existing datasets take one of two approaches. Large-scale video datasets contain many action labels but do not contain ground-truth 3D human motion. Alternatively, motion-capture (mocap) datasets have precise body motions but are limited to a small number of actions. To address this, we present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences. BABEL consists of action labels for about 43.5 hours of mocap sequences from AMASS. Action labels are at two levels of abstraction -- sequence labels which describe the overall action in the sequence, and frame labels which describe all actions in every frame of the sequence. Each frame label is precisely aligned with the duration of the corresponding action in the mocap sequence, and multiple actions can overlap. There are over 28k sequence labels, and 63k frame labels in BABEL, which belong to over 250 unique action categories. Labels from BABEL can be leveraged for tasks like action recognition, temporal action localization, motion synthesis, etc. To demonstrate the value of BABEL as a benchmark, we evaluate the performance of models on 3D action recognition. We demonstrate that BABEL poses interesting learning challenges that are applicable to real-world scenarios, and can serve as a useful benchmark of progress in 3D action recognition. The dataset, baseline method, and evaluation code are made available and supported for academic research purposes at https://babel.is.tue.mpg.de/.
dataset poster pdf sup mat video code DOI BibTeX
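The two-level labeling scheme described above can be sketched with a toy frame-label expansion (hypothetical data; the real dataset stores precisely aligned durations per mocap sequence):

```python
def frame_labels(segments, num_frames):
    """Toy expansion of duration-aligned action segments into per-frame
    label sets. Each segment is (action, start_frame, end_frame), with the
    end exclusive; overlapping actions simply accumulate on a frame."""
    labels = [set() for _ in range(num_frames)]
    for action, start, end in segments:
        for f in range(start, end):
            labels[f].add(action)
    return labels

# "walk" spans frames 0-3, "wave" spans frames 2-4, so frames 2-3 overlap.
segs = [("walk", 0, 4), ("wave", 2, 5)]
per_frame = frame_labels(segs, 5)
print(per_frame[2])  # frame 2 carries both actions
```

A sequence label would then be a single label attached to the whole clip, alongside this per-frame structure.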

Perceiving Systems Conference Paper LEAP: Learning Articulated Occupancy of People Mihajlovic, M., Zhang, Y., Black, M. J., Tang, S. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 10456-10466, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
Substantial progress has been made on modeling rigid 3D objects using deep implicit representations. Yet, extending these methods to learn neural models of human shape is still in its infancy. Human bodies are complex and the key challenge is to learn a representation that generalizes such that it can express body shape deformations for unseen subjects in unseen, highly-articulated, poses. To address this challenge, we introduce LEAP (LEarning Articulated occupancy of People), a novel neural occupancy representation of the human body. Given a set of bone transformations (i.e. joint locations and rotations) and a query point in space, LEAP first maps the query point to a canonical space via learned linear blend skinning (LBS) functions and then efficiently queries the occupancy value via an occupancy network that models accurate identity- and pose-dependent deformations in the canonical space. Experiments show that our canonicalized occupancy estimation with the learned LBS functions greatly improves the generalization capability of the learned occupancy representation across various human shapes and poses, outperforming existing solutions in all settings.
project arXiv pdf code DOI BibTeX
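The canonicalization step described above rests on standard linear blend skinning: blend the per-bone transforms with skinning weights, then invert the blended transform to map a posed-space point back to canonical space. A minimal numpy sketch (LEAP learns the skinning weights, whereas here they are fixed by hand):

```python
import numpy as np

def lbs_transform(weights, bone_transforms):
    """Blend per-bone 4x4 transforms with skinning weights (sum_i w_i T_i)."""
    return np.tensordot(weights, bone_transforms, axes=1)  # -> (4, 4)

def to_canonical(point, weights, bone_transforms):
    """Map a posed-space point to canonical space by inverting the blended
    LBS transform, as in LBS-based canonicalization."""
    T = lbs_transform(weights, bone_transforms)
    p = np.append(point, 1.0)            # homogeneous coordinates
    return np.linalg.solve(T, p)[:3]

# Two bones: identity and a translation by (1, 0, 0), blended 50/50.
T0 = np.eye(4)
T1 = np.eye(4)
T1[0, 3] = 1.0
w = np.array([0.5, 0.5])
posed = np.array([1.0, 0.0, 0.0])
print(to_canonical(posed, w, np.stack([T0, T1])))  # -> [0.5 0.  0. ]
```

An occupancy network can then be queried at the canonical point, so it never has to model the articulation itself.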

Perceiving Systems Conference Paper SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements Ma, Q., Saito, S., Yang, J., Tang, S., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 16077-16088, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
Learning to model and reconstruct humans in clothing is challenging due to articulation, non-rigid deformation, and varying clothing types and topologies. To enable learning, the choice of representation is the key. Recent work uses neural networks to parameterize local surface elements. This approach captures locally coherent geometry and non-planar details, can deal with varying topology, and does not require registered training data. However, naively using such methods to model 3D clothed humans fails to capture fine-grained local deformations and generalizes poorly. To address this, we present three key innovations: First, we deform surface elements based on a human body model such that large-scale deformations caused by articulation are explicitly separated from topological changes and local clothing deformations. Second, we address the limitations of existing neural surface elements by regressing local geometry from local features, significantly improving the expressiveness. Third, we learn a pose embedding on a 2D parameterization space that encodes posed body geometry, improving generalization to unseen poses by reducing non-local spurious correlations. We demonstrate the efficacy of our surface representation by learning models of complex clothing from point clouds. The clothing can change topology and deviate from the topology of the body. Once learned, we can animate previously unseen motions, producing high-quality point clouds, from which we generate realistic images with neural rendering. We assess the importance of each technical contribution and show that our approach outperforms the state-of-the-art methods in terms of reconstruction accuracy and inference time. The code is available for research purposes at https://qianlim.github.io/SCALE.
Project Page Code Video arXiv PDF Supp. Poster DOI BibTeX

Perceiving Systems Conference Paper SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks Saito, S., Yang, J., Ma, Q., Black, M. J. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2885-2896, IEEE, Piscataway, NJ, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 2021 (Published)
We present SCANimate, an end-to-end trainable framework that takes raw 3D scans of a clothed human and turns them into an animatable avatar. These avatars are driven by pose parameters and have realistic clothing that moves and deforms naturally. SCANimate does not rely on a customized mesh template or surface mesh registration. We observe that fitting a parametric 3D body model, like SMPL, to a clothed human scan is tractable while surface registration of the body topology to the scan is often not, because clothing can deviate significantly from the body shape. We also observe that articulated transformations are invertible, resulting in geometric cycle-consistency in the posed and unposed shapes. These observations lead us to a weakly supervised learning method that aligns scans into a canonical pose by disentangling articulated deformations without template-based surface registration. Furthermore, to complete missing regions in the aligned scans while modeling pose-dependent deformations, we introduce a locally pose-aware implicit function that learns to complete and model geometry with learned pose correctives. In contrast to commonly used global pose embeddings, our local pose conditioning significantly reduces long-range spurious correlations and improves generalization to unseen poses, especially when training data is limited. Our method can be applied to pose-aware appearance modeling to generate a fully textured avatar. We demonstrate our approach on various clothing types with different amounts of training data, outperforming existing solutions and other variants in terms of fidelity and generality in every setting. The code is available at https://scanimate.is.tue.mpg.de.
Project Page PDF Supp. Video arXiv Poster code DOI URL BibTeX

Micro, Nano, and Molecular Systems Article Soft urinary bladder phantom for endoscopic training Choi, E., Waldbillig, F., Jeong, M., Li, D., Goyal, R., Weber, P., Miernik, A., Grüne, B., Hein, S., Suarez-Ibarrola, R., Kriegmair, M. C., Qiu, T. Annals of Biomedical Engineering, 49(9):2412-2420, May 2021
Bladder cancer (BC) is the main disease of the urinary tract, has a high recurrence rate, and is diagnosed by cystoscopy (CY). Training in CY procedures therefore calls for a realistic bladder phantom with correct anatomy and physiological properties. Here, we report a soft bladder phantom (FlexBlad) that mimics many important features of a human bladder. Under filling, it shows a large volume expansion of more than 300%, with a compliance tunable in the range of 12.2 ± 2.8 to 32.7 ± 5.4 mL cmH2O^-1 by engineering the thickness of the bladder wall. By 3D printing and multi-step molding, detailed anatomical structures are represented on the inner bladder wall, including sub-millimeter blood vessels and reconfigurable bladder tumors. Endoscopic inspection and tumor biopsy were successfully performed. A multi-center study was carried out in which two groups of urologists with different experience levels executed consecutive CYs in the phantom and filled in questionnaires. The learning curves reveal that the FlexBlad has a positive effect on endourological training across different skill levels. The statistical results validate the usability of the phantom as a valuable educational tool, and its dynamic features expand its use as a versatile endoscopic training platform.
DOI URL BibTeX

Robotic Materials Conference Paper Soft Electrohydraulic Actuators for Origami Inspired Shape-Changing Interfaces Purnendu, , Acome, E., Keplinger, C., Gross, M. D., Bruns, C., Leithinger, D. In CHI EA ’21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 377, ACM, New York, NY, Conference on Human Factors in Computing Systems (CHI 2021), May 2021 (Published)
In this paper, we present electrohydraulic actuators for origami inspired shape-changing interfaces, which are capable of producing sharp hinge-like bends. These compliant actuators generate an immediate hydraulic force upon electrostatic activation without an external fluid supply source, are silent and fast in operation, and can be fabricated with commodity materials. We experimentally investigate the characteristics of these actuators and present application scenarios for actuating existing objects as well as origami folds. In addition, we present a software tool for the design and fabrication of shape-changing interfaces using these electrohydraulic actuators. We also discuss how this work opens avenues for other possible applications in Human Computer Interaction (HCI).
DOI URL BibTeX

Empirical Inference Conference Paper CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning Ahmed*, O., Träuble*, F., Goyal, A., Neitz, A., Bengio, Y., Schölkopf, B., Wüthrich, M., Bauer, S. In 9th International Conference on Learning Representations (ICLR 2021), 20, ICLR, Vienna, International Conference on Learning Representations (ICLR), May 2021, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper A teacher-student framework to distill future trajectories Neitz*, A., Parascandolo*, G., Schölkopf, B. In 9th International Conference on Learning Representations (ICLR), May 2021, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Fast And Slow Learning Of Recurrent Independent Mechanisms Madan, K., Ke, N. R., Goyal, A., Schölkopf, B., Bengio, Y. In 9th International Conference on Learning Representations (ICLR), May 2021 (Published) URL BibTeX

Empirical Inference Conference Paper Learning explanations that are hard to vary Parascandolo*, G., Neitz*, A., Orvieto, A., Gresele, L., Schölkopf, B. In 9th International Conference on Learning Representations (ICLR), May 2021, *equal contribution (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Predicting Infectiousness for Proactive Contact Tracing Bengio, Y., Gupta, P., Maharaj, T., Rahaman, N., Weiss, M., Deleu, T., Muller, E. B., Qu, M., Schmidt, V., St-Charles, P., Alsdurf, H., Bilaniuk, O., Buckeridge, D., Marceau-Caron, G., Carrier, P., Ghosn, J., Ortiz Gagne, S., Pal, C., Rish, I., Schölkopf, B., et al. In The Ninth International Conference on Learning Representations (ICLR 2021), 9th International Conference on Learning Representations (ICLR), May 2021 (Published) URL BibTeX

Empirical Inference Conference Paper Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator Paulus, M. B., Maddison, C. J., Krause, A. In The Ninth International Conference on Learning Representations, 9th International Conference on Learning Representations (ICLR 2021), May 2021 (Published) URL BibTeX

Empirical Inference Conference Paper Spatially Structured Recurrent Modules Rahaman, N., Goyal, A., Gondal, M. W., Wüthrich, M., Bauer, S., Sharma, Y., Bengio, Y., Schölkopf, B. In 9th International Conference on Learning Representations (ICLR), May 2021 (Published) URL BibTeX

Rationality Enhancement Conference Paper ’What Do You Want in Life and How Can You Get There?’ An Evaluation of a Hierarchical Goal-Setting Chatbot González Cruz, H., Prentice, M., Lieder, F. 13th Annual Meeting of the Society for the Science of Motivation, Abstract of presentation at the 13th SSM Virtual Congress, Society for the Science of Motivation, Virtual Congress, May 2021 (Published)
The translation of abstract, long-term goals, such as “make a contribution to the field of motivation science,” into short-term, actionable intentions is inherently difficult. Hierarchical goal-setting, a goal-setting strategy in which people construct a hierarchy of increasingly concrete and proximal subgoals, is a promising way to support this process. We designed a goal-setting chatbot that helps people craft action hierarchies for achieving their life goals. We conducted a large online field experiment with two follow-up surveys at one week and one month after the intervention to evaluate the effects of a brief hierarchical planning session with our chatbot on goal pursuit. Although there were no main effects of hierarchical planning on goal-related outcomes, exploratory analyses indicated that hierarchical goal-setting enabled people to make more progress towards goals that appeared less actionable. This suggests that supporting hierarchical goal-setting with chatbots is a promising approach to helping people who don’t know how to pursue their goals.
BibTeX

Empirical Inference Conference Paper Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills Tosatto, S., Chalvatzaki, G., Peters, J. 2021 IEEE International Conference on Robotics and Automation (ICRA), 10815-10821, IEEE, IEEE International Conference on Robotics and Automation (ICRA 2021), May 2021 (Published) DOI URL BibTeX

Empirical Inference Conference Paper Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning Lutter, M., Silberbauer, J., Watson, J., Peters, J. IEEE International Conference on Robotics and Automation (ICRA), 4163-4170, IEEE, May 2021 (Published) DOI URL BibTeX

Empirical Inference Conference Paper Directed Acyclic Graph Neural Network for Human Motion Prediction Li, Q., Chalvatzaki, G., Peters, J., Wang, Y. 2021 IEEE International Conference on Robotics and Automation (ICRA), 3197-3204, IEEE, IEEE International Conference on Robotics and Automation (ICRA 2021), May 2021 (Published) DOI URL BibTeX

Rationality Enhancement Conference Paper Evaluating Life Reflection Techniques to Help People Set Better Value-Driven Life Goals Prentice, M., González Cruz, H., Lieder, F. 13th Annual Conference of the Society for the Science of Motivation, Society for the Science of Motivation, 13th Annual Conference of the Society for the Science of Motivation, May 2021
We tested two reflection techniques derived from Acceptance Commitment Therapy for helping people set life goals that are self-determined, communal, and future-minded. Participants were assigned randomly to control, Eulogy, or the Valued Living Questionnaire (VLQ) conditions. Eulogy participants envisioned what they wanted people to say about them at their funeral. In VLQ, participants rated the importance of life domains and how consistent their behavior has recently been with the importance assigned to each domain. Participants then set a life goal, rated it for self-determination, and indicated its time horizon and life domain. Despite only requiring internal reflection, Eulogy was particularly effective for generating self-determined goals that were interpersonal and future-minded. The Eulogy exercise may be a useful and important building block for inspiring the setting and effective pursuit of goals that are simultaneously self-determined, communal, and future-minded. Future research will examine its efficacy in changing experienced well-being and enacted well-doing.
BibTeX

Autonomous Learning Conference Paper Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers Pinneri*, C., Sawant*, S., Blaes, S., Martius, G. In The Ninth International Conference on Learning Representations (ICLR), 9th International Conference on Learning Representations (ICLR 2021), May 2021, *equal contribution (Published)
Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, making these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor which prevents the optimizer from collapsing to the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door, without reward shaping.
OpenReview URL BibTeX
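For readers unfamiliar with the zero-order optimizer this paper builds on, the cross-entropy method in its generic form can be sketched as below. This is a minimal illustration on a toy quadratic cost, not the authors' implementation; the function names, population sizes, and the cost function are all illustrative choices.

```python
import numpy as np

def cem_minimize(cost, dim, iters=50, pop=100, elite=10, seed=0):
    """Generic cross-entropy method: iteratively sample candidates from a
    Gaussian, keep the lowest-cost elites, and refit the Gaussian to them."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        costs = np.array([cost(s) for s in samples])
        elites = samples[np.argsort(costs)[:elite]]
        # Refit the sampling distribution; the small floor keeps std > 0.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy example: minimize a shifted quadratic whose optimum is at (1, -2).
best = cem_minimize(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, dim=2)
print(best)  # approximately [1, -2]
```

In trajectory optimization, `dim` would instead be the flattened action sequence and `cost` a rollout through a learned model; the paper's contribution is distilling a policy from such an optimizer during training rather than the optimizer itself.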

Empirical Inference Conference Paper Learning Human-like Hand Reaching for Human-Robot Handshaking Prasad, V., Stock-Homburg, R., Peters, J. 2021 IEEE International Conference on Robotics and Automation (ICRA), 3612-3618, IEEE, IEEE International Conference on Robotics and Automation (ICRA 2021), May 2021 (Published) DOI URL BibTeX

Empirical Inference Conference Paper Meta Attention Networks: Meta-Learning Attention to Modulate Information Between Recurrent Independent Mechanisms Madan, K., Ke, N. R., Goyal, A., Schölkopf, B., Bengio, Y. 9th International Conference on Learning Representations (ICLR), May 2021 (Published) URL BibTeX

Empirical Inference Conference Paper Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning Morgan, A., Nandha, D., Chalvatzaki, G., D’Eramo, C., Dollar, A., Peters, J. IEEE International Conference on Robotics and Automation (ICRA), 6672-6678, IEEE, May 2021 (Published) DOI URL BibTeX

Empirical Inference Conference Paper On the Transfer of Disentangled Representations in Realistic Settings Dittadi*, A., Träuble*, F., Locatello, F., Wüthrich, M., Agrawal, V., Winther, O., Bauer, S., Schölkopf, B. In The Ninth International Conference on Learning Representations (ICLR), The 9th International Conference on Learning Representations (ICLR 2021), May 2021, *equal contribution (Published) URL BibTeX

Intelligent Control Systems Conference Paper Practical and Rigorous Uncertainty Bounds for Gaussian Process Regression Fiedler, C., Scherer, C. W., Trimpe, S. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, the Thirty-Third Conference on Innovative Applications of Artificial Intelligence, the Eleventh Symposium on Educational Advances in Artificial Intelligence, 8:7439-7447, AAAI Press, Palo Alto, CA, Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), Thirty-Third Conference on Innovative Applications of Artificial Intelligence (IAAI 2021), Eleventh Symposium on Educational Advances in Artificial Intelligence (EAAI 2021), May 2021
Gaussian Process regression is a popular nonparametric regression method based on Bayesian principles that provides uncertainty estimates for its predictions. However, these estimates are of a Bayesian nature, whereas for some important applications, like learning-based control with safety guarantees, frequentist uncertainty bounds are required. Although such rigorous bounds are available for Gaussian Processes, they are too conservative to be useful in applications. This often leads practitioners to replace these bounds with heuristics, thus breaking all theoretical guarantees. To address this problem, we introduce new uncertainty bounds that are rigorous, yet practically useful at the same time. In particular, the bounds can be explicitly evaluated and are much less conservative than state-of-the-art results. Furthermore, we show that certain model misspecifications lead to only graceful degradation. We demonstrate these advantages and the usefulness of our results for learning-based control with numerical examples.
URL BibTeX
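The bounds discussed above take the familiar form of a band mu(x) ± beta·sigma(x) around the GP posterior. The sketch below shows the standard GP posterior computation such bounds scale; the RBF kernel, the toy data, and the constant `beta = 2.0` are illustrative assumptions, not the rigorous scaling derived in the paper.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Standard GP posterior mean and standard deviation via Cholesky."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v * v, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Illustrative data and a heuristic band; the paper's point is replacing
# the placeholder beta below with a rigorous, non-conservative value.
x = np.linspace(0.0, 2.0, 10)
y = np.sin(2.0 * np.pi * x)
xs = np.linspace(0.0, 2.0, 50)
mu, sigma = gp_posterior(x, y, xs)
beta = 2.0  # placeholder scaling constant
lower, upper = mu - beta * sigma, mu + beta * sigma
```

A frequentist guarantee would assert that the true function lies inside `[lower, upper]` with high probability; the contribution of the paper is a rigorous yet practically small choice of the scaling in place of heuristics like the fixed `beta` above.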

Empirical Inference Conference Paper Recurrent Independent Mechanisms Goyal, A., Lamb, A., Hoffmann, J., Sodhani, S., Levine, S., Bengio, Y., Schölkopf, B. In The Ninth International Conference on Learning Representations (ICLR), 9th International Conference on Learning Representations (ICLR 2021), May 2021 (Published) URL BibTeX

Empirical Inference Conference Paper ResNet After All: Neural ODEs and Their Numerical Solution Ott, K., Katiyar, P., Hennig, P., Tiemann, M. In The Ninth International Conference on Learning Representations (ICLR 2021), 9th International Conference on Learning Representations (ICLR), May 2021 (Published) URL BibTeX

Haptic Intelligence Conference Paper Robot Interaction Studio: A Platform for Unsupervised HRI Mohan, M., Nunez, C. M., Kuchenbecker, K. J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 3330-3336, Xi’an, China, May 2021 (Published)
Robots hold great potential for supporting exercise and physical therapy, but such systems are often cumbersome to set up and require expert supervision. We aim to solve these concerns by combining Captury Live, a real-time markerless motion-capture system, with a Rethink Robotics Baxter Research Robot to create the Robot Interaction Studio. We evaluated this platform for unsupervised human-robot interaction (HRI) through a 75-minute-long user study with seven adults who were given minimal instructions and no feedback about their actions. The robot used sounds, facial expressions, facial colors, head motions, and arm motions to sequentially present three categories of cues in randomized order while constantly rotating its face screen to look at the user. Analysis of the captured user motions shows that the cue type significantly affected the distance subjects traveled and the amount of time they spent within the robot’s reachable workspace, in alignment with the design of the cues. Heat map visualizations of the recorded user hand positions confirm that users tended to mimic the robot’s arm poses. Despite some initial frustration, taking part in this study did not significantly change user opinions of the robot. We reflect on the advantages of the proposed approach to unsupervised HRI as well as the limitations and possible future extensions of our system.
DOI BibTeX

Haptic Intelligence Master Thesis Robotic Surgery Training in AR: Multimodal Record and Replay Krauthausen, F. University of Stuttgart, Stuttgart, Germany, May 2021, Study Program in Software Engineering (Published) BibTeX

Empirical Inference Probabilistic Learning Group Conference Paper Scaling Guarantees for Nearest Counterfactual Explanations Mohammadi, K., Karimi, A., Barthe, G., Valera, I. AIES ’21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 177-187, (Editors: Marion Fourcade, Benjamin Kuipers, Seth Lazar and Deirdre K. Mulligan), ACM, New York, NY, Fourth AAAI/ACM Conference on AI, Ethics, and Society (AIES 2021), May 2021 (Published) arXiv DOI BibTeX

Autonomous Learning Conference Paper Self-supervised Visual Reinforcement Learning with Object-centric Representations Zadaianchuk*, A., Seitzer*, M., Martius, G. In 9th International Conference on Learning Representations (ICLR 2021), May 2021, *equal contribution
Autonomous agents need large repertoires of skills to act reasonably on new tasks that they have not seen before. However, acquiring these skills using only a stream of high-dimensional, unstructured, and unlabeled observations is a tricky challenge for any autonomous agent. Previous methods have used variational autoencoders to encode a scene into a low-dimensional vector that can be used as a goal for an agent to discover new skills. Nevertheless, in compositional/multi-object environments it is difficult to disentangle all the factors of variation into such a fixed-length representation of the whole scene. We propose to use object-centric representations as a modular and structured observation space, which is learned with a compositional generative world model. We show that the structure in the representations in combination with goal-conditioned attention policies helps the autonomous agent to discover and learn useful skills. These skills can be further combined to address compositional tasks like the manipulation of several different objects.
Arxiv Code Paper @ ICLR 2021 (spotlight video) OpenReview BibTeX

Empirical Inference Conference Paper Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling Miladinović, D., Stanić, A., Bauer, S., Schmidhuber, J., Buhmann, J. M. In The 9th International Conference on Learning Representations (ICLR), The Ninth International Conference on Learning Representations (ICLR 2021), May 2021 (Published) URL BibTeX

Article Tails, Flails, and Sails: How Appendages Improve Terrestrial Maneuverability by Improving Stability Shield, S., Jericevich, R., Patel, A., Jusufi, A. Integrative and Comparative Biology, 61(2), May 2021 DOI BibTeX

Rationality Enhancement Technical Report Toward a Science of Effective Well-Doing Lieder, F., Prentice, M., Corwin-Renner, E. May 2021
Well-doing, broadly construed, encompasses acting and thinking in ways that contribute to humanity’s flourishing in the long run. This often takes the form of setting a prosocial goal and pursuing it over an extended period of time. To set and pursue goals in a way that is extremely beneficial for humanity (effective well-doing), people often have to employ critical thinking and far-sighted, rational decision-making in the service of the greater good. To promote effective well-doing, we need to better understand its determinants and psychological mechanisms, as well as the barriers to effective well-doing and how they can be overcome. In this article, we introduce a taxonomy of different forms of well-doing and introduce a conceptual model of the cognitive mechanisms of effective well-doing. We view effective well-doing as the upper end of a moral continuum whose lower half comprises behaviors that are harmful to humanity (ill-doing), and we argue that the capacity for effective well-doing has to be developed through personal growth (e.g., learning how to pursue goals effectively). Research on these phenomena has so far been scattered across numerous disconnected literatures from multiple disciplines. To bring these communities together, we call for the establishment of a transdisciplinary research field focussed on understanding and promoting effective well-doing and personal growth as well as understanding and reducing ill-doing. We define this research field in terms of its goals and questions. We review what is already known about these questions in different disciplines and argue that laying the scientific foundation for promoting effective well-doing is one of the most valuable contributions that the behavioral sciences can make in the 21st century.
Preprint BibTeX

Haptic Intelligence Conference Paper Ungrounded Vari-Dimensional Tactile Fingertip Feedback for Virtual Object Interaction Young, E. M., Kuchenbecker, K. J. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, (217)1-14, Yokohama, Japan, May 2021 (Published)
Compared to grounded force feedback, providing tactile feedback via a wearable device can free the user and broaden the potential applications of simulated physical interactions. However, neither the limitations nor the full potential of tactile-only feedback have been precisely examined. Here we investigate how the dimensionality of cutaneous fingertip feedback affects user movements and virtual object recognition. We combine a recently invented 6-DOF fingertip device with motion tracking, a head-mounted display, and novel contact-rendering algorithms to enable a user to tactilely explore immersive virtual environments. We evaluate rudimentary 1-DOF, moderate 3-DOF, and complex 6-DOF tactile feedback during shape discrimination and mass discrimination, also comparing to interactions with real objects. Results from 20 naive study participants show that higher-dimensional tactile feedback may indeed allow completion of a wider range of virtual tasks, but that feedback dimensionality surprisingly does not greatly affect the exploratory techniques employed by the user.
DOI BibTeX