Publications

Haptic Intelligence Miscellaneous Understanding the Pull-off Force of the Human Fingerpad Nam, S., Kuchenbecker, K. J. Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Tokyo, Japan, July 2019 (Published)
To understand the adhesive force that occurs when a finger pulls off of a smooth surface, we built an apparatus to measure the fingerpad’s moisture, normal force, and real contact area over time during interactions with a glass plate. We recorded a total of 450 trials (45 interactions by each of ten human subjects), capturing a wide range of values across the aforementioned variables. The experimental results showed that the pull-off force increases with larger finger contact area and faster detachment rate. Additionally, moisture generally increases the contact area of the finger, but too much moisture can restrict the increase in the pull-off force.
BibTeX

Rationality Enhancement Conference Paper What’s in the Adaptive Toolbox and How Do People Choose From It? Rational Models of Strategy Selection in Risky Choice Mohnert, F., Pachur, T., Lieder, F. 41st Annual Meeting of the Cognitive Science Society, July 2019
Although process data indicates that people often rely on various (often heuristic) strategies to choose between risky options, our models of heuristics cannot predict people's choices very accurately. To address this challenge, it has been proposed that people adaptively choose from a toolbox of simple strategies. But which strategies are contained in this toolbox? And how do people decide when to use which decision strategy? Here, we develop a model according to which each person selects decision strategies rationally from their personal toolbox; our model allows one to infer which strategies are contained in the cognitive toolbox of an individual decision-maker and specifies when she will use which strategy. Using cross-validation on an empirical data set, we find that this rational model of strategy selection from a personal adaptive toolbox predicts people's choices better than any single strategy (even when it is allowed to vary across participants) and better than previously proposed toolbox models. Our model comparisons show that both inferring the toolbox and rational strategy selection are critical for accurately predicting people's risky choices. Furthermore, our model-based data analysis reveals considerable individual differences in the set of strategies people are equipped with and how they choose among them; these individual differences could partly explain why some people make better choices than others. These findings represent an important step towards a complete formalization of the notion that people select their cognitive strategies from a personal adaptive toolbox.
URL BibTeX

Perceiving Systems Conference Paper Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., Black, M. J. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 12240-12249, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.
Paper URL BibTeX
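The alternating moderator/competitor training the abstract describes can be illustrated with a toy sketch. Here two plain scalar models stand in for the paper's specialized neural networks, and 1D data points stand in for pixels; the soft-assignment weighting and all constants are our assumptions, not the authors' implementation:

```python
import numpy as np

# Toy sketch of the Competitive Collaboration idea (illustrative, not the
# authors' code): two "competitors" each try to explain data points, while a
# moderator softly assigns each point to the model that currently explains it
# best. The alternating structure mirrors expectation-maximization.

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 0.1, 50),   # the "static" regime
                       rng.normal(3.0, 0.1, 50)])  # the "moving" regime

mu = np.array([1.0, 2.0])  # the two competitors' current explanations

for _ in range(20):
    # Moderator step: softly assign each point to the better competitor.
    err = (data[:, None] - mu[None, :]) ** 2
    w = np.exp(-err) / np.exp(-err).sum(axis=1, keepdims=True)
    # Competition step: each model refits using the points it won.
    mu = (w * data[:, None]).sum(axis=0) / w.sum(axis=0)

print(np.round(np.sort(mu), 2))  # the competitors split the two regimes
```

After a few alternations the two models specialize, one per regime, which is the cooperative outcome the framework relies on at pixel level.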

Autonomous Learning Article Autonomous Identification and Goal-Directed Invocation of Event-Predictive Behavioral Primitives Gumbsch, C., Butz, M. V., Martius, G. IEEE Transactions on Cognitive and Developmental Systems, 13(2):298-311, June 2019 (Published)
Voluntary behavior of humans appears to be composed of small, elementary building blocks or behavioral primitives. While this modular organization seems crucial for the learning of complex motor skills and the flexible adaptation of behavior to new circumstances, the problem of learning meaningful, compositional abstractions from sensorimotor experiences remains an open challenge. Here, we introduce a computational learning architecture, termed surprise-based behavioral modularization into event-predictive structures (SUBMODES), that explores behavior and identifies the underlying behavioral units completely from scratch. The SUBMODES architecture bootstraps sensorimotor exploration using a self-organizing neural controller. While exploring the behavioral capabilities of its own body, the system learns modular structures that predict the sensorimotor dynamics and generate the associated behavior. In line with recent theories of event perception, the system uses unexpected prediction error signals, i.e., surprise, to detect transitions between successive behavioral primitives. We show that, when applied to two robotic systems with completely different body kinematics, the system manages to learn a variety of complex behavioral primitives. Moreover, after initial self-exploration the system can use its learned predictive models progressively more effectively for invoking model predictive planning and goal-directed control in different tasks and environments.
arXiv PDF video DOI URL BibTeX
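The surprise signal used for segmentation (flagging a primitive transition wherever the prediction error jumps well above its running statistics) can be sketched minimally. The threshold rule, constants, and the synthetic error trace below are our assumptions for illustration, not the paper's architecture:

```python
import numpy as np

# Minimal sketch of surprise-based event segmentation: a forward model's
# prediction errors stay low within a behavioral primitive and spike at
# transitions; a boundary is flagged when the error exceeds a multiple of
# its slowly tracked running average.

def surprise_segments(errors, factor=3.0, warmup=5):
    """Return indices where error exceeds `factor` x its running mean."""
    boundaries = []
    running = np.mean(errors[:warmup])
    for t in range(warmup, len(errors)):
        if errors[t] > factor * running:
            boundaries.append(t)                   # surprising: primitive switch
        running = 0.9 * running + 0.1 * errors[t]  # slow error statistics
    return boundaries

# Synthetic error trace: low error within a primitive, spikes at switches.
errors = np.array([0.1] * 10 + [2.0] + [0.1] * 10 + [2.5] + [0.1] * 5)
print(surprise_segments(errors))  # -> [10, 21]
```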

Empirical Inference Conference Paper Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities Ganea, O., Gelly, S., Becigneul, G., Severyn, A. Proceedings of the 36th International Conference on Machine Learning (ICML), 97:2073-2082, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019 (Published) URL BibTeX

Perceiving Systems Conference Paper Capture, Learning, and Synthesis of 3D Speaking Styles Cudeiro, D., Bolkart, T., Laidlaw, C., Ranjan, A., Black, M. J. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 10101-10111, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation) takes any speech signal as input—even speech in languages other than English—and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.
code Project Page video paper BibTeX

Empirical Inference Conference Paper Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., Bachem, O. Proceedings of the 36th International Conference on Machine Learning (ICML), 97:4114-4124, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019 (Published) PDF URL BibTeX

Physical Intelligence Conference Paper Collective formation and cooperative function of a magnetic microrobotic swarm Dong, X., Sitti, M. Robotics: Science and Systems (RSS), June 2019
Untethered magnetically actuated microrobots can access distant, enclosed and small spaces, such as inside microfluidic channels and the human body, making them appealing for minimally invasive tasks. Despite the simplicity of individual magnetic microrobots, a collective of these microrobots that can work closely and cooperatively would significantly enhance their capabilities. However, a challenge of realizing such collective magnetic microrobots is to coordinate their formations and motions with underactuated control signals. Here, we report a method that allows collective magnetic microrobots to work closely and cooperatively by controlling their two-dimensional (2D) formations and collective motions in a programmable manner. The actively designed formation and intrinsic adjustable compliance within the group allow bio-inspired collective behaviors, such as navigating through cluttered environments, as well as reconfigurable cooperative manipulation. These collective magnetic microrobots could thus enable potential applications in programmable self-assembly, modular robotics, swarm robotics, and biomedicine.
Collective Formation and Cooperative Function of a Magnetic Microrobotic Swarm DOI BibTeX

Autonomous Vision Conference Paper Connecting the Dots: Learning Representations for Active Monocular Depth Estimation Riegler, G., Liao, Y., Donne, S., Koltun, V., Geiger, A. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
We propose a technique for depth estimation with a monocular structured-light camera, i.e., a calibrated stereo set-up with one camera and one laser projector. Instead of formulating the depth estimation via a correspondence search problem, we show that a simple convolutional architecture is sufficient for high-quality disparity estimates in this setting. As accurate ground-truth is hard to obtain, we train our model in a self-supervised fashion with a combination of photometric and geometric losses. Further, we demonstrate that the projected pattern of the structured light sensor can be reliably separated from the ambient information. This can then be used to improve depth boundaries in a weakly supervised fashion by modeling the joint statistics of image and depth edges. The model trained in this fashion compares favorably to the state-of-the-art on challenging synthetic and real-world datasets. In addition, we contribute a novel simulator, which allows benchmarking of active depth prediction algorithms in controlled conditions.
pdf suppmat Poster Project Page BibTeX

Intelligent Control Systems Conference Paper Data-driven inference of passivity properties via Gaussian process optimization Romer, A., Trimpe, S., Allgöwer, F. In Proceedings of the European Control Conference, European Control Conference (ECC), June 2019 (Published) PDF BibTeX

Perceiving Systems Conference Paper Expressive Body Capture: 3D Hands, Face, and Body from a Single Image Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A. A. A., Tzionas, D., Black, M. J. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) , 10975-10985, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.
video code pdf suppl poster DOI URL BibTeX
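The optimize-to-fit structure that SMPLify-X inherits from SMPLify (detect 2D features, then optimize model parameters against them under a prior) can be sketched with a deliberately tiny stand-in model. The two-parameter "body model", the detector, and all constants below are hypothetical; only the fitting-loop structure reflects the approach:

```python
import numpy as np

# Illustrative sketch of a SMPLify-style fitting loop (not the authors' code):
# optimize model parameters so the model's projected keypoints match detected
# 2D features, with a quadratic prior keeping parameters plausible.

template = np.array([[0.0, 0.0], [0.0, 1.0], [0.5, 1.5]])  # canonical joints

def project(scale, ty):
    """Hypothetical 2-parameter model: scale the template, translate in y."""
    return template * scale + np.array([0.0, ty])

detected = project(2.0, 0.5) + 0.01  # "2D features" from a fake detector

scale, ty, lr, prior_w = 1.0, 0.0, 0.05, 1e-3
for _ in range(500):
    residual = project(scale, ty) - detected          # data term
    g_scale = 2 * np.sum(residual * template) + 2 * prior_w * (scale - 1.0)
    g_ty = 2 * np.sum(residual[:, 1])                 # translation gradient
    scale, ty = scale - lr * g_scale, ty - lr * g_ty

print(round(scale, 2), round(ty, 2))  # close to the true (2.0, 0.5)
```

The real system replaces this toy with the full SMPL-X parameterization, learned pose priors, and an interpenetration penalty, but the fit-by-optimization loop is the same shape.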

Empirical Inference Conference Paper First-Order Adversarial Vulnerability of Neural Networks and Input Dimension Simon-Gabriel, C., Ollivier, Y., Bottou, L., Schölkopf, B., Lopez-Paz, D. Proceedings of the 36th International Conference on Machine Learning (ICML), 97:5809-5817, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019 (Published) PDF URL BibTeX

Empirical Inference Conference Paper Generate Semantically Similar Images with Kernel Mean Matching Jitkrittum*, W., Sangkloy*, P., Gondal, M. W., Raj, A., Hays, J., Schölkopf, B. 6th Workshop Women in Computer Vision (WiCV) (oral presentation), June 2019, *equal contribution (Published) BibTeX

Haptic Intelligence Article Implementation of a 6-DOF Parallel Continuum Manipulator for Delivering Fingertip Tactile Cues Young, E. M., Kuchenbecker, K. J. IEEE Transactions on Haptics, 12(3):295-306, June 2019 (Published)
Existing fingertip haptic devices can deliver different subsets of tactile cues in a compact package, but we have not yet seen a wearable six-degree-of-freedom (6-DOF) display. This paper presents the Fuppeteer (short for Fingertip Puppeteer), a device that is capable of controlling the position and orientation of a flat platform, such that any combination of normal and shear force can be delivered at any location on any human fingertip. We build on our previous work of designing a parallel continuum manipulator for fingertip haptics by presenting a motorized version in which six flexible Nitinol wires are actuated via independent roller mechanisms and proportional-derivative controllers. We evaluate the settling time and end-effector vibrations observed during system responses to step inputs. After creating a six-dimensional lookup table and adjusting simulated inputs using measured Jacobians, we show that the device can make contact with all parts of the fingertip with a mean error of 1.42 mm. Finally, we present results from a human-subject study. A total of 24 users discerned 9 evenly distributed contact locations with an average accuracy of 80.5%. Translational and rotational shear cues were identified reasonably well near the center of the fingertip and more poorly around the edges.
DOI BibTeX

Rationality Enhancement Conference Paper Introducing the Decision Advisor: A simple online tool that helps people overcome cognitive biases and experience less regret in real-life decisions Iwama, G., Greenberg, S., Moore, D., Lieder, F. 40th Annual Meeting of the Society for Judgment and Decision Making, June 2019 (Published)
Cognitive biases shape many decisions people come to regret. To help people overcome these biases, ClearerThinking.org developed a free online tool, called the Decision Advisor (https://programs.clearerthinking.org/decisionmaker.html). The Decision Advisor assists people in big real-life decisions by prompting them to generate more alternatives, guiding them to evaluate their alternatives according to principles of decision analysis, and educating them about pertinent biases while they are making their decision. In a within-subjects experiment, 99 participants reported significantly fewer biases and less regret for a decision supported by the Decision Advisor than for a previous unassisted decision.
DOI BibTeX

Empirical Inference Conference Paper Kernel Mean Matching for Content Addressability of GANs Jitkrittum*, W., Sangkloy*, P., Gondal, M. W., Raj, A., Hays, J., Schölkopf, B. Proceedings of the 36th International Conference on Machine Learning (ICML), 97:3140-3151, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019, *equal contribution (Published) PDF URL BibTeX

Perceiving Systems Conference Paper Learning Joint Reconstruction of Hands and Manipulated Objects Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M. J., Laptev, I., Schmid, C. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) , 11807-11816, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Estimating hand-object manipulations is essential for interpreting and imitating human actions. Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation. Yet, reconstructing hands and objects during manipulation is a more challenging task due to significant occlusions of both the hand and object. While presenting challenges, manipulations may also simplify the problem since the physics of contact restricts the space of valid hand-object configurations. For example, during manipulation, the hand and object should be in contact but not interpenetrate. In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. We present an end-to-end learnable model that exploits a novel contact loss that favors physically plausible hand-object constellations. Our approach improves grasp quality metrics over baselines, using RGB images as input. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. We demonstrate the transferability of ObMan-trained models to real data.
pdf suppl poster DOI URL BibTeX
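The physical intuition behind the contact loss (hands and objects should touch but not interpenetrate) can be sketched as a simple penalty over signed distances. This is an illustration in the spirit of the paper, not its exact formulation; the margin and weight are assumed values:

```python
import numpy as np

# Illustrative contact loss: hand vertices slightly outside the object are
# attracted to its surface (encouraging contact), while vertices inside the
# object are strongly penalized (forbidding interpenetration).

def contact_loss(signed_dists, attract_margin=0.02, w_repulse=10.0):
    """signed_dists: distance of each hand vertex to the object surface,
    negative inside the object (e.g., from a signed distance field)."""
    inside = np.minimum(signed_dists, 0.0)
    repulsion = w_repulse * np.sum(inside ** 2)        # no interpenetration
    near = (signed_dists > 0) & (signed_dists < attract_margin)
    attraction = np.sum(signed_dists[near] ** 2)       # encourage contact
    return repulsion + attraction

d = np.array([-0.01, 0.001, 0.015, 0.5])  # one inside, two near, one far
print(contact_loss(d))
```

Minimizing such a term during joint reconstruction pushes the optimizer toward physically plausible hand-object configurations.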

Autonomous Vision Conference Paper Learning Non-volumetric Depth Fusion using Successive Reprojections Donne, S., Geiger, A. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Given a set of input views, multi-view stereopsis techniques estimate depth maps to represent the 3D reconstruction of the scene; these are fused into a single, consistent reconstruction -- most often a point cloud. In this work we propose to learn an auto-regressive depth refinement directly from data. While deep learning has improved the accuracy and speed of depth estimation significantly, learned MVS techniques remain limited to the plane-sweeping paradigm. We refine a set of input depth maps by successively reprojecting information from neighbouring views to leverage multi-view constraints. Compared to learning-based volumetric fusion techniques, an image-based representation allows significantly more detailed reconstructions; compared to traditional point-based techniques, our method learns noise suppression and surface completion in a data-driven fashion. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. Our approach is able to improve both the output depth maps and the reconstructed point cloud, for both learned and traditional depth estimation front-ends, on both synthetic and real data.
pdf suppmat Project Page Video Poster blog BibTeX

Perceiving Systems Conference Paper Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision Sanyal, S., Bolkart, T., Feng, H., Black, M. J. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 7763-7772, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual’s face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally we create a new database of faces “not quite in-the-wild” (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes.
code pdf preprint URL BibTeX
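The core observation (shape codes from different images of the same person should agree, while a different person's code should stay apart) can be written as a small consistency loss. This sketch only illustrates the idea; the paper's actual ring construction, margins, and code dimensions differ:

```python
import numpy as np

# Sketch of a RingNet-style shape-consistency loss: pull together shape
# codes predicted for the same identity, push a different identity's code
# at least a margin away from the anchor.

def ring_loss(same_codes, other_code, margin=1.0):
    anchor = same_codes[0]
    pull = sum(np.sum((anchor - c) ** 2) for c in same_codes[1:])
    push = max(0.0, margin - np.sum((anchor - other_code) ** 2))
    return pull + push

a1, a2 = np.array([1.0, 0.0]), np.array([1.1, 0.0])   # same identity
b = np.array([-1.0, 0.5])                             # different identity
print(ring_loss([a1, a2], b))
```

Because the loss never references 3D ground truth, it can be trained on in-the-wild images, which is the point of the paper's "without 3D supervision" setup.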

Perceiving Systems Empirical Inference Conference Paper Local Temporal Bilinear Pooling for Fine-grained Action Parsing Zhang, Y., Tang, S., Muandet, K., Jarvers, C., Neumann, H. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 12005-12015, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Fine-grained temporal action parsing is important in many applications, such as daily activity understanding, human motion analysis, surgical robotics and others requiring subtle and precise operations over long time periods. In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional encoder-decoder net. In contrast to other work, our proposed bilinear pooling is learnable and hence can capture more complex local statistics than the conventional counterpart. In addition, we introduce exact lower-dimension representations of our bilinear forms, so that the dimensionality is reduced with neither information loss nor extra computation. We perform intensive experiments to quantitatively analyze our model and show superior performance compared to other state-of-the-art work on various datasets.
Code video demo pdf URL BibTeX
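The baseline operation the paper builds on, local temporal bilinear pooling, captures second-order statistics within a sliding window instead of a plain average. A minimal non-learnable version can be sketched as follows (the paper's learnable, dimension-reduced variant is not reproduced here):

```python
import numpy as np

# Plain local temporal bilinear pooling: within each sliding window over
# time, average the outer products of frame features, capturing pairwise
# feature interactions rather than first-order means.

def local_bilinear_pool(feats, window):
    """feats: (T, D) frame features -> (T - window + 1, D * D) pooled."""
    T, D = feats.shape
    out = []
    for t in range(T - window + 1):
        w = feats[t:t + window]                        # local temporal window
        outer = np.einsum('td,te->de', w, w) / window  # mean outer product
        out.append(outer.ravel())
    return np.stack(out)

feats = np.random.default_rng(0).normal(size=(10, 4))
print(local_bilinear_pool(feats, window=3).shape)  # (8, 16)
```

The D*D output dimensionality is what motivates the paper's exact lower-dimensional representations of the bilinear form.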

Autonomous Vision Conference Paper MOTS: Multi-Object Tracking and Segmentation Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B. B. G., Geiger, A., Leibe, B. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend existing multi-object tracking metrics to this new task. Moreover, we propose a new baseline method which jointly addresses detection, tracking, and segmentation with a single convolutional network. We demonstrate the value of our datasets by achieving improvements in performance when training on MOTS annotations. We believe that our datasets, metrics and baseline will become a valuable resource towards developing multi-object tracking approaches that go beyond 2D bounding boxes.
pdf suppmat Project Page Poster Video Project Page BibTeX

Empirical Inference Conference Paper Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models Ialongo, A. D., Van Der Wilk, M., Hensman, J., Rasmussen, C. E. In Proceedings of the 36th International Conference on Machine Learning (ICML), 97:2931-2940, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019 (Published) PDF URL BibTeX

Autonomous Vision Conference Paper PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds Behl, A., Paschalidou, D., Donne, S., Geiger, A. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation equivariant representation to circumvent this problem. For training our deep network, a large dataset is required. Because of this, we augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.
pdf suppmat Project Page Poster Video BibTeX

Empirical Inference Conference Paper Projections for Approximate Policy Iteration Algorithms Akrour, R., Pajarinen, J., Peters, J., Neumann, G. Proceedings of the 36th International Conference on Machine Learning (ICML), 97:181-190, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019 (Published) URL BibTeX

Intelligent Control Systems Article Resource-aware IoT Control: Saving Communication through Predictive Triggering Trimpe, S., Baumann, D. IEEE Internet of Things Journal, 6(3):5013-5028, June 2019 (Published)
The Internet of Things (IoT) interconnects multiple physical devices in large-scale networks. When the 'things' coordinate decisions and act collectively on shared information, feedback is introduced between them. Multiple feedback loops are thus closed over a shared, general-purpose network. Traditional feedback control is unsuitable for design of IoT control because it relies on high-rate periodic communication and is ignorant of the shared network resource. Therefore, recent event-based estimation methods are applied herein for resource-aware IoT control allowing agents to decide online whether communication with other agents is needed, or not. While this can reduce network traffic significantly, a severe limitation of typical event-based approaches is the need for instantaneous triggering decisions that leave no time to reallocate freed resources (e.g., communication slots), which hence remain unused. To address this problem, novel predictive and self-triggering protocols are proposed herein. From a unified Bayesian decision framework, two schemes are developed: self-triggers that predict, at the current triggering instant, the next one; and predictive triggers that check at every time step whether communication will be needed at a given prediction horizon. The suitability of these triggers for feedback control is demonstrated in hardware experiments on a cart-pole, and scalability is discussed with a multi-vehicle simulation.
PDF arXiv DOI BibTeX
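A predictive trigger can be sketched for the simplest possible case, a scalar random walk whose remote estimate's error variance grows by q per step between transmissions. This is a drastic simplification of the paper's Bayesian framework, and the constants are assumed values; it only shows how announcing a slot ahead of time differs from instantaneous triggering:

```python
# Sketch of a predictive trigger (scalar random-walk simplification, not the
# authors' full framework): the agent reserves a communication slot `horizon`
# steps ahead as soon as the predicted error variance would exceed delta,
# leaving the network time to reallocate freed resources.

def predictive_trigger(var_now, q, horizon, delta):
    """True if, with no transmission, the error variance would exceed delta
    within `horizon` steps (variance grows by q per step)."""
    return var_now + horizon * q > delta

q, delta, horizon = 0.125, 1.0, 3
var, pending, comms = 0.0, None, []
for t in range(30):
    if pending == t:
        comms.append(t)               # the reserved slot arrives: transmit
        var, pending = 0.0, None      # transmission resets the error variance
    elif pending is None and predictive_trigger(var, q, horizon, delta):
        pending = t + horizon         # announce a future slot in advance
    var += q                          # remote estimate degrades meanwhile

print(comms)  # -> [9, 18, 27]: each slot was announced 3 steps early
```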

Empirical Inference Conference Paper Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness Suter, R., Miladinovic, D., Schölkopf, B., Bauer, S. Proceedings of the 36th International Conference on Machine Learning (ICML), 97:6056-6065, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019 (Published) PDF URL BibTeX

Autonomous Vision Conference Paper Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids Paschalidou, D., Ulusoy, A. O., Geiger, A. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Abstracting complex 3D shapes with parsimonious part-based representations has been a long-standing goal in computer vision. This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. We demonstrate that superquadrics lead to more expressive 3D scene parses while being easier to learn than 3D cuboid representations. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computationally expensive reinforcement learning or iterative prediction. Our model learns to parse 3D objects into consistent superquadric representations without supervision. Results on various ShapeNet categories as well as the SURREAL human body dataset demonstrate the flexibility of our model in capturing fine details and complex poses that could not have been modelled using cuboids.
Project Page Poster suppmat pdf Video blog handout BibTeX
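The superquadric atoms used here admit a standard closed-form inside-outside function, which is what makes them attractive as learnable primitives. Below is that standard formulation (sizes a1..a3, shape exponents eps1, eps2); the paper's learned parsing and analytical Chamfer loss are not reproduced:

```python
import numpy as np

# Standard superquadric inside-outside function: a point is inside the
# primitive where f < 1, on its surface where f = 1, and outside where f > 1.
# With eps1 = eps2 = 1 the primitive degenerates to an ellipsoid.

def superquadric_f(p, a, eps1, eps2):
    x = np.abs(p[..., 0] / a[0])
    y = np.abs(p[..., 1] / a[1])
    z = np.abs(p[..., 2] / a[2])
    return ((x ** (2 / eps2) + y ** (2 / eps2)) ** (eps2 / eps1)
            + z ** (2 / eps1))

a = np.array([1.0, 1.0, 1.0])  # unit sizes -> unit sphere for eps = 1
print(superquadric_f(np.array([1.0, 0.0, 0.0]), a, 1.0, 1.0))      # 1.0, on surface
print(superquadric_f(np.array([2.0, 0.0, 0.0]), a, 1.0, 1.0) > 1)  # True, outside
```

Varying eps1 and eps2 continuously morphs the shape between box-like, ellipsoidal, and pinched forms, which is why superquadrics subsume cuboids as a special case.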

Empirical Inference Conference Paper Switching Linear Dynamics for Variational Bayes Filtering Becker-Ehmck, P., Peters, J., van der Smagt, P. Proceedings of the 36th International Conference on Machine Learning (ICML), 97:553-562, Proceedings of Machine Learning Research, (Editors: Chaudhuri, Kamalika and Salakhutdinov, Ruslan), PMLR, June 2019 (Published) URL BibTeX

Autonomous Vision Conference Paper Taking a Deeper Look at the Inverse Compositional Algorithm Lv, Z., Dellaert, F., Rehg, J. M., Geiger, A. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. We first discuss the assumptions made by this well-established technique, and subsequently propose to relax these assumptions by incorporating data-driven priors into this model. More specifically, we unroll a robust version of the inverse compositional algorithm and replace multiple components of this algorithm using more expressive models whose parameters we train in an end-to-end fashion from data. Our experiments on several challenging 3D rigid motion estimation tasks demonstrate the advantages of combining optimization with learning-based techniques, outperforming the classic inverse compositional algorithm as well as data-driven image-to-pose regression approaches.
pdf suppmat Video Project Page Poster BibTeX

Rationality Enhancement Conference Paper The Goal Characteristics (GC) questionnaire: A comprehensive measure for goals’ content, attainability, interestingness, and usefulness Iwama, G., Wirzberger, M., Lieder, F. 40th Annual Meeting of the Society for Judgment and Decision Making, June 2019
Many studies have investigated how goal characteristics affect goal achievement. However, most of them considered only a small number of characteristics, and the psychometric properties of their measures remain unclear. To overcome these limitations, we developed and validated a comprehensive questionnaire of goal characteristics with four subscales - measuring the goal’s content, attainability, interestingness, and usefulness respectively. 590 participants completed the questionnaire online. A confirmatory factor analysis supported the four subscales and their structure. The GC questionnaire (https://osf.io/qfhup) can be easily applied to investigate goal setting, pursuit and adjustment in a wide range of contexts.
DOI BibTeX

Intelligent Control Systems Conference Paper Trajectory-Based Off-Policy Deep Reinforcement Learning Doerr, A., Volpp, M., Toussaint, M., Trimpe, S., Daniel, C. In Proceedings of the International Conference on Machine Learning (ICML), International Conference on Machine Learning (ICML), June 2019 (Published)
Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.
arXiv PDF BibTeX
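The rollout-reuse idea mentioned in the abstract - reweighting returns collected under an earlier policy by likelihood ratios - can be sketched as a generic self-normalized importance-sampling estimator (this is a standard construction, not the authors' implementation; all names and data are illustrative):

```python
import numpy as np

def is_return_estimate(returns, logp_old, logp_new):
    """Estimate the expected return of a new policy from trajectories
    collected under an old one, via self-normalized importance sampling.

    returns:  per-trajectory returns under the old (behavioral) policy
    logp_old: log-probability of each trajectory under the old policy
    logp_new: log-probability of the same trajectories under the new policy
    """
    w = np.exp(logp_new - logp_old)  # per-trajectory likelihood ratios
    w = w / w.sum()                  # self-normalize to reduce variance
    return float(np.sum(w * returns))

# Sanity check: if the two policies coincide, all weights are equal and
# the estimate reduces to the plain Monte Carlo average of the returns.
returns = np.array([1.0, 2.0, 3.0, 4.0])
logp = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
est = is_return_estimate(returns, logp, logp)
```

As the policies diverge, the weights become uneven and the estimator's variance grows, which is why such reuse is typically combined with variance-control measures.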

Autonomous Learning Conference Paper Variational Autoencoders Pursue PCA Directions (by Accident) Rolinek, M., Zietlow, D., Martius, G. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 12406-12415, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
The Variational Autoencoder (VAE) is a powerful architecture capable of representation learning and generative modeling. When it comes to learning interpretable (disentangled) representations, VAE and its variants show unparalleled performance. However, the reasons for this are unclear, since a very particular alignment of the latent embedding is needed but the design of the VAE does not encourage it in any explicit way. We address this matter and offer the following explanation: the diagonal approximation in the encoder together with the inherent stochasticity force local orthogonality of the decoder. The local behavior of promoting both reconstruction and orthogonality matches closely how the PCA embedding is chosen. Alongside providing an intuitive understanding, we justify the statement with full theoretical analysis as well as with experiments.
arXiv URL BibTeX

Micro, Nano, and Molecular Systems Ph.D. Thesis The acoustic hologram and particle manipulation with structured acoustic fields Melde, K. Karlsruher Institut für Technologie (KIT), May 2019
This thesis presents holograms as a novel approach to create arbitrary ultrasound fields. It is shown how any wavefront can simply be encoded in the thickness profile of a phase plate. Contemporary 3D printers enable fabrication of structured surfaces with feature sizes corresponding to wavelengths of ultrasound up to 7.5 MHz in water, covering the majority of medical and industrial applications. The whole workflow for designing and creating acoustic holograms has been developed and is presented in this thesis. To reconstruct the encoded fields, a single transducer element is sufficient. Arbitrary fields are demonstrated in transmission and reflection configurations in water and air and validated by extensive hydrophone scans. To complement these time-consuming measurements, a new approach based on thermography is presented, which enables volumetric sound field scans in just a few seconds. Several original experiments demonstrate the advantages of using acoustic holograms for particle manipulation. Most notably, directed parallel assembly of microparticles in the shape of a projected acoustic image has been shown and extended to a fabrication method by fusing the particles in a polymerization reaction. Further, seemingly dynamic propulsion from a static hologram is demonstrated by controlling the phase gradient along a projected track. The complexity necessary to create ultrasound fields with set amplitude and phase distributions is easily managed using acoustic holograms. The acoustic hologram is a simple and cost-effective tool for shaping ultrasound fields with high fidelity. It is expected to have an impact in many applications where ultrasound is employed.
DOI URL BibTeX
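The thickness-to-phase encoding described above follows from a simple travel-time argument. Under the usual thin-plate assumption (the notation here is generic, not taken from the thesis), a plate element of thickness $T(x,y)$ with sound speed $c_p$, immersed in water with sound speed $c_0$ and wavenumber $k_0 = \omega/c_0$, imparts the relative phase

```latex
\Delta\varphi(x,y) = \omega\, T(x,y)\left(\frac{1}{c_p} - \frac{1}{c_0}\right)
                   = k_0\, T(x,y)\,(n - 1), \qquad n = \frac{c_0}{c_p},
```

so inverting this relation converts any target phase map into a printable height profile.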

Micro, Nano, and Molecular Systems Article Recent advances in gold nanoparticles for biomedical applications: from hybrid structures to multi-functionality Jeong, H., Choi, E., Ellis, E., Lee, T. J. of Mat. Chem. B, 7:3480, May 2019
Gold nanoparticles (Au NPs) are arguably the most versatile nanomaterials reported to date. Recent advances in nanofabrication and chemical synthesis have expanded the scope of Au NPs from classical homogeneous nanospheres to a wide range of hybrid nanostructures with programmable size, shape and composition. Novel physiochemical properties can be achieved via design and engineering of the hybrid nanostructures. In this review we discuss the recent progress in the development of complex hybrid Au NPs and propose a classification framework based on three fundamental structural dimensions (length scale, complexity and symmetry) to aid categorising, comparing and designing various types of Au NPs. Their novel functions and potential for biomedical applications will also be discussed, featuring point-of-care diagnostics by advanced optical spectroscopy and assays, as well as minimally invasive surgeries and targeted drug delivery using multifunctional nano-robots.
DOI URL BibTeX

Autonomous Vision Conference Paper Impact of Expertise on Interaction Preferences for Navigation Assistance of Visually Impaired Individuals Ahmetovic, D., Guerreiro, J., Ohn-Bar, E., Kitani, K. M., Asakawa, C. Proceedings International Web for All Conference (W4A), Association for Computing Machinery, 16th International Web for All Conference (W4A), May 2019 (Published) DOI BibTeX

Empirical Inference Conference Paper Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning Lutter, M., Ritter, C., Peters, J. 7th International Conference on Learning Representations (ICLR), ICLR, 7th International Conference on Learning Representations (ICLR), May 2019 (Published) URL BibTeX

Empirical Inference Conference Paper Meta-Learning Probabilistic Inference for Prediction Gordon, J., Bronskill, J., Bauer, M., Nowozin, S., Turner, R. 7th International Conference on Learning Representations (ICLR), ICLR, 7th International Conference on Learning Representations (ICLR), May 2019 (Published) URL BibTeX

Empirical Inference Conference Paper SOM-VAE: Interpretable Discrete Representation Learning on Time Series Fortuin, V., Hüser, M., Locatello, F., Strathmann, H., Rätsch, G. 7th International Conference on Learning Representations (ICLR), ICLR, 7th International Conference on Learning Representations (ICLR), May 2019 (Published) URL BibTeX

Haptic Intelligence Conference Paper A Clustering Approach to Categorizing 7 Degree-of-Freedom Arm Motions during Activities of Daily Living Gloumakov, Y., Spiers, A. J., Dollar, A. M. In Proceedings of the International Conference on Robotics and Automation (ICRA), 7214-7220, Montreal, Canada, May 2019 (Published)
In this paper we present a novel method of categorizing naturalistic human arm motions during activities of daily living using clustering techniques. While many current approaches attempt to define all arm motions using heuristic interpretation, or a combination of several abstract motion primitives, our unsupervised approach generates a hierarchical description of natural human motion with well-recognized groups. Reliable recommendation of a subset of motions for task achievement is beneficial to various fields, such as robotic and semi-autonomous prosthetic device applications. The proposed method makes use of well-known techniques such as dynamic time warping (DTW) to obtain a divergence measure between motion segments, DTW barycenter averaging (DBA) to get a motion average, and Ward's distance criterion to build the hierarchical tree. The clusters that emerge summarize the variety of recorded motions into the following general tasks: reach-to-front, transfer-box, drinking from vessel, on-table motion, turning a key or door knob, and reach-to-back pocket. The clustering methodology is justified by comparing against an alternative measure of divergence using Bézier coefficients and K-medoids clustering.
DOI BibTeX
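The pipeline named in the abstract - pairwise DTW divergences followed by Ward-linkage hierarchical clustering - can be mocked up on toy 1-D "motion segments" as follows (the DBA averaging step is omitted, and all data and names here are illustrative, not the authors' code):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy "motion segments": two slow ramps and two fast oscillations.
segments = [np.linspace(0, 1, 30),
            np.linspace(0, 1.1, 25),
            np.sin(np.linspace(0, 6 * np.pi, 30)),
            np.sin(np.linspace(0, 6 * np.pi, 28))]

# Pairwise DTW divergence matrix, condensed for scipy's linkage routine.
n = len(segments)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(segments[i], segments[j])

# Ward's criterion formally assumes Euclidean distances; applying it to a
# DTW matrix, as here, is a common pragmatic approximation.
tree = linkage(squareform(dist), method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
```

Cutting the resulting tree at a chosen number of clusters yields the motion groups; the two ramps and the two oscillations each end up in their own cluster.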

Theory of Inhomogeneous Condensed Matter Article Aging phenomena during phase separation in fluids: decay of autocorrelation for vapor-liquid transitions Roy, S., Bera, A., Majumder, S., Das, S. K. Soft Matter, 15(23):4743-4750, Royal Society of Chemistry, Cambridge, UK, May 2019 (Published)
We performed molecular dynamics simulations to study relaxation phenomena during vapor–liquid transitions in a single component Lennard-Jones system. Results from two different overall densities are presented: one in the neighborhood of the vapor branch of the coexistence curve and the other being close to the critical density. The nonequilibrium morphologies, growth mechanisms and growth laws in the two cases are vastly different. In the low density case growth occurs via diffusive coalescence of droplets in a disconnected morphology. On the other hand, the elongated structure in the higher density case grows via advective transport of particles inside the tube-like liquid domains. The objective in this work has been to identify how the decay of the order-parameter autocorrelation, an important quantity to understand aging dynamics, differs in the two cases. In the case of the disconnected morphology, we observe a very robust power-law decay, as a function of the ratio of the characteristic lengths at the observation time and at the age of the system, whereas the results for the percolating structure appear rather complex. To quantify the decay in the latter case, unlike the standard method followed in a previous study, here we have performed a finite-size scaling analysis. The outcome of this analysis shows the presence of a strong preasymptotic correction, while revealing that in this case also, albeit in the asymptotic limit, the decay follows a power-law. Even though the corresponding exponents in the two cases differ drastically, this study, combined with a few recent ones, suggests that power-law behavior of this correlation function is rather universal in coarsening dynamics.
DOI URL BibTeX

Empirical Inference Conference Paper Disentangled State Space Models: Unsupervised Learning of Dynamics across Heterogeneous Environments Miladinović*, D., Gondal*, M. W., Schölkopf, B., Buhmann, J. M., Bauer, S. Deep Generative Models for Highly Structured Data Workshop at ICLR, May 2019, *equal contribution (Published) URL BibTeX

Movement Generation and Control Conference Paper Efficient Humanoid Contact Planning using Learned Centroidal Dynamics Prediction Lin, Y., Ponton, B., Righetti, L., Berenson, D. International Conference on Robotics and Automation (ICRA), 5280-5286, IEEE, May 2019 (Published) DOI BibTeX

Haptic Intelligence Miscellaneous Explorations of Shape-Changing Haptic Interfaces for Blind and Sighted Pedestrian Navigation Spiers, A., Kuchenbecker, K. J. Workshop paper (6 pages) presented at the CHI Workshop on Hacking Blind Navigation, Glasgow, UK, May 2019 (Published)
Since the 1960s, technologists have worked to develop systems that facilitate independent navigation by vision-impaired (VI) pedestrians. These devices vary in terms of conveyed information and feedback modality. Unfortunately, many such prototypes never progress beyond laboratory testing. Conversely, smartphone-based navigation systems for sighted pedestrians have grown in robustness and capabilities, to the point of now being ubiquitous. How can we leverage the success of sighted navigation technology, which is driven by a larger global market, as a way to progress VI navigation systems? We believe one possibility is to make common devices that benefit both VI and sighted individuals, by providing information in a way that does not distract either user from their tasks or environment. To this end we have developed physical interfaces that eschew visual, audio or vibratory feedback, instead relying on the natural human ability to perceive the shape of a handheld object.
URL BibTeX

Empirical Inference Conference Paper Foundations and New Horizons for Causal Inference Meinshausen, N., Peters, J., Richardson, T. S., Schölkopf, B. In Oberwolfach Reports, 16(2):1499-1571, May 2019 (Published) DOI URL BibTeX

Haptic Intelligence Conference Paper Haptipedia: Accelerating Haptic Device Discovery to Support Interaction & Engineering Design Seifi, H., Fazlollahi, F., Oppermann, M., Sastrillo, J. A., Ip, J., Agrawal, A., Park, G., Kuchenbecker, K. J., MacLean, K. E. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), 1-12, Glasgow, UK, May 2019 (Published)
Creating haptic experiences often entails inventing, modifying, or selecting specialized hardware. However, experience designers are rarely engineers, and 30 years of haptic inventions are buried in a fragmented literature that describes devices mechanically rather than by potential purpose. We conceived of Haptipedia to unlock this trove of examples: Haptipedia presents a device corpus for exploration through metadata that matter to both device and experience designers. It is a taxonomy of device attributes that go beyond physical description to capture potential utility, applied to a growing database of 105 grounded force-feedback devices, and accessed through a public visualization that links utility to morphology. Haptipedia's design was driven by both systematic review of the haptic device literature and rich input from diverse haptic designers. We describe Haptipedia's reception (including hopes it will redefine device reporting standards) and our plans for its sustainability through community participation.
DOI BibTeX

Haptic Intelligence Conference Paper Improving Haptic Adjective Recognition with Unsupervised Feature Learning Richardson, B. A., Kuchenbecker, K. J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 3804-3810, Montreal, Canada, May 2019 (Published)
Humans can form an impression of how a new object feels simply by touching its surfaces with the densely innervated skin of the fingertips. Many haptics researchers have recently been working to endow robots with similar levels of haptic intelligence, but these efforts almost always employ hand-crafted features, which are brittle, and concrete tasks, such as object recognition. We applied unsupervised feature learning methods, specifically K-SVD and Spatio-Temporal Hierarchical Matching Pursuit (ST-HMP), to rich multi-modal haptic data from a diverse dataset. We then tested the learned features on 19 more abstract binary classification tasks that center on haptic adjectives such as smooth and squishy. The learned features proved superior to traditional hand-crafted features by a large margin, almost doubling the average F1 score across all adjectives. Additionally, particular exploratory procedures (EPs) and sensor channels were found to support perception of certain haptic adjectives, underlining the need for diverse interactions and multi-modal haptic data.
DOI BibTeX
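As a rough analogue of the pipeline described above - unsupervised sparse coding of raw sensor data followed by per-adjective binary classifiers - here is a toy sketch using scikit-learn's mini-batch dictionary learner as a stand-in for K-SVD/ST-HMP (the data are synthetic; nothing here is the authors' code or dataset):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for haptic signals: class 0 = smooth sinusoids,
# class 1 = rough noise bursts; 40 examples each, 64 samples long.
t = np.linspace(0, 1, 64)
smooth = np.stack([np.sin(2 * np.pi * (2 + rng.random()) * t)
                   for _ in range(40)])
rough = rng.normal(size=(40, 64))
X = np.vstack([smooth, rough])
y = np.array([0] * 40 + [1] * 40)

# Unsupervised sparse dictionary learning on the raw signals.
dico = MiniBatchDictionaryLearning(n_components=16, alpha=1.0, random_state=0)
codes = dico.fit_transform(X)  # sparse codes serve as learned features

# One binary classifier per "adjective" on the learned features.
clf = LogisticRegression(max_iter=1000).fit(codes, y)
acc = clf.score(codes, y)
```

The same recipe extends to multiple adjectives by training one such classifier per label on the shared learned features.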

Haptic Intelligence Conference Paper Internal Array Electrodes Improve the Spatial Resolution of Soft Tactile Sensors Based on Electrical Resistance Tomography Lee, H., Park, K., Kim, J., Kuchenbecker, K. J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 5411-5417, Montreal, Canada, May 2019, Hyosang Lee and Kyungseo Park contributed equally to this publication (Published) DOI BibTeX