Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Perceiving Systems Autonomous Vision Conference Paper Attacking Optical Flow Ranjan, A., Janai, J., Geiger, A., Black, M. J. In Proceedings International Conference on Computer Vision (ICCV), 2404-2413, IEEE, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), November 2019, ISSN: 2380-7504 (Published)
Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.
Video Project Page Paper Supplementary Material DOI URL BibTeX
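A minimal sketch of a white-box adversarial patch attack of the kind the paper studies, assuming a differentiable PyTorch optical-flow model `flow_net` that maps an image pair to a flow field; all names, the fixed patch location, and the hyperparameters are illustrative, not the authors' implementation.

```python
import torch

def attack_with_patch(flow_net, img1, img2, patch_size=32, steps=200, lr=1e-2):
    """Optimize a small image patch that maximally disturbs the predicted optical flow.

    flow_net: callable mapping (img1, img2) -> flow of shape (B, 2, H, W)
    img1, img2: tensors of shape (B, 3, H, W) with values in [0, 1]
    """
    B, _, H, W = img1.shape
    patch = torch.rand(1, 3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)

    with torch.no_grad():
        flow_clean = flow_net(img1, img2)          # reference (unattacked) flow

    y0, x0 = (H - patch_size) // 2, (W - patch_size) // 2   # fixed patch location
    for _ in range(steps):
        adv1, adv2 = img1.clone(), img2.clone()
        adv1[:, :, y0:y0 + patch_size, x0:x0 + patch_size] = patch.clamp(0, 1)
        adv2[:, :, y0:y0 + patch_size, x0:x0 + patch_size] = patch.clamp(0, 1)

        flow_adv = flow_net(adv1, adv2)
        # Maximize the end-point error with respect to the clean prediction.
        loss = -((flow_adv - flow_clean) ** 2).sum(dim=1).sqrt().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach().clamp(0, 1)
```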

Haptic Intelligence Miscellaneous A Fabric-based Scalable Robotic Skin Mimicking Biological Tactile Hyperacuity Lee, H., Park, K., Kim, J., Kuchenbecker, K. J. Workshop paper (3 pages) presented at the IROS RoboTac Workshop on New Advances in Tactile Sensation, Perception, and Learning in Robotics: Emerging Materials and Technologies for Manipulation, Macao, China, November 2019, Co-Winner of the Award for Best Poster (Published)
Implementing a whole-body tactile sensor is becoming a critical topic in robotics since physical contacts can occur at any location of the robot. Fabricating such a large-scale system typically requires complex electrical wiring to achieve high spatial resolution. Interestingly, biological skins have tactile hyperacuity, which is enabled by overlapping the receptive fields. This study introduces a fabric-based tactile sensor inspired by this biological feature. The tactile sensor injects electrical current into a pair of electrodes and measures the corresponding electrical potentials formed around the current pathway, which can be considered as a receptive field. When two or more neighboring pairs of electrodes are sampled, sensitive regions overlap in a way similar to the biological system. For the experiments, a fabric-based tactile sensor with only 24 electrodes in an area of 200 mm × 200 mm is developed. The sensor can localize point contact with an error of 8.13 mm, while the sensor’s minimum two-point discrimination distance is nearly 35 mm. This performance is comparable to that of the stomach region of human skin. This sensing approach could greatly simplify whole-body tactile skin development in the future.
BibTeX
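The localization principle, overlapping sensitive regions of neighboring electrode pairs, can be illustrated with a toy weighted-centroid estimate; the grid geometry, simulated readings, and weighting below are invented for illustration and are unrelated to the authors' hardware or algorithm.

```python
import numpy as np

def localize_contact(electrode_xy, responses):
    """Toy contact localization: weight each electrode-pair location by its response.

    electrode_xy: (N, 2) array of pair-center positions in mm
    responses:    (N,) array of baseline-subtracted potential changes
    """
    w = np.clip(responses, 0.0, None)            # ignore negative deviations
    w = w / (w.sum() + 1e-9)
    return w @ electrode_xy                      # weighted centroid, shape (2,)

# Example: a 5 x 5 grid over 200 mm x 200 mm and a simulated touch near (120, 80).
xs, ys = np.meshgrid(np.linspace(0, 200, 5), np.linspace(0, 200, 5))
centers = np.stack([xs.ravel(), ys.ravel()], axis=1)
touch = np.array([120.0, 80.0])
readings = np.exp(-np.linalg.norm(centers - touch, axis=1) ** 2 / (2 * 40.0 ** 2))
print(localize_contact(centers, readings))       # close to (120, 80)
```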

Perceiving Systems Conference Paper AirCap – Aerial Outdoor Motion Capture Ahmad, A., Price, E., Tallamraju, R., Saini, N., Lawless, G., Ludwig, R., Martinovic, I., Bülthoff, H. H., Black, M. J. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), Workshop on Aerial Swarms, November 2019
This paper presents an overview of the Grassroots project Aerial Outdoor Motion Capture (AirCap) running at the Max Planck Institute for Intelligent Systems. AirCap's goal is to achieve markerless, unconstrained, human motion capture (mocap) in unknown and unstructured outdoor environments. To that end, we have developed an autonomous flying motion capture system using a team of aerial vehicles (MAVs) with only on-board, monocular RGB cameras. We have conducted several real robot experiments involving up to 3 aerial vehicles autonomously tracking and following a person in several challenging scenarios using our approach of active cooperative perception developed in AirCap. Using the images captured by these robots during the experiments, we have demonstrated a successful offline body pose and shape estimation with sufficiently high accuracy. Overall, we have demonstrated the first fully autonomous flying motion capture system involving multiple robots for outdoor scenarios.
Talk slides BibTeX

Empirical Inference Conference Paper Chance-Constrained Trajectory Optimization for Non-linear Systems with Unknown Stochastic Dynamics Celik, O., Abdulsamad, H., Peters, J. International Conference on Intelligent Robots and Systems (IROS), 6828-6833, IEEE, November 2019 (Published) DOI BibTeX

Perceiving Systems Article Decoding subcategories of human bodies from both body- and face-responsive cortical regions Foster, C., Zhao, M., Romero, J., Black, M. J., Mohler, B. J., Bartels, A., Bülthoff, I. NeuroImage, 202(15):116085, November 2019
Our visual system can easily categorize objects (e.g. faces vs. bodies) and further differentiate them into subcategories (e.g. male vs. female). This ability is particularly important for objects of social significance, such as human faces and bodies. While many studies have demonstrated category selectivity to faces and bodies in the brain, how subcategories of faces and bodies are represented remains unclear. Here, we investigated how the brain encodes two prominent subcategories shared by both faces and bodies, sex and weight, and whether neural responses to these subcategories rely on low-level visual, high-level visual or semantic similarity. We recorded brain activity with fMRI while participants viewed faces and bodies that varied in sex, weight, and image size. The results showed that the sex of bodies can be decoded from both body- and face-responsive brain areas, with the former exhibiting more consistent size-invariant decoding than the latter. Body weight could also be decoded in face-responsive areas and in distributed body-responsive areas, and this decoding was also invariant to image size. The weight of faces could be decoded from the fusiform body area (FBA), and weight could be decoded across face and body stimuli in the extrastriate body area (EBA) and a distributed body-responsive area. The sex of well-controlled faces (e.g. excluding hairstyles) could not be decoded from face- or body-responsive regions. These results demonstrate that both face- and body-responsive brain regions encode information that can distinguish the sex and weight of bodies. Moreover, the neural patterns corresponding to sex and weight were invariant to image size and could sometimes generalize across face and body stimuli, suggesting that such subcategorical information is encoded with a high-level visual or semantic code.
paper pdf DOI BibTeX

Empirical Inference Conference Paper Deep Lagrangian Networks for end-to-end learning of energy-based control for under-actuated systems Lutter, M., Listmann, K., Peters, J. International Conference on Intelligent Robots and Systems (IROS), 7718-7725, IEEE, November 2019 (Published) DOI BibTeX

Intelligent Control Systems Article Fast Feedback Control over Multi-hop Wireless Networks with Mode Changes and Stability Guarantees Baumann, D., Mager, F., Jacob, R., Thiele, L., Zimmerling, M., Trimpe, S. ACM Transactions on Cyber-Physical Systems, 4(2):18, November 2019 (Published) arXiv PDF DOI BibTeX

Empirical Inference Conference Paper Generalized Multiple Correlation Coefficient as a Similarity Measurement between Trajectories Urain, J., Peters, J. International Conference on Intelligent Robots and Systems (IROS), 1363-1369, IEEE, November 2019 (Published) DOI BibTeX

Haptic Intelligence Miscellaneous HuggieChest: An Inflatable Haptic Sensing Chest for a Hugging Robot Block, A. E., Kuchenbecker, K. J. Workshop paper (4 pages) presented at the IROS RoboTac Workshop on New Advances in Tactile Sensation, Perception, and Learning in Robotics: Emerging Materials and Technologies for Manipulation, Macao, China, November 2019 (Published)
During hugs, humans naturally provide and intuit subtle non-verbal cues that signify the desired strength and duration of an exchanged hug. Personal preferences for this close interaction may vary greatly between people; robots do not currently have the abilities to perceive or understand these preferences. This workshop paper discusses designing, building, and testing a novel inflatable chest that can simultaneously soften a robot and act as a tactile sensor to enable more natural and responsive hugging. Using PVC vinyl, two microphones, and two barometric pressure sensors, we created an inflatable two-chambered chest that forms the torso of a hugging robot. One chamber is located in the front of the robot, and the other chamber is in the back. While contacting HuggieChest in several ways common in hugs (start hug, rub, scratch, pat, squeeze, release), we recorded data from the two sensors in each chamber. The preliminary results suggest that the complementary haptic sensing channels allow the robot wearing the chest to detect coarse and fine contacts typically experienced during hugs, regardless of where the user contacts the robot. We also verified that we can detect contacts regardless of noise from the robot’s movement, as long as the HuggieChest is inflated within a certain pressure range.
BibTeX

Empirical Inference Conference Paper Multimodal Uncertainty Reduction for Intention Recognition in Human-Robot Interaction Trick, S., Koert, D., Peters, J., Rothkopf, C. A. International Conference on Intelligent Robots and Systems (IROS), 7009-7016, IEEE, November 2019 (Published) DOI BibTeX

Empirical Inference Conference Paper Receding Horizon Curiosity Schultheis, M., Belousov, B., Abdulsamad, H., Peters, J. Proceedings of the 3rd Annual Conference on Robot Learning (CoRL), 100:1278-1288, Proceedings of Machine Learning Research, (Editors: Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura), PMLR, November 2019 (Published) URL BibTeX

Empirical Inference Conference Paper Reinforcement Learning of Trajectory Distributions: Applications in Assisted Teleoperation and Motion Planning Ewerton, M., Guilherme, M., Koert, D., Kolev, Z., Takahashi, M., Peters, J. International Conference on Intelligent Robots and Systems (IROS), 4294-4300, IEEE, November 2019 (Published) DOI BibTeX

Haptic Intelligence Miscellaneous Robust Visual Augmented Reality for Robot-Assisted Surgery Forte, M., Kuchenbecker, K. J. Extended abstract presented as a podium presentation at the IROS Workshop on Legacy Disruptors in Applied Telerobotics, Macao, China, November 2019 (Published) BibTeX

Physics for Inference and Optimization Article Sampling on Networks: Estimating Eigenvector Centrality on Incomplete Networks Ruggeri, N., De Bacco, C. International Conference on Complex Networks and Their Applications, November 2019 (Published)
We develop a new sampling method to estimate eigenvector centrality on incomplete networks. Our goal is to estimate this global centrality measure with only a limited amount of data at our disposal. This is the case in many real-world scenarios where data collection is expensive, the network is too big for the available storage capacity, or only partial information is available. The sampling algorithm is theoretically grounded by results derived from spectral approximation theory. We studied the problem on both synthetic and real data and tested the performance against traditional methods, such as random-walk and uniform sampling. We show that the approximations obtained from such methods are not always reliable and that our algorithm, while preserving computational scalability, improves performance under different error measures.
Code Preprint pdf DOI BibTeX
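A small illustration of the underlying task, estimating eigenvector centrality when only part of the network is observed, using plain power iteration; the naive uniform node sampling shown here is a baseline of the kind the paper compares against, not the authors' spectral-approximation-based sampler.

```python
import numpy as np

def eigenvector_centrality(A, iters=200, tol=1e-10):
    """Power iteration for the leading eigenvector of a symmetric adjacency matrix."""
    x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        x_new = A @ x
        x_new /= np.linalg.norm(x_new) + 1e-12
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

rng = np.random.default_rng(0)
n = 300
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                        # random undirected graph

full = eigenvector_centrality(A)

observed = rng.choice(n, size=n // 3, replace=False)  # naive uniform node sample
A_sub = A[np.ix_(observed, observed)]
partial = eigenvector_centrality(A_sub)

# Compare the estimates on the observed nodes only.
corr = np.corrcoef(full[observed], partial)[0, 1]
print(f"Correlation between full and sampled centrality: {corr:.2f}")
```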

Empirical Inference Conference Paper Self-Paced Contextual Reinforcement Learning Klink, P., Abdulsamad, H., Belousov, B., Peters, J. Proceedings of the 3rd Annual Conference on Robot Learning (CoRL), 100:513-529, Proceedings of Machine Learning Research, (Editors: Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura), PMLR, November 2019 (Published) URL BibTeX

Empirical Inference Conference Paper Stochastic Optimal Control as Approximate Input Inference Watson, J., Abdulsamad, H., Peters, J. Proceedings of the 3rd Annual Conference on Robot Learning (CoRL), 100:697-716, Proceedings of Machine Learning Research, (Editors: Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura), PMLR, November 2019 (Published) URL BibTeX

Empirical Inference Conference Paper HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints Lutter, M., Belousov, B., Listmann, K., Clever, D., Peters, J. Proceedings of the 3rd Annual Conference on Robot Learning (CoRL), 100:640-650, Proceedings of Machine Learning Research, (Editors: Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura), PMLR, November 2019 (Published) URL BibTeX

Perceiving Systems Conference Paper Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop Kolotouros, N., Pavlakos, G., Black, M. J., Daniilidis, K. Proceedings International Conference on Computer Vision (ICCV), 2252-2261, IEEE, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 2019, ISSN: 2380-7504 (Published)
Model-based human pose estimation is currently approached through two different paradigms. Optimization-based methods fit a parametric body model to 2D observations in an iterative manner, leading to accurate image-model alignments, but are often slow and sensitive to the initialization. In contrast, regression-based methods, that use a deep network to directly estimate the model parameters from pixels, tend to provide reasonable, but not pixel accurate, results while requiring huge amounts of supervision. In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. A reasonable, directly regressed estimate from the network can initialize the iterative optimization making the fitting faster and more accurate. Similarly, a pixel accurate fit from iterative optimization can act as strong supervision for the network. This is the core of our proposed approach SPIN (SMPL oPtimization IN the loop). The deep network initializes an iterative optimization routine that fits the body model to 2D joints within the training loop, and the fitted estimate is subsequently used to supervise the network. Our approach is self-improving by nature, since better network estimates can lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network. We demonstrate the effectiveness of our approach in different settings, where 3D ground truth is scarce, or not available, and we consistently outperform the state-of-the-art model-based pose estimation approaches by significant margins.
pdf code project DOI BibTeX
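A schematic of the optimization-in-the-loop idea described above: the regressor initializes an iterative model fit, and the fitted parameters then supervise the regressor. It is written against hypothetical callables (`regressor`, `fit_model_to_2d`) and a plain MSE loss, so it is a structural sketch rather than the released SPIN code.

```python
import torch

def spin_style_training_step(regressor, fit_model_to_2d, images, joints_2d, optimizer):
    """One training step in the spirit of SPIN.

    regressor:        nn.Module mapping images -> body model parameters theta
    fit_model_to_2d:  callable, (theta_init, joints_2d) -> refined theta (in-the-loop fit)
    images:           (B, 3, H, W) tensor
    joints_2d:        (B, J, 2) tensor of detected 2D keypoints
    """
    theta_pred = regressor(images)

    # Run the iterative fit, initialized from the network's own prediction.
    with torch.no_grad():
        theta_fit = fit_model_to_2d(theta_pred.detach(), joints_2d)

    # The fitted parameters act as pseudo ground truth for the regressor.
    loss = torch.nn.functional.mse_loss(theta_pred, theta_fit)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```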

Perceiving Systems Conference Paper Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles Saini, N., Price, E., Tallamraju, R., Enficiaud, R., Ludwig, R., Martinović, I., Ahmad, A., Black, M. Proceedings 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 823-832, IEEE, International Conference on Computer Vision (ICCV), October 2019 (Published)
Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro aerial vehicles (MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. We then develop a markerless motion capture method that is suitable for this challenging scenario with a distant subject, viewed from above, with approximately calibrated and moving cameras. We combine multiple state-of-the-art 2D joint detectors with a 3D human body model and a powerful prior on human pose. We jointly optimize for 3D body pose and camera pose to robustly fit the 2D measurements. To our knowledge, this is the first successful demonstration of outdoor, full-body, markerless motion capture from autonomous flying vehicles.
Code Data Video Paper Manuscript DOI BibTeX

Perceiving Systems Conference Paper Resolving 3D Human Pose Ambiguities with 3D Scene Constraints Hassan, M., Choutas, V., Tzionas, D., Black, M. J. In International Conference on Computer Vision (ICCV), 2282-2292, October 2019 (Published)
To understand and analyze human behavior, we need to capture humans moving in, and interacting with, the world. Most existing methods perform 3D human pose estimation without explicitly considering the scene. We observe however that the world constrains the body and vice-versa. To motivate this, we show that current 3D human pose estimation methods produce results that are not consistent with the 3D scene. Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. The method enforces Proximal Relationships with Object eXclusion and is called PROX. To test this, we collect a new dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the scenes. We represent human pose using the 3D human body model SMPL-X and extend SMPLify-X to estimate body pose using scene constraints. We make use of the 3D scene information by formulating two main constraints. The interpenetration constraint penalizes intersection between the body model and the surrounding 3D scene. The contact constraint encourages specific parts of the body to be in contact with scene surfaces if they are close enough in distance and orientation. For quantitative evaluation we capture a separate dataset with 180 RGB frames in which the ground-truth body pose is estimated using a motion-capture system. We show quantitatively that introducing scene constraints significantly reduces 3D joint error and vertex error. Our code and data are available for research at https://prox.is.tue.mpg.de.
pdf poster DOI URL BibTeX
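The two scene constraints described above can be written as simple penalty terms. The sketch below assumes a callable `scene_sdf` returning the signed distance of 3D points to the scene surface (negative inside geometry) and uses invented weights and thresholds; it illustrates the idea and omits details such as the surface-orientation check, so it is not a reproduction of PROX.

```python
import torch

def interpenetration_penalty(body_vertices, scene_sdf):
    """Penalize body vertices that lie inside scene geometry (negative signed distance)."""
    d = scene_sdf(body_vertices)                  # (V,) signed distances
    return torch.clamp(-d, min=0.0).pow(2).sum()

def contact_term(contact_vertices, scene_sdf, max_dist=0.02):
    """Pull designated contact vertices (feet, thighs, hands, ...) onto nearby surfaces."""
    d = scene_sdf(contact_vertices).abs()         # distance to the nearest surface
    near = d < max_dist                           # only vertices already close to the scene
    return torch.where(near, d, torch.zeros_like(d)).sum()

def scene_aware_objective(data_term, body_vertices, contact_vertices, scene_sdf,
                          w_pen=100.0, w_contact=1.0):
    """Combine an image data term with the two scene constraints (weights are illustrative)."""
    return (data_term
            + w_pen * interpenetration_penalty(body_vertices, scene_sdf)
            + w_contact * contact_term(contact_vertices, scene_sdf))
```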

Embodied Vision Conference Paper EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association Strecke, M., Stückler, J. In Proceedings IEEE/CVF International Conference on Computer Vision 2019 (ICCV), 5864-5873, IEEE, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 2019 (Published) preprint Project page Code Poster DOI BibTeX

Perceiving Systems Conference Paper Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images "In the Wild" Zuffi, S., Kanazawa, A., Berger-Wolf, T., Black, M. J. In International Conference on Computer Vision, 5358-5367, IEEE, International Conference on Computer Vision, October 2019 (Published)
We present the first method to perform automatic 3D pose, shape and texture capture of animals from images acquired in-the-wild. In particular, we focus on the problem of capturing 3D information about Grevy's zebras from a collection of images. The Grevy's zebra is one of the most endangered species in Africa, with only a few thousand individuals left. Capturing the shape and pose of these animals can provide biologists and conservationists with information about animal health and behavior. In contrast to research on human pose, shape and texture estimation, training data for endangered species is limited, the animals are in complex natural scenes with occlusion, they are naturally camouflaged, travel in herds, and look similar to each other. To overcome these challenges, we integrate the recent SMAL animal model into a network-based regression pipeline, which we train end-to-end on synthetically generated images with pose, shape, and background variation. Going beyond state-of-the-art methods for human shape and pose estimation, our method learns a shape space for zebras during training. Learning such a shape space from images using only a photometric loss is novel, and the approach can be used to learn shape in other settings with limited 3D supervision. Moreover, we couple 3D pose and shape prediction with the task of texture synthesis, obtaining a full texture map of the animal from a single image. We show that the predicted texture map allows a novel per-instance unsupervised optimization over the network features. This method, SMALST (SMAL with learned Shape and Texture) goes beyond previous work, which assumed manual keypoints and/or segmentation, to regress directly from pixels to 3D animal shape, pose and texture. Code and data are available at https://github.com/silviazuffi/smalst
code pdf supmat iccv19 presentation DOI BibTeX

Micro, Nano, and Molecular Systems Article A Helical Microrobot with an Optimized Propeller-Shape for Propulsion in Viscoelastic Biological Media Li, D., Jeong, M., Oren, E., Yu, T., Qiu, T. Robotics, 8:87, MDPI, October 2019
One major challenge for microrobots is to penetrate and effectively move through viscoelastic biological tissues. Most existing microrobots can only propel in viscous liquids. Recent advances demonstrate that sub-micron robots can actively penetrate nanoporous biological tissue, such as the vitreous of the eye. However, it is still difficult to propel a micron-sized device through dense biological tissue. Here, we report that a special twisted helical shape together with a high aspect ratio in cross-section permit a microrobot with a diameter of hundreds-of-micrometers to move through mouse liver tissue. The helical microrobot is driven by a rotating magnetic field and localized by ultrasound imaging inside the tissue. The twisted ribbon is made of molybdenum and a sharp tip is chemically etched to generate a higher pressure at the edge of the propeller to break the biopolymeric network of the dense tissue.
DOI URL BibTeX

Dynamic Locomotion Conference Paper Trunk Pitch Oscillations for Joint Load Redistribution in Humans and Humanoid Robots Drama, Ö., Badri-Spröwitz, A. Proceedings of 2019 IEEE-RAS 19th International Conference on Humanoid Robots, 531-536, IEEE, Humanoids, October 2019 (Published)
Creating natural-looking running gaits for humanoid robots is a complex task due to the underactuated degree of freedom in the trunk, which makes the motion planning and control difficult. The research on trunk movements in human locomotion is insufficient, and no formalism is known to transfer human motion patterns onto robots. Related work mostly focuses on the lower extremities, and simplifies the problem by stabilizing the trunk at a fixed angle. In contrast, humans display significant trunk motions that follow the natural dynamics of the gait. In this work, we use a spring-loaded inverted pendulum model with a trunk (TSLIP) together with a virtual point (VP) target to create trunk oscillations and investigate the impact of these movements. We analyze how the VP location and forward speed determine the direction and magnitude of the trunk oscillations. We show that positioning the VP below the center of mass (CoM) can explain the forward trunk pitching observed in human running. The VP below the CoM leads to a synergistic work between the hip and leg, reducing the leg loading. However, it comes at the cost of increased peak hip torque. Our results provide insights for leveraging the trunk motion to redistribute joint loads and potentially improve the energy efficiency in humanoid robots.
DOI URL BibTeX
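As a compact, hedged summary (the notation is ours, not the paper's): in a TSLIP-type template the leg acts as a prismatic spring, and the virtual point (VP) constraint requires the ground reaction force to pass through a point fixed in the trunk frame, below the CoM in the configuration studied here.

```latex
% Axial force of the spring-loaded leg (rest length l_0, stiffness k, length l):
F_{\mathrm{axial}} = k\,(l_0 - l)

% VP constraint: the total ground reaction force F_GRF acting at the foot point
% r_foot is collinear with the line from the foot to the virtual point r_VP,
% which is fixed in the trunk frame; the hip torque is chosen to satisfy this:
\mathbf{F}_{\mathrm{GRF}} \times \left(\mathbf{r}_{\mathrm{VP}} - \mathbf{r}_{\mathrm{foot}}\right) = \mathbf{0}
```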

Perceiving Systems Conference Paper Energy Conscious Over-actuated Multi-Agent Payload Transport Robot: Simulations and Preliminary Physical Validation Tallamraju, R., Verma, P., Sripada, V., Agrawal, S., Karlapalem, K. 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1-7, IEEE, 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), October 2019 (Published)
In this work, we consider a multi-wheeled payload transport system. Each of the wheels can be selectively actuated. When they are not actuated, wheels are free moving and do not consume battery power. The payload transport system is modeled as an actuated multi-agent system, with each wheel-motor pair as an agent. Kinematic and dynamic models are developed to ensure that the payload transport system moves as desired. We design optimization formulations to decide on the number of wheels to be active and which of the wheels to be active so that the battery is conserved and the wear on the motors is reduced. Our multi-level control framework over the agents ensures that near-optimal number of agents is active for the payload transport system to function. Through simulation studies we show that our solution ensures energy efficient operation and increases the distance traveled by the payload transport system, for the same battery power. We have built the payload transport system and provide results for preliminary experimental validation.
DOI BibTeX

Micro, Nano, and Molecular Systems Article Acoustic Holographic Cell Patterning in a Biocompatible Hydrogel Ma, Z., Holle, A., Melde, K., Qiu, T., Poeppel, K., Kadiri, V., Fischer, P. Adv. Mat., 32(1904181), October 2019
Acoustophoresis is promising as a rapid, biocompatible, non-contact cell manipulation method, where cells are arranged along the nodes or antinodes of the acoustic field. Typically, the acoustic field is formed in a resonator, which results in highly symmetric regular patterns. However, arbitrary, non-symmetrically shaped cell assemblies are necessary to obtain the irregular cellular arrangements found in biological tissues. We show that arbitrarily shaped cell patterns can be obtained from the complex acoustic field distribution defined by an acoustic hologram. Attenuation of the sound field induces localized acoustic streaming and the resultant convection flow gently delivers the suspended cells to the image plane where they form the designed pattern. We show that the process can be implemented in a biocompatible collagen solution, which can then undergo gelation to immobilize the cell pattern inside the viscoelastic matrix. The patterned cells exhibit F-actin-based protrusions, which indicates that the cells grow and thrive within the matrix. Cell viability assays and brightfield imaging after one week confirm cell survival and that the patterns persist. Acoustophoretic cell manipulation by holographic fields thus holds promise for non-contact, long-range, long-term cellular pattern formation, with a wide variety of potential applications in tissue engineering and mechanobiology.
DOI URL BibTeX

Optics and Sensing Laboratory Article Ultracold atoms in disordered potentials: elastic scattering time in the strong scattering regime Signoles, A., Lecoutre, B., Richard, J., Lim, L., Denechaud, V., Volchkov, V., Angelopoulou, V., Jendrzejewski, F., Aspect, A., Sanchez-Palencia, L., Josse, V. New Journal of Physics, 21:105002, IOP Publishing and Deutsche Physikalische Gesellschaft, October 2019 (Published) DOI URL BibTeX

Perceiving Systems Article Active Perception based Formation Control for Multiple Aerial Vehicles Tallamraju, R., Price, E., Ludwig, R., Karlapalem, K., Bülthoff, H. H., Black, M. J., Ahmad, A. IEEE Robotics and Automation Letters, Robotics and Automation Letters, 4(4):4491-4498, IEEE, October 2019
We present a novel robotic front-end for autonomous aerial motion-capture (mocap) in outdoor environments. In previous work, we presented an approach for cooperative detection and tracking (CDT) of a subject using multiple micro-aerial vehicles (MAVs). However, it did not ensure optimal view-point configurations of the MAVs to minimize the uncertainty in the person's cooperatively tracked 3D position estimate. In this article, we introduce an active approach for CDT. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs can actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We achieve this by decoupling the goal of active tracking into a quadratic objective and non-convex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We derive this decoupling using Gaussian observation model assumptions within the CDT algorithm. We preserve convexity in optimization by embedding all the non-convex constraints, including those for dynamic obstacle avoidance, as external control inputs in the MPC dynamics. Multiple real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented.
pdf DOI BibTeX

Empirical Inference Conference Paper Building a Library of Tactile Skills Based on FingerVision Belousov, B., Sadybakasov, A., Wibranek, B., Veiga, F., Tessmann, O., Peters, J. International Conference on Humanoid Robots (Humanoids), 717-722, IEEE, October 2019 (Published) DOI BibTeX

Rationality Enhancement Article Doing More with Less: Meta-Reasoning and Meta-Learning in Humans and Machines Griffiths, T. L., Callaway, F., Chang, M. B., Grant, E., Krueger, P. M., Lieder, F. Current Opinion in Behavioral Sciences, 29:24-30, October 2019 (Published)
Artificial intelligence systems use an increasing amount of computation and data to solve very specific problems. By contrast, human minds solve a wide range of problems using a fixed amount of computation and limited experience. We identify two abilities that we see as crucial to this kind of general intelligence: meta-reasoning (deciding how to allocate computational resources) and meta-learning (modeling the learning environment to make better use of limited data). We summarize the relevant AI literature and relate the resulting ideas to recent work in psychology.
DOI BibTeX

Physics for Inference and Optimization Article Dynamics of beneficial epidemics Berdahl, A., Brelsford, C., De Bacco, C., Dumas, M., Ferdinand, V., Grochow, J. A., Hébert-Dufresne, L., Kallus, Y., Kempes, C. P., Kolchinsky, A., Larremore, D. B., Libby, E., Power, E. A., Stern, C. A., Tracey, B. D. Scientific Reports, 9:15093, October 2019 (Published) DOI BibTeX

Perceiving Systems Conference Paper Efficient Learning on Point Clouds With Basis Point Sets Prokudin, S., Lassner, C., Romero, J. International Conference on Computer Vision, 4332-4341, October 2019
With an increased availability of 3D scanning technology, point clouds are moving into the focus of computer vision as a rich representation of everyday scenes. However, they are hard to handle for machine learning algorithms due to their unordered structure. One common approach is to apply voxelization, which dramatically increases the amount of data stored and at the same time loses details through discretization. Recently, deep learning models with hand-tailored architectures were proposed to handle point clouds directly and achieve input permutation invariance. However, these architectures use an increased number of parameters and are computationally inefficient. In this work we propose basis point sets as a highly efficient and fully general way to process point clouds with machine learning algorithms. Basis point sets are a residual representation that can be computed efficiently and can be used with standard neural network architectures. Using the proposed representation as the input to a relatively simple network allows us to match the performance of PointNet on a shape classification task while using three orders of magnitude fewer floating point operations. In a second experiment, we show how the proposed representation can be used for obtaining high-resolution meshes from noisy 3D scans. Here, our network achieves performance comparable to state-of-the-art, computationally intense multi-step frameworks, in one network pass that can be done in less than 1 ms.
code pdf BibTeX
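A minimal sketch of the basis-point-set encoding described above: fix a random set of basis points once, then represent every cloud by its distance from each basis point to the nearest cloud point, yielding a fixed-length vector regardless of cloud size. The sketch uses scipy's KD-tree; the number and placement of basis points are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def bps_encode(point_cloud, basis_points):
    """Encode a point cloud as distances from fixed basis points to their nearest cloud points.

    point_cloud:  (N, 3) array
    basis_points: (K, 3) array, shared across all clouds
    returns:      (K,) fixed-length feature vector
    """
    dists, _ = cKDTree(point_cloud).query(basis_points)
    return dists

rng = np.random.default_rng(0)
basis = rng.uniform(-1.0, 1.0, size=(512, 3))         # sampled once, reused for every cloud

cloud_a = rng.normal(size=(2048, 3))
cloud_b = rng.normal(size=(1311, 3))                   # different size, same feature length
print(bps_encode(cloud_a, basis).shape, bps_encode(cloud_b, basis).shape)   # (512,) (512,)
```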

Perceiving Systems Conference Paper End-to-end Learning for Graph Decomposition Song, J., Andres, B., Black, M., Hilliges, O., Tang, S. In International Conference on Computer Vision, 10093-10102, October 2019
Deep neural networks provide powerful tools for pattern recognition, while classical graph algorithms are widely used to solve combinatorial problems. In computer vision, many tasks combine elements of both pattern recognition and graph reasoning. In this paper, we study how to connect deep networks with graph decomposition into an end-to-end trainable framework. More specifically, the minimum cost multicut problem is first converted to an unconstrained binary cubic formulation where cycle consistency constraints are incorporated into the objective function. The new optimization problem can be viewed as a Conditional Random Field (CRF) in which the random variables are associated with the binary edge labels. Cycle constraints are introduced into the CRF as high-order potentials. A standard Convolutional Neural Network (CNN) provides the front-end features for the fully differentiable CRF. The parameters of both parts are optimized in an end-to-end manner. The efficacy of the proposed learning algorithm is demonstrated via experiments on clustering MNIST images and on the challenging task of real-world multi-people pose estimation.
PDF BibTeX

Empirical Inference Conference Paper Neural Signatures of Motor Skill in the Resting Brain Ozdenizci, O., Meyer, T., Wichmann, F., Peters, J., Schölkopf, B., Cetin, M., Grosse-Wentrup, M. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC 2019), 4387-4394, IEEE, October 2019 (Published) DOI BibTeX

Autonomous Vision Conference Paper Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A. International Conference on Computer Vision, October 2019
Deep learning based 3D reconstruction techniques have recently achieved impressive results. However, while state-of-the-art methods are able to output complex 3D geometry, it is not clear how to extend these results to time-varying topologies. Approaches treating each time step individually lack continuity and exhibit slow inference, while traditional 4D reconstruction methods often utilize a template model or discretize the 4D space at fixed resolution. In this work, we present Occupancy Flow, a novel spatio-temporal representation of time-varying 3D geometry with implicit correspondences. Towards this goal, we learn a temporally and spatially continuous vector field which assigns a motion vector to every point in space and time. In order to perform dense 4D reconstruction from images or sparse point clouds, we combine our method with a continuous 3D representation. Implicitly, our model yields correspondences over time, thus enabling fast inference while providing a sound physical description of the temporal dynamics. We show that our method can be used for interpolation and reconstruction tasks, and demonstrate the accuracy of the learned correspondences. We believe that Occupancy Flow is a promising new 4D representation which will be useful for a variety of spatio-temporal reconstruction tasks.
pdf poster suppmat code Project page video blog BibTeX
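The core representation, a continuous velocity field over space and time whose integration transports points forward and yields correspondences, can be sketched with explicit Euler integration; in the paper the field is a learned neural network, for which the toy `velocity_field` below merely stands in.

```python
import numpy as np

def velocity_field(t, points):
    """Stand-in for a learned network v(t, p): here, a rigid rotation about the z-axis."""
    omega = np.pi / 4.0
    vx = -omega * points[:, 1]
    vy = omega * points[:, 0]
    return np.stack([vx, vy, np.zeros(len(points))], axis=1)

def advect(points, t0=0.0, t1=1.0, steps=100):
    """Integrate point trajectories through the velocity field (explicit Euler)."""
    p = points.copy()
    dt = (t1 - t0) / steps
    for i in range(steps):
        p = p + dt * velocity_field(t0 + i * dt, p)
    return p

pts = np.random.default_rng(0).normal(size=(1000, 3))
pts_t1 = advect(pts)    # correspondences: pts[i] at t=0 maps to pts_t1[i] at t=1
```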

Probabilistic Numerics Article Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective Tronarp, F., Kersting, H., Särkkä, S., Hennig, P. Statistics and Computing, 29(6):1297-1315, October 2019 (Published)
We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consist of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP, which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a Bayesian state estimation problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers, which were formulated in terms of generating synthetic measurements of the vector field, come out as specific approximations. We derive novel solvers, both Gaussian and non-Gaussian, from the Bayesian state estimation problem posed in this paper and compare them with other probabilistic solvers in illustrative experiments.
DOI URL BibTeX
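The construction in the abstract can be summarized as a state-space model (the symbols below are ours): a Gauss-Markov prior over the solution and its derivatives, plus a pseudo-measurement stating that the prior's derivative should match the vector field, with the observed data being identically zero.

```latex
% ODE to solve: \dot{y}(t) = f(y(t), t), \quad y(t_0) = y_0.
% Prior with state-space representation, e.g. x(t) = (y(t), \dot{y}(t), \ldots):
\mathrm{d}x(t) = A\,x(t)\,\mathrm{d}t + B\,\mathrm{d}W(t)
% Pseudo-measurement at grid point t_n (the observed value is always zero):
z_n = \dot{y}(t_n) - f\big(y(t_n), t_n\big) + \varepsilon_n,
\qquad z_n \equiv 0, \quad \varepsilon_n \sim \mathcal{N}(0, R)
% Non-linear Bayesian filtering/smoothing on this model (EKF, UKF, ...) then
% yields a posterior over the ODE solution.
```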

Movement Generation and Control Conference Paper Robust Humanoid Locomotion Using Trajectory Optimization and Sample-Efficient Learning Yeganegi, M. H., Khadiv, M., Moosavian, S. A. A., Zhu, J., Prete, A. D., Righetti, L. Proceedings International Conference on Humanoid Robots, IEEE, 2019 IEEE-RAS International Conference on Humanoid Robots, October 2019 (Published)
Trajectory optimization (TO) is one of the most powerful tools for generating feasible motions for humanoid robots. However, including uncertainties and stochasticity in the TO problem to generate robust motions can easily lead to intractable problems. Furthermore, since the models used in TO have always some level of abstraction, it can be hard to find a realistic set of uncertainties in the model space. In this paper we leverage a sample-efficient learning technique (Bayesian optimization) to robustify TO for humanoid locomotion. The main idea is to use data from full-body simulations to make the TO stage robust by tuning the cost weights. To this end, we split the TO problem into two phases. The first phase solves a convex optimization problem for generating center of mass (CoM) trajectories based on simplified linear dynamics. The second stage employs iterative Linear-Quadratic Gaussian (iLQG) as a whole-body controller to generate full body control inputs. Then we use Bayesian optimization to find the cost weights to use in the first stage that yields robust performance in the simulation/experiment, in the presence of different disturbance/uncertainties. The results show that the proposed approach is able to generate robust motions for different sets of disturbances and uncertainties.
https://arxiv.org/abs/1907.04616 URL BibTeX
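The outer loop described above, tuning the first-stage cost weights by Bayesian optimization against full-body simulation rollouts, has the following generic shape; `rollout_cost` is a hypothetical stand-in for running the trajectory optimization plus iLQG tracking under sampled disturbances, and scikit-optimize is used here only as one convenient off-the-shelf BO implementation, not the toolchain used in the paper.

```python
from skopt import gp_minimize

def rollout_cost(weights):
    """Hypothetical: solve the CoM trajectory optimization with these cost weights,
    track it with the whole-body controller in simulation under sampled disturbances,
    and return a scalar robustness/failure score (lower is better)."""
    w_task, w_smooth, w_effort = weights
    raise NotImplementedError("plug in the TO + iLQG simulation pipeline here")

search_space = [(0.1, 100.0),   # w_task
                (0.1, 100.0),   # w_smooth
                (0.1, 100.0)]   # w_effort

# result = gp_minimize(rollout_cost, search_space, n_calls=50, random_state=0)
# best_weights = result.x
```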

Autonomous Vision Conference Paper Texture Fields: Learning Texture Representations in Function Space Oechsle, M., Mescheder, L., Niemeyer, M., Strauss, T., Geiger, A. International Conference on Computer Vision, October 2019
In recent years, substantial progress has been achieved in learning-based reconstruction of 3D objects. At the same time, generative models were proposed that can generate highly realistic images. However, despite this success in these closely related tasks, texture reconstruction of 3D objects has received little attention from the research community and state-of-the-art methods are either limited to comparably low resolution or constrained experimental setups. A major reason for these limitations is that common representations of texture are inefficient or hard to interface for modern deep learning techniques. In this paper, we propose Texture Fields, a novel texture representation which is based on regressing a continuous 3D function parameterized with a neural network. Our approach circumvents limiting factors like shape discretization and parameterization, as the proposed texture representation is independent of the shape representation of the 3D object. We show that Texture Fields are able to represent high frequency texture and naturally blend with modern deep learning techniques. Experimentally, we find that Texture Fields compare favorably to state-of-the-art methods for conditional texture reconstruction of 3D objects and enable learning of probabilistic generative models for texturing unseen 3D models. We believe that Texture Fields will become an important building block for the next generation of generative 3D models.
pdf suppmat video poster blog Project Page BibTeX
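The texture representation itself is a function from a 3D surface point (plus a conditioning code) to a color, which a small MLP can parameterize; the layer sizes and the simple concatenation-based conditioning below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TextureField(nn.Module):
    """t_theta: (3D point, condition code) -> RGB color in [0, 1]."""

    def __init__(self, code_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, points, code):
        # points: (B, N, 3) surface points; code: (B, code_dim) shape/image embedding
        code = code.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.net(torch.cat([points, code], dim=-1))

model = TextureField()
rgb = model(torch.rand(2, 1024, 3), torch.rand(2, 128))   # (2, 1024, 3) colors
```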

Perceiving Systems Conference Paper AMASS: Archive of Motion Capture as Surface Shapes Mahmood, N., Ghorbani, N., Troje, N. F., Pons-Moll, G., Black, M. J. In Proceedings International Conference on Computer Vision, 5442-5451, IEEE, International Conference on Computer Vision (ICCV), October 2019 (Published)
Large datasets are the cornerstone of recent advances in computer vision using deep learning. In contrast, existing human motion capture (mocap) datasets are small and the motions limited, hampering progress on learning models of human motion. While there are many different datasets available, they each use a different parameterization of the body, making it difficult to integrate them into a single meta dataset. To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. We achieve this using a new method, MoSh++, that converts mocap data into realistic 3D human meshes represented by a rigged body model. Here we use SMPL [26], which is widely used and provides a standard skeletal representation as well as a fully rigged surface mesh. The method works for arbitrary marker-sets, while recovering soft-tissue dynamics and realistic hand motion. We evaluate MoSh++ and tune its hyper-parameters using a new dataset of 4D body scans that are jointly recorded with marker-based mocap. The consistent representation of AMASS makes it readily useful for animation, visualization, and generating training data for deep learning. Our dataset is significantly richer than previous human motion collections, having more than 40 hours of motion data, spanning over 300 subjects, more than 11000 motions, and is available for research at https://amass.is.tue.mpg.de/.
code pdf suppl arxiv project website video poster AMASS_Poster DOI BibTeX

Micro, Nano, and Molecular Systems Article Arrays of plasmonic nanoparticle dimers with defined nanogap spacers Jeong, H., Adams, M. C., Guenther, J., Alarcon-Correa, M., Kim, I., Choi, E., Miksch, C., Mark, A. F. M., Mark, A. G., Fischer, P. ACS Nano, 13:11453-11459, September 2019
Plasmonic molecules are building blocks of metallic nanostructures that give rise to intriguing optical phenomena with similarities to those seen in molecular systems. The ability to design plasmonic hybrid structures and molecules with nanometric resolution would enable applications in optical metamaterials and sensing that presently cannot be demonstrated, because of a lack of suitable fabrication methods allowing the structural control of the plasmonic atoms on a large scale. Here we demonstrate a wafer-scale “lithography-free” parallel fabrication scheme to realize nanogap plasmonic meta-molecules with precise control over their size, shape, material, and orientation. We demonstrate how we can tune the corresponding coupled resonances through the entire visible spectrum. Our fabrication method, based on glancing angle physical vapor deposition with gradient shadowing, permits critical parameters to be varied across the wafer and thus is ideally suited to screen potential structures. We obtain billions of aligned dimer structures with controlled variation of the spectral properties across the wafer. We spectroscopically map the plasmonic resonances of gold dimer structures and show that they not only are in good agreement with numerically modeled spectra, but also remain functional, at least for a year, in ambient conditions.
DOI URL BibTeX

Perceiving Systems Conference Paper The Influence of Visual Perspective on Body Size Estimation in Immersive Virtual Reality Thaler, A., Pujades, S., Stefanucci, J. K., Creem-Regehr, S. H., Tesch, J., Black, M. J., Mohler, B. J. In ACM Symposium on Applied Perception, 1-12, ACM, SAP '19: ACM Symposium on Applied Perception, September 2019 (Published)
The creation of realistic self-avatars that users identify with is important for many virtual reality applications. However, current approaches for creating biometrically plausible avatars that represent a particular individual require expertise and are time-consuming. We investigated the visual perception of an avatar’s body dimensions by asking males and females to estimate their own body weight and shape on a virtual body using a virtual reality avatar creation tool. In a method of adjustment task, the virtual body was presented in an HTC Vive head-mounted display either co-located with (first-person perspective) or facing (third-person perspective) the participants. Participants adjusted the body weight and dimensions of various body parts to match their own body shape and size. Both males and females underestimated their weight by 10-20% in the virtual body, but the estimates of the other body dimensions were relatively accurate and within a range of ±6%. There was a stronger influence of visual perspective on the estimates for males, but this effect was dependent on the amount of control over the shape of the virtual body, indicating that the results might be caused by where in the body the weight changes expressed themselves. These results suggest that this avatar creation tool could be used to allow participants to make a relatively accurate self-avatar in terms of adjusting body part dimensions, but not weight, and that the influence of visual perspective and amount of control needed over the body shape are likely gender-specific.
pdf DOI BibTeX

Perceiving Systems Patent Method for providing a three dimensional body model Loper, M., Mahmood, N., Black, M. September 2019, U.S. Patent 10,417,818
A method for providing a three-dimensional body model which may be applied for an animation, based on a moving body, wherein the method comprises providing a parametric three-dimensional body model, which allows shape and pose variations; applying a standard set of body markers; optimizing the set of body markers by generating an additional set of body markers and applying the same for providing 3D coordinate marker signals for capturing shape and pose of the body and dynamics of soft tissue; and automatically providing an animation by processing the 3D coordinate marker signals in order to provide a personalized three-dimensional body model, based on estimated shape and an estimated pose of the body by means of predicted marker locations.
MoSh Project pdf BibTeX

Autonomous Vision Conference Paper NoVA: Learning to See in Novel Viewpoints and Domains Coors, B., Condurache, A. P., Geiger, A. In 2019 International Conference on 3D Vision (3DV), 116-125, IEEE, 2019 International Conference on 3D Vision (3DV), September 2019 (Published)
Domain adaptation techniques enable the re-use and transfer of existing labeled datasets from a source to a target domain in which little or no labeled data exists. Recently, image-level domain adaptation approaches have demonstrated impressive results in adapting from synthetic to real-world environments by translating source images to the style of a target domain. However, the domain gap between source and target may not only be caused by a different style but also by a change in viewpoint. This case necessitates a semantically consistent translation of source images and labels to the style and viewpoint of the target domain. In this work, we propose the Novel Viewpoint Adaptation (NoVA) model, which enables unsupervised adaptation to a novel viewpoint in a target domain for which no labeled data is available. NoVA utilizes an explicit representation of the 3D scene geometry to translate source view images and labels to the target view. Experiments on adaptation to synthetic and real-world datasets show the benefit of NoVA compared to state-of-the-art domain adaptation approaches on the task of semantic segmentation.
pdf suppmat poster video DOI BibTeX

Embodied Vision Conference Paper Learning to Disentangle Latent Physical Factors for Video Prediction Zhu, D., Munderloh, M., Rosenhahn, B., Stückler, J. In Pattern Recognition - Proceedings German Conference on Pattern Recognition (GCPR), Springer International, German Conference on Pattern Recognition (GCPR), September 2019 (Published) dataset & evaluation code video preprint DOI BibTeX

Empirical Inference Conference Paper A Differentially Private Kernel Two-Sample Test Raj*, A., Law*, L., Sejdinovic*, D., Park, M. Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), 119066:697-724, Lecture Notes in Computer Science, (Editors: Brefeld, Ulf and Fromont, Elisa and Hotho, Andreas and Knobbe, Arno and Maathuis, Marloes and Robardet, Céline), Springer International Publishing, September 2019, *equal contribution (Published) DOI BibTeX

Empirical Inference Article Color Constancy in Deep Neural Networks Flachot, A., Schuett, H., Fleming, R. W., Wichmann, F. A., Gegenfurtner, K. R. Journal of Vision, 19(10):article no. 298, September 2019 (Published)
Journal of Vision 2019;19(10):298. doi: https://doi.org/10.1167/19.10.298.
DOI BibTeX

Empirical Inference Article Convolutional neural networks: A magic bullet for gravitational-wave detection? Gebhard, T., Kilbertus, N., Harry, I., Schölkopf, B. Physical Review D, 100(6):article no. 063015, American Physical Society, September 2019 (Published) DOI URL BibTeX