Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Social Foundations of Computation Conference Paper Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget Dorner, F. E., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, July 2024 (Published)
We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.
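The claim in this abstract can be illustrated with a small Monte Carlo sketch. It is not the paper's proof or experimental setup. It assumes that the two classifiers err independently of each other and of the label noise, and the accuracies (0.90 and 0.85), the noise rate (0.2), and the budget (300 labels) are made-up values chosen only for illustration.

```python
import random


def pick_better(budget, labels_per_point, acc_a, acc_b, noise, rng):
    """Return 1.0 if classifier A wins the comparison, 0.5 on a tie, else 0.0.

    labels_per_point should be odd so that the majority vote is well defined.
    """
    n_points = budget // labels_per_point
    margin = 0  # (agreements of A) minus (agreements of B) with the aggregated label
    for _ in range(n_points):
        a_correct = rng.random() < acc_a
        b_correct = rng.random() < acc_b
        # Each noisy label is flipped independently with probability `noise`.
        flips = sum(rng.random() < noise for _ in range(labels_per_point))
        label_correct = 2 * flips < labels_per_point  # majority vote is right
        # For binary labels, a prediction matches the aggregated label exactly
        # when both are correct or both are wrong.
        margin += int(a_correct == label_correct) - int(b_correct == label_correct)
    if margin == 0:
        return 0.5
    return 1.0 if margin > 0 else 0.0


def success_rate(budget, labels_per_point, trials=2000, seed=0):
    """Fraction of simulated comparisons that identify the truly better classifier A."""
    rng = random.Random(seed)
    wins = sum(
        pick_better(budget, labels_per_point, 0.90, 0.85, 0.2, rng)
        for _ in range(trials)
    )
    return wins / trials


if __name__ == "__main__":
    print("one label per point:   ", success_rate(300, 1))
    print("three labels per point:", success_rate(300, 3))
```

Under these assumptions, spending the budget on 300 singly labeled points should identify the better classifier more often than spending it on 100 points with three-way majority votes.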

Haptic Intelligence Master Thesis Estimating Contact Forces Across Soft Capacitive Tactile Sensors Using Machine Learning Tiwari, A. Saarland University, Saarbrücken, Germany, July 2024, M.Sc. in Embedded Systems (Published)
Robots have become an essential part of the modern world, playing a crucial role in applications from manufacturing to healthcare. Despite significant advancements, the operational range of robots remains relatively narrow, often limited to controlled environments and simple, predetermined tasks. Tactile sensors show promise in broadening this range by enhancing a robot's performance in fine manipulation tasks. These sensors enable robots to perceive contact, providing a more nuanced understanding of their environment in real time. The challenge, however, lies in deriving meaningful and interpretable insights from these sensors, such as contact location and force, which are crucial for dexterous manipulation tasks. To address this challenge, this thesis develops machine learning-based software that achieves precise real-time contact location and force sensing across the entire surface of a grid-based soft capacitive tactile sensor, enabling rapid and straightforward deployment and facilitating transferability to other sensor instances, all while retaining the advantageous attributes of capacitance technology. Machine learning models were trained using data captured by indenting the sensor surface and measuring the sensor responses and the applied normal forces. Convolutional neural networks (CNNs) were selected for their low prediction errors in contact force estimation with the collected dataset. Two distinct models were developed: one for estimating contact forces at a single point and another for estimating normal force distributions. The transferability of the trained models across different sensor instances was evaluated and improved. The single point contact force estimation model's practical utility was demonstrated through real-time closed-loop control of a Franka Emika Panda robot arm through two specific tasks: tactile servoing in 1D and active object centering in 2D. 
This research contributes to enhancing the accessibility of soft tactile sensors in robotic applications through machine learning and demonstrates that this approach can improve the capabilities of tactile sensors.

Empirical Inference Conference Paper Geometry-Aware Instrumental Variable Regression Kremer, H., Schölkopf, B. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:25560-25582, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Empirical Inference Conference Paper Implicit meta-learning may lead language models to trust more reliable sources Krasheninnikov, D., Krasheninnikov, E., Mlodozeniec, B. K., Maharaj, T., Krueger, D. Proceedings of the 41st International Conference on Machine Learning, 235:25534-25559, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Empirical Inference Conference Paper Improving Neural Additive Models with Bayesian Principles Bouchiat, K., Immer, A., Yèche, H., Rätsch, G., Fortuin, V. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:4416-4443, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Robust Machine Learning Conference Paper InfoNCE: Identifying the Gap Between Theory and Practice Rusak, E., Reizinger, P., Juhos, A., Bringmann, O., Zimmermann, R. S., Brendel, W. July 2024 (Published)

Social Foundations of Computation Conference Paper Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks Zhang, G., Hardt, M. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, July 2024 (Published)
We examine multi-task benchmarks in machine learning through the lens of social choice theory. We draw an analogy between benchmarks and electoral systems, where models are candidates and tasks are voters. This suggests a distinction between cardinal and ordinal benchmark systems. The former aggregate numerical scores into one model ranking; the latter aggregate rankings for each task. We apply Arrow's impossibility theorem to ordinal benchmarks to highlight the inherent limitations of ordinal systems, particularly their sensitivity to the inclusion of irrelevant models. Inspired by Arrow's theorem, we empirically demonstrate a strong trade-off between diversity and sensitivity to irrelevant changes in existing multi-task benchmarks. Our result is based on new quantitative measures of diversity and sensitivity that we introduce. Sensitivity quantifies the impact that irrelevant changes to tasks have on a benchmark. Diversity captures the degree of disagreement in model rankings across tasks. We develop efficient approximation algorithms for both measures, as exact computation is computationally challenging. Through extensive experiments on seven cardinal benchmarks and eleven ordinal benchmarks, we demonstrate a clear trade-off between diversity and stability: The more diverse a multi-task benchmark, the more sensitive to trivial changes it is. Additionally, we show that the aggregated rankings of existing benchmarks are highly unstable under irrelevant changes.
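The distinction between cardinal and ordinal aggregation, and the sensitivity of ordinal systems to irrelevant models, can be sketched as follows. The Borda count used for the ordinal rule is one common choice and an assumption here, since the abstract does not name a specific rule. The models `A`, `B`, `C` and their scores are made up for illustration.

```python
def cardinal_ranking(scores):
    """Rank models by their mean numerical score across tasks (best first)."""
    means = {m: sum(s) / len(s) for m, s in scores.items()}
    return sorted(means, key=lambda m: (-means[m], m))


def ordinal_ranking(scores):
    """Rank models by Borda count: per task, one point per model beaten."""
    models = list(scores)
    n_tasks = len(next(iter(scores.values())))
    points = {m: 0 for m in models}
    for t in range(n_tasks):
        for m in models:
            points[m] += sum(scores[m][t] > scores[o][t] for o in models if o != m)
    return sorted(points, key=lambda m: (-points[m], m))


# Hypothetical benchmark: three models, five tasks.
full = {
    "A": [0.9, 0.9, 0.9, 0.7, 0.7],
    "B": [0.8, 0.8, 0.8, 0.9, 0.9],
    "C": [0.1, 0.1, 0.1, 0.8, 0.8],
}
without_c = {m: s for m, s in full.items() if m != "C"}

if __name__ == "__main__":
    print("cardinal, all models:", cardinal_ranking(full))
    print("cardinal, without C: ", cardinal_ranking(without_c))
    print("ordinal, all models: ", ordinal_ranking(full))
    print("ordinal, without C:  ", ordinal_ranking(without_c))
```

With these scores, both rules place B ahead of A on the full set. Dropping C leaves the mean-score order of A and B unchanged but flips their Borda order, which is the kind of dependence on an irrelevant model that the abstract attributes to ordinal benchmarks.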

Autonomous Learning Conference Paper LPGD: A General Framework for Backpropagation through Embedded Optimization Layers Paulus, A., Martius, G., Musil, V. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:39989-40014, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Autonomous Learning Conference Paper Learning with 3D rotations, a hitchhiker’s guide to SO(3) Geist, A. R., Frey, J., Zhobro, M., Levina, A., Martius, G. In Proceedings of the Forty-First International Conference on Machine Learning, 235:15331-15350, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), July 2024 (Published)
Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.

Perceiving Systems Ph.D. Thesis Modelling Dynamic 3D Human-Object Interactions: From Capture to Synthesis Taheri, O. University of Tübingen, July 2024 (Accepted)
Modeling digital humans that move and interact realistically with virtual 3D worlds has emerged as an essential research area recently, with significant applications in computer graphics, virtual and augmented reality, telepresence, the Metaverse, and assistive technologies. In particular, human-object interaction, encompassing full-body motion, hand-object grasping, and object manipulation, lies at the core of how humans execute tasks and represents the complex and diverse nature of human behavior. Therefore, accurate modeling of these interactions would enable us to simulate avatars to perform tasks, enhance animation realism, and develop applications that better perceive and respond to human behavior. Despite its importance, this remains a challenging problem, due to several factors such as the complexity of human motion, the variance of interaction based on the task, and the lack of rich datasets capturing the complexity of real-world interactions. Prior methods have made progress, but limitations persist as they often focus on individual aspects of interaction, such as body, hand, or object motion, without considering the holistic interplay among these components. This Ph.D. thesis addresses these challenges and contributes to the advancement of human-object interaction modeling through the development of novel datasets, methods, and algorithms.

Autonomous Learning Conference Paper Modelling Microbial Communities with Graph Neural Networks Ruaud, A., Sancaktar, C., Bagatella, M., Ratzke, C., Martius, G. In Proceedings of the 41st International Conference on Machine Learning (ICML), 235:42742-42765, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Haptic Intelligence Intelligent Control Systems Article Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test Khojasteh, B., Solowjow, F., Trimpe, S., Kuchenbecker, K. J. IEEE Transactions on Automation Science and Engineering, 21(3):4432-4447, July 2024 (Published)
Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and parameter tuning. To overcome these challenges, we propose an easily implemented framework that can directly handle heterogeneous data sources for classification tasks. Our data-versus-data approach automatically quantifies distinctive differences in distributions in a high-dimensional space via kernel two-sample testing between two sets extracted from multimodal data (e.g., images, sounds, haptic signals). We demonstrate the effectiveness of our technique by benchmarking against expertly engineered classifiers for visual-audio-haptic surface recognition due to the industrial relevance, difficulty, and competitive baselines of this application; ablation studies confirm the utility of key components of our pipeline. As shown in our open-source code, we achieve 97.2% accuracy on a standard multi-user dataset with 108 surface classes, outperforming the state-of-the-art machine-learning algorithm by 6% on a more difficult version of the task. The fact that our classifier obtains this performance with minimal data processing in the standard algorithm setting reinforces the powerful nature of kernel methods for learning to recognize complex patterns. Note to Practitioners: We demonstrate how to apply the kernel two-sample test to a surface-recognition task, discuss opportunities for improvement, and explain how to use this framework for other classification problems with similar properties. Automating surface recognition could benefit both surface inspection and robot manipulation. Our algorithm quantifies class similarity and therefore outputs an ordered list of similar surfaces. This technique is well suited for quality assurance and documentation of newly received materials or newly manufactured parts.
More generally, our automated classification pipeline can handle heterogeneous data sources including images and high-frequency time-series measurements of vibrations, forces and other physical signals. As our approach circumvents the time-consuming process of feature engineering, both experts and non-experts can use it to achieve high-accuracy classification. It is particularly appealing for new problems without existing models and heuristics. In addition to strong theoretical properties, the algorithm is straightforward to use in practice since it requires only kernel evaluations. Its transparent architecture can provide fast insights into the given use case under different sensing combinations without costly optimization. Practitioners can also use our procedure to obtain the minimum data-acquisition time for independent time-series data from new sensor recordings.
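The data-versus-data idea described above can be sketched with a plain maximum mean discrepancy (MMD) estimate, the statistic behind the kernel two-sample test. This is a minimal illustration and not the authors' open-source pipeline: the RBF kernel, the fixed bandwidth, the biased estimator, the omission of a significance test, and the two synthetic classes `smooth` and `rough` are all assumptions made here.

```python
import math
import random


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from two sample sets."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def rank_classes(query, references, bandwidth=1.0):
    """Order class names from most to least similar to the query set."""
    scored = [(mmd_squared(query, ref, bandwidth), name) for name, ref in references.items()]
    return [name for _, name in sorted(scored)]


def sample(mean, n, rng):
    """Draw n synthetic 2D feature vectors centred on (mean, mean)."""
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    references = {"smooth": sample(0.0, 40, rng), "rough": sample(3.0, 40, rng)}
    query = sample(0.0, 40, rng)  # new recording drawn like the "smooth" class
    print(rank_classes(query, references))
```

The ranking only needs kernel evaluations between raw sample sets, which mirrors the abstract's point that no feature engineering or model training is required.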

Empirical Inference Conference Paper On the Growth of Mistakes in Differentially Private Online Learning: A Lower Bound Perspective Dmitriev, D., Szabó, K., Sanyal, A. Proceedings of the 37th Annual Conference on Learning Theory (COLT), 247:1379-1398, Proceedings of Machine Learning Research, (Editors: Agrawal, Shipra and Roth, Aaron), PMLR, July 2024, (talk) (Published)

Empirical Inference Robust Machine Learning Conference Paper Position: Understanding LLMs Requires More Than Statistical Generalization Reizinger, P., Ujváry, S., Mészáros, A., Kerekes, A., Brendel, W., Huszár, F. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:42365-42390, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Empirical Inference Article Probabilistic pathway-based multimodal factor analysis Immer, A., Stark, S. G., Jacob, F., Bonilla, X., Thomas, T., Kahles, A., Goetze, S., Milani, E. S., Wollscheid, B., Consortium, T. T. P., Rätsch, G., Lehmann, K. Bioinformatics, 40(Supplement 1):i189-i198, July 2024 (Published)

Empirical Inference Conference Paper Products, Abstractions and Inclusions of Causal Spaces Buchholz, S., Park, J., Schölkopf, B. 40th Conference on Uncertainty in Artificial Intelligence (UAI), 244:430-449, Proceedings of Machine Learning Research, (Editors: Kiyavash, Negar and Mooij, Joris M.), PMLR, July 2024 (Published)

Empirical Inference Conference Paper Provable Privacy with Non-Private Pre-Processing Hu, Y., Sanyal, A., Schölkopf, B. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:19402-19437, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Haptic Intelligence Conference Paper Reflectance Outperforms Force and Position in Model-Free Needle Puncture Detection L’Orsa, R., Bisht, A., Yu, L., Murari, K., Westwick, D. T., Sutherland, G. R., Kuchenbecker, K. J. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 1-7, Orlando, USA, July 2024 (Published)
The surgical procedure of needle thoracostomy temporarily corrects accidental over-pressurization of the space between the chest wall and the lungs. However, failure rates of up to 94.1% have been reported, likely because this procedure is done blind: operators estimate by feel when the needle has reached its target. We believe instrumented needles could help operators discern entry into the target space, but limited success has been achieved using force and/or position to try to discriminate needle puncture events during simulated surgical procedures. We thus augmented our needle insertion system with a novel in-bore double-fiber optical setup. Tissue reflectance measurements as well as 3D force, torque, position, and orientation were recorded while two experimenters repeatedly inserted a bevel-tipped percutaneous needle into ex vivo porcine ribs. We applied model-free puncture detection to various filtered time derivatives of each sensor data stream offline. In the held-out test set of insertions, puncture-detection precision improved substantially using reflectance measurements compared to needle insertion force alone (3.3-fold increase) or position alone (11.6-fold increase).

Empirical Inference Conference Paper Robustness of Nonlinear Representation Learning Buchholz, S., Schölkopf, B. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:4785-4821, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Empirical Inference Conference Paper Simultaneous identification of models and parameters of scientific simulators Schröder, C., Macke, J. H. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:43895-43927, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (Published)

Empirical Inference Conference Paper Stitching Manifolds: Leveraging Interaction to Compose Object Representations into Scenes Keurti, H., Schölkopf, B., Aceituno, P. V., Grewe, B. ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM), July 2024 (Published)

Empirical Inference Conference Paper Targeted Reduction of Causal Models Kekić, A., Schölkopf, B., Besserve, M. 40th Conference on Uncertainty in Artificial Intelligence (UAI), 244:1953-1980, Proceedings of Machine Learning Research, (Editors: Kiyavash, Negar and Mooij, Joris M.), PMLR, July 2024 (Published)

Human Aspects of Machine Learning Empirical Inference Conference Paper The Role of Learning Algorithms in Collective Action Ben-Dov*, O., Fawkes*, J., Samadi, S., Sanyal, A. Proceedings of the 41st International Conference on Machine Learning (ICML), 235:3443-3461, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024, *equal contribution (Published)

Empirical Inference Conference Paper Unveiling CLIP Dynamics: Linear Mode Connectivity and Generalization Abdolahpourrostam, A., Sanyal, A., Moosavi-Dezfooli, S. ICML 2024 Workshop on Foundation Models in the Wild, July 2024 (Published)

Empirical Inference Conference Paper What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study Jain, S., Lubana, E. S., Oksuz, K., Joy, T., Torr, P. H. S., Sanyal, A., Dokania, P. K. ICML 2024 Workshop on Mechanistic Interpretability (Spotlight), July 2024 (Published)

Perceiving Systems Conference Paper ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations Grigorev, A., Becherini, G., Black, M., Hilliges, O., Thomaszewski, B. In Proceedings SIGGRAPH 2024 Conference Papers, Association for Computing Machinery, New York, NY, USA, SIGGRAPH '24, July 2024 (Published)
Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present ContourCraft, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inputs, ContourCraft robustly recovers from intersections introduced through missed collisions, self-penetrating bodies, or errors in manually designed multi-layer outfits. The technical core of ContourCraft is a novel intersection contour loss that penalizes interpenetrations and encourages rapid resolution thereof. We integrate our intersection loss with a collision-avoiding repulsion objective into a neural cloth simulation method based on graph neural networks (GNNs). We demonstrate our method’s ability across a challenging set of diverse multi-layer outfits under dynamic human motions. Our extensive analysis indicates that ContourCraft significantly improves collision handling for learned simulation and produces visually compelling results.

Perceiving Systems Conference Paper Airship Formations for Animal Motion Capture and Behavior Analysis Price, E., Ahmad, A. Proceedings 2nd International Conference on Design and Engineering of Lighter-Than-Air systems (DELTAS2024), June 2024 (Published)
Using UAVs for wildlife observation and motion capture offers manifold advantages for studying animals in the wild, especially grazing herds in open terrain. The aerial perspective allows observation at a scale and depth that is not possible on the ground, offering new insights into group behavior. However, the very nature of wildlife field studies pushes traditional fixed-wing and multi-copter systems to their limits: limited flight time, noise, and safety aspects reduce their efficacy, whereas lighter-than-air systems can remain on station for many hours. Nevertheless, airships are challenging from a ground-handling perspective as well as from a control point of view, being voluminous and highly affected by wind. In this work, we showcase a system designed to use airship formations to track, follow, and visually record wild horses from multiple angles, including airship design, simulation, control, onboard computer vision, autonomous operation, and practical aspects of field experiments.

Autonomous Learning Article PaSTS An Operational Dataset for Domestic Solar Thermal Systems Ebmeier, F., Ludwig, N., Martius, G., Franz, V. H. June 2024 (Accepted)
Solar thermal systems play an important role in the decarbonization of the domestic heating sector, yet there exist no publicly available datasets of such systems. Therefore, this paper presents the PaSTS dataset, a unique collection of operational data from domestic Solar Thermal Systems (STS) manufactured by Ritter Energie and marketed under the Paradigma brand. Unlike previous research that primarily relied on simulated or unpublished experimental data, this dataset is derived from the service team at Ritter Energie, offering a realistic reflection of the challenges commonly faced in the field. This paper provides a comprehensive dataset overview, emphasizing its application in anomaly and fault detection tasks within STS and establishes the dataset as the first of its kind. Given the inherent complexities of fault detection in STS, we elaborate on the expert system-based fault detection mechanism currently in …

Deep Models and Optimization Conference Paper Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues Orvieto, A., De, S., Gulcehre, C., Pascanu, R., Smith, S. L. In Proceedings of the Forty-First International Conference on Machine Learning, Proceedings of Machine Learning Research, June 2024 (Published)

Empirical Inference Ph.D. Thesis Advancing Normalising Flows to Model Boltzmann Distributions Stimper, V. University of Cambridge, UK, Cambridge, June 2024, (Cambridge-Tübingen-Fellowship-Program) (Published)

Empirical Inference Conference Paper Analyzing the Role of Semantic Representations in the Era of Large Language Models Jin*, Z., Chen*, Y., Gonzalez*, F., Liu, J., Zhang, J., Michael, J., Schölkopf, B., Diab, M. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Volume 1: Long Papers:3781-3798, (Editors: Duh, Kevin and Gomez, Helena and Bethard, Steven), Association for Computational Linguistics, June 2024, *equal contribution (Published)

Empirical Inference Conference Paper Automatic Generation of Model and Data Cards: A Step Towards Responsible AI Liu, J., Li, W., Jin, Z., Diab, M. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), Volume 1: Long Papers:1975-1997, (Editors: Duh, Kevin and Gomez, Helena and Bethard, Steven), Association for Computational Linguistics, June 2024 (Published)

Haptic Intelligence Bachelor Thesis Kalman Filter Approach to Sensor Fusion of Ultra-Wideband Positioning and IMU Readings for Enhanced Indoor Tracking of Collaborating Humans Hudhud Mughrabi, M. Kadir Has University, Istanbul, Turkey, June 2024, Bachelor of Science (BSc) in Mechatronics Engineering (Published)
The question of how humans collaborate to perform complex tasks such as surgery has previously been investigated via multimodal sensing and analysis. Ultra-wideband (UWB) localization systems can be deployed to track collaborating team members due to good maneuverability even in cramped environments. However, UWB systems' sampling rate is inversely proportional to the number of people tracked, and their accuracy is hindered by electromagnetic occlusion. This thesis combines UWB positioning with measurements from a wearable inertial measurement unit (IMU) by applying an error-state extended Kalman filter (ES-EKF) to improve position and orientation estimation during team collaborative studies. The ES-EKF offers faster and more consistent estimation and can produce estimates even without UWB input. Single-human and multi-human sessions were recorded and filtered for evaluation in comparison to ground truth from optical motion capture. By integrating the IMU, the ES-EKF increases the sampling rate from 0.5–20 Hz to 100 Hz. As it is corrected in only 2 degrees of freedom (DOF), the ES-EKF yields improved results over UWB in 4 out of 6 DOF: lateral and longitudinal position and yaw and pitch orientation. Further filter design implications are suggested for future application of ES-EKF in position and orientation estimation of collaborating humans.
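The fusion principle in this abstract (predict at the IMU rate, correct whenever a UWB fix arrives) can be sketched with a much simpler filter than the thesis uses. The code below is a 1D linear Kalman filter with a position-velocity state, not an ES-EKF, and it ignores orientation entirely. The class name `PositionFilter`, the noise levels, the IMU bias, and the 2 Hz UWB rate are hypothetical values chosen for illustration.

```python
import random


class PositionFilter:
    """1D Kalman filter: state is (position, velocity), acceleration is the input."""

    def __init__(self, accel_var, uwb_var):
        self.x, self.v = 0.0, 0.0
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.accel_var, self.uwb_var = accel_var, uwb_var

    def predict(self, accel, dt):
        """Propagate the state with one IMU acceleration sample."""
        self.x += self.v * dt + 0.5 * accel * dt * dt
        self.v += accel * dt
        (p00, p01), (p10, p11) = self.p
        q = self.accel_var
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]].
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt ** 4 / 4
        n01 = p01 + dt * p11 + q * dt ** 3 / 2
        n10 = p10 + dt * p11 + q * dt ** 3 / 2
        n11 = p11 + q * dt * dt
        self.p = [[n00, n01], [n10, n11]]

    def update(self, uwb_position):
        """Correct the state with one UWB position measurement."""
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.uwb_var            # innovation variance
        k0, k1 = p00 / s, p10 / s         # Kalman gain
        residual = uwb_position - self.x
        self.x += k0 * residual
        self.v += k1 * residual
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


def simulate(seed=0, duration=10.0, imu_rate=100, uwb_every=50):
    """Return final absolute position errors (fused filter, IMU-only dead reckoning)."""
    rng = random.Random(seed)
    dt = 1.0 / imu_rate
    true_accel, imu_bias = 0.3, 0.2       # made-up motion and sensor bias
    kf = PositionFilter(accel_var=1.0, uwb_var=0.01)
    true_x = true_v = dead_x = dead_v = 0.0
    for step in range(1, int(duration * imu_rate) + 1):
        true_x += true_v * dt + 0.5 * true_accel * dt * dt
        true_v += true_accel * dt
        imu = true_accel + imu_bias + rng.gauss(0.0, 0.3)
        dead_x += dead_v * dt + 0.5 * imu * dt * dt
        dead_v += imu * dt
        kf.predict(imu, dt)               # runs at the 100 Hz IMU rate
        if step % uwb_every == 0:         # UWB fix at 2 Hz
            kf.update(true_x + rng.gauss(0.0, 0.1))
    return abs(kf.x - true_x), abs(dead_x - true_x)


if __name__ == "__main__":
    fused_error, dead_reckoning_error = simulate()
    print("fused error [m]:         ", fused_error)
    print("dead-reckoning error [m]:", dead_reckoning_error)
```

Because the filter keeps predicting between UWB fixes, it yields an estimate at every IMU sample, while the sporadic UWB corrections keep the biased IMU integration from drifting.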

Perceiving Systems Conference Paper Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation Petrovich, M., Litany, O., Iqbal, U., Black, M. J., Varol, G., Peng, X. B., Rempe, D. In CVPR Workshop on Human Motion Generation, Seattle, CVPR, June 2024 (Published)
Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. To generate composite animations from a multi-track timeline, we propose a new test-time denoising method. This method can be integrated with any pre-trained motion diffusion model to synthesize realistic motions that accurately reflect the timeline. At every step of denoising, our method processes each timeline interval (text prompt) individually, subsequently aggregating the predictions with consideration for the specific body parts engaged in each action. Experimental comparisons and ablations validate that our method produces realistic motions that respect the semantics and timing of given text prompts.

Neural Capture and Synthesis Perceiving Systems Conference Paper Neuropostors: Neural Geometry-aware 3D Crowd Character Impostors Ostrek, M., Mitra, N. J., O’Sullivan, C. In 27th International Conference on Pattern Recognition (ICPR), Springer, June 2024 (Published)
Crowd rendering and animation was a very active research area over a decade ago, but in recent years this has lessened, mainly due to improvements in graphics acceleration hardware. Nevertheless, there is still a high demand for generating varied crowd appearances and animation for games, movie production, and mixed-reality applications. Current approaches are still limited in terms of both the behavioral and appearance aspects of virtual characters due to (i) high memory and computational demands; and (ii) person-hours needed of skilled artists in the context of short production cycles. A promising previous approach to generating varied crowds was the use of pre-computed impostor representations for crowd characters, which could replace an animation of a 3D mesh with a simplified 2D impostor for every frame of an animation sequence, e.g., Geopostors [1]. However, with their high memory demands at a time when improvements in consumer graphics accelerators were outpacing memory availability, the practicality of such methods was limited. Inspired by this early work and recent advances in the field of Neural Rendering, we present a new character representation: Neuropostors. We train a Convolutional Neural Network as a means of compressing both the geometric properties and animation key-frames for a 3D character, thereby allowing for constant-time rendering of animated characters from arbitrary camera views. Our method also allows for explicit illumination and material control, by utilizing a flexible rendering equation that is connected to the outputs of the neural network.

Robust Machine Learning Article Translational symmetry in convolutions with localized kernels causes an implicit bias toward high frequency adversarial examples Caro, J. O., Ju, Y., Pyle, R., Dey, S., Brendel, W., Anselmi, F., Patel, A. B. Frontiers in Computational Neuroscience, 18:1387077, June 2024 (Published)

Perceiving Systems Conference Paper 4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations Wang, W., Ho, H., Guo, C., Rong, B., Grigorev, A., Song, J., Zarate, J. J., Hilliges, O. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), CVPR, June 2024 (Published)
The studies of human clothing for digital avatars have predominantly relied on synthetic datasets. While easy to collect, synthetic data often fall short in realism and fail to capture authentic clothing dynamics. Addressing this gap, we introduce 4D-DRESS, the first real-world 4D dataset advancing human clothing research with its high-quality 4D textured scans and garment meshes. 4D-DRESS captures 64 outfits in 520 human motion sequences amounting to a total of 78k textured scans. Creating a real-world clothing dataset is challenging, particularly in annotating and segmenting the extensive and complex 4D human scans. To address this, we develop a semi-automatic 4D human parsing pipeline. We efficiently combine a human-in-the-loop process with automation to accurately label 4D scans in diverse garments and body movements. Leveraging precise annotations and high-quality garment meshes, we establish a number of benchmarks for clothing simulation and reconstruction. 4D-DRESS offers realistic and challenging data that complements synthetic sources, paving the way for advancements in research of lifelike human clothing.

Perceiving Systems Conference Paper MonoHair: High-Fidelity Hair Modeling from a Monocular Video Wu, K., Yang, L., Kuang, Z., Feng, Y., Han, X., Shen, Y., Fu, H., Zhou, K., Zheng, Y. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 24164-24173, CVPR, June 2024 (Published)
Undoubtedly, high-fidelity 3D hair is crucial for achieving realism, artistic expression, and immersion in computer graphics. While existing 3D hair modeling methods have achieved impressive performance, the challenge of achieving high-quality hair reconstruction persists: they either require strict capture conditions, making practical applications difficult, or heavily rely on learned prior data, obscuring fine-grained details in images. To address these challenges, we propose a generic framework to achieve high-fidelity hair reconstruction from a monocular video, without specific requirements for environments. Our approach bifurcates the hair modeling process into two main stages: precise exterior reconstruction and interior structure inference. The exterior is meticulously crafted using our Patch-based Multi-View Optimization (PMVO). This method strategically collects and integrates hair information from multiple views, independent of prior data, to produce a high-fidelity exterior 3D line map. This map not only captures intricate details but also facilitates the inference of the hair’s inner structure. For the interior, we employ a data-driven, multi-view 3D hair reconstruction method. This method utilizes 2D structural renderings derived from the reconstructed exterior, mirroring the synthetic 2D inputs used during training. This alignment effectively bridges the domain gap between our training data and real-world data, thereby enhancing the accuracy and reliability of our interior structure inference. Lastly, we generate a strand model and resolve the directional ambiguity using our hair growth algorithm. Our experiments demonstrate that our method exhibits robustness across diverse hairstyles and achieves state-of-the-art performance.

Perceiving Systems Conference Paper TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation Dwivedi, S. K., Sun, Y., Patel, P., Feng, Y., Black, M. J. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 1323-1333, CVPR, June 2024 (Published)
We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy. The current best methods leverage large datasets of 3D pseudo-ground-truth (p-GT) and 2D keypoints, leading to robust performance. With such methods, we observe a paradoxical decline in 3D pose accuracy with increasing 2D accuracy. This is caused by biases in the p-GT and the use of an approximate camera projection model. We quantify the error induced by current camera models and show that fitting 2D keypoints and p-GT accurately causes incorrect 3D poses. Our analysis defines the invalid distances within which minimizing 2D and p-GT losses is detrimental. We use this to formulate a new loss, Threshold-Adaptive Loss Scaling (TALS), that penalizes gross 2D and p-GT losses but not smaller ones. With such a loss, there are many 3D poses that could equally explain the 2D evidence. To reduce this ambiguity, we need a prior over valid human poses, but such priors can introduce unwanted bias. To address this, we exploit a tokenized representation of human pose and reformulate the problem as token prediction. This restricts the estimated poses to the space of valid poses, effectively providing a uniform prior. Extensive experiments on the EMDB and 3DPW datasets show that our reformulated keypoint loss and tokenization allow us to train on in-the-wild data while improving 3D accuracy over the state-of-the-art.

Haptic Intelligence Article AiroTouch: Enhancing Telerobotic Assembly through Naturalistic Haptic Feedback of Tool Vibrations Gong, Y., Mat Husin, H., Erol, E., Ortenzi, V., Kuchenbecker, K. J. Frontiers in Robotics and AI, 11(1355205):1-15, May 2024 (Published)
Teleoperation allows workers to safely control powerful construction machines; however, its primary reliance on visual feedback limits the operator's efficiency in situations with stiff contact or poor visibility, hindering its use for assembly of pre-fabricated building components. Reliable, economical, and easy-to-implement haptic feedback could fill this perception gap and facilitate the broader use of robots in construction and other application areas. Thus, we adapted widely available commercial audio equipment to create AiroTouch, a naturalistic haptic feedback system that measures the vibration experienced by each robot tool and enables the operator to feel a scaled version of this vibration in real time. Accurate haptic transmission was achieved by optimizing the positions of the system's off-the-shelf accelerometers and voice-coil actuators. A study was conducted to evaluate how adding this naturalistic type of vibrotactile feedback affects the operator during telerobotic assembly. Thirty participants used a bimanual dexterous teleoperation system (Intuitive da Vinci Si) to build a small rigid structure under three randomly ordered haptic feedback conditions: no vibrations, one-axis vibrations, and summed three-axis vibrations. The results show that users took advantage of both tested versions of the naturalistic haptic feedback after gaining some experience with the task, causing significantly lower vibrations and forces in the second trial. Subjective responses indicate that haptic feedback increased the realism of the interaction and reduced the perceived task duration, task difficulty, and fatigue. As hypothesized, higher haptic feedback gains were chosen by users with larger hands and for the smaller sensed vibrations in the one-axis condition. 
These results elucidate important details for effective implementation of naturalistic vibrotactile feedback and demonstrate that our accessible audio-based approach could enhance user performance and experience during telerobotic assembly in construction and other application domains.

Empirical Inference Conference Paper Can Large Language Models Infer Causation from Correlation? Jin, Z., Liu, J., Lyu, Z., Poff, S., Sachan, M., Mihalcea, R., Diab*, M., Schölkopf*, B. The Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal supervision (Published)

Empirical Inference Conference Paper Causal Modeling with Stationary Diffusions Lorch, L., Krause*, A., Schölkopf*, B. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 238:1927-1935, Proceedings of Machine Learning Research, (Editors: Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen), PMLR, May 2024, *equal supervision (Published)

Empirical Inference Conference Paper Certified private data release for sparse Lipschitz functions Donhauser, K., Lokna, J., Sanyal, A., Boedihardjo, M., Hönig, R., Yang, F. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 238:1396-1404, Proceedings of Machine Learning Research, (Editors: Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen), PMLR, May 2024 (Published)