Publications


Empirical Inference Ph.D. Thesis The Geometry of Learning Via Loss Landscape Curvature Singh, S. P. ETH Zurich, Switzerland, May 2025, CLS Fellowship Program (Published) BibTeX

Social Foundations of Computation Conference Paper To Give or Not to Give? The Impacts of Strategically Withheld Recourse Chen, Y., Estornell, A., Vorobeychik, Y., Liu, Y. In Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, The Twenty-Eighth International Conference on Artificial Intelligence and Statistics (AISTATS), May 2025 (Published)
Individuals often aim to reverse undesired outcomes in interactions with automated systems, like loan denials, by either implementing system-recommended actions (recourse), or manipulating their features. While providing recourse benefits users and enhances system utility, it also provides information about the decision process that can be used for more effective strategic manipulation, especially when the individuals collectively share such information with each other. We show that this tension leads rational utility-maximizing systems to frequently withhold recourse, resulting in decreased population utility, particularly impacting sensitive groups. To mitigate these effects, we explore the role of recourse subsidies, finding them effective in increasing the provision of recourse actions by rational systems, as well as lowering the potential social cost and mitigating unfairness caused by recourse withholding.
arXiv URL BibTeX

Empirical Inference Conference Paper Training Neural Samplers with Reverse Diffusive KL Divergence He*, J., Chen*, W., Zhang*, M., Barber, D., Hernández-Lobato, J. M. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 258:5167-5175, Proceedings of Machine Learning Research, (Editors: Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz), PMLR, May 2025, *equal contribution (Published) URL BibTeX

Haptic Intelligence Embodied Vision Robotics Conference Paper Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing Mack, L., Grüninger, F., Richardson, B. A., Lendway, R., Kuchenbecker, K. J., Stueckler, J. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 12401-12407, Atlanta, USA, May 2025 (Published)
Accurate 3D pose estimation of grasped objects is an important prerequisite for robots to perform assembly or in-hand manipulation tasks, but object occlusion by the robot's own hand greatly increases the difficulty of this perceptual task. Here, we propose that combining visual information with binary, low-resolution tactile contact measurements from across the interior surface of an articulated robotic hand can mitigate this issue. The visuo-tactile object-pose-estimation problem is formulated probabilistically in a factor graph. The pose of the object is optimized to align with the two kinds of measurements using a robust cost function to reduce the influence of outlier readings. The advantages of the proposed approach are first demonstrated in simulation: a custom 15-DOF robot hand with one binary tactile sensor per link grasps 17 YCB objects while observed by an RGB-D camera. This low-resolution in-hand tactile sensing significantly improves object-pose estimates under high occlusion and also high visual noise. We also show these benefits through grasping tests with a preliminary real version of our tactile hand, obtaining reasonable visuo-tactile estimates of object pose at approximately 12.9 Hz on average.
DOI BibTeX

Empirical Inference Conference Paper Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector Zhang, A., Xiao, T. Z., Liu, W., Bamler, R., Wischik, D. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 258:2701-2709, Proceedings of Machine Learning Research, (Editors: Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz), PMLR, May 2025 (Published) URL BibTeX

Autonomous Learning Miscellaneous Emergence of natural and robust bipedal walking by learning from biologically plausible objectives Schumacher, P., Geijtenbeek, T., Caggiano, V., Kumar, V., Schmitt, S., Martius, G., Haeufle, D. F. iScience, 28(4):112203, April 2025 (Published)
Humans show unparalleled ability when maneuvering diverse terrains. While reinforcement learning (RL) has shown great promise for musculoskeletal simulation in the development of robust controllers, complex behaviors are only achievable under extensive use of motion data. We demonstrate that the combination of a recent RL algorithm with a biologically plausible reward is capable of learning controllers for 4 different musculoskeletal models and achieves locomotion with up to 90 muscles without demonstrations. Our controllers generalize to diverse and unseen terrains, while only a single adaptive objective function is needed for training. We validate our findings on four models in two different simulators. The RL agents perform robustly with complex 3D models, where reflex-controllers are difficult to apply, and produce close-to-natural motion. This is a first step for the motor control, biomechanics, and rehabilitation communities to generate complex human movements with RL, without using motion data or simple unrepresentative models.
DOI URL BibTeX

Perceiving Systems Ph.D. Thesis Estimating Human and Camera Motion From RGB Data Kocabas, M. April 2025 (Published)
This thesis presents a unified framework for markerless 3D human motion analysis from monocular videos, addressing three interrelated challenges that have limited the fidelity of existing approaches: (i) achieving temporally consistent and physically plausible human motion estimation, (ii) accurately modeling perspective camera effects in unconstrained settings, and (iii) disentangling human motion from camera motion in dynamic scenes. Our contributions are realized through three complementary methods. First, we introduce VIBE (Video Inference for Body Pose and Shape Estimation), a novel video pose and shape estimation framework. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose VIBE, which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. Second, we propose SPEC (Seeing People in the wild with Estimated Cameras), the first in-the-wild 3D human pose and shape (HPS) method that estimates the perspective camera from a single image and employs this to reconstruct 3D human bodies more accurately. Due to the lack of camera parameter information for in-the-wild images, existing 3D HPS estimation methods make several simplifying assumptions: weak-perspective projection, large constant focal length, and zero camera rotation.
These assumptions often do not hold and we show, quantitatively and qualitatively, that they cause errors in the reconstructed 3D shape and pose. To address this, SPEC proceeds in two steps. First, we train a neural network to estimate the field of view, camera pitch, and roll given an input image. We employ novel losses that improve the camera calibration accuracy over previous work. We then train a novel network that concatenates the camera calibration to the image features and uses these together to regress 3D body shape and pose. SPEC is more accurate than the prior art on the standard benchmark (3DPW) as well as two new datasets with more challenging camera views and varying focal lengths. Specifically, we create a new photorealistic synthetic dataset (SPEC-SYN) with ground truth 3D bodies and a novel in-the-wild dataset (SPEC-MTP) with calibration and high-quality reference bodies. Third, we develop PACE (Person And Camera Estimation), a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the entangling of human and camera motions in the video. Existing works assume the camera is static and focus on solving the human motion in camera space. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use Simultaneous Localization and Mapping (SLAM) as initialization, we propose to tightly integrate SLAM and human motion priors in an optimization that is inspired by bundle adjustment. Specifically, we optimize human and camera motions to match both the observed human pose and scene features.
This design combines the strengths of SLAM and motion priors, which leads to significant improvements in human and camera motion estimation. We additionally introduce a motion prior that is suitable for batch optimization, making our approach significantly more efficient than existing approaches. Finally, we propose a novel synthetic dataset that enables evaluating camera motion in addition to human motion from dynamic videos. Experiments on the synthetic and real-world datasets demonstrate that our approach substantially outperforms prior art in recovering both human and camera motions. Extensive experiments on standard benchmarks and new datasets we introduced demonstrate that our integrated approach substantially outperforms prior methods in terms of temporal consistency, reconstruction accuracy, and global motion estimation. While these results represent a significant advance in markerless human motion analysis, further work is needed to extend these techniques to multi-person scenarios, severe occlusions, and real-time applications. Overall, this thesis lays a strong foundation for more robust and accurate human motion analysis in unconstrained environments, with promising applications in robotics, augmented reality, sports analysis, and beyond.
Thesis PDF BibTeX

Perceiving Systems Ph.D. Thesis Understanding Human-Scene Interaction through Perception and Generation Yi, H. April 2025 (Published)
Humans are in constant contact with the world as they move through it and interact with it. Understanding Human-Scene Interactions (HSIs) is key to enhancing our perception and manipulation of three-dimensional (3D) environments, which is crucial for various applications such as gaming, architecture, and synthetic data creation. However, creating realistic 3D scenes populated by moving humans is a challenging and labor-intensive task. Existing human-scene interaction datasets are scarce and captured motion datasets often lack scene information. This thesis addresses these challenges by leveraging three specific types of HSI constraints: (1) depth ordering constraint: humans that move in a scene are occluded or occlude objects, thus defining the relative depth ordering of the objects, (2) collision constraint: humans move through free space and do not interpenetrate objects, (3) interaction constraint: when humans and objects are in contact, the contact surfaces occupy the same place in space. Building on these constraints, we propose three distinct methodologies: capturing HSI from a monocular RGB video, generating HSI by generating scenes from input human motions (scenes from humans) and generating human motion from scenes (humans from scenes). Firstly, we introduce MOVER, which jointly reconstructs 3D human motion and the interactive scenes from an RGB video. This optimization-based approach leverages these three aforementioned constraints to enhance the consistency and plausibility of reconstructed scene layouts and to refine the initial 3D human pose and shape estimations. Secondly, we present MIME, which takes 3D humans and a floor map as input to create realistic and interactive 3D environments. This method applies collision and interaction constraints, and employs an auto-regressive transformer architecture that integrates objects into the scene based on existing human motion.
The training data is enriched by populating the 3D FRONT scene dataset with 3D humans. By treating human movement as a “scanner” of the environment, this method results in furniture layouts that reflect true human activities, increasing the diversity and authenticity of the environments. Lastly, we introduce TeSMo, which generates 3D human motion from given 3D scenes and text descriptions, adhering to the collision and interaction constraints. It utilizes a text-controlled scene-aware motion generation framework based on denoising diffusion models. Annotated navigation and interaction motions are embedded within scenes to support the model’s training, allowing for the generation of diverse and realistic human-scene interactions tailored to specific settings and object arrangements. In conclusion, these methodologies significantly advance our understanding and synthesis of human-scene interactions, offering realistic modeling of 3D environments.
thesis BibTeX

Physical Intelligence Article Navigating microalgal biohybrids through confinements with magnetic guidance Akolpoglu, M. B., Baltaci, S. F., Bozuyuk, U., Karaz, S., Sitti, M. Matter, 8:102052, April 2025 (Published)
In the natural world, microorganisms constantly navigate through confined spaces—such as those found in tissues, biological gels, and soil—yet their behavior in such environments remains poorly understood. Here, we explore this phenomenon by examining the navigation of magnetic microalgal biohybrids in constrained microenvironments. By leveraging the inherent propulsion of green microalgae and external steering capabilities acquired through the magnetization of microalgal cells, our biohybrids exhibit efficient navigation in viscous and confined microenvironments. Through high-yield fabrication and magnetic manipulation, we show precise control over their movement. Our findings reveal distinct navigation patterns influenced by magnetic guidance, namely backtracking and crossing, shedding light on the unexplored dynamics of confined locomotion assisted by magnetism. Our work highlights the significance of understanding microalgal biohybrid swimming behavior, offering crucial insights for future biotechnological and biomedical applications requiring precise navigation in confined environments.
DOI URL BibTeX

Haptic Intelligence Miscellaneous A Method for Single-Input Sequencing of Hyperelastic Balloons Gertler, I., Kuchenbecker, K. J. Extended abstract (3 pages) presented at the IEEE-RAS International Conference on Soft Robotics (RoboSoft), Lausanne, Switzerland, April 2025 (Published)
This study demonstrates that encasing a hyperelastic balloon in an inextensible sleeve greatly increases its burst pressure while not influencing its minimum pressure. This simple mechanical behavior can be used to produce an asymmetric inflation-deflation sequence for coupled balloons with different thicknesses so they could serve as a soft robot's rear and front anchors when driven from a single fluid supply.
BibTeX

Empirical Inference Autonomous Learning Conference Paper Advancing Out-of-Distribution Detection via Local Neuroplasticity Canevaro, A., Schmidt, J., Marvi, M. S., Yu, H., Martius, G., Jordan, J. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Haptic Intelligence Robotics Miscellaneous Bio-Inspired Gradient (BIG) Whiskers: Stiffness-Shifting Structures Provide Dynamic Functional Benefits for Contact Sensing Schulz, A. K., Andrussow, I., Farsijani, F., Faulkner, R., Kuchenbecker, K. J. Extended abstract (3 pages) presented at the IEEE-RAS International Conference on Soft Robotics (RoboSoft), Lausanne, Switzerland, April 2025 (Published)
Mammal whiskers have inspired many sensors that can help robots find obstacles, identify textures, or sense flow. Though they vary in geometry, past bio-inspired whisker sensors were primarily constructed from homogenous materials. Interestingly, animal whiskers tend to shift from a stiff root to a much softer point; this material stiffness gradient is hypothesized to provide functional benefits such as reduction of wear and amplification of contact sensations. We take inspiration from nature to fabricate bio-inspired gradient (BIG) whiskers via 3D printing, and we assess their performance compared to stiff, medium, and soft homogenous artificial whiskers with the same geometry. Tests with controlled quasi-static and dynamic perturbations allow us to measure the whisker point deflection and the reaction torque at the stationary whisker root, respectively. The dynamic results reveal that BIG whiskers uniquely encode contact location along their length through torque magnitude and frequency, features that are not seen in the homogenous whiskers. These exciting preliminary findings motivate further exploration of robotic whiskers and other sensing structures with bio-inspired stiffness gradients.
BibTeX

Haptic Intelligence Robotics Article Building Instructions You Can Feel: Edge-Changing Haptic Devices for Digitally Guided Construction Tashiro, N., Faulkner, R., Melnyk, S., Rosales Rodriguez, T., Javot, B., Tahouni, Y., Cheng, T., Wood, D., Menges, A., Kuchenbecker, K. J. ACM Transactions on Computer-Human Interaction, 32(1):1-40, April 2025 (Published)
Recent efforts to connect builders to digital designs during construction have primarily focused on visual augmented reality, which requires accurate registration and specific lighting, and which could prevent a user from noticing safety hazards. Haptic interfaces, on the other hand, can convey physical design parameters through tangible local cues that don't distract from the surroundings. We propose two edge-changing haptic devices that use small inertial measurement units (IMUs) and linear actuators to guide users to perform construction tasks in real time: Drangle gives feedback for angling a drill relative to gravity, and Brangle assists with orienting bricks in the plane. We conducted a study with 18 participants to evaluate user performance and gather qualitative feedback. All users understood the edge-changing cues from both devices with minimal training. Drilling holes with Drangle was somewhat less accurate but much faster and easier than with a mechanical guide; 89% of participants preferred Drangle over the mechanical guide. Users generally understood Brangle's feedback but found its hand-size-specific grip, palmar contact, and attractive tactile cues less intuitive than Drangle's generalized form factor, fingertip contact, and repulsive cues. After summarizing design considerations, we propose application scenarios and speculate how such devices could improve construction workflows.
DOI BibTeX

Empirical Inference Perceiving Systems Conference Paper Can Large Language Models Understand Symbolic Graphics Programs? Qiu, Z., Liu, W., Feng, H., Liu, Z., Xiao, T. Z., Collins, K. M., Tenenbaum, J. B., Weller, A., Black, M. J., Schölkopf, B. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published)
Against the backdrop of enthusiasm for large language models (LLMs), there is a growing need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models have not encountered during training. Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of LLMs. Popular in computer graphics, these programs procedurally generate visual data. While LLMs exhibit impressive skills in general program synthesis and analysis, symbolic graphics programs offer a new layer of evaluation: they allow us to test an LLM’s ability to answer semantic questions about the images or 3D geometries without a vision encoder. To semantically understand the symbolic programs, LLMs would need to possess the ability to “imagine” and reason how the corresponding graphics content would look with only the symbolic description of the local curvatures and strokes. We use this task to evaluate LLMs by creating a large benchmark for the semantic visual understanding of symbolic graphics programs, built procedurally with minimal human effort. Particular emphasis is placed on transformations of images that leave the image level semantics invariant while introducing significant changes to the underlying program. We evaluate commercial and open-source LLMs on our benchmark to assess their ability to reason about visual output of programs, finding that LLMs considered stronger at reasoning generally perform better. Lastly, we introduce a novel method to improve this ability – Symbolic Instruction Tuning (SIT), in which the LLM is finetuned with pre-collected instruction data on symbolic graphics programs. Interestingly, we find that SIT not only improves LLM’s understanding on symbolic programs, but it also improves general reasoning ability on various other benchmarks.
arXiv Paper BibTeX

Empirical Inference Conference Paper Compositional simulation-based inference for time series Gloeckler*, M., Toyota*, S., Fukumizu, K., Macke, J. H. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Robust Machine Learning Conference Paper Cross-Entropy Is All You Need to Invert the Data Generating Process Reizinger*, P., Bizeul*, A., Juhos*, A., Vogt, J. E., Balestriero, R., Brendel, W., Klindt, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *Joint first authorship (Published) arXiv BibTeX

Empirical Inference Conference Paper Differentially private steering for Large language model alignment Goel, A., Hu, Y., Gurevych, I., Sanyal, A. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Perceiving Systems Conference Paper Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets Liu, Z., Xiao, T. Z., Liu, W., Bengio, Y., Zhang, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published)
While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desired to align and finetune pretrained diffusion models with some reward functions that are either designed by experts or learned from small-scale datasets. Existing post-training methods for reward finetuning of diffusion models typically suffer from lack of diversity in generated samples, lack of prior preservation, and/or slow convergence in finetuning. Inspired by recent successes in generative flow networks (GFlowNets), a class of probabilistic models that sample with the unnormalized density of a reward function, we propose a novel GFlowNet method dubbed Nabla-GFlowNet (abbreviated as ∇-GFlowNet), the first GFlowNet method that leverages the rich signal in reward gradients, together with an objective called ∇-DB plus its variant residual ∇-DB designed for prior-preserving diffusion finetuning. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model, on different realistic reward functions.
arXiv BibTeX

Empirical Inference Conference Paper Improving Probabilistic Diffusion Models With Optimal Covariance Matching Ou*, Z., Zhang*, M., Zhang, A., Xiao, T. Z., Li, Y., Barber, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) arXiv BibTeX

Empirical Inference Conference Paper Influence Functions for Scalable Data Attribution in Diffusion Models Mlodozeniec, B. K., Eschenhagen, R., Bae, J., Immer, A., Krueger, D., Turner, R. E. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Robust Machine Learning Conference Paper Interaction Asymmetry: A General Principle for Learning Composable Abstractions Brady, J., von Kügelgen, J., Lachapelle, S., Buchholz, S., Kipf*, T., Brendel*, W. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *joint senior author (Published) arXiv BibTeX

Empirical Inference Conference Paper Language Model Alignment in Multilingual Trolley Problems Jin, Z., Kleiman-Weiner, M., Piatti, G., Levine, S., Liu, J., Gonzalez, F., Ortu, F., Strausz, A., Sachan, M., Mihalcea, R., Choi, Y., Schölkopf, B. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Social Foundations of Computation Conference Paper Limits to Predicting Online Speech Using Large Language Models Remeli, M., Hardt, M., Williamson, R. C. April 2025 (Submitted)
We study the predictability of online speech on social media, and whether predictability improves with information outside a user's own posts. Recent work suggests that the predictive information contained in posts written by a user's peers can surpass that of the user's own posts. Motivated by the success of large language models, we empirically test this hypothesis. We define unpredictability as a measure of the model's uncertainty, i.e., its negative log-likelihood on future tokens given context. As the basis of our study, we collect a corpus of 6.25M posts from more than five thousand X (previously Twitter) users and their peers. Across three large language models ranging in size from 1 billion to 70 billion parameters, we find that predicting a user's posts from their peers' posts performs poorly. Moreover, the value of the user's own posts for prediction is consistently higher than that of their peers'. Across the board, we find that the predictability of social media posts remains low, comparable to predicting financial news without context. We extend our investigation with a detailed analysis about the causes of unpredictability and the robustness of our findings. Specifically, we observe that a significant amount of predictive uncertainty comes from hashtags and @-mentions. Moreover, our results replicate if instead of prompting the model with additional context, we finetune on additional context.
arXiv BibTeX

Haptic Intelligence Conference Paper My Robot, My Motion: Expressive Real-Time Teleoperation Mohan, M., Kuchenbecker, K. J. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), 1797-1799, Hands-on demonstration presented at the ACM/IEEE International Conference on Human-Robot Interaction (HRI), Melbourne, Australia, April 2025 (Published)
Humanoid social robots need to be able to move expressively. Traditional manipulation-focused teleoperation systems primarily control the end-effector's position and orientation, neglecting the extra degrees of freedom in human and robotic arms, which can lead to unnatural movements. This demonstration presents our Optimization-based Customizable Retargeting Algorithm (OCRA), designed for real-time motion mapping between dissimilar kinematic chains. OCRA functions well with widely varying robot-arm joint configurations. The presenter will use a commercial motion-capture suit to teleoperate the upper body of a NAO humanoid robot, demonstrating OCRA's ability to create intuitive, human-like movements in real time.
DOI URL BibTeX

Empirical Inference Autonomous Learning Conference Paper On the Transfer of Object-Centric Representation Learning Didolkar, A. R., Zadaianchuk, A., Goyal, A., Mozer, M. C., Bengio, Y., Martius*, G., Seitzer*, M. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Preference Elicitation for Offline Reinforcement Learning Pace, A., Schölkopf, B., Rätsch, G., Ramponi, G. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Haptic Intelligence Article Simulation Training with Haptic Feedback of Instrument Vibrations Reduces Resident Workload During Live Robot-Assisted Sleeve Gastrectomy Gomez, E. D., Mat Husin, H., Dumon, K. R., Williams, N. N., Kuchenbecker, K. J. Surgical Endoscopy, 39(3):1523-1535, April 2025 (Published)
Background: New surgeons experience heavy workload during robot-assisted surgery partially because they must use vision to compensate for the lack of haptic feedback. We hypothesize that providing realistic haptic feedback during dry-lab simulation training may accelerate learning and reduce workload during subsequent surgery on patients. Methods: We conducted a single-blinded study with twelve general surgery residents (third and seventh post-graduate year, PGY) randomized into haptic and control groups. Participants performed five simulated bariatric surgeries on a custom inanimate simulator followed by live robot-assisted sleeve gastrectomies (RASGs) using da Vinci robots. The haptic group received naturalistic haptic feedback of instrument vibrations during their first four simulated procedures. Participants completed pre-/post-procedure STAI and post-procedure NASA-TLX questionnaires in both simulation and the operating room (OR). Results: Higher PGY level (simulation: p<0.001, OR: p=0.004), shorter operative time (simulation: p<0.001, OR: p=0.003), and lower pre-procedure STAI (simulation: p=0.003, OR: p<0.001) were significantly associated with lower self-reported overall workload in both operative settings; PGY-7s reported about 10% lower workload than PGY-3s. The haptic group had significantly lower overall covariate-adjusted NASA-TLX during the fourth (p=0.03) and fifth (p=0.04) simulated procedures and across all OR procedures (p=0.047), though not for only the first three OR procedures. Haptic feedback reduced physical demand (simulation: p<0.001, OR: p=0.001) and increased perceived performance (simulation: p=0.031, OR: p<0.001) in both settings. Conclusion: Haptic feedback of instrument vibrations provided during robotic surgical simulation reduces trainee workload during both simulation and live OR cases. The implications of workload reduction and its potential effects on patient safety warrant further investigation.
DOI BibTeX

Empirical Inference Conference Paper Standardizing Structural Causal Models Ormaniec*, W., Sussex*, S., Lorch*, L., Schölkopf, B., Krause, A. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) arXiv BibTeX

Empirical Inference Conference Paper The Directionality of Optimization Trajectories in Neural Networks Singh, S. P., He, B., Hofmann, T., Schölkopf, B. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) URL BibTeX

Empirical Inference Article The Fiction Machine Bottou, L., Schölkopf, B. SIAM News, 58(3), April 2025 (Published) URL BibTeX

Empirical Inference Conference Paper What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis Ormaniec, W., Dangel, F., Singh, S. P. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Conference Paper Why AI Is WEIRD and Should Not Be This Way: Towards AI For Everyone, With Everyone, By Everyone Mihalcea*, R., Ignat*, O., Bai, L., Borah, A., Chiruzzo, L., Jin, Z., Kwizera, C., Nwatu, J., Poria, S., Solorio, T. The Thirty-Ninth AAAI Conference on Artificial Intelligence, AAAI 2025 (Senior Member Presentation Track), (27)28657-28670, (Editors: Toby Walsh, Julie Shah, Zico Kolter), AAAI Press, April 2025, *equal contribution (Published)
This paper presents a vision for creating AI systems that are inclusive at every stage of development, from data collection to model design and evaluation. We address key limitations in the current AI pipeline and its WEIRD* representation, such as lack of data diversity, biases in model performance, and narrow evaluation metrics. We also focus on the need for diverse representation among the developers of these systems, as well as incentives that are not skewed toward certain groups. We highlight opportunities to develop AI systems that are for everyone (with diverse stakeholders in mind), with everyone (inclusive of diverse data and annotators), and by everyone (designed and developed by a globally diverse workforce). *WEIRD = an acronym coined by Joseph Henrich to highlight the coverage limitations of many psychological studies, referring to populations that are Western, Educated, Industrialized, Rich, and Democratic; while we do not fully adopt this term for AI, as its current scope does not perfectly align with the WEIRD dimensions, we believe that today’s AI has a similarly "weird" coverage, particularly in terms of who is involved in its development and who benefits from it.
arXiv DOI URL BibTeX

Haptic Intelligence Perceiving Systems Article Wrist-to-Wrist Bioimpedance Can Reliably Detect Discrete Self-Touch Forte, M., Vardar, Y., Javot, B., Kuchenbecker, K. J. IEEE Transactions on Instrumentation and Measurement, 74(4006511):1-11, April 2025 (Published)
Self-touch is crucial in human communication, psychology, and disease transmission, yet existing methods for detecting self-touch are often invasive or limited in scope. This study systematically investigates the feasibility of using non-invasive electrical bioimpedance for detecting discrete self-touch poses across individuals. While previous research has focused on classifying defined self-touch poses, our work explores how various poses cause bioimpedance changes, providing insights into the underlying physiological mechanisms. We thus created a dataset of 27 genuine self-touch poses, including skin-to-skin contact between the hands and face and skin-to-clothing contact between the hands and chest, alongside six adversarial mid-air gestures. We then measured the wrist-to-wrist bioimpedance of 30 adults (15 female, 15 male) across these poses, with each measurement preceded by a no-touch pose serving as a baseline. Statistical analysis of the measurements showed that skin-to-skin contacts cause significant changes in bioimpedance magnitude between 237.8 kHz and 4.1 MHz, while adversarial gestures do not; skin-to-clothing contacts cause less-significant changes due to the influence and variability of the clothing material. Furthermore, our analysis highlights the sensitivity of bioimpedance to the body parts involved, skin contact area, and individual's characteristics. Our contributions are two-fold: (1) we demonstrate that bioimpedance offers a practical, non-invasive solution for detecting self-touch poses involving skin-to-skin contact, (2) researchers can leverage insights from our study to determine whether a pose can be detected without extensive testing.
DOI BibTeX

Empirical Inference Conference Paper MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs Opedal*, A., Shirakami*, H., Schölkopf, B., Saparov, A., Sachan, M. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) arXiv BibTeX

Perceiving Systems Ph.D. Thesis Democratizing 3D Human Digitization Xiu, Y. March 2025 (Published)
Richard Feynman once said, “What I cannot create, I do not understand.” Similarly, making virtual humans more realistic helps us better grasp human nature. Simulating lifelike avatars has scientific value (such as in biomechanics) and practical applications (like the Metaverse). However, creating them affordably at scale with high quality remains challenging. Reconstructing complex poses, varied clothing, and unseen areas from casual photos under real-world conditions is still difficult. We address this through a series of works—ICON, ECON, TeCH, PuzzleAvatar—bridging pixel-based reconstruction with text-guided generation to reframe reconstruction as conditional generation. This allows us to turn everyday photos, like personal albums featuring random poses, diverse clothing, tricky angles, and arbitrary cropping, into 3D avatars. The process converts unstructured data into structured output without unnecessary complexity. With these techniques, we can efficiently scale up the creation of digital humans using readily available imagery.
Thesis BibTeX

Robotic Materials Article A robotic and virtual testing platform highlighting the promise of soft wearable actuators for wrist tremor suppression Shagan Shomron, A., Chase-Markopoulou, C., Walter, J. R., Sellhorn-Timm, J., Shao, Y., Nadler, T., Benson, A., Wochner, I., Rumley, E. H., Wurster, I., Klocke, P., Weiss, D., Schmitt, S., Keplinger, C., Haeufle, D. F. Device, 3:100719, March 2025 (Published)
Nearly 80 million people in the world deal with medical conditions that cause involuntary periodic movements known as tremors. Wearable soft robotic devices offer a potential solution for actively suppressing these tremors. However, existing prototypes face limitations in actuation performance and complex testing procedures. We present a comprehensive approach for the rapid evaluation of emerging wearable tremor-suppression technologies. This method combines reproducing patient-recorded tremor episodes and measuring tremor suppression in a robotic platform, termed a "mechanical patient", with validation of the achieved suppression performance of soft actuators via biomechanical modeling, thereby avoiding time-consuming clinical testing in the early stages of development. Using this approach, we highlight that an antagonistic pair of slim and lightweight electrohydraulic actuators can effectively …
Press release Video (overview) Video (technical description) Article in pdf DOI URL BibTeX

Haptic Intelligence Article A Sleeve Alters the Pressure-Stretch Curve of a Hyperelastic Balloon to Enable Pre-Programmed Sequencing Gertler, I., Kuchenbecker, K. J. Advanced Materials Technologies, 10(6):2400993, March 2025 (Published)
Coupled hyperelastic balloons that anchor alternately against a lumen wall provide an appealing locomotion method for soft robots, especially for pipe inspection and medical interventions. However, it is still challenging to use a single fluid channel to obtain a practical balloon actuation sequence, where the rear anchor is both the first to inflate and the first to deflate. The common solution delays the front balloon's reaction using fluid dynamics, producing a slow and/or bulky system. This study presents a new method that utilizes an inextensible sleeve along with geometry and mechanical properties to set the pressure-stretch curve of two silicone-rubber balloons so they could serve as the rear and front anchors when driven from a single fluid supply. Experimental measurements and numerical simulations compare the characteristic curves of thin and thick spherical balloons with identical diameters to that of a thin balloon inside a rigid encasing sleeve that delays its initial expansion. Pairing this encased thin balloon with a non-encased thick balloon yields the desired asymmetric actuation sequence. A physical demonstration of the behavior needed for self-propelling robots is achieved by placing such balloons within rigid tubes, connecting them to a shared supply, and sequentially adding and removing fluid.
DOI BibTeX

Empirical Inference Article Early warning of complex climate risk with integrated artificial intelligence Reichstein, M., Benson, V., Blunk, J., Camps-Valls, G., Creutzig, F., Fearnley, C. J., Han, B., Kornhuber, K., Rahaman, N., Schölkopf, B., Tárraga, J. M., Vinuesa, R., Dall, K., Denzler, J., Frank, D., Martini, G., Nganga, N., Maddix, D. C., Weldemariam, K. Nature Communications, 16(1), March 2025 (Published) DOI BibTeX

Haptic Intelligence Miscellaneous Error-State Extended Kalman Filter Sensor Fusion for Tracking Collaborating Humans Hudhud Mughrabi, M., Allemang–Trivalle, A., Kuchenbecker, K. J. Extended abstract (3 pages) presented at the German Robotics Conference (GRC), Nuremberg, Germany, March 2025 (Published)
How teams collaborate to perform complex tasks, from team sports to surgical procedures, has previously been investigated via multimodal sensing and analysis. Ultra-wideband (UWB) positioning systems are highly mobile and can be used to track collaborating team members even in cramped environments. However, the sampling rate of UWB systems is inversely proportional to the number of people tracked, and their accuracy is hindered by electromagnetic occlusion. To improve position and orientation estimation during team collaborative studies, we propose to fuse UWB positioning with a wearable inertial measurement unit (IMU) by applying an error-state extended Kalman filter (ES-EKF). This filter offers faster and more consistent estimation and remains functional even in the absence of UWB input. Single-human and multi-human sessions were recorded and filtered for evaluation against ground truth from optical motion capture. By integrating IMU readings, the ES-EKF increases the sampling rate from 0.5-20 Hz to 100 Hz. Even by correcting only planar position in the room, the ES-EKF yields improved results over UWB in four out of six DOF: lateral and longitudinal position and yaw and pitch orientation.
BibTeX

Perceiving Systems Conference Paper Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photo-Realistic Appearance from Multi-View Video Rong, B., Grigorev, A., Wang, W., Black, M. J., Thomaszewski, B., Tsalicoglou, C., Hilliges, O. In International Conference on 3D Vision (3DV), International Conference on 3D Vision, March 2025 (Published)
We introduce Gaussian Garments, a novel approach for reconstructing realistic-looking, simulation-ready garment assets from multi-view videos. Our method represents garments with a combination of a 3D mesh and a Gaussian texture that encodes both the color and high-frequency surface details. This representation enables accurate registration of garment geometries to multi-view videos and helps disentangle albedo textures from lighting effects. Furthermore, we demonstrate how a pre-trained Graph Neural Network (GNN) can be fine-tuned to replicate the real behavior of each garment. The reconstructed Gaussian Garments can be automatically combined into multi-garment outfits and animated with the fine-tuned GNN.
arXiv project video URL BibTeX

Haptic Intelligence Miscellaneous Haptify: A Measurement System for Benchmarking Grounded Force-Feedback Devices Fazlollahi, F., Kuchenbecker, K. J. Extended abstract (3 pages) presented at the German Robotics Conference (GRC), Nuremberg, Germany, March 2025 (Published)
Grounded force-feedback (GFF) devices are a well-established and diverse category of haptic technology based on robotic arms. However, the number of designs and their specifications make it challenging to compare devices effectively. We address this challenge by presenting Haptify, a benchmarking system capable of evaluating GFF haptic devices in a thorough, fair, and non-invasive way. The user holds the instrumented device end-effector and moves it through a series of passive and active experiments. Haptify captures the interaction between the hand, device, and ground using a seven-camera optical motion-capture system, a custom 60-cm-square force plate, and a customized sensing end-effector. We propose six key metrics for evaluating GFF device performance: workspace shape, global free-space forces, global free-space vibrations, local dynamic forces and torques, frictionless surface rendering, and stiffness rendering. We then benchmark two commercial haptic devices using Haptify. The more expensive Touch X has a smaller workspace than the 3D Systems Touch, but it outputs smaller free-space forces and vibrations, smaller and more predictable dynamic forces and torques, and higher-quality renderings of a frictionless surface and high stiffness.
BibTeX

Empirical Inference Ph.D. Thesis Learning to Generalize Across Distribution Shifts Träuble, F. J. University of Tübingen, Germany, March 2025, (IMPRS-PhD-Fellowship-Program and ELLIS-PhD-Fellowship-Program) (Published) BibTeX

Empirical Inference Article Real-time inference for binary neutron star mergers using machine learning Dax, M., Green, S. R., Gair, J., Gupte, N., Pürrer, M., Raymond, V., Wildberger, J., Macke, J. H., Buonanno, A., Schölkopf, B. Nature, 639(8053):49-53, March 2025 (Published) DOI URL BibTeX

Perceiving Systems Conference Paper CameraHMR: Aligning People with Perspective Patel, P., Black, M. J. In International Conference on 3D Vision (3DV), International Conference on 3D Vision, March 2025 (Published)
We address the challenge of accurate 3D human pose and shape estimation from monocular images. The key to accuracy and robustness lies in high-quality training data. Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations, assuming a simplified camera with default intrinsics. We make two contributions that improve pGT accuracy. First, to estimate camera intrinsics, we develop a field-of-view prediction model (HumanFoV) trained on a dataset of images containing people. We use the estimated intrinsics to enhance the 4D-Humans dataset by incorporating a full perspective camera model during SMPLify fitting. Second, 2D joints provide limited constraints on 3D body shape, resulting in average-looking bodies. To address this, we use the BEDLAM dataset to train a dense surface keypoint detector. We apply this detector to the 4D-Humans dataset and modify SMPLify to fit the detected keypoints, resulting in significantly more realistic body shapes. Finally, we upgrade the HMR2.0 architecture to include the estimated camera parameters. We iterate model training and SMPLify fitting initialized with the previously trained model. This leads to more accurate pGT and a new model, CameraHMR, with state-of-the-art accuracy. Code and pGT are available for research purposes.
arXiv project BibTeX

Perceiving Systems Conference Paper CHOIR: A Versatile and Differentiable Hand-Object Interaction Representation Morales, T., Taheri, O., Lacey, G. In Winter Conference on Applications of Computer Vision (WACV), February 2025 (Published)
Synthesizing accurate hand-object interactions (HOI) is critical for applications in Computer Vision, Augmented Reality (AR), and Mixed Reality (MR). Despite recent advances, the accuracy of reconstructed or generated HOI leaves room for refinement. Some techniques have improved the accuracy of dense correspondences by shifting focus from generating explicit contacts to using rich HOI fields. Still, they lack full differentiability or continuity and are tailored to specific tasks. In contrast, we present a Coarse Hand-Object Interaction Representation (CHOIR), a novel, versatile and fully differentiable field for HOI modelling. CHOIR leverages discrete unsigned distances for continuous shape and pose encoding, alongside multivariate Gaussian distributions to represent dense contact maps with few parameters. To demonstrate the versatility of CHOIR we design JointDiffusion, a diffusion model to learn a grasp distribution conditioned on noisy hand-object interactions or only object geometries, for both refinement and synthesis applications. We demonstrate JointDiffusion’s improvements over the SOTA in both applications: it increases the contact F1 score by 5% for refinement and decreases the sim. displacement by 46% for synthesis. Our experiments show that JointDiffusion with CHOIR yields superior contact accuracy and physical realism compared to SOTA methods designed for specific tasks.
GitHub Paper URL BibTeX

Biomimetic Materials and Machines Article Highly agile flat swimming robot Hartmann, F., Baskaran, M., Raynaud, G., Benbedda, M., Mulleners, K., Shea, H. February 2025 (Published) BibTeX

Rationality Enhancement Article Evaluating the Effectiveness of the InsightApp: A Longitudinal Randomized Controlled Trial on Anxiety, Valued Action, and Psychological Resilience Amo, V., Lieder, F. JMIR Mental Health, 12:e57201, February 2025 (Published)
Background: Anxiety disorders are among the most prevalent mental disorders, and stress plays a significant role in their development. Ecological momentary interventions (EMIs) hold great potential to help people manage stress and anxiety by training emotion regulation and coping skills in real-life settings. InsightApp is a gamified EMI and research tool that incorporates elements from evidence-based therapeutic approaches. It is designed to strengthen people’s metacognitive skills for coping with challenging real-life situations and embracing anxiety and other emotions. Objective: This randomized controlled trial aims to examine the effectiveness of InsightApp in (1) improving individuals’ metacognitive strategies for coping with stress and anxiety and (2) promoting value-congruent action. It also evaluates how long these effects are retained. This experiment advances our understanding of the role of metacognition in emotional and behavioral reactivity to stress. Methods: We conducted a randomized controlled trial with 228 participants (completion rate: n=197, 86.4%; mean age 38, SD 11.50 years; age range 20-80 years; female: n=101, 52.6%; and White: n=175, 91.1%), who were randomly assigned to either the treatment or the active placebo control group. During the 1-week intervention phase, the treatment group engaged with InsightApp, while participants in the control group interacted with a placebo version of the app that delivered executive function training. We assessed the differences between the 2 groups in posttest and follow-up assessments of mental health and well-being while controlling for preexisting differences. Moreover, we used a multilevel model to analyze the longitudinal data, focusing on the within-participant causal effects of the intervention on emotional and behavioral reactivity to daily stressors. Specifically, we measured daily anxiety, struggle with anxiety, and value-congruent action. 
Results: The intervention delivered by InsightApp yielded mixed results. On one hand, we found no significant posttest scores on mental health and well-being measures directly after the intervention or 7 days later (all P>.22). In contrast, when confronted with real-life stress, the treatment group experienced a 15% lower increase in anxiety (1-tailed t test, t197=–2.4; P=.009) and a 12% lower increase in the struggle with anxiety (t197=–1.87; P=.031) than the control group. Furthermore, individuals in the treatment group demonstrated a 7% higher tendency to align their actions with their values compared to the control group (t197=3.23; P=.002). After the intervention period, InsightApp’s positive effects on the struggle with anxiety in reaction to stress were sustained, and increased to an 18% lower reactivity to stress (t197=–2.84; P=.002). Conclusions: As our study yielded mixed results, further studies are needed to obtain an accurate and reliable understanding of the effectiveness of InsightApp. Overall, our findings tentatively suggest that guiding people to apply adaptive metacognitive strategies for coping with real-life stress daily with a gamified EMI is a promising approach that deserves further evaluation.
DOI URL BibTeX

Empirical Inference Article Artificial intelligence for modelling infectious disease epidemics Kraemer, M. U. G., Tsui, J. L., Chang, S. Y., Lytras, S., Khurana, M. P., Vanderslott, S., Bajaj, S., Scheidwasser, N., Curran-Sebastian, J. L., Semenova, E., Zhang, M., Unwin, H. J. T., Watson, O. J., Mills, C., Dasgupta, A., Ferretti, L., Scarpino, S. V., Koua, E., Morgan, O., Tegally, H., et al. Nature, 638(8051):623-635, February 2025 (Published) DOI URL BibTeX