Publications

Haptic Intelligence Miscellaneous The Benefits of Gait Retraining with Vibrotactile Feedback Outweigh Higher Perceived Mental Load Sundaram, V. H., Rokhmanova, N., Halilaj, E., Kuchenbecker, K. J. Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Pittsburgh, USA, August 2025 (Published)
Knee osteoarthritis (KOA) affects millions worldwide, with excessive joint loading linked to disease progression. Modifying the foot progression angle (FPA) while walking is one strategy to reduce knee adduction moments, a measure associated with medial knee joint loading. This study investigated whether two types of vibrotactile biofeedback during a 20-minute treadmill gait-retraining session helped healthy adults better learn and retain a 10° toe-in gait. Participants who received feedback showed greater improvements in FPA accuracy than those without feedback and also reported significantly higher mental effort. The type of feedback that scaled the duration of the vibration with the magnitude of the error led to better short-term retention than no feedback, and it was also preferred by almost all subjects over constant-duration cues. These findings suggest that despite the added cognitive demand, users value biofeedback, emphasizing the need to design gait-retraining tools that consider both learning effectiveness and user experience.

Materials Article Sensitivity Enhancement of a Micro Ring Resonator-Based Photonic Sensor by Using a Gelatin Methacryloyl Functional Coating for the Detection of Metoprolol Tsianaka, A., Schweikert, C., Southan, A., Hoppe, N., Greul, M., Kaschel, M., Vogel, W., Berroth, M., Rademacher, G., Tovar, G. E. M. ACS Applied Optical Materials, 3(7):1556-1566, July 2025 (Published)
Aquatic environments are often contaminated with biopersistent pharmaceuticals, such as the β-blocker metoprolol. The quantitative determination of such pollutants is crucial for environmental monitoring. Therefore, a highly sensitive integrated photonic biosensor for the detection of minute concentrations of metoprolol is presented here. The sensor is based on a thermally robust ring resonator with a hydrogel coating for metoprolol adsorption. Hydrogels consisting of gelatin methacryloyl enabled an increase in the concentration of metoprolol ions in the vicinity of the photonic chip, resulting in high sensitivity of the sensor setup. Compared to an uncoated chip, an increase in sensitivity of up to a factor of 20 was observed. In combination with software-implemented signal processing, the setup showed a detection limit of less than 1 × 10⁻⁴ μmol mL⁻¹. The combination of functional coating, thermally insensitive design, and applied digital signal postprocessing makes the system introduced here an attractive approach toward sensor-based wastewater analysis and monitoring.

Haptic Intelligence Miscellaneous A DNN-Based Metamodel for Simulating Fingertip Deformation Deshmukh, Y., Kuchenbecker, K. J., Serhat, G. Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Suwon, South Korea, July 2025 (Published)

Empirical Inference Conference Paper Active Fine-Tuning of Multi-Task Policies Bagatella, M., Hübotter, J., Martius, G., Krause, A. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:2409-2441, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)

Social Foundations of Computation Miscellaneous Answer Matching Outperforms Multiple Choice for Language Model Evaluation Chandak, N., Goel, S., Prabhu, A., Hardt, M., Geiping, J. July 2025 (Submitted)
Multiple choice benchmarks have long been the workhorse of language model evaluation because grading multiple choice is objective and easy to automate. However, we show multiple choice questions from popular benchmarks can often be answered without even seeing the question. These shortcuts arise from a fundamental limitation of discriminative evaluation not shared by evaluations of the model's free-form, generative answers. Until recently, there appeared to be no viable, scalable alternative to multiple choice, but we show that this has changed. We consider generative evaluation via what we call answer matching: Give the candidate model the question without the options, have it generate a free-form response, then use a modern language model with the reference answer to determine if the response matches the reference. To compare the validity of different evaluation strategies, we annotate MMLU-Pro and GPQA-Diamond to obtain human grading data, and measure the agreement of each evaluation approach. We find answer matching using recent models, even small ones, achieves near-perfect agreement, in the range of inter-annotator agreement. In contrast, both multiple choice evaluation and using LLM-as-a-judge without reference answers align poorly with human grading. Improving evaluations via answer matching is not merely a conceptual concern: the rankings of several models change significantly when evaluating their free-form responses with answer matching. In light of these findings, we discuss how to move the evaluation ecosystem from multiple choice to answer matching.
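The answer-matching loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_matcher` is a hypothetical string-based stand-in for the language-model matcher, and `candidate` is any callable that maps a question to a free-form response.

```python
import re
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so the comparison is less brittle."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def toy_matcher(response: str, reference: str) -> bool:
    """Hypothetical stand-in for the language-model matcher.

    The abstract uses a modern language model here; this toy version only
    checks whether the normalized reference occurs in the response.
    """
    return normalize(reference) in normalize(response)


def answer_matching_accuracy(
    items: List[Dict[str, str]],
    candidate: Callable[[str], str],
    matcher: Callable[[str, str], bool] = toy_matcher,
) -> float:
    """Fraction of items whose free-form response matches the reference."""
    correct = 0
    for item in items:
        # The candidate sees the question only, never the answer options.
        response = candidate(item["question"])
        if matcher(response, item["reference"]):
            correct += 1
    return correct / len(items)
```

In practice the matcher would be replaced by a call to a language model that receives the question, the response, and the reference answer.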

Empirical Inference Deep Models and Optimization Conference Paper Generalized Interpolating Discrete Diffusion von Rütte, D., Fluri, J., Ding, Y., Orvieto, A., Schölkopf, B., Hofmann, T. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:61810-61843, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)

Empirical Inference Conference Paper Generative Intervention Models for Causal Perturbation Modeling Schneider, N., Lorch, L., Kilbertus, N., Schölkopf, B., Krause, A. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:53388-53412, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)

Empirical Inference Conference Paper Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models Kekić, A., Garrido Mejia, S., Schölkopf, B. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:29651-29669, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)

Haptic Intelligence Robotic Materials Miscellaneous Learning-Based Touch Detection and Force Estimation in Cutaneous Electrohydraulic Devices Sanchez-Tamayo, N., Singer, D., Keplinger, C., Kuchenbecker, K. J. Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Suwon, South Korea, July 2025 (Published)

Haptic Intelligence Miscellaneous Perception of Diverse Asymmetric Vibration Signals Tashiro, N., Ballardini, G., Nunez, C. M., Vardar, Y., Kuchenbecker, K. J. Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Suwon, South Korea, July 2025 (Published)

Empirical Inference Conference Paper Position: Probabilistic Modelling is Sufficient for Causal Inference Mlodozeniec, B. K., Krueger, D., Turner, R. E. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:81810-81840, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)

Empirical Inference Ph.D. Thesis Probabilistic Machine Learning for Real-Time Gravitational-Wave Inference Dax, M. Eberhard Karls Universität Tübingen, July 2025, (MPI IS + ELLIS Institute Tübingen) (Published)

Empirical Inference Conference Paper Progressive Tempering Sampler with Diffusion Rissanen*, S., OuYang*, R., He*, J., Chen, W., Heinonen, M., Solin, A., Hernández-Lobato, J. M. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:51724-51746, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025, *equal contribution (Published)

Haptic Intelligence Miscellaneous Quantifying Texture-Rendering Quality Across Haptic Devices Fazlollahi, F., Seifi, H., Ballardini, G., Taghizadeh, Z., Schulz, A. K., MacLean, K. E., Kuchenbecker, K. J. Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Suwon, South Korea, July 2025 (Published)

Empirical Inference Autonomous Learning Conference Paper SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models Sancaktar, C., Gumbsch, C., Zadaianchuk, A., Kolev, P., Martius, G. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:52745-52777, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), International Conference on Machine Learning, July 2025 (Published)

Empirical Inference Conference Paper Scalable Gaussian Processes with Latent Kronecker Structure Lin, J. A., Ament, A., Balandat, M., Eriksson, D., Hernández-Lobato, J. M., Bakshy, E. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:37730-37744, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)

Haptic Intelligence Robotics Miscellaneous Soft Magnetic Fingertip Devices for Clear Vibrotactile Feedback Gertler, I., Ballardini, G., Grüninger, F., Kuchenbecker, K. J. Hands-on demonstration presented at the IEEE World Haptics Conference (WHC), Suwon, South Korea, July 2025 (Published)

Haptic Intelligence Miscellaneous Whole-Arm Humanoid Robot Teleoperation with Naturalistic Vibrotactile Feedback Gong, Y., Hudhud Mughrabi, M., L’Orsa, R., Mohan, M., Kuchenbecker, K. J. Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Suwon, South Korea, July 2025 (Published)

Autonomous Learning Empirical Inference Conference Paper Zero-Shot Offline Imitation Learning via Optimal Transport Rupf, T., Bagatella, M., Gürtler, N., Frey, J., Martius, G. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:52345-52381, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)
Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.

Physical Intelligence Article Bacterial Minicell-Based Biohybrid Sub-micron Swimmers for Targeted Cargo Delivery Saadet Fatma Baltaci, M. B. A. I. K. V. S. M. S. Advanced Science, 12:e05538, June 2025 (Published)
Bacterial biohybrid microrobots possess significant potential for targeted cargo delivery and minimally invasive therapy. However, many challenges, such as biocompatibility, stability, and effective cargo loading, remain. Bacterial membrane vesicles, also referred to as minicells, offer a promising alternative for creating sub-micron scale biohybrid swimmers (minicell biohybrids) due to their active metabolism, non-dividing nature, robust structure, and high cargo-carrying capacity. Here, a biohybrid system is reported that utilizes motile minicells, ≈400 nm in diameter, generated by aberrant cell division of engineered Escherichia coli (E. coli), for the first time. Achieving over 99% purification from their parental bacterial cells, minicells are functionalized with magnetic nanoparticles (MNPs) to enable external magnetic control. Minicell biohybrids are capable of swimming at an average speed of up to 13.3 µm s−1 and being steered under a uniform magnetic field of 26 mT. Furthermore, they exhibit a significantly high drug loading capacity (2.8 µg mL−1) while maintaining their motility and show pH-sensitive release of anticancer drug doxorubicin hydrochloride (DOX) under acidic conditions. Additionally, drug-loaded minicell biohybrids notably reduce the viability of SK-BR-3 breast cancer cells in vitro. This study introduces minicell biohybrids and establishes their potential as magnetically guided, drug-loaded biohybrid systems for targeted therapies in future medical applications.

Physical Intelligence Article Magnetically Controllable and Degradable Milliscale Swimmers as Intraocular Drug Implants Yildiz, E., Bozuyuk, U., Yildiz, E., Wang, F., Han, M., Karacakol, A. C., Sheehan, D., Yu, Y., Sitti, M. Advanced Science, 12:e07569, June 2025 (Published)
Intraocular drug implants are increasingly used for retinal treatments, such as age-related macular degeneration and diabetic macular edema, due to the rapidly aging global population. Although these therapies show promise in arresting disease progression and improving vision, intraocular implant-based therapies can cause unexpected complications that require further surgery due to implant dislocation or uncontrolled drug release. These frequent complications of intraocular drug implants can be overcome using magnetically controllable degradable milliscale swimmers (MDMS) with a double-helix body morphology. A biodegradable hydrogel, polyethylene glycol diacrylate, is employed as the primary 3D printing material of MDMS, and it is magnetized by decorating it with biocompatible polydopamine-encapsulated iron-platinum nanoparticles. MDMS have comparable dimensions to commercial intraocular implants that achieve translational motions in both aqueous and vitreous bodies. They can be imaged in real-time using optical coherence tomography, ultrasound, and photoacoustic imaging. Thanks to their biodegradable hydrogel-based structure, they can be loaded with anti-inflammatory drug molecules and release the medications without disrupting retinal epithelial viability and barrier function, and decrease proinflammatory cytokine release significantly. These magnetically controllable swimmers, which degrade in a couple of months, can be used for less invasive and more precise intraocular drug delivery compared to commercial intraocular drug implants.

Perceiving Systems Conference Paper Reconstructing Animals and the Wild Kulits, P., Black, M. J., Zuffi, S. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), June 2025 (Published)
The idea of 3D reconstruction as scene understanding is foundational in computer vision. Reconstructing 3D scenes from 2D visual observations requires strong priors to disambiguate structure. Much work has been focused on the anthropocentric, which, characterized by smooth surfaces, coherent normals, and regular edges, allows for the integration of strong geometric inductive biases. Here, we consider a more challenging problem where such assumptions do not hold: the reconstruction of natural scenes containing trees, bushes, boulders, and animals. While numerous works have attempted to tackle the problem of reconstructing animals in the wild, they have focused solely on the animal, neglecting environmental context. This limits their usefulness for analysis tasks, as animals exist inherently within the 3D world, and information is lost when environmental factors are disregarded. We propose a method to reconstruct natural scenes from single images. We base our approach on recent advances leveraging the strong world priors ingrained in Large Language Models and train an autoregressive model to decode a CLIP embedding into a structured compositional scene representation, encompassing both animals and the wild (RAW). To enable this, we propose a synthetic dataset comprising one million images and thousands of assets. Our approach, having been trained solely on synthetic data, generalizes to the task of reconstructing animals and their environments in real-world images. We will release our dataset and code to encourage future research.

Perceiving Systems Conference Paper DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models Rosu, R. A., Wu, K., Feng, Y., Zheng, Y., Black, M. J. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), June 2025 (Published)
We address the task of reconstructing 3D hair geometry from a single image, which is challenging due to the diversity of hairstyles and the lack of paired image-to-3D hair data. Previous methods are primarily trained on synthetic data and cope with the limited amount of such data by using low-dimensional intermediate representations, such as guide strands and scalp-level embeddings, that require post-processing to decode, upsample, and add realism. These approaches fail to reconstruct detailed hair, struggle with curly hair, or are limited to handling only a few hairstyles. To overcome these limitations, we propose DiffLocks, a novel framework that enables detailed reconstruction of a wide variety of hairstyles directly from a single image. First, we address the lack of 3D hair data by automating the creation of the largest synthetic hair dataset to date, containing 40K hairstyles. Second, we leverage the synthetic hair dataset to learn an image-conditioned diffusion-transformer model that reconstructs accurate 3D strands from a single frontal image. By using a pretrained image backbone, our method generalizes to in-the-wild images despite being trained only on synthetic data. Our diffusion model predicts a scalp texture map in which any point in the map contains the latent code for an individual hair strand. These codes are directly decoded to 3D strands without post-processing techniques. Representing individual strands, instead of guide strands, enables the transformer to model the detailed spatial structure of complex hairstyles. With this, DiffLocks can reconstruct highly curled hair, like afro hairstyles, from a single image for the first time. Qualitative and quantitative results demonstrate that DiffLocks outperforms existing state-of-the-art approaches. Data and code are available for research.

Perceiving Systems Conference Paper InterDyn: Controllable Interactive Dynamics with Video Diffusion Models Akkerman, R., Feng, H., Black, M. J., Tzionas, D., Abrevaya, V. F. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), June 2025 (Published)
Predicting the dynamics of interacting objects is essential for both humans and intelligent systems. However, existing approaches are limited to simplified, toy settings and lack generalizability to complex, real-world environments. Recent advances in generative models have enabled the prediction of state transitions based on interventions, but focus on generating a single future state which neglects the continuous dynamics resulting from the interaction. To address this gap, we propose InterDyn, a novel framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor. Our key insight is that large video generation models can act as both neural renderers and implicit physics "simulators", having learned interactive dynamics from large-scale video data. To effectively harness this capability, we introduce an interactive control mechanism that conditions the video generation process on the motion of the driving entity. Qualitative results demonstrate that InterDyn generates plausible, temporally consistent videos of complex object interactions while generalizing to unseen objects. Quantitative evaluations show that InterDyn outperforms baselines that focus on static state transitions. This work highlights the potential of leveraging video generative models as implicit physics engines.

Perceiving Systems Conference Paper PICO: Reconstructing 3D People In Contact with Objects Cseke, A., Tripathi, S., Dwivedi, S. K., Lakshmipathy, A. S., Chatterjee, A., Black, M. J., Tzionas, D. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), June 2025 (Published)
Recovering 3D Human-Object Interaction (HOI) from single color images is challenging due to depth ambiguities, occlusions, and the huge variation in object shape and appearance. Thus, past work requires controlled settings such as known object shapes and contacts, and tackles only limited object classes. Instead, we need methods that generalize to natural images and novel object classes. We tackle this in two main ways: (1) We collect PICO-db, a new dataset of natural images uniquely paired with dense 3D contact on both body and object meshes. To this end, we use images from the recent DAMON dataset that are paired with contacts, but these contacts are only annotated on a canonical 3D body. In contrast, we seek contact labels on both the body and the object. To infer these given an image, we retrieve an appropriate 3D object mesh from a database by leveraging vision foundation models. Then, we project DAMON's body contact patches onto the object via a novel method needing only 2 clicks per patch. This minimal human input establishes rich contact correspondences between bodies and objects. (2) We exploit our new dataset of contact correspondences in a novel render-and-compare fitting method, called PICO-fit, to recover 3D body and object meshes in interaction. PICO-fit infers contact for the SMPL-X body, retrieves a likely 3D object mesh and contact from PICO-db for that object, and uses the contact to iteratively fit the 3D body and object meshes to image evidence via optimization. Uniquely, PICO-fit works well for many object categories that no existing method can tackle. This is crucial to enable HOI understanding to scale in the wild.

Perceiving Systems Conference Paper ChatGarment: Garment Estimation, Generation and Editing via Large Language Models Bian, S., Xu, C., Xiu, Y., Grigorev, A., Liu, Z., Lu, C., Black, M. J., Feng, Y. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), June 2025 (Published)
We introduce ChatGarment, a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garment sewing patterns from images or text descriptions. Unlike previous methods that often lack robustness and interactive editing capabilities, ChatGarment finetunes a VLM to produce GarmentCode, a JSON-based, language-friendly format for 2D sewing patterns, enabling both estimating and editing from images and text instructions. To optimize performance, we refine GarmentCode by expanding its support for more diverse garment types and simplifying its structure, making it more efficient for VLM finetuning. Additionally, we develop an automated data construction pipeline to generate a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs, empowering ChatGarment with strong generalization across various garment types. Extensive evaluations demonstrate ChatGarment’s ability to accurately reconstruct, generate, and edit garments from multimodal inputs, highlighting its potential to revolutionize workflows in fashion and gaming applications.

Social Foundations of Computation Conference Paper Difficult Lessons on Social Prediction from Wisconsin Public Schools Perdomo, J. C., Britton, T., Hardt, M., Abebe, R. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, June 2025 (Published)
Early warning systems (EWS) are predictive tools at the center of recent efforts to improve graduation rates in public schools across the United States. These systems assist in targeting interventions to individual students by predicting which students are at risk of dropping out. Despite significant investments in their widespread adoption, there remain large gaps in our understanding of the efficacy of EWS, and the role of statistical risk scores in education. In this work, we draw on nearly a decade's worth of data from a system used throughout Wisconsin to provide the first large-scale evaluation of the long-term impact of EWS on graduation outcomes. We present empirical evidence that the prediction system accurately sorts students by their dropout risk. We also find that it may have caused a single-digit percentage increase in graduation rates, though our empirical analyses cannot reliably rule out that there has been no positive treatment effect. Going beyond a retrospective evaluation of DEWS, we draw attention to a central question at the heart of the use of EWS: Are individual risk scores necessary for effectively targeting interventions? We propose a simple mechanism that only uses information about students' environments (such as their schools and districts) and argue that this mechanism can target interventions just as efficiently as the individual risk score-based mechanism. Our argument holds even if individual predictions are highly accurate and effective interventions exist. In addition to motivating this simple targeting mechanism, our work provides a novel empirical backbone for the robust qualitative understanding among education researchers that dropout is structurally determined. Combined, our insights call into question the marginal value of individual predictions in settings where outcomes are driven by high levels of inequality.

Empirical Inference Article Flow annealed importance sampling bootstrap meets differentiable particle physics Kofler, A., Stimper, V., Mikhasenko, M., Kagan, M., Heinrich, L. Machine Learning: Science and Technology, 6(2), IOP Publishing, June 2025 (Published)
High-energy physics requires the generation of large numbers of simulated data samples from complex but analytically tractable distributions called matrix elements. Surrogate models, such as normalizing flows, are gaining popularity for this task due to their computational efficiency. We adopt an approach based on Flow Annealed importance sampling Bootstrap (FAB) that evaluates the differentiable target density during training and helps avoid the costly generation of training data in advance. We show that FAB reaches higher sampling efficiency with fewer target evaluations in high dimensions in comparison to other methods.
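Sampling efficiency for importance-weighted samples is often reported as the effective sample size divided by the number of samples. The sketch below assumes that common (Kish) definition; the paper may define efficiency differently, and the function name is illustrative only.

```python
from typing import Sequence


def sampling_efficiency(weights: Sequence[float]) -> float:
    """Kish effective sample size divided by the number of samples.

    `weights` are unnormalized importance weights, i.e. target density
    over proposal density evaluated at each sample. The result lies in
    (0, 1]; 1 means all samples contribute equally.
    """
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if n == 0 or total_sq == 0:
        raise ValueError("need at least one non-zero weight")
    return (total * total) / (n * total_sq)
```

Equal weights give an efficiency of 1, while a single dominant weight among n samples drives it toward 1/n.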

Social Foundations of Computation Conference Paper How Benchmark Prediction from Fewer Data Misses the Mark Zhang, G., Dorner, F. E., Hardt, M. The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), June 2025 (Accepted)
Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset of evaluation points and predict overall benchmark performance from that subset. In this paper, we systematically assess the strengths and limitations of 11 benchmark prediction methods across 19 diverse benchmarks. First, we identify a highly competitive baseline: Take a random sample and fit a regression model on the sample to predict missing entries. Outperforming most existing methods, this baseline challenges the assumption that careful subset selection is necessary for benchmark prediction. Second, we discover that all existing methods crucially depend on model similarity. They work best when interpolating scores among similar models. The effectiveness of benchmark prediction sharply declines when new models have higher accuracy than previously seen models. In this setting of extrapolation, none of the previous methods consistently beat a simple average over random samples. To improve over the sample average, we introduce a new method inspired by augmented inverse propensity weighting. This method consistently outperforms the random sample average even for extrapolation. However, its performance still relies on model similarity and the gains are modest in general. This shows that benchmark prediction fails just when it is most needed: at the evaluation frontier, where the goal is to evaluate new models of unknown capabilities.
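The "simple average over random samples" that this abstract uses as a point of comparison can be sketched as below. This is an illustrative version under assumed inputs (a list of per-item 0/1 correctness scores for one model); the function and parameter names are hypothetical, and the regression baseline and the proposed method are not shown.

```python
import random
from statistics import mean
from typing import Sequence


def random_sample_average(
    per_item_correct: Sequence[float], n_sample: int, seed: int = 0
) -> float:
    """Estimate benchmark accuracy from a uniformly random subset of items.

    Only the sampled items would need to be evaluated in practice; the
    full score vector is passed here purely for illustration.
    """
    rng = random.Random(seed)
    # Sample item indices without replacement.
    subset = rng.sample(range(len(per_item_correct)), n_sample)
    return mean(per_item_correct[i] for i in subset)
```

Because the subset is drawn uniformly, the estimate is unbiased for the full-benchmark accuracy regardless of how similar the new model is to previously seen ones.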

Empirical Inference Conference Paper Temporally Consistent Object-Centric Learning by Contrasting Slots Manasyan, A., Seitzer, M., Radovic, F., Martius, G., Zadaianchuk, A. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5401-5411, June 2025 (Published)

Haptic Intelligence Ph.D. Thesis Towards Robust and Flexible Robot State and Motion Estimation through Optimization and Learning Nubert, J. ETH Zurich, Zurich, Switzerland, June 2025, Department of Mechanical and Process Engineering (Published)

Empirical Inference Conference Paper VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models Ye, M., Liu, W., He, P. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8679-8688, June 2025 (Published)

Haptic Intelligence Robotic Materials Article Wearable Electrohydraulic Actuation for Salient Full-Fingertip Haptic Feedback Shao, Y., Shagan Shomron, A., Javot, B., Keplinger, C., Kuchenbecker, K. J. Advanced Materials Technologies, 10(12):2401525, June 2025, Yitian Shao and Alona Shagan Shomron contributed equally to this publication. This article was selected for the front cover. https://doi.org/10.1002/admt.202570062 (Published)
Although essential for an immersive experience in extended reality (XR), providing salient and versatile touch feedback remains a technical challenge. Existing solutions restrict hand movements with bulky rigid structures, require a tethered energy source to power actuators worn on the hand, or output vibrations that lack expressiveness. This study introduces a design strategy for compact, lightweight, untethered haptic feedback centering on a 30-µm-thick inflatable chamber that naturally conforms to the fingertip; to minimize fluidic losses and enable high bandwidth, a soft electrohydraulic pump mounted on the hand actuates the chamber via a mechanically transparent fluidic channel. A 15.2-mm-diameter prototypical actuation chamber achieves 8 N peak force, 3 N steady-state force, stroke up to 5 mm, and bandwidth from 0 to 500 Hz. In contrast to these salient fingertip cues, the entire hydraulic system has a weight less than 8 g and a thickness less than 2 mm. Additionally, this study presents a validation approach that uses a commercial fingertip sensor to confirm that the haptic feedback created by the device imitates the touch signals generated during typical hand interactions. Together, this design strategy and validation method can enable a broad spectrum of haptic activities in diverse XR applications, including medical training, online shopping, and social interactions.
DOI BibTeX

Empirical Inference Perceiving Systems Conference Paper ChatHuman: Chatting about 3D Humans with Tools Lin, J., Feng, Y., Liu, W., Black, M. J. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8150-8161, June 2025 (Published)
Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including 3D pose, shape, contact, human-object interaction, and emotion. While widely applicable in vision and other areas, such methods require expert knowledge to select, use, and interpret the results. To address this, we introduce ChatHuman, a language-driven system that integrates the capabilities of specialized methods into a unified framework. ChatHuman functions as an assistant proficient in utilizing, analyzing, and interacting with tools specific to 3D human tasks, adeptly discussing and resolving related challenges. Built on a Large Language Model (LLM) framework, ChatHuman is trained to autonomously select, apply, and interpret a diverse set of tools in response to user inputs. Our approach overcomes significant hurdles in adapting LLMs to 3D human tasks, including the need for domain-specific knowledge and the ability to interpret complex 3D outputs. The innovations of ChatHuman include leveraging academic publications to instruct the LLM on tool usage, employing a retrieval-augmented generation model to create in-context learning examples for managing new tools, and effectively discriminating between and integrating tool results by transforming specialized 3D outputs into comprehensible formats. Experiments demonstrate that ChatHuman surpasses existing models in both tool selection accuracy and overall performance across various 3D human tasks, and it supports interactive chatting with users. ChatHuman represents a significant step toward consolidating diverse analytical methods into a unified, robust system for 3D human tasks.
project pdf Paper DOI BibTeX

Perceiving Systems Conference Paper InteractVLM: 3D Interaction Reasoning from 2D Foundational Models Dwivedi, S. K., Antić, D., Tripathi, S., Taheri, O., Schmid, C., Black, M. J., Tzionas, D. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 22605-22615, June 2025 (Published)
We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate human-object joint reconstruction in 3D. This is challenging due to occlusions, depth ambiguities, and widely varying object shapes. Existing methods rely on 3D contact annotations collected via expensive motion-capture systems or tedious manual labeling, limiting scalability and generalization. To overcome this, InteractVLM harnesses the broad visual knowledge of large Vision-Language Models (VLMs), fine-tuned with limited 3D contact data. However, directly applying these models is non-trivial, as they reason only in 2D, while human-object contact is inherently 3D. Thus we introduce a novel Render-Localize-Lift module that: (1) embeds 3D body and object surfaces in 2D space via multi-view rendering, (2) trains a novel multi-view localization model (MV-Loc) to infer contacts in 2D, and (3) lifts these to 3D. Additionally, we propose a new task called Semantic Human Contact estimation, where human contact predictions are conditioned explicitly on object semantics, enabling richer interaction modeling. InteractVLM outperforms existing work on contact estimation and also facilitates 3D reconstruction from an in-the-wild image.
Project Paper Code Video BibTeX

Conference Paper PICO: Reconstructing 3D People In Contact with Objects Cseke, A., Tripathi, S., Dwivedi, S. K., Lakshmipathy, A., Chatterjee, A., Black, M. J., Tzionas, D. In June 2025 (Published) arXiv project BibTeX

Perceiving Systems Conference Paper PromptHMR: Promptable Human Mesh Recovery Wang, Y., Sun, Y., Patel, P., Daniilidis, K., Black, M. J., Kocabas, M. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025 (Published)
Human pose and shape (HPS) estimation presents challenges in diverse scenarios such as crowded scenes, person-person interactions, and single-view reconstruction. Existing approaches lack mechanisms to incorporate auxiliary "side information" that could enhance reconstruction accuracy in such challenging scenarios. Furthermore, the most accurate methods rely on cropped person detections and cannot exploit scene context while methods that process the whole image often fail to detect people and are less accurate than methods that use crops. While recent language-based methods explore HPS reasoning through large language or vision-language models, their metric accuracy is well below the state of the art. In contrast, we present PromptHMR, a transformer-based promptable method that reformulates HPS estimation through spatial and semantic prompts. Our method processes full images to maintain scene context and accepts multiple input modalities: spatial prompts like bounding boxes and masks, and semantic prompts like language descriptions or interaction labels. PromptHMR demonstrates robust performance across challenging scenarios: estimating people from bounding boxes as small as faces in crowded scenes, improving body shape estimation through language descriptions, modeling person-person interactions, and producing temporally coherent motions in videos. Experiments on benchmarks show that PromptHMR achieves state-of-the-art performance while offering flexible prompt-based control over the HPS estimation process.
arXiv project video BibTeX

Physical Intelligence Article 3D Locomotion of Surface-Rolling Microrobots: A Trade-off between Hydrodynamic Wall and Gravitational Effects Park, M., Bozuyuk, U., Yildiz, E., Min, H., Yoon, J., Sitti, M. Advanced Intelligent Systems, 7:2500381, May 2025 (Published)
Synthetic microrobots have gained significant attention due to their potential in various applications in biomedicine and lab-on-a-chip technologies. As a fundamental requirement, microrobots must navigate in 3D, effectively counteracting gravity to execute their tasks. However, locomotion at small scales presents numerous counterintuitive behaviors, primarily governed by the interactions between the microrobot's body and its surrounding boundaries. In this study, the locomotion of surface-rolling microrobots is investigated in 3D, particularly focusing on their ability to climb walls. Through a combination of experiments and computational fluid dynamics analyses, it is demonstrated that the influence of gravity plays a secondary role in enabling surface-rolling microrobots to climb walls. Instead, locomotion capability in 3D settings is primarily determined by interactions with surrounding boundaries. The fundamental principles of surface-rolling locomotion in 3D spaces are elucidated and a design strategy aimed at optimizing fluid flow for efficient propulsion in future applications is proposed.
DOI URL BibTeX

Physical Intelligence Article Anisotropic Surface Microrollers for Endovascular Navigation: A Computational Analysis with a Case Study in Hepatic Perfusion Arslan, B., Bozuyuk, U., Görgülü, K., Yildiz, E., Ozturk, H., Liotta, L., Heinemann, V., Algül, H., Sitti, M. Advanced Theory and Simulations, 8:2400387, May 2025 (Published)
Magnetic surface microrollers have demonstrated promise as active drug delivery agents for targeted and minimally invasive disease treatment. Specifically, they can be employed in the circulatory system to locally release therapeutic agents at disease sites, minimizing systemic exposure and reducing side effects, particularly in the treatment of diseases like cancer. Previous research indicates that the design and shape of microrollers play a crucial role in safe navigation within blood vessels, with anisotropic microrollers exhibiting superiority due to favorable hydrodynamic interactions with nearby boundaries. In this study, the navigation potential of anisotropic microrollers is investigated in veins, venules, and capillaries through computational fluid dynamics analyses. These results indicate that robust locomotion is only achievable in larger vessels, such as veins. Subsequently, their performance is explored in a clinically relevant scenario – the hepatic circulation toward treating primary liver cancer or metastatic nodes of distant tumors (e.g., pancreatic cancer). Computational fluid dynamics analyses using the data from five different patients demonstrate that robust navigation can be achieved with high actuation frequencies. Overall, the findings presented in this study lay a preliminary foundation for the potential future application of surface microrollers in vivo.
DOI URL BibTeX

Empirical Inference Conference Paper Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation Sanyal, A., Hu, Y., Yu, Y., Ma, Y., Wang, Y., Schölkopf, B. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 258:2170-2178, Proceedings of Machine Learning Research, (Editors: Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz), PMLR, May 2025 (Published) URL BibTeX

Haptic Intelligence Article Comparing Puncture-Detection Approaches for Manual Needle Insertions Through the Parietal Pleura L’Orsa, R., Zareinia, K., Sutherland, G. R., Westwick, D., Kuchenbecker, K. J. IEEE Transactions on Medical Robotics and Bionics, 7(2):455-468, May 2025 (Published)
Tube thoracostomy (chest tube insertion) is a surgical procedure that treats pneumothorax, a potentially life-threatening condition where air accumulates between the chest wall and the lungs. The literature reports high complication rates for this procedure, including accidental fatality due to poor manual depth control during tool insertion. We hypothesize that an instrumented needle-holder could help operators recognize pleural puncture and improve depth control, and we present a puncture-detection experiment that contributes toward this goal. An operator manually inserted a bevel-tip needle into ex vivo porcine ribs and through the parietal pleura via a sensorized percutaneous device that records position, force, and videos. We use this rich dataset of 63 insertions to thoroughly test four previously published data-driven puncture-detection (DDPD) algorithms against two new real-time algorithms: a custom recursive digital filter with coefficients optimized for our application, and a difference equation that compares standard deviations between adjacent sliding windows. Our algorithms achieve a precision (true positives over total identified punctures) of 23% and 22%, respectively, while the precision of existing DDPD algorithms ranges from 0% to 21%. Despite these performance improvements, our results show the limitations of DDPD algorithms and motivate new methods for detecting pleural membrane punctures in thoracostomy.
DOI BibTeX
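
One of the real-time detectors mentioned in the abstract above compares standard deviations between adjacent sliding windows. The following is a minimal sketch of that general idea, not the published algorithm: the window length, the threshold, the one-sided comparison, the synthetic force trace, and the function names are all hypothetical. The `precision` helper follows the definition given in the abstract (true positives over total identified punctures).

```python
import numpy as np

def std_difference(signal, window):
    """Std of the newest window minus std of the window just before it."""
    x = np.asarray(signal, dtype=float)
    diff = np.zeros(len(x))
    for end in range(2 * window, len(x) + 1):
        older = x[end - 2 * window : end - window]
        newer = x[end - window : end]
        diff[end - 1] = np.std(newer) - np.std(older)
    return diff

def detect_onsets(signal, window, threshold):
    """Indices where the std difference first rises above the threshold."""
    above = std_difference(signal, window) > threshold
    previous = np.concatenate(([False], above[:-1]))
    return np.flatnonzero(above & ~previous)

def precision(true_positives, identified):
    """True positives over total identified punctures."""
    return true_positives / identified if identified else 0.0

# Hypothetical force trace: constant load that drops abruptly at sample 50.
force = np.concatenate([np.ones(50), np.zeros(50)])
onsets = detect_onsets(force, window=5, threshold=0.1)
print(onsets)
```

In this toy trace the detector fires once, at the first sample whose newest window contains the drop. Real force signals from manual insertions are far noisier, which is consistent with the low precision values reported in the abstract.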

Haptic Intelligence Article Enhancing Needle Puncture Detection Using High-Pass Filtering and Diffuse Reflectance L’Orsa, R., Bisht, A., Yu, L., Murari, K., Sutherland, G. R., Westwick, D. T., Kuchenbecker, K. J. Frontiers in Robotics and AI, 12(1429327):1-16, May 2025 (Published)
Chest trauma or disease progression can lead to tension pneumothorax, a condition where mounting pressurization of the pleural cavity (the space between the chest wall and the lungs) leads rapidly to cardiac arrest. In pre-hospital settings, tension pneumothorax is treated by venting the pleural cavity via a needle introduced through the chest wall. Very high failure rates (up to 94.1%) have been reported for pre-hospital needle decompression, however, and the procedure can result in the accidental puncture of critical thoracic tissues because it is performed blind. Instrumented needles could help operators more reliably identify when the tool has entered the target space. This paper investigates technical approaches to provide such support; we created an experimental system that acquires needle force and position signals, as well as the diffuse backscattered reflectance from white light carried to and collected from the needle's tip via two in-bore optical fibers. Data collection occurred while two experimenters inserted a bevel-tipped percutaneous needle into an ex vivo porcine rib section simulating human chest anatomy. Four data-driven puncture-detection (DDPD) algorithms from the literature, which are appropriate for use with the variable tool velocities produced by manual insertions, were applied to the resulting data set offline. Grid search was performed across key signal-processing parameters, high-pass filters (HPFs) were applied to examine their impact on puncture detection, and a first exploration of multimodal (ensemble) methods was performed. Combining high-pass filters with DDPD methods resulted in a 2.7-fold improvement (from 8.2% to 21.9%) in the maximum overall precision (MOP) produced by force signals. Applying this HPF + DDPD scheme to reflectance data streams yielded a peak MOP of 36.4%, and combining reflectance with force generated the best MOP overall (42.1%); these results represent 4.4-fold and 5.1-fold improvements, respectively, over the best MOP produced by the traditional application of DDPD algorithms to force signals alone. These results strongly support the utility of high-pass filters combined with both reflectance-only and multimodal reflectance-plus-force data-driven puncture-detection schemes for needle decompression applications.
DOI BibTeX
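
To make the high-pass filtering step and the reported fold improvements in the abstract above concrete, here is a small sketch. The first-order RC-style filter, its cutoff and sampling rate, and the function names are assumptions chosen for illustration; the abstract does not specify the filter design used in the study. The MOP values are the ones quoted in the abstract.

```python
import math

def highpass_alpha(cutoff_hz, sample_rate_hz):
    """Smoothing coefficient of a first-order RC high-pass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    return rc / (rc + dt)

def highpass(samples, alpha):
    """y[n] = alpha * (y[n-1] + x[n] - x[n-1]), with y[0] = 0."""
    out = []
    prev_x, prev_y = None, 0.0
    for x in samples:
        y = 0.0 if prev_x is None else alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def fold_improvement(new_value, baseline):
    """Ratio of a new metric value to its baseline."""
    return new_value / baseline

# Slow offsets are removed, while an abrupt step passes through and decays.
alpha = highpass_alpha(cutoff_hz=5.0, sample_rate_hz=1000.0)  # hypothetical settings
filtered = highpass([0.0, 0.0, 1.0, 1.0], alpha=0.9)

# Maximum overall precision (MOP) values quoted in the abstract, in percent.
baseline_mop = 8.2
reported_mop = {
    "force with HPF": 21.9,
    "reflectance with HPF": 36.4,
    "reflectance plus force": 42.1,
}
folds = {name: round(fold_improvement(value, baseline_mop), 1)
         for name, value in reported_mop.items()}
print(filtered, folds)
```

The computed ratios round to 2.7, 4.4, and 5.1, matching the fold improvements stated in the abstract.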

Haptic Intelligence Optics and Sensing Laboratory Miscellaneous Open-Source Multi-Viewpoint Surgical Telerobotics Caccianiga, G., Sharon, Y., Javot, B., Polikovsky, S., Ergün, G., Capobianco, I., Mihaljevic, A. L., Deguet, A., Kuchenbecker, K. J. Extended abstract (2 pages) presented at the ICRA Workshop on Robot-Assisted Medical Imaging (ICRA-RAMI), Atlanta, USA, May 2025 (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Scalable Gaussian Processes: Advances in Iterative Methods and Pathwise Conditioning Lin, J. University of Cambridge, UK, May 2025, (Cambridge-Tübingen-Fellowship-Program) (Published) BibTeX