Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Perceiving Systems Ph.D. Thesis Aerial Robot Formations for Dynamic Environment Perception Price, E. University of Tübingen, Tübingen, Germany, December 2025 (Published)
Perceiving moving subjects, like humans and animals, outside an enclosed and controlled lab environment is inherently challenging, since subjects can move outside the view and range of static, extrinsically calibrated cameras and sensors. Previous state-of-the-art methods for such perception in outdoor scenarios use markers or sensors on the subject, which are both intrusive and unscalable for animal subjects. To address this problem, we introduce robotic flying cameras that autonomously follow the subjects. To enable functions such as monitoring, behaviour analysis, or motion capture, a single point of view is often insufficient due to self-occlusion, lack of depth perception, and incomplete coverage. Therefore, we propose a team of such robotic cameras that fly in formation to provide continuous coverage from multiple viewpoints. The position of the subject must be determined in real time using markerless, remote sensing methods. To solve this, we combine a convolutional neural network-based detector with a novel cooperative Bayesian fusion method to track the detected subject from multiple robots. The robots must then plan and control their own flight paths and orientations relative to the subject to achieve and maintain continuous coverage from multiple viewpoints. We address this with a model-predictive-control-based method that predicts and plans the motion of every robot in the formation around the subject. A preliminary demonstrator is implemented with multi-rotor drones. However, drones are noisy and potentially unsafe for the observed subjects. To address this, we introduce non-holonomic lighter-than-air autonomous airships (blimps) as the robotic camera platform. This type of robot requires dynamically constrained orbiting formations to achieve omnidirectional visual coverage of a moving subject in the presence of wind. Therefore, we introduce a novel model-predictive formation controller for a team of airships.
We demonstrate and evaluate our complete system in field experiments involving both humans and wild animals as subjects. The collected data enables both human outdoor motion capture and animal behaviour analysis. Additionally, we propose using our method for autonomous long-term wildlife monitoring. This dissertation covers the design and evaluation of aerial robots suitable for this task, including computer vision/sensing, data annotation and network training, sensor fusion, planning, control, simulation, and modelling.
Thesis DOI BibTeX
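The cooperative tracking step described in the abstract can be illustrated as standard Bayesian fusion of independent Gaussian detections: each robot contributes a position estimate with its own uncertainty, and the precision-weighted combination yields the team estimate. This is an illustrative toy with made-up numbers, not the thesis implementation.

```python
import numpy as np

def fuse_detections(means, covs):
    """Fuse independent Gaussian detections of one subject.

    Each robot i reports a 2D position estimate means[i] with
    covariance covs[i]; the fused posterior is the precision-weighted
    combination (Bayesian fusion of independent Gaussian measurements).
    """
    info = np.zeros((2, 2))      # accumulated precision (inverse covariance)
    info_mean = np.zeros(2)      # accumulated precision-weighted mean
    for mu, cov in zip(means, covs):
        P = np.linalg.inv(cov)
        info += P
        info_mean += P @ mu
    fused_cov = np.linalg.inv(info)
    fused_mean = fused_cov @ info_mean
    return fused_mean, fused_cov

# Three robots observe the same subject with different noise levels.
means = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.9, 2.1])]
covs = [np.eye(2) * 0.5, np.eye(2) * 0.1, np.eye(2) * 1.0]
mu, cov = fuse_detections(means, covs)
print(mu)   # fused estimate, dominated by the most confident robot
```

Note that the fused covariance is always tighter than any single robot's, which is what makes multi-robot coverage valuable even when individual detections are noisy.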

Perceiving Systems Conference Paper BEDLAM2.0: Synthetic humans and cameras in motion Tesch, J., Becherini, G., Achar, P., Yiannakidis, A., Kocabas, M., Patel, P., Black, M. J. In Advances in Neural Information Processing Systems (NeurIPS), Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, December 2025 (Published)
Inferring 3D human motion from video remains a challenging problem with many applications. While traditional methods estimate the human in image coordinates, many applications require human motion to be estimated in world coordinates. This is particularly challenging when there is both human and camera motion. Progress on this topic has been limited by the lack of rich video data with ground truth human and camera movement. We address this with BEDLAM2.0, a new dataset that goes beyond the popular BEDLAM dataset in important ways. In addition to introducing more diverse and realistic cameras and camera motions, BEDLAM2.0 increases diversity and realism of body shape, motions, clothing, hair, and 3D environments. Additionally, it adds shoes, which were missing in BEDLAM. BEDLAM has become a key resource for training 3D human pose and motion regressors today, and we show that BEDLAM2.0 is significantly better, particularly for training methods that estimate humans in world coordinates. We compare state-of-the-art methods trained on BEDLAM and BEDLAM2.0, and find that BEDLAM2.0 significantly improves accuracy over BEDLAM. For research purposes, we provide the rendered videos, ground truth body parameters, and camera motions. We also provide the 3D assets to which we have rights and links to those from third parties.
Project Paper Video URL BibTeX

Perceiving Systems Conference Paper HairFree: Compositional 2D Head Prior for Text-Driven 360° Bald Texture Synthesis Ostrek, M., Black, M., Thies, J. In Advances in Neural Information Processing Systems (NeurIPS), Advances in Neural Information Processing Systems (NeurIPS), December 2025 (Published)
Synthesizing high-quality 3D head textures is crucial for gaming, virtual reality, and digital humans. Achieving seamless 360° textures typically requires expensive multi-view datasets with precise tracking. However, traditional methods struggle without back-view data or precise geometry, especially for human heads, where even minor inconsistencies disrupt realism. We introduce HairFree, an unsupervised texturing framework guided by textual descriptions and 2D diffusion priors, producing high-consistency 360° bald head textures—including non-human skin with fine details—without any texture, back-view, bald, non-human, or synthetic training data. We fine-tune a diffusion prior on a dataset of mostly frontal faces, conditioned on predicted 3D head geometry and face parsing. During inference, HairFree uses precise skin masks and 3D FLAME geometry as input conditioning, ensuring high 3D consistency and alignment. We synthesize the full 360° texture by first generating a frontal RGB image aligned to the 3D FLAME pose and mapping it to UV space. As the virtual camera moves, we inpaint and merge missing regions. A built-in semantic prior enables precise region separation—particularly for isolating and removing hair—allowing seamless integration with various assets like customizable 3D hair, eyeglasses, jewelry, etc. We evaluate HairFree quantitatively and qualitatively, demonstrating its superiority over state-of-the-art 3D head avatar generation methods. https://hairfree.is.tue.mpg.de/
pdf project poster BibTeX

Physical Intelligence Article Nuclear magnetic resonance for wireless magnetic tracking Efe Tiryaki, M., Esmaeili-Dokht, P., Lazovic, J., Pruessmann, K. P., Sitti, M. Nature Communications, 16:10840, December 2025 (Published)
Wireless trackers have emerged as a crucial technology in minimally invasive medical procedures with their remote localization capabilities. Existing trackers suffer from miniaturization issues and complex designs, which limit their integration into medical devices. We present nuclear magnetic resonance (NMR) magnetic sensing, a quantum sensing approach with nT sensitivity for wireless magnetic tracking. NMR magnetic sensing enables millimeter-scale tracking accuracy and versatile miniaturized tracker designs for minimally invasive medical devices in magnetic resonance imaging scanners. As examples, we demonstrate miniature magnetic trackers with submillimeter-scale diameters for guidewires and optic fibers, flexible magnetic trackers for soft devices, and ferrofluidic trackers for shape-morphing devices. With the demonstrated miniaturization and wide range of tracker design possibilities, wireless magnetic tracking with NMR is promising for future minimally invasive medical operations.
DOI URL BibTeX

Perceiving Systems Conference Paper GenLit: Reformulating Single Image Relighting as Video Generation Bharadwaj, S., Feng, H., Becherini, G., Abrevaya, V. F., Black, M. J. In SIGGRAPH Asia Conference Papers ’25, Association for Computing Machinery, SIGGRAPH Asia, December 2025 (To be published)
Manipulating the illumination of a 3D scene within a single image represents a fundamental challenge in computer vision and graphics. This problem has traditionally been addressed using inverse rendering techniques, which involve explicit 3D asset reconstruction and costly ray-tracing simulations. Meanwhile, recent advancements in visual foundation models suggest that a new paradigm could soon be possible -- one that replaces explicit physical models with networks that are trained on large amounts of image and video data. In this paper, we exploit the implicit scene understanding of a video diffusion model, particularly Stable Video Diffusion, to relight a single image. We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model, enabling users to directly insert and manipulate a point light in the 3D world within a given image and generate results directly as a video sequence. We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes, enabling single-image relighting with plausible and convincing shadows and inter-reflections. Our results highlight the ability of video foundation models to capture rich information about lighting, material, and shape, and our findings indicate that such models, with minimal training, can be used to perform relighting without explicit asset reconstruction or ray-tracing.
Project Page Paper DOI URL BibTeX

Empirical Inference Conference Paper A data and task-constrained mechanistic model of the mouse outer retina shows robustness to contrast variations Kadhim, K. L., Beck, J., Huang, Z., Macke, J. H., Rieke, F., Euler, T., Deistler, M., Berens, P. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) bioRxiv BibTeX

Empirical Inference Conference Paper Are Language Models Efficient Reasoners? A Perspective from Logic Programming Opedal, A., Zengaffinen, Y., Shirakami, H., Pasti, C., Sachan, M., Saparov, A., Cotterell, R., Schölkopf, B. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper CauSciBench: Assessing LLM Causal Reasoning for Scientific Research Acharya, S., Zhang, T. J., Kim, A., Haghighat, A., Sun, X., Shrestha, R. B., Mordig, M., Danisman, F., Jose, C., Qi, Y., Cobben, P., Schölkopf, B., Sachan, M., Jin, Z. NeurIPS 2025: 5th Workshop on Mathematical Reasoning and AI (Math-AI) and CauScien Workshop, December 2025 (Published) URL BibTeX

Empirical Inference Conference Paper Counterfactual reasoning: an analysis of in-context emergence Miller, M., Schölkopf, B., Guo, S. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Haptic Intelligence Article Creating an Affective Robot That Feels Both Touch and Emotion Burns, R. B., Richardson, B. A., Klingenberg, J., Kuchenbecker, K. J. IEEE Transactions on Affective Computing, 1-18, December 2025, Rachael Bevill Burns and Benjamin A. Richardson contributed equally to this publication (Published)
Despite the importance of sensitive skin for living creatures, most robots can feel contact on only a tiny fraction of their exterior, if at all. Furthermore, typical robot reactions to touch are limited to event-based acknowledgments, lacking perceptual richness, lifelike positive/negative responses, and temporal dynamics. We address these gaps by introducing a practical full-body tactile-perception system for social robots, turning a NAO robot into the Haptic Empathetic Robot Animal (HERA). The sixteen main regions of the robot's body are instrumented with soft resistive tactile sensors covered by a tailored koala suit. Windows of each time-varying sensor output are continually classified into five gestures at two intensities via a two-stage machine-learning model. On challenging testing data containing simultaneous contacts, touch detection achieves an F1 score of 0.773, and gesture recognition achieves 52.2% accuracy (5.2 times chance); considering the temporal, spatial, and semantic adjacency of the applied touches increases these metrics to 0.896 and 86.6%, respectively. In turn, each detected contact drives a real-time emotion model that represents the robot's affective state as a second-order dynamic system analogous to a mass-spring-damper system. This model's parameters control the robot's disposition, stoicism, and calmness. We explain the connections between HERA's hardware and software subsystems and demonstrate their combined ability to create an affective robot that feels both touch and emotion.
DOI BibTeX
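The second-order emotion dynamics described above can be sketched as a damped oscillator whose state is pushed by touch events and settles back to a resting disposition. The class below is a minimal illustration assuming a scalar valence state and hand-picked constants; the parameter mapping only loosely echoes the disposition/stoicism/calmness description in the abstract and is not HERA's actual model.

```python
class EmotionState:
    """Toy second-order (mass-spring-damper) affect model.

    The valence x is pulled toward a resting disposition by a spring k,
    resists change via a mass m, and settles via damping c; touch events
    apply forces. Names and values here are illustrative, not HERA's.
    """
    def __init__(self, disposition=0.0, mass=1.0, damping=0.8, stiffness=2.0):
        self.x0 = disposition   # resting valence the system returns to
        self.m = mass           # inertia against change ("stoicism")
        self.c = damping        # how quickly excitement decays ("calmness")
        self.k = stiffness      # pull back toward the disposition
        self.x = disposition    # current valence
        self.v = 0.0            # rate of change of valence

    def step(self, force=0.0, dt=0.02):
        # Semi-implicit Euler step of m*x'' + c*x' + k*(x - x0) = force
        a = (force - self.c * self.v - self.k * (self.x - self.x0)) / self.m
        self.v += a * dt
        self.x += self.v * dt
        return self.x

robot = EmotionState()
robot.step(force=5.0)        # a touch event perturbs valence upward
for _ in range(500):         # with no further touch, valence decays back
    robot.step()
print(round(robot.x, 3))
```

The spring constant sets how stubbornly the robot returns to its baseline mood, while the damping term determines whether a touch produces a brief flare or a slowly fading response.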

Empirical Inference Conference Paper Cultural Alien Sampler: Open-ended art generation balancing originality and coherence Hernandez, A., Yakura, H., Brinkmann, L., Sola, M. C., Alhaija, H. A., Serna, I., Rahaman, N., Schölkopf, B., Rahwan, I. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, Creative AI Track, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Do-PFN: In-Context Learning for Causal Effect Estimation Robertson*, J., Reuter*, A., Guo, S., Hollmann, N., Hutter, F., Schölkopf, B. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025, *equal contribution (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models Vetter, J., Gloeckler, M., Gedon, D., Macke, J. H. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators Moss, G., Muhle, L. S., Drews, R., Macke, J. H., Schröder, C. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Forecasting in Offline Reinforcement Learning for Non-stationary Environments Ada, S. E., Martius, G., Ugur, E., Oztop, E. In Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Identifying multi-compartment Hodgkin-Huxley models with high-density extracellular voltage recordings Tanoh, I. C., Deistler, M., Macke, J. H., Linderman, S. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Perceiving Systems Ph.D. Thesis Learning Hands in Action Fan, Z. December 2025 (Published)
Hands are our primary interface for acting on the world. From everyday tasks like preparing food to skilled procedures like surgery, human activity is shaped by rich and varied hand interactions. These include not only manipulation of external objects but also coordinated actions between both hands. For physical AI systems to learn from human behavior, assist in physical tasks, or collaborate safely in shared environments, they must perceive and understand hands in action: how we use them to interact with each other and with the objects around us. A key component of this understanding is the ability to reconstruct human hand motion and hand-object interactions in 3D from RGB images or videos. However, existing methods focus largely on estimating the pose of a single hand, often in isolation. They struggle with scenarios involving two hands in strong interaction or interactions with objects, particularly when those objects are articulated or previously unseen. This is because reconstructing 3D hands in action poses significant challenges, such as severe occlusions, appearance ambiguities, and the need to reason about both hand and object geometry in dynamic configurations. As a result, current systems fall short in complex real-world environments. This dissertation addresses these challenges by introducing methods and data for reconstructing hands in action from monocular RGB inputs. We begin by tackling the problem of interacting hand pose estimation. We present DIGIT, a method that leverages a part-aware semantic prior to disambiguate closely interacting hands. By explicitly modeling hand part interactions and encoding the semantics of finger parts, DIGIT robustly recovers accurate hand poses, outperforms prior baselines, and provides a step forward toward a more complete understanding of 3D hands in action. Since hands frequently manipulate objects, jointly reconstructing both is crucial.
Existing methods for hand-object reconstruction are limited to rigid objects and cannot handle tools with articulation, such as scissors or laptops. This severely restricts their ability to model the full range of everyday manipulations. We present the first method that jointly reconstructs two hands and an articulated object from a single RGB image, enabling unified reasoning across both rigid and articulated object interactions. To support this, we introduce ARCTIC, a large-scale motion capture dataset of humans performing dexterous bimanual manipulation with articulated tools. ARCTIC includes both articulated and fixed (rigid) configurations, along with accurate 3D annotations of hand poses and object motions. Leveraging this dataset, our method jointly infers object articulation states and hand poses, advancing the state of hand-object understanding in complex object manipulation settings. Finally, we address generalization to in-the-wild object interactions. Prior approaches either rely on synthetic data with limited realism or require object models at test time. We introduce HOLD, a self-supervised method that learns to reconstruct 3D hand-object interactions from monocular RGB videos, without paired 3D annotations or known object models. HOLD learns via an appearance- and motion-consistent objective across views and time, enabling strong generalization to unseen objects in interaction. Experiments demonstrate HOLD's ability to generalize to in-the-wild monocular settings, outperforming fully-supervised baselines trained on synthetic or lab-captured datasets. Together, DIGIT, ARCTIC, and HOLD advance the 3D understanding of hands in action, covering both hand-hand and hand-object interactions.
These contributions improve robustness in interacting hand pose estimation, introduce a dataset for bimanual manipulation with rigid and articulated tools, and include the first single-image method for jointly reconstructing hands and articulated objects, learned directly from this dataset. In addition, HOLD removes the need for object templates by enabling hand-object reconstruction in the wild. These developments move toward more scalable physical AI systems capable of interpreting and imitating human manipulation, with applications in teleoperation, human-robot collaboration, and embodied learning from demonstration.
PDF BibTeX

Empirical Inference Conference Paper Reparameterized LLM Training via Orthogonal Equivalence Transformation Qiu, Z., Buchholz, S., Xiao, T., Dax, M., Schölkopf, B., Liu, W. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Root Cause Analysis of Outliers with Missing Structural Knowledge Orchard, W. R., Okati, N., Garrido Mejia, S., Blöbaum, P., Janzing, D. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper SPARTAN: A Sparse Transformer World Model Attending to What Matters Lei, A., Schölkopf, B., Posner, I. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Organizational Leadership and Diversity Conference Paper Inclusive Leadership in the Age of AI: A Dataset and Comparative Study of LLMs vs. Real-Life Leaders in Workplace Action Planning Singh, V., Schulte im Walde, S., Keplinger, K. Findings of the Association for Computational Linguistics: EMNLP 2025, 19732-19753, Association for Computational Linguistics, Suzhou, China, Empirical Methods in Natural Language Processing, November 2025 (Published)
Generative Large Language Models have emerged as useful tools, reshaping professional workflows. However, their efficacy in inherently complex and human-centric tasks such as leadership and strategic planning remains under-explored. In this interdisciplinary study, we present a novel dataset and compare LLMs and human leaders in the context of workplace action planning, specifically focusing on translating the abstract idea of inclusion into actionable SMART goals. We developed the Leader Success Bot, a script-based chatbot co-designed with domain experts, to guide more than 250 real-life leaders in generating inclusive workplace action plans. We systematically prompted seven state-of-the-art chat-based LLMs to perform the same task using the socio-demographic data of real-life leaders and instructions co-developed with domain experts. Our publicly released dataset enables direct comparison between human and LLM-generated workplace action plans, offering insights into their respective strengths, biases, and limitations. Our findings highlight critical gaps and opportunities for LLMs in leadership applications, fostering interdisciplinary collaboration and NLP applications.
DOI URL BibTeX

Haptic Intelligence Perceiving Systems Ph.D. Thesis An Interdisciplinary Approach to Human Pose Estimation: Application to Sign Language Forte, M. University of Tübingen, Tübingen, Germany, November 2025, Department of Computer Science (Published)
Accessibility legislation mandates equal access to information for Deaf communities. While videos of human interpreters provide optimal accessibility, they are costly and impractical for frequently updated content. AI-driven signing avatars offer a promising alternative, but their development is limited by the lack of high-quality 3D motion-capture data at scale. Vision-based motion-capture methods are scalable but struggle with the rapid hand movements, self-occlusion, and self-touch that characterize sign language. To address these limitations, this dissertation develops two complementary solutions. SGNify improves hand pose estimation by incorporating universal linguistic rules that apply to all sign languages as computational priors. Proficient signers recognize the reconstructed signs as accurately as those in the original videos, but depth ambiguities along the camera axis can still produce incorrect reconstructions for signs involving self-touch. To overcome this remaining limitation, BioTUCH integrates electrical bioimpedance sensing between the wrists of the person being captured. Systematic measurements show that skin-to-skin contact produces distinctive bioimpedance reductions at high frequencies (240 kHz to 4.1 MHz), enabling reliable contact detection. BioTUCH uses the timing of these self-touch events to refine arm poses, producing physically plausible arm configurations and significantly reducing reconstruction error. Together, these contributions support the scalable collection of high-quality 3D sign language motion data, facilitating progress toward AI-driven signing avatars.
BibTeX
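The contact-detection idea behind BioTUCH can be illustrated with a simple threshold rule: skin-to-skin contact shortens the current path between the wrists, so the measured high-frequency impedance drops below its no-contact baseline. The snippet below is a deliberately simplified sketch with made-up numbers, not the dissertation's detection pipeline.

```python
def detect_self_touch(impedance, baseline=None, drop_fraction=0.1):
    """Flag samples where high-frequency wrist-to-wrist impedance
    drops markedly below its no-contact baseline.

    A sample counts as contact when it sits more than `drop_fraction`
    below the baseline. Illustrative threshold detector only; the real
    system measures across many frequencies (240 kHz to 4.1 MHz).
    """
    if baseline is None:
        baseline = max(impedance)   # crude no-contact reference
    threshold = baseline * (1.0 - drop_fraction)
    return [z < threshold for z in impedance]

# Synthetic impedance trace (ohms): a self-touch event in the middle.
trace = [1000, 998, 1002, 870, 860, 875, 999, 1001]
print(detect_self_touch(trace))
```

In practice the timing of these detected contact intervals, rather than the raw impedance, is what constrains the arm-pose refinement.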

Empirical Inference Conference Paper Improving Large Language Model Safety with Contrastive Representation Learning Simko, S., Sachan, M., Schölkopf, B., Jin, Z. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 28166-28194, (Editors: Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet), Association for Computational Linguistics, November 2025 (Published) arXiv DOI URL BibTeX

Social Foundations of Computation Miscellaneous Policy Design in Long-run Welfare Dynamics Wu, J., Abebe, R., Hardt, M., Stoica, A. Proceedings of the Fifth ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), November 2025 (Published) URL BibTeX

Physical Intelligence Article Optoacoustic-Guided Magnetic Microrobot Platform for Precision Drug Delivery Wang, F., Yildiz, E., Deán-Ben, X. L., Yu, Y., Nozdriukhin, D., Kang, W., Zhang, S., Zinnanti, J., Sheehan, D., Soon, R. H., Sitti, M. Advanced Materials, 38:e11870, October 2025 (Published)
Precision drug delivery remains a significant challenge due to limitations in drug loading, targeted release, precise navigation, and real-time monitoring. Here, the study reports a magnetic microrobot platform (MMP) that integrates high-capacity drug loading, magnetically actuated collective navigation, controlled drug release, and real-time 3D optoacoustic imaging in a single system. The MMP exploits synergistic advantages by embedding hard-magnetic FePt nanoparticles in a degradable ZIF-8 shell, achieving a drug loading efficiency of ≈93.9% and enabling precise release in response to pH changes and radiofrequency-induced heating. Reconfigurable swarm behavior strategies significantly enhance the navigation efficiency of microrobots against physiological blood flows within complex cerebral vasculature. The ex vivo and in vivo experiments further demonstrate strong contrast characteristics of the microrobots, enabling high-resolution visualization of deep vascular structures and dynamic tracking of MMP with real-time 3D optoacoustic imaging. This multifunctional strategy paves the way for clinical translation and precision therapy in complex biological settings.
DOI URL BibTeX

Physical Intelligence Article Emergent Motility of Self-Organized Particle-Giant Unilamellar Vesicle Assembly Karaz, S., Gardi, G., Han, M., Baltaci, S. F., Akolpoglu, M. B., Sitti, M. Advanced Materials, xx:e12036, October 2025 (Published)
Giant unilamellar vesicles (GUVs), soft cell-sized compartments formed through the self-assembly of lipid molecules, have long been utilized as model systems and passive carriers in membrane biophysics and biomedical applications. However, their potential as dynamically responsive and motile systems remains largely untapped due to challenges in achieving controlled and sustained motion in soft, deformable structures. Here, an autonomous cell-like microrobot is realized through the emergent self-assembly of GUVs (5-10 µm) and silica microparticles (1-3 µm) under alternating current electric fields. Self-propulsion arises from asymmetric self-organization of the particles on the vesicle surface, enabling a reversible transformation of the assembly into an active structure. Unlike rigid colloidal systems, GUVs introduce unique features enabled by their soft lipid membranes: shape deformations, membrane tension-dependent motility, and field-triggered live bacteria release via vesicle bursting. Through experiments and simulations, the mechanisms underlying self-assembly and propulsion are investigated, and a dynamic phase diagram is constructed to map the motion regime as a function of field parameters. Finally, it is shown that these self-assembled structures are capable of reconfiguration in response to local constraints in the environment, suggesting potential applications in complex environments and advancing the potential of GUVs toward the rational design of cell-like microrobots or artificial cell systems.
DOI URL BibTeX

Physical Intelligence Article Wireless nonresonant stimulation of neurons on a magnetoelectric film surface Aydin, A., Jahanshahi, A., Esmaeili-Dokht, P., Han, M., Gardi, G., Yu, Y., Soon, R. H., Temel, Y., Sitti, M. Science Advances, 11:eadx6829, October 2025 (Published)
Wireless neural interfaces are emerging as a minimally invasive treatment option for neurological disorders. Among the wireless technologies, magnetically powered systems are effective for targeting deep brain sites. However, dependence on high-frequency electromagnetic fields in such systems limits their safe implementation. In this study, we demonstrate the use of millimeter-scale magnetoelectric (ME) films as a direct neural interface for wireless neurostimulation, powered by static and alternating magnetic fields in the nonresonant regime (10 hertz). To accomplish this objective, electrical potential trends of the ME films under varying low-frequency magnetic fields are investigated and used to demonstrate neural stimulation by calcium imaging on primary neurons in vitro via a capacitive-like charge injection mechanism. In addition, electrical polarization orientation is revealed as a critical design parameter in direct neuron-ME interfaces. These findings collectively demonstrate the potential of nonresonant powering of ME films as a promising minimally invasive wireless neural stimulation technique.
DOI URL BibTeX

Empirical Inference Article In silico biological discovery with large perturbation models Miladinovic*, D., Höppe*, T., Chevalley, M., Georgiou, A., Stuart, L., Mehrjou, A., Bantscheff, M., Schölkopf, B., Schwab, P. Nature Computational Science, October 2025, *equal contribution (Published)
Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks—from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here we present the large perturbation model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene–gene interaction networks. LPM learns meaningful joint representations of perturbations, readouts and contexts, enables the study of biological relationships in silico and could considerably accelerate the derivation of insights from pooled perturbation experiments.
DOI URL BibTeX
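The core modelling idea, representing perturbation, readout, and context as separate ("disentangled") dimensions, can be sketched with a toy factorized model: each factor gets its own embedding table, and a prediction for an unseen combination reuses embeddings learned from other experiments. The architecture, dimensions, and combination rule below are illustrative assumptions, not the published LPM.

```python
import numpy as np

rng = np.random.default_rng(0)

class FactorizedPerturbationModel:
    """Toy model with separate embedding tables for perturbation,
    readout, and context. In a trained version, the tables would be
    fit to pooled experiments; here they are random placeholders."""
    def __init__(self, n_pert, n_read, n_ctx, dim=8):
        self.P = rng.normal(size=(n_pert, dim))   # perturbation embeddings
        self.R = rng.normal(size=(n_read, dim))   # readout embeddings
        self.C = rng.normal(size=(n_ctx, dim))    # context embeddings
        self.w = rng.normal(size=dim)             # readout head

    def predict(self, pert, read, ctx):
        # Combine the three factors; an unseen (pert, read, ctx) triple
        # still gets a prediction because each factor was seen elsewhere.
        h = self.P[pert] * self.R[read] + self.C[ctx]
        return float(np.tanh(h) @ self.w)

model = FactorizedPerturbationModel(n_pert=100, n_read=50, n_ctx=5)
print(model.predict(pert=3, read=10, ctx=1))
```

The point of the factorization is generalization: a perturbation measured only in one cell line can be queried in another context, because the perturbation and context embeddings are learned independently.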

Haptic Intelligence Perceiving Systems Conference Paper Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing Forte, M., Athanasiou, N., Ballardini, G., Bartels, J. U., Kuchenbecker, K. J., Black, M. J. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 5071-5080, Honolulu, USA, October 2025, Nikos Athanasiou and Giulia Ballardini contributed equally to this publication (Published) pdf URL BibTeX

Haptic Intelligence Intelligent Control Systems Conference Paper Diffusion-Based Approximate MPC: Fast and Consistent Imitation of Multi-Modal Action Distributions Marquez Julbe, P., Nubert, J., Hose, H., Trimpe, S., Kuchenbecker, K. J. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5633-5640, Hangzhou, China, October 2025 (Published)
Approximating model predictive control (MPC) using imitation learning (IL) allows for fast control without solving expensive optimization problems online. However, methods that use neural networks in a simple L2-regression setup fail to approximate multi-modal (set-valued) solution distributions caused by local optima found by the numerical solver or non-convex constraints, such as obstacles, significantly limiting the applicability of approximate MPC in practice. We solve this issue by using diffusion models to accurately represent the complete solution distribution (i.e., all modes) at high control rates (more than 1000 Hz). This work shows that diffusion-based AMPC significantly outperforms L2-regression-based approximate MPC for multi-modal action distributions. In contrast to most earlier work on IL, we also focus on running the diffusion-based controller at a higher rate and in joint space instead of end-effector space. Additionally, we propose the use of gradient guidance during the denoising process to consistently pick the same mode in closed loop to prevent switching between solutions. We propose using the cost and constraint satisfaction of the original MPC problem during parallel sampling of solutions from the diffusion model to pick a better mode online. We evaluate our method on the fast and accurate control of a 7-DoF robot manipulator both in simulation and on hardware deployed at 250 Hz, achieving a speedup of more than 70 times compared to solving the MPC problem online and also outperforming the numerical optimization (used for training) in success ratio.
DOI BibTeX

Perceiving Systems Conference Paper Generative Zoo Niewiadomski, T., Yiannakidis, A., Cuevas-Velasquez, H., Sanyal, S., Black, M. J., Zuffi, S., Kulits, P. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, October 2025 (Published)
The model-based estimation of 3D animal pose and shape from images enables computational modeling of animal behavior. Training models for this purpose requires large amounts of labeled image data with precise pose and shape annotations. However, capturing such data requires the use of multi-view or marker-based motion-capture systems, which are impractical to adapt to wild animals in situ and impossible to scale across a comprehensive set of animal species. Some have attempted to address the challenge of procuring training data by pseudo-labeling individual real-world images through manual 2D annotation, followed by 3D-parameter optimization to those labels. While this approach may produce silhouette-aligned samples, the obtained pose and shape parameters are often implausible due to the ill-posed nature of the monocular fitting problem. Sidestepping real-world ambiguity, others have designed complex synthetic-data-generation pipelines leveraging video-game engines and collections of artist-designed 3D assets. Such engines yield perfect ground-truth annotations but are often lacking in visual realism and require considerable manual effort to adapt to new species or environments. Motivated by these shortcomings, we propose an alternative approach to synthetic-data generation: rendering with a conditional image-generation model. We introduce a pipeline that samples a diverse set of poses and shapes for a variety of mammalian quadrupeds and generates realistic images with corresponding ground-truth pose and shape parameters. To demonstrate the scalability of our approach, we introduce GenZoo, a synthetic dataset containing one million images of distinct subjects. We train a 3D pose and shape regressor on GenZoo, which achieves state-of-the-art performance on a real-world multi-species 3D animal pose and shape estimation benchmark, despite being trained solely on synthetic data. We will release our dataset and generation pipeline to support future research.
project page code demo pdf BibTeX

Perceiving Systems Conference Paper ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness Li, B., Feng, H., Cai, Z., Black, M. J., Xiu, Y. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025 (Published)
Fitting a body to a 3D clothed human point cloud is a common yet challenging task. Traditional optimization-based approaches use multi-stage pipelines that are sensitive to pose initialization, while recent learning-based methods often struggle with generalization across diverse poses and garment types. We propose Equivariant Tightness Fitting for Clothed Humans, or ETCH, a novel pipeline that estimates cloth-to-body surface mapping through locally approximate SE(3) equivariance, encoding tightness as displacement vectors from the cloth surface to the underlying body. Following this mapping, pose-invariant body features regress sparse body markers, simplifying clothed human fitting into an inner-body marker fitting task. Extensive experiments on CAPE and 4D-Dress show that ETCH significantly outperforms state-of-the-art methods -- both tightness-agnostic and tightness-aware -- in body fitting accuracy on loose clothing (16.7% ~ 69.5%) and shape accuracy (average 49.9%). Our equivariant tightness design can even reduce directional errors by 67.2% ~ 89.8% in one-shot (or out-of-distribution) settings (~ 1% data). Qualitative results demonstrate strong generalization of ETCH, regardless of challenging poses, unseen shapes, loose clothing, and non-rigid dynamics.
project arXiv code video BibTeX

Perceiving Systems Conference Paper Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars Sklyarova, V., Zakharov, E., Prinzler, M., Becherini, G., Black, M. J., Thies, J. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, USA, October 2025 (Accepted)
We present a novel approach for 3D hair reconstruction from single photographs based on a global hair prior combined with local optimization. Capturing strand-based hair geometry from single photographs is challenging due to the variety and geometric complexity of hairstyles and the lack of ground truth training data. Classical reconstruction methods like multi-view stereo only reconstruct the visible hair strands, missing the inner structure of hairstyles and hampering realistic hair simulation. To address this, existing methods leverage hairstyle priors trained on synthetic data. Such data, however, is limited in both quantity and quality since it requires manual work from skilled artists to model the 3D hairstyles and create near-photorealistic renderings. To address this, we propose a novel approach that uses both real and synthetic data to learn an effective hairstyle prior. Specifically, we train a transformer-based prior model on synthetic data to obtain knowledge of the internal hairstyle geometry and introduce real data in the learning process to model the outer structure. This training scheme is able to model the visible hair strands depicted in an input image, while preserving the general 3D structure of hairstyles. We exploit this prior to build a Gaussian-splatting-based reconstruction method that creates hairstyles from one or more images. Qualitative and quantitative comparisons with existing reconstruction pipelines demonstrate the effectiveness and superior performance of our method for capturing detailed hair orientation, overall silhouette, and backside consistency. For additional results and code, please refer to https://im2haircut.is.tue.mpg.de.
arXiv project code BibTeX

Perceiving Systems Conference Paper MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips Wang, S., He, H., Parelli, M., Gebhardt, C., Fan, Z., Song, J. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), ICCV, October 2025 (Published)
Most RGB-based hand-object reconstruction methods rely on object templates, while template-free methods typically assume full object visibility. This assumption often breaks in real-world settings, where fixed camera viewpoints and static grips leave parts of the object unobserved, resulting in implausible reconstructions. To overcome this, we present MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited viewpoint variation. Our key insight is that, despite the scarcity of paired 3D hand-object data, large-scale novel view synthesis diffusion models offer rich object supervision. This supervision serves as a prior to regularize unseen object regions during hand interactions. Leveraging this insight, we integrate a novel view synthesis model into our hand-object reconstruction framework. We further align the hand to the object by incorporating visible contact constraints. Our results demonstrate that MagicHOI significantly outperforms existing state-of-the-art hand-object reconstruction methods. We also show that novel view synthesis diffusion priors effectively regularize unseen object regions, enhancing 3D hand-object reconstruction.
Project Video Code URL BibTeX

Perceiving Systems Conference Paper MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction Dong, Z., Duan, L., Song, J., Black, M. J., Geiger, A. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025 (Published)
We present MoGA, a novel method to reconstruct high-fidelity 3D Gaussian avatars from a single-view image. The main challenge lies in inferring unseen appearance and geometric details while ensuring 3D consistency and realism. Most previous methods rely on 2D diffusion models to synthesize unseen views; however, these generated views are sparse and inconsistent, resulting in unrealistic 3D artifacts and blurred appearance. To address these limitations, we leverage a generative avatar model that can generate diverse 3D avatars by sampling deformed Gaussians from a learned prior distribution. Due to the limited amount of 3D training data, such a 3D model alone cannot capture all image details of unseen identities. Consequently, we integrate it as a prior, ensuring 3D consistency by projecting input images into its latent space and enforcing additional 3D appearance and geometric constraints. Our novel approach formulates Gaussian avatar creation as a model inversion process by fitting the generative avatar to synthetic views from 2D diffusion models. The generative avatar provides a meaningful initialization for model fitting, enforces 3D regularization, and helps in refining pose estimation. Experiments show that our method surpasses state-of-the-art techniques and generalizes well to real-world scenarios. Our Gaussian avatars are also inherently animatable.
pdf project code video BibTeX

Perceiving Systems Conference Paper PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning Zhang, Y., Feng, Y., Cseke, A., Saini, N., Bajandas, N., Heron, N., Black, M. J. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025 (Published)
We formulate the motor system of an interactive avatar as a generative motion model that can drive the body to move through 3D space in a perpetual, realistic, controllable, and responsive manner. Although human motion generation has been extensively studied, many existing methods lack the responsiveness and realism of real human movements. Inspired by recent advances in foundation models, we propose PRIMAL, which is learned with a two-stage paradigm. In the pretraining stage, the model learns body movements from a large number of sub-second motion segments, providing a generative foundation from which more complex motions are built. This training is fully unsupervised without annotations. Given a single-frame initial state during inference, the pretrained model not only generates unbounded, realistic, and controllable motion, but also enables the avatar to be responsive to induced impulses in real time. In the adaptation phase, we employ a novel ControlNet-like adaptor to fine-tune the base model efficiently, adapting it to new tasks such as few-shot personalized action generation and spatial target reaching. Evaluations show that our proposed method outperforms state-of-the-art baselines. We leverage the model to create a real-time character animation system in Unreal Engine that feels highly responsive and natural.
pdf project code video BibTeX

Perceiving Systems Conference Paper SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image Antić, D., Paschalidis, G., Tripathi, S., Gevers, T., Dwivedi, S. K., Tzionas, D. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025 (Published)
Recovering 3D object pose and shape from a single image is a challenging and ill-posed problem. This is due to strong (self-)occlusions, depth ambiguities, the vast intra- and inter-class shape variance, and the lack of 3D ground truth for natural images. Existing deep-network methods are trained on synthetic datasets to predict 3D shapes, so they often struggle to generalize to real-world images. Moreover, they lack an explicit feedback loop for refining noisy estimates, and primarily focus on geometry without directly considering pixel alignment. To tackle these limitations, we develop a novel render-and-compare optimization framework, called SDFit. This has three key innovations: First, it uses a learned category-specific and morphable signed-distance-function (mSDF) model, and fits this to an image by iteratively refining both 3D pose and shape. The mSDF robustifies inference by constraining the search on the manifold of valid shapes, while allowing for arbitrary shape topologies. Second, SDFit retrieves an initial 3D shape that likely matches the image, by exploiting foundational models for efficient look-up into 3D shape databases. Third, SDFit initializes pose by establishing rich 2D-3D correspondences between the image and the mSDF through foundational features. We evaluate SDFit on three image datasets, i.e., Pix3D, Pascal3D+, and COMIC. SDFit performs on par with SotA feed-forward networks for unoccluded images and common poses, but is uniquely robust to occlusions and uncommon poses. Moreover, it requires no retraining for unseen images. Thus, SDFit contributes new insights for generalizing in the wild.
Project arXiv Code Video Poster BibTeX

Perceiving Systems Conference Paper St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World Feng, H., Zhang, J., Wang, Q., Ye, Y., Yu, P., Black, M., Darrell, T., Kanazawa, A. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025 (Published)
Dynamic 3D reconstruction and point tracking in videos are typically treated as separate tasks, despite their deep connection. We propose St4RTrack, a feed-forward framework that simultaneously reconstructs and tracks dynamic video content in a world coordinate frame from RGB inputs. This is achieved by predicting two appropriately defined pointmaps for a pair of frames captured at different moments. Specifically, we predict both pointmaps at the same moment, in the same world, capturing both static and dynamic scene geometry while maintaining 3D correspondences. Chaining these predictions through the video sequence with respect to a reference frame naturally computes long-range correspondences, effectively combining 3D reconstruction with 3D tracking. Unlike prior methods that rely heavily on 4D ground truth supervision, we employ a novel adaptation scheme based on a reprojection loss. We establish a new extensive benchmark for world-frame reconstruction and tracking, demonstrating the effectiveness and efficiency of our unified, data-driven framework.
pdf arXiv project code demo video BibTeX

Robotic Composites and Compositions Article Jamming with magnetic composites Aktaş, B., Kim, M., Baeckert, M., Sicilia, G., Franchini, G., Heemeyer, F., Gervasoni, S., Chen, X., Pane, S., Nelson, B. Nature Communications, 16:8711, September 2025 (Published)
The jamming transition—marked by dramatic changes in mechanical properties, such as stiffness and damping—enables programmable and adaptive structures for robotic applications. This phenomenon, driven by changes in the coupling between individual subunits of an aggregate, can be controlled through external actuation sources. Existing jamming actuation methods, such as applying a vacuum with an airtight envelope, pose significant limitations, as they require the structures to be tethered, limiting reconfigurability and scalability. Here, we introduce an untethered jamming mechanism based on magnetic interactions between soft-ferromagnetic composites. We establish composite design principles to program the magnetization of the subunits, demonstrate linear, planar, and volumetric jamming and shape-locking, and model the magneto-mechanical behavior. This approach contributes to the development of jamming-based materials in which the jamming directions and transition points can be tuned on-the-fly by adjusting the external magnetic field orientation and strength, respectively.
DOI URL BibTeX

Organizational Leadership and Diversity Article Inclusive avatars in the Metaverse: learning from the lived experiences of people with disabilities Angerbauer, K., Van Wagoner, H. P., Keplinger, K., Halach, T., Vogelsang, J., Hube, N., Smith, A., Sedlmair, M. The Journal of Strategic Information Systems, 34:101935, September 2025 (Published)
Immersive platforms like the Metaverse have gained attention in information systems (IS) research, yet the diverse needs of people with disabilities (PWD) remain underexplored. This research examines the experiences of PWD using inclusive avatars that represent disabilities. Through an exploratory mixed-methods approach, combining qualitative interviews with an experience sampling study, we develop a framework informed by Affective Events Theory and voices of PWD to better understand how social interactions in the Metaverse impact PWD’s emotions and outcomes. Findings suggest that when PWD use inclusive avatars, inclusive and exclusionary social interactions shape their emotional responses, which in turn influence engagement, avatar connection and satisfaction, and perceptions of inclusion in the Metaverse. Although adopting inclusive avatars can be challenging, especially in the face of exclusionary interactions, the benefits can outweigh the costs. The role of disability identity is critical; PWD who identify strongly with their disability experience less negative emotional impact from exclusion. This research contributes to IS literature by conceptualizing the Metaverse as a relational, emotion-driven environment shaped by social interactions as well as a platform for authentic self-representation. Practical implications include supporting avatar-based disability representation, involving PWD in co-designing virtual reality technologies, and providing training to foster inclusive interactions in the Metaverse. These strategies can help organizations build more inclusive and engaging digital workplaces for an often underrepresented workforce segment.
DOI URL BibTeX

Physical Intelligence Article Mixed-length multivariate covalent organic framework for combined near-infrared photodynamic therapy and drug delivery Rodríguez-Camargo, A., Yildiz, E., Juela, D., Fischer, F. R., Graf, D., Rath, B. B., Ochsenfeld, C., Bauer, M., Sitti, M., Yao, L., Lotsch, B. Journal of the American Chemical Society, 147:33472-33481, September 2025 (Published)
Covalent organic frameworks (COFs) have been emerging as versatile reticular materials due to their tunable structures and functionalities, enabled by precise molecular engineering at the atomic level. While the integration of multiple components into COFs has substantially expanded their structural complexity, the strategic engineering of diverse functionalities within a single framework via the random distribution of linkers with varying lengths remains largely unexplored. Here, we report a series of highly crystalline mixed-length multivariate COFs synthesized using azobenzene and bipyridine as linkers, where tuning the ratio of linkers and incorporating palladium effectively modulates the balance between near-infrared (NIR) light absorption and catalytic sites for NIR-generation of hydrogen peroxide (H2O2). Capitalizing on the deep tissue penetration of NIR light and the generated H2O2 as reactive oxygen species, as a proof of concept, the optimal mixed-length multivariate COF reduces breast cancer cell viability by almost 90% after 1 h of irradiation in a combined in vitro photodynamic therapy and drug delivery.
DOI URL BibTeX

Haptic Intelligence Autonomous Learning Empirical Inference Conference Paper Adding Internal Audio Sensing to Internal Vision Enables Human-Like In-Hand Fabric Recognition with Soft Robotic Fingertips Andrussow, I., Solano, J., Richardson, B. A., Martius, G., Kuchenbecker, K. J. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (Humanoids), 373-380, Seoul, South Korea, September 2025 (Published)
Distinguishing the feel of smooth silk from coarse cotton is a trivial everyday task for humans. When exploring such fabrics, fingertip skin senses both spatio-temporal force patterns and texture-induced vibrations that are integrated to form a haptic representation of the explored material. It is challenging to reproduce this rich, dynamic perceptual capability in robots because tactile sensors typically cannot achieve both high spatial resolution and high temporal sampling rate. In this work, we present a system that can sense both types of haptic information, and we investigate how each type influences robotic tactile perception of fabrics. Our robotic hand's middle finger and thumb each feature a soft tactile sensor: one is the open-source Minsight sensor that uses an internal camera to measure fingertip deformation and force at 50 Hz, and the other is our new sensor Minsound that captures vibrations through an internal MEMS microphone with a bandwidth from 50 Hz to 15 kHz. Inspired by the movements humans make to evaluate fabrics, our robot actively encloses and rubs folded fabric samples between its two sensitive fingers. Our results test the influence of each sensing modality on overall classification performance, showing high utility for the audio-based sensor. Our transformer-based method achieves a maximum fabric classification accuracy of 97% on a dataset of 20 common fabrics. Incorporating an external microphone away from Minsound increases our method's robustness in loud ambient noise conditions. To show that this audio-visual tactile sensing approach generalizes beyond the training data, we learn general representations of fabric stretchiness, thickness, and roughness.
DOI BibTeX

Haptic Intelligence Robotics Embodied Vision Conference Paper ISyHand: A Dexterous Multi-finger Robot Hand with an Articulated Palm Richardson, B. A., Grüninger, F., Mack, L., Stueckler, J., Kuchenbecker, K. J. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (Humanoids), 720-727, Seoul, South Korea, September 2025, Benjamin A. Richardson, Felix Grueninger and Lukas Mack contributed equally to this publication (Published) DOI BibTeX

Social Foundations of Computation Conference Paper Strategic Hypothesis Testing Hossain, S., Chen, Y., Chen, Y. The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), Spotlight Poster, top 3%, September 2025 (Accepted)
We examine hypothesis testing within a principal-agent framework, where a strategic agent, holding private beliefs about the effectiveness of a product, submits data to a principal who decides on approval. The principal employs a hypothesis testing rule, aiming to pick a p-value threshold that balances false positives and false negatives while anticipating the agent's incentive to maximize expected profitability. Building on prior work, we develop a game-theoretic model that captures how the agent's participation and reporting behavior respond to the principal's statistical decision rule. Despite the complexity of the interaction, we show that the principal's errors exhibit clear monotonic behavior when segmented by an efficiently computable critical p-value threshold, leading to an interpretable characterization of their optimal p-value threshold. We empirically validate our model and these insights using publicly available data on drug approvals. Overall, our work offers a comprehensive perspective on strategic interactions within the hypothesis testing framework, providing technical and regulatory insights.
arXiv BibTeX

Haptic Intelligence Master Thesis Wrist-Worn Pressure Pulses for Phantom Directional Cues in VR Kadmani, A. Technical University of Munich, Munich, Germany, September 2025, M.Sc. in Electrical Engineering and Information Technology (Published)
Haptic feedback in today's VR systems is often limited to vibration delivered through handheld controllers, leaving a gap for compact devices that can convey spatial cues without occupying the hands. This thesis presents the design and evaluation of SuperCUTE, a wrist-worn pressure feedback device that uses four soft electrohydraulic actuators to elicit phantom tactile sensations around the wrist. The device was evaluated with n = 20 participants in a user study comprising two tasks. In Task 1 (circular GUI), single-actuator cues produced tightly clustered responses (median resultant length R = 0.92); about 70% of trials fell within ± 22.5° of the stimulated cardinal. Adjacent-actuator pairs yielded in-between percepts (about 70% of reports), and intensity imbalance shifted perceived location toward the stronger actuator; reported intensity was higher for strong than weak drives (mean 0.76 vs. 0.32). Across cues, Rayleigh tests indicated strong clustering of response angles (median R ≈ 0.82). In Task 2 (VR), hand trajectories during 5 s cues aligned with cue geometry; end-directions showed strong clustering (median R ≈ 0.78), and latency estimated from a 1 cm displacement threshold had a median of 1.25 s (IQR 0.61 s). Questionnaire responses indicated clear, comfortable, and usable cues. Overall, pressure pulses are a feasible approach for directional wrist cues in VR. We provide device documentation, datasets, and analysis code to support pressure-based wearable haptics.
BibTeX

Physical Intelligence Article Real-time in situ magnetization reprogramming for soft robotics Bao, X., Wang, F., Zhang, J., Li, M., Zhang, S., Ren, Z., Liao, J., Yan, Y., Kang, W., Zhang, R., Sitti, M. Nature, 645:375–384, August 2025 (Published)
Magnetic soft robots offer considerable potential across various scenarios, such as biomedical applications and industrial tasks, because of their shape programmability and reconfigurability, safe interaction and biocompatibility [1,2,3,4]. Despite recent advances, magnetic soft robots are still limited by the difficulties in reprogramming their required magnetization profiles in real time on the spot (in situ), which is essential for performing multiple functions or executing diverse tasks [5,6]. Here we introduce a method for real-time in situ magnetization reprogramming that enables the rearrangement and recombination of magnetic units to achieve diverse magnetization profiles. We explore the applications of this method in structures of varying dimensions, from one-dimensional tubes to three-dimensional frameworks, showcasing a diverse and expanded range of configurations and their deformations. This method also demonstrates versatility in diverse scenarios, including navigating around objects without undesired contact, reprogramming cilia arrays, managing multiple instruments cooperatively or independently under the same magnetic field, and manipulating objects of various shapes. These abilities extend the range of applications for magnetic actuation technologies. Furthermore, this method frees magnetic soft robots from the sole reliance on external magnetic fields for shape change, facilitating unprecedented modes and varieties of deformation while simultaneously reducing the need for complex magnetic field generation systems, thereby opening avenues for the development of magnetic actuation technologies.
DOI URL BibTeX

Social Foundations of Computation Algorithms and Society Article Performative Prediction: Past and Future Hardt, M., Mendler-Dünner, C. Statistical Science, Institute of Mathematical Statistics, August 2025 (Published)
Predictions in the social world generally influence the target of prediction, a phenomenon known as performativity. Self-fulfilling and self-negating predictions are examples of performativity. Of fundamental importance to economics, finance, and the social sciences, the notion has been absent from the development of machine learning. In machine learning applications, performativity often surfaces as distribution shift. A predictive model deployed on a digital platform, for example, influences consumption and thereby changes the data-generating distribution. We survey the recently founded area of performative prediction that provides a definition and conceptual framework to study performativity in machine learning. A consequence of performative prediction is a natural equilibrium notion that gives rise to new optimization challenges. Another consequence is a distinction between learning and steering, two mechanisms at play in performative prediction. The notion of steering is in turn intimately related to questions of power in digital markets. We review the notion of performative power that gives an answer to the question of how much a platform can steer participants through its predictions. We end on a discussion of future directions, such as the role that performativity plays in contesting algorithmic systems.
arXiv URL BibTeX

Haptic Intelligence Miscellaneous The Benefits of Gait Retraining with Vibrotactile Feedback Outweigh Higher Perceived Mental Load Sundaram, V. H., Rokhmanova, N., Halilaj, E., Kuchenbecker, K. J. Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Pittsburgh, USA, August 2025 (Published)
Knee osteoarthritis (KOA) affects millions worldwide, with excessive joint loading linked to disease progression. Modifying the foot progression angle (FPA) while walking is one strategy to reduce knee adduction moments, a measure associated with medial knee joint loading. This study investigated whether two types of vibrotactile biofeedback during a 20-minute treadmill gait-retraining session helped healthy adults better learn and retain a 10° toe-in gait. Participants who received feedback showed greater improvements in FPA accuracy than those without feedback and also reported significantly higher mental effort. The type of feedback that scaled the duration of the vibration with the magnitude of the error led to better short-term retention than no feedback, and it was also preferred by almost all subjects over constant-duration cues. These findings suggest that despite the added cognitive demand, users value biofeedback, emphasizing the need to design gait-retraining tools that consider both learning effectiveness and user experience.
BibTeX

Materials Article Sensitivity Enhancement of a Micro Ring Resonator-Based Photonic Sensor by Using a Gelatin Methacryloyl Functional Coating for the Detection of Metoprolol Tsianaka, A., Schweikert, C., Southan, A., Hoppe, N., Greul, M., Kaschel, M., Vogel, W., Berroth, M., Rademacher, G., Tovar, G. E. M. ACS Applied Optical Materials, 3(7):1556-1566, July 2025 (Published)
Aquatic environments are often contaminated with biopersistent pharmaceuticals, such as the β-blocker metoprolol. The quantitative determination of such pollutants is crucial for environmental monitoring. Therefore, a highly sensitive integrated photonic biosensor for the detection of minute concentrations of metoprolol is presented here. The sensor is based on a thermally robust ring resonator with a hydrogel coating for metoprolol adsorption. Hydrogels consisting of gelatin methacryloyl enabled an increase in the concentration of metoprolol ions in the vicinity of the photonic chip, resulting in high sensitivity of the sensor setup. Compared to an uncoated chip, an increase in sensitivity of up to a factor of 20 was observed. In combination with software-implemented signal processing, the setup showed a detection limit of less than 1 × 10⁻⁴ μmol mL⁻¹. The combination of functional coating, thermally insensitive design, and applied digital signal postprocessing makes the system introduced here an attractive approach toward sensor-based wastewater analysis and monitoring.
pdf DOI URL BibTeX

Haptic Intelligence Miscellaneous A DNN-Based Metamodel for Simulating Fingertip Deformation Deshmukh, Y., Kuchenbecker, K. J., Serhat, G. Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Suwon, South Korea, July 2025 (Published) BibTeX