Publications

Empirical Inference Ph.D. Thesis Predictions, Policies, Rewards: Models of Decision-Making from Observational Data Pace, A. ETH Zurich, Switzerland, February 2025, ETH AI Center-Fellowship-Program (Published)

Biomimetic Materials and Machines Article Ecosystem-Centered Robot Design: Toward Ecoresorbable Sustainability Robots (ESRs) Yilmaz, T., Fang, Y., Contreras, C., Schulz, A. K., Hartmann, F. Advanced Science, e09194:1-31, January 2025 (Published)
The deployment of robots and sensors across diverse ecosystems supports ecological monitoring, nature conservation, and exploration. However, retrieving these machines is often impractical or economically infeasible, posing risks to ecosystems through pollution, physical damage, and waste generation. To alleviate these risks, the development of transient systems from biodegradable materials represents a promising solution, enabling them to decompose harmlessly after use. Robots made from soft or functional polymers exhibit a unique potential in solving this challenge by drawing from a wide range of biomaterials, while simultaneously benefiting from intrinsic adaptability. Despite significant progress in the development of sustainable soft robotics, the influence of specific ecosystems on biodegradation is frequently overlooked. The environmental context is essential, as biodegradation depends largely on environmental factors unique to each ecosystem. In this review, a comprehensive overview of various ecosystems relevant to robot deployment is provided, offering critical context for assessing sustainability and deriving principles for ecosystem-centered robot design. Co-developing materials and sustainability robots with an understanding of their operational ecosystems paves the way for environmentally friendly machines, which are named ecoresorbable sustainability robots (ESRs), that coexist harmoniously with nature.

Dynamic Locomotion Article How knee muscles and ground reaction forces shape knee buckling and ankle push-off in neuromuscular simulations of human walking Buchmann, A., Kiss, B., Badri-Spröwitz, A., Renjewski, D. Scientific Reports, 15:2249, January 2025 (Published)
Ankle push-off is important for efficient, human-like walking, and many prosthetic devices mimic push-off using motors or elastic elements. The knee is extended throughout the stance phase and begins to buckle just before push-off, with timing being crucial. However, the exact mechanisms behind this buckling are still unclear. We use a predictive neuromuscular simulation to investigate whether active muscles are required for knee buckling and to what extent ground reaction forces (GRFs) drive it. In a systematic parameter search, we tested how long the knee muscles vastus (VAS), gastrocnemius (GAS), and hamstrings could be deactivated while maintaining a stable gait with impulsive push-off. VAS deactivation up to 35% of the gait cycle resulted in a dynamic gait with increased ankle peak power. GAS deactivation up to 20% of the gait cycle was detrimental to gait efficiency and showed reduced ankle peak power. At the start of knee buckling, the GRF vector is positioned near the knee joint’s neutral axis, assisting in knee flexion. However, this mechanism is likely not enough to drive knee flexion independently. Our findings contribute to the biomechanical understanding of ankle push-off, with applications in prosthetic and bipedal robotic design, and fundamental research on human gait mechanics.

Deep Models and Optimization Conference Paper Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture Movahedi, S., Orvieto, A., Moosavi-Dezfooli, S. In The Thirteenth International Conference on Learning Representations (ICLR 2025), January 2025 (Accepted)

Dynamic Locomotion Book Special issue on embodied intelligence-understanding animal locomotion and its robotic implementations Manoonpong, P., Badri-Spröwitz, A., Owaki, D. Advanced Robotics, 39:1-2, Taylor & Francis and RSJ, Milton, January 2025 (Published)
‘Embodied Intelligence (EI)’ refers to the innate ability of animals to utilize their body structures and interact with their environment (morphological computation) in conjunction with their brain and nervous systems (neural computation). This synergy enables them to achieve flexible, versatile, and robust locomotion, and allows them to learn and perform complex tasks throughout their lives. In modern robotics, where artificial intelligence (AI) is the driver for transformative advancements, the harmonious and continuous dynamic interaction between neural computation (including control, memory, and plasticity), the physical (flexible) body, and the environment – collectively referred to as ‘embodiment’ – remains a fundamental principle. Given that animals exhibit adaptive movement strategies across diverse real-world scenarios, understanding these strategies can pave the way for innovative robotic systems that reflect ‘nature intelligence’.

Materials Article Simultaneous Selective and Quantitative Sensing of Diclofenac and Metoprolol via Electrical Conductance of Two Polyelectrolyte Hydrogels Tsianaka, A., Fichtel, K., Tovar, G. E. M., Southan, A. Advanced Sensor Research, 4(3):2400141, January 2025 (Published)
Hydrogels containing functional groups are highly interesting for sensor applications as they can change their physical properties by interaction with their environment. In this study, it is demonstrated that by monitoring the conductance of two different functional hydrogels, the concentrations of two different drugs in aqueous solution can be selectively and quantitatively measured simultaneously based on non-specific interactions. Detailed characterization of the competitive drug adsorption on the hydrogels allows the description of both hydrogel conductances as a function of the drug concentrations based on physical models. The result is a system of non-linear equations that can be solved for the drug concentrations. The different affinities and conductance responses of the hydrogels for the two drugs are a prerequisite, which is usually achieved with different materials. This approach is demonstrated with hydrogels based on poly(ethylene glycol), functionalized with the ionic monomers [2-(acryloyloxy)ethyl] trimethylammonium chloride (AETA) and 3-sulfopropyl acrylate potassium salt (SPA), and the drugs diclofenac and metoprolol. The hydrogel conductance is found to be linear with drug concentration in the hydrogels, which in turn is described by a non-linear Langmuir-type competitive adsorption isotherm. The proposed approach thus shows potential for future studies on more complex mixtures by including a larger variety of functional hydrogels.

Robust Machine Learning Conference Paper Cross-Entropy Is All You Need To Invert the Data Generating Process Reizinger, P., Bizeul, A., Juhos, A., Vogt, J. E., Balestriero, R., Brendel, W., Klindt, D. In January 2025 (Published)

Robust Machine Learning Conference Paper In Search of Forgotten Domain Generalization Mayilvahanan, P., Zimmermann, R. S., Wiedemer, T., Rusak, E., Juhos, A., Bethge, M., Brendel, W. In January 2025 (Published)

Robust Machine Learning Conference Paper Interaction Asymmetry: A General Principle for Learning Composable Abstractions Brady, J., von Kügelgen, J., Lachapelle, S., Buchholz, S., Kipf, T., Brendel, W. In January 2025 (Published)

Social Foundations of Computation Conference Paper Lawma: The Power of Specialization for Legal Tasks Dominguez-Olmedo, R., Nanda, V., Abebe, R., Bechtold, S., Engel, C., Frankenreiter, J., Gummadi, K., Hardt, M., Livermore, M. The Thirteenth International Conference on Learning Representations (ICLR 2025), January 2025 (Accepted)
Annotation and classification of legal text are central components of empirical legal research. Traditionally, these tasks are often delegated to trained research assistants. Motivated by the advances in language modeling, empirical legal scholars are increasingly turning to prompting commercial models, hoping that it will alleviate the significant cost of human annotation. Despite growing use, our understanding of how to best utilize large language models for legal tasks remains limited. We conduct a comprehensive study of 260 legal text classification tasks, nearly all new to the machine learning community. Starting from GPT-4 as a baseline, we show that it has non-trivial but highly varied zero-shot accuracy, often exhibiting performance that may be insufficient for legal work. We then demonstrate that a lightly fine-tuned Llama 3 model vastly outperforms GPT-4 on almost all tasks, typically by double-digit percentage points. We find that larger models respond better to fine-tuning than smaller models. A few tens to hundreds of examples suffice to achieve high classification accuracy. Notably, we can fine-tune a single model on all 260 tasks simultaneously at a small loss in accuracy relative to having a separate model for each task. Our work points to a viable alternative to the predominant practice of prompting commercial models. For concrete legal tasks with some available labeled data, researchers are better off using a fine-tuned open-source model.

Social Foundations of Computation Conference Paper Limits to Scalable Evaluation at the Frontier: LLM as Judge Won’t Beat Twice the Data Dorner, F. E., Nastl, V. Y., Hardt, M. The Thirteenth International Conference on Learning Representations (ICLR 2025), January 2025 (Accepted)
High-quality annotations are increasingly a bottleneck in the explosively growing machine learning ecosystem. Scalable evaluation methods that avoid costly annotation have therefore become an important research ambition. Many hope to use strong existing models in lieu of costly labels to provide cheap model evaluations. Unfortunately, this method of using models as judges introduces biases, such as self-preferencing, that can distort model comparisons. An emerging family of debiasing tools promises to fix these issues by using a few high-quality labels to debias a large number of model judgments. In this paper, we study how far such debiasing methods, in principle, can go. Our main result shows that when the judge is no more accurate than the evaluated model, no debiasing method can decrease the required amount of ground truth labels by more than half. Our result speaks to the severe limitations of the LLM-as-a-judge paradigm at the evaluation frontier where the goal is to assess newly released models that are possibly better than the judge. Through an empirical evaluation, we demonstrate that the sample size savings achievable in practice are even more modest than what our theoretical limit suggests. Along the way, our work provides new observations about debiasing methods for model evaluation and points out promising avenues for future work.

Social Foundations of Computation Miscellaneous Training on the Test Task Confounds Evaluation and Emergence Dominguez-Olmedo, R., Dorner, F. E., Hardt, M. The Thirteenth International Conference on Learning Representations (ICLR 2025), January 2025 (Accepted)
We study a fundamental problem in the evaluation of large language models that we call training on the test task. Unlike wrongful practices like training on the test data, leakage, or data contamination, training on the test task is not malpractice. Rather, the term describes a growing set of techniques to include task-relevant data in the pretraining stage of a language model. We demonstrate that training on the test task confounds both relative model evaluations and claims about emergent capabilities. We argue that the seeming superiority of one model family over another may be explained by a different degree of training on the test task. To this end, we propose an effective method to adjust for training on the test task by fine-tuning each model under comparison on the same task-relevant data before evaluation. We then show that instances of emergent behavior largely vanish once we adjust for training on the test task. This also applies to reported instances of emergent behavior that cannot be explained by the choice of evaluation metric. Our work promotes a new perspective on the evaluation of large language models with broad implications for benchmarking and the study of emergent capabilities.

Perceiving Systems Conference Paper OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics Gozlan, Y., Falisse, A., Uhlrich, S., Gatti, A., Black, M., Chaudhari, A. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2025 (Published)
Pose estimation has promised to impact healthcare by enabling more practical methods to quantify nuances of human movement and biomechanics. However, despite the inherent connection between pose estimation and biomechanics, these disciplines have largely remained disparate. For example, most current pose estimation benchmarks use metrics such as Mean Per Joint Position Error, Percentage of Correct Keypoints, or mean Average Precision to assess performance, without quantifying kinematic and physiological correctness - key aspects for biomechanics. To alleviate this challenge, we develop OpenCapBench to offer an easy-to-use unified benchmark to assess common tasks in human pose estimation, evaluated under physiological constraints. OpenCapBench computes consistent kinematic metrics through joint angles provided by an open-source musculoskeletal modeling software (OpenSim). Through OpenCapBench, we demonstrate that current pose estimation models use keypoints that are too sparse for accurate biomechanics analysis. To mitigate this challenge, we introduce SynthPose, a new approach that enables finetuning of pre-trained 2D human pose models to predict an arbitrarily denser set of keypoints for accurate kinematic analysis through the use of synthetic data. Incorporating such finetuning on synthetic data of prior models leads to twofold reduced joint angle errors. Moreover, OpenCapBench allows users to benchmark their own developed models on our clinically relevant cohort. Overall, OpenCapBench bridges the computer vision and biomechanics communities, aiming to drive simultaneous advances in both areas.

Deep Models and Optimization Conference Paper Using Shapley interactions to understand how models use structure Divyansh Singhvi, D. M. A. E. R. J. I. P. N. S. In Proceedings ACL, 1-20, Vienna Center, Association for Computational Linguistics (ACL 2025), 2025 (Accepted)

Learning and Dynamical Systems Conference Paper Adversarial Training for Defense Against Label Poisoning Attacks Bal, M. I., Cevher, V., Muehlebach, M. In International Conference on Learning Representations, 2025 (Accepted)

Dynamic Locomotion Conference Paper Bird-inspired tendon coupling improves paddling efficiency by shortening phase transition times Lin, J., Zhao, G., Badri-Spröwitz, A. Proceedings of ICRA 2025, 6, arxiv, NY, ICRA, 2025 (Accepted)
Drag-based swimming with rowing appendages, fins, and webbed feet is a widely adapted locomotion form in aquatic animals. To develop effective underwater and swimming vehicles, a wide range of bioinspired drag-based paddles have been proposed, often faced with a trade-off between propulsive efficiency and versatility. Webbed feet provide an effective propulsive force in the power phase, are lightweight and robust, and can even be partially folded away in the recovery phase. However, during the transition between recovery and power phase, much time is lost folding and unfolding, leading to drag and reducing efficiency. In this work, we took inspiration from the coupling tendons of aquatic birds and utilized tendon coupling mechanisms to shorten the transition time between recovery and power phase. Results from our hardware experiments show that the proposed mechanisms improve propulsive efficiency by 2.0 and 2.4 times compared to a design without extensor tendons or one based on a passive paddle, respectively. We further report that distal leg joint clutching, which has been shown to improve efficiency in terrestrial walking, did not play a major role in swimming locomotion. In sum, we describe a new principle for an efficient, drag-based leg and paddle design, with potential relevance for the swimming mechanics in aquatic birds.

Neuromechanics of Movement Organizational Leadership and Diversity Article Building bridges: allyship as a catalyst for gender diversity and inclusion in experimental biology communities Schwaner, M. J., Keplinger, K. 2025 (Published)
Diversity drives innovation and creativity, directly contributing to scientific excellence. However, achieving equity in academia, including in experimental biology fields such as biomechanics and comparative physiology, remains a significant challenge, with women and other historically marginalized groups underrepresented, especially in more senior roles. When considering gender, the disparity is often linked to difficulties in balancing family responsibilities with demanding careers, along with lower ‘academic visibility’, as evidenced by fewer professional awards for women scientists. Many successful women who balance career and family keep their family lives private, making these aspects invisible to early career scholars, and thus depriving them of role models. To help close the gender gap, in this Perspective, we propose 10 actionable strategies for scholars at all career stages to promote gender diversity and inclusion through active allyship. Although we focus on gender diversity, these strategies can be broadly applied to harness the benefits of other diversity dimensions (e.g. age or ethnicity). We argue that embracing allyship benefits individual scientists, their research groups, the quality of their research, the broader research community and society at large by enhancing collective scientific output and inspiring the next generation of scientists.

Human Aspects of Machine Learning Article Causal fair metric: Bridging causality, individual fairness, and adversarial robustness Ehyaei, A. R., Farnadi, G., Samadi, S. Transactions on Machine Learning Research, 2025 (Accepted)

Organizational Leadership and Diversity Article Chatting Towards Inclusivity: A Digital Approach to Inclusion Action Plans and Leader Development Singh, V., Rivin, J. M., van Wagoner, H. P., Keplinger, K., Barbuto, J. 2025 (Published)
Inclusion is a cornerstone of success for organizations and society, yet inclusion is not guaranteed. Building on inclusive leadership research and relational models theory, we argue that inclusion cannot manifest without systematic effort and planning by leaders. Unfortunately, few resources exist to help leaders plan and enact specific inclusion behaviors. To address this, we introduce the “Leader Success Bot,” an innovative conversational chatbot designed to help leaders develop daily inclusion action plans. Through our immersive longitudinal design and mixed methods data, we advance the taxonomy of inclusive leader behaviors and test the impact of inclusion planning on leaders and followers. We demonstrate how equality matching is an overlooked relational model that is a pivotal relational dynamic for inclusion. Across two studies, our quantitative and qualitative findings show that equitable exchanges by leaders can foster a deeper sense of belonging and community. As leaders interact with the chatbot, both leaders and followers are more likely to accomplish their goals. Additionally, followers' inclusion climate and psychological safety benefited, leading to a decrease in turnover intentions. Our findings underscore the potential of chatbots to support inclusive leadership training and development by providing leaders with a structured, scalable platform for continuous reflection and growth. This research advances theoretical understanding of relational inclusion dynamics and offers practical insights and a scalable tool for HR managers seeking to build more inclusive, psychologically safe cultures.

Learning and Dynamical Systems Conference Paper Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering Kladny, K., Schölkopf, B., Muehlebach, M. In International Conference on Learning Representations, 2025 (Accepted)

Perceiving Systems Thesis Dynamic 3D Synthesis: From Video-Based Animatable Head Avatars to Text-Guided 4D Content Creation Zheng, Y. 2025 (Published)
The synthesis of 4D content—dynamic 3D content that evolves over time—has become increasingly important across a wide range of applications, including virtual communication, gaming, AR/VR, and digital content creation. Despite recent advances, generating realistic 4D content from accessible inputs remains a significant challenge. Existing approaches often rely on dense multi-camera capture systems, which are costly and impractical for everyday use, or yield results with limited geometric and visual fidelity. This thesis investigates two sub tasks in 4D content creation: (1) the reconstruction of high-fidelity, animatable head avatars from accessible inputs such as monocular RGB videos, and (2) the generation of dynamic 4D scenes from text prompts and optionally sparse visual input, such as reference images. These two directions are unified by a common goal—enabling controllable and high-quality 4D content creation from minimal visual supervision. The first part of this thesis presents IMavatar, a morphable implicit surface representation for reconstructing personalized head avatars from monocular videos. Implicit surfaces provide topological flexibility and can recover detailed 3D geometry directly from RGB images, making them well-suited for head avatar reconstruction. However, modeling expression- and pose-dependent deformations in an interpretable and generalizable way remains a major challenge when working with implicit representations. Inspired by 3D morphable models, IMavatar models deformation by learning expression blendshapes and skinning weight fields in a canonical space, enabling structured and generalizable control over novel expressions and poses. To enable end-to-end optimization from monocular videos, we propose a novel analytical gradient formulation that supports joint training of the geometry and deformation directly from RGB supervision. 
By combining the geometric fidelity of neural implicit fields with the controllability of morphable models, IMavatar achieves high-quality 4D reconstructions and strong generalization to unseen expressions and head poses. The second part of this thesis presents PointAvatar, a deformable point-based representation for animatable 3D head avatars. While implicit representations are effective at learning detailed geometry from image observations, they are inherently difficult to animate and computationally expensive to render. To address these limitations, this work explores point clouds as the underlying geometric representation for head avatars, offering the efficiency of explicit representations while avoiding the fixed-topology constraints of meshes. PointAvatar uses a canonical point cloud combined with learned blendshape and skinning weight fields, and further disentangles intrinsic albedo from view-dependent shading to support relighting under novel illumination. To improve training stability and reconstruction quality, we adopt a coarse-to-fine strategy that gradually increases point cloud resolution during learning. This enables the model to effectively capture accurate geometry and high-quality texture from monocular RGB videos, including challenging cases such as eyeglasses and complex hairstyles. Compared to IMavatar, PointAvatar achieves an 8× speed-up during training and a 100× speed-up during inference rendering, while maintaining high visual and geometric quality. In the final part, this thesis explores Dream-in-4D, a diffusion-guided framework for generating creative 4D content from natural language. The focus is on synthesizing imaginative 4D scenes from minimal visual input—either a single image or no visual input at all. To this end, the method leverages prior knowledge from pre-trained image and video diffusion models to optimize a 4D representation. Dream-in-4D follows a two-stage pipeline. 
In the first stage, a static 3D model is optimized as a neural radiance field using guidance from both image and 3D-aware diffusion models, resulting in high-quality, view-consistent assets. In the second stage, a time-dependent, multi-resolution deformation field is introduced to represent motion and is optimized using video diffusion guidance, equipping the static 3D asset with detailed and plausible motion driven by text prompts. The resulting system supports text-to-4D, image-to-4D, and personalized 4D generation within a unified framework, enabling intuitive and flexible dynamic scene synthesis from highly accessible inputs. Together, these methods address two essential aspects of 4D content creation: the reconstruction of animatable head avatars from monocular videos, and the generation of dynamic, imaginative 4D scenes from text and image prompts. We hope these contributions advance the field toward more accessible, controllable, and high-quality 4D content creation—enabling a broad range of applications across research, industry, and creative practice.

Robotic Composites and Compositions Article Emergent patterns of interaction with dynamic objects Aktaş, B., Myers, P., Salem, E., Klatzky, R., Howe, R. PLOS ONE, 20:e0331844, 2025 (Published)
Perception by touch is fundamentally linked to the motor system. A hallmark of this linkage takes the form of stereotyped haptic “exploratory procedures” [1], movement patterns that emerge when people set a perceptual goal such as judging the roughness of a textured surface. This paper expands the study of touch-directed movements by asking what patterns emerge when people encounter and interact with novel objects without explicitly specified goals. Participants were invited to freely interact with an art installation containing novel objects with distinct design features, intended to vary familiarity, structural affordance, and aesthetic response. Objects’ affordances were additionally varied over time by utilizing jamming, a physical mechanism that induces changes in stiffness and plasticity. From video recordings, four categories of spontaneous “interactive procedures” differentiated by underlying goals were reliably identified: passive observational, active perceptual, constructive, and hedonic. Perceptual actions were most frequent, indicating an overriding goal of acquiring information about physical properties. The prevalence of other interactive procedures varied across objects, demonstrating the influence of perceptual affordances and prior knowledge. Changes in state further moderated interactions, such that interactions were longer in the stiff/jammed state, and the occurrence of a state change during an interactive procedure lengthened its duration. These findings extend our understanding of haptic exploration beyond explicitly goal-directed contexts, revealing how spontaneous responses in complex and dynamic environments are linked to perceptual outcomes and prior knowledge.

Empirical Inference Technical Report International AI Safety Report Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel, B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika, M., Michael, J., Newman, J., et al. (DSIT 2025/001), 2025 (Published)

Perceiving Systems Conference Paper Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions Prinzler, M., Zakharov, E., Sklyarova, V., Kabadayi, B., Thies, J. In International Conference on 3D Vision (3DV), 2025 (Published)
We introduce Joker, a new method for the conditional synthesis of 3D human heads with extreme expressions. Given a single reference image of a person, we synthesize a volumetric human head with the reference’s identity and a new expression. We offer control over the expression via a 3D morphable model (3DMM) and textual inputs. This multi-modal conditioning signal is essential since 3DMMs alone fail to define subtle emotional changes and extreme expressions, including those involving the mouth cavity and tongue articulation. Our method is built upon a 2D diffusion-based prior that generalizes well to out-of-domain samples, such as sculptures, heavy makeup, and paintings while achieving high levels of expressiveness. To improve view consistency, we propose a new 3D distillation technique that converts predictions of our 2D prior into a neural radiance field (NeRF). Both the 2D prior and our distillation technique produce state-of-the-art results, which are confirmed by our extensive evaluations. Also, to the best of our knowledge, our method is the first to achieve view-consistent extreme tongue articulation.

Physical Intelligence Article Magnetoelectric film for wireless low-frequency neuromodulation Aydin, A., Jahanshahi, A., Esmaeili-Dokht, P., Han, M., Gardi, G., Temel, Y., Sitti, M. Brain Stimulation: Basic, Translational, and Clinical Research in Neuromodulation, 18:284, 2025 (Published)
Wireless neuromodulation techniques are widely investigated to address the challenges associated with conventional neurostimulation devices. Previous research has relied on ultrasound, light and magnetic fields as the modalities for remotely powering neuronal implants. Use of magnetic fields has been promising for wireless neuronal interfaces since they have excellent tissue penetration. Magnetically powered devices typically work with >100 kHz electromagnetic fields; therefore, they are heavily dependent on the on-board electronics to regulate the output signal. Moreover, use of such high frequencies is a limiting factor for safe use, especially in deeper areas due to tissue absorption. The magnetoelectric (ME) approach is a promising method that stems from magneto-electrical coupling. It is a high-throughput approach for power delivery through magnetic fields in low-frequency regimes compared to far-field or inductive coupling. In this study, we aim to understand how the ME approach can be used to modulate neuronal behavior in non-resonant frequency regimes. We fabricated ME planar films by laminating magnetostrictive and piezoelectric components. We initially defined the output electrical potential as the main design parameter and subsequently optimized the device geometry and applied magnetic field profile to achieve the best possible performance. We were able to observe a current density of ∼4-6 μA/cm² in a phosphate-buffered saline environment under a 10 Hz input magnetic field. Lastly, we investigated the neuromodulation potential of the ME films in-vitro through calcium imaging studies. Our preliminary results show that primary hippocampal neurons have significantly increased calcium influx during stimulation compared to the pre-stimulation phase. Stimulation efficiency was further investigated with changing stimulation duration and input magnetic field waveforms. 
Overall, these results show that ME films are promising candidates of neuronal interfaces for wireless electrical modulation. Future work will be conducted to understand exact mechanisms of neuromodulation and design such interfaces in an implantable miniature form for in-vivo studies.
DOI URL BibTeX

Empirical Inference Book Chapter Natural Language Processing Jin, Z., Mihalcea, R., Schölkopf, B. In Elgar Encyclopedia of Political Communication, (Editors: Nai, A. and Grömping, M. and Wirz, D.), Edward Elgar Publishing, 2025 (Published) PDF URL BibTeX

Social Foundations of Computation Book The Emerging Science of Machine Learning Benchmarks Hardt, M. 2025 (Published)
Machine learning turns on one simple trick: Split the data into training and test sets. Anything goes on the training set. Rank models on the test set and let model builders compete. Call it a benchmark. Machine learning researchers cherish a good tradition of lamenting the apparent shortcomings of benchmarks. Critics argue that static test sets and metrics promote narrow research objectives, stifling more creative scientific pursuits. Benchmarks also incentivize gaming; in fact, Goodhart's Law cautions against applying competitive pressure to statistical measurement. Over time, researchers may overfit to benchmarks, building models that exploit data artifacts. As a result, test set performance draws a skewed picture of model capabilities that deceives us—especially when comparing humans and machines. To top off the list of issues, there are a slew of reasons why things don't transfer well from benchmarks to the real world.
Website URL BibTeX
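The split-train-rank protocol described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not material from the book: the synthetic data, the two toy models, and every name (`split`, `fit_mean`, `fit_line`, `leaderboard`) are hypothetical.

```python
import random

def split(data, test_fraction, rng):
    """Shuffle a copy of the data and cut it into train and test parts."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_mean(train):
    """Toy model 1 (assumption): always predict the mean training label."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Toy model 2 (assumption): least-squares line through the origin."""
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: slope * x

def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Synthetic data (assumption): y = 2x plus small Gaussian noise.
rng = random.Random(0)
data = [(x, 2 * x + rng.gauss(0, 0.1)) for x in (rng.random() for _ in range(200))]

# Anything goes on the training set; models are ranked on the test set only.
train, test = split(data, test_fraction=0.25, rng=rng)
builders = {"mean": fit_mean, "line": fit_line}
leaderboard = sorted(
    ((name, mse(fit(train), test)) for name, fit in builders.items()),
    key=lambda entry: entry[1],
)

for rank, (name, error) in enumerate(leaderboard, start=1):
    print(rank, name, round(error, 4))
```

Reusing the same test set across many rounds of model selection is where the overfitting concerns raised in the abstract come in; this sketch runs a single round only.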

Organizational Leadership and Diversity Article Navigating AI Convergence in Human–Artificial Intelligence Teams: A Signaling Theory Approach Smith, A., Van Wagoner, P., Keplinger, K., Celebi, C. Journal of Organizational Behavior, 10.1002/job.2856, December 2024 (Published)
Teams that combine human intelligence with artificial intelligence (AI) have become indispensable for solving complex tasks in various decision-making contexts in modern organizations. However, the factors that contribute to AI convergence, where human team members align their decisions with those of their AI counterparts, still remain unclear. This study integrates signaling theory with self-determination theory to investigate how specific signals—such as signal fit, optional AI advice, and signal set congruence—affect employees' AI convergence in human–AI teams. Based on four experimental studies conducted in facial recognition and hiring contexts with approximately 1100 participants, the findings highlight the significant positive impact of congruent signals from both human and AI team members on AI convergence. Moreover, providing an option for employees to solicit AI advice also enhances AI convergence; when AI signals are chosen by employees rather than forced upon them, participants are more likely to accept AI advice. This research advances knowledge on human–AI teaming by (1) expanding signaling theory into the human–AI team context; (2) developing a deeper understanding of AI convergence and its drivers in human–AI teams; (3) providing actionable insights for designing teams and tasks to optimize decision-making in high-stakes, uncertain environments; and (4) introducing facial recognition as an innovative context for human–AI teaming.
DOI URL BibTeX

Perceiving Systems Book Chapter ElephantBook: Participatory Human–AI Elephant Population Monitoring Kulits, P., Wall, J., Beery, S. In Collaborative Intelligence: How Humans and AI Are Transforming Our World, 173-196, 7, (Editors: Lane, Mira and Sethumadhavan, Arathi), The MIT Press, Cambridge, Massachusetts, December 2024 (Published) URL BibTeX

Safety- and Efficiency- aligned Learning Conference Paper Efficiently Dispatching Flash Attention For Partially Filled Attention Masks Sharma, A., Geiping, J. In ENSLP NeurIPS Workshop 2024, December 2024 (Published)
Transformers are widely used across various applications, many of which yield sparse or partially filled attention matrices. Examples include attention masks designed to reduce the quadratic complexity of attention, sequence packing techniques, and recent innovations like tree masking for fast validation in MEDUSA. Despite the inherent sparsity in these matrices, the state-of-the-art algorithm Flash Attention still processes them with quadratic complexity as though they were dense. In this paper, we introduce Binary Block Masking, a highly efficient modification that enhances Flash Attention by making it mask-aware. We further propose two optimizations: one tailored for masks with contiguous non-zero patterns and another for extremely sparse masks. Our experiments on attention masks derived from real-world scenarios demonstrate up to a 9x runtime improvement. The implementation will be publicly released to foster further research and application.
URL BibTeX
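The abstract above does not spell out the algorithm, so the following is only a rough sketch of the general idea behind a mask-aware, block-level dispatch: reduce a dense attention mask to a coarse binary grid with one bit per tile, then visit only the tiles that contain at least one non-zero entry. It is not the authors' implementation, it involves no GPU kernel, and all names (`block_mask`, `active_tiles`, the tile size) are assumptions chosen for illustration.

```python
def block_mask(mask, block):
    """Reduce a dense 0/1 mask to a binary grid with one entry per tile.

    grid[i][j] is True when the (block x block) tile at block-row i and
    block-column j contains at least one non-zero mask entry.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    grid_rows = (n_rows + block - 1) // block  # ceiling division
    grid_cols = (n_cols + block - 1) // block
    grid = [[False] * grid_cols for _ in range(grid_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c]:
                grid[r // block][c // block] = True
    return grid

def active_tiles(grid):
    """List the (block_row, block_col) tiles a tiled kernel would visit."""
    return [(i, j) for i, row in enumerate(grid) for j, on in enumerate(row) if on]

# Example (assumption): an 8x8 causal mask split into 4x4 tiles.
n = 8
causal = [[1 if c <= r else 0 for c in range(n)] for r in range(n)]
grid = block_mask(causal, block=4)
tiles = active_tiles(grid)

total = len(grid) * len(grid[0])
print(f"{len(tiles)} of {total} tiles need processing: {tiles}")
```

In this toy case the upper-right tile is entirely zero and can be skipped; the sparser the mask, the larger the fraction of tiles that drop out.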

Robotic Materials Organizational Leadership and Diversity Article Accelerating the pace of innovation in robotics by fostering diversity and inclusive leadership Macari, D., Fratzl, A., Keplinger, K., Keplinger, C. Science Robotics, 9, December 2024 (Published)
Diverse and inclusive teams are not merely a moral imperative but also a catalyst for scientific excellence in robotics. Drawing from literature, a comprehensive citation analysis, and expert interviews, we derive seven main benefits of diversity and inclusion and propose a leadership guide for roboticists to reap these benefits.
DOI URL BibTeX

Perceiving Systems Conference Paper MotionFix: Text-Driven 3D Human Motion Editing Athanasiou, N., Cseke, A., Diomataris, M., Black, M. J., Varol, G. In SIGGRAPH Asia 2024 Conference Proceedings, ACM, SIGGRAPH Asia, December 2024 (Published)
The focus of this paper is 3D motion editing. Given a 3D human motion and a textual description of the desired modification, our goal is to generate an edited motion as described by the text. The challenges include the lack of training data and the design of a model that faithfully edits the source motion. In this paper, we address both these challenges. We build a methodology to semi-automatically collect a dataset of triplets in the form of (i) a source motion, (ii) a target motion, and (iii) an edit text, and create the new dataset. Having access to such data allows us to train a conditional diffusion model that takes both the source motion and the edit text as input. We further build various baselines trained only on text-motion pairs datasets and show superior performance of our model trained on triplets. We introduce new retrieval-based metrics for motion editing and establish a new benchmark on the evaluation set. Our results are encouraging, paving the way for further research on fine-grained motion generation. Code and models will be made publicly available.
Code (GitHub) Website Data Exploration ArXiv URL BibTeX

Empirical Inference Conference Paper From Causal to Concept-Based Representation Learning Rajendran*, G., Buchholz*, S., Aragam, B., Schölkopf, B., Ravikumar, P. K. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:101250-101296, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Learning Partitions from Context Buchholz, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:140066-140112, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving Didolkar, A. R., Goyal, A., Ke, N. R., Guo, S., Valko, M., Lillicrap, T. P., Rezende, D. J., Bengio, Y., Mozer, M. C., Arora, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:19783-19812, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper A Generative Model of Symmetry Transformations Allingham, J. U., Mlodozeniec, B. K., Padhy, S., Antorán, J., Krueger, D., Turner, R. E., Nalisnick, E., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:91091-91130, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Article A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions Rastogi, C., Song, X., Jin, Z., Stelmakh, I., Daumé III, H., Zhang, K., Shah, N. B. PLOS ONE, 19(12), Public Library of Science, December 2024 (Published) DOI URL BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists Baumann, J., Mendler-Dünner, C. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
We investigate algorithmic collective action in transformer-based recommender systems. Our use case is a collective of fans aiming to promote the visibility of an artist by strategically placing one of their songs in the existing playlists they control. The success of the collective is measured by the increase in test-time recommendations of the targeted song. We introduce two easily implementable strategies towards this goal and test their efficacy on a publicly available recommender system model released by a major music streaming platform. Our findings reveal that even small collectives (controlling less than 0.01% of the training data) can achieve up to 25x amplification of recommendations by strategically choosing the position at which to insert the song. We then focus on investigating the externalities of the strategy. We find that the performance loss for the platform is negligible, and the recommendations of other songs are largely preserved, minimally impairing the user experience of participants. Moreover, the costs are evenly distributed among other artists. Taken together, our findings demonstrate how collective action strategies can be effective while not necessarily being adversarial, raising new questions around incentives, social dynamics, and equilibria in recommender systems.
arXiv URL BibTeX

Empirical Inference Conference Paper Alien Recombination: Exploring Concept Blends Beyond Human Cognitive Availability in Visual Art Hernandez, A., Brinkmann, L., Serna, I., Rahaman, N., Alhaija, H. A., Yakura, H., Sola, M. C., Schölkopf, B., Rahwan, I. NeurIPS 2024 Workshop on Creativity and Generative AI, December 2024 (Published) arXiv BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper An Engine Not a Camera: Measuring Performative Power of Online Search Mendler-Dünner, C., Carovano, G., Hardt, M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
The power of digital platforms is at the center of major ongoing policy and regulatory efforts. To advance existing debates, we designed and executed an experiment to measure the power of online search providers, building on the recent definition of performative power. Instantiated in our setting, performative power quantifies the ability of a search engine to steer web traffic by rearranging results. To operationalize this definition we developed a browser extension that performs unassuming randomized experiments in the background. These randomized experiments emulate updates to the search algorithm and identify the causal effect of different content arrangements on clicks. We formally relate these causal effects to performative power. Analyzing tens of thousands of clicks, we discuss what our robust quantitative findings say about the power of online search engines. More broadly, we envision our work to serve as a blueprint for how performative power and online experiments can be integrated with future investigations into the economic power of digital platforms.
ArXiv URL BibTeX

Haptic Intelligence Ph.D. Thesis Capturing and Recognizing Multimodal Surface Interactions as Embedded High-Dimensional Distributions Khojasteh, B. University of Stuttgart, Stuttgart, Germany, December 2024, Faculty of Engineering Design, Production Engineering and Automotive Engineering (Published)
Exploring a surface with a handheld tool generates complex contact signals that uniquely encode the surface's properties: a needle hidden in a haystack of data. Humans naturally integrate visual, auditory, and haptic sensory data during these interactions to accurately assess and recognize surfaces. However, enabling artificial systems to perceive and recognize surfaces with human-like proficiency remains a significant challenge. The complexity and dimensionality of multimodal sensor data, particularly in the intricate and dynamic modality of touch, hinder effective sensing and processing. Successfully overcoming these challenges will open up new possibilities in applications such as quality control, material documentation, and robotics. This dissertation addresses these issues at the levels of both the sensing hardware and the processing algorithms by introducing an automated similarity framework for multimodal surface recognition, developing a haptic-auditory test bed for acquiring high-quality surface data, and exploring optimal sensing configurations to improve recognition performance and robustness.
BibTeX

Empirical Inference Conference Paper Causal vs. Anticausal merging of predictors Garrido Mejia, S., Blöbaum, P., Schölkopf, B., Janzing, D. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:1402-1427, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Causality for Natural Language Processing Jin, Z. University of Tübingen, Germany, December 2024, (ELLIS PhD student program) (Published) URL BibTeX

Empirical Inference Conference Paper Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias Chen*, Y., Vethavikashini*, C. R., Mattern*, J., Mihalcea, R., Jin, Z. NeurIPS 2024 Workshop on Causality and Language Models (CaLM), December 2024, *equal contribution (Published) DOI URL BibTeX