Publications

Organizational Leadership and Diversity Article Chatting Towards Inclusivity: A Digital Approach to Inclusion Action Plans and Leader Development Singh, V., Rivin, J. M., van Wagoner, H. P., Keplinger, K., Barbuto, J. 2025 (Published)
Inclusion is a cornerstone of success for organizations and society, yet inclusion is not guaranteed. Building on inclusive leadership research and relational models theory, we argue that inclusion cannot manifest without systematic effort and planning by leaders. Unfortunately, few resources exist to help leaders plan and enact specific inclusion behaviors. To address this, we introduce the “Leader Success Bot,” an innovative conversational chatbot designed to help leaders develop daily inclusion action plans. Through our immersive longitudinal design and mixed methods data, we advance the taxonomy of inclusive leader behaviors and test the impact of inclusion planning on leaders and followers. We demonstrate how equality matching is an overlooked relational model that is a pivotal relational dynamic for inclusion. Across two studies, our quantitative and qualitative findings show that equitable exchanges by leaders can foster a deeper sense of belonging and community. As leaders interact with the chatbot, both leaders and followers are more likely to accomplish their goals. Additionally, followers' inclusion climate and psychological safety benefited, leading to a decrease in turnover intentions. Our findings underscore the potential of chatbots to support inclusive leadership training and development by providing leaders with a structured, scalable platform for continuous reflection and growth. This research advances theoretical understanding of relational inclusion dynamics and offers practical insights and a scalable tool for HR managers seeking to build more inclusive, psychologically safe cultures.
DOI BibTeX

Learning and Dynamical Systems Conference Paper Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering Kladny, K., Schölkopf, B., Muehlebach, M. In International Conference on Learning Representations, 2025 (Accepted) URL BibTeX

Perceiving Systems Thesis Dynamic 3D Synthesis: From Video-Based Animatable Head Avatars to Text-Guided 4D Content Creation Zheng, Y. 2025 (Published)
The synthesis of 4D content (dynamic 3D content that evolves over time) has become increasingly important across a wide range of applications, including virtual communication, gaming, AR/VR, and digital content creation. Despite recent advances, generating realistic 4D content from accessible inputs remains a significant challenge. Existing approaches often rely on dense multi-camera capture systems, which are costly and impractical for everyday use, or yield results with limited geometric and visual fidelity. This thesis investigates two subtasks in 4D content creation: (1) the reconstruction of high-fidelity, animatable head avatars from accessible inputs such as monocular RGB videos, and (2) the generation of dynamic 4D scenes from text prompts and optionally sparse visual input, such as reference images. These two directions are unified by a common goal: enabling controllable and high-quality 4D content creation from minimal visual supervision. The first part of this thesis presents IMavatar, a morphable implicit surface representation for reconstructing personalized head avatars from monocular videos. Implicit surfaces provide topological flexibility and can recover detailed 3D geometry directly from RGB images, making them well-suited for head avatar reconstruction. However, modeling expression- and pose-dependent deformations in an interpretable and generalizable way remains a major challenge when working with implicit representations. Inspired by 3D morphable models, IMavatar models deformation by learning expression blendshapes and skinning weight fields in a canonical space, enabling structured and generalizable control over novel expressions and poses. To enable end-to-end optimization from monocular videos, we propose a novel analytical gradient formulation that supports joint training of the geometry and deformation directly from RGB supervision. 
By combining the geometric fidelity of neural implicit fields with the controllability of morphable models, IMavatar achieves high-quality 4D reconstructions and strong generalization to unseen expressions and head poses. The second part of this thesis presents PointAvatar, a deformable point-based representation for animatable 3D head avatars. While implicit representations are effective at learning detailed geometry from image observations, they are inherently difficult to animate and computationally expensive to render. To address these limitations, this work explores point clouds as the underlying geometric representation for head avatars, offering the efficiency of explicit representations while avoiding the fixed-topology constraints of meshes. PointAvatar uses a canonical point cloud combined with learned blendshape and skinning weight fields, and further disentangles intrinsic albedo from view-dependent shading to support relighting under novel illumination. To improve training stability and reconstruction quality, we adopt a coarse-to-fine strategy that gradually increases point cloud resolution during learning. This enables the model to effectively capture accurate geometry and high-quality texture from monocular RGB videos, including challenging cases such as eyeglasses and complex hairstyles. Compared to IMavatar, PointAvatar achieves an 8× speed-up during training and a 100× speed-up during inference rendering, while maintaining high visual and geometric quality. In the final part, this thesis explores Dream-in-4D, a diffusion-guided framework for generating creative 4D content from natural language. The focus is on synthesizing imaginative 4D scenes from minimal visual input—either a single image or no visual input at all. To this end, the method leverages prior knowledge from pre-trained image and video diffusion models to optimize a 4D representation. Dream-in-4D follows a two-stage pipeline. 
In the first stage, a static 3D model is optimized as a neural radiance field using guidance from both image and 3D-aware diffusion models, resulting in high-quality, view-consistent assets. In the second stage, a time-dependent, multi-resolution deformation field is introduced to represent motion and is optimized using video diffusion guidance, equipping the static 3D asset with detailed and plausible motion driven by text prompts. The resulting system supports text-to-4D, image-to-4D, and personalized 4D generation within a unified framework, enabling intuitive and flexible dynamic scene synthesis from highly accessible inputs. Together, these methods address two essential aspects of 4D content creation: the reconstruction of animatable head avatars from monocular videos, and the generation of dynamic, imaginative 4D scenes from text and image prompts. We hope these contributions advance the field toward more accessible, controllable, and high-quality 4D content creation—enabling a broad range of applications across research, industry, and creative practice.
DOI URL BibTeX
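
The abstract above mentions expression blendshapes and skinning weights defined in a canonical space. As a rough illustration of that general idea only, the following is a minimal NumPy sketch of textbook linear blend skinning with additive blendshape offsets. It is not the IMavatar or PointAvatar implementation: the function name `deform_points`, the array layouts, and the toy values are all assumptions made for this example, and in the thesis the blendshapes and weights are learned fields rather than fixed arrays.

```python
import numpy as np


def deform_points(points, blendshapes, expression, skin_weights, transforms):
    """Deform canonical points with additive blendshapes, then linear blend skinning.

    points:       (N, 3) canonical positions
    blendshapes:  (K, N, 3) per-point offsets for each of K expression bases
    expression:   (K,) expression coefficients
    skin_weights: (N, J) per-point bone weights, each row summing to 1
    transforms:   (J, 4, 4) homogeneous bone transforms
    """
    # Add the expression-weighted blendshape offsets in canonical space.
    shaped = points + np.tensordot(expression, blendshapes, axes=1)  # (N, 3)
    # Lift to homogeneous coordinates so translations can be applied.
    homo = np.concatenate([shaped, np.ones((len(shaped), 1))], axis=1)  # (N, 4)
    # Blend the bone transforms per point using the skinning weights.
    blended = np.einsum("nj,jab->nab", skin_weights, transforms)  # (N, 4, 4)
    # Apply each point's blended transform and drop the homogeneous coordinate.
    return np.einsum("nab,nb->na", blended, homo)[:, :3]


# Toy example (hypothetical values): two points, one blendshape, two bones.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
blendshapes = np.array([[[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]])
expression = np.array([0.5])
skin_weights = np.array([[1.0, 0.0], [0.5, 0.5]])
identity = np.eye(4)
shift_z = np.eye(4)
shift_z[2, 3] = 2.0  # second bone translates by 2 along z
transforms = np.stack([identity, shift_z])

deformed = deform_points(points, blendshapes, expression, skin_weights, transforms)
```

In this toy setup the first point moves only through its blendshape offset, while the second point follows half of the second bone's translation because its skinning weights are split evenly between the two bones.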

Robotic Composites and Compositions Article Emergent patterns of interaction with dynamic objects Aktaş, B., Myers, P., Salem, E., Klatzky, R., Howe, R. PLOS ONE, 20:e0331844, 2025 (Published)
Perception by touch is fundamentally linked to the motor system. A hallmark of this linkage takes the form of stereotyped haptic “exploratory procedures” [1], movement patterns that emerge when people set a perceptual goal such as judging the roughness of a textured surface. This paper expands the study of touch-directed movements by asking what patterns emerge when people encounter and interact with novel objects without explicitly specified goals. Participants were invited to freely interact with an art installation containing novel objects with distinct design features, intended to vary familiarity, structural affordance, and aesthetic response. Objects’ affordances were additionally varied over time by utilizing jamming, a physical mechanism that induces changes in stiffness and plasticity. From video recordings, four categories of spontaneous “interactive procedures” differentiated by underlying goals were reliably identified: passive observational, active perceptual, constructive, and hedonic. Perceptual actions were most frequent, indicating an overriding goal of acquiring information about physical properties. The prevalence of other interactive procedures varied across objects, demonstrating the influence of perceptual affordances and prior knowledge. Changes in state further moderated interactions, such that interactions were longer in the stiff/jammed state, and the occurrence of a state change during an interactive procedure lengthened its duration. These findings extend our understanding of haptic exploration beyond explicitly goal-directed contexts, revealing how spontaneous responses in complex and dynamic environments are linked to perceptual outcomes and prior knowledge.
DOI URL BibTeX

Empirical Inference Technical Report International AI Safety Report Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel, B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika, M., Michael, J., Newman, J., et al. (DSIT 2025/001), 2025 (Published) URL BibTeX

Perceiving Systems Conference Paper Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions Prinzler, M., Zakharov, E., Sklyarova, V., Kabadayi, B., Thies, J. In International Conference on 3D Vision (3DV), 2025 (Published)
We introduce Joker, a new method for the conditional synthesis of 3D human heads with extreme expressions. Given a single reference image of a person, we synthesize a volumetric human head with the reference’s identity and a new expression. We offer control over the expression via a 3D morphable model (3DMM) and textual inputs. This multi-modal conditioning signal is essential since 3DMMs alone fail to define subtle emotional changes and extreme expressions, including those involving the mouth cavity and tongue articulation. Our method is built upon a 2D diffusion-based prior that generalizes well to out-of-domain samples, such as sculptures, heavy makeup, and paintings while achieving high levels of expressiveness. To improve view consistency, we propose a new 3D distillation technique that converts predictions of our 2D prior into a neural radiance field (NeRF). Both the 2D prior and our distillation technique produce state-of-the-art results, which are confirmed by our extensive evaluations. Also, to the best of our knowledge, our method is the first to achieve view-consistent extreme tongue articulation.
project page arxiv BibTeX

Physical Intelligence Article Magnetoelectric film for wireless low-frequency neuromodulation Aydin, A., Jahanshahi, A., Esmaeili-Dokht, P., Han, M., Gardi, G., Temel, Y., Sitti, M. Brain Stimulation: Basic, Translational, and Clinical Research in Neuromodulation, 18:284, 2025 (Published)
Wireless neuromodulation techniques are widely investigated to address the challenges associated with conventional neurostimulation devices. Previous research has relied on ultrasound, light, and magnetic fields as the modalities for remotely powering neuronal implants. The use of magnetic fields has been promising for wireless neuronal interfaces since they have excellent tissue penetration. Magnetically powered devices typically work with >100 kHz electromagnetic fields; therefore, they are heavily dependent on the on-board electronics to regulate the output signal. Moreover, the use of such high frequencies is a limiting factor for safe use, especially in deeper areas, due to tissue absorption. The magnetoelectric (ME) approach is a promising method that stems from magneto-electrical coupling. It is a high-throughput approach for power delivery through magnetic fields in low-frequency regimes compared to far-field or inductive coupling. In this study, we aim to understand how the ME approach can be used to modulate neuronal behavior in non-resonant frequency regimes. We fabricated ME planar films by laminating magnetostrictive and piezoelectric components. We initially defined the output electrical potential as the main design parameter and subsequently optimized the device geometry and applied magnetic field profile to achieve the best possible performance. We observed a current density of ∼4-6 μA/cm² in a phosphate-buffered saline environment under a 10 Hz input magnetic field. Lastly, we investigated the neuromodulation potential of the ME films in vitro through calcium imaging studies. Our preliminary results show that primary hippocampal neurons have significantly increased calcium influx during stimulation compared to the pre-stimulation phase. Stimulation efficiency was further investigated with changing stimulation duration and input magnetic field waveforms. 
Overall, these results show that ME films are promising candidates for neuronal interfaces for wireless electrical modulation. Future work will be conducted to understand the exact mechanisms of neuromodulation and to design such interfaces in an implantable miniature form for in vivo studies.
DOI URL BibTeX

Empirical Inference Book Chapter Natural Language Processing Jin, Z., Mihalcea, R., Schölkopf, B. In Elgar Encyclopedia of Political Communication, (Editors: Nai, A. and Grömping, M. and Wirz, D.), Edward Elgar Publishing, 2025 (Published) PDF URL BibTeX

Social Foundations of Computation Book The Emerging Science of Machine Learning Benchmarks Hardt, M. 2025 (Published)
Machine learning turns on one simple trick: Split the data into training and test sets. Anything goes on the training set. Rank models on the test set and let model builders compete. Call it a benchmark. Machine learning researchers cherish a good tradition of lamenting the apparent shortcomings of benchmarks. Critics argue that static test sets and metrics promote narrow research objectives, stifling more creative scientific pursuits. Benchmarks also incentivize gaming; in fact, Goodhart's Law cautions against applying competitive pressure to statistical measurement. Over time, researchers may overfit to benchmarks, building models that exploit data artifacts. As a result, test set performance draws a skewed picture of model capabilities that deceives us—especially when comparing humans and machines. To top off the list of issues, there are a slew of reasons why things don't transfer well from benchmarks to the real world.
Website URL BibTeX

Organizational Leadership and Diversity Article Navigating AI Convergence in Human–Artificial Intelligence Teams: A Signaling Theory Approach Smith, A., Van Wagoner, P., Keplinger, K., Celebi, C. Journal of Organizational Behavior, 10.1002/job.2856, December 2024 (Published)
Teams that combine human intelligence with artificial intelligence (AI) have become indispensable for solving complex tasks in various decision-making contexts in modern organizations. However, the factors that contribute to AI convergence, where human team members align their decisions with those of their AI counterparts, still remain unclear. This study integrates signaling theory with self-determination theory to investigate how specific signals—such as signal fit, optional AI advice, and signal set congruence—affect employees' AI convergence in human–AI teams. Based on four experimental studies conducted in facial recognition and hiring contexts with approximately 1100 participants, the findings highlight the significant positive impact of congruent signals from both human and AI team members on AI convergence. Moreover, providing an option for employees to solicit AI advice also enhances AI convergence; when AI signals are chosen by employees rather than forced upon them, participants are more likely to accept AI advice. This research advances knowledge on human–AI teaming by (1) expanding signaling theory into the human–AI team context; (2) developing a deeper understanding of AI convergence and its drivers in human–AI teams; (3) providing actionable insights for designing teams and tasks to optimize decision-making in high-stakes, uncertain environments; and (4) introducing facial recognition as an innovative context for human–AI teaming.
DOI URL BibTeX

Perceiving Systems Book Chapter ElephantBook: Participatory Human–AI Elephant Population Monitoring Kulits, P., Wall, J., Beery, S. In Collaborative Intelligence: How Humans and AI Are Transforming Our World, 173-196, 7, (Editors: Lane, Mira and Sethumadhavan, Arathi), The MIT Press, Cambridge, Massachusetts, December 2024 (Published) URL BibTeX

Safety- and Efficiency- aligned Learning Conference Paper Efficiently Dispatching Flash Attention For Partially Filled Attention Masks Sharma, A., Geiping, J. In ENSLP NeurIPS Workshop 2024, December 2024 (Published)
Transformers are widely used across various applications, many of which yield sparse or partially filled attention matrices. Examples include attention masks designed to reduce the quadratic complexity of attention, sequence packing techniques, and recent innovations like tree masking for fast validation in MEDUSA. Despite the inherent sparsity in these matrices, the state-of-the-art algorithm Flash Attention still processes them with quadratic complexity as though they were dense. In this paper, we introduce Binary Block Masking, a highly efficient modification that enhances Flash Attention by making it mask-aware. We further propose two optimizations: one tailored for masks with contiguous non-zero patterns and another for extremely sparse masks. Our experiments on attention masks derived from real-world scenarios demonstrate up to a 9x runtime improvement. The implementation will be publicly released to foster further research and application.
URL BibTeX
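
The abstract above describes making attention mask-aware at the block level. As a rough illustration of that general idea, the following NumPy sketch reduces an attention mask to one boolean per tile and then skips key tiles that are entirely masked out. It is not the paper's Binary Block Masking kernel and not Flash Attention: the function names, the block size, and the assumption that every query row attends to at least one key are choices made for this example.

```python
import numpy as np


def binary_block_mask(mask: np.ndarray, block: int) -> np.ndarray:
    """Reduce a boolean (n_q, n_k) mask to one flag per (block x block) tile.

    A flag is True when the tile contains at least one allowed position.
    """
    n_q, n_k = mask.shape
    assert n_q % block == 0 and n_k % block == 0, "sizes must be multiples of block"
    tiles = mask.reshape(n_q // block, block, n_k // block, block)
    return tiles.any(axis=(1, 3))


def dense_attention(q, k, v, mask):
    """Reference masked softmax attention over all keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def block_skipping_attention(q, k, v, mask, block):
    """Masked attention that only touches key tiles flagged in the block mask."""
    flags = binary_block_mask(mask, block)
    out = np.zeros((q.shape[0], v.shape[1]))
    for i in range(flags.shape[0]):
        rows = slice(i * block, (i + 1) * block)
        # Key indices belonging to tiles with at least one allowed entry.
        cols = np.flatnonzero(np.repeat(flags[i], block))
        out[rows] = dense_attention(q[rows], k[cols], v[cols], mask[rows][:, cols])
    return out


# Toy example: a causal mask, where the upper-right tile is empty.
rng = np.random.default_rng(0)
n, d, block = 8, 4, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
causal = np.tril(np.ones((n, n), dtype=bool))

flags = binary_block_mask(causal, block)
skipped = block_skipping_attention(q, k, v, causal, block)
reference = dense_attention(q, k, v, causal)
```

Because fully masked tiles contribute zero weight to the softmax, skipping them leaves the output unchanged while reducing the number of tiles processed; here one of the four tiles is skipped.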

Robotic Materials Organizational Leadership and Diversity Article Accelerating the pace of innovation in robotics by fostering diversity and inclusive leadership Macari, D., Fratzl, A., Keplinger, K., Keplinger, C. Science Robotics, 9, December 2024 (Published)
Diverse and inclusive teams are not merely a moral imperative but also a catalyst for scientific excellence in robotics. Drawing from literature, a comprehensive citation analysis, and expert interviews, we derive seven main benefits of diversity and inclusion and propose a leadership guide for roboticists to reap these benefits.
DOI URL BibTeX

Perceiving Systems Conference Paper MotionFix: Text-Driven 3D Human Motion Editing Athanasiou, N., Cseke, A., Diomataris, M., Black, M. J., Varol, G. In SIGGRAPH Asia 2024 Conference Proceedings, ACM, SIGGRAPH Asia, December 2024 (Published)
The focus of this paper is 3D motion editing. Given a 3D human motion and a textual description of the desired modification, our goal is to generate an edited motion as described by the text. The challenges include the lack of training data and the design of a model that faithfully edits the source motion. In this paper, we address both these challenges. We build a methodology to semi-automatically collect a dataset of triplets in the form of (i) a source motion, (ii) a target motion, and (iii) an edit text, and create the new dataset. Having access to such data allows us to train a conditional diffusion model that takes both the source motion and the edit text as input. We further build various baselines trained only on text-motion pairs datasets and show superior performance of our model trained on triplets. We introduce new retrieval-based metrics for motion editing and establish a new benchmark on the evaluation set. Our results are encouraging, paving the way for further research on fine-grained motion generation. Code and models will be made publicly available.
Code (GitHub) Website Data Exploration ArXiv URL BibTeX

Empirical Inference Conference Paper From Causal to Concept-Based Representation Learning Rajendran*, G., Buchholz*, S., Aragam, B., Schölkopf, B., Ravikumar, P. K. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:101250-101296, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Learning Partitions from Context Buchholz, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:140066-140112, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving Didolkar, A. R., Goyal, A., Ke, N. R., Guo, S., Valko, M., Lillicrap, T. P., Rezende, D. J., Bengio, Y., Mozer, M. C., Arora, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:19783-19812, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper A Generative Model of Symmetry Transformations Allingham, J. U., Mlodozeniec, B. K., Padhy, S., Antorán, J., Krueger, D., Turner, R. E., Nalisnick, E., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:91091-91130, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Article A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions Rastogi, C., Song, X., Jin, Z., Stelmakh, I., Daumé III, H., Zhang, K., Shah, N. B. PLOS ONE, 19(12), Public Library of Science, December 2024 (Published) DOI URL BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists Baumann, J., Mendler-Dünner, C. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
We investigate algorithmic collective action in transformer-based recommender systems. Our use case is a collective of fans aiming to promote the visibility of an artist by strategically placing one of their songs in the existing playlists they control. The success of the collective is measured by the increase in test-time recommendations of the targeted song. We introduce two easily implementable strategies towards this goal and test their efficacy on a publicly available recommender system model released by a major music streaming platform. Our findings reveal that even small collectives (controlling less than 0.01% of the training data) can achieve up to 25x amplification of recommendations by strategically choosing the position at which to insert the song. We then focus on investigating the externalities of the strategy. We find that the performance loss for the platform is negligible, and the recommendations of other songs are largely preserved, minimally impairing the user experience of participants. Moreover, the costs are evenly distributed among other artists. Taken together, our findings demonstrate how collective action strategies can be effective while not necessarily being adversarial, raising new questions around incentives, social dynamics, and equilibria in recommender systems.
arXiv URL BibTeX

Empirical Inference Conference Paper Alien Recombination: Exploring Concept Blends Beyond Human Cognitive Availability in Visual Art Hernandez, A., Brinkmann, L., Serna, I., Rahaman, N., Alhaija, H. A., Yakura, H., Sola, M. C., Schölkopf, B., Rahwan, I. NeurIPS 2024 Workshop on Creativity and Generative AI, December 2024 (Published) arXiv BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper An Engine Not a Camera: Measuring Performative Power of Online Search Mendler-Dünner, C., Carovano, G., Hardt, M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
The power of digital platforms is at the center of major ongoing policy and regulatory efforts. To advance existing debates, we designed and executed an experiment to measure the power of online search providers, building on the recent definition of performative power. Instantiated in our setting, performative power quantifies the ability of a search engine to steer web traffic by rearranging results. To operationalize this definition we developed a browser extension that performs unassuming randomized experiments in the background. These randomized experiments emulate updates to the search algorithm and identify the causal effect of different content arrangements on clicks. We formally relate these causal effects to performative power. Analyzing tens of thousands of clicks, we discuss what our robust quantitative findings say about the power of online search engines. More broadly, we envision our work to serve as a blueprint for how performative power and online experiments can be integrated with future investigations into the economic power of digital platforms.
ArXiv URL BibTeX

Haptic Intelligence Ph.D. Thesis Capturing and Recognizing Multimodal Surface Interactions as Embedded High-Dimensional Distributions Khojasteh, B. University of Stuttgart, Stuttgart, Germany, December 2024, Faculty of Engineering Design, Production Engineering and Automotive Engineering (Published)
Exploring a surface with a handheld tool generates complex contact signals that uniquely encode the surface's properties: a needle hidden in a haystack of data. Humans naturally integrate visual, auditory, and haptic sensory data during these interactions to accurately assess and recognize surfaces. However, enabling artificial systems to perceive and recognize surfaces with human-like proficiency remains a significant challenge. The complexity and dimensionality of multimodal sensor data, particularly in the intricate and dynamic modality of touch, hinder effective sensing and processing. Successfully overcoming these challenges will open up new possibilities in applications such as quality control, material documentation, and robotics. This dissertation addresses these issues at the levels of both the sensing hardware and the processing algorithms by introducing an automated similarity framework for multimodal surface recognition, developing a haptic-auditory test bed for acquiring high-quality surface data, and exploring optimal sensing configurations to improve recognition performance and robustness.
BibTeX

Empirical Inference Conference Paper Causal vs. Anticausal merging of predictors Garrido Mejia, S., Blöbaum, P., Schölkopf, B., Janzing, D. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:1402-1427, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Causality for Natural Language Processing Jin, Z. University of Tübingen, Germany, December 2024, (ELLIS PhD student program) (Published) URL BibTeX

Empirical Inference Conference Paper Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias Chen*, Y., Vethavikashini*, C. R., Mattern*, J., Mihalcea, R., Jin, Z. NeurIPS 2024 Workshop on Causality and Language Models (CaLM), December 2024, *equal contribution (Published) DOI URL BibTeX

Empirical Inference Conference Paper Cooperate or Collapse: Emergence of Sustainability in a Society of LLM Agents Piatti*, G., Jin*, Z., Kleiman-Weiner*, M., Schölkopf, B., Sachan, M., Mihalcea, R. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:111715-111759, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal contribution (Published) arXiv URL BibTeX

Haptic Intelligence Robotic Materials Article Cutaneous Electrohydraulic (CUTE) Wearable Devices for Pleasant Broad-Bandwidth Haptic Cues Sanchez-Tamayo, N., Yoder, Z., Rothemund, P., Ballardini, G., Keplinger, C., Kuchenbecker, K. J. Advanced Science, 11(48):2402461, December 2024, This article was selected for the inside front cover. https://doi.org/10.1002/advs.202470295 (Published)
By focusing on vibrations, current wearable haptic devices underutilize the skin's perceptual capabilities. Devices that provide richer haptic stimuli, including contact feedback and/or variable pressure, are typically heavy and bulky due to the underlying actuator technology and the low sensitivity of hairy skin, which covers most of the body. This paper presents a system architecture for compact wearable devices that deliver salient and pleasant broad-bandwidth haptic cues: Cutaneous Electrohydraulic (CUTE) devices combine a custom materials design for soft haptic electrohydraulic actuators that feature high stroke, high force, and electrical safety with a comfortable mounting strategy that places the actuator in a non-contact resting position. A prototypical wrist-wearable CUTE device produces rich tactile sensations by making and breaking contact with the skin (2.44 mm actuation stroke), applying high controllable forces (exceeding 2.3 N), and delivering vibrations at a wide range of amplitudes and frequencies (0-200 Hz). A perceptual study with fourteen participants achieved 97.9% recognition accuracy across six diverse cues and verified their pleasant and expressive feel. This system architecture for wearable devices gives unprecedented control over the haptic cues delivered to the skin, providing an elegant and discreet way to activate the user's sense of touch.
Video DOI BibTeX

Haptic Intelligence Master Thesis Diffusion Models for Fast and Accurate Approximate Model Predictive Control Marquez Julbe, P. Eindhoven University of Technology, Eindhoven, the Netherlands, December 2024, Master of Science in Systems and Control (Published)
Model predictive control (MPC) is a powerful control and planning framework for a large class of problems, yet its practical application remains limited by computational demands. While previous efforts have focused on approximating MPC with explicit representations for high-frequency real-time deployment, handling complex MPC formulations with multiple local optima or set-valued global optima remains an open challenge in practice. This thesis explores the use of diffusion models for approximate MPC, enabling their application in such scenarios with low computational time. We introduce a novel diffusion-based approximator capable of accurately modeling multi-modal output distributions, while achieving computation times under 2.5 ms, allowing users to efficiently sample multiple feasible and locally optimal solutions with no additional computational overhead. Our method is quantitatively compared with traditional least-squares regression models, demonstrating significant improvements. Experimental validation is performed on a 7-DOF KUKA LBR4+ robotic arm operating at 250 Hz, confirming the benefits of our approach and providing insights into high-frequency neural control. Additionally, we examine diffusion model sampling strategies, leveraging their unique properties to ensure feasible and smooth closed-loop operation. As part of this work, we release a general software framework for data collection using optimal control policies in the photo-realistic simulator Isaac Lab. The framework includes multi-processing tools for CPU-based controllers and supports training and evaluating neural controllers, including diffusion models such as DDPM and traditional least-squares regression.
BibTeX

Social Foundations of Computation Conference Paper Do Causal Predictors Generalize Better to New Domains? Nastl, V. Y., Hardt, M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Spotlight Poster, December 2024 (Published)
We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets covering applications in health, employment, education, social benefits, and politics. Each dataset comes with multiple domains, allowing us to test how well a model trained in one domain performs in another. For each prediction task, we select features that have a causal influence on the target of prediction. Our goal is to test the hypothesis that models trained on causal features generalize better across domains. Without exception, we find that predictors using all available features, regardless of causality, have better in-domain and out-of-domain accuracy than predictors using causal features. Moreover, even the absolute drop in accuracy from one domain to the other is no better for causal predictors than for models that use all features. If the goal is to generalize to new domains, practitioners might as well train the best possible model on all available features.
ArXiv URL BibTeX

Empirical Inference Conference Paper Do Finetti: On Causal Effects for Exchangeable Data Guo, S., Zhang, C., Muhan, K., Huszár*, F., Schölkopf*, B. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:127317-127345, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal supervision (Published) URL BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper Evaluating Language Models as Risk Scores Cruz, A. F., Hardt, M., Mendler-Dünner, C. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
Current question-answering benchmarks predominantly focus on accuracy in realizable prediction tasks. Conditioned on a question and answer-key, does the most likely token match the ground truth? Such benchmarks necessarily fail to evaluate language models' ability to quantify outcome uncertainty. In this work, we focus on the use of language models as risk scores for unrealizable prediction tasks. We introduce folktexts, a software package to systematically generate risk scores using large language models, and evaluate them against benchmark prediction tasks. Specifically, the package derives natural language tasks from US Census data products, inspired by popular tabular data benchmarks. A flexible API allows for any task to be constructed out of 28 census features whose values are mapped to prompt-completion pairs. We demonstrate the utility of folktexts through a sweep of empirical insights on 16 recent large language models, inspecting risk scores, calibration curves, and diverse evaluation metrics. We find that zero-shot risk scores have high predictive signal while being widely miscalibrated: base models overestimate outcome uncertainty, while instruction-tuned models underestimate uncertainty and generate over-confident risk scores.
ArXiv Code URL BibTeX
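
As an illustration of what "miscalibrated" means for the risk scores discussed in the abstract above, the following is a minimal sketch of a binned expected calibration error for binary outcomes. It is not taken from the folktexts package; the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are assumptions made for this example.

```python
def expected_calibration_error(scores, labels, n_bins=10):
    """Binned expected calibration error for binary risk scores in [0, 1].

    Hypothetical helper for illustration only; not part of folktexts.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Group (score, label) pairs into equal-width bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))

    total = len(scores)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_score = sum(s for s, _ in b) / len(b)     # average predicted risk
        positive_rate = sum(y for _, y in b) / len(b)  # observed outcome frequency
        # Weight each bin's gap by the share of samples it contains.
        ece += (len(b) / total) * abs(mean_score - positive_rate)
    return ece


# Scores of 0.9 with a 90% positive rate are well calibrated.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
# Scores of 0.99 with a 50% positive rate are over-confident.
overconfident = expected_calibration_error([0.99] * 10, [1] * 5 + [0] * 5)
```

In this toy input, the over-confident scores produce a large gap between predicted risk and observed frequency, which is the kind of pattern the abstract attributes to instruction-tuned models.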

Empirical Inference Conference Paper Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes Lin, J. A., Padhy, S., Mlodozeniec, B. K., Antorán, J., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:15460-15496, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Inferring stochastic low-rank recurrent neural networks from neural data Pals, M., Sağtekin, A. E., Pei, F., Gloeckler, M., Macke, J. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:18225-18264, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Latent Diffusion for Neural Spiking Data Kapoor, J., Schulz, A., Vetter, J., Pei, F., Gao, R., Macke, J. H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:118119-118154, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Limits of Transformer Language Models on Learning to Compose Algorithms Thomm, J., Camposampiero, G., Terzic, A., Hersche, M., Schölkopf, B., Rahimi, A. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:7631-7674, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) arXiv URL BibTeX

Rationality Enhancement Article Metacognitive Learning from Consequences of Past Choices Shapes Moral Decision-Making Maier, M., Cheung, V., Lieder, F. Nature Human Behaviour, December 2024 (Submitted)
Many controversies arise from differences in how people resolve moral dilemmas by following deontological moral rules versus consequentialist cost-benefit reasoning (CBR). This article explores whether and, if so, how these seemingly intractable differences may arise from experience and whether they can be overcome through moral learning. We designed a new experimental paradigm to investigate moral learning from consequences of previous decisions. Our participants (N=387) faced a series of realistic moral dilemmas between two conflicting choices: one prescribed by a moral rule and the other favored by CBR. Critically, we let them observe the consequences of each of their decisions before making the next one. In one condition, CBR-based decisions consistently led to good outcomes, whereas rule-based decisions consistently led to bad outcomes. In the other condition, this contingency was reversed. We observed systematic, experience-dependent changes in people's moral rightness ratings and moral decisions over the course of just 13 decisions. Without being aware of it, participants adjusted how much moral weight they gave to CBR versus moral rules according to which approach produced better consequences in their respective experimental condition. These learning effects transferred to their subsequent responses to the Oxford Utilitarianism Scale, indicating genuine moral learning rather than task-specific effects. Our findings demonstrate the existence of rapid adaptive moral learning from the consequences of previous decisions. Individual differences in morality may thus be more malleable than previously thought.
DOI BibTeX

Empirical Inference Conference Paper Neural Characteristic Activation Analysis and Geometric Parameterization for ReLU Networks Chen, W., Ge, H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:97562-97586, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper On Affine Homotopy between Language Encoders Chan, R., Bourmasmoud, R., Svete, A., Ren, Y., Guo, Q., Jin, Z., Ravfogel, S., Sachan, M., Schölkopf, B., El-Assady, M., Cotterell, R. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:73337-73365, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Haptic Intelligence Ph.D. Thesis Precision Haptics in Gait Retraining for Knee Osteoarthritis Rokhmanova, N. Carnegie Mellon University, Pittsburgh, USA, December 2024, Department of Mechanical Engineering (Published)
Gait retraining, or teaching patients to walk in ways that reduce joint loading, shows promise as a conservative intervention for knee osteoarthritis. However, its use in clinical settings remains limited by challenges in prescribing optimal gait patterns and delivering precise, real-time biofeedback. This thesis presents four interconnected studies that aim to address these barriers to clinical adoption: First, a regression model was developed to predict patient-specific biomechanical responses to a gait modification using only simple clinical measures, reducing the need for instrumented gait analysis. Second, we identified how inertial sensor accuracy fundamentally impacts motor learning outcomes during gait retraining, demonstrating the importance of reliable kinematic tracking. Third, we designed and validated an open-source wearable haptic platform called ARIADNE, which delivers precise vibrotactile motion guidance and enables rigorous comparison of feedback strategies for gait retraining. This platform's integrated sensing revealed how anatomical placement and tissue properties influence vibration transmission and perception. Finally, a gait retraining study demonstrated that vibrotactile feedback significantly improves both learning and retention of therapeutic gait patterns compared to verbal instruction alone, highlighting the critical role of precise biofeedback systems in rehabilitation. These contributions help advance the field's understanding of the sensorimotor principles underlying gait retraining while providing practical tools to support future clinical implementation.
BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper Questioning the Survey Responses of Large Language Models Dominguez-Olmedo, R., Hardt, M., Mendler-Dünner, C. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Oral, December 2024 (Published)
As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models in order to investigate the population represented by their responses. In this work, we critically examine language models' survey responses on the basis of the well-established American Community Survey by the U.S. Census Bureau and investigate whether they elicit a faithful representation of any human population. Using a de-facto standard multiple-choice prompting technique and evaluating 39 different language models using systematic experiments, we establish two dominant patterns: First, models' responses are governed by ordering and labeling biases, leading to variations across models that do not persist after adjusting for systematic biases. Second, models' responses do not contain the entropy variations and statistical signals typically found in human populations. As a result, a binary classifier can almost perfectly differentiate model-generated data from the responses of the U.S. census. At the same time, models' relative alignment with different demographic subgroups can be predicted from the subgroups' entropy, irrespective of the model's training data or training strategy. Taken together, our findings suggest caution in treating models' survey responses as equivalent to those of human populations.
ArXiv URL BibTeX
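
The abstract above refers to the entropy of survey responses. As a rough sketch of that quantity, assuming each response is a single multiple-choice label, the Shannon entropy of the empirical answer distribution can be computed as below. The function name `response_entropy` and the toy answers are hypothetical and are not drawn from the paper's code or data.

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy (in bits) of the empirical distribution of answers.

    Hypothetical helper for illustration only.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    # Sum -p * log2(p) over the observed answer options.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Answers spread evenly over four options carry 2 bits of entropy.
uniform = response_entropy(["A", "B", "C", "D"])
# Identical answers carry no entropy.
constant = response_entropy(["A", "A", "A", "A"])
```

Comparing this value across questions or subgroups is one simple way to look at the "entropy variations" the abstract mentions; the paper's own analysis may differ in detail.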

Empirical Inference Conference Paper Shaving Weights with Occam’s Razor: Bayesian Sparsification for Neural Networks using the Marginal Likelihood Dhahri, R., Immer, A., Charpentier, B., Günnemann, S., Fortuin, V. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:24959-24989, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation Vetter, J., Moss, G., Schröder, C., Gao, R., Macke, J. H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:88772-88806, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper The Fairness-Quality Trade-off in Clustering Hakim, R., Stoica, A., Papadimitriou, C. H., Yannakakis, M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Theoretical Characterisation of the Gauss Newton Conditioning in Neural Networks Zhao*, J., Singh*, S. P., Lucchi, A. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:114965-115000, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper What Makes and Breaks Safety Fine-tuning? A Mechanistic Study Jain, S., Lubana, E. S., Oksuz, K., Joy, T., Torr, P., Sanyal, A., Dokania, P. K. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:93406-93478, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Perceiving Systems Article PuzzleAvatar: Assembling 3D Avatars from Personal Albums Xiu, Y., Liu, Z., Tzionas, D., Black, M. J. ACM Transactions on Graphics, 43(6):1-15, ACM, December 2024 (Published)
Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar in return? The challenge is that such casual photo collections contain diverse poses, challenging viewpoints, cropped views, and occlusion (albeit with a consistent outfit, accessories and hairstyle). We address this novel "Album2Human" task by developing PuzzleAvatar, a novel model that generates a faithful 3D avatar (in a canonical pose) from a personal OOTD album, while bypassing the challenging estimation of body and camera pose. To this end, we fine-tune a foundational vision-language model (VLM) on such photos, encoding the appearance, identity, garments, hairstyles, and accessories of a person into (separate) learned tokens and instilling these cues into the VLM. In effect, we exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar. Importantly, we can customize avatars by simply interchanging tokens. As a benchmark for this new task, we collect a new dataset, called PuzzleIOI, with 41 subjects in a total of nearly 1K OOTD configurations, in challenging partial photos with paired ground-truth 3D bodies. Evaluation shows that PuzzleAvatar not only has high reconstruction accuracy, outperforming TeCH and MVDreamBooth, but also a unique scalability to album photos, and strong robustness. Our code and data are publicly available for research purposes.
DOI URL BibTeX

Perceiving Systems Conference Paper SPARK: Self-supervised Personalized Real-time Monocular Face Capture Baert, K., Bharadwaj, S., Castan, F., Maujean, B., Christie, M., Abrevaya, V., Boukhayma, A. In SIGGRAPH Asia 2024 Conference Proceedings, SIGGRAPH Asia, December 2024 (Published)
Feedforward monocular face capture methods seek to reconstruct posed faces from a single image of a person. Current state-of-the-art approaches can regress parametric 3D face models in real time across a wide range of identities, lighting conditions and poses by leveraging large image datasets of human faces. These methods, however, suffer from clear limitations in that the underlying parametric face model only provides a coarse estimation of the face shape, thereby limiting their practical applicability in tasks that require precise 3D reconstruction (aging, face swapping, digital make-up, ...). In this paper, we propose a method for high-precision 3D face capture taking advantage of a collection of unconstrained videos of a subject as prior information. Our proposal builds on a two-stage approach. We start with the reconstruction of a detailed 3D face avatar of the person, capturing both precise geometry and appearance from a collection of videos. We then use the encoder from a pre-trained monocular face reconstruction method, substituting its decoder with our personalized model, and proceed with transfer learning on the video collection. Using our pre-estimated image formation model, we obtain a more precise self-supervision objective, enabling improved expression and pose alignment. This results in a trained encoder capable of efficiently regressing pose and expression parameters in real time from previously unseen images, which combined with our personalized geometry model yields more accurate, high-fidelity mesh inference. Through extensive qualitative and quantitative evaluation, we showcase the superiority of our final model as compared to state-of-the-art baselines, and demonstrate its generalization ability to unseen pose, expression and lighting.
DOI URL BibTeX

Perceiving Systems Article StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal Ye, C., Qiu, L., Gu, X., Zuo, Q., Wu, Y., Dong, Z., Bo, L., Xiu, Y., Han, X. ACM Transactions on Graphics, 43(6):1-18, ACM, December 2024 (Published)
This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step, which slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy: a one-step normal estimator (YOSO) first derives an initial normal guess that is relatively coarse but reliable, and a semantic-guided refinement process (SG-DRN) then refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement. These results provide evidence that StableNormal retains both the "stability" and "sharpness" needed for accurate normal estimation. StableNormal represents an early attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models have been made publicly available.
DOI BibTeX