Publications

DEPARTMENTS

Empirical Inference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group




Social Foundations of Computation Book The Emerging Science of Machine Learning Benchmarks Hardt, M. 2025 (Published)
Machine learning turns on one simple trick: Split the data into training and test sets. Anything goes on the training set. Rank models on the test set and let model builders compete. Call it a benchmark. Machine learning researchers cherish a good tradition of lamenting the apparent shortcomings of benchmarks. Critics argue that static test sets and metrics promote narrow research objectives, stifling more creative scientific pursuits. Benchmarks also incentivize gaming; in fact, Goodhart's Law cautions against applying competitive pressure to statistical measurement. Over time, researchers may overfit to benchmarks, building models that exploit data artifacts. As a result, test set performance draws a skewed picture of model capabilities that deceives us—especially when comparing humans and machines. To top off the list of issues, there are a slew of reasons why things don't transfer well from benchmarks to the real world.
Website URL BibTeX

Organizational Leadership and Diversity Article Navigating AI Convergence in Human–Artificial Intelligence Teams: A Signaling Theory Approach Smith, A., Van Wagoner, P., Keplinger, K., Celebi, C. Journal of Organizational Behavior, DOI: 10.1002/job.2856, December 2024 (Published)
Teams that combine human intelligence with artificial intelligence (AI) have become indispensable for solving complex tasks in various decision-making contexts in modern organizations. However, the factors that contribute to AI convergence, where human team members align their decisions with those of their AI counterparts, remain unclear. This study integrates signaling theory with self-determination theory to investigate how specific signals—such as signal fit, optional AI advice, and signal set congruence—affect employees' AI convergence in human–AI teams. Based on four experimental studies conducted in facial recognition and hiring contexts with approximately 1,100 participants, the findings highlight the significant positive impact of congruent signals from both human and AI team members on AI convergence. Moreover, providing an option for employees to solicit AI advice also enhances AI convergence; when AI signals are chosen by employees rather than forced upon them, participants are more likely to accept AI advice. This research advances knowledge on human–AI teaming by (1) expanding signaling theory into the human–AI team context; (2) developing a deeper understanding of AI convergence and its drivers in human–AI teams; (3) providing actionable insights for designing teams and tasks to optimize decision-making in high-stakes, uncertain environments; and (4) introducing facial recognition as an innovative context for human–AI teaming.
DOI URL BibTeX

Perceiving Systems Book Chapter ElephantBook: Participatory Human–AI Elephant Population Monitoring Kulits, P., Wall, J., Beery, S. In Collaborative Intelligence: How Humans and AI Are Transforming Our World, 173-196, 7, (Editors: Lane, Mira and Sethumadhavan, Arathi), The MIT Press, Cambridge, Massachusetts, December 2024 (Published) URL BibTeX

Safety- and Efficiency- aligned Learning Conference Paper Efficiently Dispatching Flash Attention For Partially Filled Attention Masks Sharma, A., Geiping, J. In ENSLP NeurIPS Workshop 2024, December 2024 (Published)
Transformers are widely used across various applications, many of which yield sparse or partially filled attention matrices. Examples include attention masks designed to reduce the quadratic complexity of attention, sequence packing techniques, and recent innovations like tree masking for fast validation in MEDUSA. Despite the inherent sparsity in these matrices, the state-of-the-art algorithm Flash Attention still processes them with quadratic complexity as though they were dense. In this paper, we introduce Binary Block Masking, a highly efficient modification that enhances Flash Attention by making it mask-aware. We further propose two optimizations: one tailored for masks with contiguous non-zero patterns and another for extremely sparse masks. Our experiments on attention masks derived from real-world scenarios demonstrate up to a 9x runtime improvement. The implementation will be publicly released to foster further research and application.
URL BibTeX
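The core idea of making Flash Attention mask-aware, as described above, is that tiles of the attention matrix whose mask entries are all zero never need to be processed. The following is a minimal pure-Python sketch of that coarsening step, assuming a dense 0/1 mask given as nested lists and a square tile size `block`. The names `binary_block_mask`, `skipped_fraction`, and `causal_mask` are hypothetical; this is not the authors' released implementation, which works inside the attention kernel itself.

```python
from typing import List

Mask = List[List[int]]


def binary_block_mask(mask: Mask, block: int) -> List[List[bool]]:
    """Coarsen a dense 0/1 attention mask into one flag per block x block tile.

    A tile is True if it contains at least one nonzero entry, so a tiled
    attention kernel must process it; False tiles can be skipped entirely.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    n_br = (n_rows + block - 1) // block  # ceil division: partial edge tiles count
    n_bc = (n_cols + block - 1) // block
    coarse = [[False] * n_bc for _ in range(n_br)]
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c]:
                coarse[r // block][c // block] = True
    return coarse


def skipped_fraction(coarse: List[List[bool]]) -> float:
    """Fraction of tiles a mask-aware kernel would not need to visit."""
    total = sum(len(row) for row in coarse)
    active = sum(flag for row in coarse for flag in row)
    return 1.0 - active / total


def causal_mask(n: int) -> Mask:
    """Lower-triangular mask: query r may attend to keys c <= r."""
    return [[1 if c <= r else 0 for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    coarse = binary_block_mask(causal_mask(8), block=4)
    print(coarse)                    # [[True, False], [True, True]]
    print(skipped_fraction(coarse))  # 0.25
```

For an 8x8 causal mask with 4x4 tiles, only the upper-right tile is fully masked, so one tile in four is skipped. Sparser patterns, such as block-diagonal masks from sequence packing, skip a larger share; the paper's additional optimizations for contiguous and extremely sparse masks are not modeled here.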

Robotic Materials Organizational Leadership and Diversity Article Accelerating the pace of innovation in robotics by fostering diversity and inclusive leadership Macari, D., Fratzl, A., Keplinger, K., Keplinger, C. Science Robotics, 9, December 2024 (Published)
Diverse and inclusive teams are not merely a moral imperative but also a catalyst for scientific excellence in robotics. Drawing from literature, a comprehensive citation analysis, and expert interviews, we derive seven main benefits of diversity and inclusion and propose a leadership guide for roboticists to reap these benefits.
DOI URL BibTeX

Perceiving Systems Conference Paper MotionFix: Text-Driven 3D Human Motion Editing Athanasiou, N., Cseke, A., Diomataris, M., Black, M. J., Varol, G. In SIGGRAPH Asia 2024 Conference Proceedings, ACM, SIGGRAPH Asia, December 2024 (Published)
The focus of this paper is 3D motion editing. Given a 3D human motion and a textual description of the desired modification, our goal is to generate an edited motion as described by the text. The challenges include the lack of training data and the design of a model that faithfully edits the source motion. In this paper, we address both these challenges. We develop a methodology to semi-automatically collect triplets, each consisting of (i) a source motion, (ii) a target motion, and (iii) an edit text, and use it to create a new dataset. Having access to such data allows us to train a conditional diffusion model that takes both the source motion and the edit text as input. We further build various baselines trained only on datasets of text-motion pairs and show that our model trained on triplets outperforms them. We introduce new retrieval-based metrics for motion editing and establish a new benchmark on the evaluation set. Our results are encouraging, paving the way for further research on fine-grained motion generation. Code and models will be made publicly available.
Code (GitHub) Website Data Exploration ArXiv URL BibTeX

Empirical Inference Conference Paper From Causal to Concept-Based Representation Learning Rajendran*, G., Buchholz*, S., Aragam, B., Schölkopf, B., Ravikumar, P. K. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:101250-101296, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Learning Partitions from Context Buchholz, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:140066-140112, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving Didolkar, A. R., Goyal, A., Ke, N. R., Guo, S., Valko, M., Lillicrap, T. P., Rezende, D. J., Bengio, Y., Mozer, M. C., Arora, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:19783-19812, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper A Generative Model of Symmetry Transformations Allingham, J. U., Mlodozeniec, B. K., Padhy, S., Antorán, J., Krueger, D., Turner, R. E., Nalisnick, E., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:91091-91130, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Article A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions Rastogi, C., Song, X., Jin, Z., Stelmakh, I., Daumé III, H., Zhang, K., Shah, N. B. PLOS ONE, 19(12), Public Library of Science, December 2024 (Published) DOI URL BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists Baumann, J., Mendler-Dünner, C. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
We investigate algorithmic collective action in transformer-based recommender systems. Our use case is a collective of fans aiming to promote the visibility of an artist by strategically placing one of their songs in the existing playlists they control. The success of the collective is measured by the increase in test-time recommendations of the targeted song. We introduce two easily implementable strategies towards this goal and test their efficacy on a publicly available recommender system model released by a major music streaming platform. Our findings reveal that even small collectives (controlling less than 0.01% of the training data) can achieve up to 25x amplification of recommendations by strategically choosing the position at which to insert the song. We then investigate the externalities of the strategy. We find that the performance loss for the platform is negligible and the recommendations of other songs are largely preserved, minimally impairing the user experience of participants. Moreover, the costs are evenly distributed among other artists. Taken together, our findings demonstrate how collective action strategies can be effective while not necessarily being adversarial, raising new questions around incentives, social dynamics, and equilibria in recommender systems.
arXiv URL BibTeX
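The setup described above, a collective that edits only the playlists it controls and is judged by amplification of test-time recommendations, can be sketched as follows. This is a hypothetical illustration: the placement rule here (insert the target song directly after a chosen anchor song, or append it if the anchor is absent) is an assumption for exposition and is not necessarily either of the two strategies evaluated in the paper, and `amplification` assumes a plain ratio of recommendation counts with and without the collective.

```python
from typing import List, Sequence

Playlist = List[str]


def insert_after_anchor(playlist: Sequence[str], target: str, anchor: str) -> Playlist:
    """Hypothetical placement rule: put `target` right after the first
    occurrence of `anchor`, or append it if the anchor is missing.
    Existing songs keep their relative order, so the edit is a pure insertion."""
    songs = list(playlist)  # copy so the caller's playlist is not mutated
    if target in songs:
        return songs  # already present; leave the playlist unchanged
    pos = songs.index(anchor) + 1 if anchor in songs else len(songs)
    songs.insert(pos, target)
    return songs


def apply_collective(playlists: Sequence[Sequence[str]], controlled: Sequence[int],
                     target: str, anchor: str) -> List[Playlist]:
    """Edit only the playlists whose indices the collective controls."""
    owned = set(controlled)
    return [insert_after_anchor(p, target, anchor) if i in owned else list(p)
            for i, p in enumerate(playlists)]


def amplification(recs_with: int, recs_without: int) -> float:
    """Ratio of test-time recommendations of the target song with vs. without
    the collective (an assumed definition, for illustration only)."""
    if recs_without <= 0:
        raise ValueError("baseline recommendation count must be positive")
    return recs_with / recs_without
```

For example, `apply_collective([["a", "b", "c"], ["b", "d"], ["x"]], [0, 2], "s", "b")` leaves the second playlist untouched and returns `[["a", "b", "s", "c"], ["b", "d"], ["x", "s"]]`, and `amplification(50, 2)` returns `25.0`. Measuring real amplification requires retraining or re-running the recommender on the edited data, which this sketch does not attempt.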

Empirical Inference Conference Paper Alien Recombination: Exploring Concept Blends Beyond Human Cognitive Availability in Visual Art Hernandez, A., Brinkmann, L., Serna, I., Rahaman, N., Alhaija, H. A., Yakura, H., Sola, M. C., Schölkopf, B., Rahwan, I. NeurIPS 2024 Workshop on Creativity and Generative AI, December 2024 (Published) arXiv BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper An Engine Not a Camera: Measuring Performative Power of Online Search Mendler-Dünner, C., Carovano, G., Hardt, M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
The power of digital platforms is at the center of major ongoing policy and regulatory efforts. To advance existing debates, we designed and executed an experiment to measure the power of online search providers, building on the recent definition of performative power. Instantiated in our setting, performative power quantifies the ability of a search engine to steer web traffic by rearranging results. To operationalize this definition we developed a browser extension that performs unassuming randomized experiments in the background. These randomized experiments emulate updates to the search algorithm and identify the causal effect of different content arrangements on clicks. We formally relate these causal effects to performative power. Analyzing tens of thousands of clicks, we discuss what our robust quantitative findings say about the power of online search engines. More broadly, we envision our work to serve as a blueprint for how performative power and online experiments can be integrated with future investigations into the economic power of digital platforms.
ArXiv URL BibTeX
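The measurement idea described above, randomly perturbing how results are arranged and comparing clicks across arms, can be sketched as a simple randomized experiment. This is a minimal sketch under stated assumptions: the treatment swaps the top two results with probability `p`, and the effect is a plain difference in click-through rates on the originally top-ranked result. The names `Impression`, `randomize_arrangement`, and `estimate_effect` are hypothetical; the paper's browser extension, its set of interventions, and its formal link to performative power are more involved.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Impression:
    swapped: bool      # True if this search was assigned to the treatment arm
    clicked_top: bool  # True if the user clicked the result originally ranked first


def randomize_arrangement(results: Sequence[str], rng: random.Random,
                          p: float = 0.5) -> Tuple[List[str], bool]:
    """With probability p, swap the first two results (treatment); else keep order."""
    arranged = list(results)
    swapped = len(arranged) >= 2 and rng.random() < p
    if swapped:
        arranged[0], arranged[1] = arranged[1], arranged[0]
    return arranged, swapped


def estimate_effect(impressions: Sequence[Impression]) -> float:
    """Difference-in-means estimate of how much demoting the top result
    reduces its click-through rate (control CTR minus treatment CTR)."""
    treated = [imp.clicked_top for imp in impressions if imp.swapped]
    control = [imp.clicked_top for imp in impressions if not imp.swapped]
    if not treated or not control:
        raise ValueError("need impressions in both arms")
    return sum(control) / len(control) - sum(treated) / len(treated)
```

Because assignment is random, the difference in click-through rates is an unbiased estimate of the causal effect of the rearrangement on clicks. With three of four control impressions clicked and one of four treated impressions clicked, `estimate_effect` returns 0.5.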

Haptic Intelligence Ph.D. Thesis Capturing and Recognizing Multimodal Surface Interactions as Embedded High-Dimensional Distributions Khojasteh, B. University of Stuttgart, Stuttgart, Germany, December 2024, Faculty of Engineering Design, Production Engineering and Automotive Engineering (Published)
Exploring a surface with a handheld tool generates complex contact signals that uniquely encode the surface's properties, a needle hidden in a haystack of data. Humans naturally integrate visual, auditory, and haptic sensory data during these interactions to accurately assess and recognize surfaces. However, enabling artificial systems to perceive and recognize surfaces with human-like proficiency remains a significant challenge. The complexity and dimensionality of multimodal sensor data, particularly in the intricate and dynamic modality of touch, hinder effective sensing and processing. Successfully overcoming these challenges will open up new possibilities in applications such as quality control, material documentation, and robotics. This dissertation addresses these issues at the levels of both the sensing hardware and the processing algorithms by introducing an automated similarity framework for multimodal surface recognition, developing a haptic-auditory test bed for acquiring high-quality surface data, and exploring optimal sensing configurations to improve recognition performance and robustness.
BibTeX

Empirical Inference Conference Paper Causal vs. Anticausal merging of predictors Garrido Mejia, S., Blöbaum, P., Schölkopf, B., Janzing, D. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:1402-1427, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Causality for Natural Language Processing Jin, Z. University of Tübingen, Germany, December 2024, (ELLIS PhD student program) (Published) URL BibTeX

Empirical Inference Conference Paper Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias Chen*, Y., Vethavikashini*, C. R., Mattern*, J., Mihalcea, R., Jin, Z. NeurIPS 2024 Workshop on Causality and Language Models (CaLM), December 2024, *equal contribution (Published) DOI URL BibTeX

Empirical Inference Conference Paper Cooperate or Collapse: Emergence of Sustainability in a Society of LLM Agents Piatti*, G., Jin*, Z., Kleiman-Weiner*, M., Schölkopf, B., Sachan, M., Mihalcea, R. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:111715-111759, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal contribution (Published) arXiv URL BibTeX

Haptic Intelligence Robotic Materials Article Cutaneous Electrohydraulic (CUTE) Wearable Devices for Pleasant Broad-Bandwidth Haptic Cues Sanchez-Tamayo, N., Yoder, Z., Rothemund, P., Ballardini, G., Keplinger, C., Kuchenbecker, K. J. Advanced Science, 11(48):2402461, December 2024, This article was selected for the inside front cover. https://doi.org/10.1002/advs.202470295 (Published)
By focusing on vibrations, current wearable haptic devices underutilize the skin's perceptual capabilities. Devices that provide richer haptic stimuli, including contact feedback and/or variable pressure, are typically heavy and bulky due to the underlying actuator technology and the low sensitivity of hairy skin, which covers most of the body. This paper presents a system architecture for compact wearable devices that deliver salient and pleasant broad-bandwidth haptic cues: Cutaneous Electrohydraulic (CUTE) devices combine a custom materials design for soft haptic electrohydraulic actuators that feature high stroke, high force, and electrical safety with a comfortable mounting strategy that places the actuator in a non-contact resting position. A prototypical wrist-wearable CUTE device produces rich tactile sensations by making and breaking contact with the skin (2.44 mm actuation stroke), applying high controllable forces (exceeding 2.3 N), and delivering vibrations at a wide range of amplitudes and frequencies (0-200 Hz). A perceptual study with fourteen participants achieved 97.9% recognition accuracy across six diverse cues and verified their pleasant and expressive feel. This system architecture for wearable devices gives unprecedented control over the haptic cues delivered to the skin, providing an elegant and discreet way to activate the user's sense of touch.
Video DOI BibTeX

Haptic Intelligence Master Thesis Diffusion Models for Fast and Accurate Approximate Model Predictive Control Marquez Julbe, P. Eindhoven University of Technology, Eindhoven, the Netherlands, December 2024, Master of Science in Systems and Control (Published)
Model predictive control (MPC) is a powerful control and planning framework for a large class of problems, yet its practical application remains limited by computational demands. While previous efforts have focused on approximating MPC with explicit representations for high-frequency real-time deployment, handling complex MPC formulations with multiple local optima or set-valued global optima remains an open challenge in practice. This thesis explores the use of diffusion models for approximate MPC, enabling their application in such scenarios with low computational time. We introduce a novel diffusion-based approximator capable of accurately modeling multi-modal output distributions, while achieving computation times under 2.5 ms, allowing users to efficiently sample multiple feasible and locally optimal solutions with no additional computational overhead. Our method is quantitatively compared with traditional least-squares regression models, demonstrating significant improvements. Experimental validation is performed on a 7-DOF KUKA LBR4+ robotic arm operating at 250 Hz, confirming the benefits of our approach and providing insights into high-frequency neural control. Additionally, we examine diffusion model sampling strategies, leveraging their unique properties to ensure feasible and smooth closed-loop operation. As part of this work, we release a general software framework for data collection using optimal control policies in the photo-realistic simulator Isaac Lab. The framework includes multi-processing tools for CPU-based controllers and supports training and evaluating neural controllers, including diffusion models such as DDPM and traditional least-squares regression.
BibTeX

Social Foundations of Computation Conference Paper Do Causal Predictors Generalize Better to New Domains? Nastl, V. Y., Hardt, M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Spotlight Poster, December 2024 (Published)
We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets covering applications in health, employment, education, social benefits, and politics. Each dataset comes with multiple domains, allowing us to test how well a model trained in one domain performs in another. For each prediction task, we select features that have a causal influence on the target of prediction. Our goal is to test the hypothesis that models trained on causal features generalize better across domains. Without exception, we find that predictors using all available features, regardless of causality, have better in-domain and out-of-domain accuracy than predictors using causal features. Moreover, even the absolute drop in accuracy from one domain to the other is no better for causal predictors than for models that use all features. If the goal is to generalize to new domains, practitioners might as well train the best possible model on all available features.
ArXiv URL BibTeX

Empirical Inference Conference Paper Do Finetti: On Causal Effects for Exchangeable Data Guo, S., Zhang, C., Muhan, K., Huszár*, F., Schölkopf*, B. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:127317-127345, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal supervision (Published) URL BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper Evaluating Language Models as Risk Scores Cruz, A. F., Hardt, M., Mendler-Dünner, C. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)
Current question-answering benchmarks predominantly focus on accuracy in realizable prediction tasks. Conditioned on a question and answer-key, does the most likely token match the ground truth? Such benchmarks necessarily fail to evaluate language models' ability to quantify outcome uncertainty. In this work, we focus on the use of language models as risk scores for unrealizable prediction tasks. We introduce folktexts, a software package to systematically generate risk scores using large language models, and evaluate them against benchmark prediction tasks. Specifically, the package derives natural language tasks from US Census data products, inspired by popular tabular data benchmarks. A flexible API allows for any task to be constructed out of 28 census features whose values are mapped to prompt-completion pairs. We demonstrate the utility of folktexts through a sweep of empirical insights on 16 recent large language models, inspecting risk scores, calibration curves, and diverse evaluation metrics. We find that zero-shot risk scores have high predictive signal while being widely miscalibrated: base models overestimate outcome uncertainty, while instruction-tuned models underestimate uncertainty and generate over-confident risk scores.
ArXiv Code URL BibTeX
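Calibration, as used above, asks whether outcomes assigned a risk score of s occur with frequency close to s. A minimal pure-Python sketch of a binned calibration curve and the expected calibration error (ECE) follows. It assumes scores in [0, 1], binary labels, and equal-width bins; it is a generic metric written for illustration, not the folktexts API.

```python
from typing import List, Sequence, Tuple


def calibration_curve(scores: Sequence[float], labels: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Group scores into equal-width bins on [0, 1] and return, for each
    non-empty bin, (mean predicted score, observed positive rate, bin size)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        bins[idx].append((s, y))
    curve = []
    for b in bins:
        if b:
            mean_score = sum(s for s, _ in b) / len(b)
            pos_rate = sum(y for _, y in b) / len(b)
            curve.append((mean_score, pos_rate, len(b)))
    return curve


def expected_calibration_error(scores: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Size-weighted average gap between mean predicted score and observed rate."""
    n = len(scores)
    return sum(size / n * abs(mean - rate)
               for mean, rate, size in calibration_curve(scores, labels, n_bins))
```

A model that always outputs 0.25 on a population where one in four outcomes is positive has an ECE of 0, while one that outputs 0.95 where only half are positive has an ECE of about 0.45. Calibration is separate from ranking quality: scores can order individuals well yet sit systematically too close to 0.5 or too close to 0 and 1.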

Empirical Inference Conference Paper Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes Lin, J. A., Padhy, S., Mlodozeniec, B. K., Antorán, J., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:15460-15496, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Inferring stochastic low-rank recurrent neural networks from neural data Pals, M., Sağtekin, A. E., Pei, F., Gloeckler, M., Macke, J. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:18225-18264, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Latent Diffusion for Neural Spiking Data Kapoor, J., Schulz, A., Vetter, J., Pei, F., Gao, R., Macke, J. H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:118119-118154, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Limits of Transformer Language Models on Learning to Compose Algorithms Thomm, J., Camposampiero, G., Terzic, A., Hersche, M., Schölkopf, B., Rahimi, A. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:7631-7674, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) arXiv URL BibTeX

Rationality Enhancement Article Metacognitive Learning from Consequences of Past Choices Shapes Moral Decision-Making Maier, M., Cheung, V., Lieder, F. Nature Human Behaviour, December 2024 (Submitted)
Many controversies arise from differences in how people resolve moral dilemmas by following deontological moral rules versus consequentialist cost-benefit reasoning (CBR). This article explores whether and, if so, how these seemingly intractable differences may arise from experience and whether they can be overcome through moral learning. We designed a new experimental paradigm to investigate moral learning from consequences of previous decisions. Our participants (N=387) faced a series of realistic moral dilemmas between two conflicting choices: one prescribed by a moral rule and the other favored by CBR. Critically, we let them observe the consequences of each of their decisions before making the next one. In one condition, CBR-based decisions consistently led to good outcomes, whereas rule-based decisions consistently led to bad outcomes. In the other condition, this contingency was reversed. We observed systematic, experience-dependent changes in people's moral rightness ratings and moral decisions over the course of just 13 decisions. Without being aware of it, participants adjusted how much moral weight they gave to CBR versus moral rules according to which approach produced better consequences in their respective experimental condition. These learning effects transferred to their subsequent responses to the Oxford Utilitarianism Scale, indicating genuine moral learning rather than task-specific effects. Our findings demonstrate the existence of rapid adaptive moral learning from the consequences of previous decisions. Individual differences in morality may thus be more malleable than previously thought.
DOI BibTeX

Empirical Inference Conference Paper Neural Characteristic Activation Analysis and Geometric Parameterization for ReLU Networks Chen, W., Ge, H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:97562-97586, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper On Affine Homotopy between Language Encoders Chan, R., Bourmasmoud, R., Svete, A., Ren, Y., Guo, Q., Jin, Z., Ravfogel, S., Sachan, M., Schölkopf, B., El-Assady, M., Cotterell, R. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:73337-73365, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Haptic Intelligence Ph.D. Thesis Precision Haptics in Gait Retraining for Knee Osteoarthritis Rokhmanova, N. Carnegie Mellon University, Pittsburgh, USA, December 2024, Department of Mechanical Engineering (Published)
Gait retraining, or teaching patients to walk in ways that reduce joint loading, shows promise as a conservative intervention for knee osteoarthritis. However, its use in clinical settings remains limited by challenges in prescribing optimal gait patterns and delivering precise, real-time biofeedback. This thesis presents four interconnected studies that aim to address these barriers to clinical adoption: First, a regression model was developed to predict patient-specific biomechanical responses to a gait modification using only simple clinical measures, reducing the need for instrumented gait analysis. Second, we identified how inertial sensor accuracy fundamentally impacts motor learning outcomes during gait retraining, demonstrating the importance of reliable kinematic tracking. Third, we designed and validated an open-source wearable haptic platform called ARIADNE, which delivers precise vibrotactile motion guidance and enables rigorous comparison of feedback strategies for gait retraining. This platform's integrated sensing revealed how anatomical placement and tissue properties influence vibration transmission and perception. Finally, a gait retraining study demonstrated that vibrotactile feedback significantly improves both learning and retention of therapeutic gait patterns compared to verbal instruction alone, highlighting the critical role of precise biofeedback systems in rehabilitation. These contributions help advance the field's understanding of the sensorimotor principles underlying gait retraining while providing practical tools to support future clinical implementation.
BibTeX

Social Foundations of Computation Algorithms and Society Conference Paper Questioning the Survey Responses of Large Language Models Dominguez-Olmedo, R., Hardt, M., Mendler-Dünner, C. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Oral, December 2024 (Published)
As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models in order to investigate the population represented by their responses. In this work, we critically examine language models' survey responses on the basis of the well-established American Community Survey by the U.S. Census Bureau and investigate whether they elicit a faithful representation of any human population. Using a de facto standard multiple-choice prompting technique and evaluating 39 different language models in systematic experiments, we establish two dominant patterns: First, models' responses are governed by ordering and labeling biases, leading to variations across models that do not persist after adjusting for systematic biases. Second, models' responses do not contain the entropy variations and statistical signals typically found in human populations. As a result, a binary classifier can almost perfectly differentiate model-generated data from the responses of the U.S. census. At the same time, models' relative alignment with different demographic subgroups can be predicted from the subgroups' entropy, irrespective of the model's training data or training strategy. Taken together, our findings suggest caution in treating models' survey responses as equivalent to those of human populations.
ArXiv URL BibTeX
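Two ingredients of the analysis above, the entropy of a response distribution and sensitivity to answer ordering, can be illustrated with a small sketch. It assumes a hypothetical callable `choose` that returns the option a model picks from an ordered list of answer options; presenting the same question under several orderings and pooling the picks is one simple way to expose an ordering bias, and is not necessarily the paper's exact adjustment procedure.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def normalized_entropy(counts: Counter, n_options: int) -> float:
    """Entropy divided by its maximum, log2(n_options), giving a value in [0, 1]."""
    return entropy(counts) / math.log2(n_options) if n_options > 1 else 0.0


def answer_distribution(choose: Callable[[Sequence[str]], str],
                        options: Sequence[str],
                        orderings: Sequence[Sequence[int]]) -> Counter:
    """Ask the same question once per ordering of the options and count which
    option text was picked. `choose` is a hypothetical model interface."""
    counts: Counter = Counter()
    for order in orderings:
        ordered = [options[i] for i in order]
        counts[choose(ordered)] += 1
    return counts
```

A model that always picks the first listed option looks perfectly decided (normalized entropy 0) under a fixed ordering, but maximally uncertain (normalized entropy 1) once the orderings are balanced, which is the kind of artifact that adjusting for ordering and labeling biases is meant to reveal.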

Empirical Inference Conference Paper Shaving Weights with Occam’s Razor: Bayesian Sparsification for Neural Networks using the Marginal Likelihood Dhahri, R., Immer, A., Charpentier, B., Günnemann, S., Fortuin, V. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:24959-24989, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation Vetter, J., Moss, G., Schröder, C., Gao, R., Macke, J. H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:88772-88806, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Social Foundations of Computation Conference Paper The Fairness-Quality Trade-off in Clustering Hakim, R., Stoica, A., Papadimitriou, C. H., Yannakakis, M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Theoretical Characterisation of the Gauss Newton Conditioning in Neural Networks Zhao*, J., Singh*, S. P., Lucchi, A. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:114965-115000, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper What Makes and Breaks Safety Fine-tuning? A Mechanistic Study Jain, S., Lubana, E. S., Oksuz, K., Joy, T., Torr, P., Sanyal, A., Dokania, P. K. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:93406-93478, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Perceiving Systems Article PuzzleAvatar: Assembling 3D Avatars from Personal Albums Xiu, Y., Liu, Z., Tzionas, D., Black, M. J. ACM Transactions on Graphics, 43(6):1-15, ACM, December 2024 (Published)
Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar in return? The challenge is that such casual photo collections contain diverse poses, challenging viewpoints, cropped views, and occlusion (albeit with a consistent outfit, accessories, and hairstyle). We address this novel "Album2Human" task by developing PuzzleAvatar, a model that generates a faithful 3D avatar (in a canonical pose) from a personal OOTD album while bypassing the challenging estimation of body and camera pose. To this end, we fine-tune a foundational vision-language model (VLM) on such photos, encoding the appearance, identity, garments, hairstyles, and accessories of a person into (separate) learned tokens and instilling these cues into the VLM. In effect, we exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar. Importantly, we can customize avatars by simply interchanging tokens. As a benchmark for this new task, we collect a new dataset, called PuzzleIOI, with 41 subjects in a total of nearly 1K OOTD configurations, in challenging partial photos with paired ground-truth 3D bodies. Evaluation shows that PuzzleAvatar not only achieves high reconstruction accuracy, outperforming TeCH and MVDreamBooth, but also shows unique scalability to album photos and strong robustness. Our code and data are publicly available for research purposes.
DOI URL BibTeX

Perceiving Systems Conference Paper SPARK: Self-supervised Personalized Real-time Monocular Face Capture Baert, K., Bharadwaj, S., Castan, F., Maujean, B., Christie, M., Abrevaya, V., Boukhayma, A. In SIGGRAPH Asia 2024 Conference Proceedings, SIGGRAPH Asia, December 2024 (Published)
Feedforward monocular face capture methods seek to reconstruct posed faces from a single image of a person. Current state-of-the-art approaches can regress parametric 3D face models in real time across a wide range of identities, lighting conditions, and poses by leveraging large image datasets of human faces. However, these methods suffer from a clear limitation: the underlying parametric face model only provides a coarse estimate of the face shape, which limits their practical applicability in tasks that require precise 3D reconstruction (aging, face swapping, digital make-up, ...). In this paper, we propose a method for high-precision 3D face capture that takes advantage of a collection of unconstrained videos of a subject as prior information. Our proposal builds on a two-stage approach. We start by reconstructing a detailed 3D face avatar of the person, capturing both precise geometry and appearance from a collection of videos. We then use the encoder from a pre-trained monocular face reconstruction method, substituting its decoder with our personalized model, and proceed with transfer learning on the video collection. Using our pre-estimated image formation model, we obtain a more precise self-supervision objective, enabling improved expression and pose alignment. This results in a trained encoder capable of efficiently regressing pose and expression parameters in real time from previously unseen images, which, combined with our personalized geometry model, yields more accurate, high-fidelity mesh inference. Through extensive qualitative and quantitative evaluation, we showcase the superiority of our final model compared to state-of-the-art baselines and demonstrate its ability to generalize to unseen poses, expressions, and lighting.
DOI URL BibTeX

Perceiving Systems Article StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal Ye, C., Qiu, L., Gu, X., Zuo, Q., Wu, Y., Dong, Z., Bo, L., Xiu, Y., Han, X. ACM Transactions on Graphics, 43(6):1-18, ACM, December 2024 (Published)
This work addresses the challenge of high-quality surface normal estimation from monocular color inputs (i.e., images and videos), a field that has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step, which slows down estimation. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blur, and low image quality. It is also robust to transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy: it starts with a one-step normal estimator (YOSO) that derives an initial normal guess, which is relatively coarse but reliable, followed by a semantic-guided refinement process (SG-DRN) that refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScannetV2, and NYUv2, as well as in various downstream tasks, such as surface reconstruction and normal enhancement. These results show that StableNormal retains both "stability" and "sharpness" for accurate normal estimation. StableNormal represents an initial attempt to repurpose diffusion priors for deterministic estimation. To democratize this approach, the code and models have been made publicly available.
DOI BibTeX

Perceiving Systems Ph.D. Thesis Beyond the Surface: Statistical Approaches to Internal Anatomy Prediction Keller, M. University of Tübingen, November 2024 (Published)
The creation of personalized anatomical digital twins is important in the fields of medicine, computer graphics, sports science, and biomechanics. But observing a subject's anatomy requires expensive medical devices (MRI or CT), and creating a digital model is often time-consuming and involves manual effort. Instead, we can leverage the fact that the shape of the body surface is correlated with the internal anatomy; indeed, the external body shape is related to the bone lengths, the angle of skeletal articulation, and the thickness of various soft tissues. In this thesis, we leverage the correlation between body shape and anatomy and aim to infer the internal anatomy solely from the external appearance. Learning this correlation requires paired observations of people's body shape and their internal anatomy, which raises three challenges. First, building such datasets requires specific capture modalities. Second, these data must be annotated, i.e., the body shape and anatomical structures must be identified and segmented, which is often a tedious manual task requiring expertise. Third, to learn a model able to capture the correlation between body shape and internal anatomy, the data of people with various shapes and poses has to be put into correspondence. In this thesis, we cover three works that focus on learning this correlation. We show that we can infer the skeleton geometry, the bone location inside the body, and the soft tissue location solely from the external body shape. First, in the OSSO project, we leverage 2D medical scans to construct a paired dataset of 3D body shapes and corresponding 3D skeleton shapes. This dataset allows us to learn the correlation between body and skeleton shapes, enabling the inference of a custom skeleton based on an individual's body. However, since this learning process is based on static views of subjects in specific poses, we cannot evaluate the accuracy of skeleton inference in different poses.
To predict the bone orientation within the body in various poses, we need dynamic data. To track bones inside the body in motion, we can leverage methods from the biomechanics field. So in the second work, instead of medical imaging, we use a biomechanical skeletal model along with simulation to build a paired dataset of bodies in motion and their corresponding skeletons. In this work, we build such a dataset and learn SKEL, a body shape and skeleton model that includes the locations of anatomical bones from any body shape and in any pose. After dealing with the skeletal structure, we broaden our focus to include different layers of soft tissues. In the third work, HIT, we leverage segmented medical data to learn to predict the distribution of adipose tissues (fat) and lean tissues (muscle, organs, etc.) inside the body.
pdf URL BibTeX

Deep Models and Optimization Conference Paper Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise Monzio Compagnoni, E., Liu, T., Islamov, R., Proske, F. N., Orvieto, A., Lucchi, A. In The Thirteenth International Conference on Learning Representations, ICLR 2025, November 2024 (Accepted) BibTeX

Safety- and Efficiency- aligned Learning Conference Paper Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers Singh, S., Singhania, P., Ranjan, A., Kirchenbauer, J., Geiping, J., Wen, Y., Jain, N., Hans, A., Shu, M., Tomar, A., Goldstein, T., Bhatele, A. International Conference for High Performance Computing, Networking, Storage and Analysis SC (SC24), 36-49, Supercomputing, IEEE Digital Library, Atlanta, GA, International Conference for High Performance Computing, November 2024 (Published) DOI URL BibTeX

Perceiving Systems Ph.D. Thesis Aerial Markerless Motion Capture Saini, N. November 2024 (Published)
Human motion capture (mocap) is important for several applications, such as healthcare, sports, and animation. Existing markerless mocap methods employ multiple static and calibrated RGB cameras to infer the subject's pose. These methods are not suitable for outdoor and unstructured scenarios: they need an extra calibration step before the mocap session and cannot dynamically adapt the viewpoint for the best mocap performance. A mocap setup consisting of multiple unmanned aerial vehicles with onboard cameras is ideal for such situations. However, estimating the subject's motion together with the camera motions is an under-constrained problem. In this thesis, we explore multiple approaches that split this problem into multiple stages. We obtain prior knowledge or rough estimates of the subject's or the cameras' motion in the initial stages and exploit them in the final stages. In our work AirCap-Pose-Estimator, we use extra sensors (an IMU and a GPS receiver) on the multiple moving cameras to obtain approximate camera poses. We use these estimates to jointly optimize the camera poses, the 3D body pose, and the subject's shape to robustly fit the 2D keypoints of the subject. We show that camera pose estimates from the sensors alone are not accurate enough, and that our joint optimization formulation improves the accuracy of the camera poses while estimating the subject's poses. Placing extra sensors on the cameras is not always feasible. Therefore, in our work AirPose, we introduce a distributed neural network that runs on board, estimating the subject's motion and calibrating the cameras relative to the subject. We use realistic human scans with ground truth to train our network and further fine-tune it on a small amount of real-world data. We then propose a bundle-adjustment method (AirPose+), which uses the initial estimates from our network to recover high-quality motions of the subject and the cameras.
Finally, we consider a generic setup consisting of multiple static and moving cameras. We propose a method that estimates the poses of the cameras and the human relative to the ground plane using only 2D human keypoints. We learn a human motion prior from a large amount of human mocap data and use it in a novel multi-stage optimization approach to fit the SMPL human body model and the camera poses to the 2D keypoints. We show that, in addition to aerial cameras, our method works for smartphone cameras and standard RGB ground cameras. This thesis advances the field of markerless mocap, which is currently limited to multiple static calibrated RGB cameras. Our methods allow the user to use moving RGB cameras and skip the extrinsic calibration. In the future, we will explore using a single moving camera without even needing camera intrinsics.
thesis BibTeX

Organizational Leadership and Diversity Article From challenges to opportunities: navigating the human response to automated agents in the workplace Ðula, I., Berberena, T., Keplinger, K., Wirzberger, M. Humanities and Social Sciences Communications, 11:1454, November 2024 (Published)
Workers are increasingly embracing Artificial Intelligence (AI) to optimise various aspects of their operations in the workplace. While AI offers new opportunities, it also presents unintended challenges that workers must carefully navigate. This paper aims to develop a deeper understanding of workers' experiences of interacting with automated agents (AA) in the workplace and to provide actionable recommendations for organisational leaders to achieve positive outcomes. We propose and test a simulation model that quantifies and predicts workers' experiences with AA, shedding light on the interplay of diverse variables, such as workload, effort, and trust. Our findings suggest that lower-efficiency AA might outperform higher-efficiency ones due to the constraining influence of trust on adoption rates. Additionally, we find that lower initial trust in AA could lead to increased usage in certain scenarios, and that stronger emotional and social responses to the use of AA may foster greater trust but result in decreased AA utilisation. This interdisciplinary research blends a systems dynamics approach with management theories and psychological concepts, aiming to bridge existing gaps and foster the sustainable and effective implementation of AA in the workplace. Ultimately, our research contributes to advancing the field of human-AI interaction in the workplace.
DOI URL BibTeX

Haptic Intelligence Ph.D. Thesis Data-Driven Needle Puncture Detection for the Delivery of Urgent Medical Care in Space L’Orsa, R. University of Calgary, Calgary, Canada, November 2024, Department of Electrical and Computer Engineering (Published)
Needle thoracostomy (NT) is a surgical procedure that treats one of the most preventable causes of trauma-related death: dangerous accumulations of air between the chest wall and the lungs. However, needle-tip overshoot of the target space can result in the inadvertent puncture of critical structures like the heart. This type of complication is fatal without urgent surgical care, which is not available in resource-poor environments like space. Since NT is performed blind, operators rely on tool sensations to identify when the needle has reached its target. Needle instrumentation could enable puncture notifications to help operators limit tool-tip overshoot, but such a solution requires reliable puncture detection from manual (i.e., variable-velocity) needle insertion data streams. Data-driven puncture-detection (DDPD) algorithms are appropriate for this application, but their performance has historically been too low for use in safety-critical applications. This work contributes towards the development of an intelligent device for manual NT assistance by proposing two novel DDPD algorithms. Three data sets are collected that provide needle forces and displacements acquired during insertions into ex vivo porcine tissue analogs for the human chest, and factors affecting DDPD algorithm performance are analyzed in these data. Puncture event features are examined for each sensor, and the suitability of both accelerometer measurements and diffuse reflectance measurements is evaluated in the context of NT. Finally, DDPD ensembles are proposed that yield a 5.1-fold improvement in precision compared to the traditional force-only DDPD approach. These results lay a foundation for improving the urgent delivery of percutaneous procedures in space and other resource-poor settings.
BibTeX

Haptic Intelligence Autonomous Learning Empirical Inference Miscellaneous Demonstration: Minsight - A Soft Vision-Based Tactile Sensor for Robotic Fingertips Andrussow, I., Sun, H., Martius, G., Kuchenbecker, K. J. Hands-on demonstration presented at the Conference on Robot Learning (CoRL), Munich, Germany, November 2024 (Published)
Beyond vision and hearing, tactile sensing enhances a robot's ability to dexterously manipulate unfamiliar objects and safely interact with humans. Giving touch sensitivity to robots requires compact, robust, affordable, and efficient hardware designs, especially for high-resolution tactile sensing. We present a soft vision-based tactile sensor engineered to meet these requirements. Comparable in size to a human fingertip, Minsight uses machine learning to output high-resolution directional contact force distributions at 60 Hz. Minsight's tactile force maps enable precise sensing of fingertip contacts, which we use in this hands-on demonstration to allow a 3-DoF robot arm to physically track contact with a user's finger. While observing the colorful image captured by Minsight's internal camera, attendees can experience how its ability to detect delicate touches in all directions facilitates real-time robot interaction.
BibTeX