2025


OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics

Gozlan, Y., Falisse, A., Uhlrich, S., Gatti, A., Black, M., Chaudhari, A.

In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), February 2025 (inproceedings)

Abstract
Pose estimation has promised to impact healthcare by enabling more practical methods to quantify nuances of human movement and biomechanics. However, despite the inherent connection between pose estimation and biomechanics, these disciplines have largely remained disparate. For example, most current pose estimation benchmarks use metrics such as Mean Per Joint Position Error, Percentage of Correct Keypoints, or mean Average Precision to assess performance, without quantifying kinematic and physiological correctness, key aspects for biomechanics. To alleviate this challenge, we develop OpenCapBench to offer an easy-to-use unified benchmark to assess common tasks in human pose estimation, evaluated under physiological constraints. OpenCapBench computes consistent kinematic metrics through joint angles provided by an open-source musculoskeletal modeling software (OpenSim). Through OpenCapBench, we demonstrate that current pose estimation models use keypoints that are too sparse for accurate biomechanics analysis. To mitigate this challenge, we introduce SynthPose, a new approach that enables finetuning of pre-trained 2D human pose models to predict an arbitrarily denser set of keypoints for accurate kinematic analysis through the use of synthetic data. Finetuning prior models on such synthetic data leads to a twofold reduction in joint angle errors. Moreover, OpenCapBench allows users to benchmark their own developed models on our clinically relevant cohort. Overall, OpenCapBench bridges the computer vision and biomechanics communities, aiming to drive simultaneous advances in both areas.
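
As a rough illustration of the metric gap described above, the following minimal sketch (not OpenCapBench's actual code; array shapes and units are assumptions) contrasts a standard keypoint metric with the joint-angle error that biomechanics requires:

    import numpy as np

    def mpjpe(pred_xyz, gt_xyz):
        # Mean Per Joint Position Error over (frames, joints, 3) arrays, in mm.
        return np.linalg.norm(pred_xyz - gt_xyz, axis=-1).mean()

    def mean_joint_angle_error(pred_deg, gt_deg):
        # Mean absolute error over (frames, angles) joint-angle trajectories, in
        # degrees; OpenCapBench obtains such angles via OpenSim inverse kinematics.
        return np.abs(pred_deg - gt_deg).mean()

Two predictions with similar MPJPE can still differ substantially in joint-angle error, which is the quantity the benchmark evaluates.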

ps

arXiv [BibTex]


2024


ImageNot: A Contrast with ImageNet Preserves Model Rankings

Salaudeen, O., Hardt, M.

arXiv preprint arXiv:2404.02112, 2024 (conference) Submitted

Abstract
We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other aspects. We show that key model architectures developed for ImageNet over the years rank identically when trained and evaluated on ImageNot to how they rank on ImageNet. This is true when training models from scratch or fine-tuning them. Moreover, the relative improvements of each model over earlier models strongly correlate in both datasets. We further give evidence that ImageNot has similar utility to ImageNet for transfer learning purposes. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This stands in contrast with absolute accuracy numbers that typically drop sharply even under small changes to a dataset.
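
The headline claim is about rank preservation, which can be checked with a rank correlation; a minimal sketch (the accuracy numbers below are invented for illustration):

    from scipy.stats import spearmanr

    imagenet_acc = {"alexnet": 0.57, "vgg16": 0.72, "resnet50": 0.76, "vit_b": 0.81}
    imagenot_acc = {"alexnet": 0.41, "vgg16": 0.55, "resnet50": 0.60, "vit_b": 0.66}

    models = sorted(imagenet_acc)
    rho, _ = spearmanr([imagenet_acc[m] for m in models],
                       [imagenot_acc[m] for m in models])
    print(f"Spearman rank correlation: {rho:.2f}")  # 1.0 means identical rankings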

sf

ArXiv [BibTex]



Predictors from Causal Features Do Not Generalize Better to New Domains

Nastl, V. Y., Hardt, M.

arXiv preprint arXiv:2402.09891, 2024 (conference) Submitted

Abstract
We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets covering applications in health, employment, education, social benefits, and politics. Each dataset comes with multiple domains, allowing us to test how well a model trained in one domain performs in another. For each prediction task, we select features that have a causal influence on the target of prediction. Our goal is to test the hypothesis that models trained on causal features generalize better across domains. Without exception, we find that predictors using all available features, regardless of causality, have better in-domain and out-of-domain accuracy than predictors using causal features. Moreover, even the absolute drop in accuracy from one domain to the other is no better for causal predictors than for models that use all features. If the goal is to generalize to new domains, practitioners might as well train the best possible model on all available features.
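
A minimal sketch of the paper's comparison, assuming a tabular dataset split by domain and a hand-selected causal feature list (column names and the model choice are illustrative, not the paper's pipeline):

    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score

    def transfer_accuracy(train_df, test_df, features, target="y"):
        # Train in one domain, evaluate in another, using the given feature set.
        model = GradientBoostingClassifier().fit(train_df[features], train_df[target])
        return accuracy_score(test_df[target], model.predict(test_df[features]))

    # Finding reported above: transfer_accuracy(..., all_features) exceeds
    # transfer_accuracy(..., causal_features) on every task studied.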

sf

ArXiv [BibTex]


An Engine Not a Camera: Measuring Performative Power of Online Search

Mendler-Dünner, C., Carovano, G., Hardt, M.

arXiv preprint arXiv:2405.19073, 2024 (conference) Submitted

Abstract
The power of digital platforms is at the center of major ongoing policy and regulatory efforts. To advance existing debates, we designed and executed an experiment to measure the power of online search providers, building on the recent definition of performative power. Instantiated in our setting, performative power quantifies the ability of a search engine to steer web traffic by rearranging results. To operationalize this definition we developed a browser extension that performs unassuming randomized experiments in the background. These randomized experiments emulate updates to the search algorithm and identify the causal effect of different content arrangements on clicks. We formally relate these causal effects to performative power. Analyzing tens of thousands of clicks, we discuss what our robust quantitative findings say about the power of online search engines. More broadly, we envision our work to serve as a blueprint for how performative power and online experiments can be integrated with future investigations into the economic power of digital platforms.
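
The core estimand is a difference in click rates between randomized content arrangements; a toy simulation of such an estimator (the click model is invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    swapped = rng.integers(0, 2, n)               # 1 = top two results swapped
    p_click = np.where(swapped == 1, 0.18, 0.30)  # hypothetical click probabilities
    clicked = rng.random(n) < p_click

    effect = clicked[swapped == 1].mean() - clicked[swapped == 0].mean()
    print(f"estimated effect of the swap on clicks to result 1: {effect:+.3f}")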

sf

ArXiv [BibTex]



MotionFix: Text-Driven 3D Human Motion Editing

Athanasiou, N., Cseke, A., Diomataris, M., Black, M. J., Varol, G.

In SIGGRAPH Asia 2024 Conference Proceedings, ACM, December 2024 (inproceedings) To be published

Abstract
The focus of this paper is 3D motion editing. Given a 3D human motion and a textual description of the desired modification, our goal is to generate an edited motion as described by the text. The challenges include the lack of training data and the design of a model that faithfully edits the source motion. In this paper, we address both these challenges. We build a methodology to semi-automatically collect a dataset of triplets comprising (i) a source motion, (ii) a target motion, and (iii) an edit text, and use it to create a new dataset. Having access to such data allows us to train a conditional diffusion model that takes both the source motion and the edit text as input. We further build various baselines trained only on text-motion pair datasets and show the superior performance of our model trained on triplets. We introduce new retrieval-based metrics for motion editing and establish a new benchmark on the evaluation set. Our results are encouraging, paving the way for further research on fine-grained motion generation. Code and models will be made publicly available.

ps

link (url) [BibTex]



Questioning the Survey Responses of Large Language Models

Dominguez-Olmedo, R., Hardt, M., Mendler-Dünner, C.

arXiv preprint arXiv:2306.07951, 2024 (conference) Submitted

Abstract
As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models in order to investigate the population represented by their responses. In this work, we critically examine language models' survey responses on the basis of the well-established American Community Survey by the U.S. Census Bureau and investigate whether they elicit a faithful representation of any human population. Using a de-facto standard multiple-choice prompting technique and evaluating 39 different language models using systematic experiments, we establish two dominant patterns: First, models' responses are governed by ordering and labeling biases, leading to variations across models that do not persist after adjusting for systematic biases. Second, models' responses do not contain the entropy variations and statistical signals typically found in human populations. As a result, a binary classifier can almost perfectly differentiate model-generated data from the responses of the U.S. census. At the same time, models' relative alignment with different demographic subgroups can be predicted from the subgroups' entropy, irrespective of the model's training data or training strategy. Taken together, our findings suggest caution in treating models' survey responses as equivalent to those of human populations.
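
One signal mentioned above is the entropy of answer distributions; a minimal sketch of that statistic (the answer counts are invented):

    import numpy as np

    def answer_entropy(counts):
        # Shannon entropy (bits) of a multiple-choice answer distribution.
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return float(-(p * np.log2(p)).sum())

    print(answer_entropy([412, 380, 105, 103]))  # human-like spread: high entropy
    print(answer_entropy([978, 12, 6, 4]))       # near-deterministic model: low entropy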

sf

ArXiv [BibTex]


Training on the Test Task Confounds Evaluation and Emergence

Dominguez-Olmedo, R., Dorner, F. E., Hardt, M.

arXiv preprint arXiv:2407.07890, 2024 (conference) Submitted

Abstract
We study a fundamental problem in the evaluation of large language models that we call training on the test task. Unlike wrongful practices like training on the test data, leakage, or data contamination, training on the test task is not malpractice. Rather, the term describes a growing set of techniques to include task-relevant data in the pretraining stage of a language model. We demonstrate that training on the test task confounds both relative model evaluations and claims about emergent capabilities. We argue that the seeming superiority of one model family over another may be explained by a different degree of training on the test task. To this end, we propose an effective method to adjust for training on the test task by fine-tuning each model under comparison on the same task-relevant data before evaluation. We then show that instances of emergent behavior largely vanish once we adjust for training on the test task. This also applies to reported instances of emergent behavior that cannot be explained by the choice of evaluation metric. Our work promotes a new perspective on the evaluation of large language models with broad implications for benchmarking and the study of emergent capabilities.
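
The proposed adjustment is procedural: fine-tune every model under comparison on the same task-relevant data before evaluating. A schematic sketch, where load_model, finetune, and evaluate are hypothetical placeholders rather than a real API:

    def adjusted_scores(model_names, task_data, benchmark):
        scores = {}
        for name in model_names:
            model = load_model(name)            # hypothetical loader
            model = finetune(model, task_data)  # same task-relevant data for every model
            scores[name] = evaluate(model, benchmark)
        return scores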

sf

ArXiv [BibTex]


Demonstration: OCRA - A Kinematic Retargeting Algorithm for Expressive Whole-Arm Teleoperation

Mohan, M., Kuchenbecker, K. J.

Hands-on demonstration presented at the Conference on Robot Learning (CoRL), Munich, Germany, November 2024 (misc) Accepted

Abstract
Traditional teleoperation systems focus on controlling the pose of the end-effector (task space), often neglecting the additional degrees of freedom present in human and many robotic arms. This demonstration presents the Optimization-based Customizable Retargeting Algorithm (OCRA), which was designed to map motions from one serial kinematic chain to another in real time. OCRA is versatile, accommodating arbitrary joint counts and segment lengths, and it can retarget motions from human arms to kinematically different serial robot arms with revolute joints both expressively and efficiently. One of OCRA's key features is its customizability, allowing the user to adjust the emphasis between hand orientation error and the configuration error of the arm's central line, which we call the arm skeleton. To evaluate the perceptual quality of the motions generated by OCRA, we conducted a video-watching study with 70 participants; the results indicated that the algorithm produces robot motions that closely resemble human movements, with a median rating of 78/100, particularly when the arm skeleton and hand orientation error weights are balanced. In this demonstration, the presenter will wear an Xsens MVN Link and teleoperate the arms of a NAO child-size humanoid robot to highlight OCRA's ability to create intuitive and human-like whole-arm motions.
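
The customizability described above amounts to a weighted trade-off between two error terms. A minimal sketch of such an objective (not OCRA's actual implementation; fk, the robot's forward kinematics, is assumed given):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.transform import Rotation

    def retargeting_cost(q, fk, target_hand_R, target_skeleton_pts, w_hand=0.5):
        # fk(q) -> (3x3 hand rotation matrix, (k, 3) points along the arm's central line).
        hand_R, skeleton_pts = fk(q)
        # Geodesic orientation error between rotation matrices, in radians.
        e_hand = np.linalg.norm(Rotation.from_matrix(hand_R.T @ target_hand_R).as_rotvec())
        # Arm-skeleton configuration error: mean distance between corresponding points.
        e_skel = np.linalg.norm(skeleton_pts - target_skeleton_pts, axis=1).mean()
        return w_hand * e_hand + (1.0 - w_hand) * e_skel

    # Per frame: q_t = minimize(retargeting_cost, q_prev, args=(fk, R_t, s_t)).x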

hi

[BibTex]



Demonstration: Minsight - A Soft Vision-Based Tactile Sensor for Robotic Fingertips

Andrussow, I., Sun, H., Martius, G., Kuchenbecker, K. J.

Hands-on demonstration presented at the Conference on Robot Learning (CoRL), Munich, Germany, November 2024 (misc) Accepted

Abstract
Beyond vision and hearing, tactile sensing enhances a robot's ability to dexterously manipulate unfamiliar objects and safely interact with humans. Giving touch sensitivity to robots requires compact, robust, affordable, and efficient hardware designs, especially for high-resolution tactile sensing. We present a soft vision-based tactile sensor engineered to meet these requirements. Comparable in size to a human fingertip, Minsight uses machine learning to output high-resolution directional contact force distributions at 60 Hz. Minsight's tactile force maps enable precise sensing of fingertip contacts, which we use in this hands-on demonstration to allow a 3-DoF robot arm to physically track contact with a user's finger. While observing the colorful image captured by Minsight's internal camera, attendees can experience how its ability to detect delicate touches in all directions facilitates real-time robot interaction.

al hi ei

Project Page [BibTex]



Active Haptic Feedback for a Virtual Wrist-Anchored User Interface

Bartels, J. U., Sanchez-Tamayo, N., Sedlmair, M., Kuchenbecker, K. J.

Hands-on demonstration presented at the ACM Symposium on User Interface Software and Technology (UIST), Pittsburgh, USA, October 2024 (misc) Accepted

hi

DOI [BibTex]



Stable Video Portraits

Ostrek, M., Thies, J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings) Accepted

Abstract
Rapid advances in the field of generative AI and text-to-image methods in particular have transformed the way we interact with and perceive computer-generated imagery today. In parallel, much progress has been made in 3D face reconstruction, using 3D Morphable Models (3DMM). In this paper, we present Stable Video Portraits, a novel hybrid 2D/3D generation method that outputs photorealistic videos of talking faces leveraging a large pre-trained text-to-image prior (2D), controlled via a 3DMM (3D). Specifically, we introduce a person-specific fine-tuning of a general 2D stable diffusion model which we lift to a video model by providing temporal 3DMM sequences as conditioning and by introducing a temporal denoising procedure. As an output, this model generates temporally smooth imagery of a person with 3DMM-based controls, i.e., a person-specific avatar. The facial appearance of this person-specific avatar can be edited and morphed to text-defined celebrities, without any test-time fine-tuning. The method is analyzed quantitatively and qualitatively, and we show that our method outperforms state-of-the-art monocular head avatar methods.

ncs ps

link (url) [BibTex]



On predicting 3D bone locations inside the human body

Dakri, A., Arora, V., Challier, L., Keller, M., Black, M. J., Pujades, S.

In 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), October 2024 (inproceedings)

Abstract
Knowing the precise location of the bones inside the human body is key in several medical tasks, such as patient placement inside an imaging device or surgical navigation inside a patient. Our goal is to predict the bone locations using only an external 3D body surface observation. Existing approaches either validate their predictions on 2D data (X-rays) or with pseudo-ground truth computed from motion capture using biomechanical models. Thus, methods either suffer from a 3D-2D projection ambiguity or directly lack validation on clinical imaging data. In this work, we start with a dataset of segmented skin and long bones obtained from 3D full body MRI images that we refine into individual bone segmentations. To learn the skin to bones correlations, one needs to register the paired data. Few anatomical models allow registering a skeleton and the skin simultaneously. One such method, SKEL, has a skin and skeleton that is jointly rigged with the same pose parameters. However, it lacks the flexibility to adjust the bone locations inside its skin. To address this, we extend SKEL into SKEL-J to allow its bones to fit the segmented bones while its skin fits the segmented skin. These precise fits allow us to train SKEL-J to more accurately infer the anatomical joint locations from the skin surface. Our qualitative and quantitative results show how our bone location predictions are more accurate than all existing approaches. To foster future research, we make available for research purposes the individual bone segmentations, the fitted SKEL-J models, as well as the new inference methods.

ps

Project page [BibTex]



Synthesizing Environment-Specific People in Photographs

Ostrek, M., O’Sullivan, C., Black, M., Thies, J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings) Accepted

Abstract
We present ESP, a novel method for context-aware full-body generation, that enables photo-realistic synthesis and inpainting of people wearing clothing that is semantically appropriate for the scene depicted in an input photograph. ESP is conditioned on a 2D pose and contextual cues that are extracted from the photograph of the scene and integrated into the generation process, where the clothing is modeled explicitly with human parsing masks (HPM). Generated HPMs are used as tight guiding masks for inpainting, such that no changes are made to the original background. Our models are trained on a dataset containing a set of in-the-wild photographs of people covering a wide range of different environments. The method is analyzed quantitatively and qualitatively, and we show that ESP outperforms the state-of-the-art on the task of contextual full-body generation.

ncs ps

link (url) [BibTex]



HUMOS: Human Motion Model Conditioned on Body Shape

Tripathi, S., Taheri, O., Lassner, C., Black, M. J., Holden, D., Stoll, C.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings)

Abstract
Generating realistic human motion is essential for many computer vision and graphics applications. The wide variety of human body shapes and sizes greatly impacts how people move. However, most existing motion models ignore these differences, relying on a standardized, average body. This leads to uniform motion across different body types, where movements don't match their physical characteristics, limiting diversity. To solve this, we introduce a new approach to develop a generative motion model based on body shape. We show that it's possible to train this model using unpaired data by applying cycle consistency, intuitive physics, and stability constraints, which capture the relationship between identity and movement. The resulting model generates diverse, physically plausible, and dynamically stable human motions that are both quantitatively and qualitatively more realistic than current state-of-the-art methods.

ps

project arXiv [BibTex]



GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Zhang, H., Christen, S., Fan, Z., Hilliges, O., Song, J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (inproceedings) Accepted

ps

Code Video Paper [BibTex]



Cutaneous Electrohydraulic (CUTE) Wearable Devices for Pleasant Broad-Bandwidth Haptic Cues

Sanchez-Tamayo, N., Yoder, Z., Rothemund, P., Ballardini, G., Keplinger, C., Kuchenbecker, K. J.

Advanced Science, (2402461):1-14, September 2024 (article)

Abstract
By focusing on vibrations, current wearable haptic devices underutilize the skin's perceptual capabilities. Devices that provide richer haptic stimuli, including contact feedback and/or variable pressure, are typically heavy and bulky due to the underlying actuator technology and the low sensitivity of hairy skin, which covers most of the body. This paper presents a system architecture for compact wearable devices that deliver salient and pleasant broad-bandwidth haptic cues: Cutaneous Electrohydraulic (CUTE) devices combine a custom materials design for soft haptic electrohydraulic actuators that feature high stroke, high force, and electrical safety with a comfortable mounting strategy that places the actuator in a non-contact resting position. A prototypical wrist-wearable CUTE device produces rich tactile sensations by making and breaking contact with the skin (2.44 mm actuation stroke), applying high controllable forces (exceeding 2.3 N), and delivering vibrations at a wide range of amplitudes and frequencies (0-200 Hz). A perceptual study with fourteen participants achieved 97.9% recognition accuracy across six diverse cues and verified their pleasant and expressive feel. This system architecture for wearable devices gives unprecedented control over the haptic cues delivered to the skin, providing an elegant and discreet way to activate the user's sense of touch.

hi rm

DOI [BibTex]


Electrohydraulic Musculoskeletal Robotic Leg for Agile, Adaptive, yet Energy-Efficient Locomotion

Buchner, T. J. K., Fukushima, T., Kazemipour, A., Gravert, S., Prairie, M., Romanescu, P., Arm, P., Zhang, Y., Wang, X., Zhang, S. L., Walter, J., Keplinger, C., Katzschmann, R. K.

Nature Communications, 15(1), September 2024 (article)

Abstract
Robotic locomotion in unstructured terrain demands an agile, adaptive, and energy-efficient architecture. To traverse such terrains, legged robots use rigid electromagnetic motors and sensorized drivetrains to adapt to the environment actively. These systems struggle to compete with animals that excel through their agile and effortless motion in natural environments. We propose a bio-inspired musculoskeletal leg architecture driven by antagonistic pairs of electrohydraulic artificial muscles. Our leg is mounted on a boom arm and can adaptively hop on varying terrain in an energy-efficient yet agile manner. It can also detect obstacles through capacitive self-sensing. The leg performs powerful and agile gait motions beyond 5 Hz and high jumps up to 40 % of the leg height. Our leg’s tunable stiffness and inherent adaptability allow it to hop over grass, sand, gravel, pebbles, and large rocks using only open-loop force control. The electrohydraulic leg features a low cost of transport (0.73), and while squatting, it consumes only a fraction of the energy (1.2 %) compared to its conventional electromagnetic counterpart. Its agile, adaptive, and energy-efficient properties would open a roadmap toward a new class of musculoskeletal robots for versatile locomotion and operation in unstructured natural environments.

rm

Press release Video (overview) Video (technical description) Article in pdf link (url) DOI [BibTex]



Learning to Control Emulated Muscles in Real Robots: Towards Exploiting Bio-Inspired Actuator Morphology

Schumacher, P., Krause, L., Schneider, J., Büchler, D., Martius, G., Haeufle, D.

In 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob), September 2024 (inproceedings) Accepted

ei

arXiv [BibTex]



Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Fan, Z., Ohkawa, T., Yang, L., Lin, N., Zhou, Z., Zhou, S., Liang, J., Gao, Z., Zhang, X., Zhang, X., Li, F., Zheng, L., Lu, F., Zeid, K. A., Leibe, B., On, J., Baek, S., Prakash, A., Gupta, S., He, K., Sato, Y., Hilliges, O., Chang, H. J., Yao, A.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (inproceedings) Accepted

ps

Paper Leaderboard [BibTex]



AWOL: Analysis WithOut synthesis using Language

Zuffi, S., Black, M. J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (inproceedings)

ps

Paper [BibTex]



Modeling Shank Tissue Properties and Quantifying Body Composition with a Wearable Actuator-Accelerometer Set

Rokhmanova, N., Martus, J., Faulkner, R., Fiene, J., Kuchenbecker, K. J.

Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (misc)

hi

Project Page [BibTex]



Moûsai: Efficient Text-to-Music Diffusion Models

Schneider, F., Kamal, O., Jin, Z., Schölkopf, B.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, pages: 8050-8068, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024 (conference)

ei

link (url) [BibTex]



Modelling Variability in Human Annotator Simulation

Wu*, W., Chen*, W., Zhang, C., Woodland, P. C.

Findings of the Association for Computational Linguistics (ACL), pages: 1139-1157, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024, *equal contribution (conference)

ei

link (url) [BibTex]



Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals

Ortu*, F., Jin*, Z., Doimo, D., Sachan, M., Cazzaniga, A., Schölkopf, B.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, pages: 8420-8436, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024, *equal contribution (conference)

ei

arXiv link (url) [BibTex]



CausalCite: A Causal Formulation of Paper Citations

Kumar, I., Jin, Z., Mokhtarian, E., Guo, S., Chen, Y., Kiyavash, N., Sachan, M., Schölkopf, B.

Findings of the Association for Computational Linguistics (ACL), pages: 8395-8410, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024 (conference)

ei

arXiv link (url) [BibTex]



Re-Thinking Inverse Graphics with Large Language Models

Kulits, P., Feng, H., Liu, W., Abrevaya, V., Black, M. J.

Transactions on Machine Learning Research, August 2024 (article)

Abstract
Inverse graphics (the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene) is a fundamental challenge in computer vision and graphics. Successfully disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the environment. This complexity limits the ability of existing carefully engineered approaches to generalize across domains. Inspired by the zero-shot ability of large language models (LLMs) to generalize to novel contexts, we investigate the possibility of leveraging the broad world knowledge encoded in such models to solve inverse-graphics problems. To this end, we propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM, that autoregressively decodes a visual embedding into a structured, compositional 3D-scene representation. We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training. Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research.
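
A minimal architectural sketch of the design described above (frozen visual encoder, trainable LLM decoder, continuous numeric head); module interfaces are assumed, and this is not the released code:

    import torch
    import torch.nn as nn

    class IGLLMSketch(nn.Module):
        def __init__(self, vision_encoder, llm, d_llm, n_numeric):
            super().__init__()
            self.vision_encoder = vision_encoder.eval()  # frozen, per the paper
            for p in self.vision_encoder.parameters():
                p.requires_grad = False
            self.llm = llm                                   # autoregressive decoder
            self.numeric_head = nn.Linear(d_llm, n_numeric)  # continuous outputs

        def forward(self, image, scene_tokens):
            with torch.no_grad():
                vis = self.vision_encoder(image)     # visual embedding
            hidden = self.llm(vis, scene_tokens)     # hidden states for next-token prediction
            return self.numeric_head(hidden)         # real-valued scene attributes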

ps

link (url) [BibTex]



Leveraging Task Structures for Improved Identifiability in Neural Network Representations

Chen*, W., Horwood*, J., Heo, J., Hernández-Lobato, J. M.

Transactions on Machine Learning Research, August 2024, *equal contribution (article)

ei

link (url) [BibTex]



Adapting a High-Fidelity Simulation of Human Skin for Comparative Touch Sensing

Schulz, A., Serhat, G., Kuchenbecker, K. J.

Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (misc)

hi

[BibTex]



On the Growth of Mistakes in Differentially Private Online Learning: A Lower Bound Perspective

Dmitriev, D., Szabó, K., Sanyal, A.

Proceedings of the 37th Annual Conference on Learning Theory (COLT), 247, pages: 1379-1398, Proceedings of Machine Learning Research, (Editors: Agrawal, Shipra and Roth, Aaron), PMLR, July 2024, (talk) (conference)

ei

link (url) [BibTex]



Robustness of Nonlinear Representation Learning

Buchholz, S., Schölkopf, B.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 4785-4821, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

link (url) [BibTex]



Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for ODEs

Beck, J., Bosch, N., Deistler, M., Kadhim, K. L., Macke, J. H., Hennig, P., Berens, P.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 3305-3326, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

arXiv link (url) [BibTex]



Simultaneous identification of models and parameters of scientific simulators

Schröder, C., Macke, J. H.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 43895-43927, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

link (url) [BibTex]



Causal Action Influence Aware Counterfactual Data Augmentation

Urpi, N. A., Bagatella, M., Vlastelica, M., Martius, G.

In Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 1709-1729, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (inproceedings)

al

link (url) [BibTex]



Position: Understanding LLMs Requires More Than Statistical Generalization

Reizinger, P., Ujváry, S., Mészáros, A., Kerekes, A., Brendel, W., Huszár, F.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 42365-42390, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei robustml

arXiv link (url) [BibTex]



Diffusive Gibbs Sampling

Chen*, W., Zhang*, M., Paige, B., Hernández-Lobato, J. M., Barber, D.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 7731-7747, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024, *equal contribution (conference)

ei

link (url) [BibTex]



What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study

Jain, S., Lubana, E. S., Oksuz, K., Joy, T., Torr, P. H. S., Sanyal, A., Dokania, P. K.

ICML 2024 Workshop on Mechanistic Interpretability (Spotlight), July 2024 (conference)

ei

link (url) [BibTex]



Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test

Khojasteh, B., Solowjow, F., Trimpe, S., Kuchenbecker, K. J.

IEEE Transactions on Automation Science and Engineering, 21(3):4432-4447, July 2024 (article)

Abstract
Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and parameter tuning. To overcome these challenges, we propose an easily implemented framework that can directly handle heterogeneous data sources for classification tasks. Our data-versus-data approach automatically quantifies distinctive differences in distributions in a high-dimensional space via kernel two-sample testing between two sets extracted from multimodal data (e.g., images, sounds, haptic signals). We demonstrate the effectiveness of our technique by benchmarking against expertly engineered classifiers for visual-audio-haptic surface recognition due to the industrial relevance, difficulty, and competitive baselines of this application; ablation studies confirm the utility of key components of our pipeline. As shown in our open-source code, we achieve 97.2% accuracy on a standard multi-user dataset with 108 surface classes, outperforming the state-of-the-art machine-learning algorithm by 6% on a more difficult version of the task. The fact that our classifier obtains this performance with minimal data processing in the standard algorithm setting reinforces the powerful nature of kernel methods for learning to recognize complex patterns.

Note to Practitioners: We demonstrate how to apply the kernel two-sample test to a surface-recognition task, discuss opportunities for improvement, and explain how to use this framework for other classification problems with similar properties. Automating surface recognition could benefit both surface inspection and robot manipulation. Our algorithm quantifies class similarity and therefore outputs an ordered list of similar surfaces. This technique is well suited for quality assurance and documentation of newly received materials or newly manufactured parts. More generally, our automated classification pipeline can handle heterogeneous data sources including images and high-frequency time-series measurements of vibrations, forces, and other physical signals. As our approach circumvents the time-consuming process of feature engineering, both experts and non-experts can use it to achieve high-accuracy classification. It is particularly appealing for new problems without existing models and heuristics. In addition to strong theoretical properties, the algorithm is straightforward to use in practice since it requires only kernel evaluations. Its transparent architecture can provide fast insights into the given use case under different sensing combinations without costly optimization. Practitioners can also use our procedure to obtain the minimum data-acquisition time for independent time-series data from new sensor recordings.
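
The statistic at the heart of the pipeline is the kernel two-sample test; a minimal sketch of an unbiased squared-MMD estimate with an RBF kernel (bandwidth choice and feature extraction are omitted; this is not the authors' open-source code):

    import numpy as np

    def rbf_kernel(X, Y, sigma=1.0):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))

    def mmd2_unbiased(X, Y, sigma=1.0):
        # Unbiased estimate of squared MMD between samples X (n, d) and Y (m, d).
        Kxx, Kyy, Kxy = rbf_kernel(X, X, sigma), rbf_kernel(Y, Y, sigma), rbf_kernel(X, Y, sigma)
        n, m = len(X), len(Y)
        return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
                + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
                - 2 * Kxy.mean())

Classification then reduces to assigning a query set of measurements to the class whose reference sets yield the smallest estimated discrepancy.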

hi

DOI Project Page [BibTex]



Allocation Requires Prediction Only if Inequality Is Low

Shirali, A., Abebe*, R., Hardt*, M.

In Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR, July 2024, *equal contribution (inproceedings)

Abstract
Algorithmic predictions are emerging as a promising solution concept for efficiently allocating societal resources. Fueling their use is an underlying assumption that such systems are necessary to identify individuals for interventions. We propose a principled framework for assessing this assumption: Using a simple mathematical model, we evaluate the efficacy of prediction-based allocations in settings where individuals belong to larger units such as hospitals, neighborhoods, or schools. We find that prediction-based allocations outperform baseline methods using aggregate unit-level statistics only when between-unit inequality is low and the intervention budget is high. Our results hold for a wide range of settings for the price of prediction, treatment effect heterogeneity, and unit-level statistics’ learnability. Combined, our results highlight the potential limits of improving the efficacy of interventions through prediction.

sf

ArXiv link (url) [BibTex]



Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks

Zhang, G., Hardt, M.

In Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR, July 2024 (inproceedings)

Abstract
We examine multi-task benchmarks in machine learning through the lens of social choice theory. We draw an analogy between benchmarks and electoral systems, where models are candidates and tasks are voters. This suggests a distinction between cardinal and ordinal benchmark systems. The former aggregate numerical scores into one model ranking; the latter aggregate rankings for each task. We apply Arrow's impossibility theorem to ordinal benchmarks to highlight the inherent limitations of ordinal systems, particularly their sensitivity to the inclusion of irrelevant models. Inspired by Arrow's theorem, we empirically demonstrate a strong trade-off between diversity and sensitivity to irrelevant changes in existing multi-task benchmarks. Our result is based on new quantitative measures of diversity and sensitivity that we introduce. Sensitivity quantifies the impact that irrelevant changes to tasks have on a benchmark. Diversity captures the degree of disagreement in model rankings across tasks. We develop efficient approximation algorithms for both measures, as exact computation is computationally challenging. Through extensive experiments on seven cardinal benchmarks and eleven ordinal benchmarks, we demonstrate a clear trade-off between diversity and stability: The more diverse a multi-task benchmark, the more sensitive to trivial changes it is. Additionally, we show that the aggregated rankings of existing benchmarks are highly unstable under irrelevant changes. The code and data are available at https://socialfoundations.github.io/benchbench/.
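
A simplified version of the diversity measure (the paper's exact definitions differ; the scores below are invented): diversity as the average pairwise rank disagreement between per-task model rankings.

    from itertools import combinations
    from scipy.stats import kendalltau

    def diversity(task_scores):
        # task_scores: dict task -> list of model scores, same model order per task.
        dists = []
        for a, b in combinations(task_scores, 2):
            tau, _ = kendalltau(task_scores[a], task_scores[b])
            dists.append((1 - tau) / 2)   # 0 = rankings agree, 1 = fully reversed
        return sum(dists) / len(dists)

    scores = {"qa": [0.7, 0.6, 0.5], "math": [0.2, 0.5, 0.4], "code": [0.3, 0.1, 0.6]}
    print(f"diversity: {diversity(scores):.2f}")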

sf

ArXiv link (url) [BibTex]



Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Dorner, F. E., Hardt, M.

In Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR, July 2024 (inproceedings)

Abstract
We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.
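
A toy simulation of the claim (noise rate, accuracies, and budget are invented): under a fixed label budget, single labels on more samples identify the better classifier more reliably than majority-voting three labels on fewer samples.

    import numpy as np

    rng = np.random.default_rng(0)
    budget, p_flip = 3000, 0.3     # total label budget; per-label flip probability
    acc_a, acc_b = 0.74, 0.70      # true accuracies of classifiers A and B

    def correct_pick_rate(n, k, trials=2000):
        # P(we pick A) when comparing agreement with k-label majority votes on n points.
        wins = 0
        for _ in range(trials):
            label_wrong = (rng.random((n, k)) < p_flip).sum(axis=1) > k // 2
            # On binary labels, a prediction matches the label iff both are
            # correct or both are wrong, i.e. correct XOR label_wrong.
            agree_a = (rng.random(n) < acc_a) ^ label_wrong
            agree_b = (rng.random(n) < acc_b) ^ label_wrong
            wins += agree_a.mean() > agree_b.mean()
        return wins / trials

    print("majority of 3:", correct_pick_rate(budget // 3, 3))
    print("single label: ", correct_pick_rate(budget, 1))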

sf

ArXiv link (url) [BibTex]



Learning with 3D rotations, a hitchhiker’s guide to SO(3)

Geist, A. R., Frey, J., Zhobro, M., Levina, A., Martius, G.

In Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 15331-15350, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (inproceedings)

al

link (url) [BibTex]



LPGD: A General Framework for Backpropagation through Embedded Optimization Layers

Paulus, A., Martius, G., Musil, V.

In Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 39989-40014, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (inproceedings)

al

link (url) [BibTex]



Improving Neural Additive Models with Bayesian Principles

Bouchiat, K., Immer, A., Yèche, H., Rätsch, G., Fortuin, V.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 4416-4443, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

link (url) [BibTex]



Unveiling CLIP Dynamics: Linear Mode Connectivity and Generalization

Abdolahpourrostam, A., Sanyal, A., Moosavi-Dezfooli, S.

ICML 2024 Workshop on Foundation Models in the Wild, July 2024 (conference)

ei

link (url) [BibTex]



A Sparsity Principle for Partially Observable Causal Representation Learning

Xu, D., Yao, D., Lachapelle, S., Taslakian, P., von Kügelgen, J., Locatello, F., Magliacane, S.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 55389-55433, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

link (url) [BibTex]



ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations

Grigorev, A., Becherini, G., Black, M., Hilliges, O., Thomaszewski, B.

In ACM SIGGRAPH 2024 Conference Papers, pages: 1-10, SIGGRAPH ’24, Association for Computing Machinery, New York, NY, USA, July 2024 (inproceedings)

Abstract
Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present ContourCraft, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inputs, ContourCraft robustly recovers from intersections introduced through missed collisions, self-penetrating bodies, or errors in manually designed multi-layer outfits. The technical core of ContourCraft is a novel intersection contour loss that penalizes interpenetrations and encourages rapid resolution thereof. We integrate our intersection loss with a collision-avoiding repulsion objective into a neural cloth simulation method based on graph neural networks (GNNs). We demonstrate our method’s ability across a challenging set of diverse multi-layer outfits under dynamic human motions. Our extensive analysis indicates that ContourCraft significantly improves collision handling for learned simulation and produces visually compelling results.

ps

paper arXiv project video code DOI [BibTex]



Causal Inference from Competing Treatments

Stoica, A., Nastl, V. Y., Hardt, M.

In Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR, July 2024 (inproceedings)

Abstract
Many applications of RCTs involve the presence of multiple treatment administrators (from field experiments to online advertising) that compete for the subjects' attention. In the face of competition, estimating a causal effect becomes difficult, as the position at which a subject sees a treatment influences their response, and thus the treatment effect. In this paper, we build a game-theoretic model of agents who wish to estimate causal effects in the presence of competition, through a bidding system and a utility function that minimizes estimation error. Our main technical result establishes an approximation with a tractable objective that maximizes the sample value obtained through strategically allocating budget on subjects. This allows us to find an equilibrium in our model: we show that the tractable objective has a pure Nash equilibrium, and that any Nash equilibrium is an approximate equilibrium for our general objective that minimizes estimation error under broad conditions. Conceptually, our work successfully combines elements from causal inference and game theory to shed light on the equilibrium behavior of experimentation under competition.

sf

ArXiv link (url) [BibTex]



A Measure-Theoretic Axiomatisation of Causality and Kernel Regression

Park, J.

University of Tübingen, Germany, July 2024 (phdthesis)

ei

[BibTex]



Geometry-Aware Instrumental Variable Regression

Kremer, H., Schölkopf, B.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 25560-25582, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

link (url) [BibTex]



Targeted Reduction of Causal Models

Kekić, A., Schölkopf, B., Besserve, M.

40th Conference on Uncertainty in Artificial Intelligence (UAI), July 2024 (conference) To be published

ei

arXiv link (url) [BibTex]
