2025


OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics

Gozlan, Y., Falisse, A., Uhlrich, S., Gatti, A., Black, M., Chaudhari, A.

In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), February 2025 (inproceedings)

Abstract
Pose estimation has promised to impact healthcare by enabling more practical methods to quantify nuances of human movement and biomechanics. However, despite the inherent connection between pose estimation and biomechanics, these disciplines have largely remained disparate. For example, most current pose estimation benchmarks use metrics such as Mean Per Joint Position Error, Percentage of Correct Keypoints, or mean Average Precision to assess performance, without quantifying kinematic and physiological correctness, key aspects for biomechanics. To alleviate this challenge, we develop OpenCapBench to offer an easy-to-use unified benchmark to assess common tasks in human pose estimation, evaluated under physiological constraints. OpenCapBench computes consistent kinematic metrics through joint angles provided by an open-source musculoskeletal modeling software (OpenSim). Through OpenCapBench, we demonstrate that current pose estimation models use keypoints that are too sparse for accurate biomechanics analysis. To mitigate this challenge, we introduce SynthPose, a new approach that uses synthetic data to finetune pre-trained 2D human pose models to predict an arbitrarily denser set of keypoints for accurate kinematic analysis. Finetuning prior models on such synthetic data reduces joint angle errors twofold. Moreover, OpenCapBench allows users to benchmark their own models on our clinically relevant cohort. Overall, OpenCapBench bridges the computer vision and biomechanics communities, aiming to drive simultaneous advances in both areas.
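
Illustrative sketch: the SynthPose recipe described in the abstract, finetuning a pretrained 2D pose backbone so it predicts a denser, anatomically motivated keypoint set from synthetic (image, marker heatmap) pairs, follows a standard head-swap-and-finetune pattern. The backbone choice, the keypoint count K, and the heatmap loss below are assumptions made purely for illustration; this is not the benchmark's actual code.

```python
# Sketch of the SynthPose idea: pretrained backbone + new dense-keypoint heatmap head,
# finetuned on synthetic images with known anatomical marker locations.
import torch
import torch.nn as nn
from torchvision.models import resnet18

K = 43  # hypothetical dense anatomical marker count (illustrative, not the paper's value)

backbone = resnet18(weights=None)            # in practice one would load pretrained weights
backbone = nn.Sequential(*list(backbone.children())[:-2])  # keep spatial feature maps

head = nn.Sequential(                        # new heatmap head for K dense keypoints
    nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(256, K, kernel_size=1),
)
model = nn.Sequential(backbone, head)

optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

def finetune_step(images, target_heatmaps):
    """One supervised step on synthetic (image, dense-marker heatmap) pairs."""
    pred = model(images)
    loss = criterion(pred, target_heatmaps)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()

# Dummy batch with the shapes this head expects (224x224 input -> 28x28 heatmaps).
images = torch.randn(2, 3, 224, 224)
targets = torch.randn(2, K, 28, 28)
print(finetune_step(images, targets))
```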

ps

arXiv [BibTex]


2024


SPARK: Self-supervised Personalized Real-time Monocular Face Capture

Baert, K., Bharadwaj, S., Castan, F., Maujean, B., Christie, M., Abrevaya, V., Boukhayma, A.

In SIGGRAPH Asia 2024 Conference Proceedings, SIGGRAPH Asia 2024, December 2024 (inproceedings) Accepted

Abstract
Feedforward monocular face capture methods seek to reconstruct posed faces from a single image of a person. Current state-of-the-art approaches can regress parametric 3D face models in real time across a wide range of identities, lighting conditions, and poses by leveraging large image datasets of human faces. These methods, however, suffer from a clear limitation: the underlying parametric face model provides only a coarse estimate of the face shape, which limits their practical applicability in tasks that require precise 3D reconstruction (aging, face swapping, digital make-up, ...). In this paper, we propose a method for high-precision 3D face capture that takes advantage of a collection of unconstrained videos of a subject as prior information. Our proposal builds on a two-stage approach. We start by reconstructing a detailed 3D face avatar of the person, capturing both precise geometry and appearance from a collection of videos. We then use the encoder from a pre-trained monocular face reconstruction method, substituting its decoder with our personalized model, and proceed with transfer learning on the video collection. Using our pre-estimated image formation model, we obtain a more precise self-supervision objective, enabling improved expression and pose alignment. The result is a trained encoder capable of efficiently regressing pose and expression parameters in real time from previously unseen images, which, combined with our personalized geometry model, yields more accurate and high-fidelity mesh inference. Through extensive qualitative and quantitative evaluation, we showcase the superiority of our final model compared to state-of-the-art baselines, and demonstrate its generalization ability to unseen poses, expressions, and lighting.
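
Illustrative sketch: the encoder-reuse / decoder-swap pattern described in the abstract can be prototyped as below — keep a (pretrained) image-to-parameters encoder, replace its decoder with a frozen person-specific model, and finetune the encoder with a self-supervised reconstruction loss on the subject's videos. All modules here are toy stand-ins; SPARK's actual encoder, personalized avatar, and image formation model are far more involved.

```python
# Toy sketch of encoder reuse with a swapped-in, frozen personalized decoder.
import torch
import torch.nn as nn

class Encoder(nn.Module):               # stand-in for a pretrained image -> parameters encoder
    def __init__(self, n_params=64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, n_params))
    def forward(self, x):
        return self.net(x)

class PersonalizedDecoder(nn.Module):   # stand-in for the prebuilt person-specific avatar/renderer
    def __init__(self, n_params=64, image_size=64):
        super().__init__()
        self.net = nn.Linear(n_params, 3 * image_size * image_size)
        self.image_size = image_size
    def forward(self, params):
        img = self.net(params).view(-1, 3, self.image_size, self.image_size)
        return torch.sigmoid(img)

encoder = Encoder()                      # would be initialized from pretrained weights
decoder = PersonalizedDecoder()
for p in decoder.parameters():           # the personalized model stays fixed during transfer
    p.requires_grad_(False)

optim = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def transfer_step(frames):
    """Self-supervised step: encoded parameters -> personalized render -> match the frame."""
    params = encoder(frames)
    rendered = decoder(params)
    loss = nn.functional.l1_loss(rendered, frames)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

frames = torch.rand(4, 3, 64, 64)        # dummy batch of video frames
print(transfer_step(frames))
```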

ps

Website Code Paper+Supmat link (url) DOI [BibTex]

Latent Diffusion for Neural Spiking Data

Kapoor, J., Schulz, A., Vetter, J., Pei, F., Gao, R., Macke, J. H.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Didolkar, A. R., Goyal, A., Ke, N. R., Guo, S., Valko, M., Lillicrap, T. P., Rezende, D. J., Bengio, Y., Mozer, M. C., Arora, S.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Learning partitions from Context

Buchholz, S.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

ImageNot: A Contrast with ImageNet Preserves Model Rankings

Salaudeen, O., Hardt, M.

arXiv preprint arXiv:2404.02112, 2024 (conference) Submitted

Abstract
We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other aspects. We show that key model architectures developed for ImageNet over the years rank identically when trained and evaluated on ImageNot to how they rank on ImageNet. This is true when training models from scratch or fine-tuning them. Moreover, the relative improvements of each model over earlier models strongly correlate in both datasets. We further give evidence that ImageNot has a similar utility to ImageNet for transfer learning purposes. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This stands in contrast with absolute accuracy numbers that typically drop sharply even under small changes to a dataset.
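
Illustrative sketch: the ranking comparison described in the abstract amounts to checking whether per-architecture accuracies on two datasets induce the same ordering, e.g. via a rank correlation. The model names and accuracy numbers below are placeholders, not results from the paper.

```python
# Compare model rankings across two datasets with a Spearman rank correlation.
from scipy.stats import spearmanr

acc_dataset_a = {"alexnet": 0.56, "vgg16": 0.71, "resnet50": 0.76, "vit_b16": 0.81}
acc_dataset_b = {"alexnet": 0.49, "vgg16": 0.63, "resnet50": 0.69, "vit_b16": 0.74}

models = sorted(acc_dataset_a)
rho, p = spearmanr([acc_dataset_a[m] for m in models],
                   [acc_dataset_b[m] for m in models])
print(f"Spearman rank correlation of model rankings: {rho:.2f} (p={p:.3f})")
```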

sf

ArXiv [BibTex]

Predictors from Causal Features Do Not Generalize Better to New Domains

Nastl, V. Y., Hardt, M.

arXiv preprint arXiv:2402.09891, 2024 (conference) Submitted

Abstract
We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets covering applications in health, employment, education, social benefits, and politics. Each dataset comes with multiple domains, allowing us to test how well a model trained in one domain performs in another. For each prediction task, we select features that have a causal influence on the target of prediction. Our goal is to test the hypothesis that models trained on causal features generalize better across domains. Without exception, we find that predictors using all available features, regardless of causality, have better in-domain and out-of-domain accuracy than predictors using causal features. Moreover, even the absolute drop in accuracy from one domain to the other is no better for causal predictors than for models that use all features. If the goal is to generalize to new domains, practitioners might as well train the best possible model on all available features.
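
Illustrative sketch: the evaluation protocol described in the abstract can be mimicked on toy data — train one predictor on causal features only and one on all features, then compare in-domain and out-of-domain accuracy. The synthetic domains, the feature split, and the choice of gradient boosting are assumptions for illustration, not the paper's actual datasets or models.

```python
# Toy version of the causal-vs-all-features domain-generalization comparison.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_domain(n, shift):
    """Toy tabular domain: first 3 columns play the role of causal features."""
    X = rng.normal(size=(n, 5))
    X[:, 3:] += shift                       # non-causal features shift across domains
    y = (X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + 0.3 * X[:, 3] > 0).astype(int)
    return X, y

X_train, y_train = make_domain(2000, shift=0.0)   # source domain
X_test, y_test = make_domain(2000, shift=2.0)     # target domain

causal_idx = [0, 1, 2]
for name, cols in [("causal features", causal_idx), ("all features", list(range(5)))]:
    clf = GradientBoostingClassifier().fit(X_train[:, cols], y_train)
    in_acc = accuracy_score(y_train, clf.predict(X_train[:, cols]))
    out_acc = accuracy_score(y_test, clf.predict(X_test[:, cols]))
    print(f"{name}: in-domain {in_acc:.3f}, out-of-domain {out_acc:.3f}")
```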

sf

ArXiv [BibTex]


Causal vs. Anticausal merging of predictors

Garrido, S., Blöbaum, P., Schölkopf, B., Janzing, D.

In Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (inproceedings) Accepted

ei

[BibTex]

A Generative Model of Symmetry Transformations

Allingham, J. U., Mlodozeniec, B. K., Padhy, S., Antoran, J., Krueger, D., Turner, R. E., Nalisnick, E., Hernández-Lobato, J. M.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

From Causal to Concept-Based Representation Learning

Rajendran*, G., Buchholz*, S., Aragam, B., Schölkopf, B., Ravikumar, P. K.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Theoretical Characterisation of the Gauss Newton Conditioning in Neural Networks

Zhao, J., Singh, S. P., Lucchi, A.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

An Engine Not a Camera: Measuring Performative Power of Online Search

Mendler-Dünner, C., Carovano, G., Hardt, M.

arXiv preprint arXiv:2405.19073, 2024 (conference) Submitted

Abstract
The power of digital platforms is at the center of major ongoing policy and regulatory efforts. To advance existing debates, we designed and executed an experiment to measure the power of online search providers, building on the recent definition of performative power. Instantiated in our setting, performative power quantifies the ability of a search engine to steer web traffic by rearranging results. To operationalize this definition we developed a browser extension that performs unassuming randomized experiments in the background. These randomized experiments emulate updates to the search algorithm and identify the causal effect of different content arrangements on clicks. We formally relate these causal effects to performative power. Analyzing tens of thousands of clicks, we discuss what our robust quantitative findings say about the power of online search engines. More broadly, we envision our work to serve as a blueprint for how performative power and online experiments can be integrated with future investigations into the economic power of digital platforms.
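
Illustrative sketch: the randomized-rearrangement idea described in the abstract can be caricatured as randomly swapping the top two search results for a fraction of impressions and estimating the causal effect of the arrangement on clicks for the originally top-ranked result. The click model and numbers below are synthetic placeholders, not the study's data or its estimator.

```python
# Toy randomized experiment: effect of swapping the top two results on clicks.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

treated = rng.random(n) < 0.5            # True = top two results swapped, False = original order
# Synthetic position-bias click model: the item shown in slot 1 is clicked more often.
p_click_original_item = np.where(treated, 0.12, 0.30)
clicks = rng.random(n) < p_click_original_item

effect = clicks[treated].mean() - clicks[~treated].mean()
print(f"Estimated causal effect of swapping on clicks for the original top result: {effect:+.3f}")
```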

sf

ArXiv [BibTex]

Robust Mixture Learning when Outliers Overwhelm Small Groups

Dmitriev, D., Buhai, R., Tiegel, S., Wolters, A., Novikov, G., Sanyal, A., Steurer, D., Yang, F.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Neural Characteristic Activation Analysis and Geometric Parameterization for ReLU Networks

Chen, W., Ge, H.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes

Lin, J. A., Padhy, S., Mlodozeniec, B. K., Antoran, J., Hernández-Lobato, J. M.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

MotionFix: Text-Driven 3D Human Motion Editing

Athanasiou, N., Cseke, A., Diomataris, M., Black, M. J., Varol, G.

In SIGGRAPH Asia 2024 Conference Proceedings, ACM, December 2024 (inproceedings) To be published

Abstract
The focus of this paper is 3D motion editing. Given a 3D human motion and a textual description of the desired modification, our goal is to generate an edited motion as described by the text. The challenges include the lack of training data and the design of a model that faithfully edits the source motion. In this paper, we address both these challenges. We build a methodology to semi-automatically collect a dataset of triplets in the form of (i) a source motion, (ii) a target motion, and (iii) an edit text, and use it to create a new dataset. Having access to such data allows us to train a conditional diffusion model that takes both the source motion and the edit text as input. We further build various baselines trained only on text-motion pair datasets and show the superior performance of our model trained on triplets. We introduce new retrieval-based metrics for motion editing and establish a new benchmark on the evaluation set. Our results are encouraging, paving the way for further research on fine-grained motion generation. Code and models will be made publicly available.

ps

link (url) Project Page Project Page [BibTex]

Cooperate or Collapse: Emergence of Sustainability in a Society of LLM Agents

Piatti*, G., Jin*, Z., Kleiman-Weiner*, M., Schölkopf, B., Sachan, M., Mihalcea, R.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal contribution (conference) Accepted

ei

[BibTex]

Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation

Vetter, J., Moss, G., Schröder, C., Gao, R., Macke, J. H.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study

Jain, S., Lubana, E. S., Oksuz, K., Joy, T., Torr, P., Sanyal, A., Dokania, P. K.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Questioning the Survey Responses of Large Language Models

Dominguez-Olmedo, R., Hardt, M., Mendler-Dünner, C.

arXiv preprint arXiv:2306.07951, 2024 (conference) Submitted

Abstract
As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models in order to investigate the population represented by their responses. In this work, we critically examine language models' survey responses on the basis of the well-established American Community Survey by the U.S. Census Bureau and investigate whether they elicit a faithful representation of any human population. Using a de-facto standard multiple-choice prompting technique and evaluating 39 different language models using systematic experiments, we establish two dominant patterns: First, models' responses are governed by ordering and labeling biases, leading to variations across models that do not persist after adjusting for systematic biases. Second, models' responses do not contain the entropy variations and statistical signals typically found in human populations. As a result, a binary classifier can almost perfectly differentiate model-generated data from the responses of the U.S. census. At the same time, models' relative alignment with different demographic subgroups can be predicted from the subgroups' entropy, irrespective of the model's training data or training strategy. Taken together, our findings suggest caution in treating models' survey responses as equivalent to those of human populations.
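
Illustrative sketch: one of the signals described in the abstract, the missing entropy variation in model responses, can be checked by tallying a model's multiple-choice answers for a survey question and comparing the entropy of that distribution against a reference (e.g., census) distribution. The distributions below are made up for illustration only.

```python
# Compare the entropy of a model's answer distribution with a reference distribution.
import numpy as np
from scipy.stats import entropy

choices = ["A", "B", "C", "D"]
model_counts = np.array([930, 40, 20, 10])     # model answers collapse onto one option
census_frequencies = np.array([0.35, 0.30, 0.20, 0.15])

model_dist = model_counts / model_counts.sum()
print("model entropy  (bits):", entropy(model_dist, base=2))
print("census entropy (bits):", entropy(census_frequencies, base=2))
```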

sf

ArXiv [BibTex]


Training on the Test Task Confounds Evaluation and Emergence

Dominguez-Olmedo, R., Dorner, F. E., Hardt, M.

arXiv preprint arXiv:2407.07890, 2024 (conference) Submitted

Abstract
We study a fundamental problem in the evaluation of large language models that we call training on the test task. Unlike wrongful practices like training on the test data, leakage, or data contamination, training on the test task is not malpractice. Rather, the term describes a growing set of techniques to include task-relevant data in the pretraining stage of a language model. We demonstrate that training on the test task confounds both relative model evaluations and claims about emergent capabilities. We argue that the seeming superiority of one model family over another may be explained by a different degree of training on the test task. To this end, we propose an effective method to adjust for training on the test task by fine-tuning each model under comparison on the same task-relevant data before evaluation. We then show that instances of emergent behavior largely vanish once we adjust for training on the test task. This also applies to reported instances of emergent behavior that cannot be explained by the choice of evaluation metric. Our work promotes a new perspective on the evaluation of large language models with broad implications for benchmarking and the study of emergent capabilities.
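
Illustrative sketch: the adjustment proposed in the abstract gives every model under comparison the same task-relevant finetuning before evaluation, so that unequal prior exposure to the task no longer confounds the ranking. The real experiments use language models; in the sketch below, small SGD classifiers on synthetic data stand in purely to show the shape of the protocol.

```python
# Protocol sketch: equalize task-relevant finetuning across models before comparing them.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=6000, n_features=20, random_state=0)

X_pre_a, y_pre_a = X[:2000], y[:2000]            # "model A" saw more task-formatted data
X_pre_b, y_pre_b = X[2000:3000], y[2000:3000]    # "model B" saw less
X_adjust, y_adjust = X[3000:5000], y[3000:5000]  # shared task-relevant finetuning set
X_test, y_test = X[5000:], y[5000:]

model_a = SGDClassifier(random_state=0).fit(X_pre_a, y_pre_a)
model_b = SGDClassifier(random_state=0).fit(X_pre_b, y_pre_b)
print("before adjustment:", model_a.score(X_test, y_test), model_b.score(X_test, y_test))

# Adjustment: identical task-relevant finetuning for every model under comparison.
for model in (model_a, model_b):
    model.partial_fit(X_adjust, y_adjust, classes=np.unique(y))

print("after adjustment :", model_a.score(X_test, y_test), model_b.score(X_test, y_test))
```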

sf

ArXiv [BibTex]


On Affine Homotopy between Language Encoders

Chan, R., Bourmasmoud, R., Svete, A., Ren, Y., Guo, Q., Jin, Z., Ravfogel, S., Sachan, M., Schölkopf, B., El-Assady, M., Cotterell, R.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Inferring stochastic low-rank recurrent neural networks from neural data

Pals, M., Sağtekin, A. E., Pei, F., Gloeckler, M., Macke, J.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024 (conference) Accepted

ei

[BibTex]

Do Finetti: On Causal Effects for Exchangeable Data

Guo, S., Zhang, C., Muhan, K., Huszár*, F., Schölkopf*, B.

Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 38th Annual Conference on Neural Information Processing Systems, December 2024, *joint senior authors (conference) Accepted

ei

[BibTex]

Demonstration: OCRA - A Kinematic Retargeting Algorithm for Expressive Whole-Arm Teleoperation

Mohan, M., Kuchenbecker, K. J.

Hands-on demonstration presented at the Conference on Robot Learning (CoRL), Munich, Germany, November 2024 (misc) Accepted

Abstract
Traditional teleoperation systems focus on controlling the pose of the end-effector (task space), often neglecting the additional degrees of freedom present in human and many robotic arms. This demonstration presents the Optimization-based Customizable Retargeting Algorithm (OCRA), which was designed to map motions from one serial kinematic chain to another in real time. OCRA is versatile, accommodating any robot joint counts and segment lengths, and it can retarget motions from human arms to kinematically different serial robot arms with revolute joints both expressively and efficiently. One of OCRA's key features is its customizability, allowing the user to adjust the emphasis between hand orientation error and the configuration error of the arm's central line, which we call the arm skeleton. To evaluate the perceptual quality of the motions generated by OCRA, we conducted a video-watching study with 70 participants; the results indicated that the algorithm produces robot motions that closely resemble human movements, with a median rating of 78/100, particularly when the arm skeleton error weight and hand orientation error are balanced. In this demonstration, the presenter will wear an Xsens MVN Link and teleoperate the arms of a NAO child-size humanoid robot to highlight OCRA's ability to create intuitive and human-like whole-arm motions.
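
Illustrative sketch: the customizable objective described in the abstract, trading off hand orientation error against the configuration error of the arm's central line ("skeleton"), can be caricatured as a weighted least-squares problem over robot joint angles. The planar 3-link arm, the cost terms, and the solver below are toy assumptions; OCRA itself handles arbitrary serial chains in 3D and runs in real time.

```python
# Toy weighted retargeting objective: hand orientation error vs. arm skeleton error.
import numpy as np
from scipy.optimize import minimize

LINKS = np.array([0.3, 0.25, 0.2])          # segment lengths of a toy planar robot arm

def forward_kinematics(q):
    """Return the 2D positions of all joints and the end-effector (hand) orientation."""
    points, angle, pos = [np.zeros(2)], 0.0, np.zeros(2)
    for qi, li in zip(q, LINKS):
        angle += qi
        pos = pos + li * np.array([np.cos(angle), np.sin(angle)])
        points.append(pos)
    return np.array(points), angle

q_human = np.array([0.4, 0.6, -0.3])         # reference (human) configuration to imitate
ref_points, ref_orientation = forward_kinematics(q_human)

def cost(q, w_orientation=1.0, w_skeleton=1.0):
    points, orientation = forward_kinematics(q)
    orient_err = (orientation - ref_orientation) ** 2          # hand orientation term
    skeleton_err = np.sum((points - ref_points) ** 2)          # arm central-line term
    return w_orientation * orient_err + w_skeleton * skeleton_err

result = minimize(cost, x0=np.zeros(3), method="L-BFGS-B")
print("retargeted joint angles:", np.round(result.x, 3))
```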

hi

Project Page [BibTex]

Demonstration: Minsight - A Soft Vision-Based Tactile Sensor for Robotic Fingertips

Andrussow, I., Sun, H., Martius, G., Kuchenbecker, K. J.

Hands-on demonstration presented at the Conference on Robot Learning (CoRL), Munich, Germany, November 2024 (misc) Accepted

Abstract
Beyond vision and hearing, tactile sensing enhances a robot's ability to dexterously manipulate unfamiliar objects and safely interact with humans. Giving touch sensitivity to robots requires compact, robust, affordable, and efficient hardware designs, especially for high-resolution tactile sensing. We present a soft vision-based tactile sensor engineered to meet these requirements. Comparable in size to a human fingertip, Minsight uses machine learning to output high-resolution directional contact force distributions at 60 Hz. Minsight's tactile force maps enable precise sensing of fingertip contacts, which we use in this hands-on demonstration to allow a 3-DoF robot arm to physically track contact with a user's finger. While observing the colorful image captured by Minsight's internal camera, attendees can experience how its ability to detect delicate touches in all directions facilitates real-time robot interaction.

al hi ei

Project Page [BibTex]

Active Haptic Feedback for a Virtual Wrist-Anchored User Interface

Bartels, J. U., Sanchez-Tamayo, N., Sedlmair, M., Kuchenbecker, K. J.

Hands-on demonstration presented at the ACM Symposium on User Interface Software and Technology (UIST), Pittsburgh, USA, October 2024 (misc) Accepted

hi

DOI [BibTex]

Human Hair Reconstruction with Strand-Aligned 3D Gaussians

Zakharov, E., Sklyarova, V., Black, M. J., Nam, G., Thies, J., Hilliges, O.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings)

Abstract
We introduce a new hair modeling method that uses a dual representation of classical hair strands and 3D Gaussians to produce accurate and realistic strand-based reconstructions from multi-view data. In contrast to recent approaches that leverage unstructured Gaussians to model human avatars, our method reconstructs the hair using 3D polylines, or strands. This fundamental difference allows the use of the resulting hairstyles out-of-the-box in modern computer graphics engines for editing, rendering, and simulation. Our 3D lifting method relies on unstructured Gaussians to generate multi-view ground truth data to supervise the fitting of hair strands. The hairstyle itself is represented in the form of the so-called strand-aligned 3D Gaussians. This representation allows us to combine strand-based hair priors, which are essential for realistic modeling of the inner structure of hairstyles, with the differentiable rendering capabilities of 3D Gaussian Splatting. Our method, named Gaussian Haircut, is evaluated on synthetic and real scenes and demonstrates state-of-the-art performance in the task of strand-based hair reconstruction.

ps

pdf project code video arXiv [BibTex]

Stable Video Portraits

Ostrek, M., Thies, J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings) Accepted

Abstract
Rapid advances in the field of generative AI and text-to-image methods in particular have transformed the way we interact with and perceive computer-generated imagery today. In parallel, much progress has been made in 3D face reconstruction, using 3D Morphable Models (3DMM). In this paper, we present Stable Video Portraits, a novel hybrid 2D/3D generation method that outputs photorealistic videos of talking faces leveraging a large pre-trained text-to-image prior (2D), controlled via a 3DMM (3D). Specifically, we introduce a person-specific fine-tuning of a general 2D stable diffusion model which we lift to a video model by providing temporal 3DMM sequences as conditioning and by introducing a temporal denoising procedure. As an output, this model generates temporally smooth imagery of a person with 3DMM-based controls, i.e., a person-specific avatar. The facial appearance of this person-specific avatar can be edited and morphed to text-defined celebrities, without any test-time fine-tuning. The method is analyzed quantitatively and qualitatively, and we show that our method outperforms state-of-the-art monocular head avatar methods.

ncs ps

link (url) [BibTex]

Generating Human Interaction Motions in Scenes with Text Control

Yi, H., Thies, J., Black, M. J., Peng, X. B., Rempe, D.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings)

Abstract
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model, emphasizing goal-reaching constraints on large-scale motion-capture datasets. We then enhance this model with a scene-aware component, fine-tuned using data augmented with detailed scene information, including ground plane and object shapes. To facilitate training, we embed annotated navigation and interaction motions within scenes. The proposed method produces realistic and diverse human-object interactions, such as navigation and sitting, in different scenes with various object shapes, orientations, initial body positions, and poses. Extensive experiments demonstrate that our approach surpasses prior techniques in terms of the plausibility of human-scene interactions, as well as the realism and variety of the generated motions.

ps

pdf project [BibTex]

On predicting 3D bone locations inside the human body

Dakri, A., Arora, V., Challier, L., Keller, M., Black, M. J., Pujades, S.

In 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), October 2024 (inproceedings)

Abstract
Knowing the precise location of the bones inside the human body is key in several medical tasks, such as patient placement inside an imaging device or surgical navigation inside a patient. Our goal is to predict the bone locations using only an external 3D body surface observation. Existing approaches either validate their predictions on 2D data (X-rays) or with pseudo-ground truth computed from motion capture using biomechanical models. Thus, methods either suffer from a 3D-2D projection ambiguity or directly lack validation on clinical imaging data. In this work, we start with a dataset of segmented skin and long bones obtained from 3D full body MRI images that we refine into individual bone segmentations. To learn the skin to bones correlations, one needs to register the paired data. Few anatomical models allow registering a skeleton and the skin simultaneously. One such method, SKEL, has a skin and skeleton that are jointly rigged with the same pose parameters. However, it lacks the flexibility to adjust the bone locations inside its skin. To address this, we extend SKEL into SKEL-J to allow its bones to fit the segmented bones while its skin fits the segmented skin. These precise fits allow us to train SKEL-J to more accurately infer the anatomical joint locations from the skin surface. Our qualitative and quantitative results show how our bone location predictions are more accurate than all existing approaches. To foster future research, we make available for research purposes the individual bone segmentations, the fitted SKEL-J models, as well as the new inference methods.

ps

Project page [BibTex]

Synthesizing Environment-Specific People in Photographs

Ostrek, M., O’Sullivan, C., Black, M., Thies, J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings) Accepted

Abstract
We present ESP, a novel method for context-aware full-body generation, that enables photo-realistic synthesis and inpainting of people wearing clothing that is semantically appropriate for the scene depicted in an input photograph. ESP is conditioned on a 2D pose and contextual cues that are extracted from the photograph of the scene and integrated into the generation process, where the clothing is modeled explicitly with human parsing masks (HPM). Generated HPMs are used as tight guiding masks for inpainting, such that no changes are made to the original background. Our models are trained on a dataset containing a set of in-the-wild photographs of people covering a wide range of different environments. The method is analyzed quantitatively and qualitatively, and we show that ESP outperforms the state-of-the-art on the task of contextual full-body generation.

ncs ps

link (url) [BibTex]

Explorative Inbetweening of Time and Space

Feng, H., Ding, Z., Xia, Z., Niklaus, S., Fernandez Abrevaya, V., Black, M. J., Zhang, X.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings)

Abstract
We introduce bounded generation as a generalized task to control video generation to synthesize arbitrary camera and subject motion based only on a given start and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without additional training or fine-tuning of the original model. This is achieved through the proposed new sampling strategy, which we call Time Reversal Fusion, that fuses the temporally forward and backward denoising paths conditioned on the start and end frame, respectively. The fused path results in a video that smoothly connects the two frames, generating inbetweening of faithful subject motion, novel views of static scenes, and seamless video looping when the two bounding frames are identical. We curate a diverse evaluation dataset of image pairs and compare against the closest existing methods. We find that Time Reversal Fusion outperforms related work on all subtasks, exhibiting the ability to generate complex motions and 3D-consistent views guided by bounded frames.
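
Illustrative sketch: the Time Reversal Fusion idea described in the abstract, combining a denoising path conditioned on the start frame with a time-reversed path conditioned on the end frame, is caricatured below on a toy latent with a dummy "denoiser". The linear noise-free update, the simple per-frame averaging weights, and the fusion rule are assumptions for illustration only; the paper's method operates on a real image-to-video diffusion model and its actual fusion rule may differ.

```python
# Toy caricature of fusing forward (start-conditioned) and backward (end-conditioned) paths.
import torch

T, F, D = 50, 8, 16                         # iterations, frames, latent dim per frame

def toy_denoiser(x, cond_frame):
    """Stand-in for an image-to-video denoiser: pulls every frame toward the condition."""
    return 0.9 * x + 0.1 * cond_frame.unsqueeze(0)

start_frame = torch.zeros(D)
end_frame = torch.ones(D)

x = torch.randn(F, D)                       # noisy "video" latent
for t in range(T):
    forward_pred = toy_denoiser(x, start_frame)                    # path conditioned on start
    backward_pred = toy_denoiser(x.flip(0), end_frame).flip(0)     # reversed path, conditioned on end
    # Fuse: weight the forward path near the start frame and the backward path near the end frame.
    w = torch.linspace(1.0, 0.0, F).unsqueeze(1)
    x = w * forward_pred + (1 - w) * backward_pred

print(x[0][:4], x[-1][:4])                  # first frame ends near the start, last near the end
```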

ps

Paper Website [BibTex]

HUMOS: Human Motion Model Conditioned on Body Shape

Tripathi, S., Taheri, O., Lassner, C., Black, M. J., Holden, D., Stoll, C.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, October 2024 (inproceedings)

Abstract
Generating realistic human motion is essential for many computer vision and graphics applications. The wide variety of human body shapes and sizes greatly impacts how people move. However, most existing motion models ignore these differences, relying on a standardized, average body. This leads to uniform motion across different body types, where movements don't match their physical characteristics, limiting diversity. To solve this, we introduce a new approach to develop a generative motion model based on body shape. We show that it's possible to train this model using unpaired data by applying cycle consistency, intuitive physics, and stability constraints, which capture the relationship between identity and movement. The resulting model generates diverse, physically plausible, and dynamically stable human motions that are both quantitatively and qualitatively more realistic than current state-of-the-art methods.
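
Illustrative sketch: one ingredient mentioned in the abstract, cycle consistency for training on unpaired data, can be shown generically — retarget a motion from body shape A to shape B and back, and penalize the round-trip error. The tiny MLP, the motion/shape encodings, and the loss below are placeholders; HUMOS additionally uses intuitive-physics and stability terms that are omitted here.

```python
# Generic cycle-consistency training step for unpaired shape-conditioned motion transfer.
import torch
import torch.nn as nn

MOTION_DIM, SHAPE_DIM = 32, 8

class ShapeConditionedTransfer(nn.Module):
    """Maps a motion feature to the corresponding motion feature for a target body shape."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(MOTION_DIM + SHAPE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, MOTION_DIM))
    def forward(self, motion, shape):
        return self.net(torch.cat([motion, shape], dim=-1))

transfer = ShapeConditionedTransfer()
optim = torch.optim.Adam(transfer.parameters(), lr=1e-3)

def cycle_step(motion_a, shape_a, shape_b):
    """Unpaired training step: A -> B -> A should reproduce the original motion."""
    motion_b = transfer(motion_a, shape_b)          # retarget to shape B
    motion_a_rec = transfer(motion_b, shape_a)      # retarget back to shape A
    loss = nn.functional.mse_loss(motion_a_rec, motion_a)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

motion_a = torch.randn(16, MOTION_DIM)
shape_a, shape_b = torch.randn(16, SHAPE_DIM), torch.randn(16, SHAPE_DIM)
print(cycle_step(motion_a, shape_a, shape_b))
```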

ps

project arXiv [BibTex]

GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Zhang, H., Christen, S., Fan, Z., Hilliges, O., Song, J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (inproceedings) Accepted

ps

Code Video Paper [BibTex]

Learning to Control Emulated Muscles in Real Robots: Towards Exploiting Bio-Inspired Actuator Morphology

Schumacher, P., Krause, L., Schneider, J., Büchler, D., Martius, G., Haeufle, D.

In 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob), September 2024 (inproceedings) Accepted

ei

arXiv [BibTex]

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Fan, Z., Ohkawa, T., Yang, L., Lin, N., Zhou, Z., Zhou, S., Liang, J., Gao, Z., Zhang, X., Zhang, X., Li, F., Zheng, L., Lu, F., Zeid, K. A., Leibe, B., On, J., Baek, S., Prakash, A., Gupta, S., He, K., Sato, Y., Hilliges, O., Chang, H. J., Yao, A.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (inproceedings) Accepted

ps

Paper Leaderboard [BibTex]

AWOL: Analysis WithOut synthesis using Language

Zuffi, S., Black, M. J.

In European Conference on Computer Vision (ECCV 2024), LNCS, Springer Cham, September 2024 (inproceedings)

ps

Paper [BibTex]

Modeling Shank Tissue Properties and Quantifying Body Composition with a Wearable Actuator-Accelerometer Set

Rokhmanova, N., Martus, J., Faulkner, R., Fiene, J., Kuchenbecker, K. J.

Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (misc)

hi

Project Page [BibTex]

Moûsai: Efficient Text-to-Music Diffusion Models

Schneider, F., Kamal, O., Jin, Z., Schölkopf, B.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, pages: 8050-8068, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024 (conference)

ei

link (url) [BibTex]

Modelling Variability in Human Annotator Simulation

Wu*, W., Chen*, W., Zhang, C., Woodland, P. C.

Findings of the Association for Computational Linguistics (ACL), pages: 1139-1157, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024, *equal contribution (conference)

ei

link (url) [BibTex]

Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals

Ortu*, F., Jin*, Z., Doimo, D., Sachan, M., Cazzaniga, A., Schölkopf, B.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, pages: 8420-8436, (Editors: Lun-Wei Ku and Andre Martins and Vivek Srikumar), Association for Computational Linguistics, August 2024, *equal contribution (conference)

ei

arXiv link (url) [BibTex]

CausalCite: A Causal Formulation of Paper Citations

Kumar, I., Jin, Z., Mokhtarian, E., Guo, S., Chen, Y., Kiyavash, N., Sachan, M., Schölkopf, B.

Findings of the Association for Computational Linguistics (ACL), pages: 8395-8410, (Editors: Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek), Association for Computational Linguistics, August 2024 (conference)

ei

arXiv link (url) [BibTex]

Adapting a High-Fidelity Simulation of Human Skin for Comparative Touch Sensing

Schulz, A., Serhat, G., Kuchenbecker, K. J.

Extended abstract (1 page) presented at the American Society of Biomechanics Annual Meeting (ASB), Madison, USA, August 2024 (misc)

hi

[BibTex]

On the Growth of Mistakes in Differentially Private Online Learning: A Lower Bound Perspective

Dmitriev, D., Szabó, K., Sanyal, A.

Proceedings of the 37th Annual Conference on Learning Theory (COLT), 247, pages: 1379-1398, Proceedings of Machine Learning Research, (Editors: Agrawal, Shipra and Roth, Aaron), PMLR, July 2024, (talk) (conference)

ei

link (url) [BibTex]

Robustness of Nonlinear Representation Learning

Buchholz, S., Schölkopf, B.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 4785-4821, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

link (url) [BibTex]

Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for ODEs

Beck, J., Bosch, N., Deistler, M., Kadhim, K. L., Macke, J. H., Hennig, P., Berens, P.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 3305-3326, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

arXiv link (url) [BibTex]

Simultaneous identification of models and parameters of scientific simulators

Schröder, C., Macke, J. H.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 43895-43927, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei

link (url) [BibTex]

Causal Action Influence Aware Counterfactual Data Augmentation

Urpi, N. A., Bagatella, M., Vlastelica, M., Martius, G.

In Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 1709-1729, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (inproceedings)

al

link (url) [BibTex]

Position: Understanding LLMs Requires More Than Statistical Generalization

Reizinger, P., Ujváry, S., Mészáros, A., Kerekes, A., Brendel, W., Huszár, F.

Proceedings of the 41st International Conference on Machine Learning (ICML), 235, pages: 42365-42390, Proceedings of Machine Learning Research, (Editors: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix), PMLR, July 2024 (conference)

ei robustml

arXiv link (url) [BibTex]
