Publications | Max Planck Institute for Intelligent Systems

1011 results

2024

Digital Human Characters: Challenges, Models and Algorithms

Osman, A. A. A.

University of Tübingen, September 2024 (phdthesis)

Abstract

Statistical models for the body, head, and hands are essential in various computer vision tasks. However, popular models like SMPL, MANO, and FLAME produce unrealistic deformations due to inherent flaws in their modeling assumptions and how they are trained, which have become standard practices in constructing models for the body and its parts. This dissertation addresses these limitations by proposing new modeling and training algorithms to improve the realism and generalization of current models. We introduce a new model, STAR (Sparse Trained Articulated Human Body Regressor), which learns a sparse representation of the human body deformations, significantly reducing the number of model parameters compared to models like SMPL. This approach ensures that deformations are spatially localized, leading to more realistic deformations. STAR also incorporates shape-dependent pose deformations, accounting for variations in body shape to enhance overall model accuracy and realism. Additionally, we present a novel federated training algorithm for developing a comprehensive suite of models for the body and its parts. We train an expressive body model, SUPR (Sparse Unified Part-Based Representation), on a federated dataset of full-body scans, including detailed scans of the head, hands, and feet. We then separate SUPR into a full suite of state-of-the-art models for the head, hands, and foot. The new foot model captures complex foot deformations, addressing challenges related to foot shape, pose, and ground contact dynamics. The dissertation concludes by introducing AVATAR (Articulated Virtual Humans Trained By Bayesian Inference From a Single Scan), a novel, data-efficient training algorithm. AVATAR allows the creation of personalized, high-fidelity body models from a single scan by framing model construction as a Bayesian inference problem, thereby enabling training from small-scale datasets while reducing the risk of overfitting. 
These advancements push the state of the art in human body modeling and training techniques, making them more accessible for broader research and practical applications.

ps

Engineering and Evaluating Naturalistic Vibrotactile Feedback for Telerobotic Assembly

University of Stuttgart, Stuttgart, Germany, August 2024, Faculty of Design, Production Engineering and Automotive Engineering (phdthesis)

Abstract

Teleoperation allows workers on a construction site to assemble pre-fabricated building components by controlling powerful machines from a safe distance. However, teleoperation's primary reliance on visual feedback limits the operator's efficiency in situations with stiff contact or poor visibility, compromising their situational awareness and thus increasing the difficulty of the task; it also makes construction machines more difficult to learn to operate. To bridge this gap, we propose that reliable, economical, and easy-to-implement naturalistic vibrotactile feedback could improve telerobotic control interfaces in construction and other application areas such as surgery. This type of feedback enables the operator to feel the natural vibrations experienced by the robot, which contain crucial information about its motions and its physical interactions with the environment. This dissertation explores how to deliver naturalistic vibrotactile feedback from a robot's end-effector to the hand of an operator performing telerobotic assembly tasks; furthermore, it seeks to understand the effects of such haptic cues. The presented research can be divided into four parts. We first describe the engineering of AiroTouch, a naturalistic vibrotactile feedback system tailored for use on construction sites but suitable for many other applications of telerobotics. Then we evaluate AiroTouch and explore the effects of the naturalistic vibrotactile feedback it delivers in three user studies conducted either in laboratory settings or on a construction site. We begin this dissertation by developing guidelines for creating a haptic feedback system that provides high-quality naturalistic vibrotactile feedback. These guidelines include three sections: component selection, component placement, and system evaluation. We detail each aspect with the parameters that need to be considered. 
Based on these guidelines, we adapt widely available commercial audio equipment to create our system called AiroTouch, which measures the vibration experienced by each robot tool with a high-bandwidth three-axis accelerometer and enables the user to feel this vibration in real time through a voice-coil actuator. Accurate haptic transmission is achieved by optimizing the positions of the system's off-the-shelf sensors and actuators and is then verified through measurements. The second part of this thesis presents our initial validation of AiroTouch. We explored how adding this naturalistic type of vibrotactile feedback affects the operator during small-scale telerobotic assembly. Due to the limited accessibility of teleoperated robots and to maintain safety, we conducted a user study in the lab with a commercial bimanual dexterous teleoperation system developed for surgery (Intuitive da Vinci Si). Thirty participants used this robot equipped with AiroTouch to assemble a small stiff structure under three randomly ordered haptic feedback conditions: no vibrations, one-axis vibrations, and summed three-axis vibrations. The results show that participants learn to take advantage of both tested versions of the haptic feedback in the given tasks, as significantly lower vibrations and forces are observed in the second trial. Subjective responses indicate that naturalistic vibrotactile feedback increases the realism of the interaction and reduces the perceived task duration, task difficulty, and fatigue. To test our approach on a real construction site, we enhanced AiroTouch using wireless signal-transmission technologies and waterproofing, and then we adapted it to a mini-crane construction robot. A study was conducted to evaluate how naturalistic vibrotactile feedback affects an observer's understanding of telerobotic assembly performed by this robot on a construction site.
Seven adults without construction experience observed a mix of manual and autonomous assembly processes both with and without naturalistic vibrotactile feedback. Qualitative analysis of their survey responses and interviews indicates that all participants had positive responses to this technology and believed it would be beneficial for construction activities. Finally, we evaluated the effects of naturalistic vibrotactile feedback provided by wireless AiroTouch during live teleoperation of the mini-crane. Twenty-eight participants remotely controlled the mini-crane to complete three large-scale assembly-related tasks in the lab, both with and without this type of haptic feedback. Our results show that naturalistic vibrotactile feedback enhances the participants' awareness of both robot motion and contact between the robot and other objects, particularly in scenarios with limited visibility. These effects increase participants' confidence when controlling the robot. Moreover, there is a noticeable trend of reduced vibration magnitude in the conditions where this type of haptic feedback is provided. The primary contribution of this dissertation is the clear explanation of details that are essential for the effective implementation of naturalistic vibrotactile feedback. We demonstrate that our accessible, audio-based approach can enhance user performance and experience during telerobotic assembly in construction and other application domains. These findings lay the foundation for further exploration of the potential benefits of incorporating haptic cues to enhance user experience during teleoperation.

hi

Project Page [BibTex]

hi Gong, Y. Engineering and Evaluating Naturalistic Vibrotactile Feedback for Telerobotic Assembly University of Stuttgart, Stuttgart, Germany, August 2024, Faculty of Design, Production Engineering and Automotive Engineering (phdthesis)

A Measure-Theoretic Axiomatisation of Causality and Kernel Regression

University of Tübingen, Germany, July 2024 (phdthesis)

ei

ei Park, J. A Measure-Theoretic Axiomatisation of Causality and Kernel Regression University of Tübingen, Germany, July 2024 (phdthesis)

Enhancement and Evaluation of Deep Generative Networks with Applications in Super-Resolution and Image Generation

Sajjadi, S. M. M.

University of Tübingen, Germany, July 2024 (phdthesis)

ei

Modelling Dynamic 3D Human-Object Interactions: From Capture to Synthesis

University of Tübingen, July 2024 (phdthesis) To be published

Abstract

Modeling digital humans that move and interact realistically with virtual 3D worlds has emerged as an essential research area recently, with significant applications in computer graphics, virtual and augmented reality, telepresence, the Metaverse, and assistive technologies. In particular, human-object interaction, encompassing full-body motion, hand-object grasping, and object manipulation, lies at the core of how humans execute tasks and represents the complex and diverse nature of human behavior. Therefore, accurate modeling of these interactions would enable us to simulate avatars to perform tasks, enhance animation realism, and develop applications that better perceive and respond to human behavior. Despite its importance, this remains a challenging problem, due to several factors such as the complexity of human motion, the variance of interaction based on the task, and the lack of rich datasets capturing the complexity of real-world interactions. Prior methods have made progress, but limitations persist as they often focus on individual aspects of interaction, such as body, hand, or object motion, without considering the holistic interplay among these components. This Ph.D. thesis addresses these challenges and contributes to the advancement of human-object interaction modeling through the development of novel datasets, methods, and algorithms.

ps

ps Taheri, O. Modelling Dynamic 3D Human-Object Interactions: From Capture to Synthesis University of Tübingen, July 2024 (phdthesis) To be published

Advancing Normalising Flows to Model Boltzmann Distributions

University of Cambridge, UK, Cambridge, June 2024, (Cambridge-Tübingen-Fellowship-Program) (phdthesis)

ei

ei Stimper, V. Advancing Normalising Flows to Model Boltzmann Distributions University of Cambridge, UK, Cambridge, June 2024, (Cambridge-Tübingen-Fellowship-Program) (phdthesis)

LFP transient events in macaque subcortical areas reveal network coordination across scales and structures: a simultaneous fMRI-electrophysiology study

Besserve, M., Safavi, S., Schölkopf, B., Logothetis, N.

Computational and Systems Neuroscience Meeting (COSYNE), March 2024 (poster)

ei

link (url) [BibTex]

Language Models Can Reduce Asymmetry in Information Markets

Rahaman, N., Weiss, M., Wüthrich, M., Bengio, Y., Li, E., Pal, C., Schölkopf, B.

arXiv:2403.14443, March 2024, Published as: Redesigning Information Markets in the Era of Language Models, Conference on Language Modeling (COLM) (techreport)

Abstract

This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The central mechanism enabling this marketplace is the agents' dual capabilities: they not only have the capacity to assess the quality of privileged information but also come equipped with the ability to forget. This ability to induce amnesia allows vendors to grant temporary access to proprietary information, significantly reducing the risk of unauthorized retention while enabling agents to accurately gauge the information's relevance to specific queries or tasks. To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information. Concretely, our experiments (a) uncover biases in language models leading to irrational behavior and evaluate techniques to mitigate these biases, (b) investigate how price affects demand in the context of informational goods, and (c) show that inspection and higher budgets both lead to higher quality outcomes.

ei

link (url) [BibTex]

Koopman Spectral Analysis Uncovers the Temporal Structure of Spontaneous Neural Events

Shao, K., Xu, Y., Logothetis, N., Shen, Z., Besserve, M.

Computational and Systems Neuroscience Meeting (COSYNE), March 2024 (poster)

ei

link (url) [BibTex]

Interpreting How Large Language Models Handle Facts and Counterfactuals through Mechanistic Interpretability

Ortu, F.

University of Trieste, Italy, March 2024 (mastersthesis)

ei

Identifiable Causal Representation Learning

von Kügelgen, J.

University of Cambridge, UK, Cambridge, February 2024, (Cambridge-Tübingen-Fellowship) (phdthesis)

ei

Creating a Haptic Empathetic Robot Animal That Feels Touch and Emotion

University of Tübingen, Tübingen, Germany, February 2024, Department of Computer Science (phdthesis)

Abstract

Social touch, such as a hug or a poke on the shoulder, is an essential aspect of everyday interaction. Humans use social touch to gain attention, communicate needs, express emotions, and build social bonds. Despite its importance, touch sensing is very limited in most commercially available robots. By endowing robots with social-touch perception, one can unlock a myriad of new interaction possibilities. In this thesis, I present my work on creating a Haptic Empathetic Robot Animal (HERA), a koala-like robot for children with autism. I demonstrate the importance of establishing design guidelines based on one's target audience, which we investigated through interviews with autism specialists. I share our work on creating full-body tactile sensing for the NAO robot using low-cost, do-it-yourself (DIY) methods, and I introduce an approach to model long-term robot emotions using second-order dynamics.

hi

Project Page [BibTex]

hi Burns, R. Creating a Haptic Empathetic Robot Animal That Feels Touch and Emotion University of Tübingen, Tübingen, Germany, February 2024, Department of Computer Science (phdthesis)

Learning a Terrain- and Robot-Aware Dynamics Model for Autonomous Mobile Robot Navigation

Achterhold, J., Guttikonda, S., Kreber, J. U., Li, H., Stueckler, J.

CoRR abs/2409.11452, 2024, Preprint submitted to Robotics and Autonomous Systems Journal. https://arxiv.org/abs/2409.11452 (techreport) Submitted

Abstract

Mobile robots should be capable of planning cost-efficient paths for autonomous navigation. Typically, the terrain and robot properties are subject to variations. For instance, properties of the terrain such as friction may vary across different locations. Properties of the robot, such as payload or wear and tear, may also change, e.g., altering actuator gains or joint friction. Autonomous navigation approaches should thus be able to adapt to such variations. In this article, we propose a novel approach for learning a probabilistic, terrain- and robot-aware forward dynamics model (TRADYN) which can adapt to such variations and demonstrate its use for navigation. Our learning approach extends recent advances in meta-learning forward dynamics models based on Neural Processes for mobile robot navigation. We evaluate our method in simulation for 2D navigation of a robot with unicycle dynamics with varying properties on terrain with spatially varying friction coefficients. In our experiments, we demonstrate that TRADYN has lower prediction error over long time horizons than model ablations which do not adapt to robot or terrain variations. We also evaluate our model for navigation planning in a model-predictive control framework and under various sources of noise. We demonstrate that our approach yields improved performance in planning control-efficient paths by taking robot and terrain properties into account.

ev

preprint [BibTex]

A Pontryagin Perspective on Reinforcement Learning

Eberhard, O., Vernade, C., Muehlebach, M.

Max Planck Institute for Intelligent Systems, 2024 (techreport)

lds

link (url) [BibTex]

Self- and Interpersonal Contact in 3D Human Mesh Reconstruction

University of Tübingen, Tübingen, 2024 (phdthesis)

Abstract

The ability to perceive tactile stimuli is of substantial importance for human beings in establishing a connection with the surrounding world. Humans rely on the sense of touch to navigate their environment and to engage in interactions with both themselves and other people. The field of computer vision has made great progress in estimating a person’s body pose and shape from an image, however, the investigation of self- and interpersonal contact has received little attention despite its considerable significance. Estimating contact from images is a challenging endeavor because it necessitates methodologies capable of predicting the full 3D human body surface, i.e. an individual’s pose and shape. The limitations of current methods become evident when considering the two primary datasets and labels employed within the community to supervise the task of human pose and shape estimation. First, the widely used 2D joint locations lack crucial information for representing the entire 3D body surface. Second, in datasets of 3D human bodies, e.g. collected from motion capture systems or body scanners, contact is usually avoided, since it naturally leads to occlusion which complicates data cleaning and can break the data processing pipelines. In this thesis, we first address the problem of estimating contact that humans make with themselves from RGB images. To do this, we introduce two novel methods that we use to create new datasets tailored for the task of human mesh estimation for poses with self-contact. We create (1) 3DCP, a dataset of 3D body scan and motion capture data of humans in poses with self-contact and (2) MTP, a dataset of images taken in the wild with accurate 3D reference data using pose mimicking. Next, we observe that 2D joint locations can be readily labeled at scale given an image, however, an equivalent label for self-contact does not exist. 
Consequently, we introduce (3) discrete self-contact (DSC) annotations indicating the pairwise contact of discrete regions on the human body. We annotate three existing image datasets with discrete self-contact and use these labels during mesh optimization to bring body parts supposed to touch into contact. Then we train TUCH, a human mesh regressor, on our new datasets. When evaluated on the task of human body pose and shape estimation on public benchmarks, our results show that knowing about self-contact not only improves mesh estimates for poses with self-contact, but also for poses without self-contact. Next, we study contact humans make with other individuals during close social interaction. Reconstructing these interactions in 3D is a significant challenge due to mutual occlusion. Furthermore, the existing datasets of images taken in the wild with ground-truth contact labels are of insufficient size to facilitate the training of a robust human mesh regressor. In this work, we employ a generative model, BUDDI, to learn the joint distribution of 3D pose and shape of two individuals during their interaction and use this model as a prior during an optimization routine. To construct training data we leverage pre-existing datasets, i.e. motion capture data and Flickr images with discrete contact annotations. Similar to discrete self-contact labels, we utilize discrete human-human contact to jointly fit two meshes to detected 2D joint locations. The majority of methods for generating 3D humans focus on the motion of a single person and operate on 3D joint locations. While these methods can effectively generate motion, their representation of 3D humans is not sufficient for physical contact since they do not model the body surface. Our approach, in contrast, acts on the pose and shape parameters of a human body model, which enables us to sample 3D meshes of two people.
We further demonstrate how the knowledge of human proxemics, incorporated in our model, can be used to guide an optimization routine. For this, in each optimization iteration, BUDDI takes the current mesh and proposes a refinement that we subsequently consider in the objective function. This procedure enables us to go beyond the state of the art by forgoing ground-truth discrete human-human contact labels during optimization. Self- and interpersonal contact happen on the surface of the human body; however, the majority of existing art tends to predict bodies with similar, “average” body shape. This is due to a lack of training data of paired images taken in the wild and ground-truth 3D body shape and because 2D joint locations are not sufficient to explain body shape. The most apparent solution would be to collect body scans of people together with their photos. This is, however, a time-consuming and cost-intensive process that lacks scalability. Instead, we leverage the vocabulary humans use to describe body shape. First, we ask annotators to label how much a word like “tall” or “long legs” applies to a human body. We gather these ratings for rendered meshes of various body shapes, for which we have ground-truth body model shape parameters, and for images collected from model agency websites. Using this data, we learn a shape-to-attribute (A2S) model that predicts body shape ratings from body shape parameters. Then we train a human mesh regressor, SHAPY, on the model agency images wherein we supervise body shape via attribute annotations using A2S. Since no suitable test set of diverse 3D ground-truth body shape with images taken in natural settings exists, we introduce Human Bodies in the Wild (HBW). This novel dataset contains photographs of individuals together with their body scan. Our model predicts more realistic body shapes from an image and quantitatively improves body shape estimation on this new benchmark.
In summary, we present novel datasets, optimization methods, a generative model, and regressors to advance the field of 3D human pose and shape estimation. Taken together, these methods open up ways to obtain more accurate and realistic 3D mesh estimates from images with multiple people in self- and mutual contact poses and with diverse body shapes. This line of research also enables generative approaches to create more natural, human-like avatars. We believe that knowing about self- and human-human contact through computer vision has wide-ranging implications in other fields as for example robotics, fitness, or behavioral science.

ps

ps Müller, L. Self- and Interpersonal Contact in 3D Human Mesh Reconstruction University of Tübingen, Tübingen, 2024 (phdthesis)

Distributed Event-Based Learning via ADMM

Er, D., Trimpe, S., Muehlebach, M.

Max Planck Institute for Intelligent Systems, 2024 (techreport)

lds

link (url) [BibTex]

Natural Language Control for 3D Human Motion Synthesis

LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, 2024 (phdthesis)

Abstract

3D human motions are at the core of many applications in the film industry, healthcare, augmented reality, virtual reality and video games. However, these applications often rely on expensive and time-consuming motion capture data. The goal of this thesis is to explore generative models as an alternative route to obtain 3D human motions. More specifically, our aim is to allow a natural language interface as a means to control the generation process. To this end, we develop a series of models that synthesize realistic and diverse motions following the semantic inputs. In our first contribution, described in Chapter 3, we address the challenge of generating human motion sequences conditioned on specific action categories. We introduce ACTOR, a conditional variational autoencoder (VAE) that learns an action-aware latent representation for human motions. We show significant gains over existing methods thanks to our new Transformer-based VAE formulation, encoding and decoding SMPL pose sequences through a single motion-level embedding. In our second contribution, described in Chapter 4, we go beyond categorical actions, and dive into the task of synthesizing diverse 3D human motions from textual descriptions allowing a larger vocabulary and potentially more fine-grained control. Our work stands out from previous research by not deterministically generating a single motion sequence, but by synthesizing multiple, varied sequences from a given text. We propose TEMOS, building on our VAE-based ACTOR architecture, but this time integrating a pretrained text encoder to handle large-vocabulary natural language inputs. In our third contribution, described in Chapter 5, we address the adjacent task of text-to-3D human motion retrieval, where the goal is to search in a motion collection by querying via text. 
We introduce a simple yet effective approach, named TMR, building on our earlier model TEMOS, by integrating a contrastive loss to enhance the structure of the cross-modal latent space. Our findings emphasize the importance of retaining the motion generation loss in conjunction with contrastive training for improved results. We establish a new evaluation benchmark and conduct analyses on several protocols. In our fourth contribution, described in Chapter 6, we introduce a new problem termed “multi-track timeline control” for text-driven 3D human motion synthesis. Instead of a single textual prompt, users can organize multiple prompts in temporal intervals that may overlap. We introduce STMC, a test-time denoising method that can be integrated with any pre-trained motion diffusion model. Our evaluations demonstrate that our method generates motions that closely match the semantic and temporal aspects of the input timelines. In summary, our contributions in this thesis are as follows: (i) we develop a generative variational autoencoder, ACTOR, for action-conditioned generation of human motion sequences; (ii) we introduce TEMOS, a text-conditioned generative model that synthesizes diverse human motions from textual descriptions; (iii) we present TMR, a new approach for text-to-3D human motion retrieval; and (iv) we propose STMC, a method for timeline control in text-driven motion synthesis, enabling the generation of detailed and complex motions.

ps

Thesis [BibTex]

ps Petrovich, M. Natural Language Control for 3D Human Motion Synthesis LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, 2024 (phdthesis)

Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators

Baumeister, F., Mack, L., Stueckler, J.

CoRR abs/2409.13228, 2024, Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2025 (techreport) Submitted

Abstract

Few-shot adaptation is an important capability for intelligent robots that perform tasks in open-world settings such as everyday environments or flexible production. In this paper, we propose a novel approach for non-prehensile manipulation which iteratively adapts a physics-based dynamics model for model-predictive control. We adapt the parameters of the model incrementally with a few examples of robot-object interactions. This is achieved by sampling-based optimization of the parameters using a parallelizable rigid-body physics simulation as dynamic world model. In turn, the optimized dynamics model can be used for model-predictive control using efficient sampling-based optimization. We evaluate our few-shot adaptation approach in several object pushing experiments in simulation and with a real robot.

ev

preprint supplemental video link (url) [BibTex]

2023

Denoising Representation Learning for Causal Discovery

Sakenyte, U.

Université de Genève, Switzerland, December 2023, external supervision (mastersthesis)

ei

Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures

Jenny, D.

ETH Zurich, Switzerland, November 2023, external supervision (thesis)

ei

Hydraulically Amplified Self-healing Electrostatic Actuators

Keplinger, C. M., Acome, E. L., Kellaris, N. A., Mitchell, S. K.

(US Patent 11795979B2), October 2023 (patent)

Abstract

An electro-hydraulic actuator includes a deformable shell defining an enclosed internal cavity and containing a liquid dielectric, and first and second electrodes on first and second sides, respectively, of the enclosed internal cavity. An electrostatic force between the first and second electrodes upon application of a voltage to one of the electrodes draws the electrodes towards each other to displace the liquid dielectric within the enclosed internal cavity. The shell includes active and inactive areas such that the electrostatic forces between the first and second electrodes displace the liquid dielectric within the enclosed internal cavity from the active area of the shell to the inactive area of the shell. The first and second electrodes, the deformable shell, and the liquid dielectric cooperate to form a self-healing capacitor, and the liquid dielectric is configured for automatically filling breaches in the liquid dielectric resulting from dielectric breakdown.

rm

link (url) [BibTex]

Gesture-Based Nonverbal Interaction for Exercise Robots

Mohan, M.

University of Tübingen, Tübingen, Germany, October 2023, Department of Computer Science (phdthesis)

Abstract

When teaching or coaching, humans augment their words with carefully timed hand gestures, head and body movements, and facial expressions to provide feedback to their students. Robots, however, rarely utilize these nuanced cues. A minimally supervised social robot equipped with these abilities could support people in exercising, physical therapy, and learning new activities. This thesis examines how the intuitive power of human gestures can be harnessed to enhance human-robot interaction. To address this question, this research explores gesture-based interactions to expand the capabilities of a socially assistive robotic exercise coach, investigating the perspectives of both novice users and exercise-therapy experts. This thesis begins by concentrating on the user's engagement with the robot, analyzing the feasibility of minimally supervised gesture-based interactions. This exploration seeks to establish a framework in which robots can interact with users in a more intuitive and responsive manner. The investigation then shifts its focus toward the professionals who are integral to the success of these innovative technologies: the exercise-therapy experts. Roboticists face the challenge of translating the knowledge of these experts into robotic interactions. We address this challenge by developing a teleoperation algorithm that can enable exercise therapists to create customized gesture-based interactions for a robot. Thus, this thesis lays the groundwork for dynamic gesture-based interactions in minimally supervised environments, with implications for not only exercise-coach robots but also broader applications in human-robot interaction.

hi

Project Page [BibTex]

Efficient Sampling from Differentiable Matrix Elements

Kofler, A.

Technical University of Munich, Germany, September 2023 (mastersthesis)

ei

High Strain Peano Hydraulically Amplified Self-Healing Electrostatic (HASEL) Transducers

Keplinger, C. M., Wang, X., Mitchell, S. K.

(US Patent App. 18/138,621), August 2023 (patent)

Abstract

High strain hydraulically amplified self-healing electrostatic transducers having increased maximum theoretical and practical strains are disclosed. In particular, the actuators include electrode configurations having a zipping front created by the attraction of the electrodes that is configured orthogonally to a strain axis along which the actuators. This configuration produces increased strains. In turn, various form factors for the actuator configuration are presented including an artificial circular muscle and a strain amplifying pulley system. Other actuator configurations are contemplated that include independent and opposed electrode pairs to create cyclic activation, hybrid electrode configurations, and use of strain limiting layers for controlled deflection of the actuator.

rm

link (url) [BibTex]

Advances in Algorithmic Recourse: Ensuring Causal Consistency, Fairness, & Robustness

Karimi, A.

ETH Zurich, Switzerland, July 2023 (phdthesis)

ei

Learning and Testing Powerful Hypotheses

Kübler, J. M.

University of Tübingen, Germany, July 2023 (phdthesis)

ei

Capacitive Self-Sensing for Electrostatic Transducers with High Voltage Isolation

Correll, N., Ly, K. D., Kellaris, N. A., Keplinger, C. M.

(US Patent App. 17/928,453), June 2023 (patent)

Abstract

Transducer systems disclosed herein include self-sensing capabilities. In particular, electrostatic transducers include a low voltage electrode and a high voltage electrode. A low voltage sensing unit is coupled with the low voltage electrode of the electrostatic transducer. The low voltage sensing unit is configured to measure a capacitance of the electrostatic transducer, from which displacement of the electrostatic transducer may be calculated. High voltage drive signals received by the high voltage electrode during actuation may be isolated from the low voltage sensing unit. The isolation may be provided by dielectric material of the electrostatic transducer, a voltage suppression component, and/or a voltage suppression module comprising a low impedance ground path. In the event of an electrical failure of the transducer, the low voltage sensing unit may be isolated from high voltages.

rm

link (url) [BibTex]

Learning Identifiable Representations: Independent Influences and Multiple Views

Gresele, L.

University of Tübingen, Germany, June 2023 (phdthesis)

ei

Learning with and for discrete optimization

Paulus, M.

ETH Zurich, Switzerland, May 2023, CLS PhD Program (phdthesis)

ei

High Strain Peano Hydraulically Amplified Self-healing Electrostatic (HASEL) Transducers

Keplinger, C. M., Wang, X., Mitchell, S. K.

(US Patent 11635094), April 2023 (patent)

Abstract

High strain hydraulically amplified self-healing electrostatic transducers having increased maximum theoretical and practical strains are disclosed. In particular, the actuators include electrode configurations having a zipping front created by the attraction of the electrodes that is configured orthogonally to a strain axis along which the actuators. This configuration produces increased strains. In turn, various form factors for the actuator configuration are presented including an artificial circular muscle and a strain amplifying pulley system. Other actuator configurations are contemplated that include independent and opposed electrode pairs to create cyclic activation, hybrid electrode configurations, and use of strain limiting layers for controlled deflection of the actuator.

rm

link (url) [BibTex]

Intrinsic complexity and mechanisms of expressivity of cortical neurons

Spieler, A. M.

University of Tübingen, Germany, March 2023 (mastersthesis)

ei

Causal Effect Estimation by Combining Observational and Interventional Data

Kladny, K.

ETH Zurich, Switzerland, February 2023 (mastersthesis)

lds ei

Towards Generative Machine Teaching

Qui, Z.

Technical University of Munich, Germany, February 2023 (mastersthesis)

ei

ArchiSound: Audio Generation with Diffusion

Schneider, F.

ETH Zurich, Switzerland, January 2023, external supervision (mastersthesis)

ei

Generation and Quantification of Spin in Robot Table Tennis

Dittrich, A.

University of Stuttgart, Germany, January 2023 (mastersthesis)

ei

An Open-Source Modular Treadmill for Dynamic Force Measurement with Load Dependant Range Adjustment

Sarvestani, A., Ruppert, F., Badri-Spröwitz, A.

2023 (unpublished) Submitted

Abstract

Ground reaction force sensing is one of the key components of gait analysis in legged locomotion research. To measure continuous force data during locomotion, we present a novel compound instrumented treadmill design. The treadmill is 1.7 m long, with a natural frequency of 170 Hz and an adjustable range that can be used for humans and small robots alike. Here, we present the treadmill’s design methodology and characterize its natural frequency, noise behavior, and real-life performance. Additionally, we apply a calibration procedure conforming to the ISO 376 norm for all spatial force directions and the center of pressure position. We achieve a force accuracy of ≤ 5.6 N for the ground reaction forces and ≤ 13 mm in center of pressure position.

dlg

arXiv link (url) DOI [BibTex]

Natural Language Processing for Policymaking

Jin, Z., Mihalcea, R.

In Handbook of Computational Social Science for Policy, pages: 141-162, 7, (Editors: Bertoni, E. and Fontana, M. and Gabrielli, L. and Signorelli, S. and Vespe, M.), Springer International Publishing, 2023 (inbook)

ei

Object-Level Dynamic Scene Reconstruction With Physical Plausibility From RGB-D Images

Strecke, M. F.

Eberhard Karls Universität Tübingen, Tübingen, 2023 (phdthesis)

Abstract

Humans have the remarkable ability to perceive and interact with objects in the world around them. They can easily segment objects from visual data and have an intuitive understanding of how physics influences objects. By contrast, robots are so far often constrained to tailored environments for a specific task, due to their inability to reconstruct a versatile and accurate scene representation. In this thesis, we combine RGB-D video data with background knowledge of real-world physics to develop such a representation for robots.

Our contributions can be separated into two main parts: a dynamic object tracking tool and optimization frameworks that allow for improving shape reconstructions based on physical plausibility. The dynamic object tracking tool "EM-Fusion" detects, segments, reconstructs, and tracks objects from RGB-D video data. We propose a probabilistic data association approach for attributing the image pixels to the different moving objects in the scene. This allows us to track and reconstruct moving objects and the background scene with state-of-the-art accuracy and robustness towards occlusions.

We investigate two ways of further optimizing the reconstructed shapes of moving objects based on physical plausibility. The first of these, "Co-Section", includes physical plausibility by reasoning about the empty space around an object. We observe that no two objects can occupy the same space at the same time and that the depth images in the input video provide an estimate of observed empty space. Based on these observations, we propose intersection and hull constraints, which we combine with the observed surfaces in a global optimization approach. Compared to EM-Fusion, which only reconstructs the observed surface, Co-Section optimizes watertight shapes. These watertight shapes provide a rough estimate of unseen surfaces and could be useful as initialization for further refinement, e.g., by interactive perception. In the second optimization approach, "DiffSDFSim", we reason about object shapes based on physically plausible object motion. We observe that object trajectories after collisions depend on the object's shape, and extend a differentiable physics simulation for optimizing object shapes together with other physical properties (e.g., forces, masses, friction) based on the motion of the objects and their interactions. Our key contributions are using signed distance function models for representing shapes and a novel method for computing gradients that models the dependency of the time of contact on object shapes. We demonstrate that our approach recovers target shapes well by fitting to target trajectories and depth observations. Further, the ground-truth trajectories are recovered well in simulation using the resulting shape and physical properties. This enables predictions about the future motion of objects by physical simulation.

We anticipate that our contributions can be useful building blocks in the development of 3D environment perception for robots. The reconstruction of individual objects as in EM-Fusion is a key ingredient required for interactions with objects. Completed shapes such as the ones provided by Co-Section provide useful cues for planning interactions like grasping of objects. Finally, the recovery of shape and other physical parameters using differentiable simulation as in DiffSDFSim allows simulating objects and thus predicting the effects of interactions. Future work might extend the presented works for interactive perception of dynamic environments by comparing these predictions with observed real-world interactions to further improve the reconstructions and physical parameter estimations.

ev

link (url) DOI [BibTex]

Tube-shaped robotic device with anisotropic surface structure

Wang, T., Hu, W., Sitti, M.

2023, US Patent App. 18/133,104 (patent)

pi

Carrier, use of a carrier, method of activating a carrier and method of making a carrier

Drotlef, D., Sitti, M., Amjadi, M.

2023, US Patent App. 16/500,442 (patent)

pi

Synchronizing Machine Learning Algorithms, Realtime Robotic Control and Simulated Environment with o80

Berenz, V., Widmaier, F., Guist, S., Schölkopf, B., Büchler, D.

Robot Software Architectures Workshop (RSA) 2023, ICRA, 2023 (techreport)

Abstract

Robotic applications require the integration of various modalities, encompassing perception, control of real robots and possibly the control of simulated environments. While the state-of-the-art robotic software solutions such as ROS 2 provide most of the required features, flexible synchronization between algorithms, data streams and control loops can be tedious. o80 is a versatile C++ framework for robotics which provides a shared memory model and a command framework for real-time critical systems. It enables expert users to set up complex robotic systems and generate Python bindings for scientists. o80's unique feature is its flexible synchronization between processes, including the traditional blocking commands and the novel "bursting mode", which allows user code to control the execution of the lower process control loop. This makes it particularly useful for setups that mix real and simulated environments.

ei

arxiv poster link (url) [BibTex]

Wave front shaping with zone plates: Fabrication and characterization of lenses for soft x-ray applications from standard to singular optics

Baluktsian, M.

Universität Stuttgart, Stuttgart (und Verlag Dr. Hut, München), 2023 (phdthesis)

mms

link (url) [BibTex]

Tailored perovskite-type oxynitride semiconductors and oxides with advanced physical properties

Bubeck, C.

Technische Universität Darmstadt, Darmstadt, 2023 (phdthesis)

mms

link (url) DOI [BibTex]

Microfibers with mushroom-shaped tips for optimal adhesion

Sitti, M., Aksak, B.

2023, US Patent 11,613,674 (patent)

pi

Method of fabricating a magnetic deformable machine and deformable 3D magnetic machine

Zhang, J., Ren, Z., Hu, W., Sitti, M.

2023, US Patent App. 18/020,161 (patent)

pi

Magnetic trap system and method of navigating a microscopic device

Son, D., Ugurlu, M., Bluemer, P., Sitti, M.

2023, US Patent App. 17/871,598 (patent)

pi

A Liquid Repellent Fibrillar Dry Adhesive Material and a Method of Producing the Same

Sitti, M., Drotlef, D., Liimatainen, V.

2023, US Patent App. 17/785,452 (patent)

pi

Simultaneous calibration method for magnetic localization and actuation systems

Sitti, M., Son, D., Dong, X.

2023, US Patent 11,717,142 (patent)

pi

Dry adhesives and methods for making dry adhesives

M Sitti, M. M. B. A.

2023, US Patent 11,773,298 (patent)

pi

Static and dynamic investigation of magnonic systems: materials, applications and modeling

Schulz, Frank Martin Ernst

Universität Stuttgart, Stuttgart, 2023 (phdthesis)

mms

link (url) DOI [BibTex]
