Header logo is


2024


Leveraging Unpaired Data for the Creation of Controllable Digital Humans
Leveraging Unpaired Data for the Creation of Controllable Digital Humans

Sanyal, S.

Max Planck Institute for Intelligent Systems and Eberhard Karls Universität Tübingen, September 2024 (phdthesis) To be published

Abstract
Digital humans have grown increasingly popular, offering transformative potential across various fields such as education, entertainment, and healthcare. They enrich user experiences by providing immersive and personalized interactions. Enhancing these experiences involves making digital humans controllable, allowing for manipulation of aspects like pose and appearance, among others. Learning to create such controllable digital humans necessitates extensive data from diverse sources. This includes 2D human images alongside their corresponding 3D geometry and texture, 2D images showcasing similar appearances across a wide range of body poses, etc., for effective control over pose and appearance. However, the availability of such “paired data” is limited, making its collection both time-consuming and expensive. Despite these challenges, there is an abundance of unpaired 2D images with accessible, inexpensive labels—such as identity, type of clothing, appearance of clothing, etc. This thesis capitalizes on these affordable labels, employing informed observations from “unpaired data” to facilitate the learning of controllable digital humans through reconstruction, transposition, and generation processes. The presented methods—RingNet, SPICE, and SCULPT—each tackles different aspects of controllable digital human modeling. RingNet (Sanyal et al. [2019]) exploits the consistent facial geometry across different images of the same individual to estimate 3D face shapes and poses without 2D-to-3D supervision. This method illustrates how leveraging the inherent properties of unpaired images—such as identity consistency—can circumvent the need for expensive paired datasets. Similarly, SPICE (Sanyal et al. [2021]) employs a self-supervised learning framework that harnesses unpaired images to generate realistic transpositions of human poses by understanding the underlying 3D body structure and maintaining consistency in body shape and appearance features across different poses. Finally, SCULPT (Sanyal et al. [2024] generates clothed and textured 3D meshes by integrating insights from unpaired 2D images and medium-sized 3D scans. This process employs an unpaired learning approach, conditioning texture and geometry generation on attributes easily derived from data, like the type and appearance of clothing. In conclusion, this thesis highlights how unpaired data and innovative learning techniques can address the challenges of data scarcity and high costs in developing controllable digital humans by advancing reconstruction, transposition, and generation techniques.

ps

[BibTex]

2024


[BibTex]


Realistic Digital Human Characters: Challenges, Models and Algorithms
Realistic Digital Human Characters: Challenges, Models and Algorithms

Osman, A. A. A.

University of Tübingen, September 2024 (phdthesis)

Abstract
Statistical models for the body, head, and hands are essential in various computer vision tasks. However, popular models like SMPL, MANO, and FLAME produce unrealistic deformations due to inherent flaws in their modeling assumptions and how they are trained, which have become standard practices in constructing models for the body and its parts. This dissertation addresses these limitations by proposing new modeling and training algorithms to improve the realism and generalization of current models. We introduce a new model, STAR (Sparse Trained Articulated Human Body Regressor), which learns a sparse representation of the human body deformations, significantly reducing the number of model parameters compared to models like SMPL. This approach ensures that deformations are spatially localized, leading to more realistic deformations. STAR also incorporates shape-dependent pose deformations, accounting for variations in body shape to enhance overall model accuracy and realism. Additionally, we present a novel federated training algorithm for developing a comprehensive suite of models for the body and its parts. We train an expressive body model, SUPR (Sparse Unified Part-Based Representation), on a federated dataset of full-body scans, including detailed scans of the head, hands, and feet. We then separate SUPR into a full suite of state-of-the-art models for the head, hands, and foot. The new foot model captures complex foot deformations, addressing challenges related to foot shape, pose, and ground contact dynamics. The dissertation concludes by introducing AVATAR (Articulated Virtual Humans Trained By Bayesian Inference From a Single Scan), a novel, data-efficient training algorithm. AVATAR allows the creation of personalized, high-fidelity body models from a single scan by framing model construction as a Bayesian inference problem, thereby enabling training from small-scale datasets while reducing the risk of overfitting. These advancements push the state of the art in human body modeling and training techniques, making them more accessible for broader research and practical applications.

ps

[BibTex]


no image
Advances in Probabilistic Methods for Deep Learning

Immer, A.

ETH Zurich, Switzerland, September 2024, CLS PhD Program (phdthesis)

ei

[BibTex]

[BibTex]


no image
Engineering and Evaluating Naturalistic Vibrotactile Feedback for Telerobotic Assembly

Gong, Y.

University of Stuttgart, Stuttgart, Germany, August 2024, Faculty of Design, Production Engineering and Automotive Engineering (phdthesis)

Abstract
Teleoperation allows workers on a construction site to assemble pre-fabricated building components by controlling powerful machines from a safe distance. However, teleoperation's primary reliance on visual feedback limits the operator's efficiency in situations with stiff contact or poor visibility, compromising their situational awareness and thus increasing the difficulty of the task; it also makes construction machines more difficult to learn to operate. To bridge this gap, we propose that reliable, economical, and easy-to-implement naturalistic vibrotactile feedback could improve telerobotic control interfaces in construction and other application areas such as surgery. This type of feedback enables the operator to feel the natural vibrations experienced by the robot, which contain crucial information about its motions and its physical interactions with the environment. This dissertation explores how to deliver naturalistic vibrotactile feedback from a robot's end-effector to the hand of an operator performing telerobotic assembly tasks; furthermore, it seeks to understand the effects of such haptic cues. The presented research can be divided into four parts. We first describe the engineering of AiroTouch, a naturalistic vibrotactile feedback system tailored for use on construction sites but suitable for many other applications of telerobotics. Then we evaluate AiroTouch and explore the effects of the naturalistic vibrotactile feedback it delivers in three user studies conducted either in laboratory settings or on a construction site. We begin this dissertation by developing guidelines for creating a haptic feedback system that provides high-quality naturalistic vibrotactile feedback. These guidelines include three sections: component selection, component placement, and system evaluation. We detail each aspect with the parameters that need to be considered. Based on these guidelines, we adapt widely available commercial audio equipment to create our system called AiroTouch, which measures the vibration experienced by each robot tool with a high-bandwidth three-axis accelerometer and enables the user to feel this vibration in real time through a voice-coil actuator. Accurate haptic transmission is achieved by optimizing the positions of the system's off-the-shelf sensors and actuators and is then verified through measurements. The second part of this thesis presents our initial validation of AiroTouch. We explored how adding this naturalistic type of vibrotactile feedback affects the operator during small-scale telerobotic assembly. Due to the limited accessibility of teleoperated robots and to maintain safety, we conducted a user study in lab with a commercial bimanual dexterous teleoperation system developed for surgery (Intuitive da Vinci Si). Thirty participants used this robot equipped with AiroTouch to assemble a small stiff structure under three randomly ordered haptic feedback conditions: no vibrations, one-axis vibrations, and summed three-axis vibrations. The results show that participants learn to take advantage of both tested versions of the haptic feedback in the given tasks, as significantly lower vibrations and forces are observed in the second trial. Subjective responses indicate that naturalistic vibrotactile feedback increases the realism of the interaction and reduces the perceived task duration, task difficulty, and fatigue. To test our approach on a real construction site, we enhanced AiroTouch using wireless signal-transmission technologies and waterproofing, and then we adapted it to a mini-crane construction robot. A study was conducted to evaluate how naturalistic vibrotactile feedback affects an observer's understanding of telerobotic assembly performed by this robot on a construction site. Seven adults without construction experience observed a mix of manual and autonomous assembly processes both with and without naturalistic vibrotactile feedback. Qualitative analysis of their survey responses and interviews indicates that all participants had positive responses to this technology and believed it would be beneficial for construction activities. Finally, we evaluated the effects of naturalistic vibrotactile feedback provided by wireless AiroTouch during live teleoperation of the mini-crane. Twenty-eight participants remotely controlled the mini-crane to complete three large-scale assembly-related tasks in lab, both with and without this type of haptic feedback. Our results show that naturalistic vibrotactile feedback enhances the participants' awareness of both robot motion and contact between the robot and other objects, particularly in scenarios with limited visibility. These effects increase participants' confidence when controlling the robot. Moreover, there is a noticeable trend of reduced vibration magnitude in the conditions where this type of haptic feedback is provided. The primary contribution of this dissertation is the clear explanation of details that are essential for the effective implementation of naturalistic vibrotactile feedback. We demonstrate that our accessible, audio-based approach can enhance user performance and experience during telerobotic assembly in construction and other application domains. These findings lay the foundation for further exploration of the potential benefits of incorporating haptic cues to enhance user experience during teleoperation.

hi

Project Page [BibTex]

Project Page [BibTex]


no image
A Measure-Theoretic Axiomatisation of Causality and Kernel Regression

Park, J.

University of Tübingen, Germany, July 2024 (phdthesis)

ei

[BibTex]

[BibTex]


Modelling Dynamic 3D Human-Object Interactions: From Capture to Synthesis
Modelling Dynamic 3D Human-Object Interactions: From Capture to Synthesis

Taheri, O.

University of Tübingen, July 2024 (phdthesis) To be published

Abstract
Modeling digital humans that move and interact realistically with virtual 3D worlds has emerged as an essential research area recently, with significant applications in computer graphics, virtual and augmented reality, telepresence, the Metaverse, and assistive technologies. In particular, human-object interaction, encompassing full-body motion, hand-object grasping, and object manipulation, lies at the core of how humans execute tasks and represents the complex and diverse nature of human behavior. Therefore, accurate modeling of these interactions would enable us to simulate avatars to perform tasks, enhance animation realism, and develop applications that better perceive and respond to human behavior. Despite its importance, this remains a challenging problem, due to several factors such as the complexity of human motion, the variance of interaction based on the task, and the lack of rich datasets capturing the complexity of real-world interactions. Prior methods have made progress, but limitations persist as they often focus on individual aspects of interaction, such as body, hand, or object motion, without considering the holistic interplay among these components. This Ph.D. thesis addresses these challenges and contributes to the advancement of human-object interaction modeling through the development of novel datasets, methods, and algorithms.

ps

[BibTex]

[BibTex]


no image
Empowering Change: The Role of Student Changemakers in Advancing Sustainability within Engineering Education

Matthew, V., Simancek, R. E., Telepo, E., Machesky, J., Willman, H., Ismail, A. B., Schulz, A. K.

Proceedings of the American Society of Engineering Education (ASEE), June 2024, Victoria Matthew and Andrew K. Schulz contributed equally to this publication. (issue) In press

hi

[BibTex]

[BibTex]


no image
Advancing Normalising Flows to Model Boltzmann Distributions

Stimper, V.

University of Cambridge, UK, Cambridge, June 2024, (Cambridge-Tübingen-Fellowship-Program) (phdthesis)

ei

[BibTex]

[BibTex]


no image
Language Models Can Reduce Asymmetry in Information Markets

Rahaman, N., Weiss, M., Wüthrich, M., Bengio, Y., Li, E., Pal, C., Schölkopf, B.

arXiv:2403.14443, March 2024, Published as: Redesigning Information Markets in the Era of Language Models, Conference on Language Modeling (COLM) (techreport)

Abstract
This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The central mechanism enabling this marketplace is the agents' dual capabilities: they not only have the capacity to assess the quality of privileged information but also come equipped with the ability to forget. This ability to induce amnesia allows vendors to grant temporary access to proprietary information, significantly reducing the risk of unauthorized retention while enabling agents to accurately gauge the information's relevance to specific queries or tasks. To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information. Concretely, our experiments (a) uncover biases in language models leading to irrational behavior and evaluate techniques to mitigate these biases, (b) investigate how price affects demand in the context of informational goods, and (c) show that inspection and higher budgets both lead to higher quality outcomes.

ei

link (url) [BibTex]

link (url) [BibTex]


no image
Identifiable Causal Representation Learning

von Kügelgen, J.

University of Cambridge, UK, Cambridge, February 2024, (Cambridge-Tübingen-Fellowship) (phdthesis)

ei

[BibTex]

[BibTex]


Creating a Haptic Empathetic Robot Animal That Feels Touch and Emotion
Creating a Haptic Empathetic Robot Animal That Feels Touch and Emotion

Burns, R.

University of Tübingen, Tübingen, Germany, February 2024, Department of Computer Science (phdthesis)

Abstract
Social touch, such as a hug or a poke on the shoulder, is an essential aspect of everyday interaction. Humans use social touch to gain attention, communicate needs, express emotions, and build social bonds. Despite its importance, touch sensing is very limited in most commercially available robots. By endowing robots with social-touch perception, one can unlock a myriad of new interaction possibilities. In this thesis, I present my work on creating a Haptic Empathetic Robot Animal (HERA), a koala-like robot for children with autism. I demonstrate the importance of establishing design guidelines based on one's target audience, which we investigated through interviews with autism specialists. I share our work on creating full-body tactile sensing for the NAO robot using low-cost, do-it-yourself (DIY) methods, and I introduce an approach to model long-term robot emotions using second-order dynamics.

hi

Project Page [BibTex]

Project Page [BibTex]


no image
Learning a Terrain- and Robot-Aware Dynamics Model for Autonomous Mobile Robot Navigation

Achterhold, J., Guttikonda, S., Kreber, J. U., Li, H., Stueckler, J.

CoRR abs/2409.11452, 2024, Preprint submitted to Robotics and Autonomous Systems Journal. https://arxiv.org/abs/2409.11452 (techreport) Submitted

Abstract
Mobile robots should be capable of planning cost-efficient paths for autonomous navigation. Typically, the terrain and robot properties are subject to variations. For instance, properties of the terrain such as friction may vary across different locations. Also, properties of the robot may change such as payloads or wear and tear, e.g., causing changing actuator gains or joint friction. Autonomous navigation approaches should thus be able to adapt to such variations. In this article, we propose a novel approach for learning a probabilistic, terrain- and robot-aware forward dynamics model (TRADYN) which can adapt to such variations and demonstrate its use for navigation. Our learning approach extends recent advances in meta-learning forward dynamics models based on Neural Processes for mobile robot navigation. We evaluate our method in simulation for 2D navigation of a robot with uni-cycle dynamics with varying properties on terrain with spatially varying friction coefficients. In our experiments, we demonstrate that TRADYN has lower prediction error over long time horizons than model ablations which do not adapt to robot or terrain variations. We also evaluate our model for navigation planning in a model-predictive control framework and under various sources of noise. We demonstrate that our approach yields improved performance in planning control-efficient paths by taking robot and terrain properties into account.

ev

preprint [BibTex]

preprint [BibTex]


no image
A Pontryagin Perspective on Reinforcement Learning

Eberhard, O., Vernade, C., Muehlebach, M.

Max Planck Institute for Intelligent Systems, 2024 (techreport)

lds

link (url) [BibTex]

link (url) [BibTex]


Self- and Interpersonal Contact in 3D Human Mesh Reconstruction
Self- and Interpersonal Contact in 3D Human Mesh Reconstruction

Müller, L.

University of Tübingen, Tübingen, 2024 (phdthesis)

Abstract
The ability to perceive tactile stimuli is of substantial importance for human beings in establishing a connection with the surrounding world. Humans rely on the sense of touch to navigate their environment and to engage in interactions with both themselves and other people. The field of computer vision has made great progress in estimating a person’s body pose and shape from an image, however, the investigation of self- and interpersonal contact has received little attention despite its considerable significance. Estimating contact from images is a challenging endeavor because it necessitates methodologies capable of predicting the full 3D human body surface, i.e. an individual’s pose and shape. The limitations of current methods become evident when considering the two primary datasets and labels employed within the community to supervise the task of human pose and shape estimation. First, the widely used 2D joint locations lack crucial information for representing the entire 3D body surface. Second, in datasets of 3D human bodies, e.g. collected from motion capture systems or body scanners, contact is usually avoided, since it naturally leads to occlusion which complicates data cleaning and can break the data processing pipelines. In this thesis, we first address the problem of estimating contact that humans make with themselves from RGB images. To do this, we introduce two novel methods that we use to create new datasets tailored for the task of human mesh estimation for poses with self-contact. We create (1) 3DCP, a dataset of 3D body scan and motion capture data of humans in poses with self-contact and (2) MTP, a dataset of images taken in the wild with accurate 3D reference data using pose mimicking. Next, we observe that 2D joint locations can be readily labeled at scale given an image, however, an equivalent label for self-contact does not exist. Consequently, we introduce (3) distrecte self-contact (DSC) annotations indicating the pairwise contact of discrete regions on the human body. We annotate three existing image datasets with discrete self-contact and use these labels during mesh optimization to bring body parts supposed to touch into contact. Then we train TUCH, a human mesh regressor, on our new datasets. When evaluated on the task of human body pose and shape estimation on public benchmarks, our results show that knowing about self-contact not only improves mesh estimates for poses with self-contact, but also for poses without self-contact. Next, we study contact humans make with other individuals during close social interaction. Reconstructing these interactions in 3D is a significant challenge due to the mutual occlusion. Furthermore, the existing datasets of images taken in the wild with ground-truth contact labels are of insufficient size to facilitate the training of a robust human mesh regressor. In this work, we employ a generative model, BUDDI, to learn the joint distribution of 3D pose and shape of two individuals during their interaction and use this model as prior during an optimization routine. To construct training data we leverage pre-existing datasets, i.e. motion capture data and Flickr images with discrete contact annotations. Similar to discrete self-contact labels, we utilize discrete human- human contact to jointly fit two meshes to detected 2D joint locations. The majority of methods for generating 3D humans focus on the motion of a single person and operate on 3D joint locations. While these methods can effectively generate motion, their representation of 3D humans is not sufficient for physical contact since they do not model the body surface. Our approach, in contrast, acts on the pose and shape parameters of a human body model, which enables us to sample 3D meshes of two people. We further demonstrate how the knowledge of human proxemics, incorporated in our model, can be used to guide an optimization routine. For this, in each optimization iteration, BUDDI takes the current mesh and proposes a refinement that we subsequently consider in the objective function. This procedure enables us to go beyond state of the art by forgoing ground-truth discrete human-human contact labels during optimization. Self- and interpersonal contact happen on the surface of the human body, however, the majority of existing art tends to predict bodies with similar, “average” body shape. This is due to a lack of training data of paired images taken in the wild and ground- truth 3D body shape and because 2D joint locations are not sufficient to explain body shape. The most apparent solution would be to collect body scans of people together with their photos. This is, however, a time-consuming and cost-intensive process that lacks scalability. Instead, we leverage the vocabulary humans use to describe body shape. First, we ask annotators to label how much a word like “tall” or “long legs” applies to a human body. We gather these ratings for rendered meshes of various body shapes, for which we have ground-truth body model shape parameters, and for images collected from model agency websites. Using this data, we learn a shape-to-attribute (A2S) model that predicts body shape ratings from body shape parameters. Then we train a human mesh regressor, SHAPY, on the model agency images wherein we supervise body shape via attribute annotations using A2S. Since no suitable test set of diverse 3D ground-truth body shape with images taken in natural settings exists, we introduce Human Bodies in the Wild (HBW). This novel dataset contains photographs of individuals together with their body scan. Our model predicts more realistic body shapes from an image and quantitatively improves body shape estimation on this new benchmark. In summary, we present novel datasets, optimization methods, a generative model, and regressors to advance the field of 3D human pose and shape estimation. Taken together, these methods open up ways to obtain more accurate and realistic 3D mesh estimates from images with multiple people in self- and mutual contact poses and with diverse body shapes. This line of research also enables generative approaches to create more natural, human-like avatars. We believe that knowing about self- and human-human contact through computer vision has wide-ranging implications in other fields as for example robotics, fitness, or behavioral science.

ps

[BibTex]

[BibTex]


no image
Distributed Event-Based Learning via ADMM

Er, D., Trimpe, S., Muehlebach, M.

Max Planck Institute for Intelligent Systems, 2024 (techreport)

lds

link (url) [BibTex]

link (url) [BibTex]


Natural Language Control for 3D Human Motion Synthesis
Natural Language Control for 3D Human Motion Synthesis

Petrovich, M.

LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, 2024 (phdthesis)

Abstract
3D human motions are at the core of many applications in the film industry, healthcare, augmented reality, virtual reality and video games. However, these applications often rely on expensive and time-consuming motion capture data. The goal of this thesis is to explore generative models as an alternative route to obtain 3D human motions. More specifically, our aim is to allow a natural language interface as a means to control the generation process. To this end, we develop a series of models that synthesize realistic and diverse motions following the semantic inputs. In our first contribution, described in Chapter 3, we address the challenge of generating human motion sequences conditioned on specific action categories. We introduce ACTOR, a conditional variational autoencoder (VAE) that learns an action-aware latent representation for human motions. We show significant gains over existing methods thanks to our new Transformer-based VAE formulation, encoding and decoding SMPL pose sequences through a single motion-level embedding. In our second contribution, described in Chapter 4, we go beyond categorical actions, and dive into the task of synthesizing diverse 3D human motions from textual descriptions allowing a larger vocabulary and potentially more fine-grained control. Our work stands out from previous research by not deterministically generating a single motion sequence, but by synthesizing multiple, varied sequences from a given text. We propose TEMOS, building on our VAE-based ACTOR architecture, but this time integrating a pretrained text encoder to handle large-vocabulary natural language inputs. In our third contribution, described in Chapter 5, we address the adjacent task of text-to-3D human motion retrieval, where the goal is to search in a motion collection by querying via text. We introduce a simple yet effective approach, named TMR, building on our earlier model TEMOS, by integrating a contrastive loss to enhance the structure of the cross-modal latent space. Our findings emphasize the importance of retaining the motion generation loss in conjunction with contrastive training for improved results. We establish a new evaluation benchmark and conduct analyses on several protocols. In our fourth contribution, described in Chapter 6, we introduce a new problem termed as “multi-track timeline control” for text-driven 3D human motion synthesis. Instead of a single textual prompt, users can organize multiple prompts in temporal intervals that may overlap. We introduce STMC, a test-time denoising method that can be integrated with any pre-trained motion diffusion model. Our evaluations demonstrate that our method generates motions that closely match the semantic and temporal aspects of the input timelines. In summary, our contributions in this thesis are as follows: (i) we develop a generative variational autoencoder, ACTOR, for action-conditioned generation of human motion sequences, (ii) we introduce TEMOS, a text-conditioned generative model that synthesizes diverse human motions from textual descriptions, (iii) we present TMR, a new approach for text-to-3D human motion retrieval, (iv) we propose STMC, a method for timeline control in text-driven motion synthesis, enabling the generation of detailed and complex motions.

ps

Thesis [BibTex]

Thesis [BibTex]


no image
Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators

Baumeister, F., Mack, L., Stueckler, J.

CoRR abs/2409.13228, CoRR, 2024, Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2025 (techreport) Submitted

Abstract
Few-shot adaptation is an important capability for intelligent robots that perform tasks in open-world settings such as everyday environments or flexible production. In this paper, we propose a novel approach for non-prehensile manipulation which iteratively adapts a physics-based dynamics model for model-predictive control. We adapt the parameters of the model incrementally with a few examples of robot-object interactions. This is achieved by sampling-based optimization of the parameters using a parallelizable rigid-body physics simulation as dynamic world model. In turn, the optimized dynamics model can be used for model-predictive control using efficient sampling-based optimization. We evaluate our few-shot adaptation approach in several object pushing experiments in simulation and with a real robot.

ev

preprint supplemental video link (url) [BibTex]

preprint supplemental video link (url) [BibTex]

2023


Fairness in Machine Learning: Limitations and Opportunities
Fairness in Machine Learning: Limitations and Opportunities

Barocas, S., Hardt, M., Narayanan, A.

MIT Press, December 2023 (book)

Abstract
An introduction to the intellectual foundations and practical utility of the recent work on fairness and machine learning. Fairness and Machine Learning introduces advanced undergraduate and graduate students to the intellectual foundations of this recently emergent field, drawing on a diverse range of disciplinary perspectives to identify the opportunities and hazards of automated decision-making. It surveys the risks in many applications of machine learning and provides a review of an emerging set of proposed solutions, showing how even well-intentioned applications may give rise to objectionable results. It covers the statistical and causal measures used to evaluate the fairness of machine learning models as well as the procedural and substantive aspects of decision-making that are core to debates about fairness, including a review of legal and philosophical perspectives on discrimination. This incisive textbook prepares students of machine learning to do quantitative work on fairness while reflecting critically on its foundations and its practical utility.• Introduces the technical and normative foundations of fairness in automated decision-making• Covers the formal and computational methods for characterizing and addressing problems• Provides a critical assessment of their intellectual foundations and practical utility• Features rich pedagogy and extensive instructor resources

sf

link (url) [BibTex]

2023


link (url) [BibTex]


no image
Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures

Jenny, D.

ETH Zurich, Switzerland, November 2023, external supervision (thesis)

ei

[BibTex]

[BibTex]


Hydraulically Amplified Self-healing Electrostatic Actuators
Hydraulically Amplified Self-healing Electrostatic Actuators

Keplinger, C. M., Acome, E. L., Kellaris, N. A., Mitchell, S. K.

(US Patent 11795979B2), October 2023 (patent)

Abstract
An electro-hydraulic actuator includes a deformable shell defining an enclosed internal cavity and containing a liquid dielectric, first and second electrodes on first and second sides, respectively, of the enclosed internal cavity. An electrostatic force between the first and second electrodes upon application of a voltage to one of the electrodes draws the electrodes towards each other to displace the liquid dielectric within the enclosed internal cavity. The shell includes active and inactive areas such that the electrostatic forces between the first and second electrodes displaces the liquid dielectric within the enclosed internal cavity from the active area of the shell to the inactive area of the shell. The first and second electrodes, the deformable shell, and the liquid dielectric cooperate to form a self-healing capacitor, and the liquid dielectric is configured for automatically filling breaches in the liquid dielectric resulting from dielectric breakdown.

rm

link (url) [BibTex]

link (url) [BibTex]


no image
Gesture-Based Nonverbal Interaction for Exercise Robots

Mohan, M.

University of Tübingen, Tübingen, Germany, October 2023, Department of Computer Science (phdthesis)

Abstract
When teaching or coaching, humans augment their words with carefully timed hand gestures, head and body movements, and facial expressions to provide feedback to their students. Robots, however, rarely utilize these nuanced cues. A minimally supervised social robot equipped with these abilities could support people in exercising, physical therapy, and learning new activities. This thesis examines how the intuitive power of human gestures can be harnessed to enhance human-robot interaction. To address this question, this research explores gesture-based interactions to expand the capabilities of a socially assistive robotic exercise coach, investigating the perspectives of both novice users and exercise-therapy experts. This thesis begins by concentrating on the user's engagement with the robot, analyzing the feasibility of minimally supervised gesture-based interactions. This exploration seeks to establish a framework in which robots can interact with users in a more intuitive and responsive manner. The investigation then shifts its focus toward the professionals who are integral to the success of these innovative technologies: the exercise-therapy experts. Roboticists face the challenge of translating the knowledge of these experts into robotic interactions. We address this challenge by developing a teleoperation algorithm that can enable exercise therapists to create customized gesture-based interactions for a robot. Thus, this thesis lays the groundwork for dynamic gesture-based interactions in minimally supervised environments, with implications for not only exercise-coach robots but also broader applications in human-robot interaction.

hi

Project Page [BibTex]

Project Page [BibTex]


High Strain Peano Hydraulically Amplified Self-Healing Electrostatic (HASEL) Transducers
High Strain Peano Hydraulically Amplified Self-Healing Electrostatic (HASEL) Transducers

Keplinger, C. M., Wang, X., Mitchell, S. K.

(US Patent App. 18/138,621), August 2023 (patent)

Abstract
High strain hydraulically amplified self-healing electrostatic transducers having increased maximum theoretical and practical strains are disclosed. In particular, the actuators include electrode configurations having a zipping front created by the attraction of the electrodes that is configured orthogonally to a strain axis along which the actuators. This configuration produces increased strains. In turn, various form factors for the actuator configuration are presented including an artificial circular muscle and a strain amplifying pulley system. Other actuator configurations are contemplated that include independent and opposed electrode pairs to create cyclic activation, hybrid electrode configurations, and use of strain limiting layers for controlled deflection of the actuator.

rm

link (url) [BibTex]


no image
Learning and Testing Powerful Hypotheses

Kübler, J. M.

University of Tübingen, Germany, July 2023 (phdthesis)

ei

[BibTex]

[BibTex]


Capacitive Self-Sensing for Electrostatic Transducers with High Voltage Isolation
Capacitive Self-Sensing for Electrostatic Transducers with High Voltage Isolation

Correll, N., Ly, K. D., Kellaris, N. A., Keplinger, C. M.

(US Patent App. 17/928,453), June 2023 (patent)

Abstract
Transducer systems disclosed herein include self-sensing capabilities. In particular, electrostatic transducers include a low voltage electrode and a high voltage electrode. A low voltage sensing unit is coupled with the low voltage electrode of the electrostatic transducer. The low voltage sensing unit is configured to measure a capacitance of the electrostatic transducer, from which displacement of the electrostatic transducer may be calculated. High voltage drive signals received by the high voltage electrode during actuation may be isolated from the low voltage sensing unit. The isolation may be provided by dielectric material of the electrostatic transducer, a voltage suppression component, and/or a voltage suppression module comprising a low impedance ground path. In the event of an electrical failure of the transducer, the low voltage sensing unit may be isolated from high voltages.

rm

link (url) [BibTex]

link (url) [BibTex]


no image
Learning Identifiable Representations: Independent Influences and Multiple Views

Gresele, L.

University of Tübingen, Germany, June 2023 (phdthesis)

ei

[BibTex]


no image
Learning with and for discrete optimization

Paulus, M.

ETH Zurich, Switzerland, May 2023, CLS PhD Program (phdthesis)

ei

[BibTex]

[BibTex]


High Strain Peano Hydraulically Amplified Self-healing Electrostatic (HASEL) Transducers
High Strain Peano Hydraulically Amplified Self-healing Electrostatic (HASEL) Transducers

Keplinger, C. M., Wang, X., Mitchell, S. K.

(US Patent 11635094), April 2023 (patent)

Abstract
High strain hydraulically amplified self-healing electrostatic transducers having increased maximum theoretical and practical strains are disclosed. In particular, the actuators include electrode configurations having a zipping front created by the attraction of the electrodes that is configured orthogonally to a strain axis along which the actuators. This configuration produces increased strains. In turn, various form factors for the actuator configuration are presented including an artificial circular muscle and a strain amplifying pulley system. Other actuator configurations are contemplated that include independent and opposed electrode pairs to create cyclic activation, hybrid electrode configurations, and use of strain limiting layers for controlled deflection of the actuator.

rm

link (url) [BibTex]


An Open-Source Modular Treadmill for Dynamic Force Measurement with Load Dependant Range Adjustment
An Open-Source Modular Treadmill for Dynamic Force Measurement with Load Dependant Range Adjustment

Sarvestani, A., Ruppert, F., Badri-Spröwitz, A.

2023 (unpublished) Submitted

Abstract
Ground reaction force sensing is one of the key components of gait analysis in legged locomotion research. To measure continuous force data during locomotion, we present a novel compound instrumented treadmill design. The treadmill is 1.7 m long, with a natural frequency of 170 Hz and an adjustable range that can be used for humans and small robots alike. Here, we present the treadmill’s design methodology and characterize it in its natural frequency, noise behavior and real-life performance. Additionally, we apply an ISO 376 norm conform calibration procedure for all spatial force directions and center of pressure position. We achieve a force accuracy of ≤ 5.6 N for the ground reaction forces and ≤ 13 mm in center of pressure position.

dlg

arXiv link (url) DOI [BibTex]


no image
Natural Language Processing for Policymaking

Jin, Z., Mihalcea, R.

In Handbook of Computational Social Science for Policy, pages: 141-162, 7, (Editors: Bertoni, E. and Fontana, M. and Gabrielli, L. and Signorelli, S. and Vespe, M.), Springer International Publishing, 2023 (inbook)

ei

DOI [BibTex]

DOI [BibTex]


no image
Object-Level Dynamic Scene Reconstruction With Physical Plausibility From RGB-D Images

Strecke, M. F.

Eberhard Karls Universität Tübingen, Tübingen, 2023 (phdthesis)

Abstract
Humans have the remarkable ability to perceive and interact with objects in the world around them. They can easily segment objects from visual data and have an intuitive understanding of how physics influences objects. By contrast, robots are so far often constrained to tailored environments for a specific task, due to their inability to reconstruct a versatile and accurate scene representation. In this thesis, we combine RGB-D video data with background knowledge of real-world physics to develop such a representation for robots.

Our contributions can be separated into two main parts: a dynamic object tracking tool and optimization frameworks that allow for improving shape reconstructions based on physical plausibility. The dynamic object tracking tool "EM-Fusion" detects, segments, reconstructs, and tracks objects from RGB-D video data. We propose a probabilistic data association approach for attributing the image pixels to the different moving objects in the scene. This allows us to track and reconstruct moving objects and the background scene with state-of-the art accuracy and robustness towards occlusions.

We investigate two ways of further optimizing the reconstructed shapes of moving objects based on physical plausibility. The first of these, "Co-Section", includes physical plausibility by reasoning about the empty space around an object. We observe that no two objects can occupy the same space at the same time and that the depth images in the input video provide an estimate of observed empty space. Based on these observations, we propose intersection and hull constraints, which we combine with the observed surfaces in a global optimization approach. Compared to EM-Fusion, which only reconstructs the observed surface, Co-Section optimizes watertight shapes. These watertight shapes provide a rough estimate of unseen surfaces and could be useful as initialization for further refinement, e.g., by interactive perception. In the second optimization approach, "DiffSDFSim", we reason about object shapes based on physically plausible object motion. We observe that object trajectories after collisions depend on the object's shape, and extend a differentiable physics simulation for optimizing object shapes together with other physical properties (e.g., forces, masses, friction) based on the motion of the objects and their interactions. Our key contributions are using signed distance function models for representing shapes and a novel method for computing gradients that models the dependency of the time of contact on object shapes. We demonstrate that our approach recovers target shapes well by fitting to target trajectories and depth observations. Further, the ground-truth trajectories are recovered well in simulation using the resulting shape and physical properties. This enables predictions about the future motion of objects by physical simulation.

We anticipate that our contributions can be useful building blocks in the development of 3D environment perception for robots. The reconstruction of individual objects as in EM-Fusion is a key ingredient required for interactions with objects. Completed shapes as the ones provided by Co-Section provide useful cues for planning interactions like grasping of objects. Finally, the recovery of shape and other physical parameters using differentiable simulation as in DiffSDFSim allows simulating objects and thus predicting the effects of interactions. Future work might extend the presented works for interactive perception of dynamic environments by comparing these predictions with observed real-world interactions to further improve the reconstructions and physical parameter estimations.

ev

link (url) DOI [BibTex]


Synchronizing Machine Learning Algorithms, Realtime Robotic Control and Simulated Environment with o80
Synchronizing Machine Learning Algorithms, Realtime Robotic Control and Simulated Environment with o80

Berenz, V., Widmaier, F., Guist, S., Schölkopf, B., Büchler, D.

Robot Software Architectures Workshop (RSA) 2023, ICRA, 2023 (techreport)

Abstract
Robotic applications require the integration of various modalities, encompassing perception, control of real robots and possibly the control of simulated environments. While the state-of-the-art robotic software solutions such as ROS 2 provide most of the required features, flexible synchronization between algorithms, data streams and control loops can be tedious. o80 is a versatile C++ framework for robotics which provides a shared memory model and a command framework for real-time critical systems. It enables expert users to set up complex robotic systems and generate Python bindings for scientists. o80's unique feature is its flexible synchronization between processes, including the traditional blocking commands and the novel ``bursting mode'', which allows user code to control the execution of the lower process control loop. This makes it particularly useful for setups that mix real and simulated environments.

ei

arxiv poster link (url) [BibTex]


no image
Wave front shaping with zone plates: Fabrication and characterization of lenses for soft x-ray applications from standard to singular optics

Baluktsian, M.

Universität Stuttgart, Stuttgart (und Verlag Dr. Hut, München), 2023 (phdthesis)

mms

link (url) [BibTex]


no image
Microfibers with mushroom-shaped tips for optimal adhesion

Sitti, M., Aksak, B.

2023, US Patent 11,613,674 (patent)

pi

[BibTex]

[BibTex]


Magnetic trap system and method of navigating a microscopic device
Magnetic trap system and method of navigating a microscopic device

Son, D., Ugurlu, M., Bluemer, P., Sitti, M.

2023, US Patent App. 17/871,598 (patent)

pi

[BibTex]


no image
Dry adhesives and methods for making dry adhesives

M Sitti, M. M. B. A.

2023, US Patent 11,773,298, 2023 (patent)

pi

[BibTex]


no image
Static and dynamic investigation of magnonic systems: materials, applications and modeling

Schulz, Frank Martin Ernst

Universität Stuttgart, Stuttgart, 2023 (phdthesis)

mms

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2022


no image
DRY ADHESIVES AND METHODS FOR MAKING DRY ADHESIVES

Metin Sitti, Michael Murphy, Burak Aksak

December 2022, US Patent App. 17/895,334, 2022 (patent)

pi

[BibTex]

2022


[BibTex]


no image
Multi-Timescale Representation Learning of Human and Robot Haptic Interactions

Richardson, B.

University of Stuttgart, Stuttgart, Germany, December 2022, Faculty of Computer Science, Electrical Engineering and Information Technology (phdthesis)

Abstract
The sense of touch is one of the most crucial components of the human sensory system. It allows us to safely and intelligently interact with the physical objects and environment around us. By simply touching or dexterously manipulating an object, we can quickly infer a multitude of its properties. For more than fifty years, researchers have studied how humans physically explore and form perceptual representations of objects. Some of these works proposed the paradigm through which human haptic exploration is presently understood: humans use a particular set of exploratory procedures to elicit specific semantic attributes from objects. Others have sought to understand how physically measured object properties correspond to human perception of semantic attributes. Few, however, have investigated how specific explorations are perceived. As robots become increasingly advanced and more ubiquitous in daily life, they are beginning to be equipped with haptic sensing capabilities and algorithms for processing and structuring haptic information. Traditional haptics research has so far strongly influenced the introduction of haptic sensation and perception into robots but has not proven sufficient to give robots the necessary tools to become intelligent autonomous agents. The work presented in this thesis seeks to understand how single and sequential haptic interactions are perceived by both humans and robots. In our first study, we depart from the more traditional methods of studying human haptic perception and investigate how the physical sensations felt during single explorations are perceived by individual people. We treat interactions as probability distributions over a haptic feature space and train a model to predict how similarly a pair of surfaces is rated, predicting perceived similarity with a reasonable degree of accuracy. Our novel method also allows us to evaluate how individual people weigh different surface properties when they make perceptual judgments. The method is highly versatile and presents many opportunities for further studies into how humans form perceptual representations of specific explorations. Our next body of work explores how to improve robotic haptic perception of single interactions. We use unsupervised feature-learning methods to derive powerful features from raw robot sensor data and classify robot explorations into numerous haptic semantic property labels that were assigned from human ratings. Additionally, we provide robots with more nuanced perception by learning to predict graded ratings of a subset of properties. Our methods outperform previous attempts that all used hand-crafted features, demonstrating the limitations of such traditional approaches. To push robot haptic perception beyond evaluation of single explorations, our final work introduces and evaluates a method to give robots the ability to accumulate information over many sequential actions; our approach essentially takes advantage of object permanence by conditionally and recursively updating the representation of an object as it is sequentially explored. We implement our method on a robotic gripper platform that performs multiple exploratory procedures on each of many objects. As the robot explores objects with new procedures, it gains confidence in its internal representations and classification of object properties, thus moving closer to the marvelous haptic capabilities of humans and providing a solid foundation for future research in this domain.

hi

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


Reconstructing Expressive {3D} Humans from {RGB} Images
Reconstructing Expressive 3D Humans from RGB Images

Choutas, V.

ETH Zurich, Max Planck Institute for Intelligent Systems and ETH Zurich, December 2022 (phdthesis)

Abstract
To interact with our environment, we need to adapt our body posture and grasp objects with our hands. During a conversation our facial expressions and hand gestures convey important non-verbal cues about our emotional state and intentions towards our fellow speakers. Thus, modeling and capturing 3D full-body shape and pose, hand articulation and facial expressions are necessary to create realistic human avatars for augmented and virtual reality. This is a complex task, due to the large number of degrees of freedom for articulation, body shape variance, occlusions from objects and self-occlusions from body parts, e.g. crossing our hands, and subject appearance. The community has thus far relied on expensive and cumbersome equipment, such as multi-view cameras or motion capture markers, to capture the 3D human body. While this approach is effective, it is limited to a small number of subjects and indoor scenarios. Using monocular RGB cameras would greatly simplify the avatar creation process, thanks to their lower cost and ease of use. These advantages come at a price though, since RGB capture methods need to deal with occlusions, perspective ambiguity and large variations in subject appearance, in addition to all the challenges posed by full-body capture. In an attempt to simplify the problem, researchers generally adopt a divide-and-conquer strategy, estimating the body, face and hands with distinct methods using part-specific datasets and benchmarks. However, the hands and face constrain the body and vice-versa, e.g. the position of the wrist depends on the elbow, shoulder, etc.; the divide-and-conquer approach can not utilize this constraint. In this thesis, we aim to reconstruct the full 3D human body, using only readily accessible monocular RGB images. In a first step, we introduce a parametric 3D body model, called SMPL-X, that can represent full-body shape and pose, hand articulation and facial expression. Next, we present an iterative optimization method, named SMPLify-X, that fits SMPL-X to 2D image keypoints. While SMPLify-X can produce plausible results if the 2D observations are sufficiently reliable, it is slow and susceptible to initialization. To overcome these limitations, we introduce ExPose, a neural network regressor, that predicts SMPL-X parameters from an image using body-driven attention, i.e. by zooming in on the hands and face, after predicting the body. From the zoomed-in part images, dedicated part networks predict the hand and face parameters. ExPose combines the independent body, hand, and face estimates by trusting them equally. This approach though does not fully exploit the correlation between parts and fails in the presence of challenges such as occlusion or motion blur. Thus, we need a better mechanism to aggregate information from the full body and part images. PIXIE uses neural networks called moderators that learn to fuse information from these two image sets before predicting the final part parameters. Overall, the addition of the hands and face leads to noticeably more natural and expressive reconstructions. Creating high fidelity avatars from RGB images requires accurate estimation of 3D body shape. Although existing methods are effective at predicting body pose, they struggle with body shape. We identify the lack of proper training data as the cause. To overcome this obstacle, we propose to collect internet images from fashion models websites, together with anthropometric measurements. At the same time, we ask human annotators to rate images and meshes according to a pre-defined set of linguistic attributes. We then define mappings between measurements, linguistic shape attributes and 3D body shape. Equipped with these mappings, we train a neural network regressor, SHAPY, that predicts accurate 3D body shapes from a single RGB image. We observe that existing 3D shape benchmarks lack subject variety and/or ground-truth shape. Thus, we introduce a new benchmark, Human Bodies in the Wild (HBW), which contains images of humans and their corresponding 3D ground-truth body shape. SHAPY shows how we can overcome the lack of in-the-wild images with 3D shape annotations through easy-to-obtain anthropometric measurements and linguistic shape attributes. Regressors that estimate 3D model parameters are robust and accurate, but often fail to tightly fit the observations. Optimization-based approaches tightly fit the data, by minimizing an energy function composed of a data term that penalizes deviations from the observations and priors that encode our knowledge of the problem. Finding the balance between these terms and implementing a performant version of the solver is a time-consuming and non-trivial task. Machine-learned continuous optimizers combine the benefits of both regression and optimization approaches. They learn the priors directly from data, avoiding the need for hand-crafted heuristics and loss term balancing, and benefit from optimized neural network frameworks for fast inference. Inspired from the classic Levenberg-Marquardt algorithm, we propose a neural optimizer that outperforms classic optimization, regression and hybrid optimization-regression approaches. Our proposed update rule uses a weighted combination of gradient descent and a network-predicted update. To show the versatility of the proposed method, we apply it on three other problems, namely full body estimation from (i) 2D keypoints, (ii) head and hand location from a head-mounted device and (iii) face tracking from dense 2D landmarks. Our method can easily be applied to new model fitting problems and offers a competitive alternative to well-tuned traditional model fitting pipelines, both in terms of accuracy and speed. To summarize, we propose a new and richer representation of the human body, SMPL-X, that is able to jointly model the 3D human body pose and shape, facial expressions and hand articulation. We propose methods, SMPLify-X, ExPose and PIXIE that estimate SMPL-X parameters from monocular RGB images, progressively improving the accuracy and realism of the predictions. To further improve reconstruction fidelity, we demonstrate how we can use easy-to-collect internet data and human annotations to overcome the lack of 3D shape data and train a model, SHAPY, that predicts accurate 3D body shape from a single RGB image. Finally, we propose a flexible learnable update rule for parametric human model fitting that outperforms both classic optimization and neural network approaches. This approach is easily applicable to a variety of problems, unlocking new applications in AR/VR scenarios.

ps

pdf [BibTex]

pdf [BibTex]


no image
Mechanical Design, Development and Testing of Bioinspired Legged Robots for Dynamic Locomotion

Sarvestani, L. A.

Eberhard Karls Universität Tübingen, Tübingen , November 2022 (phdthesis)

dlg

DOI [BibTex]

DOI [BibTex]


Hydraulically Amplified Self-healing Electrostatic Transducers Harnessing Zipping Mechanism
Hydraulically Amplified Self-healing Electrostatic Transducers Harnessing Zipping Mechanism

Keplinger, C. M., Acome, E. L., Kellaris, N. A., Mitchell, S. K., Morrissey, T. G.

(US Patent 11486421B2), November 2022 (patent)

Abstract
Hydraulically-amplified, self-healing, electrostatic transducers that harness electrostatic and hydraulic forces to achieve various actuation modes. Electrostatic forces between electrode pairs of the transducers generated upon application of a voltage to the electrode pairs draws the electrodes in each pair towards each other to displace a liquid dielectric contained within an enclosed internal cavity of the transducers to drive actuation in various manners. The electrodes and the liquid dielectric form a self-healing capacitor whereby the liquid dielectric automatically fills breaches in the liquid dielectric resulting from dielectric breakdown. Due to the resting shape of the cavity, a zipping-mechanism allows for selectively actuating the electrodes to a desired extent by controlling the voltage supplied.

rm

link (url) [BibTex]

link (url) [BibTex]


no image
Towards learning mechanistic models at the right level of abstraction

Neitz, A.

University of Tübingen, Germany, November 2022 (phdthesis)

ei

[BibTex]

[BibTex]