Publications

Perceiving Systems Thesis Dynamic 3D Synthesis: From Video-Based Animatable Head Avatars to Text-Guided 4D Content Creation Zheng, Y. 2025 (Published)
The synthesis of 4D content (dynamic 3D content that evolves over time) has become increasingly important across a wide range of applications, including virtual communication, gaming, AR/VR, and digital content creation. Despite recent advances, generating realistic 4D content from accessible inputs remains a significant challenge. Existing approaches often rely on dense multi-camera capture systems, which are costly and impractical for everyday use, or yield results with limited geometric and visual fidelity. This thesis investigates two subtasks in 4D content creation: (1) the reconstruction of high-fidelity, animatable head avatars from accessible inputs such as monocular RGB videos, and (2) the generation of dynamic 4D scenes from text prompts and optionally sparse visual input, such as reference images. These two directions are unified by a common goal: enabling controllable and high-quality 4D content creation from minimal visual supervision. The first part of this thesis presents IMavatar, a morphable implicit surface representation for reconstructing personalized head avatars from monocular videos. Implicit surfaces provide topological flexibility and can recover detailed 3D geometry directly from RGB images, making them well-suited for head avatar reconstruction. However, modeling expression- and pose-dependent deformations in an interpretable and generalizable way remains a major challenge when working with implicit representations. Inspired by 3D morphable models, IMavatar models deformation by learning expression blendshapes and skinning weight fields in a canonical space, enabling structured and generalizable control over novel expressions and poses. To enable end-to-end optimization from monocular videos, we propose a novel analytical gradient formulation that supports joint training of the geometry and deformation directly from RGB supervision. 
By combining the geometric fidelity of neural implicit fields with the controllability of morphable models, IMavatar achieves high-quality 4D reconstructions and strong generalization to unseen expressions and head poses. The second part of this thesis presents PointAvatar, a deformable point-based representation for animatable 3D head avatars. While implicit representations are effective at learning detailed geometry from image observations, they are inherently difficult to animate and computationally expensive to render. To address these limitations, this work explores point clouds as the underlying geometric representation for head avatars, offering the efficiency of explicit representations while avoiding the fixed-topology constraints of meshes. PointAvatar uses a canonical point cloud combined with learned blendshape and skinning weight fields, and further disentangles intrinsic albedo from view-dependent shading to support relighting under novel illumination. To improve training stability and reconstruction quality, we adopt a coarse-to-fine strategy that gradually increases point cloud resolution during learning. This enables the model to effectively capture accurate geometry and high-quality texture from monocular RGB videos, including challenging cases such as eyeglasses and complex hairstyles. Compared to IMavatar, PointAvatar achieves an 8× speed-up during training and a 100× speed-up during inference rendering, while maintaining high visual and geometric quality. In the final part, this thesis explores Dream-in-4D, a diffusion-guided framework for generating creative 4D content from natural language. The focus is on synthesizing imaginative 4D scenes from minimal visual input—either a single image or no visual input at all. To this end, the method leverages prior knowledge from pre-trained image and video diffusion models to optimize a 4D representation. Dream-in-4D follows a two-stage pipeline. 
In the first stage, a static 3D model is optimized as a neural radiance field using guidance from both image and 3D-aware diffusion models, resulting in high-quality, view-consistent assets. In the second stage, a time-dependent, multi-resolution deformation field is introduced to represent motion and is optimized using video diffusion guidance, equipping the static 3D asset with detailed and plausible motion driven by text prompts. The resulting system supports text-to-4D, image-to-4D, and personalized 4D generation within a unified framework, enabling intuitive and flexible dynamic scene synthesis from highly accessible inputs. Together, these methods address two essential aspects of 4D content creation: the reconstruction of animatable head avatars from monocular videos, and the generation of dynamic, imaginative 4D scenes from text and image prompts. We hope these contributions advance the field toward more accessible, controllable, and high-quality 4D content creation—enabling a broad range of applications across research, industry, and creative practice.
DOI URL BibTeX
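As a rough illustration of the "expression blendshapes and skinning weight fields" idea in the abstract above, the following minimal sketch deforms a single canonical point by first adding expression-weighted blendshape offsets and then applying linear blend skinning. It is not the IMavatar or PointAvatar implementation: in those methods the blendshapes and skinning weights are learned fields, whereas here they are fixed, hypothetical values, and the function name `deform_point` is an assumption made for this example.

```python
def deform_point(canonical, blendshapes, expression, weights, transforms):
    """Deform one canonical 3D point (illustrative sketch, hypothetical inputs).

    canonical:   (x, y, z) point in canonical space
    blendshapes: list of (dx, dy, dz) offsets, one per expression coefficient
    expression:  list of expression coefficients
    weights:     skinning weights, one per bone, assumed to sum to 1
    transforms:  list of (R, t) rigid transforms, R is 3x3, t is length 3
    """
    # Step 1: add the expression-dependent offset to the canonical point.
    shaped = [
        c + sum(e * b[i] for e, b in zip(expression, blendshapes))
        for i, c in enumerate(canonical)
    ]
    # Step 2: linear blend skinning, a weighted sum of per-bone rigid transforms.
    out = [0.0, 0.0, 0.0]
    for w, (rot, trans) in zip(weights, transforms):
        for i in range(3):
            rotated = sum(rot[i][j] * shaped[j] for j in range(3))
            out[i] += w * (rotated + trans[i])
    return out


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two hypothetical bones: one static, one translated by 2 units along z.
bones = [(IDENTITY, [0.0, 0.0, 0.0]), (IDENTITY, [0.0, 0.0, 2.0])]
deformed = deform_point(
    canonical=(1.0, 2.0, 3.0),
    blendshapes=[(0.1, 0.0, 0.0)],
    expression=[1.0],
    weights=[0.5, 0.5],
    transforms=bones,
)
```

With equal weights on the two bones, the point moves halfway along the second bone's translation, which is the blending behaviour that skinning weights are meant to control.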

Haptic Intelligence Bachelor Thesis Kalman Filter Approach to Sensor Fusion of Ultra-Wideband Positioning and IMU Readings for Enhanced Indoor Tracking of Collaborating Humans Hudhud Mughrabi, M. Kadir Has University, Istanbul, Turkey, June 2024, Bachelor of Science (BSc) in Mechatronics Engineering (Published)
The question of how humans collaborate to perform complex tasks such as surgery has previously been investigated via multimodal sensing and analysis. Ultra-wideband (UWB) localization systems can be deployed to track collaborating team members due to good maneuverability even in cramped environments. However, UWB systems' sampling rate is inversely proportional to the number of people tracked, and their accuracy is hindered by electromagnetic occlusion. This thesis combines UWB positioning with measurements from a wearable inertial measurement unit (IMU) by applying an error-state extended Kalman filter (ES-EKF) to improve position and orientation estimation during collaborative team studies. The ES-EKF offers faster and more consistent estimation and can continue to produce estimates even without UWB input. Single-human and multi-human sessions were recorded and filtered for evaluation in comparison to ground truth from optical motion capture. By integrating the IMU, the ES-EKF increases the sampling rate from 0.5–20 Hz to 100 Hz. As it is corrected in only 2 degrees of freedom (DOF), the ES-EKF yields improved results over UWB in 4 out of 6 DOF: lateral and longitudinal position and yaw and pitch orientation. Further filter design implications are suggested for future application of ES-EKF in position and orientation estimation of collaborating humans.
BibTeX
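To make the fusion idea in the abstract above concrete, here is a minimal sketch of a one-dimensional linear Kalman filter that predicts at the IMU rate and corrects whenever a position fix arrives. It is deliberately much simpler than the error-state extended Kalman filter used in the thesis: the state is only position and velocity along one axis, the noise values `q` and `r` are made-up assumptions, the simulated motion is noise-free, and the function names are hypothetical.

```python
def predict(state, cov, accel, dt, q):
    """Propagate [position, velocity] with one IMU acceleration sample."""
    pos, vel = state
    pos += vel * dt + 0.5 * accel * dt * dt
    vel += accel * dt
    (p00, p01), (p10, p11) = cov
    # cov <- F cov F^T + q * I, with F = [[1, dt], [0, 1]] (q is an assumed value)
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q
    return (pos, vel), ((n00, n01), (n10, n11))


def update(state, cov, z, r):
    """Correct the state with a position-only measurement z (e.g. a UWB fix)."""
    pos, vel = state
    (p00, p01), (p10, p11) = cov
    s = p00 + r                    # innovation variance for H = [1, 0]
    k0, k1 = p00 / s, p10 / s      # Kalman gain
    y = z - pos                    # innovation
    new_state = (pos + k0 * y, vel + k1 * y)
    new_cov = (((1 - k0) * p00, (1 - k0) * p01),
               (p10 - k1 * p00, p11 - k1 * p01))
    return new_state, new_cov


def run_demo(steps=500, dt=0.01, uwb_every=50):
    """Simulate 100 Hz prediction with a position fix every 50 steps (2 Hz).

    The true motion is a constant 1 m/s walk with zero acceleration.
    """
    state, cov = (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
    for k in range(1, steps + 1):
        state, cov = predict(state, cov, accel=0.0, dt=dt, q=1e-4)
        if k % uwb_every == 0:
            true_pos = 1.0 * k * dt
            state, cov = update(state, cov, z=true_pos, r=0.01)
    return state


final_pos, final_vel = run_demo()
```

Because the prediction step runs on every IMU sample, an estimate is available at 100 Hz even though corrections arrive far less often, which mirrors the sampling-rate gain described in the abstract.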

Empirical Inference Thesis Development of advanced methods for improving astronomical images Schmeißer, N. Eberhard Karls Universität Tübingen, Germany, 2014 BibTeX

Empirical Inference Probabilistic Numerics Thesis Camera-specific Image Denoising Schober, M. Eberhard Karls Universität Tübingen, Germany, October 2013 (Published) PDF BibTeX

Empirical Inference Thesis Detecting the mincut in sparse random graphs Köhler, R. Eberhard Karls Universität Tübingen, Germany, 2010 BibTeX

Empirical Inference Thesis Finding Gene-Gene Interactions using Support Vector Machines Rakitsch, B. Eberhard Karls Universität Tübingen, Germany, 2010 BibTeX

Empirical Inference Thesis Motor Control and Learning in Table Tennis Mülling, K. Eberhard Karls Universität Tübingen, Germany, 2009 BibTeX

Empirical Inference Thesis Reinforcement Learning for Motor Primitives Kober, J. Biologische Kybernetik, University of Stuttgart, Stuttgart, Germany, August 2008 PDF BibTeX

Empirical Inference Thesis Asymmetries of Time Series under Inverting their Direction Peters, J. Biologische Kybernetik, University of Heidelberg, August 2008 PDF BibTeX

Empirical Inference Thesis Pairwise Correlations and Multineuronal Firing Patterns in Primary Visual Cortex Berens, P. Biologische Kybernetik, Eberhard Karls Universität Tübingen, Tübingen, Germany, April 2008 BibTeX

Empirical Inference Thesis Development and Application of a Python Scripting Framework for BCI2000 Schreiner, T. Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, January 2008 BibTeX

Empirical Inference Thesis Statistical Learning Theory Approaches to Clustering Jegelka, S. Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, November 2007 PDF BibTeX

Empirical Inference Thesis Error Correcting Codes for the P300 Visual Speller Biessmann, F. Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, July 2007
The aim of brain-computer interface (BCI) research is to establish a communication system based on intentional modulation of brain activity. This is accomplished by classifying patterns of brain activity, volitionally induced by the user. The BCI presented in this study is based on a classical paradigm proposed by Farwell and Donchin (1988), the P300 visual speller. By recording electroencephalograms (EEG) from the scalp while presenting letters successively to the user, the speller can infer from the brain signal which letter the user was focussing on. Since EEG recordings are noisy, many repetitions are usually needed to detect the correct letter. The focus of this study was to improve the accuracy of the visual speller by applying some basic principles from information theory: stimulus sequences of the speller were modified into error-correcting codes. Additionally, a language model was incorporated into the probabilistic letter decoder. Classification of single EEG epochs was less accurate using error-correcting codes. However, the novel code could compensate for that, such that overall letter accuracies were as high as or even higher than for classical stimulus codes. In particular, at high noise levels, error-correcting decoding achieved higher letter accuracies.
PDF BibTeX
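The combination of error-correcting stimulus codes and a language-model prior described in the abstract above can be sketched as follows. This is an illustrative toy, not the decoder from the thesis: the four-letter codebook, the independent bit-flip noise model, the flip probability `eps`, and all names are assumptions chosen for the example. Each codeword marks the stimulus steps in which a letter is flashed, and any two codewords differ in four positions.

```python
import math

# Hypothetical codebook: 1 means the letter is flashed in that stimulus step.
CODEBOOK = {
    "A": "110100",
    "B": "011010",
    "C": "101001",
    "D": "000111",
}


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(observed, prior=None, eps=0.2):
    """Return the most probable letter given noisy per-step detections.

    observed: bit string of per-step classifier decisions
    prior:    optional dict of letter probabilities (a stand-in for a
              language model); uniform if omitted
    eps:      assumed probability that a single decision is wrong
    """
    if prior is None:
        prior = {letter: 1.0 / len(CODEBOOK) for letter in CODEBOOK}
    scores = {}
    for letter, code in CODEBOOK.items():
        d = hamming(observed, code)
        log_lik = d * math.log(eps) + (len(code) - d) * math.log(1.0 - eps)
        scores[letter] = math.log(prior[letter]) + log_lik
    return max(scores, key=scores.get)


# "A" was shown, but the first detection was misclassified.
noisy = "010100"
letter_uniform = decode(noisy)
# A strongly peaked prior can outweigh the evidence from the code.
letter_with_prior = decode(noisy, prior={"A": 0.02, "B": 0.94, "C": 0.02, "D": 0.02})
```

The first call shows the code absorbing a single misclassified epoch; the second shows how a prior enters the same log-score, which is the role a language model plays in a probabilistic letter decoder.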

Empirical Inference Thesis A priori Knowledge from Non-Examples Sinz, F. Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, March 2007 PDF Web BibTeX

Empirical Inference Thesis A Machine Learning Approach for Estimating the Attenuation Map for a Combined PET/MR Scanner Hofmann, M. Biologische Kybernetik, Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, 2007 BibTeX

Empirical Inference Thesis Kernel PCA for Image Compression Huhle, B. Biologische Kybernetik, Eberhard-Karls-Universität, Tübingen, Germany, April 2006 PDF BibTeX

Empirical Inference Thesis Implicit Surfaces For Modelling Human Heads Steinke, F. Biologische Kybernetik, Eberhard-Karls-Universität, Tübingen, September 2005 BibTeX

Empirical Inference Thesis Efficient Adaptive Sampling of the Psychometric Function by Maximizing Information Gain Tanner, T. Biologische Kybernetik, Eberhard-Karls University Tübingen, Tübingen, Germany, May 2005
A common task in psychophysics is to measure the psychometric function. A psychometric function can be described by its shape and four parameters: offset or threshold, slope or width, false alarm rate or chance level, and miss or lapse rate. Depending on the parameters of interest, some points on the psychometric function may be more informative than others. Adaptive methods attempt to place trials on the most informative points based on the data collected in previous trials. A new Bayesian adaptive psychometric method is introduced that places trials by minimising the expected entropy of the posterior probability distribution over a set of possible stimuli. The method is more flexible, faster and at least as efficient as the established method (Kontsevich and Tyler, 1999). Comparably accurate (2 dB) threshold and slope estimates can be obtained after about 30 and 500 trials, respectively. By using a dynamic termination criterion, the efficiency can be further improved. The method can be applied to all experimental designs, including yes/no designs, and allows acquisition of any set of free parameters. By weighting the importance of parameters, one can include nuisance parameters and adjust the relative expected errors. Use of nuisance parameters may lead to more accurate estimates than assuming a guessed fixed value. Block designs are supported and do not harm the performance if a sufficient number of trials are performed. The method was evaluated by computer simulations in which the role of parametric assumptions, its robustness, the quality of different point estimates, the effect of dynamic termination criteria and many other settings were investigated.
BibTeX
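The trial-placement rule described in the abstract above, choosing the stimulus that minimises the expected entropy of the posterior, can be sketched in a few lines. This is a generic illustration of that principle and not the method developed in the thesis: only the threshold is treated as unknown, the psychometric function is assumed to be logistic with fixed slope, guess rate, and lapse rate, and the grids and function names are hypothetical.

```python
import math


def psychometric(x, threshold, slope=1.0, guess=0.5, lapse=0.02):
    """Probability of a correct response at stimulus level x (assumed logistic)."""
    core = 1.0 / (1.0 + math.exp(-slope * (x - threshold)))
    return guess + (1.0 - guess - lapse) * core


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def posterior(prior, thresholds, x, correct):
    """Bayes update of the threshold distribution after one response.

    Returns the normalised posterior and the marginal probability of the response.
    """
    likelihood = [
        psychometric(x, t) if correct else 1.0 - psychometric(x, t)
        for t in thresholds
    ]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised], total


def next_stimulus(prior, thresholds, stimuli):
    """Pick the stimulus whose response is expected to leave the least entropy."""
    best_x, best_h = None, float("inf")
    for x in stimuli:
        expected_h = 0.0
        for correct in (True, False):
            post, p_response = posterior(prior, thresholds, x, correct)
            expected_h += p_response * entropy(post)
        if expected_h < best_h:
            best_x, best_h = x, expected_h
    return best_x, best_h


thresholds = [t * 0.5 for t in range(-10, 11)]      # candidate thresholds, -5 to 5
stimuli = [float(s) for s in range(-10, 11)]        # candidate stimulus levels
prior = [1.0 / len(thresholds)] * len(thresholds)   # uniform prior (assumption)
chosen_x, expected_entropy = next_stimulus(prior, thresholds, stimuli)
```

After each real trial, the posterior returned by `posterior` would become the prior for the next call to `next_stimulus`; extending the grid to slope and lapse rate follows the same pattern at higher computational cost.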

Empirical Inference Thesis Real-Time Face Detection Kienzle, W. Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, October 2003 BibTeX

Empirical Inference Thesis m-Alternative Forced Choice—Improving the Efficiency of the Method of Constant Stimuli Jäkel, F. Biologische Kybernetik, Graduate School for Neural and Behavioural Sciences, Tübingen, 2003 BibTeX

Empirical Inference Thesis Variationsverfahren zur Untersuchung von Grundzustandseigenschaften des Ein-Band Hubbard-Modells Eichhorn, J. Biologische Kybernetik, Technische Universität Dresden, Dresden, Germany, May 2001
Using different modifications of a new variational approach, static ground-state properties of the one-band Hubbard model, such as energy and staggered magnetisation, are calculated. By taking into account additional fluctuations, the method is gradually improved so that a very good description of the energy in one and two dimensions can be achieved. After a detailed discussion of the application in one dimension, extensions for two dimensions are introduced. In particular, by use of a modified version of the variational ansatz, a description of the quantum phase transition for the magnetisation should be possible.
PostScript BibTeX