Publications | Max Planck Institute for Intelligent Systems

4912 results

2024

PuzzleAvatar: Assembling 3D Avatars from Personal Albums

Xiu, Y., Liu, Z., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, 43(6), ACM, December 2024 (article) To be published

Abstract

Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar in return? The challenge is that such casual photo collections contain diverse poses, challenging viewpoints, cropped views, and occlusion (albeit with a consistent outfit, accessories and hairstyle). We address this novel "Album2Human" task by developing PuzzleAvatar, a novel model that generates a faithful 3D avatar (in a canonical pose) from a personal OOTD album, while bypassing the challenging estimation of body and camera pose. To this end, we fine-tune a foundational vision-language model (VLM) on such photos, encoding the appearance, identity, garments, hairstyles, and accessories of a person into (separate) learned tokens and instilling these cues into the VLM. In effect, we exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar. Importantly, we can customize avatars by simply interchanging tokens. As a benchmark for this new task, we collect a new dataset, called PuzzleIOI, with 41 subjects in a total of nearly 1K OOTD configurations, in challenging partial photos with paired ground-truth 3D bodies. Evaluation shows that PuzzleAvatar not only achieves high reconstruction accuracy, outperforming TeCH and MVDreamBooth, but also shows unique scalability to album photos and strong robustness. Our code and data are publicly available for research purposes.



StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

Ye, C., Qiu, L., Gu, X., Zuo, Q., Wu, Y., Dong, Z., Bo, L., Xiu, Y., Han, X.

ACM Transactions on Graphics, 43(6), ACM, December 2024 (article) To be published

Abstract

This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step, which slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy, which starts with a one-step normal estimator (YOSO) to derive an initial normal guess that is relatively coarse but reliable, followed by a semantic-guided refinement process (SG-DRN) that refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScanNetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement. These results show that StableNormal retains both the "stability" and "sharpness" needed for accurate normal estimation. StableNormal represents an early attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models have been made publicly available.



Reinforcement learning in cold atom experiments

Reinschmidt, M., Fortágh, J., Günther, A., Volchkov, V.

Nature Communications, 15:8532, October 2024 (article)

Abstract

Cold atom traps are at the heart of many quantum applications in science and technology. The preparation and control of atomic clouds involves complex optimization processes that could be supported and accelerated by machine learning. In this work, we introduce reinforcement learning to cold atom experiments and demonstrate a flexible and adaptive approach to control a magneto-optical trap. Instead of following a set of predetermined rules to accomplish a specific task, the objectives are defined by a reward function. This approach not only optimizes the cooling of atoms just as an experimentalist would do, but also enables new operational modes such as the preparation of pre-defined numbers of atoms in a cloud. The machine control is trained to be robust against external perturbations and able to react to situations not seen during the training. Finally, we show that the time-consuming training can be performed in silico using a generic simulation and demonstrate successful transfer to the real-world experiment.

OS Lab



Hexagonal electrohydraulic modules for rapidly reconfigurable high-speed robots

Yoder, Z., Rumley, E., Schmidt, I., Rothemund, P., Keplinger, C.

Science Robotics, 9, September 2024 (article)

Abstract

Robots made from reconfigurable modular units feature versatility, cost efficiency, and improved sustainability compared with fixed designs. Reconfigurable modules driven by soft actuators provide adaptable actuation, safe interaction, and wide design freedom, but existing soft modules would benefit from high-speed and high-strain actuation, as well as driving methods well-suited to untethered operation. Here, we introduce a class of electrically actuated robotic modules that provide high-speed (a peak contractile strain rate of 4618% per second, 15.8-hertz bandwidth, and a peak specific power of 122 watts per kilogram), high-strain (49% contraction) actuation and that use magnets for reversible mechanical and electrical connections between neighboring modules, thereby serving as building blocks for rapidly reconfigurable and highly agile robotic systems. The actuation performance of each hexagonal electrohydraulic (HEXEL) module is enabled by a synergistic combination of soft and rigid components; a hexagonal exoskeleton of rigid plates amplifies the motion produced by soft electrohydraulic actuators and provides a mechanical structure and connection platform for reconfigurable robots composed of many modules. We characterize the actuation performance of individual HEXEL modules, present a model that captures their quasi-static force-stroke behavior, and demonstrate both a high-jumping and a fast pipe-crawling robot. Using embedded magnetic connections, we arranged multiple modules into reconfigurable robots with diverse functionality, including a high-stroke muscle, a multimodal active array, a table-top active platform, and a fast-rolling robot. We further leveraged the magnetic connections for hosting untethered, snap-on driving electronics, together highlighting the promise of HEXEL modules for creating rapidly reconfigurable high-speed robots.



Fiber-Optic Shape Sensing Using Neural Networks Operating on Multispecklegrams

Cao, C. G. L., Javot, B., Bhattarai, S., Bierig, K., Oreshnikov, I., Volchkov, V. V.

IEEE Sensors Journal, 24(17):27532-27540, September 2024 (article)

Abstract

Applying machine learning techniques to fiber speckle images to infer fiber deformation allows an unmodified multimode fiber to act as a shape sensor. This approach eliminates the need for complex fiber design or construction (e.g., Bragg gratings and time-of-flight). Prior work in shape determination using neural networks trained on a finite number of possible fiber shapes (formulated as a classification task), or trained on a few continuous degrees of freedom, has been limited to reconstructing fiber shapes only one bend at a time. Furthermore, generalization to shapes that were not used in training is challenging. Our approach improves generalization capabilities, using computer vision-assisted parameterization of the actual fiber shape to provide a ground truth, and multiple specklegrams per fiber shape obtained by controlling the input field. Results from experimenting with several neural network architectures, shape parameterizations, numbers of inputs, and specklegram resolutions show that fiber shapes with multiple bends can be accurately predicted. Our approach is able to generalize to new shapes that were not in the training set. This approach of end-to-end training on parameterized ground truth opens new avenues for fiber-optic sensor applications. We publish the datasets used for training and validation, as well as an out-of-distribution (OOD) test set, and encourage interested readers to access these datasets for their own model development.

hi ei OS Lab zwe-sw
