Header logo is


2017


no image
Evaluation of High-Fidelity Simulation as a Training Tool in Transoral Robotic Surgery

Bur, A. M., Gomez, E. D., Newman, J. G., Weinstein, G. S., Bert W. O’Malley, J., Rassekh, C. H., Kuchenbecker, K. J.

Laryngoscope, 127(12):2790-2795, December 2017 (article)

hi

DOI [BibTex]

2017


DOI [BibTex]


no image
Synchronicity Trumps Mischief in Rhythmic Human-Robot Social-Physical Interaction

Fitter, N. T., Kuchenbecker, K. J.

In Proceedings of the International Symposium on Robotics Research (ISRR), Puerto Varas, Chile, December 2017 (inproceedings) In press

Abstract
Hand-clapping games and other forms of rhythmic social-physical interaction might help foster human-robot teamwork, but the design of such interactions has scarcely been explored. We leveraged our prior work to enable the Rethink Robotics Baxter Research Robot to competently play one-handed tempo-matching hand-clapping games with a human user. To understand how such a robot’s capabilities and behaviors affect user perception, we created four versions of this interaction: the hand clapping could be initiated by either the robot or the human, and the non-initiating partner could be either cooperative, yielding synchronous motion, or mischievously uncooperative. Twenty adults tested two clapping tempos in each of these four interaction modes in a random order, rating every trial on standardized scales. The study results showed that having the robot initiate the interaction gave it a more dominant perceived personality. Despite previous results on the intrigue of misbehaving robots, we found that moving synchronously with the robot almost always made the interaction more enjoyable, less mentally taxing, less physically demanding, and lower effort for users than asynchronous interactions caused by robot or human mischief. Taken together, our results indicate that cooperative rhythmic social-physical interaction has the potential to strengthen human-robot partnerships.

hi

[BibTex]

[BibTex]


no image
Optimal gamification can help people procrastinate less

Lieder, F., Griffiths, T. L.

Annual Meeting of the Society for Judgment and Decision Making, Annual Meeting of the Society for Judgment and Decision Making, November 2017 (conference)

re

Project Page [BibTex]

Project Page [BibTex]


Learning a model of facial shape and expression from {4D} scans
Learning a model of facial shape and expression from 4D scans

Li, T., Bolkart, T., Black, M. J., Li, H., Romero, J.

ACM Transactions on Graphics, 36(6):194:1-194:17, November 2017, Two first authors contributed equally (article)

Abstract
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression from 4D face sequences in the D3DFACS dataset along with additional 4D sequences.We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33, 000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

ps

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]


Investigating Body Image Disturbance in Anorexia Nervosa Using Novel Biometric Figure Rating Scales: A Pilot Study
Investigating Body Image Disturbance in Anorexia Nervosa Using Novel Biometric Figure Rating Scales: A Pilot Study

Mölbert, S. C., Thaler, A., Streuber, S., Black, M. J., Karnath, H., Zipfel, S., Mohler, B., Giel, K. E.

European Eating Disorders Review, 25(6):607-612, November 2017 (article)

Abstract
This study uses novel biometric figure rating scales (FRS) spanning body mass index (BMI) 13.8 to 32.2 kg/m2 and BMI 18 to 42 kg/m2. The aims of the study were (i) to compare FRS body weight dissatisfaction and perceptual distortion of women with anorexia nervosa (AN) to a community sample; (ii) how FRS parameters are associated with questionnaire body dissatisfaction, eating disorder symptoms and appearance comparison habits; and (iii) whether the weight spectrum of the FRS matters. Women with AN (n = 24) and a community sample of women (n = 104) selected their current and ideal body on the FRS and completed additional questionnaires. Women with AN accurately picked the body that aligned best with their actual weight in both FRS. Controls underestimated their BMI in the FRS 14–32 and were accurate in the FRS 18–42. In both FRS, women with AN desired a body close to their actual BMI and controls desired a thinner body. Our observations suggest that body image disturbance in AN is unlikely to be characterized by a visual perceptual disturbance, but rather by an idealization of underweight in conjunction with high body dissatisfaction. The weight spectrum of FRS can influence the accuracy of BMI estimation.

ps

publisher DOI Project Page [BibTex]


Embodied Hands: Modeling and Capturing Hands and Bodies Together
Embodied Hands: Modeling and Capturing Hands and Bodies Together

Romero, J., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):245:1-245:17, 245:1–245:17, ACM, November 2017 (article)

Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low-resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural, activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.

ps

website youtube paper suppl video link (url) DOI Project Page [BibTex]

website youtube paper suppl video link (url) DOI Project Page [BibTex]


A Generative Model of People in Clothing
A Generative Model of People in Clothing

Lassner, C., Pons-Moll, G., Gehler, P. V.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computer Vision (ICCV), October 2017 (inproceedings)

Abstract
We present the first image-based generative model of people in clothing in a full-body setting. We sidestep the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people. Instead, we learn generative models from a large image database. The main challenge is to cope with the high variance in human pose, shape and appearance. For this reason, pure image-based approaches have not been considered so far. We show that this challenge can be overcome by splitting the generating process in two parts. First, we learn to generate a semantic segmentation of the body and clothing. Second, we learn a conditional model on the resulting segments that creates realistic images. The full model is differentiable and can be conditioned on pose, shape or color. The result are samples of people in different clothing items and styles. The proposed model can generate entirely new people with realistic clothing. In several experiments we present encouraging results that suggest an entirely data-driven approach to people generation is possible.

ps

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


Semantic Video CNNs through Representation Warping
Semantic Video CNNs through Representation Warping

Gadde, R., Jampani, V., Gehler, P. V.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computer Vision (ICCV), October 2017 (inproceedings) Accepted

Abstract
In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very lit- tle extra computational cost. This module is called Net- Warp and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of adjacent frames for warping internal network repre- sentations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to- end training. Experiments validate that the proposed ap- proach incurs only little extra computational cost, while im- proving performance, when video streams are available. We achieve new state-of-the-art results on the standard CamVid and Cityscapes benchmark datasets and show reliable im- provements over different baseline networks. Our code and models are available at http://segmentation.is. tue.mpg.de

ps

pdf Supplementary Project Page [BibTex]

pdf Supplementary Project Page [BibTex]


A simple yet effective baseline for 3d human pose estimation
A simple yet effective baseline for 3d human pose estimation

Martinez, J., Hossain, R., Romero, J., Little, J. J.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computer Vision (ICCV), October 2017 (inproceedings)

Abstract
Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-dimensional positions. With the goal of understanding these sources of error, we set out to build a system that given 2d joint locations predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feed-forward network outperforms the best reported result by about 30\% on Human3.6M, the largest publicly available 3d pose estimation benchmark. Furthermore, training our system on the output of an off-the-shelf state-of-the-art 2d detector (\ie, using images as input) yields state of the art results -- this includes an array of systems that have been trained end-to-end specifically for this task. Our results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggests directions to further advance the state of the art in 3d human pose estimation.

ps

video code arxiv pdf preprint Project Page [BibTex]

video code arxiv pdf preprint Project Page [BibTex]


An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking
An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking

Ahmad, A., Lawless, G., Lima, P.

IEEE Transactions on Robotics (T-RO), 33, pages: 1184 - 1199, October 2017 (article)

Abstract
In this article we present a unified approach for multi-robot cooperative simultaneous localization and object tracking based on particle filters. Our approach is scalable with respect to the number of robots in the team. We introduce a method that reduces, from an exponential to a linear growth, the space and computation time requirements with respect to the number of robots in order to maintain a given level of accuracy in the full state estimation. Our method requires no increase in the number of particles with respect to the number of robots. However, in our method each particle represents a full state hypothesis, leading to the linear dependency on the number of robots of both space and time complexity. The derivation of the algorithm implementing our approach from a standard particle filter algorithm and its complexity analysis are presented. Through an extensive set of simulation experiments on a large number of randomized datasets, we demonstrate the correctness and efficacy of our approach. Through real robot experiments on a standardized open dataset of a team of four soccer playing robots tracking a ball, we evaluate our method's estimation accuracy with respect to the ground truth values. Through comparisons with other methods based on i) nonlinear least squares minimization and ii) joint extended Kalman filter, we further highlight our method's advantages. Finally, we also present a robustness test for our approach by evaluating it under scenarios of communication and vision failure in teammate robots.

ps

Published Version link (url) DOI [BibTex]

Published Version link (url) DOI [BibTex]


Electrically tunable binary phase Fresnel lens based on a dielectric elastomer actuator
Electrically tunable binary phase Fresnel lens based on a dielectric elastomer actuator

Park, S., Park, B., Nam, S., Yun, S., Park, S. K., Mun, S., Lim, J. M., Ryu, Y., Song, S. H., Kyung, K.

Optics Express, 25(20):23801-23808, OSA, October 2017 (article)

Abstract
We propose and demonstrate an all-solid-state tunable binary phase Fresnel lens with electrically controllable focal length. The lens is composed of a binary phase Fresnel zone plate, a circular acrylic frame, and a dielectric elastomer (DE) actuator which is made of a thin DE layer and two compliant electrodes using silver nanowires. Under electric potential, the actuator produces in-plane deformation in a radial direction that can compress the Fresnel zones. The electrically-induced deformation compresses the Fresnel zones to be contracted as high as 9.1 % and changes the focal length, getting shorter from 20.0 cm to 14.5 cm. The measured change in the focal length of the fabricated lens is consistent with the result estimated from numerical simulation.

hi

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Using Contact Forces and Robot Arm Accelerations to Automatically Rate Surgeon Skill at Peg Transfer

Brown, J. D., O’Brien, C. E., Leung, S. C., Dumon, K. R., Lee, D. I., Kuchenbecker, K. J.

IEEE Transactions on Biomedical Engineering, 64(9):2263-2275, September 2017 (article)

hi

link (url) DOI [BibTex]

link (url) DOI [BibTex]


An Interactive Robotic System for Promoting Social Engagement
An Interactive Robotic System for Promoting Social Engagement

Burns, R., Javed, H., Jeon, M., Howard, A., Park, C. H.

In International Conference on Intelligent Robots and Systems (IROS) 2017, International Conference on Intelligent Robots and Systems, September 2017 (inproceedings)

Abstract
This abstract (and poster) is a condensed version of Burns' Master's thesis and related journal article. It discusses the use of imitation via robotic motion learning to improve human-robot interaction. It focuses on the preliminary results from a pilot study of 12 subjects. We hypothesized that the robot's use of imitation will increase the user's openness towards engaging with the robot. Post-imitation, experimental subjects displayed a more positive emotional state, had higher instances of mood contagion towards the robot, and interpreted the robot to have a higher level of autonomy than their control group counterparts. These results point to an increased user interest in engagement fueled by personalized imitation during interaction.

hi

[BibTex]

[BibTex]


Parameterized Model of {2D} Articulated Human Shape
Parameterized Model of 2D Articulated Human Shape

Black, M. J., Freifeld, O., Weiss, A., Loper, M., Guan, P.

September 2017, U.S.~Patent 9,761,060 (misc)

Abstract
Disclosed are computer-readable devices, systems and methods for generating a model of a clothed body. The method includes generating a model of an unclothed human body, the model capturing a shape or a pose of the unclothed human body, determining two-dimensional contours associated with the model, and computing deformations by aligning a contour of a clothed human body with a contour of the unclothed human body. Based on the two-dimensional contours and the deformations, the method includes generating a first two-dimensional model of the unclothed human body, the first two-dimensional model factoring the deformations of the unclothed human body into one or more of a shape variation component, a viewpoint change, and a pose variation and learning an eigen-clothing model using principal component analysis applied to the deformations, wherein the eigen-clothing model classifies different types of clothing, to yield a second two-dimensional model of a clothed human body.

ps

Google Patents [BibTex]


A Robotic Framework to Overcome Sensory Overload in Children on the Autism Spectrum: A Pilot Study
A Robotic Framework to Overcome Sensory Overload in Children on the Autism Spectrum: A Pilot Study

Javed, H., Burns, R., Jeon, M., Howard, A., Park, C. H.

In International Conference on Intelligent Robots and Systems (IROS) 2017, International Conference on Intelligent Robots and Systems, September 2017 (inproceedings)

Abstract
This paper discusses a novel framework designed to provide sensory stimulation to children with Autism Spectrum Disorder (ASD). The set up consists of multi-sensory stations to stimulate visual/auditory/olfactory/gustatory/tactile/vestibular senses, together with a robotic agent that navigates through each station responding to the different stimuli. We hypothesize that the robot’s responses will help children learn acceptable ways to respond to stimuli that might otherwise trigger sensory overload. Preliminary results from a pilot study conducted to examine the effectiveness of such a setup were encouraging and are described briefly in this text.

hi

[BibTex]

[BibTex]


 Effects of animation retargeting on perceived action outcomes
Effects of animation retargeting on perceived action outcomes

Kenny, S., Mahmood, N., Honda, C., Black, M. J., Troje, N. F.

Proceedings of the ACM Symposium on Applied Perception (SAP’17), pages: 2:1-2:7, September 2017 (conference)

Abstract
The individual shape of the human body, including the geometry of its articulated structure and the distribution of weight over that structure, influences the kinematics of a person's movements. How sensitive is the visual system to inconsistencies between shape and motion introduced by retargeting motion from one person onto the shape of another? We used optical motion capture to record five pairs of male performers with large differences in body weight, while they pushed, lifted, and threw objects. Based on a set of 67 markers, we estimated both the kinematics of the actions as well as the performer's individual body shape. To obtain consistent and inconsistent stimuli, we created animated avatars by combining the shape and motion estimates from either a single performer or from different performers. In a virtual reality environment, observers rated the perceived weight or thrown distance of the objects. They were also asked to explicitly discriminate between consistent and hybrid stimuli. Observers were unable to accomplish the latter, but hybridization of shape and motion influenced their judgements of action outcome in systematic ways. Inconsistencies between shape and motion were assimilated into an altered perception of the action outcome.

ps

pdf DOI [BibTex]

pdf DOI [BibTex]


no image
Stiffness Perception during Pinching and Dissection with Teleoperated Haptic Forceps

Ng, C., Zareinia, K., Sun, Q., Kuchenbecker, K. J.

In Proceedings of the International Symposium on Robot and Human Interactive Communication (RO-MAN), pages: 456-463, Lisbon, Portugal, August 2017 (inproceedings)

hi

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Coupling Adaptive Batch Sizes with Learning Rates
Coupling Adaptive Batch Sizes with Learning Rates

Balles, L., Romero, J., Hennig, P.

In Proceedings Conference on Uncertainty in Artificial Intelligence (UAI) 2017, pages: 410-419, (Editors: Gal Elidan and Kristian Kersting), Association for Uncertainty in Artificial Intelligence (AUAI), Conference on Uncertainty in Artificial Intelligence (UAI), August 2017 (inproceedings)

Abstract
Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence is thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On three image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.

ps pn

Code link (url) Project Page [BibTex]

Code link (url) Project Page [BibTex]


Physical and Behavioral Factors Improve Robot Hug Quality
Physical and Behavioral Factors Improve Robot Hug Quality

Block, A. E., Kuchenbecker, K. J.

Workshop Paper (2 pages) presented at the RO-MAN Workshop on Social Interaction and Multimodal Expression for Socially Intelligent Robots, Lisbon, Portugal, August 2017 (misc)

Abstract
A hug is one of the most basic ways humans can express affection. As hugs are so common, a natural progression of robot development is to have robots one day hug humans as seamlessly as these intimate human-human interactions occur. This project’s purpose is to evaluate human responses to different robot physical characteristics and hugging behaviors. Specifically, we aim to test the hypothesis that a warm, soft, touch-sensitive PR2 humanoid robot can provide humans with satisfying hugs by matching both their hugging pressure and their hugging duration. Thirty participants experienced and evaluated twelve hugs with the robot, divided into three randomly ordered trials that focused on physical robot char- acteristics and nine randomly ordered trials with varied hug pressure and duration. We found that people prefer soft, warm hugs over hard, cold hugs. Furthermore, users prefer hugs that physically squeeze them and release immediately when they are ready for the hug to end.

hi

Project Page [BibTex]

Project Page [BibTex]


Crowdshaping Realistic {3D} Avatars with Words
Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Ramirez, M. Q., Black, M., Zuffi, S., O’Toole, A., Hill, M. Q., Hahn, C. A.

August 2017, Application PCT/EP2017/051954 (misc)

Abstract
A method for generating a body shape, comprising the steps: - receiving one or more linguistic descriptors related to the body shape; - retrieving an association between the one or more linguistic descriptors and a body shape; and - generating the body shape, based on the association.

ps

Google Patents [BibTex]

Google Patents [BibTex]


Robotic Motion Learning Framework to Promote Social Engagement
Robotic Motion Learning Framework to Promote Social Engagement

Burns, R.

The George Washington University, August 2017 (mastersthesis)

Abstract
This paper discusses a novel framework designed to increase human-robot interaction through robotic imitation of the user's gestures. The set up consists of a humanoid robotic agent that socializes with and play games with the user. For the experimental group, the robot also imitates one of the user's novel gestures during a play session. We hypothesize that the robot's use of imitation will increase the user's openness towards engaging with the robot. Preliminary results from a pilot study of 12 subjects are promising in that post-imitation, experimental subjects displayed a more positive emotional state, had higher instances of mood contagion towards the robot, and interpreted the robot to have a higher level of autonomy than their control group counterparts. These results point to an increased user interest in engagement fueled by personalized imitation during interaction.

hi

link (url) [BibTex]

link (url) [BibTex]


no image
Ungrounded Haptic Augmented Reality System for Displaying Texture and Friction

Culbertson, H., Kuchenbecker, K. J.

IEEE/ASME Transactions on Mechatronics, 22(4):1839-1849, August 2017 (article)

hi

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Joint Graph Decomposition and Node Labeling by Local Search
Joint Graph Decomposition and Node Labeling by Local Search

Levinkov, E., Uhrig, J., Tang, S., Omran, M., Insafutdinov, E., Kirillov, A., Rother, C., Brox, T., Schiele, B., Andres, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1904-1912, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

PDF Supplementary DOI Project Page [BibTex]

PDF Supplementary DOI Project Page [BibTex]


Dynamic {FAUST}: Registering Human Bodies in Motion
Dynamic FAUST: Registering Human Bodies in Motion

Bogo, F., Romero, J., Pons-Moll, G., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
While the ready availability of 3D scan data has influenced research throughout computer vision, less attention has focused on 4D data; that is 3D scans of moving nonrigid objects, captured over time. To be useful for vision research, such 4D scans need to be registered, or aligned, to a common topology. Consequently, extending mesh registration methods to 4D is important. Unfortunately, no ground-truth datasets are available for quantitative evaluation and comparison of 4D registration methods. To address this we create a novel dataset of high-resolution 4D scans of human subjects in motion, captured at 60 fps. We propose a new mesh registration method that uses both 3D geometry and texture information to register all scans in a sequence to a common reference topology. The approach exploits consistency in texture over both short and long time intervals and deals with temporal offsets between shape and texture capture. We show how using geometry alone results in significant errors in alignment when the motions are fast and non-rigid. We evaluate the accuracy of our registration and provide a dataset of 40,000 raw and aligned meshes. Dynamic FAUST extends the popular FAUST dataset to dynamic 4D data, and is available for research purposes at http://dfaust.is.tue.mpg.de.

ps

pdf video Project Page Project Page Project Page [BibTex]

pdf video Project Page Project Page Project Page [BibTex]


Learning from Synthetic Humans
Learning from Synthetic Humans

Varol, G., Romero, J., Martin, X., Mahmood, N., Black, M. J., Laptev, I., Schmid, C.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Estimating human pose, shape, and motion from images and videos are fundamental challenges with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.

ps

arXiv project data Project Page Project Page [BibTex]

arXiv project data Project Page Project Page [BibTex]


On human motion prediction using recurrent neural networks
On human motion prediction using recurrent neural networks

Martinez, J., Black, M. J., Romero, J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.

ps

arXiv Project Page [BibTex]

arXiv Project Page [BibTex]


Articulated Multi-person Tracking in the Wild
Articulated Multi-person Tracking in the Wild

Insafutdinov, E., Andriluka, M., Pishchulin, L., Tang, S., Levinkov, E., Andres, B., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1293-1301, IEEE, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, Oral (inproceedings)

ps

DOI [BibTex]

DOI [BibTex]


Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data
Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data

Janai, J., Güney, F., Wulff, J., Black, M., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 1406-1416, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Existing optical flow datasets are limited in size and variability due to the difficulty of capturing dense ground truth. In this paper, we tackle this problem by tracking pixels through densely sampled space-time volumes recorded with a high-speed video camera. Our model exploits the linearity of small motions and reasons about occlusions from multiple frames. Using our technique, we are able to establish accurate reference flow fields outside the laboratory in natural environments. Besides, we show how our predictions can be used to augment the input images with realistic motion blur. We demonstrate the quality of the produced flow fields on synthetic and real-world datasets. Finally, we collect a novel challenging optical flow dataset by applying our technique on data from a high-speed camera and analyze the performance of the state-of-the-art in optical flow under various levels of motion blur.

avg ps

pdf suppmat Project page Video DOI Project Page [BibTex]

pdf suppmat Project page Video DOI Project Page [BibTex]


Optical Flow in Mostly Rigid Scenes
Optical Flow in Mostly Rigid Scenes

Wulff, J., Sevilla-Lara, L., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 6911-6920, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
The optical flow of natural scenes is a combination of the motion of the observer and the independent motion of objects. Existing algorithms typically focus on either recovering motion and structure under the assumption of a purely static world or optical flow for general unconstrained scenes. We combine these approaches in an optical flow algorithm that estimates an explicit segmentation of moving objects from appearance and physical constraints. In static regions we take advantage of strong constraints to jointly estimate the camera motion and the 3D structure of the scene over multiple frames. This allows us to also regularize the structure instead of the motion. Our formulation uses a Plane+Parallax framework, which works even under small baselines, and reduces the motion estimation to a one-dimensional search problem, resulting in more accurate estimation. In moving regions the flow is treated as unconstrained, and computed with an existing optical flow method. The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art results on both the MPISintel and KITTI-2015 benchmarks.

ps

pdf SupMat video code Project Page [BibTex]

pdf SupMat video code Project Page [BibTex]


OctNet: Learning Deep 3D Representations at High Resolutions
OctNet: Learning Deep 3D Representations at High Resolutions

Riegler, G., Ulusoy, O., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This allows to focus memory allocation and computation to the relevant dense regions and enables deeper networks without compromising resolution. We demonstrate the utility of our OctNet representation by analyzing the impact of resolution on several 3D tasks including 3D object classification, orientation estimation and point cloud labeling.

avg ps

pdf suppmat Project Page Video Project Page [BibTex]

pdf suppmat Project Page Video Project Page [BibTex]


Reflectance Adaptive Filtering Improves Intrinsic Image Estimation
Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

Nestmeyer, T., Gehler, P. V.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1771-1780, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

pre-print DOI Project Page Project Page [BibTex]

pre-print DOI Project Page Project Page [BibTex]


Detailed, accurate, human shape estimation from clothed {3D} scan sequences
Detailed, accurate, human shape estimation from clothed 3D scan sequences

Zhang, C., Pujades, S., Black, M., Pons-Moll, G.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, Washington, DC, USA, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, Spotlight (inproceedings)

Abstract
We address the problem of estimating human body shape from 3D scans over time. Reliable estimation of 3D body shape is necessary for many applications including virtual try-on, health monitoring, and avatar creation for virtual reality. Scanning bodies in minimal clothing, however, presents a practical barrier to these applications. We address this problem by estimating body shape under clothing from a sequence of 3D scans. Previous methods that have exploited statistical models of body shape produce overly smooth shapes lacking personalized details. In this paper we contribute a new approach to recover not only an approximate shape of the person, but also their detailed shape. Our approach allows the estimated shape to deviate from a parametric model to fit the 3D scans. We demonstrate the method using high quality 4D data as well as sequences of visual hulls extracted from multi-view images. We also make available a new high quality 4D dataset that enables quantitative evaluation. Our method outperforms the previous state of the art, both qualitatively and quantitatively.

ps

arxiv_preprint video dataset pdf supplemental DOI Project Page [BibTex]

arxiv_preprint video dataset pdf supplemental DOI Project Page [BibTex]


{3D} Menagerie: Modeling the {3D} Shape and Pose of Animals
3D Menagerie: Modeling the 3D Shape and Pose of Animals

Zuffi, S., Kanazawa, A., Jacobs, D., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 5524-5532, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
There has been significant work on learning realistic, articulated, 3D models of the human body. In contrast, there are few such models of animals, despite many applications. The main challenge is that animals are much less cooperative than humans. The best human body models are learned from thousands of 3D scans of people in specific poses, which is infeasible with live animals. Consequently, we learn our model from a small set of 3D scans of toy figurines in arbitrary poses. We employ a novel part-based shape model to compute an initial registration to the scans. We then normalize their pose, learn a statistical shape model, and refine the registrations and the model together. In this way, we accurately align animal scans from different quadruped families with very different shapes and poses. With the registration to a common template we learn a shape space representing animals including lions, cats, dogs, horses, cows and hippos. Animal shapes can be sampled from the model, posed, animated, and fit to data. We demonstrate generalization by fitting it to images of real animals including species not seen in training.

ps

pdf video Project Page [BibTex]

pdf video Project Page [BibTex]


Optical Flow Estimation using a Spatial Pyramid Network
Optical Flow Estimation using a Spatial Pyramid Network

Ranjan, A., Black, M.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
We learn to compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Instead of the standard minimization of an objective function at each pyramid level, we train one deep network per level to compute the flow update. Unlike the recent FlowNet approach, the networks do not need to deal with large motions; these are dealt with by the pyramid. This has several advantages. First, our Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters. This makes it more efficient and appropriate for embedded applications. Second, since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. Third, unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it. Our results are more accurate than FlowNet on most standard benchmarks, suggesting a new direction of combining classical flow methods with deep learning.

ps

pdf SupMat project/code [BibTex]

pdf SupMat project/code [BibTex]


Multiple People Tracking by Lifted Multicut and Person Re-identification
Multiple People Tracking by Lifted Multicut and Person Re-identification

Tang, S., Andriluka, M., Andres, B., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3701-3710, IEEE Computer Society, Washington, DC, USA, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

DOI Project Page [BibTex]

DOI Project Page [BibTex]


Video Propagation Networks
Video Propagation Networks

Jampani, V., Gadde, R., Gehler, P. V.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

pdf supplementary arXiv project page code Project Page [BibTex]

pdf supplementary arXiv project page code Project Page [BibTex]


Generating Descriptions with Grounded and Co-Referenced People
Generating Descriptions with Grounded and Co-Referenced People

Rohrbach, A., Rohrbach, M., Tang, S., Oh, S. J., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 4196-4206, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

PDF DOI Project Page [BibTex]

PDF DOI Project Page [BibTex]


Semantic Multi-view Stereo: Jointly Estimating Objects and Voxels
Semantic Multi-view Stereo: Jointly Estimating Objects and Voxels

Ulusoy, A. O., Black, M. J., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Dense 3D reconstruction from RGB images is a highly ill-posed problem due to occlusions, textureless or reflective surfaces, as well as other challenges. We propose object-level shape priors to address these ambiguities. Towards this goal, we formulate a probabilistic model that integrates multi-view image evidence with 3D shape information from multiple objects. Inference in this model yields a dense 3D reconstruction of the scene as well as the existence and precise 3D pose of the objects in it. Our approach is able to recover fine details not captured in the input shapes while defaulting to the input models in occluded regions where image evidence is weak. Due to its probabilistic nature, the approach is able to cope with the approximate geometry of the 3D models as well as input shapes that are not present in the scene. We evaluate the approach quantitatively on several challenging indoor and outdoor datasets.

avg ps

YouTube pdf suppmat Project Page [BibTex]

YouTube pdf suppmat Project Page [BibTex]


Deep representation learning for human motion prediction and classification
Deep representation learning for human motion prediction and classification

Bütepage, J., Black, M., Kragic, D., Kjellström, H.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Generative models of 3D human motion are often restricted to a small number of activities and can therefore not generalize well to novel movements or applications. In this work we propose a deep learning framework for human motion capture data that learns a generic representation from a large corpus of motion capture data and generalizes well to new, unseen, motions. Using an encoding-decoding network that learns to predict future 3D poses from the most recent past, we extract a feature representation of human motion. Most work on deep learning for sequence prediction focuses on video and speech. Since skeletal data has a different structure, we present and evaluate different network architectures that make different assumptions about time dependencies and limb correlations. To quantify the learned features, we use the output of different layers for action classification and visualize the receptive fields of the network units. Our method outperforms the recent state of the art in skeletal motion prediction even though these use action specific training data. Our results show that deep feedforward networks, trained from a generic mocap database, can successfully be used for feature extraction from human motion data and that this representation can be used as a foundation for classification and prediction.

ps

arXiv Project Page [BibTex]

arXiv Project Page [BibTex]


Unite the People: Closing the Loop Between 3D and 2D Human Representations
Unite the People: Closing the Loop Between 3D and 2D Human Representations

Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M. J., Gehler, P. V.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits “in-the-wild”. However, depending on the level of detail, it can be hard to impossible to acquire labeled data for training 2D estimators on large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high quality 3D body model fits for multiple human pose datasets. Human annotators solely sort good and bad fits. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body. Using the 91 landmark pose estimator, we present state-of-the art results for 3D human pose and shape estimation using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure. We show that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable on large scale. The data, code and models are available for research purposes.

ps

arXiv project/code/data Project Page [BibTex]

arXiv project/code/data Project Page [BibTex]


no image
A variation in wrinkle structures of UV-cured films with chemical structures of prepolymers

Park, S. K., Kwark, Y., Nam, S., Moon, J., Kim, D. W., Park, S., Park, B., Yun, S., Lee, J., Yu, B., Kyung, K.

Materials Letters, 199, pages: 105-109, July 2017 (article)

Abstract
Spontaneously wrinkled films can be easily obtained from UV-crosslinkable liquid prepolymers under special UV-curing conditions. They vary wrinkle structures of the UV-cured films and, however, cannot be precisely controlled. Here, five different UV-crosslinkable prepolymers are synthesized to study the chemical structure effect of prepolymers on wrinkle formation and modulation of the UV-cured films irrespective of the UV-curing conditions. Both wavelength and amplitude of the wrinkles are tuned with the different liquid prepolymers from 4.10 to 5.63µm and from 1.00 to 1.66µm, respectively. The wrinkle structures of the UV-cured films are faded by adding a solid prepolymer to a liquid prepolymer due to interference from it in the shrinkage of the liquid prepolymer layer. The wrinkles completely disappear in the UV-cured films fabricated from the formulated prepolymers containing over 50wt% of the solid prepolymer.

hi

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Design of a Parallel Continuum Manipulator for 6-DOF Fingertip Haptic Display

Young, E. M., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 599-604, Munich, Germany, June 2017, Finalist for best poster paper (inproceedings)

Abstract
Despite rapid advancements in the field of fingertip haptics, rendering tactile cues with six degrees of freedom (6 DOF) remains an elusive challenge. In this paper, we investigate the potential of displaying fingertip haptic sensations with a 6-DOF parallel continuum manipulator (PCM) that mounts to the user's index finger and moves a contact platform around the fingertip. Compared to traditional mechanisms composed of rigid links and discrete joints, PCMs have the potential to be strong, dexterous, and compact, but they are also more complicated to design. We define the design space of 6-DOF parallel continuum manipulators and outline a process for refining such a device for fingertip haptic applications. Following extensive simulation, we obtain 12 designs that meet our specifications, construct a manually actuated prototype of one such design, and evaluate the simulation's ability to accurately predict the prototype's motion. Finally, we demonstrate the range of deliverable fingertip tactile cues, including a normal force into the finger and shear forces tangent to the finger at three extreme points on the boundary of the fingertip.

hi

DOI Project Page [BibTex]

DOI Project Page [BibTex]


no image
High Magnitude Unidirectional Haptic Force Display Using a Motor/Brake Pair and a Cable

Hu, S., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 394-399, Munich, Germany, June 2017 (inproceedings)

Abstract
Clever electromechanical design is required to make the force feedback delivered by a kinesthetic haptic interface both strong and safe. This paper explores a onedimensional haptic force display that combines a DC motor and a magnetic particle brake on the same shaft. Rather than a rigid linkage, a spooled cable connects the user to the actuators to enable a large workspace, reduce the moving mass, and eliminate the sticky residual force from the brake. This design combines the high torque/power ratio of the brake and the active output capabilities of the motor to provide a wider range of forces than can be achieved with either actuator alone. A prototype of this device was built, its performance was characterized, and it was used to simulate constant force sources and virtual springs and dampers. Compared to the conventional design of using only a motor, the hybrid device can output higher unidirectional forces at the expense of free space feeling less free.

hi

DOI Project Page [BibTex]

DOI Project Page [BibTex]


no image
Physically Interactive Exercise Games with a Baxter Robot

Fitter, N. T., Kuchenbecker, K. J.

Hands-on demonstration presented at the IEEE World Haptics Conference (WHC), Munich, Germany, June 2017 (misc)

hi

Project Page [BibTex]

Project Page [BibTex]


no image
Perception of Force and Stiffness in the Presence of Low-Frequency Haptic Noise

Gurari, N., Okamura, A. M., Kuchenbecker, K. J.

PLoS ONE, 12(6):e0178605, June 2017 (article)

hi

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
A Wrist-Squeezing Force-Feedback System for Robotic Surgery Training

Brown, J. D., Fernandez, J. N., Cohen, S. P., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 107-112, Munich, Germany, June 2017 (inproceedings)

Abstract
Over time, surgical trainees learn to compensate for the lack of haptic feedback in commercial robotic minimally invasive surgical systems. Incorporating touch cues into robotic surgery training could potentially shorten this learning process if the benefits of haptic feedback were sustained after it is removed. In this paper, we develop a wrist-squeezing haptic feedback system and evaluate whether it holds the potential to train novice da Vinci users to reduce the force they exert on a bimanual inanimate training task. Subjects were randomly divided into two groups according to a multiple baseline experimental design. Each of the ten participants moved a ring along a curved wire nine times while the haptic feedback was conditionally withheld, provided, and withheld again. The realtime tactile feedback of applied force magnitude significantly reduced the integral of the force produced by the da Vinci tools on the task materials, and this result remained even when the haptic feedback was removed. Overall, our findings suggest that wrist-squeezing force feedback can play an essential role in helping novice trainees learn to minimize the force they exert with a surgical robot.

hi

DOI [BibTex]

DOI [BibTex]


System and method for simulating realistic clothing
System and method for simulating realistic clothing

Black, M. J., Guan, P.

June 2017, U.S.~Patent 9,679,409 B2 (misc)

Abstract
Systems, methods, and computer-readable storage media for simulating realistic clothing. The system generates a clothing deformation model for a clothing type, wherein the clothing deformation model factors a change of clothing shape due to rigid limb rotation, pose-independent body shape, and pose-dependent deformations. Next, the system generates a custom-shaped garment for a given body by mapping, via the clothing deformation model, body shape parameters to clothing shape parameters. The system then automatically dresses the given body with the custom- shaped garment.

ps

Google Patents pdf [BibTex]


no image
Handling Scan-Time Parameters in Haptic Surface Classification

Burka, A., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 424-429, Munich, Germany, June 2017 (inproceedings)

hi

DOI Project Page [BibTex]

DOI Project Page [BibTex]


no image
Proton Pack: Visuo-Haptic Surface Data Recording

Burka, A., Kuchenbecker, K. J.

Hands-on demonstration presented at the IEEE World Haptics Conference (WHC), Munich, Germany, June 2017 (misc)

hi

Project Page [BibTex]

Project Page [BibTex]


no image
Teaching a Robot to Collaborate with a Human Via Haptic Teleoperation

Hu, S., Kuchenbecker, K. J.

Work-in-progress paper (2 pages) presented at the IEEE World Haptics Conference (WHC), Munich, Germany, June 2017 (misc)

hi

Project Page [BibTex]

Project Page [BibTex]