

2016


Skinned multi-person linear model

Black, M.J., Loper, M., Mahmood, N., Pons-Moll, G., Romero, J.

December 2016, Application PCT/EP2016/064610 (misc)

Abstract
The invention comprises a learned model of human body shape and pose-dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex-based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data, including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. The invention quantitatively evaluates variants of SMPL using linear or dual-quaternion blend skinning and shows that both are more accurate than a BlendSCAPE model trained on the same data. In a further embodiment, the invention realistically models dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines, and we make it available for research purposes.
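The key modeling choice above, pose-dependent vertex offsets that are a linear function of the elements of the pose rotation matrices, can be sketched as follows. This is a minimal, hypothetical numpy illustration of the idea, not the released SMPL code; the array shapes and function names are assumptions.

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def posed_template(T, S, betas, P, pose):
    """Apply identity and pose blend shapes to a rest-pose template.

    T:     (V, 3) rest-pose template vertices
    S:     (V, 3, n_betas) identity blend-shape basis
    betas: (n_betas,) shape coefficients
    P:     (V, 3, 9*K) pose blend-shape basis (linear in rotation elements)
    pose:  (K, 3) axis-angle rotations for K joints
    """
    # identity-dependent offsets
    v = T + S @ betas
    # pose feature: rotation-matrix elements minus the rest pose (identity),
    # so the zero pose contributes zero offset
    R = np.stack([rodrigues(p) for p in pose])   # (K, 3, 3)
    feat = (R - np.eye(3)).reshape(-1)           # (9*K,)
    # pose-dependent offsets are a *linear* function of feat
    return v + P @ feat
```

Blend skinning (linear or dual-quaternion) would then be applied to the returned vertices; the linearity in `feat` is what makes the model trainable from aligned meshes by least squares.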

Google Patents [BibTex]


Creating body shapes from verbal descriptions by linking similarity spaces

Hill, M. Q., Streuber, S., Hahn, C. A., Black, M. J., O’Toole, A. J.

Psychological Science, 27(11):1486-1497, November 2016, (article)

Abstract
Brief verbal descriptions of bodies (e.g. curvy, long-legged) can elicit vivid mental images. The ease with which we create these mental images belies the complexity of three-dimensional body shapes. We explored the relationship between body shapes and body descriptions and show that a small number of words can be used to generate categorically accurate representations of three-dimensional bodies. The dimensions of body shape variation that emerged in a language-based similarity space were related to major dimensions of variation computed directly from three-dimensional laser scans of 2094 bodies. This allowed us to generate three-dimensional models of people in the shape space using only their coordinates on analogous dimensions in the language-based description space. Human descriptions of photographed bodies and their corresponding models matched closely. The natural mapping between the spaces illustrates the role of language as a concise code for body shape, capturing perceptually salient global and local body features.

pdf [BibTex]


Qualitative User Reactions to a Hand-Clapping Humanoid Robot

Fitter, N. T., Kuchenbecker, K. J.

In Social Robotics: 8th International Conference, ICSR 2016, Kansas City, MO, USA, November 1-3, 2016 Proceedings, 9979, pages: 317-327, Lecture Notes in Artificial Intelligence, Springer International Publishing, November 2016, Oral presentation given by Fitter (inproceedings)

[BibTex]


Designing and Assessing Expressive Open-Source Faces for the Baxter Robot

Fitter, N. T., Kuchenbecker, K. J.

In Social Robotics: 8th International Conference, ICSR 2016, Kansas City, MO, USA, November 1-3, 2016 Proceedings, 9979, pages: 340-350, Lecture Notes in Artificial Intelligence, Springer International Publishing, November 2016, Oral presentation given by Fitter (inproceedings)

[BibTex]


Rhythmic Timing in Playful Human-Robot Social Motor Coordination

Fitter, N. T., Hawkes, D. T., Kuchenbecker, K. J.

In Social Robotics: 8th International Conference, ICSR 2016, Kansas City, MO, USA, November 1-3, 2016 Proceedings, 9979, pages: 296-305, Lecture Notes in Artificial Intelligence, Springer International Publishing, November 2016, Oral presentation given by Fitter (inproceedings)

[BibTex]


An electro-active polymer based lens module for dynamically varying focal system

Yun, S., Park, S., Nam, S., Park, B., Park, S. K., Mun, S., Lim, J. M., Kyung, K.

Applied Physics Letters, 109(14):141908, October 2016 (article)

Abstract
We demonstrate a polymer-based active-lens module enabling a dynamically focus-controllable optical system with a wide tunable range. The active-lens module is composed of two parallelized active-lenses with convex and concave hemispherical lens structures, respectively. Under operation with dynamic input voltage signals, each active-lens moves bi-directionally in response to a hybrid driving force that combines the electro-active response of a thin dielectric elastomer membrane with an electro-static attraction force. Since the proposed active-lens module widely modulates the gap distance between lens elements, an optical system based on the module provides widely variable focusing for selective imaging of objects at arbitrary positions.

link (url) DOI [BibTex]


Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M. J.

In Computer Vision – ECCV 2016, pages: 561-578, Lecture Notes in Computer Science, Springer International Publishing, 14th European Conference on Computer Vision, October 2016 (inproceedings)

Abstract
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occlusion, clothing, lighting, and the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D body joint locations. We then fit (top-down) a recently published statistical body shape model, called SMPL, to the 2D joints. We do so by minimizing an objective function that penalizes the error between the projected 3D model joints and detected 2D joints. Because SMPL captures correlations in human shape across the population, we are able to robustly fit it to very little data. We further leverage the 3D model to prevent solutions that cause interpenetration. We evaluate our method, SMPLify, on the Leeds Sports, HumanEva, and Human3.6M datasets, showing superior pose accuracy with respect to the state of the art.
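The top-down fitting step described above, penalizing the distance between projected 3D model joints and detected 2D joints, can be sketched as below. This is a simplified, hypothetical illustration: the pinhole intrinsics and rho value are assumptions, and SMPLify's pose, shape, and interpenetration priors are omitted.

```python
import numpy as np

def project(joints3d, f=1000.0, c=(320.0, 240.0)):
    """Pinhole projection of 3D joints (N, 3) to 2D pixels (N, 2)."""
    x = joints3d[:, 0] / joints3d[:, 2]
    y = joints3d[:, 1] / joints3d[:, 2]
    return np.stack([f * x + c[0], f * y + c[1]], axis=1)

def joint_reprojection_loss(joints3d, joints2d, conf, rho=100.0):
    """Confidence-weighted robust (Geman-McClure) penalty on the distance
    between projected model joints and detected 2D joints."""
    r2 = np.sum((project(joints3d) - joints2d) ** 2, axis=1)
    return float(np.sum(conf * r2 / (r2 + rho ** 2)))
```

In the full method this loss is minimized over SMPL pose and shape parameters together with the priors; the robust penalty keeps occluded or badly detected joints from dominating the fit.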

pdf Video Sup Mat video Code Project Project Page [BibTex]


Superpixel Convolutional Networks using Bilateral Inceptions

Gadde, R., Jampani, V., Kiefel, M., Kappler, D., Gehler, P.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Springer, 14th European Conference on Computer Vision, October 2016 (inproceedings)

Abstract
In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new “bilateral inception” module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module are learned end-to-end using standard backpropagation techniques. The bilateral inception module addresses two issues that arise with general CNN segmentation architectures. First, this module propagates information between (super) pixels while respecting image edges, thus using the structured information of the problem for improved results. Second, the layer recovers a full resolution segmentation result from the lower resolution solution of a CNN. In the experiments, we modify several existing CNN architectures by inserting our inception modules between the last CNN (1 × 1 convolution) layers. Empirical results on three different datasets show reliable improvements not only in comparison to the baseline networks, but also in comparison to several dense-pixel prediction techniques such as CRFs, while remaining competitive in runtime.

pdf supplementary poster Project Page Project Page [BibTex]


Barrista - Caffe Well-Served

Lassner, C., Kappler, D., Kiefel, M., Gehler, P.

In ACM Multimedia Open Source Software Competition, ACM OSSC16, October 2016 (inproceedings)

Abstract
The caffe framework is one of the leading deep learning toolboxes in the machine learning and computer vision community. While it offers efficiency and configurability, it falls short of a full interface to Python. With increasingly involved procedures for training deep networks and reaching depths of hundreds of layers, creating configuration files and keeping them consistent becomes an error-prone process. We introduce the barrista framework, offering full, pythonic control over caffe. It separates responsibilities and offers code to solve frequently occurring tasks for pre-processing, training, and model inspection. It is compatible with all caffe versions since mid-2015 and can import and export .prototxt files. Examples are included, e.g., a deep residual network implemented in only 172 lines (for arbitrary depths), compared to 2,320 lines in the official implementation for the equivalent model.

pdf link (url) DOI Project Page [BibTex]


Using IMU Data to Demonstrate Hand-Clapping Games to a Robot

Fitter, N. T., Kuchenbecker, K. J.

In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 851 - 856, October 2016, Interactive presentation given by Fitter (inproceedings)

[BibTex]


Wrinkle structures formed by formulating UV-crosslinkable liquid prepolymers

Park, S. K., Kwark, Y., Nam, S., Park, S., Park, B., Yun, S., Moon, J., Lee, J., Yu, B., Kyung, K.

Polymer, 99, pages: 447-452, September 2016 (article)

Abstract
Artificial wrinkles have recently been in the spotlight due to their potential use in high-tech applications. A spontaneously wrinkled film can be fabricated from UV-crosslinkable liquid prepolymers. Here, we controlled wrinkle formation by simply formulating two UV-crosslinkable liquid prepolymers, tetraethylene glycol bis(4-ethenyl-2,3,5,6-tetrafluorophenyl) ether (TEGDSt) and tetraethylene glycol diacrylate (TEGDA). The wrinkles were formed from the TEGDSt/TEGDA formulated prepolymer layers containing up to 30 wt% of TEGDA. Wrinkle formation depended upon the rate of the photo-crosslinking reaction of the formulated prepolymers. The first-order apparent rate constant, k_app, was between ca. 5.7 × 10⁻³ and 12.2 × 10⁻³ s⁻¹ for wrinkle formation. The wrinkle structures were modulated within this k_app range, mainly due to variation in the extent of shrinkage of the formulated prepolymer layers with the content of TEGDA.

link (url) DOI [BibTex]


Numerical Investigation of Frictional Forces Between a Finger and a Textured Surface During Active Touch

Khojasteh, B., Janko, M., Visell, Y.

Extended abstract presented in the form of an oral presentation at the 3rd International Conference on BioTribology (ICoBT), London, England, September 2016 (misc)

Abstract
The biomechanics of the human finger pad has been investigated in relation to motor behaviour and sensory function in the upper limb. While the frictional properties of the finger pad are important for grip and grasp function, recent attention has also been given to the roles played by friction when perceiving a surface via sliding contact. Indeed, the mechanics of sliding contact greatly affect stimuli felt by the finger scanning a surface. Past research has shed light on neural mechanisms of haptic texture perception, but the relation with time-resolved frictional contact interactions is unknown. Current biotribological models cannot predict time-resolved frictional forces felt by a finger as it slides on a rough surface. This constitutes a missing link in understanding the mechanical basis of texture perception. To ameliorate this, we developed a two-dimensional finite element numerical simulation of a human finger pad in sliding contact with a textured surface. Our model captures bulk mechanical properties, including hyperelasticity, dissipation, and tissue heterogeneity, as well as contact dynamics. To validate it, we utilized a database of measurements that we previously captured with a variety of human fingers and surfaces. By designing the simulations to match the measurements, we evaluated the ability of the FEM model to predict time-resolved sliding frictional forces. We varied surface texture wavelength, sliding speed, and normal forces in the experiments. An analysis of the results indicated that both time- and frequency-domain features of forces produced during finger-surface sliding interactions were reproduced, including many of the phenomena that we observed in analyses of real measurements, such as quasiperiodicity, harmonic distortion, spectral decay in the frequency domain, and their dependence on kinetics and surface properties. The results shed light on frictional signatures of surface texture during active touch and may inform understanding of the role played by friction in texture discrimination.

[BibTex]


ProtonPack: A Visuo-Haptic Data Acquisition System for Robotic Learning of Surface Properties

Burka, A., Hu, S., Helgeson, S., Krishnan, S., Gao, Y., Hendricks, L. A., Darrell, T., Kuchenbecker, K. J.

In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pages: 58-65, 2016, Oral presentation given by Burka (inproceedings)

Project Page [BibTex]


Equipping the Baxter Robot with Human-Inspired Hand-Clapping Skills

Fitter, N. T., Kuchenbecker, K. J.

In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pages: 105-112, 2016 (inproceedings)

[BibTex]


Behavioral Learning and Imitation for Music-Based Robotic Therapy for Children with Autism Spectrum Disorder

Burns, R., Nizambad, S., Park, C. H., Jeon, M., Howard, A.

Workshop paper (5 pages) at the RO-MAN Workshop on Behavior Adaptation, Interaction and Learning for Assistive Robotics, August 2016 (misc)

Abstract
In this full workshop paper, we discuss the positive impacts of robot, music, and imitation therapies on children with autism. We also discuss the use of Laban Motion Analysis (LMA) to identify emotion through movement and posture cues. We present our preliminary studies of the "Five Senses" game that our two robots, Romo the penguin and Darwin Mini, partake in. Using an LMA-focused approach (enabled by our skeletal tracking Kinect algorithm), we find that our participants show increased frequency of movement and speed when the game has a musical accompaniment. Therefore, participants may have increased engagement with our robots and game if music is present. We also begin exploring motion learning for future work.

link (url) [BibTex]


Non-parametric Models for Structured Data and Applications to Human Bodies and Natural Scenes

Lehrmann, A.

ETH Zurich, July 2016 (phdthesis)

Abstract
The purpose of this thesis is the study of non-parametric models for structured data and their fields of application in computer vision. We aim at the development of context-sensitive architectures which are both expressive and efficient. Our focus is on directed graphical models, in particular Bayesian networks, where we combine the flexibility of non-parametric local distributions with the efficiency of a global topology with bounded treewidth. A bound on the treewidth is obtained by either constraining the maximum indegree of the underlying graph structure or by introducing determinism. The non-parametric distributions in the nodes of the graph are given by decision trees or kernel density estimators. The information flow implied by specific network topologies, especially the resultant (conditional) independencies, allows for a natural integration and control of contextual information. We distinguish between three different types of context: static, dynamic, and semantic. In four different approaches we propose models which exhibit varying combinations of these contextual properties and allow modeling of structured data in space, time, and hierarchies derived thereof. The generative character of the presented models enables a direct synthesis of plausible hypotheses. Extensive experiments validate the developed models in two application scenarios which are of particular interest in computer vision: human bodies and natural scenes. In the practical sections of this work we discuss both areas from different angles and show applications of our models to human pose, motion, and segmentation as well as object categorization and localization. Here, we benefit from the availability of modern datasets of unprecedented size and diversity. Comparisons to traditional approaches and state-of-the-art research on the basis of well-established evaluation criteria allow the objective assessment of our contributions.

pdf [BibTex]


Reproducing a Laser Pointer Dot on a Secondary Projected Screen

Hu, S., Kuchenbecker, K. J.

In Proceedings of the IEEE International Conference on Advanced Intelligent Mechatronics (AIM), pages: 1645-1650, 2016, Oral presentation given by Hu (inproceedings)

[BibTex]


Dynamic baseline stereo vision-based cooperative target tracking

Ahmad, A., Ruff, E., Bülthoff, H.

19th International Conference on Information Fusion, pages: 1728-1734, July 2016 (conference)

Abstract
In this article we present a new method for multi-robot cooperative target tracking based on dynamic baseline stereo vision. The core novelty of our approach includes a computationally light-weight scheme to compute the 3D stereo measurements that exactly satisfy the epipolar constraints and a covariance intersection (CI)-based method to fuse the 3D measurements obtained by each individual robot. Using CI we are able to systematically integrate the robot localization uncertainties as well as the uncertainties in the measurements generated by the monocular camera images from each individual robot into the resulting stereo measurements. Through an extensive set of simulation and real robot results we show the robustness and accuracy of our approach with respect to ground truth. The source code related to this article is publicly accessible on our website and the datasets are available on request.
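Covariance intersection, the fusion rule named above, can be sketched as follows. This is the generic textbook form, not the authors' implementation; the fixed weight `w` is an assumption (in practice it is often chosen to minimize the trace or determinant of the fused covariance).

```python
import numpy as np

def covariance_intersection(xa, Pa, xb, Pb, w=0.5):
    """Fuse two estimates (xa, Pa) and (xb, Pb) whose cross-correlation
    is unknown. For a weight w in [0, 1]:
        P^-1 = w * Pa^-1 + (1 - w) * Pb^-1
        x    = P (w * Pa^-1 xa + (1 - w) * Pb^-1 xb)
    The result is consistent for any actual correlation between inputs.
    """
    Pa_inv, Pb_inv = np.linalg.inv(Pa), np.linalg.inv(Pb)
    P = np.linalg.inv(w * Pa_inv + (1 - w) * Pb_inv)
    x = P @ (w * Pa_inv @ xa + (1 - w) * Pb_inv @ xb)
    return x, P
```

In the paper's setting, each robot contributes a 3D target measurement with its own localization and image-measurement uncertainty folded into the covariance before fusion.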

DOI [BibTex]


Body Talk: Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Quiros-Ramirez, M. A., Hill, M. Q., Hahn, C. A., Zuffi, S., O’Toole, A., Black, M. J.

ACM Trans. Graph. (Proc. SIGGRAPH), 35(4):54:1-54:14, July 2016 (article)

Abstract
Realistic, metrically accurate, 3D human avatars are useful for games, shopping, virtual reality, and health applications. Such avatars are not in wide use because solutions for creating them from high-end scanners, low-cost range cameras, and tailoring measurements all have limitations. Here we propose a simple solution and show that it is surprisingly accurate. We use crowdsourcing to generate attribute ratings of 3D body shapes corresponding to standard linguistic descriptions of 3D shape. We then learn a linear function relating these ratings to 3D human shape parameters. Given an image of a new body, we again turn to the crowd for ratings of the body shape. The collection of linguistic ratings of a photograph provides remarkably strong constraints on the metric 3D shape. We call the process crowdshaping and show that our Body Talk system produces shapes that are perceptually indistinguishable from bodies created from high-resolution scans and that the metric accuracy is sufficient for many tasks. This makes body “scanning” practical without a scanner, opening up new applications including database search, visualization, and extracting avatars from books.
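Learning "a linear function relating these ratings to 3D human shape parameters" amounts to a regularized least-squares fit. Below is a minimal sketch on synthetic data; the ridge term, bias handling, and array shapes are assumptions for illustration, not details from the paper.

```python
import numpy as np

def fit_ratings_to_shape(ratings, shapes, lam=1e-3):
    """Ridge-regularized least squares mapping crowd attribute ratings
    (N, A) to body shape parameters (N, B). Returns an (A+1, B) weight
    matrix whose last row is a bias term."""
    X = np.hstack([ratings, np.ones((ratings.shape[0], 1))])
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ shapes)

def predict_shape(W, ratings):
    """Predict shape parameters for new ratings using learned weights."""
    X = np.hstack([ratings, np.ones((ratings.shape[0], 1))])
    return X @ W
```

The predicted shape parameters would then drive a body model to produce the 3D avatar; the simplicity of the linear map is what makes crowdsourced ratings such a practical shape code.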

pdf web tool video talk (ppt) [BibTex]


DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P., Schiele, B.

In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 4929-4937, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity to each other. This joint formulation is in contrast to previous strategies that address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation of a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single person and multi person pose estimation.

code pdf supplementary DOI Project Page [BibTex]


Video segmentation via object flow

Tsai, Y., Yang, M., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Video object segmentation is challenging due to fast moving objects, deforming shapes, and cluttered backgrounds. Optical flow can be used to propagate an object segmentation over time but, unfortunately, flow is often inaccurate, particularly around object boundaries. Such boundaries are precisely where we want our segmentation to be accurate. To obtain accurate segmentation across time, we propose an efficient algorithm that considers video segmentation and optical flow estimation simultaneously. For video segmentation, we formulate a principled, multiscale, spatio-temporal objective function that uses optical flow to propagate information between frames. For optical flow estimation, particularly at object boundaries, we compute the flow independently in the segmented regions and recompose the results. We call the process object flow and demonstrate the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme. Experiments on the SegTrack v2 and Youtube-Objects datasets show that the proposed algorithm performs favorably against other state-of-the-art methods.

pdf [BibTex]


Patches, Planes and Probabilities: A Non-local Prior for Volumetric 3D Reconstruction

Ulusoy, A. O., Black, M. J., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
In this paper, we propose a non-local structured prior for volumetric multi-view 3D reconstruction. Towards this goal, we present a novel Markov random field model based on ray potentials in which assumptions about large 3D surface patches such as planarity or Manhattan world constraints can be efficiently encoded as probabilistic priors. We further derive an inference algorithm that reasons jointly about voxels, pixels and image segments, and estimates marginal distributions of appearance, occupancy, depth, normals and planarity. Key to tractable inference is a novel hybrid representation that spans both voxel and pixel space and that integrates non-local information from 2D image segmentations in a principled way. We compare our non-local prior to commonly employed local smoothness assumptions and a variety of state-of-the-art volumetric reconstruction baselines on challenging outdoor scenes with textureless and reflective surfaces. Our experiments indicate that regularizing over larger distances has the potential to resolve ambiguities where local regularizers fail.

YouTube pdf poster suppmat Project Page [BibTex]


Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

Tzionas, D., Ballan, L., Srikantha, A., Aponte, P., Pollefeys, M., Gall, J.

International Journal of Computer Vision (IJCV), 118(2):172-193, June 2016 (article)

Abstract
Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error and with collision detection and physics simulation to achieve physically plausible estimates even in case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom.

Website pdf link (url) DOI Project Page [BibTex]


Optical Flow with Semantic Segmentation and Localized Layers

Sevilla-Lara, L., Sun, D., Jampani, V., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3889-3898, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow. In reality, optical flow varies across an image depending on object class. Simply put, different objects move differently. Here we exploit recent advances in static semantic scene segmentation to segment the image into objects of different types. We define different models of image motion in these regions depending on the type of object. For example, we model the motion on roads with homographies, vegetation with spatially smooth flow, and independently moving objects like cars and planes with affine motion plus deviations. We then pose the flow estimation problem using a novel formulation of localized layers, which addresses limitations of traditional layered models for dealing with complex scene motion. Our semantic flow method achieves the lowest error of any published monocular method in the KITTI-2015 flow benchmark and produces qualitatively better flow and segmentation than recent top methods on a wide range of natural videos.

video Kitti Precomputed Data (1.6GB) pdf YouTube Sequences Code Project Page Project Page [BibTex]


Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

Jampani, V., Kiefel, M., Gehler, P. V.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 4452-4461, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Bilateral filters have widespread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we generalize the parametrization and in particular derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation allows learning high-dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate the use in applications where single filter applications are desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields and apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for high-dimensional sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the usage of general forms of filters.
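For reference, the fixed parametric case this paper generalizes, a Gaussian bilateral filter, looks like this in one dimension. This is a brute-force sketch; the sigma values and window radius are arbitrary choices for illustration.

```python
import numpy as np

def bilateral_filter_1d(signal, sigma_s=2.0, sigma_r=0.5, radius=5):
    """Brute-force 1D bilateral filter: each output sample is a spatial-
    and range-weighted average of its neighbors, so smooth regions are
    averaged while strong edges are preserved."""
    out = np.empty(len(signal), dtype=float)
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        idx = np.arange(lo, hi)
        # spatial weight (distance in the signal) times range weight
        # (difference in value): the second factor kills cross-edge mixing
        w = (np.exp(-((idx - i) ** 2) / (2 * sigma_s ** 2))
             * np.exp(-((signal[idx] - signal[i]) ** 2) / (2 * sigma_r ** 2)))
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out
```

The paper's contribution is to replace the fixed Gaussian weights above with general learned filters in high-dimensional feature spaces, trained by backpropagation.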

project page code CVF open-access pdf supplementary poster Project Page Project Page [BibTex]


Design and evaluation of a novel mechanical device to improve hemiparetic gait: a case report

Fjeld, K., Hu, S., Kuchenbecker, K. J., Vasudevan, E. V.

Extended abstract presented at the Biomechanics and Neural Control of Movement Conference (BANCOM), 2016, Poster presentation given by Fjeld (misc)

Project Page [BibTex]


Occlusion boundary detection via deep exploration of context

Fu, H., Wang, C., Tao, D., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Occlusion boundaries contain rich perceptual information about the underlying scene structure. They also provide important cues in many visual perception tasks such as scene understanding, object recognition, and segmentation. In this paper, we improve occlusion boundary detection via enhanced exploration of contextual information (e.g., local structural boundary patterns, observations from surrounding regions, and temporal context), and in doing so develop a novel approach based on convolutional neural networks (CNNs) and conditional random fields (CRFs). Experimental results demonstrate that our detector significantly outperforms the state-of-the-art (e.g., improving the F-measure from 0.62 to 0.71 on the commonly used CMU benchmark). Last but not least, we empirically assess the roles of several important components of the proposed detector, so as to validate the rationale behind this approach.

pdf [BibTex]


Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

Xie, J., Kiefel, M., Sun, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive and little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a probabilistic model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels.

avg ps

pdf suppmat Project Page Project Page [BibTex]



Deep Learning for Tactile Understanding From Visual and Haptic Data

Gao, Y., Hendricks, L. A., Kuchenbecker, K. J., Darrell, T.

In Proceedings of the IEEE International Conference on Robotics and Automation, pages: 536-543, May 2016, Oral presentation given by Gao (inproceedings)

hi

[BibTex]



Robust Tactile Perception of Artificial Tumors Using Pairwise Comparisons of Sensor Array Readings

Hui, J. C. T., Block, A. E., Taylor, C. J., Kuchenbecker, K. J.

In Proceedings of the IEEE Haptics Symposium, pages: 305-312, Philadelphia, Pennsylvania, USA, April 2016, Oral presentation given by Hui (inproceedings)

hi

[BibTex]



Data-Driven Comparison of Four Cutaneous Displays for Pinching Palpation in Robotic Surgery

Brown, J. D., Ibrahim, M., Chase, E. D. Z., Pacchierotti, C., Kuchenbecker, K. J.

In Proceedings of the IEEE Haptics Symposium, pages: 147-154, Philadelphia, Pennsylvania, USA, April 2016, Oral presentation given by Brown (inproceedings)

hi

[BibTex]



Multisensory Robotic Therapy through Motion Capture and Imitation for Children with ASD

Burns, R., Nizambad, S., Park, C. H., Jeon, M., Howard, A.

Proceedings of the American Society of Engineering Education, Mid-Atlantic Section, Spring Conference, April 2016 (conference)

Abstract
It is known that children with autism have difficulty with emotional communication. As the population of children with autism increases, it is crucial we create effective therapeutic programs that will improve their communication skills. We present an interactive robotic system that delivers emotional and social behaviors for multisensory therapy for children with autism spectrum disorders. Our framework includes emotion-based robotic gestures and facial expressions, as well as tracking and understanding the child’s responses through Kinect motion capture.

hi

link (url) [BibTex]



Design and Implementation of a Visuo-Haptic Data Acquisition System for Robotic Learning of Surface Properties

Burka, A., Hu, S., Helgeson, S., Krishnan, S., Gao, Y., Hendricks, L. A., Darrell, T., Kuchenbecker, K. J.

In Proceedings of the IEEE Haptics Symposium, pages: 350-352, April 2016, Work-in-progress paper. Poster presentation given by Burka (inproceedings)

hi

Project Page [BibTex]



Objective assessment of robotic surgical skill using instrument contact vibrations

Gomez, E. D., Aggarwal, R., McMahan, W., Bark, K., Kuchenbecker, K. J.

Surgical Endoscopy, 30(4):1419-1431, 2016 (article)

hi

[BibTex]



One Sensor, Three Displays: A Comparison of Tactile Rendering from a BioTac Sensor

Brown, J. D., Ibrahim, M., Chase, E. D. Z., Pacchierotti, C., Kuchenbecker, K. J.

Hands-on demonstration presented at IEEE Haptics Symposium, Philadelphia, Pennsylvania, USA, April 2016 (misc)

hi

[BibTex]



Multisensory robotic therapy to promote natural emotional interaction for children with ASD

Burns, R., Azzi, P., Spadafora, M., Park, C. H., Jeon, M., Kim, H. J., Lee, J., Raihan, K., Howard, A.

Proceedings of the Eleventh ACM/IEEE International Conference on Human Robot Interaction (HRI), pages: 571-571, March 2016 (conference)

Abstract
In this video submission, we introduce two robots, Romo the penguin and Darwin Mini. We have programmed these robots to perform a variety of emotions through facial expression and body language, respectively. We aim to use these robots with children with autism, to demonstrate safe emotional and social responses in various sensory situations.

hi

link (url) DOI [BibTex]



Interactive Robotic Framework for Multi-Sensory Therapy for Children with Autism Spectrum Disorder

Burns, R., Park, C. H., Kim, H. J., Lee, J., Rennie, A., Jeon, M., Howard, A.

In Proceedings of the Eleventh ACM/IEEE International Conference on Human Robot Interaction (HRI), pages: 421-422, March 2016 (inproceedings)

Abstract
In this abstract, we present the overarching goal of our interactive robotic framework - to teach emotional and social behavior to children with autism spectrum disorders via multi-sensory therapy. We introduce our robot characters, Romo and Darwin Mini, and the "Five Senses" scenario they will undergo. This sensory game will develop the children's interest, and will model safe and appropriate reactions to typical sensory overload stimuli.

hi

link (url) DOI [BibTex]



Appealing female avatars from 3D body scans: Perceptual effects of stylization

Fleming, R., Mohler, B., Romero, J., Black, M. J., Breidt, M.

In 11th Int. Conf. on Computer Graphics Theory and Applications (GRAPP), February 2016 (inproceedings)

Abstract
Advances in 3D scanning technology allow us to create realistic virtual avatars from full body 3D scan data. However, negative reactions to some realistic computer generated humans suggest that this approach might not always provide the most appealing results. Using styles derived from existing popular character designs, we present a novel automatic stylization technique for body shape and colour information based on a statistical 3D model of human bodies. We investigate whether such stylized body shapes result in increased perceived appeal with two different experiments: One focuses on body shape alone, the other investigates the additional role of surface colour and lighting. Our results consistently show that the most appealing avatar is a partially stylized one. Importantly, avatars with high stylization or no stylization at all were rated to have the least appeal. The inclusion of colour information and improvements to render quality had no significant effect on the overall perceived appeal of the avatars, and we observe that the body shape primarily drives the change in appeal ratings. For body scans with colour information, we found that a partially stylized avatar was most effective, increasing average appeal ratings by approximately 34%.

ps

pdf Project Page [BibTex]



Cutaneous Feedback of Fingertip Deformation and Vibration for Palpation in Robotic Surgery

Pacchierotti, C., Prattichizzo, D., Kuchenbecker, K. J.

IEEE Transactions on Biomedical Engineering, 63(2):278-287, February 2016 (article)

hi

[BibTex]



Structure modulated electrostatic deformable mirror for focus and geometry control

Nam, S., Park, S., Yun, S., Park, B., Park, S. K., Kyung, K.

Optics Express, 24(1):55-66, OSA, January 2016 (article)

Abstract
We suggest a way to electrostatically control the deformed geometry of an electrostatic deformable mirror (EDM) based on geometric modulation of a basement. The EDM is composed of a metal coated elastomeric membrane (active mirror) and a polymeric basement with electrode (ground). When an electrical voltage is applied across the components, the active mirror deforms toward the stationary basement responding to electrostatic attraction force in an air gap. Since the differentiated gap distance can induce change in electrostatic force distribution between the active mirror and the basement, the EDMs are capable of controlling the deformed geometry of the active mirror with different basement structures (concave, flat, and protrusive). The modulation of the deformed geometry leads to significant change in the range of the focal length of the EDMs. Even under dynamic operations, the EDM shows fairly consistent deformation, large enough to change the focal length, over a wide frequency range (1–175 Hz). The geometric modulation of the active mirror with dynamic focus tunability can allow the EDM to be an active mirror lens for optical zoom devices as well as an optical component controlling field of view.

hi

link (url) DOI [BibTex]



Human Pose Estimation from Video and IMUs

Marcard, T. V., Pons-Moll, G., Rosenhahn, B.

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 38(8):1533-1547, January 2016 (article)

ps

data pdf dataset_documentation [BibTex]



Deep Discrete Flow

Güney, F., Geiger, A.

Asian Conference on Computer Vision (ACCV), 2016 (conference) Accepted

avg ps

pdf suppmat Project Page [BibTex]



Moving-horizon Nonlinear Least Squares-based Multirobot Cooperative Perception

Ahmad, A., Bülthoff, H.

Robotics and Autonomous Systems, 83, pages: 275-286, 2016 (article)

Abstract
In this article we present an online estimator for multirobot cooperative localization and target tracking based on nonlinear least squares minimization. Our method not only makes the rigorous optimization-based approach applicable online but also allows the estimator to be stable and convergent. We do so by employing a moving horizon technique to nonlinear least squares minimization and a novel design of the arrival cost function that ensures stability and convergence of the estimator. Through an extensive set of real robot experiments, we demonstrate the robustness of our method as well as the optimality of the arrival cost function. The experiments include comparisons of our method with i) an extended Kalman filter-based online-estimator and ii) an offline-estimator based on full-trajectory nonlinear least squares.

ps

DOI Project Page [BibTex]



Designing Human-Robot Exercise Games for Baxter

Fitter, N. T., Hawkes, D. T., Johnson, M. J., Kuchenbecker, K. J.

2016, Late-breaking results report presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (misc)

hi

Project Page [BibTex]



Psychophysical Power Optimization of Friction Modulation for Tactile Interfaces

Sednaoui, T., Vezzoli, E., Gueorguiev, D., Amberg, M., Chappaz, C., Lemaire-Semail, B.

In Haptics: Perception, Devices, Control, and Applications, pages: 354-362, Springer International Publishing, Cham, 2016 (inproceedings)

Abstract
Ultrasonic vibration and electrovibration can modulate the friction between a surface and a sliding finger. The power consumption of these devices is critical to their integration in modern mobile devices such as smartphones. This paper presents a simple control solution that reduces this power consumption by up to 68.8% by taking advantage of human perception limits.

hi

[BibTex]



Effect of Waveform in Haptic Perception of Electrovibration on Touchscreens

Vardar, Y., Güçlü, B., Basdogan, C.

In Haptics: Perception, Devices, Control, and Applications, pages: 190-203, Springer International Publishing, Cham, 2016 (inproceedings)

Abstract
The perceived intensity of electrovibration can be altered by modulating the amplitude, frequency, and waveform of the input voltage signal applied to the conductive layer of a touchscreen. Even though the effect of the first two has been already investigated for sinusoidal signals, we are not aware of any detailed study investigating the effect of the waveform on our haptic perception in the domain of electrovibration. This paper investigates how input voltage waveform affects our haptic perception of electrovibration on touchscreens. We conducted absolute detection experiments using square wave and sinusoidal input signals at seven fundamental frequencies (15, 30, 60, 120, 240, 480 and 1920 Hz). Experimental results depicted the well-known U-shaped tactile sensitivity across frequencies. However, the sensory thresholds were lower for the square wave than the sinusoidal wave at fundamental frequencies less than 60 Hz while they were similar at higher frequencies. Using an equivalent circuit model of a finger-touchscreen system, we show that the sensation difference between the waveforms at low fundamental frequencies can be explained by frequency-dependent electrical properties of human skin and the differential sensitivity of mechanoreceptor channels to individual frequency components in the electrostatic force. As a matter of fact, when the electrostatic force waveforms are analyzed in the frequency domain based on human vibrotactile sensitivity data from the literature [15], the electrovibration stimuli caused by square-wave input signals at all the tested frequencies in this study are found to be detected by the Pacinian psychophysical channel.

hi

vardar_eurohaptics_2016 [BibTex]



Perceiving Systems (2011-2015)
Scientific Advisory Board Report, 2016 (misc)

ps

pdf [BibTex]



Shape estimation of subcutaneous adipose tissue using an articulated statistical shape model

Yeo, S. Y., Romero, J., Loper, M., Machann, J., Black, M.

Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 0(0):1-8, 2016 (article)

ps

publisher website preprint pdf link (url) DOI Project Page [BibTex]



Multi-Person Tracking by Multicuts and Deep Matching

(Winner of the Multi-Object Tracking Challenge ECCV 2016)

Tang, S., Andres, B., Andriluka, M., Schiele, B.

ECCV Workshop on Benchmarking Multiple Object Tracking, 2016 (conference)

ps

PDF [BibTex]



Peripheral vs. central determinants of vibrotactile adaptation

Klöcker, A., Gueorguiev, D., Thonnard, J. L., Mouraux, A.

Journal of Neurophysiology, 115(2):685-691, 2016, PMID: 26581868 (article)

Abstract
Long-lasting mechanical vibrations applied to the skin induce a reversible decrease in the perception of vibration at the stimulated skin site. This phenomenon of vibrotactile adaptation has been studied extensively, yet there is still no clear consensus on the mechanisms leading to vibrotactile adaptation. In particular, the respective contributions of 1) changes affecting mechanical skin impedance, 2) peripheral processes, and 3) central processes are largely unknown. Here we used direct electrical stimulation of nerve fibers to bypass mechanical transduction processes and thereby explore the possible contribution of central vs. peripheral processes to vibrotactile adaptation. Three experiments were conducted. In the first, adaptation was induced with mechanical vibration of the fingertip (51- or 251-Hz vibration delivered for 8 min, at 40× detection threshold). In the second, we attempted to induce adaptation with transcutaneous electrical stimulation of the median nerve (51- or 251-Hz constant-current pulses delivered for 8 min, at 1.5× detection threshold). Vibrotactile detection thresholds were measured before and after adaptation. Mechanical stimulation induced a clear increase of vibrotactile detection thresholds. In contrast, thresholds were unaffected by electrical stimulation. In the third experiment, we assessed the effect of mechanical adaptation on the detection thresholds to transcutaneous electrical nerve stimuli, measured before and after adaptation. Electrical detection thresholds were unaffected by the mechanical adaptation. Taken together, our results suggest that vibrotactile adaptation is predominantly the consequence of peripheral mechanoreceptor processes and/or changes in biomechanical properties of the skin.

hi

link (url) DOI [BibTex]
