Over the past few years, with the advent of Deep Convolutional Neural Networks (DCNNs) and the availability of visual data, it has been shown that excellent results are possible in very challenging tasks such as visual object recognition, detection, and tracking. Nevertheless, in certain tasks such as fine-grained object recognition (e.g., face recognition), it is very difficult to collect the amount of data that is needed. In this talk, I will show how, using DCNNs, we can generate highly realistic faces and heads and use them for training algorithms such as face and facial expression recognition. Next, I will reverse the problem and demonstrate how a very powerful, trained face recognition network can be used to perform very accurate 3D shape and texture reconstruction of faces from a single image. Finally, I will demonstrate how to create very lightweight networks for representing 3D face texture and shape structure by capitalising upon intrinsic mesh convolutions.
Organizers: Dimitris Tzionas
In this talk, I will present my understanding of 3D face reconstruction, modelling and applications from a deep learning perspective. In the first part of my talk, I will discuss the relationship between representations (point clouds, meshes, etc.) and network layers (CNN, GCN, etc.) in the face reconstruction task, then present my ECCV work PRN, which proposed a new representation that helps achieve state-of-the-art performance on face reconstruction and dense alignment tasks. I will also introduce my open source project face3d, which provides examples for generating different 3D face representations. In the second part of the talk, I will discuss some publications on integrating 3D techniques into deep networks, then introduce my upcoming work, which implements this. In the third part, I will present how related tasks can benefit each other in deep learning, including face recognition for the face reconstruction task and face reconstruction for the face anti-spoofing task. Finally, building on these three parts, I will present my plans for 3D face modelling and applications.
Organizers: Timo Bolkart
Much existing work in reinforcement learning involves environments that are either intentionally neutral, lacking a role for cooperation and competition, or intentionally simple, in that agents need imagine nothing more than that they are playing versions of themselves. Richer game theoretic notions become important as these constraints are relaxed. For humans, this encompasses issues that concern utility, such as envy and guilt, and that concern inference, such as recursive modeling of other players. I will discuss studies treating a paradigmatic game of trust as an interactive partially-observable Markov decision process, and will illustrate the solution concepts with evidence from interactions between various groups of subjects, including those diagnosed with borderline and anti-social personality disorders.
Optic flow offers a rich source of information about an organism’s environment. Flies, for instance, are thought to make use of motion vision to control and stabilise their course during acrobatic airborne manoeuvres. How these computations are implemented in neural hardware and how such circuits cope with the visual complexity of natural scenes, however, remain open questions. This talk outlines some of the progress we have made in unraveling the computational substrate underlying optic flow processing in Drosophila. In particular, I will focus on our efforts to connect neural mechanisms and real-world demands via task-driven modelling.
Organizers: Michel Besserve
Minimally invasive approaches to the treatment of vascular diseases are constantly evolving. These diseases, including stroke, myocardial infarction, pulmonary emboli, hemorrhage and aneurysms, are among the most prevalent medical problems today. I will review current approaches to vascular embolization and thrombosis, the challenges they pose and the limitations of current devices, and end with patient-inspired engineering approaches to the treatment of these conditions.
Organizers: Metin Sitti
The tongue plays a vital part in everyday life, where we use it extensively during speech production. Because of this importance, we want to derive a parametric shape model of the tongue. This model enables us to reconstruct the full tongue shape from a sparse set of points, such as motion capture data. Moreover, we can use such a model in simulations of the vocal tract to perform articulatory speech synthesis or to create animated virtual avatars. In my talk, I will describe a framework for deriving such a model from MRI scans of the vocal tract. In particular, this framework uses image denoising and segmentation methods to produce a point cloud approximating the vocal tract surface. In this context, I will also discuss how palatal contacts of the tongue can be handled, i.e., situations where the tongue touches the palate and thus no tongue boundary is visible. Afterwards, template matching is used to derive a mesh representation of the tongue from this cloud. The acquired meshes are finally used to construct a multilinear model.
Organizers: Timo Bolkart
The early calculus of Newton and Leibniz made heavy use of infinitesimal quantities and flourished for over a hundred years until it was superseded by the more rigorous epsilon-delta formalism. It took until the 1950s for A. Robinson to find a proper way to construct a number system containing actual infinitesimals -- the hyperreals *ℝ. This talk outlines their construction and possible applications in modern analysis.
Organizers: Philipp Hennig
This talk will focus on three topics of my research at Yale University, which centers on themes of human and robotic manipulation and haptic perception. My major research undertaking at Yale has involved running a quantitative study of daily upper-limb prosthesis use in unilateral amputees. This work aims to better understand the techniques employed by long-term users of artificial arms and hands in order to inform future prosthetic device design and therapeutic interventions. While past attempts to quantify prosthesis use have relied on either behavioral questionnaires or observations of specific tasks in structured laboratory settings, our approach involves participants completing many hours of self-selected household chores in their own homes while wearing a head-mounted video camera. I will discuss how we have addressed the processing of such a large and unstructured data set, in addition to our current findings. Complementary to my work in prosthetics, I will also discuss my work on several novel robotic grippers which aim to enhance the grasping, manipulation and object identification capabilities of robotic systems. These grippers implement underactuated designs, machine learning approaches or variable friction surfaces to provide low-cost, model-free and easily reproducible solutions to what have traditionally been considered complex problems in robotic manipulation, i.e. stable grasp acquisition, fast tactile object recognition and within-hand object manipulation. Finally, I will present a brief overview of my efforts designing and testing shape-changing haptic interfaces, a largely unexplored feedback modality that I believe has huge potential for discreetly communicating information to people with and without sensory impairments. This technology has been implemented in a pedestrian navigation system and evaluated in a variety of scenarios, including a large-scale immersive theatre production with visually impaired artistic collaborators and almost 100 participants.
Organizers: Katherine Kuchenbecker
Starting at birth, humans integrate information from several sensory modalities in order to form a representation of the environment, such as when a baby explores, manipulates, and interacts with objects. The combination of visual and touch information is one of the most fundamental sensory integration processes, as touch information (such as body-relative size, shape, texture, material, temperature, and weight) can easily be linked to the visual image, thereby providing a grounding for later visual-only recognition. Research on such integration processes has so far mainly focused on low-level object properties (such as curvature or surface granularity), so that little is known about how humans actually form a high-level multisensory representation of objects. Here, I will review research from our lab that investigates how the human brain processes shape using input from vision and touch. Using a large variety of novel, 3D-printed shapes, we were able to show that touch is actually as good at shape processing as vision, suggesting a common, multisensory representation of shape. We next conducted a series of imaging experiments (using anatomical, functional, and white-matter analyses) that chart the brain networks that process this shape representation. I will conclude the talk with a brief medley of other haptics-related research in the lab, including robot learning, braille, and haptic face recognition.
Organizers: Katherine Kuchenbecker
Background: Pre-pregnancy obesity and inadequate maternal weight gain during pregnancy can lead to adverse effects in the newborn, and also to metabolic, cardiovascular and even neurological diseases later in the offspring's life. Heart activity can be used as a proxy for the activity of the autonomic nervous system (ANS). The aim of this study is to evaluate the effect of pre-pregnancy weight, maternal weight gain and maternal metabolism on the ANS of the fetus in healthy pregnancies.
Organizers: Katherine Kuchenbecker
The emergence of multi-view capture systems has yielded a tremendous amount of video sequences. The task of capturing spatio-temporal models from real world imagery (4D modeling) should arguably benefit from this enormous amount of visual information. In order to achieve highly realistic representations, both geometry and appearance need to be modeled with high precision. Yet, even with the great progress in geometric modeling, the appearance aspect has not been fully explored and visual quality can still be improved. I will explain how we can optimally exploit the redundant visual information of the captured video sequences and provide a temporally coherent, super-resolved, view-independent appearance representation. I will further discuss how to exploit the interdependency of geometry and appearance as separate modalities to enhance visual perception, and finally how to decompose appearance representations into intrinsic components (shading & albedo) and super-resolve them jointly to allow for more realistic renderings.
Organizers: Despoina Paschalidou
Probabilistic modeling is the method of choice when it comes to reasoning under uncertainty. However, one of the main practical downsides of probabilistic models is that inference, i.e. the process of using the model to answer statistical queries, is notoriously hard in general. This has led to the common folklore that probabilistic models which allow exact inference are necessarily simplistic and too weak to model any practical task. In this talk, I will present sum-product networks (SPNs), a recently proposed architecture representing a rich and expressive class of probability distributions, which also allows exact and efficient computation of many inference tasks. I will discuss representational properties, inference routines and learning approaches in SPNs. Furthermore, I will provide some examples of practical applications using SPNs.
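To make the idea of exact and efficient inference concrete, here is a minimal sketch of a toy sum-product network over two binary variables. It is not taken from the talk: the class names (Leaf, Product, Sum), the variable names "A" and "B", and all weights and probabilities are made up for illustration. The sketch assumes the standard SPN construction of weighted sum nodes over products of univariate leaves, and it shows that a marginal query costs a single bottom-up pass, with each unobserved leaf evaluating to 1.

```python
class Leaf:
    """Bernoulli leaf over one binary variable (hypothetical toy class)."""

    def __init__(self, var, p_one):
        self.var = var
        self.p_one = p_one

    def value(self, evidence):
        x = evidence.get(self.var)
        if x is None:
            # Variable not observed: summing the leaf over both states gives 1.
            return 1.0
        return self.p_one if x == 1 else 1.0 - self.p_one


class Product:
    """Product node: children cover disjoint sets of variables."""

    def __init__(self, children):
        self.children = children

    def value(self, evidence):
        result = 1.0
        for child in self.children:
            result *= child.value(evidence)
        return result


class Sum:
    """Sum node: a mixture of children with non-negative weights summing to 1."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children

    def value(self, evidence):
        return sum(w * child.value(evidence) for w, child in self.weighted_children)


def build_toy_spn():
    # Illustrative parameters only; they are not learned from any data.
    return Sum([
        (0.6, Product([Leaf("A", 0.9), Leaf("B", 0.2)])),
        (0.4, Product([Leaf("A", 0.1), Leaf("B", 0.7)])),
    ])


spn = build_toy_spn()

# Joint probability P(A=1, B=0): one bottom-up pass.
joint = spn.value({"A": 1, "B": 0})

# Marginal P(A=1): the same single pass, with B left unobserved.
marginal_a = spn.value({"A": 1})

# Brute-force check: sum the joint over both states of B.
brute_force = sum(spn.value({"A": 1, "B": b}) for b in (0, 1))

print(joint, marginal_a, brute_force)
```

In this toy network the marginal from the single pass agrees with the brute-force sum over B, and evaluating with no evidence returns 1, as expected for a normalized distribution. In larger networks the brute-force sum grows exponentially with the number of unobserved variables, while the bottom-up pass stays linear in the size of the network.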
For man-machine interaction it is crucial to develop models of humans that look and move in ways indistinguishable from real humans. Such virtual humans will be key for application areas such as computer vision, medicine and psychology, virtual and augmented reality, and special effects in movies. Currently, digital models typically lack realistic soft tissue and clothing or require time-consuming manual editing of physical simulation parameters. Our hypothesis is that better and more realistic models of humans and clothing can be learned directly from real measurements coming from 4D scans, images, and depth and inertial sensors. We combine statistical machine learning techniques and physics-based simulation to create realistic models from data. We then use such models to extract information from incomplete and noisy sensor data from monocular video, depth or IMUs. I will give an overview of a selection of projects conducted in Perceiving Systems in which we build realistic models of human pose, shape, soft tissue and clothing. I will also present some of our recent work on 3D reconstruction of people models from monocular video, real-time fusion and online human body shape estimation from depth data, and recovery of human pose in the wild from video and IMUs. I will conclude the talk by outlining the next challenges in building digital humans and perceiving them from sensory data.
Organizers: Melanie Feldhofer