Institute Talks

Generating Faces & Heads: Texture, Shape and Beyond.

Talk
  • 17 December 2018 • 11:00–12:00
  • Stefanos Zafeiriou
  • PS Aquarium

Over the past few years, with the advent of Deep Convolutional Neural Networks (DCNNs) and the availability of large amounts of visual data, it has been shown that excellent results can be achieved in very challenging tasks such as visual object recognition, detection, and tracking. Nevertheless, in certain tasks such as fine-grained object recognition (e.g., face recognition), it is very difficult to collect the amount of data needed. In this talk, I will show how DCNNs can be used to generate highly realistic faces and heads, and how these can be used for training algorithms such as face and facial expression recognition. Next, I will reverse the problem and demonstrate how a very powerful trained face recognition network can be used to perform highly accurate 3D shape and texture reconstruction of faces from a single image. Finally, I will demonstrate how to create very lightweight networks for representing 3D face texture and shape structure by capitalising on intrinsic mesh convolutions.
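
For intuition, a minimal sketch of what an intrinsic mesh convolution can look like (an illustration only, not the speaker's architecture; the simple adjacency-averaging form and all names here are assumptions):

```python
# A toy "mesh convolution" layer: average features over each vertex's
# one-ring neighbourhood (plus the vertex itself), then apply a learned
# linear map and a ReLU. Real systems use richer spectral or spiral
# operators; this only conveys the structure of the computation.
import numpy as np

def mesh_conv(x, edges, W, n_vertices):
    """x: (V, F_in) vertex features; edges: list of (i, j) mesh edges;
    W: (F_in, F_out) learned weight matrix."""
    A = np.eye(n_vertices)                      # self-loops
    for i, j in edges:                          # symmetric mesh adjacency
        A[i, j] = A[j, i] = 1.0
    A_hat = A / A.sum(axis=1, keepdims=True)    # neighbourhood averaging
    return np.maximum(A_hat @ x @ W, 0.0)       # linear map + ReLU

# toy usage: a tetrahedron, with xyz coordinates as input features
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
x = np.random.randn(4, 3)
W = np.random.randn(3, 16)
h = mesh_conv(x, edges, W, n_vertices=4)        # (4, 16) output features
```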

Organizers: Dimitris Tzionas

Deep learning on 3D face reconstruction, modelling and applications

Talk
  • 19 December 2018 • 11:00–12:00
  • Yao Feng
  • PS Aquarium

In this talk, I will present my understanding of 3D face reconstruction, modelling, and applications from a deep learning perspective. In the first part of my talk, I will discuss the relationship between representations (point clouds, meshes, etc.) and network layers (CNNs, GCNs, etc.) on the face reconstruction task, then present my ECCV work PRN, which proposes a new representation that helps achieve state-of-the-art performance on face reconstruction and dense alignment tasks. I will also introduce my open-source project face3d, which provides examples for generating different 3D face representations. In the second part of the talk, I will discuss publications on integrating 3D techniques into deep networks, then introduce my upcoming work that implements this. In the third part, I will present how related tasks can promote each other in deep learning, including face recognition for the face reconstruction task and face reconstruction for the face anti-spoofing task. Finally, with this understanding of the three parts, I will present my plans on 3D face modelling and applications.
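
To make the representation question concrete, here is a sketch of the idea behind PRN's UV position map: the 3D face is stored as a three-channel 2D image so that an ordinary image-to-image CNN can regress full 3D geometry (shapes and the `prn_encoder_decoder` name below are illustrative assumptions, not the released code):

```python
# Each pixel of the UV position map holds the (x, y, z) model-space
# coordinate of the face surface point mapped to that UV location, so
# "decoding" the 3-D shape is just a reshape of the regressed image.
import numpy as np

def posmap_to_pointcloud(uv_position_map):
    """uv_position_map: (H, W, 3) array of per-pixel 3-D coordinates."""
    h, w, _ = uv_position_map.shape
    return uv_position_map.reshape(h * w, 3)    # dense 3-D point cloud

# a network would regress the map from an RGB crop, e.g. (hypothetical):
# pos_map = prn_encoder_decoder(image)          # (256, 256, 3)
pos_map = np.zeros((256, 256, 3), dtype=np.float32)
points = posmap_to_pointcloud(pos_map)          # (65536, 3) vertices
```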

Organizers: Timo Bolkart

Mind Games

IS Colloquium
  • 21 December 2018 • 11:00–12:00
  • Peter Dayan
  • IS Lecture Hall

Much existing work in reinforcement learning involves environments that are either intentionally neutral, lacking a role for cooperation and competition, or intentionally simple, when agents need imagine nothing more than that they are playing versions of themselves. Richer game-theoretic notions become important as these constraints are relaxed. For humans, this encompasses issues that concern utility, such as envy and guilt, and that concern inference, such as recursive modeling of other players. I will discuss studies treating a paradigmatic game of trust as an interactive partially observable Markov decision process, and will illustrate the solution concepts with evidence from interactions between various groups of subjects, including those diagnosed with borderline and anti-social personality disorders.
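
As a concrete example of how envy and guilt can enter a utility function, one standard formulation from behavioural economics is Fehr-Schmidt inequity aversion (shown here as a sketch; it is not necessarily the exact utility used in these studies):

```python
# Inequity-averse utility: a player's material payoff is discounted by
# disadvantageous inequality (alpha, "envy") and advantageous inequality
# (beta, "guilt"). Parameter values here are purely illustrative.
def inequity_averse_utility(own, other, alpha, beta):
    envy = alpha * max(other - own, 0.0)    # the other got more than I did
    guilt = beta * max(own - other, 0.0)    # I got more than the other
    return own - envy - guilt

# e.g. a "guilty" trustee who keeps 8 of 10 points and returns 2:
print(inequity_averse_utility(8, 2, alpha=0.5, beta=0.8))  # 8 - 0 - 4.8 = 3.2
```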

TBA

IS Colloquium
  • 28 January 2019 • 11:15–12:15
  • Florian Marquardt

Organizers: Matthias Bauer

Embedded Optimization for Nonlinear Model Predictive Control

IS Colloquium
  • 19 May 2014 • 10:15–11:30
  • Prof. Moritz Diehl
  • Max Planck House Lecture Hall

This talk shows how embedded optimization - i.e., autonomous optimization algorithms that continuously receive data, solve problems, and send answers - is able to address challenging control problems. When nonlinear differential equation models are used to predict and optimize future system behaviour, one speaks of Nonlinear Model Predictive Control (NMPC). The talk presents experimental applications of NMPC to time- and energy-optimal control of mechatronic systems and discusses some of the algorithmic tricks that make NMPC optimization rates of up to 1 MHz possible. Finally, we present one particularly challenging application: tethered flight for airborne wind energy systems.
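
To make the receding-horizon idea concrete, here is a minimal NMPC loop on a toy one-dimensional nonlinear system (a sketch under stated assumptions: the generic scipy solver stands in for the tailored embedded solvers discussed in the talk, and the dynamics, horizon, and weights are invented for illustration):

```python
# Receding-horizon control: at every sampling instant, solve a finite-
# horizon optimal control problem from the current state, apply only the
# first input, and repeat as new state information arrives.
import numpy as np
from scipy.optimize import minimize

def step(x, u, dt=0.05):
    return x + dt * (-x**3 + u)                # toy nonlinear dynamics

def horizon_cost(u_seq, x0, x_ref):
    x, J = x0, 0.0
    for u in u_seq:
        x = step(x, u)
        J += (x - x_ref)**2 + 0.01 * u**2      # tracking + control effort
    return J

x, x_ref, N = 2.0, 0.5, 10                      # state, setpoint, horizon
for t in range(50):                             # closed-loop simulation
    res = minimize(horizon_cost, np.zeros(N), args=(x, x_ref))
    x = step(x, res.x[0])                       # apply only the first input
```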

Organizers: Sebastian Trimpe


Towards Lifelong Learning for Visual Scene Understanding

IS Colloquium
  • 12 May 2014 • 11:15
  • Christoph Lampert
  • Max Planck House Lecture Hall

The goal of lifelong visual learning is to develop techniques that continuously and autonomously learn from visual data, potentially for years or decades. During this time the system should build an ever-improving base of generic visual information, and use it as background knowledge and context for solving specific computer vision tasks. In my talk, I will highlight two recent results from our group on the road towards lifelong visual scene understanding: the derivation of theoretical guarantees for lifelong learning systems and the development of practical methods for object categorization based on semantic attributes.
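
As a small sketch of the attribute-based route to categorization (in the spirit of direct attribute prediction; the names and the attribute-independence assumption below are illustrative, not a specific published implementation):

```python
# Attribute-based classification: per-attribute classifiers are trained
# once on known classes; a novel class is then recognised purely from a
# human-given binary attribute description, with no training images of it.
import numpy as np

def attribute_predict(p_attr, class_attribute_matrix):
    """p_attr: (A,) per-attribute posteriors p(attribute | image);
    class_attribute_matrix: (C, A) binary class descriptions."""
    M = class_attribute_matrix
    # class scores: product over attributes, assuming independence
    scores = np.prod(np.where(M == 1, p_attr, 1.0 - p_attr), axis=1)
    return int(scores.argmax())                 # most compatible class

# toy usage: 3 classes described by 4 attributes
M = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]])
print(attribute_predict(np.array([0.9, 0.2, 0.8, 0.1]), M))  # -> class 0
```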

Organizers: Gerard Pons-Moll


  • Nikolaus Troje
  • MRC Seminar room (0.A.03)

Point-light walkers and stick figures rendered orthographically and without self-occlusion do not contain any information as to their depth. For instance, a frontoparallel projection could depict a walker from the front or from the back. Nevertheless, observers show a strong bias towards seeing the walker as facing the viewer. A related stimulus, the silhouette of a human figure, does not seem to show such a bias. We develop these observations into a tool to study the cause of the facing-the-viewer bias observed for biological motion displays.

I will give a short overview of existing theories with respect to the facing-the-viewer bias, and of a number of findings that seem hard to explain with any single one of them. I will then present the results of our studies on both stick figures and silhouettes, which gave rise to a new theory about the facing-the-viewer bias, and I will eventually present an experiment that tests a hypothesis resulting from it. The studies are discussed in the context of one of the most general problems the visual system has to solve: How do we disambiguate an initially ambiguous sensory world and eventually arrive at the perception of a stable, predictable "reality"?


Video Segmentation

IS Colloquium
  • 05 May 2014 • 09:15
  • Thomas Brox
  • Max Planck House Lecture Hall

Compared to static image segmentation, video segmentation is still in its infancy. Various research groups have different tasks in mind when they talk of video segmentation. For some it is motion segmentation, some think of an over-segmentation with thousands of regions per video, and others understand video segmentation as contour tracking. I will go through what I think are reasonable video segmentation subtasks and will touch on the issue of benchmarking. I will also discuss the difference between image and video segmentation. Due to the availability of motion and the redundancy of successive frames, video segmentation should actually be easier than image segmentation. However, recent evidence indicates the opposite: at least at the level of superpixel segmentation, image segmentation methodology is more advanced than what can be found in the video segmentation literature.

Organizers: Gerard Pons-Moll


  • Cordelia Schmid
  • MRC seminar room (0.A.03)

In the first part of our talk, we present an approach for large displacement optical flow. Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. Inspired by the large displacement optical flow of Brox and Malik, our approach DeepFlow combines a novel matching algorithm with a variational approach. Our matching algorithm builds upon a multi-stage architecture interleaving convolutions and max-pooling. DeepFlow efficiently handles large displacements occurring in realistic videos, and shows competitive performance on optical flow benchmarks.
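
As a rough illustration of a matching stage built from convolutions and max-pooling (a sketch of the general idea only, not the authors' DeepFlow code), one can correlate a patch descriptor over a search window and then max-pool the response map to tolerate small deformations:

```python
# Correlation of a patch over a window gives a dense response map;
# max-pooling keeps the strongest local evidence, so a match survives
# small displacements of sub-parts. Stacking such stages coarsens the
# matching hierarchically.
import numpy as np

def correlate(patch, window):
    ph, pw = patch.shape
    H, W = window.shape
    out = np.zeros((H - ph + 1, W - pw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (patch * window[i:i + ph, j:j + pw]).sum()
    return out

def max_pool(resp, k=2):
    H, W = resp.shape[0] // k * k, resp.shape[1] // k * k
    return resp[:H, :W].reshape(H // k, k, W // k, k).max(axis=(1, 3))

resp = correlate(np.random.rand(8, 8), np.random.rand(64, 64))
pooled = max_pool(resp)                 # coarser, deformation-tolerant map
```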

In the second part of our talk, we present a state-of-the-art approach for action recognition based on motion-stabilized trajectory descriptors and a Fisher vector representation. We briefly review the recent trajectory-based video features and then introduce their motion-stabilized version, combining human detection and dominant motion estimation. Fisher vectors summarize the information of a video efficiently. Results on several of the recent action datasets as well as the TrecVid MED dataset show that our approach outperforms the state of the art.
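
For reference, a minimal sketch of the Fisher-vector encoding (mean-gradient part only, diagonal-covariance GMM, normalised in the standard Perronnin style; the GMM would be fitted on training descriptors, and this is an illustration rather than the authors' pipeline):

```python
# Pool a variable number of local descriptors into one fixed-length
# vector: soft-assign each descriptor to GMM components, then stack the
# normalised gradients of the log-likelihood w.r.t. the component means.
import numpy as np

def fisher_vector_mu(X, pi, mu, sigma):
    """X: (N, D) descriptors; pi: (K,) weights; mu, sigma: (K, D)."""
    N = X.shape[0]
    # log-densities up to a constant that cancels in the soft assignment
    log_p = (-0.5 * (((X[:, None, :] - mu) / sigma) ** 2).sum(-1)
             - np.log(sigma).sum(-1) + np.log(pi))
    gamma = np.exp(log_p - log_p.max(1, keepdims=True))
    gamma /= gamma.sum(1, keepdims=True)        # responsibilities (N, K)
    G = (gamma[:, :, None] * (X[:, None, :] - mu) / sigma).sum(0)
    return (G / (N * np.sqrt(pi)[:, None])).ravel()   # length K * D
```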


  • Jiri Matas
  • Max Planck House Lecture Hall

Computer vision problems often involve optimization of two quantities, one of which is time. Such problems can be formulated as time-constrained optimization or as a performance-constrained search for the fastest algorithm. We show that it is possible to obtain quasi-optimal time-constrained solutions to some vision problems by applying Wald's theory of sequential decision-making. Wald assumes independence of observations, which is rarely true in computer vision. We address this problem by combining Wald's sequential probability ratio test with AdaBoost. The solution, called WaldBoost, can be viewed as a principled way to build a close-to-optimal “cascade of classifiers” of the Viola-Jones type. The approach will be demonstrated on four tasks: (i) face detection, (ii) establishing reliable correspondences between images, (iii) real-time detection of interest points, and (iv) model search and outlier detection using RANSAC. In the face detection problem, the objective is to learn the fastest detector satisfying constraints on the false positive and false negative rates. Correspondence pruning addresses the problem of fast selection with a predefined false negative rate. For interest points, we show how a fast implementation of known detectors can be obtained with WaldBoost. The “mimicked” detectors provide a training set of positive and negative examples of interest points, and WaldBoost learns a detector, formed as a linear combination of efficiently computable features, that is (significantly) faster than the providers of the training set. In RANSAC, we show how to exploit Wald's test in a randomised model verification procedure to obtain an algorithm that is significantly faster than deterministic verification yet has equivalent probabilistic guarantees of correctness.
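
The evaluation loop of such a sequential classifier can be sketched as follows (a hedged illustration of SPRT-style early exits in a boosted cascade, not the authors' implementation; the per-stage thresholds would be learned from the target false positive and false negative rates):

```python
# WaldBoost-style evaluation: accumulate the boosted score one weak
# learner at a time and test it against accept/reject thresholds, so
# easy inputs (e.g. obvious background) exit after a few measurements.
def sequential_classify(x, weak_learners, theta_A, theta_B):
    """weak_learners: list of callables x -> real-valued vote;
    theta_A[t] / theta_B[t]: accept / reject thresholds after stage t."""
    score = 0.0
    for t, h in enumerate(weak_learners):
        score += h(x)
        if score >= theta_A[t]:
            return +1                   # early accept (e.g. "face")
        if score <= theta_B[t]:
            return -1                   # early reject (e.g. "background")
    return +1 if score >= 0.0 else -1   # forced decision at the last stage
```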

Organizers: Gerard Pons-Moll


Scalable Surface-Based Stereo Matching

Talk
  • 10 April 2014 • 14:00
  • Daniel Scharstein
  • MRC seminar room (0.A.03)

Stereo matching -- establishing correspondences between images taken from nearby viewpoints -- is one of the oldest problems in computer vision.  While impressive progress has been made over the last two decades, most current stereo methods do not scale to the high-resolution images taken by today's cameras since they require searching the full space of all possible disparity hypotheses over all pixels.

In this talk I will describe a new scalable stereo method that only evaluates a small portion of the search space. The method first generates plane hypotheses from matched sparse features, which are then refined into surface hypotheses using local slanted plane sweeps over a narrow disparity range. Finally, each pixel is assigned to one of the local surface hypotheses. The technique achieves significant speedups over previous algorithms and state-of-the-art accuracy on high-resolution stereo pairs of up to 19 megapixels.
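
The heart of the last step, scoring one local slanted-plane hypothesis, can be sketched as follows (an illustration under simplifying assumptions: a rectified grayscale pair and an absolute-difference cost, not the paper's implementation):

```python
# A slanted plane predicts disparity d(x, y) = a*x + b*y + c; its cost is
# how badly the left patch matches the right image warped by d.
import numpy as np

def plane_cost(left, right, a, b, c, pixels):
    """pixels: iterable of (y, x) coordinates in the left image."""
    cost = 0.0
    for y, x in pixels:
        d = a * x + b * y + c                    # disparity from the plane
        xr = int(round(x - d))                   # matching right column
        if 0 <= xr < right.shape[1]:
            cost += abs(float(left[y, x]) - float(right[y, xr]))
        else:
            cost += 255.0                        # penalise out-of-view pixels
    return cost
```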

I will also present a new dataset of high-resolution stereo pairs with subpixel-accurate ground truth, and provide a brief outlook on the upcoming new version of the Middlebury stereo benchmark.


  • Simo Särkkä
  • Max Planck House Lecture Hall

Gaussian process regression is a non-parametric Bayesian machine learning paradigm, where instead of estimating parameters of fixed-form functions, we model the whole unknown functions as Gaussian processes. Gaussian processes are also commonly used for representing uncertainties in models of dynamic systems in many applications such as tracking, navigation, and automatic control systems. The latter models are often formulated as state-space models, where the use of non-linear Kalman-filter-type methods is common. The aim of this talk is to discuss connections between Kalman filtering methods and Gaussian process regression. In particular, I discuss representations of Gaussian processes as state-space models, which enable the use of computationally efficient Kalman-filter-based (or more generally Bayesian-filter-based) solutions to Gaussian process regression problems. This also allows for computationally efficient inference in latent force models (LFMs), which are models combining first-principles mechanical models with non-parametric Gaussian process regression models.
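
As a concrete sketch of this connection, a GP with a Matérn-3/2 covariance can be rewritten as a two-dimensional linear SDE and filtered in O(n) time over ordered inputs (a minimal illustration; the interface and hyperparameter names are assumptions):

```python
# Kalman filtering a Matern-3/2 GP: the process noise over each gap is
# obtained from the stationary covariance, Q = Pinf - A Pinf A^T, and the
# filter touches each data point once instead of inverting an n x n kernel.
import numpy as np
from scipy.linalg import expm

def matern32_filter(t, y, ell=1.0, sigma2=1.0, noise=0.1):
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])   # SDE drift matrix
    Pinf = np.diag([sigma2, lam**2 * sigma2])           # stationary covariance
    H = np.array([[1.0, 0.0]])                          # observe f(t) only
    m, P, means = np.zeros(2), Pinf.copy(), []
    for k in range(len(t)):
        if k > 0:                                       # predict across the gap
            A = expm(F * (t[k] - t[k - 1]))
            m = A @ m
            P = A @ P @ A.T + Pinf - A @ Pinf @ A.T
        S = H @ P @ H.T + noise                         # innovation variance
        K = P @ H.T / S                                 # Kalman gain (2, 1)
        m = m + (K * (y[k] - H @ m)).ravel()
        P = P - K @ H @ P
        means.append(m[0])
    return np.array(means)                              # filtered posterior mean
```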

Organizers: Philipp Hennig


  • Rainer Dahlhaus
  • Max Planck House Lecture Hall

(Joint work with Jan C. Neddermeyer.) A technique for the online estimation of spot volatility for high-frequency data is developed. The algorithm works directly on the transaction data and updates the volatility estimate immediately after the occurrence of a new transaction. Furthermore, a nonlinear market microstructure noise model is proposed that reproduces several stylized facts of high-frequency data. A computationally efficient particle filter is used that allows for the approximation of the unknown efficient prices and, in combination with a recursive EM algorithm, for the estimation of the volatility curve. We neither assume that the transaction times are equidistant nor do we use interpolated prices. We also make a distinction between volatility per time unit and volatility per transaction and provide estimators for both. More precisely, we use a model with random time change where spot volatility is decomposed into spot volatility per transaction times the trading intensity - thus highlighting the influence of trading intensity on volatility.
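
For readers unfamiliar with the machinery, a generic bootstrap particle filter has the following shape (an illustration only; the filter in the talk is tailored to the microstructure-noise model and is not reproduced here):

```python
# Bootstrap particle filter: propagate particles through the transition
# model, reweight by the observation likelihood, estimate, and resample.
import numpy as np

def bootstrap_pf(y, n_particles, propagate, likelihood, init):
    """propagate(rng, x) -> x'; likelihood(obs, x) -> unnormalised weights;
    init(rng, n) -> initial particle cloud. All vectorised over particles."""
    rng = np.random.default_rng(0)
    x = init(rng, n_particles)
    estimates = []
    for obs in y:
        x = propagate(rng, x)                  # sample the transition prior
        w = likelihood(obs, x)                 # weight by the observation
        w /= w.sum()
        estimates.append((w * x).sum())        # posterior-mean estimate
        idx = rng.choice(n_particles, n_particles, p=w)
        x = x[idx]                             # multinomial resampling
    return np.array(estimates)
```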

Organizers: Michel Besserve


Simulation in physical scene understanding

IS Colloquium
  • 28 March 2014 • 11:15–12:45
  • Peter Battaglia
  • Max Planck House Lecture Hall

Our ability to understand a scene is central to how we interact with our environment and with each other. Classic research on visual scene perception has focused on how people "know what is where by looking", but this talk will explore people's ability to infer the "hows" and "whys" of their world, and in particular, how they form a physical understanding of a scene. From a glance we can know so much: not only what objects are where, but whether they are movable, fragile, slimy, or hot; whether they were made by hand, by machine, or by nature; whether they are broken and how they could be repaired; and so on. I posit that these common-sense physical intuitions are made possible by the brain's sophisticated capacity for constructing and manipulating a rich mental representation of a scene via a mechanism of approximate probabilistic simulation -- in short, a physics engine in the head. I will present a series of recent and ongoing studies that develop and test this computational model in a variety of prediction, inference, and planning tasks. Our model captures various aspects of people's experimental judgments, including the accuracy of their performance as well as several illusions and errors. These results help explain core aspects of human mental models that are instrumental to how we understand and act in our everyday world. They also open new directions for developing robotic and AI systems that can perceive, reason, and act the way people do.
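
A toy sketch of approximate probabilistic simulation (an illustration with a deliberately crude stand-in for the physics engine, not the speaker's model): judge whether a block tower will fall by simulating many noisy copies of the perceived scene and averaging the outcomes.

```python
# Perceptual uncertainty is modelled as noise on the blocks' horizontal
# positions; the "physics" is a crude test of whether the centre of mass
# of the blocks above each block stays over its support.
import numpy as np

def p_fall(block_xs, perceptual_noise=0.2, n_samples=1000, half_width=0.5):
    rng = np.random.default_rng(0)
    falls = 0
    for _ in range(n_samples):
        xs = block_xs + rng.normal(0.0, perceptual_noise, len(block_xs))
        for i in range(len(xs) - 1):
            if abs(xs[i + 1:].mean() - xs[i]) > half_width:
                falls += 1                     # this sampled tower topples
                break
    return falls / n_samples                   # graded "will it fall?" belief

print(p_fall(np.array([0.0, 0.3, 0.55])))      # probability the tower topples
```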

Organizers: Michel Besserve