Institute Talks

Our Recent Research on 3D Deep Learning

Talk
  • 07 August 2020 • 11:00—12:00
  • Vittorio Ferrari

I will present three recent projects within the 3D Deep Learning research line from my team at Google Research: (1) a deep network for reconstructing the 3D shape of multiple objects appearing in a single RGB image (ECCV'20); (2) a new conditioning scheme for normalizing flow models, which enables several applications such as reconstructing an object's 3D point cloud from an image, or the converse problem of rendering an image given a 3D point cloud, both within the same modeling framework (CVPR'20); and (3) a neural rendering framework that maps a voxelized object into a high-quality image. It renders highly textured objects and illumination effects such as reflections and shadows realistically, and it allows controllable rendering: geometric and appearance modifications in the input are accurately represented in the final rendering (CVPR'20).
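
As a rough illustration of the conditioning idea mentioned in (2), the sketch below (my own assumption of such a design, not the CVPR'20 code) shows a normalizing-flow coupling layer whose affine parameters are predicted from part of the input together with a conditioning vector, e.g. an image or point-cloud embedding. All layer sizes and names are illustrative.

```python
# Minimal sketch of a conditional affine coupling layer for a normalizing flow.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        # predicts per-dimension scale and shift from (x_a, condition)
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, cond):
        xa, xb = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([xa, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales well-behaved
        zb = xb * torch.exp(s) + t             # affine transform of x_b
        log_det = s.sum(dim=1)                 # log|det J| for the flow objective
        return torch.cat([xa, zb], dim=1), log_det
```

Because the conditioning vector enters only through the parameter-prediction network, the same flow can be conditioned on an image (to produce a point cloud) or on a point cloud (to produce an image), which is the appeal of a single modeling framework.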

Organizers: Yinghao Huang, Arjun Chandrasekaran


Functions, Machine Learning, and Game Development

Talk
  • 10 August 2020 • 16:00—17:00
  • Daniel Holden
  • Remote talk on Zoom

Game development requires a vast array of tools, techniques, and expertise, ranging from game design and artistic content creation to data management and low-level engine programming. Yet all of these domains share one kind of task: the transformation of one kind of data into another. Meanwhile, advances in Machine Learning have resulted in a fundamental change in how we think about such data transformations, allowing for accurate and scalable function approximation, and the ability to train such approximations on virtually unlimited amounts of data. In this talk I will present how these two fundamental changes in Computer Science affect game development, how they can be used to improve both game technology and the way games are built, and the exciting new possibilities and challenges they bring along the way.
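
To make the "data transformation as function approximation" point concrete, here is a minimal sketch with entirely hypothetical data and sizes: a small network is fit to map one kind of game data (e.g. trajectory or control features) to another (e.g. joint rotations authored by artists).

```python
# Minimal sketch: approximate a game-data transformation with a small network.
import torch
import torch.nn as nn

# hypothetical training pairs: input features -> target pose parameters
x = torch.randn(10000, 32)          # e.g. trajectory / control features
y = torch.randn(10000, 64)          # e.g. joint rotations produced by artists

model = nn.Sequential(nn.Linear(32, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 64))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    idx = torch.randint(0, x.shape[0], (256,))   # random mini-batch
    loss = nn.functional.mse_loss(model(x[idx]), y[idx])
    opt.zero_grad(); loss.backward(); opt.step()
```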

Organizers: Abhinanda Ranjit Punnakkal

  • Yitian Shao
  • Remote talk on Zoom

A longstanding goal of engineering has been to realize haptic interfaces that can convey realistic sensations of touch, comparable to signals presented via visual or audio displays. Today, this ideal remains far from realization, due to the difficulty of characterizing and electronically reproducing the complex and dynamic tactile signals that are produced during even the simplest touch interactions. In this talk, I will present my work on capturing whole-hand tactile signals, in the form of mechanical waves, produced during natural hand interactions. I will describe how I characterized the information content in these signals and used the results to guide the design of new electronic devices for distributed tactile feedback.
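
One plausible way to characterize such whole-hand vibration signals, sketched below under assumed sampling rates and channel counts (not the speaker's actual pipeline), is to estimate the power spectrum of each sensor channel and compare how vibrotactile energy is distributed across the hand.

```python
# Minimal sketch: per-channel spectral analysis of whole-hand vibration signals.
import numpy as np
from scipy import signal

fs = 4000                                  # assumed sampling rate in Hz
recording = np.random.randn(30, fs * 2)    # placeholder: 30 channels, 2 seconds

freqs, psd = signal.welch(recording, fs=fs, nperseg=1024, axis=-1)
band = (freqs >= 50) & (freqs <= 800)      # assumed tactile-relevant band
energy_per_channel = psd[:, band].sum(axis=-1)
print(energy_per_channel.argsort()[::-1][:5])  # channels carrying the most energy
```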

Organizers: Katherine J. Kuchenbecker


Learning from vision, touch and audition

Talk
  • 28 July 2020 • 15:00—16:30
  • Antonio Torralba
  • Remote talk on Zoom

Babies learn with very little supervision, and, even when supervision is present, it comes in the form of an unknown spoken language that also needs to be learned. How can kids make sense of the world? In this work, I will show that an agent that has access to multimodal data (like vision, audition or touch) can use the correlation between images and sounds to discover objects in the world without supervision. I will show that ambient sounds can be used as a supervisory signal for learning to see and vice versa (the sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings). I will describe an approach that learns, by watching videos without annotations, to locate image regions that produce sounds, and to separate the input sounds into a set of components that represents the sound from each pixel. I will also discuss our recent work on capturing tactile information.
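
A minimal sketch of the audio-visual correspondence idea follows, using placeholder encoders and sizes rather than the models from the talk: frame and audio embeddings are trained to agree when they come from the same video, so the supervision comes for free from the co-occurrence of sight and sound.

```python
# Minimal sketch: contrastive audio-visual correspondence without labels.
import torch
import torch.nn as nn

class AVCorrespondence(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # toy encoders; real models would be convolutional image/audio networks
        self.image_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))
        self.audio_net = nn.Sequential(nn.Flatten(), nn.Linear(1 * 128 * 64, dim))

    def forward(self, frames, spectrograms):
        v = nn.functional.normalize(self.image_net(frames), dim=1)
        a = nn.functional.normalize(self.audio_net(spectrograms), dim=1)
        return v @ a.t()                      # similarity of every frame/audio pair

model = AVCorrespondence()
logits = model(torch.randn(8, 3, 64, 64), torch.randn(8, 1, 128, 64))
# matching pairs lie on the diagonal, so the "labels" cost nothing to obtain
loss = nn.functional.cross_entropy(logits, torch.arange(8))
```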

Organizers: Arjun Chandrasekaran


  • Artsiom Sanakoyeu
  • Remote talk on Zoom

Learning the embedding space, where semantically similar objects are located close together and dissimilar objects far apart, is a cornerstone of many computer vision applications. Existing approaches usually learn a single metric in the embedding space for all available data points, which may have a very complex, non-uniform distribution with different notions of similarity between objects, e.g. appearance, shape, color, or semantic meaning. We approach this problem by jointly splitting the embedding space and the data into K smaller sub-problems, thereby using the embedding space more efficiently. Our method divides both the data and the embedding space into K subsets and learns K separate distance metrics in the non-overlapping subspaces of the embedding space, defined by groups of neurons in the embedding layer of the neural network. In the second part of the talk, we show that, at least for proximal animal classes such as chimpanzees, it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes. We do this by (1) establishing a DensePose model for the new animal that is also geometrically aligned to humans, (2) introducing a multi-head R-CNN architecture that facilitates transfer of multiple recognition tasks between classes, (3) finding which combination of known classes can be transferred most effectively to the new animal, and (4) using self-calibrated uncertainty heads to generate pseudo-labels graded by quality for training a model for this class.
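
The divide-and-conquer idea from the first part can be sketched as follows, with assumed sizes and a placeholder backbone rather than the authors' released code: the embedding layer is split into K groups of neurons, and each group is trained with its own metric-learning loss on its own subset of the data.

```python
# Minimal sketch: K distance metrics in non-overlapping slices of one embedding.
import torch
import torch.nn as nn

K, d = 4, 128
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, d))  # placeholder
triplet = nn.TripletMarginLoss(margin=0.2)

def subembedding(x, k):
    """Slice the k-th group of neurons out of the full embedding."""
    emb = backbone(x)
    step = d // K
    return nn.functional.normalize(emb[:, k * step:(k + 1) * step], dim=1)

def loss_for_subproblem(anchor, positive, negative, k):
    # anchor/positive/negative are assumed to be sampled from the k-th data subset
    return triplet(subembedding(anchor, k),
                   subembedding(positive, k),
                   subembedding(negative, k))

loss = loss_for_subproblem(torch.randn(8, 3, 224, 224),
                           torch.randn(8, 3, 224, 224),
                           torch.randn(8, 3, 224, 224), k=0)
```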

Organizers: Nikos Athanasiou


  • Dr. Dieter Nees
  • https://global.gotomeeting.com/join/588789885

Roll-to-roll UV nanoimprint lithography (R2R-UV-NIL) is gaining increasing industrial interest for large-area nano- and micro-structuring of flexible substrates because it combines nanometer resolution with a productivity of many square meters per minute. Small-area masters of functional nano- and micro-surface structures are readily available from various lithographic techniques, such as UV, e-beam, or interference lithography. However, upscaling small-area nano- and micro-structured masters into medium-size roller molds, often called shims, for R2R-UV-NIL production remains a bottleneck in the large-area nano-structuring process chain. At JR MATERIALS we have installed a customized EVG 770 UV-NIL stepper and are developing step-and-repeat UV-NIL processes and materials for the seamless upscaling of small-area masters into polymer shims for our R2R-UV-NIL pilot line, with dimensions of up to 270 x 630 mm². These polymer shims can be used either directly for short to medium R2R-UV-NIL manufacturing runs or be galvano-formed into nickel shims for truly long production runs. In this seminar, the JR MATERIALS UV-NIL tools, processes, and materials, as well as a few applications, will be presented.


Towards Commodity 3D Scanning for Content Creation

Talk
  • 16 July 2020 • 16:00—17:30
  • Angela Dai

In recent years, commodity 3D sensors have become widely available, spawning significant interest in both offline and real-time 3D reconstruction. While state-of-the-art reconstruction results from commodity RGB-D sensors are visually appealing, they are far from usable in practical computer graphics applications since they do not match the high quality of artist-modeled 3D graphics content. One of the biggest challenges in this context is that obtained 3D scans suffer from occlusions, thus resulting in incomplete 3D models. In this talk, I will present a data-driven approach towards generating high quality 3D models from commodity scan data, and the use of these geometrically complete 3D models towards semantic and texture understanding of real-world environments.
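
A minimal sketch of the scan-completion idea, with assumed volume resolutions and a toy 3D encoder-decoder standing in for the actual architecture: a partial volumetric scan goes in, a completed volume comes out, and the network is trained against complete target geometry.

```python
# Minimal sketch: completing a partial volumetric scan with a 3D encoder-decoder.
import torch
import torch.nn as nn

completion = nn.Sequential(
    nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64^3 -> 32^3
    nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32^3 -> 16^3
    nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),     # back to 64^3
)

partial_scan = torch.randn(1, 1, 64, 64, 64)   # placeholder incomplete volume
complete_pred = completion(partial_scan)
# trained against complete target geometry, e.g. with an L1 loss:
# loss = (complete_pred - target_volume).abs().mean()
```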

Organizers: Yinghao Huang


  • William T. Freeman

How can we tell that a video is playing backwards? People's motions look wrong when a video is played backwards; can we develop an algorithm to distinguish forward from backward video? Similarly, can we tell if a video is sped up? We have developed algorithms to distinguish forward from backward video, and fast from slow. Training algorithms for these tasks provides a self-supervised task that facilitates human activity recognition. We'll show these results, and applications of these unsupervised video learning tasks, including a method to change the timing of people in videos.
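
The arrow-of-time task can be sketched roughly as below, with a placeholder backbone standing in for the video networks from the talk: clips are randomly reversed along the time axis and a classifier predicts the playback direction, so the labels are obtained for free.

```python
# Minimal sketch: self-supervised forward-vs-backward video classification.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 64 * 64, 256),
                         nn.ReLU(), nn.Linear(256, 2))  # forward vs backward

clips = torch.randn(8, 3, 16, 64, 64)          # batch of 16-frame clips (B,C,T,H,W)
labels = torch.randint(0, 2, (8,))             # 1 = reverse this clip
flipped = clips.clone()
flipped[labels == 1] = torch.flip(flipped[labels == 1], dims=[2])  # flip time axis

loss = nn.functional.cross_entropy(backbone(flipped), labels)
```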

Organizers: Yinghao Huang


Learning Non-rigid Optimization

Talk
  • 10 July 2020 • 15:00—16:00
  • Matthias Nießner
  • Remote talk on Zoom

Applying data-driven approaches to non-rigid 3D reconstruction has been difficult, which we believe can be attributed to the lack of a large-scale training corpus. One recent approach proposes self-supervision based on non-rigid reconstruction. Unfortunately, this method fails for important cases such as highly non-rigid deformations. We first address this lack of data by introducing a novel semi-supervised strategy to obtain dense inter-frame correspondences from a sparse set of annotations. This way, we obtain a large dataset of 400 scenes, over 390,000 RGB-D frames, and 2,537 densely aligned frame pairs; in addition, we provide a test set along with several metrics for evaluation. Based on this corpus, we introduce a data-driven non-rigid feature-matching approach, which we integrate into an optimization-based reconstruction pipeline. Here, we propose a new neural network that operates on RGB-D frames, while maintaining robustness under large non-rigid deformations and producing accurate predictions. Our approach significantly outperforms both existing non-rigid reconstruction methods that do not use learned data terms and learning-based approaches that only use self-supervision.
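
A minimal sketch of what a learned data term for non-rigid matching could look like, under assumed patch sizes and a toy encoder (not the proposed network): local RGB-D patches from two frames are embedded, and candidate correspondences are scored by descriptor similarity.

```python
# Minimal sketch: scoring non-rigid correspondences with learned RGB-D descriptors.
import torch
import torch.nn as nn

patch_encoder = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 4 channels: RGB + depth
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 32),
)

def match_score(src_patch, tgt_patch):
    """Higher score = more likely to be the same surface point."""
    f_src = nn.functional.normalize(patch_encoder(src_patch), dim=1)
    f_tgt = nn.functional.normalize(patch_encoder(tgt_patch), dim=1)
    return (f_src * f_tgt).sum(dim=1)

scores = match_score(torch.randn(16, 4, 31, 31), torch.randn(16, 4, 31, 31))
# scores like these can weight correspondence residuals inside an
# optimization-based non-rigid reconstruction pipeline
```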

Organizers: Vassilis Choutas


  • Dushyant Mehta

In our recent work, XNect, we propose a real-time solution for the challenging task of multi-person 3D human pose estimation from a single RGB camera. To achieve real-time performance without compromising accuracy, our approach relies on a new, efficient Convolutional Neural Network architecture and a multi-staged pose formulation. The CNN architecture is approx. 1.3x faster than ResNet-50 while achieving the same accuracy on various tasks, and the benefits extend beyond inference speed to a much smaller training memory footprint and a much higher training throughput. The proposed pose formulation jointly reasons about all the subjects in the scene, ensuring that pose inference can be done in real time even with a large number of subjects. The key insight behind the accuracy of the formulation is to split the reasoning about human pose into two distinct stages. The first stage, which is fully convolutional, infers the 2D and 3D pose of body parts supported by image evidence, and reasons jointly about all subjects. The second stage, which is a small fully connected network, operates on each individual subject and uses the context of the visible body parts and learned pose priors to infer the 3D pose of the missing body parts. A third stage on top reconciles the 2D and 3D poses per frame and across time, to produce a temporally stable kinematic skeleton. In this talk, we will briefly discuss the proposed Convolutional Neural Network architecture and the possible benefits it might bring to your workflow. The other part of the talk will cover how the pose formulation proposed in this work came to be, what its advantages are, and how it can be extended to other related problems.
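
The second-stage idea can be sketched as follows, with made-up joint counts and layer sizes rather than the actual XNect networks: a small fully connected model takes the per-joint 2D/3D evidence for one subject, including which parts are visible, and infers a complete 3D body pose.

```python
# Minimal sketch: per-subject fully connected stage that completes the 3D pose.
import torch
import torch.nn as nn

num_joints = 21
# assumed per-joint evidence: 2D position (2) + 3D estimate (3) + visibility flag (1)
stage2 = nn.Sequential(
    nn.Linear(num_joints * 6, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, num_joints * 3),          # full-body 3D pose, missing parts included
)

evidence = torch.randn(4, num_joints * 6)    # placeholder stage-1 output for 4 subjects
full_pose = stage2(evidence).view(4, num_joints, 3)
```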

Organizers: Yinghao Huang


Machine Learning for Covid-19 Risk Awareness from Contact Tracing

Max Planck Lecture
  • 23 June 2020 • 17:30
  • Yoshua Bengio
  • Virtual Event

The Covid-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries, resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity without triggering a second outbreak. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restriction, and public health. Many approaches model infection and encounters as binary events. With such approaches, called binary contact tracing, once a case is confirmed by a positive lab test result, it is propagated to people who were contacts of the infected person, typically recommending that these individuals should self-quarantine. This approach ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier self-quarantine. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate risk predictions. Methods which may use such information have been proposed, but these typically require access to the graph of social interactions and/or centralization of sensitive personal data, which is incompatible with reasonable privacy and security constraints. We use an agent-based epidemiological simulation to develop and test ML methods that can be deployed to a smartphone to locally predict an individual's risk of infection from their contact history and other information, while respecting strong privacy and security constraints. We use this risk score to provide personalized recommendations to the user via an app, an approach we call probabilistic risk awareness (PRA). We show that PRA can significantly reduce spread of the disease compared to other methods, for equivalent average mobility and realistic assumptions about app adoption, and thereby save lives.
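
As a purely illustrative sketch of probabilistic risk awareness (not the deployed system), a small on-device model could map features of the contact history, symptoms, and test results to a discretized risk level; all feature names and sizes below are assumptions.

```python
# Minimal sketch: local, on-device risk scoring from contact-history features.
import torch
import torch.nn as nn

# hypothetical feature vector per user/day: encounter counts and durations by
# received risk level, symptom flags, pre-existing-condition flags, test results
feature_dim = 40
risk_model = nn.Sequential(
    nn.Linear(feature_dim, 64), nn.ReLU(),
    nn.Linear(64, 16),                  # 16 discretized risk levels (assumption)
)

features = torch.randn(1, feature_dim)
risk_level = risk_model(features).softmax(dim=1).argmax(dim=1)
# the app turns this level into personalized recommendations
# (e.g. get tested, reduce mobility), keeping raw data on the device
```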

Organizers: Michael Black, Bernhard Schölkopf, Julia Braun, Oliwia Gust


The sound of fermions

Physics Colloquium
  • 16 June 2020 • 16:15—18:15
  • Martin Zwierlein
  • WebEx (https://mpi-is.webex.com/mpi-is/onstage/g.php?MTID=e2189612fea810cac733067ed5b121127)

Fermions, particles with half-integer spin like the electron, proton, and neutron, obey the Pauli principle: they cannot share one and the same quantum state. This "antisocial" behavior is directly observed in experiments with ultracold gases of fermionic atoms: Pauli blocking in momentum space for a free Fermi gas, and in real space in gases confined to an optical lattice. When fermions interact, new, rather "social" behavior emerges, such as hydrodynamic flow, superfluidity, and magnetism. The interplay of Pauli's principle and strong interactions poses great difficulties to our understanding of complex Fermi systems, from nuclei to high-temperature superconducting materials and neutron stars. I will describe experiments on atomic Fermi gases where interactions become as strong as allowed by quantum mechanics: the unitary Fermi gas, fermions immersed in a Bose gas, and the Fermi-Hubbard lattice gas. Sound and heat transport distinguish collisionally hydrodynamic from superfluid flow, while spin transport reveals the underlying mechanism responsible for quantum magnetism.