In recent years, commodity 3D sensors have become widely available, spawning significant interest in both offline and real-time 3D reconstruction. While state-of-the-art reconstruction results from commodity RGB-D sensors are visually appealing, they are far from usable in practical computer graphics applications since they do not match the high quality of artist-modeled 3D graphics content. One of the biggest challenges in this context is that the resulting 3D scans suffer from occlusions, leading to incomplete 3D models. In this talk, I will present a data-driven approach to generating high-quality 3D models from commodity scan data, and the use of these geometrically complete 3D models for semantic and texture understanding of real-world environments.
Organizers: Yinghao Huang
How can we tell that a video is playing backwards? People's motions look wrong when a video is played in reverse. Can we develop an algorithm to distinguish forward from backward video? Similarly, can we tell if a video has been sped up? We have developed algorithms to distinguish forward from backward video, and fast from slow. Training algorithms for these tasks provides a self-supervised objective that facilitates human activity recognition. We'll show these results, and applications of these unsupervised video learning tasks, including a method to change the timing of people in videos.
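The self-supervision described above needs no human labels: the temporal direction (or speed) of a clip is itself the training signal. A minimal sketch of how such training pairs could be constructed is below; the function names and the representation of a clip as a frame list are illustrative assumptions, not the authors' implementation.

```python
import random

def make_arrow_of_time_pairs(clip, seed=0):
    """Generate self-supervised (frames, label) pairs from one unlabeled clip:
    label 1 for the original (forward) ordering, label 0 for the
    time-reversed copy. The temporal direction is the supervision signal."""
    forward = (list(clip), 1)
    backward = (list(reversed(clip)), 0)
    pairs = [forward, backward]
    random.Random(seed).shuffle(pairs)  # avoid a fixed label order
    return pairs

def make_speed_pairs(clip, speedup=2):
    """Analogous pairs for the fast-vs-slow task: label 1 for the original
    frame rate, label 0 for a clip subsampled to look sped-up."""
    return [(list(clip), 1), (list(clip)[::speedup], 0)]
```

A classifier trained on such pairs must learn motion cues (gravity, causality of human actions) to succeed, which is why the learned features transfer to activity recognition.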
Organizers: Yinghao Huang
Applying data-driven approaches to non-rigid 3D reconstruction has been difficult, which we believe can be attributed to the lack of a large-scale training corpus. One recent approach proposes self-supervision based on non-rigid reconstruction; unfortunately, this method fails for important cases such as highly non-rigid deformations. We first address this lack of data by introducing a novel semi-supervised strategy to obtain dense interframe correspondences from a sparse set of annotations. This way, we obtain a large dataset of 400 scenes, over 390,000 RGB-D frames, and 2,537 densely aligned frame pairs; in addition, we provide a test set along with several metrics for evaluation. Based on this corpus, we introduce a data-driven non-rigid feature matching approach, which we integrate into an optimization-based reconstruction pipeline. Here, we propose a new neural network that operates on RGB-D frames, while maintaining robustness under large non-rigid deformations and producing accurate predictions. Our approach significantly outperforms both existing non-rigid reconstruction methods that do not use learned data terms, as well as learning-based approaches that only use self-supervision.
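The core idea of turning sparse annotations into dense interframe correspondences can be illustrated with a toy densification scheme: every pixel adopts the 2D offset of its nearest annotated source point. This is only a sketch of the sparse-to-dense idea; the paper's actual semi-supervised strategy involves an optimization, and the function below is a hypothetical illustration.

```python
import numpy as np

def densify_correspondences(sparse_src, sparse_dst, grid_shape):
    """Toy sparse-to-dense propagation: each pixel takes the flow of its
    nearest annotated point. A real pipeline would regularize this
    (e.g. with a deformation-smoothness term).
    sparse_src, sparse_dst: (N, 2) arrays of matched (y, x) coordinates.
    Returns an (H, W, 2) dense flow field."""
    h, w = grid_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([ys, xs], axis=-1).reshape(-1, 2).astype(float)
    # distance from every pixel to every annotated source point
    d = np.linalg.norm(pix[:, None, :] - sparse_src[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    offsets = sparse_dst - sparse_src  # per-annotation flow vectors
    return offsets[nearest].reshape(h, w, 2)
```

Even this crude nearest-neighbor propagation shows why a handful of annotations per frame pair can supervise a dense matching network: the dense field inherits its signal entirely from the sparse set.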
Organizers: Vassilis Choutas
In our recent work, XNect, we propose a real-time solution for the challenging task of multi-person 3D human pose estimation from a single RGB camera. To achieve real-time performance without compromising on accuracy, our approach relies on a new efficient Convolutional Neural Network architecture and a multi-staged pose formulation. The CNN architecture is approx. 1.3x faster than ResNet-50, while achieving the same accuracy on various tasks, and the benefits extend beyond inference speed to a much smaller training memory footprint and a much higher training throughput. The proposed pose formulation jointly reasons about all the subjects in the scene, ensuring that pose inference can be done in real time even with a large number of subjects. The key insight behind the accuracy of the formulation is to split the reasoning about human pose into two distinct stages. The first stage, which is fully convolutional, infers the 2D and 3D pose of body parts supported by image evidence, and reasons jointly about all subjects. The second stage, which is a small fully connected network, operates on each individual subject, and uses the context of the visible body parts and learned pose priors to infer the 3D pose of the missing body parts. A third stage on top reconciles the 2D and 3D poses per frame and across time, to produce a temporally stable kinematic skeleton. In this talk, we will briefly discuss the proposed Convolutional Neural Network architecture and the possible benefits it might bring to your workflow. The other part of the talk will cover how the pose formulation proposed in this work came to be, what its advantages are, and how it can be extended to other related problems.
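The division of labor between the stages can be sketched as follows: stage one supplies 3D estimates only for image-supported (visible) joints, and stage two completes the occluded ones from a learned prior. In this hypothetical sketch the "prior" is just a mean pose; the actual second stage is a small fully connected network, and the joint count and function names are illustrative.

```python
import numpy as np

def stage2_complete_pose(visible_3d, visible_mask, pose_prior_mean):
    """Per-subject pose completion in the spirit of XNect's second stage.
    visible_3d:      (J, 3) joint positions; entries where the mask is
                     False are unreliable (not supported by image evidence).
    visible_mask:    (J,) boolean, True where stage one saw the joint.
    pose_prior_mean: (J, 3) learned prior pose (here: a mean pose stand-in).
    Returns a full (J, 3) pose: visible joints kept, occluded joints
    filled from the prior."""
    completed = pose_prior_mean.copy()
    completed[visible_mask] = visible_3d[visible_mask]
    return completed
```

The design point this illustrates is why stage two can be tiny: it only has to resolve the low-dimensional residual between observed evidence and a pose prior, while the expensive joint reasoning over all subjects stays in the single fully convolutional first stage.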
Organizers: Yinghao Huang
The Covid-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries, resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity without triggering a second outbreak. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restriction, and public health.
Fermions, particles with half-integer spin like the electron, proton and neutron, obey the Pauli principle: they cannot share one and the same quantum state. This “antisocial” behavior is directly observed in experiments with ultracold gases of fermionic atoms: Pauli blocking in momentum space for a free Fermi gas, and in real space in gases confined to an optical lattice. When fermions interact, new, rather “social” behavior emerges, such as hydrodynamic flow, superfluidity and magnetism. The interplay of Pauli’s principle and strong interactions poses great difficulties to our understanding of complex Fermi systems, from nuclei to high-temperature superconducting materials and neutron stars. I will describe experiments on atomic Fermi gases where interactions become as strong as allowed by quantum mechanics – the unitary Fermi gas, fermions immersed in a Bose gas, and the Fermi-Hubbard lattice gas. Sound and heat transport distinguish collisionally hydrodynamic from superfluid flow, while spin transport reveals the underlying mechanism responsible for quantum magnetism.
In this visual feast, Scott recounts results and revelations from four years of experimentation using machine learning as a ‘creative collaborator’ in his artistic process. He makes the case that AI, rather than rendering artists obsolete, will empower us and expand our creative horizons. Scott shares an eclectic range of successes and failures encountered in his efforts to create powerful, but artistically controllable, neural networks to use as tools to represent and abstract the human figure. He also gives a behind-the-scenes look at creating the work for his recent Artist+AI exhibition in London.
Organizers: Ahmed Osman
In this talk, I will introduce the notion of 'canonicalization' and how it can be used to solve 3D computer vision tasks. I will describe Normalized Object Coordinate Space (NOCS), a 3D canonical container that we have developed for 3D estimation, aggregation, and synthesis tasks. I will demonstrate how NOCS allows us to address previously difficult tasks like category-level 6DoF object pose estimation, and correspondence-free multiview 3D shape aggregation. Finally, I will discuss future directions including opportunities to extend NOCS for tasks like articulated and non-rigid shape and pose estimation.
Organizers: Timo Bolkart
Motivated by the current COVID-19 outbreak, we introduce a novel epidemic model based on marked temporal point processes that is specifically designed to make fine-grained spatiotemporal predictions about the course of the disease in a population. Our model can make use of and benefit from data gathered by a variety of contact tracing technologies, and it can quantify the effects that different testing and tracing strategies, social distancing measures, and business restrictions may have on the course of the disease. Building on our model, we use Bayesian optimization to estimate the risk of exposure of each individual at the sites they visit from historical longitudinal testing data. Experiments using real COVID-19 data and mobility patterns from several towns and regions in Germany and Switzerland demonstrate that our model can be used to quantify the effects of tracing, testing, and containment strategies at an unprecedented spatiotemporal resolution. To facilitate research and informed policy-making, particularly in the context of the current COVID-19 outbreak, we are releasing an open-source implementation of our framework at https://github.com/covid19-model.
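In a temporal point-process epidemic model of this kind, each individual's exposure risk at a site is governed by a conditional intensity that rises with recent visits by infectious individuals and decays thereafter. The sketch below shows a generic Hawkes-style exposure intensity and Ogata-style thinning to sample an exposure time; all function names and parameter values are illustrative assumptions, not the paper's fitted model.

```python
import math

def exposure_intensity(t, contact_times, mu=0.0, alpha=0.5, beta=1.0):
    """Conditional intensity of a Hawkes-style exposure process: each past
    visit by an infectious individual at time t_i contributes an
    exponentially decaying term alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in contact_times if ti <= t)

def sample_next_exposure(t0, contact_times, horizon, rng, **params):
    """Ogata-style thinning, valid when t0 is at or after the last recorded
    contact: the intensity is then non-increasing, so its current value
    upper-bounds all future values."""
    t = t0
    while t < horizon:
        lam_bar = exposure_intensity(t, contact_times, **params)
        if lam_bar <= 0:
            return None  # intensity has decayed away; no exposure occurs
        t += rng.expovariate(lam_bar)  # candidate from the dominating rate
        if t < horizon and rng.random() * lam_bar <= exposure_intensity(
                t, contact_times, **params):
            return t  # candidate accepted with probability lambda(t)/lam_bar
    return None
```

Because the intensity is an explicit function of individual visit times, the same machinery lets one replay counterfactual scenarios (fewer visits, faster testing) and compare the resulting exposure rates, which is the kind of what-if analysis the abstract describes.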
Organizers: Bernhard Schölkopf
This talk is devoted to modern methods for attosecond and femtosecond laser spectro-microscopy, with a special focus on applications that require extreme spatial resolution. In the first part, I discuss how high-harmonic generation by high-energy, high-power light transients holds promise to deliver the required photon flux and photon energy for attosecond pump-probe spectroscopy at high spatiotemporal resolution, in order to capture electron dynamics in matter. I demonstrate the first prototype high-energy field synthesizer based on Yb:YAG thin-disk laser technology for generating high-energy light transients. In the second part of my talk, I show how the complex electric field of light at PHz frequencies can be resolved by means of electro-optic sampling in ambient air, and discuss the potential of the technique in molecular spectroscopy and high-resolution, label-free imaging. 1. A. Alismail et al., "Multi-octave, CEP-stable source for high-energy field synthesis," Science Advances 6, eaax3408 (2020). 2. H. Wang et al., "High Energy, Sub-Cycle, Field Synthesizers," IEEE Journal of Selected Topics in Quantum Electronics (2019). 3. A. Sommer et al., "Attosecond nonlinear polarization and energy transfer in dielectrics," Nature 534, 86 (2016). 4. H. Fattahi, "Sub-cycle light transients for attosecond, X-ray, four-dimensional imaging," Contemporary Physics 57, 1 (2016). 5. H. Fattahi et al., "Third-generation femtosecond technology," Optica 1, 45 (2014).