Wearable haptic devices have seen growing interest in recent years, but providing realistic tactile feedback remains an open challenge. Daily interactions with physical objects elicit complex sensations at the fingertips. Furthermore, human fingertips vary widely in their physical dimensions and perceptual abilities, which further complicates the task of simulating haptic interactions convincingly. However, as the applications of wearable haptic feedback grow, concerns about wearability and generalizability often lead tactile device designers to simplify the rendering of haptic sensations. As a result, wearable devices tend to be optimized for particular uses and for the average user, rendering only the most salient dimensions of tactile feedback for a given task and assuming that all users interpret the feedback in a similar way. We propose that providing more realistic haptic feedback will require in-depth examination of higher-dimensional tactile cues and personalization of these cues for individual users. In this thesis, we aim to provide hardware- and software-based solutions for rendering more expressive and personalized tactile cues to the fingertip.
Organizers: Katherine J. Kuchenbecker
One of the most striking characteristics of human behavior, in contrast to that of all other animals, is its extraordinary variability across populations. Human cultural diversity is a biological oddity. More specifically, we propose that what makes humans unique is the nature of the individual ontogenetic process, which results in this unparalleled cultural diversity. Hence, our central question is: how is human ontogeny adapted to cultural diversity, and how does it contribute to it? This question is critical because cultural diversity is not only our predominant mode of adaptation to local ecologies but also key to the construction of our cognitive architecture. The colors we see, the tones we hear, the memories we form and the norms we adhere to are all the consequence of an interaction between our emerging cognitive system and our lived experiences. While psychologists make careers out of measuring cognitive systems, we are terrible at measuring experience: the standard methods all face insurmountable limitations. In our department, we hope to apply Machine Learning, Deep Learning and Computer Vision to automatically extract developmentally important indicators of humans' daily experience. Just as modern sequencing technologies allow us to study the human genotype at scale, applying AI methods to reliably quantify humans' lived experience would allow us to study the human behavioral phenotype at scale, and would fundamentally alter the science of human behavior and its applications in education, mental health and medicine: the phenotyping revolution.
Organizers: Timo Bolkart
Imagine a futuristic version of Google Street View that could dial up any possible place in the world, at any possible time. Effectively, such a service would be a recording of the plenoptic function, the hypothetical function described by Adelson and Bergen that captures all light rays passing through space at all times. While capturing the plenoptic function in its entirety is completely impractical, every photo ever taken represents a sample of it. I will present recent methods we have developed to reconstruct the plenoptic function from sparse space-time samples of photos, including Street View itself as well as tourist photos of famous landmarks. The results of this work include the ability to take a single photo and synthesize a full dawn-to-dusk timelapse video, as well as compelling 4D view synthesis, in which a scene can be explored in space and time simultaneously.
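For reference, the plenoptic function in Adelson and Bergen's formulation assigns a light intensity to every viewing position (V_x, V_y, V_z), viewing direction (θ, φ), wavelength λ and time t. The second line below is our own illustrative notation, not taken from the talk: a photo taken at position V_0 and time t_0 observes only a 2D slice of P over the directions in its field of view, with each color channel c integrating over wavelength through an assumed spectral sensitivity s_c (exposure and tone mapping are ignored).

```latex
P = P(\theta, \phi, \lambda, t, V_x, V_y, V_z)

I_c(\theta, \phi) \approx \int s_c(\lambda)\, P(\theta, \phi, \lambda, t_0, V_{x,0}, V_{y,0}, V_{z,0})\, d\lambda
```

Reconstructing the plenoptic function from photos thus amounts to recovering the full function from many such low-dimensional slices taken at scattered positions and times.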
Game development requires a vast array of tools, techniques and expertise, ranging from game design and artistic content creation to data management and low-level engine programming. Yet all of these domains have one kind of task in common: the transformation of one kind of data into another. Meanwhile, advances in Machine Learning have fundamentally changed how we think about such data transformations, enabling accurate and scalable function approximation and the training of such approximations on virtually unlimited amounts of data. In this talk I will present how these two fundamental changes in Computer Science affect game development, how they can be used to improve both game technology and the way games are built, and the exciting new possibilities and challenges they bring along the way.
Organizers: Abhinanda Ranjit Punnakkal
I will present three recent projects from the 3D Deep Learning research line of my team at Google Research: (1) a deep network for reconstructing the 3D shapes of multiple objects appearing in a single RGB image (ECCV'20); (2) a new conditioning scheme for normalizing flow models, which enables several applications, such as reconstructing an object's 3D point cloud from an image or the converse problem of rendering an image from a 3D point cloud, within the same modeling framework (CVPR'20); and (3) a neural rendering framework that maps a voxelized object to a high-quality image. It realistically renders highly textured objects and illumination effects such as reflections and shadows, and it supports controllable rendering: geometric and appearance modifications in the input are accurately reflected in the final rendering (CVPR'20).
A longstanding goal of engineering has been to realize haptic interfaces that can convey realistic sensations of touch, comparable to signals presented via visual or audio displays. Today, this ideal remains far from realization, due to the difficulty of characterizing and electronically reproducing the complex and dynamic tactile signals that are produced during even the simplest touch interactions. In this talk, I will present my work on capturing whole-hand tactile signals, in the form of mechanical waves, produced during natural hand interactions. I will describe how I characterized the information content in these signals and used the results to guide the design of new electronic devices for distributed tactile feedback.
Organizers: Katherine J. Kuchenbecker
Babies learn with very little supervision, and even when supervision is present, it comes in the form of an unknown spoken language that also needs to be learned. How can kids make sense of the world? In this work, I will show that an agent with access to multimodal data (such as vision, audition or touch) can use the correlation between images and sounds to discover objects in the world without supervision. Sound conveys important information about the objects in our surroundings, such as the crashing of waves or the roar of fast-moving cars, and I will show that ambient sounds can be used as a supervisory signal for learning to see, and vice versa. I will describe an approach that learns, by watching unannotated videos, to locate the image regions that produce sounds and to separate the input sounds into a set of components that represent the sound from each pixel. I will also discuss our recent work on capturing tactile information.
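One common way to turn the correlation between images and sounds into a localization signal is to compare an audio embedding with a visual feature vector at every pixel. The sketch below shows only that similarity step, under stated assumptions: the per-pixel features and the audio embedding are taken as given (hypothetical outputs of some visual and audio networks), cosine similarity is our choice of score, and the function name `localize_sound` is illustrative. It is not the method presented in the talk, which additionally learns to separate the sound per pixel.

```python
import numpy as np

def localize_sound(pixel_features: np.ndarray, audio_embedding: np.ndarray) -> np.ndarray:
    """Return an (H, W) map of cosine similarity between each pixel's visual
    feature and one audio embedding.

    pixel_features:  (H, W, D) array, assumed output of a visual network.
    audio_embedding: (D,) array, assumed output of an audio network.
    """
    eps = 1e-8  # avoids division by zero for all-zero feature vectors
    v = pixel_features / (np.linalg.norm(pixel_features, axis=-1, keepdims=True) + eps)
    a = audio_embedding / (np.linalg.norm(audio_embedding) + eps)
    return v @ a  # (H, W, D) @ (D,) -> (H, W)

# Toy example: every pixel points along axis 0, except pixel (1, 2),
# whose feature is aligned with the audio embedding.
feats = np.zeros((3, 3, 4))
feats[..., 0] = 1.0
feats[1, 2] = [0.0, 2.0, 0.0, 0.0]
audio = np.array([0.0, 1.0, 0.0, 0.0])

heat = localize_sound(feats, audio)
row, col = np.unravel_index(heat.argmax(), heat.shape)
print(int(row), int(col))  # 1 2
```

In a self-supervised setting, such a map would be trained by requiring that matching video frames and audio clips score higher than mismatched pairs, so no manual annotation of sound sources is needed.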
Organizers: Arjun Chandrasekaran
Learning an embedding space in which semantically similar objects lie close together and dissimilar objects lie far apart is a cornerstone of many computer vision applications. Existing approaches usually learn a single metric in the embedding space for all available data points, which may have a very complex, non-uniform distribution with different notions of similarity between objects, e.g. appearance, shape, color or semantic meaning. We address this problem by using the embedding space more efficiently: we jointly split the embedding space and the data into K smaller sub-problems. Specifically, the approach divides both the data and the embedding space into K subsets and learns K separate distance metrics in non-overlapping subspaces of the embedding space, each defined by a group of neurons in the embedding layer of the neural network. In the second part of the talk, we show that, at least for animal classes close to humans, such as chimpanzees, it is possible to transfer the knowledge available in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition for other classes. We do this by (1) establishing a DensePose model for the new animal that is also geometrically aligned to humans; (2) introducing a multi-head R-CNN architecture that facilitates the transfer of multiple recognition tasks between classes; (3) finding which combination of known classes can be transferred most effectively to the new animal; and (4) using self-calibrated uncertainty heads to generate pseudo-labels, graded by quality, for training a model for this class.
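The splitting step described in the first part of this abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the speakers' implementation: the data are divided into K subsets with plain k-means on the current embeddings, the D embedding dimensions are split into K contiguous, non-overlapping groups, and distances for subset j are computed only on group j. The random embeddings stand in for a trained network's output, and the helper names are hypothetical. In actual training, each of the K learners would apply a metric-learning loss to its own subset using only its own slice of dimensions.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm): returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center: (N, k)
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def subspace_slices(dim, k):
    """Split embedding dimensions [0, dim) into k contiguous, non-overlapping groups."""
    bounds = np.linspace(0, dim, k + 1).astype(int)
    return [slice(int(bounds[j]), int(bounds[j + 1])) for j in range(k)]

def subspace_distances(emb, query, sl):
    """Euclidean distances from emb[query] to every row, using only the dims in sl."""
    return np.linalg.norm(emb[:, sl] - emb[query, sl], axis=1)

# Toy usage with random stand-in embeddings (no trained network here).
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))
K = 4
labels = kmeans(emb, K)          # data split into K subsets
slices = subspace_slices(64, K)  # embedding split into K groups of neurons
for j, sl in enumerate(slices):
    members = np.flatnonzero(labels == j)
    if len(members) > 0:
        # Learner j only compares members of cluster j, and only on dims `sl`.
        d = subspace_distances(emb[members], 0, sl)
        print(f"cluster {j}: {len(members)} points, dims {sl.start}-{sl.stop - 1}, "
              f"mean distance to first member {d.mean():.2f}")
```

Because the subspaces do not overlap, each learner can specialize to the notion of similarity that dominates its subset without interfering with the others.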
Organizers: Nikos Athanasiou
Roll-to-roll UV nanoimprint lithography (R2R-UV-NIL) is attracting increasing industrial interest for large-area nano- and micro-structuring of flexible substrates because it combines nanometer resolution with a throughput of many square meters per minute. Small-area masters of functional nano- and micro-scale surface structures are readily available from various lithographic techniques, such as UV, e-beam or interference lithography. However, upscaling small-area nano- and micro-structured masters into medium-size roller molds, often called shims, for R2R-UV-NIL production remains a bottleneck in the large-area nano-structuring process chain. At JR MATERIALS we have installed a customized EVG 770 UV-NIL stepper and are developing step-and-repeat UV-NIL processes and materials for the seamless upscaling of small-area masters into polymer shims of up to 270 × 630 mm² for our R2R-UV-NIL pilot line. These polymer shims can either be used directly for short to medium R2R-UV-NIL manufacturing runs or be galvanoformed into nickel shims for long production runs. In this seminar, the JR MATERIALS UV-NIL tools, processes and materials will be presented, together with a few applications.
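To give a feel for why step-and-repeat upscaling is laborious, the short sketch below counts how many imprints are needed to cover a 270 × 630 mm² shim with a rectangular master die. The 50 × 50 mm die is a hypothetical value, not one from the seminar, and the count ignores the overlap, alignment and edge margins that seamless stitching actually requires.

```python
import math

def step_and_repeat_grid(shim_w_mm, shim_h_mm, die_w_mm, die_h_mm):
    """Return (columns, rows, total imprints) needed to tile a shim area with a
    rectangular die. Assumption: no stitching overlap and no edge margins."""
    cols = math.ceil(shim_w_mm / die_w_mm)
    rows = math.ceil(shim_h_mm / die_h_mm)
    return cols, rows, cols * rows

# Shim size from the abstract; the 50 x 50 mm die is a hypothetical example.
print(step_and_repeat_grid(270, 630, 50, 50))  # (6, 13, 78)
```

Every one of these imprints must be placed so that neighboring fields join without visible seams, which is what makes the stepper, processes and resist materials central to the upscaling problem.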
In recent years, commodity 3D sensors have become widely available, spawning significant interest in both offline and real-time 3D reconstruction. While state-of-the-art reconstructions from commodity RGB-D sensors are visually appealing, they are far from usable in practical computer graphics applications, since they do not match the high quality of artist-modeled 3D graphics content. One of the biggest challenges in this context is that 3D scans suffer from occlusions, which results in incomplete 3D models. In this talk, I will present a data-driven approach to generating high-quality 3D models from commodity scan data, and the use of these geometrically complete 3D models for semantic and texture understanding of real-world environments.
Organizers: Yinghao Huang