Header logo is
Institute Talks

Learning to Act with Confidence

Talk
  • 23 October 2018 • 12:00 13:00
  • Andreas Krause
  • MPI-IS Tübingen, N0.002

Actively acquiring decision-relevant information is a key capability of intelligent systems, and plays a central role in the scientific process. In this talk I will present research from my group on this topic at the intersection of statistical learning, optimization and decision making. In particular, I will discuss how statistical confidence bounds can guide data acquisition in a principled way to make effective and reliable decisions in a variety of complex domains. I will also discuss several applications, ranging from autonomously guiding wetlab experiments in protein function optimization to safe exploration in robotics.

Control Systems for a Surgical Robot on the Space Station

IS Colloquium
  • 23 October 2018 • 16:30 17:30
  • Chris Macnab
  • MPI-IS Stuttgart, Heisenbergstr. 3, Room 2P4

As part of a proposed design for a surgical robot on the space station, my research group has been asked to look at controls that can provide literally surgical precision. Due to excessive time delay, we envision a system with a local model being controlled by a surgeon while the remote system on the space station follows along in a safe manner. Two of the major design considerations that come into play for the low-level feedback loops on the remote side are 1) the harmonic drives in a robot will cause excessive vibrations in a micro-gravity environment unless active damping strategies are employed and 2) when interacting with a human tissue environment the robot must apply smooth control signals that result in precise positions and forces. Thus, we envision intelligent strategies that utilize nonlinear, adaptive, neural-network, and/or fuzzy control theory as the most suitable. However, space agencies, or their engineering sub-contractors, typically provide gain and phase margin characteristics as requirements to the engineers involved in a control system design, which are normally associated with PID or other traditional linear control schemes. We are currently endeavouring to create intelligent controls that have guaranteed gain and phase margins using the Cerebellar Model Articulation Controller.

Organizers: Katherine Kuchenbecker

Artificial haptic intelligence for human-machine systems

IS Colloquium
  • 24 October 2018 • 11:00 12:00
  • Veronica J. Santos
  • 5H7 at MPI-IS in Stuttgart

The functionality of artificial manipulators could be enhanced by artificial “haptic intelligence” that enables the identification of object features via touch for semi-autonomous decision-making and/or display to a human operator. This could be especially useful when complementary sensory modalities, such as vision, are unavailable. I will highlight past and present work to enhance the functionality of artificial hands in human-machine systems. I will describe efforts to develop multimodal tactile sensor skins, and to teach robots how to haptically perceive salient geometric features such as edges and fingertip-sized bumps and pits using machine learning techniques. I will describe the use of reinforcement learning to teach robots goal-based policies for a functional contour-following task: the closure of a ziplock bag. Our Contextual Multi-Armed Bandits approach tightly couples robot actions to the tactile and proprioceptive consequences of the actions, and selects future actions based on prior experiences, the current context, and a functional task goal. Finally, I will describe current efforts to develop real-time capabilities for the perception of tactile directionality, and to develop models for haptically locating objects buried in granular media. Real-time haptic perception and decision-making capabilities could be used to advance semi-autonomous robot systems and reduce the cognitive burden on human teleoperators of devices ranging from wheelchair-mounted robots to explosive ordnance disposal robots.

Organizers: Katherine Kuchenbecker

Artificial haptic intelligence for human-machine systems

IS Colloquium
  • 25 October 2018 • 11:00 11:00
  • Veronica J. Santos
  • N2.025 at MPI-IS in Tübingen

The functionality of artificial manipulators could be enhanced by artificial “haptic intelligence” that enables the identification of object features via touch for semi-autonomous decision-making and/or display to a human operator. This could be especially useful when complementary sensory modalities, such as vision, are unavailable. I will highlight past and present work to enhance the functionality of artificial hands in human-machine systems. I will describe efforts to develop multimodal tactile sensor skins, and to teach robots how to haptically perceive salient geometric features such as edges and fingertip-sized bumps and pits using machine learning techniques. I will describe the use of reinforcement learning to teach robots goal-based policies for a functional contour-following task: the closure of a ziplock bag. Our Contextual Multi-Armed Bandits approach tightly couples robot actions to the tactile and proprioceptive consequences of the actions, and selects future actions based on prior experiences, the current context, and a functional task goal. Finally, I will describe current efforts to develop real-time capabilities for the perception of tactile directionality, and to develop models for haptically locating objects buried in granular media. Real-time haptic perception and decision-making capabilities could be used to advance semi-autonomous robot systems and reduce the cognitive burden on human teleoperators of devices ranging from wheelchair-mounted robots to explosive ordnance disposal robots.

Organizers: Katherine Kuchenbecker Adam Spiers

A fine-grained perspective onto object interactions

Talk
  • 30 October 2018 • 10:30 11:30
  • Dima Damen
  • N3.022 (Aquarium)

This talk aims to argue for a fine-grained perspective onto human-object interactions, from video sequences. I will present approaches for the understanding of ‘what’ objects one interacts with during daily activities, ‘when’ should we label the temporal boundaries of interactions, ‘which’ semantic labels one can use to describe such interactions and ‘who’ is better when contrasting people perform the same interaction. I will detail my group’s latest works on sub-topics related to: (1) assessing action ‘completion’ – when an interaction is attempted but not completed [BMVC 2018], (2) determining skill or expertise from video sequences [CVPR 2018] and (3) finding unequivocal semantic representations for object interactions [ongoing work]. I will also introduce EPIC-KITCHENS 2018, the recently released largest dataset of object interactions in people’s homes, recorded using wearable cameras. The dataset includes 11.5M frames fully annotated with objects and actions, based on unique annotations from the participants narrating their own videos, thus reflecting true intention. Three open challenges are now available on object detection, action recognition and action anticipation [http://epic-kitchens.github.io]

Organizers: Mohamed Hassan

TBA

IS Colloquium
  • 28 January 2019 • 3pm 4pm
  • Florian Marquardt

Organizers: Matthias Bauer

  • Jürgen Gall

Capturing human motion or objects by vision technology has been intensively studied. Although humans interact very often with other persons or objects, most of the previous work has focused on capturing a single object or the motion of a single person. In this talk, I will highlight four projects that deal with human-human or human-object interactions. The first project addresses the problem of capturing skeleton and non-articulated cloth motion of two interacting characters. The second project aims to model spatial hand-object relations during object manipulation. In the third project, an affordance detector is learned from human-object interactions. The fourth project investigates how human motion can be exploited for object discovery from depth video streams.

Organizers: Philipp Hennig Michel Besserve


  • Peter Gehler

n this talk I will present recent work on two different topics from low- and high-level computer vision: Intrinsic Image Recovery and Efficient object detection. By intrinsic image decomposition we refer to the challenging task of decoupling material properties from lighting properties given a single image. We propose a probabilistic model that incorporates previous attempts exploiting edge information and combine it with a novel prior on material reflectances in the image. This results in a random field model with global, latent variables and pixel-accurate output reflectance values. I will present experiments on a recently proposed ground-truth database.

The proposed model is found to outperform previous models that have been proposed. Then I will also discuss some possible future developments in this field. In the second part of the talk I will present an efficient object detection scheme that breaks the computational complexity of commonly used detection algorithms, eg sliding windows. We pose the detection problem naturally as a structured prediction problem for which we decompose the inference procedure into an adaptive best-first search.

This results in test-time inference that scales sub-linearly in the size of the search space and detection requires usually less than 100 classifier evaluations. This paves the way for using strong (but costly) classifiers such as non-linear SVMs. The algorithmic properties are demonstrated using the VOC'07 dataset. This work is part of the Visipedia project, in collaboration with Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder and Pietro Perona.


  • Yusuf Sahillioglu

3D shape correspondence methods seek on two given shapes for pairs of surface points that are semantically equivalent. We present three automatic algorithms that address three different aspects of this problem: 1) coarse, 2) dense, and 3) partial correspondence. In 1), after sampling evenly-spaced base vertices on shapes, we formulate the problem of shape correspondence as combinatorial optimization over the domain of all possible mappings of bases, which then reduces within a probabilistic framework to a log-likelihood maximization problem that we solve via EM (Expectation Maximization) algorithm.

Due to computational limitations, we change this algorithm to a coarse-to-fine one (2) to achieve dense correspondence between all vertices. Our scale-invariant isometric distortion measure makes partial matching (3) possible as well.


  • Serge Belongie

We present an interactive, hybrid human-computer method for object classification. The method applies to classes of problems that are difficult for most people, but are recognizable by people with the appropriate expertise (e.g., animal species or airplane model recognition). The classification method can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively.

The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. Incorporating user input drives up recognition accuracy to levels that are good enough for practical applications; at the same time, computer vision reduces the amount of human interaction required. The resulting hybrid system is able to handle difficult, large multi-class problems with tightly-related categories.

We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate the accuracy and computational properties of different computer vision algorithms and the effects of noisy user responses on a dataset of 200 bird species and on the Animals With Attributes dataset.

Our results demonstrate the effectiveness and practicality of the hybrid human-computer classification paradigm. This work is part of the Visipedia project, in collaboration with Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder and Pietro Perona.


  • Javier Romero

Object grasping and manipulation is a crucial part of daily human activities. The study of these actions represents a central component in the development of systems that attempt to understand human activities and robots that are able to act in human environments.

Three essential parts of this problem are tackled in this talk: the perception of the human hand in interaction with objects, the modeling of human grasping actions and the refinement of the execution of a robotic grasp. The estimation of the human hand pose is carrried out with a markerless visual system that performs in real time under object occlusions. Low dimensional models of various grasping actions are created by exploiting the correlations between different hand joints in a non-linear manner with Gaussian Process Latent Variable Models (GPLVM). Finally, robot grasping actions are perfected by exploiting the appearance of the robot during action execution.


  • Fernando De La Torre

Enabling computers to understand human behavior has the potential to revolutionize many areas that benefit society such as clinical diagnosis, human computer interaction, and social robotics. A critical element in the design of any behavioral sensing system is to find a good representation of the data for encoding, segmenting, classifying and predicting subtle human behavior.

In this talk I will propose several extensions of Component Analysis (CA) techniques (e.g., kernel principal component analysis, support vector machines, spectral clustering) that are able to learn spatio-temporal representations or components useful in many human sensing tasks. In the first part of the talk I will give an overview of several ongoing projects in the CMU Human Sensing Laboratory, including our current work on depression assessment from video, as well as hot-flash detection from wearable sensors.

In the second part of the talk I will show how several extensions of the CA methods outperform state-of-the-art algorithms in problems such as temporal alignment of human motion, temporal segmentation/clustering of human activities, joint segmentation and classification of human behavior, facial expression analysis, and facial feature detection in images. The talk will be adaptive, and I will discuss the topics of major interest to the audience.


  • Arto Nurmikko

Semiconductor light emitters, based on single crystal epitaxial inorganic semiconductor heterostructures are ubiquitous. In spite of their extraordinary versatility and technological maturity, penetrating the full visible spectrum using a single material system for red, green, and blue (RGB) in a seamless way remains, nonetheless, an elusive challenge.

Semiconductor nanocrystals, or quantum dots (QDs), synthesized by solution-based methods of colloidal chemistry represent a strongly contrasting basis for active optical materials. While possessing an ability to absorb and efficiently luminesce across the RGB by simple nanocrystal particle size control within a single material system, these preparations have yet to make a significant impact as viable light emitting devices, mainly due to the difficulties in casting such materials from their natural habitat, that is “from the chemist’s bottle” to a useful solid thin film form for device use. In this presentation we show how II-VI compound nanocrystals can be transitioned to solid templates with targeted spatial control and placeme


  • Arto Nurmikko

Invasive access by microprobe arrays inserted safely into the brain is now enabling us to “listen” to local neural circuits at levels of spatial and temporal detail which, in addition to enriching fundamental brain science, has led to the possibility of a new generation of neurotechnologies to overcome disabilities due to a range of neurological injuries where pathways from the brain to the rest of the central and peripheral nervous systems have been injured or severed. In this presentation we discuss the biomedical engineering challenges and opportunities with these incipient technologies, with emphasis on implantable wireless neural interfaces for communicating with the brain.

A second topic, related to the possibility of sending direct inputs of information back to the brain by implanted devices is also explored, focusing on recently discovered means to render selected neural cell types and microcircuits to be light sensitized, following local microbiologically induced conditioning.


  • Henning Hamer

The scope of this work is hand-object interaction. As a starting point, we observe hands manipulating objects and derive information based on computer vision methods. After considering hands and objects in isolation, we focus on the inherent interdependencies. One application of the gained knowledge is the synthesis of interactive hand motion for animated sequences.


  • Vittorio Ferrari

Vision is a crucial sense for computational systems to interact with their environments as biological systems do. A major task is interpreting images of complex scenes, by recognizing and localizing objects, persons and actions. This involves learning a large number of visual models, ideally autonomously.

In this talk I will present two ways of reducing the amount of human supervision required by this learning process. The first way is labeling images only by the object class they contain. Learning from cluttered images is very challenging in this weakly supervised setting. In the traditional paradigm, each class is learned starting from scratch. In our work instead, knowledge generic over classes is first learned during a meta-training stage from images of diverse classes with given object locations, and is then used to support learning any new class without location annotation. Generic knowledge helps because during meta-training the system can learn about localizing objects in general. As demonstrated experimentally, this approach enables learning from more challenging images than possible before, such as the PASCAL VOC 2007, containing extensive clutter and large scale and appearance variations between object instances.

The second way is the analysis of news items consisting of images and text captions. We associate names and action verbs in the captions to the face and body pose of the persons in the images. We introduce a joint probabilistic model for simultaneously recovering image-caption correspondences and learning appearance models for the face and pose classes occurring in the corpus. As demonstrated experimentally, this joint `face and pose' model solves the correspondence problem better than earlier models covering only the face.

I will conclude with an outlook on the idea of visual culture, where new visual concepts are learned incrementally on top of all visual knowledge acquired so far. Beside generic knowledge, visual culture includes also knowledge specific to a class, knowledge of scene structures and other forms of visual knowledge. Potentially, this approach could considerably extend current visual recognition capabilities and produce an integrated body of visual knowledge.