Header logo is


2020


A gamified app that helps people overcome self-limiting beliefs by promoting metacognition
A gamified app that helps people overcome self-limiting beliefs by promoting metacognition

Amo, V., Lieder, F.

pages: 6, SIG 8 Meets SIG 16, September 2020 (conference)

Abstract
Previous research has shown that approaching learning with a growth mindset is key for maintaining motivation and overcoming setbacks. Mindsets are systems of beliefs that people hold to be true. They influence a person's attitudes, thoughts, and emotions when they learn something new or encounter challenges. In clinical psychology, metareasoning (reflecting on one's mental processes) and meta-awareness (recognizing thoughts as mental events instead of equating them to reality) have proven effective for overcoming maladaptive thinking styles. Hence, they are potentially an effective method for overcoming self-limiting beliefs in other domains as well. However, the potential of integrating assisted metacognition into mindset interventions has not been explored yet. Here, we propose that guiding and training people on how to leverage metareasoning and meta-awareness for overcoming self-limiting beliefs can significantly enhance the effectiveness of mindset interventions. To test this hypothesis, we develop a gamified mobile application that guides and trains people to use metacognitive strategies based on Cognitive Restructuring (CR) and Acceptance Commitment Therapy (ACT) techniques. The application helps users to identify and overcome self-limiting beliefs by working with aversive emotions when they are triggered by fixed mindsets in real-life situations. Our app aims to help people sustain their motivation to learn when they face inner obstacles (e.g. anxiety, frustration, and demotivation). We expect the application to be an effective tool for helping people better understand and develop the metacognitive skills of emotion regulation and self-regulation that are needed to overcome self-limiting beliefs and develop growth mindsets.

re

A gamified app that helps people overcome self-limiting beliefs by promoting metacognition [BibTex]


Learning to Dress 3D People in Generative Clothing
Learning to Dress 3D People in Generative Clothing

Ma, Q., Yang, J., Ranjan, A., Pujades, S., Pons-Moll, G., Tang, S., Black, M. J.

In Computer Vision and Pattern Recognition (CVPR), June 2020 (inproceedings)

Abstract
Three-dimensional human body models are widely used in the analysis of human pose and motion. Existing models, however, are learned from minimally-clothed 3D scans and thus do not generalize to the complexity of dressed people in common images and videos. Additionally, current models lack the expressive power needed to represent the complex non-linear geometry of pose-dependent clothing shape. To address this, we learn a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing. Specifically, we train a conditional Mesh-VAE-GAN to learn the clothing deformation from the SMPL body model, making clothing an additional term on SMPL. Our model is conditioned on both pose and clothing type, giving the ability to draw samples of clothing to dress different body shapes in a variety of styles and poses. To preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to 3D meshes. Our model, named CAPE, represents global shape and fine local structure, effectively extending the SMPL body model to clothing. To our knowledge, this is the first generative model that directly dresses 3D human body meshes and generalizes to different poses.

ps

arxiv project page [BibTex]


Generating 3D People in Scenes without People
Generating 3D People in Scenes without People

Zhang, Y., Hassan, M., Neumann, H., Black, M. J., Tang, S.

In Computer Vision and Pattern Recognition (CVPR), June 2020 (inproceedings)

Abstract
We present a fully-automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. Given a 3D scene without people, humans can easily imagine how people could interact with the scene and the objects in it. However, this is a challenging task for a computer as solving it requires (1) the generated human bodies should be semantically plausible with the 3D environment, e.g. people sitting on the sofa or cooking near the stove; (2) the generated human-scene interaction should be physically feasible in the way that the human body and scene do not interpenetrate while, at the same time, body-scene contact supports physical interactions. To that end, we make use of the surface-based 3D human model SMPL-X. We first train a conditional variational autoencoder to predict semantically plausible 3D human pose conditioned on latent scene representations, then we further refine the generated 3D bodies using scene constraints to enforce feasible physical interaction. We show that our approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with 3D environment. We perform extensive experiments demonstrating that our generative framework compares favorably with existing methods, both qualitatively and quantitatively. We believe that our scene-conditioned 3D human generation pipeline will be useful for numerous applications; e.g. to generate training data for human pose estimation, in video games and in VR/AR.

ps

PDF link (url) [BibTex]

PDF link (url) [BibTex]


no image
Where Does It End? - Reasoning About Hidden Surfaces by Object Intersection Constraints

Strecke, M., Stückler, J.

In Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020 (inproceedings) Accepted

ev

[BibTex]

[BibTex]


no image
A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization

F Alimisis, F., Orvieto, A., Becigneul, G., Lucchi, A.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), June 2020 (conference) Accepted

ei

[BibTex]

[BibTex]


no image
A Kernel Mean Embedding Approach to Reducing Conservativeness in Stochastic Programming and Control

Zhu, J., Diehl, M., Schölkopf, B.

2nd Annual Conference on Learning for Dynamics and Control (L4DC), June 2020 (conference) Accepted

ei

arXiv [BibTex]

arXiv [BibTex]


Learning Physics-guided Face Relighting under Directional Light
Learning Physics-guided Face Relighting under Directional Light

Nestmeyer, T., Lalonde, J., Matthews, I., Lehrmann, A. M.

In Conference on Computer Vision and Pattern Recognition, IEEE/CVF, June 2020 (inproceedings) Accepted

Abstract
Relighting is an essential step in realistically transferring objects from a captured image into another environment. For example, authentic telepresence in Augmented Reality requires faces to be displayed and relit consistent with the observer's scene lighting. We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. Our model decomposes the input image into intrinsic components according to a diffuse physics-based image formation model. We enable non-diffuse effects including cast shadows and specular highlights by predicting a residual correction to the diffuse render. To train and evaluate our model, we collected a portrait database of 21 subjects with various expressions and poses. Each sample is captured in a controlled light stage setup with 32 individual light sources. Our method creates precise and believable relighting results and generalizes to complex illumination conditions and challenging poses, including when the subject is not looking straight at the camera.

ps

Paper [BibTex]

Paper [BibTex]


{VIBE}: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation

Kocabas, M., Athanasiou, N., Black, M. J.

In Computer Vision and Pattern Recognition (CVPR), June 2020 (inproceedings)

Abstract
Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methodsfail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose “Video Inference for Body Pose and Shape Estimation” (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE

ps

arXiv code [BibTex]

arXiv code [BibTex]


no image
Mixed-curvature Variational Autoencoders

Skopek, O., Ganea, O., Becigneul, G.

8th International Conference on Learning Representations (ICLR), April 2020 (conference) Accepted

ei

link (url) [BibTex]

link (url) [BibTex]


From Variational to Deterministic Autoencoders
From Variational to Deterministic Autoencoders

Ghosh*, P., Sajjadi*, M. S. M., Vergari, A., Black, M. J., Schölkopf, B.

8th International Conference on Learning Representations (ICLR) , April 2020, *equal contribution (conference) Accepted

Abstract
Variational Autoencoders (VAEs) provide a theoretically-backed framework for deep generative models. However, they often produce “blurry” images, which is linked to their training objective. Sampling in the most popular implementation, the Gaussian VAE, can be interpreted as simply injecting noise to the input of a deterministic decoder. In practice, this simply enforces a smooth latent space structure. We challenge the adoption of the full VAE framework on this specific point in favor of a simpler, deterministic one. Specifically, we investigate how substituting stochasticity with other explicit and implicit regularization schemes can lead to a meaningful latent space without having to force it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism for sampling new data points, we propose to employ an efficient ex-post density estimation step that can be readily adopted both for the proposed deterministic autoencoders as well as to improve sample quality of existing VAEs. We show in a rigorous empirical study that regularized deterministic autoencoding achieves state-of-the-art sample quality on the common MNIST, CIFAR-10 and CelebA datasets.

ei ps

arXiv [BibTex]

arXiv [BibTex]


Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations
Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations

Rueegg, N., Lassner, C., Black, M. J., Schindler, K.

In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), Febuary 2020 (inproceedings)

Abstract
The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems.

ps

pdf [BibTex]

pdf [BibTex]


no image
More Powerful Selective Kernel Tests for Feature Selection

Lim, J. N., Yamada, M., Jitkrittum, W., Terada, Y., Matsui, S., Shimodaira, H.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020 (conference) To be published

ei

arXiv [BibTex]

arXiv [BibTex]


Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image
Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image

Paschalidou, D., Gool, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Humans perceive the 3D world as a set of distinct objects that are characterized by various low-level (geometry, reflectance) and high-level (connectivity, adjacency, symmetry) properties. Recent methods based on convolutional neural networks (CNNs) demonstrated impressive progress in 3D reconstruction, even when using a single 2D image as input. However, the majority of these methods focuses on recovering the local 3D geometry of an object without considering its part-based decomposition or relations between parts. We address this challenging problem by proposing a novel formulation that allows to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision. Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives, where simple parts are represented with fewer primitives and more complex parts are modeled with more components. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.

avg

pdf suppmat Video Project Page [BibTex]

pdf suppmat Video Project Page [BibTex]


no image
TUM Flyers: Vision-Based MAV Navigation for Systematic Inspection of Structures

Usenko, V., Stumberg, L. V., Stückler, J., Cremers, D.

In Bringing Innovative Robotic Technologies from Research Labs to Industrial End-users: The Experience of the European Robotics Challenges, 136, pages: 189-209, Springer International Publishing, 2020 (inbook)

ev

[BibTex]

[BibTex]


no image
ACTrain: Ein KI-basiertes Aufmerksamkeitstraining für die Wissensarbeit [ACTrain: An AI-based attention training for knowledge work]

Wirzberger, M., Oreshnikov, I., Passy, J., Lado, A., Shenhav, A., Lieder, F.

66th Spring Conference of the German Ergonomics Society, 2020 (conference)

Abstract
Unser digitales Zeitalter lebt von Informationen und stellt unsere begrenzte Verarbeitungskapazität damit täglich auf die Probe. Gerade in der Wissensarbeit haben ständige Ablenkungen erhebliche Leistungseinbußen zur Folge. Unsere intelligente Anwendung ACTrain setzt genau an dieser Stelle an und verwandelt Computertätigkeiten in eine Trainingshalle für den Geist. Feedback auf Basis maschineller Lernverfahren zeigt anschaulich den Wert auf, sich nicht von einer selbst gewählten Aufgabe ablenken zu lassen. Diese metakognitive Einsicht soll zum Durchhalten motivieren und das zugrunde liegende Fertigkeitsniveau der Aufmerksamkeitskontrolle stärken. In laufenden Feldexperimenten untersuchen wir die Frage, ob das Training mit diesem optimalen Feedback die Aufmerksamkeits- und Selbstkontrollfertigkeiten im Vergleich zu einer Kontrollgruppe ohne Feedback verbessern kann.

re sf

link (url) [BibTex]


no image
Computationally Tractable Riemannian Manifolds for Graph Embeddings

Cruceru, C., Becigneul, G., Ganea, O.

37th International Conference on Machine Learning (ICML), 2020 (conference) Submitted

ei

[BibTex]

[BibTex]


Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

Liao, Y., Schwarz, K., Mescheder, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
In recent years, Generative Adversarial Networks have achieved impressive results in photorealistic image synthesis. This progress nurtures hopes that one day the classical rendering pipeline can be replaced by efficient models that are learned directly from images. However, current image synthesis models operate in the 2D domain where disentangling 3D properties such as camera viewpoint or object pose is challenging. Furthermore, they lack an interpretable and controllable representation. Our key hypothesis is that the image generation process should be modeled in 3D space as the physical world surrounding us is intrinsically three-dimensional. We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain. We demonstrate that our model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images. Compared to pure 2D baselines, it allows for synthesizing scenes that are consistent wrt. changes in viewpoint or object pose. We further evaluate various 3D representations in terms of their usefulness for this challenging task.

avg

pdf suppmat Video Project Page [BibTex]

pdf suppmat Video Project Page [BibTex]


no image
Planning from Images with Deep Latent Gaussian Process Dynamics

Bosch, N., Achterhold, J., Leal-Taixe, L., Stückler, J.

2nd Annual Conference on Learning for Dynamics and Control (L4DC) , 2020 (conference) Accepted

ev

[BibTex]

[BibTex]


no image
Practical Accelerated Optimization on Riemannian Manifolds

F Alimisis, F., Orvieto, A., Becigneul, G., Lucchi, A.

37th International Conference on Machine Learning (ICML), 2020 (conference) Submitted

ei

[BibTex]

[BibTex]


Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving
Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving

Prakash, A., Behl, A., Ohn-Bar, E., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Data aggregation techniques can significantly improve vision-based policy learning within a training environment, e.g., learning to drive in a specific simulation condition. However, as on-policy data is sequentially sampled and added in an iterative manner, the policy can specialize and overfit to the training conditions. For real-world applications, it is useful for the learned policy to generalize to novel scenarios that differ from the training conditions. To improve policy learning while maintaining robustness when training end-to-end driving policies, we perform an extensive analysis of data aggregation techniques in the CARLA environment. We demonstrate how the majority of them have poor generalization performance, and develop a novel approach with empirically better generalization performance compared to existing techniques. Our two key ideas are (1) to sample critical states from the collected on-policy data based on the utility they provide to the learned policy in terms of driving behavior, and (2) to incorporate a replay buffer which progressively focuses on the high uncertainty regions of the policy's state distribution. We evaluate the proposed approach on the CARLA NoCrash benchmark, focusing on the most challenging driving scenarios with dense pedestrian and vehicle traffic. Our approach improves driving success rate by 16% over state-of-the-art, achieving 87% of the expert performance while also reducing the collision rate by an order of magnitude without the use of any additional modality, auxiliary tasks, architectural modifications or reward from the environment.

avg

pdf suppmat Video Project Page [BibTex]

pdf suppmat Video Project Page [BibTex]


Differentiation of blackbox combinatorial solvers
Differentiation of blackbox combinatorial solvers

Vlastelica, M., Paulus, A., Musil, V., Martius, G., Rolı́nek, M.

In International Conference on Learning Representations, ICLR’20, 2020 (incollection)

al

link (url) [BibTex]

link (url) [BibTex]


no image
Constant Curvature Graph Convolutional Networks

Bachmann*, G., Becigneul*, G., Ganea, O.

37th International Conference on Machine Learning (ICML), 2020, *equal contribution (conference) Submitted

ei

[BibTex]

[BibTex]


Learning Situational Driving
Learning Situational Driving

Ohn-Bar, E., Prakash, A., Behl, A., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited visibility conditions with no lane markings to turning in a busy intersection while yielding to pedestrians. In contrast, we find that state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, we develop a framework for learning a situational driving policy that effectively captures reasoning under varying types of scenarios. Our key idea is to learn a mixture model with a set of policies that can capture multiple driving modes. We first optimize the mixture model through behavior cloning, and show it to result in significant gains in terms of driving performance in diverse conditions. We then refine the model by directly optimizing for the driving task itself, i.e., supervised with the navigation task reward. Our method is more scalable than methods assuming access to privileged information, e.g., perception labels, as it only assumes demonstration and reward-based supervision. We achieve over 98% success rate on the CARLA driving benchmark as well as state-of-the-art performance on a newly introduced generalization benchmark.

avg

pdf suppmat Video Project Page [BibTex]

pdf suppmat Video Project Page [BibTex]


On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner
On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner

Schmitt, C., Donne, S., Riegler, G., Koltun, V., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
We propose a novel formulation for joint recovery of camera pose, object geometry and spatially-varying BRDF. The input to our approach is a sequence of RGB-D images captured by a mobile, hand-held scanner that actively illuminates the scene with point light sources. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. By integrating material clustering as a differentiable operation into the optimization process, we avoid pre-processing heuristics and demonstrate that our model is able to determine the correct number of specular materials independently. We provide a study on the importance of each component in our formulation and on the requirements of the initial geometry. We show that optimizing over the poses is crucial for accurately recovering fine details and that our approach naturally results in a semantically meaningful material segmentation.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


no image
DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation

Wang, R., Yang, N., Stückler, J., Cremers, D.

In Accepted for IEEE international Conference on Robotics and Automation (ICRA), 2020, arXiv:1904.10097 (inproceedings) Accepted

ev

[BibTex]

[BibTex]


Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision

Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Learning-based 3D reconstruction methods have shown impressive results. However, most methods require 3D supervision which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Unfortunately, these approaches are currently restricted to voxel- and mesh-based representations, suffering from discretization or low resolution. In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. Implicit representations have recently gained popularity as they represent shape and texture continuously. Our key insight is that depth gradients can be derived analytically using the concept of implicit differentiation. This allows us to learn implicit shape and texture representations directly from RGB images. We experimentally show that our single-view reconstructions rival those learned with full 3D supervision. Moreover, we find that our method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes.

avg

pdf suppmat Video Project Page [BibTex]

pdf suppmat Video Project Page [BibTex]

2009


no image
A computational model of human table tennis for robot application

Mülling, K., Peters, J.

In AMS 2009, pages: 57-64, (Editors: Dillmann, R. , J. Beyerer, C. Stiller, M. Zöllner, T. Gindele), Springer, Berlin, Germany, Autonome Mobile Systeme, December 2009 (inproceedings)

Abstract
Table tennis is a difficult motor skill which requires all basic components of a general motor skill learning system. In order to get a step closer to such a generic approach to the automatic acquisition and refinement of table tennis, we study table tennis from a human motor control point of view. We make use of the basic models of discrete human movement phases, virtual hitting points, and the operational timing hypothesis. Using these components, we create a computational model which is aimed at reproducing human-like behavior. We verify the functionality of this model in a physically realistic simulation of a BarrettWAM.

ei

Web DOI [BibTex]

2009


Web DOI [BibTex]


no image
A PAC-Bayesian Approach to Formulation of Clustering Objectives

Seldin, Y., Tishby, N.

In Proceedings of the NIPS 2009 Workshop "Clustering: Science or Art? Towards Principled Approaches", pages: 1-4, NIPS Workshop "Clustering: Science or Art? Towards Principled Approaches", December 2009 (inproceedings)

Abstract
Clustering is a widely used tool for exploratory data analysis. However, the theoretical understanding of clustering is very limited. We still do not have a well-founded answer to the seemingly simple question of “how many clusters are present in the data?”, and furthermore a formal comparison of clusterings based on different optimization objectives is far beyond our abilities. The lack of good theoretical support gives rise to multiple heuristics that confuse the practitioners and stall development of the field. We suggest that the ill-posed nature of clustering problems is caused by the fact that clustering is often taken out of its subsequent application context. We argue that one does not cluster the data just for the sake of clustering it, but rather to facilitate the solution of some higher level task. By evaluation of the clustering’s contribution to the solution of the higher level task it is possible to compare different clusterings, even those obtained by different optimization objectives. In the preceding work it was shown that such an approach can be applied to evaluation and design of co-clustering solutions. Here we suggest that this approach can be extended to other settings, where clustering is applied.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Notes on Graph Cuts with Submodular Edge Weights

Jegelka, S., Bilmes, J.

In pages: 1-6, NIPS Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra (DISCML), December 2009 (inproceedings)

Abstract
Generalizing the cost in the standard min-cut problem to a submodular cost function immediately makes the problem harder. Not only do we prove NP hardness even for nonnegative submodular costs, but also show a lower bound of (|V |1/3) on the approximation factor for the (s, t) cut version of the problem. On the positive side, we propose and compare three approximation algorithms with an overall approximation factor of O(min{|V |,p|E| log |V |}) that appear to do well in practice.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning new basic Movements for Robotics

Kober, J., Peters, J.

In AMS 2009, pages: 105-112, (Editors: Dillmann, R. , J. Beyerer, C. Stiller, M. Zöllner, T. Gindele), Springer, Berlin, Germany, Autonome Mobile Systeme, December 2009 (inproceedings)

Abstract
Obtaining novel skills is one of the most important problems in robotics. Machine learning techniques may be a promising approach for automatic and autonomous acquisition of movement policies. However, this requires both an appropriate policy representation and suitable learning algorithms. Employing the most recent form of the dynamical systems motor primitives originally introduced by Ijspeert et al. [1], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning, and present our current best performing learning algorithms. Finally, we show that it is possible to include a start-up phase in rhythmic primitives. We apply our approach to two elementary movements, i.e., Ball-in-a-Cup and Ball-Paddling, which can be learned on a real Barrett WAM robot arm at a pace similar to human learning.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
From Motor Learning to Interaction Learning in Robots

Sigaud, O., Peters, J.

In Proceedings of 7ème Journées Nationales de la Recherche en Robotique, pages: 189-195, JNRR, November 2009 (inproceedings)

Abstract
The number of advanced robot systems has been increasing in recent years yielding a large variety of versatile designs with many degrees of freedom. These robots have the potential of being applicable in uncertain tasks outside well-structured industrial settings. However, the complexity of both systems and tasks is often beyond the reach of classical robot programming methods. As a result, a more autonomous solution for robot task acquisition is needed where robots adaptively adjust their behaviour to the encountered situations and required tasks. Learning approaches pose one of the most appealing ways to achieve this goal. However, while learning approaches are of high importance for robotics, we cannot simply use off-the-shelf methods from the machine learning community as these usually do not scale into the domains of robotics due to excessive computational cost as well as a lack of scalability. Instead, domain appropriate approaches are needed. We focus here on several core domains of robot learning. For accurate task execution, we need motor learning capabilities. For fast learning of the motor tasks, imitation learning offers the most promising approach. Self improvement requires reinforcement learning approaches that scale into the domain of complex robots. Finally, for efficient interaction of humans with robot systems, we will need a form of interaction learning. This contribution provides a general introduction to these issues and briefly presents the contributions of the related book chapters to the corresponding research topics.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval

Lampert, CH.

In ICCV 2009, pages: 987-994, IEEE Computer Society, Piscataway, NJ, USA, Twelfth IEEE International Conference on Computer Vision, October 2009 (inproceedings)

Abstract
We study the task of detecting the occurrence of objects in large image collections or in videos, a problem that combines aspects of content based image retrieval and object localization. While most previous approaches are either limited to special kinds of queries, or do not scale to large image sets, we propose a new method, efficient subimage retrieval (ESR), which is at the same time very flexible and very efficient. Relying on a two-layered branch-and-bound setup, ESR performs object-based image retrieval in sets of 100,000 or more images within seconds. An extensive evaluation on several datasets shows that ESR is not only very fast, but it also achieves detection accuracies that are on par with or superior to previously published methods for object-based image retrieval.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A new non-monotonic algorithm for PET image reconstruction

Sra, S., Kim, D., Dhillon, I., Schölkopf, B.

In IEEE - Nuclear Science Symposium Conference Record (NSS/MIC), 2009, pages: 2500-2502, (Editors: B Yu), IEEE, Piscataway, NJ, USA, IEEE Nuclear Science Symposium and Medical Imaging Conference, October 2009 (inproceedings)

Abstract
Maximizing some form of Poisson likelihood (either with or without penalization) is central to image reconstruction algorithms in emission tomography. In this paper we introduce NMML, a non-monotonic algorithm for maximum likelihood PET image reconstruction. NMML offers a simple and flexible procedure that also easily incorporates standard convex regular-ization for doing penalized likelihood estimation. A vast number image reconstruction algorithms have been developed for PET, and new ones continue to be designed. Among these, methods based on the expectation maximization (EM) and ordered-subsets (OS) framework seem to have enjoyed the greatest popularity. Our method NMML differs fundamentally from methods based on EM: i) it does not depend on the concept of optimization transfer (or surrogate functions); and ii) it is a rapidly converging nonmonotonic descent procedure. The greatest strengths of NMML, however, are its simplicity, efficiency, and scalability, which make it especially attractive for tomograph ic reconstruction. We provide a theoretical analysis NMML, and empirically observe it to outperform standard EM based methods, sometimes by orders of magnitude. NMML seamlessly allows integreation of penalties (regularizers) in the likelihood. This ability can prove to be crucial, especially because with the rapidly rising importance of combined PET/MR scanners, one will want to include more “prior” knowledge into the reconstruction.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Approximation Algorithms for Tensor Clustering

Jegelka, S., Sra, S., Banerjee, A.

In Algorithmic Learning Theory: 20th International Conference, pages: 368-383, (Editors: Gavalda, R. , G. Lugosi, T. Zeugmann, S. Zilles), Springer, Berlin, Germany, ALT, October 2009 (inproceedings)

Abstract
We present the first (to our knowledge) approximation algo- rithm for tensor clustering—a powerful generalization to basic 1D clustering. Tensors are increasingly common in modern applications dealing with complex heterogeneous data and clustering them is a fundamental tool for data analysis and pattern discovery. Akin to their 1D cousins, common tensor clustering formulations are NP-hard to optimize. But, unlike the 1D case no approximation algorithms seem to be known. We address this imbalance and build on recent co-clustering work to derive a tensor clustering algorithm with approximation guarantees, allowing metrics and divergences (e.g., Bregman) as objective functions. Therewith, we answer two open questions by Anagnostopoulos et al. (2008). Our analysis yields a constant approximation factor independent of data size; a worst-case example shows this factor to be tight for Euclidean co-clustering. However, empirically the approximation factor is observed to be conservative, so our method can also be used in practice.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Active learning using mean shift optimization for robot grasping

Kroemer, O., Detry, R., Piater, J., Peters, J.

In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), pages: 2610-2615, IEEE Service Center, Piscataway, NJ, USA, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2009 (inproceedings)

Abstract
When children learn to grasp a new object, they often know several possible grasping points from observing a parent‘s demonstration and subsequently learn better grasps by trial and error. From a machine learning point of view, this process is an active learning approach. In this paper, we present a new robot learning framework for reproducing this ability in robot grasping. For doing so, we chose a straightforward approach: first, the robot observes a few good grasps by demonstration and learns a value function for these grasps using Gaussian process regression. Subsequently, it chooses grasps which are optimal with respect to this value function using a mean-shift optimization approach, and tries them out on the real system. Upon every completed trial, the value function is updated, and in the following trials it is more likely to choose even better grasping points. This method exhibits fast learning due to the data-efficiency of Gaussian process regression framework and the fact th at t he mean-shift method provides maxima of this cost function. Experiments were repeatedly carried out successfully on a real robot system. After less than sixty trials, our system has adapted its grasping policy to consistently exhibit successful grasps.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Sparse online model learning for robot control with support vector regression

Nguyen-Tuong, D., Schölkopf, B., Peters, J.

In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), pages: 3121-3126, IEEE Service Center, Piscataway, NJ, USA, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2009 (inproceedings)

Abstract
The increasing complexity of modern robots makes it prohibitively hard to accurately model such systems as required by many applications. In such cases, machine learning methods offer a promising alternative for approximating such models using measured data. To date, high computational demands have largely restricted machine learning techniques to mostly offline applications. However, making the robots adaptive to changes in the dynamics and to cope with unexplored areas of the state space requires online learning. In this paper, we propose an approximation of the support vector regression (SVR) by sparsification based on the linear independency of training data. As a result, we obtain a method which is applicable in real-time online learning. It exhibits competitive learning accuracy when compared with standard regression techniques, such as nu-SVR, Gaussian process regression (GPR) and locally weighted projection regression (LWPR).

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Implicit Wiener Series Analysis of Epileptic Seizure Recordings

Barbero, A., Franz, M., Drongelen, W., Dorronsoro, J., Schölkopf, B., Grosse-Wentrup, M.

In EMBC 2009, pages: 5304-5307, (Editors: Y Kim and B He and G Worrell and X Pan), IEEE Service Center, Piscataway, NJ, USA, 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, September 2009 (inproceedings)

Abstract
Implicit Wiener series are a powerful tool to build Volterra representations of time series with any degree of nonlinearity. A natural question is then whether higher order representations yield more useful models. In this work we shall study this question for ECoG data channel relationships in epileptic seizure recordings, considering whether quadratic representations yield more accurate classifiers than linear ones. To do so we first show how to derive statistical information on the Volterra coefficient distribution and how to construct seizure classification patterns over that information. As our results illustrate, a quadratic model seems to provide no advantages over a linear one. Nevertheless, we shall also show that the interpretability of the implicit Wiener series provides insights into the inter-channel relationships of the recordings.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Incorporating Prior Knowledge on Class Probabilities into Local Similarity Measures for Intermodality Image Registration

Hofmann, M., Schölkopf, B., Bezrukov, I., Cahill, N.

In Proceedings of the MICCAI 2009 Workshop on Probabilistic Models for Medical Image Analysis , pages: 220-231, (Editors: W Wells and S Joshi and K Pohl), PMMIA, September 2009 (inproceedings)

Abstract
We present a methodology for incorporating prior knowledge on class probabilities into the registration process. By using knowledge from the imaging modality, pre-segmentations, and/or probabilistic atlases, we construct vectors of class probabilities for each image voxel. By defining new image similarity measures for distribution-valued images, we show how the class probability images can be nonrigidly registered in a variational framework. An experiment on nonrigid registration of MR and CT full-body scans illustrates that the proposed technique outperforms standard mutual information (MI) and normalized mutual information (NMI) based registration techniques when measured in terms of target registration error (TRE) of manually labeled fiducials.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Markerless 3D Face Tracking (DAGM 2009)

Walder, C., Breidt, M., Bülthoff, H., Schölkopf, B., Curio, C.

In Pattern Recognition, Lecture Notes in Computer Science, Vol. 5748 , pages: 41-50, (Editors: J Denzler and G Notni and H Süsse), Springer, Berlin, Germany, 31st Symposium of the German Association for Pattern Recognition (DAGM), September 2009 (inproceedings)

Abstract
We present a novel algorithm for the markerless tracking of deforming surfaces such as faces. We acquire a sequence of 3D scans along with color images at 40Hz. The data is then represented by implicit surface and color functions, using a novel partition-of-unity type method of efficiently combining local regressors using nearest neighbor searches. Both these functions act on the 4D space of 3D plus time, and use temporal information to handle the noise in individual scans. After interactive registration of a template mesh to the first frame, it is then automatically deformed to track the scanned surface, using the variation of both shape and color as features in a dynamic energy minimization problem. Our prototype system yields high-quality animated 3D models in correspondence, at a rate of approximately twenty seconds per timestep. Tracking results for faces and other objects are presented.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Object Localization with Global and Local Context Kernels

Blaschko, M., Lampert, C.

In British Machine Vision Conference 2009, pages: 1-11, BMVC, September 2009 (inproceedings)

Abstract
Recent research has shown that the use of contextual cues significantly improves performance in sliding window type localization systems. In this work, we propose a method that incorporates both global and local context information through appropriately defined kernel functions. In particular, we make use of a weighted combination of kernels defined over local spatial regions, as well as a global context kernel. The relative importance of the context contributions is learned automatically, and the resulting discriminant function is of a form such that localization at test time can be solved efficiently using a branch and bound optimization scheme. By specifying context directly with a kernel learning approach, we achieve high localization accuracy with a simple and efficient representation. This is in contrast to other systems that incorporate context for which expensive inference needs to be done at test time. We show experimentally on the PASCAL VOC datasets that the inclusion of context can significantly improve localization performance, provided the relative contributions of context cues are learned appropriately.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Efficient Sample Reuse in EM-Based Policy Search

Hachiya, H., Peters, J., Sugiyama, M.

In 16th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages: 469-484, (Editors: Buntine, W. , M. Grobelnik, D. Mladenic, J. Shawe-Taylor), Springer, Berlin, Germany, ECML PKDD, September 2009 (inproceedings)

Abstract
Direct policy search is a promising reinforcement learning framework in particular for controlling in continuous, high-dimensional systems such as anthropomorphic robots. Policy search often requires a large number of samples for obtaining a stable policy update estimator due to its high flexibility. However, this is prohibitive when the sampling cost is expensive. In this paper, we extend a EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse, is demonstrated through a robot learning experiment.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Active Structured Learning for High-Speed Object Detection

Lampert, C., Peters, J.

In DAGM 2009, pages: 221-231, (Editors: Denzler, J. , G. Notni, H. Süsse), Springer, Berlin, Germany, 31st Annual Symposium of the German Association for Pattern Recognition, September 2009 (inproceedings)

Abstract
High-speed smooth and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially if hard real-time constraints apply like in robotics scenarios. In this work, we introduce a method for learning a discriminative object tracking system based on the recent structured regression framework for object localization. Using a kernel function that allows fast evaluation on the GPU, the resulting system can process video streams at speed of 100 frames per second or more. Consecutive frames in high speed video sequences are typically very redundant, and for training an object detection system, it is sufficient to have training labels from only a subset of all images. We propose an active learning method that select training examples in a data-driven way, thereby minimizing the required number of training labeling. Experiments on realistic data show that the active learning is superior to previously used methods for dataset subsampling for this task.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Generalized Clustering via Kernel Embeddings

Jegelka, S., Gretton, A., Schölkopf, B., Sriperumbudur, B., von Luxburg, U.

In KI 2009: AI and Automation, Lecture Notes in Computer Science, Vol. 5803, pages: 144-152, (Editors: B Mertsching and M Hund and Z Aziz), Springer, Berlin, Germany, 32nd Annual Conference on Artificial Intelligence (KI), September 2009 (inproceedings)

Abstract
We generalize traditional goals of clustering towards distinguishing components in a non-parametric mixture model. The clusters are not necessarily based on point locations, but on higher order criteria. This framework can be implemented by embedding probability distributions in a Hilbert space. The corresponding clustering objective is very general and relates to a range of common clustering concepts.

ei

PDF PDF Web DOI [BibTex]

PDF PDF Web DOI [BibTex]


no image
Discovering Temporal Patterns of Differential Gene Expression in Microarray Time Series

Stegle, O., Denby, KJ., Wild, DL., McHattie, S., Mead, A., Ghahramani, Z., Borgwardt, KM.

In Proceedings of the German Conference on Bioinformatics 2009 (GCB 2009), pages: 133-142, (Editors: Grosse, I. , S. Neumann, S. Posch, F. Schreiber, P. F. Stadler), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics (GCB '09), September 2009 (inproceedings)

Abstract
A wealth of time series of microarray measurements have become available over recent years. Several two-sample tests for detecting differential gene expression in these time series have been defined, but they can only answer the question whether a gene is differentially expressed across the whole time series, not in which intervals it is differentially expressed. In this article, we propose a Gaussian process based approach for studying these dynamics of differential gene expression. In experiments on Arabidopsis thaliana gene expression levels, our novel technique helps us to uncover that the family of WRKY transcription factors appears to be involved in the early response to infection by a fungal pathogen.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Varieties of Justification in Machine Learning

Corfield, D.

In Proceedings of Multiplicity and Unification in Statistics and Probability, pages: 1-10, Multiplicity and Unification in Statistics and Probability, June 2009 (inproceedings)

Abstract
The field of machine learning has flourished over the past couple of decades. With huge amounts of data available, efficient algorithms can learn to extrapolate from their training sets to become very accurate classifiers. For example, it is straightforward now to develop classifiers which achieve accuracies of around 99% on databases of handwritten digits. Now these algorithms have been devised by theorists who arrive at the problem of machine learning with a range of different philosophical outlooks on the subject of inductive reasoning. This has led to a wide range of theoretical rationales for their work. In this talk I shall classify the different forms of justification for inductive machine learning into four kinds, and make some comparisons between them. With little by way of theoretical knowledge to aid in the learning tasks, while the relevance of these justificatory approaches for the inductive reasoning of the natural sciences is questionable, certain issues surrounding the presuppositions of inductive reasoning are brought sharply into focus. In particular, Frequentist, Bayesian and MDL outlooks can be compared.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

Hill, J., Farquhar, J., Martens, S., Biessmann, F., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 665-672, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
From an information-theoretic perspective, a noisy transmission system such as a visual Brain-Computer Interface (BCI) speller could benefit from the use of errorcorrecting codes. However, optimizing the code solely according to the maximal minimum-Hamming-distance criterion tends to lead to an overall increase in target frequency of target stimuli, and hence a significantly reduced average target-to-target interval (TTI), leading to difficulties in classifying the individual event-related potentials (ERPs) due to overlap and refractory effects. Clearly any change to the stimulus setup must also respect the possible psychophysiological consequences. Here we report new EEG data from experiments in which we explore stimulus types and codebooks in a within-subject design, finding an interaction between the two factors. Our data demonstrate that the traditional, rowcolumn code has particular spatial properties that lead to better performance than one would expect from its TTIs and Hamming-distances alone, but nonetheless error-correcting codes can improve performance provided the right stimulus type is used.

ei

PDF PDF Web [BibTex]

PDF PDF Web [BibTex]


no image
Influence of graph construction on graph-based clustering measures

Maier, M., von Luxburg, U., Hein, M.

In Advances in neural information processing systems 21, pages: 1025-1032, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Local Gaussian Process Regression for Real Time Online Model Learning and Control

Nguyen-Tuong, D., Seeger, M., Peters, J.

In Advances in neural information processing systems 21, pages: 1193-1200, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian Process regression (GPR) with local GP models (LGP). The training data is partitioned in local regions, for each an individual GP model is trained. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other nonparametric regression methods show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and nu-SVR.

ei

PDF Web [BibTex]

PDF Web [BibTex]