

2018


Deep Reinforcement Learning for Event-Triggered Control

Baumann, D., Zhu, J., Martius, G., Trimpe, S.

In Proceedings of the 57th IEEE International Conference on Decision and Control (CDC), pages: 943-950, December 2018 (inproceedings)

al ics

arXiv PDF DOI Project Page Project Page [BibTex]



Gait learning for soft microrobots controlled by light fields

Rohr, A. V., Trimpe, S., Marco, A., Fischer, P., Palagi, S.

In International Conference on Intelligent Robots and Systems (IROS) 2018, pages: 6199-6206, International Conference on Intelligent Robots and Systems 2018, October 2018 (inproceedings)

Abstract
Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing environments. However, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is highly data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semisynthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot’s locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on light-controlled soft microrobots and probabilistic learning control.

ics pf

arXiv IEEE Xplore DOI Project Page [BibTex]



On the Integration of Optical Flow and Action Recognition

Sevilla-Lara, L., Liao, Y., Güney, F., Jampani, V., Geiger, A., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 281-297, Springer, Cham, October 2018 (inproceedings)

Abstract
Most of the top performing action recognition methods use optical flow as a "black box" input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations to better understand how these affect a state-of-the-art action recognition method. Furthermore, we fine tune two neural-network flow methods end-to-end on the most widely used action recognition dataset (UCF101). Based on these experiments, we make the following five observations: 1) optical flow is useful for action recognition because it is invariant to appearance, 2) optical flow methods are optimized to minimize end-point-error (EPE), but the EPE of current methods is not well correlated with action recognition performance, 3) for the flow methods tested, accuracy at boundaries and at small displacements is most correlated with action recognition performance, 4) training optical flow to minimize classification error instead of minimizing EPE improves recognition performance, and 5) optical flow learned for the task of action recognition differs from traditional optical flow especially inside the human body and at the boundary of the body. These observations may encourage optical flow researchers to look beyond EPE as a goal and guide action recognition researchers to seek better motion cues, leading to a tighter integration of the optical flow and action recognition communities.

avg ps

arXiv DOI [BibTex]



Towards Robust Visual Odometry with a Multi-Camera System

Liu, P., Geppert, M., Heng, L., Sattler, T., Geiger, A., Pollefeys, M.

In International Conference on Intelligent Robots and Systems (IROS) 2018, International Conference on Intelligent Robots and Systems, October 2018 (inproceedings)

Abstract
We present a visual odometry (VO) algorithm for a multi-camera system and robust operation in challenging environments. Our algorithm consists of a pose tracker and a local mapper. The tracker estimates the current pose by minimizing photometric errors between the most recent keyframe and the current frame. The mapper initializes the depths of all sampled feature points using plane-sweeping stereo. To reduce pose drift, a sliding window optimizer is used to refine poses and structure jointly. Our formulation is flexible enough to support an arbitrary number of stereo cameras. We evaluate our algorithm thoroughly on five datasets. The datasets were captured in different conditions: daytime, night-time with near-infrared (NIR) illumination and night-time without NIR illumination. Experimental results show that a multi-camera setup makes the VO more robust to challenging environments, especially night-time conditions, in which a single stereo configuration fails easily due to the lack of features.

avg

pdf Project Page [BibTex]



Learning Priors for Semantic 3D Reconstruction

Cherabier, I., Schönberger, J., Oswald, M., Pollefeys, M., Geiger, A.

In Computer Vision – ECCV 2018, Springer International Publishing, Cham, September 2018 (inproceedings)

Abstract
We present a novel semantic 3D reconstruction framework which embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our network architecture requires only a moderate number of parameters while keeping a high level of expressiveness which enables learning from very little data. Experiments on real and synthetic datasets demonstrate that our network achieves higher accuracy compared to a purely variational approach while at the same time requiring two orders of magnitude fewer iterations to converge. Moreover, our approach handles ten times more semantic class labels using the same computational resources.

avg

pdf suppmat Project Page Video DOI Project Page [BibTex]



Discovering and Teaching Optimal Planning Strategies

Lieder, F., Callaway, F., Krueger, P. M., Das, P., Griffiths, T. L., Gul, S.

In The 14th biannual conference of the German Society for Cognitive Science, GK, September 2018, Falk Lieder and Frederick Callaway contributed equally to this publication. (inproceedings)

Abstract
How should we think and decide, and how can we learn to make better decisions? To address these questions we formalize the discovery of cognitive strategies as a metacognitive reinforcement learning problem. This formulation leads to a computational method for deriving optimal cognitive strategies and a feedback mechanism for accelerating the process by which people learn how to make better decisions. As a proof of concept, we apply our approach to develop an intelligent system that teaches people optimal planning strategies. Our training program combines a novel process-tracing paradigm that makes people’s latent planning strategies observable with an intelligent system that gives people feedback on how their planning strategy could be improved. The pedagogy of our intelligent tutor is based on the theory that people discover their cognitive strategies through metacognitive reinforcement learning. Concretely, the tutor’s feedback is designed to maximally accelerate people’s metacognitive reinforcement learning towards the optimal cognitive strategy. A series of four experiments confirmed that training with the cognitive tutor significantly improved people’s decision-making competency: Experiment 1 demonstrated that the cognitive tutor’s feedback accelerates participants’ metacognitive learning. Experiment 2 found that this training effect transfers to more difficult planning problems in more complex environments. Experiment 3 found that these transfer effects are retained for at least 24 hours after the training. Finally, Experiment 4 found that practicing with the cognitive tutor conveys additional benefits above and beyond verbal description of the optimal planning strategy. The results suggest that promoting metacognitive reinforcement learning with optimal feedback is a promising approach to improving the human mind.

re

link (url) Project Page [BibTex]



Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220, pages: 713-731, Springer, Cham, September 2018 (inproceedings)

avg ps

pdf suppmat Video Project Page DOI Project Page [BibTex]



Discovering Rational Heuristics for Risky Choice

Gul, S., Krueger, P. M., Callaway, F., Griffiths, T. L., Lieder, F.

The 14th biannual conference of the German Society for Cognitive Science, GK, September 2018 (conference)

Abstract
How should we think and decide to make the best possible use of our precious time and limited cognitive resources? And how do people’s cognitive strategies compare to this ideal? We study these questions in the domain of multi-alternative risky choice using the methodology of resource-rational analysis. To answer the first question, we leverage a new meta-level reinforcement learning algorithm to derive optimal heuristics for four different risky choice environments. We find that our method rediscovers two fast-and-frugal heuristics that people are known to use, namely Take-The-Best and choosing randomly, as resource-rational strategies for specific environments. Our method also discovered a novel heuristic that combines elements of Take-The-Best and Satisficing. To answer the second question, we use the Mouselab paradigm to measure how people’s decision strategies compare to the predictions of our resource-rational analysis. We found that our resource-rational analysis correctly predicted which strategies people use and under which conditions they use them. While people generally tend to make rational use of their limited resources overall, their strategy choices do not always fully exploit the structure of each decision problem. Overall, people’s decision operations were about 88% as resource-rational as they could possibly be. A formal model comparison confirmed that our resource-rational model explained people’s decision strategies significantly better than the Directed Cognition model of Gabaix et al. (2006). Our study is a proof-of-concept that optimal cognitive strategies can be automatically derived from the principle of resource-rationality. Our results suggest that resource-rational analysis is a promising approach for uncovering people’s cognitive strategies and revisiting the debate about human rationality with a more realistic normative standard.

re

link (url) Project Page [BibTex]



SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images

Coors, B., Condurache, A. P., Geiger, A.

European Conference on Computer Vision (ECCV), September 2018 (conference)

Abstract
Omnidirectional cameras offer great benefits over classical cameras wherever a wide field of view is essential, such as in virtual reality applications or in autonomous robots. Unfortunately, standard convolutional neural networks are not well suited for this scenario as the natural projection surface is a sphere which cannot be unwrapped to a plane without introducing significant distortions, particularly in the polar regions. In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural network models to the omnidirectional case. We demonstrate the effectiveness of our method on the tasks of image classification and object detection, exploiting two newly created semi-synthetic and real-world omnidirectional datasets.

avg

pdf suppmat Project Page [BibTex]


Learning to Select Computations

Callaway, F., Gul, S., Krueger, P. M., Griffiths, T. L., Lieder, F.

In Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference, August 2018, Frederick Callaway and Sayan Gul and Falk Lieder contributed equally to this publication. (inproceedings)

Abstract
The efficient use of limited computational resources is an essential ingredient of intelligence. Selecting computations optimally according to rational metareasoning would achieve this, but this is computationally intractable. Inspired by psychology and neuroscience, we propose the first concrete and domain-general learning algorithm for approximating the optimal selection of computations: Bayesian metalevel policy search (BMPS). We derive this general, sample-efficient search algorithm for a computation-selecting metalevel policy based on the insight that the value of information lies between the myopic value of information and the value of perfect information. We evaluate BMPS on three increasingly difficult metareasoning problems: when to terminate computation, how to allocate computation between competing options, and planning. Across all three domains, BMPS achieved near-optimal performance and compared favorably to previously proposed metareasoning heuristics. Finally, we demonstrate the practical utility of BMPS in an emergency management scenario, even accounting for the overhead of metareasoning.

re

link (url) Project Page [BibTex]



A machine from machines

Fischer, P.

Nature Physics, 14, pages: 1072–1073, July 2018 (misc)

Abstract
Building spinning microrotors that self-assemble and synchronize to form a gear sounds like an impossible feat. However, it has now been achieved using only a single type of building block -- a colloid that self-propels.

pf

link (url) DOI [BibTex]



Probabilistic Recurrent State-Space Models

Doerr, A., Daniel, C., Schiegg, M., Nguyen-Tuong, D., Schaal, S., Toussaint, M., Trimpe, S.

In Proceedings of the International Conference on Machine Learning (ICML), International Conference on Machine Learning (ICML), July 2018 (inproceedings)

Abstract
State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time-series data. Fully probabilistic SSMs, however, unfortunately often prove hard to train, even for smaller problems. To overcome this limitation, we propose a scalable initialization and training algorithm based on doubly stochastic variational inference and Gaussian processes. In the variational approximation, in contrast to related approaches, we propose to fully capture the latent state temporal correlations to allow for robust training.

am ics

arXiv pdf Project Page [BibTex]



Colloidal Chemical Nanomotors

Alarcon-Correa, M.

Colloidal Chemical Nanomotors, pages: 150, Cuvillier Verlag, MPI-IS, June 2018 (phdthesis)

Abstract
Synthetic sophisticated nanostructures represent a fundamental building block for the development of nanotechnology. The fabrication of nanoparticles complex in structure and material composition is key to build nanomachines that can operate as man-made nanoscale motors, which autonomously convert external energy into motion. To achieve this, asymmetric nanoparticles were fabricated combining a physical vapor deposition technique known as NanoGLAD and wet chemical synthesis. This thesis primarily concerns three complex colloidal systems that have been developed: i) Hollow nanocup inclusion complexes that have a single Au nanoparticle in their pocket. The Au particle can be released with an external trigger. ii) The smallest self-propelling nanocolloids that have been made to date, which give rise to a local concentration gradient that causes enhanced diffusion of the particles. iii) Enzyme-powered pumps that have been assembled using bacteriophages as biological nanoscaffolds. This construct also can be used for enzyme recovery after heterogeneous catalysis.

pf

[BibTex]



Soft Miniaturized Linear Actuators Wirelessly Powered by Rotating Permanent Magnets

Qiu, T., Palagi, S., Sachs, J., Fischer, P.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 3595-3600, May 2018 (inproceedings)

Abstract
Wireless actuation by magnetic fields allows for the operation of untethered miniaturized devices, e.g. in biomedical applications. Nevertheless, generating large controlled forces over relatively large distances is challenging. Magnetic torques are easier to generate and control, but they are not always suitable for the tasks at hand. Moreover, strong magnetic fields are required to generate a sufficient torque, which are difficult to achieve with electromagnets. Here, we demonstrate a soft miniaturized actuator that transforms an externally applied magnetic torque into a controlled linear force. We report the design, fabrication and characterization of both the actuator and the magnetic field generator. We show that the magnet assembly, which is based on a set of rotating permanent magnets, can generate strong controlled oscillating fields over a relatively large workspace. The actuator, which is 3D-printed, can lift a load of more than 40 times its weight. Finally, we show that the actuator can be further miniaturized, paving the way towards strong, wirelessly powered microactuators.

pf

link (url) DOI [BibTex]



Robust Dense Mapping for Large-Scale Dynamic Environments

Barsan, I. A., Liu, P., Pollefeys, M., Geiger, A.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

Abstract
We present a stereo-based dense mapping algorithm for large-scale dynamic urban environments. In contrast to other existing methods, we simultaneously reconstruct the static background, the moving objects, and the potentially moving but currently stationary objects separately, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments. We use both instance-aware semantic segmentation and sparse scene flow to classify objects as either background, moving, or potentially moving, thereby ensuring that the system is able to model objects with the potential to transition from static to dynamic, such as parked cars. Given camera poses estimated from visual odometry, both the background and the (potentially) moving objects are reconstructed separately by fusing the depth maps computed from the stereo input. In addition to visual odometry, sparse scene flow is also used to estimate the 3D motions of the detected moving objects, in order to reconstruct them accurately. A map pruning technique is further developed to improve reconstruction accuracy and reduce memory consumption, leading to increased scalability. We evaluate our system thoroughly on the well-known KITTI dataset. Our system is capable of running on a PC at approximately 2.5 Hz, with the primary bottleneck being the instance-aware semantic segmentation, which is a limitation we hope to address in future work.

avg

pdf Video Project Page Project Page [BibTex]



Online Learning of a Memory for Learning Rates

(nominated for best paper award)

Meier, F., Kappler, D., Schaal, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018, accepted (inproceedings)

Abstract
The promise of learning to learn for robotics rests on the hope that by extracting some information about the learning process itself we can speed up subsequent similar learning tasks. Here, we introduce a computationally efficient online meta-learning algorithm that builds and optimizes a memory model of the optimal learning rate landscape from previously observed gradient behaviors. While performing task specific optimization, this memory of learning rates predicts how to scale currently observed gradients. After applying the gradient scaling our meta-learner updates its internal memory based on the observed effect its prediction had. Our meta-learner can be combined with any gradient-based optimizer, learns on the fly and can be transferred to new optimization tasks. In our evaluations we show that our meta-learning algorithm speeds up learning of MNIST classification and a variety of learning control tasks, either in batch or online learning settings.

am

pdf video code [BibTex]



Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks

Sutanto, G., Su, Z., Schaal, S., Meier, F.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

am

pdf video [BibTex]



RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials

Paschalidou, D., Ulusoy, A. O., Schmitt, C., Gool, L., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRF) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.

avg

pdf suppmat Video Project Page code Poster Project Page [BibTex]



On Time Optimization of Centroidal Momentum Dynamics

Ponton, B., Herzog, A., Del Prete, A., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 5776-5782, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
Recently, the centroidal momentum dynamics has received substantial attention to plan dynamically consistent motions for robots with arms and legs in multi-contact scenarios. However, it is also non-convex, which renders any optimization approach difficult, and timing is usually kept fixed in most trajectory optimization techniques to not introduce additional non-convexities to the problem. But this can limit the versatility of the algorithms. In our previous work, we proposed a convex relaxation of the problem that allowed to efficiently compute momentum trajectories and contact forces. However, our approach could not minimize a desired angular momentum objective, which seriously limited its applicability. Noticing that the non-convexity introduced by the time variables is of similar nature as the centroidal dynamics one, we propose two convex relaxations to the problem based on trust regions and soft constraints. The resulting approaches can compute time-optimized dynamically consistent trajectories sufficiently fast to make the approach real-time capable. The performance of the algorithm is demonstrated in several multi-contact scenarios for a humanoid robot. In particular, we show that the proposed convex relaxation of the original problem finds solutions that are consistent with the original non-convex problem and illustrate how timing optimization allows to find motion plans that would be difficult to plan with fixed timing. Implementation details and demos can be found in the source code available at https://git-amd.tuebingen.mpg.de/bponton/timeoptimization.

am mg

link (url) DOI [BibTex]



Deep Marching Cubes: Learning Explicit Surface Representations

Liao, Y., Donne, S., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Existing learning-based solutions to 3D surface prediction cannot be trained end-to-end as they operate on intermediate representations (e.g., TSDF) from which 3D surface meshes must be extracted in a post-processing step (e.g., via the marching cubes algorithm). In this paper, we investigate the problem of end-to-end 3D surface prediction. We first demonstrate that the marching cubes algorithm is not differentiable and propose an alternative differentiable formulation which we insert as a final layer into a 3D convolutional neural network. We further propose a set of loss functions which allow for training our model with sparse point supervision. Our experiments demonstrate that the model allows for predicting sub-voxel accurate 3D shapes of arbitrary topology. Additionally, it learns to complete shapes and to separate an object's inside from its outside even in the presence of sparse and incomplete ground truth. We investigate the benefits of our approach on the task of inferring shapes from 3D point clouds. Our model is flexible and can be combined with a variety of shape encoder and shape inference techniques.

avg

pdf suppmat Video Project Page Poster Project Page [BibTex]



Semantic Visual Localization

Schönberger, J., Pollefeys, M., Geiger, A., Sattler, T.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

avg

pdf suppmat Poster Project Page [BibTex]



Which Training Methods for GANs do actually Converge?

Mescheder, L., Geiger, A., Nowozin, S.

International Conference on Machine Learning (ICML), 2018 (conference)

Abstract
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distributions lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn high-resolution generative image models for a variety of datasets with little hyperparameter tuning.

avg

code video paper supplement slides poster Project Page [BibTex]


L4: Practical loss-based stepsize adaptation for deep learning

Rolinek, M., Martius, G.

In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), pages: 6434-6444, (Editors: S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett), Curran Associates, Inc., 2018 (inproceedings)

al

Github link (url) Project Page [BibTex]



Learning 3D Shape Completion from Laser Scan Data with Weak Supervision

Stutz, D., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, full supervision is required, which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks, resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach is able to generalize to other object categories as well.

avg

pdf suppmat Project Page Poster Project Page [BibTex]

Systematic self-exploration of behaviors for robots in a dynamical systems framework

Pinneri, C., Martius, G.

In Proc. Artificial Life XI, pages: 319-326, MIT Press, Cambridge, MA, 2018 (inproceedings)

Abstract
One of the challenges of this century is to understand the neural mechanisms behind cognitive control and learning. Recent investigations propose biologically plausible synaptic mechanisms for self-organizing controllers, in the spirit of Hebbian learning. In particular, differential extrinsic plasticity (DEP) [Der and Martius, PNAS 2015] has proven to enable embodied agents to self-organize their individual sensorimotor development and generate highly coordinated behaviors during their interaction with the environment. These behaviors are attractors of a dynamical system. In this paper, we use the DEP rule to generate attractors and we combine it with a “repelling potential” which allows the system to actively explore all its attractor behaviors in a systematic way. With a view to a self-determined exploration of goal-free behaviors, our framework enables switching between different motion patterns in an autonomous and sequential fashion. Our algorithm is able to recover all the attractor behaviors in a toy system and it is also effective in two simulated environments. A spherical robot discovers all its major rolling modes and a hexapod robot learns to locomote in 50 different ways in 30 min.

al

link (url) DOI Project Page [BibTex]

Learning Transformation Invariant Representations with Weak Supervision

Coors, B., Condurache, A., Mertins, A., Geiger, A.

In International Conference on Computer Vision Theory and Applications, International Conference on Computer Vision Theory and Applications, 2018 (inproceedings)

Abstract
Deep convolutional neural networks are the current state-of-the-art solution to many computer vision tasks. However, their ability to handle large global and local image transformations is limited. Consequently, extensive data augmentation is often utilized to incorporate prior knowledge about desired invariances to geometric transformations such as rotations or scale changes. In this work, we combine data augmentation with an unsupervised loss which enforces similarity between the predictions of augmented copies of an input sample. Our loss acts as an effective regularizer which facilitates the learning of transformation invariant representations. We investigate the effectiveness of the proposed similarity loss on rotated MNIST and the German Traffic Sign Recognition Benchmark (GTSRB) in the context of different classification models including ladder networks. Our experiments demonstrate improvements with respect to the standard data augmentation approach for supervised and semi-supervised learning tasks, in particular in the presence of little annotated data. In addition, we analyze the performance of the proposed approach with respect to its hyperparameters, including the strength of the regularization as well as the layer where representation similarity is enforced.

avg

pdf [BibTex]

Learning equations for extrapolation and control

Sahoo, S. S., Lampert, C. H., Martius, G.

In Proc. 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 2018, 80, pages: 4442-4450, http://proceedings.mlr.press/v80/sahoo18a/sahoo18a.pdf, (Editors: Dy, Jennifer and Krause, Andreas), PMLR, 2018 (inproceedings)

Abstract
We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.

al

Code Arxiv Poster Slides link (url) Project Page [BibTex]

Robust Affordable 3D Haptic Sensation via Learning Deformation Patterns

Sun, H., Martius, G.

Proceedings International Conference on Humanoid Robots, pages: 846-853, IEEE, New York, NY, USA, 2018 IEEE-RAS International Conference on Humanoid Robots, 2018, Oral Presentation (conference)

Abstract
Haptic sensation is an important modality for interacting with the real world. This paper proposes a general framework of inferring haptic forces on the surface of a 3D structure from internal deformations using a small number of physical sensors instead of employing dense sensor arrays. Using machine learning techniques, we optimize the sensor number and their placement and are able to obtain high-precision force inference for a robotic limb using as few as 9 sensors. For the optimal and sparse placement of the measurement units (strain gauges), we employ data-driven methods based on data obtained by finite element simulation. We compare data-driven approaches with model-based methods relying on geometric distance and information criteria such as Entropy and Mutual Information. We validate our approach on a modified limb of the “Poppy” robot [1] and obtain 8 mm localization precision.

al

DOI Project Page [BibTex]

Unsupervised Contact Learning for Humanoid Estimation and Control

Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 411-417, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
This work presents a method for contact state estimation using fuzzy clustering to learn contact probability for full, six-dimensional humanoid contacts. The data required for training is solely from proprioceptive sensors - endeffector contact wrench sensors and inertial measurement units (IMUs) - and the method is completely unsupervised. The resulting cluster means are used to efficiently compute the probability of contact in each of the six endeffector degrees of freedom (DoFs) independently. This clustering-based contact probability estimator is validated in a kinematics-based base state estimator in a simulation environment with realistic added sensor noise for locomotion over rough, low-friction terrain on which the robot is subject to foot slip and rotation. The proposed base state estimator which utilizes these six DoF contact probability estimates is shown to perform considerably better than that which determines kinematic contact constraints purely based on measured normal force.

am mg

link (url) DOI [BibTex]

Learning Task-Specific Dynamics to Improve Whole-Body Control

Gams, A., Mason, S., Ude, A., Schaal, S., Righetti, L.

In Hua, IEEE, Beijing, China, November 2018 (inproceedings)

Abstract
In task-based inverse dynamics control, reference accelerations used to follow a desired plan can be broken down into feedforward and feedback trajectories. The feedback term accounts for tracking errors that are caused by inaccurate dynamic models or external disturbances. On underactuated, free-floating robots, such as humanoids, high feedback terms can be used to improve tracking accuracy; however, this can lead to very stiff behavior or poor tracking accuracy due to limited control bandwidth. In this paper, we show how to reduce the required contribution of the feedback controller by incorporating learned task-space reference accelerations. Thus, we i) improve the execution of the given specific task, and ii) offer the means to reduce feedback gains, providing for greater compliance of the system. With a systematic approach we also reduce heuristic tuning of the model parameters and feedback gains, often present in real-world experiments. In contrast to learning task-specific joint-torques, which might produce a similar effect but can lead to poor generalization, our approach directly learns the task-space dynamics of the center of mass of a humanoid robot. Simulated and real-world results on the lower part of the Sarcos Hermes humanoid robot demonstrate the applicability of the approach.

am mg

link (url) [BibTex]

An MPC Walking Framework With External Contact Forces

Mason, S., Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1785-1790, IEEE, Brisbane, Australia, May 2018 (inproceedings)

Abstract
In this work, we present an extension to a linear Model Predictive Control (MPC) scheme that plans external contact forces for the robot when given multiple contact locations and their corresponding friction cones. To this end, we set up a two-step optimization problem. In the first optimization, we compute the Center of Mass (CoM) trajectory and foot step locations, and introduce slack variables to account for violating the imposed constraints on the Zero Moment Point (ZMP). We then use the slack variables to trigger the second optimization, in which we calculate the optimal external force that compensates for the ZMP tracking error. This optimization considers multiple contact positions within the environment by formulating the problem as a Mixed Integer Quadratic Program (MIQP) that can be solved at a speed between 100-300 Hz. Once contact is created, the MIQP reduces to a single Quadratic Program (QP) that can be solved in real-time (< 1 kHz). Simulations show that the presented walking control scheme can withstand disturbances 2-3× larger with the additional force provided by a hand contact.

am mg

link (url) DOI [BibTex]


2008


Human movement generation based on convergent flow fields: A computational model and a behavioral experiment

Hoffmann, H., Schaal, S.

In Advances in Computational Motor Control VII, Symposium at the Society for Neuroscience Meeting, Washington DC, 2008, 2008, clmc (inproceedings)

am

link (url) [BibTex]

Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields

Park, D., Hoffmann, H., Pastor, P., Schaal, S.

In IEEE International Conference on Humanoid Robots, 2008., 2008, clmc (inproceedings)

am

PDF [BibTex]

Emergence of Interaction Among Adaptive Agents

Martius, G., Nolfi, S., Herrmann, J. M.

In Proc. From Animals to Animats 10 (SAB 2008), 5040, pages: 457-466, LNCS, Springer, 2008 (inproceedings)

al

DOI [BibTex]

The dual role of uncertainty in force field learning

Mistry, M., Theodorou, E., Hoffmann, H., Schaal, S.

In Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM), Naples, Florida, April 29-May 4, 2008, clmc (inproceedings)

Abstract
Force field experiments have been a successful paradigm for studying the principles of planning, execution, and learning in human arm movements. Subjects have been shown to cope with the disturbances generated by force fields by learning internal models of the underlying dynamics to predict disturbance effects or by increasing arm impedance (via co-contraction) if a predictive approach becomes infeasible. Several studies have addressed the issue of uncertainty in force field learning. Scheidt et al. demonstrated that subjects exposed to a viscous force field of fixed structure but varying strength (randomly changing from trial to trial) learn to adapt to the mean disturbance, regardless of the statistical distribution. Takahashi et al. additionally show a decrease in strength of after-effects after learning in the randomly varying environment. Thus they suggest that the nervous system adopts a dual strategy: learning an internal model of the mean of the random environment, while simultaneously increasing arm impedance to minimize the consequence of errors. In this study, we examine what role variance plays in the learning of uncertain force fields. We use a 7 degree-of-freedom exoskeleton robot as a manipulandum (Sarcos Master Arm, Sarcos, Inc.), and apply a 3D viscous force field of fixed structure and strength randomly selected from trial to trial. Additionally, in separate blocks of trials, we alter the variance of the randomly selected strength multiplier (while keeping a constant mean). In each block, after sufficient learning has occurred, we apply catch trials with no force field and measure the strength of after-effects. As expected in higher variance cases, results show increasingly smaller levels of after-effects as the variance is increased, thus implying subjects choose the robust strategy of increasing arm impedance to cope with higher levels of uncertainty. Interestingly, however, subjects show an increase in after-effect strength with a small amount of variance as compared to the deterministic (zero variance) case. This result implies that a small amount of variability aids in internal model formation, presumably a consequence of the additional amount of exploration conducted in the workspace of the task.

am

[BibTex]

Dynamic movement primitives for movement generation motivated by convergent force fields in frog

Hoffmann, H., Pastor, P., Schaal, S.

In Adaptive Motion of Animals and Machines (AMAM), 2008, clmc (inproceedings)

am

PDF [BibTex]

Structure from Behavior in Autonomous Agents

Martius, G., Fiedler, K., Herrmann, J.

In Proc. IEEE Intl. Conf. Intelligent Robots and Systems (IROS 2008), pages: 858 - 862, 2008 (inproceedings)

al

DOI [BibTex]

Behavioral experiments on reinforcement learning in human motor control

Hoffmann, H., Theodorou, E., Schaal, S.

In Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM), Naples, Florida, April 29-May 4, 2008, clmc (inproceedings)

Abstract
Reinforcement learning (RL) - learning solely based on reward or cost feedback - is widespread in robotics control and has also been suggested as a computational model for human motor control. In human motor control, however, hardly any experiment has studied reinforcement learning. Here, we study learning based on visual cost feedback in a reaching task in three experiments: (1) to establish a simple enough experiment for RL, (2) to study spatial localization of RL, and (3) to study the dependence of RL on the cost function. In experiment (1), subjects sit in front of a drawing tablet and look at a screen onto which the drawing pen's position is projected. Beginning from a start point, their task is to move with the pen through a target point presented on screen. Visual feedback about the pen's position is given only before movement onset. At the end of a movement, subjects get visual feedback only about the cost of this trial. We choose as cost the squared distance between target and virtual pen position at the target line. Above a threshold value, the cost was fixed at this value. In the mapping of the pen's position onto the screen, we added a bias (unknown to the subject) and Gaussian noise. As a result, subjects could learn the bias, and thus, showed reinforcement learning. In experiment (2), we randomly altered the target position between three different locations (three different directions from start point: -45, 0, 45). For each direction, we chose a different bias. As a result, subjects learned all three bias values simultaneously. Thus, RL can be spatially localized. In experiment (3), we varied the sensitivity of the cost function by multiplying the squared distance with a constant value C, while keeping the same cut-off threshold. As in experiment (2), we had three target locations. We assigned to each location a different C value (this assignment was randomized between subjects). Since subjects learned the three locations simultaneously, we could directly compare the effect of the different cost functions. As a result, we found an optimal C value; if C was too small (insensitive cost), learning was slow; if C was too large (narrow cost valley), the exploration time was longer and learning delayed. Thus, reinforcement learning in human motor control appears to be sen

am

[BibTex]

Movement generation by learning from demonstration and generalization to new targets

Pastor, P., Hoffmann, H., Schaal, S.

In Adaptive Motion of Animals and Machines (AMAM), 2008, clmc (inproceedings)

am

PDF [BibTex]

Combining dynamic movement primitives and potential fields for online obstacle avoidance

Park, D., Hoffmann, H., Schaal, S.

In Adaptive Motion of Animals and Machines (AMAM), Cleveland, Ohio, 2008, 2008, clmc (inproceedings)

am

link (url) [BibTex]

Computational model for movement learning under uncertain cost

Theodorou, E., Hoffmann, H., Mistry, M., Schaal, S.

In Abstracts of the Society of Neuroscience Meeting (SFN 2008), Washington, DC 2008, 2008, clmc (inproceedings)

Abstract
Stochastic optimal control is a framework for computing control commands that lead to an optimal behavior under a given cost. Despite the long history of optimal control in engineering, it has only recently been applied to describe human motion. So far, stochastic optimal control has been mainly used in tasks that are already learned, such as reaching to a target. For learning, however, there are only a few cases where optimal control has been applied. The main assumptions of stochastic optimal control that restrict its application to tasks after learning are the a priori knowledge of (1) a quadratic cost function, (2) a state space model that captures the kinematics and/or dynamics of the musculoskeletal system, and (3) a measurement equation that models the proprioceptive and/or exteroceptive feedback. Under these assumptions, a sequence of control gains is computed that is optimal with respect to the prespecified cost function. In our work, we relax the assumption of the a priori known cost function and provide a computational framework for modeling tasks that involve learning. Typically, a cost function consists of two parts: one part that models the task constraints, like squared distance to goal at movement endpoint, and one part that integrates over the squared control commands. In learning a task, the first part of this cost function will be adapted. We use an expectation-maximization scheme for learning: the expectation step optimizes the task constraints through gradient descent of a reward function and the maximization step optimizes the control commands. Our computational model is tested and compared with data given from a behavioral experiment. In this experiment, subjects sit in front of a drawing tablet and look at a screen onto which the drawing-pen's position is projected. Beginning from a start point, their task is to move with the pen through a target point presented on screen. Visual feedback about the pen's position is given only before movement onset. At the end of a movement, subjects get visual feedback only about the cost of this trial. In the mapping of the pen's position onto the screen, we added a bias (unknown to the subject) and Gaussian noise. Therefore the cost is a function of this bias. The subjects were asked to reach to the target and minimize this cost over trials. In this behavioral experiment, subjects could learn the bias and thus showed reinforcement learning. With our computational model, we could model the learning process over trials. Particularly, the dependence on parameters of the reward function (Gaussian width) and the modulation of movement variance over time were similar in experiment and model.
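As a rough illustration of the two-part cost described in this abstract (a task term such as the squared distance to the goal at the movement endpoint, plus a term that integrates the squared control commands), a minimal Python sketch might look as follows. The function name, the weight `task_weight`, the time step `dt`, and the one-dimensional setting are assumptions made for illustration; they are not taken from the paper.

```python
def movement_cost(endpoint, goal, controls, dt=0.01, task_weight=1.0):
    """Two-part cost: endpoint task term plus integrated squared control.

    All names and default values here are hypothetical, for illustration only.
    """
    # Task term: squared distance to the goal at the movement endpoint.
    task_term = task_weight * (endpoint - goal) ** 2
    # Control term: rectangle-rule approximation of the integral of u(t)^2.
    control_term = sum(u * u for u in controls) * dt
    return task_term + control_term
```

In the learning setting the abstract describes, only the task term would be adapted; in this sketch that would correspond to changing `goal` or `task_weight` while leaving the control term fixed.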

am

[BibTex]

A Bayesian approach to empirical local linearizations for robotics

Ting, J., D’Souza, A., Vijayakumar, S., Schaal, S.

In International Conference on Robotics and Automation (ICRA2008), Pasadena, CA, USA, May 19-23, 2008, 2008, clmc (inproceedings)

Abstract
Local linearizations are ubiquitous in the control of robotic systems. Analytical methods, if available, can be used to obtain the linearization, but in complex robotics systems where the dynamics and kinematics are often not faithfully obtainable, empirical linearization may be preferable. In this case, it is important to only use data for the local linearization that lies within a "reasonable" linear regime of the system, which can be defined from the Hessian at the point of the linearization, a quantity that is not available without an analytical model. We introduce a Bayesian approach to solve statistically what constitutes a "reasonable" local regime. We approach this problem in the context of local linear regression. In contrast to previous locally linear methods, we avoid cross-validation or complex statistical hypothesis testing techniques to find the appropriate local regime. Instead, we treat the parameters of the local regime probabilistically and use approximate Bayesian inference for their estimation. This approach results in an analytical set of iterative update equations that are easily implemented on real robotics systems for real-time applications. As in other locally weighted regressions, our algorithm also lends itself to complete nonlinear function approximation for learning empirical internal models. We sketch the derivation of our Bayesian method and provide evaluations on synthetic data and actual robot data where the analytical linearization was known.

am

link (url) [BibTex]

Do humans plan continuous trajectories in kinematic coordinates?

Hoffmann, H., Schaal, S.

In Abstracts of the Society of Neuroscience Meeting (SFN 2008), Washington, DC 2008, 2008, clmc (inproceedings)

Abstract
The planning and execution of human arm movements is still unresolved. An ongoing controversy is whether we plan a movement in kinematic coordinates and convert these coordinates with an inverse internal model into motor commands (like muscle activation) or whether we combine a few muscle synergies or equilibrium points to move a hand, e.g., between two targets. The first hypothesis implies that a planner produces a desired end-effector position for all time points; the second relies on the dynamics of the muscular-skeletal system for a given control command to produce a continuous end-effector trajectory. To distinguish between these two possibilities, we use a visuomotor adaptation experiment. Subjects moved a pen on a graphics tablet and observed the pen's mapped position onto a screen (subjects quickly adapted to this mapping). The task was to move a cursor between two points in a given time window. In the adaptation test, we manipulated the velocity profile of the cursor feedback such that the shape of the trajectories remained unchanged (for straight paths). If humans would use a kinematic plan and map at each time the desired end-effector position onto control commands, subjects should adapt to the above manipulation. In a similar experiment, Wolpert et al (1995) showed adaptation to changes in the curvature of trajectories. This result, however, cannot rule out a shift of an equilibrium point or an additional synergy activation between start and end point of a movement. In our experiment, subjects did two sessions, one control without and one with velocity-profile manipulation. To skew the velocity profile of the cursor trajectory, we added to the current velocity, v, the function 0.8*v*cos(pi + pi*x), where x is the projection of the cursor position onto the start-goal line divided by the distance start to goal (x=0 at the start point). 
As a result, subjects did not adapt to this manipulation: for all subjects, the true hand motion was not significantly modified in a direction consistent with adaptation, even though the visually presented motion differed significantly from the control motion. One may still argue that this difference in motion was insufficient to be processed visually. Thus, as a control experiment, we replayed control and modified motions to the subjects and asked which of the two motions appeared 'more natural'. Subjects chose the unperturbed motion as more natural significantly more often than chance. In summary, for a visuomotor transformation task, the hypothesis of a planned continuous end-effector trajectory predicts adaptation to a modified velocity profile. The current experiment found no adaptation under such transformation.
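The velocity-profile manipulation stated in this abstract, adding 0.8*v*cos(pi + pi*x) to the current velocity v, can be written out as a short Python sketch. The function name and the `gain` parameter are illustrative assumptions; only the formula itself comes from the abstract.

```python
import math


def skewed_velocity(v, x, gain=0.8):
    """Return v + gain * v * cos(pi + pi * x).

    x is the normalized progress along the start-goal line
    (x = 0 at the start point, x = 1 at the goal).
    """
    return v + gain * v * math.cos(math.pi + math.pi * x)
```

With this formula the displayed cursor moves at 0.2*v at the start point (x = 0), at v halfway (x = 0.5), and at 1.8*v at the goal (x = 1), so the velocity profile is skewed toward the end of the movement.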

am

[BibTex]

A Versatile Stair-Climbing Robot for Search and Rescue Applications

Eich, M., Grimminger, F., Kirchner, F.

In 2008 IEEE International Workshop on Safety, Security and Rescue Robotics, pages: 35-40, October 2008 (inproceedings)

am

DOI [BibTex]


2007


Towards Machine Learning of Motor Skills

Peters, J., Schaal, S., Schölkopf, B.

In Proceedings of Autonome Mobile Systeme (AMS), pages: 138-144, (Editors: K Berns and T Luksch), 2007, clmc (inproceedings)

Abstract
Autonomous robots that can adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heyday of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only a few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

am ei

PDF DOI [BibTex]

Reinforcement Learning for Optimal Control of Arm Movements

Theodorou, E., Peters, J., Schaal, S.

In Abstracts of the 37th Meeting of the Society of Neuroscience., Neuroscience, 2007, clmc (inproceedings)

Abstract
Everyday motor behavior consists of a plethora of challenging motor skills from discrete movements such as reaching and throwing to rhythmic movements such as walking, drumming and running. How this plethora of motor skills can be learned remains an open question. In particular, is there any unifying computational framework that could model the learning process of this variety of motor behaviors and at the same time be biologically plausible? In this work we aim to give an answer to these questions by providing a computational framework that unifies the learning mechanism of both rhythmic and discrete movements under optimization criteria, i.e., in a non-supervised trial-and-error fashion. Our suggested framework is based on Reinforcement Learning, which is mostly considered as too costly to be a plausible mechanism for learning complex limb movement. However, recent work on reinforcement learning with policy gradients combined with parameterized movement primitives allows novel and more efficient algorithms. By using the representational power of such motor primitives we show how rhythmic motor behaviors such as walking, squashing and drumming as well as discrete behaviors like reaching and grasping can be learned with biologically plausible algorithms. Using extensive simulations and by using different reward functions we provide results that support the hypothesis that Reinforcement Learning could be a viable candidate for motor learning of human motor behavior when other learning methods like supervised learning are not feasible.

am ei

[BibTex]

Reinforcement learning by reward-weighted regression for operational space control

Peters, J., Schaal, S.

In Proceedings of the 24th Annual International Conference on Machine Learning, pages: 745-750, ICML, 2007, clmc (inproceedings)

Abstract
Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
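The core idea of a reward-weighted regression, fitting a policy by least squares with each sample weighted by its reward, can be sketched in a few lines of Python. This is a deliberately simplified one-dimensional illustration under stated assumptions: a linear policy u = theta * x, non-negative rewards used directly as weights, and no adaptive reward transformation. The function name and data layout are hypothetical, and this is not the algorithm from the paper.

```python
def reward_weighted_fit(states, actions, rewards):
    """Fit u = theta * x by least squares, weighting each sample by its reward.

    Hypothetical 1-D sketch; rewards are assumed to be non-negative weights.
    """
    numerator = sum(r * x * u for x, u, r in zip(states, actions, rewards))
    denominator = sum(r * x * x for x, r in zip(states, rewards))
    if denominator == 0.0:
        raise ValueError("all weighted states are zero; theta is undetermined")
    return numerator / denominator
```

For example, with states `[1.0, 1.0]`, actions `[2.0, 4.0]`, and rewards `[1.0, 3.0]`, the fit is pulled toward the higher-reward action and returns 3.5, whereas equal rewards give the plain least-squares value 3.0.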

am ei

link (url) DOI [BibTex]

Policy gradient methods for machine learning

Peters, J., Theodorou, E., Schaal, S.

In Proceedings of the 14th INFORMS Conference of the Applied Probability Society, pages: 97-98, Eindhoven, Netherlands, July 9-11, 2007, 2007, clmc (inproceedings)

Abstract
We present an in-depth survey of policy gradient methods as they are used in the machine learning community for optimizing parameterized, stochastic control policies in Markovian systems with respect to the expected reward. Despite having been developed separately in the reinforcement learning literature, policy gradient methods employ likelihood ratio gradient estimators as also suggested in the stochastic simulation optimization community. It is well-known that this approach to policy gradient estimation traditionally suffers from three drawbacks, i.e., large variance, a strong dependence on baseline functions and an inefficient gradient descent. In this talk, we will present a series of recent results which tackle each of these problems. The variance of the gradient estimation can be reduced significantly through recently introduced techniques such as optimal baselines, compatible function approximations and all-action gradients. However, as even the analytically obtainable policy gradients perform unnaturally slowly, it required the step from 'vanilla' policy gradient methods towards natural policy gradients in order to overcome the inefficiency of the gradient descent. This development resulted in the Natural Actor-Critic architecture, which can be shown to be very efficient in application to motor primitive learning for robotics.
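The effect of a baseline on a likelihood ratio gradient estimator can be shown with a small sketch. It rests on assumptions not taken from the talk: a one-step problem, a Gaussian policy with mean theta and fixed standard deviation sigma, a hypothetical quadratic reward, and the plain average reward as baseline rather than the optimal baseline mentioned in the abstract.

```python
import random
import statistics


def gradient_samples(theta, sigma, target, n, seed=0, use_baseline=True):
    """Per-sample likelihood ratio gradient terms for a Gaussian policy.

    Each term is d/dtheta log pi(a) * (r - b), where
    d/dtheta log N(a; theta, sigma^2) = (a - theta) / sigma^2.
    """
    rng = random.Random(seed)
    actions = [rng.gauss(theta, sigma) for _ in range(n)]
    # Hypothetical reward: negative squared distance to an assumed target action.
    rewards = [-(a - target) ** 2 for a in actions]
    # Average-reward baseline (an assumption; not the optimal baseline).
    baseline = statistics.mean(rewards) if use_baseline else 0.0
    return [(a - theta) / sigma ** 2 * (r - baseline)
            for a, r in zip(actions, rewards)]


if __name__ == "__main__":
    plain = gradient_samples(0.0, 1.0, 3.0, 5000, use_baseline=False)
    based = gradient_samples(0.0, 1.0, 3.0, 5000, use_baseline=True)
    # For this toy problem the exact gradient is -2 * (theta - target).
    print("estimate without baseline:", statistics.mean(plain))
    print("estimate with baseline:   ", statistics.mean(based))
    print("variance without baseline:", statistics.variance(plain))
    print("variance with baseline:   ", statistics.variance(based))
```

Both estimates target the same gradient; the baseline only changes the spread of the per-sample terms, which is why it is treated as a variance reduction technique. Natural policy gradients, which address the third drawback, are not covered by this sketch.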

am ei

[BibTex]



Policy Learning for Motor Skills

Peters, J., Schaal, S.

In Proceedings of 14th International Conference on Neural Information Processing (ICONIP), pages: 233-242, (Editors: Ishikawa, M. , K. Doya, H. Miyamoto, T. Yamakawa), 2007, clmc (inproceedings)

Abstract
Policy learning which allows autonomous robots to adapt to novel situations has been a long-standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise, as only a few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. To do so, we study two major components of such an approach: firstly, policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, a theoretically well-founded general approach to representing the required control structures for task representation and execution.

am ei

PDF DOI [BibTex]
