Header logo is


2020


Learning Sensory-Motor Associations from Demonstration
Learning Sensory-Motor Associations from Demonstration

Berenz, V., Bjelic, A., Herath, L., Mainprice, J.

29th IEEE International Conference on Robot and Human Interactive Communication (Ro-Man 2020), August 2020 (conference) Accepted

Abstract
We propose a method which generates reactive robot behavior learned from human demonstration. In order to do so, we use the Playful programming language which is based on the reactive programming paradigm. This allows us to represent the learned behavior as a set of associations between sensor and motor primitives in a human readable script. Distinguishing between sensor and motor primitives introduces a supplementary level of granularity and more importantly enforces feedback, increasing adaptability and robustness. As the experimental section shows, useful behaviors may be learned from a single demonstration covering a very limited portion of the task space.

am

[BibTex]

2020


[BibTex]


Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures
Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures

Marco, A., Rohr, A. V., Baumann, D., Hernández-Lobato, J. M., Trimpe, S.

2020 (proceedings) In revision

Abstract
When learning to ride a bike, a child falls down a number of times before achieving the first success. As falling down usually has only mild consequences, it can be seen as a tolerable failure in exchange for a faster learning process, as it provides rich information about an undesired behavior. In the context of Bayesian optimization under unknown constraints (BOC), typical strategies for safe learning explore conservatively and avoid failures by all means. On the other side of the spectrum, non conservative BOC algorithms that allow failing may fail an unbounded number of times before reaching the optimum. In this work, we propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Empirical validation shows that our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods. In addition, we propose an original algorithm for unconstrained Bayesian optimization inspired by the notion of excursion sets in stochastic processes, upon which the failures-aware algorithm is built.

ics am

arXiv code (python) PDF [BibTex]


no image
A Real-Robot Dataset for Assessing Transferability of Learned Dynamics Models

Agudelo-España, D., Zadaianchuk, A., Wenk, P., Garg, A., Akpo, J., Grimminger, F., Viereck, J., Naveau, M., Righetti, L., Martius, G., Krause, A., Schölkopf, B., Bauer, S., Wüthrich, M.

IEEE International Conference on Robotics and Automation (ICRA), 2020 (conference) Accepted

am al ei mg

Project Page PDF [BibTex]

Project Page PDF [BibTex]


Optimizing Rank-based Metrics with Blackbox Differentiation
Optimizing Rank-based Metrics with Blackbox Differentiation

Rolinek, M., Musil, V., Paulus, A., Vlastelica, M., Michaelis, C., Martius, G.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020, Best paper nomination (inproceedings)

Abstract
Rank-based metrics are some of the most widely used criteria for performance evaluation of computer vision models. Despite years of effort, direct optimization for these metrics remains a challenge due to their non-differentiable and non-decomposable nature. We present an efficient, theoretically sound, and general method for differentiating rank-based metrics with mini-batch gradient descent. In addition, we address optimization instability and sparsity of the supervision signal that both arise from using rank-based metrics as optimization targets. Resulting losses based on recall and Average Precision are applied to image retrieval and object detection tasks. We obtain performance that is competitive with state-of-the-art on standard image retrieval datasets and consistently improve performance of near state-of-the-art object detectors.

al

Code Long Oral Short Oral Arxiv Project Page [BibTex]

Code Long Oral Short Oral Arxiv Project Page [BibTex]

2018


Deep Reinforcement Learning for Event-Triggered Control
Deep Reinforcement Learning for Event-Triggered Control

Baumann, D., Zhu, J., Martius, G., Trimpe, S.

In Proceedings of the 57th IEEE International Conference on Decision and Control (CDC), pages: 943-950, 57th IEEE International Conference on Decision and Control (CDC), December 2018 (inproceedings)

al ics

arXiv PDF DOI Project Page Project Page [BibTex]

2018


arXiv PDF DOI Project Page Project Page [BibTex]


Motion-based Object Segmentation based on Dense RGB-D Scene Flow
Motion-based Object Segmentation based on Dense RGB-D Scene Flow

Shao, L., Shah, P., Dwaracherla, V., Bohg, J.

IEEE Robotics and Automation Letters, 3(4):3797-3804, IEEE, IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2018 (conference)

Abstract
Given two consecutive RGB-D images, we propose a model that estimates a dense 3D motion field, also known as scene flow. We take advantage of the fact that in robot manipulation scenarios, scenes often consist of a set of rigidly moving objects. Our model jointly estimates (i) the segmentation of the scene into an unknown but finite number of objects, (ii) the motion trajectories of these objects and (iii) the object scene flow. We employ an hourglass, deep neural network architecture. In the encoding stage, the RGB and depth images undergo spatial compression and correlation. In the decoding stage, the model outputs three images containing a per-pixel estimate of the corresponding object center as well as object translation and rotation. This forms the basis for inferring the object segmentation and final object scene flow. To evaluate our model, we generated a new and challenging, large-scale, synthetic dataset that is specifically targeted at robotic manipulation: It contains a large number of scenes with a very diverse set of simultaneously moving 3D objects and is recorded with a commonly-used RGB-D camera. In quantitative experiments, we show that we significantly outperform state-of-the-art scene flow and motion-segmentation methods. In qualitative experiments, we show how our learned model transfers to challenging real-world scenes, visually generating significantly better results than existing methods.

am

Project Page arXiv DOI [BibTex]

Project Page arXiv DOI [BibTex]


Probabilistic Recurrent State-Space Models
Probabilistic Recurrent State-Space Models

Doerr, A., Daniel, C., Schiegg, M., Nguyen-Tuong, D., Schaal, S., Toussaint, M., Trimpe, S.

In Proceedings of the International Conference on Machine Learning (ICML), International Conference on Machine Learning (ICML), July 2018 (inproceedings)

Abstract
State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time-series data. Fully probabilistic SSMs, however, unfortunately often prove hard to train, even for smaller problems. To overcome this limitation, we propose a scalable initialization and training algorithm based on doubly stochastic variational inference and Gaussian processes. In the variational approximation we propose in contrast to related approaches to fully capture the latent state temporal correlations to allow for robust training.

am ics

arXiv pdf Project Page [BibTex]

arXiv pdf Project Page [BibTex]


Online Learning of a Memory for Learning Rates
Online Learning of a Memory for Learning Rates

(nominated for best paper award)

Meier, F., Kappler, D., Schaal, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018, accepted (inproceedings)

Abstract
The promise of learning to learn for robotics rests on the hope that by extracting some information about the learning process itself we can speed up subsequent similar learning tasks. Here, we introduce a computationally efficient online meta-learning algorithm that builds and optimizes a memory model of the optimal learning rate landscape from previously observed gradient behaviors. While performing task specific optimization, this memory of learning rates predicts how to scale currently observed gradients. After applying the gradient scaling our meta-learner updates its internal memory based on the observed effect its prediction had. Our meta-learner can be combined with any gradient-based optimizer, learns on the fly and can be transferred to new optimization tasks. In our evaluations we show that our meta-learning algorithm speeds up learning of MNIST classification and a variety of learning control tasks, either in batch or online learning settings.

am

pdf video code [BibTex]

pdf video code [BibTex]


Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks
Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks

Sutanto, G., Su, Z., Schaal, S., Meier, F.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

am

pdf video [BibTex]

pdf video [BibTex]


no image
On Time Optimization of Centroidal Momentum Dynamics

Ponton, B., Herzog, A., Del Prete, A., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 5776-5782, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
Recently, the centroidal momentum dynamics has received substantial attention to plan dynamically consistent motions for robots with arms and legs in multi-contact scenarios. However, it is also non convex which renders any optimization approach difficult and timing is usually kept fixed in most trajectory optimization techniques to not introduce additional non convexities to the problem. But this can limit the versatility of the algorithms. In our previous work, we proposed a convex relaxation of the problem that allowed to efficiently compute momentum trajectories and contact forces. However, our approach could not minimize a desired angular momentum objective which seriously limited its applicability. Noticing that the non-convexity introduced by the time variables is of similar nature as the centroidal dynamics one, we propose two convex relaxations to the problem based on trust regions and soft constraints. The resulting approaches can compute time-optimized dynamically consistent trajectories sufficiently fast to make the approach realtime capable. The performance of the algorithm is demonstrated in several multi-contact scenarios for a humanoid robot. In particular, we show that the proposed convex relaxation of the original problem finds solutions that are consistent with the original non-convex problem and illustrate how timing optimization allows to find motion plans that would be difficult to plan with fixed timing † †Implementation details and demos can be found in the source code available at https://git-amd.tuebingen.mpg.de/bponton/timeoptimization.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
L4: Practical loss-based stepsize adaptation for deep learning

Rolinek, M., Martius, G.

In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), pages: 6434-6444, (Editors: S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett), Curran Associates, Inc., 2018 (inproceedings)

al

Github link (url) Project Page [BibTex]

Github link (url) Project Page [BibTex]


Systematic self-exploration of behaviors for robots in a dynamical systems framework
Systematic self-exploration of behaviors for robots in a dynamical systems framework

Pinneri, C., Martius, G.

In Proc. Artificial Life XI, pages: 319-326, MIT Press, Cambridge, MA, 2018 (inproceedings)

Abstract
One of the challenges of this century is to understand the neural mechanisms behind cognitive control and learning. Recent investigations propose biologically plausible synaptic mechanisms for self-organizing controllers, in the spirit of Hebbian learning. In particular, differential extrinsic plasticity (DEP) [Der and Martius, PNAS 2015], has proven to enable embodied agents to self-organize their individual sensorimotor development, and generate highly coordinated behaviors during their interaction with the environment. These behaviors are attractors of a dynamical system. In this paper, we use the DEP rule to generate attractors and we combine it with a “repelling potential” which allows the system to actively explore all its attractor behaviors in a systematic way. With a view to a self-determined exploration of goal-free behaviors, our framework enables switching between different motion patterns in an autonomous and sequential fashion. Our algorithm is able to recover all the attractor behaviors in a toy system and it is also effective in two simulated environments. A spherical robot discovers all its major rolling modes and a hexapod robot learns to locomote in 50 different ways in 30min.

al

link (url) DOI Project Page [BibTex]

link (url) DOI Project Page [BibTex]


Learning equations for extrapolation and control
Learning equations for extrapolation and control

Sahoo, S. S., Lampert, C. H., Martius, G.

In Proc. 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 2018, 80, pages: 4442-4450, http://proceedings.mlr.press/v80/sahoo18a/sahoo18a.pdf, (Editors: Dy, Jennifer and Krause, Andreas), PMLR, 2018 (inproceedings)

Abstract
We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.

al

Code Arxiv Poster Slides link (url) Project Page [BibTex]

Code Arxiv Poster Slides link (url) Project Page [BibTex]


Robust Affordable 3D Haptic Sensation via Learning Deformation Patterns
Robust Affordable 3D Haptic Sensation via Learning Deformation Patterns

Sun, H., Martius, G.

Proceedings International Conference on Humanoid Robots, pages: 846-853, IEEE, New York, NY, USA, 2018 IEEE-RAS International Conference on Humanoid Robots, 2018, Oral Presentation (conference)

Abstract
Haptic sensation is an important modality for interacting with the real world. This paper proposes a general framework of inferring haptic forces on the surface of a 3D structure from internal deformations using a small number of physical sensors instead of employing dense sensor arrays. Using machine learning techniques, we optimize the sensor number and their placement and are able to obtain high-precision force inference for a robotic limb using as few as 9 sensors. For the optimal and sparse placement of the measurement units (strain gauges), we employ data-driven methods based on data obtained by finite element simulation. We compare data-driven approaches with model-based methods relying on geometric distance and information criteria such as Entropy and Mutual Information. We validate our approach on a modified limb of the “Poppy” robot [1] and obtain 8 mm localization precision.

al

DOI Project Page [BibTex]

DOI Project Page [BibTex]


no image
Unsupervised Contact Learning for Humanoid Estimation and Control

Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 411-417, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
This work presents a method for contact state estimation using fuzzy clustering to learn contact probability for full, six-dimensional humanoid contacts. The data required for training is solely from proprioceptive sensors - endeffector contact wrench sensors and inertial measurement units (IMUs) - and the method is completely unsupervised. The resulting cluster means are used to efficiently compute the probability of contact in each of the six endeffector degrees of freedom (DoFs) independently. This clustering-based contact probability estimator is validated in a kinematics-based base state estimator in a simulation environment with realistic added sensor noise for locomotion over rough, low-friction terrain on which the robot is subject to foot slip and rotation. The proposed base state estimator which utilizes these six DoF contact probability estimates is shown to perform considerably better than that which determines kinematic contact constraints purely based on measured normal force.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Learning Task-Specific Dynamics to Improve Whole-Body Control

Gams, A., Mason, S., Ude, A., Schaal, S., Righetti, L.

In Hua, IEEE, Beijing, China, November 2018 (inproceedings)

Abstract
In task-based inverse dynamics control, reference accelerations used to follow a desired plan can be broken down into feedforward and feedback trajectories. The feedback term accounts for tracking errors that are caused from inaccurate dynamic models or external disturbances. On underactuated, free-floating robots, such as humanoids, high feedback terms can be used to improve tracking accuracy; however, this can lead to very stiff behavior or poor tracking accuracy due to limited control bandwidth. In this paper, we show how to reduce the required contribution of the feedback controller by incorporating learned task-space reference accelerations. Thus, we i) improve the execution of the given specific task, and ii) offer the means to reduce feedback gains, providing for greater compliance of the system. With a systematic approach we also reduce heuristic tuning of the model parameters and feedback gains, often present in real-world experiments. In contrast to learning task-specific joint-torques, which might produce a similar effect but can lead to poor generalization, our approach directly learns the task-space dynamics of the center of mass of a humanoid robot. Simulated and real-world results on the lower part of the Sarcos Hermes humanoid robot demonstrate the applicability of the approach.

am mg

link (url) [BibTex]

link (url) [BibTex]


no image
An MPC Walking Framework With External Contact Forces

Mason, S., Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1785-1790, IEEE, Brisbane, Australia, May 2018 (inproceedings)

Abstract
In this work, we present an extension to a linear Model Predictive Control (MPC) scheme that plans external contact forces for the robot when given multiple contact locations and their corresponding friction cone. To this end, we set up a two-step optimization problem. In the first optimization, we compute the Center of Mass (CoM) trajectory, foot step locations, and introduce slack variables to account for violating the imposed constraints on the Zero Moment Point (ZMP). We then use the slack variables to trigger the second optimization, in which we calculate the optimal external force that compensates for the ZMP tracking error. This optimization considers multiple contacts positions within the environment by formulating the problem as a Mixed Integer Quadratic Program (MIQP) that can be solved at a speed between 100-300 Hz. Once contact is created, the MIQP reduces to a single Quadratic Program (QP) that can be solved in real-time ({\textless}; 1kHz). Simulations show that the presented walking control scheme can withstand disturbances 2-3× larger with the additional force provided by a hand contact.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2017


Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets

Hausman, K., Chebotar, Y., Schaal, S., Sukhatme, G., Lim, J.

In Proceedings from the conference "Neural Information Processing Systems 2017., (Editors: Guyon I. and Luxburg U.v. and Bengio S. and Wallach H. and Fergus R. and Vishwanathan S. and Garnett R.), Curran Associates, Inc., Advances in Neural Information Processing Systems 30 (NIPS), December 2017 (inproceedings)

am

pdf video [BibTex]

2017


pdf video [BibTex]


On the Design of {LQR} Kernels for Efficient Controller Learning
On the Design of LQR Kernels for Efficient Controller Learning

Marco, A., Hennig, P., Schaal, S., Trimpe, S.

Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), pages: 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (conference)

Abstract
Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

am ics pn

arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page [BibTex]

arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page [BibTex]


Optimizing Long-term Predictions for Model-based Policy Search
Optimizing Long-term Predictions for Model-based Policy Search

Doerr, A., Daniel, C., Nguyen-Tuong, D., Marco, A., Schaal, S., Toussaint, M., Trimpe, S.

Proceedings of 1st Annual Conference on Robot Learning (CoRL), 78, pages: 227-238, (Editors: Sergey Levine and Vincent Vanhoucke and Ken Goldberg), 1st Annual Conference on Robot Learning, November 2017 (conference)

Abstract
We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, modelbased RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive compared to state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.

am ics

PDF Project Page [BibTex]

PDF Project Page [BibTex]


no image
Learning optimal gait parameters and impedance profiles for legged locomotion

Heijmink, E., Radulescu, A., Ponton, B., Barasuol, V., Caldwell, D., Semini, C.

Proceedings International Conference on Humanoid Robots, IEEE, 2017 IEEE-RAS 17th International Conference on Humanoid Robots, November 2017 (conference)

Abstract
The successful execution of complex modern robotic tasks often relies on the correct tuning of a large number of parameters. In this paper we present a methodology for improving the performance of a trotting gait by learning the gait parameters, impedance profile and the gains of the control architecture. We show results on a set of terrains, for various speeds using a realistic simulation of a hydraulically actuated system. Our method achieves a reduction in the gait's mechanical energy consumption during locomotion of up to 26%. The simulation results are validated in experimental trials on the hardware system.

am

paper [BibTex]

paper [BibTex]


no image
A New Data Source for Inverse Dynamics Learning

Kappler, D., Meier, F., Ratliff, N., Schaal, S.

In Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, Piscataway, NJ, USA, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2017 (inproceedings)

am

[BibTex]

[BibTex]


no image
Bayesian Regression for Artifact Correction in Electroencephalography

Fiebig, K., Jayaram, V., Hesse, T., Blank, A., Peters, J., Grosse-Wentrup, M.

Proceedings of the 7th Graz Brain-Computer Interface Conference 2017 - From Vision to Reality, pages: 131-136, (Editors: Müller-Putz G.R., Steyrl D., Wriessnegger S. C., Scherer R.), Graz University of Technology, Austria, Graz Brain-Computer Interface Conference, September 2017 (conference)

am ei

DOI [BibTex]

DOI [BibTex]


no image
Investigating Music Imagery as a Cognitive Paradigm for Low-Cost Brain-Computer Interfaces

Grossberger, L., Hohmann, M. R., Peters, J., Grosse-Wentrup, M.

Proceedings of the 7th Graz Brain-Computer Interface Conference 2017 - From Vision to Reality, pages: 160-164, (Editors: Müller-Putz G.R., Steyrl D., Wriessnegger S. C., Scherer R.), Graz University of Technology, Austria, Graz Brain-Computer Interface Conference, September 2017 (conference)

am ei

DOI [BibTex]

DOI [BibTex]


On the relevance of grasp metrics for predicting grasp success
On the relevance of grasp metrics for predicting grasp success

Rubert, C., Kappler, D., Morales, A., Schaal, S., Bohg, J.

In Proceedings of the IEEE/RSJ International Conference of Intelligent Robots and Systems, September 2017 (inproceedings) Accepted

Abstract
We aim to reliably predict whether a grasp on a known object is successful before it is executed in the real world. There is an entire suite of grasp metrics that has already been developed which rely on precisely known contact points between object and hand. However, it remains unclear whether and how they may be combined into a general purpose grasp stability predictor. In this paper, we analyze these questions by leveraging a large scale database of simulated grasps on a wide variety of objects. For each grasp, we compute the value of seven metrics. Each grasp is annotated by human subjects with ground truth stability labels. Given this data set, we train several classification methods to find out whether there is some underlying, non-trivial structure in the data that is difficult to model manually but can be learned. Quantitative and qualitative results show the complexity of the prediction problem. We found that a good prediction performance critically depends on using a combination of metrics as input features. Furthermore, non-parametric and non-linear classifiers best capture the structure in the data.

am

Project Page [BibTex]

Project Page [BibTex]


no image
Local Bayesian Optimization of Motor Skills

Akrour, R., Sorokin, D., Peters, J., Neumann, G.

Proceedings of the 34th International Conference on Machine Learning, 70, pages: 41-50, Proceedings of Machine Learning Research, (Editors: Doina Precup, Yee Whye Teh), PMLR, International Conference on Machine Learning (ICML), August 2017 (conference)

am ei

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

Chebotar, Y., Hausman, K., Zhang, M., Sukhatme, G., Schaal, S., Levine, S.

Proceedings of the 34th International Conference on Machine Learning, 70, Proceedings of Machine Learning Research, (Editors: Doina Precup, Yee Whye Teh), PMLR, International Conference on Machine Learning (ICML), August 2017 (conference)

am

pdf video [BibTex]

pdf video [BibTex]


Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers
Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

Doerr, A., Nguyen-Tuong, D., Marco, A., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 5295-5301, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

am ics

PDF arXiv DOI Project Page [BibTex]

PDF arXiv DOI Project Page [BibTex]


Learning Feedback Terms for Reactive Planning and Control
Learning Feedback Terms for Reactive Planning and Control

Rai, A., Sutanto, G., Schaal, S., Meier, F.

Proceedings 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (conference)

am

pdf video [BibTex]

pdf video [BibTex]


Virtual vs. {R}eal: Trading Off Simulations and Physical Experiments in Reinforcement Learning with {B}ayesian Optimization
Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Marco, A., Berkenkamp, F., Hennig, P., Schoellig, A. P., Krause, A., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1557-1563, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

am ics pn

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page [BibTex]

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page [BibTex]


Path Integral Guided Policy Search
Path Integral Guided Policy Search

Chebotar, Y., Kalakrishnan, M., Yahya, A., Li, A., Schaal, S., Levine, S.

Proceedings 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (conference)

am

pdf video [BibTex]

pdf video [BibTex]

2013


Probabilistic Object Tracking Using a Range Camera
Probabilistic Object Tracking Using a Range Camera

Wüthrich, M., Pastor, P., Kalakrishnan, M., Bohg, J., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3195-3202, IEEE, November 2013 (inproceedings)

Abstract
We address the problem of tracking the 6-DoF pose of an object while it is being manipulated by a human or a robot. We use a dynamic Bayesian network to perform inference and compute a posterior distribution over the current object pose. Depending on whether a robot or a human manipulates the object, we employ a process model with or without knowledge of control inputs. Observations are obtained from a range camera. As opposed to previous object tracking methods, we explicitly model self-occlusions and occlusions from the environment, e.g, the human or robotic hand. This leads to a strongly non-linear observation model and additional dependencies in the Bayesian network. We employ a Rao-Blackwellised particle filter to compute an estimate of the object pose at every time step. In a set of experiments, we demonstrate the ability of our method to accurately and robustly track the object pose in real-time while it is being manipulated by a human or a robot.

am

arXiv Video Code Video DOI Project Page [BibTex]

2013


arXiv Video Code Video DOI Project Page [BibTex]


Hypothesis Testing Framework for Active Object Detection
Hypothesis Testing Framework for Active Object Detection

Sankaran, B., Atanasov, N., Le Ny, J., Koletschka, T., Pappas, G., Daniilidis, K.

In IEEE International Conference on Robotics and Automation (ICRA), May 2013, clmc (inproceedings)

Abstract
One of the central problems in computer vision is the detection of semantically important objects and the estimation of their pose. Most of the work in object detection has been based on single image processing and its performance is limited by occlusions and ambiguity in appearance and geometry. This paper proposes an active approach to object detection by controlling the point of view of a mobile depth camera. When an initial static detection phase identifies an object of interest, several hypotheses are made about its class and orientation. The sensor then plans a sequence of view-points, which balances the amount of energy used to move with the chance of identifying the correct hypothesis. We formulate an active M-ary hypothesis testing problem, which includes sensor mobility, and solve it using a point-based approximate POMDP algorithm. The validity of our approach is verified through simulation and experiments with real scenes captured by a kinect sensor. The results suggest a significant improvement over static object detection.

am

pdf [BibTex]

pdf [BibTex]


no image
Action and Goal Related Decision Variables Modulate the Competition Between Multiple Potential Targets

Enachescu, V, Christopoulos, Vassilios N, Schrater, P. R., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2013), February 2013 (inproceedings)

am

[BibTex]

[BibTex]


The functional role of automatic body response in shaping voluntary actions based on muscle synergy theory
The functional role of automatic body response in shaping voluntary actions based on muscle synergy theory

Alnajjar, F. S., Berenz, V., Shimoda, S.

In Neural Engineering (NER), 2013 6th International IEEE/EMBS Conference on, pages: 1230-1233, 2013 (inproceedings)

am

DOI [BibTex]

DOI [BibTex]


Coaching robots with biosignals based on human affective social behaviors
Coaching robots with biosignals based on human affective social behaviors

Suzuki, K., Gruebler, A., Berenz, V.

In ACM/IEEE International Conference on Human-Robot Interaction, HRI 2013, Tokyo, Japan, March 3-6, 2013, pages: 419-420, 2013 (inproceedings)

am

link (url) [BibTex]

link (url) [BibTex]


Fusing visual and tactile sensing for 3-D object reconstruction while grasping
Fusing visual and tactile sensing for 3-D object reconstruction while grasping

Ilonen, J., Bohg, J., Kyrki, V.

In IEEE International Conference on Robotics and Automation (ICRA), pages: 3547-3554, 2013 (inproceedings)

Abstract
In this work, we propose to reconstruct a complete 3-D model of an unknown object by fusion of visual and tactile information while the object is grasped. Assuming the object is symmetric, a first hypothesis of its complete 3-D shape is generated from a single view. This initial model is used to plan a grasp on the object which is then executed with a robotic manipulator equipped with tactile sensors. Given the detected contacts between the fingers and the object, the full object model including the symmetry parameters can be refined. This refined model will then allow the planning of more complex manipulation tasks. The main contribution of this work is an optimal estimation approach for the fusion of visual and tactile data applying the constraint of object symmetry. The fusion is formulated as a state estimation problem and solved with an iterative extended Kalman filter. The approach is validated experimentally using both artificial and real data from two different robotic platforms.

am

DOI Project Page [BibTex]

DOI Project Page [BibTex]


no image
Learning Objective Functions for Manipulation

Kalakrishnan, M., Pastor, P., Righetti, L., Schaal, S.

In 2013 IEEE International Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
We present an approach to learning objective functions for robotic manipulation based on inverse reinforcement learning. Our path integral inverse reinforcement learning algorithm can deal with high-dimensional continuous state-action spaces, and only requires local optimality of demonstrated trajectories. We use L 1 regularization in order to achieve feature selection, and propose an efficient algorithm to minimize the resulting convex objective function. We demonstrate our approach by applying it to two core problems in robotic manipulation. First, we learn a cost function for redundancy resolution in inverse kinematics. Second, we use our method to learn a cost function over trajectories, which is then used in optimization-based motion planning for grasping and manipulation tasks. Experimental results show that our method outperforms previous algorithms in high-dimensional settings.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Learning Task Error Models for Manipulation

Pastor, P., Kalakrishnan, M., Binney, J., Kelly, J., Righetti, L., Sukhatme, G. S., Schaal, S.

In 2013 IEEE Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g. due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counterbalancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing contributing to successfully complete 67 out of 72 grasping and manipulation tasks.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2010


no image
Reinforcement learning of full-body humanoid motor skills

Stulp, F., Buchli, J., Theodorou, E., Schaal, S.

In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages: 405-410, December 2010, clmc (inproceedings)

Abstract
Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.

am

link (url) [BibTex]

2010


link (url) [BibTex]


Enhanced Visual Scene Understanding through Human-Robot Dialog
Enhanced Visual Scene Understanding through Human-Robot Dialog

Johnson-Roberson, M., Bohg, J., Kragic, D., Skantze, G., Gustafson, J., Carlson, R.

In Proceedings of AAAI 2010 Fall Symposium: Dialog with Robots, November 2010 (inproceedings)

am

pdf [BibTex]

pdf [BibTex]


Scene Representation and Object Grasping Using Active Vision
Scene Representation and Object Grasping Using Active Vision

Gratal, X., Bohg, J., Björkman, M., Kragic, D.

In IROS’10 Workshop on Defining and Solving Realistic Perception Problems in Personal Robotics, October 2010 (inproceedings)

Abstract
Object grasping and manipulation pose major challenges for perception and control and require rich interaction between these two fields. In this paper, we concentrate on the plethora of perceptual problems that have to be solved before a robot can be moved in a controlled way to pick up an object. A vision system is presented that integrates a number of different computational processes, e.g. attention, segmentation, recognition or reconstruction to incrementally build up a representation of the scene suitable for grasping and manipulation of objects. Our vision system is equipped with an active robotic head and a robot arm. This embodiment enables the robot to perform a number of different actions like saccading, fixating, and grasping. By applying these actions, the robot can incrementally build a scene representation and use it for interaction. We demonstrate our system in a scenario for picking up known objects from a table top. We also show the system’s extendibility towards grasping of unknown and familiar objects.

am

video pdf slides [BibTex]

video pdf slides [BibTex]


Strategies for multi-modal scene exploration
Strategies for multi-modal scene exploration

Bohg, J., Johnson-Roberson, M., Björkman, M., Kragic, D.

In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages: 4509-4515, October 2010 (inproceedings)

Abstract
We propose a method for multi-modal scene exploration where initial object hypothesis formed by active visual segmentation are confirmed and augmented through haptic exploration with a robotic arm. We update the current belief about the state of the map with the detection results and predict yet unknown parts of the map with a Gaussian Process. We show that through the integration of different sensor modalities, we achieve a more complete scene model. We also show that the prediction of the scene structure leads to a valid scene representation even if the map is not fully traversed. Furthermore, we propose different exploration strategies and evaluate them both in simulation and on our robotic platform.

am

video pdf DOI Project Page [BibTex]

video pdf DOI Project Page [BibTex]


Attention-based active 3D point cloud segmentation
Attention-based active 3D point cloud segmentation

Johnson-Roberson, M., Bohg, J., Björkman, M., Kragic, D.

In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages: 1165-1170, October 2010 (inproceedings)

Abstract
In this paper we present a framework for the segmentation of multiple objects from a 3D point cloud. We extend traditional image segmentation techniques into a full 3D representation. The proposed technique relies on a state-of-the-art min-cut framework to perform a fully 3D global multi-class labeling in a principled manner. Thereby, we extend our previous work in which a single object was actively segmented from the background. We also examine several seeding methods to bootstrap the graphical model-based energy minimization and these methods are compared over challenging scenes. All results are generated on real-world data gathered with an active vision robotic head. We present quantitive results over aggregate sets as well as visual results on specific examples.

am

pdf DOI [BibTex]

pdf DOI [BibTex]


no image
Relative Entropy Policy Search

Peters, J., Mülling, K., Altun, Y.

In Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence, pages: 1607-1612, (Editors: Fox, M. , D. Poole), AAAI Press, Menlo Park, CA, USA, Twenty-Fourth National Conference on Artificial Intelligence (AAAI-10), July 2010 (inproceedings)

Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It works well on typical reinforcement learning benchmark problems.

am ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Reinforcement learning of motor skills in high dimensions: A path integral approach

Theodorou, E., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2397-2403, May 2010, clmc (inproceedings)

Abstract
Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Inverse dynamics control of floating base systems using orthogonal decomposition

Mistry, M., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 3406-3412, May 2010, clmc (inproceedings)

Abstract
Model-based control methods can be used to enable fast, dexterous, and compliant motion of robots without sacrificing control accuracy. However, implementing such techniques on floating base robots, e.g., humanoids and legged systems, is non-trivial due to under-actuation, dynamically changing constraints from the environment, and potentially closed loop kinematics. In this paper, we show how to compute the analytically correct inverse dynamics torques for model-based control of sufficiently constrained floating base rigid-body systems, such as humanoid robots with one or two feet in contact with the environment. While our previous inverse dynamics approach relied on an estimation of contact forces to compute an approximate inverse dynamics solution, here we present an analytically correct solution by using an orthogonal decomposition to project the robot dynamics onto a reduced dimensional space, independent of contact forces. We demonstrate the feasibility and robustness of our approach on a simulated floating base bipedal humanoid robot and an actual robot dog locomoting over rough terrain.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Fast, robust quadruped locomotion over challenging terrain

Kalakrishnan, M., Buchli, J., Pastor, P., Mistry, M., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2665-2670, May 2010, clmc (inproceedings)

Abstract
We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrain of varying difficulty levels. We demonstrate the generalization ability of this controller by presenting test results from an independent external test team on terrains that have never been shown to us.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Accelerometer-based Tilt Estimation of a Rigid Body with only Rotational Degrees of Freedom

Trimpe, S., D’Andrea, R.

In Proceedings of the IEEE International Conference on Robotics and Automation, 2010 (inproceedings)

am ics

PDF DOI [BibTex]

PDF DOI [BibTex]