Publications


Autonomous Motion Conference Paper Learning from demonstration Schaal, S. In Advances in Neural Information Processing Systems 9, 1040-1046, (Editors: Mozer, M. C.;Jordan, M.;Petsche, T.), MIT Press, Cambridge, MA, 1997, clmc
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration of the human instructor. 
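
One of the options named in the abstract above, priming the Q-function from a demonstration, can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the 5-state chain task, the function name `prime_q_from_demo`, and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
def prime_q_from_demo(demo, n_states, n_actions, alpha=0.5, gamma=0.9, passes=1):
    """Initialise a tabular Q-function by replaying demonstrated transitions."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(passes):
        # Replay backwards so the reward at the end of the demonstration
        # propagates to earlier states within a single pass.
        for state, action, reward, next_state, done in reversed(demo):
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
    return q


# Hypothetical chain task: states 0..4, action 1 moves right,
# reward 1.0 on reaching state 4 (terminal).
demo = [(s, 1, 1.0 if s + 1 == 4 else 0.0, s + 1, s + 1 == 4) for s in range(4)]
q = prime_q_from_demo(demo, n_states=5, n_actions=2)

# Greedy policy read off the primed Q-function before any autonomous trial.
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
print(greedy)  # prints [1, 1, 1, 1]
```

In this toy setting the primed table already prefers the demonstrated action in every visited state, so subsequent Q-learning would start from the demonstrated behaviour rather than from scratch.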

Autonomous Motion Conference Paper Learning tasks from a single demonstration Atkeson, C. G., Schaal, S. In IEEE International Conference on Robotics and Automation (ICRA97), 2:1706-1712, Piscataway, NJ: IEEE, Albuquerque, NM, 20-25 April, 1997, clmc
Learning a complex dynamic robot manoeuvre from a single human demonstration is difficult. This paper explores an approach to learning from demonstration based on learning an optimization criterion from the demonstration and a task model from repeated attempts to perform the task, and using the learned criterion and model to compute an appropriate robot movement. A preliminary version of the approach has been implemented on an anthropomorphic robot arm using a pendulum swing up task as an example.

Autonomous Motion Conference Paper Local dimensionality reduction for locally weighted learning Vijayakumar, S., Schaal, S. In International Conference on Computational Intelligence in Robotics and Automation, 220-225, Monterey, CA, July 10-11, 1997, clmc
Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In this paper we suggest a partial revision of the view. Based on empirical studies, it can be observed that, despite being globally high dimensional and sparse, data distributions from physical movement systems are locally low dimensional and dense. Under this assumption, we derive a learning algorithm, Locally Adaptive Subspace Regression, that exploits this property by combining a local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression. The usefulness of the algorithm and the validity of its assumptions are illustrated for a synthetic data set and data of the inverse dynamics of an actual 7 degree-of-freedom anthropomorphic robot arm.

Autonomous Motion Article Locally weighted learning Atkeson, C. G., Moore, A. W., Schaal, S. Artificial Intelligence Review, 11(1-5):11-73, 1997, clmc
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control. Keywords: locally weighted regression, LOESS, LWR, lazy learning, memory-based learning, least commitment learning, distance functions, smoothing parameters, weighting functions, global tuning, local tuning, interference.

Autonomous Motion Article Locally weighted learning for control Atkeson, C. G., Moore, A. W., Schaal, S. Artificial Intelligence Review, 11(1-5):75-113, 1997, clmc
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control. Keywords: locally weighted regression, LOESS, LWR, lazy learning, memory-based learning, least commitment learning, forward models, inverse models, linear quadratic regulation (LQR), shifting setpoint algorithm, dynamic programming.

Autonomous Motion Conference Paper Robot learning from demonstration Atkeson, C. G., Schaal, S. In Machine Learning: Proceedings of the Fourteenth International Conference (ICML ’97), 12-20, (Editors: Fisher Jr., D. H.), Morgan Kaufmann, Nashville, TN, July 8-12, 1997, clmc
The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration the robot learns a reward function from the demonstration and a task model from repeated attempts to perform the task. A policy is computed based on the learned reward function and task model. Lessons learned from an implementation on an anthropomorphic robot arm using a pendulum swing up task include 1) simply mimicking demonstrated motions is not adequate to perform this task, 2) a task planner can use a learned model and reward function to compute an appropriate policy, 3) this model-based planning process supports rapid learning, 4) both parametric and nonparametric models can be learned and used, and 5) incorporating a task level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning. 

Autonomous Motion Conference Paper A kendama learning robot based on a dynamic optimization principle Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Osu, R., Nakano, E., Wada, Y., Kawato, M. In Proceedings of the International Conference on Neural Information Processing, 938-942, Hong Kong, September 1996, clmc

Empirical Inference Conference Paper Incorporating invariances in support vector learning machines Schölkopf, B., Burges, C., Vapnik, V. In Artificial Neural Networks: ICANN 96, LNCS vol. 1112, 47-52, (Editors: von der Malsburg, C.;von Seelen, W.;Vorbrüggen, J. C.;Sendhoff, B.), Springer, Berlin, Germany, 6th International Conference on Artificial Neural Networks, July 1996
Developed only recently, support vector learning machines achieve high generalization ability by minimizing a bound on the expected test error; however, so far there existed no way of adding knowledge about invariances of a classification problem at hand. We present a method of incorporating prior knowledge about transformation invariances by applying transformations to support vectors, the training examples most critical for determining the classification boundary.

Autonomous Motion Article A Kendama learning robot based on bi-directional theory Miyamoto, H., Schaal, S., Gandolfo, F., Koike, Y., Osu, R., Nakano, E., Wada, Y., Kawato, M. Neural Networks, 9(8):1281-1302, 1996, clmc
A general theory of movement-pattern perception based on bi-directional theory for sensory-motor integration can be used for motion capture and learning by watching in robotics. We demonstrate our methods using the game of Kendama, executed by the SARCOS Dextrous Slave Arm, which has a very similar kinematic structure to the human arm. Three ingredients have to be integrated for the successful execution of this task. The ingredients are (1) to extract via-points from a human movement trajectory using a forward-inverse relaxation model, (2) to treat via-points as a control variable while reconstructing the desired trajectory from all the via-points, and (3) to modify the via-points for successful execution. In order to test the validity of the via-point representation, we utilized a numerical model of the SARCOS arm, and examined the behavior of the system under several conditions.

Autonomous Motion Book Chapter From isolation to cooperation: An alternative of a system of experts Schaal, S., Atkeson, C. G. In Advances in Neural Information Processing Systems 8, 605-611, (Editors: Touretzky, D. S.;Mozer, M. C.;Hasselmo, M. E.), MIT Press, Cambridge, MA, 1996, clmc
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross validation error using second order methods. In this way, an expert is able to adjust the size and shape of the receptive field in which its predictions are valid, and also to adjust its bias on the importance of individual input dimensions. The size and shape adjustment corresponds to finding a local distance metric, while the bias adjustment accomplishes local dimensionality reduction. We derive asymptotic results for our method. In a variety of simulations we demonstrate the properties of the algorithm with respect to interference, learning speed, prediction accuracy, feature detection, and task oriented incremental learning. 

Materials Article Influence of humidity on polycrystalline Cu(In,Ga)Se2 thin films for solar cells: A study of Na and H2O coadsorption Heske, C., Richter, G., Chen, Z. H., Fink, R., Umbach, E., Riedl, W., Karg, F. Conference Record of the Twenty Fifth IEEE Photovoltaic Specialists Conference - 1996, 861-864, 1996

Autonomous Motion Article One-handed juggling: A dynamical approach to a rhythmic movement task Schaal, S., Sternad, D., Atkeson, C. G. Journal of Motor Behavior, 28(2):165-183, 1996, clmc
The skill of rhythmically juggling a ball on a racket is investigated from the viewpoint of nonlinear dynamics. The difference equations that model the dynamical system are analyzed by means of local and non-local stability analyses. These analyses yield that the task dynamics offer an economical juggling pattern which is stable even for open-loop actuator motion. For this pattern, two types of predictions are extracted: (i) Stable periodic bouncing is sufficiently characterized by a negative acceleration of the racket at the moment of impact with the ball; (ii) A nonlinear scaling relation maps different juggling trajectories onto one topologically equivalent dynamical system. The relevance of these results for the human control of action was evaluated in an experiment where subjects performed a comparable task of juggling a ball on a paddle. Task manipulations involved different juggling heights and gravity conditions of the ball. The predictions were confirmed: (i) For stable rhythmic performance the paddle's acceleration at impact is negative and fluctuations of the impact acceleration follow predictions from global stability analysis; (ii) For each subject, the realizations of juggling for the different experimental conditions are related by the scaling relation. These results allow the conclusion that for the given task, humans reliably exploit the stable solutions inherent to the dynamics of the task and do not overrule these dynamics by other control mechanisms. The dynamical scaling serves as an efficient principle to generate different movement realizations from only a few parameter changes and is discussed as a dynamical formalization of the principle of motor equivalence.

Autonomous Motion Conference Paper A kendama learning robot based on a dynamic optimization theory Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Osu, R., Nakano, E., Kawato, M. In Proceedings of the 4th IEEE International Workshop on Robot and Human Communication (RO-MAN’95), 327-332, Tokyo, July 1995, clmc

Autonomous Motion Book Chapter Batting a ball: Dynamics of a rhythmic skill Sternad, D., Schaal, S., Atkeson, C. G. In Studies in Perception and Action, 119-122, (Editors: Bardy, B.;Bootsma, R.;Guiard, Y.), Erlbaum, Hillsdale, NJ, 1995, clmc

Autonomous Motion Article Memory-based neural networks for robot learning Atkeson, C. G., Schaal, S. Neurocomputing, 9:1-27, 1995, clmc
This paper explores a memory-based approach to robot learning, using memory-based neural networks to learn models of the task to be performed. Steinbuch and Taylor presented neural network designs to explicitly store training data and do nearest neighbor lookup in the early 1960s. In this paper their nearest neighbor network is augmented with a local model network, which fits a local model to a set of nearest neighbors. This network design is equivalent to a statistical approach known as locally weighted regression, in which a local model is formed to answer each query, using a weighted regression in which nearby points (similar experiences) are weighted more than distant points (less relevant experiences). We illustrate this approach by describing how it has been used to enable a robot to learn a difficult juggling task. Keywords: memory-based, robot learning, locally weighted regression, nearest neighbor, local models.
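
The locally weighted regression idea described in the abstract above (fit a local model per query, weighting nearby points more than distant ones) can be sketched in a few lines. This is a minimal illustration for one-dimensional inputs, not the authors' code: the Gaussian kernel, the `bandwidth` value, and the function name `lwr_predict` are assumptions.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=0.2):
    """Predict y at `query` with a locally weighted linear fit (1-D inputs)."""
    # Gaussian kernel: points near the query get weights close to 1,
    # distant points get weights close to 0.
    weights = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    # Weighted means of inputs and outputs.
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted least-squares slope of the local line.
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return y_mean + slope * (query - x_mean)


# Stored "experiences": samples of sin(x) on [-3, 3].
xs = [i / 10.0 for i in range(-30, 31)]
ys = [math.sin(x) for x in xs]
print(lwr_predict(xs, ys, 1.0))  # close to sin(1.0)
```

All training data are kept in memory and the local line is fitted only when a query arrives, which is the memory-based (lazy) character the abstract refers to. The bandwidth controls how local the fit is; smaller values track curvature more closely but use fewer effective data points.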

Physical Intelligence Conference Paper Visual tracking for moving multiple objects: an integration of vision and control Sitti, M., Bozma, I., Denker, A. In Industrial Electronics, 1995. ISIE’95., Proceedings of the IEEE International Symposium on, 2:535-540, 1995

Autonomous Motion Conference Paper Assessing the quality of learned local models Schaal, S., Atkeson, C. G. In Advances in Neural Information Processing Systems 6, 160-167, (Editors: Cowan, J.;Tesauro, G.;Alspector, J.), Morgan Kaufmann, San Mateo, CA, 1994, clmc
An approach is presented to learning high dimensional functions in the case where the learning algorithm can affect the generation of new data. A local modeling algorithm, locally weighted regression, is used to represent the learned function. Architectural parameters of the approach, such as distance metrics, are also localized and become a function of the query point instead of being global. Statistical tests are given for when a local model is good enough and sampling should be moved to a new area. Our methods explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a "center of exploration" and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach with simulation results and results from a real robot learning a complex juggling task.

Autonomous Motion Conference Paper Memory-based robot learning Schaal, S., Atkeson, C. G. In IEEE International Conference on Robotics and Automation, 3:2928-2933, San Diego, CA, 1994, clmc
We present a memory-based local modeling approach to robot learning using a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, the (hyper-) tangent planes at every query point. This is in contrast to other methods using a finite set of linear models to accomplish a piece-wise linear model. Architectural parameters of our approach, such as distance metrics, are a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.

Autonomous Motion Conference Paper Nonparametric regression for learning Schaal, S. In Conference on Adaptive Behavior and Learning, Center of Interdisciplinary Research (ZIF) Bielefeld Germany, also technical report TR-H-098 of the ATR Human Information Processing Research Laboratories, 1994, clmc
In recent years, learning theory has been increasingly influenced by the fact that many learning algorithms have at least in part a comprehensive interpretation in terms of well established statistical theories. Furthermore, with little modification, several statistical methods can be directly cast into learning algorithms. One family of such methods stems from nonparametric regression. This paper compares nonparametric learning with the more widely used parametric counterparts and investigates how these two families differ in their properties and their applicability. 

Autonomous Motion Article Robot juggling: An implementation of memory-based learning Schaal, S., Atkeson, C. G. Control Systems Magazine, 14(1):57-71, 1994, clmc
This paper explores issues involved in implementing robot learning for a challenging dynamic task, using a case study from robot juggling. We use a memory-based local modeling approach (locally weighted regression) to represent a learned model of the task to be performed. Statistical tests are given to examine the uncertainty of a model, to optimize its prediction quality, and to deal with noisy and corrupted data. We develop an exploration algorithm that explicitly deals with prediction accuracy requirements during exploration. Using all these ingredients in combination with methods from optimal control, our robot achieves fast real-time learning of the task within 40 to 100 trials.

Autonomous Motion Conference Paper Robot learning by nonparametric regression Schaal, S., Atkeson, C. G. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS’94), 478-485, Munich Germany, 1994, clmc
We present an approach to robot learning grounded on a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, i.e., the (hyper-) tangent planes at every query point. Such a model, however, is only generated when a query is performed and is not retained. This is in contrast to other methods using a finite set of linear models to accomplish a piecewise linear model. Architectural parameters of our approach, such as distance metrics, are also a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: Within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.

Autonomous Motion Book Chapter A genetic algorithm for evolution from an ecological perspective Sternad, D., Schaal, S. In 1992 Lectures in Complex Systems, 223-231, (Editors: Nadel, L.;Stein, D.), Addison-Wesley, Redwood City, CA, 1993, clmc
In the population model presented, an evolutionary dynamic is explored which is based on the operator characteristics of genetic algorithms. An essential modification in the genetic algorithms is the inclusion of a constraint in the mixing of the gene pool. The pairing for the crossover is governed by a selection principle based on a complementarity criterion derived from the theoretical tenet of perception-action (P-A) mutuality of ecological psychology. According to Swenson and Turvey [37] P-A mutuality underlies evolution and is an integral part of its thermodynamics. The present simulation tested the contribution of P-A-cycles in evolutionary dynamics. A numerical experiment compares the population's evolution with and without this intentional component. The effect is measured in the difference of the rate of energy dissipation, as well as in three operationalized aspects of complexity. The results support the predicted increase in the rate of energy dissipation, paralleled by an increase in the average heterogeneity of the population. Furthermore, the spatio-temporal evolution of the system is tested for the characteristic power-law relations of a nonlinear system poised in a critical state. The frequency distribution of consecutive increases in population size shows a significantly different exponent in functional relationship.

Autonomous Motion Article Design concurrent calculation: A CAD- and data-integrated approach Schaal, S., Ehrlenspiel, K. Journal of Engineering Design, 4:71-85, 1993, clmc
Besides functional regards, product design demands increasingly more for further reaching considerations. Quality alone cannot suffice anymore to compete in the market; design for manufacturability, for assembly, for recycling, etc., are well-known keywords. Those can largely be reduced to the necessity of design for costs. This paper focuses on a CAD-based approach to design concurrent calculation. It will discuss how, in the meantime well-established, tools like feature technology, knowledge-based systems, and relational databases can be blended into one coherent concept to achieve an entirely CAD- and data-integrated cost information tool. This system is able to extract data from the CAD-system, combine it with data about the company specific manufacturing environment, and subsequently autonomously evaluate manufacturability aspects and costs of the given CAD-model. Within minutes the designer gets quantitative information about the major cost sources of his/her design. Additionally, some alternative methods for approximating manufacturing times from empirical data, namely neural networks and locally weighted regression, are introduced.

Autonomous Motion Book Chapter Learning passive motor control strategies with genetic algorithms Schaal, S., Sternad, D. In 1992 Lectures in complex systems, 913-918, (Editors: Nadel, L.;Stein, D.), Addison-Wesley, Redwood City, CA, 1993, clmc
This study investigates learning passive motor control strategies. Passive control is understood as control without active error correction; the movement is stabilized by particular properties of the controlling dynamics. We analyze the task of juggling a ball on a racket. An approximation to the optimal solution of the task is derived by means of optimization theory. In order to model the learning process, the problem is coded for a genetic algorithm in representations without sensory or with sensory information. For all representations the genetic algorithm is able to find passive control strategies, but learning speed and the quality of the outcome are significantly different. A comparison with data from human subjects shows that humans seem to apply yet different movement strategies to the ones proposed. For the feedback representation some implications arise for learning from demonstration.

Autonomous Motion Conference Paper Open loop stable control strategies for robot juggling Schaal, S., Atkeson, C. G. In IEEE International Conference on Robotics and Automation, 3:913-918, Piscataway, NJ: IEEE, Atlanta, Georgia, May 2-6, 1993, clmc
In a series of case studies out of the field of dynamic manipulation (Mason, 1992), different principles for open loop stable control are introduced and analyzed. This investigation may provide some insight into how open loop control can serve as a useful foundation for closed loop control and, particularly, what to focus on in learning control. 

Autonomous Motion Conference Paper Roles for memory-based learning in robotics Atkeson, C. G., Schaal, S. In Proceedings of the Sixth International Symposium on Robotics Research, 503-521, Hidden Valley, PA, 1993, clmc

Autonomous Motion Book Chapter Informationssysteme mit CAD (Information systems within CAD) Schaal, S. In CAD/CAM Grundlagen, 199-204, (Editors: Milberg, J.), Springer, Buchreihe CIM-TT. Berlin, 1992, clmc

Autonomous Motion Book Integrierte Wissensverarbeitung mit CAD am Beispiel der konstruktionsbegleitenden Kalkulation (Ways to smarter CAD Systems) Schaal, S. Hanser 1992. (Konstruktionstechnik München Band 8). Zugl. München: TU Diss., München, 1992, clmc

Autonomous Motion Conference Paper What should be learned? Schaal, S., Atkeson, C. G., Botros, S. In Proceedings of Seventh Yale Workshop on Adaptive and Learning Systems, 199-204, New Haven, CT, May 20-22, 1992, clmc

Autonomous Motion Book Chapter Ways to smarter CAD-systems Ehrlenspiel, K., Schaal, S. In Proceedings of ICED’91 Heurista, 10-16, (Editors: Hubka), Edition, Schriftenreihe WDK 21. Zürich, 1991, clmc

Physical Intelligence Article In vivo diabetic wound healing with nanofibrous scaffolds modified with gentamicin and recombinant human epidermal growth factor Dwivedi, C., Pandey, I., Pandey, H., Patil, S., Mishra, S. B., Pandey, A. C., Zamboni, P., Ramteke, P. W., Singh, A. V. Journal of Biomedical Materials Research Part A, 106(3):641-651, March 2018
Diabetic wounds are susceptible to microbial infection. The treatment of these wounds requires a higher payload of growth factors. With this in mind, the strategy for this study was to utilize a novel payload comprising Eudragit RL/RS 100 nanofibers carrying the bacterial inhibitor gentamicin sulfate (GS) in concert with recombinant human epidermal growth factor (rhEGF), an accelerator of wound healing. GS containing Eudragit was electrospun to yield nanofiber scaffolds, which were further modified by covalent immobilization of rhEGF to their surface. This novel fabricated nanoscaffold was characterized using scanning electron microscopy, Fourier transform infrared spectroscopy, and X‐ray diffraction. The thermal behavior of the nanoscaffold was determined using thermogravimetric analysis and differential scanning calorimetry. In the in vitro antibacterial assays, the nanoscaffolds exhibited comparable antibacterial activity to pure gentamicin powder. In in vivo work using female C57/BL6 mice, the nanoscaffolds induced faster wound healing activity in dorsal wounds compared to the control. The paradigm in this study presents a robust in vivo model to enhance the applicability of drug delivery systems in wound healing applications. © 2017 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 106A: 641–651, 2018.

Article Classified Regression for Bayesian Optimization: Robot Learning with Unknown Penalties Marco, A., Baumann, D., Hennig, P., Trimpe, S. Submitted to Journal (In preparation)
Learning robot controllers by minimizing a black-box objective cost using Bayesian optimization (BO) can be time-consuming and challenging. It is very often the case that some roll-outs result in failure behaviors, causing premature experiment detention. In such cases, the designer is forced to decide on heuristic cost penalties because the acquired data is often scarce, or not comparable with that of the stable policies. To overcome this, we propose a Bayesian model that captures exactly what we know about the cost of unstable controllers prior to data collection: Nothing, except that it should be a somewhat large number. The resulting Bayesian model, approximated with a Gaussian process, predicts high cost values in regions where failures are likely to occur. In this way, the model guides the BO exploration toward regions of stability. We demonstrate the benefits of the proposed model in several illustrative and statistical synthetic benchmarks, and also in experiments on a real robotic platform. In addition, we propose and experimentally validate a new BO method to account for unknown constraints. Such method is an extension of Max-Value Entropy Search, a recent information-theoretic method, to solve unconstrained global optimization problems.

Conference Paper Goal-conditioned Offline Planning from Curious Exploration Bagatella, M., Martius, G. In Advances in Neural Information Processing Systems 36
Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.