Header logo is


2009


no image
Path integral-based stochastic optimal control for rigid body dynamics

Theodorou, E. A., Buchli, J., Schaal, S.

In Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL ’09. IEEE Symposium on, pages: 219-225, 2009, clmc (inproceedings)

Abstract
Recent advances on path integral stochastic optimal control [1],[2] provide new insights in the optimal control of nonlinear stochastic systems which are linear in the controls, with state independent and time invariant control transition matrix. Under these assumptions, the Hamilton-Jacobi-Bellman (HJB) equation is formulated and linearized with the use of the logarithmic transformation of the optimal value function. The resulting HJB is a linear second order partial differential equation which is solved by an approximation based on the Feynman-Kac formula [3]. In this work we review the theory of path integral control and derive the linearized HJB equation for systems with state dependent control transition matrix. In addition we derive the path integral formulation for the general class of systems with state dimensionality that is higher than the dimensionality of the controls. Furthermore, by means of a modified inverse dynamics controller, we apply path integral stochastic optimal control over the new control space. Simulations illustrate the theoretical results. Future developments and extensions are discussed.

am

link (url) [BibTex]

2009


link (url) [BibTex]


no image
Learning locomotion over rough terrain using terrain templates

Kalakrishnan, M., Buchli, J., Pastor, P., Schaal, S.

In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages: 167-172, 2009, clmc (inproceedings)

Abstract
We address the problem of foothold selection in robotic legged locomotion over very rough terrain. The difficulty of the problem we address here is comparable to that of human rock-climbing, where foot/hand-hold selection is one of the most critical aspects. Previous work in this domain typically involves defining a reward function over footholds as a weighted linear combination of terrain features. However, a significant amount of effort needs to be spent in designing these features in order to model more complex decision functions, and hand-tuning their weights is not a trivial task. We propose the use of terrain templates, which are discretized height maps of the terrain under a foothold on different length scales, as an alternative to manually designed features. We describe an algorithm that can simultaneously learn a small set of templates and a foothold ranking function using these templates, from expert-demonstrated footholds. Using the LittleDog quadruped robot, we experimentally show that the use of terrain templates can produce complex ranking functions with higher performance than standard terrain features, and improved generalization to unseen terrain.

am

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


Valero-Cuevas, F., Hoffmann, H., Kurse, M. U., Kutch, J. J., Theodorou, E. A.

IEEE Reviews in Biomedical Engineering – (All authors have equally contributed), (2):110?135, 2009, clmc (article)

Abstract
Computational models of the neuromuscular system hold the potential to allow us to reach a deeper understanding of neuromuscular function and clinical rehabilitation by complementing experimentation. By serving as a means to distill and explore specific hypotheses, computational models emerge from prior experimental data and motivate future experimental work. Here we review computational tools used to understand neuromuscular function including musculoskeletal modeling, machine learning, control theory, and statistical model analysis. We conclude that these tools, when used in combination, have the potential to further our understanding of neuromuscular function by serving as a rigorous means to test scientific hypotheses in ways that complement and leverage experimental data.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Compact models of motor primitive variations for predictible reaching and obstacle avoidance

Stulp, F., Oztop, E., Pastor, P., Beetz, M., Schaal, S.

In IEEE-RAS International Conference on Humanoid Robots (Humanoids 2009), Paris, Dec.7-10, 2009, clmc (inproceedings)

Abstract
over and over again. This regularity allows humans and robots to reuse existing solutions for known recurring tasks. We expect that reusing a set of standard solutions to solve similar tasks will facilitate the design and on-line adaptation of the control systems of robots operating in human environments. In this paper, we derive a set of standard solutions for reaching behavior from human motion data. We also derive stereotypical reaching trajectories for variations of the task, in which obstacles are present. These stereotypical trajectories are then compactly represented with Dynamic Movement Primitives. On the humanoid robot Sarcos CB, this approach leads to reproducible, predictable, and human-like reaching motions.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Human optimization strategies under reward feedback

Hoffmann, H., Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2009), Waikoloa, Hawaii, 2009, 2009, clmc (inproceedings)

Abstract
Many hypothesis on human movement generation have been cast into an optimization framework, implying that movements are adapted to optimize a single quantity, like, e.g., jerk, end-point variance, or control cost. However, we still do not understand how humans actually learn when given only a cost or reward feedback at the end of a movement. Such a reinforcement learning setting has been extensively explored theoretically in engineering and computer science, but in human movement control, hardly any experiment studied movement learning under reward feedback. We present experiments probing which computational strategies humans use to optimize a movement under a continuous reward function. We present two experimental paradigms. The first paradigm mimics a ball-hitting task. Subjects (n=12) sat in front of a computer screen and moved a stylus on a tablet towards an unknown target. This target was located on a line that the subjects had to cross. During the movement, visual feedback was suppressed. After the movement, a reward was displayed graphically as a colored bar. As reward, we used a Gaussian function of the distance between the target location and the point of line crossing. We chose such a function since in sensorimotor tasks, the cost or loss function that humans seem to represent is close to an inverted Gaussian function (Koerding and Wolpert 2004). The second paradigm mimics pocket billiards. On the same experimental setup as above, the computer screen displayed a pocket (two bars), a white disk, and a green disk. The goal was to hit with the white disk the green disk (as in a billiard collision), such that the green disk moved into the pocket. Subjects (n=8) manipulated with the stylus the white disk to effectively choose start point and movement direction. Reward feedback was implicitly given as hitting or missing the pocket with the green disk. In both paradigms, subjects increased the average reward over trials. The surprising result was that in these experiments, humans seem to prefer a strategy that uses a reward-weighted average over previous movements instead of gradient ascent. The literature on reinforcement learning is dominated by gradient-ascent methods. However, our computer simulations and theoretical analysis revealed that reward-weighted averaging is the more robust choice given the amount of movement variance observed in humans. Apparently, humans choose an optimization strategy that is suitable for their own movement variance.

am

[BibTex]

[BibTex]


no image
Bayesian Methods for Autonomous Learning Systems (Phd Thesis)

Ting, J.

Department of Computer Science, University of Southern California, Los Angeles, CA, 2009, clmc (phdthesis)

am

PDF [BibTex]

PDF [BibTex]


no image
On-line learning and modulation of periodic movements with nonlinear dynamical systems

Gams, A., Ijspeert, A., Schaal, S., Lenarčič, J.

Autonomous Robots, 27(1):3-23, 2009, clmc (article)

Abstract
Abstract  The paper presents a two-layered system for (1) learning and encoding a periodic signal without any knowledge on its frequency and waveform, and (2) modulating the learned periodic trajectory in response to external events. The system is used to learn periodic tasks on a humanoid HOAP-2 robot. The first layer of the system is a dynamical system responsible for extracting the fundamental frequency of the input signal, based on adaptive frequency oscillators. The second layer is a dynamical system responsible for learning of the waveform based on a built-in learning algorithm. By combining the two dynamical systems into one system we can rapidly teach new trajectories to robots without any knowledge of the frequency of the demonstration signal. The system extracts and learns only one period of the demonstration signal. Furthermore, the trajectories are robust to perturbations and can be modulated to cope with a dynamic environment. The system is computationally inexpensive, works on-line for any periodic signal, requires no additional signal processing to determine the frequency of the input signal and can be applied in parallel to multiple dimensions. Additionally, it can adapt to changes in frequency and shape, e.g. to non-stationary signals, such as hand-generated signals and human demonstrations.

am

link (url) [BibTex]

link (url) [BibTex]


no image
The SL simulation and real-time control software package

Schaal, S.

University of Southern California, Los Angeles, CA, 2009, clmc (techreport)

Abstract
SL was originally developed as a Simulation Laboratory software package to allow creating complex rigid-body dynamics simulations with minimal development times. It was meant to complement a real-time robotics setup such that robot programs could first be debugged in simulation before trying them on the actual robot. For this purpose, the motor control setup of SL was copied from our experience with real-time robot setups with vxWorks (Windriver Systems, Inc.)Ñindeed, more than 90% of the code is identical to the actual robot software, as will be explained later in detail. As a result, SL is divided into three software components: 1) the generic code that is shared by the actual robot and the simulation, 2) the robot specific code, and 3) the simulation specific code. The robot specific code is tailored to the robotic environments that we have experienced over the years, in particular towards VME-based multi-processor real-time operating systems. The simulation specific code has all the components for OpenGL graphics simulations and mimics the robot multi-processor environment in simple C-code. Importantly, SL can be used stand-alone for creating graphics an-imationsÑthe heritage from real-time robotics does not restrict the complexity of possible simulations. This technical report describes SL in detail and can serve as a manual for new users of SL.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Local dimensionality reduction for non-parametric regression

Hoffman, H., Schaal, S., Vijayakumar, S.

Neural Processing Letters, 2009, clmc (article)

Abstract
Locally-weighted regression is a computationally-efficient technique for non-linear regression. However, for high-dimensional data, this technique becomes numerically brittle and computationally too expensive if many local models need to be maintained simultaneously. Thus, local linear dimensionality reduction combined with locally-weighted regression seems to be a promising solution. In this context, we review linear dimensionality-reduction methods, compare their performance on nonparametric locally-linear regression, and discuss their ability to extend to incremental learning. The considered methods belong to the following three groups: (1) reducing dimensionality only on the input data, (2) modeling the joint input-output data distribution, and (3) optimizing the correlation between projection directions and output data. Group 1 contains principal component regression (PCR); group 2 contains principal component analysis (PCA) in joint input and output space, factor analysis, and probabilistic PCA; and group 3 contains reduced rank regression (RRR) and partial least squares (PLS) regression. Among the tested methods, only group 3 managed to achieve robust performance even for a non-optimal number of components (factors or projection directions). In contrast, group 1 and 2 failed for fewer components since these methods rely on the correct estimate of the true intrinsic dimensionality. In group 3, PLS is the only method for which a computationally-efficient incremental implementation exists. Thus, PLS appears to be ideally suited as a building block for a locally-weighted regressor in which projection directions are incrementally added on the fly.

am

link (url) [BibTex]

link (url) [BibTex]


no image
The SL simulation and real-time control software package

Schaal, S.

University of Southern California, Los Angeles, CA, 2009, clmc (techreport)

Abstract
SL was originally developed as a Simulation Laboratory software package to allow creating complex rigid-body dynamics simulations with minimal development times. It was meant to complement a real-time robotics setup such that robot programs could first be debugged in simulation before trying them on the actual robot. For this purpose, the motor control setup of SL was copied from our experience with real-time robot setups with vxWorks (Windriver Systems, Inc.)â??indeed, more than 90% of the code is identical to the actual robot software, as will be explained later in detail. As a result, SL is divided into three software components: 1) the generic code that is shared by the actual robot and the simulation, 2) the robot specific code, and 3) the simulation specific code. The robot specific code is tailored to the robotic environments that we have experienced over the years, in particular towards VME-based multi-processor real-time operating systems. The simulation specific code has all the components for OpenGL graphics simulations and mimics the robot multi-processor environment in simple C-code. Importantly, SL can be used stand-alone for creating graphics an-imationsâ??the heritage from real-time robotics does not restrict the complexity of possible simulations. This technical report describes SL in detail and can serve as a manual for new users of SL.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Learning and generalization of motor skills by learning from demonstration

Pastor, P., Hoffmann, H., Asfour, T., Schaal, S.

In International Conference on Robotics and Automation (ICRA2009), Kobe, Japan, May 12-19, 2009, 2009, clmc (inproceedings)

Abstract
We provide a general approach for learning robotic motor skills from human demonstration. To represent an observed movement, a non-linear differential equation is learned such that it reproduces this movement. Based on this representation, we build a library of movements by labeling each recorded movement according to task and context (e.g., grasping, placing, and releasing). Our differential equation is formulated such that generalization can be achieved simply by adapting a start and a goal parameter in the equation to the desired position values of a movement. For object manipulation, we present how our framework extends to the control of gripper orientation and finger position. The feasibility of our approach is demonstrated in simulation as well as on a real robot. The robot learned a pick-and-place operation and a water-serving task and could generalize these tasks to novel situations.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Compliant quadruped locomotion over rough terrain

Buchli, J., Kalakrishnan, M., Mistry, M., Pastor, P., Schaal, S.

In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages: 814-820, 2009, clmc (inproceedings)

Abstract
Many critical elements for statically stable walking for legged robots have been known for a long time, including stability criteria based on support polygons, good foothold selection, recovery strategies to name a few. All these criteria have to be accounted for in the planning as well as the control phase. Most legged robots usually employ high gain position control, which means that it is crucially important that the planned reference trajectories are a good match for the actual terrain, and that tracking is accurate. Such an approach leads to conservative controllers, i.e. relatively low speed, ground speed matching, etc. Not surprisingly such controllers are not very robust - they are not suited for the real world use outside of the laboratory where the knowledge of the world is limited and error prone. Thus, to achieve robust robotic locomotion in the archetypical domain of legged systems, namely complex rough terrain, where the size of the obstacles are in the order of leg length, additional elements are required. A possible solution to improve the robustness of legged locomotion is to maximize the compliance of the controller. While compliance is trivially achieved by reduced feedback gains, for terrain requiring precise foot placement (e.g. climbing rocks, walking over pegs or cracks) compliance cannot be introduced at the cost of inferior tracking. Thus, model-based control and - in contrast to passive dynamic walkers - active balance control is required. To achieve these objectives, in this paper we add two crucial elements to legged locomotion, i.e., floating-base inverse dynamics control and predictive force control, and we show that these elements increase robustness in face of unknown and unanticipated perturbations (e.g. obstacles). Furthermore, we introduce a novel line-based COG trajectory planner, which yields a simpler algorithm than traditional polygon based methods and creates the appropriate input to our control system.We show results from bot- h simulation and real world of a robotic dog walking over non-perceived obstacles and rocky terrain. The results prove the effectivity of the inverse dynamics/force controller. The presented results show that we have all elements needed for robust all-terrain locomotion, which should also generalize to other legged systems, e.g., humanoid robots.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Incorporating Muscle Activation-Contraction dynamics to an optimal control framework for finger movements

Theodorou, Evangelos A., Valero-Cuevas, Francisco J.

Abstracts of Neural Control of Movement Conference (NCM 2009), 2009, clmc (article)

Abstract
Recent experimental and theoretical work [1] investigated the neural control of contact transition between motion and force during tapping with the index finger as a nonlinear optimization problem. Such transitions from motion to well-directed contact force are a fundamental part of dexterous manipulation. There are 3 alternative hypotheses of how this transition could be accomplished by the nervous system as a function of changes in direction and magnitude of the torque vector controlling the finger. These hypotheses are 1) an initial change in direction with a subsequent change in magnitude of the torque vector; 2) an initial change in magnitude with a subsequent directional change of the torque vector; and 3) a simultaneous and proportionally equal change of both direction and magnitude of the torque vector. Experimental work in [2] shows that the nervous system selects the first strategy, and in [1] we suggest that this may in fact be the optimal strategy. In [4] the framework of Iterative Linear Quadratic Optimal Regulator (ILQR) was extended to incorporate motion and force control. However, our prior simulation work assumed direct and instantaneous control of joint torques, which ignores the known delays and filtering properties of skeletal muscle. In this study, we implement an ILQR controller for a more biologically plausible biomechanical model of the index finger than [4], and add activation-contraction dynamics to the system to simulate muscle function. The planar biomechanical model includes the kinematics of the 3 joints while the applied torques are driven by activation?contraction dynamics with biologically plausible time constants [3]. In agreement with our experimental work [2], the task is to, within 500 ms, move the finger from a given resting configuration to target configuration with a desired terminal velocity. ILQR does not only stabilize the finger dynamics according to the objective function, but it also generates smooth joint space trajectories with minimal tuning and without an a-priori initial control policy (which is difficult to find for highly dimensional biomechanical systems). Furthemore, the use of this optimal control framework and the addition of activation-contraction dynamics considers the full nonlinear dynamics of the index finger and produces a sequence of postures which are compatible with experimental motion data [2]. These simulations combined with prior experimental results suggest that optimal control is a strong candidate for the generation of finger movements prior to abrupt motion-to-force transitions. This work is funded in part by grants NIH R01 0505520 and NSF EFRI-0836042 to Dr. Francisco J. Valero- Cuevas 1 Venkadesan M, Valero-Cuevas FJ. 
Effects of neuromuscular lags on controlling contact transitions. 
Philosophical Transactions of the Royal Society A: 2008. 2 Venkadesan M, Valero-Cuevas FJ. 
Neural Control of Motion-to-Force Transitions with the Fingertip. 
J. Neurosci., Feb 2008; 28: 1366 - 1373; 3 Zajac. Muscle and tendon: properties, models, scaling, and application to biomechanics and motor control. Crit Rev Biomed Eng, 17 4. Weiwei Li., Francisco Valero Cuevas: ?Linear Quadratic Optimal Control of Contact Transition with Fingertip ? ACC 2009

am

PDF [BibTex]

PDF [BibTex]


no image
Inertial parameter estimation of floating-base humanoid systems using partial force sensing

Mistry, M., Schaal, S., Yamane, K.

In IEEE-RAS International Conference on Humanoid Robots (Humanoids 2009), Paris, Dec.7-10, 2009, clmc (inproceedings)

Abstract
Recently, several controllers have been proposed for humanoid robots which rely on full-body dynamic models. The estimation of inertial parameters from data is a critical component for obtaining accurate models for control. However, floating base systems, such as humanoid robots, incur added challenges to this task (e.g. contact forces must be measured, contact states can change, etc.) In this work, we outline a theoretical framework for whole body inertial parameter estimation, including the unactuated floating base. Using a least squares minimization approach, conducted within the nullspace of unmeasured degrees of freedom, we are able to use a partial force sensor set for full-body estimation, e.g. using only joint torque sensors, allowing for estimation when contact force measurement is unavailable or unreliable (e.g. due to slipping, rolling contacts, etc.). We also propose how to determine the theoretical minimum force sensor set for full body estimation, and discuss the practical limitations of doing so.

am

link (url) [BibTex]

link (url) [BibTex]


no image
On-line learning and modulation of periodic movements with nonlinear dynamical systems

Gams, A., Ijspeert, A., Schaal, S., Lenarčič, J.

Autonomous Robots, 27(1):3-23, 2009, clmc (article)

Abstract
Abstract  The paper presents a two-layered system for (1) learning and encoding a periodic signal without any knowledge on its frequency and waveform, and (2) modulating the learned periodic trajectory in response to external events. The system is used to learn periodic tasks on a humanoid HOAP-2 robot. The first layer of the system is a dynamical system responsible for extracting the fundamental frequency of the input signal, based on adaptive frequency oscillators. The second layer is a dynamical system responsible for learning of the waveform based on a built-in learning algorithm. By combining the two dynamical systems into one system we can rapidly teach new trajectories to robots without any knowledge of the frequency of the demonstration signal. The system extracts and learns only one period of the demonstration signal. Furthermore, the trajectories are robust to perturbations and can be modulated to cope with a dynamic environment. The system is computationally inexpensive, works on-line for any periodic signal, requires no additional signal processing to determine the frequency of the input signal and can be applied in parallel to multiple dimensions. Additionally, it can adapt to changes in frequency and shape, e.g. to non-stationary signals, such as hand-generated signals and human demonstrations.

am

link (url) [BibTex]

link (url) [BibTex]

2005


no image
Composite adaptive control with locally weighted statistical learning

Nakanishi, J., Farrell, J. A., Schaal, S.

Neural Networks, 18(1):71-90, January 2005, clmc (article)

Abstract
This paper introduces a provably stable learning adaptive control framework with statistical learning. The proposed algorithm employs nonlinear function approximation with automatic growth of the learning network according to the nonlinearities and the working domain of the control system. The unknown function in the dynamical system is approximated by piecewise linear models using a nonparametric regression technique. Local models are allocated as necessary and their parameters are optimized on-line. Inspired by composite adaptive control methods, the proposed learning adaptive control algorithm uses both the tracking error and the estimation error to update the parameters. We first discuss statistical learning of nonlinear functions, and motivate our choice of the locally weighted learning framework. Second, we begin with a class of first order SISO systems for theoretical development of our learning adaptive control framework, and present a stability proof including a parameter projection method that is needed to avoid potential singularities during adaptation. Then, we generalize our adaptive controller to higher order SISO systems, and discuss further extension to MIMO problems. Finally, we evaluate our theoretical control framework in numerical simulations to illustrate the effectiveness of the proposed learning adaptive controller for rapid convergence and high accuracy of control.

am

link (url) [BibTex]

2005


link (url) [BibTex]


no image
Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 16th European Conference on Machine Learning, 3720, pages: 280-291, (Editors: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L.), Springer, ECML, 2005, clmc (inproceedings)

Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing AmariÕs natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regres- sion. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and BradtkeÕs Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Em- pirical evaluations illustrate the effectiveness of our techniques in com- parison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Comparative experiments on task space control with redundancy resolution

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005, clmc (inproceedings)

Abstract
Understanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundacy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, despite, from a theoretical point, the algorithm does not share the same beauty as some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
A model of smooth pursuit based on learning of the target dynamics using only retinal signals

Shibata, T., Tabata, H., Schaal, S., Kawato, M.

Neural Networks, 18, pages: 213-225, 2005, clmc (article)

Abstract
While the predictive nature of the primate smooth pursuit system has been evident through several behavioural and neurophysiological experiments, few models have attempted to explain these results comprehensively. The model we propose in this paper in line with previous models employing optimal control theory; however, we hypothesize two new issues: (1) the medical superior temporal (MST) area in the cerebral cortex implements a recurrent neural network (RNN) in order to predict the current or future target velocity, and (2) a forward model of the target motion is acquired by on-line learning. We use stimulation studies to demonstrate how our new model supports these hypotheses.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Linear and Nonlinear Estimation models applied to Hemodynamic Model

Theodorou, E.

Technical Report-2005-1, Computational Action and Vision Lab University of Minnesota, 2005, clmc (techreport)

Abstract
The relation between BOLD signal and neural activity is still poorly understood. The Gaussian Linear Model known as GLM is broadly used in many fMRI data analysis for recovering the underlying neural activity. Although GLM has been proved to be a really useful tool for analyzing fMRI data it can not be used for describing the complex biophysical process of neural metabolism. In this technical report we make use of a system of Stochastic Differential Equations that is based on Buxton model [1] for describing the underlying computational principles of hemodynamic process. Based on this SDE we built a Kalman Filter estimator so as to estimate the induced neural signal as well as the blood inflow under physiologic and sensor noise. The performance of Kalman Filter estimator is investigated under different physiologic noise characteristics and measurement frequencies.

am

PDF [BibTex]

PDF [BibTex]


no image
Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Ting, J., D’Souza, A., Yamamoto, K., Yoshioka, T., Hoffman, D., Kakei, S., Sergio, L., Kalaska, J., Kawato, M., Strick, P., Schaal, S.

In Advances in Neural Information Processing Systems 18 (NIPS 2005), (Editors: Weiss, Y.;Schölkopf, B.;Platt, J.), Cambridge, MA: MIT Press, Vancouver, BC, Dec. 6-11, 2005, clmc (inproceedings)

Abstract
An increasing number of projects in neuroscience requires the statistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing, or in operating artificial devices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classi-cal linear regression approaches are often numercially too fragile in high dimen-sions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear ap-proaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, and regular-izes against overfitting. In comparison with ordinary least squares, stepwise re-gression, partial least squares, and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency, and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well pre-dicted from M1 neurons, further opening the path for possible real-time inter-faces between brains and machines.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Rapbid synchronization and accurate phase-locking of rhythmic motor primitives

Pongas, D., Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2005), pages: 2911-2916, Edmonton, Alberta, Canada, Aug. 2-6, 2005, clmc (inproceedings)

Abstract
Rhythmic movement is ubiquitous in human and animal behavior, e.g., as in locomotion, dancing, swimming, chewing, scratching, music playing, etc. A particular feature of rhythmic movement in biology is the rapid synchronization and phase locking with other rhythmic events in the environment, for instance music or visual stimuli as in ball juggling. In traditional oscillator theories to rhythmic movement generation, synchronization with another signal is relatively slow, and it is not easy to achieve accurate phase locking with a particular feature of the driving stimulus. Using a recently developed framework of dynamic motor primitives, we demonstrate a novel algorithm for very rapid synchronizaton of a rhythmic movement pattern, which can phase lock any feature of the movement to any particulur event in the driving stimulus. As an example application, we demonstrate how an anthropomorphic robot can use imitation learning to acquire a complex rumming pattern and keep it synchronized with an external rhythm generator that changes its frequency over time.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Parametric and Non-Parametric approaches for nonlinear tracking of moving objects

Hidaka, Y, Theodorou, E.

Technical Report-2005-1, 2005, clmc (article)

am

PDF [BibTex]

PDF [BibTex]


no image
A new methodology for robot control design

Peters, J., Mistry, M., Udwadia, F. E., Schaal, S.

In The 5th ASME International Conference on Multibody Systems, Nonlinear Dynamics, and Control (MSNDC 2005), Long Beach, CA, Sept. 24-28, 2005, clmc (inproceedings)

Abstract
Gauss principle of least constraint and its generalizations have provided a useful insights for the development of tracking controllers for mechanical systems (Udwadia,2003). Using this concept, we present a novel methodology for the design of a specific class of robot controllers. With our new framework, we demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic framework, and show experimental verifications on a Sarcos Master Arm robot for some of these controllers. We believe that the suggested approach unifies and simplifies the design of optimal nonlinear control laws for robots obeying rigid body dynamics equations, both with or without external constraints, holonomic or nonholonomic constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Arm movement experiments with joint space force fields using an exoskeleton robot

Mistry, M., Mohajerian, P., Schaal, S.

In IEEE Ninth International Conference on Rehabilitation Robotics, pages: 408-413, Chicago, Illinois, June 28-July 1, 2005, clmc (inproceedings)

Abstract
A new experimental platform permits us to study a novel variety of issues of human motor control, particularly full 3-D movements involving the major seven degrees-of-freedom (DOF) of the human arm. We incorporate a seven DOF robot exoskeleton, and can minimize weight and inertia through gravity, Coriolis, and inertia compensation, such that subjects' arm movements are largely unaffected by the manipulandum. Torque perturbations can be individually applied to any or all seven joints of the human arm, thus creating novel dynamic environments, or force fields, for subjects to respond and adapt to. Our first study investigates a joint space force field where the shoulder velocity drives a disturbing force in the elbow joint. Results demonstrate that subjects learn to compensate for the force field within about 100 trials, and from the strong presence of aftereffects when removing the field in some randomized catch trials, that an inverse dynamics, or internal model, of the force field is formed by the nervous system. Interestingly, while post-learning hand trajectories return to baseline, joint space trajectories remained changed in response to the field, indicating that besides learning a model of the force field, the nervous system also chose to exploit the space to minimize the effects of the force field on the realization of the endpoint trajectory plan. Further applications for our apparatus include studies in motor system redundancy resolution and inverse kinematics, as well as rehabilitation.

am

link (url) [BibTex]

link (url) [BibTex]


no image
A unifying framework for the control of robotics systems

Peters, J., Mistry, M., Udwadia, F. E., Cory, R., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2005), pages: 1824-1831, Edmonton, Alberta, Canada, Aug. 2-6, 2005, clmc (inproceedings)

Abstract
Recently, [1] suggested to derive tracking controllers for mechanical systems using a generalization of GaussÕ principle of least constraint. This method al-lows us to reformulate control problems as a special class of optimal control. We take this line of reasoning one step further and demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sar-cos Master Arm robot for some of the the derived controllers.We believe that the suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equa-tions, both with or without external constraints, with over-actuation or under-actuation, as well as open-chain and closed-chain kinematics.

am

link (url) [BibTex]

link (url) [BibTex]

2001


no image
Humanoid oculomotor control based on concepts of computational neuroscience

Shibata, T., Vijayakumar, S., Conradt, J., Schaal, S.

In Humanoids2001, Second IEEE-RAS International Conference on Humanoid Robots, 2001, clmc (inproceedings)

Abstract
Oculomotor control in a humanoid robot faces similar problems as biological oculomotor systems, i.e., the stabilization of gaze in face of unknown perturbations of the body, selective attention, the complexity of stereo vision and dealing with large information processing delays. In this paper, we suggest control circuits to realize three of the most basic oculomotor behaviors - the vestibulo-ocular and optokinetic reflex (VOR-OKR) for gaze stabilization, smooth pursuit for tracking moving objects, and saccades for overt visual attention. Each of these behaviors was derived from inspirations from computational neuroscience, which proves to be a viable strategy to explore novel control mechanisms for humanoid robotics. Our implementations on a humanoid robot demonstrate good performance of the oculomotor behaviors that appears natural and human-like.

am

link (url) [BibTex]

2001


link (url) [BibTex]


no image
Trajectory formation for imitation with nonlinear dynamical systems

Ijspeert, A., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), pages: 752-757, Weilea, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
This article explores a new approach to learning by imitation and trajectory formation by representing movements as mixtures of nonlinear differential equations with well-defined attractor dynamics. An observed movement is approximated by finding a best fit of the mixture model to its data by a recursive least squares regression technique. In contrast to non-autonomous movement representations like splines, the resultant movement plan remains an autonomous set of nonlinear differential equations that forms a control policy which is robust to strong external perturbations and that can be modified by additional perceptual variables. This movement policy remains the same for a given target, regardless of the initial conditions, and can easily be re-used for new targets. We evaluate the trajectory formation system (TFS) in the context of a humanoid robot simulation that is part of the Virtual Trainer (VT) project, which aims at supervising rehabilitation exercises in stroke-patients. A typical rehabilitation exercise was collected with a Sarcos Sensuit, a device to record joint angular movement from human subjects, and approximated and reproduced with our imitation techniques. Our results demonstrate that multi-joint human movements can be encoded successfully, and that this system allows robust modifications of the movement policy through external variables.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Real-time statistical learning for robotics and human augmentation

Schaal, S., Vijayakumar, S., D’Souza, A., Ijspeert, A., Nakanishi, J.

In International Symposium on Robotics Research, (Editors: Jarvis, R. A.;Zelinsky, A.), Lorne, Victoria, Austrialia Nov.9-12, 2001, clmc (inproceedings)

Abstract
Real-time modeling of complex nonlinear dynamic processes has become increasingly important in various areas of robotics and human augmentation. To address such problems, we have been developing special statistical learning methods that meet the demands of on-line learning, in particular the need for low computational complexity, rapid learning, and scalability to high-dimensional spaces. In this paper, we introduce a novel algorithm that possesses all the necessary properties by combining methods from probabilistic and nonparametric learning. We demonstrate the applicability of our methods for three different applications in humanoid robotics, i.e., the on-line learning of a full-body inverse dynamics model, an inverse kinematics model, and imitation learning. The latter application will also introduce a novel method to shape attractor landscapes of dynamical system by means of statis-tical learning.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Robust learning of arm trajectories through human demonstration

Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
We present a model, composed of hierarchy of artificial neural networks, for robot learning by demonstration. The model is implemented in a dynamic simulation of a 41 degrees of freedom humanoid for reproducing 3D human motion of the arm. Results show that the model requires few information about the desired trajectory and learns on-line the relevant features of movement. It can generalize across a small set of data to produce a qualitatively good reproduction of the demonstrated trajectory. Finally, it is shown that reproduction of the trajectory after learning is robust against perturbations.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Synchronized robot drumming by neural oscillator

Kotosaka, S., Schaal, S.

Journal of the Robotics Society of Japan, 19(1):116-123, 2001, clmc (article)

Abstract
Sensory-motor integration is one of the key issues in robotics. In this paper, we propose an approach to rhythmic arm movement control that is synchronized with an external signal based on exploiting a simple neural oscillator network. Trajectory generation by the neural oscillator is a biologically inspired method that can allow us to generate a smooth and continuous trajectory. The parameter tuning of the oscillators is used to generate a synchronized movement with wide intervals. We adopted the method for the drumming task as an example task. By using this method, the robot can realize synchronized drumming with wide drumming intervals in real time. The paper also shows the experimental results of drumming by a humanoid robot.

am

[BibTex]

[BibTex]


no image
Origins and violations of the 2/3 power law in rhythmic 3D movements

Schaal, S., Sternad, D.

Experimental Brain Research, 136, pages: 60-72, 2001, clmc (article)

Abstract
The 2/3 power law, the nonlinear relationship between tangential velocity and radius of curvature of the endeffector trajectory, has been suggested as a fundamental constraint of the central nervous system in the formation of rhythmic endpoint trajectories. However, studies on the 2/3 power law have largely been confined to planar drawing patterns of relatively small size. With the hypothesis that this strategy overlooks nonlinear effects that are constitutive in movement generation, the present experiments tested the validity of the power law in elliptical patterns which were not confined to a planar surface and which were performed by the unconstrained 7-DOF arm with significant variations in pattern size and workspace orientation. Data were recorded from five human subjects where the seven joint angles and the endpoint trajectories were analyzed. Additionally, an anthropomorphic 7-DOF robot arm served as a "control subject" whose endpoint trajectories were generated on the basis of the human joint angle data, modeled as simple harmonic oscillations. Analyses of the endpoint trajectories demonstrate that the power law is systematically violated with increasing pattern size, in both exponent and the goodness of fit. The origins of these violations can be explained analytically based on smooth rhythmic trajectory formation and the kinematic structure of the human arm. We conclude that in unconstrained rhythmic movements, the power law seems to be a by-product of a movement system that favors smooth trajectories, and that it is unlikely to serve as a primary movement generating principle. Our data rather suggests that subjects employed smooth oscillatory pattern generators in joint space to realize the required movement patterns.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Graph-matching vs. entropy-based methods for object detection
Neural Networks, 14(3):345-354, 2001, clmc (article)

Abstract
Labeled Graph Matching (LGM) has been shown successful in numerous ob-ject vision tasks. This method is the basis for arguably the best face recognition system in the world. We present an algorithm for visual pattern recognition that is an extension of LGM ("LGM+"). We compare the performance of LGM and LGM+ algorithms with a state of the art statistical method based on Mutual Information Maximization (MIM). We present an adaptation of the MIM method for multi-dimensional Gabor wavelet features. The three pattern recognition methods were evaluated on an object detection task, using a set of stimuli on which none of the methods had been tested previously. The results indicate that while the performance of the MIM method operating upon Gabor wavelets is superior to the same method operating on pixels and to LGM, it is surpassed by LGM+. LGM+ offers a significant improvement in performance over LGM without losing LGMâ??s virtues of simplicity, biological plausibility, and a computational cost that is 2-3 orders of magnitude lower than that of the MIM algorithm. 

am

link (url) [BibTex]

link (url) [BibTex]


no image
Biomimetic gaze stabilization based on feedback-error learning with nonparametric regression networks

Shibata, T., Schaal, S.

Neural Networks, 14(2):201-216, 2001, clmc (article)

Abstract
Oculomotor control in a humanoid robot faces similar problems as biological oculomotor systems, i.e. the stabilization of gaze in face of unknown perturbations of the body, selective attention, stereo vision, and dealing with large information processing delays. Given the nonlinearities of the geometry of binocular vision as well as the possible nonlinearities of the oculomotor plant, it is desirable to accomplish accurate control of these behaviors through learning approaches. This paper develops a learning control system for the phylogenetically oldest behaviors of oculomotor control, the stabilization reflexes of gaze. In a step-wise procedure, we demonstrate how control theoretic reasonable choices of control components result in an oculomotor control system that resembles the known functional anatomy of the primate oculomotor system. The core of the learning system is derived from the biologically inspired principle of feedback-error learning combined with a state-of-the-art non-parametric statistical learning network. With this circuitry, we demonstrate that our humanoid robot is able to acquire high performance visual stabilization reflexes after about 40 s of learning despite significant nonlinearities and processing delays in the system.

am

link (url) [BibTex]


no image
Fast learning of biomimetic oculomotor control with nonparametric regression networks (in Japanese)

Shibata, T., Schaal, S.

Journal of the Robotics Society of Japan, 19(4):468-479, 2001, clmc (article)

am

[BibTex]

[BibTex]


no image
Bouncing a ball: Tuning into dynamic stability

Sternad, D., Duarte, M., Katsumata, H., Schaal, S.

Journal of Experimental Psychology: Human Perception and Performance, 27(5):1163-1184, 2001, clmc (article)

Abstract
Rhythmically bouncing a ball with a racket was investigated and modeled with a nonlinear map. Model analyses provided a variable defining a dynamically stable solution that obviates computationally expensive corrections. Three experiments evaluated whether dynamic stability is optimized and what perceptual support is necessary for stable behavior. Two hypotheses were tested: (a) Performance is stable if racket acceleration is negative at impact, and (b) variability is lowest at an impact acceleration between -4 and -1 m/s2. In Experiment 1 participants performed the task, eyes open or closed, bouncing a ball confined to a 1-dimensional trajectory. Experiment 2 eliminated constraints on racket and ball trajectory. Experiment 3 excluded visual or haptic information. Movements were performed with negative racket accelerations in the range of highest stability. Performance with eyes closed was more variable, leaving acceleration unaffected. With haptic information, performance was more stable than with visual information alone.

am

[BibTex]

[BibTex]


no image
Overt visual attention for a humanoid robot

Vijayakumar, S., Conradt, J., Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
The goal of our research is to investigate the interplay between oculomotor control, visual processing, and limb control in humans and primates by exploring the computational issues of these processes with a biologically inspired artificial oculomotor system on an anthropomorphic robot. In this paper, we investigate the computational mechanisms for visual attention in such a system. Stimuli in the environment excite a dynamical neural network that implements a saliency map, i.e., a winner-take-all competition between stimuli while simultenously smoothing out noise and suppressing irrelevant inputs. In real-time, this system computes new targets for the shift of gaze, executed by the head-eye system of the robot. The redundant degrees-of- freedom of the head-eye system are resolved through a learned inverse kinematics with optimization criterion. We also address important issues how to ensure that the coordinate system of the saliency map remains correct after movement of the robot. The presented attention system is built on principled modules and generally applicable for any sensory modality.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Learning inverse kinematics

D’Souza, A., Vijayakumar, S., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
Real-time control of the endeffector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates learning of inverse kinematics for resolved motion rate control (RMRC) employing an optimization criterion to resolve kinematic redundancies. Our learning approach is based on the key observations that learning an inverse of a non uniquely invertible function can be accomplished by augmenting the input representation to the inverse model and by using a spatially localized learning approach. We apply this strategy to inverse kinematics learning and demonstrate how a recently developed statistical learning algorithm, Locally Weighted Projection Regression, allows efficient learning of inverse kinematic mappings in an incremental fashion even when input spaces become rather high dimensional. The resulting performance of the inverse kinematics is comparable to Liegeois ([1]) analytical pseudo inverse with optimization. Our results are illustrated with a 30 degree-of-freedom humanoid robot.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Biomimetic smooth pursuit based on fast learning of the target dynamics

Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
Following a moving target with a narrow-view foveal vision system is one of the essential oculomotor behaviors of humans and humanoids. This oculomotor behavior, called ``Smooth Pursuit'', requires accurate tracking control which cannot be achieved by a simple visual negative feedback controller due to the significant delays in visual information processing. In this paper, we present a biologically inspired and control theoretically sound smooth pursuit controller consisting of two cascaded subsystems. One is an inverse model controller for the oculomotor system, and the other is a learning controller for the dynamics of the visual target. The latter controller learns how to predict the target's motion in head coordinates such that tracking performance can be improved. We investigate our smooth pursuit system in simulations and experiments on a humanoid robot. By using a fast on-line statistical learning network, our humanoid oculomotor system is able to acquire high performance smooth pursuit after about 5 seconds of learning despite significant processing delays in the syste

am

link (url) [BibTex]

link (url) [BibTex]


no image
Biomimetic oculomotor control

Shibata, T., Vijayakumar, S., Conradt, J., Schaal, S.

Adaptive Behavior, 9(3/4):189-207, 2001, clmc (article)

Abstract
Oculomotor control in a humanoid robot faces similar problems as biological oculomotor systems, i.e., capturing targets accurately on a very narrow fovea, dealing with large delays in the control system, the stabilization of gaze in face of unknown perturbations of the body, selective attention, and the complexity of stereo vision. In this paper, we suggest control circuits to realize three of the most basic oculomotor behaviors and their integration - the vestibulo-ocular and optokinetic reflex (VOR-OKR) for gaze stabilization, smooth pursuit for tracking moving objects, and saccades for overt visual attention. Each of these behaviors and the mechanism for their integration was derived with inspiration from computational theories as well as behavioral and physiological data in neuroscience. Our implementations on a humanoid robot demonstrate good performance of the oculomotor behaviors, which proves to be a viable strategy to explore novel control mechanisms for humanoid robotics. Conversely, insights gained from our models have been able to directly influence views and provide new directions for computational neuroscience research.

am

link (url) [BibTex]

link (url) [BibTex]

1997


no image
Locally weighted learning

Atkeson, C. G., Moore, A. W., Schaal, S.

Artificial Intelligence Review, 11(1-5):11-73, 1997, clmc (article)

Abstract
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control. Keywords: locally weighted regression, LOESS, LWR, lazy learning, memory-based learning, least commitment learning, distance functions, smoothing parameters, weighting functions, global tuning, local tuning, interference.

am

link (url) [BibTex]

1997


link (url) [BibTex]


no image
Locally weighted learning for control

Atkeson, C. G., Moore, A. W., Schaal, S.

Artificial Intelligence Review, 11(1-5):75-113, 1997, clmc (article)

Abstract
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control. Keywords: locally weighted regression, LOESS, LWR, lazy learning, memory-based learning, least commitment learning, forward models, inverse models, linear quadratic regulation (LQR), shifting setpoint algorithm, dynamic programming.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Learning from demonstration

Schaal, S.

In Advances in Neural Information Processing Systems 9, pages: 1040-1046, (Editors: Mozer, M. C.;Jordan, M.;Petsche, T.), MIT Press, Cambridge, MA, 1997, clmc (inproceedings)

Abstract
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration of the human instructor. 

am

link (url) [BibTex]

link (url) [BibTex]


no image
Robot learning from demonstration

Atkeson, C. G., Schaal, S.

In Machine Learning: Proceedings of the Fourteenth International Conference (ICML ’97), pages: 12-20, (Editors: Fisher Jr., D. H.), Morgan Kaufmann, Nashville, TN, July 8-12, 1997, 1997, clmc (inproceedings)

Abstract
The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration the robot learns a reward function from the demonstration and a task model from repeated attempts to perform the task. A policy is computed based on the learned reward function and task model. Lessons learned from an implementation on an anthropomorphic robot arm using a pendulum swing up task include 1) simply mimicking demonstrated motions is not adequate to perform this task, 2) a task planner can use a learned model and reward function to compute an appropriate policy, 3) this model-based planning process supports rapid learning, 4) both parametric and nonparametric models can be learned and used, and 5) incorporating a task level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning. 

am

link (url) [BibTex]

link (url) [BibTex]


no image
Local dimensionality reduction for locally weighted learning

Vijayakumar, S., Schaal, S.

In International Conference on Computational Intelligence in Robotics and Automation, pages: 220-225, Monteray, CA, July10-11, 1997, 1997, clmc (inproceedings)

Abstract
Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In this paper we suggest a partial revision of the view. Based on empirical studies, it can been observed that, despite being globally high dimensional and sparse, data distributions from physical movement systems are locally low dimensional and dense. Under this assumption, we derive a learning algorithm, Locally Adaptive Subspace Regression, that exploits this property by combining a local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression. The usefulness of the algorithm and the validity of its assumptions are illustrated for a synthetic data set and data of the inverse dynamics of an actual 7 degree-of-freedom anthropomorphic robot arm.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Learning tasks from a single demonstration

Atkeson, C. G., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA97), 2, pages: 1706-1712, Piscataway, NJ: IEEE, Albuquerque, NM, 20-25 April, 1997, clmc (inproceedings)

Abstract
Learning a complex dynamic robot manoeuvre from a single human demonstration is difficult. This paper explores an approach to learning from demonstration based on learning an optimization criterion from the demonstration and a task model from repeated attempts to perform the task, and using the learned criterion and model to compute an appropriate robot movement. A preliminary version of the approach has been implemented on an anthropomorphic robot arm using a pendulum swing up task as an example

am

link (url) [BibTex]

link (url) [BibTex]