2011


Projected Newton-type methods in machine learning

Schmidt, M., Kim, D., Sra, S.

In Optimization for Machine Learning, pages: 305-330, MIT Press, Cambridge, MA, USA, 2011 (incollection)

Abstract
We consider projected Newton-type methods for solving large-scale optimization problems arising in machine learning and related fields. We first introduce an algorithmic framework for projected Newton-type methods by reviewing a canonical projected (quasi-)Newton method. This method, while conceptually pleasing, has a high computational cost per iteration. Thus, we discuss two variants that are more scalable, namely, two-metric projection and inexact projection methods. Finally, we show how to apply the Newton-type framework to handle non-smooth objectives. Examples are provided throughout the chapter to illustrate machine learning applications of our framework.
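To make the idea of a projected, scaled step concrete, here is a minimal sketch that is not taken from the chapter. It assumes a non-negativity-constrained quadratic, uses only the diagonal of the Hessian as the scaling (for a diagonal metric, projection onto the bounds reduces to component-wise clipping), and backtracks along the projection arc with an Armijo test. The function name `projected_scaled_gradient` and all parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def projected_scaled_gradient(A, b, x0, max_iter=100, tol=1e-10, sigma=1e-4):
    """Minimize 0.5*x'Ax - b'x subject to x >= 0 (A symmetric positive definite).

    Illustrative only: a diagonally scaled projected step, i.e. a very
    simple member of the projected Newton-type family, not the chapter's
    own algorithm.
    """
    f = lambda x: 0.5 * x @ A @ x - b @ x
    d_inv = 1.0 / np.diag(A)            # inverse of the diagonal Hessian approximation
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    for _ in range(max_iter):
        g = A @ x - b                   # gradient of the quadratic
        alpha = 1.0
        while True:
            # scaled step followed by projection onto the non-negative orthant
            x_new = np.maximum(x - alpha * d_inv * g, 0.0)
            # Armijo sufficient-decrease test along the projection arc
            if f(x_new) <= f(x) + sigma * g @ (x_new - x) or alpha < 1e-12:
                break
            alpha *= 0.5
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x_star = projected_scaled_gradient(A, b, [1.0, 1.0])
```

With a full (non-diagonal) Hessian approximation, the projection is no longer a simple clip, which is the per-iteration cost the abstract refers to.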

mms


1996


From isolation to cooperation: An alternative view of a system of experts

Schaal, S., Atkeson, C. G.

In Advances in Neural Information Processing Systems 8, pages: 605-611, (Editors: Touretzky, D. S.; Mozer, M. C.; Hasselmo, M. E.), MIT Press, Cambridge, MA, 1996, clmc (inbook)

Abstract
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross-validation error using second-order methods. In this way, an expert is able to adjust the size and shape of the receptive field in which its predictions are valid, and also to adjust its bias on the importance of individual input dimensions. The size and shape adjustment corresponds to finding a local distance metric, while the bias adjustment accomplishes local dimensionality reduction. We derive asymptotic results for our method. In a variety of simulations we demonstrate the properties of the algorithm with respect to interference, learning speed, prediction accuracy, feature detection, and task-oriented incremental learning.
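The separation between independent training and cooperative prediction can be illustrated with a small sketch. It is not the authors' implementation: it assumes one-dimensional inputs, Gaussian receptive fields with a fixed width parameter `D` (the paper adapts the receptive field by minimizing a penalized local cross-validation error, which is omitted here), and a normalized weighted average as the blending rule. The names `fit_expert` and `blend_predict` are hypothetical.

```python
import numpy as np

def fit_expert(X, y, center, D):
    """Fit one locally linear expert by weighted least squares.

    Each expert sees all data but weights it by its own receptive field,
    so experts are trained independently of one another.
    """
    w = np.exp(-0.5 * D * (X - center) ** 2)        # assumed Gaussian receptive field
    sw = np.sqrt(w)
    A = np.column_stack([np.ones_like(X), X - center])
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta                                      # [offset, slope] around the center

def blend_predict(x, centers, D, betas):
    """Blend the experts' predictions at query x, weighted by receptive-field activation."""
    w = np.exp(-0.5 * D * (x - centers) ** 2)
    local = betas[:, 0] + betas[:, 1] * (x - centers)
    return float(np.sum(w * local) / np.sum(w))

# Toy usage: three experts on noise-free linear data.
X = np.linspace(0.0, 1.0, 21)
y = 2.0 * X + 1.0
centers = np.array([0.0, 0.5, 1.0])
betas = np.array([fit_expert(X, y, c, 25.0) for c in centers])
y_hat = blend_predict(0.3, centers, 25.0, betas)
```

Because no expert's fit depends on any other, experts can be added incrementally without retraining the existing ones; they interact only inside `blend_predict`.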

am
