

2003


Learning Control and Planning from the View of Control Theory and Imitation

Peters, J., Schaal, S.

NIPS Workshop "Planning for the Real World: The promises and challenges of dealing with uncertainty", December 2003 (talk)

Abstract
Learning control and planning in high dimensional continuous state-action systems, e.g., as needed in a humanoid robot, has so far been a domain beyond the applicability of generic planning techniques like reinforcement learning and dynamic programming. This talk describes an approach we have taken in order to enable complex robotics systems to learn to accomplish control tasks. Adaptive learning controllers equipped with statistical learning techniques can be used to learn tracking controllers -- missing state information and uncertainty in the state estimates are usually addressed by observers or direct adaptive control methods. Imitation learning is used as an ingredient to seed initial control policies whose output is a desired trajectory suitable to accomplish the task at hand. Reinforcement learning with stochastic policy gradients using a natural gradient forms the third component that allows refining the initial control policy until the task is accomplished. In comparison to general learning control, this approach is highly prestructured and thus more domain specific. However, it seems to be a theoretically clean and feasible strategy for control systems of the complexity that we need to address.

ei

Web [BibTex]



no image
Molecular phenotyping of human chondrocyte cell lines T/C-28a2, T/C-28a4, and C-28/I2

Finger, F., Schorle, C., Zien, A., Gebhard, P., Goldring, M., Aigner, T.

Arthritis & Rheumatism, 48(12):3395-3403, December 2003 (article)

ei

[BibTex]

A Study on Rainfall-Runoff Models for Improving Ensemble Streamflow Prediction: 1. Rainfall-Runoff Models Using Artificial Neural Networks

Jeong, D., Kim, Y., Cho, S., Shin, H.

Journal of the Korean Society of Civil Engineers, 23(6B):521-530, December 2003 (article)

Abstract
The previous ESP (Ensemble Streamflow Prediction) studies conducted in Korea reported that the modeling error is a major source of the ESP forecast error in winter and spring (i.e. dry seasons), and thus suggested that improving the rainfall-runoff model would be critical to obtain more accurate probabilistic forecasts with ESP. This study used two types of Artificial Neural Networks (ANN), a Single Neural Network (SNN) and an Ensemble Neural Network (ENN), to improve the simulation capability of the rainfall-runoff model of the ESP forecasting system for the monthly inflow to the Daecheong dam. Applied for the first time to Korean hydrology, ENN combines the outputs of member models so that it can control the generalization error better than SNN. Because the dry and flood seasons in Korea show considerably different streamflow characteristics, this study calibrated the rainfall-runoff model separately for each season. Therefore, four rainfall-runoff models were developed according to the ANN types and the seasons. This study compared the ANN models with a conceptual rainfall-runoff model called TANK and verified that the ANN models were superior to TANK. Among the ANN models, ENN was more accurate than SNN. The ANN model performance was improved when the model was calibrated separately for the dry and the flood season. The best ANN model developed in this article will be incorporated into the ESP system to increase the forecast capability of ESP for the monthly inflow to the Daecheong dam.

ei

[BibTex]

Quantitative Cerebral Blood Flow Measurements in the Rat Using a Beta-Probe and H₂¹⁵O

Weber, B., Spaeth, N., Wyss, M., Wild, D., Burger, C., Stanley, R., Buck, A.

Journal of Cerebral Blood Flow and Metabolism, 23(12):1455-1460, December 2003 (article)

Abstract
Beta-probes are a relatively new tool for tracer kinetic studies in animals. They are highly suited to evaluate new positron emission tomography tracers or measure physiologic parameters at rest and after some kind of stimulation or intervention. In many of these experiments, the knowledge of CBF is highly important. Thus, the purpose of this study was to evaluate the method of CBF measurements using a beta-probe and H₂¹⁵O. CBF was measured in the barrel cortex of eight rats at baseline and after acetazolamide challenge. Trigeminal nerve stimulation was additionally performed in five animals. In each category, three injections of 250 to 300 MBq H₂¹⁵O were performed at 10-minute intervals. Data were analyzed using a standard one-tissue compartment model (K1 = CBF, k2 = CBF/p, where p is the partition coefficient). Values for K1 were 0.35 ± 0.09, 0.58 ± 0.16, and 0.49 ± 0.03 mL·min⁻¹·mL⁻¹ at rest, after acetazolamide challenge, and during trigeminal nerve stimulation, respectively. The corresponding values for k2 were 0.55 ± 0.12, 0.94 ± 0.16, and 0.85 ± 0.12 min⁻¹, and for p were 0.64 ± 0.05, 0.61 ± 0.07, and 0.59 ± 0.06. The standard deviation of the difference between two successive experiments, a measure for the reproducibility of the method, was 10.1%, 13.0%, and 5.7% for K1, k2, and p, respectively. In summary, beta-probes in conjunction with H₂¹⁵O allow the reproducible quantitative measurement of CBF, although some systematic underestimation seems to occur, probably because of partial volume effects.
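The one-tissue compartment model named in this abstract can be illustrated with a small numerical sketch. The differential equation dC/dt = K1 * Ca(t) - k2 * C(t) is the standard form of that model; the constant arterial input, the step size, and the function names below are illustrative assumptions and not the analysis code of the study.

```python
def partition_coefficient(k1, k2):
    """Return p = K1 / k2, which follows from K1 = CBF and k2 = CBF / p."""
    return k1 / k2


def simulate_tissue_curve(k1, k2, arterial_input, dt):
    """Forward-Euler integration of dC/dt = K1 * Ca(t) - k2 * C(t)."""
    tissue = [0.0]
    for ca in arterial_input[:-1]:
        c = tissue[-1]
        tissue.append(c + dt * (k1 * ca - k2 * c))
    return tissue


# Mean resting values for K1 and k2 as reported in the abstract.
K1_REST, K2_REST = 0.35, 0.55

# Hypothetical constant arterial input of 1.0 (arbitrary units) over 60 minutes.
dt = 0.01  # minutes
arterial_input = [1.0] * 6001
curve = simulate_tissue_curve(K1_REST, K2_REST, arterial_input, dt)

p_rest = partition_coefficient(K1_REST, K2_REST)
print(round(p_rest, 2))     # ratio of the mean K1 and k2 at rest
print(round(curve[-1], 2))  # tissue value approached under constant input
```

Under a constant input the tissue curve settles at p times the input level, and the ratio of the resting means rounds to the partition coefficient of 0.64 reported for the resting condition.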

ei

PDF DOI [BibTex]

Recurrent neural networks from learning attractor dynamics

Schaal, S., Peters, J.

NIPS Workshop on RNNaissance: Recurrent Neural Networks, December 2003 (talk)

Abstract
Many forms of recurrent neural networks can be understood in terms of dynamic systems theory of difference equations or differential equations. Learning in such systems corresponds to adjusting some internal parameters to obtain a desired time evolution of the network, which can usually be characterized in terms of point attractor dynamics, limit cycle dynamics, or, in some rarer cases, as strange attractor or chaotic dynamics. Finding a stable learning process to adjust the open parameters of the network towards shaping the desired attractor type and basin of attraction has remained a complex task, as the parameter trajectories during learning can lead the system through a variety of undesirable unstable behaviors, such that learning may never succeed. In this presentation, we review a recently developed learning framework for a class of recurrent neural networks that employs a more structured network approach. We assume that the canonical system behavior is known a priori, e.g., it is a point attractor or a limit cycle. With either supervised learning or reinforcement learning, it is possible to acquire the transformation from a simple representative of this canonical behavior (e.g., a 2nd order linear point attractor, or a simple limit cycle oscillator) to the desired highly complex attractor form. For supervised learning, one shot learning based on locally weighted regression techniques is possible. For reinforcement learning, stochastic policy gradient techniques can be employed. In any case, the recurrent network learned by these methods inherits the stability properties of the simple dynamic system that underlies the nonlinear transformation, such that stability of the learning approach is not a problem. We demonstrate the success of this approach for learning various skills on a humanoid robot, including tasks that require incorporating additional sensory signals as coupling terms to modify the recurrent network evolution on-line.

ei

Web [BibTex]

Support Vector Channel Selection in BCI

Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B.

(120), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, December 2003 (techreport)

Abstract
When designing a Brain Computer Interface (BCI) system, one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals we propose using the state-of-the-art feature selection algorithms Recursive Feature Elimination [3] and Zero-Norm Optimization [13], which are based on the training of Support Vector Machines (SVM) [11]. These algorithms can provide more accurate solutions than standard filter methods for feature selection [14]. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.

ei

PDF Web [BibTex]

Blind separation of post-nonlinear mixtures using linearizing transformations and temporal decorrelation

Ziehe, A., Kawanabe, M., Harmeling, S., Müller, K.

Journal of Machine Learning Research, 4(7-8):1319-1338, November 2003 (article)

Abstract
We propose two methods that reduce the post-nonlinear blind source separation problem (PNL-BSS) to a linear BSS problem. The first method is based on the concept of maximal correlation: we apply the alternating conditional expectation (ACE) algorithm, a powerful technique from non-parametric statistics, to approximately invert the componentwise nonlinear functions. The second method is a Gaussianizing transformation, which is motivated by the fact that linearly mixed signals before nonlinear transformation are approximately Gaussian distributed. This heuristic but simple and efficient procedure works as well as the ACE method. Using the framework provided by ACE, convergence can be proven. The optimal transformations obtained by ACE coincide with the sought-after inverse functions of the nonlinearities. After equalizing the nonlinearities, temporal decorrelation separation (TDSEP) allows us to recover the source signals. Numerical simulations testing "ACE-TD" and "Gauss-TD" on realistic examples are performed with excellent results.
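A Gaussianizing transformation of the kind this abstract mentions can be sketched with a rank-based mapping: each sample is replaced by the standard-normal quantile of its rank. This is one common construction and is shown here only as an illustration; whether the paper uses exactly this form is an assumption, and the function name and test signal are hypothetical.

```python
from statistics import NormalDist


def gaussianize(signal):
    """Map each sample to the standard-normal quantile of its rank."""
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    standard_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1).
        out[idx] = standard_normal.inv_cdf(rank / (n + 1))
    return out


# Hypothetical observed signal: a sequence distorted by a cubic nonlinearity.
observed = [((t % 7) - 3 + 0.1 * t) ** 3 for t in range(50)]
transformed = gaussianize(observed)
```

Because the mapping is monotone in the sample values, it preserves their ordering while forcing the marginal distribution towards a Gaussian shape, which is why it can approximately undo a monotone componentwise distortion.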

ei

PDF PDF DOI [BibTex]

Correlated stage- and subfield-associated hippocampal gene expression patterns in experimental and human temporal lobe epilepsy

Becker, A., Chen, J., Zien, A., Sochivko, D., Normann, S., Schramm, J., Elger, C., Wiestler, O., Blumcke, I.

European Journal of Neuroscience, 18(10):2792-2802, November 2003 (article)

Abstract
Epileptic activity evokes profound alterations of hippocampal organization and function. Genomic responses may reflect immediate consequences of excitatory stimulation as well as sustained molecular processes related to neuronal plasticity and structural remodeling. Using oligonucleotide microarrays with 8799 sequences, we determined subregional gene expression profiles in rats subjected to pilocarpine-induced epilepsy (U34A arrays, Affymetrix, Santa Clara, CA, USA; P < 0.05, twofold change, n = 3 per stage). Patterns of gene expression corresponded to distinct stages of epilepsy development. The highest number of differentially expressed genes (dentate gyrus, approx. 400 genes and CA1, approx. 700 genes) was observed 3 days after status epilepticus. The majority of up-regulated genes was associated with mechanisms of cellular stress and injury. Fourteen days after status epilepticus, numerous transcription factors and genes linked to cytoskeletal and synaptic reorganization were differentially expressed and, in the stage of chronic spontaneous seizures, distinct changes were observed in the transcription of genes involved in various neurotransmission pathways and between animals with low vs. high seizure frequency. A number of genes (n = 18) differentially expressed during the chronic epileptic stage showed corresponding expression patterns in hippocampal subfields of patients with pharmacoresistant temporal lobe epilepsy (n = 5 temporal lobe epilepsy patients; U133A microarrays, Affymetrix; covering 22284 human sequences). These data provide novel insights into the molecular mechanisms of epileptogenesis and seizure-associated cellular and structural remodeling of the hippocampus.

ei

[BibTex]

Concentration Inequalities for Sub-Additive Functions Using the Entropy Method

Bousquet, O.

Stochastic Inequalities and Applications, 56, pages: 213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003 (article)

Abstract
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.

ei

PostScript [BibTex]

Technical report on Separation methods for nonlinear mixtures

Jutten, C., Karhunen, J., Almeida, L., Harmeling, S.

(D29), EU-Project BLISS, October 2003 (techreport)

ei

PDF [BibTex]

Image Reconstruction by Linear Programming

Tsuda, K., Rätsch, G.

(118), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, October 2003 (techreport)

ei

PDF [BibTex]

YKL-39 (chitinase 3-like protein 2), but not YKL-40 (chitinase 3-like protein 1), is up regulated in osteoarthritic chondrocytes

Knorr, T., Obermayr, F., Bartnik, E., Zien, A., Aigner, T.

Annals of the Rheumatic Diseases, 62(10):995-998, October 2003 (article)

Abstract
OBJECTIVE: To investigate quantitatively the mRNA expression levels of YKL-40, an established marker of rheumatoid and osteoarthritic cartilage degeneration in synovial fluid and serum, and a closely related molecule YKL-39, in articular chondrocytes. METHODS: cDNA array and online quantitative polymerase chain reaction (PCR) were used to measure mRNA expression levels of YKL-39 and YKL-40 in chondrocytes in normal, early degenerative, and late stage osteoarthritic cartilage samples. RESULTS: Expression analysis showed high levels of both proteins in normal articular chondrocytes, with lower levels of YKL-39 than YKL-40. Whereas YKL-40 was significantly down regulated in late stage osteoarthritic chondrocytes, YKL-39 was significantly up regulated. In vitro both YKLs were down regulated by interleukin 1beta. CONCLUSIONS: The up regulation of YKL-39 in osteoarthritic cartilage suggests that YKL-39 may be a more accurate marker of chondrocyte activation than YKL-40, although it has yet to be established as a suitable marker in synovial fluid and serum. The decreased expression of YKL-40 by osteoarthritic chondrocytes is surprising as increased levels have been reported in rheumatoid and osteoarthritic synovial fluid, where it may derive from activated synovial cells or osteophytic tissue or by increased matrix destruction in the osteoarthritic joint. YKL-39 and YKL-40 are potentially interesting marker molecules for arthritic joint disease because they are abundantly expressed by both normal and osteoarthritic chondrocytes.

ei

[BibTex]

Technical report on implementation of linear methods and validation on acoustic sources

Harmeling, S., Bünau, P., Ziehe, A., Pham, D.

EU-Project BLISS, September 2003 (techreport)

ei

PDF [BibTex]

Statistical Learning Theory

Bousquet, O.

Machine Learning Summer School, August 2003 (talk)

ei

PDF [BibTex]

Remarks on Statistical Learning Theory

Bousquet, O.

Machine Learning Summer School, August 2003 (talk)

ei

PDF [BibTex]

Learning the statistics of people in images and video

Sidenbladh, H., Black, M. J.

International Journal of Computer Vision, 54(1-3):183-209, August 2003 (article)

Abstract
This paper addresses the problems of modeling the appearance of humans and distinguishing human appearance from the appearance of general scenes. We seek a model of appearance and motion that is generic in that it accounts for the ways in which people's appearance varies and, at the same time, is specific enough to be useful for tracking people in natural scenes. Given a 3D model of the person projected into an image we model the likelihood of observing various image cues conditioned on the predicted locations and orientations of the limbs. These cues are taken to be steered filter responses corresponding to edges, ridges, and motion-compensated temporal differences. Motivated by work on the statistics of natural scenes, the statistics of these filter responses for human limbs are learned from training images containing hand-labeled limb regions. Similarly, the statistics of the filter responses in general scenes are learned to define a “background” distribution. The likelihood of observing a scene given a predicted pose of a person is computed, for each limb, using the likelihood ratio between the learned foreground (person) and background distributions. Adopting a Bayesian formulation allows cues to be combined in a principled way. Furthermore, the use of learned distributions obviates the need for hand-tuned image noise models and thresholds. The paper provides a detailed analysis of the statistics of how people appear in scenes and provides a connection between work on natural image statistics and the Bayesian tracking of people.

ps

pdf pdf from publisher code DOI [BibTex]

A framework for robust subspace learning

De la Torre, F., Black, M. J.

International Journal of Computer Vision, 54(1-3):117-142, August 2003 (article)

Abstract
Many computer vision, signal processing and statistical problems can be posed as problems of learning low dimensional linear or multi-linear models. These models have been widely used for the representation of shape, appearance, motion, etc., in computer vision applications. Methods for learning linear models can be seen as a special case of subspace fitting. One draw-back of previous learning methods is that they are based on least squares estimation techniques and hence fail to account for “outliers” which are common in realistic training sets. We review previous approaches for making linear learning methods robust to outliers and present a new method that uses an intra-sample outlier process to account for pixel outliers. We develop the theory of Robust Subspace Learning (RSL) for linear models within a continuous optimization framework based on robust M-estimation. The framework applies to a variety of linear learning problems in computer vision including eigen-analysis and structure from motion. Several synthetic and natural examples are used to develop and illustrate the theory and applications of robust subspace learning in computer vision.

ps

pdf code pdf from publisher Project Page [BibTex]

Guest editorial: Computational vision at Brown

Black, M. J., Kimia, B.

International Journal of Computer Vision, 54(1-3):5-11, August 2003 (article)

ps

pdf pdf from publisher [BibTex]

Statistical Learning Theory, Capacity and Complexity

Schölkopf, B.

Complexity, 8(4):87-94, July 2003 (article)

Abstract
We give an exposition of the ideas of statistical learning theory, followed by a discussion of how a reinterpretation of the insights of learning theory could potentially also benefit our understanding of a certain notion of complexity.

ei

Web DOI [BibTex]


Robust parameterized component analysis: Theory and applications to 2D facial appearance models

De la Torre, F., Black, M. J.

Computer Vision and Image Understanding, 91(1-2):53-71, July 2003 (article)

Abstract
Principal component analysis (PCA) has been successfully applied to construct linear models of shape, graylevel, and motion in images. In particular, PCA has been widely used to model the variation in the appearance of people's faces. We extend previous work on facial modeling for tracking faces in video sequences as they undergo significant changes due to facial expressions. Here we consider person-specific facial appearance models (PSFAM), which use modular PCA to model complex intra-person appearance changes. Such models require aligned visual training data; in previous work, this has involved a time consuming and error-prone hand alignment and cropping process. Instead, the main contribution of this paper is to introduce parameterized component analysis to learn a subspace that is invariant to affine (or higher order) geometric transformations. The automatic learning of a PSFAM given a training image sequence is posed as a continuous optimization problem and is solved with a mixture of stochastic and deterministic techniques achieving sub-pixel accuracy. We illustrate the use of the 2D PSFAM model with preliminary experiments relevant to applications including video-conferencing and avatar animation.

ps

pdf [BibTex]

Ranking on Data Manifolds

Zhou, D., Weston, J., Gretton, A., Bousquet, O., Schölkopf, B.

(113), Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany, June 2003 (techreport)

Abstract
The Google search engine has had a huge success with its PageRank web page ranking algorithm, which exploits the global, rather than local, hyperlink structure of the World Wide Web using a random walk. This algorithm can only be used for graph data, however. Here we propose a simple universal ranking algorithm for vectorial data, based on the exploration of the intrinsic global geometric structure revealed by a huge amount of data. Experimental results on data ranging from images and text to bioinformatics illustrate the validity of our algorithm.

ei

PDF [BibTex]

Kernel Hebbian Algorithm for Iterative Kernel Principal Component Analysis

Kim, K., Franz, M., Schölkopf, B.

(109), MPI f. biologische Kybernetik, Tuebingen, June 2003 (techreport)

Abstract
A new method for performing a kernel principal component analysis is proposed. By kernelizing the generalized Hebbian algorithm, one can iteratively estimate the principal components in a reproducing kernel Hilbert space with only linear order memory complexity. The derivation of the method, a convergence proof, and preliminary applications in image hyperresolution are presented. In addition, we discuss the extension of the method to the online learning of kernel principal components.

ei

PDF [BibTex]

Learning with Local and Global Consistency

Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.

(112), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, June 2003 (techreport)

Abstract
We consider the learning problem in the transductive setting. Given a set of points of which only some are labeled, the goal is to predict the label of the unlabeled points. A principled clue to solve such a learning problem is the consistency assumption that a classifying function should be sufficiently smooth with respect to the structure revealed by these known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.
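The smoothness idea in this abstract can be illustrated with a label-spreading iteration on a similarity graph. The abstract does not spell out the update rule, so the iteration F ← αSF + (1 − α)Y with a symmetrically normalized Gaussian affinity matrix S, the parameter values, and all names below are assumptions made for this sketch.

```python
import math


def spread_labels(points, labels, sigma=0.5, alpha=0.9, iterations=200):
    """Propagate known labels over a similarity graph.

    labels[i] is a class index for labeled points and None otherwise.
    """
    n = len(points)
    classes = sorted({c for c in labels if c is not None})

    # Gaussian affinities with a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]

    # One-hot rows for labeled points, all-zero rows for unlabeled points.
    y = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in range(n)]

    f = [row[:] for row in y]
    for _ in range(iterations):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1 - alpha) * y[i][c]
              for c in range(len(classes))]
             for i in range(n)]

    # Assign each point the class with the largest accumulated score.
    return [classes[max(range(len(classes)), key=lambda c: f[i][c])]
            for i in range(n)]


# Two hypothetical one-dimensional clusters with one labeled point each.
points = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,)]
labels = [0, None, None, None, None, 1]
predicted = spread_labels(points, labels)
```

In this toy setup each unlabeled point takes the label of the single labeled point in its own cluster, which is the kind of use of unlabeled structure the abstract describes.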

ei

[BibTex]

Dealing with large Diagonals in Kernel Matrices

Weston, J., Schölkopf, B., Eskin, E., Leslie, C., Noble, W.

Annals of the Institute of Statistical Mathematics, 55(2):391-408, June 2003 (article)

Abstract
In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, which arise for many types of kernels, then kernel methods do not perform well. We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine, which is a common kernel approach for pattern recognition.

ei

PDF DOI [BibTex]

The Metric Nearness Problem with Applications

Dhillon, I., Sra, S., Tropp, J.

Univ. of Texas at Austin, June 2003 (techreport)

ei

GZIP [BibTex]

Implicit Wiener Series

Franz, M., Schölkopf, B.

(114), Max Planck Institute for Biological Cybernetics, June 2003 (techreport)

Abstract
The Wiener series is one of the standard methods to systematically characterize the nonlinearity of a neural system. The classical estimation method of the expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear systems. We propose a new estimation method based on regression in a reproducing kernel Hilbert space that overcomes these problems. Numerical experiments show performance advantages in terms of convergence, interpretability and system size that can be handled.

ei

PDF [BibTex]

Machine Learning approaches to protein ranking: discriminative, semi-supervised, scalable algorithms

Weston, J., Leslie, C., Elisseeff, A., Noble, W.

(111), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2003 (techreport)

Abstract
A key tool in protein function discovery is the ability to rank databases of proteins given a query amino acid sequence. The most successful method so far is a web-based tool called PSI-BLAST which uses heuristic alignment of a profile built using the large unlabeled database. It has been shown that such use of global information via unlabeled data improves over a local measure derived from a basic pairwise alignment such as performed by PSI-BLAST's predecessor, BLAST. In this article we look at ways of leveraging techniques from the field of machine learning for the problem of ranking. We show how clustering and semi-supervised learning techniques, which aim to capture global structure in data, can significantly improve over PSI-BLAST.

ei

PDF [BibTex]

The em Algorithm for Kernel Matrix Completion with Auxiliary Data

Tsuda, K., Akaho, S., Asai, K.

Journal of Machine Learning Research, 4, pages: 67-81, May 2003 (article)

ei

PDF [BibTex]

Constructing Descriptive and Discriminative Non-linear Features: Rayleigh Coefficients in Kernel Feature Spaces

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623-628, May 2003 (article)

Abstract
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher's discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.

ei

DOI [BibTex]

The Geometry Of Kernel Canonical Correlation Analysis

Kuss, M., Graepel, T.

(108), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, May 2003 (techreport)

Abstract
Canonical correlation analysis (CCA) is a classical multivariate method concerned with describing linear dependencies between sets of variables. After a short exposition of the linear sample CCA problem and its analytical solution, the article proceeds with a detailed characterization of its geometry. Projection operators are used to illustrate the relations between canonical vectors and variates. The article then addresses the problem of CCA between spaces spanned by objects mapped into kernel feature spaces. An exact solution for this kernel canonical correlation (KCCA) problem is derived from a geometric point of view. It shows that the expansion coefficients of the canonical vectors in their respective feature space can be found by linear CCA in the basis induced by kernel principal component analysis. The effect of mappings into higher dimensional feature spaces is considered critically since it simplifies the CCA problem in general. Then two regularized variants of KCCA are discussed. Relations to other methods are illustrated, e.g., multicategory kernel Fisher discriminant analysis, kernel principal component regression and possible applications thereof in blind source separation.

ei

PDF [BibTex]

Kernel-based nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

Neural Computation, 15(5):1089-1124, May 2003 (article)

Abstract
We propose kTDSEP, a kernel-based algorithm for nonlinear blind source separation (BSS). It combines complementary research fields: kernel feature spaces and BSS using temporal information. This yields an efficient algorithm for nonlinear BSS with invertible nonlinearity. Key assumptions are that the kernel feature space is chosen rich enough to approximate the nonlinearity and that signals of interest contain temporal information. Both assumptions are fulfilled for a wide set of real-world applications. The algorithm works as follows: First, the data are (implicitly) mapped to a high-dimensional (possibly infinite-dimensional) kernel feature space. In practice, however, the data form a smaller submanifold in feature space (even smaller than the number of training data points), a fact that has already been used by, for example, reduced set techniques for support vector machines. We propose to adapt to this effective dimension as a preprocessing step and to construct an orthonormal basis of this submanifold. The latter dimension-reduction step is essential for making the subsequent application of BSS methods computationally and numerically tractable. In the reduced space, we use a BSS algorithm that is based on second-order temporal decorrelation. Finally, we propose a selection procedure to obtain the original sources from the extracted nonlinear components automatically. Experiments demonstrate the excellent performance and efficiency of our kTDSEP algorithm for several problems of nonlinear BSS and for more than two sources.

ei

PDF Web DOI [BibTex]

The Kernel Mutual Information

Gretton, A., Herbrich, R., Smola, A.

Max Planck Institute for Biological Cybernetics, April 2003 (techreport)

Abstract
We introduce two new functions, the kernel covariance (KC) and the kernel mutual information (KMI), to measure the degree of independence of several continuous random variables. The former is guaranteed to be zero if and only if the random variables are pairwise independent; the latter shares this property, and is in addition an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate. We show that Bach and Jordan's kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation. The performance of the KC and KMI is verified in the context of instantaneous independent component analysis (ICA), by recovering both artificial and real (musical) signals following linear mixing.

ei

PostScript [BibTex]

Tractable Inference for Probabilistic Data Models

Csato, L., Opper, M., Winther, O.

Complexity, 8(4):64-68, April 2003 (article)

Abstract
We present an approximation technique for probabilistic data models with a large number of hidden variables, based on ideas from statistical physics. We give examples for two nontrivial applications. © 2003 Wiley Periodicals, Inc.

ei

PDF GZIP Web [BibTex]


Feature selection and transduction for prediction of molecular bioactivity for drug design

Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B.

Bioinformatics, 19(6):764-771, April 2003 (article)

Abstract
Motivation: In drug discovery a key task is to identify characteristics that separate active (binding) compounds from inactive (non-binding) ones. An automated prediction system can help reduce resources necessary to carry out this task. Results: Two methods for prediction of molecular bioactivity for drug design are introduced and shown to perform well in a data set previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001. The data is characterized by very few positive examples, a very large number of features (describing three-dimensional properties of the molecules) and rather different distributions between training and test data. Two techniques are introduced specifically to tackle these problems: a feature selection method for unbalanced data and a classifier which adapts to the distribution of the unlabeled test data (a so-called transductive method). We show both techniques improve identification performance and in conjunction provide an improvement over using only one of the techniques. Our results suggest the importance of taking into account the characteristics in this data which may also be relevant in other problems of a similar type.

ei

Web [BibTex]


Rademacher and Gaussian averages in Learning Theory

Bousquet, O.

Université de Marne-la-Vallée, March 2003 (talk)

ei

PDF [BibTex]


Use of the Zero-Norm with Linear Models and Kernel Methods

Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.

Journal of Machine Learning Research, 3, pages: 1439-1461, March 2003 (article)

Abstract
We explore the use of the so-called zero-norm of the parameters of linear models in learning. Minimization of such a quantity has many uses in a machine learning context: for variable or feature selection, minimizing training error and ensuring sparsity in solutions. We derive a simple but practical method for achieving these goals and discuss its relationship to existing techniques of minimizing the zero-norm. The method boils down to implementing a simple modification of vanilla SVM, namely via an iterative multiplicative rescaling of the training data. Applications we investigate which aid our discussion include variable and feature selection on biological microarray data, and multicategory classification.

ei

PDF PostScript PDF [BibTex]


Introduction: Robots with Cognition?

Franz, MO.

6, pages: 38, (Editors: H.H. Bülthoff, K.R. Gegenfurtner, H.A. Mallot, R. Ulrich, F.A. Wichmann), 6. Tübinger Wahrnehmungskonferenz (TWK), February 2003 (talk)

Abstract
Using robots as models of cognitive behaviour has a long tradition in robotics. Parallel to the historical development in cognitive science, one observes two major, subsequent waves in cognitive robotics. The first is based on ideas of classical, cognitivist Artificial Intelligence (AI). According to the AI view of cognition as rule-based symbol manipulation, these robots typically try to extract symbolic descriptions of the environment from their sensors that are used to update a common, global world representation from which, in turn, the next action of the robot is derived. The AI approach has been successful in strongly restricted and controlled environments requiring well-defined tasks, e.g. in industrial assembly lines. AI-based robots mostly failed, however, in the unpredictable and unstructured environments that have to be faced by mobile robots. This has provoked the second wave in cognitive robotics which tries to achieve cognitive behaviour as an emergent property from the interaction of simple, low-level modules. Robots of the second wave are called animats as their architecture is designed to closely model aspects of real animals. Using only simple reactive mechanisms and Hebbian-type or evolutionary learning, the resulting animats often outperformed the highly complex AI-based robots in tasks such as obstacle avoidance, corridor following etc. While successful in generating robust, insect-like behaviour, typical animats are limited to stereotyped, fixed stimulus-response associations. If one adopts the view that cognition requires a flexible, goal-dependent choice of behaviours and planning capabilities (H.A. Mallot, Kognitionswissenschaft, 1999, 40-48) then it appears that cognitive behaviour cannot emerge from a collection of purely reactive modules. It rather requires environmentally decoupled structures that work without directly engaging the actions that they are concerned with.
This poses the current challenge to cognitive robotics: How can we build cognitive robots that show the robustness and the learning capabilities of animats without falling back into the representational paradigm of AI? The speakers of the symposium present their approaches to this question in the context of robot navigation and sensorimotor learning. In the first talk, Prof. Helge Ritter introduces a robot system for imitation learning capable of exploring various alternatives in simulation before actually performing a task. The second speaker, Angelo Arleo, develops a model of spatial memory in rat navigation based on his electrophysiological experiments. He validates the model on a mobile robot which, in some navigation tasks, shows a performance comparable to that of the real rat. A similar model of spatial memory is used to investigate the mechanisms of territory formation in a series of robot experiments presented by Prof. Hanspeter Mallot. In the last talk, we return to the domain of sensorimotor learning where Ralf Möller introduces his approach to generate anticipatory behaviour by learning forward models of sensorimotor relationships.

ei

Web [BibTex]


Expectation Maximization for Clustering on Hyperspheres

Banerjee, A., Dhillon, I., Ghosh, J., Sra, S.

Univ. of Texas at Austin, February 2003 (techreport)

ei

GZIP [BibTex]


Modeling Data using Directional Distributions

Dhillon, I., Sra, S.

Univ. of Texas at Austin, January 2003 (techreport)

ei

GZIP [BibTex]


An Introduction to Variable and Feature Selection

Guyon, I., Elisseeff, A.

Journal of Machine Learning Research, 3, pages: 1157-1182, 2003 (article)

ei

[BibTex]


A Note on Parameter Tuning for On-Line Shifting Algorithms

Bousquet, O.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003 (techreport)

Abstract
In this short note, building on ideas of M. Herbster [2] we propose a method for automatically tuning the parameter of the FIXED-SHARE algorithm proposed by Herbster and Warmuth [3] in the context of on-line learning with shifting experts. We show that this can be done with a memory requirement of $O(nT)$ and that the additional loss incurred by the tuning is the same as the loss incurred for estimating the parameter of a Bernoulli random variable.
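
For orientation, the FIXED-SHARE update whose parameter the note tunes can be sketched as follows. This is a minimal illustration of the standard Herbster and Warmuth scheme with a fixed, hand-picked share parameter; it does not implement the automatic tuning proposed in the note. The names `fixed_share`, `eta` (learning rate), and `alpha` (share parameter) are assumptions introduced here.

```python
import math


def fixed_share(losses, eta=1.0, alpha=0.05):
    """Run a Fixed-Share style update over a T x n table of expert losses.

    Returns (history, final_weights), where history[t] is the weight
    vector available before the losses of round t are revealed.
    Hypothetical helper for illustration only.
    """
    n = len(losses[0])
    w = [1.0 / n] * n  # start from uniform weights over the n experts
    history = []
    for round_losses in losses:
        history.append(list(w))
        # Loss update: down-weight each expert exponentially in its loss.
        v = [wi * math.exp(-eta * li) for wi, li in zip(w, round_losses)]
        total = sum(v)
        v = [vi / total for vi in v]
        # Share update: each expert passes a fraction alpha of its weight
        # on, split evenly among the other n - 1 experts.
        w = [(1.0 - alpha) * vi + alpha * (1.0 - vi) / (n - 1) for vi in v]
    return history, w
```

With a loss table in which the best expert changes partway through the sequence, the share step keeps every weight bounded away from zero, so the weights can move to the new best expert within a few rounds. Setting `alpha` to zero recovers the static exponential-weights update, which cannot recover as quickly.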

ei

PDF PostScript [BibTex]


Dynamics of a rigid body in a Stokes fluid

Gonzalez, O., Graf, ABA., Maddocks, JH.

Journal of Fluid Mechanics, 2003 (article) Accepted

ei

[BibTex]


A novel transient heater-foil technique for liquid crystal experiments on film cooled surfaces

Vogel, G., Graf, ABA., von Wolfersdorf, J., Weigand, B.

ASME Journal of Turbomachinery, 125, pages: 529-537, 2003 (article)

ei

PDF [BibTex]


Prediction at an Uncertain Input for Gaussian Processes and Relevance Vector Machines - Application to Multiple-Step Ahead Time-Series Forecasting

Quiñonero-Candela, J., Girard, A., Rasmussen, C.

(IMM-2003-18), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003 (techreport)

ei

PDF PostScript [BibTex]


Microarrays: How Many Do You Need?

Zien, A., Fluck, J., Zimmer, R., Lengauer, T.

Journal of Computational Biology, 10(3-4):653-667, 2003 (article)

Abstract
We estimate the number of microarrays that is required in order to gain reliable results from a common type of study: the pairwise comparison of different classes of samples. We show that current knowledge allows for the construction of models that look realistic with respect to searches for individual differentially expressed genes and derive prototypical parameters from real data sets. Such models allow investigation of the dependence of the required number of samples on the relevant parameters: the biological variability of the samples within each class, the fold changes in expression that are desired to be detected, the detection sensitivity of the microarrays, and the acceptable error rates of the results. We supply experimentalists with general conclusions as well as a freely accessible Java applet at www.scai.fhg.de/special/bio/howmanyarrays/ for fine tuning simulations to their particular settings.
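
As a rough illustration of how the parameters listed above interact, the sketch below uses the textbook normal-approximation sample-size formula for a two-class comparison of a single gene. It is not the simulation model of the paper and is unrelated to the applet; the function name, the default error rates, and the use of log2 fold changes are assumptions made here.

```python
import math
from statistics import NormalDist


def arrays_per_class(sigma, fold_change, alpha=0.05, power=0.8):
    """Approximate number of arrays needed per class for one gene.

    sigma: within-class standard deviation of log2 expression (assumed known).
    fold_change: smallest fold change to detect, e.g. 2.0 for two-fold.
    alpha: two-sided false-positive rate; power: 1 - false-negative rate.
    Hypothetical helper for illustration only.
    """
    delta = abs(math.log2(fold_change))  # effect size on the log2 scale
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)
```

The required number grows with the square of the biological variability and shrinks with the square of the log fold change. The sketch ignores multiple testing across genes and any model of detection sensitivity, both of which a realistic study design would have to account for.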

ei

Web [BibTex]


New Approaches to Statistical Learning Theory

Bousquet, O.

Annals of the Institute of Statistical Mathematics, 55(2):371-389, 2003 (article)

Abstract
We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools make it possible to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.

ei

PostScript [BibTex]
