Publications

Empirical Inference Conference Paper Implicit estimation of Wiener series Franz, M., Schölkopf, B. In Machine Learning for Signal Processing XIV: Proc. 2004 IEEE Signal Processing Society Workshop, 735-744, (Editors: A Barros and J Principe and J Larsen and T Adali and S Douglas), IEEE, New York, 2004
The Wiener series is one of the standard methods to systematically characterize the nonlinearity of a system. The classical estimation method of the expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear systems. We propose an implicit estimation method based on regression in a reproducing kernel Hilbert space that alleviates these problems. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
PDF PostScript BibTeX
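The implicit estimation idea above amounts to regression in an RKHS whose feature space contains all monomials up to the Wiener order. A minimal numpy sketch of that idea, assuming a toy second-order system and an inhomogeneous polynomial kernel (all names and data are illustrative, not the paper's experiments):

```python
import numpy as np

def poly_kernel(X, Y, degree=2):
    # inhomogeneous polynomial kernel: (1 + <x,y>)^p spans all monomials up to
    # degree p, i.e. the feature space of a degree-p Wiener/Volterra expansion
    return (1.0 + X @ Y.T) ** degree

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # input samples
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 0] ** 2          # toy second-order system
K = poly_kernel(X, X)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), y)  # ridge regression in the RKHS

X_test = rng.normal(size=(50, 2))
y_pred = poly_kernel(X_test, X) @ alpha
y_true = X_test[:, 0] * X_test[:, 1] + 0.5 * X_test[:, 0] ** 2
```

Because the quadratic system lies exactly in the degree-2 feature space, the implicit (kernel) estimate recovers it without ever forming the high-dimensional expansion explicitly.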

Empirical Inference Book Chapter Introduction to Statistical Learning Theory Bousquet, O., Boucheron, S., Lugosi, G. In Lecture Notes in Artificial Intelligence 3176:169-207, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, Germany, 2004 PDF BibTeX

Empirical Inference Article Kernel Methods and their Potential Use in Signal Processing Perez-Cruz, F., Bousquet, O. IEEE Signal Processing Magazine, (Special issue on Signal Processing for Mining), 2004 (Accepted) PostScript BibTeX

Empirical Inference Conference Paper Kernel Methods for Manifold Estimation Schölkopf, B. In Proceedings in Computational Statistics, 441-452, (Editors: J Antoch), Physica-Verlag/Springer, Heidelberg, Germany, COMPSTAT, 2004 BibTeX

Empirical Inference Book Chapter Kernels for graphs Kashima, H., Tsuda, K., Inokuchi, A. In 155-170, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA; USA, 2004 PDF BibTeX

Empirical Inference Conference Paper Learning from Labeled and Unlabeled Data Using Random Walks Zhou, D., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, 237-244, (Editors: Rasmussen, C.E., H.H. Bülthoff, M.A. Giese and B. Schölkopf), 2004
We consider the general problem of learning from labeled and unlabeled data: given a set of points, some labeled and the rest unlabeled, the goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The question of interest is whether we can construct a classifier that exploits the unlabeled data and achieves higher accuracy than classifiers which use the labeled data only. Recently we proposed a simple algorithm which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority over supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on its key steps.
PDF PostScript BibTeX
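The random-walk scheme the abstract refers to belongs to the label-propagation family: labels diffuse along a normalized affinity matrix until a fixed point. A toy sketch under that assumption (graph, labels, and parameter α are illustrative):

```python
import numpy as np

# toy graph: two triangles, one labeled node in each
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0

d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
S = d_inv_sqrt @ W @ d_inv_sqrt          # symmetrically normalized affinity

Y = np.zeros((6, 2))
Y[0, 0] = 1.0                            # node 0 labeled as class 0
Y[3, 1] = 1.0                            # node 3 labeled as class 1

alpha = 0.9
# closed-form fixed point of the propagation F = alpha * S @ F + Y
F = np.linalg.solve(np.eye(6) - alpha * S, Y)
labels = F.argmax(axis=1)                # each triangle inherits its seed's label
```

The two unlabeled nodes in each triangle receive the label of their cluster's single seed, which is exactly the "points in the same cluster share a label" behavior supervised learning on two labeled points cannot provide.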

Empirical Inference Technical Report Learning from Labeled and Unlabeled Data Using Random Walks Zhou, D., Schölkopf, B. Max Planck Institute for Biological Cybernetics, 2004
We consider the general problem of learning from labeled and unlabeled data: given a set of points, some labeled and the rest unlabeled, the goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The question of interest is whether we can construct a classifier that exploits the unlabeled data and achieves higher accuracy than classifiers which use the labeled data only. Recently we proposed a simple algorithm which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority over supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on its key steps.
PDF PostScript BibTeX

Empirical Inference Poster Masking by plaid patterns revisited Wichmann, F. Experimentelle Psychologie. Beiträge zur 46. Tagung experimentell arbeitender Psychologen, 46:285, 2004 BibTeX

Empirical Inference Conference Paper Maximal Margin Classification for Metric Spaces Hein, M., Bousquet, O. In Learning Theory and Kernel Machines, 72-86, (Editors: Schölkopf, B. and Warmuth, M. K.), Springer, Heidelberg, Germany, 16th Annual Conference on Computational Learning Theory (COLT/Kernel), 2004
In this article we construct a maximal margin classification algorithm for arbitrary metric spaces. At first we show that the Support Vector Machine (SVM) is a maximal margin algorithm for the class of metric spaces where the negative squared distance is conditionally positive definite (CPD). This means that the metric space can be isometrically embedded into a Hilbert space, where one performs linear maximal margin separation. We will show that the solution only depends on the metric, but not on the kernel. Following the framework we develop for the SVM, we construct an algorithm for maximal margin classification in arbitrary metric spaces. The main difference compared with SVM is that we no longer embed isometrically into a Hilbert space, but a Banach space. We further give an estimate of the capacity of the function class involved in this algorithm via Rademacher averages. We recover an algorithm of Graepel et al. [6].
PDF PostScript PDF DOI BibTeX

Empirical Inference Conference Paper Multivariate Regression via Stiefel Manifold Constraints BakIr, G., Gretton, A., Franz, M., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, Lecture Notes in Computer Science, Vol. 3175, 262-269, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, 2004
We introduce a learning technique for regression between high-dimensional spaces. Standard methods typically reduce this task to many one-dimensional problems, with each output dimension considered independently. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective, and dependence between the outputs can be exploited.
PostScript BibTeX
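The core idea above is a rank constraint coupling the output dimensions. This is not the authors' Stiefel-manifold algorithm, but classical reduced-rank regression illustrates the same constraint in a few lines (toy data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, dx, dy, r = 500, 8, 6, 2
A = rng.normal(size=(dx, r)) @ rng.normal(size=(r, dy))  # true rank-2 map
X = rng.normal(size=(n, dx))
Y = X @ A + 0.01 * rng.normal(size=(n, dy))

B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]             # unconstrained solution
# project the fitted values onto their top-r right singular directions;
# this yields the rank-r coefficient matrix closest to the OLS fit
U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
P = Vt[:r].T @ Vt[:r]
B_rr = B_ols @ P
```

The rank constraint forces the outputs to share a common low-dimensional feature subspace, so dependence between output dimensions is exploited instead of fitting each one independently.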

Empirical Inference Technical Report Multivariate Regression with Stiefel Constraints Bakir, G., Gretton, A., Franz, M., Schölkopf, B. (128), MPI for Biological Cybernetics, Spemannstr 38, 72076, Tuebingen, 2004
We introduce a new framework for regression between multi-dimensional spaces. Standard methods for solving this problem typically reduce the problem to one-dimensional regression by choosing features in the input and/or output spaces. These methods, which include PLS (partial least squares), KDE (kernel dependency estimation), and PCR (principal component regression), select features based on different a-priori judgments as to their relevance. Moreover, loss function and constraints are chosen not primarily on statistical grounds, but to simplify the resulting optimisation. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective. Our approach also allows for the possibility of using a regularizer in the optimization. Finally, by processing the observations sequentially, our algorithm is able to work on large scale problems.
PDF BibTeX

Empirical Inference Poster Neural mechanisms underlying control of a Brain-Computer-Interface (BCI): Simultaneous recording of bold-response and EEG Hinterberger, T., Wilhelm, B., Veit, R., Weiskopf, N., Lal, T., Birbaumer, N. 2004
Brain computer interfaces (BCI) enable humans or animals to communicate or activate external devices without muscle activity using electric brain signals. The BCI Thought Translation Device (TTD) uses learned regulation of slow cortical potentials (SCPs), a skill most people and paralyzed patients can acquire with training periods of several hours up to months. The neurophysiological mechanisms and anatomical sources of SCPs and other event-related brain macro-potentials are well understood, but the neural mechanisms underlying learning of the self-regulation skill for BCI-use are unknown. To uncover the relevant areas of brain activation during regulation of SCPs, the TTD was combined with functional MRI and EEG was recorded inside the MRI scanner in twelve healthy participants who have learned to regulate their SCP with feedback and reinforcement. The results demonstrate activation of specific brain areas during execution of the brain regulation skill: successful control of cortical positivity allowing a person to activate an external device was closely related to an increase of BOLD (blood oxygen level dependent) response in the basal ganglia and frontal premotor deactivation indicating learned regulation of a cortical-striatal loop responsible for local excitation thresholds of cortical assemblies. The data suggest that human users of a BCI learn the regulation of cortical excitation thresholds of large neuronal assemblies as a prerequisite of direct brain communication: the learning of this skill depends critically on an intact and flexible interaction between these cortico-basal ganglia-circuits. Supported by the Deutsche Forschungsgemeinschaft (DFG) and the National Institutes of Health (NIH).
BibTeX

Empirical Inference Conference Paper On the Convergence of Spectral Clustering on Random Samples: The Normalized Case von Luxburg, U., Bousquet, O., Belkin, M. In Proceedings of the 17th Annual Conference on Learning Theory, 457-471, 2004 PDF PostScript BibTeX

Empirical Inference Article Phenotypic Characterization of Human Chondrocyte Cell Line C-20/A4: A Comparison between Monolayer and Alginate Suspension Culture Finger, F., Schorle, C., Söder, S., Zien, A., Goldring, M., Aigner, T. Cells Tissues Organs, 178(2):65-77, 2004
DNA microarray analysis was used to investigate the molecular phenotype of one of the first human chondrocyte cell lines, C-20/A4, derived from juvenile costal chondrocytes by immortalization with origin-defective simian virus 40 large T antigen. Clontech Human Cancer Arrays 1.2 and quantitative PCR were used to examine gene expression profiles of C-20/A4 cells cultured in the presence of serum in monolayer and alginate beads. In monolayer cultures, genes involved in cell proliferation were strongly upregulated compared to those expressed by human adult articular chondrocytes in primary culture. Of the cell cycle-regulated genes, only two, the CDK regulatory subunit and histone H4, were downregulated after culture in alginate beads, consistent with the ability of these cells to proliferate in suspension culture. In contrast, the expression of several genes that are involved in pericellular matrix formation, including MMP-14, COL6A1, fibronectin, biglycan and decorin, was upregulated when the C-20/A4 cells were transferred to suspension culture in alginate. Also, nexin-1, vimentin, and IGFBP-3, which are known to be expressed by primary chondrocytes, were differentially expressed in our study. Consistent with the proliferative phenotype of this cell line, few genes involved in matrix synthesis and turnover were highly expressed in the presence of serum. These results indicate that immortalized chondrocyte cell lines, rather than substituting for primary chondrocytes, may serve as models for extending findings on chondrocyte function not achievable by the use of primary chondrocytes.
BibTeX

Empirical Inference Book Chapter Protein Classification via Kernel Matrix Completion Kin, T., Kato, T., Tsuda, K. In 261-274, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA; USA, 2004 PDF BibTeX

Empirical Inference Conference Paper Protein Functional Class Prediction with a Combined Graph Shin, H., Tsuda, K., Schölkopf, B. In Proceedings of the Korean Data Mining Conference, 200-219, 2004
In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex, etc. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semi-supervised learning. The method alternates between minimizing the objective function with respect to network output and with respect to combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.
PDF BibTeX

Empirical Inference Article Protein ranking: from local to global structure in the protein similarity network Weston, J., Elisseeff, A., Zhou, D., Leslie, C., Noble, W. Proceedings of the National Academy of Sciences, 101(17):6559-6563, 2004
Biologists regularly search databases of DNA or protein sequences for evolutionary or functional relationships to a given query sequence. We describe a ranking algorithm that exploits the entire network structure of similarity relationships among proteins in a sequence database by performing a diffusion operation on a pre-computed, weighted network. The resulting ranking algorithm, evaluated using a human-curated database of protein structures, is efficient and provides significantly better rankings than a local network search algorithm such as PSI-BLAST.
Web BibTeX
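The diffusion operation described above can be sketched as an activation spreading from the query node over the precomputed similarity network, so that indirect (transitive) homologs outrank dissimilar sequences. This is an illustrative RankProp-style iteration on a four-node toy network, not the published operator:

```python
import numpy as np

# toy similarity network: query (node 0) is directly similar to node 1 only,
# but node 1 is strongly connected to node 2 (an indirect homolog); node 3 is remote
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 0.1],
              [0.0, 0.0, 0.1, 0.0]])
W = W / W.sum(axis=1, keepdims=True)     # row-normalize outgoing weights

y0 = np.array([1.0, 0.0, 0.0, 0.0])      # activation injected at the query
y, alpha = y0.copy(), 0.8
for _ in range(200):                     # diffuse to (numerical) fixed point
    y = y0 + alpha * (W.T @ y)

ranking = np.argsort(-y[1:]) + 1         # database nodes ordered by diffused score
```

A purely local search would score node 2 near zero (no direct edge to the query); the diffusion ranks it second because activation reaches it through node 1, which is the "global network structure" effect the abstract describes.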

Empirical Inference Conference Paper Semi-supervised kernel regression using whitened function classes Franz, M., Kwon, Y., Rasmussen, C., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, Lecture Notes in Computer Science, Vol. 3175, 18-26, (Editors: CE Rasmussen and HH Bülthoff and MA Giese and B Schölkopf), Springer, Berlin, Germany, 2004
The use of non-orthonormal basis functions in ridge regression leads to an often undesired non-isotropic prior in function space. In this study, we investigate an alternative regularization technique that results in an implicit whitening of the basis functions by penalizing directions in function space with a large prior variance. The regularization term is computed from unlabelled input data that characterizes the input distribution. Tests on two datasets using polynomial basis functions showed an improved average performance compared to standard ridge regression.
PDF PostScript BibTeX
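The regularizer described above can be sketched as a ridge penalty weighted by the covariance of the basis functions over unlabeled inputs, which implicitly whitens a non-orthonormal basis. A minimal sketch assuming a polynomial basis and synthetic data (the basis, target, and regularization constant are illustrative):

```python
import numpy as np

def phi(x):
    # non-orthonormal polynomial basis (illustrative)
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

rng = np.random.default_rng(2)
x_lab = rng.uniform(-1, 1, 20)
y_lab = np.sin(2 * x_lab) + 0.05 * rng.normal(size=20)
x_unlab = rng.uniform(-1, 1, 1000)              # unlabeled inputs define the penalty

Phi = phi(x_lab)
# covariance of the basis functions under the input distribution, estimated
# from unlabeled data; penalizing w' C w whitens the basis in function space
C = phi(x_unlab).T @ phi(x_unlab) / len(x_unlab)
lam = 1e-2
w = np.linalg.solve(Phi.T @ Phi + lam * len(x_lab) * C, Phi.T @ y_lab)

x_test = np.linspace(-1, 1, 50)
y_hat = phi(x_test) @ w
```

Standard ridge regression would penalize `w' w` instead, which imposes a non-isotropic prior when the basis functions are correlated; replacing the identity with `C` removes that distortion.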

Empirical Inference Article Some observations on the effects of slant and texture type on slant-from-texture Rosas, P., Wichmann, F., Wagemans, J. Vision Research, 44(13):1511-1535, 2004
We measure the performance of five subjects in a slant-discrimination task for differently textured planes. As textures we used uniform lattices, randomly displaced lattices, circles (polka dots), Voronoi tessellations, plaids, 1/f noise, “coherent” noise and a leopard skin-like texture. Our results show: (1) Improving performance with larger slants for all textures. (2) Thus, following from (1), cases of “non-symmetrical” performance around a particular orientation. (3) For orientations sufficiently slanted, the different textures do not elicit major differences in performance, (4) while for orientations closer to the vertical plane there are marked differences between them. (5) These differences allow a rank-order of textures to be formed according to their “helpfulness” – that is, how easy the discrimination task is when a particular texture is mapped on the plane. Polka dots tend to allow the best slant discrimination performance, noise patterns the worst. Two additional experiments were conducted to test the generality of the obtained rank-order. First, the tilt of the planes was rotated to break the axis of gravity present in the original discrimination experiment. Second, the task was changed to a slant report task via probe adjustment. The results of both control experiments confirmed the texture-based rank-order previously obtained. We comment on the importance of these results for depth perception research in general, and in particular the implications our results have for studies of cue combination (sensor fusion) using texture as one of the cues involved.
PDF BibTeX

Empirical Inference Ph.D. Thesis Statistical Learning with Similarity and Dissimilarity Functions von Luxburg, U. 1-166, Technische Universität Berlin, Germany, 2004 PDF PostScript BibTeX

Empirical Inference Miscellaneous Statistische Lerntheorie und Empirische Inferenz Schölkopf, B. Jahrbuch der Max-Planck-Gesellschaft, 2004:377-382, 2004
Statistical learning theory studies the process of inferring regularities from empirical data. The fundamental problem is what is called generalization: how is it possible to infer a law that will be valid for an infinite number of future observations, given only a finite amount of data? This problem hinges upon fundamental issues of statistics and science in general, such as the problems of complexity of explanations, a priori knowledge, and representation of data.
PDF Web BibTeX

Empirical Inference Technical Report Transductive Inference with Graphs Zhou, D., Schölkopf, B. Max Planck Institute for Biological Cybernetics, 2004, See the improved version Regularization on Discrete Spaces.
We propose a general regularization framework for transductive inference. The given data are thought of as a graph, where the edges encode the pairwise relationships among data. We develop discrete analysis and geometry on graphs, and then naturally adapt the classical regularization in the continuous case to the graph situation. A new and effective algorithm is derived from this general framework, as well as an approach we developed before.
BibTeX

Empirical Inference Conference Paper Unifying Collaborative and Content-Based Filtering Basilico, J., Hofmann, T. In Proceedings of the 21st International Conference on Machine Learning, ACM International Conference Proceeding Series, 65, (Editors: Greiner, R. and D. Schuurmans), ACM Press, New York, USA, ICML, 2004
Collaborative and content-based filtering are two paradigms that have been applied in the context of recommender systems and user preference prediction. This paper proposes a novel, unified approach that systematically integrates all available training information such as past user-item ratings as well as attributes of items or users to learn a prediction function. The key ingredient of our method is the design of a suitable kernel or similarity function between user-item pairs that allows simultaneous generalization across the user and item dimensions. We propose an on-line algorithm (JRank) that generalizes perceptron learning. Experimental results on the EachMovie data set show significant improvements over standard approaches.
PDF BibTeX
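The key ingredient above is a kernel on user-item pairs. JRank itself extends perceptron learning for rating prediction; this sketch shows only the underlying idea with a tensor-product pair kernel and a plain kernel perceptron on toy data (all similarity values and ratings are illustrative):

```python
import numpy as np

# toy attribute kernels for 3 users and 3 items
Ku = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
Ki = np.array([[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.9, 0.0, 1.0]])

def k(p, q):
    # kernel on (user, item) pairs: product of user and item similarities,
    # so generalization happens across both dimensions simultaneously
    return Ku[p[0], q[0]] * Ki[p[1], q[1]]

# observed ratings as binary labels (+1 like, -1 dislike)
train = [((0, 0), 1), ((1, 1), -1), ((2, 1), 1)]
alpha = np.zeros(len(train))
for _ in range(10):                       # kernel perceptron epochs
    for t, (p, y) in enumerate(train):
        f = sum(a * yy * k(pp, p) for a, (pp, yy) in zip(alpha, train))
        if y * f <= 0:
            alpha[t] += 1.0

# predict for user 1 on unseen item 2: user 1 resembles user 0,
# item 2 resembles item 0, and user 0 liked item 0
f_new = sum(a * yy * k(pp, (1, 2)) for a, (pp, yy) in zip(alpha, train))
```

The prediction for the never-observed pair (user 1, item 2) is positive purely through attribute similarity, which is the collaborative-plus-content generalization the abstract describes.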

Empirical Inference Technical Report Support Vector Channel Selection in BCI Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B. (120), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, December 2003
When designing a Brain Computer Interface (BCI) system, one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals, we propose using the state-of-the-art feature selection algorithms Recursive Feature Elimination [3] and Zero-Norm Optimization [13], which are based on the training of Support Vector Machines (SVMs) [11]. These algorithms can provide more accurate solutions than standard filter methods for feature selection [14]. We adapt the methods to the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of channels used can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore, we show how time-dependent, task-specific information can be visualized.
PDF Web BibTeX
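Recursive Feature Elimination, adapted to channels as above, repeatedly trains a linear classifier and discards the channel with the smallest squared weight. A minimal sketch with an ordinary least-squares classifier standing in for the SVM used in the paper (data and the informative-channel setup are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + X[:, 1])            # only channels 0 and 1 are informative

channels = list(range(d))
while len(channels) > 2:
    # linear classifier in place of the SVM (illustrative substitution)
    w = np.linalg.lstsq(X[:, channels], y, rcond=None)[0]
    worst = np.argmin(w ** 2)             # RFE step: drop the weakest channel
    channels.pop(worst)
```

Eliminating one channel at a time (rather than ranking once) matters because weights are re-estimated after each removal, so correlated channels are handled gracefully.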

Empirical Inference Poster Texture and haptic cues in slant discrimination: Measuring the effect of texture type on cue combination Rosas, P., Wichmann, F., Ernst, M., Wagemans, J. Journal of Vision, 3(12):26, 2003 Fall Vision Meeting of the Optical Society of America, December 2003
In a number of models of depth cue combination the depth percept is constructed via a weighted average combination of independent depth estimations. The influence of each cue in such average depends on the reliability of the source of information. (Young, Landy, & Maloney, 1993; Ernst & Banks, 2002.) In particular, Ernst & Banks (2002) formulate the combination performed by the human brain as that of the minimum variance unbiased estimator that can be constructed from the available cues. Using slant discrimination and slant judgment via probe adjustment as tasks, we have observed systematic differences in performance of human observers when a number of different types of textures were used as cue to slant (Rosas, Wichmann & Wagemans, 2003). If the depth percept behaves as described above, our measurements of the slopes of the psychometric functions provide the predicted weights for the texture cue for the ranked texture types. We have combined these texture types with object motion but the obtained results are difficult to reconcile with the unbiased minimum variance estimator model (Rosas & Wagemans, 2003). This apparent failure of such model might be explained by the existence of a coupling of texture and motion, violating the assumption of independence of cues. Hillis, Ernst, Banks, & Landy (2002) have shown that while for between-modality combination the human visual system has access to the single-cue information, for within-modality combination (visual cues: disparity and texture) the single-cue information is lost, suggesting a coupling between these cues. Then, in the present study we combine the different texture types with haptic information in a slant discrimination task, to test whether in the between-modality condition the texture cue and the haptic cue to slant are combined as predicted by an unbiased, minimum variance estimator model.
Web DOI BibTeX

Empirical Inference Article Concentration Inequalities for Sub-Additive Functions Using the Entropy Method Bousquet, O. Stochastic Inequalities and Applications, 56:213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.
PostScript BibTeX

Empirical Inference Proceedings Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop (COLT/Kernel 2003), LNCS Vol. 2777 Schölkopf, B., Warmuth, M. 746 pages, Springer, Berlin, Germany, November 2003 DOI BibTeX

Empirical Inference Conference Paper Cluster Kernels for Semi-Supervised Learning Chapelle, O., Weston, J., Schölkopf, B. In Advances in Neural Information Processing Systems 15, 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
We propose a framework to incorporate unlabeled data in kernel classifiers, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.
PDF Web BibTeX
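Modifying the eigenspectrum can be sketched directly: eigendecompose the kernel matrix and apply a transfer function to the eigenvalues. A toy example assuming an RBF base kernel and a step transfer function that keeps only the leading eigendirections (one of several transfer functions one might use; all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),      # cluster A
               rng.normal(3, 0.3, (20, 2))])     # cluster B
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                            # RBF base kernel matrix

lam, U = np.linalg.eigh(K)
# transfer function on the spectrum: a step that keeps only the top-2
# eigendirections, which follow the cluster structure of the data
keep = lam >= np.sort(lam)[-2]
K_cluster = (U * np.where(keep, 1.0, 0.0)) @ U.T
```

In the modified kernel, two points in the same cluster have high similarity and points in different clusters near-zero similarity, even if their raw RBF similarity was comparable; any kernel classifier can then be trained on `K_cluster`.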

Empirical Inference Conference Paper Clustering with the Fisher score Tsuda, K., Kawanabe, M., Müller, K. In Advances in Neural Information Processing Systems 15, 729-736, (Editors: Becker, S., S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
The Fisher score (or the Fisher kernel) has recently been used increasingly as a feature extractor for classification problems. The Fisher score is the vector of parameter derivatives of the log-likelihood of a probabilistic model. This paper gives a theoretical analysis of how class information is preserved in the space of the Fisher score, showing that the Fisher score consists of a few important dimensions carrying class information and many nuisance dimensions. When clustering with the Fisher score, K-Means-type methods are inappropriate because they make use of all dimensions. We therefore develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit the important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
PDF Web BibTeX

Empirical Inference Technical Report Image Reconstruction by Linear Programming Tsuda, K., Rätsch, G. (118), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, October 2003 PDF BibTeX

Empirical Inference Conference Paper Kernel Dependency Estimation Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V. In Advances in Neural Information Processing Systems 15, 873-880, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003 PDF Web BibTeX

Empirical Inference Conference Paper Linear Combinations of Optic Flow Vectors for Estimating Self-Motion: a Real-World Test of a Neural Model Franz, M., Chahl, J. In Advances in Neural Information Processing Systems 15, 1319-1326, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.
PDF Web BibTeX
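The estimator above is a linear combination of flow vectors. For pure rotation the flow is independent of scene depth, so a least-squares linear readout already recovers the rotation rate; this is a simplified sketch of that special case, not the published estimator with prior distance and noise statistics (sensor geometry and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
# viewing directions on a ring (simplified omnidirectional sensor)
angles = np.linspace(0, 2 * np.pi, 50, endpoint=False)
d = np.stack([np.cos(angles), np.sin(angles), np.zeros(50)], axis=1)

# rotational optic flow: v = omega x d, independent of scene distances
omega_true = np.array([0.0, 0.0, 0.3])
flow = np.cross(omega_true, d) + 0.01 * rng.normal(size=(50, 3))

# columns of A hold the flow field produced by unit rotation about each axis;
# the linear estimator is the (pseudo)inverse of this map
A = np.stack([np.cross(e, d).reshape(-1) for e in np.eye(3)], axis=1)
omega_hat = np.linalg.lstsq(A, flow.reshape(-1), rcond=None)[0]
```

Each component of `omega_hat` is literally a fixed weighted sum of the measured flow vectors, mirroring how a tangential neuron pools local motion signals; translation estimation is harder precisely because translational flow does depend on the unknown distances.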

Empirical Inference Conference Paper Mismatch String Kernels for SVM Protein Classification Leslie, C., Eskin, E., Weston, J., Noble, W. In Advances in Neural Information Processing Systems 15, 1417-1424, (Editors: Becker, S., S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
PDF Web BibTeX
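A (k, m)-mismatch kernel can be written down directly from its definition: the feature indexed by a k-mer a counts occurrences within Hamming distance m, and the kernel is the inner product of those feature vectors. This brute-force sketch (over a DNA-style alphabet for brevity) illustrates the definition; the paper's mismatch-tree data structure computes the same quantity efficiently:

```python
from itertools import product

def mismatch_kernel(x, y, k=3, m=1, alphabet="ACGT"):
    # phi_a(s) = number of k-mers in s within Hamming distance m of the k-mer a;
    # K(x, y) = sum over all possible k-mers a of phi_a(x) * phi_a(y)
    def counts(s):
        kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
        return {a: sum(sum(c1 != c2 for c1, c2 in zip(a, w)) <= m for w in kmers)
                for a in product(alphabet, repeat=k)}
    cx, cy = counts(x), counts(y)
    return sum(cx[a] * cy[a] for a in cx)
```

Note that no generative model of the positive class is involved: similarity comes purely from shared (approximately matching) subsequences, which is what makes the kernel usable with any discriminative classifier.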

Empirical Inference Conference Paper On the Complexity of Learning the Kernel Matrix Bousquet, O., Herrmann, D. In Advances in Neural Information Processing Systems 15, 399-406, (Editors: Becker, S., S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
We investigate data-based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow the spectrum to vary while keeping the eigenvectors fixed. This bound is only a logarithmic factor larger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.
PDF Web BibTeX

Empirical Inference Thesis Real-Time Face Detection Kienzle, W. Biologische Kybernetik, Eberhard-Karls-Universitaet Tuebingen, Tuebingen, Germany, October 2003 BibTeX

Empirical Inference Conference Paper Adaptive, Cautious, Predictive control with Gaussian Process Priors Murray-Smith, R., Sbarbaro, D., Rasmussen, C., Girard, A. In Proceedings of the 13th IFAC Symposium on System Identification, 1195-1200, (Editors: Van den Hof, P., B. Wahlberg and S. Weiland), Proceedings of the 13th IFAC Symposium on System Identification, August 2003
Nonparametric Gaussian Process models, a Bayesian statistics approach, are used to implement a nonlinear adaptive control law. Predictions, including propagation of the state uncertainty, are made over a k-step horizon. The expected value of a quadratic cost function is minimised over this prediction horizon without ignoring the variance of the model predictions. The general method and its main features are illustrated on a simulation example.
PDF BibTeX
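The cautious element, minimising the expected cost without ignoring the predictive variance, rests on the identity E[(y - r)^2] = (mu - r)^2 + sigma^2 for a Gaussian prediction with mean mu and variance sigma^2. A toy one-step sketch, where the candidate controls and their GP predictions are invented for illustration:

```python
def expected_quadratic_cost(mean, var, target):
    """E[(y - r)^2] for y ~ N(mean, var). Dropping var would ignore model
    uncertainty; keeping it is what makes the controller 'cautious'."""
    return (mean - target) ** 2 + var

# Hypothetical one-step scenario: control input -> (predicted mean, variance).
candidates = {
    0.2: (0.9, 0.50),
    0.5: (1.0, 0.30),
    0.8: (1.1, 0.01),
}
target = 1.0
best = min(candidates,
           key=lambda u: expected_quadratic_cost(*candidates[u], target))
```

Note that the minimiser trades mean tracking error against predictive variance: the candidate with zero mean error loses to a slightly biased but far more certain prediction.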

Empirical Inference Conference Paper Marginalized Kernels between Labeled Graphs Kashima, H., Tsuda, K., Inokuchi, A. In 20th International Conference on Machine Learning, 321-328, (Editors: Faucett, T. and N. Mishra), 20th International Conference on Machine Learning, August 2003 PDF BibTeX

Empirical Inference Conference Paper Sparse Gaussian Processes: inference, subspace identification and model selection Csato, L., Opper, M. In Proceedings, 1-6, (Editors: Van der Hof, Wahlberg), The Netherlands, 13th IFAC Symposium on System Identification, August 2003, electronic version; Index ThA02-2
Gaussian Process (GP) inference is a probabilistic kernel method where the GP is treated as a latent function. The inference is carried out using Bayesian online learning and its extension to the more general iterative approach, which we call TAP/EP learning. Sparsity is introduced in this context to make the TAP/EP method applicable to large datasets. We address the prohibitive scaling of the number of parameters by defining a subset of the training data that is used as the support of the GP; the number of required parameters is thus independent of the training set size, similar to the case of "Support Vectors" or "Relevance Vectors". An advantage of the full probabilistic treatment is that it allows the computation of the marginal data likelihood or evidence, leading to hyper-parameter estimation within the GP inference. An EM algorithm to choose the hyper-parameters is proposed: TAP/EP learning constitutes the E-step, and the M-step then updates the hyper-parameters. Due to the sparse E-step, the resulting algorithm does not involve manipulation of large matrices. The presented algorithm is applicable to a wide variety of likelihood functions. We present results of applying the algorithm to classification and nonstandard regression problems for artificial and real datasets.
PDF GZIP BibTeX

Empirical Inference Talk Statistical Learning Theory Bousquet, O. Machine Learning Summer School, August 2003 PDF BibTeX

Empirical Inference Conference Paper On the Representation, Learning and Transfer of Spatio-Temporal Movement Characteristics Ilg, W., Bakir, G., Mezger, J., Giese, M. In Humanoids Proceedings, 0-0, Humanoids Proceedings, July 2003, electronic version
In this paper we present a learning-based approach for the modelling of complex movement sequences. Based on the method of Spatio-Temporal Morphable Models (STMMs), we derive a hierarchical algorithm that, in a first step, automatically identifies movement elements in movement sequences based on a coarse spatio-temporal description, and in a second step models these movement primitives by approximation through linear combinations of learned example movement trajectories. We describe the different steps of the algorithm and show how it can be applied to the modelling and synthesis of complex sequences of human movements that contain movement elements with variable style. The proposed method is demonstrated on different applications of movement representation relevant for imitation learning of movement styles in humanoid robotics.
PDF BibTeX

Empirical Inference Article Statistical Learning Theory, Capacity and Complexity Schölkopf, B. Complexity, 8(4):87-94, July 2003
We give an exposition of the ideas of statistical learning theory, followed by a discussion of how a reinterpretation of the insights of learning theory could potentially also benefit our understanding of a certain notion of complexity.
Web DOI BibTeX

Empirical Inference Article Dealing with large Diagonals in Kernel Matrices Weston, J., Schölkopf, B., Eskin, E., Leslie, C., Noble, W. Annals of the Institute of Statistical Mathematics, 55(2):391-408, June 2003
In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, which arises for many types of kernels, then kernel methods do not perform well. We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine, a common kernel approach for pattern recognition.
PDF DOI BibTeX
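One transform in the spirit of the abstract, shrinking the dynamic range elementwise and then restoring positive semi-definiteness through an empirical kernel map, can be sketched as follows. This is a sketch of one such strategy using a subpolynomial map, not necessarily the exact variants tested in the paper; the toy matrix values are invented.

```python
import numpy as np

def reduce_dynamic_range(K, p=0.5):
    """Shrink a dominant diagonal with the elementwise subpolynomial map
    sign(K) * |K|**p, then apply an empirical kernel map (A @ A.T), which
    guarantees the result is positive semi-definite again."""
    A = np.sign(K) * np.abs(K) ** p
    return A @ A.T

# Toy Gram matrix with a large diagonal (hypothetical values).
K = np.array([[10.0, 1.0],
              [1.0, 10.0]])
K_reduced = reduce_dynamic_range(K, p=0.5)
```

The diagonal-to-off-diagonal ratio shrinks (here from 10 to below 2) while the output remains a valid kernel matrix.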

Empirical Inference Technical Report Implicit Wiener Series Franz, M., Schölkopf, B. (114), Max Planck Institute for Biological Cybernetics, June 2003
The Wiener series is one of the standard methods to systematically characterize the nonlinearity of a neural system. The classical estimation method of the expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear systems. We propose a new estimation method based on regression in a reproducing kernel Hilbert space that overcomes these problems. Numerical experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
PDF BibTeX
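The underlying idea, estimating a polynomial expansion implicitly via regression with a polynomial kernel in an RKHS, can be illustrated with plain kernel ridge regression. A sketch of the general principle, not the report's exact estimator:

```python
import numpy as np

def implicit_polynomial_regression(X, y, degree=2, lam=1e-6):
    """Kernel ridge regression with the inhomogeneous polynomial kernel
    (1 + <x, x'>)**degree. The fit is an implicit degree-n polynomial
    expansion of the system: the (potentially huge) coefficient tensor of
    the monomial terms is never formed explicitly, which is what keeps
    high input dimensions tractable."""
    K = (1.0 + X @ X.T) ** degree
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

    def predict(X_query):
        return ((1.0 + X_query @ X.T) ** degree) @ alpha

    return predict
```

Fitting a second-order system such as y = x1^2 + 2*x1*x2 recovers its input-output behaviour while only ever storing one coefficient per training point.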

Empirical Inference Technical Report Kernel Hebbian Algorithm for Iterative Kernel Principal Component Analysis Kim, K., Franz, M., Schölkopf, B. (109), Max Planck Institute for Biological Cybernetics, Tübingen, June 2003
A new method for performing a kernel principal component analysis is proposed. By kernelizing the generalized Hebbian algorithm, one can iteratively estimate the principal components in a reproducing kernel Hilbert space with only linear order memory complexity. The derivation of the method, a convergence proof, and preliminary applications in image hyperresolution are presented. In addition, we discuss the extension of the method to the online learning of kernel principal components.
PDF BibTeX
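The single-component special case of such a kernelized generalized Hebbian algorithm amounts to Oja's rule expressed in the expansion coefficients of the component, which is where the linear-order memory complexity comes from. A minimal sketch, with illustrative step size and iteration count:

```python
import numpy as np

def kernel_oja(K, n_iter=2000, eta=0.05, seed=0):
    """First kernel principal component via a kernelized Oja rule, the
    single-component special case of the generalized Hebbian algorithm.
    Only the n expansion coefficients a of w = sum_i a_i phi(x_i) are
    stored, not any feature-space vectors: memory is linear in n."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    a = 0.01 * rng.standard_normal(n)
    for it in range(n_iter):
        t = it % n                 # cycle through the training points
        y = K[t] @ a               # projection of phi(x_t) onto w
        a = a - eta * y * y * a    # forgetting term keeps ||w|| bounded
        a[t] += eta * y            # Hebbian term
    return a
```

After convergence the projections K @ a align with the leading eigenvector of the Gram matrix, i.e. the first kernel principal component.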

Empirical Inference Technical Report Learning with Local and Global Consistency Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B. (112), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2003
We consider the learning problem in the transductive setting: given a set of points of which only some are labeled, the goal is to predict the labels of the unlabeled points. A principled way to approach such a problem is the consistency assumption that a classifying function should be sufficiently smooth with respect to the structure revealed by the labeled and unlabeled points together. We present a simple algorithm for obtaining such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.
BibTeX
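A sketch consistent with the abstract: starting from an affinity matrix W, form the symmetric normalization S = D^(-1/2) W D^(-1/2) and take the closed-form smooth solution F* = (1 - alpha)(I - alpha S)^(-1) Y. The parameter alpha and the toy data below are illustrative.

```python
import numpy as np

def local_global_consistency(W, Y, alpha=0.99):
    """Closed-form smooth solution F* = (1 - alpha)(I - alpha*S)^(-1) Y,
    with S = D^(-1/2) W D^(-1/2) the symmetrically normalized affinity
    matrix. W: (n, n) affinities with zero diagonal; Y: (n, c) one-hot
    labels, with all-zero rows for the unlabeled points."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    n = W.shape[0]
    F = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)
```

On well-separated clusters, a single label per cluster propagates to all unlabeled members, which is the intended use of the unlabeled data.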

Empirical Inference Technical Report Machine Learning approaches to protein ranking: discriminative, semi-supervised, scalable algorithms Weston, J., Leslie, C., Elisseeff, A., Noble, W. (111), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2003
A key tool in protein function discovery is the ability to rank databases of proteins given a query amino acid sequence. The most successful method so far is a web-based tool called PSI-BLAST, which uses heuristic alignment of a profile built from the large unlabeled database. It has been shown that such use of global information via unlabeled data improves over a local measure derived from a basic pairwise alignment, such as that performed by PSI-BLAST's predecessor, BLAST. In this article we look at ways of leveraging techniques from the field of machine learning for the problem of ranking. We show how clustering and semi-supervised learning techniques, which aim to capture global structure in data, can significantly improve over PSI-BLAST.
PDF BibTeX