Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Empirical Inference Article Robust EEG Channel Selection Across Subjects for Brain Computer Interfaces Schröder, M., Lal, T., Hinterberger, T., Bogdan, M., Hill, J., Birbaumer, N., Rosenstiel, W., Schölkopf, B. EURASIP Journal on Applied Signal Processing, 2005(19, Special Issue: Trends in Brain Computer Interfaces):3103-3112, (Editors: Vesin, J. M., T. Ebrahimi), 2005
Most EEG-based Brain Computer Interface (BCI) paradigms come along with specific electrode positions, e.g.~for a visual based BCI electrode positions close to the primary visual cortex are used. For new BCI paradigms it is usually not known where task relevant activity can be measured from the scalp. For individual subjects Lal et.~al showed that recording positions can be found without the use of prior knowledge about the paradigm used. However it remains unclear to what extend their method of Recursive Channel Elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were consistently selected whereas task-irrelevant channels were reliably disregarded.
Web DOI BibTeX

Empirical Inference Book Chapter Support Vector Machines and Kernel Algorithms Schölkopf, B., Smola, A. In Encyclopedia of Biostatistics (2nd edition), Vol. 8, 8:5328-5335, (Editors: P Armitage and T Colton), John Wiley & Sons, NY USA, 2005 BibTeX

Empirical Inference Poster The human brain as large margin classifier Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. Proceedings of the Computational & Systems Neuroscience Meeting (COSYNE), 2:1, 2005 BibTeX

Empirical Inference Article Theory of Classification: A Survey of Some Recent Advances Boucheron, S., Bousquet, O., Lugosi, G. ESAIM: Probability and Statistics, 9:323 , 2005
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have lead to these important recent developments.
PDF DOI BibTeX

Empirical Inference Technical Report Towards a Statistical Theory of Clustering. Presented at the PASCAL workshop on clustering, London von Luxburg, U., Ben-David, S. Presented at the PASCAL workshop on clustering, London, 2005
The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered has been sampled from some unknown probability distribution. Firstly, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Secondly, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular we argue that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that the main replacements of generalization bounds should be convergence proofs and stability considerations. This paper should be considered as a road map paper which identifies important questions and potentially fruitful directions for future research about statistical clustering. We do not attempt to present a complete statistical theory of clustering.
PDF BibTeX

Empirical Inference Book Chapter Visual perception I: Basic principles Wagemans, J., Wichmann, F., de Beeck, H. In Handbook of Cognition, 3-47, (Editors: Lamberts, K. , R. Goldstone), Sage, London, 2005 BibTeX

Empirical Inference Conference Paper Attentional Modulation of Auditory Event-Related Potentials in a Brain-Computer Interface Hill, J., Lal, T., Bierig, K., Birbaumer, N., Schölkopf, B. In Proceedings of the 2004 IEEE International Workshop on Biomedical Circuits and Systems (BioCAS04), BioCAS04, (S3/5/INV- S3/17-20)4, IEEE Computer Society, Los Alamitos, CA, USA, 2004 IEEE International Workshop on Biomedical Circuits and Systems, December 2004
Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user‘s EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.
PDF Web DOI BibTeX

Empirical Inference Article On the representation, learning and transfer of spatio-temporal movement characteristics Ilg, W., Bakir, G., Mezger, J., Giese, M. International Journal of Humanoid Robotics, 1(4):613-636, December 2004 BibTeX

Empirical Inference Article Efficient face detection by a cascaded support-vector machine expansion Romdhani, S., Torr, P., Schölkopf, B., Blake, A. Proceedings of The Royal Society of London A, 460(2501):3283-3297, A, November 2004
We describe a fast system for the detection and localization of human faces in images using a nonlinear ‘support-vector machine‘. We approximate the decision surface in terms of a reduced set of expansion vectors and propose a cascaded evaluation which has the property that the full support-vector expansion is only evaluated on the face-like parts of the image, while the largest part of typical images is classified using a single expansion vector (a simpler and more efficient classifier). As a result, only three reduced-set vectors are used, on average, to classify an image patch. Hence, the cascaded evaluation, presented in this paper, offers a thirtyfold speed-up over an evaluation using the full set of reduced-set vectors, which is itself already thirty times faster than classification using all the support vectors.
PDF DOI BibTeX

Empirical Inference Article Insect-inspired estimation of egomotion Franz, M., Chahl, J., Krapp, H. Neural Computation, 16(11):2245-2260, November 2004
Tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during egomotion. In this study, we examine whether a simplified linear model based on the organization principles in tangential neurons can be used to estimate egomotion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and egomotion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates are of reasonable quality, albeit less reliable.
PDF PostScript Web DOI BibTeX

Empirical Inference Technical Report Joint Kernel Maps Weston, J., Schölkopf, B., Bousquet, O., Mann, .., Noble, W. (131), Max-Planck-Institute for Biological Cybernetics, Tübingen, November 2004 PDF BibTeX

Empirical Inference Talk Discrete vs. Continuous: Two Sides of Machine Learning Zhou, D. October 2004
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data. This talk is mainly based on the followiing contribution: (1) D. Zhou and B. Sch{\"o}lkopf: Transductive Inference with Graphs, MPI Technical report, August, 2004; (2) D. Zhou, B. Sch{\"o}lkopf and T. Hofmann. Semi-supervised Learning on Directed Graphs. NIPS 2004; (3) D. Zhou, O. Bousquet, T.N. Lal, J. Weston and B. Sch{\"o}lkopf. Learning with Local and Global Consistency. NIPS 2003.
PDF BibTeX

Empirical Inference Poster S-cones contribute to flicker brightness in human vision Wehrhahn, C., Hill, N., Dillenburger, B. 34(174.12), 34th Annual Meeting of the Society for Neuroscience (Neuroscience 2004), October 2004
In the retina of primates three cone types sensitive to short, middle and long wavelengths of light convert photons into electrical signals. Many investigators have presented evidence that, in color normal observers, the signals of cones sensitive to short wavelengths of light (S-cones) do not contribute to the perception of brightness of a colored surface when this is alternated with an achromatic reference (flicker brightness). Other studies indicate that humans do use S-cone signals when performing this task. Common to all these studies is the small number of observers, whose performance data are reported. Considerable variability in the occurrence of cone types across observers has been found, but, to our knowledge, no cone counts exist from larger populations of humans. We reinvestigated how much the S-cones contribute to flicker brightness. 76 color normal observers were tested in a simple psychophysical procedure neutral to the cone type occurence (Teufel & Wehrhahn (2000), JOSA A 17: 994 - 1006). The data show that, in the majority of our observers, S-cones provide input with a negative sign - relative to L- and M-cone contribution - in the task in question. There is indeed considerable between-subject variability such that for 20 out of 76 observers the magnitude of this input does not differ significantly from 0. Finally, we argue that the sign of S-cone contribution to flicker brightness perception by an observer cannot be used to infer the relative sign their contributions to the neuronal signals carrying the information leading to the perception of flicker brightness. We conclude that studies which use only a small number of observers may easily fail to find significant evidence for the small but significant population tendency for the S-cones to contribute to flicker brightness. Our results confirm all earlier results and reconcile their contradictory interpretations.
Web BibTeX

Empirical Inference Proceedings Advanced Lectures on Machine Learning Bousquet, O., von Luxburg, U., Rätsch, G. ML Summer Schools 2003, LNAI 3176:240, Springer, Berlin, Germany, ML Summer Schools, September 2004
Machine Learning has become a key enabling technology for many engineering applications, investigating scientific questions and theoretical problems alike. To stimulate discussions and to disseminate new results, a summer school series was started in February 2002, the documentation of which is published as LNAI 2600. This book presents revised lectures of two subsequent summer schools held in 2003 in Canberra, Australia, and in T{\"u}bingen, Germany. The tutorial lectures included are devoted to statistical learning theory, unsupervised learning, Bayesian inference, and applications in pattern recognition; they provide in-depth overviews of exciting new developments and contain a large number of references. Graduate students, lecturers, researchers and professionals alike will find this book a useful resource in learning and teaching machine learning.
Web BibTeX

Empirical Inference Talk Grundlagen von Support Vector Maschinen und Anwendungen in der Bildverarbeitung Eichhorn, J. September 2004
Invited talk at the workshop "Numerical, Statistical and Discrete Methods in Image Processing" at the TU M{\"u}nchen (in GERMAN)
PDF BibTeX

Empirical Inference Conference Paper Learning Depth From Stereo Sinz, F., Candela, J., BakIr, G., Rasmussen, C., Franz, M. In Pattern Recognition: 26th DAGM Symposium, 26th DAGM Symposium, 245-252, LNCS 3175, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004
We compare two approaches to the problem of estimating the depth of a point in space from observing its image position in two different cameras: 1.~The classical photogrammetric approach explicitly models the two cameras and estimates their intrinsic and extrinsic parameters using a tedious calibration procedure; 2.~A generic machine learning approach where the mapping from image to spatial coordinates is directly approximated by a Gaussian Process regression. Our results show that the generic learning approach, in addition to simplifying the procedure of calibration, can lead to higher depth accuracies than classical calibration although no specific domain knowledge is used.
PDF PostScript Web BibTeX

Empirical Inference Conference Paper Modelling Spikes with Mixtures of Factor Analysers Görür, D., Rasmussen, C., Tolias, A., Sinz, F., Logothetis, N. In Pattern Recognition: Proceedings of the 26th DAGM Symposium, Pattern Recognition, 391-398, LNCS 3175, (Editors: Rasmussen, C. E. , H.H. Bülthoff, B. Schölkopf, M.A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004
Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging problem. We consider the spike sorting problem using a generative model,mixtures of factor analysers, which concurrently performs clustering and feature extraction. The most important advantage of this method is that it quantifies the certainty with which the spikes are classified. This can be used as a means for evaluating the quality of clustering and therefore spike isolation. Using this method, nearly simultaneously occurring spikes can also be modelled which is a hard task for many of the spike sorting methods. Furthermore, modelling the data with a generative model allows us to generate simulated data.
PDF PDF DOI BibTeX

Empirical Inference Book Kernel Methods in Computational Biology Schölkopf, B., Tsuda, K., Vert, J. 410, Computational Molecular Biology, MIT Press, Cambridge, MA, USA, August 2004
Modern machine learning techniques are proving to be extremely valuable for the analysis of data in computational biology problems. One branch of machine learning, kernel methods, lends itself particularly well to the difficult aspects of biological data, which include high dimensionality (as in microarray measurements), representation as discrete and structured data (as in DNA or amino acid sequences), and the need to combine heterogeneous sources of information. This book provides a detailed overview of current research in kernel methods and their applications to computational biology. Following three introductory chapters—an introduction to molecular and computational biology, a short review of kernel methods that focuses on intuitive concepts rather than technical details, and a detailed survey of recent applications of kernel methods in computational biology—the book is divided into three sections that reflect three general trends in current research. The first part presents different ideas for the design of kernel functions specifically adapted to various biological data; the second part covers different approaches to learning from heterogeneous data; and the third part offers examples of successful applications of support vector machine methods.
Web BibTeX

Empirical Inference Article Learning kernels from biological networks by maximizing entropy Tsuda, K., Noble, W. Bioinformatics, 20(Suppl. 1):i326-i333, August 2004
Motivation: The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based on the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks. Results: We show that computing the diffusion kernel is equivalent to maximizing the von Neumann entropy, subject to a global constraint on the sum of the Euclidean distances between nodes. This global constraint allows for high variance in the pairwise distances. Accordingly, we propose an alternative, locally constrained diffusion kernel, and we demonstrate that the resulting kernel allows for more accurate support vector machine prediction of protein functional classifications from metabolic and protein–protein interaction networks.
PDF Web BibTeX

Empirical Inference Conference Paper Learning to Find Graph Pre-Images BakIr, G., Zien, A., Tsuda, K. In Pattern Recognition: Proceedings of the 26th DAGM Symposium, Pattern Recognition, 253-261, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, August 2004
The recent development of graph kernel functions has made it possible to apply well-established machine learning methods to graphs. However, to allow for analyses that yield a graph as a result, it is necessary to solve the so-called pre-image problem: to reconstruct a graph from its feature space representation induced by the kernel. Here, we suggest a practical solution to this problem.
PostScript PDF DOI BibTeX

Empirical Inference Article Masking effect produced by Mach bands on the detection of narrow bars of random polarity Henning, G., Hoddinott, K., Wilson-Smith, Z., Hill, N. Journal of the Optical Society of America, 21(8):1379-1387, A, August 2004 BibTeX

Empirical Inference Proceedings Pattern Recognition: 26th DAGM Symposium, LNCS, Vol. 3175 Rasmussen, C., Bülthoff, H., Giese, M., Schölkopf, B. Proceedings of the 26th Pattern Recognition Symposium (DAGM‘04), 581, Springer, Berlin, Germany, 26th Pattern Recognition Symposium, August 2004 Web DOI BibTeX

Empirical Inference Technical Report Semi-Supervised Induction Yu, K., Tresp, V., Zhou, D. (141), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, August 2004
Considerable progress was recently achieved on semi-supervised learning, which differs from the traditional supervised learning by additionally exploring the information of the unlabelled examples. However, a disadvantage of many existing methods is that it does not generalize to unseen inputs. This paper investigates learning methods that effectively make use of both labelled and unlabelled data to build predictive functions, which are defined on not just the seen inputs but the whole space. As a nice property, the proposed method allows effcient training and can easily handle new test points. We validate the method based on both toy data and real world data sets.
PDF PDF BibTeX

Empirical Inference Conference Paper Exponential Families for Conditional Random Fields Altun, Y., Smola, A., Hofmann, T. In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), 2-9, (Editors: Chickering, D.M. , J.Y. Halpern), Morgan Kaufmann, San Francisco, CA, USA, 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), July 2004
In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.
PDF Web BibTeX

Empirical Inference Technical Report Hilbertian Metrics and Positive Definite Kernels on Probability Measures Hein, M., Bousquet, O. (126), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004
We investigate the problem of defining Hilbertian metrics resp. positive definite kernels on probability measures, continuing previous work. This type of kernels has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsoe such that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second we investigate further our approach to incorporate similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and gives in some cases a more efficient way to compute them. Finally we compare all proposed kernels in two text and one image classification problem.
PDF BibTeX

Empirical Inference Technical Report Kernels, Associated Structures and Generalizations Hein, M., Bousquet, O. (127), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004
This paper gives a survey of results in the mathematical literature on positive definite kernels and their associated structures. We concentrate on properties which seem potentially relevant for Machine Learning and try to clarify some results that have been misused in the literature. Moreover we consider different lines of generalizations of positive definite kernels. Namely we deal with operator-valued kernels and present the general framework of Hilbertian subspaces of Schwartz which we use to introduce kernels which are distributions. Finally indefinite kernels and their associated reproducing kernel spaces are considered.
PDF BibTeX

Empirical Inference Technical Report Object categorization with SVM: kernels for local features Eichhorn, J., Chapelle, O. (137), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004
In this paper, we propose to combine an efficient image representation based on local descriptors with a Support Vector Machine classifier in order to perform object categorization. For this purpose, we apply kernels defined on sets of vectors. After testing different combinations of kernel / local descriptors, we have been able to identify a very performant one.
PDF BibTeX

Empirical Inference Talk Riemannian Geometry on Graphs and its Application to Ranking and Classification Zhou, D. June 2004
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data.
PDF BibTeX

Empirical Inference Proceedings Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference Thrun, S., Saul, L., Schölkopf, B. Proceedings of the Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), 1621, MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
The annual Neural Information Processing (NIPS) conference is the flagship meeting on neural computation. It draws a diverse group of attendees—physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning and control, emerging technologies, and applications. Only thirty percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains all the papers presented at the 2003 conference.
Web BibTeX

Empirical Inference Article Distance-Based Classification with Lipschitz Functions von Luxburg, U., Bousquet, O. Journal of Machine Learning Research, 5:669-695, June 2004
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. To analyze the resulting algorithm, we prove several representer theorems. They state that there always exist solutions of the Lipschitz classifier which can be expressed in terms of distance functions to training points. We provide generalization bounds for Lipschitz classifiers in terms of the Rademacher complexities of some Lipschitz function classes. The generality of our approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz classifier, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.
PDF PostScript PDF BibTeX

Empirical Inference Conference Paper Gaussian Processes in Reinforcement Learning Rasmussen, C., Kuss, M. In Advances in Neural Information Processing Systems 16, Advances in Neural Information Processing Systems 16, 751-759, (Editors: Thrun, S., L. K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.
PDF Web BibTeX

Empirical Inference Conference Paper Image Construction by Linear Programming Tsuda, K., Rätsch, G. In Advances in Neural Information Processing Systems 16, Advances in Neural Information Processing Systems 16, 57-64, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1-norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.
PDF Web BibTeX

Empirical Inference Conference Paper Insights from Machine Learning Applied to Human Visual Classification Graf, A., Wichmann, F. In Advances in Neural Information Processing Systems 16, Advances in Neural Information Processing Systems 16, 905-912, (Editors: Thrun, S., L. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and flowfield representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away.
PDF Web BibTeX

Empirical Inference Conference Paper Learning to Find Pre-Images Bakir, G., Weston, J., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 16, 449-456, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces.
PDF Web BibTeX

Empirical Inference Conference Paper Learning with Local and Global Consistency Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 16, 321-328, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.
PDF Web BibTeX

Empirical Inference Conference Paper Measure Based Regularization Bousquet, O., Chapelle, O., Hein, M. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 16, 1221-1228, (Editors: Thrun, S., L. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
We address in this paper the question of how the knowledge of the marginal distribution $P(x)$ can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.
PDF Web BibTeX

Empirical Inference Conference Paper PAC-Bayesian Generic Chaining Audibert, J., Bousquet, O. In Advances in Neural Information Processing Systems 16, Advances in Neural Information Processing Systems 16, 1125-1132 , (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
There exist many different generalization error bounds for classification. Each of these bounds contains an improvement over the others for certain situations. Our goal is to combine these different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester, which is interesting for averaging classifiers, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand. This combination is quite natural since the generic chaining is based on the notion of majorizing measures, which can be considered as priors on the set of classifiers, and such priors also arise in the PAC-bayesian setting.
PDF Web BibTeX

Empirical Inference Conference Paper Prediction on Spike Data Using Kernel Algorithms Eichhorn, J., Tolias, A., Zien, A., Kuss, M., Rasmussen, C., Weston, J., Logothetis, N., Schölkopf, B. In Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference, Advances in Neural Information Processing Systems 16, 1367-1374, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms.
PDF Web BibTeX

Empirical Inference Conference Paper Ranking on Data Manifolds Zhou, D., Weston, J., Gretton, A., Bousquet, O., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in neural information processing systems 16, 169-176, (Editors: S Thrun and L Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
The Google search engine has enjoyed a huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.
PDF Web BibTeX

Empirical Inference Conference Paper Semi-Supervised Protein Classification using Cluster Kernels Weston, J., Leslie, C., Zhou, D., Elisseeff, A., Noble, W. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 16, 595-602, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data --- examples with known 3D structures, organized into structural classes --- while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency.
PDF Web BibTeX

Empirical Inference Article Support Vector Channel Selection in BCI Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B. IEEE Transactions on Biomedical Engineering, 51(6):1003-1010, June 2004
Designing a Brain Computer Interface (BCI) system one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals we propose the usage of the state of the art feature selection algorithms Recursive Feature Elimination and Zero-Norm Optimization which are based on the training of Support Vector Machines (SVM). These algorithms can provide more accurate solutions than standard filter methods for feature selection. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.
DOI BibTeX

Empirical Inference Conference Paper Warped Gaussian Processes Snelson, E., Rasmussen, C., Ghahramani, Z. In Advances in Neural Information Processing Systems 16, Advances in Neural Information Processing Systems 16, 337-344, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.
PDF Web BibTeX

Empirical Inference Book Chapter Distributed Command Execution Stark, S., Berlin, M. In BSD Hacks: 100 industrial-strength tips & tools, 152-152, (Editors: Lavigne, Dru), O’Reilly, Beijing, May 2004
Often you want to execute a command not only on one computer, but on several at once. For example, you might want to report the current statistics on a group of managed servers or update all of your web servers at once.
BibTeX

Empirical Inference Conference Paper Kernel Hebbian Algorithm for single-frame super-resolution Kim, K., Franz, M., Schölkopf, B. In Statistical Learning in Computer Vision (SLCV 2004), Computer Vision - ECCV 2004, LNCS vol. 3024, 135-149, (Editors: A Leonardis and H Bischof), Springer, Berlin, Germany, 8th European Conference on Computer Vision (ECCV 2004), May 2004
This paper presents a method for single-frame image super-resolution using an unsupervised learning technique. The required prior knowledge about the high-resolution images is obtained from Kernel Principal Component Analysis (KPCA). The original form of KPCA, however, can be only applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the {em Kernel Hebbian Algorithm}. By kernelizing the Generalized Hebbian Algorithm, one can iteratively estimate the Kernel Principal Components with only linear order memory complexity. The resulting super-resolution algorithm shows a comparable performance to the existing supervised methods on images containing faces and natural scenes.
PDF Web BibTeX

Empirical Inference Article A Compression Approach to Support Vector Model Selection von Luxburg, U., Bousquet, O., Schölkopf, B. Journal of Machine Learning Research, 5:293-323, April 2004
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct "compression coefficients" for SVMs which measure the amount by which the training labels can be compressed by a code built from the separating hyperplane. The main idea is to relate the coding precision to geometrical concepts such as the width of the margin or the shape of the data in the feature space. The so derived compression coefficients combine well known quantities such as the radius-margin term R^2/rho^2, the eigenvalues of the kernel matrix, and the number of support vectors. To test whether they are useful in practice we ran model selection experiments on benchmark data sets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.
PDF BibTeX

Empirical Inference Article cDNA-Microarray Technology in Cartilage Research - Functional Genomics of Osteoarthritis [in German] Aigner, T., Finger, F., Zien, A., Bartnik, E. Zeitschrift f{\"u}r Orthop{\"a}die und ihre Grenzgebiete, 142(2):241-247, April 2004
Functional genomics represents a new challenging approach in order to analyze complex diseases such as osteoarthritis on a molecular level. The characterization of the molecular changes of the cartilage cells, the chondrocytes, enables a better understanding of the pathomechanisms of the disease. In particular, the identification and characterization of new target molecules for therapeutic intervention is of interest. Also, potential molecular markers for diagnosis and monitoring of osteoarthritis contribute to a more appropriate patient management. The DNA-microarray technology complements (but does not replace) biochemical and biological research in new disease-relevant genes. Large-scale functional genomics will identify molecular networks such as yet identified players in the anabolic-catabolic balance of articular cartilage as well as disease-relevant intracellular signaling cascades so far rather unknown in articular chondrocytes. However, at the moment it is also important to recognize the limitations of the microarray technology in order to avoid over-interpretation of the results. This might lead to misleading results and prevent to a significant extent a proper use of the potential of this technology in the field of osteoarthritis.
BibTeX

Empirical Inference Technical Report Kamerakalibrierung und Tiefenschätzung: Ein Vergleich von klassischer Bündelblockausgleichung und statistischen Lernalgorithmen Sinz, F. Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Tübingen, Germany, March 2004
Die Arbeit verleicht zwei Herangehensweisen an das Problem der Sch{\"a}tzung der r{\"a}umliche Position eines Punktes aus den Bildkoordinaten in zwei verschiedenen Kameras. Die klassische Methode der B{\"u}ndelblockausgleichung modelliert zwei Einzelkameras und sch{\"a}tzt deren {\"a}ußere und innere Orientierung mit einer iterativen Kalibrationsmethode, deren Konvergenz sehr stark von guten Startwerten abh{\"a}ngt. Die Tiefensch{\"a}tzung eines Punkts geschieht durch die Invertierung von drei der insgesamt vier Projektionsgleichungen der Einzalkameramodelle. Die zweite Methode benutzt Kernel Ridge Regression und Support Vector Regression, um direkt eine Abbildung von den Bild- auf die Raumkoordinaten zu lernen. Die Resultate zeigen, daß der Ansatz mit maschinellem Lernen, neben einer erheblichen Vereinfachung des Kalibrationsprozesses, zu h{\"o}heren Positionsgenaugikeiten f{\"u}hren kann.
PDF BibTeX

Empirical Inference Poster EEG Channel Selection for Brain Computer Interface Systems Based on Support Vector Methods Schröder, M., Lal, T., Bogdan, M., Schölkopf, B. 7:50, (Editors: Bülthoff, H.H., H.A. Mallot, R. Ulrich and F.A. Wichmann), 7th T{\"u}bingen Perception Conference (TWK 2004), February 2004
A Brain Computer Interface (BCI) system allows the direct interpretation of brain activity patterns (e.g. EEG signals) by a computer. Typical BCI applications comprise spelling aids or environmental control systems supporting paralyzed patients that have lost motor control completely. The design of an EEG based BCI system requires good answers for the problem of selecting useful features during the performance of a mental task as well as for the problem of classifying these features. For the special case of choosing appropriate EEG channels from several available channels, we propose the application of variants of the Support Vector Machine (SVM) for both problems. Although these algorithms do not rely on prior knowledge they can provide more accurate solutions than standard lter methods [1] for feature selection which usually incorporate prior knowledge about neural activity patterns during the performed mental tasks. For judging the importance of features we introduce a new relevance measure and apply it to EEG channels. Although we base the relevance measure for this purpose on the previously introduced algorithms, it does in general not depend on specic algorithms but can be derived using arbitrary combinations of feature selectors and classifiers.
Web BibTeX