Publications

Empirical Inference Technical Report Approximate Inference for Robust Gaussian Process Regression Kuss, M., Pfingsten, T., Csato, L., Rasmussen, C. (136), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005
Gaussian process (GP) priors have been successfully used in non-parametric Bayesian regression and classification models. Inference can be performed analytically only for the regression model with Gaussian noise. For all other likelihood models inference is intractable and various approximation techniques have been proposed. In recent years expectation-propagation (EP) has been developed as a general method for approximate inference. This article provides a general summary of how expectation-propagation can be used for approximate inference in Gaussian process models. Furthermore we present a case study describing its implementation for a new robust variant of Gaussian process regression. To gain further insights into the quality of the EP approximation we present experiments in which we compare to results obtained by Markov chain Monte Carlo (MCMC) sampling.
PDF BibTeX
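The analytically tractable case the abstract contrasts against, GP regression with Gaussian noise, can be written in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the squared-exponential kernel and the hyperparameter defaults are assumptions:

```python
import numpy as np

def gp_regression(X, y, X_star, lengthscale=1.0, signal_var=1.0, noise_var=0.1):
    """Exact GP regression with a squared-exponential kernel and Gaussian noise."""
    def kernel(A, B):
        # Squared-exponential (RBF) covariance between the rows of A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

    K = kernel(X, X) + noise_var * np.eye(len(X))    # noisy training covariance
    K_s = kernel(X, X_star)                          # train/test covariance
    L = np.linalg.cholesky(K)                        # stable inversion via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha                             # predictive mean
    v = np.linalg.solve(L, K_s)
    var = signal_var - (v * v).sum(0)                # predictive (latent) variance
    return mean, var
```

For any non-Gaussian likelihood, the posterior no longer has this closed form, which is where the EP approximation of the paper comes in.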

Autonomous Motion Empirical Inference Conference Paper Comparative experiments on task space control with redundancy resolution Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005, clmc
Understanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundancy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, even though, from a theoretical standpoint, it lacks the elegance of some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.
DOI URL BibTeX
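The velocity-level schemes compared in the paper share the classical pseudoinverse-plus-null-space form. A minimal sketch of that form (the function name and example null-space term are illustrative assumptions, not the paper's specific algorithm):

```python
import numpy as np

def resolved_velocity(J, xdot_des, qdot_null=None):
    """Velocity-level redundancy resolution:
    qdot = J^+ xdot_des + (I - J^+ J) qdot_null,
    where the second term moves the joints without affecting the task."""
    J_pinv = np.linalg.pinv(J)
    qdot = J_pinv @ xdot_des
    if qdot_null is not None:
        N = np.eye(J.shape[1]) - J_pinv @ J   # null-space projector of J
        qdot = qdot + N @ qdot_null
    return qdot
```

Acceleration- and torque-level schemes follow the same pattern with the Jacobian derivative and inertia-weighted pseudoinverses, respectively.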

Empirical Inference Conference Paper From Graphs to Manifolds - Weak and Strong Pointwise Consistency of Graph Laplacians Hein, M., Audibert, J., von Luxburg, U. In Proceedings of the 18th Conference on Learning Theory (COLT), 470-485, Conference on Learning Theory, 2005, Student Paper Award
In the machine learning community it is generally believed that graph Laplacians corresponding to a finite sample of data points converge to a continuous Laplace operator if the sample size increases. Even though this assertion serves as a justification for many Laplacian-based algorithms, so far only some aspects of this claim have been rigorously proved. In this paper we close this gap by establishing the strong pointwise consistency of a family of graph Laplacians with data-dependent weights to some weighted Laplace operator. Our investigation also includes the important case where the data lies on a submanifold of $R^d$.
PDF BibTeX
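A graph Laplacian of the kind studied in the paper can be built directly from a finite sample. A sketch with Gaussian edge weights (the bandwidth choice is an assumption; the paper's data-dependent weights are more general):

```python
import numpy as np

def graph_laplacian(X, sigma=1.0, normalized=True):
    """Gaussian-weighted graph Laplacian of a point cloud X (n x d)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)            # no self-loops
    D = W.sum(1)                        # degrees
    if normalized:
        Dinv = 1.0 / np.sqrt(D)
        return np.eye(len(X)) - (Dinv[:, None] * W) * Dinv[None, :]
    return np.diag(D) - W               # unnormalized Laplacian D - W
```

The consistency question of the paper asks how the eigenstructure of such matrices behaves as n grows and sigma shrinks.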

Empirical Inference Poster Global image statistics of natural scenes Drewes, J., Wichmann, F., Gegenfurtner, K. Bioinspired Information Processing, 08:1, 2005 BibTeX

Empirical Inference Conference Paper Healing the Relevance Vector Machine through Augmentation Rasmussen, C., Candela, J. In Proceedings of the 22nd International Conference on Machine Learning, 689 , (Editors: De Raedt, L. , S. Wrobel), ICML, 2005
The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property that they get smaller the further you move away from the training cases. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions.
PDF PostScript BibTeX

Empirical Inference Conference Paper Implicit Surface Modelling as an Eigenvalue Problem Walder, C., Chapelle, O., Schölkopf, B. In Proceedings of the 22nd International Conference on Machine Learning, 937-944, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005
We discuss the problem of fitting an implicit shape model to a set of points sampled from a co-dimension one manifold of arbitrary topology. The method solves a non-convex optimisation problem in the embedding function that defines the implicit by way of its zero level set. By assuming that the solution is a mixture of radial basis functions of varying widths we attain the globally optimal solution by way of an equivalent eigenvalue problem, without using or constructing as an intermediate step the normal vectors of the manifold at each data point. We demonstrate the system on two and three dimensional data, with examples of missing data interpolation and set operations on the resultant shapes.
PDF BibTeX

Empirical Inference Conference Paper Intrinsic Dimensionality Estimation of Submanifolds in Euclidean space Hein, M., Audibert, Y. In Proceedings of the 22nd International Conference on Machine Learning, 289 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005
We present a new method to estimate the intrinsic dimensionality of a submanifold M in Euclidean space from random samples. The method is based on the convergence rates of a certain U-statistic on the manifold. We also at least partially resolve the question of how to choose the scale of the data. Moreover, the proposed method is easy to implement, can handle large data sets and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.
PDF BibTeX
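The paper's U-statistic estimator is not spelled out in the abstract, but the underlying idea, reading the dimension off how pairwise-distance counts scale with the radius, can be sketched with a simple Grassberger-Procaccia-style estimate. This is an illustration of the general principle, not the paper's method; the two scales are assumptions:

```python
import numpy as np

def correlation_dimension(X, r1, r2):
    """Estimate intrinsic dimension from the scaling of the fraction of
    point pairs closer than r: C(r) ~ r^dim, so dim ~ log(C2/C1)/log(r2/r1)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    d = d[np.triu_indices(len(X), 1)]   # distinct pairs only
    C1 = (d < r1).mean()
    C2 = (d < r2).mean()
    return np.log(C2 / C1) / np.log(r2 / r1)
```

On samples from a one-dimensional curve embedded in R^3, the estimate comes out near 1 even though the ambient dimension is 3.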

Empirical Inference Article Invariance of Neighborhood Relation under Input Space to Feature Space Mapping Shin, H., Cho, S. Pattern Recognition Letters, 26(6):707-718, 2005
If the training pattern set is large, training a support vector machine (SVM) requires a large amount of memory and a long time. Recently, we proposed the neighborhood property based pattern selection algorithm (NPPS), which selects only the patterns that are likely to be near the decision boundary ahead of SVM training [Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence (LNAI 2637), Seoul, Korea, pp. 376-387]. NPPS tries to identify those patterns that are likely to become support vectors in feature space. Preliminary reports show its effectiveness: SVM training time was reduced by two orders of magnitude with almost no loss in accuracy for various datasets. It has to be noted, however, that the decision boundary of SVM and the support vectors are all defined in feature space, while NPPS as described above operates in input space. If the neighborhood relation in input space is not preserved in feature space, NPPS may not always be effective. In this paper, we show that the neighborhood relation is invariant under the input to feature space mapping. The result assures that the patterns selected by NPPS in input space are likely to be located near the decision boundary in feature space.
PDF BibTeX

Empirical Inference Conference Paper Joint Kernel Maps Weston, J., Schölkopf, B., Bousquet, O. In Proceedings of the 8th International Work-Conference on Artificial Neural Networks (Computational Intelligence and Bioinspired Systems), LNCS 3512:176-191, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005
We develop a methodology for solving high dimensional dependency estimation problems between pairs of data types, which is viable in the case where the output of interest has very high dimension, e.g., thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces, using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed form) pre-images. We provide examples of such kernels and empirical results.
PostScript DOI BibTeX

Empirical Inference Poster Kernel-Methods, Similarity, and Exemplar Theories of Categorization Jäkel, F., Wichmann, F. ASIC, 4, 2005
Kernel-methods are popular tools in machine learning and statistics that can be implemented in a simple feed-forward neural network. They have strong connections to several psychological theories. For example, Shepard's universal law of generalization can be given a kernel interpretation. This leads to an inner product and a metric on the psychological space that is different from the usual Minkowski norm. The metric has psychologically interesting properties: It is bounded from above and does not have additive segments. As categorization models often rely on Shepard's law as a model for psychological similarity some of them can be recast as kernel-methods. In particular, ALCOVE is shown to be closely related to kernel logistic regression. The relationship to the Generalized Context Model is also discussed. It is argued that functional analysis which is routinely used in machine learning provides valuable insights also for psychology.
Web BibTeX

Empirical Inference Conference Paper Large Scale Genomic Sequence SVM Classifiers Sonnenburg, S., Rätsch, G., Schölkopf, B. In Proceedings of the 22nd International Conference on Machine Learning, 849-856, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005
In genomic sequence analysis tasks like splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed to achieve sufficiently high classification performance. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree kernel (WD). In particular, we suggest several extensions using Suffix Trees and modifications of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the spectrum kernel and WD kernel, large scale SVM training can be accelerated by factors of 20 and 4, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of Support Vectors). Our method allows us to train on sets as large as one million sequences.
PDF BibTeX
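The spectrum kernel itself is simple to state: it is the inner product of the k-mer count vectors of two sequences. A naive reference version, without the suffix-tree speedups the paper develops:

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """Spectrum kernel: inner product of k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs)
```

The paper's contribution is making kernels of this family fast enough to train on around a million sequences, which this direct counting version would not scale to.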

Empirical Inference Conference Paper Long Term Prediction of Product Quality in a Glass Manufacturing Process Using a Kernel Based Approach Jung, T., Herrera, L., Schölkopf, B. In Proceedings of the 8th International Work-Conference on Artificial Neural Networks (Computational Intelligence and Bioinspired Systems), LNCS 3512:960-967, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005
In this paper we report the results obtained using a kernel-based approach to predict the temporal development of four response signals in the process control of a glass melting tank with 16 input parameters. The data set is a revised version from the modelling challenge in EUNITE-2003. The central difficulties are: large time-delays between changes in the inputs and the outputs, a large number of data, and a general lack of knowledge about the relevant variables that intervene in the process. The methodology proposed here comprises Support Vector Machines (SVM) and Regularization Networks (RN). We use the idea of sparse approximation both as a means of regularization and as a means of reducing the computational complexity. Furthermore, we use an incremental approach to add new training examples to the kernel-based method and efficiently update the current solution. This allows us to use a sophisticated learning scheme, where we iterate between prediction and training, with good computational efficiency and satisfactory results.
DOI BibTeX

Empirical Inference Technical Report Maximum-Margin Feature Combination for Detection and Categorization BakIr, G., Wu, M., Eichhorn, J. Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005
In this paper we are concerned with the optimal combination of features of possibly different types for detection and estimation tasks in machine vision. We propose to combine features such that the resulting classifier maximizes the margin between classes. In contrast to existing approaches which are non-convex and/or generative, we propose to use a discriminative model, leading to a convex problem formulation and complexity control. Furthermore, we assert that decision functions should not compare apples and oranges by comparing features of different types directly. Instead, we propose to combine different similarity measures for each different feature type. Finally, we argue that the question "Which feature type is more discriminative for task X?" is ill-posed, and show empirically that the answer to this question might depend on the complexity of the decision function.
PDF BibTeX

Empirical Inference Article Moment Inequalities for Functions of Independent Random Variables Boucheron, S., Bousquet, O., Lugosi, G., Massart, P. Annals of Probability, 33:514-560, 2005
A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method which has been used to derive concentration inequalities for such functions, and is based on a generalized tensorization inequality due to Latała and Oleszkiewicz. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes, and moment inequalities for Rademacher chaos and $U$-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrand's exponential inequality for Rademacher chaos of order two to any order. We also discuss applications for other complex functions of independent random variables, such as suprema of Boolean polynomials which include, as special cases, subgraph counting problems in random graphs.
PDF BibTeX

Autonomous Motion Empirical Inference Conference Paper Natural Actor-Critic Peters, J., Vijayakumar, S., Schaal, S. In Proceedings of the 16th European Conference on Machine Learning, 3720:280-291, (Editors: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L.), Springer, ECML, 2005, clmc
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
DOI URL BibTeX
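The core update behind the Natural Actor-Critic is the natural gradient: the ordinary policy gradient premultiplied by the inverse Fisher information matrix. A generic sketch of that one step (the actor-critic machinery around it, and the estimation of the Fisher matrix, are omitted):

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1):
    """One natural-gradient ascent step: theta + lr * F^{-1} grad.
    Unlike the vanilla gradient, this step is (to first order) invariant
    to how the policy is parameterized."""
    return theta + lr * np.linalg.solve(fisher, grad)
```

The paper's point is that the critic's compatible function approximation yields this F^{-1} grad direction directly, without forming F explicitly.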

Empirical Inference Conference Paper Object correspondence as a machine learning problem Schölkopf, B., Steinke, F., Blanz, V. In Proceedings of the 22nd International Conference on Machine Learning, 777-784, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005
We propose machine learning methods for the estimation of deformation fields that transform two given objects into each other, thereby establishing a dense point to point correspondence. The fields are computed using a modified support vector machine containing a penalty enforcing that points of one object will be mapped to "similar" points on the other one. Our system, which contains little engineering or domain knowledge, delivers state of the art performance. We present application results including close to photorealistic morphs of 3D head models.
PDF BibTeX

Empirical Inference Conference Paper Propagating Distributions on a Hypergraph by Dual Information Regularization Tsuda, K. In Proceedings of the 22nd International Conference on Machine Learning, 921 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005
In the information regularization framework by Corduneanu and Jaakkola (2005), the distributions of labels are propagated on a hypergraph for semi-supervised learning. The learning is efficiently done by a Blahut-Arimoto-like two step algorithm, but, unfortunately, one of the steps cannot be solved in a closed form. In this paper, we propose a dual version of information regularization, which is considered as more natural in terms of information geometry. Our learning algorithm has two steps, each of which can be solved in a closed form. Also it can be naturally applied to exponential family distributions such as Gaussians. In experiments, our algorithm is applied to protein classification based on a metabolic network and known functional categories.
BibTeX

Empirical Inference Poster Rapid animal detection in natural scenes: critical features are local Wichmann, F., Rosas, P., Gegenfurtner, K. Experimentelle Psychologie. Beiträge zur 47. Tagung experimentell arbeitender Psychologen, 47:225, 2005 BibTeX

Empirical Inference Article Robust EEG Channel Selection Across Subjects for Brain Computer Interfaces Schröder, M., Lal, T., Hinterberger, T., Bogdan, M., Hill, J., Birbaumer, N., Rosenstiel, W., Schölkopf, B. EURASIP Journal on Applied Signal Processing, 2005(19, Special Issue: Trends in Brain Computer Interfaces):3103-3112, (Editors: Vesin, J. M., T. Ebrahimi), 2005
Most EEG-based Brain Computer Interface (BCI) paradigms come along with specific electrode positions, e.g. for a visual based BCI electrode positions close to the primary visual cortex are used. For new BCI paradigms it is usually not known where task relevant activity can be measured from the scalp. For individual subjects, Lal et al. showed that recording positions can be found without the use of prior knowledge about the paradigm used. However it remains unclear to what extent their method of Recursive Channel Elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were consistently selected whereas task-irrelevant channels were reliably disregarded.
Web DOI BibTeX

Empirical Inference Book Chapter Support Vector Machines and Kernel Algorithms Schölkopf, B., Smola, A. In Encyclopedia of Biostatistics (2nd edition), 8:5328-5335, (Editors: P Armitage and T Colton), John Wiley & Sons, NY USA, 2005 BibTeX

Empirical Inference Poster The human brain as large margin classifier Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. Proceedings of the Computational & Systems Neuroscience Meeting (COSYNE), 2:1, 2005 BibTeX

Empirical Inference Article Theory of Classification: A Survey of Some Recent Advances Boucheron, S., Bousquet, O., Lugosi, G. ESAIM: Probability and Statistics, 9:323 , 2005
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these important recent developments.
PDF DOI BibTeX

Empirical Inference Technical Report Towards a Statistical Theory of Clustering von Luxburg, U., Ben-David, S. Presented at the PASCAL workshop on clustering, London, 2005
The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered has been sampled from some unknown probability distribution. Firstly, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Secondly, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular we argue that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that the main replacements of generalization bounds should be convergence proofs and stability considerations. This paper should be considered as a road map paper which identifies important questions and potentially fruitful directions for future research about statistical clustering. We do not attempt to present a complete statistical theory of clustering.
PDF BibTeX

Empirical Inference Book Chapter Visual perception I: Basic principles Wagemans, J., Wichmann, F., de Beeck, H. In Handbook of Cognition, 3-47, (Editors: Lamberts, K. , R. Goldstone), Sage, London, 2005 BibTeX

Empirical Inference Conference Paper Attentional Modulation of Auditory Event-Related Potentials in a Brain-Computer Interface Hill, J., Lal, T., Bierig, K., Birbaumer, N., Schölkopf, B. In Proceedings of the 2004 IEEE International Workshop on Biomedical Circuits and Systems (BioCAS04), S3/5/INV-S3/17-20, IEEE Computer Society, Los Alamitos, CA, USA, December 2004
Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user's EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.
PDF Web DOI BibTeX

Empirical Inference Article On the representation, learning and transfer of spatio-temporal movement characteristics Ilg, W., Bakir, G., Mezger, J., Giese, M. International Journal of Humanoid Robotics, 1(4):613-636, December 2004 BibTeX

Empirical Inference Article Efficient face detection by a cascaded support-vector machine expansion Romdhani, S., Torr, P., Schölkopf, B., Blake, A. Proceedings of The Royal Society of London A, 460(2501):3283-3297, November 2004
We describe a fast system for the detection and localization of human faces in images using a nonlinear "support-vector machine". We approximate the decision surface in terms of a reduced set of expansion vectors and propose a cascaded evaluation which has the property that the full support-vector expansion is only evaluated on the face-like parts of the image, while the largest part of typical images is classified using a single expansion vector (a simpler and more efficient classifier). As a result, only three reduced-set vectors are used, on average, to classify an image patch. Hence, the cascaded evaluation, presented in this paper, offers a thirtyfold speed-up over an evaluation using the full set of reduced-set vectors, which is itself already thirty times faster than classification using all the support vectors.
PDF DOI BibTeX

Empirical Inference Article Insect-inspired estimation of egomotion Franz, M., Chahl, J., Krapp, H. Neural Computation, 16(11):2245-2260, November 2004
Tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during egomotion. In this study, we examine whether a simplified linear model based on the organization principles in tangential neurons can be used to estimate egomotion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and egomotion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates are of reasonable quality, albeit less reliable.
PDF PostScript Web DOI BibTeX
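The estimator described, a linear combination of optic flow vectors incorporating priors about distances, sensor noise and egomotion, has the general shape of a regularized linear inverse. A sketch under the assumption of a linear flow model flow = A @ egomotion + noise (the names, the Gaussian-prior form, and the default values are assumptions, not the paper's exact construction):

```python
import numpy as np

def linear_egomotion_estimator(A, noise_var=1e-2, prior_prec=None):
    """MMSE-style linear estimator W with egomotion_hat = W @ flow,
    assuming flow = A @ egomotion + Gaussian noise, and a Gaussian prior
    on egomotion with precision matrix prior_prec."""
    if prior_prec is None:
        prior_prec = np.eye(A.shape[1])
    # Regularized least squares: W = (A^T A + noise_var * prior_prec)^{-1} A^T
    return np.linalg.solve(A.T @ A + noise_var * prior_prec, A.T)
```

As in the paper, the estimate is just one matrix-vector product at run time, which is what makes the biologically inspired linear model attractive.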

Empirical Inference Technical Report Joint Kernel Maps Weston, J., Schölkopf, B., Bousquet, O., Mann, .., Noble, W. (131), Max-Planck-Institute for Biological Cybernetics, Tübingen, November 2004 PDF BibTeX

Empirical Inference Talk Discrete vs. Continuous: Two Sides of Machine Learning Zhou, D. October 2004
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data. This talk is mainly based on the following contributions: (1) D. Zhou and B. Schölkopf: Transductive Inference with Graphs, MPI Technical report, August, 2004; (2) D. Zhou, B. Schölkopf and T. Hofmann. Semi-supervised Learning on Directed Graphs. NIPS 2004; (3) D. Zhou, O. Bousquet, T.N. Lal, J. Weston and B. Schölkopf. Learning with Local and Global Consistency. NIPS 2003.
PDF BibTeX
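The "Learning with Local and Global Consistency" algorithm cited at the end of the abstract has a very compact form: labels are spread over a normalized graph by a fixed-point iteration. A sketch (the example graph weights and the value of alpha are assumptions):

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9, iters=100):
    """Label spreading: F <- alpha * S F + (1 - alpha) * Y,
    with S = D^{-1/2} W D^{-1/2} the symmetrically normalized affinity."""
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y  # spread, then re-anchor to seeds
    return F.argmax(1)                        # predicted class per node
```

Ranking data from examples, the by-product mentioned in the abstract, uses the same iteration with a single labeled seed and reads the scores F off directly instead of taking the argmax.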

Empirical Inference Poster S-cones contribute to flicker brightness in human vision Wehrhahn, C., Hill, N., Dillenburger, B. 34(174.12), 34th Annual Meeting of the Society for Neuroscience (Neuroscience 2004), October 2004
In the retina of primates three cone types sensitive to short, middle and long wavelengths of light convert photons into electrical signals. Many investigators have presented evidence that, in color normal observers, the signals of cones sensitive to short wavelengths of light (S-cones) do not contribute to the perception of brightness of a colored surface when this is alternated with an achromatic reference (flicker brightness). Other studies indicate that humans do use S-cone signals when performing this task. Common to all these studies is the small number of observers whose performance data are reported. Considerable variability in the occurrence of cone types across observers has been found, but, to our knowledge, no cone counts exist from larger populations of humans. We reinvestigated how much the S-cones contribute to flicker brightness. 76 color normal observers were tested in a simple psychophysical procedure neutral to the cone type occurrence (Teufel & Wehrhahn (2000), JOSA A 17: 994 - 1006). The data show that, in the majority of our observers, S-cones provide input with a negative sign - relative to L- and M-cone contribution - in the task in question. There is indeed considerable between-subject variability such that for 20 out of 76 observers the magnitude of this input does not differ significantly from 0. Finally, we argue that the sign of an observer's S-cone contribution to flicker brightness perception cannot be used to infer the relative sign of their contributions to the neuronal signals carrying the information leading to the perception of flicker brightness. We conclude that studies which use only a small number of observers may easily fail to detect the small but significant population tendency for the S-cones to contribute to flicker brightness. Our results confirm all earlier results and reconcile their contradictory interpretations.
Web BibTeX

Empirical Inference Proceedings Advanced Lectures on Machine Learning Bousquet, O., von Luxburg, U., Rätsch, G. ML Summer Schools 2003, LNAI 3176:240, Springer, Berlin, Germany, ML Summer Schools, September 2004
Machine Learning has become a key enabling technology for many engineering applications, investigating scientific questions and theoretical problems alike. To stimulate discussions and to disseminate new results, a summer school series was started in February 2002, the documentation of which is published as LNAI 2600. This book presents revised lectures of two subsequent summer schools held in 2003 in Canberra, Australia, and in Tübingen, Germany. The tutorial lectures included are devoted to statistical learning theory, unsupervised learning, Bayesian inference, and applications in pattern recognition; they provide in-depth overviews of exciting new developments and contain a large number of references. Graduate students, lecturers, researchers and professionals alike will find this book a useful resource in learning and teaching machine learning.
Web BibTeX

Empirical Inference Talk Grundlagen von Support Vector Maschinen und Anwendungen in der Bildverarbeitung Eichhorn, J. September 2004
Invited talk at the workshop "Numerical, Statistical and Discrete Methods in Image Processing" at the TU München (in GERMAN)
PDF BibTeX

Empirical Inference Conference Paper Learning Depth From Stereo Sinz, F., Candela, J., Bakır, G., Rasmussen, C., Franz, M. In Pattern Recognition: Proceedings of the 26th DAGM Symposium, 245-252, LNCS 3175, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004
We compare two approaches to the problem of estimating the depth of a point in space from observing its image position in two different cameras: (1) the classical photogrammetric approach explicitly models the two cameras and estimates their intrinsic and extrinsic parameters using a tedious calibration procedure; (2) a generic machine learning approach directly approximates the mapping from image to spatial coordinates by Gaussian process regression. Our results show that the generic learning approach, in addition to simplifying the calibration procedure, can lead to higher depth accuracies than classical calibration, although no specific domain knowledge is used.
PDF PostScript Web BibTeX
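The generic learning approach in this paper amounts to ordinary Gaussian process regression from image coordinates to spatial coordinates. A minimal sketch of the posterior-mean computation with a squared-exponential kernel; the toy 2-D inputs, length-scale, and noise level are hypothetical stand-ins for the paper's stereo data, not its implementation:

```python
import numpy as np

def gp_posterior_mean(X_train, y_train, X_test, lengthscale=1.0, noise=1e-6):
    """GP regression posterior mean: m(x*) = k(x*, X) (K + noise*I)^{-1} y."""
    def k(A, B):  # squared-exponential (RBF) kernel matrix
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * sq / lengthscale ** 2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return k(X_test, X_train) @ alpha

# Toy stand-in for "image coordinates -> depth": four 2-D points whose
# target is simply the coordinate sum.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X.sum(axis=1)
fit = gp_posterior_mean(X, y, X)      # nearly interpolates the training data
mid = gp_posterior_mean(X, y, np.array([[0.5, 0.5]]))
```

With a small noise term the posterior mean nearly interpolates the training targets, which is the behaviour exploited when the calibration mapping is learned directly from data.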

Empirical Inference Conference Paper Modelling Spikes with Mixtures of Factor Analysers Görür, D., Rasmussen, C., Tolias, A., Sinz, F., Logothetis, N. In Pattern Recognition: Proceedings of the 26th DAGM Symposium, 391-398, LNCS 3175, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004
Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging problem. We consider the spike sorting problem using a generative model, mixtures of factor analysers, which concurrently performs clustering and feature extraction. The most important advantage of this method is that it quantifies the certainty with which the spikes are classified. This can be used as a means for evaluating the quality of clustering and therefore spike isolation. Using this method, nearly simultaneously occurring spikes can also be modelled, which is a hard task for many spike sorting methods. Furthermore, modelling the data with a generative model allows us to generate simulated data.
PDF DOI BibTeX

Empirical Inference Book Kernel Methods in Computational Biology Schölkopf, B., Tsuda, K., Vert, J. 410, Computational Molecular Biology, MIT Press, Cambridge, MA, USA, August 2004
Modern machine learning techniques are proving to be extremely valuable for the analysis of data in computational biology problems. One branch of machine learning, kernel methods, lends itself particularly well to the difficult aspects of biological data, which include high dimensionality (as in microarray measurements), representation as discrete and structured data (as in DNA or amino acid sequences), and the need to combine heterogeneous sources of information. This book provides a detailed overview of current research in kernel methods and their applications to computational biology. Following three introductory chapters—an introduction to molecular and computational biology, a short review of kernel methods that focuses on intuitive concepts rather than technical details, and a detailed survey of recent applications of kernel methods in computational biology—the book is divided into three sections that reflect three general trends in current research. The first part presents different ideas for the design of kernel functions specifically adapted to various biological data; the second part covers different approaches to learning from heterogeneous data; and the third part offers examples of successful applications of support vector machine methods.
Web BibTeX

Empirical Inference Article Learning kernels from biological networks by maximizing entropy Tsuda, K., Noble, W. Bioinformatics, 20(Suppl. 1):i326-i333, August 2004
Motivation: The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based on the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks. Results: We show that computing the diffusion kernel is equivalent to maximizing the von Neumann entropy, subject to a global constraint on the sum of the Euclidean distances between nodes. This global constraint allows for high variance in the pairwise distances. Accordingly, we propose an alternative, locally constrained diffusion kernel, and we demonstrate that the resulting kernel allows for more accurate support vector machine prediction of protein functional classifications from metabolic and protein–protein interaction networks.
PDF Web BibTeX
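The diffusion kernel discussed in this abstract is the matrix exponential of the negated graph Laplacian. A minimal NumPy sketch, computed via the eigendecomposition of the symmetric Laplacian; the 4-node path graph and the bandwidth beta are hypothetical choices, not from the paper:

```python
import numpy as np

def diffusion_kernel(adjacency, beta=1.0):
    """Diffusion kernel K = exp(-beta * L) for the graph Laplacian L = D - A.

    Uses the eigendecomposition of the symmetric Laplacian, which is
    equivalent to the matrix exponential but needs only plain NumPy."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    w, V = np.linalg.eigh(L)                # L = V diag(w) V^T
    return (V * np.exp(-beta * w)) @ V.T    # exp(-beta * L)

# Toy 4-node path graph a--b--c--d.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
K = diffusion_kernel(A, beta=0.5)
```

The resulting matrix is symmetric positive definite, and nearby nodes (summing many short weighted paths) receive larger entries than distant ones.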

Empirical Inference Conference Paper Learning to Find Graph Pre-Images Bakır, G., Zien, A., Tsuda, K. In Pattern Recognition: Proceedings of the 26th DAGM Symposium, 253-261, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, August 2004
The recent development of graph kernel functions has made it possible to apply well-established machine learning methods to graphs. However, to allow for analyses that yield a graph as a result, it is necessary to solve the so-called pre-image problem: to reconstruct a graph from its feature space representation induced by the kernel. Here, we suggest a practical solution to this problem.
PostScript PDF DOI BibTeX

Empirical Inference Article Masking effect produced by Mach bands on the detection of narrow bars of random polarity Henning, G., Hoddinott, K., Wilson-Smith, Z., Hill, N. Journal of the Optical Society of America A, 21(8):1379-1387, August 2004 BibTeX

Empirical Inference Proceedings Pattern Recognition: 26th DAGM Symposium, LNCS Vol. 3175 Rasmussen, C., Bülthoff, H., Giese, M., Schölkopf, B. Proceedings of the 26th Pattern Recognition Symposium (DAGM'04), 581, Springer, Berlin, Germany, August 2004 Web DOI BibTeX

Empirical Inference Technical Report Semi-Supervised Induction Yu, K., Tresp, V., Zhou, D. (141), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, August 2004
Considerable progress has recently been made on semi-supervised learning, which differs from traditional supervised learning by additionally exploiting the information in unlabelled examples. However, a disadvantage of many existing methods is that they do not generalize to unseen inputs. This paper investigates learning methods that effectively make use of both labelled and unlabelled data to build predictive functions, which are defined not just on the seen inputs but on the whole space. As a nice property, the proposed method allows efficient training and can easily handle new test points. We validate the method on both toy data and real-world data sets.
PDF BibTeX

Empirical Inference Conference Paper Exponential Families for Conditional Random Fields Altun, Y., Smola, A., Hofmann, T. In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), 2-9, (Editors: Chickering, D.M. , J.Y. Halpern), Morgan Kaufmann, San Francisco, CA, USA, 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), July 2004
In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.
PDF Web BibTeX

Empirical Inference Technical Report Hilbertian Metrics and Positive Definite Kernels on Probability Measures Hein, M., Bousquet, O. (126), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004
We investigate the problem of defining Hilbertian metrics and, correspondingly, positive definite kernels on probability measures, continuing previous work. This type of kernel has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsøe so that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to perform model selection among these metrics in an elegant and unified way. Second, we further investigate our approach to incorporating similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and in some cases gives a more efficient way to compute them. Finally, we compare all proposed kernels on two text classification problems and one image classification problem.
PDF BibTeX

Empirical Inference Technical Report Kernels, Associated Structures and Generalizations Hein, M., Bousquet, O. (127), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004
This paper gives a survey of results in the mathematical literature on positive definite kernels and their associated structures. We concentrate on properties which seem potentially relevant for machine learning and try to clarify some results that have been misused in the literature. Moreover, we consider different lines of generalization of positive definite kernels. Namely, we deal with operator-valued kernels and present Schwartz's general framework of Hilbertian subspaces, which we use to introduce kernels which are distributions. Finally, indefinite kernels and their associated reproducing kernel spaces are considered.
PDF BibTeX

Empirical Inference Technical Report Object categorization with SVM: kernels for local features Eichhorn, J., Chapelle, O. (137), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004
In this paper, we propose to combine an efficient image representation based on local descriptors with a support vector machine classifier in order to perform object categorization. For this purpose, we apply kernels defined on sets of vectors. After testing different combinations of kernels and local descriptors, we were able to identify one that performs particularly well.
PDF BibTeX

Empirical Inference Talk Riemannian Geometry on Graphs and its Application to Ranking and Classification Zhou, D. June 2004
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data, so transductive inference is significant in many practical settings. According to Vapnik's point of view, one should predict function values only at the given points directly, rather than learning a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data, from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples, including images, documents, proteins, and many other kinds of data.
PDF BibTeX
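The ranking by-product mentioned in the abstract can be illustrated with the standard closed form for propagating a query score over a graph through the symmetrically normalized adjacency. A sketch under the assumption of a simple undirected graph; the 4-node path and the damping factor alpha are made up for illustration:

```python
import numpy as np

def graph_ranking(adjacency, seed_idx, alpha=0.9):
    """Rank nodes by relevance to a seed example: f = (I - alpha*S)^{-1} y,
    where S = D^{-1/2} A D^{-1/2} and y is an indicator of the seed node."""
    A = np.asarray(adjacency, dtype=float)
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))      # symmetrically normalized adjacency
    y = np.zeros(len(A))
    y[seed_idx] = 1.0
    return np.linalg.solve(np.eye(len(A)) - alpha * S, y)

# 4-node path graph a--b--c--d; rank all nodes with respect to the first.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = graph_ranking(A, seed_idx=0)
```

Scores decay smoothly with graph distance from the seed, which is what makes the same regularizer usable for ranking images, documents, or proteins by example.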

Empirical Inference Proceedings Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference Thrun, S., Saul, L., Schölkopf, B. 1621, MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS 2003), June 2004
The annual Neural Information Processing (NIPS) conference is the flagship meeting on neural computation. It draws a diverse group of attendees—physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning and control, emerging technologies, and applications. Only thirty percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains all the papers presented at the 2003 conference.
Web BibTeX

Empirical Inference Article Distance-Based Classification with Lipschitz Functions von Luxburg, U., Bousquet, O. Journal of Machine Learning Research, 5:669-695, June 2004
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. To analyze the resulting algorithm, we prove several representer theorems. They state that there always exist solutions of the Lipschitz classifier which can be expressed in terms of distance functions to training points. We provide generalization bounds for Lipschitz classifiers in terms of the Rademacher complexities of some Lipschitz function classes. The generality of our approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz classifier, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.
PDF PostScript BibTeX
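As the abstract notes, the 1-nearest-neighbor classifier is a special case of the Lipschitz classifier. A minimal sketch of that special case: the decision function f(x) = d(x, X_neg) - d(x, X_pos) is Lipschitz with respect to the metric (Euclidean here), and its magnitude plays the role of a margin. The toy cluster points are hypothetical:

```python
import numpy as np

def nn_lipschitz_decision(X_pos, X_neg):
    """1-NN rule as a Lipschitz decision function:
    f(x) = d(x, X_neg) - d(x, X_pos); sign(f(x)) is the predicted label,
    and f is 2-Lipschitz with respect to the Euclidean metric."""
    X_pos = np.asarray(X_pos, dtype=float)
    X_neg = np.asarray(X_neg, dtype=float)
    def f(x):
        d_pos = np.min(np.linalg.norm(X_pos - x, axis=1))
        d_neg = np.min(np.linalg.norm(X_neg - x, axis=1))
        return d_neg - d_pos
    return f

# Two hypothetical clusters on the x-axis.
f = nn_lipschitz_decision(X_pos=[[1.0, 0.0], [1.2, 0.3]],
                          X_neg=[[-1.0, 0.0], [-1.1, -0.2]])
```

Points deep inside a cluster receive a large |f(x)|, while points near the midline receive values near zero, mirroring the inverse-Lipschitz-constant notion of margin in the article.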