Publications


Empirical Inference Poster Efficient Approximations for Support Vector Classifiers Kienzle, W., Franz, M. 7:68, 7th Tübingen Perception Conference (TWK 2004), February 2004
In face detection, support vector machines (SVM) and neural networks (NN) have been shown to outperform most other classification methods. While both approaches are learning-based, there are distinct advantages and drawbacks to each method: NNs are difficult to design and train but can lead to very small and efficient classifiers. In comparison, SVM model selection and training is rather straightforward, and, more importantly, guaranteed to converge to a globally optimal (in the sense of training errors) solution. Unfortunately, SVM classifiers tend to have large representations which are inappropriate for time-critical image processing applications. In this work, we examine various existing and new methods for simplifying support vector decision rules. Our goal is to obtain efficient classifiers (as with NNs) while keeping the numerical and statistical advantages of SVMs. For a given SVM solution, we compute a cascade of approximations with increasing complexities. Each classifier is tuned so that the detection rate is near 100%. At run-time, the first (simplest) detector is evaluated on the whole image. Then, any subsequent classifier is applied only to those positions that have been classified as positive throughout all previous stages. The false positive rate at the end equals that of the last (i.e. most complex) detector. In contrast, since many image positions are discarded by lower-complexity classifiers, the average computation time per patch decreases significantly compared to the time needed for evaluating the highest-complexity classifier alone.
Web BibTeX
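
The run-time behaviour of the cascade described in the abstract above can be sketched as follows. This is only an illustrative sketch: the function `run_cascade`, the toy stages `cheap_stage` and `full_stage`, and the example patches are hypothetical stand-ins, not the authors' classifiers.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]
Stage = Callable[[Patch], bool]  # True means "still a face candidate"


def run_cascade(patches: List[Patch], stages: List[Stage]) -> Tuple[List[int], int]:
    """Return the indices accepted by every stage and the total number of stage evaluations."""
    candidates = list(range(len(patches)))
    evaluations = 0
    for stage in stages:  # stages are ordered from simplest to most complex
        survivors = []
        for i in candidates:
            evaluations += 1
            if stage(patches[i]):
                survivors.append(i)
        candidates = survivors  # later stages only see earlier positives
    return candidates, evaluations


# Hypothetical toy stages standing in for a cheap and an expensive approximation.
def cheap_stage(p: Patch) -> bool:
    return p[0] > 0.0


def full_stage(p: Patch) -> bool:
    return sum(p) > 1.5


patches = [(-1.0, 5.0), (0.5, 0.5), (1.0, 1.0), (2.0, -3.0)]
accepted, evaluations = run_cascade(patches, [cheap_stage, full_stage])
print(accepted, evaluations)
```

In this toy run the expensive stage is evaluated on three of the four patches instead of all four; on real images, where most positions are rejected early, the saving would be correspondingly larger.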

Empirical Inference Poster Human Classification Behaviour Revisited by Machine Learning Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. 7:134, (Editors: Bülthoff, H.H., H.A. Mallot, R. Ulrich and F.A. Wichmann), 7th Tübingen Perception Conference (TWK 2004), February 2004
We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time (RT) and confidence rating (CR) were recorded for each face. RTs are longer for incorrect answers than for correct ones, high CRs are correlated with low classification errors and RTs decrease as the CRs increase. These results suggest that patterns difficult to classify need more computation by the brain than patterns easy to classify. Hyperplane learning algorithms such as Support Vector Machines (SVM), Relevance Vector Machines (RVM), Prototype learners (Prot) and K-means learners (Kmean) were used on the same classification task using the Principal Components of the texture and flowfield representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. Kmean yields a classification performance close to humans while SVM and RVM are much better. This surprising behaviour may be due to the fact that humans are trained on real faces during their lifetime while they were here tested on artificial ones, while the algorithms were trained and tested on the same set of stimuli. We then correlated the human responses to the distance of the stimuli to the separating hyperplane (SH) of the learning algorithms. On the whole, stimuli far from the SH are classified more accurately, faster and with higher confidence than those near to the SH if we pool data across all our subjects and stimuli. We also find three noteworthy results. First, SVMs and RVMs can learn to classify faces using the subjects' labels but perform much better when using the true labels. Second, correlating the average response of humans (classification error, RT or CR) with the distance to the SH on a face-by-face basis using Spearman's rank correlation coefficients shows that RVMs recreate human performance most closely in every respect. Third, the mean-of-class prototype, its popularity in neuroscience notwithstanding, is the least human-like classifier in all cases examined.
Web BibTeX

Empirical Inference Poster Learning Depth Sinz, F., Franz, M. 69, (Editors: H.H. Bülthoff, H.A. Mallot, R. Ulrich, F.A. Wichmann), 7th Tübingen Perception Conference (TWK 2004), February 2004
The depth of a point in space can be estimated by observing its image position from two different viewpoints. The classical approach to stereo vision calculates depth from the two projection equations which together form a stereocamera model. An unavoidable preparatory step for this solution is a calibration procedure, i.e., estimating the external (position and orientation) and internal (focal length, lens distortions etc.) parameters of each camera from a set of points with known spatial position and their corresponding image positions. This is normally done by iteratively linearizing the single camera models and reestimating their parameters according to the error on the known datapoints. The advantage of the classical method is the maximal usage of prior knowledge about the underlying physical processes and the explicit estimation of meaningful model parameters such as focal length or camera position in space. However, the approach neglects the nonlinear nature of the problem such that the results critically depend on the choice of the initial values for the parameters. In this study, we approach the depth estimation problem from a different point of view by applying generic machine learning algorithms to learn the mapping from image coordinates to spatial position. These algorithms do not require any domain knowledge and are able to learn nonlinear functions by mapping the inputs into a higher-dimensional space. Compared to classical calibration, machine learning methods give a direct solution to the depth estimation problem which means that the values of the stereocamera parameters cannot be extracted from the learned mapping. On the poster, we compare the performance of classical camera calibration to that of different machine learning algorithms such as kernel ridge regression, Gaussian processes and support vector regression. Our results indicate that generic learning approaches can lead to higher depth accuracies than classical calibration although no domain knowledge is used.
PDF Web BibTeX

Empirical Inference Poster Selective Attention to Auditory Stimuli: A Brain-Computer Interface Paradigm Hill, N., Lal, T., Schröder, M., Hinterberger, T., Birbaumer, N., Schölkopf, B. 7:102, (Editors: Bülthoff, H.H., H.A. Mallot, R. Ulrich and F.A. Wichmann), 7th Tübingen Perception Conference (TWK 2004), February 2004
During the last 20 years several paradigms for Brain Computer Interfaces have been proposed; see [1] for a recent review. They can be divided into (a) stimulus-driven paradigms, using e.g. event-related potentials or visual evoked potentials from an EEG signal, and (b) patient-driven paradigms such as those that use premotor potentials correlated with imagined action, or slow cortical potentials (e.g. [2]). Our aim is to develop a stimulus-driven paradigm that is applicable in practice to patients. Due to the unreliability of visual perception in “locked-in” patients in the later stages of disorders such as Amyotrophic Lateral Sclerosis, we concentrate on the auditory modality. Specifically, we look for the effects, in the EEG signal, of selective attention to one of two concurrent auditory stimulus streams, exploiting the increased activation to attended stimuli that is seen under some circumstances [3]. We present the results of our preliminary experiments on normal subjects. On each of 400 trials, two repetitive stimuli (sequences of drum-beats or other pulsed stimuli) could be heard simultaneously. The two stimuli were distinguishable from one another by their acoustic properties, by their source location (one from a speaker to the left of the subject, the other from the right), and by their differing periodicities. A visual cue preceded the stimulus by 500 msec, indicating which of the two stimuli to attend to, and the subject was instructed to count the beats in the attended stimulus stream. There were up to 6 beats of each stimulus: with equal probability on each trial, all 6 were played, or the fourth was omitted, or the fifth was omitted. The 40-channel EEG signals were analyzed offline to reconstruct which of the streams was attended on each trial. A linear Support Vector Machine [4] was trained on a random subset of the data and tested on the remainder. Results are compared from two types of pre-processing of the signal: for each stimulus stream, (a) EEG signals at the stream's beat periodicity are emphasized, or (b) EEG signals following beats are contrasted with those following missing beats. Both forms of pre-processing show promising results, i.e. that selective attention to one or the other auditory stream yields signals that are classifiable significantly above chance performance. In particular, the second pre-processing was found to be robust to reduction in the number of features used for classification (cf. [5]), helping us to eliminate noise.
PDF Web BibTeX

Empirical Inference Poster Texture and Haptic Cues in Slant Discrimination: Measuring the Effect of Texture Type Rosas, P., Wichmann, F., Ernst, M., Wagemans, J. 7:165, (Editors: Bülthoff, H. H., H. A. Mallot, R. Ulrich, F. A. Wichmann), 7th Tübingen Perception Conference (TWK 2004), February 2004
In a number of models of depth cue combination the depth percept is constructed via a weighted average combination of independent depth estimations. The influence of each cue in such an average depends on the reliability of the source of information [1,5]. In particular, Ernst and Banks (2002) formulate such combination as that of the minimum variance unbiased estimator that can be constructed from the available cues. We have observed systematic differences in slant discrimination performance of human observers when different types of textures were used as cues to slant [4]. If the depth percept behaves as described above, our measurements of the slopes of the psychometric functions provide the predicted weights for the texture cue for the ranked texture types. However, the results for slant discrimination obtained when combining these texture types with object motion are difficult to reconcile with the minimum variance unbiased estimator model [3]. This apparent failure of such a model might be explained by the existence of a coupling of texture and motion, violating the assumption of independence of cues. Hillis, Ernst, Banks, and Landy (2002) [2] have shown that while for between-modality combination the human visual system has access to the single-cue information, for within-modality combination (visual cues) the single-cue information is lost. This suggests a coupling between visual cues and independence between visual and haptic cues. In the present study we therefore combined the different texture types with haptic information in a slant discrimination task, to test whether in the between-modality condition these cues are combined as predicted by an unbiased, minimum variance estimator model. The measured weights for the cues were consistent with a combination rule sensitive to the reliability of the sources of information, but did not match the predictions of a statistically optimal combination.
PDF Web BibTeX
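
For readers unfamiliar with the minimum variance unbiased estimator mentioned in the abstract above, the textbook rule for independent cues is inverse-variance weighting. The sketch below illustrates that rule only; the function name `combine_cues` and the numbers (slant estimates in degrees with made-up variances) are hypothetical and are not data from the study.

```python
from typing import List, Sequence, Tuple


def combine_cues(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, List[float], float]:
    """Inverse-variance weighted average of independent, unbiased cue estimates."""
    reliabilities = [1.0 / v for v in variances]  # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]  # weights sum to one
    combined = sum(w * s for w, s in zip(weights, estimates))
    combined_variance = 1.0 / total  # never larger than the smallest single-cue variance
    return combined, weights, combined_variance


# Hypothetical example: a texture cue (variance 4) and a haptic cue (variance 8).
slant, weights, variance = combine_cues([30.0, 36.0], [4.0, 8.0])
print(slant, weights, variance)
```

Under this rule the more reliable cue receives the larger weight, which is the prediction the measured weights are compared against.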

Empirical Inference Poster m-Alternative-Forced-Choice: Improving the Efficiency of the Method of Constant Stimuli Jäkel, F., Hill, J., Wichmann, F. 7:118, 7th Tübingen Perception Conference (TWK 2004), February 2004
We explored several ways to improve the efficiency of measuring psychometric functions without resorting to adaptive procedures. a) The number m of alternatives in an m-alternative-forced-choice (m-AFC) task improves the efficiency of the method of constant stimuli. b) When alternatives are presented simultaneously on different positions on a screen rather than sequentially, time can be saved and memory load for the subject can be reduced. c) A touch-screen can further help to make the experimental procedure more intuitive. We tested these ideas in the measurement of contrast sensitivity and compared them to results obtained by sequential presentation in two-interval-forced-choice (2-IFC). Qualitatively all methods (m-AFC and 2-IFC) recovered the characteristic shape of the contrast sensitivity function in three subjects. The m-AFC paradigm only took about 60% of the time of the 2-IFC task. We tried m=2,4,8 and found 4-AFC to give the best model fits and 2-AFC to have the least bias.
Web BibTeX

Empirical Inference Article Experimentally optimal ν in support vector regression for different noise models and parameter settings Chalimourda, A., Schölkopf, B., Smola, A. Neural Networks, 17(1):127-141, January 2004
In Support Vector (SV) regression, a parameter ν controls the number of Support Vectors and the number of points that come to lie outside of the so-called ε-insensitive tube. For various noise models and SV parameter settings, we experimentally determine the values of ν that lead to the lowest generalization error. We find good agreement with the values that had previously been predicted by a theoretical argument based on the asymptotic efficiency of a simplified model of SV regression. As a side effect of the experiments, valuable information about the generalization behavior of the remaining SVM parameters and their dependencies is gained. The experimental findings are valid even for complex ‘real-world’ data sets. Based on our results on the role of the ν-SVM parameters, we discuss various model selection methods.
PDF DOI BibTeX

Empirical Inference Talk Introduction to Category Theory Bousquet, O. Internal Seminar, January 2004
A brief introduction to the general idea behind category theory with some basic definitions and examples. A perspective on higher dimensional categories is given.
PDF BibTeX

Empirical Inference Talk Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking Zhou, D. January 2004
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.
PDF BibTeX

Empirical Inference Conference Paper A New Variational Framework for Rigid-Body Alignment Kato, T., Tsuda, K., Tomii, K., Asai, K. In Joint IAPR International Workshops on Syntactical and Structural Pattern Recognition (SSPR 2004) and Statistical Pattern Recognition (SPR 2004), 171-179, (Editors: Fred, A., T. Caelli, R.P.W. Duin, A. Campilho and D. de Ridder), Joint IAPR International Workshops on Syntactical and Structural Pattern Recognition (SSPR 2004) and Statistical Pattern Recognition (SPR 2004), 2004 PDF BibTeX

Empirical Inference Book Chapter A Primer on Kernel Methods Vert, J., Tsuda, K., Schölkopf, B. In Kernel Methods in Computational Biology, 35-70, (Editors: B Schölkopf and K Tsuda and JP Vert), MIT Press, Cambridge, MA, USA, 2004 PDF BibTeX

Empirical Inference Conference Paper A Regularization Framework for Learning from Graph Data Zhou, D., Schölkopf, B. In ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields, 132-137, ICML, 2004
The data in many real-world problems can be thought of as a graph, such as the web, co-author networks, and biological networks. We propose a general regularization framework on graphs, which is applicable to the classification, ranking, and link prediction problems. We also show that the method can be explained as lazy random walks. We evaluate the method on a number of experiments.
PDF PostScript BibTeX

Empirical Inference Article A Tutorial on Support Vector Regression Smola, A., Schölkopf, B. Statistics and Computing, 14(3):199-222, 2004 Web BibTeX

Empirical Inference Conference Paper A kernel view of the dimensionality reduction of manifolds Ham, J., Lee, D., Mika, S., Schölkopf, B. In Proceedings of the Twenty-First International Conference on Machine Learning, 369-376, (Editors: CE Brodley), ACM, New York, NY, USA, ICML, 2004, also appeared as MPI-TR 110
We interpret several well-known algorithms for dimensionality reduction of manifolds as kernel methods. Isomap, graph Laplacian eigenmap, and locally linear embedding (LLE) all utilize local neighborhood information to construct a global embedding of the manifold. We show how all three algorithms can be described as kernel PCA on specially constructed Gram matrices, and illustrate the similarities and differences between the algorithms with representative examples.
PDF BibTeX

Empirical Inference Book Chapter A primer on molecular biology Zien, A. In Kernel Methods in Computational Biology, 3-34, (Editors: Schölkopf, B., K. Tsuda and J. P. Vert), MIT Press, Cambridge, MA, USA, 2004
Modern molecular biology provides a rich source of challenging machine learning problems. This tutorial chapter aims to provide the necessary biological background knowledge required to communicate with biologists and to understand and properly formalize a number of the most interesting problems in this application domain. The largest part of the chapter (its first section) is devoted to the cell as the basic unit of life. Four aspects of cells are reviewed in sequence: (1) the molecules that cells make use of (above all, proteins, RNA, and DNA); (2) the spatial organization of cells ("compartmentalization"); (3) the way cells produce proteins ("protein expression"); and (4) cellular communication and evolution (of cells and organisms). In the second section, an overview is provided of the most frequent measurement technologies, data types, and data sources. Finally, important open problems in the analysis of these data (bioinformatics challenges) are briefly outlined.
PDF PostScript Web BibTeX

Empirical Inference Article Asymptotic Properties of the Fisher Kernel Tsuda, K., Akaho, S., Kawanabe, M., Müller, K. Neural Computation, 16(1):115-137, 2004 PDF BibTeX

Empirical Inference Article Bayesian analysis of the Scatterometer Wind Retrieval Inverse Problem: Some New Approaches Cornford, D., Csato, L., Evans, D., Opper, M. Journal of the Royal Statistical Society B, 66:1-17, 3, 2004
The retrieval of wind vectors from satellite scatterometer observations is a non-linear inverse problem. A common approach to solving inverse problems is to adopt a Bayesian framework and to infer the posterior distribution of the parameters of interest given the observations by using a likelihood model relating the observations to the parameters, and a prior distribution over the parameters. We show how Gaussian process priors can be used efficiently with a variety of likelihood models, using local forward (observation) models and direct inverse models for the scatterometer. We present an enhanced Markov chain Monte Carlo method to sample from the resulting multimodal posterior distribution. We go on to show how the computational complexity of the inference can be controlled by using a sparse, sequential Bayes algorithm for estimation with Gaussian processes. This helps to overcome the most serious barrier to the use of probabilistic, Gaussian process methods in remote sensing inverse problems, which is the prohibitively large size of the data sets. We contrast the sampling results with the approximations that are found by using the sparse, sequential Bayes algorithm.
PDF BibTeX

Empirical Inference Technical Report Behaviour and Convergence of the Constrained Covariance Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Schölkopf, B., Logothetis, N. (130), MPI for Biological Cybernetics, 2004
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth, which can make dependence hard to detect empirically. All current kernel-based independence tests share this behaviour. Finally, we demonstrate exponential convergence between the population and empirical COCO, which implies that COCO does not suffer from slow learning rates when used as a dependence test.
PDF BibTeX

Empirical Inference Ph.D. Thesis Classification and Feature Extraction in Man and Machine Graf, A. Biologische Kybernetik, University of Tübingen, Germany, 2004, online publication BibTeX

Empirical Inference Poster Classification and Memory Behaviour of Man Revisited by Machine Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. CSHL Meeting on Computational & Systems Neuroscience (COSYNE), 2004 BibTeX

Empirical Inference Conference Paper Clustering Protein Sequence and Structure Space with Infinite Gaussian Mixture Models Dubey, A., Hwang, S., Rangel, C., Rasmussen, C., Ghahramani, Z., Wild, D. In Pacific Symposium on Biocomputing 2004; Vol. 9, 399-410, World Scientific Publishing, Singapore, Pacific Symposium on Biocomputing, 2004
We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies etc., based on the theory of infinite Gaussian mixture models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illustrate our methods with application to three data sets: globin sequences, globin sequences with known three-dimensional structures and G-protein coupled receptor sequences. The consistency of the clusters indicates that our method is producing biologically meaningful results, which provide a very good indication of the underlying families and subfamilies. With the inclusion of secondary structure and residue solvent accessibility information, we obtain a classification of sequences of known structure which reflects and extends their SCOP classifications. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/~wild/PSB04
PDF BibTeX

Empirical Inference Book Chapter Concentration Inequalities Boucheron, S., Lugosi, G., Bousquet, O. In Lecture Notes in Artificial Intelligence 3176:208-240, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, Germany, 2004 PDF BibTeX

Empirical Inference Technical Report Confidence Sets for Ratios: A Purely Geometric Approach To Fieller’s Theorem von Luxburg, U., Franz, V. (133), Max Planck Institute for Biological Cybernetics, 2004
We present a simple, geometric method to construct Fieller's exact confidence sets for ratios of jointly normally distributed random variables. Contrary to previous geometric approaches in the literature, our method is valid in the general case where both sample mean and covariance are unknown. Moreover, not only the construction but also its proof are purely geometric and elementary, thus giving intuition into the nature of the confidence sets.
PDF BibTeX

Empirical Inference Poster Early visual processing—data, theory, models Wichmann, F. Experimentelle Psychologie. Beiträge zur 46. Tagung experimentell arbeitender Psychologen, 46:24, 2004 BibTeX

Empirical Inference Conference Paper Efficient Approximations for Support Vector Machines in Object Detection Kienzle, W., Bakır, G., Franz, M., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, DAGM 2004, 54-61, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004
We present a new approximation scheme for support vector decision functions in object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller so-called reduced set of synthetic points. Instead of finding the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic vectors such that the resulting approximation can be evaluated via separable filters. Applications that require scanning an entire image can benefit from this representation: when using separable filters, the average computational complexity for evaluating a reduced set vector on a test patch of size (h x w) drops from O(hw) to O(h+w). We show experimental results on handwritten digits and face detection.
PDF BibTeX
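
The drop from O(hw) to O(h+w) quoted in the abstract above comes from a general property of separable (rank-one) filters when an entire image is scanned: a horizontal 1-D pass followed by a vertical 1-D pass gives the same responses as the full 2-D filter. The sketch below illustrates that property only, assuming a filter of the form u v^T; how the paper maps reduced set vectors onto such filters is not stated in the abstract, and the function names and test image are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]


def correlate_direct(image: Image, u: Sequence[int], v: Sequence[int]) -> Image:
    """Apply the 2-D filter u v^T at every position: h*w multiplications per output value."""
    h, w = len(u), len(v)
    rows, cols = len(image) - h + 1, len(image[0]) - w + 1
    return [[sum(u[i] * v[j] * image[r + i][c + j] for i in range(h) for j in range(w))
             for c in range(cols)] for r in range(rows)]


def correlate_separable(image: Image, u: Sequence[int], v: Sequence[int]) -> Image:
    """Same result via two 1-D passes: w plus h multiplications per output value."""
    h, w = len(u), len(v)
    rows, cols = len(image) - h + 1, len(image[0]) - w + 1
    # Horizontal pass with v over every image row.
    horiz = [[sum(v[j] * row[c + j] for j in range(w)) for c in range(cols)] for row in image]
    # Vertical pass with u over the intermediate result.
    return [[sum(u[i] * horiz[r + i][c] for i in range(h)) for c in range(cols)]
            for r in range(rows)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
u, v = [1, 2], [1, 0, -1]
direct = correlate_direct(image, u, v)
separable = correlate_separable(image, u, v)
print(direct == separable)
```

The saving is an amortised one: the horizontal pass is shared by all vertically overlapping patches, which is why it applies to applications that scan the whole image.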

Empirical Inference Article Feature Selection for Support Vector Machines Using Genetic Algorithms Fröhlich, H., Chapelle, O., Schölkopf, B. International Journal on Artificial Intelligence Tools (Special Issue on Selected Papers from the 15th IEEE International Conference on Tools with Artificial Intelligence 2003), 13(4):791-800, 2004 Web BibTeX

Empirical Inference Conference Paper Gaussian process model based predictive control Kocijan, J., Murray-Smith, R., Rasmussen, C., Girard, A. In Proceedings of the ACC 2004, 2214-2219, Proceedings of the ACC, 2004
Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. Gaussian process models contain noticeably fewer coefficients to be optimised. This paper illustrates possible application of Gaussian process models within model-based predictive control. The extra information provided within the Gaussian process model is used in predictive control, where optimisation of the control signal takes the variance information into account. The predictive control principle is demonstrated on control of a pH process benchmark.
PDF PostScript BibTeX

Empirical Inference Book Chapter Gaussian Processes in Machine Learning Rasmussen, C. In 3176:63-71, Lecture Notes in Computer Science, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, 2004, Copyright by Springer
We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood. We explain the practical advantages of Gaussian Processes and end with conclusions and a look at the current trends in GP work.
PDF PostScript BibTeX
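
For orientation, the "simple equations" referred to in the abstract above are, in the standard formulation of Gaussian Process regression with Gaussian observation noise, the predictive mean, the predictive variance, and the log marginal likelihood. The notation here is an assumption and may differ from the chapter's: $K$ is the kernel matrix on the $n$ training inputs $X$, $\mathbf{k}_*$ the vector of kernel values between the test input $\mathbf{x}_*$ and the training inputs, $\sigma_n^2$ the noise variance, and $\theta$ the hyperparameters.

```latex
\begin{align}
\bar{f}_* &= \mathbf{k}_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y}, \\
\operatorname{var}(f_*) &= k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*, \\
\log p(\mathbf{y} \mid X, \theta) &= -\tfrac{1}{2}\, \mathbf{y}^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y}
  - \tfrac{1}{2} \log \left\lvert K + \sigma_n^2 I \right\rvert - \tfrac{n}{2} \log 2\pi .
\end{align}
```

Hyperparameters are typically learned by maximising the last expression with respect to $\theta$.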

Empirical Inference Conference Paper Hilbertian Metrics on Probability Measures and their Application in SVM’s Hein, H., Lal, T., Bousquet, O. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, 3175:270-277, Lecture Notes in Computer Science, (Editors: Rasmussen, C. E., H. H. Bülthoff, M. Giese and B. Schölkopf), Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004
The goal of this article is to investigate the field of Hilbertian metrics on probability measures. Since they are very versatile and can therefore be applied in various problems, they are of great interest in kernel methods. Quite recently Topsøe and Fuglede introduced a family of Hilbertian metrics on probability measures. We give basic properties of the Hilbertian metrics of this family and other metrics used in the literature. Then we propose an extension of the considered metrics which incorporates structural information of the probability space into the Hilbertian metric. Finally we compare all proposed metrics in an image and text classification problem using histogram data.
PDF PostScript BibTeX

Empirical Inference Poster Implicit Wiener series for capturing higher-order interactions in images Franz, M., Schölkopf, B. Sensory coding and the natural environment, (Editors: Olshausen, B.A. and M. Lewicki), 2004
The information about the objects in an image is almost exclusively described by the higher-order interactions of its pixels. The Wiener series is one of the standard methods to systematically characterize these interactions. However, the classical estimation method of the Wiener expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear signals such as images. We propose an estimation method based on regression in a reproducing kernel Hilbert space that overcomes these problems using polynomial kernels as known from Support Vector Machines and other kernel-based methods. Numerical experiments show performance advantages in terms of convergence, interpretability and system sizes that can be handled. By the time of the conference, we will be able to present first results on the higher-order structure of natural images.
BibTeX

Empirical Inference Conference Paper Implicit estimation of Wiener series Franz, M., Schölkopf, B. In Machine Learning for Signal Processing XIV, Proc. 2004 IEEE Signal Processing Society Workshop, 735-744, (Editors: A Barros and J Principe and J Larsen and T Adali and S Douglas), IEEE, New York, Machine Learning for Signal Processing XIV, Proc. 2004 IEEE Signal Processing Society Workshop, 2004
The Wiener series is one of the standard methods to systematically characterize the nonlinearity of a system. The classical estimation method of the expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear systems. We propose an implicit estimation method based on regression in a reproducing kernel Hilbert space that alleviates these problems. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
PDF PostScript BibTeX
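
The idea of estimating a nonlinear system implicitly through regression in a reproducing kernel Hilbert space can be sketched with ridge regression under an inhomogeneous polynomial kernel. This is a minimal sketch under stated assumptions, not the authors' estimator: the ridge parameter, the toy second-order `system`, and all function names are hypothetical, and a small pure-Python linear solver stands in for a numerical library.

```python
import random

def poly_kernel(x, z, degree=2):
    """Inhomogeneous polynomial kernel (1 + <x, z>)^degree."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** degree

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit(X, y, ridge=1e-6, degree=2):
    """Kernel ridge regression: solve (K + ridge * I) alpha = y."""
    K = [[poly_kernel(a, b, degree) + (ridge if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def predict(X, alpha, x, degree=2):
    """Evaluate the fitted expansion sum_i alpha_i k(x_i, x)."""
    return sum(a * poly_kernel(xi, x, degree) for a, xi in zip(alpha, X))

def system(x):
    # Hypothetical second-order system used only to generate toy data
    return 0.5 * x[0] - x[1] + 2.0 * x[0] * x[1]

rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(25)]
y = [system(x) for x in X]
alpha = fit(X, y)

x_test = [0.3, -0.7]
print(predict(X, alpha, x_test), system(x_test))
```

The polynomial expansion coefficients are never formed explicitly here; the model is represented only by the weights `alpha` on the training points, which is what keeps the approach tractable as the input dimension or the polynomial degree grows.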

Empirical Inference Book Chapter Introduction to Statistical Learning Theory Bousquet, O., Boucheron, S., Lugosi, G. In Lecture Notes in Artificial Intelligence 3176:169-207, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, Germany, 2004 PDF BibTeX

Empirical Inference Article Kernel Methods and their Potential Use in Signal Processing Perez-Cruz, F., Bousquet, O. IEEE Signal Processing Magazine, (Special issue on Signal Processing for Mining), 2004 (Accepted) PostScript BibTeX

Empirical Inference Conference Paper Kernel Methods for Manifold Estimation Schölkopf, B. In Proceedings in Computational Statistics, Proceedings in Computational Statistics, 441-452, (Editors: J Antoch), Physica-Verlag/Springer, Heidelberg, Germany, COMPSTAT, 2004 BibTeX

Empirical Inference Book Chapter Kernels for graphs Kashima, H., Tsuda, K., Inokuchi, A. In 155-170, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA; USA, 2004 PDF BibTeX

Empirical Inference Conference Paper Learning from Labeled and Unlabeled Data Using Random Walks Zhou, D., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, 237-244, (Editors: Rasmussen, C.E., H.H. Bülthoff, M.A. Giese and B. Schölkopf), Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004
We consider the general problem of learning from labeled and unlabeled data. Given a set of points, some are labeled and the rest are unlabeled. The goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The question of interest is whether we can implement a classifier which uses the unlabeled data in some way and achieves higher accuracy than classifiers which use the labeled data only. Recently we proposed a simple algorithm, which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority to supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on the key steps in this algorithm.
PDF PostScript BibTeX
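
The abstract does not spell out the algorithm, so the following is only a sketch of one common graph-based scheme for this setting: labels are spread over a weighted graph by iterating F <- alpha * S * F + (1 - alpha) * Y, where S is the symmetrically normalized weight matrix. The update rule, the value of `alpha`, the toy graph, and the function name are assumptions for illustration.

```python
import math

def label_spreading(W, labels, n_classes, alpha=0.8, n_iter=200):
    """Spread labels over a graph and return one predicted class per node.

    W      : symmetric non-negative weight matrix with zero diagonal
    labels : class index for labeled nodes, None for unlabeled nodes
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # S = D^{-1/2} W D^{-1/2}
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot label matrix; rows of unlabeled nodes stay zero
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

# Toy graph: two tight groups (0-1-2 and 3-4-5) joined by one weak edge
n = 6
W = [[0.0] * n for _ in range(n)]
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0), (4, 5, 1.0)]:
    W[i][j] = W[j][i] = w

labels = [0, None, None, None, None, 1]  # only the two end nodes are labeled
print(label_spreading(W, labels, n_classes=2))
```

With a single labeled point per class, a purely supervised classifier has almost nothing to work with, whereas the iteration lets each label flow along strong edges and only weakly across the bridge between the groups.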

Empirical Inference Technical Report Learning from Labeled and Unlabeled Data Using Random Walks Zhou, D., Schölkopf, B. Max Planck Institute for Biological Cybernetics, 2004
We consider the general problem of learning from labeled and unlabeled data. Given a set of points, some are labeled and the rest are unlabeled. The goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The question of interest is whether we can implement a classifier which uses the unlabeled data in some way and achieves higher accuracy than classifiers which use the labeled data only. Recently we proposed a simple algorithm, which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority to supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on the key steps in this algorithm.
PDF PostScript BibTeX

Empirical Inference Poster Masking by plaid patterns revisited Wichmann, F. Experimentelle Psychologie. Beiträge zur 46. Tagung experimentell arbeitender Psychologen, 46:285, 2004 BibTeX

Empirical Inference Conference Paper Maximal Margin Classification for Metric Spaces Hein, M., Bousquet, O. In Learning Theory and Kernel Machines, 72-86, (Editors: Schölkopf, B. and Warmuth, M. K.), Springer, Heidelberg, Germany, 16. Annual Conference on Computational Learning Theory / COLT Kernel, 2004
In this article we construct a maximal margin classification algorithm for arbitrary metric spaces. First, we show that the Support Vector Machine (SVM) is a maximal margin algorithm for the class of metric spaces where the negative squared distance is conditionally positive definite (CPD). This means that the metric space can be isometrically embedded into a Hilbert space, where one performs linear maximal margin separation. We show that the solution depends only on the metric, not on the kernel. Following the framework we develop for the SVM, we construct an algorithm for maximal margin classification in arbitrary metric spaces. The main difference compared with the SVM is that we no longer embed isometrically into a Hilbert space, but into a Banach space. We further give an estimate of the capacity of the function class involved in this algorithm via Rademacher averages. We recover an algorithm of Graepel et al. [6].
PDF PostScript PDF DOI BibTeX
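
The CPD condition on the negative squared distance can be made concrete with a small worked example. For coefficients c that sum to zero, the quadratic form sum_ij c_i c_j (-d(x_i, x_j)^2) must be non-negative whenever the metric embeds isometrically into a Hilbert space. The sketch below evaluates this form for points on the real line and for the shortest-path metric of a star graph; the function name and both toy metrics are assumptions chosen for illustration.

```python
def neg_sq_dist_form(D, c):
    """Return sum_ij c_i c_j * (-D[i][j]^2) for coefficients c summing to zero."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("coefficients must sum to zero")
    n = len(c)
    return sum(c[i] * c[j] * -(D[i][j] ** 2) for i in range(n) for j in range(n))

# Euclidean metric on four points of the real line
line = [0.0, 1.0, 2.0, 3.0]
D_line = [[abs(a - b) for b in line] for a in line]

# Shortest-path metric on a star graph: node 0 is the centre, nodes 1-3 are leaves
D_star = [[0.0, 1.0, 1.0, 1.0],
          [1.0, 0.0, 2.0, 2.0],
          [1.0, 2.0, 0.0, 2.0],
          [1.0, 2.0, 2.0, 0.0]]

c = [-3.0, 1.0, 1.0, 1.0]  # sums to zero
print(neg_sq_dist_form(D_line, c))  # non-negative, consistent with a Hilbert space embedding
print(neg_sq_dist_form(D_star, c))  # negative, so this metric violates the CPD condition
```

A single non-negative value does not prove the condition, which has to hold for every zero-sum coefficient vector, but a single negative value is enough to rule out an isometric Hilbert space embedding. The star graph is therefore an example of a metric space outside the class covered by the standard SVM.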

Empirical Inference Conference Paper Multivariate Regression via Stiefel Manifold Constraints Bakir, G., Gretton, A., Franz, M., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, Lecture Notes in Computer Science, Vol. 3175, 262-269, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004
We introduce a learning technique for regression between high-dimensional spaces. Standard methods typically reduce this task to many one-dimensional problems, with each output dimension considered independently. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective, and dependence between the outputs can be exploited.
PostScript BibTeX

Empirical Inference Technical Report Multivariate Regression with Stiefel Constraints Bakir, G., Gretton, A., Franz, M., Schölkopf, B. (128), MPI for Biological Cybernetics, Spemannstr 38, 72076, Tuebingen, 2004
We introduce a new framework for regression between multi-dimensional spaces. Standard methods for solving this problem typically reduce the problem to one-dimensional regression by choosing features in the input and/or output spaces. These methods, which include PLS (partial least squares), KDE (kernel dependency estimation), and PCR (principal component regression), select features based on different a-priori judgments as to their relevance. Moreover, loss function and constraints are chosen not primarily on statistical grounds, but to simplify the resulting optimisation. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective. Our approach also allows for the possibility of using a regularizer in the optimization. Finally, by processing the observations sequentially, our algorithm is able to work on large scale problems.
PDF BibTeX

Empirical Inference Poster Neural mechanisms underlying control of a Brain-Computer-Interface (BCI): Simultaneous recording of bold-response and EEG Hinterberger, T., Wilhelm, B., Veit, R., Weiskopf, N., Lal, T., Birbaumer, N. 2004
Brain computer interfaces (BCI) enable humans or animals to communicate or activate external devices without muscle activity using electric brain signals. The BCI Thought Translation Device (TTD) uses learned regulation of slow cortical potentials (SCPs), a skill most people and paralyzed patients can acquire with training periods of several hours up to months. The neurophysiological mechanisms and anatomical sources of SCPs and other event-related brain macro-potentials are well understood, but the neural mechanisms underlying learning of the self-regulation skill for BCI-use are unknown. To uncover the relevant areas of brain activation during regulation of SCPs, the TTD was combined with functional MRI, and EEG was recorded inside the MRI scanner in twelve healthy participants who had learned to regulate their SCP with feedback and reinforcement. The results demonstrate activation of specific brain areas during execution of the brain regulation skill: successful control of cortical positivity allowing a person to activate an external device was closely related to an increase of BOLD (blood oxygen level dependent) response in the basal ganglia and frontal premotor deactivation, indicating learned regulation of a cortical-striatal loop responsible for local excitation thresholds of cortical assemblies. The data suggest that human users of a BCI learn the regulation of cortical excitation thresholds of large neuronal assemblies as a prerequisite of direct brain communication: the learning of this skill depends critically on an intact and flexible interaction between these cortico-basal ganglia-circuits. Supported by the Deutsche Forschungsgemeinschaft (DFG) and the National Institutes of Health (NIH).
BibTeX

Empirical Inference Conference Paper On the Convergence of Spectral Clustering on Random Samples: The Normalized Case von Luxburg, U., Bousquet, O., Belkin, M. In Proceedings of the 17th Annual Conference on Learning Theory, 457-471, Proceedings of the 17th Annual Conference on Learning Theory, 2004 PDF PostScript BibTeX

Empirical Inference Article Phenotypic Characterization of Human Chondrocyte Cell Line C-20/A4: A Comparison between Monolayer and Alginate Suspension Culture Finger, F., Schorle, C., Söder, S., Zien, A., Goldring, M., Aigner, T. Cells Tissues Organs, 178(2):65-77, 2004
DNA microarray analysis was used to investigate the molecular phenotype of one of the first human chondrocyte cell lines, C-20/A4, derived from juvenile costal chondrocytes by immortalization with origin-defective simian virus 40 large T antigen. Clontech Human Cancer Arrays 1.2 and quantitative PCR were used to examine gene expression profiles of C-20/A4 cells cultured in the presence of serum in monolayer and alginate beads. In monolayer cultures, genes involved in cell proliferation were strongly upregulated compared to those expressed by human adult articular chondrocytes in primary culture. Of the cell cycle-regulated genes, only two, the CDK regulatory subunit and histone H4, were downregulated after culture in alginate beads, consistent with the ability of these cells to proliferate in suspension culture. In contrast, the expression of several genes that are involved in pericellular matrix formation, including MMP-14, COL6A1, fibronectin, biglycan and decorin, was upregulated when the C-20/A4 cells were transferred to suspension culture in alginate. Also, nexin-1, vimentin, and IGFBP-3, which are known to be expressed by primary chondrocytes, were differentially expressed in our study. Consistent with the proliferative phenotype of this cell line, few genes involved in matrix synthesis and turnover were highly expressed in the presence of serum. These results indicate that immortalized chondrocyte cell lines, rather than substituting for primary chondrocytes, may serve as models for extending findings on chondrocyte function not achievable by the use of primary chondrocytes.
BibTeX

Empirical Inference Book Chapter Protein Classification via Kernel Matrix Completion Kin, T., Kato, T., Tsuda, K. In 261-274, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA; USA, 2004 PDF BibTeX

Empirical Inference Conference Paper Protein Functional Class Prediction with a Combined Graph Shin, H., Tsuda, K., Schölkopf, B. In Proceedings of the Korean Data Mining Conference, 200-219, Proceedings of the Korean Data Mining Conference, 2004
In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex, etc. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semi-supervised learning. The method alternates between minimizing the objective function with respect to network output and with respect to combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.
PDF BibTeX
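
To make the benefit of combining graphs concrete, the sketch below mixes two weight matrices with fixed coefficients and then computes smooth node scores f by solving (I + c L) f = y, where L is the Laplacian of the combined graph and y holds +1, -1, or 0 for positive, negative, and unlabeled nodes. This is a simplified illustration under assumptions: the paper alternates between optimizing the scores and the combining weights, whereas the weights here are fixed by hand, and the objective, the Jacobi solver, and the toy graphs are all hypothetical choices.

```python
def combine(graphs, weights):
    """Weighted sum of several weight matrices over the same node set."""
    n = len(graphs[0])
    return [[sum(w * G[i][j] for w, G in zip(weights, graphs)) for j in range(n)]
            for i in range(n)]

def smooth_scores(W, y, c=1.0, n_iter=200):
    """Jacobi iterations for (I + c L) f = y with L = D - W (zero diagonal in W)."""
    n = len(y)
    deg = [sum(row) for row in W]
    f = y[:]
    for _ in range(n_iter):
        f = [(y[i] + c * sum(W[i][j] * f[j] for j in range(n) if j != i))
             / (1.0 + c * deg[i]) for i in range(n)]
    return f

def edge_graph(n, edges):
    """Build a symmetric weight matrix with unit weights on the given edges."""
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1.0
    return W

# Two toy graphs over the same four proteins (assumed data)
graph_a = edge_graph(4, [(0, 1)])  # e.g. one interaction type
graph_b = edge_graph(4, [(2, 3)])  # e.g. another interaction type
y = [1.0, 0.0, 0.0, -1.0]          # node 0 positive, node 3 negative, 1 and 2 unlabeled

f_a = smooth_scores(graph_a, y)
f_ab = smooth_scores(combine([graph_a, graph_b], [0.5, 0.5]), y)
print(f_a)
print(f_ab)
```

Using graph A alone, node 2 is isolated and keeps a score of zero, so nothing can be said about it. With the combined graph, node 1 receives a positive score and node 2 a negative one, because each graph contributes the edge the other lacks.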

Empirical Inference Article Protein ranking: from local to global structure in the protein similarity network Weston, J., Elisseeff, A., Zhou, D., Leslie, C., Noble, W. Proceedings of the National Academy of Sciences, 101(17):6559-6563, 2004
Biologists regularly search databases of DNA or protein sequences for evolutionary or functional relationships to a given query sequence. We describe a ranking algorithm that exploits the entire network structure of similarity relationships among proteins in a sequence database by performing a diffusion operation on a pre-computed, weighted network. The resulting ranking algorithm, evaluated using a human-curated database of protein structures, is efficient and provides significantly better rankings than a local network search algorithm such as PSI-BLAST.
Web BibTeX