Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Empirical Inference Ph.D. Thesis Classification and Feature Extraction in Man and Machine Graf, A. Biologische Kybernetik, University of Tübingen, Germany, 2004, online publication BibTeX

Empirical Inference Poster Classification and Memory Behaviour of Man Revisited by Machine Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. CSHL Meeting on Computational & Systems Neuroscience (COSYNE), 2004 BibTeX

Empirical Inference Conference Paper Clustering Protein Sequence and Structure Space with Infinite Gaussian Mixture Models Dubey, A., Hwang, S., Rangel, C., Rasmussen, C., Ghahramani, Z., Wild, D. In Pacific Symposium on Biocomputing 2004; Vol. 9, 399-410, World Scientific Publishing, Singapore, Pacific Symposium on Biocomputing, 2004
We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies etc., based on the thoery of infinite Gaussian mixture models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illustrate our methods with application to three data sets: globin sequences, globin sequences with known tree-dimensional structures and G-pretein coupled receptor sequences. The consistency of the clusters indicate that that our methods is producing biologically meaningful results, which provide a very good indication of the underlying families and subfamilies. With the inclusion of secondary structure and residue solvent accessibility information, we obtain a classification of sequences of known structure which reflects and extends their SCOP classifications. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/~wild/PSB04
PDF BibTeX

Empirical Inference Book Chapter Concentration Inequalities Boucheron, S., Lugosi, G., Bousquet, O. In Lecture Notes in Artificial Intelligence 3176:208-240, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, Germany, 2004 PDF BibTeX

Empirical Inference Technical Report Confidence Sets for Ratios: A Purely Geometric Approach To Fieller’s Theorem von Luxburg, U., Franz, V. (133), Max Planck Institute for Biological Cybernetics, 2004
We present a simple, geometric method to construct Fieller's exact confidence sets for ratios of jointly normally distributed random variables. Contrary to previous geometric approaches in the literature, our method is valid in the general case where both sample mean and covariance are unknown. Moreover, not only the construction but also its proof are purely geometric and elementary, thus giving intuition into the nature of the confidence sets.
PDF BibTeX

Empirical Inference Poster Early visual processing—data, theory, models Wichmann, F. Experimentelle Psychologie. Beitr{\"a}ge zur 46. Tagung experimentell arbeitender Psychologen, 46:24, 2004 BibTeX

Empirical Inference Conference Paper Efficient Approximations for Support Vector Machines in Object Detection Kienzle, W., BakIr, G., Franz, M., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, DAGM 2004, 54-61, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004
We present a new approximation scheme for support vector decision functions in object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller so-called reduced set of synthetic points. Instead of finding the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic vectors such that the resulting approximation can be evaluated via separable filters. Applications that require scanning an entire image can benefit from this representation: when using separable filters, the average computational complexity for evaluating a reduced set vector on a test patch of size (h x w) drops from O(hw) to O(h+w). We show experimental results on handwritten digits and face detection.
PDF BibTeX

Empirical Inference Article Feature Selection for Support Vector Machines Using Genetic Algorithms Fröhlich, H., Chapelle, O., Schölkopf, B. International Journal on Artificial Intelligence Tools (Special Issue on Selected Papers from the 15th IEEE International Conference on Tools with Artificial Intelligence 2003), 13(4):791-800, 2004 Web BibTeX

Empirical Inference Conference Paper Gasussian process model based predictive control Kocijan, J., Murray-Smith, R., Rasmussen, C., Girard, A. In Proceedings of the ACC 2004, 2214-2219, Proceedings of the ACC, 2004
Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identi cation of non-linear dynamic systems. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. Gaussian process models contain noticeably less coef cients to be optimised. This paper illustrates possible application of Gaussian process models within model-based predictive control. The extra information provided within Gaussian process model is used in predictive control, where optimisation of control signal takes the variance information into account. The predictive control principle is demonstrated on control of pH process benchmark.
PDF PostScript BibTeX

Empirical Inference Book Chapter Gaussian Processes in Machine Learning Rasmussen, C. In 3176:63-71, Lecture Notes in Computer Science, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, 2004, Copyright by Springer
We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood. We explain the practical advantages of Gaussian Process and end with conclusions and a look at the current trends in GP work.
PDF PostScript BibTeX

Empirical Inference Conference Paper Hilbertian Metrics on Probability Measures and their Application in SVM’s Hein, H., Lal, T., Bousquet, O. In Pattern Recognition, Proceedings of th 26th DAGM Symposium, 3175:270-277, Lecture Notes in Computer Science, (Editors: Rasmussen, C. E., H. H. Bülthoff, M. Giese and B. Schölkopf), Pattern Recognition, Proceedings of th 26th DAGM Symposium, 2004
The goal of this article is to investigate the field of Hilbertian metrics on probability measures. Since they are very versatile and can therefore be applied in various problems they are of great interest in kernel methods. Quit recently Tops{o}e and Fuglede introduced a family of Hilbertian metrics on probability measures. We give basic properties of the Hilbertian metrics of this family and other used metrics in the literature. Then we propose an extension of the considered metrics which incorporates structural information of the probability space into the Hilbertian metric. Finally we compare all proposed metrics in an image and text classification problem using histogram data.
PDF PostScript BibTeX

Empirical Inference Poster Implicit Wiener series for capturing higher-order interactions in images Franz, M., Schölkopf, B. Sensory coding and the natural environment, (Editors: Olshausen, B.A. and M. Lewicki), 2004
The information about the objects in an image is almost exclusively described by the higher-order interactions of its pixels. The Wiener series is one of the standard methods to systematically characterize these interactions. However, the classical estimation method of the Wiener expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear signals such as images. We propose an estimation method based on regression in a reproducing kernel Hilbert space that overcomes these problems using polynomial kernels as known from Support Vector Machines and other kernel-based methods. Numerical experiments show performance advantages in terms of convergence, interpretability and system sizes that can be handled. By the time of the conference, we will be able to present first results on the higher-order structure of natural images.
BibTeX

Empirical Inference Conference Paper Implicit estimation of Wiener series Franz, M., Schölkopf, B. In Machine Learning for Signal Processing XIV, Proc. 2004 IEEE Signal Processing Society Workshop, 735-744, (Editors: A Barros and J Principe and J Larsen and T Adali and S Douglas), IEEE, New York, Machine Learning for Signal Processing XIV, Proc. 2004 IEEE Signal Processing Society Workshop, 2004
The Wiener series is one of the standard methods to systematically characterize the nonlinearity of a system. The classical estimation method of the expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear systems. We propose an implicit estimation method based on regression in a reproducing kernel Hilbert space that alleviates these problems. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
PDF PostScript BibTeX

Empirical Inference Book Chapter Introduction to Statistical Learning Theory Bousquet, O., Boucheron, S., Lugosi, G. In Lecture Notes in Artificial Intelligence 3176:169-207, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, Germany, 2004 PDF BibTeX

Empirical Inference Article Kernel Methods and their Potential Use in Signal Processing Perez-Cruz, F., Bousquet, O. IEEE Signal Processing Magazine, (Special issue on Signal Processing for Mining), 2004 (Accepted) PostScript BibTeX

Empirical Inference Conference Paper Kernel Methods for Manifold Estimation Schölkopf, B. In Proceedings in Computational Statistics, Proceedings in Computational Statistics, 441-452, (Editors: J Antoch), Physica-Verlag/Springer, Heidelberg, Germany, COMPSTAT, 2004 BibTeX

Empirical Inference Book Chapter Kernels for graphs Kashima, H., Tsuda, K., Inokuchi, A. In 155-170, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA; USA, 2004 PDF BibTeX

Empirical Inference Conference Paper Learning from Labeled and Unlabeled Data Using Random Walks Zhou, D., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, 237-244, (Editors: Rasmussen, C.E., H.H. Bülthoff, M.A. Giese and B. Schölkopf), Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004
We consider the general problem of learning from labeled and unlabeled data. Given a set of points, some of them are labeled, and the remaining points are unlabeled. The goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The problem of our interest is if we can implement a classifier which uses the unlabeled data information in some way and has higher accuracy than the classifiers which use the labeled data only. Recently we proposed a simple algorithm, which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority to supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on the key steps in this algorithm.
PDF PostScript BibTeX

Empirical Inference Technical Report Learning from Labeled and Unlabeled Data Using Random Walks Zhou, D., Schölkopf, B. Max Planck Institute for Biological Cybernetics, 2004
We consider the general problem of learning from labeled and unlabeled data. Given a set of points, some of them are labeled, and the remaining points are unlabeled. The goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The problem of our interest is if we can implement a classifier which uses the unlabeled data information in some way and has higher accuracy than the classifiers which use the labeled data only. Recently we proposed a simple algorithm, which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority to supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on the key steps in this algorithm.
PDF PostScript BibTeX

Empirical Inference Poster Masking by plaid patterns revisited Wichmann, F. Experimentelle Psychologie. Beitr{\"a}ge zur 46. Tagung experimentell arbeitender Psychologen, 46:285, 2004 BibTeX

Empirical Inference Conference Paper Maximal Margin Classification for Metric Spaces Hein, M., Bousquet, O. In Learning Theory and Kernel Machines, 72-86, (Editors: Schölkopf, B. and Warmuth, M. K.), Springer, Heidelberg, Germany, 16. Annual Conference on Computational Learning Theory / COLT Kernel, 2004
In this article we construct a maximal margin classification algorithm for arbitrary metric spaces. At first we show that the Support Vector Machine (SVM) is a maximal margin algorithm for the class of metric spaces where the negative squared distance is conditionally positive definite (CPD). This means that the metric space can be isometrically embedded into a Hilbert space, where one performs linear maximal margin separation. We will show that the solution only depends on the metric, but not on the kernel. Following the framework we develop for the SVM, we construct an algorithm for maximal margin classification in arbitrary metric spaces. The main difference compared with SVM is that we no longer embed isometrically into a Hilbert space, but a Banach space. We further give an estimate of the capacity of the function class involved in this algorithm via Rademacher averages. We recover an algorithm of Graepel et al. [6].
PDF PostScript PDF DOI BibTeX

Empirical Inference Conference Paper Multivariate Regression via Stiefel Manifold Constraints BakIr, G., Gretton, A., Franz, M., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, Lecture Notes in Computer Science, Vol. 3175, 262-269, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004
We introduce a learning technique for regression between high-dimensional spaces. Standard methods typically reduce this task to many one-dimensional problems, with each output dimension considered independently. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective, and dependence between the outputs can be exploited.
PostScript BibTeX

Empirical Inference Technical Report Multivariate Regression with Stiefel Constraints Bakir, G., Gretton, A., Franz, M., Schölkopf, B. (128), MPI for Biological Cybernetics, Spemannstr 38, 72076, Tuebingen, 2004
We introduce a new framework for regression between multi-dimensional spaces. Standard methods for solving this problem typically reduce the problem to one-dimensional regression by choosing features in the input and/or output spaces. These methods, which include PLS (partial least squares), KDE (kernel dependency estimation), and PCR (principal component regression), select features based on different a-priori judgments as to their relevance. Moreover, loss function and constraints are chosen not primarily on statistical grounds, but to simplify the resulting optimisation. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective. Our approach also allows for the possibility of using a regularizer in the optimization. Finally, by processing the observations sequentially, our algorithm is able to work on large scale problems.
PDF BibTeX

Empirical Inference Poster Neural mechanisms underlying control of a Brain-Computer-Interface (BCI): Simultaneous recording of bold-response and EEG Hinterberger, T., Wilhelm, B., Veit, R., Weiskopf, N., Lal, T., Birbaumer, N. 2004
Brain computer interfaces (BCI) enable humans or animals to communicate or activate external devices without muscle activity using electric brain signals. The BCI Thought Translation Device (TTD) uses learned regulation of slow cortical potentials (SCPs), a skill most people and paralyzed patients can acquire with training periods of several hours up to months. The neurophysiological mechanisms and anatomical sources of SCPs and other event-related brain macro-potentials are well understood, but the neural mechanisms underlying learning of the self-regulation skill for BCI-use are unknown. To uncover the relevant areas of brain activation during regulation of SCPs, the TTD was combined with functional MRI and EEG was recorded inside the MRI scanner in twelve healthy participants who have learned to regulate their SCP with feedback and reinforcement. The results demonstrate activation of specific brain areas during execution of the brain regulation skill: successf! ul control of cortical positivity allowing a person to activate an external device was closely related to an increase of BOLD (blood oxygen level dependent) response in the basal ganglia and frontal premotor deactivation indicating learned regulation of a cortical-striatal loop responsible for local excitation thresholds of cortical assemblies. The data suggest that human users of a BCI learn the regulation of cortical excitation thresholds of large neuronal assemblies as a prerequisite of direct brain communication: the learning of this skill depends critically on an intact and flexible interaction between these cortico-basal ganglia-circuits. Supported by the Deutsche Forschungsgemeinschaft (DFG) and the National Institute of Health (NIH).
BibTeX

Empirical Inference Conference Paper On the Convergence of Spectral Clustering on Random Samples: The Normalized Case von Luxburg, U., Bousquet, O., Belkin, M. In Proceedings of the 17th Annual Conference on Learning Theory, 457-471, Proceedings of the 17th Annual Conference on Learning Theory, 2004 PDF PostScript BibTeX

Empirical Inference Article Phenotypic Characterization of Human Chondrocyte Cell Line C-20/A4: A Comparison between Monolayer and Alginate Suspension Culture Finger, F., Schorle, C., Söder, S., Zien, A., Goldring, M., Aigner, T. Cells Tissues Organs, 178(2):65-77, 2004
DNA microarray analysis was used to investigate the molecular phenotype of one of the first human chondrocyte cell lines, C-20/A4, derived from juvenile costal chondrocytes by immortalization with origin-defective simian virus 40 large T antigen. Clontech Human Cancer Arrays 1.2 and quantitative PCR were used to examine gene expression profiles of C-20/A4 cells cultured in the presence of serum in monolayer and alginate beads. In monolayer cultures, genes involved in cell proliferation were strongly upregulated compared to those expressed by human adult articular chondrocytes in primary culture. Of the cell cycle-regulated genes, only two, the CDK regulatory subunit and histone H4, were downregulated after culture in alginate beads, consistent with the ability of these cells to proliferate in suspension culture. In contrast, the expression of several genes that are involved in pericellular matrix formation, including MMP-14, COL6A1, fibronectin, biglycan and decorin, was upregulated when the C-20/A4 cells were transferred to suspension culture in alginate. Also, nexin-1, vimentin, and IGFBP-3, which are known to be expressed by primary chondrocytes, were differentially expressed in our study. Consistent with the proliferative phenotype of this cell line, few genes involved in matrix synthesis and turnover were highly expressed in the presence of serum. These results indicate that immortalized chondrocyte cell lines, rather than substituting for primary chondrocytes, may serve as models for extending findings on chondrocyte function not achievable by the use of primary chondrocytes.
BibTeX

Empirical Inference Book Chapter Protein Classification via Kernel Matrix Completion Kin, T., Kato, T., Tsuda, K. In 261-274, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA; USA, 2004 PDF BibTeX

Empirical Inference Conference Paper Protein Functional Class Prediction with a Combined Graph Shin, H., Tsuda, K., Schölkopf, B. In Proceedings of the Korean Data Mining Conference, 200-219, Proceedings of the Korean Data Mining Conference, 2004
In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex, etc. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semi-supervised learning. The method alternates between minimizing the objective function with respect to network output and with respect to combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.
PDF BibTeX

Empirical Inference Article Protein ranking: from local to global structure in the protein similarity network Weston, J., Elisseeff, A., Zhou, D., Leslie, C., Noble, W. Proceedings of the National Academy of Science, 101(17):6559-6563, 2004
Biologists regularly search databases of DNA or protein sequences for evolutionary or functional relationships to a given query sequence. We describe a ranking algorithm that exploits the entire network structure of similarity relationships among proteins in a sequence database by performing a diffusion operation on a pre-computed, weighted network. The resulting ranking algorithm, evaluated using a human-curated database of protein structures, is efficient and provides significantly better rankings than a local network search algorithm such as PSI-BLAST.
Web BibTeX

Empirical Inference Conference Paper Semi-supervised kernel regression using whitened function classes Franz, M., Kwon, Y., Rasmussen, C., Schölkopf, B. In Pattern Recognition, Proceedings of the 26th DAGM Symposium, Pattern Recognition, Proceedings of the 26th DAGM Symposium, Lecture Notes in Computer Science, Vol. 3175, LNCS 3175:18-26, (Editors: CE Rasmussen and HH Bülthoff and MA Giese and B Schölkopf), Springer, Berlin, Gerrmany, 26th DAGM Symposium, 2004
The use of non-orthonormal basis functions in ridge regression leads to an often undesired non-isotropic prior in function space. In this study, we investigate an alternative regularization technique that results in an implicit whitening of the basis functions by penalizing directions in function space with a large prior variance. The regularization term is computed from unlabelled input data that characterizes the input distribution. Tests on two datasets using polynomial basis functions showed an improved average performance compared to standard ridge regression.
PDF PostScript BibTeX

Empirical Inference Article Some observations on the effects of slant and texture type on slant-from-texture Rosas, P., Wichmann, F., Wagemans, J. Vision Research, 44(13):1511-1535, 2004
We measure the performance of five subjects in a slant-discrimination task for differently textured planes. As textures we used uniform lattices, randomly displaced lattices, circles (polka dots), Voronoi tessellations, plaids, 1/f noise, “coherent” noise and a leopard skin-like texture. Our results show: (1) Improving performance with larger slants for all textures. (2) Thus, following from (1), cases of “non-symmetrical” performance around a particular orientation. (3) For orientations sufficiently slanted, the different textures do not elicit major differences in performance, (4) while for orientations closer to the vertical plane there are marked differences between them. (5) These differences allow a rank-order of textures to be formed according to their “helpfulness”– that is, how easy the discrimination task is when a particular texture is mapped on the plane. Polka dots tend to allow the best slant discrimination performance, noise patterns the worst. Two additional experiments were conducted to test the generality of the obtained rank-order. First, the tilt of the planes was rotated to break the axis of gravity present in the original discrimination experiment. Second, the task was changed to a slant report task via probe adjustment. The results of both control experiments confirmed the texture-based rank-order previously obtained. We comment on the importance of these results for depth perception research in general, and in particular the implications our results have for studies of cue combination (sensor fusion) using texture as one of the cues involved.
PDF BibTeX

Empirical Inference Ph.D. Thesis Statistical Learning with Similarity and Dissimilarity Functions von Luxburg, U. 1-166, Technische Universität Berlin, Germany, Technische Universität Berlin, Germany, 2004 PDF PostScript BibTeX

Empirical Inference Miscellaneous Statistische Lerntheorie und Empirische Inferenz Schölkopf, B. Jahrbuch der Max-Planck-Gesellschaft, 2004:377-382, 2004
Statistical learning theory studies the process of inferring regularities from empirical data. The fundamental problem is what is called generalization: how it is possible to infer a law which will be valid for an infinite number of future observations, given only a finite amount of data? This problem hinges upon fundamental issues of statistics and science in general, such as the problems of complexity of explanations, a priori knowledge, and representation of data.
PDF Web BibTeX

Empirical Inference Technical Report Transductive Inference with Graphs Zhou, D., Schölkopf, B. Max Planck Institute for Biological Cybernetics, 2004, See the improved version Regularization on Discrete Spaces.
We propose a general regularization framework for transductive inference. The given data are thought of as a graph, where the edges encode the pairwise relationships among data. We develop discrete analysis and geometry on graphs, and then naturally adapt the classical regularization in the continuous case to the graph situation. A new and effective algorithm is derived from this general framework, as well as an approach we developed before.
BibTeX

Empirical Inference Conference Paper Unifying Colloborative and Content-Based Filtering. Basilico, J., Hofmann, T. In Proceedings of the 21st International Conference on Machine Learning, ACM International Conference Proceeding Series, 65 , (Editors: Greiner, R. , D. Schuurmans), ACM Press, New York, USA, ICLM, 2004
Collaborative and content-based filtering are two paradigms that have been applied in the context of recommender systems and user preference prediction. This paper proposes a novel, unified approach that systematically integrates all available training information such as past user-item ratings as well as attributes of items or users to learn a prediction function. The key ingredient of our method is the design of a suitable kernel or similarity function between user-item pairs that allows simultaneous generalization across the user and item dimensions. We propose an on-line algorithm (JRank) that generalizes perceptron learning. Experimental results on the EachMovie data set show significant improvements over standard approaches.
PDF BibTeX

Empirical Inference Technical Report Support Vector Channel Selection in BCI Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B. (120), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, December 2003
Designing a Brain Computer Interface (BCI) system one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals we propose the usage of the state of the art feature selection algorithms Recursive Feature Elimination [3] and Zero-Norm Optimization [13] which are based on the training of Support Vector Machines (SVM) [11]. These algorithms can provide more accurate solutions than standard filter methods for feature selection [14]. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.
PDF Web BibTeX

Empirical Inference Poster Texture and haptic cues in slant discrimination: Measuring the effect of texture type on cue combination Rosas, P., Wichmann, F., Ernst, M., Wagemans, J. Journal of Vision, 3(12):26, 2003 Fall Vision Meeting of the Optical Society of America, December 2003
In a number of models of depth cue combination the depth percept is constructed via a weighted average combination of independent depth estimations. The influence of each cue in such average depends on the reliability of the source of information. (Young, Landy, & Maloney, 1993; Ernst & Banks, 2002.) In particular, Ernst & Banks (2002) formulate the combination performed by the human brain as that of the minimum variance unbiased estimator that can be constructed from the available cues. Using slant discrimination and slant judgment via probe adjustment as tasks, we have observed systematic differences in performance of human observers when a number of different types of textures were used as cue to slant (Rosas, Wichmann & Wagemans, 2003). If the depth percept behaves as described above, our measurements of the slopes of the psychometric functions provide the predicted weights for the texture cue for the ranked texture types. We have combined these texture types with object motion but the obtained results are difficult to reconcile with the unbiased minimum variance estimator model (Rosas & Wagemans, 2003). This apparent failure of such model might be explained by the existence of a coupling of texture and motion, violating the assumption of independence of cues. Hillis, Ernst, Banks, & Landy (2002) have shown that while for between-modality combination the human visual system has access to the single-cue information, for within-modality combination (visual cues: disparity and texture) the single-cue information is lost, suggesting a coupling between these cues. Then, in the present study we combine the different texture types with haptic information in a slant discrimination task, to test whether in the between-modality condition the texture cue and the haptic cue to slant are combined as predicted by an unbiased, minimum variance estimator model.
Web DOI BibTeX

Empirical Inference Article Concentration Inequalities for Sub-Additive Functions Using the Entropy Method Bousquet, O. Stochastic Inequalities and Applications, 56:213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.
PostScript BibTeX

Empirical Inference Proceedings Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop (COLT/Kernel 2003), LNCS Vol. 2777 Schölkopf, B., Warmuth, M. Proceedings of the 16th Annual Conference on Learning Theory and 7th Kernel Workshop (COLT/Kernel 2003), COLT/Kernel 2003, 746, Springer, Berlin, Germany, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, November 2003, Lecture Notes in Computer Science ; 2777 DOI BibTeX

Empirical Inference Conference Paper Cluster Kernels for Semi-Supervised Learning Chapelle, O., Weston, J., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 15, 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.
PDF Web BibTeX

Empirical Inference Conference Paper Clustering with the Fisher score Tsuda, K., Kawanabe, M., Müller, K. In Advances in Neural Information Processing Systems 15, Advances in Neural Information Processing Systems 15, 729-736, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
PDF Web BibTeX

Empirical Inference Technical Report Image Reconstruction by Linear Programming Tsuda, K., Rätsch, G. (118), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, October 2003 PDF BibTeX

Empirical Inference Conference Paper Kernel Dependency Estimation Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V. In Advances in Neural Information Processing Systems 15, 873-880, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003 PDF Web BibTeX

Empirical Inference Conference Paper Linear Combinations of Optic Flow Vectors for Estimating Self-Motion: a Real-World Test of a Neural Model Franz, M., Chahl, J. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 15, 1319-1326, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.
PDF Web BibTeX

Empirical Inference Conference Paper Mismatch String Kernels for SVM Protein Classification Leslie, C., Eskin, E., Weston, J., Noble, W. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 15, 1417-1424, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
PDF Web BibTeX

Empirical Inference Conference Paper On the Complexity of Learning the Kernel Matrix Bousquet, O., Herrmann, D. In Advances in Neural Information Processing Systems, 15, Advances in Neural Information Processing Systems 15, 399-406, (Editors: Becker, S. , S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), October 2003
We investigate data based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow to vary the spectrum and keep the eigenvectors fix. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.
PDF Web BibTeX

Empirical Inference Thesis Real-Time Face Detection Kienzle, W. Biologische Kybernetik, Eberhard-Karls-Universitaet Tuebingen, Tuebingen, Germany, October 2003 BibTeX