Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Empirical Inference Conference Paper Evaluating Predictive Uncertainty Challenge Quinonero Candela, J., Rasmussen, C., Sinz, F., Bousquet, O., Schölkopf, B. In Machine Learning Challenges: First PASCAL Machine Learning Challenges Workshop (MLCW 2005), Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment, 1-27, (Editors: J Quiñonero Candela and I Dagan and B Magnini and F d’Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop (MLCW 2005), April 2006
This Chapter presents the PASCAL Evaluating Predictive Uncertainty Challenge, introduces the contributed Chapters by the participants who obtained outstanding results, and provides a discussion with some lessons to be learnt. The Challenge was set up to evaluate the ability of Machine Learning algorithms to provide good “probabilistic predictions”, rather than just the usual “point predictions” with no measure of uncertainty, in regression and classification problems. Parti-cipants had to compete on a number of regression and classification tasks, and were evaluated by both traditional losses that only take into account point predictions and losses we proposed that evaluate the quality of the probabilistic predictions.
PDF Web DOI BibTeX

Empirical Inference Thesis Kernel PCA for Image Compression Huhle, B. Biologische Kybernetik, Eberhard-Karls-Universität, Tübingen, Germany, April 2006 PDF BibTeX

Empirical Inference Conference Paper Learning an Interest Operator from Human Eye Movements Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M. In Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW 2006), CVPWR 2006, page 24, (Editors: C Schmid and S Soatto and C Tomasi), IEEE Computer Society, Los Alamitos, CA, USA, 2006 Conference on Computer Vision and Pattern Recognition Workshop, April 2006
We present an approach for designing interest operators that are based on human eye movement statistics. In contrast to existing methods which use hand-crafted saliency measures, we use machine learning methods to infer an interest operator directly from eye movement data. That way, the operator provides a measure of biologically plausible interestingness. We describe the data collection, training, and evaluation process, and show that our learned saliency measure significantly accounts for human eye movements. Furthermore, we illustrate connections to existing interest operators, and present a multi-scale interest point detector based on the learned function.
PDF Web DOI BibTeX

Empirical Inference Article Phase noise and the classification of natural images Wichmann, F., Braun, D., Gegenfurtner, K. Vision Research, 46(8-9):1520-1529, April 2006
We measured the effect of global phase manipulations on a rapid animal categorization task. The Fourier spectra of our images of natural scenes were manipulated by adding zero-mean random phase noise at all spatial frequencies. The phase noise was the independent variable, uniformly and symmetrically distributed between 0 degree and ±180 degrees. Subjects were remarkably resistant to phase noise. Even with ±120 degree phase noise subjects were still performing at 75% correct. The high resistance of the subjects’ animal categorization rate to phase noise suggests that the visual system is highly robust to such random image changes. The proportion of correct answers closely followed the correlation between original and the phase noise-distorted images. Animal detection rate was higher when the same task was performed with contrast reduced versions of the same natural images, at contrasts where the contrast reduction mimicked that resulting from our phase randomization. Since the subjects’ categorization rate was better in the contrast experiment, reduction of local contrast alone cannot explain the performance in the phase noise experiment. This result obtained with natural images differs from those obtained for simple sinusoidal stimuli were performance changes due to phase changes are attributed to local contrast changes only. Thus the global phasechange accompanying disruption of image structure such as edges and object boundaries at different spatial scales reduces object classification over and above the performance deficit resulting from reducing contrast. Additional colour information improves the categorization performance by 2 %.
PDF Web DOI BibTeX

Empirical Inference Article The Effect of Artifacts on Dependence Measurement in fMRI Gretton, A., Belitski, A., Murayama, Y., Schölkopf, B., Logothetis, N. Magnetic Resonance Imaging, 24(4):401-409, April 2006 PDF Web DOI BibTeX

Empirical Inference Technical Report Training a Support Vector Machine in the Primal Chapelle, O. (147), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006, The version in the "Large Scale Kernel Machines" book is more up to date.
Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and there is no reason for ignoring it. Moreover, from the primal point of view, new families of algorithms for large scale SVM training can be investigated.
PDF BibTeX

Empirical Inference Poster Classification of Natural Scenes: Critical Features Revisited Drewes, J., Wichmann, F., Gegenfurtner, K. 9:92, 9th T{\"u}bingen Perception Conference (TWK 2006), March 2006
Human observers are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. Despite the seeming complexity of such decisions it has been hypothesized that a simple global image feature, the relative abundance of high spatial frequencies at certain orientations, could underly such fast image classification [1]. We successfully used linear discriminant analysis to classify a set of 11.000 images into “animal” and “non-animal” images based on their individual amplitude spectra only [2]. We proceeded to sort the images based on the performance of our classifier, retaining only the best and worst classified 400 images ("best animals", "best distractors" and "worst animals", "worst distractors"). We used a Go/No-go paradigm to evaluate human performance on this subset of our images. Both reaction time and proportion of correctly classified images showed a significant effect of classification difficulty. Images more easily classified by our algorithm were also classified faster and better by humans, as predicted by the Torralba & Oliva hypothesis. We then equated the amplitude spectra of the 400 images, which, by design, reduced algorithmic performance to chance whereas human performance was only slightly reduced [3]. Most importantly, the same images as before were still classified better and faster, suggesting that even in the original condition features other than specifics of the amplitude spectrum made particular images easy to classify, clearly at odds with the Torralba & Oliva hypothesis.
Web BibTeX

Empirical Inference Poster Factorial Coding of Natural Images: How Effective are Linear Models in Removing Higher-Order Dependencies? Bethge, M. 9:90, 9th T{\"u}bingen Perception Conference (TWK 2006), March 2006
The performance of unsupervised learning models for natural images is evaluated quantitatively by means of information theory. We estimate the gain in statistical independence (the multi-information reduction) achieved with independent component analysis (ICA), principal component analysis (PCA), zero-phase whitening, and predictive coding. Predictive coding is translated into the transform coding framework, where it can be characterized by the constraint of a triangular filter matrix. A randomly sampled whitening basis and the Haar wavelet are included into the comparison as well. The comparison of all these methods is carried out for different patch sizes, ranging from 2x2 to 16x16 pixels. In spite of large differences in the shape of the basis functions, we find only small differences in the multi-information between all decorrelation transforms (5% or less) for all patch sizes. Among the second-order methods, PCA is optimal for small patch sizes and predictive coding performs best for large patch sizes. The extra gain achieved with ICA is always less than 2%. In conclusion, the `edge filters‘ found with ICA lead only to a surprisingly small improvement in terms of its actual objective.
Web BibTeX

Empirical Inference Ph.D. Thesis Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning Kuss, M. Biologische Kybernetik, Technische Universität Darmstadt, Darmstadt, Germany, March 2006, passed with distinction, published online PDF BibTeX

Empirical Inference Conference Paper Implicit Volterra and Wiener Series for Higher-Order Image Analysis Franz, M., Schölkopf, B. In Advances in Data Analysis: Proceedings of the 30th Annual Conference of The Gesellschaft für Klassifikation, 30:1, March 2006 (Published)
The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. In addition, the kernel framework allows for estimating infinite series expansions and for the regularized estimation of the Wiener series. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.
PDF BibTeX

Empirical Inference Conference Paper Machine Learning Methods For Estimating Operator Equations Steinke, F., Schölkopf, B. In Proceedings of the 14th IFAC Symposium on System Identification (SYSID 2006), Proceedings of the 14th IFAC Symposium on System Identification (SYSID 2006), 6, (Editors: B Ninness and H Hjalmarsson), Elsevier, Oxford, United Kingdom, 14th IFAC Symposium on System Identification (SYSID 2006), March 2006
We consider the problem of fitting a linear operator induced equation to point sampled data. In order to do so we systematically exploit the duality between minimizing a regularization functional derived from an operator and kernel regression methods. Standard machine learning model selection algorithms can then be interpreted as a search of the equation best fitting given data points. For many kernels this operator induced equation is a linear differential equation. Thus, we link a continuous-time system identification task with common machine learning methods. The presented link opens up a wide variety of methods to be applied to this system identification problem. In a series of experiments we demonstrate an example algorithm working on non-uniformly spaced data, giving special focus to the problem of identifying one system from multiple data recordings.
PDF Web BibTeX

Empirical Inference Article Network-based de-noising improves prediction from microarray data Kato, T., Murata, Y., Miura, K., Asai, K., Horton, P., Tsuda, K., Fujibuchi, W. BMC Bioinformatics, 7(Suppl. 1):S4-S4, March 2006
Prediction of human cell response to anti-cancer drugs (compounds) from microarray data is a challenging problem, due to the noise properties of microarrays as well as the high variance of living cell responses to drugs. Hence there is a strong need for more practical and robust methods than standard methods for real-value prediction. We devised an extended version of the off-subspace noise-reduction (de-noising) method to incorporate heterogeneous network data such as sequence similarity or protein-protein interactions into a single framework. Using that method, we first de-noise the gene expression data for training and test data and also the drug-response data for training data. Then we predict the unknown responses of each drug from the de-noised input data. For ascertaining whether de-noising improves prediction or not, we carry out 12-fold cross-validation for assessment of the prediction performance. We use the Pearson‘s correlation coefficient between the true and predicted respon se values as the prediction performance. De-noising improves the prediction performance for 65% of drugs. Furthermore, we found that this noise reduction method is robust and effective even when a large amount of artificial noise is added to the input data. We found that our extended off-subspace noise-reduction method combining heterogeneous biological data is successful and quite useful to improve prediction of human cell cancer drug responses from microarray data.
PDF PDF DOI BibTeX

Empirical Inference Article Statistical Properties of Kernel Principal Component Analysis Blanchard, G., Bousquet, O., Zwald, L. Machine Learning, 66(2-3):259-294, March 2006
We study the properties of the eigenvalues of Gram matrices in a non-asymptotic setting. Using local Rademacher averages, we provide data-dependent and tight bounds for their convergence towards eigenvalues of the corresponding kernel operator. We perform these computations in a functional analytic framework which allows to deal implicitly with reproducing kernel Hilbert spaces of infinite dimension. This can have applications to various kernel algorithms, such as Support Vector Machines (SVM). We focus on Kernel Principal Component Analysis (KPCA) and, using such techniques, we obtain sharp excess risk bounds for the reconstruction error. In these bounds, the dependence on the decay of the spectrum and on the closeness of successive eigenvalues is made explicit.
PDF PDF DOI BibTeX

Empirical Inference Poster The Pedestal Effect is Caused by Off-Frequency Looking, not Nonlinear Transduction or Contrast Gain-Control Wichmann, F., Henning, G. 9:174, 9th T{\"u}bingen Perception Conference (TWK 2006), March 2006
The pedestal or dipper effect is the large improvement in the detectability of a sinusoidal grating observed when the signal is added to a pedestal or masking grating having the signal‘s spatial frequency, orientation, and phase. The effect is largest with pedestal contrasts just above the ‘threshold’ in the absence of a pedestal. We measured the pedestal effect in both broadband and notched masking noise---noise from which a 1.5-octave band centered on the signal and pedestal frequency had been removed. The pedestal effect persists in broadband noise, but almost disappears with notched noise. The spatial-frequency components of the notched noise that lie above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies that are different from that of the signal and pedestal. Thus the pedestal or dipper effect is not a characteristic of individual spatial-frequency tuned channels.
Web BibTeX

Empirical Inference Technical Report Cross-Validation Optimization for Structured Hessian Kernel Methods Seeger, M., Chapelle, O. Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, February 2006
We address the problem of learning hyperparameters in kernel methods for which the Hessian of the objective is structured. We propose an approximation to the cross-validation log likelihood whose gradient can be computed analytically, solving the hyperparameter learning problem efficiently through nonlinear optimization. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels or to large datasets. When applied to the problem of multi-way classification, our method scales linearly in the number of classes and gives rise to state-of-the-art results on a remote imaging task.
PDF Web BibTeX

Empirical Inference Article Model-based Design Analysis and Yield Optimization Pfingsten, T., Herrmann, D., Rasmussen, C. IEEE Transactions on Semiconductor Manufacturing, 19(4):475-486, February 2006
Fluctuations are inherent to any fabrication process. Integrated circuits and micro-electro-mechanical systems are particularly affected by these variations, and due to high quality requirements the effect on the devices’ performance has to be understood quantitatively. In recent years it has become possible to model the performance of such complex systems on the basis of design specifications, and model-based Sensitivity Analysis has made its way into industrial engineering. We show how an efficient Bayesian approach, using a Gaussian process prior, can replace the commonly used brute-force Monte Carlo scheme, making it possible to apply the analysis to computationally costly models. We introduce a number of global, statistically justified sensitivity measures for design analysis and optimization. Two models of integrated systems serve us as case studies to introduce the analysis and to assess its convergence properties. We show that the Bayesian Monte Carlo scheme can save costly simulation runs and can ensure a reliable accuracy of the analysis.
PDF Web DOI BibTeX

Empirical Inference Article Weighting of experimental evidence in macromolecular structure determination Habeck, M., Rieping, W., Nilges, M. Proceedings of the National Academy of Sciences of the United States of America, 103(6):1756-1761, February 2006
The determination of macromolecular structures requires weighting of experimental evidence relative to prior physical information. Although it can critically affect the quality of the calculated structures, experimental data are routinely weighted on an empirical basis. At present, cross-validation is the most rigorous method to determine the best weight. We describe a general method to adaptively weight experimental data in the course of structure calculation. It is further shown that the necessity to define weights for the data can be completely alleviated. We demonstrate the method on a structure calculation from NMR data and find that the resulting structures are optimal in terms of accuracy and structural quality. Our method is devoid of the bias imposed by an empirical choice of the weight and has some advantages over estimating the weight by cross-validation.
Web DOI BibTeX

Empirical Inference Conference Paper Causal Inference by Choosing Graphs with Most Plausible Markov Kernels Sun, X., Janzing, D., Schölkopf, B. In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics (AI & Math 2006, Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics, 1-11, ISAIM, January 2006
We propose a new inference rule for estimating causal structure that underlies the observed statistical dependencies among n random variables. Our method is based on comparing the conditional distributions of variables given their direct causes (the so-called Markov kernels") for all hypothetical causal directions and choosing the most plausible one. We consider those Markov kernels most plausible, which maximize the (conditional) entropies constrained by their observed first moment (expectation) and second moments (variance and covariance with its direct causes) based on their given domain. In this paper, we discuss our inference rule for causal relationships between two variables in detail, apply it to a real-world temperature data set with known causality and show that our method provides a correct result for the example.
PDF Web BibTeX

Empirical Inference Article Classification of Faces in Man and Machine Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. Neural Computation, 18(1):143-165, January 2006 PDF Web BibTeX

Empirical Inference Book Gaussian Processes for Machine Learning Rasmussen, C., Williams, C. 248, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, USA, January 2006
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
Web BibTeX

Empirical Inference Poster Classification of natural scenes: critical features revisited Drewes, J., Wichmann, F., Gegenfurtner, K. Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48:251, 2006 BibTeX

Empirical Inference Book Chapter Combining a Filter Method with SVMs Lal, T., Chapelle, O., Schölkopf, B. In Feature Extraction: Foundations and Applications, Studies in Fuzziness and Soft Computing, Vol. 207, 439-446, Studies in Fuzziness and Soft Computing ; 207, (Editors: I Guyon and M Nikravesh and S Gunn and LA Zadeh), Springer, Berlin, Germany, 2006
Our goal for the competition (feature selection competition NIPS 2003) was to evaluate the usefulness of simple machine learning techniques. We decided to use the correlation criteria as a feature selection method and Support Vector Machines for the classification part. Here we explain how we chose the regularization parameter C of the SVM, how we determined the kernel parameter and how we estimated the number of features used for each data set. All analyzes were carried out on the training sets of the competition data. We choose the data set Arcene as an example to explain the approach step by step. In our view the point of this competition was the construction of a well performing classifier rather than the systematic analysis of a specific approach. This is why our search for the best classifier was only guided by the described methods and that we deviated from the road map at several occasions. All calculations were done with the software Spider [2004].
PDF DOI BibTeX

Empirical Inference Book Chapter Embedded methods Lal, T., Chapelle, O., Weston, J., Elisseeff, A. In Feature Extraction: Foundations and Applications, 137-165, Studies in Fuzziness and Soft Computing ; 207, (Editors: Guyon, I. , S. Gunn, M. Nikravesh, L. A. Zadeh), Springer, Berlin, Germany, 2006
Embedded methods are a relatively new approach to feature selection. Unlike filter methods, which do not incorporate learning, and wrapper approaches, which can be used with arbitrary classifiers, in embedded methods the features selection part can not be separated from the learning part. Existing embedded methods are reviewed based on a unifying mathematical framework.
PDF Web BibTeX

Autonomous Motion Empirical Inference Conference Paper Learning operational space control Peters, J., Schaal, S. In Robotics: Science and Systems II (RSS 2006), 255-262, (Editors: Gaurav S. Sukhatme and Stefan Schaal and Wolfram Burgard and Dieter Fox), Cambridge, MA: MIT Press, RSS , 2006, clmc
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-covexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exits when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constraint optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasability of our suggested approach.
URL BibTeX

Empirical Inference Proceedings Machine Learning Challenges: evaluating predictive uncertainty, visual object classification and recognising textual entailment Quinonero Candela, J., Dagan, I., Magnini, B., Lauria, F. Proceedings of the First Pascal Machine Learning Challenges Workshop on Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment (MLCW 2005), 462, Lecture Notes in Computer Science, Springer, Heidelberg, Germany, First Pascal Machine Learning Challenges Workshop (MLCW 2005), 2006
This book constitutes the thoroughly refereed post-proceedings of the First PASCAL (pattern analysis, statistical modelling and computational learning) Machine Learning Challenges Workshop, MLCW 2005, held in Southampton, UK in April 2005. The 25 revised full papers presented were carefully selected during two rounds of reviewing and improvement from about 50 submissions. The papers reflect the concepts of three challenges dealt with in the workshop: finding an assessment base on the uncertainty of predictions using classical statistics, Bayesian inference, and statistical learning theory; the second challenge was to recognize objects from a number of visual object classes in realistic scenes; the third challenge of recognizing textual entailment addresses semantic analysis of language to form a generic framework for applied semantic inference in text understanding.
Web DOI BibTeX

Autonomous Motion Empirical Inference Conference Paper Reinforcement Learning for Parameterized Motor Primitives Peters, J., Schaal, S. In Proceedings of the 2006 International Joint Conference on Neural Networks, 73-80, IJCNN, 2006, clmc
One of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
DOI URL BibTeX

Empirical Inference Poster Texture and haptic cues in slant discrimination: combination is sensitive to reliability but not statistically optimal Rosas, P., Wagemans, J., Ernst, M., Wichmann, F. Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen (TeaP 2006), 48:80, 2006 BibTeX

Empirical Inference Poster The pedestal effect is caused by off-frequency looking, not nonlinear transduction or contrast gain-control Wichmann, F., Henning, B. Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48:205, 2006 BibTeX

Empirical Inference Poster Ähnlichkeitsmasse in Modellen zur Kategorienbildung Jäkel, F., Wichmann, F. Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48:223, 2006 BibTeX

Empirical Inference Article A Unifying View of Sparse Approximate Gaussian Process Regression Quinonero Candela, J., Rasmussen, C. Journal of Machine Learning Research, 6:1935-1959, December 2005
We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.
PDF BibTeX

Empirical Inference Conference Paper Kernel ICA for Large Scale Problems Jegelka, S., Gretton, A., Achlioptas, D. In -, NIPS 2005 Workshop on Large Scale Kernel Machines, December 2005 Web BibTeX

Empirical Inference Article Kernel Methods for Measuring Independence Gretton, A., Herbrich, R., Smola, A., Bousquet, O., Schölkopf, B. Journal of Machine Learning Research, 6:2075-2129, December 2005
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
PDF PostScript PDF BibTeX

Empirical Inference Talk Some thoughts about Gaussian Processes Chapelle, O. NIPS 2005 Workshop on Open Problems in Gaussian Processes for Machine Learning, December 2005 PDF Web BibTeX

Empirical Inference Technical Report Popper, Falsification and the VC-dimension Corfield, D., Schölkopf, B., Vapnik, V. (145), Max Planck Institute for Biological Cybernetics, November 2005 PDF BibTeX

Empirical Inference Ph.D. Thesis Extension to Kernel Dependency Estimation with Applications to Robotics BakIr, G. Biologische Kybernetik, Technische Universität Berlin, Berlin, November 2005
Kernel Dependency Estimation(KDE) is a novel technique which was designed to learn mappings between sets without making assumptions on the type of the involved input and output data. It learns the mapping in two stages. In a first step, it tries to estimate coordinates of a feature space representation of elements of the set by solving a high dimensional multivariate regression problem in feature space. Following this, it tries to reconstruct the original representation given the estimated coordinates. This thesis introduces various algorithmic extensions to both stages in KDE. One of the contributions of this thesis is to propose a novel linear regression algorithm that explores low-dimensional subspaces during learning. Furthermore various existing strategies for reconstructing patterns from feature maps involved in KDE are discussed and novel pre-image techniques are introduced. In particular, pre-image techniques for data-types that are of discrete nature such as graphs and strings are investigated. KDE is then explored in the context of robot pose imitation where the input is a an image with a human operator and the output is the robot articulated variables. Thus, using KDE, robot pose imitation is formulated as a regression problem.
PDF PDF BibTeX

Empirical Inference Ph.D. Thesis Geometrical aspects of statistical learning theory Hein, M. Biologische Kybernetik, Darmstadt, Darmstadt, November 2005 PDF BibTeX

Empirical Inference Poster Kernel methods for dependence testing in LFP-MUA Gretton, A., Belitski, A., Murayama, Y., Schölkopf, B., Logothetis, N. 35(689.17), 35th Annual Meeting of the Society for Neuroscience (Neuroscience 2005), November 2005
A fundamental problem in neuroscience is determining whether or not particular neural signals are dependent. The correlation is the most straightforward basis for such tests, but considerable work also focuses on the mutual information (MI), which is capable of revealing dependence of higher orders that the correlation cannot detect. That said, there are other measures of dependence that share with the MI an ability to detect dependence of any order, but which can be easier to compute in practice. We focus in particular on tests based on the functional covariance, which derive from work originally accomplished in 1959 by Renyi. Conceptually, our dependence tests work by computing the covariance between (infinite dimensional) vectors of nonlinear mappings of the observations being tested, and then determining whether this covariance is zero - we call this measure the constrained covariance (COCO). When these vectors are members of universal reproducing kernel Hilbert spaces, we can prove this covariance to be zero only when the variables being tested are independent. The greatest advantage of these tests, compared with the mutual information, is their simplicity – when comparing two signals, we need only take the largest eigenvalue (or the trace) of a product of two matrices of nonlinearities, where these matrices are generally much smaller than the number of observations (and are very simple to construct). We compare the mutual information, the COCO, and the correlation in the context of finding changes in dependence between the LFP and MUA signals in the primary visual cortex of the anaesthetized macaque, during the presentation of dynamic natural stimuli. We demonstrate that the MI and COCO reveal dependence which is not detected by the correlation alone (which we prove by artificially removing all correlation between the signals, and then testing their dependence with COCO and the MI); and that COCO and the MI give results consistent with each other on our data.
Web BibTeX

Empirical Inference Conference Paper Training Support Vector Machines with Multiple Equality Constraints Kienzle, W., Schölkopf, B. In Machine Learning: ECML 2005, Proceedings of the 16th European Conference on Machine Learning, Lecture Notes in Computer Science, Vol. 3720, 182-193, (Editors: JG Carbonell and J Siekmann), Springer, Berlin, Germany, ECML, November 2005
In this paper we present a primal-dual decomposition algorithm for support vector machine training. As with existing methods that use very small working sets (such as Sequential Minimal Optimization (SMO), Successive Over-Relaxation (SOR) or the Kernel Adatron (KA)), our method scales well, is straightforward to implement, and does not require an external QP solver. Unlike SMO, SOR and KA, the method is applicable to a large number of SVM formulations regardless of the number of equality constraints involved. The effectiveness of our algorithm is demonstrated on a more difficult SVM variant in this respect, namely semi-parametric support vector regression.
PDF DOI BibTeX

Empirical Inference Conference Paper Measuring Statistical Dependence with Hilbert-Schmidt Norms Gretton, A., Bousquet, O., Smola, A., Schoelkopf, B. In Algorithmic Learning Theory: 16th International Conference, ALT 2005, Algorithmic Learning Theory, Lecture Notes in Computer Science, Vol. 3734, 63-78, (Editors: S Jain and H-U Simon and E Tomita), Springer, Berlin, Germany, 16th International Conference ALT, October 2005
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on {methodname} do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
PDF DOI BibTeX

Empirical Inference Conference Paper An Analysis of the Anti-Learning Phenomenon for the Class Symmetric Polyhedron Kowalczyk, A., Chapelle, O. In Algorithmic Learning Theory: 16th International Conference, 78-92, Algorithmic Learning Theory, October 2005
This paper deals with an unusual phenomenon where most machine learning algorithms yield good performance on the training set but systematically worse than random performance on the test set. This has been observed so far for some natural data sets and demonstrated for some synthetic data sets when the classification rule is learned from a small set of training samples drawn from some high dimensional space. The initial analysis presented in this paper shows that anti-learning is a property of data sets and is quite distinct from overfitting of a training data. Moreover, the analysis leads to a specification of some machine learning procedures which can overcome anti-learning and generate ma- chines able to classify training and test data consistently.
PDF BibTeX

Empirical Inference Article Assessing Approximate Inference for Binary Gaussian Process Classification Kuss, M., Rasmussen, C. Journal of Machine Learning Research, 6:1679 , October 2005
Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace‘s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace‘s method.
PDF PDF BibTeX

Empirical Inference Article Maximal Margin Classification for Metric Spaces Hein, M., Bousquet, O., Schölkopf, B. Journal of Computer and System Sciences, 71(3):333-359, October 2005
In order to apply the maximum margin method in arbitrary metric spaces, we suggest to embed the metric space into a Banach or Hilbert space and to perform linear classification in this space. We propose several embeddings and recall that an isometric embedding in a Banach space is always possible while an isometric embedding in a Hilbert space is only possible for certain metric spaces. As a result, we obtain a general maximum margin classification algorithm for arbitrary metric spaces (whose solution is approximated by an algorithm of Graepel. Interestingly enough, the embedding approach, when applied to a metric which can be embedded into a Hilbert space, yields the SVM algorithm, which emphasizes the fact that its solution depends on the metric and not on the kernel. Furthermore we give upper bounds of the capacity of the function classes corresponding to both embeddings in terms of Rademacher averages. Finally we compare the capacities of these function classes directly.
PDF PDF DOI BibTeX

Empirical Inference Thesis Implicit Surfaces For Modelling Human Heads Steinke, F. Biologische Kybernetik, Eberhard-Karls-Universität, Tübingen, September 2005 BibTeX

Empirical Inference Poster Classification of natural scenes using global image statistics Drewes, J., Wichmann, F., Gegenfurtner, K. Journal of Vision, 5(8):602, Fifth Annual Meeting of the Vision Sciences Society (VSS 2005), September 2005
The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underly fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003). We used linear discriminant analysis to classify a set of 11.000 images into animal and non-animal images. After applying a DFT to the image, we put the Fourier spectrum into bins (8 orientations with 6 frequency bands each). Using all bins, classification performance on the Fourier spectrum reached 70%. However, performance was similar (67%) when only the high spatial frequency information was used and decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially filtered images. A detailed analysis of the classification weights showed that a relatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical and horizontal orientation at the highest spatial frequency band. Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. If limiting oneself to global image statistics such as the DFT then mostly information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al, VSS 2004).
Web DOI BibTeX

Empirical Inference Article Clustering on the Unit Hypersphere using von Mises-Fisher Distributions Banerjee, A., Dhillon, I., Ghosh, J., Sra, S. Journal of Machine Learning Research, 6:1345-1382, September 2005
Several large scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.
PDF BibTeX

Empirical Inference Article Fast Protein Classification with Multiple Networks Tsuda, K., Shin, H., Schölkopf, B. Bioinformatics, 21(Suppl. 2):59-65, September 2005
Support vector machines (SVM) have been successfully used to classify proteins into functional categories. Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced Lanckriet et al (2004). In SDP/SVM, multiple kernel matrices corresponding to each of data sources are combined with weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are time and memory demanding. Another application-specific drawback arises when some of the data sources are protein networks. A common method of converting the network to a kernel matrix is the diffusion kernel method, which has time complexity of O(n^3), and produces a dense matrix of size n x n. We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a metabolic network, can be directly incorporated. Vectorial data can also be incorporated after conversion into a network by means of neighbor point connection. Similarly to the SDP/SVM method, the combination weights are obtained by convex optimization. Due to the sparsity of network edges, the computation time is nearly linear in the number of edges of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method.
PDF Web DOI BibTeX

Empirical Inference Article Iterative Kernel Principal Component Analysis for Image Modeling Kim, K., Franz, M., Schölkopf, B. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1351-1366, September 2005
In recent years, Kernel Principal Component Analysis (KPCA) has been suggested for various image processing tasks requiring an image model such as, e.g., denoising or compression. The original form of KPCA, however, can be only applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the Kernel Hebbian Algorithm which iteratively estimates the Kernel Principal Components with only linear order memory complexity. In our experiments, we compute models for complex image classes such as faces and natural images which require a large number of training examples. The resulting image models are tested in single-frame super-resolution and denoising applications. The KPCA model is not specifically tailored to these tasks; in fact, the same model can be used in super-resolution with variable input resolution, or denoising with unknown noise characteristics. In spite of this, both super-resolution a nd denoising performance are comparable to existing methods.
Web DOI BibTeX