Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Empirical Inference Article Classification of Faces in Man and Machine Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. Neural Computation, 18(1):143-165, January 2006 PDF Web BibTeX

Empirical Inference Book Gaussian Processes for Machine Learning Rasmussen, C., Williams, C. 248, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, USA, January 2006
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
Web BibTeX

Empirical Inference Poster Classification of natural scenes: critical features revisited Drewes, J., Wichmann, F., Gegenfurtner, K. Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48:251, 2006 BibTeX

Empirical Inference Book Chapter Combining a Filter Method with SVMs Lal, T., Chapelle, O., Schölkopf, B. In Feature Extraction: Foundations and Applications, Studies in Fuzziness and Soft Computing, Vol. 207, 439-446, Studies in Fuzziness and Soft Computing ; 207, (Editors: I Guyon and M Nikravesh and S Gunn and LA Zadeh), Springer, Berlin, Germany, 2006
Our goal for the competition (feature selection competition NIPS 2003) was to evaluate the usefulness of simple machine learning techniques. We decided to use the correlation criteria as a feature selection method and Support Vector Machines for the classification part. Here we explain how we chose the regularization parameter C of the SVM, how we determined the kernel parameter and how we estimated the number of features used for each data set. All analyzes were carried out on the training sets of the competition data. We choose the data set Arcene as an example to explain the approach step by step. In our view the point of this competition was the construction of a well performing classifier rather than the systematic analysis of a specific approach. This is why our search for the best classifier was only guided by the described methods and that we deviated from the road map at several occasions. All calculations were done with the software Spider [2004].
PDF DOI BibTeX

Empirical Inference Book Chapter Embedded methods Lal, T., Chapelle, O., Weston, J., Elisseeff, A. In Feature Extraction: Foundations and Applications, 137-165, Studies in Fuzziness and Soft Computing ; 207, (Editors: Guyon, I. , S. Gunn, M. Nikravesh, L. A. Zadeh), Springer, Berlin, Germany, 2006
Embedded methods are a relatively new approach to feature selection. Unlike filter methods, which do not incorporate learning, and wrapper approaches, which can be used with arbitrary classifiers, in embedded methods the features selection part can not be separated from the learning part. Existing embedded methods are reviewed based on a unifying mathematical framework.
PDF Web BibTeX

Autonomous Motion Empirical Inference Conference Paper Learning operational space control Peters, J., Schaal, S. In Robotics: Science and Systems II (RSS 2006), 255-262, (Editors: Gaurav S. Sukhatme and Stefan Schaal and Wolfram Burgard and Dieter Fox), Cambridge, MA: MIT Press, RSS , 2006, clmc
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-covexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exits when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constraint optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasability of our suggested approach.
URL BibTeX

Empirical Inference Proceedings Machine Learning Challenges: evaluating predictive uncertainty, visual object classification and recognising textual entailment Quinonero Candela, J., Dagan, I., Magnini, B., Lauria, F. Proceedings of the First Pascal Machine Learning Challenges Workshop on Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment (MLCW 2005), 462, Lecture Notes in Computer Science, Springer, Heidelberg, Germany, First Pascal Machine Learning Challenges Workshop (MLCW 2005), 2006
This book constitutes the thoroughly refereed post-proceedings of the First PASCAL (pattern analysis, statistical modelling and computational learning) Machine Learning Challenges Workshop, MLCW 2005, held in Southampton, UK in April 2005. The 25 revised full papers presented were carefully selected during two rounds of reviewing and improvement from about 50 submissions. The papers reflect the concepts of three challenges dealt with in the workshop: finding an assessment base on the uncertainty of predictions using classical statistics, Bayesian inference, and statistical learning theory; the second challenge was to recognize objects from a number of visual object classes in realistic scenes; the third challenge of recognizing textual entailment addresses semantic analysis of language to form a generic framework for applied semantic inference in text understanding.
Web DOI BibTeX

Autonomous Motion Empirical Inference Conference Paper Reinforcement Learning for Parameterized Motor Primitives Peters, J., Schaal, S. In Proceedings of the 2006 International Joint Conference on Neural Networks, 73-80, IJCNN, 2006, clmc
One of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
DOI URL BibTeX

Empirical Inference Poster Texture and haptic cues in slant discrimination: combination is sensitive to reliability but not statistically optimal Rosas, P., Wagemans, J., Ernst, M., Wichmann, F. Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen (TeaP 2006), 48:80, 2006 BibTeX

Empirical Inference Poster The pedestal effect is caused by off-frequency looking, not nonlinear transduction or contrast gain-control Wichmann, F., Henning, B. Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48:205, 2006 BibTeX

Empirical Inference Poster Ähnlichkeitsmasse in Modellen zur Kategorienbildung Jäkel, F., Wichmann, F. Experimentelle Psychologie: Beitr{\"a}ge zur 48. Tagung experimentell arbeitender Psychologen, 48:223, 2006 BibTeX

Empirical Inference Article A Unifying View of Sparse Approximate Gaussian Process Regression Quinonero Candela, J., Rasmussen, C. Journal of Machine Learning Research, 6:1935-1959, December 2005
We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.
PDF BibTeX

Empirical Inference Conference Paper Kernel ICA for Large Scale Problems Jegelka, S., Gretton, A., Achlioptas, D. In -, NIPS 2005 Workshop on Large Scale Kernel Machines, December 2005 Web BibTeX

Empirical Inference Article Kernel Methods for Measuring Independence Gretton, A., Herbrich, R., Smola, A., Bousquet, O., Schölkopf, B. Journal of Machine Learning Research, 6:2075-2129, December 2005
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
PDF PostScript PDF BibTeX

Empirical Inference Talk Some thoughts about Gaussian Processes Chapelle, O. NIPS 2005 Workshop on Open Problems in Gaussian Processes for Machine Learning, December 2005 PDF Web BibTeX

Empirical Inference Technical Report Popper, Falsification and the VC-dimension Corfield, D., Schölkopf, B., Vapnik, V. (145), Max Planck Institute for Biological Cybernetics, November 2005 PDF BibTeX

Empirical Inference Ph.D. Thesis Extension to Kernel Dependency Estimation with Applications to Robotics BakIr, G. Biologische Kybernetik, Technische Universität Berlin, Berlin, November 2005
Kernel Dependency Estimation(KDE) is a novel technique which was designed to learn mappings between sets without making assumptions on the type of the involved input and output data. It learns the mapping in two stages. In a first step, it tries to estimate coordinates of a feature space representation of elements of the set by solving a high dimensional multivariate regression problem in feature space. Following this, it tries to reconstruct the original representation given the estimated coordinates. This thesis introduces various algorithmic extensions to both stages in KDE. One of the contributions of this thesis is to propose a novel linear regression algorithm that explores low-dimensional subspaces during learning. Furthermore various existing strategies for reconstructing patterns from feature maps involved in KDE are discussed and novel pre-image techniques are introduced. In particular, pre-image techniques for data-types that are of discrete nature such as graphs and strings are investigated. KDE is then explored in the context of robot pose imitation where the input is a an image with a human operator and the output is the robot articulated variables. Thus, using KDE, robot pose imitation is formulated as a regression problem.
PDF PDF BibTeX

Empirical Inference Ph.D. Thesis Geometrical aspects of statistical learning theory Hein, M. Biologische Kybernetik, Darmstadt, Darmstadt, November 2005 PDF BibTeX

Empirical Inference Poster Kernel methods for dependence testing in LFP-MUA Gretton, A., Belitski, A., Murayama, Y., Schölkopf, B., Logothetis, N. 35(689.17), 35th Annual Meeting of the Society for Neuroscience (Neuroscience 2005), November 2005
A fundamental problem in neuroscience is determining whether or not particular neural signals are dependent. The correlation is the most straightforward basis for such tests, but considerable work also focuses on the mutual information (MI), which is capable of revealing dependence of higher orders that the correlation cannot detect. That said, there are other measures of dependence that share with the MI an ability to detect dependence of any order, but which can be easier to compute in practice. We focus in particular on tests based on the functional covariance, which derive from work originally accomplished in 1959 by Renyi. Conceptually, our dependence tests work by computing the covariance between (infinite dimensional) vectors of nonlinear mappings of the observations being tested, and then determining whether this covariance is zero - we call this measure the constrained covariance (COCO). When these vectors are members of universal reproducing kernel Hilbert spaces, we can prove this covariance to be zero only when the variables being tested are independent. The greatest advantage of these tests, compared with the mutual information, is their simplicity – when comparing two signals, we need only take the largest eigenvalue (or the trace) of a product of two matrices of nonlinearities, where these matrices are generally much smaller than the number of observations (and are very simple to construct). We compare the mutual information, the COCO, and the correlation in the context of finding changes in dependence between the LFP and MUA signals in the primary visual cortex of the anaesthetized macaque, during the presentation of dynamic natural stimuli. We demonstrate that the MI and COCO reveal dependence which is not detected by the correlation alone (which we prove by artificially removing all correlation between the signals, and then testing their dependence with COCO and the MI); and that COCO and the MI give results consistent with each other on our data.
Web BibTeX

Empirical Inference Conference Paper Training Support Vector Machines with Multiple Equality Constraints Kienzle, W., Schölkopf, B. In Machine Learning: ECML 2005, Proceedings of the 16th European Conference on Machine Learning, Lecture Notes in Computer Science, Vol. 3720, 182-193, (Editors: JG Carbonell and J Siekmann), Springer, Berlin, Germany, ECML, November 2005
In this paper we present a primal-dual decomposition algorithm for support vector machine training. As with existing methods that use very small working sets (such as Sequential Minimal Optimization (SMO), Successive Over-Relaxation (SOR) or the Kernel Adatron (KA)), our method scales well, is straightforward to implement, and does not require an external QP solver. Unlike SMO, SOR and KA, the method is applicable to a large number of SVM formulations regardless of the number of equality constraints involved. The effectiveness of our algorithm is demonstrated on a more difficult SVM variant in this respect, namely semi-parametric support vector regression.
PDF DOI BibTeX

Empirical Inference Conference Paper Measuring Statistical Dependence with Hilbert-Schmidt Norms Gretton, A., Bousquet, O., Smola, A., Schoelkopf, B. In Algorithmic Learning Theory: 16th International Conference, ALT 2005, Algorithmic Learning Theory, Lecture Notes in Computer Science, Vol. 3734, 63-78, (Editors: S Jain and H-U Simon and E Tomita), Springer, Berlin, Germany, 16th International Conference ALT, October 2005
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on {methodname} do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
PDF DOI BibTeX

Empirical Inference Conference Paper An Analysis of the Anti-Learning Phenomenon for the Class Symmetric Polyhedron Kowalczyk, A., Chapelle, O. In Algorithmic Learning Theory: 16th International Conference, 78-92, Algorithmic Learning Theory, October 2005
This paper deals with an unusual phenomenon where most machine learning algorithms yield good performance on the training set but systematically worse than random performance on the test set. This has been observed so far for some natural data sets and demonstrated for some synthetic data sets when the classification rule is learned from a small set of training samples drawn from some high dimensional space. The initial analysis presented in this paper shows that anti-learning is a property of data sets and is quite distinct from overfitting of a training data. Moreover, the analysis leads to a specification of some machine learning procedures which can overcome anti-learning and generate ma- chines able to classify training and test data consistently.
PDF BibTeX

Empirical Inference Article Assessing Approximate Inference for Binary Gaussian Process Classification Kuss, M., Rasmussen, C. Journal of Machine Learning Research, 6:1679 , October 2005
Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace‘s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace‘s method.
PDF PDF BibTeX

Empirical Inference Article Maximal Margin Classification for Metric Spaces Hein, M., Bousquet, O., Schölkopf, B. Journal of Computer and System Sciences, 71(3):333-359, October 2005
In order to apply the maximum margin method in arbitrary metric spaces, we suggest to embed the metric space into a Banach or Hilbert space and to perform linear classification in this space. We propose several embeddings and recall that an isometric embedding in a Banach space is always possible while an isometric embedding in a Hilbert space is only possible for certain metric spaces. As a result, we obtain a general maximum margin classification algorithm for arbitrary metric spaces (whose solution is approximated by an algorithm of Graepel. Interestingly enough, the embedding approach, when applied to a metric which can be embedded into a Hilbert space, yields the SVM algorithm, which emphasizes the fact that its solution depends on the metric and not on the kernel. Furthermore we give upper bounds of the capacity of the function classes corresponding to both embeddings in terms of Rademacher averages. Finally we compare the capacities of these function classes directly.
PDF PDF DOI BibTeX

Empirical Inference Thesis Implicit Surfaces For Modelling Human Heads Steinke, F. Biologische Kybernetik, Eberhard-Karls-Universität, Tübingen, September 2005 BibTeX

Empirical Inference Poster Classification of natural scenes using global image statistics Drewes, J., Wichmann, F., Gegenfurtner, K. Journal of Vision, 5(8):602, Fifth Annual Meeting of the Vision Sciences Society (VSS 2005), September 2005
The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underly fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003). We used linear discriminant analysis to classify a set of 11.000 images into animal and non-animal images. After applying a DFT to the image, we put the Fourier spectrum into bins (8 orientations with 6 frequency bands each). Using all bins, classification performance on the Fourier spectrum reached 70%. However, performance was similar (67%) when only the high spatial frequency information was used and decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially filtered images. A detailed analysis of the classification weights showed that a relatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical and horizontal orientation at the highest spatial frequency band. Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. If limiting oneself to global image statistics such as the DFT then mostly information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al, VSS 2004).
Web DOI BibTeX

Empirical Inference Article Clustering on the Unit Hypersphere using von Mises-Fisher Distributions Banerjee, A., Dhillon, I., Ghosh, J., Sra, S. Journal of Machine Learning Research, 6:1345-1382, September 2005
Several large scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.
PDF BibTeX

Empirical Inference Article Fast Protein Classification with Multiple Networks Tsuda, K., Shin, H., Schölkopf, B. Bioinformatics, 21(Suppl. 2):59-65, September 2005
Support vector machines (SVM) have been successfully used to classify proteins into functional categories. Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced Lanckriet et al (2004). In SDP/SVM, multiple kernel matrices corresponding to each of data sources are combined with weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are time and memory demanding. Another application-specific drawback arises when some of the data sources are protein networks. A common method of converting the network to a kernel matrix is the diffusion kernel method, which has time complexity of O(n^3), and produces a dense matrix of size n x n. We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a metabolic network, can be directly incorporated. Vectorial data can also be incorporated after conversion into a network by means of neighbor point connection. Similarly to the SDP/SVM method, the combination weights are obtained by convex optimization. Due to the sparsity of network edges, the computation time is nearly linear in the number of edges of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method.
PDF Web DOI BibTeX

Empirical Inference Article Iterative Kernel Principal Component Analysis for Image Modeling Kim, K., Franz, M., Schölkopf, B. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1351-1366, September 2005
In recent years, Kernel Principal Component Analysis (KPCA) has been suggested for various image processing tasks requiring an image model such as, e.g., denoising or compression. The original form of KPCA, however, can be only applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the Kernel Hebbian Algorithm which iteratively estimates the Kernel Principal Components with only linear order memory complexity. In our experiments, we compute models for complex image classes such as faces and natural images which require a large number of training examples. The resulting image models are tested in single-frame super-resolution and denoising applications. The KPCA model is not specifically tailored to these tasks; in fact, the same model can be used in super-resolution with variable input resolution, or denoising with unknown noise characteristics. In spite of this, both super-resolution a nd denoising performance are comparable to existing methods.
Web DOI BibTeX

Empirical Inference Poster Learning an Interest Operator from Eye Movements Kienzle, W., Franz, M., Wichmann, F., Schölkopf, B. International Workshop on Bioinspired Information Processing (BIP 2005), 2005:1, September 2005 PDF Web BibTeX

Empirical Inference Ph.D. Thesis Machine Learning Methods for Brain-Computer Interdaces Lal, T. Biologische Kybernetik, University of Darmstadt, September 2005 Web BibTeX

Empirical Inference Poster Rapid animal detection in natural scenes: Critical features are local Wichmann, F., Rosas, P., Gegenfurtner, K. Journal of Vision, 5(8):376, Fifth Annual Meeting of the Vision Sciences Society (VSS 2005), September 2005
Thorpe et al (Nature 381, 1996) first showed how rapidly human observers are able to classify natural images as to whether they contain an animal or not. Whilst the basic result has been replicated using different response paradigms (yes-no versus forced-choice), modalities (eye movements versus button presses) as well as while measuring neurophysiological correlates (ERPs), it is still unclear which image features support this rapid categorisation. Recently Torralba and Oliva (Network: Computation in Neural Systems, 14, 2003) suggested that simple global image statistics can be used to predict seemingly complex decisions about the absence and/or presence of objects in natural scences. They show that the information contained in a small number (N=16) of spectral principal components (SPC)—principal component analysis (PCA) applied to the normalised power spectra of the images—is sufficient to achieve approximately 80% correct animal detection in natural scenes. Our goal was to test whether human observers make use of the power spectrum when rapidly classifying natural scenes. We measured our subjects' ability to detect animals in natural scenes as a function of presentation time (13 to 167 msec); images were immediately followed by a noise mask. In one condition we used the original images, in the other images whose power spectra were equalised (each power spectrum was set to the mean power spectrum over our ensemble of 1476 images). Thresholds for 75% correct animal detection were in the region of 20–30 msec for all observers, independent of the power spectrum of the images: this result makes it very unlikely that human observers make use of the global power spectrum. Taken together with the results of Gegenfurtner, Braun & Wichmann (Journal of Vision [abstract], 2003), showing the robustness of animal detection to global phase noise, we conclude that humans use local features, like edges and contours, in rapid animal detection.
Web DOI BibTeX

Empirical Inference Article Support Vector Machines for 3D Shape Processing Steinke, F., Schölkopf, B., Blanz, V. Computer Graphics Forum, 24(3, EUROGRAPHICS 2005):285-294, September 2005
We propose statistical learning methods for approximating implicit surfaces and computing dense 3D deformation fields. Our approach is based on Support Vector (SV) Machines, which are state of the art in machine learning. It is straightforward to implement and computationally competitive; its parameters can be automatically set using standard machine learning methods. The surface approximation is based on a modified Support Vector regression. We present applications to 3D head reconstruction, including automatic removal of outliers and hole filling. In a second step, we build on our SV representation to compute dense 3D deformation fields between two objects. The fields are computed using a generalized SVMachine enforcing correspondence between the previously learned implicit SV object representations, as well as correspondences between feature points if such points are available. We apply the method to the morphing of 3D heads and other objects.
PDF BibTeX

Empirical Inference Technical Report A Combinatorial View of Graph Laplacians Huang, J. (144), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, August 2005
Discussions about different graph Laplacian, mainly normalized and unnormalized versions of graph Laplacian, have been ardent with respect to various methods in clustering and graph based semi-supervised learning. Previous research on graph Laplacians investigated their convergence properties to Laplacian operators on continuous manifolds. There is still no strong proof on convergence for the normalized Laplacian. In this paper, we analyze different variants of graph Laplacians directly from the ways solving the original graph partitioning problem. The graph partitioning problem is a well-known combinatorial NP hard optimization problem. The spectral solutions provide evidence that normalized Laplacian encodes more reasonable considerations for graph partitioning. We also provide some examples to show their differences.
BibTeX

Empirical Inference Article Phenotypic characterization of chondrosarcoma-derived cell lines Schorle, C., Finger, F., Zien, A., Block, J., Gebhard, P., Aigner, T. Cancer Letters, 226(2):143-154, August 2005
Gene expression profiling of three chondrosarcoma derived cell lines (AD, SM, 105KC) showed an increased proliferative activity and a reduced expression of chondrocytic-typical matrix products compared to primary chondrocytes. The incapability to maintain an adequate matrix synthesis as well as a notable proliferative activity at the same time is comparable to neoplastic chondrosarcoma cells in vivo which cease largely cartilage matrix formation as soon as their proliferative activity increases. Thus, the investigated cell lines are of limited value as substitute of primary chondrocytes but might have a much higher potential to investigate the behavior of neoplastic chondrocytes, i.e. chondrosarcoma biology.
Web BibTeX

Empirical Inference Technical Report Beyond Pairwise Classification and Clustering Using Hypergraphs Zhou, D., Huang, J., Schölkopf, B. (143), Max Planck Institute for Biological Cybernetics, August 2005
In many applications, relationships among objects of interest are more complex than pairwise. Simply approximating complex relationships as pairwise ones can lead to loss of information. An alternative for these applications is to analyze complex relationships among data directly, without the need to first represent the complex relationships into pairwise ones. A natural way to describe complex relationships is to use hypergraphs. A hypergraph is a graph in which edges can connect more than two vertices. Thus we consider learning from a hypergraph, and develop a general framework which is applicable to classification and clustering for complex relational data. We have applied our framework to real-world web classification problems and obtained encouraging results.
PDF BibTeX

Empirical Inference Conference Paper Building Sparse Large Margin Classifiers Wu, M., Schölkopf, B., BakIr, G. In Proceedings of the 22nd International Conference on Machine Learning, 996-1003, (Editors: L De Raedt and S Wrobel ), ACM, New York, NY, USA, ICML 2005 , August 2005
This paper presents an approach to build Sparse Large Margin Classifiers (SLMC) by adding one more constraint to the standard Support Vector Machine (SVM) training problem. The added constraint explicitly controls the sparseness of the classifier and an approach is provided to solve the formulated problem. When considering the dual of this problem, it can be seen that building an SLMC is equivalent to constructing an SVM with a modified kernel function. Further analysis of this kernel function indicates that the proposed approach essentially finds a discriminating subspace that can be spanned by a small number of vectors, and in this subspace different classes of data are linearly well separated. Experimental results over several classification benchmarks show that in most cases the proposed approach outperforms the state-of-art sparse learning algorithms.
PDF DOI BibTeX

Empirical Inference Talk Building Sparse Large Margin Classifiers Wu, M., Schölkopf, B., BakIr, G. The 22nd International Conference on Machine Learning (ICML 2005), August 2005 PDF BibTeX

Empirical Inference Conference Paper Large Margin Non-Linear Embedding Zien, A., Candela, J. In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), ICML 2005, 1065-1072, (Editors: De Raedt, L. , S. Wrobel), ACM Press, New York, NY, USA, 22nd International Conference on Machine Learning, August 2005
It is common in classification methods to first place data in a vector space and then learn decision boundaries. We propose reversing that process: for fixed decision boundaries, we ``learn‘‘ the location of the data. This way we (i) do not need a metric (or even stronger structure) -- pairwise dissimilarities suffice; and additionally (ii) produce low-dimensional embeddings that can be analyzed visually. We achieve this by combining an entropy-based embedding method with an entropy-based version of semi-supervised logistic regression. We present results for clustering and semi-supervised classification.
PDF PostScript Web DOI BibTeX

Empirical Inference Conference Paper Learning from Labeled and Unlabeled Data on a Directed Graph Zhou, D., Huang, J., Schölkopf, B. In Proceedings of the 22nd International Conference on Machine Learning, 1041 -1048, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, August 2005
We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be utilized as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.
PostScript PDF BibTeX

Empirical Inference Talk Learning from Labeled and Unlabeled Data on a Directed Graph Zhou, D. The 22nd International Conference on Machine Learning, August 2005
We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be utilized as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.
PDF BibTeX

Empirical Inference Article Local Rademacher Complexities Bartlett, P., Bousquet, O., Mendelson, S. The Annals of Statistics, 33(4):1497-1537, August 2005
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
PDF PostScript Web BibTeX

Empirical Inference Conference Paper Regularization on Discrete Spaces Zhou, D., Schölkopf, B. In Pattern Recognition, Proceedings of the 27th DAGM Symposium, Pattern Recognition, Lecture Notes in Computer Science, Vol. 3663, 361-368, (Editors: WG Kropatsch and R Sablatnig and A Hanbury), Springer, Berlin, Germany, 27th DAGM Symposium, August 2005
We consider the classification problem on a finite set of objects. Some of them are labeled, and the task is to predict the labels of the remaining unlabeled ones. Such an estimation problem is generally referred to as transductive inference. It is well-known that many meaningful inductive or supervised methods can be derived from a regularization framework, which minimizes a loss function plus a regularization term. In the same spirit, we propose a general discrete regularization framework defined on finite object sets, which can be thought of as the discrete analogue of classical regularization theory. A family of transductive inference schemes is then systemically derived from the framework, including our earlier algorithm for transductive inference, with which we obtained encouraging results on many practical classification problems. The discrete regularization framework is built on the discrete analysis and geometry developed by ourselves, in which a number of discrete differential operators of various orders are constructed, which can be thought of as the discrete analogue of their counterparts in the continuous case.
PDF PostScript DOI BibTeX

Empirical Inference Conference Paper A Machine Learning Approach to Conjoint Analysis Chapelle, O., Harchaoui, Z. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 17, 257-264, (Editors: Saul, L.K. , Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS 2004), July 2005
Choice-based conjoint analysis builds models of consumers preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve more efficiently this problem. Thus, we propose two algorithms to estimate quickly and accurately consumer preferences.
PDF Web BibTeX

Empirical Inference Conference Paper An Auditory Paradigm for Brain-Computer Interfaces Hill, N., Lal, T., Bierig, K., Birbaumer, N., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 17, 569-576, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS 2004), July 2005
Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user's EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.
PDF Web BibTeX

Empirical Inference Conference Paper Breaking SVM Complexity with Cross-Training Bakir, G., Bottou, L., Weston, J. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 17, 81-88, (Editors: Saul, L.K. , Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS 2004), July 2005
We propose an algorithm for selectively removing examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler82). The procedure creates a separable distribution of training examples with minimal impact on the decision boundary position. It breaks the linear dependency between the number of SVs and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages.
PDF Web BibTeX

Empirical Inference Conference Paper Face Detection: Efficient and Rank Deficient Kienzle, W., BakIr, G., Franz, M., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 17, 673-680, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS 2004), July 2005
This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning an entire image, this decreases the computational complexity of a scan by a significant amount. We present experimental results on a standard face detection database.
PDF Web BibTeX

Empirical Inference Conference Paper Implicit Wiener series for higher-order image analysis Franz, M., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 17, 465-472, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS 2004), July 2005
The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.
PDF Web BibTeX