Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Empirical Inference Poster Study of Human Classification using Psychophysics and Machine Learning Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B. 6:149, (Editors: H.H. Bülthoff, K.R. Gegenfurtner, H.A. Mallot, R. Ulrich, F.A. Wichmann), 6. T{\"u}binger Wahrnehmungskonferenz (TWK 2003), February 2003
We attempt to reach a better understanding of classi cation in humans using both psychophysical and machine learning techniques. In our psychophysical paradigm the stimuli presented to the human subjects are modi ed using machine learning algorithms according to their responses. Frontal views of human faces taken from a processed version of the MPI face database are employed for a gender classi cation task. The processing assures that all heads have same mean intensity, same pixel-surface area and are centered. This processing stage is followed by a smoothing of the database in order to eliminate, as much as possible, scanning artifacts. Principal Component Analysis is used to obtain a low-dimensional representation of the faces in the database. A subject is asked to classify the faces and experimental parameters such as class (i.e. female/male), con dence ratings and reaction times are recorded. A mean classi cation error of 14.5% is measured and, on average, 0.5 males are classi ed as females and 21.3females as males. The mean reaction time for the correctly classi ed faces is 1229 +- 252 [ms] whereas the incorrectly classi ed faces have a mean reaction time of 1769 +- 304 [ms] showing that the reaction times increase with the subject's classi- cation error. Reaction times are also shown to decrease with increasing con dence, both for the correct and incorrect classi cations. Classi cation errors, reaction times and con dence ratings are then correlated to concepts of machine learning such as separating hyperplane obtained when considering Support Vector Machines, Relevance Vector Machines, boosted Prototype and K-means Learners. Elements near the separating hyperplane are found to be classi ed with more errors than those away from it. In addition, the subject's con dence increases when moving away from the hyperplane. A preliminary analysis on the available small number of subjects indicates that K-means classi cation seems to re ect the subject's classi cation behavior best. The above learnersare then used to generate \special" elements, or representations, of the low-dimensional database according to the labels given by the subject. A memory experiment follows where the representations are shown together with faces seen or unseen during the classi cation experiment. This experiment aims to assess the representations by investigating whether some representations, or special elements, are classi ed as \seen before" despite that they never appeared in the classi cation experiment, possibly hinting at their use during human classi cation.
PDF Web BibTeX

Empirical Inference Conference Paper Hierarchical Spatio-Temporal Morphable Models for Representation of complex movements for Imitation Learning Ilg, W., Bakir, G., Franz, M., Giese, M. In 11th International Conference on Advanced Robotics, (2)453-458, (Editors: Nunes, U., A. de Almeida, A. Bejczy, K. Kosuge and J.A.T. Machado), 11th International Conference on Advanced Robotics, January 2003 PDF BibTeX

Empirical Inference Technical Report A Note on Parameter Tuning for On-Line Shifting Algorithms Bousquet, O. Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003
In this short note, building on ideas of M. Herbster [2] we propose a method for automatically tuning the parameter of the FIXED-SHARE algorithm proposed by Herbster and Warmuth [3] in the context of on-line learning with shifting experts. We show that this can be done with a memory requirement of $O(nT)$ and that the additional loss incurred by the tuning is the same as the loss incurred for estimating the parameter of a Bernoulli random variable.
PDF PostScript BibTeX

Empirical Inference Book Chapter A Short Introduction to Learning with Kernels Schölkopf, B., Smola, A. In Proceedings of the Machine Learning Summer School, Lecture Notes in Artificial Intelligence, Vol. 2600, 41-64, LNAI 2600, (Editors: S Mendelson and AJ Smola), Springer, Berlin, Heidelberg, Germany, 2003 BibTeX

Empirical Inference Book Chapter An Introduction to Support Vector Machines Schölkopf, B. In Recent Advances and Trends in Nonparametric Statistics , 3-17, (Editors: MG Akritas and DN Politis), Elsevier, Amsterdam, The Netherlands, 2003 Web DOI BibTeX

Empirical Inference Article An Introduction to Variable and Feature Selection. Guyon, I., Elisseeff, A. Journal of Machine Learning, 3:1157-1182, 2003 BibTeX

Empirical Inference Book Chapter Bayesian Kernel Methods Smola, A., Schölkopf, B. In Advanced Lectures on Machine Learning, Machine Learning Summer School 2002, Lecture Notes in Computer Science, Vol. 2600, LNAI 2600:65-117, 0, (Editors: S Mendelson and AJ Smola), Springer, Berlin, Germany, 2003 DOI BibTeX

Empirical Inference Conference Paper Distance-based classification with Lipschitz functions von Luxburg, U., Bousquet, O. In Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, 314-328, (Editors: Schölkopf, B. and M.K. Warmuth), Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, 2003
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. Our approach leads to a general large margin algorithm for classification in metric spaces. To analyze this algorithm, we first prove a representer theorem. It states that there exists a solution which can be expressed as linear combination of distances to sets of training points. Then we analyze the Rademacher complexity of some Lipschitz function classes. The generality of the Lipschitz approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz algorithm, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.
PDF PostScript BibTeX

Empirical Inference Book Chapter Extension of the nu-SVM range for classification Perez-Cruz, F., Weston, J., Herrmann, D., Schölkopf, B. In Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, Vol. 190, 190:179-196, NATO Science Series III: Computer and Systems Sciences, (Editors: J Suykens and G Horvath and S Basu and C Micchelli and J Vandewalle), IOS Press, Amsterdam, 2003 BibTeX

Empirical Inference Conference Paper Feature Selection for Support Vector Machines by Means of Genetic Algorithms Fröhlich, H., Chapelle, O., Schölkopf, B. In 15th IEEE International Conference on Tools with AI, 142-148, 15th IEEE International Conference on Tools with AI, 2003 BibTeX

Empirical Inference Technical Report Interactive Images Toyama, K., Schölkopf, B. (MSR-TR-2003-64), Microsoft Research, Cambridge, UK, 2003
Interactive Images are a natural extension of three recent developments: digital photography, interactive web pages, and browsable video. An interactive image is a multi-dimensional image, displayed two dimensions at a time (like a standard digital image), but with which a user can interact to browse through the other dimensions. One might consider a standard video sequence viewed with a video player as a simple interactive image with time as the third dimension. Interactive images are a generalization of this idea, in which the third (and greater) dimensions may be focus, exposure, white balance, saturation, and other parameters. Interaction is handled via a variety of modes including those we call ordinal, pixel-indexed, cumulative, and comprehensive. Through exploration of three novel forms of interactive images based on color, exposure, and focus, we will demonstrate the compelling nature of interactive images.
Web BibTeX

Empirical Inference Conference Paper Kernel Methods and Their Applications to Signal Processing Bousquet, O., Perez-Cruz, F. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP ‘03), Proceedings. (ICASSP ‘03), Special Session on Kernel Methods:860 , ICASSP, 2003
Recently introduced in Machine Learning, the notion of kernels has drawn a lot of interest as it allows to obtain non-linear algorithms from linear ones in a simple and elegant manner. This, in conjunction with the introduction of new linear classification methods such as the Support Vector Machines has produced significant progress. The successes of such algorithms is now spreading as they are applied to more and more domains. Many Signal Processing problems, by their non-linear and high-dimensional nature may benefit from such techniques. We give an overview of kernel methods and their recent applications.
PDF PostScript BibTeX

Empirical Inference Poster Models of contrast transfer as a function of presentation time and spatial frequency. Wichmann, F. 2003
Understanding contrast transduction is essential for understanding spatial vision. Using standard 2AFC contrast discrimination experiments conducted using a carefully calibrated display we previously showed that the shape of the threshold versus (pedestal) contrast (TvC) curve changes with presentation time and the performance level defined as threshold (Wichmann, 1999; Wichmann & Henning, 1999). Additional experiments looked at the change of the TvC curve with spatial frequency (Bird, Henning & Wichmann, 2002), and at how to constrain the parameters of models of contrast processing (Wichmann, 2002). Here I report modelling results both across spatial frequency and presentation time. An extensive model-selection exploration was performed using Bayesian confidence regions for the fitted parameters as well as cross-validation methods. Bird, C.M., G.B. Henning and F.A. Wichmann (2002). Contrast discrimination with sinusoidal gratings of different spatial frequency. Journal of the Optical Society of America A, 19, 1267-1273. Wichmann, F.A. (1999). Some aspects of modelling human spatial vision: contrast discrimination. Unpublished doctoral dissertation, The University of Oxford. Wichmann, F.A. & Henning, G.B. (1999). Implications of the Pedestal Effect for Models of Contrast-Processing and Gain-Control. OSA Annual Meeting Program, 62. Wichmann, F.A. (2002). Modelling Contrast Transfer in Spatial Vision [Abstract]. Journal of Vision, 2, 7a.
BibTeX

Empirical Inference Article New Approaches to Statistical Learning Theory Bousquet, O. Annals of the Institute of Statistical Mathematics, 55(2):371-389, 2003
We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools allow to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.
PostScript BibTeX

Empirical Inference Conference Paper Predictive control with Gaussian process models Kocijan, J., Murray-Smith, R., Rasmussen, C., Likar, B. In Proceedings of IEEE Region 8 Eurocon 2003: Computer as a Tool, 352-356, (Editors: Zajc, B. and M. Tkal), Proceedings of IEEE Region 8 Eurocon 2003: Computer as a Tool, 2003
This paper describes model-based predictive control based on Gaussian processes.Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. It offers more insight in variance of obtained model response, as well as fewer parameters to determine than other models. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. This property is used in predictive control, where optimisation of control signal takes the variance information into account. The predictive control principle is demonstrated on a simulated example of nonlinear system.
PDF PostScript BibTeX

Empirical Inference Conference Paper Propagation of Uncertainty in Bayesian Kernel Models - Application to Multiple-Step Ahead Forecasting Quiñonero-Candela, J., Girard, A., Larsen, J., Rasmussen, C. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2:701-704, IEEE International Conference on Acoustics, Speech and Signal Processing, 2003
The object of Bayesian modelling is the predictive distribution, which in a forecasting scenario enables improved estimates of forecasted values and their uncertainties. In this paper we focus on reliably estimating the predictive mean and variance of forecasted values using Bayesian kernel based models such as the Gaussian Process and the Relevance Vector Machine. We derive novel analytic expressions for the predictive mean and variance for Gaussian kernel shapes under the assumption of a Gaussian input distribution in the static case, and of a recursive Gaussian predictive density in iterative forecasting. The capability of the method is demonstrated for forecasting of time-series and compared to approximate methods.
PDF PostScript BibTeX

Empirical Inference Book Chapter Stability of ensembles of kernel machines Elisseeff, A., Pontil, M. In 190:111-124, NATO Science Series III: Computer and Systems Science, (Editors: Suykens, J., G. Horvath, S. Basu, C. Micchelli and J. Vandewalle), IOS press, Netherlands, 2003 BibTeX

Empirical Inference Book Chapter Statistical Learning and Kernel Methods in Bioinformatics Schölkopf, B., Guyon, I., Weston, J. In Artificial Intelligence and Heuristic Methods in Bioinformatics, 183:1-21, 3, (Editors: P Frasconi und R Shamir), IOS Press, Amsterdam, The Netherlands, 2003 BibTeX

Empirical Inference Conference Paper Unsupervised Clustering of Images using their Joint Segmentation Seldin, Y., Starik, S., Werman, M. In The 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV 2003), 1-24, 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV 2003), 2003 PDF Web BibTeX

Empirical Inference Thesis m-Alternative Forced Choice—Improving the Efficiency of the Method of Constant Stimuli Jäkel, F. Biologische Kybernetik, Graduate School for Neural and Behavioural Sciences, Tübingen, 2003 BibTeX

Empirical Inference Book Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond Schölkopf, B., Smola, A. 644, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, USA, December 2002, Parts of this book, including an introduction to kernel methods, can be downloaded here.
In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs-kernels—for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.
Web DOI BibTeX

Empirical Inference Conference Paper Gender Classification of Human Faces Graf, A., Wichmann, F. In Biologically Motivated Computer Vision 2002, Biologically Motivated Computer Vision, 1-18, (Editors: Bülthoff, H. H., S.W. Lee, T. A. Poggio and C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV 2002), November 2002
This paper addresses the issue of combining pre-processing methods—dimensionality reduction using Principal Component Analysis (PCA) and Locally Linear Embedding (LLE)—with Support Vector Machine (SVM) classification for a behaviorally important task in humans: gender classification. A processed version of the MPI head database is used as stimulus set. First, summary statistics of the head database are studied. Subsequently the optimal parameters for LLE and the SVM are sought heuristically. These values are then used to compare the original face database with its processed counterpart and to assess the behavior of a SVM with respect to changes in illumination and perspective of the face images. Overall, PCA was superior in classification performance and allowed linear separability.
PDF PDF DOI BibTeX

Empirical Inference Conference Paper Insect-Inspired Estimation of Self-Motion Franz, M., Chahl, J. In Proc. 2nd Workshop on Biologically Motivated Computer Vision 2002, BMCV 2002, Biologically Motivated Computer Vision, (2525)171-180, LNCS, (Editors: Bülthoff, H.H. , S.W. Lee, T.A. Poggio, C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV 2002), November 2002
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an optimal linear estimator incorporating prior knowledge about the environment. The optimal estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.
PDF PDF DOI BibTeX

Empirical Inference Poster Modelling Contrast Transfer in Spatial Vision Wichmann, F. Journal of Vision, 2(10):7, Second Annual Meeting of the Vision Sciences Society (VSS 2002), November 2002
Much of our information about spatial vision comes from detection experiments involving low-contrast stimuli. Contrast discrimination experiments provide one way to explore the visual system's response to stimuli of higher contrast, the results of which allow different models of contrast processing (e.g. energy versus gain-control models) to be critically assessed (Wichmann & Henning, 1999). Studies of detection and discrimination using pulse train stimuli in noise, on the other hand, make predictions about the number, position and properties of noise sources within the processing stream (Henning, Bird & Wichmann, 2002). Here I report modelling results combining data from both sinusoidal and pulse train experiments in and without noise to arrive at a more tightly constrained model of early spatial vision.
Web DOI BibTeX

Empirical Inference Poster Pulse train detection and discrimination in pink noise Henning, G., Wichmann, F., Bird, C. Journal of Vision, 2(7):229, Second Annual Meeting of the Vision Sciences Society (VSS 2002), November 2002
Much of our information about spatial vision comes from detection experiments involving low-contrast stimuli. Contrast discrimination experiments provide one way to explore the visual system's response to stimuli of higher contrast. We explored both detection and contrast discrimination performance with sinusoidal and "pulse-train" (or line) gratings. Both types of grating had a fundamental spatial frequency of 2.09-c/deg but the pulse-train, ideally, contains, in addition to its fundamental component, all the harmonics of the fundamental. Although the 2.09-c/deg pulse-train produced on the display was measured and shown to contain at least 8 harmonics at equal contrast, it was no more detectable than its most detectable component; no benefit from having additional information at the harmonics was measurable. The addition of broadband "pink" noise, designed to equalize the detectability of the components of the pulse train, made it about a factor of four more detectable than any of its components. However, in contrast-discrimination experiments, with an in-phase pedestal or masking grating of the same form and phase as the signal and 15% contrast, the noise did not improve the discrimination performance of the pulse train relative to that of its sinusoidal components. In contrast, a 2.09-c/deg "super train," constructed to have 8 equally detectable harmonics, was a factor of five more detectable than any of its components. We discuss the implications of these observations for models of early vision in particular the implications for possible sources of internal noise.
Web DOI BibTeX

Empirical Inference Poster Surface-slant-from-texture discrimination: Effects of slant level and texture type Rosas, P., Wichmann, F., Wagemans, J. Journal of Vision, 2(7):300, Second Annual Meeting of the Vision Sciences Society (VSS 2002), November 2002
The problem of surface-slant-from-texture was studied psychophysically by measuring the performances of five human subjects in a slant-discrimination task with a number of different types of textures: uniform lattices, randomly displaced lattices, polka dots, Voronoi tessellations, orthogonal sinusoidal plaid patterns, fractal or 1/f noise, “coherent” noise and a “diffusion-based” texture (leopard skin-like). The results show: (1) Improving performance with larger slants for all textures. (2) A “non-symmetrical” performance around a particular slant characterized by a psychometric function that is steeper in the direction of the more slanted orientation. (3) For sufficiently large slants (66 deg) there are no major differences in performance between any of the different textures. (4) For slants at 26, 37 and 53 degrees, however, there are marked differences between the different textures. (5) The observed differences in performance across textures for slants up to 53 degrees are systematic within subjects, and nearly so across them. This allows a rank-order of textures to be formed according to their “helpfulness” — that is, how easy the discrimination task is when a particular texture is mapped on the surface. Polka dots tended to allow the best slant discrimination performance, noise patterns the worst up to the large slant of 66 degrees at which performance was almost independent of the particular texture chosen. Finally, our large number of 2AFC trials (approximately 2800 trials per texture across subjects) and associated tight confidence intervals may enable us to find out about which statistical properties of the textures could be responsible for surface-slant-from-texture estimation, with the ultimate goal of being able to predict observer performance for any arbitrary texture.
Web DOI BibTeX

Empirical Inference Conference Paper Combining sensory Information to Improve Visualization Ernst, M., Banks, M., Wichmann, F., Maloney, L., Bülthoff, H. In Proceedings of the Conference on Visualization ‘02 (VIS ‘02), 571-574, (Editors: Moorhead, R. , M. Joy), IEEE, Piscataway, NJ, USA, IEEE Conference on Visualization 2002 (VIS '02), October 2002
Seemingly effortlessly the human brain reconstructs the three-dimensional environment surrounding us from the light pattern striking the eyes. This seems to be true across almost all viewing and lighting conditions. One important factor for this apparent easiness is the redundancy of information provided by the sensory organs. For example, perspective distortions, shading, motion parallax, or the disparity between the two eyes' images are all, at least partly, redundant signals which provide us with information about the three-dimensional layout of the visual scene. Our brain uses all these different sensory signals and combines the available information into a coherent percept. In displays visualizing data, however, the information is often highly reduced and abstracted, which may lead to an altered perception and therefore a misinterpretation of the visualized data. In this panel we will discuss mechanisms involved in the combination of sensory information and their implications for simulations using computer displays, as well as problems resulting from current display technology such as cathode-ray tubes.
PDF Web BibTeX

Empirical Inference Article Constructing Boosting algorithms from SVMs: an application to one-class classification. Rätsch, G., Mika, S., Schölkopf, B., Müller, K. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1184-1199, September 2002
We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm—one-class leveraging—starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
DOI BibTeX

Empirical Inference Conference Paper Incorporating Invariances in Non-Linear Support Vector Machines Chapelle, O., Schölkopf, B. In Advances in Neural Information Processing Systems, Advances in Neural Information Processing Systems 14, 609-616, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS 2001), September 2002
The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice.
PDF Web BibTeX

Empirical Inference Technical Report Kernel Dependency Estimation Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V. (98), Max Planck Institute for Biological Cybernetics, August 2002
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.
PDF BibTeX

Empirical Inference Poster Phase information in the recognition of natural images Braun, D., Wichmann, F., Gegenfurtner, K. Perception, 31(ECVP Abstract Supplement):133, 25th European Conference on Visual Perception, August 2002
Fourier phase plays an important role in determining global image structure. For example, when the phase spectrum of an image of a flower is swapped with that of a tank, we usually perceive a tank, even though the amplitude spectrum is still that of the flower. Similarly, when the phase spectrum of an image is randomly swapped across frequencies, that is its Fourier energy is randomly distributed over the image, the resulting image becomes impossible to recognise. Our goal was to evaluate the effect of phase manipulations in a quantitative manner. Subjects viewed two images of natural scenes, one of which contained an animal (the target) embedded in the background. The spectra of the images were manipulated by adding random phase noise at each frequency. The phase noise was the independent variable, uniformly distributed between 0° and ±180°. Subjects were remarkably resistant to phase noise. Even with ±120° noise, subjects were still 75% correct. The proportion of correct answers closely followed the correlation between original and noise-distorted images. Thus it appears as if it was not the global phase information per se that determines our percept of natural images, but rather the effect of phase on local image features.
Web BibTeX

Empirical Inference Article The contributions of color to recognition memory for natural scenes Wichmann, F., Sharpe, L., Gegenfurtner, K. Journal of Experimental Psychology: Learning, Memory and Cognition, 28(3):509-520, May 2002
The authors used a recognition memory paradigm to assess the influence of color information on visual memory for images of natural scenes. Subjects performed 5-10% better for colored than for black-and-white images independent of exposure duration. Experiment 2 indicated little influence of contrast once the images were suprathreshold, and Experiment 3 revealed that performance worsened when images were presented in color and tested in black and white, or vice versa, leading to the conclusion that the surface property color is part of the memory representation. Experiments 4 and 5 exclude the possibility that the superior recognition memory for colored images results solely from attentional factors or saliency. Finally, the recognition memory advantage disappears for falsely colored images of natural scenes: The improvement in recognition memory depends on the color congruence of presented images with learned knowledge about the color gamut found within natural scenes. The results can be accounted for within a multiple memory systems framework.
PDF Web DOI BibTeX

Empirical Inference Poster Detection and discrimination in pink noise Wichmann, F., Henning, G. 5:100, 5. T{\"u}binger Wahrnehmungskonferenz (TWK 2002), February 2002
Much of our information about early spatial vision comes from detection experiments involving low-contrast stimuli, which are not, perhaps, particularly "natural" stimuli. Contrast discrimination experiments provide one way to explore the visual system's response to stimuli of higher contrast whilst keeping the number of unknown parameters comparatively small. We explored both detection and contrast discrimination performance with sinusoidal and "pulse-train" (or line) gratings. Both types of grating had a fundamental spatial frequency of 2.09-c/deg but the pulse-train, ideally, contains, in addition to its fundamental component, all the harmonics of the fundamental. Although the 2.09-c/deg pulse-train produced on our display was measured using a high-performance digital camera (Photometrics) and shown to contain at least 8 harmonics at equal contrast, it was no more detectable than its most detectable component; no benefit from having additional information at the harmonics was measurable. The addition of broadband 1-D "pink" noise made it about a factor of four more detectable than any of its components. However, in contrast-discrimination experiments, with an in-phase pedestal or masking grating of the same form and phase as the signal and 15% contrast, the noise did not improve the discrimination performance of the pulse train relative to that of its sinusoidal components. We discuss the implications of these observations for models of early vision in particular the implications for possible sources of internal noise.
Web BibTeX

Empirical Inference Article Training invariant support vector machines DeCoste, D., Schölkopf, B. Machine Learning, 46(1-3):161-190, January 2002
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
PDF DOI BibTeX

Empirical Inference Technical Report A compression approach to support vector model selection von Luxburg, U., Bousquet, O., Schölkopf, B. (101), Max Planck Institute for Biological Cybernetics, 2002, see more detailed JMLR version
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct ``compression coefficients'' for SVMs, which measure the amount by which the training labels can be compressed by some classification hypothesis. The main idea is to relate the coding precision of this hypothesis to the width of the margin of the SVM. The compression coefficients connect well known quantities such as the radius-margin ratio R^2/rho^2, the eigenvalues of the kernel matrix and the number of support vectors. To test whether they are useful in practice we ran model selection experiments on several real world datasets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.
BibTeX

Empirical Inference Conference Paper A kernel approach for learning from almost orthogonal patterns Schölkopf, B., Weston, J., Eskin, E., Leslie, C., Noble, W. In 13th European Conference on Machine Learning (ECML 2002) and 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'2002), Helsinki, Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, 2430/2431:511-528, Lecture Notes in Computer Science, (Editors: T Elomaa and H Mannila and H Toivonen), Springer, Berlin, Germany, 13th European Conference on Machine Learning (ECML 2002) and 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'2002), 2002 PostScript DOI BibTeX

Empirical Inference Poster Application of Monte Carlo Methods to Psychometric Function Fitting Wichmann, F. Proceedings of the 33rd European Conference on Mathematical Psychology, 44, 2002
The psychometric function relates an observer's performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. Here I describe methods to (1) fitting psychometric functions, (2) assessing goodness-of-fit, and (3) providing confidence intervals for the function's parameters and other estimates derived from them. First I describe a constrained maximum-likelihood method for parameter estimation. Using Monte-Carlo simulations I demonstrate that it is important to have a fitting method that takes stimulus-independent errors (or "lapses") into account. Second, a number of goodness-of-fit tests are introduced. Because psychophysical data sets are usually rather small I advocate the use of Monte Carlo resampling techniques that do not rely on asymptotic theory for goodness-of-fit assessment. Third, a parametric bootstrap is employed to estimate the variability of fitted parameters and derived quantities such as thresholds and slopes. I describe how the bootstrap bridging assumption, on which the validity of the procedure depends, can be tested without incurring too high a cost in computation time. Finally I describe how the methods can be extended to test hypotheses concerning the form and shape of several psychometric functions. Software describing the methods is available (http://www.bootstrap-software.com/psignifit/), as well as articles describing the methods in detail (Wichmann&Hill, Perception&Psychophysics, 2001a,b).
BibTeX

Empirical Inference Article Contrast discrimination with pulse-trains in pink noise Henning, G., Bird, C., Wichmann, F. Journal of the Optical Society of America A, 19(7):1259-1266, 2002
Detection performance was measured with sinusoidal and pulse-train gratings. Although the 2.09-c/deg pulse-train, or line gratings, contained at least 8 harmonics all at equal contrast, they were no more detectable than their most detectable component. The addition of broadband pink noise designed to equalize the detectability of the components of the pulse train made the pulse train about a factor of four more detectable than any of its components. However, in contrast-discrimination experiments, with a pedestal or masking grating of the same form and phase as the signal and 15% contrast, the noise did not affect the discrimination performance of the pulse train relative to that obtained with its sinusoidal components. We discuss the implications of these observations for models of early vision in particular the implications for possible sources of internal noise.
PDF BibTeX

Empirical Inference Article Contrast discrimination with sinusoidal gratings of different spatial frequency Bird, C., Henning, G., Wichmann, F. Journal of the Optical Society of America A, 19(7):1267-1273, 2002
The detectability of contrast increments was measured as a function of the contrast of a masking or “pedestal” grating at a number of different spatial frequencies ranging from 2 to 16 cycles per degree of visual angle. The pedestal grating always had the same orientation, spatial frequency and phase as the signal. The shape of the contrast increment threshold versus pedestal contrast (TvC) functions depend of the performance level used to define the “threshold,” but when both axes are normalized by the contrast corresponding to 75% correct detection at each frequency, the (TvC) functions at a given performance level are identical. Confidence intervals on the slope of the rising part of the TvC functions are so wide that it is not possible with our data to reject Weber’s Law.
PDF BibTeX

Empirical Inference Conference Paper Luminance Artifacts on CRT Displays Wichmann, F. In IEEE Visualization, 571-574, (Editors: Moorhead, R.; Gross, M.; Joy, K. I.), IEEE Visualization, 2002
Most visualization panels today are still built around cathode-ray tubes (CRTs), certainly on personal desktops at work and at home. Whilst capable of producing pleasing images for common applications ranging from email writing to TV and DVD presentation, it is as well to note that there are a number of nonlinear transformations between input (voltage) and output (luminance) which distort the digital and/or analogue images send to a CRT. Some of them are input-independent and hence easy to fix, e.g. gamma correction, but others, such as pixel interactions, depend on the content of the input stimulus and are thus harder to compensate for. CRT-induced image distortions cause problems not only in basic vision research but also for applications where image fidelity is critical, most notably in medicine (digitization of X-ray images for diagnostic purposes) and in forms of online commerce, such as the online sale of images, where the image must be reproduced on some output device which will not have the same transfer function as the customer's CRT. I will present measurements from a number of CRTs and illustrate how some of their shortcomings may be problematic for the aforementioned applications.
BibTeX

Empirical Inference Poster Optimal linear estimation of self-motion - a real-world test of a model of fly tangential neurons Franz, M. SAB 02 Workshop, Robotics as theoretical biology, 7th meeting of the International Society for Simulation of Adaptive Behaviour (SAB), (Editors: Prescott, T.; Webb, B.), 2002
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion (see example in Fig.1). We examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an optimal linear estimator incorporating prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The optimal estimator is tested on a gantry carrying an omnidirectional vision sensor that can be moved along three translational and one rotational degree of freedom. The experiments indicate that the proposed approach yields accurate results for rotation estimates, independently of the current translation and scene layout. Translation estimates, however, turned out to be sensitive to simultaneous rotation and to the particular distance distribution of the scene. The gantry experiments confirm that the receptive field organization of the tangential neurons allows them, as an ensemble, to extract self-motion from the optic flow.
PDF BibTeX

Empirical Inference Article Regularized principal manifolds Smola, A., Mika, S., Schölkopf, B., Williamson, R. Journal of Machine Learning Research, 1:179-209, June 2001
Many settings of unsupervised learning can be viewed as quantization problems - the minimization of the expected quantization error subject to some restrictions. This allows the use of tools such as regularization from the theory of (supervised) risk minimization for unsupervised learning. This setting turns out to be closely related to principal curves, the generative topographic map, and robust coding. We explore this connection in two ways: (1) we propose an algorithm for finding principal manifolds that can be regularized in a variety of ways; and (2) we derive uniform convergence bounds and hence bounds on the learning rates of the algorithm. In particular, we give bounds on the covering numbers which allows us to obtain nearly optimal learning rates for certain types of regularization operators. Experimental results demonstrate the feasibility of the approach.
PDF BibTeX

Empirical Inference Thesis Variationsverfahren zur Untersuchung von Grundzustandseigenschaften des Ein-Band Hubbard-Modells Eichhorn, J. Biologische Kybernetik, Technische Universität Dresden, Dresden/Germany, May 2001
Using different modifications of a new variational approach, statical groundstate properties of the one-band Hubbard model such as energy and staggered magnetisation are calculated. By taking into account additional fluctuations, the method ist gradually improved so that a very good description of the energy in one and two dimensions can be achieved. After a detailed discussion of the application in one dimension, extensions for two dimensions are introduced. By use of a modified version of the variational ansatz in particular a description of the quantum phase transition for the magnetisation should be possible.
PostScript BibTeX

Empirical Inference Article Markovian domain fingerprinting: statistical segmentation of protein sequences Bejerano, G., Seldin, Y., Margalit, H., Tishby, N. Bioinformatics, 17(10):927-934, 2001 PDF Web BibTeX

Empirical Inference Article The psychometric function: I. Fitting, sampling and goodness-of-fit Wichmann, F., Hill, N. Perception and Psychophysics, 63 (8):1293-1313, 2001
The psychometric function relates an observer'sperformance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function'sparameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses ). We show that failure to account for this can lead to serious biases in estimates of the psychometric function'sparameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional X^2 methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods
PDF BibTeX

Empirical Inference Conference Paper Unsupervised Segmentation and Classification of Mixtures of Markovian Sources Seldin, Y., Bejerano, G., Tishby, N. In The 33rd Symposium on the Interface of Computing Science and Statistics (Interface 2001 - Frontiers in Data Mining and Bioinformatics), 1-15, 33rd Symposium on the Interface of Computing Science and Statistics (Interface 2001 - Frontiers in Data Mining and Bioinformatics), 2001
We describe a novel algorithm for unsupervised segmentation of sequences into alternating Variable Memory Markov sources, first presented in [SBT01]. The algorithm is based on competitive learning between Markov models, when implemented as Prediction Suffix Trees [RST96] using the MDL principle. By applying a model clustering procedure, based on rate distortion theory combined with deterministic annealing, we obtain a hierarchical segmentation of sequences between alternating Markov sources. The method is applied successfully to unsupervised segmentation of multilingual texts into languages where it is able to infer correctly both the number of languages and the language switching points. When applied to protein sequence families (results of the [BSMT01] work), we demonstrate the method‘s ability to identify biologically meaningful sub-sequences within the proteins, which correspond to signatures of important functional sub-units called domains. Our approach to proteins classification (through the obtained signatures) is shown to have both conceptual and practical advantages over the currently used methods.
PDF Web BibTeX