

2009


Initial Data from a first PET/MRI-System and its Applications in Clinical Studies Using MRI Based Attenuation Correction

Kolb, A., Hofmann, M., Sossi, V., Wehrl, H., Sauter, A., Schmid, A., Judenhofer, M., Schlemmer, H., Claussen, C., Pichler, B.

2009 World Molecular Imaging Congress, 2009, pages: 1200, September 2009 (poster)

ei

Web [BibTex]

A High-Speed Object Tracker from Off-the-Shelf Components

Lampert, C., Peters, J.

First IEEE Workshop on Computer Vision for Humanoid Robots in Real Environments at ICCV 2009, 1, pages: 1, September 2009 (poster)

Abstract
We introduce RTblob, an open-source real-time vision system for 3D object detection that achieves over 200 Hz tracking speed with only off-the-shelf hardware components. It allows fast and accurate tracking of colored objects in 3D without expensive and often custom-built hardware, instead making use of the PC's graphics card for the necessary image processing operations.
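
As a rough illustration of the detection step described above, the following CPU-side sketch scores each pixel of a frame by its similarity to a target color and returns the weighted centroid of the strongest responses. The target color, threshold, and toy frame are invented for illustration; RTblob itself runs the filtering on the GPU and triangulates 3D positions from two camera views.

```python
# CPU-side sketch of a per-frame colored-blob detection step; the real system
# performs the filtering on the GPU and combines two camera views into 3D.
import numpy as np

def detect_blob(rgb, target=(1.0, 0.2, 0.2), thresh=0.8):
    img = rgb.astype(float) / 255.0
    # Per-pixel similarity to the target color, mapped to [0, 1].
    score = 1.0 - np.linalg.norm(img - np.array(target), axis=2) / np.sqrt(3)
    mask = score > thresh
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    w = score[mask]
    # Score-weighted centroid of the detected region (x, y in pixel coordinates).
    return (float((xs * w).sum() / w.sum()), float((ys * w).sum() / w.sum()))

# Toy frame: a reddish square on a dark background.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
frame[100:120, 200:220] = (255, 40, 40)
print(detect_blob(frame))  # approximately (209.5, 109.5)
```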

ei

PDF Web [BibTex]

Fast Kernel-Based Independent Component Analysis

Shen, H., Jegelka, S., Gretton, A.

IEEE Transactions on Signal Processing, 57(9):3498-3511, September 2009 (article)

Abstract
Recent approaches to independent component analysis (ICA) have used kernel independence measures to obtain highly accurate solutions, particularly where classical methods experience difficulty (for instance, sources with near-zero kurtosis). FastKICA (fast HSIC-based kernel ICA) is a new optimization method for one such kernel independence measure, the Hilbert-Schmidt Independence Criterion (HSIC). The high computational efficiency of this approach is achieved by combining geometric optimization techniques, specifically an approximate Newton-like method on the orthogonal group, with accurate estimates of the gradient and Hessian based on an incomplete Cholesky decomposition. In contrast to other efficient kernel-based ICA algorithms, FastKICA is applicable to any twice differentiable kernel function. Experimental results for problems with large numbers of sources and observations indicate that FastKICA provides more accurate solutions at a given cost than gradient descent on HSIC. Compared with other recently published ICA methods, FastKICA is competitive in terms of accuracy, relatively insensitive to local minima when initialized far from independence, and more robust to outliers. An analysis of the local convergence properties of FastKICA is provided.
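
The quantity FastKICA optimizes is the Hilbert-Schmidt Independence Criterion. As a point of reference, here is a minimal (biased) empirical HSIC estimator with Gaussian kernels; it omits the incomplete Cholesky factorization and the approximate Newton step on the orthogonal group that give the paper its speed, so it is only an illustrative sketch, not the authors' implementation.

```python
# Minimal biased HSIC estimator: (1/n^2) * trace(K H L H) with Gaussian kernels.
import numpy as np

def gaussian_gram(x, sigma=1.0):
    # Pairwise squared distances -> Gaussian kernel Gram matrix.
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(x**2, axis=1)[None, :] - 2 * x @ x.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    n = x.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / n**2

# Toy usage: HSIC is near zero for independent signals, larger for dependent ones.
rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(500, 1))
print(hsic(s, rng.uniform(-1, 1, size=(500, 1))))  # close to 0 (independent)
print(hsic(s, s**2))                               # clearly larger (dependent)
```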

ei

PDF Web DOI [BibTex]

Guest editorial: Special issue on robot learning, Part B

Peters, J., Ng, A.

Autonomous Robots, 27(2):91-92, August 2009 (article)

ei

PDF PDF DOI [BibTex]

Estimating Critical Stimulus Features from Psychophysical Data: The Decision-Image Technique Applied to Human Faces

Macke, J., Wichmann, F.

Journal of Vision, 9(8):31, 9th Annual Meeting of the Vision Sciences Society (VSS), August 2009 (poster)

Abstract
One of the main challenges in the sensory sciences is to identify the stimulus features on which the sensory systems base their computations: they are a prerequisite for computational models of perception. We describe a technique---decision-images---for extracting critical stimulus features based on logistic regression. Rather than embedding the stimuli in noise, as is done in classification image analysis, we want to infer the important features directly from physically heterogeneous stimuli. A decision-image not only defines the critical region-of-interest within a stimulus but is a quantitative template which defines a direction in stimulus space. Decision-images thus enable the development of predictive models, as well as the generation of optimized stimuli for subsequent psychophysical investigations. Here we describe our method and apply it to data from a human face discrimination experiment. We show that decision-images are able to predict human responses not only in terms of overall percent correct but are able to predict, for individual observers, the probabilities with which individual faces are (mis-)classified. We then test the predictions of the models using optimized stimuli. Finally, we discuss possible generalizations of the approach and its relationships with other models.

ei

Web DOI [BibTex]

Policy Search for Motor Primitives

Peters, J., Kober, J.

KI - Zeitschrift Künstliche Intelligenz, 23(3):38-40, August 2009 (article)

Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives from demonstrations. However, most interesting motor learning problems require self-improvement often beyond the reach of current reinforcement learning methods due to the high dimensionality of the state-space. We develop an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it to motor learning problems and show that it can learn the complex Ball-in-a-Cup task using a real Barrett WAM robot arm.

ei

Web [BibTex]

A neurophysiologically plausible population code model for human contrast discrimination

Goris, R., Wichmann, F., Henning, G.

Journal of Vision, 9(7):1-22, July 2009 (article)

Abstract
The pedestal effect is the improvement in the detectability of a sinusoidal grating in the presence of another grating of the same orientation, spatial frequency, and phase—usually called the pedestal. Recent evidence has demonstrated that the pedestal effect is differently modified by spectrally flat and notch-filtered noise: The pedestal effect is reduced in flat noise but virtually disappears in the presence of notched noise (G. B. Henning & F. A. Wichmann, 2007). Here we consider a network consisting of units whose contrast response functions resemble those of the cortical cells believed to underlie human pattern vision and demonstrate that, when the outputs of multiple units are combined by simple weighted summation—a heuristic decision rule that resembles optimal information combination and produces a contrast-dependent weighting profile—the network produces contrast-discrimination data consistent with psychophysical observations: The pedestal effect is present without noise, reduced in broadband noise, but almost disappears in notched noise. These findings follow naturally from the normalization model of simple cells in primary visual cortex, followed by response-based pooling, and suggest that in processing even low-contrast sinusoidal gratings, the visual system may combine information across neurons tuned to different spatial frequencies and orientations.

ei

Web DOI [BibTex]

Falsificationism and Statistical Learning Theory: Comparing the Popper and Vapnik-Chervonenkis Dimensions

Corfield, D., Schölkopf, B., Vapnik, V.

Journal for General Philosophy of Science, 40(1):51-58, July 2009 (article)

Abstract
We compare Karl Popper’s ideas concerning the falsifiability of a theory with similar notions from the part of statistical learning theory known as VC-theory. Popper’s notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located some divergences, we discuss how best to view Popper’s work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.

ei

PDF DOI [BibTex]

Consistent Nonparametric Tests of Independence

Gretton, A., Györfi, L.

(172), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2009 (techreport)

Abstract
Three simple and explicit procedures for testing the independence of two multi-dimensional random variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. Two kinds of tests are provided. Distribution-free strong consistent tests are derived on the basis of large deviation bounds on the test statistics: these tests make almost surely no Type I or Type II error after a random sample size. Asymptotically alpha-level tests are obtained from the limiting distribution of the test statistics. For the latter tests, the Type I error converges to a fixed non-zero value alpha, and the Type II error drops to zero, for increasing sample size. All tests reject the null hypothesis of independence if the test statistics become large. The performance of the tests is evaluated experimentally on benchmark data.
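
To make the partition-based statistics concrete, the sketch below computes an L1-type discrepancy between the empirical joint distribution on a finite grid and the product of its marginals. The bin count is an arbitrary illustrative choice, and the rejection thresholds derived in the report (from large deviation bounds or limiting distributions) are omitted.

```python
# Partition-based L1 independence statistic on a finite 2D grid (illustrative).
import numpy as np

def l1_independence_statistic(x, y, bins=8):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint /= joint.sum()                                    # empirical joint distribution
    prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))   # product of marginals
    return np.abs(joint - prod).sum()                       # L1 distance

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
print(l1_independence_statistic(x, rng.normal(size=2000)))           # smaller: independent
print(l1_independence_statistic(x, x + 0.1 * rng.normal(size=2000))) # larger: dependent
```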

ei

PDF [BibTex]

Semi-supervised Analysis of Human fMRI Data

Shelton, JA., Blaschko, MB., Lampert, CH., Bartels, A.

Berlin Brain Computer Interface Workshop on Advances in Neurotechnology, 2009, pages: 1, July 2009 (poster)

Abstract
Kernel Canonical Correlation Analysis (KCCA) is a general technique for subspace learning that incorporates principal components analysis (PCA) and Fisher linear discriminant analysis (LDA) as special cases. By finding directions that maximize correlation, CCA learns representations tied more closely to the underlying process generating the data and can ignore high-variance noise directions. However, for data where acquisition in a given modality is expensive or otherwise limited, CCA may suffer from small sample effects. We propose to use semi-supervised Laplacian regularization to utilize data that are present in only one modality. This approach is able to find highly correlated directions that also lie along the data manifold, resulting in a more robust estimate of correlated subspaces. Functional magnetic resonance imaging (fMRI) acquired data are naturally amenable to subspace techniques as data are well aligned. fMRI data of the human brain are a particularly interesting candidate. In this study we implemented various supervised and semi-supervised versions of CCA on human fMRI data, with regression to single and multivariate labels (corresponding to video content subjects viewed during the image acquisition). In each variate condition, the semi-supervised variants of CCA performed better than the supervised variants, including a supervised variant with Laplacian regularization. We additionally analyze the weights learned by the regression in order to infer brain regions that are important to different types of visual processing.

ei

PDF Web [BibTex]

Guest editorial: Special issue on robot learning, Part A

Peters, J., Ng, A.

Autonomous Robots, 27(1):1-2, July 2009 (article)

ei

PDF PDF DOI [BibTex]

A Geometric Approach to Confidence Sets for Ratios: Fieller’s Theorem, Generalizations, and Bootstrap

von Luxburg, U., Franz, V.

Statistica Sinica, 19(3):1095-1117, July 2009 (article)

Abstract
We present a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method reduces the problem of constructing confidence sets for the ratio of two random variables to the problem of constructing confidence sets for the means of one-dimensional random variables. It is valid in a large variety of circumstances. In the case of normally distributed random variables, the confidence sets so constructed coincide with the standard Fieller confidence sets. Generalizations of our construction lead to definitions of exact and conservative confidence sets for very general classes of distributions, provided the joint expectation of (X,Y) exists and the linear combinations of the form aX + bY are well-behaved. Finally, our geometric method allows us to derive a very simple bootstrap approach for constructing conservative confidence sets for ratios which perform favorably in certain situations, in particular in the asymmetric heavy-tailed regime.
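
For the Gaussian case mentioned above, the construction reduces to the classical Fieller set; the following hedged sketch states that set in standard notation (not copied from the paper): the confidence set for rho = E(Y)/E(X) collects all ratio values not rejected by a t-test on the linear combination Y - rho X.

```latex
% Classical Fieller confidence set for \rho = E(Y)/E(X); here
% \hat{\sigma}_{XX}, \hat{\sigma}_{XY}, \hat{\sigma}_{YY} denote estimated
% (co)variances of the sample means and t_{1-\alpha/2} the relevant t-quantile.
\[
  C_\alpha \;=\;
  \Bigl\{ \rho \;:\;
    (\bar{Y} - \rho \bar{X})^2
    \;\le\;
    t_{1-\alpha/2}^{2}\,
    \bigl(\hat{\sigma}_{YY} - 2\rho\,\hat{\sigma}_{XY} + \rho^{2}\hat{\sigma}_{XX}\bigr)
  \Bigr\}.
\]
% Depending on the sign of the leading quadratic coefficient, C_\alpha is an
% interval, the complement of an interval, or the whole real line.
```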

ei

PDF PDF Web [BibTex]


Learning Motor Primitives for Robotics

Kober, J., Peters, J., Oztop, E.

Advanced Telecommunications Research Center ATR, June 2009 (talk)

Abstract
The acquisition and self-improvement of novel motor skills is among the most important problems in robotics. Motor primitives offer one of the most promising frameworks for the application of machine learning techniques in this context. Employing the Dynamic Systems Motor primitives originally introduced by Ijspeert et al. (2003), appropriate learning algorithms for a concerted approach of both imitation and reinforcement learning are presented. Using these algorithms new motor skills, i.e., Ball-in-a-Cup, Ball-Paddling and Dart-Throwing, are learned.

ei

[BibTex]

Varieties of Justification in Machine Learning

Corfield, D.

In Proceedings of Multiplicity and Unification in Statistics and Probability, pages: 1-10, Multiplicity and Unification in Statistics and Probability, June 2009 (inproceedings)

Abstract
The field of machine learning has flourished over the past couple of decades. With huge amounts of data available, efficient algorithms can learn to extrapolate from their training sets to become very accurate classifiers. For example, it is straightforward now to develop classifiers which achieve accuracies of around 99% on databases of handwritten digits. Now these algorithms have been devised by theorists who arrive at the problem of machine learning with a range of different philosophical outlooks on the subject of inductive reasoning. This has led to a wide range of theoretical rationales for their work. In this talk I shall classify the different forms of justification for inductive machine learning into four kinds, and make some comparisons between them. With little theoretical knowledge available to aid in the learning tasks, the relevance of these justificatory approaches for the inductive reasoning of the natural sciences is questionable; nevertheless, certain issues surrounding the presuppositions of inductive reasoning are brought sharply into focus. In particular, Frequentist, Bayesian and MDL outlooks can be compared.

ei

PDF Web [BibTex]

Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

Hill, J., Farquhar, J., Martens, S., Biessmann, F., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 665-672, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
From an information-theoretic perspective, a noisy transmission system such as a visual Brain-Computer Interface (BCI) speller could benefit from the use of error-correcting codes. However, optimizing the code solely according to the maximal minimum-Hamming-distance criterion tends to lead to an overall increase in the frequency of target stimuli, and hence a significantly reduced average target-to-target interval (TTI), leading to difficulties in classifying the individual event-related potentials (ERPs) due to overlap and refractory effects. Clearly any change to the stimulus setup must also respect the possible psychophysiological consequences. Here we report new EEG data from experiments in which we explore stimulus types and codebooks in a within-subject design, finding an interaction between the two factors. Our data demonstrate that the traditional row-column code has particular spatial properties that lead to better performance than one would expect from its TTIs and Hamming-distances alone, but nonetheless error-correcting codes can improve performance provided the right stimulus type is used.

ei

PDF PDF Web [BibTex]

Influence of graph construction on graph-based clustering measures

Maier, M., von Luxburg, U., Hein, M.

In Advances in neural information processing systems 21, pages: 1025-1032, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data are often not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data even for rather small sample sizes.
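
For concreteness, here is a small sketch of the two graph constructions compared in the paper, built from pairwise Euclidean distances; the parameter values (k, r) and the random point cloud are arbitrary examples, not the paper's experimental setup.

```python
# Symmetrized k-nearest-neighbor graph vs. r-neighborhood graph (illustrative).
import numpy as np

def pairwise_dist(X):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))

def knn_graph(X, k=5):
    D = pairwise_dist(X)
    A = np.zeros_like(D, dtype=bool)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]      # skip self (distance 0)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, idx.ravel()] = True
    return A | A.T                                # symmetrize

def r_graph(X, r=0.3):
    D = pairwise_dist(X)
    A = D <= r
    np.fill_diagonal(A, False)
    return A

X = np.random.default_rng(2).uniform(size=(100, 2))
print(knn_graph(X).sum() // 2, r_graph(X).sum() // 2)  # number of edges in each graph
```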

ei

PDF Web [BibTex]

Local Gaussian Process Regression for Real Time Online Model Learning and Control

Nguyen-Tuong, D., Seeger, M., Peters, J.

In Advances in neural information processing systems 21, pages: 1193-1200, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian Process regression (GPR) with local GP models (LGP). The training data are partitioned into local regions, and an individual GP model is trained for each. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance-based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other nonparametric regression methods show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and nu-SVR.
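
The following toy sketch (not the authors' implementation) illustrates the LGP idea on a one-dimensional function: training points are assigned to the nearest of a few centers, a standard GP is fitted per partition, and predictions are combined with distance-based weights. Kernel, noise level, and the number of local models are arbitrary choices.

```python
# Local GP regression sketch: partition data, fit one GP per region,
# combine local predictions with distance-based weights.
import numpy as np

def rbf(a, b, ls=1.0):
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * d2 / ls**2)

class LocalGP:
    def __init__(self, X, y, n_models=5, noise=1e-2):
        # Assign each point to the nearest of a few randomly chosen centers.
        self.centers = X[np.random.default_rng(0).choice(len(X), n_models, replace=False)]
        assign = np.argmin(((X[:, None, :] - self.centers[None]) ** 2).sum(-1), axis=1)
        self.models = []
        for m in range(n_models):
            Xm, ym = X[assign == m], y[assign == m]
            alpha = np.linalg.solve(rbf(Xm, Xm) + noise * np.eye(len(Xm)), ym)
            self.models.append((Xm, alpha))

    def predict(self, x):
        # Weight each local model by an RBF of the distance to its center.
        w = np.exp(-0.5 * ((x - self.centers) ** 2).sum(-1))
        preds = np.array([rbf(x[None], Xm) @ alpha for Xm, alpha in self.models]).ravel()
        return float(w @ preds / w.sum())

# Toy usage on a 1D function.
X = np.linspace(-3, 3, 200)[:, None]
y = np.sin(X).ravel()
lgp = LocalGP(X, y)
print(lgp.predict(np.array([1.0])))  # prediction for x = 1.0; true value is sin(1) ~ 0.84
```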

ei

PDF Web [BibTex]

Detecting the Direction of Causal Time Series

Peters, J., Janzing, D., Gretton, A., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 801-808, (Editors: A Danyluk and L Bottou and ML Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
We propose a method that detects the true direction of time series, by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (for the latter case, the direction is not identifiable). A new and important implication of our result is that it confirms a fundamental conjecture in causal reasoning - if after regression the noise is independent of the signal for one direction and dependent for the other, then the former represents the true causal direction - in the case of time series. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real world data, our approach outperforms alternative solutions to the problem of time direction recovery.

ei

PDF Web DOI [BibTex]

Text Clustering with Mixture of von Mises-Fisher Distributions

Sra, S., Banerjee, A., Ghosh, J., Dhillon, I.

In Text mining: classification, clustering, and applications, pages: 121-161, Chapman & Hall/CRC data mining and knowledge discovery series, (Editors: Srivastava, A. N. and Sahami, M.), CRC Press, Boca Raton, FL, USA, June 2009 (inbook)

ei

Web DOI [BibTex]

Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer

Lampert, C.

IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2009 (talk)

ei

Web [BibTex]

Learning Taxonomies by Dependence Maximization

Blaschko, M., Gretton, A.

In Advances in neural information processing systems 21, pages: 153-160, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.

ei

PDF Web [BibTex]

Learning object-specific grasp affordance densities

Detry, R., Baseski, E., Popovic, M., Touati, Y., Krüger, N., Kroemer, O., Peters, J., Piater, J.

In 8th IEEE International Conference on Development and Learning, pages: 1-7, IEEE Service Center, Piscataway, NJ, USA, ICDL, June 2009 (inproceedings)

Abstract
This paper addresses the issue of learning and representing object grasp affordances, i.e. object-gripper relative configurations that lead to successful grasps. The purpose of grasp affordances is to organize and store the whole knowledge that an agent has about the grasping of an object, in order to facilitate reasoning on grasping solutions and their achievability. The affordance representation consists of a continuous probability density function defined on the 6D gripper pose space (3D position and orientation) within an object-relative reference frame. Grasp affordances are initially learned from various sources, e.g. from imitation or from visual cues, leading to grasp hypothesis densities. Grasp densities are attached to a learned 3D visual object model, and pose estimation of the visual model allows a robotic agent to execute samples from a grasp hypothesis density under various object poses. Grasp outcomes are used to learn grasp empirical densities, i.e. grasps that have been confirmed through experience. We show the result of learning grasp hypothesis densities from both imitation and visual cues, and present grasp empirical densities learned from physical experience by a robot.

ei

PDF Web DOI [BibTex]

Understanding Brain Connectivity Patterns during Motor Imagery for Brain-Computer Interfacing

Grosse-Wentrup, M.

In Advances in neural information processing systems 21, pages: 561-568, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
EEG connectivity measures could provide a new type of feature space for inferring a subject's intention in Brain-Computer Interfaces (BCIs). However, very little is known on EEG connectivity patterns for BCIs. In this study, EEG connectivity during motor imagery (MI) of the left and right hand is investigated in a broad frequency range across the whole scalp by combining Beamforming with Transfer Entropy and taking into account possible volume conduction effects. Observed connectivity patterns indicate that modulation intentionally induced by MI is strongest in the gamma-band, i.e., above 35 Hz. Furthermore, modulation between MI and rest is found to be more pronounced than between MI of different hands. This is in contrast to results on MI obtained with bandpower features, and might provide an explanation for the so far only moderate success of connectivity features in BCIs. It is concluded that future studies on connectivity based BCIs should focus on high frequency bands and consider experimental paradigms that maximally vary cognitive demands between conditions.

ei

PDF Web [BibTex]

Nonlinear causal discovery with additive noise models

Hoyer, P., Janzing, D., Mooij, J., Peters, J., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 689-696, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that in fact the basic linear framework can be generalized to nonlinear models with additive noise. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities.
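
A hedged toy sketch of the additive-noise reasoning: regress each variable on the other (a cubic polynomial fit stands in for a general nonlinear regressor) and check in which direction the residuals look independent of the regressor. The simple HSIC estimator used as the independence score here is an illustrative stand-in for the kernel independence tests used in the paper, not the authors' code.

```python
# Additive-noise causal direction sketch: fit both directions, compare residual dependence.
import numpy as np

def hsic(x, y, sigma=1.0):
    n = len(x)
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma**2))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(x) @ H @ gram(y) @ H) / n**2

def anm_direction(x, y):
    res_xy = y - np.polyval(np.polyfit(x, y, 3), x)   # residuals of y = f(x) + n
    res_yx = x - np.polyval(np.polyfit(y, x, 3), y)   # residuals of x = g(y) + n
    return "x -> y" if hsic(x, res_xy) < hsic(y, res_yx) else "y -> x"

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 500)
y = x**3 + rng.uniform(-1, 1, 500)   # true direction: x -> y, non-Gaussian noise
print(anm_direction(x, y))           # expected: "x -> y"
```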

ei

PDF Web [BibTex]

Bounds on marginal probability distributions

Mooij, JM., Kappen, B.

In Advances in neural information processing systems 21, pages: 1105-1112, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal ("belief"). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis.

ei

PDF Web [BibTex]

Convex variational Bayesian inference for large scale generalized linear models

Nickisch, H., Seeger, M.

In ICML 2009, pages: 761-768, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
We show how variational Bayesian inference can be implemented for very large generalized linear models. Our relaxation is proven to be a convex problem for any log-concave model. We provide a generic double loop algorithm for solving this relaxation on models with arbitrary super-Gaussian potentials. By iteratively decoupling the criterion, most of the work can be done by solving large linear systems, rendering our algorithm orders of magnitude faster than previously proposed solvers for the same problem. We evaluate our method on problems of Bayesian active learning for large binary classification models, and show how to address settings with many candidates and sequential inclusion steps.

ei

PDF Web DOI [BibTex]

An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis

Schweikert, G., Widmer, C., Schölkopf, B., Rätsch, G.

In Advances in neural information processing systems 21, pages: 1433-1440, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.

ei

PDF Web [BibTex]

Diffeomorphic Dimensionality Reduction

Walder, C., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 1713-1720, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets.

ei

PDF Web [BibTex]

Combining appearance and motion for human action classification in videos

Dhillon, P., Nowozin, S., Lampert, C.

In 1st International Workshop on Visual Scene Understanding, pages: 22-29, IEEE Service Center, Piscataway, NJ, USA, ViSU, June 2009 (inproceedings)

ei

PDF Web DOI [BibTex]

Let the Kernel Figure it Out: Principled Learning of Pre-processing for Kernel Classifiers

Gehler, P., Nowozin, S.

In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 2836-2843, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)

Abstract
Most modern computer vision systems for high-level tasks, such as image classification, object recognition and segmentation, are based on learning algorithms that are able to separate discriminative information from noise. In practice, however, the typical system consists of a long pipeline of pre-processing steps, such as extraction of different kinds of features, various kinds of normalizations, feature selection, and quantization into aggregated representations such as histograms. Along this pipeline, there are many parameters to set and choices to make, and their effect on the overall system performance is a priori unclear. In this work, we shorten the pipeline in a principled way. We move pre-processing steps into the learning system by means of kernel parameters, letting the learning algorithm decide upon suitable parameter values. Learning to optimize the pre-processing choices becomes learning the kernel parameters. We realize this paradigm by extending the recent Multiple Kernel Learning formulation from the finite case of having a fixed number of kernels which can be combined to the general infinite case where each possible parameter setting induces an associated kernel. We evaluate the new paradigm extensively on image classification and object classification tasks. We show that it is possible to learn optimal discriminative codebooks and optimal spatial pyramid schemes, consistently outperforming all previous state-of-the-art approaches.

ei

PDF Web DOI [BibTex]

Identifying confounders using additive noise models

Janzing, D., Peters, J., Mooij, J., Schölkopf, B.

In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pages: 249-257, (Editors: J Bilmes and AY Ng), AUAI Press, Corvallis, OR, USA, UAI, June 2009 (inproceedings)

Abstract
We propose a method for inferring the existence of a latent common cause ("confounder") of two observed random variables. The method assumes that the two effects of the confounder are (possibly nonlinear) functions of the confounder plus independent, additive noise. We discuss under which conditions the model is identifiable (up to an arbitrary reparameterization of the confounder) from the joint distribution of the effects. We state and prove a theoretical result that provides evidence for the conjecture that the model is generically identifiable under suitable technical conditions. In addition, we propose a practical method to estimate the confounder from a finite i.i.d. sample of the effects and illustrate that the method works well on both simulated and real-world data.

ei

PDF Web [BibTex]

Regression by dependence minimization and its application to causal inference in additive noise models

Mooij, J., Janzing, D., Peters, J., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 745-752, (Editors: A Danyluk and L Bottou and M Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.

ei

PDF Web DOI [BibTex]

Fitted Q-iteration by Advantage Weighted Regression

Neumann, G., Peters, J.

In Advances in neural information processing systems 21, pages: 1177-1184, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.
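
Purely as an illustration of the policy improvement step described above (with invented toy data, not the paper's benchmarks), the sketch below fits a linear policy by least squares weighted with exponentiated advantages, so that high-advantage state-action pairs dominate the fit. The temperature tau and the feature map are arbitrary assumptions.

```python
# Advantage-weighted regression sketch: weighted least squares for a linear policy.
import numpy as np

def advantage_weighted_regression(states, actions, advantages, tau=1.0):
    w = np.exp((advantages - advantages.max()) / tau)        # stabilized exponentiated advantages
    Phi = np.hstack([states, np.ones((len(states), 1))])     # linear features with bias
    W = np.diag(w)
    theta = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ actions)
    return theta

# Toy usage: actions near a hypothetical "good" action get higher advantages.
rng = np.random.default_rng(4)
S = rng.normal(size=(200, 2))
good = S @ np.array([0.5, -0.3]) + 1.0
actions = good + rng.normal(scale=0.5, size=200)
advantages = -np.abs(actions - good)
print(advantage_weighted_regression(S, actions, advantages))  # roughly [0.5, -0.3, 1.0]
```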

ei

PDF Web [BibTex]

Global Connectivity Potentials for Random Field Models

Nowozin, S., Lampert, C.

In CVPR 2009, pages: 818-825, IEEE Service Center, Piscataway, NJ, USA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009 (inproceedings)

Abstract
Markov random field (MRF, CRF) models are popular in computer vision. However, in order to be computationally tractable they are limited to incorporate only local interactions and cannot model global properties, such as connectedness, which is a potentially useful high-level prior for object segmentation. In this work, we overcome this limitation by deriving a potential function that enforces the output labeling to be connected and that can naturally be used in the framework of recent MAP-MRF LP relaxations. Using techniques from polyhedral combinatorics, we show that a provably tight approximation to the MAP solution of the resulting MRF can still be found efficiently by solving a sequence of max-flow problems. The efficiency of the inference procedure also allows us to learn the parameters of a MRF with global connectivity potentials by means of a cutting plane algorithm. We experimentally evaluate our algorithm on both synthetic data and on the challenging segmentation task of the PASCAL VOC 2008 data set. We show that in both cases the addition of a connectedness prior significantly reduces the segmentation error.

ei

PDF PDF Web DOI [BibTex]

Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

Seeger, M., Nickisch, H., Pohmann, R., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 1441-1448, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We show how improved sequences for magnetic resonance imaging can be found through automated optimization of Bayesian design scores. Combining recent advances in approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first scalable Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires approximate inference for dense, non-Gaussian models on a scale seldom addressed before. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on a realistic setup with raw data from a 3T MR scanner.

ei

PDF Web [BibTex]

Characteristic Kernels on Groups and Semigroups

Fukumizu, K., Sriperumbudur, B., Gretton, A., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 473-480, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for R^n. In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and R^n_+.
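
For readers less familiar with the terminology, the standard definition behind "characteristic" (a reminder, not text from the paper): the kernel mean embedding of a distribution and the injectivity requirement.

```latex
% Mean embedding of a distribution P in the RKHS H with kernel k; the kernel is
% characteristic when the map P -> \mu_P is injective on probability measures.
\[
  \mu_P \;=\; \mathbb{E}_{X \sim P}\bigl[\,k(X,\cdot)\,\bigr] \in \mathcal{H},
  \qquad
  P \mapsto \mu_P \ \text{injective}
  \;\Longleftrightarrow\;
  \bigl( \|\mu_P - \mu_Q\|_{\mathcal{H}} = 0 \ \Rightarrow\ P = Q \bigr).
\]
```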

ei

PDF Web [BibTex]

Policy Search for Motor Primitives in Robotics

Kober, J., Peters, J.

In Advances in neural information processing systems 21, pages: 849-856, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning by assuming a form of exploration that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable in complex motor learning tasks. We compare this algorithm to alternative parametrized policy search methods and show that it outperforms previous methods. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.

ei

PDF Web [BibTex]

The graphlet spectrum

Kondor, R., Shervashidze, N., Borgwardt, K.

In Proceedings of the 26th International Conference on Machine Learning (ICML 2009), pages: 529-536, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning (ICML), June 2009 (inproceedings)

Abstract
Current graph kernels suffer from two limitations: graph kernels based on counting particular types of subgraphs ignore the relative position of these subgraphs to each other, while graph kernels based on algebraic methods are limited to graphs without node labels. In this paper we present the graphlet spectrum, a system of graph invariants derived by means of group representation theory that capture information about the number as well as the position of labeled subgraphs in a given graph. In our experimental evaluation the graphlet spectrum outperforms state-of-the-art graph kernels.

ei

PDF DOI [BibTex]

Learning Similarity Measure for Multi-Modal 3D Image Registration

Lee, D., Hofmann, M., Steinke, F., Altun, Y., Cahill, N., Schölkopf, B.

In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 186-193, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)

Abstract
Multi-modal image registration is a challenging problem in medical imaging. The goal is to align anatomically identical structures; however, their appearance in images acquired with different imaging devices, such as CT or MR, may be very different. Registration algorithms generally deform one image, the floating image, such that it matches a second, the reference image, by maximizing some similarity score between the deformed and the reference image. Instead of using a universal, but a priori fixed similarity criterion such as mutual information, we propose learning a similarity measure in a discriminative manner such that the reference and correctly deformed floating images receive high similarity scores. To this end, we develop an algorithm derived from max-margin structured output learning, and employ the learned similarity measure within a standard rigid registration algorithm. Compared to other approaches, our method adapts to the specific registration problem at hand and exploits correlations between neighboring pixels in the reference and the floating image. Empirical evaluation on CT-MR/PET-MR rigid registration tasks demonstrates that our approach yields robust performance and outperforms state-of-the-art methods for multi-modal medical image registration.

ei

PDF Web DOI [BibTex]

Learning Complex Motions by Sequencing Simpler Motion Templates

Neumann, G., Maass, W., Peters, J.

In ICML 2009, pages: 753-760, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
Abstraction of complex, longer motor tasks into simpler elemental movements enables humans and animals to exhibit motor skills which have not yet been matched by robots. Humans intuitively decompose complex motions into smaller, simpler segments. For example when describing simple movements like drawing a triangle with a pen, we can easily name the basic steps of this movement. Surprisingly, such abstractions have rarely been used in artificial motor skill learning algorithms. These algorithms typically choose a new action (such as a torque or a force) at a very fast time-scale. As a result, both the policy and the temporal credit assignment problem become unnecessarily complex - often beyond the reach of current machine learning methods. We introduce a new framework for temporal abstractions in reinforcement learning (RL), i.e. RL with motion templates. We present a new algorithm for this framework which can learn high-quality policies by making only a few abstract decisions.

ei

PDF Web DOI [BibTex]

Solution Stability in Linear Programming Relaxations: Graph Partitioning and Unsupervised Learning

Nowozin, S., Jegelka, S.

In ICML 2009, pages: 769-776, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
We propose a new method to quantify the solution stability of a large class of combinatorial optimization problems arising in machine learning. As practical example we apply the method to correlation clustering, clustering aggregation, modularity clustering, and relative performance significance clustering. Our method is extensively motivated by the idea of linear programming relaxations. We prove that when a relaxation is used to solve the original clustering problem, then the solution stability calculated by our method is conservative, that is, it never overestimates the solution stability of the true, unrelaxed problem. We also demonstrate how our method can be used to compute the entire path of optimal solutions as the optimization problem is increasingly perturbed. Experimentally, our method is shown to perform well on a number of benchmark problems.

ei

PDF Web DOI [BibTex]

Kernel Measures of Independence for Non-IID Data

Zhang, X., Song, L., Gretton, A., Smola, A.

In Advances in neural information processing systems 21, pages: 1937-1944, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion. In this paper, we extend this criterion to deal with structured and interdependent observations. This is achieved by modeling the structures using undirected graphical models and comparing the Hilbert space embeddings of distributions. We apply this new criterion to independent component analysis and sequence clustering.

ei

PDF Web [BibTex]

Using Bayesian Dynamical Systems for Motion Template Libraries

Chiappa, S., Kober, J., Peters, J.

In Advances in neural information processing systems 21, pages: 297-304, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Motor primitives or motion templates have become an important concept for both modeling human motor control as well as generating robot behaviors using imitation learning. Recent impressive results range from humanoid robot movement generation to timing models of human motions. The automatic generation of skill libraries containing multiple motion templates is an important step in robot learning. Such a skill learning system needs to cluster similar movements together and represent each resulting motion template as a generative model which is subsequently used for the execution of the behavior by a robot system. In this paper, we show how human trajectories captured as multidimensional time-series can be clustered using Bayesian mixtures of linear Gaussian state-space models based on the similarity of their dynamics. The appropriate number of templates is automatically determined by enforcing a parsimonious parametrization. As the resulting model is intractable, we introduce a novel approximation method based on variational Bayes, which is especially designed to enable the use of efficient inference algorithms. On recorded human Balero movements, this method is not only capable of finding reasonable motion templates but also yields a generative model which works well in the execution of this complex task on a simulated anthropomorphic SARCOS arm.

ei

PDF Web [BibTex]

Multi-way set enumeration in real-valued tensors

Georgii, E., Tsuda, K., Schölkopf, B.

In Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors (DMMT 2009), pages: 32-41, (Editors: C Ding and T Li), ACM Press, New York, NY, USA, 2nd Workshop on Data Mining using Matrices and Tensors (DMMT/KDD), June 2009 (inproceedings)

Abstract
The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns. An n-set pattern consists of specific subsets of the n instance sets such that all possible n-ary associations between the corresponding instances are observed. This provides a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree, that means we are also interested in n-sets where most (although not all) of the possible combinations have been recorded in the data. Second, we take association weights into account. More precisely, we propose a method to enumerate all n-sets that satisfy a minimum threshold with respect to the average association weight. Non-observed associations obtain by default a weight of zero. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world data sets from different domains.

ei

PDF DOI [BibTex]

Non-parametric Regression between Riemannian Manifolds

Steinke, F., Hein, M.

In Advances in neural information processing systems 21, pages: 1561-1568, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
This paper discusses non-parametric regression between Riemannian manifolds. This learning problem arises frequently in many application areas ranging from signal processing, computer vision, over robotics to computer graphics. We present a new algorithmic scheme for the solution of this general learning problem based on regularized empirical risk minimization. The regularization functional takes into account the geometry of input and output manifold, and we show that it implements a prior which is particularly natural. Moreover, we demonstrate that our algorithm performs well in a difficult surface registration problem.

ei

PDF Web [BibTex]

Near-optimal supervised feature selection among frequent subgraphs

Thoma, M., Cheng, H., Gretton, A., Han, J., Kriegel, H., Smola, A., Song, L., Yu, P., Yan, X., Borgwardt, K.

In Proceedings of the 2009 SIAM Conference on Data Mining (SDM 2009), pages: 1076-1087, (Editors: Park, H. , S. Parthasarathy, H. Liu), Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, 9th SIAM Conference on Data Mining (SDM), May 2009 (inproceedings)

Abstract
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the enormous problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, but only few of them possess enough discriminative power to make them useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can yield a near-optimal solution using greedy feature selection. Second, our submodular quality function criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and help to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.
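
Since the quality criterion is submodular, greedy forward selection comes with near-optimality guarantees. The skeleton below shows such a greedy loop with a placeholder coverage score over a random binary graph-by-subgraph matrix; the actual CORK quality criterion and the integration with gSpan are not reproduced here.

```python
# Generic greedy forward selection for a submodular score (placeholder score, not CORK).
import numpy as np

def greedy_feature_selection(score, n_features, budget):
    selected = []
    for _ in range(budget):
        # Marginal gain of adding each remaining feature.
        gains = [(score(selected + [j]) - score(selected), j)
                 for j in range(n_features) if j not in selected]
        best_gain, best_j = max(gains)
        if best_gain <= 0:
            break
        selected.append(best_j)
    return selected

# Toy usage with a simple (submodular) coverage score over a random binary matrix:
# each feature "covers" the graphs in which the corresponding subgraph occurs.
X = np.random.default_rng(5).integers(0, 2, size=(50, 20))  # 50 graphs, 20 subgraphs
coverage = lambda feats: int(X[:, feats].any(axis=1).sum()) if feats else 0
print(greedy_feature_selection(coverage, n_features=20, budget=5))
```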

ei

PDF PDF [BibTex]

Semi-supervised subspace analysis of human functional magnetic resonance imaging data

Shelton, J., Blaschko, M., Bartels, A.

(185), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, May 2009 (techreport)

Abstract
Kernel Canonical Correlation Analysis is a very general technique for subspace learning that incorporates PCA and LDA as special cases. Functional magnetic resonance imaging (fMRI) acquired data is naturally amenable to these techniques as data are well aligned. fMRI data of the human brain is a particularly interesting candidate. In this study we implemented various supervised and semi-supervised versions of KCCA on human fMRI data, with regression to single- and multi-variate labels (corresponding to video content subjects viewed during the image acquisition). In each variate condition, the semi-supervised variants of KCCA performed better than the supervised variants, including a supervised variant with Laplacian regularization. We additionally analyze the weights learned by the regression in order to infer brain regions that are important to different types of visual processing.

ei

PDF [BibTex]

Center-surround patterns emerge as optimal predictors for human saccade targets

Kienzle, W., Franz, M., Schölkopf, B., Wichmann, F.

Journal of Vision, 9(5:7):1-15, May 2009 (article)

Abstract
The human visual system is foveated, that is, outside the central visual field, resolution and acuity drop rapidly. Nonetheless, much of a visual scene is perceived after only a few saccadic eye movements, suggesting an effective strategy for selecting saccade targets. It has been known for some time that local image structure at saccade targets influences the selection process. However, the question of what the most relevant visual features are is still under debate. Here we show that center-surround patterns emerge as the optimal solution for predicting saccade targets from their local image structure. The resulting model, a one-layer feed-forward network, is surprisingly simple compared to previously suggested models which assume much more complex computations such as multi-scale processing and multiple feature channels. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

ei

PDF DOI [BibTex]


Link Propagation: A Fast Semi-supervised Learning Algorithm for Link Prediction

Kashima, H., Kato, T., Yamanishi, Y., Sugiyama, M., Tsuda, K.

In Proceedings of the 2009 SIAM International Conference on Data Mining, pages: 1099-1110, (Editors: Park, H. , S. Parthasarathy, H. Liu), Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, SDM, May 2009 (inproceedings)

Abstract
We propose Link Propagation as a new semi-supervised learning method for link prediction problems, where the task is to predict unknown parts of the network structure by using auxiliary information such as node similarities. Since the proposed method can fill in missing parts of tensors, it is applicable to multi-relational domains, allowing us to handle multiple types of links simultaneously. We also give a novel efficient algorithm for Link Propagation based on an accelerated conjugate gradient method.

ei

PDF Web [BibTex]
