Publications

Empirical Inference Talk BCPy2000 Hill, N., Schreiner, T., Puzicha, C., Farquhar, J. Workshop "Machine Learning Open-Source Software" at NIPS, December 2008 Web BibTeX

Empirical Inference Conference Paper A Bayesian Approach to Switching Linear Gaussian State-Space Models for Unsupervised Time-Series Segmentation Chiappa, S. In Proceedings of the 7th International Conference on Machine Learning and Applications (ICMLA 2008), ICMLA 2008, 3-9, (Editors: Wani, M. A., X.-W. Chen, D. Casasent, L. Kurgan, T. Hu, K. Hafeez), IEEE Computer Society, Los Alamitos, CA, USA, 7th International Conference on Machine Learning and Applications, December 2008
Time-series segmentation in the fully unsupervised scenario in which the number of segment-types is a priori unknown is a fundamental problem in many applications. We propose a Bayesian approach to a segmentation model based on the switching linear Gaussian state-space model that enforces a sparse parametrization, such as to use only a small number of a priori available different dynamics to explain the data. This enables us to estimate the number of segment-types within the model, in contrast to previous non-Bayesian approaches where training and comparing several separate models was required. As the resulting model is computationally intractable, we introduce a variational approximation where a reformulation of the problem enables the use of efficient inference algorithms.
PDF Web DOI BibTeX

Empirical Inference Conference Paper Block Iterative Algorithms for Non-negative Matrix Approximation Sra, S. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM 2008), ICDM 2008, 1037-1042, (Editors: Giannotti, F. , D. Gunopulos, F. Turini, C. Zaniolo, N. Ramakrishnan, X. Wu), IEEE Service Center, Piscataway, NJ, USA, Eighth IEEE International Conference on Data Mining, December 2008
In this paper we present new algorithms for non-negative matrix approximation (NMA), commonly known as the NMF problem. Our methods improve upon the well-known methods of Lee & Seung [lee00] for both the Frobenius norm and the Kullback-Leibler divergence versions of the problem. For the latter problem, our results are especially interesting because it seems to have witnessed much less algorithmic progress than the Frobenius norm NMA problem. Our algorithms are based on a particular block-iterative acceleration technique for EM, which preserves the multiplicative nature of the updates and also ensures monotonicity. Furthermore, our algorithms also naturally apply to the Bregman-divergence NMA algorithms of [suv.nips]. Experimentally, we show that our algorithms outperform the traditional Lee/Seung approach most of the time.
Web DOI BibTeX
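
For orientation, the Lee & Seung multiplicative updates that the abstract above takes as its baseline can be sketched as follows for the Frobenius-norm case. This is a minimal illustration of the baseline only, not the block-iterative method of the paper; the function names, the random initialization, and the small `eps` guard in the denominators are assumptions made for this sketch.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - p) ** 2 for rv, rp in zip(V, WH) for v, p in zip(rv, rp))

def nmf_multiplicative(V, rank, iters=200, seed=0, eps=1e-12):
    """Baseline multiplicative updates for V ~ WH with W, H >= 0 (hypothetical helper)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive start, so the multiplicative updates keep every entry non-negative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(V, W, H)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(V, W, H))
    return W, H, errors
```

Calling `nmf_multiplicative(V, rank=2)` on a small non-negative matrix returns the two factors together with the error after each iteration, which makes the monotone decrease mentioned in the abstract easy to inspect.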

Empirical Inference Conference Paper Frequent Subgraph Retrieval in Geometric Graph Databases Nowozin, S., Tsuda, K. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), ICDM 2008, 953-958, (Editors: Giannotti, F. , D. Gunopulos, F. Turini, C. Zaniolo, N. Ramakrishnan, X. Wu), IEEE Computer Society, Los Alamitos, CA, USA, 8th IEEE International Conference on Data Mining, December 2008
Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric ε-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric ε-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.
PDF Web DOI BibTeX

Empirical Inference Conference Paper Iterative Subgraph Mining for Principal Component Analysis Saigo, H., Tsuda, K. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2008), ICDM 2008, 1007-1012, (Editors: Giannotti, F. , D. Gunopulos, F. Turini, C. Zaniolo, N. Ramakrishnan, X. Wu), IEEE Computer Society, Los Alamitos, CA, USA, IEEE International Conference on Data Mining, December 2008
Graph mining methods enumerate frequent subgraphs efficiently, but they are not necessarily good features for machine learning due to high correlation among features. Thus it makes sense to perform principal component analysis to reduce the dimensionality and create decorrelated features. We present a novel iterative mining algorithm that captures informative patterns corresponding to major entries of top principal components. It repeatedly calls weighted substructure mining where example weights are updated in each iteration. The Lanczos algorithm, a standard algorithm of eigendecomposition, is employed to update the weights. In experiments, our patterns are shown to approximate the principal components obtained by frequent mining.
PDF Web DOI BibTeX

Empirical Inference Conference Paper Joint Kernel Support Estimation for Structured Prediction Lampert, C., Blaschko, M. In Proceedings of the NIPS 2008 Workshop on "Structured Input - Structured Output" (NIPS SISO 2008), 1-4, NIPS 2008 Workshop on "Structured Input - Structured Output" (NIPS SISO 2008), December 2008
We present a new technique for structured prediction that works in a hybrid generative/discriminative way, using a one-class support vector machine to model the joint probability of (input, output)-pairs in a joint reproducing kernel Hilbert space. Compared to discriminative techniques, like conditional random fields or structured output SVMs, the proposed method has the advantage that its training time depends only on the number of training examples, not on the size of the label space. Due to its generative aspect, it is also very tolerant against ambiguous, incomplete or incorrect labels. Experiments on realistic data show that our method works efficiently and robustly in situations for which discriminative techniques have computational or statistical problems.
PDF Web BibTeX

Empirical Inference Talk Logistic Regression for Graph Classification Shervashidze, N., Tsuda, K. NIPS 2008 Workshop on "Structured Input - Structured Output" (NIPS SISO 2008), December 2008
In this paper we deal with graph classification. We propose a new algorithm for performing sparse logistic regression for graphs, which is comparable in accuracy with other methods of graph classification and produces probabilistic output in addition. Sparsity is required for the reason of interpretability, which is often necessary in domains such as bioinformatics or chemoinformatics.
Web BibTeX

Empirical Inference Talk New Projected Quasi-Newton Methods with Applications Sra, S. Microsoft Research Tech-talk, December 2008
Box-constrained convex optimization problems are central to several applications in a variety of fields such as statistics, psychometrics, signal processing, medical imaging, and machine learning. Two fundamental examples are the non-negative least squares (NNLS) problem and the non-negative Kullback-Leibler (NNKL) divergence minimization problem. The non-negativity constraints are usually based on an underlying physical restriction, e.g., when dealing with applications in astronomy, tomography, statistical estimation, or image restoration, the underlying parameters represent physical quantities such as concentration, weight, intensity, or frequency counts and are therefore only interpretable with non-negative values. Several modern optimization methods can be inefficient for simple problems such as NNLS and NNKL as they are really designed to handle far more general and complex problems. In this work we develop two simple quasi-Newton methods for solving box-constrained (differentiable) convex optimization problems that utilize the well-known BFGS and limited memory BFGS updates. We position our method between projected gradient (Rosen, 1960) and projected Newton (Bertsekas, 1982) methods, and prove its convergence under a simple Armijo step-size rule. We illustrate our method by showing applications to image deblurring, Positron Emission Tomography (PET) image reconstruction, and Non-negative Matrix Approximation (NMA). On medium-sized data we observe performance competitive with established procedures, while for larger data the results are even better.
PDF BibTeX
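
To make the NNLS setting in the abstract above concrete, here is a minimal sketch of plain projected gradient with an Armijo backtracking rule along the projection arc. It illustrates the simpler end of the spectrum the talk positions itself against, not the quasi-Newton methods it proposes; the function name, the parameters `beta` and `sigma`, and the fixed iteration count are assumptions of this sketch.

```python
def nnls_projected_gradient(A, b, iters=500, beta=0.5, sigma=1e-4):
    """Minimize 0.5 * ||Ax - b||^2 subject to x >= 0 (hypothetical helper).

    A is a list of rows, b a list of floats. Plain projected gradient,
    shown only as a baseline for the box-constrained setting.
    """
    n = len(A[0])
    x = [0.0] * n

    def residual(x):
        return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

    def objective(x):
        return 0.5 * sum(r * r for r in residual(x))

    for _ in range(iters):
        r = residual(x)
        # Gradient of the objective: A^T (Ax - b)
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        f, step = objective(x), 1.0
        while True:
            # Gradient step followed by projection onto the non-negative orthant
            x_new = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]
            # Armijo sufficient-decrease test along the projection arc
            decrease = sum(g * (xi - xn) for g, xi, xn in zip(grad, x, x_new))
            if objective(x_new) <= f - sigma * decrease or step < 1e-12:
                break
            step *= beta
        x = x_new
    return x
```

For example, `nnls_projected_gradient([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])` clips the second coordinate at zero, since the unconstrained least-squares solution would be negative there.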

Empirical Inference Conference Paper Stereo Matching for Calibrated Cameras without Correspondence Helmke, U., Hüper, K., Vences, L. In Proceedings of the 47th IEEE Conference on Decision and Control (CDC 2008), CDC 2008, 2408-2413, IEEE Service Center, Piscataway, NJ, USA, 47th IEEE Conference on Decision and Control, December 2008
We study the stereo matching problem for reconstruction of the location of 3D-points on an unknown surface patch from two calibrated identical cameras without using any a priori information about the pointwise correspondences. We assume that camera parameters and the pose between the cameras are known. Our approach follows earlier work for coplanar cameras where a gradient flow algorithm was proposed to match associated Gramians. Here we extend this method by allowing arbitrary poses for the cameras. We introduce an intrinsic Riemannian Newton algorithm that achieves local quadratic convergence rates. A closed form solution is presented, too. The efficiency of both algorithms is demonstrated by numerical experiments.
PDF Web DOI BibTeX

Empirical Inference Article Modelling contrast discrimination data suggest both the pedestal effect and stochastic resonance to be caused by the same mechanism Goris, R., Wagemans, J., Wichmann, F. Journal of Vision, 8(15):1-21, November 2008
Computational models of spatial vision typically make use of a (rectified) linear filter, a nonlinearity and dominant late noise to account for human contrast discrimination data. Linear–nonlinear cascade models predict an improvement in observers' contrast detection performance when low, subthreshold levels of external noise are added (i.e., stochastic resonance). Here, we address the issue whether a single contrast gain-control model of early spatial vision can account for both the pedestal effect, i.e., the improved detectability of a grating in the presence of a low-contrast masking grating, and stochastic resonance. We measured contrast discrimination performance without noise and in both weak and moderate levels of noise. Making use of a full quantitative description of our data with few parameters combined with comprehensive model selection assessments, we show the pedestal effect to be more reduced in the presence of weak noise than in moderate noise. This reduction rules out independent, additive sources of performance improvement and, together with a simulation study, supports the parsimonious explanation that a single mechanism underlies the pedestal effect and stochastic resonance in contrast perception.
Web DOI BibTeX

Empirical Inference Technical Report Frequent Subgraph Retrieval in Geometric Graph Databases Nowozin, S., Tsuda, K. (180), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2008
Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric epsilon-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric epsilon-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.
PDF BibTeX

Empirical Inference Article Kernels, Regularization and Differential Equations Steinke, F., Schölkopf, B. Pattern Recognition, 41(11):3271-3286, November 2008
Many common machine learning methods such as Support Vector Machines or Gaussian process inference make use of positive definite kernels, reproducing kernel Hilbert spaces, Gaussian processes, and regularization operators. In this work these objects are presented in a general, unifying framework, and interrelations are highlighted. With this in mind we then show how linear stochastic differential equation models can be incorporated naturally into the kernel framework. And vice versa, many kernel machines can be interpreted in terms of differential equations. We focus especially on ordinary differential equations, also known as dynamical systems, and it is shown that standard kernel inference algorithms are equivalent to Kalman filter methods based on such models. In order not to cloud qualitative insights with heavy mathematical machinery, we restrict ourselves to finite domains, implying that differential equations are treated via their corresponding finite difference equations.
PDF DOI BibTeX

Empirical Inference Article Machine Learning for Motor Skills in Robotics Peters, J. Künstliche Intelligenz, 2008(4):41-43, November 2008
Autonomous robots that can adapt to novel situations have been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. Early approaches to this goal during the heyday of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks of future robots. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise, as only a few methods manage to scale into the high-dimensional domains of manipulator and humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. We have investigated the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. To do so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.
PDF Web BibTeX

Empirical Inference Conference Paper Policy Learning: A Unified Perspective with Applications in Robotics Peters, J., Kober, J., Nguyen-Tuong, D. In Recent Advances in Reinforcement Learning: 8th European Workshop (EWRL 2008), EWRL 2008, 220-228, (Editors: Girgin, S. , M. Loth, R. Munos, P. Preux, D. Ryabko), Springer, Berlin, Germany, 8th European Workshop on Reinforcement Learning, November 2008
Policy learning approaches are among the best-suited methods for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper, we make two contributions: firstly, we show a unified perspective which allows us to derive several policy learning algorithms from a common point of view, i.e., policy gradient algorithms, natural-gradient algorithms and EM-like policy learning. Secondly, we present several applications to both robot motor primitive learning and robot control in task space. Results both from simulation and several different real robots are shown.
PDF Web DOI BibTeX

Empirical Inference Conference Paper Probabilistic Inference for Fast Learning in Control Rasmussen, C., Deisenroth, M. In Recent Advances in Reinforcement Learning: 8th European Workshop (EWRL 2008), EWRL 2008, 229-242, (Editors: Girgin, S. , M. Loth, R. Munos, P. Preux, D. Ryabko), Springer, Berlin, Germany, 8th European Workshop on Reinforcement Learning, November 2008
We provide a novel framework for very fast model-based reinforcement learning in continuous state and action spaces. The framework requires probabilistic models that explicitly characterize their levels of confidence. Within this framework, we use flexible, non-parametric models to describe the world based on previously collected experience. We demonstrate learning on the cart-pole problem in a setting where we provide very limited prior knowledge about the task. Learning progresses rapidly, and a good policy is found after only a handful of iterations.
PDF Web DOI BibTeX

Empirical Inference Technical Report Simultaneous Implicit Surface Reconstruction and Meshing Giesen, J., Maier, M., Schölkopf, B. (179), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2008
We investigate an implicit method to compute a piecewise linear representation of a surface from a set of sample points. As implicit surface functions we use the weighted sum of piecewise linear kernel functions. For such a function we can partition $\mathbb{R}^d$ in such a way that these functions are linear on the subsets of the partition. For each subset in the partition we can then compute the zero level set of the function exactly as the intersection of a hyperplane with the subset.
PDF BibTeX

Empirical Inference Technical Report Taxonomy Inference Using Kernel Dependence Measures Blaschko, M., Gretton, A. (181), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2008
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.
PDF BibTeX

Empirical Inference Poster Variational Bayesian Model Selection in Linear Gaussian State-Space based Models Chiappa, S. International Workshop on Flexible Modelling: Smoothing and Robustness (FMSR 2008), 2008:1, November 2008 Web BibTeX

Empirical Inference Article gBoost: A Mathematical Programming Approach to Graph Classification and Regression Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., Tsuda, K. Machine Learning, 75(1):69-89, November 2008
Graph mining methods enumerate frequently appearing subgraph patterns, which can be used as features for subsequent classification or regression. However, frequent patterns are not necessarily informative for the given learning problem. We propose a mathematical programming boosting method (gBoost) that progressively collects informative patterns. Compared to AdaBoost, gBoost can build the prediction rule with fewer iterations. To apply the boosting method to graph data, a branch-and-bound pattern search algorithm is developed based on the DFS code tree. The constructed search space is reused in later iterations to minimize the computation time. Our method can learn more efficiently than the simpler method based on frequent substructure mining, because the output labels are used as an extra information source for pruning the search space. Furthermore, by engineering the mathematical program, a wide range of machine learning problems can be solved without modifying the pattern search algorithm.
PDF DOI BibTeX

Autonomous Motion Conference Paper A Versatile Stair-Climbing Robot for Search and Rescue Applications Eich, M., Grimminger, F., Kirchner, F. In 2008 IEEE International Workshop on Safety, Security and Rescue Robotics, 35-40, October 2008 DOI BibTeX

Movement Generation and Control Conference Paper A modular bio-inspired architecture for movement generation for the infant-like robot iCub Degallier, S., Righetti, L., Natale, L., Nori, F., Metta, G., Ijspeert, A. In 2008 2nd IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics, 795-800, IEEE, Scottsdale, USA, October 2008
Movement generation in humans appears to be processed through a three-layered architecture, where each layer corresponds to a different level of abstraction in the representation of the movement. In this article, we will present an architecture reflecting this organization and based on a modular approach to human movement generation. We will show that our architecture is well suited for the online generation and modulation of motor behaviors, but also for switching between motor behaviors. This will be illustrated respectively through an interactive drumming task and through switching between reaching and crawling.
DOI URL BibTeX

Movement Generation and Control Conference Paper A Dynamical System for Online Learning of Periodic Movements of Unknown Waveform and Frequency Gams, A., Righetti, L., Ijspeert, A., Lenarčič, J. In 2008 2nd IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics, 85-90, IEEE, Scottsdale, USA, October 2008
The paper presents a two-layered system for learning and encoding a periodic signal onto a limit cycle without any knowledge of the waveform and the frequency of the signal, and without any signal processing. The first dynamical system is responsible for extracting the main frequency of the input signal. It is based on adaptive frequency phase oscillators in a feedback structure, enabling us to extract separate frequency components without any signal processing, as all of the processing is embedded in the dynamics of the system itself. The second dynamical system is responsible for learning the waveform. It has a built-in learning algorithm based on locally weighted regression, which adjusts the weights according to the amplitude of the input signal. By combining the output of the first system with the input of the second system we can rapidly teach new trajectories to robots. The system works online for any periodic signal and can be applied in parallel to multiple dimensions. Furthermore, it can adapt to changes in frequency and shape, e.g. to non-stationary signals, and is computationally inexpensive. Results using simulated and hand-generated input signals, along with applying the algorithm to a HOAP-2 humanoid robot, are presented.
DOI URL BibTeX

Empirical Inference Article Approximations for Binary Gaussian Process Classification Nickisch, H., Rasmussen, C. Journal of Machine Learning Research, 9:2035-2078, October 2008
We provide a comprehensive overview of many recent algorithms for approximate inference in Gaussian process models for probabilistic binary classification. The relationships between several approaches are elucidated theoretically, and the properties of the different algorithms are corroborated by experimental results. We examine both 1) the quality of the predictive distributions and 2) the suitability of the different marginal likelihood approximations for model selection (selecting hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods produce good predictive distributions although their marginal likelihood approximations are poor. Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost always the method of choice unless the computational budget is very tight. We also extend existing methods in various ways, and provide unifying code implementing all approaches.
PDF PDF BibTeX

Empirical Inference Conference Paper Automatic Image Colorization Via Multimodal Predictions Charpiat, G., Hofmann, M., Schölkopf, B. In Computer Vision: ECCV 2008, Computer Vision - ECCV 2008, Lecture Notes in Computer Science, Vol. 5304, 126-139, (Editors: DA Forsyth and PHS Torr and A Zisserman), Springer, Berlin, Germany, 10th European Conference on Computer Vision, October 2008
We aim to automatically color greyscale images, without any manual intervention. The color proposition could then be interactively corrected by user-provided color landmarks if necessary. Automatic colorization is nontrivial since there is usually no one-to-one correspondence between color and local texture. The contribution of our framework is that we deal directly with multimodality and estimate, for each pixel of the image to be colored, the probability distribution of all possible colors, instead of choosing the most probable color at the local level. We also predict the expected variation of color at each pixel, thus defining a nonuniform spatial coherency criterion. We then use graph cuts to maximize the probability of the whole colored image at the global level. We work in the L-a-b color space in order to approximate the human perception of distances between colors, and we use machine learning tools to extract as much information as possible from a dataset of colored examples. The resulting algorithm is fast, designed to be more robust to texture noise, and is above all able to deal with ambiguity, contrary to previous approaches.
PDF Web DOI BibTeX

Empirical Inference Conference Paper Learning to Localize Objects with Structured Output Regression Blaschko, M., Lampert, C. In Computer Vision: ECCV 2008, ECCV 2008, 2-15, (Editors: Forsyth, D. A., P. H.S. Torr, A. Zisserman), Springer, Berlin, Germany, 10th European Conference on Computer Vision, October 2008, Best Student Paper Award
Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured training procedure improves performance over binary training as well as the best previously published scores.
PDF Web DOI BibTeX

Empirical Inference Talk MR-Based PET Attenuation Correction: Initial Results for Whole Body Hofmann, M., Steinke, F., Aschoff, P., Lichy, M., Brady, M., Schölkopf, B., Pichler, B. Medical Imaging Conference, October 2008 BibTeX

Empirical Inference Article MRI-Based Attenuation Correction for PET/MRI: A Novel Approach Combining Pattern Recognition and Atlas Registration Hofmann, M., Steinke, F., Scheel, V., Charpiat, G., Farquhar, J., Aschoff, P., Brady, M., Schölkopf, B., Pichler, B. Journal of Nuclear Medicine, 49(11):1875-1883, October 2008
For quantitative PET information, correction of tissue photon attenuation is mandatory. Generally in conventional PET, the attenuation map is obtained from a transmission scan, which uses a rotating radionuclide source, or from the CT scan in a combined PET/CT scanner. In the case of PET/MRI scanners currently under development, insufficient space for the rotating source exists; the attenuation map can be calculated from the MR image instead. This task is challenging because MR intensities correlate with proton densities and tissue-relaxation properties, rather than with attenuation-related mass density. METHODS: We used a combination of local pattern recognition and atlas registration, which captures global variation of anatomy, to predict pseudo-CT images from a given MR image. These pseudo-CT images were then used for attenuation correction, as the process would be performed in a PET/CT scanner. RESULTS: For human brain scans, we show on a database of 17 MR/CT image pairs that our method reliably enables estimation of a pseudo-CT image from the MR image alone. On additional datasets of MRI/PET/CT triplets of human brain scans, we compare MRI-based attenuation correction with CT-based correction. Our approach enables PET quantification with a mean error of 3.2% for predefined regions of interest, which we found to be clinically not significant. However, our method is not specific to brain imaging, and we show promising initial results on 1 whole-body animal dataset. CONCLUSION: This method allows reliable MRI-based attenuation correction for human brain scans. Further work is necessary to validate the method for whole-body imaging.
Web DOI BibTeX

Empirical Inference Article Mixture Models for Protein Structure Ensembles Hirsch, M., Habeck, M. Bioinformatics, 24(19):2184-2192, October 2008 Web DOI BibTeX

Empirical Inference Talk Nonparametric Independence Tests: Space Partitioning and Kernel Approaches Gretton, A., Györfi, L. 19th International Conference on Algorithmic Learning Theory (ALT08), October 2008 PDF Web BibTeX

Empirical Inference Conference Paper Nonparametric Independence Tests: Space Partitioning and Kernel Approaches Gretton, A., Györfi, L. In Algorithmic Learning Theory: 19th International Conference (ALT08), ALT08, 183-198, (Editors: Freund, Y. , L. Györfi, G. Turán, T. Zeugmann), Springer, Berlin, Germany, 19th International Conference on Algorithmic Learning Theory (ALT08), October 2008
Three simple and explicit procedures for testing the independence of two multi-dimensional random variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. All tests reject the null hypothesis of independence if the test statistics become large. The large deviation and limit distribution properties of all three test statistics are given. Following from these results, distribution-free strong consistent tests of independence are derived, as are asymptotically alpha-level tests. The performance of the tests is evaluated experimentally on benchmark data.
PDF Web DOI BibTeX

Movement Generation and Control Conference Paper Passive compliant quadruped robot using central pattern generators for locomotion control Rutishauser, S., Sproewitz, A., Righetti, L., Ijspeert, A. In 2008 IEEE International Conference on Biomedical Robotics and Biomechatronics, 710-715, IEEE, Scottsdale, USA, October 2008
We present a new quadruped robot, "Cheetah", featuring three-segment pantographic legs with passive compliant knee joints. Each leg has two degrees of freedom: knee and hip joint can be actuated using proximally mounted RC servo motors, and force transmission to the knee is achieved by means of a Bowden cable mechanism. Simple electronics to command the actuators from a desktop computer have been designed in order to test the robot. A Central Pattern Generator (CPG) network has been implemented to generate different gaits. A parameter space search was performed and tested on the robot to optimize forward velocity.
DOI URL BibTeX

Empirical Inference Article Structure of the human voltage-dependent anion channel Bayrhuber, M., Meins, T., Habeck, M., Becker, S., Giller, K., Villinger, S., Vonrhein, C., Griesinger, C., Zweckstetter, M., Zeth, K. Proceedings of the National Academy of Sciences of the United States of America, 105(40):15370-15375, October 2008
The voltage-dependent anion channel (VDAC), also known as mitochondrial porin, is the most abundant protein in the mitochondrial outer membrane (MOM). VDAC is the channel known to guide the metabolic flux across the MOM and plays a key role in mitochondrially induced apoptosis. Here, we present the 3D structure of human VDAC1, which was solved conjointly by NMR spectroscopy and x-ray crystallography. Human VDAC1 (hVDAC1) adopts a β-barrel architecture composed of 19 β-strands with an α-helix located horizontally midway within the pore. Bioinformatic analysis indicates that this channel architecture is common to all VDAC proteins and is adopted by the general import pore TOM40 of mammals, which is also located in the MOM.
Web DOI BibTeX

Empirical Inference Article Support Vector Machines and Kernels for Computational Biology Ben-Hur, A., Ong, C., Sonnenburg, S., Schölkopf, B., Rätsch, G. PLoS Computational Biology, 4(10: e1000173):1-10, October 2008 PDF Web DOI BibTeX

Empirical Inference Conference Paper A Kernel Statistical Test of Independence Gretton, A., Fukumizu, K., Teo, C., Song, L., Schölkopf, B., Smola, A. In Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007, Advances in neural information processing systems 20, 585-592, (Editors: JC Platt and D Koller and Y Singer and S Roweis), Curran, Red Hook, NY, USA, 21st Annual Conference on Neural Information Processing Systems (NIPS 2007), September 2008
Whereas kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table-based tests. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
PDF Web BibTeX

Empirical Inference Article A Single-shot Measurement of the Energy of Product States in a Translation Invariant Spin Chain Can Replace Any Quantum Computation Janzing, D., Wocjan, P., Zhang, S. New Journal of Physics, 10(093004):1-18, September 2008
In measurement-based quantum computation, quantum algorithms are implemented via sequences of measurements. We describe a translationally invariant finite-range interaction on a one-dimensional qudit chain and prove that a single-shot measurement of the energy of an appropriate computational basis state with respect to this Hamiltonian provides the output of any quantum circuit. The required measurement accuracy scales inverse polynomially with the size of the simulated quantum circuit. This shows that the implementation of energy measurements on generic qudit chains is as hard as the realization of quantum computation. Here, a ‘measurement’ is any procedure that samples from the spectral measurement induced by the observable and the state under consideration. As opposed to measurement-based quantum computation, the post-measurement state is irrelevant.
PDF DOI BibTeX

Empirical Inference Article Accurate NMR Structures Through Minimization of an Extended Hybrid Energy Nilges, M., Bernard, A., Bardiaux, B., Malliavin, T., Habeck, M., Rieping, W. Structure, 16(9):1305-1312, September 2008
The use of generous distance bounds has been the hallmark of NMR structure determination. However, bounds necessitate the estimation of data quality before the calculation, reduce the information content, introduce human bias, and allow for major errors in the structures. Here, we propose a new rapid structure calculation scheme based on Bayesian analysis. The minimization of an extended energy function, including a new type of distance restraint and a term depending on the data quality, results in an estimation of the data quality in addition to coordinates. This allows for the determination of the optimal weight on the experimental information. The resulting structures are of better quality and closer to the X-ray crystal structure of the same molecule. With the new calculation approach, the analysis of discrepancies from the target distances becomes meaningful. The strategy may be useful in other applications, for example in homology modeling.
PDF DOI BibTeX

Empirical Inference Conference Paper An Analysis of Inference with the Universum Sinz, F., Chapelle, O., Agarwal, A., Schölkopf, B. In Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007, Advances in neural information processing systems 20, 1369-1376, (Editors: JC Platt and D Koller and Y Singer and S Roweis), Curran, Red Hook, NY, USA, 21st Annual Conference on Neural Information Processing Systems (NIPS 2007), September 2008
We study a pattern classification algorithm which has recently been proposed by Vapnik and coworkers. It builds on a new inductive principle which assumes that in addition to positive and negative data, a third class of data is available, termed the Universum. We assay the behavior of the algorithm by establishing links with Fisher discriminant analysis and oriented PCA, as well as with an SVM in a projected subspace (or, equivalently, with a data-dependent reduced kernel). We also provide experimental results.
PDF Web BibTeX

Empirical Inference Conference Paper An Automated Combination of Kernels for Predicting Protein Subcellular Localization Ong, C., Zien, A. In Algorithms in Bioinformatics: 8th International Workshop (WABI 2008), WABI 2008, 186-197, (Editors: Crandall, K. A., J. Lagergren), Springer, Berlin, Germany, 8th Workshop on Algorithms in Bioinformatics, September 2008
Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. Here we utilize the multiclass support vector machine (m-SVM) method to directly solve protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. We further propose a general class of protein sequence kernels which considers all motifs, including motifs with gaps. Instead of heuristically selecting one or a few kernels from this family, we utilize a recent extension of SVMs that optimizes over multiple kernels simultaneously. This way, we automatically search over families of possible amino acid motifs. We compare our automated approach to three other predictors on four different datasets, and show that we perform better than the current state of the art. Further, our method provides some insights as to which sequence motifs are most useful for determining subcellular localization, which are in agreement with biological reasoning.
PDF PDF Web DOI BibTeX

Empirical Inference Technical Report Approximation Algorithms for Bregman Clustering Co-clustering and Tensor Clustering Sra, S., Jegelka, S., Banerjee, A. (177), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2008
The Euclidean K-means problem is fundamental to clustering and over the years it has been intensely investigated. More recently, generalizations such as Bregman k-means [8], co-clustering [10], and tensor (multi-way) clustering [40] have also gained prominence. A well-known computational difficulty encountered by these clustering problems is the NP-Hardness of the associated optimization task, and commonly used methods guarantee at most local optimality. Consequently, approximation algorithms of varying degrees of sophistication have been developed, though largely for the basic Euclidean K-means (or ℓ1-norm K-median) problem. In this paper we present approximation algorithms for several Bregman clustering problems by building upon the recent paper of Arthur and Vassilvitskii [5]. Our algorithms obtain objective values within a factor O(log K) for Bregman k-means, Bregman co-clustering, Bregman tensor clustering, and weighted kernel k-means. To our knowledge, except for some special cases, approximation algorithms have not been considered for these general clustering problems. There are several important implications of our work: (i) under the same assumptions as Ackermann et al. [1] it yields a much faster algorithm (non-exponential in K, unlike [1]) for information-theoretic clustering, (ii) it answers several open problems posed by [4], including generalizations to Bregman co-clustering, and tensor clustering, (iii) it provides practical and easy to implement methods, in contrast to several other common approximation approaches.
PDF BibTeX

Empirical Inference Conference Paper Assessing Nonlinear Granger Causality from Multivariate Time Series Sun, X. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, 440-455, (Editors: Daelemans, W., B. Goethals, K. Morik), Springer, Berlin, Germany, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2008), September 2008
A straightforward nonlinear extension of Granger’s concept of causality in the kernel framework is suggested. The kernel-based approach to assessing nonlinear Granger causality in multivariate time series enables us to determine, in a model-free way, whether the causal relation between two time series is present or not and whether it is direct or mediated by other processes. The trace norm of the so-called covariance operator in feature space is used to measure the prediction error. Relying on this measure, we test the improvement of predictability between time series by subsampling-based multiple testing. The distributional properties of the resulting p-values reveal the direction of Granger causality. Experiments with simulated and real-world data show that our method provides encouraging results.
PDF PDF DOI BibTeX

Empirical Inference Conference Paper Automatic 3D Face Reconstruction from Single Images or Video Breuer, P., Kim, K., Kienzle, W., Schölkopf, B., Blanz, V. In Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2008), FG 2008, 1-8, IEEE Computer Society, Los Alamitos, CA, USA, 8th IEEE International Conference on Automatic Face and Gesture Recognition, September 2008
This paper presents a fully automated algorithm for reconstructing a textured 3D model of a face from a single photograph or a raw video stream. The algorithm is based on a combination of Support Vector Machines (SVMs) and a Morphable Model of 3D faces. After SVM face detection, individual facial features are detected using a novel regression- and classification-based approach, and probabilistically plausible configurations of features are selected to produce a list of candidates for several facial feature positions. In the next step, the configurations of feature points are evaluated using a novel criterion that is based on a Morphable Model and a combination of linear projections. To make the algorithm robust with respect to head orientation, this process is iterated while the estimate of pose is refined. Finally, the feature points initialize a model-fitting procedure of the Morphable Model. The result is a high-resolution 3D surface model.
PDF DOI BibTeX

Empirical Inference Conference Paper Bayesian Inference for Spiking Neuron Models with a Sparsity Prior Gerwinn, S., Macke, J., Seeger, M., Bethge, M. In Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007, Advances in neural information processing systems 20, 529-536, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS 2007), September 2008
Generalized linear models are the most commonly used tools to describe the stimulus selectivity of sensory neurons. Here we present a Bayesian treatment of such models. Using the expectation propagation algorithm, we are able to approximate the full posterior distribution over all weights. In addition, we use a Laplacian prior to favor sparse solutions. Therefore, stimulus features that do not critically influence neural activity will be assigned zero weights and thus be effectively excluded by the model. This feature selection mechanism facilitates both the interpretation of the neuron model as well as its predictive abilities. The posterior distribution can be used to obtain confidence intervals which makes it possible to assess the statistical significance of the solution. In neural data analysis, the available amount of experimental measurements is often limited whereas the parameter space is large. In such a situation, both regularization by a sparsity prior and uncertainty estimates for the model parameters are essential. We apply our method to multi-electrode recordings of retinal ganglion cells and use our uncertainty estimate to test the statistical significance of functional couplings between neurons. Furthermore we used the sparsity of the Laplace prior to select those filters from a spike-triggered covariance analysis that are most informative about the neural response.
PDF Web BibTeX

Empirical Inference Technical Report Block-Iterative Algorithms for Non-Negative Matrix Approximation Sra, S. (176), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2008
In this report we present new algorithms for non-negative matrix approximation (NMA), commonly known as the NMF problem. Our methods improve upon the well-known methods of Lee & Seung [19] for both the Frobenius norm as well as the Kullback-Leibler divergence versions of the problem. For the latter problem, our results are especially interesting because it seems to have witnessed much less algorithmic progress than the Frobenius norm NMA problem. Our algorithms are based on a particular block-iterative acceleration technique for EM, which preserves the multiplicative nature of the updates and also ensures monotonicity. Furthermore, our algorithms also naturally apply to the Bregman-divergence NMA algorithms of Dhillon and Sra [8]. Experimentally, we show that our algorithms outperform the traditional Lee/Seung approach most of the time.
PDF BibTeX

Empirical Inference Conference Paper Colored Maximum Variance Unfolding Song, L., Smola, A., Borgwardt, K., Gretton, A. In Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007, Advances in neural information processing systems 20, 1385-1392, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS 2007), September 2008
Maximum variance unfolding (MVU) is an effective heuristic for dimensionality reduction. It produces a low-dimensional representation of the data by maximizing the variance of their embeddings while preserving the local distances of the original data. We show that MVU also optimizes a statistical dependence measure which aims to retain the identity of individual observations under the distance-preserving constraints. This general view allows us to design "colored" variants of MVU, which produce low-dimensional representations for a given task, e.g. subject to class labels or other side information.
PDF Web BibTeX

Empirical Inference Article Comparison of Pattern Recognition Methods in Classifying High-resolution BOLD Signals Obtained at High Magnetic Field in Monkeys Ku, S., Gretton, A., Macke, J., Logothetis, N. Magnetic Resonance Imaging, 26(7):1007-1014, September 2008
Pattern recognition methods have shown that functional magnetic resonance imaging (fMRI) data can reveal significant information about brain activity. For example, in the debate of how object categories are represented in the brain, multivariate analysis has been used to provide evidence of a distributed encoding scheme [Science 293:5539 (2001) 2425–2430]. Many follow-up studies have employed different methods to analyze human fMRI data with varying degrees of success [Nature reviews 7:7 (2006) 523–534]. In this study, we compare four popular pattern recognition methods: correlation analysis, support-vector machines (SVM), linear discriminant analysis (LDA) and Gaussian naïve Bayes (GNB), using data collected at high field (7 Tesla) with higher resolution than usual fMRI studies. We investigate prediction performance on single trials and for averages across varying numbers of stimulus presentations. The performance of the various algorithms depends on the nature of the brain activity being categorized: for several tasks, many of the methods work well, whereas for others, no method performs above chance level. An important factor in overall classification performance is careful preprocessing of the data, including dimensionality reduction, voxel selection and outlier elimination.
PDF Web DOI BibTeX

Empirical Inference Conference Paper Consistent Minimization of Clustering Objective Functions von Luxburg, U., Bubeck, S., Jegelka, S., Kaufmann, M. In Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007, Advances in neural information processing systems 20, 961-968, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS 2007), September 2008
Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal. As an alternative, we suggest the paradigm of "nearest neighbor clustering". Instead of selecting the best out of all partitions of the sample, it only considers partitions in some restricted function class. Using tools from statistical learning theory we prove that nearest neighbor clustering is statistically consistent. Moreover, its worst case complexity is polynomial by construction, and it can be implemented with small average case complexity using branch and bound.
PDF Web BibTeX

Empirical Inference Conference Paper Discriminative K-means for Clustering Ye, J., Zhao, Z., Wu, M. In Advances in Neural Information Processing Systems 20: 21st Annual Conference on Neural Information Processing Systems 2007, Advances in neural information processing systems 20, 1649-1656, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS 2007), September 2008
We present a theoretical study on the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant and new insights into the nature of this subspace selection procedure. Based on this equivalence relationship, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present the nonlinear extension of DisKmeans using kernels. We show that the learning of the kernel matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets.
PDF Web BibTeX

Empirical Inference Conference Paper Distribution-free Learning of Bayesian Network Structure Sun, X. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, ECML PKDD 2008, 423-439, (Editors: Daelemans, W., B. Goethals, K. Morik), Springer, Berlin, Germany, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, September 2008
We present an independence-based method for learning Bayesian network (BN) structure without making any assumptions on the probability distribution of the domain. This is mainly useful for continuous domains. Even mixed continuous-categorical domains and structures containing vectorial variables can be handled. We address the problem by developing a non-parametric conditional independence test based on the so-called kernel dependence measure, which can be readily used by any existing independence-based BN structure learning algorithm. We demonstrate the structure learning of graphical models in continuous and mixed domains from real-world data without distributional assumptions. We also experimentally show that our test is a good alternative, in particular in case of small sample sizes, compared to existing tests, which can only be used in purely categorical or continuous domains.
PDF PDF DOI BibTeX

Empirical Inference Conference Paper Episodic Reinforcement Learning by Logistic Reward-Weighted Regression Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J. In Artificial Neural Networks: ICANN 2008, ICANN 2008, 407-416, (Editors: Kurkova-Pohlova, V., R. Neruda, J. Koutnik), Springer, Berlin, Germany, 18th International Conference on Artificial Neural Networks, September 2008
It has been a long-standing goal in the adaptive control community to reduce the generically difficult, general reinforcement learning (RL) problem to simpler problems solvable by supervised learning. While this approach is today’s standard for value function-based methods, fewer approaches are known that apply similar reductions to policy search methods. Recently, it has been shown that immediate RL problems can be solved by reward-weighted regression, and that the resulting algorithm is an expectation maximization (EM) algorithm with strong guarantees. In this paper, we extend this algorithm to the episodic case and show that it can be used in the context of LSTM recurrent neural networks (RNNs). The resulting RNN training algorithm is equivalent to a weighted self-modeling supervised learning technique. We focus on partially observable Markov decision problems (POMDPs) where it is essential that the policy is nonstationary in order to be optimal. We show that this new reward-weighted logistic regression used in conjunction with an RNN architecture can solve standard benchmark POMDPs with ease.
PDF Web DOI BibTeX

Movement Generation and Control Conference Paper Experimental Study of Limit Cycle and Chaotic Controllers for the Locomotion of Centipede Robots Matthey, L., Righetti, L., Ijspeert, A. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, 1860-1865, IEEE, Nice, France, September 2008
In this contribution we present a CPG (central pattern generator) controller based on coupled Rossler systems. It is able to generate both limit cycle and chaotic behaviors through bifurcation. We develop an experimental test bench to measure quantitatively the performance of different controllers on unknown terrains of increasing difficulty. First, we show that for flat terrains, open loop limit cycle systems are the most efficient (in terms of speed of locomotion) but that they are quite sensitive to environmental changes. Second, we show that sensory feedback is a crucial addition for unknown terrains. Third, we show that the chaotic controller with sensory feedback outperforms the other controllers in very difficult terrains and actually promotes the emergence of short synchronized movement patterns. All that is done using a unified framework for the generation of limit cycle and chaotic behaviors, where a simple parameter change can switch from one behavior to the other through bifurcation. Such flexibility would allow the automatic adaptation of the robot locomotion strategy to the terrain uncertainty.
DOI URL BibTeX