2006

Mining frequent stem patterns from unaligned RNA sequences

Hamada, M., Tsuda, K., Kudo, T., Kin, T., Asai, K.

Bioinformatics, 22(20):2480-2487, October 2006 (article)

Abstract
Motivation: In the detection of non-coding RNAs, it is often necessary to identify secondary structure motifs from a set of putative RNA sequences. Most existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all possible motifs thoroughly. Results: Our method RNAmine employs a graph-theoretic representation of RNA sequences and detects all possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed, labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably, in both accuracy and efficiency, with state-of-the-art methods such as CMFinder.

PDF Web DOI [BibTex]

Geometric Analysis of Hilbert Schmidt Independence criterion based ICA contrast function

Shen, H., Jegelka, S., Gretton, A.

(PA006080), National ICT Australia, Canberra, Australia, October 2006 (techreport)

Web [BibTex]

Large-Scale Gene Expression Profiling Reveals Major Pathogenetic Pathways of Cartilage Degeneration in Osteoarthritis

Aigner, T., Fundel, K., Saas, J., Gebhard, P., Haag, J., Weiss, T., Zien, A., Obermayr, F., Zimmer, R., Bartnik, E.

Arthritis and Rheumatism, 54(11):3533-3544, October 2006 (article)

Abstract
Objective. Despite many research efforts in recent decades, the major pathogenetic mechanisms of osteoarthritis (OA), including gene alterations occurring during OA cartilage degeneration, are poorly understood, and there is no disease-modifying treatment approach. The present study was therefore initiated in order to identify differentially expressed disease-related genes and potential therapeutic targets. Methods. This investigation consisted of a large gene expression profiling study performed based on 78 normal and disease samples, using a custom-made complementary DNA array covering >4,000 genes. Results. Many differentially expressed genes were identified, including the expected up-regulation of anabolic and catabolic matrix genes. In particular, the down-regulation of important oxidative defense genes, i.e., the genes for superoxide dismutases 2 and 3 and glutathione peroxidase 3, was prominent. This indicates that continuous oxidative stress to the cells and the matrix is one major underlying pathogenetic mechanism in OA. Also, genes that are involved in the phenotypic stability of cells, a feature that is greatly reduced in OA cartilage, appeared to be suppressed. Conclusion. Our findings provide a reference data set on gene alterations in OA cartilage and, importantly, indicate major mechanisms underlying central cell biologic alterations that occur during the OA disease process. These results identify molecular targets that can be further investigated in the search for therapeutic interventions.

Web DOI [BibTex]

Semi-Supervised Support Vector Machines and Application to Spam Filtering

Zien, A.

ECML Discovery Challenge Workshop, September 2006 (talk)

Abstract
After introducing the semi-supervised support vector machine (aka TSVM for "transductive SVM"), a few popular training strategies are briefly presented. Then the assumptions underlying semi-supervised learning are reviewed. Finally, two modern TSVM optimization techniques are applied to the spam filtering data sets of the workshop; it is shown that they can achieve excellent results, if the problem of the data being non-iid can be handled properly.

PDF Web [BibTex]

A Linear Programming Approach for Molecular QSAR analysis

Saigo, H., Kadowaki, T., Tsuda, K.

In MLG 2006, pages: 85-96, (Editors: Gärtner, T. , G. C. Garriga, T. Meinl), International Workshop on Mining and Learning with Graphs, September 2006, Best Paper Award (inproceedings)

Abstract
Small molecules in chemistry can be represented as graphs. In a quantitative structure-activity relationship (QSAR) analysis, the central task is to find a regression function that predicts the activity of the molecule with high accuracy. Setting QSAR as the primary target, we propose a new linear programming approach to the graph-based regression problem. Our method extends the graph classification algorithm by Kudo et al. (NIPS 2004), which is a combination of boosting and graph mining. Instead of sequential multiplicative updates, we employ linear programming boosting (LP) for regression. The LP approach allows us to include inequality constraints on the parameter vector, which turns out to be particularly useful in QSAR tasks where activity values are sometimes unavailable. Furthermore, efficiency is improved significantly by employing multiple pricing.

PDF PDF [BibTex]

Incremental Aspect Models for Mining Document Streams

Surendran, A., Sra, S.

In PKDD 2006, pages: 633-640, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
In this paper we introduce a novel approach for incrementally building aspect models, and use it to dynamically discover underlying themes from document streams. Using the new approach we present an application which we call “query-line tracking”, i.e., we automatically discover and summarize different themes or stories that appear over time and that relate to a particular query. We present an evaluation on news corpora to demonstrate the strength of our method for query-line tracking as well as for online indexing and clustering.

Web DOI [BibTex]

Learning Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

Sensory Coding And The Natural Environment, 2006, pages: 1, September 2006 (poster)

Abstract
The human visual system samples images through saccadic eye movements which rapidly change the point of fixation. Although the selection of eye movement targets depends on numerous top-down mechanisms, a number of recent studies have shown that low-level image features such as local contrast or edges play an important role. These studies typically used predefined image features which were afterwards experimentally verified. Here, we follow a complementary approach: instead of testing a set of candidate image features, we infer these hypotheses from the data, using methods from statistical learning. To this end, we train a non-linear classifier on fixated vs. randomly selected image patches without making any physiological assumptions. The resulting classifier can be essentially characterized by a nonlinear combination of two center-surround receptive fields. We find that the prediction performance of this simple model on our eye movement data is indistinguishable from the physiologically motivated model of Itti & Koch (2000) which is far more complex. In particular, we obtain a comparable performance without using any multi-scale representations, long-range interactions or oriented image features.

Web [BibTex]

Implicit Surface Modelling with a Globally Regularised Basis of Compact Support

Walder, C., Schölkopf, B., Chapelle, O.

Computer Graphics Forum, 25(3):635-644, September 2006 (article)

Abstract
We consider the problem of constructing a globally smooth analytic function that represents a surface implicitly by way of its zero set, given sample points with surface normal vectors. The contributions of the paper include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable interpolation properties previously only associated with fully supported bases. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem lying at the core of kernel-based machine learning methods. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data and four-dimensional interpolation between three-dimensional shapes.

PDF GZIP DOI [BibTex]

PALMA: Perfect Alignments using Large Margin Algorithms

Rätsch, G., Hepp, B., Schulze, U., Ong, C.

In GCB 2006, pages: 104-113, (Editors: Huson, D. , O. Kohlbacher, A. Lupas, K. Nieselt, A. Zell), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics, September 2006 (inproceedings)

Abstract
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: It perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.

PDF Web [BibTex]

Graph Based Semi-Supervised Learning with Sharper Edges

Shin, H., Hill, N., Rätsch, G.

In ECML 2006, pages: 401-412, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning (ECML), September 2006 (inproceedings)

Abstract
In many graph-based semi-supervised learning algorithms, edge weights are assumed to be fixed and determined by the data points' (often symmetric) relationships in input space, without considering directionality. However, relationships may be more informative in one direction (e.g. from labelled to unlabelled) than in the reverse direction, and some relationships (e.g. strong weights between oppositely labelled points) are unhelpful in either direction. Undesirable edges may reduce the amount of influence an informative point can propagate to its neighbours -- the point and its outgoing edges have been "blunted." We present an approach to "sharpening" in which weights are adjusted to meet an optimization criterion wherever they are directed towards labelled points. This principle can be applied to a wide variety of algorithms. In the current paper, we present one ad hoc solution satisfying the principle, in order to show that it can improve performance on a number of publicly available benchmark data sets.

PDF Web DOI [BibTex]

Semi-Supervised Learning

Chapelle, O., Schölkopf, B., Zien, A.

pages: 508, Adaptive computation and machine learning, MIT Press, Cambridge, MA, USA, September 2006 (book)

Abstract
In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground, between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. Semi-Supervised Learning first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. The book then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of extensive benchmark experiments. Finally, the book looks at interesting directions for SSL research. The book closes with a discussion of the relationship between semi-supervised learning and transduction.

Web [BibTex]

Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle

Deisenroth, MP., Ohtsuka, T., Weissel, F., Brunn, D., Hanebeck, UD.

In MFI 2006, pages: 371-376, (Editors: Hanebeck, U. D.), IEEE Service Center, Piscataway, NJ, USA, 6th IEEE International Conference on Multisensor Fusion and Integration, September 2006 (inproceedings)

Abstract
In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system.

PDF Web DOI [BibTex]

Uniform Convergence of Adaptive Graph-Based Regularization

Hein, M.

In COLT 2006, pages: 50-64, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
The regularization functional induced by the graph Laplacian of a random neighborhood graph based on the data is adaptive in two ways. First it adapts to an underlying manifold structure and second to the density of the data-generating probability measure. We identify in this paper the limit of the regularizer and show uniform convergence over the space of Hölder functions. As an intermediate step we derive upper bounds on the covering numbers of Hölder functions on compact Riemannian manifolds, which are of independent interest for the theoretical analysis of manifold-based learning methods.

PDF Web DOI [BibTex]

Regularised CSP for Sensor Selection in BCI

Farquhar, J., Hill, N., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 14-15, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10--20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.

PDF PDF [BibTex]

Time-Dependent Demixing of Task-Relevant EEG Signals

Hill, N., Farquhar, J., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 20-21, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
Given a spatial filtering algorithm that has allowed us to identify task-relevant EEG sources, we present a simple approach for monitoring the activity of these sources while remaining relatively robust to changes in other (task-irrelevant) brain activity. The idea is to keep spatial *patterns* fixed rather than spatial filters, when transferring from training to test sessions or from one time window to another. We show that a fixed spatial pattern (FSP) approach, using a moving-window estimate of signal covariances, can be more robust to non-stationarity than a fixed spatial filter (FSF) approach.

PDF PDF [BibTex]

Inferential Structure Determination: Probabilistic determination and validation of NMR structures

Habeck, M.

Gordon Research Conference on Computational Aspects of Biomolecular NMR, September 2006 (talk)

Web [BibTex]

A Sober Look at Clustering Stability

Ben-David, S., von Luxburg, U., Pal, D.

In COLT 2006, pages: 5-19, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
Stability is a common tool to verify the validity of sample based algorithms. In clustering it is widely used to tune the parameters of the algorithm, such as the number k of clusters. In spite of the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, the conclusion of our analysis is that for large sample size, stability is fully determined by the behavior of the objective function which the clustering algorithm is aiming to minimize. If the objective function has a unique global minimizer, the algorithm is stable, otherwise it is unstable. In particular we conclude that stability is not a well-suited tool to determine the number of clusters - it is determined by the symmetries of the data which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions by many examples in which the behavior of stability is counter-intuitive.
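
To make the notion of stability concrete, the following is a minimal illustrative sketch (not the authors' code) of the kind of resampling-based stability score the paper analyses: cluster two random subsamples with k-means and measure how consistently pairs of shared points are grouped. It assumes numpy and scikit-learn; the function names and parameters are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def pair_agreement(a, b):
    # Fraction of point pairs on which two labelings agree
    # (both co-clustered or both separated).
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    mask = ~np.eye(len(a), dtype=bool)
    return np.mean(same_a[mask] == same_b[mask])

def stability_score(X, k, n_runs=10, frac=0.8, seed=0):
    # Cluster two random subsamples and compare the labels they assign
    # to the points the subsamples have in common.
    rng = np.random.default_rng(seed)
    n, m = len(X), int(frac * len(X))
    scores = []
    for _ in range(n_runs):
        idx1 = rng.choice(n, size=m, replace=False)
        idx2 = rng.choice(n, size=m, replace=False)
        common = np.intersect1d(idx1, idx2)
        km1 = KMeans(n_clusters=k, n_init=10).fit(X[idx1])
        km2 = KMeans(n_clusters=k, n_init=10).fit(X[idx2])
        scores.append(pair_agreement(km1.predict(X[common]),
                                     km2.predict(X[common])))
    return float(np.mean(scores))
```

Choosing the k that maximizes such a score is exactly the model-selection heuristic whose reliability the paper calls into question.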

PDF Web DOI [BibTex]

Information Marginalization on Subgraphs

Huang, J., Zhu, T., Greiner, R., Zhou, D., Schuurmans, D.

In ECML/PKDD 2006, pages: 199-210, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
Real-world data often involves objects that exhibit multiple relationships; for example, ‘papers’ and ‘authors’ exhibit both paper-author interactions and paper-paper citation relationships. A typical learning problem requires one to make inferences about a subclass of objects (e.g. ‘papers’), while using the remaining objects and relations to provide relevant information. We present a simple, unified mechanism for incorporating information from multiple object types and relations when learning on a targeted subset. In this scheme, all sources of relevant information are marginalized onto the target subclass via random walks. We show that marginalized random walks can be used as a general technique for combining multiple sources of information in relational data. With this approach, we formulate new algorithms for transduction and ranking in relational data, and quantify the performance of new schemes on real world data—achieving good results in many problems.

Web DOI [BibTex]

Bayesian Active Learning for Sensitivity Analysis

Pfingsten, T.

In ECML 2006, pages: 353-364, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning, September 2006 (inproceedings)

Abstract
Designs of micro electro-mechanical devices need to be robust against fluctuations in mass production. Computer experiments with tens of parameters are used to explore the behavior of the system, and to compute sensitivity measures as expectations over the input distribution. Monte Carlo methods are a simple approach to estimate these integrals, but they are infeasible when the models are computationally expensive. Using a Gaussian process prior, expensive simulation runs can be saved. This Bayesian quadrature allows for an active selection of inputs where the simulation promises to be most valuable, and the number of simulation runs can be reduced further. We present an active learning scheme for sensitivity analysis which is rigorously derived from the corresponding Bayesian expected loss. On three fully featured, high dimensional physical models of electro-mechanical sensors, we show that the learning rate in the active learning scheme is significantly better than for passive learning.

PDF Web DOI [BibTex]

An Online Support Vector Machine for Abnormal Events Detection

Davy, M., Desobry, F., Gretton, A., Doncarli, C.

Signal Processing, 86(8):2009-2025, August 2006 (article)

Abstract
The ability to detect abnormal events online in signals is essential in many real-world Signal Processing applications. Previous algorithms require an explicit signal statistical model, and interpret abnormal events as abrupt changes of that statistical model. The corresponding implementations rely on maximum likelihood or on Bayesian estimation theory, with generally excellent performance. However, there are numerous cases where a robust and tractable model cannot be obtained, and model-free approaches need to be considered. In this paper, we investigate a machine learning, descriptor-based approach that does not require an explicit statistical model of the descriptors, based on Support Vector novelty detection. A sequential optimization algorithm is introduced. Theoretical considerations as well as simulations on real signals demonstrate its practical efficiency.
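
As a rough, hedged illustration of descriptor-based novelty detection (not the sequential kernel algorithm developed in the paper), one can refit a one-class SVM on a sliding window of recent descriptors and flag frames that fall outside its support. scikit-learn is assumed and all names and parameters are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def sliding_window_novelty(descriptors, window=200, nu=0.05, gamma=0.1):
    # descriptors: (T, d) array of per-frame signal descriptors.
    # A one-class SVM is refit on the most recent `window` frames and the
    # current frame is flagged as abnormal if it falls outside the support.
    flags = np.zeros(len(descriptors), dtype=bool)
    for t in range(window, len(descriptors)):
        model = OneClassSVM(nu=nu, kernel="rbf", gamma=gamma)
        model.fit(descriptors[t - window:t])
        flags[t] = model.predict(descriptors[t:t + 1])[0] == -1
    return flags
```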

PDF PostScript PDF DOI [BibTex]

A tutorial on spectral clustering

von Luxburg, U.

(149), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. Nevertheless, at first glance spectral clustering looks a bit mysterious, and it is not obvious why it works at all or what it really does. This article is a tutorial introduction to spectral clustering. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
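
For readers who want the algorithm itself, below is a minimal sketch of unnormalized spectral clustering along the lines described in the tutorial (assuming numpy and scikit-learn; parameters such as the Gaussian kernel width sigma are illustrative, not prescribed by the report).

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # 1. Fully connected similarity graph with a Gaussian (RBF) weight function.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Unnormalized graph Laplacian L = D - W.
    D = np.diag(W.sum(axis=1))
    L = D - W
    # 3. Eigenvectors belonging to the k smallest eigenvalues of L.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]
    # 4. Cluster the rows of the spectral embedding U with k-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```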

PDF [BibTex]

Machine Learning Algorithms for Polymorphism Detection

Schweikert, G., Zeller, G., Clark, R., Ossowski, S., Warthmann, N., Shinn, P., Frazer, K., Ecker, J., Huson, D., Weigel, D., Schölkopf, B., Rätsch, G.

2nd ISCB Student Council Symposium, August 2006 (talk)

Abstract
Analyzing resequencing array data using machine learning, we obtain a genome-wide inventory of polymorphisms in 20 wild strains of Arabidopsis thaliana, including 750,000 single nucleotide polymorphisms (SNPs) and thousands of highly polymorphic regions and deletions. We thus provide an unprecedented resource for the study of natural variation in plants.

Web [BibTex]

Integrating Structured Biological data by Kernel Maximum Mean Discrepancy

Borgwardt, K., Gretton, A., Rasch, M., Kriegel, H., Schölkopf, B., Smola, A.

Bioinformatics, 22(4: ISMB 2006 Conference Proceedings):e49-e57, August 2006 (article)

Abstract
Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: Testing cross-platform comparability of microarray data, cancer diagnosis, and data-content based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.
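
The test statistic itself is easy to compute. The following hedged numpy sketch evaluates a biased empirical estimate of MMD^2 with an RBF kernel, purely to make the quantity concrete; it is not the authors' implementation and omits the null-distribution calibration required for an actual hypothesis test.

```python
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    # Gaussian kernel matrix between rows of A and rows of B.
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased empirical estimate of MMD^2 between samples X (n, d) and Y (m, d).
    Kxx, Kyy, Kxy = rbf_gram(X, X, sigma), rbf_gram(Y, Y, sigma), rbf_gram(X, Y, sigma)
    return float(Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean())
```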

Web DOI [BibTex]

Towards the Inference of Graphs on Ordered Vertexes

Zien, A., Rätsch, G., Ong, C.

(150), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
We propose novel methods for machine learning of structured output spaces. Specifically, we consider outputs which are graphs with vertices that have a natural order. We consider the usual adjacency matrix representation of graphs, as well as two other representations for such a graph: (a) decomposing the graph into a set of paths, (b) converting the graph into a single sequence of nodes with labeled edges. For each of the three representations, we propose an encoding and decoding scheme. We also propose an evaluation measure for comparing two graphs.

PDF [BibTex]

Large Scale Transductive SVMs

Collobert, R., Sinz, F., Weston, J., Bottou, L.

Journal of Machine Learning Research, 7, pages: 1687-1712, August 2006 (article)

Abstract
We show how the Concave-Convex Procedure can be applied to the optimization of Transductive SVMs, which traditionally requires solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach.

PostScript PDF PDF [BibTex]

Supervised Probabilistic Principal Component Analysis

Yu, S., Yu, K., Tresp, V., Kriegel, H., Wu, M.

In KDD 2006, pages: 464-473, (Editors: Ungar, L. ), ACM Press, New York, NY, USA, 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006 (inproceedings)

Abstract
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g., in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e., in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S$^2$PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e., in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S$^2$PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.

PDF Web DOI [BibTex]

Building Support Vector Machines with Reduced Classifier Complexity

Keerthi, S., Chapelle, O., DeCoste, D.

Journal of Machine Learning Research, 7, pages: 1493-1515, July 2006 (article)

Abstract
Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size ($d_{\max}$) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as $O(n d_{\max}^2)$ where $n$ is the number of training examples; and, (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.

PDF [BibTex]

Inferential structure determination: Overview and new developments

Habeck, M.

Sixth CCPN Annual Conference: Efficient and Rapid Structure Determination by NMR, July 2006 (talk)

Web [BibTex]

ARTS: Accurate Recognition of Transcription Starts in Human

Sonnenburg, S., Zien, A., Rätsch, G.

Bioinformatics, 22(14):e472-e480, July 2006 (article)

Abstract
Motivation: Among the most important features of genomic DNA are the protein-coding genes. While it is of great value to identify those genes and the encoded proteins, it is also crucial to understand how their transcription is regulated. To this end one has to identify the corresponding promoters and the contained transcription factor binding sites. TSS finders can be used to locate potential promoters. They may also be used in combination with other signal and content detectors to resolve entire gene structures. Results: We have developed a novel kernel based method - called ARTS - that accurately recognizes transcription start sites in human. The application of otherwise too computationally expensive Support Vector Machines was made possible due to the use of efficient training and evaluation techniques using suffix tries. In a carefully designed experimental study, we compare our TSS finder to state-of-the-art methods from the literature: McPromoter, Eponine and FirstEF. For given false positive rates within a reasonable range, we consistently achieve considerably higher true positive rates. For instance, ARTS finds about 24% true positives at a false positive rate of 1/1000, where the other methods find less than half (10.5%). Availability: Datasets, model selection results, whole genome predictions, and additional experimental results are available at http://www.fml.tuebingen.mpg.de/raetsch/projects/arts

Web DOI [BibTex]

MR/PET Attenuation Correction

Hofmann, M., Schölkopf, B., Steinke, F., Pichler, B.

Max-Planck-Gesellschaft, Biologische Kybernetik, July 2006 (patent)

[BibTex]

Large Scale Multiple Kernel Learning

Sonnenburg, S., Rätsch, G., Schäfer, C., Schölkopf, B.

Journal of Machine Learning Research, 7, pages: 1531-1565, July 2006 (article)

Abstract
While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm works for hundreds of thousands of examples or hundreds of kernels to be combined, and helps for automatic model selection, improving the interpretability of the learning result. In the second part we discuss general speed-up mechanisms for SVMs, especially when used with sparse feature maps as they appear for string kernels, allowing us to train a string kernel SVM on a 10 million real-world splice data set from computational biology. We integrated multiple kernel learning in our machine learning toolbox SHOGUN for which the source code is publicly available at http://www.fml.tuebingen.mpg.de/raetsch/projects/shogun.

PDF [BibTex]

Factorial coding of natural images: how effective are linear models in removing higher-order dependencies?

Bethge, M.

Journal of the Optical Society of America A, 23(6):1253-1268, June 2006 (article)

Abstract
The performance of unsupervised learning models for natural images is evaluated quantitatively by means of information theory. We estimate the gain in statistical independence (the multi-information reduction) achieved with independent component analysis (ICA), principal component analysis (PCA), zero-phase whitening, and predictive coding. Predictive coding is translated into the transform coding framework, where it can be characterized by the constraint of a triangular filter matrix. A randomly sampled whitening basis and the Haar wavelet are included into the comparison as well. The comparison of all these methods is carried out for different patch sizes, ranging from 2x2 to 16x16 pixels. In spite of large differences in the shape of the basis functions, we find only small differences in the multi-information between all decorrelation transforms (5% or less) for all patch sizes. Among the second-order methods, PCA is optimal for small patch sizes and predictive coding performs best for large patch sizes. The extra gain achieved with ICA is always less than 2%. In conclusion, the 'edge filters' found with ICA lead only to a surprisingly small improvement in terms of its actual objective.

PDF Web [BibTex]

A Continuation Method for Semi-Supervised SVMs

Chapelle, O., Chi, M., Zien, A.

In ICML 2006, pages: 185-192, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification: their objective function favors decision boundaries which do not cut clusters. However their main problem is that the optimization problem is non-convex and has many local minima, which often results in suboptimal performances. In this paper we propose to use a global optimization technique known as continuation to alleviate this problem. Compared to other algorithms minimizing the same objective function, our continuation method often leads to lower test errors.

PDF Web DOI [BibTex]

Classification of natural scenes: Critical features revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

Journal of Vision, 6(6):561, 6th Annual Meeting of the Vision Sciences Society (VSS), June 2006 (poster)

Abstract
Human observers are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. Despite the seeming complexity of such decisions it has been hypothesized that a simple global image feature, the relative abundance of high spatial frequencies at certain orientations, could underlie such fast image classification (A. Torralba & A. Oliva, Network: Comput. Neural Syst., 2003). We successfully used linear discriminant analysis to classify a set of 11,000 images into “animal” and “non-animal” images based on their individual amplitude spectra only (Drewes, Wichmann, Gegenfurtner VSS 2005). We proceeded to sort the images based on the performance of our classifier, retaining only the best and worst classified 400 images (“best animals”, “best distractors” and “worst animals”, “worst distractors”). We used a Go/No-go paradigm to evaluate human performance on this subset of our images. Both reaction time and proportion of correctly classified images showed a significant effect of classification difficulty. Images more easily classified by our algorithm were also classified faster and better by humans, as predicted by the Torralba & Oliva hypothesis. We then equated the amplitude spectra of the 400 images, which, by design, reduced algorithmic performance to chance whereas human performance was only slightly reduced (cf. Wichmann, Rosas, Gegenfurtner, VSS 2005). Most importantly, the same images as before were still classified better and faster, suggesting that even in the original condition features other than specifics of the amplitude spectrum made particular images easy to classify, clearly at odds with the Torralba & Oliva hypothesis.
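
A hedged sketch of the amplitude-spectrum baseline described above, assuming numpy and scikit-learn: each image is represented by the (log-)magnitude of its 2-D Fourier transform and classified with linear discriminant analysis. Array shapes and variable names are illustrative, not taken from the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def amplitude_features(images):
    # images: (N, H, W) grayscale array; returns log-amplitude spectra as flat feature vectors.
    spectra = np.abs(np.fft.fft2(images, axes=(-2, -1)))
    return np.log1p(spectra).reshape(len(images), -1)

# Hypothetical usage with arrays X_train, y_train (1 = animal, 0 = non-animal), X_test:
# clf = LinearDiscriminantAnalysis().fit(amplitude_features(X_train), y_train)
# y_pred = clf.predict(amplitude_features(X_test))
```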

Web DOI [BibTex]

MCMC inference in (Conditionally) Conjugate Dirichlet Process Gaussian Mixture Models

Rasmussen, C., Görür, D.

ICML Workshop on Learning with Nonparametric Bayesian Methods, June 2006 (talk)

Abstract
We compare the predictive accuracy of the Dirichlet Process Gaussian mixture models using conjugate and conditionally conjugate priors and show that better density models result from using the wider class of priors. We explore several MCMC schemes exploiting conditional conjugacy and show their computational merits on several multidimensional density estimation problems.

Web [BibTex]

Trading Convexity for Scalability

Collobert, R., Sinz, F., Weston, J., Bottou, L.

In ICML 2006, pages: 201-208, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.
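
The concave-convex procedure underlying this paper (and the transductive-SVM paper above) can be stated in a few lines: split the objective into a convex part plus a concave part, and repeatedly minimize the convex part plus a linearization of the concave part. The sketch below is a generic illustration assuming scipy, not the paper's SVM-specific solver.

```python
import numpy as np
from scipy.optimize import minimize

def cccp(f_convex, grad_f_concave, w0, n_iter=20):
    # Minimize f(w) = f_convex(w) + f_concave(w) by iterating
    #   w_{t+1} = argmin_w  f_convex(w) + grad_f_concave(w_t) . w
    # i.e. the concave part is replaced by its tangent at the current iterate,
    # so each subproblem is convex.
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        g = grad_f_concave(w)
        w = minimize(lambda v, g=g: f_convex(v) + g @ v, w).x
    return w
```

In the paper the same decomposition is applied to non-convex SVM losses, where the convex subproblems can be solved by standard SVM machinery rather than a generic optimizer.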

PDF Web DOI [BibTex]

Personalized handwriting recognition via biased regularization

Kienzle, W., Chellapilla, K.

In ICML 2006, pages: 457-464, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
We present a new approach to personalized handwriting recognition. The problem, also known as writer adaptation, consists of converting a generic (user-independent) recognizer into a personalized (user-dependent) one, which has an improved recognition rate for a particular user. The adaptation step usually involves user-specific samples, which leads to the fundamental question of how to fuse this new information with that captured by the generic recognizer. We propose adapting the recognizer by minimizing a regularized risk functional (a modified SVM) where the prior knowledge from the generic recognizer enters through a modified regularization term. The result is a simple personalization framework with very good practical properties. Experiments on a 100 class real-world data set show that the number of errors can be reduced by over 40% with as few as five user samples per character.
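
The core idea, regularizing towards the generic recognizer's weights rather than towards zero, can be written down directly. Below is a hedged sketch for a linear model with a squared hinge loss, assuming numpy/scipy; it illustrates biased regularization in general, not the authors' implementation or features.

```python
import numpy as np
from scipy.optimize import minimize

def adapt_to_user(w_generic, X_user, y_user, C=1.0):
    # w_generic: weights of the generic recognizer; X_user: (n, d) user samples;
    # y_user: labels in {-1, +1}. The regularizer is ||w - w_generic||^2, so with
    # only a few user samples the solution stays close to the generic model.
    def objective(w):
        margins = y_user * (X_user @ w)
        hinge_sq = np.maximum(0.0, 1.0 - margins) ** 2
        return 0.5 * np.sum((w - w_generic) ** 2) + C * np.sum(hinge_sq)
    return minimize(objective, w_generic).x  # warm-start from the generic weights
```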

PDF Web DOI [BibTex]

Sampling for non-conjugate infinite latent feature models

Görür, D., Rasmussen, C.

(Editors: Bernardo, J. M.), 8th Valencia International Meeting on Bayesian Statistics (ISBA), June 2006 (talk)

Abstract
Latent variable models are powerful tools to model the underlying structure in data. Infinite latent variable models can be defined using Bayesian nonparametrics. Dirichlet process (DP) models constitute an example of infinite latent class models in which each object is assumed to belong to one of infinitely many mutually exclusive classes. Recently, the Indian buffet process (IBP) has been defined as an extension of the DP. The IBP is a distribution over sparse binary matrices with infinitely many columns which can be used as a distribution over non-exclusive features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described; however, requiring conjugacy restricts the use of the IBP. We describe an MCMC algorithm for non-conjugate IBP models. Modelling choice behaviour is an important topic in psychology, economics and related fields. Elimination by Aspects (EBA) is a choice model that assumes each alternative has latent features with associated weights that lead to the observed choice outcomes. We formulate a non-parametric version of EBA by using the IBP as the prior over the latent binary features. We infer the features of objects that lead to the choice data by using our sampling scheme for inference.

PDF [BibTex]

Deterministic annealing for semi-supervised kernel machines

Sindhwani, V., Keerthi, S., Chapelle, O.

In ICML 2006, pages: 841-848, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
An intuitive approach to utilizing unlabeled data in kernel-based classification algorithms is to simply treat the unknown labels as additional optimization variables. For margin-based loss functions, one can view this approach as attempting to learn low-density separators. However, this is a hard optimization problem to solve in typical semi-supervised settings where unlabeled data is abundant. The popular Transductive SVM algorithm is a label-switching-retraining procedure that is known to be susceptible to local minima. In this paper, we present a global optimization framework for semi-supervised Kernel machines where an easier problem is parametrically deformed to the original hard problem and minimizers are smoothly tracked. Our approach is motivated from deterministic annealing techniques and involves a sequence of convex optimization problems that are exactly and efficiently solved. We present empirical results on several synthetic and real world datasets that demonstrate the effectiveness of our approach.

PDF Web DOI [BibTex]

Clustering Graphs by Weighted Substructure Mining

Tsuda, K., Kudo, T.

In ICML 2006, pages: 953-960, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Graph data is getting increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality gets too large for usual statistical methods. We propose an efficient method for learning a binomial mixture model in this feature space. Combining the $\ell_1$ regularizer and the data structure called DFS code tree, the MAP estimates of the non-zero parameters are computed efficiently by means of the EM algorithm. Our method is applied to the clustering of RNA graphs, and compares favorably with graph kernels and the spectral graph distance.

PDF Web DOI [BibTex]

A Choice Model with Infinitely Many Latent Features

Görür, D., Jäkel, F., Rasmussen, C.

In ICML 2006, pages: 361-368, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Elimination by aspects (EBA) is a probabilistic choice model describing how humans decide between several options. The options from which the choice is made are characterized by binary features and associated weights. For instance, when choosing which mobile phone to buy the features to consider may be: long lasting battery, color screen, etc. Existing methods for inferring the parameters of the model assume pre-specified features. However, the features that lead to the observed choices are not always known. Here, we present a non-parametric Bayesian model to infer the features of the options and the corresponding weights from choice data. We use the Indian buffet process (IBP) as a prior over the features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described. The main contribution of this paper is an MCMC algorithm for the EBA model that can also be used in inference for other non-conjugate IBP models---this may broaden the use of IBP priors considerably.

PostScript PDF Web DOI [BibTex]

Learning High-Order MRF Priors of Color Images

McAuley, J., Caetano, T., Smola, A., Franz, MO.

In ICML 2006, pages: 617-624, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
In this paper, we use large neighborhood Markov random fields to learn rich prior models of color images. Our approach extends the monochromatic Fields of Experts model (Roth and Black, 2005) to color images. In the Fields of Experts model, the curse of dimensionality due to very large clique sizes is circumvented by parameterizing the potential functions according to a product of experts. We introduce several simplifications of the original approach by Roth and Black which allow us to cope with the increased clique size (typically 3x3x3 or 5x5x3 pixels) of color images. Experimental results are presented for image denoising which evidence improvements over state-of-the-art monochromatic image priors.

PDF Web DOI [BibTex]

Inference with the Universum

Weston, J., Collobert, R., Sinz, F., Bottou, L., Vapnik, V.

In ICML 2006, pages: 1009-1016, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
In this paper we study a new framework introduced by Vapnik (1998) and Vapnik (2006) that is an alternative capacity concept to the large margin approach. In the particular case of binary classification, we are given a set of labeled examples, and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. We describe an algorithm to leverage the Universum by maximizing the number of observed contradictions, and show experimentally that this approach delivers accuracy improvements over using labeled data alone.

PDF Web DOI [BibTex]

The pedestal effect is caused by off-frequency looking, not nonlinear transduction or contrast gain-control

Wichmann, F., Henning, B.

Journal of Vision, 6(6):194, 6th Annual Meeting of the Vision Sciences Society (VSS), June 2006 (poster)

Abstract
The pedestal or dipper effect is the large improvement in the detectability of a sinusoidal grating observed when the signal is added to a pedestal or masking grating having the signal's spatial frequency, orientation, and phase. The effect is largest with pedestal contrasts just above the 'threshold' in the absence of a pedestal. We measured the pedestal effect in both broadband and notched masking noise---noise from which a 1.5-octave band centered on the signal and pedestal frequency had been removed. The pedestal effect persists in broadband noise, but almost disappears with notched noise. The spatial-frequency components of the notched noise that lie above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies that are different from that of the signal and pedestal. Thus the pedestal or dipper effect is not a characteristic of individual spatial-frequency tuned channels.

Web DOI [BibTex]

Classifying EEG and ECoG Signals without Subject Training for Fast BCI Implementation: Comparison of Non-Paralysed and Completely Paralysed Subjects

Hill, N., Lal, T., Schröder, M., Hinterberger, T., Wilhelm, B., Nijboer, F., Mochty, U., Widman, G., Elger, C., Schölkopf, B., Kübler, A., Birbaumer, N.

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14(2):183-186, June 2006 (article)

Abstract
We summarize results from a series of related studies that aim to develop a motor-imagery-based brain-computer interface using a single recording session of EEG or ECoG signals for each subject. We apply the same experimental and analytical methods to 11 non-paralysed subjects (8 EEG, 3 ECoG), and to 5 paralysed subjects (4 EEG, 1 ECoG) who had been unable to communicate for some time. While it was relatively easy to obtain classifiable signals quickly from most of the non-paralysed subjects, it proved impossible to classify the signals obtained from the paralysed patients by the same methods. This highlights the fact that though certain BCI paradigms may work well with healthy subjects, this does not necessarily indicate success with the target user group. We outline possible reasons for this failure to transfer.

PDF PDF DOI [BibTex]

SCARNA: Fast and Accurate Structural Alignment of RNA Sequences by Matching Fixed-Length Stem Fragments

Tabei, Y., Tsuda, K., Kin, T., Asai, K.

Bioinformatics, 22(14):1723-1729, May 2006 (article)

Abstract
The functions of non-coding RNAs are strongly related to their secondary structures, but it is known that a secondary structure prediction of a single sequence is not reliable. Therefore, we have to collect similar RNA sequences with a common secondary structure for the analysis of a new non-coding RNA without knowing the exact secondary structure itself. Hence, the sequence comparison used in searching for similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search for similar RNAs in whole genome sequences, much faster algorithms are required. We propose a new method of comparing RNA sequences based on the structural alignments of the fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to the long sequences in the large-scale analyses. The accuracy of the alignments is better than or comparable to the much slower existing algorithms.

PDF Web DOI [BibTex]

Statistical Convergence of Kernel CCA

Fukumizu, K., Bach, F., Gretton, A.

In Advances in neural information processing systems 18, pages: 387-394, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
While kernel canonical correlation analysis (kernel CCA) has been applied in many problems, the asymptotic convergence of the functions estimated from a finite sample to the true functions has not yet been established. This paper gives a rigorous proof of the statistical convergence of kernel CCA and a related method (NOCCO), which provides a theoretical justification for these methods. The result also gives a sufficient condition on the decay of the regularization coefficient in the methods to ensure convergence.
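
For concreteness, below is a compact numpy sketch of a regularized kernel CCA estimator of the kind whose convergence the paper studies; the regularization parameter kappa plays the role of the decaying coefficient in the convergence condition. This is an illustrative formulation under standard invertibility assumptions, not the authors' code.

```python
import numpy as np

def center_gram(K):
    # Double-center a Gram matrix so that the feature maps have zero empirical mean.
    n = len(K)
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca_first_correlation(Kx, Ky, kappa=1e-2):
    # First kernel canonical correlation between two samples, given their Gram
    # matrices Kx and Ky, with Tikhonov-style regularization kappa (illustrative scaling).
    n = len(Kx)
    Kx, Ky = center_gram(Kx), center_gram(Ky)
    Rx = np.linalg.solve(Kx + n * kappa * np.eye(n), Kx)   # (Kx + n*kappa*I)^{-1} Kx
    Ry = np.linalg.solve(Ky + n * kappa * np.eye(n), Ky)
    # Eigenvalues of Rx @ Ry correspond to squared (regularized) canonical correlations.
    rho2 = np.max(np.linalg.eigvals(Rx @ Ry).real)
    return float(np.sqrt(np.clip(rho2, 0.0, 1.0)))
```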

PDF Web [BibTex]

Products of "Edge-perts"

Gehler, PV., Welling, M.

In Advances in neural information processing systems 18, pages: 419-426, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the “products of edge-perts model”, to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-of-the-art denoising performance on benchmark images.

PDF Web [BibTex]

Assessing Approximations for Gaussian Process Classification

Kuss, M., Rasmussen, C.

In Advances in neural information processing systems 18, pages: 699-706, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Gaussian processes are attractive models for probabilistic classification but unfortunately exact inference is analytically intractable. We compare Laplace's method and Expectation Propagation (EP) focusing on marginal likelihood estimates and predictive performance. We explain theoretically and corroborate empirically that EP is superior to Laplace. We also compare to a sophisticated MCMC scheme and show that EP is surprisingly accurate.

PDF Web [BibTex]