PALMA: Perfect Alignments using Large Margin Algorithms
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results, and a stand-alone alignment tool, see http://www.fml.mpg.de/raetsch/projects/palma.
| Author(s): | Rätsch, G. and Hepp, B. and Schulze, U. and Ong, C. S. |
| Book Title: | GCB 2006 |
| Journal: | Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006) |
| Pages: | 104-113 |
| Year: | 2006 |
| Month: | September |
| Editors: | Huson, D. and Kohlbacher, O. and Lupas, A. and Nieselt, K. and Zell, A. |
| Publisher: | Gesellschaft für Informatik |
| BibTeX Type: | Conference Paper (inproceedings) |
| Address: | Bonn, Germany |
| Event Name: | German Conference on Bioinformatics 2006 |
| Event Place: | Tübingen, Germany |
| Language: | en |
| Organization: | Max-Planck-Gesellschaft |
| School: | Biologische Kybernetik |
BibTeX
@inproceedings{4157,
title = {PALMA: Perfect Alignments using Large Margin Algorithms},
booktitle = {Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006)},
abstract = {Despite many years of research on how to properly align sequences in
the presence of sequencing errors, alternative splicing and
micro-exons, the correct alignment of mRNA sequences to genomic DNA is
still a challenging task. We present a novel approach based on large
margin learning that combines kernel-based splice site predictions
with common sequence alignment techniques. By solving a convex
optimization problem, our algorithm -- called PALMA -- tunes the
parameters of the model such that the true alignment scores higher
than all other alignments. In an experimental study on the alignments
of mRNAs containing artificially generated micro-exons, we show that
our algorithm drastically outperforms all other methods: It perfectly
aligns all 4358 sequences on a hold-out set, while the best other
method misaligns at least 90 of them. Moreover, our algorithm is very
robust against noise in the query sequence: when deleting, inserting,
or mutating up to 50\% of the query sequence, it still aligns 95\% of
all sequences correctly, while other methods achieve less than 36\%
accuracy. For datasets, additional results and a stand-alone
alignment tool see
http://www.fml.mpg.de/raetsch/projects/palma.},
pages = {104--113},
editor = {Huson, D. and Kohlbacher, O. and Lupas, A. and Nieselt, K. and Zell, A.},
publisher = {Gesellschaft f{\"u}r Informatik},
organization = {Max-Planck-Gesellschaft},
school = {Biologische Kybernetik},
address = {Bonn, Germany},
month = sep,
year = {2006},
author = {R{\"a}tsch, G. and Hepp, B. and Schulze, U. and Ong, C. S.},
month_numeric = {9}
}
