Empirical Inference Conference Paper 2006

PALMA: Perfect Alignments using Large Margin Algorithms

Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing, and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm, called PALMA, tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results, and a stand-alone alignment tool, see http://www.fml.mpg.de/raetsch/projects/palma.
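
The large margin idea in the abstract (tune the parameters so that the true alignment scores higher than every competing alignment) can be illustrated with a small sketch. This is not PALMA's actual formulation: it assumes a plain linear score over three hypothetical features (match, mismatch, and gap counts), a hand-made list of competing alignments, and a simple subgradient method on a regularized hinge loss in place of the paper's convex program. The function name `fit_margin_weights` and all numbers are made up for illustration.

```python
import numpy as np

def fit_margin_weights(true_feats, other_feats, reg=0.01, lr=0.1, epochs=200):
    """Learn weights w so that w @ true_feats exceeds w @ other by a margin.

    true_feats:  (d,) feature vector of the true alignment (assumed given).
    other_feats: (k, d) feature vectors of competing alignments (assumed given).
    Minimizes  reg/2 * ||w||^2 + sum_i max(0, 1 - w @ (true - other_i))
    with constant-step subgradient descent.
    """
    diffs = true_feats - other_feats          # (k, d): true minus each competitor
    w = np.zeros(true_feats.shape[0])
    for _ in range(epochs):
        margins = diffs @ w                   # score gap to each competitor
        violated = margins < 1.0              # competitors inside the margin
        grad = reg * w - diffs[violated].sum(axis=0)
        w -= lr * grad
    return w

# Toy data (hypothetical): features are (matches, mismatches, gaps).
true_feats = np.array([8.0, 1.0, 1.0])
other_feats = np.array([
    [8.0, 2.0, 0.0],
    [7.0, 1.0, 2.0],
    [9.0, 3.0, 0.0],
])

w = fit_margin_weights(true_feats, other_feats)
margins = (true_feats - other_feats) @ w      # positive means the true alignment wins
true_score = true_feats @ w
other_scores = other_feats @ w
```

In this toy setting the competitors are listed explicitly. A real aligner would have to search over all possible alignments, for example with dynamic programming, to find the ones that violate the margin.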

Author(s): Rätsch, G. and Hepp, B. and Schulze, U. and Ong, CS.
Book Title: GCB 2006
Journal: Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006)
Pages: 104-113
Year: 2006
Month: September
Editors: Huson, D., Kohlbacher, O., Lupas, A., Nieselt, K., Zell, A.
Publisher: Gesellschaft für Informatik
Bibtex Type: Conference Paper (inproceedings)
Address: Bonn, Germany
Event Name: German Conference on Bioinformatics 2006
Event Place: Tübingen, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{4157,
  title = {PALMA: Perfect Alignments using Large Margin Algorithms},
  booktitle = {Proceedings of the German Conference on Bioinformatics 2006 (GCB 2006)},
  abstract = {Despite many years of research on how to properly align sequences in
  the presence of sequencing errors, alternative splicing and
  micro-exons, the correct alignment of mRNA sequences to genomic DNA is
  still a challenging task.  We present a novel approach based on large
  margin learning that combines kernel-based splice site predictions
  with common sequence alignment techniques. By solving a convex
  optimization problem, our algorithm -- called PALMA -- tunes the
  parameters of the model such that the true alignment scores higher
  than all other alignments. In an experimental study on the alignments
  of mRNAs containing artificially generated micro-exons, we show that
  our algorithm drastically outperforms all other methods: it perfectly
  aligns all 4358 sequences on a hold-out set, while the best other
  method misaligns at least 90 of them. Moreover, our algorithm is very
  robust against noise in the query sequence: when deleting, inserting,
  or mutating up to 50\% of the query sequence, it still aligns 95\% of
  all sequences correctly, while other methods achieve less than 36\%
  accuracy.  For datasets, additional results and a stand-alone
  alignment tool see
  http://www.fml.mpg.de/raetsch/projects/palma.},
  pages = {104--113},
  editor = {Huson, D. and Kohlbacher, O. and Lupas, A. and Nieselt, K. and Zell, A.},
  publisher = {Gesellschaft f{\"u}r Informatik},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Bonn, Germany},
  month = sep,
  year = {2006},
  slug = {4157},
  author = {R{\"a}tsch, G. and Hepp, B. and Schulze, U. and Ong, CS.},
  month_numeric = {9}
}