Pash 2.0: scaleable sequence anchoring for next-generation sequencing technologies.

Title: Pash 2.0: scaleable sequence anchoring for next-generation sequencing technologies.
Publication Type: Journal Article
Year of Publication: 2008
Authors: Coarfa, C, Milosavljevic, A
Journal: Pac Symp Biocomput
Date Published: 2008
Keywords: Algorithms; Animals; Computational Biology; Databases, Genetic; Evolution, Molecular; Genome, Human; Humans; Sensitivity and Specificity; Sequence Alignment; Software

Many applications of next-generation sequencing technologies involve anchoring a sequence fragment or a tag onto a corresponding position on a reference genome assembly. The Positional Hashing method, implemented in the Pash 2.0 program, is specifically designed for the task of high-volume anchoring. In this article we present multi-diagonal gapped kmer collation and other improvements introduced in Pash 2.0 that further improve the accuracy and speed of Positional Hashing. The goal of this article is to show that gapped kmer matching with cross-diagonal collation suffices for anchoring across close evolutionary distances and for the purpose of human resequencing. We propose a benchmark for evaluating the performance of anchoring programs that captures key parameters in specific applications, including the duplicative structure of the genomes of humans and other species. We demonstrate speedups of up to tenfold in large-scale anchoring experiments achieved by Pash 2.0 when compared to BLAT, another similarity search program frequently used for anchoring.
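
To make the terms in the abstract concrete, the following is a minimal illustrative sketch of gapped kmer matching with collation across neighboring diagonals. It is not the Pash 2.0 implementation: the function names (`gapped_kmers`, `anchor`), the sampling masks, the `band` parameter, and the hit-count scoring are all assumptions chosen for illustration, and the abstract does not specify any of them. A diagonal here means reference position minus read position; summing hits over adjacent diagonals lets a read with a small insertion or deletion still accumulate one combined score.

```python
from collections import defaultdict


def gapped_kmers(seq, mask):
    """Yield (position, key) per window; the key keeps bases where mask is '1'."""
    span = len(mask)
    for i in range(len(seq) - span + 1):
        window = seq[i:i + span]
        yield i, "".join(b for b, m in zip(window, mask) if m == "1")


def anchor(read, reference, mask="110110110", band=1):
    """Return (best_diagonal, collated_hits), or None if no kmer matches.

    Hypothetical scoring: hits on each diagonal are summed with hits on
    diagonals within `band` of it; ties go to the diagonal with more of its
    own hits, then to the smaller diagonal.
    """
    # Index the reference by gapped kmer key.
    index = defaultdict(list)
    for gpos, key in gapped_kmers(reference, mask):
        index[key].append(gpos)

    # Count matching kmers per diagonal (reference position - read position).
    diag_hits = defaultdict(int)
    for rpos, key in gapped_kmers(read, mask):
        for gpos in index.get(key, ()):
            diag_hits[gpos - rpos] += 1

    if not diag_hits:
        return None

    def collated(d):
        # Sum hits over the diagonal and its neighbors within the band.
        return sum(diag_hits.get(d + off, 0) for off in range(-band, band + 1))

    best = max(diag_hits, key=lambda d: (collated(d), diag_hits[d], -d))
    return best, collated(best)


reference = "GATTACAGCCTGAACGTTCGGATC"
read = reference[5:17]  # exact copy of reference positions 5..16
print(anchor(read, reference, mask="11011"))  # (5, 8)
```

In this toy example, a read that skips one reference base splits its hits between two adjacent diagonals; with `band=1` the two groups are scored together, whereas `band=0` scores each diagonal separately.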

Alternate Journal: Pac Symp Biocomput
PubMed ID: 18229679
Grant List: 1R33CA114151-01A1 / CA / NCI NIH HHS / United States
5R01HG004009-02 / HG / NHGRI NIH HHS / United States