About the Software
Author: K. James Durbin (Atlas Genome Tools Group)
Copyright © Baylor College of Medicine Human Genome Sequencing Center. All rights reserved.
Bang is a fast repeat-suppressing search tool, written primarily for anchoring reads to genomes but also adaptable to other genome-scale comparison problems.
Bang's key features are efficiently coded k-mer hashing, which gives it exceptional speed; a k-mer thinning scheme, which lets it run large jobs in a comparatively small RAM footprint; and the ability to use k-mer distributions to perform de novo repeat filtering of matches. Bang is also designed to be convenient to use with very large data sets (e.g., unlike megablast, it imposes no arbitrary limit on the number of query sequences).
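This page does not describe bang's internals, so the following is only a generic Python sketch of the three ideas named above, not bang's actual implementation. It assumes 2-bit packing of bases into integer k-mer codes, thinning by indexing only target k-mers whose start position is a multiple of a hypothetical `step` parameter (a SSAHA-style sampling scheme), and repeat filtering by discarding any k-mer that occurs more than a hypothetical `max_count` times. All function names are illustrative.

```python
# Generic illustration only: NOT bang's code. Names and parameters are hypothetical.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_codes(seq, k):
    """Yield (start_position, code) for each k-mer, packing each base into 2 bits.
    K-mers that contain a non-ACGT character (e.g. N) are skipped."""
    mask = (1 << (2 * k)) - 1
    code = 0
    valid = 0  # consecutive valid bases ending at the current position
    for i, base in enumerate(seq.upper()):
        b = ENCODE.get(base)
        if b is None:
            code, valid = 0, 0  # restart after an ambiguous base
            continue
        code = ((code << 2) | b) & mask  # shift in the new base, drop the oldest
        valid += 1
        if valid >= k:
            yield i - k + 1, code

def build_index(target, k, step):
    """Thinning: index only k-mers that start at multiples of `step`."""
    index = {}
    for pos, code in kmer_codes(target, k):
        if pos % step == 0:
            index.setdefault(code, []).append(pos)
    return index

def repeat_filter(index, max_count):
    """Frequency-based repeat filter: drop k-mers seen more than max_count times."""
    return {code: hits for code, hits in index.items() if len(hits) <= max_count}

def seed_hits(query, index, k):
    """Look up every query k-mer and return (query_pos, target_pos) seed pairs."""
    hits = []
    for qpos, code in kmer_codes(query, k):
        for tpos in index.get(code, ()):
            hits.append((qpos, tpos))
    return hits

idx = build_index("ACGTACGT", k=2, step=2)
print(idx)                          # {1: [0, 4], 11: [2, 6]}
print(seed_hits("CGT", idx, k=2))   # [(1, 2), (1, 6)]
```

Because every query k-mer is looked up while only every `step`-th target k-mer is stored, a shared region is still seeded as long as it is at least `k + step - 1` bases long; this is the usual trade of some sensitivity on short matches for a smaller index.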
An example of the kind of performance you can expect from bang is given in the table below.
|Program|Target|Queries|Memory|Time|||
|bang|120 MB|500K (~450 MB)|~400 MB|10 minutes|0.981|0.973|
|megablast|120 MB|500K (~450 MB)|~300 MB|60 minutes|0.999|0.265|
|ssaha|120 MB|500K (~450 MB)|? (ran out of memory on a 4 GB machine)||||
These figures reflect out-of-the-box performance and are intended only to show how bang compares with other tools. Suitability for any particular application depends on many factors.
To further illustrate bang's speed, consider a recent read-mapping task we performed: 2.5 million human re-sequencing reads were mapped to the human genome using eight cluster CPUs in only 2.5 wall-clock hours, a total of 18 CPU-hours. Peak RAM usage was 600 MB. In other words, we could have mapped a full 1x coverage of reads to the human genome on a single laptop in less than a day.
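The throughput implied by these figures can be checked with a few lines of Python. The inputs are the numbers quoted above; the linear-scaling helper `estimated_cpu_hours` is a hypothetical illustration that assumes run time grows in proportion to read count, which the text does not state.

```python
# Figures quoted in the read-mapping example above.
READS = 2_500_000
CPUS = 8
WALL_CLOCK_HOURS = 2.5
CPU_HOURS = 18.0

reads_per_cpu_hour = READS / CPU_HOURS
mean_busy_hours_per_cpu = CPU_HOURS / CPUS

def estimated_cpu_hours(n_reads):
    """Hypothetical helper: assumes CPU time scales linearly with read count."""
    return n_reads / reads_per_cpu_hour

print(f"{reads_per_cpu_hour:,.0f} reads per CPU-hour")  # 138,889 reads per CPU-hour
print(f"{mean_busy_hours_per_cpu:.2f} h average per CPU "
      f"vs {WALL_CLOCK_HOURS} h wall clock")              # 2.25 h average per CPU vs 2.5 h wall clock
```

The 2.25-hour average per CPU sits just under the 2.5 wall-clock hours, which is what one would expect if some jobs finished earlier than others.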
Binaries are available for Solaris, Linux x86, and OS X.
Compiling from source requires the CBT++ library.
Please read the terms of the license agreement carefully.
This software is made available for internal research purposes only by individuals or employees of an academic institution or private company. Commercial production use or sale of software incorporating these libraries is prohibited. Commercial use licenses are available.
No part of this software, or modifications thereof, may be redistributed for any purpose to any other company, person, or individual, without prior written permission from the author.
This software is provided AS IS. Baylor College of Medicine assumes no responsibility or liability for damages of any kind that may result, directly or indirectly, from the use of this software.
Contact us at email@example.com for information about obtaining a commercial license for this software.
By downloading this software you explicitly agree to the terms of the license agreement.
For additional information, contact us at firstname.lastname@example.org