Atlas 2

About the Software

Authors: Danny Challis, Jin Yu, Uday Evani, and Fuli Yu

Atlas2 is a next-generation sequencing suite of variant analysis tools specializing in the separation of true SNPs and insertions and deletions (indels) from sequencing and mapping errors in Whole Exome Capture Sequencing (WECS) data.

SNPs may be called using the Atlas-SNP2 application and indels may be called using the Atlas-Indel2 application. The suite implements logistic regression models trained on validated WECS data to identify the true variants. There is a separate regression model for each sequencing platform. The suite currently supports the SOLiD, Illumina, and Roche 454 (SNPs only) platforms. Future versions of Atlas2 will include additional models for new sequencing platforms.
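As a rough illustration of how a per-platform logistic regression model turns features of a candidate site into a probability that the variant is real, consider the sketch below. The feature names, coefficients, and the `variant_probability` helper are hypothetical and chosen only for illustration; they are not the trained Atlas2 models.

```python
import math

# Hypothetical coefficients for illustration only. The real Atlas2 models are
# trained per platform on validated WECS data and are not reproduced here.
EXAMPLE_MODELS = {
    "illumina": {"intercept": -4.0, "mean_base_quality": 0.10, "variant_read_ratio": 6.0},
    "solid": {"intercept": -5.0, "mean_base_quality": 0.12, "variant_read_ratio": 7.0},
}


def variant_probability(platform, features):
    """Return the logistic-model probability that a candidate site is a true variant.

    `features` maps feature names (assumed here) to numeric values.
    """
    model = EXAMPLE_MODELS[platform]
    # Linear predictor: intercept plus the weighted sum of the features.
    z = model["intercept"] + sum(model[name] * value for name, value in features.items())
    # The logistic function maps the linear predictor into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


strong = variant_probability("illumina", {"mean_base_quality": 30, "variant_read_ratio": 0.5})
weak = variant_probability("illumina", {"mean_base_quality": 30, "variant_read_ratio": 0.05})
```

A caller would then compare the returned probability against a cutoff to decide whether to report the site. Keeping one coefficient set per platform mirrors the separate-model-per-platform design described above.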

The Atlas2 suite takes a Binary sequence Alignment/Mapping (BAM) file (see http://samtools.github.io/hts-specs/SAMv1.pdf) and a FASTA reference genome as input and produces variant calls in Variant Call Format (VCF) (see http://www.1000genomes.org/wiki/Analysis/vcf4.0).
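For readers who want to post-process the output, the following minimal sketch splits one tab-delimited VCF data line into its eight fixed columns. The `parse_vcf_line` helper and the example record (including its INFO keys) are made up for illustration and are not taken from actual Atlas2 output.

```python
# The eight fixed, tab-delimited columns defined by the VCF specification.
VCF_FIXED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_vcf_line(line):
    """Parse a single VCF data line (not a '#' header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_FIXED_COLUMNS):
        raise ValueError("expected at least 8 tab-separated columns")
    record = dict(zip(VCF_FIXED_COLUMNS, fields))
    record["POS"] = int(record["POS"])
    # INFO is a ';'-separated list of key=value pairs or bare flags.
    info = {}
    for entry in record["INFO"].split(";"):
        if entry in ("", "."):
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    record["INFO"] = info
    # Anything after INFO is the FORMAT column followed by per-sample columns.
    record["SAMPLES"] = fields[len(VCF_FIXED_COLUMNS):]
    return record


# Hypothetical example record, for illustration only.
example = parse_vcf_line("chr1\t12345\t.\tA\tG\t50\tPASS\tDP=20;P=0.98\tGT\t0/1\n")
```

For real workloads an established VCF library is usually a better choice than hand-rolled parsing; the sketch only shows the column layout.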

In addition to variant calls, the application collects coverage information and uses simple heuristic cutoffs to estimate the likely genotype of each variant site.
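A heuristic genotype call of this kind can be sketched as a pair of cutoffs on the fraction of reads supporting the variant. The function name and the cutoff values below are assumptions for illustration; they are not the defaults used by Atlas2.

```python
def estimate_genotype(variant_reads, total_reads, het_cutoff=0.1, hom_cutoff=0.8):
    """Estimate a genotype from read counts using simple ratio cutoffs.

    The cutoffs are hypothetical placeholders, not Atlas2's actual settings.
    """
    if total_reads <= 0:
        # No coverage at the site: the genotype cannot be estimated.
        return "./."
    ratio = variant_reads / total_reads
    if ratio < het_cutoff:
        return "0/0"  # too few variant reads: homozygous reference
    if ratio < hom_cutoff:
        return "0/1"  # intermediate ratio: heterozygous
    return "1/1"      # nearly all reads carry the variant: homozygous variant
```

For example, under these assumed cutoffs a site with 10 variant reads out of 20 falls in the heterozygous range, while a site with no coverage is reported as missing.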

System Requirements

SOLiD-SNP-caller is written in C++ and must be compiled before it can run. For 64-bit Linux systems this has already been done. If you have a different system, take the following steps before running SOLiD-SNP-caller for the first time:

  • Navigate to the SOLiD-SNP-caller directory in a terminal

  • Run: make clean

  • Run: make

  • Compiling may take several minutes

Download

Current release

Version 1.4.3 (01-03-2013)

Version 1.4.3 software and documentation at SourceForge

Previous releases

Version 1.4.1 (09-10-2012)

Version 1.4.1 software and documentation at SourceForge

Version 1.0 (08-29-2011)

Version 1.0 software and documentation at SourceForge

License

Copyright © Baylor College of Medicine Human Genome Sequencing Center. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Publications

Challis D, et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 2012, 13:8. doi:10.1186/1471-2105-13-8

Release History

Atlas-SNP2

a) For Illumina/454 platforms

Version 1.4.1 (09-06-2012)

  • Added always-include option

  • Added show-filtered option

  • Added version information and running command in the VCF filter

Version 1.3 (08-18-2011)

  • Added an option to call SNPs in given regions or by chromosome

  • Changed the default maximum coverage for SNP calling to 1024

  • For paired-end data, added an option to use insert size for mapping quality control

  • Improved the performance of crossmatch2SAM

Version 1.2 (01-18-2011)

This is a major upgrade of Atlas-SNP2.

New features

  • One-stop running: takes sorted BAM files and a reference file as input and outputs SNP genotypes in VCF format

  • Uses mapping quality score for alignment quality control

  • Uses insert size for mapping quality control with paired-end resequencing data

  • More filters are integrated for higher quality SNP calls

Performance

  • Whole-genome SNP calling is now feasible on a typical PC with 4 GB of memory. In our test, whole-exome SNP calling processed 1 million reads per 5 minutes using a single Xeon 5520 CPU core and 4 GB of memory.

Bugs fixed and compatibility

  • More robust to alignment errors

  • The Crossmatch2SAM tool is now compatible with Ruby 1.9.x

  • Fixed a few minor bugs

Version 1.1 (04-26-2010)

  • Added a heuristics-based genotyping module

  • Added a “numRefReads_afterFilter” column to the Atlas-SNP2 result file

  • Revised the header line in Atlas-SNP2 output file to be more explicit

  • Skipped duplicate reads masked in the BAM files when processing

  • Added an option for the user to set the maximum number of alignments allowed to pile up at a particular site

  • Printed more running information and more detailed alignment statistics

  • More robust to various alignment errors

  • Fixed several bugs

Version 1.0 (01-20-2010)

  • Added Illumina Platform support

  • All calculations are now based on required fields of SAM to get maximum compatibility

  • Added CIGAR and reference sequence test code

  • Used pileup number to calculate TotalCoverage

  • Improved performance

  • Migrated to Ruby 1.9

  • Many minor improvements

Draft release version 0.1 (12-10-2009)

  • Initial implementation

  • Initial support of SAM files

b) For SOLiD platform

Version 1.0 (08-18-2011)

  • Major SNP calling model update

  • Supports GATK base-quality-recalibrated BAM files by using OQ tags

  • Calls SNPs only in regions defined in a BED-format file

  • Outputs the SNP calls directly in VCF format

Draft release version 0.1 (01-26-2011)

  • Initial implementation


Previous release: Atlas-Indel2

Version 1.4.1

  • Added always-include option

  • Added show-filtered option

  • Fixed bug caused by passing a non-fasta reference genome

  • Fixed bug occasionally returning infinite P value in INFO column

  • Fixed bug caused by reads mapping past the end of the reference genome

  • Made Atlas-Indel2 more tolerant of malformed SAM lines

Version 1.0

  • Updated SOLiD model and adjusted P cutoffs

  • Changed -P cutoff to apply to both 1bp insertions and deletions (rather than just 1bp deletions)

Version 0.3.1

  • Added options to use original base quality

  • Fixed bug that sometimes returned success exit code when there was a failure

  • Fixed bug in simple_genotyper that caused samples with exactly 0.05 variant read ratio to be 0/0

  • Fixed bug in simple_genotyper that caused genotypes to occasionally read ./.

  • Fixed bug in bed_filter that was filtering some on-target reads in very small target regions

Version 0.3

  • Updated SOLiD and Illumina models and recalibrated default settings

  • Implemented the ability to input a bed file to call only on-target indels

  • Switched from using z cutoffs to using p cutoffs

  • Modified 1bp p cutoff to only filter 1bp deletions

  • Fixed bug where the strand direction filter failed to be enabled

  • Added check for proper ruby version

  • Fixed bug that occasionally allowed an indel quality of 110 (max should be 100)

  • Minor code-structure changes

Version 0.2.1

  • Added read_level model and improved site level model for SOLiD data

  • Adjusted default SOLiD z cutoff to 0.0 (to reflect new model)

  • Added check for proper ruby version

  • Minor code-structure changes

  • Added additional heuristic filter that allows for a stricter z cutoff for 1bp indels, very useful for SOLiD data

  • Implemented integrated heuristic genotyping

  • Fixed bug where Atlas-Indel2 crashes if a BAM chromosome is not in the reference

  • Now will keep ‘chr’ in the chromosome label if it is in the BAM

  • The deprecated script "Atlas-Indel2-Illum-Exome.rb" has been removed. Please use Atlas-Indel2.rb with the -I flag instead.

Version 0.2 (02-09-2011)

  • Version 0.2 software

  • Version 0.2 documentation

  • Implemented regression model for SOLiD data. You must now specify a regression model with -S or -I.

  • Renamed main script to Atlas-Indel.rb.

  • Modified Reference sequence class to allow for unsorted reference genomes.

  • Added the indel z to the info column of the VCF output (not included after running VCF printer).

  • Now echoes all settings back onto the command line.

  • Fixed a bug that caused loss of precision in the normalized variant square variable of the Illumina site model.

  • Fixed a bug in the depth coverage algorithm that caused reads not to be counted in total depth at the deleted sites.

  • Fixed the sample column order to be compatible with vcfPrinter.

  • Removed "x flagged lines skipped" message at end of run.

Version 0.1 (12-2011)


Previous release: Atlas-SNP2

Version 1.1 (04-26-2010)

Version 1.0 (01-20-2010)

Version 0.1 (12-10-2009)


Solid-genotyper

Solid-genotyper is a SNP discovery and genotyping tool for high-coverage SOLiD data. This tool combines a logistic regression model and heuristic methods to characterize systematic sequencing errors and overcome mapping bias.

Draft release version 0.1 (01-26-2011):

  • Initial implementation