About the Software
Authors: Danny Challis, Jin Yu, Uday Evani, and Fuli Yu
Atlas2 is a next-generation sequencing suite of variant analysis tools specializing in the separation of true SNPs and insertions and deletions (indels) from sequencing and mapping errors in Whole Exome Capture Sequencing (WECS) data.
SNPs may be called using the Atlas-SNP2 application and indels may be called using the Atlas-Indel2 application. The suite implements logistic regression models trained on validated WECS data to identify the true variants. There is a separate regression model for each sequencing platform. The suite currently supports the SOLiD, Illumina, and Roche 454 (SNPs only) platforms. Future versions of Atlas2 will include additional models for new sequencing platforms.
The Atlas2 suite takes a Binary sequence Alignment/Mapping (BAM) file (see http://samtools.github.io/hts-specs/SAMv1.pdf) and a FASTA reference genome as input and produces variant calls in Variant Call Format (VCF) (see http://www.1000genomes.org/wiki/Analysis/vcf4.0).
In addition to variant calls, the application collects coverage information and uses simple heuristic cutoffs to estimate the likely genotype of each variant site.
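As a rough illustration of what cutoff-based genotyping can look like, the Ruby sketch below assigns a genotype from the fraction of reads that support the variant. The method name `simple_genotype` and both cutoff values are assumptions made for this example; they are not the code or the settings used by Atlas2.

```ruby
# Hypothetical cutoff-based genotyper; NOT the Atlas2 implementation.
# het_cutoff and hom_cutoff are illustrative placeholder values.
def simple_genotype(ref_reads, var_reads, het_cutoff = 0.1, hom_cutoff = 0.9)
  total = ref_reads + var_reads
  return "./." if total.zero?        # no coverage: genotype unknown

  ratio = var_reads.to_f / total     # fraction of reads supporting the variant
  if ratio < het_cutoff
    "0/0"                            # homozygous reference
  elsif ratio < hom_cutoff
    "0/1"                            # heterozygous
  else
    "1/1"                            # homozygous variant
  end
end

puts simple_genotype(10, 10)  # half of the reads support the variant
puts simple_genotype(1, 19)   # nearly all reads support the variant
```

How the boundary comparisons are written (`<` versus `<=`) decides which genotype a site gets when its ratio falls exactly on a cutoff, so that choice is worth making deliberately.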
System Requirements
-
Unix-like operating system
-
Ruby 1.9.1: https://www.ruby-lang.org/en/downloads/
-
If you do not have a 64-bit Linux system, a C++ compiler and Make must be installed
-
If needed, the external mapping tools BLAT and cross_match can be obtained from http://users.soe.ucsc.edu/~kent/src/ and http://www.phrap.org, respectively.
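The Ruby requirement listed above can be checked before running any of the scripts. The sketch below is one way to do it; the helper name `ruby_version_ok?` is hypothetical, and treating 1.9.1 as a minimum rather than an exact version is an assumption.

```ruby
require "rubygems"  # provides Gem::Version; loaded by default on modern Ruby

# Returns true when `current` is at least `minimum` (dotted version strings).
# Treating 1.9.1 as a lower bound is an assumption, not a documented rule.
def ruby_version_ok?(current = RUBY_VERSION, minimum = "1.9.1")
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end

unless ruby_version_ok?
  abort "Ruby 1.9.1 or newer is required (found #{RUBY_VERSION})"
end
```

Comparing with `Gem::Version` rather than plain strings avoids ordering mistakes such as ranking "1.10.0" below "1.9.1".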
SOLiD-SNP-caller is coded in C++ and must be compiled to run. For 64-bit Linux systems this has already been done. If you have a different system, take the following steps before running SOLiD-SNP-caller for the first time:
-
Navigate to the SOLiD-SNP-caller directory in a terminal
-
Run: make clean
-
Run: make
-
Compiling may take several minutes
Download
Current release
Version 1.4.3 (01-03-2013)
Version 1.4.3 software and documentation at SourceForge
Previous releases
Version 1.4.1 (09-10-2012)
Version 1.4.1 software and documentation at SourceForge
Version 1.0 (08-29-2011)
Version 1.0 software and documentation at SourceForge
License
Copyright © Baylor College of Medicine Human Genome Sequencing Center. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
-
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
-
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Publications
Challis D, et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 2012, 13:8. doi:10.1186/1471-2105-13-8
Release History
Atlas-SNP2
a) For Illumina/454 platforms
Version 1.4.1 (09-06-2012)
-
Added always-include option
-
Added show-filtered option
-
Added version information and the running command in the VCF filter
Version 1.3 (08-18-2011)
-
Added a new option to call SNPs on given regions or by chromosome
-
Changed the default maximum coverage for SNP calling to 1024
-
For paired-end data, added an option to use insert size for mapping quality control
-
Improved the performance of crossmatch2SAM
Version 1.2 (01-18-2011)
This is a major upgrade of Atlas-SNP2
New features
-
One-stop running: take sorted BAM files and reference file as input and output SNP genotypes in VCF format
-
Use mapping quality score as alignment quality control
-
Use insert size as mapping quality control for paired-end re-sequencing data
-
More filters are integrated for higher quality SNP calls
Performance
-
Whole-genome SNP calling is now feasible on a typical PC with 4 GB of memory. In our test, whole-exome SNP calling processed 1 million reads per 5 minutes using a single CPU core of a Xeon 5520 and 4 GB of memory
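To turn the quoted rate into a rough runtime estimate, the Ruby sketch below scales it linearly. Linear scaling is an assumption, the method name is hypothetical, and the 60-million-read input is a made-up example rather than a measured data set.

```ruby
# Rate quoted above: 1 million reads per 5 minutes on one CPU core.
# Estimated minutes for `read_count` reads, ASSUMING throughput scales linearly.
def estimated_minutes(read_count, reads_per_minute = 1_000_000 / 5.0)
  read_count / reads_per_minute
end

puts estimated_minutes(60_000_000)  # hypothetical 60-million-read exome
```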
Bugs fixed and compatibility
-
More robust to alignment errors
-
The Crossmatch2SAM tool is now compatible with Ruby 1.9.X
-
A few minor bugs
Version 1.1 (04-26-2010)
-
Added a heuristics-based genotyping module
-
Added a column of “numRefReads_afterFilter” in Atlas-SNP2 result file
-
Revised the header line in Atlas-SNP2 output file to be more explicit
-
Skipped duplicate reads masked in the BAM files when processing
-
Added an option for the user to set the maximum number of alignments allowed to pile up at a particular site
-
Printed more running information and more detailed alignments statistics
-
More robust to various alignment errors
-
Fixed several bugs
Version 1.0 (01-20-2010)
-
Added Illumina Platform support
-
All calculations are now based on required fields of SAM to get maximum compatibility
-
Added CIGAR and reference sequence test code
-
Used pileup number to calculate TotalCoverage
-
Improved performance
-
Migrated to Ruby 1.9
-
Many minor improvements
Draft release version 0.1 (12-10-2009)
-
Initial implementation
-
Initial support of SAM files
b) For SOLiD platform
Version 1.0 (08-18-2011)
-
Major SNP calling model update
-
Supports BAM files with GATK-recalibrated base qualities by using OQ tags
-
Calls SNPs only on regions defined in a BED-format file
-
Outputs the SNP calls in VCF format directly
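For readers unfamiliar with OQ tags: base-quality recalibration can keep the original quality string in an optional `OQ:Z:` field of each SAM/BAM record. The Ruby sketch below shows one way to pull that value from a tab-separated SAM line, falling back to the QUAL column when the tag is absent. The method name and the sample record are made up for illustration; this is not the SOLiD caller's code.

```ruby
# Returns the base-quality string for one SAM record: the OQ:Z value when the
# tag is present, otherwise the QUAL column (the 11th mandatory field).
def original_qualities(sam_line)
  fields = sam_line.chomp.split("\t")
  optional = fields[11..-1].to_a                 # optional TAG:TYPE:VALUE fields
  oq = optional.find { |f| f.start_with?("OQ:Z:") }
  oq ? oq[5..-1] : fields[10]                    # strip the 5-character prefix
end

# Made-up record, for illustration only.
record = ["read1", "0", "chr1", "100", "60", "4M", "*", "0", "0",
          "ACGT", "IIII", "NM:i:0", "OQ:Z:5555"].join("\t")
puts original_qualities(record)
```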
Draft release version 0.1 (01-26-2011)
-
Initial implementation
Previous release: Atlas-Indel2
Version 1.4.1
-
Added always-include option
-
Added show-filtered option
-
Fixed bug caused by passing a non-fasta reference genome
-
Fixed bug occasionally returning infinite P value in INFO column
-
Fixed bug caused by reads mapping past the end of the reference genome
-
Made Atlas-Indel2 more tolerant of malformed SAM lines
Version 1.0
-
Updated SOLiD model and adjusted P cutoffs
-
Changed -P cutoff to apply to both 1bp insertions and deletions (rather than just 1bp deletions)
Version 0.3.1
-
Added options to use original base quality
-
Fixed bug that sometimes returned success exit code when there was a failure
-
Fixed bug in simple_genotyper that caused samples with exactly 0.05 variant read ratio to be 0/0
-
Fixed bug in simple_genotyper that caused genotypes to occasionally read ./.
-
Fixed bug in bed_filter that was filtering some on-target reads in very small target regions
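The on-target filtering mentioned above comes down to an interval-overlap test, and small targets are where off-by-one mistakes show up. The Ruby sketch below is a minimal version under the standard BED convention (0-based, half-open intervals). The `Region` struct and `on_target?` method are hypothetical names, not the bed_filter code.

```ruby
# BED regions are 0-based and half-open: [start, stop).
Region = Struct.new(:chrom, :start, :stop)

# True when the half-open read interval [read_start, read_end) shares at least
# one base with any target region on the same chromosome.
def on_target?(chrom, read_start, read_end, regions)
  regions.any? do |r|
    r.chrom == chrom && read_start < r.stop && read_end > r.start
  end
end

targets = [Region.new("chr1", 100, 101)]    # hypothetical 1 bp target
puts on_target?("chr1", 50, 150, targets)   # read spans the target
puts on_target?("chr1", 101, 150, targets)  # read starts just past the target
```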
Version 0.3
-
Updated SOLiD and Illumina models and recalibrated default settings
-
Implemented the ability to input a BED file to call only on-target indels
-
Switched from using z cutoffs to using p cutoffs
-
Modified 1bp p cutoff to only filter 1bp deletions
-
Fixed bug where the strand direction filter failed to be enabled
-
Added check for proper Ruby version
-
Fixed bug that occasionally allowed an indel quality of 110 (max should be 100)
-
Minor code-structure changes
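For context on the switch from z cutoffs to p cutoffs: in a logistic regression model the linear score z and the probability p are linked by the logistic function, so either can serve as a threshold. The Ruby sketch below assumes the z reported by Atlas-Indel2 is that linear score, which these notes do not spell out. The method names are illustrative.

```ruby
# Standard logistic function: maps a regression score z to a probability.
def logistic(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Inverse (logit): the z score that corresponds to a given probability.
def logit(prob)
  Math.log(prob / (1.0 - prob))
end

puts logistic(0.0)  # under this assumption, z = 0.0 corresponds to p = 0.5
```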
Version 0.2.1
-
Added read_level model and improved site level model for SOLiD data
-
Adjusted default SOLiD z cutoff to 0.0 (to reflect new model)
-
Added check for proper Ruby version
-
Minor code-structure changes
-
Added an additional heuristic filter that allows a stricter z cutoff for 1bp indels; very useful for SOLiD data
-
Implemented integrated heuristic genotyping
-
Fixed bug where Atlas-Indel2 crashes if a BAM chromosome is not in the reference
-
Now will keep ‘chr’ in the chromosome label if it is in the BAM
-
The deprecated script "Atlas-Indel2-Illum-Exome.rb" has been removed. Please use Atlas-Indel2.rb with the -I flag instead.
Version 0.2 (02-09-2011)
-
Implemented regression model for SOLiD data. You must now specify a regression model with -S or -I.
-
Renamed main script to Atlas-Indel.rb.
-
Modified Reference sequence class to allow for unsorted reference genomes.
-
Added the indel z to the info column of the VCF output (not included after running VCF printer).
-
Now echoes all settings back onto the command line.
-
Fixed a bug that caused loss of precision in the normalized variant square variable of the Illumina site model.
-
Fixed a bug in the depth coverage algorithm that caused reads not to be counted in total depth at the deleted sites.
-
Fixed the sample column order to be compatible with vcfPrinter.
-
Removed "x flagged lines skipped" message at end of run.
Version 0.1 (12-2011)
Solid-genotyper
Solid-genotyper is a SNP discovery and genotyping tool for high-coverage SOLiD data. It combines a logistic regression model with heuristic methods to characterize systematic sequencing errors and overcome mapping bias.
Draft release version 0.1 (01-26-2011)
-
Initial implementation