Title | A SNP discovery method to assess variant allele probability from next-generation resequencing data. |
Publication Type | Journal Article |
Year of Publication | 2010 |
Authors | Shen, Y, Wan, Z, Coarfa, C, Drabek, R, Chen, L, Ostrowski, EA, Liu, Y, Weinstock, GM, Wheeler, DA, Gibbs, RA, Yu, F |
Journal | Genome Res |
Volume | 20 |
Issue | 2 |
Pagination | 273-80 |
Date Published | 2010 Feb |
ISSN | 1549-5469 |
Keywords | Algorithms, Alleles, Bayes Theorem, Computer Simulation, Genome, Bacterial, Logistic Models, Polymorphism, Single Nucleotide, Sequence Analysis, DNA, Software, Staphylococcus aureus |
Abstract | Accurate identification of genetic variants from next-generation sequencing (NGS) data is essential for immediate large-scale genomic endeavors such as the 1000 Genomes Project, and is crucial for further genetic analysis based on the discoveries. The key challenge in single nucleotide polymorphism (SNP) discovery is to distinguish true individual variants (occurring at a low frequency) from sequencing errors (often occurring at frequencies orders of magnitude higher). Therefore, knowledge of the error probabilities of base calls is essential. We have developed Atlas-SNP2, a computational tool that detects and accounts for systematic sequencing errors caused by context-related variables in a logistic regression model learned from training data sets. Subsequently, it estimates the posterior error probability for each substitution through a Bayesian formula that integrates prior knowledge of the overall sequencing error probability and the estimated SNP rate with the results from the logistic regression model for the given substitutions. The estimated posterior SNP probability can be used to distinguish true SNPs from sequencing errors. Validation results show that Atlas-SNP2 achieves a false-positive rate of lower than 10%, with an approximately 5% or lower false-negative rate. |
DOI | 10.1101/gr.096388.109 |
Alternate Journal | Genome Res |
PubMed ID | 20019143 |
PubMed Central ID | PMC2813483 |
Grant List | U54 HG003273 / HG / NHGRI NIH HHS / United States; 1U01HG005211-0109 / HG / NHGRI NIH HHS / United States; 5U54HG003273 / HG / NHGRI NIH HHS / United States |
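The abstract describes two stages: a logistic regression that scores each substitution's chance of being a sequencing error from context variables, followed by a Bayesian step that combines that score with the overall sequencing error probability and the estimated SNP rate. The Python sketch below shows one way such stages can fit together. It is not the published Atlas-SNP2 model. The function names `logistic_error_prob` and `posterior_snp_prob`, the feature set, the coefficients, and the `train_error_frac` parameter are all hypothetical. The Bayesian step uses Bayes' rule in odds form with a standard prior-shift correction, which the abstract does not specify.

```python
import math


def logistic_error_prob(features, coefficients, intercept):
    """Return P(substitution is a sequencing error | context features).

    Standard logistic model: sigmoid(intercept + sum(coef * feature)).
    The coefficients are hypothetical placeholders, not fitted Atlas-SNP2 values.
    """
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def posterior_snp_prob(p_err_model, seq_error_rate, snp_rate, train_error_frac=0.5):
    """Return the posterior probability that a substitution is a true SNP.

    Assumption: the logistic model was trained on data in which a fraction
    `train_error_frac` of substitutions were errors. Dividing its output odds
    by the training prior odds gives a likelihood ratio
    P(features | error) / P(features | SNP). Bayes' rule then multiplies that
    ratio by the prior odds of error versus SNP for the data at hand.
    """
    for name, p in (("p_err_model", p_err_model),
                    ("train_error_frac", train_error_frac)):
        if not 0.0 < p < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    if seq_error_rate <= 0.0 or snp_rate <= 0.0:
        raise ValueError("seq_error_rate and snp_rate must be positive")

    # Convert the classifier output into a likelihood ratio.
    model_odds = p_err_model / (1.0 - p_err_model)
    train_odds = train_error_frac / (1.0 - train_error_frac)
    likelihood_ratio = model_odds / train_odds

    # Rough prior odds that an observed substitution is an error, not a SNP,
    # treating both rates as per-base prior probabilities (an approximation).
    prior_odds = seq_error_rate / snp_rate

    posterior_error_odds = likelihood_ratio * prior_odds
    p_error = posterior_error_odds / (1.0 + posterior_error_odds)
    return 1.0 - p_error


if __name__ == "__main__":
    # Hypothetical context features: base quality, relative position in the
    # read, and length of the adjacent homopolymer run.
    features = [30.0, 0.1, 1.0]
    coefficients = [-0.15, 2.0, 0.5]
    p_err = logistic_error_prob(features, coefficients, intercept=2.0)
    print(f"model error probability: {p_err:.3f}")
    print("posterior SNP probability: "
          f"{posterior_snp_prob(p_err, seq_error_rate=0.01, snp_rate=0.001):.3f}")
```

Working in odds form keeps the prior separate from the fitted classifier, so the same model could be reused with a different assumed SNP rate or error rate without retraining. With an uninformative model score (0.5, matching a balanced training set), the posterior reduces to the prior: an error rate of 0.01 and a SNP rate of 0.001 give a SNP probability of 1/11.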
Similar Publications
Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics. 2024;299(1):65.
MethPhaser: methylation-based long-read haplotype phasing of human genomes. Nat Commun. 2024;15(1):5327.
Genetic diversity of 1,845 rhesus macaques improves genetic variation interpretation and identifies disease models. Nat Commun. 2024;15(1):5658.