Title | An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data. |
Publication Type | Journal Article |
Year of Publication | 2013 |
Authors | Wang, Y, Lu, J, Yu, J, Gibbs, RA, Yu, F |
Journal | Genome Res |
Volume | 23 |
Issue | 5 |
Pagination | 833-42 |
Date Published | 2013 May |
ISSN | 1549-5469 |
Keywords | Algorithms, Base Sequence, Genotype, Haplotypes, High-Throughput Nucleotide Sequencing, Human Genome Project, Humans, Polymorphism, Single Nucleotide |
Abstract | Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarray. |
DOI | 10.1101/gr.146084.112 |
Alternate Journal | Genome Res |
PubMed ID | 23296920 |
PubMed Central ID | PMC3638139 |
Grant List | U01 HG005211 / HG / NHGRI NIH HHS / United States; 2U54HG003273 / HG / NHGRI NIH HHS / United States; U54 HG003273 / HG / NHGRI NIH HHS / United States; F30 MH098571 / MH / NIMH NIH HHS / United States; 5U01HG005211 / HG / NHGRI NIH HHS / United States |