Title | An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data. |
Publication Type | Journal Article |
Year of Publication | 2013 |
Authors | Wang, Y, Lu, J, Yu, J, Gibbs, RA, Yu, F |
Journal | Genome Res |
Volume | 23 |
Issue | 5 |
Pagination | 833-42 |
Date Published | 2013 May |
ISSN | 1549-5469 |
Keywords | Algorithms, Base Sequence, Genotype, Haplotypes, High-Throughput Nucleotide Sequencing, Human Genome Project, Humans, Polymorphism, Single Nucleotide |
Abstract | Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage-aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarray. |
DOI | 10.1101/gr.146084.112 |
Alternate Journal | Genome Res |
PubMed ID | 23296920 |
PubMed Central ID | PMC3638139 |
Grant List | U01 HG005211 / HG / NHGRI NIH HHS / United States; 2U54HG003273 / HG / NHGRI NIH HHS / United States; U54 HG003273 / HG / NHGRI NIH HHS / United States; F30 MH098571 / MH / NIMH NIH HHS / United States; 5U01HG005211 / HG / NHGRI NIH HHS / United States |
Similar Publications
Inverted triplications formed by iterative template switches generate structural variant diversity at genomic disorder loci. Cell Genom. 2024;4(7):100590.
Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics. 2024;299(1):65.
Genetic diversity of 1,845 rhesus macaques improves genetic variation interpretation and identifies disease models. Nat Commun. 2024;15(1):5658.