Title | Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project. |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Naj, AC, Lin, H, Vardarajan, BN, White, S, Lancour, D, Ma, Y, Schmidt, M, Sun, F, Butkiewicz, M, Bush, WS, Kunkle, BW, Malamon, J, Amin, N, Choi, SH, Hamilton-Nelson, KL, van der Lee, SJ, Gupta, N, Koboldt, DC, Saad, M, Wang, B, Nato, AQ, Sohi, HK, Kuzma, A, Wang, L-S, Cupples, LA, van Duijn, C, Seshadri, S, Schellenberg, GD, Boerwinkle, E, Bis, JC, Dupuis, J, Salerno, WJ, Wijsman, EM, Martin, ER, DeStefano, AL |
Corporate Authors | Alzheimer's Disease Sequencing Project (ADSP) |
Journal | Genomics |
Volume | 111 |
Issue | 4 |
Pagination | 808-818 |
Date Published | 2019 Jul |
ISSN | 1089-8646 |
Keywords | Algorithms, Alzheimer Disease, Female, Genome-Wide Association Study, Genotype, Genotyping Techniques, Humans, Male, Polymorphism, Genetic, Quality Control, Whole Genome Sequencing |
Abstract | The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline. It also developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and the remaining ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available. |
DOI | 10.1016/j.ygeno.2018.05.004 |
Alternate Journal | Genomics |
PubMed ID | 29857119 |
PubMed Central ID | PMC6397097 |
Grant List | R01 AG054060 / AG / NIA NIH HHS / United States; R01 AG054076 / AG / NIA NIH HHS / United States; U54 HG003067 / HG / NHGRI NIH HHS / United States; U24 AG021886 / AG / NIA NIH HHS / United States; P50 AG008702 / AG / NIA NIH HHS / United States; U01 AG016976 / AG / NIA NIH HHS / United States; P50 AG005136 / AG / NIA NIH HHS / United States; R01 HL105756 / HL / NHLBI NIH HHS / United States; U24 AG041689 / AG / NIA NIH HHS / United States; R01 AG033193 / AG / NIA NIH HHS / United States; HHSN268201100009C / HL / NHLBI NIH HHS / United States; P30 AG010129 / AG / NIA NIH HHS / United States; HHSN268201100006C / HL / NHLBI NIH HHS / United States; HHSN268201100010C / HL / NHLBI NIH HHS / United States; U01 AG049505 / AG / NIA NIH HHS / United States; HHSN268201100008C / HL / NHLBI NIH HHS / United States; RC2 HL102419 / HL / NHLBI NIH HHS / United States; U01 AG058654 / AG / NIA NIH HHS / United States; U54 AG052427 / AG / NIA NIH HHS / United States; R01 NS017950 / NS / NINDS NIH HHS / United States; HHSN268201100007C / HL / NHLBI NIH HHS / United States; U24 AG072122 / AG / NIA NIH HHS / United States; U01 AG049507 / AG / NIA NIH HHS / United States; U01 AG032984 / AG / NIA NIH HHS / United States; HHSN268201100011C / HL / NHLBI NIH HHS / United States; UF1 AG047133 / AG / NIA NIH HHS / United States; U54 HG003273 / HG / NHGRI NIH HHS / United States; HHSN268201100012C / HL / NHLBI NIH HHS / United States; U01 AG049508 / AG / NIA NIH HHS / United States; HHSN268201100005C / HL / NHLBI NIH HHS / United States; U01 AG062602 / AG / NIA NIH HHS / United States; P30 AG066546 / AG / NIA NIH HHS / United States; U54 HG003079 / HG / NHGRI NIH HHS / United States; U01 AG052409 / AG / NIA NIH HHS / United States |
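The abstract's genotype-level consensus rule can be sketched in Python. The rule keeps concordant calls, keeps a call made by only one pipeline, and excludes discordant calls. The sketch also shows the parent-offspring Mendelian inconsistency check. This is a simplified illustration, not the ADSP implementation. It assumes bi-allelic genotypes already encoded as alternate-allele dosages (0, 1, 2, with None for a missing or filtered call) after pipeline-specific and variant-level QC. The function names are hypothetical.

```python
from typing import Optional

# Assumption: bi-allelic genotypes encoded as alternate-allele dosage
# (0 = hom-ref, 1 = het, 2 = hom-alt); None = missing or filtered out by QC.
Genotype = Optional[int]


def classify(gatk: Genotype, atlas: Genotype) -> str:
    """Hypothetical helper: label one sample's calls at one variant."""
    if gatk is None and atlas is None:
        return "missing"
    if atlas is None:
        return "gatk_only"
    if gatk is None:
        return "atlas_only"
    return "concordant" if gatk == atlas else "discordant"


def consensus_genotype(gatk: Genotype, atlas: Genotype) -> Genotype:
    """Simplified consensus rule: keep agreeing or single-pipeline calls."""
    label = classify(gatk, atlas)
    if label in ("concordant", "gatk_only"):
        return gatk
    if label == "atlas_only":
        return atlas
    return None  # discordant (excluded) or missing in both pipelines


def is_mendelian_inconsistent(parent: Genotype, child: Genotype) -> bool:
    """Parent-offspring pair: inconsistent only for opposite homozygotes."""
    if parent is None or child is None:
        return False  # a missing call cannot be checked
    return abs(parent - child) == 2


calls = [(0, 0), (1, None), (None, 2), (1, 2), (None, None)]
print([consensus_genotype(g, a) for g, a in calls])  # [0, 1, 2, None, None]
```

In a single parent-offspring pair, the parent must transmit one of its alleles, so only opposite homozygotes (0 vs. 2) are inconsistent. A heterozygous parent or child is always compatible with the other. Counting such MIs across the 67 parent-offspring pairs is one way to evaluate genotype filters, as the abstract describes for the GATK calls.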