Exome variant discrepancies due to reference-genome differences.

Title: Exome variant discrepancies due to reference-genome differences.
Publication Type: Journal Article
Year of Publication: 2021
Authors: Li H, Dawood M, Khayat MM, Farek JR, Jhangiani SN, Khan ZM, Mitani T, Coban-Akdemir Z, Lupski JR, Venner E, Posey JE, Sabo A, Gibbs RA
Journal: Am J Hum Genet
Date Published: 2021 Jul 01
Keywords: Cohort Studies; Exome; Genetic Diseases, Inborn; Genome, Human; Humans; Polymorphism, Single Nucleotide; Reference Values

Despite the release of the GRCh38 human reference genome more than seven years ago, GRCh37 remains the more widely used assembly in most research and clinical laboratories. To date, no study has quantified the impact of utilizing different reference assemblies for the identification of variants associated with rare and common diseases from large-scale exome-sequencing data. By calling variants on both the GRCh37 and GRCh38 references, we identified single-nucleotide variants (SNVs) and insertion-deletions (indels) in 1,572 exomes from participants with Mendelian diseases and their family members. We found that 1.5% of SNVs and 2.0% of indels were discordant when different references were used. Notably, 76.6% of the discordant variants were clustered within discrete discordant reference patches (DISCREPs) comprising only 0.9% of loci targeted by exome sequencing. These DISCREPs were enriched for genomic elements including segmental duplications, fix patch sequences, and loci known to contain alternate haplotypes. We identified 206 genes significantly enriched for discordant variants, most of which were in DISCREPs and caused by multi-mapped reads on the reference assembly that lacked the variant call. Among these 206 genes, eight are implicated in known Mendelian diseases and 53 are associated with common phenotypes from genome-wide association studies. In addition, variant interpretation could be influenced by the reference after lifting over variant loci to another assembly. Overall, we identified genes and genomic loci affected by reference assembly choice, including genes associated with Mendelian disorders and complex human diseases that require careful evaluation in both research and clinical applications.
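
The comparison summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that both call sets have already been brought into one coordinate system (for example, by lifting over one set), and that "discordant" means a variant present in only one of the two sets; the abstract does not state the exact definition. The `Variant` type, the `discordance_rate` function, and the toy variants are hypothetical names and data chosen for illustration.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, reference and alternate alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def discordance_rate(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Fraction of all observed variants that appear in only one call set.

    Assumption: both sets use the same coordinate system, so identical
    variants compare equal.
    """
    union = calls_a | calls_b
    if not union:
        return 0.0
    only_one = calls_a ^ calls_b  # symmetric difference: called on one reference only
    return len(only_one) / len(union)


# Toy data, not taken from the study.
calls_first = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 50, "G", "GA"),
}
calls_second = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 75, "T", "C"),
}

rate = discordance_rate(calls_first, calls_second)
print(f"discordant fraction: {rate:.2f}")
```

In this toy case two of the four distinct variants are seen in only one call set. A real comparison would also need to handle variants that fail liftover and differences in allele representation, which this sketch ignores.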

Alternate Journal: Am J Hum Genet
PubMed ID: 34129815
PubMed Central ID: PMC8322936
Grant List: R35 NS105078 / NS / NINDS NIH HHS / United States