Title | Curated variation benchmarks for challenging medically relevant autosomal genes. |
Publication Type | Journal Article |
Year of Publication | 2022 |
Authors | Wagner, J, Olson, ND, Harris, L, McDaniel, J, Cheng, H, Fungtammasan, A, Hwang, Y-C, Gupta, R, Wenger, AM, Rowell, WJ, Khan, ZM, Farek, J, Zhu, Y, Pisupati, A, Mahmoud, M, Xiao, C, Yoo, B, Sahraeian, SMohammad E, Miller, DE, Jáspez, D, Lorenzo-Salazar, JM, Muñoz-Barrera, A, Rubio-Rodríguez, LA, Flores, C, Narzisi, G, Evani, UShanker, Clarke, WE, Lee, J, Mason, CE, Lincoln, SE, Miga, KH, Ebbert, MTW, Shumate, A, Li, H, Chin, C-S, Zook, JM, Sedlazeck, FJ |
Journal | Nat Biotechnol |
Volume | 40 |
Issue | 5 |
Pagination | 672-680 |
Date Published | 2022 May |
ISSN | 1546-1696 |
Keywords | Genome, Human, Haplotypes, Humans, Sequence Analysis, DNA |
Abstract | The repetitive nature and complexity of some medically relevant genes poses a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome. |
DOI | 10.1038/s41587-021-01158-1 |
Alternate Journal | Nat Biotechnol |
PubMed ID | 35132260 |
PubMed Central ID | PMC9117392 |
Grant List | R01 AI151059 / AI / NIAID NIH HHS / United States; R01 HG010040 / HG / NHGRI NIH HHS / United States; R01 HG011274 / HG / NHGRI NIH HHS / United States; 9999-NIST / ImNIST / Intramural NIST DOC / United States; R01 AG068331 / AG / NIA NIH HHS / United States; UM1 HG008898 / HG / NHGRI NIH HHS / United States; R01 CA249054 / CA / NCI NIH HHS / United States; U01 HG010961 / HG / NHGRI NIH HHS / United States; U01 DA053941 / DA / NIDA NIH HHS / United States; R01 MH117406 / MH / NIMH NIH HHS / United States; P01 CA214274 / CA / NCI NIH HHS / United States; U01 HG010971 / HG / NHGRI NIH HHS / United States; L30 HG009212 / HG / NHGRI NIH HHS / United States; R35 GM138636 / GM / NIGMS NIH HHS / United States |
Similar Publications
DNA Methylation-Derived Immune Cell Proportions and Cancer Risk in Black Participants. Cancer Res Commun. 2024;4(10):2714-2723.
StratoMod: predicting sequencing and variant calling errors with interpretable machine learning. Commun Biol. 2024;7(1):1316.
Identification of allele-specific KIV-2 repeats and impact on Lp(a) measurements for cardiovascular disease risk. BMC Med Genomics. 2024;17(1):255.