Title | Benchmarking challenging small variants with linked and long reads. |
Publication Type | Journal Article |
Year of Publication | 2022 |
Authors | Wagner, J, Olson, ND, Harris, L, Khan, Z, Farek, J, Mahmoud, M, Stankovic, A, Kovacevic, V, Yoo, B, Miller, N, Rosenfeld, JA, Ni, B, Zarate, S, Kirsche, M, Aganezov, S, Schatz, MC, Narzisi, G, Byrska-Bishop, M, Clarke, W, Evani, US, Markello, C, Shafin, K, Zhou, X, Sidow, A, Bansal, V, Ebert, P, Marschall, T, Lansdorp, P, Hanlon, V, Mattsson, C-A, Barrio, AM, Fiddes, IT, Xiao, C, Fungtammasan, A, Chin, C-S, Wenger, AM, Rowell, WJ, Sedlazeck, FJ, Carroll, A, Salit, M, Zook, JM |
Journal | Cell Genom |
Volume | 2 |
Issue | 5 |
Date Published | 2022 May |
ISSN | 2666-979X |
Abstract | Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as . For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development. |
DOI | 10.1016/j.xgen.2022.100128 |
Alternate Journal | Cell Genom |
PubMed ID | 36452119 |
PubMed Central ID | PMC9706577 |
Grant List | 9999-NIST / ImNIST / Intramural NIST DOC / United States; R01 HG010759 / HG / NHGRI NIH HHS / United States |