Title | Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. |
Publication Type | Journal Article |
Year of Publication | 2023 |
Authors | Kolmogorov, M, Billingsley, KJ, Mastoras, M, Meredith, M, Monlong, J, Lorig-Roach, R, Asri, M, Jerez, PAlvarez, Malik, L, Dewan, R, Reed, X, Genner, RM, Daida, K, Behera, S, Shafin, K, Pesout, T, Prabakaran, J, Carnevali, P, Yang, J, Rhie, A, Scholz, SW, Traynor, BJ, Miga, KH, Jain, M, Timp, W, Phillippy, AM, Chaisson, M, Sedlazeck, FJ, Blauwendraat, C, Paten, B |
Corporate Authors | North American Brain Expression Consortium (NABEC) |
Journal | bioRxiv |
Date Published | 2023 Apr 05 |
Abstract | Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale because they have been too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with an F1-score better than Illumina short-read sequencing. Small indel calling remains difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with an F1-score comparable to state-of-the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT-based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies. |
DOI | 10.1101/2023.01.12.523790 |
Alternate Journal | bioRxiv |
PubMed ID | 36711673 |
PubMed Central ID | PMC9882142 |
Grant List | U01 HG010961 / HG / NHGRI NIH HHS / United States; P01 AG000538 / AG / NIA NIH HHS / United States; P30 AG072980 / AG / NIA NIH HHS / United States; OT3 HL142481 / HL / NHLBI NIH HHS / United States; OT2 OD033761 / OD / NIH HHS / United States; R01 HG011274 / HG / NHGRI NIH HHS / United States; U24 HG010262 / HG / NHGRI NIH HHS / United States; U24 HG011853 / HG / NHGRI NIH HHS / United States; ZIA NS003154 / ImNIH / Intramural NIH HHS / United States; R01 HG010485 / HG / NHGRI NIH HHS / United States; U24 NS072026 / NS / NINDS NIH HHS / United States; P30 AG019610 / AG / NIA NIH HHS / United States |