Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

Title: Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.
Publication Type: Journal Article
Year of Publication: 2019
Authors: Wenger, AM, Peluso, P, Rowell, WJ, Chang, P-C, Hall, RJ, Concepcion, GT, Ebler, J, Fungtammasan, A, Kolesnikov, A, Olson, ND, Töpfer, A, Alonge, M, Mahmoud, M, Qian, Y, Chin, C-S, Phillippy, AM, Schatz, MC, Myers, G, DePristo, MA, Ruan, J, Marschall, T, Sedlazeck, FJ, Zook, JM, Li, H, Koren, S, Carroll, A, Rank, DR, Hunkapiller, MW
Journal: Nat Biotechnol
Volume: 37
Issue: 10
Pagination: 1155-1162
Date Published: 2019 Oct
ISSN: 1546-1696
Keywords: Base Sequence; DNA, Circular; Genetic Variation; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
Abstract

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
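Two of the metrics quoted in the abstract can be illustrated with a minimal sketch. The function names `phred_quality` and `n50` and the toy contig lengths are assumptions made for illustration; they are not taken from the paper or from any tool it used. The first function converts a per-base accuracy into a Phred-scaled quality value, and the second computes a contig N50 from a list of contig lengths.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value."""
    return -10 * math.log10(1 - accuracy)


def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L of the contig at which, walking from the longest
    contig downwards, the cumulative length first reaches half of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Accuracy figures quoted in the abstract, expressed as Phred values.
print(round(phred_quality(0.998), 1))    # read accuracy of 99.8%
print(round(phred_quality(0.99997), 1))  # assembly concordance of 99.997%

# Hypothetical toy contig lengths (not data from the paper).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

On this scale, 99.8% read accuracy corresponds to roughly Q27 and 99.997% concordance to roughly Q45, which is one way to read the gap between individual HiFi reads and the finished assembly.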

DOI: 10.1038/s41587-019-0217-9
Alternate Journal: Nat Biotechnol
PubMed ID: 31406327
PubMed Central ID: PMC6776680
Grant List:
R01 HG006677 / HG / NHGRI NIH HHS / United States
R01 HG010040 / HG / NHGRI NIH HHS / United States
UM1 HG008898 / HG / NHGRI NIH HHS / United States
