Title | Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Wenger, AM, Peluso, P, Rowell, WJ, Chang, P-C, Hall, RJ, Concepcion, GT, Ebler, J, Fungtammasan, A, Kolesnikov, A, Olson, ND, Töpfer, A, Alonge, M, Mahmoud, M, Qian, Y, Chin, C-S, Phillippy, AM, Schatz, MC, Myers, G, DePristo, MA, Ruan, J, Marschall, T, Sedlazeck, FJ, Zook, JM, Li, H, Koren, S, Carroll, A, Rank, DR, Hunkapiller, MW |
Journal | Nat Biotechnol |
Volume | 37 |
Issue | 10 |
Pagination | 1155-1162 |
Date Published | 2019 Oct |
ISSN | 1546-1696 |
Keywords | Base Sequence; DNA, Circular; Genetic Variation; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA |
Abstract | The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads. |
DOI | 10.1038/s41587-019-0217-9 |
Alternate Journal | Nat Biotechnol |
PubMed ID | 31406327 |
PubMed Central ID | PMC6776680 |
Grant List | R01 HG006677 / HG / NHGRI NIH HHS / United States; R01 HG010040 / HG / NHGRI NIH HHS / United States; UM1 HG008898 / HG / NHGRI NIH HHS / United States |
Similar Publications
Inverted triplications formed by iterative template switches generate structural variant diversity at genomic disorder loci. Cell Genom. 2024;4(7):100590.
Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics. 2024;299(1):65.
Genetic diversity of 1,845 rhesus macaques improves genetic variation interpretation and identifies disease models. Nat Commun. 2024;15(1):5658.