Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.

Title: Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.
Publication Type: Journal Article
Year of Publication: 2020
Authors: Shafin, K, Pesout, T, Lorig-Roach, R, Haukness, M, Olsen, HE, Bosworth, C, Armstrong, J, Tigyi, K, Maurer, N, Koren, S, Sedlazeck, FJ, Marschall, T, Mayes, S, Costa, V, Zook, JM, Liu, KJ, Kilburn, D, Sorensen, M, Munson, KM, Vollger, MR, Monlong, J, Garrison, E, Eichler, EE, Salama, S, Haussler, D, Green, RE, Akeson, M, Phillippy, A, Miga, KH, Carnevali, P, Jain, M, Paten, B
Journal: Nat Biotechnol
Date Published: 2020 Sep
Keywords: Algorithms; Benchmarking; Chromosomes, Human; Deep Learning; Genome, Human; Genomics; Haploidy; High-Throughput Nucleotide Sequencing; HLA Antigens; Humans; Nanopore Sequencing; Sequence Analysis, DNA

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) with nanopore reads alone. Addition of proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.
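The abstract quotes two standard summary statistics: a Phred-scaled quality value (99.9% identity corresponds to QV = 30) and a read N50. As a minimal sketch of how these figures are conventionally defined, the following Python computes both. The function names `identity_to_qv` and `read_n50` are hypothetical helpers written for illustration; they are not part of Shasta, MarginPolish or HELEN, and the read lengths in the usage lines are made-up toy values, not data from the study.

```python
import math


def identity_to_qv(identity: float) -> float:
    """Convert a fractional identity (e.g. 0.999) to a Phred-scaled QV.

    Uses the standard definition QV = -10 * log10(error rate),
    where error rate = 1 - identity.
    """
    error_rate = 1.0 - identity
    if error_rate <= 0.0:
        raise ValueError("identity must be below 1.0 for a finite QV")
    return -10.0 * math.log10(error_rate)


def read_n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Illustrative usage with toy numbers (not data from the paper).
qv = identity_to_qv(0.999)          # approximately 30
n50 = read_n50([10, 20, 30, 40])    # toy read lengths
```

Under these definitions, each additional "nine" of identity adds 10 to the QV, which is why 99.9% identity and QV = 30 are stated together in the abstract.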

Alternate Journal: Nat Biotechnol
PubMed ID: 32686750
PubMed Central ID: PMC7483855
Grant List:
U01 HG010961 / HG / NHGRI NIH HHS / United States
U01 HL137183 / HL / NHLBI NIH HHS / United States
U41 HG010972 / HG / NHGRI NIH HHS / United States
/ HHMI / Howard Hughes Medical Institute / United States
U41 HG007234 / HG / NHGRI NIH HHS / United States
T32 HG008345 / HG / NHGRI NIH HHS / United States
R01 HG010329 / HG / NHGRI NIH HHS / United States
U01 HG010971 / HG / NHGRI NIH HHS / United States
R01 HG010053 / HG / NHGRI NIH HHS / United States
R01 HG009737 / HG / NHGRI NIH HHS / United States
R01 HG010485 / HG / NHGRI NIH HHS / United States
U54 HG007990 / HG / NHGRI NIH HHS / United States
U24 HG009084 / HG / NHGRI NIH HHS / United States
R03 HG009730 / HG / NHGRI NIH HHS / United States
OT3 HL142481 / HL / NHLBI NIH HHS / United States
R44 GM134994 / GM / NIGMS NIH HHS / United States
OT2 OD026682 / OD / NIH HHS / United States
U24 HG010262 / HG / NHGRI NIH HHS / United States
R43 HG009859 / HG / NHGRI NIH HHS / United States
