Title | A multi-task convolutional deep neural network for variant calling in single molecule sequencing. |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Luo, R, Sedlazeck, FJ, Lam, T-W, Schatz, MC |
Journal | Nat Commun |
Volume | 10 |
Issue | 1 |
Pagination | 998 |
Date Published | 2019 Mar 01 |
ISSN | 2041-1723 |
Keywords | Base Sequence; Computational Biology; DNA Mutational Analysis; Genome, Human; Genome-Wide Association Study; Genomics; Genotype; Genotyping Techniques; Humans; INDEL Mutation; Nanopores; Neural Networks, Computer; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Software |
Abstract | The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model. |
DOI | 10.1038/s41467-019-09025-z |
Alternate Journal | Nat Commun |
PubMed ID | 30824707 |
PubMed Central ID | PMC6397153 |
Grant List | R01 HG006677 / HG / NHGRI NIH HHS / United States; UM1 HG008898 / HG / NHGRI NIH HHS / United States |
Similar Publications
Inverted triplications formed by iterative template switches generate structural variant diversity at genomic disorder loci. Cell Genom. 2024;4(7):100590.
Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics. 2024;299(1):65.
Genetic diversity of 1,845 rhesus macaques improves genetic variation interpretation and identifies disease models. Nat Commun. 2024;15(1):5658.