Title | xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments. |
Publication Type | Journal Article |
Year of Publication | 2022 |
Authors | Farek, J, Hughes, D, Salerno, W, Zhu, Y, Pisupati, A, Mansfield, A, Krasheninina, O, English, AC, Metcalf, GA, Boerwinkle, E, Muzny, DM, Gibbs, RA, Khan, Z, Sedlazeck, FJ |
Journal | Gigascience |
Volume | 12 |
Date Published | 2022 Dec 28 |
ISSN | 2047-217X |
Keywords | Algorithms, Genome, High-Throughput Nucleotide Sequencing, INDEL Mutation, Polymorphism, Single Nucleotide, Software |
Abstract | BACKGROUND: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets. FINDINGS: We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and small insertions and deletions (indels) in NGS data. xAtlas features rapid runtimes, support for CRAM and gVCF file formats, and retraining capabilities. xAtlas reports SNVs with 99.11% recall and 98.43% precision across a reference HG002 sample at 60× whole-genome coverage in less than 2 CPU hours. Applying xAtlas to 3,202 samples at 30× whole-genome coverage from the 1000 Genomes Project achieves an average runtime of 1.7 hours per sample and a clear separation of the individual populations in principal component analysis across called SNVs. CONCLUSIONS: xAtlas is a fast, lightweight, and accurate SNV and small indel calling method. Source code for xAtlas is available under a BSD 3-clause license at https://github.com/jfarek/xatlas. |
DOI | 10.1093/gigascience/giac125 |
Alternate Journal | Gigascience |
PubMed ID | 36644891 |
PubMed Central ID | PMC9841152 |
Grant List | UM1 HG008898 / HG / NHGRI NIH HHS / United States; UM1 HG008901 / HG / NHGRI NIH HHS / United States |
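The abstract reports SNV recall and precision against the HG002 reference sample. As a minimal sketch of what those two numbers mean, the snippet below uses the standard definitions from true-positive, false-positive and false-negative counts. The helper names (`precision`, `recall`, `f1`) are hypothetical and are not part of xAtlas. The record does not describe how the benchmark comparison itself was carried out. The F1 value is derived here from the two reported percentages and is not a figure stated in the abstract.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported variant calls that match the truth set."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of truth-set variants that the caller reported."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# SNV figures for HG002 at 60x as given in the abstract, written as fractions.
reported_precision = 0.9843
reported_recall = 0.9911

# Not reported by the authors: derived here only to illustrate the formula.
derived_f1 = f1(reported_precision, reported_recall)
print(f"Derived SNV F1: {derived_f1:.4f}")
```

Because F1 is a harmonic mean, it sits close to the lower of the two inputs. With both values near 99% it lands at roughly 98.8%.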