Title | Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. |
Publication Type | Journal Article |
Year of Publication | 2011 |
Authors | Xi, R, Hadjipanayis, AG, Luquette, LJ, Kim, T-M, Lee, E, Zhang, J, Johnson, MD, Muzny, DM, Wheeler, DA, Gibbs, RA, Kucherlapati, R, Park, PJ |
Journal | Proc Natl Acad Sci U S A |
Volume | 108 |
Issue | 46 |
Pagination | E1128-36 |
Date Published | 2011 Nov 15 |
ISSN | 1091-6490 |
Keywords | Algorithms; Bayes Theorem; Brain Neoplasms; Comparative Genomic Hybridization; Computer Simulation; DNA Copy Number Variations; Female; Gene Dosage; Genome; Genome, Human; Glioblastoma; Humans; Models, Genetic; Models, Statistical; Sequence Analysis, DNA |
Abstract | DNA copy number variations (CNVs) play an important role in the pathogenesis and progression of cancer and confer susceptibility to a variety of human disorders. Array comparative genomic hybridization has been used widely to identify CNVs genome wide, but the next-generation sequencing technology provides an opportunity to characterize CNVs genome wide with unprecedented resolution. In this study, we developed an algorithm to detect CNVs from whole-genome sequencing data and applied it to a newly sequenced glioblastoma genome with a matched control. This read-depth algorithm, called BIC-seq, can accurately and efficiently identify CNVs via minimizing the Bayesian information criterion. Using BIC-seq, we identified hundreds of CNVs as small as 40 bp in the cancer genome sequenced at 10× coverage, whereas we could only detect large CNVs (> 15 kb) in the array comparative genomic hybridization profiles for the same genome. Eighty percent (14/16) of the small variants tested (110 bp to 14 kb) were experimentally validated by quantitative PCR, demonstrating high sensitivity and true positive rate of the algorithm. We also extended the algorithm to detect recurrent CNVs in multiple samples as well as deriving error bars for breakpoints using a Gibbs sampling approach. We propose this statistical approach as a principled yet practical and efficient method to estimate CNVs in whole-genome sequencing data. |
DOI | 10.1073/pnas.1110574108 |
Alternate Journal | Proc Natl Acad Sci U S A |
PubMed ID | 22065754 |
PubMed Central ID | PMC3219132 |
Grant List | R01 GM082798 / GM / NIGMS NIH HHS / United States; RC1 HG005482 / HG / NHGRI NIH HHS / United States; U24 CA144025 / CA / NCI NIH HHS / United States |