Title | Inferring population mutation rate and sequencing error rate using the SNP frequency spectrum in a sample of DNA sequences. |
Publication Type | Journal Article |
Year of Publication | 2009 |
Authors | Liu, X, Maxwell, TJ, Boerwinkle, E, Fu, Y-X |
Journal | Mol Biol Evol |
Volume | 26 |
Issue | 7 |
Pagination | 1479-90 |
Date Published | 2009 Jul |
ISSN | 1537-1719 |
Keywords | Base Sequence, Computer Simulation, Humans, Mutation, Polymorphism, Single Nucleotide, Sequence Analysis, DNA |
Abstract | One challenge of analyzing samples of DNA sequences is to account for the nonnegligible polymorphisms produced by error when the sequencing error rate is high or the sample size is large. Specifically, those artificial sequence variations will bias the observed single nucleotide polymorphism (SNP) frequency spectrum, which in turn may further bias the estimators of the population mutation rate theta = 4N mu for diploids. In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate theta, given a SNP frequency spectrum in a random sample of DNA sequences from a population. With this approach, error rate epsilon can be either known or unknown. In the latter case, epsilon can be estimated given an estimate of theta. Using coalescent simulation, we compared our estimators with other estimators of theta. The results showed that the GLS estimators are more efficient than other theta estimators with error, and the estimation of epsilon is usable in practice when the theta per bp is small. We demonstrate the application of the estimators with a 10-kb noncoding region sequence sampled from a human population and provide suggestions for choosing theta estimators with error. |
DOI | 10.1093/molbev/msp059 |
Alternate Journal | Mol Biol Evol |
PubMed ID | 19318520 |
PubMed Central ID | PMC2734145 |
Grant List | P50 GM065509 / GM / NIGMS NIH HHS / United States; 5P50 GM 065509-07 / GM / NIGMS NIH HHS / United States |
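The bias described in the abstract can be illustrated with two classical estimators of theta computed from an unfolded SNP frequency spectrum: Watterson's estimator (segregating sites divided by a harmonic sum) and the pairwise-difference estimator. The sketch below is not the GLS estimator proposed in the paper. It assumes, as a simplification, that sequencing errors appear only as extra singletons, and the function names, sample size, theta value, and error count are all hypothetical choices made for illustration.

```python
from math import comb


def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the normalizer in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites divided by a_n.

    sfs[i-1] is the number of sites whose derived allele appears i times
    in a sample of n = len(sfs) + 1 sequences.
    """
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def pi_theta(sfs):
    """Pairwise-difference estimator computed from the frequency spectrum."""
    n = len(sfs) + 1
    pairs = comb(n, 2)
    weighted = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1))
    return weighted / pairs


def expected_sfs(theta, n):
    """Expected spectrum under the standard neutral model: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


# Hypothetical illustration values (not taken from the paper).
n, theta = 10, 5.0
clean = expected_sfs(theta, n)

# Assumption: three sequencing errors, each showing up as a spurious singleton.
noisy = clean.copy()
noisy[0] += 3.0

print("no errors:  ", watterson_theta(clean), pi_theta(clean))
print("with errors:", watterson_theta(noisy), pi_theta(noisy))
```

On the error-free expected spectrum both estimators recover theta. Once spurious singletons are added, both are inflated, and Watterson's estimator is inflated more because it weights every segregating site equally regardless of its frequency.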