|Title||Inferring population mutation rate and sequencing error rate using the SNP frequency spectrum in a sample of DNA sequences.|
|Publication Type||Journal Article|
|Year of Publication||2009|
|Authors||Liu, X, Maxwell, TJ, Boerwinkle, E, Fu, Y-X|
|Journal||Mol Biol Evol|
|Date Published||2009 Jul|
|Keywords||Base Sequence; Computer Simulation; Humans; Mutation; Polymorphism, Single Nucleotide; Sequence Analysis, DNA|
One challenge in analyzing samples of DNA sequences is accounting for the non-negligible number of polymorphisms produced by sequencing error when the error rate is high or the sample size is large. These artificial sequence variants bias the observed single nucleotide polymorphism (SNP) frequency spectrum, which in turn may bias estimators of the population mutation rate θ = 4Nμ for diploids (N is the effective population size and μ the mutation rate per generation). In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate θ from the SNP frequency spectrum of a random sample of DNA sequences from a population. With this approach, the sequencing error rate ε can be either known or unknown; in the latter case, ε can be estimated given an estimate of θ. Using coalescent simulation, we compared our estimators with other estimators of θ. The results showed that the GLS estimators are more efficient than other θ estimators that account for error, and that the estimate of ε is usable in practice when θ per base pair is small. We demonstrate the application of the estimators with 10-kb noncoding-region sequences sampled from a human population and provide suggestions for choosing among θ estimators that account for sequencing error.
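To make the idea of a GLS fit to the frequency spectrum concrete, here is a minimal Python sketch under a deliberately simplified model. The model is an assumption made for illustration and is not the paper's model:

- the unfolded spectrum ξ_i (i = 1, …, n−1, the number of sites whose derived allele appears i times in n sequences) has expectation θ/i from true mutations, with θ per locus of L sites;
- sequencing errors add only singletons, about nLε of them.

GLS weights the fit by the inverse covariance of the spectrum. Here that covariance, `sigma`, is supplied by the caller. The default diagonal matrix with entries 1/i is a placeholder, not the coalescent covariance. The names `expected_sfs`, `gls`, and `estimate_theta_eps` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def gls(X: Matrix, y: List[float], sigma: Matrix) -> List[float]:
    """Generalized least squares: beta = (X' S^-1 X)^-1 X' S^-1 y."""
    k, p = len(y), len(X[0])
    # S^-1 applied to each column of X and to y, via linear solves (no explicit inverse).
    sinv_x = [solve(sigma, [X[r][j] for r in range(k)]) for j in range(p)]
    sinv_y = solve(sigma, y)
    xtx = [[sum(X[r][i] * sinv_x[j][r] for r in range(k)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * sinv_y[r] for r in range(k)) for i in range(p)]
    return solve(xtx, xty)


def expected_sfs(theta: float, n: int, seq_len: int, eps: float) -> List[float]:
    """E[xi_i], i = 1..n-1, under the simplified model (assumption):
    theta / i from true mutations, plus n * seq_len * eps error singletons."""
    sfs = [theta / i for i in range(1, n)]
    sfs[0] += n * seq_len * eps
    return sfs


def estimate_theta_eps(sfs: List[float], seq_len: int, eps: Optional[float] = None,
                       sigma: Optional[Matrix] = None) -> Tuple[float, float]:
    """Fit theta (per locus) and eps; sfs[i-1] counts sites with derived count i."""
    k = len(sfs)
    n = k + 1
    if sigma is None:
        # Placeholder working covariance (assumption): independent classes, Var ~ 1/i.
        sigma = [[1.0 / (i + 1) if i == j else 0.0 for j in range(k)] for i in range(k)]
    if eps is not None:
        # Known error rate: subtract expected error singletons, then fit theta alone.
        y = list(sfs)
        y[0] -= n * seq_len * eps
        (theta,) = gls([[1.0 / i] for i in range(1, n)], y, sigma)
        return theta, eps
    if k < 2:
        raise ValueError("need at least 3 sequences to estimate theta and eps jointly")
    # Unknown error rate: an indicator column for the singleton class whose
    # coefficient estimates the number of error singletons, n * seq_len * eps.
    X = [[1.0 / i, 1.0 if i == 1 else 0.0] for i in range(1, n)]
    theta, err_singletons = gls(X, list(sfs), sigma)
    return theta, err_singletons / (n * seq_len)


if __name__ == "__main__":
    # Round trip on noise-free data generated from the same simplified model.
    demo = expected_sfs(theta=10.0, n=10, seq_len=10_000, eps=1e-4)
    print(estimate_theta_eps(demo, seq_len=10_000))
```

In this toy model the error term touches only the singleton class and is left free. When ε is unknown, the fitted θ therefore does not depend on ξ_1 for any choice of `sigma`. The estimate of ε is then read off the singleton excess ξ_1 − θ̂, which echoes the abstract's remark that ε can be estimated given an estimate of θ. Dividing θ̂ by L gives θ per base pair.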
|Alternate Journal||Mol Biol Evol|
|PubMed Central ID||PMC2734145|
|Grant List||P50 GM065509 / GM / NIGMS NIH HHS / United States; 5P50 GM 065509-07 / GM / NIGMS NIH HHS / United States|