Abstract | We investigated substitution patterns and neighboring-nucleotide effects for 2,576,903 single nucleotide polymorphisms (SNPs) publicly available through the National Center for Biotechnology Information (NCBI). The proportions of substitutions were A/G, 32.77%; C/T, 32.81%; A/C, 8.98%; G/T, 9.06%; A/T, 7.46%; and C/G, 8.92%. The two nucleotides immediately neighboring the variable site showed major deviations from genome-wide and chromosome-specific expectations, although lesser biases extended as far as 200 bp. On the 5' side, the biases for A, C, G, and T were 1.43%, 4.91%, -1.70%, and -4.62%, respectively. On the 3' side, the corresponding biases were -4.44%, -1.59%, 5.05%, and 0.99%, respectively. The neighboring-nucleotide patterns for transitions were dominated by the hypermutability effects of CpG dinucleotides. Transitions were more common than transversions, and the probability of a transversion increased with increasing A + T content at the two adjacent sites. Neighboring-nucleotide biases were not consistent among chromosomes, with Chromosomes 19 and 22 standing out as different from the others. These data provide genome-wide information about the effects of neighboring nucleotides on the mutational and evolutionary processes that give rise to contemporary patterns of nucleotide occurrence surrounding SNPs.
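As a rough illustration of the arithmetic behind these figures, the sketch below totals the reported substitution proportions into transitions and transversions and shows one plausible way to compute a neighboring-nucleotide bias. The function names are hypothetical, and the bias definition (observed percentage minus expected percentage at a flanking position) is an assumption, not a method stated in the abstract.

```python
from collections import Counter

# Substitution-class proportions (%) as reported in the abstract.
PROPORTIONS = {
    "A/G": 32.77, "C/T": 32.81,                            # transitions
    "A/C": 8.98, "G/T": 9.06, "A/T": 7.46, "C/G": 8.92,    # transversions
}

PURINES = {"A", "G"}


def is_transition(pair):
    """True if both alleles are purines or both are pyrimidines."""
    a, b = pair.split("/")
    return (a in PURINES) == (b in PURINES)


def transition_transversion_totals(proportions):
    """Sum the percentages of transition and transversion classes."""
    ti = sum(p for k, p in proportions.items() if is_transition(k))
    tv = sum(p for k, p in proportions.items() if not is_transition(k))
    return ti, tv


def neighbor_bias(observed_bases, expected_pct):
    """Assumed definition: observed % minus expected % for each base.

    observed_bases: iterable of the bases seen at one flanking position
                    (for example, the 5' neighbor of every SNP).
    expected_pct:   dict of expected percentages for A, C, G, and T.
    """
    counts = Counter(observed_bases)
    total = sum(counts[b] for b in "ACGT")
    return {b: 100.0 * counts[b] / total - expected_pct[b] for b in "ACGT"}


ti, tv = transition_transversion_totals(PROPORTIONS)
print(f"transitions: {ti:.2f}%, transversions: {tv:.2f}%, ratio: {ti / tv:.2f}")
# prints: transitions: 65.58%, transversions: 34.42%, ratio: 1.91
```

Under this definition the four biases at a position sum to zero whenever the expected percentages sum to 100, which agrees with the reported 5' and 3' values up to rounding.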