SNP Discovery

BCM-HGSC Mutation Discovery Pipeline

Single Nucleotide Mutations

Sequencing reads are compared with their respective amplicon reference sequences using a modified version of SNPdetector that employs a relaxed heterozygote (Het) peak-ratio threshold to compensate for possible heterogeneity of the tumor tissue sample. We use Polyphred 6.0b as a backup discovery method to capture any high-scoring variation missed by SNPdetector, which happens very rarely. Special mention should be made of our collaboration with Dr. Jinghui Zhang in the laboratory of Dr. Ken Buetow on the development and calibration of SNPdetector. This software has consistently outperformed other routines for the direct discovery of heterozygotes.
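The peak-ratio idea can be sketched as follows. This is not SNPdetector's algorithm: the `TracePosition` record, both threshold values, and the simple union with backup calls are hypothetical, chosen only to show how lowering the minimum secondary-to-primary peak ratio lets a heterozygous signal diluted by normal-cell admixture still be called.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; SNPdetector's real
# parameters and scoring model are not reproduced here.
GERMLINE_HET_RATIO = 0.35
TUMOR_HET_RATIO = 0.20   # relaxed to tolerate admixture of normal cells


@dataclass
class TracePosition:
    amplicon_pos: int        # position in the amplicon reference
    primary_height: float    # height of the tallest base peak
    secondary_height: float  # height of the second-tallest base peak


def is_heterozygous(pos: TracePosition, min_ratio: float) -> bool:
    """Call a heterozygote when the secondary peak is a large enough
    fraction of the primary peak."""
    if pos.primary_height <= 0:
        return False
    return pos.secondary_height / pos.primary_height >= min_ratio


def discover(positions, backup_calls, min_ratio=TUMOR_HET_RATIO):
    """Union of peak-ratio calls with calls from a backup method.

    backup_calls: set of amplicon positions flagged by the backup caller
    (standing in for high-scoring Polyphred output in this sketch).
    """
    primary = {p.amplicon_pos for p in positions if is_heterozygous(p, min_ratio)}
    return sorted(primary | set(backup_calls))
```

With peaks of 1000/250 at position 101, 1000/50 at 102, and 800/400 at 103, plus a backup call at 150, `discover` returns `[101, 103, 150]` under the relaxed threshold but drops position 101 (ratio 0.25) under the germline-style threshold.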

Putative polymorphisms accumulated from this analysis are annotated with the following information:

  1. Chromosome and global coordinates

  2. Coincidence with known variation (dbSNP current build, as well as local databases of newly identified variants)

  3. Functional information:

    1. gene compartment (intron, exon, splice junction)

    2. non-synonymous amino acid change, if any

    3. position of the non-synonymous amino acid in the protein

    4. BLOSUM62 score of the variant amino acid compared with the reference
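The functional annotation fields can be sketched as follows, assuming exon boundaries are given as inclusive (start, end) genomic intervals and that the variant's 1-based position within the coding sequence (CDS) is already known. The `compartment` and `coding_change` helpers, the two-base splice window, and the BLOSUM62 excerpt are illustrative assumptions, not the pipeline's annotation code; real use would load the full BLOSUM62 matrix from a published source.

```python
# Hypothetical annotation helpers; not the pipeline's actual code.

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Small excerpt of BLOSUM62 (symmetric); load the full matrix in real use.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("I", "L"): 2, ("I", "V"): 3, ("D", "E"): 2,
    ("K", "R"): 2, ("F", "Y"): 3, ("L", "P"): -3, ("W", "W"): 11,
}


def blosum62(a, b):
    """Look up a pair in either order; None if the pair is not in the excerpt."""
    return BLOSUM62_EXCERPT.get((a, b), BLOSUM62_EXCERPT.get((b, a)))


def compartment(genomic_pos, exons, splice_window=2):
    """Classify a position against inclusive (start, end) exon intervals.

    splice_window is an assumed number of intronic bases flanking each exon
    that are reported as 'splice junction'.
    """
    for start, end in exons:
        if start <= genomic_pos <= end:
            return "exon"
    for start, end in exons:
        if start - splice_window <= genomic_pos < start or end < genomic_pos <= end + splice_window:
            return "splice junction"
    return "intron"


def coding_change(cds, cds_pos, alt):
    """Annotate a substitution at 1-based position cds_pos of a coding sequence."""
    idx = cds_pos - 1
    offset = idx % 3                       # position within the codon
    codon_start = idx - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    aa_pos = idx // 3 + 1                  # position of the residue in the protein
    if ref_aa == alt_aa:
        return {"non_synonymous": False, "aa_position": aa_pos}
    return {
        "non_synonymous": True,
        "aa_position": aa_pos,
        "change": f"{ref_aa}{aa_pos}{alt_aa}",
        "blosum62": blosum62(ref_aa, alt_aa),
    }
```

For example, a C-to-T change at CDS position 5 of `ATGCCTGAA` turns codon CCT into CTT, which `coding_change` reports as the non-synonymous change P2L with a BLOSUM62 score of -3.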

Novel SNPs with recognizable functional potential (e.g., non-synonymous SNPs or splice-junction variants) are evaluated further. First, they are visually inspected at the trace level, and those that are not clearly noise are passed on to experimental validation, currently by pyrosequencing. We plan to resequence the matched normal tissue with Sanger reads in patients whose mutations pass pyrosequencing validation.
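The selection step can be sketched as a simple filter. The `Variant` record is hypothetical, "novel" is assumed here to mean absent from dbSNP, and the manual trace-level inspection is represented only by a caller-supplied function, since that review is done by eye.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    in_dbsnp: bool            # assumed meaning of "known" in this sketch
    compartment: str          # "exon", "intron" or "splice junction"
    non_synonymous: bool


def has_functional_potential(v):
    return v.non_synonymous or v.compartment == "splice junction"


def select_for_validation(variants, passes_trace_review):
    """Keep novel, potentially functional variants that survive trace review.

    passes_trace_review is a callable standing in for the manual,
    trace-level visual inspection; it returns False for calls judged to be noise.
    """
    return [
        v for v in variants
        if not v.in_dbsnp and has_functional_potential(v) and passes_trace_review(v)
    ]
```

Variants returned by `select_for_validation` would be the candidates sent on to pyrosequencing.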

All putative genotypes of each individual at each mutation position, along with their annotation and validation status, will be stored in local databases. Reports are formatted for submission to common data repositories according to jointly established protocols.
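A minimal local store of this kind might look like the following SQLite sketch. The table layout, column names, status strings, and the tab-separated export are hypothetical; the actual repository submission formats are defined by the jointly established protocols and are not reproduced here.

```python
import sqlite3

# Hypothetical schema: one row per individual per mutation position.
SCHEMA = """
CREATE TABLE IF NOT EXISTS genotype (
    sample_id   TEXT NOT NULL,
    chrom       TEXT NOT NULL,
    pos         INTEGER NOT NULL,
    genotype    TEXT NOT NULL,               -- e.g. 'C/T'
    annotation  TEXT,                        -- e.g. 'P2L'
    validation  TEXT NOT NULL DEFAULT 'pending',
    PRIMARY KEY (sample_id, chrom, pos)
);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_genotype(conn, sample_id, chrom, pos, genotype, annotation=None):
    # INSERT OR REPLACE overwrites an earlier row and resets validation to 'pending'.
    conn.execute(
        "INSERT OR REPLACE INTO genotype (sample_id, chrom, pos, genotype, annotation) "
        "VALUES (?, ?, ?, ?, ?)",
        (sample_id, chrom, pos, genotype, annotation),
    )


def set_validation(conn, sample_id, chrom, pos, status):
    conn.execute(
        "UPDATE genotype SET validation = ? WHERE sample_id = ? AND chrom = ? AND pos = ?",
        (status, sample_id, chrom, pos),
    )


def export_tsv(conn):
    """Placeholder report format: one tab-separated line per stored genotype."""
    rows = conn.execute(
        "SELECT sample_id, chrom, pos, genotype, annotation, validation "
        "FROM genotype ORDER BY chrom, pos, sample_id"
    )
    return "\n".join("\t".join("" if v is None else str(v) for v in row) for row in rows)
```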

Structural Variation Discovery

We are in the process of evaluating Polyphred 6.0b and a new SNPdetector module designed to detect intra-exonic indels in biallelic resequencing traces. One or both of these will be used for indel discovery and characterization. Genotype frequencies of constitutional variants (i.e., known SNPs) will be tracked, since departures from Hardy-Weinberg equilibrium might reveal commonly deleted genes or gene segments (loss of heterozygosity, LOH).
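One way to flag such departures is a chi-square goodness-of-fit test against Hardy-Weinberg proportions, together with the inbreeding coefficient F, which is positive when heterozygotes are depleted as LOH would cause. The source does not name a specific test, so the choice of the chi-square test here is an assumption; for small samples an exact test would be preferable.

```python
import math


def hwe_test(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium for one biallelic site.

    Returns (chi2, p_value, F), where F = 1 - observed/expected heterozygosity.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)   # reference allele frequency
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("site is monomorphic; HWE test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # F > 0 means fewer heterozygotes than expected under HWE.
    f = 1 - (n_het / n) / (2 * p * q)
    return chi2, p_value, f
```

With 30, 40, and 30 individuals in the three genotype classes, the expected counts are 25, 50, and 25, giving a chi-square of 4.0 (p about 0.046) and F = 0.2, a heterozygote deficit worth following up.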

Quality Control

Sequencing coverage is a critical factor in variation discovery. We track coverage using the SNPdetector program rather than a single base-quality measure. A base position is judged to be covered in a given read if SNPdetector is able to make a call at that position, regardless of its Phred quality score (although Phred quality scores and SNPdetector coverage are highly correlated).
