Genetic Variation Projects
Improved knowledge of genetic variation is key to understanding the causes of many diseases. Historically, the study of variation in the genome centered on PCR amplification of short genomic regions. More recently, new high-throughput technologies have made it viable to use whole genome sequencing to detect variation.
A number of techniques are used to select and amplify genomic regions of interest, including whole genome or chromosome sampling by random shotgun sequencing; PCR amplification covering contiguous stretches of targeted genomic regions; PCR amplification of targeted gene coding regions; and array-based hybridization to immobilized probes covering genomic regions, targeted exonic sequences or the entire exome. Each of these efforts has benefited from the close and historical affiliation with the Department of Molecular and Human Genetics (MHG), as well as other departments and institutions in the Texas Medical Center. The in-depth knowledge generated from these efforts is paving the way toward a new era of genetic research and the promise of genomically-driven personalized medicine.
The BCM-HGSC pioneered the concept of sequencing directed PCR amplification across the exons and splice junctions of potential disease genes. These techniques have been refined into a robust laboratory and informatics pipeline for Sanger sequencing of directed PCR (Medical Re-sequencing). The BCM-HGSC is currently investigating the replacement of PCR amplification with capture chip technology; sequencing for variant discovery is moving to second-generation platforms such as Roche/454, AB SOLiD, and Illumina/Solexa.
Functional Mutation Discovery (FMD)
As part of the 1000 Genomes Project, the BCM-HGSC is playing a defining role in Pilot 3, the targeted sequencing of more than 1000 genes across 1000 individuals. The Center has worked closely with Roche/Nimblegen to develop and design array-based capture chips targeting either the entire human exome or subsets of genes. The captured sub-regions of the genome are then eluted and used to construct Roche/454 sequencing libraries; sequencing libraries for the other second-generation platforms are under development.
Large Region Variant Discovery
The BCM-HGSC has pursued two paths for discovery of sequence and structural variants across defined regions of the human genome. As a part of the International HapMap Project, the BCM-HGSC generated sequence from a pool of 16 human DNAs to identify common single base variants that could then be used for subsequent deep genotyping across the initial four ethnic populations. A similar project was carried out using material from individual flow-sorted chromosomes. These data were used to develop our computational SNP identification methods.
The efficiency of this general approach led to its application in other species. For example, within the bovine Hereford genome project, we have sampled more than 300,000 sequence reads from six additional bovine breeds. Using the methods that we developed in characterizing human data, we found an average of one SNP every 1.2 kb, with a remarkably high conversion rate of nearly 95% when tested independently. A similar approach in the honeybee genome project was used to generate variant data from a strain of Africanized bees. Ten thousand markers discovered there are now being used to map the genes responsible for traits associated with aggression.
The second path involved the generation of overlapping PCR amplicons across ten defined 500 kb regions (as defined by the first phase of the ENCODE project) from 48 individuals to ascertain all variants across these intervals. This large project allowed the development of many of the protocols and informatics tools that we are using today.