|Comparing vertebrate whole-genome shotgun reads to the human genome.
|Year of Publication
|Chen, R, Bouck, JB, Weinstock, GM, Gibbs, RA
|Animals, Base Sequence, Computational Biology, Conserved Sequence, Databases, Genetic, Expressed Sequence Tags, Genome, Genome, Human, Heterochromatin, Humans, Mice, Molecular Sequence Data, Rats, Sequence Analysis, DNA, Sequence Homology, Nucleic Acid, Species Specificity, Transcription, Genetic
Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time-consuming, many genome sequences are likely to stay incomplete. A challenge is to use these fragmented data for understanding the human genome. This paper describes methods for using cross-species whole-genome shotgun (WGS) sequence for genome annotation. About one-half million high-quality rat WGS reads (covering 7.5% of the rat genome) generated at the Baylor College of Medicine Human Genome Sequencing Center were compared with the human genome. Using computer-generated random reads as a negative control, a set of parameters was determined for reliable interpretation of BLAST search results. About 10% of the rat reads contain regions that are conserved in the human genomic sequence, and about one-third of these include known gene-coding regions. Mapping the conserved regions to human chromosomes showed a 23-fold enrichment for coding regions compared with noncoding regions. This approach can also be applied to other mammalian genomes for gene finding. These data predicted approximately 42,500 genes in the human genome, slightly more than reported previously.
|PubMed Central ID
|HG 02395 / HG / NHGRI NIH HHS / United States
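
The abstract mentions two calculations without giving details: choosing BLAST interpretation parameters from a negative control of random reads, and measuring enrichment of conserved hits in coding regions. The sketch below illustrates one plausible way to do each and is not the authors' actual procedure. The function names, the use of a single best-hit score per read, the false-positive-rate criterion, and the per-base density definition of enrichment are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def pick_score_cutoff(control_scores: Sequence[float],
                      max_false_positive_rate: float) -> float:
    """Return the lowest observed control score S such that the fraction of
    negative-control reads scoring strictly above S is at most the given rate.

    control_scores: best-hit score of each random (negative-control) read
    against the target genome. This input layout is hypothetical.
    """
    if not control_scores:
        raise ValueError("need at least one control score")
    if not 0.0 <= max_false_positive_rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    ranked = sorted(control_scores)
    n = len(ranked)
    for score in ranked:
        # Count control reads that would still pass a strict '>' cutoff.
        passing = n - bisect_right(ranked, score)
        if passing / n <= max_false_positive_rate:
            return score
    return ranked[-1]  # not reached: the maximum always leaves zero passing


def conserved_reads(read_scores: Dict[str, float], cutoff: float) -> List[str]:
    """Keep real reads whose best-hit score is strictly above the cutoff."""
    return sorted(name for name, s in read_scores.items() if s > cutoff)


def fold_enrichment(coding_hits: int, coding_bases: int,
                    noncoding_hits: int, noncoding_bases: int) -> float:
    """Ratio of hit density (hits per base) in coding vs. noncoding sequence.

    This density-ratio definition is an assumption, not taken from the paper.
    """
    if min(coding_bases, noncoding_bases) <= 0 or noncoding_hits <= 0:
        raise ValueError("bases and noncoding hits must be positive")
    return (coding_hits / coding_bases) / (noncoding_hits / noncoding_bases)
```

With such a cutoff, a real read counts as conserved only if it scores higher than all but the chosen fraction of random reads, which keeps the expected share of spurious matches bounded by that fraction.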