%0 Journal Article %J Nat Commun %D 2020 %T NEMF mutations that impair ribosome-associated quality control are associated with neuromuscular disease. %A Martin, Paige B %A Kigoshi-Tansho, Yu %A Sher, Roger B %A Ravenscroft, Gianina %A Stauffer, Jennifer E %A Kumar, Rajesh %A Yonashiro, Ryo %A Müller, Tina %A Griffith, Christopher %A Allen, William %A Pehlivan, Davut %A Harel, Tamar %A Zenker, Martin %A Howting, Denise %A Schanze, Denny %A Faqeih, Eissa A %A Almontashiri, Naif A M %A Maroofian, Reza %A Houlden, Henry %A Mazaheri, Neda %A Galehdari, Hamid %A Douglas, Ganka %A Posey, Jennifer E %A Ryan, Monique %A James R Lupski %A Laing, Nigel G %A Joazeiro, Claudio A P %A Cox, Gregory A %K Amino Acid Sequence %K Animals %K Female %K Humans %K Male %K Mice %K Mice, Knockout %K Mutation %K Neuromuscular Diseases %K Proteolysis %K Ribosomes %K RNA-Binding Proteins %K Saccharomyces cerevisiae %K Saccharomyces cerevisiae Proteins %K Sequence Alignment %X

A hallmark of neurodegeneration is defective protein quality control. The E3 ligase Listerin (LTN1/Ltn1) acts in a specialized protein quality control pathway, Ribosome-associated Quality Control (RQC), by mediating proteolytic targeting of incomplete polypeptides produced by ribosome stalling, and Ltn1 mutation leads to neurodegeneration in mice. Whether neurodegeneration results from defective RQC and whether defective RQC contributes to human disease have remained unknown. Here we show that three independently generated mouse models with mutations in a different component of the RQC complex, NEMF/Rqc2, develop progressive motor neuron degeneration. Equivalent mutations in yeast Rqc2 selectively interfere with its ability to modify aberrant translation products with C-terminal tails which assist with RQC-mediated protein degradation, suggesting a pathomechanism. Finally, we identify NEMF mutations expected to interfere with function in patients from seven families presenting juvenile neuromuscular disease. These findings uncover NEMF's role in translational homeostasis in the nervous system and implicate RQC dysfunction in causing neurodegeneration.

%B Nat Commun %V 11 %P 4625 %8 2020 Sep 15 %G eng %N 1 %1 https://www.ncbi.nlm.nih.gov/pubmed/32934225?dopt=Abstract %R 10.1038/s41467-020-18327-6 %0 Journal Article %J J Evol Biol %D 2016 %T Comparative genomic study of arachnid immune systems indicates loss of beta-1,3-glucanase-related proteins and the immune deficiency pathway. %A Bechsgaard, J %A Vanthournout, B %A Funch, P %A Vestbo, S %A Gibbs, R A %A Richards, S %A Sanggaard, K W %A Enghild, J J %A Bilde, T %K Amino Acid Sequence %K Animals %K Antimicrobial Cationic Peptides %K Arachnida %K Blood Proteins %K Defensins %K Gene Dosage %K Genome %K Genomics %K Hemolymph %K Immune System %K Immunity, Innate %K Protein Domains %K Sequence Alignment %K Signal Transduction %X

Analyses of arthropod genomes have shown that the genes in the different innate humoral immune responses are conserved. These genes encode proteins that are involved in immune signalling pathways that recognize pathogens and activate immune responses. These immune responses include phagocytosis, encapsulation of the pathogen and production of effector molecules for pathogen elimination. So far, most studies have focused on insects, leaving other major arthropod groups largely unexplored. Here, we annotate the immune-related genes of six arachnid genomes and present evidence for a conserved pattern of some immune genes, but also evolutionary changes in the arachnid immune system. Specifically, our results suggest that the family of recognition molecules of beta-1,3-glucanase-related proteins (βGRPs) and the genes from the immune deficiency (IMD) signalling pathway have been lost in a common ancestor of arachnids. These findings are consistent with previous work suggesting that the humoral immune effector proteins are constitutively produced in arachnids in contrast to insects, where these have to be induced. Further functional studies are needed to verify this. We further show that the full haemolymph clotting cascade found in the horseshoe crab is retrieved in most arachnid genomes. Tetranychus lacks at least one major component, although it is possible that this cascade could still function through recruitment of a different protein. The gel-forming protein in horseshoe crabs, coagulogen, was not recovered in any of the arachnid genomes; however, it is possible that the arachnid clot consists of a related protein, spätzle, that is present in all of the genomes.

%B J Evol Biol %V 29 %P 277-91 %8 2016 Feb %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/26528622?dopt=Abstract %R 10.1111/jeb.12780 %0 Journal Article %J Hum Mutat %D 2016 %T Mechanisms for the Generation of Two Quadruplications Associated with Split-Hand Malformation. %A Gu, Shen %A Posey, Jennifer E %A Yuan, Bo %A Carvalho, Claudia M B %A Luk, H M %A Erikson, Kelly %A Lo, Ivan F M %A Leung, Gordon K C %A Pickering, Curtis R %A Chung, Brian H Y %A Lupski, James R %K 14-3-3 Proteins %K Adult %K Aged %K Alu Elements %K Base Sequence %K Basic Helix-Loop-Helix Transcription Factors %K Chromosome Duplication %K Chromosomes, Human, Pair 17 %K DNA Copy Number Variations %K Female %K Genetic Loci %K Genome, Human %K Hand Deformities, Congenital %K Humans %K Infant %K Male %K Molecular Sequence Data %K Pedigree %K Sequence Alignment %K Sequence Analysis, DNA %X

Germline copy-number variants (CNVs) involving quadruplications are rare and the mechanisms generating them are largely unknown. Previously, we reported a 20-week gestation fetus with split-hand malformation; clinical microarray detected two maternally inherited triplications separated by a copy-number neutral region at 17p13.3, involving BHLHA9 and part of YWHAE. Here, we describe an 18-month-old male sibling of the previously described fetus with split-hand malformation. Custom high-density microarray and digital droplet PCR revealed the copy-number gains were actually quadruplications in the mother, the fetus, and her later born son. This quadruplication-normal-quadruplication pattern was shown to be expanded from the triplication-normal-triplication CNV at the same loci in the maternal grandmother. We mapped two breakpoint junctions and demonstrated that both are mediated by Alu repetitive elements and identical in these four individuals. We propose a three-step process combining Alu-mediated replicative-repair-based mechanism(s) and intergenerational, intrachromosomal nonallelic homologous recombination to generate the quadruplications in this family.

%B Hum Mutat %V 37 %P 160-4 %8 2016 Feb %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/26549411?dopt=Abstract %R 10.1002/humu.22929 %0 Journal Article %J Am J Hum Genet %D 2015 %T De Novo GMNN Mutations Cause Autosomal-Dominant Primordial Dwarfism Associated with Meier-Gorlin Syndrome. %A Burrage, Lindsay C %A Charng, Wu-Lin %A Eldomery, Mohammad K %A Willer, Jason R %A Davis, Erica E %A Lugtenberg, Dorien %A Zhu, Wenmiao %A Leduc, Magalie S %A Akdemir, Zeynep C %A Azamian, Mahshid %A Zapata, Gladys %A Hernandez, Patricia P %A Schoots, Jeroen %A de Munnik, Sonja A %A Roepman, Ronald %A Pearring, Jillian N %A Jhangiani, Shalini %A Katsanis, Nicholas %A Vissers, Lisenka E L M %A Brunner, Han G %A Beaudet, Arthur L %A Rosenfeld, Jill A %A Muzny, Donna M %A Gibbs, Richard A %A Eng, Christine M %A Xia, Fan %A Lalani, Seema R %A Lupski, James R %A Bongers, Ernie M H F %A Yang, Yaping %K Adolescent %K Amino Acid Sequence %K Base Sequence %K Cell Cycle %K Child, Preschool %K Congenital Microtia %K Dwarfism %K Exons %K Female %K Geminin %K Gene Expression %K Genes, Dominant %K Growth Disorders %K Heterozygote %K High-Throughput Nucleotide Sequencing %K Humans %K Inheritance Patterns %K Male %K Micrognathism %K Molecular Sequence Data %K Mutation %K Patella %K Pedigree %K Protein Stability %K Proteolysis %K RNA Splicing %K Sequence Alignment %X

Meier-Gorlin syndrome (MGS) is a genetically heterogeneous primordial dwarfism syndrome known to be caused by biallelic loss-of-function mutations in one of five genes encoding pre-replication complex proteins: ORC1, ORC4, ORC6, CDT1, and CDC6. Mutations in these genes cause disruption of the origin of DNA replication initiation. To date, only an autosomal-recessive inheritance pattern has been described in individuals with this disorder, with a molecular etiology established in about three-fourths of cases. Here, we report three subjects with MGS and de novo heterozygous mutations in the 5' end of GMNN, encoding the DNA replication inhibitor geminin. We identified two truncating mutations in exon 2 (the first coding exon), c.16A>T (p.Lys6*) and c.35_38delTCAA (p.Ile12Lysfs*4), and one missense mutation, c.50A>G (p.Lys17Arg), affecting the second-to-last nucleotide of exon 2 and possibly RNA splicing. Geminin is present during the S, G2, and M phases of the cell cycle and is degraded during the metaphase-anaphase transition by the anaphase-promoting complex (APC), which recognizes the destruction box sequence near the 5' end of the geminin protein. All three GMNN mutations identified alter sites 5' to residue Met28 of the protein, which is located within the destruction box. We present data supporting a gain-of-function mechanism, in which the GMNN mutations result in proteins lacking the destruction box and hence increased protein stability and prolonged inhibition of replication leading to autosomal-dominant MGS.

%B Am J Hum Genet %V 97 %P 904-13 %8 2015 Dec 03 %G eng %N 6 %1 https://www.ncbi.nlm.nih.gov/pubmed/26637980?dopt=Abstract %R 10.1016/j.ajhg.2015.11.006 %0 Journal Article %J Eur J Hum Genet %D 2015 %T Mutations in COL27A1 cause Steel syndrome and suggest a founder mutation effect in the Puerto Rican population. %A Gonzaga-Jauregui, Claudia %A Gamble, Candace N %A Yuan, Bo %A Penney, Samantha %A Jhangiani, Shalini %A Muzny, Donna M %A Gibbs, Richard A %A Lupski, James R %A Hecht, Jacqueline T %K Amino Acid Sequence %K Child, Preschool %K Comparative Genomic Hybridization %K Exome %K Female %K Fibrillar Collagens %K Follow-Up Studies %K Founder Effect %K Genotype %K Hispanic or Latino %K Humans %K Infant %K Male %K Molecular Sequence Data %K Mutation %K Osteochondrodysplasias %K Pedigree %K Polymorphism, Single Nucleotide %K Prostaglandins F %K Puerto Rico %K Sequence Alignment %X

Osteochondrodysplasias represent a large group of developmental structural disorders that can be caused by mutations in a variety of genes responsible for chondrocyte development, differentiation, mineralization and early ossification. The application of whole-exome sequencing to disorders apparently segregating as Mendelian traits has proven to be an effective approach to disease gene identification for conditions with unknown molecular etiology. We identified a homozygous missense variant p.(Gly697Arg) in COL27A1, in a family with Steel syndrome and no consanguinity. Interestingly, the identified variant seems to have arisen as a founder mutation in the Puerto Rican population.

%B Eur J Hum Genet %V 23 %P 342-6 %8 2015 Mar %G eng %N 3 %1 https://www.ncbi.nlm.nih.gov/pubmed/24986830?dopt=Abstract %R 10.1038/ejhg.2014.107 %0 Journal Article %J Genome Biol %D 2015 %T Teaser: Individualized benchmarking and optimization of read mapping results for NGS data. %A Smolka, Moritz %A Rescheneder, Philipp %A Schatz, Michael C %A von Haeseler, Arndt %A Fritz J Sedlazeck %K Animals %K Benchmarking %K Genomics %K High-Throughput Nucleotide Sequencing %K Perciformes %K Sequence Alignment %K Software %X

Mapping reads to a genome remains challenging, especially for non-model organisms with lower quality assemblies, or for organisms with higher mutation rates. While most research has focused on speeding up the mapping process, little attention has been paid to optimizing the choice of mapper and parameters for a user's dataset. Here, we present Teaser, software that assists in these choices through rapid automated benchmarking of different mappers and parameter settings for individualized data. Within minutes, Teaser completes a quantitative evaluation of an ensemble of mapping algorithms and parameters. We use Teaser to demonstrate how Bowtie2 can be optimized for different data.

%B Genome Biol %V 16 %P 235 %8 2015 Oct 22 %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/26494581?dopt=Abstract %R 10.1186/s13059-015-0803-1 %0 Journal Article %J Nucleic Acids Res %D 2015 %T Tissue-specific transcriptome sequencing analysis expands the non-human primate reference transcriptome resource (NHPRTR). %A Peng, Xinxia %A Thierry-Mieg, Jean %A Thierry-Mieg, Danielle %A Nishida, Andrew %A Pipes, Lenore %A Bozinoski, Marjan %A Thomas, Matthew J %A Kelly, Sara %A Weiss, Jeffrey M %A Raveendran, Muthuswamy %A Muzny, Donna %A Gibbs, Richard A %A Rogers, Jeffrey %A Schroth, Gary P %A Katze, Michael G %A Mason, Christopher E %K Animals %K Databases, Genetic %K Gene Expression Profiling %K Internet %K Macaca %K Molecular Sequence Annotation %K Organ Specificity %K Primates %K Reference Standards %K Sequence Alignment %K Sequence Analysis, RNA %X

The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ~20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ~15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models, and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis.

%B Nucleic Acids Res %V 43 %P D737-42 %8 2015 Jan %G eng %N Database issue %1 https://www.ncbi.nlm.nih.gov/pubmed/25392405?dopt=Abstract %R 10.1093/nar/gku1110 %0 Journal Article %J Mol Phylogenet Evol %D 2014 %T Untangling the influences of unmodeled evolutionary processes on phylogenetic signal in a forensically important HIV-1 transmission cluster. %A Doyle, Vinson P %A Andersen, John J %A Nelson, Bradley J %A Metzker, Michael L %A Brown, Jeremy M %K Bayes Theorem %K env Gene Products, Human Immunodeficiency Virus %K Evolution, Molecular %K HIV Infections %K HIV-1 %K Humans %K Likelihood Functions %K Markov Chains %K Models, Genetic %K Phylogeny %K Selection, Genetic %K Sequence Alignment %K Sequence Analysis, DNA %X

Stochastic models of sequence evolution have been developed to reflect many biologically important processes, allowing for accurate phylogenetic reconstruction when an appropriate model is selected. However, commonly used models do not incorporate several potentially important biological processes. Spurious phylogenetic inference may result if these processes play an important role in the evolution of a dataset yet are not incorporated into assumed models. Few studies have attempted to assess the relative importance of multiple processes in producing spurious inferences. The application of phylogenetic methods to infer the source of HIV-1 transmission clusters depends upon accurate phylogenetic results, yet there are several relevant unmodeled biological processes (e.g., recombination and convergence) that may cause complications. Here, through analyses of HIV-1 env sequences from a small, forensically important transmission cluster, we tease apart the impact of these processes and present evidence suggesting that convergent evolution and high rates of insertions and deletions (causing alignment uncertainty) led to spurious phylogenetic signal with forensic relevance. Previous analyses show paraphyly of HIV-1 lineages sampled from an individual who, based on non-phylogenetic evidence, had never acted as a source of infection for others in this transmission cluster. If true, this pattern calls into question assumptions underlying phylogenetic approaches to source and recipient identification. By systematically assessing the contribution of different unmodeled processes, we demonstrate that removal of sites likely influenced by strong positive selection both reduces the alignment-wide signal supporting paraphyly of viruses sampled from this individual and eliminates support for the effects of recombination. Additionally, the removal of ambiguously aligned sites alters strongly supported relationships among viruses sampled from different individuals. These observations highlight the need to jointly consider multiple unmodeled evolutionary processes and motivate a phylogenomic perspective when inferring viral transmission histories.

%B Mol Phylogenet Evol %V 75 %P 126-37 %8 2014 Jun %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/24589520?dopt=Abstract %R 10.1016/j.ympev.2014.02.022 %0 Journal Article %J Bioinformatics %D 2013 %T NextGenMap: fast and accurate read mapping in highly polymorphic genomes. %A Fritz J Sedlazeck %A Rescheneder, Philipp %A von Haeseler, Arndt %K Genome %K Genomics %K High-Throughput Nucleotide Sequencing %K Polymorphism, Genetic %K Sequence Alignment %K Software %X

SUMMARY: When choosing a read mapper, one faces the trade-off between speed and the ability to map reads in highly polymorphic regions. Here, we report NextGenMap, a fast and accurate read mapper, which reduces this dilemma. NextGenMap aligns reads reliably to a reference genome even when the sequence difference between target and reference genome is large, i.e. for a highly polymorphic genome. At the same time, NextGenMap outperforms current mapping methods with respect to runtime and to the number of correctly mapped reads. NextGenMap efficiently uses the available hardware by exploiting multi-core CPUs as well as graphics cards (GPUs), if available. In addition, NextGenMap automatically handles any read data independent of read length and sequencing technology.

AVAILABILITY: NextGenMap source code and documentation are available at: http://cibiv.github.io/NextGenMap/.

CONTACT: fritz.sedlazeck@univie.ac.at.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

%B Bioinformatics %V 29 %P 2790-1 %8 2013 Nov 01 %G eng %N 21 %1 https://www.ncbi.nlm.nih.gov/pubmed/23975764?dopt=Abstract %R 10.1093/bioinformatics/btt468 %0 Journal Article %J PLoS One %D 2012 %T Advanced methylome analysis after bisulfite deep sequencing: an example in Arabidopsis. %A Dinh, Huy Q %A Dubin, Manu %A Fritz J Sedlazeck %A Lettner, Nicole %A Mittelsten Scheid, Ortrun %A von Haeseler, Arndt %K Arabidopsis %K Base Sequence %K Cytosine %K DNA Methylation %K Genome, Plant %K High-Throughput Nucleotide Sequencing %K Reference Standards %K Reproducibility of Results %K Sequence Alignment %K Sequence Analysis, DNA %K Sulfites %X

Deep sequencing after bisulfite conversion (BS-Seq) is the method of choice to generate whole genome maps of cytosine methylation at single base-pair resolution. Its application to genomic DNA of Arabidopsis flower bud tissue resulted in the first complete methylome, determining a methylation rate of 6.7% in this tissue. BS-Seq reads were mapped onto an in silico converted reference genome, applying the so-called 3-letter genome method. Here, we present BiSS (Bisulfite Sequencing Scorer), a new method applying Smith-Waterman alignment to map bisulfite-converted reads to a reference genome. In addition, we introduce a comprehensive adaptive error estimate that accounts for sequencing errors, erroneous bisulfite conversion and also wrongly mapped reads. The re-analysis of the Arabidopsis methylome data with BiSS mapped substantially more reads to the genome. As a result, it determines the methylation status of an extra 10% of cytosines and estimates the methylation rate to be 7.7%. We validated the results by individual traditional bisulfite sequencing for selected genomic regions. In addition to predicting the methylation status of each cytosine, BiSS also provides an estimate of the methylation degree at each genomic site. Thus, BiSS explores BS-Seq data more extensively and provides more information for downstream analysis.

%B PLoS One %V 7 %P e41528 %8 2012 %G eng %N 7 %1 https://www.ncbi.nlm.nih.gov/pubmed/22911809?dopt=Abstract %R 10.1371/journal.pone.0041528 %0 Journal Article %J J Virol %D 2012 %T Biological characterization and next-generation genome sequencing of the unclassified Cotia virus SPAn232 (Poxviridae). %A Afonso, Priscila P %A Silva, Patrícia M %A Schnellrath, Laila C %A Jesus, Desyreé M %A Hu, Jianhong %A Yang, Yajie %A Renne, Rolf %A Attias, Marcia %A Condit, Richard C %A Moussatché, Nissin %A Damaso, Clarissa R %K Amino Acid Sequence %K Animals %K Chick Embryo %K Chlorocebus aethiops %K Cross Reactions %K Cytopathogenic Effect, Viral %K Genes, Viral %K Genome, Viral %K High-Throughput Nucleotide Sequencing %K Humans %K Macaca mulatta %K Mice %K Molecular Sequence Data %K Neutralization Tests %K Phylogeny %K Poxviridae %K Rabbits %K Rats %K Sequence Alignment %K Swine %K Viral Tropism %K Virus Replication %X

Cotia virus (COTV) SPAn232 was isolated in 1961 from sentinel mice at Cotia field station, São Paulo, Brazil. Attempts to classify COTV within a recognized genus of the Poxviridae have generated contradictory findings. Studies by different researchers suggested some similarity to myxoma virus and swinepox virus, whereas another investigation characterized COTV SPAn232 as a vaccinia virus strain. Because of the lack of consensus, we have conducted an independent biological and molecular characterization of COTV. Virus growth curves reached maximum yields at approximately 24 to 48 h and were accompanied by virus DNA replication and a characteristic early/late pattern of viral protein synthesis. Interestingly, COTV did not induce detectable cytopathic effects in BSC-40 cells until 4 days postinfection and generated viral plaques only after 8 days. We determined the complete genomic sequence of COTV by using a combination of the next-generation DNA sequencing technologies 454 and Illumina. A unique contiguous sequence of 185,139 bp containing 185 genes, including the 90 genes conserved in all chordopoxviruses, was obtained. COTV has an interesting panel of open reading frames (ORFs) related to the evasion of host defense, including two novel genes encoding C-C chemokine-like proteins, each present in duplicate copies. Phylogenetic analysis revealed the highest amino acid identity scores with Cervidpoxvirus, Capripoxvirus, Suipoxvirus, Leporipoxvirus, and Yatapoxvirus. However, COTV grouped as an independent branch within this clade, which clearly excluded its classification as an Orthopoxvirus. Therefore, our data suggest that COTV could represent a new poxvirus genus.

%B J Virol %V 86 %P 5039-54 %8 2012 May %G eng %N 9 %1 https://www.ncbi.nlm.nih.gov/pubmed/22345477?dopt=Abstract %R 10.1128/JVI.07162-11 %0 Journal Article %J Genome Biol %D 2011 %T Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development. %A Renfree, Marilyn B %A Papenfuss, Anthony T %A Deakin, Janine E %A Lindsay, James %A Heider, Thomas %A Belov, Katherine %A Rens, Willem %A Waters, Paul D %A Pharo, Elizabeth A %A Shaw, Geoff %A Wong, Emily S W %A Lefèvre, Christophe M %A Nicholas, Kevin R %A Kuroki, Yoko %A Wakefield, Matthew J %A Zenger, Kyall R %A Wang, Chenwei %A Ferguson-Smith, Malcolm %A Nicholas, Frank W %A Hickford, Danielle %A Yu, Hongshi %A Short, Kirsty R %A Siddle, Hannah V %A Frankenberg, Stephen R %A Chew, Keng Yih %A Menzies, Brandon R %A Stringer, Jessica M %A Suzuki, Shunsuke %A Hore, Timothy A %A Delbridge, Margaret L %A Patel, Hardip R %A Mohammadi, Amir %A Schneider, Nanette Y %A Hu, Yanqiu %A O'Hara, William %A Al Nadaf, Shafagh %A Wu, Chen %A Feng, Zhi-Ping %A Cocks, Benjamin G %A Wang, Jianghui %A Flicek, Paul %A Searle, Stephen M J %A Fairley, Susan %A Beal, Kathryn %A Herrero, Javier %A Carone, Dawn M %A Suzuki, Yutaka %A Sugano, Sumio %A Toyoda, Atsushi %A Sakaki, Yoshiyuki %A Kondo, Shinji %A Nishida, Yuichiro %A Tatsumoto, Shoji %A Mandiou, Ion %A Hsu, Arthur %A McColl, Kaighin A %A Lansdell, Benjamin %A Weinstock, George %A Kuczek, Elizabeth %A McGrath, Annette %A Wilson, Peter %A Men, Artem %A Hazar-Rethinam, Mehlika %A Hall, Allison %A Davis, John %A Wood, David %A Williams, Sarah %A Sundaravadanam, Yogi %A Muzny, Donna M %A Jhangiani, Shalini N %A Lewis, Lora R %A Morgan, Margaret B %A Okwuonu, Geoffrey O %A Ruiz, San Juana %A Santibanez, Jireh %A Nazareth, Lynne %A Cree, Andrew %A Fowler, Gerald %A Kovar, Christie L %A Dinh, Huyen H %A Joshi, Vandita %A Jing, Chyn %A Lara, Fremiet %A Thornton, Rebecca %A Chen, Lei %A Deng, Jixin %A Liu, Yue %A Shen, Joshua Y %A Song, Xing-Zhi %A Edson, Janette %A Troon, Carmen %A Thomas, Daniel %A Stephens, Amber %A Yapa, Lankesha %A Levchenko, Tanya %A Gibbs, Richard A %A Cooper, Desmond W %A Speed, Terence P %A Fujiyama, Asao %A Graves, Jennifer A M %A O'Neill, Rachel J %A Pask, Andrew J %A Forrest, Susan M %A Worley, Kim C %K Animals %K Australia %K Biological Evolution %K Chromosome Mapping %K Chromosomes, Mammalian %K Female %K Gene Expression Regulation %K Genome %K Genomic Imprinting %K In Situ Hybridization, Fluorescence %K Macropodidae %K MicroRNAs %K Molecular Sequence Data %K Reproduction %K Sequence Alignment %K Sequence Analysis, DNA %K Transcriptome %X

BACKGROUND: We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development.

RESULTS: The genome has been sequenced to 2× coverage using Sanger sequencing, enhanced with additional next-generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements.

CONCLUSIONS: Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.

%B Genome Biol %V 12 %P R81 %8 2011 Aug 29 %G eng %N 8 %1 https://www.ncbi.nlm.nih.gov/pubmed/21854559?dopt=Abstract %R 10.1186/gb-2011-12-8-r81 %0 Journal Article %J Nature %D 2011 %T A high-resolution map of human evolutionary constraint using 29 mammals. %A Lindblad-Toh, Kerstin %A Garber, Manuel %A Zuk, Or %A Lin, Michael F %A Parker, Brian J %A Washietl, Stefan %A Kheradpour, Pouya %A Ernst, Jason %A Jordan, Gregory %A Mauceli, Evan %A Ward, Lucas D %A Lowe, Craig B %A Holloway, Alisha K %A Clamp, Michele %A Gnerre, Sante %A Alföldi, Jessica %A Beal, Kathryn %A Chang, Jean %A Clawson, Hiram %A Cuff, James %A Di Palma, Federica %A Fitzgerald, Stephen %A Flicek, Paul %A Guttman, Mitchell %A Hubisz, Melissa J %A Jaffe, David B %A Jungreis, Irwin %A Kent, W James %A Kostka, Dennis %A Lara, Marcia %A Martins, André L %A Massingham, Tim %A Moltke, Ida %A Raney, Brian J %A Rasmussen, Matthew D %A Robinson, Jim %A Stark, Alexander %A Vilella, Albert J %A Wen, Jiayu %A Xie, Xiaohui %A Zody, Michael C %A Baldwin, Jen %A Bloom, Toby %A Chin, Chee Whye %A Heiman, Dave %A Nicol, Robert %A Nusbaum, Chad %A Young, Sarah %A Wilkinson, Jane %A Worley, Kim C %A Kovar, Christie L %A Muzny, Donna M %A Gibbs, Richard A %A Cree, Andrew %A Dihn, Huyen H %A Fowler, Gerald %A Jhangiani, Shalili %A Joshi, Vandita %A Lee, Sandra %A Lewis, Lora R %A Nazareth, Lynne V %A Okwuonu, Geoffrey %A Santibanez, Jireh %A Warren, Wesley C %A Mardis, Elaine R %A Weinstock, George M %A Wilson, Richard K %A Delehaunty, Kim %A Dooling, David %A Fronik, Catrina %A Fulton, Lucinda %A Fulton, Bob %A Graves, Tina %A Minx, Patrick %A Sodergren, Erica %A Birney, Ewan %A Margulies, Elliott H %A Herrero, Javier %A Green, Eric D %A Haussler, David %A Siepel, Adam %A Goldman, Nick %A Pollard, Katherine S %A Pedersen, Jakob S %A Lander, Eric S %A Kellis, Manolis %K Animals %K Disease %K Evolution, Molecular %K Exons %K Genome %K Genome, Human %K Genomics %K Health %K Humans %K Mammals %K Molecular Sequence Annotation %K Phylogeny %K RNA %K Selection, Genetic %K Sequence Alignment %K Sequence Analysis, DNA %X

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

%B Nature %V 478 %P 476-82 %8 2011 Oct 12 %G eng %N 7370 %1 https://www.ncbi.nlm.nih.gov/pubmed/21993624?dopt=Abstract %R 10.1038/nature10530 %0 Journal Article %J Nature %D 2010 %T A map of human genome variation from population-scale sequencing. %A Abecasis, Gonçalo R %A Altshuler, David %A Auton, Adam %A Brooks, Lisa D %A Durbin, Richard M %A Gibbs, Richard A %A Hurles, Matt E %A McVean, Gil A %K Calibration %K Chromosomes, Human, Y %K Computational Biology %K DNA Mutational Analysis %K DNA, Mitochondrial %K Evolution, Molecular %K Female %K Genetic Association Studies %K Genetic Variation %K Genetics, Population %K Genome, Human %K Genome-Wide Association Study %K Genomics %K Genotype %K Haplotypes %K Humans %K Male %K Mutation %K Pilot Projects %K Polymorphism, Single Nucleotide %K Recombination, Genetic %K Sample Size %K Selection, Genetic %K Sequence Alignment %K Sequence Analysis, DNA %X

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

%B Nature %V 467 %P 1061-73 %8 2010 Oct 28 %G eng %N 7319 %1 https://www.ncbi.nlm.nih.gov/pubmed/20981092?dopt=Abstract %R 10.1038/nature09534 %0 Journal Article %J Genome Biol %D 2010 %T Whole exome capture in solution with 3 Gbp of data. %A Bainbridge, Matthew N %A Wang, Min %A Burgess, Daniel L %A Kovar, Christie %A Rodesch, Matthew J %A D'Ascenzo, Mark %A Kitzman, Jacob %A Wu, Yuan-Qing %A Newsham, Irene %A Richmond, Todd A %A Jeddeloh, Jeffrey A %A Muzny, Donna %A Albert, Thomas J %A Gibbs, Richard A %K Base Pairing %K Databases, Nucleic Acid %K Exons %K Gene Library %K Haplotypes %K Humans %K Polymorphism, Single Nucleotide %K Reproducibility of Results %K Sequence Alignment %K Sequence Analysis, DNA %K Solutions %X

We have developed a solution-based method for targeted DNA capture-sequencing that is directed to the complete human exome. This approach allows the discovery of greater than 95% of all expected heterozygous single-base variants, requires as little as 3 Gbp of raw sequence data and constitutes an effective tool for identifying rare coding alleles in large-scale genomic studies.

%B Genome Biol %V 11 %P R62 %8 2010 %G eng %N 6 %1 https://www.ncbi.nlm.nih.gov/pubmed/20565776?dopt=Abstract %R 10.1186/gb-2010-11-6-r62 %0 Journal Article %J Nature %D 2008 %T The complete genome of an individual by massively parallel DNA sequencing. %A Wheeler, David A %A Srinivasan, Maithreyan %A Egholm, Michael %A Shen, Yufeng %A Chen, Lei %A McGuire, Amy %A He, Wen %A Chen, Yi-Ju %A Makhijani, Vinod %A Roth, G Thomas %A Gomes, Xavier %A Tartaro, Karrie %A Niazi, Faheem %A Turcotte, Cynthia L %A Irzyk, Gerard P %A Lupski, James R %A Chinault, Craig %A Song, Xing-Zhi %A Liu, Yue %A Yuan, Ye %A Nazareth, Lynne %A Qin, Xiang %A Muzny, Donna M %A Margulies, Marcel %A Weinstock, George M %A Gibbs, Richard A %A Rothberg, Jonathan M %K Alleles %K Computational Biology %K Genetic Predisposition to Disease %K Genetic Variation %K Genome, Human %K Genomics %K Genotype %K Humans %K Individuality %K Male %K Oligonucleotide Array Sequence Analysis %K Polymorphism, Single Nucleotide %K Reproducibility of Results %K Sensitivity and Specificity %K Sequence Alignment %K Sequence Analysis, DNA %K Software %X

The association of genetic variation with disease and drug response, and improvements in nucleic acid technologies, have given great optimism for the impact of 'genomic medicine'. However, the formidable size of the diploid human genome, approximately 6 gigabases, has prevented the routine application of sequencing methods to deciphering complete individual human genomes. To realize the full potential of genomics for human health, this limitation must be overcome. Here we report the DNA sequence of a diploid genome of a single individual, James D. Watson, sequenced to 7.4-fold redundancy using massively parallel sequencing in picolitre-size reaction vessels. This sequence was completed in two months at approximately one-hundredth of the cost of traditional capillary electrophoresis methods. Comparison of the sequence to the reference genome led to the identification of 3.3 million single nucleotide polymorphisms, of which 10,654 cause amino-acid substitution within the coding sequence. In addition, we accurately identified small-scale (2-40,000 base pair (bp)) insertion and deletion polymorphism as well as copy number variation resulting in the large-scale gain and loss of chromosomal segments ranging from 26,000 to 1.5 million base pairs. Overall, these results agree well with recent results of sequencing of a single individual by traditional methods. However, in addition to being faster and significantly less expensive, this sequencing technology avoids the arbitrary loss of genomic sequences inherent in random shotgun sequencing by bacterial cloning because it amplifies DNA in a cell-free system. As a result, we further demonstrate the acquisition of novel human sequence, including novel genes not previously identified by traditional genomic sequencing. This is the first genome sequenced by next-generation technologies. Therefore it is a pilot for the future challenges of 'personalized genome sequencing'.

%B Nature %V 452 %P 872-6 %8 2008 Apr 17 %G eng %N 7189 %1 https://www.ncbi.nlm.nih.gov/pubmed/18421352?dopt=Abstract %R 10.1038/nature06884 %0 Journal Article %J Pac Symp Biocomput %D 2008 %T Pash 2.0: scaleable sequence anchoring for next-generation sequencing technologies. %A Coarfa, Cristian %A Milosavljevic, Aleksandar %K Algorithms %K Animals %K Computational Biology %K Databases, Genetic %K Evolution, Molecular %K Genome, Human %K Humans %K Sensitivity and Specificity %K Sequence Alignment %K Software %X

Many applications of next-generation sequencing technologies involve anchoring of a sequence fragment or a tag onto a corresponding position on a reference genome assembly. The Positional Hashing method, implemented in the Pash 2.0 program, is specifically designed for the task of high-volume anchoring. In this article we present multi-diagonal gapped kmer collation and other changes introduced in Pash 2.0 that further improve the accuracy and speed of Positional Hashing. The goal of this article is to show that gapped kmer matching with cross-diagonal collation suffices for anchoring across close evolutionary distances and for the purpose of human resequencing. We propose a benchmark for evaluating the performance of anchoring programs that captures key parameters in specific applications, including duplicative structure of genomes of humans and other species. We demonstrate speedups of up to tenfold in large-scale anchoring experiments achieved by Pash 2.0 when compared to BLAT, another similarity search program frequently used for anchoring.

%B Pac Symp Biocomput %P 102-13 %8 2008 %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/18229679?dopt=Abstract %0 Journal Article %J BMC Microbiol %D 2008 %T PrimerSNP: a web tool for whole-genome selection of allele-specific and common primers of phylogenetically-related bacterial genomic sequences. %A Yao, Jiqiang %A Lin, Hong %A Van Deynze, Allen %A Doddapaneni, Harshavardhan %A Francis, Martha %A Lemos, Eliana Gertrudes Macedo %A Civerolo, Edwin L %K Alleles %K Bacteria %K Computational Biology %K DNA Primers %K Genome, Bacterial %K Internet %K Polymorphism, Single Nucleotide %K Sensitivity and Specificity %K Sequence Alignment %K Software %X

BACKGROUND: The increasing number of genomic sequences of bacteria makes it possible to select unique SNPs of a particular strain/species at the whole-genome level and thus design specific primers based on the SNPs. The high similarity of genomic sequences among phylogenetically-related bacteria requires the identification of the few loci in the genome that can serve as unique markers for strain differentiation. PrimerSNP attempts to identify reliable strain-specific markers, on which specific primers are designed for pathogen detection.

RESULTS: PrimerSNP is an online tool to design primers based on strain-specific SNPs for multiple strains/species of microorganisms at the whole-genome level. The allele-specific primers can distinguish query sequences of one strain from other homologous sequences by a standard PCR reaction. Additionally, PrimerSNP provides a feature for designing common primers that can amplify all the homologous sequences of multiple strains/species of microorganisms. PrimerSNP is freely available at http://cropdisease.ars.usda.gov/~primer.

CONCLUSION: PrimerSNP is a high-throughput specific primer generation tool for the differentiation of phylogenetically-related strains/species. Experimental validation showed that this software had a successful prediction rate of 80.4-100% for strain-specific primer design.

%B BMC Microbiol %V 8 %P 185 %8 2008 Oct 20 %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/18937861?dopt=Abstract %R 10.1186/1471-2180-8-185 %0 Journal Article %J Genome Res %D 2007 %T 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. %A Miller, Webb %A Rosenbloom, Kate %A Hardison, Ross C %A Hou, Minmei %A Taylor, James %A Raney, Brian %A Burhans, Richard %A King, David C %A Baertsch, Robert %A Blankenberg, Daniel %A Kosakovsky Pond, Sergei L %A Nekrutenko, Anton %A Giardine, Belinda %A Harris, Robert S %A Tyekucheva, Svitlana %A Diekhans, Mark %A Pringle, Thomas H %A Murphy, William J %A Lesk, Arthur %A Weinstock, George M %A Lindblad-Toh, Kerstin %A Gibbs, Richard A %A Lander, Eric S %A Siepel, Adam %A Haussler, David %A Kent, W James %K Animals %K Base Sequence %K Cats %K Cattle %K Codon, Initiator %K Codon, Terminator %K Conserved Sequence %K Databases, Genetic %K Dogs %K Genome, Human %K Guinea Pigs %K Humans %K Mice %K Molecular Sequence Data %K Mutagenesis, Insertional %K Rabbits %K Rats %K Sequence Alignment %K Sequence Deletion %X

This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.

%B Genome Res %V 17 %P 1797-808 %8 2007 Dec %G eng %N 12 %1 https://www.ncbi.nlm.nih.gov/pubmed/17984227?dopt=Abstract %R 10.1101/gr.6761107 %0 Journal Article %J Genome Res %D 2007 %T Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. %A Margulies, Elliott H %A Cooper, Gregory M %A Asimenos, George %A Thomas, Daryl J %A Dewey, Colin N %A Siepel, Adam %A Birney, Ewan %A Keefe, Damian %A Schwartz, Ariel S %A Hou, Minmei %A Taylor, James %A Nikolaev, Sergey %A Montoya-Burgos, Juan I %A Löytynoja, Ari %A Whelan, Simon %A Pardi, Fabio %A Massingham, Tim %A Brown, James B %A Bickel, Peter %A Holmes, Ian %A Mullikin, James C %A Ureta-Vidal, Abel %A Paten, Benedict %A Stone, Eric A %A Rosenbloom, Kate R %A Kent, W James %A Bouffard, Gerard G %A Guan, Xiaobin %A Hansen, Nancy F %A Idol, Jacquelyn R %A Maduro, Valerie V B %A Maskeri, Baishali %A McDowell, Jennifer C %A Park, Morgan %A Thomas, Pamela J %A Young, Alice C %A Blakesley, Robert W %A Muzny, Donna M %A Sodergren, Erica %A Wheeler, David A %A Worley, Kim C %A Jiang, Huaiyang %A Weinstock, George M %A Gibbs, Richard A %A Graves, Tina %A Fulton, Robert %A Mardis, Elaine R %A Wilson, Richard K %A Clamp, Michele %A Cuff, James %A Gnerre, Sante %A Jaffe, David B %A Chang, Jean L %A Lindblad-Toh, Kerstin %A Lander, Eric S %A Hinrichs, Angie %A Trumbower, Heather %A Clawson, Hiram %A Zweig, Ann %A Kuhn, Robert M %A Barber, Galt %A Harte, Rachel %A Karolchik, Donna %A Field, Matthew A %A Moore, Richard A %A Matthewson, Carrie A %A Schein, Jacqueline E %A Marra, Marco A %A Antonarakis, Stylianos E %A Batzoglou, Serafim %A Goldman, Nick %A Hardison, Ross %A Haussler, David %A Miller, Webb %A Pachter, Lior %A Green, Eric D %A Sidow, Arend %K Animals %K Evolution, Molecular %K Genome, Human %K Human Genome Project %K Humans %K Mammals %K Open Reading Frames %K Phylogeny %K Sequence Alignment %X

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.

%B Genome Res %V 17 %P 760-74 %8 2007 Jun %G eng %N 6 %1 https://www.ncbi.nlm.nih.gov/pubmed/17567995?dopt=Abstract %R 10.1101/gr.6034307 %0 Journal Article %J In Silico Biol %D 2007 %T nWayComp: a genome-wide sequence comparison tool for multiple strains/species of phylogenetically related microorganisms. %A Yao, Jiqiang %A Lin, Hong %A Doddapaneni, Harshavardhan %A Civerolo, Edwin L %K Computational Biology %K Databases, Genetic %K Genes, Bacterial %K Genome, Bacterial %K Multigene Family %K Phylogeny %K Sequence Alignment %K Sequence Homology %K Software %K Xanthomonas %X

The increasing number of whole genomic sequences of microorganisms has increased the complexity of genome-wide annotation and gene sequence comparison among multiple microorganisms. To address this problem, we have developed nWayComp software that compares DNA and protein sequences of phylogenetically-related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in descending order of sequence similarity. For each set of homologous sequences, a table of sequence identity among homologous genes along with sequence variations such as SNPs and INDELS is developed, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences is generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level.

%B In Silico Biol %V 7 %P 195-200 %8 2007 %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/17688445?dopt=Abstract %0 Journal Article %J Proc Natl Acad Sci U S A %D 2006 %T Recurrent duplication-driven transposition of DNA during hominoid evolution. %A Johnson, Matthew E %A Cheng, Ze %A Morrison, V Anne %A Scherer, Steven %A Ventura, Mario %A Gibbs, Richard A %A Green, Eric D %A Eichler, Evan E %K Animals %K Base Sequence %K Biological Evolution %K Chromosomes, Human, Pair 16 %K DNA %K DNA Transposable Elements %K Evolution, Molecular %K Gene Duplication %K Hominidae %K Humans %K Molecular Sequence Data %K Phylogeny %K Sequence Alignment %K Sequence Analysis, DNA %X

The underlying mechanism by which the interspersed pattern of human segmental duplications has evolved is unknown. Based on a comparative analysis of primate genomes, we show that a particular segmental duplication (LCR16a) has been the source locus for the formation of the majority of intrachromosomal duplication blocks on human chromosome 16. We provide evidence that this particular segment has been active independently in each great ape and human lineage at different points during evolution. Euchromatic sequences that flank sites of LCR16a integration are frequently lineage-specific duplications. This process has mobilized duplication blocks (15-200 kb in size) to new genomic locations in each species. Breakpoint analysis of lineage-specific insertions suggests coordinated deletion of repeat-rich DNA at the target site, in some cases deleting genes in that species. Our data support a model of duplication where the probability that a segment of DNA becomes duplicated is determined by its proximity to core duplicons, such as LCR16a.

%B Proc Natl Acad Sci U S A %V 103 %P 17626-31 %8 2006 Nov 21 %G eng %N 47 %1 https://www.ncbi.nlm.nih.gov/pubmed/17101969?dopt=Abstract %R 10.1073/pnas.0605426103 %0 Journal Article %J Proc Natl Acad Sci U S A %D 2006 %T Scale-invariant structure of strongly conserved sequence in genomic intersections and alignments. %A Salerno, William %A Havlak, Paul %A Miller, Jonathan %K Animals %K Base Sequence %K Cluster Analysis %K Conserved Sequence %K Genome %K Genomics %K Humans %K Mice %K Repetitive Sequences, Nucleic Acid %K Sequence Alignment %X

A power-law distribution of the length of perfectly conserved sequence from mouse/human whole-genome intersection and alignment is exhibited. Spatial correlations of these elements within the mouse genome are studied. It is argued that these power-law distributions and correlations are comprised in part by functional noncoding sequence and ought to be accounted for in estimating the statistical significance of apparent sequence conservation. These inter-genomic correlations of conservation are placed in the context of previously observed intra-genomic correlations, and their possible origins and consequences are discussed.

%B Proc Natl Acad Sci U S A %V 103 %P 13121-5 %8 2006 Aug 29 %G eng %N 35 %1 https://www.ncbi.nlm.nih.gov/pubmed/16924100?dopt=Abstract %R 10.1073/pnas.0605735103 %0 Journal Article %J Oncogene %D 2005 %T Pas1c1 is a candidate for the mouse pulmonary adenoma susceptibility 1 locus. %A Wang, Min %A Futamura, Manabu %A Wang, Yian %A You, Ming %K Adenoma %K Amino Acid Sequence %K Animals %K Base Sequence %K DNA Primers %K Exons %K Genetic Predisposition to Disease %K Lung Neoplasms %K Mice %K Molecular Sequence Data %K Sequence Alignment %K Sequence Homology, Amino Acid %K Tumor Suppressor Proteins %X

The Pas1 candidate 1 (Pas1c1) gene (also named Lmna-rs1) was found to encode two alternatively spliced mRNA transcripts (i.e. Pas1c1-Va and Pas1c1-Vb). In this study, we identified three additional mRNA transcripts encoded by the Pas1c1 gene, which were designated as Pas1c1-Vc, Pas1c1-Vd, and Pas1c1-Ve, respectively. Similar to Pas1c1-Vb, the newly identified transcripts were only expressed in mouse lung tissues from strains carrying the Pas1-susceptible (Pas1/s) allele. Pas1c1 transcripts were also detected in heart, testis, or brain but not in liver, spleen, or kidney. An 11-nucleotide polymorphism was found within the 3'-acceptor splice site of exon 8, which cosegregates with mouse strain Pas1 alleles and may underlie the strain-specific exon 8 skipping. We also found that ectopically expressed Pas1c1-Va and Pas1c1-Vb in COS7 and NIH3T3 cells exhibited distinct intracellular distributions. These results support Pas1c1 as a candidate for the Pas1 locus and suggest that the strain-specific isoforms may have differential effects on cell proliferation.

%B Oncogene %V 24 %P 1958-63 %8 2005 Mar 10 %G eng %N 11 %1 https://www.ncbi.nlm.nih.gov/pubmed/15688036?dopt=Abstract %R 10.1038/sj.onc.1208295 %0 Journal Article %J Genome Res %D 2005 %T Pooled genomic indexing of rhesus macaque. %A Milosavljevic, Aleksandar %A Harris, Ronald A %A Sodergren, Erica J %A Jackson, Andrew R %A Kalafus, Ken J %A Hodgson, Anne %A Cree, Andrew %A Dai, Weilie %A Csuros, Miklos %A Zhu, Baoli %A De Jong, Pieter J %A Weinstock, George M %A Gibbs, Richard A %K Animals %K Chromosome Aberrations %K Chromosomes, Artificial, Bacterial %K Contig Mapping %K DNA %K Genetic Markers %K Genome %K Genome, Human %K Humans %K Macaca mulatta %K Sequence Alignment %K Sequence Analysis, DNA %X

Pooled genomic indexing (PGI) is a method for mapping collections of bacterial artificial chromosome (BAC) clones between species by using a combination of clone pooling and DNA sequencing. PGI has been used to map a total of 3858 BAC clones covering approximately 24% of the rhesus macaque (Macaca mulatta) genome onto 4178 homologous loci in the human genome. A number of intrachromosomal rearrangements were detected by mapping multiple segments within the individual rhesus BACs onto multiple disjoined loci in the human genome. Transversal pooling designs involving shuffled BAC arrays were employed for robust mapping even with modest DNA sequence read coverage. A further innovation, short-tag pooled genomic indexing (ST-PGI), was also introduced to improve the economy of mapping by sequencing multiple, short, mappable tags within a single sequencing reaction.

%B Genome Res %V 15 %P 292-301 %8 2005 Feb %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/15687293?dopt=Abstract %R 10.1101/gr.3162505 %0 Journal Article %J Genome Res %D 2004 %T EAnnot: a genome annotation tool using experimental evidence. %A Ding, Li %A Sabo, Aniko %A Berkowicz, Nicolas %A Meyer, Rekha R %A Shotland, Yoram %A Johnson, Mark R %A Pepin, Kymberlie H %A Wilson, Richard K %A Spieth, John %K Algorithms %K Base Sequence %K Chromosomes, Human, Pair 6 %K Computational Biology %K Genome %K Genomics %K Humans %K Models, Genetic %K Sensitivity and Specificity %K Sequence Alignment %X

The sequence of any genome becomes most useful for biological experimentation when a complete and accurate gene set is available. Gene prediction programs offer an efficient way to generate an automated gene set. Manual annotation, when performed by experienced annotators, is more accurate and complete than automated annotation. However, it is a laborious and expensive process, and by its nature, introduces a degree of variability not found with automated annotation. EAnnot (Electronic Annotation) is a program originally developed for manually annotating the human genome. It combines the latest bioinformatics tools to extract and analyze a wide range of publicly available data in order to achieve fast and reliable automatic gene prediction and annotation. EAnnot builds gene models based on mRNA, EST, and protein alignments to genomic sequence, attaches supporting evidence to the corresponding genes, identifies pseudogenes, and locates poly(A) sites and signals. Here, we compare manual annotation of human chromosome 6 with annotation performed by EAnnot in order to assess the latter's accuracy. EAnnot can readily be applied to manual annotation of other eukaryotic genomes and can be used to rapidly obtain an automated gene set.

%B Genome Res %V 14 %P 2503-9 %8 2004 Dec %G eng %N 12 %1 https://www.ncbi.nlm.nih.gov/pubmed/15574829?dopt=Abstract %R 10.1101/gr.3152604 %0 Journal Article %J Genome Inform %D 2003 %T Clone-array pooled shotgun mapping and sequencing: design and analysis of experiments. %A Csuros, Miklos %A Li, Bingshan %A Milosavljevic, Aleksandar %K Animals %K Chromosomes, Artificial, Bacterial %K Computational Biology %K Computer Simulation %K Drosophila melanogaster %K Genome %K Reproducibility of Results %K Research Design %K Sequence Alignment %X

This paper studies sequencing and mapping methods that rely solely on pooling and shotgun sequencing of clones. First, we scrutinize and improve the recently proposed Clone-Array Pooled Shotgun Sequencing (CAPSS) method, which delivers a BAC-linked assembly of a whole genome sequence. Secondly, we introduce a novel physical mapping method, called Clone-Array Pooled Shotgun Mapping (CAPS-MAP), which computes the physical ordering of BACs in a random library. Both CAPSS and CAPS-MAP construct subclone libraries from pooled genomic BAC clones.

%B Genome Inform %V 14 %P 186-95 %8 2003 %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/15706533?dopt=Abstract %0 Journal Article %J Development %D 2003 %T Graded phenotypic response to partial and complete deficiency of a brain-specific transcript variant of the winged helix transcription factor RFX4. %A Blackshear, Perry J %A Graves, Joan P %A Stumpo, Deborah J %A Cobos, Inma %A Rubenstein, John L R %A Zeldin, Darryl C %K Alternative Splicing %K Amino Acid Sequence %K Animals %K Brain %K DNA-Binding Proteins %K Embryo, Mammalian %K Helix-Turn-Helix Motifs %K Humans %K Hydrocephalus %K In Situ Hybridization %K Mice %K Mice, Transgenic %K Molecular Sequence Data %K Phenotype %K Protein Isoforms %K Regulatory Factor X Transcription Factors %K Sequence Alignment %K Tissue Distribution %K Transcription Factors %X

One line of mice harboring a cardiac-specific epoxygenase transgene developed head swelling and rapid neurological decline in young adulthood, and had marked hydrocephalus of the lateral and third ventricles. The transgene was found to be inserted into an intron in the mouse Rfx4 locus. This insertion apparently prevented expression of a novel variant transcript of RFX4 (RFX4_v3), a member of the regulatory factor X family of winged helix transcription factors. Interruption of two alleles resulted in profound failure of dorsal midline brain structure formation and perinatal death, presumably by interfering with expression of downstream genes. Interruption of a single allele prevented formation of the subcommissural organ, a structure important for cerebrospinal fluid flow through the aqueduct of Sylvius, and resulted in congenital hydrocephalus. These data implicate the RFX4_v3 variant transcript as being crucial for early brain development, as well as for the genesis of the subcommissural organ. These findings may be relevant to human congenital hydrocephalus, a birth defect that affects approximately 0.6 per 1000 newborns.

%B Development %V 130 %P 4539-52 %8 2003 Oct %G eng %N 19 %1 https://www.ncbi.nlm.nih.gov/pubmed/12925582?dopt=Abstract %R 10.1242/dev.00661 %0 Journal Article %J Proc Natl Acad Sci U S A %D 2003 %T Positional cloning of the major quantitative trait locus underlying lung tumor susceptibility in mice. %A Zhang, Zhongqiu %A Futamura, Manabu %A Vikis, Haris G %A Wang, Min %A Li, Jie %A Wang, Yian %A Guan, Kun-Liang %A You, Ming %K Adenoma %K Amino Acid Sequence %K Animals %K Base Sequence %K Blotting, Northern %K Chromosome Mapping %K Ciona intestinalis %K Cloning, Molecular %K DNA Primers %K Genetic Markers %K Genetic Predisposition to Disease %K Humans %K Lung Neoplasms %K Male %K Mice %K Mice, Inbred A %K Mice, Inbred C57BL %K Mice, Nude %K Microsatellite Repeats %K Molecular Sequence Data %K Proto-Oncogene Proteins %K Proto-Oncogene Proteins p21(ras) %K Quantitative Trait Loci %K ras Proteins %K Reverse Transcriptase Polymerase Chain Reaction %K Sequence Alignment %K Sequence Homology, Amino Acid %X

Pulmonary adenoma susceptibility 1 (Pas1), located on chromosome 6, is the major locus affecting inherited predisposition to lung tumor development in mice. We have fine mapped the Pas1 locus to a region of approximately 0.5 megabases by using congenic strains of mice, constructed by placing the Pas1 region of chromosome 6 from A/J mice onto the genetic background of C57BL/6J mice. Systematic characterization of Pas1 candidates establishes the Las1 (lung adenoma susceptibility 1) and Kras2 (Kirsten rat sarcoma oncogene 2) genes as primary candidates for the Pas1 locus. Clearly, Kras2 affects lung tumor progression only, and Las1 is likely to affect lung tumor multiplicity.

%B Proc Natl Acad Sci U S A %V 100 %P 12642-7 %8 2003 Oct 28 %G eng %N 22 %1 https://www.ncbi.nlm.nih.gov/pubmed/14583591?dopt=Abstract %R 10.1073/pnas.2133947100 %0 Journal Article %J Genome Res %D 2000 %T PipMaker--a web server for aligning two genomic DNA sequences. %A Schwartz, S %A Zhang, Z %A Frazer, K A %A Smit, A %A Riemer, C %A Bouck, J %A Gibbs, R %A Hardison, R %A Miller, W %K Animals %K Base Sequence %K Caenorhabditis elegans %K Computational Biology %K DNA %K Escherichia coli %K Genes, Bacterial %K Genes, Helminth %K Genes, Protozoan %K Humans %K Internet %K Mice %K Molecular Sequence Data %K Salmonella typhimurium %K Sequence Alignment %K Software %X

PipMaker (http://bio.cse.psu.edu) is a World-Wide Web site for comparing two long DNA sequences to identify conserved segments and for producing informative, high-resolution displays of the resulting alignments. One display is a percent identity plot (pip), which shows both the position in one sequence and the degree of similarity for each aligning segment between the two sequences in a compact and easily understandable form. Positions along the horizontal axis can be labeled with features such as exons of genes and repetitive elements, and colors can be used to clarify and enhance the display. The web site also provides a plot of the locations of those segments in both species (similar to a dot plot). PipMaker is appropriate for comparing genomic sequences from any two related species, although the types of information that can be inferred (e.g., protein-coding regions and cis-regulatory elements) depend on the level of conservation and the time and divergence rate since the separation of the species. Gene regulatory elements are often detectable as similar, noncoding sequences in species that diverged as much as 100-300 million years ago, such as humans and mice, Caenorhabditis elegans and C. briggsae, or Escherichia coli and Salmonella spp. PipMaker supports analysis of unfinished or "working draft" sequences by permitting one of the two sequences to be in unoriented and unordered contigs.

%B Genome Res %V 10 %P 577-86 %8 2000 Apr %G eng %N 4 %1 https://www.ncbi.nlm.nih.gov/pubmed/10779500?dopt=Abstract %R 10.1101/gr.10.4.577 %0 Journal Article %J Genome Res %D 1999 %T Identification of three novel Ca(2+) channel gamma subunit genes reveals molecular diversification by tandem and chromosome duplication. %A Burgess, D L %A Davis, C F %A Gefrides, L A %A Noebels, J L %K Amino Acid Sequence %K Calcium Channels %K Chromosomes, Human, Pair 16 %K Chromosomes, Human, Pair 17 %K Evolution, Molecular %K Expressed Sequence Tags %K Gene Duplication %K Genetic Variation %K Humans %K Molecular Sequence Data %K Multigene Family %K Peptides %K Phylogeny %K Physical Chromosome Mapping %K Sequence Alignment %X

Gene duplication is believed to be an important evolutionary mechanism for generating functional diversity within genomes. The accumulated products of ancient duplication events can be readily observed among the genes encoding voltage-dependent Ca(2+) ion channels. Ten paralogous genes have been identified that encode isoforms of the alpha(1) subunit, four that encode beta subunits, and three that encode alpha(2)delta subunits. Until recently, only a single gene encoding a muscle-specific isoform of the Ca(2+) channel gamma subunit (CACNG1) was known. Expression of a distantly related gene in the brain was subsequently demonstrated upon isolation of the Cacng2 gene, which is mutated in the mouse neurological mutant stargazer (stg). In this study, we sought to identify additional genes that encoded gamma subunits. Because gene duplication often generates paralogs that remain in close syntenic proximity (tandem duplication) or are copied onto related daughter chromosomes (chromosome or whole-genome duplication), we hypothesized that the known positions of CACNG1 and CACNG2 could be used to predict the likely locations of additional gamma subunit genes. Low-stringency genomic sequence analysis of targeted regions led to the identification of three novel Ca(2+) channel gamma subunit genes, CACNG3, CACNG4, and CACNG5, on chromosomes 16 and 17. These results demonstrate the value of genome evolution models for the identification of distantly related members of gene families.

%B Genome Res %V 9 %P 1204-13 %8 1999 Dec %G eng %N 12 %1 https://www.ncbi.nlm.nih.gov/pubmed/10613843?dopt=Abstract %R 10.1101/gr.9.12.1204 %0 Journal Article %J Bioinformatics %D 1998 %T BEAUTY-X: enhanced BLAST searches for DNA queries. %A Kim C Worley %A Culpepper, P %A Wiese, B A %A Smith, R F %K Amino Acid Sequence %K Computational Biology %K Databases, Factual %K DNA %K Molecular Sequence Data %K Proteins %K Sequence Alignment %K Software %X

UNLABELLED: BEAUTY (BLAST Enhanced Alignment Utility) is an enhanced version of the BLAST database search tool that facilitates identification of the functions of matched sequences. Three recent improvements to the BEAUTY program described here make the enhanced output (1) available for DNA queries, (2) available for searches of any protein database, and (3) more up-to-date, with periodic updates of the domain information.

AVAILABILITY: BEAUTY searches of the NCBI and EMBL non-redundant protein sequence databases are available from the BCM Search Launcher Web pages (http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html). BEAUTY Post-Processing of submitted search results is available using the BCM Search Launcher Batch Client (version 2.6) (ftp://gc.bcm.tmc.edu/pub/software/search-launcher/).

SUPPLEMENTARY INFORMATION: Example figures are available at http://dot.bcm.tmc.edu:9331/papers/beautypp.html

CONTACT: (kworley,culpep)@bcm.tmc.edu

%B Bioinformatics %V 14 %P 890-1 %8 1998 %G eng %N 10 %1 https://www.ncbi.nlm.nih.gov/pubmed/9927720?dopt=Abstract %R 10.1093/bioinformatics/14.10.890 %0 Journal Article %J Genome Res %D 1998 %T Comparative sequence analysis of a gene-rich cluster at human chromosome 12p13 and its syntenic region in mouse chromosome 6. %A Ansari-Lari, M A %A Oeltjen, J C %A Schwartz, S %A Zhang, Z %A Donna M Muzny %A Lu, J %A Gorrell, J H %A Chinault, A C %A Belmont, J W %A Miller, W %A Richard A Gibbs %K Amino Acid Sequence %K Animals %K Chromosome Mapping %K Chromosomes %K Chromosomes, Human, Pair 12 %K Conserved Sequence %K Humans %K Mice %K Molecular Sequence Data %K Multigene Family %K Repetitive Sequences, Nucleic Acid %K Sequence Alignment %K Sequence Analysis, DNA %X

The Human Genome Project has created a formidable challenge: the extraction of biological information from extensive amounts of raw sequence. With the increasing availability of genomic sequence from other species, one approach to extracting coding and regulatory element information is through cross-species sequence comparison. To assess the strengths and weaknesses of this methodology for large-scale sequence analysis, 227 kb of mouse sequence syntenic to a gene-rich cluster on human chromosome 12p13 was obtained. Primarily through percent identity plots (PIPs) of SIM comparative sequence alignments, the sequence of coding regions, putative alternative exons, conserved noncoding regions, and correlation in repetitive element insertions were easily determined. The analysis demonstrated that the number, order, and orientation of all 17 genes are conserved between the two species, whereas two human pseudogenes are absent in mouse. In addition, apart from MIRs, no direct correlation of distribution or position of the majority of repetitive elements between the two species is seen. Finally, in examining the synonymous and nonsynonymous substitution rates in the conserved genes, a large variation in nonsynonymous rates is observed indicating that the genes in this region are diverging at different rates. This study indicates the utility and strength of large-scale cross-species sequence comparisons in the extraction of biological information from raw sequence, especially when combined with other computational tools such as GRAIL and BLAST.

%B Genome Res %V 8 %P 29-40 %8 1998 Jan %G eng %N 1 %1 https://www.ncbi.nlm.nih.gov/pubmed/9445485?dopt=Abstract %0 Journal Article %J Gene %D 1998 %T The genomic organization of Isopeptidase T-3 (ISOT-3), a new member of the ubiquitin specific protease family (UBP). %A Timms, K M %A Ansari-Lari, M A %A Morris, W %A Brown, S N %A Richard A Gibbs %K Amino Acid Sequence %K Animals %K Base Sequence %K Carbon-Nitrogen Lyases %K Chromosome Mapping %K Chromosomes, Human, Pair 3 %K Consensus Sequence %K DNA, Complementary %K Exons %K Female %K Gene Library %K Humans %K Male %K Mice %K Molecular Sequence Data %K Organ Specificity %K Ovary %K Polymerase Chain Reaction %K Recombinant Proteins %K Saccharomyces cerevisiae %K Sequence Alignment %K Sequence Homology, Amino Acid %K Substrate Specificity %K Testis %K Ubiquitins %X

A novel Isopeptidase T gene (ISOT-3) has been identified on human chromosome 3q26.2-q26.3. The gene shows 67.3% nucleotide identity and 54.8% amino acid identity to human Isopeptidase (ISOT-1). Northern blot analysis has shown that ISOT-3 is highly expressed in ovary and testes, with low-level expression in six other tissues tested. In contrast, ISOT-1 is expressed at high levels in brain, and there is no detectable expression in ovary. The exonic organization of these two genes is highly conserved with only one variant intron position. Intron 15 in ISOT-3 is absent in ISOT-1, where there is an alternate splice site at the same location. Although the exon-intron structure has been conserved between the two genes, ISOT-3 has significantly larger intronic regions, and the overall size of this gene is at least 90 kb compared to 15 kb for ISOT-1. These data suggest that both ISOT-1 and ISOT-3 have descended from a common ancestor. In addition, the low overall sequence identity and different expression patterns may reflect differences in substrate specificity.

%B Gene %V 217 %P 101-6 %8 1998 Sep 14 %G eng %N 1-2 %1 https://www.ncbi.nlm.nih.gov/pubmed/9841226?dopt=Abstract %R 10.1016/s0378-1119(98)00341-2 %0 Journal Article %J Trends Genet %D 1997 %T Hares and tortoises in the race to sequence the human genome: expectations and realities. %A Richard A Gibbs %K Human Genome Project %K Humans %K Polymerase Chain Reaction %K Sequence Alignment %K Sequence Analysis, DNA %K Sequence Tagged Sites %K Software %B Trends Genet %V 13 %P 381-3 %8 1997 Oct %G eng %N 10 %1 https://www.ncbi.nlm.nih.gov/pubmed/9351336?dopt=Abstract %R 10.1016/s0168-9525(97)01267-5 %0 Journal Article %J Genome Res %D 1997 %T Large-scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. %A Oeltjen, J C %A Malley, T M %A Donna M Muzny %A Miller, W %A Richard A Gibbs %A Belmont, J W %K Agammaglobulinaemia Tyrosine Kinase %K alpha-Galactosidase %K Animals %K Base Sequence %K Conserved Sequence %K Enhancer Elements, Genetic %K Genetic Variation %K Humans %K Mice %K Models, Genetic %K Molecular Sequence Data %K Promoter Regions, Genetic %K Protein-Tyrosine Kinases %K Recombinant Proteins %K Regulatory Sequences, Nucleic Acid %K Repetitive Sequences, Nucleic Acid %K Sequence Alignment %K Sequence Analysis, DNA %K Sequence Homology, Nucleic Acid %K Transcription Factors %K Transcription, Genetic %K Transfection %X

Large-scale genomic DNA sequencing of orthologous and paralogous loci in different species should contribute to a basic understanding of the evolution of both the protein-coding regions and noncoding regulatory elements. We compared 93 kb of human sequence to 89 kb of mouse sequence in the Bruton's tyrosine kinase (BTK) region. In addition to showing the conservation of both position and orientation of the five functionally unrelated genes in the region (BTK, alpha-D-galactosidase A, L44L, FTP-3, and FCI-12), the comparison revealed conservation of clusters of noncoding sequence flanking the first exon of each gene. Furthermore, in the sequence comparison at the BTK locus, the conservation of clusters of noncoding sequence extends throughout the locus; the noncoding sequence is more highly conserved in the BTK locus in comparison to the flanking loci. This suggests a correlation with the complex developmental regulation of expression of btk. To determine whether a highly conserved 3.5-kb segment flanking the first exon of BTK contains transcriptional regulatory signals, we tested various portions of the segment for promoter and expression activity in several appropriate cell lines. The results demonstrate the contribution of the conserved region flanking the first exon to the cell lineage-specific expression pattern of btk. These data show the usefulness of large scale sequence comparisons to focus investigation on regions of noncoding sequence that play essential roles in complex gene regulation.

%B Genome Res %V 7 %P 315-29 %8 1997 Apr %G eng %N 4 %1 https://www.ncbi.nlm.nih.gov/pubmed/9110171?dopt=Abstract %R 10.1101/gr.7.4.315 %0 Journal Article %J Genome Res %D 1995 %T 130 kb of DNA sequence reveals two new genes and a regional duplication distal to the human iduronate-2-sulfate sulfatase locus. %A Timms, K M %A Lu, F %A Shen, Y %A Pierson, C A %A Donna M Muzny %A Gu, Y %A Nelson, D L %A Richard A Gibbs %K Base Sequence %K Chromosome Walking %K Chromosomes, Artificial, Yeast %K Cosmids %K Gene Expression %K Genes %K Genetic Markers %K Humans %K Iduronate Sulfatase %K Male %K Mucopolysaccharidosis II %K Multigene Family %K Polymerase Chain Reaction %K Pseudogenes %K Repetitive Sequences, Nucleic Acid %K RNA, Messenger %K Sequence Alignment %K Transcription, Genetic %K X Chromosome %X

Deficiency of IDS activity results in Hunter Syndrome (mucopolysaccharidosis type II), a fatal X-linked recessive disorder. We report characterization of 28 cosmids around the IDS locus in Xq28. Four overlapping cosmids have been sequenced in their entirety generating a 130-kb contig. These studies show the fine structure of the IDS gene and identify an IDS pseudogene-like structure located 20 kb distal to the active gene. Two novel genes have also been identified in this sequence, and one of these genes is also locally duplicated. Both homologs are expressed, and a number of alternative transcript products have been characterized. The presence of a highly conserved pseudogene-like structure within a larger duplicated region close to the IDS gene has significant implications for the study of mutations at this locus.

%B Genome Res %V 5 %P 71-8 %8 1995 Aug %G eng %N 1 %1 https://www.ncbi.nlm.nih.gov/pubmed/8717057?dopt=Abstract %R 10.1101/gr.5.1.71 %0 Journal Article %J Genome Res %D 1995 %T BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. %A Worley, K C %A Wiese, B A %A Smith, R F %K Amino Acid Sequence %K Computer Communication Networks %K Databases, Factual %K Information Storage and Retrieval %K Molecular Sequence Data %K Sequence Alignment %K Software %X

BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. 
BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is <http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html>).

%B Genome Res %V 5 %P 173-84 %8 1995 Sep %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/9132271?dopt=Abstract %R 10.1101/gr.5.2.173 %0 Journal Article %J J Comput Biol %D 1995 %T Identification of new members of a carbohydrate kinase-encoding gene family. %A Kim C Worley %A King, K Y %A Chua, S %A McCabe, E R %A Smith, R F %K Amino Acid Sequence %K Animals %K Caenorhabditis elegans %K Carbohydrates %K Conserved Sequence %K Databases, Factual %K Genes, Helminth %K Glucokinase %K Glycerol Kinase %K Humans %K Molecular Sequence Data %K Multigene Family %K Phosphotransferases (Alcohol Group Acceptor) %K Phylogeny %K Sequence Alignment %K Sequence Homology, Amino Acid %K Software %X

In a sequence database search using the human glycerol kinase-encoding sequence (HUMGLYKINB) as a query, we identified six previously unidentified carbohydrate kinase sequences. Five of the six newly identified sequences appear to be known types of carbohydrate kinases: four are glycerol kinases and one is a gluconokinase. The sixth newly identified sequence, the Caenorhabditis elegans gene, CER08D7.7-CEF59B2.1, shows similarity to the family of carbohydrate kinases including other glycerol kinases, xylulokinases, gluconokinases, ribulokinases, rhamnulokinases, and fucokinases. A phylogenetic comparison of this newly identified Caenorhabditis elegans gene with the other members of the carbohydrate kinase family demonstrated that this sequence cannot be assigned to one of the known classes of carbohydrate kinases.

%B J Comput Biol %V 2 %P 451-8 %8 1995 Fall %G eng %N 3 %1 https://www.ncbi.nlm.nih.gov/pubmed/8521274?dopt=Abstract %R 10.1089/cmb.1995.2.451