Human genome
Image source: Webridge (revised from) [CC-BY-2.0], via Wikimedia Commons


Adenine (A)
A nitrogenous base, one member of the base pair AT (adenine-thymine).
Allele
Alternative form of a genetic locus; a single allele for each locus is inherited separately from each parent (e.g., at a locus for eye color the allele might result in blue or brown eyes).
Amino acid
Any of a class of 20 molecules that are combined to form proteins in living things. The sequence of amino acids in a protein and hence protein function are determined by the genetic code.
Amplification
Refers to the production of additional copies of a specific DNA fragment; can be in vivo or in vitro. See cloning, polymerase chain reaction.
Annealing
Is the pairing of complementary single strands of DNA to form a double helix.
Anticoding strand
The strand of DNA that is used as a template to direct the synthesis of RNA that is complementary to it.
Antiparallel
Strands of the double helix are organized in opposite orientation, so that the 5' end of one strand is aligned with the 3' end of the other strand.
Arrayed library
Individual primary recombinant clones (hosted in phage, cosmid, YAC, or other vector) that are placed in two-dimensional arrays in microtiter dishes. Each primary clone can be identified by the identity of the plate and the clone location (row and column) on that plate. Arrayed libraries of clones can be used for many applications, including screening for a specific gene or genomic region of interest as well as for physical mapping. Information gathered on individual clones from various genetic linkage and physical map analyses is entered into a relational database and used to construct physical and genetic linkage maps simultaneously; clone identifiers serve to interrelate the multilevel maps. Compare library, genomic library.
Autosomes
All the chromosomes except the sex chromosomes; a diploid cell has two copies of each autosome.
Bacterial artificial chromosome (BAC)
A vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. Based on naturally occurring F-factor plasmid found in the bacterium E. coli. Compare cloning vector.
Base
A, T, G, or C.
Base pair (bp)
Is a partnership of A with T or of C with G in a DNA double helix; other pairs can be formed in RNA under certain circumstances. Distance along DNA is measured in base pairs.
Biotechnology
A set of biological techniques developed through basic research and now applied to research and product development. In particular, biotechnology refers to the use by industry of recombinant DNA, cell fusion, and new processing techniques.
cDNA clone
A duplex DNA sequence representing an RNA, carried in a cloning vector.
Cell
The fundamental microscopic unit of which all living things except viruses are composed.
Chromosome
A discrete unit of the genome carrying many genes. Each chromosome consists of a very long molecule of duplex DNA and an approximately equal mass of proteins. It is visible as a morphological entity only during cell division.
Clone
Describes a large number of cells or molecules derived from a single ancestral cell or molecule.
Cloning
Using specialized DNA technology to produce multiple, exact copies of a single gene or other segment of DNA to obtain enough material for further study. This process, used by researchers in the Human Genome Project, is referred to as cloning DNA. The resulting cloned (copied) collections of DNA molecules are called clone libraries.
Cloning vector
A plasmid or phage that is used to "carry" or propagate inserted foreign DNA for the purposes of producing more material or a protein product. Examples are plasmids, cosmids, and yeast artificial chromosomes; vectors are often recombinant molecules containing DNA sequences from several sources.
Coding strand
The strand of DNA that has the same sequence as mRNA.
Codon
A triplet of nucleotides that represents an amino acid or a termination signal.
Complementary DNA (cDNA)
A single-stranded DNA complementary to an RNA, synthesized from it by reverse transcription in vitro.
Consensus sequence
An idealized sequence in which each position represents the base most often found when many actual sequences are compared.
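As a rough illustration of the definition above, a consensus can be derived by taking the most frequent base in each column of a set of equal-length, already aligned sequences. The function name `consensus` and the example sequences are made up for this sketch; real data would first need alignment, and ties here are resolved arbitrarily in favor of the base seen first.

```python
from collections import Counter

def consensus(aligned_seqs):
    """Return the most common base at each position of equal-length sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) walks the alignment column by column
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))

# Hypothetical example sequences (not taken from any real dataset)
examples = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
print(consensus(examples))  # TATAAT
```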
Contig
Group of clones representing overlapping regions of a genome.
Cosmids
Plasmids into which phage lambda cos sites have been inserted; as a result, the plasmid DNA can be packaged in vitro in the phage coat.
Cytosine (C)
A nitrogenous base, one member of the base pair GC (guanine and cytosine).
DNAase (DNase)
Is an enzyme that cuts the phosphodiester bonds linking nucleotides in DNA.
DNA polymerase
Is an enzyme that synthesizes a daughter strand(s) of DNA (under direction from a DNA template). May be involved in repair or replication.
DNA replicase
Is a DNA-synthesizing enzyme required specifically for replication.
Degeneracy
Refers to the redundancy of the genetic code: the 64 possible triplets represent only 20 amino acids and 3 stop codons, so most amino acids are represented by more than one codon. As a result, many changes in the third base of a codon have no effect on the amino acid that is represented.
Denaturation
Describes the conversion of DNA from the double-stranded to the single-stranded state; separation of the strands is most often accomplished by heating.
Deoxyribonucleic acid (DNA)
The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides. The four nucleotides in DNA contain the bases adenine (A), guanine (G), cytosine (C), and thymine (T). In nature, base pairs form only between A and T and between G and C; thus the base sequence of each single strand can be deduced from that of its partner.
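The statement that each strand can be deduced from its partner can be sketched in a few lines. Because the two strands run in opposite (antiparallel) orientation, the partner is obtained by complementing each base and reversing the order, so that both strands are written 5' to 3'. The name `partner_strand` is chosen for this example only.

```python
# Watson-Crick pairing in DNA: A pairs with T, G pairs with C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def partner_strand(strand_5_to_3):
    """Return the complementary strand, also written 5' -> 3'."""
    # Reverse first (antiparallel orientation), then complement each base
    return "".join(PAIR[base] for base in reversed(strand_5_to_3.upper()))

print(partner_strand("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is one quick way to check that the pairing table is symmetric.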
Divergence
The percent difference in nucleotide sequence between two related DNA sequences or in amino acid sequences between two proteins.
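A minimal sketch of the percent difference described above, assuming the two sequences are already aligned and of equal length (gaps and unequal lengths are not handled). The function name and the example sequences are illustrative only.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)

# Two hypothetical 10-base sequences differing at 2 positions
print(percent_divergence("ACGTACGTAC", "ACGTTCGTAA"))  # 20.0
```

The same function works for protein sequences written in one-letter amino acid codes.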
Draft Sequence
The sequence generated by the Human Genome Project (HGP) as of June 2000 that, while incomplete, offers a virtual road map to an estimated 95% of all human genes.
Electrophoresis
A method of separating large molecules (such as DNA fragments or proteins) from a mixture of similar molecules. An electric current is passed through a medium containing the mixture, and each kind of molecule travels through the medium at a different rate, depending on its electrical charge and size. Separation is based on these differences. Agarose and acrylamide gels are the media commonly used for electrophoresis of proteins and nucleic acids.
Endonucleases
Cleave bonds within a nucleic acid chain; they may be specific for RNA or for single-stranded or double-stranded DNA.
Eukaryote
Cell or organism with membrane-bound, structurally discrete nucleus and other well-developed subcellular compartments. Eukaryotes include all organisms except viruses, bacteria, and blue-green algae. Compare prokaryote. See chromosome.
Exon
The protein coding DNA sequence of a gene. Compare intron.
Exonucleases
Cleave nucleotides one at a time from the end of a polynucleotide chain; they may be specific for either the 5' or 3' end of DNA or RNA.
Expression vector
A cloning vector designed so that a coding sequence inserted at a particular site will be transcribed and translated into protein.
Fingerprint
Of DNA is a pattern of restriction fragments often used to determine the relatedness of two pieces of DNA.
Footprinting
A technique for identifying the site on DNA bound by some protein by virtue of the protection of bonds in this region against attack by nucleases.
Gene (cistron)
The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule). It is also known as a cistron, and it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).
Gene Mapping
Determination of the relative positions of genes on a DNA molecule, and of the distance, in linkage units or physical units between them.
Genetic code
The sequence of nucleotides, coded in triplets (codons) along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.
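The prediction chain described above (DNA sequence to mRNA to amino acid sequence) can be sketched as follows. The codon table is the standard genetic code, built compactly from the conventional U, C, A, G ordering; the function names and the example sequence are chosen for this illustration, and the input is assumed to be the coding strand, read in frame from its first base.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; '*' marks a stop codon
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCCTGGTAA")
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```

The table also makes the redundancy of the code visible: 64 codons map onto 20 amino acids plus 3 stop signals.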
Genetics
The scientific study of heredity.
Genome
All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs (e.g., the human genome contains an estimated 3.3 billion base pairs).
Genomic library
A collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism. Compare library, arrayed library.
Genomics
The study of genes and their functions.
Genotype
The genetic constitution of an organism (see also phenotype).
Guanine (G)
A nitrogenous base, one member of the base pair GC (guanine and cytosine).
Homology
Evolutionarily significant similarity in DNA or protein sequences between individuals of the same species or among different species.
Hybridization
The process of joining two complementary strands of DNA or one each of DNA and RNA to form a double-stranded molecule.
Immortalization
The acquisition by a eukaryotic cell line of the ability to grow through an indefinite number of divisions in culture.
In vitro
Outside a living organism.
Informatics
The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.
Intron
The DNA base sequence interrupting the protein coding sequence of a gene; this sequence is transcribed into RNA but is cut out of the message before it is translated into protein. Compare exon.
kilobase (kb)
An abbreviation for 1000 base pairs of DNA or 1000 bases of RNA.
Lagging strand
Of DNA must grow overall in the 3'-5' direction and is synthesized discontinuously in the form of short fragments (5'-3') that are later connected covalently.
Leading strand
Of DNA is synthesized continuously in the 5'-3' direction.
Marker
(1) A fragment of known size used to calibrate an electrophoretic gel. (2) An identifiable physical location on a chromosome (e.g., restriction enzyme cutting site, gene) whose inheritance can be monitored. Markers can be expressed regions of DNA (genes) or some segment of DNA with no known coding function but whose pattern of inheritance can be determined.
Megabase (Mb)
Unit of length for DNA fragments equal to 1 million nucleotides and roughly equal to 1 cM.
Melting
Refers to the denaturation of DNA.
Melting temperature (Tm)
The temperature at the midpoint of the melting transition, at which half of the DNA molecules are denatured and half are annealed.
Messenger RNA (mRNA)
RNA that serves as a template for protein synthesis. See genetic code.
Molecular Medicine
The treatment of injury or disease at the molecular level. Examples include the use of DNA-based diagnostics tests or medicine derived from DNA sequence information.
Nonsense codon
Any one of three triplets (UAG, UAA, UGA) that cause termination of protein synthesis. (UAG is known as amber, UAA as ochre.)
Nonsense mutation
Any change in DNA that causes a nonsense (termination) codon to replace a codon representing an amino acid.
Nucleotide
A subunit of DNA or RNA consisting of a nitrogenous base (adenine, guanine, thymine, or cytosine in DNA; adenine, guanine, uracil, or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands of nucleotides are linked to form a DNA or RNA molecule. See DNA, base pair, RNA.
Open reading frame (ORF)
Contains a series of triplets coding for amino acids without any termination codons; sequence is (potentially) translatable into protein.
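One simple way to look for such a stretch is to scan each of the three forward reading frames for a run of triplets uninterrupted by a termination codon. The sketch below makes assumptions that go beyond the definition: it works on DNA (so stops are TAA, TAG, TGA), requires the frame to begin at an ATG start codon, ignores the reverse strand, and returns only the longest hit. The function name and test sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(dna):
    """Return the longest ATG-to-stop stretch (stop excluded) in the 3 forward frames."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the ATG that opened the current frame, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i - start > len(best):
                    best = dna[start:i]
                start = None
    return best

print(longest_orf("CCATGAAATTTGGGTAGCC"))  # ATGAAATTTGGG
```

An open frame that runs off the end of the sequence without meeting a stop codon is not reported by this version.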
Origin (ori)
A sequence of DNA at which replication is initiated.
P1-derived artificial chromosome (PAC)
A vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. Based on bacteriophage (a virus) P1 genome.
PCR (polymerase chain reaction)
Describes a technique in which cycles of denaturation, annealing with primer, and extension with DNA polymerase are used to amplify the number of copies of a target DNA sequence by more than 10^6 times.
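The scale of the amplification follows from simple arithmetic: if every cycle ideally doubles the number of target copies, n cycles give 2^n copies per starting molecule. The sketch below works out how many cycles a million-fold (10^6) amplification needs under that idealized assumption; the `efficiency` parameter is a hypothetical knob added for illustration, since real reactions fall short of perfect doubling.

```python
import math

def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies after a number of cycles; efficiency=1.0 means perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles

def cycles_needed(fold_amplification):
    """Smallest number of perfect-doubling cycles reaching the given fold increase."""
    return math.ceil(math.log2(fold_amplification))

print(cycles_needed(10**6))  # 20
print(copies_after(20))      # 1048576.0
```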
Phage (bacteriophage)
A virus whose host is bacteria.
Pharmacogenomics
The study of the interaction of an individual's genetic makeup and response to a drug.
Phenotype
The appearance or other characteristics of an organism, resulting from the interaction of its genetic constitution with the environment.
Physical map
A map of the locations of identifiable landmarks on DNA (e.g., restriction enzyme cutting sites, genes), regardless of inheritance. Distance is measured in base pairs. For the human genome, the lowest-resolution physical map is the banding patterns on the 24 different chromosomes; the highest resolution map would be the complete nucleotide sequence of the chromosomes.
Plasmid
An extrachromosomal circular DNA.
Point mutations
Changes involving single base pairs.
Polarity
Refers to the effect of a mutation in one gene in influencing the expression (at transcription or translation) of subsequent genes in the same transcription unit.
Polyadenylation
The addition of a sequence of polyadenylic acid to the 3' end of a eukaryotic RNA after its transcription.
Polymerase (DNA or RNA)
Enzymes that catalyze the synthesis of nucleic acids on preexisting nucleic acid templates, assembling RNA from ribonucleotides or DNA from deoxyribonucleotides.
Prep Tray
A multiwelled tray, currently 96 well, that contains a set of templates. This set almost always represents a set of related clones from the same project and library, given a projectnumber, which are numbered sequentially.
Primer
Short preexisting polynucleotide chain to which new deoxyribonucleotides can be added by DNA polymerase.
Prokaryotic organisms (bacteria)
Lack nuclei.
Promoter
A region of DNA involved in binding of RNA polymerase to initiate transcription.
Proofreading
Refers to any mechanism for correcting errors in protein or nucleic acid synthesis that involves scrutiny of individual units after they have been added to the chain.
Prophage
A phage genome covalently integrated as a linear part of the bacterial chromosome.
Protein
A large molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene that codes for the protein. Proteins are required for the structure, function, and regulation of the body's cells, tissues, and organs, and each protein has unique functions. Examples: hormones, enzymes, and antibodies.
Recessive allele
Is obscured in the phenotype of a heterozygote by the dominant allele, often due to inactivity or absence of the product of the recessive allele.
Release (termination) factors
Respond to termination codons to cause release of the completed polypeptide chain and the ribosome from mRNA.
Replicon
A unit of the genome in which DNA is replicated; contains an origin for initiation of replication.
Replisome
The multiprotein structure that assembles at the bacterial replicating fork to undertake synthesis of DNA. Contains DNA polymerase and other enzymes.
Reporter gene
A coding unit whose product is easily assayed (such as chloramphenicol transacetylase); it may be connected to any promoter of interest so that expression of the gene can be used to assay promoter function.
Restriction enzymes
Proteins that recognize specific short sequences of (usually) unmethylated DNA and cut the DNA at those sites.
Restriction enzyme cutting site
A specific nucleotide sequence of DNA at which a particular restriction enzyme cuts the DNA. Some sites occur frequently in DNA (e.g., every several hundred base pairs), others much less frequently (rare cutters; e.g., every 10,000 base pairs).
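To make the idea of a cutting site concrete, the sketch below searches a sequence for one recognition site and reports the resulting fragments. It uses the EcoRI site GAATTC, cut after the first G, as the default. The model is deliberately simplified: it treats the DNA as a linear molecule, follows one strand only, and ignores methylation. The function name and the example sequences are invented for illustration.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Split a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site left on the upstream fragment
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments

print(digest("AAGAATTCGGGGAATTCTT"))  # ['AAG', 'AATTCGGGG', 'AATTCTT']
# A single base change in the second site removes it, changing the fragment lengths
print(digest("AAGAATTCGGGGAGTTCTT"))  # ['AAG', 'AATTCGGGGAGTTCTT']
```

The second call shows, in miniature, how a base change in a target site alters the set of fragment lengths produced by the same enzyme.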
Restriction fragment length polymorphism (RFLP)
Refers to inherited differences in sites for restriction enzymes between two related DNA sequences (for example, caused by base changes in the target site) that result in differences in the lengths of the fragments produced by cleavage with the relevant restriction enzyme. RFLPs are used for genetic mapping to link the genome directly to a conventional genetic marker.
Restriction map
A linear array of sites on DNA cleaved by various restriction enzymes.
Reverse transcription
Synthesis of DNA on a template of RNA; accomplished by reverse transcriptase enzyme.
Selection
Describes the use of particular conditions to allow survival only of cells with a particular phenotype.
Semiconservative replication
Accomplished by separation of the strands of a parental duplex, each then acting as a template for synthesis of a complementary strand.
Semidiscontinuous replication
Mode in which one new strand is synthesized continuously while the other is synthesized discontinuously.
Seqreact (v); Seqreaction (n)
The part of the pipeline where a DNA template, normally stored on a prep tray, is mixed with cocktail (primer plus dye-labelled nucleotides) and DNA polymerase to generate the set of variable-length DNA fragments in which fragments ending in different nucleotides (A, T, G, C) are labelled with different colored dyes. N.B.: do not confuse the noun form of this definition, seqreaction, with the table of the same name: barcode.seqreaction. Data for some seqreactions is stored in this table; data for others is stored in other tables, currently barcode.experiments+barcode.seqtray, and barcode.sampleset+barcode.sequencing+barcode.tray.
Sequence tagged site (STS)
Short (200 to 500 base pairs) DNA sequence that has a single occurrence in the human genome and whose location and base sequence are known. Detectable by polymerase chain reaction, STSs are useful for localizing and orienting the mapping and sequence data reported from many different laboratories and serve as landmarks on the developing physical map of the human genome. Expressed sequence tags (ESTs) are STSs derived from cDNAs.
Sequencing
Determination of the order of nucleotides (base sequences) in a DNA or RNA molecule or the order of amino acids in a protein.
Sex chromosomes
Those whose contents are different in the two sexes; usually labeled X and Y (or Z and W). One sex has XX (or ZZ), the other sex has XY (or ZW).
Shotgun experiment
Cloning of an entire genome in the form of randomly generated fragments.
Shuttle vector
A plasmid constructed to have origins for replication for two hosts (for example, E. coli and S. cerevisiae) so that it can be used to carry a foreign sequence in either prokaryotes or eukaryotes.
Silent mutations
Mutations that do not change the product of a gene.
Southern blotting
Describes the procedure for transferring denatured DNA from an agarose gel to a nitrocellulose filter where it can be hybridized with a complementary nucleic acid.
Splicing
Describes the removal of introns and joining of exons in RNA; thus introns are spliced out, while exons are spliced together.
Sticky ends
Complementary single strands of DNA that protrude from opposite ends of a duplex or from ends of different duplex molecules; can be generated by staggered cuts in duplex DNA.
Stop codons
The three triplets (UAA, UAG, UGA) which terminate protein synthesis.
TATA box
A conserved A·T-rich septamer found about 25 bp before the startpoint of each eukaryotic RNA polymerase II transcription unit; may be involved in positioning the enzyme for correct initiation.
Tm
The abbreviation for melting temperature.
Template
A purified DNA preparation ready for seqreaction.
Termination codon
One of three triplet sequences, UAG (amber), UAA (ochre), or UGA that cause termination of protein synthesis; they are also called "nonsense" codons.
Thymine (T)
A nitrogenous base, one member of the base pair AT (adenine-thymine).
Transcription
The synthesis of mRNA from a DNA template.
Transformation
A process by which the genetic material carried by an individual cell is altered by incorporation of exogenous DNA into its genome.
Translation
The synthesis of protein on the mRNA template.
Uracil (U)
A nitrogenous base normally found in RNA but not DNA; uracil is capable of forming a base pair with adenine.
Wobble hypothesis
Accounts for the ability of a tRNA to recognize more than one codon by unusual (non-G·C, A·T) pairing with the third base of a codon.
Yeast artificial chromosome (YAC)
A vector used to clone DNA fragments (up to 400 kb); it is constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. Compare cloning vector.