About the Project
Purple sea urchin
The sea urchin has been an important model system for modern molecular, evolutionary, and cell biology, particularly in the realm of developmental biology.
Echinoderms occupy an important evolutionary position with respect to vertebrates and humans: they, along with their sister phylum hemichordates, are the closest known relatives to chordates.
The draft-quality genome of the California purple sea urchin (Strongylocentrotus purpuratus) has been sequenced and annotated by the Sea Urchin Genome Sequencing Consortium, led by the HGSC (published in Science [PubMed] and Developmental Biology [PubMed]).
Currently, several echinoderms are being sequenced for comparative analysis, including Strongylocentrotus franciscanus, Allocentrotus fragilis, Patiria miniata, Lytechinus variegatus, P. flava, and A. punctulata.
This series of evolutionary distances within one clade is not available elsewhere. The comparisons will highlight the cis-regulatory networks and their evolution in these well-studied developmental models.
The green urchin (L. variegatus) is at an appropriate evolutionary distance from S. purpuratus: putative exons and cis-regulatory regions are identifiable by sequence conservation, while most of the remaining sequence is too divergent to align. The goal is to define the genome sequence so that the intergenic sequence on either side of a gene of interest is included in the same contig as the gene itself.
The sea star (P. miniata) is more distant from S. purpuratus and will provide evidence for assessing the difference between flexible and inflexible gene regulatory networks in evolution. Other related species included in this project are the “modern” sea urchin (A. punctulata), the “primitive” sea urchin (E. tribuloides), the sea star (D. imbricata), and the hemichordate (P. flava).
A. punctulata is the euechinoid most distantly related to S. purpuratus that is in frequent use as an experimental laboratory organism. D. imbricata would be useful for identifying cis-regulatory sequences in P. miniata (similar to the S. purpuratus – L. variegatus comparison).
|Spur_v0.3||preliminary assembly of Whole Genome Shotgun (WGS) sequence (Spur20041123)|
|Spur_v0.4||preliminary assembly of WGS sequence (Spur20050323)|
|Spur_v0.5||[NCBI build 1.1] - assembly of WGS sequence (Spur20050415) used for initial annotation and analysis of the genome|
|Spur_v2.0||interim assembly of BAC plus WGS sequence|
|Spur_v2.1||[NCBI build 2.1] - assembly of BAC plus WGS sequence|
The Spur_v0.5 assembly covers 84-97% (EST representation) of the ~800 Mb genome, with a 13% redundancy level. The Spur_v2.1 assembly covers 90-94% of the genome with 5% redundancy. All the above assemblies are available for download using the FTP Data link in the sidebar to the right. Genome browsers for this data are listed in the resource section below.
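To put the coverage percentages above in absolute terms, the following is a minimal arithmetic sketch. It assumes the genome size is exactly 800 Mb (the text gives only an approximate figure), and the function name `covered_range_mb` is hypothetical. Redundancy is not modeled, because the text does not say how it was measured.

```python
# Approximate genome size in megabases; the source states "~800 Mb".
GENOME_MB = 800

# Coverage ranges (low, high) as fractions, taken from the text above.
ASSEMBLY_COVERAGE = {
    "Spur_v0.5": (0.84, 0.97),
    "Spur_v2.1": (0.90, 0.94),
}


def covered_range_mb(genome_mb, low, high):
    """Return the (low, high) amount of genome covered, in whole megabases."""
    return (round(genome_mb * low), round(genome_mb * high))


for name, (low, high) in ASSEMBLY_COVERAGE.items():
    lo_mb, hi_mb = covered_range_mb(GENOME_MB, low, high)
    print(f"{name}: roughly {lo_mb}-{hi_mb} Mb of ~{GENOME_MB} Mb covered")
```

Under these assumptions, Spur_v2.1 narrows the estimated covered range compared with Spur_v0.5, in line with the tighter percentage range quoted in the text.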
This project is funded by the National Human Genome Research Institute (NHGRI), National Institutes of Health. The white paper describing this project was developed by Eric Davidson and Andrew Cameron at California Institute of Technology in collaboration with the sea urchin genome advisory group and the BCM-HGSC.
The S. purpuratus genome was sequenced using the Clone-Array Pooled Shotgun Sequencing (CAPSS) method, in which shotgun libraries are made from row and column pools of arrayed BACs from an FPC-generated tiling path (BC Genome Sciences Center). The deconvoluted individual BAC sequences, as well as assemblies enriched with WGS sequences from the BAC region, are available in GenBank.
Preliminary genome assemblies, as well as the published version Spur_v2.1 [NCBI build 2.1] and the later version Spur_v2.6 with additional scaffolding, are available from the NCBI and the HGSC FTP site.
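The deconvolution step in a pooled design of this kind can be illustrated with a simplified sketch. It is not the consortium's pipeline: it assumes reads have already been assembled into contigs and that each read is labeled with the row or column pool it came from. A contig is then assigned to the BAC at the intersection of the single row and single column that contributed its reads. The function name `deconvolute` and the input layout are hypothetical.

```python
def deconvolute(contig_reads, read_pool):
    """Assign each contig to a BAC position in the clone array.

    contig_reads: dict mapping contig id -> list of read ids in that contig.
    read_pool:    dict mapping read id -> ("row", index) or ("col", index).

    Returns a dict mapping contig id -> (row, col), or None when the contig
    cannot be placed unambiguously (no row, no column, or several of either).
    """
    assignments = {}
    for contig, reads in contig_reads.items():
        # Collect the distinct row pools and column pools seen in this contig.
        rows = {read_pool[r][1] for r in reads if read_pool[r][0] == "row"}
        cols = {read_pool[r][1] for r in reads if read_pool[r][0] == "col"}
        if len(rows) == 1 and len(cols) == 1:
            # Exactly one row and one column: the BAC sits at their intersection.
            assignments[contig] = (next(iter(rows)), next(iter(cols)))
        else:
            assignments[contig] = None
    return assignments


# Toy example with made-up read and contig names.
example_contigs = {
    "contig1": ["r1", "r2", "r3"],  # row 0 and column 3 only
    "contig2": ["r4"],              # row reads only, so no column is known
    "contig3": ["r5", "r6", "r7"],  # two different rows, so ambiguous
}
example_pools = {
    "r1": ("row", 0), "r2": ("col", 3), "r3": ("row", 0),
    "r4": ("row", 1),
    "r5": ("row", 0), "r6": ("row", 1), "r7": ("col", 2),
}
example_result = deconvolute(example_contigs, example_pools)
```

In this sketch, contigs that cannot be placed are returned as `None` rather than guessed. A real pipeline would also have to handle overlapping BACs in the tiling path, which legitimately share sequence across more than one row or column.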
||Distance to S. purpuratus||cDNA||Genome Assembly|
||Spur_v2.1 - WGS + BAC draft assembly Spur_v2.5 – with SOLiD scaffolding data
|Allocentrotus fragilis||Planned comparative Illumina for indels|
|Lytechinus variegatus||50 mya
|Eucidaris tribuloides||250 mya||Planned comparative Illumina for indels|
|Dermasterias imbricata||Near P. miniata|
Sequence reads are available in the NCBI Trace and Sequence Read archives. Species names in the table are linked to the NCBI Taxonomy pages, where there are links to the read data and, when available, to assembled genome and transcript sequences.
The genome assembly, Spur_v2.1, is available for download as linearized scaffolds and as individual contig files. The scaffolds are not placed on chromosomes.
The Spur_v2.1 assembly and its annotated features, including gene predictions and curated gene models, are available for browsing and download via the Genboree Sea Urchin site by using the link in the sidebar. The annotation database can also be queried directly.
Traces are available from the NCBI Trace Archive by using the link in the sidebar, or by using NCBI MegaBLAST with a same-species or cross-species query.
BAC-based Data Resources
Individual BAC assemblies are available in GenBank as enriched BAC assemblies.