Ensembl Genome Repository.

1 Ensembl Genome Repository

2 Main Data Repositories
Ensembl: BLAST or BLAT
UCSC: BLAT
NCBI (Entrez): BLAST
Ensembl, NCBI, and UCSC use the same human genome assembly, which is generated by NCBI; the display and release timing differ between sites
Why use Ensembl?

3 Ensembl
Provide automatic annotation of sequenced genomes
Integrate with biological data
Make available from web
Genome Browser web interface
BioMart
Direct database access
Perl API

4 Outline
Where the data comes from
Questions that can be answered

5 Ensembl Genomes
Over 50 genomes

6 Genome Annotation
Identify elements on the genome
Attach biological information to the elements
Automatic annotation and curation
Vega/Havana

7 Annotation
Addition of positional, functional, regulatory and evolutionary datasets to a raw assembled genome.
Genes, exon-intron boundaries, protein products, miRNAs, alternative splicing, transcriptional start sites, expression, orthologs, paralogs, repeats, structural features, syntenic relationships, ChIP-chip data ...
Based on experimental data and computational predictions.
CDS definition: a mix of experimental and computationally derived sources

8 Genebuild
Align species-specific proteins to the genome to create CDS models (targeted build)
Align proteins from closely related species to locate additional CDS models (similarity build)
Add UTRs using cDNA/EST evidence and ditag data
Cluster transcripts into genes
Classify transcripts
Name genes
A new genebuild for a species is only done if there is a new genome assembly or lots of new supporting evidence
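The "cluster transcripts into genes" step can be pictured with a small Python sketch. The slides do not state the clustering criteria the pipeline actually uses, so this version assumes a deliberately simple rule: transcripts on the same strand whose genomic spans overlap belong to one gene. The `Transcript` type and the `cluster_transcripts` function are hypothetical names made up for illustration and are not part of any Ensembl API.

```python
from typing import Dict, List, NamedTuple


class Transcript(NamedTuple):
    name: str
    strand: int  # +1 or -1
    start: int   # genomic start, inclusive
    end: int     # genomic end, inclusive


def cluster_transcripts(transcripts: List[Transcript]) -> List[List[str]]:
    """Group transcripts whose spans overlap on the same strand.

    Assumption: span overlap is a stand-in for whatever criteria the
    real genebuild applies; it is used here only to show the idea.
    """
    by_strand: Dict[int, List[Transcript]] = {}
    for t in transcripts:
        by_strand.setdefault(t.strand, []).append(t)

    clusters: List[List[str]] = []
    for strand in sorted(by_strand):
        current: List[str] = []
        current_end = 0
        # Sweep left to right, extending the open cluster while spans overlap.
        for t in sorted(by_strand[strand], key=lambda t: (t.start, t.end)):
            if current and t.start <= current_end:
                current.append(t.name)
                current_end = max(current_end, t.end)
            else:
                if current:
                    clusters.append(current)
                current = [t.name]
                current_end = t.end
        if current:
            clusters.append(current)
    return clusters
```

Each returned list of names corresponds to one putative gene under the overlap assumption; a transcript on the opposite strand never joins a cluster, even if its coordinates overlap.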

9 Human/Mouse Genebuild
The human and mouse genebuilds include additional steps not included in the standard Ensembl build. For both species, transcripts from the Consensus Coding Sequence (CCDS) set are imported directly and not altered by the genebuild process. In addition, where manual curation is available for a transcript, the Ensembl and HAVANA transcript models are compared, and the two models are merged when they agree on the same coding sequence.
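The merge rule for Ensembl and HAVANA models can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's code: "agree on the same coding sequence" is read here as identical strand and identical coding-segment coordinates, and the `Model` type, the `merge_models` function, and the `"ensembl_havana"` label for a merged model are all hypothetical names chosen for the example.

```python
from typing import List, NamedTuple, Tuple


class Model(NamedTuple):
    source: str                              # e.g. "ensembl" or "havana"
    strand: int                              # +1 or -1
    cds_exons: Tuple[Tuple[int, int], ...]   # coding segments as (start, end)


def same_coding_sequence(a: Model, b: Model) -> bool:
    """Assumed test: same strand and the same set of coding segments."""
    return a.strand == b.strand and sorted(a.cds_exons) == sorted(b.cds_exons)


def merge_models(ensembl: Model, havana: Model) -> List[Model]:
    """Return one merged model if the CDS agrees, else both models unchanged."""
    if same_coding_sequence(ensembl, havana):
        merged_exons = tuple(sorted(ensembl.cds_exons))
        return [Model("ensembl_havana", ensembl.strand, merged_exons)]
    return [ensembl, havana]
```

Models that differ in even one coding boundary are kept side by side as separate transcripts in this sketch.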

10 Ensembl Identifiers
ENS_Species_Type_00000_ID
Species: blank for human; for all other species, a three-letter code (MUS - mouse)
Type: G (gene), T (transcript), P (protein)
ID: six-digit number
Mouse prefixes: ENSMUST, ENSMUSP, ENSMUSG
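The identifier layout can be checked with a short Python sketch. It assumes exactly the scheme on the slide: the literal `ENS`, an optional three-letter species code, one type letter out of G, T, or P, and eleven digits in total (the `00000` block followed by the six-digit ID). Identifiers that use other type letters or longer species codes fall outside this pattern. `StableId` and `parse_stable_id` are hypothetical names, and the sample identifiers in the usage note only illustrate the format.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout from the slide: ENS + [three-letter species] + type + 11 digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTP])(\d{11})$")

TYPE_NAMES = {"G": "gene", "T": "transcript", "P": "protein"}


class StableId(NamedTuple):
    species_code: Optional[str]  # None means human (blank species field)
    feature_type: str            # "gene", "transcript", or "protein"
    number: int                  # numeric part with leading zeros dropped


def parse_stable_id(identifier: str) -> Optional[StableId]:
    """Split an identifier into its parts, or return None if it does not fit."""
    match = STABLE_ID.match(identifier)
    if match is None:
        return None
    species, type_letter, digits = match.groups()
    return StableId(species, TYPE_NAMES[type_letter], int(digits))
```

For example, `parse_stable_id("ENSMUSG00000017167")` yields a species code of `MUS` and a feature type of `gene`, while a bare prefix such as `ENSMUST` returns `None` because the numeric part is missing.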

11 Ensembl Organization
Views are organized into four classes:
Gene
Transcript
Location (Genome Browser)
Variation
Not covered:
What’s the conservation track?
How do I zoom in and change the gene focus?
Un-stacking a track (e.g. human cDNAs)
Adding a track (e.g. variations)

12 Questions
Are there splice variants?
How do I find orthologs and paralogs?
Are there variations in the genomic sequence?
How can I download different parts of the mRNA sequence?
What protein domains exist?
Gene Ontology
Can I download sets of data (DNA, cDNA, protein) for a species?
BioMart question

13 Resources
Ensembl Tutorials
Ensembl 2009, Nucleic Acids Research, PMID:
Bert Overduin, Ph.D., Ensembl

