1
Ensembl Database and Web Browser
www.ensembl.org
Stephen Baird, Apoptosis Research Centre, Children’s Hospital of Eastern Ontario
Erin Pleasance, February 21, 2005
Lecture/Lab 7.3 (c) 2005 CGDN
2
Focus on vertebrates
No fungi/plants
Brassica/Arabidopsis genome browser is at Lecture 7.1
3
What is Ensembl?
Joint project of EBI and Sanger
Automated annotation of eukaryotic genomes
Open source software
Relational database system
Web interface
“The main aim of this campaign is to encourage scientists across the world - in academia, pharmaceutical companies, and the biotechnology and computer industries - to use this free information.” - Dr. Mike Dexter, Director of the Wellcome Trust
4
Ensembl components
Data:
  Chromosomes (FeatureView, KaryoView, CytoView, MapView)
  SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView)
  Diseases (DiseaseView)
  Genome Sequence (ContigView)
  Genes (GeneView, TransView, ExonView, GeneSeqView)
  Markers (MarkerView)
  Functions (GOView)
  Protein (ProtView, DomainView, FamilyView)
  Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView)
  Other Annotations
Search tools:
  Sequence Similarity (BLAST, SSAHA)
  Text (TextView)
  Anything (BioMart/MartView)
5
Ensembl Gene Annotation
“Basis for initial analysis and publication of most vertebrate genomes”
Genome assembly from NCBI
Gene build system
  Targeted gene builds predict known genes
  Similarity gene builds predict novel genes
6
Curwen et al., Genome Res 14: 942-950, 2004
7
Targeted gene build
Align known proteins with pmatch and BLAST
Incorporate aligned cDNA sequences to find splice sites, UTRs with genewise
ContigView of best-in-genome gene with associated evidence
Figure labels: UTRs predicted; known gene (p53); proteins aligned; UniGene clusters aligned; cDNAs aligned
8
Similarity gene build
Identify novel exons ab initio using Genscan
Confirm exons by BLAST to known proteins, mRNAs, UniGene clusters
ContigView of homology gene with associated evidence
Figure labels: novel gene; Genscan predictions; proteins aligned; UniGene clusters aligned
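The confirmation step on this slide can be pictured as an interval-overlap check: an ab initio exon is kept only if some aligned evidence overlaps it. The sketch below is a simplification for illustration, not the Ensembl gene build code. The names `Interval`, `overlaps` and `confirm_exons` are hypothetical, and the coordinates are made up.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Closed genomic interval on one sequence (1-based, inclusive)."""
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def confirm_exons(predicted: List[Interval], evidence: List[Interval]) -> List[Interval]:
    """Keep predicted exons supported by at least one evidence alignment."""
    return [exon for exon in predicted
            if any(overlaps(exon, hit) for hit in evidence)]


# Hypothetical coordinates, for illustration only.
genscan_exons = [Interval(100, 250), Interval(400, 520), Interval(900, 1000)]
blast_hits = [Interval(120, 240), Interval(950, 1100)]

confirmed = confirm_exons(genscan_exons, blast_hits)
print(confirmed)
```

In this toy input the middle exon has no supporting alignment, so it is dropped. A real build would also weigh alignment quality and splice-site consistency, which this sketch ignores.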
9
Ensembl Gene Annotation
Resulting “Ensembl genes” are highly accurate with low false positive rates
Ensembl human gene identifiers are 95% stable between builds
Ensembl and RefSeq differ with % of the genes
The Consensus CDS (CCDS) project is a collaborative effort between Ensembl/EBI, UCSC and NCBI to identify a core set of human protein coding regions that are consistently annotated and of high quality (~13,000 genes).
10
Manually curated genes: VEGA
Some chromosomes contain manually curated genes from the VEGA database
The “Otter manual annotation system” allows annotators in the Human and Vertebrate Annotation (HAVANA) group at the Sanger center to integrate automatic and manual annotations (e.g. from Apollo) into Ensembl
Figure label: VEGA gene
11
Ensembl EST genes
ESTs not accurate enough to produce Ensembl genes, but important for identifying alternative transcripts
ESTs aligned to genome and merged to create an independent set of “EST genes”
Figure labels: known gene; EST genes; UniGene clusters aligned
12
Pseudogenes
Processed pseudogenes in annotation identified (lack of introns, frameshifts, presence of multi-exon version elsewhere in genome, etc.)
Figure label: pseudogene
13
Noncoding RNA Genes
Genes with no ORFs that are functional (tRNAs, rRNAs, miRNAs …)
7220 annotations from Sean Eddy and Tom Jones
Figure labels: miRNAs; coding gene
14
Example 1: Exploring Caspase-3
Aim to demonstrate basic browsing and views
Caspase-3 is a gene involved in apoptosis (cell suicide)
We will look at:
  Gene annotation
  SNPs
  Orthologs and genome alignments
  Alternative transcripts and EST genes
  Protein Structure
15
Text Search
Species-specific homepage
Gene caspase-3
16
GeneView
Figure labels: GeneSpliceView; GeneRegulationView; ContigView; GeneSNPView; TransView of transcript; ExonView; ExportView; ProteinView
Orthologs predicted by sequence similarity and synteny
17
GeneView
DAS (Distributed Annotation System): external annotation of splicing, transcripts, array expression, PubMed links, associated phenotypes, Protonet, Reactome, UniProt.
Information for each transcript: similarity matches, links to RefSeq, OMIM, PDB, array probes, GO, InterPro, Protein FamilyView, transcript structure, protein properties.
18
GeneView
GeneSNPView
19
GeneSNPView
20
Other SNP/Haplotype tools
SNPView: info on a single SNP
ProteinView: protein sequence with SNP markup
LDView: view linkage disequilibrium (only limited regions)
HaploView: view haplotypes (only limited regions)
21
GeneView Click Back to
22
ContigView
Figure labels: chromosome and bands; sequence contigs; to Detailed view
23
ContigView: Detailed View
See other tracks, options in menus
Gene annotations:
  Genscan predictions
  Targeted gene predictions (2 alternative transcripts)
  EST genes
Other tracks: aligned sequences etc.
Base View Region
24
ContigView: Features menu
Export image (ps, pdf, svg) or fasta file
Click on ‘close menu’
25
MultiContigView
Figure labels: conserved regions; rat ortholog
26
Other Comparative Genomics Tools
Up to 6 genome alignments with MLAGAN in AlignSliceView
Other view is SyntenyView
Also access comparative genomics through EnsMart
27
DAS: Distributed Annotation System
28
Data Mining with BioMart
Allows very fast, cross-data source querying
Search for genes (features, sequences, etc.) or SNPs based on position, function, domains, similarity, expression, etc.
Accessible from Ensembl website (MartView) as well as stand-alone
Extremely powerful for data mining
29
Example 2: BioMart
A new disease locus has been mapped between markers D21S1991 and D21S171. It may be that the gene involved has already been identified as having a role in another disease. What candidates are in this region?
30
BioMart: Choosing your dataset
31
BioMart: Filtering
Chromosome 21, region between markers D21S1991 and D21S171
32
BioMart: Output
Note you can output different types of information
33
BioMart: Output
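The dataset, filter and output choices made in MartView can also be written down as a query document. BioMart servers accept queries as XML, and the sketch below only builds such a document locally. It does not contact any server. The dataset name `hsapiens_gene_ensembl`, the filter names `chromosome_name`, `marker_start` and `marker_end`, and the attribute names are assumptions that should be checked against the mart being queried. The helper `build_biomart_query` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_biomart_query(dataset: str,
                        filters: Dict[str, str],
                        attributes: List[str]) -> str:
    """Return a BioMart-style XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # tab-separated output
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():      # filters restrict which rows come back
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:                  # attributes choose the output columns
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Example 2 as a query: genes on chromosome 21 between the two markers.
# All dataset, filter and attribute names here are assumed, not confirmed.
xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={
        "chromosome_name": "21",
        "marker_start": "D21S1991",
        "marker_end": "D21S171",
    },
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

Changing the attribute list is the scripted equivalent of picking a different output type in MartView.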
34
Sequence Similarity Searching
Use SSAHA for exact matches (fast)
Use BLAST for more distant similarity (slow)
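Why SSAHA is fast for exact matches: it looks up short words (k-mers) in a precomputed hash table instead of aligning base by base. The sketch below illustrates only that hashing idea on a toy sequence and is not the SSAHA implementation. The function names and the example sequence are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_kmer_index(subject: str, k: int) -> Dict[str, List[int]]:
    """Map every non-overlapping k-mer of the subject to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(subject) - k + 1, k):
        index[subject[pos:pos + k]].append(pos)
    return index


def best_offset(query: str, index: Dict[str, List[int]], k: int) -> Optional[Tuple[int, int]]:
    """Return (subject_offset, number_of_kmer_hits) for the best diagonal, or None."""
    votes: Counter = Counter()
    for qpos in range(len(query) - k + 1):       # overlapping k-mers in the query
        for spos in index.get(query[qpos:qpos + k], []):
            votes[spos - qpos] += 1              # equal offsets lie on one diagonal
    if not votes:
        return None
    return votes.most_common(1)[0]


# Toy data, for illustration only.
subject = "TTGACCGATAGCTAGGCTTAACGGAT"
query = subject[6:20]                            # an exact substring of the subject
index = build_kmer_index(subject, k=4)
print(best_offset(query, index, k=4))
```

A single substitution in the query destroys every k-mer that spans it, which is why this style of search suits near-exact matches and BLAST remains the better choice for distant similarity.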
35
Looking for Help?
36
DAS: Getting your Own Data in Ensembl
DAS (Distributed Annotation System)
Anyone can load data into Ensembl and allow others to view it in the same view (e.g. ContigView) as other Ensembl annotations
Click on ‘Manage sources’ in DAS dropdown menu
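A DAS source is an ordinary web server that answers feature requests for a sequence segment, which is how a browser can overlay third-party annotation on its own tracks. The sketch below builds a DAS 1.x style `features` request URL and makes no network call. The server `http://das.example.org` and the source name `my_annotations` are placeholders, and `das_features_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def das_features_url(server: str, source: str,
                     seq_region: str, start: int, end: int) -> str:
    """Build a DAS-style features request for one segment of a sequence."""
    segment = f"{seq_region}:{start},{end}"      # reference:start,stop
    # Keep ':' and ',' unescaped so the segment stays readable in the URL.
    params = urlencode({"segment": segment}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{params}"


# Placeholder server and source; replace with a real DAS source before use.
url = das_features_url("http://das.example.org", "my_annotations", "21", 1000, 2000)
print(url)
```

The response to such a request is an XML list of features, which the browser draws alongside its own annotations.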
37
Other Ways to Access Ensembl
MySQL database directly accessible
APIs for Perl and Java
Other software:
  Apollo: Java genome annotation viewer/editor
  Sockeye: Java viewer
You can get your own local version of Ensembl: software and data freely available
Figure label: Sockeye
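Direct database access can be tried from the standard `mysql` command-line client. The sketch below only assembles the command and prints it, so nothing is executed and no connection is made. The host `ensembldb.ensembl.org` and the `anonymous` user are the commonly documented public settings. The port is an assumption that has changed between Ensembl releases, so check the current Ensembl documentation. `ensembl_mysql_command` is a hypothetical helper.

```python
import shlex
from typing import List


def ensembl_mysql_command(sql: str,
                          host: str = "ensembldb.ensembl.org",
                          user: str = "anonymous",
                          port: int = 3306) -> List[str]:
    """Return an argv list for the mysql client; the port default is an assumption."""
    return ["mysql", "--host", host, "--user", user,
            "--port", str(port), "--execute", sql]


# Listing databases is a harmless first query to see which releases are served.
cmd = ensembl_mysql_command("SHOW DATABASES")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` once the connection settings have been confirmed.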
38
Exercises
Ex 1. Homologues of human genes are often present in Fugu rubripes in more condensed form (with shorter introns). Is this true for the gene PTEN, a tumor suppressor often mutated in advanced cancers? Try MultiContigView; can you think of another way to get this information as well?
Ex 2. The microRNA bantam regulates the Drosophila (fruit fly) gene hid by binding the 3’ UTR. Hid is involved in apoptosis, and it is possible that binding sites for bantam could be found in the 3’ UTR of other apoptosis genes as well. Obtain the 3’ UTR sequence of all Drosophila genes known to be involved in apoptosis. Using BioMart, the GO term for apoptosis is GO: , evidence code TAS
Ex 3. The file “PCR_product.txt” on the webserver contains the sequence of a PCR product amplified from a mouse cDNA library. What gene does the product correspond to? Does it contain the complete coding sequence of that gene? Would it be better to use BLAST or SSAHA?