Lecture/Lab
Ensembl Database and Web Browser
  Erin Pleasance
  Canada’s Michael Smith Genome Sciences Centre, Vancouver
What is Ensembl?
  Joint project of EBI and Sanger
  Automated annotation of eukaryotic genomes
  Open source software
  Relational database system
  Web interface
  “The main aim of this campaign is to encourage scientists across the world - in academia, pharmaceutical companies, and the biotechnology and computer industries - to use this free information.” - Dr. Mike Dexter, Director of the Wellcome Trust
Ensembl components
  Data:
    – Genes (GeneView, TransView, ExonView, ProtView)
    – SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView)
    – Diseases (DiseaseView)
    – Markers (MarkerView)
    – Families (DomainView, FamilyView)
    – Functions (GOView)
    – Genome Sequence (ContigView)
    – Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView)
    – Chromosomes (ChromoView, KaryoView, CytoView, MapView)
    – Other Annotations
  Search tools:
    – Sequence Similarity (BLAST, SSAHA)
    – Text (TextView)
    – Anything (EnsMart)
Species in Ensembl
  Focus on vertebrates
  No fungi/plants
  Arabidopsis genome browser based on Ensembl at
  Vertebrates
    – Mammals: Human, Chimp, Mouse, Rat, Dog, Cow, Opossum
    – Fish: Zebrafish, Fugu Pufferfish, Tetraodon Pufferfish
    – Other: Chicken, Frog
  Invertebrates
    – Insects: Fruitfly, Mosquito, Honeybee
    – Other: Nematode
Ensembl Gene Annotation
  “Basis for initial analysis and publication of most vertebrate genomes”
  Genome assembly from NCBI
  Gene build system
    – Targeted gene builds predict known genes
    – Similarity gene builds predict novel genes
[Figure from Curwen et al, Genome Res 14: , 2004]
Targeted gene build
  Align known proteins with pmatch and BLAST
  Incorporate aligned cDNA sequences to find splice sites, UTRs with genewise
  [Figure: ContigView of best in genome gene with associated evidence. Labels: Known gene (p53); Proteins aligned; cDNAs aligned; UTRs predicted; Unigene clusters aligned]
Similarity gene build
  Identify novel exons ab initio using Genscan
  Confirm exons by BLAST to known proteins, mRNAs, UniGene clusters
  [Figure: ContigView of homology gene with associated evidence. Labels: Novel gene; GenScan predictions; Proteins aligned; Unigene clusters aligned]
Ensembl Gene Annotation
  Resulting “Ensembl genes” are highly accurate with low false positive rates
  Ensembl human gene identifiers are 95% stable between builds
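The slide does not say how identifier stability is measured. One plausible reading, assumed here, is the fraction of gene identifiers in the previous build that still exist in the current build. The sketch below computes that fraction; the function name and the identifiers in the example are hypothetical placeholders, not real Ensembl IDs.

```python
def identifier_stability(previous_ids, current_ids):
    """Fraction of identifiers from the previous build still present in the current one.

    Assumption: "stable" means the identifier survives unchanged between builds.
    """
    previous = set(previous_ids)
    if not previous:
        raise ValueError("previous build has no identifiers")
    retained = previous & set(current_ids)
    return len(retained) / len(previous)


# Placeholder identifiers, for illustration only.
old_build = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
new_build = ["GENE_A", "GENE_B", "GENE_C", "GENE_E"]
print(identifier_stability(old_build, new_build))  # 0.75
```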
Manually curated genes: VEGA
  Some chromosomes contain manually curated genes from the VEGA database
  Otter database/server allows integration of automatic and manual annotations (e.g. from Apollo)
  [Figure label: VEGA gene]
Ensembl EST genes
  ESTs not accurate enough to produce Ensembl genes, but important especially for identifying alternative transcripts
  ESTs aligned to genome and merged to create an independent set of “EST genes”
  [Figure labels: Known gene; Unigene clusters aligned; EST genes]
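The "aligned and merged" step can be pictured with a deliberately simplified sketch: overlapping EST alignments on the same strand are collapsed into one cluster. This is not the Ensembl EST gene build, which also has to consider splice structure; it only merges overlapping intervals. The function name and coordinates are made up for illustration.

```python
def merge_alignments(alignments):
    """Merge overlapping (start, end) intervals, inclusive coordinates, one strand."""
    merged = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1]:
            # Overlaps the current cluster: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new cluster.
            merged.append((start, end))
    return merged


# Hypothetical EST alignment coordinates.
print(merge_alignments([(150, 300), (100, 200), (500, 600)]))
# [(100, 300), (500, 600)]
```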
Pseudogenes
  Processed pseudogenes in annotation identified (lack of introns, frameshifts, presence of multi-exon version elsewhere in genome, etc.)
  [Figure label: Pseudogene]
Noncoding RNA Genes
  Genes with no ORFs that are functional (tRNAs, rRNAs, miRNAs …)
  7220 annotations from Sean Eddy and Tom Jones
  [Figure labels: miRNAs; Coding gene]
Example 1: Exploring Caspase-3
  Aim to demonstrate basic browsing and views
  Caspase-3 is a gene involved in apoptosis (cell suicide)
  We will look at:
    – Gene annotation
    – SNPs
    – Orthologs and genome alignments
    – Alternative transcripts and EST genes
Example 1: Exploring Caspase-3
  Go to human homepage
Species-specific homepage
  Statistics of current release
  Site map
Finding the tool/view: Site Map
Text Search
  [Screenshot labels: Species-specific homepage; caspase-3; Gene; Click; Back to]
GeneView
  [Screenshot labels: ContigView; ProteinView; ExonView; TransView of transcript; SNPView; ExportView]
GeneView
  Orthologs predicted by sequence similarity and synteny
  GeneDAS: Get data from external sources
GeneView
  Links to external databases
  On the same page, information provided for each transcript individually
GeneView
GeneSNPView
Other SNP/Haplotype tools
  SNPView
  ProteinView (protein sequence with SNP markup)
  LDView: View linkage disequilibrium (only limited regions)
  HaploView: View haplotypes (only limited regions)
GeneView
  [Screenshot labels: Click; Back to]
ContigView
  [Screenshot labels: Sequence contigs; Chromosome and bands]
ContigView: Detailed View
  Gene annotations
    – Targeted gene predictions (2 alternative transcripts)
    – EST genes
    – Genscan predictions
  Other tracks: Aligned sequences etc.
  See other tracks, options in menus
ContigView
MultiContigView
  [Screenshot labels: Rat ortholog; DNA sequence homology]
Other Comparative Genomics Tools
  Saw gene orthology, DNA homology
  Other view is SyntenyView
  Also access comparative genomics through EnsMart
Data Mining with EnsMart
  Allows very fast, cross-data source querying
  Search for genes (features, sequences, etc.) or SNPs based on
    – Position; function; domains; similarity; expression; etc.
  Accessible from Ensembl website (MartView) as well as stand-alone
  Extremely powerful for data mining
Example 2: EnsMart
  A new disease locus has been mapped between markers D21S1991 and D21S171. It may be that the gene involved has already been identified as having a role in another disease. What candidates are in this region?
Example 2: EnsMart
  EnsMart is based on BioMart
  OR
EnsMart: Choosing your dataset
EnsMart: Filtering
  [Screenshot labels: chromosome 21; markers D21S1991 and D21S171]
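Conceptually, the marker filter restricts the gene list to one chromosome and to the interval bounded by the two markers. The sketch below shows that logic on an in-memory list. It is not the EnsMart implementation or API; the `Gene` record, the function name, and all coordinates are hypothetical placeholders (they are not the real positions of D21S1991, D21S171, or any gene).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int
    end: int


def genes_between_markers(genes, chromosome, marker_a_pos, marker_b_pos):
    """Return genes on `chromosome` that overlap the interval between two marker positions."""
    region_start, region_end = sorted((marker_a_pos, marker_b_pos))
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.end >= region_start    # gene does not end before the region
        and g.start <= region_end    # gene does not start after the region
    ]


# Placeholder records and positions, for illustration only.
genes = [
    Gene("geneX", "21", 1_000, 2_000),
    Gene("geneY", "21", 5_000, 6_000),
    Gene("geneZ", "7", 1_500, 1_800),
]
hits = genes_between_markers(genes, "21", marker_a_pos=2_500, marker_b_pos=900)
print([g.name for g in hits])  # ['geneX']
```

The marker positions are sorted first, so the order in which the two markers are given does not matter.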
EnsMart: Output
  Note you can output different types of information, e.g. sequences
EnsMart: Output
Sequence Similarity Searching
  Use SSAHA for exact matches (fast)
  Use BLAST for more distant similarity (slow)
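SSAHA is a hashing-based search, which is why exact or near-exact matches are fast: candidate positions come from a precomputed table lookup rather than a scan. The toy sketch below illustrates the idea with a k-mer index over a single sequence. It is a simplification written for this handout, not the SSAHA algorithm itself; the function names, the choice of k, and the sequence are assumptions.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every k-mer in `sequence` to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_exact(query, sequence, index, k):
    """Return start positions of exact matches of `query`, seeded by its first k-mer."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    hits = []
    for pos in index.get(query[:k], []):
        # Verify the full query at each seed position.
        if sequence.startswith(query, pos):
            hits.append(pos)
    return hits


genome = "ACGTACGTTGCA"  # made-up sequence
index = build_kmer_index(genome, k=4)
print(find_exact("ACGTT", genome, index, k=4))  # [4]
```

A query with mismatches in its first k bases finds no seed here, which is the kind of case where a more sensitive (and slower) tool such as BLAST is the better choice.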
Finding anything else: Help
DAS: Getting your Own Data in Ensembl
  DAS (Distributed Annotation System)
    – Anyone can load data into Ensembl and allow others to view it in the same view (e.g. ContigView) as other Ensembl annotations
    – Some built-in DAS sources
  ldas.html
Other Ways to Access Ensembl
  MySQL database directly accessible
  APIs for Perl and Java
  Other software
    – Apollo Java genome annotation viewer/editor
    – Sockeye Java viewer
  You can get your own local version of Ensembl: software and data freely available
  [Figure label: Sockeye]
For more information
  Publications (listed at wiki/html/EnsemblDocs/EnsemblPublications.html)
    – Ensembl Special: Genome Research May 2004
    – Ensembl updates: NAR Jan
    – EnsMart: Kasprzyk et al, Genome Res Jan
  Documentation on how to download software and database:
    –
Exercises
  Homologues of human genes are often present in Fugu rubripes in more condensed form (with shorter introns). Is this true for the gene PTEN, a tumor suppressor often mutated in advanced cancers?
    – Try MultiContigView; can you think of another way to get this information as well?
  The microRNA bantam regulates the Drosophila (fruitfly) gene hid by binding the 3’ UTR. Hid is involved in apoptosis, and it is possible that binding sites for bantam could be found in the 3’ UTR of other apoptosis genes as well. Obtain the 3’ UTR sequence of all Drosophila genes known to be involved in apoptosis.
    – Using EnsMart, the GO term for apoptosis is GO: , evidence code TAS
  The file “PCR_product.txt” contains the sequence of a PCR product amplified from a mouse cDNA library. What gene does the product correspond to? Does it contain the complete coding sequence of that gene?
    – Would it be better to use BLAST or SSAHA?
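For the first exercise, one way to put a number on "more condensed" is to add up the gaps between consecutive exons of a transcript in each species and compare the totals. The sketch below assumes exon coordinates have already been exported (for example from ExonView or EnsMart) as inclusive (start, end) pairs. The function name and the coordinates are placeholders, not the real PTEN exon positions.

```python
def total_intron_length(exons):
    """Sum the gaps between consecutive exons given as inclusive (start, end) pairs."""
    ordered = sorted(exons)
    total = 0
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Bases strictly between two exons; overlapping exons contribute nothing.
        total += max(0, next_start - prev_end - 1)
    return total


# Placeholder coordinates for two hypothetical orthologous transcripts.
species_a_exons = [(100, 200), (301, 400), (1000, 1100)]
species_b_exons = [(10, 110), (131, 230), (251, 351)]
print(total_intron_length(species_a_exons))  # 699
print(total_intron_length(species_b_exons))  # 40
```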