Lecture/Lab 7.31

Presentation on theme: "Lecture/Lab 7.31"— Presentation transcript:

1 Lecture/Lab 7.31 http://creativecommons.org/licenses/by-sa/2.0/

2 Lecture/Lab 7.32 Ensembl Database and Web Browser Erin Pleasance Canada’s Michael Smith Genome Sciences Centre, Vancouver

3 Lecture 7.13 www.ensembl.org

4 Lecture/Lab 7.34 What is Ensembl? Joint project of EBI and Sanger; automated annotation of eukaryotic genomes; open source software; relational database system; web interface. “The main aim of this campaign is to encourage scientists across the world - in academia, pharmaceutical companies, and the biotechnology and computer industries - to use this free information.” - Dr. Mike Dexter, Director of the Wellcome Trust

5 Lecture/Lab 7.35 Ensembl components. Data: Genome Sequence (ContigView); Genes (GeneView, TransView, ExonView, ProtView); SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView); Diseases (DiseaseView); Markers (MarkerView); Families (DomainView, FamilyView); Functions (GOView); Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView); Other Annotations; Chromosomes (ChromoView, KaryoView, CytoView, MapView). Search tools: Sequence Similarity (BLAST, SSAHA); Text (TextView); Anything (EnsMart).

6 Lecture/Lab 7.36 Species in Ensembl. Focus on vertebrates; no fungi/plants (Arabidopsis genome browser based on Ensembl at http://atensembl.arabidopsis.info/). Vertebrates: Mammals (Human, Chimp, Mouse, Rat, Dog, Cow, Opossum); Fish (Zebrafish, Fugu Pufferfish, Tetraodon Pufferfish); Other (Chicken, Frog). Invertebrates: Insects (Fruitfly, Mosquito, Honeybee); Other (Nematode).

7 Lecture/Lab 7.37 Ensembl Gene Annotation. “Basis for initial analysis and publication of most vertebrate genomes”. Genome assembly from NCBI. Gene build system: –Targeted gene builds predict known genes –Similarity gene builds predict novel genes

8 Lecture/Lab 7.38 Curwen et al, Genome Res 14: 942-950, 2004

9 Lecture/Lab 7.39 Targeted gene build. Align known proteins with pmatch and BLAST. Incorporate aligned cDNA sequences to find splice sites, UTRs with genewise. (Figure: ContigView of best-in-genome gene with associated evidence. Labels: Known gene (p53); Proteins aligned; cDNAs aligned; UTRs predicted; Unigene clusters aligned.)

10 Lecture/Lab 7.310 Similarity gene build. Identify novel exons ab initio using Genscan. Confirm exons by BLAST to known proteins, mRNAs, UniGene clusters. (Figure: ContigView of homology gene with associated evidence. Labels: Novel gene; GenScan predictions; Proteins aligned; Unigene clusters aligned.)

11 Lecture/Lab 7.311 Ensembl Gene Annotation Resulting “Ensembl genes” are highly accurate with low false positive rates Ensembl human gene identifiers are 95% stable between builds

12 Lecture/Lab 7.312 Manually curated genes: VEGA. Some chromosomes contain manually curated genes from the VEGA database. The Otter database/server allows integration of automatic and manual annotations (e.g. from Apollo). (Figure label: VEGA gene.)

13 Lecture/Lab 7.313 Ensembl EST genes. ESTs are not accurate enough to produce Ensembl genes, but are important, especially for identifying alternative transcripts. ESTs are aligned to the genome and merged to create an independent set of “EST genes”. (Figure labels: Known gene; Unigene clusters aligned; EST genes.)
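
The "aligned and merged" step on the EST genes slide can be pictured as merging overlapping genomic intervals. The following is a minimal sketch of that idea only; it is not the Ensembl EST gene build, the coordinates are made up, and `merge_alignments` is a hypothetical helper name.

```python
def merge_alignments(intervals):
    """Merge overlapping (start, end) intervals; coordinates are inclusive."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous cluster: extend it if this hit reaches further.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new cluster.
            merged.append((start, end))
    return merged


# Hypothetical EST alignment coordinates on one strand of one chromosome.
est_hits = [(100, 250), (200, 400), (900, 1000), (950, 1200)]
clusters = merge_alignments(est_hits)
print(clusters)
```

A real pipeline would also have to respect strand and splice structure, which this sketch ignores.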

14 Lecture/Lab 7.314 Pseudogenes. Processed pseudogenes are identified in the annotation (lack of introns, frameshifts, presence of multi-exon version elsewhere in genome, etc.). (Figure label: Pseudogene.)

15 Lecture/Lab 7.315 Noncoding RNA Genes. Genes with no ORFs that are functional (tRNAs, rRNAs, miRNAs …). 7220 annotations from Sean Eddy and Tom Jones. (Figure labels: miRNAs; Coding gene.)

16 Lecture/Lab 7.316 Example 1: Exploring Caspase-3 Aim to demonstrate basic browsing and views Caspase-3 is a gene involved in apoptosis (cell suicide) We will look at: –Gene annotation –SNPs –Orthologs and genome alignments –Alternative transcripts and EST genes

17 Lecture/Lab 7.317 Example 1: Exploring Caspase-3 http://www.ensembl.org Go to human homepage

18 Lecture/Lab 7.318 Species-specific homepage Statistics of current release Site map

19 Lecture/Lab 7.319 Finding the tool/view: Site Map

20 Lecture/Lab 7.320 Text Search. (Screenshot labels: Species-specific homepage; caspase-3; Gene; Click; Back to.)

21 Lecture/Lab 7.321 GeneView ContigView ProteinView ExonView TransView of transcript SNPView ExportView

22 Lecture/Lab 7.322 GeneView Orthologs predicted by sequence similarity and synteny GeneDAS: Get data from external sources

23 Lecture/Lab 7.323 GeneView Links to external databases On the same page, information provided for each transcript individually

24 Lecture/Lab 7.324 GeneView

25 Lecture/Lab 7.325 GeneSNPView

26 Lecture/Lab 7.326 Other SNP/Haplotype tools SNPView ProteinView (protein sequence with SNP markup) LDView: View linkage disequilibrium (only limited regions) HaploView: View haplotypes (only limited regions)
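
For reference, linkage disequilibrium between two biallelic loci is commonly summarized with the standard population-genetics quantities below. This is textbook notation and not necessarily the exact statistic LDView displays; the symbols p_A and p_B (allele frequencies at the two loci) and p_AB (frequency of the haplotype carrying both alleles) are introduced here for illustration.

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A\,(1 - p_A)\,p_B\,(1 - p_B)}
```

When the alleles are statistically independent, D = 0 and therefore r^2 = 0.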

27 Lecture/Lab 7.327 GeneView Click Back to

28 Lecture/Lab 7.328 ContigView Sequence contigs Chromosome and bands

29 Lecture/Lab 7.329 ContigView: Detailed View. See other tracks, options in menus. (Figure labels: Gene annotations; Targeted gene predictions (2 alternative transcripts); EST genes; Genscan predictions; Other tracks: Aligned sequences etc.)

30 Lecture/Lab 7.330 ContigView

31 Lecture/Lab 7.331 MultiContigView Rat ortholog DNA sequence homology

32 Lecture/Lab 7.332 Other Comparative Genomics Tools. So far we saw gene orthology and DNA homology. Another view is SyntenyView. Comparative genomics can also be accessed through EnsMart.

33 Lecture/Lab 7.333 Data Mining with EnsMart. Allows very fast, cross-data source querying. Search for genes (features, sequences, etc.) or SNPs based on: –Position; function; domains; similarity; expression; etc. Accessible from the Ensembl website (MartView) as well as stand-alone. Extremely powerful for data mining.

34 Lecture/Lab 7.334 Example 2: EnsMart A new disease locus has been mapped between markers D21S1991 and D21S171. It may be that the gene involved has already been identified as having a role in another disease. What candidates are in this region?

35 Lecture/Lab 7.335 Example 2: EnsMart EnsMart is based on BioMart http://www.ensembl.org/Multi/martview OR http://www.ebi.ac.uk/BioMart/martview

36 Lecture/Lab 7.336 EnsMart: Choosing your dataset

37 Lecture/Lab 7.337 EnsMart: Filtering D21S1991 D21S171 21
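
The filter on this slide amounts to a region query: keep the genes on chromosome 21 that overlap the interval bounded by the two markers. A minimal sketch of that logic follows. The marker positions and gene records are placeholders invented for illustration (real coordinates would come from EnsMart), and `Feature` and `genes_between` are hypothetical names, not part of any Ensembl or BioMart API.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_between(genes, left, right):
    """Names of genes overlapping the region spanned by two markers."""
    if left.chrom != right.chrom:
        return []
    lo = min(left.start, right.start)
    hi = max(left.end, right.end)
    # A gene overlaps [lo, hi] if it starts before hi and ends after lo.
    return [g.name for g in genes
            if g.chrom == left.chrom and g.start <= hi and g.end >= lo]


# Placeholder coordinates, NOT the real positions of these markers.
d21s1991 = Feature("D21S1991", "21", 1000, 1100)
d21s171 = Feature("D21S171", "21", 5000, 5100)

# Made-up gene records.
genes = [
    Feature("GENE_A", "21", 500, 900),
    Feature("GENE_B", "21", 1050, 2000),
    Feature("GENE_C", "21", 3000, 4000),
    Feature("GENE_D", "21", 6000, 7000),
    Feature("GENE_E", "22", 2000, 3000),
]

candidates = genes_between(genes, d21s1991, d21s171)
print(candidates)
```

In EnsMart the same selection is made through the region filter, with a further filter on disease association to narrow the candidates.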

38 Lecture/Lab 7.338 EnsMart: Output Note you can output different types of information eg. sequences

39 Lecture/Lab 7.339 EnsMart: Output

40 Lecture/Lab 7.340 Sequence Similarity Searching Use SSAHA for exact matches (fast) Use BLAST for more distant similarity (slow)
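
The speed of exact-match searching comes from hashing: short words (k-mers) of the reference are indexed once, and a query is then located by lookup rather than by scanning. The sketch below illustrates that general idea on a toy sequence. It is a simplified illustration, not SSAHA's actual implementation, and the function names and the sequence are invented for the example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact(query, reference, index, k):
    """Start positions where the whole query matches the reference exactly."""
    if len(query) < k:
        return []
    hits = []
    # Look up the first k-mer of the query, then verify the full match.
    for pos in index.get(query[:k], []):
        if reference[pos:pos + len(query)] == query:
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGC"  # toy sequence
k = 4
index = build_kmer_index(reference, k)

print(find_exact("ACGTT", reference, index, k))
print(find_exact("ACGAT", reference, index, k))
```

A query whose seed k-mer carries a mismatch is simply not found, which is why a more tolerant (and slower) tool such as BLAST is preferred for distant similarity.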

41 Lecture/Lab 7.341 Finding anything else: Help

42 Lecture/Lab 7.342 DAS: Getting your Own Data in Ensembl. DAS (Distributed Annotation System): –Anyone can load data into Ensembl and allow others to view it in the same view (e.g. ContigView) as other Ensembl annotations –Some built-in DAS sources. http://www.ensembl.org/Docs/ldas.html

43 Lecture/Lab 7.343 Other Ways to Access Ensembl. MySQL database directly accessible. APIs for Perl and Java. Other software: –Apollo Java genome annotation viewer/editor –Sockeye Java viewer. You can get your own local version of Ensembl: software and data freely available. (Figure label: Sockeye.)

44 Lecture/Lab 7.344 For more information. Publications (listed at http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/EnsemblPublications.html): –Ensembl Special: Genome Research May 2004 –Ensembl updates: NAR Jan. 2002-2005 –EnsMart: Kasprzyk et al, Genome Res Jan. 2004. Documentation on how to download software and database: –http://www.ensembl.org/Docs/

45 Lecture/Lab 7.345 Exercises
Homologues of human genes are often present in Fugu rubripes in more condensed form (with shorter introns). Is this true for the gene PTEN, a tumor suppressor often mutated in advanced cancers?
–Try MultiContigView; can you think of another way to get this information as well?
The microRNA bantam regulates the Drosophila (fruitfly) gene hid by binding the 3’ UTR. Hid is involved in apoptosis, and it is possible that binding sites for bantam could be found in the 3’ UTR of other apoptosis genes as well. Obtain the 3’ UTR sequence of all Drosophila genes known to be involved in apoptosis.
–Use EnsMart; the GO term for apoptosis is GO:0006915, evidence code TAS
The file “PCR_product.txt” contains the sequence of a PCR product amplified from a mouse cDNA library. What gene does the product correspond to? Does it contain the complete coding sequence of that gene?
–Would it be better to use BLAST or SSAHA?
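
For the Fugu exercise, one way to compare gene structures once exon coordinates have been exported (for example from ExportView or EnsMart) is to compute intron lengths from consecutive exons. The following is a minimal sketch under that assumption. The exon coordinates are invented placeholders, not the real PTEN structure in either species, and `intron_lengths` is a hypothetical helper.

```python
def intron_lengths(exons):
    """Intron lengths from (start, end) exon coordinates (1-based, inclusive)."""
    exons = sorted(exons)
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


# Placeholder exon coordinates for one transcript in each species.
human_exons = [(100, 200), (501, 600), (1001, 1100)]
fugu_exons = [(100, 200), (251, 350), (401, 500)]

human_introns = intron_lengths(human_exons)
fugu_introns = intron_lengths(fugu_exons)

print(human_introns, sum(human_introns))
print(fugu_introns, sum(fugu_introns))
```

Comparing the two totals gives a quick measure of how much more compact one gene is than the other.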

