Browsing Genes and Genomes with Ensembl


1. Browsing Genes and Genomes with Ensembl
Bert Overduin, Ensembl User Support
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

2. Course Schedule
- Introduction
- Website walk-through
- Coffee
- Exercises
- BioMart
- Lunch
- Exercises
- GeneBuild
- Tea
- Variations / Compara
- Exercises

3. Ensembl Workshops

4. EMBL-EBI, Hinxton, Cambridge

5. Wellcome Trust Genome Campus, Hinxton, Cambridge
© John Freebrey (www.thedigitaldarkcloth.com)

6. (no text on this slide)

7. Cambridge
© Sean T. McHugh (www.cambridgeincolour.com)

8. A Bit of History: Sequenced genomes
Year  Organism                Genome size
1995  Haemophilus influenzae  1.8 Mb
1996  Yeast                   12 Mb
1998  C. elegans              100 Mb
1999  Fruit fly               125 Mb
2000  Arabidopsis             115 Mb
2001  Human (draft)
2002  Mouse                   2.6 Gb
2004  Human ("finished")      3 Gb

9. A Bit of History
http://www.genomesonline.org/

10. Annotation
Wikipedia: "Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. identifying elements on the genome, a process called Gene Finding, and
2. attaching biological information to these elements.
Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline."

11. Ensembl - Goals
- Provide automatic annotation of genomic sequence
- Integrate other biological data
- Make data available to all via the web

12. Ensembl - Organisation
- Joint project between the European Bioinformatics Institute (EMBL-EBI) and the Wellcome Trust Sanger Institute
- Started in 1999 for the Human Genome Project
- Funded primarily by the Wellcome Trust, with additional funding from EMBL, EU, NIH-NIAID, BBSRC and MRC
- Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)
- Uses the largest dedicated computer system in biology in Europe

13. Genome Browsers
- Ensembl Genome Browser: http://www.ensembl.org
- NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/
- UCSC Genome Browser: http://genome.ucsc.edu

14. NCBI Map Viewer

15. UCSC Genome Browser

16. Ensembl Genome Browser

17. What Distinguishes Ensembl from the UCSC and NCBI Browsers?
- Automatic annotation for those species for which no manually curated gene set exists
- Direct database access and programmatic access via the Perl API
- Not only the data, but also the software source code is open source

18. Caveats
- While genome browsers can be very useful tools, they do not provide the definitive answer to every question!
- Data is fluid

19. Which Species Are Available?
- 36 chordates, ranging from mammals to 'primitive' chordates (Ciona intestinalis and Ciona savignyi)
- 3 key eukaryote model organisms:
  - fruitfly (Drosophila melanogaster)
  - nematode (Caenorhabditis elegans)
  - yeast (Saccharomyces cerevisiae)
- 2 insect pathogen vectors:
  - malaria mosquito (Anopheles gambiae)
  - yellow fever / dengue mosquito (Aedes aegypti)

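20. Species in Ensembl
[Figure: tree of animal groups drawn against a geological time axis in MYBP (million years before present). Axis: Cambrian 570, Ordovician 505, Silurian 438, Devonian 408, Carboniferous 360, Permian 286, Triassic 245, Jurassic 208, Cretaceous 144, Tertiary 65. Group labels: fishes (teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans), amphibians, reptiles (crocodiles, turtles, lizards), birds (paleognaths, passerines, other birds), mammals (placentals, monotremes, marsupials), non-vertebrates.]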

21. More Species to Come ...
- Oikopleura
- Gorilla
- Zebrafinch
- Orangutan
- Marmoset
- Amphioxus
- Acorn worm
- Hyrax
- Megabat
- Dolphin
- Tarsier
- Kangaroo rat
- Chinese pangolin
- Two toed sloth
- Llama
- Flying lemur

22. Which Data Are Available?
- Genomic sequence
- Gene/transcript/peptide models
- External references
- Mapped cDNAs, peptides, micro array probes, BAC clones etc.
- Other features of the genome: cytogenetic bands, markers, repeats etc.
- Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions
- Variation data: SNPs
- Regulatory data: "best guess" set of regulatory elements
- Data from external sources (DAS)

23. Gene/Transcript/Peptide Models
- Manual annotation
  - For parts of genomes: human, dog, mouse, zebrafish ("Vega genes")
  - For complete genomes: fruitfly (FlyBase), C. elegans (WormBase), yeast (SGD)
- Automatic predictions ("Ensembl genes")
- EST predictions
- Ab initio predictions (GENSCAN, SNAP)

24. Biological Evidence
All Ensembl gene predictions are based on experimental evidence:
- UniProt/Swiss-Prot: a manually curated database and therefore of highest accuracy
- NCBI RefSeq: a partially manually curated database
- UniProt/TrEMBL: automatically annotated translations of EMBL coding sequence (CDS) features
- EMBL / GenBank / DDBJ: primary nucleotide sequence repository

25. The Ensembl Genebuild
Genome assembly + computer programs + experimental evidence -> Ensembl genes

26. Ensembl Identifiers
ENSG###  Ensembl Gene ID
ENST###  Ensembl Transcript ID
ENSP###  Ensembl Peptide ID
ENSE###  Ensembl Exon ID
ENSF###  Ensembl Family ID
ENSR###  Ensembl Regulatory Feature ID
For species other than human a suffix is added to ENS: MUS for mouse (Mus musculus): ENSMUSG###, DAR for zebrafish (Danio rerio): ENSDARG###, etc.
For imported genes Ensembl uses the original identifiers.
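
The identifier scheme on this slide can be illustrated with a small parser. This is only a sketch in Python and not part of the original slides. The feature letters and the MUS and DAR codes come from the slide; the function name `parse_stable_id`, the regular expression, and the assumption that a species code is three or more uppercase letters are illustrative choices. The numeric parts in the examples are placeholders.

```python
import re
from typing import NamedTuple, Optional

# Feature-type letters as listed on the slide.
FEATURE_TYPES = {
    "G": "gene",
    "T": "transcript",
    "P": "peptide",
    "E": "exon",
    "F": "family",
    "R": "regulatory feature",
}

# Only the two species codes named on the slide; real releases have many more.
SPECIES_CODES = {"MUS": "Mus musculus", "DAR": "Danio rerio"}

# Assumption: "ENS", an optional species code of 3+ uppercase letters,
# one feature letter, then a run of digits.
_ID_PATTERN = re.compile(r"^ENS([A-Z]{3,})?([GTPEFR])(\d+)$")


class StableId(NamedTuple):
    species_code: Optional[str]  # None means no code, i.e. human
    feature_type: str
    number: str


def parse_stable_id(identifier: str) -> Optional[StableId]:
    """Split an Ensembl-style identifier, or return None if it does not fit."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        return None
    species_code, letter, number = match.groups()
    return StableId(species_code, FEATURE_TYPES[letter], number)


if __name__ == "__main__":
    for example in ("ENSG00000139618", "ENSMUSG00000000001", "BRCA2"):
        print(example, parse_stable_id(example))
```

Identifiers imported from other databases do not follow this pattern, so the function returns `None` for them rather than guessing.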

27. Access to Genome Annotation
- Release web site: http://www.ensembl.org/
- Pre-Release: http://pre.ensembl.org/
- Archive: http://archive.ensembl.org
- BioMart: http://www.ensembl.org/Multi/martview
- Downloads: ftp://ftp.ensembl.org/
- MySQL interface: ensembldb.ensembl.org
- Perl API: http://www.ensembl.org/info/software/

28. Pre! and Archive! Sites

29. BioMart Data Mining Tool

30. Downloads
ftp://ftp.ensembl.org/pub
http://www.ensembl.org/info/data/download.html
- FASTA files: plain sequence
  - DNA (assembly, masked and unmasked)
  - cDNA (Ensembl and ab initio predictions)
  - Peptides (Ensembl and ab initio predictions)
  - RNA (non-coding RNA predictions)
- Flatfiles: annotated 1 Mb slices
  - EMBL format
  - GenBank format
- MySQL: database table dumps
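
As a rough illustration of working with the FASTA downloads listed above, the following Python sketch reads FASTA records from any iterable of text lines. It is not from the slides: the function name `read_fasta` and the sample record are made up, and it assumes only the generic FASTA layout (a `>` header line followed by one or more sequence lines), not any Ensembl-specific header format.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record sample, for illustration only.
SAMPLE = """>seq1 example description
ACGTACGT
ACGT
>seq2
TTTT
"""

if __name__ == "__main__":
    for name, sequence in read_fasta(SAMPLE.splitlines()):
        print(name, len(sequence))
```

For a downloaded file, the same function can be fed an open file handle; if the file is gzip-compressed, `gzip.open(path, "rt")` from the standard library provides one.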

31. MySQL
SQL = Structured Query Language
Needed:
- MySQL client program (http://www.mysql.com)
- Ability to write MySQL queries
- Knowledge of database schema
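
To make the MySQL route a little more concrete, here is a hedged Python sketch that only builds (and does not run) a command line for the standard `mysql` client. The host name comes from the slides; the `anonymous` user, port 3306, the `homo_sapiens_core_` database name prefix, and the helper name `mysql_command` are assumptions that should be checked against the current Ensembl documentation.

```python
import shlex
from typing import List

HOST = "ensembldb.ensembl.org"  # MySQL interface named in the slides
USER = "anonymous"              # assumption: public read-only account
PORT = 3306                     # assumption: default MySQL port


def mysql_command(query: str, host: str = HOST, user: str = USER,
                  port: int = PORT) -> List[str]:
    """Build an argument list for the mysql client that runs one query."""
    return [
        "mysql",
        f"--host={host}",
        f"--user={user}",
        f"--port={port}",
        "--execute",
        query,
    ]


# Assumption: core databases are named after the species, e.g. homo_sapiens_core_...
LIST_HUMAN_CORE = "SHOW DATABASES LIKE 'homo_sapiens_core_%'"

if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to execute it.
    print(shlex.join(mysql_command(LIST_HUMAN_CORE)))
```

Building the command as a list rather than a single string avoids shell quoting problems when the query contains quotes or wildcards.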

32. Perl API
API = Application Programming Interface
Needed:
- BioPerl modules
- Ensembl modules
- Ability to code in Perl
For more information (installation instructions, tutorials, documentation etc.): http://www.ensembl.org/info/software/index.html

33. Ensembl BLAST
- WU-BLAST 2.0: search against assemblies, Ensembl predictions or ab initio predictions
- BLAT (BLAST-like Alignment Tool) and SSAHA2 (Sequence Search and Alignment by Hashing Algorithm): very fast search against assemblies for (almost) exact DNA-DNA matches
- Search against one or multiple species
- Search up to 30 sequences simultaneously

34. Ensembl Accounts
- Personalise Ensembl by saving bookmarks, view configurations and homepage preferences in a user account
- Share bookmarks and configurations by setting up groups
Please note that all Ensembl data remains free access. It is not necessary to register in order to gain access to Ensembl data!

35. Website Statistics
- On average 1,000,000 page impressions / week
- Top 3 species:
- Top 3 countries:

36. Ensembl - Open Source
- Data and software freely available
- More than 50 installs worldwide
- Academia and industry
- Local or available via the web
- Mirrors with Ensembl data, e.g. http://ensembl.genome.tugraz.at/index.html, or user projects with own data

37. Powered by Ensembl

38. What If I Need Help?
- Helpdesk: helpdesk@ensembl.org
- Workshops on use of the browser or the API
- Mailing lists: ensembl-dev@ebi.ac.uk, ensembl-announce@ebi.ac.uk
- 'Geek for a week' program
- Animated tutorials: http://www.ensembl.org/common/Workshops_Online

39. Ensembl Team
- Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard
- Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios
- Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)
- Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Dace Ruklisa, Daniel Zerbino
- Vectorbase Annotation: Martin Hammond, Dan Lawson, Karyn Megy
- Zebrafish Annotation: Kerstin Howe, Tina Eyre, Ian Sealy
- Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Felix Kokocinski, Jan-Hinnerck Vogel, Simon White
- Comparative Genomics: Javier Herrero, Benoit Ballester, Kathryn Beal, Stephen Fitzgerald, Albert Vilella
- Web Team: James Smith, Fiona Cunningham, Anne Parker, Bethan Pritchard, Stephen Rice, Steve Trevanion
- Outreach & QC: Xosé M Fernández, Bert Overduin, Michael Schuster, Giulietta Spudich
- Distributed Annotation System (DAS): Eugene Kulesha, Andy Jenkinson
- BioMart: Arek Kasprzyk, Syed Haider, Richard Holland, Damian Smedley
- Database Schema and Core API: Glenn Proctor, Andreas Kähäri, Ian Longden, Patrick Meidl

40. Ensembl Team on the river Cam, 2006

41. Ewan Birney

42. Q & A
Questions, Answers

