Browsing Genes and Genomes with Ensembl. Bert Overduin, Ensembl User Support, EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus.

Presentation transcript:

1 of 42 Browsing Genes and Genomes with Ensembl Bert Overduin Ensembl User Support EMBL Outstation European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge, UK

2 of 42 Course Schedule
- Introduction
- Website walk-through
- Coffee
- Exercises
- BioMart
- Lunch
- Exercises
- GeneBuild
- Tea
- Variations / Compara
- Exercises

3 of 42 Ensembl Workshops

4 of 42 EMBL-EBI Hinxton, Cambridge

5 of 42 Wellcome Trust Genome Campus Hinxton, Cambridge © John Freebrey (

6 of 42

7 of 42 © Sean T. McHugh ( Cambridge

8 of 42 A Bit of History
Sequenced genomes:
  1995  Haemophilus influenzae  1.8 Mb
  1996  Yeast                   12 Mb
  1998  C. elegans              100 Mb
  1999  Fruit fly               125 Mb
  2000  Arabidopsis             115 Mb
  2001  Human (draft)
  2002  Mouse                   2.6 Gb
  2004  Human (“finished”)      3 Gb

9 of 42 A Bit of History

10 of 42 Annotation
Wikipedia: Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
  1. identifying elements on the genome, a process called Gene Finding, and
  2. attaching biological information to these elements.
Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.

11 of 42 Ensembl - Goals
- Provide automatic annotation of genomic sequence
- Integrate other biological data
- Make data available to all via the web

12 of 42 Ensembl - Organisation
- Joint project between the European Bioinformatics Institute (EMBL-EBI) and the Wellcome Trust Sanger Institute
- Started in 1999 for the Human Genome Project
- Funded primarily by the Wellcome Trust, with additional funding from EMBL, EU, NIH-NIAID, BBSRC and MRC
- Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)
- Uses the largest dedicated computer system in biology in Europe

13 of 42 Genome Browsers
- Ensembl Genome Browser
- NCBI Map Viewer
- UCSC Genome Browser

14 of 42 NCBI Map Viewer

15 of 42 UCSC Genome Browser

16 of 42 Ensembl Genome Browser

17 of 42 What Distinguishes Ensembl from the UCSC and NCBI Browsers?
- Automatic annotation for those species for which no manually curated gene set exists
- Direct database access and programmatic access via the Perl API
- Not only the data, but also the software source code is open source

18 of 42 Caveats
- While genome browsers can be very useful tools, they do not provide the definitive answer to every question!
- Data is fluid

19 of 42 Which Species Are Available?
- 36 chordates, ranging from mammals to ‘primitive’ chordates (Ciona intestinalis and Ciona savignyi)
- 3 key eukaryote model organisms:
  - fruitfly (Drosophila melanogaster)
  - nematode (Caenorhabditis elegans)
  - yeast (Saccharomyces cerevisiae)
- 2 insect pathogen vectors:
  - malaria mosquito (Anopheles gambiae)
  - yellow fever / dengue mosquito (Aedes aegypti)

20 of 42 Species in Ensembl
[Figure: species groups plotted against a geological time scale in MYBP (Cambrian, Ordovician, Silurian, Devonian, Carboniferous, Permian, Triassic, Jurassic, Cretaceous, Tertiary). Labels: fishes, birds, reptiles, mammals, placentals, monotremes, marsupials, other birds, paleognaths, passerines, crocodiles, turtles, lizards, amphibians, teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans, non-vertebrates.]

21 of 42 More Species to Come ...
Oikopleura, Gorilla, Zebrafinch, Orangutan, Marmoset, Amphioxus, Acorn worm, Hyrax, Megabat, Dolphin, Tarsier, Kangaroo rat, Chinese pangolin, Two toed sloth, Llama, Flying lemur

22 of 42 Which Data Are Available?
- Genomic sequence
- Gene/transcript/peptide models
- External references
- Mapped cDNAs, peptides, micro array probes, BAC clones etc.
- Other features of the genome: cytogenetic bands, markers, repeats etc.
- Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions
- Variation data: SNPs
- Regulatory data: “best guess” set of regulatory elements
- Data from external sources (DAS)

23 of 42 Gene/Transcript/Peptide Models
- Manual annotation
  - For parts of genomes: human, dog, mouse, zebrafish (“Vega genes”)
  - For complete genomes: fruitfly (FlyBase), C. elegans (WormBase), yeast (SGD)
- Automatic predictions (“Ensembl genes”)
- EST predictions
- Ab initio predictions (GENSCAN, SNAP)

24 of 42 Biological Evidence
All Ensembl gene predictions are based on experimental evidence:
- UniProt/Swiss-Prot: a manually curated database and therefore of highest accuracy
- NCBI RefSeq: a partially manually curated database
- UniProt/TrEMBL: automatically annotated translations of EMBL coding sequence (CDS) features
- EMBL / GenBank / DDBJ: primary nucleotide sequence repository

25 of 42 The Ensembl Genebuild
Genome assembly + Computer programs + Experimental evidence → Ensembl Genes

26 of 42 Ensembl Identifiers
  ENSG###  Ensembl Gene ID
  ENST###  Ensembl Transcript ID
  ENSP###  Ensembl Peptide ID
  ENSE###  Ensembl Exon ID
  ENSF###  Ensembl Family ID
  ENSR###  Ensembl Regulatory Feature ID
For species other than human a suffix is added: MUS for mouse (Mus musculus): ENSMUSG###, DAR for zebrafish (Danio rerio): ENSDARG###, etc.
For imported genes Ensembl uses the original identifiers.
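The identifier scheme on this slide can be sketched in Python as a small parser. This is only an illustration under stated assumptions: the function and class names are hypothetical, species codes are assumed to be exactly three capital letters (as in the MUS and DAR examples), and the digits in the sample IDs are chosen for illustration.

```python
import re
from typing import NamedTuple, Optional

# Feature-type letters taken from the identifier table on the slide.
FEATURE_TYPES = {
    "G": "gene",
    "T": "transcript",
    "P": "peptide",
    "E": "exon",
    "F": "family",
    "R": "regulatory feature",
}

# Assumption: a species code is exactly three capital letters (MUS, DAR);
# human identifiers carry no species code.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPEFR])(\d+)$")


class StableId(NamedTuple):
    species_code: Optional[str]  # None for human
    feature_type: str
    number: str


def parse_stable_id(identifier: str) -> StableId:
    """Split an Ensembl-style ID into species code, feature type and digits."""
    match = STABLE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not an Ensembl-style identifier: {identifier!r}")
    species, letter, digits = match.groups()
    return StableId(species, FEATURE_TYPES[letter], digits)


# Sample IDs with illustrative digits.
human_gene = parse_stable_id("ENSG00000000001")
mouse_gene = parse_stable_id("ENSMUSG00000000001")
print(human_gene)
print(mouse_gene)
```

Imported identifiers (for example from FlyBase or WormBase) do not follow this pattern, so the sketch rejects them with a `ValueError` rather than guessing.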

27 of 42 Access to Genome Annotation
- Release web site
- Pre-Release
- Archive
- BioMart
- Downloads: ftp://ftp.ensembl.org/
- MySQL interface: ensembldb.ensembl.org
- Perl API

28 of 42 Pre! and Archive! Sites

29 of 42 BioMart Data Mining Tool

30 of 42 Downloads
ftp://ftp.ensembl.org/pub
- FASTA files: plain sequence
  - DNA (assembly masked and unmasked)
  - cDNA (Ensembl and ab initio predictions)
  - Peptides (Ensembl and ab initio predictions)
  - RNA (non-coding RNA predictions)
- Flatfiles: annotated 1Mb slices
  - EMBL format
  - GenBank format
- MySQL: database table dumps
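Once a FASTA file has been downloaded, it can be read with a few lines of Python. The sketch below is a minimal reader, not an official Ensembl tool; the function name and the two example records are hypothetical, and it assumes the common convention that the record name is the first word after `>`.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a mapping of record name to sequence from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical in-memory example; a real file would be opened with open(path).
EXAMPLE = """>seq1 hypothetical description
ACGTAC
GTTA
>seq2
NNNNacgt
"""

records = read_fasta(EXAMPLE.splitlines())
for record_name, sequence in records.items():
    print(record_name, len(sequence))
```

Because the reader accepts any iterable of lines, the same function works on an open file handle, which avoids loading a whole chromosome file into memory as a single string first.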

31 of 42 MySQL
SQL = Structured Query Language
Needed:
- MySQL client program
- Ability to write MySQL queries
- Knowledge of database schema
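As a rough illustration of the first two requirements, the Python sketch below assembles a command line for the standard `mysql` client. Only the host name comes from the slides; the `anonymous` user, port 3306 and the `homo_sapiens_core` database-name prefix are assumptions that should be checked against the current Ensembl documentation, and the helper name is hypothetical. The sketch only builds and prints the command; it does not connect to anything.

```python
import shlex
from typing import List

HOST = "ensembldb.ensembl.org"  # MySQL interface named on the slides
USER = "anonymous"              # assumption: public read-only account
PORT = 3306                     # assumption: default MySQL port


def build_mysql_command(sql: str, host: str = HOST, user: str = USER,
                        port: int = PORT) -> List[str]:
    """Build an argument list for the mysql command-line client."""
    return [
        "mysql",
        f"--host={host}",
        f"--user={user}",
        f"--port={port}",
        f"--execute={sql}",
    ]


# Assumption: human core databases share the prefix 'homo_sapiens_core'.
LIST_CORE_DATABASES = "SHOW DATABASES LIKE 'homo_sapiens_core%'"

command = build_mysql_command(LIST_CORE_DATABASES)
print(shlex.join(command))
```

On a machine with the MySQL client installed, the list could be passed to `subprocess.run(command)`. Listing the databases first is a reasonable way to begin, since table-level queries depend on the schema of the specific release.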

32 of 42 Perl API
API = Application Programming Interface
Needed:
- BioPerl modules
- Ensembl modules
- Ability to code in Perl
For more information (installation instructions, tutorials, documentation etc.):

33 of 42 Ensembl BLAST
- WU-BLAST 2.0: search against assemblies, Ensembl predictions or ab initio predictions
- BLAT (BLAST-like Alignment Tool) and SSAHA2 (Sequence Search and Alignment by Hashing Algorithm): very fast search against assemblies for (almost) exact DNA-DNA matches
- Search against one or multiple species
- Search max. 30 sequences simultaneously
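Given the limit of 30 sequences per search, a larger query set has to be split before submission. The Python sketch below shows one way to do that; the helper name and the placeholder query names are hypothetical, and the sketch does not submit anything to Ensembl.

```python
from typing import Iterator, List, Sequence

MAX_QUERIES = 30  # limit stated on the slide


def batches(queries: Sequence[str], size: int = MAX_QUERIES) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` queries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(queries), size):
        yield list(queries[start:start + size])


# 65 placeholder query names split into groups of at most 30.
queries = [f"query_{i}" for i in range(65)]
batch_sizes = [len(batch) for batch in batches(queries)]
print(batch_sizes)  # [30, 30, 5]
```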

34 of 42 Ensembl Accounts
- Personalise Ensembl by saving bookmarks, view configurations and homepage preferences in a user account
- Share bookmarks and configurations by setting up groups
Please note that all Ensembl data remain freely accessible. It is not necessary to register in order to gain access to Ensembl data!

35 of 42 Website Statistics
- On average 1,000,000 page impressions / week
- Top 3 species:
- Top 3 countries:

36 of 42 Ensembl – Open Source
- Data and software freely available
- More than 50 installs worldwide
- Academia and industry
- Local or available via the web
- Mirrors with Ensembl data, e.g. or user projects with own data

37 of 42 Powered by Ensembl

38 of 42 What If I Need Help?
- Helpdesk:
- Workshops on use of the browser or the API
- Mailing lists:
- ‘Geek for a week’ program
- Animated tutorials

39 of 42 Ensembl Team
- Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)
- Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard
- Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios
- Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Dace Ruklisa, Daniel Zerbino
- Vectorbase Annotation: Martin Hammond, Dan Lawson, Karyn Megy
- Zebrafish Annotation: Kerstin Howe, Tina Eyre, Ian Sealy
- Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Felix Kokocinski, Jan-Hinnerck Vogel, Simon White
- Comparative Genomics: Javier Herrero, Benoit Ballester, Kathryn Beal, Stephen Fitzgerald, Albert Vilella
- Web Team: James Smith, Fiona Cunningham, Anne Parker, Bethan Pritchard, Stephen Rice, Steve Trevanion
- Outreach & QC: Xosé M Fernández, Bert Overduin, Michael Schuster, Giulietta Spudich
- Distributed Annotation System (DAS): Eugene Kulesha, Andy Jenkinson
- BioMart: Arek Kasprzyk, Syed Haider, Richard Holland, Damian Smedley
- Database Schema and Core API: Glenn Proctor, Andreas Kähäri, Ian Longden, Patrick Meidl

40 of 42 Ensembl Team on the river Cam, 2006

41 of 42 Ewan Birney

42 of 42 Q & A: Questions and Answers