The Ensembl Browser. Dr. Giulietta M. Spudich, European Bioinformatics Institute.

Presentation transcript:

1 of 31 The Ensembl Browser. Dr. Giulietta M. Spudich, European Bioinformatics Institute

2 of 31 Today: Introduction to the Ensembl project and gene set; Walk-through of the browser; Hands-on Browser; BioMart Lunch; BioMart Hands-on; Comparative Genomics + Hands-on; Variations & Functional Genomics + Hands-on

3 of 31 Course Objectives: How to browse information about a gene; How to choose a transcript; Where to find sequence variations; How to view multiple alignments; How to use BioMart; Where to go for help

4 of 31 Introduction to Ensembl Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Where to go for help?

5 of 31 Genome browsers provide a map. [Figure adapted from the ENCODE project; labels: histone modification, DNase I sensitive site, gene, allele, conserved sequence]

6 of 31 Genome Browsers: Ensembl Genome Browser; NCBI Map Viewer; UCSC Genome Browser

7 of 31 Ensembl Features: The gene set (automatic annotation based on mRNA and protein information, plus manual annotation: the GENCODE set); BioMart (data export tool); Comparative analysis (gene trees); Variation and functional genomics; Integration with other databases (DAS); Programmatic access via the Perl API (open source)

8 of 31 Subjects Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Where to go for help?

9 of 31 To meet a challenge… Ensembl’s aim: to provide annotation for the biological community that is freely available and of high quality. Started in 2000. Joint project between EBI and Sanger. Funded primarily by the Wellcome Trust, with additional funding from EMBL, NIH-NIAID, EU, BBSRC and MRC.

10 of 31 Genome annotation Genome annotation is the process of attaching biological information to sequences. It consists of two main steps: 1. Identifying genes on the genome. 2. Attaching biological information to genes and the genome. (For example, effects of sequence variation).

11 of 31 Ensembl Annotates Vertebrate Genomes. 50 species, including the non-chordates D. melanogaster, C. elegans and S. cerevisiae.

12 of 31 Extending Ensembl across the taxonomic space. 48 chordates, including human, mouse, zebrafish, chicken, chimpanzee, pig and platypus. 3 Plasmodia: falciparum, knowlesi, vivax. 134 species: 6 bacterial clades, 1 prokaryotic clade. 8 Aspergillus; 2 yeast: S. cerevisiae, S. pombe. 8 species, including Arabidopsis thaliana, Arabidopsis lyrata and Oryza sativa. 21 species: Drosophila (12), Caenorhabditis (5), Anopheles gambiae. [Tree figure: F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel & P. Bork. Towards automatic reconstruction of a highly resolved tree of life. Science, 3 March.] Slide design by Jeff Almeida-King

13 of 31 Exploring genomes Vertebrates focus: Other species:

14 of 31 Subjects Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Where to go for help?

15 of 31 What is known? Genomic assemblies from sequencing consortia

16 of 31 What is known? Proteins and cDNA/mRNA sequences from the research community, found in: UniProtKB/Swiss-Prot (manually curated); UniProtKB/TrEMBL; NCBI RefSeq (manually curated). Note: See pages 55 and 56 of the course booklet

17 of 31 Combining genes and genomes. [Figure: exons marked as untranslated, untranslated + coding, or coding, placed on the genomic sequence …tgcctgttag...]

18 of 31 Too many pieces… [Figure: genome with aligned cDNA and protein; exons marked as untranslated, untranslated + coding, or coding]

19 of 31 Ensembl shows one transcript with underlying evidence

20 of 31 Ensembl Compared with Swiss-Prot and NCBI RefSeq sequences

21 of 31 Is there any consensus? NCBI RefSeq set ≠ UniProt set. Ensembl combines these sets. UCSC has its own gene set. How do we come up with a consensus gene set between all these?

22 of 31 CCDS Reaching a consensus coding sequence set for human and mouse. 19,851 (ENS human), 17,679 (ENS mouse) (*as of Sept 2009) If you see a “CCDS ID”, the coding sequence is agreed upon. Genome Res Jul;19(7): Epub 2009 Jun 4
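
The idea of a coding sequence being "agreed upon" can be illustrated with a toy sketch. This is not the CCDS project's actual procedure; it only assumes that each annotation set can be reduced to a collection of coding structures and that agreement means an identical structure appears in every set. The function name, transcript labels and coordinates are all made up for illustration.

```python
# Toy model: each annotation set maps a transcript label to its coding
# structure, written as (chromosome, strand, tuple of (start, end) CDS exons).
# All labels and coordinates below are invented for illustration.

def consensus_cds(*annotation_sets):
    """Return the coding structures that appear in every annotation set."""
    if not annotation_sets:
        return set()
    structures = [set(s.values()) for s in annotation_sets]
    return set.intersection(*structures)

ensembl_like = {
    "tx_a": ("1", "+", ((100, 200), (300, 400))),
    "tx_b": ("1", "+", ((100, 200), (350, 400))),
}
refseq_like = {
    "nm_a": ("1", "+", ((100, 200), (300, 400))),
    "nm_c": ("2", "-", ((500, 650),)),
}

# Only the structure shared by both sets survives.
agreed = consensus_cds(ensembl_like, refseq_like)
print(agreed)
```

Comparing whole structures rather than labels reflects the point of the slide: the sets use different identifiers, so agreement has to be judged on the coding sequence itself.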

23 of 31 VEGA/Havana. Ensembl: automatic annotation pipeline, gene building all at once (whole genome). Havana: manual curation on a case-by-case basis. VEGA: Vertebrate Genome Annotation.

24 of 31 Genes and Transcripts in Ensembl. High quality: CCDS transcripts; Ensembl/Havana merged transcripts

25 of 31 Ensembl/Havana. Transcripts are from: Ensembl; Havana; Ensembl/Havana merge

26 of 31 Gene Names in Ensembl. ENSG###: Ensembl Gene ID. ENST###: Ensembl Transcript ID. ENSP###: Ensembl Peptide ID. ENSE###: Ensembl Exon ID. For species other than human, a species code is added after "ENS": MUS (Mus musculus) for mouse: ENSMUSG###; DAR (Danio rerio) for zebrafish: ENSDARG###, etc.
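
The naming scheme on this slide can be sketched as a small parser. This is a minimal illustration that assumes only the pattern given above (ENS, an optional species code, one feature letter, then digits). The function name is hypothetical, the species table holds just the codes named on the slide, and version suffixes are not handled.

```python
import re

# Feature-type letters as listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes named on the slide; an empty code means human.
# This table is deliberately incomplete.
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# ENS + optional species code + one feature letter + digits.
STABLE_ID = re.compile(r"^ENS([A-Z]*)([GTPE])(\d+)$")

def parse_stable_id(stable_id):
    """Split an ID of the form shown on the slide into its parts."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {stable_id!r}")
    species_code, type_letter, number = match.groups()
    return {
        "species": SPECIES_CODES.get(species_code, "not listed on the slide"),
        "feature": FEATURE_TYPES[type_letter],
        "number": number,
    }

print(parse_stable_id("ENSG00000139618"))
print(parse_stable_id("ENSMUSG00000017167"))
```

The species group in the pattern is greedy but backtracks by one letter, so the final capital before the digits is always read as the feature type.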

27 of 31 How is all this information organised? Ensembl Views (website); Ensembl Database (open source); BioMart (‘DataMining tool’)

28 of 31 What other annotation? Non-coding (nc)RNAs. IDs in other databases. Microarray probes, clonesets, BAC maps. Other features of the genome: repeats, CpG islands. Homologs and whole genome alignments: orthologues and paralogues, protein families, syntenic regions. Variation data: single nucleotide polymorphisms, InDels, CNVs. Regulatory data (a first guess at promoter and enhancer elements). Data from external sources (DAS).

29 of 31 Subjects Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Where to go for help?

30 of 31 Help and Information. Comments and questions? Check out our tutorials page. Videos. Mailing list. Come visit our blog! FTP site: ftp://ftp.ensembl.org Amazon Web Services:

31 of 31 Ensembl Team
Ensembl: Paul Flicek (EBI), Steve Searle (Sanger Institute)
Software: Glenn Proctor, Andreas Kähäri, Stephen Keenan, Rhoda Kinsella, Eugene Kulesha, Ian Longden, Iliana Toneva, Jorge Zamora
Comparative Genomics: Javier Herrero, Kathryn Beal, Stephen Fitzgerald, Leo Gordon
Functional Genomics: Ian Dunham, Nathan Johnson, Daniel Sobral, Steven Wilder
Variation: Fiona Cunningham, Pontus Larsson, Will McLaren, Graham Ritchie
Analysis and Annotation: Jan-Hinnerck Vogel, Bronwen Aken, Susan Fairley, Thibaut Hourlier, Magali Ruffier, Simon White, Amy Tang, Amonida Zadissa
Web Team: Anne Parker, Ridwan Amode, Simon Brent, Maurice Hendrix, Bethan Pritchard, Steve Trevanion (VEGA)
Outreach: Xosé M Fernández, Jeff Almeida-King, Bert Overduin, Michael Schuster (QC), Giulietta Spudich, Jana Vandrovcova
Systems & Support: Guy Coates, James Beal, Gen-Tao Chiang, Peter Clapham, Simon Kelley, Shelley Goddard, Tracy Mumford, Kerry Smith
Research: Benoît Ballester, Petra Catalina Schwalie, André Faure, Markus Fritz, Damian Keefe, Alison Meynert, Dace Ruklisa, Mikhail Spivakov, David Thybert, Sander Timmer, Albert Vilella
Vertebrate Genomics: Chao-Kung Chen, Laura Clarke, Jonathan Hinton, Zam Iqbal, Vasudev Kumanduri, Ilkka Lappalainen, Edoardo Marcora, Pablo Marín, Damian Smedley, Richard Smth, Phil Wilkinson, Holly Zheng-Bradley
Ensembl Genomes: Paul Kersey, Paul Derwent, Matthias Haimel, Alan Horne, Arnaud Kerhornou, Uma Maheswari, Michael Nuhn, Dan Staines, Andy Yates
VectorBase: Dan Lawson, Gautier Koscielny, Karyn Megy
Zebrafish: Kerstin Howe, Kim Brugger, Will Chow, Britt Reimholz, James Torrance
Ensembl Strategy: Ewan Birney, Richard Durbin, Tim Hubbard
Ensembl’s 10th Year, Nucleic Acids Res