Browsing Genomes with Ensembl
Xosé Mª Fernández, European Bioinformatics Institute
April 2006, March 2007

Presentation transcript:

Outline of talk
– Overview of Ensembl
– Making genomes useful
– Beyond Ensembl

Outline of talk
– Overview of Ensembl
   – Ensembl project
   – Exploring genomes
   – Gene annotation
– Making genomes useful
– Beyond Ensembl

Ensembl project
– Joint project
   – European Bioinformatics Institute (EMBL-EBI)
   – Wellcome Trust Sanger Institute
– Produce accurate, automatic genome annotation
– Focused on selected eukaryotic genomes
– Integrate external (distributed) biological data
– Presentation of the analysis to all via the Web at …
– Open distribution of the analysis to the community
– Development of open, collaborative software (databases and APIs)

Beyond classical ab initio gene prediction
– Ensembl automatic gene prediction relies on homology-based ‘supporting evidence’ to avoid overprediction.
– Classical ab initio gene prediction (e.g. GENSCAN) relies partly on global statistics of protein-coding potential, which the cell itself does not use.
– Genes are just a series of short signals:
   – Transcription start site
   – Translation start site
   – 5’ and 3’ intron splicing signals
   – Termination signals
– Short signal sequences are difficult to recognise over the background noise of a large genome.
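A little arithmetic makes the last point concrete. This is a back-of-the-envelope sketch, not part of Ensembl. It assumes a uniform, independent base composition, with each base occurring with probability 1/4, which real genomes do not have. The function name is hypothetical. Under these assumptions, one specific six-base signal is expected to turn up roughly 1.5 million times by chance in a genome of about 3 × 10^9 bp, counting both strands.

```python
def expected_random_hits(motif_length: int, genome_length: int, strands: int = 2) -> float:
    """Expected chance occurrences of one fixed motif in random sequence.

    Assumption: every base is A, C, G or T with probability 1/4, independently
    of its neighbours. Matches are counted on `strands` strands (2 = both).
    """
    if motif_length > genome_length:
        return 0.0
    positions = genome_length - motif_length + 1  # possible start positions per strand
    return strands * positions / 4 ** motif_length


if __name__ == "__main__":
    # Approximate human-sized genome of 3 x 10^9 bp.
    for k in (2, 6, 12):
        print(k, round(expected_random_hits(k, 3_000_000_000)))
```

The count falls by a factor of four for every extra base. Short signals therefore hit far too often to be informative on their own, which is why supporting evidence matters.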

Ensembl v43

DAS Registry

DAS (Distributed Annotation System)
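DAS lets a client fetch annotation from independent servers and overlay it on a reference sequence. The sketch below is minimal and assumes the DAS 1.5 `features` command and its DASGFF XML response. It builds a request URL and parses a small hand-written response. The server address, data source name and feature values are hypothetical placeholders, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def features_url(server: str, source: str, segment: str, start: int, stop: int) -> str:
    """Build a DAS 'features' request for one reference segment (assumed DAS 1.5 syntax)."""
    return f"{server}/das/{quote(source)}/features?segment={quote(segment)}:{start},{stop}"


def parse_features(xml_text: str):
    """Return (id, type, start, end, orientation) for each FEATURE element."""
    root = ET.fromstring(xml_text)
    rows = []
    for feat in root.iter("FEATURE"):
        rows.append((
            feat.get("id"),
            feat.findtext("TYPE", default="").strip(),
            int(feat.findtext("START")),
            int(feat.findtext("END")),
            feat.findtext("ORIENTATION", default="0").strip(),
        ))
    return rows


# Hand-written example response in DASGFF layout (not real data).
SAMPLE = """<?xml version="1.0" standalone="yes"?>
<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/mysource/features">
    <SEGMENT id="13" start="1000" stop="2000">
      <FEATURE id="feat1" label="example">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="manual">manual</METHOD>
        <START>1200</START>
        <END>1350</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

if __name__ == "__main__":
    print(features_url("http://das.example.org", "mysource", "13", 1000, 2000))
    print(parse_features(SAMPLE))
```

Every DAS source answers the same request shape with the same XML layout, so a single client can merge annotation from many servers.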

Pre! and Archive! sites

Open source, open standards
– Object model
   – A standard interface makes it easy for others to build custom applications on top of Ensembl data
– Open discussion of design
   – Most major pharma companies and many academic groups are represented on the mailing list, and code is being actively developed externally
– Ensembl run locally
   – By both industry and academia

Ensembl: open source

APIs
– Used to retrieve data from, and store data in, Ensembl databases
– Ensembl Perl API:
   – Written in object-oriented Perl
   – Foundation for the Ensembl pipeline and the Ensembl web interface

Making genomes useful
– Interpretation
   – Where are the interesting parts of the genome?
   – What do they do?
   – How are they related to elements in other genomes?
– Access
   – for bench biologists
   – for mid-scale groups without programmers
   – for groups with strong programming skills

Access for bench biologists
– Mainly via the web
– Web site designed for the non-programming biologist who is not necessarily genome-aware
   – Simple things are simple to find
   – Graphical displays and overviews
   – Consistent layout, colour and text

[Diagram of the Ensembl data flow; recoverable labels: Analysis DB, CPU, Final DB, Supporting Databases (SNP, Manual Annotation), Ensembl]

Genome browsing: why present the whole genome?
– Explore what is in a chromosome region
– See features in and around a specific gene
– Search and retrieve across the whole genome
– Investigate genome organization
– Compare to other genomes

Introduction to the Ensembl web site
Ensembl:
– takes genomic sequence assemblies (human build 36, mouse, rat, mosquito…)
– adds annotation and links in an automated process
– presents all the data on a web site

Basic genome annotation
– Genes
   – Genomic location
   – Gene model structures: exons, introns, UTRs
   – Transcript(s), including pseudogenes and non-coding RNA
   – Protein(s)
   – Links to other sources of information
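The gene, transcript and exon structure listed above can be sketched as a small data model. This is a hypothetical, simplified illustration, not the Ensembl object model. It shows how introns and UTRs follow from exon coordinates plus the coding-region bounds. It assumes a forward-strand transcript with sorted exons given as 1-based inclusive genomic coordinates. The identifiers used are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based, inclusive


@dataclass
class Transcript:
    """Simplified transcript: sorted forward-strand exons plus CDS bounds."""
    stable_id: str
    exons: List[Interval]
    cds_start: int  # genomic position of the first coding base
    cds_end: int    # genomic position of the last coding base

    def introns(self) -> List[Interval]:
        # An intron is the gap between the end of one exon and the start of the next.
        return [(prev_end + 1, next_start - 1)
                for (_, prev_end), (next_start, _) in zip(self.exons, self.exons[1:])]

    def utr_lengths(self) -> Tuple[int, int]:
        # Exonic bases before cds_start form the 5' UTR; those after cds_end form the 3' UTR.
        five = sum(max(0, min(end, self.cds_start - 1) - start + 1) for start, end in self.exons)
        three = sum(max(0, end - max(start, self.cds_end + 1) + 1) for start, end in self.exons)
        return five, three


@dataclass
class Gene:
    stable_id: str
    transcripts: List[Transcript] = field(default_factory=list)

    def span(self) -> Interval:
        # Genomic location of the gene: first to last exon across all transcripts.
        return (min(t.exons[0][0] for t in self.transcripts),
                max(t.exons[-1][1] for t in self.transcripts))


if __name__ == "__main__":
    tx = Transcript("TX1", [(100, 200), (300, 400), (500, 650)], cds_start=150, cds_end=549)
    print(tx.introns(), tx.utr_lengths())
```

Storing only exons and CDS bounds and deriving everything else keeps the model consistent. An intron or UTR can never disagree with the exons it comes from.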

Advanced genome annotation
– Cytogenetic bands
– Polymorphic markers
   – Sequence Tagged Sites (STS)
– Genetic variation
   – Single Nucleotide Polymorphisms (SNPs)
   – Deletion-Insertion Polymorphisms (DIPs)
   – Short Tandem Repeats (STRs)
– Repetitive sequences
– Expressed Sequence Tags (ESTs)
– cDNAs or mRNAs from related species
– Regions of sequence homology
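A toy classifier shows how the three variation types listed above differ. This is a hypothetical sketch, not how Ensembl or dbSNP classify variants:
– a single-base substitution is labelled a SNP;
– any change in allele length is labelled a DIP;
– a sequence made entirely of one short unit, repeated at least `min_copies` times, is reported as an STR.

```python
from typing import Optional, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Toy label for a reference/alternative allele pair."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"    # one base swapped for another
    if len(ref) != len(alt):
        return "DIP"    # bases inserted or deleted
    return "other"      # e.g. multi-base substitutions


def tandem_repeat(seq: str, max_unit: int = 6, min_copies: int = 3) -> Optional[Tuple[str, int]]:
    """Return (unit, copies) if seq is one short unit repeated end to end, else None."""
    seq = seq.upper()
    for size in range(1, max_unit + 1):
        if len(seq) % size:
            continue  # the unit must tile the sequence exactly
        unit, copies = seq[:size], len(seq) // size
        if copies >= min_copies and unit * copies == seq:
            return unit, copies  # shortest repeating unit wins
    return None


if __name__ == "__main__":
    print(classify_variant("A", "G"), classify_variant("A", "AT"))
    print(tandem_repeat("CACACACA"))
```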

How to get started
– Species homepage
– MapView
– Text search
– BLAST / SSAHA

Homepage

MapView

BLAST and SSAHA
– See BLAST hits on the genome
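SSAHA (Sequence Search and Alignment by Hashing Algorithm) finds near-exact matches quickly by looking up k-mers in a hash table built from the genome. The sketch below simplifies that idea heavily and is not the SSAHA implementation:
– it indexes the non-overlapping k-mers of the subject sequence;
– it looks up every overlapping k-mer of the query;
– it counts hits per diagonal (subject offset minus query offset).

A diagonal with many hits suggests a match at that offset. Function names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(subject: str, k: int) -> Dict[str, List[int]]:
    """Hash table mapping each non-overlapping k-mer of the subject to its positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(subject) - k + 1, k):
        index[subject[pos:pos + k]].append(pos)
    return index


def search(query: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    """Look up every overlapping query k-mer; return (diagonal, hits), best first."""
    diagonals: Counter = Counter()
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], []):
            diagonals[spos - qpos] += 1  # same offset = hits lie on one diagonal
    return diagonals.most_common()


if __name__ == "__main__":
    subject = "AAAACCCCGGGGTTTTACGTACGT"
    idx = build_index(subject, 4)
    print(search("CCCCGGGGTTTT", idx, 4))
```

Indexing only non-overlapping k-mers keeps the table small. Because the query is scanned at every offset, any exact match of at least 2k - 1 bases still produces at least one hit.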

Regions, maps and markers
– MarkerView, SNPView, GeneSNPView, ContigView, CytoView, SyntenyView, MultiContigView

ContigView

ContigView close-up
– Transcripts: red and black (Ensembl predictions), blue (Vega) and gold (HAVANA, human only)
– Pop-up menu

ContigView: navigation
– Click and drag the mouse to select a region
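Selecting a region by dragging comes down to converting pixel positions back into base-pair coordinates. The function below is a hypothetical sketch of that arithmetic and is not taken from the Ensembl web code. It assumes a linear scale, with the displayed bases spread evenly across the image and no margins.

```python
import math
from typing import Tuple


def drag_to_region(x1: float, x2: float, image_width: int,
                   view_start: int, view_end: int) -> Tuple[int, int]:
    """Convert a horizontal drag (pixel x positions) into a genomic region.

    Assumes the image shows bases view_start..view_end (inclusive) on a linear
    scale across image_width pixels, with pixel 0 at the left edge.
    """
    left, right = sorted((x1, x2))                     # drag may go right-to-left
    left, right = max(0, left), min(image_width, right)  # clamp to the image
    bp_per_pixel = (view_end - view_start + 1) / image_width
    start = view_start + math.floor(left * bp_per_pixel)
    end = view_start + math.ceil(right * bp_per_pixel) - 1
    return start, max(start, min(end, view_end))


if __name__ == "__main__":
    # 100 kb shown across 1000 pixels: each pixel covers 100 bp.
    print(drag_to_region(500, 250, 1000, 1, 100_000))
```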

CytoView

GeneSNPView

SNPView

MarkerView

MultiContigView

Genes and gene products
– GeneView, TransView, ExonView, ProteinView, FamilyView, GOView
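Moving from a transcript to its protein product means translating the coding sequence codon by codon. The sketch below is minimal and uses the standard genetic code. It assumes the CDS has already been extracted as a string, and it is not Ensembl code. It does not handle alternative genetic codes (e.g. mitochondrial) or special cases such as selenocysteine. Codons containing ambiguous bases become `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG x TCAG x TCAG codon order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, to_stop: bool = False) -> str:
    """Translate a coding sequence; trailing bases that do not fill a codon are ignored."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons such as 'NNN'
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(translate("AUGGCCUAA", to_stop=True))
```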

GeneView

ExonView and TransView

ProteinView

FamilyView

GOView

Data retrieval
– BioMart
– Data sets on the FTP site
– MySQL queries of databases
– Perl API access to databases
– ExportView
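A typical direct database query finds all genes overlapping a chromosome interval. The sketch below runs such a query against an in-memory SQLite database. Its two tables are loosely modelled on the `gene` and `seq_region` tables of the Ensembl core schema. The column set is simplified and should be treated as an assumption, since the real schema has more columns and has changed between releases. Against the public MySQL server (`ensembldb.ensembl.org`, user `anonymous`) you would use a MySQL driver, whose placeholder style may differ from SQLite's `?`. The gene identifiers below are made up.

```python
import sqlite3

# Simplified, assumed subset of the core schema (not the real table definitions).
SCHEMA = """
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT, seq_region_id INTEGER,
                   seq_region_start INTEGER, seq_region_end INTEGER,
                   seq_region_strand INTEGER);
"""

# A gene overlaps [region_start, region_end] if it starts before the region ends
# and ends after the region starts.
OVERLAP_SQL = """
SELECT g.stable_id, sr.name, g.seq_region_start, g.seq_region_end, g.seq_region_strand
FROM gene g JOIN seq_region sr ON g.seq_region_id = sr.seq_region_id
WHERE sr.name = ? AND g.seq_region_start <= ? AND g.seq_region_end >= ?
ORDER BY g.seq_region_start
"""


def genes_in_region(conn: sqlite3.Connection, chrom: str, start: int, end: int):
    return conn.execute(OVERLAP_SQL, (chrom, end, start)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Tiny in-memory database with made-up genes on sequence region '13'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO seq_region VALUES (1, '13')")
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "GENE_A", 1, 1000, 5000, 1),
        (2, "GENE_B", 1, 8000, 9000, -1),
    ])
    return conn


if __name__ == "__main__":
    print(genes_in_region(make_demo_db(), "13", 4000, 8500))
```

The two-sided test `start <= region_end AND end >= region_start` catches genes that lie only partly inside the interval, not just those wholly contained in it.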

ExportView

Help!
– Context-sensitive help pages: click …
– Access other documentation via the generic home page
– The helpdesk

Ensembl Team, July 2006

Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)
Database Schema and Core API: Glenn Proctor, Andreas Kähäri, Ian Longden, Patrick Meidl
BioMart: Arek Kasprzyk, Damian Smedley, Richard Holland, Syed Haider
Distributed Annotation System (DAS): Eugene Kulesha
Outreach: Xosé M Fernández, Bert Overduin, Giulietta Spudich, Michael Schuster
Web Team: James Smith, Bethan Pritchard, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion (VEGA), Matt Wood
Comparative Genomics: Abel Ureta-Vidal, Kathryn Beal, Benoît Ballester, Stephen Fitzgerald, Javier Herrero Sánchez, Albert Vilella
Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerck Vogel, Kevin Howe, Felix Kokocinski, Stephen Rice, Simon White
Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios
Zebrafish Annotation: Kerstin Howe, Mario Caccamo, Tina Eyre, Ian Sealy
VectorBase Annotation: Martin Hammond, Dan Lawson, Karyn Megy
Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard
Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Benedict Paten, Daniel Zerbino, Dace Ruklisa
Ensembl Team, March 2007

Training... somewhere near you