April 2006 March 2007 Xosé Mª Fernández European Bioinformatics Institute Browsing Genomes with Ensembl.

1 April 2006 March 2007 Xosé Mª Fernández European Bioinformatics Institute Browsing Genomes with Ensembl

2 Outline of talk
Overview of Ensembl
Making genomes useful
Beyond Ensembl

3 Outline of talk
Overview of Ensembl
–Ensembl - Project
–Exploring genomes
–Gene annotation
Making genomes useful
Beyond Ensembl

4 Ensembl - Project
Joint project
–EMBL – European Bioinformatics Institute (EBI)
–Wellcome Trust Sanger Institute
Produce accurate, automatic genome annotation
Focused on selected eukaryotic genomes
Integrate external (distributed) biological data
Presentation of the analysis to all via the Web at http://www.ensembl.org
Open distribution of the analysis to the community
Development of open, collaborative software (databases and APIs)

6 Beyond classical ab initio gene prediction
Ensembl automatic gene prediction relies on homology 'supporting evidence' to avoid overprediction.
Classical ab initio gene prediction (e.g. GENSCAN) relies partly on global statistics of protein-coding potential, which are not used in the cell
Genes are just a series of short signals
–Transcription start site
–Translation start site
–5' & 3' intron splicing signals
–Termination signals
Short signal sequences are difficult to recognise over background noise in large genomes
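
The point about short signals and background noise can be made concrete with a back-of-the-envelope calculation. The sketch below is not part of the talk; it assumes independent, equally likely bases (a deliberate simplification) and a genome of roughly 3 billion bases, and the function name is hypothetical.

```python
def expected_random_hits(signal_length: int, genome_length: int) -> float:
    """Expected number of chance occurrences of one specific motif on one strand.

    Assumes every base is independent and equally likely (A, C, G, T each
    with probability 1/4), which real genomes do not satisfy exactly.
    """
    # Number of positions at which a motif of this length could start.
    positions = max(genome_length - signal_length + 1, 0)
    # Probability that a given position matches the motif by chance.
    return positions / 4 ** signal_length


# Rough human genome size, used here only for illustration.
GENOME_LENGTH = 3_000_000_000

if __name__ == "__main__":
    for k in (3, 6, 9, 12):
        hits = expected_random_hits(k, GENOME_LENGTH)
        print(f"motif length {k}: about {hits:,.0f} chance matches")
```

Under these assumptions a very short motif is expected to appear by chance at a large number of positions, which is why signal matching alone tends to overpredict and why supporting evidence from homology helps.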

8 Ensembl v43

10 DAS Registry
http://www.dasregistry.org

11 DAS
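
With the Distributed Annotation System (DAS), a client asks an annotation server over HTTP for the features on a segment of a reference sequence. The sketch below is not from the talk: it only composes a request URL in the style of the DAS `features` command. The server address and source name in the usage example are hypothetical, and the helper name is an assumption.

```python
from urllib.parse import urlencode


def das_features_url(server: str, source: str, reference: str,
                     start: int, end: int) -> str:
    """Build a DAS-style 'features' request URL for one segment.

    The segment is written as reference:start,end using 1-based,
    inclusive coordinates.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Keep ':' and ',' unescaped so the segment stays human-readable.
    query = urlencode({"segment": f"{reference}:{start},{end}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


if __name__ == "__main__":
    # Hypothetical server and data source names.
    print(das_features_url("http://das.example.org", "my_source", "1", 1000, 2000))
```

A client such as a genome browser can send one such request per registered source and overlay the returned features on its own display.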

13 Pre! and Archive! sites
http://pre.ensembl.org
http://www.ensembl.org
http://archive.ensembl.org

15 Open source, open standards
Object model
–Standard interface makes it easy for others to build custom applications on top of Ensembl data
Open discussion of design (ensembl-dev@ebi.ac.uk)
Most major pharma and many academics are represented on the mailing list, and code is being actively developed externally
Ensembl locally
–Both industry & academia

16 Ensembl – Open source

18 APIs
Used to retrieve data from and to store data in Ensembl databases.
Ensembl Perl API:
–Written in object-oriented Perl
–Foundation for the Ensembl Pipeline and Ensembl Web interface

19 Overview of Ensembl
–Ensembl - Project
–Exploring genomes
–Gene annotation
Making genomes useful
Beyond Ensembl

20 Making genomes useful
Interpretation
–Where are the interesting parts of the genome?
–What do they do?
–How are they related to elements in other genomes?
Access
–for bench biologists
–for non-programming mid-scale groups
–for good programming groups

21 Access… bench biologists
Mainly via the web
Web site designed for biologists who do not program and are not especially genome-aware
–Simple things to find are simple to find
–Graphical displays and overviews
–Consistency of layout, colour and text

22 Ensembl (diagram labels: Analysis DB, CPU, Final DB, Supporting Databases, SNP, Manual Annotation)

23 Genome browsing: why present the whole genome?
Explore what is in a chromosome region
See features in and around a specific gene
Search & retrieve across the whole genome
Investigate genome organization
Compare to other genomes

24 Introduction to the Ensembl web site
Ensembl …
–takes genomic sequence assemblies (human build 36, mouse, rat, mosquito…)
–adds annotation and links (automated process)
–presents all the data on a web site

25 Basic Genome Annotation
Genes
–Genomic location
–Gene model structures
  Exons
  Introns
  UTRs
–Transcript(s)
  Pseudogenes
  Non-coding RNA
–Protein(s)
–Links to other sources of information
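
The relationship between exons, introns and a transcript can be sketched as a small data structure. This is an illustration only, not the Ensembl object model: the class and method names are hypothetical, coordinates are assumed to be 1-based and inclusive, and exons are assumed not to overlap.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    start: int  # 1-based, inclusive genomic coordinate
    end: int    # 1-based, inclusive genomic coordinate


@dataclass
class Transcript:
    exons: List[Exon]

    def introns(self) -> List[Tuple[int, int]]:
        """Return the gaps between consecutive exons as (start, end) pairs."""
        ordered = sorted(self.exons, key=lambda exon: exon.start)
        return [(left.end + 1, right.start - 1)
                for left, right in zip(ordered, ordered[1:])]

    def spliced_length(self) -> int:
        """Total length of the exons, i.e. the transcript after splicing."""
        return sum(exon.end - exon.start + 1 for exon in self.exons)


if __name__ == "__main__":
    transcript = Transcript([Exon(100, 200), Exon(300, 400), Exon(500, 650)])
    print(transcript.introns())
    print(transcript.spliced_length())
```

Introns are derived rather than stored here, so the exon list stays the single source of truth for the gene model structure.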

26 Advanced Genome Annotation
Cytogenetic bands
Polymorphic markers
–Sequence Tagged Sites (STS)
Genetic variation
–Single Nucleotide Polymorphisms (SNPs)
–Deletion-Insertion Polymorphisms (DIPs)
–Short Tandem Repeats (STRs)
Repetitive sequences
Expressed Sequence Tags (ESTs)
cDNAs or mRNAs from related species
Regions of sequence homology

27 How to get started …
Species homepage
MapView
Text search
BLAST
SSAHA

28 Homepage

29 MapView

30 BLAST and SSAHA
See BLAST hit on genome

31 Regions, maps and markers
MarkerView
SNPView
GeneSNPView
ContigView
CytoView
SyntenyView
MultiContigView

32 Ensembl ContigView

33 ContigView close-up
Transcripts: red & black (Ensembl predictions), blue (Vega) & gold (HAVANA, only in human)
Pop-up menu

34 ContigView - Navigation
Click and drag mouse to select region

35 CytoView

36 GeneSNPView

37 SNPView

38 MarkerView

39 MultiContigView

40 Genes & gene products
GeneView
TransView
ExonView
ProteinView
FamilyView
GOView

41 Ensembl GeneView

42 ExonView TransView

43 ProteinView

44 FamilyView

45 GOView

46 Data retrieval
BioMart
Data sets on ftp site
MySQL queries of databases
Perl API access to databases
ExportView
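
To give a feel for direct SQL access, here is a self-contained sketch of a "genes overlapping a region" query. It is an illustration under stated assumptions, not the Ensembl schema: SQLite stands in for MySQL so the example runs without a server, and the table, column and gene names are invented for the demo.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a toy gene table (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (name TEXT, region TEXT, "
        "seq_start INTEGER, seq_end INTEGER)"
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        [
            ("geneA", "1", 1000, 5000),
            ("geneB", "1", 7000, 9000),
            ("geneC", "2", 1500, 2500),
        ],
    )
    return conn


def genes_in_region(conn: sqlite3.Connection, region: str,
                    start: int, end: int) -> List[Tuple[str, int, int]]:
    """Return genes on `region` that overlap the interval [start, end]."""
    sql = """
        SELECT name, seq_start, seq_end
        FROM gene
        WHERE region = ? AND seq_start <= ? AND seq_end >= ?
        ORDER BY seq_start
    """
    # Two intervals overlap when each one starts before the other ends.
    return conn.execute(sql, (region, end, start)).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    print(genes_in_region(connection, "1", 4000, 7500))
```

The same overlap condition carries over to a real database query; only the connection and the actual table layout would differ.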

47 ExportView

48 Help!
context sensitive help pages - click
access other documentation via generic home page
email the helpdesk

49 Ensembl Team July 2006

50 Ensembl Team March 2007
Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)
Database Schema and Core API: Glenn Proctor, Andreas Kähäri, Ian Longden, Patrick Meidl
BioMart: Arek Kasprzyk, Damian Smedley, Richard Holland, Syed Haider
Distributed Annotation System (DAS): Eugene Kulesha
Outreach: Xosé M Fernández, Bert Overduin, Giulietta Spudich, Michael Schuster
Web Team: James Smith, Bethan Pritchard, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion (VEGA), Matt Wood
Comparative Genomics: Abel Ureta-Vidal, Kathryn Beal, Benoît Ballester, Stephen Fitzgerald, Javier Herrero Sánchez, Albert Vilella
Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerck Vogel, Kevin Howe, Felix Kokocinski, Stephen Rice, Simon White
Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios
Zebrafish Annotation: Kerstin Howe, Mario Caccamo, Tina Eyre, Ian Sealy
VectorBase Annotation: Martin Hammond, Dan Lawson, Karyn Megy
Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard
Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Benedict Paten, Daniel Zerbino, Dace Ruklisa

51 Training... Somewhere near you
