How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November.


1 How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November 2004

2 Schedule. Today: introduction to the Ensembl system; hands-on examples to introduce the system; evaluating genes and transcripts; variation in Ensembl (SNPs, haplotypes). Tomorrow: data mining with EnsMart; comparative genomics and proteomics in Ensembl; BioMart; advanced topics (upload your own data, DAS).

3 Our goal

4 From 325,109 initial contigs, via other ordering data, to 26,720 overlapping clones. Assembly: non-redundant, “virtual contig” view.
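The idea of a non-redundant view over overlapping clones can be pictured with a short sketch. This is only an illustration of the concept, not Ensembl's assembly code: the clone names, the coordinates and the `tiling_path` function are all hypothetical, and the rule used here (each clone contributes only the part not already covered by an earlier clone) is an assumed simplification.

```python
from typing import List, Tuple

# (clone name, start, end) in 1-based inclusive chromosome coordinates.
# All values below are made up for illustration.
Clone = Tuple[str, int, int]


def tiling_path(clones: List[Clone]) -> List[Clone]:
    """Return non-overlapping segments, each attributed to a single clone."""
    path: List[Clone] = []
    covered_to = 0  # last coordinate already covered by the path
    for name, start, end in sorted(clones, key=lambda c: (c[1], c[2])):
        if end <= covered_to:
            continue  # clone lies entirely within sequence already covered
        seg_start = max(start, covered_to + 1)  # trim the overlapping part
        path.append((name, seg_start, end))
        covered_to = end
    return path


clones = [
    ("cloneA", 1, 150),
    ("cloneB", 100, 260),
    ("cloneC", 120, 200),
    ("cloneD", 300, 420),
]
path = tiling_path(clones)
# Length of the non-redundant view: each base is counted once.
non_redundant_length = sum(end - start + 1 for _, start, end in path)
```

In this toy input, `cloneC` contributes nothing because it is fully covered by earlier clones, and the stretch between `cloneB` and `cloneD` stays a gap.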

5 Mapping and Sequencing the human genome. Diagram labels: map; fragment: BACs (bacterial artificial chromosomes, avg size 150 kb); fragment: pUCs (avg size 2-4 kb); WGS; draft sequence assembly; finished BAC. References shown: Shizuya et al 1992; Dib et al 1996; Deloukas et al 1998; Bentley et al 2001; Bruls et al 2001; McPherson et al 2001; Montgomery et al 2001; Tilford et al 2001; Osoegawa et al 2001.

6 Status of the human sequence. Finished (red/orange): ~96% (99.999% accurate). Heterochromatin (grey): ~4%. 30-40% repetitive elements (e.g. Alpha satellite, Alu repeats). All known genes correctly identified (99.74%). Assembled draft sequence totals 2.85 Gb.

7 Human genome: Current status. 22,287 ‘gene loci’ defined, consisting of 19,599 protein-coding genes in the human genome and 2,188 additional DNA segments ‘predicted’ to be protein-coding genes – 1183 genes ‘were born’ in the last 60-100 My – ~30 genes ‘died’ in a similar time period. Finishing the euchromatic sequence of the human genome, Nature 431:931-45 (2004)

8 Ensembl - project aims: funded to provide metazoan genomes to the world; aims to provide the world’s best automated genome annotation; a leading group for human and mouse analysis; all software, data and results freely available

9 Ensembl - project background: group split between EBI and Sanger; mainly Wellcome Trust funded; largest dedicated compute in biology in Europe; developer community > 100 people, including companies

10 Ensembl – Open source. Freely available. Community development. – >51 Ensembl installs worldwide. – Both public and commercial, e.g. Gramene (CSHL), Fugu-sg (ICMB), Ciona-sg (Temasek)

11 Diagram labels: Analysis DB; CPU; Final DB; Supporting Databases; SNP; Manual Annotation; Ensembl

12 Genome browsing: why present the whole genome? Explore what is in a chromosome region; see features in and around a specific gene; search & retrieve across the whole genome; investigate genome organization; compare to other genomes

13 Genome browsers: Ensembl – public site + installable system (http://www.ensembl.org); NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/mapview); UCSC Human Genome Browser (http://genome.ucsc.edu)

14 Introduction to the Ensembl web site. Ensembl … takes genomic sequence assemblies (human build 34, mouse, rat, Fugu, mosquito); adds annotation and links; automated process; presents all the data on a web site

15 Annotation: genes. Known genes: where? genomic structure? transcript(s)? protein(s)? orthologues? attach useful links. Novel genes: how to predict? require evidence; transcript(s)? protein(s)? orthologues? attach useful links

16 Annotation: other features. Markers and SNPs; cytogenetic bands; repeated sequences; ESTs & other sequence records: where do they show sequence similarity?; regions homologous to other species

17 How to get started … Species homepage; Site map; Map View; Text search; BLAST; SSAHA; Disease View

18 Homepage

19 Site map

20 MapView AnchorView

21 BLAST and SSAHA

22

23 Regions, maps and markers: MarkerView; SNPView; ContigView; CytoView; SyntenyView; MultiContigView

24 Ensembl ContigView

25 ContigView close-up. Labels: Evidence; Transcripts: red & black (Ensembl predictions), blue (Vega); Customising & short cuts; Pop-up menu

26 ContigView - Chromosome 20 close-up. Labels: Manual annotation via Vega; Ensembl predictions; Ensembl EST-based predictions; Forward strand; Reverse strand. Other chromosomes with manual annotation from http://vega.sanger.ac.uk: 6, 7, 9, 10, 13, 14, 20, 22, X

27 CytoView

28 GeneSNP View

29 MarkerView SNPView

30 Synteny View

31 MultiContig View

32 Genes & gene products: GeneView; TransView; ExonView; ProteinView; FamilyView; DomainView; GOView; DiseaseView

33 Ensembl GeneView

34 TransView ExonView

35 Protein View

36 Family View

37 GOView

38 DiseaseView

39 Data retrieval: EnsMart; data sets on ftp site; MySQL queries of databases; Perl API access to databases; Export View
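To give a feel for what a direct database query for region-based retrieval might look like, here is a minimal sketch. It runs against an in-memory SQLite database so that it is self-contained; the `gene` table, its columns and the gene identifiers are invented for illustration and are not the actual Ensembl schema, and the slides refer to MySQL rather than SQLite.

```python
import sqlite3

# Toy stand-in for a genome annotation database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (stable_id TEXT, chromosome TEXT, "
    "seq_start INTEGER, seq_end INTEGER, strand INTEGER)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
    [
        ("GENE_A", "20", 1000, 5000, 1),
        ("GENE_B", "20", 4500, 9000, -1),
        ("GENE_C", "20", 20000, 25000, 1),
        ("GENE_D", "6", 1200, 3000, 1),
    ],
)


def genes_in_region(conn, chromosome, start, end):
    """Return IDs of genes overlapping chromosome:start-end (inclusive)."""
    rows = conn.execute(
        "SELECT stable_id FROM gene "
        # Two intervals overlap when each starts before the other ends.
        "WHERE chromosome = ? AND seq_start <= ? AND seq_end >= ? "
        "ORDER BY seq_start",
        (chromosome, end, start),
    )
    return [row[0] for row in rows]


hits = genes_in_region(conn, "20", 4000, 10000)
```

The overlap condition picks up genes that only partly fall inside the requested region, which is usually what a region-based view needs. A real query would follow the table and column names given in the Ensembl schema documentation.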

40

41 EnsMart

42 Mouse differences. Genomic sequence assembly based on whole genome shotgun, with finished ‘stitched’ BACs. BACs are shown in CytoView (FPC map), but for most no sequence is available.

43 Mouse CytoView

44 Help! Context sensitive help pages - click; access other documentation via generic home page; email the helpdesk. HelpDesk / Suggestions

45 Thanks Ensembl Team

46 Ensembl Team, November 2004. Project Leader: Ewan Birney (EBI), Tim Hubbard (Sanger). Database Schema and Core API: Arne Stabenau, Yuan Chen, Ian Longden, Craig Melsopp, Glenn Proctor, Daniel Ríos, Guy Slater. Distributed Annotation System: Andreas Kähäri. Ensembl Web Team: James Stalker, Fiona Cunningham, James Smith. Vega Web Team: Patrick Meidl, Steve Trevianon. Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Dan Andrews, Mario Caccamo, Laura Clarke, Martin Hammond, Jan Hinnerck-Vogel, Kevin Howe, Vivek Iyer, Kerstin Jekosch, Felix Kokocinski, Simon White. User Support: Xosé Mª Fernández, Michael Schuster. Comparative Genomics: Abel Ureta-Vidal, Javier Herrero Sánchez, Jessica Severin, Cara Woodwark. EnsMart & BioMart: Arek Kasprzyk, Damian Keefe, Darin London, Damian Smedley.

