1 of 42 Browsing Genes and Genomes with Ensembl Maria Wilbe Department of Animal Breeding and Genetics, SLU, Sweden


1 1 of 42 Browsing Genes and Genomes with Ensembl Maria Wilbe Department of Animal Breeding and Genetics, SLU, Sweden Maria.Wilbe@hgen.slu.se

2 2 of 42 Several lecture notes taken from:
- Bert Overduin, Ensembl User Support, EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Alvaro Martinez Barrio, Linnaeus Centre for Bioinformatics, Uppsala University, Sweden

3 3 of 42 What is Ensembl
A software system which produces and maintains automatic annotation on selected eukaryotic genomes.
- Performs automatic analysis of new genome data
- Maintains analysis and annotation on the current data
- Presents the analysis to all via the web
Ensembl concentrates on vertebrate genomes, but other groups have adapted the system for use with plant and fungal genomes.
"Powered by Ensembl" shows a list of projects that use Ensembl technology.

4 4 of 42 Ensembl - Organisation
- Joint project between European Bioinformatics Institute (EMBL-EBI) and Wellcome Trust Sanger Institute
- Started in 1999 for the Human Genome Project
- Funded primarily by the Wellcome Trust, additional funding by EMBL, EU, NIH-NIAID, BBSRC and MRC
- Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)
- Uses the largest dedicated computer system in biology in Europe

5 5 of 42 A Bit of History
Sequenced genomes:
Year  Organism                Genome size
1995  Haemophilus influenzae  1.8 Mb
1996  Yeast                   12 Mb
1998  C. elegans              100 Mb
1999  Fruit fly               125 Mb
2000  Arabidopsis             115 Mb
2001  Human (draft)
2002  Mouse                   2.6 Gb
2004  Human ("finished")      3 Gb

6 6 of 42 Sequencing genomes
DNA sequencing is a method for determining the order of the nucleotide bases (A, T, C, G) in a DNA molecule.

7 7 of 42 Ensembl genomes (Ensembl release 49 - March 2008)

8 8 of 42 Species in Ensembl
[Figure: vertebrate phylogeny drawn against a geological time scale (Cambrian to Tertiary, 570 to 65 MYBP), with branches for non-vertebrates, agnathans, fishes, amphibians, reptiles, birds and mammals (monotremes, marsupials, placentals)]

9 9 of 42 Ensembl - Goals
- Provide automatic annotation of genomic sequence
- Integrate other biological data
- Make data available to all via the web

10 10 of 42 Annotation
Wikipedia: Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. Identifying elements on the genome, a process called gene finding:
- ORFs and their localisation
- gene structure
- coding regions
- location of regulatory motifs
2. Attaching biological information to these elements:
- biochemical function
- biological function
- involved regulation and interactions
- expression
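To make the "ORFs and their localisation" step concrete, here is a minimal sketch of a toy open reading frame scan in Python. It is not how Ensembl finds genes (the Ensembl genebuild relies on experimental evidence, as described later in the deck); it only illustrates the idea of locating start and stop codons. The function name `find_orfs`, the `min_codons` parameter and the forward-strand-only restriction are assumptions made for this example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of forward-strand ORFs.

    Coordinates are 0-based and half-open; an ORF runs from an ATG to the
    first in-frame stop codon (stop included). Hypothetical helper for
    illustration only: reverse strand and introns are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length is counted in codons, including the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # ATG at offset 2, in-frame TGA at offset 8
    print(find_orfs("CCATGCCCTGAAA"))  # [(2, 11)]
```

A real gene finder would also have to handle the reverse strand, splicing and supporting evidence, which is why the second annotation step and the evidence sources matter.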

11 11 of 42 The big Genome Browsers
- Ensembl Genome Browser: http://www.ensembl.org
- NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/
- UCSC Genome Browser: http://genome.ucsc.edu

12 12 of 42 Ensembl / NCBI Map Viewer / UCSC
- All give access to multiple organisms
- All are based on the same data
- Annotations are different
- Assembly versions may differ
- Some organisms are available in only one of the browsers

13 13 of 42 NCBI Map Viewer - Opening page

14 14 of 42 NCBI Map Viewer - Result page

15 15 of 42 UCSC Genome Browser - Opening page

16 16 of 42 UCSC Genome Browser - Search page

17 17 of 42 UCSC Genome Browser - Default view

18 18 of 42 UCSC Genome Browser - Options

19 19 of 42 UCSC Genome Browser - BLAT search

20 20 of 42 Ensembl Genome Browser -Opening page

21 21 of 42 Ensembl Genome Browser - Search view Choose human gene

22 22 of 42 Ensembl Genome Browser - Gene view

23 23 of 42 Ensembl Genome Browser - BLAST

24 24 of 42 What Distinguishes Ensembl from the UCSC and NCBI Browsers?
- Automatic annotation for those species for which no manually curated gene set exists
- Direct database access and programmatic access via the Perl API
- Not only the data, but also the software source code is open source

25 25 of 42 Which Data Are Available?
- Genomic sequence
- Transcript and peptide models
- External references
- Variation data: SNPs
- Mapped cDNAs, peptides, microarray probes, BAC clones etc.
- Other features of the genome: cytogenetic bands, markers, repeats etc.
- Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions
- Regulatory data: "best guess" set of regulatory elements
- Data from external sources (DAS)

26 26 of 42 Genomic sequence Gene location

27 27 of 42 Genomic sequence Export

28 28 of 42 Transcript and peptide info Click to view

29 29 of 42 External references Click to view

30 30 of 42 Single nucleotide polymorphisms (SNPs)
- Two human genomes differ by ~0.1%
- Polymorphism: a DNA variation in which each possible sequence is present in at least 1% of people
- Most polymorphisms (~90%) take the form of SNPs: variations that involve just one nucleotide
- ~1 out of every 300 bases in the human genome
- ~10 million in the human genome
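The two SNP figures on this slide are consistent with each other, which a quick back-of-the-envelope check shows. The sketch below assumes a genome size of about 3 Gb, taken from the "finished" human genome entry on the history slide; the variable names are chosen for this example only.

```python
# Rough consistency check of the SNP numbers quoted on the slide.
GENOME_SIZE_BP = 3_000_000_000   # ~3 Gb, assumed from the history slide
SNP_SPACING_BP = 300             # ~1 SNP per 300 bases
PAIRWISE_DIFFERENCE = 0.001      # two genomes differ by ~0.1%

# Known polymorphic sites across the population
expected_snps = GENOME_SIZE_BP // SNP_SPACING_BP

# Positions at which two individual genomes are expected to differ
pairwise_differences = round(GENOME_SIZE_BP * PAIRWISE_DIFFERENCE)

if __name__ == "__main__":
    print(f"Expected SNPs in the genome: {expected_snps:,}")
    print(f"Differences between two genomes: {pairwise_differences:,}")
```

Under these assumptions, one SNP per 300 bases gives about 10 million SNPs in total, while any two individuals differ at roughly 3 million positions.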

31 31 of 42 Practical Applications
- Disease diagnosis
- Association studies
- Forensic testing
- Population genetics and evolutionary studies
- Marker-assisted selection

32 32 of 42 SNPs in Ensembl - Types
- Non-synonymous: in coding sequence, resulting in an aa change
- Synonymous: in coding sequence, not resulting in an aa change
- Frameshift: in coding sequence, resulting in a frameshift
- Stop lost: in coding sequence, resulting in the loss of a stop codon
- Stop gained: in coding sequence, resulting in the gain of a stop codon
- Essential splice site: in the first 2 or the last 2 basepairs of an intron
- Splice site: 1-3 bps into an exon or 3-8 bps into an intron
- Upstream: within 5 kb upstream of the 5'-end of a transcript
- Regulatory region: in regulatory region annotated by Ensembl
- 5' UTR: in 5' UTR
- Intronic: in intron
- 3' UTR: in 3' UTR
- Downstream: within 5 kb downstream of the 3'-end of a transcript
- Intergenic: more than 5 kb away from a transcript
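The upstream, downstream and intergenic definitions above depend only on the distance to a transcript and on its strand, so they can be sketched in a few lines of Python. This is an illustration of the 5 kb rule, not Ensembl's own code: the function name `classify_by_position`, the 1-based inclusive coordinates and the choice to treat a distance of exactly 5 kb as still "within" are assumptions.

```python
def classify_by_position(pos: int, tx_start: int, tx_end: int,
                         strand: int, flank: int = 5000) -> str:
    """Classify a SNP position relative to one transcript.

    Hypothetical helper: coordinates are 1-based and inclusive, with
    tx_start <= tx_end on the genome; strand is +1 or -1.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    if tx_start <= pos <= tx_end:
        # Exonic, intronic and UTR classes need the exon structure,
        # which this sketch does not model.
        return "within transcript"
    if pos < tx_start:
        distance = tx_start - pos
        # Lower coordinates lie 5' of a forward-strand transcript.
        side = "upstream" if strand == 1 else "downstream"
    else:
        distance = pos - tx_end
        side = "downstream" if strand == 1 else "upstream"
    return side if distance <= flank else "intergenic"


if __name__ == "__main__":
    # Transcript at 10,000-20,000 on the forward strand
    print(classify_by_position(9_000, 10_000, 20_000, strand=1))
    print(classify_by_position(26_000, 10_000, 20_000, strand=1))
```

Note that the same genomic position flips between upstream and downstream when the transcript lies on the reverse strand, because the 5'-end is then the higher coordinate.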

33 33 of 42 SNPs in Ensembl ContigView: SNPs in genomic context

34 34 of 42 SNPs in Ensembl

35 35 of 42 Biological Evidence
All Ensembl gene predictions are based on experimental evidence:
- UniProt/Swiss-Prot: a manually curated database and therefore of highest accuracy
- NCBI RefSeq: a partially manually curated database
- UniProt/TrEMBL: automatically annotated translations of EMBL coding sequence (CDS) features
- EMBL / GenBank / DDBJ: primary nucleotide sequence repository

36 36 of 42 The Ensembl Genebuild
Genome assembly + Experimental evidence + Computer programs => Ensembl Genes

37 37 of 42 Ensembl Identifiers
- ENSG###: Ensembl Gene ID
- ENST###: Ensembl Transcript ID
- ENSP###: Ensembl Peptide ID
- ENSE###: Ensembl Exon ID
- ENSF###: Ensembl Family ID
- ENSR###: Ensembl Regulatory Feature ID
For species other than human a species code is inserted after ENS: MUS for mouse (Mus musculus): ENSMUSG###, DAR for zebrafish (Danio rerio): ENSDARG###, etc.
For imported genes Ensembl uses the original identifiers.
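The identifier scheme above is regular enough to decode with a small regular expression. The following Python sketch assumes the layout shown on the slide: "ENS", an optional three-letter species code, one feature-type letter, then digits. Only the two species codes named on the slide are included, the identifiers in the usage lines are made up for illustration, and `parse_ensembl_id` is a hypothetical helper, not part of any Ensembl API.

```python
import re
from typing import NamedTuple, Optional

# Feature-type letters listed on the slide
FEATURE_TYPES = {
    "G": "gene",
    "T": "transcript",
    "P": "peptide",
    "E": "exon",
    "F": "family",
    "R": "regulatory feature",
}

# Only the species codes mentioned on the slide; others exist.
SPECIES_CODES = {"MUS": "Mus musculus", "DAR": "Danio rerio"}

# Assumed layout: ENS + optional 3-letter species code + type letter + digits
_ID_PATTERN = re.compile(r"^ENS([A-Z]{3})?([GTPEFR])(\d+)$")


class EnsemblId(NamedTuple):
    species: str
    feature_type: str
    number: str


def parse_ensembl_id(identifier: str) -> Optional[EnsemblId]:
    """Split an Ensembl-style identifier; return None if it does not match."""
    match = _ID_PATTERN.match(identifier.strip().upper())
    if match is None:
        return None
    code, type_letter, digits = match.groups()
    if code is None:
        species = "Homo sapiens"  # no species code means human
    else:
        species = SPECIES_CODES.get(code, f"unknown species code {code}")
    return EnsemblId(species, FEATURE_TYPES[type_letter], digits)


if __name__ == "__main__":
    for example in ("ENSG00000012345", "ENSMUSG00000012345", "EPO"):
        print(example, "->", parse_ensembl_id(example))
```

Identifiers imported from other databases, which Ensembl keeps unchanged, do not follow this pattern and come back as `None`.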

38 38 of 42 Pre! and Archive! Sites

39 39 of 42 Powered by Ensembl

40 40 of 42 Ensembl – Open Source
- Data and software freely available
- More than 50 installs worldwide
- Academia and industry
- Local or available via the web
- Mirrors with Ensembl data, e.g. http://ensembl.genome.tugraz.at/index.html, or user projects with own data

41 41 of 42 Ensembl Accounts
- Personalise Ensembl by saving bookmarks, view configurations and homepage preferences in a user account
- Share bookmarks and configurations by setting up groups
Please note that all Ensembl data remain freely accessible. It is not necessary to register in order to gain access to Ensembl data!

42 42 of 42 Website Statistics On average 1,000,000 page impressions / week Top 3 species: Top 3 countries:

43 43 of 42 What If I Need Help?
- Helpdesk: helpdesk@ensembl.org
- Mailing lists: ensembl-dev@ebi.ac.uk, ensembl-announce@ebi.ac.uk
- Animated tutorials: http://www.ensembl.org/common/Workshops_Online

44 44 of 42 Today
1. Ensembl: www.ensembl.org
   1. Worked example: a walk through the main pages of the Ensembl browser, using the EPO (Erythropoietin precursor) gene as an example (Course Homepage).
   2. Ensembl exercise: answering questions by using Ensembl (Course Homepage).
   3. If time allows, find information about your favourite gene by using Ensembl.

