1 of 42 Browsing Genes and Genomes with Ensembl Maria Wilbe Department of Animal Breeding and Genetics, SLU, Sweden

2 of 42 Several lecture notes taken from:
- Bert Overduin, Ensembl User Support, EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Alvaro Martinez Barrio, Linnaeus Centre for Bioinformatics, Uppsala University, Sweden

3 of 42 What is Ensembl?
- A software system that produces and maintains automatic annotation on selected eukaryotic genomes:
  - performs automatic analysis of new genome data
  - maintains the analysis and annotation on the current data
  - presents the analysis to all via the web
- Ensembl concentrates on vertebrate genomes, but other groups have adapted the system for use with plant and fungal genomes
- "Powered by Ensembl" shows a list of projects that use Ensembl technology

4 of 42 Ensembl - Organisation
- Joint project between the European Bioinformatics Institute (EMBL-EBI) and the Wellcome Trust Sanger Institute
- Started in 1999 for the Human Genome Project
- Funded primarily by the Wellcome Trust, with additional funding from EMBL, EU, NIH-NIAID, BBSRC and MRC
- Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)
- Uses the largest dedicated computer system in biology in Europe

5 of 42 A Bit of History: sequenced genomes
- 1995: Haemophilus influenzae, 1.8 Mb
- 1996: Yeast, 12 Mb
- 1998: C. elegans, 100 Mb
- 1999: Fruit fly, 125 Mb
- 2000: Arabidopsis, 115 Mb
- 2001: Human (draft)
- 2002: Mouse, 2.6 Gb
- 2004: Human ("finished"), 3 Gb

6 of 42 Sequencing genomes
DNA sequencing is a method for determining the order of the nucleotide bases (A, T, C, G) in a DNA molecule.

7 of 42 Ensembl genomes (Ensembl release 49 - March 2008)

8 of 42 Species in Ensembl
[Figure: species groups placed against geological time (Cambrian to Tertiary, in million years before present); labelled groups: mammals (placentals, marsupials, monotremes), birds (passerines, paleognaths, other birds), reptiles (crocodiles, turtles, lizards), amphibians, fishes (teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes), agnathans, non-vertebrates]

9 of 42 Ensembl - Goals
- Provide automatic annotation of genomic sequence
- Integrate other biological data
- Make data available to all via the web

10 of 42 Annotation
Wikipedia: Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. Identifying elements on the genome, a process called gene finding:
   - ORFs and their localisation
   - gene structure
   - coding regions
   - location of regulatory motifs
2. Attaching biological information to these elements:
   - biochemical function
   - biological function
   - regulation and interactions involved
   - expression
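To make the "ORFs and their localisation" step concrete, here is a minimal Python sketch of a naive open reading frame scan. It is an illustration only and not how the Ensembl genebuild finds genes: it assumes an ORF starts at ATG and ends at the first in-frame TAA, TAG or TGA, it scans only the three forward reading frames (no reverse complement), and the `min_codons` cut-off is a hypothetical parameter chosen for the example.

```python
# Naive ORF scan (illustrative sketch, not Ensembl's gene-finding method).
# Assumptions: forward strand only, ATG start, first in-frame stop ends the ORF,
# min_codons is an arbitrary length threshold (stop codon not counted).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) tuples; 0-based, end exclusive, stop included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGCC"
    for frame, start, end in find_orfs(dna):
        print(frame, start, end, dna[start:end])
```

Real gene finders also weigh splice signals, codon usage and, as the following slides describe for Ensembl, aligned experimental evidence; a bare ATG-to-stop scan only marks candidate coding stretches.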

11 of 42 The big Genome Browsers
- Ensembl Genome Browser
- NCBI Map Viewer
- UCSC Genome Browser

12 of 42 Ensembl / NCBI Map Viewer / UCSC
- All allow access to multiple organisms
- All are based on the same underlying sequence data
- Annotations are different
- Assembly versions may differ
- Some organisms are available in only one of the browsers

13 of 42 NCBI Map Viewer - Opening page

14 of 42 NCBI Map Viewer - Result page

15 of 42 UCSC Genome Browser - Opening page

16 of 42 UCSC Genome Browser - Search page

17 of 42 UCSC Genome Browser - Default view

18 of 42 UCSC Genome Browser - Options

19 of 42 UCSC Genome Browser - BLAT search

20 of 42 Ensembl Genome Browser - Opening page

21 of 42 Ensembl Genome Browser - Search view: choose a human gene

22 of 42 Ensembl Genome Browser - Gene view

23 of 42 Ensembl Genome Browser - BLAST

24 of 42 What Distinguishes Ensembl from the UCSC and NCBI Browsers?
- Automatic annotation for those species for which no manually curated gene set exists
- Direct database access and programmatic access via the Perl API
- Not only the data but also the software source code is open source
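The slide names the Perl API for programmatic access. Ensembl also runs a public REST service at rest.ensembl.org, which can be called from any language; the Python sketch below uses that service rather than the Perl API, as an assumption-labelled alternative that is not part of the original slides. It builds a request for the `lookup/symbol/:species/:symbol` endpoint, here for the EPO gene used in the course exercise. The helper names `lookup_symbol_url` and `lookup_symbol` are hypothetical, and `lookup_symbol` needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Public Ensembl REST server (an alternative to the Perl API named on the slide).
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build a lookup/symbol URL, e.g. for species 'homo_sapiens' and symbol 'EPO'."""
    return f"{REST_SERVER}/lookup/symbol/{quote(species)}/{quote(symbol)}"


def lookup_symbol(species: str, symbol: str) -> dict:
    """Fetch the JSON record for a gene symbol (requires network access)."""
    req = Request(lookup_symbol_url(species, symbol),
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(lookup_symbol_url("homo_sapiens", "EPO"))
```

A possible usage, assuming the service responds: `gene = lookup_symbol("homo_sapiens", "EPO")`, then `print(gene["id"], gene["seq_region_name"], gene["start"], gene["end"])` to see the gene's Ensembl ID and location. Check the exact response fields against the REST documentation for the release you use.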

25 of 42 Which Data Are Available?
- Genomic sequence
- Transcript and peptide models
- External references
- Variation data: SNPs
- Mapped cDNAs, peptides, microarray probes, BAC clones etc.
- Other features of the genome: cytogenetic bands, markers, repeats etc.
- Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions
- Regulatory data: "best guess" set of regulatory elements
- Data from external sources (DAS)

26 of 42 Genomic sequence: gene location

27 of 42 Genomic sequence: export

28 of 42 Transcript and peptide info (click to view)

29 of 42 External references (click to view)

30 of 42 Single nucleotide polymorphisms (SNPs)
- Two human genomes differ by ~0.1%
- Polymorphism: a DNA variation in which each possible sequence is present in at least 1% of people
- Most polymorphisms (~90%) take the form of SNPs: variations that involve just one nucleotide
- ~1 out of every 300 bases in the human genome
- ~10 million in the human genome
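The two SNP figures on this slide are consistent with each other, using the ~3 Gb human genome size from the history slide. Dividing the genome length by the SNP spacing gives the total SNP count, and applying the 0.1% figure gives the number of differing positions between two individual genomes:

```latex
\frac{3 \times 10^{9}\ \text{bp}}{300\ \text{bp per SNP}} = 1 \times 10^{7}\ \text{SNPs} \approx 10\ \text{million}

0.001 \times 3 \times 10^{9}\ \text{bp} = 3 \times 10^{6}\ \text{differing positions between two human genomes}
```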

31 of 42 Practical Applications
- Disease diagnosis
- Association studies
- Forensic testing
- Population genetics and evolutionary studies
- Marker-assisted selection

32 of 42 SNPs in Ensembl - Types
- Non-synonymous: in coding sequence, resulting in an amino acid change
- Synonymous: in coding sequence, not resulting in an amino acid change
- Frameshift: in coding sequence, resulting in a frameshift
- Stop lost: in coding sequence, resulting in the loss of a stop codon
- Stop gained: in coding sequence, resulting in the gain of a stop codon
- Essential splice site: in the first 2 or the last 2 base pairs of an intron
- Splice site: 1-3 bp into an exon or 3-8 bp into an intron
- Upstream: within 5 kb upstream of the 5' end of a transcript
- Regulatory region: in a regulatory region annotated by Ensembl
- 5' UTR: in the 5' UTR
- Intronic: in an intron
- 3' UTR: in the 3' UTR
- Downstream: within 5 kb downstream of the 3' end of a transcript
- Intergenic: more than 5 kb away from a transcript
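The position-based categories in this list (upstream, downstream, intergenic, splice site, essential splice site, intronic) can be sketched as a small classifier. This is not Ensembl's own code; it is a simplified Python illustration that applies only the distance rules on the slide. Assumptions: a single forward-strand transcript, exons given as sorted, non-overlapping, 1-based inclusive `(start, end)` pairs, and no distinction between coding sequence and UTRs, so the coding categories (synonymous, frameshift, stop gained, etc.) are collapsed into a hypothetical `exonic` label.

```python
# Simplified, position-only SNP classification using the thresholds on the slide.
# Assumptions: forward-strand transcript; exons sorted, non-overlapping, 1-based
# inclusive (start, end); coding vs UTR is not distinguished ("exonic").
FLANK = 5_000  # 5 kb window for upstream / downstream


def classify_position(pos: int, exons: list[tuple[int, int]]) -> str:
    tx_start, tx_end = exons[0][0], exons[-1][1]
    if pos < tx_start:
        return "upstream" if tx_start - pos <= FLANK else "intergenic"
    if pos > tx_end:
        return "downstream" if pos - tx_end <= FLANK else "intergenic"

    for i, (start, end) in enumerate(exons):
        if start <= pos <= end:
            # 1-3 bp into an exon, measured from an edge that borders an intron.
            near_left = i > 0 and pos - start < 3
            near_right = i < len(exons) - 1 and end - pos < 3
            return "splice_site" if near_left or near_right else "exonic"

    # Not in an exon, so pos lies in the intron between two consecutive exons.
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        if left_end < pos < right_start:
            from_start = pos - left_end   # 1 = first base of the intron
            from_end = right_start - pos  # 1 = last base of the intron
            nearest = min(from_start, from_end)
            if nearest <= 2:
                return "essential_splice_site"
            if nearest <= 8:
                return "splice_site"
            return "intronic"
    raise ValueError("exons must be sorted and non-overlapping")


if __name__ == "__main__":
    transcript = [(10_000, 10_100), (10_201, 10_300)]  # hypothetical transcript
    for p in (4_999, 5_000, 10_098, 10_101, 10_105, 10_150, 15_300):
        print(p, classify_position(p, transcript))
```

Checking the intron edges first with the stricter 2 bp rule, and only then the 3-8 bp rule, mirrors the way the slide separates "essential splice site" from the broader "splice site" category.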

33 of 42 SNPs in Ensembl ContigView: SNPs in genomic context

34 of 42 SNPs in Ensembl

35 of 42 Biological Evidence
All Ensembl gene predictions are based on experimental evidence:
- UniProt/Swiss-Prot: a manually curated database and therefore of the highest accuracy
- NCBI RefSeq: a partially manually curated database
- UniProt/TrEMBL: automatically annotated translations of EMBL coding sequence (CDS) features
- EMBL / GenBank / DDBJ: primary nucleotide sequence repository

36 of 42 The Ensembl Genebuild
Genome assembly + computer programs + experimental evidence = Ensembl genes

37 of 42 Ensembl Identifiers
- ENSG###: Ensembl Gene ID
- ENST###: Ensembl Transcript ID
- ENSP###: Ensembl Peptide ID
- ENSE###: Ensembl Exon ID
- ENSF###: Ensembl Family ID
- ENSR###: Ensembl Regulatory Feature ID
For species other than human, a species code is inserted after "ENS": MUS for mouse (Mus musculus) gives ENSMUSG###, DAR for zebrafish (Danio rerio) gives ENSDARG###, etc.
For imported genes, Ensembl uses the original identifiers.
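The naming scheme above is regular enough to decode mechanically. The Python sketch below splits an ID into species code, feature type and number, using only the prefixes and species codes listed on this slide. Two assumptions go beyond the slide: the number part is taken to be all digits, and an optional ".N" version suffix (as seen on stable IDs in later Ensembl releases) is accepted. The function name `parse_stable_id` is hypothetical, the numeric parts in the example IDs are illustrative, and imported genes with their original identifiers will not match.

```python
import re

# Decode Ensembl-style stable IDs using the prefixes listed on the slide.
# Assumptions: digits-only number part; optional ".N" version suffix.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide",
                 "E": "exon", "F": "family", "R": "regulatory feature"}
SPECIES_CODES = {None: "human",
                 "MUS": "mouse (Mus musculus)",
                 "DAR": "zebrafish (Danio rerio)"}

# ENS + optional 3-letter species code + feature letter + digits + optional version
ID_PATTERN = re.compile(r"^ENS([A-Z]{3})?([GTPEFR])(\d+)(?:\.(\d+))?$")


def parse_stable_id(stable_id: str) -> dict:
    match = ID_PATTERN.match(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl-style stable ID: {stable_id!r}")
    code, letter, number, version = match.groups()
    return {
        "species": SPECIES_CODES.get(code, code),  # unknown codes returned as-is
        "type": FEATURE_TYPES[letter],
        "number": number,
        "version": int(version) if version else None,
    }


if __name__ == "__main__":
    for example in ("ENSG00000000003", "ENSMUST00000000001.4", "ENSDARP00000000001"):
        print(example, parse_stable_id(example))
```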

38 of 42 Pre! and Archive! Sites

39 of 42 Powered by Ensembl

40 of 42 Ensembl – Open Source
- Data and software freely available
- More than 50 installations worldwide, in academia and industry
- Local installations or access via the web
- Mirrors with Ensembl data, e.g. or user projects with their own data

41 of 42 Ensembl Accounts
- Personalise Ensembl by saving bookmarks, view configurations and homepage preferences in a user account
- Share bookmarks and configurations by setting up groups
- Please note that all Ensembl data remain freely accessible. It is not necessary to register in order to gain access to Ensembl data!

42 of 42 Website Statistics
- On average 1,000,000 page impressions per week
- Top 3 species:
- Top 3 countries:

43 of 42 What If I Need Help?
- Helpdesk:
- Mailing lists:
- Animated tutorials

44 of 42 Today
1. Ensembl:
   1. Worked example: a walk through the main pages of the Ensembl browser, using the EPO (erythropoietin precursor) gene as an example (Course Homepage).
   2. Ensembl exercise: answering questions by using Ensembl (Course Homepage).
   3. If time, find information about your favorite gene by using Ensembl.