Ensembl Database and Web Browser


www.ensembl.org
Stephen Baird, Apoptosis Research Centre, Children's Hospital of Eastern Ontario (sbaird@arc.cheo.ca)
Erin Pleasance, February 21, 2005
Lecture/Lab 7.3 (c) 2005 CGDN

Focus on vertebrates
- No fungi or plants
- A Brassica/Arabidopsis genome browser is at http://ensembl.warwick.ac.uk/

What is Ensembl?
- Joint project of the EBI and the Sanger Institute
- Automated annotation of eukaryotic genomes
- Open source software
- Relational database system
- Web interface
“The main aim of this campaign is to encourage scientists across the world - in academia, pharmaceutical companies, and the biotechnology and computer industries - to use this free information.” - Dr. Mike Dexter, Director of the Wellcome Trust

Ensembl components
Data (with the views that display it):
- Chromosomes (FeatureView, KaryoView, CytoView, MapView)
- SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView)
- Diseases (DiseaseView)
- Genome Sequence (ContigView)
- Genes (GeneView, TransView, ExonView, GeneSeqView)
- Markers (MarkerView)
- Functions (GOView)
- Protein (ProtView, DomainView, FamilyView)
- Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView)
- Other annotations
Search tools:
- Sequence Similarity (BLAST, SSAHA)
- Text (TextView)
- Anything (BioMart/MartView)

Ensembl Gene Annotation
- “Basis for initial analysis and publication of most vertebrate genomes”
- Genome assembly from NCBI
- Gene build system:
  - Targeted gene builds predict known genes
  - Similarity gene builds predict novel genes

Curwen et al., Genome Res 14: 942-950, 2004

Targeted gene build
- Align known proteins with pmatch and BLAST
- Incorporate aligned cDNA sequences to find splice sites and UTRs with genewise
- Figure: ContigView of a "best in genome" gene with associated evidence (labels: known gene (p53), UTRs predicted, proteins aligned, Unigene clusters aligned, cDNAs aligned)

Similarity gene build
- Identify novel exons ab initio using Genscan
- Confirm exons by BLAST to known proteins, mRNAs, and UniGene clusters
- Figure: ContigView of a homology gene with associated evidence (labels: novel gene, Genscan predictions, proteins aligned, Unigene clusters aligned)

Ensembl Gene Annotation
- The resulting “Ensembl genes” are highly accurate, with low false positive rates
- Ensembl human gene identifiers are 95% stable between builds
- Ensembl and RefSeq differ for 8-12% of the genes
- The Consensus CDS (CCDS) project is a collaborative effort between Ensembl/EBI, UCSC and NCBI to identify a core set of human protein coding regions that are consistently annotated and of high quality (~13,000 genes)

Manually curated genes: VEGA
- Some chromosomes contain manually curated genes from the VEGA database
- The “Otter manual annotation system” lets annotators in the Human and Vertebrate Analysis and Annotation (HAVANA) group at the Sanger Institute integrate automatic and manual annotations (e.g. from Apollo) into Ensembl
- Figure label: VEGA gene

Ensembl EST genes
- ESTs are not accurate enough to produce Ensembl genes, but they are important for identifying alternative transcripts
- ESTs are aligned to the genome and merged to create an independent set of “EST genes”
- Figure labels: known gene, EST genes, Unigene clusters aligned

Pseudogenes
- Processed pseudogenes are identified in the annotation (lack of introns, frameshifts, presence of a multi-exon version elsewhere in the genome, etc.)
- Figure label: pseudogene
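
The criteria listed on this slide can be read as a simple rule of thumb. The sketch below is a toy illustration only, not Ensembl's pipeline: the `Candidate` record, its field names, and the decision rule are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one gene prediction (illustrative fields only)."""
    exon_count: int                  # 1 means the prediction has no introns
    has_frameshift: bool             # frameshift relative to the aligned protein
    multi_exon_copy_elsewhere: bool  # a spliced copy exists at another locus


def looks_like_processed_pseudogene(c: Candidate) -> bool:
    # Assumed toy rule: the prediction must be intronless AND show at least
    # one further piece of evidence from the slide's list.
    intronless = c.exon_count == 1
    supporting = c.has_frameshift or c.multi_exon_copy_elsewhere
    return intronless and supporting


if __name__ == "__main__":
    print(looks_like_processed_pseudogene(Candidate(1, False, True)))
    print(looks_like_processed_pseudogene(Candidate(5, True, True)))
```

A real classifier would weigh these signals rather than apply a hard cut-off; the boolean rule here only makes the slide's criteria concrete.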

Noncoding RNA Genes
- Functional genes with no ORFs (tRNAs, rRNAs, miRNAs, ...)
- 7220 annotations from Sean Eddy and Tom Jones
- Figure labels: miRNAs, coding gene

Example 1: Exploring Caspase-3
- Aim: demonstrate basic browsing and views
- Caspase-3 is a gene involved in apoptosis (cell suicide)
- We will look at:
  - Gene annotation
  - SNPs
  - Orthologs and genome alignments
  - Alternative transcripts and EST genes
  - Protein structure

Text Search
- Species-specific homepage
- Search type: Gene; search term: caspase-3

GeneView
- Links to: GeneSpliceView, GeneRegulationView, ContigView, GeneSNPView, TransView of transcript, ExonView, ExportView, ProteinView
- Orthologs predicted by sequence similarity and synteny

GeneView
- DAS (Distributed Annotation System): external annotation of splicing, transcripts, array expression, PubMed links, associated phenotypes, Protonet, Reactome, UniProt
- Information for each transcript: similarity matches, links to RefSeq, OMIM, PDB, array probes, GO, InterPro, Protein FamilyView, transcript structure, protein properties

GeneView: link to GeneSNPView

GeneSNPView

Other SNP/Haplotype tools
- SNPView: information on a single SNP
- ProteinView: protein sequence with SNP markup
- LDView: view linkage disequilibrium (only limited regions)
- HaploView: view haplotypes (only limited regions)
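
LDView reports linkage disequilibrium between pairs of SNPs. As a worked illustration, the sketch below computes the standard statistics D, D' and r² from four haplotype counts. The function name and the example counts are made up for this example, and nothing here describes how LDView itself computes its values.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    Arguments are counts of the four haplotypes, where A/a are the alleles
    at the first SNP and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    # D: deviation of the observed haplotype frequency from independence.
    d = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("one of the SNPs is monomorphic")
    r_squared = d * d / denom

    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    return d, d_prime, r_squared


if __name__ == "__main__":
    # Invented counts: the two SNPs are always inherited together.
    print(ld_stats(50, 0, 0, 50))
```

With only the A-B and a-b haplotypes present, both D' and r² reach their maximum of 1; with all four haplotypes equally common, D is 0.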

Click Back to GeneView

ContigView
- Chromosome and bands
- Sequence contigs
- To Detailed view

ContigView: Detailed View
- Genscan predictions
- Targeted gene predictions (2 alternative transcripts)
- Gene annotations
- EST genes
- Other tracks: aligned sequences, etc.
- Base View
- Region
- See other tracks and options in the menus

ContigView: Features menu
- Export image (ps, pdf, svg) or fasta file
- Click on ‘close menu’

MultiContigView
- Conserved regions
- Rat ortholog

Other Comparative Genomics Tools
- Up to 6 genome alignments with MLAGAN in AlignSliceView
- SyntenyView is another comparative view
- Comparative genomics data can also be accessed through EnsMart

DAS: Distributed Annotation System

Data Mining with BioMart
- Allows very fast querying across data sources
- Search for genes (features, sequences, etc.) or SNPs based on position, function, domains, similarity, expression, etc.
- Accessible from the Ensembl website (MartView) as well as stand-alone
- Extremely powerful for data mining
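
BioMart servers can also be queried programmatically with an XML query document, in addition to the MartView web interface. The sketch below only builds such a document as a string and sends nothing over the network. The dataset, filter, and attribute names (`hsapiens_gene_ensembl`, `chromosome_name`, `ensembl_gene_id`, `external_gene_name`) and the `Query` attribute values are assumptions that should be checked against the BioMart server in use; `build_biomart_query` is a hypothetical helper, not part of any Ensembl library.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset: str, filters: dict, attributes: list) -> str:
    """Build a BioMart-style XML query string (names are caller-supplied)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",      # tab-separated output
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which rows are returned.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    # Attributes choose which columns are returned.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # Assumed names; verify them on the target BioMart server.
    print(build_biomart_query(
        "hsapiens_gene_ensembl",
        {"chromosome_name": "21"},
        ["ensembl_gene_id", "external_gene_name"],
    ))
```

Keeping filters and attributes as separate arguments mirrors the filter and output steps of the web interface.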

Example 2: BioMart
A new disease locus has been mapped between markers D21S1991 and D21S171. It may be that the gene involved has already been identified as having a role in another disease. What candidates are in this region?

BioMart: Choosing your dataset

BioMart: Filtering
- Chromosome 21, between markers D21S1991 and D21S171
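
The filter on this slide restricts results to chromosome 21 between the two markers. The sketch below shows that kind of positional filter on a small in-memory list. The `Gene` record, the gene names, and every coordinate are hypothetical placeholders, not real map positions for D21S1991 or D21S171.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def genes_between(genes, chrom, marker_a_pos, marker_b_pos):
    """Return genes on `chrom` that overlap the interval between two markers."""
    lo, hi = sorted((marker_a_pos, marker_b_pos))  # marker order does not matter
    return [g for g in genes
            if g.chrom == chrom and g.end >= lo and g.start <= hi]


if __name__ == "__main__":
    # Made-up coordinates purely for illustration.
    toy_genes = [
        Gene("geneA", "21", 100, 200),
        Gene("geneB", "21", 450, 600),
        Gene("geneC", "21", 900, 950),
        Gene("geneD", "20", 450, 600),
    ]
    hits = genes_between(toy_genes, "21", marker_a_pos=500, marker_b_pos=150)
    print([g.name for g in hits])
```

Genes that only partly overlap the interval are kept, since a candidate gene may extend past a marker.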

BioMart: Output
- Note that you can output different types of information

BioMart: Output

Sequence Similarity Searching
- Use SSAHA for exact matches (fast)
- Use BLAST for more distant similarity (slow)
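
One way to see why hash-based search is fast for exact matches: once the reference sequence has been indexed by short words (k-mers), a lookup is a dictionary access followed by a direct string comparison. The sketch below is a simplified illustration of that idea, not the SSAHA algorithm itself; the function names and the toy sequence are made up for this example.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every overlapping k-mer in `reference` to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact(reference: str, index: dict, k: int, query: str) -> list:
    """Return start positions where `query` occurs exactly in `reference`."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    # Seed with the first k-mer of the query, then verify the full match.
    candidates = index.get(query[:k], [])
    return [pos for pos in candidates if reference.startswith(query, pos)]


if __name__ == "__main__":
    ref = "ACGTACGTGA"
    idx = build_kmer_index(ref, k=3)
    print(find_exact(ref, idx, 3, "ACGT"))
```

A query with even one mismatch in its seed word is missed by this approach, which is why a more sensitive tool such as BLAST is the better choice for distant similarity.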

Looking for Help?

DAS: Getting your Own Data in Ensembl
- DAS (Distributed Annotation System)
- Anyone can load data into Ensembl and allow others to view it in the same view (e.g. ContigView) as other Ensembl annotations
- Click on ‘Manage sources’ in the DAS dropdown menu

Other Ways to Access Ensembl
- MySQL database directly accessible
- APIs for Perl and Java
- Other software:
  - Apollo: Java genome annotation viewer/editor
  - Sockeye: Java viewer
- You can get your own local version of Ensembl: software and data are freely available (http://www.ensembl.org/Docs/)

Exercises

Ex 1. Homologues of human genes are often present in Fugu rubripes in more condensed form (with shorter introns). Is this true for the gene PTEN, a tumor suppressor often mutated in advanced cancers? Try MultiContigView; can you think of another way to get this information as well?

Ex 2. The microRNA bantam regulates the Drosophila (fruitfly) gene hid by binding the 3’ UTR. Hid is involved in apoptosis, and it is possible that binding sites for bantam could be found in the 3’ UTR of other apoptosis genes as well. Obtain the 3’ UTR sequence of all Drosophila genes known to be involved in apoptosis. Hint: use BioMart; the GO term for apoptosis is GO:0006915, evidence code TAS.

Ex 3. The file “PCR_product.txt” on the webserver contains the sequence of a PCR product amplified from a mouse cDNA library. What gene does the product correspond to? Does it contain the complete coding sequence of that gene? Would it be better to use BLAST or SSAHA?