Lecture/Lab 7.3

Ensembl Database and Web Browser
Erin Pleasance
Canada’s Michael Smith Genome Sciences Centre, Vancouver

What is Ensembl?
– Joint project of EBI and Sanger
– Automated annotation of eukaryotic genomes
– Open source software
– Relational database system
– Web interface
– “The main aim of this campaign is to encourage scientists across the world - in academia, pharmaceutical companies, and the biotechnology and computer industries - to use this free information.” - Dr. Mike Dexter, Director of the Wellcome Trust

Ensembl components
– Data:
  – Genes (GeneView, TransView, ExonView, ProtView)
  – SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView)
  – Diseases (DiseaseView)
  – Markers (MarkerView)
  – Families (DomainView, FamilyView)
  – Functions (GOView)
  – Genome Sequence (ContigView)
  – Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView)
  – Chromosomes (ChromoView, KaryoView, CytoView, MapView)
  – Other Annotations
– Search tools:
  – Sequence Similarity (BLAST, SSAHA)
  – Text (TextView)
  – Anything (EnsMart)

Species in Ensembl
– Focus on vertebrates
– No fungi/plants
– Arabidopsis genome browser based on Ensembl at
– Vertebrates:
  – Mammals: Human, Chimp, Mouse, Rat, Dog, Cow, Opossum
  – Fish: Zebrafish, Fugu Pufferfish, Tetraodon Pufferfish
  – Other: Chicken, Frog
– Invertebrates:
  – Insects: Fruitfly, Mosquito, Honeybee
  – Other: Nematode

Ensembl Gene Annotation
– “Basis for initial analysis and publication of most vertebrate genomes”
– Genome assembly from NCBI
– Gene build system
  – Targeted gene builds predict known genes
  – Similarity gene builds predict novel genes

Curwen et al, Genome Res 14: , 2004

Targeted gene build
– Align known proteins with pmatch and BLAST
– Incorporate aligned cDNA sequences to find splice sites, UTRs with genewise
– Figure: ContigView of a “best in genome” gene with associated evidence
– Figure labels: Known gene (p53); Proteins aligned; cDNAs aligned; UTRs predicted; Unigene clusters aligned

Similarity gene build
– Identify novel exons ab initio using Genscan
– Confirm exons by BLAST to known proteins, mRNAs, UniGene clusters
– Figure: ContigView of a homology gene with associated evidence
– Figure labels: Novel gene; Genscan predictions; Proteins aligned; Unigene clusters aligned

Ensembl Gene Annotation
– Resulting “Ensembl genes” are highly accurate with low false positive rates
– Ensembl human gene identifiers are 95% stable between builds

Manually curated genes: VEGA
– Some chromosomes contain manually curated genes from the VEGA database
– The Otter database/server allows integration of automatic and manual annotations (e.g. from Apollo)
– Figure label: VEGA gene

Ensembl EST genes
– ESTs are not accurate enough to produce Ensembl genes, but are important, especially for identifying alternative transcripts
– ESTs are aligned to the genome and merged to create an independent set of “EST genes”
– Figure labels: Known gene; Unigene clusters aligned; EST genes

Pseudogenes
– Processed pseudogenes are identified in the annotation (lack of introns, frameshifts, presence of a multi-exon version elsewhere in the genome, etc.)
– Figure label: Pseudogene
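The criteria listed above can be read as a simple flagging rule. The following is a minimal sketch only: the `GeneModel` record, its field names, and the way the criteria are combined are assumptions made for illustration, not the rule used by the Ensembl pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical summary of one predicted gene (fields are illustrative)."""
    name: str
    exon_count: int
    has_frameshift: bool
    multi_exon_copy_elsewhere: bool


def looks_like_processed_pseudogene(gene: GeneModel) -> bool:
    """Flag intronless models that also show a frameshift or have a
    multi-exon copy elsewhere in the genome (assumed combination)."""
    intronless = gene.exon_count == 1
    return intronless and (gene.has_frameshift or gene.multi_exon_copy_elsewhere)


# Made-up example models, not real annotations.
candidate = GeneModel("model_1", exon_count=1, has_frameshift=True,
                      multi_exon_copy_elsewhere=True)
spliced = GeneModel("model_2", exon_count=7, has_frameshift=False,
                    multi_exon_copy_elsewhere=False)
flags = [looks_like_processed_pseudogene(g) for g in (candidate, spliced)]
```

A real pipeline would weigh more evidence than this; the sketch only shows how the listed features point in the same direction.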

Noncoding RNA Genes
– Functional genes with no ORFs (tRNAs, rRNAs, miRNAs …)
– 7220 annotations from Sean Eddy and Tom Jones
– Figure labels: miRNAs; Coding gene

Example 1: Exploring Caspase-3
– Aim to demonstrate basic browsing and views
– Caspase-3 is a gene involved in apoptosis (cell suicide)
– We will look at:
  – Gene annotation
  – SNPs
  – Orthologs and genome alignments
  – Alternative transcripts and EST genes

Example 1: Exploring Caspase-3
– Go to human homepage

Species-specific homepage
– Statistics of current release
– Site map

Finding the tool/view: Site Map

Text Search
– Back to species-specific homepage
– Search for caspase-3 (Gene) and click

GeneView
– Figure labels: ContigView; ProteinView; ExonView; TransView of transcript; SNPView; ExportView

GeneView
– Orthologs predicted by sequence similarity and synteny
– GeneDAS: Get data from external sources

GeneView
– Links to external databases
– On the same page, information is provided for each transcript individually

GeneView

GeneSNPView

Other SNP/Haplotype tools
– SNPView
– ProteinView (protein sequence with SNP markup)
– LDView: View linkage disequilibrium (only limited regions)
– HaploView: View haplotypes (only limited regions)
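As a reminder of what a linkage disequilibrium display summarises, here is a minimal sketch that computes the standard pairwise measures D, |D'| and r² for two biallelic SNPs from haplotype counts. The function name and the example counts are made up for illustration; nothing here is taken from LDView itself.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, abs_D_prime, r_squared) from the four haplotype counts
    of two biallelic SNPs with alleles A/a and B/b."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic")
    d = p_AB - p_A * p_B
    r_squared = d * d / denom
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max, r_squared


# Hypothetical counts: alleles A and B always travel together.
complete_ld = ld_stats(50, 0, 0, 50)
# Hypothetical counts: all four haplotypes equally common.
no_ld = ld_stats(25, 25, 25, 25)
```

In the first example both |D'| and r² are 1; in the second both are 0.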

Back to GeneView
– Click

ContigView
– Figure labels: Chromosome and bands; Sequence contigs

ContigView: Detailed View
– Gene annotations:
  – Targeted gene predictions (2 alternative transcripts)
  – EST genes
  – Genscan predictions
– Other tracks: Aligned sequences etc.
– See other tracks, options in menus

ContigView

MultiContigView
– Figure labels: Rat ortholog; DNA sequence homology

Other Comparative Genomics Tools
– So far we have seen gene orthology and DNA homology
– Another view is SyntenyView
– Comparative genomics can also be accessed through EnsMart

Data Mining with EnsMart
– Allows very fast, cross-data source querying
– Search for genes (features, sequences, etc.) or SNPs based on
  – Position; function; domains; similarity; expression; etc.
– Accessible from Ensembl website (MartView) as well as stand-alone
– Extremely powerful for data mining

Example 2: EnsMart
– A new disease locus has been mapped between markers D21S1991 and D21S171.
– It may be that the gene involved has already been identified as having a role in another disease.
– What candidates are in this region?

Example 2: EnsMart
– EnsMart is based on BioMart

EnsMart: Choosing your dataset

EnsMart: Filtering
– Chromosome: 21
– Region between markers D21S1991 and D21S171

EnsMart: Output
– Note that you can output different types of information, e.g. sequences

EnsMart: Output
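The region filter used in this example amounts to "keep every gene on the same chromosome that overlaps the interval spanned by the two markers". Below is a minimal sketch of that logic in plain Python. It does not call EnsMart or BioMart; the `Feature` record, the gene names, and every coordinate (including the marker positions) are invented for illustration and are not the real positions of D21S1991 or D21S171.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical genomic feature with 1-based inclusive coordinates."""
    name: str
    chromosome: str
    start: int
    end: int


def genes_between_markers(genes, markers, left, right):
    """Return genes overlapping the interval spanned by two named markers."""
    m1, m2 = markers[left], markers[right]
    if m1.chromosome != m2.chromosome:
        raise ValueError("markers are on different chromosomes")
    lo = min(m1.start, m2.start)
    hi = max(m1.end, m2.end)
    return [g for g in genes
            if g.chromosome == m1.chromosome and g.end >= lo and g.start <= hi]


# Invented coordinates, for illustration only.
markers = {
    "D21S1991": Feature("D21S1991", "21", 1_000, 1_100),
    "D21S171": Feature("D21S171", "21", 5_000, 5_100),
}
genes = [
    Feature("GENE_A", "21", 500, 900),      # left of the region
    Feature("GENE_B", "21", 1_050, 2_000),  # overlaps the left marker
    Feature("GENE_C", "21", 3_000, 4_000),  # inside the region
    Feature("GENE_D", "22", 3_000, 4_000),  # wrong chromosome
    Feature("GENE_E", "21", 5_200, 6_000),  # right of the region
]
candidates = [g.name for g in genes_between_markers(genes, markers,
                                                    "D21S1991", "D21S171")]
```

A further filter on disease association would then narrow the candidates, as the example asks.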

Sequence Similarity Searching
– Use SSAHA for exact matches (fast)
– Use BLAST for more distant similarity (slow)
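Why a hashing approach is fast for exact matches can be seen in a small sketch: index every k-mer of the reference once, then look up the first k-mer of the query and verify each candidate position. This is a simplified illustration of the hashing idea only, not the actual SSAHA algorithm; the function names and the toy sequences are assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact(query, reference, index, k):
    """Return 0-based start positions where the query matches exactly."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    hits = []
    # Only positions sharing the query's first k-mer need to be checked.
    for pos in index.get(query[:k], []):
        if reference.startswith(query, pos):
            hits.append(pos)
    return hits


# Toy sequences, for illustration only.
reference = "ACGTACGTTAGC"
k = 4
index = build_kmer_index(reference, k)
hits = find_exact("ACGTT", reference, index, k)
```

The index is built once and reused for every query, which is where the speed comes from. Tolerating mismatches and gaps, as BLAST does, needs more work per query.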

Finding anything else: Help

DAS: Getting your Own Data in Ensembl
– DAS (Distributed Annotation System)
  – Anyone can load data into Ensembl and allow others to view it in the same view (e.g. ContigView) as other Ensembl annotations
  – Some built-in DAS sources
– ldas.html

Other Ways to Access Ensembl
– MySQL database directly accessible
– APIs for Perl and Java
– Other software
  – Apollo Java genome annotation viewer/editor
  – Sockeye Java viewer
– You can get your own local version of Ensembl: software and data freely available
– Figure label: Sockeye

For more information
– Publications (listed at wiki/html/EnsemblDocs/EnsemblPublications.html)
  – Ensembl Special: Genome Research May 2004
  – Ensembl updates: NAR Jan
  – EnsMart: Kasprzyk et al, Genome Res Jan
– Documentation on how to download software and database:

Exercises
– Homologues of human genes are often present in Fugu rubripes in more condensed form (with shorter introns). Is this true for the gene PTEN, a tumor suppressor often mutated in advanced cancers?
  – Try MultiContigView; can you think of another way to get this information as well?
– The microRNA bantam regulates the Drosophila (fruitfly) gene hid by binding the 3’ UTR. Hid is involved in apoptosis, and it is possible that binding sites for bantam could be found in the 3’ UTR of other apoptosis genes as well. Obtain the 3’ UTR sequence of all Drosophila genes known to be involved in apoptosis.
  – Using EnsMart, the GO term for apoptosis is GO: , evidence code TAS
– The file “PCR_product.txt” contains the sequence of a PCR product amplified from a mouse cDNA library. What gene does the product correspond to? Does it contain the complete coding sequence of that gene?
  – Would it be better to use BLAST or SSAHA?
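For the first exercise, once exon coordinates for the two orthologs have been exported, the intron comparison is simple arithmetic. The sketch below assumes 1-based inclusive exon coordinates on one strand; the function name and both coordinate sets are invented for illustration and are not the real PTEN exon positions in either species.

```python
def intron_lengths(exons):
    """Return intron lengths given (start, end) exon pairs,
    using 1-based inclusive coordinates."""
    ordered = sorted(exons)
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Invented coordinates, for illustration only.
hypothetical_human_exons = [(100, 200), (301, 400), (1001, 1100)]
hypothetical_fugu_exons = [(100, 200), (251, 350), (451, 550)]

human_total = sum(intron_lengths(hypothetical_human_exons))
fugu_total = sum(intron_lengths(hypothetical_fugu_exons))
# A ratio above 1 means the second transcript has the more compact introns.
compaction_ratio = human_total / fugu_total
```

With real coordinates from the exercise, the same two sums show directly whether the Fugu gene is the more condensed one.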