An Introduction to Ensembl Presented By Hilary O. Pavlidis.


Objectives
What is Ensembl?
Goals of Ensembl
Genomes and Ensembl
Ensembl Software
Ensembl Data Files
Get in and look at Ensembl
  – Do an example

What is Ensembl?
Ensembl is one of 3 main systems currently available that annotate and display genomic information
  – Ensembl
  – UCSC Genome Browser
  – NCBI Genome Browser
An Overview of Ensembl, 2004

What is Ensembl?
Ensembl is a joint project between 3 organizations
  – EMBL – European Molecular Biology Laboratory
  – EBI – European Bioinformatics Institute
  – WTSI – Wellcome Trust Sanger Institute
      – Provides the primary financial support for this endeavor
Information from

Ensembl Audience
Ensembl’s target audience
  – Researchers
      – Want to download small datasets and to do sequence similarity searches
  – “Power Users”
      – Doing research that spans classes of genes, or certain genomic regions
  – Bioinformaticians
      – Doing bioinformatic research or supporting labs with large data sets
An Overview of Ensembl, 2004

Major Challenges with Genomes
Scientific challenge of decoding a genome from its nucleotides to a set of functional elements
Development of software capable of storing, manipulating, and evaluating genomes
Challenge of providing comprehensive and informative access to a large amount of data in a user-friendly way
An Overview of Ensembl, 2004

Goals of Ensembl
Goals are to provide (from website)
  – Accurate, automatic analysis of genome data
  – Analysis and annotation of current data
  – Presentation of the analysis to all via Web access
  – Distribution of the analysis to other bioinformatics laboratories
Information from

Genomes and Annotation
Ensembl does not assemble any genome directly
  – Works with the sequencing centers that generate the genome assembly
Ensembl provides high quality annotation for genomes that do not have existing annotation
  – Works with the existing annotation for genomes that already have high quality annotation
An Overview of Ensembl, 2004

Gene Building in Ensembl
Two-step process
Targeted Build
  – Aligns all species-specific protein and mRNA information to the genome sequence
Similarity Build
  – Based on the information obtained from closely related species
  – Aims to further advance the transcript predictions
An Overview of Ensembl, 2004

Integration and Comparative Genomics
Genomes are inherently related
Ensembl provides resources that are capable of taking advantage of this
  – Alignment of sequences between genomes
  – Pairing of orthologous genes between genomes
  – Derivation of long range blocks of synteny
An Overview of Ensembl, 2004
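
To make the last point on this slide concrete, here is a minimal Python sketch of how blocks of synteny could be derived from orthologous gene pairs. It is an illustration only, not Ensembl's method: the function name synteny_blocks, the tuple layout and the max_gap threshold are all assumptions made for this example.

```python
def synteny_blocks(ortholog_pairs, max_gap=2):
    """Group ortholog pairs into candidate synteny blocks.

    Each pair is (chrom_a, order_a, chrom_b, order_b), where order_* is the
    gene's rank along its chromosome (a hypothetical, simplified input).
    A block is a run of pairs that stay on the same two chromosomes and
    whose gene ranks move by at most max_gap in both genomes.
    """
    blocks = []
    current = []
    for pair in sorted(ortholog_pairs):
        if current:
            prev = current[-1]
            same_chroms = pair[0] == prev[0] and pair[2] == prev[2]
            close_in_a = pair[1] - prev[1] <= max_gap
            close_in_b = abs(pair[3] - prev[3]) <= max_gap
            if same_chroms and close_in_a and close_in_b:
                current.append(pair)
                continue
            blocks.append(current)  # run ended, store it and start a new one
        current = [pair]
    if current:
        blocks.append(current)
    return blocks


# Made-up example: genome A chromosome "1" maps partly to B chromosome "4"
# and partly to B chromosome "7".
example_pairs = [
    ("1", 1, "4", 10), ("1", 2, "4", 11), ("1", 3, "4", 12),
    ("1", 4, "7", 3), ("1", 5, "7", 4),
    ("2", 1, "4", 30),
]
example_blocks = synteny_blocks(example_pairs)
```

Real pipelines work from genomic coordinates and alignments rather than simple gene ranks, so the grouping rule here should be read as a teaching device.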

Mammalian Genomes
Homo sapiens – Human
Pan troglodytes – Chimpanzee
Macaca mulatta – Rhesus monkey
Mus musculus – Mouse
Rattus norvegicus – Rat
Canis familiaris – Dog
Bos taurus – Cow
Monodelphis domestica – Opossum
Pre Ensembl Genomes
  – Dasypus novemcinctus – Nine banded armadillo
  – Loxodonta africana – African elephant
  – Echinops telfairi – Madagascar hedgehog
Information from

Other Genomes
Gallus gallus – Chicken
Xenopus tropicalis – Pipid frog
Danio rerio – Zebrafish
Fugu rubripes – Puffer fish
Tetraodon nigroviridis – Tetraodon fish
**Ciona intestinalis and C. savignyi – Sea squirt
Drosophila melanogaster – Fruit fly
Anopheles gambiae – Mosquito
Apis mellifera – Honey bee
Caenorhabditis elegans – Roundworm
Saccharomyces cerevisiae – Yeast
Aedes aegypti – Pre Ensembl Genome of Egyptian mosquito
Information from

Ensembl Software

Ensembl Website Construction
Datasets and software are updated ten times per year
  – Each update yields a new version number that incorporates the month and year
      – Ensembl v37-Feb 2006 is the current version
Ensembl now archives previous versions for up to 2 years
  – Started in November of 2004
Website is written in Perl
Information from

Ensembl Databases
Ensembl uses MySQL to store information in relational databases
4 Main Databases
  – Ensembl Core Database
  – Ensembl EST Database
  – Ensembl Compara Database
  – Ensembl Variation Database
Information from
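
As a rough picture of what storing annotation in relational databases means, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The two tables, their columns and the identifiers such as GENE_A are hypothetical and heavily simplified; they are not the actual Ensembl schema.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id   INTEGER PRIMARY KEY,
    stable_id TEXT,
    biotype   TEXT
);
CREATE TABLE transcript (
    transcript_id INTEGER PRIMARY KEY,
    gene_id       INTEGER REFERENCES gene(gene_id),
    stable_id     TEXT
);
""")

# Made-up rows: one gene with two transcripts, one with one, one with none.
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [(1, "GENE_A", "protein_coding"),
     (2, "GENE_B", "pseudogene"),
     (3, "GENE_C", "protein_coding")],
)
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?, ?)",
    [(1, 1, "TRANSCRIPT_A1"), (2, 1, "TRANSCRIPT_A2"), (3, 2, "TRANSCRIPT_B1")],
)


def transcripts_per_gene(connection):
    """Join genes to their transcripts and count transcripts for each gene."""
    rows = connection.execute("""
        SELECT g.stable_id, COUNT(t.transcript_id)
        FROM gene AS g
        LEFT JOIN transcript AS t ON t.gene_id = g.gene_id
        GROUP BY g.gene_id, g.stable_id
        ORDER BY g.stable_id
    """).fetchall()
    return dict(rows)


counts = transcripts_per_gene(conn)
```

The point of the relational layout is that a gene and its transcripts live in separate tables linked by a key, so questions like "how many transcripts per gene" become a single join.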

Ensembl Databases
Ensembl also utilizes APIs
Application Programme Interfaces (APIs)
  – Serve as a connection between the databases and specific application programs
  – Ensembl has a Perl API and a Java API
      – Perl API is more “complete” than the Java API
Information from

Ensembl Databases
Ensembl Core Databases
  – Species-specific Ensembl core databases store genome sequence and annotation information
      – Gene, transcript, and protein models that are annotated by the Ensembl automated genome analysis
  – Databases also store information about cDNA and protein alignments, as well as external references
      – Ex. - NCBI Numbers AB
Information from

Ensembl Databases
Ensembl Compara Database
  – Is a multi-species database that stores the results of genome wide species comparisons
  – The comparative genomic dataset allows for pairwise whole genome alignments and synteny regions
  – The comparative proteomics dataset allows for orthologue predictions and protein family clusters
Information from

Ensembl Tools
4 Main Tools
  – BioMart
  – Exonerate
  – SSAHA
  – Wise2
Information from

Ensembl Tools
BioMart
  – Generic data management system built specifically for use in Ensembl
  – Ensembl will build a BioMart database to provide users with the ability to conduct fast and powerful searches
  – It simplifies the task of integrating external data sets (provided by the user) with the Ensembl databases
Information from

Ensembl Tools
Exonerate
  – A tool designed for pairwise sequence comparison
      – Allows for the alignment of sequences using many different models
          – Example – Dynamic Programming
Information from
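
The dynamic programming example mentioned on this slide can be illustrated with a textbook global alignment score (Needleman-Wunsch style). This is a generic sketch in Python, not Exonerate's own code or one of its alignment models; the function name and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Fills the dynamic programming table one row at a time, so only the
    previous row has to be kept in memory.
    """
    # Row 0: aligning the empty prefix of a against each prefix of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


score_identical = global_alignment_score("ACGT", "ACGT")
score_one_gap = global_alignment_score("ACGT", "AGT")
```

Each table cell depends only on its three neighbours, which is what lets the best overall alignment be built up from the best alignments of shorter prefixes.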

Ensembl Tools
SSAHA
  – Sequence Search and Alignment by Hashing Algorithm
  – A tool that provides very fast matching and alignment of DNA sequences
  – Speed is gained from converting sequence information into a hash data structure
Information from
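
The idea of gaining speed from a hash data structure can be sketched in a few lines of Python. This is a simplified illustration of hash-based seed matching in general, not the SSAHA implementation; the function names, the word length k and the toy sequences are assumptions for this example.

```python
from collections import Counter, defaultdict


def build_kmer_index(sequence, k):
    """Map every length-k word in the sequence to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def find_seed_hits(index, query, k):
    """Look up each length-k word of the query; return (query_pos, subject_pos) pairs."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, spos))
    return hits


def best_offset(hits):
    """Return (offset, hit_count) for the most common subject_pos - query_pos shift."""
    if not hits:
        return None
    offsets = Counter(spos - qpos for qpos, spos in hits)
    return offsets.most_common(1)[0]


# Toy data: the query occurs in the subject starting at position 4.
subject = "GGGGACGTTCAGGGG"
query = "ACGTTCA"
index = build_kmer_index(subject, 3)
hits = find_seed_hits(index, query, 3)
```

Building the index costs one pass over the subject sequence; after that, each query word is a dictionary lookup rather than a scan, and many hits sharing one offset point to a likely match location.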

Ensembl Tools
Wise2
  – Tool that is designed for the comparison of biopolymers
      – Biopolymer Ex. – DNA and protein sequences
  – Genewise and estwise are algorithms associated with the Wise2 package
Information from

Ensembl Data

Data Exporting
  – All data generated is free for download via the ftp.ensembl.org site
      – Includes gene sequences, transcript and protein predictions
  – Ensembl provides a dedicated ExportView page
      – Data can be exported in HTML, text or zipped format
Data Importing
  – Able to import your own dataset for analysis
  – For “large” personal datasets
      – DAS server to help stabilize these datasets
Information from

Ensembl Data
Searching Ensembl
  – Search for nucleotide or protein sequences
  – Many available search functions
      – General text search across all available species genomes
      – Search within a specific genome (Gallus gallus)
      – Use external numbering sequences
          – NCBI number – AB
      – BLAST search with a gene sequence
          – Your own data or from an external source
Information from

Ensembl Reference
T. Hubbard, D. Andrews, M. Caccamo, G. Cameron, Y. Chen, M. Clamp, L. Clarke, G. Coates, T. Cox, F. Cunningham, V. Curwen, T. Cutts, T. Down, R. Durbin, X. M. Fernandez-Suarez, J. Gilbert, M. Hammond, J. Herrero, H. Hotz, K. Howe, V. Iyer, K. Jekosch, A. Kahari, A. Kasprzyk, D. Keefe, S. Keenan, F. Kokocinski, D. London, I. Longden, G. McVicker, C. Melsopp, P. Meidl, S. Potter, G. Proctor, M. Rae, D. Rios, M. Schuster, S. Searle, J. Severin, G. Slater, D. Smedley, J. Smith, W. Spooner, A. Stabenau, J. Stalker, R. Storey, S. Trevanion, A. Ureta-Vidal, J. Vogel, S. White, C. Woodwark and E. Birney. Ensembl 2005. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D447-D453.
  – Available as a free full text at:
Include Current Version Release of Ensembl
  – Current version is Ensembl v37-Feb 2006
Free full text access to other scientific publications concerning applications associated with Ensembl
  –
Information from

Let's get familiarized with Ensembl and view a chicken example

Presentation References
Ensembl Web Information
  – Available at
Ensembl 2005
  – Hubbard et al., Nucleic Acids Research. January 2005
An Overview of Ensembl
  – Birney et al., Genome Research. Available at: