Published by Alexandra Burke. Modified over 9 years ago.
1
An Introduction to Ensembl
Presented by Hilary O. Pavlidis
2
Objectives
What is Ensembl?
Goals of Ensembl
Genomes and Ensembl
Ensembl Software
Ensembl Data Files
Get in and look at Ensembl
  – Do an example
3
What is Ensembl?
Ensembl is one of 3 main systems currently available that annotate and display genomic information
  – Ensembl http://www.ensembl.org
  – UCSC Genome Browser http://genome.ucsc.edu
  – NCBI Genome Browser http://www.ncbi.nlm.nih.gov
An Overview of Ensembl, 2004
4
What is Ensembl?
Ensembl is a joint project between 3 organizations
  – EMBL: European Molecular Biology Laboratory
  – EBI: European Bioinformatics Institute
  – WTSI: Wellcome Trust Sanger Institute
      Provides the primary financial support for this endeavor
Information from http://www.ensembl.org
5
Ensembl Audience
Ensembl’s target audience
  – Researchers
      Want to download small datasets and do sequence similarity searches
  – “Power Users”
      Doing research that spans classes of genes or certain genomic regions
  – Bioinformaticians
      Doing bioinformatic research or supporting labs with large data sets
An Overview of Ensembl, 2004
6
Major Challenges with Genomes
Scientific challenge of decoding a genome from its nucleotides to a set of functional elements
Development of software capable of storing, manipulating, and evaluating genomes
Challenge of providing comprehensive and informative access to a large amount of data in a user-friendly way
An Overview of Ensembl, 2004
7
Goals of Ensembl
Goals are to provide (from website)
  – Accurate, automatic analysis of genome data
  – Analysis and annotation of current data
  – Presentation of the analysis to all via Web access
  – Distribution of the analysis to other bioinformatics laboratories
Information from http://www.ensembl.org
8
Genomes and Annotation
Ensembl does not assemble any genome directly
  – Works with the sequencing centers that generate the genome assembly
Ensembl provides high quality annotation for genomes that do not have existing annotation
  – Works with genomes that already have high quality annotation
An Overview of Ensembl, 2004
9
Gene Building in Ensembl
Two-step process
Targeted Build
  – Aligns all species-specific protein and mRNA information to the genome sequence
Similarity Build
  – Based on the information obtained from closely related species
  – Aims to further advance the transcript predictions
An Overview of Ensembl, 2004
10
Integration and Comparative Genomics
Genomes are inherently related
Ensembl provides resources that take advantage of this
  – Alignment of sequences between genomes
  – Pairing of orthologous genes between genomes
  – Derivation of long range blocks of synteny
An Overview of Ensembl, 2004
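The derivation of synteny blocks from orthologous gene pairs can be illustrated with a minimal sketch. This is a toy grouping rule written for this slide, not the method Ensembl uses: the function name `synteny_blocks`, the tuple layout (chromosome and gene-order index in each genome), and the `max_gap` parameter are all assumptions.

```python
def synteny_blocks(pairs, max_gap=1):
    """Group orthologue pairs into runs that stay collinear in both genomes.

    Each pair is (chrom_a, order_a, chrom_b, order_b), where order_* is the
    gene's rank along its chromosome (a hypothetical, simplified input).
    """
    blocks = []
    for pair in sorted(pairs):  # walk genome A in chromosome and gene order
        chrom_a, order_a, chrom_b, order_b = pair
        if blocks:
            last_chrom_a, last_order_a, last_chrom_b, last_order_b = blocks[-1][-1]
            same_chromosomes = chrom_a == last_chrom_a and chrom_b == last_chrom_b
            close_in_a = order_a - last_order_a <= max_gap
            close_in_b = abs(order_b - last_order_b) <= max_gap
            if same_chromosomes and close_in_a and close_in_b:
                blocks[-1].append(pair)  # extend the current block
                continue
        blocks.append([pair])  # otherwise start a new block
    return blocks


# Made-up orthologue pairs, for illustration only.
example_pairs = [
    ("1", 0, "4", 10), ("1", 1, "4", 11), ("1", 2, "4", 12),
    ("1", 3, "7", 5), ("1", 4, "7", 4),
    ("2", 0, "4", 13),
]
block_sizes = [len(block) for block in synteny_blocks(example_pairs)]
```

Using `abs()` on the second genome's order lets a block run in either orientation, so an inverted segment still counts as one block.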
11
Mammalian Genomes
Homo sapiens – Human
Pan troglodytes – Chimpanzee
Macaca mulatta – Rhesus monkey
Mus musculus – Mouse
Rattus norvegicus – Rat
Canis familiaris – Dog
Bos taurus – Cow
Monodelphis domestica – Opossum
Pre Ensembl Genomes
  – Dasypus novemcinctus, nine-banded armadillo
  – Loxodonta africana, African elephant
  – Echinops telfairi, Madagascar hedgehog
Information from http://www.ensembl.org
12
Other Genomes
Gallus gallus – Chicken
Xenopus tropicalis – Pipid frog
Danio rerio – Zebrafish
Fugu rubripes – Puffer fish
Tetraodon nigroviridis – Tetraodon fish
Ciona intestinalis and C. savignyi – Sea squirt
Drosophila melanogaster – Fruit fly
Anopheles gambiae – Mosquito
Apis mellifera – Honey bee
Caenorhabditis elegans – Roundworm
Saccharomyces cerevisiae – Yeast
Aedes aegypti – Pre Ensembl genome of Egyptian mosquito
Information from http://www.ensembl.org
13
Ensembl Software
14
Ensembl Website Construction
Datasets and software are updated ten times per year
  – Each update yields a new version number that incorporates the month and year
      Ensembl v37-Feb 2006 is the current version
Ensembl now archives previous versions for up to 2 years
  – Started in November of 2004
Website is written in Perl
Information from http://www.ensembl.org
15
Ensembl Databases
Ensembl uses MySQL to store information in relational databases
4 Main Databases
  – Ensembl Core Database
  – Ensembl EST Database
  – Ensembl Compara Database
  – Ensembl Variation Database
Information from http://www.ensembl.org
16
Ensembl Databases
Ensembl also utilizes APIs
Application Programming Interfaces (APIs)
  – Serve as a connection between the databases and specific application programs
  – Ensembl has a Perl API and a Java API
      Perl API is more “complete” than the Java API
Information from http://www.ensembl.org
17
Ensembl Databases
Ensembl Core Databases
  – Species-specific Ensembl core databases store genome sequence and annotation information
      Gene, transcript, and protein models that are annotated by the Ensembl automated genome analysis
  – Databases also store information about cDNA and protein alignments, as well as external references
      Ex. NCBI accession number AB012211
Information from http://www.ensembl.org
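How gene models, transcripts, and external references relate in a relational store can be sketched with a toy database. Everything here is an assumption made for illustration: SQLite stands in for MySQL, the table and column names are hypothetical and are not the Ensembl core schema, and the rows are placeholders (only the accession string AB012211 comes from the slide).

```python
import sqlite3


def build_toy_database():
    """Create an in-memory database with a hypothetical gene/transcript layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (
            gene_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE transcript (
            transcript_id INTEGER PRIMARY KEY,
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            name TEXT NOT NULL
        );
        CREATE TABLE external_ref (
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            source TEXT NOT NULL,
            accession TEXT NOT NULL
        );
    """)
    # Placeholder rows; they do not describe a real gene.
    conn.execute("INSERT INTO gene VALUES (1, 'example_gene')")
    conn.executemany(
        "INSERT INTO transcript VALUES (?, ?, ?)",
        [(1, 1, "example_transcript_1"), (2, 1, "example_transcript_2")],
    )
    conn.execute("INSERT INTO external_ref VALUES (1, 'NCBI', 'AB012211')")
    return conn


def transcripts_for_accession(conn, accession):
    """Follow an external accession to its gene, then to that gene's transcripts."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM external_ref AS x
        JOIN transcript AS t ON t.gene_id = x.gene_id
        WHERE x.accession = ?
        ORDER BY t.name
        """,
        (accession,),
    ).fetchall()
    return [name for (name,) in rows]


conn = build_toy_database()
found = transcripts_for_accession(conn, "AB012211")
```

The join is the point of the sketch: an external identifier is stored once and linked by key, so a lookup by accession reaches every transcript of the gene.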
18
Ensembl Databases
Ensembl Compara Database
  – A multi-species database that stores the results of genome-wide species comparisons
  – The comparative genomic dataset allows for pairwise whole genome alignments and synteny regions
  – The comparative proteomics dataset allows for orthologue predictions and protein family clusters
Information from http://www.ensembl.org
19
Ensembl Tools
4 Main Tools
  – BioMart
  – Exonerate
  – SSAHA
  – Wise2
Information from http://www.ensembl.org
20
Ensembl Tools
BioMart
  – Generic data management system built specifically for use in Ensembl
  – Ensembl will build a BioMart database to provide users with the ability to conduct fast and powerful searches
  – It simplifies the task of integrating external data sets (provided by the user) with the Ensembl databases
Information from http://www.ensembl.org
21
Ensembl Tools
Exonerate
  – A tool designed for pairwise sequence comparison
      Allows for the alignment of sequences using many different models
        – Example: dynamic programming
Information from http://www.ensembl.org
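Dynamic programming for pairwise comparison can be illustrated with a textbook global alignment score (Needleman-Wunsch style). This is a generic sketch, not Exonerate's own code or any of its alignment models; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    Only two rows of the dynamic programming table are kept in memory,
    because each cell depends on the row above and the cell to its left.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # column 0: a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            ))
        previous = current
    return previous[-1]


identical = global_alignment_score("GATTACA", "GATTACA")
one_gap = global_alignment_score("ACGT", "ACT")
```

With these scores, two identical 7-base sequences score 7, and `ACGT` against `ACT` scores 2 (three matches and one gap).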
22
Ensembl Tools
SSAHA
  – Sequence Search and Alignment by Hashing Algorithm
  – A tool that provides very fast matching and alignment of DNA sequences
  – Speed is gained from converting sequence information into a hash data structure
Information from http://www.ensembl.org
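Why a hash structure speeds up matching can be shown with a small k-mer index. This is a toy illustration of the general idea, not the SSAHA implementation: the function names, the word length `k=4`, and the example sequences are assumptions.

```python
from collections import Counter, defaultdict


def build_kmer_index(subject, k):
    """Hash every k-letter word of the subject to the positions where it starts."""
    index = defaultdict(list)
    for start in range(len(subject) - k + 1):
        index[subject[start:start + k]].append(start)
    return dict(index)


def find_seed_hits(query, index, k):
    """Look up each query word in the index; return (query_pos, subject_pos) pairs."""
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def best_diagonal(hits):
    """Return (offset, hit_count) for the subject-minus-query offset with most hits."""
    if not hits:
        return None
    return Counter(s - q for q, s in hits).most_common(1)[0]


index = build_kmer_index("ACGTACGTGA", k=4)
hits = find_seed_hits("TACGTG", index, k=4)
offset, support = best_diagonal(hits)
```

Each query word costs one dictionary lookup instead of a scan of the subject, and hits sharing an offset point to the same candidate alignment. Here three of the four hits agree on offset 3, which is where `TACGTG` sits in the subject.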
23
Ensembl Tools
Wise2
  – Tool designed for the comparison of biopolymers
      Biopolymer ex.: DNA and protein sequences
  – Genewise and estwise are algorithms associated with the Wise2 package
Information from http://www.ensembl.org
24
Ensembl Data
25
Data Exporting
  – All data generated is free for download via the ftp.ensembl.org site
      Includes gene sequences, transcript and protein predictions
  – Ensembl provides a dedicated ExportView page
      Can be exported into HTML, text or zipped format
Data Importing
  – Able to import your own dataset for analysis
  – For “large” personal datasets
      DAS server to help stabilize these datasets
Information from http://www.ensembl.org
26
Ensembl Data
Searching Ensembl
  – Search for nucleotide or protein sequences
  – Many available search functions
      General text search across all available species genomes
      Search within a specific genome (Gallus gallus)
      Use external accession numbers
        – NCBI accession number: AB012211
      BLAST search with a gene sequence
        – Your own data or from an external source
Information from http://www.ensembl.org
27
Ensembl Reference
T. Hubbard, D. Andrews, M. Caccamo, G. Cameron, Y. Chen, M. Clamp, L. Clarke, G. Coates, T. Cox, F. Cunningham, V. Curwen, T. Cutts, T. Down, R. Durbin, X. M. Fernandez-Suarez, J. Gilbert, M. Hammond, J. Herrero, H. Hotz, K. Howe, V. Iyer, K. Jekosch, A. Kahari, A. Kasprzyk, D. Keefe, S. Keenan, F. Kokocinsci, D. London, I. Longden, G. McVicker, C. Melsopp, P. Meidl, S. Potter, G. Proctor, M. Rae, D. Rios, M. Schuster, S. Searle, J. Severin, G. Slater, D. Smedley, J. Smith, W. Spooner, A. Stabenau, J. Stalker, R. Storey, S. Trevanion, A. Ureta-Vidal, J. Vogel, S. White, C. Woodwark and E. Birney. Ensembl 2005. Nucleic Acids Res. 2005 Jan 1;33 Database issue:D447-D453.
  – Available as a free full text at: http://nar.oxfordjournals.org/cgi/content/full/33/suppl_1/D447
Include the current release version of Ensembl
  – Current version is Ensembl v37-Feb 2006
Free full text access to other scientific publications concerning applications associated with Ensembl
  – http://www.ensembl.org/info/publications.html
Information from http://www.ensembl.org
28
Let's get familiar with Ensembl and view a chicken example
http://www.ensembl.org
29
Presentation References
Ensembl Web Information
  – Available at http://www.ensembl.org
Ensembl 2005
  – Hubbard et al., 2005. Nucleic Acids Research. January 2005
An Overview of Ensembl
  – Birney et al., 2004. Genome Research. Available at: http://www.genome.org