

A guided tour of Ensembl
This quick tour will give you an outline view of what Ensembl is all about. You will learn:
–Why we need Ensembl
–What is in the Ensembl database
–Who is behind it
–What it can tell you about the genome
–How to interpret the data

Background
The Human Genome Project (HGP) has produced the first “draft” sequence of the human genome. This sequence is not a finished product: it contains errors and will need much work before it can be considered truly accurate. However, it will provide scientists with their first overall view of the sequence of the human genome.
Producing this draft sequence is much like assembling a huge jigsaw puzzle. Millions of short pieces of DNA must be fitted together to form an overall sequence of the complete genome. To make it more complicated, the “pieces” come from all over the world, produced by the international collaboration of sequencing laboratories that make up the HGP consortium. As the DNA sequence is produced it is released into the public domain by placing it in publicly accessible databases such as EMBL and GenBank.

The Jigsaw Puzzle Genome
Modern DNA sequencing technology can only determine accurate sequences of short stretches of DNA (less than 1000 base pairs). Since the human genome is more than 3 billion base pairs long, it has had to be sequenced in many small pieces that must be reassembled afterwards. The pieces are reassembled by comparing the sequences at their ends to find overlaps, which are then used to join them together.
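The overlap idea can be illustrated with a toy example. This is only a minimal sketch of greedy overlap merging on error-free fragments, not the method used by the sequencing centres or by Ensembl; the function names, the `min_overlap` parameter and the example reads are all invented for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest end overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best = (0, -1, -1)  # (overlap size, left index, right index)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


# Three made-up reads that overlap end to end.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

Real genome data contains sequencing errors and repeats, so exact-match merging like this would not be enough in practice; the sketch only shows why shared ends let separate pieces be joined.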

What is Ensembl?
Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a system which automatically tracks all the sequenced pieces of the human genome, attempts to assemble them into large single stretches, and then analyses the assembled DNA to find genes and other features of interest to biologists and medical researchers.
Ensembl:
–Is “fed” raw DNA sequence taken from the public DNA databases
–Puts it into a large tracking database (the “Ensembl” database)
–Joins the sequences into their proper place in the genome
–Automatically finds genes and other features in the sequence
–Presents the results on the internet for everyone to see, for free.
Diagram labels: World DNA data, Map, SNP, Sanger Centre computation, analysis pipeline, Ensembl database, WWW.

Why do we need Ensembl?
Keeping track of the thousands of individual pieces of DNA making up the human genome jigsaw puzzle is very difficult. As the sequence is refined and mistakes are corrected in sequencing labs around the world, the sequence of the pieces changes. It is vitally important to keep track of these changes accurately so that a consistent “big picture” is maintained. Doing this manually would be extremely difficult and would require many people. Automatic tracking via a system such as Ensembl is quicker, cheaper and more accurate.

What’s in the Ensembl database?
All of the human genome DNA that is currently available in the public domain.
Collectively, the features identified on the DNA sequence by Ensembl are called “annotation” and mostly comprise:
–Genes. These fall into 3 general classes, including:
   genes that are already known from other experiments
   genes that are predicted by the Ensembl system
–Other interesting features of the DNA such as:
   SNPs (single nucleotide polymorphisms)
   Repeats (regions of simple repetitive DNA sequence)
   Regions highly similar to other sequences in the public databases (also called “homologies”).

How does Ensembl predict genes?
Ensembl uses specialized gene-finding software called “Genscan” to predict the location of gene sequences. The software scans DNA sequences and identifies regions that look like they may be genes. These “candidate” gene sequences are then compared to the sequences of all known genes in the public databases. If matches are found, they provide “supporting evidence” suggesting that the predictions are accurate to some degree. The predicted genes are stored in the database so that they can be retrieved later.
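The “compare candidates against known genes” step can be pictured with a toy example. The slides do not say how the comparison is done, so this sketch swaps in a simple stand-in (the fraction of short subsequences, or k-mers, that a candidate shares with each known sequence) rather than a real similarity search; the function names, the thresholds and the sequences are all invented for illustration.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """All distinct substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def supporting_evidence(
    candidate: str,
    known_genes: dict[str, str],
    k: int = 4,
    min_shared: float = 0.5,
) -> list[tuple[str, float]]:
    """Return (name, shared fraction) for known genes that resemble the candidate.

    The shared fraction is the share of the candidate's k-mers that also
    occur in the known sequence. Best matches come first.
    """
    candidate_kmers = kmers(candidate, k)
    if not candidate_kmers:
        return []  # candidate shorter than k: nothing to compare
    hits = []
    for name, sequence in known_genes.items():
        shared = len(candidate_kmers & kmers(sequence, k)) / len(candidate_kmers)
        if shared >= min_shared:
            hits.append((name, shared))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sequences, not real genes.
known = {"geneA": "ATGGCGTACC", "geneB": "TTTTTTTTTT"}
print(supporting_evidence("ATGGCGTACC", known))
```

A candidate with no hits is not necessarily wrong; it simply has no supporting evidence from this comparison.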

Ensembl Naming Conventions
Keeping stable names for “things”, such as genes, in databases is very important. This allows scientists in different labs around the world to be confident that they are all referring to the same “thing”. Ensembl goes to great lengths to try to maintain stable names for genes and other features in the genome. This is a very difficult task because Ensembl is an environment where DNA sequence is continuously changing and being improved. Changes to the underlying DNA sequence may cause genes to be created, deleted, altered or merged with one another. Wherever possible the names are maintained when the DNA sequence is revised. Ensembl keeps a “version” number for many things so changes can be tracked over time.
Ensembl identifiers look like:
–“ENSG00000XXXX” for genes
–“ENST00000XXXX” for gene transcripts
–etc.
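As an illustration of how such identifiers can be handled in a script, the sketch below splits an identifier into its feature type, number and optional version. It covers only the two prefixes named on the slide; writing the version as a “.N” suffix is an assumption that goes beyond what the slide shows, and the example identifiers are made up.

```python
import re
from typing import NamedTuple, Optional


class StableId(NamedTuple):
    feature_type: str        # "gene" or "transcript"
    number: int              # numeric part of the identifier
    version: Optional[int]   # None when no ".N" suffix is present


# Only the gene (G) and transcript (T) prefixes from the slide are handled.
_PATTERN = re.compile(r"^ENS(?P<kind>[GT])(?P<number>\d+)(?:\.(?P<version>\d+))?$")
_KINDS = {"G": "gene", "T": "transcript"}


def parse_stable_id(text: str) -> StableId:
    """Split an identifier such as 'ENSG00000012345.2' into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised identifier: {text!r}")
    version = match.group("version")
    return StableId(
        feature_type=_KINDS[match.group("kind")],
        number=int(match.group("number")),
        version=int(version) if version else None,
    )


# Made-up identifiers, used only to show the shape of the result.
print(parse_stable_id("ENSG00000012345.2"))
print(parse_stable_id("ENST00000000042"))
```

Keeping the version separate from the identifier lets a script notice that a gene has been revised while still recognising it as the same “thing”.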

The Ensembl Website
The Ensembl website provides a quick and easy way to browse the contents of the Ensembl database or find specific items of interest. There are a number of main entry points into the Ensembl database:
–DNA similarity searches (“BLAST” searches). This is useful if you already have a DNA or protein sequence and you want to see if anything similar exists in the Ensembl database.
–Browse from the chromosome level all the way down to the DNA sequence level.
–Ensembl identifier search. If you already have an ID number you can search for it directly.
–Known gene names.
–OMIM diseases.
–Free text search of OMIM, SWISSPROT and InterPro annotation.

Browsing Ensembl
From the Ensembl home page, click on the picture of the chromosome you are interested in.

Browsing Chromosome Maps
The chromosome view shows a picture of the chromosome and graphical representations of features on the chromosome (feature density plots). Click anywhere on the image to see a magnified view of that region.

Browsing Contig Displays
In addition to sequence displays, a map of DNA fragments is shown giving the location of genes. Each display is a magnified view of the red window in the display above.
Labels on the screenshot:
–Location on the chromosome
–1Mb overview of the region
–Landmark map markers
–Adjacent contigs are shown in alternating blue
–Gene positions are shown under the map
–Use these buttons to move and resize your view
–Use these menus to reconfigure your view and access advanced features.

Using Contig Overview
The region of interest is the area surrounded by the red window. The ContigView overview display always shows 1Mb around this region. Clicking anywhere within this display re-centres the view on the point you clicked.
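The re-centring behaviour amounts to simple coordinate arithmetic. The sketch below computes a fixed-width window around a clicked position; it is not taken from the Ensembl code, and the function name, the 1-based inclusive coordinates and the clamping at chromosome ends are all assumptions made for illustration.

```python
def overview_window(
    click_bp: int,
    chromosome_length_bp: int,
    span_bp: int = 1_000_000,
) -> tuple[int, int]:
    """Return (start, end) of a span_bp-wide window centred on the click.

    Coordinates are 1-based and inclusive. Near a chromosome end the window
    is shifted inwards so that it keeps its full width where possible.
    """
    half = span_bp // 2
    start = max(1, click_bp - half)
    end = min(chromosome_length_bp, start + span_bp - 1)
    start = max(1, end - span_bp + 1)  # shift back if the end was clamped
    return start, end


# A click at position 5,000,000 on a hypothetical 100 Mb chromosome.
print(overview_window(5_000_000, 100_000_000))
```

Shifting the window instead of shrinking it keeps the overview at a constant scale, which is one reasonable reading of “always shows 1Mb”.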

Using Contig Detailed View (1)
Holding your mouse over features in the detailed view will pop up a menu through which you can access detailed information about those features.
Labels on the screenshot:
–Sequence length
–Features on forward DNA strand
–Features on reverse DNA strand
–“Unstranded” features
–Homologies to other known sequences
–Simple sequence repeats
–EMBL annotation
–Known Ensembl transcript
–Mouse trace alignments
–Clone tiling path

Using Contig Detailed View (2)
Menus at the top of the detailed display control which features are displayed. You may also export the region in a variety of formats or view the region using other genome browsers.

Adding External Data to ContigView via DAS
DAS provides a system for adding user-defined data to Ensembl displays. An external server serves features which may be layered onto the Ensembl ContigView as external sources.
1. Access the “DAS sources” menu.
2. Enter your DAS server and add your sources.
3. Manage your existing sources.
4. Refresh your ContigView display.
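To give a feel for what an external DAS server is asked, the sketch below builds a features request for a region. It follows the request form of the DAS 1 protocol as commonly documented (`/das/<source>/features?segment=<reference>:<start>,<stop>`); the server address and source name are hypothetical, and the function only builds the URL without contacting anything.

```python
from urllib.parse import quote


def das_features_url(
    server: str,
    source: str,
    reference: str,
    start: int,
    stop: int,
) -> str:
    """Build a DAS-style features request for one region of a reference sequence."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    segment = f"{reference}:{start},{stop}"
    return (
        f"{server.rstrip('/')}/das/{quote(source)}"
        f"/features?segment={quote(segment, safe=':,')}"
    )


# Hypothetical server and source name, used only to show the URL shape.
print(das_features_url("http://das.example.org", "my_source", "1", 1000, 2000))
```

The server is expected to answer with the features it holds for that region, which the browser can then draw as an extra track alongside its own annotation.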

Interpreting Marker Information
Labels on the screenshot:
–View marker sequence
–Database where markers are stored
–Marker flanking sequences
–View chromosome map

Gene Views
Clicking on a gene displays detailed information about gene structure.
Labels on the screenshot:
–Gene structure
–Supporting evidence leading to the prediction of this gene
–Transcript cDNA sequence
–Predicted properties
–Transcript context

Supporting Evidence For Genes
Evidence supporting the prediction of a gene is ordered according to its reliability. Reliable data is shown in green; lower-reliability evidence is shown in grey.
Labels on the screenshot:
–Supporting data source
–Supporting data ID
–Summary of data
–Diagrammatic representation of which part of the gene prediction this evidence supports.

Further Information
–The Ensembl Project
–Ensembl Trace Server
–Ensembl Distributed Annotation Server
–Human Genome Central Resources