
1 A guided tour of Ensembl
This quick tour will give you an outline view of what Ensembl is all about. You will learn:
–Why we need Ensembl
–What is in the Ensembl database
–Who is behind it
–What it can tell you about the genome
–How to interpret the data
Figure labels: Ensembl database; web pages

2 Background
The Human Genome Project (HGP) has produced the first “draft” sequence of the human genome. This sequence is not a finished product: it contains errors and will need much work before it can be considered truly accurate. However, it gives scientists their first overall view of the sequence of the human genome.
Producing this draft sequence is much like assembling a huge jigsaw puzzle. Millions of short pieces of DNA must be fitted together to form an overall sequence of the complete genome. To complicate matters, the “pieces” come from all over the world, produced by the international collaboration of sequencing laboratories that make up the HGP consortium. As the DNA sequence is produced, it is released into the public domain by placing it in publicly accessible databases such as EMBL and GenBank.

3 The Jigsaw Puzzle Genome
Modern DNA sequencing technology can only determine accurate sequences of short stretches of DNA (fewer than 1000 base pairs). Since the human genome is more than 3 billion base pairs long, it has had to be sequenced in many small pieces that must be reassembled afterwards. The pieces are reassembled by comparing the sequences of their ends to find overlaps, which are then used to join them together.
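
The overlap idea on this slide can be sketched in a few lines of Python. This is only a toy illustration, assuming error-free reads and exact matches; the function names and the `min_overlap` parameter are hypothetical and are not taken from Ensembl or from any real assembler.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_fragments(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two fragments at their overlap, or return None if they do not overlap."""
    length = overlap_length(left, right, min_overlap)
    if length == 0:
        return None
    # Keep `left` whole and append only the part of `right` beyond the overlap.
    return left + right[length:]


if __name__ == "__main__":
    print(merge_fragments("ATGCGTAC", "GTACCTTA"))
```

Real assemblers also have to cope with sequencing errors, repeats and reverse-complemented reads, none of which this sketch attempts.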

4 What is Ensembl?
Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a system that automatically tracks all the sequenced pieces of the human genome, attempts to assemble them into large single stretches, and then analyses the assembled DNA to find genes and other features of interest to biologists and medical researchers. Ensembl:
–Is “fed” raw DNA sequence taken from the public DNA databases
–Puts it into a large tracking database (the “Ensembl” database)
–Joins the sequences into their proper place in the genome
–Automatically finds genes and other features in the sequence
–Presents the results on the internet for everyone to see, for free.
Diagram labels: Ensembl Database; World DNA data; Map; SNP; WWW; Sanger Centre; Computation; Analysis Pipeline

5 Why do we need Ensembl?
Keeping track of the thousands of individual pieces of DNA that make up the human genome jigsaw puzzle is very difficult. As the sequence is refined and mistakes are corrected in sequencing labs around the world, the sequences of the pieces change. It is vitally important to track these changes accurately so that a consistent “big picture” is maintained. Doing this manually is extremely difficult and would require many people. Automatic tracking via a system such as Ensembl is quicker, cheaper and more accurate.

6 What’s in the Ensembl database?
All of the human genome DNA that is currently available in the public domain.
Collectively, the features identified on the DNA sequence by Ensembl are called “annotation” and mostly comprise:
–Genes. These fall into two general classes:
  genes that are already known from other experiments
  genes that are predicted by the Ensembl system
–Other interesting features of the DNA, such as:
  SNPs (single nucleotide polymorphisms)
  Repeats (regions of simple repetitive DNA sequence)
  Regions highly similar to other sequences in the public databases (also called “homologies”).

7 How does Ensembl predict genes?
Ensembl uses specialized gene-finding software called “Genscan” to predict the location of gene sequences. The software scans DNA sequences and identifies regions that look like they may be genes. These “candidate” gene sequences are then compared with the sequences of all known genes in the public databases. Any matches provide “supporting evidence” that the predictions are accurate to some degree. The predicted genes are stored in the database so that they can be retrieved later.
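
As a rough illustration of the “compare candidates against known genes” step, the sketch below flags a candidate as supported when it shares enough short subsequences (k-mers) with a known sequence. This is a deliberately simplified stand-in: it is not Genscan and not the comparison method Ensembl uses, and the function names, the `k` and `threshold` parameters, and the example sequences are all hypothetical.

```python
from typing import Dict, List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_fraction(candidate: str, known: str, k: int = 8) -> float:
    """Fraction of the candidate's k-mers that also occur in the known sequence."""
    candidate_kmers = kmers(candidate, k)
    if not candidate_kmers:
        return 0.0  # candidate is shorter than k, so there is nothing to compare
    return len(candidate_kmers & kmers(known, k)) / len(candidate_kmers)


def supporting_evidence(candidate: str, known_genes: Dict[str, str],
                        k: int = 8, threshold: float = 0.5) -> List[str]:
    """Names of known sequences that share at least `threshold` of the candidate's k-mers."""
    return [name for name, sequence in known_genes.items()
            if shared_kmer_fraction(candidate, sequence, k) >= threshold]


if __name__ == "__main__":
    known = {"geneA": "CCATGGCGTACGTTAGCAA", "geneB": "TTTTTTTTTTTTTTTTTTTT"}
    print(supporting_evidence("ATGGCGTACGTTAGC", known))
```

In practice this kind of comparison is done with sequence alignment tools, which tolerate mismatches and gaps in a way exact k-mer matching does not.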

8 Ensembl Naming Conventions
Keeping stable names for “things”, such as genes, in databases is very important. It allows scientists in different labs around the world to be confident that they are all referring to the same “thing”. Ensembl goes to great lengths to maintain stable names for genes and other features in the genome. This is a very difficult task because the DNA sequence in Ensembl is continuously changing and being improved. Changes to the underlying DNA sequence may cause genes to be created, deleted, altered or merged with one another. Wherever possible, names are maintained when the DNA sequence is revised. Ensembl keeps a “version” number for many things so that changes can be tracked over time.
Ensembl identifiers look like:
–“ENSG00000XXXX” for genes
–“ENST00000XXXX” for gene transcripts
–etc.
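
The identifier scheme above lends itself to a small parsing sketch. It covers only the two prefixes named on the slide and assumes that a version, when present, is written as a “.N” suffix after the identifier; that suffix convention is an assumption not stated in the slides. The example identifiers in the usage are made up, and the names `EnsemblId` and `parse_ensembl_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Only the prefixes mentioned on the slide; other feature types are omitted.
FEATURE_TYPES = {"ENSG": "gene", "ENST": "transcript"}

# Prefix, a run of digits, and an optional ".version" suffix (assumed convention).
_ID_PATTERN = re.compile(r"^(ENS[GT])(\d+)(?:\.(\d+))?$")


class EnsemblId(NamedTuple):
    feature_type: str
    stable_id: str
    version: Optional[int]


def parse_ensembl_id(text: str) -> Optional[EnsemblId]:
    """Split an identifier into feature type, stable ID and optional version."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        return None
    prefix, digits, version = match.groups()
    return EnsemblId(
        feature_type=FEATURE_TYPES[prefix],
        stable_id=prefix + digits,
        version=int(version) if version is not None else None,
    )


if __name__ == "__main__":
    # Made-up identifiers, used purely for illustration.
    print(parse_ensembl_id("ENSG00000000123.2"))
    print(parse_ensembl_id("ENST00000000456"))
```

Keeping the stable ID and the version as separate fields mirrors the point of the slide: the name stays fixed while the version records that the underlying sequence changed.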

9 The Ensembl Website
The Ensembl website is at: http://www.ensembl.org/
It provides a quick and easy way to browse the contents of the Ensembl database or find specific items of interest. There are a number of main entry points into the Ensembl database:
–DNA similarity searches (“BLAST” searches). This is useful if you already have a DNA or protein sequence and you want to see if anything similar exists in the Ensembl database.
–Browse from the chromosome level all the way down to the DNA sequence level.
–Ensembl identifier search. If you already have an ID number you can search for it directly.
–Known gene names.
–OMIM diseases.
–Free text search of OMIM, SWISSPROT and InterPro annotation.

10 Browsing Ensembl
From the Ensembl home page, click on a picture of a chromosome you are interested in.

11 Browsing Chromosome Maps
The chromosome view shows a picture of the chromosome and graphical representations of features on the chromosome. Click anywhere on the image to see a magnified view of that region.
Figure label: Feature Density Plots

12 Browsing Contig Displays
In addition to sequence displays, a map of DNA fragments is shown giving the location of genes. Each display is a magnified view of the red window in the display above.
Figure labels:
–Location on the chromosome
–1Mb overview of the region
–Landmark map markers
–Adjacent contigs are shown in alternating blue
–Gene positions are shown under the map
–Use these buttons to move and resize your view
–Use these menus to reconfigure your view and access advanced features.

13 Using Contig Overview
The region of interest is the area surrounded by the red window. The Contigview Overview display always shows 1Mb around this region. Clicking anywhere within this display will center the view on the point you clicked.

14 Using Contig Detailed View (1)
Holding your mouse over features in the detailed view will pop up a menu through which you can access detailed information about those features.
Figure labels:
–Features on forward DNA strand
–Features on reverse DNA strand
–“Unstranded” features
–Homologies to other known sequences
–Simple sequence repeats
–Sequence length
–EMBL annotation
–Known Ensembl transcript
–Mouse trace alignments
–Clone tiling path

15 Using Contig Detailed View (2)
Menus at the top of the detailed display control which features are displayed. You may also export the region in a variety of formats or view the region using other genome browsers.

16 Adding External Data to ContigView via DAS
DAS provides a system for adding user-defined data to Ensembl displays. An external server serves features which may be layered onto the Ensembl ContigView.
1. Access the “DAS sources” menu.
2. Enter your DAS server and add your sources.
3. Manage your existing sources.
4. Refresh your ContigView display.
Figure label: External Sources
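
To give a feel for what a DAS client asks an external server, the sketch below builds a feature request URL in the general shape used by the DAS 1.x protocol (a `features` command with a `segment=reference:start,end` parameter). Treat the details as assumptions rather than a description of Ensembl's own client code: the server address `das.example.org` and the source name `my_source` are placeholders, and `das_features_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def das_features_url(server: str, source: str, reference: str,
                     start: int, end: int) -> str:
    """Build a DAS-style URL requesting features on one region of a reference sequence."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    base = server.rstrip("/")
    # Keep ':' and ',' unescaped so the segment reads as reference:start,end.
    query = urlencode({"segment": f"{reference}:{start},{end}"}, safe=":,")
    return f"{base}/{source}/features?{query}"


if __name__ == "__main__":
    # Placeholder server and source names, not a real DAS endpoint.
    print(das_features_url("http://das.example.org/das/", "my_source", "1", 1000, 2000))
```

The sketch only constructs the request; fetching and parsing the server's response is left out.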

17 Interpreting Marker Information
Figure labels:
–View marker sequence
–Database where markers are stored
–Marker flanking sequences
–View chromosome map

18 Gene Views
Clicking on a gene displays detailed information about gene structure.
Figure labels: Gene structure; Supporting evidence leading to the prediction of this gene; Transcript cDNA sequence; Predicted properties; Transcript context

19 Supporting Evidence For Genes
Evidence supporting the prediction of a gene is ordered according to its reliability. Reliable data is shown in green. Lower-reliability evidence is shown in grey.
Figure labels:
–Supporting data source
–Supporting data ID
–Summary of data
–Diagrammatic representation of which part of the gene prediction this evidence supports.

20 Further Information
–The Ensembl Project: http://www.ensembl.org/
–Ensembl Trace Server: http://trace.ensembl.org/
–Ensembl Distributed Annotation Server: http://servlet.sanger.ac.uk/das/
–Human Genome Central Resources: http://www.ensembl.org/genome/central/

