Introduction to Genes and Genomes with Ensembl


1 Introduction to Genes and Genomes with Ensembl

2 Large amounts of raw DNA sequence data
[Slide image: screens of raw, unannotated DNA sequence, e.g. CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCC…]

3 Making Sense out of Sequence …

4 The Ensembl genome browser: making it interesting
The ENCODE (ENCyclopedia Of DNA Elements) project, Science 306 (2004)
Genes, variation, regulatory elements

5 Vertebrate species on Ensembl
Mostly vertebrates

6 Non‐vertebrates on Ensembl Genomes
Fungi, Bacteria, Protists, Metazoa, Plants

7 Ensembl and Ensembl Genomes

8 Ensembl gene models
• Automatic annotation
• Manual annotation

9 Automatic gene annotation
Genome-wide determination using the Ensembl automated pipeline:
• Predictions based on the genomic sequence (ab initio)
• Predictions based on experimental (biological) data:
  • ESTs
  • RNA-seq data
  • cDNA and protein alignments (from sequence databases)
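To make "predictions based on the genomic sequence (ab initio)" concrete, here is a deliberately naive sketch that scans the forward strand for open reading frames (an ATG followed by the first in-frame stop codon). This is not how the Ensembl pipeline works: real ab initio gene finders use probabilistic models of splice sites and codon usage and scan both strands. The function name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
# Standard stop codons. This toy scans the forward strand only and ignores
# introns; it illustrates the idea of sequence-only prediction, nothing more.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts codons including the ATG and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((frame, start, end))
                start = None  # look for the next ATG in this frame
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=4)` returns `[(2, 2, 14)]`: ATG AAA TTT TAG in frame 2. The default threshold filters out such short ORFs, which occur by chance all over raw sequence.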

10 Biological Evidence
• International Nucleotide Sequence Databases
• Protein sequence databases
  • Swiss-Prot: manually curated
  • TrEMBL: unreviewed translations
• NCBI RefSeq: manually annotated proteins and mRNAs (NP, NM)

11 Manual gene annotation
Gene determination on a case-by-case basis by a curator
[Diagram: genome-wide view vs. list of genes]

12 Ensembl automatic annotation

13 Automatic vs. manual annotation
• Automatic annotation: many species (>60), genome-wide at once
• Manual annotation: few species (Hs, Mm, Dr, i.e. human, mouse and zebrafish), gene by gene

14 Golden transcripts
• Identical annotation from the automatic (Ensembl) and manual (Havana) pipelines
• Higher confidence and quality
[Diagram: transcript model labelled 5’ UTR, 3’ UTR, exon and intron] Exons are drawn as boxes: filled boxes are coding and unfilled boxes are untranslated. Introns are drawn as lines.
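The drawing convention above splits each exon into coding (filled) and untranslated (unfilled) parts. A minimal sketch of that split, assuming a hypothetical forward-strand transcript model with 1-based inclusive coordinates, counts how many exonic bases fall into the 5’ UTR, the coding region and the 3’ UTR. The `Transcript` class and `region_lengths` function are illustrative names, not an Ensembl API.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical forward-strand transcript model (1-based, inclusive)."""
    exons: list[tuple[int, int]]  # (start, end) per exon, sorted by start
    cds_start: int                # first base of the start codon
    cds_end: int                  # last base of the stop codon


def region_lengths(t: Transcript) -> dict[str, int]:
    """Count exonic bases in the 5' UTR, coding region and 3' UTR."""
    utr5 = cds = utr3 = 0
    for start, end in t.exons:
        # Part of this exon before the CDS (unfilled box, 5' side).
        utr5 += max(0, min(end, t.cds_start - 1) - start + 1)
        # Part of this exon inside the CDS (filled box).
        cds += max(0, min(end, t.cds_end) - max(start, t.cds_start) + 1)
        # Part of this exon after the CDS (unfilled box, 3' side).
        utr3 += max(0, end - max(start, t.cds_end + 1) + 1)
    return {"5_utr": utr5, "cds": cds, "3_utr": utr3}
```

For three exons at 1-100, 201-300 and 401-500 with the CDS running from 51 to 451, `region_lengths(Transcript([(1, 100), (201, 300), (401, 500)], 51, 451))` returns `{'5_utr': 50, 'cds': 201, '3_utr': 49}`; the introns (the lines between boxes) contribute nothing.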

15 CCDS transcripts
Consensus coding DNA sequence (CCDS) set: agreement between EBI, WTSI, UCSC and NCBI
[Diagram: a CCDS transcript]
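One way to picture "agreement between EBI, WTSI, UCSC and NCBI" is as a set intersection: a coding structure enters the consensus only if every group annotates exactly the same coding exons. This is a simplification for illustration (the actual CCDS process also involves quality-assurance checks and collaborative review); the data layout and the `consensus_cds` name are hypothetical.

```python
# A CDS structure is represented as the tuple of its coding exon intervals.
CdsStructure = tuple[tuple[int, int], ...]


def consensus_cds(annotations: dict[str, set[CdsStructure]]) -> set[CdsStructure]:
    """Return the CDS structures that every group annotated identically."""
    groups = iter(annotations.values())
    agreed = set(next(groups, set()))  # copy of the first group's set
    for cds_set in groups:
        agreed &= cds_set  # keep only structures this group also has
    return agreed
```

If one group annotates a slightly different splice site (say, an exon ending at 183 instead of 180), that variant drops out and only the structure shared by all four groups remains.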

16 Higher quality transcripts
• CCDS transcripts (protein-coding only)
• Ensembl/Havana merged transcripts
Both are available for a limited number of species only.

17 Ensembl stable IDs
• ENSG########### Ensembl Gene ID
• ENST########### Ensembl Transcript ID
• ENSP########### Ensembl Peptide ID
• ENSE########### Ensembl Exon ID
For non‐human species, a species code is inserted after the ENS prefix: MUS (Mus musculus) for mouse, e.g. ENSMUSG###; DAR (Danio rerio) for zebrafish, e.g. ENSDARG###.
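The ID layout above can be checked mechanically. The sketch below assumes exactly what the slide shows: "ENS", an optional three-letter species code (absent for human), one of the four feature letters G/T/P/E, and 11 digits. Ensembl IDs are also commonly quoted with a version suffix such as `.15`, so the pattern accepts one. Other feature types and species with different naming schemes are not covered; `parse_stable_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed pattern: ENS + optional 3-letter species code + feature letter
# + 11 digits + optional ".version" suffix.
STABLE_ID = re.compile(
    r"^ENS(?P<species>[A-Z]{3})?(?P<feature>[GTPE])(?P<number>\d{11})(?:\.(?P<version>\d+))?$"
)

FEATURES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}


def parse_stable_id(stable_id: str) -> Optional[dict[str, str]]:
    """Split an Ensembl-style stable ID into its parts, or return None."""
    m = STABLE_ID.match(stable_id.strip())
    if m is None:
        return None
    return {
        "species_code": m.group("species") or "",  # "" means no code (human)
        "feature": FEATURES[m.group("feature")],
        "number": m.group("number"),
        "version": m.group("version") or "",  # "" means unversioned
    }
```

For example, `parse_stable_id("ENSMUSG00000017167")` reports species code `MUS` and feature `gene`, while a RefSeq accession such as `NM_000546` returns `None`.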

18 NCBI
Video tutorials: http://www.youtube.com/ncbinlm (or go to www.youtube.com and search “NCBI tutorial general”)

19 The National Center for Biotechnology Information
Bethesda, MD. Created in 1988 as part of the National Library of Medicine at NIH to:
• Establish public databases
• Conduct research in computational biology
• Develop software tools for sequence analysis
• Disseminate biomedical information

20 Three international nucleotide sequence databases

21 Selected NCBI Databases
Biomedical literature
• PubMed: free access to MEDLINE
• PubMed Central: full-text online access
• NCBI Bookshelf: online biomedical textbooks
Biomolecular databases
• Nucleotide: GenBank (submitted sequence records) and RefSeq (curated NCBI reference sequences)
• Protein: GenBank and RefSeq translations, plus proteins from outside databases
• dbSNP: small-scale genetic variations
• Structure: biomolecular 3-D structures (MMDB, NCBI’s 3-D structure database)
• GEO: microarray expression data
• SRA: next-generation sequence data
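The databases listed above can also be queried programmatically through NCBI's E-utilities web interface. The sketch below only builds request URLs and sends nothing; `esearch.fcgi`, `efetch.fcgi` and the `db`, `term`, `retmax`, `id`, `rettype` and `retmode` parameters belong to E-utilities, while the helper names are hypothetical. It assumes Entrez database names such as `pubmed`, `nuccore` (Nucleotide) and `protein`; check NCBI's documentation and usage limits before sending real requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns record IDs matching a query."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


def efetch_url(db: str, ids: list[str], rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an EFetch URL that downloads the given records (comma-joined IDs)."""
    query = urlencode({"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode})
    return f"{EUTILS_BASE}efetch.fcgi?{query}"
```

`efetch_url("nuccore", ["NM_000546"])` yields `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NM_000546&rettype=fasta&retmode=text`; keeping URL construction separate from the HTTP call makes it easy to add rate limiting or an API key later.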

22 GenBank & RefSeq

23 RefSeq: NCBI’s Derivative Sequence Database
• Experimentally verified / curated transcripts and proteins: NM_, NP_ accession numbers
• Model transcripts and proteins: XM_, XP_ accession numbers
• Assembled genomic regions (contigs): NT_, NW_ accession numbers
• Chromosome records: NC_, AC_ accession numbers
• RefSeqGene records: NG_ accession numbers (NG_ is also used for pseudogenes and other fixed genomic sequences)
• Draft whole-genome shotgun assemblies (microbial): NZ_ accession numbers
• Microbial proteins: NP_, YP_, ZP_ accession numbers
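Because the record type is encoded in the accession prefix, a lookup table transcribed from the list above is enough to classify a RefSeq accession. A minimal sketch: the function name is hypothetical, and since NP_ appears twice on the slide (curated proteins and microbial proteins) the table reports both meanings rather than guessing.

```python
from typing import Optional

# Prefix -> record type, transcribed from the list above.
REFSEQ_PREFIXES = {
    "NM_": "curated transcript",
    "NP_": "curated protein / microbial protein",
    "XM_": "model transcript",
    "XP_": "model protein",
    "NT_": "assembled genomic region (contig)",
    "NW_": "assembled genomic region (contig)",
    "NC_": "chromosome record",
    "AC_": "chromosome record",
    "NG_": "RefSeqGene record or other fixed genomic sequence",
    "NZ_": "draft whole-genome shotgun assembly (microbial)",
    "YP_": "microbial protein",
    "ZP_": "microbial protein",
}


def refseq_type(accession: str) -> Optional[str]:
    """Return the record type for a RefSeq accession, or None if unknown."""
    return REFSEQ_PREFIXES.get(accession.strip()[:3].upper())
```

`refseq_type("NM_000546.6")` returns `"curated transcript"`, and an Ensembl ID such as `ENSG00000139618` returns `None` because its prefix is not a RefSeq one.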

24 UCSC Genome Browser

25 GeneCards

