Genome Annotation.

Presentation on theme: "Genome Annotation."— Presentation transcript:

1 Genome Annotation

2 Genome Annotation
Annotation is the process of adding information to a DNA sequence. Each piece of information usually has a DNA coordinate. Features can be repeats, genes, promoters, protein domains and so on. Features can be linked to other databases, e.g. Pfam or PubMed.

3 Genome Annotation
Genome databases
The EMBL file format
Editing EMBL files with Artemis
The annotation process
Common pitfalls

4 Public Databases
GenBank, EMBL and DDBJ.
The three databases update each other automatically.

5 EMBL and TREMBL
Patricia Rodriguez-Tomé, Peter J. Stoehr, Graham N. Cameron and Tomas P. Flores, "The European Bioinformatics Institute (EBI) databases", Nucleic Acids Res. 24:6-13, 1996.
EMBL currently contains entries

6 EMBL File
Contains:
A header containing information about the sequence:
Organism
Authors
References
Comments
A feature table containing:
Sequence features and coordinates

7 Header File
ID   PFMAL1P4 standard; DNA; INV; 66441 BP.
XX
AC   AL031747;
SV   AL
DT   24-SEP-1998 (Rel. 57, Created)
DT   27-APR-2000 (Rel. 63, Last updated, Version 13)
DE   Plasmodium falciparum DNA from MAL1P4
KW   HTG; rifin; telomere; var; var-like hypothetical protein.
OS   Plasmodium falciparum (malaria parasite P. falciparum)
OC   Eukaryota; Alveolata; Apicomplexa; Haemosporida; Plasmodium.
RN   [1]
RA   Oliver K., Bowman S., Churcher C., Harris B., Harris D., Lawson D.,
RA   Quail M., Rajandream M., Barrell B.;
RT   ;
RL   Submitted (24-SEP-1998) to the EMBL/GenBank/DDBJ databases.
RL   P.falciparum Genome Sequencing Consortium, The Sanger Centre, Wellcome
RL   Trust Genome Campus, Hinxton, Cambridge CB10 1S.
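Because every header line starts with a two-letter line code, the header is easy to read programmatically. The following is a minimal sketch, assuming a well-formed header in which each line is a two-letter code followed by spaces and content. The function name `parse_embl_header` is hypothetical and is not part of Artemis or any EMBL tool.

```python
from collections import defaultdict

def parse_embl_header(text):
    """Group EMBL header lines by their two-letter line code.

    Assumption: each line is '<CODE><spaces><content>'. 'XX' spacer
    lines and blank lines carry no content and are skipped.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        code, content = line[:2], line[2:].strip()
        if len(code) < 2 or code == "XX":
            continue
        fields[code].append(content)
    return dict(fields)

# A few lines taken from the header shown on the slide.
header = """\
ID   PFMAL1P4 standard; DNA; INV; 66441 BP.
XX
AC   AL031747;
DT   24-SEP-1998 (Rel. 57, Created)
DT   27-APR-2000 (Rel. 63, Last updated, Version 13)
OS   Plasmodium falciparum (malaria parasite P. falciparum)
"""

fields = parse_embl_header(header)
print(fields["AC"])        # accession line content
print(len(fields["DT"]))   # number of date lines
```

Repeated codes such as DT or RA are kept as lists so that multi-line fields stay in order.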

8 EMBL File Feature Table
A feature is anything that can have a coordinate on a DNA sequence.
attenuator, C_region, CAAT_signal, CDS, conflict, D-loop, D_segment, enhancer, exon, GC_signal, gene, iDNA, intron, J_segment, LTR, mat_peptide, misc_binding,
misc_difference, misc_feature, misc_recomb, misc_RNA, misc_signal, misc_structure, modified_base, mRNA, N_region, old_sequence, polyA_signal, polyA_site, precursor_RNA, prim_transcript, primer_bind, promoter, protein_bind,
RBS, repeat_region, repeat_unit, rep_origin, rRNA, S_region, satellite, scRNA, sig_peptide, snRNA, snoRNA, source, stem_loop, STS, TATA_signal, terminator,
transit_peptide, tRNA, unsure, V_region, V_segment, variation, 3'clip, 3'UTR, 5'clip, 5'UTR, -10_signal, -35_signal

9 Feature qualifiers
Additional information about a feature
/allele="text"
/citation=[number]
/codon=(seq:"text",aa:<amino_acid>)
/codon_start=<1
/db_xref="<database>:<identifier>"
/EC_number="text"
/evidence=<evidence_value>
/exception="text"
/function="text"
/gene="text"
/label=feature_label
/map="text"
/note="text"
/number=unquoted
/product="text"
/protein_id="<identifier>"
/pseudo
/standard_name="text"
/translation="text"
/transl_except=(pos:<base_range>,aa:<amino_acid>)
/transl_table
/usedin=accnum:feature_label
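The qualifier list mixes quoted text values, unquoted values and bare flags such as /pseudo. A rough sketch of writing a feature and its qualifiers as FT lines is given below. The helper `format_feature`, the `UNQUOTED` set (a small subset chosen from the list above) and the location `1..30` are illustrative assumptions, and the sketch does not wrap long values across lines as a real writer would.

```python
# Qualifiers written without quotes in this sketch (assumed subset).
UNQUOTED = {"codon_start", "label", "number", "transl_table"}

def format_feature(key, location, qualifiers):
    """Return FT lines for one feature.

    The key starts in column 6 and the location or qualifier in column 22.
    A value of None marks a flag qualifier such as /pseudo.
    """
    lines = [f"FT   {key:<16}{location}"]
    for name, value in qualifiers.items():
        if value is None:
            text = f"/{name}"
        elif name in UNQUOTED:
            text = f"/{name}={value}"
        else:
            text = f'/{name}="{value}"'
        lines.append("FT" + " " * 19 + text)
    return "\n".join(lines)

# Hypothetical feature: the coordinates are made up for illustration.
ft_block = format_feature(
    "CDS", "1..30", {"gene": "ADH1", "codon_start": 1, "pseudo": None}
)
print(ft_block)
```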

10 Features

11 Annotation in Artemis
FT   CDS
FT                   /db_xref="IPR002038"
FT                   /gene="PfLtest.01"
FT                   /label=PfLtest.01
FT                   /note="PfLtest.01. len=227aa. Asp-rich protein.Predicted
FT                   by Genefinder, Phat and GlimmerM. Similar to Plasmodium
FT                   falciparum hypothetical kDa protein TR:O97242
FT                   (EMBL:AL034558) (1114 aa) fasta scores: E(): 7.1e-21,
FT                   % id in 196 aa."
FT                   /product="Asp-rich hypothetical protein"
FT                   /colour=10
FT                   /fasta_file="fasta/sanger_100kb.embl.seq out"
FT   misc_feature    complement( )
FT                   /fasta_file="fasta/TEST100.tab.seq out"
FT   CDS
FT                   /gene="PfLtest.02"
FT                   /label=PfLtest.02
FT                   /note="PfLtest.02. len=556aa. Predicted by Genefinder,
FT                   Phat and GlimmerM. Unknown hypothetical protein"
FT                   /product="unknown hypothetical protein"
FT                   /colour=8
FT                   /fasta_file="fasta/sanger_100kb.embl.seq out"
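To show how FT lines like those on this slide break down into a feature key, a location and qualifiers, here is a small parsing sketch. It assumes key lines have exactly three spaces after `FT`, that qualifier and continuation lines have more, and that a continuation line never starts with `/`. The function `parse_feature_table` is hypothetical, and the coordinates `1..1671` are invented because the slide's own locations did not survive.

```python
import re

KEY_LINE = re.compile(r"^FT {3}(\S+)\s*(.*)$")   # feature key and optional location
CONT_LINE = re.compile(r"^FT {4,}(.*)$")         # qualifier or continuation line

def parse_feature_table(text):
    """Parse EMBL FT lines into a list of feature dictionaries."""
    features, last = [], None
    for line in text.splitlines():
        m = KEY_LINE.match(line)
        if m:
            features.append(
                {"key": m.group(1), "location": m.group(2), "qualifiers": {}}
            )
            last = None
            continue
        m = CONT_LINE.match(line)
        if not m or not features:
            continue
        body = m.group(1).strip()
        quals = features[-1]["qualifiers"]
        if body.startswith("/"):
            # New qualifier: '/name=value' or a bare flag such as '/pseudo'.
            name, _, value = body[1:].partition("=")
            quals[name] = value.strip('"')
            last = name
        elif last is not None:
            # Continuation of a multi-line qualifier value such as /note.
            quals[last] = (quals[last] + " " + body).strip('"')
        else:
            # Continuation of a long location string.
            features[-1]["location"] += body
    return features

sample = """\
FT   CDS             1..1671
FT                   /gene="PfLtest.02"
FT                   /note="PfLtest.02. len=556aa. Predicted by Genefinder,
FT                   Phat and GlimmerM. Unknown hypothetical protein"
FT                   /product="unknown hypothetical protein"
FT                   /colour=8
"""

feats = parse_feature_table(sample)
print(feats[0]["key"], feats[0]["qualifiers"]["gene"])
```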

12 CDS features
CDS stands for coding sequence and is used to denote genes and pseudogenes. On submission these features are translated automatically and the resulting protein is added to the protein databases.
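The automatic translation of a CDS can be pictured with the short sketch below. It assumes the standard genetic code and an already spliced, forward-strand sequence; it is not the databases' actual pipeline, and the name `translate_cds` is hypothetical. The `codon_start` argument mirrors the /codon_start qualifier (1, 2 or 3).

```python
BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_cds(dna, codon_start=1):
    """Translate a coding sequence, starting at base `codon_start` (1-based).

    Codons containing ambiguous bases are reported as 'X'; a trailing
    partial codon is ignored. Stop codons appear as '*'.
    """
    seq = dna.upper()[codon_start - 1:]
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X")
        for pos in range(0, len(seq) - 2, 3)
    )

print(translate_cds("ATGGCCTAA"))                   # MA*
print(translate_cds("GATGGCCTAA", codon_start=2))   # MA*
```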

13 /note
The note field contains all the evidence for a gene call, plus anything else:
Similarity (fasta or blast)
Domain/motif information (Pfam, TMHMM etc.)
Unusual features (repeats, amino-acid richness)

14 /product
The name of the gene product, e.g. Alcohol dehydrogenase
Unless there is proof we must qualify the name:
Putative
Possible
Always be conservative, e.g.
Putative dehydrogenase
Dehydrogenase-like protein
This is the only piece of annotation added to the protein databases.

15 Naming protocols
Hypothetical protein: unknown function and no homology
Conserved hypothetical protein: unknown function WITH homology
Alcohol dehydrogenase like: looks a bit like it, but may not be
Putative alcohol dehydrogenase: probably an alcohol dehydrogenase
Alcohol dehydrogenase: this has previously been characterised and shown to be alcohol dehydrogenase in this organism
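The naming ladder above can be written down as a small lookup. This is a sketch only: the `Evidence` categories and the `product_name` function are hypothetical names, and deciding which category a gene falls into is left to the annotator, since the slides give no numeric thresholds.

```python
from enum import Enum

class Evidence(Enum):
    NO_HOMOLOGY = 1                 # unknown function and no homology
    HOMOLOGY_UNKNOWN_FUNCTION = 2   # unknown function WITH homology
    WEAK_SIMILARITY = 3             # looks a bit like it, but may not be
    PROBABLE = 4                    # probably this function
    CHARACTERISED = 5               # shown experimentally in this organism

def product_name(evidence, function=None):
    """Return a conservative /product string for the given evidence level."""
    if evidence is Evidence.NO_HOMOLOGY:
        return "hypothetical protein"
    if evidence is Evidence.HOMOLOGY_UNKNOWN_FUNCTION:
        return "conserved hypothetical protein"
    if not function:
        raise ValueError("a function name is required at this evidence level")
    if evidence is Evidence.WEAK_SIMILARITY:
        return f"{function}-like protein"
    if evidence is Evidence.PROBABLE:
        return f"putative {function}"
    return function

print(product_name(Evidence.PROBABLE, "alcohol dehydrogenase"))
```

Only the last level returns the unqualified name, which matches the advice to always be conservative.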

16 /gene
The gene name, e.g. ADH1
Only transfer a gene name if it is meaningful.
Never transfer a gene name like PfB0024.
Is it a gene family? Make sure no two genes are given the same name.
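These checks can be roughly automated. The sketch below reads the gene-family caution as a check that one transferred name is not assigned to two different genes, which is an interpretation. The regular expression is an assumption modelled on the single example PfB0024 (two-letter organism prefix, one capital letter, four digits), and both function names are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of a systematic identifier, based only on 'PfB0024'.
SYSTEMATIC_ID = re.compile(r"^[A-Z][a-z][A-Z]\d{4}[a-z]?$")

def is_transferable(gene_name):
    """Return True if the name does not look like a systematic identifier."""
    return not SYSTEMATIC_ID.match(gene_name)

def duplicate_gene_names(gene_names):
    """Return the names that have been assigned to more than one gene."""
    counts = Counter(gene_names)
    return sorted(name for name, n in counts.items() if n > 1)

print(is_transferable("ADH1"))      # True
print(is_transferable("PfB0024"))   # False
print(duplicate_gene_names(["ADH1", "ADH1", "RIF2"]))   # ['ADH1']
```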

17 Transitive Annotation
AKA annotation catastrophe
Junk in = junk out
Misannotations spread through incorrect database submissions.
