Glossina Transcriptome Annotation
Karyn Megy, VectorBase
European Bioinformatics Institute, UK

Glossina Transcriptome Annotation Nairobi, May

Plan
Goal
Background
What to annotate?
How to annotate?
Tips for annotation

Goals
Use the Glossina ESTs to…
–… characterize the gene structure
–… predict the functional annotation
Ultimate goal
–Tsetse genome project
–Transcriptome analysis
–Gene expression analysis
–Vector disease, viviparity, strict hematophagy etc.
–Gene expansion, species-specific genes etc.
–Species comparison (Gl. morsitans vs. Gl. palpalis)

Who?
Bioinformatics
–EST -> cluster -> contig
–Contig -> ORF -> annotation
Visualization
–H-Inv lite
Functional annotation assessment
–Manually: us!

Background: ESTs
Expressed Sequence Tag (EST)
–Short fragment of expressed sequence
–Single read sequences
–Generated from the 5’ or 3’ ends of transcripts
nt
EST libraries
–Represent the transcriptome of a cell, at a given stage, in a given condition

EST generation
[Figure: EST generation]

EST disadvantages
–Error prone (single read)
–Incomplete gene sequence (3’ or 5’ ends)
–Bias toward highly expressed genes (random transcripts)
–Repeated domains and large gene families lead to misinterpretation

Background: from ESTs to contigs
EST preprocessing
–Mask ESTs (remove vector etc.)
–Size selection (>200 nt)
Cluster using STACK (Winston Hide, SANBI)
–Uses RM, d2 cluster and PHRAP
[Diagram: ESTs -> Clusters -> Contigs]

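The size-selection step can be illustrated with a minimal Python sketch. It is not the STACK pipeline itself. It assumes that masked bases appear as 'X' or 'N' and that the >200 nt threshold applies to the unmasked bases; the slide states neither, and the function name `preprocess_ests` is hypothetical.

```python
def preprocess_ests(ests, min_length=200):
    """Keep ESTs that have more than `min_length` unmasked nucleotides.

    `ests` maps a read identifier to its (already masked) sequence.
    Assumption: masked positions are written as 'X' or 'N'.
    """
    kept = {}
    for name, seq in ests.items():
        # Count only the bases that survived masking.
        unmasked = sum(1 for base in seq.upper() if base not in "XN")
        if unmasked > min_length:
            kept[name] = seq
    return kept


if __name__ == "__main__":
    reads = {
        "est_long": "ACGT" * 60,              # 240 nt, kept
        "est_short": "ACGT" * 30,             # 120 nt, dropped
        "est_masked": "ACGT" * 30 + "X" * 150 # 120 unmasked nt, dropped
    }
    print(sorted(preprocess_ests(reads)))
```

Counting unmasked bases rather than total length avoids keeping reads that are mostly vector sequence; if the real pipeline filters on total length, the condition is a one-line change.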
Background: functional annotation
Open Reading Frame (ORF) prediction
Annotation ‘‘by association’’
–Blast contigs vs. SwissProt, UniProt, nr GenBank
–All organisms
–‘Transfer’ the description of a matching sequence
SuperTACT (JBIRC) => Manual selection of the description to transfer
[Diagram: contig -> ORF; Glossina function: ?; Drosophila function: myosin light chain; transferred function: myosin light chain]

Background: from EST to ORF
[Diagram: ESTs -> Clusters -> Contigs; SANBI, JBIRC]

Background: functional annotation
Six categories
1. SANBI + JBIRC identical to known Glossina proteins
2. SANBI or JBIRC identical to known Glossina proteins
3. SANBI + JBIRC identical to known proteins, any species
4. SANBI or JBIRC identical to known proteins, any species
5. SANBI or JBIRC identical to Interpro domains (only)
6. SANBI + JBIRC identical to ‘hypothetical’ proteins
Proportions: <0.5%, <0.5%, 45%, 6%, 6%, 45%

What to annotate?
ORF
–Select the most probable one (SANBI, JBIRC)
Function
–Description,
–Gene name,
–Bonus: GO term, EC number, processes
Gene Ontology: describes a gene function with a defined vocabulary
Enzyme Classification: describes an enzyme function with a defined vocabulary

How to annotate? H-Inv lite
–From the JBIRC
–Initially developed for annotation of Human cDNA
–‘Light’ version for Glossina

How to annotate? H-Inv lite
–One page per contig,
–Two sections per page: SANBI and JBIRC,
–Each section contains:
  EST contig & proposed ORF,
  Information about this ORF,
  Blast results (links),
  Interpro matches,
  Best Drosophila match,
  Annotation proposed,
  ORF and protein sequences.

H-Inv lite
[Screenshot labels: Contig name, # ESTs, Contig, ORF, Blast matches, Interpro matches]

[Screenshot labels: ORF information, Gene description, Organism, Blast results]

[Screenshot labels: Low complexity? Ns? STOP? Xs?]

Annotation Summary
[Screenshot labels: Match to transfer the annotation from; Annotator; Status; Annotator: SANBI automatic, JBIRC automatic]

How to annotate? H-Inv lite - edit
–Decide on the ORF and the annotation,
–Edit the entry,
–Select the annotator name and set a status,
–Select the ORF and a description,
–Add comments if necessary,
–Save,
–Double check.

H-Inv lite - edit
... and log in

H-Inv lite - edit
1. Should be yours automatically
2. Set to finish
… and change if required
IGNORE THIS PART! (and don’t modify it)
3. Select the annotation you’ve chosen:
  SANBI auto-annotation
  SANBI Fasty1
  SANBI Fasty2
  SANBI Fasty3
  etc.
  Same for JBIRC
4. Add comments if required (use the comment tags!)

How do I know which genes to annotate?
Edit to change status

How to annotate? ORF choice
–Length
–Protein sequence: stop/start at the extremities? stop in the middle? stretches of Xs?
Start = M (Methionine)
Stop = *

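The protein-sequence checks can be written down as a small Python helper, shown here only as a sketch for reviewing an ORF by eye more quickly. The function name `check_orf_protein` and the `max_x_run` threshold of 5 are assumptions for illustration; H-Inv lite does not provide them.

```python
def check_orf_protein(protein, max_x_run=5):
    """Return a list of warnings for a translated ORF.

    Conventions from the slide: start = M (methionine), stop = *.
    Assumption: a run of `max_x_run` or more Xs counts as a "stretch of Xs".
    """
    protein = protein.upper()
    if not protein:
        return ["empty sequence"]
    warnings = []
    if not protein.startswith("M"):
        warnings.append("no start (M) at the beginning")
    if not protein.endswith("*"):
        warnings.append("no stop (*) at the end")
    # A stop anywhere before the last position interrupts the ORF.
    if "*" in protein[:-1]:
        warnings.append("stop (*) in the middle")
    if "X" * max_x_run in protein:
        warnings.append("stretch of Xs")
    return warnings


if __name__ == "__main__":
    for seq in ("MKTAYIAK*", "KTAY*IAK", "MKXXXXXXAK*"):
        print(seq, check_orf_protein(seq))
```

A missing start or stop is not necessarily an error: ESTs cover only the 5’ or 3’ end of a transcript, so many ORFs are fragments. The warnings are prompts for the annotator, not rejection criteria.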
How to annotate? Function choice
–Proper gene description,
–Closest organisms are the most trustworthy
  –Drosophila: best annotation
  –Aedes, Anopheles: automatic annotation, Aedes best
–SwissProt preferably (SW)
–Good e-value,
–Good subject coverage, good %-identity

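These criteria can be combined into a rough ranking of Blast hits, sketched below in Python. The slide gives no numeric thresholds, so the cut-offs (`max_evalue`, `min_coverage`, `min_identity`), the dictionary keys, and the exact ordering of the criteria are all hypothetical choices for illustration; the final decision remains a manual one.

```python
# Preference order taken from the slide: Drosophila first, then Aedes, then Anopheles.
ORGANISM_RANK = {"Drosophila": 0, "Aedes": 1, "Anopheles": 2}


def rank_hits(hits, max_evalue=1e-10, min_coverage=50.0, min_identity=30.0):
    """Filter and sort Blast hits by the function-choice criteria.

    Each hit is a dict with the (assumed) keys: organism, database,
    evalue, coverage (% of subject) and identity (%).
    The three thresholds are illustrative defaults, not values from the slides.
    """
    good = [
        h for h in hits
        if h["evalue"] <= max_evalue
        and h["coverage"] >= min_coverage
        and h["identity"] >= min_identity
    ]

    def sort_key(hit):
        genus = (hit["organism"].split() or [""])[0]
        organism_rank = ORGANISM_RANK.get(genus, len(ORGANISM_RANK))
        database_rank = 0 if hit["database"] == "SwissProt" else 1
        return (organism_rank, database_rank, hit["evalue"])

    return sorted(good, key=sort_key)


if __name__ == "__main__":
    hits = [
        {"organism": "Homo sapiens", "database": "SwissProt",
         "evalue": 1e-80, "coverage": 95.0, "identity": 60.0},
        {"organism": "Drosophila melanogaster", "database": "nr",
         "evalue": 1e-50, "coverage": 90.0, "identity": 70.0},
        {"organism": "Aedes aegypti", "database": "nr",
         "evalue": 1e-3, "coverage": 40.0, "identity": 35.0},
    ]
    for hit in rank_hits(hits):
        print(hit["organism"], hit["database"], hit["evalue"])
```

Placing the organism ahead of the database in the sort key reflects the "closest organisms" rule; swapping the first two elements of the tuple would favour SwissProt entries instead.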
How to annotate? Function choice
–Description,
  –Transfer from another sequence,
  –Combine several descriptions,
  –Interpro description,
–Gene name,
–Bonus: GO term, EC number, processes
MEANINGFUL!!
CG13017, ENSANGxxxx, LOC1234 are identifiers, not descriptions!

How to annotate? Function choice - be careful!!
–Large gene families
  –If unsure about the member, don’t put it!
  –E.g.: ‘Yolk-1’ or ‘Yolk-2’? Choose ‘Yolk’
–Gene name
  –Don’t invent one
  –Try to take an insect one
  –Meaningful. E.g.: CG13017 doesn’t mean anything!

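The "identifier is not a description" rule lends itself to an automatic sanity check before saving. The Python sketch below covers only the three identifier styles named on the slides (CG…, ENSANG…, LOC…); the regular expression and the function name `is_meaningful_description` are assumptions for illustration, and other databases use other identifier formats.

```python
import re

# Matches a description that consists solely of one identifier, e.g.
# CG13017 (FlyBase-style), ENSANGxxxx (Ensembl-style) or LOC1234 (NCBI-style).
IDENTIFIER_PATTERN = re.compile(r"^(CG\d+|ENSANG\w+|LOC\d+)$", re.IGNORECASE)


def is_meaningful_description(description):
    """Return False for empty text or a bare database identifier."""
    text = description.strip()
    return bool(text) and not IDENTIFIER_PATTERN.match(text)


if __name__ == "__main__":
    for candidate in ("CG13017", "ENSANGxxxx", "LOC1234", "Yolk protein 2 fragment"):
        print(candidate, is_meaningful_description(candidate))
```

The check only rejects descriptions that are nothing but an identifier; a text such as "Yolk protein (CG13017)" still passes, which seems the safer behaviour for a first filter.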
How to annotate? Comments
–Change name (= gene description),
–Gene symbol,
–Process type,
–Revisions,
–GO disagreement,
–EC number,
–Suspend

Change name
–Modify/add the gene description.
–Has to be meaningful!
–Name: Yolk protein 2 fragment

Gene symbol
–Modify/add the gene symbol.
–Don’t invent one!
–Gene symbol: Yp2

Process type
–Describe the process in which this gene is involved: Defense, Olfactory, Signaling, Immunity, Reproduction, Sensory, Metabolism, Development.
–Only if known, don’t spend time on it!
–Process type: Olfactory

Revisions
–Modify the ORF if it is too long/short, has a frameshift, or is a fragment.
–Revision: ORF too short

GO disagreement
–If disagreement with the ORF
–Only if obvious!
–GO disagreement:GO:

EC number
–Assign an EC number.
–Only if obvious! E.g. from other description
–EC_Number: E.C

Suspend
–When suspending an entry, give an explanation for the suspension.
–Suspend: ORF fusion

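Since comments are typed by hand, a small helper that builds them from the tags shown in the examples can reduce typos. This Python sketch assumes the tag spellings seen on the slides (Name, Gene symbol, Process type, Revision, GO disagreement, EC_Number, Suspend) and assumes one tag per line; how H-Inv lite actually parses the comment field is not stated, and `format_comments` is a hypothetical name.

```python
# Tag spellings copied from the slide examples; order follows the slide list.
COMMENT_TAGS = (
    "Name",
    "Gene symbol",
    "Process type",
    "Revision",
    "GO disagreement",
    "EC_Number",
    "Suspend",
)


def format_comments(values):
    """Build a comment block, one 'Tag: value' line per supplied tag.

    Assumption: tags are separated by newlines in the comment field.
    Raises ValueError for a tag that is not in COMMENT_TAGS.
    """
    unknown = set(values) - set(COMMENT_TAGS)
    if unknown:
        raise ValueError(f"unknown comment tag(s): {sorted(unknown)}")
    return "\n".join(
        f"{tag}: {values[tag]}" for tag in COMMENT_TAGS if tag in values
    )


if __name__ == "__main__":
    print(format_comments({
        "Gene symbol": "Yp2",
        "Name": "Yolk protein 2 fragment",
        "Revision": "ORF too short",
    }))
```

The output can be pasted into the comment box from a text editor; rejecting unknown tags catches misspellings such as "Gene name" before they reach the database.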
Practical tips
Reduce the browser size
–Ctrl - (Ctrl + to increase)
Open two tabs at the same time
–One to work with, one that’s loading
–NOT MORE! Or we will saturate the SANBI server
Use a text editor to copy/paste
Keep track of the status in the wiki
–It’s good for morale!

Huge responsibility!
The description is permanent
–Used in analysis,
–Transferred to other genes,
You will have to make some decisions
First few contigs:
–Spend some time making sure you understand how to do it; then it goes much faster.
When to seek help?
–Weird case, unsure of something

Good luck!

Examples
Example: –