Published by Hector Webster. Modified over 9 years ago.
1
Glossina Transcriptome Annotation
Karyn Megy, VectorBase
European Bioinformatics Institute, UK
2
Glossina Transcriptome Annotation Nairobi, May 2008
Plan
– Goal
– Background
– What to annotate?
– How to annotate?
– Tips for annotation
3
Goals
Use the Glossina ESTs to…
– … characterize the gene structure
– … predict the functional annotation
Ultimate goals
– Tsetse genome project
– Transcriptome analysis
– Gene expression analysis
– Vector disease, viviparity, strict hematophagy etc.
– Gene expansion, species-specific genes etc.
– Species comparison (Gl. morsitans vs. Gl. palpalis)
4
Who?
Bioinformatics
– EST -> cluster -> contig
– Contig -> ORF -> annotation
Visualization
– H-Inv lite
Functional annotation assessment
– Manually: us!
5
Background: ESTs
Expressed Sequence Tag (EST)
– Short fragment of expressed sequence
– Single-read sequences
– Generated from the 5’ or 3’ ends of transcripts
– 500-700 nt
6
EST generation
7
Background: ESTs (continued)
EST libraries
– Represent the transcriptome of a cell, at a given stage, in a given condition
8
EST disadvantages
– Error prone (single read)
– Incomplete gene sequence (3’ or 5’ ends)
– Bias toward highly expressed genes (random transcripts)
– Repeated domains and large gene families lead to misinterpretation
9
Background: from ESTs to contigs
EST preprocessing
– Mask ESTs (remove vector etc.)
– Size selection (>200 nt)
Clustering using STACK (Winston Hide, SANBI)
– Uses RM, d2 cluster and PHRAP
Diagram: ESTs -> Clusters -> Contigs
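The size-selection step can be sketched in a few lines of Python. This is only an illustration, not the STACK pipeline itself: the function name `select_ests`, the use of `X` or `N` as masking characters, and the choice to count only unmasked bases against the 200 nt threshold are all assumptions.

```python
def select_ests(ests, min_length=200):
    """Keep masked ESTs with more than `min_length` unmasked nucleotides.

    `ests` maps an EST name to its masked sequence. Masked positions are
    assumed (for this sketch) to be written as 'X' or 'N'.
    """
    kept = {}
    for name, seq in ests.items():
        # Count only the bases that survived masking.
        unmasked = sum(1 for base in seq.upper() if base not in "XN")
        if unmasked > min_length:
            kept[name] = seq
    return kept
```

For example, a 260 nt read in which 100 nt were masked as vector would be dropped, because only 160 nt of usable sequence remain.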
10
Background: functional annotation
Open Reading Frame (ORF) prediction
Annotation “by association”
– Blast contigs vs. SwissProt, UniProt, nr GenBank
– All organisms
– ‘Transfer’ the description of a sequence that matches
Diagram: contig -> ORF; Glossina function: ?; Drosophila function: myosin light chain; transferred function: myosin light chain
11
Background: functional annotation (continued)
SuperTACT (JBIRC) => Manual selection of the description to transfer
12
Background: from EST to ORF
Diagram: ESTs -> Clusters -> Contigs (labels: SANBI, JBIRC)
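To make the contig -> ORF step concrete, here is a minimal sketch of ORF prediction by six-frame scanning. It is not the method used by SANBI or JBIRC, which the slides do not describe. It assumes the standard stop codons and only reports complete ORFs that run from an ATG to an in-frame stop, so partial ORFs (common in EST contigs) are missed. The names `longest_orf` and `reverse_complement` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(contig):
    """Return (strand, start, end, orf) for the longest ATG..stop ORF.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned. Returns ("", 0, 0, "") when no complete ORF exists.
    """
    best = ("", 0, 0, "")
    contig = contig.upper()
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if pos + 3 - start > len(best[3]):
                        best = (strand, start, pos + 3, seq[start:pos + 3])
                    start = None  # look for the next ORF in this frame
    return best
```

Because an EST contig often covers only part of a transcript, a real pipeline would also need to accept ORFs that lack a start or a stop.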
13
Background: functional annotation
Six categories
1. SANBI + JBIRC identical to known Glossina proteins
2. SANBI or JBIRC identical to known Glossina proteins
3. SANBI + JBIRC identical to known proteins, any species
4. SANBI or JBIRC identical to known proteins, any species
5. SANBI or JBIRC identical to Interpro domains (only)
6. SANBI + JBIRC identical to ‘hypothetical’ proteins
Percentages shown on the slide, in order: <0.5%, <0.5%, 45%, 6%, 6%, 45%
14
What to annotate?
ORF
– Select the most probable one (SANBI, JBIRC)
Function
– Description
– Gene name
– Bonus: GO term, EC number, processes
Gene Ontology: describes a gene function with a defined vocabulary
Enzyme Classification: describes an enzyme function with a defined vocabulary
15
How to annotate? H-Inv lite
– From the JBIRC
– Initially developed for annotation of human cDNA
– ‘Light’ version for Glossina
16
How to annotate? H-Inv lite
– One page per contig
– Two sections per page: SANBI and JBIRC
– Each section contains:
  – EST contig & proposed ORF
  – Information about this ORF
  – Blast results (links)
  – Interpro matches
  – Best Drosophila match
  – Annotation proposed
  – ORF and protein sequences
17
H-Inv lite
Screenshot labels: Contig, ORF, Blast matches, Interpro matches, name, # ESTs
18
Screenshot labels: ORF information, Gene description, Organism, Blast results
21
Low complexity? Ns? STOP? Xs?
22
Annotation Summary
Screenshot labels: Match to transfer the annotation from, Annotator, Status, SANBI automatic, JBIRC automatic
23
How to annotate? H-Inv lite - edit
– Decide on the ORF and the annotation
– Edit the entry
– Select the annotator name and set a status
– Select the ORF and a description
– Add comments if necessary
– Save
– Double check
24
H-Inv lite - edit
... and log in
25
H-Inv lite - edit
1. Should be yours automatically
2. Set to finish
… and change if required
IGNORE THIS PART! (and don’t modify it)
26
3. Select the annotation you’ve chosen:
– SANBI auto-annotation
– SANBI Fasty1
– SANBI Fasty2
– SANBI Fasty3
– etc.
Same for JBIRC
4. Add comments if required (use the comment tags!)
28
How do I know which genes to annotate?
Edit to change status
29
How to annotate? ORF choice
– Length
– Protein sequence: stop/start at the extremities? stop in the middle? stretches of Xs?
Start = M (Methionine)
Stop = *
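The checks on the protein sequence can be written down as a small helper. This is a sketch for illustration only: the function name `check_protein` and the threshold of five consecutive Xs are assumptions, not part of H-Inv lite.

```python
def check_protein(protein, max_x_run=5):
    """Return a list of warnings for a translated ORF.

    `protein` is a one-letter amino-acid string in which '*' marks a stop
    and 'X' an unknown residue. `max_x_run` is an assumed threshold for
    what counts as a "stretch" of Xs.
    """
    warnings = []
    if not protein.startswith("M"):
        warnings.append("no start (M) at the beginning")
    if not protein.endswith("*"):
        warnings.append("no stop (*) at the end")
    if "*" in protein[:-1]:
        warnings.append("stop (*) in the middle")
    if "X" * max_x_run in protein:
        warnings.append("stretch of Xs")
    return warnings
```

An empty list means the ORF looks complete. A missing start or stop is not necessarily an error, since an EST contig may cover only a fragment of the gene.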
30
How to annotate? Function choice
– Proper gene description
– Closest organisms are the most trustworthy
– Drosophila: best annotation
– Aedes, Anopheles: automatic annotation, Aedes best
– SwissProt preferably (SW)
– Good e-value
– Good subject coverage, good %-identity
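One way to turn these criteria into a ranking of Blast hits is sketched below. The slide does not give cut-offs or a priority order, so everything specific here is an assumption: the `Hit` fields, the thresholds (`max_evalue`, `min_coverage`), the database code "TR" used in the usage example, and the ordering (SwissProt first, then organism, then e-value, then identity).

```python
from dataclasses import dataclass

# Assumed organism preference, loosely following the slide.
ORGANISM_RANK = {"Drosophila": 0, "Aedes": 1, "Anopheles": 2}


@dataclass
class Hit:
    description: str
    genus: str
    database: str    # "SW" for SwissProt, as on the slide
    evalue: float
    coverage: float  # subject coverage, in percent
    identity: float  # percent identity


def rank_hits(hits, max_evalue=1e-10, min_coverage=50.0):
    """Drop weak hits, then sort the rest by the assumed preferences."""
    good = [h for h in hits
            if h.evalue <= max_evalue and h.coverage >= min_coverage]
    return sorted(
        good,
        key=lambda h: (
            h.database != "SW",                              # SwissProt first
            ORGANISM_RANK.get(h.genus, len(ORGANISM_RANK)),  # closest organism
            h.evalue,                                        # lower is better
            -h.identity,                                     # higher is better
        ),
    )
```

A short usage example:

```python
hits = [
    Hit("hypothetical protein", "Aedes", "TR", 1e-60, 98.0, 85.0),
    Hit("myosin light chain", "Drosophila", "SW", 1e-50, 95.0, 80.0),
]
best = rank_hits(hits)[0]
print(best.description)
```

The ranking only proposes a candidate. The annotator still decides which description to transfer.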
31
How to annotate? Function choice
– Description
  – Transfer from another sequence
  – Combine several descriptions
  – Interpro description
– Gene name
– Bonus: GO term, EC number, processes
MEANINGFUL!! CG13017, ENSANGxxxx, LOC1234 are identifiers, not descriptions!
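A quick way to catch descriptions that are only identifiers is a pattern check. The sketch below covers just the three identifier styles named on the slide. The regular expressions, including how many characters may follow each prefix, are assumptions for illustration, and `is_bare_identifier` is a hypothetical helper.

```python
import re

# One pattern per identifier style mentioned on the slide.
IDENTIFIER_PATTERNS = [
    re.compile(r"CG\d+"),      # e.g. CG13017
    re.compile(r"ENSANG\w+"),  # e.g. ENSANGxxxx
    re.compile(r"LOC\d+"),     # e.g. LOC1234
]


def is_bare_identifier(description):
    """Return True if the description consists of nothing but an identifier."""
    text = description.strip()
    return any(p.fullmatch(text) for p in IDENTIFIER_PATTERNS)
```

A description such as "myosin light chain" passes, while "CG13017" on its own is flagged and should be replaced with something meaningful.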
32
How to annotate? Function choice - be careful!!
– Large gene families
  – If unsure about the member, don’t put it!
  – E.g.: ‘Yolk-1’ or ‘Yolk-2’? Choose ‘Yolk’
– Gene name
  – Don’t invent one
  – Try to take an insect one
  – Meaningful. E.g.: CG13017 doesn’t mean anything!
33
How to annotate? Comments
– Change name (= gene description)
– Gene symbol
– Process type
– Revisions
– GO disagreement
– EC number
– Suspend
34
How to annotate? Comments: Change name
Modify/add the gene description. Has to be meaningful!
Name: Yolk protein 2 fragment
35
How to annotate? Comments: Gene symbol
Modify/add the gene symbol. Don’t invent one!
Gene symbol: Yp2
36
How to annotate? Comments: Process type
Describe the process in which this gene is involved:
Defense, Olfactory, Signaling, Immunity, Reproduction, Sensory, Metabolism, Development.
Only if known, don’t spend time on it!
Process type: Olfactory
37
How to annotate? Comments: Revisions
Modify the ORF
If the ORF is too long/short, frameshift, fragment
Revision: ORF too short
38
How to annotate? Comments: GO disagreement
If disagreement with the ORF
Only if obvious!
GO disagreement: GO:0016459
39
How to annotate? Comments: EC number
Assign an EC number
Only if obvious! E.g. from other description
EC_Number: E.C. 3.4.11.4
40
How to annotate? Comments: Suspend
When suspending an entry, give an explanation for the suspension
Suspend: ORF fusion
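The comment tags shown in the examples above follow a simple "tag: value" pattern, which makes them easy to read back with a script. The sketch below assumes one tag per line and uses the tag spellings from the slide examples. Whether H-Inv lite stores comments exactly this way is not stated, and `parse_comments` is a hypothetical helper.

```python
# Tag spellings as they appear in the slide examples.
KNOWN_TAGS = {
    "Name", "Gene symbol", "Process type", "Revision",
    "GO disagreement", "EC_Number", "Suspend",
}


def parse_comments(text):
    """Parse 'tag: value' comment lines into a dict, ignoring unknown tags."""
    parsed = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as GO:0016459
        # keep their own colon.
        tag, sep, value = line.partition(":")
        tag = tag.strip()
        if sep and tag in KNOWN_TAGS:
            parsed[tag] = value.strip()
    return parsed
```

Using the tags consistently is what makes this kind of automatic collection possible, so a misspelled tag would simply be skipped here.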
41
Practical tips
– Reduce the browser size: Ctrl - (Ctrl + to increase)
– Open two tabs at the same time: one to work with, one that’s loading. NOT MORE! Or we will saturate the SANBI server
– Use a text editor to copy/paste
– Keep track of the status in the wiki. It’s good for morale!
42
Huge responsibility!
The description is permanent
– Used in analysis
– Transferred to other genes
You will have to make some decisions
First few contigs:
– Spend some time making sure you understand how to do it; then it goes much faster.
When to seek help?
– Weird case, unsure of something
43
Good luck!
44
Examples
– http://hinvlite.sanbi.ac.za/bin/view/Main/CN131567
– http://hinvlite.sanbi.ac.za/bin/view/Main/CN7337
– http://hinvlite.sanbi.ac.za/bin/view/Main/CN1330
– http://hinvlite.sanbi.ac.za/bin/view/Main/CN4928
– http://hinvlite.sanbi.ac.za/bin/view/Main/CN13961
– http://hinvlite.sanbi.ac.za/bin/view/Main/CN13156