Glossina Transcriptome Annotation Karyn Megy, VectorBase European Bioinformatics Institute, UK.



Glossina Transcriptome Annotation Nairobi, May
Plan
Goal
Background
What to annotate?
How to annotate?
Tips for annotation

Goals
Use the Glossina ESTs to…
–… characterize the gene structure
–… predict the functional annotation
Ultimate goal
–Tsetse genome project
–Transcriptome analysis
–Gene expression analysis
–Vector disease, viviparity, strict hematophagy etc.
–Gene expansion, species-specific genes etc.
–Species comparison (Gl. morsitans vs. Gl. palpalis)

Who?
Bioinformatics
–EST -> cluster -> contig
–Contig -> ORF -> annotation
Visualization
–H-Inv lite
Functional annotation assessment
–Manually: us!


EST generation

Background: ESTs
Expressed Sequence Tag (EST)
–Short fragment of expressed sequence
Single-read sequences
Generated from the 5’ or 3’ ends of transcripts
nt
EST libraries: represent the transcriptome of a cell, at a given stage, in a given condition

EST disadvantages
Error prone (single read)
Incomplete gene sequence (3’ or 5’ ends)
Bias toward highly expressed genes (random transcripts)
Repeated domains and large gene families lead to misinterpretation

Background: from ESTs to contigs
EST preprocessing
–Mask ESTs (remove vector etc.)
–Size selection (>200 nt)
(Figure: ESTs -> Clusters -> Contigs)
Clustering using STACK
–Winston Hide, SANBI
–Uses RM, d2_cluster and PHRAP
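
The size-selection step above can be sketched in a few lines of Python. This is only an illustration, not the STACK pipeline itself: it assumes that masking replaces vector bases with X or N, and that the 200 nt threshold applies to the unmasked bases. The slide states neither, and the function names and sequences are made up.

```python
def usable_length(seq: str, mask_chars: str = "XN") -> int:
    """Count bases that are not masked (assumption: masking writes X or N)."""
    masked = set(mask_chars.upper())
    return sum(1 for base in seq.upper() if base not in masked)


def size_select(ests: dict[str, str], min_len: int = 200) -> dict[str, str]:
    """Keep ESTs with strictly more than `min_len` unmasked nucleotides."""
    return {name: seq for name, seq in ests.items() if usable_length(seq) > min_len}


# Made-up sequences, for illustration only.
ests = {
    "est_a": "ACGT" * 60,              # 240 nt, nothing masked
    "est_b": "X" * 100 + "ACGT" * 30,  # 220 nt, but only 120 unmasked
    "est_c": "ACGT" * 50,              # exactly 200 nt, so not > 200
}
kept = size_select(ests)
print(sorted(kept))
```

Only the ESTs that pass this filter would go on to clustering.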

Background: functional annotation
Open Reading Frame (ORF) prediction
Annotation ‘‘by association’’
–Blast contigs vs. SwissProt, UniProt, nr GenBank
–All organisms
–‘Transfer’ the description of a matching sequence
(Figure: contig -> ORF. Glossina fct: ? / Drosophila fct: myosin light chain / Transfer function: myosin light chain)

Background: functional annotation
Annotation ‘‘by association’’
–Blast contigs vs. SwissProt, UniProt, nr GenBank
–All organisms
–‘Transfer’ the description of a matching sequence
SuperTACT (JBIRC)
=> Manual selection of the description to transfer

Background: from EST to ORF
(Figure: ESTs -> Clusters -> Contigs; labels: SANBI, JBIRC)

Background: functional annotation
Six categories
1. SANBI + JBIRC identical to known Glossina proteins
2. SANBI or JBIRC identical to known Glossina proteins
3. SANBI + JBIRC identical to known proteins, any species
4. SANBI or JBIRC identical to known proteins, any species
5. SANBI or JBIRC identical to Interpro domains (only)
6. SANBI + JBIRC identical to ‘hypothetical’ proteins
Proportions (as listed on the slide): <0.5%, <0.5%, 45%, 6%, 6%, 45%

What to annotate?
ORF
–Select the most probable one (SANBI, JBIRC)
Function
–Description,
–Gene name,
–Bonus: GO term, EC number, processes
Gene Ontology: describes a gene function with a defined vocabulary
Enzyme Classification: describes an enzyme function with a defined vocabulary

How to annotate?
H-Inv lite
–From the JBIRC
–Initially developed for annotation of human cDNA
–‘Light’ version for Glossina

How to annotate?
H-Inv lite
–One page per contig,
–Two sections per page: SANBI and JBIRC,
–Each section contains:
  EST contig & proposed ORF,
  Information about this ORF,
  Blast results (links),
  Interpro matches,
  Best Drosophila match,
  Annotation proposed,
  ORF and protein sequences.

H-Inv lite
(Screenshot labels: Contig, ORF, Blast matches, Interpro matches, name, # ESTs)

ORF information
(Screenshot labels: Gene description, Organism, Blast results)


(Screenshot callouts: Low complexity? Ns? STOP? Xs?)

Annotation Summary
(Screenshot labels: Match to transfer the annotation from; Annotator; Status; Annotator: SANBI automatic, JBIRC automatic)

How to annotate?
H-Inv lite - edit
–Decide on the ORF and the annotation,
–Edit the entry,
–Select the annotator name and set a status,
–Select the ORF and a description,
–Add comments if necessary,
–Save,
–Double check.

H-Inv lite - edit
... and log in

H-Inv lite - edit
1. Should be yours automatically
2. Set to finish
… and change if required
IGNORE THIS PART! (and don’t modify it)

3. Select the annotation you’ve chosen:
–SANBI auto-annotation
–SANBI Fasty1
–SANBI Fasty2
–SANBI Fasty3
–etc.
–Same for JBIRC
4. Add comments if required (use the comment tags!)


How do I know which genes to annotate?
Edit to change status

How to annotate?
ORF choice
–Length
–Protein sequence: stop/start at the extremities? stop in the middle? stretches of Xs?
Start = M (Methionine)
Stop = *
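
The protein-sequence questions above (start/stop at the extremities, stop in the middle, stretches of Xs) can be turned into a quick check. The sketch below is not part of H-Inv lite. The function name and the run length that counts as a "stretch" (5 here) are assumptions, and the sequences are made up.

```python
import re


def orf_flags(protein: str, min_x_run: int = 5) -> dict[str, bool]:
    """Answer the slide's questions for one translated ORF."""
    body = protein.rstrip("*")  # drop the trailing stop(s) before looking inside
    return {
        "starts_with_M": protein.startswith("M"),  # Start = M (Methionine)
        "ends_with_stop": protein.endswith("*"),   # Stop = *
        "internal_stop": "*" in body,              # stop in the middle?
        # min_x_run is a hypothetical threshold for "a stretch of Xs"
        "long_X_stretch": re.search(r"X{%d,}" % min_x_run, protein) is not None,
    }


# Made-up sequences, for illustration only.
print(orf_flags("MKTLLVAG*"))
print(orf_flags("KT*LLXXXXXXVAG"))
```

These flags only help compare the candidate ORFs; the final choice between the SANBI and JBIRC proposals remains a manual decision.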

How to annotate?
Function choice
–Proper gene description,
–Closest organisms are the most trustworthy
–Drosophila: best annotation
–Aedes, Anopheles: automatic annotation, Aedes best
–SwissProt preferably (SW)
–Good e-value,
–Good subject coverage, good %-identity
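
One way to picture how these criteria combine is to sort the Blast hits by them. The sketch below is an illustration, not the SANBI or JBIRC procedure. The `Hit` fields, the numeric cut-offs (`max_evalue`, `min_coverage`) and the order in which criteria are applied are all assumptions, and the hits are made up.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    description: str
    organism: str
    database: str    # "SW" marks SwissProt, as on the slide
    evalue: float
    coverage: float  # fraction of the subject covered, 0..1
    identity: float  # percent identity, 0..100


# Slide order of trust: Drosophila, then Aedes, then Anopheles, then the rest.
ORGANISM_RANK = {"Drosophila": 0, "Aedes": 1, "Anopheles": 2}


def sort_key(hit: Hit) -> tuple:
    genus = hit.organism.partition(" ")[0]
    return (
        ORGANISM_RANK.get(genus, len(ORGANISM_RANK)),
        0 if hit.database == "SW" else 1,  # SwissProt preferably
        hit.evalue,                        # lower is better
        -hit.coverage,                     # higher is better
        -hit.identity,
    )


def rank_hits(hits, max_evalue=1e-5, min_coverage=0.5):
    """Drop weak hits (hypothetical cut-offs), then sort by the criteria."""
    good = [h for h in hits if h.evalue <= max_evalue and h.coverage >= min_coverage]
    return sorted(good, key=sort_key)


# Made-up hits, for illustration only.
hits = [
    Hit("hypothetical protein", "Anopheles gambiae", "TrEMBL", 1e-60, 0.99, 85.0),
    Hit("myosin light chain", "Drosophila melanogaster", "SW", 1e-40, 0.95, 78.0),
    Hit("weak match", "Homo sapiens", "SW", 0.5, 0.2, 30.0),
]
ranked = rank_hits(hits)
print([h.description for h in ranked])
```

A ranking like this can only suggest a candidate. The annotator still decides which description is meaningful enough to transfer.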

How to annotate?
Function choice
–Description,
–Transfer from another sequence,
–Combine several descriptions,
–Interpro description,
–Gene name,
–Bonus: GO term, EC number, processes
MEANINGFUL!!
CG13017, ENSANGxxxx, LOC1234 are identifiers, not descriptions!

How to annotate?
Function choice - be careful!!
–Large gene families
–If unsure about the member, don’t put it!
–E.g.: ‘Yolk-1’ or ‘Yolk-2’? Choose ‘Yolk’
–Gene name
–Don’t invent one
–Try to take an insect one
–Meaningful
E.g.: CG13017 doesn’t mean anything!
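
The rule that an identifier is not a description can be checked mechanically before saving an entry. The sketch below covers only the three identifier shapes quoted on the slides (CG13017, ENSANGxxxx, LOC1234). The patterns are guesses at those shapes, not official identifier specifications, and the function name is made up.

```python
import re

# Patterns guessed from the slide's three examples; they are assumptions.
IDENTIFIER_PATTERNS = [
    re.compile(r"CG\d+"),      # e.g. CG13017
    re.compile(r"ENSANG\w+"),  # e.g. ENSANGxxxx
    re.compile(r"LOC\d+"),     # e.g. LOC1234
]


def is_bare_identifier(description: str) -> bool:
    """True when the whole description is just a database identifier."""
    text = description.strip()
    return any(p.fullmatch(text) for p in IDENTIFIER_PATTERNS)


for candidate in ["CG13017", "ENSANGxxxx", "LOC1234", "Yolk protein 2 fragment"]:
    print(candidate, "->", "identifier" if is_bare_identifier(candidate) else "description")
```

A description flagged this way needs replacing with a meaningful one, for example one transferred from a better-annotated match.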

How to annotate?
Comments
–Change name, (= gene description)
–Gene symbol,
–Process type,
–Revisions,
–GO disagreement,
–EC number,
–Suspend

How to annotate? Comments: Change name
Modify/add the gene description.
Has to be meaningful!
Name: Yolk protein 2 fragment

How to annotate? Comments: Gene symbol
Modify/add the gene symbol
Don’t invent one!
Gene symbol: Yp2

How to annotate? Comments: Process type
Describe the process in which this gene is involved
Defense, Olfactory, Signaling, Immunity, Reproduction, Sensory, Metabolism, Development.
Only if known, don’t spend time on it!
Process type: Olfactory

How to annotate? Comments: Revisions
Modify the ORF
If the ORF is too long/short, Frameshift, Fragment
Revision: ORF too short

How to annotate? Comments: GO disagreement
If disagreement with the ORF
Only if obvious!
GO disagreement: GO:

How to annotate? Comments: EC number
Assign an EC number
Only if obvious! E.g. from other description
EC_Number: E.C

How to annotate? Comments: Suspend
When suspending an entry, give an explanation for the suspension
Suspend: ORF fusion
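
Because the comment tags follow a fixed "Tag: value" pattern, comments written this way can be read back by a script. The sketch below assumes one tag per line and uses the tag spellings from the example comments on the slides. How H-Inv lite actually stores or parses comments is not described here, so treat the function and its behaviour as hypothetical.

```python
# Tag spellings taken from the example comments on the slides.
KNOWN_TAGS = (
    "Name", "Gene symbol", "Process type", "Revision",
    "GO disagreement", "EC_Number", "Suspend",
)


def parse_comment(comment: str):
    """Split a comment into {tag: value} plus any untagged lines."""
    tags, untagged = {}, []
    for line in comment.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values such as "GO:..." stay whole.
        tag, sep, value = line.partition(":")
        if sep and tag.strip() in KNOWN_TAGS:
            tags[tag.strip()] = value.strip()
        else:
            untagged.append(line)
    return tags, untagged


comment = """Name: Yolk protein 2 fragment
Gene symbol: Yp2
Revision: ORF too short
please double check the start"""
tags, untagged = parse_comment(comment)
print(tags)
print(untagged)
```

Writing the tags exactly as shown keeps the comments machine-readable; a misspelt tag ends up in the untagged list.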

Practical tips
Reduce the browser size
–Ctrl - (Ctrl + to increase)
Open two tabs at the same time
–One to work with, one that’s loading
–NOT MORE! Or we will saturate the SANBI server
Use a text editor to copy/paste
Keep track of the status in the wiki
–It’s good for morale!

Huge responsibility!
The description is permanent
–Used in analysis,
–Transferred to other genes,
You will have to make some decisions
First few contigs:
–Spend some time to make sure you understand how to do it; then it goes much faster.
When to seek help?
–Weird case, unsure of something

Good luck!

Examples
Example: –