First Microme Jamboree – June, Monday 27 and Tuesday 28
MicroScope functionalities to support pathway curation
LABGeM team, Laboratory of Bioinformatic Analysis in Genomic and Metabolism
CEA/DSV/IG/Genoscope & CNRS UMR8030
The MicroScope platform
October 2002: beginning of the Acinetobacter baylyi ADP1 genome annotation
Computational platform for the annotation and comparative analysis of bacterial genomes:
- equipment (servers / disk storage / backups)
- software and data
- human resources (development / training / support)
=> it offers the community of microbiologists high-technology resources for the automatic and expert analysis of genomic data.
Labelled in 2006 (RIO) and in 2009
Usage of the platform
859 personal accounts: 493 in France, 175 in Europe, 81 in USA, others in other countries
About 980 bacterial genomes: 345 genomes annotated in the system (mostly sequenced at Genoscope and in USA...) and 635 from public databanks
Since 2004, 33 ‘genome’ papers (4 announcements)
Specific genomic analysis: 22 other publications
Expert annotations: 5000 expert annotations a month (2010)
Three MicroScope components
Data Management: PkGDB (primary databanks, internal genomic objects, computational results), Pathway Genome DataBases (MicroCyc)
Process Management: JBPM workflows and JBPM database (syntactic annotations, functional / relational analyses, primary databank update, DB release)
Visualization: MaGe web interface (login, genome browser and synteny maps, tutorial, Artemis, data export, CGView, LinePlot, genome overview, keyword search, Blast and pattern, phylogenetic profile, fusion / fission, tandem duplications, minimal gene set, RGPfinder, SNPs / InDels, KEGG, MicroCyc, metabolic profile, pathway / synteny, synton display, gene editor, job history, gene cart)
> 25 methods integrated in a workflow management system => full automation: genome annotation, primary data kept up to date
Vallenet D, et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006
Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009
Tools for the syntactic & functional annotation
Syntactic annotation
Public tools: RepSeek (repeats), Oriloc (oriC/terC position), tRNAscan-SE (tRNA genes), Blast on Rfam (snRNA genes).
“Homemade” tools: findrRNA (rRNA genes), AMIMat (gene models according to codon usage), AMIGene (based on GeneMark), MICheck (re-annotation of public bacterial genomes).
Functional annotation
Public tools: BLAST (searches in specialized databases and UniProt), InterProScan (domains and functional sites), COGnitor (COG protein families), PRIAM (enzymatic functions), Pathway Tools (metabolic pathway reconstruction), SignalP & TMHMM & PSORT (protein localisation).
“Homemade” tools: Syntonizer (gene context analysis) and, at the end, AutoFAssign, an automatic functional annotation procedure: Blast on ‘reference genome annotations’ & syntenies > HAMAP results > TIGRfam/Pfam results & Blast on UniProt
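The AutoFAssign priority order on this slide can be read as a simple cascade: take the annotation from the highest-ranked evidence source that returned a hit. The sketch below only illustrates that reading. The source keys, the `assign_function` name and the fallback product label are hypothetical, and the real procedure may apply thresholds and rules not shown on the slide.

```python
from typing import Optional

# Evidence sources in decreasing priority, in the order given on the slide.
# The key names are hypothetical labels, not MicroScope identifiers.
PRIORITY = [
    "reference_genome_synteny",  # Blast on reference genome annotations & syntenies
    "hamap",                     # HAMAP results
    "tigrfam_pfam_uniprot",      # TIGRfam/Pfam results & Blast on UniProt
]


def assign_function(evidence: dict[str, Optional[str]]) -> tuple[str, str]:
    """Return (product, source) from the highest-priority source with a hit."""
    for source in PRIORITY:
        product = evidence.get(source)
        if product:  # skip missing sources and empty results
            return product, source
    # Assumed fallback when no source provides an annotation.
    return "protein of unknown function", "none"


if __name__ == "__main__":
    hits = {"hamap": "tryptophan synthase alpha chain",
            "tigrfam_pfam_uniprot": "putative lyase"}
    print(assign_function(hits))
```

Keeping the order in a single list makes the ranking explicit and easy to change without touching the selection logic.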
Classification of protein genes
Functional classifications from annotation tools:
- Gene Ontology (GO classification) <- InterProScan results
- COG classification <- COGnitor results
Functional classifications (Gene Editor):
- MultiFun (E. coli; M. Riley)
- TIGR main roles
Other kind of classification: inspired by the ‘protein name confidence’ defined in PseudoCAP = Pseudomonas aeruginosa community annotation project (
Results available to correct/complete annotation Annotations from reference genomes MicroScope curated annotations Synteny results on available complete bacterial genomes TrEMBL contains functional annotations which often come from automatic procedures only: ‘IPMed?’ is used for proteins that may have an experimentally validated function.
TrEMBL Blast similarities: example IPMed = Interesting PubMed?
The MicroScope platform : data management -1-
Data organisation and persistence: relational database PkGDB (Prokaryotic Genome DataBase)
- Public/primary data
- Data generated during the annotation process (analysis results and expert annotations)
One instance of PkGDB for all MicroScope projects
Collaborative annotation: annotator accounts and rights on sequences, annotation history
The MicroScope platform : data management -2-
Bacterial genome; enzymatic activities prediction (PRIAM); EC numbers correspondence; Pathway Tools (P. Karp, SRI, USA)
Experimentally elucidated metabolic pathways: 1600 pathways from 2000 organisms
A metabolic database is built for each annotated microbial genome: PGDB = Pathway/Genome Database (orgname_Cyc)
Today: 977 organisms, 20 GB
«Metabolic profiles» functionality (PkGDB)
Select organisms to compare
Select pathway classes
Number of reactions for pathway x in a given organism
Total number of reactions in pathway x
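The two counts named on this slide are presumably combined as a ratio: reactions of pathway x found in an organism, divided by the total number of reactions in pathway x. The sketch below shows that calculation under this assumption. The function name, the reaction identifiers and the treatment of empty pathways are illustrative only.

```python
def pathway_completion(pathway_reactions: set[str],
                       organism_reactions: set[str]) -> float:
    """Fraction of a pathway's reactions that are present in one organism."""
    total = len(pathway_reactions)
    if total == 0:
        return 0.0  # assumed convention for an empty pathway definition
    found = len(pathway_reactions & organism_reactions)
    return found / total


if __name__ == "__main__":
    # Hypothetical reaction identifiers, for illustration only.
    pathway_x = {"R1", "R2", "R3", "R4"}
    organism = {"R1", "R3", "R9"}
    print(pathway_completion(pathway_x, organism))
```

Computing this value for every selected organism and pathway class gives one row per organism, which is the shape of a profile table.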
Metabolic phyloprofile : example of results
Using the “Keywords Search” functionality
Available datasets to be explored?
- Automatically annotated genes + validated genes
- Only all/personal validated genes
- Only annotations from databank files or from our annotation pipeline
- Gene/Protein features: G+C%, MW, pI
- Specific fields of the gene editor: Comments/Note
- BlastP/Synteny results against:
  - the set of genomes of the MicroScope project
  - Escherichia coli (updated annotation) or Bacillus subtilis (SubtiList database) annotations
  - the set of E. coli, B. subtilis, or P. aeruginosa essential genes
- Genes involved in synteny groups and annotated as Protein of Unknown Function or Putative enzyme
- The set of similarities obtained with different sources:
  - HAMAP (High-quality Automated/Manual Annotation)
  - SwissProt or TrEMBL databank, limited or not to blast hits having a possible interesting PubMedID
  - PRIAM enzymatic profiles (Enzyme Commission)
  - COG databank
  - InterPro databank
- Genes encoding enzymes involved in KEGG and BioCyc metabolic pathways
- The results obtained with SignalP, TMHMM, PsortB and Coiled Coil
Query on P. putida annotation
Step 1: genes annotated as « unknown function » => 2093 results (35%)
Step 2: which ones have blast similarities (<> unknown functions) with UniProt entries linked to a PubMedID?
Results of the query... Result : 216 genes (123 in SP and 93 in TrEMBL) « Get gene » => 114 genes (can be re-annotated)
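The two-step query can be pictured as two successive filters over gene records. The sketch below runs on toy in-memory data. The `Gene` and `BlastHit` classes, their fields and the substring test for « unknown function » are assumptions made for illustration, and do not reflect the PkGDB schema or the actual query engine.

```python
from dataclasses import dataclass, field


@dataclass
class BlastHit:
    product: str       # annotation of the UniProt entry that was hit
    databank: str      # "SP" (SwissProt) or "TrEMBL"
    has_pubmed: bool   # True if the entry is linked to a PubMedID


@dataclass
class Gene:
    label: str
    product: str
    hits: list[BlastHit] = field(default_factory=list)


def is_unknown(product: str) -> bool:
    """Assumed test for an 'unknown function' annotation."""
    return "unknown function" in product.lower()


def reannotation_candidates(genes: list[Gene]) -> list[Gene]:
    # Step 1: genes annotated as « unknown function ».
    step1 = [g for g in genes if is_unknown(g.product)]
    # Step 2: keep those with at least one hit that has a known function
    # and is linked to a PubMedID.
    return [g for g in step1
            if any(h.has_pubmed and not is_unknown(h.product) for h in g.hits)]
```

Counting the retained hits per `databank` value would give a SP / TrEMBL breakdown like the one reported in the result.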
Syntactic re-annotation of P. putida PSEPK386 8 Quinohemoprotein amine dehydrogenase PP3461 PP3460 PP3459 PP3462 PP3463 PP3464 PP3465 PSEPK3872 PSEPK3873 PP3466
Bacterial synteny: parameters
Correspondence relationship = sequence similarity: BlastP Bidirectional Best Hit OR at least 30% identity on 80% of the shortest sequence
Co-localization: gap = 5
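The two synteny parameters can be written as small predicates. This is a minimal sketch, assuming that coverage is measured as alignment length over the length of the shorter protein, and that "gap = 5" means at most five intervening genes between two members of a synteny group. Both interpretations, and all function and parameter names, are assumptions and not the Syntonizer implementation.

```python
def is_correspondence(is_bbh: bool, identity_pct: float,
                      aligned_len: int, len_a: int, len_b: int) -> bool:
    """BlastP bidirectional best hit, OR >= 30% identity over >= 80%
    of the shortest of the two sequences."""
    shortest = min(len_a, len_b)
    # Assumed definition of coverage: aligned residues / shortest length.
    coverage = aligned_len / shortest if shortest > 0 else 0.0
    return is_bbh or (identity_pct >= 30.0 and coverage >= 0.80)


def is_colocalized(rank_a: int, rank_b: int, max_gap: int = 5) -> bool:
    """True if at most max_gap genes lie between two genes, where each
    rank is the gene's ordinal position along the same replicon."""
    intervening = abs(rank_a - rank_b) - 1
    return intervening <= max_gap
```

Under these assumptions, two genes in genome A would stay in the same synteny group when their correspondents in genome B also satisfy `is_colocalized`.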
How to read the synteny maps?
A putative ortholog to ACIAD2440 on the E. coli genome
A putative paralog to ACIAD2450 with two other co-localized ADP1 genes (in yellow)
Another putative paralog to ACIAD2450, elsewhere on the ADP1 chromosome
This P. putida « ortholog » (PP0114) is in synteny with two other genes (coloured in blue-purple).
These two P. putida genes (PP0220 and PP4425) are similar to ACIAD2450 (putative paralogs of PP0114?)
Figure labels: ACIAD2440, ACIAD2450
How are genes organized in a synteny group ? -2-
« Syntonome » results in the gene annotation editor PkGDB proteomes NCBI + WGS proteomes
Keywords Blast / Motifs Phylogenetic profiles Fusions / Fissions Genomic islands Metabolic profiles Exploration Synteny map MicroScope project Authentication CGView Artemis LinePlot Metabolic pathways Synton visualization Annotation editor EXPERT CURATION Help Export Options Genome Overview MicroScope web interfaces : MaGe
MicroScope tutorial
Annotation data in the ‘Gene Validation’ section of the editor
This automatic information does not need to be changed
This information must be completed or corrected by the annotator (with the help of the Analysis Results section)
This information is optional
New
Adding gene-protein-reaction association (MetaCyc reactions)
PP0082 = trpA gene
List of the predicted reactions linked to the gene
Click on EC to search for all MetaCyc reactions corresponding to the annotated EC number
Adding gene-protein-reaction association (MetaCyc reactions) PP0082 = trpA gene PP0083 = trpB gene Added for PP
David Vallenet Demo : please go to