First Microme Jamboree – June, Monday 27 and Tuesday 28 LABGeM team Laboratory of Bioinformatic Analysis in Genomic and Metabolism CEA/DSV/IG/Genoscope.

Presentation transcript:

MicroScope functionalities to support pathways curation
First Microme Jamboree – June, Monday 27 and Tuesday 28
LABGeM team, Laboratory of Bioinformatic Analysis in Genomic and Metabolism, CEA/DSV/IG/Genoscope & CNRS UMR8030

The MicroScope platform
October 2002: beginning of the Acinetobacter baylyi ADP1 genome annotation
Computational platform for the annotation and comparative analysis of bacterial genomes:
- equipment (servers, disk storage, backups)
- software and data
- human resources (development, training, support)
=> It offers the community of microbiologists high-technology resources for the automatic and expert analysis of genomic data.
Labelled in 2006 (RIO) and in 2009

Usage of the platform
- 859 personal accounts: 493 in France, 175 in Europe, 81 in the USA, and the rest in other countries
- About 980 bacterial genomes: 345 genomes annotated in the system (mostly sequenced at Genoscope and in the USA...) and 635 from public databanks
- Since 2004, 33 ‘genome’ papers (4 announcements)
- Specific genomic analyses: 22 other publications
- Expert annotations: 5000 expert annotations a month (2010)

Three MicroScope components
- Data Management: PkGDB (Primary Databanks, Internal Genomic Objects, Computational results), Pathway Genome DataBases (MicroCyc)
- Process Management: JBPM Workflows, JBPM Database (Primary Databank Update, Functional / relational Analyses, DB Release)
- Visualization: MaGe Web Interface
MaGe Web Interface labels: Login, Genome browser and Synteny maps, Tutorial, Artemis, Data Export, CGView, LinePlot, Genome overview, Keyword search, Blast and Pattern, Phylogenetic Profile, Fusion / Fission, Tandem duplications, Minimal Gene Set, RGPfinder, SNPs / InDels, KEGG, MicroCyc, Metabolic Profile, Pathway / Synteny, Synton display, Gene editor, Job History, Syntactic Annotations, Gene cart
> 25 methods => full automation: genome annotation, primary data kept up to date. Integrated in a workflow management system.
References:
Vallenet D, et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006
Vallenet D, et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009

Tools for the syntactic & functional annotation
Syntactic annotation
- Public tools: RepSeek (repeats), Oriloc (oriC/terC position), tRNAscan-SE (tRNA genes), Blast on Rfam (snRNA genes).
- “Homemade” tools: findrRNA (rRNA genes), AMIMat (gene models according to codon usage), AMIGene (based on GeneMark), MICheck (re-annotation of public bacterial genomes).
Functional annotation
- Public tools: BLAST (searches in specialized databases and UniProt), InterProScan (domains and functional sites), COGnitor (COG protein families), PRIAM (enzymatic functions), Pathway Tools (metabolic pathway reconstruction), SignalP & TMHMM & PSORT (protein localisation).
- “Homemade” tools: Syntonizer (gene context analysis) and, at the end, AutoFAssign, an automatic functional annotation procedure: Blast on ‘reference genome annotations’ & syntenies > HAMAP results > TIGRfam/Pfam results & Blast on UniProt
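The priority order given for AutoFAssign (reference genome annotations and syntenies, then HAMAP, then TIGRfam/Pfam and UniProt Blast) can be read as a simple cascade. The sketch below is only an illustration of that reading, not the actual AutoFAssign code: the tier names, the `evidence` dictionary and the `auto_assign` function are hypothetical.

```python
# Hypothetical evidence tiers, ordered as on the slide (highest priority first).
PRIORITY = (
    "reference_genome_synteny",  # Blast on reference genome annotations & syntenies
    "hamap",                     # HAMAP results
    "families_or_uniprot",       # TIGRfam/Pfam results & Blast on UniProt
)


def auto_assign(evidence):
    """Return (product, source) from the highest-priority tier that has a result.

    `evidence` maps a tier name to a proposed product name (or None / "").
    Returns (None, None) when no tier proposes anything.
    """
    for source in PRIORITY:
        product = evidence.get(source)
        if product:
            return product, source
    return None, None


# Toy example: no reference/synteny evidence, so the HAMAP proposal wins.
example = {
    "hamap": "tryptophan synthase alpha chain",
    "families_or_uniprot": "putative lyase",
}
assigned = auto_assign(example)
```

With this ordering, a lower tier is consulted only when every higher tier is empty, which is one plausible interpretation of the ">" signs on the slide.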

Classification of protein genes
Functional classifications from annotation tools:
- Gene Ontology (GO classification) <- InterProScan results
- COG classification <- COGnitor results
Functional classifications (Gene Editor):
- MultiFun (E. coli; M. Riley)
- TIGR main roles
Other kind of classification: inspired by the ‘protein name confidence’ defined in PseudoCAP (Pseudomonas aeruginosa community annotation project)

Results available to correct/complete annotation
- Annotations from reference genomes
- MicroScope curated annotations
- Synteny results on available complete bacterial genomes
- TrEMBL contains functional annotations which often come from automatic procedures only: ‘IPMed?’ is used for proteins that may have an experimentally validated function.

TrEMBL Blast similarities: example
IPMed = Interesting PubMed?

The MicroScope platform: data management -1-
Data organisation and persistence: relational database PkGDB (Prokaryotic Genome DataBase)
One instance of PkGDB for all MicroScope projects
- Public/primary data
- Data generated during the annotation process (analysis results and expert annotations)
- Collaborative annotation
- Annotator accounts and rights on sequences
- Annotation history

The MicroScope platform: data management -2-
Pathway Tools (P. Karp, SRI, USA): a metabolic database is built for each annotated microbial genome. PGDB = Pathway/Genome Database (orgname_Cyc)
Diagram labels: Bacterial Genome, Enzymatic activities prediction (PRIAM), EC numbers correspondence
Experimentally elucidated metabolic pathways: 1600 pathways from 2000 organisms
Today: 977 organisms, 20 GB

«Metabolic profiles» functionality
- Select organisms to compare
- Select pathway classes
- Number of reactions for pathway x in a given organism
- Total number of reactions in pathway x
PkGDB
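The two counts shown for each pathway (reactions found in a given organism, and total reactions in the pathway) are presumably compared as a completion ratio. The following sketch assumes that interpretation; the function names, the dictionary layout and the reaction, pathway and organism identifiers are all hypothetical and do not come from MicroScope.

```python
def pathway_completion(pathway_reactions, organism_reactions):
    """Fraction of a pathway's reactions that are predicted in one organism."""
    total = len(pathway_reactions)
    if total == 0:
        return 0.0
    found = len(pathway_reactions & organism_reactions)
    return found / total


def metabolic_profile(pathways, organisms):
    """Build an organism x pathway table of completion ratios.

    pathways:  {pathway_name: set of reaction ids}
    organisms: {organism_name: set of reaction ids predicted in that organism}
    """
    return {
        org: {
            name: pathway_completion(reactions, org_reactions)
            for name, reactions in pathways.items()
        }
        for org, org_reactions in organisms.items()
    }


# Toy data with made-up identifiers.
pathways = {"pathway_x": {"R1", "R2", "R3", "R4"}, "pathway_y": {"R5", "R6"}}
organisms = {
    "organism_1": {"R1", "R2", "R3", "R5", "R6"},
    "organism_2": {"R1"},
}
profile = metabolic_profile(pathways, organisms)
```

Here `profile["organism_1"]["pathway_x"]` is 3 of 4 reactions, and a pathway with no reaction in an organism gets 0.0, which makes organisms directly comparable column by column.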

Metabolic phyloprofile : example of results

Using the “Keywords Search” functionality

Available datasets to be explored
- Automatically annotated genes + validated genes
- Only all/personal validated genes
- Only annotations from databank files or from our annotation pipeline
- Gene/protein features: G+C%, MW, pI
- Specific fields of the gene editor: Comments/Note
- BlastP/Synteny results against:
  - The set of genomes of the MicroScope project
  - Escherichia coli (updated annotation) or Bacillus subtilis (SubtiList database) annotations
  - The set of E. coli, B. subtilis, or P. aeruginosa essential genes
- Genes involved in synteny groups and annotated as Protein of Unknown Function or Putative enzyme
- The set of similarities obtained with different sources:
  - HAMAP (High-quality Automated/Manual Annotation)
  - SwissProt or TrEMBL databank, limited or not to blast hits having a possibly interesting PubMedID
  - PRIAM enzymatic profiles (Enzyme Commission)
  - COG databank
  - InterPro databank
- Genes encoding enzymes involved in KEGG and BioCyc metabolic pathways
- The results obtained with SignalP, TMHMM, PsortB and Coiled Coil

Query on P. putida annotation
Step 1: genes annotated as « unknown function » => 2093 results (35%)
Step 2: which ones have blast similarities (<> unknown functions) with UniProt entries linked to a PubMedID?

Results of the query...
Result: 216 genes (123 in SP and 93 in TrEMBL)
« Get gene » => 114 genes (can be re-annotated)
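The two-step query can be pictured as two successive filters over gene records. This is a minimal sketch under assumed data structures: the record fields (`product`, `hits`, `description`, `pubmed_ids`), the gene labels and the placeholder PubMed identifiers are hypothetical, and the real query is run through the MaGe keyword search interface rather than through code like this.

```python
def is_unknown(text):
    """True when an annotation text only says the function is unknown."""
    return "unknown function" in text.lower()


def step1_unknown_genes(genes):
    """Step 1: keep genes annotated as 'unknown function'."""
    return [g for g in genes if is_unknown(g["product"])]


def step2_with_literature(genes):
    """Step 2: keep genes with at least one blast hit that has a known
    function (<> unknown function) and is linked to a PubMed identifier."""
    return [
        g for g in genes
        if any(h["pubmed_ids"] and not is_unknown(h["description"])
               for h in g["hits"])
    ]


# Toy records; labels and PubMed ids are placeholders, not real data.
genes = [
    {"label": "gene_a", "product": "protein of unknown function",
     "hits": [{"description": "amine dehydrogenase subunit", "pubmed_ids": ["PMID_1"]}]},
    {"label": "gene_b", "product": "conserved protein of unknown function",
     "hits": [{"description": "protein of unknown function", "pubmed_ids": ["PMID_2"]}]},
    {"label": "gene_c", "product": "tryptophan synthase", "hits": []},
    {"label": "gene_d", "product": "protein of unknown function",
     "hits": [{"description": "putative transporter", "pubmed_ids": []}]},
]
unknown = step1_unknown_genes(genes)
candidates = step2_with_literature(unknown)
```

Only `gene_a` survives both steps in this toy set: it is annotated as unknown but has a hit with a described function and a literature link, which is the profile of a gene worth re-annotating.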

Syntactic re-annotation of P. putida
Quinohemoprotein amine dehydrogenase
Gene labels: PSEPK386 8, PP3461, PP3460, PP3459, PP3462, PP3463, PP3464, PP3465, PSEPK3872, PSEPK3873, PP3466

Bacterial synteny: parameters
Correspondence relationship = sequence similarity: BlastP Bidirectional Best Hit OR at least 30% identity on 80% of the shortest sequence
Co-localization: Gap = 5
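The two parameters can be sketched as a correspondence test followed by a grouping step. This is not the Syntonizer implementation: the `Hit` fields, the function names, the reading of "Gap = 5" as "at most five intervening genes on both genomes", and the rule that a group needs at least two gene pairs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    identity: float    # percent identity of the BlastP alignment
    align_len: int     # aligned length in residues
    query_len: int     # length of the query protein
    subject_len: int   # length of the subject protein
    is_bbh: bool       # True if the pair is a bidirectional best hit


def is_correspondence(hit, min_identity=30.0, min_coverage=0.80):
    """BBH, OR at least 30% identity over 80% of the shortest sequence."""
    shortest = min(hit.query_len, hit.subject_len)
    coverage = hit.align_len / shortest
    return hit.is_bbh or (hit.identity >= min_identity and coverage >= min_coverage)


def group_syntons(pairs, max_gap=5):
    """Group corresponding gene pairs into co-localized sets.

    pairs: (position in genome A, position in genome B) for each correspondence.
    Two pairs are linked when, on both genomes, the genes are separated by
    at most `max_gap` intervening genes (assumed meaning of "Gap = 5").
    """
    def close(p, q):
        return all(abs(a - b) <= max_gap + 1 for a, b in zip(p, q))

    groups = []
    unvisited = set(pairs)
    while unvisited:
        stack = [unvisited.pop()]
        group = set(stack)
        while stack:
            current = stack.pop()
            near = {q for q in unvisited if close(current, q)}
            unvisited -= near
            group |= near
            stack.extend(near)
        if len(group) >= 2:  # assumption: an isolated pair is not a synteny group
            groups.append(sorted(group))
    return sorted(groups)


syntons = group_syntons([(1, 10), (2, 12), (8, 13), (30, 40)])
```

Because linkage is transitive, genes do not need to be in the same order on both genomes to end up in one group; they only need to stay within the gap of some other member, which fits the idea of tolerating local rearrangements.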

How to read the synteny maps?
Diagram labels: ACIAD2440, ACIAD2450
- A putative ortholog to ACIAD2440 on the E. coli genome
- A putative paralog to ACIAD2450 with two other co-localized ADP1 genes (in yellow)
- Another putative paralog to ACIAD2450, elsewhere on the ADP1 chromosome
- This P. putida « ortholog » (PP0114) is in synteny with two other genes (coloured in blue-purple).
- These two P. putida genes (PP0220 and PP4425) are similar to ACIAD2450 (putative paralogs of PP0114?)

How are genes organized in a synteny group ? -2-

« Syntonome » results in the gene annotation editor
PkGDB proteomes
NCBI + WGS proteomes

MicroScope web interfaces: MaGe
Menu labels: MicroScope project, Authentication, Genome Overview, Exploration, KeyWords, Blast / Motifs, Phylogenetic profiles, Fusions / Fissions, Genomic islands, Metabolic profiles, Metabolic pathways, Synteny map, Synton visualization, CGView, Artemis, LinePlot, Annotation editor (EXPERT CURATION), Export, Options, Help

MicroScope tutorial

Annotation data in the ‘Gene Validation’ section of the editor
With the help of the Analysis Results section:
- This automatic information does not need to be changed
- This information must be completed or corrected by the annotator
- This information is optional

New

Adding gene-protein-reaction association (MetaCyc reactions)
PP0082 = trpA gene
List of the predicted reactions linked to the gene
Click on EC to search for all MetaCyc reactions corresponding to the annotated EC number

Adding gene-protein-reaction association (metacyc reactions) PP0082 = trpA gene PP0083 = trpB gene Added for PP

David Vallenet Demo : please go to