Pathway Tools Meeting - December 1, 2005, Geneva (SIB) Putting together synteny and metabolic information to achieve relevant expert annotation of microbial.


1 Pathway Tools Meeting - December 1, 2005, Geneva (SIB)
Putting together synteny and metabolic information to achieve relevant expert annotation of microbial genomes
Dr Claudine Médigue & :

2 What is MaGe? Yet another bacterial annotation platform!
- Its development started in Oct. 2002. Context: the Acinetobacter sp. ADP1 genome annotation (Summer 2004).
- Shares functionalities with other existing annotation systems:
  - An automatic annotation process: syntactic and functional annotations; functional annotation and classification inferences.
  - A relational database (MySQL) used to store the sequences and the analysis results.
  - A web interface allowing multiple users to simultaneously annotate a genome.
  - Connectivity to other databases or systems.
- Developed by biologists involved in manual expert annotation.
- Graphical interface which focuses on gene context and synteny results with available bacterial proteomes.

3 Introduction to the Prokaryotic Genome DataBase (PkGDB)
- Relational DBMS (MySQL). Purpose: storage of ‘clean’ and complete annotation data, which are subsequently used in comparative genomic analysis.
- Data sources:
  - Complete bacterial genomes (RefSeq NCBI and Genome Reviews EBI)
  - New bacterial genomes (annotation projects)
- Integration in PkGDB: management of frameshifts, correction of obvious errors, syntactic re-annotation, addition of missing gene annotations.
- Annotation tool results:
  - Intrinsic: genes, signals, repeats, …
  - Extrinsic: BLAST, InterPro, COG, synteny, …
- NAR (WS), 2003; NAR (WS), 2005

4 Simplified structure of PkGDB
Diagram labels: Genomic Objects; Automatic and manual functional assignations; Published genomes; Newly sequenced genomes; Gene prediction (AMIGene); Re-annotation project; Annotation project; Annotation history; Sequence updates and annotation transfer; Functional Classification; Annotator management; Functional predictions; Orthologs & Paralogs; Syntenies; Protein similarities; Domains and motifs; Enzymatic functions; Helices and signal peptides; UniProt; KEGG; COG; InterPro; Reference annotation for model organisms; Specific regions; EcoGene; GenProtEC; SubtiList; Genome Reviews; NCBI RefSeq; Annotation management; MultiFun; GeneOntology; BioCyc; Project customization; Multiple correspondences; Local rearrangements (ins/del).
Boyer et al. Bioinformatics (Nov 2005)

5 How to read the synteny maps?
- Gene of interest: ACIAD0574 (hutH).
- Two ‘homologs’ to ACIAD0574 on the P. aeruginosa genome.
- This P. syringae gene (PSPTO0599/hutH-1) is a putative ‘ortholog’ to ACIAD0574 and is involved in a synteny group containing 17 genes (in green).
- These two P. syringae genes (PSPTO5274/hutH-2 and PSPTO5276/hutH-3) are similar to ACIAD0574 (putative paralogs of PSPTO0599).

6 A larger view of the previous Acinetobacter ADP1 region
- Genes labelled on the map: 0562 hisS; 0574 hutH; 0582-0583 fabG-fabF.
- 4 of 138 genomes in PkGDB.
- 9 of 284 complete microbial proteomes (RefSeq section).

7 How are genes organized in a synteny group?
- Synteny with Ralstonia solanacearum megaplasmid
- Synteny with Ralstonia solanacearum chromosome

8 Synteny maps are useful to annotate gene fusion/fission
Colored rectangles represent the part of the protein which aligns with the corresponding Acinetobacter protein.
Fusion of genes involved in DNA replication:
- dnaQ (DNA polIII, epsilon subunit + proofreading 3’-5’ exonuclease)
- rnhA (degradation of Okazaki fragments)
Corresponding genes in the compared genomes:
- YPO1082 (dnaQ), YPO1081 (rnhA)
- STM0264 (dnaQ), STM0263 (rnhA)
- NMB1514 (dnaQ), NMB1618 (rnhA)
- PA1816 (dnaQ), PA1815 (rnhA)
- PSPTO3711 (dnaQ), PSPTO3712 (rnhA)
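The fusion reading of a synteny map can be sketched in a few lines of Python. This is only an illustration, not MaGe's implementation: the `Hit` record, the `fusion_candidates` function, the thresholds and the coordinates are all hypothetical. The idea is that a fused query protein is suspected when two neighbouring genes of a compared genome align to distinct, barely overlapping parts of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment between the query protein and a gene of a compared genome."""
    target_gene: str   # gene label in the compared genome (hypothetical labels below)
    target_index: int  # rank of the gene along the compared replicon
    query_start: int   # first aligned residue on the query protein (1-based)
    query_end: int     # last aligned residue on the query protein (inclusive)


def overlap(a: Hit, b: Hit) -> int:
    """Number of query residues covered by both hits."""
    return max(0, min(a.query_end, b.query_end) - max(a.query_start, b.query_start) + 1)


def fusion_candidates(hits, max_gap=1, max_overlap=10):
    """Return pairs of neighbouring target genes aligning to distinct parts of the query.

    max_gap and max_overlap are arbitrary illustrative thresholds.
    """
    ordered = sorted(hits, key=lambda h: h.query_start)
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            neighbours = 0 < abs(left.target_index - right.target_index) <= max_gap
            if neighbours and overlap(left, right) <= max_overlap:
                pairs.append((left.target_gene, right.target_gene))
    return pairs


# Invented coordinates: geneA and geneB are adjacent and cover the two halves
# of the query, geneC is a distant gene matching the first half only.
hits = [
    Hit("geneA", 10, 1, 240),
    Hit("geneB", 11, 250, 400),
    Hit("geneC", 57, 5, 230),
]
print(fusion_candidates(hits))
```

Requiring adjacency keeps the sketch short, but it would miss a case such as NMB1514/NMB1618 on the slide, where the two genes are not neighbours; raising `max_gap` or dropping the adjacency test would cover it at the cost of more false positives.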

9 Simplified structure of PkGDB
Diagram labels: Genomic Objects; Automatic and manual functional assignations; Published genomes; Newly sequenced genomes; Gene prediction (AMIGene); Re-annotation project; Annotation project; Annotation history; Sequence updates and annotation transfer; Functional Classification; Annotator management; Functional predictions; Protein similarities; Domains and motifs; Enzymatic functions; Helices and signal peptides; UniProt; KEGG; COG; InterPro; Reference annotation for model organisms; EcoGene; GenProtEC; SubtiList; Genome Reviews; NCBI RefSeq; Annotation management; MultiFun; GeneOntology; BioCyc; Orthologs & Paralogs; Syntenies; Specific regions; Project customization.
- PRIAM (http://bioinfo.genopole-toulouse.prd.fr/priam/): position-specific scoring matrices ('profiles') built with SwissProt proteins
- KEGG (www.genome.jp/kegg/): dynamic requests
- BioCyc (http://www.biocyc.org/): local installation

10 Setting up a new annotation project: an example
- Newly sequenced genomes:
  - Bradyrhizobium sp. ORS278 (Genoscope) -> 1 chr (7.5 Mb)
  - Bradyrhizobium sp. BTAi (DOE/JGI) -> 1 chr (8.5 Mb)
- Genomes in public databanks: Mesorhizobium loti (00), Sinorhizobium meliloti (01), Bradyrhizobium japonicum (02), Rhodopseudomonas palustris (03)
- Available related sequences: Rhizobium leguminosarum (Sanger Center), Rhodobacter sphaeroides (DOE/JGI), Rhodospirillum rubrum (DOE/JGI)
- Complete pipeline of automatic annotations:
  - Re-annotation process (pseudogenes, missing genes)
  - Automatic syntactic annotations (in some cases, functional annotations)
  - Searching for synteny groups with complete proteomes available in the RefSeq section (NCBI, 284 to date) and in PkGDB (curated genomes, 138 to date)
- PkGDB projects: AcinetoScope, RhizoScope, YersiniaScope, ColiScope, CloacaScope, FrankiaScope
- Metabolic pathway reconstruction with Pathway Tools: BradyBTCyc, BradyORCyc, BrajapCyc (Ocelot object model); RhizoCyc (BioWarehouse relational model)

11 Comparative Metabolic Capabilities: an example
Reaction content comparisons between the 3 Bradyrhizobium organisms (BioWarehouse SQL query on reactions having gene -> protein -> reaction correspondences): Bradyrhizobium sp. ORS278, Bradyrhizobium sp. BTAi, Bradyrhizobium japonicum USDA 110.
[Venn diagram; reaction counts shown: 14, 43, 127, 873, 897, 830, 16, 76, 30, 724]
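A three-way comparison of this kind reduces to set operations once each organism's reactions are available as a set of identifiers. The sketch below assumes the reaction sets have already been extracted (for instance by a query like the one mentioned on the slide); the function name `venn3` and the toy identifiers are illustrative and do not reproduce the slide's data.

```python
def venn3(a: set, b: set, c: set) -> dict:
    """Split three sets into the seven regions of a Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "all_three": a & b & c,
    }


# Toy reaction identifiers, one set per organism (not real data).
ors278 = {"R1", "R2", "R3", "R4"}
btai = {"R1", "R2", "R5"}
usda110 = {"R1", "R3", "R5", "R6"}

regions = venn3(ors278, btai, usda110)
counts = {name: len(members) for name, members in regions.items()}
print(counts)
```

A useful sanity check on any such diagram is that the four regions belonging to one organism add up to that organism's total reaction count.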

12 Bradyrhizobium ORS278 region containing CDS 5771 & 5772 (BRAOR5771-5772-5773) !!! ???
“Cloning and Characterization of the Genes Encoding Enzymes for the Protocatechuate Meta-degradation Pathways of Pseudomonas ochraceae NGJ1” Maruyama et al. (2004) Biosci. Biotechnol. Biochem., 68, 1434-1441. PMID: 15277747

13 AUTOmatic vs EXPert annotation of the region
Each entry gives: product; EC number; gene; evidence.
- BRAOR5770
  - AUTO: 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase; 1.1.1.18; ligC; BLAST R. palus, PRIAM (medium)
  - EXP: 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase; 1.2.1.45; ligC; BLAST P. testosteroni, Publication + Enzyme
- BRAOR5771 (AUTO = EXP): Protocatechuate 4,5-dioxygenase, alpha subunit; 1.13.11.8; ligB; BLAST R. palus, PRIAM (high)
- BRAOR5772 (AUTO = EXP): Protocatechuate 4,5-dioxygenase, beta subunit; 1.13.11.8; ligA; BLAST R. palus, PRIAM (high)
- BRAOR5773
  - AUTO: 2-pyrone-4,6-dicarboxylic acid hydrolase; none; ligI; BLAST R. palus
  - EXP: 2-pyrone-4,6-dicarboxylic acid hydrolase; 3.1.1.57; ligI; BLAST R. palus, Publication + Enzyme
- BRAOR5774
  - AUTO: Putative dehydrogenase; none; none; BLAST R. palus
  - EXP: Putative dehydrogenase with NAD binding protein; 1.1.1.-; none; BLAST R. palus, InterProScan
- BRAOR5775
  - AUTO: Putative acyl transferase; none; fidZ; BLAST R. palus
  - EXP: 4-hydroxy-4-methyl-2-oxoglutarate aldolase; 4.1.3.17; ligK; BLAST P. ochraceae, Publication + Enzyme
- BRAOR5776
  - AUTO: 4-oxalomesaconate hydratase; none; ligJ; BLAST R. palus
  - EXP: 4-oxalomesaconate hydratase; 4.2.1.83; ligJ; BLAST R. palus, Publication + Enzyme

14 Bradyrhizobium ORS278 region after expert annotation
- BRAOR5770: ligC (1.2.1.45)
- BRAOR5771-72: ligBA (1.13.11.8)
- BRAOR5773: ligI (3.1.1.57)
- BRAOR5775: ligK (4.1.3.17)
- BRAOR5776: ligJ (4.2.1.83)
- BRAOR5777
- BRAOR5778

15 Connectivity to KEGG database Enzymes encoded by genes in the MaGe region Enzymes encoded by genes elsewhere in the Bradyrhizobium genome Additional enzymes in E. coli 4.2.1.83 ?

16 Connectivity to KEGG database Enzymes encoded by genes in the MaGe region Enzymes encoded by genes elsewhere in the Bradyrhizobium genome Additional enzymes in E. coli

17 Bradyrhizobium ORS278 region after expert annotation (genes 5770, 5771, 5772, 5773, 5775, 5776)
- BRAOR5770_ligC: 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase (1.2.1.45)
- BRAOR5776_ligJ: 4-oxalomesaconate hydratase (4.2.1.83)
- The reactions catalyzed by 1.2.1.45 and 4.2.1.83 exist in MetaCyc but they are not involved in a pathway.
- BRAOR5777: probable protocatechuate transporter
- BRAOR5778 (ligR): probable transcriptional regulator of protocatechuate degradation

18 Enzymatic activity predictions (PRIAM): some results
Comparison of PRIAM predictions [P] and Expert annotations [E]

                             Acinetobacter   Pseudoalteromonas   Frankia        Pseudomonas
                             ADP1            haloplanktis        alni           entomophila
Total genes                  3325            3514                6861           5182
Nb EC_[P] vs EC_[E]          1012 / 947      927 / 993           1729 / 1498    1455 / 1232
EC_[P] = EC_[E]              632 (62.5%)     697 (75.2%)         912 (52.8%)    820 (56.3%)
EC_[P] (3 digit) = EC_[E]    47 (4.6%)       23 (2.5%)           68 (3.9%)      46 (3.2%)
EC_[P] <> EC_[E]             131 (12.9%)     102 (11.0%)         401 (23.2%)    285 (19.6%)
EC_[P] & (NO EC_[E])         202 (20.0%)     105 (11.3%)         348 (20.1%)    304 (20.9%)
EC_[E] & (NO EC_[P])         111 (11.7%)     152 (15.3%)         111 (7.4%)     90 (7.3%)

Limitations of PRIAM sequence-based enzyme prediction:
- Availability of at least one UniProt/SwissProt sequence in the Enzyme entry!
- Existence of closely related enzymes with different substrate specificity
- Relaxed substrate specificity exhibited by some enzymes
- Several wrong predictions in case of Medium/Low PRIAM confidence
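The comparison categories used on this slide can be expressed as a small classification function. This is a sketch under assumptions rather than the procedure used for the slide: the function name and category labels are invented, and the first three example pairs reuse EC numbers from slide 13, treating the AUTO EC number as the PRIAM prediction. The last pair is hypothetical and only there to exercise the 3-digit case.

```python
from collections import Counter


def compare_ec(predicted, expert):
    """Classify one gene by comparing a predicted EC number with an expert one.

    Either argument may be None when no EC number is assigned.
    """
    if predicted and expert:
        if predicted == expert:
            return "identical"
        # Same class, subclass and sub-subclass: only the last field differs.
        if predicted.split(".")[:3] == expert.split(".")[:3]:
            return "same_3_digits"
        return "different"
    if predicted:
        return "predicted_only"
    if expert:
        return "expert_only"
    return "no_ec"


genes = {
    "BRAOR5770": ("1.1.1.18", "1.2.1.45"),
    "BRAOR5771": ("1.13.11.8", "1.13.11.8"),
    "BRAOR5773": (None, "3.1.1.57"),
    "hypothetical_gene": ("2.7.1.1", "2.7.1.2"),  # invented pair
}

tally = Counter(compare_ec(p, e) for p, e in genes.values())
print(dict(tally))
```

The sketch assumes at most one EC number per gene; genes carrying several EC numbers would need a per-EC rather than per-gene comparison.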

19 PGDBs built at Genoscope
- To date: about 60 Tier 3 PGDBs
  - 25 PGDBs for newly sequenced and annotated bacterial genomes
  - 18 PGDBs: other published bacterial genomes
  - 16 PGDBs common to SRI/EBI PGDBs Tier 3* (and 4 with Tier 2*)
- The number of enzymes and pathways is slightly greater in our PGDBs (source of annotations + process of PathoLogic file format generation)
- Important discrepancies with Sinorhizobium meliloti (44 predicted pathways in the SRI/EBI PGDB vs 259 in the Genoscope PGDB)
- Automatic updates of PathoLogic predictions: every week
- NO curation to date (Tier 3* databases), except for Acinetobacter ADP1 -> Metabolic Thesaurus project
- Our PGDBs are currently available in the MaGe interface. HomePage: http://www.genoscope.cns.fr/agc/mage/
- MaGe training courses include a quick overview of how to explore PathoLogic results to perform relevant expert annotation
«Expansion of the BioCyc collection of pathway/genome databases to 160 genomes» Karp et al. Nucleic Acids Research, 2005, 33: 6083-6089.
*Tier 3: Computationally-Derived Databases Subject to No Curation
*Tier 2: Computationally-Derived Databases Subject to Moderate Curation

20 Some Questions / Perspectives
- Better correspondences between BioCyc and MaGe: optional fields in the PathoLogic file format (PubMedID, Funcat, …)
- How to tackle the pseudogene information? Pathway X doesn’t exist because:
  - No enzyme has been found
  - Some enzymes correspond to pseudogenes
- Curation of PGDB?
  - Remove false-positive pathways (Tier 3 -> Tier 2):
    - Automatic reduction of false-positive pathway predictions stored in the PGDBs
    - Integration and evaluation of Pathway Hole Filler
    - Finding a way to get a list of false-positive pathways at the end of the manual process of annotation
  - Tier 2 -> Tier 1*, especially creation of new metabolic pathways: PGDBs freely available for «adoption» by biologists !!! Not an easy task !!! (a strong knowledge of metabolism is required)
*Tier 1: Intensively Curated Databases
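One possible way to bring pseudogene information into the review of predicted pathways is sketched below. It is a hypothetical heuristic, not a feature of PathoLogic or MaGe: the data layout (pathway -> reaction -> gene identifiers), the function name and the status labels are all assumptions made for the illustration.

```python
def flag_pathways(pathways, pseudogenes):
    """Label each predicted pathway by how many reactions keep a functional gene.

    pathways: dict mapping pathway name -> dict of reaction -> list of gene ids
              (an empty list means no enzyme was found for that reaction).
    pseudogenes: set of gene ids annotated as pseudogenes.
    """
    report = {}
    for name, reactions in pathways.items():
        # A reaction counts as supported if at least one of its genes is not a pseudogene.
        supported = sum(
            1 for genes in reactions.values()
            if any(gene not in pseudogenes for gene in genes)
        )
        if supported == 0:
            report[name] = "no functional enzyme"
        elif supported < len(reactions):
            report[name] = "partially supported"
        else:
            report[name] = "fully supported"
    return report


# Invented example data.
pathways = {
    "pathway_X": {"rxn1": ["g1"], "rxn2": []},
    "pathway_Y": {"rxn3": ["g2"], "rxn4": ["g3", "g4"]},
    "pathway_W": {"rxn5": ["g2"], "rxn6": ["g1"]},
}
pseudogenes = {"g1", "g3"}

report = flag_pathways(pathways, pseudogenes)
print(report)
```

Pathways labelled "no functional enzyme" would be the first candidates to examine as false positives; the final decision would still rest with the annotator.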

21 Metabolic Thesaurus project at Genoscope
- Annotation: 3325 Acinetobacter ADP1 annotated genes
- Knock-out collection: 2240 ADP1 genes knocked out
- Metabolism prediction (Vincent Schächter’s bioinformatics team): network reconstruction, model, flux models
- Biological evidence (Véronique de Berardinis’s team): accurate phenotyping, systematic phenotyping, transcriptome analyses, biochemical studies, functional complementation

22 Metabolic Pathway Reconstruction / Experimental Data
- Metabolic Thesaurus: Acinetobacter ADP1 KO collection
- ColiScope: sequencing of 2 commensal and 4 pathogenic E. coli strains
- Phenotypic analysis (growth assay on different nutrient sources) + metabolome analysis (LC/MS and CE/MS)
- Data integration and comparative analysis:
  - Link enzymatic activity to genes of unknown function
  - Evolution of metabolic capabilities => adaptation of microorganisms, commensalism / virulence emergence

23 Participating teams
- AGC team: Claudine Médigue, David Vallenet, Stéphane Cruveiller, Zoé Rouy, Aurélie Lajus
- Genoscope informatic system team: Laurent Sainte-Marthe, Claude Scarpelli, Sylvain Bonneval
- … and with the help of: François Lefèvre (V. Schächter team)
- MaGe users’ feedback helps in improving many functionalities of our system!

