陳虹瑋, National Yang-Ming University, Bioinformatics Program, Genome Engineering Lab
Genome Engineering Lab: The Newest

What do they evaluate?
The authors seek to determine the accuracy of different computational methods for predicting metabolic pathways, through three comparisons:
- Different reference PGDBs with the same early prediction algorithm: HpyCyc-1 vs. HpyCyc-2A.
- Different prediction algorithms with MetaCyc as the reference PGDB: HpyCyc-2A vs. HpyCyc-2B.
- The HpyCyc-2B prediction vs. a manual prediction for H. pylori, to identify false positives.

What is a PGDB (Pathway/Genome Database)?
A PGDB is a database that describes the genome of an organism (its chromosome(s), genes, and genome sequence), the product of each gene, the biochemical reaction(s) catalyzed by each gene product, the substrates of each reaction, and the organization of reactions into pathways. A PGDB is not only a database: it is used together with software (Pathway Tools) for querying and analysis.
- MetaCyc: a PGDB containing metabolic data for more than 150 organisms, including EcoCyc.
- EcoCyc: a PGDB for the organism E. coli. Most of the information in EcoCyc is derived from the biomedical literature.
PGDBs are stored in a frame-based knowledge system, and data can be input from flat files.
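The object chain a PGDB records (gene, gene product, reaction, pathway) can be pictured as linked records. The sketch below is a hypothetical illustration of that chain only; the class and field names and the reaction id "RXN-PGI" are made up and are not the Pathway Tools schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative model of the PGDB object chain:
# gene -> gene product -> catalyzed reaction(s) -> pathway(s).
# Class and field names are hypothetical, not the Pathway Tools schema.


@dataclass
class Reaction:
    rxn_id: str
    substrates: list[str]
    products: list[str]
    ec_number: str | None = None


@dataclass
class GeneProduct:
    name: str
    catalyzes: list[Reaction] = field(default_factory=list)


@dataclass
class Gene:
    gene_id: str
    product: GeneProduct | None = None


@dataclass
class Pathway:
    name: str
    reactions: list[Reaction] = field(default_factory=list)


def pathways_for_gene(gene: Gene, pathways: list[Pathway]) -> list[str]:
    """Follow gene -> product -> reactions -> pathways that use them."""
    if gene.product is None:
        return []
    rxn_ids = {r.rxn_id for r in gene.product.catalyzes}
    return [p.name for p in pathways if any(r.rxn_id in rxn_ids for r in p.reactions)]


# Toy data; the reaction id "RXN-PGI" is made up for this example.
pgi_rxn = Reaction("RXN-PGI", ["glucose 6-phosphate"], ["fructose 6-phosphate"], "5.3.1.9")
pgi = Gene("pgi", GeneProduct("glucose-6-phosphate isomerase", [pgi_rxn]))
glycolysis = Pathway("glycolysis", [pgi_rxn])
print(pathways_for_gene(pgi, [glycolysis]))  # ['glycolysis']
```

Walking from a gene to its pathways is just following these links, which is the kind of query the Navigator answers.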

Pathway Tools Software
Pathway Tools is the software used to construct, update, visualize, query, and analyze PGDBs. Its three components are:
- The Pathway/Genome Navigator supports querying, visualization, and analysis of PGDBs.
- The Pathway/Genome Editors support interactive updating and refinement of PGDBs.
- The PathoLogic pathway-prediction program supports automated creation of a PGDB and prediction of the metabolic pathway complement of an organism.

What the PathoLogic program does
PathoLogic predicts the metabolic pathways of an organism from its annotated genome and produces a new PGDB. Its first input is the annotated genome in GenBank format. Its second input is a reference pathway database such as EcoCyc or MetaCyc.
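The per-gene inputs PathoLogic needs from the GenBank file are the product name(s) and EC number of each coding region. As a rough sketch, these can be pulled from the standard `/product` and `/EC_number` qualifiers of CDS features; the toy feature table below (coordinates and the "orfX" entry) is made up, and the parser is a simplified stand-in, not PathoLogic's own reader.

```python
import re

# Toy GenBank feature table; coordinates and the "orfX" entry are made up.
TOY_FEATURES = """\
     CDS             1..900
                     /gene="pgi"
                     /EC_number="5.3.1.9"
                     /product="putative glucose-6-phosphate isomerase"
     CDS             1000..1300
                     /gene="orfX"
                     /product="hypothetical protein"
"""

FEATURE_KEY = re.compile(r"^ {5}(\S+)\s")  # feature key starts in column 6
QUALIFIER = re.compile(r'^\s+/(product|EC_number)="([^"]*)"')


def enzyme_inputs(feature_text):
    """Return (product_names, ec_numbers) for each CDS feature."""
    records = []
    current = None
    for line in feature_text.splitlines():
        key = FEATURE_KEY.match(line)
        if key:
            # A new feature starts; only CDS features are collected.
            current = {"names": [], "ecs": []} if key.group(1) == "CDS" else None
            if current is not None:
                records.append(current)
            continue
        match = QUALIFIER.match(line) if current is not None else None
        if match:
            qualifier, value = match.groups()
            current["names" if qualifier == "product" else "ecs"].append(value)
    return [(r["names"], r["ecs"]) for r in records]


for names, ecs in enzyme_inputs(TOY_FEATURES):
    print(names, ecs)
```

This stdlib version ignores qualifier values that wrap across lines; for real files a full parser such as Biopython's `Bio.SeqIO` (format `"genbank"`) would be the safer choice.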

The PathoLogic Algorithm: linking enzymes to reactions
The first step matches the enzyme names and EC numbers listed in the annotated genome to reactions in the reference database. The matching relies on the functions assigned to individual genes by the genome center that annotated the genome.

The PathoLogic Algorithm (cont.)
The function LinkEnzymesToReactions(name, ecnum) accepts as input one or more alternative gene-product names for a single enzymatic activity, together with an EC number, both taken from a single GenBank coding region. It returns up to two reactions that correspond to that enzyme activity. The algorithm is as follows:

The PathoLogic Algorithm Flow
Input: (name, EC number).
1. Convert the name to canonical form E.
2. Match by EC number: look the EC number up in MetaCyc. If a reaction is found, store it in R1.
3. Build a hash table H from MetaCyc that maps enzyme names to reactions, with both converted to canonical form.
4. Look E up in H. If it matches, store the reaction in R2; if not, compute variant forms of E and look those up.

The PathoLogic Algorithm Flow (cont.)
5. If both R1 and R2 were found and they differ, report to the user that the enzyme name and EC number are inconsistent.
6. If R1 was found, create a link within the DB between the current enzyme and R1. If R2 was found, create a link within the DB between the current enzyme and R2.
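The flow above can be sketched as a single function. This is an illustration under stated assumptions, not the published PathoLogic code: the canonicalization rule (lowercase, collapse whitespace), the two-word variant generator, and the reference data (`EC_INDEX`, `NAME_INDEX`, the ids "RXN-PGI" and "RXN-PFK") are all hypothetical stand-ins for MetaCyc.

```python
import re


def canon(text):
    """Canonical form (assumed): lowercase with whitespace collapsed."""
    return re.sub(r"\s+", " ", text.strip().lower())


def build_name_index(reference_names):
    """Hash table H: canonical enzyme name -> reaction id."""
    return {canon(name): rxn for name, rxn in reference_names.items()}


def variant_forms(e):
    """Minimal stand-in: drop hedge words such as 'putative'."""
    words = [w for w in e.split() if w not in {"putative", "probable"}]
    return [" ".join(words)]


def link_enzymes_to_reactions(names, ecnum, ec_index, name_index):
    """Return (reactions to link, warning or None) for one coding region."""
    r1 = ec_index.get(ecnum) if ecnum else None  # match by EC number
    r2 = None
    for name in names:  # alternative names for one enzymatic activity
        e = canon(name)
        candidates = [e] + variant_forms(e)  # exact form first, then variants
        r2 = next((name_index[c] for c in candidates if c in name_index), None)
        if r2:
            break
    warning = None
    if r1 and r2 and r1 != r2:
        warning = f"name and EC {ecnum} disagree: {r2} vs {r1}"
    links = [r for r in dict.fromkeys([r1, r2]) if r]  # up to two reactions
    return links, warning


# Hypothetical reference data standing in for MetaCyc.
EC_INDEX = {"5.3.1.9": "RXN-PGI", "2.7.1.11": "RXN-PFK"}
NAME_INDEX = build_name_index({"glucose-6-phosphate isomerase": "RXN-PGI"})

print(link_enzymes_to_reactions(["Putative glucose-6-phosphate isomerase"],
                                "5.3.1.9", EC_INDEX, NAME_INDEX))
# (['RXN-PGI'], None)
```

When the name and EC number agree, the duplicate is collapsed so only one link is created; when they disagree, both reactions are returned along with a warning, matching the "report to user" branch.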

Why compute variant forms?
In computing variant forms of E, the program attempts to remove extraneous text that is frequently found in GenBank-format files, such as:
- prefix and suffix words added to the enzyme name, like "putative", "probable", "alpha chain", "large subunit", etc.;
- parenthesized gene names that follow the product name in some GenBank entries.
Even so, the authors found that 10–20% of the enzymes in a given genome are not identified because their names are not found in H, and these must be matched manually.
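A minimal sketch of variant-form generation, using only the example prefixes, suffixes, and parenthesized gene names listed above. The word lists are assumptions drawn from those examples; the slides do not give PathoLogic's full lists or the order in which it strips them.

```python
import re

# Word lists are illustrative assumptions drawn from the examples above;
# the real PathoLogic lists are not given in these slides.
PREFIXES = ("putative ", "probable ")
SUFFIXES = (" alpha chain", " large subunit")
PAREN_GENE = re.compile(r"\s*\([^)]*\)\s*$")


def compute_variant_forms(name):
    """Return progressively stripped variants of an enzyme name."""
    variants = []
    current = name.strip().lower()
    # 1. Drop a trailing parenthesized gene name, e.g. "... (gltB)".
    stripped = PAREN_GENE.sub("", current)
    if stripped != current:
        variants.append(stripped)
        current = stripped
    # 2. Drop hedge prefixes such as "putative".
    for prefix in PREFIXES:
        if current.startswith(prefix):
            current = current[len(prefix):]
            variants.append(current)
    # 3. Drop subunit suffixes such as "large subunit".
    for suffix in SUFFIXES:
        if current.endswith(suffix):
            current = current[: -len(suffix)]
            variants.append(current)
    return variants


print(compute_variant_forms("Probable glutamate synthase large subunit (gltB)"))
```

Each variant would then be looked up in the hash table H in turn; names that survive every variant without a hit fall into the 10–20% left for manual matching.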

Inferring pathways
Once the matching process is complete, the program has inferred a set of reactions expected to occur in the target organism. The remaining task is to determine which of the pathways containing those reactions are likely to be present in the organism.

The evidence for inferring a pathway
If there is evidence for some, but not all, of the reactions in a pathway, there are three possible interpretations:
- The pathway is not present in the target organism.
- A variant form of the pathway is present in the target organism that uses some but not all of the steps of the reference pathway.
- The pathway is present in the target organism, but the genes for the missing reaction steps either have not been found by the name matcher or have not yet been identified in the genome.
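The ambiguous case described above (evidence for some but not all steps) can be isolated mechanically before any interpretation is applied. The sketch below only splits each reference pathway's steps into supported and missing; the pathway names and reaction ids are made up, and choosing among the three interpretations is left to later heuristics or a curator.

```python
def pathway_evidence(pathways, inferred_reactions):
    """Split each reference pathway's steps into supported and missing."""
    report = {}
    for name, steps in pathways.items():
        present = [r for r in steps if r in inferred_reactions]
        missing = [r for r in steps if r not in inferred_reactions]
        report[name] = (present, missing)
    return report


def partially_supported(report):
    """Pathways with evidence for some but not all steps: the ambiguous case."""
    return sorted(name for name, (present, missing) in report.items()
                  if present and missing)


# Toy reference pathways and inferred reactions (made-up ids).
PATHWAYS = {"pathway-A": ["R1", "R2", "R3"], "pathway-B": ["R4", "R5"], "pathway-C": ["R6"]}
INFERRED = {"R1", "R3", "R4", "R5"}

report = pathway_evidence(PATHWAYS, INFERRED)
print(report["pathway-A"])          # (['R1', 'R3'], ['R2'])
print(partially_supported(report))  # ['pathway-A']
```

Here pathway-B is fully supported and pathway-C has no evidence, so only pathway-A needs one of the three interpretations.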

Result: Comparison of HpyCyc-1 with HpyCyc-2A
The authors expected that, by using MetaCyc as the reference DB, PathoLogic would be able to infer additional pathways that are not found in E. coli. With MetaCyc as the reference database, 135 pathways were predicted, as opposed to 77 pathways when EcoCyc was used as the reference DB.

Result: Comparison of HpyCyc-2A with HpyCyc-2B
The authors created a new version of PathoLogic, called PathoLogic 2, containing an algorithm that identifies false-positive pathways. The enhanced algorithm removes only pathways that the authors believe to be false-positive predictions, according to the criteria below. HpyCyc-2B contains 98 pathways, almost 30% fewer than HpyCyc-2A.
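The figures on the two result slides can be checked directly from the reported pathway counts (77, 135, and 98); nothing here goes beyond those numbers.

```python
# Pathway counts reported on the result slides.
ecocyc_ref = 77    # HpyCyc-1, EcoCyc as reference
metacyc_ref = 135  # HpyCyc-2A, MetaCyc as reference
pruned = 98        # HpyCyc-2B, PathoLogic 2 pruning

extra = metacyc_ref - ecocyc_ref   # additional pathways from MetaCyc
removed = metacyc_ref - pruned     # pathways removed by PathoLogic 2
reduction = removed / metacyc_ref  # fraction of HpyCyc-2A removed

print(f"{extra} extra pathways with MetaCyc")       # 58 extra pathways with MetaCyc
print(f"{removed} removed, {reduction:.1%} fewer")  # 37 removed, 27.4% fewer
```

So "almost 30% fewer" corresponds to 37 of 135 pathways, or about 27.4%.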

Heuristic criteria for false-positive pathways
- The pathway contained evidence for no reaction unique to it.
- The pathway was classified as a biosynthetic pathway and was missing one or more steps from its end.
- The pathway was classified as a degradation pathway and was missing one or more steps from its beginning.
- The pathway consisted of more than two reactions but contained evidence for only a single reaction.
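The four criteria translate into simple checks on a predicted pathway. In this sketch the `PathwayCall` record and its fields are hypothetical, and flagging a pathway when any single criterion holds is an assumption: the slide does not say how PathoLogic 2 combines the criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PathwayCall:
    """A predicted pathway; field names are hypothetical."""
    name: str
    steps: list[str]    # reaction ids in pathway order, first to last
    evidence: set[str]  # reactions with an enzyme assigned in the genome
    kind: str           # "biosynthesis", "degradation", or anything else
    unique: set[str]    # reactions that occur in no other reference pathway


def triggered_criteria(p: PathwayCall) -> list[str]:
    """Return which of the four heuristic criteria the pathway meets."""
    supported = p.evidence & set(p.steps)
    hits = []
    if not (supported & p.unique):
        hits.append("no evidence for a unique reaction")
    if p.kind == "biosynthesis" and p.steps and p.steps[-1] not in supported:
        hits.append("biosynthesis missing final step(s)")
    if p.kind == "degradation" and p.steps and p.steps[0] not in supported:
        hits.append("degradation missing initial step(s)")
    if len(p.steps) > 2 and len(supported) == 1:
        hits.append("only one of >2 reactions supported")
    return hits


def prune(calls: list[PathwayCall]) -> list[PathwayCall]:
    """Assumption: drop a pathway if it meets any criterion."""
    return [p for p in calls if not triggered_criteria(p)]


weak = PathwayCall("toy-biosynthesis", ["A", "B", "C"], {"A"}, "biosynthesis", {"B"})
strong = PathwayCall("toy-degradation", ["D", "E", "F"], {"D", "E", "F"}, "degradation", {"F"})
print(triggered_criteria(weak))
print([p.name for p in prune([weak, strong])])  # ['toy-degradation']
```

Returning the list of triggered criteria, rather than a bare yes/no, keeps the reason for each removal visible for later review.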

Result: Comparison of HpyCyc-2B with manual analysis
The authors compared the 98 pathways predicted by HpyCyc-2B with the results of a manual analysis of the pathways of H. pylori.
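A comparison like this reduces to set operations on pathway names. The sketch uses placeholder names, not the actual H. pylori results, which are not listed in these slides.

```python
def compare_to_manual(predicted, manual):
    """Set comparison of predicted vs. manually curated pathway names."""
    predicted, manual = set(predicted), set(manual)
    return {
        "agreed": sorted(predicted & manual),
        "predicted_only": sorted(predicted - manual),  # candidate false positives
        "manual_only": sorted(manual - predicted),     # pathways the program missed
    }


# Placeholder names, not the actual H. pylori results.
result = compare_to_manual(["pathway-A", "pathway-B", "pathway-C"],
                           ["pathway-A", "pathway-B", "pathway-D"])
print(result)
```

Predicted-only pathways are only candidates for false positives: they need review rather than automatic removal, since the program's predictions can be more comprehensive than the expert analysis.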

Discussion
Since the enzyme-to-reaction matching procedure is fairly conservative, the authors did not expect it to make many incorrect matches. However, the GenBank file contains few or no EC numbers, so matching depends largely on enzyme names.

Comparison with other pathway-prediction algorithms
Such comparisons are hampered by two factors:
- first: published prediction algorithms are not described clearly;
- second: KEGG lie …… (about EC numbers and prediction).
About WIT: it is not clear whether its pathway-prediction process is automated or manual. WIT does seem to be much more selective in its predictions, and it did not predict any obviously incorrect photosynthetic pathways for H. pylori. However, WIT failed to predict pathways such as glycolysis or Entner-Doudoroff.

Conclusion
- This study validates the usefulness of PathoLogic as a tool for metabolic analysis of an organism's annotated genome.
- False positives increased when the reference DB changed from EcoCyc to MetaCyc.
- False positives can be decreased by PathoLogic 2.
- PathoLogic's predictions exceed the expert analysis in comprehensiveness.