1 SRI International Bioinformatics Large-Scale Metabolic Network Alignment: MetaCyc and KEGG Tomer Altman Bioinformatics Research Group SRI International.

Slides:



Advertisements
Similar presentations
SRI International Bioinformatics 1 Navigation to Related Objects Bioinformatics Research Group SRI International Mario Latendresse.
Advertisements

Editing Pathway/Genome Databases. SRI International Bioinformatics Pathway Tools Paradigm Separate database from user interface Navigator provides one.
SRI International Bioinformatics 1 A BRG Biofuels Metabolic Engineering Project Bioinformatics Research Group SRI International
The new JKlustor suite Miklós Vargyas Solutions for Cheminformatics.
ChEBI and SABIO-RK: Association of chemical compound information and reaction kinetics data Ulrike Wittig Scientific Databases and Visualization Group.
SRI International Bioinformatics Data Import / Export Markus Krummenacker Bioinformatics Research Group SRI, International Q
Instantiation of Generic Reactions by Markus Krummenacker Q
SRI International Bioinformatics Comparative Analysis Q
SRI International Bioinformatics 1 Orphan Enzymes Alexander Shearer, Tomer Altman, Anamika Kothari, Christian Ngo, Shahrzad Zarafshar.
EBI Proteomics Services Team – Standards, Data, and Tools for Proteomics Henning Hermjakob European Bioinformatics Institute SME forum 2009 Vienna.
SRI International Bioinformatics 1 The consistency Checker, or Overhauling a PGDB By Ron Caspi.
Using Pathway-tools for phenotype- directed curation Jeremy Zucker Broad Institute of MIT and Harvard Boston University.
Curation of the EcoCyc Database: The EcoCyc Update Project Martha Arnaud Scientific Database Curator Bioinformatics Research Group SRI International
Interoperation of Molecular Biology Databases Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International Menlo Park, CA
Readings for this week Gogarten et al Horizontal gene transfer….. Francke et al. Reconstructing metabolic networks….. Sign up for meeting next week for.
Introduction to Bioinformatics - Tutorial no. 13 Probe Design Gene Networks.
Introduction to the Pathway Tools Software David Walsh and Simon Eng bigDATA Workshop—May 29, 2010.
August 29, 2002InforMax Confidential1 Vector PathBlazer Product Overview.
Pathway databases Goto S, Bono H, Ogata H, Fujibuchi W, Nishioka T, Sato K, Kanehisa M. (1997) Organizing and computing metabolic pathway data in terms.
Pathway/Genome Databases and Software Tools Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International
Update on The Pathway Tools Software Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org MetaCyc.org.
Methods and resources for pathway analysis PABIO590B Week 2.
Enzymatic Function Module (KEGG, MetaCyc, and EC Numbers)
Introduction to metabolism: Compounds, Reactions, Enzymes and Pathways Kristian Axelsen, Alan Bridge Elisabeth Coudert & Anne Morgat SIB Swiss Institute.
Ch10. Intermolecular Interactions and Biological Pathways
1 SRI International Bioinformatics BioCyc Tutorial Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org,
Erice 2008 Introduction to PDB Workshop From Molecules to Medicine: Integrating Crystallography in Drug Discovery Erice, 29 May - 8 June Peter Rose
ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University.
Copyright OpenHelix. No use or reproduction without express written consent1.
The BioCyc Collection of Pathway/Genome Databases Alexander Shearer Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org.
SRI International Bioinformatics 1 Recent Developments in Pathway Tools GMOD Workshop November ‘07 Suzanne Paley Bioinformatics Research Group SRI International.
Tutorial on Current Biochemical Pathway Visualization Tools By Rana Khartabil.
Use of Machine Learning in Chemoinformatics Irene Kouskoumvekaki Associate Professor December 12th, 2012 Biological Sequence Analysis course.
The consistency Checker, or Overhauling a PGDB By Ron Caspi.
1 SRI International Bioinformatics GO Term Integration and Curation in Pathway Tools and EcoCyc Ingrid M. Keseler Bioinformatics Research Group SRI International.
Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.
SRI International Bioinformatics 1 Submitting pathway to MetaCyc Ron Caspi.
1 SRI International Bioinformatics And now for our ‘Feature’ presentation: Automatic Loading of Protein Sequence Annotation Data from UniProt to Pathway.
SRI International Bioinformatics 1 SmartTables & Enrichment Analysis Peter Karp SRI Bioinformatics Research Group September 2015.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
A rapid algorithm for generating minimal pathway distances: Pathway distance correlates with genome distance but not enzyme function Stuart Rison 1*, Evangelos.
An overview of Bioinformatics. Cell and Central Dogma.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
PlantCyc, AraCyc, PoplarCyc and more... Building databases with YOUR help at the Plant Metabolic Network kate dreher curator PMN/TAIR.
A database of biological pathways and processes (borrowed from a presentation created by Steve Jupe)
1, StarOmics course,Lausanne, Monday November 19 th Training agenda Chemicals Reactions Enzymes Pathways.
05/02/2008 Jae Hyun Kim Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor Faulon, J. L.,
SRI International Bioinformatics Update your computers! To install a patch: Tools => Instant Patch => Download and Activate All Patches.
SRI International Bioinformatics 1 Editing Pathway/Genome Databases Ron Caspi.
1 AraCyc Metabolic Pathway Annotation. 2 AraCyc – An overview  AraCyc is a metabolic pathway database for Arabidopsis thaliana;  Computational prediction.
Use of Machine Learning in Chemoinformatics
Reconstructing the metabolic network of a bacterium from its genome: the construction of LacplantCyc Christof Francke In silico reconstruction of the metabolic.
SRI International Bioinformatics 1 Pathway Tools Features Available Only in the Desktop Version PathoLogic.
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
SRI International Bioinformatics Selected PathoLogic Refining Tasks Creation of Protein Complexes Assignment of Modified Proteins Operon Prediction.
Recent Developments and Future Directions in Pathway Tools Peter D. Karp SRI International.
RDF based on Integration of Pathway Database and Gene Ontology SNU OOPSLA LAB DongHyuk Im.
 What is MSA (Multiple Sequence Alignment)? What is it good for? How do I use it?  Software and algorithms The programs How they work? Which to use?
OncoTrack Bioinformatics Workshop Max Planck Institute for Molecular Genetics, Berlin Wednesday 6 th November 2013 TimeSubject 13:30-15:00 Introduction.
Pathway Team SNU, IDB Lab. DongHyuk Im DongHee Lee.
Classifying Chemistry: Current Efforts in Canada
Comparative Analysis in BioCyc
The Pathway Tools FBA Module
Open PHACTS 1.3 Release ( triples)
Bioinformatics Capstone Project
A Community Effort to Model the Human Microbiome
Predicting Active Site Residue Annotations in the Pfam Database
Comparative Analysis Q
Overview of Microbial Pathway and Genome Databases
Importing and Working with Data from the Metabolomics Workbench
Presentation transcript:

1 SRI International Bioinformatics Large-Scale Metabolic Network Alignment: MetaCyc and KEGG Tomer Altman Bioinformatics Research Group SRI International

2 SRI International Bioinformatics Problem Motivation There are an increasing number of ‘encyclopedic’ metabolic networks, or reaction databases KEGG and MetaCyc, plus Rhea, BRENDA, and GO A natural question to ask is, “what is similar / different between them?” There has been some linking of MetaCyc compounds to KEGG, but none for reactions

3 SRI International Bioinformatics Challenges with Mapping Objects Multiple aspects to compare (name, chemical structure, reaction substrates, external identifiers) Inexact naming Inexact structures (different specificity of stereocenters) Inexact description of reactions (classes vs. instances, proton-balancing) How to combine the evidence in a logical fashion

4 SRI International Bioinformatics Evidence / Prediction Data Structure Prediction type Predictor name MetaCyc Object Frame ID KEGG Object identifier Iteration number Parameters used

5 SRI International Bioinformatics Compound Evidence Curated MetaCyc links to KEGG Name matching PubChem identifier mapping (used for ChEBI as well) Molecular Fingerprint Tanimoto Similarity Coefficient InChI string comparison Exact Sub-Structure Match (no stereochemistry) ‘All-but-one’ inference

6 SRI International Bioinformatics Compound Prediction Detail: ‘All-but- one’ Most of the compounds between these two reactions are the same Class vs. instance, and naming issues lead to unknown match between “acceptor” and “oxidized electron acceptor”

7 SRI International Bioinformatics Reaction Evidence EC Numbers UniProt Accession Numbers Name matches (gleaned from associated objects) Exact equation match Inexact equation match (cosine similarity)

8 SRI International Bioinformatics Reaction Prediction Detail: UniProt Mapping Use UniProt Accession numbers to map the enzymes in MetaCyc and KEGG to one another Use UniRef 90 or 100 to map “the same protein” when not exact same Accession Number

9 SRI International Bioinformatics From Evidence to Prediction Rule #1: Evidence is partitioned into exact and inexact quality types Rule #2: Evidence is partitioned (orthogonally) into qualitative and quantitative types Rule #3: A high-quality prediction will consist of qualitative and quantitative evidence in favor Rule #4: We try exact quality types first, then inexact Rule #5: No contradictory predictions Rule #6: Iterate until no new predictions This is the general idea, but implementation has some domain-based heuristics thrown in, such as with EC numbers

10 SRI International Bioinformatics Statistics 3269 MetaCyc reactions with links to KEGG (~75%) 4004 MetaCyc compounds with links to KEGG (>50%)

11 SRI International Bioinformatics Future Work Greatest Common Subgraph Compound Matching Sensitivity / Specificity characterization of all evidence types Analysis of unmatched content of KEGG and MetaCyc for algorithm improvement and focused curation (inexact reaction matching and novel reaction import) Hierarchical clustering of compounds and reactions A general tool that can be used for any two given metabolic networks Recasting algorithm in a machine learning framework

12 SRI International Bioinformatics Acknowledgements Peter Karp Douglas Brutlag Dan Davison Anamika Kothari Joseph Dale MetaCyc.org

13 SRI International Bioinformatics … One more thing! Pathway Tools API (beta)