ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University.

Presentation transcript:

ComPath Comparative Metabolic Pathway Analyzer Kwangmin Choi and Sun Kim School of Informatics Indiana University

Contents
  Introduction
  System Components
  Current Implementation
  Experiment Result
  Future Plan

INTRODUCTION & SYSTEM COMPONENTS

Introduction
  ComPath is a web-based sequence analysis system built upon:
    KEGG (Kyoto Encyclopedia of Genes and Genomes)
    PLATCOM (A Platform for Computational Comparative Genomics)

KEGG: Kyoto Encyclopedia of Genes and Genomes
  Four databases
    PATHWAY: 32,657 pathways generated from 262 reference pathways
    GENES: 1,213,035 genes in 32 eukaryotes + bacteria + 24 archaea
    LIGAND: 13,387 compounds, 2,543 drugs, 11,161 glycans, 6,446 reactions
    BRITE: 7,817 KO (KEGG Orthology) groups
  KEGG adopts the EC enzyme classification system

EC system 0/2
  An old system, but still universally accepted by biochemists.
  The EC system was developed long before protein sequence or structure information was available, so it focuses on reactions, not on sequence homology or structure.
  Many biochemists and structural biologists try to harmonize newly available chemical, sequence, and structural data with the traditional understanding of enzyme function.

Problems in EC system 1/2
  Inconsistency in the EC hierarchy
    For each of the six top-level EC classes, subclasses and sub-subclasses may have different meanings.
    e.g. EC1.* are divided by substrate type, but EC5.* by general isomerase type.
  Problems with multi-functional enzymes and with multiple subunits involved in one function
    EC presumes a 1:1:1 relationship between gene, protein, and reaction.
  Different sequence/structure, but similar EC
    Two enzymes with low sequence identity sometimes belong to the same or a very similar EC class.
    e.g. o-succinylbenzoate synthases from several bacteria share less than 40% sequence identity.
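To make the four-level EC hierarchy concrete, here is a minimal Python sketch that counts how many leading fields two EC numbers share. The function names `parse_ec` and `shared_ec_levels` are illustrative assumptions and are not part of ComPath or KEGG.

```python
def parse_ec(ec):
    """Split an EC number such as 'EC 1.1.1.1' or '5.4.99.2' into its four fields."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(" :")
    fields = tuple(text.split("."))
    if len(fields) != 4:
        raise ValueError("expected four dot-separated fields, got %r" % ec)
    return fields


def shared_ec_levels(ec_a, ec_b):
    """Count leading EC fields (0 to 4) on which two EC numbers agree.

    A '-' placeholder marks an unassigned field and is never counted as a match.
    """
    count = 0
    for field_a, field_b in zip(parse_ec(ec_a), parse_ec(ec_b)):
        if field_a != field_b or field_a == "-":
            break
        count += 1
    return count


# Two enzymes that differ only in the fourth digit share three levels.
print(shared_ec_levels("EC 1.1.1.1", "1.1.1.2"))
```

A shared-level count of this kind only compares labels. As the slide notes, the meaning of each level differs between top-level classes, so equal counts in different classes are not directly comparable.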

Problems in EC system 2/2
  Similar sequence/structure, but different EC
    Even variation in the fourth digit of the EC number is rare above a sequence identity threshold of 40%. However, exceptions to this rule are prevalent.
    e.g. melamine deaminase and atrazine chlorohydrolase are 98% identical, but belong to different EC classes.
  No information on the sequence/structure-mechanism relationship
    The EC system considers only the overall transformation.
    Sequence similarity is strongly correlated with similarity at the level of a common (structural domain-related) partial reaction, rather than the overall transformation.
    How to combine enzyme structure data with partial reaction data?
  Research goal
    We provide a computational environment for enzyme analysis via genome comparison, built on the PLATCOM system.
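The 40% figure refers to pairwise sequence identity. The sketch below shows one common way to compute it from two already-aligned sequences. It assumes identity is measured over columns where neither sequence has a gap. Other conventions exist, and the slides do not say which one the cited studies used. The names `percent_identity` and `above_identity_threshold` are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns (an assumption, see the lead-in)
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def above_identity_threshold(aligned_a, aligned_b, threshold=40.0):
    """True if the pair reaches the identity threshold discussed on the slide."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned fragments, not real enzyme sequences.
print(percent_identity("ACDE", "ACDF"))
```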

Our Research Goal
  We provide a computational environment for enzyme analysis via multiple genome comparison.
  It will be built on the PLATCOM system.

PLATCOM: A Platform for Comparative Genomics
  Providing a platform for comparative genomics ON THE WEB
  Comparative analysis system for users to freely select any sets of genomes
  Scalable system interactively combining high-performance sequence analysis tools

CURRENT IMPLEMENTATION

ComPath: Comparative Metabolic Pathway Analyzer
  ComPath = KEGG + PLATCOM
  Not just for retrieving information from a database; it focuses on analyzing enzymes using the enzyme-genome table
  Easy to use
    {Optional} Upload a user sequence and/or saved enzyme-genome table data
    Select a metabolic pathway
    Select any combination of genomes in KEGG
    Create an enzyme-genome table
    Then use the table for various enzyme sequence analysis tasks
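The workflow above ends with an enzyme-genome table. As a rough illustration of what such a table could look like in memory, the sketch below builds a nested dictionary keyed by EC number and then by genome. The layout, the function name `build_enzyme_genome_table`, and the toy gene identifiers are all assumptions made for illustration; the slides do not describe ComPath's internal representation.

```python
def build_enzyme_genome_table(pathway_ecs, genomes, annotations):
    """Return {ec: {genome: [gene_id, ...]}} for the selected pathway and genomes.

    annotations is an iterable of (genome, ec, gene_id) records. Records for
    enzymes outside the pathway or genomes outside the selection are ignored.
    """
    table = {ec: {genome: [] for genome in genomes} for ec in pathway_ecs}
    for genome, ec, gene_id in annotations:
        if ec in table and genome in table[ec]:
            table[ec][genome].append(gene_id)
    return table


# Toy input: EC numbers for three glycolytic enzymes, two KEGG organism codes,
# and invented gene identifiers. These are not real annotations.
pathway_ecs = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]
genomes = ["eco", "bsu"]
annotations = [
    ("eco", "5.3.1.9", "gene_a"),
    ("eco", "2.7.1.11", "gene_b"),
    ("eco", "2.7.1.11", "gene_c"),
    ("bsu", "5.3.1.9", "gene_d"),
    ("hpy", "5.3.1.9", "gene_e"),  # genome not selected, so ignored
]
table = build_enzyme_genome_table(pathway_ecs, genomes, annotations)
print(table["2.7.1.11"])
```

Every selected cell is created up front, so an empty list marks a candidate missing enzyme rather than a missing key.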

Screenshot: Pathway Selection
  11 categories, 123 pathways
  Users can upload previously saved enzyme-genome table data to continue an analysis

Screenshot: Genome Selection
  250 genomes from the KEGG database
  Users can select genomes in taxonomic or alphabetical order

Screenshot: Operations 1/2

Screenshot: Operations 2/2

Enzyme-Genome Table
  An enzyme-genome table allows users to test, with sequence analysis techniques, whether a certain enzyme in a given pathway is present or missing.
  Information in this table can easily be saved, uploaded, and transferred.
  Users can also upload their own sequence set, e.g., the entire set of predicted proteins in a newly sequenced genome, and annotate the sequences in terms of KEGG pathways.
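The two properties described here, flagging missing enzymes and saving or re-uploading the table, can be sketched as follows. This assumes a nested-dictionary table of the form `{ec: {genome: [gene_id, ...]}}` and a JSON serialization. Both are illustrative choices, and the actual ComPath file format is not given in the slides.

```python
import json


def missing_enzymes(table, genome):
    """List EC numbers that have no gene assigned in the given genome."""
    return sorted(ec for ec, row in table.items() if not row.get(genome))


def dump_table(table):
    """Serialize the table to a JSON string so it can be saved or transferred."""
    return json.dumps(table, sort_keys=True)


def load_table(text):
    """Restore a table from the JSON string produced by dump_table."""
    return json.loads(text)


# Toy table with invented gene identifiers, not real annotations.
toy_table = {
    "2.7.1.1": {"eco": ["gene_x"], "bsu": []},
    "5.3.1.9": {"eco": ["gene_a"], "bsu": ["gene_d"]},
}
print(missing_enzymes(toy_table, "bsu"))
restored = load_table(dump_table(toy_table))
```

An empty cell is only a candidate: the sequence searches listed later in the deck are what would confirm or fill it.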

Screenshot: ComPath's Enzyme-Genome Table – INTERACTIVE!

Screenshot: KEGG’s Ortholog Table – STATIC!

How ComPath works: Overall Design

How ComPath works:

Sequence Analyses
  Missing enzyme search
    Pairwise (FASTA) and multiple sequence alignment (CLUSTALW)
    Domain search using SCOPEC/SUPERFAMILY and PDB domains
    Domain-based analysis using hidden Markov models (HMM)
    Contextual sequence analysis
  Sequence analysis for further investigation
    Phylogenetic analysis of enzymes in selected genomes
    Gibbs motif sampler
    BAG clustering
    Contextual sequence analysis
    Global network analysis
    Physical network analysis

TEST

Experiments: Genomes, Queries, Pathways
  Selected genomes: B. subtilis, B. halodurans, E. coli, H. influenzae, H. pylori, M. genitalium, Y. pestis KIM
  Query genomes: M. tuberculosis, A. aeolicus, B. anthracis
  Metabolic pathways: (glycolysis + gluconeogenesis), (TCA cycle)

Experiments: Comparison of Sequence Analysis Methods
  Four methods (abbr.)
    HMMer: HMM search using the whole sequence
    CSR: HMM search using common shared regions generated by the BAG program
    SCOPEC: Domain search using SCOP/SUPERFAMILY and PDB databases
    FASTA: Simple FASTA search
  Cutoffs: 1e-10, 1e-20, 1e-30
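Applying the three E-value cutoffs to a list of search hits could look like the sketch below. The hit names and E-values are invented, and treating a hit exactly at the cutoff as accepted is an assumption, since the slides do not say whether the comparison is strict.

```python
CUTOFFS = (1e-10, 1e-20, 1e-30)


def filter_hits(hits, cutoff):
    """Keep (name, evalue) hits whose E-value is at or below the cutoff.

    Smaller E-values are more significant, so a stricter cutoff keeps fewer hits.
    """
    return [(name, evalue) for name, evalue in hits if evalue <= cutoff]


# Hypothetical hits from any of the four methods.
toy_hits = [("hit_1", 3e-45), ("hit_2", 2e-25), ("hit_3", 5e-12), ("hit_4", 1e-3)]
counts = {cutoff: len(filter_hits(toy_hits, cutoff)) for cutoff in CUTOFFS}
print(counts)
```

Running the same filter at each cutoff for each method gives the grid of conditions that the results slide reports.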

Experiments: Overall Design

Screenshot: ComPath's Enzyme-Genome Table – INTERACTIVE!

Experiment Results (e.g.)
  Columns: Query Genome, Pathway, Method, Sensitivity, Specificity, E-value
  M. tuberculosis, Path 00010: HMMer, CSR, SCOPEC, FASTA at E-30; HMMer, CSR, SCOPEC, FASTA at E-10
  M. tuberculosis, Path 00020: HMMer, CSR, SCOPEC, FASTA at E-30; HMMer, CSR, SCOPEC, FASTA at E-10
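The results are reported as sensitivity and specificity. The slides do not define either term, so the sketch below assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), computed over sets of gene identifiers. The function name and the toy sets are hypothetical.

```python
def sensitivity_specificity(predicted, actual, universe):
    """Return (sensitivity, specificity) for a set of predicted positives.

    predicted: items a method assigned to the enzyme
    actual:    items annotated with the enzyme in the reference
    universe:  all items that were examined
    A value is None when its denominator is zero.
    """
    tp = len(predicted & actual)
    fn = len(actual - predicted)
    fp = len(predicted - actual)
    tn = len(universe - predicted - actual)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Toy identifiers, not experimental data.
universe = {"a", "b", "c", "d", "e", "f"}
predicted = {"a", "b", "c"}
actual = {"a", "b", "d"}
sens, spec = sensitivity_specificity(predicted, actual, universe)
print(sens, spec)
```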

A. aeolicus

B. anthracis

M. tuberculosis

FUTURE PLAN

Future Plan: More Resources
  ComPath is being extended to incorporate more resources, including:
    KEGG LIGAND: a composite database consisting of compounds, glycans, reactions, etc.
    ProRule: a new database containing functional and structural information on PROSITE profiles
    SFLD: Structure-Function Linkage Database
  We are also developing databases and algorithms for enzyme analysis, e.g. classifiers using a database of enzyme-specific HMMs.
  ComPath is in an early stage of system development, and we solicit feedback and suggestions from the biology and bioinformatics communities.

Future Plan: More Algorithms and Tools
  A more integrative understanding of biochemical network evolution
  Algorithms to handle the isozyme problem
  Algorithms to computationally reconstruct alternative pathways
  Algorithms to combine sequence, structure, chemical reaction, and contextual information for better enzyme annotation
  Etc.