Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.

Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks

Metabolic network
The metabolic network consists of enzyme proteins and chemical compounds
- 6018 genes in the yeast genome
- 1120 genes with EC numbers
- 668 genes with pathway information
(in KEGG as of Sep. 2004)
Problem: parts of the pathways are unknown, and many enzyme genes are missing

Network inference methods
For gene regulatory networks:
- Bayesian network (Friedman et al., 2000; Imoto et al., 2002)
- Boolean network (Akutsu et al., 2000)
- Graphical modeling (Toh et al., 2001)
For protein interaction networks:
- Joint graph method (Marcotte et al., 1999)
- Mirror tree method (Pazos et al., 2001)

Objectives
- Develop a method to infer metabolic gene networks in a supervised context
- Integrate heterogeneous genomic data in the framework of network inference
- Reconstruct unknown pathways and identify genes for missing enzymes

Kernel in this study
Kernel: representation of the similarity between two genes (e.g., correlation coefficient)
Kernel matrix: similarity matrix of a set of genes

An example of the kernel
Suppose we have a set of genes x_1, x_2, …, x_N and represent them by gene expression profiles

An example of kernel matrix
This can be regarded as a kind of similarity matrix
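
As a rough illustration of such a kernel matrix, the sketch below computes Pearson correlation coefficients between gene expression profiles with NumPy. The function name `correlation_kernel` and the toy profile values are assumptions made for this example; the slides mention the correlation coefficient only as one possible similarity.

```python
import numpy as np

def correlation_kernel(profiles):
    """Return the N x N matrix of Pearson correlations between gene profiles (rows)."""
    X = np.asarray(profiles, dtype=float)
    # Center each gene's profile and scale it to unit length, so that the
    # dot product of two rows equals their correlation coefficient.
    # (A gene with a constant profile would cause a division by zero here.)
    X = X - X.mean(axis=1, keepdims=True)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

# Toy data (hypothetical): 3 genes x 5 experiments.
profiles = [
    [0.1, 0.4, 0.6, 0.2, -0.3],
    [0.2, 0.8, 1.2, 0.4, -0.6],     # twice gene 1: perfectly correlated
    [-0.1, -0.4, -0.6, -0.2, 0.3],  # minus gene 1: perfectly anti-correlated
]
K = correlation_kernel(profiles)
print(np.round(K, 3))
```

Because each row is centered and normalized first, the result is a Gram matrix, so it is symmetric and positive semidefinite, which is what allows it to be used as a kernel matrix.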

Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix based on a genomic dataset; configuration of genes]

Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix; predicted network]
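
One simple way to turn this assumption into predictions is to link every pair of genes whose similarity exceeds a cutoff. The sketch below does that; the thresholding rule, the function name `direct_inference`, and the toy matrix are assumptions for illustration, since the slides show the procedure only as figures.

```python
import numpy as np

def direct_inference(K, threshold):
    """Predict an edge for every pair of distinct genes with similarity above the threshold."""
    K = np.asarray(K, dtype=float)
    # Look only at the upper triangle so each unordered pair is considered once.
    i, j = np.triu_indices(K.shape[0], k=1)
    keep = K[i, j] > threshold
    return [(int(a), int(b)) for a, b in zip(i[keep], j[keep])]

# Toy similarity matrix for 4 genes (hypothetical values).
K = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.4],
    [0.1, 0.3, 0.4, 1.0],
])
edges = direct_inference(K, threshold=0.5)
print(edges)
```

Lowering the threshold predicts more edges (more true positives but also more false positives); sweeping it over all values is what traces out a ROC curve.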

Evaluation of the direct approach: using gene expression data
Gold standard data: the metabolic network of 668 yeast genes in KEGG/Pathway
Expression data: 157 experiments (SMD)
[Figure: ROC curve; x-axis: false positives; y-axis: true positives]

Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks

An illustration of formalism
[Figure: unknown pathway; protein network; similarity matrix in expression]

An illustration of formalism
[Figure: unknown pathway; protein network; similarity matrix in expression; training]

Supervised network inference
Key idea: use of partially known network information
[Figure: original space; legend: training set]

Supervised network inference
[Figure: original space; legend: training set, edge predicted by direct approach]

Supervised network inference
[Figure: original space; legend: training set, true edge]

Supervised network inference 1/2
Step 1: map proteins to a space where interacting proteins are close to each other
[Figure: original space and feature space; legend: training set, true edge]

Supervised network inference 2/2
[Figure: original space and feature space; legend: training set, test set, true edge]

Supervised network inference 2/2
Step 2: predict interacting protein pairs involving the test set
[Figure: original space and feature space; legend: training set, test set, true edge]

Algorithm
- Kernel CCA (Yamanishi et al., 2004)
- Distance metric learning (Vert et al., 2004)
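
The two steps of supervised inference can be sketched with a deliberately simplified stand-in. This is neither kernel CCA nor the cited distance metric learning algorithm. It (1) takes coordinates for the training proteins from the smallest eigenvectors of the known network's graph Laplacian, where linked proteins lie close together, (2) learns a map from the data kernel to those coordinates by kernel ridge regression, and (3) links test proteins to training proteins that land nearby. All function names, the RBF kernel, the parameters, and the toy data are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian similarity between the rows of A and the rows of B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def fit_embedding(K_train, adjacency, dim=2, ridge=0.1):
    """Step 1: learn a map to a space where linked training proteins are close."""
    A = np.asarray(adjacency, float)
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian of the known network
    _, vecs = np.linalg.eigh(L)          # eigenvalues come back in ascending order
    Y = vecs[:, :dim]                    # smoothest coordinates on the known graph
    # Kernel ridge regression from the data kernel onto those coordinates.
    return np.linalg.solve(K_train + ridge * np.eye(len(A)), Y)

def embed(K_to_train, alpha):
    """Feature-space coordinates of proteins, given their kernel values to the training set."""
    return np.asarray(K_to_train, float) @ alpha

def predict_edges(F_test, F_train, max_dist):
    """Step 2: link each test protein to the training proteins within max_dist."""
    d = np.linalg.norm(F_test[:, None, :] - F_train[None, :, :], axis=2)
    return [(int(t), int(s)) for t, s in zip(*np.nonzero(d < max_dist))]

# Toy data (hypothetical): two groups of 4 training proteins; the known
# network links every pair within a group and no pair across groups.
X_train = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
           [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]]
adjacency = np.zeros((8, 8))
adjacency[:4, :4] = 1
adjacency[4:, 4:] = 1
np.fill_diagonal(adjacency, 0)
X_test = [[0.05, 0.05], [5.05, 5.05]]

K_train = rbf_kernel(X_train, X_train)
alpha = fit_embedding(K_train, adjacency)
F_train = embed(K_train, alpha)
F_test = embed(rbf_kernel(X_test, X_train), alpha)
edges = predict_edges(F_test, F_train, max_dist=0.3)
print(edges)  # pairs (test index, training index)
```

In this toy case each test protein is linked only to the training group it resembles. For a connected known network the first Laplacian eigenvector is constant and carries no information, so a larger `dim` would be needed there.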

Result of the supervised learning: ROC curve by cross-validation
[Figure: ROC curves; panels: direct approach, supervised approach]

Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks

Various genomic data
Data: phylogenetic profile, localization data, gene expression
Structure: bit strings, numerical vectors
Gene-gene relationship: evolutionary similarity, co-localization similarity, co-expression similarity

Data of the yeast S. cerevisiae
- Expression: 6059 genes with 157 experiments (SMD database)
- Localization: 6059 proteins with 23 intracellular locations (Huh et al., 2003)
- Phylogenetic profile: 6059 proteins with 145 organisms (KEGG/Ortholog Cluster)

Gene expression profiles
Numerical vectors of gene expression ratios (rows: genes; columns: experiments or time series)
          exp1  exp2  exp3  exp4  exp5  …  exp P
gene 1   ( 0.1,  0.4,  0.6,  0.2, -0.3, …,  1.5)
gene 2   ( 0.2,  0.9,  1.8,  0.7, -0.3, …,  0.4)
gene 3   ( 0.6,  0.7, -1.0,  0.8,  1.2, …,  0.6)
…
gene N   ( 1.2,  0.3,  1.9, -0.1, -0.7, …,  0.1)

Phylogenetic profiles
Bit strings in which the presence or absence of each gene across organisms is coded as 1 or 0 (rows: genes; columns: organisms)
          org1 org2 org3 org4 org5 … org P
gene 1   (1,   1,   0,   0,   0,   …, 1)
gene 2   (1,   0,   1,   0,   1,   …, 0)
gene 3   (0,   1,   0,   0,   1,   …, 0)
…
gene N   (1,   0,   1,   0,   0,   …, 1)
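
One simple similarity for such bit strings, assumed here rather than taken from the talk, is the dot product of two profiles, which counts the organisms in which both genes are present. The sketch uses a toy table of 3 genes and 6 organisms built from the values shown on the slide with the elided columns dropped; the function name `profile_kernel` is hypothetical.

```python
import numpy as np

def profile_kernel(bits):
    """Similarity matrix whose (i, j) entry counts organisms containing both gene i and gene j."""
    B = np.asarray(bits, dtype=int)
    return B @ B.T

# Toy phylogenetic profiles: 3 genes x 6 organisms (1 = present, 0 = absent).
bits = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]
K_phylo = profile_kernel(bits)
print(K_phylo)
```

Each diagonal entry is the number of organisms that contain the gene, so genes present in many organisms get larger raw similarities; normalizing the rows first is one way to remove that effect.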

An illustration of our network inference procedure
[Figure: INPUT: gene expression, protein localization, phylogenetic profile, each as a similarity matrix of genes; OUTPUT: inferred gene network]

Data representation and integration
[Figure: genomic data; similarity matrix]

Evaluating the weight for each data source
1. Apply the method to each data source individually
2. Evaluate its biological relevance by the ROC score
ROC score: area under the ROC curve
[Figure: ROC curve]
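
The ROC score can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge (ties counted as one half). The sketch below uses that identity; the function name `roc_score` and the toy scores are assumptions for illustration.

```python
import numpy as np

def roc_score(scores, labels):
    """Area under the ROC curve for predicted scores against 0/1 gold-standard labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every true edge with every non-edge.
    # Memory grows as len(pos) * len(neg), which is fine for a small sketch.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy example (hypothetical): similarity scores for 5 gene pairs,
# with label 1 where the pair is an edge in the gold standard.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(roc_score(scores, labels))
```

A score of 1.0 means every true edge is ranked above every non-edge, and 0.5 is what random scores give on average.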

Evaluating the weight by the ROC scores
For each data source, compute (ROC score - 0.5) and use it as the weight (0.5 is the ROC score of a random predictor)
[Figure: ROC curves; panels: expression, localization, phylogenetic profile]
Evolutionary information seems to be useful

The resulting normalized weights:
The effect of data integration
[Figure: ROC curve]
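
The weighting scheme can be sketched as follows, under several assumptions: the weights are normalized to sum to 1, negative weights are clipped to 0, and the integrated similarity matrix is the weighted sum of the individual ones. The ROC scores below are made-up placeholders, not the values from the talk, and the function names are hypothetical.

```python
import numpy as np

def integration_weights(roc_scores):
    """Weight each data source by (ROC score - 0.5), then normalize the weights to sum to 1."""
    # Clipping at 0 (an assumption) keeps a worse-than-random source from getting a negative weight.
    raw = {name: max(score - 0.5, 0.0) for name, score in roc_scores.items()}
    total = sum(raw.values())  # zero if no source beats random; not handled in this sketch
    return {name: w / total for name, w in raw.items()}

def integrate(kernels, weights):
    """Weighted sum of similarity matrices that share the same gene order."""
    return sum(weights[name] * np.asarray(K, dtype=float) for name, K in kernels.items())

# Placeholder ROC scores (hypothetical).
roc_scores = {"expression": 0.60, "localization": 0.55, "phylogenetic": 0.75}
weights = integration_weights(roc_scores)
print(weights)

# Toy 2 x 2 similarity matrices, one per data source (hypothetical values).
kernels = {
    "expression": [[1.0, 0.2], [0.2, 1.0]],
    "localization": [[1.0, 0.0], [0.0, 1.0]],
    "phylogenetic": [[1.0, 0.8], [0.8, 1.0]],
}
K_integrated = integrate(kernels, weights)
print(K_integrated)
```

With these placeholder scores the phylogenetic profile receives the largest weight, so the integrated similarity is dominated by that source.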

Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks

Comprehensive prediction of a global gene network
We predicted a network of 6059 genes
Possible biological applications:
1. Estimate unknown pathways
2. Predict biochemical function for hypothetical proteins
3. Identify missing enzyme genes

Prediction for a role in pathways
YJR137C (whose detailed function was unknown as of Sep. 2003) is connected with EC: and EC: in the predicted network

Prediction for a role in pathways
Recently, YJR137C has been reported to be annotated as EC:

Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks

Summary
- We developed supervised approaches to infer the metabolic network from multiple genomic data
- Accuracy improved with supervised learning and weighted data integration
- We showed some possibilities for obtaining new biological findings

Collaborators
For the methods:
- Jean-Philippe Vert (Ecole des Mines)
- Minoru Kanehisa (Kyoto University)
For the biochemical experiments:
- Hisaaki Mihara, Motoharu Ohsaki, Hisashi Muramatsu, Nobuyoshi Esaki (Kyoto University)