Metabolic Network Inference from Multiple Types of Genomic Data
Yoshihiro Yamanishi
Centre de Bio-informatique, Ecole des Mines de Paris


Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks

Metabolic network
The metabolic network consists of enzyme proteins and chemical compounds
6018 genes in the yeast genome
1120 genes with EC numbers
668 genes with pathway information
(in KEGG as of Sep. 2004)
Problem: parts of the pathways are unknown and many enzyme genes are missing

Network inference methods
For gene regulatory network
- Bayesian network (Friedman et al., 2000; Imoto et al., 2002)
- Boolean network (Akutsu et al., 2000)
- Graphical modeling (Toh et al., 2001)
For protein interaction network
- Joint graph method (Marcotte et al., 1999)
- Mirror tree method (Pazos et al., 2001)

Objectives
Develop a method to infer metabolic gene networks in a supervised context
Integrate heterogeneous genomic data in the framework of network inference
Reconstruct unknown pathways and identify genes for missing enzymes

Kernel in this study
Kernel: representation of the similarity between two genes (e.g., correlation coefficient)
Kernel matrix: similarity matrix of a set of genes

An example of the kernel
Suppose we have a set of genes x_1, x_2, …, x_N and represent them by gene expression profiles

An example of kernel matrix
This can be regarded as a kind of similarity matrix
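As a rough illustration of the kernel matrix idea, the sketch below builds a gene-by-gene similarity matrix from expression profiles using the correlation coefficient mentioned earlier as an example of a kernel. The function names `pearson` and `kernel_matrix` and the toy profile values are assumptions made for this sketch, not taken from the talk.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

def kernel_matrix(profiles):
    """N x N similarity matrix: entry (i, j) compares gene i with gene j."""
    return [[pearson(p, q) for q in profiles] for p in profiles]

# Toy expression profiles (made-up values): 3 genes x 4 experiments.
profiles = [
    [0.1, 0.4, 0.6, 0.2],
    [0.2, 0.8, 1.2, 0.4],  # exactly twice gene 0, so perfectly correlated with it
    [0.6, 0.2, 0.1, 0.5],  # moves against gene 0, so negatively correlated
]
K = kernel_matrix(profiles)
```

The matrix is symmetric with ones on the diagonal, which is why it can be read directly as a similarity matrix over the gene set.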

Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix based on a genomic dataset (genes 1 to 9) and the resulting configuration of genes]

Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix (genes 1 to 9) and the predicted network]
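One simple way to read the direct approach is as thresholding: any gene pair whose similarity exceeds a cutoff is predicted to be connected. The sketch below assumes that reading; the function name `predict_edges`, the cutoff value, and the toy similarity matrix are hypothetical.

```python
def predict_edges(similarity, threshold):
    """Return gene pairs (i, j), i < j, whose similarity exceeds the threshold."""
    n = len(similarity)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if similarity[i][j] > threshold
    ]

# Toy symmetric similarity matrix for 4 genes (made-up values).
S = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.1],
    [0.1, 0.3, 0.1, 1.0],
]
edges = predict_edges(S, threshold=0.5)
```

Sweeping the threshold from high to low yields progressively denser predicted networks, which is what a ROC curve summarises.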

Evaluation of the direct approach: using gene expression data
Gold standard data: metabolic network of 668 yeast genes in KEGG/Pathway
Gene expression data: 157 experiments (SMD)
[Figure: ROC curve; axes: false positives, true positives]
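The area under a ROC curve can be computed without drawing the curve, as the fraction of (true edge, non-edge) pairs in which the true edge receives the higher score. The sketch below uses that rank-based formulation; the function name `roc_score` and the toy scores and labels are assumptions for illustration, not results from the talk.

```python
def roc_score(scores, labels):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen true edge (label 1)
    scores higher than a randomly chosen non-edge (label 0); ties count half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: similarity scores for 5 candidate gene pairs and whether
# each pair is an edge in the gold standard network (made-up values).
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]
auc = roc_score(scores, labels)
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of true edges above non-edges.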

Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks

An illustration of formalism
[Figure: a protein network containing an unknown pathway, shown next to the similarity matrix in expression; the final build of the slide adds the label "training"]

Supervised network inference
Key idea: use of partially known network information
[Figure: proteins in the original space. Legend: training set; edges predicted by the direct approach; true edges]

Supervised network inference 1/2
Step 1: map proteins to a feature space where interacting proteins are close to each other
[Figure: original space and feature space. Legend: training set; true edges]

Supervised network inference 2/2
Step 2: predict interacting protein pairs involving the test set
[Figure: original space and feature space. Legend: training set; test set; true edges]
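The two steps can be illustrated with a deliberately simplified stand-in. This is not the kernel CCA or distance metric learning used in the talk: here the "feature space" is just a per-feature reweighting learned from the known edges among training genes, and prediction ranks training genes by distance to a test gene in that reweighted space. All names (`learn_feature_weights`, `weighted_distance`, `predict_partners`) and the toy data are hypothetical.

```python
def learn_feature_weights(X, train_ids, train_edges):
    """Step 1 (simplified): weight each feature by how much closer linked
    training genes are than unlinked ones along that feature (clipped at 0)."""
    edge_set = {frozenset(e) for e in train_edges}
    d = len(X[train_ids[0]])
    linked, unlinked = [0.0] * d, [0.0] * d
    n_linked = n_unlinked = 0
    for a, i in enumerate(train_ids):
        for j in train_ids[a + 1:]:
            sq = [(X[i][k] - X[j][k]) ** 2 for k in range(d)]
            if frozenset((i, j)) in edge_set:
                linked = [s + t for s, t in zip(linked, sq)]
                n_linked += 1
            else:
                unlinked = [s + t for s, t in zip(unlinked, sq)]
                n_unlinked += 1
    return [max(0.0, u / n_unlinked - l / n_linked)
            for u, l in zip(unlinked, linked)]

def weighted_distance(x, y, weights):
    """Squared distance after rescaling each feature by its learned weight."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))

def predict_partners(X, test_id, train_ids, weights, k):
    """Step 2 (simplified): the k training genes closest to the test gene."""
    ranked = sorted(
        train_ids, key=lambda j: weighted_distance(X[test_id], X[j], weights)
    )
    return ranked[:k]

# Toy data (made-up): feature 0 follows the network, feature 1 is noise.
X = [
    [0.0, 0.0], [0.1, 1.0],    # training genes 0 and 1 (linked)
    [1.0, 0.1], [0.9, 0.9],    # training genes 2 and 3 (linked)
    [0.05, 0.95], [0.95, 0.05] # test genes 4 and 5
]
train_ids = [0, 1, 2, 3]
train_edges = [(0, 1), (2, 3)]
weights = learn_feature_weights(X, train_ids, train_edges)
partners_of_4 = predict_partners(X, 4, train_ids, weights, k=2)
partners_of_5 = predict_partners(X, 5, train_ids, weights, k=2)
```

In this toy case the noisy feature receives zero weight, so test gene 4 is matched to genes 0 and 1, whereas an unweighted distance would rank gene 3 ahead of gene 0.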

Algorithm
Kernel CCA (Yamanishi et al., 2004)
Distance metric learning (Vert et al., 2004)

Result of the supervised learning: ROC curve by cross-validation
[Figure: ROC curves for the direct approach and the supervised approach]

Various genomic data
Data: phylogenetic profile, localization data, gene expression
Structure: bit strings, numerical vectors
Gene-gene relationship: evolutionary similarity, co-localization similarity, co-expression similarity

Data of the yeast S. cerevisiae
Expression: 6059 genes with 157 experiments (SMD database)
Localization: 6059 proteins with 23 intracellular locations (Huh et al., 2003)
Phylogenetic profile: 6059 proteins with 145 organisms (KEGG/Ortholog Cluster)

Gene expression profiles
Numerical vectors of the gene expression ratio (rows: genes; columns: experiments or time series)
        exp1 exp2 exp3 exp4 exp5 … exp P
gene 1 (0.1, 0.4, 0.6, 0.2, -0.3, …, 1.5)
gene 2 (0.2, 0.9, 1.8, 0.7, -0.3, …, 0.4)
gene 3 (0.6, 0.7, -1.0, 0.8, 1.2, …, 0.6)
…
gene N (1.2, 0.3, 1.9, -0.1, -0.7, …, 0.1)

Phylogenetic profiles
Bit strings in which the presence and absence of the genes are coded as 1 or 0 across organisms (rows: genes; columns: organisms)
        org1 org2 org3 org4 org5 … org P
gene 1 (1, 1, 0, 0, 0, …, 1)
gene 2 (1, 0, 1, 0, 1, …, 0)
gene 3 (0, 1, 0, 0, 1, …, 0)
…
gene N (1, 0, 1, 0, 0, …, 1)

An illustration of our network inference procedure
INPUT: gene expression, protein localization, phylogenetic profile
Each input is represented as a similarity matrix of genes
OUTPUT: gene network, inferred from the similarity matrices

Data representation and integration
[Figure: each genomic data set is represented as a similarity matrix]

Evaluating the weight for each data source
1. Individual application to each data source
2. Evaluation of its biological relevance by the ROC score
ROC score: area under the ROC curve

Evaluating the weight by the ROC scores
For each data source, compute (ROC score - 0.5) and use it as the weight
[Figure: weights for expression, localization, and phylogenetic profile]
Evolutionary information seems to be useful
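The weighting rule can be sketched as follows. The ROC scores below are made-up placeholders, not the values from the talk, and combining the similarity matrices by a weighted sum is an assumption of this sketch. The names `integration_weights` and `integrate` are hypothetical.

```python
def integration_weights(roc_scores):
    """Weight each data source by (ROC score - 0.5), normalised to sum to 1."""
    raw = {name: score - 0.5 for name, score in roc_scores.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

def integrate(matrices, weights):
    """Weighted sum of same-sized similarity matrices (assumed combination rule)."""
    names = list(matrices)
    n = len(matrices[names[0]])
    return [
        [sum(weights[s] * matrices[s][i][j] for s in names) for j in range(n)]
        for i in range(n)
    ]

# Placeholder ROC scores (hypothetical, for illustration only).
roc_scores = {"expression": 0.60, "localization": 0.55, "phylogenetic": 0.75}
weights = integration_weights(roc_scores)

# Toy 2-gene similarity matrices, one per data source (made-up values).
matrices = {
    "expression": [[1.0, 0.2], [0.2, 1.0]],
    "localization": [[1.0, 0.0], [0.0, 1.0]],
    "phylogenetic": [[1.0, 0.8], [0.8, 1.0]],
}
K_integrated = integrate(matrices, weights)
```

Subtracting 0.5 means a data source that ranks edges no better than chance contributes nothing, while a source with a score below 0.5 would receive a negative weight under this plain version of the rule.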

The effect of data integration
The resulting normalized weights: [formula not preserved]
[Figure: ROC curve]

Comprehensive prediction of a global gene network
We predicted a network of 6059 genes
Possible biological applications
1. Estimate unknown pathways
2. Predict biochemical function for hypothetical proteins
3. Identify missing enzyme genes

Prediction for a role in pathways
YJR137C (whose detailed function was unknown as of Sep. 2003) is connected with EC:1.8.4.8 and EC:2.5.1.47 in the predicted network

Prediction for a role in pathways
Recently, YJR137C has been reported to be annotated as EC:1.8.1.2

Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks

Summary
We developed supervised approaches to infer the metabolic network from multiple genomic data
The accuracy improved with the supervised learning and the weighted data integration
We showed some possibilities for obtaining new biological findings

Collaborators
For the methods
- Jean-Philippe Vert (Ecole des Mines)
- Minoru Kanehisa (Kyoto University)
For the biochemical experiments
- Hisaaki Mihara, Motoharu Ohsaki, Hisashi Muramatsu, Nobuyoshi Esaki (Kyoto University)
