Published by Ira Garrett. Modified over 9 years ago.
1
Metabolic Network Inference from Multiple Types of Genomic Data
Yoshihiro Yamanishi
Centre de Bio-informatique, Ecole des Mines de Paris
2
Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks
3
Metabolic network
The metabolic network consists of enzyme proteins and chemical compounds
- 6018 genes in the yeast genome
- 1120 genes with EC numbers
- 668 genes with pathway information
(in KEGG as of Sep. 2004)
Problem: parts of the pathways are unknown and many enzyme genes are missing
4
Network inference methods
For gene regulatory networks
- Bayesian network (Friedman et al., 2000; Imoto et al., 2002)
- Boolean network (Akutsu et al., 2000)
- Graphical modeling (Toh et al., 2001)
For protein interaction networks
- Joint graph method (Marcotte et al., 1999)
- Mirror tree method (Pazos et al., 2001)
5
Objectives
- Develop a method to infer metabolic gene networks in a supervised context
- Integrate heterogeneous genomic data in the framework of network inference
- Reconstruct unknown pathways and identify genes for missing enzymes
6
Kernel in this study
Kernel: a representation of the similarity between two genes x and x' (e.g., correlation coefficient)
Kernel matrix: the similarity matrix of a set of genes
7
An example of the kernel
Suppose we have a set of genes x_1, x_2, …, x_N and represent them by gene expression profiles
8
An example of kernel matrix
This can be regarded as a kind of similarity matrix
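As a minimal sketch of the kernel matrix idea, the snippet below builds a similarity matrix from gene expression profiles using the Pearson correlation coefficient, which the slides mention as an example of a kernel. The function name `correlation_kernel` and the toy profile values are assumptions made for illustration; the kernel actually used in the talk is not recoverable from the slide text.

```python
import numpy as np

def correlation_kernel(profiles: np.ndarray) -> np.ndarray:
    """Return the N x N matrix of Pearson correlations between gene profiles.

    profiles has one row per gene and one column per experiment.
    """
    # Center each gene profile, then scale it to unit length.
    centered = profiles - profiles.mean(axis=1, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    # Inner products of unit vectors are correlation coefficients.
    return unit @ unit.T

# Hypothetical toy data: 3 genes measured in 4 experiments.
profiles = np.array([
    [0.1, 0.4, 0.6, 0.2],
    [0.2, 0.9, 1.8, 0.7],
    [0.6, 0.7, -1.0, 0.8],
])
K = correlation_kernel(profiles)
print(np.round(K, 3))
```

Because the matrix is a Gram matrix of normalized vectors, it is symmetric and positive semidefinite, which is what allows it to be used as a kernel.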
9
Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix based on a genomic dataset (genes 1 to 9); configuration of genes]
10
Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix (genes 1 to 9); predicted network]
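The direct approach can be sketched as thresholding the similarity matrix: any gene pair whose similarity reaches the threshold is predicted as an edge. This is only an illustration under that reading of the slide; the function name `predict_edges_direct`, the threshold value, and the toy matrix are assumptions, not values from the talk.

```python
import numpy as np

def predict_edges_direct(similarity: np.ndarray, threshold: float):
    """Predict an edge for every gene pair whose similarity >= threshold."""
    n = similarity.shape[0]
    # Visit each unordered pair once (upper triangle, diagonal excluded).
    rows, cols = np.triu_indices(n, k=1)
    keep = similarity[rows, cols] >= threshold
    return [(int(i), int(j)) for i, j in zip(rows[keep], cols[keep])]

# Hypothetical similarity matrix for 4 genes.
similarity = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.1],
    [0.1, 0.3, 0.1, 1.0],
])
edges = predict_edges_direct(similarity, threshold=0.7)
print(edges)  # [(0, 1), (1, 2)]
```

Lowering the threshold adds more predicted edges, which is what traces out the ROC curve used to evaluate the approach.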
19
Evaluation of the direct approach: using gene expression data
Gold standard data: metabolic network of 668 yeast genes in KEGG/Pathway
Expression data: 157 experiments (SMD)
[Figure: ROC curve; axes: false positives, true positives]
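A hedged sketch of this kind of evaluation: each gene pair gets a similarity score, the gold standard says whether the pair is a true edge, and the area under the ROC curve summarizes how well the scores rank true edges above non-edges. The helper names `pair_scores` and `roc_score` and the toy matrices are assumptions for illustration. The pairwise comparison used here is fine for small examples but would need a rank-based implementation for hundreds of genes.

```python
import numpy as np

def pair_scores(similarity: np.ndarray, adjacency: np.ndarray):
    """Flatten the upper triangle into (scores, is_true_edge) arrays."""
    rows, cols = np.triu_indices(similarity.shape[0], k=1)
    return similarity[rows, cols], adjacency[rows, cols].astype(bool)

def roc_score(scores, labels) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative pair")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Hypothetical similarity matrix and gold-standard network for 4 genes.
similarity = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.1],
    [0.1, 0.3, 0.1, 1.0],
])
gold = np.zeros((4, 4), dtype=int)
gold[0, 1] = gold[1, 0] = 1
gold[2, 3] = gold[3, 2] = 1

scores, labels = pair_scores(similarity, gold)
auc = roc_score(scores, labels)
print(auc)  # 0.5625
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of true edges above all non-edges.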
20
Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks
21
An illustration of formalism
[Figure labels: unknown pathway; protein network; similarity matrix in expression]
22
An illustration of formalism
[Figure labels: unknown pathway; protein network; similarity matrix in expression; training]
23
Supervised network inference
Key idea: use of partially known network information
[Figure: original space; legend: training set]
24
Supervised network inference
[Figure: original space; legend: training set, edge predicted by direct approach]
25
Supervised network inference
[Figure: original space; legend: training set, true edge]
26
Supervised network inference 1/2
Step 1: map proteins to a space where interacting proteins are close to each other
[Figure: original space and feature space; legend: training set, true edge]
27
Supervised network inference 2/2
[Figure: original space and feature space; legend: training set, test set, true edge]
28
Supervised network inference 2/2
Step 2: predict interacting protein pairs involving the test set
[Figure: original space and feature space; legend: training set, test set, true edge]
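The two steps can be illustrated with a deliberately simplified linear stand-in. This is not the kernel CCA or distance metric learning method cited in the talk; it is a sketch that assumes explicit feature vectors, learns a linear map from the known training network so that connected genes end up close together, and then scores test-to-training pairs by distance in the mapped space. The function names, the regularization value, and the toy data are all hypothetical.

```python
import numpy as np

def fit_linear_embedding(X_train, adj_train, n_dims=1, reg=1e-3):
    """Step 1: find directions where connected training genes stay close.

    Minimizes the sum over known edges of squared distances in the mapped
    space, relative to the total (regularized) scatter of the data.
    """
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    laplacian = np.diag(adj_train.sum(axis=1)) - adj_train
    smooth = Xc.T @ laplacian @ Xc                    # edge-distance term
    scatter = Xc.T @ Xc + reg * np.eye(Xc.shape[1])   # variance + ridge
    # Solve the generalized eigenproblem by whitening with scatter^(-1/2).
    evals, evecs = np.linalg.eigh(scatter)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    m = inv_sqrt @ smooth @ inv_sqrt
    _, vecs = np.linalg.eigh((m + m.T) / 2.0)
    W = inv_sqrt @ vecs[:, :n_dims]  # smallest eigenvalues come first
    return mean, W

def score_pairs(X_a, X_b, mean, W):
    """Step 2: score pairs by negative distance in the mapped space."""
    Za = (X_a - mean) @ W
    Zb = (X_b - mean) @ W
    dist = np.linalg.norm(Za[:, None, :] - Zb[None, :, :], axis=2)
    return -dist

# Hypothetical toy data: feature 0 reflects the network, feature 1 is noise.
X_train = np.array([
    [0.0, 1.0], [0.1, -1.0], [-0.1, 0.5], [0.0, -0.5],   # group A
    [5.0, 1.0], [5.1, -1.0], [4.9, 0.5], [5.0, -0.5],    # group B
])
adj_train = np.zeros((8, 8))
adj_train[:4, :4] = 1.0
adj_train[4:, 4:] = 1.0
np.fill_diagonal(adj_train, 0.0)

mean, W = fit_linear_embedding(X_train, adj_train)
X_test = np.array([[0.05, 0.0]])
scores = score_pairs(X_test, X_train, mean, W)
print(np.round(scores, 3))
```

In this toy case the test gene scores much higher against group A than group B, so its predicted edges go to group A. Thresholding the scores gives the predicted pairs involving the test set.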
29
Algorithm
- Kernel CCA (Yamanishi et al., 2004)
- Distance metric learning (Vert et al., 2004)
30
Result of the supervised learning: ROC curve by cross-validation
[Figure: ROC curves; panels: direct approach, supervised approach]
31
Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks
32
Various genomic data
Data: phylogenetic profile, localization data, gene expression
Structure: bit strings, numerical vectors
Gene-gene relationship: evolutionary similarity, co-localization similarity, co-expression similarity
33
Data of the yeast S. cerevisiae
- Expression: 6059 genes with 157 experiments (SMD database)
- Localization: 6059 proteins with 23 intracellular locations (Huh et al., 2003)
- Phylogenetic profile: 6059 proteins with 145 organisms (KEGG/Ortholog Cluster)
34
Gene expression profiles
         exp1  exp2  exp3  exp4  exp5  …  exp P
gene 1  (0.1,  0.4,  0.6,  0.2,  -0.3, …, 1.5)
gene 2  (0.2,  0.9,  1.8,  0.7,  -0.3, …, 0.4)
gene 3  (0.6,  0.7,  -1.0, 0.8,  1.2,  …, 0.6)
…
gene N  (1.2,  0.3,  1.9,  -0.1, -0.7, …, 0.1)
Numerical vectors of the gene expression ratio (rows: genes; columns: experiments or time series)
35
Phylogenetic profiles
         org1 org2 org3 org4 org5 … org P
gene 1  (1,   1,   0,   0,   0,   …, 1)
gene 2  (1,   0,   1,   0,   1,   …, 0)
gene 3  (0,   1,   0,   0,   1,   …, 0)
…
gene N  (1,   0,   1,   0,   0,   …, 1)
Bit strings in which the presence and absence of the genes are coded as 1 or 0 across organisms (rows: genes; columns: organisms)
36
An illustration of our network inference procedure
[Figure: INPUT (gene expression, protein localization, phylogenetic profile) → similarity matrix of genes → infer → OUTPUT (gene network)]
37
Data representation and integration
[Figure: genomic data; similarity matrix]
38
Evaluating the weight for each data source
1. Individual application to each data source
2. Evaluation of its biological relevance by the ROC score
ROC score: area under the ROC curve
[Figure: ROC curve]
39
Evaluating the weight by the ROC scores
For each data source, compute (ROC score - 0.5) and use it as the weight
[Figure panels: expression, localization, phylogenetic profile]
Evolutionary information seems to be useful
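The weighting rule can be sketched as follows. Each data source's weight is its ROC score minus 0.5, as stated above. Two things are assumptions here: normalizing the weights so they sum to 1, and combining the sources as a weighted sum of their similarity matrices. The ROC scores and matrices in the example are made-up placeholders, not results from the talk.

```python
import numpy as np

def roc_weights(roc_scores: dict) -> dict:
    """Weight each data source by (ROC score - 0.5), normalized to sum to 1."""
    raw = {name: score - 0.5 for name, score in roc_scores.items()}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("no data source performs better than random")
    return {name: value / total for name, value in raw.items()}

def integrate_kernels(kernels: dict, weights: dict) -> np.ndarray:
    """Combine similarity matrices as a weighted sum (an assumed scheme)."""
    return sum(weights[name] * kernels[name] for name in kernels)

# Placeholder ROC scores, chosen only to make the arithmetic easy to follow.
roc_scores = {"expression": 0.60, "localization": 0.55, "phylogenetic": 0.70}
weights = roc_weights(roc_scores)

# Placeholder 2 x 2 similarity matrices, one per data source.
kernels = {
    "expression": np.array([[1.0, 0.2], [0.2, 1.0]]),
    "localization": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "phylogenetic": np.array([[1.0, 0.9], [0.9, 1.0]]),
}
K_combined = integrate_kernels(kernels, weights)
print({name: round(w, 3) for name, w in weights.items()})
print(np.round(K_combined, 3))
```

Subtracting 0.5 means a source that ranks edges no better than chance contributes nothing, while more informative sources dominate the combined matrix.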
40
The resulting normalized weights:
The effect of data integration
[Figure: ROC curve]
41
Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks
42
Comprehensive prediction of a global gene network
We predicted a network of 6059 genes
Possible biological applications
1. Estimate unknown pathways
2. Predict biochemical function for hypothetical proteins
3. Identify missing enzyme genes
43
Prediction for a role in pathways
YJR137C (its detailed function was unknown as of Sep. 2003) is connected with EC:1.8.4.8 and EC:2.5.1.47 in the predicted network
44
Prediction for a role in pathways
Recently, YJR137C has been reported to be annotated as EC:1.8.1.2
45
Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks
46
Summary
- We developed supervised approaches to infer the metabolic network from multiple genomic data
- Accuracy improved with supervised learning and weighted data integration
- We showed some possibilities for obtaining new biological findings
47
Collaborators
For the methods
- Jean-Philippe Vert (Ecole des Mines)
- Minoru Kanehisa (Kyoto University)
For the biochemical experiments
- Hisaaki Mihara, Motoharu Ohsaki, Hisashi Muramatsu, Nobuyoshi Esaki (Kyoto University)