Published by Ada Daniel. Modified over 9 years ago.
1
Inferring strengths of protein-protein interactions from experimental data using linear programming
Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu
Bioinformatics Center, Kyoto University
2
Overview
- Background
- Probabilistic model
- Related work
- Biological experimental data
- Proposed methods
  - For binary data
  - For numerical data
- Results of computational experiments
- Conclusion
3
Background (1/3)
- Understanding protein-protein interactions helps in understanding protein functions.
- Transcription factors: proteins interact with a factor and regulate the gene.
- Receptors, etc.
4
Background (2/3)
- Various methods have been developed for inferring protein-protein interactions.
- Gene fusion/Rosetta stone (Enright et al. and Marcotte et al., 1999)
  - Applicable to only a limited number of genes.
- Molecular dynamics
  - Requires long CPU time.
  - Precise prediction is difficult.
5
Background (3/3)
- A model based on domain-domain interactions has been proposed.
- It uses domains defined by databases such as InterPro or Pfam.
[Figure label: Domain]
7
Probabilistic model of interaction (1/2)
- Model (Deng et al., 2002)
  - Two proteins interact if and only if at least one pair of their domains interacts.
  - Interactions between domains are independent events.
[Figure: proteins P_1 and P_2 with domains D_1, D_2, D_3, D_4]
8
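Probabilistic model of interaction (2/2)
$$\Pr(P_{ij}=1) = 1 - \prod_{(D_m,D_n) \in P_i \times P_j} \bigl(1 - \Pr(D_{mn}=1)\bigr)$$
- $P_{ij}=1$: proteins $P_i$ and $P_j$ interact
- $D_{mn}=1$: domains $D_m$ and $D_n$ interact
- $(D_m,D_n) \in P_i \times P_j$: domain pair $(D_m,D_n)$ is included in protein pair $P_i \times P_j$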
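The model on this slide can be illustrated with a minimal Python sketch. It assumes the two stated conditions (a protein pair interacts when at least one of its domain pairs interacts, and domain-pair interactions are independent), so that Pr(P_ij = 1) is one minus the product of (1 - Pr(D_mn = 1)) over the domain pairs in the protein pair. The function name and the example probabilities are hypothetical.

```python
from math import prod


def interaction_probability(domain_pair_probs):
    """Return Pr(P_ij = 1) given Pr(D_mn = 1) for every domain pair in P_i x P_j.

    Assumes independence between domain-pair interactions, so the proteins
    fail to interact only if every domain pair fails to interact.
    """
    return 1.0 - prod(1.0 - p for p in domain_pair_probs)


# Hypothetical protein pair that contains three domain pairs.
example_probs = [0.5, 0.2, 0.0]
print(interaction_probability(example_probs))
```

With no domain pairs the product is empty and the function returns 0.0, which matches the intuition that a protein pair without any shared domain pair cannot be predicted to interact under this model.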
9
Overview
- Background
- Probabilistic model
- Related work
  - Association method (Sprinzak et al., 2001)
  - EM method (Deng et al., 2002)
- Biological experimental data
- Proposed methods
- Results of computational experiments
- Conclusion
10
Related work
- INPUT:
  - interacting protein pairs (positive examples)
  - non-interacting protein pairs (negative examples)
- OUTPUT: $\Pr(D_{mn}=1)$ for all domain pairs
11
Association method (Sprinzak et al., 2001)
- Infers probabilities of domain-domain interactions from ratios of frequencies:
$$\Pr(D_{mn}=1) = \frac{I_{mn}}{N_{mn}}$$
- $I_{mn}$: number of interacting protein pairs that include $(D_m, D_n)$
- $N_{mn}$: number of protein pairs that include $(D_m, D_n)$
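The frequency ratio described here can be sketched in Python as follows. The sketch estimates Pr(D_mn = 1) as the number of interacting protein pairs that include (D_m, D_n) divided by the number of all protein pairs that include it. The data layout, the function name, and the choice to treat domain pairs as unordered are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def association_scores(protein_domains, protein_pairs, interacting):
    """Estimate Pr(D_mn = 1) as (interacting pairs with D_mn) / (all pairs with D_mn)."""
    n_total = Counter()        # protein pairs that include the domain pair
    n_interacting = Counter()  # interacting protein pairs that include it
    for a, b in protein_pairs:
        # Unordered domain pairs contained in this protein pair (assumption).
        domain_pairs = {
            tuple(sorted(dp))
            for dp in product(protein_domains[a], protein_domains[b])
        }
        for dp in domain_pairs:
            n_total[dp] += 1
            if (a, b) in interacting:
                n_interacting[dp] += 1
    return {dp: n_interacting[dp] / n_total[dp] for dp in n_total}


# Hypothetical toy data.
protein_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D1"]}
protein_pairs = [("P1", "P2"), ("P3", "P2")]
interacting = {("P1", "P2")}
scores = association_scores(protein_domains, protein_pairs, interacting)
print(scores)
```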
12
EM method (Deng et al., 2002)
- Consider the probability (likelihood $L$) that the experimental data $\{O_{ij} \in \{0,1\}\}$ are observed.
- Use the EM algorithm to (locally) maximize $L$.
- Estimate $\Pr(D_{mn}=1)$.
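The likelihood itself is not preserved in this transcript. A plausible form, assuming each observation $O_{ij}$ is an independent Bernoulli outcome, is sketched below. The false-positive and false-negative rates $fp$ and $fn$ are assumed symbols following the usual presentation of this kind of model, not something stated on the slide.

```latex
\begin{align*}
L &= \prod_{i,j} \Pr(O_{ij}=1)^{O_{ij}} \,\bigl(1 - \Pr(O_{ij}=1)\bigr)^{1 - O_{ij}}, \\
\Pr(O_{ij}=1) &= \Pr(P_{ij}=1)\,(1 - fn) + \bigl(1 - \Pr(P_{ij}=1)\bigr)\, fp .
\end{align*}
```

Because $\Pr(P_{ij}=1)$ is a product over domain pairs, $L$ is not concave in the parameters $\Pr(D_{mn}=1)$, which is consistent with the slide's remark that EM finds only a local maximum.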
14
Biological experimental data
- Related methods (Association and EM) use only binary data (interact or not).
- Experimental data from yeast two-hybrid experiments
  - Ito et al. (2000, 2001)
  - Uetz et al. (2001)
- For many protein pairs, different results ($O_{ij} \in \{0,1\}$) were observed.
- We developed new methods that use the raw numerical data.
15
Numerical data
- Ito et al. (2000, 2001)
  - For each protein pair, experiments were performed multiple times.
  - IST (Interaction Sequence Tag): number of observed interactions
- Applying a threshold to the IST count yields binary data.
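The thresholding step can be sketched as below. The threshold value, the data layout, and the choice of "greater than or equal" rather than "strictly greater" are all assumptions, since the slide does not specify them.

```python
def binarize(ist_counts, threshold):
    """Label a protein pair as interacting (1) when its IST count reaches the threshold."""
    return {pair: int(count >= threshold) for pair, count in ist_counts.items()}


# Hypothetical IST counts for two protein pairs.
ist_counts = {("P1", "P2"): 5, ("P1", "P3"): 1}
labels = binarize(ist_counts, threshold=3)
print(labels)
```

Thresholding discards how often an interaction was observed, which is the information the numerical-data methods aim to keep.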
17
Proposed methods
- It seems difficult to modify the EM method for numerical data.
- Linear programming
- For binary data
  - LPBN
  - Combined methods: LPEM, EMLP
  - SVM-based method
- For numerical data
  - ASNM
  - LPNM
19
LPBN (LP-based method) (1/2)
- Transformation into linear inequalities
- Case: $P_i$ and $P_j$ interact
[Equations not preserved in this transcript]
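The inequalities themselves are missing from this transcript. One way such a transformation can work is sketched below, assuming that "interact" means $\Pr(P_{ij}=1)$ is at least some threshold $\Theta$ with $0 < \Theta < 1$, and that every $\Pr(D_{mn}=1) < 1$ so the logarithms are finite. The symbols $\Theta$ and $x_{mn}$ are introduced here for illustration and are not taken from the slide.

```latex
\begin{align*}
\Pr(P_{ij}=1) \ge \Theta
  &\iff 1 - \prod_{(D_m,D_n) \in P_i \times P_j} \bigl(1 - \Pr(D_{mn}=1)\bigr) \ge \Theta \\
  &\iff \prod_{(D_m,D_n) \in P_i \times P_j} \bigl(1 - \Pr(D_{mn}=1)\bigr) \le 1 - \Theta \\
  &\iff \sum_{(D_m,D_n) \in P_i \times P_j} \ln\bigl(1 - \Pr(D_{mn}=1)\bigr) \le \ln(1 - \Theta) \\
  &\iff \sum_{(D_m,D_n) \in P_i \times P_j} x_{mn} \le \ln(1 - \Theta),
  \qquad x_{mn} = \ln\bigl(1 - \Pr(D_{mn}=1)\bigr) \le 0 .
\end{align*}
```

Taking logarithms turns the product into a sum, so the condition becomes linear in the variables $x_{mn}$. For a non-interacting pair the same steps give the reverse inequality.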
20
LPBN (LP-based method) (2/2)
- Linear programming for inference of protein-protein interactions
[LP formulation not preserved in this transcript]
21
Combination of EM and LPBN
- LPEM method
  - Uses the results of LPBN as initial parameter values for EM.
- EMLP method
  - Adds the following inequalities to LPBN as constraints so that the LP solutions stay close to the EM solutions.
  [Inequalities not preserved in this transcript]
22
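Simple SVM-based method
- Feature vector [definition not preserved in this transcript]
- Simple linear kernel with [expression not preserved in this transcript]
- Interacting pairs = positive examples
- Non-interacting pairs = negative examples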
24
Strength of protein-protein interaction
- For each protein pair, experiments were performed multiple times.
- The ratio $K_{ij}/M_{ij}$ can be considered the strength of the interaction.
- $K_{ij}$: number of observed interactions for a protein pair $(P_i, P_j)$
- $M_{ij}$: number of experiments for $(P_i, P_j)$
25
LPNM method (1/2)
- Minimize the gap between $\Pr(P_{ij}=1)$ and $K_{ij}/M_{ij}$ using LP.
26
LPNM method (2/2)
- Linear programming for inference of strengths of protein-protein interactions
[LP formulation not preserved in this transcript]
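The LP on this slide is not preserved in the transcript. A minimal sketch of how such a program could look is given below. It is an assumption, not the authors' formulation. It works in log space with $x_{mn} = \ln(1 - \Pr(D_{mn}=1))$, writes the observed strength as $\rho_{ij} = K_{ij}/M_{ij}$ (assumed to be below 1, or clipped, so the logarithm is finite), and uses slack variables $\alpha_{ij}, \beta_{ij}$ to measure the gap in each direction.

```latex
\begin{align*}
\text{minimize}   \quad & \sum_{i,j} \bigl(\alpha_{ij} + \beta_{ij}\bigr) \\
\text{subject to} \quad & \sum_{(D_m,D_n) \in P_i \times P_j} x_{mn} + \alpha_{ij} - \beta_{ij}
                          = \ln\bigl(1 - \rho_{ij}\bigr) \quad \text{for every training pair } (P_i, P_j), \\
                        & x_{mn} \le 0, \qquad \alpha_{ij} \ge 0, \qquad \beta_{ij} \ge 0 .
\end{align*}
```

In this sketch the objective is the total absolute deviation between $\ln(1 - \Pr(P_{ij}=1))$ and $\ln(1 - \rho_{ij})$, so the gap is minimized in log space rather than directly on the probabilities. After solving, $\Pr(D_{mn}=1)$ would be recovered as $1 - e^{x_{mn}}$.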
27
ASNM
- Modified Association method for numerical data
- Association method for binary data (Sprinzak et al., 2001)
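The ASNM formula is missing from this transcript. One natural way to extend the Association ratio to numerical data, assumed here for illustration, is to replace the count of interacting protein pairs with the sum of the observed ratios K_ij / M_ij, so that each domain pair is scored by the average strength of the protein pairs that contain it. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from itertools import product


def asnm_scores(protein_domains, observations):
    """Score each domain pair by the mean of K_ij / M_ij over protein pairs containing it.

    observations maps (P_i, P_j) -> (K_ij, M_ij), with M_ij > 0.
    """
    strength_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for (a, b), (k, m) in observations.items():
        strength = k / m  # observed interaction strength of this protein pair
        # Unordered domain pairs contained in this protein pair (assumption).
        domain_pairs = {
            tuple(sorted(dp))
            for dp in product(protein_domains[a], protein_domains[b])
        }
        for dp in domain_pairs:
            strength_sum[dp] += strength
            pair_count[dp] += 1
    return {dp: strength_sum[dp] / pair_count[dp] for dp in pair_count}


# Hypothetical toy data: (K_ij, M_ij) per protein pair.
toy_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D1"]}
toy_observations = {("P1", "P2"): (3, 4), ("P3", "P2"): (1, 4)}
asnm = asnm_scores(toy_domains, toy_observations)
print(asnm)
```

When every ratio is 0 or 1, this sketch reduces to the binary Association ratio.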
29
Computational experiments for binary data
- DIP database (Xenarios et al., 2002)
  - 1767 protein pairs as positive examples
  - 2/3 of the pairs for training, 1/3 for testing
- Computational environment
  - Xeon processor, 2.8 GHz
  - LP solver: loqo
30
Results on training data (binary data)
[Figure: results for SVM, EM, LPBN, and Association]
31
Results on test data (binary data)
[Figure: results for SVM, EM, EMLP, Association, and LPEM]
32
Computational experiments for numerical data
- YIP database (Ito et al., 2001, 2002)
  - IST (Interaction Sequence Tag)
  - 1586 protein pairs
  - 4/5 for training, 1/5 for testing
- Computational environment
  - Xeon processor, 2.8 GHz
  - LP solver: lp_solve
33
Results on test data (numerical data)
[Figure: results for ASNM, EM, LPNM, and Association]
34
Results on test data (numerical data)
- LPNM has the lowest average error.
- EM and Association methods classify $\Pr(P_{ij}=1)$ into either 0 or 1.

             LPNM     ASNM     EM      ASSOC
Ave. Error   0.0308   0.0405   0.295   0.277
CPU (sec.)   1.20     0.0077   1.62    0.0088
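The slide does not define "Ave. Error". A minimal sketch of one plausible definition, assumed here, is the mean absolute difference between the predicted Pr(P_ij = 1) and the observed ratio K_ij / M_ij over the test pairs. The function name and the toy values are hypothetical and are not the reported results.

```python
def average_error(predicted, observed):
    """Mean absolute difference between predicted and observed strengths.

    Both arguments map a protein pair to a value in [0, 1]; every pair in
    `observed` must also appear in `predicted`.
    """
    return sum(abs(predicted[p] - observed[p]) for p in observed) / len(observed)


# Hypothetical predictions and observed ratios for two test pairs.
predicted = {("P1", "P2"): 0.5, ("P1", "P3"): 1.0}
observed = {("P1", "P2"): 0.25, ("P1", "P3"): 0.75}
err = average_error(predicted, observed)
print(err)
```

Under a definition like this, a method that outputs only 0 or 1 is penalized on every pair whose observed ratio lies strictly between 0 and 1, which would be consistent with the larger errors listed for EM and Association.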
35
Conclusion
- We have defined a new problem: inferring strengths of protein-protein interactions.
- We have proposed LP-based methods.
  - For binary data: LPBN, LPEM, EMLP, and an SVM-based method
  - For numerical data: ASNM, LPNM
- LPNM outperformed the other methods.
36
Future work
- Improve the methods to avoid overfitting.
- Improve the probabilistic model to understand protein-protein interactions more accurately.