Download presentation
Presentation is loading. Please wait.
Published byCori Berry Modified over 9 years ago
1
(c) M Gerstein '06, gerstein.info/talks 1 CS/CBB 545 - Data Mining Predicting Networks through Bayesian Integration #1 - Theory Mark Gerstein, Yale University gersteinlab.org/courses/545 (class 2007,02.27 14:30-15:45)
2
(c) M Gerstein '06, gerstein.info/talks 2 Predicting Networks via Bayesian Integration: Problem Motivation
3
(c) M Gerstein '06, gerstein.info/talks 3 Combining networks forms an ideal way of integrating diverse information Metabolic pathway Transcriptional regulatory network Physical protein- protein Interaction Co-expression Relationship Part of the TCA cycle Genetic interaction (synthetic lethal) Signaling pathways
4
(c) M Gerstein '06, gerstein.info/talks 4 Binding Experiments on Subunit Pairs 1111010 Pull-down 2 1101010110111001010000000000 1101010 Pull-down 1 1101010110111101010000010000 0011111101 Cross-linking 1111111111 1 Pull-down 3 11000110 Far Western 3 100100 Far Western 2 111111111100000100000000000000 11111001 Far Western 1 0000000 Interaction experiments before structure was known 11111111123566666888899910 12 23568910111235689101112568910111268910111289101112910111210111211 12 222222233333355555 Subunits
5
(c) M Gerstein '06, gerstein.info/talks 5 RNA polymerase II: Structure Which subunits interact? Based on Binding experiments Source: Cramer et al., 2000, Science, 288:632-633 Compare with Gold Std. Structure 5 6 8 9 1011 12 1 2 3 Source: Edwards et al., 2002, Trends in Genetics
6
(c) M Gerstein '06, gerstein.info/talks 6 Gold-Standard Positives 5 6 8 9 1011 12 1 2 3 Gold-Standard Positive (GSTD+): 13 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits
7
(c) M Gerstein '06, gerstein.info/talks 7 Gold-Standard Negatives Gold-Standard Negative (GSTD-): 32 5 6 8 9 1011 12 1 2 3 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits
8
(c) M Gerstein '06, gerstein.info/talks 8 RNA Polymerase II: Gold-Standards Gold-Standard Negative (GSTD-): 32 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits Gold-Standard Positive (GSTD+): 13 5 6 8 9 1011 12 1 2 3
9
(c) M Gerstein '06, gerstein.info/talks 9 Assess Quality and Coverage of PPints 1111010 Pull-down 2 1101010110111001010000000000 1101010 Pull-down 1 1101010110111101010000010000 0011111101 Cross-linking 1111111111 1 Pull-down 3 11000110 Far Western 3 100100 Far Western 2 111111111100000100000000000000 11111001 Far Western 1 0000000 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits True False GSTD+ GSTD- Likelihood Ratio for Feature f: “Quality Score”
10
(c) M Gerstein '06, gerstein.info/talks 10 Data integration: RNA polymerase II
11
(c) M Gerstein '06, gerstein.info/talks 11 Integrate using naive Bayes classifier Data integration: RNA polymerase II
12
(c) M Gerstein '06, gerstein.info/talks 12 Data integration: RNA polymerase II Integrate using naive Bayes classifier
13
(c) M Gerstein '06, gerstein.info/talks 13 Data integration: RNA polymerase II Integrate using naive Bayes classifier (Cross validate)
14
(c) M Gerstein '06, gerstein.info/talks 14 Weighted Voting: the Likelihood Ratio Maj. Vote: 0 = round(avg( 0 + 0 + 0 + 1 + 1 + 0 + 0 ) With weights: likelihood ratio L = L 1 + L 2 + L 3 …
15
(c) M Gerstein '06, gerstein.info/talks 15 Predicting Networks via Bayesian Integration: Intuition & Formalism
16
(c) M Gerstein '06, gerstein.info/talks 16 Classification by Voting (N1) Derived from "perceptron model" in earlier class R = + b
17
(c) M Gerstein '06, gerstein.info/talks 17 Classification by Voting (N2)
18
(c) M Gerstein '06, gerstein.info/talks 18 Bayes Rule Which is shorthand for: [From Mitchell, Machine Learning]
19
(c) M Gerstein '06, gerstein.info/talks 19 More Bayes Rule
20
(c) M Gerstein '06, gerstein.info/talks 20 More Bayes Rule
21
(c) M Gerstein '06, gerstein.info/talks 21 Feature Correlation and Fully Connected Bayes
22
(c) M Gerstein '06, gerstein.info/talks 22 Estimating Probabilities We have so far estimated P(X=x | Y=y) by the fraction n x|y /n y, where n y is the number of instances for which Y=y and n x|y is the number of these for which X=x This is a problem when n x is small E.g., assume P(X=x | Y=y)=0.05 and the training set is s.t. that n y =5. Then it is highly probable that n x|y =0 The fraction is thus an underestimate of the actual probability It will dominate the Bayes classifier for all new queries with X=x
23
(c) M Gerstein '06, gerstein.info/talks 23 m-estimate Replace n x|y /n y by: Where p is our prior estimate of the probability we wish to determine and m is a constant Typically, p = 1/k (where k is the number of possible values of X) m acts as a weight (similar to adding m virtual instances distributed according to p)
24
(c) M Gerstein '06, gerstein.info/talks 24 Training and Testing Set Cross Validation: Leave one out, seven-fold Credits: Munson, 1995; Garnier et al., 1996 4-fold Cross Validation
25
(c) M Gerstein '06, gerstein.info/talks 25 Intuition N2.7 : ROC Curve & the prior
26
(c) M Gerstein '06, gerstein.info/talks 26 Predicting Networks via Bayesian Integration: Worked Examples
27
(c) M Gerstein '06, gerstein.info/talks 27 Network Gold-Standards Prediction of protein interactions: Bayesian integration [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard –
28
(c) M Gerstein '06, gerstein.info/talks 28 Network Gold-Standards [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Prediction of protein interactions: Bayesian integration Frac. of Gold-Std Negatives with Feature Frac. of Gold-Std Positives with Feature Likelihood Ratio for Feature i: = "Quality Score" =
29
(c) M Gerstein '06, gerstein.info/talks 29 Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Network Gold-Standards L 1 = (4/ 4 )/(3/6) =2 [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Likelihood Ratio for Feature i: Prediction of protein interactions: Bayesian integration
30
(c) M Gerstein '06, gerstein.info/talks 30 Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Network Gold-Standards L 1 = ( 4 /4)/(3/6) =2 [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Likelihood Ratio for Feature i: Prediction of protein interactions: Bayesian integration
31
(c) M Gerstein '06, gerstein.info/talks 31 Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Network Gold-Standards L 1 = (4/4)/(3/ 6 ) =2 [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Likelihood Ratio for Feature i: Prediction of protein interactions: Bayesian integration
32
(c) M Gerstein '06, gerstein.info/talks 32 Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Network Gold-Standards L 1 = (4/4)/( 3 /6) =2 [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Likelihood Ratio for Feature i: Prediction of protein interactions: Bayesian integration
33
(c) M Gerstein '06, gerstein.info/talks 33 Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Network Gold-Standards [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Likelihood Ratio for Feature i: Prediction of protein interactions: Bayesian integration L 1 = (4/4)/(3/6) =2
34
(c) M Gerstein '06, gerstein.info/talks 34 Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Network Gold-Standards L 1 = (4/4)/(3/6) =2 L2 = (3/4)/(3/6) =1.5 For each protein pair: LR = L1 L2 log(LR) = log(L 1 ) + log(L 2 ) [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Likelihood Ratio for Feature i: Prediction of protein interactions: Bayesian integration Weighted Voting (Assuming uncorrelated features and Naïve Bayes)
35
(c) M Gerstein '06, gerstein.info/talks 35 Feature 2, e.g. Y2H Feature 1, e.g. co-expression Gold-standard + Gold-standard – Network Gold-Standards L 1 = (4/4)/(3/6) =2 L2 = (3/4)/(3/6) =1.5 For each protein pair: LR = L1 L2 log(LR) = log(L 1 ) + log(L 2 ) [Jansen, Yu, et al., Science; Yu, et al., Genome Res.] Likelihood Ratio for Feature i: Prediction of protein interactions: Bayesian integration
36
(c) M Gerstein '06, gerstein.info/talks 36 4 4 3 0 TPLR Cutoffs 4 3 2 1 Feature 1; L 1 = 2 Feature 2; L 2 = 1.5 Gold-standard + Gold-standard – Predicted Positive Predicted Negative LR cutoffs 1 No Prediction made Protein Bayesian Integration and ROC Curves 2 <3 22 1 3< 3 <2 3/2< 4 3/2
37
(c) M Gerstein '06, gerstein.info/talks 37 6 3 0 0 FPLR Cutoffs 4 3 2 1 Feature 1; L 1 = 2 Feature 2; L 2 = 1.5 Gold-standard + Gold-standard – Predicted Positive Predicted Negative LR cutoffs 1 No Prediction made Protein Bayesian Integration and ROC Curves 2 <3 22 1 3< 3 <2 3/2< 4 3/2
38
(c) M Gerstein '06, gerstein.info/talks 38 64 34 03 00 FPTPLR Cutoffs 4 3 2 1 Feature 1; L 1 = 2 Feature 2; L 2 = 1.5 Gold-standard + Gold-standard – Predicted Positive Predicted Negative LR cutoffs 1 No Prediction made Protein 4 3 2 1 FPR=FP/N "Error Rate" TPR=TP/P "Coverage" ROC Curve 1/21 1 Bayesian Integration and ROC Curves 2 <3 22 1 3< 3 <2 3/2< 4 3/2
39
(c) M Gerstein '06, gerstein.info/talks 39 Likelihood Ratios 1101010 Pull-down 1 1101010110111101010000010000 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits True False GSTD+ GSTD- Likelihood Ratio for Feature f:
40
(c) M Gerstein '06, gerstein.info/talks 40 Calculating Likelihood Ratios 10100 Pull-down 1 10111 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits True False GSTD+ GSTD-
41
(c) M Gerstein '06, gerstein.info/talks 41 Calculating Likelihood Ratios 11 Pull-down 1 10101101101010000010000 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits True False GSTD+ GSTD- =1.34 1101010 Pull-down 1 1101010110111101010000010000 =0.70
42
(c) M Gerstein '06, gerstein.info/talks 42 Calculating Likelihood Ratios 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits True False GSTD+ GSTD- Pull-down 1 L1 = (6/13) / (11/32) = 1.34L0 = (4/13) / (14/32) = 0.70 Pull-down 2 L1 = (7/13) / (9/32) = 1.91L0 = (2/13) / (16/32) = 0.31 Pull-down 3 L1 = (2/13) / (3/32) = 1.64L0 = (2/13) / (2/32) = 2.46 Cross-linking L1 = (10/13) / (7/32) = 3.52L0 = (0/13) / (3/32) = 0 Far Western 1 L1 = (2/13) / (4/32) = 1.23L0 = (3/13) / (6/32) = 1.23 Far Western 2 L1 = (6/13) / (5/32) = 2.95L0 = (2/13) / (17/32) = 0.29 Far Western 3 L1 = (1/13) / (1/32) = 2.46L0 = (2/13) / (2/32) = 2.46 1111010 Pull-down 2 1101010110111001010000000000 1101010 Pull-down 1 1101010110111101010000010000 0011111101 Cross-linking 1111111111 1 Pull-down 3 11000110 Far Western 3 100100 Far Western 2 111111111100000100000000000000 11111001 Far Western 1 0000000
43
(c) M Gerstein '06, gerstein.info/talks 43 Data Integration: ROC-Curve 001011100 Combined (Bayes) 111110111000010000000000000000000000 1111010 Pull-down 2 1101010110111001010000000000 1101010 Pull-down 1 1101010110111101010000010000 0011111101 Cross-linking 1111111111 1 Pull-down 3 11000110 Far Western 3 100100 Far Western 2 111111111100000100000000000000 11111001 Far Western 1 0000000 3.5218.211.113.926.60.220 0.290.060.120.220.290.06 0.22 0.360.080.910.2200.522.1613219.50.533.685.5212.710.42.2526.60.2202.2511.118.210.42.250.060.29 0 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits True False GSTD+ GSTD- “Weighted Voting”
44
(c) M Gerstein '06, gerstein.info/talks 44 Data Integration: ROC Curve 001011100 Combined (Bayes) 111110111000010000000000000000000000 3.5218.211.113.926.60.220 0.290.060.120.220.290.06 0.22 0.360.080.910.2200.522.1613219.50.533.685.5212.710.42.2526.60.2202.2511.118.210.42.250.060.29 0 111010101 Majority 111110111000010000000000000000000000 001010000 Intersection 111100111000010000000000000000000000 111111111 Union 111111111100011100100100000100000000 0.2 0.4 0.6 0.8 1.0 0.20.40.60.81.0 FPR=FP/N=1-Specificity TPR=TP/P=Sensitivity ROC Curves of RNA Polymerase II 11111111123566666888899910 12222222233333355555 Subunits 23568910111235689101112568910111268910111289101112910111210111211 12 Subunits True False GSTD+ GSTD- Bayesian Integration Majority Intersection Union
45
(c) M Gerstein '06, gerstein.info/talks 45 Predicting Networks via Bayesian Integration: Features Correlation
46
(c) M Gerstein '06, gerstein.info/talks 46 Correlations between similar features
47
(c) M Gerstein '06, gerstein.info/talks 47 Intuition N3 : Feature Correlation
48
(c) M Gerstein '06, gerstein.info/talks 48 Naive Bayes
49
(c) M Gerstein '06, gerstein.info/talks 49 A ‘correct’ factorisation
50
(c) M Gerstein '06, gerstein.info/talks 50 A Typical BBN
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.