1 Data Mining with Bayesian Networks (I)
Instructor: Qiang Yang, Hong Kong University of Science and Technology
Qyang@cs.ust.hk
Thanks: Dan Weld, Eibe Frank
2 Goal: Mining Probability Models
Probability Basics
Our state s in world W is distributed according to a probability distribution:
0 <= Pr(s) <= 1 for all s ∈ S
Σ_s Pr(s) = 1
For subsets S1 and S2: Pr(S1 ∪ S2) = Pr(S1) + Pr(S2) - Pr(S1 ∩ S2)
Bayes Rule: Pr(B|A) = Pr(A|B) Pr(B) / Pr(A)
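For example, a toy sanity check of my own (the four states and their probabilities are made up):

```python
# Hypothetical four-state world: probabilities are non-negative and sum to 1.
pr = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.4}

S1, S2 = {"s1", "s2"}, {"s2", "s3"}

# Inclusion-exclusion: Pr(S1 u S2) = Pr(S1) + Pr(S2) - Pr(S1 n S2)
lhs = sum(pr[s] for s in S1 | S2)
rhs = (sum(pr[s] for s in S1) + sum(pr[s] for s in S2)
       - sum(pr[s] for s in S1 & S2))
print(lhs, rhs)  # both 0.6
```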
3 Weather data set

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
overcast  hot          high      FALSE  yes
rainy     mild         high      FALSE  yes
rainy     cool         normal    FALSE  yes
rainy     cool         normal    TRUE   no
overcast  cool         normal    TRUE   yes
sunny     mild         high      FALSE  no
sunny     cool         normal    FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
overcast  mild         high      TRUE   yes
overcast  hot          normal    FALSE  yes
rainy     mild         high      TRUE   no
4 Basics
Unconditional or Prior Probability
Pr(Play=yes) + Pr(Play=no) = 1
Pr(Play=yes) is sometimes written as Pr(Play)
The table has 9 yes and 5 no, so Pr(Play=yes) = 9/(9+5) = 9/14, and thus Pr(Play=no) = 5/14
Joint probability of Play and Windy: Pr(Play=x, Windy=y), summed over all values x and y, should be 1

          Windy=True  Windy=False
Play=yes  3/14        6/14
Play=no   ?           ?
5 Probability Basics
Conditional Probability: Pr(A|B)
#(Windy=False) = 8; within those 8, #(Play=yes) = 6
Pr(Play=yes | Windy=False) = 6/8
Pr(Windy=False) = 8/14, Pr(Play=yes) = 9/14
Applying Bayes Rule, Pr(B|A) = Pr(A|B) Pr(B) / Pr(A):
Pr(Windy=False | Play=yes) = (6/8)(8/14) / (9/14) = 6/9

Windy   Play
*FALSE  no
TRUE    no
*FALSE  *yes
*FALSE  *yes
*FALSE  *yes
TRUE    no
TRUE    yes
*FALSE  no
*FALSE  *yes
*FALSE  *yes
TRUE    yes
TRUE    yes
*FALSE  *yes
TRUE    no
(the first * marks the 8 Windy=False rows; the second * marks the 6 of those with Play=yes)
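To make the last two slides concrete, here is a small Python sketch (my own, not part of the deck) that recomputes these numbers directly from the weather table:

```python
# Weather data as (Outlook, Temperature, Humidity, Windy, Play) tuples.
weather = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

n = len(weather)                                  # 14 examples
p_play = sum(r[4] == "yes" for r in weather) / n  # Pr(Play=yes) = 9/14

windy_false = [r for r in weather if not r[3]]
p_wf = len(windy_false) / n                       # Pr(Windy=False) = 8/14
p_play_given_wf = sum(r[4] == "yes" for r in windy_false) / len(windy_false)  # 6/8

# Bayes rule: Pr(Windy=False | Play=yes)
#   = Pr(Play=yes | Windy=False) Pr(Windy=False) / Pr(Play=yes) = 6/9
p_wf_given_play = p_play_given_wf * p_wf / p_play
print(p_play, p_wf, p_play_given_wf, p_wf_given_play)
```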
6 Conditional Independence
"A and P are independent given C": Pr(A | P, C) = Pr(A | C)
Variables: Cavity (C) causes both Ache (A) and Probe Catches (P)

C  A  P  Probability
F  F  F  0.534
F  F  T  0.356
F  T  F  0.006
F  T  T  0.004
T  F  F  0.012
T  F  T  0.048
T  T  F  0.008
T  T  T  0.032
7 Conditional Independence
"A and P are independent given C": Pr(A | P, C) = Pr(A | C), and also Pr(P | A, C) = Pr(P | C)
Suppose C = True:
Pr(A|C) = (0.032 + 0.008) / (0.048 + 0.012 + 0.032 + 0.008) = 0.04 / 0.1 = 0.4
Pr(A|P,C) = 0.032 / (0.032 + 0.048) = 0.032 / 0.080 = 0.4

C  A  P  Probability
F  F  F  0.534
F  F  T  0.356
F  T  F  0.006
F  T  T  0.004
T  F  F  0.012
T  F  T  0.048
T  T  F  0.008
T  T  T  0.032
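A quick numeric check (my own addition) that the joint table above really satisfies Pr(A | P, C) = Pr(A | C); the dictionary is just the eight rows keyed by (C, A, P):

```python
joint = {
    (False, False, False): 0.534, (False, False, True): 0.356,
    (False, True,  False): 0.006, (False, True,  True): 0.004,
    (True,  False, False): 0.012, (True,  False, True): 0.048,
    (True,  True,  False): 0.008, (True,  True,  True): 0.032,
}

def pr(pred):
    """Sum the joint over all (c, a, p) rows satisfying pred."""
    return sum(v for (c, a, p), v in joint.items() if pred(c, a, p))

p_a_given_c = pr(lambda c, a, p: c and a) / pr(lambda c, a, p: c)
p_a_given_pc = pr(lambda c, a, p: c and a and p) / pr(lambda c, a, p: c and p)
print(p_a_given_c, p_a_given_pc)  # both 0.4, so conditioning on P changes nothing
```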
8 Conditional Independence
Can encode the joint probability distribution in compact form
Network: Cavity -> Ache, Cavity -> Probe Catches

C  A  P  Probability
F  F  F  0.534
F  F  T  0.356
F  T  F  0.006
F  T  T  0.004
T  F  F  0.012
T  F  T  0.048
T  T  F  0.008
T  T  T  0.032

Conditional probability tables (CPTs):
P(C) = .1

C  P(P)     C  P(A)
T  0.8      T  0.4
F  0.4      F  0.02
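A small sketch (my own) of what the compact form buys us: rebuilding the eight joint entries from Pr(C) and the two CPTs via Pr(C, A, P) = Pr(C) Pr(A|C) Pr(P|C). Note the joint table itself implies Pr(A=T | C=F) = 0.01/0.9 ≈ 0.011, which I use below so the reconstruction matches the table to three decimals:

```python
from itertools import product

p_c = 0.1                                 # Pr(C=T), as implied by the joint table
p_a_given_c = {True: 0.4, False: 0.011}   # Pr(A=T | C)
p_p_given_c = {True: 0.8, False: 0.4}     # Pr(P=T | C)

for c, a, p in product([False, True], repeat=3):
    prob = ((p_c if c else 1 - p_c)
            * (p_a_given_c[c] if a else 1 - p_a_given_c[c])
            * (p_p_given_c[c] if p else 1 - p_p_given_c[c]))
    print(c, a, p, round(prob, 3))  # reproduces the eight-row table above
```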
9 Creating a Network
1: Bayes net = representation of a JPD
2: Bayes net = set of conditional independence statements
If you create a correct structure, one that represents causality, then you get a good network:
one that is small, and hence easy to compute with
one whose numbers are easy to fill in
10 Example
My house alarm system just sounded (A). Both an earthquake (E) and a burglary (B) could set it off.
John will probably hear the alarm; if so he'll call (J). But sometimes John calls even when the alarm is silent.
Mary might hear the alarm and call too (M), but not as reliably.
We could be assured a complete and consistent model by fully specifying the joint distribution:
Pr(A, E, B, J, M)
Pr(A, E, B, J, ~M)
etc.
11 Structural Models (HK book 7.4.3)
Instead of starting with numbers, we will start with structural relationships among the variables:
There is a direct causal relationship from Earthquake to Alarm
There is a direct causal relationship from Burglar to Alarm
There is a direct causal relationship from Alarm to JohnCall
Earthquake and Burglar tend to occur independently
etc.
12 Possible Bayesian Network
Burglary -> Alarm <- Earthquake
Alarm -> JohnCalls
Alarm -> MaryCalls
13 Graphical Models and Problem Parameters
What probabilities need I specify to ensure a complete, consistent model, given
the variables I have identified, and
the dependence and independence relationships I have specified by building a graph structure?
Answer:
provide an unconditional (prior) probability for every node in the graph with no parents
for all remaining nodes, provide a conditional probability table, Prob(Child | Parent1, Parent2, Parent3), for all possible combinations of Parent1, Parent2, Parent3 values
(a counting sketch follows below)
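For instance, here is a minimal sketch (my own, assuming all-Boolean nodes) of how few numbers this recipe requires for the alarm network:

```python
# Parents of each node in the alarm network of the previous slide.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

# A Boolean node needs one probability per combination of parent values:
# a prior for parentless nodes (2**0 = 1 number), a CPT row otherwise.
for node, ps in parents.items():
    print(node, 2 ** len(ps))

total = sum(2 ** len(ps) for ps in parents.values())
print(total)  # 1 + 1 + 4 + 2 + 2 = 10 numbers, versus 2**5 - 1 = 31 for the raw joint
```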
14 Complete Bayesian Network
Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls; Alarm -> MaryCalls

P(B) = .001    P(E) = .002

B  E  P(A)
T  T  .95
T  F  .94
F  T  .29
F  F  .01

A  P(J)     A  P(M)
T  .90      T  .70
F  .05      F  .01
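As a quick illustration (mine, not on the slide), the factored form turns the probability of any complete assignment into a product of CPT entries, e.g. Pr(J, M, A, ~B, ~E) = Pr(J|A) Pr(M|A) Pr(A|~B,~E) Pr(~B) Pr(~E):

```python
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.01}  # Pr(A=T | B, E)
p_j = {True: 0.90, False: 0.05}                    # Pr(J=T | A)
p_m = {True: 0.70, False: 0.01}                    # Pr(M=T | A)

# Both neighbors call, the alarm sounded, but no burglary and no earthquake:
prob = (p_j[True] * p_m[True] * p_a[(False, False)]
        * (1 - p_b) * (1 - p_e))
print(prob)  # 0.90 * 0.70 * 0.01 * 0.999 * 0.998 ≈ 0.0063
```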
15 Microsoft Bayesian Belief Net
http://research.microsoft.com/adapt/MSBNx/
Can be used to construct and reason with Bayesian Networks
Consider the example
[Slides 16-19: MSBNx screenshots; no text content]
20 Mining for Structural Models
Learning problem
Some methods have been proposed
It is a difficult problem, and often requires a domain expert's knowledge
Once set up, a Bayesian Network can be used to answer probabilistic queries (e.g., in the Microsoft Bayesian Network Software)
Problem settings:
Known structure, fully observable: the CPTs are to be learned
Unknown structure, fully observable: search over structures
Known structure, hidden variables: parameter learning using hill climbing
Unknown structure, hidden variables: no good results
21 Hidden Variable (Han and Kamber's Data Mining book, pages 301-302)
Assume that the Bayesian Network structure is given
Some variables are hidden (example on the next slide)
Our objective: find the CPTs for all nodes
Idea: use a method of gradient descent
Let S be the set of training examples: {X_1, X_2, ..., X_s}
Consider a variable Y_i with parents U_i = {Parent1, Parent2, ...}
Question: what is Pr(Y_i = y_ij | U_i = u_ik)?
Answer: learn this value from the data, in iterations
22 Learn CPT for Hidden Variable
Suppose we are in a tennis domain
We wish to introduce a new variable not in our data set, called Field Temp, representing the temperature of the field
Assume that we don't have a good way to measure it, but have to include it in our network
Network fragment: Windy -> Field Temp <- Outlook
23 Learn the CPT
Let w_ijk be the value of Pr(Y_i = y_ij | U_i = u_ik), where node Y_i has parents U_i (Parent1 and Parent2 in the figure)
Compute a new w_ijk based on the old one by moving along the gradient of the log-likelihood of S (the update given in Han and Kamber):
w_ijk <- w_ijk + l * Σ_{d=1..s} P(Y_i = y_ij, U_i = u_ik | X_d) / w_ijk
where l is the learning rate and P(... | X_d) is computed by inference on training example X_d
24 Example: Learn the CPT
w = Pr(Field Temp=Hot | Windy=True, Outlook=Sunny)
Let the old w be 0.5. Compute a new w with the update from the previous slide.
(Windy -> Field Temp <- Outlook)
Normalize and then iterate until the weights are stable.
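A sketch of what one such update might look like for this example, assuming the Han and Kamber step from slide 23. Since Field Temp is hidden, the per-example posteriors P(y, u | X_d) would come from inference in the network; the numbers below are invented purely for illustration:

```python
learning_rate = 0.1

# Current CPT column for the parent setting Windy=True, Outlook=sunny.
w = {"Hot": 0.5, "Cool": 0.5}

# Hypothetical P(FieldTemp=v, Windy=True, Outlook=sunny | X_d), one per example.
posteriors = {"Hot": [0.2, 0.4, 0.1], "Cool": [0.1, 0.3, 0.2]}

# Gradient step: w_ijk += l * sum_d P(y_ij, u_ik | X_d) / w_ijk
for v in w:
    w[v] += learning_rate * sum(posteriors[v]) / w[v]

# Renormalize the column so it sums to 1, then repeat until stable.
total = sum(w.values())
w = {v: x / total for v, x in w.items()}
print(w)  # {'Hot': ~0.508, 'Cool': ~0.492}
```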