Slide 1: Data Mining with Bayesian Networks (I)
Instructor: Qiang Yang, Hong Kong University of Science and Technology (Qyang@cs.ust.hk)
Thanks: Dan Weld, Eibe Frank

Slide 2: Goal: Mining Probability Models

Probability basics:
- Our state s in world W is distributed according to a probability distribution: 0 <= Pr(s) <= 1 for all s ∈ S, and Σ_{s ∈ S} Pr(s) = 1.
- For subsets S1 and S2: Pr(S1 ∪ S2) = Pr(S1) + Pr(S2) - Pr(S1 ∩ S2).
- Bayes rule: Pr(B|A) = Pr(A|B) Pr(B) / Pr(A).

Slide 3: Weather data set

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
overcast  hot          high      FALSE  yes
rainy     mild         high      FALSE  yes
rainy     cool         normal    FALSE  yes
rainy     cool         normal    TRUE   no
overcast  cool         normal    TRUE   yes
sunny     mild         high      FALSE  no
sunny     cool         normal    FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
overcast  mild         high      TRUE   yes
overcast  hot          normal    FALSE  yes
rainy     mild         high      TRUE   no

Slide 4: Basics

Unconditional or prior probability:
- Pr(Play=yes) + Pr(Play=no) = 1. Pr(Play=yes) is sometimes written as Pr(Play).
- The table has 9 yes and 5 no, so Pr(Play=yes) = 9/(9+5) = 9/14, and thus Pr(Play=no) = 5/14.

Joint probability of Play and Windy: the values Pr(Play=x, Windy=y), summed over all values x and y, should be 1.

            Windy=True  Windy=False
Play=yes    3/14        6/14
Play=no     ?           ?

(Exercise: fill in the Play=no row.)
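
A small Python sketch (my addition, not from the slides) that recomputes these numbers, including the joint cells the slide leaves as question marks:

```python
from fractions import Fraction
from collections import Counter

# (Windy, Play) pairs read off the weather data set above.
rows = [("FALSE", "no"), ("TRUE", "no"), ("FALSE", "yes"), ("FALSE", "yes"),
        ("FALSE", "yes"), ("TRUE", "no"), ("TRUE", "yes"), ("FALSE", "no"),
        ("FALSE", "yes"), ("FALSE", "yes"), ("TRUE", "yes"), ("TRUE", "yes"),
        ("FALSE", "yes"), ("TRUE", "no")]
n = len(rows)

# Prior: Pr(Play=yes) = 9/14, Pr(Play=no) = 5/14.
play = Counter(p for _, p in rows)
print({k: Fraction(v, n) for k, v in play.items()})

# Joint: Pr(Windy=y, Play=x); the four cells sum to 1.
joint = Counter(rows)
print({cell: Fraction(c, n) for cell, c in joint.items()})
```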

Slide 5: Probability Basics

Conditional probability Pr(A|B):
- #(Windy=False) = 8; within those 8, #(Play=yes) = 6.
- So Pr(Play=yes | Windy=False) = 6/8.
- Pr(Windy=False) = 8/14 and Pr(Play=yes) = 9/14.

Applying Bayes rule, Pr(B|A) = Pr(A|B) Pr(B) / Pr(A):
Pr(Windy=False | Play=yes) = (6/8 × 8/14) / (9/14) = 6/9.

Windy   Play
FALSE*  no
TRUE    no
FALSE*  yes*
FALSE*  yes*
FALSE*  yes*
TRUE    no
TRUE    yes
FALSE*  no
FALSE*  yes*
FALSE*  yes*
TRUE    yes
TRUE    yes
FALSE*  yes*
TRUE    no

(* marks the 8 rows with Windy=False; within these, the starred yes entries are the 6 rows that also have Play=yes.)
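
The same style of check for the conditional and Bayes-rule computation, with the (Windy, Play) pairs hand-encoded again so the snippet stands alone:

```python
from fractions import Fraction

# (Windy, Play) pairs from the weather data set.
rows = [("FALSE", "no"), ("TRUE", "no"), ("FALSE", "yes"), ("FALSE", "yes"),
        ("FALSE", "yes"), ("TRUE", "no"), ("TRUE", "yes"), ("FALSE", "no"),
        ("FALSE", "yes"), ("FALSE", "yes"), ("TRUE", "yes"), ("TRUE", "yes"),
        ("FALSE", "yes"), ("TRUE", "no")]

windy_false = [p for w, p in rows if w == "FALSE"]
p_play_given_wf = Fraction(windy_false.count("yes"), len(windy_false))  # 6/8
p_wf = Fraction(len(windy_false), len(rows))                            # 8/14
p_play = Fraction(sum(1 for _, p in rows if p == "yes"), len(rows))     # 9/14

# Bayes rule: Pr(Windy=False | Play=yes) = Pr(Play=yes | WF) Pr(WF) / Pr(Play=yes)
print(p_play_given_wf * p_wf / p_play)  # 2/3, i.e. 6/9
```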

Slide 6: Conditional Independence

"A and P are independent given C": Pr(A | P, C) = Pr(A | C).

Nodes: Cavity (C), Ache (A), Probe Catches (P).

C  A  P  Probability
F  F  F  0.534
F  F  T  0.356
F  T  F  0.006
F  T  T  0.004
T  F  F  0.048
T  F  T  0.012
T  T  F  0.032
T  T  T  0.008

Slide 7: Conditional Independence (continued)

"A and P are independent given C": Pr(A | P, C) = Pr(A | C), and also Pr(P | A, C) = Pr(P | C).

Suppose C = True:
Pr(A | C) = (0.032 + 0.008) / (0.048 + 0.012 + 0.032 + 0.008) = 0.04 / 0.1 = 0.4
Pr(A | P, C) = 0.032 / (0.032 + 0.048) = 0.032 / 0.080 = 0.4

C  A  P  Probability
F  F  F  0.534
F  F  T  0.356
F  T  F  0.006
F  T  T  0.004
T  F  F  0.012
T  F  T  0.048
T  T  F  0.008
T  T  T  0.032
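
This can be verified mechanically. A sketch (not in the original deck) that checks Pr(A | P, C) = Pr(A | C) against the joint table, for both values of P:

```python
# Joint distribution over (C, A, P) from the slide, keyed by truth values.
joint = {
    (False, False, False): 0.534, (False, False, True): 0.356,
    (False, True,  False): 0.006, (False, True,  True):  0.004,
    (True,  False, False): 0.012, (True,  False, True):  0.048,
    (True,  True,  False): 0.008, (True,  True,  True):  0.032,
}

def pr(pred):
    """Probability of the event described by predicate pred(c, a, p)."""
    return sum(v for (c, a, p), v in joint.items() if pred(c, a, p))

# Pr(A=T | C=T)
p_a_given_c = pr(lambda c, a, p: c and a) / pr(lambda c, a, p: c)

# Pr(A=T | P=p0, C=T) for both values of P: equal to the above iff A is
# independent of P given C.
for p0 in (True, False):
    p_a_given_pc = (pr(lambda c, a, p: c and a and p == p0)
                    / pr(lambda c, a, p: c and p == p0))
    print(p0, round(p_a_given_pc, 3), round(p_a_given_c, 3))  # both 0.4
```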

Slide 8: Conditional Independence

Conditional independence lets us encode the joint probability distribution in compact form.

Network: Cavity is the parent of both Probe Catches and Ache.

C  A  P  Probability
F  F  F  0.534
F  F  T  0.356
F  T  F  0.006
F  T  T  0.004
T  F  F  0.012
T  F  T  0.048
T  T  F  0.008
T  T  T  0.032

Conditional probability tables (CPTs):

P(C) = .01

C  P(P)
T  0.8
F  0.4

C  P(A)
T  0.4
F  0.02
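
To see the compression, the sketch below (mine, not from the deck) derives the prior and the two CPTs from the eight-entry joint above and then rebuilds every joint entry from just those five numbers via Pr(C, A, P) = Pr(C) Pr(A | C) Pr(P | C). The derived values come straight from the joint table, so they can differ from the rounded numbers printed on the slide:

```python
joint = {
    (False, False, False): 0.534, (False, False, True): 0.356,
    (False, True,  False): 0.006, (False, True,  True):  0.004,
    (True,  False, False): 0.012, (True,  False, True):  0.048,
    (True,  True,  False): 0.008, (True,  True,  True):  0.032,
}

def pr(pred):
    return sum(v for (c, a, p), v in joint.items() if pred(c, a, p))

# Compact parameters: 1 prior + 2 entries per child CPT = 5 numbers,
# instead of the 8 (7 free) entries of the full joint.
p_c = pr(lambda c, a, p: c)
p_a_given_c = {c0: pr(lambda c, a, p: c == c0 and a) / pr(lambda c, a, p: c == c0)
               for c0 in (True, False)}
p_p_given_c = {c0: pr(lambda c, a, p: c == c0 and p) / pr(lambda c, a, p: c == c0)
               for c0 in (True, False)}

# Reconstruct: Pr(C, A, P) = Pr(C) Pr(A|C) Pr(P|C), since A and P are
# conditionally independent given C.
for (c, a, p), v in joint.items():
    pc = p_c if c else 1 - p_c
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pp = p_p_given_c[c] if p else 1 - p_p_given_c[c]
    assert abs(pc * pa * pp - v) < 1e-9, (c, a, p)
```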

Slide 9: Creating a Network

Two views of what we are building:
1: A Bayes net is a representation of a JPD (joint probability distribution).
2: A Bayes net is a set of conditional-independence statements.

If you create the correct structure, one that represents causality, then you get a good network: one that is small, and therefore easy to compute with, and one that is easy to fill in with numbers.

Slide 10: Example

My house alarm system just sounded (A). Both an earthquake (E) and a burglary (B) could set it off. John will probably hear the alarm; if so he'll call (J). But sometimes John calls even when the alarm is silent. Mary might hear the alarm and call too (M), but not as reliably.

We could be assured a complete and consistent model by fully specifying the joint distribution: Pr(A, E, B, J, M), Pr(A, E, B, J, ~M), etc., one entry for each of the 2^5 = 32 truth assignments.

Slide 11: Structural Models (HK book 7.4.3)

Instead of starting with numbers, we start with the structural relationships among the variables:
- There is a direct causal relationship from Earthquake to Alarm.
- There is a direct causal relationship from Burglary to Alarm.
- There is a direct causal relationship from Alarm to JohnCalls.
- Earthquake and Burglary tend to occur independently.
- etc.

Slide 12: Possible Bayesian Network

Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls.

Slide 13: Graphical Models and Problem Parameters

Question: which probabilities must I specify to ensure a complete, consistent model, given
- the variables I have identified, and
- the dependence and independence relationships I have specified by building a graph structure?

Answer:
- Provide an unconditional (prior) probability for every node in the graph with no parents.
- For all remaining nodes, provide a conditional probability table, Prob(Child | Parent1, Parent2, Parent3), for all possible combinations of the Parent1, Parent2, Parent3 values.

Slide 14: Complete Bayesian Network

Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls.

P(B) = .001
P(E) = .002

B  E  P(A)
T  T  .95
T  F  .94
F  T  .29
F  F  .01

A  P(J)
T  .90
F  .05

A  P(M)
T  .70
F  .01
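
This network specifies the full five-variable joint with 1 + 1 + 4 + 2 + 2 = 10 numbers instead of the 2^5 - 1 = 31 independent entries a raw joint would need. A sketch (illustrative, using the CPT values above) of reading off one joint entry by the chain rule:

```python
# CPTs from the slide: probability of each node being True given its parents.
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.01}   # P(A=T | B, E)
p_j = {True: 0.90, False: 0.05}                      # P(J=T | A)
p_m = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint(b, e, a, j, m):
    """Pr(B=b, E=e, A=a, J=j, M=m) via the chain rule along the graph."""
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pj = p_j[a] if j else 1 - p_j[a]
    pm = p_m[a] if m else 1 - p_m[a]
    return pb * pe * pa * pj * pm

# Pr(John calls, Mary calls, alarm, no burglary, no earthquake):
print(joint(False, False, True, True, True))  # ~0.0063
```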

Slide 15: Microsoft Bayesian Belief Networks (MSBNx)

http://research.microsoft.com/adapt/MSBNx/

MSBNx can be used to construct and reason with Bayesian networks. Consider the example in the following screenshots.

Slides 16-19: MSBNx example screenshots.

Slide 20: Mining for Structural Models

Learning a Bayesian network is a difficult problem: some methods have been proposed, but it often requires a domain expert's knowledge. Once set up, however, a Bayesian network can be used to answer probabilistic queries (as in the Microsoft Bayesian Network software). The learning problem breaks into four cases (a sketch of the first follows below):
- Known structure, fully observable: only the CPTs are to be learned.
- Unknown structure, fully observable: search over structures.
- Known structure, hidden variables: parameter learning, e.g. by gradient / hill-climbing methods.
- Unknown structure, hidden variables: no good results.
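
In the first case, learning a CPT reduces to counting and normalizing. A minimal sketch on the weather data, under an assumed one-edge structure Outlook → Play chosen purely for illustration:

```python
from collections import Counter

# (Outlook, Play) pairs from the weather data set.
rows = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
        ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
        ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
        ("overcast", "yes"), ("rainy", "no")]

# CPT P(Play | Outlook) by maximum likelihood: count pairs, normalize by parent.
pair_counts = Counter(rows)
outlook_counts = Counter(o for o, _ in rows)
cpt = {(o, p): c / outlook_counts[o] for (o, p), c in pair_counts.items()}

print(cpt[("overcast", "yes")])  # 1.0 (4 of 4 overcast days have Play=yes)
print(cpt[("sunny", "yes")])     # 0.4 (2 of 5 sunny days)
```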

Slide 21: Hidden Variables (Han and Kamber's Data Mining book, pages 301-302)

Assume that the Bayesian network structure is given, but some variables are hidden. Our objective: find the CPT for all nodes. Idea: use a method of gradient descent. Let S be the set of training examples {X1, X2, ..., Xs}, and consider a variable Yi with parents Ui = {Parent1, Parent2, ...}. Question: what is Pr(Yi = y_ij | Ui = u_ik)? Answer: learn this value from the data, in iterations (example on the next slides).

Slide 22: Learn the CPT for a Hidden Variable

Suppose we are in a tennis domain. We wish to introduce a new variable that is not in our data set, called Field Temp, representing the temperature of the field. Assume that we have no good way to measure it, but have to include it in our network.

Network fragment: Windy → Field Temp ← Outlook.

Slide 23: Learn the CPT

Let w_ijk be the value of Pr(Yi = y_ij | Ui = u_ik), i.e. one CPT entry for node Yi with parent set Ui (the slide's figure showed Yi with parents Parent1 and Parent2 forming Ui). Compute a new w_ijk based on the old one; following Han and Kamber, the gradient step is

  w_ijk ← w_ijk + l · Σ_{d=1..s} P(Yi = y_ij, Ui = u_ik | X_d) / w_ijk

where l is the learning rate and X_d ranges over the training examples.

Slide 24: Example: Learn the CPT

w = Pr(Field Temp=Hot | Windy=True, Outlook=Sunny). Let the old w be 0.5 and compute a new w with the update above; then normalize (so the Field Temp entries for each parent combination sum to 1) and iterate until the values are stable.
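
A schematic sketch of one such update for this w. Everything here is illustrative: the per-example posteriors P(Field Temp=Hot, Windy=True, Outlook=Sunny | X_d) would come from inference over the network but are made-up numbers below, and for brevity only the one entry gets a gradient step before normalizing:

```python
def gradient_step(w, posteriors, learning_rate=0.01):
    """One update w <- w + l * sum_d P(y_ij, u_ik | X_d) / w for a CPT entry."""
    gradient = sum(p / w for p in posteriors)
    return w + learning_rate * gradient

def normalize(weights):
    """Rescale the entries for one parent combination so they sum to 1."""
    total = sum(weights)
    return [v / total for v in weights]

# w = Pr(Field Temp=Hot | Windy=True, Outlook=Sunny), old value 0.5,
# alongside the (hypothetical) Mild and Cool entries for the same parents.
w_hot, w_mild, w_cool = 0.5, 0.3, 0.2
posteriors_hot = [0.9, 0.2, 0.7]   # stand-ins for P(Hot, W=T, O=Sunny | X_d)

w_hot = gradient_step(w_hot, posteriors_hot)
w_hot, w_mild, w_cool = normalize([w_hot, w_mild, w_cool])
print(round(w_hot, 3))  # ~0.517; repeat until the values stabilize
```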

