
1 Computer Science CPSC 322 Lecture 19 Bayesian Networks: Construction

2 Announcements. Assignment 4 was posted last Friday; you can already do Q1, Q2, and part of Q3 after today. Due Friday, April 1, 11:59pm. Solutions will be posted on Wed, April 6 (to account for late days); this gives you plenty of time to study them and discuss them with us before the final.

3 Lecture Overview
- Recap lecture 18
- Defining Conditional Probabilities in a Bnet
- Considerations on Network Structure
- Inference: Intro

4 Marginal Independence. Intuitively: if X ╨ Y, then learning that Y=y does not change your belief in X, and this is true for all values y that Y could take. For example, the weather is marginally independent of the result of a coin toss.

5 Exploiting marginal independence. Recall the product rule: p(X=x ˄ Y=y) = p(X=x | Y=y) × p(Y=y). If X and Y are marginally independent, p(X=x | Y=y) = p(X=x). Thus we have p(X=x ˄ Y=y) = p(X=x) × p(Y=y), or, in distribution form, p(X,Y) = p(X) × p(Y).

6 Exploiting marginal independence. If n Boolean variables are all marginally independent, P(X1, …, Xn) = P(X1) × … × P(Xn), so we only need to specify n numbers: exponentially fewer than the 2^n - 1 required for the full JPD!
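To make the savings concrete, here is a minimal Python sketch (a hypothetical illustration; the variable names and probabilities are our own, not from the slides): under marginal independence, any joint entry can be rebuilt from the marginals alone, so we store n numbers instead of 2^n - 1.

```python
# Marginals for n mutually independent Boolean variables:
# one number each, since P(X=f) = 1 - P(X=t).
marginals = {"SunnyWeather": 0.7, "CoinHeads": 0.5}

def joint(assignment):
    """P(assignment) under marginal independence: a product of marginals.
    assignment: dict mapping variable name -> True/False."""
    p = 1.0
    for var, is_true in assignment.items():
        p *= marginals[var] if is_true else 1.0 - marginals[var]
    return p

n = len(marginals)
print(n, "numbers stored vs", 2**n - 1, "for an explicit JPD table")
print(joint({"SunnyWeather": True, "CoinHeads": False}))  # 0.7 * 0.5 = 0.35
```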

7 Conditional Independence. Intuitively: if X and Y are conditionally independent given Z, then learning that Y=y does not change your belief in X when we already know Z=z, and this is true for all values y that Y could take and all values z that Z could take.

8 Example for Conditional Independence. Whether light l1 is lit (Lit-l1) and the position of switch s2 (Up-s2) are not marginally independent: the position of the switch determines whether there is power in the wire w0 connected to the light. However, Lit-l1 is conditionally independent of Up-s2 given whether there is power in wire w0 (Power-w0): once we know Power-w0, learning values for Up-s2 does not change our beliefs about Lit-l1. [Figure: chain Up-s2 -> Power-w0 -> Lit-l1]

9 Exploiting Conditional Independence. The chain rule allows us to write the JPD as a product of conditional distributions, e.g. P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C). If, for instance, D is conditionally independent of A and B given C, we can rewrite the above as P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|C). Under independence we gain compactness: the chain rule writes the JPD as a product of conditional distributions, and conditional independence allows us to write them compactly.

10 Bayesian Network Motivation. We want a representation and reasoning system that is based on conditional (and marginal) independence: a compact yet expressive representation, with efficient reasoning procedures. Bayesian (Belief) Networks are such a representation.

11 Belief (or Bayesian) networks. Def.: A belief network consists of a directed acyclic graph (DAG) where each node is associated with a random variable Xi; a domain for each variable Xi; and a set of conditional probability distributions, one for each node Xi given its parents Pa(Xi) in the graph: P(Xi | Pa(Xi)). The parents Pa(Xi) of a variable Xi are the variables Xi directly depends on. A Bayesian network is a compact representation of the JPD for a set of variables (X1, …, Xn): P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))

12 How to build a Bayesian network
1. Define a total order over the random variables: (X1, …, Xn)
2. Apply the chain rule: P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)
3. For each Xi, select the smallest set of predecessors Pa(Xi) (predecessors of Xi in the total order defined over the variables) such that P(Xi | X1, …, Xi-1) = P(Xi | Pa(Xi)), i.e., Xi is conditionally independent of all its other predecessors given Pa(Xi)
4. Then we can rewrite P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi)). This is a compact representation of the initial JPD: a factorization of the JPD based on existing conditional independencies among the variables

13 How to build a Bayesian network (cont'd)
5. Construct the Bayesian Net (BN): nodes are the random variables; draw a directed arc from each variable in Pa(Xi) to Xi; define a conditional probability table (CPT) for each variable Xi: P(Xi | Pa(Xi))
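Steps 1 through 5 leave us with a graph plus one CPT per node. As a minimal sketch (the dict encoding is our own convention, not from the slides), the fire-diagnosis structure built on the next slide can be represented like this:

```python
# Structure of the fire-diagnosis network from the next slide:
# parents[X] lists Pa(X); an arc goes from each parent to X.
parents = {
    "Fire": [],
    "Tampering": [],
    "Alarm": ["Tampering", "Fire"],  # Alarm depends on both causes
    "Smoke": ["Fire"],
    "Leaving": ["Alarm"],
    "Report": ["Leaving"],
}
# One CPT per node, P(X=t | parent values), is filled in on slides 15-21.
```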

14 Fire Diagnosis Example. We defined a total order of the variables that follows the causal structure of events: Fire, Tampering, Alarm, Smoke, Leaving, Report. We obtained the Bayesian network below and its corresponding, very compact factorization of the original JPD: P(F,T,A,S,L,R) = P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L). [Figure: network with arcs Tampering -> Alarm, Fire -> Alarm, Fire -> Smoke, Alarm -> Leaving, Leaving -> Report]

15 Example for BN construction: Fire Diagnosis. We are not done yet: we must specify the Conditional Probability Table (CPT) for each variable. All variables are Boolean. How many probabilities do we need to specify for this Bayesian network? For instance, how many do we need to explicitly specify for Fire? P(Fire): 1 probability, P(Fire=t) = 0.01, because P(Fire=f) can be derived from it (1 - P(Fire=t)).

16 Lecture Overview
- Recap lecture 18
- Defining Conditional Probabilities in a Bnet
- Considerations on Network Structure
- Inference: Intro

17 Example for BN construction: Fire Diagnosis. We must specify the CPT for each (Boolean) variable. How many probabilities do we need to explicitly specify for Alarm? A. 1  B. 2  C. 4  D. 8  (P(Fire=t) = 0.01, from the previous slide.)

18 Example for BN construction: Fire Diagnosis. How many probabilities do we need to explicitly specify for Alarm? P(Alarm | Tampering, Fire): 4 probabilities, one for each of the 4 instantiations of the parents.

19 Example for BN construction: Fire Diagnosis. Each row of this table is a conditional probability distribution. We don't need to specify P(Alarm=f | T,F) explicitly, since the probabilities in each row must sum to 1. (P(Fire=t) = 0.01, as before.)

Tampering T   Fire F   P(Alarm=t|T,F)   P(Alarm=f|T,F)
t             t        0.5              0.5
t             f        0.85             0.15
f             t        0.99             0.01
f             f        0.0001           0.9999

20 Example for BN construction: Fire Diagnosis. How many probabilities do we need to explicitly specify for the whole Bayesian network? A. 6  B. 12  C. 20  D. 2^6 - 1

21 Example for BN construction: Fire Diagnosis. 12 probabilities in total, compared to the 2^6 - 1 = 63 entries of the JPD P(T,F,A,S,L,R).

P(Fire=t) = 0.01    P(Tampering=t) = 0.02

Tampering T   Fire F   P(Alarm=t|T,F)
t             t        0.5
t             f        0.85
f             t        0.99
f             f        0.0001

Fire F   P(Smoke=t|F)
t        0.9
f        0.01

Alarm A   P(Leaving=t|A)
t         0.88
f         0.001

Leaving L   P(Report=t|L)
t           0.75
f           0.01

22 Example for BN construction: Fire Diagnosis. How many probabilities do we need to specify for this Bayesian network? P(Tampering): 1 probability, P(T=t). P(Fire): 1 probability, P(F=t). P(Alarm | Tampering, Fire): 4, one for each of the 4 instantiations of the parents. For each of the other variables, which have one parent each: 2 probabilities, one for the parent being true and one for the parent being false. In total: 1+1+4+2+2+2 = 12 (compared to 2^6 - 1 = 63 for the full JPD!)

23 Example for BN construction: Fire Diagnosis. Once we have the CPTs in the network, we can compute any entry of the JPD. For example:
P(Tampering=t, Fire=f, Alarm=t, Smoke=f, Leaving=t, Report=t)
= P(Tampering=t) x P(Fire=f) x P(Alarm=t | Tampering=t, Fire=f) x P(Smoke=f | Fire=f) x P(Leaving=t | Alarm=t) x P(Report=t | Leaving=t)
= 0.02 x (1 - 0.01) x 0.85 x (1 - 0.01) x 0.88 x 0.75 ≈ 0.011
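As a sanity check, here is a hedged Python sketch that multiplies out the same entry. It reuses the parents dict from the earlier sketch; the cpt encoding, keyed by tuples of parent values, is our own convention.

```python
# P(X=True | parent values), keyed by a tuple of parent values (in the
# order given by parents[X]); empty tuple for root nodes.
cpt = {
    "Fire":      {(): 0.01},
    "Tampering": {(): 0.02},
    "Alarm":     {(True, True): 0.5, (True, False): 0.85,
                  (False, True): 0.99, (False, False): 0.0001},
    "Smoke":     {(True,): 0.9, (False,): 0.01},
    "Leaving":   {(True,): 0.88, (False,): 0.001},
    "Report":    {(True,): 0.75, (False,): 0.01},
}
parents = {"Fire": [], "Tampering": [], "Alarm": ["Tampering", "Fire"],
           "Smoke": ["Fire"], "Leaving": ["Alarm"], "Report": ["Leaving"]}

def joint_entry(assignment):
    """P(assignment) = product over all variables of P(Xi | Pa(Xi))."""
    p = 1.0
    for var, value in assignment.items():
        p_true = cpt[var][tuple(assignment[pa] for pa in parents[var])]
        p *= p_true if value else 1.0 - p_true
    return p

entry = joint_entry({"Tampering": True, "Fire": False, "Alarm": True,
                     "Smoke": False, "Leaving": True, "Report": True})
print(round(entry, 5))  # 0.011, the JPD entry computed above
```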

24 In Summary. In a belief network, the JPD of the variables involved is defined as the product of the local conditional distributions: P(X1, …, Xn) = ∏_i P(Xi | X1, …, Xi-1) = ∏_i P(Xi | Parents(Xi)). Any entry of the JPD can be computed given the CPTs in the network. Once we know the JPD, we can answer any query about any subset of the variables (see the Inference by Enumeration topic). Thus, a belief network allows one to answer any query on any subset of the variables. We will see how this can be done efficiently, but first let's think a bit more about network structure.

25 Lecture Overview
- Recap lecture 18
- Defining Conditional Probabilities in the network
- Considerations on Network Structure
- Inference: Intro

26 Compactness. In a Bnet, how many rows do we need to explicitly store for the CPT of a Boolean variable Xi with k Boolean parents? A. k  B. 2k  C. 2^k

27 Compactness. A CPT for a Boolean variable Xi with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for Xi = true (the number for Xi = false is just 1-p). If each variable has no more than k parents, the complete network requires specifying O(n · 2^k) numbers. For k << n, this is a substantial improvement: the numbers required grow linearly with n, vs. O(2^n) for the full joint distribution. E.g., for a Bnet with 30 Boolean variables, each with 5 parents, we need to specify 30 x 2^5 = 960 probabilities, but we would need 2^30 - 1 (over a billion) for the JPD.
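The arithmetic is worth checking; a tiny sketch with the slide's numbers:

```python
n, k = 30, 5
bnet_numbers = n * 2**k  # one number per CPT row: 30 * 32 = 960
jpd_numbers = 2**n - 1   # full joint over 30 Boolean variables
print(bnet_numbers, jpd_numbers)  # 960 vs 1073741823
```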

28 Realistic BNet: Liver Diagnosis (source: Onisko et al., 1999). ~60 nodes, max 4 parents per node. Need ~60 x 2^4 = 960 probabilities instead of the 2^60 probabilities of the JPD.

29 Compactness. What happens if the network is fully connected, or k ≈ n? Not much saving compared to the numbers needed to specify the full JPD. Bnets are useful in sparse (or locally structured) domains: domains in which each component interacts with (is related to) a small fraction of the other components. What if this is not the case in a domain we need to reason about? We may need to make simplifying assumptions to reduce the dependencies in the domain.

30 “Where do the numbers (CPTs) come from?” From experts: tedious, costly, not always reliable. From data => Machine Learning: there are algorithms to learn both structure and numbers (CPSC 340, CPSC 422), though it can be hard to get enough data. Still, usually better than specifying the full JPD.

31 What if we use a different ordering? What happens if we use the following order: Leaving, Tampering, Report, Smoke, Alarm, Fire? We end up with a completely different network structure! (Try it as an exercise.) [Figure: the network over Leaving, Report, Tampering, Smoke, Alarm, Fire induced by this order.] Which of the two structures is better? A. The previous structure  B. This structure  C. There is no difference

32 Which Structure is Better? The non-causal network is less compact: 1+2+2+4+8+8 = 25 numbers needed. Deciding on conditional independence is hard in non-causal directions (causal models and conditional independence seem hardwired for humans!), and specifying the conditional probabilities may be harder than in the causal direction. For instance, we have lost the direct dependency between the alarm and one of its causes, which essentially describes the alarm's reliability (info often provided by the maker).

33 Example contd. Other than that, our two Bnets for the Alarm problem are equivalent as long as they represent the same probability distribution: P(T,F,A,S,L,R) = P(T) P(F) P(A | T,F) P(S | F) P(L | A) P(R | L) = P(L) P(T|L) P(R|L) P(S|L,T) P(A|S,L,T) P(F|S,A,T), i.e., they are equivalent if the corresponding CPTs are specified so that they satisfy the equation above. (Variable ordering: T,F,A,S,L,R vs. L,T,R,S,A,F.)

34 Are there wrong network structures? Given an order of variables, a network with arcs in excess of those required by the direct dependencies implied by that order is still OK, just not as efficient. E.g.: P(L) P(T|L) P(R|L) P(S|L,R,T) P(A|S,L,T) P(F|S,A,T) = P(L) P(T|L) P(R|L) P(S|L,T) P(A|S,L,T) P(F|S,A,T). One extreme: the fully connected network is always correct but rarely the best choice. It corresponds to just applying the chain rule to the JPD, without leveraging conditional independencies to simplify the factorization: P(L,T,R,S,A,F) = P(L) P(T|L) P(R|L,T) P(S|L,T,R) P(A|S,L,T,R) P(F|S,A,T,L,R)

35 Are there wrong network structures? (cont'd) [Figure: the fully connected network over Leaving, Report, Tampering, Smoke, Alarm, Fire.] P(L,T,R,S,A,F) = P(L) P(T|L) P(R|L,T) P(S|L,T,R) P(A|S,L,T,R) P(F|S,A,T,L,R)

36 Are there wrong network structures? How can a network structure be wrong? If it misses directed edges that are required. E.g., an edge is missing below, making Fire conditionally independent of Alarm given Tampering and Smoke. But they are not independent: for instance, P(Fire=t | Smoke=f, Tampering=f, Alarm=t) should be higher than P(Fire=t | Smoke=f, Tampering=f). [Figure: the network missing the edge between Fire and Alarm.]

37 Are there wrong network structures? (cont'd) How can a network structure be wrong? If it misses directed edges that are required. E.g., an edge is missing below: Fire is not conditionally independent of Alarm given {Tampering, Smoke}. But remember what we said a few slides back: sometimes we may need to make simplifying assumptions (e.g., assume conditional independence when it does not actually hold) in order to reduce complexity.

38 Summary of Dependencies in a Bayesian Network. In 1, 2 and 3, X and Y are dependent (grey areas represent existing evidence/observations). [Figure: the three ways a path between X and Y can go through a node Z: a chain, a common cause, and (3) a common effect, with possible evidence nodes E.] In 3, X and Y become dependent as soon as there is evidence on Z or on any of its descendants. This is because knowledge of one possible cause, given evidence of the effect, explains away the other cause.

39 Dependencies in a Bayesian Network: summary. In 1, 2 and 3, X and Y are dependent (grey areas represent existing evidence/observations). In 3, X and Y become dependent as soon as there is evidence on Z or on any of its descendants: knowledge of one possible cause, given evidence of the effect, explains away the other cause. [Figure: the three patterns, illustrated with the slide's examples: Up(s2), Power(w0), Lit(l1); Understood Material, Assignment Grade, Exam Grade; Smoking At Sensor, Fire, Alarm.]

40 Or Conditional Independencies; or, blocking paths for probability propagation. Three ways in which a path between X and Y (or vice versa) can be blocked, given evidence E. [Figure: the three blocking patterns through Z.] In 3, X and Y are independent if there is no evidence on their common effect (recall Fire and Tampering in the alarm example).

41 Or Conditional Independencies; or, blocking paths for probability propagation (cont'd). Three ways in which a path between X and Y can be blocked, given evidence E; in 3, X and Y are independent if there is no evidence on their common effect. [Figure: the three blocking patterns, illustrated with the slide's examples: Up(s2), Power(w0), Lit(l1); Understood Material, Assignment Grade, Exam Grade; Smoking At Sensor, Fire, Alarm.]
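These three blocking rules are exactly what the standard Bayes-ball / d-separation procedure checks. The slides don't give an algorithm, so the following is only a sketch of the standard approach (function and variable names are our own), reusing the parents dict encoding from the earlier sketches:

```python
from collections import deque

def d_separated(x, y, z, parents):
    """True iff x and y are d-separated given evidence set z.
    Bayes-ball style reachability; assumes x and y are not in z.
    parents: dict mapping each node to its list of parents."""
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    z = set(z)
    # State = (node, how the "ball" arrived):
    # 'up' = arrived from a child, 'down' = arrived from a parent.
    queue, visited = deque([(x, "up")]), set()
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y:
            return False  # an active path reaches y: d-connected
        if direction == "up" and node not in z:
            # chain or common cause: an unobserved node passes the ball on
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in z:
                queue.extend((c, "down") for c in children[node])
            else:
                # observed common effect (collider): activates its parents
                queue.extend((p, "up") for p in parents[node])
    return True
```

On the fire network's parents dict from earlier, d_separated("Tampering", "Fire", {"Alarm"}, parents) comes out False (explaining away through the observed common effect), while d_separated("Fire", "Report", {"Leaving"}, parents) comes out True (the observed chain node Leaving blocks the path).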

42 Conditional Independence in a Bnet. Is H conditionally independent of E given I? [Figure: an example Bnet over nodes A through I, shown on the slide.] A. True  B. False  C. It depends on the evidence

43 Conditional Independence in a Bnet. Is H conditionally independent of E given I? True: since we have no evidence on F, changes in the belief of G (caused by changes in the belief of H) do not affect the belief on E.

44 (In)Dependencies in a Bnet. Is A conditionally independent of I given F? A. True  B. False  C. It depends on the evidence

45 (In)Dependencies in a Bnet. Is A conditionally independent of I given F? False: since we have evidence on F, changes to the belief on G due to changes in I affect the belief on E, which in turn propagates to A via C and B. What if node C is not there?

46 (In)Dependencies in a Bnet. Is A conditionally independent of I given F, if node C is not there? A. True  B. False

47 (In)Dependencies in a Bnet. Is A conditionally independent of I given F, if node C is not there? True: the lack of evidence on D blocks the path that goes through E, D, B.

48 Learning Goals so Far
- Given a JPD: marginalize over specific variables; compute distributions over any subset of the variables
- Use inference by enumeration to compute joint posterior probability distributions over any subset of variables given evidence
- Define and use marginal and conditional independence
- Build a Bayesian network for a given domain (structure)
- Specify the necessary conditional probabilities
- Compute the representational savings in terms of number of probabilities required
- Identify dependencies/independencies between nodes in a Bayesian network
Now we will see how to do inference in Bnets.

49 Lecture Overview
- Recap lecture 18
- Defining Conditional Probabilities in a Bnet
- Considerations on Network Structure
- Inference: Intro

50 Where Are We? [Figure: course map relating environment (deterministic vs. stochastic; static vs. sequential problem types), representations (vars + constraints, logics, STRIPS, belief nets, decision nets, Markov processes), and reasoning techniques (search, arc consistency, variable elimination, value iteration).] We'll focus on Variable Elimination.

51 Inference in Bayesian Networks. Given: a Bayesian network BN; observations of a subset of its variables E: E=e; and a subset of its variables Y that is queried. Compute: the conditional probability P(Y | E=e). We denote observed variables as shaded. [Figure: example networks (Understood Material, Assignment Grade, Exam Grade; Fire, Alarm, Smoking At Sensor) with observed nodes shaded.]
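This task can be solved, inefficiently but simply, by inference by enumeration; the slides reserve the efficient method (variable elimination) for later. Here is a hedged sketch for a single Boolean query variable, reusing the joint_entry function and CPTs from the earlier sketches (the function name query is our own):

```python
from itertools import product

VARS = ["Fire", "Tampering", "Alarm", "Smoke", "Leaving", "Report"]

def query(y, evidence):
    """P(y=t | evidence) by enumeration: sum joint_entry (from the
    earlier sketch) over all full assignments consistent with the
    evidence, split by the value of y, then normalize."""
    weights = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if all(a[var] == val for var, val in evidence.items()):
            weights[a[y]] += joint_entry(a)
    return weights[True] / (weights[True] + weights[False])
```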

52 Bayesian Networks: Types of Inference
- Diagnostic (from effects to causes): people are leaving, P(L=t)=1; query P(F | L=t). [Network: Fire -> Alarm -> Leaving]
- Predictive (from causes to effects): fire happens, P(F=t)=1; query P(L | F=t). [Network: Fire -> Alarm -> Leaving]
- Intercausal (between causes of a common effect): the alarm goes off, P(A=t)=1, and we observe another of its causes (tampering, T=t; or, in the variant network, a person smoking next to the sensor, S=t); query P(F | A=t, T=t). [Network: Fire -> Alarm <- Smoking At Sensor]
- Mixed: people are leaving, P(L=t)=1, and there is no fire, F=f; query P(A | F=f, L=t). [Network: Fire -> Alarm -> Leaving]
We will use the same reasoning procedure for all of these types.
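Using the enumeration sketch above, the four query types differ only in what is observed and what is queried; e.g. (these calls assume the query function and the six-variable fire network sketched earlier):

```python
query("Fire", {"Leaving": True})                   # diagnostic
query("Leaving", {"Fire": True})                   # predictive
query("Fire", {"Alarm": True, "Tampering": True})  # intercausal (explaining away)
query("Alarm", {"Fire": False, "Leaving": True})   # mixed
```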

53 Bayesian Networks - AISpace. Try queries with the Belief Networks applet in AISpace. Load the Fire Alarm example (ex. 6.10 in the textbook). Compare the probability of each query node before and after new evidence is observed; try first to predict how they should change using your understanding of the domain. Make sure you explore and understand the various other example networks in the textbook and AISpace: the Electrical Circuit example (textbook ex. 6.11); the patient's wheezing and coughing example (ex. 6.14), not in the applet, but you can easily insert it; and several other examples in AISpace.

54 Example. Suppose that we observe people leaving. Should the probability of Tampering: A. Go up  B. Go down  C. Stay the same

55 Example. Suppose that, after observing people leaving, we then observe smoke. Now should the probability of Tampering: A. Go up  B. Go down  C. Stay the same

56 Bayesian Networks - AISpace. Try queries with the Belief Networks applet in AISpace. Load the Fire Alarm example (ex. 6.10 in the textbook). Compare the probability of each query node before and after new evidence is observed; try first to predict how they should change using your understanding of the domain. Make sure you explore and understand the various other example networks in the textbook and AISpace: the Electrical Circuit example (textbook ex. 6.11); the patient's wheezing and coughing example (ex. 6.14), not in the applet, but you can easily insert it; and several other examples in AISpace.

