Bayesian Networks VISA Hyoungjune Yi
BN – Intro
- Introduced by Pearl (1986)
- Resembles human reasoning
- Captures causal relationships
- Used in decision support systems / expert systems
Common Sense Reasoning about Uncertainty
- June is waiting for Larry and Jacobs, who are both late for the VISA seminar
- June is worried that if the roads are icy, one or both of them may have crashed his car
- Suddenly June learns that Larry has crashed
- June thinks: "If Larry has crashed, then probably the roads are icy. So Jacobs has probably also crashed"
- June then learns that it is warm outside and the roads are salted
- June thinks: "Larry was unlucky; Jacobs should still make it"
Causal Relationships
- State of Road: icy / not icy
- Jacobs: crash / no crash
- Larry: crash / no crash
Larry Crashed!
- State of Road: icy / not icy
- Jacobs: crash / no crash
- Larry: crash / no crash
- Information flow: evidence about Larry's crash propagates up to the road state and down to Jacobs
But the Roads Are Dry
- State of Road: not icy
- Jacobs: crash / no crash
- Larry: crash / no crash
- Information flow: once the road state is known, Larry's crash tells us nothing new about Jacobs
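The icy-roads story can be checked with a small numeric sketch. All probabilities below (a 10% prior on icy roads, crash probabilities of 0.8 on ice and 0.1 otherwise) are illustrative assumptions, not values from the slides:

```python
# Numeric sketch of the icy-roads reasoning; probabilities are assumed.
P_icy = 0.1                          # prior P(Icy)
P_crash = {True: 0.8, False: 0.1}    # P(Crash | Icy) for each driver

# P(Larry crashed) by total probability
p_larry = P_crash[True] * P_icy + P_crash[False] * (1 - P_icy)

# Bayes' rule: P(Icy | Larry crashed)
p_icy_given_larry = P_crash[True] * P_icy / p_larry

# P(Jacobs crashed | Larry crashed): crashes are independent given Icy
p_jacobs_given_larry = (P_crash[True] * p_icy_given_larry
                        + P_crash[False] * (1 - p_icy_given_larry))

print(round(p_icy_given_larry, 3))     # belief in icy roads jumps
print(round(p_jacobs_given_larry, 3))  # so does belief that Jacobs crashed

# Once we learn the roads are NOT icy, the crashes decouple:
print(P_crash[False])                  # P(Jacobs crashed | not icy) = 0.1
```

With these numbers, learning of Larry's crash raises the belief that Jacobs crashed from 0.17 to about 0.43; learning the roads are dry drops it back to 0.1, matching June's reasoning.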
Wet Grass
- To avoid icy roads, Larry moves to UCLA; Jacobs moves to USC
- One morning as Larry leaves for work, he notices that his grass is wet. He wonders whether he left his sprinkler on or it rained
- Glancing over at Jacobs' lawn, he notices that it is also wet
- Larry thinks: "Since Jacobs' lawn is wet, it probably rained last night"
- Larry then thinks: "If it rained, that explains why my lawn is wet, so the sprinkler is probably off"
Larry's Grass Is Wet
- Rain: yes / no
- Sprinkler: on / off
- Larry's grass: wet
- Jacobs' grass: wet / dry
(Information flow shown in the diagram)
Jacobs' Grass Is Also Wet
- Rain: yes / no
- Sprinkler: on / off
- Larry's grass: wet
- Jacobs' grass: wet
(Information flow shown in the diagram)
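Larry's second thought is the "explaining away" pattern, and it too can be verified numerically. The sketch below assumes illustrative priors and, for simplicity, deterministic wetness (Larry's grass is wet iff it rained or his sprinkler ran; Jacobs' grass is wet iff it rained):

```python
from itertools import product

# Explaining-away, numerically. Priors are illustrative assumptions.
P_rain, P_sprinkler = 0.2, 0.3

def joint(r, s):
    """P(Rain = r, Sprinkler = s), with the two causes independent a priori."""
    pr = P_rain if r else 1 - P_rain
    ps = P_sprinkler if s else 1 - P_sprinkler
    return pr * ps

def posterior_sprinkler(require_jacobs_wet):
    """P(Sprinkler on | Larry's grass wet [, Jacobs' grass wet])."""
    num = den = 0.0
    for r, s in product([True, False], repeat=2):
        larry_wet = r or s     # deterministic sketch
        jacobs_wet = r
        if larry_wet and (jacobs_wet or not require_jacobs_wet):
            den += joint(r, s)
            if s:
                num += joint(r, s)
    return num / den

p1 = posterior_sprinkler(False)  # P(S | Larry wet)
p2 = posterior_sprinkler(True)   # P(S | Larry wet, Jacobs wet)
print(round(p1, 3), round(p2, 3))  # p2 < p1: rain "explains away" the sprinkler
```

Seeing Jacobs' wet lawn makes rain more likely, which in turn lowers the posterior on the sprinkler, exactly the intercausal reasoning on the slide.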
Bayesian Network
- A data structure that represents the dependences between variables
- Gives a concise specification of the joint probability distribution
- A Bayesian belief network is a graph in which:
  - Nodes are a set of random variables
  - Each node has a conditional probability table (CPT)
  - Edges denote conditional dependencies
  - The graph is a DAG: no directed cycles
  - The Markov condition holds
Bayesian Network – Markov Assumption
- Each random variable X is independent of its non-descendants given its parents Pa(X)
- Formally: Ind(X; NonDesc(X) | Pa(X)) holds whenever G is an I-Map of P
- (I-Maps are defined on a later slide)
(Diagram: node X with parents Y1, Y2)
Markov Assumption – Example
In the network with edges Burglary → Alarm, Earthquake → Alarm, Earthquake → Radio, Alarm → Call:
- Ind(E; B)
- Ind(B; E, R)
- Ind(R; A, B, C | E)
- Ind(A; R | B, E)
- Ind(C; B, E, R | A)
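These statements can be generated mechanically from the DAG: for each node, condition on its parents and list its non-descendants. A minimal sketch (the `parents` dictionary encodes the alarm network above):

```python
# Generate the Markov-assumption independencies from a DAG,
# given as a child -> list-of-parents dictionary.
parents = {
    "Earthquake": [], "Burglary": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Earthquake", "Burglary"],
    "Call": ["Alarm"],
}

def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    kids = {c for c, ps in parents.items() if node in ps}
    out = set(kids)
    for k in kids:
        out |= descendants(k)
    return out

for x in parents:
    nondesc = set(parents) - {x} - descendants(x) - set(parents[x])
    if nondesc:
        pa = ", ".join(sorted(parents[x])) or "{}"
        print(f"Ind({x}; {', '.join(sorted(nondesc))} | {pa})")
```

Running this reproduces the five independence statements listed on the slide.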
I-Maps
- A DAG G is an I-Map of a distribution P if all Markov assumptions implied by G are satisfied by P
- Examples: two-node graphs over X and Y (with and without the edge X → Y)
Minimal I-Maps
- G is a minimal I-Map of P iff:
  - G is an I-Map of P
  - No graph G' obtained from G by removing edges is an I-Map of P
- The minimal I-Map is not unique
Factorization
- Given that G is an I-Map of P, can we simplify the representation of P?
- Example (disconnected nodes X, Y): since Ind(X; Y), we have P(X|Y) = P(X)
- Applying the chain rule: P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
- Thus we have a simpler representation of P(X,Y)
Factorization Theorem
Thm: If G is an I-Map of P, then P factors according to G.
For the alarm network, the chain rule gives
  P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)
whereas, using the I-Map, this simplifies to
  P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)
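The factored form is easy to evaluate directly. The sketch below implements the simplified product for the alarm network; all CPT numbers are illustrative assumptions, not values from the slides:

```python
from itertools import product

# Assumed CPTs for the alarm network (illustrative values only).
P_B = {True: 0.01, False: 0.99}
P_E = {True: 0.02, False: 0.98}
P_R = {True: {True: 0.9, False: 0.1},      # P(R | E)
       False: {True: 0.01, False: 0.99}}
P_A = {  # P(A = true | B, E), indexed by (b, e)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_C = {True: 0.9, False: 0.05}             # P(C = true | A)

def joint(b, e, r, a, c):
    """P(B,E,R,A,C) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pc = P_C[a] if c else 1 - P_C[a]
    return P_B[b] * P_E[e] * P_R[e][r] * pa * pc

# Sanity check: the factored joint sums to 1 over all 2^5 assignments.
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
print(round(total, 10))  # 1.0
```

Because each factor is a proper conditional distribution, the product automatically defines a valid joint distribution.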
So What?
- We can write P in terms of "local" conditional probabilities
- If G is sparse, i.e. |Pa(Xi)| ≤ k, each conditional probability can be specified compactly
- E.g. for binary variables, each CPT requires O(2^k) parameters
- The representation of P is compact: linear in the number of variables
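The savings are concrete even for the five-node alarm network. For binary variables, the full joint needs 2^n - 1 numbers, while the factored form needs one probability per parent configuration per node:

```python
# Parameter counts for binary variables: full joint vs. factored form.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

n = len(parents)
full_joint = 2 ** n - 1                               # 2^n - 1 free params
factored = sum(2 ** len(pa) for pa in parents.values())  # sum_i 2^{|Pa(X_i)|}
print(full_joint, factored)  # 31 vs 10
```

With a bound k on the number of parents, the factored count is at most n * 2^k, linear in n as the slide claims.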
Formal Definition of BN
A Bayesian network specifies a probability distribution via two components:
- A DAG G
- A collection of conditional probability distributions P(Xi | Pai)
The joint distribution P is defined by the factorization
  P(X1, ..., Xn) = Π_i P(Xi | Pai)
Additional requirement: G is a minimal I-Map of P
Bayesian Network - Example
Each node Xi has a conditional probability distribution P(Xi | Pai)
- If variables are discrete, P is usually multinomial
- P can be linear Gaussian, a mixture of Gaussians, ...
(Diagram: Pneumonia (P) and Tuberculosis (T) are parents of Lung Infiltrates (I), which has children XRay and Sputum Smear; the CPT P(I | P, T) is shown)
BN Semantics
- Compact & natural representation: if nodes have at most k parents, O(2^k * n) parameters instead of 2^n
- Conditional independencies in the BN structure + local probability models = full joint distribution over the domain
d-Separation
- d-sep(X; Y | Z, G): X is d-separated from Y given Z if all paths from a node in X to a node in Y are blocked given Z
- A path is active if it carries dependency between its end nodes, and blocked if it does not
- Three path patterns: common cause, intermediate (chain), common effect
- (Details on the blackboard)
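The three path patterns can be checked numerically on three-node networks. The sketch below (with arbitrary assumed CPT values) tests independence by brute-force comparison of P(x, y) against P(x) P(y) in the conditioned distribution; the chain X → Z → Y is blocked by observing Z, while the common-effect collider X → Z ← Y is blocked only when Z is not observed (the common-cause fork behaves like the chain):

```python
from itertools import product

def indep(joint, given_z=None):
    """Test Ind(X;Y) (or Ind(X;Y | Z=given_z)) by enumeration."""
    worlds = [(x, y, z) for x, y, z in product([0, 1], repeat=3)
              if given_z is None or z == given_z]
    tot = sum(joint(*w) for w in worlds)
    for x0, y0 in product([0, 1], repeat=2):
        pxy = sum(joint(x, y, z) for x, y, z in worlds
                  if x == x0 and y == y0) / tot
        px = sum(joint(x, y, z) for x, y, z in worlds if x == x0) / tot
        py = sum(joint(x, y, z) for x, y, z in worlds if y == y0) / tot
        if abs(pxy - px * py) > 1e-9:
            return False
    return True

f = lambda v, p: p if v else 1 - p  # Bernoulli(p) evaluated at v

# Chain X -> Z -> Y and collider X -> Z <- Y, with assumed CPT values.
chain    = lambda x, y, z: f(x, .3) * f(z, .8 if x else .2) * f(y, .7 if z else .1)
collider = lambda x, y, z: f(x, .3) * f(y, .6) * f(z, .9 if x == y else .2)

print(indep(chain), indep(chain, given_z=1))        # False True
print(indep(collider), indep(collider, given_z=1))  # True False
```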
BN – Belief, Evidence and Query
- A BN exists (partly) to answer queries
- A query involves evidence: an assignment of values to a set of variables in the domain
- A query asks for an a posteriori belief, P(X | evidence)
- For evidence variables, the belief is fixed: P(x) = 1 or P(x) = 0
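A posteriori beliefs can always be computed (inefficiently) by enumeration: fix the evidence, sum the factored joint over the hidden variables, and normalize. A sketch on a four-node version of the alarm network (Radio omitted for brevity; CPT numbers are illustrative assumptions):

```python
from itertools import product

# Factored joint P(B, E, A, C); CPT values are assumed for illustration.
def joint(b, e, a, c):
    P_A = {(1, 1): .95, (1, 0): .94, (0, 1): .29, (0, 0): .001}
    pb = .01 if b else .99
    pe = .02 if e else .98
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_call = .9 if a else .05
    pc = p_call if c else 1 - p_call
    return pb * pe * pa * pc

def query(evidence):
    """Return P(Burglary = 1 | evidence), with evidence a dict like {"C": 1}."""
    num = den = 0.0
    for b, e, a, c in product([0, 1], repeat=4):
        world = {"B": b, "E": e, "A": a, "C": c}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(b, e, a, c)
        den += p
        if b:
            num += p
    return num / den

print(round(query({"C": 1}), 4))  # belief in burglary, given a call
```

With no evidence the query returns the prior P(B = 1); a call raises it, since Call depends on Alarm, which depends on Burglary.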
Learning Structure
- Problem definition:
  - Given: data D
  - Return: a directed graph expressing the BN
- Issues:
  - Superfluous edges
  - Missing edges
- Very difficult in general
BN Learning
- BN models can be learned from empirical data
  - Parameter estimation via numerical optimization
  - Structure learning via combinatorial search
- The BN hypothesis space is biased towards distributions with independence structure
(Diagram: Data → Inducer → BN)
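For fully observed discrete data with a known structure, parameter estimation needs no numerical optimization at all: the maximum-likelihood estimate of each CPT entry is a conditional frequency count. A sketch on a hypothetical two-node network Rain → GrassWet with made-up data:

```python
from collections import Counter

# MLE parameter estimation by conditional counting.
# The data and the Rain -> GrassWet structure are illustrative assumptions.
data = [  # samples of (Rain, GrassWet)
    (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (1, 1),
]

def mle_cpt(data, child_idx, parent_idx):
    """Estimate P(child = 1 | parent) from joint counts."""
    counts = Counter((row[parent_idx], row[child_idx]) for row in data)
    cpt = {}
    for pa in (0, 1):
        total = counts[(pa, 0)] + counts[(pa, 1)]
        cpt[pa] = counts[(pa, 1)] / total
    return cpt

print(mle_cpt(data, child_idx=1, parent_idx=0))
```

Structure learning is the hard part: it requires searching over DAGs and scoring each candidate, which is where the combinatorial search on the slide comes in.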