Directed Graphical Probabilistic Models: the sequel William W. Cohen Machine Learning 10-601 Feb 2008
Summary of Monday(1): Bayes nets Many problems can be solved using the joint probability P(X1,…,Xn). Bayes nets describe a way to compactly write the joint. For a Bayes net: A P(A) 1 0.33 2 3 First guess The money B P(B) 1 0.33 2 3 A B Stick or swap? C The goat D A B C P(C|A,B) 1 2 0.5 3 1.0 … E Conditional independence: Second guess A C D P(E|A,C,D) …
Aside: Conditional Independence? (Chain rule – always true) (Fancy version of c.r.) (Def’n of cond. Indep.) Caveat divisor: we’ll usually assume no probabilities are zero, so division is safe….
Summary of Monday(2): d-separation X E Y There are three ways paths from X to Y given evidence E can be blocked. X is d-separated from Y given E iff all paths from X to Y given E are blocked…see there? If X is d-separated from Y given E, then I<X,E,Y> Z Z Z
d-separation continued… X Y Question: is E X P(X) 0.5 1 ? E Y P(Y|E) 0.5 1 It depends…on the CPTs X E P(E|X) 0.01 1 0.99 This is why d-separation => independence but not the converse… E Y P(Y|E) 0.01 1 0.99
d-separation continued… X Y Question: is E Yes! ?
d-separation continued… X Y Question: is E Yes! ? Bayes rule Fancier version of B.R. From previous slide…
An aside: computations with Bayes Nets X Y Question: what is ? E Main point: inference has no preferred direction in a Bayes net
d-separation X E Y Z Z Z
d-separation X E Y ? Z ? Z Z
d-separation continued… X Y Question: is E Yes! ?
d-separation continued… X Y Question: is E No ? E P(E) 0.5 1
d-separation X E Y ? Z ? Z Z
d-separation continued… X Y Question: is E Yes! ?
d-separation continued… X Y Question: is E ? X Y E P(E|X,Y) P(E,X,Y) 0.96 0.24 1 0.04 0.01
d-separation continued… X Y Question: is E No! ? X Y E P(E|X,Y) P(E,X,Y) 0.96 0.24 1 0.04 0.01
d-separation continued… X Y Question: is E No! ? X Y E P(E|X,Y) P(E,X,Y) 0.96 0.24 1 0.04 0.01
“Explaining away” NO X Y E YES This is “explaining away”: E is common symptom of two causes, X and Y After observing E=1, both X and Y become more probable After observing E=1 and X=1, Y becomes less probable since X alone is enough to “explain” E
“Explaining away” and common-sense Historical note: Classical logic is monotonic: the more you know, the more you deduce. “Common-sense” reasoning is not monotonic birds fly but, not after being cooked for 20min/lb at 350o F This led to numerous “non-monotonic logics” for AI This examples shows that Bayes nets are not monotonic If P(Y|E) is “your belief” in Y after observing E, and P(Y|X,E) is “your belief” in Y after observing E,X your belief in Y decreases after you discover X
A special case: linear chain networks X1 Xn ... Xj ... d-separation “backward” “forward”
A special case: linear chain networks X1 Xn ... Xj ... Fwd: Recursion! (fwd) CPT entry
A special case: linear chain networks X1 Xn ... Xj ... Back: Chain rule CPT Recursion backward
A special case: linear chain networks X1 Xn ... Xj ... “backward” “forward” Instead of recursion: iteratively compute P(Xj|x1) from P(Xj-1|x1) – the forward probabilities iteratively compute P(xn|Xj) from P(xn|Xj+1) – the backward probabilities can view the forward computations as passing a “message” forward and vice versa
Linear-chain message passing How long is this !#@ line? 1 1 Xj ... Xn X1 E+ E- j-1 n-j 2 … … ? ? … How many ahead? … How many behind?
Linear-chain message passing P(Xj|E) = P(X|E+)P(X|E-) … true by d-separation Xj ... Xn X1 E+ E- Pass forward: P(Xj|E+)…computed from P(Xj-1|E+) and CPT for Xj Pass backward: P(Xj|E-)…computed from P(Xj+1|E-) and CPT for Xj+1
Inference in Bayes nets General problem: given evidence E1,…,Ek compute P(X|E1,..,Ek) for any X Big assumption: graph is “polytree” <=1 undirected path between any nodes X,Y Notation: U1 U2 X Z1 Z2 Y1 Y2
Inference in Bayes nets: P(X|E)
Inference in Bayes nets: P(X|E+) d-sep – write as product d-sep.
Inference in Bayes nets: P(X|E+) d-sep – write as product d-sep. CPT table lookup So far: simple way of propogating “belief due to causal evidence” up the tree Recursive call to P(.|E+)
Inference in Bayes nets: P(E-|X) recursion
Inference in Bayes nets: P(E-|X) recursion
Inference in Bayes nets: P(E-|X)
Inference in Bayes nets: P(E-|X) Recursive call to P(.|E) CPT Recursive call to P(E-|.) where
More on Message Passing We reduced P(X|E) to product of two recursively calculated parts: P(X=x|E+) i.e., CPT for X and product of “forward” messages from parents P(E-|X=x) i.e., combination of “backward” messages from parents, CPTs, and P(Z|EZ\Yk), a simpler instance of P(X|E) This can also be implemented by message-passing (belief propogation)
Learning for Bayes nets Input: Sample of the joint: Graph structure of the variables for I=1,…,N, you know Xi and parents(Xi) Output: Estimated CPTs A B B P(B) 1 0.33 2 3 C D Method (discrete variables): Estimate each CPT independently Use a MLE or MAP A B C P(C|A,B) 1 2 0.5 3 1.0 … E …
Learning for Bayes nets Method (discrete variables): Estimate each CPT independently Use a MLE or MAP MLE: A B B P(B) 1 0.33 2 3 C D A B C P(C|A,B) 1 2 0.5 3 1.0 … E …
MAP estimates The beta distribution: “pseudo-data”: like hallucinating a few heads and a few tails
The Dirichlet distribution: MAP estimates The Dirichlet distribution: “pseudo-data”: like hallucinating αi examples of X=i for each value of i
Learning for Bayes nets Method (discrete variables): Estimate each CPT independently Use a MLE or MAP MAP: A B B P(B) 1 0.33 2 3 C D A B C P(C|A,B) 1 2 0.5 3 1.0 … E …
Additional reading ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf A Tutorial on Learning With Bayesian Networks, Heckerman