1
Directed Graphical Probabilistic Models: the sequel
William W. Cohen Machine Learning Feb 2008
2
Summary of Monday(1): Bayes nets
Many problems can be solved using the joint probability P(X1,…,Xn). Bayes nets describe a way to compactly write the joint: one CPT per variable, conditioned on its parents.
[Figure: the Monty Hall network – A (first guess), B (the money), C (the goat: the door opened), D (stick or swap?), E (second guess) – with CPTs P(A), P(B), P(C|A,B), and P(E|A,C,D), e.g. P(A=1) = P(A=2) = P(A=3) = 0.33.]
Conditional independence: each variable is conditionally independent of its non-descendants given its parents.
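A minimal sketch (not the lecture's code; the CPT for D and the deterministic choices below are assumptions) of how this network factors the joint as a product of CPTs:

```python
# Sketch of the Monty Hall network from the slide: A (first guess), B (the money),
# C (the goat: the opened door), D (stick or swap?), E (second guess).
A_cpt = {1: 1/3, 2: 1/3, 3: 1/3}                 # P(A)
B_cpt = {1: 1/3, 2: 1/3, 3: 1/3}                 # P(B)

def C_cpt(c, a, b):
    """P(C=c | A=a, B=b): the opened door is neither the guess nor the money."""
    doors = {1, 2, 3} - {a, b}
    return 1.0 / len(doors) if c in doors else 0.0

D_cpt = {"stick": 0.5, "swap": 0.5}              # P(D): player's policy (assumed uniform)

def E_cpt(e, a, c, d):
    """P(E=e | A=a, C=c, D=d): deterministic second guess."""
    if d == "stick":
        return 1.0 if e == a else 0.0
    other = ({1, 2, 3} - {a, c}).pop()           # swap to the remaining closed door
    return 1.0 if e == other else 0.0

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D) P(E|A,C,D)."""
    return A_cpt[a] * B_cpt[b] * C_cpt(c, a, b) * D_cpt[d] * E_cpt(e, a, c, d)

print(joint(1, 2, 3, "swap", 2))   # probability of one full assignment, 1/18
```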
3
Aside: Conditional Independence?
[Equations not preserved. Annotations: chain rule (always true); a fancier version of the chain rule; definition of conditional independence.] Caveat divisor: we'll usually assume no probabilities are zero, so division is safe.
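A hedged reconstruction of the derivation those annotations label (the slide's own equations are not preserved):

```latex
% Chain rule (always true), here conditioned on E (the "fancier version"):
\[ P(X, Y \mid E) \;=\; P(X \mid E)\, P(Y \mid X, E) \]
% Definition of conditional independence I<X,E,Y> (safe to divide, per the caveat):
\[ P(Y \mid X, E) \;=\; P(Y \mid E)
   \quad\Longleftrightarrow\quad
   P(X, Y \mid E) \;=\; P(X \mid E)\, P(Y \mid E) \]
```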
4
Summary of Monday(2): d-separation
[Figure: a path from X to Y through intermediate nodes Z, with evidence nodes E.] There are three ways paths from X to Y given evidence E can be blocked: a chain X → Z → Y or a common cause X ← Z → Y where Z is in E, or a common effect X → Z ← Y where neither Z nor any descendant of Z is in E. X is d-separated from Y given E iff all paths from X to Y given E are blocked. If X is d-separated from Y given E, then I<X,E,Y>.
5
d-separation continued…
[Figure: chain X → E → Y.] Question: is X independent of Y (with no evidence)? It depends…on the CPTs. With P(X=1) = 0.5 and a CPT like P(Y=1|E) = 0.5 for either value of E, X and Y come out independent; with CPTs like P(E=1|X=0) = 0.01, P(E=1|X=1) = 0.99 and P(Y=1|E=0) = 0.01, P(Y=1|E=1) = 0.99, they do not. This is why d-separation => independence but not the converse…
6
d-separation continued…
[Figure: chain X → E → Y, with E observed.] Question: is X independent of Y given E? Yes!
7
d-separation continued…
[Figure: chain X → E → Y, with E observed.] Question: is X independent of Y given E? Yes! [Derivation not preserved; annotations: Bayes rule; a fancier version of Bayes rule; “from previous slide”.]
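One way to reconstruct the argument (a sketch using the chain's factorization; not necessarily the slide's exact steps):

```latex
% For the chain X -> E -> Y the joint factors as P(X) P(E|X) P(Y|E), so
\[ P(Y \mid X, E)
   \;=\; \frac{P(X, E, Y)}{P(X, E)}
   \;=\; \frac{P(X)\, P(E \mid X)\, P(Y \mid E)}{P(X)\, P(E \mid X)}
   \;=\; P(Y \mid E) \]
% i.e. given E, learning X tells us nothing more about Y: I<X,E,Y>.
```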
8
An aside: computations with Bayes Nets
[Figure: chain X → E → Y; the query itself is not preserved.] Main point: inference has no preferred direction in a Bayes net.
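A small illustration of that point (the numbers are made up, not the slide's): the CPTs point from X to E, but Bayes rule lets us query against the arrow just as easily.

```python
# Inference has no preferred direction: the CPT gives P(E|X), and Bayes rule
# turns it around to give P(X|E).
pX = {0: 0.5, 1: 0.5}                  # P(X)
pE_given_X = {0: 0.01, 1: 0.99}        # P(E=1 | X)

def p_X_given_E1(x):
    """P(X=x | E=1) = P(E=1|X=x) P(X=x) / sum_x' P(E=1|X=x') P(X=x')."""
    num = pE_given_X[x] * pX[x]
    den = sum(pE_given_X[xp] * pX[xp] for xp in pX)
    return num / den

print(p_X_given_E1(1))   # 0.99*0.5 / (0.99*0.5 + 0.01*0.5) = 0.99
```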
9
d-separation [Figure: the three path-blocking cases between X and Y given E.]
10
d-separation [Figure: the three path-blocking cases between X and Y given E, revisited.]
11
d-separation continued…
[Figure: common cause X ← E → Y, with E observed.] Question: is X independent of Y given E? Yes!
12
d-separation continued…
[Figure: common cause X ← E → Y, with P(E=1) = 0.5.] Question: is X independent of Y (with no evidence)? No.
13
d-separation [Figure: the three path-blocking cases between X and Y given E, revisited.]
14
d-separation continued…
[Figure: common effect X → E ← Y.] Question: is X independent of Y (with no evidence)? Yes!
15
d-separation continued…
[Figure: common effect X → E ← Y, with E observed.] Question: is X independent of Y given E? [Table: the CPT P(E|X,Y) and the joint P(E,X,Y); entries include 0.96, 0.24, 0.04, and 0.01.]
16
d-separation continued…
[Figure: common effect X → E ← Y, with E observed.] Question: is X independent of Y given E? No! [Table: the CPT P(E|X,Y) and the joint P(E,X,Y); entries include 0.96, 0.24, 0.04, and 0.01.]
17
d-separation continued…
[Figure: common effect X → E ← Y, with E observed.] Question: is X independent of Y given E? No! [Table repeated from the previous slide.]
18
“Explaining away” [Figure: common effect X → E ← Y; X and Y are independent a priori (YES) but not given E (NO).] This is “explaining away”:
E is a common symptom of two causes, X and Y. After observing E=1, both X and Y become more probable. After observing E=1 and X=1, Y becomes less probable, since X alone is enough to “explain” E.
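A small numeric illustration of the effect (the CPTs below are made up for the sketch, not taken from the slides):

```python
# Explaining away on the collider X -> E <- Y, with X and Y independent a priori.
from itertools import product

pX = {0: 0.9, 1: 0.1}          # P(X)
pY = {0: 0.9, 1: 0.1}          # P(Y)
pE = {                          # P(E=1 | X, Y): either cause makes the symptom likely
    (0, 0): 0.01, (0, 1): 0.9,
    (1, 0): 0.9,  (1, 1): 0.99,
}

def joint(x, y, e):
    p_e1 = pE[(x, y)]
    return pX[x] * pY[y] * (p_e1 if e == 1 else 1 - p_e1)

def prob(query, evidence):
    """P(query | evidence) by brute-force enumeration over (X, Y, E)."""
    num = den = 0.0
    for x, y, e in product((0, 1), repeat=3):
        world = {"X": x, "Y": y, "E": e}
        p = joint(x, y, e)
        if all(world[k] == v for k, v in evidence.items()):
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

print(prob({"Y": 1}, {}))                  # prior P(Y=1)
print(prob({"Y": 1}, {"E": 1}))            # P(Y=1 | E=1): goes up
print(prob({"Y": 1}, {"E": 1, "X": 1}))    # P(Y=1 | E=1, X=1): comes back down
```

With these numbers, P(Y=1) = 0.1, P(Y=1|E=1) ≈ 0.51, and P(Y=1|E=1,X=1) ≈ 0.11.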
19
“Explaining away” and common-sense
Historical note: Classical logic is monotonic: the more you know, the more you can deduce. “Common-sense” reasoning is not monotonic: birds fly, but not after being cooked for 20 min/lb at 350° F. This led to numerous “non-monotonic logics” for AI. This example shows that Bayes nets are not monotonic: if P(Y|E) is “your belief” in Y after observing E, and P(Y|X,E) is “your belief” in Y after observing E and X, your belief in Y can decrease after you discover X.
20
A special case: linear chain networks
[Figure: a linear chain X1 → … → Xj → … → Xn. By d-separation, Xj separates the “forward” part of the chain (before Xj) from the “backward” part (after Xj).]
21
A special case: linear chain networks
[Figure: linear chain X1 → … → Xj → … → Xn.] Fwd: the forward probability is computed by a recursion, combining a CPT entry with the forward probability of the previous node.
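A hedged reconstruction of the forward recursion (the slide's equations are not preserved):

```latex
\[ P(x_j \mid x_1) \;=\; \sum_{x_{j-1}}
   \underbrace{P(x_j \mid x_{j-1})}_{\text{CPT entry}}\;
   \underbrace{P(x_{j-1} \mid x_1)}_{\text{recursion (fwd)}},
   \qquad P(x_1 \mid x_1) = 1 \]
```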
22
A special case: linear chain networks
[Figure: linear chain X1 → … → Xj → … → Xn.] Back: the backward probability is computed with the chain rule, a CPT entry, and the backward recursion for the next node.
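A hedged reconstruction of the backward recursion (again, the slide's equations are not preserved):

```latex
\[ P(x_n \mid x_j) \;=\; \sum_{x_{j+1}}
   \underbrace{P(x_{j+1} \mid x_j)}_{\text{CPT}}\;
   \underbrace{P(x_n \mid x_{j+1})}_{\text{recursion backward}},
   \qquad P(x_n \mid x_n) = 1 \]
```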
23
A special case: linear chain networks
[Figure: linear chain X1 → … → Xj → … → Xn, “forward” and “backward”.] Instead of recursion: iteratively compute P(Xj|x1) from P(Xj-1|x1) – the forward probabilities; iteratively compute P(xn|Xj) from P(xn|Xj+1) – the backward probabilities. We can view the forward computations as passing a “message” forward along the chain, and vice versa.
24
Linear-chain message passing
How long is this line? [Figure: people X1, …, Xj, …, Xn standing in a line. Xj combines a count passed from the front (E+: “How many ahead?” – j-1 people) with a count passed from the back (E-: “How many behind?” – n-j people).]
25
Linear-chain message passing
P(Xj|E) ∝ P(Xj|E+) · P(E-|Xj) … true by d-separation: Xj blocks every path between E+ and E-. [Figure: chain X1 … Xj … Xn with upstream evidence E+ and downstream evidence E-.] Pass forward: P(Xj|E+)…computed from P(Xj-1|E+) and the CPT for Xj. Pass backward: P(E-|Xj)…computed from P(E-|Xj+1) and the CPT for Xj+1.
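A minimal sketch (binary variables and made-up CPTs are assumed; not the lecture's code) of this forward/backward message passing on a chain with evidence at both ends:

```python
# Chain X[0] -> X[1] -> ... -> X[n-1], with evidence E+ = {X[0]=e0}, E- = {X[n-1]=eN}.
import numpy as np

n = 5
rng = np.random.default_rng(0)
# CPTs[t][i, j] = P(X[t+1] = j | X[t] = i); random rows, purely illustrative.
CPTs = [rng.dirichlet(np.ones(2), size=2) for _ in range(n - 1)]
e0, eN = 1, 0                                # observed values at the two ends

# Forward messages: fwd[t][v] = P(X[t] = v | X[0] = e0)
fwd = [np.zeros(2) for _ in range(n)]
fwd[0][e0] = 1.0
for t in range(1, n):
    fwd[t] = fwd[t - 1] @ CPTs[t - 1]        # sum_u P(X[t-1]=u|e0) P(X[t]=v|X[t-1]=u)

# Backward messages: bwd[t][v] = P(X[n-1] = eN | X[t] = v)
bwd = [np.zeros(2) for _ in range(n)]
bwd[n - 1][eN] = 1.0
for t in range(n - 2, -1, -1):
    bwd[t] = CPTs[t] @ bwd[t + 1]            # sum_w P(X[t+1]=w|X[t]=v) P(eN|X[t+1]=w)

# Combine: P(X[t] | E) is proportional to P(X[t] | E+) * P(E- | X[t])
for t in range(n):
    belief = fwd[t] * bwd[t]
    print(f"P(X[{t}] | E) =", belief / belief.sum())
```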
26
Inference in Bayes nets
General problem: given evidence E1,…,Ek, compute P(X|E1,…,Ek) for any X. Big assumption: the graph is a “polytree”: <=1 undirected path between any pair of nodes X,Y. Notation: [Figure: node X with parents U1, U2, children Y1, Y2, and the children's other parents Z1, Z2.]
27
Inference in Bayes nets: P(X|E)
28
Inference in Bayes nets: P(X|E+)
[Equations not preserved; the annotations read: d-sep – write as product; d-sep.]
29
Inference in Bayes nets: P(X|E+)
[Equations not preserved; the annotations read: d-sep – write as product; d-sep; CPT table lookup; recursive call to P(·|E+).] So far: a simple way of propagating “belief due to causal evidence” up the tree.
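A hedged reconstruction of the formula such a derivation arrives at (standard polytree form; U1,…,Um are X's parents and E+Ui is the causal evidence reaching X through Ui; the slide's exact equations are not preserved):

```latex
\[ P(X \mid E^{+}) \;=\; \sum_{u_1,\dots,u_m}
   \underbrace{P(X \mid u_1,\dots,u_m)}_{\text{CPT table lookup}}\;
   \prod_{i=1}^{m} \underbrace{P\bigl(u_i \mid E^{+}_{U_i}\bigr)}_{\text{recursive call to } P(\cdot \mid E^{+})} \]
% d-separation justifies conditioning on the parents and factoring their joint into a product.
```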
30
Inference in Bayes nets: P(E-|X)
[Equation not preserved; annotation: recursion.]
31
Inference in Bayes nets: P(E-|X)
[Equation not preserved; annotation: recursion.]
32
Inference in Bayes nets: P(E-|X)
33
Inference in Bayes nets: P(E-|X)
[Equations not preserved; the annotations read: recursive call to P(·|E); CPT; recursive call to P(E-|·); “where …”.]
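A hedged reconstruction of the resulting formula (standard polytree form, matching the ingredients named above and on the next slide; Y1,…,Yc are X's children and Zk are the other parents of child Yk; the slide's exact equations are not preserved):

```latex
\[ P(E^{-} \mid X) \;=\; \prod_{k=1}^{c} \sum_{y_k}
   \underbrace{P\bigl(E^{-}_{Y_k} \mid y_k\bigr)}_{\text{recursive call to } P(E^{-} \mid \cdot)}
   \;\sum_{z_k}
   \underbrace{P(y_k \mid X, z_k)}_{\text{CPT}}\;
   \underbrace{P\bigl(z_k \mid E_{Z_k \setminus Y_k}\bigr)}_{\text{recursive call to } P(\cdot \mid E)} \]
```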
34
More on Message Passing
We reduced P(X|E) to a product of two recursively calculated parts:
P(X=x|E+), i.e., the CPT for X combined with the product of “forward” messages from X's parents
P(E-|X=x), i.e., a combination of “backward” messages from X's children, CPTs, and P(Z|EZ\Yk), a simpler instance of P(X|E)
This can also be implemented by message passing (belief propagation)
35
Learning for Bayes nets
Input: a sample of the joint, plus the graph structure of the variables: for i=1,…,N you know Xi and parents(Xi). Output: estimated CPTs. [Figure: the example network A, B, C, D, E with CPTs such as P(B) and P(C|A,B).] Method (discrete variables): estimate each CPT independently; use an MLE or MAP.
36
Learning for Bayes nets
Method (discrete variables): estimate each CPT independently; use an MLE or MAP. MLE: [Equation not preserved – the maximum-likelihood estimate of a CPT entry is the empirical frequency, e.g. the number of samples with C=c, A=a, B=b divided by the number with A=a, B=b.] [Figure: the example network A, B, C, D, E with its CPTs.]
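A minimal sketch (the data format and variable names are assumed, not the lecture's code) of estimating one CPT from a sample of the joint, with MLE and with pseudo-count smoothing:

```python
# Estimate P(C | A, B) from samples of the joint; the structure says parents(C) = (A, B).
from collections import Counter

data = [
    {"A": 1, "B": 2, "C": 2},
    {"A": 1, "B": 2, "C": 3},
    {"A": 3, "B": 1, "C": 2},
    # ... more samples of the joint ...
]
c_values = [1, 2, 3]

def estimate_cpt(data, child, parents, child_values, pseudo=0.0):
    """P-hat(child | parents).  pseudo=0 gives the MLE (empirical frequencies);
    pseudo>0 adds Dirichlet-style pseudo-counts (a MAP-style smoothed estimate).
    Only parent configurations seen in the data get rows here."""
    joint_counts = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    cpt = {}
    for u in parent_counts:
        denom = parent_counts[u] + pseudo * len(child_values)
        for c in child_values:
            cpt[(u, c)] = (joint_counts[(u, c)] + pseudo) / denom
    return cpt

print(estimate_cpt(data, "C", ("A", "B"), c_values))              # MLE
print(estimate_cpt(data, "C", ("A", "B"), c_values, pseudo=1.0))  # with pseudo-counts
```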
37
MAP estimates: the beta distribution
“pseudo-data”: like hallucinating a few heads and a few tails
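A hedged reconstruction of the (missing) beta formulas; whether the pseudo-counts appear with or without a −1 depends on convention (posterior mode vs. posterior mean), so the smoothed form below is only one reading of the slide:

```latex
% Beta prior over a coin's heads-probability theta:
\[ p(\theta) \;\propto\; \theta^{\alpha_H - 1} (1 - \theta)^{\alpha_T - 1} \]
% With n_H heads and n_T tails observed, the pseudo-count ("hallucinated data") estimate is
\[ \hat{\theta} \;=\; \frac{n_H + \alpha_H}{n_H + n_T + \alpha_H + \alpha_T} \]
```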
38
MAP estimates: the Dirichlet distribution
“pseudo-data”: like hallucinating αi examples of X=i for each value of i
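The analogous hedged reconstruction for the Dirichlet case (same caveat about conventions):

```latex
% Dirichlet prior over a discrete distribution (theta_1, ..., theta_K):
\[ p(\theta) \;\propto\; \prod_{i=1}^{K} \theta_i^{\alpha_i - 1} \]
% With n_i observed examples of X = i, the pseudo-count estimate is
\[ \hat{P}(X = i) \;=\; \frac{n_i + \alpha_i}{\sum_{k=1}^{K} (n_k + \alpha_k)} \]
```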
39
Learning for Bayes nets
Method (discrete variables): estimate each CPT independently; use an MLE or MAP. MAP: [Equation not preserved – add the Dirichlet pseudo-counts to the empirical counts before normalizing, as on the previous slides.] [Figure: the example network A, B, C, D, E with its CPTs.]
40
Additional reading: A Tutorial on Learning With Bayesian Networks, Heckerman (ftp://ftp.research.microsoft.com/pub/tr/tr pdf)