1
Probabilistic Reasoning
2
Power of Cond. Independence
Often, exploiting conditional independence reduces the storage complexity of the joint distribution from exponential to linear! Conditional independence is the most basic and robust form of knowledge about uncertain environments. © Daniel S. Weld
3
Bayes Rule
Simple proof from the definition of conditional probability:
1. P(H|E) = P(H ∧ E) / P(E)   (def. cond. prob.)
2. P(E|H) = P(H ∧ E) / P(H)   (def. cond. prob.)
3. P(H ∧ E) = P(E|H) P(H)   (multiply line 2 by P(H))
4. P(H|E) = P(E|H) P(H) / P(E)   (substitute 3 into 1)
QED © Daniel S. Weld
4
Use to Compute Diagnostic Probability from Causal Probability
E.g., let M be meningitis, S be stiff neck: P(M) = , P(S) = 0.1, P(S|M) = 0.8.
P(M|S) = P(S|M) P(M) / P(S) = (0.8 / 0.1) · P(M)
© Daniel S. Weld
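A quick numeric check of this slide's Bayes-rule computation, as a minimal Python sketch. The slide does not give P(M), so the prior used below is an illustrative assumption, not the slide's value.

```python
# Diagnostic probability from causal probability via Bayes' rule.
p_m = 0.0001        # ASSUMED prior for meningitis; the slide omits this value
p_s = 0.1           # P(S): probability of a stiff neck (from the slide)
p_s_given_m = 0.8   # P(S|M): causal direction (from the slide)

# Bayes' rule: P(M|S) = P(S|M) P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(f"P(M|S) = {p_m_given_s}")   # 0.0008 under the assumed prior
```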
5
Bayes’ Rule & Cond. Independence
© Daniel S. Weld
6
Bayes Nets
In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation and inference. BNs provide a graphical representation of conditional independence relations in P:
- usually quite compact
- requires assessment of fewer parameters, those being quite natural (e.g., causal)
- efficient (usually) inference: query answering and belief update
© Daniel S. Weld
7
BN What do the arrows really mean?
Topology may happen to encode causal structure.
Topology is only guaranteed to encode conditional independence.
© Daniel S. Weld
8
Independence
If X1, X2, ..., Xn are mutually independent, then
P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn)
- The joint can be specified with n parameters (cf. the usual 2^n − 1 parameters required).
- While such complete independence is unusual, conditional independence is common.
- BNs exploit this conditional independence.
© Daniel S. Weld
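To make the gap concrete: for n = 20 Boolean variables, the full joint needs 2^20 − 1 = 1,048,575 independent numbers, while mutual independence needs only 20.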
9
An Example Bayes Net
Structure: Earthquake and Burglary are parents of Alarm; Alarm is the parent of Nbr1Calls and Nbr2Calls.
Priors: Pr(E=t) = 0.002, Pr(B=t) = 0.001.
Pr(A|E,B): one entry per truth assignment to E and B (values in the figure).
P(N1|A): A=t: 0.9, A=f: 0.05.   P(N2|A): A=t: 0.7, A=f: 0.01.
© Daniel S. Weld
10
Earthquake Example (cont'd)
If I know whether the Alarm went off, no other evidence influences my degree of belief in Nbr1Calls:
P(N1|N2,A,E,B) = P(N1|A)
also: P(N2|A,E,B) = P(N2|A) and P(E|B) = P(E)
By the chain rule we have
P(N1,N2,A,E,B) = P(N1|N2,A,E,B) · P(N2|A,E,B) · P(A|E,B) · P(E|B) · P(B)
= P(N1|A) · P(N2|A) · P(A|E,B) · P(E) · P(B)
The full joint requires only 10 parameters (cf. 32): 10 = 2 + 2 + 4 + 1 + 1.
© Daniel S. Weld
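As a sketch of how the factored form pins down the whole joint: the code below multiplies the five CPTs to produce any joint entry. P(E), P(B), P(N1|A), and P(N2|A) are the slide's numbers; the four P(A|E,B) entries did not survive in readable form, so standard textbook values are assumed in their place.

```python
# One joint entry from five CPTs (10 parameters total), using
# P(N1,N2,A,E,B) = P(N1|A) P(N2|A) P(A|E,B) P(E) P(B).
p_e, p_b = 0.002, 0.001                   # priors from the slide
p_a = {(True, True): 0.95,                # P(A=t|E,B) keyed by (e, b);
       (True, False): 0.29,               # these four numbers are ASSUMED
       (False, True): 0.94,               # (standard textbook values), since
       (False, False): 0.001}             # the slide's table was unreadable
p_n1 = {True: 0.9, False: 0.05}           # P(N1=t|A), from the slide
p_n2 = {True: 0.7, False: 0.01}           # P(N2=t|A), from the slide

def joint(n1, n2, a, e, b):
    """P(N1=n1, N2=n2, A=a, E=e, B=b) as a product of CPT entries."""
    term = lambda p, v: p if v else 1 - p
    return (term(p_n1[a], n1) * term(p_n2[a], n2) *
            term(p_a[(e, b)], a) * term(p_e, e) * term(p_b, b))

print(joint(True, True, True, False, True))   # P(n1, n2, a, not-e, b)
```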
11
BNs: Qualitative Structure
The graphical structure of a BN reflects conditional independence among variables:
- Each variable X is a node in the DAG.
- Edges denote direct probabilistic influence (usually interpreted causally).
- The parents of X are denoted Par(X).
- X is conditionally independent of all its non-descendants given its parents.
- A graphical test exists for more general independence: the "Markov blanket".
© Daniel S. Weld
12
Given Parents, X is Independent of Non-Descendants
© Daniel S. Weld
13
For Example
(Figure: the earthquake network with nodes Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls.)
© Daniel S. Weld
14
Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)) © Daniel S. Weld
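A minimal sketch of computing a Markov blanket from the DAG alone, under the assumption (the usual version of this example) that Radio's parent is Earthquake; everything else matches the slides' network.

```python
# MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)), computed from parent sets.
parents = {
    "Earthquake": set(), "Burglary": set(),
    "Radio": {"Earthquake"},                 # ASSUMED edge Earthquake -> Radio
    "Alarm": {"Earthquake", "Burglary"},
    "Nbr1Calls": {"Alarm"}, "Nbr2Calls": {"Alarm"},
}

def markov_blanket(x):
    children = {n for n, ps in parents.items() if x in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (parents[x] | children | co_parents) - {x}

print(markov_blanket("Alarm"))
# -> {'Earthquake', 'Burglary', 'Nbr1Calls', 'Nbr2Calls'}
```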
15
Conditional Probability Tables
(Figure: the earthquake network with its CPTs: Pr(B=t)/Pr(B=f), and Pr(A|E,B) with one entry per truth assignment to E and B: 0.1, 0.8, 0.15, 0.99.)
© Daniel S. Weld
16
Conditional Probability Tables
For a complete specification of the joint distribution, quantify the BN:
- For each variable X, specify a CPT: P(X | Par(X)).
- The number of parameters is locally exponential in |Par(X)|.
If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
P(Xn, Xn−1, ..., X1) = P(Xn | Xn−1, ..., X1) · P(Xn−1 | Xn−2, ..., X1) · ... · P(X2 | X1) · P(X1)
= P(Xn | Par(Xn)) · P(Xn−1 | Par(Xn−1)) · ... · P(X1)
© Daniel S. Weld
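"Locally exponential in |Par(X)|" can be checked directly: for Boolean variables, each CPT needs 2^|Par(X)| numbers (one P(X=t | assignment) per parent assignment). A minimal sketch for the earthquake network:

```python
# Parameter count for the earthquake network, Boolean variables assumed.
parents = {
    "Earthquake": [], "Burglary": [],
    "Alarm": ["Earthquake", "Burglary"],
    "Nbr1Calls": ["Alarm"], "Nbr2Calls": ["Alarm"],
}

n_params = sum(2 ** len(ps) for ps in parents.values())
print(n_params)  # 1 + 1 + 4 + 2 + 2 = 10, vs 2^5 = 32 entries in the full joint
```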
17
Bayes Nets Representation Summary
- Bayes nets compactly encode joint distributions.
- Guaranteed independencies of distributions can be deduced from the BN graph structure.
- D-separation gives precise conditional independence guarantees from the graph alone.
- A Bayes net's joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution.
© Daniel S. Weld
18
Inference in BNs
Inference: calculating some useful quantity from a joint probability distribution. The graphical independence representation yields efficient inference schemes.
- Posterior probability: P(Q | E1=e1, ..., Ek=ek)
- Most likely explanation: argmax_q P(Q=q | E=e)
Computations are organized by network topology.
© Daniel S. Weld
19
P(B | J=true, M=true)
(Figure: the earthquake network with nodes Earthquake, Burglary, Radio, Alarm, John, Mary.)
P(b|j,m) = α P(b) Σ_e P(e) Σ_a P(a|b,e) P(j|a) P(m|a)
© Daniel S. Weld
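This summation can be run directly as enumeration. In the minimal sketch below, P(E) and P(B) are the slides' priors and P(j|a), P(m|a) reuse the Nbr1/Nbr2 tables from the earlier slide; the four P(a|b,e) entries are ASSUMED standard textbook values, since they are not legible in this transcript.

```python
from itertools import product

# P(b|j,m) = alpha P(b) Σ_e P(e) Σ_a P(a|b,e) P(j|a) P(m|a), by enumeration.
P_B, P_E = 0.001, 0.002                              # priors (from the slides)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(a=t | b, e) keyed by
       (False, True): 0.29, (False, False): 0.001}   # (b, e): ASSUMED values
P_J = {True: 0.90, False: 0.05}                      # P(j=t | a) (slides' N1)
P_M = {True: 0.70, False: 0.01}                      # P(m=t | a) (slides' N2)

def unnormalized(b):
    total = 0.0
    for e, a in product([True, False], repeat=2):
        p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        total += (P_E if e else 1 - P_E) * p_a * P_J[a] * P_M[a]
    return (P_B if b else 1 - P_B) * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})     # P(B | j, m)
```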
20
Inference by Enumeration
Given unlimited time, inference in BNs is easy. Recipe:
1. State the marginal probabilities you need.
2. Figure out ALL the atomic probabilities you need.
3. Calculate and combine them.
(Worked example in the figure.)
© Daniel S. Weld
21
In this simple method, we only need the BN to synthesize the joint entries
© Daniel S. Weld
22
Inference by Enumeration?
© Daniel S. Weld
23
Variable Elimination
Why is inference by enumeration so slow?
You join up the whole joint distribution before you sum out the hidden variables, so you end up repeating a lot of work!
Idea: interleave joining and marginalizing!
- Called "variable elimination" (VE).
- Still NP-hard, but usually much faster than inference by enumeration.
We'll need some new notation to define VE.
© Daniel S. Weld
24
Factor 1
Joint distribution: P(X,Y)
- Entries P(x,y) for all x, y
- Sums to 1
Selected joint: P(x,Y)
- A slice of the joint distribution
- Entries P(x,y) for fixed x, all y
- Sums to P(x)
© Daniel S. Weld
25
Factor 2
Family of conditionals: P(X|Y)
- Multiple conditionals
- Entries P(x|y) for all x, y
- Sums to |Y|
Single conditional: P(Y|x)
- Entries P(y|x) for fixed x, all y
- Sums to 1
© Daniel S. Weld
26
Specified family: P(y|X)
- Entries P(y|x) for fixed y, but for all x
- Sums to ... who knows!
In general, when we write P(Y1 ... YN | X1 ... XM):
- It is a "factor", a multi-dimensional array.
- Its values are all P(y1 ... yN | x1 ... xM).
- Any assigned X or Y is a dimension missing (selected) from the array.
© Daniel S. Weld
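These factor types can be made concrete with arrays. A minimal NumPy sketch; the traffic-domain numbers below are illustrative assumptions, not the slides' values:

```python
import numpy as np

# A factor is a multi-dimensional array with one axis per unassigned variable.
# Axis 0 is R, axis 1 is T; index 0 means "+", index 1 means "-".
P_R = np.array([0.1, 0.9])              # distribution P(R): sums to 1
P_T_given_R = np.array([[0.8, 0.2],     # family of conditionals P(T|R):
                        [0.1, 0.9]])    # each row sums to 1

print(P_R.sum())                # 1.0
print(P_T_given_R.sum())        # 2.0 -- a family sums to |R|
print(P_T_given_R[0].sum())     # 1.0 -- single conditional P(T|+r)
print(P_T_given_R[:, 0].sum())  # specified family P(+t|R): sums to... who knows
```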
27
Example: Traffic Domain
Random variables:
- R: Raining
- T: Traffic
- L: Late for class!
First query: P(L)
© Daniel S. Weld
28
Operation 1: Join Factors
First basic operation: joining factors. Combining factors is just like a database join:
- Get all factors over the joining variable.
- Build a new factor over the union of the variables involved.
Example: join on R; the computation for each entry is a pointwise product (see the sketch below).
© Daniel S. Weld
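A minimal sketch of the join on R, using the same illustrative traffic numbers as above (assumed, not the slides'):

```python
import numpy as np

# Join on R: combine P(R) and P(T|R) into one factor over {R, T}
# via pointwise products.
P_R = np.array([0.1, 0.9])                 # axis: R = (+r, -r)
P_T_given_R = np.array([[0.8, 0.2],
                        [0.1, 0.9]])       # axes: (R, T)

P_RT = P_R[:, None] * P_T_given_R          # entry (r, t) = P(r) * P(t|r)
print(P_RT)        # [[0.08 0.02] [0.09 0.81]]
print(P_RT.sum())  # 1.0 -- now a joint factor over R and T
```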
29
Example: Multiple Joins
© Daniel S. Weld
30
© Daniel S. Weld
31
Operation 2: Eliminate
Second basic operation: marginalization.
- Take a factor and sum out a variable.
- This shrinks a factor to a smaller one.
- It is a projection operation.
Example (see the sketch below):
© Daniel S. Weld
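Continuing the same illustrative numbers, summing R out of the joined factor:

```python
import numpy as np

# Eliminate (sum out) R from the factor P(R, T): a projection that
# shrinks the factor from two axes to one.
P_RT = np.array([[0.08, 0.02],
                 [0.09, 0.81]])    # axes: (R, T), from the join above

P_T = P_RT.sum(axis=0)             # marginal factor over T alone
print(P_T)                         # [0.17 0.83]
```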
32
Multiple Elimination © Daniel S. Weld
33
P(L) : Marginalizing Early!
© Daniel S. Weld
34
Marginalizing Early-2 © Daniel S. Weld
35
Evidence
If there is evidence, start with factors that select that evidence.
- With no evidence, we use the initial factors as-is.
- Computing P(L|+r), the initial factors become factors selected on +r (see the sketch below).
- We eliminate all variables other than the query and the evidence.
© Daniel S. Weld
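Selecting evidence is just slicing out the matching entries before joining; a minimal sketch with the same assumed traffic numbers (restricted to R and T for brevity):

```python
import numpy as np

# Evidence +r: keep only the +r slice of every factor mentioning R.
P_R = np.array([0.1, 0.9])                 # axis: R = (+r, -r)
P_T_given_R = np.array([[0.8, 0.2],
                        [0.1, 0.9]])       # axes: (R, T)

f_r = P_R[0]            # scalar factor P(+r)
f_t = P_T_given_R[0]    # factor over T: P(T | +r)
print(f_r * f_t)        # selected joint P(+r, T); normalize to get P(T | +r)
```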
36
Evidence II
The result will be a selected joint of query and evidence.
E.g., for P(L | +r), we'd end up with a selected joint over +r and L. To get our answer, just normalize this.
© Daniel S. Weld
37
General Variable Elimination
Query: P(Q | E1=e1, ..., Ek=ek)
Start with initial factors: local CPTs (but instantiated by evidence).
While there are still hidden variables (not Q or evidence):
1. Pick a hidden variable H.
2. Join all factors mentioning H.
3. Eliminate (sum out) H.
Finally, join all remaining factors and normalize. (A sketch of the full loop follows below.)
© Daniel S. Weld
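A minimal end-to-end sketch of this loop. The factor representation (a variable list plus a table keyed by Boolean assignments) is my own choice for illustration, evidence is assumed to be already selected into the initial factors, and the traffic numbers are the same illustrative assumptions as above:

```python
from itertools import product

# A factor is (vars, table): table maps assignments (bool tuples, in vars
# order) to numbers.

def multiply(f1, f2):
    """Join two factors: pointwise product over the union of their vars."""
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + [v for v in v2 if v not in v1]
    table = {}
    for vals in product([True, False], repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = (t1[tuple(a[v] for v in v1)] *
                       t2[tuple(a[v] for v in v2)])
    return vs, table

def sum_out(var, f):
    """Eliminate var by summing it out of the factor."""
    vs, t = f
    i = vs.index(var)
    table = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return vs[:i] + vs[i + 1:], table

def variable_elimination(factors, hidden):
    """Interleave joining and marginalizing, then normalize the remainder."""
    for h in hidden:                        # pick a hidden variable H
        related = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        joined = related[0]                 # join all factors mentioning H
        for f in related[1:]:
            joined = multiply(joined, f)
        factors.append(sum_out(h, joined))  # eliminate (sum out) H
    result = factors[0]                     # join remaining factors...
    for f in factors[1:]:
        result = multiply(result, f)
    z = sum(result[1].values())             # ...and normalize
    return result[0], {k: v / z for k, v in result[1].items()}

# Usage: P(L) in the traffic domain, with T hidden (illustrative numbers).
f_t = (["T"], {(True,): 0.17, (False,): 0.83})
f_l = (["T", "L"], {(True, True): 0.3, (True, False): 0.7,
                    (False, True): 0.1, (False, False): 0.9})
print(variable_elimination([f_t, f_l], hidden=["T"]))
```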
38
Variable Elimination Bayes Rule
© Daniel S. Weld
39
Example 2 © Daniel S. Weld
40
© Daniel S. Weld
41
Notes on VE
- Each operation is simply a multiplication of factors and summing out a variable.
- Complexity is determined by the size of the largest factor: linear in the number of variables, exponential in the size of the largest factor.
- The elimination ordering greatly impacts factor size; finding optimal elimination orderings is NP-hard, but heuristics and special structure (e.g., polytrees) help.
- On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs.
- Practically, inference is much more tractable using structure of this sort.
© Daniel S. Weld