1
Probabilistic Reasoning
2
Power of Cond. Independence
Often, exploiting conditional independence reduces the storage complexity of the joint distribution from exponential to linear! Conditional independence is the most basic and robust form of knowledge about uncertain environments. © Daniel S. Weld
3
Bayes Rule
Simple proof from the definition of conditional probability:
1. P(H|E) = P(H ∧ E) / P(E)   (def. cond. prob.)
2. P(E|H) = P(H ∧ E) / P(H)   (def. cond. prob.)
3. P(H ∧ E) = P(E|H) P(H)   (multiply line 2 by P(H))
4. P(H|E) = P(E|H) P(H) / P(E)   (substitute 3 into 1)
QED © Daniel S. Weld
4
Use to Compute Diagnostic Probability from Causal Probability
E.g., let M be meningitis, S be stiff neck: P(M) = , P(S) = 0.1, P(S|M) = 0.8.
P(M|S) = P(S|M) P(M) / P(S) = (0.8 / 0.1) · P(M)
© Daniel S. Weld
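A quick numeric check of this slide's Bayes-rule computation, as a minimal Python sketch. The slide does not give P(M), so the prior used below is an illustrative assumption, not the slide's value.

```python
# Diagnostic probability from causal probability via Bayes' rule.
p_m = 0.0001        # ASSUMED prior for meningitis; the slide omits this value
p_s = 0.1           # P(S): probability of a stiff neck (from the slide)
p_s_given_m = 0.8   # P(S|M): causal direction (from the slide)

# Bayes' rule: P(M|S) = P(S|M) P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(f"P(M|S) = {p_m_given_s}")   # 0.0008 under the assumed prior
```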
5
Bayes’ Rule & Cond. Independence
© Daniel S. Weld
6
Bayes Nets
In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation and inference. BNs provide a graphical representation of conditional independence relations in P:
- usually quite compact
- requires assessment of fewer parameters, those being quite natural (e.g., causal)
- efficient (usually) inference: query answering and belief update
© Daniel S. Weld
7
BN What do the arrows really mean?
Topology may happen to encode causal structure.
Topology is only guaranteed to encode conditional independence.
© Daniel S. Weld
8
Independence
If X1, X2, ..., Xn are mutually independent, then
P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn)
- The joint can be specified with n parameters (cf. the usual 2^n − 1 parameters required).
- While such complete independence is unusual, conditional independence is common.
- BNs exploit this conditional independence.
© Daniel S. Weld
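To make the gap concrete: for n = 20 Boolean variables, the full joint needs 2^20 − 1 = 1,048,575 independent numbers, while mutual independence needs only 20.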
9
An Example Bayes Net
Structure: Earthquake and Burglary are parents of Alarm; Alarm is the parent of Nbr1Calls and Nbr2Calls.
Priors: Pr(E=t) = 0.002, Pr(B=t) = 0.001.
Pr(A|E,B): one entry per truth assignment to E and B (values in the figure).
P(N1|A): A=t: 0.9, A=f: 0.05.   P(N2|A): A=t: 0.7, A=f: 0.01.
© Daniel S. Weld
10
Earthquake Example (cont'd)
If I know whether the Alarm went off, no other evidence influences my degree of belief in Nbr1Calls:
P(N1|N2,A,E,B) = P(N1|A)
also: P(N2|A,E,B) = P(N2|A) and P(E|B) = P(E)
By the chain rule we have
P(N1,N2,A,E,B) = P(N1|N2,A,E,B) · P(N2|A,E,B) · P(A|E,B) · P(E|B) · P(B)
= P(N1|A) · P(N2|A) · P(A|E,B) · P(E) · P(B)
The full joint requires only 10 parameters (cf. 32): 10 = 2 + 2 + 4 + 1 + 1.
© Daniel S. Weld
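As a sketch of how the factored form pins down the whole joint: the code below multiplies the five CPTs to produce any joint entry. P(E), P(B), P(N1|A), and P(N2|A) are the slide's numbers; the four P(A|E,B) entries did not survive in readable form, so standard textbook values are assumed in their place.

```python
# One joint entry from five CPTs (10 parameters total), using
# P(N1,N2,A,E,B) = P(N1|A) P(N2|A) P(A|E,B) P(E) P(B).
p_e, p_b = 0.002, 0.001                   # priors from the slide
p_a = {(True, True): 0.95,                # P(A=t|E,B) keyed by (e, b);
       (True, False): 0.29,               # these four numbers are ASSUMED
       (False, True): 0.94,               # (standard textbook values), since
       (False, False): 0.001}             # the slide's table was unreadable
p_n1 = {True: 0.9, False: 0.05}           # P(N1=t|A), from the slide
p_n2 = {True: 0.7, False: 0.01}           # P(N2=t|A), from the slide

def joint(n1, n2, a, e, b):
    """P(N1=n1, N2=n2, A=a, E=e, B=b) as a product of CPT entries."""
    term = lambda p, v: p if v else 1 - p
    return (term(p_n1[a], n1) * term(p_n2[a], n2) *
            term(p_a[(e, b)], a) * term(p_e, e) * term(p_b, b))

print(joint(True, True, True, False, True))   # P(n1, n2, a, not-e, b)
```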
11
BNs: Qualitative Structure
The graphical structure of a BN reflects conditional independence among variables:
- Each variable X is a node in the DAG.
- Edges denote direct probabilistic influence (usually interpreted causally).
- The parents of X are denoted Par(X).
- X is conditionally independent of all its non-descendants given its parents.
- A graphical test exists for more general independence: the "Markov blanket".
© Daniel S. Weld
12
Given Parents, X is Independent of Non-Descendants
© Daniel S. Weld
13
For Example
(Figure: the earthquake network with nodes Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls.)
© Daniel S. Weld
14
Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)) © Daniel S. Weld
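A minimal sketch of computing a Markov blanket from the DAG alone, under the assumption (the usual version of this example) that Radio's parent is Earthquake; everything else matches the slides' network.

```python
# MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)), computed from parent sets.
parents = {
    "Earthquake": set(), "Burglary": set(),
    "Radio": {"Earthquake"},                 # ASSUMED edge Earthquake -> Radio
    "Alarm": {"Earthquake", "Burglary"},
    "Nbr1Calls": {"Alarm"}, "Nbr2Calls": {"Alarm"},
}

def markov_blanket(x):
    children = {n for n, ps in parents.items() if x in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (parents[x] | children | co_parents) - {x}

print(markov_blanket("Alarm"))
# -> {'Earthquake', 'Burglary', 'Nbr1Calls', 'Nbr2Calls'}
```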
15
Conditional Probability Tables
(Figure: the earthquake network with its CPTs: Pr(B=t)/Pr(B=f), and Pr(A|E,B) with one entry per truth assignment to E and B: 0.1, 0.8, 0.15, 0.99.)
© Daniel S. Weld
16
Conditional Probability Tables
For a complete specification of the joint distribution, quantify the BN:
- For each variable X, specify a CPT: P(X | Par(X)).
- The number of parameters is locally exponential in |Par(X)|.
If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
P(Xn, Xn−1, ..., X1) = P(Xn | Xn−1, ..., X1) · P(Xn−1 | Xn−2, ..., X1) · ... · P(X2 | X1) · P(X1)
= P(Xn | Par(Xn)) · P(Xn−1 | Par(Xn−1)) · ... · P(X1)
© Daniel S. Weld
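"Locally exponential in |Par(X)|" can be checked directly: for Boolean variables, each CPT needs 2^|Par(X)| numbers (one P(X=t | assignment) per parent assignment). A minimal sketch for the earthquake network:

```python
# Parameter count for the earthquake network, Boolean variables assumed.
parents = {
    "Earthquake": [], "Burglary": [],
    "Alarm": ["Earthquake", "Burglary"],
    "Nbr1Calls": ["Alarm"], "Nbr2Calls": ["Alarm"],
}

n_params = sum(2 ** len(ps) for ps in parents.values())
print(n_params)  # 1 + 1 + 4 + 2 + 2 = 10, vs 2^5 = 32 entries in the full joint
```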
17
Bayes Nets Representation Summary
- Bayes nets compactly encode joint distributions.
- Guaranteed independencies of distributions can be deduced from the BN graph structure.
- D-separation gives precise conditional independence guarantees from the graph alone.
- A Bayes net's joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution.
© Daniel S. Weld
18
Inference in BNs
Inference: calculating some useful quantity from a joint probability distribution. The graphical independence representation yields efficient inference schemes.
- Posterior probability: P(Q | E1=e1, ..., Ek=ek)
- Most likely explanation: argmax_q P(Q=q | E=e)
Computations are organized by network topology.
© Daniel S. Weld
19
P(B | J=true, M=true)
(Figure: the earthquake network with nodes Earthquake, Burglary, Radio, Alarm, John, Mary.)
P(b|j,m) = α P(b) Σ_e P(e) Σ_a P(a|b,e) P(j|a) P(m|a)
© Daniel S. Weld
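This summation can be run directly as enumeration. In the minimal sketch below, P(E) and P(B) are the slides' priors and P(j|a), P(m|a) reuse the Nbr1/Nbr2 tables from the earlier slide; the four P(a|b,e) entries are ASSUMED standard textbook values, since they are not legible in this transcript.

```python
from itertools import product

# P(b|j,m) = alpha P(b) Σ_e P(e) Σ_a P(a|b,e) P(j|a) P(m|a), by enumeration.
P_B, P_E = 0.001, 0.002                              # priors (from the slides)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(a=t | b, e) keyed by
       (False, True): 0.29, (False, False): 0.001}   # (b, e): ASSUMED values
P_J = {True: 0.90, False: 0.05}                      # P(j=t | a) (slides' N1)
P_M = {True: 0.70, False: 0.01}                      # P(m=t | a) (slides' N2)

def unnormalized(b):
    total = 0.0
    for e, a in product([True, False], repeat=2):
        p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        total += (P_E if e else 1 - P_E) * p_a * P_J[a] * P_M[a]
    return (P_B if b else 1 - P_B) * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})     # P(B | j, m)
```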
20
Inference by Enumeration
Given unlimited time, inference in BNs is easy. Recipe:
1. State the marginal probabilities you need.
2. Figure out ALL the atomic probabilities you need.
3. Calculate and combine them.
(Worked example in the figure.)
© Daniel S. Weld
21
In this simple method, we only need the BN to synthesize the joint entries
© Daniel S. Weld
22
Inference by Enumeration?
© Daniel S. Weld
23
Variable Elimination
Why is inference by enumeration so slow?
You join up the whole joint distribution before you sum out the hidden variables, so you end up repeating a lot of work!
Idea: interleave joining and marginalizing!
- Called "variable elimination" (VE).
- Still NP-hard, but usually much faster than inference by enumeration.
We'll need some new notation to define VE.
© Daniel S. Weld
24
Factor 1
Joint distribution: P(X,Y)
- Entries P(x,y) for all x, y
- Sums to 1
Selected joint: P(x,Y)
- A slice of the joint distribution
- Entries P(x,y) for fixed x, all y
- Sums to P(x)
© Daniel S. Weld
25
Factor 2
Family of conditionals: P(X|Y)
- Multiple conditionals
- Entries P(x|y) for all x, y
- Sums to |Y|
Single conditional: P(Y|x)
- Entries P(y|x) for fixed x, all y
- Sums to 1
© Daniel S. Weld
26
Specified family: P(y|X)
- Entries P(y|x) for fixed y, but for all x
- Sums to ... who knows!
In general, when we write P(Y1 ... YN | X1 ... XM):
- It is a "factor", a multi-dimensional array.
- Its values are all P(y1 ... yN | x1 ... xM).
- Any assigned X or Y is a dimension missing (selected) from the array.
© Daniel S. Weld
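These factor types can be made concrete with arrays. A minimal NumPy sketch; the traffic-domain numbers below are illustrative assumptions, not the slides' values:

```python
import numpy as np

# A factor is a multi-dimensional array with one axis per unassigned variable.
# Axis 0 is R, axis 1 is T; index 0 means "+", index 1 means "-".
P_R = np.array([0.1, 0.9])              # distribution P(R): sums to 1
P_T_given_R = np.array([[0.8, 0.2],     # family of conditionals P(T|R):
                        [0.1, 0.9]])    # each row sums to 1

print(P_R.sum())                # 1.0
print(P_T_given_R.sum())        # 2.0 -- a family sums to |R|
print(P_T_given_R[0].sum())     # 1.0 -- single conditional P(T|+r)
print(P_T_given_R[:, 0].sum())  # specified family P(+t|R): sums to... who knows
```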
27
Example: Traffic Domain
Random variables:
- R: Raining
- T: Traffic
- L: Late for class!
First query: P(L)
© Daniel S. Weld
28
Operation 1: Join Factors
First basic operation: joining factors. Combining factors is just like a database join:
- Get all factors over the joining variable.
- Build a new factor over the union of the variables involved.
Example: join on R; the computation for each entry is a pointwise product (see the sketch below).
© Daniel S. Weld
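A minimal sketch of the join on R, using the same illustrative traffic numbers as above (assumed, not the slides'):

```python
import numpy as np

# Join on R: combine P(R) and P(T|R) into one factor over {R, T}
# via pointwise products.
P_R = np.array([0.1, 0.9])                 # axis: R = (+r, -r)
P_T_given_R = np.array([[0.8, 0.2],
                        [0.1, 0.9]])       # axes: (R, T)

P_RT = P_R[:, None] * P_T_given_R          # entry (r, t) = P(r) * P(t|r)
print(P_RT)        # [[0.08 0.02] [0.09 0.81]]
print(P_RT.sum())  # 1.0 -- now a joint factor over R and T
```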
29
Example: Multiple Joins
© Daniel S. Weld
30
© Daniel S. Weld
31
Operation 2: Eliminate
Second basic operation: marginalization.
- Take a factor and sum out a variable.
- This shrinks a factor to a smaller one.
- It is a projection operation.
Example (see the sketch below):
© Daniel S. Weld
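Continuing the same illustrative numbers, summing R out of the joined factor:

```python
import numpy as np

# Eliminate (sum out) R from the factor P(R, T): a projection that
# shrinks the factor from two axes to one.
P_RT = np.array([[0.08, 0.02],
                 [0.09, 0.81]])    # axes: (R, T), from the join above

P_T = P_RT.sum(axis=0)             # marginal factor over T alone
print(P_T)                         # [0.17 0.83]
```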
32
Multiple Elimination © Daniel S. Weld
33
P(L) : Marginalizing Early!
© Daniel S. Weld
34
Marginalizing Early-2 © Daniel S. Weld
35
Evidence
If there is evidence, start with factors that select that evidence.
- With no evidence, we use the initial factors as-is.
- Computing P(L|+r), the initial factors become factors selected on +r (see the sketch below).
- We eliminate all variables other than the query and the evidence.
© Daniel S. Weld
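Selecting evidence is just slicing out the matching entries before joining; a minimal sketch with the same assumed traffic numbers (restricted to R and T for brevity):

```python
import numpy as np

# Evidence +r: keep only the +r slice of every factor mentioning R.
P_R = np.array([0.1, 0.9])                 # axis: R = (+r, -r)
P_T_given_R = np.array([[0.8, 0.2],
                        [0.1, 0.9]])       # axes: (R, T)

f_r = P_R[0]            # scalar factor P(+r)
f_t = P_T_given_R[0]    # factor over T: P(T | +r)
print(f_r * f_t)        # selected joint P(+r, T); normalize to get P(T | +r)
```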
36
Evidence II
The result will be a selected joint of query and evidence.
E.g., for P(L | +r), we'd end up with a selected joint over +r and L. To get our answer, just normalize this.
© Daniel S. Weld
37
General Variable Elimination
Query: P(Q | E1=e1, ..., Ek=ek)
Start with initial factors: local CPTs (but instantiated by evidence).
While there are still hidden variables (not Q or evidence):
1. Pick a hidden variable H.
2. Join all factors mentioning H.
3. Eliminate (sum out) H.
Finally, join all remaining factors and normalize. (A sketch of the full loop follows below.)
© Daniel S. Weld
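A minimal end-to-end sketch of this loop. The factor representation (a variable list plus a table keyed by Boolean assignments) is my own choice for illustration, evidence is assumed to be already selected into the initial factors, and the traffic numbers are the same illustrative assumptions as above:

```python
from itertools import product

# A factor is (vars, table): table maps assignments (bool tuples, in vars
# order) to numbers.

def multiply(f1, f2):
    """Join two factors: pointwise product over the union of their vars."""
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + [v for v in v2 if v not in v1]
    table = {}
    for vals in product([True, False], repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = (t1[tuple(a[v] for v in v1)] *
                       t2[tuple(a[v] for v in v2)])
    return vs, table

def sum_out(var, f):
    """Eliminate var by summing it out of the factor."""
    vs, t = f
    i = vs.index(var)
    table = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return vs[:i] + vs[i + 1:], table

def variable_elimination(factors, hidden):
    """Interleave joining and marginalizing, then normalize the remainder."""
    for h in hidden:                        # pick a hidden variable H
        related = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        joined = related[0]                 # join all factors mentioning H
        for f in related[1:]:
            joined = multiply(joined, f)
        factors.append(sum_out(h, joined))  # eliminate (sum out) H
    result = factors[0]                     # join remaining factors...
    for f in factors[1:]:
        result = multiply(result, f)
    z = sum(result[1].values())             # ...and normalize
    return result[0], {k: v / z for k, v in result[1].items()}

# Usage: P(L) in the traffic domain, with T hidden (illustrative numbers).
f_t = (["T"], {(True,): 0.17, (False,): 0.83})
f_l = (["T", "L"], {(True, True): 0.3, (True, False): 0.7,
                    (False, True): 0.1, (False, False): 0.9})
print(variable_elimination([f_t, f_l], hidden=["T"]))
```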
38
Variable Elimination Bayes Rule
© Daniel S. Weld
39
Example 2 © Daniel S. Weld
40
© Daniel S. Weld
41
Notes on VE
- Each operation is simply a multiplication of factors and summing out a variable.
- Complexity is determined by the size of the largest factor: linear in the number of variables, exponential in the size of the largest factor.
- The elimination ordering greatly impacts factor size; finding optimal elimination orderings is NP-hard, but heuristics and special structure (e.g., polytrees) help.
- On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs.
- Practically, inference is much more tractable using structure of this sort.
© Daniel S. Weld