Slide 1: Representing Uncertainty
(slides © Daniel S. Weld)
Slide 2: Aspects of Uncertainty
Suppose you have a flight at 12 noon. When should you leave for SEATAC?
- What are traffic conditions?
- How crowded is security?
Leaving 18 hours early may get you there, but ... ?
Slide 3: Decision Theory = Probability + Utility Theory
[Table: P(arrive-in-time) as a function of minutes before noon, for departures 20, 30, 45, 60, 120, and 1080 min early]
Which departure time is best depends on your preferences.
Utility theory: representing & reasoning about preferences.
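The slide's P(arrive-in-time) values live in the table image and are not reproduced here, so the sketch below invents placeholder numbers to show how probability and utility combine into a decision; every value in it is an assumption, not data from the slide:

# Decision theory = probability + utility: pick the departure time with
# the highest expected utility. All numbers below are ASSUMED for illustration.
p_arrive = {20: 0.05, 30: 0.25, 45: 0.50, 60: 0.75, 120: 0.95, 1080: 0.9999}

U_CATCH = 100.0   # utility of making the flight (assumed)
U_MISS = -500.0   # utility of missing it (assumed)
U_WAIT = -0.5     # utility per minute spent waiting at SEATAC (assumed)

def expected_utility(minutes_early):
    p = p_arrive[minutes_early]
    return p * U_CATCH + (1 - p) * U_MISS + minutes_early * U_WAIT

best = max(p_arrive, key=expected_utility)
print(best, expected_utility(best))  # with these numbers, 120 min early wins:
                                     # 1080 min nearly guarantees arrival, but
                                     # the waiting cost swamps the benefit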
Slide 4: What Is Probability?
Probability: a calculus for dealing with nondeterminism and uncertainty (cf. logic).
Probabilistic model: says how often we expect different things to occur (cf. function).
Slide 5: What Is Statistics?
Statistics 1: describing data.
Statistics 2: inferring probabilistic models from data (both structure and parameters).
Slide 6: Why Should You Care?
The world is full of uncertainty: logic alone is not enough, and computers need to be able to handle uncertainty.
Probability: a new foundation for AI (& CS!).
There are massive amounts of data around today, and statistics and CS are both about data:
- Statistics lets us summarize and understand it.
- Statistics is the basis for most learning.
- Statistics lets data do our work for us.
Slide 7: Outline
- Basic notions: atomic events, probabilities, joint distribution
- Inference by enumeration
- Independence & conditional independence
- Bayes’ rule
- Bayesian networks
- Statistical learning
- Dynamic Bayesian networks (DBNs)
- Markov decision processes (MDPs)
Slide 8: Logic vs. Probability
- Symbol: Q, R, ... vs. random variable: Q, ...
- Boolean values: T, F vs. a domain you specify, e.g. {heads, tails} or [1, 6]
- State of the world: an assignment to Q, R, ..., Z
- Atomic event: a complete specification of a world, Q ... Z; atomic events are mutually exclusive and exhaustive
- Prior probability (aka unconditional probability): P(Q)
- Joint distribution: the probability of every atomic event
Slide 9: Syntax for Propositions
Slide 10: Propositions
Assume Boolean variables. Example propositions: A = true, B = false.
Slide 11: Why Use Probability?
E.g. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
[Venn diagram: regions A and B inside the space of all worlds (True)]
Slide 12: Axioms of Probability Theory
- All probabilities are between 0 and 1: 0 ≤ P(A) ≤ 1
- P(true) = 1, P(false) = 0
- The probability of a disjunction: P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
[Venn diagram: A and B inside the space of all worlds (True)]
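A quick numeric check of these axioms, on a toy joint distribution whose four entries are arbitrary choices (any non-negative values summing to 1 would do):

# Toy joint over two Boolean variables A and B.
joint = {(True, True): 0.3, (True, False): 0.2,
         (False, True): 0.1, (False, False): 0.4}

P_A = sum(p for (a, b), p in joint.items() if a)
P_B = sum(p for (a, b), p in joint.items() if b)
P_A_and_B = joint[(True, True)]
P_A_or_B = sum(p for (a, b), p in joint.items() if a or b)

assert 0.0 <= P_A <= 1.0                        # probabilities lie in [0, 1]
assert abs(sum(joint.values()) - 1.0) < 1e-12   # P(true) = 1
assert abs(P_A_or_B - (P_A + P_B - P_A_and_B)) < 1e-12  # disjunction axiom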
Slide 13: Prior Probability
Any question can be answered by the joint distribution: the probability of any proposition is the sum of the probabilities of the atomic events in which it holds.
Slides 14-15: Conditional Probability
Def: P(A | B) = P(A ∧ B) / P(B), for P(B) > 0.
Equivalently (the product rule): P(A ∧ B) = P(A | B) P(B).
Slide 16: Inference by Enumeration
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.20, or 20%
(the sum of the joint entries in which toothache holds)
This process is called “marginalization”.
Slide 17: Inference by Enumeration
P(toothache ∨ cavity) = 0.20 + 0.072 + 0.008 = 0.28
(P(toothache) plus the joint entries with cavity but no toothache)
Slide 18: Inference by Enumeration
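A minimal sketch of inference by enumeration, assuming the standard Russell & Norvig toothache / cavity / catch joint, whose entries are consistent with the 0.20 and 0.28 totals computed above:

# Full joint over (cavity, toothache, catch). These are the textbook
# (Russell & Norvig) values, consistent with the 0.20 / 0.28 totals above.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """Marginalization: sum the joint over atomic events satisfying pred."""
    return sum(p for (cav, tooth, catch), p in joint.items()
               if pred(cav, tooth, catch))

print(prob(lambda cav, tooth, catch: tooth))         # P(toothache) = 0.20
print(prob(lambda cav, tooth, catch: tooth or cav))  # P(toothache ∨ cavity) = 0.28
# Conditioning via the definition P(A|B) = P(A ∧ B) / P(B):
print(prob(lambda cav, tooth, catch: cav and tooth)
      / prob(lambda cav, tooth, catch: tooth))       # P(cavity | toothache) = 0.6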
Slide 19: Problems
- Worst-case time: O(d^n), where d = the maximum arity of a variable and n = the number of random variables.
- Space complexity: also O(d^n), the size of the joint distribution itself.
- How do we obtain the O(d^n) entries for the table??
- Note: the values of cavity & catch are irrelevant when computing P(toothache).
Slide 20: Independence
A and B are independent iff:
P(A | B) = P(A)
P(B | A) = P(B)
These two constraints are logically equivalent.
Therefore, if A and B are independent:
P(A ∧ B) = P(A | B) P(B) = P(A) P(B)
Slide 21: Independence
[Venn diagram illustrating independence of A and B]
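A small numeric check of these conditions: the joint below is built as an outer product, so A and B are independent by construction, and the assertions confirm the equivalent formulations:

# Construct an independent joint: P(A ∧ B) = P(A) · P(B) by design.
pA, pB = 0.3, 0.6
joint = {(a, b): (pA if a else 1 - pA) * (pB if b else 1 - pB)
         for a in (True, False) for b in (True, False)}

P_A = joint[(True, True)] + joint[(True, False)]
P_B = joint[(True, True)] + joint[(False, True)]
P_A_given_B = joint[(True, True)] / P_B

assert abs(P_A_given_B - P_A) < 1e-12                # P(A|B) = P(A)
assert abs(joint[(True, True)] - P_A * P_B) < 1e-12  # P(A ∧ B) = P(A) P(B)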
Slide 22: Independence
Complete independence is powerful but rare. What to do if it doesn’t hold?
Slide 23: Conditional Independence
A & B are not independent, since P(A | B) < P(A).
[Venn diagram of A and B]
Slide 24: Conditional Independence
But: A & B are made independent by C, i.e. P(A | C) = P(A | B, C).
[Venn diagram: once C is known, knowing B adds nothing about A]
Slide 25: Conditional Independence
Instead of 7 entries, only need 5.
(A full joint over three Boolean variables has 2^3 − 1 = 7 free parameters; with A and B independent given C we need only P(C), P(A | C), P(A | ¬C), P(B | C), and P(B | ¬C).)
Slide 26: Conditional Independence II
P(catch | toothache, cavity) = P(catch | cavity)
P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Why only 5 entries in the table?
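A sketch of the 5-parameter representation for this example. The parameter values below are derived from the textbook joint used earlier (they are not printed on the slide itself), and rebuilding the 8-entry joint from them reproduces it exactly:

# The 5 entries: P(cavity), plus P(toothache | ±cavity) and P(catch | ±cavity).
# Values derived from the textbook joint above, not taken from the slide.
p_cavity = 0.2
p_tooth_given = {True: 0.6, False: 0.1}   # P(toothache | cavity), P(toothache | ¬cavity)
p_catch_given = {True: 0.9, False: 0.2}   # P(catch | cavity),     P(catch | ¬cavity)

def joint(cavity, toothache, catch):
    """P(cavity, toothache, catch), assuming toothache ⊥ catch | cavity."""
    pc = p_cavity if cavity else 1 - p_cavity
    pt = p_tooth_given[cavity] if toothache else 1 - p_tooth_given[cavity]
    pk = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    return pc * pt * pk

print(joint(True, True, True))   # 0.108, matching the full 8-entry table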
Slide 27: Power of Cond. Independence
Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!!
Conditional independence is the most basic & robust form of knowledge about uncertain environments.
Slide 28: Bayes Rule
P(H | E) = P(E | H) P(H) / P(E)
Simple proof from the definition of conditional probability:
1. P(E | H) = P(E ∧ H) / P(H)         (def. cond. prob.)
2. P(H | E) = P(E ∧ H) / P(E)         (def. cond. prob.)
3. P(E ∧ H) = P(E | H) P(H)           (mult. by P(H) in line 1)
QED: P(H | E) = P(E | H) P(H) / P(E)  (substitute #3 in #2)
Slide 29: Use Bayes Rule to Compute Diagnostic Probability from Causal Probability
E.g. let M be meningitis, S be stiff neck.
P(M) = ?, P(S) = 0.1, P(S | M) = 0.8
P(M | S) = P(S | M) P(M) / P(S) = 0.8 · P(M) / 0.1
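The prior P(M) is missing from the extracted text, so the sketch below plugs in a placeholder value to make the Bayes-rule computation concrete; only P(S) and P(S|M) come from the slide:

# Diagnostic probability from causal probability via Bayes’ rule.
prior_m = 1e-4        # ASSUMED placeholder for P(M); not given in the text
p_s = 0.1             # P(S), from the slide
p_s_given_m = 0.8     # P(S | M), from the slide

p_m_given_s = p_s_given_m * prior_m / p_s
print(p_m_given_s)    # 0.0008 with the assumed prior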
Slide 30: Bayes’ Rule & Cond. Independence
P(cavity | toothache ∧ catch) = α P(toothache ∧ catch | cavity) P(cavity)
                              = α P(toothache | cavity) P(catch | cavity) P(cavity)
Slide 31: Bayes Nets
In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation & inference.
BNs provide a graphical representation of the conditional independence relations in P:
- usually quite compact
- requires assessment of fewer parameters, and those are quite natural (e.g. causal)
- (usually) efficient inference: query answering and belief update
Slide 32: An Example Bayes Net
Nodes: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls.
Edges: Earthquake → Radio; Earthquake → Alarm; Burglary → Alarm; Alarm → Nbr1Calls; Alarm → Nbr2Calls.
CPTs: a prior table Pr(B=t) / Pr(B=f) for Burglary, and for Alarm:
  E  B    Pr(A=t | E,B)   Pr(A=f | E,B)
  t  t        0.90            (0.10)
  t  f        0.20            (0.80)
  f  t        0.85            (0.15)
  f  f        0.01            (0.99)
Slide 33: Earthquake Example (con’t)
If I know Alarm, no other evidence influences my degree of belief in Nbr1Calls:
P(N1 | N2, A, E, B) = P(N1 | A)
also: P(N2 | A, E, B) = P(N2 | A)
and: P(E | B) = P(E)
By the chain rule we have:
P(N1, N2, A, E, B) = P(N1 | N2, A, E, B) · P(N2 | A, E, B) · P(A | E, B) · P(E | B) · P(B)
                   = P(N1 | A) · P(N2 | A) · P(A | B, E) · P(E) · P(B)
The full joint requires only 10 parameters (cf. 2^5 = 32 entries for the explicit joint).
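A sketch of this factored joint in code. The P(A | E, B) numbers are the ones implied by the slide's CPT; the remaining CPT values were not extracted, so the ones below are placeholders:

# P(N1,N2,A,E,B) = P(N1|A) · P(N2|A) · P(A|E,B) · P(E) · P(B): 10 parameters.
p_b = 0.01                 # ASSUMED P(Burglary); value not in the text
p_e = 0.02                 # ASSUMED P(Earthquake); value not in the text
p_a = {(True, True): 0.90, (True, False): 0.20,    # P(Alarm=t | E, B), from
       (False, True): 0.85, (False, False): 0.01}  # the slide's CPT
p_n1 = {True: 0.9, False: 0.05}  # ASSUMED P(Nbr1Calls=t | Alarm)
p_n2 = {True: 0.7, False: 0.01}  # ASSUMED P(Nbr2Calls=t | Alarm)

def bern(p, x):
    """P(X = x) for a Boolean variable with P(X=t) = p."""
    return p if x else 1.0 - p

def joint(n1, n2, a, e, b):
    return (bern(p_n1[a], n1) * bern(p_n2[a], n2) *
            bern(p_a[(e, b)], a) * bern(p_e, e) * bern(p_b, b))

# Sanity check: the factored joint sums to 1 over all 2^5 = 32 atomic events.
from itertools import product
assert abs(sum(joint(*ev) for ev in product([True, False], repeat=5)) - 1) < 1e-12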
Slide 34: BNs: Qualitative Structure
- The graphical structure of a BN reflects conditional independence among its variables.
- Each variable X is a node in the DAG.
- Edges denote direct probabilistic influence, usually interpreted causally.
- The parents of X are denoted Par(X).
- X is conditionally independent of all non-descendants given its parents.
- A graphical test exists for more general independence: the “Markov blanket”.
Slide 35: Given Parents, X is Independent of Non-Descendants
Slide 36: For Example
[Network diagram: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]
Slide 37: Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X))
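A minimal sketch of computing a Markov blanket from parent sets, using the earthquake network's structure from the earlier slides; markov_blanket is a hypothetical helper, not code from the deck:

# MB(X) = Par(X) ∪ Children(X) ∪ Par(Children(X)).
parents = {
    "Earthquake": set(), "Burglary": set(),
    "Radio": {"Earthquake"},
    "Alarm": {"Earthquake", "Burglary"},
    "Nbr1Calls": {"Alarm"}, "Nbr2Calls": {"Alarm"},
}

def markov_blanket(x):
    children = {c for c, ps in parents.items() if x in ps}
    co_parents = set().union(set(), *(parents[c] for c in children))
    return (parents[x] | children | co_parents) - {x}

print(markov_blanket("Alarm"))
# {'Earthquake', 'Burglary', 'Nbr1Calls', 'Nbr2Calls'}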
Slide 38: Conditional Probability Tables
[Same network and CPTs as Slide 32: Pr(B), and Pr(A | E, B) with rows (e,b), (e,¬b), (¬e,b), (¬e,¬b)]
Slide 39: Conditional Probability Tables
For a complete specification of the joint distribution, quantify the BN: for each variable X, specify a CPT P(X | Par(X)).
The number of parameters is locally exponential in |Par(X)|.
If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1)
                     = P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) · ... · P(X1 | Par(X1))
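A generic version of this chain-rule product, as a sketch: cpts is a hypothetical interface mapping each variable to a function P(value | parent values), and any topological order works because each factor conditions only on the variable's parents:

def joint_probability(assignment, parents, cpts):
    """P(x1..xn) = product over i of P(xi | Par(xi)).
    assignment: dict var -> value; parents: dict var -> tuple of parent vars;
    cpts: dict var -> function(value, parent_values) -> probability."""
    total = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[var])
        total *= cpts[var](value, parent_values)
    return total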
Slide 40: Inference in BNs
The graphical independence representation yields efficient inference schemes.
We generally want to compute Pr(X), or Pr(X | E) where E is (conjunctive) evidence.
Computations are organized by network topology.
One simple algorithm: variable elimination (VE).
Slide 41: P(B | J=true, M=true)
[Network diagram: Earthquake, Burglary, Radio, Alarm, John, Mary]
P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
Slide 42: Structure of Computation
Dynamic programming: the nested sums share repeated subexpressions, which can be computed once and reused.
Slide 43: Variable Elimination
A factor is a function from some set of variables to a specific value, e.g. f(E, A, N1).
CPTs are factors; e.g. P(A | E, B) is a function of A, E, B.
VE works by eliminating all variables in turn until there is a factor containing only the query variable.
To eliminate a variable:
- join all factors containing that variable (like a DB join)
- sum out the influence of the variable on the new factor
- this exploits the product form of the joint distribution
Slide 44: Example of VE: P(N1)
P(N1) = Σ_{N2,A,B,E} P(N1, N2, A, B, E)
      = Σ_{N2,A,B,E} P(N1 | A) P(N2 | A) P(B) P(A | B, E) P(E)
      = Σ_A P(N1 | A) Σ_{N2} P(N2 | A) Σ_B P(B) Σ_E P(A | B, E) P(E)
      = Σ_A P(N1 | A) Σ_{N2} P(N2 | A) Σ_B P(B) f1(A, B)
      = Σ_A P(N1 | A) Σ_{N2} P(N2 | A) f2(A)
      = Σ_A P(N1 | A) f3(A)
      = f4(N1)
[Network: Earthqk, Burgl, Alarm, N1, N2]
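A compact, runnable sketch of VE for this query. The factor representation is a plain dict, the CPT numbers are the same placeholders as the earlier joint-distribution sketch (only P(A|E,B) reflects the slide's table), and the elimination order follows the derivation above:

from itertools import product

# A factor = (variables, table): table maps a tuple of Boolean values,
# one per variable in order, to a number.
def make_factor(variables, fn):
    return (variables, {vals: fn(dict(zip(variables, vals)))
                        for vals in product([True, False], repeat=len(variables))})

def multiply(f, g):
    """Join two factors on their shared variables (like a DB join)."""
    (fv, ft), (gv, gt) = f, g
    out = fv + [v for v in gv if v not in fv]
    return (out, {vals: ft[tuple(dict(zip(out, vals))[v] for v in fv)] *
                        gt[tuple(dict(zip(out, vals))[v] for v in gv)]
                  for vals in product([True, False], repeat=len(out))})

def sum_out(var, f):
    """Eliminate var by summing it out of the factor."""
    fv, ft = f
    out = [v for v in fv if v != var]
    table = {}
    for vals, p in ft.items():
        key = tuple(x for x, name in zip(vals, fv) if name != var)
        table[key] = table.get(key, 0.0) + p
    return (out, table)

# CPT factors (placeholder numbers except P(A|E,B), as before).
fB = make_factor(["B"], lambda e: 0.01 if e["B"] else 0.99)
fE = make_factor(["E"], lambda e: 0.02 if e["E"] else 0.98)
pa = {(True, True): 0.90, (True, False): 0.20,
      (False, True): 0.85, (False, False): 0.01}
fA = make_factor(["A", "E", "B"],
                 lambda e: pa[(e["E"], e["B"])] if e["A"] else 1 - pa[(e["E"], e["B"])])
fN1 = make_factor(["N1", "A"],
                  lambda e: (0.9 if e["A"] else 0.05) if e["N1"]
                  else (0.1 if e["A"] else 0.95))
fN2 = make_factor(["N2", "A"],
                  lambda e: (0.7 if e["A"] else 0.01) if e["N2"]
                  else (0.3 if e["A"] else 0.99))

# P(N1), eliminating E, B, N2, A as in the slide's derivation:
f1 = sum_out("E", multiply(fA, fE))                 # f1(A, B)
f2 = sum_out("B", multiply(f1, fB))                 # f2(A)
f3 = sum_out("N2", fN2)                             # f3(A) = 1 for each A
f4 = sum_out("A", multiply(multiply(fN1, f3), f2))  # f4(N1)
print(f4)  # marginal over N1; its two entries sum to 1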
Slide 45: Notes on VE
- Each operation is simply a multiplication of factors followed by summing out a variable.
- Complexity is determined by the size of the largest factor (e.g. 3 variables in the example, not 5).
- VE is linear in the number of variables and exponential in the size of the largest factor.
- The elimination ordering greatly impacts factor size:
  - finding an optimal elimination ordering is NP-hard
  - heuristics and special structure (e.g. polytrees) help
- Practically, inference is much more tractable when structure of this sort is exploited.