
Slide 1: Probability
FOL fails for a domain due to:
– Laziness: it is too much work to list the complete set of rules, and too hard to use the enormous rules that result
– Theoretical ignorance: there is no complete theory for the domain
– Practical ignorance: we have not run, or cannot run, all the necessary tests

Slide 2: Probability
Probability = a degree of belief
Where probability comes from:
– Frequentist: experiments and statistical assessment
– Objectivist: real aspects of the universe
– Subjectivist: a way of characterizing an agent's beliefs
Decision theory = probability theory + utility theory

Slide 3: Probability
Prior probability: probability in the absence of any other information
P(Dice = 2) = 1/6
Random variable: Dice, with domain {1, 2, 3, 4, 5, 6}
Probability distribution: P(Dice) = <1/6, 1/6, 1/6, 1/6, 1/6, 1/6>

Slide 4: Probability
Conditional probability: probability in the presence of some evidence
P(Dice = 2 | Dice is even) = 1/3
P(Dice = 2 | Dice is odd) = 0
P(A | B) = P(A ∧ B)/P(B)
P(A ∧ B) = P(A | B).P(B)
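A minimal sketch in Python, assuming a fair six-sided die, that checks these two conditional probabilities by enumerating the sample space:

```python
from fractions import Fraction

# Sample space of a fair six-sided die; each outcome has probability 1/6.
OMEGA = range(1, 7)

def p(event):
    """P(event) under the uniform prior, as an exact fraction."""
    return Fraction(sum(1 for w in OMEGA if event(w)), 6)

def p_cond(a, b):
    """Conditional probability P(A | B) = P(A ∧ B) / P(B)."""
    return p(lambda w: a(w) and b(w)) / p(b)

print(p_cond(lambda w: w == 2, lambda w: w % 2 == 0))  # 1/3
print(p_cond(lambda w: w == 2, lambda w: w % 2 == 1))  # 0
```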

Slide 5: Probability
Example:
S = stiff neck, M = meningitis
P(S | M) = 0.5
P(M) = 1/50000
P(S) = 1/20
⇒ P(M | S) = P(S | M).P(M)/P(S) = 1/5000
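The same computation, sketched with exact fractions:

```python
from fractions import Fraction

p_s_given_m = Fraction(1, 2)   # P(S | M)
p_m = Fraction(1, 50000)       # P(M)
p_s = Fraction(1, 20)          # P(S)

# Bayes' theorem: P(M | S) = P(S | M).P(M)/P(S)
print(p_s_given_m * p_m / p_s)  # 1/5000
```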

Slide 6: Probability
Joint probability distributions:
X: values x1, ..., xm
Y: values y1, ..., yn
The joint distribution is the table of P(X = xi, Y = yj) over every pair of values.

Slide 7: Probability
Axioms:
0 ≤ P(A) ≤ 1
P(true) = 1 and P(false) = 0
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

Slide 8: Probability
Derived properties:
P(¬A) = 1 − P(A)
P(U) = P(A1) + P(A2) + ... + P(An), where
U = A1 ∨ A2 ∨ ... ∨ An (collectively exhaustive)
Ai ∧ Aj = false for i ≠ j (mutually exclusive)

Slide 9: Probability
Bayes' theorem:
P(Hi | E) = P(Hi ∧ E)/P(E) = P(E | Hi).P(Hi) / Σj P(E | Hj).P(Hj)
where the Hi's are collectively exhaustive and mutually exclusive
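A sketch of the normalization in the denominator, using made-up likelihoods and priors for three exhaustive, mutually exclusive hypotheses (the numbers are illustrative, not from the slides):

```python
from fractions import Fraction

# Hypothetical P(E | Hi) and P(Hi) for three hypotheses.
likelihoods = [Fraction(8, 10), Fraction(3, 10), Fraction(1, 10)]
priors      = [Fraction(1, 2),  Fraction(3, 10), Fraction(1, 5)]

joint = [l * p for l, p in zip(likelihoods, priors)]
evidence = sum(joint)                       # P(E) = Σj P(E | Hj).P(Hj)
posteriors = [j / evidence for j in joint]  # P(Hi | E)

print(posteriors)       # [Fraction(40, 51), Fraction(3, 17), Fraction(2, 51)]
print(sum(posteriors))  # 1, as required
```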

Slide 10: Probability
Problem: a full joint probability distribution P(X1, X2, ..., Xn) is sufficient for computing any probability over the Xi's, but the number of joint probabilities is exponential in n.
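A sketch of both halves of this point, with made-up numbers over three boolean variables (2³ = 8 entries; n variables would need 2ⁿ):

```python
# Full joint over three boolean variables (X1, X2, X3); illustrative
# numbers only, chosen to sum to 1.
joint = {
    (True, True, True): 0.10,   (True, True, False): 0.05,
    (True, False, True): 0.15,  (True, False, False): 0.10,
    (False, True, True): 0.05,  (False, True, False): 0.20,
    (False, False, True): 0.05, (False, False, False): 0.30,
}

# Any query is a sum over matching entries, e.g. P(X1) and P(X1 | X3):
p_x1 = sum(p for (x1, _, _), p in joint.items() if x1)
p_x1_and_x3 = sum(p for (x1, _, x3), p in joint.items() if x1 and x3)
p_x3 = sum(p for (_, _, x3), p in joint.items() if x3)
print(p_x1, p_x1_and_x3 / p_x3)  # 0.40, and P(X1 | X3) ≈ 0.714
```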

Slide 11: Probability
Independence:
P(A ∧ B) = P(A).P(B)
P(A) = P(A | B)
Conditional independence:
P(A ∧ B | E) = P(A | E).P(B | E)
P(A | E) = P(A | E ∧ B)

Slide 12: Probability
Example:
P(Toothache | Cavity ∧ Catch) = P(Toothache | Cavity)
P(Catch | Cavity ∧ Toothache) = P(Catch | Cavity)
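A sketch verifying the first identity numerically: build a joint (with hypothetical CPT numbers) in which Toothache and Catch each depend only on Cavity, then check that also conditioning on Catch changes nothing:

```python
# Hypothetical numbers: Toothache and Catch depend only on Cavity.
p_cav = 0.2
p_tooth = {True: 0.6, False: 0.1}   # P(Toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}   # P(Catch | Cavity)

def joint(cav, tooth, catch):
    """P(Cavity=cav, Toothache=tooth, Catch=catch) as a product of factors."""
    p = p_cav if cav else 1 - p_cav
    p *= p_tooth[cav] if tooth else 1 - p_tooth[cav]
    return p * (p_catch[cav] if catch else 1 - p_catch[cav])

# P(Toothache | Cavity ∧ Catch) vs. P(Toothache | Cavity):
lhs = joint(True, True, True) / (joint(True, True, True) + joint(True, False, True))
print(lhs, p_tooth[True])  # both 0.6
```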

Slides 13-15: Probability
"In John's and Mary's house, an alarm is installed to sound in case of burglary or earthquake. When the alarm sounds, John and Mary may make a call for help or rescue."
Q1: If an earthquake happens, how likely is John to make a call?
Q2: If the alarm sounds, how likely is it that the house has been burglarized?

Slide 16: Bayesian Networks
Pearl, J. (1982). "Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach." In Proceedings of the Second National Conference on Artificial Intelligence (AAAI-82), Pittsburgh, Pennsylvania.
Winner of the 2000 AAAI Classic Paper Award.

Slide 17: Bayesian Networks
Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls

P(B) = .001    P(E) = .002

B  E | P(A)
T  T | .95
T  F | .94
F  T | .29
F  F | .001

A | P(J)    A | P(M)
T | .90     T | .70
F | .05     F | .01
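These CPTs, sketched as plain Python dictionaries (the variable names are chosen here for illustration):

```python
# Priors of the root nodes.
P_B = 0.001   # P(Burglary)
P_E = 0.002   # P(Earthquake)

# P(Alarm | Burglary, Earthquake), keyed by (B, E).
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

# P(JohnCalls | Alarm) and P(MaryCalls | Alarm), keyed by A.
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

print(P_A[(True, False)])  # P(A | B ∧ ¬E) = 0.94
```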

Slide 18: Bayesian Networks
Syntax:
– A set of random variables making up the nodes
– A set of directed links connecting pairs of nodes
– Each node has a conditional probability table that quantifies the effects of its parent nodes
– The graph has no directed cycles
(Figure: two graphs over nodes A, B, C, D; the acyclic one is a legal network, marked "yes"; the one with a directed cycle is not, marked "no")

Slide 19: Bayesian Networks
Semantics:
Order the nodes so that if Xi is a predecessor of Xj, then i < j. Then:
P(X1, X2, …, Xn) = P(Xn | Xn-1, …, X1).P(Xn-1 | Xn-2, …, X1). … .P(X2 | X1).P(X1)
= Πi P(Xi | Xi-1, …, X1)
= Πi P(Xi | Parents(Xi))
since P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi)), with Parents(Xi) ⊆ {Xi-1, …, X1}.
Each node is conditionally independent of its predecessors, given its parents.

Slide 20: Bayesian Networks
Example:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J | A).P(M | A).P(A | ¬B ∧ ¬E).P(¬B).P(¬E)
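A sketch of this product, using the CPT entries from slide 17:

```python
# Factors read off the network of slide 17.
p_j_given_a = 0.90        # P(J | A)
p_m_given_a = 0.70        # P(M | A)
p_a_given_nb_ne = 0.001   # P(A | ¬B ∧ ¬E)
p_nb = 1 - 0.001          # P(¬B)
p_ne = 1 - 0.002          # P(¬E)

# The joint is the product of one factor per node.
print(p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_nb * p_ne)
# ≈ 0.000628
```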

Slide 21: Bayesian Networks
P(Query | Evidence) = ?
– Diagnostic (from effects to causes): P(B | J)
– Causal (from causes to effects): P(J | B)
– Intercausal (between causes of a common effect): P(B | A, E)
– Mixed: P(A | J, ¬E), P(B | J, ¬E)

Slide 22: Bayesian Networks
P(B | A) = P(B ∧ A)/P(A) = α.P(B ∧ A), where α = 1/P(A) is a normalization constant
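A sketch of this query by enumeration, with the CPTs from slide 17 (E is summed out; J and M carry no evidence, so they sum out to 1 and can be omitted):

```python
# CPTs from slide 17.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """P(B=b, E=e, A=a) as a product of the network's factors."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    return p * (P_A[(b, e)] if a else 1 - P_A[(b, e)])

# P(B ∧ A) and P(¬B ∧ A), summing out E; then normalize.
p_b_a  = sum(joint(True, e, True) for e in (True, False))
p_nb_a = sum(joint(False, e, True) for e in (True, False))
alpha = 1 / (p_b_a + p_nb_a)   # α = 1/P(A)
print(alpha * p_b_a)           # P(B | A) ≈ 0.374
```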

Slide 23: Bayesian Networks
Why Bayesian Networks?

Slide 24: General Conditional Independence
(Figure: node X with parents U1, …, Um, children Y1, …, Yn, and the children's other parents Z1j, …, Znj)
A node X is conditionally independent of its non-descendants (the Zij's), given its parents (the Ui's).

Slide 25: General Conditional Independence
(Figure: the same network around node X)
A node X is conditionally independent of all other nodes, given its parents (the Ui's), its children (the Yi's), and its children's parents (the Zij's).

Slide 26: General Conditional Independence
X and Y are conditionally independent given E if every undirected path from X to Y is blocked by some node Z: either Z ∈ E and the links through Z are in series or diverging, or the links converge at Z and neither Z nor any of its descendants is in E.
(Figure: the three blocking configurations of Z between X and Y)

Slide 27: General Conditional Independence
Example: the car network
(Figure: Battery → Radio, Battery → Ignition, Ignition → Starts, Gas → Starts, Starts → Moves)

Slide 28: General Conditional Independence
Example (continued):
Gas and Radio are independent, given no evidence at all.
Gas and Radio are not independent, given Starts.
Gas and Radio are not independent, given Moves.
(Evidence at Starts, or at its descendant Moves, activates the converging connection on the path Gas → Starts ← Ignition ← Battery → Radio.)

Slide 29: Network Construction
A wrong ordering of the nodes:
– Produces a more complicated network
– Requires more probabilities to be specified
– Produces slight relationships that require difficult and unnatural probability judgments
– Fails to represent all the conditional independence relationships ⇒ a lot of unnecessary probabilities

Slide 30: Network Construction
Models:
– Diagnostic: symptoms to causes
– Causal: causes to symptoms
Add "root causes" first, then the variables they influence ⇒ fewer and more comprehensible probabilities

Slides 31-32: Dempster-Shafer Theory
Example: disease diagnosis with
D = {Allergy, Flu, Cold, Pneumonia}
Basic probability assignment: m: 2^D → [0, 1]
m1: m1({Flu, Cold, Pneumonia}) = 0.6, m1(D) = 0.4

Slides 33-34: Dempster-Shafer Theory
Belief: Bel: 2^D → [0, 1]
Bel(S) = Σ_{X ⊆ S} m(X)
Plausibility: Pl(p) = 1 − Bel(¬p)
The belief interval of p is [Bel(p), Pl(p)].
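A sketch of Bel and Pl for the assignment m1 from slides 31-32:

```python
# Frame of discernment and the basic probability assignment m1.
D = frozenset({"Allergy", "Flu", "Cold", "Pneumonia"})
m = {frozenset({"Flu", "Cold", "Pneumonia"}): 0.6, D: 0.4}

def bel(s):
    """Bel(S) = Σ m(X) over focal sets X ⊆ S."""
    return sum(mass for x, mass in m.items() if x <= s)

def pl(s):
    """Pl(S) = 1 − Bel(complement of S)."""
    return 1 - bel(D - s)

s = frozenset({"Flu", "Cold", "Pneumonia"})
print(bel(s), pl(s))  # belief interval [0.6, 1.0]
```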

Slide 35: Dempster-Shafer Theory
Ignorance: Bel(p) + Bel(¬p) ≤ 1

Slide 36: Dempster-Shafer Theory
Combination rule for two assignments m1 and m2:
m(Z) = Σ_{X ∩ Y = Z} m1(X).m2(Y) / (1 − Σ_{X ∩ Y = ∅} m1(X).m2(Y))
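A sketch of this rule, combining m1 from slides 31-32 with a hypothetical second source m2 (made up here for illustration):

```python
# m1 from slides 31-32; m2 is an invented second source.
D = frozenset({"Allergy", "Flu", "Cold", "Pneumonia"})
m1 = {frozenset({"Flu", "Cold", "Pneumonia"}): 0.6, D: 0.4}
m2 = {frozenset({"Flu"}): 0.5, D: 0.5}

def combine(m1, m2):
    """m(Z) = Σ_{X ∩ Y = Z} m1(X)m2(Y), normalized by 1 − (mass on ∅)."""
    raw, conflict = {}, 0.0
    for x, px in m1.items():
        for y, py in m2.items():
            z = x & y
            if z:
                raw[z] = raw.get(z, 0.0) + px * py
            else:
                conflict += px * py   # X ∩ Y = ∅
    return {z: p / (1 - conflict) for z, p in raw.items()}

for z, p in combine(m1, m2).items():
    print(sorted(z), round(p, 3))
# {Flu}: 0.5, {Flu, Cold, Pneumonia}: 0.3, D: 0.2
```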

Slide 37: Exercises
In Russell and Norvig's AIMA: Chapters 13-14.

