
1 Artificial Intelligence: Probabilistic Reasoning. Fall 2008. Professor: Luigi Ceccaroni

2 Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (links ≈ "directly influences")
– a conditional distribution for each node given its parents: P(Xᵢ | Parents(Xᵢ))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT), giving the distribution over Xᵢ for each combination of parent values.
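As an illustration (a sketch only: the node name, parent names, and all numbers below are hypothetical, not taken from the slides), a CPT can be stored as a map from parent-value combinations to distributions over the node's values:

```python
# One node of a Bayesian network: its parents plus a CPT.  Each combination
# of parent values maps to a distribution over the node's own values.
# All names and probabilities here are made-up placeholders.
node_X = {
    "parents": ("U1", "U2"),
    "cpt": {
        (True, True):   {True: 0.9, False: 0.1},
        (True, False):  {True: 0.5, False: 0.5},
        (False, True):  {True: 0.4, False: 0.6},
        (False, False): {True: 0.1, False: 0.9},
    },
}
```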

3 Example
The topology of the network encodes conditional independence assertions:
– Weather is independent of the other variables.
– Toothache and Catch are conditionally independent given Cavity.

4 Example
What is the probability of having a heart attack? This probability depends on four variables:
– Sport
– Diet
– Blood pressure
– Smoking
Knowing the dependencies among these variables lets us build a Bayesian network.

5 Constructing Bayesian networks
1. Choose an ordering of variables X₁, …, Xₙ
2. For i = 1 to n:
– add Xᵢ to the network
– select parents from X₁, …, Xᵢ₋₁ such that P(Xᵢ | Parents(Xᵢ)) = P(Xᵢ | X₁, …, Xᵢ₋₁)
This choice of parents guarantees:
P(X₁, …, Xₙ) = ∏ᵢ₌₁ⁿ P(Xᵢ | X₁, …, Xᵢ₋₁) (chain rule)
             = ∏ᵢ₌₁ⁿ P(Xᵢ | Parents(Xᵢ)) (by construction)
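A quick numerical sanity check of this guarantee (a sketch with hypothetical numbers: two Boolean parents with made-up priors feeding the node_X CPT from the earlier sketch). The factored products over all assignments must sum to 1:

```python
from itertools import product

# Hypothetical priors for the two parents (made-up numbers).
P_U1 = {True: 0.3, False: 0.7}
P_U2 = {True: 0.6, False: 0.4}
cpt_X = {  # the CPT from the earlier sketch
    (True, True):   {True: 0.9, False: 0.1},
    (True, False):  {True: 0.5, False: 0.5},
    (False, True):  {True: 0.4, False: 0.6},
    (False, False): {True: 0.1, False: 0.9},
}

# Sum P(u1) P(u2) P(x | u1, u2) over every assignment.
total = sum(P_U1[u1] * P_U2[u2] * cpt_X[(u1, u2)][x]
            for u1, u2, x in product([True, False], repeat=3))
print(total)  # 1.0 (up to floating-point rounding): a proper joint
```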

6 Example
Network nodes: Sport (Sp), Diet (Di), Blood pressure (Bp), Smoking (Sm), Heart attack (Ha). Bp has parents Di and Sp; Ha has parents Bp and Sm.

Diet         P(Di)
balanced     0.4
unbalanced   0.6

Sport   P(Sp)
yes     0.1
no      0.9

Smoking   P(Sm)
yes       0.4
no        0.6

Diet         Sport   P(Bp = high)   P(Bp = normal)
balanced     yes     0.01           0.99
unbalanced   yes     0.2            0.8
balanced     no      0.25           0.75
unbalanced   no      0.7            0.3

Bp       Sm    P(Ha = yes)   P(Ha = no)
high     yes   0.8           0.2
normal   yes   0.6           0.4
high     no    0.7           0.3
normal   no    0.3           0.7
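To make the later calculations easy to check, here is one way to encode these tables in Python (a sketch; the dictionary names and layout are our choice, not the slides'):

```python
# Priors: value -> probability.
P_Di = {"balanced": 0.4, "unbalanced": 0.6}
P_Sp = {"yes": 0.1, "no": 0.9}
P_Sm = {"yes": 0.4, "no": 0.6}

# CPTs: (parent values...) -> {value: probability}.
P_Bp = {  # parents: (Diet, Sport)
    ("balanced", "yes"):   {"high": 0.01, "normal": 0.99},
    ("unbalanced", "yes"): {"high": 0.2,  "normal": 0.8},
    ("balanced", "no"):    {"high": 0.25, "normal": 0.75},
    ("unbalanced", "no"):  {"high": 0.7,  "normal": 0.3},
}
P_Ha = {  # parents: (Blood pressure, Smoking)
    ("high", "yes"):   {"yes": 0.8, "no": 0.2},
    ("normal", "yes"): {"yes": 0.6, "no": 0.4},
    ("high", "no"):    {"yes": 0.7, "no": 0.3},
    ("normal", "no"):  {"yes": 0.3, "no": 0.7},
}
```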

7 Compactness
A CPT for Boolean Xᵢ with k Boolean parents has 2ᵏ rows, one for each combination of parent values. Each row requires one number p for Xᵢ = true (the number for Xᵢ = false is just 1 − p). If each of the n variables has no more than k parents (k ≪ n), the complete network requires O(n · 2ᵏ) numbers.

8 Representation cost
The network grows linearly with n, vs. O(2ⁿ) for the full joint distribution. Examples:
– With 10 variables and at most 3 parents: 80 vs. 1024
– With 100 variables and at most 5 parents: 3200 vs. ~10³⁰
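A quick check of the arithmetic behind these figures (sketch):

```python
def bn_numbers(n, k):
    """Numbers needed when each of n Boolean variables has at most k parents."""
    return n * 2 ** k

print(bn_numbers(10, 3), 2 ** 10)    # 80 vs. 1024
print(bn_numbers(100, 5), 2 ** 100)  # 3200 vs. ~1.27e30 (the slide's 10^30)
```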

9 Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(X₁, …, Xₙ) = ∏ᵢ₌₁ⁿ P(Xᵢ | Parents(Xᵢ))
Example:
P(sp ∧ Di=balanced ∧ Bp=high ∧ ¬sm ∧ ¬ha) =
= P(sp) P(Di=balanced) P(Bp=high | sp, Di=balanced) P(¬sm) P(¬ha | Bp=high, ¬sm)

10 Bayesian networks – Joint distribution – Example
P(ha ∧ Bp=high ∧ sm ∧ sp ∧ Di=balanced) =
= P(ha | Bp=high, sm) P(Bp=high | sp, Di=balanced) P(sm) P(sp) P(Di=balanced) =
= 0.8 × 0.01 × 0.4 × 0.1 × 0.4 = 0.000128
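Using the CPT dictionaries from the sketch after slide 6, this product can be checked directly (sketch):

```python
p = (P_Ha[("high", "yes")]["yes"]         # P(ha | Bp=high, sm)      = 0.8
     * P_Bp[("balanced", "yes")]["high"]  # P(Bp=high | sp, Di=bal.) = 0.01
     * P_Sm["yes"]                        # P(sm)                    = 0.4
     * P_Sp["yes"]                        # P(sp)                    = 0.1
     * P_Di["balanced"])                  # P(Di=balanced)           = 0.4
print(p)  # 0.000128 (up to floating-point rounding)
```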

11 Exact inference in Bayesian networks: example
Inference by enumeration:
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
Let's calculate: P(Smoking | Heart attack = yes, Sport = no)
The full joint distribution of the network is:
P(Sp, Di, Bp, Sm, Ha) = P(Sp) P(Di) P(Bp | Sp, Di) P(Sm) P(Ha | Bp, Sm)
We want to calculate: P(Sm | ha, ¬sp).
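A direct transcription of this query as enumeration (a sketch; it reuses the CPT dictionaries from the sketch after slide 6):

```python
def enum_sm_given_ha_not_sp():
    # P(Sm | ha, ¬sp) = α Σ_Di Σ_Bp P(¬sp) P(Sm) P(Di) P(Bp | ¬sp, Di) P(ha | Bp, Sm)
    unnorm = {}
    for sm in ("yes", "no"):
        total = 0.0
        for di in ("balanced", "unbalanced"):
            for bp in ("high", "normal"):
                total += (P_Sp["no"] * P_Sm[sm] * P_Di[di]
                          * P_Bp[(di, "no")][bp]
                          * P_Ha[(bp, sm)]["yes"])
        unnorm[sm] = total
    alpha = 1 / sum(unnorm.values())  # normalization constant
    return {sm: alpha * p for sm, p in unnorm.items()}

print(enum_sm_given_ha_not_sp())  # ≈ {'yes': 0.48, 'no': 0.52}
```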

12 Exact inference in Bayesian networks: example
P(Sm | ha, ¬sp) = α P(Sm, ha, ¬sp) =
= α Σ_{Di ∈ {b, ¬b}} Σ_{Bp ∈ {h, n}} P(Sm, ha, ¬sp, Di, Bp) =
= α P(¬sp) P(Sm) Σ_{Di ∈ {b, ¬b}} P(Di) Σ_{Bp ∈ {h, n}} P(Bp | ¬sp, Di) P(ha | Bp, Sm) =
= α <0.9 × 0.4 × (0.4 × (0.25 × 0.8 + 0.75 × 0.6) + 0.6 × (0.7 × 0.8 + 0.3 × 0.6)),
    0.9 × 0.6 × (0.4 × (0.25 × 0.7 + 0.75 × 0.3) + 0.6 × (0.7 × 0.7 + 0.3 × 0.3))> =
= α <0.2534, 0.2743> ≈ <0.48, 0.52>

13 Variable elimination algorithm
The variable elimination algorithm lets us avoid the repeated calculations of inference by enumeration:
– Each variable is represented by a factor.
– Intermediate results are saved to be reused later.
– Variables that are irrelevant to the query contribute only constant factors and are not computed directly.
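Concretely, a factor can be stored as a pair: the tuple of variables it mentions and a table from value tuples to numbers. A minimal sketch (the layout is our choice, not the slides'), using the factor f_Ha(Bp, Sm) that is built on slide 15 below:

```python
# A factor = (variables, table).  Table keys follow the variable order.
f_Ha = (("Bp", "Sm"), {
    ("high", "yes"):   0.8, ("high", "no"):   0.7,
    ("normal", "yes"): 0.6, ("normal", "no"): 0.3,
})
```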

14 Variable elimination algorithm
CALCULA-FACTOR ("compute factor") generates the factor corresponding to variable var in the expression of the joint probability distribution.
PRODUCTO-Y-SUMA ("multiply and sum") multiplies factors and sums over the hidden variable.
PRODUCTO ("product") multiplies a set of factors.

15 Variable elimination algorithm – Example
α P(¬sp) P(Sm) Σ_{Di ∈ {b, ¬b}} P(Di) Σ_{Bp ∈ {h, n}} P(Bp | ¬sp, Di) P(ha | Bp, Sm)
Factor for variable Heart attack, P(ha | Bp, Sm), written f_Ha(Bp, Sm):

Bp       Sm    f_Ha(Bp, Sm)
high     yes   0.8
high     no    0.7
normal   yes   0.6
normal   no    0.3

16 Variable elimination algorithm – Example
Factor for variable Blood pressure, P(Bp | ¬sp, Di), written f_Bp(Bp, Di):

Bp       Di           f_Bp(Bp, Di)
high     balanced     0.25
high     unbalanced   0.7
normal   balanced     0.75
normal   unbalanced   0.3

To combine the factors just obtained, we calculate the pointwise product f_Ha(Bp, Sm) × f_Bp(Bp, Di) = f_HaBp(Bp, Sm, Di).

17 Variable elimination algorithm – Example
f_HaBp(Bp, Sm, Di) = f_Ha(Bp, Sm) × f_Bp(Bp, Di)

Bp       Sm    Di           f_HaBp(Bp, Sm, Di)
high     yes   balanced     0.8 × 0.25
high     yes   unbalanced   0.8 × 0.7
high     no    balanced     0.7 × 0.25
high     no    unbalanced   0.7 × 0.7
normal   yes   balanced     0.6 × 0.75
normal   yes   unbalanced   0.6 × 0.3
normal   no    balanced     0.3 × 0.75
normal   no    unbalanced   0.3 × 0.3
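The pointwise product generalizes to any pair of factors. A sketch under the same layout assumptions (the variable domains are hard-coded for this example; f_Ha is the factor from the sketch after slide 13):

```python
from itertools import product

DOMAINS = {"Bp": ("high", "normal"), "Sm": ("yes", "no"),
           "Di": ("balanced", "unbalanced")}

def multiply(f1, f2):
    """Pointwise product: agree on shared variables, multiply the entries."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for vals in product(*(DOMAINS[v] for v in out_vars)):
        a = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(a[v] for v in vars1)]
                       * t2[tuple(a[v] for v in vars2)])
    return (out_vars, table)

f_Bp = (("Bp", "Di"), {
    ("high", "balanced"):   0.25, ("high", "unbalanced"):   0.7,
    ("normal", "balanced"): 0.75, ("normal", "unbalanced"): 0.3,
})
f_HaBp = multiply(f_Ha, f_Bp)  # reproduces the 8-row table above
```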

18 Variable elimination algorithm – Example
We sum over the values of variable Bp to obtain factor f_HaBp(Sm, Di):

Sm    Di           f_HaBp(Sm, Di)
yes   balanced     0.8 × 0.25 + 0.6 × 0.75 = 0.65
yes   unbalanced   0.8 × 0.7 + 0.6 × 0.3 = 0.74
no    balanced     0.7 × 0.25 + 0.3 × 0.75 = 0.4
no    unbalanced   0.7 × 0.7 + 0.3 × 0.3 = 0.58

Factor for variable Di, f_Di(Di):

Di           f_Di(Di)
balanced     0.4
unbalanced   0.6
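Summing out a variable drops it from the factor, adding the entries that agree on the remaining variables. A sketch under the same layout assumptions:

```python
def sum_out(var, f):
    """Sum the factor's entries over all values of var."""
    fvars, table = f
    i = fvars.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]   # assignment without var
        out[key] = out.get(key, 0.0) + p
    return (fvars[:i] + fvars[i + 1:], out)

f_HaBp_Sm_Di = sum_out("Bp", f_HaBp)
# ('Sm', 'Di') table: yes/balanced 0.65, yes/unbalanced 0.74,
#                     no/balanced 0.40,  no/unbalanced 0.58
```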

19 Variable elimination algorithm – Example
f_HaDiBp(Sm, Di) = f_Di(Di) × f_HaBp(Sm, Di)

Sm    Di           f_HaDiBp(Sm, Di)
yes   balanced     0.65 × 0.4
yes   unbalanced   0.74 × 0.6
no    balanced     0.4 × 0.4
no    unbalanced   0.58 × 0.6

We sum over the values of variable Di to obtain factor f_HaDiBp(Sm):

Sm    f_HaDiBp(Sm)
yes   0.65 × 0.4 + 0.74 × 0.6 = 0.704
no    0.4 × 0.4 + 0.58 × 0.6 = 0.508
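With the two helpers above, this step is a one-liner (sketch, continuing from the earlier snippets):

```python
f_Di = (("Di",), {("balanced",): 0.4, ("unbalanced",): 0.6})
f_HaDiBp = sum_out("Di", multiply(f_Di, f_HaBp_Sm_Di))
# ('Sm',) table: yes 0.704, no 0.508
```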

20 Variable elimination algorithm – Example
Factor for variable Sm, f_Sm(Sm):

Sm    f_Sm(Sm)
yes   0.4
no    0.6

f_HaSmDiBp(Sm) = f_Sm(Sm) × f_HaDiBp(Sm):

Sm    f_HaSmDiBp(Sm)
yes   0.4 × 0.704 = 0.282
no    0.6 × 0.508 = 0.305

Normalizing, we obtain:

Sm    P(Sm | ha, ¬sp)
yes   0.48
no    0.52
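The final normalization just divides each entry by the total. A sketch, continuing with the numbers above (0.2816 and 0.3048 before rounding); the result matches inference by enumeration:

```python
def normalize(f):
    """Scale a factor's entries so they sum to 1."""
    fvars, table = f
    z = sum(table.values())
    return (fvars, {k: v / z for k, v in table.items()})

f_final = (("Sm",), {("yes",): 0.2816, ("no",): 0.3048})
print(normalize(f_final)[1])  # {('yes',): ≈0.48, ('no',): ≈0.52}
```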

21 Summary
Bayesian networks provide a natural representation for (causally induced) conditional independence.
Topology + CPTs = compact representation of the joint distribution.
Generally easy for domain experts to construct.

