Presentation on theme: "Today’s Topics Bayesian Networks (BNs) used a lot in medical diagnosis M-estimates Searching for Good BNs Markov Blanket what is conditionally independent."— Presentation transcript:

1 Today’s Topics
Bayesian Networks (BNs), used a lot in medical diagnosis
M-estimates
Searching for good BNs
Markov blanket: what is conditionally independent in a BN
Thomas Bayes, 1701-1761

2 Approximating the Full Joint Prob Table
Bayesian Networks are one way to ‘compactly’ represent a full joint prob table
A Bayes net can fill every cell in a full joint prob table, but is (usually) much smaller
–The trick? The cells are no longer all independent
An analogy: we could have (a) a big table that holds all products of two 32-bit ints, or (b) a more compact way to compute ‘cells’ in this table when needed

3 Bayesian Networks (BNs)

4 Example
Nodes: A, B, C, D, E. A and B are the parents of C; C and D are the parents of E (the graph is drawn on the slide).

P(A) = 0.7 (implicit: P(¬A) = 0.3)
P(B) = 0.4
P(D) = 0.2

A  B  P(C | A = ? ˄ B = ?)
F  F  0.9
F  T  0.3
T  F  0.6
T  T  0.1

C  D  P(E | C = ? ˄ D = ?)
F  F  0.8
F  T  0.7
T  F  0.4
T  T  0.6

What are
 P(A ˄ B ˄ C ˄ D ˄ E) =
 P(A ˄ ¬B ˄ C ˄ D ˄ ¬E) =
We have several SMALL tables rather than one BIG table; they are called CONDITIONAL PROBABILITY TABLES (CPTs)

5 Solutions for Prev Page
P(A ˄ B ˄ C ˄ D ˄ E)
 = P(A) × P(B) × P(C | A ˄ B) × P(D) × P(E | C ˄ D)
 = 0.7 × 0.4 × 0.1 × 0.2 × 0.6 = 0.00336

P(A ˄ ¬B ˄ C ˄ D ˄ ¬E)
 = P(A) × P(¬B) × P(C | A ˄ ¬B) × P(D) × P(¬E | C ˄ D)
 = P(A) × (1 - P(B)) × P(C | A ˄ ¬B) × P(D) × (1 - P(E | C ˄ D))
 = 0.7 × (1 - 0.4) × 0.6 × 0.2 × (1 - 0.6) = 0.02016
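A minimal sketch (Python, not part of the original slides) that recomputes these two probabilities directly from the priors and CPTs on the Example slide; the variable and function names are just illustrative.

```python
# Priors and CPTs copied from the example BN.
P_A, P_B, P_D = 0.7, 0.4, 0.2
P_C_given_AB = {(False, False): 0.9, (False, True): 0.3,
                (True, False): 0.6, (True, True): 0.1}
P_E_given_CD = {(False, False): 0.8, (False, True): 0.7,
                (True, False): 0.4, (True, True): 0.6}

def prob(a, b, c, d, e):
    """P(A=a ∧ B=b ∧ C=c ∧ D=d ∧ E=e) via the BN factorization."""
    p = (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)
    pc = P_C_given_AB[(a, b)]
    p *= pc if c else 1 - pc
    p *= P_D if d else 1 - P_D
    pe = P_E_given_CD[(c, d)]
    p *= pe if e else 1 - pe
    return p

print(prob(True, True, True, True, True))    # 0.7 * 0.4 * 0.1 * 0.2 * 0.6 = 0.00336
print(prob(True, False, True, True, False))  # 0.7 * 0.6 * 0.6 * 0.2 * 0.4 = 0.02016
```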

6 Using BNs to Answer ‘Partial’ Queries
Basic idea (repeated from previous lecture):
1) create probs that only involve AND and NOT
2) “AND in” the remaining vars in all possible (conjunctive) ways
3) look up fully specified ‘world states’
4) do the arithmetic
We still follow these steps from the previous lecture to convert partial world-state queries to full world-state queries. But instead of doing a LOOKUP in a table, we use the BN to calculate the prob of a complete world state.
(Diagram: complete world state → black box (full joint? BN?) → Prob(complete world state))
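A minimal sketch of steps 2-4, reusing the prob() function from the previous sketch; the particular query P(A ˄ ¬E) is an illustrative example, not one from the slides.

```python
from itertools import product

# P(A ∧ ¬E): "AND in" the unmentioned vars B, C, D in all possible ways,
# get each complete world state's prob from the BN, then sum.
p_A_and_notE = sum(prob(True, b, c, d, False)
                   for b, c, d in product([False, True], repeat=3))
print(p_A_and_notE)
```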

7 Filling the BN’s Tables - look at data and simply count!

Over60  Smokes  P(Had_HeartAttack | Over60 = ? ˄ Smokes = ?)
F       F       1/3
F       T       2/4
T       F       2/3
T       T       1/2

Data Set (Over60, Smokes, Had_HeartAttack):
F T F
T T T
T F T
F T T
F F F
T T F
T F T
F T T
F T F
T F F
F F T
F F F
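A minimal sketch (not from the slides) of “look at the data and simply count”, using the 12 examples above; it reproduces the four CPT entries 1/3, 2/4, 2/3, 1/2.

```python
from collections import Counter

# (Over60, Smokes, Had_HeartAttack) tuples, copied row by row from the data set.
data = [(False, True, False), (True, True, True), (True, False, True),
        (False, True, True), (False, False, False), (True, True, False),
        (True, False, True), (False, True, True), (False, True, False),
        (True, False, False), (False, False, True), (False, False, False)]

totals, had_attack = Counter(), Counter()
for over60, smokes, attack in data:
    totals[(over60, smokes)] += 1
    if attack:
        had_attack[(over60, smokes)] += 1

for parents in [(False, False), (False, True), (True, False), (True, True)]:
    print(parents, f"{had_attack[parents]}/{totals[parents]}")   # 1/3, 2/4, 2/3, 1/2
```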

8 Filling the Full Joint Table from a BN
Successively fill each row by applying the BN formula using that row’s values for all the random variables (CPTs not shown, but they need to be there)

Season  SprinklerOn?  Raining?  GrassWet?  Slippery?  Prob
Spring  F             F         F          F
Spring  F             F         F          T
Spring  F             F         T          F
Spring  F             F         T          T
Spring  F             T         F          F
Spring  F             T         F          T
etc.    …             …         …          …
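A minimal sketch of this slide’s procedure, applied to the five-variable example BN (reusing prob() from the earlier sketch, since this slide’s CPTs aren’t shown): fill every row of the full joint table and check that the rows sum to 1.

```python
from itertools import product

joint_table = {world: prob(*world) for world in product([False, True], repeat=5)}
print(len(joint_table), sum(joint_table.values()))   # 32 rows; probs sum to 1.0
```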

9 (Same BN as the earlier Example slide: nodes A, B, C, D, E with priors P(A) = 0.7, P(B) = 0.4, P(D) = 0.2 and the two CPTs for C and E)
We have FIVE Boolean-valued random vars, so the full joint table would have 2^5 - 1 = 31 independent numbers (‘-1’ since the sum of the probs equals 1)
In this BN we have 11 independent numbers: 3 priors + 4 rows in the CPT for C + 4 rows in the CPT for E (the difference is more striking on large tasks)
We can use the BN to fill the joint table with non-zero values, but we are likely approximating the ‘true’ full joint table (approximating might lead to better generalization if we don’t have a lot of data)

10 M-Estimates
What if there is NO data for some cell in a CPT? Assuming prob = 0 has a major impact
–zero is very different from ‘a very small prob’
Soln: assume we have m examples for each possible value of each random variable
–often we use m = 1 (called ‘Laplace smoothing’, en.wikipedia.org/wiki/Additive_smoothing)
Pierre-Simon Laplace, 1749-1827

11 Example of M-estimates
Assume size has values S, M, L
Assume we have 3 Small, 2 Medium, and 0 Large examples (the counts of ACTUAL examples)
Use m = 1, so ‘imagine’ 1 S, 1 M, and 1 L additional ‘pseudo’ examples
P(size = S) = (3 + 1) / (5 + 3) = 0.500
P(size = M) = (2 + 1) / (5 + 3) = 0.375
P(size = L) = (0 + 1) / (5 + 3) = 0.125
Programming trick: start all NUMERATOR counters at m rather than 0 (then sum the numerators to set the initial DENOMINATOR)
Aside: could also do this in the d-tree calculations!
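A minimal sketch (not from the slides) of the m-estimate, reproducing the three numbers above; the function name is just illustrative.

```python
def m_estimate(counts, m=1):
    """Smoothed estimate: (count + m) / (total + m * number_of_values)."""
    total, k = sum(counts.values()), len(counts)
    return {value: (count + m) / (total + m * k) for value, count in counts.items()}

print(m_estimate({'S': 3, 'M': 2, 'L': 0}))   # {'S': 0.5, 'M': 0.375, 'L': 0.125}
```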

12 NB Technical Detail: Underflow
If we have, say, 100 features, we are multiplying 100 numbers in [0, 1]
If many probabilities are small, we could “underflow” the minimum positive double in our computer
Trick: sum the logs of the probs
Often we need only compare the exponents of two calculations
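A minimal sketch (not from the slides) showing why the trick matters: multiplying 100 small probabilities underflows a 64-bit double, while the sum of their logs stays representable.

```python
import math

probs = [1e-5] * 100          # 100 features, each contributing a small probability

product = 1.0
for p in probs:
    product *= p
print(product)                # 0.0 -- underflowed (true value is 1e-500)

log_sum = sum(math.log(p) for p in probs)
print(log_sum)                # about -1151.3; compare these sums instead of products
```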

13 From Where Does the BN’s GRAPH Come?
Knowledge provided by a domain expert, or via AI SEARCH!
State: a BN using all the random vars
Start: NO arcs
Action: add a directed arc if no cycle is created
Heuristic: accuracy on the train set plus a penalty proportional to network size (to reduce overfitting)
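A minimal sketch (not from the course materials) of the search just described: start with no arcs and greedily add the acyclic arc that most improves the score; score() is a placeholder for “train-set accuracy minus a size penalty”.

```python
import itertools

def creates_cycle(arcs, new_arc):
    """True if adding new_arc = (parent, child) would create a directed cycle."""
    parent, child = new_arc
    stack, seen = [child], set()      # A cycle appears iff parent is reachable from child.
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(c for (p, c) in arcs if p == node)
    return False

def greedy_structure_search(variables, score):
    arcs = set()                                         # Start: NO arcs.
    while True:
        candidates = [a for a in itertools.permutations(variables, 2)
                      if a not in arcs and not creates_cycle(arcs, a)]
        scored = [(score(arcs | {a}), a) for a in candidates]
        if not scored or max(scored)[0] <= score(arcs):  # No legal arc improves the score.
            return arcs
        arcs.add(max(scored)[1])
```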

14 Searching for a Good BN Structure
Computationally expensive, so specialized algorithms are often used, eg TAN (Tree-Augmented Naïve Bayes), which allows each node at most two parents and finds the optimal such network in polynomial time
(Figure: a search tree of candidate networks, grown by “Add Arc ...” actions)

15 Terminology: Bayes Nets and ML
Parameter (aka Weight) Learning
–Fill the conditional probability tables (CPTs) stored at each node (look at training data and count things)
Structure Learning (‘learn the graph’)
–SKIP if an expert provides the structure
–Can cast as an AI search task
  Add/subtract nodes (broader search than on the earlier slide)
  Add/subtract arcs (graph must be acyclic)
  Change the direction of an arc
–Need to score candidate graphs
  Usually need to penalize graph complexity/size (we’ll see analogs later in SVMs and neural networks)

16 The Full Joint Table as a BN
Usually a BN approximates the full joint table, but the BN notation can also exactly represent it
P(A ˄ B ˄ C ˄ D)                 // for clarity, we’ll only use four random vars
 = P(A ˄ B ˄ C | D) × P(D)
 = P(A ˄ B | C ˄ D) × P(C | D) × P(D)
 = P(A | B ˄ C ˄ D) × P(B | C ˄ D) × P(C | D) × P(D)
(Diagram: arcs D→C, D→B, D→A, C→B, C→A, B→A; this is a ‘fully connected’ acyclic graph)
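In general (the standard chain rule, stated here as a math block for completeness), for any ordering of the n random variables:

```latex
P(X_1 \wedge X_2 \wedge \dots \wedge X_n) = \prod_{i=1}^{n} P(X_i \mid X_1 \wedge \dots \wedge X_{i-1})
```

A fully connected acyclic graph stores exactly these conditionals, so it represents the full joint table with no approximation; the savings come only when arcs (dependencies) are dropped.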

17 Markov Blanket
If the random variables in a node’s Markov blanket are set, then that node is conditionally independent of all other nodes in the BN
Markov blanket (pg 517):
–Parent nodes (this one is obvious)
–Children nodes (less obvious)
–Children’s other parent nodes (much less obvious)
Andrei Andreyevich Markov, 1856-1922

18 Markov Blanket Illustrated
(Figure: node X with parents P1 … PN, children C1 … CM, and each child Ci’s other parents CPi,1 … CPi,k)
We can compute P(X | P’s, C’s, and CP’s) regardless of the settings for the rest of the BN
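A minimal sketch (not from the slides) that computes a node’s Markov blanket from a graph given as (parent, child) arcs, using the abstract node names from the figure.

```python
def markov_blanket(node, arcs):
    """Parents, children, and children's other parents of `node`."""
    parents   = {p for (p, c) in arcs if c == node}
    children  = {c for (p, c) in arcs if p == node}
    coparents = {p for (p, c) in arcs if c in children and p != node}
    return parents | children | coparents

# Tiny hypothetical graph in the figure's notation: X has parents P1 and P2,
# one child C1, and C1 has another parent CP1.
arcs = {("P1", "X"), ("P2", "X"), ("X", "C1"), ("CP1", "C1")}
print(markov_blanket("X", arcs))   # {'P1', 'P2', 'C1', 'CP1'}
```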

19 More Markov-Blanket Experience
Knowing GasInTank = 0 can ‘explain away’ why GasGauge = empty
“Leak” nodes are often used to model “all other causes”
If we know the value of EngineCranks, whether or not the car Starts tells us nothing about BatteryPower

