Today's Topics
Bayesian Networks (BNs) - used a lot in medical diagnosis
M-estimates
Searching for good BNs
Markov blanket - what is conditionally independent in a BN
Next: Artificial Neural Networks (ANNs)
Read Section 18.7 and Section 18.9 of the text; skim Section 18.6
Some Exam Solutions
Gain(F1) = 0.57, Gain(F2) = 0.42
ID3({ex4}, {F2}, +)
ID3({ex1, ex3}, {F2}, +)
ID3({ex2, ex5}, {F2}, +)
S1, A, D, F, FAIL
S1, A, D, S2, C, E, G1
Regrade requests must be in writing - indicate the request on the FRONT PAGE (I reserve the right to regrade the entire exam)
Some Exam Solutions (continued)
P(¬A, B, ¬C) / [ P(¬A, ¬B, ¬C) + P(¬A, B, ¬C) ]
P(A, ¬B, ¬C) + P(A, B, ¬C)
P(¬A, ¬B, C) + P(A, ¬B, C)
P(A, ¬B, ¬C, C)
The -9 node can be skipped by alpha-beta
Entropy: -(1/8) log2(1/8) - (1/4) log2(1/4) - (1/2) log2(1/2) - (1/8) log2(1/8)
  = (1/8)(3) + (1/4)(2) + (1/2)(1) + (1/8)(3) = 1.75
Approximating the Full Joint Prob Table
Bayesian Networks are one way to 'compactly' represent a full joint probability table
A Bayes net can fill every cell in a full joint prob table, but is (usually) much smaller
The trick? All the cells are no longer independent
An analogy: we could have (a) a big table that holds all products of two 32-bit ints, or (b) a more compact way to compute 'cells' in this table when needed
Bayesian Networks (BNs)
BNs are directed, acyclic graphs
Each random variable is a node
Arcs indicate direct dependence
P(A1 ˄ A2 ˄ … ˄ An) = Π i P(Ai | immediate parents of Ai)
Example
Network structure: A → C ← B and C → E ← D
P(A) = 0.7 (implicit: P(¬A) = 0.3), P(B) = 0.4, P(D) = 0.2

CPT for C (parents A and B):
  A  B  P(C | A, B)
  F  F     0.9
  F  T     0.3
  T  F     0.6
  T  T     0.1

CPT for E (parents C and D):
  C  D  P(E | C, D)
  F  F     0.8
  F  T     0.7
  T  F     0.4
  T  T     0.6

What are
  P(A ˄ B ˄ C ˄ D ˄ E) = ?
  P(A ˄ ¬B ˄ C ˄ D ˄ ¬E) = ?

We have several SMALL tables rather than one BIG table - they are called CONDITIONAL PROBABILITY TABLES (CPTs)
Solutions for Prev Page
P(A ˄ B ˄ C ˄ D ˄ E)
  = P(A) P(B) P(C | A ˄ B) P(D) P(E | C ˄ D)
  = 0.7 × 0.4 × 0.1 × 0.2 × 0.6 = 0.00336

P(A ˄ ¬B ˄ C ˄ D ˄ ¬E)
  = P(A) P(¬B) P(C | A ˄ ¬B) P(D) P(¬E | C ˄ D)
  = P(A) (1 - P(B)) P(C | A ˄ ¬B) P(D) (1 - P(E | C ˄ D))
  = 0.7 × (1 - 0.4) × 0.6 × 0.2 × (1 - 0.6) = 0.02016
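To make the factorization concrete, here is a minimal Python sketch of the calculation above; the dictionary layout and function names are my own invention, but the numbers are the CPT entries from the example slide.

```python
# Priors and CPTs from the example BN: A, B -> C and C, D -> E
P_A, P_B, P_D = 0.7, 0.4, 0.2
P_C = {(True, True): 0.1, (True, False): 0.6,   # keyed by (A, B)
       (False, True): 0.3, (False, False): 0.9}
P_E = {(True, True): 0.6, (True, False): 0.4,   # keyed by (C, D)
       (False, True): 0.7, (False, False): 0.8}

def prob(p_true, value):
    """P(var = value) given P(var = True)."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) via the BN factorization."""
    return (prob(P_A, a) * prob(P_B, b) *
            prob(P_C[(a, b)], c) * prob(P_D, d) *
            prob(P_E[(c, d)], e))

print(joint(True, True, True, True, True))     # 0.00336
print(joint(True, False, True, True, False))   # 0.02016
```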
Using BNs to Answer ‘Partial’ Queries
[Diagram: complete world state → Black Box (full joint table? a BN?) → Prob(complete world state)]
We still do the following (from the previous lecture) to convert partial world-state queries into full world-state queries, but instead of doing a LOOKUP in a table, we use the BN to calculate the prob of a complete world state
Basic idea (repeated from the previous lecture):
  create probs that only involve AND (NEGATED single vars are OK)
  "AND in" the remaining vars in all possible (conjunctive) ways
  look up (or, with a BN, compute) the fully specified 'world states'
  do the arithmetic
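A hedged sketch of the "AND in the remaining vars" step: sum the BN-computed joint over every assignment of the unmentioned variables. The helper name prob_partial is mine, and it assumes the joint() function from the previous sketch is in scope.

```python
from itertools import product

# Assumes joint(a, b, c, d, e) from the previous sketch is defined.
VARS = ['A', 'B', 'C', 'D', 'E']

def prob_partial(**fixed):
    """P(partial world state), e.g. prob_partial(A=True, E=False).
    Unmentioned variables are 'ANDed in' in all possible ways and summed out."""
    free = [v for v in VARS if v not in fixed]
    total = 0.0
    for values in product([True, False], repeat=len(free)):
        world = dict(fixed, **dict(zip(free, values)))
        total += joint(world['A'], world['B'], world['C'],
                       world['D'], world['E'])
    return total

print(prob_partial(A=True, E=False))   # P(A ˄ ¬E)
print(prob_partial(C=True))            # P(C)
# A conditional query: P(A | E) = P(A ˄ E) / P(E)
print(prob_partial(A=True, E=True) / prob_partial(E=True))
```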
Filling the BN’s Tables - look at data and simply count!
Data set: training examples, one per person, with columns Over60, Smokes, and Had_HeartAttack [individual rows not reproduced]

CPT learned by counting:
  Over60  Smokes  P(Had_HeartAttack | Over60, Smokes)
    F       F        1/3
    F       T        2/4
    T       F        2/3
    T       T        1/2
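A small sketch of the "just count" idea. The toy records below are invented for illustration (the slide's actual data rows are not reproduced here), so only the counting logic mirrors the slide.

```python
from collections import Counter

# Hypothetical training data: (Over60, Smokes, Had_HeartAttack) per person.
data = [(False, False, False), (False, False, True), (False, True, True),
        (True,  False, False), (True,  True,  True), (True,  True,  False)]

attack_counts = Counter()   # count of Had_HeartAttack = True per parent setting
parent_counts = Counter()   # count of examples per (Over60, Smokes) setting

for over60, smokes, attack in data:
    parent_counts[(over60, smokes)] += 1
    if attack:
        attack_counts[(over60, smokes)] += 1

# CPT entry: P(Had_HeartAttack = True | Over60, Smokes) as a count ratio
for parents, n in sorted(parent_counts.items()):
    print(parents, attack_counts[parents], '/', n)
```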
(Same network and CPTs as the Example slide: P(A) = 0.7, P(B) = 0.4, P(D) = 0.2, plus the CPTs for C given A, B and for E given C, D)
We have FIVE Boolean-valued random vars, so the full joint table would have 2^5 - 1 = 31 independent numbers (the '-1' since the probs sum to 1)
In this BN we have 11 independent numbers: 1 each for A, B, and D, plus 4 each in the CPTs for C and E (the difference is more striking on large tasks)
We can use the BN to fill the joint table with non-zero values, but we are likely approximating the 'true' full joint table (approximating might lead to better generalization if we don't have a lot of data)
Filling the Full Joint Table from a BN
CPTs not shown, but they need to be there
Network over Season, SprinklerOn?, Raining?, GrassWet?, Slippery?
Table to fill: one row per complete assignment, e.g.
  Season  SprinklerOn?  Raining?  GrassWet?  Slippery?  Prob
  Spring      F             T        ...        ...     ...
Successively fill each row by applying the BN formula using that row's values for all the random variables (see the sketch below)
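A sketch of "successively fill each row": enumerate every complete assignment and apply the BN formula. It reuses the joint() function from the five-variable example above, since the CPTs for the Season/Sprinkler network are not given.

```python
from itertools import product

# Assumes joint(a, b, c, d, e) from the earlier sketch is defined.
rows = []
for a, b, c, d, e in product([True, False], repeat=5):
    rows.append((a, b, c, d, e, joint(a, b, c, d, e)))

print(len(rows))                     # 32 rows in the full joint table
print(sum(r[-1] for r in rows))      # the filled table sums to 1 (up to floating point)
```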
M-Estimates
What if there is NO data for some cell in a CPT?
Assuming prob = 0 has a major impact - zero is very different from a 'very small prob'
Soln: assume we have m examples for each possible value of each random variable
Often we use m = 1 (called 'Laplace smoothing', en.wikipedia.org/wiki/Additive_smoothing)
Example of M-estimates
Assume size has values S, M, L
Assume we have 3 Small, 2 Medium, and 0 Large examples
Use m = 1, so 'imagine' 1 S, 1 M, and 1 L additional 'pseudo' examples
  P(size = S) = (3 + 1) / (5 + 3) = 0.500
  P(size = M) = (2 + 1) / (5 + 3) = 0.375
  P(size = L) = (0 + 1) / (5 + 3) = 0.125
(each numerator is the count of ACTUAL examples plus the count of PSEUDO examples)
Programming trick: start all NUMERATOR counters at m rather than 0 (then sum the numerators to set the initial DENOMINATOR)
Aside: we could also do this in the d-tree calculations!
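A minimal sketch of the m-estimate computation (the function name is mine):

```python
def m_estimate(counts, m=1):
    """Smoothed P(value): (count + m) / (N + m * number_of_values)."""
    n_total = sum(counts.values())
    n_values = len(counts)
    return {v: (c + m) / (n_total + m * n_values) for v, c in counts.items()}

print(m_estimate({'S': 3, 'M': 2, 'L': 0}))
# {'S': 0.5, 'M': 0.375, 'L': 0.125}  -- matches the slide
```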
From Where Does the BN’s GRAPH Come?
Knowledge provided by a domain expert
Or via AI SEARCH!
  State: a BN using all the random vars
  Start: NO arcs
  Action: add a directed arc if no cycle is created
  Heuristic: accuracy on the train set plus a penalty proportional to network size (to reduce overfitting)
Searching for a Good BN Structure
[Search-tree diagram: candidate networks generated by repeatedly applying 'Add Arc']
Computationally expensive, so specialized algorithms are often used, eg TAN (Tree-Augmented Naïve Bayes), which allows at most two parents per node and finds the optimal such network in polynomial time (a rough sketch of the generic greedy search follows)
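A rough sketch of the greedy arc-addition search described on the previous slide. The score function stands in for the accuracy-plus-size-penalty heuristic and must be supplied by the caller; cycle checking is done naively, so treat this as an outline rather than a practical learner.

```python
from itertools import permutations

def creates_cycle(arcs, new_arc):
    """True if adding new_arc = (parent, child) would create a directed cycle."""
    arcs = arcs | {new_arc}
    def reachable(start, goal, seen=()):
        return any(c == goal or (c not in seen and reachable(c, goal, seen + (c,)))
                   for p, c in arcs if p == start)
    return reachable(new_arc[1], new_arc[0])

def greedy_structure_search(variables, score):
    """Start with no arcs; repeatedly add the single arc that most improves score."""
    arcs = set()
    best = score(arcs)
    while True:
        candidates = [a for a in permutations(variables, 2)
                      if a not in arcs and not creates_cycle(arcs, a)]
        scored = [(score(arcs | {a}), a) for a in candidates]
        if not scored or max(scored)[0] <= best:
            return arcs
        best, chosen = max(scored)
        arcs.add(chosen)

# Usage (hypothetical): greedy_structure_search(['A', 'B', 'C'], my_score),
# where my_score measures train-set fit minus a size penalty.
```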
Terminology: Bayes Nets and ML
Parameter (aka weight) learning
  Fill the conditional probability tables (CPTs) stored at each node (look at the training data and count things)
Structure learning ('learn the graph')
  SKIP if an expert provides the structure
  Can be cast as an AI search task
    Add/subtract nodes (a broader search than on the earlier slide)
    Add/subtract arcs (the graph must stay acyclic)
    Change the direction of an arc
  Need to score candidate graphs
    Usually need to penalize graph complexity/size (we'll see analogs later in SVMs and neural networks)
The Full Joint Table as a BN
Usually a BN approximates the full joint table, but the BN notation can also represent it exactly
P(A=? ˄ B=? ˄ C=? ˄ D=?)                      // for clarity, we'll only use 4 random vars
  = P(A B C | D) P(D)                          // dropping the '=?' for clarity
  = P(A B | C D) P(C | D) P(D)
  = P(A | B C D) P(B | C D) P(C | D) P(D)
The corresponding graph over D, A, B, and C is a 'fully connected' acyclic graph (D points to C, B, and A; C points to B and A; B points to A)
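In general notation, the expansion above is just the chain rule of probability, which holds for any ordering of the variables:

```latex
P(X_1, X_2, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid X_1, \ldots, X_{i-1}\bigr)
```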
A Major Weakness of BNs
If there are many 'hidden' random vars (N binary vars, say), then the marginalization formula leads to many calls to the BN (2^N in our example; for N = 20, 2^N = 1,048,576)
Using uniform-random sampling to estimate the result is too inaccurate, since most of the probability might be concentrated in only a few 'complete world states'
Hence there is much research (beyond cs540's scope) on scaling up inference in BNs and other graphical models, eg via more sophisticated sampling (eg, MCMC)
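For concreteness, a hedged sketch of the estimate the slide warns about: approximate a marginal by uniform-random sampling of the hidden variables' settings instead of summing over all 2^N of them. It again assumes the joint() function from the earlier sketch; with only four hidden variables the exact sum is cheap, but the same code shows why both the cost of exact summation and the noise of uniform sampling grow with N.

```python
import random
from itertools import product

# Assumes joint(a, b, c, d, e) from the earlier sketch is defined.
# Exact P(A): sum out the four 'hidden' variables -> 2^4 calls to the BN.
exact = sum(joint(True, b, c, d, e)
            for b, c, d, e in product([True, False], repeat=4))

# Uniform-random sampling estimate of the same sum: draw hidden-variable
# settings uniformly and rescale by the number of possible settings.
random.seed(0)
M, n_hidden = 1000, 4
samples = [joint(True, *[random.random() < 0.5 for _ in range(n_hidden)])
           for _ in range(M)]
estimate = (2 ** n_hidden) * sum(samples) / M

print(exact, estimate)   # the estimate is noisy, and worse when mass concentrates on few states
```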
Markov Blanket
If the random variables in a node's Markov blanket are set, then that node is conditionally independent of all other nodes in the BN
The Markov blanket consists of:
  Parent nodes (this one is obvious)
  Children nodes (less obvious)
  Children's other parent nodes (much less obvious)
Markov Blanket Illustrated
[Diagram: node X with parents P1 … PN, children C1 … CM, and the children's other parents CP1,1 … CPM,k]
We can compute P(X | P's, C's, and CP's) regardless of the settings of the rest of the BN
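A small sketch that collects a Markov blanket exactly as defined on the previous slide: parents, children, and the children's other parents. The graph representation (a dict mapping each node to its list of parents) is my own choice.

```python
def markov_blanket(parents, node):
    """parents: dict mapping each node to the list of its parent nodes."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:                 # children's other parents
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket

# The five-variable example network: A, B -> C and C, D -> E
parents = {'A': [], 'B': [], 'C': ['A', 'B'], 'D': [], 'E': ['C', 'D']}
print(markov_blanket(parents, 'C'))   # {'A', 'B', 'D', 'E'}
```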
More Markov-Blanket Experience
Knowing GasInTank = 0 can 'explain away' why GasGauge = empty
If we know the value of EngineCranks, whether or not the car Starts tells us nothing more about BatteryPower
"Leak" nodes are often used to model "all other causes"