CS540 - Fall 2016 (Shavlik©), Lecture 16, Week 9    11/24/2018

Today's Topics
  Back to Naïve Bayes (NB)
  NB as a BN
  More on HW3's Nannon BN
  Some Hints on Full-Joint Prob Table for Nannon
  Log-Odds Calculations
    odds(x) ≡ prob(x) / (1 – prob(x)) = prob(x) / prob(¬x)
  Time Permitting: Some More BN Practice
  Probabilistic Reasoning Wrapup

11/3/16
Recap: Naïve Bayes Parameter Learning

Use training data to estimate (for Naïve Bayes), in one pass through the data:
  P(fi = vj | category = POS)   for each i, j
  P(fi = vj | category = NEG)   for each i, j
  P(category = POS)
  P(category = NEG)
  // Note: Some of the above are unnecessary, since some combos of probs sum to 1

Apply Bayes' rule to find odds(category = POS | test example's features)

Incremental/Online Learning is Easy
  simply increment counters
  (true for BNs in general, if there is no incremental 'structure learning')
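A minimal sketch of the one-pass counting described above, assuming Boolean features and one all-false plus one all-true pseudo example per category (the scheme the exam problems in this lecture use). The class and method names are hypothetical, not part of HW3.

```java
import java.util.Arrays;

/** Hypothetical sketch: Naive Bayes with Boolean features, learned by counting. */
final class NaiveBayesCounter {
    private final int[][] posCounts; // posCounts[i][v]: # POS examples where feature i == v
    private final int[][] negCounts; // same, for NEG examples
    private int numPos = 0;
    private int numNeg = 0;

    NaiveBayesCounter(int numFeatures) {
        posCounts = new int[numFeatures][2];
        negCounts = new int[numFeatures][2];
        // Assumption: pseudo examples are all-false and all-true, once per category.
        boolean[] allFalse = new boolean[numFeatures];
        boolean[] allTrue = new boolean[numFeatures];
        Arrays.fill(allTrue, true);
        for (boolean category : new boolean[] {true, false}) {
            update(allFalse, category);
            update(allTrue, category);
        }
    }

    /** Online learning step: one example only increments counters. */
    void update(boolean[] features, boolean isPos) {
        int[][] counts = isPos ? posCounts : negCounts;
        for (int i = 0; i < features.length; i++) {
            counts[i][features[i] ? 1 : 0]++;
        }
        if (isPos) {
            numPos++;
        } else {
            numNeg++;
        }
    }

    /** odds(category = POS | features), via Bayes' rule with the shared denominator cancelled. */
    double odds(boolean[] features) {
        double total = numPos + numNeg;
        double num = numPos / total; // P(category = POS)
        double den = numNeg / total; // P(category = NEG)
        for (int i = 0; i < features.length; i++) {
            int v = features[i] ? 1 : 0;
            num *= (double) posCounts[i][v] / numPos; // P(fi = v | POS)
            den *= (double) negCounts[i][v] / numNeg; // P(fi = v | NEG)
        }
        return num / den;
    }
}
```

Because learning is only counter updates, a new example can be folded in at any time by calling update once; nothing has to be retrained.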
From Fall 2014 Final

Consider the following training set, where three Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below and use pseudo examples.

  Prob(Output = True | A = False, B = False, C = False) / Prob(Output = False | A = False, B = False, C = False)

    = [ P(¬A | Out) × P(¬B | Out) × P(¬C | Out) × P(Out) ] / [ P(¬A | ¬Out) × P(¬B | ¬Out) × P(¬C | ¬Out) × P(¬Out) ]

    = [ (3 / 5) × (2 / 5) × (2 / 5) × (5 / 8) ] / [ (2 / 3) × (2 / 3) × (1 / 3) × (3 / 8) ]

    = (3 / 50) / (1 / 18) = 1.08

Assume FOUR pseudo examples (ffft, tttt, ffff, tttf)
An Unassigned NB Paper-and-Pencil Problem (from 2015 Final)

Consider the following training set, where two Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below, showing your work below it and putting your final (numeric) answer on the line to the right of the equal sign. Be sure to consider pseudo examples (use m=1).

  Prob(Output = True | A = False, B = False) / Prob(Output = False | A = False, B = False) = __________________

Assume FOUR pseudo examples (fft, ttt, fff, ttf)

  Ex # A B Output
  1 True False
  2
  3
NB as a BN

[Figure: BN with D as the parent of S1, S2, S3, …, Sn]

  1 CPT of size 2n+1
  (N+1) CPTs of size 1

  P(D | S1 S2 S3 … Sn) = [ Π P(Si | D) ] × P(D) / P(S1 S2 S3 … Sn)

We only need to compute the numerator, [ Π P(Si | D) ] × P(D), if we use the 'odds' method from prev slides
Why Just the Numerator?

  [ Π P(Si | D) ] × P(D) = P(D | S1 S2 S3 … Sn) × P(S1 S2 S3 … Sn)   // Cross multiply
                         = P(D S1 S2 S3 … Sn)

  - hence, we are calculating the joint prob of the symptoms and the disease
HW3: Draw a BN then Implement the Calculation for that BN (also do NB)

[Figure: NB network with WIN as the parent of S1, S2, S3, …, Sn]

  Odds(WIN) = { [ Π P(Si = valuei | WIN=true) ] × P(WIN=true) } / { [ Π P(Si = valuei | WIN=false) ] × P(WIN=false) }

Recall: Choose the move that gives the best odds of winning
Going Slightly Beyond NB

[Figure: BN with WIN as the parent of S1, S2, S3, …, Sn, plus an arc from S2 to S1]

  Odds(WIN) = { P(S1 = ? | S2 = ? ˄ WIN) × [ Π P(Si = ? | WIN) ] × P(WIN) } / { P(S1 = ? | S2 = ? ˄ ¬WIN) × [ Π P(Si = ? | ¬WIN) ] × P(¬WIN) }

Here the PRODUCT is from 2 to n
Going Slightly Beyond NB (Part 2)

[Figure: BN with WIN as the parent of S1, S2, S3, …, Sn, with S1 and S2 treated jointly]

  Odds(WIN) = { P(S1 = ? ˄ S2 = ? | WIN) × [ Π P(Si = ? | WIN) ] × P(WIN) } / { P(S1 = ? ˄ S2 = ? | ¬WIN) × [ Π P(Si = ? | ¬WIN) ] × P(¬WIN) }

Here the PRODUCT is from 3 to n

A little bit of joint probability!

Used: P(S1 = ? ˄ S2 = ? | WIN) = P(S1 = ? | S2 = ? ˄ WIN) × P(S2 = ? | WIN)
Hints on the Full Joint for Nannon

Assume you have THREE binary-valued features (RV1, RV2, and RV3) that describe a board+move

How to create a full joint table? One design:

  int[][][] fullJoint_wins   = new int[2][2][2]; // Init to m (not shown).
  int[][][] fullJoint_losses = new int[2][2][2]; // Init to m (not shown).

Assume we get RV1=T, RV2=F, and RV3=F in a WIN. Then do

  fullJoint_wins[1][0][0]++; // Also increment countMoves_wins.
Hints on the Full Joint for Nannon (2)

Assume we get RV1=T, RV2=T, and RV3=T and want the prob that this is in a WIN. By Bayes' Rule:

  P(win  | RV1, RV2, RV3) = P(RV1, RV2, RV3 | win)  × P(win)  / denom
  P(loss | RV1, RV2, RV3) = P(RV1, RV2, RV3 | loss) × P(loss) / denom

So (recall that the num and denom on the left below SUM to 1, so we can solve for the win prob)

  P(win | RV1, RV2, RV3) / P(loss | RV1, RV2, RV3)
    = [ P(RV1, RV2, RV3 | win) × P(win) ] / [ P(RV1, RV2, RV3 | loss) × P(loss) ]
    = [ (fullJoint_wins[1][1][1] / wins) × (wins / totalMoves) ] / [ (fullJoint_losses[1][1][1] / losses) × (losses / totalMoves) ]
Hints on the Full Joint for Nannon (3)

  [ (fullJoint_wins[1][1][1] / wins) × (wins / totalMoves) ] / [ (fullJoint_losses[1][1][1] / losses) × (losses / totalMoves) ]

Can greatly simplify the above (from prev slide)!

Remember to create DOUBLEs! Don't want integer division

wins is really countMoves_wins, since MOVE is our unit of analysis
And losses is really countMoves_losses
totalMoves = countMoves_wins + countMoves_losses - though it cancels out anyway
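A small illustration of the integer-division warning, as a sketch with hypothetical method names. In Java, dividing two ints truncates before the result is widened to double, so one operand has to be cast first.

```java
/** Hypothetical demo of why the counters must be divided as doubles. */
final class IntegerDivisionDemo {
    static double wrongRatio(int wins, int losses) {
        return wins / losses; // int / int truncates first, then widens to double
    }

    static double rightRatio(int wins, int losses) {
        return (double) wins / losses; // cast one operand so the division is done in doubles
    }

    public static void main(String[] args) {
        System.out.println(wrongRatio(3, 4)); // prints 0.0
        System.out.println(rightRatio(3, 4)); // prints 0.75
    }
}
```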
Hints on the Full Joint for Nannon (4)

  P(win | RV1, RV2, RV3) / P(loss | RV1, RV2, RV3)
    = ((double) fullJoint_wins[1][1][1] + m) / (fullJoint_losses[1][1][1] + m)   // m-estimate explicit

So we really only need to compute: the ratio of times this complete world state appears in a WINNING game compared to in a LOSING game

This is almost embarrassingly simple, but be sure you understand the steps that lead to this simple ratio of two counters
(the simplified BN calculation is not quite this simple, eg divide by WINS many times)
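The full-joint hints can be sketched as a small class. This is a sketch under assumptions, not the HW3 solution: it keeps the slides' array names, adds m at query time rather than initializing the counters to m, and the class name, record, and oddsOfWin are hypothetical.

```java
/** Hypothetical sketch of the full-joint table for three binary features. */
final class FullJointOdds {
    private final int[][][] fullJoint_wins = new int[2][2][2];
    private final int[][][] fullJoint_losses = new int[2][2][2];
    private final double m; // m-estimate constant, added explicitly when computing the ratio

    FullJointOdds(double m) {
        this.m = m;
    }

    private static int bit(boolean b) {
        return b ? 1 : 0;
    }

    /** Count one move (described by RV1..RV3) from a game that ended in a win or a loss. */
    void record(boolean rv1, boolean rv2, boolean rv3, boolean win) {
        int[][][] table = win ? fullJoint_wins : fullJoint_losses;
        table[bit(rv1)][bit(rv2)][bit(rv3)]++;
    }

    /** Ratio of the two counters for this complete world state, computed in doubles. */
    double oddsOfWin(boolean rv1, boolean rv2, boolean rv3) {
        double w = fullJoint_wins[bit(rv1)][bit(rv2)][bit(rv3)] + m;
        double l = fullJoint_losses[bit(rv1)][bit(rv2)][bit(rv3)] + m;
        return w / l;
    }
}
```

With m = 1, three winning moves and one losing move in state (T, T, T) give odds (3 + 1) / (1 + 1) = 2, while a state never seen gets odds 1.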
Log Odds

Odds > 1 iff prob > 0.5
Odds(x) ≡ prob(x) / (1 – prob(x))

Recall (and now assuming D only has TWO values)

  1) P(D | S1 S2 S3 … Sn)  = [ Π P(Si | D) ]  × P(D)  / P(S1 S2 S3 … Sn)
  2) P(¬D | S1 S2 S3 … Sn) = [ Π P(Si | ¬D) ] × P(¬D) / P(S1 S2 S3 … Sn)

Dividing (1) by (2), denominators cancel out!

  P(D | S1 S2 S3 … Sn) / P(¬D | S1 S2 S3 … Sn) = { [ Π P(Si | D) ] × P(D) } / { [ Π P(Si | ¬D) ] × P(¬D) }

Since P(¬D | S1 S2 S3 … Sn) = 1 - P(D | S1 S2 S3 … Sn)

  odds(D | S1 S2 S3 … Sn) = [ Π { P(Si | D) / P(Si | ¬D) } ] × [ P(D) / P(¬D) ]

Notice we removed one Π via algebra
The Missing Algebra

The implicit algebra from prev page:

  (a1 × a2 × a3 × … × an) / (b1 × b2 × b3 × … × bn) = (a1 / b1) × (a2 / b2) × (a3 / b3) × … × (an / bn)
Log Odds (continued)

Odds(x) ≡ prob(x) / (1 – prob(x))

We ended two slides ago with

  odds(D | S1 S2 S3 … Sn) = [ Π { P(Si | D) / P(Si | ¬D) } ] × [ P(D) / P(¬D) ]

Recall log(A × B) = log(A) + log(B), so we have

  log [ odds(D | S1 S2 S3 … Sn) ] = { Σ log [ P(Si | D) / P(Si | ¬D) ] } + log [ P(D) / P(¬D) ]

If log-odds > 0, D is more likely than ¬D, since log(x) > 0 iff x > 1
If log-odds < 0, D is less likely than ¬D, since log(x) < 0 iff x < 1
Log Odds (concluded)

We ended last slide with

  log [ odds(D | S1 S2 S3 … Sn) ] = { Σ log [ P(Si | D) / P(Si | ¬D) ] } + log [ P(D) / P(¬D) ]

Consider log [ P(D) / P(¬D) ]
  if D is more likely than ¬D, we start the sum with a positive value

Consider each log [ P(Si | D) / P(Si | ¬D) ]
  if Si is more likely given D than given ¬D, we add a pos value to the sum
  if less likely, we add a negative value
  if Si is independent of D, we add zero

At the end we see if the sum is POS (D more likely), ZERO (tie), or NEG (¬D more likely)
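The running sum can be sketched as follows, assuming the caller has already looked up P(Si | D) and P(Si | ¬D) for the observed value of each Si and that D has only two values. The class, method, and parameter names are hypothetical.

```java
/** Hypothetical sketch of the log-odds sum for Naive Bayes with a two-valued D. */
final class LogOddsNaiveBayes {
    /**
     * pD is P(D); pSiGivenD[i] and pSiGivenNotD[i] are the probabilities of the
     * OBSERVED value of Si given D and given not-D, respectively.
     */
    static double logOdds(double pD, double[] pSiGivenD, double[] pSiGivenNotD) {
        double sum = Math.log(pD / (1.0 - pD)); // start with log [ P(D) / P(not D) ]
        for (int i = 0; i < pSiGivenD.length; i++) {
            sum += Math.log(pSiGivenD[i] / pSiGivenNotD[i]); // pos, neg, or zero per feature
        }
        return sum;
    }

    /** POS means D is more likely, NEG means not-D is more likely, ZERO is a tie. */
    static String decide(double logOdds) {
        if (logOdds > 0) {
            return "D";
        }
        if (logOdds < 0) {
            return "not D";
        }
        return "tie";
    }
}
```

For example, with P(D) = 0.5, one feature with probabilities 0.8 versus 0.2 and another with 0.3 versus 0.3, the prior and the second feature each contribute zero, so the sum is log(4) and the decision is D.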
Viewing NB as a PERCEPTRON, the Simplest Neural Network

[Figure: input nodes S1, ¬S1, …, Sn, ¬Sn and a constant '1' node, each linked to one output node 'out' that computes the log-odds]

  Input node    Weight on its link to 'out'
  S1            log [ P(S1 | D) / P(S1 | ¬D) ]
  ¬S1           log [ P(¬S1 | D) / P(¬S1 | ¬D) ]
  …
  Sn            log [ P(Sn | D) / P(Sn | ¬D) ]
  ¬Sn           log [ P(¬Sn | D) / P(¬Sn | ¬D) ]
  '1'           log [ P(D) / P(¬D) ]

If Si = true, then NODE Si=1 and NODE ¬Si=0
If Si = false, then NODE Si=0 and NODE ¬Si=1
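A sketch of the perceptron view, assuming Boolean Si and a two-valued D: each Si gets two input nodes (Si and ¬Si) plus a constant '1' input, and the weighted sum is the log-odds. The array layout and all names are illustrative assumptions.

```java
/** Hypothetical sketch: Naive Bayes log-odds computed as a perceptron's weighted sum. */
final class NaiveBayesPerceptron {
    /** w[2i] is the weight for node Si, w[2i+1] for node not-Si, w[2n] for the constant '1' node. */
    static double[] weights(double pD, double[] pTrueGivenD, double[] pTrueGivenNotD) {
        int n = pTrueGivenD.length;
        double[] w = new double[2 * n + 1];
        for (int i = 0; i < n; i++) {
            w[2 * i] = Math.log(pTrueGivenD[i] / pTrueGivenNotD[i]);
            w[2 * i + 1] = Math.log((1.0 - pTrueGivenD[i]) / (1.0 - pTrueGivenNotD[i]));
        }
        w[2 * n] = Math.log(pD / (1.0 - pD)); // bias weight: log [ P(D) / P(not D) ]
        return w;
    }

    /** If Si is true, node Si = 1 and node not-Si = 0; if Si is false, the reverse. */
    static double[] inputs(boolean[] s) {
        int n = s.length;
        double[] x = new double[2 * n + 1];
        for (int i = 0; i < n; i++) {
            x[2 * i] = s[i] ? 1.0 : 0.0;
            x[2 * i + 1] = s[i] ? 0.0 : 1.0;
        }
        x[2 * n] = 1.0; // the constant '1' input
        return x;
    }

    /** Weighted sum of the inputs, which equals the log-odds of D. */
    static double output(double[] w, double[] x) {
        double sum = 0.0;
        for (int k = 0; k < w.length; k++) {
            sum += w[k] * x[k];
        }
        return sum;
    }
}
```

Only one node of each Si / ¬Si pair is on, so the sum picks out exactly one log-ratio per feature plus the prior term.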
More Practice: From Fall 2014 Final

What is the probability that A and C are true but B and D are false?
  = P(A) × (1 – P(B)) × P(C | A ˄ ¬B) × (1 – P(D | A ˄ ¬B ˄ C))
  = 0.3 × (1 – 0.8) × 0.6 × (1 – 0.6)
  = 0.0144

What is the probability that A is false, B is true, and D is true?
  = P(¬A ˄ B ˄ D)
  = P(¬A ˄ B ˄ ¬C ˄ D) + P(¬A ˄ B ˄ C ˄ D)
  = process 'complete world states' like the first question

What is the probability that C is true given A is false, B is true, and D is true?
  = P(C | ¬A ˄ B ˄ D)
  = P(C ˄ ¬A ˄ B ˄ D) / P(¬A ˄ B ˄ D)
  = process like the first and second questions
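The second and third answers can be sketched in code, assuming some function already returns the probability of a complete world state (computed from the BN's CPTs as in the first answer). The Joint interface and the method names are hypothetical, and the exam's CPT values are not reproduced here.

```java
/** Hypothetical sketch: answering BN queries by summing complete world states. */
final class BnQueries {
    /** Assumed to return P(A=a, B=b, C=c, D=d) for one complete world state. */
    interface Joint {
        double prob(boolean a, boolean b, boolean c, boolean d);
    }

    /** P(not A, B, D): sum out the unmentioned variable C. */
    static double probNotA_B_D(Joint j) {
        return j.prob(false, true, false, true) + j.prob(false, true, true, true);
    }

    /** P(C | not A, B, D) = P(C, not A, B, D) / P(not A, B, D). */
    static double probC_given_NotA_B_D(Joint j) {
        return j.prob(false, true, true, true) / probNotA_B_D(j);
    }
}
```

As a sanity check, a uniform joint that assigns 1/16 to every world state gives P(¬A ˄ B ˄ D) = 2/16 and P(C | ¬A ˄ B ˄ D) = 0.5.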
Bayesian Networks Wrapup

  BNs are one Type of 'Graphical Model'
  Lots of Applications (though the current focus is on 'deep [neural] networks')
  Bayes' Rule is an Appealing Way to Go from EFFECTS to CAUSES (ie, diagnosis)
  Full Joint Prob Tables and Naïve Bayes are Interesting 'Limit Cases' of BNs
  With 'Big Data,' Counting Goes a Long Way!