CS540 - Fall 2016 (Shavlik©), Lecture 16, Week 9

Presentation transcript:

Today’s Topics
– Back to Naïve Bayes (NB)
– NB as a BN
– More on HW3’s Nannon BN
– Some Hints on the Full-Joint Prob Table for Nannon
– Log-Odds Calculations: odds(x) ≡ prob(x) / (1 – prob(x)) = prob(x) / prob(¬x)
– Time Permitting: Some More BN Practice
– Probabilistic Reasoning Wrapup

Recap: Naïve Bayes Parameter Learning
Use the training data to estimate (for Naïve Bayes), in one pass through the data:
– P(fi = vj | category = POS) for each i, j
– P(fi = vj | category = NEG) for each i, j
– P(category = POS)
– P(category = NEG)
// Note: some of the above are unnecessary, since some combinations of the probabilities sum to 1.
Apply Bayes’ rule to find odds(category = POS | test example’s features).
Incremental/Online Learning is Easy: simply increment counters (true for BNs in general, if there is no incremental ‘structure learning’).
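The counter-based estimation above translates directly into code. Below is a minimal Java sketch of the idea, assuming binary features and two categories; the class and member names (NaiveBayesCounts, posCounts, etc.) are made up for illustration and are not part of the assignment.

    // Minimal sketch of Naive Bayes parameter learning by counting.
    // Hypothetical names; assumes binary features and two categories.
    public class NaiveBayesCounts {
        final int numFeatures;
        final double m = 1.0;                    // m-estimate (pseudo-count) strength, a design choice
        double posExamples = 0, negExamples = 0; // category counts
        final double[][] posCounts, negCounts;   // [feature][value] counts per category

        public NaiveBayesCounts(int numFeatures) {
            this.numFeatures = numFeatures;
            posCounts = new double[numFeatures][2];
            negCounts = new double[numFeatures][2];
        }

        // One pass through the data = one call per training example.
        // Incremental/online learning is simply more calls later on.
        public void observe(boolean[] features, boolean isPositive) {
            double[][] counts = isPositive ? posCounts : negCounts;
            if (isPositive) posExamples++; else negExamples++;
            for (int i = 0; i < numFeatures; i++)
                counts[i][features[i] ? 1 : 0]++;
        }

        // Estimate of P(f_i = value | category), smoothed with m pseudo-counts per value.
        public double probFeatureGivenCategory(int i, boolean value, boolean positive) {
            double[][] counts = positive ? posCounts : negCounts;
            double total = positive ? posExamples : negExamples;
            return (counts[i][value ? 1 : 0] + m) / (total + 2 * m);
        }
    }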

From Fall 2014 Final
Consider the following training set, where three Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below, and use pseudo examples.
(The training-set table appears on the slide but is not reproduced in this transcript.)

Prob(Output = True  | A = False, B = False, C = False)
------------------------------------------------------- = _________________
Prob(Output = False | A = False, B = False, C = False)

From Fall 2014 Final
Consider the following training set, where three Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below, and use pseudo examples.
(The training-set table appears on the slide but is not reproduced in this transcript.)
Assume FOUR pseudo examples (ffft, tttt, ffff, tttf).

Prob(Output = True  | A = False, B = False, C = False)
-------------------------------------------------------
Prob(Output = False | A = False, B = False, C = False)

     P(¬A | Out) × P(¬B | Out) × P(¬C | Out) × P(Out)          (3 / 5) × (2 / 5) × (2 / 5) × (5 / 8)
  = -----------------------------------------------------  =  ---------------------------------------
     P(¬A | ¬Out) × P(¬B | ¬Out) × P(¬C | ¬Out) × P(¬Out)      (2 / 3) × (2 / 3) × (1 / 3) × (3 / 8)
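The slide leaves the arithmetic to the reader; carrying it out, the numerator is (3·2·2·5)/(5·5·5·8) = 60/1000 = 0.06 and the denominator is (2·2·1·3)/(3·3·3·8) = 12/216 ≈ 0.056, so the ratio is about 1.08. Since it is greater than 1, Output = True is judged (slightly) more likely for this feature setting.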

An Unassigned NB Paper-and-Pencil Problem (from 2015 Final)
Consider the following training set, where two Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below, showing your work below it and putting your final (numeric) answer on the line to the right of the equals sign. Be sure to consider pseudo examples (use m = 1).
Assume FOUR pseudo examples (fft, ttt, fff, ttf).
(Training-set table on the slide: columns Ex #, A, B, Output, with three examples; the values are not fully reproduced in this transcript.)

Prob(Output = True  | A = False, B = False)
--------------------------------------------- = __________________
Prob(Output = False | A = False, B = False)

NB as a BN
(Network: D is the sole parent of S1, S2, S3, …, Sn.)
– Full joint: 1 CPT of size 2^(n+1)
– NB as a BN: (n+1) small, constant-size CPTs (one per node)

P(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) = [ Π P(Si | D) ] × P(D) / P(S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)

We only need to compute the numerator if we use the ‘odds’ method from the previous slides.

Why Just the Numerator?
[ Π P(Si | D) ] × P(D)
  = P(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) × P(S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)   // cross multiply
  = P(D ∧ S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)
Hence, we are calculating the joint probability of the symptoms and the disease.

HW3: Draw a BN, then Implement the Calculation for that BN (also do NB)
(Network: WIN is the sole parent of S1, S2, S3, …, Sn.)

             [ Π P(Si = valuei | WIN = true)  ] × P(WIN = true)
Odds(WIN) = -----------------------------------------------------
             [ Π P(Si = valuei | WIN = false) ] × P(WIN = false)

Recall: choose the move that gives the best odds of winning.
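A minimal Java sketch of this odds calculation, assuming the per-feature conditional probabilities and the prior have already been estimated from counts (the names probGivenWin, probGivenLoss, and priorWin are illustrative, not from the assignment):

    // Odds(WIN | features) for one candidate move, per the formula above.
    // probGivenWin[i][v]  ≈ P(Si = v | WIN = true)
    // probGivenLoss[i][v] ≈ P(Si = v | WIN = false)
    // priorWin            ≈ P(WIN = true)
    public class NaiveBayesOdds {
        static double oddsOfWinning(int[] featureValues,
                                    double[][] probGivenWin,
                                    double[][] probGivenLoss,
                                    double priorWin) {
            double numerator   = priorWin;        // P(WIN = true)
            double denominator = 1.0 - priorWin;  // P(WIN = false)
            for (int i = 0; i < featureValues.length; i++) {
                numerator   *= probGivenWin[i][featureValues[i]];
                denominator *= probGivenLoss[i][featureValues[i]];
            }
            return numerator / denominator;       // pick the move with the largest odds
        }
    }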

Going Slightly Beyond NB
(Network: as before, but with an added arc from S2 to S1.)

             P(S1 = ? | S2 = ? ∧ WIN) × [ Π P(Si = ? | WIN) ] × P(WIN)
Odds(WIN) = -------------------------------------------------------------
             P(S1 = ? | S2 = ? ∧ ¬WIN) × [ Π P(Si = ? | ¬WIN) ] × P(¬WIN)

Here the PRODUCT runs from 2 to n.

Going Slightly Beyond NB (Part 2)
(Same network as the previous slide.)

             P(S1 = ? ∧ S2 = ? | WIN) × [ Π P(Si = ? | WIN) ] × P(WIN)
Odds(WIN) = -------------------------------------------------------------
             P(S1 = ? ∧ S2 = ? | ¬WIN) × [ Π P(Si = ? | ¬WIN) ] × P(¬WIN)

Here the PRODUCT runs from 3 to n.
A little bit of joint probability! We used:
  P(S1 = ? ∧ S2 = ? | WIN) = P(S1 = ? | S2 = ? ∧ WIN) × P(S2 = ? | WIN)

Hints on the Full Joint for Nannon
Assume you have THREE binary-valued features (RV1, RV2, and RV3) that describe a board+move.
How to create a full joint table? One design:

  int[][][] fullJoint_wins   = new int[2][2][2]; // Init to m (not shown).
  int[][][] fullJoint_losses = new int[2][2][2]; // Init to m (not shown).

Assume we get RV1=T, RV2=F, and RV3=F in a WIN. Then do

  fullJoint_wins[1][0][0]++; // Also increment countMoves_wins.

Hints on the Full Joint for Nannon (2)
Assume we get RV1=T, RV2=T, and RV3=T and want the probability that this move occurred in a WIN.
By Bayes’ Rule:
  P(win  | RV1, RV2, RV3) = P(RV1, RV2, RV3 | win)  × P(win)  / denom
  P(loss | RV1, RV2, RV3) = P(RV1, RV2, RV3 | loss) × P(loss) / denom
So (recall the numerator and denominator on the left below sum to 1, so we can solve for the win probability):

  P(win  | RV1, RV2, RV3)     P(RV1, RV2, RV3 | win)  × P(win)
  ------------------------ = ------------------------------------
  P(loss | RV1, RV2, RV3)     P(RV1, RV2, RV3 | loss) × P(loss)

                               (fullJoint_wins[1][1][1]   / wins)   × (wins   / totalMoves)
                            = ----------------------------------------------------------------
                               (fullJoint_losses[1][1][1] / losses) × (losses / totalMoves)
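As a rough Java sketch of this counting-and-ratio scheme: the array and counter names follow the slides where they appear, while the class and method names are made up for illustration.

    // Full-joint counters for THREE binary features, following the slides' design.
    public class NannonFullJoint {
        int[][][] fullJoint_wins   = new int[2][2][2]; // init each cell to m (not shown), per the slides
        int[][][] fullJoint_losses = new int[2][2][2];
        int countMoves_wins = 0, countMoves_losses = 0;

        // Call once per move from a game that was eventually WON
        // (an analogous method, not shown, handles moves from lost games).
        void recordMoveFromWin(boolean rv1, boolean rv2, boolean rv3) {
            fullJoint_wins[rv1 ? 1 : 0][rv2 ? 1 : 0][rv3 ? 1 : 0]++;
            countMoves_wins++;
        }

        // P(win | RV1,RV2,RV3) / P(loss | RV1,RV2,RV3), via the Bayes'-rule ratio above.
        double winLossOdds(boolean rv1, boolean rv2, boolean rv3) {
            int i = rv1 ? 1 : 0, j = rv2 ? 1 : 0, k = rv3 ? 1 : 0;
            double totalMoves = countMoves_wins + countMoves_losses;
            double num = (fullJoint_wins[i][j][k]   / (double) countMoves_wins)
                       * (countMoves_wins   / totalMoves);   // P(RVs | win)  × P(win)
            double den = (fullJoint_losses[i][j][k] / (double) countMoves_losses)
                       * (countMoves_losses / totalMoves);   // P(RVs | loss) × P(loss)
            return num / den;  // the countMoves_* and totalMoves factors cancel (see the next slides)
        }
    }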

Hints on the Full Joint for Nannon (3)

  (fullJoint_wins[1][1][1]   / wins)   × (wins   / totalMoves)
  --------------------------------------------------------------
  (fullJoint_losses[1][1][1] / losses) × (losses / totalMoves)

Can greatly simplify the above (from the previous slide)!
– Remember to create DOUBLEs! We don’t want integer division.
– wins is really countMoves_wins, since MOVE is our unit of analysis.
– And losses is really countMoves_losses.
– totalMoves = countMoves_wins + countMoves_losses, though it cancels out anyway.

Hints on the Full Joint for Nannon (4)

  (double) fullJoint_wins[1][1][1]   + m     // m-estimate explicit
  ------------------------------------------
           fullJoint_losses[1][1][1] + m

So we really only need to compute the ratio of the number of times this complete world state appears in a WINNING game compared to in a LOSING game.
This is almost embarrassingly simple, but be sure you understand the steps that lead to this simple ratio of two counters. (The simplified BN calculation is not quite this simple, eg, it divides by WINS many times.)
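In code, the simplified ratio from this slide comes down to a couple of lines; a sketch, assuming the fullJoint arrays hold raw counts (i.e., they were not already initialized to m, otherwise the explicit + m below would double-count):

  double m = 1.0;   // pseudo-count strength; the exact value is a design choice
  double winLossRatio = (fullJoint_wins[1][1][1]   + m)
                      / (fullJoint_losses[1][1][1] + m);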

Log Odds
Odds > 1 iff prob > 0.5
Odds(x) ≡ prob(x) / (1 – prob(x))
Recall (and now assuming D only has TWO values):
  (1) P(D  | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) = [ Π P(Si | D)  ] × P(D)  / P(S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)
  (2) P(¬D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) = [ Π P(Si | ¬D) ] × P(¬D) / P(S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)
Dividing (1) by (2), the denominators cancel out:

  P(D  | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)      [ Π P(Si | D)  ] × P(D)
  ------------------------------- = ---------------------------
  P(¬D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)      [ Π P(Si | ¬D) ] × P(¬D)

Since P(¬D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) = 1 – P(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn):
  odds(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) = [ Π { P(Si | D) / P(Si | ¬D) } ] × [ P(D) / P(¬D) ]
Notice we removed one Π via algebra.

The Missing Algebra
The implicit algebra from the previous page:

  a1 × a2 × a3 × … × an
  ----------------------- = (a1 / b1) × (a2 / b2) × (a3 / b3) × … × (an / bn)
  b1 × b2 × b3 × … × bn

Log Odds (continued)
Odds(x) ≡ prob(x) / (1 – prob(x))
We ended two slides ago with
  odds(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) = [ Π { P(Si | D) / P(Si | ¬D) } ] × [ P(D) / P(¬D) ]
Recall log(A × B) = log(A) + log(B), so we have
  log [ odds(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) ] = { Σ log [ P(Si | D) / P(Si | ¬D) ] } + log [ P(D) / P(¬D) ]
– If the log-odds > 0, D is more likely than ¬D, since log(x) > 0 iff x > 1
– If the log-odds < 0, D is less likely than ¬D, since log(x) < 0 iff x < 1

Log Odds (concluded)
We ended the last slide with
  log [ odds(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn) ] = { Σ log [ P(Si | D) / P(Si | ¬D) ] } + log [ P(D) / P(¬D) ]
Consider log [ P(D) / P(¬D) ]:
– if D is more likely than ¬D, we start the sum with a positive value
Consider each log [ P(Si | D) / P(Si | ¬D) ]:
– if Si is more likely given D than given ¬D, we add a positive value to the sum
– if less likely, we add a negative value
– if Si is independent of D, we add zero
At the end we see whether the sum is POSITIVE (D more likely), ZERO (tie), or NEGATIVE (¬D more likely).
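A short Java sketch of this log-odds scoring for Boolean symptoms, assuming the conditional probabilities have already been estimated (the names probGivenD, probGivenNotD, and priorD are illustrative; smoothing, eg with m-estimates, is assumed so that no probability is exactly 0 or 1):

    // log-odds of D given the observed Boolean symptoms, per the formula above.
    // probGivenD[i]    ≈ P(Si = true | D)
    // probGivenNotD[i] ≈ P(Si = true | ¬D)
    // priorD           ≈ P(D)
    public class LogOddsNB {
        static double logOdds(boolean[] symptoms, double[] probGivenD,
                              double[] probGivenNotD, double priorD) {
            double sum = Math.log(priorD / (1.0 - priorD));      // log [ P(D) / P(¬D) ]
            for (int i = 0; i < symptoms.length; i++) {
                double pD    = symptoms[i] ? probGivenD[i]    : 1.0 - probGivenD[i];
                double pNotD = symptoms[i] ? probGivenNotD[i] : 1.0 - probGivenNotD[i];
                sum += Math.log(pD / pNotD);                     // log [ P(Si | D) / P(Si | ¬D) ]
            }
            return sum;  // > 0: D more likely; < 0: ¬D more likely; = 0: tie
        }
    }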

Viewing NB as a PERCEPTRON, the Simplest Neural Network
(Diagram: input nodes S1, ¬S1, …, Sn, ¬Sn plus a constant ‘1’ bias node all feed one output unit, ‘out’, whose value is the log-odds.)
– Weight on node Si:  log [ P(Si | D) / P(Si | ¬D) ]
– Weight on node ¬Si: log [ P(¬Si | D) / P(¬Si | ¬D) ]
– Weight on the ‘1’ node: log [ P(D) / P(¬D) ]
– If Si = true,  then NODE Si = 1 and NODE ¬Si = 0
– If Si = false, then NODE Si = 0 and NODE ¬Si = 1

More Practice: From Fall 2014 Final
(The Bayes net and its CPTs appear on the slide but are not reproduced in this transcript.)
– What is the probability that A and C are true but B and D are false?
– What is the probability that A is false, B is true, and D is true?
– What is the probability that C is true given A is false, B is true, and D is true?

More Practice: From Fall 2014 Final
What is the probability that A and C are true but B and D are false?
  = P(A) × (1 – P(B)) × P(C | A ∧ ¬B) × (1 – P(D | A ∧ ¬B ∧ C))
  = 0.3 × (1 – 0.8) × 0.6 × (1 – 0.6)
What is the probability that A is false, B is true, and D is true?
  = P(¬A ∧ B ∧ D) = P(¬A ∧ B ∧ ¬C ∧ D) + P(¬A ∧ B ∧ C ∧ D)
  = process ‘complete world states’ like the first question
What is the probability that C is true given A is false, B is true, and D is true?
  = P(C | ¬A ∧ B ∧ D) = P(C ∧ ¬A ∧ B ∧ D) / P(¬A ∧ B ∧ D)
  = process like the first and second questions
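Carrying out the first question’s arithmetic (the slide stops at the product): 0.3 × 0.2 × 0.6 × 0.4 = 0.0144.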

Bayesian Networks Wrapup
– BNs are one type of ‘graphical model’
– Lots of applications (though the current focus is on ‘deep [neural] networks’)
– Bayes’ rule is an appealing way to go from EFFECTS to CAUSES (ie, diagnosis)
– Full joint probability tables and Naïve Bayes are interesting ‘limit cases’ of BNs
– With ‘big data,’ counting goes a long way!