Today’s Topics
Some Exam-Review Notes
–Midterm is Thurs, 5:30-7:30pm HERE
–One 8.5x11-inch page of notes (both sides), simple calculator (logs and arithmetic)
–Don’t discuss the actual midterm with others until Nov 3
Planning to Attend the TA’s Review Tomorrow?
Bayes’ Rule
Naïve Bayes (NB)
NB as a BN
Probabilistic Reasoning Wrapup
Next: BNs for Playing Nannon (HW3)

Topics Covered So Far
Some AI History and Philosophy (more in the final class)
Learning from Labeled Data (more ahead)
Reasoning from Specific Cases (k-NN)
Searching for Solutions (many variants, common core)
Projecting Possible Futures (eg, game playing)
Simulating ‘Problem Solving’ Done by the Biophysical World (SA, GA, and [next] neural nets)
Reasoning Probabilistically (just Ch 13 & Lec 14)

Detailed List of Course Topics
Learning from labeled data
–Experimental methodologies for choosing parameter settings and estimating future accuracy
–Decision trees and random forests
–Probabilistic models, nearest-neighbor methods
–Genetic algorithms
–Neural networks
–Support vector machines
–Reinforcement learning (if time permits)
Searching for solutions
–Heuristically finding shortest paths
–Algorithms for playing games like chess
–Simulated annealing
–Genetic algorithms
Reasoning probabilistically
–Probabilistic inference (just the basics so far)
–Bayes’ rule
–Bayesian networks
Reasoning from concrete cases
–Case-based reasoning
–Nearest-neighbor algorithm
Reasoning logically
–First-order predicate calculus
–Representing domain knowledge using mathematical logic
–Logical inference
Problem-solving methods based on the biophysical world
–Genetic algorithms
–Simulated annealing
–Neural networks
Philosophical aspects
–Turing test
–Searle’s Chinese Room thought experiment
–The coming singularity
–Strong vs. weak AI
–Societal impact of AI

Some Key Ideas
ML: Easy to fit training examples, hard to generalize to future examples (never use the TEST SET to choose the model!)
SEARCH: OPEN holds partial solutions; how do we choose which partial sol’n to extend? (CLOSED prevents infinite loops)
PROB: Fill the JOINT prob table (explicitly or implicitly) simply by COUNTING data; then we can answer all kinds of questions

Exam Advice
Mix of ‘straightforward’ concrete problem solving and brief discussion of important AI issues and techniques
–Problem solving graded ‘precisely’
–Discussion graded ‘leniently’
Previous exams make great training and tune sets (hence soln’s not posted for old exams, ie so they can be used as TUNE sets)

Exam Advice (cont.)
Think before you write
Briefly discuss important points; don’t do a ‘core dump’
Some questions are open-ended, so budget your time wisely
Always say SOMETHING

Bayes’ Rule
Recall P(A ∧ B) = P(A | B) × P(B) = P(B | A) × P(A)
Equating the two RHS (right-hand sides), we get
  P(A | B) = P(B | A) × P(A) / P(B)
This is Bayes’ Rule!
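As a quick sanity check of the identity above, here is a minimal Python sketch; the joint-probability values are made-up numbers for illustration only, not anything from the lecture.

```python
# Hypothetical joint distribution over two binary events A and B.
p_joint = {(True, True): 0.20, (True, False): 0.30,
           (False, True): 0.10, (False, False): 0.40}

p_b = sum(p for (a, b), p in p_joint.items() if b)   # P(B) = 0.30
p_a = sum(p for (a, b), p in p_joint.items() if a)   # P(A) = 0.50
p_a_given_b = p_joint[(True, True)] / p_b            # P(A | B)
p_b_given_a = p_joint[(True, True)] / p_a            # P(B | A)

# Bayes' Rule: P(A | B) = P(B | A) * P(A) / P(B)
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)                                   # 0.666...
```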

Common Usage – Diagnosing CAUSE Given EFFECTS

  P(disease | symptoms) = P(symptoms | disease) × P(disease) / P(symptoms)

‘symptoms’ is usually a big AND of several random variables, so a JOINT probability
In HW3, you’ll compute prob(this move leads to a WIN | Nannon board configuration)

Simple Example (only ONE symptom variable)
Assume we have estimated from data
–P(headache | condition=haveFlu)    = 0.90
–P(headache | condition=haveStress) = 0.40
–P(headache | condition=healthy)    = 0.01
–P(haveFlu)    = 0.01   // Dropping ‘condition=’ for clarity
–P(haveStress) = 0.20   // Because it’s midterms time!
–P(healthy)    = 0.79   // We assume the 3 ‘diseases’ are disjoint
Patient comes in with a headache; what is the most likely diagnosis?

Solution
P(flu | headache)     = 0.90 × 0.01 / P(headache)
P(stress | headache)  = 0.40 × 0.20 / P(headache)
P(healthy | headache) = 0.01 × 0.79 / P(headache)
STRESS is most likely (by nearly a factor of 9)
Note: since P(disease | symptoms) = P(symptoms | disease) × P(disease) / P(symptoms), and the denominator P(headache) is the same for every disease, we never need to compute it to find the most likely diagnosis!
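The same computation as a short Python sketch (the numbers come from the slide; the variable names are mine):

```python
# Priors and likelihoods from the slide.
prior = {"flu": 0.01, "stress": 0.20, "healthy": 0.79}
p_headache_given = {"flu": 0.90, "stress": 0.40, "healthy": 0.01}

# Unnormalized posteriors P(headache | d) * P(d); the shared denominator
# P(headache) cannot change which disease scores highest.
score = {d: p_headache_given[d] * prior[d] for d in prior}
print(score)                        # {'flu': 0.009, 'stress': 0.08, 'healthy': 0.0079}
print(max(score, key=score.get))    # 'stress'
```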

Base-Rate Fallacy
Assume Disease A is rare (one in a million, say)
Assume the population is 10B = 10^10, so 10^4 people have Disease A
Assume testForA is 99.99% accurate
You test positive. What is the prob you have Disease A?
Someone (not in cs540) might naively think prob ≈ 0.9999
But consider the people for whom testForA = true:
–about 9,999 of them actually have Disease A (99.99% of the 10^4 who are sick)
–about 10^6 of them do NOT have Disease A (0.01% of the ~10^10 who are healthy)
So Prob(A | testForA) ≈ 0.01
This same issue arises when we have many more neg than pos examples – false positives overwhelm true positives
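The arithmetic behind the slide, as a small sketch (population size, prevalence, and accuracy are the slide’s assumptions):

```python
population = 10**10        # 10 billion people
p_disease  = 1e-6          # Disease A: one in a million
accuracy   = 0.9999        # testForA is 99.99% accurate (for both sick and healthy people)

sick    = population * p_disease       # 10**4 people have Disease A
healthy = population - sick

true_pos  = sick * accuracy            # ~9,999 sick people who test positive
false_pos = healthy * (1 - accuracy)   # ~10**6 healthy people who test positive

print(true_pos / (true_pos + false_pos))   # ~0.0099, i.e. about 1%, not 99.99%
```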

Recall: What if Symptoms are NOT Disjoint?
Assume we have symptoms A, B, and C, and they are not disjoint
Convert to the eight mutually exclusive combinations:
–A’ =  A ∧ ¬B ∧ ¬C     E’ = ¬A ∧  B ∧  C
–B’ = ¬A ∧  B ∧ ¬C     F’ =  A ∧ ¬B ∧  C
–C’ = ¬A ∧ ¬B ∧  C     G’ =  A ∧  B ∧  C
–D’ =  A ∧  B ∧ ¬C     H’ = ¬A ∧ ¬B ∧ ¬C

Dealing with Many Boolean-Valued Symptoms
(D = disease, Si = symptom i)

  P(D | S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)                                          // Bayes’ Rule
    = P(S1 ∧ S2 ∧ S3 ∧ … ∧ Sn | D) × P(D) / P(S1 ∧ S2 ∧ S3 ∧ … ∧ Sn)

If n is small, we could use a full joint table
If not, we could design/learn a Bayes Net
We’ll consider ‘conditional independence’ of the S’s

Assuming Conditional Independence
Repeatedly using P(A ∧ B | C) = P(A | C) × P(B | C), we get

  P(S1 ∧ S2 ∧ S3 ∧ … ∧ Sn | D) = Π P(Si | D)

Assuming D has three possible disjoint values:
  P(D1 | S1 ∧ … ∧ Sn) = [ Π P(Si | D1) ] × P(D1) / P(S1 ∧ … ∧ Sn)
  P(D2 | S1 ∧ … ∧ Sn) = [ Π P(Si | D2) ] × P(D2) / P(S1 ∧ … ∧ Sn)
  P(D3 | S1 ∧ … ∧ Sn) = [ Π P(Si | D3) ] × P(D3) / P(S1 ∧ … ∧ Sn)

We know Σ P(Di | S1 ∧ … ∧ Sn) = 1, so if we want, we could solve for P(S1 ∧ … ∧ Sn) and, hence, need not compute/approximate it!
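A sketch of that three-valued computation, using made-up probability tables (all numbers and names below are illustrative, not from the lecture); it also uses the sum-to-one trick to avoid computing P(S1 ∧ … ∧ Sn) directly:

```python
# Hypothetical NB parameters: a 3-valued disease and 4 boolean symptoms.
prior = {"D1": 0.2, "D2": 0.3, "D3": 0.5}
p_sym_given = {                       # P(Si = true | D) for each disease value
    "D1": [0.9, 0.2, 0.7, 0.1],
    "D2": [0.4, 0.6, 0.3, 0.5],
    "D3": [0.1, 0.1, 0.2, 0.8],
}
observed = [True, False, True, True]  # one patient's S1..S4 values

unnormalized = {}
for d in prior:
    score = prior[d]
    for p_true, s in zip(p_sym_given[d], observed):
        score *= p_true if s else (1.0 - p_true)   # multiply in P(Si | D)
    unnormalized[d] = score

# The posteriors must sum to 1, so the denominator is just the sum of the scores.
z = sum(unnormalized.values())
print({d: v / z for d, v in unnormalized.items()})
```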

Full Joint vs. Naïve Bayes
Completely assuming conditional independence is called Naïve Bayes (NB)
We need to estimate (eg, from data)
–P(Si | Dj)   // For each disease j, the prob that symptom i appears
–P(Dj)        // The prob of each disease j
If we have N binary-valued symptoms and a tertiary-valued disease, the size of the full joint is (3 × 2^N) – 1, while NB needs only (3 × N) + 2
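To make the gap concrete, here is a tiny comparison of the two parameter counts for a few values of N (a sketch; the ‘+ 2’ counts the free prior probabilities of the three-valued disease):

```python
for n in (5, 10, 20):
    full_joint  = 3 * 2**n - 1   # entries needed for the full joint table
    naive_bayes = 3 * n + 2      # P(Si|Dj) for 3 disease values, plus the free priors
    print(n, full_joint, naive_bayes)
# 5 95 17
# 10 3071 32
# 20 3145727 62
```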

Log Odds
Odds(x) ≡ prob(x) / (1 – prob(x))     // Odds > 1 iff prob > 0.5
Recall (now assuming D has only TWO values)
  (1) P( D | S1 ∧ S2 ∧ … ∧ Sn) = [ Π P(Si |  D) ] × P( D) / P(S1 ∧ S2 ∧ … ∧ Sn)
  (2) P(¬D | S1 ∧ S2 ∧ … ∧ Sn) = [ Π P(Si | ¬D) ] × P(¬D) / P(S1 ∧ S2 ∧ … ∧ Sn)
Dividing (1) by (2), the denominators cancel out!
  P(D | S1 ∧ … ∧ Sn) / P(¬D | S1 ∧ … ∧ Sn) = ( [ Π P(Si | D) ] × P(D) ) / ( [ Π P(Si | ¬D) ] × P(¬D) )
Since P(¬D | S1 ∧ … ∧ Sn) = 1 – P(D | S1 ∧ … ∧ Sn),
  odds(D | S1 ∧ … ∧ Sn) = [ Π { P(Si | D) / P(Si | ¬D) } ] × [ P(D) / P(¬D) ]
(Notice we removed one Π via algebra – see the next slide)

The Missing Algebra
The implicit algebra from the previous page:
  (a1 × a2 × a3 × … × an) / (b1 × b2 × b3 × … × bn) = (a1 / b1) × (a2 / b2) × (a3 / b3) × … × (an / bn)

Log Odds (continued)
Odds(x) ≡ prob(x) / (1 – prob(x))
We ended two slides ago with
  odds(D | S1 ∧ S2 ∧ … ∧ Sn) = [ Π { P(Si | D) / P(Si | ¬D) } ] × [ P(D) / P(¬D) ]
Recall log(A × B) = log(A) + log(B), so we have
  log [ odds(D | S1 ∧ S2 ∧ … ∧ Sn) ] = { Σ log [ P(Si | D) / P(Si | ¬D) ] } + log [ P(D) / P(¬D) ]
If log-odds > 0, D is more likely than ¬D, since log(x) > 0 iff x > 1
If log-odds < 0, D is less likely than ¬D, since log(x) < 0 iff x < 1

Log Odds (concluded)
We ended the last slide with
  log [ odds(D | S1 ∧ S2 ∧ … ∧ Sn) ] = { Σ log [ P(Si | D) / P(Si | ¬D) ] } + log [ P(D) / P(¬D) ]
Consider log [ P(D) / P(¬D) ]
–if D is more likely than ¬D, we start the sum with a positive value
Consider each log [ P(Si | D) / P(Si | ¬D) ]
–if Si is more likely given D than given ¬D, we add a positive value to the sum
–if less likely, we add a negative value
–if Si is independent of D, we add zero
At the end, we see whether the sum is POSITIVE (D more likely), ZERO (a tie), or NEGATIVE (¬D more likely)
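A minimal sketch of classifying one example by summing log-odds terms exactly as above (the parameter values here are made up):

```python
import math

# Hypothetical two-class NB parameters.
p_true_given_d    = [0.8, 0.3, 0.6]   # P(Si = true | D)
p_true_given_notd = [0.2, 0.4, 0.1]   # P(Si = true | not D)
p_d = 0.3                             # prior P(D)

observed = [True, False, True]        # one example's symptom values

log_odds = math.log(p_d / (1.0 - p_d))        # start with the prior term
for pd, pn, s in zip(p_true_given_d, p_true_given_notd, observed):
    num = pd if s else (1.0 - pd)             # P(Si = observed value | D)
    den = pn if s else (1.0 - pn)             # P(Si = observed value | not D)
    log_odds += math.log(num / den)           # evidence term for Si

print(log_odds)                               # positive here, so D is more likely
```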

Viewing NB as a PERCEPTRON, the Simplest Neural Network
The network has one input node per literal (S1, ¬S1, …, Sn, ¬Sn) plus a constant ‘1’ input, and a single output that equals the log-odds
–the weight on node Si  is log [ P(Si | D) / P(Si | ¬D) ]
–the weight on node ¬Si is log [ P(¬Si | D) / P(¬Si | ¬D) ]
–the weight on the ‘1’ input is log [ P(D) / P(¬D) ]
If Si = true,  then NODE Si = 1 and NODE ¬Si = 0
If Si = false, then NODE Si = 0 and NODE ¬Si = 1
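The same idea expressed as weights, a sketch (again with made-up parameters): the bias is the prior term, each literal node gets its log-ratio weight, and the output is the same log-odds sum as in the previous sketch.

```python
import math

p_true_given_d    = [0.8, 0.3, 0.6]   # P(Si = true | D)
p_true_given_notd = [0.2, 0.4, 0.1]   # P(Si = true | not D)
p_d = 0.3

bias = math.log(p_d / (1.0 - p_d))    # weight on the constant '1' input
# weights[i][0] is the weight on node (not Si); weights[i][1] is the weight on node Si.
weights = [(math.log((1 - pd) / (1 - pn)), math.log(pd / pn))
           for pd, pn in zip(p_true_given_d, p_true_given_notd)]

def nb_perceptron(observed):
    """Log-odds output: bias plus, for each Si, the weight of whichever node is 'on'."""
    return bias + sum(w[int(s)] for w, s in zip(weights, observed))

print(nb_perceptron([True, False, True]))   # matches the sum from the previous sketch
```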

Naïve Bayes Example (for simplicity, ignore m-estimates here)

Dataset
  S1  S2  S3  |  D
   T   F   T  |  T
   F   T   T  |  F
   F   T   T  |  T
   T   T   F  |  T
   T   F   T  |  F
   F   T   T  |  T
   T   F   F  |  F

Estimate from the data:
  P(D=true)  =                P(D=false) =
  P(S1=true | D=true) =       P(S1=true | D=false) =
  P(S2=true | D=true) =       P(S2=true | D=false) =
  P(S3=true | D=true) =       P(S3=true | D=false) =

‘Law of the Excluded Middle’: P(S3=true | D=false) + P(S3=false | D=false) = 1, so there is no need for the P(Si=false | D=?) estimates

Naïve Bayes Example (for simplicity, ignore m-estimates)

Dataset (same as the previous slide)
  S1  S2  S3  |  D
   T   F   T  |  T
   F   T   T  |  F
   F   T   T  |  T
   T   T   F  |  T
   T   F   T  |  F
   F   T   T  |  T
   T   F   F  |  F

  P(D=true)  = 4 / 7             P(D=false) = 3 / 7
  P(S1=true | D=true) = 2 / 4    P(S1=true | D=false) = 2 / 3
  P(S2=true | D=true) = 3 / 4    P(S2=true | D=false) = 1 / 3
  P(S3=true | D=true) = 3 / 4    P(S3=true | D=false) = 2 / 3
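A counting sketch that reproduces the estimates above from the seven training examples (the tuple layout is mine):

```python
# The seven training examples from the slide, as (S1, S2, S3, D).
data = [(True,  False, True,  True),
        (False, True,  True,  False),
        (False, True,  True,  True),
        (True,  True,  False, True),
        (True,  False, True,  False),
        (False, True,  True,  True),
        (True,  False, False, False)]

pos = [r for r in data if r[3]]        # D = true rows
neg = [r for r in data if not r[3]]    # D = false rows

print("P(D=true) =", len(pos), "/", len(data))      # 4 / 7
for i in range(3):
    print(f"P(S{i+1}=true | D=true)  =", sum(r[i] for r in pos), "/", len(pos),
          f"   P(S{i+1}=true | D=false) =", sum(r[i] for r in neg), "/", len(neg))
```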

Processing a ‘Test’ Example
What is Prob(D = true | S1 = true ∧ S2 = true ∧ S3 = true)?   (Here, vars = true unless a NOT sign is present)

  Odds(D | S1 ∧ S2 ∧ S3)                 // Recall Odds(x) ≡ Prob(x) / (1 – Prob(x))
    = [ P(S1 | D) × P(S2 | D) × P(S3 | D) × P(D) ] / [ P(S1 | ¬D) × P(S2 | ¬D) × P(S3 | ¬D) × P(¬D) ]
    = (3 / 4) × (9 / 4) × (9 / 8) × (4 / 3)
    = 81 / 32 = 2.53

Use Prob(x) = Odds(x) / (1 + Odds(x)) to get prob ≈ 0.72
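The same odds computation in code form, reusing the counts from the previous slide (a sketch, not HW code):

```python
# Odds-form inference for the test example S1 = S2 = S3 = true.
p_d, p_notd = 4/7, 3/7
p_s_given_d    = [2/4, 3/4, 3/4]   # P(Si = true | D = true)
p_s_given_notd = [2/3, 1/3, 2/3]   # P(Si = true | D = false)

odds = p_d / p_notd                # prior odds, 4/3
for pd, pn in zip(p_s_given_d, p_s_given_notd):
    odds *= pd / pn                # each symptom contributes one likelihood ratio

print(odds)                        # 2.53125  (= 81/32)
print(odds / (1 + odds))           # ~0.717
```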

NB as a BN
  P(D | S1 ∧ S2 ∧ … ∧ Sn) = [ Π P(Si | D) ] × P(D) / P(S1 ∧ S2 ∧ … ∧ Sn)
If we use the ‘odds’ method from the previous slides, we only need to compute the numerator
As a Bayes net, Naïve Bayes is just D with an arrow to each of S1, S2, S3, …, Sn
–instead of 1 CPT of size 2^(n+1) (the full joint), we have (N+1) small CPTs

Recap: Naïve Bayes Parameter Learning
Use training data to estimate (for Naïve Bayes), in one pass through the data:
–P(fi = vj | category = POS) for each i, j
–P(fi = vj | category = NEG) for each i, j
–P(category = POS)
–P(category = NEG)
// Note: some of the above are unnecessary, since some combinations of probs sum to 1
Apply Bayes’ rule to find odds(category = POS | test example’s features)
Incremental/Online Learning is Easy: simply increment counters (true for BNs in general, if there is no ‘structure learning’)
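A minimal sketch of such counter-based online learning for boolean features; the class name, the simple Laplace-style smoothing (standing in for the m-estimates mentioned elsewhere in the course), and the toy data are all mine:

```python
from collections import defaultdict

class OnlineNB:
    """Naive Bayes that stores only counters, so learning is just incrementing them."""
    def __init__(self, n_features):
        self.n_features = n_features
        self.class_count = defaultdict(int)                        # examples per class
        self.true_count = defaultdict(lambda: [0] * n_features)    # per-class counts of fi = true

    def learn(self, features, label):          # label is "POS" or "NEG"
        self.class_count[label] += 1
        for i, f in enumerate(features):
            if f:
                self.true_count[label][i] += 1

    def odds_pos(self, features):
        def p(label, i, value):
            # Laplace-smoothed estimate, to avoid zero counts on tiny datasets.
            pt = (self.true_count[label][i] + 1) / (self.class_count[label] + 2)
            return pt if value else 1 - pt
        odds = self.class_count["POS"] / self.class_count["NEG"]
        for i, f in enumerate(features):
            odds *= p("POS", i, f) / p("NEG", i, f)
        return odds

nb = OnlineNB(3)
nb.learn([True, False, True], "POS")    # each new example just bumps counters
nb.learn([False, True, True], "NEG")
nb.learn([True, True, False], "POS")
print(nb.odds_pos([True, False, True]))
```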

Is NB Naïve?
Surprisingly, the assumption of independence, while most likely violated, is not too harmful!
Naïve Bayes works quite well
–Very successful in text categorization (‘bag-of-words’ rep)
–Used in printer diagnosis in Windows, spam filtering, etc
Probs are not accurate (‘uncalibrated’) due to double counting, but good at deciding whether prob > 0.5 or prob < 0.5
Resurgence of research activity in Naïve Bayes
–Many ‘dead’ ML algos resuscitated by the availability of large datasets (KISS Principle)

A Major Weakness of BNs
If there are many ‘hidden’ random vars (N binary vars, say), then the marginalization formula leads to many calls to a BN (2^N in our example; for N = 20, 2^N = 1,048,576)
Using uniform-random sampling to estimate the result is too inaccurate, since most of the probability might be concentrated in only a few ‘complete world states’
Hence, much research (beyond cs540’s scope) on scaling up inference in BNs and other graphical models, eg via more sophisticated sampling (eg, MCMC)

Bayesian Networks Wrapup
BNs are one Type of ‘Graphical Model’
Lots of Applications (though the current focus is on ‘deep [neural] networks’)
Bayes’ Rule is an Appealing Way to Go from EFFECTS to CAUSES (ie, diagnosis)
Full Joint Prob Tables and Naïve Bayes are Interesting ‘Limit Cases’ of BNs
With ‘Big Data,’ Counting Goes a Long Way!