Today's Topics
CS 540 - Fall 2015 (© Jude Shavlik), Lecture 3, Week 2 (9/15/15)
– HW0 due 11:55pm tonight, and no later than next Tuesday
– HW1 out on the class home page; discussion page in Moodle
– Please do not use company-specific discussion forums for cs540; use Moodle (or request we add Piazza)
– Everyone gets 10 free late days this term (but at most 5 per HW)
– Learning from Labeled Examples
– Supervised Learning and Venn Diagrams
– Simple ML Algorithm: k-Nearest Neighbors (read Section 18.8.1 of the textbook and the Wikipedia article(s) linked to the class home page)
– Tuning Parameters
– Some "ML Commandments"

Where We Are
– Have selected a 'concept' to learn
– Have chosen features to represent examples
– Have created at least 100 labeled examples
– Next: learn a 'model' that can predict the output for NEW examples

Learning from Labeled Examples
[Figure: a set of positive examples and a set of negative examples, plus a new figure: what is the category of this example?]
Concept: solid red circle in a (regular?) polygon
What about:
– figures on the left side of the page?
– figures drawn before 5pm 2/2/89?

Concept Learning
Learning systems differ in how they represent concepts. Starting from the same training examples:
– Backpropagation → Neural Net
– ID3, C4.5, CART → Decision Tree
– AQ, FOIL, Aleph → Rules
– SVMs → Weighted Sum (e.g., If 5x1 + 9x2 – 3x3 > 12 Then +)
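
As a minimal sketch (our own code, not from the lecture) of how the weighted-sum representation above classifies an example; the weights and threshold are the illustrative values from the slide:

```python
# A linear threshold unit: predicts '+' when the weighted sum of the
# features exceeds the threshold (here, 5x1 + 9x2 - 3x3 > 12).
def weighted_sum_classify(x, weights=(5, 9, -3), threshold=12):
    score = sum(w * xi for w, xi in zip(weights, x))
    return '+' if score > threshold else '-'

print(weighted_sum_classify((2, 1, 0)))  # 5*2 + 9*1 - 3*0 = 19 > 12, so '+'
```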

Recall: Feature Space
If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space.
[Figure: a 3-D feature space with axes Size, Color, and Weight; a query point "Big, Gray, 2500" is marked '?']
A "concept" is then a (possibly disjoint) volume in this space.

Supervised Learning and Venn Diagrams
[Figure: a Venn diagram with regions A and B, beside the corresponding feature space with '+' and '–' labeled points]
– Concept = A or B (i.e., a disjunctive concept)
– Examples = labeled points in feature space
– Concept = a label for regions of feature space

Brief Introduction to Logic
Conjunctive concept: Color(?obj1, red) ∧ Size(?obj1, large)
Disjunctive concept: Color(?obj2, blue) ∨ Size(?obj2, small)
More formally, a "concept" is of the form
  ∀x ∀y ∀z F(x, y, z) → Member(x, Class1)
where the variables range over instances.

Logical Symbols
∧  and
∨  or
¬  not
→  implies
↔  equivalent
∀  for all
∃  there exists

Induction vs. Deduction
Deduction
– compute what logically follows
– if we know P(Mary) is true and ∀x P(x) → Q(x), we can deduce Q(Mary)
Induction
– if we observe P(1), P(2), …, P(100), we can induce ∀x P(x)
– might be wrong
Which does supervised ML do?

Nearest-Neighbor Algorithms
(aka exemplar models, instance-based learning, case-based learning) – Section 18.8.1 of the textbook
Learning ≈ memorize the training examples
Problem solving = find the most similar example in memory; output its category
[Figure: a Venn diagram of '+' and '–' points with a '?' query, and a "Voronoi diagram" in which each cell contains all points closest to the labeled example at its center]

Nearest Neighbors: Basic Algorithm
Find the K nearest neighbors to the test-set example (or find all examples within radius R), then combine their 'votes':
– most common category, or
– average value (for a real-valued prediction)
– can also weight votes by distance
– lots of variations on the basic theme
A minimal sketch of the basic algorithm appears below.
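
The sketch below is our own code, not the lecture's, assuming numeric feature vectors, Euclidean distance, and an unweighted majority vote; the name knn_classify is ours:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: a feature vector.
    Returns the majority label among the k nearest training examples."""
    # Sort the training examples by Euclidean distance to the query.
    by_dist = sorted(train, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), '+'), ((1.2, 0.8), '+'),
         ((4.0, 4.0), '-'), ((4.2, 3.9), '-')]
print(knn_classify(train, (1.1, 1.0), k=3))  # two '+' vs one '-' nearby, so '+'
```

Weighting votes by distance, or averaging neighbor outputs for regression, are small variations on the same loop.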

Simple Example: 1-NN
(1-NN ≡ one nearest neighbor)
Training set:
1. a=0, b=0, c=1  +
2. a=0, b=0, c=0  –
3. a=1, b=1, c=1  –
Test example: a=0, b=1, c=0  ?
"Hamming distance" (# of differing bits) to each training example:
Ex 1 = 2, Ex 2 = 1, Ex 3 = 2
The nearest neighbor is Ex 2, which is labeled –, so output –.
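
A quick sketch reproducing this slide's computation (the bit-vector encoding and helper name are ours):

```python
def hamming(u, v):
    """Number of positions at which two bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))

train = [((0, 0, 1), '+'), ((0, 0, 0), '-'), ((1, 1, 1), '-')]  # Ex 1-3
test = (0, 1, 0)

print([hamming(x, test) for x, _ in train])  # [2, 1, 2], as on the slide
nearest = min(train, key=lambda ex: hamming(ex[0], test))
print(nearest[1])  # '-', the 1-NN prediction
```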

Some Common Jargon
Classification: learning a discrete-valued function
Regression: learning a real-valued function
(The distinction is about outputs; inputs can be real-valued in both cases.)
k-NN is easily extended to regression tasks (and to multi-category classification) – HOW? (One possible answer is sketched below.)
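
One possible answer (our sketch, not the lecture's): majority vote already handles any number of categories, and for regression we can average the neighbors' outputs instead of voting:

```python
import math

def knn_regress(train, query, k=3):
    """train: list of (feature_vector, real_value) pairs. Predicts the
    mean output of the k nearest training examples (Euclidean distance)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return sum(y for _, y in nearest) / k

train = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((10.0,), 20.0)]
print(knn_regress(train, (2.5,), k=3))  # (4.0 + 6.0 + 2.0) / 3 = 4.0
```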

Sample Experimental Results
(see the UC-Irvine archive for more)

Testset correctness:
Testbed            1-NN   D-Trees   Neural Nets
Wisconsin Cancer    98%     95%        96%
Heart Disease       78%     76%         ?
Tumor               37%     38%         ?
Appendicitis        83%     85%        86%

A simple algorithm works quite well!
Why so low?

Doing Well by Doing Poorly
You say: "Bad news, my testset accuracy is only 1%" (on a two-category task).
I say: "That is great news!" Why?
Because if you NEGATE your predictions, you'll have 99% accuracy!

Doing Poorly by Doing Well
You say: "Good news, my testset accuracy is 95%" (on a two-category task).
I say: "That is bad news!" Why might that be?
Because (let's assume) the most common output value occurs 99% of the time, so always predicting that value would already score 99%!

Parameter Tuning (First Visit)
Algorithm: collect the K nearest neighbors and combine their outputs.
What should K be?
– It is problem (i.e., testbed) dependent
– Can use tuning sets to select a good setting for K (see the sketch below)
[Figure: tuning-set error rate plotted against K = 1, 2, 3, 4, 5]
Shouldn't really "connect the dots" (Why?)
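
A minimal sketch of picking K with a tuning set, reusing the hypothetical knn_classify from the earlier sketch; the candidate values and names are our assumptions:

```python
def choose_k(train, tune, candidates=(1, 2, 3, 4, 5)):
    """Return the K with the lowest error rate on the tuning set.
    tune: held-aside labeled examples, NOT used as neighbors."""
    def error_rate(k):
        wrong = sum(knn_classify(train, x, k) != y for x, y in tune)
        return wrong / len(tune)
    return min(candidates, key=error_rate)

# best_k = choose_k(train_examples, tuning_examples)
# Final accuracy is then measured ONCE, on the untouched test set.
```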

Why Not Use the TEST Set to Select Good Parameters?
A 2002 paper in Nature (a major, major journal) needed to be corrected due to "training on the testing set."
– Original report: 95% accuracy (5% error rate)
– Corrected report (which is still buggy): 73% accuracy (27% error rate)
The error rate increased by over 400%!
This is, unfortunately, a very common error.

Some ML "Commandments"
– Let the data decide: 'internalize' (i.e., tune) parameters
– Scaling up by dummying down: don't ignore simple algorithms, such as always guessing the most common category in the training set (sketched below) or using the best SINGLE feature; clever ideas do not imply better results
– Generalize, don't memorize: accuracy on held-aside data is our focus
– Never train on the test examples! (Commonly violated, alas)
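
A minimal sketch of the simplest baseline named above (our own code, with assumed names): always guess the training set's most common category; any learned model should have to beat this:

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a classifier that always predicts the most common
    training-set label, ignoring its input entirely."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda example: most_common

predict = majority_baseline(['-', '-', '-', '+'])
print(predict((0, 1, 0)))  # '-'; right 75% of the time if test data looks like training
```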