CS 540 - Fall 2016 (Shavlik©), Lecture 5, 9/22/16

Today’s Topics
- A Bit More on ML Training Examples
- Experimental Methodology for ML
  - How do we measure how well we learned?
- Simple ML Algo: k-Nearest Neighbors
- Tuning Parameters
- Some “ML Commandments”

A Richer Sense of Example: Eg, the Internet Movie Database (IMDB)
- IMDB richly represents data
- Note: each movie is potentially represented by a graph of a different size
[Figure from David Jensen of UMass]

Learning with Data in Multiple Tables (Relational ML) (not covered in cs540)
[Figure: linked tables: Patients, Previous Mammograms, Previous Blood Tests, Prev. Rx]
- Key challenge: different amount of data for each patient

Getting Labeled Examples
- The ‘Achilles Heel’ of ML
- Often ‘experts’ label
  - eg, ‘books I like’ or ‘patients that should get drug X’
- ‘Time will tell’ concepts
  - wait a month and see if medical treatment worked, or whether a stock appreciated over a year
- Use of Amazon Mechanical Turk (‘the crowd’)
- Need representative examples, especially good ‘negative’ (counter) examples

If it is Free, You are the Product
- Google is using authentication (as a human) as a way to get labeled data for their ML algorithms!

IID and Other Assumptions
- We are assuming examples are IID: independent and identically distributed
- We are ignoring temporal dependencies (covered in time-series learning)
- We assume the ML algo has no say in which examples it gets (covered in active learning)
- Data arrives in any order

Train/Tune/Test Sets: A Pictorial Overview
[Figure: a collection of classified examples (here each column is an example) is divided into training examples and testing examples; the training examples are further divided into a train’ set and a tune set. The ML algo generates solutions from the train’ set, the tune set is used to select the best one (the classifier), and the testing examples estimate the expected accuracy on future examples.]
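
The three-way split on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `split_examples`, the 20% fractions, and the fixed random seed are assumptions, not something the lecture specifies.

```python
import random

def split_examples(examples, test_frac=0.2, tune_frac=0.2, seed=0):
    """Shuffle, then carve out test, tune, and train' sets.

    The fractions and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    test = shuffled[:n_test]            # touched only once, for the final estimate
    training = shuffled[n_test:]        # everything the learner may look at

    n_tune = int(len(training) * tune_frac)
    tune = training[:n_tune]            # used to select among candidate solutions
    train_prime = training[n_tune:]     # the train' set the ML algo learns from
    return train_prime, tune, test

train_prime, tune, test = split_examples(range(100))
print(len(train_prime), len(tune), len(test))
```

The key design point is that the tune set is taken out of the training examples, never out of the testing examples.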

Why Not Learn After Each Test Example? (as opposed to i examples in the test set)
- In ‘production mode,’ this would make sense (assuming one later received the correct label)
- In ‘experiments,’ we wish to estimate the probability we’ll label the next example correctly
  - need several samples to accurately estimate

N-fold Cross Validation
- Can be used to
  1) estimate future accuracy (via test sets)
  2) choose parameter settings (via tuning sets)
- Method
  1) Randomly permute examples
  2) Divide into N bins
  3) Train on N - 1 bins, measure accuracy on the bin ‘left out’
  4) Compute average accuracy on held-out sets
[Figure: the examples divided into Fold 1 through Fold 5]
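
The four method steps can be sketched as follows. This is a minimal sketch under assumptions: `train_fn` and `accuracy_fn` are hypothetical callbacks standing in for whatever learner is being evaluated, and the majority-class learner and toy data exist only to make the example runnable.

```python
import random
from collections import Counter

def n_fold_cross_validation(examples, n_folds, train_fn, accuracy_fn, seed=0):
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)                                    # 1) randomly permute
    bins = [shuffled[i::n_folds] for i in range(n_folds)]    # 2) divide into N bins
    accuracies = []
    for i, held_out in enumerate(bins):                      # 3) train on N - 1 bins
        train = [ex for j, b in enumerate(bins) if j != i for ex in b]
        model = train_fn(train)
        accuracies.append(accuracy_fn(model, held_out))
    return sum(accuracies) / n_folds                         # 4) average accuracy

# Hypothetical stand-in learner: always predict the most common training label.
def train_majority(train):
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(model, examples):
    return sum(label == model for _, label in examples) / len(examples)

# Toy data: 15 '+' examples and 5 '-' examples.
data = [((i,), '+' if i % 4 else '-') for i in range(20)]
cv_accuracy = n_fold_cross_validation(data, 5, train_majority, accuracy)
print(cv_accuracy)
```

Every example is held out exactly once, so each contributes to the accuracy estimate without ever being trained on in the fold that tests it.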

Dealing with Data that Comes from Larger Objects
- Assume examples are sentences contained in books
  - Or web pages from computer science depts
  - Or short DNA sequences from genes
- (Usually) need to cross validate on the LARGER objects
  - Eg, first partition books into N folds, then collect sentences from a fold’s books
[Figure: sentences in books, with whole books assigned to Fold1 and Fold2]
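
One way to fold on the larger objects is to assign each book to a fold first and only then distribute its sentences. The sketch below assumes each example can report which larger object it came from; `grouped_folds`, `group_of`, and the toy book names are hypothetical.

```python
import random

def grouped_folds(examples, group_of, n_folds, seed=0):
    """Partition the LARGER objects (groups) into folds, then collect examples."""
    groups = sorted({group_of(ex) for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Each group, and therefore every example in it, lands in exactly one fold.
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    folds = [[] for _ in range(n_folds)]
    for ex in examples:
        folds[fold_of_group[group_of(ex)]].append(ex)
    return folds

# Toy data: 30 sentences drawn from 6 books.
sentences = [("book%d" % (i % 6), "sentence %d" % i) for i in range(30)]
folds = grouped_folds(sentences, group_of=lambda ex: ex[0], n_folds=3)
for fold in folds:
    print(sorted({book for book, _ in fold}), len(fold))
```

With this scheme no book contributes sentences to both the training folds and the held-out fold, which is the point of the slide.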

Doing Well by Doing Poorly
- You say: “Bad news, my testset accuracy is only 1%” (on a two-category task)
- I say: “That is great news!”
- Why? If you NEGATE your predictions, you’ll have 99% accuracy!
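
A small sketch of why negation works on a two-category task: every wrong prediction becomes right and vice versa, so the accuracies sum to 1. The 100 toy labels are an assumption made for illustration.

```python
def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Toy two-category task with 100 test examples (illustrative data).
labels = [True] * 50 + [False] * 50
predictions = [not y for y in labels]   # wrong on every example...
predictions[0] = labels[0]              # ...except one, giving 1% accuracy

negated = [not p for p in predictions]  # flip every prediction
print(accuracy(predictions, labels), accuracy(negated, labels))
```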

Doing Poorly by Doing Well
- You say: “Good news, my testset accuracy is 95%” (on a two-category task)
- I say: “That is bad news!”
- Why might that be? Because (let’s assume) the most common output value occurs 99% of the time!
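
The baseline implied here, always guessing the most common category seen in training, is easy to compute before celebrating any accuracy number. A minimal sketch, with a hypothetical function name and toy labels chosen to match the 99% assumption:

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always guessing the most common training category."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return sum(y == most_common for y in test_labels) / len(test_labels)

# Toy labels: one category occurs 99% of the time (the slide's assumption).
train_labels = ['-'] * 99 + ['+']
test_labels = ['-'] * 99 + ['+']

baseline = majority_baseline_accuracy(train_labels, test_labels)
reported = 0.95
print(baseline, reported < baseline)
```

A learned model reporting 95% here does worse than a rule that ignores the features entirely.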

Nearest-Neighbor Algorithms (aka exemplar models, instance-based learning, case-based learning) (Section 18.1.1 of textbook)
- Learning ≈ memorize training examples
- Problem solving = find most similar example in memory; output its category
- “Voronoi Diagrams” (each cell holds all points closest to the labeled example in its center)
[Figure: Voronoi diagram over + and - training examples, with a query point “?”]

Nearest Neighbors: Basic Algorithm
- Find the K nearest neighbors to the test-set example
  - Or find all examples within radius R
- Combine their ‘votes’
  - Most common category
  - Average value (real-valued prediction)
  - Can also weight votes by distance
- Lots of variations on the basic theme
[Figure: + and - examples surrounding a query point “?”]
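
The basic algorithm can be sketched as below. This is an illustrative sketch, not the course's implementation: Euclidean distance, the function names, and the toy data are assumptions, and ties in the vote are broken arbitrarily.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train, query, k, distance=euclidean):
    """train is a list of (features, output) pairs; returns the k closest pairs."""
    return sorted(train, key=lambda ex: distance(ex[0], query))[:k]

def knn_predict(train, query, k, distance=euclidean):
    """Categorical output: the most common category among the k neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k, distance))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k, distance=euclidean):
    """Real-valued output: the average value of the k neighbors."""
    neighbors = k_nearest(train, query, k, distance)
    return sum(value for _, value in neighbors) / len(neighbors)

train = [((0, 0), '-'), ((0, 1), '-'), ((5, 5), '+'), ((5, 6), '+'), ((6, 5), '+')]
print(knn_predict(train, (4, 4), k=3))

reg_train = [((0,), 1.0), ((1,), 3.0), ((10,), 100.0)]
print(knn_regress(reg_train, (0.4,), k=2))
```

Weighting votes by distance, or collecting all examples within radius R, would change only how the neighbors are selected or combined, not the overall shape of the code.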

Simple Example: 1-NN (1-NN ≡ one nearest neighbor)
Training Set
  Ex 1: a=0, b=0, c=1   +
  Ex 2: a=0, b=0, c=0   -
  Ex 3: a=1, b=1, c=1   -
Test Example
  a=0, b=1, c=0   ?
“Hamming Distance” (# of different bits)
  Ex 1 = 2
  Ex 2 = 1
  Ex 3 = 2
So output -
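
The worked example on this slide can be checked directly in code. The data below is copied from the slide; only the function and variable names are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# (a, b, c) feature vectors with their categories, as on the slide.
training_set = [((0, 0, 1), '+'), ((0, 0, 0), '-'), ((1, 1, 1), '-')]
test_example = (0, 1, 0)

distances = [hamming(features, test_example) for features, _ in training_set]
_, prediction = min(training_set, key=lambda ex: hamming(ex[0], test_example))

print(distances)    # [2, 1, 2]
print(prediction)   # -
```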

Sample Experimental Results (see UC-Irvine archive for more)

Testset Correctness
Testbed            1-NN   D-Trees   Neural Nets
Wisconsin Cancer   98%    95%       96%
Heart Disease      78%    76%       ?
Tumor              37%    38%
Appendicitis       83%    85%       86%

Why so low?
Simple algorithm works quite well!

Parameter Tuning (First Visit)
- Algo: Collect K nearest neighbors, combine their outputs
- What should K be?
  - It is problem (ie, testbed) dependent
  - Can use tuning sets to select a good setting for K
[Figure: plot of tuning-set error rate versus K, for K = 1 to 5. Shouldn’t really “connect the dots” (Why?)]
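
Selecting K with a tuning set can be sketched as follows. This is a minimal sketch under assumptions: the one-dimensional toy data, the candidate values of K, and the function names are all made up for illustration, and ties in error rate go to the first (here, smallest) candidate K.

```python
from collections import Counter

def knn_predict(train, query, k):
    """k-NN on 1-D features; train is a list of (feature, label) pairs."""
    neighbors = sorted(train, key=lambda ex: abs(ex[0] - query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def error_rate(train, examples, k):
    wrong = sum(knn_predict(train, x, k) != y for x, y in examples)
    return wrong / len(examples)

def choose_k(train_prime, tune, candidate_ks):
    """Measure tuning-set error for each K; keep the K with the lowest error."""
    errors = {k: error_rate(train_prime, tune, k) for k in candidate_ks}
    best_k = min(candidate_ks, key=lambda k: errors[k])
    return best_k, errors

# Toy train' set with one noisy '+' at x=3, and a separate tuning set.
train_prime = [(1, '-'), (2, '-'), (3, '+'), (4, '-'),
               (6, '+'), (7, '+'), (8, '+'), (9, '+')]
tune = [(2.9, '-'), (3.2, '-'), (7.5, '+'), (1.5, '-'), (8.5, '+')]

best_k, errors = choose_k(train_prime, tune, candidate_ks=[1, 3, 5])
print(best_k, errors)
```

The test set plays no part in this choice; it is reserved for estimating accuracy after K has been fixed.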

Why Not Use the TEST Set to Select Good Parameters?
- A 2002 paper in Nature (a major, major journal) needed to be corrected due to “training on the testing set”
  - Original report: 95% accuracy (5% error rate)
  - Corrected report (which still is buggy): 73% accuracy (27% error rate)
- Error rate increased over 400%!!!
- This is, unfortunately, a very common error
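
The “over 400%” figure is the relative change in error rate, which a quick calculation confirms. The variable names are illustrative; the two error rates are the ones quoted on the slide.

```python
original_error = 5     # percent, from the original report
corrected_error = 27   # percent, from the corrected report

# Relative increase: how much the error grew, as a fraction of the original error.
relative_increase = (corrected_error - original_error) / original_error
print(f"Error rate rose by {relative_increase:.0%} of its original value")
```

Going from 5% to 27% is an increase of 22 percentage points, or about 440% of the original error rate.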

Some ML “Commandments”
- Let the data decide
  - ‘Internalize’ (ie, tune) parameters
- Scaling up by dummying down
  - Don’t ignore simple algo’s, such as
    - Always guessing the most common category in the training set
    - Finding the best SINGLE feature
  - Clever ideas do not imply better results
- Generalize, don’t memorize
  - Accuracy on held-aside data is our focus
- Never train on the test examples!
  - Commonly violated, alas