CS 540 - Fall 2016 (Shavlik©), Lecture 5


Today's Topics
- A Bit More on ML Training Examples
- Experimental Methodology for ML: how do we measure how well we learned?
- Simple ML Algorithm: k-Nearest Neighbors
- Tuning Parameters
- Some "ML Commandments"

A Richer Sense of Example: Eg, the Internet Movie Database (IMDB)
- IMDB richly represents data; note that each movie is potentially represented by a graph of a different size
[Figure from David Jensen of UMass]

Learning with Data in Multiple Tables (Relational ML) – not covered in cs540
- Example tables: Patients, Previous Mammograms, Previous Blood Tests, Previous Rx (prescriptions)
- Key challenge: a different amount of data for each patient

Getting Labeled Examples – the 'Achilles Heel' of ML
- Often 'experts' label examples, eg 'books I like' or 'patients that should get drug X'
- 'Time will tell' concepts: wait a month and see if a medical treatment worked, or whether a stock appreciated over a year
- Use of Amazon Mechanical Turk ('the crowd')
- Need representative examples, especially good 'negative' (counter) examples

If it is Free, You are the Product
- Google is using authentication (proving you are a human) as a way to get labeled data for its ML algorithms!

IID and Other Assumptions
- We are assuming examples are IID: independently and identically distributed
- We are ignoring temporal dependencies (covered in time-series learning)
- We assume the ML algorithm has no say in which examples it gets (covered in active learning)
- Data can arrive in any order

Train/Tune/Test Sets: A Pictorial Overview
[Diagram: the full collection of classified examples (each column is an example) is split into training examples and testing examples; the training examples are further divided into a train' set and a tune set. The ML algorithm generates candidate solutions from the train' set, the tune set is used to select the best one, and the chosen classifier's accuracy on the testing examples estimates its expected accuracy on future examples.]
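As a concrete illustration of the split in the diagram, here is a minimal Python sketch (not from the slides) that partitions a labeled dataset into train', tune, and test sets; the 20%/20% fractions are arbitrary assumptions.

```python
import random

def train_tune_test_split(examples, tune_frac=0.2, test_frac=0.2, seed=0):
    """Randomly split labeled examples into train', tune, and test sets."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)            # shuffle before splitting
    n_test = int(len(examples) * test_frac)
    n_tune = int(len(examples) * tune_frac)
    test = examples[:n_test]                         # held aside for final evaluation
    tune = examples[n_test:n_test + n_tune]          # used to select the best solution
    train = examples[n_test + n_tune:]               # used to generate candidate solutions
    return train, tune, test
```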

Why Not Learn After Each Test Example? (as opposed to keeping the learner fixed for all the examples in the test set)
- In 'production mode,' this would make sense (assuming one later received the correct label)
- In 'experiments,' we wish to estimate the probability that we'll label the next example correctly, and we need several samples to estimate that accurately

N-fold Cross Validation
Can be used to
1) estimate future accuracy (via test sets)
2) choose parameter settings (via tuning sets)
Method
1) Randomly permute the examples
2) Divide them into N bins (folds)
3) Train on N - 1 bins; measure accuracy on the bin 'left out'
4) Compute the average accuracy over the held-out bins
[Diagram: the examples divided into Fold 1 through Fold 5]
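The method above translates almost line-for-line into code. Below is a minimal Python sketch (not from the slides); `train_fn` and `accuracy_fn` are hypothetical callbacks standing in for whatever learner and scoring function you plug in.

```python
import random

def n_fold_cross_validation(examples, n_folds, train_fn, accuracy_fn, seed=0):
    """Estimate accuracy by N-fold cross validation.

    train_fn(train_examples) -> model
    accuracy_fn(model, held_out_examples) -> accuracy in [0, 1]
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)                    # 1) randomly permute
    folds = [examples[i::n_folds] for i in range(n_folds)]   # 2) divide into N bins
    accuracies = []
    for i in range(n_folds):
        held_out = folds[i]                                  # the bin 'left out'
        train = [ex for j in range(n_folds) if j != i for ex in folds[j]]
        model = train_fn(train)                              # 3) train on N-1 bins ...
        accuracies.append(accuracy_fn(model, held_out))      #    ... measure on the rest
    return sum(accuracies) / n_folds                         # 4) average held-out accuracy
```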

Dealing with Data that Comes from Larger Objects
- Assume the examples are sentences contained in books, or web pages from computer science departments, or short DNA sequences from genes
- (Usually) need to cross validate on the LARGER objects
- Eg, first partition the books into N folds, then collect the sentences from each fold's books (a sketch of this grouping appears below)
[Diagram: sentences grouped by book, with whole books assigned to Fold 1, Fold 2, ...]
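Here is a minimal Python sketch (not from the slides) of that idea: whole groups such as books are assigned to folds, and every example lands in its group's fold. The `group_of` callback is a hypothetical placeholder for however you identify an example's larger object.

```python
import random

def group_folds(examples, group_of, n_folds, seed=0):
    """Cross-validation folds that keep each larger object (eg, a book) intact."""
    groups = sorted({group_of(ex) for ex in examples}, key=str)
    random.Random(seed).shuffle(groups)                 # randomly order the books
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    folds = [[] for _ in range(n_folds)]
    for ex in examples:                                 # every sentence follows its book
        folds[fold_of_group[group_of(ex)]].append(ex)
    return folds
```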

Doing Well by Doing Poorly
- You say: "Bad news, my testset accuracy is only 1%" (on a two-category task)
- I say: "That is great news!"
- Why? If you NEGATE your predictions, you'll have 99% accuracy!

Doing Poorly by Doing Well
- You say: "Good news, my testset accuracy is 95%" (on a two-category task)
- I say: "That is bad news!"
- Why might that be? Because (let's assume) the most common output value occurs 99% of the time!

Nearest-Neighbor Algorithms (aka exemplar models, instance-based learning, case-based learning) – Section 18.1.1 of textbook
- Learning ≈ memorize the training examples
- Problem solving = find the most similar example in memory; output its category
[Figure: a scatter of + and - training examples with a "Voronoi diagram" overlay – each cell contains all points closest to the labeled example at its center]

Nearest Neighbors: Basic Algorithm
- Find the K nearest neighbors to the test-set example (or find all examples within radius R)
- Combine their 'votes': the most common category, or the average value (for real-valued prediction)
- Can also weight the votes by distance
- Lots of variations on the basic theme (a minimal sketch follows)
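As a concrete Python sketch of the basic algorithm (not taken from the slides): unweighted majority voting over the K nearest training examples, here using Hamming distance for Boolean features.

```python
from collections import Counter

def hamming_distance(x, y):
    """Number of feature positions on which two Boolean vectors differ."""
    return sum(a != b for a, b in zip(x, y))

def knn_predict(train, query, k=1, distance=hamming_distance):
    """Classify `query` by majority vote among its k nearest training examples.

    train : list of (feature_vector, label) pairs
    query : a feature vector
    """
    neighbors = sorted(train, key=lambda ex: distance(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```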

Simple Example: 1-NN (1-NN ≡ one nearest neighbor)
Training Set
  Ex 1: a=0, b=0, c=1  →  +
  Ex 2: a=0, b=0, c=0  →  -
  Ex 3: a=1, b=1, c=1  →  -
Test Example
  a=0, b=1, c=0  →  ?
"Hamming Distance" (# of different bits): Ex 1 = 2, Ex 2 = 1, Ex 3 = 2
Ex 2 is nearest, so output -
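Running the slide's example through the `knn_predict` sketch above reproduces the same answer:

```python
# The three training examples (features a, b, c) and the test example from the slide.
train = [((0, 0, 1), '+'),   # Ex 1: Hamming distance 2 from the query
         ((0, 0, 0), '-'),   # Ex 2: Hamming distance 1 (the nearest neighbor)
         ((1, 1, 1), '-')]   # Ex 3: Hamming distance 2
print(knn_predict(train, (0, 1, 0), k=1))   # prints '-'
```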

Sample Experimental Results (see the UC-Irvine archive for more)

Testset Correctness:
  Testbed             1-NN    D-Trees   Neural Nets
  Wisconsin Cancer    98%     95%       96%
  Heart Disease       78%     76%       ?
  Tumor               37%     38%       ?
  Appendicitis        83%     85%       86%

Why so low on the Tumor testbed? Still, the simple 1-NN algorithm works quite well!

Parameter Tuning (First Visit)
- Algo: collect the K nearest neighbors and combine their outputs
- What should K be? It is problem (ie, testbed) dependent
- Can use tuning sets to select a good setting for K (a sketch follows)
- Shouldn't really "connect the dots" in the plot below (Why?)
[Plot: tuning-set error rate as a function of K, for K = 1, 2, 3, 4, 5]
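A minimal Python sketch (not from the slides) of selecting K with a tuning set, reusing the `knn_predict` sketch from earlier; the candidate values of K are an arbitrary assumption.

```python
def choose_k_with_tuning_set(train, tune, candidate_ks=(1, 3, 5, 7, 9)):
    """Return the K whose k-NN accuracy on the tuning set is highest."""
    def tuning_accuracy(k):
        correct = sum(knn_predict(train, features, k) == label
                      for features, label in tune)
        return correct / len(tune)
    return max(candidate_ks, key=tuning_accuracy)
```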

Why Not Use the TEST Set to Select Good Parameters?
- A 2002 paper in Nature (a major, major journal) needed to be corrected due to "training on the testing set"
- Original report: 95% accuracy (5% error rate)
- Corrected report (which was still buggy): 73% accuracy (27% error rate)
- The error rate increased by over 400%!
- This is, unfortunately, a very common error

Some ML "Commandments"
- Let the data decide: 'internalize' (ie, tune) parameters
- Scale up by dumbing down: don't ignore simple algorithms, such as always guessing the most common category in the training set, or using the best SINGLE feature (a baseline sketch appears below)
- Clever ideas do not imply better results
- Generalize, don't memorize: accuracy on held-aside data is our focus
- Never train on the test examples! (Commonly violated, alas)
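As an illustration of the first kind of simple baseline, here is a minimal Python sketch (not from the slides) of a classifier that always guesses the most common training-set category; a model worth using should do at least this well on held-aside data.

```python
from collections import Counter

def majority_class_baseline(train):
    """Return a classifier that always predicts the most common training label.

    train : list of (feature_vector, label) pairs
    """
    most_common_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda _features: most_common_label
```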