LECTURE 09: CLASSIFICATION WRAP-UP February 24, 2016 SDS 293 Machine Learning.


Announcement
For those who have asked about extra credit for finishing the superhero LDA from Monday:
- Full data on the “training heroes” is available at:
- You may use R to calculate the covariance matrix if you like, but the rest must be done by hand
- Anyone who submits a well-reasoned solution will have their lowest homework grade dropped
Submissions should be turned in before spring break.
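If you take the R option for the covariance step, here is a minimal sketch; the file name and column names (training_heroes.csv, strength, speed, alignment) are placeholders, not the actual data set:

# Hypothetical file/column names -- substitute the real "training heroes" data
heroes <- read.csv("training_heroes.csv")
X <- heroes[, c("strength", "speed")]   # numeric predictors (placeholder names)
y <- heroes$alignment                   # class label (placeholder name)

# Pooled (within-class) covariance, the shared covariance matrix LDA assumes
Sigma <- Reduce(`+`, lapply(split(X, y), function(g) cov(g) * (nrow(g) - 1))) /
  (nrow(X) - length(unique(y)))
Sigma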

Outline
Finish up Monday’s pseudo-lab: classification approaches
- 2-4 person teams (3 is ideal)
- Take ~30 minutes to discuss the 6 scenarios, then we’ll come back and wrap up classification
Evaluating models using resampling
- Today: cross-validation
- Monday: bootstrap

30 min: comparing classification methods
Work in teams of 2-4 people (3 is ideal)
Goal: deepen our understanding of how the 4 classification methods we’ve discussed measure up:
- K-nearest neighbors
- Logistic regression
- LDA
- QDA
Instructions: you can check your intuition against the six-scenario comparison in ISLR (Section 4.5)

Discussion: classification methods
(Fill-in grid: rows are Scenarios 1-6; columns are Logistic Regression, K-nearest neighbors, LDA, and QDA)

Rough guide to choosing a method
Is the decision boundary ~linear?
- YES → linear methods. Are the observations normally distributed?
  - YES → start with LDA
  - NO → start with Logistic Regression
- NO → non-linear methods. Do you have limited training data?
  - YES → start with QDA
  - NO → start with K-Nearest Neighbors

Test vs. training
In all of these methods, we evaluate performance using test error rather than training error (why?)
Real life: if the data comes in stages (e.g. trying to predict election results), we have a natural “test set”
Question: when that’s not the case, what do we do?
Answer: set aside a “validation set”
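A minimal sketch of the idea in R, assuming a generic data frame df (the name is a placeholder):

n <- nrow(df)
train_idx <- sample(n, n / 2)     # randomly choose half the rows for training
train      <- df[train_idx, ]
validation <- df[-train_idx, ]    # held out; used only to estimate test error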

What to set aside?
Smarket: used years 2001-2004 to train, tested on 2005
Caravan: used the first 1,000 records as the test set
Carseats:
- some of you split the data in half
- some sampled randomly
- some split on one of the predictors (Population > 300, Sales > 7, etc.)
Question: did it make a difference?

Answer: absolutely
Example: Auto dataset (392 observations)
Goal: use linear regression to predict mpg using polynomial functions of horsepower
Randomly split into 2 sets of 196 observations (training and test)
Repeating the random split (say, 10 times) shows huge variation in prediction accuracy!
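A sketch of the experiment in the style of the ISLR validation-set lab (the seed value and the degree range 1-5 are arbitrary choices):

library(ISLR)   # provides the Auto data set

set.seed(1)
n <- nrow(Auto)                      # 392 observations
train <- sample(n, n / 2)            # random half for training

# Test MSE for polynomial fits of degree 1 through 5
test_mse <- sapply(1:5, function(d) {
  fit <- lm(mpg ~ poly(horsepower, d), data = Auto, subset = train)
  mean((Auto$mpg - predict(fit, Auto))[-train]^2)
})
test_mse

# Rerunning with a different seed (a different random split) gives
# noticeably different test MSEs -- the point of this slide.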

Issues with the “validation set” approach
1. The test error rate depends heavily on which observations are used for training vs. testing
2. We’re only training on a subset of the data (why is this a problem?)
We need a new approach...
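A preview of one such approach, k-fold cross-validation (Monday's topic), sketched with the boot package; the choice of K = 10 is just a common default:

library(ISLR)   # Auto data
library(boot)   # cv.glm

set.seed(1)
# glm() with no family argument fits ordinary least squares,
# which is the form cv.glm expects
glm_fit <- glm(mpg ~ poly(horsepower, 2), data = Auto)

# 10-fold CV: every observation is used for both training and validation,
# which addresses both issues listed above
cv_err <- cv.glm(Auto, glm_fit, K = 10)
cv_err$delta[1]   # cross-validation estimate of the test MSE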

Coming up
Monday: cross-validation
A1/A2 feedback should be ready shortly
A3 posted, due Feb. 29th by 11:59pm