LECTURE 05: CLASSIFICATION PT. 1 February 8, 2016 SDS 293 Machine Learning

Outline
- Motivation
- Bayes classifier
- K-nearest neighbors
- Logistic regression
  - Logistic model
  - Estimating coefficients with maximum likelihood
  - Multivariate logistic regression
  - Multiclass logistic regression
  - Limitations
- Linear discriminant analysis (LDA)
  - Bayes’ theorem
  - LDA on one predictor
  - LDA on multiple predictors
- Comparing classification methods

Motivation So far: we’ve predicted quantitative responses. (Image credit: Ming Malaykham)

Motivation Question: what if we have a qualitative response?

Motivation Like in regression, we have a set of training observations $(x_1, y_1), \ldots, (x_n, y_n)$. We want to build a model to predict the (qualitative) response for new observations.

Classification Predicting a qualitative response for an observation can be thought of as classifying that observation.

Quantifying accuracy With quantitative responses, we measured a model’s accuracy using the mean squared error: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}(x_i)\right)^2$. Problem: how do we take the difference of two classes? Solution: we’ll try to minimize the proportion of misclassifications (the test error rate) instead: $\frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$, where $I(\cdot)$ is an indicator function that outputs 1 if the input is true, 0 otherwise.
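As a minimal sketch (not from the slides), this is what that misclassification rate looks like in numpy; y_true and y_pred are hypothetical label vectors:

```python
import numpy as np

# Hypothetical true and predicted class labels for 8 observations
y_true = np.array(["blue", "blue", "orange", "blue", "orange", "orange", "blue", "orange"])
y_pred = np.array(["blue", "orange", "orange", "blue", "orange", "blue", "blue", "orange"])

# (y_true != y_pred) plays the role of the indicator I(y_i != yhat_i);
# averaging it gives the proportion of misclassified observations
error_rate = np.mean(y_true != y_pred)
print(error_rate)  # 0.25 here: 2 of the 8 observations are misclassified
```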

Why won’t linear regression work? LR only works on quantitative responses (why?) Could try the approach we took with qualitative predictors: code each class as a number and regress on the result. What’s the problem?

Why won’t linear regression work? Is it any better if we only have a binary response, coded as 0 or 1? In this case, how might we interpret the fitted value $\hat{Y}$?

Bayes’ classifier A simple, elegant classifier: assign each observation to the most likely class, given its predictor values. That is, assign a test observation with predictor vector $x_0$ to the class $j$ that maximizes $P(Y = j \mid X = x_0)$.

Toy example [Figure: simulated two-class data with the Bayes decision boundary, drawn where the conditional probability of each class equals 50%]

Bayes’ classifier Test error rate of the Bayes classifier: $1 - E\left[\max_j P(Y = j \mid X)\right]$. Great news! This error rate is provably* optimal! Just one problem: with real data we don’t know the true conditional distribution $P(Y \mid X)$… Okay, let’s estimate it! * proof left to the reader
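To make the Bayes classifier and its error rate concrete, here is a small illustrative sketch under an assumed toy model (my own setup, not the course’s simulation): two classes with equal priors and Gaussian class-conditional densities, so $P(Y \mid X)$ is known exactly and the Bayes error rate can be estimated by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: P(orange) = P(blue) = 0.5,
# X | orange ~ N(+1, 1) and X | blue ~ N(-1, 1)
def gauss_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

def p_orange_given_x(x):
    """Exact P(Y = orange | X = x), via Bayes' theorem."""
    f_orange, f_blue = gauss_pdf(x, +1.0), gauss_pdf(x, -1.0)
    return 0.5 * f_orange / (0.5 * f_orange + 0.5 * f_blue)

def bayes_classifier(x):
    """Assign each x to the class with the larger conditional probability."""
    return np.where(p_orange_given_x(x) > 0.5, "orange", "blue")

# Monte Carlo estimate of the Bayes error rate: 1 - E[ max_j P(Y = j | X) ]
y = rng.choice(["orange", "blue"], size=100_000)
x = rng.normal(loc=np.where(y == "orange", 1.0, -1.0), scale=1.0)
p = p_orange_given_x(x)
print(1 - np.mean(np.maximum(p, 1 - p)))        # ~0.159 for this toy model
# The empirical error of the Bayes rule on the simulated data agrees with it
print(np.mean(bayes_classifier(x) != y))
```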

Back to our toy example [Figure: the same simulated data, with a new test observation marked X]

K-nearest neighbors Input: a positive integer K and a test value $x_0$ Step 1: Identify the K training examples closest to $x_0$ (call them $N_0$) Step 2: Estimate $P(Y = j \mid X = x_0) = \frac{1}{K}\sum_{i \in N_0} I(y_i = j)$ Step 3: Assign $x_0$ to whichever class has the highest (estimated) probability
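A from-scratch sketch of those three steps in numpy (my own illustration; the tiny training set below is hypothetical). The lab will use library implementations instead.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, K):
    """Classify a single test point x0 with K-nearest neighbors."""
    # Step 1: the K training examples closest to x0 (the neighborhood N_0)
    distances = np.linalg.norm(X_train - x0, axis=1)
    N0 = np.argsort(distances)[:K]
    # Step 2: the estimated P(Y = j | X = x0) is the fraction of N_0 in class j
    votes = Counter(y_train[N0])
    # Step 3: assign x0 to the class with the highest estimated probability
    return votes.most_common(1)[0][0]

# Tiny hypothetical two-class training set
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [5.0, 5.0], [5.5, 6.0], [6.0, 5.0]])
y_train = np.array(["blue", "blue", "blue", "orange", "orange", "orange"])

print(knn_predict(X_train, y_train, np.array([1.2, 1.5]), K=3))  # blue
print(knn_predict(X_train, y_train, np.array([5.4, 5.2]), K=3))  # orange
```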

K-nearest neighbors example: K = 3 [Figure: KNN decision boundary between the BLUE and ORANGE regions]

K-nearest neighbors example: K = 10 [Figure: KNN decision boundary between the BLUE and ORANGE regions]

KNN vs. Bayes Despite being extremely simple, KNN can produce classifiers that are close to optimal (is this surprising?) Problem: what’s the right K? Question: does it matter?

Choosing the right K Small K: low bias, high variance. Large K: high bias, low variance.
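A hedged sketch of how this trade-off shows up in practice (simulated data of my own, not the course’s dataset): fit KNN for several values of K and compare training and test error rates with scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(293)

# Simulated two-class problem (an assumption for illustration only)
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for K in [1, 5, 25, 100]:
    knn = KNeighborsClassifier(n_neighbors=K).fit(X_train, y_train)
    train_err = np.mean(knn.predict(X_train) != y_train)
    test_err = np.mean(knn.predict(X_test) != y_test)
    # Small K: training error near 0 (low bias, high variance);
    # large K: both errors rise (high bias, low variance)
    print(f"K={K:3d}  train error={train_err:.3f}  test error={test_err:.3f}")
```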

KNN training vs. test error [Figure: error rate vs. flexibility (1/K), from bigger K to smaller K; the training error decreases as flexibility increases]

Lab: K-nearest neighbors
To do today’s lab in R: class package
To do today’s lab in python: pandas, numpy, sklearn
Instructions and code:
Full version can be found beginning on p. 163 of ISLR
Note: we’re going a little out of order, so you may want to stick with the demo code

Discussion: KNN on quantitative responses Question 1: is there any reason we couldn’t use KNN to predict quantitative responses? Question 2: what (if anything) would need to change? Question 3: how does it compare to LR?
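One hedged answer to Question 2, sketched with hypothetical data: replace the majority vote in Step 3 with the average of the K neighbors’ responses, which gives KNN regression.

```python
import numpy as np

def knn_regress(X_train, y_train, x0, K):
    """KNN for a quantitative response: average the K nearest neighbors' responses."""
    distances = np.linalg.norm(X_train - x0, axis=1)
    N0 = np.argsort(distances)[:K]
    return y_train[N0].mean()   # an average instead of a majority vote

# Hypothetical 1-D training data: y is roughly x^2 plus noise
rng = np.random.default_rng(1)
X_train = rng.uniform(-2, 2, size=(50, 1))
y_train = X_train[:, 0] ** 2 + rng.normal(scale=0.1, size=50)

print(knn_regress(X_train, y_train, np.array([1.0]), K=5))  # close to 1.0
```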

LR vs. KNN
Linear Regression: parametric (meaning?) → we assume an underlying functional form for f(X)
- Pros: coefficients have simple interpretations; easy to do significance testing
- Cons: if we’re wrong about the functional form → poor performance
K-Nearest Neighbors: non-parametric (meaning?) → no explicit assumptions about the form of f(X)
- Pros: doesn’t require knowledge about the underlying form; more flexible approach
- Cons: can accidentally “mask” the underlying function

Discussion: which method? Question 1: would you expect LR to outperform KNN when the underlying relationship is linear? Why? - Yes: KNN won’t get a reduction in bias as it increases in variance Question 2: what happens as the # of dimensions increases, but the # of observations stays the same? - “Curse of dimensionality”: the more dimensions there are, the farther away each observation’s “nearest neighbors” can be
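A small simulation (my own illustration) of the second point: with a fixed number of observations scattered uniformly in the unit cube, the distance from a test point to its nearest neighbor grows quickly with the number of predictors p.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # the number of observations stays fixed

for p in [1, 2, 5, 10, 50, 100]:
    X = rng.uniform(size=(n, p))   # n observations in the unit cube [0, 1]^p
    x0 = np.full(p, 0.5)           # a test point at the center of the cube
    nearest = np.min(np.linalg.norm(X - x0, axis=1))
    print(f"p={p:3d}  distance to nearest neighbor = {nearest:.2f}")
# The "nearest" neighbor drifts farther away as p grows, so KNN's local
# averages stop being local: the curse of dimensionality.
```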

Coming up
Wednesday: Logistic regression
- Logistic model
- Estimating coefficients with maximum likelihood
- Multivariate logistic regression
- Multiclass logistic regression
- Limitations
A1 due Wednesday Feb. 10 by 11:59pm: moodle.smith.edu/mod/assign/view.php?id=90237