ECE 5424: Introduction to Machine Learning


ECE 5424: Introduction to Machine Learning
Topics: (Finish) Nearest Neighbor
Readings: Barber 14 (kNN)
Stefan Lee, Virginia Tech

Administrative
HW1 out now on Scholar; due Wednesday 09/14, 11:55pm. Please, please start early.
Implement k-NN.
Kaggle competition: bonus points for the best-performing entries, and for beating the instructor/TA.
(C) Dhruv Batra

Recap from last time (C) Dhruv Batra

Nearest Neighbours
(C) Dhruv Batra. Image Credit: Wikipedia

Instance/Memory-based Learning
Four things make a memory-based learner:
1. A distance metric
2. How many nearby neighbors to look at?
3. A weighting function (optional)
4. How to fit with the local points?
(C) Dhruv Batra. Slide Credit: Carlos Guestrin

1-Nearest Neighbour
Four things make a memory-based learner:
1. A distance metric: Euclidean (and others)
2. How many nearby neighbors to look at? One
3. A weighting function (optional): unused
4. How to fit with the local points? Just predict the same output as the nearest neighbour.
(C) Dhruv Batra. Slide Credit: Carlos Guestrin
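As a minimal sketch (not from the slides), the four choices above collapse for 1-NN into "find the closest training point and copy its label." A toy Python version, assuming Euclidean distance and a made-up two-colour dataset:

```python
import math

def nearest_neighbor_predict(X_train, y_train, query):
    """1-NN: return the label of the single closest training point (Euclidean distance)."""
    best_idx = min(range(len(X_train)),
                   key=lambda i: math.dist(X_train[i], query))
    return y_train[best_idx]

# Hypothetical toy dataset
X_train = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
y_train = ["blue", "blue", "red"]
print(nearest_neighbor_predict(X_train, y_train, (4.0, 4.5)))  # closest point is (5, 5) -> "red"
```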

1-NN for Regression
[Figure: 1-NN regression fit of y against x; the prediction at a query is the y-value of the closest datapoint.]
(C) Dhruv Batra. Figure Credit: Carlos Guestrin

Multivariate distance metrics
Suppose the input vectors x_1, x_2, …, x_N are two-dimensional: x_1 = (x_11, x_12), x_2 = (x_21, x_22), …, x_N = (x_N1, x_N2). One can draw the nearest-neighbor regions in input space.
Dist(x_i, x_j) = (x_i1 − x_j1)^2 + (3 x_i2 − 3 x_j2)^2
Dist(x_i, x_j) = (x_i1 − x_j1)^2 + (x_i2 − x_j2)^2
The relative scalings in the distance metric affect region shapes.
Slide Credit: Carlos Guestrin
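A quick numerical sketch (not from the slides, toy points made up for illustration) of why the relative scalings matter: rescaling one coordinate by 3, as in the first formula above, can flip which point is nearest.

```python
def dist(a, b, scale=(1.0, 1.0)):
    # Scaled squared Euclidean distance: sum_k (scale_k * (a_k - b_k))**2
    return sum((s * (x - y)) ** 2 for s, x, y in zip(scale, a, b))

q  = (0.0, 0.0)
p1 = (2.0, 0.0)   # far in x1, identical in x2
p2 = (0.0, 1.5)   # identical in x1, moderately far in x2

# Unscaled metric: p2 is closer to q
print(dist(q, p1), dist(q, p2))                   # 4.0 vs 2.25
# Scale x2 by 3 (as on the slide): now p1 is closer
print(dist(q, p1, (1, 3)), dist(q, p2, (1, 3)))   # 4.0 vs 20.25
```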

Euclidean distance metric
D(x, x′) = sqrt( Σ_i σ_i^2 (x_i − x′_i)^2 )
Or equivalently, D(x, x′) = sqrt( (x − x′)^T A (x − x′) ), where A = diag(σ_1^2, …, σ_d^2).
Slide Credit: Carlos Guestrin

Notable distance metrics (and their level sets) Scaled Euclidian (L2) Mahalanobis (non-diagonal A) Slide Credit: Carlos Guestrin

Minkowski distance (C) Dhruv Batra Image Credit: By Waldir (Based on File:MinkowskiCircles.svg) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Notable distance metrics (and their level sets)
Scaled Euclidean (L2)
L1 norm (absolute)
L∞ (max) norm
Mahalanobis (non-diagonal A)
Slide Credit: Carlos Guestrin

Plan for today (Finish) Nearest Neighbour Kernel Classification/Regression Curse of Dimensionality (C) Dhruv Batra

1-NN for Regression
Often bumpy (overfits)
(C) Dhruv Batra. Figure Credit: Andrew Moore

9-NN for Regression
Averaging over 9 neighbours gives a smoother fit than 1-NN
(C) Dhruv Batra. Figure Credit: Andrew Moore

Weighted k-NN
Neighbors are not all the same; closer neighbours should count more.

Kernel Regression/Classification
Four things make a memory-based learner:
1. A distance metric: Euclidean (and others)
2. How many nearby neighbors to look at? All of them
3. A weighting function (optional): w_i = exp(−d(x_i, query)^2 / σ^2). Nearby points to the query are weighted strongly, far points weakly. The σ parameter is the kernel width. Very important.
4. How to fit with the local points? Predict the weighted average of the outputs: predict = Σ w_i y_i / Σ w_i
(C) Dhruv Batra. Slide Credit: Carlos Guestrin
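The recipe above fits in a few lines. A sketch (not from the slides) of Gaussian kernel regression on a hypothetical 1-D dataset, using exactly the weighting and weighted-average formulas from the slide:

```python
import math

def kernel_regression(X_train, y_train, query, sigma):
    """Gaussian-weighted average of ALL training outputs:
    w_i = exp(-d(x_i, query)^2 / sigma^2), predict = sum(w_i * y_i) / sum(w_i)."""
    weights = [math.exp(-math.dist(x, query) ** 2 / sigma ** 2) for x in X_train]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)

# Hypothetical toy data: y roughly x^2
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 4.0, 9.0]
print(kernel_regression(X, y, (1.5,), sigma=0.5))  # ~2.5, dominated by the two nearest points (y=1, y=4)
```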

Weighting/Kernel functions
w_i = exp(−d(x_i, query)^2 / σ^2)
(Our examples use Gaussian)
(C) Dhruv Batra. Slide Credit: Carlos Guestrin

Effect of Kernel Width
What happens as σ → ∞? What happens as σ → 0?
(C) Dhruv Batra. Image Credit: Ben Taskar
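Both limits can be checked numerically. A sketch (not from the slides, toy 1-D data made up for illustration): as σ → ∞ every weight tends to 1 and the prediction collapses to the global mean of y; as σ → 0 only the single nearest point keeps non-negligible weight, recovering 1-NN behaviour.

```python
import math

def kernel_regression(X, y, q, sigma):
    # 1-D Gaussian kernel regression: w_i = exp(-(x_i - q)^2 / sigma^2)
    w = [math.exp(-((x - q) ** 2) / sigma ** 2) for x in X]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

X = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 4.0, 9.0]
q = 0.9

# sigma very large: all weights ~1, prediction ~ global mean of y = 3.5
print(kernel_regression(X, y, q, sigma=1e6))
# sigma very small: only the nearest point (x=1, y=1) matters, i.e. 1-NN
print(kernel_regression(X, y, q, sigma=0.05))
```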

Problems with Instance-Based Learning
Expensive. No learning: most real work is done at test time. For every test sample, we must search through the entire dataset, which is very slow; we must use tricks like approximate nearest-neighbour search.
Doesn’t work well with a large number of irrelevant features: distances are overwhelmed by noisy features.
Curse of dimensionality: distances become meaningless in high dimensions. (See proof in next lecture.)
(C) Dhruv Batra

Curse of Dimensionality
Consider a sphere of radius 1 in d dimensions, and an outer ε-shell in this sphere. What fraction of the sphere's volume lies in the shell?
(C) Dhruv Batra

Curse of Dimensionality (C) Dhruv Batra
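A quick numerical check (a sketch, not from the slides): since the inner sphere of radius 1 − ε has volume proportional to (1 − ε)^d, the shell's share of the volume is 1 − (1 − ε)^d, which tends to 1 as d grows. Even a 1% shell eventually contains essentially all of the volume.

```python
# Fraction of a unit sphere's volume lying in the outer epsilon-shell:
# vol(shell) / vol(sphere) = 1 - (1 - eps)**d, which -> 1 as d grows
eps = 0.01
for d in [1, 10, 100, 1000]:
    print(d, 1 - (1 - eps) ** d)
# d=1:    0.01
# d=100:  ~0.63
# d=1000: ~0.99996
```

This is the sense in which "all the volume is near the surface" in high dimensions, and why distances to nearest and farthest neighbours concentrate.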

What you need to know
k-NN: the simplest learning algorithm, a "strawman" approach that, with sufficient data, is very hard to beat. Picking k?
Kernel regression/classification: set k to n (the number of data points) and choose the kernel width; smoother than k-NN.
Problems with k-NN: curse of dimensionality; irrelevant features are often killers for instance-based approaches; slow NN search; must remember the (very large) dataset for prediction.
(C) Dhruv Batra

What you need to know
Key concepts (which we will meet again): supervised learning; classification/regression; loss functions; statistical estimation; training vs. testing data; hypothesis class; overfitting, generalization.
(C) Dhruv Batra