ECE 5424: Introduction to Machine Learning Topics: Supervised Learning Measuring performance Nearest Neighbor Distance Metrics Readings: Barber 14 (kNN) Stefan Lee Virginia Tech
Administrative Course add period is over If not enrolled, please leave. Virginia law apparently. (C) Dhruv Batra
HW0 The lower your score, the harder you should expect to work. HW0 is graded Average: 81% Median: 85% Max: 100% Min: 35% The lower your score, the harder you should expect to work. (C) Dhruv Batra
HW0 Question 1: Sam is an odd and not-so-honest person who carries around a bag containing 3 fair coins (heads and tails) and 1 coin which has heads on both sides (which he uses to cheat his friends out of small amounts of cash). He selects a coin randomly from the bag and flips it 4 times and the coin comes up heads each time. He bets you $10 the next flip will be heads. What is the probability the next flip will be heads? Only 1 coin is drawn from the bag, this coin is flipped 4 times. What is the probability of the 5th flip being heads? Want: P(H | 4H) P(H | 4 H) = P(H | fair) P(fair| 4H) + P(H | unfair) P(unfair|4H) P(fair | 4H) = P(4H | fair) P(fair) / P(4H) (Bayes Rule) P(4H) = P(4H | fair) P(fair) + P(4H | unfair) P(unfair) (C) Dhruv Batra
Recap from last time (C) Dhruv Batra
Slide Credit: Yaser Abu-Mostapha (C) Dhruv Batra Slide Credit: Yaser Abu-Mostapha
Nearest Neighbor Demo http://www.cs.technion.ac.il/~rani/LocBoost/ (C) Dhruv Batra
Proportion as Confidence Gender Classification from body proportions Igor Janjic & Daniel Friedman, Juniors (C) Dhruv Batra
Plan for today Supervised/Inductive Learning Nearest Neighbor (A bit more on) Loss functions Nearest Neighbor Common Distance Metrics Kernel Classification/Regression Curse of Dimensionality (C) Dhruv Batra
Loss/Error Functions How do we measure performance? Regression: L2 error Classification: #misclassifications Weighted misclassification via a cost matrix For 2-class classification: True Positive, False Positive, True Negative, False Negative For k-class classification: Confusion Matrix ROC curves http://psych.hanover.edu/JavaTest/SDT/ROC.html (C) Dhruv Batra
Image Credit: Wikipedia Nearest Neighbors (C) Dhruv Batra Image Credit: Wikipedia
Instance/Memory-based Learning Four things make a memory based learner: A distance metric How many nearby neighbors to look at? A weighting function (optional) How to fit with the local points? (C) Dhruv Batra Slide Credit: Carlos Guestrin
1-Nearest Neighbour Four things make a memory based learner: A distance metric Euclidean (and others) How many nearby neighbors to look at? 1 A weighting function (optional) unused How to fit with the local points? Just predict the same output as the nearest neighbour. (C) Dhruv Batra Slide Credit: Carlos Guestrin
k-Nearest Neighbour Four things make a memory based learner: A distance metric Euclidean (and others) How many nearby neighbors to look at? k A weighting function (optional) unused How to fit with the local points? Just predict the average output among the nearest neighbors. (C) Dhruv Batra Slide Credit: Carlos Guestrin
1-NN for Regression y x Here, this is the closest datapoint (C) Dhruv Batra Figure Credit: Carlos Guestrin
Multivariate distance metrics Suppose the input vectors x1, x2, …xN are two dimensional: x1 = ( x11 , x12 ) , x2 = ( x21 , x22 ) , …xN = ( xN1 , xN2 ). One can draw the nearest-neighbor regions in input space. Dist(xi,xj) =(xi1 – xj1)2+(3xi2 – 3xj2)2 Dist(xi,xj) = (xi1 – xj1)2 + (xi2 – xj2)2 The relative scalings in the distance metric affect region shapes Slide Credit: Carlos Guestrin
Euclidean distance metric Or equivalently, where A Slide Credit: Carlos Guestrin
Notable distance metrics (and their level sets) Scaled Euclidian (L2) Mahalanobis (non-diagonal A) Slide Credit: Carlos Guestrin
Minkowski distance (C) Dhruv Batra Image Credit: By Waldir (Based on File:MinkowskiCircles.svg) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
Notable distance metrics (and their level sets) Scaled Euclidian (L2) L1 norm (absolute) Mahalanobis (non-diagonal A) Linf (max) norm Slide Credit: Carlos Guestrin