
NN Cont’d

Administrivia: No news today... Homework not back yet (working on it...). Solution set out today, though.

Whence & Whither
Last time: learning curves & performance estimation; metric functions; the nearest-neighbor classifier
Today: Homework 2 assigned; more on k-NN; probability distributions and classification (maybe); NN in your daily life

Homework 2 Due: Feb 16. DH&S, Problems 4.9, 4.11, 4.19, 4.20. Also: 1. Prove that the square of Euclidean distance is still a metric. 2. Let W be a square (d × d) matrix. Is the resulting distance (defined in terms of W) still a metric? Under what conditions on W?

Distances in classification Nearest neighbor rule: find the nearest instance to the query point in feature space and return the class of that instance. Simplest possible distance-based classifier. With more notation: ŷ = y_i*, where i* = argmin_i d(x_i, x_query). The distance d(·,·) here is “whatever’s appropriate to your data.”
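A minimal sketch of that rule in Python; the array names, the Euclidean default, and the function name are mine, not from the slides:

```python
import numpy as np

def euclidean(a, b):
    # One choice of d(.,.); swap in whatever metric suits your data.
    return np.sqrt(np.sum((a - b) ** 2))

def nn_classify(X_train, y_train, x_query, dist=euclidean):
    """Return the label of the training instance nearest to x_query."""
    distances = [dist(x, x_query) for x in X_train]
    return y_train[int(np.argmin(distances))]
```

Swapping in a different `dist` is exactly the “whatever’s appropriate to your data” knob.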

Properties of NN Training time of NN? Classification time? Geometry of model?

Eventually...

Gotchas Unscaled dimensions What happens if one axis is measured in microns and one in lightyears?

Gotchas Unscaled dimensions What happens if one axis is measured in microns and one in lightyears? Usual trick is to scale each axis to the [-1,1] range.
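A quick sketch of that rescaling trick (the use of NumPy and the function name are illustrative):

```python
import numpy as np

def scale_to_unit_range(X):
    """Rescale each feature (column) of X linearly into [-1, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid divide-by-zero on constant columns
    return 2.0 * (X - lo) / span - 1.0
```

In practice you'd compute `lo`/`hi` on the training data and reuse them when scaling query points, so train and test live on the same scale.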

NN miscellany Slight generalization: k-nearest neighbors (k-NN). Find the k training instances closest to the query point and vote among them for the label.
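Extending the earlier 1-NN sketch to k neighbors with a majority vote (again, the names are illustrative, not from the slides):

```python
from collections import Counter
import numpy as np

def knn_classify(X_train, y_train, x_query, k=3, dist=None):
    """Majority vote among the k training instances closest to x_query."""
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)   # Euclidean by default
    distances = np.array([dist(x, x_query) for x in X_train])
    nearest = np.argsort(distances)[:k]             # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```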

Geometry of k-NN [figure: query point and d(7)]

What’s going on here? One way to look at k-NN: it is trying to estimate the probability distribution of the classes in the vicinity of the query point x. The quantity P(c_i | x) is called the posterior probability of class c_i: the probability that the class is c_i, given that the data vector is x. NN and k-NN are (local average) estimates of this posterior. So why is that the right thing to do?...
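Concretely, the vote fraction is the posterior estimate; one standard way to write it (the notation is mine, not taken from the slide images):

```latex
\hat{P}(c_i \mid x) \;=\; \frac{k_i}{k},
\qquad k_i = \#\{\text{nearest neighbors of } x \text{ (out of } k\text{) with label } c_i\}.
```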

5 minutes of math Bayesian decision rule: want to pick the class that minimizes expected cost. Simplest case: cost == misclassification, so expected cost == expected misclassification rate.

5 minutes of math Expectation is only defined w.r.t. a probability distribution. Posterior probability of class i given data x: P(c_i | x). Interpreted as: the chance that the real class is c_i, given that the observed data is x.

5 minutes of math Expected cost is then: (cost of getting it wrong) × (probability of getting it wrong), summed over all possible outcomes (true classes). More formally, see the risk expression below. Want to pick the class that minimizes this.
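One standard way to write that expected cost, in DH&S-style notation (the symbols are my choice, not copied from the slide):

```latex
R(c_j \mid x) \;=\; \sum_i \lambda(c_j \mid c_i)\, P(c_i \mid x),
\qquad
\hat{c}(x) \;=\; \arg\min_{c_j} R(c_j \mid x),
```

where λ(c_j | c_i) is the cost of deciding c_j when the true class is c_i.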

5 minutes of math For 0/1 cost, this reduces to: R(c_j | x) = Σ_{i ≠ j} P(c_i | x) = 1 − P(c_j | x). To minimize, pick the class c_j that minimizes 1 − P(c_j | x), i.e., the class with the largest posterior P(c_j | x).

5 minutes of math In pictures:

5 minutes of math These thresholds are called the Bayes decision thresholds. The corresponding cost (error rate) is called the Bayes optimal cost. A real-world example:

Back to NN So the nearest neighbor rule is using the density of points around the query as a (local) density estimate. The neighbors’ labels give an estimate of the posterior P(c_i | x). Assume that the majority vote corresponds to the maximum posterior. As k grows, the estimate gets better. But there are problems with growing k...

Exercise Geometry of k-NN Let V(k,N) = volume of the sphere enclosing the k neighbors of x, assuming N points in the data set. For fixed N, what happens to V(k,N) as k grows? For fixed k, what happens to V(k,N) as N grows? What about the radius of V(k,N)?

The volume question Let V(k,N) = volume of the sphere enclosing the k neighbors of x, assuming N points in the data set. Assume a uniform point distribution. The total volume of the data is 1 (w.l.o.g.). So, on average, V(k,N) ≈ k/N.
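The implied radius behavior, sketched in my own notation (this assumes points uniform in a unit-volume region of R^d; it is not taken from the slides):

```latex
\mathbb{E}[V(k,N)] \approx \frac{k}{N},
\qquad
V = C_d\, r^d
\;\Rightarrow\;
r(k,N) \approx \left(\frac{k}{N\, C_d}\right)^{1/d},
```

where C_d is the volume of the unit d-ball. So the enclosing radius grows with k for fixed N, and shrinks as N grows for fixed k, but only at rate N^{-1/d}, which is painfully slow in high dimension.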

1-NN in practice Most common use of 1-nearest neighbor in daily life?

1-NN & speech proc. 1-NN is closely related to the technique of Vector Quantization (VQ), used to compress speech data for cell phones. CD-quality sound: 16 bits/sample × 44.1 kHz ⇒ 88.2 kB/sec ⇒ ~705 kbps. Telephone (land line) quality: ~10 bits/sample × 10 kHz ⇒ ~12.5 kB/sec ⇒ 100 kbps. Cell phones run at ~9600 bps...
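Making the bit-rate arithmetic explicit (same figures as on the slide):

```latex
16\ \tfrac{\text{bits}}{\text{sample}} \times 44{,}100\ \tfrac{\text{samples}}{\text{s}} = 705{,}600\ \tfrac{\text{bits}}{\text{s}} \approx 705\ \text{kbps} \approx 88.2\ \tfrac{\text{kB}}{\text{s}};
\qquad
10 \times 10{,}000 = 100\ \text{kbps} = 12.5\ \tfrac{\text{kB}}{\text{s}}.
```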

Speech compression via VQ Speech source → raw audio signal

Speech compression via VQ Raw audio → “framed” audio

Speech compression via VQ Framed audio → cepstral (~smoothed frequency) representation
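A rough sketch of one common way to compute such a cepstral representation for a single frame (the real cepstrum via the FFT; the Hamming window and the function name are illustrative choices, not from the slides):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one audio frame: inverse FFT of the log magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))        # taper the frame edges
    spectrum = np.fft.rfft(windowed)
    log_mag = np.log(np.abs(spectrum) + 1e-10)       # avoid log(0)
    return np.fft.irfft(log_mag)
```

The low-order cepstral coefficients capture the smoothed spectral envelope, which is what survives the downsampling step on the next slide.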

Speech compression via VQ Cepstrum → downsampled cepstrum

Speech compression via VQ Downsampled (D.S.) cepstrum → vector representation → vector quantize (1-NN) → transmitted exemplar (cell centroid)
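A minimal sketch of the vector-quantization step itself: encoding is just 1-NN against a codebook of centroids, and “transmitting” means sending the centroid’s index (the codebook and names are illustrative):

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook centroid (1-NN)."""
    # dists has shape (num_vectors, num_centroids)
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)          # these indices are what gets transmitted

def vq_decode(indices, codebook):
    """Reconstruct by looking up each transmitted index in the (shared) codebook."""
    return codebook[indices]
```

The codebook itself is typically built offline, e.g. with k-means on training speech, which is where the usual “k-means for vector quantization” connection comes from.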

Compression ratio Original: 10 bits × 10 kHz; 250 samples/“frame” (25 ms/frame) ⇒ 100 kbps; 2500 bits/frame. VQ compressed: 40 frames/sec, 1 VQ centroid/frame, ~1M centroids ⇒ ~20 bits/centroid ⇒ ~800 bits/sec!
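The back-of-envelope behind those numbers (same figures as on the slide):

```latex
\log_2(10^6) \approx 19.9 \approx 20\ \tfrac{\text{bits}}{\text{centroid}},
\qquad
40\ \tfrac{\text{frames}}{\text{s}} \times 20\ \tfrac{\text{bits}}{\text{frame}} = 800\ \text{bps}
\;\;\text{vs.}\;\; 100{,}000\ \text{bps original} \;\Rightarrow\; \text{a } 125\times \text{ compression ratio.}
```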

Signal reconstruction Transmitted cell centroid → look up cepstral coefficients → reconstruct cepstrum → convert to audio

Not lossless, though!