
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher

Chapter 4: Nonparametric Techniques (Sections 1-6)
- Introduction
- Density Estimation
- Parzen Windows
- kn-Nearest-Neighbor Estimation
- The Nearest-Neighbor Rule
- Metrics and Nearest-Neighbor Classification

1. Introduction
All parametric densities are unimodal (have a single local maximum), whereas many practical problems involve multimodal densities.
Nonparametric procedures can be used with arbitrary distributions and without assuming that the forms of the underlying densities are known.
There are two types of nonparametric methods:
- Estimate the density functions p(x | ωj) without assuming a model: Parzen windows
- Bypass density estimation and directly estimate the posteriors P(ωj | x): k-nearest neighbors (kNN)

2. Density Estimation
Basic idea: the probability that a vector x will fall in a region R is
  P = ∫R p(x') dx'   (1)
P is a smoothed (averaged) version of the density function p(x).
For n samples drawn i.i.d. from p(x), the probability that exactly k of the n fall in R follows the binomial distribution
  Pk = (n choose k) P^k (1 - P)^(n-k)   (2)
and the expected value of k is E(k) = nP   (3)

The maximum-likelihood estimate of P (treating P as the unknown parameter θ) is θ̂ = k/n.
Therefore, the ratio k/n is a good estimate for the probability P and hence for the density function p.
Because p(x) is continuous and the region R is so small that p does not vary significantly within it, we can write
  ∫R p(x') dx' ≅ p(x)·V   (4)
where x is a point within R and V is the volume enclosed by R.

Combining equations (1), (3), and (4) yields the basic estimate
  p(x) ≅ (k/n)/V   (5)
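To make estimate (5) concrete, here is a minimal numerical sketch (not from the slides): it draws N(0,1) samples, counts how many fall in a small fixed region R around x, and divides by the volume of R. All names and values are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
x, half_width = 0.0, 0.05
samples = rng.normal(0.0, 1.0, size=10_000)     # n i.i.d. samples from p(x)
k = np.sum(np.abs(samples - x) <= half_width)   # number of samples falling in R = [x - 0.05, x + 0.05]
n, V = samples.size, 2 * half_width             # sample count and volume (length) of R
print((k / n) / V)                              # close to the true N(0,1) density at 0 (about 0.399)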

Theoretically, if an unlimited number of samples is available, we can estimate the density at x by forming a sequence of regions R1, R2, … containing x: the first contains one sample, the second two, etc.
Let Vn be the volume of Rn, kn the number of samples falling in Rn, and pn(x) the nth estimate for p(x):
  pn(x) = (kn/n)/Vn   (7)
Three conditions must hold for pn(x) to converge to p(x):
- lim(n→∞) Vn = 0
- lim(n→∞) kn = ∞
- lim(n→∞) kn/n = 0
There are two different ways to satisfy these conditions:
1. Shrink an initial region by specifying the volume as a function of n, e.g. Vn = 1/√n, and show that pn(x) converges to p(x): the Parzen-window estimation method.
2. Specify kn as a function of n, e.g. kn = √n, and grow the volume Vn until it encloses kn neighbors of x: the kn-nearest-neighbor (kNN) estimation method.

3. Parzen Windows
Parzen-window approach to estimating densities: assume, for example, that the region Rn is a d-dimensional hypercube with edge length hn, so that Vn = hn^d.
Define the window function
  φ(u) = 1 if |uj| ≤ 1/2 for j = 1, …, d, and 0 otherwise
Then φ((x - xi)/hn) is equal to unity if xi falls within the hypercube of volume Vn centered at x, and equal to zero otherwise.

The number of samples falling in this hypercube is
  kn = Σ(i=1..n) φ((x - xi)/hn)
Substituting kn into equation (7), pn(x) = (kn/n)/Vn, we obtain
  pn(x) = (1/n) Σ(i=1..n) (1/Vn) φ((x - xi)/hn)
pn(x) estimates p(x) as an average of window functions of x and the samples xi (i = 1, …, n). These window functions φ can be quite general!
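As an illustration of the hypercube window, here is a minimal sketch (not from the slides): it counts the samples inside the hypercube of edge length h centered at x and applies pn(x) = (kn/n)/Vn. Function and variable names, and the test data, are made up.

import numpy as np

def parzen_hypercube(x, samples, h):
    samples = np.atleast_2d(samples)           # shape (n, d)
    n, d = samples.shape
    u = (np.asarray(x) - samples) / h          # scaled offsets (x - xi)/h
    inside = np.all(np.abs(u) <= 0.5, axis=1)  # phi(u) = 1 iff every coordinate is within +/- 1/2
    k = inside.sum()                           # number of samples in the hypercube, kn
    V = h ** d                                 # hypercube volume Vn = h^d
    return (k / n) / V

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(1000, 1))    # made-up 1D training samples
print(parzen_hypercube([0.0], data, h=0.5))    # roughly 0.4, near the true N(0,1) density at 0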

Illustration – effect of the window function
The behavior of the Parzen-window method in the case where p(x) ~ N(0,1):
Let φ(u) = (1/√(2π)) exp(-u²/2) and hn = h1/√n (n > 1), with h1 a parameter at our disposal.
Thus
  pn(x) = (1/n) Σ(i=1..n) (1/hn) φ((x - xi)/hn)
is an average of normal densities centered at the samples xi.
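A corresponding sketch for this Gaussian-window case (not from the slides), with the shrinking width hn = h1/√n; the sample data and the h1 values are made up.

import numpy as np

def parzen_gaussian_1d(x, samples, h1):
    samples = np.asarray(samples)
    n = samples.size
    hn = h1 / np.sqrt(n)                               # window width hn = h1 / sqrt(n)
    u = (x - samples) / hn
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # standard normal window phi(u)
    return phi.sum() / (n * hn)                        # average of normal densities centered at the samples

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=100)
for h1 in (1.0, 0.5, 0.1):                             # effect of the smoothing parameter h1
    print(h1, parzen_gaussian_1d(0.0, samples, h1))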

Numerical 1D results (see figure on the next slide):
The results depend on both n and h1.
For n = 1 and h1 = 1, pn(x) is a single Gaussian centered about the first sample.
For n = 10 and h1 = 0.1, the contributions of the individual samples are clearly observable!

Case where p(x) = 1. U(a,b) + 2 Case where p(x) = 1.U(a,b) + 2.T(c,d) (mixture of a uniform and a triangle density) Pattern Classification, Ch4

Classification example
In classifiers based on Parzen-window estimation:
- We estimate the densities p(x | ωj) for each category and classify a test point by the label corresponding to the maximum posterior (unequal priors for the classes can be included).
- The decision region for a Parzen-window classifier depends upon the choice of window function, as illustrated in the following figure.
- For good estimates, n usually must be large, much greater than what is required for parametric models.
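A minimal two-class sketch of such a classifier (not from the slides): each class-conditional density is estimated with a Gaussian Parzen window and the test point gets the label with the largest prior-weighted density. The data, priors, and the value of h1 are made-up assumptions.

import numpy as np

def parzen_density(x, samples, h1):
    samples = np.atleast_2d(samples)                        # (n, d) training samples of one class
    n, d = samples.shape
    hn = h1 / np.sqrt(n)                                    # shrinking window width
    u = (np.asarray(x) - samples) / hn
    phi = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)
    return phi.sum() / (n * hn**d)                          # Parzen estimate of p(x | class)

def classify(x, class_samples, priors, h1=5.0):
    # posterior is proportional to prior * class-conditional density; pick the argmax
    scores = [p * parzen_density(x, s, h1) for s, p in zip(class_samples, priors)]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
class_samples = [rng.normal(0.0, 1.0, size=(200, 2)),       # made-up samples for class 0
                 rng.normal(3.0, 1.0, size=(200, 2))]       # made-up samples for class 1
print(classify([0.0, 0.0], class_samples, priors=[0.5, 0.5]))   # expected 0: nearer class 0's samples
print(classify([3.0, 3.0], class_samples, priors=[0.5, 0.5]))   # expected 1: nearer class 1's samples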

4. kn-Nearest-Neighbor Estimation
Rather than trying to find the "best" Parzen window function, let the cell volume be a function of the training data: center a cell about x and let it grow until it captures kn samples, where kn is a specified function of n. These samples are the kn nearest neighbors of x.
Two possibilities can occur:
- If the density is high near x, the cell will be small, which provides good resolution.
- If the density is low, the cell will grow large, stopping only when it reaches regions of higher density.
We can obtain a family of estimates by setting kn = k1·√n and choosing different values for k1 (a parameter at our disposal).
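A minimal 1D sketch of this estimator (not from the slides): grow an interval around x until it contains kn = round(k1·√n) samples and apply pn(x) = (kn/n)/Vn with Vn the length of that interval. Names and data are made up.

import numpy as np

def knn_density_1d(x, samples, k1=1.0):
    samples = np.asarray(samples)
    n = samples.size
    kn = max(1, int(round(k1 * np.sqrt(n))))   # kn = k1 * sqrt(n)
    dists = np.abs(samples - x)
    r = np.sort(dists)[kn - 1]                 # distance to the kn-th nearest neighbor
    V = 2 * r                                  # volume (length) of the interval [x - r, x + r]
    return (kn / n) / V

rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, size=1000)
print(knn_density_1d(0.0, data))               # roughly 0.4, near the true N(0,1) density at 0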

Estimation of a posteriori probabilities
Goal: estimate P(ωi | x) from a set of n labeled samples.
Place a cell of volume V around x and capture k samples. If ki samples among the k turn out to be labeled ωi, then the estimate of the joint density is
  pn(x, ωi) = (ki/n)/V
and an estimate for the posterior Pn(ωi | x) is
  Pn(ωi | x) = pn(x, ωi) / Σ(j=1..c) pn(x, ωj) = ki/k

ki/k is the fraction of the samples within the cell that are labeled ωi.
For a minimum error rate, the most frequently represented category within the cell is selected.
If k is large and the cell is sufficiently small, the performance will approach the best possible (the Bayes rate).
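A minimal sketch of this posterior estimate (not from the slides): take the k nearest labeled samples around x, report ki/k for each label, and decide for the most frequent one. The data and function names are made up.

import numpy as np
from collections import Counter

def knn_posteriors(x, samples, labels, k):
    samples = np.atleast_2d(samples)
    dists = np.linalg.norm(samples - np.asarray(x), axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]                           # indices of the k nearest samples
    counts = Counter(labels[i] for i in nearest)              # ki for each label inside the cell
    return {label: c / k for label, c in counts.items()}      # P(omega_i | x) estimated as ki / k

rng = np.random.default_rng(4)
samples = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])   # two made-up classes
labels = np.array([0] * 50 + [1] * 50)
post = knn_posteriors([0.5, 0.5], samples, labels, k=11)
print(post, '-> decide', max(post, key=post.get))             # class 0 should dominate here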

5. The Nearest-Neighbor Rule
Let Dn = {x1, x2, …, xn} be a set of n labeled prototypes, and let x' ∈ Dn be the closest prototype to a test point x. The nearest-neighbor rule for classifying x is to assign it the label associated with x'.
- The nearest-neighbor rule leads to an error rate greater than the minimum possible, the Bayes rate.
- However, if the number of prototypes is large (unlimited), the error rate of the nearest-neighbor classifier is never worse than twice the Bayes rate (this can be demonstrated!).
- If n → ∞, it is always possible to find x' sufficiently close to x so that P(ωi | x') ≅ P(ωi | x).
- If P(ωm | x) ≅ 1, then the nearest-neighbor selection is almost always the same as the Bayes selection.

The k-nearest-neighbor rule
Goal: classify x by assigning it the label most frequently represented among the k nearest samples, i.e. use a voting scheme among the neighbors.
k is usually chosen odd so that there are no voting ties in the two-class case.
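A minimal sketch of the voting rule (not from the slides): Euclidean distance plus majority vote among the k nearest prototypes. The prototypes and labels in the usage lines are made up.

import numpy as np
from collections import Counter

def knn_classify(x, prototypes, labels, k=3):
    prototypes = np.atleast_2d(prototypes)
    dists = np.linalg.norm(prototypes - np.asarray(x), axis=1)   # distance from x to every prototype
    nearest = np.argsort(dists)[:k]                              # indices of the k nearest prototypes
    votes = Counter(labels[i] for i in nearest)                  # tally the neighbors' labels
    return votes.most_common(1)[0][0]                            # most frequently represented label

prototypes = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0], [1.2, 0.9]]
labels = ['w1', 'w1', 'w2', 'w2', 'w2']
print(knn_classify([0.2, 0.1], prototypes, labels, k=3))         # -> 'w1'
print(knn_classify([1.0, 1.0], prototypes, labels, k=3))         # -> 'w2'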

Step-by-step algorithm for finding the nearest-neighbor class decision regions and decision boundaries in 2D:
1. Find the midpoints between all pairs of points.
2. Find the perpendicular bisectors of the lines between all pairs of points (they pass through the midpoints found in step 1).
3. Find the point regions: the region surrounding each point that is closer to that point than to any other (it is outlined by the perpendicular-bisector segments that are perpendicular to the shortest line from the point to the bisector segment). These regions are called Voronoi cells.
4. Merge adjoining point regions of the same class (such as in a two-class problem of dog versus cat) to obtain the class decision regions (any point falling into a region is assigned to the class of that region). This is done by eliminating the boundary lines (perpendicular-bisector segments) between points of the same class. The resulting connected line segments outlining the decision regions are called the decision boundaries.
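Rather than carrying out the geometric construction above, the following sketch (not from the slides; prototypes and labels are made up) obtains the same class decision regions by labeling every point of a dense grid with the class of its nearest prototype, which amounts to merging same-class Voronoi cells; the decision boundaries are where the assigned class changes.

import numpy as np

prototypes = np.array([[0.2, 0.3], [0.8, 0.7], [0.5, 0.9], [0.7, 0.2]])    # made-up 2D prototypes
labels = np.array([0, 0, 1, 1])                                            # made-up class labels

xs, ys = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
grid = np.column_stack([xs.ravel(), ys.ravel()])                           # dense grid of query points
dists = np.linalg.norm(grid[:, None, :] - prototypes[None, :, :], axis=2)  # distances to each prototype
regions = labels[np.argmin(dists, axis=1)].reshape(xs.shape)               # class of the nearest prototype

boundary = np.diff(regions, axis=0) != 0                                   # class changes between grid rows
print(regions.shape, int(boundary.sum()))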

Example: let k = 3 (an odd value) and the test point be x = (0.10, 0.25)^t, with prototypes (0.15, 0.35), (0.10, 0.28), (0.09, 0.30) and (0.12, 0.20).
The closest vectors to x, with their labels, are {(0.10, 0.28, ω2); (0.12, 0.20, ω2); (0.15, 0.35, ω1)}.
The voting scheme assigns the label ω2 to x, since ω2 is the most frequently represented label among the three neighbors.

6. Metrics and Nearest-Neighbor Classification
kNN relies on a metric (distance function) D(·,·) between two vectors, typically the Euclidean distance.
A distance function has the properties of:
- nonnegativity: D(a, b) ≥ 0
- reflexivity: D(a, b) = 0 if and only if a = b
- symmetry: D(a, b) = D(b, a)
- triangle inequality: D(a, b) + D(b, c) ≥ D(a, c)

The Minkowski metric (or distance) of order k between two d-dimensional vectors a and b is
  Lk(a, b) = ( Σ(i=1..d) |ai - bi|^k )^(1/k)
- L1 is the Manhattan or city-block distance.
- L2 is the Euclidean distance.
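A minimal sketch of the Minkowski distance (not from the slides); the vectors and orders are made up.

import numpy as np

def minkowski(a, b, k):
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** k) ** (1.0 / k)

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1))    # L1, Manhattan / city-block distance: 7.0
print(minkowski(a, b, 2))    # L2, Euclidean distance: 5.0
print(minkowski(a, b, 20))   # large k approaches the max-coordinate (L-infinity) distance: about 4.0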
