Non-Parameter Estimation

Lecturer: 虞台文

Contents
- Introduction
- Parzen Windows
- kn-Nearest-Neighbor Estimation
- Classification Techniques
  - The Nearest-Neighbor Rule (1-NN)
  - The k-Nearest-Neighbor Rule (k-NN)
- Distance Metrics

Non-Parameter Estimation: Introduction

Facts: The common parametric densities (e.g., the Gaussian) are unimodal, whereas many practical problems involve multimodal densities. Common parametric forms rarely fit the densities actually encountered in practice.

Goals: Estimate the class-conditional densities $p(\mathbf{x}|\omega_i)$, or estimate the posterior probabilities $P(\omega_i|\mathbf{x})$ directly.

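Density Estimation: Assume $p(\mathbf{x})$ is continuous and the region $R$ is small enough that $p$ is nearly constant within it. The probability that a randomly drawn sample falls inside $R$ is
$$P = \int_R p(\mathbf{x}')\,d\mathbf{x}' \approx p(\mathbf{x})\,V,$$
where $V$ is the volume of $R$. Randomly take $n$ samples and let $k$ denote the number of samples inside $R$. Then $k$ is binomially distributed:
$$P_k = \binom{n}{k} P^k (1-P)^{n-k}, \qquad E[k] = nP.$$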

Density Estimation: Let $k_R$ denote the number of samples in $R$. Since $k_R/n$ estimates $P$ and $P \approx p(\mathbf{x})V$,
$$p(\mathbf{x}) \approx \frac{k_R/n}{V}.$$

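Density Estimation: The estimate $p(\mathbf{x}) \approx (k/n)/V$ involves three items: 1. the region $R$ (with volume $V$), 2. the number $k$ of samples falling in $R$, and 3. the number $n$ of samples. Which items can be controlled, and how? Use subscript $n$ to take the sample size into account:
$$p_n(\mathbf{x}) = \frac{k_n/n}{V_n}.$$
We hope that $p_n(\mathbf{x}) \to p(\mathbf{x})$ as $n \to \infty$. To this end, we should have
1. $\lim_{n\to\infty} V_n = 0$,
2. $\lim_{n\to\infty} k_n = \infty$,
3. $\lim_{n\to\infty} k_n/n = 0$.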

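Two Approaches: Which items can be controlled, and how?
- Parzen windows: control $V_n$ (e.g., $V_n = V_1/\sqrt{n}$) and count the $k_n$ samples that fall inside the cell.
- kn-nearest-neighbor: control $k_n$ (e.g., $k_n = \sqrt{n}$) and let the cell grow until it contains $k_n$ samples, so that $V_n$ depends on the data.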

Two Approaches Parzen Windows kn-Nearest-Neighbor

Non-Parameter Estimation: Parzen Windows

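Window Function: Define the unit-hypercube window
$$\varphi(\mathbf{u}) = \begin{cases} 1, & |u_j| \le \tfrac{1}{2},\quad j = 1,\dots,d \\ 0, & \text{otherwise.} \end{cases}$$

Window Function (edge length $h_n$): $\varphi\big((\mathbf{x}-\mathbf{x}_i)/h_n\big)$ equals 1 if $\mathbf{x}_i$ falls inside the hypercube of edge length $h_n$ centered at $\mathbf{x}$, and 0 otherwise. The volume of this hypercube is $V_n = h_n^d$.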

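Parzen-Window Estimation: The number of samples inside the hypercube of edge $h_n$ centered at $\mathbf{x}$ is
$$k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{\mathbf{x}-\mathbf{x}_i}{h_n}\right),$$
so that
$$p_n(\mathbf{x}) = \frac{k_n/n}{V_n} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{V_n}\,\varphi\!\left(\frac{\mathbf{x}-\mathbf{x}_i}{h_n}\right).$$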
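The hypercube estimate can be sketched in a few lines of Python. This is a minimal illustration, assuming samples are equal-length sequences of floats; `in_unit_hypercube` and `parzen_hypercube` are hypothetical names, not part of any library.

```python
from typing import Sequence


def in_unit_hypercube(u: Sequence[float]) -> float:
    """Window phi(u): 1 if every |u_j| <= 1/2, otherwise 0."""
    return 1.0 if all(abs(uj) <= 0.5 for uj in u) else 0.0


def parzen_hypercube(x: Sequence[float], samples: Sequence[Sequence[float]], h: float) -> float:
    """Parzen estimate p_n(x) = (k_n / n) / V_n with a hypercube of edge h centered at x."""
    n, d = len(samples), len(x)
    volume = h ** d  # V_n = h_n^d
    # k_n = sum_i phi((x - x_i) / h_n): the number of samples inside the hypercube
    k = sum(in_unit_hypercube([(xj - sj) / h for xj, sj in zip(x, s)]) for s in samples)
    return k / (n * volume)


# 1-D usage: two of the three samples lie within h/2 = 0.25 of x = 0.1.
print(parzen_hypercube([0.1], [[0.0], [0.2], [0.9]], h=0.5))
```

Here $k_n = 2$, $n = 3$ and $V_n = 0.5$, so the estimate is $2/(3 \cdot 0.5) = 4/3$.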

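Generalization: The window need not be a hypercube. Setting $\mathbf{u} = \mathbf{x}/h_n$, the requirement is
$$\varphi(\mathbf{u}) \ge 0, \qquad \int \varphi(\mathbf{u})\,d\mathbf{u} = 1,$$
which, with $V_n = h_n^d$, guarantees that $p_n(\mathbf{x})$ is itself a legitimate density. The width $h_n$ is an important parameter, and it depends on the sample size.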

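Interpolation Parameter: Define
$$\delta_n(\mathbf{x}) = \frac{1}{V_n}\,\varphi\!\left(\frac{\mathbf{x}}{h_n}\right), \qquad p_n(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n}\delta_n(\mathbf{x}-\mathbf{x}_i).$$
The width $h_n$ acts as an interpolation parameter. As $h_n \to 0$, $\delta_n(\mathbf{x})$ approaches a Dirac delta function, so $p_n(\mathbf{x})$ becomes a sum of sharp pulses centered at the samples; a large $h_n$ yields a smooth estimate that can blur out detail.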
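The role of $h_n$ can be seen numerically. The sketch below assumes a one-dimensional Gaussian window (so $V_n = h_n$), one common choice that satisfies both requirements on $\varphi$; the window choice, sample values, and function names are illustrative assumptions.

```python
import math
from typing import Sequence


def gaussian_window(u: float) -> float:
    """Standard normal density: nonnegative and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_gaussian(x: float, samples: Sequence[float], h: float) -> float:
    """p_n(x) = (1/n) * sum_i delta_n(x - x_i), where delta_n(x) = phi(x / h) / h in 1-D."""
    return sum(gaussian_window((x - xi) / h) / h for xi in samples) / len(samples)


samples = [-1.0, 0.0, 0.5, 2.0]
for h in (0.05, 0.5, 2.0):
    # Small h: a tall, narrow pulse at each sample; large h: a smooth, flattened estimate.
    print(f"h={h}: p_n(0) = {parzen_gaussian(0.0, samples, h):.4f}")
```

Because each $\delta_n$ integrates to 1, the estimate integrates to 1 for every $h$; only its shape changes.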

Example: Parzen-window estimates for five samples.

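Convergence Conditions: To assure convergence, i.e.,
$$\lim_{n\to\infty} E[p_n(\mathbf{x})] = p(\mathbf{x}) \quad\text{and}\quad \lim_{n\to\infty} \operatorname{Var}[p_n(\mathbf{x})] = 0,$$
we have the following additional constraints:
$$\sup_{\mathbf{u}} \varphi(\mathbf{u}) < \infty, \qquad \lim_{\|\mathbf{u}\|\to\infty} \varphi(\mathbf{u}) \prod_{i=1}^{d} u_i = 0, \qquad \lim_{n\to\infty} V_n = 0, \qquad \lim_{n\to\infty} n V_n = \infty.$$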

Illustrations: one-dimensional case.

Illustrations: two-dimensional case.

Classification Example: smaller window vs. larger window.

Choosing the Window Function: $V_n$ must approach zero as $n \to \infty$, but at a rate slower than $1/n$, e.g., $V_n = V_1/\sqrt{n}$. The value of the initial volume $V_1$ is important: in some cases, a cell volume that is appropriate for one region is unsuitable in a different region.

PNN (Probabilistic Neural Network)

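PNN (Probabilistic Neural Network): With a Gaussian window of width $\sigma$, the contribution of a training pattern $\mathbf{w}_k$ expands as
$$\exp\!\left(-\frac{\|\mathbf{x}-\mathbf{w}_k\|^2}{2\sigma^2}\right) = \exp\!\left(-\frac{\mathbf{x}^T\mathbf{x}}{2\sigma^2}\right)\exp\!\left(\frac{\mathbf{w}_k^T\mathbf{x} - \tfrac{1}{2}\mathbf{w}_k^T\mathbf{w}_k}{\sigma^2}\right).$$
The factor that depends only on $\mathbf{x}$ is common to every class and is irrelevant for discriminant analysis; the term $-\tfrac{1}{2}\mathbf{w}_k^T\mathbf{w}_k$ acts as a bias for unit $k$.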

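PNN (Probabilistic Neural Network): Pattern unit $k$ receives the inputs $x_1, x_2, \dots, x_d$ through the weights $w_{k1}, w_{k2}, \dots, w_{kd}$, forms $net_k = \mathbf{w}_k^T\mathbf{x}$, and outputs $a_k(net_k)$.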

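PNN (Probabilistic Neural Network): The inputs $x_1, \dots, x_d$ feed every pattern unit (one per training sample), and each category unit $j = 1, \dots, c$ sums the outputs of the pattern units of its class to produce $o_j$.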

PNN (Probabilistic Neural Network): Assign each pattern to the class with the maximum output value $o_j$. Questions:
1. What are the pros and cons of the approach?
2. How should prior probabilities be handled?
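A PNN forward pass can be sketched as below. This is a minimal illustration, assuming all patterns are normalized to unit length and the activation is $a_k = \exp\big((net_k - 1)/\sigma^2\big)$, which equals the Gaussian window $\exp(-\|\mathbf{x}-\mathbf{w}_k\|^2/2\sigma^2)$ for unit vectors. It implicitly assumes equal priors; the function names, the default $\sigma$, and the data are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Unit = Tuple[List[float], Hashable]  # (weight vector w_k, class label)


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a pattern to unit length (assumes a nonzero vector)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def pnn_train(patterns: Sequence[Tuple[Sequence[float], Hashable]]) -> List[Unit]:
    """Training: each normalized pattern becomes the weight vector w_k of one pattern unit."""
    return [(normalize(x), label) for x, label in patterns]


def pnn_classify(x: Sequence[float], units: Sequence[Unit], sigma: float = 0.5) -> Hashable:
    """Sum pattern-unit activations per category and return the class with maximum output."""
    x = normalize(x)
    outputs: Dict[Hashable, float] = {}
    for w, label in units:
        net = sum(wi * xi for wi, xi in zip(w, x))        # net_k = w_k^T x
        activation = math.exp((net - 1.0) / sigma ** 2)   # a_k(net_k)
        outputs[label] = outputs.get(label, 0.0) + activation  # category unit output o_j
    return max(outputs, key=outputs.get)


units = pnn_train([([1.0, 0.1], "A"), ([0.9, 0.0], "A"), ([0.1, 1.0], "B"), ([0.0, 0.8], "B")])
print(pnn_classify([0.8, 0.2], units))
```

Training is a single pass that copies patterns into weights, which is fast; the cost is that classification touches every stored pattern.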

Non-Parameter Estimation: kn-Nearest-Neighbor Estimation

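Basic Concept: Let the cell volume depend on the training data. To estimate $p(\mathbf{x})$, center a cell about $\mathbf{x}$ and let it grow until it captures $k_n$ samples, where $k_n$ is some specified function of $n$, e.g., $k_n = \sqrt{n}$. The estimate is then $p_n(\mathbf{x}) = (k_n/n)/V_n$, where $V_n$ is the volume of the resulting cell.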
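In one dimension the growing cell is an interval, which makes the idea easy to sketch. This minimal example assumes scalar samples and takes $V_n$ as the length of the smallest interval centered at $x$ that contains $k_n$ samples; the function name and data are hypothetical.

```python
import math
from typing import Sequence


def knn_density_1d(x: float, samples: Sequence[float], k: int) -> float:
    """k_n-nearest-neighbor estimate p_n(x) = (k / n) / V_n in one dimension.

    The cell is [x - r_k, x + r_k], where r_k is the distance from x to its k-th
    nearest sample, so V_n = 2 * r_k (assumed > 0, i.e. fewer than k samples sit exactly at x).
    """
    distances = sorted(abs(x - s) for s in samples)
    r_k = distances[k - 1]
    return (k / len(samples)) / (2.0 * r_k)


samples = [0.0, 1.0, 2.0, 3.0, 4.0]
k_n = max(1, round(math.sqrt(len(samples))))  # k_n = sqrt(n), rounded to an integer
print(k_n, knn_density_1d(2.0, samples, k_n))
```

With $k_n = 2$ the interval around $x = 2$ must reach the neighbors at 1 and 3, so $V_n = 2$ and $p_n(2) = (2/5)/2 = 0.2$.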

Example kn=5

Example

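Estimation of A Posteriori Probabilities: What is $P_n(\omega_i|\mathbf{x})$? Place a cell of volume $V_n$ around $\mathbf{x}$ and suppose it captures $k_n$ samples, $k_i$ of which are labeled $\omega_i$. The joint density is then estimated by
$$p_n(\mathbf{x}, \omega_i) = \frac{k_i/n}{V_n},$$
and therefore
$$P_n(\omega_i|\mathbf{x}) = \frac{p_n(\mathbf{x},\omega_i)}{\sum_{j=1}^{c} p_n(\mathbf{x},\omega_j)} = \frac{k_i}{k_n}.$$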

Estimation of A Posteriori Probabilities: The value of $V_n$ or $k_n$ can be determined based on the Parzen-window or the kn-nearest-neighbor technique.
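With the kn-nearest-neighbor choice, the posterior estimate $k_i/k_n$ reduces to counting labels among the nearest samples. A minimal sketch, assuming Euclidean distance and labeled samples given as `(vector, label)` pairs; the names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

Labeled = Tuple[Sequence[float], Hashable]


def knn_posteriors(x: Sequence[float], data: Sequence[Labeled], k: int) -> Dict[Hashable, float]:
    """P_n(omega_i | x) = k_i / k over the k samples nearest to x.

    Classes that receive no votes are omitted (their estimate is 0).
    """
    nearest = sorted(data, key=lambda item: math.dist(x, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}


data = [((0.0, 0.0), "w1"), ((0.0, 1.0), "w1"), ((1.0, 0.0), "w2"), ((5.0, 5.0), "w2")]
print(knn_posteriors((0.2, 0.2), data, k=3))
```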

Non-Parameter Estimation: Classification Techniques (the Nearest-Neighbor Rule and the k-Nearest-Neighbor Rule)

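The Nearest-Neighbor Rule: Let $\mathcal{D}^n = \{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ be a set of labeled prototypes, and let $\mathbf{x}' \in \mathcal{D}^n$ be the prototype nearest to a test point $\mathbf{x}$. Classify $\mathbf{x}$ as belonging to the class of $\mathbf{x}'$.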
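The 1-NN rule is a single minimum search over the prototypes. A minimal sketch, assuming Euclidean distance and prototypes stored as `(vector, label)` pairs; the function name and data are hypothetical.

```python
import math
from typing import Hashable, Sequence, Tuple


def nearest_neighbor_classify(
    x: Sequence[float], prototypes: Sequence[Tuple[Sequence[float], Hashable]]
) -> Hashable:
    """1-NN rule: return the label of the prototype x' closest to x (Euclidean distance)."""
    _, label = min(prototypes, key=lambda p: math.dist(x, p[0]))
    return label


prototypes = [((0.0, 0.0), "w1"), ((3.0, 3.0), "w2"), ((0.0, 4.0), "w2")]
print(nearest_neighbor_classify((1.0, 0.5), prototypes))
```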

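The Nearest-Neighbor Rule: Voronoi tessellation. The rule partitions the feature space into cells, one per prototype, each containing the points closer to that prototype than to any other; every point in a cell receives the label of its prototype.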

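Error Rate: Bayesian (optimum). The Bayes rule decides $\omega_m$, where
$$P(\omega_m|\mathbf{x}) = \max_i P(\omega_i|\mathbf{x}),$$
so its conditional error is $P^*(e|\mathbf{x}) = 1 - P(\omega_m|\mathbf{x})$, and the Bayes rate is
$$P^* = \int P^*(e|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}.$$

Error Rate: 1-NN. Suppose the true class of $\mathbf{x}$ is $\theta$ and the class of its nearest neighbor $\mathbf{x}'$ is $\theta'$. An error occurs when $\theta \ne \theta'$, so
$$P(e|\mathbf{x},\mathbf{x}') = 1 - \sum_{i=1}^{c} P(\omega_i|\mathbf{x})\,P(\omega_i|\mathbf{x}').$$

Error Rate: 1-NN. As $n \to \infty$, $\mathbf{x}' \to \mathbf{x}$, so $P(\omega_i|\mathbf{x}') \to P(\omega_i|\mathbf{x})$ and
$$\lim_{n\to\infty} P_n(e|\mathbf{x}) = 1 - \sum_{i=1}^{c} P^2(\omega_i|\mathbf{x}).$$

Error Rate: 1-NN. The asymptotic nearest-neighbor error rate is therefore
$$P = \int \left[1 - \sum_{i=1}^{c} P^2(\omega_i|\mathbf{x})\right] p(\mathbf{x})\,d\mathbf{x}.$$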

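Error Bounds (Bayesian vs. 1-NN): At a point $\mathbf{x}$, let $P^*(e|\mathbf{x}) = 1 - P(\omega_m|\mathbf{x})$ be the Bayes conditional error, where $\omega_m$ is the class with the largest posterior, and let $1 - \sum_{i=1}^{c} P^2(\omega_i|\mathbf{x})$ be the asymptotic 1-NN conditional error. Let $P^*$ and $P$ denote their averages over $p(\mathbf{x})$.

Consider the most complex classification case, in which all classes are equally likely at $\mathbf{x}$, i.e., $P(\omega_i|\mathbf{x}) = 1/c$. Then $P^*(e|\mathbf{x}) = 1 - 1/c$ and $1 - \sum_i (1/c)^2 = 1 - 1/c$, so the two error rates coincide.

Consider the opposite case: fix $P(\omega_m|\mathbf{x}) = 1 - P^*(e|\mathbf{x})$ and look for the worst 1-NN error. Write
$$\sum_{i=1}^{c} P^2(\omega_i|\mathbf{x}) = P^2(\omega_m|\mathbf{x}) + \sum_{i \ne m} P^2(\omega_i|\mathbf{x}).$$
To find the upper bound, maximize the 1-NN error, i.e., minimize the last sum subject to $\sum_{i \ne m} P(\omega_i|\mathbf{x}) = P^*(e|\mathbf{x})$. This term is minimum when all its elements have the same value, i.e., $P(\omega_i|\mathbf{x}) = P^*(e|\mathbf{x})/(c-1)$ for $i \ne m$, which gives
$$1 - \sum_{i=1}^{c} P^2(\omega_i|\mathbf{x}) \le 2P^*(e|\mathbf{x}) - \frac{c}{c-1}\,P^*(e|\mathbf{x})^2.$$
Integrating over $\mathbf{x}$ and using $\int P^*(e|\mathbf{x})^2\,p(\mathbf{x})\,d\mathbf{x} \ge (P^*)^2$ (a variance is nonnegative) gives
$$P^* \le P \le P^*\left(2 - \frac{c}{c-1}P^*\right) \le 2P^*.$$
The nearest-neighbor rule is therefore a suboptimal procedure, but as $n \to \infty$ its error rate is never worse than twice the Bayes rate.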
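The pointwise inequality behind the bound can be spot-checked numerically. This sketch uses a hypothetical helper `conditional_errors` that takes the posterior vector $P(\omega_i|\mathbf{x})$ at one point and returns the Bayes conditional error, the asymptotic 1-NN conditional error, and the upper bound; it checks only the pointwise inequality, not the integrated one.

```python
from typing import Sequence, Tuple


def conditional_errors(posteriors: Sequence[float]) -> Tuple[float, float, float]:
    """Return (P*(e|x), asymptotic 1-NN error, upper bound) for posteriors P(omega_i|x)."""
    c = len(posteriors)
    bayes = 1.0 - max(posteriors)                       # P*(e|x) = 1 - P(omega_m|x)
    nn = 1.0 - sum(p * p for p in posteriors)           # 1 - sum_i P^2(omega_i|x)
    upper = 2.0 * bayes - c / (c - 1) * bayes * bayes   # 2P* - c/(c-1) P*^2
    return bayes, nn, upper


for post in ([1 / 3, 1 / 3, 1 / 3], [0.9, 0.05, 0.05], [0.6, 0.3, 0.1]):
    print(post, conditional_errors(post))
```

For equal posteriors, and whenever the non-maximal posteriors are equal, the 1-NN error meets the upper bound; for `[0.6, 0.3, 0.1]` it falls strictly between (0.4 ≤ 0.54 ≤ 0.56).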

Error Bounds

The k-Nearest-Neighbor Rule: Assign a pattern to the class that wins the majority vote among its $k$ nearest neighbors.
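The majority vote extends the 1-NN search by keeping the $k$ closest prototypes. A minimal sketch, assuming Euclidean distance; tie-breaking is left to `Counter.most_common`, which is an arbitrary choice for this illustration, and the names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def knn_classify(
    x: Sequence[float], prototypes: Sequence[Tuple[Sequence[float], Hashable]], k: int = 3
) -> Hashable:
    """k-NN rule: majority vote among the k prototypes nearest to x (Euclidean distance)."""
    nearest = sorted(prototypes, key=lambda p: math.dist(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


prototypes = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "B"), ((4.0, 4.0), "B"), ((5.0, 5.0), "B")]
print(knn_classify((0.4, 0.4), prototypes, k=3), knn_classify((0.4, 0.4), prototypes, k=5))
```

The same query flips from "A" to "B" when $k$ grows from 3 to 5, which illustrates why $k$ must be chosen with care.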

Error Bounds

Computational Complexity: The computational complexity of the nearest-neighbor algorithm (in both time and space) has received a great deal of analysis.
- Space: storing the $n$ prototypes of a training set requires $O(dn)$ space. Remedies: editing, pruning, or condensing the training set.
- Time: searching exhaustively for the nearest neighbor of a $d$-dimensional test point $\mathbf{x}$ takes $O(dn)$ time. Remedies: partial distances and search trees.

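Partial Distance: Use the following fact to discard far-away prototypes early. The partial distance over the first $r$ of the $d$ coordinates,
$$D_r(\mathbf{a},\mathbf{b}) = \left(\sum_{k=1}^{r}(a_k-b_k)^2\right)^{1/2}, \qquad r < d,$$
never decreases as $r$ grows, so once $D_r(\mathbf{x}, \mathbf{x}_i)$ exceeds the distance from $\mathbf{x}$ to the closest prototype found so far, $\mathbf{x}_i$ can be rejected without computing the remaining terms.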
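The early-rejection idea can be sketched as a nearest-neighbor search that compares squared partial distances. This is a minimal illustration with a hypothetical function name; it returns `(-1, inf)` for an empty prototype list.

```python
import math
from typing import Sequence, Tuple


def nn_partial_distance(x: Sequence[float], prototypes: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, distance) of the nearest prototype, abandoning a prototype as soon as
    its partial squared distance reaches the best full squared distance found so far."""
    best_index, best_sq = -1, math.inf
    for i, p in enumerate(prototypes):
        partial = 0.0
        for xj, pj in zip(x, p):
            partial += (xj - pj) ** 2
            if partial >= best_sq:  # D_r can only grow with r: reject early
                break
        else:
            # All d terms were summed and the total is still smaller: new best.
            best_index, best_sq = i, partial
    return best_index, math.sqrt(best_sq)


protos = [(5.0, 5.0, 5.0), (1.0, 2.0, 3.0), (1.0, 2.0, 2.5), (0.0, 0.0, 9.0)]
print(nn_partial_distance((1.0, 2.0, 2.0), protos))
```

The last prototype is rejected after its first coordinate, since that single term (1.0) already exceeds the best squared distance (0.25).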

Editing Nearest Neighbor: Given a set of points (nodes), a Voronoi diagram is a partition of space into regions, one per node, such that every point in a region is closer to that region's node than to any other node.

Delaunay Triangulation If two Voronoi regions share a boundary, the nodes of these regions are connected with an edge. Such nodes are called the Voronoi neighbors (or Delaunay neighbors).

The Decision Boundary: The circled prototypes are redundant: deleting them leaves the nearest-neighbor decision boundary unchanged.

The Edited Training Set

Editing: The Voronoi Diagram Approach
1. Compute the Delaunay triangulation for the training set.
2. Visit each node, marking it if all its Delaunay neighbors are of the same class as the current node.
3. Delete all marked nodes, exiting with the remaining ones as the edited training set.
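The three editing steps can be sketched for 2-D data without any geometry library. This is a minimal illustration, assuming points in general position: it finds Delaunay neighbors by brute force (a triangle is Delaunay if its circumcircle contains no other point), which costs $O(n^4)$ and is only meant for small examples; a real implementation would use a proper triangulation routine. All names and data are hypothetical.

```python
import math
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of the non-degenerate triangle abc."""
    m = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        m.append((dx, dy, dx * dx + dy * dy))
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    # The determinant is positive for "inside" when abc is counter-clockwise.
    return det * _orient(a, b, c) > 0


def delaunay_neighbors(points):
    """Step 1 (brute force): connect the vertices of every triangle with an empty circumcircle."""
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if _orient(a, b, c) == 0:
            continue  # collinear triple: not a triangle
        if any(_in_circumcircle(a, b, c, points[m]) for m in range(n) if m not in (i, j, k)):
            continue
        for u, v in ((i, j), (j, k), (i, k)):
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs


def voronoi_edit(points, labels):
    """Steps 2-3: keep only nodes with at least one Delaunay neighbor of another class."""
    nbrs = delaunay_neighbors(points)
    keep = [i for i in range(len(points)) if any(labels[j] != labels[i] for j in nbrs[i])]
    return [points[i] for i in keep], [labels[i] for i in keep]


def nearest_label(x, points, labels):
    """1-NN label of x, used to compare the full and edited training sets."""
    return labels[min(range(len(points)), key=lambda t: math.dist(x, points[t]))]


pts = [(0.0, 0.0), (0.3, 1.7), (-1.1, 0.9), (-2.0, 0.2), (-0.8, -1.3),
       (3.0, 0.1), (3.4, 2.2), (5.1, 1.05), (4.2, -0.6), (2.1, 1.1)]
lab = [0] * 5 + [1] * 5
ep, el = voronoi_edit(pts, lab)
print(len(pts), "->", len(ep), ep)
```

A useful check is that the 1-NN label of any query point is the same with the full and the edited training sets.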

Editing: Other Approaches
- The Gabriel graph approach
- The relative neighbour graph approach

References:
- B. K. Bhattacharya, R. S. Poulsen, and G. T. Toussaint, "Application of Proximity Graphs to Editing Nearest Neighbor Decision Rule," International Symposium on Information Theory, Santa Monica, 1981.
- T. M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification," IEEE Transactions on Information Theory, vol. IT-13, no. 1, pp. 21-27, 1967.
- V. Klee, "On the Complexity of d-Dimensional Voronoi Diagrams," Arch. Math., vol. 34, pp. 75-80, 1980.
- G. T. Toussaint, "The Relative Neighborhood Graph of a Finite Planar Set," Pattern Recognition, vol. 12, no. 4, pp. 261-268, 1980.

Non-Parameter Estimation: Distance Metrics

Nearest-Neighbor Classifier: Distance measurement is an important factor for the nearest-neighbor classifier, e.g., for achieving invariant pattern recognition. The effect of changing units:

Nearest-Neighbor Classifier: Distance measurement is an important factor for the nearest-neighbor classifier, e.g., for achieving invariant pattern recognition. The effect of translation:
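The change-of-units effect can be reproduced with a tiny hypothetical example: rescaling one feature from meters to centimeters changes which prototype is nearest to the query under the Euclidean distance. The data and the helper name `nearest_label` are made up for illustration.

```python
import math
from typing import Hashable, Sequence, Tuple


def nearest_label(x: Sequence[float], prototypes: Sequence[Tuple[Sequence[float], Hashable]]) -> Hashable:
    """1-NN label under the Euclidean distance."""
    _, label = min(prototypes, key=lambda p: math.dist(x, p[0]))
    return label


query = (0.0, 0.0)  # feature 1 is 0 in any unit, so the query is unchanged by rescaling
prototypes_m = [((1.0, 0.0), "A"), ((0.0, 2.0), "B")]  # feature 1 measured in meters
prototypes_cm = [((x1 * 100.0, x2), lab) for (x1, x2), lab in prototypes_m]  # same data in cm
print(nearest_label(query, prototypes_m), nearest_label(query, prototypes_cm))
```

In meters, "A" is at distance 1 and "B" at distance 2; in centimeters, "A" moves to distance 100, so the decision flips to "B" although the data are physically the same.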

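Properties of a Distance Metric: For all vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$:
- Nonnegativity: $D(\mathbf{a},\mathbf{b}) \ge 0$
- Reflexivity: $D(\mathbf{a},\mathbf{b}) = 0$ if and only if $\mathbf{a} = \mathbf{b}$
- Symmetry: $D(\mathbf{a},\mathbf{b}) = D(\mathbf{b},\mathbf{a})$
- Triangle inequality: $D(\mathbf{a},\mathbf{b}) + D(\mathbf{b},\mathbf{c}) \ge D(\mathbf{a},\mathbf{c})$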

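Minkowski Metric (Lp Norm):
$$L_p(\mathbf{a},\mathbf{b}) = \left(\sum_{i=1}^{d}|a_i-b_i|^p\right)^{1/p}$$
1. $L_1$ norm: Manhattan or city-block distance.
2. $L_2$ norm: Euclidean distance.
3. $L_\infty$ norm: chessboard distance, $L_\infty(\mathbf{a},\mathbf{b}) = \max_i |a_i - b_i|$.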
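The three special cases follow from one function of $p$. A minimal sketch with a hypothetical function name, treating `p = math.inf` as the $L_\infty$ (chessboard) case.

```python
import math
from typing import Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """L_p distance between a and b; p = math.inf gives the chessboard (L_infinity) distance."""
    diffs = [abs(ai - bi) for ai, bi in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(t ** p for t in diffs) ** (1.0 / p)


for p in (1, 2, math.inf):
    print(p, minkowski((0, 0), (3, 4), p))
```

For the points (0, 0) and (3, 4), the city-block, Euclidean, and chessboard distances are 7, 5, and 4.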

Minkowski Metric (Lp Norm)