Non-Parametric Estimation
Lecturer: 虞台文
Contents
– Introduction
– Parzen Windows
– k_n-Nearest-Neighbor Estimation
– Classification Techniques
  – The Nearest-Neighbor Rule (1-NN)
  – The k-Nearest-Neighbor Rule (k-NN)
– Distance Metrics
Non-Parametric Estimation: Introduction
Facts
– Classical parametric densities are unimodal.
– Many practical problems involve multimodal densities.
– Common parametric forms rarely fit the densities actually encountered in practice.
Goals
– Estimate the class-conditional densities p(x|ω_i).
– Estimate the posterior probabilities P(ω_i|x).
Density Estimation
Randomly draw n samples and let k denote the number of samples that fall inside a small region R.
The probability mass of R is P = ∫_R p(x') dx', and since the expected number of hits is nP, we can estimate P ≈ k/n.
If p(x) is continuous and R is small enough that p is nearly constant over it, then P ≈ p(x)V, where V is the volume of R.
Combining the two approximations gives the basic estimate p(x) ≈ k / (nV).
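As a quick check with hypothetical numbers: if n = 1000 samples are drawn and k = 25 of them fall in a cell of volume V = 0.01, the estimate at the cell's center is p(x) ≈ 25 / (1000 × 0.01) = 2.5.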
Density Estimation
What items can be controlled, and how? Use the subscript n to take the sample size into account: let R_n be the region used with n samples, V_n its volume, and k_n the number of samples falling in R_n, so that p_n(x) = k_n / (n V_n).
We hope that p_n(x) → p(x) as n → ∞. For this, we should have:
1. V_n → 0
2. k_n → ∞
3. k_n / n → 0
Two Approaches
– Parzen windows: control V_n as a function of n and let the data determine k_n.
– k_n-nearest-neighbor: control k_n as a function of n and let the data determine V_n.
Non-Parametric Estimation: Parzen Windows
Window Function
Define the unit hypercube window φ(u) = 1 if |u_j| ≤ 1/2 for every coordinate j = 1, …, d, and φ(u) = 0 otherwise.
[Figures: a hypercube cell of edge length h_n centered at x; its volume is V_n = h_n^d.]
Parzen-Window Estimation
Let k_n be the number of samples inside the hypercube of edge h_n centered at x:
k_n = Σ_{i=1}^{n} φ((x − x_i) / h_n).
Substituting into p_n(x) = k_n / (n V_n) gives the Parzen-window estimate
p_n(x) = (1/n) Σ_{i=1}^{n} (1/V_n) φ((x − x_i) / h_n), with V_n = h_n^d.
Generalization
Setting u = x/h_n, the window need not be a hypercube: any window function with φ(u) ≥ 0 and ∫ φ(u) du = 1 makes p_n(x) a legitimate density. h_n is an important parameter, and it depends on the sample size.
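A minimal sketch of the estimator in Python, assuming a Gaussian window (the function name and the window choice are illustrative, not from the slides):

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Parzen-window density estimate p_n(x) with a Gaussian window."""
    n, d = samples.shape
    v = h ** d                                 # cell volume V_n = h_n^d
    u = (x - samples) / h                      # u_i = (x - x_i) / h_n
    # Gaussian window: phi(u) >= 0 and integrates to 1
    phi = np.exp(-0.5 * np.sum(u * u, axis=1)) / (2 * np.pi) ** (d / 2)
    return phi.sum() / (n * v)                 # p_n(x) = (1/n) sum_i phi(u_i) / V_n
```

For example, parzen_estimate(np.zeros(2), np.random.randn(500, 2), 0.5) should come out near 1/(2π) ≈ 0.159, the standard normal density at the origin.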
Interpolation Parameter h_n
Write the estimate as p_n(x) = (1/n) Σ_i δ_n(x − x_i) with δ_n(x) = (1/V_n) φ(x/h_n). The width h_n acts as an interpolation parameter: for large h_n the estimate is smooth but low-resolution, while as h_n → 0, δ_n(x) approaches a Dirac delta function and the estimate degenerates into sharp spikes at the samples.
Example
[Figure: Parzen-window estimates for five samples.]
Convergence Conditions
To assure convergence, i.e., lim_{n→∞} E[p_n(x)] = p(x) and lim_{n→∞} Var[p_n(x)] = 0, we have the following additional constraints:
– sup_u φ(u) < ∞
– lim_{‖u‖→∞} φ(u) Π_j u_j = 0
– lim_{n→∞} V_n = 0
– lim_{n→∞} n V_n = ∞
Illustrations
– [Figures: Parzen-window estimates, one-dimensional case.]
– [Figure: Parzen-window estimate, two-dimensional case.]
Classification Example
[Figure: decision boundaries obtained with a smaller window vs. a larger window.]
Choosing the Window Function
– V_n must approach zero as n → ∞, but at a rate slower than 1/n, e.g., V_n = V_1/√n.
– The value of the initial volume V_1 is important.
– In some cases, a cell volume that is proper for one region may be unsuitable in a different region.
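In terms of the parzen_estimate sketch above (an illustrative correspondence, not from the slides): since V_n = h^d, the choice V_n = V_1/√n amounts to h = h1 / np.sqrt(n) in one dimension, or more generally h = h1 * n ** (-1 / (2 * d)).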
PNN (Probabilistic Neural Network)
The Parzen-window classifier can be implemented as a feedforward network: one pattern unit per stored sample and one category unit per class.
Each pattern unit evaluates a Gaussian window centered at its stored sample w_k. Expanding the exponent,
exp(−‖x − w_k‖² / (2σ²)) = exp(−(x^T x − 2 x^T w_k + w_k^T w_k) / (2σ²)),
the x^T x term is the same for every unit, and the w_k^T w_k term is a constant bias when the stored vectors are normalized; both are irrelevant for discriminant analysis. This leaves the activation a_k = exp((net_k − 1) / σ²) with net_k = w_k^T x.
[Figure: a single pattern unit. The inputs x_1, …, x_d are weighted by w_k1, …, w_kd to form net_k = w_k^T x; the unit outputs the activation a_k(net_k).]
PNN (Probabilistic Neural Network)
[Figure: the complete network. The inputs x_1, …, x_d feed all pattern units; each category unit ω_1, …, ω_c sums the activations of its own class's pattern units and emits an output o_1, …, o_c.]
Assign patterns to the class with the maximum output value.
1. What are the pros and cons of the approach?
2. How can prior probabilities be dealt with?
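A compact sketch of this classifier in Python (class and method names are illustrative; σ is the window width assumed above):

```python
import numpy as np

class PNN:
    """Probabilistic neural network: one Gaussian pattern unit per sample."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def train(self, X, y):
        # Pattern-unit weights: the normalized training samples, w_k = x_k / |x_k|
        self.W = X / np.linalg.norm(X, axis=1, keepdims=True)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)

    def classify(self, x):
        x = x / np.linalg.norm(x)
        net = self.W @ x                              # net_k = w_k^T x
        a = np.exp((net - 1.0) / self.sigma ** 2)     # a_k = exp((net_k - 1)/sigma^2)
        # Category units: sum each class's pattern-unit activations
        o = np.array([a[self.y == c].sum() for c in self.classes])
        return self.classes[np.argmax(o)]
```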
Non-Parametric Estimation: k_n-Nearest-Neighbor Estimation
Basic Concept
Let the cell volume depend on the training data. To estimate p(x), center a cell about x and let it grow until it captures k_n samples, where k_n is some specified function of n, e.g., k_n = √n. With V_n the volume the cell has reached, the estimate is again p_n(x) = k_n / (n V_n).
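A minimal sketch of this estimator in Python, assuming a spherical cell (names are illustrative):

```python
import numpy as np
from math import gamma, pi

def knn_density(x, samples, k=None):
    """k_n-nearest-neighbor density estimate p_n(x) = k_n / (n V_n)."""
    n, d = samples.shape
    if k is None:
        k = max(1, int(np.sqrt(n)))          # e.g., k_n = sqrt(n)
    # Grow a ball around x until it captures k samples:
    # its radius is the distance to the k-th nearest sample.
    r = np.sort(np.linalg.norm(samples - x, axis=1))[k - 1]
    v = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d   # volume of a d-ball of radius r
    return k / (n * v)
```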
Example
– [Figure: k_n-nearest-neighbor estimate with k_n = 5.]
– [Figure: further k_n-nearest-neighbor estimates.]
Estimation of A Posteriori Probabilities
Center a cell of volume V_n at x and capture k_n samples, k_i of which turn out to be labeled ω_i. The joint estimate is p_n(x, ω_i) = k_i / (n V_n), so
P_n(ω_i|x) = p_n(x, ω_i) / Σ_j p_n(x, ω_j) = k_i / k_n.
The value of V_n or k_n can be determined based on the Parzen-window or the k_n-nearest-neighbor technique.
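For example (hypothetical counts): if the cell captures k_n = 5 samples and 3 of them belong to ω_1, then P_n(ω_1|x) = 3/5.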
Non-Parametric Estimation: Classification Techniques
– The Nearest-Neighbor Rule
– The k-Nearest-Neighbor Rule
The Nearest-Neighbor Rule
Given a set of labeled prototypes, let x' denote the prototype nearest to a test point x. The rule classifies x with the label of x': if x' belongs to ω_i, assign x to ω_i.
The Nearest-Neighbor Rule
The prototypes induce a Voronoi tessellation of the feature space: each cell contains the points closer to its prototype than to any other prototype, and every point in a cell receives that prototype's label.
Error Rate
Bayesian (optimum): at a point x, the minimum achievable conditional error is P*(e|x) = 1 − max_i P(ω_i|x), and the Bayes rate is P* = ∫ P*(e|x) p(x) dx.
Error Rate: 1-NN
Suppose the true class of x is ω, and let x' be its nearest prototype. Conditioned on x and x', an error occurs exactly when the classes drawn at x and x' differ:
P(e|x, x') = 1 − Σ_i P(ω_i|x) P(ω_i|x').
What happens as n → ∞? The nearest prototype x' converges to x, so the asymptotic conditional error of the 1-NN rule is
P(e|x) = 1 − Σ_i P²(ω_i|x),
to be compared with the optimum P*(e|x) = 1 − max_i P(ω_i|x).
Error Bounds: 1-NN vs. Bayesian
How does the asymptotic 1-NN rate P = E[1 − Σ_i P²(ω_i|x)] compare with the Bayes rate P*?
– Consider the most confusing classification case, P(ω_i|x) = 1/c for every class: both rules then err at rate 1 − 1/c, so the two rates can coincide and the lower bound P ≥ P* is tight.
– For the opposite direction, split the sum as Σ_i P²(ω_i|x) = P²(ω_m|x) + Σ_{i≠m} P²(ω_i|x), where ω_m maximizes the posterior. To maximize the 1-NN error term we minimize this sum; the second part is minimized when all the non-maximum posteriors are equal, P(ω_i|x) = P*(e|x)/(c − 1), which yields
1 − Σ_i P²(ω_i|x) ≤ 2P*(e|x) − (c/(c − 1)) P*²(e|x).
Taking expectations gives the bounds
P* ≤ P ≤ P*(2 − (c/(c − 1)) P*).
The nearest-neighbor rule is a suboptimal procedure, but its asymptotic error rate is never worse than twice the Bayes rate.
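As a numeric check: with c = 2 classes and P* = 0.1, the upper bound is 0.1 × (2 − 2 × 0.1) = 0.18, slightly below the loose bound 2P* = 0.2.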
Error Bounds
[Figure: the asymptotic 1-NN error rate as a function of the Bayes rate, lying between P* and the upper bound above.]
The k-Nearest-Neighbor Rule
Assign the pattern to the class that wins the majority vote among its k nearest prototypes (illustrated for k = 5).
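A minimal sketch of the rule in Python (names are illustrative; k = 1 recovers the plain nearest-neighbor rule):

```python
import numpy as np
from collections import Counter

def knn_classify(x, prototypes, labels, k=5):
    """k-NN rule: majority vote among the k prototypes nearest to x."""
    dist = np.linalg.norm(prototypes - x, axis=1)   # distances to all prototypes
    nearest = np.argsort(dist)[:k]                  # indices of the k nearest
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```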
Error Bounds
[Figure: error bounds for the k-nearest-neighbor rule; the bounds tighten toward the Bayes rate as k increases.]
Computational Complexity
The computational complexity of the nearest-neighbor algorithm (both in time and space) has received a great deal of analysis.
– Storing the n prototypes of a training set requires O(dn) space. Remedies: editing, pruning, or condensing.
– Searching the nearest neighbor of a d-dimensional test point x takes O(dn) time. Remedies: partial distance, search trees.
Partial Distance
Use the following fact to throw away far-off prototypes early: the partial squared distance D_r(a, b)² = Σ_{k=1}^{r} (a_k − b_k)², r ≤ d, is nondecreasing in r. Hence, once the partial distance to a prototype exceeds the best full distance found so far, the remaining dimensions need not be examined.
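A sketch of the pruned search in Python (names are illustrative):

```python
import numpy as np

def nn_partial_distance(x, prototypes):
    """Nearest-neighbor search with partial-distance pruning."""
    best_idx, best_d2 = -1, np.inf
    for i, p in enumerate(prototypes):
        d2 = 0.0
        for a, b in zip(x, p):
            d2 += (a - b) ** 2
            if d2 >= best_d2:          # partial distance already too large:
                break                  # abandon this prototype early
        else:                          # survived all d dimensions: new best
            best_idx, best_d2 = i, d2
    return best_idx, np.sqrt(best_d2)
```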
Editing Nearest Neighbor Given a set of points, a Voronoi diagram is a partition of space into regions, within which all points are closer to some particular node than to any other node.
Delaunay Triangulation If two Voronoi regions share a boundary, the nodes of these regions are connected with an edge. Such nodes are called the Voronoi neighbors (or Delaunay neighbors).
The Decision Boundary The circled prototypes are redundant.
The Edited Training Set
Editing: The Voronoi Diagram Approach
1. Compute the Delaunay triangulation of the training set.
2. Visit each node, marking it if all of its Delaunay neighbors belong to the same class as the node itself.
3. Delete all marked nodes, exiting with the remaining ones as the edited training set.
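A sketch of this editing procedure in Python using SciPy's Delaunay triangulation (the function name is illustrative; keeping a node iff some neighbor has a different class is equivalent to steps 2-3 above):

```python
import numpy as np
from scipy.spatial import Delaunay

def voronoi_edit(X, y):
    """Delete every prototype whose Delaunay neighbors all share its class."""
    y = np.asarray(y)
    tri = Delaunay(X)
    indptr, indices = tri.vertex_neighbor_vertices
    keep = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        neighbors = indices[indptr[i]:indptr[i + 1]]  # Delaunay neighbors of node i
        keep[i] = np.any(y[neighbors] != y[i])        # keep boundary-shaping nodes
    return X[keep], y[keep]
```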
Editing: Other Approaches
– The Gabriel graph approach
– The relative neighborhood graph approach
References:
– B. K. Bhattacharya, R. S. Poulsen, G. T. Toussaint, "Application of Proximity Graphs to Editing Nearest Neighbor Decision Rule", International Symposium on Information Theory, Santa Monica, 1981.
– T. M. Cover, P. E. Hart, "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory, vol. IT-13, no. 1, 1967, pp. 21-27.
– V. Klee, "On the Complexity of d-dimensional Voronoi Diagrams", Arch. Math., vol. 34, 1980, pp. 75-80.
– G. T. Toussaint, "The Relative Neighborhood Graph of a Finite Planar Set", Pattern Recognition, vol. 12, no. 4, 1980, pp. 261-268.
Non-Parametric Estimation: Distance Metrics
Nearest-Neighbor Classifier
The distance measure is an important factor for the nearest-neighbor classifier, e.g., for achieving invariant pattern recognition.
– [Figure: the effect of a change of units on which prototype is nearest.]
– [Figure: the effect of translation on which prototype is nearest.]
Properties of a Distance Metric
– Nonnegativity: D(a, b) ≥ 0
– Reflexivity: D(a, b) = 0 if and only if a = b
– Symmetry: D(a, b) = D(b, a)
– Triangle inequality: D(a, b) + D(b, c) ≥ D(a, c)
Minkowski Metric (L_p Norm)
L_p(a, b) = (Σ_{k=1}^{d} |a_k − b_k|^p)^{1/p}
1. L_1 norm: Manhattan or city-block distance.
2. L_2 norm: Euclidean distance.
3. L_∞ norm: chessboard (Chebyshev) distance, max_k |a_k − b_k|.
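A one-function sketch in Python (the name is illustrative):

```python
import numpy as np

def minkowski(a, b, p):
    """L_p distance between vectors a and b; p = np.inf gives the L_inf norm."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    if np.isinf(p):
        return diff.max()                    # chessboard distance
    return (diff ** p).sum() ** (1.0 / p)
```

For a = (0, 0) and b = (3, 4) this gives 7 for p = 1, 5 for p = 2, and 4 for p = np.inf.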
Minkowski Metric (L_p Norm)
[Figure: unit "circles" of the L_p norm for several values of p.]