Presentation on theme: "Non-Parametric Estimation" — Presentation transcript:

1 Non-Parametric Estimation
Lecturer: 虞台文

2 Contents Introduction Parzen Windows kn-Nearest-Neighbor Estimation
Classification Techniques The Nearest-Neighbor Rule (1-NN) The k-Nearest-Neighbor Rule (k-NN) Distance Metrics

3 Non-Parametric Estimation
Introduction

4 Facts Classical parametric densities are unimodal.
Many practical problems involve multimodal densities. Common parametric forms rarely fit the densities actually encountered in practice.

5 Goals Estimate class-conditional densities
Estimate posterior probabilities

6 Density Estimation
Assume p(x) is continuous and the region R is small. Randomly draw n samples and let k denote the number of samples that fall inside R.

7 Density Estimation
Assume p(x) is continuous and R is small. Let kR denote the number of samples that fall in R.

8 Density Estimation
Use the subscript n to take the sample size into account: from n samples, form a region Rn with volume Vn that captures kn samples. We hope pn(x) → p(x); for this we should have (1) Vn → 0, (2) kn → ∞, and (3) kn/n → 0 as n → ∞. Which items can be controlled, and how?
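In symbols, the basic estimate these slides build toward (reconstructed here, since the original equations were images):

$$ p_n(x) \approx \frac{k_n / n}{V_n}, \qquad \text{with}\quad \lim_{n\to\infty} V_n = 0,\quad \lim_{n\to\infty} k_n = \infty,\quad \lim_{n\to\infty} k_n / n = 0. $$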

9 Two Approaches
Which items can be controlled, and how? Parzen windows: control Vn (fix the cell volume as a function of n and let the data determine kn). kn-nearest-neighbor: control kn (fix kn and let the data determine the cell volume Vn).

10 Two Approaches Parzen Windows kn-Nearest-Neighbor

11 Non-Parametric Estimation
Parzen Windows

12 Window Function
The unit hypercube window: φ(u) = 1 if |uj| ≤ 1/2 for all j = 1, …, d, and 0 otherwise.

13 Window Function
(Figure: the window function scaled to width hn.)

14 Window Function
(Figure: the hn-wide window centered at a point x, covering nearby samples.)

15 Parzen-Window Estimation
kn: the number of samples falling inside the hypercube of side hn centered at x.
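Written out, the hypercube-window form of the estimate (the standard Parzen-window expressions, reconstructed because the slide's equations were images):

$$ \varphi(u) = \begin{cases} 1, & |u_j| \le \tfrac{1}{2},\ j = 1,\dots,d \\ 0, & \text{otherwise,} \end{cases} \qquad k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right), \qquad p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right), \quad V_n = h_n^{d}. $$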

16 Generalization
Requirement: the window need not be a hypercube; any non-negative window function that integrates to one gives a legitimate density estimate. Set x/hn = u. hn is an important parameter, and it should depend on the sample size.
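As a concrete illustration of the general window estimate, here is a minimal sketch of a Parzen-window density estimator with a Gaussian window and the common schedule hn = h1/√n; the function name, the schedule, and the example data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def parzen_estimate(x, samples, h1=1.0):
    """Parzen-window density estimate at query point(s) x.

    Uses a d-dimensional Gaussian window and the schedule h_n = h1 / sqrt(n).
    `samples` is an (n, d) array of training points.
    """
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    hn = h1 / np.sqrt(n)                      # shrink the window as n grows
    x = np.atleast_2d(x)                      # (m, d) query points
    # Squared distances between every query point and every sample: shape (m, n)
    sq = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian window phi(u) = exp(-||u||^2 / 2) / (2*pi)^(d/2), with u = (x - xi)/hn
    phi = np.exp(-sq / (2 * hn ** 2)) / ((2 * np.pi) ** (d / 2))
    # p_n(x) = (1/n) * sum_i phi(...) / V_n, with V_n = hn^d
    return phi.sum(axis=1) / (n * hn ** d)

# Example: estimate a bimodal 1-D density from 200 samples
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100)])[:, None]
print(parzen_estimate(np.array([[0.0], [2.0]]), data, h1=2.0))
```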

17 Interpolation Parameter
As hn → 0, δn(x) approaches a Dirac delta function.

18 Example Parzen-window estimations for five samples

19 Convergence Conditions
To assure convergence, i.e., that the mean of pn(x) converges to p(x) and that its variance converges to zero, we have the following additional constraints:
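The constraints referred to here are, in the standard form (the slide's own equations were images):

$$ \sup_{u}\,\varphi(u) < \infty, \qquad \lim_{\|u\|\to\infty} \varphi(u) \prod_{j=1}^{d} u_j = 0, \qquad \lim_{n\to\infty} V_n = 0, \qquad \lim_{n\to\infty} n V_n = \infty. $$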

20 Illustrations One-dimensional case:

21 Illustrations One-dimensional case:

22 Illustrations Two-dimensional case:

23 Classification Example
(Figure: decision regions obtained with a smaller window vs. a larger window.)

24 Choosing the Window Function
Vn must approach zero as n → ∞, but at a rate slower than 1/n, e.g., Vn = V1/√n. The value of the initial volume V1 is important. In some cases, a cell volume that is proper for one region may be unsuitable in a different region.

25 PNN (Probabilistic Neural Network)

26 PNN (Probabilistic Neural Network)
(Derivation of the pattern-unit activation: the terms that are constant across classes are irrelevant for discriminant analysis, leaving a net activation plus a bias term.)

27 PNN (Probabilistic Neural Network)
(Figure: a pattern unit k with inputs x1, x2, …, xd, weights wk1, wk2, …, wkd forming netk = wk·x, and activation ak(netk).)

28 PNN (Probabilistic Neural Network)
(Figure: network architecture with input units x1, x2, …, xd and category output units o1, o2, …, oc.)

29 PNN (Probabilistic Neural Network)
Assign patterns to the class with the maximum output value. Questions: 1. What are the pros and cons of the approach? 2. How should prior probabilities be dealt with?
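A minimal sketch of the decision rule just described: each stored prototype acts as a Gaussian pattern unit, the summation units add activations per class, and the pattern goes to the class with the maximum output. The function name and the spread sigma are illustrative assumptions; priors, if known, could weight the class sums.

```python
import numpy as np

def pnn_classify(x, prototypes, labels, sigma=0.5):
    """PNN-style classification.

    Each prototype acts as a pattern unit with activation
    exp(-||x - w_k||^2 / (2*sigma^2)); the summation units add the
    activations per class, and the pattern is assigned to the class
    with the maximum output value.
    """
    prototypes = np.asarray(prototypes, dtype=float)
    labels = np.asarray(labels)
    acts = np.exp(-((prototypes - x) ** 2).sum(axis=1) / (2 * sigma ** 2))
    classes = np.unique(labels)
    outputs = np.array([acts[labels == c].sum() for c in classes])
    return classes[np.argmax(outputs)]

# Tiny usage example with two classes
protos = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
labs = [0, 0, 1, 1]
print(pnn_classify(np.array([0.1, 0.0]), protos, labs))   # expected: class 0
```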

30 Non-Parametric Estimation
kn-Nearest-Neighbor Estimation

31 Basic Concept
Let the cell volume depend on the training data. To estimate p(x), we can center a cell about x and let it grow until it captures kn samples, where kn is some specified function of n, e.g., kn = √n.
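A sketch of this idea: grow a ball around x until it captures kn samples and divide kn/n by the ball's volume. The rule kn = round(√n) and the helper name are illustrative assumptions.

```python
import numpy as np
from math import gamma, pi, sqrt

def knn_density(x, samples):
    """k_n-nearest-neighbor density estimate p_n(x) = k_n / (n * V_n).

    The cell is a d-dimensional ball centered at x whose radius is the
    distance to the k_n-th nearest sample, with k_n = round(sqrt(n)).
    """
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    kn = max(1, int(round(sqrt(n))))
    dists = np.sort(np.linalg.norm(samples - x, axis=1))
    r = dists[kn - 1]                                  # radius reaching the k_n-th sample
    vol = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d    # volume of a d-ball
    return kn / (n * vol)

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=(400, 2))
print(knn_density(np.zeros(2), data))   # roughly 1/(2*pi) ≈ 0.16 near the mode
```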

32 Example kn=5

33 Example

34 Estimation of A Posteriori Probabilities
Pn(ωi|x) = ?

35 Estimation of A Posteriori Probabilities
Pn(ωi|x) = ? Center a cell of volume Vn about x; if it captures k samples, ki of which are labeled ωi, the natural estimate is Pn(ωi|x) = ki/k.

36 Estimation of A Posteriori Probabilities
The value of Vn or kn can be determined based on the Parzen-window or the kn-nearest-neighbor technique, respectively.
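In symbols (the standard result behind these slides; the original equations were images): if a cell of volume Vn centered at x captures k samples, ki of which are labeled ωi, then

$$ P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{k_i}{k}. $$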

37 Non-Parametric Estimation
Classification Techniques The Nearest-Neighbor Rule The k-Nearest-Neighbor Rule

38 The Nearest-Neighbor Rule
Given a set of labeled prototypes, let x′ be the prototype nearest to the test point x; classify x with the same label as x′.
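A minimal sketch of the rule, using Euclidean distance; the function name and the toy prototypes are illustrative.

```python
import numpy as np

def nn_classify(x, prototypes, labels):
    """1-NN rule: return the label of the prototype nearest to x."""
    prototypes = np.asarray(prototypes, dtype=float)
    dists = np.linalg.norm(prototypes - x, axis=1)   # Euclidean distances to all prototypes
    return labels[int(np.argmin(dists))]

protos = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
labs = ["a", "b", "a"]
print(nn_classify(np.array([0.9, 0.8]), protos, labs))   # expected: "b"
```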

39 The Nearest-Neighbor Rule
Voronoi Tessellation

40 Optimum: Error Rate
The Bayesian (optimum) conditional error rate at x.
41 Optimum: Error Rate (1-NN)
Suppose the true class of x is θ and x′ is its nearest labeled prototype.
42 Optimum: Error Rate (1-NN)
As n → ∞, the nearest prototype x′ converges to x.

43 Optimum: Error Rate (1-NN)
The asymptotic conditional error rate of the 1-NN rule at x (see the formulas below).
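The quantities being compared on these slides, in the standard Cover and Hart form (reconstructed, since the equations were images): the Bayes conditional error and the asymptotic conditional error of the 1-NN rule,

$$ P^{*}(e \mid x) = 1 - \max_{i} P(\omega_i \mid x), \qquad \lim_{n\to\infty} P_n(e \mid x) = 1 - \sum_{i=1}^{c} P^{2}(\omega_i \mid x). $$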

44 Error Bounds: Bayesian vs. 1-NN
Consider the most complex classification case:

45 Error Bounds: Bayesian vs. 1-NN
Consider the opposite case: to find the upper bound we maximize the 1-NN error term, which means minimizing Σi P²(ωi|x); for a fixed Bayes error this sum is smallest when all the non-maximal posteriors take the same value, i.e., P(ωi|x) = P*(e|x)/(c−1) for i ≠ m.

46 Error Bounds: Bayesian vs. 1-NN
Consider the opposite case:

47 Error Bounds: Bayesian vs. 1-NN
The nearest-neighbor rule is a suboptimal procedure, but its asymptotic error rate is never worse than twice the Bayes rate.
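The resulting bounds on the asymptotic 1-NN error rate P in terms of the Bayes rate P* for a c-class problem (standard Cover and Hart result):

$$ P^{*} \;\le\; P \;\le\; P^{*}\!\left(2 - \frac{c}{c-1}\,P^{*}\right) \;\le\; 2P^{*}. $$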

48 Error Bounds

49 The k-Nearest-Neighbor Rule
Assign a pattern to the class that wins the majority vote among its k nearest neighbors.
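A short sketch of the rule: take the k nearest prototypes and return the majority label. The value k = 3 and the names are illustrative; ties fall to whichever majority class Counter lists first.

```python
import numpy as np
from collections import Counter

def knn_classify(x, prototypes, labels, k=3):
    """k-NN rule: majority vote among the k nearest prototypes."""
    prototypes = np.asarray(prototypes, dtype=float)
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest prototypes
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

protos = np.array([[0, 0], [0.1, 0.2], [1, 1], [1.1, 0.9], [0.9, 1.2]])
labs = ["a", "a", "b", "b", "b"]
print(knn_classify(np.array([0.8, 0.8]), protos, labs, k=3))   # expected: "b"
```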

50 Error Bounds

51 Computational Complexity
The computational complexity of the nearest-neighbor algorithm (in both time and space) has received a great deal of analysis. Storing the n prototypes of a training set requires O(dn) space; remedies include editing, pruning, or condensing. Searching for the nearest neighbor of a d-dimensional test point x takes O(dn) time; remedies include partial distance and search trees.

52 Partial Distance
Use the following fact to discard far-away prototypes early: the partial sum of squared coordinate differences can only grow as more coordinates are added, so once it exceeds the distance to the best prototype found so far, the candidate can be rejected.
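A sketch of partial-distance search based on that fact: the running sum of squared coordinate differences can only grow, so a candidate prototype is abandoned as soon as the sum exceeds the best squared distance found so far. Names and the toy data are illustrative.

```python
import numpy as np

def nn_partial_distance(x, prototypes):
    """Nearest-neighbor search with the partial-distance shortcut."""
    best_idx, best_sq = -1, np.inf
    for i, p in enumerate(prototypes):
        sq = 0.0
        for a, b in zip(x, p):
            sq += (a - b) ** 2
            if sq >= best_sq:          # early rejection: this prototype cannot win
                break
        else:                          # loop completed: new best prototype
            best_idx, best_sq = i, sq
    return best_idx, np.sqrt(best_sq)

protos = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [0.1, 0.1, 0.1]])
print(nn_partial_distance(np.array([0.1, 0.1, 0.0]), protos))   # nearest prototype is index 2
```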

53 Editing Nearest Neighbor
Given a set of points, a Voronoi diagram is a partition of space into regions, within which all points are closer to some particular node than to any other node.

54 Delaunay Triangulation
If two Voronoi regions share a boundary, the nodes of these regions are connected with an edge. Such nodes are called the Voronoi neighbors (or Delaunay neighbors).

55 The Decision Boundary The circled prototypes are redundant.

56 The Edited Training Set

57 Editing: The Voronoi Diagram Approach
1. Compute the Delaunay triangulation of the training set. 2. Visit each node, marking it if all of its Delaunay neighbors belong to the same class as the current node. 3. Delete all marked nodes, exiting with the remaining ones as the edited training set. (A sketch follows below.)
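A sketch of these three steps in two dimensions, assuming scipy.spatial.Delaunay is available; the function name and the synthetic data are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def voronoi_edit(points, labels):
    """Drop points whose Delaunay neighbors all share the point's own class."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    tri = Delaunay(points)
    neighbors = [set() for _ in range(len(points))]
    for simplex in tri.simplices:                    # collect Delaunay neighbors
        for i in simplex:
            neighbors[i].update(j for j in simplex if j != i)
    keep = [i for i, nb in enumerate(neighbors)
            if any(labels[j] != labels[i] for j in nb)]   # keep only boundary points
    return points[keep], labels[keep]

rng = np.random.default_rng(2)
pts = rng.uniform(size=(200, 2))
labs = (pts[:, 0] + pts[:, 1] > 1.0).astype(int)     # two classes split by a line
edited_pts, edited_labs = voronoi_edit(pts, labs)
print(len(pts), "->", len(edited_pts))               # interior points are removed
```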

58 Editing: Other Approaches
The Gabriel Graph Approach and the Relative Neighbour Graph Approach. References:
B. K. Bhattacharya, R. S. Poulsen, and G. T. Toussaint, "Application of Proximity Graphs to Editing Nearest Neighbor Decision Rule", International Symposium on Information Theory, Santa Monica, 1981.
T. M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory, vol. IT-13, no. 1, 1967.
V. Klee, "On the Complexity of d-Dimensional Voronoi Diagrams", Arch. Math., vol. 34, 1980.
G. T. Toussaint, "The Relative Neighborhood Graph of a Finite Planar Set", Pattern Recognition, vol. 12, no. 4, 1980.

59 Non-Parametric Estimation
Distance Metrics

60 Nearest-Neighbor Classifier
The distance measure is an important factor for the nearest-neighbor classifier; e.g., to achieve invariant pattern recognition, consider the effect of changing units.

61 Nearest-Neighbor Classifier
The distance measure is an important factor for the nearest-neighbor classifier; e.g., to achieve invariant pattern recognition, consider the effect of translation.

62 Properties of a Distance Metric
Nonnegativity: D(a, b) ≥ 0. Reflexivity: D(a, b) = 0 if and only if a = b. Symmetry: D(a, b) = D(b, a). Triangle inequality: D(a, b) + D(b, c) ≥ D(a, c).

63 Minkowski Metric (Lp Norm)
1. L1 norm: Manhattan or city-block distance. 2. L2 norm: Euclidean distance. 3. L∞ norm: chessboard (Chebyshev) distance.
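The general form (standard Lp definition):

$$ L_p(a, b) = \left( \sum_{j=1}^{d} |a_j - b_j|^{p} \right)^{1/p}, \qquad L_\infty(a, b) = \max_{j} |a_j - b_j|. $$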

64 Minkowski Metric (Lp Norm)

