1 New Algorithms for Efficient High-Dimensional Nonparametric Classification. Ting Liu, Andrew W. Moore, and Alexander Gray

2 Overview
- Introduction
  - k nearest neighbors (k-NN)
  - KNS1: conventional k-NN search
- New algorithms for k-NN classification
  - KNS2: for skewed-class data
  - KNS3: "are at least t of the k NN positive?"
- Results
- Comments

3 Introduction: k-NN
- k-NN is a nonparametric classification method: given a data set of n points, it finds the k points closest to a query point and predicts the label of the majority among them.
- Its computational cost is too high for many applications, especially in the high-dimensional case (a brute-force reference is sketched below).
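For concreteness, here is a minimal brute-force k-NN classifier as a Python/NumPy sketch; the names X, y, q, and k are illustrative, not from the slides. Its O(nd) distance work per query is what the ball-tree methods on the following slides try to avoid.

```python
import numpy as np

def knn_classify(X, y, q, k=9):
    """Majority vote among the k points of X closest to the query q."""
    dists = np.linalg.norm(X - q, axis=1)    # distance from q to all n points
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]         # label of the majority
```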

4 Introduction: KNS1
- KNS1: conventional k-NN search with a ball-tree.
- Ball-tree (binary):
  - The root node represents the full set of points.
  - A leaf node contains a small subset of the points.
  - A non-leaf node has two child nodes.
  - Pivot of a node: one of the points in the node, or the centroid of the points.
  - Radius of a node: the maximum distance from the pivot to any point in the node, i.e. Radius(Node) = max over x in Node of |x - Pivot(Node)| (construction sketched below).
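A minimal construction sketch matching these definitions. The slides do not fix a splitting rule, so this uses one common heuristic rather than necessarily the authors' choice: the centroid serves as the pivot, and nodes are split on the coordinate of greatest spread.

```python
import numpy as np

class BallNode:
    def __init__(self, points):
        self.points = points
        self.pivot = points.mean(axis=0)   # centroid as pivot (one of the two options)
        # radius = max distance from the pivot to any point in the node
        self.radius = np.linalg.norm(points - self.pivot, axis=1).max()
        self.children = []

def build_ball_tree(points, leaf_size=20):
    node = BallNode(points)
    if len(points) > leaf_size:
        spread = points.max(axis=0) - points.min(axis=0)
        d = np.argmax(spread)              # split on the widest dimension (a heuristic)
        order = np.argsort(points[:, d])
        mid = len(points) // 2
        node.children = [build_ball_tree(points[order[:mid]], leaf_size),
                         build_ball_tree(points[order[mid:]], leaf_size)]
    return node
```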

5 Introduction: KNS1
- Bound the distance from a query point q: for every point x inside a node, |q - Pivot| - Radius <= |x - q| <= |q - Pivot| + Radius, so an entire ball can be pruned once its lower bound exceeds the current k-th nearest distance (see the test below).
- Trade off the cost of construction against the tightness of the radii of the balls.
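The lower-bound half of this inequality is the entire pruning test; a one-function sketch, reusing the BallNode fields from the construction sketch above:

```python
import numpy as np

def min_possible_dist(q, node):
    """Lower bound: no point inside the ball can be closer to q than this."""
    return max(np.linalg.norm(q - node.pivot) - node.radius, 0.0)
```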

6 Introduction: KNS1
Recursive procedure: PSout = BallKNN(PSin, Node) (sketched below)
- PSin consists of the k-NN of q in V (the set of points searched so far).
- PSout consists of the k-NN of q in V together with the points in Node.
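A sketch of the recursion under the assumptions above; it reuses BallNode and min_possible_dist from the earlier sketches, and represents the candidate set as a sorted list of (distance, point) pairs rather than whatever structure the authors used.

```python
import numpy as np

def ball_knn(ps_in, node, q, k):
    """ps_in: k-NN of q among points searched so far, as sorted (dist, point) pairs.
    Returns the k-NN of q after also considering the points under `node`."""
    worst = ps_in[-1][0] if len(ps_in) == k else np.inf
    if min_possible_dist(q, node) >= worst:
        return ps_in                           # nothing in this ball can improve: prune
    if not node.children:                      # leaf: scan its points directly
        ps = ps_in
        for x in node.points:
            d = np.linalg.norm(q - x)
            if len(ps) < k or d < ps[-1][0]:
                ps = sorted(ps + [(d, tuple(x))])[:k]
        return ps
    # visit the child whose pivot is nearer to q first, so the bound
    # tightens before the farther child is examined (and often pruned)
    near, far = sorted(node.children, key=lambda c: np.linalg.norm(q - c.pivot))
    return ball_knn(ball_knn(ps_in, near, q, k), far, q, k)

# usage: tree = build_ball_tree(X); neighbors = ball_knn([], tree, q, k=9)
```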

7 KNS2
- KNS2: for skewed-class data, where one class is much more frequent than the other.
- Goal: find the number of the k NN that are positive without explicitly finding the k-NN set.
- Basic idea:
  - Build two ball-trees: Postree (small) and Negtree.
  - "Find positive": search Postree with KNS1 to find Posset_k, the k-NN set of q among the positive points.
  - "Insert negative": search Negtree, using Posset_k as bounds to prune faraway nodes and to count the negative points that would enter the true nearest-neighbor set.

8 KNS2
Definitions (a brute-force reference follows below):
- Dists = {Dist_1, ..., Dist_k}: the distances from q to its k nearest positive neighbors, sorted in increasing order.
- V: the set of points in the negative balls visited so far.
- (n, C): n is the number of positive points among the k NN of q; C = {C_1, ..., C_n}, where C_i is the number of negative points in V closer to q than the i-th positive neighbor.
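To see how these quantities answer the query, here is a brute-force reference (not the tree-based search; `pos` and `neg` are illustrative names, distances are assumed distinct, and C is computed for all k positive candidates rather than only the n that survive). The i-th positive neighbor belongs to the true k-NN set exactly when the (i - 1) positives and C_i negatives ahead of it number fewer than k.

```python
import numpy as np

def kns2_reference(pos, neg, q, k):
    """n = number of positive points among the k NN of q, via Dists and C only."""
    dists = np.sort(np.linalg.norm(pos - q, axis=1))[:k]   # Dist_1..Dist_k
    dneg = np.linalg.norm(neg - q, axis=1)
    C = np.array([(dneg < d).sum() for d in dists])        # C_i as defined above
    # i-th positive neighbor (0-indexed) is in the k-NN set iff i + C[i] < k
    return int(np.sum(np.arange(len(dists)) + C < k))
```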

9 KNS2
Step 2, "insert negative", is implemented by the recursive function (n_out, C_out) = NegCount(n_in, C_in, Node, j_parent, Dists):
- (n_in, C_in) summarizes the interesting negative points for V;
- (n_out, C_out) summarizes the interesting negative points for V together with Node (a simplified sketch follows below).
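A deliberately simplified sketch of this recursion, under the BallNode assumption from earlier. It applies only the coarse prune against Dist_k and updates C directly; the paper's NegCount additionally shrinks n, maintains the j_parent index, and stops early once the answer can no longer change, none of which is shown here.

```python
import numpy as np

def neg_count(C, node, q, dists):
    """C[i]: negatives seen so far that are closer to q than the (i+1)-th
    positive neighbor. Returns C updated with the negatives under `node`."""
    lo = max(np.linalg.norm(q - node.pivot) - node.radius, 0.0)
    if lo >= dists[-1]:
        return C                     # whole ball farther than Dist_k: prune it
    if not node.children:
        for x in node.points:
            d = np.linalg.norm(q - x)
            C = C + (d < dists)      # increment C_i for every i with d < Dist_i
        return C
    for child in sorted(node.children, key=lambda c: np.linalg.norm(q - c.pivot)):
        C = neg_count(C, child, q, dists)
    return C
```

Afterwards, n can be read off exactly as in the reference sketch above: count the indices i with i + C[i] < k.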

10 KNS3
- KNS3 answers: "are at least t of the k nearest neighbors positive?"
- No constraint on class skewness.
- Proposition: at least t of the k NN are positive if and only if the t-th nearest positive neighbor is closer to q than the m-th nearest negative neighbor, where m + t = k + 1 (checked brute-force below).
- Instead of computing these distances exactly, we compute lower and upper bounds on them.
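A brute-force check of the proposition (a sketch; `pos` and `neg` are illustrative names, and distances are assumed distinct):

```python
import numpy as np

def at_least_t_positive(pos, neg, q, k, t):
    """True iff at least t of the k NN of q are positive, by the proposition:
    Dist_t over positives < Dist_m over negatives, with m = k - t + 1."""
    m = k - t + 1
    dist_t_pos = np.sort(np.linalg.norm(pos - q, axis=1))[t - 1]
    dist_m_neg = np.sort(np.linalg.norm(neg - q, axis=1))[m - 1]
    return bool(dist_t_pos < dist_m_neg)
```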

11 KNS3
P is a set of balls from Postree and N is a set of balls from Negtree; their pivots and radii supply the lower and upper bounds on the two distances in the proposition.

12 Experimental results: real data sets.

13 Experimental results
- k = 9, t = ceil(k/2).
- Randomly pick 1% of the negative records and 50% of the positive records as the test set (986 points); train on the remaining 87,372 data points.

14 Comments
- Why k-NN? It is a useful baseline.
- No free lunch: for uniformly distributed high-dimensional data there is no benefit; the observed speedups suggest that the intrinsic dimensionality of the real data sets is much lower.

