1 Approximate NN queries on Streams with Guaranteed Error/performance Bounds Nick Koudas @ AT&T labs-research Beng Chin Ooi, Kian-Lee Tan, Rui Zhang @ National University of Singapore

2 Problem Problem: kNN search. Environment: data stream (one scan; memory constraint). Approximate Solution: e-approximate kNN (ekNN). Motivation: Applications in which absolute error is preferable or more straightforward. IP: 137.132.48.120 137.132.48.121 …

3 Two Optimization Problems: –memory optimization for a given error bound: given an error bound e, use as little memory as possible to answer ekNN queries. –error minimization for a given memory size: given a fixed amount of memory, achieve the best accuracy for ekNN queries. Requirements: –One scan algorithm. –Satisfies the constraints. –Efficient updates and query processing.

4 A Framework Divide the space into equal square-shaped cells. Maintain at most K points in each cell. For any k ≤ K, the absolute error of the kNN distance is bounded by $d_M$, the maximum distance within a cell. For Euclidean distance in a unit data space: $d_M = \sqrt{d}/u$, where d is the dimensionality and u is the number of cells each dimension is divided into.
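The grid bound on this slide can be illustrated with a small sketch. It assumes the data space is the unit hypercube, so each cell has side 1/u and its Euclidean diagonal is sqrt(d)/u. The function names `max_cell_distance` and `cell_of` are hypothetical and do not come from the slides.

```python
import math

def max_cell_distance(d: int, u: int) -> float:
    """Euclidean diagonal of one cell, assuming a unit data space
    split into u cells per dimension (cell side = 1/u)."""
    return math.sqrt(d) / u

def cell_of(point, u):
    """Integer grid coordinates of the cell containing `point`.
    Coordinates equal to 1.0 are clamped into the last cell."""
    return tuple(min(int(x * u), u - 1) for x in point)

# Two points in the same cell are never farther apart than the bound.
a, b = (0.26, 0.30), (0.49, 0.49)
same_cell = cell_of(a, 4) == cell_of(b, 4)
bound_holds = math.dist(a, b) <= max_cell_distance(2, 4)
```

Doubling u halves the bound, which is why a finer grid trades memory for accuracy.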

5 Maintenance of the Points: aDaptive Indexing on Streams by space-filling Curves (DISC) Cells are not explicitly maintained, only points. Cells are linearized according to the Z-curve. The Z-value of a point's cell is the key of that point. Points are maintained in a B*-tree. This makes an efficient merge-cell algorithm possible.
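One common way to compute a Z-value is to interleave the bits of the cell coordinates (Morton order). The slides do not specify the bit layout, so the layout below and the name `z_value` are assumptions made for illustration only.

```python
def z_value(cell, m):
    """Interleave the m low bits of each cell coordinate into one integer.
    Bit `b` of dimension `dim` lands at position b * d + dim (assumed layout)."""
    d = len(cell)
    z = 0
    for bit in range(m):
        for dim, c in enumerate(cell):
            z |= ((c >> bit) & 1) << (bit * d + dim)
    return z

# Keys for the four cells of a 2 x 2 grid (m = 1), in Z order.
keys = sorted(z_value((x, y), 1) for x in range(2) for y in range(2))
```

Because the key is a single integer, any ordered one-dimensional index can store the points; the slides use a B*-tree for this.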

6 Algorithm: Build index m: the order of the Z-curve, giving $2^m$ cells per dimension ($u = 2^m$). If e is given, requiring $d_M = \sqrt{d}/2^m \le e$ gives $m \ge \log_2(\sqrt{d}/e)$; m is an integer, so $m = \lceil \log_2(\sqrt{d}/e) \rceil$. If a memory constraint is given instead, start with a sufficiently large m. Build index: –Step 1: Initialize m. –Step 2: Read a record P, calculate its Z-value, search the B*-tree, and find $N_c$, the number of existing points in the cell P belongs to. –Step 3: If $N_c < K$, insert P into the B*-tree; else discard one point and insert P. –Step 4: If memory runs out (this only happens for the error minimization problem), merge cells and set m = m − 1. –Step 5: Go back to Step 2 (read the next record).
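The build steps for the case where e is given can be sketched as follows. This is a minimal illustration under several assumptions: the data space is the unit hypercube, so the cell diagonal is sqrt(d)/2^m; a dictionary keyed by Z-value stands in for the B*-tree; and the oldest point in a full cell is the one discarded (the slide does not say which point to drop). The memory-overflow branch is omitted. All function names are hypothetical.

```python
import math
from collections import defaultdict

def order_for_error(e, d):
    """Smallest integer m >= 0 with sqrt(d) / 2**m <= e (unit space assumed)."""
    return max(0, math.ceil(math.log2(math.sqrt(d) / e)))

def z_key(point, m):
    """Z-value of the cell containing `point`, with 2**m cells per dimension."""
    u, d = 2 ** m, len(point)
    cell = [min(int(x * u), u - 1) for x in point]
    z = 0
    for bit in range(m):
        for dim, c in enumerate(cell):
            z |= ((c >> bit) & 1) << (bit * d + dim)
    return z

def build_index(stream, e, d, K):
    """One pass over the stream, keeping at most K points per cell."""
    m = order_for_error(e, d)
    index = defaultdict(list)      # stand-in for the B*-tree: Z-value -> points
    for p in stream:
        bucket = index[z_key(p, m)]
        if len(bucket) >= K:
            bucket.pop(0)          # assumption: discard the oldest point
        bucket.append(p)
    return m, index

stream = [(0.10, 0.10), (0.11, 0.12), (0.12, 0.11), (0.90, 0.90)]
m, index = build_index(stream, e=0.5, d=2, K=2)
```

With e = 0.5 and d = 2 the sketch picks m = 2, so the first three points fall into the same cell and only the two most recent are kept.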

7 Algorithm: Merge Cells General Merge-Cell –Applies to any index structure. –For each new cell, find all the points of the old cells it covers, and merge them. Bulk Merge-Cell –Applies only to DISC. –Scans all the leaf pages once.
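A single-scan merge is possible because the cells that combine into one coarser cell are adjacent in Z order. The sketch below assumes Z-values are built by bit interleaving with the lowest d bits holding the lowest bit of each coordinate, so the key of the coarser cell is the old key shifted right by d bits. A sorted list of (key, point) pairs stands in for the B*-tree leaf pages, and keeping the first K points per merged cell is an arbitrary choice. The name `bulk_merge` is hypothetical.

```python
from itertools import groupby

def bulk_merge(entries, d, K):
    """Merge cells from order m to order m - 1 in one sequential scan.

    entries: list of (z, point) pairs sorted by z (stand-in for leaf pages).
    Returns the new list, still sorted, with at most K points per cell."""
    merged = []
    for parent, group in groupby(entries, key=lambda entry: entry[0] >> d):
        kept = [p for _, p in group][:K]   # assumption: keep the first K points
        merged.extend((parent, p) for p in kept)
    return merged

leaves = [(0, "a"), (1, "b"), (3, "c"), (4, "d"), (7, "e")]
coarser = bulk_merge(leaves, d=2, K=2)
```

In this example keys 0, 1, and 3 collapse into new cell 0, and keys 4 and 7 collapse into new cell 1.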

8 Algorithm: kNN search W: a window query centered at the center of the cell containing the query point Q, with gradually increasing side length s. Find the kNN of Q within W. –If the kNN distance is no larger than the distance from Q to the nearest side of W, the search terminates; –Else increase s by 1/u and repeat.
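The expanding-window search can be sketched as below. The sketch assumes a unit data space, starts with s = 1/u (the slide does not give the initial value), and filters a plain list of points where the slides would run a window query on the B*-tree. It also stops once the window covers the whole space, an added safeguard for the case of fewer than k stored points. The name `knn_search` is hypothetical.

```python
import math

def knn_search(points, q, k, u):
    """Return up to k stored points nearest to q, using a growing window W."""
    # Center of the cell that contains q.
    center = [(min(int(x * u), u - 1) + 0.5) / u for x in q]
    s = 1.0 / u                                  # assumed initial side length
    while True:
        lo = [c - s / 2 for c in center]
        hi = [c + s / 2 for c in center]
        inside = [p for p in points
                  if all(l <= x <= h for x, l, h in zip(p, lo, hi))]
        inside.sort(key=lambda p: math.dist(p, q))
        best = inside[:k]
        # Distance from q to the nearest side of W.
        margin = min(min(x - l, h - x) for x, l, h in zip(q, lo, hi))
        covers_all = all(l <= 0.0 and h >= 1.0 for l, h in zip(lo, hi))
        if covers_all or (len(best) == k and math.dist(best[-1], q) <= margin):
            return best
        s += 1.0 / u                             # grow the window and retry

pts = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.5, 0.5)]
two = knn_search(pts, (0.15, 0.15), k=2, u=4)
three = knn_search(pts, (0.15, 0.15), k=3, u=4)
```

The termination test works because a ball around Q with radius no larger than the distance to the nearest side of W lies entirely inside W, so no closer stored point can be outside the window.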

9 Experiments

10 Questions ?

