Published by Frederica Barnett. Modified over 9 years ago.
1
Approximate NN Queries on Streams with Guaranteed Error/Performance Bounds
Nick Koudas @ AT&T Labs-Research
Beng Chin Ooi, Kian-Lee Tan, Rui Zhang @ National University of Singapore
2
Problem
Problem: kNN search.
Environment: data stream (one scan; memory constraint).
Approximate solution: e-approximate kNN (ekNN).
Motivation: applications in which absolute error is preferable or more straightforward.
IP: 137.132.48.120 137.132.48.121 …
3
Two Optimization Problems:
–Memory optimization for a given error bound: given an error bound e, use as little memory as possible to answer ekNN queries.
–Error minimization for a given memory size: given a fixed amount of memory, achieve the best accuracy for ekNN queries.
Requirements:
–One-scan algorithm.
–Satisfies the constraints.
–Efficient updates and query processing.
4
A Framework
Divide the space into equal square-shaped cells.
Maintain at most K points in each cell.
For any k ≤ K, the absolute error of the kNN distance is bounded by $d_M$, the maximum distance within a cell.
For Euclidean distance: $d_M = \sqrt{d}/u$, where d is the dimensionality and u is the number of cells each dimension is divided into (assuming the data space is normalized to the unit hypercube, so that each cell has side length 1/u).
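The bound above can be checked numerically. The following is a minimal sketch, assuming the data space is the unit hypercube [0, 1]^d; the function names `max_cell_distance` and `cell_of` are illustrative and do not come from the slides.

```python
import math

def max_cell_distance(d: int, u: int) -> float:
    """Length of a cell's diagonal when each of the d dimensions
    of the unit hypercube is divided into u equal parts."""
    return math.sqrt(d) / u

def cell_of(point, u):
    """Integer grid coordinates of the cell containing a point in [0, 1]^d.
    Points lying exactly on the upper boundary are clamped into the last cell."""
    return tuple(min(int(x * u), u - 1) for x in point)

# Two points that fall into the same cell are never farther apart than d_M.
p, q = (0.26, 0.30), (0.49, 0.49)
same_cell = cell_of(p, 4) == cell_of(q, 4)
within_bound = math.dist(p, q) <= max_cell_distance(2, 4)
```

Because a discarded point always shares its cell with K retained points, each of them lies within this distance of it, which is where the absolute error bound comes from.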
5
Maintenance of the Points: aDaptive Indexing on Streams by space-filling Curves (DISC)
Cells are not explicitly maintained, only points.
Cells are linearized according to the Z-curve.
The Z-value of its cell is the key of a point.
Points are maintained in a B*-tree.
An efficient merge-cell algorithm is possible.
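A Z-value is commonly computed by interleaving the bits of the cell coordinates. The sketch below shows that standard construction; the bit layout (the lowest d bits hold the finest-level bit of each dimension) is an assumption, since the slides do not spell out the encoding, and `z_value` is an illustrative name.

```python
def z_value(cell, m):
    """Interleave the m low-order bits of each cell coordinate into one integer.

    cell: tuple of integer grid coordinates, each in [0, 2**m).
    m:    order of the Z-curve (2**m cells per dimension).
    """
    d = len(cell)
    z = 0
    for bit in range(m):                 # from least to most significant bit
        for dim, c in enumerate(cell):
            z |= ((c >> bit) & 1) << (bit * d + dim)
    return z
```

With this layout, dropping the lowest d bits of a key (`z >> d`) yields the Z-value of the enclosing cell at order m-1, which is what makes merging cells cheap on a structure sorted by key.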
6
Algorithm: Build Index
m: the order of the Z-curve; $2^m$ cells in each dimension.
If e is given, requiring $d_M = \sqrt{d}/2^m \le e$, we get $m \ge \log_2(\sqrt{d}/e)$. $m_e$ is an integer, so $m_e = \lceil \log_2(\sqrt{d}/e) \rceil$.
If a memory constraint is given, set a large enough m.
Build index
–Initialize m.
–Read a record P, calculate its Z-value, search the B*-tree and find $N_c$: the number of existing points in the cell P belongs to.
–If $N_c$ < K, insert P into the B*-tree.
–Else discard one point and insert P.
–If memory runs out (this only happens for the error minimization problem), merge cells and let m = m-1.
–Go back to Step 2 (read the next record).
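The build loop for the memory optimization problem (error bound e given) might look like the sketch below. It makes several assumptions that are not in the slides: the data space is the unit hypercube, a plain dictionary keyed by cell coordinates stands in for the B*-tree keyed by Z-value, and the point discarded from a full cell is the oldest one (the slides only say "discard one"). The names `order_for_error` and `build_index` are illustrative.

```python
import math
from collections import defaultdict

def order_for_error(e: float, d: int) -> int:
    """Smallest Z-curve order m whose cell diagonal sqrt(d) / 2**m is at most e."""
    return max(0, math.ceil(math.log2(math.sqrt(d) / e)))

def build_index(stream, d: int, e: float, K: int):
    """One pass over the stream, keeping at most K points per cell."""
    m = order_for_error(e, d)
    u = 1 << m                               # cells per dimension
    cells = defaultdict(list)                # stand-in for the B*-tree
    for p in stream:
        key = tuple(min(int(x * u), u - 1) for x in p)
        bucket = cells[key]
        if len(bucket) >= K:
            bucket.pop(0)                    # assumed policy: drop the oldest point
        bucket.append(p)
    return m, dict(cells)

m, cells = build_index(
    [(0.01, 0.01), (0.02, 0.02), (0.03, 0.03), (0.9, 0.9)], d=2, e=0.1, K=2
)
```

For the error minimization problem, the same loop would additionally merge cells and decrement m whenever the memory budget is exceeded.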
7
Algorithm: Merge Cells
General Merge-Cell
–Applies to any structure.
–For each new cell, find all the points of the old cells in it, and merge them.
Bulk Merge-Cell
–Applies only to DISC.
–Scans all the leaf pages once.
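One way to see why a single scan suffices for the bulk variant: when entries are sorted by Z-value, all points of the 2^d old cells that form one new cell are contiguous. The sketch below is an illustration under assumptions, not the authors' algorithm. It uses a sorted Python list in place of the B*-tree leaf pages, assumes the Z-value layout in which the lowest d bits encode the finest subdivision, and keeps the first K points of each merged cell (the slides do not say which points are dropped). `bulk_merge` is a hypothetical name.

```python
def bulk_merge(entries, d, K):
    """Merge cells of order m into cells of order m-1 in one sequential pass.

    entries: list of (z_value, point) pairs sorted by z_value.
    Returns a list of (parent_z_value, point) pairs, still sorted,
    with at most K points per merged cell.
    """
    merged = []
    current_key, count = None, 0
    for z, point in entries:
        parent = z >> d                  # Z-value of the enclosing coarser cell
        if parent != current_key:        # first entry of the next merged cell
            current_key, count = parent, 0
        if count < K:
            merged.append((parent, point))
            count += 1
    return merged

old = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (7, "f")]
new = bulk_merge(old, d=2, K=3)
```

Since shifting preserves the order of the keys, the output needs no re-sorting before it replaces the old entries.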
8
Algorithm: kNN Search
W: a window query centered at the center of the cell Q is in, with gradually increasing side length s.
Find the kNN to Q within W.
–If the kNN distance is no larger than the distance from Q to the nearest side of W, the search terminates;
–Else increase s by 1/u and search again.
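The expanding-window search can be sketched as follows. Several details are assumptions: the data space is the unit hypercube, the initial side length is 1/u (the slides do not give a starting value), a linear scan over a list stands in for a window query on the B*-tree, and the loop also stops once W covers the whole space so that it terminates when fewer than k points exist. `knn_window` is an illustrative name.

```python
import math

def knn_window(points, q, k, u):
    """Return up to k retained points nearest to q, nearest first."""
    step = 1.0 / u
    # Center of the cell that contains q.
    center = [(min(int(x * u), u - 1) + 0.5) * step for x in q]
    s = step                                     # assumed initial side length
    while True:
        lo = [c - s / 2 for c in center]
        hi = [c + s / 2 for c in center]
        inside = [p for p in points
                  if all(l <= x <= h for l, x, h in zip(lo, p, hi))]
        inside.sort(key=lambda p: math.dist(p, q))
        best = inside[:k]
        # Distance from q to the nearest side of W.
        margin = min(min(x - l, h - x) for l, x, h in zip(lo, q, hi))
        covers_all = all(l <= 0.0 and h >= 1.0 for l, h in zip(lo, hi))
        if covers_all or (len(best) == k and math.dist(best[-1], q) <= margin):
            return best
        s += step                                # enlarge the window and retry

pts = [(0.1, 0.1), (0.15, 0.1), (0.8, 0.8), (0.5, 0.5)]
nearest_two = knn_window(pts, (0.12, 0.1), k=2, u=4)
```

The termination test is safe because every point outside W is farther from Q than the nearest side of W, so none of them can displace the current k-th neighbor.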
9
Experiments
10
Questions ?