Published by Gertrude Hall. Modified over 9 years ago.
1
Navigating Nets: Simple algorithms for proximity search Robert Krauthgamer (IBM Almaden) Joint work with James R. Lee (UC Berkeley)
2
A classical problem
Fix a metric space $(X,d)$: $X$ is a set of points and $d$ is a distance function over $X$.
Near-neighbor search (NNS) [Minsky-Papert]:
1. Preprocess a given $n$-point subset $S \subseteq X$.
2. Given a query point $q \in X$, quickly compute the closest point to $q$ among $S$.
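As a baseline for the problem statement above, here is a minimal sketch of NNS by linear scan, assuming only oracle access to the distance function. The name `nearest_neighbor` and its signature are illustrative, not from the talk. It makes one oracle call per point of S, which is the cost the data structure in this talk aims to beat.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def nearest_neighbor(q: P, S: Sequence[P], d: Callable[[P, P], float]) -> P:
    """Return a point of S closest to q using |S| distance-oracle calls."""
    return min(S, key=lambda s: d(q, s))


# Usage on a small path metric (points on a line, d = absolute difference).
S = [0, 3, 10]
print(nearest_neighbor(4, S, lambda x, y: abs(x - y)))  # prints 3
```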
3
Variations on NNS
$(1+\varepsilon)$-approximate nearest neighbor search: Find $a \in S$ such that $d(q,a) \le (1+\varepsilon)\, d(q,S)$.
Dynamic case: Allow updates to $S$ (insertions and deletions).
Distributed case: No central index (e.g., nodes in a network).
Other cost measures (e.g., communication, stretch, load).
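The approximation condition can be checked directly against its definition. The sketch below is only a brute-force verifier under the assumption that S is small enough to scan; `is_approx_nn` is a hypothetical helper name.

```python
def is_approx_nn(a, q, S, d, eps):
    """True if d(q, a) <= (1 + eps) * d(q, S), where d(q, S) is found by a full scan."""
    d_q_S = min(d(q, s) for s in S)
    return d(q, a) <= (1 + eps) * d_q_S


# Usage: with S = [0, 10] and q = 4 we have d(q, S) = 4 and d(q, 10) = 6.
dist = lambda x, y: abs(x - y)
print(is_approx_nn(10, 4, [0, 10], dist, 0.5))   # prints True  (6 <= 1.5 * 4)
print(is_approx_nn(10, 4, [0, 10], dist, 0.25))  # prints False (6 >  1.25 * 4)
```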
4
General metrics
Only oracle access to the distance function $d(\cdot,\cdot)$.
Models a complicated metric or on-demand measurement.
No “hashing of coordinates” or tuning for a specific metric.
Goal: efficient query (sublinear or polylog time).
Impossible for exact NNS, even if the data set $S$ is a path metric.
[Figure: a path metric on the points $1, 2, \dots, n-1, n$ together with a query point.]
What about approximate NNS?
5
Approximate NNS
Hard even for (near) uniform metrics: $d(x,y) = 1$ for all distinct $x,y \in S$.
[Figure: points at pairwise distance 1.]
But many data sets lack large uniform subsets. Can we quantify this?
6
Abstract dimension
The doubling constant $\lambda_X$ of a metric $(X,d)$ is the minimum $\lambda$ such that every ball can be covered by $\lambda$ balls of half the radius.
The metric is doubling if $\lambda_X = O(1)$.
The (abstract) dimension is $\dim(X) = \log_2 \lambda_X$, also written $\dim_A(X)$.
Immediate properties:
$\dim_A(\mathbb{R}^d, \|\cdot\|_2) = O(d)$.
$\dim_A(X') \le \dim_A(X)$ for all $X' \subseteq X$.
$\dim_A(X) \le \log |X|$. (Equality for a uniform metric.)
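To see why equality holds for a uniform metric, here is a short worked example, assuming closed balls and an $n$-point metric with $d(x,y) = 1$ for all distinct $x, y$. A ball of radius 1 is all of $X$, while a ball of radius $1/2$ is a single point, so covering the former takes $n$ of the latter. Balls of radius below 1 are singletons and need only one covering ball.

```latex
\begin{align*}
B(x,1) &= X, & B(y,\tfrac12) &= \{y\} \quad \text{for every } y \in X,\\
\lambda_X &= n, & \dim(X) &= \log_2 \lambda_X = \log_2 n .
\end{align*}
```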
7
Illustration
[Figure: grid with missing piece.]
8
Illustration
[Figures: grid with missing piece; low-dimensional manifold (bounded curvature).]
9
Illustration
[Figures: grid with missing piece; manifold; union of curves in Euclidean space.]
10
Embedding doubling metrics
Theorem [Assouad, 1983] [Gupta, K., Lee, 2003]: Fix $0 < \varepsilon < 1$, and let $(X,d)$ be a doubling metric. Then $(X,d^{\varepsilon})$ can be embedded with $O(1)$ distortion into $\ell_2^{O(1)}$.
Not true for $\varepsilon = 1$ [Semmes, 1996].
Motivation: Embed $S$ and then apply Euclidean NNS.
11
Our results
Simple data structure for maintaining $S$:
$(1+\varepsilon)$-NNS query time: $(1/\varepsilon)^{O(\dim(S))} \cdot \log \Delta$ (for $\varepsilon < 1/2$), where $\Delta = d_{\max}/d_{\min}$ is the normalized diameter of $S$ (typically $\Delta = n^{O(1)}$).
Space: $n \cdot 2^{O(\dim(S))}$.
Dynamic maintenance of $S$:
Insertion / deletion time: $2^{O(\dim(S))} \cdot \log \Delta \cdot \log\log \Delta$.
Additional properties:
Best possible dependency on $\dim(S)$ (in a certain model).
Oblivious to $\dim(S)$ and robust against “bad localities”.
Matches/improves known (more specialized) results.
12
Nets
Definition: An $r$-net of $X$ is a subset $Y$ with
1. $d(y_1,y_2) \ge r$ for all distinct $y_1,y_2 \in Y$.
2. $d(x,Y) < r$ for all $x \in X \setminus Y$.
(I.e., a maximal $r$-separated subset.)
Note: Compare vs. $\varepsilon$-net.
Running example: a path metric.
[Figure: a 16-net, an 8-net and a 4-net of the path.]
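A maximal r-separated subset can be built greedily: scan the points and keep each one that is at distance at least r from everything kept so far. The sketch below is a naive quadratic-time illustration of the definition, not the construction used in the talk; `greedy_net` is a hypothetical name.

```python
def greedy_net(points, d, r):
    """Greedy r-net: kept points are pairwise >= r apart (condition 1),
    and every rejected point is < r from some kept point (condition 2)."""
    net = []
    for p in points:
        if all(d(p, y) >= r for y in net):
            net.append(p)
    return net


# Usage on a path metric with 16 points at unit spacing.
path = list(range(16))
dist = lambda x, y: abs(x - y)
print(greedy_net(path, dist, 4))   # prints [0, 4, 8, 12]
print(greedy_net(path, dist, 8))   # prints [0, 8]
print(greedy_net(path, dist, 16))  # prints [0]
```

The result depends on the scan order, which reflects the fact that r-nets are not unique.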
13
More nets
Definition: An $r$-net of $X$ is a subset $Y$ with
1. $d(y_1,y_2) \ge r$ for all distinct $y_1,y_2 \in Y$.
2. $d(x,Y) < r$ for all $x \in X \setminus Y$.
(I.e., a maximal $r$-separated subset.)
Note: Compare vs. $\varepsilon$-net.
[Figure: illustration of a net $Y$ with balls of radius $r$.]
14
The data structure
For every $r = 2^i$, let $Y_r$ be an $r$-net of $S$.
Only $O(\log \Delta)$ values of $r$ are non-trivial.
[Figure: a 16-net, an 8-net and a 4-net.]
For every $y \in Y_r$ maintain a navigation list $L_{y,r} = \{z \in Y_{r/2} : d(y,z) \le 2r\}$.
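The navigation lists follow mechanically from the nets. The sketch below assumes the nets are already given as a dictionary keyed by the scale exponent i (so r = 2**i) and that points are hashable; `navigation_lists` and the `gamma` parameter (the list radius in units of r, 2 on the slide) are illustrative names, and the double loop is a naive construction.

```python
def navigation_lists(nets, d, gamma=2.0):
    """For each y in Y_r, list the points of Y_{r/2} within distance gamma * r of y.

    nets maps an integer exponent i to the net Y_r with r = 2**i.
    Returns a dict keyed by (y, i).
    """
    lists = {}
    for i, Y in nets.items():
        finer = nets.get(i - 1)
        if finer is None:
            continue  # no finer net is stored below the lowest scale
        r = 2.0 ** i
        for y in Y:
            lists[(y, i)] = [z for z in finer if d(y, z) <= gamma * r]
    return lists


# Usage: nets of the 16-point unit-spaced path at scales 16, 8, 4, 2.
nets = {4: [0], 3: [0, 8], 2: [0, 4, 8, 12], 1: list(range(0, 16, 2))}
lists = navigation_lists(nets, lambda x, y: abs(x - y))
print(lists[(8, 3)])  # prints [0, 4, 8, 12]
```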
15
More on the data structure
[Figure: a point of $Y_r$ and the nearby points of $Y_{r/2}$, with a radius labeled $3r$.]
For every $r = 2^i$, let $Y_r$ be an $r$-net of $S$.
Only $O(\log \Delta)$ values of $r$ are non-trivial.
For every $y \in Y_r$ maintain a navigation list $L_{y,r} = \{z \in Y_{r/2} : d(y,z) \le 2r\}$.
16
Space requirement
Lemma: $|L_{y,r}| \le 2^{O(\dim(S))}$ for all $y \in Y_r$, $r \ge 0$.
Proof: $L_{y,r}$ is contained in a ball of radius $2r$. This ball can be covered by $\lambda_S^3$ balls of radius $r/4$ (three successive halvings: $2r \to r \to r/2 \to r/4$). Every point in $L_{y,r} \subseteq Y_{r/2}$ must be covered by a distinct ball. Hence $|L_{y,r}| \le \lambda_S^3 = 2^{3\dim(S)}$.
Corollary: Total space is $2^{O(\dim(S))} \cdot n \cdot \log \Delta$.
We actually improve it to $2^{O(\dim(S))} \cdot n$.
17
Back to running example
[Figure: a 16-net, an 8-net and a 4-net.]
18
Navigating nets
Let $q$ denote the query point (marked in the figure).
Initially $z_{16}$ = the only point in $Y_{16}$.
Find $z_8$ = the closest $Y_8$ point to $q$.
Find $z_4$ = the closest $Y_4$ point to $q$, etc.
19
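How to find $z_{r/2}$?
Assume each $z_r \in Y_r$ is the closest point to $a$ (instead of to $q$), where $a$ is the point of $S$ nearest to $q$.
Then $d(z_r,z_{r/2}) \le d(z_r,a) + d(a,z_{r/2}) \le r + r/2 = 3r/2$, and $z_{r/2}$ must be in $z_r$'s list $L_{z_r,r}$.
[Figure: $q$, $a$, $z_r$ and $z_{r/2}$, with distances labeled $\le r$, $\le r/2$ and $\le r/4$.]
For $z_r$ to be the closest $Y_r$ point to $q$, it suffices that $d(q,a) \le r/4$. And then $z_r$'s list $L_{z_r,r}$ contains $z_{r/2}$.
Note: $d(q,z_r) \le 3r/2$.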
20
Stopping point
If we find a point $z_r$ with $d(q,z_r) \le 3r/2$, but not a point $z_{r/2}$ with $d(q,z_{r/2}) \le 3r/4$, we know that $d(q,S) > r/4$.
Then $d(q,z_r) / d(q,S) < (3r/2)/(r/4) = 6$, yielding 6-NNS with query time $2^{O(\dim(S))} \cdot \log \Delta$.
This can be extended to $(1+\varepsilon)$-NNS.
Similar principles yield insertions and deletions.
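The descent and its stopping rule can be sketched end to end as follows. This is a static, naive illustration under stated assumptions, not the authors' implementation: nets are built greedily in quadratic time, S holds at least two distinct hashable points, and the list radius is enlarged from the slide's 2r to 3r (the `gamma` parameter). The larger radius is a choice made for this sketch so that a point of Y_{r/2} within 3r/4 of q, which can be as far as 3r/2 + 3r/4 = 9r/4 from z_r by the triangle inequality, is certain to appear in the list. All function names are hypothetical.

```python
import math


def greedy_net(points, d, r):
    """Maximal r-separated subset, built by a single greedy scan."""
    net = []
    for p in points:
        if all(d(p, y) >= r for y in net):
            net.append(p)
    return net


def build(S, d, gamma=3.0):
    """Nets Y_r for r = 2**i and navigation lists of radius gamma * r."""
    dists = [d(x, y) for k, x in enumerate(S) for y in S[k + 1:]]
    i_top = math.frexp(max(dists))[1]      # 2**i_top > diameter: one net point
    i_low = math.frexp(min(dists))[1] - 2  # 2**(i_low + 1) <= d_min: net is all of S
    nets = {i: greedy_net(S, d, 2.0 ** i) for i in range(i_low, i_top + 1)}
    lists = {(y, i): [z for z in nets[i - 1] if d(y, z) <= gamma * 2.0 ** i]
             for i in range(i_low + 1, i_top + 1) for y in nets[i]}
    return nets, lists, i_top, i_low


def query(q, nets, lists, d, i_top, i_low):
    """Descend scale by scale; stop when no list point is within 3r/4 of q."""
    z = nets[i_top][0]
    best = z
    for i in range(i_top, i_low, -1):
        cand = min(lists[(z, i)], key=lambda p: d(q, p))
        if d(q, cand) < d(q, best):
            best = cand
        if d(q, cand) > 0.75 * 2.0 ** i:
            break  # stopping rule: d(q, S) > r/4, so best is a 6-approximation
        z = cand
    return best


# Usage on points of the real line.
S = [0, 1, 3, 7, 8, 20, 21, 40]
dist = lambda x, y: abs(x - y)
nets, lists, i_top, i_low = build(S, dist)
print(query(9, nets, lists, dist, i_top, i_low))  # prints 8
```

Each step of the descent inspects only one navigation list, so the number of oracle calls per query is the number of scales visited times the list size, rather than |S|.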
21
Near-optimality
The basic idea: Consider a uniform metric on $\lambda$ points.
Let the query point be at distance 1 from all of them, except for one point whose distance is $1-\varepsilon$.
Finding this point requires (in an oracle model) computing all distances to $q$.
This can happen at every distance scale $r$.
We get a lower bound of $2^{\Omega(\dim(S))} \log \Delta$.
22
Related work: general metrics
Let $K_X$ be the smallest $K$ such that $|B(x,r)| \le K \cdot |B(x,r/2)|$ for all $x \in X$, $r \ge 0$. Define the KR-dimension as $\dim_{KR}(X) = \log_2 K_X$.
Randomized exact NNS [Karger-Ruhl’02, Hildrum et al.’04]:
Space: $n \cdot 2^{O(\dim(S))} \cdot \log \Delta$.
Query time: $2^{O(\dim(S))} \cdot \log \Delta$.
If $\dim_{KR}(S) = O(1)$, the $\log \Delta$ term is actually $O(\log n)$.
Our results extend to this setting:
1. KR-metrics are doubling: $\dim(X) \le 4 \dim_{KR}(X)$.
2. Our algorithms actually give exact NNS.
Assumptions on query distribution [Clarkson’99].
23
Related work: Euclidean metrics
Exact NNS for $\mathbb{R}^d$: $O(d^5 \log n)$ query time and $O(n^{d+\delta})$ space [Meiser’93].
$\varepsilon$-NNS (i.e., $(1+\varepsilon)$-approximate NNS) for $\mathbb{R}^d$: $O((d/\varepsilon)^d \log n)$ query time and $O(dn)$ space by quad-tree like decompositions [AMNSW’94].
Our algorithm achieves similar bounds.
$O(d \,\mathrm{polylog}(dn))$ query time and $(dn)^{O(1)}$ space is useful for higher dimensions [IM’98, KOR’98].
24
Concluding remarks
Our approach: A “decision tree” that is not really a tree (saves space).
In progress:
A different (static) scheme where $\log \Delta$ is replaced by $\log n$.
Bounds on the help of “ambient” space points.
Our data structure yields a spanner of the metric.
Immediate: $O(1)$ stretch with average degree $2^{\dim(S)}$.
More work: $O(1)$ stretch with maximum degree $2^{\dim(S)}$.
[Guibas,’04] applied the nets data structure for moving points in the plane.