Navigating Nets: Simple algorithms for proximity search Robert Krauthgamer (IBM Almaden) Joint work with James R. Lee (UC Berkeley)



Slide 2: A classical problem
Fix a metric space (X, d): X is a set of points and d is a distance function over X.
Near-neighbor search (NNS) [Minsky-Papert]:
1. Preprocess a given n-point subset S ⊆ X.
2. Given a query point q ∈ X, quickly compute the closest point to q among S.

Slide 3: Variations on NNS
(1+ε)-approximate nearest neighbor search: find a ∈ S such that d(q, a) ≤ (1+ε) · d(q, S).
Dynamic case: allow updates to S (insertions and deletions).
Distributed case: no central index (e.g., nodes in a network).
Other cost measures (e.g., communication, stretch, load).
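The approximation condition can be checked by brute force when S is small. The following is a minimal sketch, assuming the answer is required to come from S and that the metric is available only through a callable `dist`; the function names are illustrative and do not come from the slides.

```python
def nearest_by_scan(q, S, dist):
    """Exact NNS baseline: one distance-oracle call per point of S."""
    return min(S, key=lambda s: dist(q, s))


def is_approx_nn(q, a, S, dist, eps):
    """Check d(q, a) <= (1 + eps) * d(q, S) for a candidate answer a in S."""
    d_q_S = min(dist(q, s) for s in S)  # d(q, S), computed by linear scan
    return a in S and dist(q, a) <= (1 + eps) * d_q_S


# Toy example on the real line with d(x, y) = |x - y|.
S = [0, 4, 10]
line = lambda x, y: abs(x - y)
exact = nearest_by_scan(5, S, line)
ok_half = is_approx_nn(5, 10, S, line, 0.5)
ok_four = is_approx_nn(5, 10, S, line, 4.0)
```

The linear scan costs n oracle calls per query, which is the cost the data structure in the talk aims to beat.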

Slide 4: General metrics
Only oracle access to the distance function d(·, ·). This models a complicated metric or on-demand measurement.
No "hashing of coordinates" or tuning for a specific metric.
Goal: efficient query (sublinear or polylog time).
Impossible, even if the data set S is a path metric.
[Figure: a path metric; surviving labels 1, 2, …, n-1, n.]
What about approximate NNS?

Slide 5: Approximate NNS
Hard even for (near) uniform metrics: d(x, y) = 1 for all distinct x, y ∈ S.
But many data sets lack large uniform subsets. Can we quantify this?

Slide 6: Abstract dimension
The doubling constant λ_X of a metric (X, d) is the minimum λ such that every ball can be covered by λ balls of half the radius.
The metric is doubling if λ_X = O(1).
The (abstract) dimension is dim_A(X) = log_2 λ_X.
Immediate properties:
- dim_A(R^d, ||·||_2) = O(d).
- dim_A(X') ≤ dim_A(X) for all X' ⊆ X.
- dim_A(X) ≤ log |X|. (Equality for a uniform metric.)
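For a small finite metric, the doubling constant can be bounded from above by covering each ball greedily. This is only a sketch under stated assumptions: balls are closed, cover centers are restricted to data points, and a greedy cover need not be a minimum cover, so the returned value is an upper bound on λ_X rather than λ_X itself. All names are illustrative.

```python
def ball(points, center, radius, dist):
    """Closed ball B(center, radius) restricted to the given points."""
    return [p for p in points if dist(center, p) <= radius]


def greedy_half_cover(points, center, radius, dist):
    """Count the radius/2 balls a greedy pass uses to cover B(center, radius)."""
    uncovered = ball(points, center, radius, dist)
    count = 0
    while uncovered:
        c = uncovered[0]  # open a new half-radius ball at an uncovered point
        uncovered = [p for p in uncovered if dist(c, p) > radius / 2]
        count += 1
    return count


def doubling_constant_upper_bound(points, dist):
    """Maximum greedy cover size over all centers and all pairwise-distance radii."""
    radii = {dist(a, b) for a in points for b in points if a != b}
    return max(greedy_half_cover(points, c, r, dist)
               for c in points for r in radii)


uniform = lambda a, b: 0 if a == b else 1
line = lambda a, b: abs(a - b)
bound_uniform = doubling_constant_upper_bound(list(range(5)), uniform)
bound_line = doubling_constant_upper_bound(list(range(16)), line)
```

On a uniform metric every point needs its own half-radius ball, so the bound grows with the number of points, while on a path it stays a small constant. This matches the contrast the slide draws between uniform and low-dimensional metrics.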

Slide 7: Illustration
[Figure: grid with missing piece.]

Slide 8: Illustration
[Figures: grid with missing piece; low-dimensional manifold (bounded curvature).]

Slide 9: Illustration
[Figures: grid with missing piece; manifold; union of curves in Euclidean space.]

Slide 10: Embedding doubling metrics
Theorem [Assouad, 1983] [Gupta, K., Lee, 2003]: Fix 0 < ε < 1, and let (X, d) be a doubling metric. Then (X, d^ε) can be embedded with O(1) distortion into ℓ_2^{O(1)}.
Not true for ε = 1 [Semmes, 1996].
Motivation: embed S and then apply Euclidean NNS.

Slide 11: Our results
Simple data structure for maintaining S:
- (1+ε)-NNS query time: (1/ε)^{O(dim(S))} · log Δ (for ε < ½), where Δ = d_max/d_min is the normalized diameter of S (typically Δ = n^{O(1)}).
- Space: n · 2^{O(dim(S))}.
Dynamic maintenance of S:
- Insertion / deletion time: 2^{O(dim(S))} · log Δ · loglog Δ.
Additional properties:
- Best possible dependency on dim(S) (in a certain model).
- Oblivious to dim(S) and robust against "bad localities".
- Matches/improves known (more specialized) results.

Slide 12: Nets
Definition: An r-net of X is a subset Y with
1. d(y_1, y_2) ≥ r for all distinct y_1, y_2 ∈ Y.
2. d(x, Y) < r for all x ∈ X \ Y.
(I.e., a maximal r-separated subset.)
Note: Compare vs. ε-net.
Running example: a path metric.
[Figure: a 4-net, an 8-net and a 16-net of the path metric.]

Slide 13: More nets
Definition: An r-net of X is a subset Y with
1. d(y_1, y_2) ≥ r for all distinct y_1, y_2 ∈ Y.
2. d(x, Y) < r for all x ∈ X \ Y.
(I.e., a maximal r-separated subset.)
Note: Compare vs. ε-net.
[Figure: illustration of a net Y_r.]
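One simple way to obtain an r-net of a finite point set is a greedy pass that keeps a point only if it is at distance at least r from every point kept so far. The slides do not say how the nets are built, so the sketch below is an assumption for illustration; `greedy_net` and `is_r_net` are hypothetical names.

```python
def greedy_net(points, r, dist):
    """Greedy maximal r-separated subset: keep p if it is >= r from all kept points."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net


def is_r_net(net, points, r, dist):
    """Check the two conditions of the definition: separation and covering."""
    separated = all(dist(a, b) >= r for a in net for b in net if a != b)
    covered = all(min(dist(x, y) for y in net) < r
                  for x in points if x not in net)
    return separated and covered


# Path metric on the integers 0..31, in the spirit of the running example.
path = list(range(32))
line = lambda a, b: abs(a - b)
net16 = greedy_net(path, 16, line)
net8 = greedy_net(path, 8, line)
net4 = greedy_net(path, 4, line)
```

A rejected point is within distance less than r of some kept point, which is exactly the covering condition, so the greedy output is always an r-net of the input.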

Slide 14: The data structure
For every r = 2^i, let Y_r be an r-net of S. Only O(log Δ) values of r are non-trivial.
[Figure: a 16-net, an 8-net and a 4-net.]
For every y ∈ Y_r maintain a navigation list L_{y,r} = {z ∈ Y_{r/2} : d(y, z) ≤ 2r}.

Slide 15: More on the data structure
For every r = 2^i, let Y_r be an r-net of S. Only O(log Δ) values of r are non-trivial.
For every y ∈ Y_r maintain a navigation list L_{y,r} = {z ∈ Y_{r/2} : d(y, z) ≤ 2r}.
[Figure: nets Y_r and Y_{r/2}; surviving label 3r.]

Slide 16: Space requirement
Lemma: |L_{y,r}| ≤ 2^{O(dim(S))} for all y ∈ Y_r, r ≥ 0.
Proof: L_{y,r} is contained in a ball of radius 2r. This ball can be covered by λ_S^3 balls of radius r/4. Every point in L_{y,r} ⊆ Y_{r/2} must be covered by a distinct ball. Hence |L_{y,r}| ≤ λ_S^3 = 2^{3 dim(S)}. ∎
Corollary: Total space is 2^{O(dim(S))} · n · log Δ.
We actually improve it to 2^{O(dim(S))} · n.
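The nets and navigation lists can be built statically for a small point set, which makes the list-size bound easy to observe. This is a rough sketch, not the talk's dynamic structure: it assumes greedy nets built independently at each scale, computes the range of scales from all pairwise distances (quadratic time), and indexes scales by the exponent i with r = 2^i. All names are hypothetical.

```python
import math


def greedy_net(points, r, dist):
    """Greedy maximal r-separated subset of points."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net


def build_navigating_nets(points, dist):
    """Return (nets, lists): nets[i] is a 2^i-net, lists[(i, y)] is L_{y, 2^i}."""
    gaps = [dist(a, b) for a in points for b in points if a != b]
    i_top = math.floor(math.log2(max(gaps))) + 1  # 2^i_top > d_max: one net point
    i_bot = math.floor(math.log2(min(gaps)))      # 2^i_bot <= d_min: net is all of S
    nets = {i: greedy_net(points, 2.0 ** i, dist)
            for i in range(i_bot, i_top + 1)}
    lists = {}
    for i in range(i_bot + 1, i_top + 1):
        r = 2.0 ** i
        for y in nets[i]:
            # Navigation list: points of the next finer net within 2r of y.
            lists[(i, y)] = [z for z in nets[i - 1] if dist(y, z) <= 2 * r]
    return nets, lists


path = list(range(64))
line = lambda a, b: abs(a - b)
nets, lists = build_navigating_nets(path, line)
largest_list = max(len(members) for members in lists.values())
```

On a path, points of Y_{r/2} are at least r/2 apart and a list spans an interval of length 4r, so no list can hold more than 9 points regardless of n.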

Slide 17: Back to running example
[Figure: a 16-net, an 8-net and a 4-net of the path metric.]

Slide 18: Navigating nets
Let q denote the query point.
Initially z_16 = the only point in Y_16.
Find z_8 = the closest Y_8 point to q.
Find z_4 = the closest Y_4 point to q, etc.

Slide 19: How to find z_{r/2}?
Assume each z_r ∈ Y_r is the closest point to a (instead of to q). Then d(z_r, z_{r/2}) ≤ r + r/2 = 3r/2, and z_{r/2} must be in z_r's list L_{z_r,r}.
[Figure: points q, a, z_r and z_{r/2}, with distance labels ≤ r, ≤ r/2 and ≤ r/4.]
For z_r to be the closest Y_r point to q, it suffices that d(q, a) ≤ r/4. And then z_r's list L_{z_r,r} contains z_{r/2}.
Note: d(q, z_r) ≤ 3r/2.

Slide 20: Stopping point
If we find a point z_r with d(q, z_r) ≤ 3r/2, but not a point z_{r/2} with d(q, z_{r/2}) ≤ 3r/4, we know that d(q, S) > r/4, yielding 6-NNS with query time 2^{O(dim(S))} · log Δ.
This can be extended to (1+ε)-NNS.
Similar principles yield insertions and deletions.
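The descent through the scales, together with the stopping rule above, can be sketched end to end on a toy metric. This is a hedged illustration rather than the authors' implementation: it assumes greedy nets built independently per scale, a static quadratic-time build, list radius 2r, and it returns the best point seen during the descent. The names `build` and `query` are hypothetical.

```python
import math


def greedy_net(points, r, dist):
    """Greedy maximal r-separated subset of points."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net


def build(points, dist):
    """Nets Y_{2^i} and navigation lists L_{y, 2^i}, keyed by (i, y)."""
    gaps = [dist(a, b) for a in points for b in points if a != b]
    i_top = math.floor(math.log2(max(gaps))) + 1  # top net has a single point
    i_bot = math.floor(math.log2(min(gaps)))      # bottom net is all of S
    nets = {i: greedy_net(points, 2.0 ** i, dist)
            for i in range(i_bot, i_top + 1)}
    lists = {(i, y): [z for z in nets[i - 1] if dist(y, z) <= 2 * 2.0 ** i]
             for i in range(i_bot + 1, i_top + 1) for y in nets[i]}
    return nets, lists, i_top, i_bot


def query(q, nets, lists, i_top, i_bot, dist):
    """Descend from the coarsest net, following navigation lists toward q."""
    z = nets[i_top][0]
    best = z
    for i in range(i_top, i_bot, -1):
        r = 2.0 ** i
        nxt = min(lists[(i, z)], key=lambda c: dist(q, c))  # candidate z_{r/2}
        if dist(q, nxt) < dist(q, best):
            best = nxt
        if dist(q, nxt) > 0.75 * r:
            break  # stopping rule: no z_{r/2} within 3r/4 of q
        z = nxt
    return best


S = list(range(0, 64, 3))
line = lambda a, b: abs(a - b)
index = build(S, line)
answer = query(40, *index, line)
```

Each step scans one navigation list, so the number of distance evaluations per query is at most the number of scales times the maximum list size, rather than n. On this toy input the returned point stays within a factor 6 of the true nearest distance, consistent with the 6-NNS claim on the slide.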

Slide 21: Near-optimality
The basic idea:
Consider a uniform metric on λ points. Let the query point be at distance 1 from all of them, except for one point whose distance is 1-ε.
Finding this point requires (in an oracle model) computing all distances to q.
This can happen at every distance scale r.
We get a lower bound of 2^{Ω(dim(S))} · log Δ.

Slide 22: Related work – general metrics
Let K_X be the smallest K such that |B(x, r)| ≤ K · |B(x, r/2)| for all x ∈ X, r ≥ 0. Define the KR-dimension as dim_KR(X) = log_2 K_X.
Randomized exact NNS [Karger-Ruhl'02, Hildrum et al.'04]:
- Space: n · 2^{O(dim(S))} · log Δ.
- Query time: 2^{O(dim(S))} · log Δ.
If dim_KR(S) = O(1), the log Δ term is actually O(log n).
Our results extend to this setting:
1. KR-metrics are doubling: dim(X) ≤ 4 dim_KR(X).
2. Our algorithms actually give exact NNS.
Assumptions on query distribution [Clarkson'99].

Slide 23: Related work – Euclidean metrics
Exact NNS for R^d: O(d^5 log n) query time and O(n^{d+δ}) space [Meiser'93].
ε-NNS for R^d: O((d/ε)^d log n) query time and O(dn) space by quad-tree like decompositions [AMNSW'94]. Our algorithm achieves similar bounds.
O(d polylog(dn)) query time and (dn)^{O(1)} space is useful for higher dimensions [IM'98, KOR'98].

Slide 24: Concluding remarks
Our approach: a "decision tree" that is not really a tree (saves space).
In progress:
- A different (static) scheme where log Δ is replaced by log n.
- Bounds on the help of "ambient" space points.
Our data structure yields a spanner of the metric:
- Immediate: O(1) stretch with average degree 2^{dim(S)}.
- More work: O(1) stretch with maximum degree 2^{dim(S)}.
[Guibas,'04] applied the nets data structure for moving points in the plane.
