Near-Optimal (Euclidean) Metric Compression
Piotr Indyk (MIT) Tal Wagner
Metric Sketching a.k.a. Distance Oracle
Task: Given a finite metric space, store all distances approximately, using minimum space.
[Diagram: $D, \varepsilon$ → Summ → sketch → Est$(x,y)$ → $\tilde{D}(x,y)$]
Formally:
Input: A metric $D$ on $X = \{1, \dots, n\}$, and $\varepsilon > 0$.
Summary: Given $X$, $D$ and $\varepsilon$, output a sketch $\mathit{sk} \in \{0,1\}^k$.
Estimation: Given $x, y \in X$ and $\mathit{sk}$, output $\tilde{D}(x,y)$ such that $D(x,y) \le \tilde{D}(x,y) \le (1+\varepsilon)\, D(x,y)$.
Goal: Minimize $k$.
Euclidean Metrics
There is $f: X \to \mathbb{R}^d$ such that for all $x, y \in X$, $D(x,y) = \|f(x) - f(y)\|$.
$f$ is known during summary, unknown during estimation.
Dimension Reduction
For Euclidean metrics, the dimension can be reduced to $d = O(\varepsilon^{-2}\log n)$ [Johnson-Lindenstrauss'84].
Resulting sketch size: $O(\varepsilon^{-2} n\log n)$ numbers.
How many bits? That depends on the numerics of the input metric (it has to!).
The spread of $(X,D)$ is $\Phi = \max_{x,y\in X} D(x,y) \,/\, \min_{x\neq y\in X} D(x,y)$.
We get (essentially): $O(\varepsilon^{-2} n\log n\log(n\Phi))$ bits.
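The JL step can be sketched in plain Python as a scaled Gaussian random projection. This is one standard JL construction, not necessarily the one the talk has in mind; the target-dimension constant 8 and the helper names `spread` and `jl_project` are illustrative assumptions.

```python
import math
import random


def spread(points):
    """Spread Phi: largest pairwise distance divided by smallest pairwise distance."""
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return max(dists) / min(dists)


def jl_project(points, eps, seed=0):
    """Map points to ceil(8 ln n / eps^2) dimensions with a scaled Gaussian matrix.
    The constant 8 is an illustrative assumption."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    target_dim = math.ceil(8 * math.log(n) / eps ** 2)
    # Entries are i.i.d. N(0, 1); the 1/sqrt(target_dim) scaling preserves
    # squared lengths in expectation.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(target_dim)]
    scale = 1.0 / math.sqrt(target_dim)
    return [[scale * sum(a * b for a, b in zip(row, p)) for row in matrix] for p in points]


# Demo: 10 random points in 500 dimensions, projected with eps = 0.5.
demo_rng = random.Random(1)
points = [[demo_rng.uniform(-1.0, 1.0) for _ in range(500)] for _ in range(10)]
projected = jl_project(points, eps=0.5)
ratios = [
    math.dist(projected[i], projected[j]) / math.dist(points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
]
print("target dimension:", len(projected[0]))
print("distance ratios in [%.3f, %.3f]" % (min(ratios), max(ratios)))
print("spread before/after: %.3f / %.3f" % (spread(points), spread(projected)))
```

Each of the projected coordinates then has to be stored to some finite precision, which is roughly where the slide's extra $\log(n\Phi)$ bits per number come from.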
Is JL Optimal?
For dimension reduction? Yes [Larsen-Nelson'16].
Previously: almost, up to a $\log(1/\varepsilon)$ factor [Alon'03].
For metric sketching? No [this work].
Results: Euclidean Metrics
Theorem: For Euclidean metrics, $O(\varepsilon^{-2}\log(1/\varepsilon)\, n\log n + n\log\log\Phi)$ bits.
Lower bound: $\Omega(\varepsilon^{-2} n\log n + n\log\log\Phi)$ bits (tight up to $\log(1/\varepsilon)$).
Compare with: $O(\varepsilon^{-2} n\log n \cdot \log(n\Phi))$ (by JL).
If $\Phi = \mathrm{poly}(n)$ and $\varepsilon = \Omega(1)$: $\log^2 n$ vs. $\log n$ bits per point.
The same upper and lower bounds hold for $\ell_1$ metrics.
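To see the per-point gap concretely, the two upper bounds can be evaluated with all hidden constants set to 1 and logarithms taken base 2. Both choices are assumptions, so only the trend is meaningful; the function names are hypothetical.

```python
import math


def jl_bits_per_point(n, eps, phi):
    """JL baseline, hidden constants set to 1: eps^-2 * log n * log(n * Phi)."""
    return eps ** -2 * math.log2(n) * math.log2(n * phi)


def new_bits_per_point(n, eps, phi):
    """The theorem's upper bound per point, hidden constants set to 1:
    eps^-2 * log(1/eps) * log n + log log Phi."""
    return eps ** -2 * math.log2(1 / eps) * math.log2(n) + math.log2(math.log2(phi))


eps = 0.5
for exponent in (10, 20, 30):
    n = 2 ** exponent
    phi = n ** 2  # Phi = poly(n)
    jl = jl_bits_per_point(n, eps, phi)
    new = new_bits_per_point(n, eps, phi)
    print(f"n = 2^{exponent}: JL ~ {jl:.0f} bits, new ~ {new:.0f} bits, ratio {jl / new:.1f}")
```

With $\varepsilon$ fixed, the ratio grows roughly linearly in $\log n$, which is the $\log^2 n$ vs. $\log n$ gap stated above.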
Running Times
Theorem: For Euclidean metrics, $O(\alpha^{-1}\varepsilon^{-2}\log(1/\varepsilon)\, n\log n + n\log\log\Phi)$ bits.
Construction time: $O(n^{1+\alpha} + n\varepsilon^{-2})$ for any $\alpha > 0$ (by LSH).
Estimation time: $O(\varepsilon^{-2}\log^2 n)$.
Results: Arbitrary Metrics
Theorem: For arbitrary metrics, $O(n^2\log(1/\varepsilon) + n\log\log\Phi)$ bits.
Lower bound: $\Omega(n^2\log(1/\varepsilon) + n\log\log\Phi)$ (tight).
Compare with: $O(n^2\log(1/\varepsilon) + n^2\log\log\Phi)$ (naïve rounding).
If $\Phi = \mathrm{poly}(n)$ and $\varepsilon = \Omega(1)$: $n\log\log n$ vs. $n$ bits per point.
(Same algorithm.)
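The naïve-rounding baseline in the comparison can be sketched as follows: round every distance up to $d_{\min}(1+\varepsilon)^k$ and keep only the exponent $k$. This is the standard trick, not the talk's algorithm, and the names `naive_sketch` and `naive_estimate` are hypothetical.

```python
import math


def naive_sketch(dist, eps):
    """Naive rounding for an arbitrary metric given as an n x n distance matrix.

    For each pair i < j, keep the exponent k = ceil(log_{1+eps}(D(i, j) / dmin)),
    where dmin is the smallest distance.
    """
    n = len(dist)
    dmin = min(dist[i][j] for i in range(n) for j in range(n) if i != j)
    exps = {}
    for i in range(n):
        for j in range(i + 1, n):
            k = max(0, math.ceil(math.log(dist[i][j] / dmin, 1 + eps)))
            # Guard against floating-point error so the estimate never undershoots.
            while dmin * (1 + eps) ** k < dist[i][j]:
                k += 1
            exps[(i, j)] = k
    # k is at most about log_{1+eps}(Phi) = O(log(Phi) / eps), so each pair costs
    # O(log(1/eps) + log log Phi) bits.
    bits_per_pair = max(1, math.ceil(math.log2(max(exps.values()) + 1)))
    return {"eps": eps, "dmin": dmin, "exps": exps, "bits_per_pair": bits_per_pair}


def naive_estimate(sketch, i, j):
    """Return an estimate within [D(i, j), (1 + eps) * D(i, j)]."""
    k = sketch["exps"][(min(i, j), max(i, j))]
    return sketch["dmin"] * (1 + sketch["eps"]) ** k


# Demo: a metric induced by points on a line (spread 1000).
xs = [0.0, 1.0, 2.5, 7.0, 40.0, 1000.0]
dist = [[abs(a - b) for b in xs] for a in xs]
sk = naive_sketch(dist, eps=0.1)
print("bits per pair:", sk["bits_per_pair"])
print("D(2, 4) =", dist[2][4], "estimate =", naive_estimate(sk, 2, 4))
```

Each of the $\binom{n}{2}$ exponents costs $O(\log(1/\varepsilon) + \log\log\Phi)$ bits, which gives the $n^2\log\log\Phi$ term of the naïve bound; the theorem pays $\log\log\Phi$ only $n$ times.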
Related Work
Euclidean metrics:
JL (essentially): $O(n\varepsilon^{-2}\log n\log(n\Phi))$
[Kushilevitz-Ostrovsky-Rabani'98], [Alon-Klartag'16]: $O(n\varepsilon^{-2}\log n\log\Phi)$
[Molinaro-Woodruff-Yaroslavtsev'13]: $\Omega(n\varepsilon^{-2}\log n\log\Phi)$, but in a different communication model
This work: $O(n\varepsilon^{-2}\log n\log(1/\varepsilon) + n\log\log\Phi)$ and $\Omega(n\varepsilon^{-2}\log n + n\log\log\Phi)$
Related Work
Arbitrary metrics:
[Althöfer-Das-Dobkin-Joseph-Soares'93], [Matoušek'96], [Thorup-Zwick'00], […]: $O(n^{1+1/k})$ for stretch $2k-1 \ge 3$; tight by the Erdős Girth Conjecture.
Bipartite graphs: $\Omega(n^2)$ for stretch $< 3$.
This work: $\Theta(n^2\log(1/\varepsilon) + n\log\log\Phi)$ for stretch $1+\varepsilon$.
Algorithm Overview
[Diagram: $D$ → Summ → sketch → Est$(x,y)$ → $\tilde{D}(x,y)$]
Surrogates
[Figure: points $x_1, x_2, \dots$ in $\mathbb{R}^d$ at scale $r$, and their surrogates $x_1^*, x_2^*, \dots$]
Fix $r$. Store each point as a displacement from a nearby point.
Define a surrogate $x_i^*$ for every $x_i$:
$x_1^* \leftarrow x_1$
$x_i^* \leftarrow x_{i-1}^* + \widetilde{(x_i - x_{i-1}^*)}$
Here $\tilde{v}$ is $v$ rounded to an $\varepsilon$-net, so that $\|\tilde{v} - v\| \le \varepsilon\|v\|$.
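[Figure: points $x_1, \dots, x_4$ in $\mathbb{R}^d$ and their surrogates $x_1^*, \dots, x_4^*$]
Estimation: approximate $\|x_i - x_j\|$ by $\|x_i^* - x_j^*\|$.
Claim: $\|x_i - x_i^*\| \le 2\varepsilon \cdot r$ (proof: by induction).
Corollary: $\|x_i^* - x_j^*\| = (1 \pm 4\varepsilon) \cdot \|x_i - x_j\|$ whenever $\|x_i - x_j\| \ge r$, since the two surrogate errors add up to at most $4\varepsilon r$.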
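One way to fill in the induction behind the claim, assuming (this is not spelled out on the slide) that consecutive points satisfy $\|x_i - x_{i-1}\| \le r$ and that $\varepsilon \le 1/2$; the base case is $x_1^* = x_1$, and the last line gives the corollary:

```latex
\begin{align*}
x_i - x_i^* &= (x_i - x_{i-1}^*) - \widetilde{(x_i - x_{i-1}^*)}
  && \text{(definition of } x_i^*\text{)} \\
\|x_i - x_i^*\| &\le \varepsilon\,\|x_i - x_{i-1}^*\|
  && \text{(rounding guarantee)} \\
&\le \varepsilon\bigl(\|x_i - x_{i-1}\| + \|x_{i-1} - x_{i-1}^*\|\bigr)
  && \text{(triangle inequality)} \\
&\le \varepsilon\,(r + 2\varepsilon r) \le 2\varepsilon r
  && \text{(induction, } \varepsilon \le 1/2\text{)} \\[4pt]
\bigl|\,\|x_i^* - x_j^*\| - \|x_i - x_j\|\,\bigr|
&\le \|x_i - x_i^*\| + \|x_j - x_j^*\| \le 4\varepsilon r \le 4\varepsilon\,\|x_i - x_j\|
  && \text{(if } \|x_i - x_j\| \ge r\text{)}
\end{align*}
```

The error does not accumulate along the chain because each displacement is measured from the previous surrogate rather than from the previous point.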
[Figure: displacement vectors $x_2 - x_1^*$, $x_3 - x_2^*$, $x_4 - x_3^*$ and surrogates $x_2^*, x_3^*, x_4^*$]
Sketch: for each $x_i$, store
Ingress label: $i-1$
Rounded displacement: $\widetilde{(x_i - x_{i-1}^*)}$
Size: $\log n + d\log(1/\varepsilon) = O(d\log(1/\varepsilon))$ bits per point.
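A minimal Python sketch of this single-scale scheme, writing $x_i^*$ for the surrogate of $x_i$ and under stated assumptions: consecutive points are within distance $r$ of each other, and the $\varepsilon$-net is replaced by coordinate-wise grid rounding with step $\varepsilon\|v\|/\sqrt{d}$, which guarantees $\|\tilde{v} - v\| \le \varepsilon\|v\|/2$. Displacements are kept as floats rather than packed into $\log n + d\log(1/\varepsilon)$ bits, and the function names are hypothetical.

```python
import math
import random


def round_relative(v, eps):
    """Stand-in for rounding v to an eps-net (assumption): snap each coordinate
    to a grid of step h = eps * ||v|| / sqrt(d). Each coordinate moves by at
    most h / 2, so ||rounded - v|| <= eps * ||v|| / 2."""
    norm = math.hypot(*v)
    if norm == 0.0:
        return [0.0] * len(v)
    h = eps * norm / math.sqrt(len(v))
    return [h * round(c / h) for c in v]


def build_sketch(points, eps):
    """For x_1 store the point itself; for x_i (i > 1) store the ingress label
    i - 1 and the rounded displacement from the previous surrogate."""
    sketch = [(None, list(points[0]))]
    prev = list(points[0])  # surrogate of the previous point
    for i in range(1, len(points)):
        disp = round_relative([a - b for a, b in zip(points[i], prev)], eps)
        sketch.append((i - 1, disp))
        prev = [a + b for a, b in zip(prev, disp)]
    return sketch


def decode_surrogates(sketch):
    """Rebuild every surrogate from ingress labels and stored displacements."""
    surrogates = []
    for ingress, disp in sketch:
        if ingress is None:
            surrogates.append(list(disp))
        else:
            surrogates.append([a + b for a, b in zip(surrogates[ingress], disp)])
    return surrogates


def estimate(surrogates, i, j):
    """Estimate ||x_i - x_j|| by the distance between the surrogates."""
    return math.dist(surrogates[i], surrogates[j])


# Demo: a random walk in R^5 whose consecutive points are r apart.
rng = random.Random(7)
r, eps, d = 1.0, 0.1, 5
points = [[0.0] * d]
for _ in range(30):
    step = [rng.gauss(0.0, 1.0) for _ in range(d)]
    scale = r / math.hypot(*step)
    points.append([p + scale * s for p, s in zip(points[-1], step)])

surrogates = decode_surrogates(build_sketch(points, eps))
worst = max(math.dist(p, s) for p, s in zip(points, surrogates))
print("max ||x_i - x_i*|| = %.4f (claim: <= %.4f)" % (worst, 2 * eps * r))
```

The decoder needs only the ingress labels and rounded displacements, and it recomputes exactly the surrogates the encoder used, so the claim's error bound carries over to the estimates.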
The scheme works because all displacements have length roughly $r$.
Multiple Distance Scales
[Figure: two pairs $\{x_1, x_2\}$ and $\{y_1, y_2\}$ in $\mathbb{R}^d$, each pair at distance $r$, the two pairs at distance $R \gg r$]
The previous scheme breaks down: an error of $\varepsilon R$ is too big for distance $r$.
What can we do?
Option 1: Use better precision, $\varepsilon' \leftarrow \varepsilon \cdot r/R$ (costs more bits).
Option 2: Contract each pair to a single point (saves bits; works if $r < \varepsilon R$).
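The trade-off between the two options can be written out under the figure's assumptions: each pair has diameter at most $r$, and the displacement linking the two pairs has length about $R$, so with precision $\varepsilon'$ it is recovered to within $\varepsilon' R = \varepsilon r$.

```latex
\begin{align*}
&\text{Option 1:} && \varepsilon' = \frac{\varepsilon r}{R}
  \;\Rightarrow\; \log\frac{1}{\varepsilon'} = \log\frac{1}{\varepsilon} + \log\frac{R}{r}
  \ \text{bits per coordinate} \\
&\text{Option 2:} && \bigl|\,\|x_a - y_b\| - \|x_1 - y_1\|\,\bigr|
  \le \|x_a - x_1\| + \|y_b - y_1\| \le 2r, \quad a, b \in \{1, 2\}
\end{align*}
```

Option 1 therefore pays an extra $\log(R/r)$ bits per coordinate, which can approach $\log\Phi$, while Option 2 moves every cross distance (roughly $R$) by at most $2r$, a relative error of about $2r/R$, which is $O(\varepsilon)$ when $r < \varepsilon R$.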
General Case: Overview
Store each point as a displacement from a nearby point.
Tune the precision individually per cluster.
Contract isolated clusters into a single point (from an external point of view).
Hierarchical Clustering
Step 1: Construct a cluster tree. In level $\ell$, draw edges of length $< 2^{\ell}$, and take connected components.
[Figure: points 1 to 8 in $\mathbb{R}^d$ and the resulting cluster tree]
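Step 1 can be prototyped with union-find. Because the thresholds $2^{\ell}$ only grow, edges can be added in sorted order (Kruskal-style), and the components at consecutive levels nest, which is what makes them a tree. The quadratic all-pairs edge list is a simplification (the talk's construction-time bound relies on LSH instead), and `cluster_levels` is a hypothetical name.

```python
import math
from itertools import combinations


def find(parent, a):
    """Union-find lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def cluster_levels(points, levels):
    """For each level l (in increasing order), join points at distance < 2**l
    and return the connected components as sorted tuples of point indices."""
    n = len(points)
    # All pairwise edges, shortest first (quadratic; a simplification).
    edges = sorted(
        (math.dist(points[a], points[b]), a, b) for a, b in combinations(range(n), 2)
    )
    parent = list(range(n))
    result = {}
    k = 0
    for level in levels:
        threshold = 2.0 ** level
        # Thresholds only grow, so edges are added incrementally.
        while k < len(edges) and edges[k][0] < threshold:
            _, a, b = edges[k]
            parent[find(parent, a)] = find(parent, b)
            k += 1
        groups = {}
        for a in range(n):
            groups.setdefault(find(parent, a), []).append(a)
        result[level] = sorted(tuple(g) for g in groups.values())
    return result


# Demo on six points on a line.
pts = [[0.0], [1.5], [3.0], [10.0], [11.0], [40.0]]
for level, comps in cluster_levels(pts, range(0, 7)).items():
    print(level, comps)
```

On the demo points, level 1 yields the clusters $\{0,1,2\}, \{3,4\}, \{5\}$, and level 5 merges everything into a single cluster.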
Surrogates
Step 2: Store each point as a displacement from a nearby point. Tune the precision individually per cluster (based on its level and diameter).
Claim: Amortized $O(\varepsilon^{-2}\log n)$ bits per cluster.
Tree Compression
Step 3: Contract isolated clusters (from an external point of view). This compresses long degree-1 paths in the cluster tree.
Claim: The tree size becomes $O(n\log(1/\varepsilon))$.
[Figure: the compressed cluster tree over points 1 to 8, with a degree-1 path replaced by a "long edge"]
The Sketch
Store the tree:
#nodes: $O(n\log(1/\varepsilon))$
Lengths of long edges: $O(n\log\log\Phi)$ bits
For each node, store:
Center label: $O(\log n)$ bits
Ingress label: $O(\log n)$ bits
Approx. displacement: $O(\varepsilon^{-2}\log n)$ bits (amortized)
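Adding up the per-node costs of the sketch recovers the theorem's bound. This tally assumes, beyond the slide, that the tree shape itself takes $O(1)$ bits per node, which is dominated by the other terms:

```latex
\begin{align*}
\text{total bits}
&= \underbrace{O(n\log(1/\varepsilon))}_{\#\text{nodes}}
   \cdot \Bigl(\underbrace{O(\log n)}_{\text{center}} + \underbrace{O(\log n)}_{\text{ingress}}
   + \underbrace{O(\varepsilon^{-2}\log n)}_{\text{displacement}}\Bigr)
   + \underbrace{O(n\log\log\Phi)}_{\text{long edges}} \\
&= O\bigl(\varepsilon^{-2}\log(1/\varepsilon)\, n\log n + n\log\log\Phi\bigr)
\end{align*}
```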
Estimation
Given a pair of point labels, compute the best surrogates available in the sketch, and return the distance between them.
Example queries: Dist(2,8) = ? and Dist(7,8) = ?
[Figure: the compressed cluster tree over points 1 to 8]
Subsequent Extensions
Recover a low-dimensional embedding.
Approximate nearest neighbor of a new query point.
Estimate all distances from a new query point.
Open problem: the $\log(1/\varepsilon)$ gap for Euclidean metrics, $O(\varepsilon^{-2} n\log n\log(1/\varepsilon) + n\log\log\Phi)$ vs. $\Omega(\varepsilon^{-2} n\log n + n\log\log\Phi)$.
Thank you