
1 Approximate Near Neighbors for General Symmetric Norms
Ilya Razenshteyn (MIT CSAIL), joint with Alexandr Andoni (Columbia University), Aleksandar Nikolov (University of Toronto), and Erik Waingarten (Columbia University). arXiv:

2 Nearest Neighbor Search
Motivation: data is mapped to a feature vector space with a distance function. From there: data analysis (geometry / linear algebra / optimization) and similarity search (Nearest Neighbor Search).

3 An example: word embeddings
Word embeddings are high-dimensional vectors that capture semantic similarity between words (and more). GloVe [Pennington, Socher, Manning 2014]: 400K words, 300 dimensions. Ten nearest neighbors for "NYU"? (Figure: neighbors of related words such as Yale, Harvard, Cornell, MIT, Juilliard.)

4 Approximate Near Neighbors (ANN)
- Dataset: $n$ points $P$ in a metric space $X$
- Approximation $c > 1$, distance threshold $r > 0$
- Query: $q \in X$ such that there is $p^* \in P$ with $d_X(q, p^*) \le r$
- Want: $\hat{p} \in P$ such that $d_X(q, \hat{p}) \le cr$
- Parameters: space, query time
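To make the $(c, r)$ guarantee concrete, here is a small Python sketch (an illustration, not part of the talk) using a naive linear scan over an $\ell_2$ dataset; real ANN data structures provide the same guarantee with sublinear query time.

```python
import numpy as np

def ann_query(P, q, r, c):
    # Naive (c, r)-ANN: if some p* in P has ||q - p*|| <= r, we must return
    # a point within distance c * r of q; otherwise we may return None.
    dists = np.linalg.norm(P - q, axis=1)
    i = int(np.argmin(dists))
    return P[i] if dists[i] <= c * r else None

P = np.random.default_rng(0).standard_normal((100, 8))   # dataset of n = 100 points
q = P[17] + 0.05                                          # a query near one of them
print(ann_query(P, q, r=0.2, c=2.0))                      # some point within 0.4 of q
```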

5 FAQ
Q: Why approximation? A: The exact case is hard in the high-dimensional regime.
Q: What does "high-dimensional" mean? A: $d = \omega(\log n)$, where $d$ is the dimension of the metric. The complexity must depend on $d$ as $2^{o(d)}$, ideally as $d^{O(1)}$.
Q: How is the dimension defined? A: A metric is typically defined on $R^d$; alternatively, doubling dimension, etc.
Focus of this talk: a metric on $R^d$, where $\omega(\log n) \le d \le n^{o(1)}$.

6 Which distance function to use?
A distance function must capture semantic similarity well and must be algorithmically tractable. For word embeddings, etc.: cosine similarity.
The goal: classify metrics according to the complexity of high-dimensional ANN.
- For theory: a poorly-understood property of a metric
- For practice: a universal algorithm for ANN

7 High-dimensional norms
An important case: $X$ is a normed space, $d_X(x_1, x_2) = \|x_1 - x_2\|$, where $\|\cdot\| : R^d \to R_+$ satisfies:
- $\|x\| = 0$ iff $x = 0$
- $\|\alpha x\| = |\alpha| \cdot \|x\|$
- $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$
Lots of tools are available (linear functional analysis). [Andoni, Krauthgamer, R 2015] characterizes norms that allow efficient sketching (succinct summarization), which implies efficient ANN. Approximation $O(\sqrt{d})$ is easy (John's theorem).

8 Unit balls
A norm can be given by its unit ball $B_X = \{x \in R^d : \|x\| \le 1\}$.
Claim: $B_X$ is a symmetric convex body.
Claim: any such body $K$ can be a unit ball, via $\|x\|_K = \inf\{t > 0 : x/t \in K\}$.
What property of a convex body makes ANN with respect to it tractable?
John's theorem: any symmetric convex body is close to an ellipsoid (gives approximation $\sqrt{d}$).
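As a small aside (an illustrative sketch, not from the slides), the Minkowski gauge above can be evaluated by bisection given only a membership oracle for the convex body $K$:

```python
import numpy as np

def gauge_norm(x, in_K, iters=60):
    # Minkowski gauge ||x||_K = inf { t > 0 : x / t in K }, computed by bisection
    # using only a membership oracle in_K for the symmetric convex body K.
    x = np.asarray(x, dtype=float)
    if not x.any():
        return 0.0
    hi = 1.0
    while not in_K(x / hi):        # grow t until x / t lands inside K
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):         # bisect: scales below the gauge put x / t outside K
        mid = 0.5 * (lo + hi)
        if in_K(x / mid):
            hi = mid
        else:
            lo = mid
    return hi

# Example: K = Euclidean unit ball recovers the l_2 norm.
print(gauge_norm(np.array([3.0, 4.0]), lambda z: np.linalg.norm(z) <= 1.0))  # ~5.0
```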

9 Our result
If $X$ is a symmetric normed space (invariant under permutations of coordinates and sign flips) and $d = n^{o(1)}$ (more precisely, $d = 2^{o(\log n / \log\log n)}$), we can solve ANN with:
- Approximation $(\log\log n)^{O(1)}$
- Space $n^{1+o(1)}$
- Query time $n^{o(1)}$

10 Examples
- Usual $\ell_p$ norms: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$
- Top-$k$ norm: the sum of the $k$ largest absolute values of the coordinates; interpolates between $\ell_1$ and $\ell_\infty$
- Orlicz norms: the unit ball is $\{x \in R^d : \sum_i G(|x_i|) \le 1\}$, where $G(\cdot)$ is convex, non-negative, and $G(0) = 0$; gives the $\ell_p$ norms for $G(t) = t^p$
- $k$-support norm, box-$\Theta$ norm, $K$-functional (arise in probability and machine learning)
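For concreteness, a short sketch (for illustration, not the authors' code) of two of these norms: the top-$k$ norm directly, and an Orlicz (Luxemburg) norm via bisection on the scale $t$ in $\sum_i G(|x_i|/t) \le 1$, assuming $G$ grows without bound.

```python
import numpy as np

def top_k_norm(x, k):
    # Sum of the k largest absolute values of the coordinates of x.
    return float(np.sort(np.abs(x))[-k:].sum())

def orlicz_norm(x, G, iters=60):
    # Luxemburg norm: inf { t > 0 : sum_i G(|x_i| / t) <= 1 }, via bisection on t.
    x = np.abs(np.asarray(x, dtype=float))
    if not x.any():
        return 0.0
    hi = 1.0
    while np.sum(G(x / hi)) > 1.0:       # grow the scale until feasible
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(G(x / mid)) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

x = np.array([3.0, -4.0, 1.0])
print(top_k_norm(x, 2))                      # 7.0
print(orlicz_norm(x, lambda t: t ** 2))      # ~5.099, i.e. the l_2 norm of x
```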

11 Prior work: symmetric norms
[Blasiok, Braverman, Chestnut, Krauthgamer, Yang 2015]: a classification of symmetric norms according to their streaming complexity, which depends on how well the norm concentrates on the Euclidean ball. Unlike streaming, ANN turns out to be always tractable.

12 Prior work: ANN
Mostly, the focus has been on the $\ell_1$ (Hamming/Manhattan) and $\ell_2$ (Euclidean) norms:
- Work for many applications
- Allow efficient algorithms based on hashing
- Locality-Sensitive Hashing [Indyk, Motwani 1998], [Andoni, Indyk 2006]
- Data-dependent LSH [Andoni, Indyk, Nguyen, R 2014], [Andoni, R 2015]
- [Andoni, Laarhoven, R, Waingarten 2017]: tight trade-off between space and query time for every $c > 1$
Few results for other norms ($\ell_\infty$, general $\ell_p$; we will see them later).

13 ANN for $\ell_\infty$
[Indyk 1998]: ANN for $d$-dimensional $\ell_\infty$ with:
- Space $d \cdot n^{1+\varepsilon}$
- Query time $O(d \log n)$
- Approximation $O(\varepsilon^{-1} \cdot \log\log d)$
Main idea: recursive partitioning.
- A "small" ball with $\Omega(n)$ points: easy
- No such balls: there is a "good" cut with respect to some coordinate
[Andoni, Croitoru, Patrascu 2008], [Kapralov, Panigrahy 2012]: approximation $O(\log\log d)$ is tight for decision trees!

14 Metric embeddings
A map $f : X \to Y$ is an embedding with distortion $C$ if, for all $a, b \in X$:
$d_Y(f(a), f(b))/C \le d_X(a, b) \le d_Y(f(a), f(b))$
This gives reductions for geometric problems: ANN with approximation $D$ for $Y$ yields ANN with approximation $CD$ for $X$.

15 Embedding norms into $\ell_\infty$
For a normed space $X$ and $\varepsilon > 0$ there exists $f : X \to \ell_\infty^{d'}$ with $\|f(x)\|_\infty \in (1 \pm \varepsilon) \cdot \|x\|_X$.
Proof idea: $\|x\|_X \approx \max_{y \in N} \langle x, y \rangle$; take all directions and discretize (more details later).
Can we combine this with ANN for $\ell_\infty$ and obtain ANN for any norm? No! The discretization requires $d' = (1/\varepsilon)^{O(d)}$; this is tight even for $\ell_2$. (The approximation would be $O(\log\log d') = O(\log d) \ll \sqrt{d}$.)

16 The strategy
- Any norm embeds into $\ell_\infty^{d'}$ with $d' = 2^{O(d)}$ (dimension too large)
- A symmetric norm embeds into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} (\text{top-}k_{ij}\text{ norm})$ with $d' = d^{O(\log\log d)}$
We bypass non-embeddability into low-dimensional $\ell_\infty$ by allowing a more complicated host space, which is still tractable.

17 $\ell_p$-direct sums of metric spaces
For metrics $M_1, M_2, \dots, M_t$, define $\bigoplus_{\ell_p} M_i$ as follows:
- The ground set is $M_1 \times M_2 \times \dots \times M_t$
- The distance is $d\big((x_1, \dots, x_t), (y_1, \dots, y_t)\big) = \big\| \big(d_{M_1}(x_1, y_1), \dots, d_{M_t}(x_t, y_t)\big) \big\|_p$
Example: $\bigoplus_{\ell_p} \ell_q$ — cascaded norms.
Our host space: $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$, where $X_{ij}$ is $R^d$ equipped with the top-$k_{ij}$ norm. The outer sum has $d^{O(\log\log d)}$ components, the inner sum has $d$ components.
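A small sketch (for illustration, with made-up nesting just to fix ideas) of the distance in the host space described above: an outer $\ell_\infty$ sum over blocks, an inner $\ell_1$ sum inside each block, and a top-$k_{ij}$ norm on each copy of $R^d$.

```python
import numpy as np

def top_k_norm(v, k):
    # Sum of the k largest absolute values of the coordinates of v.
    return float(np.sort(np.abs(v))[-k:].sum())

def host_distance(X, Y, ks):
    # X, Y: nested lists with X[i][j] a vector in R^d; ks[i][j] is the k of block (i, j).
    # Inner l_1 sum over j, outer l_inf (max) over i.
    block = [
        sum(top_k_norm(np.asarray(X[i][j]) - np.asarray(Y[i][j]), ks[i][j])
            for j in range(len(ks[i])))
        for i in range(len(ks))
    ]
    return max(block)

rng = np.random.default_rng(0)
d = 5
ks = [[1, 3], [2, 5]]                                    # two outer blocks, two inner copies each
X = [[rng.standard_normal(d) for _ in row] for row in ks]
Y = [[rng.standard_normal(d) for _ in row] for row in ks]
print(host_distance(X, Y, ks))
```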

18 Two necessary steps
1. Embed a symmetric norm into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$
2. Solve ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$
Prior work on ANN via product spaces: Fréchet distance [Indyk 2002], edit distance [Indyk 2004], and Ulam distance [Andoni, Indyk, Krauthgamer 2009].

19 ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$
[Indyk 2002], [Andoni 2009]: if for $M_1, M_2, \dots, M_t$ there are data structures for $c$-ANN, then for $\bigoplus_{\ell_p} M_i$ one can get $O(c \log\log n)$-ANN with almost the same time and space.
- A powerful generalization of ANN for $\ell_\infty$ [Indyk 1998]
- Trivially implies ANN for general $\ell_p$
- Thus, it is enough to handle ANN for the $X_{ij}$ (top-$k$ norms)!

20 ANN for top-$k$ norms
Top-$k$ norms include $\ell_1$ ($k = d$) and $\ell_\infty$ ($k = 1$), so we need a unified approach.
Idea: embed the top-$k$ norm into $\ell_\infty^{d'}$ and use [Indyk 1998]. Approximation: distortion $\times$ $O(\log\log d')$.
Problem: $\ell_1$ requires $2^{\Omega(d)}$-dimensional $\ell_\infty$. Solution: use randomized embeddings.

21 Embedding the top-$k$ norm into $\ell_\infty$
The case $k = d$ (that is, $\ell_1$). The embedding uses min-stability of the exponential distribution:
- Sample i.i.d. $u_1, u_2, \dots, u_d \sim \mathrm{Exp}(1)$
- Embed $f : (x_1, x_2, \dots, x_d) \mapsto \left(\frac{x_1}{u_1}, \frac{x_2}{u_2}, \dots, \frac{x_d}{u_d}\right)$
- $\Pr[\|f(x)\|_\infty \le t] = \prod_i \Pr\left[\frac{|x_i|}{u_i} \le t\right] = \prod_i e^{-|x_i|/t} = e^{-\|x\|_1 / t}$
Constant distortion w.h.p. (in reality: slightly different parameters).
General $k$: sample $u_i \sim \max\left\{\frac{1}{k}, \mathrm{Exp}(1)\right\}$.
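A quick numerical sketch (for illustration, not the paper's code) of this randomized embedding with the truncated exponentials $u_i \sim \max\{1/k, \mathrm{Exp}(1)\}$: the $\ell_\infty$ norm of the embedded vector should be comparable to the top-$k$ norm, up to the constant distortion promised above.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_top_k(x, k, rng):
    # Divide each coordinate by an independent sample of max(1/k, Exp(1)).
    u = np.maximum(1.0 / k, rng.exponential(1.0, size=len(x)))
    return x / u

d, k = 1000, 30
x = rng.standard_normal(d)
top_k = float(np.sort(np.abs(x))[-k:].sum())               # top-k norm of x
est = np.median([np.abs(embed_top_k(x, k, rng)).max() for _ in range(200)])
print(top_k, est)   # comparable up to a constant factor (w.h.p., per the slide)
```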

22 Detour: ANN for Orlicz norms
Reminder: for convex $G : R \to R_+$ with $G(0) = 0$, define a norm whose unit ball is $\{x : \sum_i G(|x_i|) \le 1\}$ (e.g., $G(t) = t^p$ gives the $\ell_p$ norm).
Embedding into $\ell_\infty$ (as before, $O(1)$ distortion w.h.p.):
- Sample i.i.d. $u_1, u_2, \dots, u_d \sim \mathcal{D}$, where $\Pr_{X \sim \mathcal{D}}[X \le t] = 1 - e^{-G(t)}$
- Embed $f : (x_1, x_2, \dots, x_d) \mapsto \left(\frac{x_1}{u_1}, \frac{x_2}{u_2}, \dots, \frac{x_d}{u_d}\right)$
A special case for $\ell_p$ norms appeared in [Andoni 2009].
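A sketch of the same idea for Orlicz norms (an illustration, assuming $G$ is strictly increasing so that $G^{-1}$ exists): since $\Pr[X \le t] = 1 - e^{-G(t)}$, one can sample from $\mathcal{D}$ as $G^{-1}(E)$ with $E \sim \mathrm{Exp}(1)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def embed_orlicz(x, G_inv, rng):
    # u_i = G_inv(E_i) with E_i ~ Exp(1) has CDF 1 - exp(-G(t));
    # dividing coordinate-wise gives the randomized embedding into l_inf.
    u = G_inv(rng.exponential(1.0, size=len(x)))
    return x / u

# Example: G(t) = t^p (so G_inv(s) = s^(1/p)) should recover the l_p norm.
p = 3.0
x = rng.standard_normal(500)
est = np.median([np.abs(embed_orlicz(x, lambda s: s ** (1.0 / p), rng)).max()
                 for _ in range(200)])
print(np.linalg.norm(x, ord=p), est)   # comparable up to a constant factor w.h.p.
```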

23 Where are we?
We can solve ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$, where $X_{ij}$ is $R^d$ equipped with a top-$k_{ij}$ norm.
What remains to be done: embed a $d$-dimensional symmetric norm into the ($d^{O(\log\log d)}$-dimensional) $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$.

24 Starting point: embedding any norm into $\ell_\infty$
For a normed space $X$ and $\varepsilon > 0$ there is a linear $f : X \to \ell_\infty^{d'}$ such that $\|f(x)\|_\infty \in (1 \pm \varepsilon) \cdot \|x\|_X$.
The normed space $X^*$ dual to $X$: $\|y\|_{X^*} = \sup_{\|x\|_X \le 1} \langle x, y \rangle$. The dual of $\ell_p$ is $\ell_q$ with $\frac{1}{p} + \frac{1}{q} = 1$ ($\ell_1$ vs. $\ell_\infty$, $\ell_2$ vs. $\ell_2$, etc.).
Claim: for every $x \in X$, $\|x\|_X \in (1 \pm \varepsilon) \cdot \max_{y \in N} |\langle x, y \rangle|$, where $N$ is an $\varepsilon$-net of $B_{X^*}$ (with respect to $X^*$).
This immediately gives an embedding: the coordinates of $f(x)$ are $\langle x, y \rangle$ for $y \in N$.
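A toy sketch (for illustration) of the claim for $X = \ell_1$, where $X^* = \ell_\infty$ and $B_{X^*}$ is the hypercube: taking the $2^d$ sign vectors as the net (here it certifies the norm exactly), each net vector contributes one coordinate $\langle x, y \rangle$ of the embedding into $\ell_\infty$, and the exponential net size is exactly the obstacle discussed earlier.

```python
import numpy as np
from itertools import product

def embed_via_dual_net(x, net):
    # One l_inf coordinate per net vector y: the inner product <x, y>.
    return np.array([np.dot(x, y) for y in net])

d = 4
x = np.array([0.5, -2.0, 1.0, 3.0])
# For X = l_1, the dual ball is the l_inf ball; its vertices {-1, +1}^d suffice.
net = [np.array(s) for s in product([-1.0, 1.0], repeat=d)]
fx = embed_via_dual_net(x, net)
print(np.abs(fx).max(), np.abs(x).sum())   # both equal ||x||_1 = 6.5
```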

25 Proof
For every $y \in N$, we have $\langle x, y \rangle \le \|x\|_X \cdot \|y\|_{X^*} \le \|x\|_X$, thus $\max_{y \in N} |\langle x, y \rangle| \le \|x\|_X$.
There exists $y$ with $\|y\|_{X^*} \le 1$ and $\langle x, y \rangle = \|x\|_X$ (non-trivial, requires the Hahn–Banach theorem).
Move $y$ to the closest $y' \in N$; we get $\langle x, y' \rangle \ge (1 - \varepsilon) \cdot \|x\|_X$.
Thus $\max_{y \in N} |\langle x, y \rangle| \ge (1 - \varepsilon) \cdot \|x\|_X$.
Can take $|N| = (1/\varepsilon)^{O(d)}$ by a volume argument.

26 Better embeddings for symmetric norms
Recap: one cannot embed even $\ell_2^d$ into $\ell_\infty^{d'}$ unless $d' = 2^{\Omega(d)}$.
Instead, we aim at embedding a symmetric norm into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$.
High-level idea: the new space is more forgiving and allows us to consider an $\varepsilon$-net of $B_{X^*}$ up to symmetry.
We show that there is an $\varepsilon$-net that is the result of applying symmetries to merely $d^{O(\log\log d)}$ vectors!

27 Exploiting symmetry
For a vector $x$, $\pi \in S_d$, and $\sigma \in \{-1, +1\}^d$, let $x_{\pi,\sigma}$ denote $x$ with coordinates permuted according to $\pi$ and signs flipped according to $\sigma$.
Recall: $\|x_{\pi,\sigma}\|_X = \|x\|_X$ for a symmetric norm.
Suppose $N'$ is an $\varepsilon$-net for $B_{X^*} \cap \mathcal{L}$, where $\mathcal{L} = \{y : y_1 \ge y_2 \ge \dots \ge y_d \ge 0\}$. Then:
$\|x\|_X \in (1 \pm \varepsilon) \cdot \max_{y \in N', \pi, \sigma} \langle x, y_{\pi,\sigma} \rangle = (1 \pm \varepsilon) \cdot \max_{y \in N'} \max_{\pi, \sigma} \langle x, y_{\pi,\sigma} \rangle = (1 \pm \varepsilon) \cdot \max_{y \in N'} \langle x^*, y \rangle = (1 \pm \varepsilon) \cdot \max_{y \in N'} \sum_k (y_k - y_{k+1}) \cdot (\text{top-}k\text{ norm of } x)$,
where $x^*$ is the non-increasing rearrangement of $|x|$ (the last step is Abel summation, with $y_{d+1} = 0$).
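A numeric check (for illustration) of the last equalities in the chain above: maximizing $\langle x, y_{\pi,\sigma} \rangle$ over permutations and sign flips pairs the sorted $|x|$ with the sorted $y$, and Abel summation rewrites that as a non-negative combination of top-$k$ norms.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
x = rng.standard_normal(d)
y = np.sort(rng.random(d))[::-1]            # y_1 >= y_2 >= ... >= y_d >= 0

# max over pi, sigma of <x, y_{pi,sigma}>: by the rearrangement inequality,
# it pairs the non-increasing rearrangement of |x| with y.
xs = np.sort(np.abs(x))[::-1]
best = float(np.dot(xs, y))

# Abel summation: sum_k (y_k - y_{k+1}) * (top-k norm of x), with y_{d+1} = 0.
y_ext = np.append(y, 0.0)
abel = sum((y_ext[k - 1] - y_ext[k]) * float(np.sort(np.abs(x))[-k:].sum())
           for k in range(1, d + 1))

print(best, abel)   # equal up to floating-point error
```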

28 Small nets
What remains to be done: an $\varepsilon$-net for $B_{X^*} \cap \{y : y_1 \ge y_2 \ge \dots \ge y_d \ge 0\}$ of size $d^{O_\varepsilon(\log\log d)}$.
We will see a weaker bound of $d^{O_\varepsilon(\log d)}$, still non-trivial.
The volume bound fails; instead, a simple explicit construction.

29 Small nets: continued
Want to approximate a vector $y \in B_{X^*}$ with $y_1 \ge y_2 \ge \dots \ge y_d \ge 0$:
- Zero out all $y_i$'s that are smaller than $y_1 / \mathrm{poly}(d)$
- Round all remaining coordinates to the nearest power of $1 + \varepsilon$
- $O_\varepsilon(\log d)$ scales; only the cardinality of each scale matters
- $d^{O_\varepsilon(\log d)}$ vectors in total (see the rounding sketch below)
Can be improved to $d^{O_\varepsilon(\log\log d)}$ by one more trick.
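A sketch of the rounding map that sends any such $y$ to a net point (one reading of the construction, with an assumed poly($d$) threshold exponent); the net is the image of this map, and its size is controlled by how many distinct rounded profiles can occur.

```python
import numpy as np

def round_to_net(y, eps, power=3):
    # y: non-increasing, non-negative vector (a point of B_{X*} intersected with
    # the cone y_1 >= ... >= y_d >= 0).
    # Step 1: zero out coordinates below y_1 / d^power (an assumed poly(d) cutoff).
    # Step 2: round the surviving coordinates to the nearest power of (1 + eps).
    y = np.asarray(y, dtype=float)
    d = len(y)
    if y[0] == 0.0:
        return y.copy()
    out = np.where(y < y[0] / d ** power, 0.0, y)
    nz = out > 0
    exps = np.rint(np.log(out[nz]) / np.log(1.0 + eps))
    out[nz] = (1.0 + eps) ** exps
    return out

y = np.sort(np.random.default_rng(3).random(10))[::-1]
print(y)
print(round_to_net(y, eps=0.1))   # nonzero coordinates move by at most a (1 + eps) factor
```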

30 Quick summary
- Embed a symmetric norm into a $d^{O(\log\log d)}$-dimensional product space of top-$k$ norms
- Use known techniques to reduce the ANN problem on the product space to ANN for the top-$k$ norm
- Use truncated exponential random variables to embed the top-$k$ norm into $\ell_\infty$, and use a known ANN data structure there

31 Two immediate open questions
- Improve the dependence on $d$ from $d^{O(\log\log d)}$ to $d^{O(1)}$: needs a better $\varepsilon$-net for $B_{X^*} \cap \{y : y_1 \ge y_2 \ge \dots \ge y_d \ge 0\}$; looks doable
- Improve the approximation from $(\log\log n)^{O(1)}$ to $O(\log\log d)$: going beyond $\log\log d$ is hard due to $\ell_\infty$; we would need to bypass ANN for product spaces; maybe a randomized embedding into low-dimensional $\ell_\infty$ for any symmetric norm?

32 General norms
There exists an embedding into $\ell_2$ with distortion $O(\sqrt{d})$ (John's theorem).
For symmetric norms, there is a universal $d^{O(\log\log d)}$-dimensional space that can host all $d$-dimensional symmetric norms.
This is impossible for general norms, even for randomized embeddings: even distortion $d$ requires dimension $2^{d^{\Omega(1)}}$.
Stronger hardness results? They would be implied by the following: there is a family of spectral expanders that embeds with distortion $O(1)$ into some $\log^{O(1)} n$-dimensional norm, where $n$ is the number of nodes [Naor 2016].

33 The main open question
Is there an efficient ANN algorithm for general high-dimensional norms with approximation $d^{o(1)}$? There is hope…
Thanks!

