
1 Approximate Near Neighbors for General Symmetric Norms
Ilya Razenshteyn (MIT CSAIL), joint with Alexandr Andoni (Columbia University), Aleksandar Nikolov (University of Toronto), and Erik Waingarten (Columbia University). arXiv:

2 Nearest Neighbor Search
Motivation: data is mapped to a feature vector space with a distance function. From there: data analysis (geometry / linear algebra / optimization) and similarity search (Nearest Neighbor Search).

3 An example: word embeddings
Word embeddings are high-dimensional vectors that capture semantic similarity between words (and more). GloVe [Pennington, Socher, Manning 2014]: 400K words, 300 dimensions. Ten nearest neighbors for "NYU"? (Figure: neighbors of related words such as Yale, Harvard, Cornell, MIT, Juilliard.)

4 Approximate Near Neighbors (ANN)
- Dataset: $n$ points $P$ in a metric space $X$
- Approximation $c > 1$, distance threshold $r > 0$
- Query: $q \in X$ such that there is $p^* \in P$ with $d_X(q, p^*) \le r$
- Want: $\hat{p} \in P$ such that $d_X(q, \hat{p}) \le cr$
- Parameters: space, query time
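To make the $(c, r)$ guarantee concrete, here is a small Python sketch (an illustration, not part of the talk) using a naive linear scan over an $\ell_2$ dataset; real ANN data structures provide the same guarantee with sublinear query time.

```python
import numpy as np

def ann_query(P, q, r, c):
    # Naive (c, r)-ANN: if some p* in P has ||q - p*|| <= r, we must return
    # a point within distance c * r of q; otherwise we may return None.
    dists = np.linalg.norm(P - q, axis=1)
    i = int(np.argmin(dists))
    return P[i] if dists[i] <= c * r else None

P = np.random.default_rng(0).standard_normal((100, 8))   # dataset of n = 100 points
q = P[17] + 0.05                                          # a query near one of them
print(ann_query(P, q, r=0.2, c=2.0))                      # some point within 0.4 of q
```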

5 FAQ
Q: Why approximation? A: The exact case is hard in the high-dimensional regime.
Q: What does "high-dimensional" mean? A: $d = \omega(\log n)$, where $d$ is the dimension of the metric. The complexity must depend on $d$ as $2^{o(d)}$, ideally as $d^{O(1)}$.
Q: How is the dimension defined? A: A metric is typically defined on $R^d$; alternatively, doubling dimension, etc.
Focus of this talk: a metric on $R^d$, where $\omega(\log n) \le d \le n^{o(1)}$.

6 Which distance function to use?
A distance function must capture semantic similarity well and must be algorithmically tractable. For word embeddings, etc.: cosine similarity.
The goal: classify metrics according to the complexity of high-dimensional ANN.
- For theory: a poorly-understood property of a metric
- For practice: a universal algorithm for ANN

7 High-dimensional norms
An important case: $X$ is a normed space, $d_X(x_1, x_2) = \|x_1 - x_2\|$, where $\|\cdot\| : R^d \to R_+$ satisfies:
- $\|x\| = 0$ iff $x = 0$
- $\|\alpha x\| = |\alpha| \cdot \|x\|$
- $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$
Lots of tools are available (linear functional analysis). [Andoni, Krauthgamer, R 2015] characterizes norms that allow efficient sketching (succinct summarization), which implies efficient ANN. Approximation $O(\sqrt{d})$ is easy (John's theorem).

8 Unit balls
A norm can be given by its unit ball $B_X = \{x \in R^d : \|x\| \le 1\}$.
Claim: $B_X$ is a symmetric convex body.
Claim: any such body $K$ can be a unit ball, via $\|x\|_K = \inf\{t > 0 : x/t \in K\}$.
What property of a convex body makes ANN with respect to it tractable?
John's theorem: any symmetric convex body is close to an ellipsoid (gives approximation $\sqrt{d}$).
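As a small aside (an illustrative sketch, not from the slides), the Minkowski gauge above can be evaluated by bisection given only a membership oracle for the convex body $K$:

```python
import numpy as np

def gauge_norm(x, in_K, iters=60):
    # Minkowski gauge ||x||_K = inf { t > 0 : x / t in K }, computed by bisection
    # using only a membership oracle in_K for the symmetric convex body K.
    x = np.asarray(x, dtype=float)
    if not x.any():
        return 0.0
    hi = 1.0
    while not in_K(x / hi):        # grow t until x / t lands inside K
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):         # bisect: scales below the gauge put x / t outside K
        mid = 0.5 * (lo + hi)
        if in_K(x / mid):
            hi = mid
        else:
            lo = mid
    return hi

# Example: K = Euclidean unit ball recovers the l_2 norm.
print(gauge_norm(np.array([3.0, 4.0]), lambda z: np.linalg.norm(z) <= 1.0))  # ~5.0
```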

9 Our result
If $X$ is a symmetric normed space (invariant under permutations of coordinates and sign flips) and $d = n^{o(1)}$ (more precisely, $d = 2^{o(\log n / \log\log n)}$), we can solve ANN with:
- Approximation $(\log\log n)^{O(1)}$
- Space $n^{1+o(1)}$
- Query time $n^{o(1)}$

10 Examples
- Usual $\ell_p$ norms: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$
- Top-$k$ norm: the sum of the $k$ largest absolute values of the coordinates; interpolates between $\ell_1$ and $\ell_\infty$
- Orlicz norms: the unit ball is $\{x \in R^d : \sum_i G(|x_i|) \le 1\}$, where $G(\cdot)$ is convex, non-negative, and $G(0) = 0$; gives the $\ell_p$ norms for $G(t) = t^p$
- $k$-support norm, box-$\Theta$ norm, $K$-functional (arise in probability and machine learning)
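For concreteness, a short sketch (for illustration, not the authors' code) of two of these norms: the top-$k$ norm directly, and an Orlicz (Luxemburg) norm via bisection on the scale $t$ in $\sum_i G(|x_i|/t) \le 1$, assuming $G$ grows without bound.

```python
import numpy as np

def top_k_norm(x, k):
    # Sum of the k largest absolute values of the coordinates of x.
    return float(np.sort(np.abs(x))[-k:].sum())

def orlicz_norm(x, G, iters=60):
    # Luxemburg norm: inf { t > 0 : sum_i G(|x_i| / t) <= 1 }, via bisection on t.
    x = np.abs(np.asarray(x, dtype=float))
    if not x.any():
        return 0.0
    hi = 1.0
    while np.sum(G(x / hi)) > 1.0:       # grow the scale until feasible
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(G(x / mid)) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

x = np.array([3.0, -4.0, 1.0])
print(top_k_norm(x, 2))                      # 7.0
print(orlicz_norm(x, lambda t: t ** 2))      # ~5.099, i.e. the l_2 norm of x
```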

11 Prior work: symmetric norms
[Blasiok, Braverman, Chestnut, Krauthgamer, Yang 2015]: a classification of symmetric norms according to their streaming complexity, which depends on how well the norm concentrates on the Euclidean ball. Unlike streaming, ANN turns out to be always tractable.

12 Prior work: ANN
Mostly, the focus has been on the $\ell_1$ (Hamming/Manhattan) and $\ell_2$ (Euclidean) norms:
- Work for many applications
- Allow efficient algorithms based on hashing
- Locality-Sensitive Hashing [Indyk, Motwani 1998], [Andoni, Indyk 2006]
- Data-dependent LSH [Andoni, Indyk, Nguyen, R 2014], [Andoni, R 2015]
- [Andoni, Laarhoven, R, Waingarten 2017]: tight trade-off between space and query time for every $c > 1$
Few results for other norms ($\ell_\infty$, general $\ell_p$; we will see them later).

13 ANN for $\ell_\infty$
[Indyk 1998]: ANN for $d$-dimensional $\ell_\infty$ with:
- Space $d \cdot n^{1+\varepsilon}$
- Query time $O(d \log n)$
- Approximation $O(\varepsilon^{-1} \cdot \log\log d)$
Main idea: recursive partitioning.
- A "small" ball with $\Omega(n)$ points: easy
- No such balls: there is a "good" cut with respect to some coordinate
[Andoni, Croitoru, Patrascu 2008], [Kapralov, Panigrahy 2012]: approximation $O(\log\log d)$ is tight for decision trees!

14 Metric embeddings
A map $f : X \to Y$ is an embedding with distortion $C$ if, for all $a, b \in X$:
$d_Y(f(a), f(b))/C \le d_X(a, b) \le d_Y(f(a), f(b))$
This gives reductions for geometric problems: ANN with approximation $D$ for $Y$ yields ANN with approximation $CD$ for $X$.

15 Embedding norms into $\ell_\infty$
For a normed space $X$ and $\varepsilon > 0$ there exists $f : X \to \ell_\infty^{d'}$ with $\|f(x)\|_\infty \in (1 \pm \varepsilon) \cdot \|x\|_X$.
Proof idea: $\|x\|_X \approx \max_{y \in N} \langle x, y \rangle$; take all directions and discretize (more details later).
Can we combine this with ANN for $\ell_\infty$ and obtain ANN for any norm? No! The discretization requires $d' = (1/\varepsilon)^{O(d)}$; this is tight even for $\ell_2$. (The approximation would be $O(\log\log d') = O(\log d) \ll \sqrt{d}$.)

16 The strategy
- Any norm embeds into $\ell_\infty^{d'}$ with $d' = 2^{O(d)}$ (dimension too large)
- A symmetric norm embeds into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} (\text{top-}k_{ij}\text{ norm})$ with $d' = d^{O(\log\log d)}$
We bypass non-embeddability into low-dimensional $\ell_\infty$ by allowing a more complicated host space, which is still tractable.

17 $\ell_p$-direct sums of metric spaces
For metrics $M_1, M_2, \dots, M_t$, define $\bigoplus_{\ell_p} M_i$ as follows:
- The ground set is $M_1 \times M_2 \times \dots \times M_t$
- The distance is $d\big((x_1, \dots, x_t), (y_1, \dots, y_t)\big) = \big\| \big(d_{M_1}(x_1, y_1), \dots, d_{M_t}(x_t, y_t)\big) \big\|_p$
Example: $\bigoplus_{\ell_p} \ell_q$ — cascaded norms.
Our host space: $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$, where $X_{ij}$ is $R^d$ equipped with the top-$k_{ij}$ norm. The outer sum has $d^{O(\log\log d)}$ components, the inner sum has $d$ components.
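A small sketch (for illustration, with made-up nesting just to fix ideas) of the distance in the host space described above: an outer $\ell_\infty$ sum over blocks, an inner $\ell_1$ sum inside each block, and a top-$k_{ij}$ norm on each copy of $R^d$.

```python
import numpy as np

def top_k_norm(v, k):
    # Sum of the k largest absolute values of the coordinates of v.
    return float(np.sort(np.abs(v))[-k:].sum())

def host_distance(X, Y, ks):
    # X, Y: nested lists with X[i][j] a vector in R^d; ks[i][j] is the k of block (i, j).
    # Inner l_1 sum over j, outer l_inf (max) over i.
    block = [
        sum(top_k_norm(np.asarray(X[i][j]) - np.asarray(Y[i][j]), ks[i][j])
            for j in range(len(ks[i])))
        for i in range(len(ks))
    ]
    return max(block)

rng = np.random.default_rng(0)
d = 5
ks = [[1, 3], [2, 5]]                                    # two outer blocks, two inner copies each
X = [[rng.standard_normal(d) for _ in row] for row in ks]
Y = [[rng.standard_normal(d) for _ in row] for row in ks]
print(host_distance(X, Y, ks))
```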

18 Two necessary steps
1. Embed a symmetric norm into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$
2. Solve ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$
Prior work on ANN via product spaces: Fréchet distance [Indyk 2002], edit distance [Indyk 2004], and Ulam distance [Andoni, Indyk, Krauthgamer 2009].

19 ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$
[Indyk 2002], [Andoni 2009]: if for $M_1, M_2, \dots, M_t$ there are data structures for $c$-ANN, then for $\bigoplus_{\ell_p} M_i$ one can get $O(c \log\log n)$-ANN with almost the same time and space.
- A powerful generalization of ANN for $\ell_\infty$ [Indyk 1998]
- Trivially implies ANN for general $\ell_p$
- Thus, it is enough to handle ANN for the $X_{ij}$ (top-$k$ norms)!

20 ANN for top-$k$ norms
Top-$k$ norms include $\ell_1$ ($k = d$) and $\ell_\infty$ ($k = 1$), so we need a unified approach.
Idea: embed the top-$k$ norm into $\ell_\infty^{d'}$ and use [Indyk 1998]. Approximation: distortion $\times$ $O(\log\log d')$.
Problem: $\ell_1$ requires $2^{\Omega(d)}$-dimensional $\ell_\infty$. Solution: use randomized embeddings.

21 Embedding the top-$k$ norm into $\ell_\infty$
The case $k = d$ (that is, $\ell_1$). The embedding uses min-stability of the exponential distribution:
- Sample i.i.d. $u_1, u_2, \dots, u_d \sim \mathrm{Exp}(1)$
- Embed $f : (x_1, x_2, \dots, x_d) \mapsto \left(\frac{x_1}{u_1}, \frac{x_2}{u_2}, \dots, \frac{x_d}{u_d}\right)$
- $\Pr[\|f(x)\|_\infty \le t] = \prod_i \Pr\left[\frac{|x_i|}{u_i} \le t\right] = \prod_i e^{-|x_i|/t} = e^{-\|x\|_1 / t}$
Constant distortion w.h.p. (in reality: slightly different parameters).
General $k$: sample $u_i \sim \max\left\{\frac{1}{k}, \mathrm{Exp}(1)\right\}$.
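A quick numerical sketch (for illustration, not the paper's code) of this randomized embedding with the truncated exponentials $u_i \sim \max\{1/k, \mathrm{Exp}(1)\}$: the $\ell_\infty$ norm of the embedded vector should be comparable to the top-$k$ norm, up to the constant distortion promised above.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_top_k(x, k, rng):
    # Divide each coordinate by an independent sample of max(1/k, Exp(1)).
    u = np.maximum(1.0 / k, rng.exponential(1.0, size=len(x)))
    return x / u

d, k = 1000, 30
x = rng.standard_normal(d)
top_k = float(np.sort(np.abs(x))[-k:].sum())               # top-k norm of x
est = np.median([np.abs(embed_top_k(x, k, rng)).max() for _ in range(200)])
print(top_k, est)   # comparable up to a constant factor (w.h.p., per the slide)
```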

22 Detour: ANN for Orlicz norms
Reminder: for convex $G : R \to R_+$ with $G(0) = 0$, define a norm whose unit ball is $\{x : \sum_i G(|x_i|) \le 1\}$ (e.g., $G(t) = t^p$ gives the $\ell_p$ norm).
Embedding into $\ell_\infty$ (as before, $O(1)$ distortion w.h.p.):
- Sample i.i.d. $u_1, u_2, \dots, u_d \sim \mathcal{D}$, where $\Pr_{X \sim \mathcal{D}}[X \le t] = 1 - e^{-G(t)}$
- Embed $f : (x_1, x_2, \dots, x_d) \mapsto \left(\frac{x_1}{u_1}, \frac{x_2}{u_2}, \dots, \frac{x_d}{u_d}\right)$
A special case for $\ell_p$ norms appeared in [Andoni 2009].
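A sketch of the same idea for Orlicz norms (an illustration, assuming $G$ is strictly increasing so that $G^{-1}$ exists): since $\Pr[X \le t] = 1 - e^{-G(t)}$, one can sample from $\mathcal{D}$ as $G^{-1}(E)$ with $E \sim \mathrm{Exp}(1)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def embed_orlicz(x, G_inv, rng):
    # u_i = G_inv(E_i) with E_i ~ Exp(1) has CDF 1 - exp(-G(t));
    # dividing coordinate-wise gives the randomized embedding into l_inf.
    u = G_inv(rng.exponential(1.0, size=len(x)))
    return x / u

# Example: G(t) = t^p (so G_inv(s) = s^(1/p)) should recover the l_p norm.
p = 3.0
x = rng.standard_normal(500)
est = np.median([np.abs(embed_orlicz(x, lambda s: s ** (1.0 / p), rng)).max()
                 for _ in range(200)])
print(np.linalg.norm(x, ord=p), est)   # comparable up to a constant factor w.h.p.
```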

23 Where are we?
We can solve ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$, where $X_{ij}$ is $R^d$ equipped with a top-$k_{ij}$ norm.
What remains to be done: embed a $d$-dimensional symmetric norm into the ($d^{O(\log\log d)}$-dimensional) $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$.

24 Starting point: embedding any norm into $\ell_\infty$
For a normed space $X$ and $\varepsilon > 0$ there is a linear $f : X \to \ell_\infty^{d'}$ such that $\|f(x)\|_\infty \in (1 \pm \varepsilon) \cdot \|x\|_X$.
The normed space $X^*$ dual to $X$: $\|y\|_{X^*} = \sup_{\|x\|_X \le 1} \langle x, y \rangle$. The dual of $\ell_p$ is $\ell_q$ with $\frac{1}{p} + \frac{1}{q} = 1$ ($\ell_1$ vs. $\ell_\infty$, $\ell_2$ vs. $\ell_2$, etc.).
Claim: for every $x \in X$, $\|x\|_X \in (1 \pm \varepsilon) \cdot \max_{y \in N} |\langle x, y \rangle|$, where $N$ is an $\varepsilon$-net of $B_{X^*}$ (with respect to $X^*$).
This immediately gives an embedding: the coordinates of $f(x)$ are $\langle x, y \rangle$ for $y \in N$.
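A toy sketch (for illustration) of the claim for $X = \ell_1$, where $X^* = \ell_\infty$ and $B_{X^*}$ is the hypercube: taking the $2^d$ sign vectors as the net (here it certifies the norm exactly), each net vector contributes one coordinate $\langle x, y \rangle$ of the embedding into $\ell_\infty$, and the exponential net size is exactly the obstacle discussed earlier.

```python
import numpy as np
from itertools import product

def embed_via_dual_net(x, net):
    # One l_inf coordinate per net vector y: the inner product <x, y>.
    return np.array([np.dot(x, y) for y in net])

d = 4
x = np.array([0.5, -2.0, 1.0, 3.0])
# For X = l_1, the dual ball is the l_inf ball; its vertices {-1, +1}^d suffice.
net = [np.array(s) for s in product([-1.0, 1.0], repeat=d)]
fx = embed_via_dual_net(x, net)
print(np.abs(fx).max(), np.abs(x).sum())   # both equal ||x||_1 = 6.5
```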

25 Proof
For every $y \in N$, we have $\langle x, y \rangle \le \|x\|_X \cdot \|y\|_{X^*} \le \|x\|_X$, thus $\max_{y \in N} |\langle x, y \rangle| \le \|x\|_X$.
There exists $y$ with $\|y\|_{X^*} \le 1$ and $\langle x, y \rangle = \|x\|_X$ (non-trivial, requires the Hahn–Banach theorem).
Move $y$ to the closest $y' \in N$; we get $\langle x, y' \rangle \ge (1 - \varepsilon) \cdot \|x\|_X$.
Thus $\max_{y \in N} |\langle x, y \rangle| \ge (1 - \varepsilon) \cdot \|x\|_X$.
Can take $|N| = (1/\varepsilon)^{O(d)}$ by a volume argument.

26 Better embeddings for symmetric norms
Recap: one cannot embed even $\ell_2^d$ into $\ell_\infty^{d'}$ unless $d' = 2^{\Omega(d)}$.
Instead, we aim at embedding a symmetric norm into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_{ij}$.
High-level idea: the new space is more forgiving and allows us to consider an $\varepsilon$-net of $B_{X^*}$ up to symmetry.
We show that there is an $\varepsilon$-net that is the result of applying symmetries to merely $d^{O(\log\log d)}$ vectors!

27 Exploiting symmetry
For a vector $x$, $\pi \in S_d$, and $\sigma \in \{-1, +1\}^d$, let $x_{\pi,\sigma}$ denote $x$ with coordinates permuted according to $\pi$ and signs flipped according to $\sigma$.
Recall: $\|x_{\pi,\sigma}\|_X = \|x\|_X$ for a symmetric norm.
Suppose $N'$ is an $\varepsilon$-net for $B_{X^*} \cap \mathcal{L}$, where $\mathcal{L} = \{y : y_1 \ge y_2 \ge \dots \ge y_d \ge 0\}$. Then:
$\|x\|_X \in (1 \pm \varepsilon) \cdot \max_{y \in N', \pi, \sigma} \langle x, y_{\pi,\sigma} \rangle = (1 \pm \varepsilon) \cdot \max_{y \in N'} \max_{\pi, \sigma} \langle x, y_{\pi,\sigma} \rangle = (1 \pm \varepsilon) \cdot \max_{y \in N'} \langle x^*, y \rangle = (1 \pm \varepsilon) \cdot \max_{y \in N'} \sum_k (y_k - y_{k+1}) \cdot (\text{top-}k\text{ norm of } x)$,
where $x^*$ is the non-increasing rearrangement of $|x|$ (the last step is Abel summation, with $y_{d+1} = 0$).
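A numeric check (for illustration) of the last equalities in the chain above: maximizing $\langle x, y_{\pi,\sigma} \rangle$ over permutations and sign flips pairs the sorted $|x|$ with the sorted $y$, and Abel summation rewrites that as a non-negative combination of top-$k$ norms.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
x = rng.standard_normal(d)
y = np.sort(rng.random(d))[::-1]            # y_1 >= y_2 >= ... >= y_d >= 0

# max over pi, sigma of <x, y_{pi,sigma}>: by the rearrangement inequality,
# it pairs the non-increasing rearrangement of |x| with y.
xs = np.sort(np.abs(x))[::-1]
best = float(np.dot(xs, y))

# Abel summation: sum_k (y_k - y_{k+1}) * (top-k norm of x), with y_{d+1} = 0.
y_ext = np.append(y, 0.0)
abel = sum((y_ext[k - 1] - y_ext[k]) * float(np.sort(np.abs(x))[-k:].sum())
           for k in range(1, d + 1))

print(best, abel)   # equal up to floating-point error
```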

28 Small nets
What remains to be done: an $\varepsilon$-net for $B_{X^*} \cap \{y : y_1 \ge y_2 \ge \dots \ge y_d \ge 0\}$ of size $d^{O_\varepsilon(\log\log d)}$.
We will see a weaker bound of $d^{O_\varepsilon(\log d)}$, still non-trivial.
The volume bound fails; instead, a simple explicit construction.

29 Small nets: continued
Want to approximate a vector $y \in B_{X^*}$ with $y_1 \ge y_2 \ge \dots \ge y_d \ge 0$:
- Zero out all $y_i$'s that are smaller than $y_1 / \mathrm{poly}(d)$
- Round all remaining coordinates to the nearest power of $1 + \varepsilon$
- $O_\varepsilon(\log d)$ scales; only the cardinality of each scale matters
- $d^{O_\varepsilon(\log d)}$ vectors in total (see the rounding sketch below)
Can be improved to $d^{O_\varepsilon(\log\log d)}$ by one more trick.
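A sketch of the rounding map that sends any such $y$ to a net point (one reading of the construction, with an assumed poly($d$) threshold exponent); the net is the image of this map, and its size is controlled by how many distinct rounded profiles can occur.

```python
import numpy as np

def round_to_net(y, eps, power=3):
    # y: non-increasing, non-negative vector (a point of B_{X*} intersected with
    # the cone y_1 >= ... >= y_d >= 0).
    # Step 1: zero out coordinates below y_1 / d^power (an assumed poly(d) cutoff).
    # Step 2: round the surviving coordinates to the nearest power of (1 + eps).
    y = np.asarray(y, dtype=float)
    d = len(y)
    if y[0] == 0.0:
        return y.copy()
    out = np.where(y < y[0] / d ** power, 0.0, y)
    nz = out > 0
    exps = np.rint(np.log(out[nz]) / np.log(1.0 + eps))
    out[nz] = (1.0 + eps) ** exps
    return out

y = np.sort(np.random.default_rng(3).random(10))[::-1]
print(y)
print(round_to_net(y, eps=0.1))   # nonzero coordinates move by at most a (1 + eps) factor
```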

30 Quick summary
- Embed a symmetric norm into a $d^{O(\log\log d)}$-dimensional product space of top-$k$ norms
- Use known techniques to reduce the ANN problem on the product space to ANN for the top-$k$ norm
- Use truncated exponential random variables to embed the top-$k$ norm into $\ell_\infty$, and use a known ANN data structure there

31 Two immediate open questions
- Improve the dependence on $d$ from $d^{O(\log\log d)}$ to $d^{O(1)}$: needs a better $\varepsilon$-net for $B_{X^*} \cap \{y : y_1 \ge y_2 \ge \dots \ge y_d \ge 0\}$; looks doable
- Improve the approximation from $(\log\log n)^{O(1)}$ to $O(\log\log d)$: going beyond $\log\log d$ is hard due to $\ell_\infty$; we would need to bypass ANN for product spaces; maybe a randomized embedding into low-dimensional $\ell_\infty$ for any symmetric norm?

32 General norms
There exists an embedding into $\ell_2$ with distortion $O(\sqrt{d})$ (John's theorem).
For symmetric norms, there is a universal $d^{O(\log\log d)}$-dimensional space that can host all $d$-dimensional symmetric norms.
This is impossible for general norms, even for randomized embeddings: even distortion $d$ requires dimension $2^{d^{\Omega(1)}}$.
Stronger hardness results? They would be implied by the following: there is a family of spectral expanders that embeds with distortion $O(1)$ into some $\log^{O(1)} n$-dimensional norm, where $n$ is the number of nodes [Naor 2016].

33 The main open question
Is there an efficient ANN algorithm for general high-dimensional norms with approximation $d^{o(1)}$? There is hope…
Thanks!

