Approximate Near Neighbors for General Symmetric Norms
Ilya Razenshteyn (MIT CSAIL), joint with Alexandr Andoni (Columbia University), Aleksandar Nikolov (University of Toronto), and Erik Waingarten (Columbia University). arXiv:
Nearest Neighbor Search
Motivation: represent data as points in a feature vector space with a distance function. This enables data analysis (geometry / linear algebra / optimization) and similarity search (nearest neighbor search).
An example: word embeddings
Word embeddings: high-dimensional vectors that capture semantic similarity between words (and more). GloVe [Pennington, Socher, Manning 2014]: 400K words, 300 dimensions. Ten nearest neighbors for "NYU"? Yale, Harvard, graduate, faculty, undergraduate, Juilliard, university, undergraduates, Cornell, MIT.
Approximate Near Neighbors (ANN)
Dataset: n points (denote the set by P) in a metric space M. Approximation c > 1, distance threshold r > 0.
Query: q ∈ M such that there is p* ∈ P with d_M(q, p*) ≤ r.
Want: p ∈ P such that d_M(q, p) ≤ c·r.
Parameters: space, query time.
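For reference, here is a minimal brute-force version of the (c, r)-ANN decision problem in Python (an illustrative helper, not part of the talk; the data structures below are designed to avoid this linear scan):

```python
def ann_query(P, q, r, c, dist):
    """Brute-force (c, r)-ANN: if some p* in P satisfies dist(q, p*) <= r,
    return a point p in P with dist(q, p) <= c*r; otherwise None is acceptable."""
    best = min(P, key=lambda p: dist(q, p))
    return best if dist(q, best) <= c * r else None
```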
FAQ
Q: Why approximation? A: The exact case is hard for the high-dimensional problem.
Q: What does "high-dimensional" mean? A: When d = ω(log n), where d is the dimension of the metric.
Q: How is the dimension defined? A: The metric is typically defined on ℝ^d; alternatively, doubling dimension, etc. The complexity must depend on d as 2^{o(d)}, ideally as d^{O(1)}.
This talk: a metric on ℝ^d, where log n ≤ d ≤ n^{o(1)}.
Which distance function to use?
A distance function must capture semantic similarity well and must be algorithmically tractable. For word embeddings, etc.: cosine similarity.
The goal: classify metrics according to the complexity of high-dimensional ANN.
For theory: a poorly-understood property of a metric. For practice: a universal algorithm for ANN.
High-dimensional norms
An important case: M is a normed space, d_M(x_1, x_2) = ||x_1 − x_2||, where ||·|| : ℝ^d → ℝ_+ satisfies:
- ||x|| = 0 iff x = 0
- ||αx|| = |α|·||x||
- ||x_1 + x_2|| ≤ ||x_1|| + ||x_2||
Lots of tools are available (linear functional analysis). [Andoni, Krauthgamer, R 2015] characterizes norms that allow efficient sketching (succinct summarization), which implies efficient ANN. Approximation O(√d) is easy (John's theorem).
Unit balls
A norm can be given by its unit ball B_X = {x ∈ ℝ^d : ||x|| ≤ 1}.
Claim: B_X is a symmetric convex body.
Claim: any such body K can be a unit ball: ||x||_K = inf{t > 0 : x/t ∈ K}.
What property of a convex body makes ANN with respect to it tractable?
John's theorem: any symmetric convex body is close to an ellipsoid (gives approximation √d).
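To make the definition ||x||_K = inf{t > 0 : x/t ∈ K} concrete, here is a small sketch (illustrative names, not from the talk) that recovers the norm from a membership oracle for K by binary search:

```python
def norm_from_ball(x, in_K, lo=1e-9, hi=1e9, iters=100):
    """Minkowski functional ||x||_K = inf{ t > 0 : x/t in K },
    computed by binary search over t given a membership oracle in_K."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_K([xi / mid for xi in x]):
            hi = mid   # x/mid lies in K, so ||x||_K <= mid
        else:
            lo = mid
    return hi

# Example: K = l_1 unit ball, so ||x||_K = |x_1| + |x_2|.
l1_ball = lambda y: sum(abs(yi) for yi in y) <= 1
print(norm_from_ball([3.0, -4.0], l1_ball))   # ~7.0
```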
Our result
If X is a d-dimensional symmetric normed space (invariant under permutations of coordinates and sign flips) and d = n^{o(1)}, we can solve ANN with:
- Approximation (log log n)^{O(1)}
- Space n^{1+o(1)}
- Query time n^{o(1)}
Examples
- Usual ℓ_p norms: ||x||_p = (Σ_i |x_i|^p)^{1/p}
- Top-k norm: the sum of the k largest absolute values of coordinates; interpolates between ℓ_1 (k = d) and ℓ_∞ (k = 1)
- Orlicz norms: the unit ball is {x ∈ ℝ^d : Σ_i G(|x_i|) ≤ 1}, where G(·) is convex and non-negative, and G(0) = 0; gives ℓ_p norms for G(t) = t^p
- k-support norm, box-Θ norm, K-functional (arise in probability and machine learning)
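A short numeric sketch of these examples (assuming numpy/scipy; the Orlicz norm is computed directly from its definition by root finding, purely as an illustration):

```python
import numpy as np
from scipy.optimize import brentq

def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def top_k_norm(x, k):
    """Sum of the k largest absolute values; k=1 gives l_inf, k=d gives l_1."""
    return np.sort(np.abs(x))[::-1][:k].sum()

def orlicz_norm(x, G):
    """||x||_G = inf{ t > 0 : sum_i G(|x_i|/t) <= 1 } for convex G, G(0) = 0."""
    f = lambda t: sum(G(abs(xi) / t) for xi in x) - 1.0
    return brentq(f, 1e-12, 1e12)   # f is decreasing in t, so bracketing works

x = np.array([3.0, -1.0, 2.0, 0.5])
print(lp_norm(x, 2), top_k_norm(x, 2), orlicz_norm(x, lambda t: t ** 2))
# lp_norm(x, 2) and orlicz_norm(x, t -> t^2) agree: G(t) = t^2 gives the l_2 norm
```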
Prior work: symmetric norms
[Blasiok, Braverman, Chestnut, Krauthgamer, Yang 2015]: a classification of symmetric norms according to their streaming complexity. It depends on how well the norm concentrates on the Euclidean ball. Unlike streaming, ANN turns out to be always tractable for symmetric norms.
Prior work: ANN
Mostly, the focus has been on the ℓ_1 (Hamming/Manhattan) and ℓ_2 (Euclidean) norms. They work for many applications and allow efficient algorithms based on hashing:
- Locality-Sensitive Hashing [Indyk, Motwani 1998], [Andoni, Indyk 2006]
- Data-dependent LSH [Andoni, Indyk, Nguyen, R 2014], [Andoni, R 2015]
- [Andoni, Laarhoven, R, Waingarten 2017]: tight trade-off between space and query time for every c > 1
There are few results for other norms (ℓ_∞, general ℓ_p; we will see them later).
ANN for ℓ_∞
[Indyk 1998]: ANN for d-dimensional ℓ_∞ with:
- Space d · n^{1+ε}
- Query time O(d log n)
- Approximation O(ε^{-1} · log log d)
Main idea: recursive partitioning. A "small" ball with Ω(n) points → easy; no such balls → there is a "good" cut with respect to some coordinate.
[Andoni, Croitoru, Patrascu 2008], [Kapralov, Panigrahy 2012]: approximation O(log log d) is tight for decision trees!
Metric embeddings
A map f : M → N is an embedding with distortion C if for all a, b ∈ M:
  d_N(f(a), f(b)) / C ≤ d_M(a, b) ≤ d_N(f(a), f(b))
Embeddings give reductions for geometric problems: ANN with approximation D for N yields ANN with approximation C·D for M.
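As a sanity check of this definition, a tiny helper (illustrative, not from the talk) that measures the empirical distortion of a map on a finite point set:

```python
import itertools, math

def distortion(points, d_M, d_N, f):
    """Empirical distortion of f on a finite set: (max expansion) * (max contraction)."""
    expansion = contraction = 0.0
    for a, b in itertools.combinations(points, 2):
        ratio = d_N(f(a), f(b)) / d_M(a, b)
        expansion = max(expansion, ratio)
        contraction = max(contraction, 1.0 / ratio)
    return expansion * contraction

pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.3, 0.7)]
l1 = lambda x, y: abs(x[0] - y[0]) + abs(x[1] - y[1])
l2 = lambda x, y: math.hypot(x[0] - y[0], x[1] - y[1])
print(distortion(pts, l2, l1, lambda x: x))   # identity map l_2 -> l_1: distortion <= sqrt(2)
```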
Embedding norms into ℓ_∞
For a normed space X and ε > 0, there exists f : X → ℓ_∞^{d'} with ||f(x)||_∞ ∈ (1 ± ε)·||x||_X.
Proof idea: ||x||_X = max over dual directions y of ⟨x, y⟩; take all directions and discretize (more details later).
Can we combine this with ANN for ℓ_∞ and obtain ANN for any norm? No! The discretization requires d' = (1/ε)^{Θ(d)}, and this is tight even for ℓ_2; the approximation would then be O(log log d') = O(log d), much worse than O(log log d).
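A toy illustration of the "take all directions and discretize" idea for ℓ_2 in the plane (parameters are illustrative): list the inner products of x with a fine net of unit directions; the maximum approximates ||x||_2.

```python
import numpy as np

def embed_l2_to_linf(x, m=512):
    """Embed x in R^2 into l_inf^m via inner products with m unit directions
    (a net of the dual l_2 ball); the max entry approximates ||x||_2."""
    thetas = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    directions = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    return directions @ x

x = np.array([3.0, 4.0])
print(np.abs(embed_l2_to_linf(x)).max(), np.linalg.norm(x))   # ~5.0 vs 5.0
```

The catch, as noted above, is that in d dimensions such a net needs (1/ε)^{Θ(d)} directions.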
The strategy
What: any norm. Where: ℓ_∞^{d'}. Dimension: d' = 2^{Θ(d)}.  ✗
What: symmetric norm. Where: ⊕_{ℓ_∞} ⊕_{ℓ_1} (top-k_i norms). Dimension: d' = d^{O(log log d)}.  ✓
Bypass non-embeddability into low-dimensional ℓ_∞ by allowing a more complicated host space, which is still tractable.
ℓ_p-direct sums of metric spaces
For metrics M_1, M_2, …, M_t, define ⊕_{ℓ_p} M_i as follows:
- The ground set is M_1 × M_2 × … × M_t
- The distance is d((x_1, …, x_t), (y_1, …, y_t)) = ||(d_{M_1}(x_1, y_1), d_{M_2}(x_2, y_2), …, d_{M_t}(x_t, y_t))||_p
Example: ⊕_{ℓ_p} ℓ_q gives cascaded norms.
Our host space: ⊕_{ℓ_∞} ⊕_{ℓ_1} X_{k_i}, where X_{k_i} is ℝ^d equipped with the top-k_i norm. The outer sum has size d^{O(log log d)}; the inner sum has size d.
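A small sketch of the host-space distance (illustrative shapes and names): an outer ℓ_∞ over blocks, an inner ℓ_1 over copies, each copy measured in its own top-k norm.

```python
import numpy as np

def top_k_norm(x, k):
    return np.sort(np.abs(x))[::-1][:k].sum()

def host_distance(xs, ys, ks):
    """Distance in the product space: xs, ys are lists of blocks, each block a list
    of d-dimensional vectors; ks[i][j] is the k used for copy j of block i."""
    block_dists = []
    for i in range(len(xs)):
        inner = sum(top_k_norm(np.asarray(xs[i][j]) - np.asarray(ys[i][j]), ks[i][j])
                    for j in range(len(xs[i])))   # l_1 over the inner copies
        block_dists.append(inner)
    return max(block_dists)                       # l_inf over the outer blocks
```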
Two necessary steps
1. Embed a symmetric norm into ⊕_{ℓ_∞} ⊕_{ℓ_1} X_{k_i}
2. Solve ANN for ⊕_{ℓ_∞} ⊕_{ℓ_1} X_{k_i}
Prior work on ANN via product spaces: Fréchet distance [Indyk 2002], edit distance [Indyk 2004], and Ulam distance [Andoni, Indyk, Krauthgamer 2009].
ANN for ⊕_{ℓ_∞} ⊕_{ℓ_1} X_{k_i}
[Indyk 2002], [Andoni 2009]: if there are data structures for c-ANN for M_1, M_2, …, M_t, then for ⊕_{ℓ_p} M_i one can get O(c · log log n)-ANN with almost the same time and space.
This is a powerful generalization of ANN for ℓ_∞ [Indyk 1998], and it trivially implies ANN for general ℓ_p.
Thus, it is enough to handle ANN for X_{k_i} (top-k norms)!
ANN for top-k norms
Top-k norms include ℓ_1 and ℓ_∞, so we need a unified approach.
Idea: embed the top-k norm into ℓ_∞^{d'} and use [Indyk 1998]. Approximation: distortion × O(log log d').
Problem: ℓ_1 requires 2^{Ω(d)}-dimensional ℓ_∞.
Solution: use randomized embeddings.
Embedding the top-k norm into ℓ_∞
The case k = d (that is, ℓ_1). The embedding uses min-stability of the exponential distribution:
- Sample i.i.d. u_1, u_2, …, u_d ~ Exp(1)
- Embed f : (x_1, x_2, …, x_d) ↦ (x_1/u_1, x_2/u_2, …, x_d/u_d)
Then Pr[||f(x)||_∞ ≤ t] = Π_i Pr[|x_i|/u_i ≤ t] = Π_i e^{−|x_i|/t} = e^{−||x||_1/t}.
This gives constant distortion w.h.p. (in reality: slightly different parameters).
General k: sample u_i ~ max(1/k, Exp(1)).
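A quick numeric check of this randomized embedding (illustrative parameters; as noted above, the real construction uses slightly different ones):

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_topk_to_linf(x, k):
    """Divide each coordinate by an independent u_i ~ max(1/k, Exp(1));
    the l_inf norm of the result approximates the top-k norm of x."""
    u = np.maximum(1.0 / k, rng.exponential(1.0, size=len(x)))
    return np.abs(x) / u

def top_k_norm(x, k):
    return np.sort(np.abs(x))[::-1][:k].sum()

x = rng.normal(size=200)
k = 10
samples = [np.max(embed_topk_to_linf(x, k)) for _ in range(2000)]
# Per the slide, the embedded l_inf norm is within a constant factor of the
# top-k norm with high probability; compare the median against the true value.
print(np.median(samples), top_k_norm(x, k))
```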
Detour: ANN for Orlicz norms
Reminder: for convex G : ℝ → ℝ_+ with G(0) = 0, define a norm whose unit ball is {x : Σ_i G(|x_i|) ≤ 1} (e.g., G(t) = t^p gives ℓ_p norms).
Embedding into ℓ_∞ (as before, O(1) distortion w.h.p.):
- Sample i.i.d. u_1, u_2, …, u_d ~ D, where Pr_{u∼D}[u ≤ t] = 1 − e^{−G(t)}
- Embed f : (x_1, x_2, …, x_d) ↦ (x_1/u_1, x_2/u_2, …, x_d/u_d)
A special case for ℓ_p norms appeared in [Andoni 2009].
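A numeric sketch of this embedding via inverse-transform sampling (G(t) = t^2 is used as the sanity check, since then the Orlicz norm is exactly ℓ_2):

```python
import numpy as np

rng = np.random.default_rng(1)

def embed_orlicz_to_linf(x, G_inv):
    """Sample u_i with CDF 1 - exp(-G(t)) via u_i = G_inv(Exp(1)),
    and return the l_inf norm of (x_1/u_1, ..., x_d/u_d)."""
    u = G_inv(rng.exponential(1.0, size=len(x)))
    return np.max(np.abs(x) / u)

x = rng.normal(size=100)
samples = [embed_orlicz_to_linf(x, np.sqrt) for _ in range(2000)]   # G(t) = t^2, so G_inv = sqrt
# Here Pr[max_i |x_i|/u_i <= t] = exp(-||x||_2^2 / t^2), so the median is ||x||_2 / sqrt(ln 2).
print(np.median(samples) * np.sqrt(np.log(2)), np.linalg.norm(x))
```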
Where are we?
We can solve ANN for ⊕_{ℓ_∞} ⊕_{ℓ_1} X_{k_i}, where X_{k_i} is ℝ^d equipped with a top-k_i norm.
What remains to be done? Embed a d-dimensional symmetric norm into the (d^{O(log log d)}-dimensional) space ⊕_{ℓ_∞} ⊕_{ℓ_1} X_{k_i}.
Starting point: embedding any norm into ℓ_∞
For a normed space X and ε > 0 there is a linear map f : X → ℓ_∞^{d'} s.t. ||f(x)||_∞ ∈ (1 ± ε)·||x||_X.
The normed space X* dual to X: ||y||_{X*} = sup_{||x||_X ≤ 1} ⟨x, y⟩. The dual of ℓ_p is ℓ_q, where 1/p + 1/q = 1 (ℓ_1 vs. ℓ_∞, ℓ_2 vs. ℓ_2, etc.).
Claim: for every x ∈ X, we have ||x||_X ∈ (1 ± ε)·max_{y ∈ N} |⟨x, y⟩|, where N is an ε-net of B_{X*} (with respect to ||·||_{X*}).
This immediately gives the embedding f(x) = (⟨x, y⟩)_{y ∈ N}.
Proof
For every y ∈ N, we have ⟨x, y⟩ ≤ ||x||_X · ||y||_{X*} ≤ ||x||_X; thus max_{y ∈ N} |⟨x, y⟩| ≤ ||x||_X.
There exists y with ||y||_{X*} ≤ 1 and ⟨x, y⟩ = ||x||_X (non-trivial, requires the Hahn–Banach theorem).
Move y to the closest y' ∈ N; since ⟨x, y − y'⟩ ≤ ||x||_X · ||y − y'||_{X*} ≤ ε·||x||_X, we get ⟨x, y'⟩ ≥ (1 − ε)·||x||_X. Thus max_{y ∈ N} |⟨x, y⟩| ≥ (1 − ε)·||x||_X.
One can take |N| = (1/ε)^{O(d)} by a volume argument.
Better embeddings for symmetric norms
Recap: one cannot embed even ℓ_2^d into ℓ_∞^{d'} unless d' = 2^{Ω(d)}.
Instead, we aim at embedding a symmetric norm into ⊕_{ℓ_∞} ⊕_{ℓ_1} X_{k_i}.
High-level idea: the new host space is more forgiving and allows us to consider an ε-net of B_{X*} up to symmetry.
We show that there is an ε-net obtained by applying symmetries to merely d^{O(log log d)} vectors!
Exploiting symmetry
For a vector x, π ∈ S_d, and σ ∈ {−1, 1}^d, let x_{π,σ} denote x with coordinates permuted according to π and signs flipped according to σ.
Recap: ||x_{π,σ}||_X = ||x||_X.
Suppose that N' is an ε-net for B_{X*} ∩ {y : y_1 ≥ y_2 ≥ … ≥ y_d ≥ 0}. Then:
  ||x||_X ∈ (1 ± ε) · max_{y ∈ N', π, σ} ⟨x, y_{π,σ}⟩
        = (1 ± ε) · max_{y ∈ N'} max_{π, σ} ⟨x, y_{π,σ}⟩
        = (1 ± ε) · max_{y ∈ N'} ⟨x↓, y⟩   (x↓ = absolute values of x, sorted in decreasing order)
        = (1 ± ε) · max_{y ∈ N'} Σ_k (y_k − y_{k+1}) · (top-k norm of x)   (with y_{d+1} = 0)
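A small numeric check of the two identities used in this chain, for a random x and a sorted non-negative y (brute force over all signed permutations, so the dimension is kept tiny):

```python
import numpy as np
from itertools import permutations, product

rng = np.random.default_rng(2)
d = 4
x = rng.normal(size=d)
y = np.sort(rng.random(size=d))[::-1]          # y_1 >= y_2 >= ... >= y_d >= 0

# Brute force over all signed permutations of y.
brute = max(np.dot(x, sigma * y[list(pi)])
            for pi in permutations(range(d))
            for sigma in map(np.array, product([-1, 1], repeat=d)))

# Rearrangement: the maximum equals <x_sorted, y> with x_sorted = sorted |x|, descending.
x_sorted = np.sort(np.abs(x))[::-1]
rearr = np.dot(x_sorted, y)

# Abel summation: sum_k (y_k - y_{k+1}) * (top-k norm of x), with y_{d+1} = 0.
topk = np.cumsum(x_sorted)                     # top-k norms for k = 1..d
abel = np.dot(y - np.append(y[1:], 0.0), topk)

print(brute, rearr, abel)                      # all three values coincide
```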
Small nets
What remains to be done: an ε-net for B_{X*} ∩ {y : y_1 ≥ y_2 ≥ … ≥ y_d ≥ 0} of size d^{O_ε(log log d)}.
We will see a weaker bound of d^{O_ε(log d)}, which is still non-trivial.
The volume bound fails; instead, we use a simple explicit construction.
Small nets: continued
We want to approximate a vector y ∈ B_{X*} with y_1 ≥ y_2 ≥ … ≥ y_d ≥ 0:
- Zero out all y_i's that are smaller than y_1 / poly(d)
- Round all coordinates to the nearest power of 1 + ε
- This leaves O_ε(log d) scales, and only the cardinality of each scale matters
- d^{O_ε(log d)} vectors in total
This can be improved to d^{O_ε(log log d)} by one more trick.
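A sketch of this rounding map (thresholds are illustrative; the real construction tunes them to the norm):

```python
import numpy as np

def net_representative(y, eps=0.1, poly=1e6):
    """Map a sorted non-negative vector y to its net representative:
    zero out entries below y_1/poly, round the rest to powers of (1 + eps).
    The representative is determined by how many coordinates land in each scale."""
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    if y[0] == 0:
        return out
    keep = y >= y[0] / poly
    exponents = np.round(np.log(y[keep]) / np.log(1 + eps))
    out[keep] = (1 + eps) ** exponents
    return out

y = np.sort(np.random.default_rng(3).random(8))[::-1]
print(y)
print(net_representative(y))   # agrees with y up to a (1 + eps) factor on kept coordinates
```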
Quick summary
- Embed a symmetric norm into a d^{O(log log d)}-dimensional product space of top-k norms
- Use known techniques to reduce the ANN problem on the product space to ANN for the top-k norm
- Use truncated exponential random variables to embed the top-k norm into ℓ_∞, and use a known ANN data structure there
Two immediate open questions
1. Improve the dependence on d from d^{O(log log d)} to d^{O(1)}: find a better ε-net for B_{X*} ∩ {y : y_1 ≥ y_2 ≥ … ≥ y_d ≥ 0}. Looks doable.
2. Improve the approximation from (log log n)^{O(1)} to O(log log n). Going beyond log log n is hard due to ℓ_∞; we would need to bypass ANN for product spaces. Maybe a randomized embedding into low-dimensional ℓ_∞ exists for any symmetric norm?
General norms
There exists an embedding into ℓ_2 with distortion O(√d) (John's theorem).
We give a universal d^{O(log log d)}-dimensional space that can host all d-dimensional symmetric norms. This is impossible for general norms, even for randomized embeddings: even constant distortion requires dimension 2^{d^{Ω(1)}}.
Stronger hardness results? They would be implied by the following: a family of spectral expanders that embeds with distortion O(1) into some log^{O(1)} n-dimensional norm, where n is the number of nodes [Naor 2016].
The main open question
Is there an efficient ANN algorithm for general high-dimensional norms with approximation d^{o(1)}? There is hope…
Thanks!