Dimension reduction techniques for $\ell_p$ ($1<p<2$), with applications
Yair Bartal (Hebrew U.) and Lee-Ad Gottlieb (Ariel University)
Introduction
A fundamental result in dimension reduction is the Johnson-Lindenstrauss Lemma (JL-84) for Euclidean space. Given a set $S$ of $n$ points in $\mathbb{R}^d$, there exists $f:\mathbb{R}^d\to\mathbb{R}^k$ with $k=O(\ln(n)/\varepsilon^2)$ such that for all $u,v\in S$,
$\|u-v\|_2 \le \|f(u)-f(v)\|_2 \le (1+\varepsilon)\|u-v\|_2$.
Introduction
The JL Lemma is specific to $\ell_2$. What about dimension reduction for other $\ell_p$ spaces? It is impossible for $\ell_\infty$ and $\ell_1$, and not known for the other $\ell_p$ spaces.
This paper: dimension reduction techniques for $\ell_p$ ($1<p<2$), specifically single-scale and snowflake embeddings.
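JL transform
Proof by (randomized) construction: $f:\mathbb{R}^d\to\mathbb{R}^k$ multiplies each vector by a random $d\times k$ matrix. The matrix entries can be random signs $\{-1,1\}$ or Gaussians $g_1, g_2, \dots$.
[Figure: a vector multiplied by a matrix of random entries $g_1,\dots,g_6$.]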
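A minimal sketch of this randomized construction in pure Python, assuming the common normalization by $1/\sqrt{k}$ so that distance ratios concentrate around 1 (dividing by the smallest ratio would recover the one-sided form of the lemma). The function names, the sizes $d=100$, $k=400$, $n=20$, and the random test points are illustrative choices, not taken from the slides.

```python
import math
import random

def random_projection_matrix(d, k, entries="gaussian", rng=None):
    """Random d x k matrix with i.i.d. N(0,1) entries or uniform {-1,+1} signs."""
    rng = random.Random() if rng is None else rng
    if entries == "gaussian":
        return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    if entries == "sign":
        return [[rng.choice((-1.0, 1.0)) for _ in range(k)] for _ in range(d)]
    raise ValueError("entries must be 'gaussian' or 'sign'")

def jl_map(v, G):
    """f(v) = vG / sqrt(k): the row vector v times the d x k matrix G, rescaled."""
    k = len(G[0])
    return [sum(v[i] * G[i][j] for i in range(len(v))) / math.sqrt(k) for j in range(k)]

def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distortion_range(points, G):
    """Smallest and largest ||f(u)-f(v)||_2 / ||u-v||_2 over all pairs."""
    images = [jl_map(p, G) for p in points]
    ratios = [l2(images[i], images[j]) / l2(points[i], points[j])
              for i in range(len(points)) for j in range(i + 1, len(points))]
    return min(ratios), max(ratios)

# Hypothetical demo: n = 20 random points in R^100 projected to R^400.
rng = random.Random(0)
d, k, n = 100, 400, 20
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
lo, hi = distortion_range(points, random_projection_matrix(d, k, "gaussian", rng))
lo_sign, hi_sign = distortion_range(points, random_projection_matrix(d, k, "sign", rng))
print(lo, hi, lo_sign, hi_sign)
```

Both entry types should give ratios in a narrow band around 1; shrinking `k` widens the band, in line with the $\ln(n)/\varepsilon^2$ dependence.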
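JL transform
To prove: with constant probability, for all $u,v\in S$, $\|u-v\|_2 \le \|f(u)-f(v)\|_2 \le (1+\varepsilon)\|u-v\|_2$.
Observation: $f$ is linear, so for $w=u-v$ we have $f(w)=f(u-v)=f(u)-f(v)$. It therefore suffices to prove $\|w\|_2 \le \|f(w)\|_2 \le (1+\varepsilon)\|w\|_2$ for each of the $\binom{n}{2}$ difference vectors $w$.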
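JL transform
Consider an embedding into $\mathbb{R}^1$ with $G=N(0,1)$. Normals are 2-stable: if $X,Y\sim N(0,1)$ are independent, then $aX\sim N(0,a^2)$ and $aX+bY\sim N(0,a^2+b^2)\sim\sqrt{a^2+b^2}\,N(0,1)$. Hence $\sum_i w_i g_i \sim \sqrt{\sum_i w_i^2}\,N(0,1) = \|w\|_2\,N(0,1)$.
[Figure: the vector $(a,b,c)$ multiplied by $(g_1,g_2,g_3)$ gives $ag_1+bg_2+cg_3$.]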
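A quick Monte Carlo check of 2-stability, as a sketch: draw $\sum_i w_i g_i$ many times and compare the sample standard deviation with $\|w\|_2$. The vector $w=(1,2,3)$ and the sample size are hypothetical choices for illustration.

```python
import math
import random
import statistics

def projection_samples(w, trials, rng):
    """Draws of sum_i w_i g_i with fresh i.i.d. g_i ~ N(0,1) in each trial."""
    return [sum(wi * rng.gauss(0.0, 1.0) for wi in w) for _ in range(trials)]

rng = random.Random(1)
w = [1.0, 2.0, 3.0]                           # hypothetical vector (a, b, c)
norm_w = math.sqrt(sum(wi * wi for wi in w))  # ||w||_2 = sqrt(14)
xs = projection_samples(w, 100_000, rng)

# 2-stability predicts xs ~ ||w||_2 N(0,1): standard deviation ||w||_2, and about
# 68.3% of the mass within one ||w||_2 of zero.
std_ratio = statistics.stdev(xs) / norm_w
within_one = sum(abs(x) <= norm_w for x in xs) / len(xs)
print(std_ratio, within_one)
```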
JL transform
Even a single coordinate preserves magnitude: each coordinate is distributed as $\|w\|_2\,N(0,1)$, so (up to scaling) $\mathbb{E}[|f(w)|]=\|w\|_2$. But this must hold simultaneously for all point pairs.
With multiple coordinates, $\|f(w)\|_2^2 \sim \|w\|_2^2\sum_{j=1}^k N_j(0,1)^2 = \|w\|_2^2\,\chi^2(k)$. The sum of the $k$ squared coordinates is tightly concentrated around its mean, and one can show (via a union bound over the pairs) that when $k=O(\ln(n)/\varepsilon^2)$, all point pairs are preserved simultaneously.
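The concentration of $\chi^2(k)/k$ can be seen empirically. The sketch below estimates how often $\|f(w)\|_2^2/\|w\|_2^2$ leaves $[1-\varepsilon,1+\varepsilon]$ for a few values of $k$; since $\sqrt{1\pm\varepsilon}$ lies within $1\pm\varepsilon$, this also bounds the failure rate for the norm itself. The choices $\varepsilon=0.25$ and 1000 trials are illustrative.

```python
import random

def chi2_failure_rate(k, eps, trials, rng):
    """Fraction of trials where chi2(k)/k (= ||f(w)||^2 / ||w||^2) leaves [1-eps, 1+eps]."""
    failures = 0
    for _ in range(trials):
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        if abs(chi2 / k - 1.0) > eps:
            failures += 1
    return failures / trials

rng = random.Random(2)
for k in (25, 100, 400):
    print(k, chi2_failure_rate(k, 0.25, 1000, rng))
```

The failure rate should drop sharply as $k$ grows, which is what lets a union bound over all $\binom{n}{2}$ pairs succeed at $k=O(\ln(n)/\varepsilon^2)$.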
Dimension reduction for $\ell_p$?
JL works well for $\ell_2$. Let's try to do the same thing for $\ell_p$ ($1<p<2$). Hint: it won't work, but it will be instructive.
$p$-stable distributions: if $X,Y\sim F_p$ are independent and $0<p\le 2$, then $aX+bY\sim(|a|^p+|b|^p)^{1/p}\,F_p$ [Johnson-Schechtman 82, Datar-Immorlica-Indyk-Mirrokni 04, Mendel-Naor 04].
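To experiment with $F_p$, one needs a sampler. The sketch below uses the Chambers-Mallows-Stuck method for symmetric $p$-stable variables (an assumed choice; the slides do not specify a sampler). In this parametrization $p=1$ gives the standard Cauchy and $p=2$ gives $N(0,2)$. Because $F_p$ is heavy-tailed for $p<2$, stability is checked through medians of absolute values rather than variances; the values $a=0.6$, $b=1.3$, $p=1.5$ are hypothetical.

```python
import math
import random
import statistics

def sample_p_stable(p, rng):
    """One draw from the symmetric p-stable law (0 < p <= 2) via Chambers-Mallows-Stuck."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against the (probability ~0) draw W = 0
        w = rng.expovariate(1.0)
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p))

rng = random.Random(4)
p, a, b, n = 1.5, 0.6, 1.3, 100_000

# Stability: |aX + bY| should match (|a|^p + |b|^p)^(1/p) |X'| in distribution.
lhs = [abs(a * sample_p_stable(p, rng) + b * sample_p_stable(p, rng)) for _ in range(n)]
scale = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
rhs = [scale * abs(sample_p_stable(p, rng)) for _ in range(n)]
ratio = statistics.median(lhs) / statistics.median(rhs)

# Sanity check: for p = 1 this is the standard Cauchy, whose |X| has median 1.
med_cauchy = statistics.median([abs(sample_p_stable(1.0, rng)) for _ in range(n)])
print(ratio, med_cauchy)
```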
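Dimension reduction for $\ell_p$?
Suppose we embed into $\mathbb{R}^1$ with $G=F_p$. Then $f(w)$ is distributed as $\|w\|_p\,F_p$, so (up to scaling) $\mathbb{E}[|f(w)|]=\|w\|_p$.
With multiple coordinates, mapping $\ell_p$ into $\ell_p$ or into $\ell_q$ ($q\le p$):
$\|f(w)\|_p^p = \|w\|_p^p\sum_j |g_j|^p$, $\quad \|f(w)\|_q^q = \|w\|_p^q\sum_j |g_j|^q$.
Looks good! But what are $\mathbb{E}[|g|^p]$ and $\mathbb{E}[|g|^q]$?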
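$p$-stable distribution
Familiar examples: the Gaussian is 2-stable and the Cauchy is 1-stable. The density function $h$ is unimodal [SY-78, Y-78, H-84] and bell-shaped [G-84], and it is heavy-tailed when $p<2$: $h(x)\approx 1/(1+x^{p+1})$.
When $p<2$,
$\mathbb{E}[|g|^q] = 2\int_0^\infty x^q h(x)\,dx \approx \int_0^\infty \frac{x^q}{1+x^{p+1}}\,dx \approx \int_0^1 x^q\,dx + \int_1^\infty x^{q-(p+1)}\,dx = \frac{1}{q+1} + \left[-\frac{x^{-(p-q)}}{p-q}\right]_1^\infty$.
For $0<q<p$: the bracket equals $1/(p-q)$ and the first term is at most 1, so $\mathbb{E}[|g|^q]\approx 1/(p-q)$ (OK).
For $q\ge p$: the integral diverges, so $\mathbb{E}[|g|^q]=\infty$ (problem).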
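The slide's heavy-tail model integral has a closed form, via the standard identity $\int_0^\infty \frac{x^{s-1}}{1+x^m}\,dx = \frac{\pi}{m\sin(\pi s/m)}$ for $0<s<m$, applied with $s=q+1$, $m=p+1$. The sketch below evaluates it, cross-checks it with a midpoint rule, and shows $(p-q)\cdot\int_0^\infty \frac{x^q}{1+x^{p+1}}dx \to 1$ as $q\to p$, matching the $1/(p-q)$ estimate. Note this uses the model density $1/(1+x^{p+1})$, not the exact $p$-stable density.

```python
import math

def moment_closed_form(p, q):
    """Exact int_0^inf x^q / (1 + x^(p+1)) dx; infinite when q >= p."""
    if q >= p:
        return math.inf
    # s = q + 1, m = p + 1 in pi / (m sin(pi s / m)), valid for 0 < s < m.
    return math.pi / ((p + 1) * math.sin(math.pi * (q + 1) / (p + 1)))

def moment_numeric(p, q, n=100_000):
    """Midpoint rule on [0, 1], plus [1, inf) mapped to (0, 1] via x = 1/t.

    Intended for moderate q (q >= 0 and p - q - 1 >= 0 keeps both pieces bounded).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** q / (1 + t ** (p + 1))            # piece on [0, 1]
        total += t ** (p - q - 1) / (1 + t ** (p + 1))  # piece on [1, inf)
    return total * h

p = 1.5
for q in (0.5, 1.0, 1.4, 1.49, 1.499):
    m = moment_closed_form(p, q)
    print(q, m, (p - q) * m)
```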
Dimension reduction for $\ell_p$?
Problems with using $p$-stables for dimension reduction: the heavy tails for $p<2$ give $\mathbb{E}[|g|^p]=\infty$. When $q<p$, $\mathbb{E}[|g|^q]$ is finite, but how many coordinates are needed?
Dimension reduction for $\ell_p$?
What's known for non-Euclidean spaces? For $\ell_1$: bounded range dimension reduction [OR-02].
Dimension: $O(R\log n/\varepsilon^3)$.
Distortion: distances in the range $[1,R]$ are retained to within $(1+\varepsilon)$.
Expansion: distances $<1$ remain smaller.
Contraction: distances $>R$ remain larger.
Used as a subroutine for clustering and ANNS (approximate nearest neighbor search).
Dimension reduction for $\ell_p$?
Our contributions for $\ell_p$ ($1<p<2$):
Bounded range dimension reduction ($\ell_p\to\ell_q$, $q\le p$). Dimension: $O_\varepsilon(R\log n)$. Distortion: distances in $[1,R]$ are retained to within $(1+\varepsilon)$. Expansion: distances $<1$ remain smaller. Contraction: distances $>R$ remain larger.
Snowflake embedding: $\|x-y\|_p \mapsto (1\pm\varepsilon)\|x-y\|_p^\alpha$, $\alpha\le 1$. Dimension: $O(\mathrm{ddim}^2)$, where ddim is the doubling dimension. Previously known only for $\ell_1$, with dimension $O(2^{2\,\mathrm{ddim}})$.
Both embeddings have applications to clustering.
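Single scale dimension reduction
Our single-coordinate embedding $f:\mathbb{R}^d\to\mathbb{R}^1$ is as follows, where $s$ is an upper distance threshold ($\sim R$), $\varphi$ is a random angle, and the $g_i$ are i.i.d. $p$-stable:
$F(v) = F_{\varphi,s}(v) = s\,\sin\!\left(\varphi + \frac{1}{s}\sum_i g_i v_i\right)$.
Motivated by [Mendel-Naor 04]. Intuition: $\sin(\varepsilon)\approx\varepsilon$, so small values are retained and large values are truncated.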
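A minimal sketch of $F_{\varphi,s}$ and of concatenating $k$ independent copies, assuming $\varphi$ uniform on $[0,2\pi)$ and Chambers-Mallows-Stuck sampling for the $g_i$ (both assumptions; the helper names are hypothetical). The demo moves $v$ away from $u$ along a fixed direction: the mean $|F(u)-F(v)|$ should grow roughly linearly while the distance is well below $s$, then level off, since every coordinate is bounded by $s$ in absolute value.

```python
import math
import random

def sample_p_stable(p, rng):
    """Symmetric p-stable draw via Chambers-Mallows-Stuck (0 < p <= 2)."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:
        w = rng.expovariate(1.0)
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p))

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sample_coordinate(d, p, rng):
    """Parameters of one coordinate: a uniform angle phi and d i.i.d. p-stable g_i."""
    return rng.uniform(0.0, 2.0 * math.pi), [sample_p_stable(p, rng) for _ in range(d)]

def single_scale(v, phi, g, s):
    """F_{phi,s}(v) = s * sin(phi + (1/s) * sum_i g_i v_i)."""
    return s * math.sin(phi + dot(g, v) / s)

def embed(v, coords, s):
    """Concatenate k independent single-scale coordinates."""
    return [single_scale(v, phi, g, s) for phi, g in coords]

# Hypothetical demo: mean |F(u)-F(v)| as v moves away from u past the threshold s.
rng = random.Random(5)
d, p, s = 10, 1.5, 8.0
coords = [sample_coordinate(d, p, rng) for _ in range(200)]
u = [rng.uniform(-1.0, 1.0) for _ in range(d)]
direction = [rng.uniform(-1.0, 1.0) for _ in range(d)]
for t in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0):
    v = [ui - t * di for ui, di in zip(u, direction)]
    diffs = [abs(a - b) for a, b in zip(embed(u, coords, s), embed(v, coords, s))]
    print(t, sum(diffs) / len(diffs))
```

For a tiny difference $w=u-v$, Taylor's theorem gives $|F(u)-F(v) - \langle g,w\rangle\cos(\varphi+\langle g,u\rangle/s)| \le \langle g,w\rangle^2/(2s)$, which is the precise form of "small values retained".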
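Single scale dimension reduction
With $F(v) = F_{\varphi,s}(v) = s\,\sin\!\left(\varphi + \frac{1}{s}\sum_i g_i v_i\right)$:
$\mathbb{E}[|F(u)-F(v)|^q] = s^q\,\mathbb{E}\left[\left|\sin\!\left(\varphi + \tfrac{1}{s}\sum_i g_i u_i\right) - \sin\!\left(\varphi + \tfrac{1}{s}\sum_i g_i v_i\right)\right|^q\right]$
$= (2s)^q\,\mathbb{E}\left[\left|\sin\!\left(\tfrac{1}{2s}\sum_i g_i(u_i-v_i)\right)\cos\!\left(\varphi + \tfrac{1}{2s}\sum_i g_i(u_i+v_i)\right)\right|^q\right]$
$= c\,(2s)^q\,\mathbb{E}\left[\left|\sin\!\left(\tfrac{1}{2s}\sum_i g_i(u_i-v_i)\right)\right|^q\right]$.
The second step uses $\sin A-\sin B = 2\sin\frac{A-B}{2}\cos\frac{A+B}{2}$. In the third step, since the random angle $\varphi$ is uniform and independent of the $g_i$, $\mathbb{E}_\varphi[|\cos(\varphi+\theta)|^q]$ does not depend on $\theta$, so $c=\mathbb{E}_\varphi[|\cos\varphi|^q]$.
Multiple dimensions: repeat $k=s^{O(1)}\log n$ times; tight bounds follow from Bernstein's inequality.
Final embedding, writing $w=\|u-v\|_p$:
Threshold: $\|F(u)-F(v)\|_q = O(s)$.
Distortion: when $1<w<\varepsilon s$, $\|F(u)-F(v)\|_q$ is within a $(1+\varepsilon)$ factor of $\|u-v\|_p$ (up to scaling).
Expansion: when $w<1$, $\|F(u)-F(v)\|_q < (1+\varepsilon)\|u-v\|_p$.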
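Both steps of this derivation can be checked numerically. The sketch below compares the direct difference with the product form given by the sum-to-product identity, writing $a=\sum_i g_i u_i$ and $b=\sum_i g_i v_i$, and estimates $c=\mathbb{E}_\varphi[|\cos(\varphi+\theta)|^q]$ by a midpoint rule over a full period for two shifts $\theta$, assuming $\varphi$ uniform on $[0,2\pi)$. For $q=1$ the constant should equal $2/\pi$.

```python
import math
import random

def diff_direct(a, b, s, phi):
    """s*sin(phi + a/s) - s*sin(phi + b/s), with a = <g,u> and b = <g,v>."""
    return s * math.sin(phi + a / s) - s * math.sin(phi + b / s)

def diff_product(a, b, s, phi):
    """Same quantity via sin A - sin B = 2 sin((A-B)/2) cos((A+B)/2)."""
    return 2 * s * math.sin((a - b) / (2 * s)) * math.cos(phi + (a + b) / (2 * s))

def c_q(q, theta, n=100_000):
    """Midpoint-rule estimate of E_phi |cos(phi + theta)|^q, phi uniform on [0, 2*pi)."""
    h = 2 * math.pi / n
    return sum(abs(math.cos((i + 0.5) * h + theta)) ** q for i in range(n)) / n

print(c_q(1.3, 0.0), c_q(1.3, 0.7), c_q(1.0, 0.3), 2 / math.pi)
```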
Snowflake embedding
The snowflake embedding is created by concatenating many single-scale embeddings, an idea due to Assouad (84). This requires several properties of the single-scale embedding: threshold, smoothness, and fidelity.
Thank you!
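An idealized sketch of why concatenating scales can produce a snowflake, under loudly stated assumptions: each single-scale block at threshold $s$ is modeled as responding $\min(w,s)$ to a distance $w$ (ignoring randomness and constants), the scales are dyadic $s=2^i$, and each block is weighted by $s^{\alpha-1}$. These weights and the `scale_response` model are hypothetical illustrations, not the paper's construction. Under this model the concatenated $\ell_1$ norm stays within a nearly constant factor of $w^\alpha$ across many orders of magnitude.

```python
import math

def scale_response(w, s):
    """Hypothetical idealization of one single-scale block: ~w below s, capped at s above."""
    return min(w, s)

def snowflake_norm(w, alpha, q=1.0, lo=-40, hi=40):
    """(sum over s = 2^i of (s^(alpha-1) * min(w, s))^q)^(1/q), concatenating dyadic scales."""
    total = 0.0
    for i in range(lo, hi + 1):
        s = 2.0 ** i
        total += (s ** (alpha - 1) * scale_response(w, s)) ** q
    return total ** (1.0 / q)

alpha = 0.5
distances = [2.0 ** (e / 10) for e in range(-50, 51)]
ratios = [snowflake_norm(w, alpha) / w ** alpha for w in distances]
print(min(ratios), max(ratios))
```

Scales above $w$ contribute about $w\cdot\sum_{s\ge w}s^{\alpha-1}\approx w^\alpha$ and scales below contribute about $\sum_{s<w}s^\alpha\approx w^\alpha$, which is why the ratio barely moves; obtaining $(1\pm\varepsilon)$ rather than a constant factor is where the finer single-scale properties listed above come in.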