Dimension reduction techniques for lp (1<p<2), with applications

Presentation transcript:

Dimension reduction techniques for lp (1&lt;p&lt;2), with applications
Yair Bartal (Hebrew U.)   Lee-Ad Gottlieb (Ariel University)

Introduction
Fundamental result in dimension reduction: the Johnson-Lindenstrauss Lemma (JL-84) for Euclidean space.
Given: a set S of n points in R^d.
There exists: f : R^d → R^k with k = O(ln(n) / ε^2) such that for all u, v in S,
║u−v║_2 ≤ ║f(u)−f(v)║_2 ≤ (1+ε) ║u−v║_2.

Introduction
The JL Lemma is specific to l2. What about dimension reduction for other lp spaces?
Impossible for l∞ and l1.
Not known for other lp spaces.
This paper: dimension reduction techniques for lp (1&lt;p&lt;2), specifically single-scale and snowflake embeddings.

JL transform
Given: a set S of n points in R^d.
There exists: f : R^d → R^k with k = O(ln(n) / ε^2) such that for all u, v in S,
║u−v║_2 ≤ ║f(u)−f(v)║_2 ≤ (1+ε) ║u−v║_2.

JL transform
Proof by (randomized) construction: f : R^d → R^k multiplies each vector by a random d × k matrix.
The matrix entries can be ±1 or Gaussians.
[Figure: a random Gaussian matrix multiplying a vector.]
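
As a minimal sketch of this construction (illustrative code, not from the slides; the function name is mine):

import numpy as np

def jl_transform(points, k, seed=0):
    """Multiply each row of `points` (an n x d array) by a random d x k Gaussian
    matrix, scaled by 1/sqrt(k) so that E[||f(w)||_2^2] = ||w||_2^2."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    M = rng.standard_normal((d, k)) / np.sqrt(k)   # random d x k matrix, i.i.d. N(0,1) entries
    return points @ M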

JL transform
Prove: with constant probability, for all u, v in S, ║u−v║_2 ≤ ║f(u)−f(v)║_2 ≤ (1+ε) ║u−v║_2.
Observation: f is linear, so if w = u−v then f(w) = f(u−v) = f(u)−f(v).
It therefore suffices to prove ║w║_2 ≤ ║f(w)║_2 ≤ (1+ε) ║w║_2.

JL transform
Consider an embedding into R^1, with entries drawn from G = N(0,1).
Normals are 2-stable:
If X, Y ~ N(0,1), then aX ~ N(0, a^2), and aX + bY ~ N(0, a^2 + b^2) ~ √(a^2 + b^2) · N(0,1).
So ∑_i w_i g_i ~ √(∑_i w_i^2) · N(0,1) = ║w║_2 · N(0,1).
[Figure: the coordinate is the dot product of w = (a, b, c) with a vector of Gaussians (g1, g2, g3), i.e. ag1 + bg2 + cg3.]
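
A quick numerical check of the 2-stability fact (illustrative sketch): the dot product of w with a vector of independent Gaussians should have standard deviation ║w║_2.

import numpy as np

rng = np.random.default_rng(1)
w = np.array([3.0, -1.0, 2.0, 0.5])
samples = rng.standard_normal((100_000, w.size)) @ w   # 100k draws of sum_i w_i g_i
print(samples.std(), np.linalg.norm(w, 2))             # the two numbers should be close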

JL transform
Even a single coordinate preserves magnitude in expectation: each coordinate is distributed as ║w║_2 · N(0,1), so (up to scaling) E[║f(w)║_2] = ║w║_2.
We need this to hold simultaneously for all point pairs.
Multiple coordinates: ║f(w)║_2^2 ~ ║w║_2^2 · ∑ N(0,1)^2 ~ ║w║_2^2 · χ^2(k).
The sum of k squared coordinates is tightly concentrated around its mean.
One can show that when k = O(ln(n) / ε^2), all point pairs are preserved simultaneously.
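
To illustrate the all-pairs guarantee, a small experiment (my own sketch; the constant inside k is arbitrary): project random points with k on the order of ln(n)/ε^2 coordinates and report the worst pairwise distortion.

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
n, d, eps = 200, 1000, 0.5
k = int(np.ceil(8 * np.log(n) / eps**2))          # the constant 8 is just for this demo
X = rng.standard_normal((n, d))
Y = X @ (rng.standard_normal((d, k)) / np.sqrt(k))
ratios = pdist(Y) / pdist(X)                      # embedded distance / original distance, per pair
print(ratios.min(), ratios.max())                 # typically lands inside [1 - eps, 1 + eps]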

Dimension reduction for lp?
JL works well for l2; let's try to do the same for lp (1&lt;p&lt;2).
Hint: it won't work… but it will be instructive.
p-stable distributions: if X, Y ~ F_p (p ≤ 2), then aX + bY ~ (a^p + b^p)^(1/p) · F_p.
[Johnson-Schechtman 82, Datar-Immorlica-Indyk-Mirrokni 04, Mendel-Naor 04]
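
For experiments, symmetric p-stable variables can be sampled with the standard Chambers-Mallows-Stuck recipe; the sketch below (my own code, not part of the paper) also checks the stability property numerically.

import numpy as np

def sample_pstable(p, size, rng):
    """Draw symmetric p-stable samples (Chambers-Mallows-Stuck, symmetric case)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)       # random angle
    w = rng.exponential(1.0, size)                      # unit-rate exponential
    return (np.sin(p * u) / np.cos(u) ** (1 / p)
            * (np.cos(u - p * u) / w) ** ((1 - p) / p))

rng = np.random.default_rng(3)
p, a, b = 1.5, 2.0, 3.0
x = sample_pstable(p, 200_000, rng)
y = sample_pstable(p, 200_000, rng)
lhs = a * x + b * y
rhs = (a**p + b**p) ** (1 / p) * sample_pstable(p, 200_000, rng)
# Means are useless here (heavy tails), so compare a robust quantile of |.| instead.
print(np.quantile(np.abs(lhs), 0.75), np.quantile(np.abs(rhs), 0.75))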

Dimension reduction for lp?
Suppose we embed into R^1, with G = F_p.
Then ║f(w)║_p is distributed as ║w║_p · F_p, so (up to scaling) E[║f(w)║_p] = ║w║_p.
Multiple coordinates, mapping lp into lp or lq (q ≤ p):
║f(w)║_p^p = ║w║_p^p · ∑ |g|^p and ║f(w)║_q^q = ║w║_p^q · ∑ |g|^q.
Looks good! But what are E[|g|^p] and E[|g|^q]?
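
A sanity check of the single-coordinate claim (an illustrative sketch that assumes scipy's levy_stable sampler is available): the magnitude of ∑_i w_i g_i scales with ║w║_p.

import numpy as np
from scipy.stats import levy_stable

p = 1.5
for w in ([1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [3.0, -2.0, 0.5]):
    w = np.asarray(w)
    # i.i.d. symmetric p-stable entries (alpha = p, beta = 0)
    g = levy_stable.rvs(alpha=p, beta=0, size=(50_000, w.size), random_state=4)
    est = np.abs(g @ w).mean()                 # E[|sum_i w_i g_i|], finite since q = 1 < p
    print(est / np.linalg.norm(w, ord=p))      # roughly the same constant for every w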

p-stable distribution
Familiar examples: Gaussian (2-stable), Cauchy (1-stable).
Density function: unimodal [SY-78, Y-78, H-84], bell-shaped [G-84], heavy-tailed when p&lt;2: h(x) ≈ 1/(1 + x^(p+1)).
When p&lt;2,
E[|g|^q] = ∫_0^∞ x^q h(x) dx ≈ ∫_0^∞ x^q / (1 + x^(p+1)) dx ≈ ∫_0^1 x^q dx + ∫_1^∞ x^(q−(p+1)) dx ≈ [−x^(−(p−q)) / (p−q)]_1^∞ plus a constant.
For 0 &lt; q &lt; p: E[|g|^q] ≈ 1/(p−q) ← OK.
For q ≥ p: E[|g|^q] = ∞ ← Problem.
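
The heavy tail is easy to see numerically; here is a small sketch of mine using the 1-stable (Cauchy) case, which numpy can sample directly: the q-th moment estimate stabilizes for q below p but not for q at or above p.

import numpy as np

rng = np.random.default_rng(5)
for n in (10**4, 10**5, 10**6):
    g = np.abs(rng.standard_cauchy(n))
    # q = 0.5 (< p = 1) stabilizes; q = 1 (= p) does not: it is dominated by rare huge samples.
    print(n, (g ** 0.5).mean(), (g ** 1.0).mean())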

Dimension reduction for lp?
Problems with using p-stables for dimension reduction:
Heavy tails for p &lt; 2 mean E[|g|^p] = ∞.
When q &lt; p, E[|g|^q] is finite, but how many coordinates are needed?

Dimension reduction for lp?
What's known for non-Euclidean spaces? For l1: bounded-range dimension reduction [OR-02].
Dimension: O(R log n / ε^3).
Distortion: distances in the range [1, R] are retained up to (1+ε).
Expansion: distances &lt; 1 remain smaller.
Contraction: distances &gt; R remain larger.
Used as a subroutine for clustering and ANNS.

Dimension reduction for lp?
Our contributions for lp (1&lt;p&lt;2):
Bounded-range dimension reduction (lp → lq, q ≤ p).
Dimension: O_ε(R log n).
Distortion: distances in [1, R] are retained up to (1+ε).
Expansion: distances &lt; 1 remain smaller.
Contraction: distances &gt; R remain larger.
Snowflake embedding: ║x−y║_p → (1±ε) ║x−y║_p^α, α ≤ 1.
Dimension: O(ddim^2), where ddim is the doubling dimension.
Previously known only for l1, with dimension O(2^(2·ddim)).
Both embeddings have applications to clustering.

Single scale dimension reduction
Our single-coordinate embedding is as follows: f : R^d → R^1.
s: upper distance threshold (~ R).
φ: random angle.
F(v) = F_{φ,s}(v) = s · sin(φ + (1/s) ∑_i g_i v_i).
Motivated by [Mendel-Naor 04].
Intuition: sin(ε) ≈ ε, so small values are retained while large values are truncated.
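
Here is an illustrative sketch of this single-scale map (my own code and naming; for readability the usage example draws g from a Gaussian rather than a p-stable law, which is only a stand-in):

import numpy as np

def single_scale_embedding(points, s, k, sample_g, rng):
    """points: n x d array; s: upper distance threshold; k: number of coordinates.
    sample_g(d, rng) returns the d i.i.d. entries of g (p-stable in the construction above)."""
    n, d = points.shape
    out = np.empty((n, k))
    for j in range(k):
        g = sample_g(d, rng)                      # random direction for this coordinate
        phi = rng.uniform(0, 2 * np.pi)           # random phase
        out[:, j] = s * np.sin(phi + (points @ g) / s)
    return out

rng = np.random.default_rng(6)
gauss = lambda dim, r: r.standard_normal(dim)     # stand-in sampler (p = 2), demo only
pts = np.array([[0.0, 0.0], [0.1, 0.0], [50.0, 0.0]])
k = 2000
emb = single_scale_embedding(pts, s=1.0, k=k, sample_g=gauss, rng=rng)
rms = lambda a, b: np.linalg.norm(a - b) / np.sqrt(k)   # per-coordinate RMS difference
# The distance 0.1 comes out proportional to 0.1, while the distance 50 is capped near s = 1.
print(rms(emb[0], emb[1]), rms(emb[0], emb[2]))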

Single scale dimension reduction
F(v) = F_{φ,s}(v) = s · sin(φ + (1/s) ∑_i g_i v_i).
E[|F(u)−F(v)|^q] = s^q · E[|sin(φ + (1/s) ∑_i g_i u_i) − sin(φ + (1/s) ∑_i g_i v_i)|^q]
= (2s)^q · E[|sin((1/(2s)) ∑_i g_i (u_i−v_i)) · cos(φ + (1/(2s)) ∑_i g_i (u_i+v_i))|^q]
= c (2s)^q · E[|sin((1/(2s)) ∑_i g_i (u_i−v_i))|^q]   (averaging over the random angle φ).
Multiple dimensions: repeat s^O(1) · log n times; tight bounds follow from Bernstein's inequality.
Final embedding, writing w = u−v:
Threshold: ║F(u)−F(v)║_q = O(s).
Distortion: when 1 &lt; ║w║_p &lt; εs, ║F(u)−F(v)║_q ≈ (1±ε) ║w║_p.
Expansion: when ║w║_p &lt; 1, ║F(u)−F(v)║_q &lt; (1+ε) ║w║_p.

Snowflake embedding
The snowflake embedding is created by concatenating many single-scale embeddings, an idea due to Assouad (84).
This requires several properties of the single-scale embedding: threshold, smoothness, and fidelity.
Thank you!
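
As a rough illustration of the Assouad-style concatenation (emphatically not the paper's construction; the dyadic scales and the s^(α−1) weights below are my own simplification), one can stack bounded-range blocks so that a pair at distance about s contributes about s^α:

import numpy as np

def snowflake_sketch(points, alpha, scales, k_per_scale, single_scale, rng):
    """single_scale(points, s, k, rng) is any bounded-range map, e.g. a wrapper
    around single_scale_embedding from the sketch above."""
    blocks = []
    for s in scales:                               # e.g. scales = [2.0**j for j in range(-5, 10)]
        block = single_scale(points, s, k_per_scale, rng)
        blocks.append(s ** (alpha - 1) * block)    # rescale so distance ~ s maps to ~ s^alpha
    return np.hstack(blocks)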