Ryan O'Donnell (CMU) Yi Wu (CMU, IBM) Yuan Zhou (CMU)

Locality Sensitive Hashing [Indyk–Motwani '98]

h : objects → sketches

H : family of hash functions h s.t. "similar" objects collide w/ high prob., "dissimilar" objects collide w/ low prob.

Abbreviated history

Min-wise hash functions [Broder '98]. View documents A, B as sets of words (word 1? word 2? word 3? … word d?). Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|. Broder invented a simple H s.t. Pr[h(A) = h(B)] = J(A, B).
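To make this concrete, here is a minimal min-wise hashing sketch (not from the talk; the salted-SHA-1 "random permutation" and the word sets are illustrative assumptions):

```python
import hashlib

def minhash(s, seed):
    """Min-wise hash: the minimum of a seeded hash over the set's elements.
    The element of A | B with the smallest hash is uniform over A | B,
    so Pr[minhash(A) == minhash(B)] = |A & B| / |A | B| = J(A, B)."""
    return min(hashlib.sha1(f"{seed}:{w}".encode()).hexdigest() for w in s)

A = {"the", "quick", "brown", "fox"}
B = {"the", "quick", "red", "fox"}

jaccard = len(A & B) / len(A | B)                                  # 3/5 = 0.6
estimate = sum(minhash(A, t) == minhash(B, t) for t in range(2000)) / 2000
print(jaccard, estimate)                                           # both ~0.6
```

Averaging the collision indicator over many independent seeds estimates J(A, B), since each seed's shared minimum lands in A ∩ B with probability exactly J(A, B).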

Indyk–Motwani '98: Defined LSH. Invented a very simple H good for {0,1}^d under Hamming distance. Showed that good LSH implies good nearest-neighbor-search data structures.

Charikar '02, STOC: Proposed an alternate H ("simhash") for Jaccard similarity. Patented by Google.

Many papers about LSH

Practice:
- Free code base [AI '04]
- Sequence comparison in bioinformatics
- Association-rule finding in data mining
- Collaborative filtering
- Clustering nouns by meaning in NLP
- Pose estimation in vision [Tenesawa–Tanaka '07]

Theory: [Broder '97], [Indyk–Motwani '98], [Gionis–Indyk–Motwani '98], [Charikar '02], [Datar–Immorlica–Indyk–Mirrokni '04], [Motwani–Naor–Panigrahi '06], [Andoni–Indyk '06], [Neylon '10], [Andoni–Indyk '08, CACM]

Given: a distance space (X, dist), a "radius" r > 0, and an "approx factor" c > 1.
Goal: a family H of functions h : X → S (S can be any finite set) s.t. ∀ x, y ∈ X:
dist(x, y) ≤ r ⟹ Pr_h[h(x) = h(y)] ≥ p
dist(x, y) ≥ cr ⟹ Pr_h[h(x) = h(y)] ≤ q
Quality is measured by the ρ with p = q^ρ (think q^{.5}, q^{.25}, q^{.1}, …); the smaller ρ, the better.

Theorem [IM'98, GIM'98]: Given an LSH family for (X, dist), one can solve "(r, cr)-near-neighbor search" for n points with a data structure of size O(n^{1+ρ}) and query time Õ(n^ρ) hash-function evaluations.
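As a rough illustration of how that data structure works (a sketch, not the talk's content): hash each point by a concatenation g = (h_1, …, h_k) of k independent LSH functions and repeat over L independent tables; the standard choices k ≈ log_{1/q} n and L ≈ n^ρ give the stated bounds. The code below assumes the bit-sampling family of the next slide, and all names are illustrative.

```python
import random

def build(points, d, k, L):
    """L hash tables; table j keys each point by g_j = (h_1, ..., h_k),
    a concatenation of k random bit-sampling hashes h_i(x) = x_i."""
    tables = []
    for _ in range(L):
        coords = [random.randrange(d) for _ in range(k)]
        table = {}
        for p in points:
            table.setdefault(tuple(p[i] for i in coords), []).append(p)
        tables.append((coords, table))
    return tables

def query(tables, x, cr):
    """Probe x's bucket in each table; report any point within distance cr.
    If some point is within distance r, this succeeds w/ const. probability
    (for k, L chosen as above)."""
    for coords, table in tables:
        for p in table.get(tuple(x[i] for i in coords), []):
            if sum(a != b for a, b in zip(p, x)) <= cr:
                return p
    return None

# Usage sketch:
# tables = build(points, d=128, k=16, L=32)   # k, L set by the analysis
# match = query(tables, x, cr=32)
```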

Example [IM'98]: X = {0,1}^d, dist = Hamming, r = εd, c = 5 (i.e., distinguish dist ≤ εd from dist ≥ 5εd). H = {h_1, h_2, …, h_d} with h_i(x) = x_i: "output a random coordinate."

Analysis: Pr[h_i(x) = h_i(y)] = 1 − dist(x, y)/d. At distance 5εd this is 1 − 5ε ≕ q; at distance εd it is 1 − ε. Since (1 − 5ε)^{1/5} ≤ 1 − ε, we get 1 − ε ≥ q^{1/5}, ∴ ρ ≤ 1/5. In general this family achieves ρ ≤ 1/c, ∀c (∀r).
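A quick numeric check of this calculation (ε = 0.05 is an arbitrary choice):

```python
import math

eps, c = 0.05, 5
p, q = 1 - eps, 1 - c * eps        # collision probs at dist eps*d, 5*eps*d
rho = math.log(p) / math.log(q)    # defined by p = q**rho
print(rho, 1 / c)                  # 0.178... <= 0.2
```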

Optimal upper bound: ({0,1}^d, Ham), r > 0, c > 1. Let S ≝ {0,1}^d ∪ {✔} and H ≝ {h_ab : dist(a, b) ≤ r}, where h_ab(x) = ✔ if x = a or x = b, and h_ab(x) = x otherwise. Then dist(x, y) ≤ r ⟹ Pr[h(x) = h(y)] positive, while dist(x, y) ≥ cr ⟹ Pr[h(x) = h(y)] = 0; a positive p exceeds q^ρ for ρ = .5, .1, .01, .001, …, so ρ can be driven to 0.
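A tiny exact enumeration of this family (d = 6, r = 1, c = 3 chosen only to keep it instant; all names illustrative) makes the catch visible: p is positive but equals 1/|H| = 2^{−Θ(d)}, and q is literally 0.

```python
from itertools import product

d, r, c = 6, 1, 3

def ham(u, v):
    return sum(a != b for a, b in zip(u, v))

cube = list(product((0, 1), repeat=d))
H = [(a, b) for a in cube for b in cube if a < b and ham(a, b) <= r]

def collide(ab, x, y):
    a, b = ab
    hx = "check" if x in (a, b) else x   # "check" plays the role of the checkmark
    hy = "check" if y in (a, b) else y
    return hx == hy

x = (0,) * d
near = (1,) + (0,) * (d - 1)             # dist 1 <= r
far = (1, 1, 1) + (0,) * (d - 3)         # dist 3 >= c*r

print(sum(collide(ab, x, near) for ab in H) / len(H))  # 1/|H| > 0, but tiny
print(sum(collide(ab, x, far) for ab in H) / len(H))   # exactly 0
```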

The End. Any questions?

Wait, what? Theorem [IM'98, GIM'98]: Given an LSH family for (X, dist), one can solve "(r, cr)-near-neighbor search" for n points with a data structure of size O(n^{1+ρ}) and query time Õ(n^ρ) hash-function evaluations, provided q ≥ n^{−o(1)} ("not tiny").

More results. For R^d with ℓ_p-distance: ρ ≤ 1/c when p = 1 [IM'98]; bounds for 0 < p < 1 [DIIM'04]; ρ ≤ 1/c² when p = 2 [AI'06]. For Jaccard similarity: ρ ≤ 1/c [Bro'98]. Lower bound: for {0,1}^d with Hamming distance, ρ ≥ 0.462/c − o_d(1) (assuming q ≥ 2^{−o(d)}) [MNP'06]; this transfers immediately to ℓ_p-distance.

Our Theorem: For {0,1}^d with Hamming distance, ρ ≥ 1/c − o_d(1) (assuming q ≥ 2^{−o(d)}), for some choice of r; this transfers immediately to ℓ_p-distance. The proof also yields ρ ≥ 1/c for Jaccard.

Proof

Noise-stability is log-convex.

Proof: A definition, and two lemmas.

Definition: noise stability at e^{−τ}. Fix any function h : {0,1}^d → S. Pick x ∈ {0,1}^d uniformly at random, and form y by flipping each bit of x independently w.p. (1 − e^{−2τ})/2. Define K_h(τ) ≝ Pr[h(x) = h(y)].
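A Monte Carlo sketch of this definition (the 3-bit-parity test function and all parameters are arbitrary choices):

```python
import math
import random

def K(h, d, tau, trials=200_000):
    """Estimate K_h(tau) = Pr[h(x) = h(y)] for a tau-noisy pair (x, y)."""
    flip = (1 - math.exp(-2 * tau)) / 2
    hits = 0
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(d)]
        y = [b ^ (random.random() < flip) for b in x]   # flip each bit w.p. flip
        hits += h(x) == h(y)
    return hits / trials

h = lambda x: x[0] ^ x[1] ^ x[2]   # parity of the first three bits
print(K(h, d=10, tau=0.1))         # analytically (1 + e^{-0.6}) / 2 = 0.774...
```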

Lemma 1: For x, y as above, dist(x, y) = τd ± o(d) w.v.h.p., when τ ≪ 1. Proof: Chernoff bound and Taylor expansion.
Lemma 2: K_h(τ) is a log-convex function of τ (for any h), i.e., the plot of log K_h(τ) against τ is convex. Proof uses Fourier analysis of Boolean functions.

Fourier transformation. Theorem: every f : {0,1}^d → R can be uniquely written as f(x) = Σ_{S⊆[d]} f̂(S) χ_S(x), where the basis functions are χ_S(x) = (−1)^{Σ_{i∈S} x_i} and the Fourier coefficients are f̂(S) = ⟨f, χ_S⟩ = E_x[f(x) χ_S(x)]. Proof: {χ_S}_{S⊆[d]} is an orthonormal basis of {f : {0,1}^d → R}.
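A brute-force verification of the theorem on a small example (d = 3 and the particular f are arbitrary choices):

```python
from itertools import combinations, product

d = 3

def chi(S, x):
    """Basis function: chi_S(x) = (-1)^(sum of x_i over i in S)."""
    return (-1) ** sum(x[i] for i in S)

def coeff(f, S):
    """Fourier coefficient: hat f(S) = <f, chi_S> = E_x[f(x) chi_S(x)]."""
    pts = list(product((0, 1), repeat=d))
    return sum(f(x) * chi(S, x) for x in pts) / len(pts)

f = lambda x: x[0] & (x[1] | x[2])   # any f : {0,1}^d -> R

subsets = [S for k in range(d + 1) for S in combinations(range(d), k)]
fhat = {S: coeff(f, S) for S in subsets}

# Uniqueness: f agrees pointwise with its Fourier expansion.
for x in product((0, 1), repeat=d):
    assert abs(f(x) - sum(c * chi(S, x) for S, c in fhat.items())) < 1e-9
```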

Lemma 2: K_h(τ) is a log-convex function of τ. Proof: Let h_i(x) = 1_{h(x)=i}. Then
K_h(τ) = Pr[h(x) = h(y)] = Σ_{i∈S} E[h_i(x) h_i(y)] = Σ_{i∈S} Σ_{T⊆[d]} ĥ_i(T)² e^{−2τ|T|},
a non-negative combination of the log-convex functions τ ↦ e^{−2τ|T|}. A non-negative combination of log-convex functions is log-convex, so K_h(τ) is log-convex. ∎
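For intuition, a brute-force numerical check of Lemma 2 on a small random h (d = 4 and the range {0, 1, 2} are arbitrary choices; a sanity check, not the proof):

```python
import math
import random
from itertools import product

d = 4
random.seed(0)
table = {x: random.randrange(3) for x in product((0, 1), repeat=d)}  # random h
h = table.__getitem__

def K(tau):
    """Exact K_h(tau): collision probability of a tau-noisy pair (x, y)."""
    flip = (1 - math.exp(-2 * tau)) / 2
    total = 0.0
    for x in product((0, 1), repeat=d):
        for y in product((0, 1), repeat=d):
            dist = sum(a != b for a, b in zip(x, y))
            total += flip**dist * (1 - flip)**(d - dist) / 2**d * (h(x) == h(y))
    return total

# Midpoint log-convexity: log K((a+b)/2) <= (log K(a) + log K(b)) / 2.
for t in (0.1, 0.2, 0.3):
    assert math.log(K(t)) <= (math.log(K(t - 0.05)) + math.log(K(t + 0.05))) / 2 + 1e-12
print("log K_h is midpoint-convex at the sampled points")
```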

Lemma 1 (dist(x, y) ≈ τd w.v.h.p., when τ ≪ 1) and Lemma 2 (K_h(τ) is a log-convex function of τ, for any h) together give the Theorem: any LSH for {0,1}^d requires ρ ≥ 1/c − o(1).

Proof: Say H is an LSH family for {0,1}^d with parameters (εd + o(d), cεd − o(d), q^ρ, q), i.e., inner radius r = εd + o(d) and outer radius (c − o(1))·r. Define K_H(τ) ≝ avg_{h∈H}[K_h(τ)]; as a non-negative linear combination of log-convex functions, K_H(τ) is also log-convex. By Lemma 1, a τ-noisy pair has dist(x, y) ≈ τd w.v.h.p. Taking τ = ε: K_H(ε) ≳ q^ρ. Taking τ = cε: K_H(cε) ≲ q (in truth q + 2^{−Θ(d)}; we assume q is not tiny).

∴ K_H(ε) ≳ q^ρ and K_H(cε) ≲ q, while K_H(0) = 1, i.e., ln K_H(0) = 0. Since ln K_H(τ) is convex and vanishes at τ = 0, ln K_H(ε) ≤ (1/c)·ln K_H(cε). Hence ρ ln q ≲ (1/c)·ln q, and dividing by ln q < 0 flips the inequality: ρ ≥ 1/c − o(1). ∎
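In display form, the endgame of the argument (a sketch: ≳ and ≲ absorb the o(1) corrections from Lemma 1 and the 2^{−Θ(d)} additive term):

```latex
\[
\ln K_H(0) = 0, \qquad
\ln K_H(\varepsilon) \gtrsim \rho \ln q, \qquad
\ln K_H(c\varepsilon) \lesssim \ln q .
\]
Since $\tau \mapsto \ln K_H(\tau)$ is convex and vanishes at $\tau = 0$,
\[
\ln K_H(\varepsilon)
  = \ln K_H\!\Bigl(\tfrac{1}{c}\cdot c\varepsilon + \bigl(1 - \tfrac{1}{c}\bigr)\cdot 0\Bigr)
  \le \tfrac{1}{c}\,\ln K_H(c\varepsilon) .
\]
Hence $\rho \ln q \lesssim \tfrac{1}{c}\ln q$; dividing by $\ln q < 0$ flips the
inequality, giving $\rho \ge \tfrac{1}{c} - o(1)$.
```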

The End. Any questions?