Optimal Lower Bounds for Locality Sensitive Hashing (Except When q is Tiny). Ryan O’Donnell (CMU, IAS), joint work with Yi Wu (CMU, IBM) and Yuan Zhou (CMU).

Locality Sensitive Hashing [Indyk–Motwani ’98]. A hash function h maps objects to sketches. H : a family of hash functions h s.t. “similar” objects collide w/ high prob. and “dissimilar” objects collide w/ low prob.

Abbreviated history

Broder ’97, AltaVista. Documents A, B viewed as subsets of words (word 1? word 2? word 3? … word d?). Jaccard similarity: sim(A, B) = |A ∩ B| / |A ∪ B|. Invented a simple H (minwise hashing) s.t. Pr[h(A) = h(B)] = sim(A, B).
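A minimal sketch of the minwise-hashing idea (our own illustration, not code from the talk; the universe, documents, and trial counts are toy choices): each hash function is induced by a random ordering of the word universe, and the collision probability matches the Jaccard similarity.

```python
import random

def minhash_family(universe, seed):
    """One hash function h: set -> element, induced by a random ordering of the universe."""
    rng = random.Random(seed)
    order = {w: rng.random() for w in universe}          # random priority for each word
    return lambda doc: min(doc, key=lambda w: order[w])  # h(A) = word of A with smallest priority

def jaccard(A, B):
    return len(A & B) / len(A | B)

if __name__ == "__main__":
    universe = range(200)
    A = set(range(0, 120))
    B = set(range(60, 180))          # |A ∩ B| = 60, |A ∪ B| = 180, so sim = 1/3
    trials, collisions = 5000, 0
    for t in range(trials):
        h = minhash_family(universe, t)
        collisions += (h(A) == h(B))
    print("empirical Pr[h(A)=h(B)] =", collisions / trials)
    print("Jaccard similarity      =", jaccard(A, B))    # the two should be close
```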

Indyk–Motwani ’98 (cf. Gionis–I–M ’98): Defined LSH. Invented a very simple H good for {0,1}^d under Hamming distance. Showed good LSH implies good nearest-neighbor-search data structures.

Charikar ’02, STOC: Proposed an alternate H (“simhash”) for Jaccard similarity.

Many papers about LSH

Practice: free code base [AI’04]; sequence comparison in bioinformatics; association-rule finding in data mining; collaborative filtering; clustering nouns by meaning in NLP; pose estimation in vision [Tenesawa–Tanaka ’07]. Theory: [Broder ’97], [Indyk–Motwani ’98], [Gionis–Indyk–Motwani ’98], [Charikar ’02], [Datar–Immorlica–Indyk–Mirrokni ’04], [Motwani–Naor–Panigrahi ’06], [Andoni–Indyk ’06], [Neylon ’10], [Andoni–Indyk ’08, CACM].

Definition of LSH. Given: a distance space (X, dist), a “radius” r > 0, and an “approx factor” c > 1. Goal: a family H of functions X → S (S can be any finite set) s.t. ∀ x, y ∈ X: if dist(x, y) ≤ r then Pr_{h∈H}[h(x) = h(y)] ≥ p, and if dist(x, y) ≥ cr then Pr_{h∈H}[h(x) = h(y)] ≤ q. Quality is measured by ρ ≝ ln(1/p)/ln(1/q), so p = q^ρ; the smaller ρ, the better (p ≥ q^{0.5}, q^{0.25}, q^{0.1}, …).

Theorem [IM’98, GIM’98]: Given an LSH family for (X, dist), can solve “(r, cr)-near-neighbor search” for n points with a data structure of size O(n^{1+ρ}) and query time Õ(n^ρ) hash fcn evals.
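A rough sketch of how the [IM’98, GIM’98] reduction uses an LSH family (our own illustration with bit sampling on {0,1}^d as the basic family; the class name and parameters p, q are ours): concatenate k basic hashes per table to kill far-pair collisions, and build L independent tables so near pairs still collide somewhere. With k ≈ log n / log(1/q) and L ≈ n^ρ this gives roughly n^{1+ρ} space and n^ρ probes per query.

```python
import math
import random
from collections import defaultdict

class LSHIndex:
    """Hash-table construction from a basic LSH family (here: bit sampling on {0,1}^d)."""

    def __init__(self, points, d, n, p, q):
        # Standard setting: k = log n / log(1/q) basic hashes per table, L = n^rho tables.
        rho = math.log(1 / p) / math.log(1 / q)
        self.k = max(1, round(math.log(n) / math.log(1 / q)))
        self.L = max(1, round(n ** rho))
        self.tables = []
        for _ in range(self.L):
            coords = [random.randrange(d) for _ in range(self.k)]   # g = (h_1, ..., h_k)
            table = defaultdict(list)
            for x in points:
                table[tuple(x[i] for i in coords)].append(x)
            self.tables.append((coords, dict(table)))

    def query(self, x):
        """Return candidate near neighbors: every point colliding with x in some table."""
        candidates = []
        for coords, table in self.tables:
            candidates.extend(table.get(tuple(x[i] for i in coords), []))
        return candidates
```

In a full implementation one would also verify each candidate's true distance to the query and stop after inspecting O(n^ρ) candidates; that bookkeeping is elided here.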

Example [IM’98]: X = {0,1}^d, dist = Hamming, r = εd, c = 5 (so we must distinguish dist ≤ εd from dist ≥ 5εd). H = {h_1, h_2, …, h_d}, h_i(x) = x_i, i.e. “output a random coordinate” (S = {0,1}).
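A tiny numerical check of this bit-sampling family (our own, with illustrative parameters d = 1000 and ε = 0.05): the collision probability of a random coordinate is exactly 1 − dist/d.

```python
import random

d, eps = 1000, 0.05
x = [random.randrange(2) for _ in range(d)]

def corrupt(x, num_flips):
    """Copy of x with num_flips distinct coordinates flipped."""
    y = list(x)
    for i in random.sample(range(len(x)), num_flips):
        y[i] ^= 1
    return y

def collision_prob(x, y, trials=200_000):
    """Empirical Pr[h_i(x) = h_i(y)] over a uniformly random coordinate i."""
    hits = 0
    for _ in range(trials):
        i = random.randrange(len(x))   # h_i = "output coordinate i"
        hits += (x[i] == y[i])
    return hits / trials

y_near = corrupt(x, round(eps * d))       # dist(x, y) = eps*d
y_far  = corrupt(x, round(5 * eps * d))   # dist(x, y) = 5*eps*d
print("p ≈", collision_prob(x, y_near), "(exactly 1 − eps  = 0.95)")
print("q ≈", collision_prob(x, y_far),  "(exactly 1 − 5eps = 0.75)")
```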

Analysis: at distance 5εd the collision prob. is 1 − 5ε = q; at distance εd it is 1 − ε = q^ρ. Since (1 − 5ε)^{1/5} ≈ 1 − ε, ρ ≈ 1/5; since (1 − 5ε)^{1/5} ≤ 1 − ε, ρ ≤ 1/5. In general, bit sampling achieves ρ ≤ 1/c, ∀ c (∀ r).
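For reference, the general version of this calculation (our own writing-out of the standard bound, with r = εd and any c > 1):

\[
\rho \;=\; \frac{\ln(1/p)}{\ln(1/q)}
     \;=\; \frac{\ln\!\big(1/(1-\varepsilon)\big)}{\ln\!\big(1/(1-c\varepsilon)\big)}
     \;\le\; \frac{1}{c},
\qquad\text{since } (1-\varepsilon)^{c} \ge 1-c\varepsilon \ \text{(Bernoulli)} .
\]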

“Optimal” upper bound: ({0,1}^d, Ham), r > 0, c > 1. S ≝ {0,1}^d ∪ {✔}, H ≝ {h_ab : dist(a, b) ≤ r}, where h_ab(x) = ✔ if x = a or x = b, and h_ab(x) = x otherwise. Far pairs (dist ≥ cr) never collide, so q = 0, while near pairs collide with positive (but tiny) probability — so formally any ρ (0.5, 0.1, 0.01, …) is achieved.
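A small sanity check of this degenerate family (our own toy computation, with illustrative parameters d = 10 and r = 2): near pairs collide with a positive but exponentially small probability, far pairs never collide.

```python
import itertools
import random

d, r = 10, 2

def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

points = list(itertools.product((0, 1), repeat=d))
H = [(a, b) for a in points for b in points if a < b and hamming(a, b) <= r]   # all h_ab

def h_ab(a, b, x):
    return "✔" if x in (a, b) else x

x = tuple(random.randrange(2) for _ in range(d))
y_near = list(x); y_near[0] ^= 1; y_near = tuple(y_near)    # dist(x, y_near) = 1 <= r
y_far  = tuple(1 - bit for bit in x)                        # dist(x, y_far)  = d  > r

p = sum(h_ab(a, b, x) == h_ab(a, b, y_near) for a, b in H) / len(H)
q = sum(h_ab(a, b, x) == h_ab(a, b, y_far)  for a, b in H) / len(H)
print("near-pair collision prob p =", p, "(positive but tiny)")
print("far-pair  collision prob q =", q, "(exactly 0)")
```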

Wait, what? [IM’98, GIM’98] Theorem: Given an LSH family for (X, dist), can solve “(r, cr)-near-neighbor search” for n points with a data structure of size Õ(n^{1+ρ}) and query time Õ(n^ρ) hash fcn evals.

Wait, what? [IM’98, GIM’98] Theorem: size Õ(n^{1+ρ}), query time Õ(n^ρ) hash fcn evals.

More results. For ℝ^d with ℓ_p-distance: good LSH families are known for p = 1 [IM’98], for 0 < p < 1 [DIIM’04], and for p = 2 [AI’06]. For Jaccard similarity: ρ ≤ 1/c [Bro’97]. For {0,1}^d with Hamming distance, a lower bound ρ ≥ Ω(1/c) − o_d(1) (assuming q ≥ 2^{−o(d)}) [MNP’06]; such a Hamming lower bound applies immediately to ℓ_p-distance.

Our Theorem: For {0,1}^d with Hamming distance, ∃ r s.t. every LSH family has ρ ≥ 1/c − o_d(1) (assuming q ≥ 2^{−o(d)}); the bound applies immediately to ℓ_p-distance. The proof also yields ρ ≥ 1/c for Jaccard similarity.

Proof:

Noise-stability is log-convex.

Proof: A definition, and two lemmas.

Fix any function h : {0,1}^d → S. Pick x ∈ {0,1}^d uniformly at random; say h(x) = s. Let y be the result of a continuous-time (lazy) random walk from x for time τ; say h(y) = s’. def: K_h(τ) ≝ Pr[h(x) = h(y)].

Lemma 1: For x, y as above, dist(x, y) ≈ (τ/2)·d (w.h.p.) when τ ≪ 1. Lemma 2: K_h(τ) is a log-convex function of τ (for any h). From these the proof of ρ ≥ 1/c follows easily.

Continuous-Time Random Walk: an alarm clock repeatedly waits Exponential(1) seconds, then dings. (Reminder: T ~ Expon(1) means Pr[T > u] = e^{−u}.) In the C.T.R.W. on {0,1}^d, each coordinate gets its own independent alarm clock. When the i-th clock dings, coordinate i is rerandomized.

Run the walk from x for time τ to obtain y. Pr[coord. i never updated] = Pr[Exp(1) > τ] = e^{−τ}. ∴ Pr[x_i ≠ y_i] = (1 − e^{−τ})/2 ≈ τ/2 when τ ≪ 1. ⇒ Lemma 1: dist(x, y) ≈ ((1 − e^{−τ})/2)·d ≈ (τ/2)·d w.h.p.
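A short simulation of the walk (our own, with illustrative d and τ), checking the facts on this slide: each coordinate survives un-updated with probability e^{−τ}, so the fraction of differing coordinates concentrates around (1 − e^{−τ})/2.

```python
import math
import random

def ctrw(x, tau):
    """Continuous-time lazy random walk for time tau: any coordinate whose
    Exponential(1) clock rings before time tau is rerandomized."""
    y = list(x)
    for i in range(len(x)):
        if random.expovariate(1.0) < tau:      # clock i dings before time tau
            y[i] = random.randrange(2)         # coordinate i is rerandomized
    return y

d, tau = 10_000, 0.3
x = [random.randrange(2) for _ in range(d)]
y = ctrw(x, tau)
dist = sum(xi != yi for xi, yi in zip(x, y))
print("empirical dist(x,y)/d  =", dist / d)
print("(1 - e^{-tau}) / 2     =", (1 - math.exp(-tau)) / 2)
```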

Lemma 2: K_h(τ) is a log-convex function of τ. Remark: true for any reversible C.T.M.C. Recall: for f : {0,1}^d → ℝ, E[f(x)f(y)] = Σ_{T⊆[d]} f̂(T)² e^{−τ|T|} under the walk above. Given a hash function h : {0,1}^d → S, for each s ∈ S introduce h_s : {0,1}^d → {0,1}, h_s(x) = 1_{h(x)=s}.

Proof of Lemma 2: K_h(τ) = Pr[h(x) = h(y)] = Σ_{s∈S} E[h_s(x) h_s(y)] = Σ_{s∈S} Σ_{T⊆[d]} ĥ_s(T)² e^{−τ|T|}. Each e^{−τ|T|} is log-convex in τ, and a non-neg. lin. comb. of log-convex functions is log-convex. ∴ K_h(τ) is log-convex.
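A brute-force numerical check of Lemma 2 (our own, on a random h with toy parameters d = 8 and |S| = 5), using the fact from the previous slides that after time τ each coordinate of y independently differs from x with probability (1 − e^{−τ})/2:

```python
import math
import random
from itertools import product

d = 8
random.seed(0)
h = {x: random.randrange(5) for x in product((0, 1), repeat=d)}   # arbitrary h: {0,1}^d -> S

def K(tau):
    """Exact K_h(tau) = Pr[h(x) = h(y)] for uniform x and y = walk(x, tau)."""
    delta = (1 - math.exp(-tau)) / 2        # per-coordinate disagreement probability
    total = 0.0
    for x in h:
        for y in h:
            k = sum(xi != yi for xi, yi in zip(x, y))
            total += delta**k * (1 - delta)**(d - k) * (h[x] == h[y])
    return total / len(h)                   # average over uniform x

# log-convexity at the midpoint: ln K((a+b)/2) <= (ln K(a) + ln K(b)) / 2
a, b = 0.2, 1.0
print("ln K(midpoint) =", math.log(K((a + b) / 2)))
print("avg of ln K    =", (math.log(K(a)) + math.log(K(b))) / 2)
```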

Recap. Lemma 1: for x, y as above, dist(x, y) ≈ (τ/2)·d when τ ≪ 1. Lemma 2: K_h(τ) is a log-convex function of τ. Theorem: LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).

Proof: Say H is an LSH family for {0,1}^d with parameters r and (c − o(1))·r, i.e. collision prob. ≥ q^ρ at distance ≤ r and ≤ q at distance ≥ (c − o(1))·r. def: K_H(τ) ≝ avg over h ∈ H of K_h(τ). (A non-neg. lin. comb. of log-convex fcns, ∴ K_H(τ) is also log-convex.) Choose τ so that w.v.h.p. dist(x, y) ≈ r after walk time τ and ≈ (c − o(1))·r after walk time cτ. ∴ K_H(τ) ≳ q^ρ and K_H(cτ) ≲ q (in truth ≲ q + 2^{−Θ(d)}; we assume q is not tiny).

∴ ln K_H(τ) ≳ ρ·ln q and ln K_H(cτ) ≲ ln q, while K_H(0) = 1, i.e. ln K_H(0) = 0. Since ln K_H(τ) is convex (K_H is log-convex) and τ lies a 1/c fraction of the way from 0 to cτ, ln K_H(τ) ≤ (1/c)·ln K_H(cτ). ∴ ρ·ln q ≲ (1/c)·ln q, and dividing by ln q < 0 gives ρ ≳ 1/c.
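The chain of inequalities, written out (our own rendering; τ is the walk time chosen on the previous slide, and the middle step is convexity of ln K_H at τ = (1 − 1/c)·0 + (1/c)·cτ):

\[
\rho \ln q \;\lesssim\; \ln K_{\mathcal{H}}(\tau)
\;\le\; \Bigl(1-\tfrac{1}{c}\Bigr)\ln K_{\mathcal{H}}(0) + \tfrac{1}{c}\,\ln K_{\mathcal{H}}(c\tau)
\;=\; \tfrac{1}{c}\,\ln K_{\mathcal{H}}(c\tau)
\;\lesssim\; \tfrac{1}{c}\,\ln q ,
\]

and dividing by \(\ln q < 0\) reverses the inequality, giving \(\rho \ge 1/c\) up to the o(1) terms.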

Making this precise is super-tedious but super-straightforward: make Lemma 1 precise (Chernoff); make dist(x, y) ≈ (τ/2)·d precise (Taylor); choose τ = τ(c, q, d) very carefully. Theorem: meaningful iff q ≥ 2^{−o(d)}, i.e., q not tiny.