Download presentation
Presentation is loading. Please wait.
1
Sublinear Algorithmic Tools 3
Alex Andoni
2
Plan Dimension reduction Sketching
Application: Numerical Linear Algebra Sketching Application: Streaming Application: Nearest Neighbor Search and moreβ¦ Dimension reduction: linear map π: β π β β π s.t: for any points π,πβ β π : Pr π ||π π βπ(π)|| ||πβπ|| β 1Β±π β₯1βπΏ Sketch: linear map π: β π β β π , estimator π¬, s.t. for any πβ β π : Pr π π¬(π π ) ||π| | 1 β(1Β±π) β₯1βπΏ
3
Approximate Near Neighbor Search
Preprocess: a set of π point approximation π>1 Query: given a query point π, report a point π β βπ with the smallest distance to π up to factor π>1 Near neighbor: threshold π Applications to: closest pair, Max/Min inner product (MIPS), greedy coordinate descent, kernel approximation & estimation,β¦ Parameters: space & query time π π β ππ π π β²
4
π1= π2= Locality-Sensitive Hashing ππ, where
[Indyk-Motwaniβ98] π πβ² Random hash function π on β π satisfying: for close pair (when πβπ β€π) Prβ‘[π(π)=π(π)] is βhighβ for far pair (when πβπβ² >ππ) Prβ‘[π(π)=π(πβ²)] is βsmallβ Use several hash tables π π π1= βnot-so-smallβ π2= Prβ‘[π(π)=π(π)] π= log 1/ π 1 log 1/ π 2 1 ππ, where π1 π2 πβπ π ππ
5
Locality sensitive hash functions
Hash function π is usually a concatenation of βprimitiveβ functions: π π = β 1 (π), β 2 (π),β¦, β π (π) Example: Hamming space 0,1 π β π = π π , i.e., choose π π‘β bit for a random π π(π) chooses π bits at random Pr β π =β π = 1 β π»ππ π,π π π 1 =1β π π β π βπ/π π 2 =1β ππ π β π βππ/π π= log 1/ π 1 log 1/ π 2 = π/π ππ/π = 1 π
6
Full algorithm Data structure is just πΏ= π π hash tables: Query:
Each hash table uses a fresh random function π π π = β π,1 (π),β¦, β π,π (π) Hash all dataset points into the table Query: Check for collisions in each of the hash tables until we encounter a point within distance ππ Guarantees: Space: π ππΏ =π( π 1+π ), plus space to store points Query time: π πΏβ
(π+π) =π( π π β
π) (in expectation) 50% probability of success. Draw L hash tables
7
Analysis of LSH Scheme Choice of parameters π,πΏ ?
πΏ hash tables with π π = β 1 (π),β¦, β π (π) Pr[collision of far pair] = π 2 π Pr[collision of close pair] = π 1 π Hence πΏ=π( π π ) βrepetitionsβ (tables) suffice! set π s.t. = 1/π = π 2 π π = 1/π π π 1 π 1 2 π 2 π=1 π 2 2 π=2 π ππ
8
Analysis: Correctness
Let π β be an π-near neighbor If does not exists, algorithm can output anything Algorithm fails when: near neighbor π β is not in the searched buckets π 1 π , π 2 π , β¦, π πΏ π Probability of failure: Probability π, π β do not collide in a hash table: β€1β π 1 π Probability they do not collide in πΏ hash tables at most 1β π 1 π πΏ = 1β 1 π π π π β€1/π
9
Analysis: Runtime Runtime dominated by: Distance computations:
Hash function evaluation: π(πΏβ
π) time Distance computations to points in buckets Distance computations: Care only about far points, at distance >ππ
In one hash table, we have Probability a far point collides is at most π 2 π =1/π Expected number of far points in a bucket: πβ
1 π =1 Over πΏ hash tables, expected number of far points is πΏ Total: π πΏπ +π πΏπ =π( π π π) in expectation
10
LSH maps: Euclidean space
[Datar-Indyk-Immorlica-Mirrokniβ04, A.-Indykβ06] Space Time Exponent π=π π 1+π π π π=1/π π=1/2 πβ1/ π 2 π=1/4 πβ 1 2 π 2 β1 π=1/7 π π πβ² Even better LSH maps? NO: example of isoperimetry [Motwani-Naor-Panigrahyβ06, OβDonell-Wu-Zhouβ11] Example question: Among bodies in π
π of volume 1, which has the lowest perimeter? Can get better maps, if π allowed to (mildly) depend on the dataset [A-Indyk-Nguyen-Razenshteynβ14, A-Razenshteynβ15]
11
Plan Dimension reduction Sketching
Application: Numerical Linear Algebra Sketching Application: Streaming Application: Nearest Neighbor Search and moreβ¦ Dimension reduction: linear map π: β π β β π s.t: for any points π,πβ β π : Pr π ||π π βπ(π)|| ||πβπ|| β 1Β±π β₯1βπΏ Sketch: linear map π: β π β β π , estimator π¬, s.t. for any πβ β π : Pr π π¬(π π ) ||π| | 1 β(1Β±π) β₯1βπΏ
12
Sketching/NNS for other distances?
Earth Mover Distance: Given two sets A, B of points in a metric space EMD(A,B) = min cost bipartite matching between A and B Applications in image vision π 010110 πΈππ·(π΄,π΅) small or large? π 010101
13
Tool: metric embeddings
Map sets into vectors in β 1 preserving the distance (approximately) Then use algorithms for β 1 ! Say, EMD over π 2 Theorem [Cha02, IT03]: Exists a map π mapping all π΄β π 2 into β 1 with distortion π(logβ‘π ): i.e., for any π΄,π΅β π 2 we have: πΈππ· π΄,π΅ β€||π π΄ βπ π΅ | | 1 β€π( log π )β
πΈππ·(π΄,π΅) Implies sketch/NNS: π log π -approximation
14
Embeddability into β 1 edit( banana , ananas ) = 2 edit(1234567,
Metric Upper bound Earth-mover distance (π -sized sets in 2D plane) π log π [Cha02, IT03] (π -sized sets in 0,1 π ) π( log π β
log π ) [AIK08] Edit distance over 0,1 π (#indels to transform x->y) 2 π log π [OR05] Ulam (edit distance between permutations) π log π [CK06] Block edit distance π log π [MS00, CM07] edit( , ) = 2
15
Non-embeddability into β 1
Metric Upper bound Earth-mover distance (π -sized sets in 2D plane) π log π [Cha02, IT03] (π -sized sets in 0,1 π ) π( log π β
log π ) [AIK08] Edit distance over 0,1 π (#indels to transform x->y) 2 π log π [OR05] Ulam (edit distance between permutations) π log π [CK06] Block edit distance π log π [MS00, CM07] Lower bounds Ξ© log π [NS07] Ξ© log π [KN05] Ξ© log π [KN05,KR06] Ξ© log π [AK07] 4/3 [Cor03]
16
A sketch of the rest Sketching Efficient algorithms
Numerical Linear Algebra linear sketching Streaming (smaller space) Nearest neighbor search Locality Sensitive Hashing Sketching of graphs Sparsifiers: reduce number of edges so as to approximately preserve all cuts, distances, effective resistances etc [Benczur-Kargerβ94,β¦] Linear sketch for graph => data structures for dynamic connectivity [AGMβ12, KKMβ13] Characterization of sketching size for distance estimation: [BO10, LNW14,AKR15, BBCKYβ17] E.g.: a normed space X has small sketch iff embeds into β 0.9 Adaptive sketching: when we know we sketch set π΄β β π Then π β
may depend (weakly) on π΄ Non-oblivious subspace embeddings [DMMβ06,β¦, Woodruffβ14]
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.