Given by: Erez Eyal, Uri Klein
Lecture Outline
- Exact Nearest Neighbor search
  - Definition
  - Low dimensions
  - KD-Trees
- Approximate Nearest Neighbor search (LSH based)
  - Locality Sensitive Hashing families
  - Algorithm for Hamming Cube
  - Algorithm for Euclidean space
- Summary
Nearest Neighbor Search in Springfield?
Nearest "Neighbor" Search for Homer Simpson: features such as home planet distance, height, weight, color
Nearest Neighbor (NN) Search
Given: a set P of n points in R^d (d - dimension).
Goal: a data structure which, given a query point q, finds the nearest neighbor p of q in P (in terms of some distance function D).
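As a baseline, the definition above can be realized by a linear scan over P; a minimal sketch (the sample points are made up for illustration):

```python
import math

def nearest_neighbor(P, q):
    """Exact NN by linear scan: O(dn) time per query, no preprocessing."""
    return min(P, key=lambda p: math.dist(p, q))

points = [(1.0, 2.0), (4.0, 0.5), (3.0, 3.0)]
print(nearest_neighbor(points, (3.5, 2.5)))  # (3.0, 3.0)
```

This O(dn)-per-query scan is what the data structures in the rest of the lecture try to beat.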
Nearest Neighbor Search
Interested in designing a data structure with the following objectives:
Space: O(dn)
Query time: O(d log(n))
Data structure construction time is not important.
Simple cases: 1-D (d = 1)
A binary search gives the solution.
Space: O(n); Time: O(log(n))
Example: q = 9 over the sorted points 1, 4, 7, 8, 13, 19, 25, 32
Simple cases: 2-D (d = 2)
Using Voronoi diagrams gives the solution.
Space: O(n^2); Time: O(log(n))
KD-Trees
A KD-tree is a data structure based on recursively subdividing a set of points with alternating axis-aligned hyperplanes. The classical KD-tree uses O(dn) space and answers queries in time logarithmic in n (worst case O(n)), but exponential in d.
KD-Trees Construction (figure: 11 points subdivided by axis-aligned lines l_1–l_10, with the corresponding binary tree of splits)
KD-Trees Query (figure: the same subdivision with a query point q, and the root-to-leaf path visited in the tree)
KD-Trees Algorithms
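Both KD-tree operations can be sketched briefly: construction by median splits on alternating axes, and an NN query that descends toward the query and backtracks only when the splitting plane is closer than the best point found so far. The sample points are illustrative, not from the slides:

```python
import math

def build(points, depth=0):
    """Recursively split on alternating axes at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def query(node, q, best=None):
    """Descend toward q; visit the far side only if the splitting
    plane is closer than the current best distance."""
    if node is None:
        return best
    if best is None or math.dist(q, node["point"]) < math.dist(q, best):
        best = node["point"]
    diff = q[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = query(near, q, best)
    if abs(diff) < math.dist(q, best):   # plane may hide a closer point
        best = query(far, q, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(query(tree, (9, 2)))  # (8, 1)
```

The backtracking step is exactly what degrades to O(n) in the worst case, and what becomes ineffective as d grows.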
A conjecture: "The curse of dimensionality"
In an exact solution, any algorithm for high dimension must use either n^ω(1) space or have d^ω(1) query time.
"However, to the best of our knowledge, lower bounds for exact NN Search in high dimensions do not seem sufficiently convincing to justify the curse of dimensionality conjecture" (Borodin et al. '99)
Why Approximate NN?
Approximation allows a significant speedup of calculations (on the order of 10s to 100s).
Fixed-precision arithmetic on computers causes approximation anyway.
Heuristics are used for mapping features to numerical values, causing uncertainty anyway.
Approximate Nearest Neighbor (ANN) Search
Given: a set P of n points in R^d (d - dimension) and a slackness parameter ε > 0.
Goal: a data structure which, given a query point q whose nearest neighbor in P is a, finds any p s.t. D(q, p) ≤ (1+ε)·D(q, a).
Locality Sensitive Hashing [Indyk-Motwani '98]
A (r_1, r_2, P_1, P_2)-Locality Sensitive Hashing (LSH) family is a family of hash functions H s.t. for a random hash function h and for any pair of points a, b we have:
D(a, b) ≤ r_1 ⇒ Pr[h(a) = h(b)] ≥ P_1
D(a, b) ≥ r_2 ⇒ Pr[h(a) = h(b)] ≤ P_2
(with r_1 < r_2 and P_1 > P_2)
(A common method to reduce dimensionality without losing distance information)
Hamming Cube
A d-dimensional hamming cube Q_d is the set {0, 1}^d.
For any a, b ∈ Q_d we define the Hamming distance H(a, b) = |{i : a_i ≠ b_i}|.
LSH – Example in Hamming Cube
H = {h | h(a) = a_i, i ∈ {1, …, d}}
Pr[h(q) = h(a)] = 1 − H(q, a)/d, a monotonically decreasing function of H(q, a).
Multi-index hashing: G = {g | g(a) = (h_1(a) h_2(a) … h_k(a))}
Pr[g(q) = g(a)] = (1 − H(q, a)/d)^k, monotonically decreasing in k.
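The bit-sampling family and its k-fold concatenation can be checked empirically; a sketch with assumed parameters (d = 16, k = 4, chosen only for illustration) comparing the empirical collision rate of random g's against (1 − H(q, a)/d)^k:

```python
import random

random.seed(0)
d, k = 16, 4

def make_g(d, k):
    """g concatenates k independent bit-sampling hashes h(a) = a[i]."""
    idx = [random.randrange(d) for _ in range(k)]
    return lambda a: tuple(a[i] for i in idx)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

a = [random.randint(0, 1) for _ in range(d)]
b = list(a)
b[0] ^= 1
b[5] ^= 1                       # now hamming(a, b) == 2
h = hamming(a, b)
trials = [make_g(d, k) for _ in range(10000)]
collisions = sum(g(a) == g(b) for g in trials) / len(trials)
print(collisions, (1 - h / d) ** k)   # empirical vs. (1 - H(q,a)/d)^k
```

Over many random g's, the empirical collision rate closely tracks the (1 − H/d)^k formula from the slide.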
LSH – ANN Search Basic Scheme
Preprocess:
Construct several such 'g' functions for each l ∈ {1, …, d}.
Store each a ∈ P at the place g_i(a) of the corresponding hash table.
Query:
Perform a binary search on l.
In each step, retrieve g_i(q) (of the current l, if it exists).
Return the last non-empty result.
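The basic scheme above can be sketched end to end. The choice k = d/l per radius level and the 5 tables per level are illustrative assumptions, not the slides' parameters:

```python
import random

random.seed(1)

def build_index(P, d, tables_per_level=5):
    """For each radius level l, build several g's, each concatenating
    k = max(1, d // l) sampled-bit hashes (an assumed, simplified
    choice of k), with one hash table per g over the points of P."""
    index = {}
    for l in range(1, d + 1):
        level = []
        for _ in range(tables_per_level):
            idx = [random.randrange(d) for _ in range(max(1, d // l))]
            table = {}
            for a in P:
                table.setdefault(tuple(a[i] for i in idx), []).append(a)
            level.append((idx, table))
        index[l] = level
    return index

def query(index, q, d):
    """Binary search on the radius l; remember the last non-empty bucket."""
    lo, hi, result = 1, d, None
    while lo <= hi:
        l = (lo + hi) // 2
        hit = None
        for idx, table in index[l]:
            hit = table.get(tuple(q[i] for i in idx))
            if hit:
                break
        if hit:
            result, hi = hit[0], l - 1   # candidate found: try smaller radii
        else:
            lo = l + 1                   # nothing at this radius: go larger
    return result

P = [(0,) * 8, (1,) * 8, (0, 0, 0, 0, 1, 1, 1, 1)]
index = build_index(P, 8)
print(query(index, (0, 0, 0, 0, 0, 0, 0, 1), 8))  # some point of P
```

Smaller l means longer g's, so only very close points collide; the binary search homes in on the smallest radius that still yields a collision.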
ANN Search in Hamming Cube [Kushilevitz et al. '98]
A μ-test τ: Pick a subset C of {1, 2, …, d} at random, including each i independently w.p. μ. For each i ∈ C, pick independently and uniformly r_i ∈ {0, 1}. For any a ∈ Q_d: τ(a) = Σ_{i∈C} r_i·a_i (mod 2).
(Equivalently, we may pick R ∈ {0, 1}^d s.t. R_i is 1 w.p. μ, and the test is an inner product of R and a. Such an R represents a μ-test.)
ANN Search in Hamming Cube
Define: Δ(a, b) = Pr[τ(a) ≠ τ(b)] (over a random μ-test τ).
For a query q, let H(a, q) ≤ l and H(b, q) > l(1+ε). Then for μ = 1/(2l):
Δ(a, q) ≤ δ_1 < δ_2 ≤ Δ(b, q), where the gap satisfies δ_2 − δ_1 = Ω(ε(1 − e^{−1/2})).
ANN Search in Hamming Cube
Data structure: S = {S_1, …, S_d}, and positive integers M, T.
For any l ∈ {1, …, d}, S_l = {σ_1, …, σ_M}.
For any j ∈ {1, …, M}, σ_j consists of a set {t_1, …, t_T} (each t_k is a (1/(2l))-test) and a table A_j of 2^T entries.
ANN Search in Hamming Cube
In each S_l, construct σ_j as follows:
Pick {t_1, …, t_T} independently at random.
For v ∈ Q_d, the trace is t(v) = (t_1(v), …, t_T(v)) ∈ {0, 1}^T.
An entry z ∈ {0, 1}^T in A_j contains a point a ∈ P if H(t(a), z) ≤ (δ_1 + (1/3)(δ_2 − δ_1))·T (else it is empty).
(The space complexity is determined by the d·M structures σ_j, each holding T tests and a table of 2^T entries.)
ANN Search in Hamming Cube
For any query q and a, b ∈ P s.t. H(q, a) ≤ l and H(q, b) > (1+ε)l, it can be proven using Chernoff bounds [Alon & Spencer '92] that H(t(q), t(a)) falls below the table's threshold and H(t(q), t(b)) falls above it, each failing with probability exponentially small in T.
This gives the result that the trace t functions as an LSH family (in its essence).
(When the event presented in these inequalities occurs for some σ_j in S_l, σ_j is said to 'fail'.)
ANN Search in Hamming Cube
Search Algorithm: We perform a binary search on l. In every step:
Pick σ_j in S_l uniformly at random.
Compute t(q) from the list of tests in σ_j.
Check the entry labeled t(q) in A_j:
If the entry contains a point from P, restrict the search to lower l's.
Otherwise restrict the search to greater l's.
Return the last non-empty entry in the search.
ANN Search in Hamming Cube
Search Algorithm: Example (flowchart): initialize l = d/2; access S_l; choose σ_j; calculate t(q); if A_j(t(q)) is non-empty, set Res ← A_j(t(q)) and continue with the lower half of l; otherwise continue with the upper half; stop once the range of l has been covered.
ANN Search in Hamming Cube
Construction of S is said to 'fail' if, for some l, more than M/log(d) structures σ_j in S_l 'fail'.
For a suitable choice of M and T, S's construction fails with small probability; if S does not fail, then for every query the search algorithm fails to find an ANN with small probability.
ANN Search in Hamming Cube
Query time complexity: polynomial in d and log(n).
Space complexity: polynomial in n and d.
Both complexities are also proportional to ε^−2.
Euclidean Space
The d-dimensional Euclidean space l_i^d is R^d endowed with the L_i distance.
For any a, b ∈ R^d we define the L_i distance: L_i(a, b) = (Σ_{j=1}^d |a_j − b_j|^i)^{1/i}.
The algorithm presented deals with l_2^d, and with l_1^d under minor changes.
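The L_i distance can be written directly; a small sketch:

```python
def lp_dist(a, b, p=2):
    """L_p distance in R^d: p = 2 is Euclidean, p = 1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(lp_dist((0, 0), (3, 4)))       # 5.0  (L_2)
print(lp_dist((0, 0), (3, 4), p=1))  # 7.0  (L_1)
```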
Euclidean Space [Kushilevitz et al. '98]
Define: B(a, r) is the closed ball around a with radius r (a subset of R^d); D(a, r) = P ∩ B(a, r).
LSH – ANN Search Extended Scheme
Preprocess: Prepare a data structure for each 'hamming ball' induced by any a, b ∈ P.
Query:
Start with some maximal ball.
In each step calculate the ANN.
Stop according to some threshold.
ANN Search in Euclidean Space
For a ∈ P, define a Euclidean-to-Hamming mapping φ: D(a, r) → {0, 1}^{DF}, and a parameter L.
Given a set of i.i.d. unit vectors z_1, …, z_D, for each z_i the cutting points c_1, …, c_F are equally spaced on an interval determined by r (and L).
Each z_i and c_j define a coordinate in the DF-hamming cube, on which the projection of any b ∈ D(a, r) is 0 iff the projection of b on z_i falls below c_j.
ANN Search in Euclidean Space
Euclidean-to-Hamming Mapping Example (figure): d = 3, D = 2, F = 3; a point a = (a_1, a_2, a_3) projected on z_1, z_2 maps to the bits φ(a) = 101 011, and b = (b_1, b_2, b_3) maps to φ(b) = 011 011.
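The mapping can be sketched concretely. The Gaussian construction of unit vectors, the cutting-point interval [−r, r], and the parameters D = 50, F = 8 are assumptions for illustration, not the slides' choices:

```python
import math
import random

random.seed(2)

def unit_vector(d):
    """Random unit vector via normalized Gaussians (assumed construction)."""
    v = [random.gauss(0, 1) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def make_embedding(a, r, d, D, F):
    """D random unit vectors; F equally spaced cutting points in [-r, r]
    (an assumed range). A point maps to DF bits, one per (z_i, c_j) pair:
    bit = 1 iff the projection of (b - a) on z_i is at least c_j."""
    zs = [unit_vector(d) for _ in range(D)]
    cuts = [-r + 2 * r * (j + 0.5) / F for j in range(F)]
    def embed(b):
        bits = []
        for z in zs:
            proj = sum(zi * (bi - ai) for zi, bi, ai in zip(z, b, a))
            bits.extend(1 if proj >= c else 0 for c in cuts)
        return bits
    return embed

embed = make_embedding(a=(0.0, 0.0, 0.0), r=2.0, d=3, D=50, F=8)

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

# Hamming distance in the embedding grows with Euclidean distance:
close = hamming(embed((0.1, 0.0, 0.0)), embed((0.2, 0.0, 0.0)))
far = hamming(embed((0.1, 0.0, 0.0)), embed((1.2, 1.2, 0.0)))
print(close < far)  # True
```

Each unit vector contributes roughly (projection distance / cut spacing) differing bits, which is why the trace Hamming distance tracks the Euclidean distance in expectation.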
ANN Search in Euclidean Space
It can be proven that, in expectation, the mapping preserves the relative distances between points in P. The mapping gets more accurate as r grows smaller.
ANN Search in Euclidean Space
Data structure: S = {S_a | a ∈ P}, and positive integers D, F, L.
For any a ∈ P, S_a consists of:
A list of all other elements of P sorted by increasing distance from a.
A structure S_{a,b} for any b ∈ P, b ≠ a.
ANN Search in Euclidean Space
Let r = L_2(a, b); then S_{a,b} consists of:
A list of D i.i.d. unit vectors {z_1, …, z_D}.
For each unit vector z_i, a list of F cutting points.
A Hamming cube data structure of dimension DF, containing (the mapping of) D(a, r).
The size of D(a, r).
ANN Search in Euclidean Space
Search Algorithm (using a positive integer T):
Pick a random a_0 ∈ P, let b_0 be the farthest point from a_0, and start from S_{a_0,b_0} (r_0 = L_2(a_0, b_0)).
For any S_{a_j,b_j}:
Query for the ANN of φ(q) in the Hamming cube data structure and get a result a'.
If L_2(q, a') > r_{j−1}/10, return a'.
Otherwise, pick T points of D(a_j, r_j) at random, and let a'' be the closest to q among them.
Let a_{j+1} be the closest to q of {a_j, a', a''}.
ANN Search in Euclidean Space
Let b' ∈ P be the farthest point from a_{j+1} s.t. 2L_2(a_{j+1}, q) ≥ L_2(a_{j+1}, b'), using a binary search on the sorted list of S_{a_{j+1}}.
If no such b' can be found, return a_{j+1}; otherwise, let b_{j+1} = b'.
ANN Search in Euclidean Space
Each ball in the search contains q's (exact) NN. (figure: q inside the ball defined by a_i and b_i)
ANN Search in Euclidean Space
D(a_i, r_i) contains only points from D(a_{i−1}, r_{i−1}), and w.p. at least 1 − 2^{−T} it contains at most a constant fraction of them.
ANN Search in Euclidean Space (figure: the ball around a_i nested inside the ball around a_{i−1}, both containing q)
ANN Search in Euclidean Space
Conclusion: in expectation, this gives an O(log(n)) number of iterations.
ANN Search in Euclidean Space
Search Algorithm: Example (figure: query q with the initial ball defined by a_0 and b_0, and the next, smaller ball defined by a_1 and b_1)
ANN Search in Euclidean Space
Construction of S is said to 'fail' if some S_{a,b} does not preserve the relative distances.
For a suitable choice of D, F and L, S's construction fails with small probability; if S does not fail, then for every query the search algorithm finds an ANN.
ANN Search in Euclidean Space
Query time complexity: polynomial in d and log(n).
Space complexity: polynomial in n and d.
Both complexities are also proportional to ε^−2.
Remark – Additional Work
Related Works:
Jon M. Kleinberg. "Two Algorithms for Nearest-Neighbor Search in High Dimensions", 1997
P. Indyk and R. Motwani. "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality", 1998
A. Gionis, P. Indyk and R. Motwani. "Similarity Search in High Dimensions via Hashing", 1999
Summary
The goal: linear space and logarithmic search time
- Approximate nearest neighbor
- Locality Sensitive Hash functions
- Amplifying probability by concatenation
- Discretization of values by projecting points on unit vectors
Good Bye, (Approximate) Neighbor [http://www.thesimpsons.com]
For questions feel free to consult your neighbors: uri.klein@weizmann.ac.il, erez.eyal@weizmann.ac.il