NEAREST NEIGHBORS ALGORITHM
Lecturer: Yishay Mansour
Presentation: Adi Haviv and Guy Lev

Lecture Overview
- NN general overview
- Various methods of NN
- Models of the Nearest Neighbor Algorithm
- NN – Risk Analysis
- KNN – Risk Analysis
- Drawbacks
- Locality Sensitive Hashing (LSH)
- Implementing KNN using LSH
- Extension for Bounded Values
- Extension for Real Numbers

General Overview

Various methods of NN
- NN: given a new point x, we wish to find its nearest sample point and return that point's classification.
- K-NN: given a new point x, we wish to find its k nearest sample points and return their average classification.
- Weighted: given a new point x, we assign weights to all the sample points according to their distance from x and classify x according to the weighted average.
A minimal sketch of the three variants follows.

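A minimal sketch, assuming Euclidean distance, majority voting for K-NN, and inverse-distance weights for the weighted variant (all illustrative choices, not fixed by the lecture):

```python
import math
from collections import Counter

def euclidean(a, b):
    # Any metric works here; Euclidean is just a common default.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(samples, x, k=1):
    """samples: list of (point, label) pairs.
    Returns the majority label among the k nearest points (k=1 is plain NN)."""
    nearest = sorted(samples, key=lambda s: euclidean(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def weighted_classify(samples, x, eps=1e-9):
    """Weights every sample by inverse distance to x and returns the
    label with the largest total weight."""
    scores = Counter()
    for point, label in samples:
        scores[label] += 1.0 / (euclidean(point, x) + eps)
    return scores.most_common(1)[0][0]

samples = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_classify(samples, (1, 1), k=3))   # "a"
print(weighted_classify(samples, (5, 4)))   # "b"
```
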
Models of the Nearest Neighbor Algorithm

NN – Risk Analysis
The classical result, due to Cover and Hart: asymptotically, the risk of the 1-NN rule is at most twice the Bayes-optimal risk.

NN vs. Bayes Risk Proof

NN vs. Bayes Risk Proof Cont.

KNN – Risk Analysis
As k grows (with k/n → 0), the risk of the k-NN rule approaches the Bayes-optimal risk.

KNN vs. Bayes Risk Proof

KNN vs. Bayes Risk Proof Cont.

Drawbacks
Exact NN search must store all n samples and scan them at query time; for large, high-dimensional sample sets this brute-force search becomes prohibitively expensive, which motivates Locality Sensitive Hashing.

Locality Sensitive Hashing (LSH)

Locality Sensitive Hashing (LSH)
A Locality Sensitive Hashing family is a set H of hash functions such that for any points p, q (with respect to a radius R, a constant c > 1, and probabilities p1 > p2):
- If d(p, q) <= R then Pr[h(p) = h(q)] >= p1 (over a uniformly random h in H)
- If d(p, q) >= cR then Pr[h(p) = h(q)] <= p2
Example (bit sampling over the Hamming cube {0,1}^d): take h_i(p) = p_i for a uniformly random coordinate i. Then Pr[h(p) = h(q)] = 1 - d_H(p, q)/d, so if d_H(p, q) <= R we have Pr[h(p) = h(q)] >= 1 - R/d = p1, and if d_H(p, q) >= cR we have Pr[h(p) = h(q)] <= 1 - cR/d = p2 < p1, as required.

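A sketch of this bit-sampling family in code (names are illustrative); the empirical collision rate should match 1 - d_H(p, q)/d:

```python
import random

def sample_bit_hash(d):
    """Draw one h from the bit-sampling family over {0,1}^d:
    h(p) = p[i] for a uniformly random coordinate i."""
    i = random.randrange(d)
    return lambda p: p[i]

d = 16
p = [1] * d
q = [1] * d
q[0], q[1] = 0, 0                       # Hamming distance 2
draws = [sample_bit_hash(d) for _ in range(100_000)]
rate = sum(h(p) == h(q) for h in draws) / len(draws)
print(rate, 1 - 2 / d)                  # both close to 0.875
```
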
Implementing KNN using LSH
Step 1: Amplification: use functions of the form g(x) = (h_1(x), ..., h_k(x)), where h_1, ..., h_k are randomly selected from H. Then:
- If d(p, q) <= R then Pr[g(p) = g(q)] >= p1^k
- If d(p, q) >= cR then Pr[g(p) = g(q)] <= p2^k
k is chosen s.t. p2^k <= 1/n, so a given "far" point collides with the query with probability at most 1/n. Denote: P1 = p1^k.

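A sketch of the amplification step (make_g is an illustrative name; sample_bit_hash is as in the previous sketch):

```python
import random

def sample_bit_hash(d):
    i = random.randrange(d)
    return lambda p: p[i]

def make_g(sample_h, k):
    """g(x) = (h_1(x), ..., h_k(x)) for k independent draws from the family.
    Collision probability drops from p to p**k, separating close from far."""
    hs = [sample_h() for _ in range(k)]
    return lambda x: tuple(h(x) for h in hs)

g = make_g(lambda: sample_bit_hash(16), k=4)
print(g([0, 1] * 8))   # a 4-tuple key such as (1, 0, 1, 1)
```
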
Implementing KNN using LSH Cont.
Step 2: Combination: pick L functions g_1, ..., g_L (use L hash tables). For each i: Pr[g_i(p) = g_i(q)] >= P1 when p is "close" to q. Probability of no match under any of the L functions: (1 - P1)^L <= e^(-L*P1). For a given δ, choose L = ln(1/δ)/P1; then we have (1 - P1)^L <= δ. For "far" points, the probability to hit is at most p2^k <= 1/n, so the expected number of "far" points hit in any of the tables is bounded by L*n*(1/n) = L.

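A sketch of the parameter choice these bounds imply (the formulas follow the reconstruction above and should be treated as such):

```python
import math

def lsh_parameters(n, p1, p2, delta):
    """Pick k so that p2**k <= 1/n, then L so that a close point is
    missed in all L tables with probability at most delta."""
    k = math.ceil(math.log(n) / math.log(1 / p2))   # p2**k <= 1/n
    P1 = p1 ** k
    L = math.ceil(math.log(1 / delta) / P1)         # (1 - P1)**L <= delta
    return k, L

print(lsh_parameters(n=10**6, p1=0.9, p2=0.5, delta=0.01))   # (20, 38)
```
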
Implementing KNN using LSH Cont.
We are given an LSH family H and a sample set S of n points.
Pre-processing: pick L functions g_1, ..., g_L (use L hash tables). Insert each sample x into each table i according to the key g_i(x).
Finding near neighbors of q: for each i, calculate g_i(q) and search in the i-th table. Thus obtain the candidate set P, the union over i of the buckets keyed by g_i(q). Check the distance between q and each point in P.

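Putting the pieces together, a self-contained sketch of the pre-processing/query scheme for Hamming data (all names are illustrative, and the bucket layout is one of several reasonable choices):

```python
import random
from collections import defaultdict

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

class LSHIndex:
    """L hash tables, each keyed by an amplified hash g_i built from k
    bit-sampling coordinates over {0,1}^d."""

    def __init__(self, d, k, L):
        # Each g_i is fully described by the k coordinates it samples.
        self.coords = [[random.randrange(d) for _ in range(k)]
                       for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, i, x):
        return tuple(x[j] for j in self.coords[i])

    def insert(self, x):
        for i in range(len(self.tables)):
            self.tables[i][self._key(i, x)].append(x)

    def query(self, q, radius):
        # Union the L buckets that q falls into, then keep only the
        # candidates that are truly within the given radius.
        candidates = {tuple(x)
                      for i, table in enumerate(self.tables)
                      for x in table.get(self._key(i, q), [])}
        return [c for c in candidates if hamming(c, q) <= radius]

random.seed(0)
index = LSHIndex(d=32, k=8, L=20)
data = [[random.randint(0, 1) for _ in range(32)] for _ in range(1000)]
for x in data:
    index.insert(x)
print(len(index.query(data[0], radius=4)))   # at least 1 (the point itself)
```
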
Implementing KNN using LSH Cont.
Complexity:
Space complexity: L tables, each containing n samples. Therefore: O(nL).
Search time complexity: O(L) queries to hash tables; we assume lookup time is constant. For each sample retrieved we check whether it is "close". The expected number of "far" points is at most L, so rejecting "far" samples is O(L). Time for processing "close" samples: O(kL), where k is the number of desired neighbors.

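As a worked instance of these bounds (numbers purely illustrative, using the parameter formulas reconstructed above): with n = 10^6 samples, p1 = 0.9, p2 = 0.5, and δ = 0.01, we get k = 20, P1 = 0.9^20 ≈ 0.12, and L ≈ 38. Space is then about 38 × 10^6 table entries, and a query probes 38 buckets, rejects an expected O(38) "far" candidates, and spends O(kL) time on the "close" ones.
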
Extension for Bounded Values
Sample space is {0, 1, ..., s}^d, and we use the l1 distance as the metric.
Use unary encoding: represent each coordinate by a block of s bits; a value t is represented by t consecutive 1s followed by s - t zeros.
Example: s = 8, x = (5, 7). Representation of x: 11111000 11111110.
Hamming distance in this representation is the same as l1 distance in the original representation.
Problems with real values can be reduced to this solution by quantization.

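A sketch of the unary encoding and the distance identity (function names are illustrative):

```python
def unary_encode(x, s):
    """Map each coordinate t in {0,...,s} to t ones followed by s-t zeros."""
    bits = []
    for t in x:
        bits.extend([1] * t + [0] * (s - t))
    return bits

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

x, y, s = (5, 7), (2, 4), 8
assert hamming(unary_encode(x, s), unary_encode(y, s)) == l1(x, y)  # 3 + 3 = 6
print(unary_encode(x, s))  # [1,1,1,1,1,0,0,0, 1,1,1,1,1,1,1,0]
```
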
Extension for Real Numbers
Sample space is X = [0, 1]. Assume R << 1.
Pick b randomly and uniformly from [0, 1]. The hash function is the threshold test h_b(x) = 1 if x >= b, else 0.
For x, y in [0, 1]: Pr[h_b(x) != h_b(y)] = |x - y|. Therefore: Pr[h_b(x) = h_b(y)] = 1 - |x - y|.
If R is small then: p1 = 1 - R ≈ e^(-R) and p2 = 1 - cR ≈ e^(-cR).

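A sketch of the random-threshold hash as reconstructed above (the threshold form is an assumption; the empirical collision rate should match 1 - |x - y|):

```python
import random

def make_threshold_hash():
    b = random.random()                  # b uniform in [0, 1]
    return lambda x: 1 if x >= b else 0

x, y = 0.30, 0.42                        # |x - y| = 0.12
draws = [make_threshold_hash() for _ in range(200_000)]
rate = sum(h(x) == h(y) for h in draws) / len(draws)
print(rate, 1 - 0.12)                    # empirically close to 0.88
```
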
Extension for Real Numbers Cont.
Therefore: after amplification, p1^k ≈ e^(-kR) and p2^k ≈ e^(-ckR). So we get a separation between p1 and p2 given a big enough constant c.
