NEAREST NEIGHBORS ALGORITHM Lecturer: Yishay Mansour Presentation: Adi Haviv and Guy Lev

Lecture Overview
NN general overview
Various methods of NN
Models of the Nearest Neighbor Algorithm
NN – Risk Analysis
KNN – Risk Analysis
Drawbacks
Locality Sensitive Hashing (LSH)
Implementing KNN using LSH
Extension for Bounded Values
Extension for Real Numbers

General Overview

Various methods of NN
NN – Given a new point x, we wish to find its nearest sample point and return that point's classification.
K-NN – Given a new point x, we wish to find its k nearest sample points and return their average classification.
Weighted – Given a new point x, we assign weights to all the sample points according to their distance from x and classify x according to the weighted average.
A minimal Python sketch of the three variants follows, see below.
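As a rough illustration (not part of the original slides), here is a minimal Python sketch of the three variants, assuming a labeled sample set stored in NumPy arrays X (points) and y (labels), the Euclidean metric, and illustrative function names:

```python
import numpy as np
from collections import Counter

def nn_classify(X, y, x):
    """Plain NN: return the label of the single closest sample to x."""
    d = np.linalg.norm(X - x, axis=1)      # distances from x to every sample
    return y[np.argmin(d)]

def knn_classify(X, y, x, k=5):
    """k-NN: return the most common label among the k closest samples."""
    d = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(d)[:k]            # indices of the k nearest samples
    return Counter(y[nearest]).most_common(1)[0][0]

def weighted_nn_classify(X, y, x, eps=1e-12):
    """Weighted NN: every sample votes for its label with weight 1/distance."""
    d = np.linalg.norm(X - x, axis=1)
    w = 1.0 / (d + eps)                    # closer samples get larger weights
    votes = {}
    for label, weight in zip(y, w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)
```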

Models of the Nearest Neighbor Algorithm

NN – Risk Analysis

NN vs. Bayes Risk Proof

NN vs. Bayes Risk Proof Cont.

KNN – Risk Analysis

KNN vs. Bayes Risk Proof

KNN vs. Bayes Risk Proof Cont.

Drawbacks

Locality Sensitive Hashing (LSH)

Locality Sensitive Hashing (LSH)
A Locality Sensitive Hashing family is a set H of hash functions s.t. for any points p, q (with h drawn uniformly from H):
If d(p,q) ≤ R then Pr[h(p) = h(q)] ≥ p1;
if d(p,q) ≥ cR then Pr[h(p) = h(q)] ≤ p2,
for some probabilities p1 > p2.
Example: for the Hamming cube {0,1}^d, take H = {h_1, ..., h_d} with h_i(x) = x_i (sample a single coordinate). If d(p,q) ≤ R then Pr[h(p) = h(q)] ≥ 1 − R/d, and if d(p,q) ≥ cR then Pr[h(p) = h(q)] ≤ 1 − cR/d, so we have p1 = 1 − R/d > p2 = 1 − cR/d as required.
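A small sketch of such a family, assuming the bit-sampling construction on the Hamming cube from the example above; sample_bit_hash is an illustrative name:

```python
import random

def sample_bit_hash(d, rng=random):
    """Draw one member of the bit-sampling LSH family on the Hamming cube
    {0,1}^d: h_i(x) = x[i] for a uniformly random coordinate i."""
    i = rng.randrange(d)
    return lambda x: x[i]

# p and q differ in 2 of 8 coordinates, so a random h collides on them
# with probability 1 - 2/8 = 0.75.
p = [0, 1, 1, 0, 1, 0, 0, 1]
q = [0, 1, 0, 0, 1, 0, 1, 1]
h = sample_bit_hash(len(p))
print(h(p) == h(q))
```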

Implementing KNN using LSH
Step 1: Amplification: use functions of the form g(x) = (h_1(x), ..., h_k(x)), where h_1, ..., h_k are randomly and independently selected from H. Then:
If d(p,q) ≤ R then Pr[g(p) = g(q)] ≥ p1^k; if d(p,q) ≥ cR then Pr[g(p) = g(q)] ≤ p2^k.
k is chosen s.t. p2^k = 1/n. Thus a given "far" point collides with the query under one amplified function with probability at most 1/n.
Denote: P1 = p1^k, the collision probability guaranteed for a "close" point under one amplified function.
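A minimal sketch of the amplification step, assuming the family H is supplied as a plain Python list of hash functions (for instance the bit-sampling family above); make_amplified_hash is an illustrative name:

```python
import random

def make_amplified_hash(H, k, rng=random):
    """AND-amplification: g(x) = (h_1(x), ..., h_k(x)) with h_1, ..., h_k
    drawn independently from the family H (given as a list of functions)."""
    hs = [rng.choice(H) for _ in range(k)]
    return lambda x: tuple(h(x) for h in hs)
```

With the bit-sampling family, g buckets two points together only if they agree on all k sampled coordinates, which drives the collision probability of "far" points down to p2^k.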

Implementing KNN using LSH Cont.
Step 2: Combination: pick L amplified functions g_1, ..., g_L (use L hash tables).
For each i, a "close" point fails to match the query under g_i with probability at most 1 − P1, so the probability of no-match under all of the L functions is at most (1 − P1)^L.
For a given δ, choose L = ln(1/δ) / P1; then we have (1 − P1)^L ≤ e^(−L·P1) = δ.
For "far" points, the probability to hit in one table is at most p2^k = 1/n, so the probability of hitting a given "far" point in any of the tables is bounded by L/n.

Implementing KNN using LSH Cont.
We are given an LSH family H and a sample set S of n points.
Pre-processing: pick L amplified functions g_1, ..., g_L (use L hash tables). Insert each sample x into each table i according to the key g_i(x).
Finding nearest neighbors of q: for each i calculate g_i(q) and search in the i-th table. Thus obtain the candidate set P of all samples that collide with q in at least one table.
Check the distance between q and each point in P, and keep the "close" ones.
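An end-to-end sketch of the pre-processing and query steps, assuming the family is given as a list of hash functions, samples are sequences whose hash values form valid dictionary keys, and dist is any distance function; the class and method names are illustrative:

```python
import random
from collections import defaultdict

class LSHIndex:
    """L hash tables, each keyed by an independent amplified function g_i."""

    def __init__(self, family, k, L, rng=random):
        # g_i is a tuple of k hash functions drawn from the family.
        self.gs = [[rng.choice(family) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, i, x):
        return tuple(h(x) for h in self.gs[i])

    def insert(self, x):
        # Pre-processing: insert x into every table under key g_i(x).
        for i, table in enumerate(self.tables):
            table[self._key(i, x)].append(x)

    def query(self, q, dist, R):
        # Collect candidates from every table, then keep only "close" points.
        candidates = []
        for i, table in enumerate(self.tables):
            candidates.extend(table.get(self._key(i, q), []))
        return [x for x in candidates if dist(x, q) <= R]
```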

Implementing KNN using LSH Cont.
Complexity:
Space complexity: L tables, each containing the n samples, therefore O(nL).
Search time complexity: O(L) queries to hash tables; we assume lookup time is constant. For each sample retrieved we check whether it is "close". The expected number of "far" points retrieved is at most L (n far points, each hit in each of the L tables with probability at most 1/n), therefore rejecting "far" samples takes O(L). Time for processing "close" samples: O(kL), where k here is the number of desired neighbors.

Extension for Bounded Values
Sample space is {0, 1, ..., s}^d and we use the L1 distance as the metric.
Use unary encoding: represent each coordinate by a block of s bits; a value t is represented by t consecutive 1s followed by s − t zeros.
Example: with s = 8, the value 3 is represented by 11100000.
Hamming distance in this representation is the same as the L1 distance in the original representation.
Problems with real values can be reduced to this solution by quantization.
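A short sketch of the unary (thermometer) encoding, assuming integer coordinates in {0, ..., s}; unary_encode is an illustrative name:

```python
def unary_encode(x, s):
    """Unary encoding: each coordinate t in {0,...,s} becomes t ones followed
    by s - t zeros, so Hamming distance between encodings equals the L1
    distance between the original integer vectors."""
    bits = []
    for t in x:
        bits.extend([1] * t + [0] * (s - t))
    return bits

# L1 distance |3-5| + |7-2| = 7 equals the Hamming distance of the encodings.
a, b = [3, 7], [5, 2]
ea, eb = unary_encode(a, 8), unary_encode(b, 8)
print(sum(bit1 != bit2 for bit1, bit2 in zip(ea, eb)))  # -> 7
```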

Extension for Real Numbers
Sample space is X = [0, 1]. Assume R << 1.
Pick a threshold t ∈ [0, 1] randomly and uniformly. The hash function is h_t(x) = 1 if x ≥ t, and 0 otherwise.
For two points x, y: Pr[h_t(x) ≠ h_t(y)] = |x − y|, therefore Pr[h_t(x) = h_t(y)] = 1 − |x − y|.
If R is small then 1 − R ≈ e^(−R).

Extension for Real Numbers Cont.
Therefore, after amplification with k such hash functions, a pair at distance R collides with probability about e^(−kR), while a pair at distance cR collides with probability about e^(−kcR).
So we get a separation between e^(−kR) and e^(−kcR) given a big enough constant c.
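A sketch of the random-threshold hash assumed above (h_t(x) = 1 iff x ≥ t, with t uniform in [0, 1]); the loop below is an empirical check of the collision probability, not part of the original analysis:

```python
import random

def make_threshold_hash(rng=random):
    """Random-threshold hash on [0, 1]: h_t(x) = 1 if x >= t else 0,
    with t uniform in [0, 1]. Then Pr[h_t(x) = h_t(y)] = 1 - |x - y|."""
    t = rng.random()
    return lambda x: 1 if x >= t else 0

# Empirical check of the collision probability for |x - y| = 0.1.
x, y, trials, hits = 0.40, 0.50, 100_000, 0
for _ in range(trials):
    h = make_threshold_hash()
    hits += (h(x) == h(y))
print(hits / trials)   # close to 1 - |x - y| = 0.9
```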