When Is “Nearest Neighbor” Meaningful? Authors: Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft. Presentation by: Vuk Malbasa for CIS664, Prof. Vasilis Megalooekonomou

Overview
- Introduction
- Examples of when NN is useful and when it is not
- Conditions under which NN is not useful
- Application of results
- Meaningful applications of high-dimensional NN
- Experimental Studies
- Conclusions

Introduction. Nearest neighbor (NN) is a technique where an unseen example is assumed to have properties similar to those of the already-classified point closest to it. The examples on the left of the slide are cases where it is obvious that using NN is useful. Are there cases where this technique is not useful?
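As a concrete reference point, the 1-NN rule the slide describes is only a few lines (a minimal sketch using Euclidean distance; the toy data and names are made up for illustration):

    import numpy as np

    def nn_classify(train_X, train_y, query):
        """Assign the query the label of its single nearest training point."""
        dists = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance to every point
        return train_y[np.argmin(dists)]

    # Two well-separated 2-D classes; a query near the second class gets label 1.
    X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
    y = np.array([0, 0, 1, 1])
    print(nn_classify(X, y, np.array([4.8, 5.1])))       # -> 1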

Examples (figures): a query point at the center of a circle of data points, and a histogram of the distances from a query point to the other points.

Conditions under which NN is not useful (figure: a query point with its nearest-neighbor distance Dmin, its farthest-point distance Dmax, and the radius (1+ε)Dmin between them). Definition: a nearest-neighbor query is unstable for a given ε if the distance from the query point to most data points is less than (1 + ε) times the distance from the query point to its nearest neighbor. It can be shown that, under certain conditions, for any fixed ε > 0 the probability that a query is unstable converges to 1 as dimensionality rises.
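In symbols (restating the slide's claim; D_min and D_max denote the distances from the query to its nearest and farthest data point in dimension m):

    % For every fixed epsilon > 0, the probability that the farthest data point
    % lies within (1 + epsilon) times the nearest-neighbor distance tends to 1:
    \[
      \lim_{m \to \infty} \Pr\!\big[\, D_{\max} \le (1 + \varepsilon)\, D_{\min} \,\big] \;=\; 1 .
    \]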

Conditions under which NN is not useful. If, for a given scenario (a distribution of data points and a distribution of query points), the condition below is satisfied, then NN is not useful. Writing d_m(P_m, Q_m) for the distance from a random data point P_m to the query point Q_m in dimension m, the condition is

    \[
      \lim_{m \to \infty} \operatorname{Var}\!\left( \frac{d_m(P_m, Q_m)}{\mathbb{E}\big[\, d_m(P_m, Q_m) \,\big]} \right) = 0 . \tag{1}
    \]

Stated differently: as the dimensionality m of the data increases, if the variance of the distance distribution, scaled by the overall magnitude of the distances, converges to zero, then NN is meaningless.

Application of results. Example 1:
- The data distribution and the query distribution are IID in all dimensions.
- All appropriate moments are finite.
- The query point is chosen independently of the data points.
In this case queries are unstable.
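A quick way to see this numerically (a minimal sketch, assuming uniform IID data and Euclidean distance; the values of eps, n, trials, and the dimensions tested are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    eps, n, trials = 0.5, 1000, 20
    for m in (2, 20, 200, 2000):
        unstable = 0
        for _ in range(trials):
            data = rng.uniform(size=(n, m))  # IID dimensions, finite moments
            q = rng.uniform(size=m)          # query chosen independently of the data
            d = np.linalg.norm(data - q, axis=1)
            unstable += int(d.max() <= (1 + eps) * d.min())
        print(f"m={m:5d}  P[Dmax <= (1+eps)*Dmin] ~ {unstable / trials:.2f}")

The estimated probability of instability climbs toward 1 as m grows, matching the theorem.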

Application of results. Example 2: the same as the previous example, but all dimensions of both the query points and the data points are completely dependent (value for dimension 1 = value for dimension 2 = …). In this case queries are not unstable and NN is meaningful.
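The fully dependent case can be checked with the same kind of sketch (assuming every coordinate is an exact copy of one underlying uniform value, as the slide describes; the sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    base = rng.uniform(size=10_000)           # one underlying value per point
    qv = rng.uniform()
    for m in (1, 10, 1000):
        data = np.tile(base[:, None], (1, m)) # dimension 1 = dimension 2 = ...
        q = np.full(m, qv)
        d = np.linalg.norm(data - q, axis=1)
        print(f"m={m:4d}  Dmax/Dmin = {d.max() / d.min():.2f}")

Every distance simply scales by sqrt(m), so the contrast, and hence the meaningfulness of NN, is exactly that of the underlying one-dimensional data, independent of m.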

Application of results. Example 3: every dimension is unique, but all dimensions are correlated with all other dimensions, and the variance of each additional dimension increases. First, independent variables U_1, …, U_m are generated with U_i ~ Uniform(0, sqrt(i)); then X_1 = U_1 and X_i = U_i + X_{i-1}/2 for 2 ≤ i ≤ m. In this case queries are unstable.
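The construction translates directly into code (a sketch; the slide does not specify the query distribution, so here the query is generated by the same recipe):

    import numpy as np

    def example3(n, m, rng):
        """X_1 = U_1 and X_i = U_i + X_{i-1}/2, with U_i ~ Uniform(0, sqrt(i))."""
        X = np.empty((n, m))
        X[:, 0] = rng.uniform(0.0, 1.0, size=n)          # sqrt(1) = 1
        for i in range(1, m):
            U = rng.uniform(0.0, np.sqrt(i + 1), size=n)
            X[:, i] = U + X[:, i - 1] / 2
        return X

    rng = np.random.default_rng(0)
    for m in (2, 20, 200):
        data = example3(10_000, m, rng)
        q = example3(1, m, rng)[0]
        d = np.linalg.norm(data - q, axis=1)
        print(f"m={m:4d}  Dmax/Dmin = {d.max() / d.min():.2f}")

Despite the correlation between dimensions, the Dmax/Dmin ratio shrinks toward 1 as m grows, so queries become unstable.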

Meaningful applications of high-dimensional NN:
- The query point matches one of the data points exactly.
- The query point falls within some small distance of one of the data points (this becomes increasingly difficult as dimensionality rises).
- The data is clustered into several clusters with a fixed maximum distance ε, and the query point falls within one of these clusters (if the query point isn't required to fall within some cluster, then the query is unstable); a sketch of this case follows below.
- Implicit low dimensionality (the underlying dimensionality of the data is low regardless of the actual dimensionality).
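The clustered case is easy to illustrate (a minimal sketch with made-up cluster counts and spreads; the point is only that within-cluster distances stay far below between-cluster distances):

    import numpy as np

    rng = np.random.default_rng(0)
    m, k, per_cluster = 1000, 10, 100
    centers = rng.normal(scale=10.0, size=(k, m))        # well-separated cluster centers
    data = np.vstack([c + rng.normal(scale=0.1, size=(per_cluster, m))
                      for c in centers])
    query = centers[0] + rng.normal(scale=0.1, size=m)   # query falls inside a cluster
    d = np.linalg.norm(data - query, axis=1)
    print(f"Dmax/Dmin = {d.max() / d.min():.1f}")        # large contrast even at m = 1000

Even at m = 1000 the nearest neighbor (a point from the query's own cluster) is far closer than points from other clusters, so the query is stable.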

Experimental Studies. The condition in (1) describes what happens as dimensionality approaches infinity; experiments are needed to observe the rate of this convergence.

Experimental Studies (results figures).

Conclusions. Query instability is an indication of a meaningless query. While there are situations where high-dimensional NN queries are meaningful, they are very specific and differ from the “independent dimensions” setting. The contrast in distances decreases fastest in the first 20 dimensions.

Conclusions. Make sure that the distance distribution between query points and data points allows for enough contrast. When evaluating an NN processing technique, test it on meaningful workloads.
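That advice can be operationalized as a simple workload check (a sketch; the name relative_contrast and the choice of the median are illustrative, and what counts as "enough" contrast is application-specific):

    import numpy as np

    def relative_contrast(data, queries):
        """Median Dmax/Dmin over a query workload. Values near 1 mean the
        queries are unstable (NN is nearly meaningless); large values mean
        real contrast. Assumes no query coincides exactly with a data point."""
        ratios = []
        for q in queries:
            d = np.linalg.norm(data - q, axis=1)
            ratios.append(d.max() / d.min())
        return float(np.median(ratios))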

Thanks!

Questions?