Distance Metric
A distance metric measures the dissimilarity between two data points. A metric is a function, d, of two points X and Y, such that:
d(X, Y) is positive definite: if X ≠ Y, then d(X, Y) > 0; if X = Y, then d(X, Y) = 0.
d(X, Y) is symmetric: d(X, Y) = d(Y, X).
d(X, Y) satisfies the triangle inequality: d(X, Y) + d(Y, Z) ≥ d(X, Z).
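A minimal sketch of these axioms as an empirical spot-check on a finite sample (the function names and the random test points are illustrative, not from the slides):

```python
import random

def is_metric_on_sample(d, points, tol=1e-9):
    """Spot-check the metric axioms for d on a finite sample of points."""
    for X in points:
        if abs(d(X, X)) > tol:                         # d(X, X) = 0
            return False
        for Y in points:
            if X != Y and d(X, Y) <= 0:                # positive definiteness
                return False
            if abs(d(X, Y) - d(Y, X)) > tol:           # symmetry
                return False
            for Z in points:
                if d(X, Z) > d(X, Y) + d(Y, Z) + tol:  # triangle inequality
                    return False
    return True

euclidean = lambda X, Y: sum((x - y) ** 2 for x, y in zip(X, Y)) ** 0.5
sample = [tuple(random.uniform(-10, 10) for _ in range(2)) for _ in range(20)]
print(is_metric_on_sample(euclidean, sample))  # True
```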

Standard Distance Metrics
Minkowski distance or L_p distance, for any positive integer p: d_p(X, Y) = (Σ_{i=1..n} |x_i − y_i|^p)^{1/p}
Manhattan distance (p = 1): d_1(X, Y) = Σ_{i=1..n} |x_i − y_i|
Euclidean distance (p = 2): d_2(X, Y) = (Σ_{i=1..n} (x_i − y_i)^2)^{1/2}
Max distance (p = ∞): d_∞(X, Y) = max_{i=1..n} |x_i − y_i|
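A small sketch of the general L_p distance in plain Python (the function name and the float("inf") convention for the max distance are illustrative):

```python
def minkowski(X, Y, p):
    """L_p distance between two equal-length numeric sequences X and Y."""
    diffs = [abs(x - y) for x, y in zip(X, Y)]
    if p == float("inf"):             # max (chessboard) distance
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)
```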

An Example
A two-dimensional space with X = (2, 1), Y = (6, 4), and Z = (6, 1) the corner of the right triangle XZY:
Manhattan: d_1(X, Y) = |XZ| + |ZY| = 4 + 3 = 7
Euclidean: d_2(X, Y) = |XY| = 5
Max: d_∞(X, Y) = max(|XZ|, |ZY|) = |XZ| = 4
Note that d_1 ≥ d_2 ≥ d_∞.
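These numbers can be checked with SciPy's built-in distance functions (assuming SciPy is available):

```python
from scipy.spatial.distance import cityblock, euclidean, chebyshev

X, Y = (2, 1), (6, 4)
print(cityblock(X, Y))   # 7    (Manhattan, p = 1)
print(euclidean(X, Y))   # 5.0  (Euclidean, p = 2)
print(chebyshev(X, Y))   # 4    (Max, p = ∞)
```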

HOBbit Similarity
Higher Order Bit (HOBbit) similarity: HOBbitS(A, B) = max{ s : 0 ≤ s ≤ m and a_i = b_i for all i ≤ s }, i.e., the number of consecutive most-significant bits on which A and B agree.
A, B: two scalars (integers); a_i, b_i: i-th bit of A and B (left to right); m: number of bits.
Example (8-bit values; the slide's bit-pattern table is not reproduced here): HOBbitS(x_1, y_1) = 3, HOBbitS(x_2, y_2) = 4.
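A minimal sketch of this prefix-match count (the function name, the 8-bit width, and the example bit patterns are made up for illustration; the patterns are chosen to reproduce the slide's score of 3):

```python
def hobbit_sim(a, b, m=8):
    """Number of consecutive most-significant bits on which a and b agree."""
    for s in range(m):
        bit = m - 1 - s                      # bit position, counting from the MSB
        if (a >> bit) & 1 != (b >> bit) & 1:
            return s
    return m

print(hobbit_sim(0b01101001, 0b01111101))    # 3: the first three MSBs match
```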

HOBbit Distance (High Order Bifurcation bit)
HOBbit distance between two scalar values A and B: d_v(A, B) = m − HOBbitS(A, B).
HOBbit distance between two points X and Y: d_h(X, Y) = max_{i=1..n} d_v(x_i, y_i).
Example (continuing the 8-bit example above): d_v(x_1, y_1) = 8 − 3 = 5 and d_v(x_2, y_2) = 8 − 4 = 4, so for our 2-dimensional data, d_h(X, Y) = max(5, 4) = 5.
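A sketch of the scalar and vector HOBbit distances; hobbit_sim is repeated here in a more compact, equivalent XOR form so the snippet is self-contained, and the bit patterns are again made-up values chosen to reproduce the slide's similarity scores of 3 and 4:

```python
def hobbit_sim(a, b, m=8):
    """Length of the common most-significant-bit prefix of a and b."""
    xor = (a ^ b) & ((1 << m) - 1)
    return m - xor.bit_length()    # high bits agree down to the first 1 in a^b

def d_v(a, b, m=8):
    """HOBbit distance between two scalars."""
    return m - hobbit_sim(a, b, m)

def d_h(X, Y, m=8):
    """HOBbit distance between two points: the max over dimensions."""
    return max(d_v(x, y, m) for x, y in zip(X, Y))

X = (0b01101001, 0b01011101)
Y = (0b01111101, 0b01010010)
print(d_h(X, Y))   # max(8 - 3, 8 - 4) = 5
```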

HOBbit Distance Is a Metric
HOBbit distance is positive definite: if X = Y, d_h(X, Y) = 0; if X ≠ Y, d_h(X, Y) > 0.
HOBbit distance is symmetric: d_h(X, Y) = d_h(Y, X).
HOBbit distance satisfies the triangle inequality. (In fact, because it is built from shared-bit-prefix lengths, each per-dimension distance satisfies the stronger ultrametric inequality d_v(A, C) ≤ max(d_v(A, B), d_v(B, C)), and taking the max over dimensions preserves this.)
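A quick empirical spot-check of these properties on random 8-bit points (the test harness is not from the slides; d_v and d_h are repeated compactly for self-containment):

```python
import random

def d_v(a, b, m=8):
    return ((a ^ b) & ((1 << m) - 1)).bit_length()   # equals m - HOBbitS(a, b)

def d_h(X, Y):
    return max(d_v(x, y) for x, y in zip(X, Y))

rnd = lambda: tuple(random.randrange(256) for _ in range(2))
for _ in range(10_000):
    X, Y, Z = rnd(), rnd(), rnd()
    assert d_h(X, Y) == d_h(Y, X)                    # symmetry
    assert (d_h(X, Y) == 0) == (X == Y)              # positive definiteness
    assert d_h(X, Z) <= max(d_h(X, Y), d_h(Y, Z))    # ultrametric, implies triangle
print("all checks passed")
```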

Neighborhood of a Point
The neighborhood of a target point, T, is a set of points, S, such that X ∈ S if and only if d(T, X) ≤ r. If X is a point on the boundary, d(T, X) = r.
[Figures: the radius-r neighborhoods of T under the Manhattan (a diamond), Euclidean (a disk), Max (a square), and HOBbit (a square) distances.]
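A one-function sketch of neighborhood membership, with the metric passed in as a parameter (names and sample points are illustrative):

```python
def neighborhood(T, points, r, d):
    """Return the points X with d(T, X) <= r: the radius-r neighborhood of T."""
    return [X for X in points if d(T, X) <= r]

manhattan = lambda X, Y: sum(abs(x - y) for x, y in zip(X, Y))
print(neighborhood((0, 0), [(1, 1), (2, 2), (3, 0)], r=3, d=manhattan))
# [(1, 1), (3, 0)] -- (2, 2) has Manhattan distance 4 > 3
```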

Decision Boundary
The decision boundary between points A and B is the locus of points X satisfying d(A, X) = d(B, X).
[Figures: decision boundaries between A and B for the Manhattan, Euclidean, and Max distances, drawn for the segment AB at angles greater than and less than 45° to the axis.]
The decision boundary for HOBbit distance is perpendicular to the axis along which the distance is maximal.
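Because the boundaries differ, the metric choice can put the same query point on opposite sides; a small sketch (the points are chosen for illustration):

```python
import math

manhattan = lambda X, Y: sum(abs(x - y) for x, y in zip(X, Y))
euclidean = lambda X, Y: math.dist(X, Y)

A, B, X = (0, 0), (4, 2), (0.8, 4)
for name, d in [("Manhattan", manhattan), ("Euclidean", euclidean)]:
    side = "A" if d(A, X) < d(B, X) else "B"
    print(f"{name}: X is closer to {side}")
# Manhattan: X is closer to A
# Euclidean: X is closer to B
```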

Minkowski Metrics
L_p-metrics (aka Minkowski metrics): d_p(X, Y) = (Σ_{i=1..n} w_i |x_i − y_i|^p)^{1/p} (weights w_i assumed = 1).
[Figure: unit disk boundaries for p = 1 (Manhattan), p = 2 (Euclidean), p = ∞ (chessboard), p = ½, ⅓, ¼, …, and p = 3, 4, …]
d_max ≡ max_i |x_i − y_i| = d_∞ ≡ lim_{p→∞} d_p(X, Y).
Proof (sort of): lim_{p→∞} (Σ_{i=1..n} a_i^p)^{1/p} = max_i(a_i) ≡ b. For p large enough, the other a_i^p are << b^p, since (a_i/b)^p → 0 whenever a_i < b; so Σ_{i=1..n} a_i^p ≈ k·b^p, where k is the multiplicity of b in the sum. Hence (Σ_{i=1..n} a_i^p)^{1/p} ≈ k^{1/p}·b, and k^{1/p} → 1.
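A quick numeric sketch of this limit (the coordinate differences are illustrative values):

```python
diffs = [3.0, 1.0, 2.0]           # the a_i = |x_i - y_i|
for p in (1, 2, 4, 8, 16, 64):
    d_p = sum(a ** p for a in diffs) ** (1.0 / p)
    print(p, round(d_p, 4))       # approaches max(diffs) = 3.0 as p grows
```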

P>1 L_p metrics
[Figures: plots of the L_q distance from x to y for several q > 1, approaching the MAX distance as q grows.]

P<1 L_p metrics
For P < 1: d_{1/p}(X, Y) = (Σ_{i=1..n} |x_i − y_i|^{1/p})^p.
For p = 0 (the limit as p → 0), L_p doesn't exist (it does not converge).
[Figures: plots of the L_q distance from x to y for q < 1, compared with the Euclidean distance.]
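For p < 1 the L_p "distance" fails the triangle inequality, so it is not a metric; a quick numeric counterexample (the points are chosen for illustration):

```python
def l_p(X, Y, p):
    return sum(abs(x - y) ** p for x, y in zip(X, Y)) ** (1.0 / p)

X, Y, Z = (0, 0), (1, 1), (1, 0)
p = 0.5
print(l_p(X, Y, p))                  # 4.0
print(l_p(X, Z, p) + l_p(Z, Y, p))   # 2.0 < 4.0: triangle inequality fails
```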

Min dissimilarity function
The d_min function, d_min(X, Y) = min_{i=1..n} |x_i − y_i|, is strange: it is not even a pseudo-metric. Its unit disk is an unbounded cross, the union of the strips |x_1| ≤ 1 and |x_2| ≤ 1 in two dimensions. And the neighborhood of the blue point relative to the red point (the set of points closer to the blue than to the red) is strangely shaped! [Figures: the d_min unit disk and the blue-vs-red neighborhood.]
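A tiny sketch showing why d_min fails even the pseudo-metric axioms (the example points are illustrative): it can be 0 for distinct points, and the triangle inequality fails.

```python
d_min = lambda X, Y: min(abs(x - y) for x, y in zip(X, Y))

X, Y, Z = (0, 0), (0, 2), (2, 2)
print(d_min(X, Y))                 # 0, although X != Y
print(d_min(X, Z))                 # 2
print(d_min(X, Y) + d_min(Y, Z))   # 0 < 2: triangle inequality fails too
```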

Other Interesting Metrics
Canberra metric: d_c(X, Y) = Σ_{i=1..n} |x_i − y_i| / (x_i + y_i) (a normalized Manhattan distance).
Squared Chord metric: d_sc(X, Y) = Σ_{i=1..n} (√x_i − √y_i)^2 (already discussed as L_p with p = 1/2).
Squared Chi-squared metric: d_chi(X, Y) = Σ_{i=1..n} (x_i − y_i)^2 / (x_i + y_i).
Scalar Product metric: d_sp(X, Y) = X • Y = Σ_{i=1..n} x_i · y_i.
Hyperbolic metrics: these map infinite space one-to-one onto a sphere.
Which are rotationally invariant? Translation invariant? Other?
Some notes on distance functions can be found at
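Sketches of the first four, assuming strictly positive coordinates so the denominators and square roots are well defined (the function names are illustrative; SciPy also ships a canberra implementation that uses |x_i| + |y_i| in the denominator):

```python
import math

def canberra(X, Y):
    """Normalized Manhattan distance."""
    return sum(abs(x - y) / (x + y) for x, y in zip(X, Y))

def squared_chord(X, Y):
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(X, Y))

def squared_chi2(X, Y):
    return sum((x - y) ** 2 / (x + y) for x, y in zip(X, Y))

def scalar_product(X, Y):
    return sum(x * y for x, y in zip(X, Y))

X, Y = (1.0, 4.0, 9.0), (4.0, 1.0, 16.0)
print(canberra(X, Y), squared_chord(X, Y), squared_chi2(X, Y), scalar_product(X, Y))
```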