Distance Metric Measures the dissimilarity between two data points. A metric is a fctn, d, of 2 points X and Y, such that d(X, Y) is positive definite:


1 Distance Metric
Measures the dissimilarity between two data points. A metric is a function, d, of two points X and Y, such that:
d(X, Y) is positive definite: if $X \ne Y$, $d(X, Y) > 0$; if $X = Y$, $d(X, Y) = 0$
d(X, Y) is symmetric: $d(X, Y) = d(Y, X)$
d(X, Y) satisfies the triangle inequality: $d(X, Y) + d(Y, Z) \ge d(X, Z)$

2 Standard Distance Metrics
Minkowski distance or $L_p$ distance
Manhattan distance ($p = 1$)
Euclidean distance ($p = 2$)
Max distance ($p = \infty$)
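The formulas for these four distances did not survive in the transcript. The block below writes out their standard forms, taken to agree with the $L_p$ definition on slide 9 with all weights set to 1. Here $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$, and the notation is an assumption chosen to match the rest of the slides.

```latex
\begin{align*}
d_p(X,Y)      &= \Bigl( \sum_{i=1}^{n} |x_i - y_i|^{p} \Bigr)^{1/p} && \text{(Minkowski, } L_p\text{)} \\
d_1(X,Y)      &= \sum_{i=1}^{n} |x_i - y_i|                         && \text{(Manhattan, } p = 1\text{)} \\
d_2(X,Y)      &= \Bigl( \sum_{i=1}^{n} (x_i - y_i)^{2} \Bigr)^{1/2} && \text{(Euclidean, } p = 2\text{)} \\
d_\infty(X,Y) &= \max_{1 \le i \le n} |x_i - y_i|                   && \text{(Max, } p = \infty\text{)}
\end{align*}
```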

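3 An Example
A two-dimensional space with points X (2,1) and Y (6,4); Z is the right-angle corner of the triangle XZY, with legs XZ = 4 and ZY = 3.
Manhattan: $d_1(X,Y) = XZ + ZY = 4 + 3 = 7$
Euclidean: $d_2(X,Y) = XY = 5$
Max: $d_\infty(X,Y) = \max(XZ, ZY) = XZ = 4$
So $d_1 \ge d_2 \ge d_\infty$.
For any positive integer p, [formula not preserved in the transcript]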
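The worked example can be reproduced with a small sketch. The function name `minkowski` and the use of `math.inf` to select the max distance are illustrative choices, not something taken from the slides.

```python
import math

def minkowski(x, y, p):
    """L_p distance between two equal-length points (unit weights assumed).

    p = 1 gives Manhattan, p = 2 gives Euclidean, p = math.inf gives Max.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Points from the slide's example.
X, Y = (2, 1), (6, 4)
d1 = minkowski(X, Y, 1)           # Manhattan
d2 = minkowski(X, Y, 2)           # Euclidean
dmax = minkowski(X, Y, math.inf)  # Max
print(d1, d2, dmax)
```

For these two points the three values come out in the order the slide states, Manhattan largest and Max smallest.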

4 HOBbit Similarity
Higher Order Bit (HOBbit) similarity: $\mathrm{HOBbitS}(A, B) = \max\{\, s : 0 \le s \le m,\ a_i = b_i \text{ for all } 1 \le i \le s \,\}$, that is, the number of leading bits on which A and B agree.
A, B: two scalars (integers)
$a_i$, $b_i$: $i$th bit of A and B (left to right)
m: number of bits
Example:
Bit position:  1 2 3 4 5 6 7 8
x1:            0 1 1 0 1 0 0 1
y1:            0 1 1 1 1 1 0 1
x2:            0 1 0 1 1 1 0 1
y2:            0 1 0 1 0 0 0 0
HOBbitS(x1, y1) = 3
HOBbitS(x2, y2) = 4
These notes contain NDSU confidential & proprietary material. Patents pending on bSQ, Ptree technology.

5 HOBbit Distance (High Order Bifurcation bit)
HOBbit distance between two scalar values A and B: $d_v(A, B) = m - \mathrm{HOBbitS}(A, B)$
Example:
Bit position:  1 2 3 4 5 6 7 8
x1:            0 1 1 0 1 0 0 1
y1:            0 1 1 1 1 1 0 1
x2:            0 1 0 1 1 1 0 1
y2:            0 1 0 1 0 0 0 0
HOBbitS(x1, y1) = 3, so $d_v(x_1, y_1) = 8 - 3 = 5$
HOBbitS(x2, y2) = 4, so $d_v(x_2, y_2) = 8 - 4 = 4$
HOBbit distance for points X and Y: $d_h(X, Y) = \max_{1 \le i \le n} d_v(x_i, y_i)$
In our example (considering 2-dimensional data): $d_h(X, Y) = \max(5, 4) = 5$
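A minimal sketch of the HOBbit similarity and distance, assuming non-negative integers that fit in m bits. The function names are hypothetical, and the XOR shortcut is one possible way to count matching leading bits, not necessarily how the authors compute it.

```python
def hobbit_similarity(a, b, m):
    """Number of leading bits (out of m) on which a and b agree.

    Assumes 0 <= a, b < 2**m. a ^ b has a 1 exactly where the bits differ,
    so its bit length marks the first mismatch counted from the left.
    """
    return m - (a ^ b).bit_length()

def hobbit_distance_scalar(a, b, m):
    """d_v(A, B) = m - HOBbitS(A, B)."""
    return m - hobbit_similarity(a, b, m)

def hobbit_distance(X, Y, m):
    """d_h(X, Y): the largest per-dimension scalar HOBbit distance."""
    return max(hobbit_distance_scalar(x, y, m) for x, y in zip(X, Y))

# Bit patterns from the slide's example (m = 8).
x1, y1 = 0b01101001, 0b01111101
x2, y2 = 0b01011101, 0b01010000

print(hobbit_similarity(x1, y1, 8))             # 3
print(hobbit_similarity(x2, y2, 8))             # 4
print(hobbit_distance((x1, x2), (y1, y2), 8))   # 5
```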

6 HOBbit Distance Is a Metric
HOBbit distance is positive definite: if $X = Y$, $d_h(X, Y) = 0$; if $X \ne Y$, $d_h(X, Y) > 0$
HOBbit distance is symmetric
HOBbit distance satisfies the triangle inequality

7 Neighborhood of a Point
The neighborhood of a target point, T, is a set of points, S, such that $X \in S$ if and only if $d(T, X) \le r$.
If X is a point on the boundary, $d(T, X) = r$.
[Figure: neighborhoods of T, each of width 2r, under the Manhattan, Euclidean, Max and HOBbit distances]

8 Decision Boundary
The decision boundary between points A and B is the locus of points X satisfying $d(A, X) = d(B, X)$.
[Figure: boundary D through X separating regions R1 and R2, with d(A,X) and d(B,X) marked]
[Figure: decision boundaries for the Manhattan, Euclidean and max distances, in two panels: angle greater than 45° and angle less than 45°]
[Figure: the decision boundary for the HOBbit distance is perpendicular to the axis that makes the max distance]

9 Minkowski Metrics
$L_p$-metrics (aka Minkowski metrics): $d_p(X,Y) = \left( \sum_{i=1}^{n} w_i |x_i - y_i|^p \right)^{1/p}$ (weights $w_i$ assumed $= 1$)
[Figure: unit disk boundaries for p = 1 (Manhattan), p = 2 (Euclidean), p = 3, 4, ... ?, p = ∞ (chessboard), and p = 1/2, 1/3, 1/4, ...]
$d_{\max} \equiv \max_i |x_i - y_i| = d_\infty \equiv \lim_{p \to \infty} d_p(X,Y)$.
Proof (sort of): $\lim_{p \to \infty} \left\{ \sum_{i=1}^{n} a_i^p \right\}^{1/p} = \max(a_i) \equiv b$. For p large enough, the other $a_i^p \ll b^p$, since $y = x^p$ is increasingly convex, so $\sum_{i=1}^{n} a_i^p \approx k \cdot b^p$ (k = multiplicity of b in the sum), so $\left\{ \sum_{i=1}^{n} a_i^p \right\}^{1/p} \approx k^{1/p} \cdot b$ and $k^{1/p} \to 1$.
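The limit argument can be checked numerically. This is a sketch under the slide's unit-weight assumption; `lp_norm` is a hypothetical helper name, and the sample values (90, 45) and (1, 1) are borrowed from the tables on the following slide.

```python
def lp_norm(a, p):
    """(sum_i |a_i|^p)^(1/p) for a sequence of coordinate differences a."""
    return sum(abs(v) ** p for v in a) ** (1.0 / p)

# One dominant term: the value approaches max(a) = 90 as p grows.
a = (90.0, 45.0)
for p in (2, 6, 9, 100):
    print(p, lp_norm(a, p))
print("max", max(a))

# Two equal largest terms (k = 2): the value is 2**(1/p) * b, and 2**(1/p) -> 1.
tie = (1.0, 1.0)
for p in (2, 9, 100, 1000):
    print(p, lp_norm(tie, p))
```

The second loop shows the role of the factor $k^{1/p}$: with a tie the convergence is slower, but the excess over the maximum still shrinks toward zero.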

10 P>1 L_p metrics
Each table gives the L_q distance between x = (x1, x2) and y = (y1, y2) for increasing q; MAX is the max distance.

q      x1    y1   x2    y2   Lq distance x to y
6      90    0    45    0    90.232863532
9      90    0    45    0    90.019514317
100    90    0    45    0    90
MAX    90    0    45    0    90

q      x1    y1   x2    y2   Lq distance x to y
2      .5    0    .5    0    .7071067812
4      .5    0    .5    0    .5946035575
9      .5    0    .5    0    .5400298694
100    .5    0    .5    0    .503477775
MAX    .5    0    .5    0    .5

q      x1    y1   x2    y2   Lq distance x to y
2      .71   0    .71   0    1.0
3      .71   0    .71   0    .8908987181
7      .71   0    .71   0    .7807091822
100    .71   0    .71   0    .7120250978
MAX    .71   0    .71   0    .7071067812

q      x1    y1   x2    y2   Lq distance x to y
2      .99   0    .99   0    1.4000714267
8      .99   0    .99   0    1.0796026553
100    .99   0    .99   0    .9968859946
1000   .99   0    .99   0    .9906864536
MAX    .99   0    .99   0    .99

q      x1    y1   x2    y2   Lq distance x to y
2      1     0    1     0    1.4142135624
9      1     0    1     0    1.0800597389
100    1     0    1     0    1.0069555501
1000   1     0    1     0    1.0006933875
MAX    1     0    1     0    1

q      x1    y1   x2    y2   Lq distance x to y
2      .9    0    .1    0    .9055385138
9      .9    0    .1    0    .9000000003
100    .9    0    .1    0    .9
1000   .9    0    .1    0    .9
MAX    .9    0    .1    0    .9

q      x1    y1   x2    y2   Lq distance x to y
2      3     0    3     0    4.2426406871
3      3     0    3     0    3.7797631497
8      3     0    3     0    3.271523198
100    3     0    3     0    3.0208666502
MAX    3     0    3     0    3

11 P<1 L_p metrics
$d_{1/p}(X,Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{1/p} \right)^{p}$
For p = 0 (limit as p → 0), L_p doesn't exist (does not converge).

q      x1    y1   x2    y2   Lq distance x to y
1      .1    0    .1    0    .2
.8     .1    0    .1    0    .238
.4     .1    0    .1    0    .566
.2     .1    0    .1    0    3.2
.1     .1    0    .1    0    102
.04    .1    0    .1    0    3355443
.02    .1    0    .1    0    112589990684263
.01    .1    0    .1    0    1.2676 E+29
2      .1    0    .1    0    .141421356

q      x1    y1   x2    y2   Lq distance x to y
1      .5    0    .5    0    1
.8     .5    0    .5    0    1.19
.4     .5    0    .5    0    2.83
.2     .5    0    .5    0    16
.1     .5    0    .5    0    512
.04    .5    0    .5    0    16777216
.02    .5    0    .5    0    5.63 E+14
.01    .5    0    .5    0    6.34 E+29
2      .5    0    .5    0    .7071

q      x1    y1   x2    y2   Lq distance x to y
1      .9    0    0.1   0    1
.8     .9    0    0.1   0    1.098
.4     .9    0    0.1   0    2.1445
.2     .9    0    0.1   0    10.82
.1     .9    0    0.1   0    326.27
.04    .9    0    0.1   0    10312196.962
.02    .9    0    0.1   0    341871052443154
.01    .9    0    0.1   0    3.8 E+29
2      .9    0    0.1   0    .906
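The blow-up in these tables as q shrinks toward 0 can be reproduced with a short sketch. The helper name `lq` is hypothetical; the point (.1, .1) and the origin are taken from the first table.

```python
def lq(x, y, q):
    """(sum_i |x_i - y_i|^q)^(1/q), evaluated for any q > 0."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

x, origin = (0.1, 0.1), (0.0, 0.0)
# With two equal coordinates the value is 0.1 * 2**(1/q), which grows
# without bound as q -> 0.
for q in (1, 0.8, 0.4, 0.2, 0.1, 0.04, 0.02, 0.01):
    print(q, lq(x, origin, q))
```

For two equal coordinate differences c the expression reduces to $c \cdot 2^{1/q}$, which makes the divergence as q → 0 explicit.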

12 Min dissimilarity function
The $d_{\min}$ function, $d_{\min}(X,Y) = \min_{1 \le i \le n} |x_i - y_i|$, is strange. It is not even a pseudo-metric.
The unit disk is: [figure not preserved in the transcript]
And the neighborhood of the blue point relative to the red point (the set of points closer to the blue than to the red) is strangely shaped!
http://www.cs.ndsu.nodak.edu/~serazi/research/Distance.html
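One way to see that the min function is not even a pseudo-metric is a concrete triangle-inequality violation. The three points below are an illustrative choice, not taken from the slides.

```python
def d_min(x, y):
    """Smallest coordinate-wise absolute difference between x and y."""
    return min(abs(a - b) for a, b in zip(x, y))

X, Y, Z = (0, 0), (0, 5), (5, 5)

# X and Y share a coordinate, and so do Y and Z, so both distances are 0.
# X and Z differ by 5 in every coordinate.
print(d_min(X, Y), d_min(Y, Z), d_min(X, Z))

# A pseudo-metric may give distance 0 to distinct points, but it must still
# satisfy d(X, Y) + d(Y, Z) >= d(X, Z). Here that check fails.
print(d_min(X, Y) + d_min(Y, Z) >= d_min(X, Z))
```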

13 Other Interesting Metrics
Canberra metric: $d_c(X,Y) = \sum_{i=1}^{n} |x_i - y_i| / (x_i + y_i)$ (normalized Manhattan distance)
Square Chord metric: $d_{sc}(X,Y) = \sum_{i=1}^{n} (\sqrt{x_i} - \sqrt{y_i})^2$ (already discussed as $L_p$ with p = 1/2)
Squared Chi-squared metric: $d_{chi}(X,Y) = \sum_{i=1}^{n} (x_i - y_i)^2 / (x_i + y_i)$
Scalar Product metric: $d_{sp}(X,Y) = X \cdot Y = \sum_{i=1}^{n} x_i y_i$
Hyperbolic metrics (which map infinite space 1-1 onto a sphere)
Which are rotationally invariant? Translation invariant? Other?
Some notes on distance functions can be found at http://www.cs.ndsu.NoDak.edu/~datasurg/distance_similarity.pdf
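The formulas on this slide translate directly into code. The sketch below assumes strictly positive coordinates, so that the denominators and square roots are well defined; the function names and the sample points are hypothetical.

```python
import math

def canberra(x, y):
    """Sum of |x_i - y_i| / (x_i + y_i); assumes x_i + y_i > 0."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))

def squared_chord(x, y):
    """Sum of (sqrt(x_i) - sqrt(y_i))^2; assumes non-negative coordinates."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))

def squared_chi_squared(x, y):
    """Sum of (x_i - y_i)^2 / (x_i + y_i); assumes x_i + y_i > 0."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y))

def scalar_product(x, y):
    """Sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

# Illustrative points with positive coordinates.
x, y = (1.0, 4.0), (4.0, 9.0)
print(canberra(x, y))
print(squared_chord(x, y))
print(squared_chi_squared(x, y))
print(scalar_product(x, y))
```

Comparing the values before and after shifting both points by the same vector is one quick way to probe the slide's question about translation invariance.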

