Distance Metric
Measures the dissimilarity between two data points. A metric is a function d of two points X and Y such that:
d(X, Y) is positive definite: if X ≠ Y, d(X, Y) > 0; if X = Y, d(X, Y) = 0
d(X, Y) is symmetric: d(X, Y) = d(Y, X)
d(X, Y) satisfies the triangle inequality: d(X, Y) + d(Y, Z) ≥ d(X, Z)
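The three axioms can be spot-checked numerically. The following is a minimal sketch, not a proof: the helper name `is_metric_on` and the sample points are assumptions made for illustration, and passing on a finite sample only shows that no counterexample was found in that sample.

```python
from itertools import product

def is_metric_on(points, d, tol=1e-12):
    """Check positive definiteness, symmetry and the triangle inequality
    for d on every pair and triple drawn from points (hypothetical helper)."""
    for x, y in product(points, repeat=2):
        if x == y and abs(d(x, y)) > tol:
            return False                      # d(X, X) must be 0
        if x != y and d(x, y) <= 0:
            return False                      # d(X, Y) must be > 0 when X != Y
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            return False                      # triangle inequality
    return True

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Illustrative sample points (an assumption, not taken from the slides).
sample = [(0, 0), (1, 0), (2, 0), (2, 1), (6, 4)]
print(is_metric_on(sample, manhattan))          # True
print(is_metric_on(sample, squared_euclidean))  # False: 1 + 1 < 4 on (0,0), (1,0), (2,0)
```

Squared Euclidean distance is included only as a contrast: it is positive definite and symmetric but breaks the triangle inequality on three collinear points.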
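Standard Distance Metrics
Minkowski distance or $L_p$ distance: $d_p(X, Y) = \left( \sum_{i=1}^{n} w_i |x_i - y_i|^p \right)^{1/p}$ (weights $w_i$ assumed = 1)
Manhattan distance (p = 1): $d_1(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$
Euclidean distance (p = 2): $d_2(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2}$
Max distance (p = ∞): $d_\infty(X, Y) = \max_{i=1}^{n} |x_i - y_i|$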
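An Example
A two-dimensional space with X = (2, 1), Y = (6, 4), and Z = (6, 1) the corner point, so that XZ = 4 and ZY = 3:
Manhattan: $d_1(X, Y)$ = XZ + ZY = 4 + 3 = 7
Euclidean: $d_2(X, Y)$ = XY = 5
Max: $d_\infty(X, Y)$ = Max(XZ, ZY) = XZ = 4
Here $d_1 \ge d_2 \ge d_\infty$. For any positive integer p, $d_p(X, Y) \ge d_{p+1}(X, Y)$.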
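The three values in this example can be reproduced with one small function. This is a sketch assuming unit weights; the function name `minkowski` is chosen here for illustration.

```python
import math

def minkowski(x, y, p):
    """L_p distance with unit weights; p = math.inf gives the max distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

X, Y = (2, 1), (6, 4)
d1 = minkowski(X, Y, 1)           # Manhattan: 4 + 3
d2 = minkowski(X, Y, 2)           # Euclidean: sqrt(16 + 9)
dmax = minkowski(X, Y, math.inf)  # Max: max(4, 3)
print(d1, d2, dmax)               # 7.0 5.0 4
```

The ordering 7 ≥ 5 ≥ 4 matches the relation between the Manhattan, Euclidean and Max distances on the slide.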
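HOBbit Similarity
Higher Order Bit (HOBbit) similarity: HOBbitS(A, B) is the number of consecutive high-order bits, counted left to right, on which A and B agree:
$\mathrm{HOBbitS}(A, B) = \max \{ s : 0 \le s \le m,\ a_i = b_i \text{ for all } 1 \le i \le s \}$
A, B: two scalars (integers)
$a_i$, $b_i$: $i$-th bit of A and B (left to right)
m: number of bits
Example, with bit positions numbered left to right: HOBbitS($x_1$, $y_1$) = 3, HOBbitS($x_2$, $y_2$) = 4
These notes contain NDSU confidential & Proprietary material. Patents pending on bSQ, Ptree technology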
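One way to compute the similarity is with an XOR: the highest set bit of `a ^ b` marks the first position, from the left, where the two values differ. The sketch below assumes non-negative integers that fit in m bits; the function name and the two 8-bit values are illustrative choices, not the bit patterns from the slide.

```python
def hobbit_similarity(a: int, b: int, m: int = 8) -> int:
    """Number of leading (high-order) bits on which a and b agree."""
    if not (0 <= a < 2 ** m and 0 <= b < 2 ** m):
        raise ValueError("a and b must fit in m bits")
    # bit_length() of the XOR counts the bits from the first mismatch
    # down to the lowest bit; everything to the left of that matches.
    return m - (a ^ b).bit_length()

# Illustrative values (assumed): they share the prefix 101 and then differ.
x = 0b10110010
y = 0b10101100
print(hobbit_similarity(x, y))  # 3
print(hobbit_similarity(x, x))  # 8: identical values agree on all m bits
```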
HOBbit Distance (High Order Bifurcation bit)
HOBbit distance between two scalar values A and B: $d_v(A, B) = m - \mathrm{HOBbitS}(A, B)$
Example (m = 8): HOBbitS($x_1$, $y_1$) = 3 and HOBbitS($x_2$, $y_2$) = 4, so $d_v(x_1, y_1)$ = 8 - 3 = 5 and $d_v(x_2, y_2)$ = 8 - 4 = 4
HOBbit distance for X and Y: $d_h(X, Y) = \max_{i=1}^{n} d_v(x_i, y_i)$
In our example (considering 2-dim data): $d_h(X, Y)$ = max(5, 4) = 5
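The scalar and vector distances follow directly from the similarity. In this sketch the function names are assumptions, and the bit patterns are made up so that the similarities come out as 3 and 4, as in the example; they are not the patterns from the original slide.

```python
def hobbit_similarity(a, b, m=8):
    # Leading bits on which a and b agree (a, b assumed to fit in m bits).
    return m - (a ^ b).bit_length()

def hobbit_distance_scalar(a, b, m=8):
    # d_v(A, B) = m - HOBbitS(A, B)
    return m - hobbit_similarity(a, b, m)

def hobbit_distance(X, Y, m=8):
    # d_h(X, Y) = max over dimensions of d_v(x_i, y_i)
    return max(hobbit_distance_scalar(a, b, m) for a, b in zip(X, Y))

# Illustrative 2-dimensional points (assumed values).
X = (0b10110010, 0b01011101)
Y = (0b10101100, 0b01010000)
print(hobbit_distance_scalar(X[0], Y[0]))  # 5  (8 - 3)
print(hobbit_distance_scalar(X[1], Y[1]))  # 4  (8 - 4)
print(hobbit_distance(X, Y))               # 5  (max(5, 4))
```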
HOBbit Distance Is a Metric
HOBbit distance is positive definite: if X = Y, $d_h(X, Y) = 0$; if X ≠ Y, $d_h(X, Y) > 0$
HOBbit distance is symmetric: $d_h(X, Y) = d_h(Y, X)$
HOBbit distance satisfies the triangle inequality: $d_h(X, Y) + d_h(Y, Z) \ge d_h(X, Z)$
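For small bit widths the three properties can be confirmed by exhaustive enumeration. The sketch below is a brute-force check rather than a proof. It assumes the scalar distance is m minus the number of matching leading bits, which equals the bit length of `a ^ b`, and it uses 4-bit scalars and 2-bit coordinates only to keep the search small.

```python
from itertools import product

def d_v(a, b):
    # m - HOBbitS(a, b) reduces to the bit length of the XOR.
    return (a ^ b).bit_length()

def d_h(X, Y):
    # Vector HOBbit distance: max over dimensions.
    return max(d_v(a, b) for a, b in zip(X, Y))

def axioms_hold(items, d):
    return all(
        (d(a, b) == 0) == (a == b)           # positive definite
        and d(a, b) == d(b, a)               # symmetric
        and d(a, b) + d(b, c) >= d(a, c)     # triangle inequality
        for a, b, c in product(items, repeat=3)
    )

scalars = list(range(2 ** 4))                  # all 4-bit values
points = list(product(range(2 ** 2), repeat=2))  # all 2-D points with 2-bit coordinates
print(axioms_hold(scalars, d_v))  # True
print(axioms_hold(points, d_h))   # True
```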
Neighborhood of a Point
The neighborhood of a target point T is the set of points S such that X ∈ S if and only if d(T, X) ≤ r. If X is a point on the boundary, d(T, X) = r.
[Figure: neighborhoods of T, each of width 2r, under the Manhattan, Euclidean, Max, and HOBbit distances]
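The definition translates into a simple filter over candidate points. In the sketch below, the function names, the target, the radius and the candidate points are all illustrative assumptions; the example shows that the same radius admits different points under different distances.

```python
import math

def manhattan(t, x):
    return sum(abs(a - b) for a, b in zip(t, x))

def euclidean(t, x):
    return math.dist(t, x)

def max_dist(t, x):
    return max(abs(a - b) for a, b in zip(t, x))

def neighborhood(target, points, r, dist):
    """All X with d(T, X) <= r; boundary points (d == r) are included."""
    return [x for x in points if dist(target, x) <= r]

T, r = (0, 0), 2
candidates = [(1, 1), (2, 2), (2, 0), (3, 0)]
print(neighborhood(T, candidates, r, manhattan))  # [(1, 1), (2, 0)]
print(neighborhood(T, candidates, r, euclidean))  # [(1, 1), (2, 0)]
print(neighborhood(T, candidates, r, max_dist))   # [(1, 1), (2, 2), (2, 0)]
```

The point (2, 2) lies in the square Max neighborhood but outside the Manhattan diamond and the Euclidean disk of the same radius.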
Decision Boundary
The decision boundary between points A and B is the locus of points X satisfying d(A, X) = d(B, X).
[Figure: points A and B with a boundary point X, labels D, R1 and R2, and the distances d(A, X) and d(B, X); decision boundaries for Manhattan, Euclidean and Max distance, shown for the cases > 45° and < 45°]
The decision boundary for HOBbit distance is perpendicular to the axis that makes the max distance.
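To see that the boundary depends on the distance used, one can test on which side of it a given point falls. The sketch below is illustrative only: the helper `side`, the tolerance, and the coordinates of A, B and X are assumptions chosen so that X is equidistant under the Euclidean distance but not under the other two.

```python
import math

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    return math.dist(p, q)

def max_dist(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def side(a, b, x, dist, tol=1e-9):
    """Return 'A' or 'B' for the nearer point, or 'boundary' if d(A,X) = d(B,X)."""
    da, db = dist(a, x), dist(b, x)
    if math.isclose(da, db, abs_tol=tol):
        return "boundary"
    return "A" if da < db else "B"

A, B = (0.0, 0.0), (4.0, 2.0)
X = (2.25, 0.5)  # lies on the Euclidean perpendicular bisector of AB
print(side(A, B, X, euclidean))  # boundary
print(side(A, B, X, manhattan))  # A  (2.75 vs 3.25)
print(side(A, B, X, max_dist))   # B  (2.25 vs 1.75)
```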
Minkowski Metrics
$L_p$-metrics (aka Minkowski metrics): $d_p(X, Y) = \left( \sum_{i=1}^{n} w_i |x_i - y_i|^p \right)^{1/p}$ (weights $w_i$ assumed = 1)
p = 1 (Manhattan), p = 2 (Euclidean), ..., p = ∞ (chessboard)
$d_{\max} \equiv \max_i |x_i - y_i| \equiv d_\infty \equiv \lim_{p \to \infty} d_p(X, Y)$
Proof (sort of): $\lim_{p \to \infty} \left\{ \sum_{i=1}^{n} a_i^p \right\}^{1/p} = \max(a_i) \equiv b$. For p large enough, the other $a_i^p \ll b^p$, since $y = x^p$ is increasingly convex, so $\sum_{i=1}^{n} a_i^p \approx k \cdot b^p$ (k = multiplicity of b in the sum), so $\left\{ \sum_{i=1}^{n} a_i^p \right\}^{1/p} \approx k^{1/p} \cdot b$ and $k^{1/p} \to 1$.
[Figure: unit disk boundaries for p = 1, 2, ∞; p = 3, 4, … ?; p = ½, ⅓, ¼, … ?]
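The limit argument can be watched numerically. This sketch assumes unit weights; the points are illustrative. The first series has a unique largest coordinate difference (4), and the second has a tie (k = 2), where the factor k^(1/p) is still visible at p = 64.

```python
def lp(x, y, p):
    """L_p distance with unit weights."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

X, Y = (2, 1), (6, 4)          # coordinate differences 4 and 3, so b = 4
ps = (1, 2, 4, 16, 64)
values = [lp(X, Y, p) for p in ps]
for p, v in zip(ps, values):
    print(p, v)                # decreases from 7.0 toward 4

# Tie case: both differences equal b = 4, so k = 2 and d_p = 2**(1/p) * 4.
tie = lp((0, 0), (4, 4), 64)
print(tie, 4 * 2 ** (1 / 64))  # the two numbers agree up to rounding
```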
$L_p$ metrics, p > 1
[Figure: a series of tables listing q, $x_1$, $y_1$, $x_2$, $y_2$ and the $L_q$ distance from x to y, each compared with the MAX distance]
$L_p$ metrics, p < 1
$d_{1/p}(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{1/p} \right)^{p}$, that is, $L_q$ with $q = 1/p < 1$
For p = 0 (the limit as p → 0), $L_p$ doesn't exist (does not converge).
[Figure: a series of tables listing q, $x_1$, $y_1$, $x_2$, $y_2$ and the $L_q$ distance from x to y, each with a comparison column E]
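A short numeric experiment shows how these functions behave when the exponent is below 1. The sketch writes the exponent as q (so q = 1/p in the slide's notation); the three points are illustrative. Two things are visible: for q = 1/2 the triangle inequality fails on this triple, so the function is a dissimilarity rather than a metric in the strict sense, and the value grows without bound as q shrinks toward 0.

```python
def lq(x, y, q):
    """(sum |x_i - y_i|^q)^(1/q), here used with exponents q <= 1."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

A, B, C = (0, 0), (1, 0), (1, 1)
ab, bc, ac = lq(A, B, 0.5), lq(B, C, 0.5), lq(A, C, 0.5)
print(ab, bc, ac)      # 1.0 1.0 4.0
print(ab + bc >= ac)   # False: the triangle inequality is violated

# d(A, C) equals 2**(1/q) here, so it diverges as q approaches 0.
growth = [lq(A, C, q) for q in (1, 0.5, 0.25, 0.1)]
print(growth)          # roughly [2, 4, 16, 1024]
```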
Min dissimilarity function
The $d_{\min}$ function, $d_{\min}(X, Y) = \min_{i=1}^{n} |x_i - y_i|$, is strange. It is not even a pseudo-metric.
The unit disk is: [Figure: unit disk of $d_{\min}$]
The neighborhood of the blue point relative to the red point (the set of points closer to the blue point than to the red point) is strangely shaped!
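A concrete triple makes the failure visible. A pseudo-metric may give distance 0 to distinct points, but it must still satisfy the triangle inequality, and d_min does not. The points below are illustrative choices.

```python
def d_min(x, y):
    """Smallest coordinate-wise absolute difference."""
    return min(abs(a - b) for a, b in zip(x, y))

A, B, C = (0, 0), (0, 5), (5, 5)
print(d_min(A, B))  # 0: A and B share their first coordinate
print(d_min(B, C))  # 0: B and C share their second coordinate
print(d_min(A, C))  # 5
print(d_min(A, B) + d_min(B, C) >= d_min(A, C))  # False: triangle inequality fails
```

Any two points that agree in at least one coordinate are at distance 0, which is why the unit disk and the relative neighborhoods take unusual shapes.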
Other Interesting Metrics
Canberra metric: $d_c(X, Y) = \sum_{i=1}^{n} |x_i - y_i| / (x_i + y_i)$ (normalized Manhattan distance)
Squared Chord metric: $d_{sc}(X, Y) = \sum_{i=1}^{n} (\sqrt{x_i} - \sqrt{y_i})^2$ (closely related to the $L_p$ case with p = 1/2 already discussed)
Squared Chi-squared metric: $d_{chi}(X, Y) = \sum_{i=1}^{n} (x_i - y_i)^2 / (x_i + y_i)$
Scalar Product metric: $d_{sp}(X, Y) = X \cdot Y = \sum_{i=1}^{n} x_i y_i$
Hyperbolic metrics: (which map infinite space 1-1 onto a sphere)
Which are rotationally invariant? Translation invariant? Other?
Some notes on distance functions can be found at
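The formulas on this slide can be sketched as small functions. Several choices here are assumptions rather than something the slide states: coordinates are taken to be non-negative, terms with x_i + y_i = 0 are skipped to avoid division by zero, and the squared chord distance uses the square roots of the usual definition. The function names and sample points are illustrative.

```python
import math

def canberra(x, y):
    # Skips terms where x_i + y_i == 0 (assumed convention).
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b != 0)

def squared_chord(x, y):
    # Assumes non-negative coordinates so the square roots are real.
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))

def squared_chi_squared(x, y):
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b != 0)

def scalar_product(x, y):
    return sum(a * b for a, b in zip(x, y))

X, Y = (1, 4), (4, 1)
print(canberra(X, Y))             # 3/5 + 3/5
print(squared_chord(X, Y))        # (1 - 2)^2 + (2 - 1)^2 = 2.0
print(squared_chi_squared(X, Y))  # 9/5 + 9/5
print(scalar_product(X, Y))       # 8

# Translation check: shifting both points by (1, 1) changes the Canberra value.
shifted = canberra((2, 5), (5, 2))  # 3/7 + 3/7
print(math.isclose(canberra(X, Y), shifted))  # False
```

The last check bears on the slide's question about invariance: on this pair the Canberra value is not preserved under translation. The scalar product of a point with itself is generally nonzero, so it behaves as a similarity rather than a distance.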