Contours. Given $f: R(A_1..A_n) \to Y$ and $S \subseteq Y$, the f-contour(S) $= f^{-1}(S)$.

Equivalently, introduce a derived attribute $A_f$ with domain $Y$ (the equivalence being $x.A_f = f(x)$ for all $x \in R$); then $A_f$-contour(S) = SELECT $x_1,\dots,x_n$ FROM $R^*$ WHERE $x.A_f = f(x_1,\dots,x_n)$, where $R^*$ is $R$ extended by the column $A_f$. Also, graph(f) $= \{\, (x, f(x)) \mid x \in R \,\}$.

If $S = \{a\}$, we use f-Isobar(a), equivalently $A_f$-Isobar(a).

If $f$ is a local density and $\{S_k\}$ is a partition of $Y$, then $\{f^{-1}(S_k)\}$ partitions $R$. (E.g., in OPTICS, $f$ = reachability distance and $\{S_k\}$ is the partition produced by the intersections of graph(f), taken with respect to a walk of $R$, with a horizontal line. A weather map uses an equi-width interval partition of $S = \mathbb{R}$ (barometric-pressure or temperature contours). A grid is the intersection partition with respect to the dimension-projection functions (next slide). A class is a contour under $f: R \to C$, the class map. An $L_\infty$ $\varepsilon$-disk about $a$ is the intersection of the $\varepsilon$-dimension-projection contours containing $a$.)
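As a concrete reading of the contour and isobar definitions above, here is a minimal Python sketch over a relation stored as a list of tuples; the names f_contour and f_isobar are illustrative, not from the original material.

```python
# Minimal sketch of f-contours over a relation R(A1..An), assuming R is a
# list of n-tuples and f maps tuples into Y.  Function names are hypothetical.

def f_contour(R, f, S):
    """f-contour(S) = f^{-1}(S): all tuples of R whose f-value lies in S."""
    S = set(S)
    return [x for x in R if f(x) in S]

def f_isobar(R, f, a):
    """f-Isobar(a) = f-contour({a})."""
    return f_contour(R, f, {a})

if __name__ == "__main__":
    R = [(1, 2), (3, 4), (5, 6), (7, 8)]
    f = lambda x: x[0] + x[1]        # a derived attribute A_f = A1 + A2
    print(f_contour(R, f, {3, 7}))   # -> [(1, 2), (3, 4)]
    print(f_isobar(R, f, 11))        # -> [(5, 6)]
```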
Predicate-trees. Given $f: R(A_1..A_n) \to Y$ and $S \subseteq Y$, the (uncompressed) predicate-tree ${}_0P_{f,S}$ is defined by ${}_0P_{f,S}(x) = 1$ (true) iff $f(x) \in S$.

The compressed P-tree ${}_sP_{f,S}$ is the compression of ${}_0P_{f,S}$ with equi-width leaf size $s$, built as follows (one compression level is sketched in code below):
1. Choose a walk of $R$ (this converts ${}_0P_{f,S}$ from a bit map to a bit vector).
2. Equi-width partition ${}_0P_{f,S}$ with segment size $s$ ($s$ = leaf size; the last segment can be short).
3. Eliminate, and mask to 0, all pure-zero segments (call the mask the NotPure0 Mask, or EM).
4. Eliminate, and mask to 1, all pure-one segments (call the mask the Pure1 Mask, or UM).
(EM = existential aggregation; UM = universal aggregation.)

Compressing each leaf of ${}_sP_{f,S}$ with leaf size $s_2$ gives ${}_{s_1,s_2}P_{f,S}$; recursively, ${}_{s_1,s_2,s_3}P_{f,S}$, ${}_{s_1,s_2,s_3,s_4}P_{f,S}$, ... (this builds an EM tree and a UM tree).

BASIC P-trees. If $A_i$ is real or binary and $f_{i,j}(x) \equiv$ the $j$th bit of $x_i$, then $\{\,{}_{(*)}P_{f_{i,j},\{1\}} \equiv {}_{(*)}P_{i,j}\,\}_{j=b..0}$ are the basic $(*)$P-trees of $A_i$, where $* = s_1..s_k$. If $A_i$ is categorical and $f_{i,a}(x) = 1$ if $x_i = a$, else $0$, then $\{\,{}_{(*)}P_{f_{i,a},\{1\}} \equiv {}_{(*)}P_{i,a}\,\}_{a \in R[A_i]}$ are the basic $(*)$P-trees of $A_i$.

Notes: The UM masks (e.g., of ${}_{2^k,\dots,2^0}P_{i,j}$, with $k = \lceil \log_2 |R| \rceil$) form a (binary) tree. Whenever the EM bit is 0, the entire subtree below it can be eliminated (since it represents a pure-zero segment); a 0-node at level $k$ (lowest level = level 0) with no subtree then indicates a $2^k$-run of zeros. In this construction the UM tree is redundant. We call these EM trees the basic binary P-trees.
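A minimal sketch of one compression level (steps 1–4 above), assuming the level-0 P-tree has already been serialized into a Python list of 0/1 bits by some walk of R; the helper name compress_ptree and the (EM, UM, leaves) return shape are assumptions made for illustration.

```python
# One compression level of a predicate-tree bit vector, leafsize = s.
# EM[i] = 0 marks an eliminated pure-zero segment (existential aggregation);
# UM[i] = 1 marks an eliminated pure-one segment (universal aggregation);
# mixed segments are kept as leaves.

def compress_ptree(bits, leafsize):
    EM, UM, leaves = [], [], []
    for start in range(0, len(bits), leafsize):
        seg = bits[start:start + leafsize]   # the last segment may be short
        if not any(seg):                     # pure-zero: eliminate, mask EM to 0
            EM.append(0); UM.append(0); leaves.append(None)
        elif all(seg):                       # pure-one: eliminate, mask UM to 1
            EM.append(1); UM.append(1); leaves.append(None)
        else:                                # mixed: keep the leaf
            EM.append(1); UM.append(0); leaves.append(seg)
    return EM, UM, leaves

if __name__ == "__main__":
    bits = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
    print(compress_ptree(bits, 4))
```

Recursing on the kept leaves with leaf sizes $s_2, s_3, \dots$ would build the nested ${}_{s_1,s_2,\dots}P_{f,S}$ trees described above.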
A useful functional: the total variation about a point $a$,
$$TV(a) = \sum_{x\in R}(x-a)\circ(x-a).$$
With $d$ an index over the dimensions and $i,j,k$ bit-slice indexes:
$$
\begin{aligned}
TV(a) &= \sum_{x\in R}\sum_{d=1}^{n}\big(x_d^2 - 2a_d x_d + a_d^2\big)\\
&= \sum_{x\in R}\sum_{d=1}^{n}\Big(\sum_k 2^k x_{d,k}\Big)^2 \;-\; 2\sum_{x\in R}\sum_{d=1}^{n} a_d\Big(\sum_k 2^k x_{d,k}\Big) \;+\; |R|\,|a|^2\\
&= \sum_{x}\sum_{d}\Big(\sum_i 2^i x_{d,i}\Big)\Big(\sum_j 2^j x_{d,j}\Big) \;-\; 2\sum_{x,d,k} 2^k a_d x_{d,k} \;+\; |R|\,|a|^2\\
&= \sum_{x,d,i,j} 2^{i+j}\,x_{d,i}x_{d,j} \;-\; 2\sum_d a_d \sum_{x,k} 2^k x_{d,k} \;+\; |R|\,|a|^2\\
&= \sum_{x,d,i,j} 2^{i+j}\,x_{d,i}x_{d,j} \;+\; |R|\Big(-2\sum_d a_d\mu_d + \sum_d a_d a_d\Big),
\end{aligned}
$$
or, in terms of P-tree root counts,
$$TV(a) = \sum_{i,j,d} 2^{i+j}\,\big|P_{d,i}\wedge P_{d,j}\big| \;-\; \sum_k 2^{k+1}\sum_d a_d\,\big|P_{d,k}\big| \;+\; |R|\,|a|^2.$$
Note that the first term does not depend upon $a$. Thus the derived attribute $TV - TV(\mu)$ (which eliminates that first term) is much simpler to compute and has identical contours (it just lowers the graph by $TV(\mu)$). We also find it useful to post-compose a log to reduce the number of bit slices. The resulting functional is called the High-Dimension-ready Total Variation, or HDTV(a).
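A small sketch checking the P-tree-count form of TV(a) against the brute-force definition, assuming non-negative integer coordinates with b bit slices; the P-tree root counts are simulated here by counting bits directly, and the function names are hypothetical.

```python
# TV(a) two ways: directly, and via bit-slice (basic P-tree root) counts.

def bit(v, k):
    """k-th bit of the non-negative integer v."""
    return (v >> k) & 1

def tv_bruteforce(R, a):
    """TV(a) = sum over x in R of (x-a).(x-a)."""
    return sum(sum((xd - ad) ** 2 for xd, ad in zip(x, a)) for x in R)

def tv_from_slices(R, a, b=8):
    """TV(a) = sum 2^(i+j)|Pdi^Pdj| - sum 2^(k+1) a_d |Pdk| + |R||a|^2."""
    n = len(a)

    def cnt_and(d, i, j):            # root count of P_{d,i} AND P_{d,j}
        return sum(bit(x[d], i) & bit(x[d], j) for x in R)

    def cnt(d, k):                   # root count of the basic P-tree P_{d,k}
        return sum(bit(x[d], k) for x in R)

    term1 = sum(2 ** (i + j) * cnt_and(d, i, j)
                for d in range(n) for i in range(b) for j in range(b))
    term2 = sum(2 ** (k + 1) * a[d] * cnt(d, k)
                for d in range(n) for k in range(b))
    return term1 - term2 + len(R) * sum(ad * ad for ad in a)

if __name__ == "__main__":
    R = [(3, 7), (2, 5), (9, 1)]
    a = (4, 4)
    assert tv_bruteforce(R, a) == tv_from_slices(R, a)
    print(tv_bruteforce(R, a))       # 49
```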
Using the expansion above,
$$TV(a) = \sum_{x,d,i,j} 2^{i+j}\,x_{d,i}x_{d,j} \;+\; |R|\Big(-2\sum_d a_d\mu_d + \sum_d a_d a_d\Big),$$
define $f(a) \equiv TV(a) - TV(\mu)$. Then
$$
\begin{aligned}
f(a) &= |R|\Big(-2\sum_d\big(a_d\mu_d - \mu_d\mu_d\big) + \sum_d\big(a_d a_d - \mu_d\mu_d\big)\Big)\\
&= |R|\Big(\sum_d a_d^2 - 2\sum_d \mu_d a_d + \sum_d \mu_d^2\Big)\\
&= |R|\,|a-\mu|^2, \qquad\text{so } f(\mu)=0,
\end{aligned}
$$
and
$$g(a) \equiv HDTV(a) = \ln f(a) = \ln|R| + \ln|a-\mu|^2.$$
The value of $g(a)$ depends only on the length of $a-\mu$, so the isobars are hyper-circles centered at $\mu$; the graph of $g$ is a log-shaped hyper-funnel.

For an $\varepsilon$-contour ring (radius $\varepsilon$ about $a$), go inward and outward along $a-\mu$ by $\varepsilon$ to the points: inner point $b = \mu + (1 - \varepsilon/|a-\mu|)(a-\mu)$ and outer point $c = \mu + (1 + \varepsilon/|a-\mu|)(a-\mu)$. Then take $g(b)$ and $g(c)$ as the lower and upper endpoints of a vertical interval, and apply the EIN formulas to that interval to get a mask P-tree for the $\varepsilon$-contour (which is a well-pruned superset of the $\varepsilon$-neighborhood of $a$).

(Figure: the graph of $g(a)=HDTV(a)$ over $x_1, x_2$, showing $g(b)$, $g(c)$ and the $\varepsilon$-contour of radius $\varepsilon$ about $a$, with the points $a$, $b$, $c$.)
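A sketch of computing the inner/outer points b, c and the pruning interval [HDTV(b), HDTV(c)], assuming numpy and that mu is the training-set mean; epsilon and the helper names are illustrative, and the interval test is done by a scan here rather than by a mask P-tree.

```python
# Contour-endpoint construction for HDTV-based pruning.
import numpy as np

def hdtv(a, mu, n_rows):
    """g(a) = ln|R| + ln|a-mu|^2 (undefined at a = mu, where f(mu) = 0)."""
    d = np.linalg.norm(a - mu)
    return np.log(n_rows) + 2.0 * np.log(d)

def contour_endpoints(a, mu, eps):
    v = a - mu
    r = np.linalg.norm(v)
    b = mu + (1.0 - eps / r) * v    # inner point, |b - mu| = r - eps
    c = mu + (1.0 + eps / r) * v    # outer point, |c - mu| = r + eps
    return b, c

if __name__ == "__main__":
    R = np.array([[1., 2.], [3., 4.], [6., 0.], [2., 2.]])
    mu = R.mean(axis=0)
    a, eps = np.array([5., 5.]), 0.5
    b, c = contour_endpoints(a, mu, eps)
    lo, hi = hdtv(b, mu, len(R)), hdtv(c, mu, len(R))
    # training points whose HDTV value falls in [lo, hi] survive the pruning
    keep = [x for x in R if lo <= hdtv(x, mu, len(R)) <= hi]
    print(b, c, keep)
```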
If the HDTV circumscribing contour of $a$ is still too populous, use a circumscribing $A_d$-contour, i.e., a contour of the dimension projection $f(a) = a_d$ (note: $A_d$ is not a derived attribute at all, but just $A_d$ itself, so we already have its basic P-trees).

As pre-processing, calculate the basic P-trees for the HDTV derived attribute (or for another derived attribute with hyper-circular contours). To classify $a$:
1. Calculate $b$ and $c$ (they depend on $a$ and $\varepsilon$).
2. Form the mask P-tree for training points whose HDTV value lies in $[HDTV(b), HDTV(c)]$.
3. Use that P-tree to prune the candidate nearest-neighbor set. If the count of candidates is small, proceed to scan and assign class votes using the Gaussian vote function; otherwise prune further using dimension projections.

(Voting function: $G(x) = Gauss(|x-a|) - Gauss(\varepsilon)$, where $Gauss(r) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(r-\text{mean})^2/(2\,\text{var})}$, and the std, mean, and var are taken with respect to the set of distances from $a$ of the voters, i.e., $\{\,r = |x-a| : x \text{ a voter}\,\}$.)

We also note that HDTV can be further simplified (retaining the same contours) using $h(a) = |a-\mu|$. Since we create the derived attribute by scanning the training set anyway, why not just use this very simple function? Others leap to mind, e.g., $h_b(a) = |a-b|$.

(Figure: the HDTV graph over $x_1, x_2$ with the $\varepsilon$-contour of radius $\varepsilon$ about $a$, the interval $[HDTV(b), HDTV(c)]$, and the contour of the dimension projection $f(a) = a_1$.)
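A sketch of the Gaussian vote step on an already-pruned candidate set, assuming numpy and that candidates is a list of (point, class) pairs; mean, std, and var are taken over the voters' distances from a as described above, and the function name gaussian_votes is hypothetical.

```python
# Gaussian-weighted voting: G(x) = Gauss(|x-a|) - Gauss(eps).
import numpy as np

def gaussian_votes(a, candidates, eps):
    """Return (winning_class, vote_totals) over the pruned candidate set."""
    dists = np.array([np.linalg.norm(np.asarray(x) - a) for x, _ in candidates])
    mean, std = dists.mean(), dists.std()
    std = std if std > 0 else 1.0                 # guard the degenerate case
    var = std ** 2
    gauss = lambda r: np.exp(-(r - mean) ** 2 / (2 * var)) / (std * np.sqrt(2 * np.pi))
    votes = {}
    for (x, c), r in zip(candidates, dists):
        votes[c] = votes.get(c, 0.0) + (gauss(r) - gauss(eps))
    return max(votes, key=votes.get), votes

if __name__ == "__main__":
    a = np.array([5., 5.])
    candidates = [((4., 5.), "A"), ((6., 6.), "A"), ((1., 1.), "B")]
    print(gaussian_votes(a, candidates, eps=0.5))
```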
(Figure: graphs of functionals with hyper-circular contours over $X, Y$: $TV$ (with $TV(\mu) = TV(x_{33})$ and $TV(x_{15})$ marked), $HDTV$, $TV - TV(\mu)$, $h(a) = |a-\mu|$, and $h_b(a) = |a-b|$.)
Angular Variation functionals: e.g.,
$$AV(a) \equiv \frac{1}{|a|}\sum_{x\in R} x\circ a.$$
With $d$ an index over the dimensions:
$$
\begin{aligned}
AV(a) &= \frac{1}{|a|}\sum_{x\in R}\sum_{d=1}^{n} x_d\,a_d
= \frac{1}{|a|}\sum_{d}\Big(\sum_x x_d\Big)\,a_d \qquad\text{(factor out } a_d\text{)}\\
&= \frac{|R|}{|a|}\sum_{d=1}^{n}\Big(\frac{\sum_x x_d}{|R|}\Big)\,a_d
= \frac{|R|}{|a|}\sum_{d=1}^{n}\mu_d\,a_d
= \frac{|R|}{|a|}\;\mu\circ a.
\end{aligned}
$$
Then
$$COS(a) \equiv \frac{AV(a)}{|\mu|\,|R|} = \frac{\mu\circ a}{|\mu|\,|a|} = \cos(\mu, a).$$
COS (and AV) have hyper-conic isobars centered on $\mu$. Their $\varepsilon$-contour(a) is the space between the two hyper-cones centered on $\mu$ which just circumscribe the Euclidean $\varepsilon$-hyperdisk at $a$; this is intersected (shown in pink in the figure) with the HDTV $\varepsilon$-contour.

(Figure: graphs of functionals with hyper-conic contours, e.g., $COS_b(a)$ for any vector $b$, showing $\mu$, $a$, $b$ and the circumscribing cone pair about $\mu$.)
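A sketch of AV(a) and COS(a) computed from the mean vector, plus an illustrative angular band test, assuming numpy; in practice these would be evaluated from P-tree counts rather than by a scan, so this is only a reference implementation with hypothetical names.

```python
# Angular Variation and cosine functionals from the closed forms above:
# AV(a) = (|R|/|a|) mu.a   and   COS(a) = mu.a / (|mu||a|).
import numpy as np

def av(a, R):
    mu = R.mean(axis=0)
    return len(R) / np.linalg.norm(a) * float(mu @ a)

def cos_functional(a, R):
    mu = R.mean(axis=0)
    return float(mu @ a) / (np.linalg.norm(mu) * np.linalg.norm(a))

if __name__ == "__main__":
    R = np.array([[1., 2.], [3., 4.], [6., 0.], [2., 2.]])
    a = np.array([5., 5.])
    print(av(a, R), cos_functional(a, R))
    # a crude angular band about a: keep x with |COS(x) - COS(a)| <= delta
    delta = 0.05
    band = [x for x in R if abs(cos_functional(x, R) - cos_functional(a, R)) <= delta]
    print(band)
```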
For a single training point $x$, with $d$ indexing dimensions and $i,j,k$ indexing bit slices:
$$
\begin{aligned}
f(a)_x &= (x-a)\circ(x-a) = \sum_{d=1}^{n}\big(x_d^2 - 2a_d x_d + a_d^2\big)\\
&= \sum_{d}\Big(\sum_k 2^k x_{d,k}\Big)^2 - 2\sum_{d} a_d\Big(\sum_k 2^k x_{d,k}\Big) + |a|^2\\
&= \sum_{d}\Big(\sum_i 2^i x_{d,i}\Big)\Big(\sum_j 2^j x_{d,j}\Big) - 2\sum_{d,k} 2^k a_d x_{d,k} + |a|^2\\
&= \sum_{d,i,j} 2^{i+j}\,x_{d,i}x_{d,j} - 2\sum_{d,k} 2^k a_d x_{d,k} + |a|^2\\
f(a)_x &= \sum_{i,j,d} 2^{i+j}\,(P_{d,i}\wedge P_{d,j})_x - \sum_k 2^{k+1}\sum_d a_d\,(P_{d,k})_x + |a|^2.
\end{aligned}
$$
So
$$\beta\, e^{-f(a)_x} = \beta\; e^{-\sum_{i,j,d} 2^{i+j}(P_{d,i}\wedge P_{d,j})_x}\; e^{\sum_k 2^{k+1}\sum_d a_d (P_{d,k})_x}\; e^{-|a|^2}.$$
Adding up the Gaussian-weighted votes for class $c$:
$$\sum_{x\in c}\beta\, e^{-f(a)_x} = \beta\, e^{-|a|^2}\sum_{x\in c} \exp\Big(-\sum_{i,j,d} 2^{i+j}(P_{d,i}\wedge P_{d,j})_x + \sum_{k,d} 2^{k+1} a_d (P_{d,k})_x\Big).$$
Collecting the diagonal ($i=j$) terms inside the exponential (and using $x_{d,i}x_{d,i} = x_{d,i}$):
$$\sum_{x\in c} \exp\Big(\sum_{i\ne j,\,d} -2^{i+j}(P_{d,i}\wedge P_{d,j})_x + \sum_{i=j,\,d} \big(a_d 2^{i+1} - 2^{2i}\big)(P_{d,i})_x\Big).$$
Inside the exponential, for fixed $i,j,d$ the coefficient does not involve $x$ and is multiplied by a 1-bit or a 0-bit depending on $x$; thus for each fixed $i,j,d$ we either include the $x$-independent coefficient (if the bit is 1) or we do not (if the bit is 0):
$$
\sum_{x\in c}\Big(\prod_{i\ne j,\,d} e^{-2^{i+j}(P_{d,i}\wedge P_{d,j})_x}\prod_{i=j,\,d} e^{(a_d 2^{i+1}-2^{2i})(P_{d,i})_x}\Big)
= \sum_{x\in c}\Big(\prod_{\substack{i\ne j,\,d:\\ (P_{d,i}\wedge P_{d,j})_x=1}} e^{-2^{i+j}}\;\prod_{\substack{i=j,\,d:\\ (P_{d,i})_x=1}} e^{\,a_d 2^{i+1}-2^{2i}}\Big). \qquad\text{(eq 1)}
$$
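A numerical check of the per-point factorization behind (eq 1), assuming small non-negative integer coordinates with b bit slices; the constant $\beta$ and the class sum are omitted, bit values of x stand in for the P-tree entries $(P_{d,i})_x$, and the function names are illustrative.

```python
# Verify exp(-|x-a|^2) = exp(-|a|^2) * prod over diagonal/off-diagonal bit terms.
import math

def bit(v, k):
    return (v >> k) & 1

def vote_direct(x, a):
    """Unnormalized Gaussian vote of x for a query point a."""
    return math.exp(-sum((xd - ad) ** 2 for xd, ad in zip(x, a)))

def vote_bitsliced(x, a, b=8):
    """Same vote, accumulated from bit-slice terms as in (eq 1)."""
    log_v = -sum(ad * ad for ad in a)                   # the exp(-|a|^2) factor
    for xd, ad in zip(x, a):
        for i in range(b):
            if bit(xd, i):
                log_v += ad * 2 ** (i + 1) - 2 ** (2 * i)   # diagonal (i = j) terms
            for j in range(b):
                if i != j and bit(xd, i) and bit(xd, j):
                    log_v -= 2 ** (i + j)                   # off-diagonal terms
    return math.exp(log_v)

if __name__ == "__main__":
    x, a = (3, 7), (4, 4)
    print(vote_direct(x, a), vote_bitsliced(x, a))      # the two values agree
```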