Contours: f: R(A1..An) → Y

1 Contours of a function f: R(A1..An) → Y
Given f: R(A1..An) → Y, there is, equivalently, a derived attribute A_f with domain Y (the equivalence being x.A_f = f(x) ∀x ∈ R). For any S ⊆ Y, the f-contour(S) = f^{-1}(S); equivalently, A_f-contour(S) = SELECT x1..xn FROM R* WHERE x.A_f ∈ S. Here graph(f) = { (x, f(x)) | x ∈ R }. If S = {a}, we use f-Isobar(a), equivalently A_f-Isobar(a).

If f is a local density and {S_k} is a partition of Y, then {f^{-1}(S_k)} partitions R. (E.g., in OPTICS, f = reachability distance and {S_k} is the partition produced by the intersections of graph(f) with a horizontal line, with respect to a walk of R. A weather map uses an equi-width interval partition of S = ℝ: barometric-pressure or temperature contours.) A grid is the intersection partition with respect to the dimension projection functions (next slide). A class is a contour under f: R → C, the class map. An L∞ ε-disk about a is the intersection of the ε-dimension-projection contours containing a. A minimal sketch of these definitions follows.
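A minimal sketch of these definitions in Python, treating the relation R as a list of tuples; the function names (f_contour, f_isobar) are illustrative, not from the slides:

def f_contour(R, f, S):
    # f-contour(S) = f^{-1}(S): the tuples of R whose f-value lies in S.
    return [x for x in R if f(x) in S]

def f_isobar(R, f, a):
    # f-Isobar(a) = f-contour({a}).
    return f_contour(R, f, {a})

# Example: R(A1, A2) with f = the dimension projection onto A1.
R = [(1, 5), (2, 7), (1, 9), (3, 2)]
print(f_contour(R, lambda x: x[0], {1}))   # [(1, 5), (1, 9)]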

2 f: R(A1..An) → Y, S ⊆ Y. The (uncompressed) predicate-tree 0P_{f,S} is: 0P_{f,S}(x) = 1 (true) iff f(x) ∈ S
The compressed P-tree sP_{f,S} is the compression of 0P_{f,S} with equi-width leaf size s, as follows (a sketch of one level follows this slide):
1. Choose a walk of R (this converts 0P_{f,S} from a bit map to a bit vector).
2. Equi-width partition 0P_{f,S} with segment size s (s = leafsize; the last segment can be short).
3. Eliminate, and mask to 0, all pure-zero segments (call the mask the NotPure0 Mask, or EM).
4. Eliminate, and mask to 1, all pure-one segments (call the mask the Pure1 Mask, or UM).
(EM = existential aggregation; UM = universal aggregation.)

Compressing each leaf of sP_{f,S} with leafsize s2 gives s1,s2P_{f,S}. Recursively: s1,s2,s3P_{f,S}, s1,s2,s3,s4P_{f,S}, ... (this builds an EM tree and a UM tree).

BASIC P-trees:
If A_i is real or binary and f_{i,j}(x) = the j-th bit of x_i, then {(*)P_{f_{i,j},{1}} ≡ (*)P_{i,j}}, j = b..0, are the basic (*)P-trees of A_i, * = s1..sk.
If A_i is categorical and f_{i,a}(x) = 1 if x_i = a, else 0, then {(*)P_{f_{i,a},{1}} ≡ (*)P_{i,a}}, a ∈ R[A_i], are the basic (*)P-trees of A_i.

Notes: The UM masks (e.g., of 2^k,...,2^0 P_{i,j}, with k = ⌈log2 |R|⌉) form a (binary) tree. Whenever the EM bit is 0, that entire subtree can be eliminated (since it represents a pure-0 segment); then a 0-node at level k (lowest level = level 0) with no subtree indicates a 2^k-run of zeros. In this construction the UM tree is redundant. We call these EM trees the basic binary P-trees.
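A minimal sketch of one compression level (steps 1-4 above), assuming the bit vector is already in walk order; the list-based representation and names are illustrative, not the slides' actual P-tree layout:

def compress_level(bits, s):
    # Segment into equi-width leaves of size s; record the EM and UM masks.
    em, um, leaves = [], [], []
    for i in range(0, len(bits), s):
        seg = bits[i:i + s]                # the last segment can be short
        pure0 = not any(seg)
        pure1 = all(seg)
        em.append(0 if pure0 else 1)       # EM = existential aggregation
        um.append(1 if pure1 else 0)       # UM = universal aggregation
        leaves.append(None if (pure0 or pure1) else seg)  # keep mixed leaves only
    return em, um, leaves

bits = [0,0,0,0, 1,1,1,1, 0,1,1,0]
print(compress_level(bits, 4))
# -> ([0, 1, 1], [0, 1, 0], [None, None, [0, 1, 1, 0]])

Applying the same step to each retained leaf with leafsize s2 (and so on) yields the recursive s1,s2,...P_{f,S} trees.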

3 = xRd=1..n(xd2 - 2adxd + ad2) i,j,k bit slices indexes
A useful functional: TV(a) =xR(x-a)o(x-a) If we use d for an index variable over the dimensions, = xRd=1..n(xd adxd ad2) i,j,k bit slices indexes = xRd=1..n(k2kxdk)2 - 2xRd=1..nad(k2kxdk) + |R||a|2 = xd(i2ixdi)(j2jxdj) - 2xRd=1..nad(k2kxdk) + |R||a|2 = xdi,j 2i+jxdixdj 2 x,d,k2k ad xdk |R||a|2 = x,d,i,j 2i+j xdixdj |R||a|2 2 dad x,k2kxdk = x,d,i,j 2i+j xdixdj |R|dadad 2|R| dadd + = x,d,i,j 2i+j xdixdj dadad ) |R|( -2dadd + TV(a) = i,j,d 2i+j |Pdi^dj| - |R||a|2 k2k+1 dad |Pdk| + Note that the first term does not depend upon a. Thus, the derived attribute, TV-TV() (eliminate 1st term) is much simpler to compute and has identical contours (just lowers the graph by TV() ). We also find it useful to post-compose a log to reduce the number of bit slices. The resulting functional is called the High-Dimension-ready Total Variation or HDTV(a).
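A count-based sketch of the final TV(a) formula, packing each bit slice into a Python int so that |P_{d,i} ∧ P_{d,j}| is the popcount of a bitwise AND; this flat representation is assumed for illustration (no compression):

def tv(a, slices, n_rows, n_bits):
    # slices[d][i] = bit i of attribute d across all rows, packed into an int
    n_dims = len(a)
    term1 = sum((1 << (i + j)) * bin(slices[d][i] & slices[d][j]).count("1")
                for d in range(n_dims)
                for i in range(n_bits) for j in range(n_bits))
    term2 = sum((1 << (k + 1)) * a[d] * bin(slices[d][k]).count("1")
                for d in range(n_dims) for k in range(n_bits))
    return term1 - term2 + n_rows * sum(ad * ad for ad in a)

# Check against the defining sum over a toy relation R(A1, A2):
R = [(3, 1), (2, 2)]
n_bits = 2
slices = [[sum(((row[d] >> i) & 1) << r for r, row in enumerate(R))
           for i in range(n_bits)] for d in range(2)]
a = (1, 1)
assert tv(a, slices, len(R), n_bits) == sum(
    sum((xd - ad) ** 2 for xd, ad in zip(row, a)) for row in R)  # both equal 6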

4 TV(a) = Σ_{x,d,i,j} 2^{i+j} x_{d,i} x_{d,j} + |R| (−2 Σ_d a_d μ_d + Σ_d a_d a_d)
From equation 7, f(a) = TV(a) − TV(μ):

f(a) = |R| ( −2 Σ_d (a_d μ_d − μ_d μ_d) + Σ_d (a_d a_d − μ_d μ_d) )
     = |R| ( Σ_d a_d^2 − 2 Σ_d μ_d a_d + Σ_d μ_d^2 )
     = |R| |a − μ|^2,  so f(μ) = 0.

g(a) ≡ HDTV(a) = ln(f(a)) = ln|R| + ln|a − μ|^2

The value of g(a) depends only on the length of a − μ, so the isobars of g are hyper-circles centered at μ, and the graph of g is a log-shaped hyper-funnel. For an ε-contour ring (radius ε about a), go inward and outward along a − μ by ε to the points: the inner point b = μ + (1 − ε/|a−μ|)(a−μ) and the outer point c = μ + (1 + ε/|a−μ|)(a−μ). Then take g(b) and g(c) as the lower and upper endpoints of a vertical interval, and use the EIN formulas on that interval to get a mask P-tree for the ε-contour (which is a well-pruned superset of the ε-neighborhood of a). [Figure: the funnel graph of g over (x1, x2), with g(b) and g(c) bracketing the ε-contour of radius ε about a.] A sketch of b, c, and the interval follows.
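A sketch of g = HDTV and of the endpoints b and c as reconstructed above (the '+' sign in c is an assumption, consistent with c being the outer point); mu is the precomputed mean of R:

import math

def hdtv(a, mu, n_rows):
    # g(a) = ln|R| + ln|a - mu|^2  (undefined at a = mu, where f(mu) = 0)
    d2 = sum((ai - mi) ** 2 for ai, mi in zip(a, mu))
    return math.log(n_rows) + math.log(d2)

def contour_endpoints(a, mu, eps):
    diff = [ai - mi for ai, mi in zip(a, mu)]
    dist = math.sqrt(sum(v * v for v in diff))
    b = [mi + (1 - eps / dist) * v for mi, v in zip(mu, diff)]  # inner point
    c = [mi + (1 + eps / dist) * v for mi, v in zip(mu, diff)]  # outer point
    return b, c

The mask step of the next slide then selects the training points whose HDTV value falls in [hdtv(b), hdtv(c)].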

5 Pruning with the HDTV contour, then with dimension-projection contours (e.g., f(a) = a1)
If the HDTV circumscribing contour of a is still too populous, use a circumscribing A_d-contour (note: A_d is not a derived attribute at all, but just A_d, so we already have its basic P-trees). As pre-processing, calculate basic P-trees for the HDTV derived attribute (or for another hyper-circular contour derived attribute).

To classify a:
1. Calculate b and c (they depend on a and ε).
2. Form the mask P-tree for the training points with HDTV-values in [HDTV(b), HDTV(c)].
3. Use that P-tree to prune the candidate NN set. If the count of candidates is small, proceed to scan and assign class votes using the Gaussian vote function; else prune further using dimension projections.

(Use the voting function G(x) = Gauss(|x−a|) − Gauss(ε), where Gauss(r) = (1/(std √(2π))) e^{−(r−mean)^2/(2 var)} and std, mean, var are taken over the set of distances from a of the voters, i.e., {r = |x−a| : x a voter}.) A sketch of this scan-and-vote step follows the slide.

We can also note that HDTV can be further simplified (retaining the same contours) using h(a) = |a − μ|. Since we create the derived attribute by scanning the training set, why not just use this very simple function? Others leap to mind, e.g., h_b(a) = |a − b|. [Figure: the ε-contour (radius ε about a) bracketed by HDTV(b) and HDTV(c), intersected with the contour of the dimension projection f(a) = a1.]
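A minimal sketch of the scan-and-vote step, assuming the P-tree pruning has already produced a small candidate list of (point, class) pairs and that the voter distances are not all equal (var > 0); names are illustrative:

import math
from collections import defaultdict

def classify(a, candidates, eps):
    # candidates: [(x, cls)] pairs surviving the contour pruning
    dists = [math.dist(x, a) for x, _ in candidates]
    mean = sum(dists) / len(dists)
    var = sum((r - mean) ** 2 for r in dists) / len(dists)
    std = math.sqrt(var)
    def gauss(r):
        return math.exp(-(r - mean) ** 2 / (2 * var)) / (std * math.sqrt(2 * math.pi))
    votes = defaultdict(float)
    for (x, cls), r in zip(candidates, dists):
        votes[cls] += gauss(r) - gauss(eps)   # G(x) = Gauss(|x-a|) - Gauss(eps)
    return max(votes, key=votes.get)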

6 Graphs of functionals with hyper-circular contours
TV()=TV(x33) TV(x15) 1 2 3 4 5 X Y TV HDTV Graphs of functionals with hyper-circular contours h(a)=|a-| 1 2 3 TV(x15)-TV() 4 5 X Y TV-TV() hb(a)=|a-b| b

7 = (1/|a|)d(xxdad) factor out ad
Angular Variation functionals: e.g., AV(a)  ( 1/|a| ) xR xoa d is an index over the dimensions, COS(a) = (1/|a|)xRd=1..nxdad = (1/|a|)d(xxdad) factor out ad = (1/|a|)d=1..n(xxd) ad = |R|/|a|d=1..n((xxd)/|R|) ad = |R|/|a|d=1..n d ad COS(a) a = ( |R|/|a| )  o a COS(a)  AV(a)/(|||R|) = oa/(|||a|) = cos(a) COS (and AV) has hyper-conic isobars center on  COSb(a)? a b COS and AV have -contour(a) = the space between two hyper-cones center on  which just circumscribes the Euclidean -hyperdisk at a. Intersection (in pink) with HDTV -contour. Graphs of functionals with hyper-conic contours: E.g., COSb(a) for any vector, b
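A sketch of COS(a), which needs only the precomputed mean vector mu (one pass over the training set), illustrating that its isobars depend only on the angle between a and mu:

import math

def cos_functional(a, mu):
    # COS(a) = mu . a / (|mu| |a|) = cos(angle between mu and a)
    dot = sum(ad * md for ad, md in zip(a, mu))
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm(mu) * norm(a))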

8 f_a(x) = (x−a)∘(x−a)
With d an index over the dimensions and i, j, k bit-slice indices:

f_a(x) = Σ_{d=1..n} (x_d^2 − 2 a_d x_d + a_d^2)
       = Σ_d (Σ_k 2^k x_{d,k})^2 − 2 Σ_d a_d (Σ_k 2^k x_{d,k}) + |a|^2
       = Σ_d (Σ_i 2^i x_{d,i})(Σ_j 2^j x_{d,j}) − 2 Σ_{d,k} 2^k a_d x_{d,k} + |a|^2
       = Σ_{d,i,j} 2^{i+j} x_{d,i} x_{d,j} − Σ_k 2^{k+1} Σ_d a_d x_{d,k} + |a|^2

f_a(x) = Σ_{i,j,d} 2^{i+j} (P_{d,i}∧P_{d,j})_x − Σ_k 2^{k+1} Σ_d a_d (P_{d,k})_x + |a|^2

β e^{−f_a(x)} = β exp(−Σ_{i,j,d} 2^{i+j} (P_{d,i}∧P_{d,j})_x) · exp(−|a|^2) · exp(Σ_k 2^{k+1} Σ_d a_d (P_{d,k})_x)

Adding up the Gaussian-weighted votes for class c:

Σ_{x∈c} β e^{−f_a(x)} = β e^{−|a|^2} Σ_{x∈c} exp(−Σ_{i,j,d} 2^{i+j} (P_{d,i}∧P_{d,j})_x + Σ_{k,d} 2^{k+1} a_d (P_{d,k})_x)

Collecting the diagonal (i = j) terms inside the exp:

= β e^{−|a|^2} Σ_{x∈c} exp( Σ_{i≠j,d} −2^{i+j} (P_{d,i}∧P_{d,j})_x + Σ_{i=j,d} (a_d 2^{i+1} − 2^{2i}) (P_{d,i})_x )

Inside the exp, each i, j, d contributes a coefficient that does not involve x, multiplied by a 1-bit or a 0-bit depending on x; thus for fixed i, j, d we either include the x-independent coefficient (1-bit) or we don't (0-bit):

= β e^{−|a|^2} Σ_{x∈c} ( Π_{i≠j,d: (P_{d,i}∧P_{d,j})_x=1} e^{−2^{i+j}} · Π_{i=j,d: (P_{d,i})_x=1} e^{a_d 2^{i+1} − 2^{2i}} )   (eq 1)

A sketch of this accumulation follows.
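A sketch of (eq 1) as a direct loop over the class-c training points (for illustration only; the slides evaluate the same bit-selected products via P-trees rather than by scanning):

import math

def class_vote(a, rows_in_c, n_bits):
    # rows_in_c: the training points of class c, each a tuple of ints
    total = 0.0
    for x in rows_in_c:
        log_w = 0.0
        for xd, ad in zip(x, a):
            for i in range(n_bits):
                if not (xd >> i) & 1:
                    continue               # 0-bit: coefficient not included
                for j in range(n_bits):
                    if i != j and (xd >> j) & 1:
                        log_w -= 2 ** (i + j)             # off-diagonal term
                log_w += ad * 2 ** (i + 1) - 2 ** (2 * i)  # diagonal term
        total += math.exp(log_w)
    # log_w = 2 a.x - |x|^2, so total = e^{|a|^2} * sum_x e^{-|x-a|^2}:
    # proportional to the class-c Gaussian vote (beta * e^{-|a|^2} is
    # class-independent and cancels when comparing classes).
    return total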

