1
Given any function f: R(A1..An) → Y (any range) and any S ⊆ Y, we define contour(f,S) = f⁻¹(S).
Note the DUALITY between functions f: R(A1..An) → Y and derived attributes: x.Af = f(x), or fA(x) = x.A.
Note also that Contour(Af,S) = SELECT A1,...,An FROM X WHERE X.Af IN S. If S = {a}, the contour is also called an isobar(f,a), as used in barometric pressure maps.
Given a similarity s: X×X → Reals, an extension of s to a similarity between disjoint subsets of X (e.g., single/complete/average link...), and C ⊆ X ⊆ R, we define a k-disk of C as a set disk(C,k) ⊇ C such that |disk(C,k) ∩ C'| = k and s(x,C) ≥ s(y,C) ∀x ∈ disk(C,k), ∀y ∉ disk(C,k).
Define its skin(C,k) ≡ disk(C,k) - C (skin stands for "s k immediate neighbors"; it is a kNN set of C),
cskin(C,k) ≡ the union of all skin(C,k)s (the closed skin), and
ring(C,k) = cskin(C,k) - cskin(C,k-1).
[Figure: disks of radii r1 and r2 drawn around C = {a} and around a general set C.]
For a threshold r: disk(C,r) ≡ {x ∈ X | s(x,C) ≥ r}, skin(C,r) ≡ disk(C,r) - C, and ring(C,r2,r1) ≡ disk(C,r2) - disk(C,r1), which also = skin(C,r2) - skin(C,r1).
Given a [pseudo] distance d rather than a similarity, just reverse all inequalities.
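The threshold forms of contour, disk, skin and ring can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the function names mirror the definitions, while the 1-D point set and the similarity `sim` (negated single-link distance) are assumptions made only for the example.

```python
def contour(f, rows, S):
    """contour(f, S) = f^-1(S): the rows whose f-value lies in S."""
    return [x for x in rows if f(x) in S]

def disk(X, C, r, s):
    """disk(C, r) = {x in X | s(x, C) >= r} for a set similarity s."""
    return {x for x in X if s(x, C) >= r}

def skin(X, C, r, s):
    """skin(C, r) = disk(C, r) - C."""
    return disk(X, C, r, s) - set(C)

def ring(X, C, r2, r1, s):
    """ring(C, r2, r1) = disk(C, r2) - disk(C, r1)."""
    return disk(X, C, r2, s) - disk(X, C, r1, s)

# Hypothetical example data: ten points on a line.
X = set(range(10))

def sim(x, C):
    """Assumed similarity: negated distance to the nearest member of C."""
    return -min(abs(x - c) for c in C)

C = {5}
print(sorted(disk(X, C, -2, sim)))       # points within distance 2 of C
print(sorted(skin(X, C, -2, sim)))       # the same disk without C itself
print(sorted(ring(X, C, -2, -1, sim)))   # points at distance exactly 2
print(contour(lambda t: t[0] + t[1], [(1, 2), (0, 0), (3, 0)], {3}))
```

Because `sim` is a similarity, a smaller threshold gives a larger disk, so the ring is non-empty when r2 < r1. With a distance in place of `sim`, every `>=` would become `<=`.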
2
A broader definition of predicate trees (P-trees) based on functions?
(generalizes previous def.) Given a relation R(A1..An), a function f: R → Y and S ⊆ Y, the 0-Level P-tree, 0Pf,S, is the bit map of size |R| given by Pf,S(x) = 1 (true) iff f(x) ∈ S, ∀x ∈ R.
Note: With this definition, all P-trees are contour maps (or isobars). If the function is a local density function (as in OPTICS), then these contours ARE CLUSTERS!
The only question is: what partition {Sk} of Y should be used? OPTICS uses a density threshold line; the Sk's are the intervals between graph(f) crossing points (all points below the line are agglomerated into one noise cluster). Weather reporters use equi-width interval partitions (of barometric pressure...). Pre-images of the partition components are the clusters.
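A small sketch of this definition, under stated assumptions: `ptree0` builds the 0-level bit map for one set S (given as a membership predicate), and `equiwidth_contours` groups rows by the pre-images of an equi-width partition of Y. Both names and the pressure readings are hypothetical.

```python
from collections import defaultdict

def ptree0(rows, f, in_S):
    """0-level P-tree: one bit per row, 1 iff f(row) lies in S."""
    return [1 if in_S(f(x)) else 0 for x in rows]

def equiwidth_contours(rows, f, width):
    """Pre-images of the equi-width partition {[k*width, (k+1)*width)} of Y."""
    clusters = defaultdict(list)
    for x in rows:
        clusters[int(f(x) // width)].append(x)
    return dict(clusters)

# Hypothetical barometric pressure readings; f is the identity here.
pressures = [1003, 1011, 1012, 1024, 1008]
print(ptree0(pressures, lambda p: p, lambda y: 1010 <= y < 1020))
print(equiwidth_contours(pressures, lambda p: p, 10))
```

Each value of the returned dictionary is one contour, that is, one cluster in the sense used above.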
3
A 1-Level, (ls)-P-tree, denoted (ls)Pf,S is the compression of 0Pf,S by
1. ordering R (walking R, which converts the bit map to a bit vector)
2. partitioning into consecutive segments of size ls (ls = leaf size; the last one can be short)
3. eliminating, and masking to 0, the pure-zero segments (via a Leaf Mask or LM)
4. eliminating, and masking to 1, the pure-one segments (via a Pure1 Mask or PM)
Note: LM is the existential rollup or aggregation of the segments (leaves).
Since the leaves are bit vectors themselves, the compression can be recursed to 2-level P-trees, (ls1,ls2)Pf,S (ls2 is the 2nd segment size, usually ls2 << ls1). Recursing again: (ls1,ls2,ls3)Pf,S.
If Ai is Real and fi,j(x) ≡ jth bit of xi, then {*Pfi,j,{1} ≡ *Pi,j} are the basic P-trees of Ai; if Ai is categorical and fi,a(x) = 1 if xi = a ∈ Dom(Ai), else 0, then {*Pfi,a,{1} ≡ *Pi,a} are the basic P-trees of Ai.
Real basic P-trees result from the binary encoding of individual real numbers (categories). Encodings can be used for any attribute. It is the binary encoding of real attributes that turns an n-tuple scan into a log2(n)-column AND (providing the P-tree scalability).
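The four compression steps and the bit-slice encoding can be sketched as follows. This is an illustrative sketch only: the representation (two mask lists plus a dictionary of the surviving mixed leaves) and the names `bit_slices` and `compress_1level` are assumptions, not the deck's storage format.

```python
def bit_slices(values, width):
    """Vertical bit slices of an integer column, highest-order bit first."""
    return [[(v >> j) & 1 for v in values] for j in reversed(range(width))]

def compress_1level(bits, ls):
    """Compress an ordered bit vector into (LM, PM, mixed leaves).

    LM[i] = 1 iff leaf i contains a 1-bit (existential rollup).
    PM[i] = 1 iff leaf i is all 1-bits (pure1).
    Only leaves that are neither pure-zero nor pure-one are kept.
    """
    leaves = [bits[i:i + ls] for i in range(0, len(bits), ls)]
    lm = [int(any(leaf)) for leaf in leaves]
    pm = [int(all(leaf)) for leaf in leaves]
    mixed = {i: leaf for i, leaf in enumerate(leaves) if lm[i] and not pm[i]}
    return lm, pm, mixed

# Hypothetical 14-bit vector with leaf size 4 (the last leaf is short).
bits = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
lm, pm, mixed = compress_1level(bits, 4)
print(lm, pm, mixed)
print(bit_slices([5, 2, 7], 3))
```

Applying `compress_1level` again to `lm` with a second leaf size gives the 2-level recursion described above.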
4
Smoothing (zooming) R(A1..An) using P-trees
The parameters defining the solution space of ways to convert R (which is a horizontal relation or table) to a vertical forest (of P-trees) are:
1. method of ordering R (walking R) (e.g., (i1..in)-Raster, (i1..in)-Peano, (i1..in)-Hilbert, etc.)
2. leaf size vector (i.e., the choice of the number of levels, k, and a leaf size for each level, (ls1,...,lsk))
Note: How to store these P-trees on disk is an important implementation parameter, but not a theoretical solution space parameter.
Given the Basic P-tree set, BPT ≡ { (ls1,...lsk)Pi,j | j = bit position or a category, i=1..n }, a P-tree smoothing taxonomy requires two more solution space parameters:
3. smoothing level = sl (# of hi order bits)
4. rollup or aggregation method (e.g., count, existential, universal, etc.)
So the overall smoothing solution space has four dimensions: ordering method of R (walk of R); leaf size sequence, (ls1,...,lsk); smoothing level, sl; rollup or aggregation method.
Note: Smoothing is clustering (with a particular goal), and choosing a good initial partition-clustering centroid set can be viewed as smoothing (followed by choosing a central representative point in each smoothing component or cluster, e.g., the mean).
5
Upper right is the Mixed walk, Mw
Note that the ordering or walk issue is easily described using functions as well. A walk of R can be thought of as an ordering of the tuples of R together with a numbering of those tuples in that order (the step number of each tuple in the walk: 1,2,3,...). A walk is thus a function w: R → {1,2,3,...}, where w(x) = step number of x in w, so w itself is a function on R and defines contours. Since it is a candidate key (uniqueness property), every isobar w⁻¹(n) is a singleton, {x} (where x is the nth step of the walk). Interval contours are sets of consecutive steps in the walk.
[Figure: the 56-point example data set (points keyed 1..9, a..z, A..U) drawn under different walks. Upper right is the Mixed walk, Mw. Lower left is x-first Peano walk, xPw. Lower right is y-first Hilbert walk, yHw.]
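To make "a walk is a function w on R" concrete, here is a sketch of three step-number functions on an n-by-n grid of cells: a raster walk, a Z-order (bit-interleaved) walk of the kind the deck calls Peano, and the widely used iterative Hilbert-curve index. The function names are hypothetical, and which orientation corresponds to the deck's "x-first" or "y-first" variants is an assumption; swapping the x and y arguments gives the other orientation.

```python
def raster_index(x, y, n):
    """Raster walk with x varying fastest (row by row)."""
    return y * n + x

def z_index(x, y, bits):
    """Z-order walk: interleave the bits of y and x, x in the low position."""
    d = 0
    for b in reversed(range(bits)):
        d = (d << 2) | (((y >> b) & 1) << 1) | ((x >> b) & 1)
    return d

def hilbert_index(x, y, n):
    """Hilbert-curve step of cell (x, y) in an n-by-n grid, n a power of 2."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate or flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def walk(n, index):
    """All cells of the n-by-n grid, sorted by their step number."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    return sorted(cells, key=index)

print(walk(2, lambda c: raster_index(c[0], c[1], 2)))
print(walk(2, lambda c: z_index(c[0], c[1], 1)))
print(walk(4, lambda c: hilbert_index(c[0], c[1], 4)))
```

Consecutive Hilbert steps are always grid neighbours, which is one reason that walk tends to keep spatially close tuples in the same leaf.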
6
Upper left is y-first Raster ordering (walk) yRw.
[Figure: four panels showing the example data set under different walks.] Upper left is y-first Raster ordering (walk), yRw. Upper right is x-first Hilbert walk, xHw. Lower left is x-first Raster walk, xRw. Lower right is y-first Raster walk, yRw.
7
0-Levels 2-hi Vertical Count Smoothing using a Mixed Bag Walk (0L Mw 2h VCS)
This is smoothing using 0-D uncompressed Ptrees with count aggregation or rollup on 2-hi-grid cells. A j-hi grid is a grid of cells resulting from using the j hi-order bits to identify cells and the rest to walk the interior of each cell. j-lo uses the j lo-order bits to walk cell interiors and the rest to identify cells. j-hi gives a square pattern of cells and j-lo gives square cells. They are equal when (and only when) the space is square (an n×n space): j-lo = (b-j)-hi, where b = bitwidth(n).
0L 2h VCS creates a 2-hi-grid count histogram and is order independent, but requires a 56-tuple multi-scan (or use the rootcounts of each value Ptree?).
[Figure: scatter plot of the 56 keyed points (axes 0 to 16) with the per-cell counts of the 2-hi grid; cells are labeled (0,0) through (3,3).]
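The j-hi count histogram can be sketched directly on horizontal (x, y) tuples. This is only an illustration of what the histogram contains, not of the vertical (P-tree) way of computing it; the function name, the 4-bit width and the sample points are assumptions.

```python
from collections import Counter

def hi_grid_counts(points, bitwidth, j):
    """Count points per j-hi grid cell.

    A cell is identified by the j high-order bits of each coordinate;
    the remaining (bitwidth - j) bits address positions inside the cell.
    """
    shift = bitwidth - j
    return Counter((x >> shift, y >> shift) for x, y in points)

# Hypothetical points in a 16 x 16 space (4-bit coordinates).
points = [(0, 1), (3, 2), (15, 15), (12, 13), (8, 0)]
counts = hi_grid_counts(points, bitwidth=4, j=2)
print(dict(counts))
```

The result does not depend on the order in which the points are visited, which matches the remark that 0L 2h VCS is order independent.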
8
0L Mw 3h CVS
(Note: 0L and Mw refer to the vertical data structuring employed (type of Ptrees), and 3h refers to the precision level used (3 high-order bit precision).) This produces very accurate smoothing, but involves (expensive?) multiple bit-column scan processing. Even calculating rootcounts of P3h cells may be expensive? Trade-off: give up accuracy for speed? Use LMs instead of uncompressed bit slices (0-D P-trees)? See next slide.
[Figure: scatter plot of the 56 keyed points with the per-cell counts of the 3-hi grid.]
How can we parameterize our Vertical Smoothing solution space? P-tree parameters and precision parameters: walk method, leaf sizes, levels of leaves, aggregation method (count/existential..).
9
1L(8) Mw 2h CVS scans only the level-1 LM vectors instead of the full (level-0) uncompressed bit slices. Unlike 0L Mw 2h CVS, 1L Mw 2h CVS depends on the walk order, but requires only a multi-scan of |LM| = 7 bits (not the entire uncompressed bit slice of 56 bits). The walk is a mixed bag walk.
[Figure: scatter plot with the 2-hi grid counts, and the LM(8) Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
10
2L(8,2) Mw 3h CVS using 2 levels of LeafMaps (leaf sizes 8 and 2 respectively: the black LMs and the red LMs). (When there is no red LM shown, the leaf is pure, and one can tell which type of purity from the black/blue LM/PMs.) This is also the mixed bag ordering.
[Figure: scatter plot with the 3-hi grid counts, and the 2-level LM(8,2) Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
11
2L(8,2) Mw 3h EVS (existential)
Note that this also requires a scan of the same LM set, so it has the same expense as count smoothing and gives up much information. The only advantage is that the result may be simpler to express (one predicate tree over the 3-hi grid cells).
[Figure: scatter plot of the example data set, and the 2-level LM(8,2) Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
12
Changing the walk order to y-first Hilbert and reconstructing the LM(8,2) Ptrees. Note the obvious superior compression.
key (points in walk order): 1 5 6 2 3 4 8 7 b c g f e a 9 d h j n l o m k i U S P p Q O R T w v r s u t q D B A z N H F M L K J I G E C y x
Hs (H-step of each key, in the same order): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 18 20 22 24 26 28 30 32 34 36 37 38 40 42 44 46 88 114 169 170 171 172 180 195 196 197 198 202 205 206 210 223 224 225 226 245 248 249 251 252
[Figure: scatter plot of the example data set, and the 2-level LM(8,2) Ptrees P13, P12, P11, P10, P23, P22, P21, P20 under the y-first Hilbert walk.]
13
2L(8,2) y-first Hilbert Walk 3h Count-based Vertical Smoothing, or 2L(8,2) yHw 3h CVS
Smoothing with count aggregation on these Hilbert-ordered basic Ptrees, using both levels of LMs (black and red).
[Figure: scatter plot with the 3-hi grid counts, and the 2-level LM(8,2) Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
14
Change the walk order to Peano (x-first, or Z-ordered) and reconstruct the 2-level Ptrees.
key (points in walk order): 1 2 5 6 3 4 7 8 9 a d e b c f g h j i k l n m o U S T P Q p R O w v I J K L M G E C x y N H F z A D B q r s t u
[Figure: scatter plot of the example data set, and the 2-level Ptrees P13, P12, P11, P10, P23, P22, P21, P20 under the x-first Peano walk.]
15
2L(8,2) xPw 3h CVS
Now, on these x-first Peano ordered basic Ptrees, smoothing with count aggregation using both levels of LMs.
[Figure: scatter plot with the 3-hi grid counts, and the 2-level LM(8,2) Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
16
Change the walk order to raster (y-first) and reconstruct the 2-level Ptrees.
key (points in walk order): 1 5 9 d j n 2 6 e h l 3 7 b a f k o I G E C x 4 8 c g i m J y S P p K N z U Q L A T O M H F D B R v q w r t s u
[Figure: scatter plot of the example data set, and the 2-level Ptrees P13, P12, P11, P10, P23, P22, P21, P20 under the y-first raster walk.]
17
2L(8,2) yRw 3h CVS
Now, on these y-first raster ordered basic Ptrees, smoothing with count aggregation using both levels of LMs.
[Figure: scatter plot with the 3-hi grid counts, and the 2-level LM(8,2) Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
18
Comparing orderings of 2L 3hi Count Smoothing
[Figure: four panels of 3-hi count smoothing results: 2L(8,2) yRw 3h CVS, 2L(8,2) xPw 3h CVS, 2L(8,2) Mw 3h CVS and 2L(8,2) yHw 3h CVS.]
19
1L(8) yHw 2h CVS
Smoothing with count aggregation on these Hilbert-ordered basic Ptrees, using the highest-level LMs only.
[Figure: scatter plot with the 2-hi grid counts, and the LM(8) Ptrees; only bit slices 13, 12, 23, 22 are used.]
20
1L(8) xPw 2h CVS
Smoothing with count aggregation on these Peano-ordered basic Ptrees, using the highest-level LMs only.
[Figure: scatter plot with the 2-hi grid counts, and the LM(8) Ptrees; only bit slices 13, 12, 23, 22 are used.]
21
1L(8) yRw 2h CVS
Smoothing with count aggregation on these y-first raster ordered basic Ptrees, using the highest-level LMs only.
[Figure: scatter plot with the 2-hi grid counts, and the LM(8) Ptrees; only bit slices 13, 12, 23, 22 are used.]
22
Comparing orderings of 1L 2hi Count Smoothing
[Figure: four panels of 2-hi count smoothing results: 1L yRw 2h CVS, 1L yHw 2h CVS, 1L xPw 2h CVS and 1L Mw 2h CVS.]
23
1L Mw 3h CVS
[Figure: scatter plot with the 3-hi grid counts, and the LM(8) Ptrees under the mixed walk; bit slices 13, 12, 11, 23, 22, 21 are used.]
24
1L yHw 3h CVS
[Figure: scatter plot with the 3-hi grid counts, and the LM(8) Ptrees under the y-first Hilbert walk; bit slices 13, 12, 11, 23, 22, 21 are used.]
25
1L xPw 3h CVS
Smoothing with count aggregation on these Peano-ordered basic Ptrees, using the highest-level LMs only.
[Figure: scatter plot with the 3-hi grid counts, and the LM(8) Ptrees under the x-first Peano walk; bit slices 13, 12, 11, 23, 22, 21 are used.]
26
1L yRw 3h CVS
Smoothing with count aggregation on these x-major raster ordered basic Ptrees, using the highest-level LMs only.
[Figure: scatter plot with the 3-hi grid counts, and the LM(8) Ptrees under the y-first raster walk; bit slices 13, 12, 11, 23, 22, 21 are used.]
27
Comparing orderings of 1L(8) 3h Count Smoothing
[Figure: four panels of 3-hi count smoothing results: 1L(8) yRw 3h CVS, 1L(8) yHw 3h CVS, 1L(8) Mw 3h CVS and 1L(8) xPw 3h CVS.]
As far as using this information to create a good initial cluster centroid set, I like Hilbert, because the centroid at (3,3) is strong and would attract I, so the initial clustering is very good (it doesn't necessarily need improvement).
28
Someone should look at y-first-Peano yPw (N-ordering)
Comments so far
Someone should look at y-first Peano, yPw (N-ordering). It might be a bit better, since it moves immediately from the lower left octant to the one above it???
How about y-first Raster (x-major sorting order)? How about x-first Hilbert?
What about other aggregations?
What about universal? (Note that finding good initial centroids may work better using universal, since it identifies only very dense areas; but maybe too dense? That is, too few centroid areas?)
What about majority aggregation (1 iff the majority of the bits are 1-bits)? Note that one cannot use the LMs or PMs for this, but must recompute these bit vectors of size |LM| by examining each not-pure-zero leaf.
What about other rank aggregations (e.g., 3/4ths, i.e., 1 iff at least 3/4ths of the bits are 1-bits)? Of course, any rank aggregation takes a lot of additional processing, whereas existential and universal use the LM and PM vectors that are already computed and immediately available.
[Figure: scatter plot of the example data set.]
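The aggregations discussed in these comments differ only in the rule applied to each leaf. The sketch below puts them side by side on an uncompressed bit vector; the function name `rollup`, the method labels and the sample bits are assumptions for illustration. The rank rule is expressed as a fraction num/den in integer arithmetic.

```python
def rollup(bits, ls, method, num=3, den=4):
    """Aggregate each leaf of size ls into one value.

    count        number of 1-bits in the leaf
    existential  1 iff the leaf has at least one 1-bit (this is the LM)
    universal    1 iff every bit of the leaf is 1 (this is the PM)
    majority     1 iff more than half of the bits are 1
    rank         1 iff at least num/den of the bits are 1
    """
    leaves = [bits[i:i + ls] for i in range(0, len(bits), ls)]
    if method == "count":
        return [sum(leaf) for leaf in leaves]
    if method == "existential":
        return [int(any(leaf)) for leaf in leaves]
    if method == "universal":
        return [int(all(leaf)) for leaf in leaves]
    if method == "majority":
        return [int(2 * sum(leaf) > len(leaf)) for leaf in leaves]
    if method == "rank":
        return [int(den * sum(leaf) >= num * len(leaf)) for leaf in leaves]
    raise ValueError(f"unknown method: {method}")

# Hypothetical 16-bit slice, leaf size 4.
bits = [1, 1, 1, 0,  0, 0, 0, 0,  1, 1, 1, 1,  0, 1, 0, 0]
for m in ("count", "existential", "universal", "majority", "rank"):
    print(m, rollup(bits, 4, m))
```

Only the existential and universal rows can be read straight off stored LM and PM vectors; the majority and rank rows need the leaf contents, which is the extra processing cost noted above.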
29
x-first Hilbert ordering is as shown here -->
Comments continued
x-first Hilbert ordering is as shown here. Below is x-first Raster, and below at right is y-first Raster.
[Figure: three panels showing the example data set under the x-first Hilbert, x-first Raster and y-first Raster walks.]
30
We have a wealth of classification and clustering tools now (also ARM). What methods leap to mind?
[Figure: the level-1 LM(8) bit vectors of the eight basic Ptrees under each walk: 1L(8) Mw (m13..m10, m23..m20), 1L(8) yHw (H13..H10, H23..H20), 1L(8) xPw (P13..P10, P23..P20) and 1L(8) yRw (X13..X10, X23..X20).]
31
ANDing Alg: resLM = ^{T} LM ^ ^{T'^P} PM', where ^{T} ANDs over the operands with T=1 and ^{T'^P} ANDs over the operands with P=1 and T=0.
resLeaf exists iff resLM=1, and then resLeaf = ^{T} Leaf ^ ^{T'^P} Leaf'.
[Figure: the 1L(8) yHw LM bit vectors H13..H10 and H23..H20, the Ptrees P13..P10 and P23..P20, and the scatter plot with the 2-hi grid counts.]
key (points in y-first Hilbert walk order): 1 5 6 2 3 4 8 7 b c g f e a 9 d h j n l o m k i U S P p Q O R T w v r s u t q D B A z N H F M L K J I G E C y x
Hs (H-step of each key, in the same order): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 18 20 22 24 26 28 30 32 34 36 37 38 40 42 44 46 88 114 169 170 171 172 180 195 196 197 198 202 205 206 210 223 224 225 226 245 248 249 251 252
A fast isotropic clustering algorithm:
0. Remove noise using H-step gap analysis.
1. Use the 1L(8) 2h yHw CVS cells as initial clusters (with strengths).
2. Expand the strongest cluster by 1 bit (but only if it does not collide with an existing cluster): expand (01 11) to (0 1), revise the strength, and repeat step 2.
This gives us 3 noise points, {q,v,w}, and 5 clusters (the right ones, except that it doesn't separate out a tiny embedded cluster in octant (01,01); but that is to be expected, since the diameter of that embedded cluster is smaller than the 2-hi cell diameter).
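Step 0 can be illustrated with the key and Hs rows of this slide. The sketch below is one possible reading of "H-step gap analysis", not the deck's stated rule: it assumes the two rows pair up position by position, and it flags a point as noise when the H-step gaps to both of its walk neighbours are at least `min_gap`. The rule and the threshold 8 are assumptions, chosen because they reproduce the noise set {q,v,w} reported above.

```python
KEYS = ("1 5 6 2 3 4 8 7 b c g f e a 9 d h j n l o m k i U S P p Q O R T "
        "w v r s u t q D B A z N H F M L K J I G E C y x").split()
STEPS = [int(t) for t in (
    "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 18 20 22 24 26 28 30 32 34 36 "
    "37 38 40 42 44 46 88 114 169 170 171 172 180 195 196 197 198 202 205 "
    "206 210 223 224 225 226 245 248 249 251 252").split()]

def gap_noise(keys, steps, min_gap):
    """Keys whose H-step is at least min_gap away from both walk neighbours."""
    noise = []
    last = len(steps) - 1
    for i, key in enumerate(keys):
        left = steps[i] - steps[i - 1] if i > 0 else float("inf")
        right = steps[i + 1] - steps[i] if i < last else float("inf")
        if left >= min_gap and right >= min_gap:
            noise.append(key)
    return noise

print(gap_noise(KEYS, STEPS, min_gap=8))
```

Because the Hilbert walk keeps spatial neighbours close in step number, a point that is far from both walk neighbours is isolated along the walk; that is the intuition this sketch relies on.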
32
[Figure: the 1L xPw LM bit vectors (P13..P10, P23..P20) alongside the 2L xPw Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
33
[Figure: the 1L yRw LM bit vectors (X13..X10, X23..X20) alongside the 2L yRw Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
34
Implementation Specification
R(A1..An) has basic Ptrees (ls)Pi,j, where i=1..n and j=1..mi (if Ai is real, mi is its bitwidth; if Ai is categorical, {a1..ami} are its categories). Let m = Σ(i=1..n) mi. Sort {(ls)Pi,j} by i first, then j. Alias each P-tree by Pk, where k is its sort position, k=1..m.
Develop a simple transportable AND utility (assembler, C, C++...) that takes as input two m-bit vectors, P and T, and a 2-bit output-switch, S, where P (Pattern) specifies which P-trees are to be involved (by a 1-bit) and T (Truth) has a 1-bit iff P=1 and the P-tree itself is the operand (uncomplemented). For those with P=1 and T=0, their complements are the operands.
Note: If a simple P-tree complement is called for (no ANDing), just set that P-bit to 1 and leave that T-bit at 0.
Let M be a state variable specifying the number of P-trees in the set (M must be at least m). For the output-switch: if the first bit is 1, the result P-tree is to be stored as (ls)PM+1, and if the second bit is 1, the root count is to be returned.
[Figure: the utility takes P, T, S as input and outputs rc and (ls)PM+1.]
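The interface described here can be sketched on uncompressed bit vectors. This is a minimal sketch of the P/T/S contract only, not the transportable utility the slide asks for: the function name `and_utility`, the list-of-bits representation and the three sample P-trees are assumptions.

```python
def and_utility(ptrees, P, T, S):
    """AND the P-trees selected by P, complementing those with T=0.

    ptrees  list of equal-length bit vectors (the set P1..PM, as lists)
    P, T    m-bit pattern and truth vectors (lists of 0/1)
    S       (store, want_rc): store the result as P(M+1) and/or return
            the root count (the number of 1-bits in the result)
    """
    n = len(ptrees[0])
    result = [1] * n
    # zip stops at m, so trees stored earlier beyond position m are ignored.
    for tree, p, t in zip(ptrees, P, T):
        if p:
            result = [r & (b if t else 1 - b) for r, b in zip(result, tree)]
    store, want_rc = S
    if store:
        ptrees.append(result)
    return sum(result) if want_rc else None

# Hypothetical set of m = 3 basic P-trees over 4 tuples.
trees = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 1]]
rc = and_utility(trees, P=[1, 1, 0], T=[1, 0, 0], S=(1, 1))   # P1 AND P2'
print(rc, trees[-1])
rc_comp = and_utility(trees, P=[0, 0, 1], T=[0, 0, 0], S=(0, 1))  # P3' only
print(rc_comp)
```

The second call shows the complement-only case from the note: one P-bit set, the matching T-bit left at 0.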
35
ANDing Algorithm: resLM = ^{T} LM ^ ^{T'^P} PM', where ^{T} ANDs over the operands with T=1 and ^{T'^P} ANDs over the operands with P=1 and T=0.
resLeaf exists iff resLM=1, and then resLeaf = ^{T} Leaf ^ ^{T'^P} Leaf' (if there are no operands, install pure1 or create a PM).
Note that PM(B') = LM'(B) and LM(B') = PM'(B).
P = AND input pattern (the vertical slices involved in the AND); T = AND truth pattern (the truth value of those involved).
e.g., 13^12: resLM = LM13 ^ LM12; resLeaf(3): same P and T (the P, T and resLM bit patterns appear in the figure, with LMs in red and PMs in blue). The PMs show that the 2 middle leaves are pure1 (rc=4 already) and that the last leaf of 13 is pure1, so just retrieve the last leaf of 12 (01) and accumulate its 1-count into rc (=5). ANDing the first leaves, 01 ^ 10 = 00, so rc=5.
[Figure: the keyed data table and the 2L Mw Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
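The leaf-mask shortcut can be sketched on 1-level P-trees. This is an illustrative sketch under assumptions: a P-tree is represented as (LM, PM, mixed leaves), the vector length is a multiple of the leaf size, and pure leaves are synthesized rather than counted directly as the slide's example does. The names `compress`, `leaf_bits` and `and_rootcount` are hypothetical.

```python
def compress(bits, ls):
    """1-level P-tree as (LM, PM, mixed leaves); len(bits) % ls == 0 assumed."""
    leaves = [bits[i:i + ls] for i in range(0, len(bits), ls)]
    lm = [int(any(leaf)) for leaf in leaves]
    pm = [int(all(leaf)) for leaf in leaves]
    mixed = {i: leaf for i, leaf in enumerate(leaves) if lm[i] and not pm[i]}
    return lm, pm, mixed

def leaf_bits(tree, i, ls):
    """Bits of leaf i, synthesizing pure-one and pure-zero leaves."""
    lm, pm, mixed = tree
    if pm[i]:
        return [1] * ls
    return mixed[i] if lm[i] else [0] * ls

def and_rootcount(operands, ls):
    """Root count of the AND of (tree, truth) operands; truth=0 complements."""
    n_leaves = len(operands[0][0][0])
    rc = 0
    for i in range(n_leaves):
        # resLM bit: LM of each uncomplemented operand, PM' of each complemented one.
        if not all(t[0][i] if truth else not t[1][i] for t, truth in operands):
            continue                      # result leaf is pure zero: skip it
        leaf = [1] * ls
        for t, truth in operands:
            bits = leaf_bits(t, i, ls)
            leaf = [a & (b if truth else 1 - b) for a, b in zip(leaf, bits)]
        rc += sum(leaf)
    return rc

# Hypothetical 8-bit slices with leaf size 2.
A = compress([1, 1, 0, 0, 0, 1, 1, 1], 2)
B = compress([1, 0, 1, 1, 0, 1, 0, 0], 2)
print(and_rootcount([(A, 1), (B, 1)], 2))   # A AND B
print(and_rootcount([(A, 0), (B, 1)], 2))   # A' AND B
```

A resLM bit of 1 only means the result leaf may be non-zero; as in the slide's 01 ^ 10 = 00 example, the leaf AND can still come out empty.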
36
resLM = ^{T} LM ^ ^{T'^P} PM' (resPM unnecessary - must be construct.)
resLeaf exists iff resLM=1, and then resLeaf = ^{T} Leaf ^ ^{T'^P} Leaf'.
Note that PM(B') = LM'(B) and LM(B') = PM'(B).
P = AND input pattern (the vertical slices involved in the AND); T = AND truth pattern (the truth value of those involved).
e.g., 13'^12^20: resLM = LM12 ^ LM20 ^ PM'13; resLeaf(3456) is the AND of the corresponding leaves, giving rc=12 (the P, T, resLM and leaf bit patterns appear in the figure).
[Figure: the keyed data table and the 2-level Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20.]
37
Leaf Maps (red are type-1)
[Figure: layout of the Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20 for the keyed data table, with LeafSize=8 and NOPL=7: the Leaf Maps (red are type-1), the Purity Maps (labels 0-p13, 0-p12, 1-p11, 0-p10, 0-p23, 1-p22, 1-p21, 0-p20), and the stored leaves at LeafOff=0 through LeafOff=6.]
38
[Figure: the keyed data table (keys 1..9, a..z, A..V) with its attribute values, and the scatter plot of the example data set including point V.]
39
APPENDIX: Smoothing with existential aggregation by using the top level LMs:
[Figure: the keyed data table, the scatter plot with the 2-hi grid cells, and the Ptrees of bit slices 13, 12, 11, 10, 23, 22, 21, 20; only the top-level LMs of slices 13, 12, 23, 22 are used.]
40
Smoothing using three hi order bits (aggregation by counts within 3-hi grid cells)
[Figure: the keyed data table and the scatter plot with the per-cell counts of the 3-hi grid.]
41
Smoothing using only the two hi order bits: existential aggregation within 2-hi grid cells.
Note that this gives us a very good smoothing. However, as with duplicate elimination in, e.g., a projection, it is very expensive (a vertical scan?) to get this data. If we can approximate it using the LMs, it would help greatly! What we note from the following slides is that LM smoothing is sensitive to the "walk order" of the Ptrees.
[Figure: the keyed data table and the scatter plot with the 2-hi grid cells.]