Space Filling Curves and Functional Contours


Space Filling Curves and Functional Contours. Database analysis can be broken down into 2 areas, Querying and Data Mining. Data Mining can be broken down into 2 areas, Machine Learning and Association Rule Mining. Machine Learning can be broken down into 2 areas, Clustering and Classification. Clustering can be broken down into 2 areas, Isotropic (round clusters) and Density-based. Machine Learning usually begins by identifying Near Neighbor Set(s), NNS. In Isotropic Clustering, one identifies round sets (disk-shaped NNSs about a center). In Density Clustering, one identifies cores (dense round NNSs) and then pieces them together. In any Classification based on continuity, we classify a sample based on its NNS class histogram (aka kNN), or we identify isotropic NNSs of centroids (k-means), or we build decision trees with training leaf sets and use them to classify samples that fall to a leaf, or we find class boundaries (e.g., SVM) which distinguish NNSs in one class from NNSs in another.

The basic definition of continuity from elementary calculus shows that NNSs are fundamental: ∀ε>0 ∃δ>0 : d(x,a)<δ ⇒ d(f(x),f(a))<ε; i.e., for every NNS about f(a), there is an NNS about a that maps inside it. So NNS Search is a fundamental problem to be solved. We discuss NNS Search from a vertical data point of view. With vertically structured data, the only neighborhoods that are easily determined are the cubic or Max neighborhoods (L∞ disks), yet usually we want Euclidean disks. We develop techniques to circumscribe Euclidean disks using intersections of contour sets, the main ones being coordinate projection contours, whose intersections form L∞ disks.

First we review the standard "space filling curves", Peano (Z) and Hilbert. In both cases, as the gridding gets finer and finer, each point on the curve converges to a point in the square, and those points densely fill the square (no empty spaces of any size), which is why they are called "space filling curves".

Compared with the raster orderings of a gridding of the square, these two methods may have advantages: Peano ordering preserves distance better than a raster ordering (not as many massive jumps), and Hilbert ordering preserves distance even better than Peano (each step moves to a neighbor). Recall that choosing a pixel or voxel ordering is the first step in creating vertical pTree spatial data. In any geospatial analysis, some ordering of the pixels (voxels in 3D) is required; which one is best may depend on what the definition of "best" is and on the data area being analyzed.

After a brief look at space filling curves, we treat functional contours. How are functional contours related to space filling curves? With space filling curves, we try to cover a "space" (a 2D space) with a 1D curve (up to the pixelization of the space); that is, we try to "fill in" the space with a function from the real line. In functional contouring, we consider roughly the opposite, namely what gets mapped to a single point by a function into the real line (that is, what does the preimage of a point look like). Familiar examples include isobars (which points get mapped to the same pressure value), isotherms, etc.
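As a concrete illustration (not from the slides), the Peano/Z-ordering can be computed by interleaving the bits of the two coordinates; sorting grid cells by this key reproduces the recursive "z" pattern. A minimal Python sketch:

```python
def z_key(x, y, bits=8):
    """Morton / Peano Z-order key: interleave the bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key

# Walking a 4x4 grid in Z-order reproduces the recursive "z" pattern:
walk = sorted(((x, y) for x in range(4) for y in range(4)), key=lambda p: z_key(*p))
print(walk[:4])   # starts (0,0), (1,0), (0,1), (1,1)
```

Note the jump of Manhattan length 2 between the 2nd and 3rd steps: Z-order steps are not always to a neighbor, which is exactly the distance-preservation gap that Hilbert ordering closes.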

Hilbert Ordering? In 2 dimensions, Peano ordering is 2×2-recursive z-ordering (raster ordering); Hilbert ordering is 4×4-recursive tuning-fork ordering (H-trees have fanout = 16).

[Figure: recursive Hilbert tuning-fork construction; each 4×4 block of cells 0..F is walked in tuning-fork order, with sub-fork orientations (down, right, left, up) determined by the parent.]

Coordinates of a tuning fork (upper left) depend on ancestry: (x,y) = (ggrrbb, ggrrbb), each level contributing two bits. If your parent points Down and you are node h (h = 0..F) in your tuning fork, your 2-bit contribution (row(x), col(y)) is:

0 → 00,00   1 → 00,01   2 → 01,01   3 → 01,00
4 → 10,00   5 → 11,00   6 → 11,01   7 → 10,01
8 → 10,10   9 → 11,10   A → 11,11   B → 10,11
C → 01,11   D → 01,10   E → 00,10   F → 00,11

Lookup tables for Up, Left, and Right parents are similar.
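An alternative to the tuning-fork lookup tables is the well-known rotate-and-reflect conversion from step number to Hilbert coordinates. This sketch (an assumption on my part, not the slides' table-driven method) verifies the "always move to a neighbor" property claimed above:

```python
def hilbert_d2xy(order, d):
    """Map step number d (0 .. 4**order - 1) of a Hilbert walk over a
    2**order x 2**order grid to (x, y) cell coordinates."""
    x = y = 0
    s, t = 1, d
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                        # rotate/reflect this quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry      # add the quadrant offset
        t //= 4
        s *= 2
    return x, y

walk = [hilbert_d2xy(3, d) for d in range(64)]
# Every step of the Hilbert walk moves to a grid neighbor (Manhattan distance 1):
assert all(abs(ax - bx) + abs(ay - by) == 1
           for (ax, ay), (bx, by) in zip(walk, walk[1:]))
```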

FUNCTIONAL CONTOURS: R f YS Y S R R* f Y f(x) Given f:R(A1..An)Y (any range but usually the Reals) and SY (any subset of the range, but usually 1 real) , define contour(f,S)  f-1(S). R f A1 A2 An : : . . . YS A1..An space Y S graph(f) = { (a1,...,an,f(a1.an)) | (a1..an)R } contour(f,S) There is a DUALITY between functions, f:R(A1..An)Y and derived attributes, Af of R given by x.Af  f(x) where Dom(Af)=Y A1 A2 An x1 x2 xn : . . . Y f(x) f A1 A2 An Af x1 x2 xn f(x) R R* Contour(Af,S) = SELECT A1..An FROM R* WHERE R*.Af  S. If S={a}, f-1({a}) is Isobar(f, a)

Given a similarity s:R×R→Reals (e.g., s(x,y)=s(y,x) and s(x,x)≥s(x,y) ∀x,y∈R) and an extension of s to disjoint subsets of R (e.g., single/complete/average link...) and C⊆R, a k-disk of C is:

disk(C,k) ⊇ C such that |disk(C,k)−C| = k and s(x,C) ≥ s(y,C) ∀x∈disk(C,k), y∉disk(C,k).

Define skin(C,k) ≡ disk(C,k) − C ("skin" stands for s k immediate neighbors; it is a kNNS of C); cskin(C,k) ≡ the union of all skin(C,j), j≤k, the closed skin; and ring(C,k) = cskin(C,k) − cskin(C,k−1).

For radii rather than counts: disk(C,r) ≡ {x∈R | s(x,C) ≥ r}, skin(C,r) ≡ disk(C,r) − C, and ring(C,r2,r1) ≡ disk(C,r2) − disk(C,r1) = skin(C,r2) − skin(C,r1). [Figure: nested rings of radii r1 < r2 about a set C, and about a single point C = {a}.] Given a [pseudo] distance d rather than a similarity, just reverse all the inequalities.
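A tiny sketch of the radius-based definitions, using Manhattan distance with a single-link set extension as one (assumed, illustrative) choice of d; since d is a distance rather than a similarity, the inequalities are reversed as noted above:

```python
def d_set(x, C):
    """Single-link extension of Manhattan distance to a set C (an assumed choice)."""
    return min(abs(x[0] - c[0]) + abs(x[1] - c[1]) for c in C)

def disk(R, C, r):      # disk(C,r) = {x in R | d(x,C) <= r}
    return {x for x in R if d_set(x, C) <= r}

def skin(R, C, r):      # skin(C,r) = disk(C,r) - C
    return disk(R, C, r) - set(C)

def ring(R, C, r2, r1): # ring(C,r2,r1) = disk(C,r2) - disk(C,r1)
    return disk(R, C, r2) - disk(R, C, r1)

R = [(x, y) for x in range(5) for y in range(5)]   # a tiny 5x5 point set
C = [(2, 2)]
print(len(disk(R, C, 1)), len(skin(R, C, 1)), len(ring(R, C, 2, 1)))   # 5 4 8
```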

A definition of Predicate trees (P-trees) based on functionals (generalizes, but does not alter, previous definitions): Given f:R(A1..An)→Y and S⊆Y, define the uncompressed Functional-P-tree Pf,S as the bitmap given by Pf,S(x)=1 iff f(x)∈S. The predicate for Pf,S is the set-containment predicate f(x)∈S, so Pf,S is a Contour bitmap (it bitmaps, rather than lists, the contour points).

If f is a local density (a la OPTICS) and {Sk} is a partition of Y, then {f⁻¹(Sk)} is a clustering! What partition {Sk} of Y should be used? E.g., a binary partition (given by a threshold value)? In OPTICS, the Sk are the intervals between crossing points of graph(f) and a threshold line; points below the threshold line are agglomerated into one noise cluster. Weather reporters use equi-width interval partitions (of barometric pressure or temperature, etc.).
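An uncompressed functional P-tree is just a bitmap over R; choosing a partition {Sk} of the range then yields complementary contour bitmaps, i.e., a clustering. An illustrative sketch with toy data:

```python
def ptree_bitmap(f, S, R):
    """Uncompressed functional P-tree P_{f,S}: bit for x is 1 iff f(x) is in S."""
    return [1 if f(x) in S else 0 for x in R]

R = [3, 7, 2, 9, 4]
below = ptree_bitmap(lambda x: x, range(5), R)       # binary (threshold) partition at 5
above = ptree_bitmap(lambda x: x, range(5, 16), R)   # the complementary cell
print(below, above)   # complementary contour bitmaps form a 2-cluster "clustering"
```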

Compressed Functional-P-trees (with equi-width leaf size ls): (ls)Pf,S is a compression of Pf,S obtained by:
1. ordering or walking R (converts the bitmap to a bit vector);
2. equi-width partitioning R into segments of size ls (ls = leaf size; the last one can be short);
3. eliminating, and masking to 0, all pure-zero segments (via a Leaf Mask, LM);
4. eliminating, and masking to 1, all pure-one segments (via a Pure1 Mask, PM).

Notes: 1. LM is an existential aggregation of R (1 iff that leaf has a 1-bit). Other aggregations are possible (default = existential). 2. There are partitionings other than equi-width (but equi-width will be the default).

Doubly Compressed Functional-P-trees with equi-width leaf sizes (ls1,ls2): each leaf of (ls1)Pf,S is an uncompressed bit vector and can be compressed the same way, giving (ls1,ls2)Pf,S (ls2 is the second equi-width segment size, with ls2 << ls1). Recursive compression can continue ad infinitum: (ls1,ls2,ls3)Pf,S, (ls1,ls2,ls3,ls4)Pf,S, ...
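One level of this compression (steps 2 through 4 above) can be sketched as follows; this is a minimal illustration, not the project's implementation:

```python
def compress(bits, ls):
    """One compression level with equi-width leaf size ls.
    LM[i] = 1 iff leaf i contains a 1-bit (existential aggregation);
    PM[i] = 1 iff leaf i is pure-1; only mixed leaves are retained."""
    LM, PM, leaves = [], [], []
    for i in range(0, len(bits), ls):
        leaf = bits[i:i + ls]             # the last leaf may be short
        lm, pm = int(any(leaf)), int(all(leaf))
        LM.append(lm)
        PM.append(pm)
        if lm and not pm:                 # pure-0 and pure-1 leaves are eliminated
            leaves.append(leaf)
    return LM, PM, leaves

LM, PM, leaves = compress([0]*8 + [1]*8 + [1, 0, 1, 0, 0, 0, 0, 0], 8)
print(LM, PM, leaves)   # only the one mixed leaf is stored
```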

BASIC P-trees: For Ai real and fi,j(x) ≡ the jth bit of the ith component xi, {(*)Pfi,j,{1} ≡ (*)Pi,j}j=b..0 are the basic (*)P-trees of Ai (* = ls1,...,lsk, k=0,...). For Ai categorical, with fi,a(x)=1 if xi=a∈R[Ai] and 0 otherwise, {(*)Pfi,a,{1} ≡ (*)Pi,a}a∈R[Ai] are the basic (*)P-trees of Ai.

For Ai real, the basic P-trees result from binary encoding of the individual real numbers (categories). Encodings can be used for any attribute. Note that it is the binary encoding of real attributes which turns an n-tuple scan into a log2(n)-column AND (making P-tree technology scalable). Next, we consider various contour functionals that are useful in Machine Learning, starting with Total Variation, TV.
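For a real (integer-encoded) attribute, the basic P-trees are simply its vertical bit slices. A minimal sketch:

```python
def bit_slices(column, bitwidth):
    """Basic P-trees of a real/integer attribute: one vertical bit slice per bit position."""
    return {j: [(v >> j) & 1 for v in column] for j in range(bitwidth)}

A1 = [5, 3, 6, 0]           # a 3-bit column: 101, 011, 110, 000
P = bit_slices(A1, 3)
print(P[2], P[1], P[0])     # [1, 0, 1, 0] [0, 1, 1, 0] [1, 1, 0, 0]
```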

= xRd=1..n(xd2 - 2adxd + ad2) i,j,k bit slices indexes R(A1..An) TV(a)=xR(x-a)o(x-a) If we use d for a index variable over the dimensions, = xRd=1..n(xd2 - 2adxd + ad2) i,j,k bit slices indexes = xRd=1..n(k2kxdk)2 - 2xRd=1..nad(k2kxdk) + |R||a|2 = xd(i2ixdi)(j2jxdj) - 2xRd=1..nad(k2kxdk) + |R||a|2 = xdi,j 2i+jxdixdj - 2 x,d,k2k ad xdk + |R||a|2 = x,d,i,j 2i+j xdixdj - |R||a|2 2 dad x,k2kxdk + = x,d,i,j 2i+j xdixdj - |R|dadad 2|R| dadd + = x,d,i,j 2i+j xdixdj + dadad ) |R|( -2dadd + TV(a) = i,j,d 2i+j |Pdi^dj| - |R||a|2 k2k+1 dad |Pdk| + collecting |Pdk|s: TV(a) = i>j,d 2i+j+1 |Pdi^dj| + |R| (a12+..+an2) k,d (22k- 2k+1ad) |Pdk| + Note that the first term (the only one involving dual bit-slice predicates) does not depend upon a at all! So it can be subtracted from TV(a), giving a simpler derived attr, TV with identical contours (just a lowered graph) and which can be calculated simply from the basic Ptree rootcounts themselves (no preprocessing). Then subtracting TV() (=mean of R) is a function with identical contours (a High Dimensoin-ready TV).

TV(a) = x,d,i,j 2i+j xdixdj + |R| ( -2dadd + dadad ) From equation 7, f(a)=TV(a)-TV() TV(a) = x,d,i,j 2i+j xdixdj + |R| ( -2dadd + dadad ) = |R| ( -2d(add-dd) + d(adad- dd) ) + dd2 ) = |R|( dad2 - 2ddad = |R| |a-|2 f()=0 and letting g(a) HDTV(a) = ln( f(a) )= ln|R| + ln|a-|2 Taking  g / ad (a) = | a- |2 2( a -)d The Gradient of g at a = 2/| a- |2 (a -) The gradient =0 iff a= and gradient length depends only on the length of a- so isobars are hyper-circles The gradient function is has the form, h(r) = 2/r in along any ray from , Integrating, we get that g(a) has the form, 2ln|a-| along any coordinate direction (in fact any radial direction from ), so the shape of graph(g) is a funnel:  -contour (radius  about a) What inteval endpts gives an exact -contour in feature space? a f(b) f(c) b c The way to get an exact -contour is to move in and out along a- by  to inner point, b=µ+(1-/|a-|)(a-) and outer point c=µ+(1+/|a-|)(a-). Then take f(b) and f(c) as lower and upper endpoints of the red vertical interval (use EIN formulas on that interval to get a mask of the exact -contour).

The procedure is always as shown on the previous slide. Note that the very same vertical pruning procedures can be used efficiently for any functional that requires no additional preprocessing, and even for ones that do (i.e., that require additional ANDing and root counting just to generate the derived attribute values); e.g., the dimension-projection functionals already have all their basic P-trees generated for us (their basic P-trees are precisely the basic P-trees of that dimension). To classify a sample a:
1. Calculate basic P-trees for the derived attribute column of each training point.
2. Calculate b and c (they depend on a and the chosen ε).
3. Mask the feature-space points whose derived attribute value lies in the EIN ring [f(b), f(c)] (that is, the precise ε-contour set).
4. Use that mask to prune.
5. If the root count of the candidate set is scan-able, proceed to scan and assign votes; else look for another pruning functional (note that the combination of HDTV and all dimension projections will always suffice).

[Figure: the ε-contour (radius ε about a) with inner point b, outer point c, and the interval [f(b), f(c)].]

[Figure: graphs of TV, TV−TV(μ), and HDTV over the X-Y feature space, marking TV(x15) and TV(μ) = TV(x33).]

Parameters for Vertical Structuring and Smoothing (zooming) of R(A1..An). The parameters defining the conversion of horizontal tables to P-trees are:
1. the method of ordering R (walking R), e.g., (i1..in)-Raster, (i1..in)-Peano, (i1..in)-Hilbert, etc.;
2. the leaf sizes, i.e., the choice of the number of levels, k, and a leaf size for each level, (ls1,...,lsk).
Note: how to store these P-trees on disk is an important implementation parameter, but not a theoretical solution-space parameter. Given the basic P-tree set, BPT ≡ {(ls1,...,lsk)Pi,j | i = column, j = bit position or category}, a P-tree smoothing taxonomy requires two more solution-space parameters:
3. the smoothing level, sl (# of low-order bits);
4. the rollup or aggregation method (the predicate), e.g., count, existential, universal, rank, etc.
So the vertical smoothing solution space has four dimensions: the ordering method (walk) of R; the leaf-size sequence (ls1,...,lsk); the smoothing level, sl; and the rollup or aggregation method.
Note: smoothing is clustering (with a particular goal), and choosing good initial partition-clustering centroid sets can be done by smoothing (and then choosing a representative point in each smoothing component or cluster, e.g., the mean).
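The four solution-space dimensions can be captured as a small configuration record; a sketch with hypothetical names, just to make the parameter space concrete:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class VerticalSmoothingParams:
    """The four solution-space dimensions listed above (names are illustrative)."""
    walk: str                    # ordering method of R: "raster", "peano", "hilbert", ...
    leaf_sizes: Tuple[int, ...]  # (ls1, ..., lsk), one per compression level
    smoothing_level: int         # sl: number of low-order bits aggregated
    aggregation: str             # rollup predicate: "count", "existential", "universal", "rank"

p = VerticalSmoothingParams("hilbert", (8, 2), 1, "count")   # i.e., yH (8,2) 1 C
print(p)
```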

What is the goal of smoothing R(A1..An)? This is the first question to be answered. One answer is that smoothing can increase the speed of data mining algorithms and address the curse of cardinality (essentially as a better alternative to random sub-sampling). In this direction, we think of smoothing as pre-clustering rather than random selection, reducing the cardinality of the table being mined, hopefully without hiding exceptional data (as random sampling almost always does). So this application of smoothing requires that the smoothing algorithm be fast (or be amortizable) and also, if possible, be sensitive to exceptional data (else why do it?). A related direction is that smoothing can be a method of pruning to reduce the computational complexity of an algorithm, i.e., to produce only the strong pre-clusters. The points outside these cores can then be individually scanned, e.g., to find exceptions or to be processed in some other way. In this direction, smoothing is used to isolate those dense core neighborhoods that can be treated as "one unit", and therefore vastly increases processing speed (over examining each individual point in each core). Other goals of smoothing?

Note that the walk-order issue is easily described using functions as well. A walk of R can be thought of as an ordering of the tuples of R together with a step numbering of those tuples in that order (i.e., assigning a step number to each tuple in the walk: 1, 2, 3, ...). A walk w:R→{1,2,3,...} is itself a function on R and defines contours. Since it is a candidate key (uniqueness property), every isobar w⁻¹(n) is a singleton, {x} (where x is the nth step of the walk). Interval contours are sets of consecutive steps in the walk. For a walk that moves to a grid neighbor at each step (e.g., Hilbert), the number of steps from x to y is an upper bound on the Manhattan distance (if x and y are close in steps, they are close in Manhattan distance).

[Figure: an 8×8 grid of labeled cells walked three ways: a mixed walk Mw, an x-first Peano walk xPw, and a y-first Hilbert walk yHw.]

[Figure: the same 8×8 labeled grid walked four more ways: y-first Peano walk yPw, x-first Hilbert walk xHw, x-first Raster walk xRw, and y-first Raster walk yRw.]

Mixed Walk, Uncompressed, 2-bit Count Smoothing (Mw()2C). This is smoothing using uncompressed P-trees with count aggregation on 2-lo grid cells. A j-hi grid is a grid of cells resulting from using the j high-order bits to identify cells and the rest to walk the interior of each cell; j-lo uses the j low-order bits to walk cell interiors and the rest to identify cells. j-hi gives a square pattern of cells and j-lo gives square cells. When (and only when) the space is square (n×...×n) are they equal (j-lo = (b−j)-hi, where b = bitwidth(n)). Mw()2C creates a 2-lo-grid count histogram and is order independent, but requires a 56-tuple multi-scan (or use root counts of each value P-tree?).

[Figure: the 8×8 labeled grid with its 2-lo-grid cells and the per-cell count histogram.]
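A count-aggregation smoothing over a j-lo grid (cells identified by the high-order bits, interiors walked by the j low-order bits) can be sketched directly from point coordinates; a minimal, order-independent illustration:

```python
def lo_grid_counts(points, j):
    """Count-aggregation smoothing on the j-lo grid: the j low-order bits of each
    coordinate walk a cell's interior; the remaining hi-order bits identify the cell."""
    counts = {}
    for x, y in points:
        cell = (x >> j, y >> j)            # cell id from the high-order bits
        counts[cell] = counts.get(cell, 0) + 1
    return counts

pts = [(0, 0), (1, 1), (5, 5), (4, 7)]
print(lo_grid_counts(pts, 2))   # 2-lo grid: cells of side 4
```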

Mw()1C produces very accurate smoothing, but involves (expensive?) multiple bit-column scan processing. Even calculating root counts of the 1-lo cells may be expensive. Trade-off: give up accuracy for speed by using LMs instead of uncompressed bit slices. See the next slide.

[Figure: the 8×8 labeled grid with its 1-lo-grid count histogram.]

Mw(8)2C scans the first-level LM vectors instead of the full uncompressed bit slices. It depends on the order, but requires only a multi-scan of |LM| = 7 bits (not the entire uncompressed bit slice of 56 bits).

[Figure: the 8×8 labeled grid with the level-1 LM vectors and the resulting 2-lo-grid counts.]

Mw(8,2)1C, using 2 levels of LeafMaps (leaf sizes 8 and 2 respectively: the black LMs and the red LMs). When there is no red LM shown, the leaf is pure, and one can tell which type of purity from the black/blue LM/PMs.

[Figure: the 8×8 labeled grid with two-level LM/PM vectors and the resulting 1-lo-grid counts.]

Mw(8,2)1E (existential aggregation). Note this also requires a scan of the same LM set, so it is the same expense as count smoothing but gives up much information; the only advantage is that the result may be simpler to express (one predicate tree over the 1-lo grid cells).

[Figure: the 8×8 labeled grid with the existential-aggregation result over the 1-lo grid cells.]

Changing the walk order to y-first Hilbert and reconstructing the LM(8,2) P-trees. Note the compression.

[Figure: the 8×8 labeled grid with the eight basic P-trees P13..P10 and P23..P20 rebuilt under the y-first Hilbert walk, together with the step-number key.]

y-first Hilbert (8,2) 1-lo Count Vertical Smoothing, yH(8,2)1C: on these Hilbert-ordered basic P-trees, smoothing with count aggregation using both levels of LMs (black and red).

[Figure: the 8×8 labeled grid with the yH(8,2)1C per-cell counts and the two-level LM vectors.]

Change the walk order to Peano (x-first, Z-ordered) and reconstruct the 2-level P-trees.

[Figure: the 8×8 labeled grid with the eight basic P-trees rebuilt under the x-first Peano walk, together with the step-number key.]

xP(8,2)1C: now, on these x-first Peano-ordered basic P-trees, smoothing with count aggregation using both levels of LMs.

[Figure: the 8×8 labeled grid with the xP(8,2)1C per-cell counts and the two-level LM vectors.]

Change the walk order to raster (y-first) and reconstruct the 2-level P-trees.

[Figure: the 8×8 labeled grid with the eight basic P-trees rebuilt under the y-first raster walk, together with the step-number key.]

yR(8,2)1C: now, on these y-first raster-ordered basic P-trees, smoothing with count aggregation using both levels of LMs.

[Figure: the 8×8 labeled grid with the yR(8,2)1C per-cell counts and the two-level LM vectors.]

Comparing orderings of (8,2) 1-lo-bit Count Smoothing.

[Figure: four panels of the 8×8 labeled grid with their 1-lo count histograms: yR(8,2)1C, xP(8,2)1C, M(8,2)1C, and yH(8,2)1C.]

yH(8)2C: on these Hilbert-ordered basic P-trees, smoothing with count aggregation using the highest-level LMs only.

[Figure: the 8×8 labeled grid with the yH(8)2C per-cell counts and the level-1 LM vectors.]

xP(8,2)2C: on these Peano-ordered basic P-trees, smoothing with count aggregation using the highest-level LMs only.

[Figure: the 8×8 labeled grid with the xP(8,2)2C per-cell counts and the level-1 LM vectors.]

yR(8,2)2C: on these y-first raster-ordered basic P-trees, smoothing with count aggregation using the highest-level LMs only.

[Figure: the 8×8 labeled grid with the yR(8,2)2C per-cell counts and the level-1 LM vectors.]

Comparing orderings of (8,2) 2-lo-bit Count Smoothing
[slide figure: the 8×8 point set under the four orderings yR(8,2)2C, yH(8,2)2C, xP(8,2)2C, and M(8,2)2C]

[slide figure: M(8,2)1C smoothing of the 8×8 point set, with leaf-map bit vectors]

[slide figure: yH(8,2)1C smoothing of the 8×8 point set, with Hilbert-ordered leaf-map bit vectors]

On these Peano ordered basic Ptrees, smoothing with count aggregation by using highest level LMs only.
[slide figure: xP(8,2)1C smoothing of the 8×8 point set, with Peano-ordered leaf-map bit vectors]

On these x-major raster ordered basic Ptrees, smoothing with count aggregation by using highest level LMs only.
[slide figure: yR(8,2)1C smoothing of the 8×8 point set, with raster-ordered leaf-map bit vectors]

Comparing orderings of (8,2) 1-lo-bit Count Smoothing
As far as using this info to create a good initial cluster centroid set, I like Hilbert because the centroid at (3,3) is strong and would attract I, so the initial clustering is very good (actually it doesn't necessarily need improvement).
[slide figure: the 8×8 point set under the four orderings yR(8,2)1C, yH(8,2)1C, M(8,2)1C, and xP(8,2)1C]

Comments so far
Someone should look at y-first-Peano yPw (N-ordering). It might be a bit better since it moves immediately from the lower left octant to the one above it??? How about y-first Raster (x-major sorting order)? How about x-first-Hilbert?
What about other aggregations? What about universal? (Note that finding good initial centroids may work better using universal, since it identifies only very dense areas (but maybe too dense? That is, too few centroid areas?).) What about majority aggregation (1 iff the majority of the bits are 1-bits)? Note that one cannot use the LMs or PMs for this, but must recompute these bit vectors of size |LM| by examining each not-pure-zero leaf. What about other rank aggregations (e.g., 3/4ths, i.e., 1 iff at least 3/4ths of the bits are 1-bits)? Of course any rank aggregation takes a lot of additional processing, whereas existential and universal use the LM and PM vectors that are already computed and immediately available.
[slide figure: the 8×8 point set]

Comments continued
x-first Hilbert ordering is as shown in the figure. Below it is x-first Raster, and at the right is y-first Raster.
[slide figures: x-first Hilbert, x-first Raster, and y-first Raster walks of the 8×8 point set]
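The walks being compared can be sketched as key functions on an 8×8 grid. This is a minimal illustration, not the slides' P-tree machinery: the function names and the 3-bit grid size are assumptions, and `hilbert_key` uses the standard (x,y)-to-Hilbert-distance conversion, which may differ from the slides' y-first variant by a reflection.

```python
def raster_key(x, y, bits=3):
    # x-major raster order: sort by y, then by x
    return (y << bits) | x

def z_order_key(x, y, bits=3):
    # Peano/Morton Z-order: interleave the bits of x and y
    key = 0
    for k in range(bits):
        key |= ((x >> k) & 1) << (2 * k)       # x bit -> even position
        key |= ((y >> k) & 1) << (2 * k + 1)   # y bit -> odd position
    return key

def hilbert_key(x, y, bits=3):
    # standard (x,y) -> Hilbert-curve distance conversion
    n = 1 << bits
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                            # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

cells = [(x, y) for y in range(8) for x in range(8)]
for name, key in [("raster", raster_key), ("z-order", z_order_key),
                  ("hilbert", hilbert_key)]:
    walk = sorted(cells, key=lambda c, k=key: k(*c))
    print(name, walk[:4])
```

Unlike raster and Z-order, consecutive Hilbert cells are always edge-adjacent, which is why the slides favor Hilbert order for gap analysis and smoothing.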

We have a wealth of classification and clustering tools now (also ARM). What methods leap to mind?
[slide figure: basic P-tree bit vectors for the M(8), yH(8), xP(8), and yR(8) orderings]
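As a reminder of where the bit vectors above come from: a basic P-tree starts as a vertical bit slice of an attribute under some ordering of the points. A minimal sketch, where the toy column and variable names are assumptions:

```python
# Split a column of 3-bit values into one bit vector per bit position;
# root counts (rc) are then just the 1-counts of the slices.
values = [5, 3, 7, 0, 6, 2, 1, 4]          # toy attribute column (assumption)
bits = 3
slices = {j: [(v >> j) & 1 for v in values] for j in range(bits)}
root_counts = {j: sum(bv) for j, bv in slices.items()}
print(slices[2])        # high-order bit slice
print(root_counts)
```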

ANDing Alg: resLM = ^TLM ^T'^PPM'. resLeaf exists iff resLM=1; resLeaf = ^TresLeaf ^T'^PresLeaf'.
A fast isotropic clustering algorithm:
0. Remove noise using H-step gap analysis.
1. Use yH(8) 2-lo cells as initial clusters (with strengths).
2. Expand the strongest cluster by 1 bit (but only if it does not collide with an existing cluster): expand (01 11) to (0 1), revise strength, and repeat 2.
This gives us 3 noise points {q,v,w} and 5 clusters (the right ones, except that it doesn't separate out a tiny embedded cluster in octant (01,01), but that is to be expected since the diameter of that embedded cluster is smaller than the 2-hi cell diameter).
[slide figure: yH(8) basic P-trees, the Hilbert-step key table, and the resulting clusters on the 8×8 point set]
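Step 1 of the algorithm above (2-lo cells as initial clusters with strengths) can be sketched by masking off the two low bits of each coordinate and counting points per cell. The toy point set and names are assumptions; the slides additionally order the cells along the Hilbert walk.

```python
from collections import Counter

points = [(0, 0), (1, 1), (0, 2), (6, 6), (7, 7), (6, 7), (3, 5)]  # toy data

def cell_of(p, lo_bits=2):
    # 2-lo cell: drop the 2 low bits of each coordinate
    return (p[0] >> lo_bits, p[1] >> lo_bits)

strengths = Counter(cell_of(p) for p in points)
print(strengths.most_common())   # initial clusters, strongest first
```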

[slide figure: xP(8) basic P-tree bit vectors and the xP(8,2) leaf maps]

[slide figure: yR(8) basic P-tree bit vectors and the yR(8) leaf maps]

Implementation Specification
R(A1..An) has basic Ptrees, (ls)Pi,j, i=1..n, and if Ai is real with bitwidth=mi, or if Ai is categorical with categories {a1..ami}, then j=1..mi. Let m = Σi=1..n mi. Sort {(ls)Pi,j} by i first, then j. Alias each P-tree by Pk, where k is its sort position, k=1..m.
Develop a simple transportable AND utility (assembler, C, C++, ...) that takes as input two m-bit vectors P and T and a 2-bit output-switch S, where P (Pattern) specifies by a 1-bit which P-trees are to be involved, and T (Truth) has a 1-bit iff P=1 and that operand is uncomplemented. For those with P=1 and T=0, their complements are the operands. Note: if a simple P-tree complement is called for (no ANDing), just set that P-bit to 1 and leave that T-bit at 0. Let M be a state variable specifying the number of P-trees in the set (M must be at least m). For the output-switch: if the first bit is 1, the result P-tree is to be stored as (ls)PM+1, and if the second bit is 1, the root count is to be returned.
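A minimal sketch of the AND utility specified above, with the bit vectors held as Python ints rather than compressed P-trees (that simplification, the width, and the function name are assumptions). P selects operands; for a selected operand, T=1 uses it as-is and T=0 uses its complement:

```python
WIDTH = 8                       # toy vector width (assumption)
MASK = (1 << WIDTH) - 1

def and_utility(ptrees, P, T):
    result = MASK               # AND identity: the pure-1 vector
    for k, bv in enumerate(ptrees):
        if (P >> k) & 1:        # P-bit selects this operand
            result &= bv if (T >> k) & 1 else (~bv & MASK)
    return result

ptrees = [0b00110011, 0b01010101, 0b00001111]
res = and_utility(ptrees, P=0b011, T=0b001)   # ptree0 AND ptree1'
print(bin(res), bin(res).count("1"))          # result and its root count
```

Setting only one P-bit with its T-bit at 0 yields a simple complement, exactly as the specification notes.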

ANDing Algorithm: resLM = ^TLM ^T'^PPM'. resLeaf exists iff resLM=1, and then resLeaf = ^TresLeaf ^T'^PresLeaf' (if there are no operands, install pure1 or create a PM).
P = AND-input-pattern (vertical slices involved in the AND). T = AND-truth-pattern (truth value of those involved). Note that PM(B') = LM'(B) and LM(B') = PM'(B).
e.g., 13^12: P=1100 0000, T=1100 0000, resLM = LM13^LM12 = 0001000; resLeaf(3): same P and T. LMs are in red and PMs in blue in the figure. The PMs show that the 2 middle leaves are pure1 (rc=4 already) and that the last leaf of 13 is pure1, so just retrieve the last leaf of 12 (01) and accumulate its 1-count into rc (=5); ANDing the first leaves, 01 ^ 10 = 00, so rc=5.
[slide figure: key table and LM/PM bit vectors for the operand slices]

resLM = ^TLM ^T'^PPM' (a resPM is unnecessary; it would have to be constructed). resLeaf exists iff resLM=1, and then resLeaf = ^TresLeaf ^T'^PresLeaf'.
P = AND-input-pattern (vertical slices involved in the AND). T = AND-truth-pattern (truth value of those involved). Note that PM(B') = LM'(B) and LM(B') = PM'(B).
e.g., 13'^12^20: P=1100 0001, T=0100 0001, resLM = LM12^LM20^PM'13 = 0001111; resLeaf(3456), rc=12.
[slide figure: key table and LM/PM bit vectors for the operand slices]
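The resLM rule in the two examples above can be sketched with flat bitmaps standing in for the leaf maps (LM, bit=1 iff the leaf is not pure-0) and pure maps (PM, bit=1 iff the leaf is pure-1). The toy maps below are assumptions, not the slides' values; the key identity LM(B') = PM'(B) is what lets a complemented operand contribute ~PM:

```python
FANOUT = 7                       # leaves per level map (toy; assumption)
MASK = (1 << FANOUT) - 1

def res_leaf_map(operands):
    """operands: (LM, PM, complemented) triples; returns resLM."""
    res = MASK
    for lm, pm, comp in operands:
        res &= (~pm & MASK) if comp else lm   # LM(B') = PM'(B)
    return res

lm13, pm13 = 0b0111011, 0b0010000   # toy leaf/pure maps (assumption)
lm12, pm12 = 0b0011110, 0b0001100
lm20, pm20 = 0b0101111, 0b0000101

print(bin(res_leaf_map([(lm13, pm13, False), (lm12, pm12, False)])))   # 13^12
print(bin(res_leaf_map([(lm13, pm13, True), (lm12, pm12, False),
                        (lm20, pm20, False)])))                        # 13'^12^20
```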

i=1..nk=b..0(xi,k-ai,k)2k = k=b..0 rk2k or 5 6 3 4 7 8 9 a d e b c f g h j i l n m o U S T P Q p R O w v I J K L M G E C x y N H F z A D B q r s t u 1 3 1 2 1 1 2 3 1 2 1 2 1 2 1 Given R(A1..An) (vector space), a = (a1 .. an) = ( k=b..0 a1,k2k ... k=b..0 an,k2k )  i=1..n Dom(Ai) The Lp r-Ring about a, LpRing(a,r) is: {x | Lp(x,a)p = rp} where Lp(x,a)p = i=1..n |xi-ai|p First we treat p=1 (Manhattan distance) and we consider only the polytant where all ai  xi (other polytants are handled similarly with the appropriate signs) so that all |xi-ai| = xi-ai = 0 and thus, L1Ring(x,a)= {x | i=1..n(xi-ai) = r } or all x such that i=1..nk=b..0(xi,k-ai,k)2k = k=b..0 rk2k or x-first Peano (Z-ordering?) n o l m j k h i v x y z A B C D E N F G H I J K L M q p O P Q R S T U d e f g 9 a b c 5 6 7 8 1 2 3 4 t u r s w i=1..n(k=b..0 xi,k2k - k=b..0ai,k2k) = k=b..0 rk2k or i=1..nk=b..0 xi,k2k - i=1..nk=b..0ai,k2k = k=b..0 rk2k or i=1..nk=b..0 xi,k2k = k=b..0 rk2k + i=1..nk=b..0ai,k2k or i=1..nk=b..0 xi,k2k = k=b..0 (rk+i=1..nai,k)2k or k=b..0(i=1..nxi,k)2k = k=b..0(rk+i=1..nai,k)2k Forming a P-tree mask for the set of xs that solve this equation seems difficult because increasing one dimension requires decreasing another, etc.

i=1..n(k=b..0(xi,k-ai,k)2k)2 = (k=b..0 rk2k)2 5 6 3 4 7 8 9 a d e b c f g h j i l n m o U S T P Q p R O w v I J K L M G E C x y N H F z A D B q r s t u 1 3 1 2 1 1 2 3 1 2 1 2 1 2 1 Given R(A1..An) (vector space), a = (a1 .. an) = ( k=b..0 a1,k2k ... k=b..0 an,k2k )  i=1..n Dom(Ai) The Lp r-Ring about a, LpRing(a,r) is: {x | Lp(x,a)p  rp} where Lp(x,a)p = i=1..n |xi-ai|p Next we treat p=2 (square Euclidean distance) and L2Ring(x,a)2= {x | i=1..n(xi-ai)2 = r } or all x such that i=1..n(k=b..0(xi,k-ai,k)2k)2 = (k=b..0 rk2k)2 The left side can be multiplied out and one can, again, seek a P-tree mask for the set of solutions, but it presents the same "trade off" problem, right? x-first Peano (Z-ordering?) n o l m j k h i v x y z A B C D E N F G H I J K L M q p O P Q R S T U d e f g 9 a b c 5 6 7 8 1 2 3 4 t u r s w

Dr. Scott is looking for a formula for the P-tree mask of the solution set based on the tree positions of a and x. Is there a closed form formula for the P-tree mask of the L1 or L2 ring of radius r about a? Note that the walk of any quadrant is the same at a given level. That suggests an approach based on j-lo cells is the way to do it (the mask of any such cell is trivial). The only concern here is when a is near the boundary of its cell (so in order to get a superset mask of its r-disk, one has to consider some neighboring cells). That suggests just using the EIN-disk about a?
[slide figure: Peano-step (ps) bit vectors, key table, and the 8×8 point set with cell labels]

k=b..0(i=1..nxi,k)2k = k=b..0(rk+i=1..nai,k)2k Summary slide: The last few slides have been an attempt to develop a formula to mask the Manhattan ring of a point at a given radius. This work builds off of a discussion with Dr. Kirk Scott and his CATA-06 paper. It seems to come down to a case of solving the equation (creating a P-tree mask for the solutions of): k=b..0(i=1..nxi,k)2k = k=b..0(rk+i=1..nai,k)2k This involves trading off among the dimensions (give from 1, take from another). Can we form P-tree masks in this case? One can use j-lo grid cells as Euclidean r-disk supersets. However, when the center, a, is not in the middle of the cell, these may not give small enough supersets so they can then be pruned using scans. One could always take the j-lo cell and a selection of its bordering j-lo cells to make sure the Euclidean r-disk is completely super-setted. This would involve determining the subset of dimensions in which ai is close to zero (pushing a too close to those "low-side" cell borders) and close to 1 (pushing a too close to those "hi-side" cell borders). The EIN r-disk about a (the L r-disk about a) is the best cube-shaped superset, of course. But is its P-tree mask easily computed, i.e., is there preprocessing that makes it a matter of plugging the ai and r values into one formula with no additional ANDing or Root-Counting? That is, can all the vertical processing be preprocessing and therefore amortized for all a and r? One additional note regarding EIN-disk super-setting of the Euclidean r-disk about a: That's what Taufik is now doing. First, he takes the r-disk-superscribed TV-countour of a (the thinnest TV-contour that contains the Euclidean r-disk about a). Then prune out a sufficient number of the "far away or halo points" by intersecting it with (ANDing masks) the r-disk-superscribing Xi-ai-contours of a (either one i at a time or taking the cluster of all large i-ai values, if there is one). 
We note that the intersection of all Euclidean-r-disk-superscribing Xi-ai contours IS the EIN-disk of radius r about a. Another approach suggested by Dr. Scott is to develop formulas for an approximation of the Euclidean disk (or ring) about an arbitrary center point based on where it sits in its j-lo cell. Once these formulas are developed for one cell, they are the same for the others (just change the hi-bits that are used to address the cell). (This is similar to the process described in the j-lo grid cell paragraph above.)

j-lo core cell mining (assume a cell is core iff it is ≥ 50% full):
What we note is that we need all patterns precomputed (and RootCounted). Can we AND and RootCount, e.g., 13^23, 13'^23, 13^23', 13'^23' in 1 step? (e.g., by concatenating, flipping and shifting before ANDing???) The next slide attempts to use the PeanoStep derived attribute in combination with this approach, since it walks by j-lo cells.
rc(13^23) = 5 < 32. rc(13^23') = rc13 - rc(13^23) = 7-5 = 2 < 32. rc(13'^23) = rc23 - rc(13^23) = 22-5 = 17 < 32. rc(13'^23') = 56-17-2-5 = 32 ≥ 32, so 3-lo cell (0,0) is core.
There are no 2-lo core cells in 3-lo cells (1,1) or (1,0), since there are < 8 points in them (5 and 2, resp.). 2-lo core cells in 3-lo cell (0,1)? rc(13'^23^12^22) = 7 < 8 ...
[slide figure: key table and bit vectors for the 8×8 point set]
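The core-cell test above is a threshold on root counts. A minimal sketch by direct counting (the toy point set, the cell size, and the 50%-full interpretation are assumptions; the slides obtain the same counts from precomputed rc's without touching the points):

```python
points = [(x, y) for x in range(8) for y in range(8) if (x + y) % 3]  # toy data

def cell_count(xbit, ybit):
    # points whose coordinates have the given top bits (a 4x4 cell)
    return sum(1 for (x, y) in points if (x >> 2) == xbit and (y >> 2) == ybit)

capacity = 16                    # positions in a 4x4 cell
for xb in (0, 1):
    for yb in (0, 1):
        n = cell_count(xb, yb)
        print((xb, yb), n, "core" if 2 * n >= capacity else "not core")
```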

NOTE!! ps7=p23, ps6=p13, ps5=p22, ps4=p12, ps3=p21, ps2=p11, ps1=p20, ps0=p10. So there is nothing in the basic P-tree set of the Peano Step Count derived attribute that we didn't already have in the basic P-tree set of the table itself!
[slide figure: Peano-step bit vectors and key table for the 8×8 point set]

What's next? What do we get for free once we have computed all of the basic P-tree pairs? I.e., the basic P-tree set is {Pi,j | j=bi..0 for each i=1..n} (there are b = Σi=1..n (bi+1) of them). Taufik precomputes {rc(Pi,j^Pi,k) | i=1..n, all j,k} (there are Σi=1..n (bi+1)^2 of them).
If we were to pre-compute all {rc(Pi,j^Ph,k) | Pi,j and Ph,k basic P-trees} (b^2 of them), we could get the rcs of any equi-width partition of TV-contours out of it for free (just using Taufik's precomputation). We could also get the rcs of all 2-hi grid cells out of it, and the rcs of all intersections of equi-width TV-contours with 2-hi cells. That might be good enough to always yield a very good Euclidean disk superset? By very good, I mean: given a point a, a superset of Disk(a,r) with few enough points that it can be scanned for the Disk(a,r) points (or fitted with a Gaussian Radial Basis Vote Function for NN Classification).
If we were to pre-compute {rc(Pi,j^Ph,k^Pl,m)} (b^3 of them), we could get the rcs of any equi-width partition of TV-contours and also the rcs of all 3-hi grid cells, etc. Clearly, if we had the rcs of all basic P-tree combinations, we could do anything! Is there a parallel (or pipelined) way to compute all of them?
[slide figure: key table and bit vectors for the 8×8 point set]
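The b² precomputation reduces to caching the root count of every pair of basic bit slices once. A sketch with int bitmaps over a toy 8-row table (the names and values are assumptions):

```python
from itertools import combinations

cols = {("a", 1): 0b00110011, ("a", 0): 0b01010101,   # toy basic P-trees
        ("b", 1): 0b00001111, ("b", 0): 0b01100110}

# one AND per unordered pair; later queries are pure lookups
rc_pair = {frozenset([p, q]): bin(cols[p] & cols[q]).count("1")
           for p, q in combinations(cols, 2)}
print(rc_pair[frozenset([("a", 1), ("b", 1)])])
```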

[slide figure: Hilbert-step (hs) count bit vectors, Hilbert key table, and the 8×8 point set]
Changi

Pre-processing costs? Pairs within attributes first (what Taufik does).
rc(a3^a2'): no ^ required = rc(a3) - rc(a3^a2) = 7-5 = 2
rc(a3^a1'): no ^ required = rc(a3) - rc(a3^a1) = 7-7 = 0
rc(a3^a0'): no ^ required = rc(a3) - rc(a3^a0) = 7-3 = 4
rc(a3'^a1): ^ required (but just count black-0 red-1 combos = 27)
rc(a3'^a2): ^ required (but just count black-0 red-1 combos = 18) (a3^a2 and a3'^a2 in 1 instruction or in parallel?)
rc(a3'^a0): ^ required (but just count black-0 red-1 combos = 19)
rc(a3'^a2'): no ^ required = rc(a3') - rc(a3'^a2) = 49 - 18 = 31 (so far: 4 rcs out of 2 ANDs)
rc(a3'^a1'): no ^ required = rc(a3') - rc(a3'^a1) = 49 - 27 = 22
rc(a3'^a0'): no ^ required = rc(a3') - rc(a3'^a0) = 49 - 19 = 30
[slide figure: key table, bit slices, and the running table of pairwise root counts]
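The "no ^ required" lines above all come from one inclusion-exclusion identity: once rc(a), rc(b), and rc(a^b) are known, every complemented combination follows by arithmetic alone. A toy check (the bitmaps and N are assumptions):

```python
N = 8
MASK = (1 << N) - 1
a, b = 0b11001010, 0b10101100    # toy bit slices (assumption)

def rc(v):
    return bin(v).count("1")     # root count = 1-count

rc_a, rc_b, rc_ab = rc(a), rc(b), rc(a & b)
assert rc(a & ~b & MASK) == rc_a - rc_ab              # rc(a^b')
assert rc(~a & b & MASK) == rc_b - rc_ab              # rc(a'^b)
assert rc(~a & ~b & MASK) == N - rc_a - rc_b + rc_ab  # rc(a'^b')
print(rc_a, rc_b, rc_ab)
```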

Pre-processing costs?
rc(a2^a1'): no ^ required = rc(a2) - rc(a2^a1) = 23-13 = 10
rc(a2^a0'): no ^ required = rc(a2) - rc(a2^a0) = 23-7 = 16
rc(a2'^a0): ^ required (but just count black-0 red-1 combos = 15)
rc(a2'^a1): ^ required (but just count black-0 red-1 combos = 21)
rc(a2'^a0'): no ^ required = rc(a2') - rc(a2'^a0) = 33 - 15 = 18
rc(a2'^a1'): no ^ required = rc(a2') - rc(a2'^a1) = 33 - 21 = 12
[slide figure: running root-count table]

Pre-processing costs?
rc(a1^a0'): no ^ required = rc(a1) - rc(a1^a0) = 34-12 = 22
rc(a1'^a0): ^ required (but just count black-0 red-1 combos = 10)
rc(a1'^a0'): no ^ required = rc(a1') - rc(a1'^a0) = 22 - 10 = 12
(Total of 12 ANDs so far.)
[slide figure: running root-count table]

Pre-processing costs?
[slide figure: running root-count table for the b3 pairs]

Pre-processing costs?
[slide figure: running root-count table for the remaining b-attribute pairs]

Pre-processing costs?
rc(a3^b3') = rc(a3) - rc(a3^b3) = 7-5 = 2
rc(a3'^b3) = rc(b3) - rc(a3^b3) = 22-5 = 17
rc(a3'^b3') = total - rc(a3^b3) - rc(a3^b3') - rc(a3'^b3) = 56-5-2-17 = 32
Similarly, rc(a3^b2)=6, rc(a3^b2')=1, rc(a3'^b2)=28, rc(a3'^b2')=21; rc(a3^b1)=6, rc(a3^b1')=1, rc(a3'^b1)=29, rc(a3'^b1')=20; rc(a3^b0)=4, rc(a3^b0')=3, rc(a3'^b0)=29, rc(a3'^b0')=20.
[slide figure: running root-count table]

Pre-processing costs?
Similarly, rc(a2^b0)=15, rc(a2'^b0)=18, rc(a2^b0')=8, rc(a2'^b0')=15; rc(a2^b2)=19, rc(a2'^b2)=15, rc(a2^b2')=4, rc(a2'^b2')=18; rc(a2^b3)=14, rc(a2'^b3)=8, rc(a2^b3')=9, rc(a2'^b3')=25; rc(a2^b1)=16, rc(a2'^b1)=19, rc(a2^b1')=7, rc(a2'^b1')=14.
[slide figure: running root-count table]

Pre-processing costs?
Similarly, rc(a0^b0)=9, rc(a0'^b0)=24, rc(a0^b0')=13, rc(a0'^b0')=10; rc(a0^b1)=15, rc(a0'^b1)=20, rc(a0^b1')=7, rc(a0'^b1')=14; rc(a0^b2)=12, rc(a0'^b2)=22, rc(a0^b2')=10, rc(a0'^b2')=12; rc(a0^b3)=7, rc(a0'^b3)=15, rc(a0^b3')=15, rc(a0'^b3')=19; rc(a1^b1)=22, rc(a1'^b1)=13, rc(a1^b1')=12, rc(a1'^b1')=9; rc(a1^b0)=20, rc(a1'^b0)=13, rc(a1^b0')=14, rc(a1'^b0')=9; rc(a1^b3)=17, rc(a1'^b3)=5, rc(a1^b3')=17, rc(a1'^b3')=17; rc(a1^b2)=22, rc(a1'^b2)=12, rc(a1^b2')=12, rc(a1'^b2')=10.
(16 additional AND operations were required for the mixed attribute pairs. The total was 28 ANDs.)
[slide figure: completed root-count table]

With 4 ai^bj ANDs (i,j = 3,2) plus the 12 intra-attribute ANDs required for TV-contouring preprocessing, TV-contours plus 2-hi cell masks can be created by just plugging the right selection of the resulting rootcounts into a formula. For 3-hi cell masks, how many would be needed (in addition to the 12 for TV contouring)?
[slide figure: completed root-count table]

We can think of the preprocessing as filling RoloDex cards. (Note that this RoloDex is built to fit TV-analysis, i.e., 2-D cards with the primary one containing the dual-AND P-tree rootcounts needed for TV analysis: the 40 red and blue rcs in the figure.) The dual rc card holds the 104 black values shown; beyond it come triple rc slice cards, e.g., an a3-triple rc slice card and an a3b2-triple rc slice card, over the attributes and their complements (A+A').
[slide figure: RoloDex of root-count cards over attributes and complements]