PDR PTreeSet Distribution Revealer

Presentation transcript:

PDR PTreeSet Distribution Revealer

For every SpTS, S, in a PTreeSet, PDR produces a Distribution Tree (DT), rooted at depth = 0. The depth of DT(S) is b ≡ BitWidth(S). With h = depth and k = node offset, Node_h,k holds a pointer to the pTree of {x in S | F(x) in [k*2^(b-h), (k+1)*2^(b-h))} together with its 1-count.

Example table X(x1, x2):

  X    x1  x2      X    x1  x2      X    x1  x2
  p1    1   1      p6    9   3      pb   10   9
  p2    3   1      p7   15   1      pc   11  10
  p3    2   2      p8   14   2      pd    9  11
  p4    3   3      p9   15   3      pe   11  11
  p5    6   2      pa   13   4      pf    7   8

(The slide also shows the scatter plot of p1..pf, the projected values xofM = 11 27 23 34 53 80 118 114 125 110 121 109 83, the basic pTrees p6..p0 with their complements p6'..p0', and the DT node 1-counts at depths 1-4, e.g. 5/64 on [0,64) and 10/64 on [64,128) at depth 1; 3/32 on [0,32), 2/32 on [32,64), 2/32 on [64,96) and 1/4 on [96,128) at depth 2; and so on down to the width-8 intervals at depth 4.)

Pre-compute and enter into the ToC all DT(Yk), plus the DTs of selected linear functionals (e.g., d = main diagonals, ModeVector). Suggestion: in our pTree-base, every pTree (basic, mask, ...) should be referenced in ToC(pTree, pTreeLocationPointer, pTreeOneCount), and these OneCounts should be repeated everywhere (e.g., in every DT as defined above). The reason is that these OneCounts help us select the pertinent pTrees to access; in fact, they are often all we need to know about a pTree to get the answers we are after.
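The DT is just the interval 1-counts of F(x) at successively finer resolutions. As a minimal sketch of that bookkeeping (a Python illustration that assumes nothing about the actual pTree implementation), the counts at depth h are obtained by bucketing F(x) into the 2^h intervals of width 2^(b-h):

# Minimal sketch (not the authors' implementation): build the interval
# 1-counts of a Distribution Tree (DT) for a column of non-negative
# integers F(x) with bit width b.  Node (h, k) covers the interval
# [k*2**(b-h), (k+1)*2**(b-h)) and stores the number of rows falling in it.

def build_dt_counts(values, bit_width, max_depth):
    """Return {depth: [count for each of the 2**depth intervals]}."""
    dt = {}
    for h in range(max_depth + 1):
        width = 2 ** (bit_width - h)          # interval width at depth h
        counts = [0] * (2 ** h)
        for v in values:
            counts[v // width] += 1           # k = v // 2**(b-h)
        dt[h] = counts
    return dt

# Example: the projected values (xofM) listed on the slide, bit width 7.
xofM = [11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83]
print(build_dt_counts(xofM, bit_width=7, max_depth=3))
# depth 1 gives the [0,64) / [64,128) split, depth 2 the four width-32
# intervals, depth 3 the eight width-16 intervals, and so on.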

Assume an SpTS (a vertical column of numbers in bit-slice format) produced by a linear functional F(y) = y∘d for some unit vector d (e.g., d = e_k). Cut at gaps, but more generally at all 25% count changes. Why? Big Data may produce no gaps (remote clusters fill in local gaps). However, remote clusters will almost never fill in the local cluster gaps seamlessly (with no abrupt change in the count distribution). "The linear projection of any cluster boundary will produce a noticeable count change" and "every noticeable count change reveals a cluster boundary." Note that a gap is just two consecutive count changes (down to 0 and back up). The 25% threshold is a parameter to be studied (perhaps varied by dataset?). What we see from this preliminary look at the SEEDS dataset (from the UCI MLR) is that there are few gaps but many count changes, which reveal sub-cluster information. It looks like 25% is a better parameter value than 45% for SEEDS.

Value distribution of SEEDS column 1 (value : count):
  11:18  12:25  13:18  14:18  15:15  16:13  17:8  18:8  19:21  20:2  21:4

Cutting at >25% count changes (class counts lo / me / hi per interval):
  [11,11]  0  0 18
  [12,12]  1  0 24
  [13,16] 48  8  8
  [16,18]  8 21  0
  [19,19]  0 21  0
  [20,20]  0  2  0
  [21,21]  0  4  0
Note: NO GAPS, but the >25% count changes give 6 cut points, i.e., 7 clusters.

SEEDS column 2 (value : count):  5:68  6:75  7:7
  [5,6] 50 43 50
  [7,7]  0  7  0

SEEDS column 3 (value : count):  1:11  2:26  3:30  4:42  5:24  6:9  7:5  8:3
  [1,1] 11  0  0
  [2,3] 22  6  2
  [4,4]  8 16 18
  [5,5]  1  9 14
  [6,6]  1  2  6
  [7,7]  1  1  3
  [8,8]  0  0  3
  Coarser cuts: [1,1] 11 0 0;  [2,5] 37 47 38;  [6,8] 2 2 12

SEEDS column 4 (value : count):  5:99  6:51
  [5,5] 48  1 50
  [6,6]  2 49

For comparison, cutting only where a count and its successor differ by more than 45% of the higher count:
  SEEDS column 1:  [11,18] 50 15 50;  [19,19] 0 21 0;  [20,21] 0 6 0
  SEEDS column 2:  [5,6] 50 43 50;  [7,7] 0 7 0
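As a concrete reading of the cut rule (a minimal sketch under the assumption that "a 25% count change" means adjacent value-counts differing by more than 25% of the larger of the two; the slides give no code):

# Minimal sketch: find cut points in a projected column by flagging every
# place where adjacent value-counts change by more than `threshold` (25%
# here) of the larger count.  A gap is simply two consecutive changes
# (a drop to 0 followed by a rise).

from collections import Counter

def count_change_cuts(values, threshold=0.25):
    """Return the projected values at which a >threshold count change occurs."""
    counts = Counter(values)
    ordered = sorted(counts)                      # distinct values, ascending
    cuts = []
    for a, b in zip(ordered, ordered[1:]):
        ca, cb = counts[a], counts[b]
        if abs(ca - cb) > threshold * max(ca, cb):
            cuts.append(b)                        # cut just before value b
    return cuts

# SEEDS column 1 distribution from the slide (value: count).
seeds1 = {11: 18, 12: 25, 13: 18, 14: 18, 15: 15, 16: 13,
          17: 8, 18: 8, 19: 21, 20: 2, 21: 4}
column = [v for v, c in seeds1.items() for _ in range(c)]
print(count_change_cuts(column, 0.25))

With threshold 0.25 this flags 6 cut points (7 clusters) for the SEEDS column 1 distribution above, in line with the slide's "NO GAPS, but >25% count changes" observation.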

Cut at all 25% Count Changes

[These two slides are a dense grid of value-distribution tables (value, count, gap) and interval / class-count tables for columns of the SEEDS, IRIS, WINE and CONCRETE datasets, with each interval labeled by the sub-cluster (C1, C2, C3, C11, C31, C311, C321, ...) that recursive cutting produces. The table layout does not survive extraction; the recoverable remarks are kept below.]

- Cutting SEEDS column 1 at >25% count changes gives clusters C1..C7; C3 is then sub-clustered with column 2 (giving C31, C32), C31 with column 3 (giving C311), and C311 / C321 with column 4.
- An IRIS sub-cluster C311 consists of i34 (63 28 51 15) and e34 (60 27 51 16); these are only 3.32 apart (a small gap), so there won't be any count change between them (the counts are 1 and 1). Their per-column gap tables give ACCR 100%.
- Accuracy 140/150 = 93.3% to 96.67% (roughly equal to GM).
- On the first CONCRETE column the cuts are "hopeless!".
- On WINE, "none of the 6 L were isolated."
- Accuracy 132/150 = 88% (~5% above GM) and ACC 135/150 = 90% (~10% above GM) on the remaining branches.
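The two slides above apply the rule recursively: each cluster found on one column is re-cut on another column until no >25% change remains. A minimal sketch of that recursion (my own illustration, reusing count_change_cuts() from the earlier sketch; not the slides' code):

# Minimal sketch: recursively split a dataset by applying count-change
# cuts to one column at a time, the way the slides sub-cluster
# C1, C2, C3, ... with successive attributes.

import bisect

def recursive_cut_cluster(rows, columns, threshold=0.25):
    """rows: list of tuples; columns: column indexes to cut on, in order."""
    if not columns or len(rows) <= 1:
        return [rows]                                  # leaf cluster
    col, rest = columns[0], columns[1:]
    cuts = sorted(count_change_cuts([r[col] for r in rows], threshold))
    if not cuts:
        return recursive_cut_cluster(rows, rest, threshold)
    buckets = {}
    for r in rows:
        k = bisect.bisect_right(cuts, r[col])          # interval index of r
        buckets.setdefault(k, []).append(r)
    out = []
    for sub in buckets.values():
        out.extend(recursive_cut_cluster(sub, rest, threshold))
    return out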

So there is potential in analyzing the more general concept of Functional Distribution Changes, FDCs (a gap is a pair of consecutive FDCs). Let's now walk through storing SEEDS, CONCRETE, IRIS and WINE in a pTreeDataBase and using that pDB efficiently in this clustering algorithm (possibly requiring only the pre-computed DT info from the Catalogue and never requiring access to the pTrees themselves?).

The basic pDB storage object will be a PTreeSet, or PTS, which is a sequence of ScalarPTreeSets, or SPTSs, each of which is a sequence of pTrees, each of which is a BitArray (compressed or uncompressed, multi-level or single-level). An SPTS is a PTS with SequenceLength = 1. A pTree is an SPTS with SequenceLength = 1. So, an alternative definition: a PTS is a pTreeSequence together with a BitWidthSequence, PTS(pS, BWS), where Length(pS) = the sum over b of BWS_b.

An SPTS has associated with it, in the pDB Catalogue, its DT, its Avg, its Median, its Minimum and its Maximum. A pTree has associated with it, in the pDB Catalogue, its 1-count. We note that if all DTs are full (built down to singleton intervals), the DT contains every pre-computation, but such DTs are massive structures!
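A minimal sketch of those storage objects (names and shapes are assumptions, not the pDB's actual layout; it reuses build_dt_counts() from the first sketch for the catalogued DT):

# Minimal sketch: a pTree as a bit array with its 1-count cached, an SPTS
# as the bit-slices of one numeric column plus its catalogue entries
# (DT counts, avg, median, min, max), and a PTS as a sequence of SPTSs.

from statistics import mean, median

class PTree:
    def __init__(self, bits):            # bits: list of 0/1, one per row
        self.bits = bits
        self.one_count = sum(bits)       # cached 1-count, kept in the ToC

class SPTS:
    """Bit-sliced representation of one column of non-negative integers."""
    def __init__(self, column, bit_width):
        self.bit_width = bit_width
        self.slices = [PTree([(v >> b) & 1 for v in column])
                       for b in range(bit_width - 1, -1, -1)]   # high bit first
        # catalogue entries, pre-computed once at load time
        self.avg, self.med = mean(column), median(column)
        self.min, self.max = min(column), max(column)
        self.dt = build_dt_counts(column, bit_width, max_depth=3)

class PTS:
    """A table: one SPTS per column."""
    def __init__(self, columns, bit_widths):
        self.spts = [SPTS(c, w) for c, w in zip(columns, bit_widths)]

The point of caching the 1-count on every pTree and the DT, Avg, Median, Min and Max on every SPTS is exactly the ToC suggestion above: many questions can be answered, or at least pruned, from the catalogue without touching the bit arrays themselves.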

SPAETH example, Y(y1, y2):

  Y    y1  y2      Y    y1  y2      Y    y1  y2
  y1    1   1      y6    9   3      yb   10   9
  y2    3   1      y7   15   1      yc   11  10
  y3    2   2      y8   14   2      yd    9  11
  y4    3   3      y9   15   3      ye   11  11
  y5    6   2      ya   13   4      yf    7   8

(The slide repeats the PDR figure for this data: the scatter plot of y1..yf, the projected values xofM = 11 27 23 34 53 80 118 114 125 110 121 109 83, the basic pTrees p6..p0 with their complements, and the DT node 1-counts at depths 1-4.)
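For reference, the projections y∘d that the next slide bit-slices are ordinary dot products with a unit vector d. A minimal sketch on the SPAETH points above (the choice d = (1,1) normalized, the main diagonal, is just an example, not one of the slide's specific functionals):

# Minimal sketch: the dot-product functional F(y) = y o d for a unit
# vector d, producing the single column that PDR bit-slices and cuts.

import math

spaeth = {'y1': (1, 1), 'y2': (3, 1), 'y3': (2, 2), 'y4': (3, 3),
          'y5': (6, 2), 'y6': (9, 3), 'y7': (15, 1), 'y8': (14, 2),
          'y9': (15, 3), 'ya': (13, 4), 'yb': (10, 9), 'yc': (11, 10),
          'yd': (9, 11), 'ye': (11, 11), 'yf': (7, 8)}

def project(points, d):
    norm = math.hypot(*d)
    d = (d[0] / norm, d[1] / norm)                 # make d a unit vector
    return {k: p[0] * d[0] + p[1] * d[1] for k, p in points.items()}

# e.g. the main-diagonal functional (both components positive):
print(project(spaeth, (1, 1)))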

Assume a MainMemory pDB and that each bit position has an address.

[The slide shows the Spaeth table Y with its two attributes A1 and A2 bit-sliced into pTrees p13..p10 and p23..p20, plus the projections y∘d for d = the two main diagonals (DIAGnnxx, DIAGnxxn), d = furthest-point-to-Avg (furth_Avg, with A = Avg, M = Median) and d = Avg-to-Median (Avg_Med), each bit-sliced into pTrees p3..p0, together with the 1-count of every pTree. The ToC for the Spaeth MMpDB lists, for each of the 24 pTrees (labeled 1..9, a..o), its location pointer and its 1-count, and a KEY lists the pad lengths and pTree ids.]

The data portion of SpaethMMpDB is 495 bits: 24 pTrees of 15 bits each (360 data bits plus 135 pad bits). The key is 24 pTree ids plus 24 pad lengths of 5 bits each (or just randomly generate the pad-length array and send the seed?). If there were 15 trillion rows (not just 15), the ToC would stay the same size, the key would stay the same size, and the data array would grow to roughly 480 trillion bits (480 Tb = 60 TB), or smaller since the pads can stay small (~30 TB?). Next, put the DTs in the ToC?
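A minimal sketch of that layout (my own guess at the mechanics, not the MMpDB's actual format): pTrees are written into one flat bit array separated by random pads, the ToC records each pTree's location pointer and 1-count, and the pad lengths form the key:

# Minimal sketch: pack pTrees into one flat bit array with random pads
# between them, keeping a ToC of (location pointer, 1-count) per pTree.

import random

def pack_ptrees(ptrees, max_pad=31):              # a pad length fits in 5 bits
    """ptrees: list of bit lists. Returns (data bit array, ToC, key)."""
    data, toc, key = [], [], []
    for bits in ptrees:
        pad = random.randint(0, max_pad)
        data.extend(random.randint(0, 1) for _ in range(pad))   # random pad bits
        toc.append((len(data), sum(bits)))        # (location pointer, 1-count)
        key.append(pad)                           # pad lengths form the key
        data.extend(bits)
    return data, toc, key

def fetch(data, toc, i, bit_width=15):
    """Read back pTree i using its ToC location pointer."""
    start, _one_count = toc[i]
    return data[start:start + bit_width]

Pad lengths of at most 31 fit in the 5 bits the slide allots per pad, and with 24 fifteen-bit pTrees this reproduces the 360 data bits plus pad bits described above.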

(Final slide: the projection y∘d_fA of the Spaeth points onto the furthest-point-to-Avg unit vector, with values 1.34 3.13 2.68 4.02 5.36 9.39 13.8 13.4 14.7 12.9 14.2 9.82, its basic pTrees p6..p0 with their complements, and the resulting DT node 1-counts at depths 1-4, repeating the PDR figure for this functional.)