pTrees (predicate Tree technologies) provide fast, accurate horizontal processing of compressed, data-mining-ready, vertical data structures.

Applications:

PINE (Podium Incremental Neighborhood Evaluator) uses pTrees for Closed k Nearest Neighbor Classification.

FAUST (Fast Accurate Unsupervised, Supervised Treemining) uses pTrees for classification and clustering of spatial data.

MYRRH (ManY-Relationship-Rule Harvester) uses pTrees for association rule mining of multiple relationships. [Figure: entities document, course, person; relationships Text, Enroll, Buy.]

PGP-D (Pretty Good Protection of Data) protects vertical pTree data. [Figure: key = array(offset, pad): 5,54 | 7,539 | 87,3 | 209,126 | 25,896 | 888,23 | ...]

ConCur (Concurrency Control) uses pTrees for ROCC and ROLL concurrency control.

DOVE (DOmain VEctors) uses pTrees for database query processing.
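PINE (Podium Incremental Neighborhood Evaluator) uses pTrees for Closed k Nearest Neighbor Classification (CkNNC).

First, 3NN using horizontal data to classify an unclassified sample, a = (0 0 0 0 0 0), given on the attributes a5, a6, a11, a12, a13, a14. The class label is a10 = C. The distance is the number of those six attributes in which a tuple differs from a.

Training table, scanned top to bottom:

| Key | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 | a10=C | a11 | a12 | a13 | a14 | a15 | a16 | a17 | a18 | a19 | a20 | distance from a | action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| t12 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 2 | initial 3NN area |
| t13 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | initial 3NN area |
| t15 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | initial 3NN area |
| t16 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 2 | don't replace |
| t21 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 4 | don't replace |
| t27 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 4 | don't replace |
| t31 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 3 | don't replace |
| t32 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 3 | don't replace |
| t33 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 2 | don't replace |
| t35 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 3 | don't replace |
| t51 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 2 | don't replace |
| t53 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | replace (t53 takes the place of t15) |
| t55 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | don't replace |
| t57 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | don't replace |
| t61 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 3 | don't replace |
| t72 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 2 | don't replace |
| t75 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | don't replace |

Area for the 3 nearest nbrs after the scan:

| Key | a5 | a6 | a10=C | a11 | a12 | a13 | a14 | distance from a=000000 |
|---|---|---|---|---|---|---|---|---|
| t12 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 2 |
| t13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| t53 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |

C=1 wins (2 votes to 1)!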
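The one-pass horizontal scan above can be sketched in Python. This is only an illustration under stated assumptions: each tuple is reduced to a 6-character bit string over (a5, a6, a11, a12, a13, a14), the function name `three_nn` is hypothetical, and the rule for which of several equally far neighbors gets replaced (the most recently stored one) is an assumption chosen so that the trace matches the slide, where t53 takes the place of t15.

```python
from collections import Counter

# (key, bits for a5 a6 a11 a12 a13 a14, class a10=C), copied from the table.
TRAIN = [
    ("t12", "000110", 1), ("t13", "000100", 1), ("t15", "000101", 1),
    ("t16", "001010", 1), ("t21", "111010", 1), ("t27", "110011", 1),
    ("t31", "101010", 1), ("t32", "100110", 1), ("t33", "100100", 1),
    ("t35", "100101", 1), ("t51", "001010", 0), ("t53", "000100", 0),
    ("t55", "000101", 0), ("t57", "000011", 0), ("t61", "101010", 0),
    ("t72", "000110", 0), ("t75", "000101", 0),
]

def hamming(x, y):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(p != q for p, q in zip(x, y))

def three_nn(sample, train, k=3):
    """Single scan: keep k neighbors, replace only when strictly closer."""
    nbrs = []  # entries are (distance, key, class)
    for key, bits, cls in train:
        d = hamming(sample, bits)
        if len(nbrs) < k:
            nbrs.append((d, key, cls))
            continue
        # Farthest stored neighbor; ties go to the latest slot (assumption).
        worst = max(range(k), key=lambda i: (nbrs[i][0], i))
        if d < nbrs[worst][0]:
            nbrs[worst] = (d, key, cls)
    return nbrs

nbrs = three_nn("000000", TRAIN)
winner = Counter(cls for _, _, cls in nbrs).most_common(1)[0][0]
print(nbrs, "winner: C =", winner)
```

Because a tuple at the same distance as the current farthest neighbor never replaces it, the result depends on scan order, which is the weakness the closed variant addresses.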
Next, C3NN (Closed 3NN) using horizontal data: a second pass is necessary to find all other voters that are at distance 2 from a.

Unclassified sample: a = (0 0 0 0 0 0).

3NN set after 1st scan:

| Key | a5 | a6 | a10=C | a11 | a12 | a13 | a14 | distance |
|---|---|---|---|---|---|---|---|---|
| t12 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 2 |
| t13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| t53 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |

Vote after 1st scan: C=1 leads, 2 votes to 1.

Second scan of the same 17-tuple training table (only the relevant attributes are shown):

| Key | a5 | a6 | a10=C | a11 | a12 | a13 | a14 | d | second-pass decision |
|---|---|---|---|---|---|---|---|---|---|
| t12 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 2 | already voted |
| t13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | already voted |
| t15 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | include it also |
| t16 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 2 | include it also |
| t21 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 4 | don't include |
| t27 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 4 | don't include |
| t31 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 3 | don't include |
| t32 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 3 | don't include |
| t33 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | include it also |
| t35 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 3 | don't include |
| t51 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 2 | include it also |
| t53 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | already voted |
| t55 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | include it also |
| t57 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | include it also |
| t61 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 3 | don't include |
| t72 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 | include it also |
| t75 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | include it also |

C=0 wins now (6 votes for C=0 against 5 for C=1)!
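A minimal sketch of the two-pass closed 3NN on horizontal data follows. It assumes the same 6-bit encoding of (a5, a6, a11, a12, a13, a14) as the table, and the function name `closed_knn` is hypothetical. The first pass only determines the distance of the k-th nearest tuple; the second pass lets every tuple within that distance vote.

```python
from collections import Counter

# (key, bits for a5 a6 a11 a12 a13 a14, class a10=C), copied from the table.
TRAIN = [
    ("t12", "000110", 1), ("t13", "000100", 1), ("t15", "000101", 1),
    ("t16", "001010", 1), ("t21", "111010", 1), ("t27", "110011", 1),
    ("t31", "101010", 1), ("t32", "100110", 1), ("t33", "100100", 1),
    ("t35", "100101", 1), ("t51", "001010", 0), ("t53", "000100", 0),
    ("t55", "000101", 0), ("t57", "000011", 0), ("t61", "101010", 0),
    ("t72", "000110", 0), ("t75", "000101", 0),
]

def hamming(x, y):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(p != q for p, q in zip(x, y))

def closed_knn(sample, train, k=3):
    """Return every (distance, key, class) within the k-th nearest distance."""
    scored = [(hamming(sample, bits), key, cls) for key, bits, cls in train]
    # Pass 1: the distance of the k-th nearest neighbor fixes the radius.
    radius = sorted(d for d, _, _ in scored)[k - 1]
    # Pass 2: every tuple at or inside that radius votes.
    return [t for t in scored if t[0] <= radius]

voters = closed_knn("000000", TRAIN)
votes = Counter(cls for _, _, cls in voters)
print(len(voters), "voters;", dict(votes))
```

On this data the radius is 2, so all eleven tuples at distance 1 or 2 vote, and the outcome no longer depends on the order in which tuples are scanned.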
PINE: a Closed 3NN method using pTrees (vertical data structures).

pTree-based C3NN goes as follows: first let all training points at distance=0 vote, then distance=1, then distance=2, ... until 3 votes are cast.

For distance=0 (exact matches), construct the pTree Ps, then AND it with PC and PC' to compute the vote.

Result: no neighbors at distance=0.

[Figure: vertical bit columns a5, a6, a11, a12, a13, a14, C, C' and the resulting Ps, over the keys t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75; the full vertical layout has one column per attribute a1-a9, a11-a20.]
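The distance=0 step can be illustrated with flat bit vectors. This is a sketch, not the pTree implementation: real pTrees are compressed tree structures, whereas here each vertical column is simply packed into a Python int (bit r stands for row r), and the names `column`, `match` and `exact_match_mask` are hypothetical helpers.

```python
KEYS = ("t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 "
        "t51 t53 t55 t57 t61 t72 t75").split()
# Horizontal rows over (a5, a6, a11, a12, a13, a14), copied from the table.
ROWS = ["000110", "000100", "000101", "001010", "111010", "110011",
        "101010", "100110", "100100", "100101", "001010", "000100",
        "000101", "000011", "101010", "000110", "000101"]
CLS = [1] * 10 + [0] * 7          # class a10=C for each row, in KEYS order
ALL = (1 << len(KEYS)) - 1        # mask with one bit set per training row

def column(values):
    """Pack a column of 0/1 values into an int; bit r corresponds to row r."""
    mask = 0
    for r, v in enumerate(values):
        if v:
            mask |= 1 << r
    return mask

# One vertical bit vector per relevant attribute, plus the class vectors.
P = [column(int(row[i]) for row in ROWS) for i in range(6)]
PC = column(CLS)
PC_NOT = ALL & ~PC                # C' (rows with C=0)

def match(i, bit):
    """Rows whose attribute i equals bit (complement the column for 0)."""
    return P[i] if bit else ALL & ~P[i]

def exact_match_mask(sample):
    """Ps: AND of the per-attribute match vectors (distance 0 from sample)."""
    m = ALL
    for i, bit in enumerate(sample):
        m &= match(i, bit)
    return m

Ps = exact_match_mask([0, 0, 0, 0, 0, 0])
votes_c1 = bin(Ps & PC).count("1")
votes_c0 = bin(Ps & PC_NOT).count("1")
print(Ps, votes_c1, votes_c0)
```

The vote needs no scan of individual tuples: one AND per attribute builds Ps, and two more ANDs followed by a bit count give the tallies for C and C'.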
pTree-based C3NN: find all distance=1 nbrs.

Construct the pTree P_{D(s,1)} as the OR of six single-dimension pTrees:

\[
P_{D(s,1)} = \bigvee_{i \in \{5,6,11,12,13,14\}} P_i ,
\qquad
P_i = P_{|s_i - t_i| = 1;\ |s_j - t_j| = 0,\ j \neq i}
    = P_{S(s_i,1)\, \cap\, S(s_j,0)},
\quad j \in \{5,6,11,12,13,14\} - \{i\}.
\]

That is, P_i marks the training points t that differ from s in attribute i and agree with s in every other relevant attribute.

[Figure: the pTrees P5, P6, P11, P12, P13, P14 built from the columns a5, a6, a11, a12, a13, a14, their OR P_{D(s,1)}, and the class columns C and C' (a10 = C), over the keys t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75.]
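The distance=1 construction can be sketched with the same flat bit-vector stand-in for pTrees (an assumption; real pTrees are compressed). For each relevant attribute i, the sketch ANDs the "differs in i" vector with the "agrees in j" vectors for all other j, then ORs the six results. The helper names are hypothetical.

```python
KEYS = ("t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 "
        "t51 t53 t55 t57 t61 t72 t75").split()
# Horizontal rows over (a5, a6, a11, a12, a13, a14), copied from the table.
ROWS = ["000110", "000100", "000101", "001010", "111010", "110011",
        "101010", "100110", "100100", "100101", "001010", "000100",
        "000101", "000011", "101010", "000110", "000101"]
CLS = [1] * 10 + [0] * 7
ALL = (1 << len(KEYS)) - 1

def column(values):
    """Pack a column of 0/1 values into an int; bit r corresponds to row r."""
    mask = 0
    for r, v in enumerate(values):
        if v:
            mask |= 1 << r
    return mask

P = [column(int(row[i]) for row in ROWS) for i in range(6)]
PC = column(CLS)

def match(i, bit):
    """Rows whose attribute i equals bit."""
    return P[i] if bit else ALL & ~P[i]

def ring1(sample):
    """OR over i of (differs in attribute i AND agrees in all others)."""
    result = 0
    for i in range(len(sample)):
        m = match(i, 1 - sample[i])          # |s_i - t_i| = 1
        for j in range(len(sample)):
            if j != i:
                m &= match(j, sample[j])     # |s_j - t_j| = 0
        result |= m
    return result

def keys_of(mask):
    """Translate a row mask back into tuple keys."""
    return [k for r, k in enumerate(KEYS) if (mask >> r) & 1]

d1 = ring1([0, 0, 0, 0, 0, 0])
print(keys_of(d1), "C=1:", bin(d1 & PC).count("1"),
      "C=0:", bin(d1 & ALL & ~PC).count("1"))
```

On the slide's data this yields one vote for each class, so fewer than 3 votes have been cast and the method has to continue to distance 2.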
pTree-based C3NN, dist=2 nbrs: OR{all double-dim interval pTrees}.

\[
P_{D(s,2)} = \bigvee_{i,j} P_{i,j} ,
\qquad
P_{i,j} = P_{S(s_i,1)\, \cap\, S(s_j,1)\, \cap\, S(s_k,0)},
\quad i,j \in \{5,6,11,12,13,14\},\ \ k \in \{5,6,11,12,13,14\} - \{i,j\}.
\]

The 15 pair pTrees are P5,6 P5,11 P5,12 P5,13 P5,14 P6,11 P6,12 P6,13 P6,14 P11,12 P11,13 P11,14 P12,13 P12,14 P13,14.

[Figure: the 15 pair pTrees built from the columns a5, a6, a11, a12, a13, a14, with the class column a10 = C, over the keys t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75.]

Partway through the distance=2 step we have 3 nearest nbrs. We could quit and declare C=1 the winner?

After ORing all the pair pTrees we have the C3NN set and we can declare C=0 the winner!

PINE = CkNN in which all training samples vote, weighted by their nearness to a (~Olympic podiums).
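The ring-by-ring procedure (distance 0, then 1, then 2, until at least 3 votes are cast) can be sketched as one loop. As before, flat Python-int bit vectors stand in for compressed pTrees, and `ring` and `closed_vote` are hypothetical names. The sketch counts every vote equally; the nearness weights that PINE applies are not given on the slide, so none are assumed here.

```python
from collections import Counter
from itertools import combinations

KEYS = ("t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 "
        "t51 t53 t55 t57 t61 t72 t75").split()
# Horizontal rows over (a5, a6, a11, a12, a13, a14), copied from the table.
ROWS = ["000110", "000100", "000101", "001010", "111010", "110011",
        "101010", "100110", "100100", "100101", "001010", "000100",
        "000101", "000011", "101010", "000110", "000101"]
CLS = [1] * 10 + [0] * 7
ALL = (1 << len(KEYS)) - 1

def column(values):
    """Pack a column of 0/1 values into an int; bit r corresponds to row r."""
    mask = 0
    for r, v in enumerate(values):
        if v:
            mask |= 1 << r
    return mask

P = [column(int(row[i]) for row in ROWS) for i in range(6)]
PC = column(CLS)

def match(i, bit):
    """Rows whose attribute i equals bit."""
    return P[i] if bit else ALL & ~P[i]

def ring(sample, d):
    """Rows at exactly distance d: OR over every choice of d differing attrs."""
    result = 0
    for flipped in combinations(range(len(sample)), d):
        m = ALL
        for i, s in enumerate(sample):
            m &= match(i, 1 - s if i in flipped else s)
        result |= m
    return result

def closed_vote(sample, k=3):
    """Add whole rings of voters until at least k votes have been cast."""
    votes = Counter()
    for d in range(len(sample) + 1):
        m = ring(sample, d)
        votes[1] += bin(m & PC).count("1")
        votes[0] += bin(m & ALL & ~PC).count("1")
        if sum(votes.values()) >= k:
            break
    return votes

sample = [0, 0, 0, 0, 0, 0]
d2 = ring(sample, 2)
votes = closed_vote(sample)
print(bin(d2).count("1"), "tuples at distance 2;", dict(votes))
```

Because a ring is always added in full, the tally is the closed neighbor set rather than an arbitrary 3 of the tied distance-2 tuples. A podium-style weighting could scale each ring's count before it is added, but the weights themselves would have to come from the PINE definition.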