Presentation on theme: "FAUST Oblique Analytics are based on a linear or dot product, ∘. Let X(X1...Xn) be a table. FAUST Oblique analytics employ..." — Presentation transcript:

1 FAUST Oblique Analytics are based on a linear or dot product, ∘. Let X(X1...Xn) be a table.
FAUST Oblique analytics employ the ScalarPTreeSet (SPTS) of a valueTree, X∘D ≡ Σk=1..n Xk*Dk, where D=(D1...Dn) is a fixed vector.

FAUST Count Change (FC2) for clustering: Choose a nextD recursion plan to specify which D to use at each recursive step, e.g., if a cluster, C, needs further partitioning:
a. D = the diagonal producing the maximum standard deviation, STD(C), or the maximum STD(C)/Spread(C);
b. AM(C) (Average-to-Median);
c. AFFA(C) (Average-to-FurthestFromAverage) [or FFAFFF(C) (FurthestFromAverage-to-FurthestFromFurthest)];
d. cycle through the diagonals e1,...,en, e1±e2,...; or cycle through AM, AFFA, FFAFFF; or cycle through both.
Choose a DensityThreshold (DT), a DensityUniformityThreshold (DUT), and a PrecipitousCountChange (PCC) definition (PCCs include gaps).
ALGORITHM: If DT (and DUT) are not exceeded at a cluster, C, partition C by cutting at each PCC in C∘D using the nextD.

FAUST Polygon Prediction (FP2) for 1-class or multi-class classification: Let Xn+1 = the class-label column, C. For each vector, D, and each class k, let lD,k ≡ min(Ck∘D) (or the 1st Precipitous Count Increase, PCI?) and hD,k ≡ max(Ck∘D) (or the last PCD?).
ALGORITHM: y is declared to be class k iff y ∈ Hullk, where Hullk = {z | lD,k ≤ D∘z ≤ hD,k for all D}. (If y is in multiple hulls, Hi1..Hih, y is a Ck for the k maximizing OneCount{PCk & PHi1 & .. & PHih}, or fuzzy-classify using those OneCounts as k-weights.)

Outlier Mining can mean:
1. Given a set of n objects and given a k, find the top k objects in terms of dissimilarity from the rest of the objects.
1.a This could mean the k objects, xh (h=1..k), most dissimilar from [distant from] their individual complements, X-{xh}, or
1.b the top "set of k objects", Sk, for which that set is most dissimilar from its complement, X-Sk.
2. Given a training set, identify outliers in each class (correctly classified but noticeably dissimilar to fellow class members).
3. Determine "fuzzy" clusters, i.e., assign a weight to each (object, cluster) pair. (A dendrogram does that to some extent.)
Note: FC2 is a good outlier detector, since it identifies and removes large clusters so that small clusters (outliers) appear. FAUST Distance Analytics use the SPTS of the distance valueTree, SquareDistanceToNearestNeighbor (D2NN). FAUST Outlier Observer (FO2) uses D2NN. (L2 or Euclidean distance is best, but L∞ works too.) D2NN provides an instantaneous k-slider for 1.a (find the k objects, x, most dissimilar from X-{x}); it is useful for the others too. I say "instantaneous" because the Univariate Distribution Revealer on D2NN takes log2(n) time (one time only); then the slider works instantaneously off the high end of that distribution.
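The D2NN idea above can be illustrated without pTrees as a brute-force baseline (a minimal sketch; the function names are hypothetical, and the real FAUST construction would compute D2NN via SPTS arithmetic rather than pairwise loops):

```python
# Baseline sketch of D2NN (squared distance to nearest neighbor) and the
# 1.a-type "top-k outlier" slider built on its distribution.
# Hypothetical names; FAUST builds D2NN from pTree/SPTS arithmetic instead.

def d2nn(X):
    """For each row x, the squared L2 distance to its nearest other row."""
    out = []
    for i, x in enumerate(X):
        best = min(
            sum((a - b) ** 2 for a, b in zip(x, y))
            for j, y in enumerate(X) if j != i
        )
        out.append(best)
    return out

def top_k_outliers(X, k):
    """Indices of the k rows most dissimilar from their complements X-{x}."""
    scores = d2nn(X)
    return sorted(range(len(X)), key=lambda i: scores[i], reverse=True)[:k]

X = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]
print(top_k_outliers(X, 1))  # the isolated point (10,10) has the largest D2NN
```

Sliding k simply takes more indices off the high end of the (already sorted) D2NN distribution, which is what makes the slider "instantaneous" once D2NN is built.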

2 XoD=k=1,2Xk*Dk with pTrees: qN..q0, N=22B+roof(log2n)+2B+1
FLC Clusterer: Choose a nextD plan, a Density threshold (DT), a DensityUniformity threshold (DUT), and a PrecipitousCountChange (PCC) definition. If DT (and/or DUT) are not exceeded at C, partition C further by cutting at each gap and PCC in C∘D using the nextD.

For a table X(X1...Xn), the SPTS Xk*Dk is the column of numbers xk*Dk, and X∘D is the sum of those SPTSs, Σk=1..n Xk*Dk.

Xk*Dk = Dk·Σb 2^b·pk,b = Dk·(2^B·pk,B +..+ 2^0·pk,0) = (2^B·pk,B +..+ 2^0·pk,0)·(2^B·Dk,B +..+ 2^0·Dk,0), so

X∘D = Σk=1..n Xk*Dk = Σk=1..n (  2^(2B)·Dk,B·pk,B
+ 2^(2B-1)·( Dk,B·pk,B-1 + Dk,B-1·pk,B )
+ 2^(2B-2)·( Dk,B·pk,B-2 + Dk,B-1·pk,B-1 + Dk,B-2·pk,B )
+ 2^(2B-3)·( Dk,B·pk,B-3 + Dk,B-1·pk,B-2 + Dk,B-2·pk,B-1 + Dk,B-3·pk,B )
. . .
+ 2^3·( Dk,3·pk,0 + Dk,2·pk,1 + Dk,1·pk,2 + Dk,0·pk,3 )
+ 2^2·( Dk,2·pk,0 + Dk,1·pk,1 + Dk,0·pk,2 )
+ 2^1·( Dk,1·pk,0 + Dk,0·pk,1 )
+ 2^0·( Dk,0·pk,0 )  ).

[Worked example, B=1, n=2: X∘D = Σk=1,2 Xk*Dk = 2^2·Σk=1,2 Dk,1·pk,1 + 2^1·Σk=1,2 ( Dk,1·pk,0 + Dk,0·pk,1 ) + 2^0·Σk=1,2 Dk,0·pk,0; the slide traces the result pTrees q0, q1, q2 and their carries for two sample D vectors, but the bit tables are not recoverable from this transcript.]

A carryTree is a valueTree or vTree, as is the rawTree at each level (rawTree = valueTree before the carry is included). In what form is it best to carry the carryTree over (for speediest processing)? 1. As multiple pTrees added at the next level (since the pTrees at the next level are in that form and need to be added)? 2. As an SPTS, s1? (Next-level rawTree = SPTS s2; then combining s1 and s2 yields q_next_level and carry_next_level?)
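The bit-slice expansion above can be checked with a small sketch: represent each column Xk by its bit slices pk,b, weight the slice products by 2^(b+β)·Dk,β, and verify that the sum equals the ordinary row-wise dot products (a minimal illustration with plain Python lists standing in for pTrees; all names are hypothetical):

```python
# Verify X∘D = Σ_{k,b,β} 2^(b+β) · D_{k,β} · p_{k,b} per row,
# using plain lists of bits in place of pTrees.

def bit_slices(col, B):
    """p[b][i] = bit b of col[i], for b = 0..B."""
    return [[(v >> b) & 1 for v in col] for b in range(B + 1)]

def xodot_via_slices(X, D, B):
    n_rows, n_cols = len(X), len(D)
    cols = [[row[k] for row in X] for k in range(n_cols)]
    p = [bit_slices(cols[k], B) for k in range(n_cols)]
    d = [[(D[k] >> b) & 1 for b in range(B + 1)] for k in range(n_cols)]
    out = [0] * n_rows
    for k in range(n_cols):
        for b in range(B + 1):
            for beta in range(B + 1):
                w = (1 << (b + beta)) * d[k][beta]  # 2^(b+beta) * D_{k,beta}
                for i in range(n_rows):
                    out[i] += w * p[k][b][i]
    return out

X = [(1, 3), (2, 1), (3, 2)]
D = (1, 2)
B = 1  # values fit in B+1 = 2 bits
print(xodot_via_slices(X, D, B))                       # [7, 4, 7]
print([sum(x * d for x, d in zip(r, D)) for r in X])   # same values
```

The point of the pTree version is that each inner loop over rows becomes a single AND/count over a bit vector rather than a Python loop.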

3 D2NN(x) = Σk=1..n ( Σb=B..0 2^b·(xk,b - pk,b) )·( Σb=B..0 2^b·(xk,b - pk,b) ), writing ak,b ≡ xk,b - pk,b
FDO: Table X(X1...Xn). D2NN yields a 1.a-type outlier detector (top k objects, x, by dissimilarity from X-{x}). We install in D2NN each min[D2NN(x)]. (It's a one-time construction, but for a trillion x's it will take a while. Is there a massive parallelization scheme?)

D2NN(x): (x-X)∘(x-X) = Σk=1..n (xk-Xk)(xk-Xk) = Σk=1..n ( Σb=B..0 2^b·(xk,b - pk,b) )·( Σb=B..0 2^b·(xk,b - pk,b) ). Writing ak,b ≡ xk,b - pk,b,

= Σk=1..n ( Σb=B..0 2^b·ak,b )·( Σb=B..0 2^b·ak,b )
= Σk=1..n ( 2^B·ak,B + 2^(B-1)·ak,B-1 +..+ 2^1·ak,1 + 2^0·ak,0 )^2
= Σk=1..n (  2^(2B)·ak,B·ak,B
+ 2^(2B-1)·( ak,B·ak,B-1 + ak,B-1·ak,B )      { which is 2^(2B)·ak,B·ak,B-1 }
+ 2^(2B-2)·( ak,B·ak,B-2 + ak,B-1·ak,B-1 + ak,B-2·ak,B )      { which is 2^(2B-1)·ak,B·ak,B-2 + 2^(2B-2)·ak,B-1^2 }
+ 2^(2B-3)·( ak,B·ak,B-3 + ak,B-1·ak,B-2 + ak,B-2·ak,B-1 + ak,B-3·ak,B )      { which is 2^(2B-2)·( ak,B·ak,B-3 + ak,B-1·ak,B-2 ) }
+ 2^(2B-4)·( ak,B·ak,B-4 + ak,B-1·ak,B-3 + ak,B-2·ak,B-2 + ak,B-3·ak,B-1 + ak,B-4·ak,B )      { which is 2^(2B-3)·( ak,B·ak,B-4 + ak,B-1·ak,B-3 ) + 2^(2B-4)·ak,B-2^2 }
. . .
{ h odd:  2^(2B-h)·( ak,B·ak,B-h +..+ ak,B-h·ak,B ) = 2^(2B-h+1)·Σi=B..B-(h-1)/2 ak,i·ak,2B-h-i }
{ h even: 2^(2B-h)·( ak,B·ak,B-h +..+ ak,B-h·ak,B ) = 2^(2B-h+1)·Σi=B..B+1-h/2 ak,i·ak,2B-h-i + 2^(2B-h)·ak,B-h/2^2 }  ).

4 2^(2B-1)·( ak,B·ak,B-1 + ak,B-1·ak,B ), which is 2^(2B)·ak,B·ak,B-1
FDO: Table X(X1...Xn). D2NN yields a 1.a-type outlier detector (top k objects, x, by dissimilarity from X-{x}). We install in D2NN each min[D2NN(x)]. (It's a one-time construction, but for a trillion x's it's slow. Parallelization?)

D2NN(x) = Σk=1..n (xk-Xk)(xk-Xk) = Σk=1..n ( Σb=B..0 2^b·ak,b )^2, with ak,b ≡ xk,b - pk,b. Collecting the cross terms as on the previous slide and regrouping by power of 2:

= Σk (  2^(2B)·( ak,B^2 + ak,B·ak,B-1 )
+ 2^(2B-1)·( ak,B·ak,B-2 )
+ 2^(2B-2)·( ak,B-1^2 + ak,B·ak,B-3 + ak,B-1·ak,B-2 )
+ 2^(2B-3)·( ak,B·ak,B-4 + ak,B-1·ak,B-3 )
+ 2^(2B-4)·ak,B-2^2
+ . . .  ).

5 FLC on IRIS150. DT=1. PCC definition: PCCs must involve a high ≥ 5 and be at least a 60% change (≥ 2 if the high = 5) from that high. Gaps must be ≥ 3.

1st round, D=1,-1,1,1 (highest STD/Spread and highest Spread = 121): cuts give (50 0 0), C1=(0 25 2), (0 9 0), C2, and (0 0 37); 91.3% accurate after the 1st round. 2nd round: on C2, D = the direction ⊥ to 1,-1,1,1 with the highest STD/Spread, giving C21=(0 3 10), (0 2 1), (0 11 0); on C1, (0 0 1), (0 2 0), (0 23 0), (0 0 1); 97.3% accurate after the 2nd round. [The full F/Count/Gap tables are not recoverable from this transcript.]

For big datasets, gaps will not appear, so we will have to rely on PCCs only. Next I look at IRIS using only PCCs, hoping to find that it doesn't matter which D we pick in that case, except for speed and cleanliness.

PCCs only, 1st round, D=1,-1,1,1: (49 0 0), C1=(1 25 2), (0 9 0), C2, (0 0 37); 14 errors after the 1st round.
PCCs only, 1st round, D=0,0,0,1 (highest STD/Spread = 0.31, tied with columns 3 and 4): (14 0 0), (6 0 0), (28 0 0), (2 0 0), (0 7 0), (0 8 0), (0 30 3), (0 4 2), (0 1 31), (0 0 14); 6 errors after the 1st round.
PCCs only, 1st round, D=0,0,1,0 (STD/Spread = 0.29): (4 0 0), ..., (0 0 24). NOTE: [3,8)=(44 0 0), [8,36)=(2 36 2), [36,38)=(0 8 0).
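The cut rule used in these rounds (gaps of at least 3, plus precipitous count changes) can be sketched as a 1-D routine over a cluster's projection onto D (a simplified sketch: the threshold values mirror the slide, but the exact FAUST PCC definition is richer than this stand-in):

```python
# Cut a cluster's 1-D projection at gaps (width >= gap_min) and at
# precipitous count decreases (drop of >= pcc_frac from a high >= hi_min).
# Simplified stand-in for the FLC cut rule; details are assumptions.
from collections import Counter

def cut_points(proj, gap_min=3, hi_min=5, pcc_frac=0.60):
    counts = Counter(proj)
    lo, hi = min(proj), max(proj)
    cuts, prev = [], None
    for v in range(lo, hi + 1):
        c = counts.get(v, 0)
        if prev is not None:
            if prev >= hi_min and c <= prev * (1 - pcc_frac):
                cuts.append(v)            # precipitous count decrease
            elif c == 0 and all(counts.get(u, 0) == 0
                                for u in range(v, min(v + gap_min, hi))):
                if counts.get(v - 1, 0) > 0:
                    cuts.append(v)        # start of a gap of width >= gap_min
        prev = c
    return cuts

proj = [1, 1, 2, 2, 2, 3, 3, 2, 9, 9, 10]   # a wide gap between 3 and 9
print(cut_points(proj))
```

Each returned value is a cut position; the cluster is partitioned into the sub-intervals between consecutive cuts, and the recursion continues with the nextD.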

6 NSTD ≡ STD(Xk - min Xk) / Spread(Xk)?
FLC on SEED. DT=1. PCCs have to involve a high of at least 5 and be at least a 60% change from that high. Gaps must be ≥ 3. NSTD ≡ STD(Xk - min Xk) / Spread(Xk)?

The Ultimate PCC (UPCC) Clusterer algorithm? Set Dk to 1 for each column with NSTD > a threshold NSTD (NSTDT = 0.25). Shift the X column values as in Gap Preservation above, giving the shifted table Y. Make gap and PCC cuts on Y∘D. If Density < DT at a dendrogram node, C (a cluster), partition C at each gap and PCC in C∘D using the next D in the recursion plan.

Using UPCC with D=1111, first round only: C1=(44L,1M,47H), C2=(1L,0M,0H), C3=(3L,9M,3H), C4=(2L,31M,0H), C5=(0L,9M,0H).
Using UPCC with D=1001, first round only: C1=(42L,1M,50H), C2=(0L,22M,0H), C3=(8L,21M,0H), C4=(0L,6M,0H).

2nd round with D=AFFA, 85% accurate: C1=(50L,22M,50H), C2=(0L,21M,0H), C3=(0L,7M,0H); C1 splits into C1,1=(0L,4M,0H), C1,2=(6L,17M,0H), C1,3=(13L,1M,2H), C1,4=(5L,0M,0H), C1,5=(5L,0M,0H), C1,6=(16L,0M,6H), C1,7=(2L,0M,12H), C1,8=(3L,0M,28H).

3rd round with D=0010: C1,2 splits into C1,2,1=(4L,0M,0H) and (2L,17M,0H); C1,6 into C1,6,1=(2L,0M,0H), C1,6,2=(14L,0M,2H), C1,6,3=(0L,0M,4H); C1,8 into C1,8,1=(3L,0M,0H), C1,8,2=(0L,0M,28H); C1,3 into C1,3,1=(3L,0M,0H), C1,3,2=(9L,0M,0H), (0L,1M,0H), C1,3,3=(0L,0M,2H). [The interleaved F/Count/Gap tables, error counts, spreads, and the final accuracy percentage are not recoverable from this transcript.]
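The NSTD-based choice of D above (Dk = 1 exactly for columns whose normalized standard deviation exceeds NSTDT = 0.25) can be sketched as follows (function names are hypothetical; only the rule itself comes from the slide):

```python
# Pick D by setting D_k = 1 for each column whose normalized standard
# deviation NSTD = std(X_k) / spread(X_k) exceeds a threshold (NSTDT = 0.25,
# as on the slide). A sketch; helper names are hypothetical.
import statistics

def nstd(col):
    spread = max(col) - min(col)
    return statistics.pstdev(col) / spread if spread else 0.0

def choose_D(X, nstdt=0.25):
    n_cols = len(X[0])
    cols = [[row[k] for row in X] for k in range(n_cols)]
    return [1 if nstd(c) > nstdt else 0 for c in cols]

cols = [
    [0, 10, 0, 10, 0, 10, 0, 10, 0, 10, 0, 10],   # bimodal: high NSTD
    [0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10],        # clustered: low NSTD
]
X = list(zip(*cols))
print(choose_D(X))  # -> [1, 0]
```

Columns whose values spread widely relative to their range get Dk = 1, so the projection Y∘D weights exactly the informative columns.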

7 NSTD ≡ STD(Xk - min Xk) / Spread(Xk)?
FLC on CONC. CONC counts are L=43, M=52, H=... PCCs: high ≥ 5 and a ≥ 60% change from that high. Gaps: ≥ 3. NSTD ≡ STD(Xk - min Xk) / Spread(Xk)?

D=1111, 1st round, 53% accurate: clusters include (2 0 0), C2=(2 2 0), C3, C4=(6 4 1), (2 0 4), C6=(5 6 11), C5, C9, C10=(1 5 2), (0 0 5), C11=(1 4 4), (0 0 2), C15=(0 5 5), (0 2 0).

D=1100, 2nd round, applied to C2, C3, C4, C5, C6, C9, C10, C11, and C15; 90% accurate after the 2nd round. [The per-cluster F/Count/Gap tables are not recoverable from this transcript.]

FLC on WINE. WINE counts are L=57, M=75, H=... PCCs: high ≥ 5 and a ≥ 60% change from that high. [The per-column STD/Spread table is not recoverable from this transcript.]

D=0001, 1st round, gap ≥ 1, 64% accurate: (1 0 0), C2=(22 4 1), C3, C4, C5=(1 20 4), C6=(1 6 3), (0 0 2). 2nd rounds: C2 with D=0100, gap ≥ 2, gives C21=(5 0 1), C22=(10 4 0), (7 0 0), (0 3 0); C3 gives C31=(10 8 1), C32=(1 5 3), C33=(1 2 1), C34=(4 7 1), C35=(4 8 1); C4 with D=0010 gives (0 2 0), (0 1 0), (1 0 0), (0 0 1), (1 0 0); C6 with D=0010 gives (0 0 2), (0 3 0), (1 0 0), (0 3 1).

8 FLP on IRIS150. Class1 = C1 = {y1,y2,y3,y4}. Class2 = C2 = {y7,y8,y9}.
Class1 = C1 = {y1,y2,y3,y4}. Class2 = C2 = {y7,y8,y9}. Class3 = C3 = {yb,yc,yd,ye}. x ∈ Ck iff lok,D ≤ D∘x ≤ hik,D for all D. Shortcuts? Pre-compute all diagonal mins and maxes (mn, mx) for e1, e2, e1+e2, e1-e2. Then there is no pTree processing left to do (just straightforward comparisons). [The scatter plot of y1..yf and the per-diagonal min/max tables are not recoverable from this transcript.]

On e1, points 9 and a are "none-of-the-above"; ya is in class3 (red) only; f is in class2 (green) only.

Versicolor results: 1D FLP Hversicolor has 7 virginica! 1D_2D FLP Hversicolor has 3 virginica! 1D_2D_3D FLP Hversicolor has 3 virginica! 1D_2D_3D_4D MCL Hversicolor has 3 virginica (samples 24, 27, 28). [The 1D/2D/3D/4D min/max listings (n1..x4, n1-2..x3-4, etc.) are omitted as unrecoverable.]

1D_2D_3D_4D FLP Hvirginica has 20 versicolor errors!! Look at removing outliers (gapped ≥ 3) from Hullvirginica. Successive removals bring Hvirginica to 12 versicolor, 15 versicolor, then 3 versicolor; with 1D FLP, Hvirginica has only 16 versicolors! [The per-diagonal Ct/gap listings are omitted as unrecoverable.]

One possibility would be to keep track of those that are outliers to their class but are not in any other class hull, and put a sphere around them. Then any unclassified sample that doesn't fall in any class hull would be checked to see if it falls in any of the class-outlier spheres???
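The hull test above (x ∈ Ck iff lok,D ≤ D∘x ≤ hik,D for all D) reduces, once the diagonal mins and maxes are pre-computed, to plain interval comparisons. A minimal sketch with hypothetical names, using the diagonals e1, e2, e1+e2, e1-e2 named on the slide:

```python
# FLP-style hull classification: precompute [min, max] of D∘x over each
# training class for a fixed set of diagonals D, then classify a sample by
# interval membership. Diagonals here are e1, e2, e1+e2, e1-e2.

DIAGONALS = [(1, 0), (0, 1), (1, 1), (1, -1)]

def dot(d, x):
    return sum(a * b for a, b in zip(d, x))

def build_hulls(classes):
    """classes: {label: [points]} -> {label: [(lo, hi) per diagonal]}"""
    return {
        lbl: [(min(dot(d, x) for x in pts), max(dot(d, x) for x in pts))
              for d in DIAGONALS]
        for lbl, pts in classes.items()
    }

def classify(hulls, y):
    """Labels of every hull containing y (possibly none, possibly several)."""
    return [lbl for lbl, ivs in hulls.items()
            if all(lo <= dot(d, y) <= hi
                   for d, (lo, hi) in zip(DIAGONALS, ivs))]

classes = {1: [(1, 1), (2, 2)], 2: [(7, 1), (8, 2)]}
hulls = build_hulls(classes)
print(classify(hulls, (1.5, 1.5)))   # -> [1]
print(classify(hulls, (20, 20)))     # -> [] : "none-of-the-above"
```

A sample landing in several hulls would then be resolved by the OneCount rule from slide 1; a sample landing in none is "none-of-the-above".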

9 FLC for outlier detection, when the goal is only to find outliers as quickly as possible.
Method 1 recursively uses the vector D = FurthestFromMedian-to-FurthestFromFurthestFromMedian. Mean = (8.53, 4.73), Median = (9, 3). [The scatter plot of y1..yf and the rows of d²(y1,x), x∘D, d²(med,x), x∘D2, and x∘D3 values are not recoverable from this transcript.]

FDO-1 won't work for big data. Finding outliers is local, and big data has many localities to search exhaustively. We may need to enclose each outlier in a gapped hull. Those gapped hulls will likely be filled in when projecting onto a randomly chosen line. I.e., barrel gapping suffers from a chicken-and-egg problem: first look for linear gaps, and then for radial gaps out from each. Unless the line runs through the outlier, the radial gap is not likely to appear.

x∘D distribution down to cell width 2^5: [0,32) [32,64) [64,96) [96,128) [128,160) [160,192) [192,224). Thinnings at [0,32) and [64,128), so we check y1, y5, yf. y5 and yf check out as outliers; y1 does not. Note y6 does not either!
Let D2 be mean-to-median and go down to 2^2: [0,4) [4,8) [8,12) [12,16) [16,20) [20,24). Thinnings at [4,12) and [20,24). yf checks out as an outlier; y4 does not. Note y6 does not either!
Let D3 be (Median to FurthestFromMedian)/6 and go down to 2^2: [0,4) [4,8) [8,12) [12,16) [16,20) [20,24). Thinning at [8,16). yf and y6 check out as outliers; yd does not. Is this D3 best?

FOD-1 doesn't work well for interior outlier identification (which is the case for all Spaeth outliers).

Method 2 uses the FLC Clusterer (CC = Count Change) to find outliers. CC removes big clusters, so as it moves down the dendrogram, clusters get smaller and smaller. Thus outliers are more likely to reveal themselves as singletons (and doubletons?) gapped away from their complements.
With each dendrogram iteration we will attempt to identify outlier candidates and construct the SPTS of distances from each candidate (if the minimum of those distances exceeds a threshold, declare that candidate an outlier). E.g., look for outliers using projections onto the sequence of D's = e1,...,en, then the diagonals e1+e2, e1-e2, .... We look for singleton (and doubleton?...) sets gapped away from the other points. We start out looking for coordinate hulls (rectangles) that provide a gap around 1 (or 2? or 3?) points only. We can do this by intersecting "thinnings" in each D∘X distribution. Note: if all we're interested in is anomalies, then we might ignore all PCCs that are not involved in thinnings. This would save lots of time! (A "thinning" is a PCD to below a threshold s.t. the next PCC is a PCI to above that threshold. The threshold should be ≤ the PCC threshold.) [The scatter plot of y1..yf and the DensityCount/r²-labeled dendrogram for FAUST on Spaeth with D=Avg-to-Median, DET=.3, are not recoverable from this transcript.] So intersect thinnings [1,1]1, [5,7]1, and [13,14]1 with [4,10]2.
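The "intersect thinnings in each D∘X distribution" step above can be sketched as: bucket each projection, mark the sparse buckets, and keep only the points that fall in a sparse bucket of every projection (a simplified sketch; the bucket width and sparsity threshold are assumptions, not the PCC-based thinning definition):

```python
# Outlier candidates by intersecting "thinnings": points that land in a
# low-count bucket of EVERY projection D∘X. Simplified; bucket width and
# the sparsity threshold are assumptions.

def dot(d, x):
    return sum(a * b for a, b in zip(d, x))

def thin_candidates(X, diagonals, width=4, thin_max=1):
    survivors = set(range(len(X)))
    for d in diagonals:
        proj = [dot(d, x) for x in X]
        lo = min(proj)
        buckets = {}
        for i, v in enumerate(proj):
            buckets.setdefault((v - lo) // width, []).append(i)
        thin = {i for idxs in buckets.values() if len(idxs) <= thin_max
                for i in idxs}
        survivors &= thin      # must be thin in this projection too
    return sorted(survivors)

X = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 1), (12, 12)]
print(thin_candidates(X, [(1, 0), (0, 1)]))  # only the isolated (12,12)
```

Each surviving index is then verified by the distance-SPTS check described above before being declared an outlier.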

10 Appendix (scratch work): the bit-position expansion of X∘D = Σk=1..n Xk*Dk.
pTrees, B=1. X∘D = Σk=1..n Xk*Dk expands by bit position as on slide 2. For the B=1, n=2 case:

X∘D = Σk=1,2 Xk*Dk
= 2^2·Σk=1,2 Dk,1·pk,1 + 2^1·Σk=1,2 ( Dk,1·pk,0 + Dk,0·pk,1 ) + 2^0·Σk=1,2 Dk,0·pk,0
= 2^2·( D1,1·p1,1 + D2,1·p2,1 ) + 2^1·( D1,1·p1,0 + D1,0·p1,1 + D2,1·p2,0 + D2,0·p2,1 ) + 2^0·( D1,0·p1,0 + D2,0·p2,0 ),

with result pTrees qN..q0 built level by level from the raw and carry trees (q0 = raw level 0, no carry; q1 = carry1; q2 = carry1 + p2,1; ...). [The worked bit tables are not recoverable from this transcript.]

X∘D = Σk=1,2 Xk*Dk with pTrees qN..q0, N = 2B + ⌈log2 n⌉ + 1. Where does this exponent come from? Since x = ( x1,B·2^B +..+ x1,0·2^0, ..., xn,B·2^B +..+ xn,0·2^0 ) and D = ( D1,B·2^B +..+ D1,0·2^0, ..., Dn,B·2^B +..+ Dn,0·2^0 ),

x∘D = (x1,B·2^B +..+ x1,0·2^0)·(D1,B·2^B +..+ D1,0·2^0) + ... + (xn,B·2^B +..+ xn,0·2^0)·(Dn,B·2^B +..+ Dn,0·2^0).

11 FAUST LINEAR CC Clusterer: Choose a nextD plan, Density (DT) and DensityUniformity (DUT) thresholds, and a PrecipitousCountChange (PCC) definition. If DT (and/or DUT) are not exceeded at C, partition C by cutting at each gap and PCC in C∘D using the nextD.

Given a table, X(X1...Xn), Xk*Dk is the SPTS (column of numbers) xk*Dk, and X∘D is the sum of those SPTSs, Σk=1..n Xk*Dk.

Xk*Dk = Dk·Σb 2^b·pk,b = Dk·(2^B·pk,B +..+ 2^0·pk,0) = (2^B·pk,B +..+ 2^0·pk,0)·(2^B·Dk,B +..+ 2^0·Dk,0), so

X∘D = Σk=1..n Xk*Dk = Σk=1..n Σb=B..0 2^b·Dk·pk,b = Σk=1..n Σb=B..0 Σβ=B..0 2^(b+β)·Dk,β·pk,b

= 2^(2B)·Σk=1..n Dk,B·pk,B
+ 2^(2B-1)·Σk=1..n ( Dk,B·pk,B-1 + Dk,B-1·pk,B )
+ 2^(2B-2)·Σk=1..n ( Dk,B·pk,B-2 + Dk,B-1·pk,B-1 + Dk,B-2·pk,B )
+ 2^(2B-3)·Σk=1..n ( Dk,B·pk,B-3 + Dk,B-1·pk,B-2 + Dk,B-2·pk,B-1 + Dk,B-3·pk,B )
. . .
+ 2^3·Σk=1..n ( Dk,3·pk,0 + Dk,2·pk,1 + Dk,1·pk,2 + Dk,0·pk,3 )
+ 2^2·Σk=1..n ( Dk,2·pk,0 + Dk,1·pk,1 + Dk,0·pk,2 )
+ 2^1·Σk=1..n ( Dk,1·pk,0 + Dk,0·pk,1 )
+ 2^0·Σk=1..n ( Dk,0·pk,0 ).

