1 FAUST Oblique Analytics
FAUST Oblique Analytics: Given a table, X(X1..Xn), with |X|=N and vectors D=(D1..Dn), FAUST Oblique uses ScalarPTreeSets of the valueTrees, XoD ≡ Σk=1..n XkDk.
NextD is a sequence of D's used when recursively partitioning X into clusters (constructing a Cluster Dendrogram for X), e.g.:
a. recursively take the diagonal maximizing the Standard Deviation, STD(CoD) [or maximizing STD(CoD)/Spread(CoD)];
b. recursively take AM(CoD)≡Avg-to-Median; AFFA(CoD)≡Avg-to-FurthestFromAvg; FFAFFFFA(CoD)≡FFA-to-FurthestFromFFA;
c. recursively cycle through the diagonals e1,...,en, e1e2,..., or cycle through AM, AFFA, FFAFFFFA, or cycle through both sets.
Count Change clustering: choose Density (DT), DensityUniformity (DUT) and PrecipitousCountChange (PCCT) thresholds. If DT (and DUT) are not exceeded at a cluster C, partition C by cutting at each gap and/or PCC in CoD using nextD.
FAUST Gap Clusterer cuts in the middle of CoD gaps. This is the old version; it usually chokes on big data. It is included in:
FAUST CCClusterer (Count Change Clusterer), which cuts at all PCCs. Gaps are PCCs, so it includes the old version.
Outlier Mining: find the top k objects by dissimilarity from the rest of the objects. This might mean:
1.a Find {xh | h=1..k} such that xh maximizes distance(xh, X-{xj | j≤h});
1.b Find the top set of k objects, Sk, that maximizes distance(X-Sk, Sk);
2. Given a Training Set, X, identify outliers in each class (correctly classified but noticeably dissimilar from classmates), or fuzzy-cluster X, i.e., assign a weight to each (object, cluster) pair; then x is an outlier iff w(x,k) < OutlierThreshold ∀k;
3. Examine individual new samples for outlierhood, assuming they come in after normalcy has been established by 1 or 2. One can simply cluster to identify and remove large clusters, so that outliers (small clusters) are revealed by gaps.
FAUST TKOutliers (TopK Outliers) uses D2NN ≡ SquareDistance(x, X-{x}) = rankN (x-X)o(x-X). D2NN provides an instantaneous k-slider for 1.a. Instantaneous?
UDR on D2NN takes log2(n) time (and is a 1-time calculation); then a k-slider works instantaneously off that distribution (there is no need to sort D2NN).
Dset is a set of D's used to build a model for fast classification (1-class or k-class) by circumscribing each class with a hull. The larger the Dset, the better (for accuracy). There is, however, the 1-time construction cost of lD,k and hD,k below, for each D. Dset should include DAvgi,j ≡ the Avg(Ci)-to-Avg(Cj) vector, i>j=1..k [and also Median connectors?]. Should Dset include all D∈nextD?
y is declared to be class k iff y∈Hullk, where Hullk = {z | lD,k ≤ Doz ≤ hD,k ∀D}.
FAUST PLClassifier (Piecewise Linear Classifier), k classes, k≥1: ∀D∈Dset, lD,k ≡ min CkoD (1st PCI?); hD,k ≡ max CkoD (last PCD?). If y is in multiple hulls, Hi1..Hih, y isa Ck for the k maximizing OneCount{PCk & PHi1 &..& PHih}, or fuzzy-classify using those OneCounts as k-weights.
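A minimal sketch of the hull test stated above (y is class k iff lD,k ≤ Doy ≤ hD,k for every D in Dset), using plain NumPy as a stand-in for the pTree arithmetic; the data, Dset, and function names are illustrative, not from the slides:

```python
import numpy as np

def fit_hulls(X, labels, Dset):
    # Record lD,k = min CkoD and hD,k = max CkoD for every class k and D in Dset.
    hulls = {}
    for k in set(labels):
        Ck = X[labels == k]
        proj = Ck @ Dset.T                              # CkoD for every D at once
        hulls[k] = (proj.min(axis=0), proj.max(axis=0)) # (lD,k, hD,k)
    return hulls

def classify(y, hulls, Dset):
    # Return every class whose hull contains y; [] means OTHER (outlier).
    proj = Dset @ y
    return [k for k, (lo, hi) in hulls.items()
            if np.all(proj >= lo) and np.all(proj <= hi)]

X = np.array([[1.0, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]])
labels = np.array([0, 0, 0, 1, 1, 1])
Dset = np.array([[1.0, 0], [0, 1], [1, 1]])   # e1, e2 and the main diagonal
hulls = fit_hulls(X, labels, Dset)
assert classify(np.array([1.5, 1.5]), hulls, Dset) == [0]
assert classify(np.array([5.0, 5.0]), hulls, Dset) == []   # in no hull: OTHER
```

Ties to the fuzzy-classification remark: when `classify` returns several classes, the OneCounts of the intersected hull masks would break the tie.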

2 For Square Distance analytics, we need {Xox}xX.
In FAUST (clusterer, classifier, outlier detector) the central, reoccurring calculation is XoD = Σk=1..n XkDk. [The slide's worked bit-level examples, showing the pTrees of X and D and the carry pTrees CAR1,2, CAR2,3, etc., for two different data sets, are omitted here.]
The per-bit routine (calculating PXoD,i after PXoD,i-1, with CarrySet = CARi-1,i and RawSet = RSi):
INPUT: CARi-1,i, RSi
ROUTINE: PXoD,i = RSi ⊕ CARi-1,i ; CARi,i+1 = RSi & CARi-1,i
OUTPUT: PXoD,i, CARi,i+1
We are extending to pTrees the Galois field GF(2) = {0,1}, with XOR as add and AND as multiply (checksums and ciphers use GF(2)).
For Square Distance analytics, we need {Xox} for x∈X. We create {Rankid2(x,X)}, i=N-1..N-q for some q (pTrees masking the Rankid2(x,X) points). We install the pointers into RankiPTR(x, ptr-to-Rankid2(x,X)) and the square distances into RankiSQD[x, d2(x, Rankid2(x,X))], ordered descending on square distance. (This does not require a sort, since the rows are created one at a time and can be inserted in order when created; use binary search through the 2nd column to identify the proper insertion point in log time.) These tables will facilitate many very fast outlier and cluster analytics, most specifically a k-slider for FAUST TKO.
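The per-bit routine above is a ripple add over bit slices. A minimal Python stand-in, with pTrees modeled as ints (bit j = row j); the generate-or-propagate carry term is the standard full-adder completion of the slide's two-line recurrence, and the data is illustrative:

```python
def add_slices(a_slices, b_slices):
    """Columnwise add two equally wide bit-slice sets; returns result slices."""
    out, car = [], 0
    for ai, bi in zip(a_slices, b_slices):
        rs = ai ^ bi                # raw slice for this level
        raw_car = ai & bi           # carry generated within this level
        out.append(rs ^ car)        # PXoD,i = RSi XOR CARi-1,i
        car = raw_car | (rs & car)  # CARi,i+1: generate or propagate
    out.append(car)                 # final carry becomes the top slice
    return out

def to_slices(col, w):
    # bit i of every value, packed as one int per slice (row j -> bit j)
    return [sum(((x >> i) & 1) << j for j, x in enumerate(col)) for i in range(w)]

def from_slices(slices, n):
    return [sum(((s >> j) & 1) << i for i, s in enumerate(slices)) for j in range(n)]

a, b = [1, 3, 2], [2, 1, 3]
assert from_slices(add_slices(to_slices(a, 2), to_slices(b, 2)), 3) == [3, 4, 5]
```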

3 XoD is the central SPTS computation in FAUST
XoD is the central SPTS computation in FAUST. E.g., XoD is the only SPTS needed in CCClusterer and PLClassifier. For each x∈X, TKOutliers computes d2(X,x) = (X-x)o(X-x) = XoX + xox - 2Xox. XoX is computed just once, so that cost is amortized over X. Then xox is a row lookup in XoX. Then the TKO k-slider uses the table of pointers to mask pTrees, RankiP(x, ptr-to-Rankid2(X,x)), and the vTrees RankiD(x, d2(x, Rankid2(X,x))), ordered descending on Rankid2(X,x), i=N-1..N-q.
The RadialReach SPTS for barrel analytics, SRR(p,d), measures the square of the radial distance of each x∈X from the line through p in the direction of unit vector d: SRR(p,d) = (X-p)o(X-p) - [(X-p)od]^2. (For a single x: (x-p)od = |x-p| cos θ, so (x-p)o(x-p) - [(x-p)od]^2 is the squared radial distance.) Expanded for computation: XoX + pop - 2Xop - [Xod - pod]^2. Computed 1 time: XoX, -2Xop, Xod; then 2 scalar adds, 1 SPTS mult, 2 adds.
If X is a high-value classification training set (e.g., Enron's), what should we pre-compute? 1. column statistics (min, avg, max, std, ...); 2. XoX; Xop, p = class_Avg/Median; 3. Xod, d = interclass_Avg/Median_UnitVector; 4. Xox, d2(X,x), Rankid2(X,x), ∀x∈X, i=N-1,N-2,...; 5. SRR(p,d) for all p's and d's above.
FAUST PLC-i (PLC incremental) classifier should produce better accuracy, even faster than PLC, as follows: sequence the class pairs, e.g., (C1,C2)..(C1,CK); (C2,C3)..(C2,CK); ...; (CK-1,CK).
For the next (Ci1,Ci2) in the sequence, let D ≡ AvgCi1-to-AvgCi2, and let lD,k ≡ min CkoD (or 1st PCI?), hD,k ≡ max CkoD (or last PCD?), k=1..K.
FAUST PLC-i on IRIS150:
Dse: y isa OTHER if yoDse ∈ (-∞,495)∪(802,1061)∪(2725,∞); y isa O or S if yoDse ∈ C1,1 ≡ [495, 802]; y isa O or I if yoDse ∈ C1,2 ≡ [1061, 1270]; y isa O or E or I if yoDse ∈ C1,3 ≡ [1270, 2010]; y isa O or I if yoDse ∈ C1,4 ≡ [2010, 2725].
Dsi: y isa O if yoDsi ∈ (-∞,1006)∪(1474,1861)∪(4291,∞); y isa O or S if yoDsi ∈ C1,5 ≡ [1006, 1474]; y isa O or I if yoDsi ∈ C1,6 ≡ [1861, 2100]; y isa O or E or I if yoDsi ∈ C1,7 ≡ [2100, 3243]; y isa O or I if yoDsi ∈ C1,8 ≡ [3243, 3291].
Dsi: y isa O if yoDsi ∈ (-∞,484)∪(679,800)∪(1566,∞); y isa O or S if yoDsi ∈ C1,5 ≡ [484, 679]; y isa O or I if yoDsi ∈ C1,6 ≡ [800, 830]; y isa O or E or I if yoDsi ∈ C1,7 ≡ [830, 1233]; y isa O or I if yoDsi ∈ C1,8 ≡ [1233, 1566].
This isn't truly incremental yet! On the next slide, we redo it truly incrementally.
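The SRR computation described on this slide can be sketched directly; this is a plain NumPy stand-in for the SPTS arithmetic, with illustrative X, p, d (not the IRIS values):

```python
import numpy as np

def srr(X, p, d):
    """SRR(p,d)(x) = (x-p)o(x-p) - [(x-p)od]^2: squared radial distance of
    each row of X from the line through p with unit direction d."""
    d = d / np.linalg.norm(d)          # ensure d is a unit vector
    diff = X - p                       # (X - p), one row per point
    along = diff @ d                   # (x-p)od = |x-p| cos(theta)
    return (diff * diff).sum(axis=1) - along ** 2

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 3.0]])
p = np.array([0.0, 0.0])
d = np.array([1.0, 0.0])               # barrel axis = the x-axis
assert np.allclose(srr(X, p, d), [0.0, 1.0, 9.0])
```

The point (2,3) sits 3 units off the axis, hence squared radial reach 9; this is the quantity the slide thresholds per cluster (e.g., SRR ∈ [0,154] on C1,1).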

4 PLC-i on IRIS150 (with incrementation)
Dse: y isa OTHER if yoDse ∈ (-∞,495)∪(802,1061)∪(2725,∞); y isa OTHER or S if yoDse ∈ C1,1 ≡ [495, 802]; y isa OTHER or I if yoDse ∈ C1,2 ≡ [1061, 1270]; y isa OTHER or E or I if yoDse ∈ C1,3 ≡ [1270, 2010] (C1,3: 0 s, 49 e, 11 i); y isa OTHER or I if yoDse ∈ C1,4 ≡ [2010, 2725].
Dei: y isa O if yoDei ∈ (-∞,-117)∪(-3,∞); y isa O or E or I if yoDei ∈ C2,1 ≡ [-62, -44]; y isa O or I if yoDei ∈ C2,2 ≡ [-44, -3] (C2,1: 2 e, 4 i).
Dei: y isa O if yoDei ∈ (-∞,420)∪(459,480)∪(501,∞); y isa O or E if yoDei ∈ C3,1 ≡ [420, 459]; y isa O or I if yoDei ∈ C3,2 ≡ [480, 501].
Continue this on clusters with OTHER + one class, so the hull fits tightly (reducing false positives), using diagonals?
C1,1: D=1000; y isa O if yoD ∈ (-∞,43)∪(58,∞); y isa O|S if yoD ∈ C2,3 ≡ [43,58].
C2,3: D=0100; y isa O if yoD ∈ (-∞,23)∪(44,∞); y isa O|S if yoD ∈ C3,3 ≡ [23,44].
C3,3: D=0010; y isa O if yoD ∈ (-∞,10)∪(19,∞); y isa O|S if yoD ∈ C4,1 ≡ [10,19].
C4,1: D=0001; y isa O if yoD ∈ (-∞,1)∪(6,∞); y isa O|S if yoD ∈ C5,1 ≡ [1,6].
C5,1: D=1100; y isa O if yoD ∈ (-∞,68)∪(117,∞); y isa O|S if yoD ∈ C6,1 ≡ [68,117].
C6,1: D=1010; y isa O if yoD ∈ (-∞,54)∪(146,∞); y isa O|S if yoD ∈ C7,1 ≡ [54,146].
C7,1: D=1001; y isa O if yoD ∈ (-∞,44)∪(100,∞); y isa O|S if yoD ∈ C8,1 ≡ [44,100].
C8,1: D=0110; y isa O if yoD ∈ (-∞,36)∪(105,∞); y isa O|S if yoD ∈ C9,1 ≡ [36,105].
C9,1: D=0101; y isa O if yoD ∈ (-∞,26)∪(61,∞); y isa O|S if yoD ∈ Ca,1 ≡ [26,61].
Ca,1: D=0011; y isa O if yoD ∈ (-∞,12)∪(91,∞); y isa O|S if yoD ∈ Cb,1 ≡ [12,91].
Cb,1: D=1110; y isa O if yoD ∈ (-∞,81)∪(182,∞); y isa O|S if yoD ∈ Cc,1 ≡ [81,182].
Cc,1: D=1101; y isa O if yoD ∈ (-∞,71)∪(137,∞); y isa O|S if yoD ∈ Cd,1 ≡ [71,137].
Cd,1: D=1011; y isa O if yoD ∈ (-∞,55)∪(169,∞); y isa O|S if yoD ∈ Ce,1 ≡ [55,169].
Ce,1: D=0111; y isa O if yoD ∈ (-∞,39)∪(127,∞); y isa O|S if yoD ∈ Cf,1 ≡ [39,127].
Cf,1: D=1111; y isa O if yoD ∈ (-∞,84)∪(204,∞); y isa O|S if yoD ∈ Cg,1 ≡ [84,204].
Cg,1: D=1-100; y isa O if yoD ∈ (-∞,10)∪(22,∞); y isa O|S if yoD ∈ Ch,1 ≡ [10,22].
Ch,1: D=10-10; y isa O if yoD ∈ (-∞,3)∪(46,∞); y isa O|S if yoD ∈ Ci,1 ≡ [3,46].
The amount of work yet to be done, even for only 4 attributes, is immense. For each D, we should fit boundaries for each class, not just one class. For 4 attributes, I count 77 diagonals * 3 classes = 231 cases. How many in the Enron case with 10,000 columns? Too many for sure!!
∀D, not only cut at minCoD and maxCoD, but also limit the radial reach for each class (barrel analytics)? Note that limiting the radial reach limits all other directions [other than the D direction] in one step, and therefore by the same amount; i.e., it limits all directions assuming perfectly round clusters.
Think about Enron: some words (columns) have high count and others have low count. Our radial reach threshold would be based on the highest count and therefore admit many false positives. We can cluster directions (words) by count and limit radial reach differently for different clusters?
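The "barrel" idea above, combining the linear cut (minCoD, maxCoD) with a per-class cap on squared radial reach, can be sketched as follows; thresholds and data are illustrative, and the cap is simply the largest SRR seen in the class:

```python
import numpy as np

def in_barrel(y, C, d):
    """True iff y passes both the linear cut on CoD and the radial-reach cap."""
    d = d / np.linalg.norm(d)
    lo, hi = (C @ d).min(), (C @ d).max()            # linear cuts minCoD, maxCoD
    ctr = C.mean(axis=0)
    diffs = C - ctr
    srr_cap = ((diffs ** 2).sum(axis=1) - (diffs @ d) ** 2).max()  # class SRR cap
    diff = y - ctr
    return lo <= y @ d <= hi and diff @ diff - (diff @ d) ** 2 <= srr_cap

C = np.array([[1.0, 1], [2, 1], [3, 1.2], [4, 0.8]])   # class elongated along e1
assert in_barrel(np.array([2.5, 1.0]), C, np.array([1.0, 0.0]))
assert not in_barrel(np.array([2.5, 5.0]), C, np.array([1.0, 0.0]))
```

The second point passes the linear cut but lies far from the barrel axis, so the radial cap rejects it, which is exactly the false-positive reduction the slide is after.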

5 PLC-i on IRIS150 (redo)
Dse; xoDse: y isa O if yoD ∈ (-∞,-184)∪(382,590)∪(2725,∞); y isa O or S if yoD ∈ C1,1 ≡ [-184, 123]; y isa O or I if yoD ∈ C1,2 ≡ [381, 590]; y isa O or E or I if yoD ∈ C1,3 ≡ [590, 1331]; y isa O or I if yoD ∈ C1,4 ≡ [1331, 2046].
SRR(AVGs,Dse) on C1,1: y isa O if y isa C1,1 AND SRR(AVGs,Dse) ∈ (154,∞); y∈CR1,1 if y isa C1,1 AND SRR(AVGs,Dse) ∈ [0,154].
SRR(AVGs,Dse) on C1,2: only one such I.
SRR(AVGs,Dse) on C1,3: y isa O if y isa C1,3 AND SRR(AVGs,Dse) ∈ (-∞,2)∪(393,∞); y isa O or E if y isa C1,3 AND SRR ∈ [2,6); y isa O or E or I if y isa C1,3 AND SRR ∈ [6,137); y isa O or I if y isa C1,3 AND SRR ∈ [137,393). Etc.
Note we don't have to treat all directions the same, either in the linear step (dot product onto the D-line) or in the radial step. In the past we have thought about projecting X onto a higher-dimensional space (higher than the dimension=1 of the D-line). That may be productive; however, we have not been able to make any progress on that idea as yet (expanding the dimension of the projection range). Another idea is to limit the domain of the projections (a third idea is to limit both domain and range - later!). In both the initial dot-product projections onto a D-line and the subsequent radial reach projections onto a Radial_Reach_Line, we can attempt to cluster directions into "similar" clusters in some way and limit the domain of our projections to one of these clusters at a time. For the linear projection step, that would simply mean taking one of the "similarity clusters" of dimensions or directions, say expressed as a set of D's, {Di | i=1..m}, and using the projection Σi=1..m XoDi. If Di=ei, then this is just Σi=1..m Xi. So we could cluster dimensions, choose D's, and just do the dot product over those few dimensions with that D. Then we could do the same with the radial reach.
At any step of the process, we can limit the dimensions of the radial reach projection in the same general way (to a cluster of similar dimensions; e.g., in the Enron case, the dimensions would be words that have about the same count?).
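The domain-limited projection just described, Σi=1..m XoDi with Di = ei, reduces to summing the chosen columns. A tiny sketch with an illustrative, randomly generated table and a made-up "similarity cluster" of dimensions:

```python
import numpy as np

# Illustrative 6x10 table; dimensions {2, 5, 7} stand in for a cluster of
# columns judged "similar" (e.g., Enron words with about the same count).
X = np.random.default_rng(0).integers(0, 5, size=(6, 10)).astype(float)
dim_cluster = [2, 5, 7]

# sum_i X o e_i over the cluster = sum of the selected columns
proj = X[:, dim_cluster].sum(axis=1)
assert np.allclose(proj, X[:, 2] + X[:, 5] + X[:, 7])
```

The same restriction applies to the radial-reach step: compute SRR using only the clustered dimensions.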

6 FAUST Oblique: XoD (used in CGC, CCC, TKO, PLC) and (x-X)o(x-X). APPENDIX
Example: FAUST Oblique XoD (used in CGC, CCC, TKO, PLC) and (x-X)o(x-X) = -2Xox + xox + XoX, used in FTKO. APPENDIX
So in FAUST we need to construct lots of SPTSs of the type X dotted with a fixed vector, a costly pTree calculation. (Note that XoX is a costly pTree calculation also, but it is a 1-time calculation (a pre-calculation?); xox is calculated for each individual x, but it's a scalar calculation.) Thus, we should optimize the living he__ out of the XoD calculation!!! The method on the previous slide seems efficient. Is there a better method?
Then for FTKO we need to compute ranks. RankK: p is what's left of K yet to be counted, initially p=K; V is the RankK value, initially 0. For i = bitwidth-1 down to 0: if Count(P&Pi) ≥ p { KVal = KVal + 2^i; P = P&Pi } else /* < p */ { p = p - Count(P&Pi); P = P&P'i }.
[The slide's three worked examples, RankN-1(XoD) = Rank2(XoD) for D=x1, D=x2, D=x3, yielding -2x1oX = -6, -2x2oX = -6, -2x3oX = -10, are omitted here.]

7 pTree Rank(K) computation
pTree Rank(K) computation (Rank(N-1) gives the 2nd smallest, which is very useful in outlier analysis?). [The slide's worked Rank2(XoD) bit table, yielding 1*2^3 + 0*2^2 + 0*2^1 + 1*2^0 = 9, is omitted here.]
RankKval=0; p=K; c=0; P=Pure1; /* Note: n = bitwidth-1. The RankK points are returned as the resulting pTree, P */
For i=n to 0 { c=Count(P&Pi); if (c >= p) { RankKval = RankKval + 2^i; P = P&Pi } else { p = p-c; P = P&P'i } }; return RankKval, P.
Cross out the 0-positions of P at each step. Worked example on X = 10, 5, 6, 7, 11, 9, 3 with K = 7-1 = 6 (looking for Rank6, the 6th highest value, which is also the 2nd lowest):
(n=3) c = Count(P&P4,3) = 3 < 6, so p = 6-3 = 3; P = P&P'4,3 masks off the highest (val 8) {0}.
(n=2) c = Count(P&P4,2) = 3 >= 3; P = P&P4,2 masks off the lowest (val 4) {1}.
(n=1) c = Count(P&P4,1) = 2 < 3, so p = 3-2 = 1; P = P&P'4,1 masks off the highest (val 8-2=6) {0}.
(n=0) c = Count(P&P4,0) = 1 >= 1; P = P&P4,0 {1}.
RankKval = 0*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 5; P = MapRankKPts; ListRankKPts = {2}.
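The Rank(K) routine above can be run exactly as stated, with pTrees modeled as Python ints (bit j = row j) and Count as a popcount; the data column is the slide's example X = 10, 5, 6, 7, 11, 9, 3:

```python
def rank_k(ptree_slices, n_rows, K):
    """Return (K-th largest value, pTree mask of the rows attaining it),
    using only bit-slice AND and Count, as in the slide's routine."""
    P = (1 << n_rows) - 1              # Pure1: all rows survive
    p, val = K, 0
    for i in range(len(ptree_slices) - 1, -1, -1):
        c = bin(P & ptree_slices[i]).count("1")   # c = Count(P & Pi)
        if c >= p:
            val |= 1 << i              # RankKval += 2^i
            P &= ptree_slices[i]
        else:
            p -= c
            P &= ~ptree_slices[i]      # mask off the 1-bit rows at level i
    return val, P

X = [10, 5, 6, 7, 11, 9, 3]            # the slide's example column
slices = [sum(((x >> i) & 1) << j for j, x in enumerate(X)) for i in range(4)]
val, P = rank_k(slices, len(X), 6)
assert val == 5                        # Rank6 = 6th largest = 2nd smallest
assert P == 1 << 1                     # attained by the row holding value 5
```

The walk visits one bit level per step, so the cost is the bitwidth, not N log N as a sort would be.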

8 UDR Univariate Distribution Revealer (on Spaeth)
UDR, the Univariate Distribution Revealer (on Spaeth), applied to S, a column of numbers in bitslice format (an SpTS), will produce the DistributionTree of S, DT(S). depthDT(S) ≡ b ≡ BitWidth(S); h = depth of a node; k = node offset. Nodeh,k has a ptr to pTree{x∈S | F(x) ∈ [k*2^(b-h+1), (k+1)*2^(b-h+1))} and its 1-count.
[The slide's worked tree on the column yofM = 11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83, showing the level counts (p6' gives the count on [0,64) and p6 on [64,128); p5', p5 split those into [0,32), [32,64), [64,96), [96,128); and so on down to width-8 buckets), is omitted here.]
Pre-compute and enter into the ToC all DT(Yk), plus those for selected Linear Functionals (e.g., d = main diagonals, ModeVector). Suggestion: in our pTree-base, every pTree (basic, mask, ...) should be referenced in ToC(pTree, pTreeLocationPointer, pTreeOneCount), and these OneCounts should be repeated everywhere (e.g., in every DT). The reason is that these OneCounts help us in selecting the pertinent pTrees to access - and in fact are often all we need to know about the pTree to get the answers we are after.
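A sketch of UDR as described above: halve the value range one bit at a time, recording a count per bucket at each level. This plain-Python stand-in loops over values; in the pTree version, each count would come from AND-ing the high bit-slices, which is the log-time part. The column is the slide's yofM data:

```python
def udr(values, b):
    """Return bucket counts level by level: level h has 2^h buckets of
    width 2^(b-h) each; the top h bits of a value pick its bucket."""
    levels = []
    for h in range(b + 1):
        shift = b - h
        counts = [0] * (1 << h)
        for v in values:
            counts[v >> shift] += 1
        levels.append(counts)
    return levels

S = [11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83]
lv = udr(S, 7)                 # 7-bit values: range [0,128)
assert lv[0] == [13]           # root: all 13 values
assert lv[1] == [5, 8]         # [0,64): 5 values, [64,128): 8 values
```

Only the OneCounts per node are needed for most queries, which is the slide's point about repeating OneCounts in the ToC.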

9 So let us look at ways of doing the work to calculate XoD = Σk=1..n Xk*Dk. As we recall from the below, the task is to ADD bitslices, giving a result bitslice and a set of carry bitslices to carry forward. [The slide's worked bit tables for the example X and D are omitted here.]
I believe we add by successive XORs, and the carry set is the raw set with one 1-bit turned off iff the sum at that bit is a 1-bit. Or we can characterize the carry as the raw set minus the result (always carry forward a set of pTrees plus one negative one). We want a routine that constructs the result pTree from a positive set of pTrees plus a negative set always consisting of 1 pTree. The routine is: successive XORs across the positive set, then XOR with the negative-set pTree (because the successive pset XOR gives us the odd values, and if you subtract one pTree, the 1-bits of it change odd to even and vice versa):
/* For PXoD,i (after PXoD,i-1). CarrySetPos = CSPi-1,i; CarrySetNeg = CSNi-1,i; RawSet = RSi; CSP-1 = CSN-1 = ∅ */
INPUT: CSPi-1, CSNi-1, RSi
ROUTINE: PXoD,i = RSi ⊕ CSPi-1,i ⊕ CSNi-1,i (XOR across all pTrees in these sets); CSNi,i+1 = CSNi-1,i ∪ PXoD,i; CSPi,i+1 = CSPi-1,i ∪ RSi-1
OUTPUT: PXoD,i, CSNi,i+1, CSPi,i+1
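The parity claim above (XOR across the positive slices gives each row's sum parity, and XOR-ing in the one negative slice flips parity exactly at its 1-bits) can be checked in a few lines; the slices here are made-up ints with bit j = row j:

```python
from functools import reduce

pos = [0b1011, 0b0110, 0b1110]     # positive raw slices
neg = 0b0101                       # the single negative slice

parity = reduce(lambda a, b: a ^ b, pos)   # per-row parity of the positive sum
result = parity ^ neg                      # account for the subtracted slice

for j in range(4):                         # verify row by row
    s = sum((p >> j) & 1 for p in pos) - ((neg >> j) & 1)
    assert (result >> j) & 1 == s & 1      # XOR matches the true sum's parity
```

This is the result bit of the multi-operand add; the remaining even part is what the carry sets push to the next level.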

10 XoD=k=1,2Xk*Dk with pTrees: qN..q0, N=22B+roof(log2n)+2B+1
FCC Clusterer: if DT (and/or DUT) are not exceeded at C, partition C further by cutting at each gap and PCC in CoD.
For a table X(X1...Xn), the SPTS Xk*Dk is the column of numbers xk*Dk. XoD is the sum of those SPTSs, Σk=1..n Xk*Dk.
Xk*Dk = Dk Σb 2^b pk,b = Dk (2^B pk,B +..+ 2^0 pk,0) = (2^B Dk,B +..+ 2^0 Dk,0)(2^B pk,B +..+ 2^0 pk,0) = 2^2B (Dk,B pk,B) + 2^(2B-1) (Dk,B pk,B-1 + Dk,B-1 pk,B) + ...
XoD = Σk=1..n Xk*Dk = Σk=1..n [ 2^2B Dk,B pk,B + 2^(2B-1) (Dk,B pk,B-1 + Dk,B-1 pk,B) + 2^(2B-2) (Dk,B pk,B-2 + Dk,B-1 pk,B-1 + Dk,B-2 pk,B) + 2^(2B-3) (Dk,B pk,B-3 + Dk,B-1 pk,B-2 + Dk,B-2 pk,B-1 + Dk,B-3 pk,B) + ... + 2^3 (Dk,3 pk,0 + Dk,2 pk,1 + Dk,1 pk,2 + Dk,0 pk,3) + 2^2 (Dk,2 pk,0 + Dk,1 pk,1 + Dk,0 pk,2) + 2^1 (Dk,1 pk,0 + Dk,0 pk,1) + 2^0 Dk,0 pk,0 ].
B=1 case: XoD = Σk=1,2 Xk*Dk with pTrees qN..q0, N=22B+roof(log2n)+2B+1: XoD = Σk=1..2 [ 2^2 Dk,1 pk,1 + 2^1 (Dk,1 pk,0 + Dk,0 pk,1) + 2^0 Dk,0 pk,0 ]. [The slide's worked carry computations (q0 = p1,0, no carry; q1 = carry1; q2 = carry1, no carry; and the second example with q0 = carry0, q1 = carry0 + raw1, q2 = carry1 + raw2, q3 = carry2) are omitted here.]
So DotProduct involves just multi-operand pTree addition (no SPTSs and no multiplications). Engineering shortcut tricks would be huge!!!
A carryTree is a valueTree or vTree, as is the rawTree at each level (rawTree = valueTree before carry is included). In what form is it best to carry the carryTree over (for speediest processing)? 1. As multiple pTrees added at the next level (since the pTrees at the next level are in that form and need to be added)? 2. As an SPTS, s1 (next level rawTree = SPTS s2; then combine s1 and s2 bitwise to get qnext_level and carrynext_level)?
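The bit-expansion identity above (XoD per row equals the sum of the bit products Dk,i * pk,j weighted by 2^(i+j)) can be verified directly; X and D here are illustrative B=1 data:

```python
def xod_bits(X, D, B):
    """Compute XoD row by row from bit products only, mirroring the
    2^(i+j) Dk,i pk,j expansion (so dot product = multi-operand addition)."""
    out = []
    for x in X:
        total = 0
        for xk, dk in zip(x, D):
            for i in range(B + 1):          # bits of Dk
                for j in range(B + 1):      # bits of Xk (the pk,j slices)
                    total += (1 << (i + j)) * ((dk >> i) & 1) * ((xk >> j) & 1)
        out.append(total)
    return out

X = [(1, 3), (3, 2), (2, 1)]
D = (1, 2)
# Matches the ordinary dot product x1*D1 + x2*D2 per row:
assert xod_bits(X, D, 1) == [x1 * 1 + x2 * 2 for x1, x2 in X]   # [7, 7, 4]
```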

11 Should we pre-compute all pk,i*pk,j, p'k,i*p'k,j, pk,i*p'k,j?
Question: which primitives are needed and how do we compute them? X(X1...Xn). D2NN yields a 1.a-type outlier detector (top k objects, x, by dissimilarity from X-{x}). D2NN = each min[D2NN(x)].
(x-X)o(x-X) = Σk=1..n (xk-Xk)(xk-Xk) = Σk=1..n (Σb=B..0 2^b (xk,b - pk,b))(Σb=B..0 2^b (xk,b - pk,b)). Writing ak,b ≡ xk,b - pk,b, this is Σk [ 2^2B ak,B ak,B + 2^(2B-1) (ak,B ak,B-1 + ak,B-1 ak,B) + 2^(2B-2) (ak,B ak,B-2 + ak,B-1 ak,B-1 + ak,B-2 ak,B) + 2^(2B-3) (ak,B ak,B-3 + ak,B-1 ak,B-2 + ak,B-2 ak,B-1 + ak,B-3 ak,B) + 2^(2B-4) (ak,B ak,B-4 + ak,B-1 ak,B-3 + ak,B-2 ak,B-2 + ak,B-3 ak,B-1 + ak,B-4 ak,B) + ... ], which collects to Σk [ 2^2B (ak,B^2 + ak,B ak,B-1) + 2^(2B-1) ak,B ak,B-2 + 2^(2B-2) (ak,B-1^2 + ak,B ak,B-3 + ak,B-1 ak,B-2) + 2^(2B-3) (ak,B ak,B-4 + ak,B-1 ak,B-3) + 2^(2B-4) ak,B-2^2 + ... ].
D2NN = multi-op pTree adds? When xk,b=1, ak,b = p'k,b, and when xk,b=0, ak,b = -pk,b. So D2NN is just multi-op pTree mults/adds/subtracts? Each D2NN row (each x∈X) is a separate calculation. Should we pre-compute all pk,i*pk,j, p'k,i*p'k,j, pk,i*p'k,j?
ANOTHER TRY! X(X1...Xn). RKN (Rank K Nbr), K=|X|-1, yields a 1.a outlier detector (top y by dissimilarity from X-{x}). Install in RKN each RankK(D2NN(x)) (1-time construct, but for, e.g., 1 trillion x's? |X| = N = 1T is slow. Parallelization?).
∀x∈X, the square distance from x to its neighbors (near and far) is the column of numbers (vTree or SPTS) d2(x,X) = (x-X)o(x-X) = Σk=1..n |xk-Xk|^2 = Σk=1..n (xk-Xk)(xk-Xk) = Σk=1..n (xk^2 - 2 xk Xk + Xk^2) = xox - 2xoX + XoX, where XoX = Σk Σi=B..0, j=B..0 2^(i+j) pk,i pk,j. Plan: 1. pre-compute the pTree products pk,i pk,j within each k; 2. calculate the XoX sum one time (independent of x); 3. pick xox from XoX for each x and add; the -2xoX cost is linear in |X|=N.
xox cost is ~zero. XoX is 1-time, amortized over x∈X (i.e., ~1/N each), or precomputed. The addition cost, -2xoX + xox + XoX, is linear in |X|=N. So, overall, the cost is linear in |X|=N. Data parallelization? No! (Need all of X at each site.) Code parallelization? Yes! After replicating X to all sites, each site creates/saves D2NN for its partition of X, then sends the requested number(s) (e.g., RKN(x)) back.
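The cost argument above can be sketched end to end: with XoX precomputed once, each d2(x,X) = XoX + xox - 2Xox needs one dot-product column and two adds. Plain NumPy stand-in for the SPTS arithmetic, with illustrative data:

```python
import numpy as np

X = np.array([[1.0, 3], [3, 2], [1, 2], [9, 9]])
XoX = (X * X).sum(axis=1)            # precomputed one time, amortized over X

def d2(x):
    # column of squared distances from x to every row of X
    return XoX + x @ x - 2 * (X @ x)

x = X[0]
assert np.allclose(d2(x), ((X - x) ** 2).sum(axis=1))   # matches the direct form
assert d2(x).argsort()[1] == 2       # nearest neighbor of row 0 is row 2 (D2NN)
```

Each site holding a replica of X could run `d2` over its own partition's x values and return only the requested RKN(x) numbers, matching the code-parallelization note above.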

