Presentation on theme: "Ld Xod = Xod-pod= Ld-pod Lmind,k= min(Ld&Ck) Lmaxd,k= max(Ld&Ck)"— Presentation transcript:

1 Ld Xod = Xod-pod= Ld-pod Lmind,k= min(Ld&Ck) Lmaxd,k= max(Ld&Ck)
FAUST Analytics X(X1..Xn)Rn, |X|=N; Classes={C1..CK}; d=(d1..dn), |d|=1; p=(p1..pn)Rn; Functionals: Ld Xod = Xod-pod= Ld-pod Lmind,k= min(Ld&Ck) Lmaxd,k= max(Ld&Ck) Ld,p (X-p)od= Xod-pod= Ld-pod Lmind,p,k= min(Ld,p&Ck) Lmaxd,p,k= max(Ld,p&Ck) Sp  (X-p)o(X-p)= XoX+Xo(-2p)+pop Sminp,k = min(Sp&Ck) Smaxp,k = max(Sp&Ck) Rd,p Sp-L2d,p= XoX+Xo(-2p)+pop-L2d-2pod*Xod+pod2= L-2p-(2pod)d+pop+pod2+XoX-L2d Rmind,p,k = min(Rd,p&Ck) Rmaxd,p,k = max(Rd,p&Ck) GAP: Gap Clusterer If DensThres unrealized, cut C mid-gap of Ld,p&C with next (d,pd)dpSet PCC: Precipitous Count Change Clusterer If DensThres unrealized, cut C at PCCsLd,p&C with next (d,pd)dpSet TKO: Top K Outlier Detector Use D2NN=SqDist(x, X')=rank2Sx for TopKOutlier-slider. LIN: Linear Classifier yCk iff yLHk  {z | Lmind,p,k  (z-p)od  Lmmaxd,pd,k}  (d,p)dpSet LHk is a hull around Ck. dpSet is a set of (d,p) pairs, e.g., (Diag,DiagStartPt). XoX+pop-2Xop - [Xod-pod]2 (X-p)o(X-p) - [(X-p)od]2 = p x d (x-p)o(x-p) (x-p)od = |x-p| cos  (x-p)o(x-p) - (x-p)od2 Xod2 - 2pod Xod + pod2 + pod2+XoX+pop or Xod2 - 2pod Xod - 2pod Xod + pod2 XoX+pop-2Xop - Xod2 + XoX RkiPtr(x,PtrRankiSx). RkiSD(x,rankiS2) ordered desc on rankiSx as it's constructed. Pre-compute what? 1. col stats(min, avg, max, std,...) ; 2. XoX; Xop, p=class_Avg/Med); 3. Xod, d=interclass_Avg/Med_UnitVector; 4. Xox, d2(X,x), Rkid2(X,x), xX, i=2..; 5. Ld,p and Rd,p d,pdpSet LSR: Linear Spherical Radial Classifier: yCk iff yLSRHk{z | Tmind,p,k (z-p)od Tmaxd,p,k (d,p)dpSet, T=L|S|R }
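The three functionals just defined can be sketched in a few lines of NumPy; this is a plain-vector illustration (not the pTree implementation), with the array names X, d, p chosen here for the example:

```python
import numpy as np

def lsr_functionals(X, d, p):
    """Compute the FAUST functionals for dataset X (N x n),
    unit direction d and anchor point p (both length n)."""
    L = (X - p) @ d                    # L_{d,p}: signed projection onto the d-line through p
    S = ((X - p) ** 2).sum(axis=1)     # S_p: squared distance from p
    R = S - L ** 2                     # R_{d,p}: squared perpendicular (radial) distance
    return L, S, R

X = np.array([[1.0, 3.0], [2.0, 3.0], [4.0, 7.0]])
d = np.array([1.0, 0.0])               # unit vector e1
p = np.zeros(2)
L, S, R = lsr_functionals(X, d, p)
```

Per row, R = S − L² recovers exactly the radial (barrel) distance used by the R hull segments.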

2 A Pillar pkmeans clusterer (k is revealed, not specified.)
Machine Learning is moving data up a concept hierarchy using a similarity, and it takes two forms: Clustering (unsupervised) groups similar objects into a single higher-level object, a cluster; Classification (supervised) is the same but is supervised by an existing class assignment function, caf: TrainingSet → {Classes}.
1. Clustering for anomaly detection (which boils down to finding singleton [and/or doubleton...] clusters).
2. Clustering to develop a training set for classification of future unclassified objects.
A similarity is usually a function s: X×X → OrdinalSet such that ∀x ∈ X, s(x,x) ≥ s(x,y) ∀y ∈ X (every x must be at least as similar to itself as to any other object) and s(x,y) = s(y,x). OrdinalSet is usually a subset of {0,1,...} (e.g., binary {0,1} = {No,Yes}). Classification is binary-similarity-function clustering: s(x,y) = 1 iff caf(x) = caf(y), using the part of caf that is known to predict the part that is unknown.
A Pillar pkmeans clusterer (k is revealed, not specified):
m1 maximizes D1 = Dis(X, avgX). Check whether m1 is an outlier with Sm1; repeat until m1 is a non-outlier.
m2 maximizes D2 = Dis(X, m1); repeat until m2 is a non-outlier (use Sm2). M1,2 = Pm1≥m2, M2,1 = Pm1<m2.
m3 maximizes D3 = D2 + Dis(X, m2); repeat until m3 is a non-outlier. Mi,3 = Pmi≥m3, M3,i = Pmi<m3, i < 3.
m4 maximizes D4 = D3 + Dis(X, m3); repeat until m4 is a non-outlier. Mi,4 = Pmi≥m4, M4,i = Pmi<m4, i < 4. ...
Do this until MinDist(mh, mk), k < h, < Threshold. The Mj = &h≠j Mj,h are the cluster-mask pTrees for the k first-round clusters. Apply pk-means from here on.
A PCC pkmeans clusterer with a kicker: assign each (object, class) a ClassWeight ∈ Reals (all CW initialized at 0); classes are numbered as they are revealed. As we identify pillar d's, compute Ld = Xod and:
1. For the next larger PCI in Ld(C), left to right. 1.1a If followed by a PCD, Ck ← Avg(Ld⁻¹[PCI,PCD]) (or VoM). If Ck is the center of a sphere-gap (or barrel-gap), declare Classk and mask it off. 1.1b If followed by another PCI, declare the next Classk = the sphere-gapped set around Ck = Avg(Ld⁻¹[(3·PCI1+PCI2)/4, PCI2)). Mask it off.
2. For the next smaller PCD in Ld from the left side. 2.1a If preceded by a PCI, declare the next Classk = the subset of Ld⁻¹[PCI,PCD] sphere-gapped around Ck = Avg. Mask it off. 2.1b If preceded by another PCD, declare the next Classk = the subset of the same, sphere-gapped around Ck = Avg(Ld⁻¹[PCD2, (PCD1+PCD2)/4]). Mask it off.
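The pillar initialization above (pick seeds that are mutually far apart by maximizing accumulated distance) can be sketched as follows; the outlier re-checks with S are omitted, and the function name and data are illustrative only:

```python
import numpy as np

def pillar_init(X, k):
    """Pillar seeding for pk-means (a sketch): m1 maximizes distance to avgX,
    each later seed maximizes the accumulated distance to all prior seeds."""
    seeds = [X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]]  # m1
    D = np.zeros(len(X))
    for _ in range(k - 1):
        D += np.linalg.norm(X - seeds[-1], axis=1)   # accumulate Dis(X, newest seed)
        seeds.append(X[np.argmax(D)])
    return np.array(seeds)

X = np.array([[0, 0], [0, 0], [0, 0], [10, 0], [10, 0], [10, 0],
              [0, 10], [0, 10], [0, 10]], dtype=float)
seeds = pillar_init(X, 3)   # one seed lands in each of the three clumps
```

In the full algorithm each seed would be rejected and recomputed while it is an S-identified outlier.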

3 FAUST LSR Classification on IRIS150, a new version (old version is in the appendix)
Ld, d = 1000, p = origin: MinL and MaxL for classes S, E, I.
1. If you're classifying individual unclassified samples one at a time, applying these formulas gives 100% accuracy in terms of true positives (assuming the given training set fully characterizes the classes). We have used just d = 1000, so many more edges could be placed on these hulls to eliminate false positives.
2. If there is a whole table of unclassified samples to be classified (e.g., millions or billions), then it might be time-cost effective to convert that table to a pTreeSet and then convert these inequalities to pTree inequalities (EIN Ring technology) to accomplish the classification as one batch process (no loop required).
Root step (L1000(y) = y1):
if y1 < 49 {y isa S} elseif 49 ≤ y1 ≤ 58 {y isa SEI}1 elseif 58 < y1 ≤ 70 {y isa EI}2 elseif 70 < y1 ≤ 79 {y isa I} else {y isa O}
{y isa SEI}1 recursive step (R1000,AvgS; cut values 99, 393, 1096, 1217, 1826):
if R1000,AvgS(y) ≤ 99 {y isa S} elseif 99 < R1000,AvgS(y) < 393 {y isa O} elseif 393 ≤ R1000,AvgS(y) ≤ 1096 {y isa E} elseif 1096 < R1000,AvgS(y) < 1217 {y isa O} elseif 1217 ≤ R1000,AvgS(y) ≤ 1826 {y isa I} else {y isa O}
{y isa EI}2 recursive step (R1000,AvgE; cut values 270, 792, 1558, 2568):
if R1000,AvgE(y) < 270 {y isa O} elseif R1000,AvgE(y) < 792 {y isa E} elseif 792 ≤ R1000,AvgE(y) ≤ 1558 {y isa EI}3 elseif 1558 < R1000,AvgE(y) ≤ 2568 {y isa I} else {y isa O}
{y isa EI}3 recursive step (Ld, d = AvgE−AvgI; cut values 13.6, 15.9, 16.6):
if LAvE−AvI(y) < 13.6 {y isa E} elseif 13.6 ≤ LAvE−AvI(y) ≤ 15.9 {y isa EI}4 elseif 15.9 < LAvE−AvI(y) ≤ 16.6 {y isa I} else {y isa O}
{y isa EI}4 recursive step (RAvE−AvI,AvgE; cut values 22.69, 31.02, 35.51, 54.32):
if RAvE−AvI,AvgE(y) < 31.02 {y isa E} elseif RAvE−AvI,AvgE(y) ≤ 35.51 {y isa EI}5 elseif RAvE−AvI,AvgE(y) ≤ 54.32 {y isa I} else {y isa O}
{y isa EI}5 recursive step (LAvgE−AvgI,origin; cut values 1.78, 6.26):
if LAvgE−AvgI,origin(y) ≤ 1.78 {y isa E} elseif 6.26 ≤ LAvgE−AvgI,origin(y) {y isa I} else {y isa O}
Summary of cuts (interval → class; anything else → OTHER):
L1000,origin(y): [43,49) → S; [49,58] → SEI; (58,70] → EI; (70,79] → I
R1000,AvgS(y): [0,99] → S; [393,1096] → E; [1217,1826] → I
R1000,AvgE(y): [270,792) → E; [792,1558] → EI; (1558,2568] → I
LAvE−AvI,origin(y): [5.7,13.6) → E; [13.6,15.9] → EI; (15.9,16.6] → I
RAvE−AvI,AvgE(y): [22.7,31) → E; [31,35.52] → EI; (35.52,54.32] → I
LAvE−AvI,origin(y): ≤ 1.78 → E; ≥ 6.26 → I
The LSR Decision Tree algorithm is: build a decision tree for each ek (also for some ek combos?). Build branches to 100% TP (no class duplication exiting). Then y isa C iff y isa C in every tree; else y isa Other. At each node, build a branch for each pair of classes in each interval.
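The root step of this tree can be sketched directly; the function name is illustrative, and the lower cut at 43 is taken from the appendix slide's pTree pseudo-code:

```python
def classify_iris_root(y1):
    """Root step of the FAUST LSR decision tree on IRIS using only
    L_1000(y) = y1 (sepal length). Returns a class or a branch to recurse into."""
    if y1 < 43:
        return "Other"            # below the S hull
    elif y1 < 49:
        return "S"
    elif y1 <= 58:
        return "SEI-branch-1"     # recurse with R_1000,AvgS
    elif y1 <= 70:
        return "EI-branch-2"      # recurse with R_1000,AvgE
    elif y1 <= 79:
        return "I"
    return "Other"                # above the I hull
```

Each "branch" return value marks where one of the recursive steps above takes over.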

4 FAUST LSR DT Classification on IRIS150, d= 0100
Instead of calculating R's wrt a freshly calculated Avg in each slice, we calculate R0100,AvgS, R0100,AvgE, R0100,AvgI once, then AND with the mask P20≤L0100,Origin<22, and later with the masks P22≤L0100,Origin<23, P23≤L0100,Origin≤34, P34<L0100,Origin≤38 and P38<L0100,Origin≤44.
L0100,Origin(y) class counts (S, E, I): 1 2 29 47 46 15 3 6
On 22≤L0100,O<23: R0100,AvgE 15 18 58 59
On 23≤L0100,O≤34: R0100,AvgS 0 66 (46,12) 3 234; R0100,AvgI 36 929
On 34<L0100,O≤38: 0 55 96 273
On 23≤L0100,O≤34 & 352≤R0100,AvgS≤1750: LAvgEAvgI,Origin (44,11)
It takes 7 recursive rounds to separate E and I (build this branch to 100% TP) in this branch of the e2 = 0100 (Petal Width) tree. It seems clear we are mostly peeling off outliers a few at a time. Is it because we are not revising the Avg vectors as we go (to get the best angle)? On the next slide we make a fresh calculation of Avg for each subcluster. It also appears to be unnecessary to position the starting point of the AvgEAvgI vector at both AvgE and AvgI.
On 23≤L0100,O≤34 & 352≤R0100,AvgS≤1750 & 53≤LAvgEAvgI,Origin≤77: RAvgEAvgI,AvgE (40,10); RAvgEAvgI,AvgI
On 23≤L0100,O≤34 & 352≤R0100,AvgS≤1750 & 53≤LAvgEAvgI,Origin≤77 & 2.8≤RAvgEAvgI,AvgE≤75.2: LAvgEAvgI,Origin (7,7)
On 23≤L0100,O≤34 & 352≤R0100,AvgS≤1750 & 53≤LAvgEAvgI,Origin≤77 & 2.8≤RAvgEAvgI,AvgE≤75.2 & 74.1≤LAvgEAvgI,Origin≤76.2: RAvgEAvgI,AvgE (6,4)
On 23≤L0100,O≤34 & 352≤R0100,AvgS≤1750 & 53≤LAvgEAvgI,Origin≤77 & 2.8≤RAvgEAvgI,AvgE≤75.2 & 74.1≤LAvgEAvgI,Origin≤76.2 & 15.4≤RAvgEAvgI,AvgE≤57.3: LAvgEAvgI,Origin (6,1)
On 23≤L0100,O≤34 & 352≤R0100,AvgS≤1750 & 53≤LAvgEAvgI,Origin≤77 & 2.8≤RAvgEAvgI,AvgE≤75.2 & 74.1≤LAvgEAvgI,Origin≤76.2 & 15.4≤RAvgEAvgI,AvgE≤57.3 & 74.2≤LAvgEAvgI,Origin≤75.6: RAvgEAvgI,AvgE 15 37 57 (6,0) (0,1)
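The mask-chaining idea above (compute each functional once, then AND cut masks instead of re-slicing the data) can be sketched with boolean arrays standing in for pTree masks; the L and R values here are invented for illustration:

```python
import numpy as np

# pTree-style masking (a sketch): each cut produces a boolean mask, and each
# recursive round refines by AND-ing a new cut mask onto the previous one.
L = np.array([21, 22, 25, 30, 36, 40, 45])         # hypothetical L_0100,Origin values
R = np.array([100, 200, 400, 900, 1500, 50, 80])   # hypothetical R_0100,AvgS values

mask = (L >= 23) & (L <= 34)       # P_{23 <= L_0100,O <= 34}
mask &= (R >= 352) & (R <= 1750)   # refine: & with the R-cut mask
```

Because the masks are reusable, the same precomputed R column serves every branch of the tree.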

5 FAUST LSR DT Classification on IRIS150, d= 0100
L0100,Origin(y) class counts (S, E, I): 1 2 29 47 46 15 3 6
On this slide we do the same as on the last, but make a fresh calculation of Avg for each recursive step.
On 23≤L0100,O≤34: R0100,AvgS 0 43 (45,12)
On 23≤L0100,O≤34 & 320≤R0100,AvgS≤1820: LAvgEAvgI,Origin (24,9)
It takes 7 recursive rounds again to separate E and I in this branch of the e2 = 0100 (Petal Width) tree. From this incomplete testing, it seems not to be beneficial to make expensive fresh Avg calculations.
On 23≤L0100,O≤34 & 320≤R0100,AvgS≤1820 & 25≤LAvgEAvgI,Origin≤30: RAvgEAvgI,AvgE (24,6)
On 23≤L0100,O≤34 & 320≤R0100,AvgS≤1820 & 25≤LAvgEAvgI,Origin≤30 & 4≤RAvgEAvgI,AvgE≤88: LAvgEAvgI,Origin (18,5)
On 23≤L0100,O≤34 & 320≤R0100,AvgS≤1820 & 25≤LAvgEAvgI,Origin≤30 & 4≤RAvgEAvgI,AvgE≤88 & 3411≤LAvgEAvgI,Origin≤4397: RAvgEAvgI,AvgE (10,5)
On 23≤L0100,O≤34 & 320≤R0100,AvgS≤1820 & 25≤LAvgEAvgI,Origin≤30 & 4≤RAvgEAvgI,AvgE≤88 & 3411≤LAvgEAvgI,Origin≤4397 & 5.9≤RAvgEAvgI,AvgE≤20.5: LAvgEAvgI,Origin (1,1)

6 Sp ≡ (X−p)o(X−p) = XoX + Xo(−2p) + pop, so if we have precomputed XoX...
FAUST LSR DT Classification on IRIS150, d = 0100
L0100,Origin(y) class counts (S, E, I): 1 2 29 47 46 15 3 6 11 23
On 23≤L0100,O≤34: R0100,AvgE 3 234 (13,21)
On 23≤L0100,O≤34 & 58≤R0100,AvgS≤234: SBarrelAvgE (13,18); SBarrelAvgI (7,14); LAvgEAvgI,Origin (7,13)
At this point we pause the algorithm and try SBarrelAvgE and SBarrelAvgI in addition to LAvEAvI,O. Next we try inserting SLinearAvgE and SLinearAvgI in series with LAvEAvI,O instead of in parallel.
On 23≤L0100,O≤34 & 58≤R0100,AvgS≤234 & 25≤LAvEAvI,O≤32: SLinearAvgE (1,5); SLinearAvgI (6,11)
Seems very beneficial! Use only the LinearAvg with the smallest count, in this case LinearAvgE?
On 23≤L0100,O≤34 & 58≤R0100,AvgS≤234 & 25≤LAvEAvI,O≤32 & 27≤SLinearAvgE≤66.1: Sp 11 23 (1,5)
Sp ≡ (X−p)o(X−p) = XoX + Xo(−2p) + pop, so if we have precomputed XoX, each new anchor p costs only the linear functional Xo(−2p) plus the scalar pop.
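The payoff of the precomputed-XoX identity can be checked numerically; a minimal sketch, with X and p chosen for illustration:

```python
import numpy as np

# S_p = (X-p)o(X-p) = XoX - 2 Xop + pop: with XoX precomputed once,
# each new anchor p costs only one matrix-vector product and a scalar.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
XoX = (X * X).sum(axis=1)              # precomputed once, reused for every p

p = np.array([1.0, 1.0])
S_fast = XoX - 2 * (X @ p) + p @ p     # S_p via the expanded identity
S_direct = ((X - p) ** 2).sum(axis=1)  # S_p computed from scratch
```

Both routes agree exactly, so only the cheap Xop term needs recomputing per p.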

7 We build hull segments on each interval and OR them.
Why I like the FAUST Linear-Spherical-Radial Serial-Parallel Classifier (FAUST LSR SP) very much:
The parallel part lets us build a pair of Linear, Spherical and Radial hull segments for every pTree computation (the more the merrier).
The serial part allows us the possibility of building a hull better than the convex hull! E.g., in a linear step, if we use not only min and max but also PCIs and PCDs, we could potentially cut on d at minL = pci1L, pcd1L, pci2L, and maxL = pcd2L. On each PCC interval (not yet well defined in general, but in this example they are [pci1L,pcd1L], (pcd1L,pci2L), [pci2L,pcd2L]) we build hull segments and OR them, whereas the convex hull (in orange) admits lots of false positives.

8 APPENDIX FAUST Oblique LSR Classification IRIS150
[Charts of Ld for d = 1000, 0100, 0010, 0001 with p = origin, and Rd,p for p = AvgS, AvgE, AvgI, showing the class minima/maxima of S, E, I under each functional. E.g., for d = 1000 the cut values are 43, 49, 58, 70, 79; for R1000,AvgS they are 99, 393, 1096, 1217, 1826; for R1000,AvgE they are 270, 1558, 2568.]
In pTree pseudo-code: Py<43 = PO; P43≤y<49 = PS; P49≤y≤58 = PSEI; P58<y≤70 = PEI; P70<y≤79 = PI; PO := PO or Py>79.

9 This first example suggests that recursion can be important.
Ld,p = (X−p)od (if p = origin we write Ld = Xod) is a distance-dominated functional, meaning dis(Ld,p(x), Ld,p(y)) ≤ dis(x,y) ∀x,y ∈ X. Therefore there is no conflict between Ld,p gap-enclosed clusters for different d's; i.e., consecutive Ld,p gaps always bound a separate cluster (but not necessarily vice versa). A PCI followed by a PCD bounds a separate cluster (with nesting issues to be resolved!).
Row  Attr1  Attr2
1    0      0
2    0      100
3    0      0
4    110    110
5    0      114
6    0      123
7    0      145
8    0      0
Recursion solves problems: the gap isolating point 4 is revealed by a Le1(X) = Attr1 gap. Recursively restricting to the remainder and applying Le2(X) = Attr2 reveals the 2 other gaps (separating {1,3,8}, {2} and {5,6,7}). This first example suggests that recursion can be important.
A different example suggests that recursion order can also be important:
Row  Attr1  Attr2
1    0      0
2    0      25
3    0      50
4    75     75
5    0      100
6    0      125
7    0      150
Using the ordering d = e2, e1 recursively: Le2 = Attr2 reveals no gaps, so Le1 = Attr1 is applied to all of X and reveals only the gap around point 4. Using the ordering d = e1, e2 instead: Le1 = Attr1 on X reveals a gap of at least 100 around point 4; then Le2 = Attr2, applied to X−{4}, reveals a gap of 50 between {1,2,3} and {5,6,7} as well. Note that StD doesn't always reveal the best order! What about the other functionals?
Sp = (X−p)o(X−p) and Rd,p = Sp − L2d,p. To be more careful, we can only say that Sp (and therefore also Rd,p) is eventually distance dominated, meaning dis(Sp(x), Sp(y)) ≤ dis(x,y) provided 1 ≥ dis(p,x) + dis(p,y): letting r = dis(p,x) = √Sp(x) and s = dis(p,y) = √Sp(y) with r > s, the triangle inequality gives r − s ≤ dis(x,y), and so dis(Sp(x), Sp(y)) = r2 − s2 = (r−s)(r+s) ≤ dis(x,y)·[dis(p,x) + dis(p,y)].
When does FAUST Gap suffice for clustering? For text mining?
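The gap-clustering idea on these examples can be sketched in one dimension: sort the projected values Ld(X) and cut wherever consecutive values differ by more than a gap threshold. The function name and threshold are illustrative:

```python
import numpy as np

def gap_clusters(values, min_gap):
    """1-D gap clustering (a sketch of FAUST Gap): cut the sorted projection
    wherever consecutive values differ by more than min_gap; return the
    cluster label of each point in original order."""
    order = np.argsort(values)
    labels = np.empty(len(values), dtype=int)
    cluster = 0
    labels[order[0]] = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > min_gap:
            cluster += 1               # a gap starts a new cluster
        labels[cur] = cluster
    return labels

attr1 = np.array([0, 0, 0, 110, 0, 0, 0, 0])  # Attr1 of the first example table
labels = gap_clusters(attr1, min_gap=50)       # isolates point 4 (index 3)
```

Applied recursively on Attr2 within each cluster, this reproduces the two-step recursion described above.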

10 A PCC pkmeans clusterer with a kicker: assign each (object, class) a ClassWeight ∈ Reals (all CW initialized at 0); classes are numbered as they are revealed. Do for the next d, until after masking off a new cluster the count is too high (doesn't drop enough): compute Ld = Xod and
1. For the next larger PCI in Ld(C), left to right.
1.1a If followed by a PCD, declare Ck = Centroid(Link), where Link = Ld⁻¹[PCI,PCD] (Centroid = Avg or Median). If an SCk-gapped cluster results (barrel?), declare it Classk and mask it off; else, on Link, apply Pillar pkmeans and mask off.
1.1b If followed by another PCI, declare Ck = Centroid(Ld⁻¹[(3·PCI2+PCI1)/4, PCI2]). If an SCk-gapped cluster results, declare it Classk and mask it off; else, on Link = Ld⁻¹[PCI1,PCI2), apply the initial centroid-identification step of Pillar pkmeans starting with Ck as the initial medoid (since we already have SCk). Mask off any S-identified outliers and any S-gapped cluster (wrt the entire remains of X).
2. For the next smaller PCD in Ld from the left side.
2.1a If preceded by a PCI, declare the next Classk = the subset of Ld⁻¹[PCI,PCD] sphere-gapped around Ck = Avg. Mask it off.
2.1b If preceded by another PCD, declare the next Classk = the subset of the same, sphere-gapped around Ck = Avg(Ld⁻¹[PCD2, (PCD1+PCD2)/4]). Mask it off.
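Detecting the PCIs and PCDs themselves can be sketched on a histogram of Ld values; the function, threshold and counts below are illustrative, not the slide's exact detector:

```python
import numpy as np

def pcc_points(counts, thresh):
    """Precipitous Count Changes (a sketch): bins where the histogram count
    jumps up by more than thresh (PCI) or drops by more than thresh (PCD)."""
    diff = np.diff(counts.astype(int))
    pci = (np.where(diff > thresh)[0] + 1).tolist()   # bin where the increase lands
    pcd = np.where(diff < -thresh)[0].tolist()        # last bin before the drop
    return pci, pcd

counts = np.array([0, 1, 9, 10, 9, 1, 0, 8, 9, 8, 0])  # two dense runs
pci, pcd = pcc_points(counts, thresh=5)
```

Each (PCI, PCD) pair brackets a dense run of Ld values, which is exactly where steps 1.1a/2.1a cut.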

11 LSR IRIS150-. Consider all 3 functionals, L, S and R. What's the most efficient way to calculate all 3?\ o=origin; pRn; dRn, |d|=1; {Ck}k=1..K are the classes; An operation enclosed in a parallelogram, , means it is a pTree op, not a scalar operation (on just numeric operands) Lp,d  (X - p) o d = Lo,d - [pod] minLp,d,k = min[Lp,d & Ck] maxLp,d,k = max[Lp,d & Ck[ = [minLo,d,k] - pod = [maxLo,d,k] - pod = min(Xod & Ck) - pod = max(Xod & Ck) - pod OR = min(X&Ck) o d - pod = max(X&Ck) o d - pod Sp = (X - p)o(X - p) = -2Xop+So+pop = Lo,-2p + (So+pop) minSp,k=minSp&Ck maxSp,k = maxSp&Ck = min[(X o (-2p) &Ck)] + (XoX+pop) =max[(X o (-2p) &Ck)] + (XoX+pop) OR = min[(X&Ck)o-2p] + (XoX+pop) =max[(X&Ck)o-2p] + (XoX+pop) Rp,d  Sp, - Lp,d2 minRp,d,k=min[Rp,d&Ck] maxRp,d,k=max[Rp,d&Ck] I suggest that we use each of the functionals with each of the pairs, (p,d) that we select for application (since, to get R we need to compute L and S anyway). So it would make sense to develop an optimal (minimum work and time) procedure to create L, S and R for any (p,d) in the set.

12 LSR on IRIS150
Dse (L and H values for S, E, I):
y isa OTHER if yoDse ∈ (−∞,495)∪(802,1061)∪(2725,∞)
y isa OTHER or S if yoDse ∈ C1,1 ≡ [495,802]
y isa OTHER or I if yoDse ∈ C1,2 ≡ [1061,1270]
y isa OTHER or E or I if yoDse ∈ C1,3 ≡ [1270,2010]   (C1,3: 0 s, 49 e, 11 i)
y isa OTHER or I if yoDse ∈ C1,4 ≡ [2010,2725]
Dei (L and H values for E, I):
y isa O if yoDei ∈ (−∞,−117)∪(−3,∞)
y isa O or E or I if yoDei ∈ C2,1 ≡ [−62,−44]   (C2,1: 2 e, 4 i)
y isa O or I if yoDei ∈ C2,2 ≡ [−44,−3]
Dei (L and H values for E, I):
y isa O if yoDei ∈ (−∞,420)∪(459,480)∪(501,∞)
y isa O or E if yoDei ∈ C3,1 ≡ [420,459]
y isa O or I if yoDei ∈ C3,2 ≡ [480,501]
Continue this on clusters with OTHER + one class, so the hull fits tightly (reducing false positives), using diagonals?
C1,1: D=1000: y isa O if yoD ∈ (−∞,43)∪(58,∞); y isa O|S if yoD ∈ C2,3 ≡ [43,58]
C2,3: D=0100: y isa O if yoD ∈ (−∞,23)∪(44,∞); y isa O|S if yoD ∈ C3,3 ≡ [23,44]
C3,3: D=0010: y isa O if yoD ∈ (−∞,10)∪(19,∞); y isa O|S if yoD ∈ C4,1 ≡ [10,19]
C4,1: D=0001: y isa O if yoD ∈ (−∞,1)∪(6,∞); y isa O|S if yoD ∈ C5,1 ≡ [1,6]
C5,1: D=1100: y isa O if yoD ∈ (−∞,68)∪(117,∞); y isa O|S if yoD ∈ C6,1 ≡ [68,117]
C6,1: D=1010: y isa O if yoD ∈ (−∞,54)∪(146,∞); y isa O|S if yoD ∈ C7,1 ≡ [54,146]
C7,1: D=1001: y isa O if yoD ∈ (−∞,44)∪(100,∞); y isa O|S if yoD ∈ C8,1 ≡ [44,100]
C8,1: D=0110: y isa O if yoD ∈ (−∞,36)∪(105,∞); y isa O|S if yoD ∈ C9,1 ≡ [36,105]
C9,1: D=0101: y isa O if yoD ∈ (−∞,26)∪(61,∞); y isa O|S if yoD ∈ Ca,1 ≡ [26,61]
Ca,1: D=0011: y isa O if yoD ∈ (−∞,12)∪(91,∞); y isa O|S if yoD ∈ Cb,1 ≡ [12,91]
Cb,1: D=1110: y isa O if yoD ∈ (−∞,81)∪(182,∞); y isa O|S if yoD ∈ Cc,1 ≡ [81,182]
Cc,1: D=1101: y isa O if yoD ∈ (−∞,71)∪(137,∞); y isa O|S if yoD ∈ Cd,1 ≡ [71,137]
Cd,1: D=1011: y isa O if yoD ∈ (−∞,55)∪(169,∞); y isa O|S if yoD ∈ Ce,1 ≡ [55,169]
Ce,1: D=0111: y isa O if yoD ∈ (−∞,39)∪(127,∞); y isa O|S if yoD ∈ Cf,1 ≡ [39,127]
Cf,1: D=1111: y isa O if yoD ∈ (−∞,84)∪(204,∞); y isa O|S if yoD ∈ Cg,1 ≡ [84,204]
Cg,1: D=1-100: y isa O if yoD ∈ (−∞,10)∪(22,∞); y isa O|S if yoD ∈ Ch,1 ≡ [10,22]
Ch,1: D=10-10: y isa O if yoD ∈ (−∞,3)∪(46,∞); y isa O|S if yoD ∈ Ci,1 ≡ [3,46]
The amount of work yet to be done, even for only 4 attributes, is immense. For each D, we should fit boundaries for each class, not just one class. For 4 attributes, I count 77 diagonals × 3 classes = 231 cases. How many in the Enron case with 10,000 columns? Too many for sure!! For each D, not only cut at min CoD and max CoD, but also limit the radial reach for each class (barrel analytics)? Note, limiting the radial reach limits all other directions [other than the D direction] in one step and therefore by the same amount; i.e., it limits all directions assuming perfectly round clusters. Think about Enron: some words (columns) have high count and others have low count. Our radial reach threshold would be based on the highest count and would therefore admit many false positives. We can cluster directions (words) by count and limit radial reach differently for different clusters??
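The chained diagonal refinement of the class-S hull can be sketched as a loop over (D, lo, hi) steps; the two sample rows are invented to illustrate one point surviving all cuts and one being rejected early:

```python
import numpy as np

# Refining the S hull with successive diagonals (a sketch): each step keeps
# only points whose projection onto D lies in the stated interval.
steps = [((1, 0, 0, 0), 43, 58),   # C1,1 -> C2,3
         ((0, 1, 0, 0), 23, 44),   # C2,3 -> C3,3
         ((0, 0, 1, 0), 10, 19),   # C3,3 -> C4,1
         ((0, 0, 0, 1),  1,  6)]   # C4,1 -> C5,1

X = np.array([[50, 33, 14, 2],     # a setosa-like sample (survives every cut)
              [63, 28, 49, 15]])   # a virginica-like sample (cut at step 1)
mask = np.ones(len(X), dtype=bool)
for D, lo, hi in steps:
    proj = X @ np.array(D)         # yoD for every remaining candidate
    mask &= (proj >= lo) & (proj <= hi)
```

Each additional diagonal tightens the hull; the final mask approximates membership in C5,1.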

13 Dot Product SPTS computation: XoD = Σk=1..n Xk·Dk
The slide works the bit-level computation for X with 2-bit columns (rows (1,1), (3,3), (2,1)) and D = (3,3), giving XoD = (6, 18, 9): XoD = Σk (2·Dk,1 + Dk,0)(2·pk,1 + pk,0) expands into bit-slice products 2^(i+j)·Dk,i·pk,j, where each product Dk,i·pk,j is a pTree AND, and the weighted slices are summed bitslice-by-bitslice with carries:
/* Calc PXoD,i after PXoD,i−1; CarrySet = CARi−1,i, RawSet = RSi */
INPUT: CARi−1,i, RSi
ROUTINE: PXoD,i = RSi ⊕ CARi−1,i; CARi,i+1 = RSi & CARi−1,i
OUTPUT: PXoD,i, CARi,i+1
We have extended the Galois field GF(2) = {0,1} (XOR = add, AND = mult) to pTrees. SPTS multiplication (note: pTree multiplication = &): X1*X2 = (2·p1,1 + p1,0)(2·p2,1 + p2,0) = 4·p1,1p2,1 + 2·(p1,1p2,0 + p1,0p2,1) + p1,0p2,0; for the X above, X1*X2 = (1, 9, 2).
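The bit-slice decomposition can be sketched as follows; the XOR/AND carry routine of the slide is replaced here by ordinary integer accumulation of the weighted slices, which is arithmetically equivalent:

```python
import numpy as np

def bitslice_dot(X, D, bits=8):
    """X o D computed from bit slices (a sketch): decompose each column of X
    into slices p_{k,b}, weight each slice by 2^b * D_k, and accumulate.
    Integer addition stands in for the pTree XOR/AND carry bookkeeping."""
    acc = np.zeros(len(X), dtype=int)
    for k in range(X.shape[1]):
        for b in range(bits):
            slice_b = (X[:, k] >> b) & 1      # bit slice p_{k,b} as 0/1 ints
            acc += (1 << b) * D[k] * slice_b  # weight by 2^b and D_k
    return acc

X = np.array([[1, 1], [3, 3], [2, 1]])        # the slide's 2-bit example data
D = np.array([3, 3])
result = bitslice_dot(X, D)                   # matches X @ D
```

Only bit masks and shifts touch the data, mirroring how the SPTS version needs only pTree ANDs and adds.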

14 FAUST Oblique: XoD (used in CCC, TKO, PLC and LARC) and (x−X)o(x−X)
XoD is used in CCC, TKO, PLC and LARC, and (x−X)o(x−X) = −2Xox + xox + XoX is used in TKO. So in FAUST we need to construct lots of SPTSs of the type "X dotted with a fixed vector," a costly pTree calculation. (Note that XoX is costly too, but it is a 1-time calculation, a pre-calculation; xox is calculated for each individual x, but it's a scalar calculation and just a read-off of a row of XoX, once XoX is calculated.) Thus, we should optimize the living he__ out of the XoD calculation!!! The methods on the previous slide seem efficient. Is there a better method?
Then for TKO we need to compute ranks. RankK: p is what's left of K yet to be counted, initially p = K; V is the RankK value, initially 0.
For i = bitwidth−1 down to 0: if Count(P & Pi) ≥ p { V = V + 2^i; P = P & Pi } else /* < p */ { p = p − Count(P & Pi); P = P & P'i }
The slide then works RankN−1(XoD) = Rank2(XoD) for D = x1 (giving −2x1oX = −6), D = x2 (−2x2oX = −6) and D = x3 (−2x3oX = −10).
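The RankK descent above translates almost line for line into array code; a minimal sketch, using the next slide's example values:

```python
import numpy as np

def rank_k(values, K, bitwidth=8):
    """pTree RankK (a sketch): find the K-th largest value by descending the
    bit slices. P is the candidate mask; p counts how many of the top K are
    still to be located."""
    P = np.ones(len(values), dtype=bool)
    p, val = K, 0
    for i in range(bitwidth - 1, -1, -1):
        Pi = ((values >> i) & 1) == 1          # bit-slice pTree P_i
        c = np.count_nonzero(P & Pi)
        if c >= p:                             # enough candidates have this bit set
            val += 1 << i
            P &= Pi
        else:                                  # the K-th largest has this bit clear
            p -= c
            P &= ~Pi
    return val, P                              # value, and mask of points holding it

X = np.array([10, 5, 6, 7, 11, 9, 3])
val, P = rank_k(X, K=6, bitwidth=4)            # Rank6 = 6th highest = 2nd smallest
```

One pass over the bit slices suffices; no sorting of the column is ever done.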

15 pTree Rank(K) computation
pTree Rank(K) computation (Rank(N−1) gives the 2nd smallest, which is very useful in outlier analysis?):
RankKval = 0; p = K; c = 0; P = Pure1; /* n = bitwidth−1; the RankK points are returned as the resulting pTree P */
For i = n down to 0 { c = Count(P & Pi); if (c ≥ p) { RankKval = RankKval + 2^i; P = P & Pi } else { p = p − c; P = P & P'i } }
return RankKval, P;
Example on X = 10, 5, 6, 7, 11, 9, 3 with bit slices P4,3..P4,0; K = 7−1 = 6 (looking for the Rank6, or 6th highest, value, which is also the 2nd lowest). Cross out the 0-positions of P at each step:
(n=3): c = Count(P & P4,3) = 3 < 6, so p = 6 − 3 = 3; P = P & P'4,3 masks off the highest values (≥ 8). {0}
(n=2): c = Count(P & P4,2) = 3 ≥ 3; P = P & P4,2 masks off the lowest (< 4). {1}
(n=1): c = Count(P & P4,1) = 2 < 3, so p = 3 − 2 = 1; P = P & P'4,1 masks off the highest remaining. {0}
(n=0): c = Count(P & P4,0) = 1 ≥ 1; P = P & P4,0. {1}
RankKval = 0·2^3 + 1·2^2 + 0·2^1 + 1·2^0 = 5; P = MapRankKPts; ListRankKPts = {2}.

16 UDR Univariate Distribution Revealer (on Spaeth)
Applied to S, a column of numbers in bitslice format (an SpTS), UDR produces the distribution tree of S, DT(S). depthDT(S) ≡ b ≡ BitWidth(S); h = depth of a node, k = node offset. Node h,k has a pointer to pTree{x ∈ S | F(x) ∈ [k·2^(b−h+1), (k+1)·2^(b−h+1))} and its 1-count.
Example on the Spaeth column yofM = 11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83, ... (15 values), with bit slices p6..p0 and complements p6'..p0':
depth h=1: 5/64 in [0,64) (via p6'), 10/64 in [64,128) (via p6).
depth h=2: 3/32 in [0,32), 2/32 in [32,64), 2/32 in [64,96), 8/32 in [96,128).
depth h=3: 1/16 in [0,16), 2/16 in [16,32), 1 in [32,48), 1 in [48,64), 0 in [64,80), 2 in [80,96), 2 in [96,112), 6 in [112,128).
depth h=4: 0 [0,8), 1 [8,16), 1 [16,24), 1 [24,32), 1 [32,40), 0 [40,48), 1 [48,56), 0 [56,64), 2 [80,88), 0 [88,96), 0 [96,104), 2 [104,112), 3 [112,120), 3 [120,128). E.g., node 2,3 covers [96,128).
Pre-compute and enter into the ToC all DT(Yk), plus those for selected linear functionals (e.g., d = main diagonals, ModeVector). Suggestion: in our pTree-base, every pTree (basic, mask, ...) should be referenced in ToC(pTree, pTreeLocationPointer, pTreeOneCount), and these OneCounts should be repeated everywhere (e.g., in every DT). The reason is that these OneCounts help us select the pertinent pTrees to access, and in fact are often all we need to know about the pTree to get the answers we are after.
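One level of the UDR tree is just a histogram whose bins come from the top h bit slices; a sketch on the 13 listed yofM values (the two unlisted values of the 15-row example are omitted, so the counts differ slightly from the slide's):

```python
import numpy as np

def udr_level(values, bitwidth, h):
    """One level of the UDR distribution tree (a sketch): counts in the 2^h
    equal-width intervals of [0, 2^bitwidth), read off the top h bit slices."""
    k = values >> (bitwidth - h)               # node offset k for each value
    return np.bincount(k, minlength=1 << h)

y = np.array([11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83])
level1 = udr_level(y, 7, 1)                    # counts in [0,64), [64,128)
level2 = udr_level(y, 7, 2)                    # counts in the four 32-wide bins
```

Descending one level means consulting one more bit slice, which is why the whole tree is cheap to build from an SpTS.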

17 So let us look at ways of doing the work to calculate XoD = Σk=1..n Xk*Dk. As we recall from the below, the task is to ADD bitslices, giving a result bitslice and a set of carry bitslices to carry forward (e.g., for X = (1,1), (3,3), (2,1) and D = (3,3), XoD = (6, 18, 9)).
I believe we add by successive XORs, and the carry set is the raw set with one 1-bit turned off iff the sum at that bit is a 1-bit. Or we can characterize the carry as the raw set minus the result (always carry forward a set of pTrees plus one negative one). We want a routine that constructs the result pTree from a positive set of pTrees plus a negative set always consisting of 1 pTree. The routine is: successive XORs across the positive set, then XOR with the negative-set pTree (because the successive pset XOR gives us the odd values, and if you subtract one pTree, its 1-bits change odd to even and vice versa):
/* For PXoD,i (after PXoD,i−1). CarrySetPos = CSPi−1,i; CarrySetNeg = CSNi−1,i; RawSet = RSi; CSP−1 = CSN−1 = ∅ */
INPUT: CSPi−1, CSNi−1, RSi
ROUTINE: PXoD,i = RSi ⊕ CSPi−1,i ⊕ CSNi−1,i; CSNi,i+1 = CSNi−1,i ∪ {PXoD,i}; CSPi,i+1 = CSPi−1,i ∪ RSi−1
OUTPUT: PXoD,i, CSNi,i+1, CSPi,i+1
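The carry bookkeeping can be sketched with the standard full-adder recurrence on boolean arrays (XOR for the sum slice, AND/OR for the carry); this is a hypothetical illustration of bitslice addition, not the slide's exact positive/negative carry-set routine:

```python
import numpy as np

def bitslice_add(A, B):
    """Add two bitsliced columns (lists of boolean arrays, LSB first) using
    only XOR/AND/OR slice operations (a sketch)."""
    carry = np.zeros(len(A[0]), dtype=bool)
    out = []
    for a, b in zip(A, B):
        out.append(a ^ b ^ carry)                 # result slice
        carry = (a & b) | (carry & (a ^ b))       # full-adder carry slice
    out.append(carry)                             # final carry is the top slice
    return out

def to_slices(v, bits):
    return [((v >> b) & 1) == 1 for b in range(bits)]

v1, v2 = np.array([1, 3, 2]), np.array([2, 1, 3])
S = bitslice_add(to_slices(v1, 2), to_slices(v2, 2))
vals = sum((1 << b) * s.astype(int) for b, s in enumerate(S))   # decode back
```

Multi-operand addition (the real XoD case) just feeds each new raw slice through the same recurrence.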

18 XoD = Σk=1,2 Xk*Dk with pTrees qN..q0, N = 2B + roof(log2 n) + 1
CCC Clusterer: if DT (and/or DUT) is not exceeded at C, partition C further by cutting at each gap and PCC in CoD.
For a table X(X1...Xn), the SPTS Xk*Dk is the column of numbers xk*Dk, and XoD is the sum of those SPTSs: XoD = Σk=1..n Xk*Dk.
Xk*Dk = Dk·Σb 2^b·pk,b = (2^B·Dk,B +...+ 2^0·Dk,0)(2^B·pk,B +...+ 2^0·pk,0)
= 2^2B(Dk,B pk,B) + 2^(2B−1)(Dk,B pk,B−1 + Dk,B−1 pk,B) + 2^(2B−2)(Dk,B pk,B−2 + Dk,B−1 pk,B−1 + Dk,B−2 pk,B) + 2^(2B−3)(Dk,B pk,B−3 + Dk,B−1 pk,B−2 + Dk,B−2 pk,B−1 + Dk,B−3 pk,B) +...+ 2^3(Dk,3 pk,0 + Dk,2 pk,1 + Dk,1 pk,2 + Dk,0 pk,3) + 2^2(Dk,2 pk,0 + Dk,1 pk,1 + Dk,0 pk,2) + 2^1(Dk,1 pk,0 + Dk,0 pk,1) + 2^0(Dk,0 pk,0)
For B = 1: XoD = Σk=1,2 (2^2·Dk,1 pk,1 + 2^1(Dk,1 pk,0 + Dk,0 pk,1) + 2^0·Dk,0 pk,0), with result pTrees qN..q0 built as raw slices plus carries (q0 = raw0 with no carry, q1 = carry0 + raw1, etc.).
So, DotProduct involves just multi-operand pTree addition (no SPTSs and no multiplications). Engineering shortcut tricks would be huge!!!
A carryTree is a valueTree or vTree, as is the rawTree at each level (rawTree = valueTree before the carry is included). In what form is it best to carry the carryTree over (for speediest processing)? 1. As multiple pTrees added at the next level (since the pTrees at the next level are in that form and need to be added)? 2. As an SPTS s1? (The next level's rawTree is an SPTS s2; then s1 ⊕ s2 = q_next_level and s1 & s2 = carry_next_level?)

19 Should we pre-compute all pk,i*pk,j, p'k,i*p'k,j, pk,i*p'k,j?
Question: which primitives are needed and how do we compute them? X(X1...Xn). D2NN yields a 1.a-type outlier detector (top k objects x by dissimilarity from X−{x}): D2NN(x) = rank-2 entry of d2(x,X).
(x−X)o(x−X) = Σk=1..n (xk−Xk)(xk−Xk) = Σk (Σb=B..0 2^b(xk,b − pk,b))^2 = Σk (Σb 2^b·ak,b)^2, expanded by collecting the coefficients of 2^2B, 2^(2B−1), ..., 2^0. When xk,b = 1, ak,b = p'k,b, and when xk,b = 0, ak,b = −pk,b. So D2NN is just multi-operand pTree multiplications/additions/subtractions, and each D2NN row (each x ∈ X) is a separate calculation. Should we pre-compute all pk,i*pk,j, p'k,i*p'k,j, pk,i*p'k,j?
ANOTHER TRY! X(X1...Xn). RKN (Rank-K Neighbor), K = |X|−1, yields a 1.a outlier detector (top y by dissimilarity from X−{x}). Install in RKN each RankK(D2NN(x)) (a 1-time construct, but for, e.g., 1 trillion x's (|X| = N = 1T), slow. Parallelization?). For each x ∈ X, the squared distance from x to its neighbors (near and far) is the column of numbers (vTree or SPTS):
d2(x,X) = (x−X)o(x−X) = Σk |xk−Xk|^2 = Σk (xk^2 − 2xkXk + Xk^2) = xox − 2xoX + XoX
1. Pre-compute the pTree products within each k: XoX = Σk Σi=B..0, j=B..0 2^(i+j) pk,i pk,j. Calculate this sum one time (independent of x).
2. −2xoX: cost is linear in |X| = N.
3. xox: cost is ~zero (pick it from XoX for each x, a row read-off, and add to 2).
XoX is 1-time, amortized over x ∈ X (i.e., ≈1/N each) or precomputed. The addition cost of −2xoX + xox + XoX is linear in |X| = N, so, overall, the cost is linear in |X| = N. Data parallelization? No! (We need all of X at each site.) Code parallelization? Yes! After replicating X to all sites, each site creates/saves D2NN for its partition of X, then sends the requested number(s) (e.g., RKN(x)) back.
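The d2(x,X) = xox − 2xoX + XoX identity with precomputed XoX can be sketched end to end; the data and function name are illustrative:

```python
import numpy as np

def d2nn(X):
    """D2NN(x) for every x (a sketch): the rank-2 (second-smallest) entry of
    d^2(x, X), computed from x.x - 2 x.X + X.X with X.X precomputed once."""
    XoX = (X * X).sum(axis=1)                         # 1-time precompute
    D2 = XoX[:, None] - 2 * (X @ X.T) + XoX[None, :]  # all pairwise squared distances
    return np.sort(D2, axis=1)[:, 1]                  # rank-2 per row (rank-1 is self, 0)

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
scores = d2nn(X)          # the isolated point (10,10) gets a large D2NN
```

Ranking the scores (e.g., with the RankK routine) then yields the top-k outliers directly.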

20 LSR on IRIS150-3 Here we use the diagonals.
Here we try using other p points for the R step (other than the one used for the L step).
With just d = e1, we get good hulls using LARC: while ∃ an interval Ip,d containing more than one class, for the next (d,p) create L(p,d) ≡ Xod − pod and R(p,d) ≡ XoX + pop − 2Xop − L^2.
1. ∀ MnCls(L), MxCls(L), create a linear boundary.
2. ∀ MnCls(R), MxCls(R), create a radial boundary.
3. Use R & Ck to create intra-Ck radial boundaries. Hk = {I | Lp,d includes Ck}.
d = e1, p = AvgS, L = Xod: [43,49): S(16); [49,58): S(34), E(24), I(6); [58,70): E(26), I(32); [70,79]: I(12). R(p,d,X) cut values (S, E, I): 128; 99, 393, 1096, 1217, 1825; 270, 792, 1558, 2567; 2081, 3444. The only overlap is L ∈ [58,70), R ∈ [792,1557]: E(26), I(5).
d = e1, p = AvgS, L = (X−p)od (−pod = −50.06): [−8,−2): 16; [−2,8): 34, 24, 6; [8,20): 26, 32; [20,29]: 12. With p = AvgS for the R step: 30 ambiguous, 5 errors.
d = e4, p = AvgS, L = (X−p)od: [−2,4): 50; [7,11): 28; [11,16): 22 (R cuts 127.5, 648.7, 1554.7, 2892: E=22, I=7 with p = AvgS); [16,23]: I=34.
d = e1, p = AvgS for L, p = AvgE for R: R cuts 1.9, 51.8, 78.6, 633 → E=6, I=4.
d = e4, p = AvgS for L, p = AvgE for R: R cuts 5.7, 36.2, 151.06, 611 → E=17, I=7.
d = e1, p = AvgS for L, p = AvgI for R: R cuts 0.62, 34.9, 387.8, 1369 → E=25, I=10.
d = e4, p = AvgS for L, p = AvgI for R: R cuts 127.5, 1555, 2892 → E=22, I=8.
For e4, the best choice of p for the R step is also p = AvgE. (There are mistakes in this column on the previous slide!) There is a best choice of p for the R step (p = AvgE), but how would we decide that ahead of time?

21 LSR on IRIS150. Using Dse (xoDse limits per class) and SRR(AVGs,dse) on the resulting intervals:
y isa O if yoD ∈ (-∞,-184)∪(123,381)∪(2046,∞)
y isa O or S(50) if yoD ∈ C1,1 = [-184,123]
y isa O if y isa C1,1 AND SRR(AVGs,Dse) ∈ (154,∞)
y isa O or S(50) if y isa C1,1 AND SRR(AVGs,DSE) ∈ [0,154]
y isa O or I(1) if yoD ∈ C1,2 = [381,590] (only one such I)
y isa O or E(50) or I(11) if yoD ∈ C1,3 = [590,1331]
y isa O or I(38) if yoD ∈ C1,4 = [1331,2046]
SRR(AVGs,dse) on C1,3:
y isa O if y isa C1,3 AND SRR(AVGs,Dse) ∈ (-∞,2)∪(143,∞)
y isa O or E(10) if y isa C1,3 AND SRR ∈ [2,7)
y isa O or E(40) or I(10) if y isa C1,3 AND SRR ∈ [7,137) = C2,1
y isa O or I(1) if y isa C1,3 AND SRR ∈ [137,143], etc.
Dei; xoDei on C2,1:
y isa O if yoD ∈ (-∞,-2)∪(19,∞)
y isa O or I(8) if yoD ∈ [-2,1.4]
y isa O or E(40) or I(2) if yoD ∈ C3,1 = [1.4,19]
SRR(AVGe,dei) on C3,1:
y isa O if y isa C3,1 AND SRR(AVGs,Dei) ∈ [0,2)∪(370,∞)
y isa O or E(4) if y isa C3,1 AND SRR(AVGs,Dei) ∈ [2,8)
y isa O or E(27) or I(2) if y isa C3,1 AND SRR(AVGs,Dei) ∈ [8,106)
y isa O or E(9) if y isa C3,1 AND SRR(AVGs,Dei) ∈ [106,370]
We use the Radial steps to remove false positives from gaps and ends. We are effectively projecting onto a 2-dim range generated by the D-line and the perpendicular to the D-line (which measures the perpendicular radial reach from the D-line). In the D projections, we can attempt to cluster directions into "similar" clusters in some way and limit the domain of our projections to one of these clusters at a time, accommodating "oval" or elongated clusters and giving a better hull fit. E.g., in the Enron case the dimensions would be words that have about the same count, reducing false positives.
LSR on IRIS150-2. We use the diagonals. We also set MinGapThres=2, which means we stay 2 units away from any cut.
d=e1=1000; the xod limits per class:
y isa O if yoD ∈ (-∞,43)∪(79,∞)
y isa O or S(9) if yoD ∈ [43,47]
y isa O if yoD ∈ [43,47] & SRR ∈ (-∞,52)∪(60,∞)
y isa O or S(41) or E(26) or I(7) if yoD ∈ (47,60) (y∈C1,2)
y isa O or E(24) or I(32) if yoD ∈ [60,72] (y∈C1,3)
y isa O or I(11) if yoD ∈ (72,79]
y isa O if yoD ∈ [72,79] & SRR ∈ (-∞,49)∪(78,∞)
d=e2=0100 on C1,2; xod limits:
y isa O if yoD ∈ (-∞,18)∪(46,∞)
y isa O or E(3) if yoD ∈ [18,23)
y isa O if yoD ∈ [18,23) & SRR ∈ [0,21)
y isa O or E(13) or I(4) if yoD ∈ [23,28) (y∈C2,1)
y isa O or S(13) or E(10) or I(3) if yoD ∈ [28,34) (y∈C2,2)
y isa O or S(28) if yoD ∈ [34,46]
y isa O if yoD ∈ [34,46] & SRR ∈ [0,32]∪[46,∞)
d=e2=0100 on C1,3; xod limits: zero differentiation!
y isa O if SRR ∈ [0,1.2)∪(799,∞)
y isa O or E(17) if yoD ∈ [60,72] & SRR ∈ [1.2,20]
y isa O or E(7) or I(7) if yoD ∈ [60,72] & SRR ∈ [20,66]
y isa O or I(25) if yoD ∈ [60,72] & SRR ∈ [66,799]
d=e3=0010 on C2,2; xod limits:
y isa O if yoD ∈ (-∞,28)∪(33,∞)
y isa O or S(13) or E(10) or I(3) if yoD ∈ [28,33]
d=e4=0001; xod limits:
y isa O if yoD ∈ (-∞,1)∪(5,12)∪(24,∞)
y isa O or S(13) if yoD ∈ [1,5]
y isa O or E(9) if yoD ∈ [12,16)
y isa O if yoD ∈ [12,16) & SRR ∈ [0,208)∪(558,∞)
y isa O or E(1) or I(3) if yoD ∈ [16,24)
y isa O if yoD ∈ [16,24) & SRR ∈ [0,1198)∪(1199,1254)∪[1424,∞)
y isa O or E(1) if yoD ∈ [16,24) & SRR ∈ [1198,1199]
y isa O or I(3) if yoD ∈ [16,24) & SRR ∈ [1254,1424]

22 LSR IRIS150. Next, we examine:
d=AvgEAvgI p=AvgE, L=(X-p)od S E I R(p,d,X) S E I 2 32 76 357 514 [-17,-14)] I(1) [-14,11) (50, 13) 2.8 134 [11,33] I(36) E=47 I=12 R(p,d,X) S E I .3 .9 4.7 150 204 213 [12,17.5)] I(1) d=AvgSAvgI p=AvgS, L=(X-p)od S E I [17.5,42) (50,12) 6 192 205 [11,33] I(37) E=45 I=12 d=AvgSAvgE p=AvgS, L=(X-p)od S E I R(p,d,X) S E I 2 6 137 154 393 [11,18)] I(1) [18,42) (50,11) 6.92 133 [42,64] 38 E=39 I=11 d=e1 p=AvgS, L=Xod S&L E&L I&L Note that each L=(X-p)od is just a shift of Xod by -pod (for a given d). Next, we examine: For a fixed d, the SPTS, Lp,d. is just a shift of LdLorigin,d by -pod we get the same intervals to apply R to, independent of p (shifted by -pod). Thus, we calculate once, lld=minXod hld=maxXod, then for each different p we shift these interval limit numbers by -pod since these numbers are really all we need for our hulls (Rather than going thru the SPTS calculation of (X-p)od anew  new p). There is no reason we have to use the same p on each of those intervals either. d=e1 p=AS L=(X-p)od (-pod=-50.06) S&L -1; E&L I&L -8,-2 16 [-2,8) 34, 24, 6 99 393 1096 1217 1825 [20,29] 12 [8,20) 26, 32 270 792 1558 2567 E=26 I=5 30ambigs, 5 errs d=e2 p=AvgS, L=(X-p)od S&L E&L I&L ,-13) 1 -13,-11 0, 2, 1 all=-11 [0,4) [4, -11,0 29,47,46 66 310 352 1749 4104 1, 1 46,11 2, 1 9, 3 d=e3 p=AvgS, L=(X-p)od S&L E&L I&L -5,4) 47 [4,15) [37,55] I=34 [15,37) 50, 15 157 297 536 792 E=18 I=12 3, 1 d=e4 p=AvgS, L=(X-p)od S&L E&L I&L -2,4) 50 [7,11) 28 [16,23] I=34 [11,16) 22, 16 11 16 E=22 I=16 38ambigs 16errs d=e1 p=AE L=(X-p)od (-pod=-59.36) S&L E&L I&L -17-11 16 [-11,-1) 33, 21, 3 27 107 172 748 1150 [11,20] I12 [-1,11) 26, 32 1 51 79 633 E=7 I=4 E=5 I=3 d=e2 p=AvgE, L=(X-p)od -5 `17 S&L E&L I&L ,-6) 1 [-6, -5) 0, 2, 1 15 18 58 59 [7,11) [11, 1 err [-5,7) 29,47, 46 3 234 793 1103 1417 13, 21 21, d=e3 p=AvgE, L=(X-p)od S&L E&L I&L ,-25) 48 -25,-12 [9,27] I=34 [-12,9) 49, 15 2(17) 16 158 199 E=32 I=14 d=e4 p=AvgE, L=(X-p)od S&L E&L I&L -7] 50 [-3,1) 21 [5,12] 34 [1,5) 22, 16 
[Continuation of the previous slide's tables, here for p=AvgI: flattened columns of L=(X−p)od interval endpoints and S/E/I counts for d=e1..e4; representative results: d=e1, p=AvgI gives E=26, I=11; d=e4, p=AvgI gives E=22, I=16.] So on the next slide we consider all three functionals, L, S and R. E.g., why not apply S first to limit the spherical reach (eliminating FPs)? S is calculated anyway.
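The shift observation above (L_{p,d} = L_d − pod) is easy to verify numerically; a minimal check with stand-in data:

```python
import numpy as np

# L_{p,d} = (X - p).d = X.d - p.d : the whole SPTS shifts by the scalar p.d,
# so per-class min/max bookends shift by p.d with no new pass over X.
rng = np.random.default_rng(2)
X = rng.random((50, 3))
d = np.array([0.0, 1.0, 0.0])
Ld = X @ d                                     # computed once

for p in (X[:10].mean(axis=0), X[10:].mean(axis=0)):   # two different p's
    Lpd = (X - p) @ d
    assert np.allclose(Lpd, Ld - p @ d)        # same intervals, shifted by -p.d
    assert np.isclose(Lpd.min(), Ld.min() - p @ d)
    assert np.isclose(Lpd.max(), Ld.max() - p @ d)
```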

23 LSR IRIS150, e2. Class limits on the e2 projection: Setosa [23,44], vErsicolor [20,34], vIrginica [22,38].
[Slide residue: Ld tables for d=0100 with p = origin, p = AvgS, p = AvgE and p = AvgI over Setosa, vErsicolor and vIrginica; the per-class interval numbers are flattened beyond recovery.]

24 FAUST Oblique, LSR L_{p,d}, for p ∈ {AvgS, AvgE, AvgI} and d ∈ {e1, e2, e3, e4}.
Linear, Spherical, Radial classifier. Form class hulls using linear d boundaries through the min and max of L_{k,d,p} = (Ck&(X−p))od. On every interval I_{k,p,d} ∈ {[ep_i, ep_{i+1}) | ep_j = min L_{k,p,d} or max L_{k,p,d} for some k,p,d}, add spherical and barrel boundaries with S_{k,p} and R_{k,p,d} similarly (use enough (p,d) pairs that no two class hulls overlap). Points outside all hulls are declared "other". Σ_{p,d} dis(y, I_{k,p,d}) = the unfitness uf(y,k) of classing y in k; fitness is f(y,k) = 1/(1−uf(y,k)). [The rest of the slide is data residue: the XoX column and the Ld = Xod column for d=1000 over the 150 IRIS samples, listed with row indices 1..150.]
On IRIS150: ∀d, precompute XoX, Ld = Xod, n_{k,L,d} ≡ min(Ck&Ld) and x_{k,L,d} ≡ max(Ck&Ld). Then ∀p (no further pre-compute needed), L_{d,p} ≡ (X−p)od = Ld−pod, n_{k,L,d,p} ≡ min(Ck&L_{d,p}) = n_{k,L,d}−pod and x_{k,L,d,p} ≡ max(Ck&L_{d,p}) = x_{k,L,d}−pod. [The slide's tables of n_{k,L,d} and x_{k,L,d} for d = e1..e4 and p ∈ {origin, AvgS, AvgE, AvgI} are flattened beyond recovery here.] We have introduced 36 linear bookends to the class hulls: one pair for each of 4 d's, 3 p's and 3 classes. For fixed d and Ck, the pTree mask is the same over the 3 p's; however, we need to differentiate anyway to calculate R correctly. That is, for each d-line we get the same set of intervals for every p (just shifted by −pod). The only reason we need them all is to accurately compute R on each min-max interval. In fact, we compute R on all intervals (even those where a single class has been isolated) to eliminate False Positives, if FPs are possible. Sometimes they are not: e.g., if we are to classify IRIS samples known to be Setosa, vErsicolor or vIrginica, then there is no "other". Assuming Ld, n_{k,L,d} and x_{k,L,d} have been pre-computed and stored, the cut-point pairs (n_{k,L,d,p}, x_{k,L,d,p}) are computed without further pTree processing, by the scalar computations n_{k,L,d,p} = n_{k,L,d}−pod and x_{k,L,d,p} = x_{k,L,d}−pod.
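A hedged sketch of the resulting hull test (the dictionary layout below is my own; the bookends would come from the precomputed n/x values above):

```python
import numpy as np

# Classify y against class-k hulls: y isa Ck iff, for every (d, p) pair,
# L and R fall inside Ck's recorded [min, max] bookends; else y isa "other".
def classify(y_pt, hulls):
    # hulls: {k: [((d, p), (Lmin, Lmax), (Rmin, Rmax)), ...]}  (hypothetical layout)
    for k, bookends in hulls.items():
        ok = True
        for (d, p), (lmin, lmax), (rmin, rmax) in bookends:
            L = (y_pt - p) @ d
            R = (y_pt - p) @ (y_pt - p) - L**2
            if not (lmin <= L <= lmax and rmin <= R <= rmax):
                ok = False
                break
        if ok:
            return k
    return 'other'                         # outside all hulls

# Build a one-class hull from stand-in training data:
rng = np.random.default_rng(3)
C0 = rng.random((20, 2))
d, p = np.array([1.0, 0.0]), C0.mean(axis=0)
L = (C0 - p) @ d
R = np.einsum('ij,ij->i', C0 - p, C0 - p) - L**2
hulls = {0: [((d, p), (L.min(), L.max()), (R.min(), R.max()))]}
```

A training point lands inside its own hull by construction, while a far-away point falls outside every bookend pair and is declared "other".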

25 LSR IRIS150, e1 only. Analyze R:Rn→R1 (and S:Rn→R1?) projections on each interval formed by consecutive L:Rn→R1 cut-points. S_p ≡ (X−p)o(X−p) = XoX + L_{−2p} + pop; n_{k,S,p} ≡ min(Ck&S_p), x_{k,S,p} ≡ max(Ck&S_p). R_{p,d} ≡ S_p − L²_{p,d} = L_{−2p+(2pod)d} + pop − pod² + XoX − L²_d; n_{k,R,p,d} ≡ min(Ck&R_{p,d}), x_{k,R,p,d} ≡ max(Ck&R_{p,d}). [The slide's Ld and R interval tables for d=1000 with p = origin, AvgS, AvgE, AvgI over Setosa, vErsicolor and vIrginica are flattened beyond recovery here.] Recursion works wonderfully on IRIS: the only hull overlaps after just d=1000 are shown, and the 4 I's common to both are {i24, i27, i28, i34}; we could call those "errors". With AI: 17, 220; with AE: 1, 517.4, 78, 633. Which eliminates FPs better? What is the cost of these additional cuts (at new p-values in an L-interval)? It looks like: make the one additional calculation L_{−2p+(2pod)d}, then AND the interval masks, then AND the class masks (or, if we already have all interval-class masks, only one mask AND step). If we have computed S:Rn→R1, how can we utilize it? We can, of course, simply put spherical hull boundaries centered on the class averages, e.g., S_p with p=AvgS (overlap E=50, I=11), vErsicolor, vIrginica. If on the L_{1000,AvgE} interval [−1,11) we recurse using S_{AvgI}, we get 7, 4, 36, 540.4, 72, 170. Thus, for IRIS at least, with only d=e1=(1000), only the 3 p's AvgS, AvgE, AvgI, full linear rounds, one R round on each resulting interval and one S, the hulls end up completely disjoint. That's pretty good news! There is a lot of interesting and potentially productive (career-building) engineering to do here. What, precisely, is the best way to intermingle p, d, L, R and S (minimizing time and False Positives)?
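The claim that a new p costs only one additional SPTS can be checked numerically: R assembled from the precomputed XoX and Ld plus the single SPTS L_{−2p+(2pod)d} agrees with the direct definition S_p − L²_{p,d}. A stand-in verification:

```python
import numpy as np

# Verify R_{p,d} = S_p - L_{p,d}^2 can be assembled from precomputed XoX and
# Ld = X.d plus one extra SPTS X.v, v = -2p + (2 p.d) d, and two scalars.
rng = np.random.default_rng(4)
X = rng.random((40, 5))
d = rng.random(5); d /= np.linalg.norm(d)      # |d| = 1
XoX = np.einsum('ij,ij->i', X, X)              # precomputed once
Ld = X @ d                                     # precomputed once

p = X.mean(axis=0)                             # any new p
pod, pop = p @ d, p @ p
v = -2.0 * p + 2.0 * pod * d                   # the one additional SPTS direction
R_fast = XoX + X @ v + pop - pod**2 - Ld**2

# direct definition for comparison
Lpd = Ld - pod
S = np.einsum('ij,ij->i', X - p, X - p)
R_direct = S - Lpd**2
```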

26 A pTree Pillar k-means clustering method
(The k is not specified; it reveals itself.) Choose m1 as a point that maximizes Distance(X, avgX). Choose m2 as a point that maximizes Distance(X, m1). Choose m3 as a point that maximizes Σh=1..2 Distance(X, mh). Choose m4 as a point that maximizes Σh=1..3 Distance(X, mh). Do until minimum h=1..k Distance(X, mh) < Threshold (or do until the maximized distance for mk < Threshold). This gives k. Apply pk-means. (Note that we already have all the Dis(X, mh)'s for the first round.) Note: D = the m1m2 line. Treat PCCs like parentheses: '(' corresponds to a PCI and ')' corresponds to a PCD. Each matched pair should indicate a cluster somewhere in that slice. Where? One could take the VoM as the best-guess centroid, then proceed by restricting to that slice. Or first apply R and do PCC parenthesizing on the R values to identify the radial slice where the cluster occurs; take the VoM of that combined slice (linear and radial) as the centroid, and apply S to confirm. Note: treating PCCs like parentheses is a possible clustering method for identifying density clusters (as opposed to round or convex clusters).
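The pillar-seeding loop above can be sketched as follows. The outlier re-checks via S are omitted for brevity, and the threshold-based stopping rule follows the slide; everything else here is a stand-in.

```python
import numpy as np

# Pillar seeding: m1 maximizes distance to avgX; each later pillar maximizes
# the accumulated distance to all pillars so far; stop when the next candidate
# is within Threshold of an existing pillar (k reveals itself).
def pillar_seeds(X, threshold):
    seeds = [X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while True:
        D = sum(np.linalg.norm(X - m, axis=1) for m in seeds)   # running sum D_h
        cand = X[np.argmax(D)]
        if min(np.linalg.norm(cand - m) for m in seeds) < threshold:
            return np.array(seeds)         # k = len(seeds); hand off to pk-means
        seeds.append(cand)
```

On three well-separated blobs this returns three seeds, one per blob, without k ever being specified.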

27 Clustering: 1. For Anomaly Detection
2. To develop classes against which future unclassified objects are classified. (Classification = moving up a concept hierarchy using a class assignment function, caf: X→{Classes}.) When is it important not to over-partition? Sometimes it is, but sometimes it is not; in 2. it usually isn't. With gap clustering we never over-partition, but with PCC-based clustering we can. If it is important that each cluster be whole when using a k-means-type clusterer, each round we can fuse Ci and Cj iff their projections on L_{mimj} touch or overlap. NewClu (k is discovered, not specified): Assign each (object, class) a ClassWeight, CW∈Reals (could be <0). Classes "take the next ticket" as they are discovered (tickets are 1, 2, ...). Initially all classes are empty and all CWs = 0. Do for the next d: compute Ld = Xod, until, after masking off a new cluster, the count is too high (doesn't drop enough). For the next PCI in Ld (next-larger, starting from the smallest): if followed by a PCD, declare the next Classk and define it to be the set spherically gapped (or PCDed) around the centroid Ck = Avg or VoMk over Ld−1[PCI, PCD]; mask off this ticketed new Classk and continue. If followed by a PCI, declare the next Classk and define it to be the set spherically gapped (or PCDed) around the centroid Ck = Avg or VoMk over Ld−1[(3PCI1+PCI2)/4, PCI2); mask off this ticketed new Classk and continue. For the next-smaller PCI (starting from the largest) in Ld: if preceded by a PCD, declare the next Classk and define it to be the set spherically gapped (or PCDed) around the centroid Ck = Avg or VoMk over Ld−1[PCD, PCI]; mask off this ticketed new Classk and continue. If preceded by a PCI, declare the next Classk and define it to be the set spherically gapped (or PCDed) around the centroid Ck = Avg or VoMk over Ld−1(PCI2, (3PCI1+PCI2)/4]; mask off this ticketed new Classk and continue.
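A minimal sketch of detecting PCIs and PCDs on a projection Ld. The histogram bin width and the "precipitous" ratio below are my assumptions, not values from the slides.

```python
import numpy as np

# Hedged sketch of PCI/PCD detection on a 1-D projection Ld = X.d:
# scan adjacent histogram bins for precipitous count changes.
def pcc_cuts(L, bin_width=1.0, ratio=3.0):
    edges = np.arange(L.min(), L.max() + bin_width, bin_width)
    counts, _ = np.histogram(L, bins=edges)
    pci, pcd = [], []                      # '(' and ')' in the parenthesizing view
    for i in range(1, len(counts)):
        prev, cur = counts[i - 1], counts[i]
        if cur >= ratio * max(prev, 1):    # precipitous count increase
            pci.append(edges[i])
        elif prev >= ratio * max(cur, 1):  # precipitous count decrease
            pcd.append(edges[i])
    return pci, pcd
```

A matched PCD-then-PCI pair brackets an empty slice of the d-line; a dense cluster sits on each side of it.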




