Published by Eugenia Clark. Modified over 6 years ago.
1
Dear All, I figured out a very rough technique to calculate division using pTrees. It can divide a pTreeSet A by another pTreeSet B to produce a pTreeSet C, that is, C = A/B.

The idea is C = A * (1/B). So our task is to find the reciprocal of a pTreeSet, D = 1/B, and then C = A * D. We know how to multiply two pTreeSets, so we use that to multiply A and D to get C.

Assume B is a pTreeSet having 3 pTrees, that is, all the numbers in B are 3-bit numbers, so the possible values are 0-7. We list all these numbers and their reciprocals in the following table:

B    D = 1/B
0    N/A
1    1.0
2    0.5
3    0.333
4    0.25
5    0.2
6    0.167
7    0.143

When we convert a fraction into binary we use negative powers of 2. For example, binary 0.101 = 2^-1 + 2^-3 = decimal 0.625, etc. To calculate the binary of D we can use this technique, keeping 4 bits after the binary point (truncating the rest):

B    D        Binary
0    N/A      x.xxxx
1    1.0      1.0000
2    0.5      0.1000
3    0.333    0.0101
4    0.25     0.0100
5    0.2      0.0011
6    0.167    0.0010
7    0.143    0.0010

To get the logical operations required to calculate D, we consider the following truth table (x = don't care):

B2 B1 B0    D4 D3 D2 D1 D0
0  0  0     x  x  x  x  x
0  0  1     1  0  0  0  0
0  1  0     0  1  0  0  0
0  1  1     0  0  1  0  1
1  0  0     0  0  1  0  0
1  0  1     0  0  0  1  1
1  1  0     0  0  0  1  0
1  1  1     0  0  0  1  0

Now the logic functions are as follows:
D4 = B2'B1'
D3 = B2'B0'
D2 = B1'B0' + B2'B1B0
D1 = B2B0 + B2B1
D0 = B2'B1B0 + B2B1'B0
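The logic functions above can be tried out on bit vectors. The following is a minimal sketch, assuming each pTree is modelled as a plain Python int used as an uncompressed bit vector (bit r belongs to row r); the helper names `to_slices`, `reciprocal_slices` and `from_slices` are hypothetical and are not part of any pTree library.

```python
# Sketch: D = 1/B for 3-bit B, using the D4..D0 logic functions from the slide.
# A "pTree" here is just a Python int acting as a bit vector (an assumption).

def to_slices(values, width):
    """slices[i] holds bit i of every row value (bit r of the int = row r)."""
    slices = [0] * width
    for row, v in enumerate(values):
        for i in range(width):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices

def reciprocal_slices(b_slices, n_rows):
    """Apply the slide's logic functions to the slices B0, B1, B2."""
    full = (1 << n_rows) - 1
    b0, b1, b2 = b_slices
    n0, n1, n2 = full & ~b0, full & ~b1, full & ~b2   # complements B0', B1', B2'
    d4 = n2 & n1
    d3 = n2 & n0
    d2 = (n1 & n0) | (n2 & b1 & b0)
    d1 = (b2 & b0) | (b2 & b1)
    d0 = (n2 & b1 & b0) | (b2 & n1 & b0)
    return [d0, d1, d2, d3, d4]

def from_slices(slices, n_rows):
    """Rebuild one integer per row from the bit slices."""
    return [sum(((s >> row) & 1) << i for i, s in enumerate(slices))
            for row in range(n_rows)]

B = [1, 2, 3, 4, 5, 6, 7]          # B = 0 is skipped (reciprocal undefined)
D = from_slices(reciprocal_slices(to_slices(B, 3), len(B)), len(B))
approx = [d / 16 for d in D]       # D4 is the units bit, D3..D0 are 2^-1..2^-4
```

Each entry of `approx` is the 4-bit truncation of 1/B, so the error stays below 2^-4; a product C = A * D inherits that error scaled by A.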
2
Cooper Clustering: (building barrel-shaped gaps around clusters)
The method attempts to build barrels around clusters (it masks the interior of barrel-shaped gaps which separate the space into two partitions). This allows for a better fit around convex clusters that are elongated in one direction (not round, which would be the case where there is no elongation in any direction).

Exhaustive search for all barrel gaps: a pseudo-exhaustive search (exhaustive modulo a grid width) takes two parameters:
1. A StartPoint, p (an n-vector, so n-dimensional).
2. A UnitVector, d (an n-direction, so (n-1)-dimensional: a grid on the surface of the sphere in $R^n$).

Then for every choice of (p,d) (e.g., in a grid of points in $R^{2n-1}$) two functionals are used to enclose subclusters in barrel-shaped gaps:
a. SquareBarrelRadius functional, $BR(y) = (y-p)\circ(y-p) - ((y-p)\circ d)^2$
b. BarrelLength functional, $BL(y) = (y-p)\circ d$

[Figure: a barrel around a cluster on the line from p toward q (the furthest point or mean point), showing a point y, the barrel cap gap width, the barrel radius gap width, and the gaps in dot product lengths (projections) on the line.]

Given a p, do we need a full grid of d unit vectors (directions)? No! d and -d give the same BL-gaps.
Given a d, do we need a full grid of p starting points? No! All p' such that p' = p + cd give the same BR-values (and therefore the same gaps).
Hill climb the gap width from a good starting point and direction.
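The two functionals and the two invariances noted above can be checked numerically. This is a minimal sketch on plain Python lists rather than pTreeSets; the sample points `Y`, the gap threshold 3.0 and the helper `gaps` are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def barrel_length(y, p, d):
    """BL(y) = (y-p) o d: signed projection length along the unit vector d."""
    return dot(sub(y, p), d)

def square_barrel_radius(y, p, d):
    """BR(y) = (y-p) o (y-p) - ((y-p) o d)^2: squared distance from y to the line."""
    yp = sub(y, p)
    return dot(yp, yp) - dot(yp, d) ** 2

def gaps(values, min_width):
    """Pairs of consecutive sorted values that are more than min_width apart."""
    vs = sorted(set(values))
    return [(a, b) for a, b in zip(vs, vs[1:]) if b - a > min_width]

# Hypothetical data: two groups lying roughly along the diagonal direction.
Y = [[1.0, 1.0], [2.0, 1.5], [1.5, 0.5], [9.0, 9.0], [10.0, 9.5]]
p = [0.0, 0.0]
d = [1 / math.sqrt(2), 1 / math.sqrt(2)]

bl = [barrel_length(y, p, d) for y in Y]
br = [square_barrel_radius(y, p, d) for y in Y]
length_gaps = gaps(bl, 3.0)       # cap gaps along the line

# Invariances from the slide: p' = p + c*d leaves BR unchanged,
# and -d gives the same BL gap widths as d.
p_shift = [pi + 2.5 * di for pi, di in zip(p, d)]
br_shift = [square_barrel_radius(y, p_shift, d) for y in Y]
neg_d = [-di for di in d]
neg_gaps = gaps([barrel_length(y, p, neg_d) for y in Y], 3.0)
```

So a search grid only needs one representative p per line and one of each pair d, -d.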
3
This suggests a clustering method:
1. Find cosine cone gaps emanating from a corner point (or any circumscribing point).
2. "Cap" the cone gap on the open end with a linear gap (actually, for IRIS the cap seems unnecessary, and the cone gaps themselves seem to separate the three classes).

F=(y-M)o(x-M)/|x-M|-min restricted to a cosine cone on IRIS.
[Figure: corner points, with the gap in dot product projections onto the corner-points line.]

Value/count pairs per panel (labels such as i39 mark individual points):
x=s1 cone=1/√2: 60 3 61 4 62 3 63 10 64 15 65 9 66 3 67 1 69 2 50
x=s2 cone=1/√2: 47 1 59 2 60 4 61 3 62 6 63 10 64 10 65 5 66 4 67 4 69 1 70 1 51
x=s2 cone=.9: 59 2 60 3 61 3 62 5 63 9 64 10 65 5 66 4 67 4 69 1 70 1 47
x=s2 cone=.1: 39 2 40 1 41 1 44 1 45 1 46 1 47 1 i39 59 2 60 4 61 3 62 6 63 10 64 10 65 5 66 4 67 4 69 1 70 1 59
w maxs-to-mins cone=.939: i25 i40 i16 i42 i17 i38 i11 i48 22 2 23 1 i34 i50 i24 i28 i27 27 5 28 3 29 2 30 2 31 3 32 4 34 3 35 4 36 2 37 2 38 2 39 3 40 1 41 2 46 1 47 2 48 1 i39 53 1 54 2 55 1 56 1 57 8 58 5 59 4 60 7 61 4 62 5 63 5 64 1 65 3 66 1 67 1 68 1 114. 14 i and 100 s/e. So picks i as 0.
w naaa-xaaa cone=.95: 12 1 13 2 14 1 15 2 16 1 17 1 18 4 19 3 20 2 21 3 22 5 i21 24 5 25 1 27 1 28 1 29 2 i7. 41/43 e, so picks e.
Value/count pairs per panel (labels such as i10 mark individual points):
x=e1 cone=.707: 33 1 36 2 37 2 38 3 39 1 40 5 41 4 42 2 43 1 44 1 45 6 46 4 47 5 48 1 49 2 50 5 51 1 52 2 54 2 55 1 57 2 58 1 60 1 62 1 63 1 64 1 65 2 60
x=i1 cone=.707: 34 1 35 1 36 2 37 2 38 3 39 5 40 4 42 6 43 2 44 7 45 5 47 2 48 3 49 3 50 3 51 4 52 3 53 2 54 2 55 4 56 2 57 1 58 1 59 1 60 1 61 1 62 1 63 1 64 1 66 1 75
w maxs cone=.707: 0 2 8 1 10 3 12 2 13 1 14 3 15 1 16 3 17 5 18 3 19 5 20 6 21 2 22 4 23 3 24 3 25 9 26 3 27 3 28 3 29 5 30 3 31 4 32 3 33 2 34 2 35 2 36 4 37 1 38 1 40 1 41 4 42 5 43 5 44 7 45 3 46 1 47 6 48 6 49 2 51 1 52 2 53 1 55 1 137
w maxs cone=.93: 8 1 i10 13 1 14 3 16 2 17 2 18 1 19 3 20 4 21 1 24 1 25 4 e21 e34 27 2 29 2 i7. 27/29 are i's.
w aaan-aaax cone=.54: 7 3 i27 i28 8 1 9 3 i20 i34 11 7 12 13 13 5 14 3 15 7 19 1 20 1 21 7 22 7 23 28 24 6. 100/104 s or e, so 0 picks i.
w maxs cone=.925: 8 1 i10 13 1 14 3 16 3 17 2 18 2 19 3 20 4 21 1 24 1 25 5 e21 e34 27 2 28 1 29 2 e35 i7. 31/34 are i's.
w xnnn-nxxx cone=.95: 8 2 i22 i50 10 2 i28 i24 i27 i34 13 2 14 4 15 3 16 8 17 4 18 7 19 3 20 5 21 1 22 1 23 1 i39. 43/50 e, so picks out e.

[Figure: cosine cone gap (over some angle).]

Cosine conical gapping seems quick and easy. The cosine is the dot product divided by both lengths. The length of the fixed vector, x-M, is a one-time calculation. The length of y-M changes with y, so we build the PTreeSet for it. But we can't divide PTreeSets yet!
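The cone restriction described above can be sketched with ordinary floating-point arithmetic, which also shows where the per-point division by |y-M| enters. This is an illustration on plain lists, not a pTree computation; the function name `cone_values` and the sample points are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cone_values(points, M, x, cos_threshold):
    """Return {index: F} for points inside the cosine cone around x-M,
    where F = (y-M) o (x-M) / |x-M| (the '-min' shift is applied afterwards)."""
    axis = [a - b for a, b in zip(x, M)]
    axis_len = math.sqrt(dot(axis, axis))        # one-time calculation
    inside = {}
    for i, y in enumerate(points):
        v = [a - b for a, b in zip(y, M)]
        v_len = math.sqrt(dot(v, v))             # changes with every y
        if v_len == 0:
            continue                             # y == M has no direction
        projection = dot(v, axis) / axis_len
        if projection / v_len >= cos_threshold:  # cosine of the angle at M
            inside[i] = projection
    return inside

# Hypothetical 2-D example with M at the origin and the cone axis along x.
M = [0.0, 0.0]
x = [1.0, 0.0]
points = [[2.0, 0.1], [3.0, -0.2], [0.0, 5.0], [-1.0, 0.0], [4.0, 5.0]]
inside = cone_values(points, M, x, 1 / math.sqrt(2))
low = min(inside.values())
F = {i: v - low for i, v in inside.items()}      # F - min, as on the slide
```

The division `projection / v_len` is the step that has no loop-free PTreeSet counterpart yet, which motivates the reciprocal technique and the tube shapes discussed elsewhere in the deck.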
4
Squared y on f Projection Distance = $y\circ y - \frac{(y\circ f)^2}{f\circ f}$
The downside of [capped] cone clustering is that we need to divide by the PTreeSet |y|, and so far we can't do that (without a loop). Instead of a "capped cone", a better shape might be a "[double] capped tube". For a fixed point, f, and a variable point, y, we need, in addition to the dot product projection length, the dot product projection distance as well (shown in red in the figure).

[Figure: vectors y and f from the origin (or p); the projection of y on the line through f (or M), the furthest point or mean point; the cap gap width along the line, the tubular gap width around it, and the gaps in dot product lengths (projections) on the line.]

The dot product projection length of y on f is $y\circ\frac{f}{|f|}$ and the projection of y on f is $\frac{y\circ f}{f\circ f}f$. The dot product projection distance is $\left|y - \frac{y\circ f}{f\circ f}f\right|$, and its square is

$$\left(y - \frac{y\circ f}{f\circ f}f\right)\circ\left(y - \frac{y\circ f}{f\circ f}f\right) = y\circ y - 2\frac{(y\circ f)^2}{f\circ f} + \frac{(y\circ f)^2}{(f\circ f)^2}\,f\circ f = y\circ y - \frac{(y\circ f)^2}{f\circ f}$$

Now replace the origin by a corner point (or some other circumscribing hyper-rectangle point), p; that is, replace y with y-p and replace f with M-p:

Squared y on f Projection Distance = $y\circ y - \frac{(y\circ f)^2}{f\circ f}$

Squared y-p on M-p Projection Distance = $(y-p)\circ(y-p) - \frac{((y-p)\circ(M-p))^2}{(M-p)\circ(M-p)} = y\circ y - 2\,y\circ p + p\circ p - \left(y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}\right)^2$

1st: compute the constant [vector] $\frac{M-p}{|M-p|}$.
2nd: compute the PTreeSet $y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}$ (1 dot, 1 minus). This equals $(y-p)\circ\frac{M-p}{|M-p|}$, which we already needed for the dot product length projections (caps).
3rd: compute the PTreeSets $y\circ y - 2\,y\circ p + p\circ p$ (2 dots, 1 minus, 1 plus). Do not compute y-p (it shifts the entire vector space)?

That is, we need to compute the green constants and the blue and red dot product functionals in an optimal way (and then do the PTreeSet additions/subtractions/multiplications). What is optimal? (Minimizing PTreeSet functional creations and PTreeSet operations.)
5
SPD(y) = $y\circ y - 2\,y\circ p + p\circ p - \left(y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}\right)^2$
There are three functionals in the "dot product" group for "functional gap clustering" of a VectorSpace subset, Y (y ∈ Y):
1. $SD_p(y) = (y-p)\circ(y-p)$, p a fixed vector, the "Square Distance from a point", primarily for outlier identification and densities.
2. $P_d(y) = y\circ d$, d a unit vector, the "Projection" functional.
3. $SPD_d(y) = y\circ y - (y\circ d)^2$ (d a unit vector), the "Square Projection Distance to d" functional.

[Figure: y, the unit vector d, the projection $(y\circ d)d$ and the difference $y - (y\circ d)d$, drawn once for a positive and once for a negative projection.] Squaring the length of the difference: $(y-(y\circ d)d)\circ(y-(y\circ d)d) = y\circ y - (y\circ d)^2$, and again $y\circ y - (y\circ d)^2$ is the squared projection distance when the projection is negative.

E.g., if d ≡ (M-p)/|M-p|, the unit vector from vector p to vector M, then

$$SPD(y) = (y-p)\circ(y-p) - \frac{((y-p)\circ(M-p))^2}{(M-p)\circ(M-p)}$$

But to avoid creating an entirely new VectorPTreeSet(Y-p) for the space (with origin shifted to p), we think it useful to alter the expression to:

$$SPD_{pM}(y) = y\circ y - 2\,y\circ p + p\circ p - \left(y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}\right)^2$$

where we might compute:
1st the constant vector $\frac{M-p}{|M-p|}$
2nd the ScalarPTreeSet $y\circ\frac{M-p}{|M-p|}$
3rd the constant $p\circ\frac{M-p}{|M-p|}$
4th the SPTreeSet $y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}$
5th the SPTreeSet $\left(y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}\right)^2$
6th the constant $p\circ p$
7th the SPTreeSets $y\circ y$, $y\circ p$
8th the SPTreeSet $y\circ y - 2\,y\circ p + p\circ p - \left(y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}\right)^2$

Is it better to leave all the additions and subtractions for one mega-step at the end? (Md?) Other efficiency thoughts?

We note that $PL(y) = (y-p)\circ\frac{M-p}{|M-p|} = y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}$ shares many construction steps with SPD.
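The claim that SPD can be computed without shifting the whole space to p can be checked numerically. The sketch below compares the shifted form with the expanded form on plain Python lists; it is an illustration of the algebra only, with hypothetical function names and sample values, and it does not model PTreeSet operations.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def spd_shifted(y, p, M):
    """Direct form: shift the origin to p, then (y-p)o(y-p) - ((y-p)o(M-p))^2 / (M-p)o(M-p)."""
    yp = [a - b for a, b in zip(y, p)]
    mp = [a - b for a, b in zip(M, p)]
    return dot(yp, yp) - dot(yp, mp) ** 2 / dot(mp, mp)

def make_spd_unshifted(p, M):
    """Expanded form: the constants are computed once, y is never shifted."""
    mp = [a - b for a, b in zip(M, p)]
    norm = math.sqrt(dot(mp, mp))
    d = [a / norm for a in mp]        # constant unit vector (M-p)/|M-p|
    p_dot_d = dot(p, d)               # constant
    p_dot_p = dot(p, p)               # constant

    def spd(y):
        length = dot(y, d) - p_dot_d  # PL(y), reusable for the caps
        return dot(y, y) - 2 * dot(y, p) + p_dot_p - length ** 2

    return spd

p = [1.0, 2.0, 0.0]
M = [4.0, 6.0, 0.0]
spd = make_spd_unshifted(p, M)
samples = [[4.0, 2.0, 0.0], [0.0, 0.0, 3.0], [7.0, -1.0, 2.5]]
pairs = [(spd_shifted(y, p, M), spd(y)) for y in samples]
```

Both forms agree up to rounding, and the expanded form touches y only through the dot products y o y, y o p and y o d.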
6
CLUS1.2 is pure Versicolor (45 of the 50).
Value/count (V Ct) pairs per panel:
SPD p q e14: 2 10 3 12 4 12 5 12 6 8 7 11 8 9 9 5 10 9 11 4 12 4 13 2 14 1 17 2 18 3 19 10 20 5 21 6 22 5 23 6 24 6 25 3 27 2 29 2 30 1
SPD on CLUS1, p e11, q =MN: 2 3 3 4 4 5 5 7 6 2 7 2 8 6 9 6 10 3 11 4 12 2 13 4 14 4 15 3 16 2 17 1 18 5 19 1 20 2 22 2 23 1 24 1 25 1 26 1 29 1
SPD p q e14: 1 6 2 4 3 8 4 4 5 10 6 2 7 2 8 2 9 7 10 2 11 2 12 2 13 1 15 2 17 1 18 4 19 2 20 4 22 1 24 1 25 1 26 1 29 1 31 2 32 2 33 3 i15 i36 i32
SPD p q: 2 8 3 10 4 10 5 10 6 5 7 10 8 6 9 8 10 6 11 1

Masks:
mask: V<8.5 CTs SMs CTe SMe CTi SMi CLUS1
mask: V<12.5 5 SMe 24 SMi CLUS1.1
thin gap mask: 8.5<V<15.5 CTs SMs CTe SMe CTi SMi CLUS2
masking V>6: Total_e Masked_e Total_i Masked_i
mask: V>12.5 45 SMe 0 SMi CLUS1.2
mask: V>15.5: CTs SMs CTe SMe CTi SMi. This tube contains 49 setosa + 2 virginica. CLUS3

However, I cheated a bit. I used p=MinVect(e) and q=MaxVect(e), which makes it somewhat supervised. START OVER WITH THE FULL >

CLUS1.2 is pure Versicolor (45 of the 50).
CLUS3 is almost pure Setosa (49 of the 50, plus 2 virginica).
CLUS2 is almost purely [1/2 of] virginica (24 of 50, plus 1 setosa).
CLUS1.1 is the other 24 virginicas, plus the other 5 versicolors.

So this method clusters IRIS quite well (albeit into 4 clusters, not three). Note that caps were not put on these tubes. Also, this was NOT unsupervised clustering! I took advantage of my knowledge of the classes to carefully choose the unit vector points, p and q, e.g., p = MinVector(Versicolor) and q = MaxVector(Versicolor). True, if one sequenced through a fine enough d-grid of all unit vectors [directions], one would happen upon a unit vector closely aligned to d=(q-p)/|q-p|, but that would be a whole lot more work than I did here (it would take much longer). In the worst case though, for totally unsupervised clustering, there would be no other way than to sequence through a grid of unit vectors. However, a good heuristic might be to try all unit vectors "corner-to-corner" and "middle-of-face-TO-middle-of-opposite-face" first, etc.
Another thought would be to try to introduce some sort of hill climbing to "work our way" toward a good combination of a radial gap plus two good linear cap gaps for that radial gap.
7
SPD on CLUS1 p C1US1axxx q C1US1aaaa V Ct 1 3 2 5 3 9 4 13 5 18 6 12 7 4 8 1 9 2 no thinnings SPD on CLUS1 p C1US1xaxx q C1US1aaaa V Ct 1 4 2 13 3 7 4 19 5 9 6 7 7 9 8 2 SPD on CLUS1 p C1US1xxax q C1US1aaaa V Ct 1 1 2 4 3 3 4 9 5 9 6 14 7 9 8 4 9 6 10 3 11 3 12 1 14 2 15 1 no thinnings SPD on CLUS1 p C1US1xxxa q C1US1aaaa V Ct 1 1 2 3 3 10 4 15 5 16 6 12 7 7 8 3 9 1 10 1 no thinnings SPD p axxx q aaaa V Ct 2 1 3 5 4 6 5 6 6 8 7 6 8 8 9 15 10 7 11 8 12 13 13 8 14 14 15 9 16 13 17 6 18 4 19 4 20 3 21 4 23 1 25 1 mask: V<3.5 14 SM versi 10 SM virgi CL1.1? mask: V<11.5 0 SM setosa 46 SM versicolor 24 SM virginica CLUS1 mask: V>3.5 0 SM setosa 32 SM versi 14 SM virgi CLUS1.2? mask: V>11.5 50 SM setosa 4 SM versicolor 26 SM virginica CLUS2 SPD on CLUS2 p C1US2axxx q C1US2aaaa V Ct 6 2 7 2 8 6 9 13 10 7 11 7 12 4 13 5 14 11 15 9 16 2 18 4 21 2 22 1 23 3 25 1 26 1 SPD on CLUS1 p C1US1axax q C1US1aaaa V Ct 1 1 2 3 3 4 4 2 5 12 6 13 7 9 8 7 9 2 10 7 11 4 13 2 14 1 17 2 18 1 SPD on CLUS1 p C11aaxx q C11aaaa V Ct 1 1 2 7 3 10 4 13 5 13 6 13 7 6 8 2 9 2 11 1 no thinnings SPD on CLUS1 p C1US1axxa q C1US1aaaa V Ct 1 1 2 2 3 6 4 9 5 12 6 17 7 8 8 6 9 5 10 1 11 1 no thinnings mask: V<13.5 44 SM setosa 0 SM versicolor 02 SM virginica CLUS2.1 mask: V<9.5 37 SM vers 16 SM virg CL1.1? mask: 100>V>13.5 6 SM setosa 4 SM versicolor 24 SM virginica CLUS2.2 mask: V>9.5 9 SM vers 8 SM virg CL1.2? 
SPD on CLUS1 C11xaax C11aaaa V Ct 1 2 2 3 3 4 4 8 5 8 6 14 7 8 8 4 9 5 10 6 11 1 12 3 14 1 15 2 no thins C11axaa C11aaaa V Ct 1 2 2 2 3 2 4 10 5 3 6 13 7 8 8 7 9 4 10 3 11 6 12 2 13 2 14 2 17 2 18 1 19 1 SPD on CLUS1 C11xxaa C11aaaa V Ct 1 1 2 4 3 6 4 9 5 10 6 7 7 9 8 5 9 3 10 4 11 2 12 4 13 1 14 3 17 2 SPD on C1 C11aaax C11aaaa V Ct 1 3 2 1 3 3 4 4 5 12 6 15 7 4 8 5 9 4 10 7 11 4 12 2 13 1 14 1 15 1 17 1 18 1 19 1 SPD on CLUS1 C11xaxa C11aaaa V Ct 1 2 2 3 3 12 4 12 5 10 6 15 7 7 8 4 9 1 10 2 11 1 no thins C11aaxa C11aaaa V Ct 1 2 2 3 3 6 4 12 5 11 6 9 7 11 8 5 9 5 10 1 11 3 13 2 C11xaaa C11aaaa V Ct 1 2 2 4 3 5 4 9 5 10 6 9 7 5 8 6 9 2 10 6 11 3 12 1 13 2 14 2 15 2 17 2 mask: V<5.5 16 ver 3 virCL1.1? mask: V<5.5 26 ver 4 vir CL1.1? mask: V>5.5 30 ver 21 virCL1.1? mask: V>5.5 20 ver 20 vir CL1.1?
8
nnnn xxxx V Ct SDD 1 1 2 8 3 8 4 20 5 15 6 10 7 2 8 5 9 1 PL 0 1 2 2 3 1 5 2 6 3 7 1 8 2 9 3 11 1 12 5 13 6 14 7 15 3 16 2 17 2 18 1 19 3 20 3 22 4 23 3 24 3 25 2 26 2 27 2 28 1 29 2 e8 e44 e49 mask: V<5.5 16 ver 3 virCL1.1? mask: V>5.5 30 ver 21 virCL1.1?
9
95 remaining versicolor and virginica=SubClus1.
i p max V Ct 0 2 1 2 2 2 3 5 4 3 5 3 6 4 7 4 8 7 9 2 10 3 11 1 12 4 13 5 14 4 15 7 16 2 17 5 18 3 19 1 20 1 21 4 23 2 24 2 25 4 26 1 27 2 28 1 29 2 30 1 32 1 {e4, e40} form a doubleton outlier set i7 and e10 are singleton outliers x=s (58=avg(y1) ) V Ct 0 3 s15, s17, s34 1 12 s 6,11,16,19,20,22,28,32,3337,47,49 2 12 s 1,10 13,18,21,27,29,40,41,44,45,50 3 7 s 2,12,23,24,35,36,38 4 10 s 2,3,7,13,25,26,30,31,46,48 5 2 s4, s43 6 2 s9,s39 7 1 s14 8 1 i39 9 1 s32 ^^all 50 setosa + i39 e49 16 2 17 2 19 1 20 2 21 5 22 4 23 3 24 4 25 1 27 8 28 2 29 2 30 4 31 1 32 4 34 2 35 2 36 2 37 3 38 2 39 2 40 4 41 1 43 2 44 4 45 2 46 1 47 2 48 1 50 4 52 2 53 2 54 2 56 2 57 1 i1 i31 vv 9 virginica i10 i8 i36 i32 i16 i18 i23 i19 But here I mistakenly used the mean rather than the max corner. So I will redo - but note the high level of cluster and outlier revelation????? i p max V Ct 0 2 2 6 3 3 4 4 5 4 6 2 7 6 8 9 9 2 10 2 11 2 12 5 13 7 14 2 15 6 16 2 17 5 19 3 20 2 22 3 23 2 24 3 25 2 26 1 27 1 28 1 29 3 30 1 31 2 e32 e11 e8,44 e49 i39 60 1 61 1 62 1 63 1 64 1 65 1 66 1 67 3 68 4 69 4 70 3 71 3 72 4 73 2 74 5 75 1 76 2 77 1 78 3 79 1 s3 s9 s39,43 s42 s23 s14 2 actual gap-ouliers, checking distances reveals 4 e-outlier (versicolor), 5 s-outliers (setosa). i p max V Ct 0 2 1 1 2 3 3 3 4 4 5 2 6 6 7 3 8 5 9 4 10 4 11 2 12 3 13 4 14 6 15 4 16 1 17 7 18 2 19 3 20 2 22 2 23 1 24 2 25 4 26 4 27 1 28 2 29 2 30 1 32 2 33 1 34 1 35 1 No new outliers reviealed 95 remaining versicolor and virginica=SubClus1. Continue outlier id rounds on SC1 (maxSL, maxSW, max PW) then do "capped tube" (further subclusters.) 1. (y-p)o(y-p) remove edge outliers ( thr>2*50) 2. lthin gaps in SPD: d, from an edge point to MN. 3 For each thin PL, do len gap anal of pts in " tube". e13 i7 e40 e4 e10 F e i e e e e32 e11 e8 e44 e49 e e e e e 45 remaining setosa. 
This is SubCluster 2 (may have additional outliers or sub-subclusters but we will not analyse further (would be done in practice tho SPD(y) =(y-p)o(y-p)-(y-p)od2 d: mn-mx V Ct Next slide i p max V Ct 0 2 1 10 2 11 3 6 4 15 5 4 6 8 7 9 8 4 9 5 10 2 11 7 13 4 14 2 15 2 16 1 17 1 18 1 19 1 e30, e15 outliers e20,e31,e32 form SC12 Declared tripleton outlier set? (But they are not singleton outliers.) s3 s9 s39 s43 s42 s23 s s s s s s e13 e20 e15 e31 e32 e30 F e e e e e e
10
1. (y-p)o(y-p) remove edge outliers ( thr>2*50)
SPD(y) =(y-p)o(y-p)-(y-p)od2 d: nnnn-to-xxxx gp>2/50 V Ct 0 1 1 1 2 2 3 3 4 3 5 3 6 3 7 4 8 2 9 2 10 7 11 3 12 3 14 1 15 4 16 5 17 3 18 8 19 1 20 5 21 2 22 2 23 1 24 2 25 3 27 2 28 2 29 1 30 3 31 2 32 1 33 1 (y-p)o(y-p)-(y-p)od2 d: naaa-to-xaaa V Ct 2 5 3 10 4 12 5 16 6 3 7 10 8 6 9 5 10 3 11 6 12 3 13 2 1. (y-p)o(y-p) remove edge outliers ( thr>2*50) 2. lthin gaps in SPD: d, from an edge point to MN. 3 For each thin PL, do len gap anal of pts in " tube". (y-p)o(y-p)-(y-p)od2 d: nxnn-to-xnxx V Ct 0 1 1 1 2 1 3 3 5 3 6 5 7 2 8 6 9 1 10 5 11 4 12 3 13 3 14 3 15 3 16 8 17 3 18 1 19 7 20 3 21 2 22 1 23 1 24 1 25 2 26 1 27 3 28 3 29 2 30 2 i19 i23 i6 i8 i31 i36 F i i i i i i i19,i23,i6 outliers (y-p)o(y-p)-(y-p)od2 d: aaan-to-aaa V Ct 2 5 3 10 4 12 5 16 6 3 7 10 8 6 9 5 10 3 11 6 12 3 13 2 (y-p)o(y-p)-(y-p)od2 d: xnnn-to-nxxx V Ct 3 1 5 1 6 4 7 6 8 5 9 5 10 10 11 7 12 4 13 8 14 5 15 5 16 4 17 3 18 3 19 8 20 1 21 1 22 2 23 1 24 1 26 1 (y-p)o(y-p)-(y-p)od2 d: nnxn-to-xxnx V Ct 6 3 7 6 8 5 9 9 10 8 11 15 12 6 13 7 14 11 15 5 16 4 17 1 18 1 (y-p)o(y-p)-(y-p)od2 d: nnnx-to-xxxn V Ct 2 1 3 1 4 3 5 2 6 2 7 2 8 2 9 6 10 5 11 7 12 7 13 1 14 2 15 9 16 4 17 5 18 3 19 1 20 4 21 2 22 1 23 4 24 2 25 3 26 2 i1 i37 e33 e13 F i i e e i1, e13 are outliers.
11
FxM(x,y) = yo(x-M)/|x-M| - min on X×X ≡ {(x,y) | x,y ∈ X}, where X(x,y) is a Spaeth image table. Cluster by splitting at all F_gaps > 2.

APPENDIX

[Figure: scatter plot of the 15 points z1..zf of X with the mean vector M, together with the 15 Value_Arrays and the 15 Count_Arrays (one for each x).]

FxM values for stride x=z1 (Level0 PointSet, as a pTree mask):

x    y     FxM
z1   z1    14
z1   z2    12
z1   z3    12
z1   z4    11
z1   z5    10
z1   z6    6
z1   z7    1
z1   z8    2
z1   z9    0
z1   z10   2
z1   z11   2
z1   z12   1
z1   z13   2
z1   z14   0
z1   z15   5
M (=MeanVector): 9 5

Gaps in the z1 values: 10-6 and 5-2. pTree masks of the 3 z1_clusters (obtained by ORing): z11, z12, z13.

The FAUST algorithm:
1. Project onto each Mx line using the dot product with the unit vector from M to x (only x=z1 is shown).
2. Generate each Value_Array, F_x(y), x ∈ X (also generate the Count_Arrays and the mask pTrees).
3. Analyze all gaps and create sub-cluster pTree masks.
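Steps 2 and 3 can be sketched on the x=z1 values listed on this slide. The code below uses an ordinary dictionary in place of pTree masks; the function names are hypothetical, and only the F values and the gap threshold of 2 come from the slide.

```python
# F values for stride x=z1, as listed on the slide.
F_z1 = {"z1": 14, "z2": 12, "z3": 12, "z4": 11, "z5": 10, "z6": 6, "z7": 1,
        "z8": 2, "z9": 0, "z10": 2, "z11": 2, "z12": 1, "z13": 2, "z14": 0,
        "z15": 5}

def value_and_count_arrays(f):
    """Sorted distinct F values and how many points take each value."""
    counts = {}
    for v in f.values():
        counts[v] = counts.get(v, 0) + 1
    values = sorted(counts)
    return values, [counts[v] for v in values]

def split_at_gaps(f, min_gap):
    """Partition point ids wherever consecutive distinct values differ by more than min_gap."""
    values, _ = value_and_count_arrays(f)
    groups = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if cur - prev > min_gap:
            groups.append([])          # a gap starts a new subcluster
        groups[-1].append(cur)
    return [{pid for pid, v in f.items() if v in set(g)} for g in groups]

values, counts = value_and_count_arrays(F_z1)
clusters = split_at_gaps(F_z1, 2)
```

With the threshold 2 the split falls exactly at the two gaps named on the slide (5-2 and 10-6), giving three z1 subclusters.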
12
Cluster by splitting at gaps > 2
[Figure: yo(z7-M)/|z7-M| ValueArrays and CountArrays for the 15 points, shown with the scatter plot of X, the mean, the x=z1 F values (14 12 12 11 10 6 1 2 0 2 2 1 2 0 5 for z1..z15) and the z1 masks z11, z12, z13.]

Gap in the z7 values: 6-9, giving the masks z71 and z72.

In Step_3 of the algorithm we can:
a. Analyze one of the gap arrays (e.g., as done for z1; its subclusters are shown above), then start over on each subcluster.
b. Analyze all gap arrays concurrently (in parallel using the same F, saving the [substantial?] re-compute costs?) and then intersect the subcluster partitions we get from each x_ValueArray gap analysis, for the final subclustering.

Here we use the second alternative, judiciously choosing only the x's that are likely to be productive (choosing z7 next). Many are likely to produce redundant partitions (e.g., z1, z2, z3, z4, z6) as their projection lines will be nearly coincident. How should we choose the sequence of "productive" strides? One way would be to always choose the remaining stride with the shortest ValueArray, so that the chance of decent-sized gaps is maximized. Other ways of choosing?
13
Cluster by splitting at gaps > 2
[Figure: yo(x-M)/|x-M| Value Arrays and Count Arrays for the 15 points, shown with the scatter plot of X, the mean, the x=z1 F values (14 12 12 11 10 6 1 2 0 2 2 1 2 0 5 for z1..z15), the z1 masks z11, z12, z13 and the z7 masks z71, z72.]

Gap in the zd values: 3-7, giving the masks zd1 and zd2.

We choose zd=z13 next (should it have been first, since its ValueArray is shortest?). Note, the z8, z9, za projection lines will be nearly coincident with that of z7.
14
Cluster by splitting at gaps > 2
[Figure: yo(x-M)/|x-M| Value Arrays and Count Arrays for the 15 points, shown with the scatter plot of X, the mean, the x=z1 F values (14 12 12 11 10 6 1 2 0 2 2 1 2 0 5 for z1..z15), and the masks z11, z12, z13 (red), z71, z72 (blue) and zd1, zd2 (green).]

AND each red with each blue with each green to get the subcluster masks (12 ANDs producing 5 sub-clusters).
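The intersection step can be sketched with integer bit masks standing in for pTree masks. The z1 partition below follows the z1 gaps from the F values; the z7 and zd partitions are hypothetical placeholders (the real masks are not recoverable from this transcript), so the number of resulting subclusters differs from the 5 reported on the slide.

```python
from itertools import product

def mask(indices):
    """Bit vector with the given 0-based point indices set (z1 -> bit 0, ..., zf -> bit 14)."""
    m = 0
    for i in indices:
        m |= 1 << i
    return m

def members(m):
    """Point names z1..z15 whose bits are set in the mask."""
    return [f"z{i + 1}" for i in range(15) if (m >> i) & 1]

# z1 partition from the gaps in the x=z1 values.
z1_masks = [mask(range(0, 5)), mask([5, 14]), mask(range(6, 14))]
# Hypothetical z7 and zd partitions, for illustration only.
z7_masks = [mask(range(0, 10)), mask(range(10, 15))]
zd_masks = [mask(range(0, 13)), mask([13, 14])]

# AND every combination (3 * 2 * 2 = 12) and keep the non-empty results.
combos = list(product(z1_masks, z7_masks, zd_masks))
subclusters = [a & b & c for a, b, c in combos if a & b & c]
```

Calling `members(m)` on each resulting mask lists the points of that subcluster; the empty ANDs are simply dropped, which is why 12 combinations can yield far fewer subclusters.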
15
F1(x,y) = L1Distance(x,y) = (|x1-y1|+|x2-y2|) on X×X ≡ {(x,y) | x,y ∈ X}
Cluster by splitting at all F1_gaps.

[Figure: L1(x,y) Value Array and Count Array for the 15 points, shown with the scatter plot of X.]

Gap: 10-5 (redundant subclustering). There is a z1-gap, but it produces a subclustering that was already discovered by a previous round. Which z values will give new subclusterings?
16
Re-confirms zf an anomaly.
[Figure: L1(x,y) Value Array and Count Array for the 15 points, shown with the scatter plot of X and the mean M.]

This re-confirms z6 as an anomaly or outlier, since it was already declared so during the linear gap analysis. It also re-confirms zf as an anomaly.

After having subclustered with linear gap analysis, which is best for determining larger subclusters, we run this round gap algorithm out only 2 steps to determine if there are any singleFvalue gaps>2 (the points in the singleFvalueGapped set are then declared anomalies). So we run it out two steps only, then find those points for which the one initial gap determined by those first two values is sufficient to declare outlierness. Doing that here, we reconfirm the outlierness of z6 and zf, while finding new outliers, z5 and za.
17
Using F=yo(x-M)/|x-M|-MIN on IRIS, one stride at a time (s1=setosa1 first)
For virginica1 Val Ct 0 1 1 1 2 2 3 5 4 6 5 11 6 12 7 4 8 2 9 5 10 1 17 1 22 1 24 2 25 1 27 1 28 1 29 2 30 1 31 3 32 4 33 1 34 4 35 2 36 2 37 4 38 4 39 5 40 4 42 6 43 2 44 7 45 5 47 2 48 3 49 3 50 3 51 4 52 3 53 2 54 2 55 4 56 2 57 1 58 1 59 1 60 1 61 1 62 1 63 1 64 1 66 1 F(i39)=17 F<17 (50 Setosa) vers1 Val Ct 0 1 2 4 3 1 4 1 5 3 6 3 7 8 8 3 9 7 10 6 11 4 12 4 13 3 15 2 19 2 20 2 21 1 26 2 27 3 28 4 30 2 31 5 32 4 33 3 34 1 36 3 37 5 38 4 39 5 40 7 41 4 42 2 43 2 44 1 45 6 46 4 47 5 48 1 49 2 50 5 51 1 52 2 54 2 55 1 57 2 58 1 60 1 62 1 63 1 64 1 65 2 F<19 (50 setosa) 19<F<22 {vers8,12,39,44,49} 22<F yo(s1-M)/|s1-M|-69) Val Ct 0 1 3 1 4 2 7 1 8 1 9 2 10 1 12 4 14 5 15 2 16 4 17 1 18 4 19 5 20 1 21 2 22 2 23 8 24 4 25 3 26 2 27 5 28 3 29 4 30 4 31 3 32 2 33 2 34 4 35 5 36 2 37 2 38 1 39 1 40 1 41 1 43 1 44 1 45 1 52 1 60 3 61 4 62 3 63 10 64 15 65 9 66 3 67 1 69 2 F(i39)=52 virginica39 is an outlier. 2 clusters, F<52 (ct=99) and F>52 (50 Setosa) virgini39 Val Ct 0 1 1 2 2 1 4 2 6 1 7 1 8 7 9 2 10 2 11 7 12 2 13 3 14 7 15 4 16 10 17 4 18 6 19 9 20 3 21 6 22 3 23 6 24 3 25 1 27 3 28 2 32 1 39 1 40 1 41 1 42 8 43 13 44 17 45 4 46 5 47 1 F=32 vers49 outlier. 32<F (50 Setosa, vir39) AVG(ver8,12,39,44,49) Val Ct 0 1 1 1 7 5 10 3 12 2 13 2 14 3 15 5 16 2 17 5 18 8 19 4 20 3 21 4 22 3 23 8 24 4 25 4 26 3 27 7 28 7 29 4 30 5 31 4 32 5 33 8 34 2 35 6 36 5 37 3 38 2 39 8 40 6 41 3 43 1 44 2 45 1 47 1 F=0 vir32 outlier F=1 vir18 outlier F=7 vir6,10,19,23,36 subcluster?
18
F=yo(x-M)/|x-M|-MIN on IRIS, subclustering as we go.
On Clus(F<52) ver1 F(virg7)=0 outlier F(virg32)=25 outlier Val Ct 0 1 4 1 5 5 6 3 7 5 8 3 9 8 10 11 11 14 12 8 13 8 14 5 15 3 16 7 17 5 18 6 19 2 20 1 21 1 22 1 25 1 F=yo(x-M)/|x-M|-MIN on IRIS, subclustering as we go. On Remaining, mx mn mx mx Val Ct 0 3 1 4 2 11 3 14 4 14 5 9 6 10 7 2 8 6 9 2 11 2 For s1 (i.e., yo(s1-M)/|s1-M|-69) Val Ct 0 1 3 1 4 2 7 1 8 1 9 2 10 1 12 4 14 5 15 2 16 4 17 1 18 4 19 5 20 1 21 2 22 2 23 8 24 4 25 3 26 2 27 5 28 3 29 4 30 4 31 3 32 2 33 2 34 4 35 5 36 2 37 2 38 1 39 1 40 1 41 1 43 1 44 1 45 1 outlier 60 3 61 4 62 3 63 10 64 15 65 9 66 3 67 1 69 2 F(i39)=52 i39=virgi39 outlier. Clusters, F<52 (ct=99) and F>52 (50 Setosa) On Remaining, max's Val Ct 0 2 e8 outlier 1 2 e11 outlier 7 2 8 1 9 4 10 1 11 2 12 2 13 4 14 3 15 1 16 4 17 2 18 2 19 3 20 4 21 6 22 5 23 5 24 4 25 2 26 2 27 1 28 2 29 4 30 5 31 1 32 3 33 2 34 2 35 3 36 2 37 1 38 1 i8 i10 i36 i6 i23 i19 i18 i6 i8 i10 i19 i23 i35 i i i i i i i6 i10 i18 i19 i23 i35 all declared outliers e4 e38 e19 i20 F e e e outlier i outlier On Remaining, max's Val Ct e44 outlier 6 1 7 2 8 1 9 3 10 1 11 3 12 5 13 2 14 2 15 3 17 3 18 3 19 5 20 1 21 9 22 5 23 4 24 2 26 4 27 2 28 2 29 4 30 2 31 3 32 3 33 2 34 3 35 2 36 1 37 1 38 1 39 1 e36 outlier? 
On Remaining, mx mx mx mn Val Ct 0 1 1 2 2 3 3 1 5 5 6 4 7 5 8 2 9 3 10 5 11 4 12 7 13 5 14 2 15 4 16 4 17 7 18 4 19 4 20 2 21 2 22 1 24 1 25 1 27 2 29 2 On Remaining, mn mn mx mx Val Ct 0 1 1 3 2 3 3 7 4 7 5 7 6 5 7 5 8 3 9 8 10 4 11 4 12 11 13 4 14 8 15 4 16 1 18 1 On Remaining, mn mx mx mx Val Ct 0 1 2 1 3 4 4 3 5 5 6 4 7 5 8 7 9 8 10 3 11 5 12 2 13 4 14 5 15 7 16 5 17 4 18 1 20 1 On Remaining w e35 Val Ct 0 1 i26 outlier 3 2 On remaining vir1 Val Ct 0 1 1 2 2 1 4 1 5 1 6 2 7 2 8 2 9 4 10 1 11 4 12 3 13 4 14 2 15 6 16 4 17 6 19 4 20 5 21 5 22 2 23 1 24 2 25 5 26 4 27 4 28 1 29 2 30 6 31 2 32 1 33 1 34 1 35 2 36 1 38 1 39 1 e35 e10 e e outlier i44 i3 i i ^^outlier i3 i30 i31 i26 i8 i36 i i outlier i outlier i outlier i outlier i outlier Rem mn mx mn mx Val Ct 0 1 1 1 2 1 3 1 4 1 5 1 6 1 8 1 9 3 10 5 11 5 12 3 13 7 14 6 15 4 16 6 17 7 18 5 19 4 20 2 21 3 22 7 23 4 24 3 25 1 26 1 27 2 e49 outlier On Remaining, mn mx mx mx Val Ct 0 1 1 1 2 1 3 5 4 6 5 5 6 4 7 9 8 4 9 4 10 4 11 3 12 5 13 6 14 6 15 7 16 5 17 4 18 4 20 1 22 1 Could look at distances for 0,1 and 20,22? e13 e30 e32 e outlier e outlier e i44 i45 i49 i5 i37 i1 i i i i i not outlier i outlier
19
F=L1(x,y) on IRIS, masking to subclusters (go right down the table). Two rounds only. If we use L1gap=6, remove those outliers, then use linear gap analysis for larger subcluster revelation, let's see if we can separate Versicolor (e) from virginica (i).

outliers gap>L1=32.1: s6 s14 s15 s16 s17 s19 s21 s23 s24 s32 s33 s34 s37 s42 s45 e1 e2 e3 e5 e6 e7 e9 e10 e11 e12 e13 e15 e18 e19 e21 e22 e23 e27 e28 e29 e30 e34 e36 e37 e38 e41 e49 i1 i3 i4 i5 i6 i7 i8 i9 i10 i12 i14 i15 i16 i18 i19 i20 i22 i23 i25 i26 i28 i30 i31 i32 i34 i35 i36 i37 i39 i41 i42 i45 i46 i47 i49 i50
outliers gap>L1=42.8: s15 s16 s19 s23 s33 s34 s37 s42 s45 e1 e2 e7 e10 e11 e12 e13 e15 e19 e21 e22 e23 e27 e28 e30 e34 e36 e38 e41 e49 i1 i3 i5 i6 i7 i8 i9 i10 i12 i14 i15 i16 i18 i19 i20 i22 i23 i26 i30 i31 i32 i34 i35 i36 i39
outliers gap>L1=53.5: s15 s16 s23 s33 s34 s42 e10 e13 e15 e27 e28 e30 e36 e49 i1 i3 i7 i9 i10 i12 i15 i18 i19 i20 i26 i30 i32 i35 i36 i39
outliers gap>L1=64.3: s15 s16 s23 s42 e10 e13 e49 i3 i7 i9 i10 i18 i19 i20 i32 i35 i36 i39
outliers gap>L1=74.95 (point, L1gap): s42 9, e13 8, i7 10, i9 12, i10 12, i35 9, i36 9, i39 26
20
Val=0; p=K; c=0; P=Pure1;
For i=n to 0 { c=Ct(P&Pi); If (c>=p) { Val=Val+2^i; P=P&Pi } else { p=p-c; P=P&P'i } };
return Val, P;

[Table: the pairs table with columns IDX (z1..zf), IDY (z1..zf), X1..X4, d(xy), the bit slices P3..P0 and their complements P'3..P'0, one stride of 15 rows per IDX value.]

Need Rank(n-1) applied to each stride instead of the entire pTree. The result from stride=j gives the jth entry of SpS(X,d(x,X-x)). Parallelize over a large cluster? Ct(P&Pi): revise the Count procedure to kick out a count for each stride (involves a loop down the pTree by register-lengths?). What does P represent after each step? How does the algorithm go on 2pDoop (with 2-level pTrees) where each stride is separate? Note: using d, not d^2 (fewer pTrees). Can we estimate d (using a truncated Maclaurin series)?

Rank(n-1) traces with K=14, one per stride:

2^3*0 + 2^2*0 + 2^1*0 + 2^0*1 = 1
n=3: c=Ct(P&P3)=10 < 14, p=14-10=4; P=P&P'3 (elim 10 val≥8)
n=2: c=Ct(P&P2)=1 < 4, p=4-1=3; P=P&P'2 (elim 1 val≥4)
n=1: c=Ct(P&P1)=2 < 3, p=3-2=1; P=P&P'1 (elim 2 val≥2)
n=0: c=Ct(P&P0)=2 >= 1; P=P&P0 (elim 1 val<1)

2^3*0 + 2^2*0 + 2^1*0 + 2^0*1 = 1
n=3: c=Ct(P&P3)=9 < 14, p=14-9=5; P=P&P'3 (elim 9 val≥8)
n=2: c=Ct(P&P2)=0 < 5, p=5-0=5; P=P&P'2 (elim 0 val≥4)
n=1: c=Ct(P&P1)=4 < 5, p=5-4=1; P=P&P'1 (elim 4 val≥2)
n=0: c=Ct(P&P0)=1 >= 1; P=P&P0 (elim 1 val<1)

2^3*0 + 2^2*0 + 2^1*0 + 2^0*1 = 1
n=3: c=Ct(P&P3)=9 < 14, p=14-9=5; P=P&P'3 (elim 9 val≥8)
n=2: c=Ct(P&P2)=2 < 5, p=5-2=3; P=P&P'2 (elim 2 val≥4)
n=1: c=Ct(P&P1)=2 < 3, p=3-2=1; P=P&P'1 (elim 2 val≥2)
n=0: c=Ct(P&P0)=2 >= 1; P=P&P0 (elim 1 val<1)

2^3*0 + 2^2*0 + 2^1*1 + 2^0*1 = 3
n=3: c=Ct(P&P3)=6 < 14, p=14-6=8; P=P&P'3 (elim 6 val≥8)
n=2: c=Ct(P&P2)=7 < 8, p=8-7=1; P=P&P'2 (elim 7 val≥4)
n=1: c=Ct(P&P1)=1 >= 1; P=P&P1 (elim 0 val<2)
n=0: c=Ct(P&P0)=1 >= 1; P=P&P0 (elim 0)
21
Level-1 key map Red=pure stride (so no Level-0)
e f g h i a j b c k d m 0 0 13 12 11 10 23 22 21 20 33 32 31 30 43 42 41 40 a b c d e f g h i j k m 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Level-0: key map 13 12 11 10 23 22 21 20 (6-e) = e else pur0 (6-e) = f else pur0 (6-e) = g else pur0 (6-e) = h else pur0 In this 2pDoop KEY-VALUE DB, we list keys. Should we bitmap? Each bitmap is a pTree in the KVDB. Each of these is existing, e.g., e here 5,7-a,f=f else pur0 5,7-a,f=g else pur0 5,7-a,f=h else pur0 234789bcefg els pr0 234789bcefh else pr0 124-79c-f h else pr0 (b-f) = i else pur0 (b-f) = j else pur0 (b-f) = k else pur0 (b-f) = m else pur0 (a) = j else pur0 (a) = k else pur0 (a) = m else pur0 =SpS(XX, -27( p13p33 + p13p32 + p23p43 p23p42 (3-6,8,9) k, els pr0 (3-6,8,9) m els pr0 + p13p31 + 26( p13+p23+p33+p43 +p13p12+ p23p22+ p33p32 + +p43p42 ) -26( p23p41 124679bd m els pr0 25( p13p11+ p23p21 + p33p31 + p43p41 ) -25( p13p30 +p23p40 +p12p31 +p22p41 +p12p32 +p22p42 e f 5 6 g 7 h i a j b c k d m 33 32 31 30 43 42 41 40 24( p12+p22+p32+p42 +p13p10+ +p23p20 +p33p30 +p43p40 -24(p12p30 +p22p40 +p12p11+ +p22p21 +p32p31 +p42p41 ) 23( p12p10+ p22p20 + p32p30 + p42p40 ) -23(p11p31 +p11p30 +p21p41 +p21p40 p11+p21+p31+p41 +p11p10 + +p21p20 + +p31p30 +p41p40 ) -22(p10p30 +p20p40 p10+p20+p30+p40 ) 22(
22
If (Ct ≡ Count(P&Pi) ≥ p)
[Table: RankK rounds for x = z1..z5 against every y. Columns: x, y (ID1, ID2), yo(x-M)/|x-M| - MIN, then V, P, C for rounds 1 through c; the cell values did not survive extraction.]

RankK:
V=0; p=K; P=Pur1;
For i=n..0 { If (Ct ≡ Count(P&Pi) ≥ p) { V=V+2^i; P=P&Pi } else { p=p-Ct; P=P&P'i } }

RankK reveals the full gap situation for any functional on any vector space. (Here, the functional is the dot product with d = unit vector from the mean to each point.) Use RankK with K = N, N-C1, N-C1-C2, ..., N-C1-...-CmxL1 to mine useful subcluster info (Ci = ct(P&Pi) after the ith round, N=|X|=15, mxL1=20).
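RankK and the repeated-K sweep described above can be sketched as follows. The bit slices are modelled as Python ints used as uncompressed bit vectors, which is an assumption standing in for real pTrees; the function names are hypothetical, and the sample values are the x=z1 F values from the FAUST slides.

```python
def bit_slices(values, width):
    """slices[i] holds bit i of every row value (bit r of the int = row r)."""
    slices = [0] * width
    for row, v in enumerate(values):
        for i in range(width):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices

def count(m):
    """Number of 1-bits in a mask (the Ct / Count operation)."""
    return bin(m).count("1")

def rank_k(slices, n_rows, k):
    """Return (V, P): V is the k-th largest value, P masks the rows equal to V."""
    V, p = 0, k
    P = (1 << n_rows) - 1                 # Pure1: all rows still candidates
    for i in range(len(slices) - 1, -1, -1):
        with_bit = P & slices[i]
        c = count(with_bit)
        if c >= p:                        # the k-th largest has bit i set
            V += 1 << i
            P = with_bit
        else:                             # skip the c larger values
            p -= c
            P &= ~slices[i]
    return V, P

def value_and_count_arrays(slices, n_rows):
    """Sweep K = N, N-C1, N-C1-C2, ... to list every distinct value with its count."""
    out, k = [], n_rows
    while k >= 1:
        v, P = rank_k(slices, n_rows, k)
        c = count(P)
        out.append((v, c))
        k -= c
    return out

F = [14, 12, 12, 11, 10, 6, 1, 2, 0, 2, 2, 1, 2, 0, 5]   # x=z1 values for z1..z15
slices = bit_slices(F, 4)
smallest, rows = rank_k(slices, len(F), len(F))           # K = N gives the minimum
spectrum = value_and_count_arrays(slices, len(F))
```

The sweep returns the values in ascending order with their counts, so gaps can be read off directly from consecutive entries of `spectrum`.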
23
[Table continued: RankK rounds for x = z6..z10 against every y. Columns: x, y (ID1, ID2), yo(x-M)/|x-M| - MIN, then V, P, C for rounds 1 through c; the cell values did not survive extraction.]
24
[Table continued: RankK rounds for x = z11..z15 against every y. Columns: x, y (ID1, ID2), yo(x-M)/|x-M| - MIN, then V, P, C for rounds 1 through c; the cell values did not survive extraction.]
25
ptree P=Pure1; ptreeSet P[n]; ptree T1,T2,a; ptreeSet c,t,rv,k;
For i=(n-1) to 0 {
  T1=P&P[i]; T2=P&P[i]';
  c=PartialCount(T1);
  a=Compare(c,k);
  rv[i]=a;
  P=(T1&a)|(T2&a');
  t=Subtract(k,c);
  k=(k&a)|(t&a');
}

For comparison, RankK:
V=0; p=K; P=Pur1;
For i=n..0 { If (Ct ≡ Count(P&Pi) ≥ p) { V=V+2^i; P=P&Pi } else { p=p-Ct; P=P&P'i } }

The difference between the two algorithms is in the method of handling (resetting) P and the parameters (rv[i] corresponds to V, and k corresponds to p). Mohammad uses PTreeSets for the array of [real number] parameter values and can then avoid looping through the strides. Which is faster for big data? Should 2-level pTrees be used? If so, which is better?
26
[Figure: for Stride 1, the tree of pre-computed ANDs with their 1-counts. Level 1: P'3, P3. Level 2: P'3P'2, P'3P2, P3P'2, P3P2. Level 3: P'3P'2P'1, P'3P2P'1, P'3P'2P1, P'3P2P1, P3P'2P'1, P3P2P'1, P3P'2P1, P3P2P1. Level 4: the 16 ANDs from P'3P'2P'1P'0 through P3P2P1P0. Below it, LEVEL-0 of the PTreeSet yo(z1-M)/|z1-M| (bit slices 3..0 for the 15 points 1..f).]

If all these pTree ANDs are pre-computed and stored (with their 1-counts) for each stride, the Rank algorithm can be run accessing the counts only. E.g., if n=1,000,000=1M then N=1T and there are 1M strides to pre-compute ;-(

If the bitwidth is 4, then each stride requires these 30 = 2+4+8+16 = 2^5-2 pre-computed level-0 pTrees and counts. If the bitwidth is b, each stride requires Σ(i=1..b) 2^i = 2^(b+1)-2 pre-computations, e.g., 2^33-2 ≈ 8.6*10^9 for b=32, so one would do this only for, say, the high order 8 bits.

Descending the tree, 1-bits turn to 0-bits only. Therefore, the counts are non-increasing, and the count across any level stays at n=1M=10^6.
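The count of pre-computed ANDs per stride can be verified by enumerating them. This is a small sketch, assuming the ANDs are exactly the high-order prefixes of the bit slices with each slice taken plain or complemented; the function name and the naming scheme of the returned strings are illustrative.

```python
from itertools import product

def precomputed_ands(bitwidth):
    """Names of every AND over a high-order prefix of slices, each slice plain or complemented."""
    names = []
    for depth in range(1, bitwidth + 1):
        bits = range(bitwidth - 1, bitwidth - 1 - depth, -1)   # high-order slices first
        for choice in product((True, False), repeat=depth):
            names.append("".join(f"P{i}" if keep else f"P'{i}"
                                 for i, keep in zip(bits, choice)))
    return names

ands_4 = precomputed_ands(4)
per_stride = {b: len(precomputed_ands(b)) for b in range(1, 9)}
```

For bitwidth 4 this lists 30 ANDs, and in general 2^(b+1)-2, so the table doubles with every extra bit; that is why limiting the pre-computation to the high order 8 bits (510 ANDs per stride) is attractive.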
27
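$(y-p)\circ(y-p) - \left((y-p)\circ\frac{M-p}{|M-p|}\right)^2 = y\circ y - 2\,y\circ p + p\circ p - \left(y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}\right)^2$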
Using a "capped tube". Given a unit vector, d, we need the d_dot_product_projection_lengths and the d_dot_product_projection_distances.

[Figure: a point y, the line through p (or the origin) in the direction of the furthest point or mean point f (or M), the projection $(y\circ d)d$, the dot product projection distance $|y - (y\circ d)d|$, the cap gap width along the line, the tubular gap width around it, and the gaps in dot product lengths (projections) on the line.]

The dot product projection distance squared is

$$(y-(y\circ d)d)\circ(y-(y\circ d)d) = y\circ y - 2(y\circ d)^2 + (y\circ d)^2 = y\circ y - (y\circ d)^2$$

Note, this projection_distance is the perpendicular distance from the point, y, to the d_line and has nothing to do with the origin of the vector, y.

Squared y-p on M-p Projection Distance = $(y-p)\circ(y-p) - \frac{((y-p)\circ(M-p))^2}{(M-p)\circ(M-p)} = y\circ y - 2\,y\circ p + p\circ p - \left(y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}\right)^2$

1st: compute the constant [vector] $\frac{M-p}{|M-p|}$.
2nd: compute the PTreeSet $y\circ\frac{M-p}{|M-p|} - p\circ\frac{M-p}{|M-p|}$ (1 dot, 1 minus). This equals $(y-p)\circ\frac{M-p}{|M-p|}$, which we already needed for the dot product length projections (caps).
3rd: compute the PTreeSets $y\circ y - 2\,y\circ p + p\circ p$ (2 dots, 1 minus, 1 plus). Do not compute y-p (it shifts the entire vector space)?

That is, we need to compute the green constants and the blue and red dot product functionals in an optimal way (and then do the PTreeSet additions/subtractions/multiplications). What is optimal? (Minimizing PTreeSet functional creations and PTreeSet operations.)