FAUST: Using std or rankK instead of just gap size to determine the best gap, and/or using multiple-attribute cutting, can improve accuracy.


1 FAUST: Using std or rankK instead of just gap size to determine the best gap, and/or using multiple-attribute cutting, can improve accuracy.

We need a pTree ALGEBRA (already well started with the pTree Algebra paper), involving the pTree operators AND, OR, COMP, XOR, ... and their algebraic properties (commutativity, associativity, distributivity, ...). We need a pTree CALCULUS (functions that produce the pTree mask for just about any pTree-defining predicate).

Note that in FAUST{div,gap}, we cut perpendicular to the single attribute line that contains the maximum consecutive-class-mean gap, and we use the midpoint of that gap (or the confluence point of maximal stds) as the cut_point to separate the entire remaining space of pixels into two big boxes: one containing one partition of the remaining classes, and the other the balance of the classes. We never cut oblique to the attribute lines! Then we do it again on one of those sub-partitions, until we reach a single class. Can that be improved? With respect to speed, probably not. Can accuracy be improved without sacrificing (too much) speed? Here is a way, at about the same speed.

As motivation, think about a blue-red cars class (defined, e.g., as 2 parts red, 1 part blue). We want to make a cut (at the midpoint of the maximal gap) maximizing over all oblique directions, not just along the dimensions (since the dimensions form a measure-zero set of all possible directions). E.g., a "blue-red" cut would define a line at a 30-degree angle from the red axis, toward the blue points, in the blue-red direction. If D is any unit vector, X∘D = Σi=1..n Xi*Di, and X∘D > cut_point defines an oblique big box. We ought to consider all D-lines (noting that the dimension lines ARE D-lines). For this we will need a multi-attribute "EIN-Oblique" mask pTree formula, P(X∘D)>a, where X is any vector and D is an oblique vector. (NOTE: if D = ei = (0,...,1,...,0), then this is just the existing EIN formula for the ith dimension, PXi>a.) The pTree formula for the dot product is in the pTree book (and Mohammad is developing a better one?). We would like a recursive, exhaustive search for the vector D that gives us the maximal gap among the consecutive training-class means for the classes that remain (not just over all attribute directions, but over all combination directions). How can we find it? First examples:
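Before worrying about the pTree formulas, the oblique-cut idea itself can be stated in ordinary vector arithmetic. Here is a minimal NumPy sketch (not pTree-based; the function name best_gap_cut and the data layout are illustrative assumptions, not from the slides) that projects every training point onto a direction D, computes the D-line class means, and returns the midpoint of the maximal consecutive-mean gap as the cut_point:

```python
import numpy as np

def best_gap_cut(classes, D):
    """classes: dict label -> (n_i, d) array of training points;
    D: direction vector of length d (normalized below).
    Returns (cut_point, gap, (label_below, label_above))."""
    D = D / np.linalg.norm(D)                        # make D a unit vector
    proj_means = {c: (X @ D).mean() for c, X in classes.items()}
    ordered = sorted(proj_means.items(), key=lambda kv: kv[1])
    i = max(range(len(ordered) - 1),                 # widest consecutive gap
            key=lambda j: ordered[j + 1][1] - ordered[j][1])
    (c_lo, m_lo), (c_hi, m_hi) = ordered[i], ordered[i + 1]
    return (m_lo + m_hi) / 2, m_hi - m_lo, (c_lo, c_hi)
```

The mask pTree P(X∘D)>cut_point then corresponds, on raw data, to the boolean vector X @ D > cut_point.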

2 Cut-HyperPlane (CHP): using a quadratic hyper-surface (instead of a hyper-plane)?

Suppose there are just 2 attributes (red and blue) and we (r,b)-scatter-plot the 10 reddish-blue class training points and the 10 bluish-red class training points:

[Figure: (r,b) scatter plot of the b and r training points, the oblique D-line, the D-line mean for the b class, the D-line mean for the r class, and the gap between them]

Take the r point and the b point whose projections on the D-line fall closest together as the "best" support pair; similarly the "next best" (second best) support pair, and similarly the "third best" pair. Form the quadratic support curve from the three r-support points for class r, and form the quadratic support curve from the three b-support points for class b (or move each point in each pair 1/3 of the way toward the other and then do the above), or ????

3 Fitting a parabolic hyper-surface
Suppose there are just 2 attributes (red and blue) and we (r,b)-scatter-plot the 10 reddish-blue class training points and the 10 bluish-red class training points:

[Figure: the same (r,b) scatter plot with the D-line, the gap, and a parabolic hyper-surface fitted between the classes]

Fit a parabola with focus p = the b-class mean and directrix = the line perpendicular to the D-line through the midpoint of the class means (compare the hyperplane mask P(mr−mb)∘X > |mr+mb|/2). Letting M be the unit vector along D = mr − mb, we want the mask pTree P M∘X > d(p,X), and

M∘X > d(p,X)  ⇔  (M∘X)^2 > d^2(p,X)  (on the side where M∘X ≥ 0), with
(M∘X)^2 = (m1 x1 + m2 x2)^2 = m1^2 x1^2 + 2 m1 m2 x1 x2 + m2^2 x2^2
d^2(p,X) = (p1 − x1)^2 + (p2 − x2)^2 = x1^2 − 2 p1 x1 + p1^2 + x2^2 − 2 p2 x2 + p2^2.

So the predicate becomes
(m1^2 − 1) x1^2 + 2 m1 m2 x1 x2 + (m2^2 − 1) x2^2 + 2 p1 x1 + 2 p2 x2 > p1^2 + p2^2,
and P over that predicate should do it.
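As a sanity check on the algebra, here is a hedged NumPy sketch (the name parabola_mask is assumed, not from the slides) that evaluates the final polynomial predicate directly on raw 2-D vectors rather than via pTrees; recall that squaring M∘X > d(p,X) is only an equivalence where M∘X ≥ 0, i.e., on the b side of the directrix:

```python
import numpy as np

def parabola_mask(X, M, p):
    """X: (n, 2) points; M: unit normal of the directrix (along D);
    p: focus (the b-class mean). True where (M.X)^2 > d^2(p, X)."""
    m1, m2 = M
    p1, p2 = p
    x1, x2 = X[:, 0], X[:, 1]
    lhs = ((m1**2 - 1) * x1**2 + 2 * m1 * m2 * x1 * x2
           + (m2**2 - 1) * x2**2 + 2 * p1 * x1 + 2 * p2 * x2)
    return lhs > p1**2 + p2**2   # the expanded form of (M.X)^2 > d^2(p, X)
```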

4 APPENDIX: FAUST is a Near Neighbor Classifier
FAUST is a Near Neighbor Classifier. It is not a voting NNC like pCkNN, where, for each unclassified sample, pCkNN builds around that sample a neighborhood of TrainingSet voters who then classify the sample through a majority, plurality, or weighted (in PINE) vote; pCkNN classifies one unclassified sample at a time. FAUST is meant for speed, and therefore FAUST attempts to classify all unclassified samples at one time: it builds a Big Box Neighborhood (BBN) for each class and then classifies all unclassified samples in the BBN into that class (constructing said class-BBNs with one EIN pTree calculation per class).

[Figure: cut_points aR, bR, aG, bG, aB, bB on the R, G, B coordinate axes, defining the class coordinate boxes]

The BBNs can overlap, so the classification needs to be done one class at a time, sequentially, in maximum-gap, maximum-#-of-stds-in-gap, or minimum-rankK-in-gap order. The whole process can be iterated, as in k-means classification, using the predicted classes (or subsets of them) as the new training set; this can be continued until convergence.

A BBN can be a coordinate box: for coordinate R, cb(R, class, aR, bR) is all x such that aR < xR < bR (either or both of the < can be ≤); aR and bR are what were called the cut_points of the class. Or BBNs can be multi-coordinate boxes, which are INTERSECTIONs of the best k (k ≤ n−1, assuming n classes) cb's for a given class ("best" can be with respect to any of the above maximizations). And instead of using a fixed number of coordinates, k, we could use only those in which the "quality" of the cb is higher than a threshold, where "quality" might be measured from the dimensions of the gaps (or in other ways?).

FAUST could be combined with pCkNN (probably in many ways) as follows: FAUST multi-coordinate BBNs could be used first to classify the "easy points" (those that fall in an intersection of high-quality BBNs and are therefore fairly certain to be correctly classified). Then the remaining "difficult points" could be classified using the original training set (or the union of each original TrainingSet class with the new "easy points" of that same class) and using L∞ or Lp, p = 1 or 2. (A sketch of the multi-coordinate-box intersection follows below.)
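To make the coordinate-box construction concrete, here is an illustrative sketch (function and variable names are assumptions) that builds a multi-coordinate BBN as the AND of coordinate-box masks, each the raw-data analogue of one EIN pTree calculation:

```python
import numpy as np

def bbn_mask(X, boxes):
    """X: (n, d) unclassified samples; boxes: list of (coord, a, b) triples,
    i.e., the cb(coord, class, a, b) cut_point pairs for one class.
    Returns the AND of the coordinate-box masks (the class's BBN)."""
    mask = np.ones(len(X), dtype=bool)
    for coord, a, b in boxes:
        mask &= (X[:, coord] > a) & (X[:, coord] < b)  # one EIN mask per cb
    return mask
```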

5 P(mbmr)oX>|mr+mb|/2
A multi-attribute EIN Oblique (EINO) based heuristic: instead of finding the best D, take the vector connecting one class mean to another class mean as D. To separate r from v: D = (mr − mv) and a = |mr + mv|/2. To separate r from b: D = (mr − mb) and a = |mr + mb|/2.

[Figure: scatter plot of the r, v, and b training points with class means mr, mv, and mb]

P(mr−mv)∘X > |mr+mv|/2 masks the vectors whose shadow (projection) on the D-line falls on the mr side of the midpoint; ANDing the two pTrees, P(mr−mb)∘X > |mr+mb|/2 and P(mr−mv)∘X > |mr+mv|/2, masks the region between them (which is r). A sketch follows below.

Question: what is the best a to use as cut_point? The mean midpoint, the vector-of-medians midpoint, the outermost points, the outermost non-outliers? By "outermost" I mean the points furthest from the means in each class (in terms of their projections on the D-line); by "outermost non-outlier" I mean the furthest non-outlier points. Other possibilities: the best rankK points, the best std points, etc.

Comments on where to go from here (assuming we can do the above): I think the "medoid-to-medoid" method on this page is close to optimal provided the classes are convex. If they are not convex, then some sort of Support Vector Machine (SVM) approach would be the next step. In SVMs the space is translated to higher dimensions in such a way that the classes ARE convex; the inner product in that space is equivalent to a kernel function in the original space, so one need not even do the translation to get inner-product-based results (the genius of the method). Final note: I should say "linearly separable" instead of convex (a slightly weaker condition).
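A minimal sketch of the mean-to-mean cut just described (names illustrative), with the slide's a = |mr+mv|/2 read as the projection of the midpoint of the two means onto D, matching choice 1 on the next slide:

```python
import numpy as np

def mean_to_mean_mask(X, m_r, m_v):
    """Mask the r side of the cut: points whose shadow on the
    (m_r - m_v)-line falls past the projected midpoint of the means."""
    D = m_r - m_v
    a = D @ (m_r + m_v) / 2    # projected midpoint, the slide's |mr+mv|/2
    return X @ D > a           # AND one such mask per other class to get r
```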

6 FAUST: for isolating class_i
A multi-attribute EIN Oblique (EINO) based heuristic: instead of finding the best D, take the vector connecting one class mean to another class mean as D, with d = D/|D|. Then Pd∘X>a = PΣdiXi>a, where a can be calculated as (letting d = D/|D|):
1. a = (d∘mr + d∘mv)/2, the midpoint of the projected means; or
2. letting ar = max{d∘r} and av = min{d∘v} (if d∘mr < d∘mv; otherwise reverse the max and min), for r take a = av; or
3. using variance fits (or rankK fits).
(Note: apply this to all other classes, or only to those for which there is a positive gap.)

[Figure: scatter plot of the r and v training points with D = mr − mv drawn between the class means]

FAUST, for isolating class_i:
1. Create the table TBL[class_i, meanvector_i](class_j, meanvector_j).
2. Apply the pTree mask formula at left, P((mr−mv)/|mr−mv|)∘X < a.
Note: if we take the fastest route and just pick the one class which, when paired with r, gives the max gap, then we can use the max gap directly; if the maximum-std point is used instead of the midpoint of the max gap, then we need std_j (or variance_j) in TBL. A sketch of the isolation loop follows below.
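Here is a sketch of that isolation loop using cut_point choice 1 (the midpoint of the projected means); all names are assumed. Class i is paired with every other class and the pairwise masks are ANDed:

```python
import numpy as np

def isolate_class(X, mean_i, other_means):
    """AND the pairwise mean-to-mean masks; what survives is class i's region."""
    mask = np.ones(len(X), dtype=bool)
    for mean_j in other_means:
        d = mean_i - mean_j
        d = d / np.linalg.norm(d)            # d = D/|D|
        a = (d @ mean_i + d @ mean_j) / 2    # choice 1: a = (d.m_i + d.m_j)/2
        mask &= (X @ d) > a                  # keep the mean_i side of the cut
    return mask
```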

7 Suppose there are just 2 attributes (red and blue) and we (r,b)-scatter-plot the 10 reddish-blue (rb) class training points and the 10 bluish-red (br) class training points:

[Figure: blue-vs-red scatter plot of the rb and br points, the oblique D-line, the D-line mean for the rb class, the D-line mean for the br class, the gap between them, and the consecutive-class-mean midpoint = cut_point on the Cut-HyperPlane (CHP) we are after]

Clearly we would want to find a ~45-degree unit vector D, calculate the means of the projections of the two training sets onto the D-line, and then use the midpoint of the gap between those two means as the cut_point, erecting there a perpendicular-bisector "hyperplane" to D, which separates the space into the two class big boxes on each side of the hyperplane. (Can it be masked using one EIN formula?)

The above "diagonal" cutting produces a perfect classification (of the training points). If we had considered only cut_points along the coordinate axes, it would have been very imperfect!

8 Searching for the D that maximizes the gap
How do we search through all possible angles for the D that will maximize that gap? We would have to develop the (pTree-only) formula for the class means for any D, and then maximize the gap (the distance between consecutive D-projected means). Take a look at the formulas in the book, think about it, take a look at Mohammad's formulas, and see if you can come up with the mega-formula above.

Let D = (D1, ..., Dn) be a unit vector (our "cut_line direction vector"). D∘X = D1X1 + ... + DnXn is the length of the perpendicular projection of X on D (the length of the high-noon shadow that X makes on the D-line, as if D were the earth). So we project every training point Xc,i (class = c, i = 1..10) onto D (i.e., D∘Xc,i), calculate the D-line class means, (1/n)Σi D∘Xc,i, and select the max consecutive mean gap along D (call it best_gap(D) = bg(D)). Maximize bg(D) over all possible D. Harder? Calculate it for a [polar] grid of D's, maximize over that grid, and then use continuity and hill climbing to improve it; a sketch follows below.

More likely the situation would be: rb's are more blue than red and br's are more red than blue.

[Figure: the same rb/br scatter plot with the D-line, the gap, the cut_point, and the Cut-HyperPlane (CHP); and a second plot with the training points shifted away from the origin, showing the same D-line gap]

What if the training points are shifted away from the origin? This should convince you that it still works.
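Here is a hedged sketch of that grid-plus-hill-climbing search in the 2-D case, reusing the hypothetical best_gap_cut() from the slide-1 sketch (search_direction and its parameters are illustrative, not from the slides):

```python
import numpy as np

def search_direction(classes, n_grid=180, refine_steps=20):
    """Grid-search unit directions over angles in [0, pi), then locally
    refine the best angle by halving the step (simple hill climbing)."""
    def bg(theta):
        D = np.array([np.cos(theta), np.sin(theta)])
        return best_gap_cut(classes, D)[1]          # gap size for this D
    thetas = np.linspace(0, np.pi, n_grid, endpoint=False)
    best = max(thetas, key=bg)                      # best grid angle
    step = np.pi / n_grid
    for _ in range(refine_steps):                   # refine around it
        best = max((best - step, best, best + step), key=bg)
        step /= 2
    return np.array([np.cos(best), np.sin(best)]), bg(best)
```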

9 In higher dimensions, nothing changes. If there are "convex" clustered classes, FAUST{div,oblique_gap} can find them (consider a greenish-reddish-blue class, grb, and a bluish-greenish-red class, bgr):

[Figure: sketch of the grb and bgr clusters with the oblique direction D between them]

Before considering the pTree formulas for the above, we note again that any pair of classes (or multi-classes, as in the divisive version) that are convex can be separated by this method. What if they are not convex?

[Figure: a 2-D example of two non-convex classes]

A couple of comments. FAUST resembles the SVM (Support Vector Machine) method in that it constructs a separating hyperplane in the "margin" between classes. The beauty of SVM (over FAUST and all other methods) is that it is provable that there is a transformation to higher dimensions that renders two non-hyperplane-separable classes hyperplane-separable (and you don't actually have to do the transformation, just determine the kernel that produces it). The problem with SVM is that it is computationally intensive. I think we want to keep FAUST simple (and fast!). If we can do this generalization, I think it will be a real winner!

How do we search over all possible oblique vectors, D, for the one that is "best"? Or, if we are to use multi-box neighborhoods, how do we do that? A heuristic method follows:

10 FAUST_pdq_std (using std's)
Create attribute tables with cl = class, mn = mean, std, n = max # of stds in the gap, and cp = cut_point (the value in the gap which allows the max # of stds, n, to fit forward from the mean (using its std) and backward from the next mean (using its std)). n satisfies mn + n*std = mnG − n*stdG, so n = (mnG − mn)/(std + stdG). A sketch of this computation follows below.

[Tables: TsLN, TsWD, TpLN, TpWD (cl, mn, std, n, cp) computed over the 10 se, 10 ve, and 10 vi training points; the TA record with max n is the se record of TpLN, with cut_point cp = 19, separating se from {ve, vi}]

Note: since there is also a case with n = 4.1 which results in the same partition (into {se} and {ve, vi}), we might use both for improved accuracy; certainly we can do this with the sequential version. Then remove se from RC (= {ve, vi} now) and from the TA's.
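A one-function sketch of the n and cp computation (pure arithmetic, no pTrees; the name std_gap is assumed):

```python
def std_gap(mn, std, mnG, stdG):
    """For consecutive class means mn < mnG in one attribute, return
    (n, cp): the max # of stds fitting in the gap, and the cut value
    satisfying mn + n*std == mnG - n*stdG."""
    n = (mnG - mn) / (std + stdG)
    cp = mn + n * std              # equals mnG - n*stdG by construction
    return n, cp
```

Picking the TA record with maximum n and cutting with P_{A>cp} is then one pTree mask per step.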

11 FAUST_pdq using std's. Use the 4 attribute tables with rv = mean, std, max-#-of-stds-in-gap = n, and cut value cp (cp = the value in the gap which allows the max # of stds, n, to fit forward from that mean (using its std) and backward from the next mean, meanG (using stdG)). n satisfies mean + n*std = meanG − n*stdG, so n = (meanG − mean)/(std + stdG).

[Tables: TsLN, TsWD, TpLN, TpWD (cl, mn, std, n, cp) over the remaining classes {ve, vi}; the TA record with max n is in TpWD, with cp = 16, giving the mask P{vi} = P_{pWD>16}]

Note that we get perfect accuracy with one epoch using stds this way!

12 FAUST_pdq SUMMARY. We conclude that FAUST_pdq will be fast (no loops, one pTree mask per step, and it may converge in one, or just a few, epochs) and is fairly accurate (completely accurate in this example using the std method!). FAUST_pdq is improved (accuracy-wise) by using standard-deviation-based gap measurements and choosing the maximum number of stds as the attribute-relevancy choice. There may be many other such improvements, e.g., using an outlier-identification method (see Dr. Dongmei Ren's thesis) to determine the set of non-outliers in each attribute and class: within each attribute, order by means and define gaps to be between the maximum non-outlier value in one class and the minimum non-outlier value in the next (allowing these gap measurements to be negative if the max of one exceeds the min of the next). There are also many ways of defining representative values (means, medians, rank-points, ...).

In conclusion, FAUST_pdq is intended to be very fast (if raw speed is the need, as it might be for initial processing of the massive and numerous image datasets that the DoD has to categorize and store). It may be fairly accurate as well, depending upon the dataset, but since it uses only one attribute or feature for each division, it is not likely to be of maximal accuracy compared to other methods (such as FAUST_pms, coming up). Next we look at FAUST_pms (pTree-based, m-attribute cut_points, sequential: one class divided off at a time), so we can explore the various choices for m (from 1 to the table width) and alternate distance measures.

13 RankK cut_points (K = 10)
The rankK loop (K = 10; ps initialized to K):
For i = 4..0 { c = rc(Pc & Patt,i);
  if (c ≥ ps) { rankK += 2^i; Pc = Pc & Patt,i }  [for the rank from the other end, rank(n−K+1) += 2^i]
  else { ps = ps − c; Pc = Pc & P'att,i } }

[Bit-slice pTrees Patt,i and their complements P'att,i for the attributes sLN=1, sWD=2, pLN=3, pWD=4 over the training set, with the resulting serc/seRK, verc/veRK, virc/viRK and seps/veps/vips work columns]

pWD_vi_LO = 16; pWD_se_HI = 0, pWD_ve_HI = 0. So the highest pWD_se_HI and pWD_ve_HI can get is 15, and the lowest pWD_vi_LO will ever be is 16. So cutting at 16 will separate all vi from {se, ve}. This is, of course, with reference to the training set only, and it may not carry over to the (much bigger?) test set, especially since the gap may be small (= 1). Here we will use the pWD cut_point 16 to peel off vi! We need a theorem and proof here! A plain-language sketch of the rankK loop follows below.
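Here is a plain-Python reading of the rankK loop, with bit slices modeled as boolean arrays (rc = root count becomes np.count_nonzero, and ps is initialized to K). This is a sketch of the technique, not the pTree implementation:

```python
import numpy as np

def rank_k(values, K, nbits=5):
    """K-th largest value of an integer array, found high bit to low bit
    (i = 4..0 for 5-bit values), mirroring the slide's loop."""
    Pc = np.ones(len(values), dtype=bool)      # all points still candidates
    ps, rankK = K, 0
    for i in range(nbits - 1, -1, -1):
        Patt_i = ((values >> i) & 1).astype(bool)   # bit-slice pTree, bit i
        c = np.count_nonzero(Pc & Patt_i)           # c = rc(Pc & Patt,i)
        if c >= ps:                 # enough 1-bits: the rank value has bit i
            rankK += 1 << i
            Pc = Pc & Patt_i
        else:                       # bit i is 0; discard the 1-side
            ps -= c
            Pc = Pc & ~Patt_i
    return rankK

# e.g. rank_k(np.array([3, 1, 2]), K=2, nbits=2) returns 2 (2nd largest).
```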

14 RankK cut_points, continued (K = 10)
The same loop (K = 10; ps initialized to K):
For i = 4..0 { c = rc(Pc & Patt,i);
  if (c ≥ ps) { rankK += 2^i; Pc = Pc & Patt,i }  [rank(n−K+1) += 2^i]
  else { ps = ps − c; Pc = Pc & P'att,i } }

[Bit-slice pTrees and LO/HI work columns for the remaining attributes, as on the previous slide]

pLN_ve_LO = 32; pLN_se_HI = 0. So the highest pLN_se_HI can get is 31, and the lowest pLN_ve_LO will ever be is 32. So cutting at 32 will separate all ve from se! Greater accuracy can be gained by continuing the process for all i and for all K, then looking for the best gaps. (All gaps? All gaps, weighted?)

15 APPENDIX: FAUST{pdq,mrk} (FAUST{pdq} with max rank_k). rank_k(S) is the kth largest value in S (the rank_k for k = 1/2 is the median, k = 1 is the maximum, and k = 0 is the minimum). The same algorithm can clearly be used sequentially, as FAUST{pms,mrk}.

FAUST{pdq,gap} (divisive, quiet (no noise), with gaps):
0. For each attribute A, TA(class, rv, gap) ordered on rv asc (rv = class representative, gap = distance to the next rv).
WHILE RC not empty, DO
1. Find the TA record with maximum gap.
2. Use P_{A>c} (c = rv + gap/2) to divide RC at c into LT and GT (pTree masks P_LT and P_GT).
3. If LT or GT is a singleton, remove that class.
END_DO

FAUST{pdq,mrk}:
0. For each attribute A, TA(class, md, k, cp) ordered on md asc, where k is the max value such that rank_k of the class and rank_(1−k) of the next class still leave a gap, with cp between them.

FAUST{pdq,std} (FAUST{pdq} using the # of standard deviations in the gap):
0. For each attribute A, TA(class, mn, std, n, cp) ordered on n asc, where cp = the value in the gap allowing the max # of stds, n; n satisfies mn + n*std = mnG − n*stdG, so n = (mnG − mn)/(std + stdG).
WHILE RC not empty, DO
1. Find the TA record with maximum n.
2. Use P_{A>cp} to divide RC at the cut_point cp into LT and GT (pTree masks P_LT and P_GT).
3. If LT or GT is a singleton, remove that class from RC and from all TA's.
END_DO

FAUST{pms,gap} (FAUST{p}, m-attribute cut_points, sequential class separation (one class at a time), m = 1):
0. For each A, TA(class, rv, gap, avgap), where avgap is the average of gap and previous_gap (if first in TA, avgap = gap). If there are x classes,
DO x−1 times
1. Find the TA record with maximum avgap.
2. cL = rv − prev_gap/2, cG = rv + gap/2; masks Pclass = P_{A>cL} & P_{A≤cG} & P_RC and P_RC = P'class & P_RC. (If the class is first in TA (no prev_gap), Pclass = P_{A≤cG} & P_RC; if last, Pclass = P_{A>cL} & P_RC.)
3. Remove that class from RC and from all TA's.
END_DO

FAUST{pms,std} (FAUST{pms} using the # of stds in the gap):
0. For each attribute A, TA(class, mn, std, n, avgn, cp) ordered on avgn asc, where cp = cut_point (the value in the gap which allows the max # of stds, n; n satisfies mn + n*std = mn_next − n*std_next, so n = (mn_next − mn)/(std + std_next)).
DO x−1 times
1. Find the TA record with maximum avgn.
2. cL = rv − prev_gap/2, cG = rv + gap/2, with pTree masks Pclass = P_{A>cL} & P_{A≤cG} & P_RC and P_RC = P'class & P_RC. (If the class is first in TA (no prev_gap), then Pclass = P_{A≤cG} & P_RC; if last, Pclass = P_{A>cL} & P_RC.)
3. Remove that class from RC and from all TA's.
END_DO

An end-to-end sketch of the FAUST{pdq,std} loop follows below.
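Finally, an end-to-end sketch of the FAUST{pdq,std} loop, reusing the hypothetical std_gap() from the slide-10 sketch. It makes the simplifying assumption that each max-n cut peels off exactly the one class below the cut (true in the IRIS walk-through on slides 10-11, but not in the general divisive case); all names are illustrative:

```python
import numpy as np

def faust_pdq_std(classes):
    """classes: dict label -> (n_i, d) array of training points.
    Returns the list of (attribute, cut_point, peeled_class) decisions."""
    remaining = dict(classes)
    cuts = []
    while len(remaining) > 1:
        best = None                                 # (n, attr, cp, class)
        d = next(iter(remaining.values())).shape[1]
        for a in range(d):                          # one TA per attribute
            stats = sorted(((X[:, a].mean(), X[:, a].std(), c)
                            for c, X in remaining.items()),
                           key=lambda t: t[0])      # order classes by mean
            for (mn, sd, c), (mnG, sdG, _) in zip(stats, stats[1:]):
                n, cp = std_gap(mn, sd, mnG, sdG)   # max-#-stds gap measure
                if best is None or n > best[0]:
                    best = (n, a, cp, c)
        n, a, cp, c = best
        cuts.append((a, cp, c))     # P_{A<=cp} masks off class c's side
        del remaining[c]            # assumed: the LT side is the singleton
    return cuts
```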

