Slide 1
Relevancy Interval pTree k-Means (ripm)

Fei Pan's range-predicate formula:

  P_{Aj > c} = P_{j,m} o_m P_{j,m-1} o_{m-1} ... o_{k+1} P_{j,k}

where c = b_m ... b_k ... b_0 in binary, o_i is AND iff b_i = 1 (else OR), k is the rightmost bit position with bit-value "0", and the operators are right-binding (right-associative).

Method: Find the mean and std of each class and column of the TrainSet. For each column (feature) and each class, define left_relevancy_pt (lrp) = mean - x*std and right_relevancy_pt (rrp) = mean + x*std, giving the relevancy_interval (ri) [lrp, rrp]_class,column. A column has "high relevancy" for class_k iff the gaps (which can be negative, i.e., overlapping) between the class_k ri and the other classes' ri's are large; the size of these gaps (in # of stds) determines the level of relevancy.

Given 10 tuples from each class as the TrainSet (the first 10 of each), calculate the means and stds for x = 2. Take 5 from each class as the TestSet, remove the class label, and predict.

[Figure: the 30 TrainSet tuples (10 se, 10 ve, 10 vi) and 15 TestSet tuples (5 per class), with columns SEPAL_LENGTH (sLN), SEPAL_WIDTH (sWD), PETAL_LENGTH (pLN), PETAL_WIDTH (pWD), plus the per-class MEANS and STDs tables.]

sepalWD is irrelevant, since the class means are close and even a 1-std radius makes all 3 ri's overlap heavily. Similarly, sepalLN is fairly irrelevant. petalLN and petalWD are certainly relevant for setosa/~setosa classification; they may be redundant (correlated?), but it can't hurt to use both. Note that within ~setosa, petalLN is not as relevant as petalWD for distinguishing ve/vi. Therefore a good minimal choice is to use petalWD with 2 stds for the ve/vi classification as well.
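Fei Pan's formula turns a range query into a single pass of ANDs/ORs over bit-slice pTrees. Here is a minimal Python sketch of it (an illustration only, not the original pTree code: pTrees are modeled as plain integers whose bit r carries bit i of row r's value, and all names and data values are made up for the example):

```python
def bit_slices(values, width):
    """Build bit-slice 'pTrees': bit r of slices[i] is bit i of values[r]."""
    return [sum(((v >> i) & 1) << r for r, v in enumerate(values))
            for i in range(width)]

def ptree_greater_than(slices, c, width):
    """P_{A>c} = P_m o_m ... o_{k+1} P_k, where o_i is AND iff b_i = 1
    (else OR), k is the rightmost 0-bit of c, and ops bind to the right."""
    b = [(c >> i) & 1 for i in range(width)]
    k = next((i for i in range(width) if b[i] == 0), None)
    if k is None:                # c is all 1-bits: no value can exceed it
        return 0
    mask = slices[k]             # right-binding: start from P_{j,k}
    for i in range(k + 1, width):
        mask = (mask & slices[i]) if b[i] else (mask | slices[i])
    return mask

# Example: petal widths (x10) of five tuples, with the se/~se cutoff c = 7.
pwd = [2, 7, 13, 16, 20]
mask = ptree_greater_than(bit_slices(pwd, 5), 7, 5)
print([r for r in range(len(pwd)) if (mask >> r) & 1])  # rows 2, 3, 4 (> 7)
```

For c = 7 = 00111 the rightmost 0-bit is position 3, so the mask reduces to P_3 OR P_4, i.e., two bit-vector operations regardless of table size.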
Slide 2
For 2 stds, we get se/~se from pWD. We use a ~midpt cutoff = 7.
pWD <= 7 --> se; pWD > 7 --> ~se.

[Figure: the TestSet tuples (columns spLN, spWD, pdLN, pdWD) with the 2-std petalWD relevancy-interval endpoints (WDlrp, WDrrp) per class; the left endpoints are 0.7 for se, 10.6 for ve, and 14.8 for vi.]

Next we use the pWD ripm on NOTse to differentiate ve and vi, using the midpoint cutoff pWD = (17 + 14.8)/2 = 15.9: pWD <= 15.9 --> ve; pWD > 15.9 --> vi.

The method is 100% accurate for this small example. Next we'll evaluate it on the full 120-tuple TrainingSet.
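For concreteness, here is one way the 2-std relevancy intervals and ~midpoint cutoffs above could be computed. A sketch only: the per-class pWD training values (x10) below are hypothetical stand-ins for the actual TrainSet, chosen to reproduce the slide's means:

```python
from statistics import mean, stdev

def relevancy_interval(values, x=2.0):
    """[lrp, rrp] = [mean - x*std, mean + x*std] for one class/column."""
    m, s = mean(values), stdev(values)
    return m - x * s, m + x * s

def midpoint_cutoff(lower_class, upper_class, x=2.0):
    """Midpoint of the lower class's rrp and the upper class's lrp."""
    _, rrp = relevancy_interval(lower_class, x)
    lrp, _ = relevancy_interval(upper_class, x)
    return (rrp + lrp) / 2

# Hypothetical pWD values per class (x10), for illustration only:
pwd = {"se": [2, 2, 3, 2, 1, 4, 3, 2, 2, 1],
       "ve": [14, 15, 15, 13, 15, 13, 16, 10, 13, 14],
       "vi": [25, 19, 21, 18, 22, 21, 17, 18, 18, 25]}
print(round(midpoint_cutoff(pwd["se"], pwd["ve"]), 1))  # ~7  (se/~se)
print(round(midpoint_cutoff(pwd["ve"], pwd["vi"]), 1))  # ~16 (ve/vi)
```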
Slide 3
pWD cutoff = 7 for se/NOTse
[Figure: the full 120-tuple TrainingSet, 40 rows per class (setosa, versicolor, virginica), with columns spLN, spWD, pdLN, pdWD.]

pWD <= 7 --> se; pWD > 7 --> NOTse: 100% correct.
Slide 4
pWD cutoff = 15.9 for ve/vi on NOTse.
[Figure: the 80 NOTse tuples (40 versicolor, 40 virginica), columns spLN, spWD, pdLN, pdWD.]

pWD <= 15.9 --> ve, correct except for 4 cases; pWD > 15.9 --> vi, correct except for 3 cases. (120 - 7)/120 ≈ 94.2% correct, with one epoch requiring only two mask-pTree evaluations using the Fei Pan formula.
Slide 5
Examining the means and stds again for the ve/vi classification within NOTse using pdWD, we see that 2 stds right of the ve mean still does not overlap 1 std left of the vi mean. Thus this might be a better cutoff choice (namely 13.8 + 2*1.6 = 17).

[Figure: the 80 NOTse tuples (40 versicolor, 40 virginica) and the per-class MEANS and STDs tables (sepalLN, sepalWD, petalLN, petalWD for se, ve, vi).]

pWD <= 17 --> ve, correct except for 1 case; pWD > 17 --> vi, correct except for 4 cases. (120 - 5)/120 ≈ 95.8% correct, again with one epoch requiring only two mask-pTree evaluations using the Fei Pan formula.
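As a quick sanity check of that claim, the arithmetic in a few lines. The std values below are assumptions inferred from the slide's own numbers (13.8 + 2*1.6 = 17, and vi's 2-std lrp of 14.8 on slide 2 implies a vi std near 2.8), not verified data:

```python
mu_ve, sd_ve = 13.8, 1.6   # assumed ve pWD mean/std (x10)
mu_vi, sd_vi = 20.4, 2.8   # assumed vi pWD mean/std (x10)

cut = mu_ve + 2 * sd_ve        # 2 stds right of the ve mean -> 17.0
assert cut <= mu_vi - sd_vi    # still <= 1 std left of the vi mean (17.6)
print(cut)
```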
Slide 6
Going back to the se/~se classification: even though we get 100% accuracy, how do we know that without the answers?

[Figure: the 120 TrainingSet tuples (columns spLN, spWD, pdLN, pdWD) classified by pWD <= 7 --> se, pWD > 7 --> ~se.]

A better algorithm might look for radii of 3 stds (99.7% for a normal distribution). For se/~se on pWD we can go 4.94 stds before there is interval overlap (near-100% confidence, with a cutoff of 5.89), so we can be completely confident in se/~se. For ve/~ve within ~se, using pWD, we can go only 1.5 stds (cutoff 16.2) before the ri's overlap (~87% confidence). From the previous slide we get 6 incorrectly classified iris samples: 74/80 = 92.5% correct, which is very close to what is expected (so the data must be fairly normally distributed).

From this TrainingSet means/stds table we can also isolate samples whose classification is suspicious. If a sample falls in the overlap of the relevancy intervals of a relevant attribute, then the radius of those intervals (the # of stds used), together with the general relevance of the attribute, constitutes a way to measure the suspiciousness of the prediction. Suspicious predictions might be subjected to other methods such as CNN or ARM.

[Table: per-class MEANS and STDs (sepalLN, sepalWD, petalLN, petalWD for se, ve, vi) omitted.]
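The suspiciousness test could be realized as below. A minimal sketch: the function shape and the "between facing endpoints" policy are my assumptions, and the class value lists reuse the hypothetical pWD data from the earlier sketch:

```python
from statistics import mean, stdev

def suspicious(value, lower_class, upper_class, x=2.0):
    """True iff value falls between the facing endpoints of two adjacent
    classes' relevancy intervals (their overlap, or the gap between them),
    i.e. where the x-std intervals give no confident class assignment."""
    lo_rrp = mean(lower_class) + x * stdev(lower_class)
    hi_lrp = mean(upper_class) - x * stdev(upper_class)
    a, b = sorted((lo_rrp, hi_lrp))
    return a <= value <= b

ve = [14, 15, 15, 13, 15, 13, 16, 10, 13, 14]   # hypothetical pWD values
vi = [25, 19, 21, 18, 22, 21, 17, 18, 18, 25]
print(suspicious(16, ve, vi))   # True: defer to a fallback method (CNN, ARM)
print(suspicious(12, ve, vi))   # False: confidently inside the ve interval
```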
Slide 7
The pTree-k-Means-Classification-Vanilla (pkmc-v) algorithm is as follows: In the TrainingSet,
1. For each attribute, calculate the mean for each class, and for each attribute sort the classes ascending.
2. Calculate the gaps (differences of consecutive means) and all relative gaps, rg's (gap/mean).
3. Use the max of the sum of a mean's 2 rg's to identify the best attribute and the best class_k/NOT_class_k classification (which should be the currently most unambiguous classification).
4. Repeat 3 on NOT_class_k until NOT_class_k is empty.
5. Repeat 1-4 until the means stop changing (much).

The pTree-k-Means-Classification-Divisive-means-only (pkmc-dm) algorithm is as follows (a sketch follows this list): In the TrainingSet,
1. For each attribute, calculate the mean for each class; sort the classes ascending by mean.
2. Calculate the mean_gaps (differences of consecutive means) and each relative_mean_gap, rmg (mean_gap/mean).
3. Choose the class (and attribute) with the max rmg. Use Fei Pan's formula with c = mean + gap/2 (the midpoint of the gap) to separate that class from NOTclass.
4. Repeat 3 on NOTclass until NOTclass is empty.
5. Repeat 1-4 until the means stop changing (much).

The pTree-k-Means-Classification-Divisive-means-stds (pkmc-dms) algorithm is as follows: In the TrainingSet,
1. For each attribute, calculate the mean for each class; sort the classes ascending by mean.
2. Calculate the mean_gaps (differences of consecutive means) and each relative_mean_gap, rmg (mean_gap/mean), in stds.
3. In each gap, find the number x of stds such that cutpoint = mean1 + x*std1 = mean2 - x*std2 (a simple linear equation: x = (mean2 - mean1)/(std1 + std2)). Choose the mean with the max x and divide its cluster into two using the Fei Pan cutoff = cutpoint.
4. Repeat 3 until all clusters are singleton classes.
5. Repeat 1-4 until the means stop changing (much).

The pTree-k-Means-Classification-Relevancy-Intervals (pkmc-ri) algorithm is as follows:
1. Calculate all means and stds for all feature columns and classes in the training set.
2. In each column, calculate the max radii such that there is zero gap between each pair of consecutive relevancy intervals.
3. Add the adjacent interval radii on either side of a given mean to get the set of consecutive-interval radii (cir). Sort the cir's descending; use the max cir to identify the first class_k/NOT_class_k classification (which should be the most unambiguous classification).
4. Repeat 2 and 3 on NOT_class_k until NOT_class_k is empty.
5. Repeat 1-4 until the means stop changing (much).
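A compact sketch of pkmc-dm's divisive pass (steps 1-4). Assumptions: plain tuples and statistics.mean stand in for pTree evaluation, the peeled class is taken to be extremal on its attribute (as noted for se on the pkmc-v slide), and a midpoint threshold replaces the actual Fei Pan mask computation:

```python
from statistics import mean

def pkmc_dm_split(train):
    """One sub-epoch of pkmc-dm: repeatedly peel off the (attribute, class)
    pair with the max relative mean gap, rmg = gap/mean.
    train: {class_name: [feature tuples]} -> [(class, attr, side, cutoff)]."""
    plan, remaining = [], dict(train)
    n_attrs = len(next(iter(train.values()))[0])
    while len(remaining) > 1:
        best = None                      # (rmg, attr, class, cutoff, side)
        for a in range(n_attrs):
            # Step 1: class means on attribute a, sorted ascending.
            mns = sorted((mean(t[a] for t in rows), c)
                         for c, rows in remaining.items())
            # Step 2: gaps of consecutive means; rmg relative to each side.
            for (m1, c1), (m2, c2) in zip(mns, mns[1:]):
                gap = m2 - m1
                for m, c, side in ((m1, c1, "<="), (m2, c2, ">")):
                    if best is None or gap / m > best[0]:
                        best = (gap / m, a, c, m1 + gap / 2, side)
        _, a, c, cut, side = best
        plan.append((c, a, side, cut))   # Step 3: cutoff = midpoint of gap
        remaining.pop(c)                 # Step 4: repeat on NOTclass
    return plan

# Single-attribute (pWD) demo with hypothetical values (x10):
train = {"se": [(2,), (2,), (3,), (2,), (1,), (4,), (3,), (2,), (2,), (1,)],
         "ve": [(14,), (15,), (15,), (13,), (15,), (13,), (16,), (10,), (13,), (14,)],
         "vi": [(25,), (19,), (21,), (18,), (22,), (21,), (17,), (18,), (18,), (25,)]}
print(pkmc_dm_split(train))  # peels se near pWD=8, then ve near pWD=17.1
```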
Slide 8
pkmc-v, page 1
[Figure: the 120 TrainingSet tuples and the per-class MEANS table (sepalLN, sepalWD, petalLN, petalWD for se, ve, vi).]

Ordered class means per attribute:
  sLN: se 44.1, ve 61,   vi 65.7
  sWD: ve 28.7, vi 29.4, se 33.1
  pLN: se 14.5, ve 43.7, vi 57.7
  pWD: se 2.2,  ve 13.8, vi 20.4

Relative gaps (gap/mean) for pWD: se_ve-se 5.27, se_ve-ve 0.84, ve_vi-ve 0.48, ve_vi-vi 0.32.

pWD_se_ve-se has the max RelativeGap, and there is no mean on the other side of se, so we start with se/~se classification using threshold = ( … )/2 = 4.84.
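The pWD relative-gap numbers can be reproduced from the means above (a throwaway check, not the pTree computation):

```python
mns = [("se", 2.2), ("ve", 13.8), ("vi", 20.4)]    # pWD means, ascending

for (c1, m1), (c2, m2) in zip(mns, mns[1:]):
    gap = m2 - m1
    print(f"{c1}_{c2}-{c1}: rg = {gap / m1:.2f}")  # 5.27, then 0.48
    print(f"{c1}_{c2}-{c2}: rg = {gap / m2:.2f}")  # 0.84, then 0.32
```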
Slide 9
pkmc-v, page 2. Updated ve means: sepalLN 61, sepalWD 28.7, petalLN 42.7, petalWD 13.8.
[Figure: the 80 NOTse tuples (40 ve, 40 vi) and the ve/vi MEANS table.]

Ordered class means per attribute (within NOTse):
  sLN: ve 61,   vi 65.7
  sWD: ve 28.7, vi 29.4
  pLN: ve 43.7, vi 57.7
  pWD: ve 13.8, vi 20.4

pWD_ve_vi has the max RelativeGap (.47), so for ve/~ve (i.e., ve/vi) classification use threshold = ( … )/2 = 16: 74/80 correct here; overall, 114/120 = 95%.
Slide 10
pkmc-dm, page 1.

1. For each attribute, calculate the mean for each class; sort the classes ascending by mean.
2. Calculate the mean_gaps (differences of consecutive means); for each mean, the relative_mean_gap, rmg (mean_gap/mean).
3. Choose the max rmg; divide that cluster using cutoff = the midpoint of the gap.
4. Repeat 3 on NOT_class_k until every cluster is a single class.

[Figure: the 120 TrainingSet tuples and the per-class MEANS table.]

Sorted class means per attribute:
  sLN: se 44.1, ve 61,   vi 65.7
  sWD: ve 28.7, vi 29.4, se 33.1
  pLN: se 14.5, ve 43.7, vi 57.7
  pWD: se 2.2,  ve 13.8, vi 20.4

Relative mean gaps (gap/mn) for pWD: se_ve-se 5.27, se_ve-ve 0.84, ve_vi-ve 0.48, ve_vi-vi 0.32.

pWD_se_ve-se, th = (2.2 + 13.8)/2 = 8, divides {se, ve, vi} into {se} and {ve, vi}.
pWD_ve_vi-ve (rmg 0.48), th = (13.8 + 20.4)/2 = 17.1, divides {ve, vi} into {ve} and {vi}. Done.
Slide 11
pkmc-dm, page 2.

1. For each attribute, calculate the mean for each class; sort the classes ascending by mean.
2. Calculate the mean_gaps (differences of consecutive means); for each mean, the relative_mean_gap, rmg (mean_gap/mean).
3. Choose the max rmg; divide that cluster using cutoff = the midpoint of the gap.
4. Repeat 3 on NOT_class_k until every cluster is a single class.
5. Repeat 1-4 until the means stop changing (much).

In the DoD problem we have 7 classes: {RedCars, WhiteCars, BlueCars, Grass, Pavement, Shadows, Trees}. Thus on the first sub-epoch (the first pass through steps 1-4) we separate the 7 classes into two clusters (e.g., of 3 and 4 respectively). Then we apply step 5 to both clusters, just as is done in the DIANA hierarchical clustering method.