P_{A>c} = P_m o_m ... P_{k+1} o_{k+1} P_k

1 P_{A>c} = P_m o_m ... P_{k+1} o_{k+1} P_k
Dr. Fei Pan's theorem (pg 39): Let A be the jth column of a data table and P_m, ..., P_0 the basic pTrees of A. For a constant c, written in binary as c = (b_m ... b_0)_2,

P_{A>c} = P_m o_m P_{m-1} o_{m-1} ... P_{k+1} o_{k+1} P_k, where
1. o_i is AND if b_i = 1; o_i is OR if b_i = 0.
2. k is the rightmost bit position with bit-value "0".
3. the operators are right binding.
The complement gives P_{A<=c} = (P_{A>c})' = (P_m o_m (P_{m-1} o_{m-1} (... (P_{k+1} o_{k+1} P_k) ...)))'.

[Table: the training data over sepLN, sepWD, pedLN, pedWD for the se, ve, and vi tuples; numeric values not recoverable from the transcript.]

Last week we developed an algorithm in which we:
1. calculated the class mean of each column, ordered those means, then used roof(midpoint of consecutive means) as cutoff thresholds between consecutive classes;
2. constructed one mask pTree for each inequality (using EIN theory);
3. ANDed the two pTrees on either side of each column to get a mask, per column, of the unclassified tuples predicted to be in that class.

This week we do it differently:
1. calculate class means and stds in each column;
2. for each TrainingSet column, order the means and define left_relevancy_pt (lrp) = mean - 2*std, right_relevancy_pt (rrp) = mean + 2*std, relevancy interval = [lrp, rrp);
3. construct two UnclassifiedSet mask pTrees for each feature column and class (one if the class has the smallest or the largest mean in that column), that is to say, one mask pTree for each inequality (using EIN);
4. classify based on the sum of larc/rarc gap-weighted votes.

Take out 10 tuples from each class as the TrainSet and calculate the se, ve, and vi means and stds for sepLN, sepWD, pedLN, pedWD. Take the 5 remaining from each class, remove the class label, and predict.

[Table: per-class means and stds of sepLN, sepWD, pedLN, pedWD; values not recoverable from the transcript.]
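The theorem can be sketched in code. Below is a minimal, hypothetical illustration (not the actual pTree implementation) that stands in for each basic pTree P_i with a plain Python integer bitmask over the rows, and folds the right-binding AND/OR sequence to compute P_{A>c}:

```python
def bit_slices(values, m):
    """Build stand-ins for the basic pTrees P_0..P_m: bit j of
    slices[i] is bit i of values[j]. (Illustrative substitute for
    real compressed pTrees.)"""
    return [sum(((v >> i) & 1) << j for j, v in enumerate(values))
            for i in range(m + 1)]

def ptree_gt(slices, c):
    """P_{A>c} per the theorem: P_m o_m (... (P_{k+1} o_{k+1} P_k)),
    where o_i is AND if bit i of c is 1, OR if it is 0, and k is the
    rightmost 0-bit of c."""
    m = len(slices) - 1
    k = next((i for i in range(m + 1) if not (c >> i) & 1), None)
    if k is None:                 # c is all ones, so A > c is empty
        return 0
    mask = slices[k]              # innermost term P_k
    for i in range(k + 1, m + 1): # right binding: combine outward to P_m
        mask = (slices[i] & mask) if (c >> i) & 1 else (slices[i] | mask)
    return mask
```

Complementing the returned bitmask gives P_{A<=c}, matching the (P_{A>c})' identity above.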

2 Using relevancy-intervals-puck-muck with 95% coverage (2*std), we get se/NOTse relevance in pedLN and pedWD only (but very relevant!). We then use approximate midpoint cutoffs, 25 and 7.

[Table: se, ve, and vi means and 2*std relevancy intervals [LNlr, LNrr], [WDlr, WDrr] for the sepal and pedal columns; only fragments survive in the transcript (17.4 for se, 47.2 for ve, 50.4 for vi).]

Using means-puck-muck we could also conclude that pedLN and pedWD are relevant, but it is not clear that sepLN is irrelevant (sepWD looks fairly irrelevant, though).

Next we use rel-int-pk-mk (ripm) on NOTse to differentiate ve and vi. Using 2*std we get no relevant columns (the intervals overlap severely in all three columns), so we reduce 2 to 1.5 (which gives ~82% coverage), obtain relevancy intervals, and then use cutoffs 50 and 16.

[Tables: the 1.5*std ve/vi relevancy intervals (fragments 50.6, 16.2, 54.2 survive), the se/NOTse predictions, and the ve/vi predictions within NOTse, over spLN, spWD, pdLN, pdWD; values not recoverable from the transcript.]

The method is 100% accurate for this small example. Next we'll evaluate it on the full 120 remaining tuples using pdLNcutoff=25, pdWDcutoff=7 for se/NOTse and pdLNcutoff=50, pdWDcutoff=16 for ve/vi in NOTse.
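The relevancy-interval test described on this slide can be sketched as follows. This is a hypothetical reading of the slide, not the author's code: the function names and the non-overlap criterion for "relevance" are assumptions.

```python
def relevancy_interval(mean, std, width=2.0):
    """[lrp, rrp) with lrp = mean - width*std, rrp = mean + width*std.
    width=2.0 gives ~95% coverage; the slide later drops to 1.5 (~82%)."""
    return (mean - width * std, mean + width * std)

def column_is_relevant(stats, width=2.0):
    """A column separates its classes when the classes' relevancy
    intervals do not overlap (assumed reading of the slide's test).
    stats maps class name -> (mean, std) for one column."""
    spans = sorted(relevancy_interval(m, s, width) for m, s in stats.values())
    return all(hi <= lo2 for (_, hi), (lo2, _) in zip(spans, spans[1:]))
```

Varying width reproduces the slide's behaviour: intervals that overlap at 2*std (column irrelevant for that class pair) can separate at a narrower width.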

3 Evaluation on the 120 remaining tuples, using pdLNcutoff=25, pdWDcutoff=7 for se/NOTse and pdLNcutoff=50, pdWDcutoff=16 for ve/vi in NOTse.

[Tables: the 40 setosa, 40 versicolor, and 40 virginica test tuples over spLN, spWD, pdLN, pdWD; numeric values not recoverable from the transcript. Annotations mark tuples where the two features disagree: "Since pdLN dominates, ve would be the best guess" and "Since pdLN dominates, vi would be the best guess".]

All 40 se are correct. Thus 11 out of 120 are incorrectly classified: accuracy is 109/120, or 91%, with one epoch. And with the best guesses, 113/120, or 94%.
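The decision rule used on this slide can be sketched as a two-stage cutoff classifier. The tie-breaking ("pdLN dominates") follows the slide's best-guess annotation; the function shape and the unweighted voting are assumptions (the earlier slide's larc/rarc gap weighting is omitted).

```python
def classify(pdLN, pdWD):
    """Stage 1: se vs NOTse with cutoffs pdLN < 25, pdWD < 7.
    Stage 2, within NOTse: ve vs vi with cutoffs pdLN < 50, pdWD < 16.
    When the two features disagree, pdLN dominates (the slide's
    'best guess' rule). Lengths/widths in the slides' units."""
    # stage 1: se vs NOTse (cutoffs 25 and 7)
    se_ln, se_wd = pdLN < 25, pdWD < 7
    if se_ln and se_wd:
        return 'se'
    if se_ln != se_wd and se_ln:      # disagreement: pdLN dominates
        return 'se'
    # stage 2, within NOTse: ve vs vi (cutoffs 50 and 16)
    ve_ln, ve_wd = pdLN < 50, pdWD < 16
    if ve_ln and ve_wd:
        return 've'
    if ve_ln != ve_wd and ve_ln:      # disagreement: pdLN dominates
        return 've'
    return 'vi'
```

On this reading, an NOTse tuple with pdLN below 50 but pdWD at or above 16 is still called ve, because pdLN wins the disagreement.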

