FAUST{pms,std} (FAUST{pms} using # gap std

Presentation on theme: "FAUST{pms,std} (FAUST{pms} using # gap std"— Presentation transcript:

1 FAUST{pms,std} (FAUST{pms} using # gap std
Notation: RC = Remaining_Classes (initially all classes), with pTree PRC (initially pure1). For 0 ≤ k ≤ 1, rank_k(S) is the smallest v∈S s.t. a k fraction of S is ≤ v (except for k=0, then we use <v). Note: rank_k for k=1/2 is the median, k=1 is the maximum and k=0 is the minimum.

FAUST{pdq,mrk} (FAUST{pdq} with max rank_k)
0. For each attribute A, TA(class, md, k, cp) is its attribute table ordered on md asc, where k is the max k value s.t. set_rank_k of the class and set_rank_(1-k)' of the next class do not intersect.
The same algorithm can clearly be used as a pms, that is, FAUST{pms,mrk}.

FAUST{pdq,gap} (FAUST{p} divisive, quiet (no noise), using gaps)
0. For each attribute A, TA(class, rv, gap) is ordered on rv asc (rv is the class representative value, gap = distance to the next rv).
WHILE RC not empty, DO
1. Find the TA record with maximum gap.
2. Use PA>c (c=rv+gap/2) to divide RC at c into LT, GT (pTrees PLT and PGT).
3. If LT or GT is a singleton, remove that class from RC and from all TAs.
END_DO

FAUST{pdq,std} (FAUST{pdq} using # of gap standard devs)
0. For each attribute A, TA(class, mn, std, n, cp) is its attribute table ordered on n asc, where cp = value in the gap allowing the max # of stds, n. n satisfies mean+n*std = meanG-n*stdG, so n = (mnG-mn)/(std+stdG).
WHILE RC not empty, DO
1. Find the TA record with maximum n.
2. Use PA>cp to divide RC at cp=cutpoint into LT and GT (pTree masks PLT and PGT).
3. If LT or GT is a singleton, remove that class from RC and from all TAs.
END_DO

FAUST{pms,gap} (FAUST{p} with m attribute cut_pts, sequential class separation (1 class at a time), m=1)
0. For each attribute A, TA(class, rv, gap, avgap), where avgap is the avg of gap and previous_gap (if 1st, avgap = gap).
If there are x classes, DO x-1 times
1. Find the TA record with maximum avgap.
2. cL=rv-prev_gap/2, cG=rv+gap/2; masks Pclass = PA>cL & PA≤cG & PRC, then PRC = P'class & PRC. (If 1st in TA (no prev_gap), Pclass = PA≤cG & PRC. If last, Pclass = PA>cL & PRC.)
3. Remove that class from RC and from all TAs.
END_DO

FAUST{pms,std} (FAUST{pms} using # of gap stds)
0. For each attribute A, TA(class, mn, std, n, avgn, cp) is ordered on avgn asc. cp = cut_point (the value in the gap which allows the max # of stds, n). n satisfies mn+n*std = mnnext-n*stdnext, so n = (mnnext-mn)/(std+stdnext).
DO x-1 times
1. Find the TA record with maximum avgn.
2. cL=rv-prev_gap/2, cG=rv+gap/2 and pTree masks Pclass = PA>cL & PA≤cG & PRC, then PRC = P'class & PRC. (If the class is 1st in TA (has no prev_gap), then Pclass = PA≤cG & PRC. If last, Pclass = PA>cL & PRC.)
3. Remove that class from RC and from all TAs.
END_DO

Notes on mrk: In FAUST_pdq the mrk alg should be at least as good as gap and std, but should also be better in the following situation: mrk should beat gap (and std?) where taking the midpt of a neg gap will be wrong because of the difference in distrs (e.g., one class normal, the other power). In FAUST_pdq, if there is a gap, mrk will always find it. If there isn't and the distributions are different, it should perform much better than the other two.
[Figure labels: rank_.9; gap or std]
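The cut-point arithmetic for the gap and std variants can be sketched in a few lines. This is a minimal illustration on horizontal data, not the pTree implementation from the slides: the function names, the toy classes, the use of the class mean as rv, and the population standard deviation are all assumptions made for the example. It also assumes adjacent classes do not both have zero std.

```python
from statistics import mean, pstdev

def gap_table(class_values):
    """TA(class, rv, gap) ordered on rv asc. Assumption: rv is the class mean."""
    reps = sorted((mean(v), c) for c, v in class_values.items())
    # The last class has no next rv, so it gets no row.
    return [
        {"class": c, "rv": rv, "gap": next_rv - rv}
        for (rv, c), (next_rv, _) in zip(reps, reps[1:])
    ]

def std_table(class_values):
    """TA(class, mn, std, n, cp): n solves mn + n*std = mnG - n*stdG."""
    stats = sorted((mean(v), pstdev(v), c) for c, v in class_values.items())
    rows = []
    for (mn, sd, c), (mn_g, sd_g, _) in zip(stats, stats[1:]):
        n = (mn_g - mn) / (sd + sd_g)          # number of stds that fit in the gap
        rows.append({"class": c, "mn": mn, "std": sd, "n": n, "cp": mn + n * sd})
    return rows

def max_gap_cut(rows):
    """Record with maximum gap; cut at c = rv + gap/2."""
    best = max(rows, key=lambda r: r["gap"])
    return best["class"], best["rv"] + best["gap"] / 2

def max_std_cut(rows):
    """Record with maximum n; cut at its cp."""
    best = max(rows, key=lambda r: r["n"])
    return best["class"], best["cp"]

# Hypothetical single-attribute data for three classes.
toy = {"a": [0, 2, 4], "b": [9, 10, 11], "c": [12, 13, 14]}
print(max_gap_cut(gap_table(toy)))
print(max_std_cut(std_table(toy)))
```

With unequal spreads the two cut points differ: the gap cut sits midway between the representative values, while the std cut moves toward the class with the smaller std.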

2 1. For every attr and every class, sort the values asc.
[Animation: sorted values of each class for each attribute, with rank_0 ... rank_1 markers stepping toward the cut point]

FAUST{pdq,mrk} algorithm, demonstrated with VPHD (Vertical Processing, Horizontal Data) first:
1. For every attr and every class, sort the values asc.
2. Find and order the medians asc in TA tables.
3. Find max k s.t. rank_k_set ∩ rank_(1-k)_set = ∅.
4. Proceed as in all FAUST algorithms - cut accordingly (pdq or pms or ???).

With VPHD, sort each class in each attr, find medians (needed?), find rank_k_sets (combine this with sorting?) ... so O(n). With HPVD, we can avoid the sorting, find rank_k_sets (median is rank_.5), and fill TAs entirely with a pTree program, O(0).

HPVD_mrk could be made optimal since we could record exactly which k and cp gives min error (as we work toward empty rank_k_set intersection) and we could know the error set. We could use CkNN or ? on each errant sample. To see this, go through the first k/cp animation. In that looping procedure it's clear we could determine se<55 with 3 errors to be the best cp (se<54, 6 errors; se<52, 5; se<50, 5; se<49, 6). Note: mrk above is lazy. It takes cp to be the average of the rank values - in this case cp=53, which has 6 errors.

[TA tables from the slide; columns cl, md, k, cp; cell positions lost in extraction] TsLN cl md k cp se ve vi 66; TsWD cl md k cp ve vi se 33; TpLN cl md k cp se ve vi 58; TpWD cl md k cp se ve vi 20; remaining values as extracted: .7 53 .7 29 1.0 25 1.0 7 .7 64 .8 30 .9 49 .9 16

One can see from this animation that MaxGap is probably a pretty good method most of the time (provided there is at least one good gap each step) and the MaxGapStd is even better (same proviso). This method is intended to be optimal and to deal with, e.g., non-normal distributions.
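The max-k search and the "lazy" versus error-counting cut points can be sketched as follows. This is a hedged VPHD-style illustration (plain sorting, no pTrees). The names rank_k, max_rank_cut, errors and best_cut, the toy data, the restriction of k to tenths, and the reading of "empty intersection" as rank_k(lower class) < rank_(1-k)(next class) are assumptions made for the example, not taken from the slides.

```python
def rank_k(values, tenths):
    """rank_k for k = tenths/10: smallest v such that a k fraction of values is <= v.
    k = 0 returns the minimum."""
    s = sorted(values)
    if tenths == 0:
        return s[0]
    idx = -(-tenths * len(s) // 10) - 1        # ceil(k * n) - 1, integer arithmetic
    return s[idx]

def max_rank_cut(lo, hi):
    """Largest k in {1.0, .9, ..., .5} whose rank sets do not overlap (assumed test),
    with the lazy cut point: the average of the two rank values."""
    for t in range(10, 4, -1):
        a, b = rank_k(lo, t), rank_k(hi, 10 - t)
        if a < b:
            return t / 10, (a + b) / 2
    return None                                 # even the medians overlap

def errors(lo, hi, cp):
    """Misclassified samples under the rule: value < cp -> lower class."""
    return sum(v >= cp for v in lo) + sum(v < cp for v in hi)

def best_cut(lo, hi):
    """Try every observed value as cp; return (errors, cp) with the fewest errors."""
    return min((errors(lo, hi, cp), cp) for cp in sorted(set(lo) | set(hi)))

# Hypothetical overlapping classes on one attribute.
LO = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12]
HI = [8, 10, 11, 13, 14, 15, 16, 17, 18, 19]

k, lazy_cp = max_rank_cut(LO, HI)
print(k, lazy_cp, errors(LO, HI, lazy_cp))
print(best_cut(LO, HI))
```

On this toy data the lazy cut point makes more errors than the best one found by counting, which mirrors the slide's remark that recording the error at each step could make the method optimal.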

3 maximum
[Animation: bit-slice pTrees Ppw4, Ppw3, Ppw2, Ppw1, Ppw0 are ANDed in turn into Pc, starting from Pc = pure1]
c=0; max=0; Pc=pure1;
For i=4..0 {
  c=rc(Pc&Patt,i)
  if (c>0) { Pc=Pc&Patt,i; max=max+2^i }
}
return max;
Root counts shown in the animation: rc=10, rc=1, rc=0, rc=1. Result: max = 2^4 + 2^3 + 2^0 = 25.
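The maximum loop can be tried out with ordinary Python integers standing in for pTrees. This is only a sketch under assumptions: each "pTree" here is an uncompressed bit mask with one bit per row, and bit_slices, rc, ptree_max and the sample values are names and data invented for the illustration.

```python
def bit_slices(values, nbits):
    """slices[i] has bit r set iff bit i of values[r] is 1 (stand-in for Patt,i)."""
    slices = [0] * nbits
    for r, v in enumerate(values):
        for i in range(nbits):
            if (v >> i) & 1:
                slices[i] |= 1 << r
    return slices

def rc(mask):
    """Root count: number of 1 bits in the mask."""
    return bin(mask).count("1")

def ptree_max(slices, nrows):
    """Scan from the high bit down; keep a bit whenever some remaining row has it."""
    pc = (1 << nrows) - 1                 # pure1: every row is still a candidate
    result = 0
    for i in range(len(slices) - 1, -1, -1):
        candidates = pc & slices[i]
        if rc(candidates) > 0:
            pc = candidates               # Pc = Pc & Patt,i
            result += 1 << i              # max = max + 2^i
    return result

values = [2, 14, 25, 18, 1, 20]           # hypothetical 5-bit attribute values
print(ptree_max(bit_slices(values, 5), len(values)))
```

Only root counts and ANDs are used, so no row is ever read horizontally, which is the point of the vertical formulation.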

4 minimum
[Animation: complemented bit-slice pTrees P'pw4, P'pw3, P'pw2, P'pw1, P'pw0 are ANDed in turn into Pc, starting from Pc = pure1]
c=0; min=0; Pc=pure1;
For i=4..0 {
  c=rc(Pc&P'att,i)
  if (c>0) Pc=Pc&P'att,i
  else min=min+2^i
}
return min;
Root counts shown in the animation: rc>0, rc=0. Result: min = 2^0 = 1.
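The minimum loop mirrors the maximum one but works with complemented slices. As a hedged sketch, Python integers again stand in for uncompressed pTrees, and bit_slices, rc, ptree_min and the sample values are assumptions for illustration only.

```python
def bit_slices(values, nbits):
    """slices[i] has bit r set iff bit i of values[r] is 1 (stand-in for Patt,i)."""
    slices = [0] * nbits
    for r, v in enumerate(values):
        for i in range(nbits):
            if (v >> i) & 1:
                slices[i] |= 1 << r
    return slices

def rc(mask):
    """Root count: number of 1 bits in the mask."""
    return bin(mask).count("1")

def ptree_min(slices, nrows):
    """Scan from the high bit down; a bit is forced to 1 only if no remaining row has a 0 there."""
    pure1 = (1 << nrows) - 1
    pc = pure1
    result = 0
    for i in range(len(slices) - 1, -1, -1):
        candidates = pc & ~slices[i] & pure1   # Pc & P'att,i, limited to real rows
        if rc(candidates) > 0:
            pc = candidates                    # some row has 0 in bit i: keep those rows
        else:
            result += 1 << i                   # min = min + 2^i
    return result

values = [2, 14, 25, 18, 1, 20]                # hypothetical 5-bit attribute values
print(ptree_min(bit_slices(values, 5), len(values)))
```

The extra `& pure1` matters in Python because `~` on an int produces a negative number; masking keeps the complement within the row range.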

5 rankK (Kth largest)
[Animation: bit-slice pTrees Ppw4 ... Ppw0 and their complements P'pw4 ... P'pw0 are ANDed into Pc, starting from Pc = pure1, shown for K=5]
c=0; rankK=0; pos=5; Pc=pure1;
For i=4..0 {
  c=rc(Pc&Patt,i);
  if (c≥pos) { rankK = rankK + 2^i; Pc=Pc&Patt,i; }
  else { pos = pos - c; Pc=Pc&P'att,i; }
}
return rankK;
Root counts shown in the animation: rc=10, rc=1, rc=1, rc=3, rc=4. Result: rankK = 0 + 2^4 + 2^2 = 20.
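The Kth-largest loop can be checked against a plain sort. This is a sketch under assumptions: Python integers stand in for uncompressed pTrees, 1 ≤ K ≤ number of rows is assumed, and bit_slices, rc, ptree_rank_k and the sample values are invented for the illustration.

```python
def bit_slices(values, nbits):
    """slices[i] has bit r set iff bit i of values[r] is 1 (stand-in for Patt,i)."""
    slices = [0] * nbits
    for r, v in enumerate(values):
        for i in range(nbits):
            if (v >> i) & 1:
                slices[i] |= 1 << r
    return slices

def rc(mask):
    """Root count: number of 1 bits in the mask."""
    return bin(mask).count("1")

def ptree_rank_k(slices, nrows, k):
    """Kth largest value, using only root counts and ANDs (assumes 1 <= k <= nrows)."""
    pure1 = (1 << nrows) - 1
    pc, pos, result = pure1, k, 0
    for i in range(len(slices) - 1, -1, -1):
        c = rc(pc & slices[i])
        if c >= pos:
            result += 1 << i                  # enough rows have bit i: answer has it too
            pc &= slices[i]
        else:
            pos -= c                          # skip the c larger rows
            pc &= ~slices[i] & pure1          # continue among rows with bit i = 0
    return result

values = [2, 14, 25, 18, 1, 20, 20, 9]        # hypothetical 5-bit attribute values
slices = bit_slices(values, 5)
print([ptree_rank_k(slices, len(values), k) for k in range(1, len(values) + 1)])
print(sorted(values, reverse=True))
```

K=1 reproduces the maximum loop and K equal to the row count reproduces the minimum, so the two earlier procedures are special cases of this one.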

6 APPENDIX masks
[Figure: construction of the pTree Ppw4 (P44) with masks P10 and P01; labels shown: &P10 0/16, &P01 10/14]

7 masks
[Figure: construction of pTrees P44 and P43 with masks P10, P01, P1000, P0100, P0010, P0001; labels shown: P44 &P10 0/16, &P01 10/14; P43 &P1000 0/8, &P0100 6/8, &P0010 4/8, &P0001 1/6]

8 maximum
[Animation: bit-slice pTrees Ppw4, Ppw3, Ppw2, Ppw1, Ppw0 are ANDed in turn into Pc, starting from Pc = pure1]
c=0; max=0; Pc=pure1;
For i=4..0 {
  c=rc(Pc&Patt,i)
  if (c>0) { Pc=Pc&Patt,i; max=max+2^i }
}
return max;
Root counts shown in the animation: rc=10, rc=1, rc=0, rc=1. Result: max = 2^4 + 2^3 + 2^0 = 25.

