SampleFK ChiMerge Discretization Statistical approach to Data Discretization Applies the Chi Square method to determine the probability of similarity of data between two intervals.
SampleFK {0,2} {2,5} {5,7.5} {7.5,8.5} {8.5,10} {10,17} {17,30} {30,38} {38,42} {42,45.5} {45.5,52} {52,60} Intervals ChiMerge Discretization Example Sort and order the attributes that you want to group (in this example attribute F). Start with having every unique value in the attribute be in its own interval.
SampleFK SampleK=1K= total202 SampleK=1K= total112 ChiMerge Discretization Example Begin calculating the Chi Square test on every interval
SampleK=1K= total202 SampleK=1K= total112 E 11 = (1/2)*1 =.05 E 12 = (1/2)*1 =.05 E 21 = (1/2)*1 =.05 E 22 = (1/2)*1 =.05 E 11 = (1/2)*2 = 1 E 12 = (0/2)*2 = 0 E 21 = (1/2)*2 = 1 E 22 = (0/2)*2 = 0 X 2 = (0-.5) 2 /.5 + (0-.5) 2 /.5 + (0-.5) 2 /.5 + (0-.5) 2 /.5 = 2 X 2 = (1-1) 2 /1+(0-0) 2 /0+ (1-1) 2 /1+(0-0) 2 /0 = 0 Threshold.1 with df=1 from Chi square distribution chart merge if X 2 < ChiMerge Discretization Example
SampleFK {0,2} {2,5} {5,7.5} {7.5,8.5} {8.5,10} {10,17} {17,30} {30,38} {38,42} {42,45.5} {45.5,52} {52,60} Intervals Chi ChiMerge Discretization Example Calculate all the Chi Square value for all intervals Merge the intervals with the smallest Chi values
SampleFK {0,2} {2,5} {5,10} {10,30} {30,38} {38,42} {42,60} Intervals Chi 2 ChiMerge Discretization Example Repeat
SampleFK {0,5} {5,10} {10,30} {30,42} {42,60} Intervals Chi 2 ChiMerge Discretization Example Again
SampleFK {0,5} {5,10} {10,30} {42,60} Intervals Chi 2 ChiMerge Discretization Example Until
SampleFK {0,10} {10,30} {42,60} Intervals Chi 2 ChiMerge Discretization Example There are no more intervals that can satisfy the Chi Square test.