Entropy-based & ChiMerge Data Discretization
Feb. 12, 2008
Team #4: Seunghyun Kim, Craig Dunham, Suryo Muljono, Albert Lee
Entropy-based discretization
Table 6.1 Class-labeled training tuples from the AllElectronics customer database (page 299).

RID  Age          Income  Student  Credit_rating  Class: buy_computer
1    Youth        High    No       Fair           No
2    Youth        High    No       Excellent      No
3    Middle_aged  High    No       Fair           Yes
4    Senior       Medium  No       Fair           Yes
5    Senior       Low     Yes      Fair           Yes
6    Senior       Low     Yes      Excellent      No
7    Middle_aged  Low     Yes      Excellent      Yes
8    Youth        Medium  No       Fair           No
9    Youth        Low     Yes      Fair           Yes
10   Senior       Medium  Yes      Fair           Yes
11   Youth        Medium  Yes      Excellent      Yes
12   Middle_aged  Medium  No       Excellent      Yes
13   Middle_aged  High    Yes      Fair           Yes
14   Senior       Medium  No       Excellent      No
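Entropy-based (Cont’d) Information gain
Of the 14 tuples, 9 have class Yes and 5 have class No:
$Info(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$ bits
Splitting on age gives Youth (2 Yes, 3 No), Middle_aged (4 Yes, 0 No), and Senior (3 Yes, 2 No):
$Info_{age}(D) = \frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{14}\left(-\frac{4}{4}\log_2\frac{4}{4}\right) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) = 0.694$ bits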
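The two quantities on this slide can be checked with a few lines of Python. This is a minimal sketch, not the team's own code: the helper names `info` and `info_after_split` are made up for illustration, and the class counts are read directly off Table 6.1.

```python
from math import log2

def info(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    # Empty classes contribute nothing (0 * log 0 is taken as 0).
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    """Weighted entropy after a split; each partition is a list of class counts."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * info(p) for p in partitions)

# Class counts are [yes, no] for buy_computer.
info_d = info([9, 5])
# age partitions: youth (2 yes, 3 no), middle_aged (4, 0), senior (3, 2)
info_age = info_after_split([[2, 3], [4, 0], [3, 2]])

print(f"Info(D)     = {info_d:.3f} bits")
print(f"Info_age(D) = {info_age:.3f} bits")
```

The Middle_aged partition is pure (all Yes), so it adds zero to the weighted sum; that is why age reduces the entropy so much.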
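Entropy-based (Cont’d)
$Gain(A) = Info(D) - Info_A(D)$
$Gain(age) = Info(D) - Info_{age}(D) = 0.940 - 0.694 = 0.246$ bits
$Gain(income) = Info(D) - Info_{income}(D) = 0.940 - 0.911 = 0.029$ bits
$Gain(student) = Info(D) - Info_{student}(D) = 0.940 - 0.788 = 0.152$ bits
$Gain(credit) = Info(D) - Info_{credit}(D) = 0.940 - 0.892 = 0.048$ bits
Age has the highest information gain, so it is chosen as the first splitting attribute.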
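As a sketch of how the four gains could be computed from the raw tuples, the snippet below re-enters Table 6.1 as lowercase tuples and applies Gain(A) = Info(D) - Info_A(D) to each attribute. The names `ROWS`, `ATTRS`, `info`, and `gain` are illustrative assumptions, not part of the original slides.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["age", "income", "student", "credit_rating"]
# Table 6.1: (age, income, student, credit_rating, buy_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def info(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    """Information gain from splitting rows on the attribute in column col."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[col]].append(r[-1])  # collect class labels per attribute value
    info_a = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info([r[-1] for r in rows]) - info_a

gains = {a: gain(ROWS, i) for i, a in enumerate(ATTRS)}
for a, g in gains.items():
    print(f"Gain({a}) = {g:.3f} bits")
```

Because the code subtracts unrounded entropies, Gain(age) prints as 0.247, whereas subtracting the rounded terms 0.940 and 0.694 gives 0.246. The ranking of the attributes is the same either way.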
Entropy-based (Cont’d) AllElectronics customer database
Age?
  Youth
  Middle_aged
  Senior
Entropy-based (Cont’d) AllElectronics customer database
Age?
  Youth: Student?
    Student: yes
    Non-student: no
  Middle_aged: yes
  Senior: Credit?
    Excellent: no
    Fair: yes
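The tree on this slide can be reproduced by repeatedly splitting on the attribute with the highest information gain. The following is a rough ID3-style sketch under that assumption; the slides do not show code, and the names `ROWS`, `ATTRS`, `split`, `gain`, and `build_tree` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["age", "income", "student", "credit_rating"]
# Table 6.1: (age, income, student, credit_rating, buy_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def info(rows):
    """Entropy in bits of the class label (last column) over rows."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def split(rows, col):
    """Group rows by their value in column col."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[col]].append(r)
    return groups

def gain(rows, col):
    parts = split(rows, col).values()
    return info(rows) - sum(len(p) / len(rows) * info(p) for p in parts)

def build_tree(rows, cols):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    classes = Counter(r[-1] for r in rows)
    if len(classes) == 1 or not cols:
        return classes.most_common(1)[0][0]  # pure node, or majority vote
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    return {ATTRS[best]: {v: build_tree(p, rest)
                          for v, p in split(rows, best).items()}}

tree = build_tree(ROWS, [0, 1, 2, 3])
print(tree)
```

On this data the sketch splits on age at the root, then on student under the youth branch and on credit_rating under the senior branch, while middle_aged is already a pure "yes" leaf.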