Association Rules
Hawaii International Conference on System Sciences (HICSS-40), January 2007
David L. Olson, Yanhong Li
Fuzzy Association Rules
Association rule mining identifies significant correlations in large databases, expressed as rules of the form IF X THEN Y
– Initial data mining analysis
– Not predictive
SUPPORT: the proportion of records in which X and Y appear together
CONFIDENCE: the conditional probability of Y given X, i.e., the share of records containing X that also contain Y
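To make the two measures concrete, here is a minimal Python sketch for ordinary (crisp) records, where each record is a set of items. The item names and the four records are hypothetical, chosen only to show the arithmetic; confidence is computed as sup(X ∪ Y) / sup(X), the usual estimate of P(Y | X).

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]

def support(transactions: Sequence[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions: Sequence[Itemset], x: Itemset, y: Itemset) -> float:
    """sup(X ∪ Y) / sup(X): the usual estimate of P(Y | X)."""
    sup_x = support(transactions, x)
    return support(transactions, x | y) / sup_x if sup_x > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical loan records, one set of items per record.
    data = [frozenset(t) for t in (
        {"income=middle", "credit=good", "on_time"},
        {"income=middle", "credit=bad", "default"},
        {"income=high", "credit=good", "on_time"},
        {"income=middle", "credit=good", "on_time"},
    )]
    x, y = frozenset({"income=middle"}), frozenset({"on_time"})
    print(support(data, x | y))              # 0.5
    print(round(confidence(data, x, y), 3))  # 0.667
```

Three of the four records have a middle income and two of those were on time, so the rule IF income=middle THEN on_time has support 0.5 and confidence 2/3.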
Association Rule Algorithms
APriori (Agrawal et al., 1993; Agrawal & Srikant, 1994)
– Finds correlations among transactions with binary (present/absent) values
Weighted association rules (Cai et al., 1998; Lu et al.)
Cardinal (quantitative) data (Srikant & Agrawal, 1996)
– Partitions each attribute's domain and combines adjacent partitions, mapping the result to binary items
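The level-wise idea behind APriori can be sketched in a few lines of Python: frequent k-itemsets are joined into (k+1)-item candidates, any candidate with an infrequent subset is pruned, and the search stops when a level yields nothing. This is a simplified teaching version (it rescans the data for every candidate), not the published implementation; the toy transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]

def apriori(transactions: List[Itemset], minsup: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least minsup, with its support."""
    if not transactions:
        return {}
    n = len(transactions)

    def sup(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    frequent: Dict[Itemset, float] = {}
    # Level 1: frequent single items.
    level: List[Itemset] = []
    for item in sorted({i for t in transactions for i in t}):
        s = sup(frozenset([item]))
        if s >= minsup:
            frequent[frozenset([item])] = s
            level.append(frozenset([item]))

    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset (APriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = []
        for c in candidates:
            s = sup(c)
            if s >= minsup:
                frequent[c] = s
                level.append(c)
        k += 1
    return frequent

if __name__ == "__main__":
    data = [frozenset(t) for t in ({"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"})]
    result = apriori(data, 0.5)
    for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), s)
```

With minsup 0.5 all three single items and all three pairs qualify, but {A, B, C} appears in only one of four transactions and is dropped, so the loop ends after the third level.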
Fuzzy Analysis
Approaches for dealing with vagueness and uncertainty:
– Fuzzy set theory (Zadeh, 1965)
– Probability theory (Pearl, 1988)
– Rough set theory (Pawlak, 1982)
– Set pair theory (Zhao, 2000)
Fuzzy Association Rules
Most are based on the APriori algorithm and treat all attributes as uniform (equally important)
The number of rules can be increased by lowering minimum support or minimum confidence, but:
– This generates many uninteresting rules
– The software takes much longer to run
Gyenesei (2000)
Studied weighted quantitative association rules in the fuzzy domain, with and without normalization
– Non-normalized: used the product operator to define the combined weight and fuzzy value; if weights are small, support levels become small, which tends to cause data overflow
– Normalized: used the geometric mean of item weights as the combined weight; support is then very small
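The difference between the two combined-weight choices is easy to see numerically. The Python sketch below compares the product of item weights with their geometric mean as an itemset grows; the function names and the 0.5 weights are illustrative assumptions, and the sketch covers only the combined-weight step, not Gyenesei's full support formulas.

```python
import math
from typing import Sequence

def product_weight(weights: Sequence[float]) -> float:
    """Combined weight as the product of item weights (non-normalized case)."""
    return math.prod(weights)

def geometric_mean_weight(weights: Sequence[float]) -> float:
    """Combined weight as the geometric mean of item weights (normalized case)."""
    return math.prod(weights) ** (1.0 / len(weights))

if __name__ == "__main__":
    for k in range(1, 6):
        w = [0.5] * k  # k items, each with a hypothetical weight of 0.5
        print(k, product_weight(w), round(geometric_mean_weight(w), 6))
```

Each added item halves the product (0.5, 0.25, 0.125, ...), while the geometric mean stays at 0.5, which is why the product form drives values toward zero for longer itemsets.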
Algorithm
Get the membership functions, minimum support, and minimum confidence
Assign a weight to each fuzzy membership category of each attribute
Calculate support for each fuzzy region; if support > minimum support, it qualifies
Calculate confidence; if confidence > minimum confidence, it qualifies
If both qualify, generate the rule
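The steps above can be sketched as a level-wise loop in Python. Everything here is an illustrative assumption: items are hypothetical (attribute, category) pairs, the support measure is passed in as a function (the demo uses plain unweighted fuzzy support, the average over cases of the minimum membership, not the paper's weighted measure), candidates never combine two categories of the same attribute, and a value equal to a threshold counts as passing.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]          # hypothetical (attribute, fuzzy category)
Itemset = FrozenSet[Item]
Case = Dict[Item, float]        # one case's membership in each fuzzy category

def min_fuzzy_support(cases: List[Case], itemset: Itemset) -> float:
    """Average over cases of the smallest membership among the itemset's categories."""
    return sum(min(c.get(i, 0.0) for i in itemset) for c in cases) / len(cases)

def mine_rules(items: List[Item], support_fn: Callable[[Itemset], float],
               minsup: float, minconf: float):
    """Grow itemsets level by level, keep those meeting minsup, then emit rules meeting minconf."""
    supports: Dict[Itemset, float] = {}
    level: List[Itemset] = []
    for it in items:
        s = support_fn(frozenset([it]))
        if s >= minsup:
            supports[frozenset([it])] = s
            level.append(frozenset([it]))
    k = 2
    while level:  # stops once no itemset of the current size qualifies
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            # Keep only k-itemsets drawn from k different attributes.
            if len(c) == k and len({attr for attr, _ in c}) == k:
                candidates.add(c)
        level = []
        for c in candidates:
            s = support_fn(c)
            if s >= minsup:
                supports[c] = s
                level.append(c)
        k += 1
    rules = []
    for itemset, s in supports.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                lhs_sup = supports[lhs] if lhs in supports else support_fn(lhs)
                if lhs_sup > 0 and s / lhs_sup >= minconf:
                    rules.append((lhs, itemset - lhs, s, s / lhs_sup))
    return rules

if __name__ == "__main__":
    IM, CG, OT = ("Income", "Middle"), ("Credit", "Good"), ("Outcome", "OnTime")
    cases = [  # hypothetical fuzzified cases
        {IM: 0.8, CG: 0.9, OT: 1.0},
        {IM: 0.6, CG: 0.7, OT: 1.0},
        {IM: 0.2, CG: 0.1, OT: 0.0},
    ]
    for lhs, rhs, s, conf in mine_rules([IM, CG, OT],
                                        lambda X: min_fuzzy_support(cases, X), 0.4, 0.9):
        print(sorted(lhs), "->", sorted(rhs), round(s, 3), round(conf, 3))
```

Passing the support measure in as a function keeps the loop independent of how weights and memberships are combined, so a weighted measure can be swapped in without touching the candidate generation.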
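Demo Model: Loan Application
[Table: loan application cases with columns Case, Age, Income, Risk, Credit, Result. Only color-coded cell values (Red, Green, Green, Amber, Green, Green, Green, Green, Green, Red) survived extraction; the remaining values are not recoverable.]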
Fuzzified Age
[Figure 2: The membership functions of attribute Age (x-axis: Age; y-axis: membership value; fuzzy sets Young, Middle, Old)]
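Fuzzify Age
[Table: columns Case, Age, Young, Middle, Old, giving each case's membership value in each Age category; values not recoverable from the extraction]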
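Figure 2's breakpoints did not survive extraction, so this Python sketch uses hypothetical trapezoidal membership functions (Young fully below 25, Middle fully between 40 and 50, Old fully above 65) purely to show how a single Age value is fuzzified into the three categories.

```python
import math
from typing import Dict, Tuple

# Hypothetical breakpoints (a, b, c, d): membership rises on [a, b], is 1 on [b, c], falls on [c, d].
AGE_SETS: Dict[str, Tuple[float, float, float, float]] = {
    "Young": (-math.inf, -math.inf, 25, 40),
    "Middle": (25, 40, 50, 65),
    "Old": (50, 65, math.inf, math.inf),
}

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership; infinite a/b or c/d give open left or right shoulders."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge

def fuzzify_age(age: float) -> Dict[str, float]:
    """Membership of one age in each hypothetical Age category."""
    return {name: trapezoid(age, *abcd) for name, abcd in AGE_SETS.items()}

if __name__ == "__main__":
    for age in (22, 30, 45, 55, 70):
        print(age, {k: round(v, 2) for k, v in fuzzify_age(age).items()})
```

An age of 30, for example, is about 0.67 Young and 0.33 Middle under these assumed breakpoints; the actual shapes in Figure 2 may differ.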
Calculate Support for Each Pair of Fuzzy Categories
Membership value
– Identify the weights for each attribute
– Identify the highest fuzzy membership category for each case
– Membership value = minimum of the weights associated with the highest fuzzy membership categories
Support
– Average membership value over all cases
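One literal reading of these steps, sketched in Python: in each case, a pair (or larger set) of fuzzy categories contributes the minimum of their weights only if every category is that case's highest-membership category for its attribute, and contributes 0 otherwise; support is the average over all cases. The data layout, function names, and weights are hypothetical, and the paper's exact formula may differ.

```python
from typing import Dict, Iterable, List, Tuple

Item = Tuple[str, str]                  # hypothetical (attribute, category)
Case = Dict[str, Dict[str, float]]      # attribute -> {category: membership}

def top_category(memberships: Dict[str, float]) -> str:
    """Category with the highest membership (the first one listed wins on ties)."""
    return max(memberships, key=memberships.get)

def weighted_support(cases: List[Case], itemset: Iterable[Item],
                     weights: Dict[Item, float]) -> float:
    """Average over cases of the minimum weight, counted only where every item is the top category."""
    items = list(itemset)
    total = 0.0
    for case in cases:
        if all(top_category(case[attr]) == cat for attr, cat in items):
            total += min(weights[i] for i in items)
    return total / len(cases)

if __name__ == "__main__":
    weights = {("Income", "Middle"): 0.7, ("Credit", "Good"): 0.9}  # hypothetical
    cases = [
        {"Income": {"High": 0.2, "Middle": 0.8, "Low": 0.0}, "Credit": {"Good": 0.9, "Bad": 0.1}},
        {"Income": {"High": 0.6, "Middle": 0.4, "Low": 0.0}, "Credit": {"Good": 0.7, "Bad": 0.3}},
        {"Income": {"High": 0.0, "Middle": 0.9, "Low": 0.1}, "Credit": {"Good": 0.2, "Bad": 0.8}},
    ]
    pair = [("Income", "Middle"), ("Credit", "Good")]
    print(round(weighted_support(cases, pair, weights), 3))  # 0.233
```

Only the first case has both Income = Middle and Credit = Good as its top categories, so the pair earns min(0.7, 0.9) = 0.7 once, averaged over three cases.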
Support by Single Item
[Table columns: Category, Weight, Sup(R_jk); the weight and support values are not recoverable from the extraction]
Age: Young R11, Middle R12, Old R13
Income: High R21, Middle R22, Low R23
Risk: High R31, Middle R32, Low R33
Credit: Good R41, Bad R42
Support
If the support for a pair of categories is above the minimum support, retain the pair
This identifies all pairs of fuzzy categories with a sufficiently strong relationship
For outcomes, R51 (On Time) is strong; R52 (Default) is not
Support by Pair (minsup 0.25)
[Table: pairs of fuzzy categories meeting minsup 0.25, with their support values; only partial item labels (R11, R22, R31, R41) survived extraction]
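Support by Triplet (minsup 0.25)
[Table: qualifying triplets of fuzzy categories with their support values. Surviving fragments: (R22, R41, R…), (R22, R31, R…), (R22, R31, R…), (R31, R41, R…); the third item's index and all support values were lost in extraction]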
Quartets
None qualify, so the algorithm stops
Confidence
Identify the direction of the rule (which categories form the IF part and which the THEN part)
For the training-set cases involving the pair of attributes, what proportion came out as predicted?
Confidence Values: Pairs (minimum confidence 0.9)
[Table: confidence in each direction for qualifying pairs, including R22 R41, R22 R51, R31 R41, R31 R51, and R41 R51; the remaining item indices and all confidence values were lost in extraction]
4 Rules
IF Income is Middle THEN Outcome is On-Time
– R22 → R51: support 0.490; confidence above the 0.9 minimum (exact value lost in extraction)
IF Credit is Good THEN Outcome is On-Time
– R41 → R51: support 0.576; confidence above the 0.9 minimum (exact value lost in extraction)
IF Income is Middle AND Credit is Good THEN Outcome is On-Time
– R22 R41 → R51: support 0.419; confidence above the 0.9 minimum (exact value lost in extraction)
IF Risk is High AND Credit is Good THEN Outcome is On-Time
– R31 R41 → R51: support 0.266; confidence 0.993
Rules vs. Support
Rules vs. Confidence
Higher-Order Combinations
Try triplets
– If ambitious, sets of 4 and beyond
– Here, no sets of 4 qualified
Problems:
– Computational complexity explodes
– Does not guarantee total coverage; guaranteeing it would also make complexity explode
Coverage can be increased by lowering minsup and minconf, at the cost of more rules and longer run times
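A quick count shows how the number of candidate sets grows with their size. This Python sketch counts the itemsets that take at most one category per attribute, using the demo's category counts as read from the slides (Age 3, Income 3, Risk 3, Credit 2, Outcome 2); the function name is hypothetical, and real candidate lists are smaller after support pruning.

```python
from itertools import combinations
from math import prod
from typing import Dict, List

def candidate_counts(categories_per_attribute: List[int]) -> Dict[int, int]:
    """Number of k-itemsets with at most one category per attribute, for each size k."""
    n = len(categories_per_attribute)
    return {k: sum(prod(c) for c in combinations(categories_per_attribute, k))
            for k in range(1, n + 1)}

if __name__ == "__main__":
    demo = [3, 3, 3, 2, 2]  # Age, Income, Risk, Credit, Outcome
    print(candidate_counts(demo))           # {1: 13, 2: 67, 3: 171, 4: 216, 5: 108}
    print(candidate_counts([3] * 10)[5])    # 61236
```

Even the small demo has 171 possible triplets; with ten three-category attributes there are already 61,236 possible sets of five.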
Simulation Testing
Selected 550 cases
– Held out 100 for testing
Randomly assigned weights to each fuzzy region of each attribute
Parameter settings:
– minsup ∈ {0.35, 0.45, 0.55, 0.65}
– minconf ∈ {0.7, 0.8, 0.9}
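A minimal Python sketch of this setup, under stated assumptions: the 550 cases are shuffled with a fixed seed and 100 are held out; weights are drawn uniformly from [0, 1) for each fuzzy region (the slides do not say which distribution was used); and the two threshold lists give a 4 × 3 grid of 12 runs. The names split_holdout and random_weights are hypothetical.

```python
import random
from itertools import product
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

MINSUPS = [0.35, 0.45, 0.55, 0.65]
MINCONFS = [0.7, 0.8, 0.9]

def split_holdout(cases: Sequence[T], n_test: int,
                  rng: random.Random) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the cases and return (training, held-out)."""
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def random_weights(categories: Dict[str, List[str]],
                   rng: random.Random) -> Dict[Tuple[str, str], float]:
    """One uniform [0, 1) weight per (attribute, fuzzy region); the distribution is an assumption."""
    return {(attr, cat): rng.random() for attr, cats in categories.items() for cat in cats}

if __name__ == "__main__":
    rng = random.Random(42)
    train, test = split_holdout(range(550), 100, rng)
    categories = {"Age": ["Young", "Middle", "Old"], "Income": ["High", "Middle", "Low"],
                  "Risk": ["High", "Middle", "Low"], "Credit": ["Good", "Bad"]}
    weights = random_weights(categories, rng)
    grid = list(product(MINSUPS, MINCONFS))
    print(len(train), len(test), len(weights), len(grid))  # 450 100 11 12
```

Each (minsup, minconf) pair in the grid would then drive one rule-mining run on the 450 training cases, with the 100 held-out cases used for evaluation.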
Simulation Results