Association Rules Hawaii International Conference on System Sciences (HICSS-40) January 2007 David L. Olson Yanhong Li.

1 Association Rules
Hawaii International Conference on System Sciences (HICSS-40), January 2007
David L. Olson, Yanhong Li

2 Fuzzy Association Rules
Association rules mining provides information to assess significant correlations in large databases
IF X THEN Y
–Initial data mining analysis
–Not predictive
SUPPORT: degree to which relationship appears in data
CONFIDENCE: probability that if X, then Y
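The two measures on this slide can be illustrated for the plain (non-fuzzy) case. This is a minimal sketch on a made-up basket of four transactions; the item names and the helper functions `support` and `confidence` are illustrative assumptions, not part of the presentation.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, x, y):
    """Estimated probability of Y given X: support(X and Y) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))       # 2 of 4 transactions
print(confidence(transactions, {"bread"}, {"milk"}))  # 2 of the 3 bread transactions
```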

3 Association Rule Algorithms
APriori: Agrawal et al., 1993; Agrawal & Srikant, 1994
–Find correlations among transactions, binary values
Weighted association rules: Cai et al., 1998; Lu et al., 2001
Cardinal data: Srikant & Agrawal, 1996
–Partitions attribute domain, combines adjacent partitions until binary

4 Fuzzy Analysis
Deal with vagueness & uncertainty
Fuzzy Set Theory
–Zadeh [1965]
Probability Theory
–Pearl [1988]
Rough Set Theory
–Pawlak [1982]
Set Pair Theory
–Zhao [2000]

5 Fuzzy Association Rules
Most based on APriori algorithm
Treat all attributes as uniform
Can increase number of rules by decreasing minimum support, decreasing minimum confidence
–Generates many uninteresting rules
–Software takes a lot longer

6 Gyenesei (2000)
Studied weighted quantitative association rules in fuzzy domain
–With & without normalization
–NONNORMALIZED
  Used product operator to define combined weight and fuzzy value
  If weight small, support level small, tends to have data overflow
–NORMALIZED
  Used geometric mean of item weights as combined weight
  Support then very small

7 Algorithm
Get membership functions, minimum support, minimum confidence
Assign weight to each fuzzy membership for each attribute (categorical)
Calculate support for each fuzzy region
If support > minimum, OK
If confidence > minimum, OK
If both OK, generate rules

8 Demo Model: Loan App
Case  Age  Income  Risk    Credit  Result
1     20   52623   -38954  Red     0
2     26   23047   -23636  Green   1
3     46   56810   45669   Green   1
4     31   38388   -7968   Amber   1
5     28   80019   -35125  Green   1
6     21   74561   -47592  Green   1
7     46   65341   58119   Green   1
8     25   46504   -30022  Green   1
9     38   65735   30571   Green   1
10    27   26047   -6      Red     1

9 Fuzzified Age
Figure 2: The membership functions of attribute Age (horizontal axis: Age, ticks at 0, 25, 35, 40, 50, 100; vertical axis: Membership value; curves: Young, Middle, Old)

10 Fuzzify Age
Case  Age  Young  Middle  Old
1     20   1.0    0.0     0.0
2     26   0.9    0.1     0.0
3     46   0.0    0.4     0.6
4     31   0.4    0.6     0.0
5     28   0.7    0.3     0.0
6     21   1.0    0.0     0.0
7     46   0.0    0.4     0.6
8     25   1.0    0.0     0.0
9     38   0.0    1.0     0.0
10    27   0.8    0.2     0.0
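The Age memberships in this table are consistent with piecewise-linear functions that change slope at ages 25, 35, 40 and 50. The sketch below assumes exactly that shape; the breakpoints are read off the table values rather than stated in the slides, and the function name `fuzzify_age` is hypothetical.

```python
def fuzzify_age(age):
    """Return (young, middle, old) membership degrees for an age in years.

    Assumed shapes, inferred from the slide's table:
      Young  = 1 up to 25, falling linearly to 0 at 35
      Middle = rising 25 to 35, flat at 1 from 35 to 40, falling 40 to 50
      Old    = 0 up to 40, rising linearly to 1 at 50
    """
    young = min(1.0, max(0.0, (35 - age) / 10))
    middle = max(0.0, min((age - 25) / 10, 1.0, (50 - age) / 10))
    old = min(1.0, max(0.0, (age - 40) / 10))
    return round(young, 3), round(middle, 3), round(old, 3)


# Ages of the ten demo cases, in case order.
ages = [20, 26, 46, 31, 28, 21, 46, 25, 38, 27]
for case, age in enumerate(ages, start=1):
    print(case, age, fuzzify_age(age))
```

Under these assumptions the loop reproduces every row of the table, for example (0.9, 0.1, 0.0) for age 26 and (0.0, 0.4, 0.6) for age 46.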

11 Calculate Support for Each Pair of Fuzzy Categories
Membership value
–Identify weights for each attribute
–Identify highest fuzzy membership category for each case
  Membership value = minimum weight associated with highest fuzzy membership category
Support
–Average membership value for all cases

12 Support by Single Item
Category       Region  Weight  Sup(Rjk)
Age Young      R11     0.45    0.261
Age Middle     R12     0.45    0.135
Age Old        R13     0.45    0.059
Income High    R21     0.55    0.000
Income Middle  R22     0.55    0.490
Income Low     R23     0.55    0.060
Risk High      R31     0.70    0.320
Risk Middle    R32     0.70    0.146
Risk Low       R33     0.70    0.233
Credit Good    R41     0.80    0.576
Credit Bad     R42     0.80    0.244
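One reading of the single-item figures is that support equals the attribute weight times the average membership over all cases. The sketch below tests that reading on the Age memberships of the ten demo cases; the formula and the name `weighted_support` are assumptions. It reproduces the table's values for Young and Middle, but gives 0.054 rather than 0.059 for Old, so it should be taken as a plausible interpretation rather than the confirmed calculation.

```python
AGE_WEIGHT = 0.45  # weight listed for all three Age regions

# Membership degrees of the ten demo cases, copied from the Fuzzify Age slide.
young = [1.0, 0.9, 0.0, 0.4, 0.7, 1.0, 0.0, 1.0, 0.0, 0.8]
middle = [0.0, 0.1, 0.4, 0.6, 0.3, 0.0, 0.4, 0.0, 1.0, 0.2]


def weighted_support(memberships, weight):
    """Assumed formula: weight times the mean membership across all cases."""
    return weight * sum(memberships) / len(memberships)


print(round(weighted_support(young, AGE_WEIGHT), 3))   # compare with Sup(R11)
print(round(weighted_support(middle, AGE_WEIGHT), 3))  # compare with Sup(R12)
```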

13 Support
If support for pair of categories is above minimum support, retain
Identifies all pairs of fuzzy categories with sufficiently strong relationship
For outcomes, R51 (On Time) strong, R52 (Default) not

14 Support by Pair: minsup 0.25
Pair     Support
R11 R22  0.235
R11 R31  0.207
R11 R41  0.212
R11 R51  0.230
R22 R31  0.237
R22 R41  0.419
R22 R51  0.449
R31 R41  0.266
R31 R51  0.264
R41 R51  0.560

15 Support by Triplet: minsup 0.25
Triplet      Support
R22 R41 R51  0.417
R22 R31 R41  0.198
R22 R31 R51  0.196
R31 R41 R51  0.264

16 Quartets
None qualify, so algorithm stops
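The level-by-level filtering on the pair, triplet and quartet slides can be sketched as follows. The support values are copied from the slides; the helper names `fs` and `frequent`, and the Apriori-style check that a quartet needs all four of its triplets to be frequent, are assumptions about how the stopping condition was reached.

```python
from itertools import combinations

MINSUP = 0.25


def fs(*regions):
    """Order-independent key for a set of fuzzy regions."""
    return frozenset(regions)


pair_support = {
    fs("R11", "R22"): 0.235, fs("R11", "R31"): 0.207,
    fs("R11", "R41"): 0.212, fs("R11", "R51"): 0.230,
    fs("R22", "R31"): 0.237, fs("R22", "R41"): 0.419,
    fs("R22", "R51"): 0.449, fs("R31", "R41"): 0.266,
    fs("R31", "R51"): 0.264, fs("R41", "R51"): 0.560,
}
triplet_support = {
    fs("R22", "R41", "R51"): 0.417, fs("R22", "R31", "R41"): 0.198,
    fs("R22", "R31", "R51"): 0.196, fs("R31", "R41", "R51"): 0.264,
}


def frequent(supports, minsup):
    """Keep only the itemsets whose support exceeds minsup."""
    return {itemset: s for itemset, s in supports.items() if s > minsup}


frequent_pairs = frequent(pair_support, MINSUP)
frequent_triplets = frequent(triplet_support, MINSUP)

# Apriori-style pruning: a quartet is a candidate only if every triplet
# contained in it is itself frequent.
regions = sorted(set().union(*frequent_triplets))
quartet_candidates = [
    q for q in combinations(regions, 4)
    if all(fs(*t) in frequent_triplets for t in combinations(q, 3))
]

print(sorted(sorted(p) for p in frequent_pairs))
print(sorted(sorted(t) for t in frequent_triplets))
print(quartet_candidates)
```

With the slide values, five pairs and two triplets pass the 0.25 threshold, and the quartet candidate list is empty.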

17 Confidence
Identify direction
For those training set cases involving the pair of attributes, what proportion came out as predicted?

18 Confidence Values
Minimum confidence 0.9

Pairs
Rule       Confidence
R22 → R41  0.855
R41 → R22  0.727
R22 → R51  0.916
R51 → R22  0.697
R31 → R41  0.831
R41 → R31  0.462
R31 → R51  0.825
R51 → R31  0.410
R41 → R51  0.972
R51 → R41  0.870

Triplets
Rule           Confidence
R41 R22 → R51  0.995
R41 R51 → R22  0.744
R22 R51 → R41  0.928
R31 R41 → R51  0.993
R31 R51 → R41  1.000
R51 R41 → R31  0.472

19 4 Rules
IF Income is Middle THEN Outcome is On-Time
–R22 → R51, support 0.490, confidence 0.916
IF Credit is Good THEN Outcome is On-Time
–R41 → R51, support 0.576, confidence 0.972
IF Income is Middle AND Credit is Good THEN Outcome is On-Time
–R22 R41 → R51, support 0.419, confidence 0.995
IF Risk is High AND Credit is Good THEN Outcome is On-Time
–R31 R41 → R51, support 0.266, confidence 0.993
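The published confidence values agree, to within rounding of the three-decimal supports, with confidence(X → Y) = support(X and Y) / support(X). The sketch below assumes that formula and applies it to rules whose consequent is the On-Time outcome R51; the dictionary layout and function name are illustrative.

```python
MINCONF = 0.9

# Supports copied from the single-item, pair and triplet slides.
support = {
    ("R22",): 0.490, ("R31",): 0.320, ("R41",): 0.576,
    ("R22", "R41"): 0.419, ("R22", "R51"): 0.449,
    ("R31", "R41"): 0.266, ("R31", "R51"): 0.264,
    ("R41", "R51"): 0.560,
    ("R22", "R41", "R51"): 0.417, ("R31", "R41", "R51"): 0.264,
}


def confidence(antecedent, consequent):
    """Assumed formula: support(antecedent + consequent) / support(antecedent)."""
    both = tuple(sorted(antecedent + (consequent,)))
    return support[both] / support[tuple(sorted(antecedent))]


# Antecedents that passed minsup, each tested against the On-Time outcome R51.
antecedents = [("R22",), ("R31",), ("R41",), ("R22", "R41"), ("R31", "R41")]
rules = {a: confidence(a, "R51") for a in antecedents}
selected = {a: c for a, c in rules.items() if c > MINCONF}

for antecedent, conf in selected.items():
    print(" ".join(antecedent), "-> R51", round(conf, 3))
```

Four antecedents clear the 0.9 threshold, matching the four rules listed; R31 → R51 falls short at about 0.825. Because the inputs are rounded, the third decimal can differ slightly from the slide.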

20 Rules vs. Support

21 Rules vs. Confidence

22 Higher order combinations
Try triplets
–If ambitious, sets of 4, and beyond
Here, none
Problems:
–Computational complexity explodes
–Doesn’t guarantee total coverage
  That also would explode complexity
  Can control by lowering minsup, minconf

23 Simulation Testing
Selected 550 cases
–Held out 100
Randomly assigned weights to each fuzzy region of each attribute
–minsup {0.35, 0.45, 0.55, 0.65}
–minconf {0.7, 0.8, 0.9}
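The experimental grid described here can be laid out as below. This is only a sketch of the setup: the slides do not say how the random weights were drawn, so the uniform draw on [0, 1), the fixed seed and the function name `random_weights` are assumptions. The region names come from the single-item support slide.

```python
import random
from itertools import product

MINSUP_LEVELS = [0.35, 0.45, 0.55, 0.65]
MINCONF_LEVELS = [0.7, 0.8, 0.9]

FUZZY_REGIONS = {
    "Age": ["Young", "Middle", "Old"],
    "Income": ["High", "Middle", "Low"],
    "Risk": ["High", "Middle", "Low"],
    "Credit": ["Good", "Bad"],
}


def random_weights(rng):
    """Draw one weight per fuzzy region (assumed uniform on [0, 1))."""
    return {
        (attribute, region): rng.random()
        for attribute, regions in FUZZY_REGIONS.items()
        for region in regions
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = random_weights(rng)

# Every (minsup, minconf) combination tested: 4 x 3 = 12 settings.
settings = list(product(MINSUP_LEVELS, MINCONF_LEVELS))
print(len(settings), "settings,", len(weights), "region weights")
```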

24 Simulation Results

