Mining Multiple-level Association Rules in Large Databases


1 Mining Multiple-level Association Rules in Large Databases
IEEE Transactions on Knowledge and Data Engineering, 1999
Authors: Jiawei Han, Simon Fraser University, British Columbia; Yongjian Fu, University of Missouri-Rolla, Missouri
Presented by Christopher Hutchinson

2 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Optimizations
- Conclusions/Future Work
- Exam Questions

3 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Optimizations
- Conclusions/Future Work
- Exam Questions

4 What is MLAR?
What's the difference between the following rules?
- Rule A: 70% of customers who bought diapers also bought beer
- Rule B: 45% of customers who bought cloth diapers also bought light beer
- Rule C: 35% of customers who bought Pampers also bought Bud Light beer

5 What is MLDM?
- Rule A applies at a generic, higher level of abstraction: Product
- Rule B applies at a more specific level of abstraction: Category
- Rule C applies at the lowest level of abstraction: Brand
Moving from Rule A toward Rule C is called drilling down.

6 What Do We Gain?
Concrete rules allow for:
- More targeted marketing
- New marketing strategies
- More concrete relationships

7 Hierarchy Types
- Generalization/Specialization (is-a relationships)
- Is-a with multiple inheritance
- Whole-part hierarchies (is-part-of; has-part)

8 Generalization to Specialization (Is-A relationship)
Vehicle
- 4-Wheels: Sedan, SUV
- 2-Wheels: Bicycle, Motorcycle

9 Is-A With Multiple Inheritance
Vehicle
- Recreational: Snowmobile, Bicycle
- Commuting: Car, Bicycle
(Bicycle inherits from both Recreational and Commuting.)

10 Whole-Part Hierarchies
Computer
- Motherboard: RAM, CPU
- Hard Drive: RW Head, Platter

11 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Conclusions/Future Work
- Exam Questions

12 MLAR: Main Goal
As usual, we are trying to develop a method to extract non-trivial, interesting, and strong rules from our transactional database. A method which:
- Avoids trivial rules (Milk→Bread): common sense
- Avoids coincidental rules (Toy→Milk): low support

13 What Do We Need?
Data representation at multiple levels of abstraction:
- Explicitly stored in databases
- Provided by experts or users
- Generated via clustering (OLAP)
Efficient methods for ML rule mining (the focus of this paper)

14 Methods
Apply the single-level Apriori algorithm to each of the multiple levels under the same minconf and minsup. Potential problems?
- Higher levels of abstraction will naturally have higher support; support decreases as we drill down
- What is the optimal minsup for all levels?
- Too high a minsup → too few itemsets for lower levels
- Too low a minsup → too many uninteresting rules

15 Solutions
- Adapt a minsup for each level of abstraction
- Adapt a minconf for each level of abstraction
- Do both
This paper covers a progressive deepening method developed by extension of the Apriori algorithm, focused on minsup.

16 Assumptions (taken by paper)
Explore only the descendants of the frequent items: if an item occurs rarely, its descendants will occur even less frequently and are thus uninteresting.

17 Problems
This may eliminate possibly interesting rules for itemsets at one level whose ancestors are not frequent at higher levels. If so, two workarounds can address it:
- Use 2 minsup values at higher levels: one for filtering infrequent items, the other for passing down frequent items to lower levels; the latter is called the level passage threshold (lph)
- The lph may be adjusted by the user to allow descendants of subfrequent items

18 Differences From Previous Research
Other studies use the same minsup across different levels of the hierarchy. This study:
- Uses different minsup values at different levels of the hierarchy
- Analyzes different optimization techniques
- Studies the use of interestingness measures

19 Requirements
The transactional database must contain:
- An item dataset containing item descriptions: {<Ai>, <description>}
- A transaction dataset T containing a set of transactions: {Tid, {Ap, …, Aq}}
Ai = bar codes, or the actual item and model number if a bar code is not available; Tid = a numeric or alphanumeric transaction identifier (key)

20 Algorithm

21 Algorithm Flow
At Level 1:
- Generate frequent itemsets
- Get table T[2], filtered for frequent itemsets
At subsequent levels:
- Generate candidate itemsets using Apriori
- Calculate support for the generated candidates
- Union 'passing' itemsets with the existing set
- Repeat until no additional itemsets are generated, or the desired level is reached
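This flow can be sketched in a few lines of Python, assuming the digit-string GID encoding used later in the example slides (the first l digits of an item give its level-l ancestor). The function names and the per-level filtering are illustrative, not the paper's exact pseudocode:

```python
# Illustrative sketch of the progressive-deepening flow (not the paper's
# exact pseudocode). Items are GID strings such as "111".

def frequent_itemsets(transactions, level, minsup):
    """All frequent itemsets of level-`level` prefixes, grown Apriori-style."""
    # project each transaction onto its set of level-l prefixes
    proj = [{item[:level] for item in t} for t in transactions]
    current = list({frozenset([p]) for t in proj for p in t})
    frequent = []
    while current:
        passing = [c for c in current
                   if sum(1 for t in proj if c <= t) >= minsup]
        frequent.extend(passing)
        # join step: combine frequent k-itemsets into (k+1)-candidates
        current = list({a | b for a in passing for b in passing
                        if len(a | b) == len(a) + 1})
    return frequent

def mine_multilevel(t1, minsups):
    """Mine level by level; filter the table by each level's frequent items."""
    table, results = t1, {}
    for level, minsup in enumerate(minsups, start=1):
        found = frequent_itemsets(table, level, minsup)
        if not found:
            break
        results[level] = found
        singles = {p for s in found if len(s) == 1 for p in s}
        table = [kept for kept in
                 ({i for i in t if i[:level] in singles} for t in table)
                 if kept]
    return results

# The encoded table T[1] from the example slides
T1 = [
    {"111", "121", "211", "221"}, {"111", "211", "222", "323"},
    {"112", "122", "221", "411"}, {"111", "121"},
    {"111", "122", "211", "221", "413"}, {"211", "323", "524"},
    {"323", "411", "524", "713"},
]
results = mine_multilevel(T1, [4, 3, 3])
print(frozenset({"111", "211"}) in results[3])  # True
```

Run with minsup 4 at Level 1 and 3 below, this reproduces the example's results, from L[1,1] = {1**}, {2**} down to L[3,2] = {111, 211}.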

22 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Optimizations
- Conclusions/Future Work
- Exam Questions

23 Definitions
- A pattern or itemset A is one item Ai or a set of conjunctive items Ai Λ … Λ Aj
- The support σ(A|S) of a pattern A in a set S is the number of transactions in S that contain A versus the total number of transactions
- The confidence of a rule A → B in S is given by φ(A→B) = σ(A∪B)/σ(A) (i.e., conditional probability)
- Specify 2 thresholds: minsup (σ') and minconf (φ'), with different values at different levels
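These two definitions can be made concrete in a short sketch (the toy transactions below are illustrative, not from the paper):

```python
# Support and confidence exactly as defined above.

def support(pattern, transactions):
    """sigma(A|S): fraction of transactions that contain every item in A."""
    return sum(1 for t in transactions if pattern <= t) / len(transactions)

def confidence(a, b, transactions):
    """phi(A -> B) = sigma(A U B) / sigma(A), i.e. conditional probability."""
    return support(a | b, transactions) / support(a, transactions)

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
]
print(support({"milk"}, transactions))                # 0.75
print(confidence({"milk"}, {"bread"}, transactions))  # 2/3
```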

24 Definitions
A pattern A is frequent in a set S if the support of A is no less than the corresponding minimum support threshold σ'.
A rule A → B is strong for a set S if:
- each ancestor of every item in A and B is frequent at its corresponding level,
- A Λ B is frequent at the current level, and
- φ(A→B) ≥ φ' (the minconf criterion).
This ensures that the patterns examined at the lower levels arise from itemsets that have high support at higher levels.

25 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Conclusions/Future Work
- Exam Questions

26 Example: Taxonomy
food
- Level 1: milk, bread
- Level 2: milk → 2%, chocolate; bread → white, wheat
- Level 3 (brands): 2% milk → Dairyland, Foremost; bread → Old Mills, Wonder
Generalized ID system: 2% Foremost Milk is coded as GID 112 (1st item in Level 1, 1st item in Level 2, 2nd item in Level 3).
Speaker note: A classification is a systematic arrangement in groups or categories according to established criteria. A taxonomy is an orderly classification according to the objects' relationships to one another, in other words a hierarchy. An ontology, in information science, is a "formal, explicit specification of a shared conceptualisation": a shared vocabulary that models a domain's objects, concepts, properties, and relations.

27 Example: Dataset
Table 1: A sales_transaction table
Trans_id | Bar_code_set
351428 | {17325, 92108, …}
653234 | {23423, 56432, …}
Table 2: A sales_item description relation
Bar_code | GID | Category | Brand | Content | Size | Price
17325 | 112 | milk | Foremost | 2% | 1 Gal | $3.31
… | … | … | … | … | … | …

28 Example: Preprocessing
Join the sales_transaction table to the sales_item table to produce the encoded transaction table T[1]:
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222, 323}
T3 | {112, 122, 221, 411}
T4 | {111, 121}
T5 | {111, 122, 211, 221, 413}
T6 | {211, 323, 524}
T7 | {323, 411, 524, 713}

29 Example: Step 1
Find Level-1 frequent itemsets (minsup = 4) from T[1]:
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222, 323}
T3 | {112, 122, 221, 411}
T4 | {111, 121}
T5 | {111, 122, 211, 221, 413}
T6 | {211, 323, 524}
T7 | {323, 411, 524, 713}
L[1,1]:
Itemset | Support
{1**} | 5
{2**} | 5
Level-1 frequent 2-itemsets, L[1,2]:
Itemset | Support
{1**, 2**} | 4
Note: as in Srikant and Agrawal, "Mining Generalized Association Rules", items with the same encoding at any level are treated as one item in one transaction.
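The Level-1 count reduces to prefix matching on the GID strings; a small sketch over the slide's T[1] (the helper name is illustrative):

```python
# Step-1 count: an item matches a level-1 pattern such as {1**} by its
# first digit, and repeated matches within one transaction count once.

T1 = [
    {"111", "121", "211", "221"}, {"111", "211", "222", "323"},
    {"112", "122", "221", "411"}, {"111", "121"},
    {"111", "122", "211", "221", "413"}, {"211", "323", "524"},
    {"323", "411", "524", "713"},
]

def prefix_support(prefix, transactions):
    """Transactions containing at least one item with the given prefix."""
    return sum(1 for t in transactions
               if any(i.startswith(prefix) for i in t))

print(prefix_support("1", T1))  # 5: {1**} passes minsup = 4
print(prefix_support("2", T1))  # 5: {2**} passes
print(prefix_support("3", T1))  # 3: {3**} is filtered out
```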

30 Example: Step 2
Create T[2] by filtering T[1] with L[1,1]:
T[1]:
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222, 323}
T3 | {112, 122, 221, 411}
T4 | {111, 121}
T5 | {111, 122, 211, 221, 413}
T6 | {211, 323, 524}
T7 | {323, 411, 524, 713}
Filtered T[2]:
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222}
T3 | {112, 122, 221}
T4 | {111, 121}
T5 | {111, 122, 211, 221}
T6 | {211}
(T7 is dropped: none of its items descend from a frequent Level-1 item.)
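The filtering step can be sketched directly (the function name is illustrative; `T1` reproduces the slide's encoded table):

```python
# Step-2 filter: keep only items whose level-1 ancestor is in
# L[1,1] = {1**, 2**}; drop transactions left empty (T7 here).

T1 = [
    {"111", "121", "211", "221"}, {"111", "211", "222", "323"},
    {"112", "122", "221", "411"}, {"111", "121"},
    {"111", "122", "211", "221", "413"}, {"211", "323", "524"},
    {"323", "411", "524", "713"},
]

def filter_by_frequent(transactions, frequent_digits):
    """Project each transaction onto items with a frequent level-1 digit."""
    out = []
    for t in transactions:
        kept = {i for i in t if i[0] in frequent_digits}
        if kept:                     # empty rows are discarded
            out.append(kept)
    return out

T2 = filter_by_frequent(T1, {"1", "2"})
print(len(T2))        # 6: T7 had no frequent level-1 ancestors
print(sorted(T2[1]))  # ['111', '211', '222']: 323 removed
```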

31 Example: Step 3
Find Level-2 frequent itemsets (minsup = 3) from filtered T[2]:
Filtered T[2]:
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222}
T3 | {112, 122, 221}
T4 | {111, 121}
T5 | {111, 122, 211, 221}
T6 | {211}
L[2,1]:
Itemset | Support
{11*} | 5
{12*} | 4
{21*} | 4
{22*} | 4
L[2,2]:
Itemset | Support
{11*, 12*} | 4
{11*, 21*} | 3
{11*, 22*} | 4
{12*, 22*} | 3
{21*, 22*} | 3
L[2,3]:
Itemset | Support
{11*, 12*, 22*} | 3
{11*, 21*, 22*} | 3

32 Example: Step 4
Find Level-3 frequent itemsets (minsup = 3) from filtered T[2]:
Filtered T[2]:
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222}
T3 | {112, 122, 221}
T4 | {111, 121}
T5 | {111, 122, 211, 221}
T6 | {211}
L[3,1]:
Itemset | Support
{111} | 4
{211} | 4
{221} | 3
L[3,2]:
Itemset | Support
{111, 211} | 3
Stop: lowest level reached.

33 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Optimizations
- Conclusions/Future Work
- Exam Questions

34 Are All Of The Strong Rules Interesting?
MLDM creates unique challenges for rule pruning. Two filters for interesting rules:
- Removal of redundant rules
- Removal of unnecessary rules

35 Redundant Rules
Consider a strong rule at Level 1: Milk→Bread. This rule is likely to have descendant rules which may or may not contain additional information, even if they meet our minconf and minsup criteria at their level: 2% Milk→Wheat Bread, 2% Milk→White Bread, Chocolate Milk→Wheat Bread. We need a way to distinguish between rules that add information and those that are redundant.

36 Redundant Rules
A rule is redundant if its confidence falls within a certain expected range and the items in the rule are descendants of the items in another rule.

37 Redundant Rules
Applying redundant-rule reduction eliminates 40-70% of discovered strong rules.

38 Unnecessary Rules
Consider the following rules:
- R: Milk → Bread (confidence = 80%)
- R': Milk, Butter → Bread (confidence = 80%)
How much additional information do we gain from R'? MLDM can produce very complex rules that meet our minsup and minconf criteria but do not contain much unique or useful information. We need a way to distinguish between rules that add information and those that are unnecessary.

39 Unnecessary Rules
A rule R' is unnecessary if there is a simpler rule R and φ(R') is within a given range of φ(R).
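Both interestingness filters reduce to a confidence-band test. A minimal sketch, assuming an illustrative tolerance `delta` (the function names and threshold are assumptions, not the paper's exact formulation):

```python
# Minimal sketch of the two pruning tests; `delta` and the function names
# are illustrative assumptions, not the paper's exact formulation.

def is_redundant(desc_conf, ancestor_conf, delta=0.05):
    """Descendant rule adds nothing if its confidence is within delta
    of its ancestor rule's confidence."""
    return abs(desc_conf - ancestor_conf) <= delta

def is_unnecessary(complex_conf, simple_conf, delta=0.05):
    """Rule R' with extra antecedent items is unnecessary if its
    confidence is within delta of the simpler rule R's confidence."""
    return abs(complex_conf - simple_conf) <= delta

# Milk -> Bread at 0.80 vs. Milk, Butter -> Bread at 0.80: prune R'
print(is_unnecessary(0.80, 0.80))  # True
# 2% Milk -> Wheat Bread at 0.45 under ancestor Milk -> Bread at 0.70
print(is_redundant(0.45, 0.70))    # False: keep, it adds information
```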

40 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Optimizations
- Conclusions/Future Work
- Exam Questions

41 Hardware Setup
- Sun Microsystems SPARCstation 20
- 32 MB RAM
- 100 MHz clock
- CLI

42 Algorithm Optimizations
The authors proposed 3 variations of the original algorithm:
- ML_T1LA: uses only one encoded table, T[1]
- ML_TML1: generates T[1], T[2], …, T[n+1]
- ML_T2LA: uses T[2], but calculates lower-level support with a single scan

43 ML_T1LA
Instead of generating T[2] from T[1], the ML_T1LA algorithm generates support for all levels of the hierarchy in a single scan of T[1].
Pros:
- Avoids generation of a new transaction table
- Limits the number of scans to the size of the largest transaction
Cons:
- Scanning T[1] requires scanning all items, even infrequent ones
- Performance may suffer for databases with many infrequent itemsets
- Large memory required (32 MB RAM = page swapping!)

44 ML_TML1
Instead of using only T[2] for rule mining, the ML_TML1 algorithm generates a table for each level, using L[i,1] to filter T[i] and create T[i+1].
Pros:
- Saves significant processing time if only a small portion of the data is frequent at each level
- Allows for creation of T[i] and L[i,1] in parallel
Cons:
- May not be efficient if only a small number of items is filtered at each level

45 ML_T2LA
Like the base algorithm, ML_T2LA creates the T[2] table from the frequent itemsets in T[1]. However, it allows for parallel creation of L[i,k].
Pros:
- Saves time by limiting the number of scans
Cons:
- May not be efficient if only a small number of items is filtered at each level

46 Experimental Results
The figures show that ML_T2LA is best for most minsup values. The authors preferred ML_T1LA.

47 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Optimizations
- Conclusions/Future Work
- Exam Questions

48 Conclusions
This paper demonstrated:
- Extending association rules from single-level to multiple-level
- A top-down progressive deepening technique for mining multiple-level association rules
- Filtering of unnecessary/redundant association rules
- Performance optimization techniques

49 Future Work
- Develop efficient algorithms for mining multiple-level sequential patterns
- Cross-level associations
- Improve interestingness of rules

50 Outline
- What is MLAR? Concepts; Motivation
- A Method For Mining M-L Association Rules: Problems/Solutions; Definitions; Algorithm; Example
- Interestingness
- Optimizations
- Conclusions/Future Work
- Exam Questions

51 Exam Question 1
Q. What is a major drawback to multiple-level data mining when using the same minsup at all levels of a concept hierarchy?
A. Large support exists at higher levels of the hierarchy and smaller support at lower levels. To ensure that sufficiently strong association rules are generated at the lower levels, we must reduce the minsup at higher levels, which in turn could generate many uninteresting rules at higher levels. Thus we face the problem of determining the optimal minsup for all levels.

52 Exam Question 2
Q. Give an example of a multiple-level association rule.
A. High level: 80% of people who buy cereal also buy milk. Low level: 25% of people who buy Cheerios cereal buy Hood 2% milk.

53 Exam Question 3
Q. There were 3 examples of hierarchy types in multiple-level rule mining. Pick one and draw an example.
- Is-A: Vehicle → 4-Wheels (Sedan, SUV), 2-Wheels (Bike, Motorcycle)
- Is-A with Multiple Inheritance: Vehicle → Recreational (Snowmobile, Bicycle), Commuting (Car, Bicycle)
- Whole-Part: Computer → Motherboard (RAM, CPU), Hard Drive (RW Head, Platter)

