Mining General Temporal Association Rules for Items with Different Exhibition Periods
Cheng-Yue Chang, Ming-Syan Chen, Chang-Hung Lee
Proc. of the 2002 IEEE International Conference on Data Mining (ICDM'02)
Adviser: Jia-Ling Koh
Speaker: Yu-ting Kung
2 Introduction
This paper explores a new model for mining general temporal association rules from a large database in which the exhibition periods of the items may differ from one another. (See the next slide.)
3 Introduction (Cont.)
What goes wrong when a conventional mining algorithm is applied to this database?
For example, with min_supp = 30% and min_conf = 75%, conventional mining finds only {A}, {B}, {C} and {F} as frequent itemsets, so no association rule is discovered.
But some rules do exist in this database.
4 Introduction (Cont.)
What is the problem with the conventional mining algorithm?
It does not take the individual exhibition periods of the items into consideration.
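5 Introduction (Cont.)
To allow items to have different exhibition periods, three basic definitions are introduced: the maximal common exhibition period, relative support, and confidence.
Maximal common exhibition period (MCP): MCP(X) = [p, q], where X is an itemset, p is the latest exhibition start time among the items in X, and q is the earliest exhibition end time among the items in X.
For example (in Figure 1): MCP(BC) = [2,3].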
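The MCP definition can be sketched in a few lines of Python. The exhibition periods below are assumed values, not the contents of Figure 1; they were chosen only so that MCP(BC) agrees with the [2,3] quoted on the slide. The function name `mcp` and the `PERIODS` table are illustrative.

```python
# Hypothetical exhibition periods: item -> (start partition, end partition).
# These are NOT copied from Figure 1; they are assumptions for illustration.
PERIODS = {"B": (2, 4), "C": (1, 3), "D": (1, 2), "E": (3, 4)}


def mcp(itemset, periods):
    """Return MCP(X) as (p, q), or None if the items never overlap in time."""
    p = max(periods[item][0] for item in itemset)  # latest exhibition start
    q = min(periods[item][1] for item in itemset)  # earliest exhibition end
    return (p, q) if p <= q else None


if __name__ == "__main__":
    print(mcp("BC", PERIODS))   # (2, 3)
    print(mcp("BCD", PERIODS))  # (2, 2)
    print(mcp("DE", PERIODS))   # None: D ends before E starts
```

Returning None for a non-overlapping itemset is a design choice of this sketch; the slides do not say how such itemsets are represented.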
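6 Introduction (Cont.)
Relative support: the support of an itemset X measured only over the transactions that fall within MCP(X), written supp(X^{MCP(X)}).
Confidence: conf((X => Y)^{MCP(XY)}) = supp((X ∪ Y)^{MCP(XY)}) / supp(X^{MCP(XY)}).
(The worked examples for both definitions refer to Figure 1.)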
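A minimal sketch of relative support and confidence follows. It assumes that each transaction is tagged with the partition it belongs to, and that the support of the rule's left-hand side is counted over MCP(XY). The items M and N, their periods, and the eight transactions are made-up data and do not come from Figure 1.

```python
# Hypothetical data: item -> (start partition, end partition).
PERIODS = {"M": (2, 4), "N": (1, 3)}

# Hypothetical transactions as (partition number, set of items).
TRANSACTIONS = [
    (1, {"N"}), (1, {"N"}),
    (2, {"M", "N"}), (2, {"M"}),
    (3, {"M", "N"}), (3, {"N"}),
    (4, {"M"}), (4, {"M"}),
]


def mcp(itemset, periods):
    """Latest start and earliest end over the items; None if they never overlap."""
    p = max(periods[i][0] for i in itemset)
    q = min(periods[i][1] for i in itemset)
    return (p, q) if p <= q else None


def relative_support(itemset, window, transactions):
    """Fraction of the transactions inside `window` that contain `itemset`."""
    p, q = window
    inside = [items for part, items in transactions if p <= part <= q]
    if not inside:
        return 0.0
    return sum(1 for items in inside if set(itemset) <= items) / len(inside)


def confidence(lhs, rhs, periods, transactions):
    """conf((X => Y)^{MCP(XY)}): both supports are taken over MCP(XY)."""
    both = set(lhs) | set(rhs)
    window = mcp(both, periods)
    if window is None:
        return 0.0
    denom = relative_support(lhs, window, transactions)
    return relative_support(both, window, transactions) / denom if denom else 0.0


if __name__ == "__main__":
    window = mcp("MN", PERIODS)                           # (2, 3)
    print(relative_support("MN", window, TRANSACTIONS))   # 0.5
    print(relative_support("MN", (1, 4), TRANSACTIONS))   # 0.25
    print(confidence("M", "N", PERIODS, TRANSACTIONS))
```

On this made-up data the itemset MN reaches 50% support inside its MCP but only 25% over the whole database, which is the effect the model is designed to capture.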
7 Introduction (Cont.)
Based on the definitions above, the frequent general temporal association rules in this database are:
8 Introduction (Cont.)
In this model, the "downward closure" property is no longer valid.
For example (in Figure 1): itemset BCD is frequent in [2,2], but BC, BD and CD are not all frequent in their corresponding MCPs. BC's relative support, for instance, is only 25% (< 30%).
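9 Problem Description
Maximal temporal itemset (TI): X^{p,q} is a maximal temporal itemset when MCP(X) = [p, q].
For example: BCD^{2,2} and BD^{2,2} are maximal temporal itemsets; BC^{2,2} is not, because MCP(BC) = [2,3].
Temporal sub-itemset (SI) of a maximal temporal itemset: for example, BCD^{2,2} is a maximal temporal itemset, and BD^{2,2}, BC^{2,2} and CD^{2,2} are temporal sub-itemsets of BCD^{2,2}.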
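The two notions above can be checked mechanically. The sketch below treats X^{p,q} as maximal when MCP(X) equals [p, q], and lists the temporal sub-itemsets as every non-empty proper subset paired with the same window. Including single items among the sub-itemsets is an assumption of this sketch. The exhibition periods are assumed values chosen to agree with the MCPs quoted on the slides, not taken from Figure 1.

```python
from itertools import combinations

# Hypothetical exhibition periods: item -> (start partition, end partition).
PERIODS = {"B": (2, 4), "C": (1, 3), "D": (1, 2)}


def mcp(itemset, periods):
    """Latest start and earliest end over the items; None if they never overlap."""
    p = max(periods[i][0] for i in itemset)
    q = min(periods[i][1] for i in itemset)
    return (p, q) if p <= q else None


def is_maximal_ti(itemset, window, periods):
    """X^{p,q} is a maximal temporal itemset when MCP(X) == (p, q)."""
    return mcp(itemset, periods) == tuple(window)


def temporal_sub_itemsets(itemset, window):
    """All non-empty proper subsets of the itemset, paired with the same window."""
    items = sorted(itemset)
    return [
        (subset, tuple(window))
        for size in range(1, len(items))
        for subset in combinations(items, size)
    ]


if __name__ == "__main__":
    print(is_maximal_ti("BCD", (2, 2), PERIODS))  # True
    print(is_maximal_ti("BD", (2, 2), PERIODS))   # True
    print(is_maximal_ti("BC", (2, 2), PERIODS))   # False, MCP(BC) is (2, 3)
    for sub in temporal_sub_itemsets("BCD", (2, 2)):
        print(sub)
```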
10 Problem Description (Cont.)
Frequent maximal temporal itemset: if X^{MCP(X)} is a maximal TI and supp(X^{MCP(X)}) >= min_supp, then X^{MCP(X)} is frequent.
Property: all temporal sub-itemsets of a frequent maximal temporal itemset are frequent.
General temporal association rule: (X => Y)^{MCP(XY)} is frequent iff supp((X ∪ Y)^{MCP(XY)}) >= min_supp and conf((X => Y)^{MCP(XY)}) >= min_conf.
11 Mining General Temporal Association Rules: SPF Algorithm
SPF consists of two major procedures: Segmentation (ProcSG) and Progressive Filtering (ProcPF).
First, SPF divides the database into partitions according to the time granularity imposed.
Second, SPF employs ProcSG.
Third, SPF utilizes ProcPF.
Then, it generates all candidate k-itemsets from the (k-1)-itemsets, transforms them to TIs, and generates the SIs.
Finally, it scans the database to determine all frequent TIs and SIs.
12 SPF Algorithm: ProcSG
ProcSG segments the database into sub-databases such that the items in each sub-database have either a common starting time or a common ending time.
For example, db^{1,6} is segmented into db^{1,3}, db^{4,4} and db^{5,6}.
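The slides state the property that ProcSG's output must satisfy but not how the cut points are chosen, so the sketch below only checks the property and does not implement ProcSG itself. It rests on one reading of the property: each item's exhibition period is clipped to the sub-database, and the clipped periods must then share a start or share an end. The exhibition periods are assumed values, not Figure 1.

```python
# Hypothetical exhibition periods: item -> (start partition, end partition).
PERIODS = {
    "A": (1, 4), "B": (2, 4), "C": (1, 3),
    "D": (1, 2), "E": (3, 4), "F": (1, 3),
}


def clip(period, window):
    """Restrict an exhibition period to a sub-database; None if they are disjoint."""
    start = max(period[0], window[0])
    end = min(period[1], window[1])
    return (start, end) if start <= end else None


def has_common_boundary(window, periods):
    """True if, inside `window`, all exhibited items share a start or share an end."""
    spans = [c for c in (clip(p, window) for p in periods.values()) if c is not None]
    starts = {start for start, _ in spans}
    ends = {end for _, end in spans}
    return len(starts) <= 1 or len(ends) <= 1


if __name__ == "__main__":
    print(has_common_boundary((1, 4), PERIODS))  # False: whole database
    print(has_common_boundary((1, 2), PERIODS))  # True: every item runs to 2
    print(has_common_boundary((3, 4), PERIODS))  # True: every item runs from 3
```

Under these assumed periods, the split of db^{1,4} into db^{1,2} and db^{3,4} passes the check, while the unsegmented database does not.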
13 SPF Algorithm: ProcPF
After the entire database has been segmented by ProcSG, ProcPF progressively filters the candidate 2-itemsets from one partition to the next within each sub-database.
14 An Illustrative Example (SPF)
Illustrative example: Figure 1, with min_supp = 30% and min_conf = 75%.
Use ProcSG: the database db^{1,4} is segmented into two sub-databases, db^{1,2} and db^{3,4}.
15 An Illustrative Example (SPF)
Use ProcPF: progressively filter the candidate 2-itemsets.
16 An Illustrative Example (SPF)
After the 1st database scan, C2 = {AB, BC, BD, CD, CF, EF}.
Generate C3: C3 = {BCD}.
Transform the candidates to TIs and generate the SIs.
After the 2nd database scan, the frequent TIs are {AB^{2,4}, BD^{2,2}, CF^{1,3}, EF^{3,3}, BCD^{2,2}}.
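The step from C2 to C3 can be reproduced with a standard Apriori-style join and prune. The slides do not spell out SPF's generation step, so using this procedure is an assumption; the function name `apriori_gen` is illustrative. On the C2 listed above it yields the same C3.

```python
from itertools import combinations


def apriori_gen(prev_candidates):
    """Join (k-1)-itemsets that share a prefix, then prune by (k-1)-subsets."""
    prev = {tuple(sorted(c)) for c in prev_candidates}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1
    result = set()
    for a in prev:
        for b in prev:
            # Join step: same first k-2 items, last items in increasing order.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidate = a + (b[-1],)
                # Prune step: every (k-1)-subset must already be a candidate.
                if all(sub in prev for sub in combinations(candidate, k - 1)):
                    result.add(candidate)
    return result


if __name__ == "__main__":
    c2 = ["AB", "BC", "BD", "CD", "CF", "EF"]
    print(apriori_gen(c2))  # {('B', 'C', 'D')}
```

CDF is produced by the join of CD and CF but is pruned because DF is not in C2, which leaves BCD as the only candidate 3-itemset.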
17 Experiment
Data:
|D| = the number of transactions
|T| = the average size of a transaction
|N| = the number of different items
|L| = the number of potential frequent itemsets
Algorithms to compare: SPF, Apriori IP
18 Experiment (Cont.)
19 Experiment (Cont.)