1 On Mining General Temporal Association Rules in a Publication Database
Chang-Hung Lee, Cheng-Ru Lin, and Ming-Syan Chen, Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM’01), 29 Nov.-2 Dec. 2001, pp. 337–344.
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
Department of Information & Computer Education, NTNU
2 Outline
– Introduction
– Problem Description
– General Temporal Association Rules (Progressive Partition Miner, PPM)
– Experimental Results
– Conclusions
3 Introduction (1/3)
A publication database is a set of transactions in which each transaction T is a set of items, and each item has its own individual exhibition period.
4 Introduction (2/3)
Bookstore Transaction Database
Date    TID  Itemset
Jan-01  T1   BD
Jan-01  T2   BCD
Jan-01  T3   BC
Jan-01  T4   AD
Feb-01  T5   BCE
Feb-01  T6   DE
Feb-01  T7   ABC
Feb-01  T8   CDE
Mar-01  T9   BCEF
Mar-01  T10  BF
Mar-01  T11  AD
Mar-01  T12  BDF
Item Information
Item  Publication Date
A     Jan-95
B     Apr-96
C     July-97
D     Aug-00
E     Feb-01
F     Mar-01
min_sup=30% min_conf=75%
Frequent itemsets: {B, C, D, E, BC}
Frequent association rule: C => B
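The result on this slide can be reproduced with a small sketch of the classical model, in which every itemset is counted against all 12 transactions regardless of when its items were published. The function and variable names below are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

# Bookstore transactions T1..T12 from the slide, in order.
transactions = [
    "BD", "BCD", "BC", "AD",      # Jan-01
    "BCE", "DE", "ABC", "CDE",    # Feb-01
    "BCEF", "BF", "AD", "BDF",    # Mar-01
]

def frequent_itemsets(transactions, min_sup, max_size=2):
    """Return {itemset: count} for itemsets whose support over ALL
    transactions is at least min_sup (classical, non-temporal model)."""
    counts = {}
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                key = "".join(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(transactions)
    return {s: c for s, c in counts.items() if c / n >= min_sup}

frequent = frequent_itemsets(transactions, 0.30)
# Confidence of C => B is count(BC) / count(C).
conf_c_to_b = frequent["BC"] / frequent["C"]
```

Note that F, which appears in three of the four March transactions, is not frequent here because its count is divided by all 12 transactions.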
5 Introduction (3/3)
The current model of association rule mining cannot handle a publication database properly because it:
– does not consider the exhibition period of each individual item
– lacks an equitable basis for counting the support of each item
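6 Problem Description (1/4)
Bookstore Transaction Database D, partitioned by month (time granularity: month)
Partition  Date    TID  Itemset
P1         Jan-01  T1   BD
P1         Jan-01  T2   BCD
P1         Jan-01  T3   BC
P1         Jan-01  T4   AD
P2         Feb-01  T5   BCE
P2         Feb-01  T6   DE
P2         Feb-01  T7   ABC
P2         Feb-01  T8   CDE
P3         Mar-01  T9   BCEF
P3         Mar-01  T10  BF
P3         Mar-01  T11  AD
P3         Mar-01  T12  BDF
D = P1+P2+P3
db^{1,3} starts at P1, db^{2,3} starts at P2, and db^{3,3} starts at P3; each runs through P3.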
7 Problem Description (2/4)
Maximal Common exhibition Period (MCP) of items
– MCP(x): the MCP value of item x
– For example: MCP(C)=(1, 3) and MCP(E)=(2, 3) => MCP(CE)=(2, 3)
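The MCP example can be sketched in a few lines. This assumes that the MCP of an itemset runs from the latest starting partition among its items to the last partition, which is consistent with the example on the slide. The `START` table is an assumption derived from the publication dates: items published before Jan-01 are taken to start in partition 1.

```python
# Partition index (1 = Jan-01, 2 = Feb-01, 3 = Mar-01) in which each item
# is first on exhibit. Assumption: items published earlier start at 1.
START = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 2, "F": 3}
LAST_PARTITION = 3

def mcp(itemset):
    """MCP of an itemset: (latest start among its items, last partition)."""
    return (max(START[item] for item in itemset), LAST_PARTITION)

mcp_c = mcp("C")
mcp_e = mcp("E")
mcp_ce = mcp("CE")
```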
8 Problem Description (3/4)
An association rule (X => Y)^{MCP(XY)} is called a general temporal association rule.
(X => Y)^{MCP(XY)} is frequent if and only if
– supp((X => Y)^{MCP(XY)}) >= min_supp
– and conf((X => Y)^{MCP(XY)}) >= min_conf
For example, (C => E)^{2,3} is a general temporal association rule. In db^{2,3} (8 transactions), CE occurs 3 times and C occurs 4 times, so supp = 3/8 = 37.5% and conf = 3/4 = 75%. With min_supp=30% and min_conf=75%, (C => E)^{2,3} is frequent.
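The (C => E)^{2,3} example can be checked with a short sketch that counts only within the partitions covered by the rule's period. The helper names are illustrative assumptions, and the period is passed in explicitly rather than derived from the items.

```python
# Monthly partitions P1..P3 of the bookstore database.
PARTITIONS = {
    1: ["BD", "BCD", "BC", "AD"],
    2: ["BCE", "DE", "ABC", "CDE"],
    3: ["BCEF", "BF", "AD", "BDF"],
}

def sub_db(start, end):
    """Transactions of partitions start..end (inclusive)."""
    return [t for p in range(start, end + 1) for t in PARTITIONS[p]]

def count(itemset, db):
    """Number of transactions in db containing every item of itemset."""
    return sum(1 for t in db if set(itemset) <= set(t))

def temporal_rule(x, y, period):
    """Return (support, confidence) of (x => y) measured over its period."""
    db = sub_db(*period)
    both = count(x + y, db)
    return both / len(db), both / count(x, db)

supp, conf = temporal_rule("C", "E", (2, 3))
```

Under the classical model the same itemset CE would have support 3/12 = 25% and would fall below the 30% threshold.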
9 Problem Description (4/4)
When a maximal temporal k-itemset X^{t,n} is frequent in the data set db^{t,n}, each of its corresponding sub-itemsets (over the same period) is also frequent in db^{t,n}.
– For example, in the bookstore data: CE^{2,3} is frequent. => C^{2,3} and E^{2,3} are also frequent.
10 General Temporal Association Rules (1/6)
Running example: the bookstore transaction database D, partitioned by month into P1 (Jan-01, T1 to T4), P2 (Feb-01, T5 to T8), and P3 (Mar-01, T9 to T12), as on slide 6.
min_sup=30% min_conf=75%
11 General Temporal Association Rules (2/6)
Scan DB (1)
P1
C2  start  Count
BD  1      2
BC  1      2
CD  1      1
AD  1      1
P1+P2
C2  start  Count
BD  1      2
BC  1      4
BE  2      1
CE  2      2
DE  2      2
AB  2      1
AC  2      1
CD  2      1
P1+P2+P3
C2  start  Count
BC  1      5
CE  2      3
DE  2      2
BE  3      1
BF  3      3
CF  3      1
EF  3      1
AD  3      1
BD  3      1
DF  3      1
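The three tables can be reproduced by the following sketch of a progressive first scan. It is an interpretation of the tables rather than the paper's own pseudocode: it assumes that each 2-itemset remembers the partition in which it was (re)introduced, and that after each partition an itemset is carried forward only if its count reaches min_sup times the number of transactions seen since that starting partition. The name `first_scan` is hypothetical.

```python
from itertools import combinations

# Monthly partitions P1, P2, P3 of the bookstore database.
PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],
    ["BCE", "DE", "ABC", "CDE"],
    ["BCEF", "BF", "AD", "BDF"],
]

def first_scan(partitions, min_sup):
    """Return {pair: (start, count)} of 2-itemsets surviving all partitions."""
    carried = {}
    sizes = [len(p) for p in partitions]
    for idx, part in enumerate(partitions, start=1):
        # Accumulate counts; a pair not currently carried starts at idx.
        for t in part:
            for a, b in combinations(sorted(t), 2):
                start, cnt = carried.get(a + b, (idx, 0))
                carried[a + b] = (start, cnt + 1)
        # Keep pairs that meet the threshold relative to the transactions
        # in partitions start..idx; drop the rest.
        carried = {
            pair: (start, cnt)
            for pair, (start, cnt) in carried.items()
            if cnt >= min_sup * sum(sizes[start - 1:idx])
        }
    return carried

candidates = first_scan(PARTITIONS, 0.30)
```

With this filter, BD is dropped after P2 (count 2 out of 8 transactions) and re-enters in P3 with start 3 and count 1, matching the rows of the third table.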
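12 General Temporal Association Rules (3/6)
After the 1st scan of database D, the candidate 2-itemsets are BC^{1,3}, CE^{2,3}, and BF^{3,3}, together with their sub-itemsets B^{1,3}, C^{1,3}, C^{2,3}, E^{2,3}, B^{3,3}, and F^{3,3}.
No candidate k-itemset is generated for k >= 3, because no 3-itemset has all of its 2-item subsets among these candidates.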
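The candidate 1-itemsets checked in the second scan can be derived from the candidate 2-itemsets that survive the first scan (BC^{1,3}, CE^{2,3}, BF^{3,3}). The sketch below assumes that each sub-itemset simply inherits the period of its parent itemset; the function name is hypothetical.

```python
from itertools import combinations

# Candidate 2-itemsets after the first scan, with their periods.
candidate_tis = {"BC": (1, 3), "CE": (2, 3), "BF": (3, 3)}

def sub_itemsets(tis):
    """Return the set of (sub-itemset, period) pairs for every proper,
    non-empty subset of each candidate itemset."""
    result = set()
    for items, period in tis.items():
        for k in range(1, len(items)):
            for combo in combinations(items, k):
                result.add(("".join(combo), period))
    return result

candidate_sis = sub_itemsets(candidate_tis)
```

The same item can appear with different periods (B with 1,3 and 3,3; C with 1,3 and 2,3), because its support must be measured over the same sub-database as the parent itemset.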
13 General Temporal Association Rules (4/6)
Scan DB (2)
Candidate itemsets  Count  SR
C1
{B^{1,3}}           8      4
{B^{3,3}}           3      2
{C^{1,3}}           6      4
{C^{2,3}}           4      3
{E^{2,3}}           4      3
{F^{3,3}}           3      2
C2
{BC^{1,3}}          5      4
{BF^{3,3}}          3      2
{CE^{2,3}}          3      3
(The SR values match the minimum count required at min_sup=30% for periods covering 12, 8, and 4 transactions: 4, 3, and 2.)
After pruning:
Frequent itemsets   Count
L1
{B^{1,3}}           8
{B^{3,3}}           3
{C^{1,3}}           6
{C^{2,3}}           4
{E^{2,3}}           4
{F^{3,3}}           3
L2
{BC^{1,3}}          5
{BF^{3,3}}          3
{CE^{2,3}}          3
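The second scan can be sketched as follows. The code assumes that the SR column is the minimum count required over the candidate's period, computed as the ceiling of min_sup times the number of transactions in that period; this reading reproduces the numbers in the table but is not stated on the slide. Integer percentages are used to avoid floating-point rounding in the ceiling.

```python
# Monthly partitions P1..P3 of the bookstore database.
PARTITIONS = {
    1: ["BD", "BCD", "BC", "AD"],
    2: ["BCE", "DE", "ABC", "CDE"],
    3: ["BCEF", "BF", "AD", "BDF"],
}

# Candidate itemsets with their periods, as listed in the table.
CANDIDATES = [
    ("B", (1, 3)), ("B", (3, 3)), ("C", (1, 3)), ("C", (2, 3)),
    ("E", (2, 3)), ("F", (3, 3)),
    ("BC", (1, 3)), ("BF", (3, 3)), ("CE", (2, 3)),
]

def second_scan(candidates, min_sup_pct):
    """Return {(itemset, period): (count, required)} for frequent candidates."""
    frequent = {}
    for items, (start, end) in candidates:
        db = [t for p in range(start, end + 1) for t in PARTITIONS[p]]
        cnt = sum(1 for t in db if set(items) <= set(t))
        # Integer ceiling of min_sup_pct% of the period's transaction count.
        required = (min_sup_pct * len(db) + 99) // 100
        if cnt >= required:
            frequent[(items, (start, end))] = (cnt, required)
    return frequent

frequent = second_scan(CANDIDATES, 30)
```

All nine candidates meet their required counts, which is why the pruned list equals the candidate list.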
14 General Temporal Association Rules (5/6)
After the 2nd scan of database D, the frequent itemsets yield the following rules:
Rules            Supp.    Conf.
(B=>C)^{1,3}     41.67%   62.50%
(C=>B)^{1,3}     41.67%   83.33%
(B=>F)^{3,3}     75.00%   100.00%
(F=>B)^{3,3}     75.00%   100.00%
(C=>E)^{2,3}     37.50%   75.00%
(E=>C)^{2,3}     37.50%   75.00%
After pruning (min_conf=75%):
Rules            Supp.    Conf.
(C=>B)^{1,3}     41.67%   83.33%
(B=>F)^{3,3}     75.00%   100.00%
(F=>B)^{3,3}     75.00%   100.00%
(C=>E)^{2,3}     37.50%   75.00%
(E=>C)^{2,3}     37.50%   75.00%
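Rule generation from the frequent itemsets can be sketched as below. The counts are those found in the second scan; the period sizes (12, 8, and 4 transactions) follow from the monthly partitioning. The function name and data layout are illustrative assumptions.

```python
# Counts of the frequent itemsets from the second scan, keyed by (itemset, period).
COUNTS = {
    ("B", (1, 3)): 8, ("B", (3, 3)): 3, ("C", (1, 3)): 6,
    ("C", (2, 3)): 4, ("E", (2, 3)): 4, ("F", (3, 3)): 3,
    ("BC", (1, 3)): 5, ("BF", (3, 3)): 3, ("CE", (2, 3)): 3,
}
# Number of transactions covered by each period.
DB_SIZE = {(1, 3): 12, (2, 3): 8, (3, 3): 4}

def generate_rules(counts, min_conf):
    """Return [(x, y, period, supp, conf)] for rules x => y built from
    frequent 2-itemsets whose confidence is at least min_conf."""
    rules = []
    for (items, period), cnt in counts.items():
        if len(items) != 2:
            continue
        for x, y in ((items[0], items[1]), (items[1], items[0])):
            conf = cnt / counts[(x, period)]
            supp = cnt / DB_SIZE[period]
            if conf >= min_conf:
                rules.append((x, y, period, supp, conf))
    return rules

rules = generate_rules(COUNTS, 0.75)
```

Only (B=>C)^{1,3} is removed, since its confidence 5/8 = 62.5% is below min_conf.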
15 General Temporal Association Rules (6/6)
The flowchart of PPM
1st scan of the database:
– Partition the database based on exhibition periods
– Produce candidate 2-TIs
– Use candidate 2-TIs to produce candidate k-TIs and k-SIs
2nd scan of the database:
– Generate frequent k-TIs and k-SIs
– Rule generation
16 Experimental Results (1/3)
Meaning of various parameters:
|D|   Number of transactions in the database
|T|   Average size of the transactions
|I|   Average size of the maximal frequent itemsets
|L|   Number of maximal potentially frequent itemsets (default 2000)
N     Number of items (default 10000)
|Pi|  Number of transactions in the partition database Pi
17 Experimental Results (2/3)
[Figure: relative performance]
18 Experimental Results (3/3)
[Figure: scaleup performance]
19 Conclusions
Algorithm PPM is particularly well suited to efficient mining of publication-like transaction databases, such as video rental store records, library rental records, book rental records, and transactions in electronic commerce.