Progressive Partition Miner: An Efficient Algorithm for Mining General Temporal Association Rules
Amer Zaheer, PC101005
Mohammad Ali Jinnah University, Islamabad
Agenda
- References
- Basic Definitions
- Association Rule Generation
- Traditional Association Rule Mining Algorithms
- General Temporal Association Rules
- Progressive Partition Miner
- Limitations of PPM
References
- C. H. Lee, M. S. Chen, "Progressive Partition Miner: An Efficient Algorithm for Mining General Temporal Association Rules"
- C. H. Lee, C. Lin and M. Chen, "On Mining General Temporal Association Rules in a Publication Database"
- C. Chang, M. Chen and C. Lee, "Mining General Temporal Association Rules for Items with Different Exhibition Periods"
Basic Definitions
- Exhibition Period
- Maximal Common Exhibition Period
- Temporal Association Rules
- Publication Database
- Frequent Itemset
Basic Definitions
Publication Database: A publication database is a set of transactions where each transaction T is a set of items, and each item has an individual exhibition period.
[Figure: Publication Database D, showing the exhibition periods of items A, B, C and D on a time axis marked 1990, 1992, 1994 and 2001]
Basic Definitions
Exhibition Period: The exhibition period of an item (or set of items) runs from its starting time to the end of the transactions in the database.
Example: In Publication Database D
- Items A and B are exhibited from 1990 to 2001
- Item C is exhibited from 1992 to 2001
- Item D is exhibited from 1994 to 2001
So each item has its own exhibition period.
Basic Definitions
Maximal Common Exhibition Period: The period from the latest exhibition start time of item sets X and Y to their common end time.
Temporal Association Rule: An association rule is considered a temporal association rule if and only if its probability is greater than the minimum support required and its conditional probability is larger than the minimum confidence needed.
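The maximal common exhibition period can be sketched in a few lines of Python. This is only an illustration: the start years are the ones from the Publication Database D example, and the names `START_YEAR`, `END_YEAR` and `mcp` are hypothetical, not from the referenced papers.

```python
# Exhibition start years from the Publication Database D example.
# Every item is assumed to stay on exhibition until END_YEAR.
START_YEAR = {"A": 1990, "B": 1990, "C": 1992, "D": 1994}
END_YEAR = 2001

def mcp(itemset):
    """Maximal common exhibition period of an itemset as (start, end).

    The start is the latest start year among the items; the end is shared.
    """
    return (max(START_YEAR[item] for item in itemset), END_YEAR)

print(mcp("AB"))  # (1990, 2001)
print(mcp("AC"))  # (1992, 2001)
print(mcp("BD"))  # (1994, 2001)
```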
Meaning of Symbols Used
Symbol              Meaning
dbt,n               The partial database of D formed by a continuous region from Pt to Pn
|dbt,n|             Number of transactions in dbt,n
Xt,n                A temporal itemset in partial database dbt,n
MCP(X)              The maximal common exhibition period of an itemset X
(X⇒Y)MCP(XY)        A general temporal association rule
supp((X⇒Y)t,n)      The support of X⇒Y in partial database dbt,n
conf((X⇒Y)t,n)      The confidence of X⇒Y in partial database dbt,n
min_supp            Minimum support threshold required
min_conf            Minimum confidence threshold required
min_leng            Minimum length of exhibition period required
TI                  A maximal temporal itemset
SI                  A corresponding temporal sub-itemset of TI
Association Rule Generation
Let L = {X1, X2, X3, ..., Xn} be a set of items and let D be a set of transactions, where each transaction T is a set of items such that T ⊆ L.
A transaction T is said to support X if and only if X ⊆ T.
Conventionally, an association rule is an implication of the form X ⇒ Y, meaning that the presence of the set X implies the presence of another set Y, where X ⊂ L, Y ⊂ L and X ∩ Y = ∅.
Association Rule Generation
The rule X ⇒ Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y.
The rule has support s in the transaction set D if s% of the transactions in D contain X ∪ Y.
The problem is to mine the association rules whose support and confidence are greater than the corresponding minimum support threshold and minimum confidence threshold.
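As a minimal sketch of these two measures, the functions below compute support and confidence as fractions rather than percentages. The function names and the toy transactions are hypothetical and are not taken from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical toy data, only for illustration.
toy = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
print(support(toy, {"a", "b"}))       # 0.5: two of the four transactions hold a and b
print(confidence(toy, {"a"}, {"b"}))  # two of the three a-transactions also hold b
```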
Traditional Association Rule Mining Algorithms
Conventional association rule mining algorithms work in two steps:
1. Generate all frequent itemsets that satisfy min_supp
2. Generate all association rules that satisfy min_conf using the frequent itemsets
But they have two shortcomings:
- No consideration of the exhibition period of each individual item
- No fair support counting basis for each item
Traditional Association Rule Mining Algorithms: Example
Transaction Database
TID    Itemset
T1     B D
T2     B C D
T3     B C
T4     A D
T5     B C E
T6     D E
T7     A B C
T8     C D E
T9     B C E F
T10    B F
T11    A D
T12    B D F
Traditional Association Rule Mining Algorithms: Example
Assumptions: min_supp = 30%, min_conf = 75%
Traditional mining technique: absolute support threshold SA = ⌈12 × 0.3⌉ = 4
Thus B, C, D, E and BC are frequent itemsets, and C⇒B is a frequent association rule with support 41.67% and confidence 83.33%.
Drawbacks:
- An early publication intrinsically has a higher likelihood of being determined as a frequent itemset
- Some discovered rules may no longer be of interest to the user
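The numbers on this slide can be reproduced with a short brute-force sketch. It simply counts every 1- and 2-itemset against the absolute threshold; it is not Apriori or any algorithm from the referenced papers. T11 is taken as A D, as in the dated copy of the transaction table, and the function name `frequent_itemsets` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# The twelve example transactions T1..T12, one string of items each.
TRANSACTIONS = ["BD", "BCD", "BC", "AD", "BCE", "DE",
                "ABC", "CDE", "BCEF", "BF", "AD", "BDF"]
MIN_SUPP_PCT = 30  # min_supp as an integer percentage

def frequent_itemsets(transactions, min_supp_pct, max_size=2):
    """Count all itemsets up to max_size and keep those meeting the threshold."""
    # Integer ceiling of len(transactions) * min_supp_pct / 100.
    threshold = (len(transactions) * min_supp_pct + 99) // 100
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts["".join(combo)] += 1
    return {s: c for s, c in counts.items() if c >= threshold}

freq = frequent_itemsets(TRANSACTIONS, MIN_SUPP_PCT)
print(freq)                              # B: 8, D: 7, C: 6, BC: 5, E: 4
print(freq["BC"] / len(TRANSACTIONS))    # support of C => B, about 0.4167
print(freq["BC"] / freq["C"])            # confidence of C => B, about 0.8333
```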
General Temporal Association Rules
A general temporal association rule is written (X⇒Y)t,n, where t is the latest exhibition start time of item sets X and Y and n denotes the end time of the publication database.
An association rule X⇒Y is termed frequent if its probability is larger than the minimum support required and its conditional probability is larger than the minimum confidence needed.
Instead of an absolute support threshold for each itemset, a relative minimum support is used: SRA = ⌈|DX| × min_supp⌉, where |DX| is the number of partial transactions in the exhibition period of itemset X.
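General Temporal Association Rules: Example
Transaction Database
Date      TID    Itemset
Jan-01    T1     B D
          T2     B C D
          T3     B C
          T4     A D
Feb-01    T5     B C E
          T6     D E
          T7     A B C
          T8     C D E
Mar-01    T9     B C E F
          T10    B F
          T11    A D
          T12    B D F
Partial databases: db1,3 covers Jan-01 to Mar-01 (T1 to T12), db2,3 covers Feb-01 to Mar-01 (T5 to T12), and db3,3 covers Mar-01 (T9 to T12).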
General Temporal Association Rules: Example
Assumptions: min_supp = 30%, min_conf = 75%
General temporal association rules:
- (C⇒E)2,3 with relative support 37.5% and confidence 75%
- (E⇒C)2,3 with relative support 37.5% and confidence 75%
- (B⇒F)3,3 with relative support 75% and confidence 100%
- (F⇒B)3,3 with relative support 75% and confidence 100%
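The relative support and confidence figures above can be checked with a small sketch that restricts counting to the partial database db(t,n). The month-to-partition mapping follows the dated transaction table; the helper names `partial_db`, `count` and `temporal_rule` are hypothetical.

```python
# Transactions grouped by month: partition 1 = Jan-01, 2 = Feb-01, 3 = Mar-01.
PARTITIONS = {
    1: ["BD", "BCD", "BC", "AD"],
    2: ["BCE", "DE", "ABC", "CDE"],
    3: ["BCEF", "BF", "AD", "BDF"],
}

def partial_db(t, n):
    """Transactions of db(t,n): partitions t through n inclusive."""
    return [tx for p in range(t, n + 1) for tx in PARTITIONS[p]]

def count(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for tx in db if set(itemset) <= set(tx))

def temporal_rule(x, y, t, n):
    """Relative support and confidence of (X => Y) over db(t,n)."""
    db = partial_db(t, n)
    both = count(db, x + y)
    return both / len(db), both / count(db, x)

print(temporal_rule("C", "E", 2, 3))  # (0.375, 0.75)
print(temporal_rule("B", "F", 3, 3))  # (0.75, 1.0)
```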
Progressive Partition Miner
Progressive Partition Miner (PPM) is devised to mine general temporal association rules (X⇒Y)t,n.
The basic idea of PPM is to first partition the publication database according to the exhibition periods of the items, and then to progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics.
Progressive Partition Miner: Flow Chart
1. Partition the database based on exhibition periods
2. Produce candidate 2-TIs (1st scan of the database)
3. Use candidate 2-TIs to produce candidate k-TIs and k-SIs
4. Generate frequent itemsets (2nd scan of the database)
5. Rule generation
Progressive Partition Miner: Example
Transaction database, with min_supp = 30% and min_conf = 75%.
Partition    Date        TID    Itemset
P1           Jan-01      T1     B D
                         T2     B C D
                         T3     B C
                         T4     A D
P2           Feb-01      T5     B C E
                         T6     D E
                         T7     A B C
                         T8     C D E
P3           Mar-01      T9     B C E F
                         T10    B F
                         T11    A D
                         T12    B D F
Progressive Partition Miner: Example (First Scan)
min_supp = 30% and min_conf = 75%.

P1: with four transactions in P1, the partial minimal support is ⌈4 × 0.3⌉ = 2.
C2    Start    Count
BD    1        2
BC    1        2
CD    1        1
AD    1        1

P1 + P2: the support thresholds are α = ⌈(4 + 4) × 0.3⌉ = 3 for itemsets carried over from P1 and β = ⌈4 × 0.3⌉ = 2 for itemsets that start in P2.
C2    Start    Count
BD    1        2
BC    1        4
CE    2        2
DE    2        2
AB    2        1
AC    2        1
CD    2        1
BE    2        1
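Progressive Partition Miner: Example (First Scan, continued)
P1 + P2 + P3:
C2    Start    Count
BC    1        5
CE    2        3
DE    2        2
DF    3        1
BE    3        1
BF    3        3
CF    3        1
EF    3        1
AD    3        1
BD    3        1

Candidate itemsets after the first scan:
Candidate itemset    Count
BC                   5
BF                   3
CE                   3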
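The progressive filtering of candidate 2-itemsets across P1, P2 and P3 can be sketched as follows. This is a simplified reading of the first scan as presented on these slides, not the authors' implementation: each candidate keeps its start partition and running count, and after each partition it survives only if its count meets the ceiling of min_supp times the number of transactions from its start partition onward. The names `first_scan` and `ceil_pct` are hypothetical.

```python
from itertools import combinations

# Partitions P1, P2, P3 of the example transaction database.
PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],
    ["BCE", "DE", "ABC", "CDE"],
    ["BCEF", "BF", "AD", "BDF"],
]
MIN_SUPP_PCT = 30  # min_supp as an integer percentage

def ceil_pct(n, pct):
    """Integer ceiling of n * pct / 100."""
    return (n * pct + 99) // 100

def first_scan(partitions, min_supp_pct):
    """Return {2-itemset: (start_partition, count)} after progressive filtering."""
    sizes = [len(p) for p in partitions]
    candidates = {}  # itemset -> [start partition, accumulated count]
    for i, part in enumerate(partitions, start=1):
        # Accumulate counts; unseen (or previously pruned) pairs start at partition i.
        for tx in part:
            for pair in combinations(sorted(tx), 2):
                key = "".join(pair)
                if key in candidates:
                    candidates[key][1] += 1
                else:
                    candidates[key] = [i, 1]
        # Keep a pair only if it is frequent over partitions start..i.
        survivors = {}
        for key, (start, cnt) in candidates.items():
            if cnt >= ceil_pct(sum(sizes[start - 1:i]), min_supp_pct):
                survivors[key] = [start, cnt]
        candidates = survivors
    return {key: tuple(val) for key, val in candidates.items()}

print(first_scan(PARTITIONS, MIN_SUPP_PCT))
# {'BC': (1, 5), 'CE': (2, 3), 'BF': (3, 3)}
```

A pair that is pruned and later reappears restarts with a new start partition, which is why BD shows up again with start 3 in the P1 + P2 + P3 step.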
Progressive Partition Miner: Example (Candidate Itemsets)
Candidate 2-itemsets: BC1,3, CE2,3, BF3,3
Candidate 1-itemsets: B1,3, C1,3, C2,3, E2,3, B3,3, F3,3

      Candidate itemset    Count
C1    B1,3                 8
      B3,3                 3
      C1,3                 6
      C2,3                 4
      E2,3                 4
      F3,3                 3
C2    BC1,3                5
      BF3,3                3
      CE2,3                3
Progressive Partition Miner: Example (Second Scan)
Now count the support of each itemset over its given range.

      Frequent itemset    Count
L1    B1,3                8
      B3,3                3
      C1,3                6
      C2,3                4
      E2,3                4
      F3,3                3
L2    BC1,3               5
      BF3,3               3
      CE2,3               3

After the 2nd scan of database D, the frequent itemsets (relative minimum support = 30%) are:
B1,3, C1,3, C2,3, E2,3, B3,3, F3,3, BC1,3, CE2,3, BF3,3
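The counting step of the second scan can be sketched as below. For clarity the sketch re-reads the relevant partitions once per candidate instead of counting all candidates in a single pass, so it illustrates the result rather than the efficiency of PPM. The candidate list is copied from the example; the function name `second_scan` is hypothetical.

```python
# Partitions P1, P2, P3 of the example transaction database.
PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],
    ["BCE", "DE", "ABC", "CDE"],
    ["BCEF", "BF", "AD", "BDF"],
]
MIN_SUPP_PCT = 30

# (itemset, start partition); every candidate ends at the last partition.
CANDIDATES = [("B", 1), ("C", 1), ("C", 2), ("E", 2), ("B", 3), ("F", 3),
              ("BC", 1), ("CE", 2), ("BF", 3)]

def second_scan(partitions, candidates, min_supp_pct):
    """Count each candidate over db(start, n) and keep the frequent ones."""
    frequent = {}
    for itemset, start in candidates:
        db = [tx for part in partitions[start - 1:] for tx in part]
        cnt = sum(1 for tx in db if set(itemset) <= set(tx))
        # Relative threshold: integer ceiling of |db(start, n)| * min_supp.
        if cnt >= (len(db) * min_supp_pct + 99) // 100:
            frequent[(itemset, start)] = cnt
    return frequent

for (itemset, start), cnt in second_scan(PARTITIONS, CANDIDATES, MIN_SUPP_PCT).items():
    print(f"{itemset}{start},3: {cnt}")
```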
General Temporal Association Rules: Rule Generation
Example: (B⇒C)1,3
Total transactions = 12
Support count = 5
Thus support = 5/12 = 41.67%
Confidence(X⇒Y) = Support(X ∪ Y) / Support(X)
Confidence(B⇒C) = 5/8 = 62.5%
General Temporal Association Rules: Rule Generation
min_supp = 30% and min_conf = 75%.
Rule          Support    Confidence
(B⇒C)1,3      41.67%     62.50%
(C⇒B)1,3      41.67%     83.33%
(B⇒F)3,3      75.00%     100.00%
(F⇒B)3,3      75.00%     100.00%
(C⇒E)2,3      37.50%     75.00%
(E⇒C)2,3      37.50%     75.00%
General Temporal Association Rules: Rule Generation, Pruning
Rule          Support    Confidence
(C⇒B)1,3      41.67%     83.33%
(B⇒F)3,3      75.00%     100.00%
(F⇒B)3,3      75.00%     100.00%
(C⇒E)2,3      37.50%     75.00%
(E⇒C)2,3      37.50%     75.00%
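Rule generation and pruning can be sketched from the frequent itemset counts of the example. One assumption is made explicit here: a rule is kept when its confidence is greater than or equal to min_conf, since the rules with exactly 75% confidence survive pruning in the table. The counts are copied from the example, and the names `FREQUENT`, `DB_SIZE` and `generate_rules` are hypothetical.

```python
# Frequent itemset counts from the example, keyed by (itemset, start partition).
FREQUENT = {
    ("B", 1): 8, ("C", 1): 6, ("C", 2): 4, ("E", 2): 4, ("B", 3): 3,
    ("F", 3): 3, ("BC", 1): 5, ("CE", 2): 3, ("BF", 3): 3,
}
DB_SIZE = {1: 12, 2: 8, 3: 4}  # |db(t,3)| for t = 1, 2, 3
MIN_CONF = 0.75

def generate_rules(frequent, db_size, min_conf):
    """Return (X, Y, start, support, confidence) for rules meeting min_conf."""
    rules = []
    for (itemset, start), cnt in frequent.items():
        if len(itemset) != 2:
            continue
        # Try both directions of the 2-itemset: X => Y and Y => X.
        for x, y in (itemset, itemset[::-1]):
            conf = cnt / frequent[(x, start)]
            if conf >= min_conf:  # assumption: ">=" rather than strictly ">"
                rules.append((x, y, start, cnt / db_size[start], conf))
    return rules

for x, y, start, supp, conf in generate_rules(FREQUENT, DB_SIZE, MIN_CONF):
    print(f"({x} => {y}){start},3  support={supp:.2%}  confidence={conf:.2%}")
```

With these inputs (B⇒C)1,3 is dropped at 62.5% confidence and the other five rules are kept.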
Limitations of PPM
- The database is partitioned by exhibition period rather than by database size, so the partitions are not uniform.
- A temporal database is updated continually, but PPM does not handle the data arriving in upcoming iterations.
Thanks