Amer Zaheer PC Mohammad Ali Jinnah University, Islamabad


1 Progressive Partition Miner: An Efficient Algorithm for Mining General Temporal Association Rules
Amer Zaheer (PC101005), Mohammad Ali Jinnah University, Islamabad

2 Agenda
- References
- Basic Definitions
- Association Rule Generation
- Traditional Association Rule Mining Algorithms
- General Temporal Association Rules
- Progressive Partition Miner
- Limitations of PPM

3 References
[1] C. H. Lee and M. S. Chen, "Progressive Partition Miner: An Efficient Algorithm for Mining General Temporal Association Rules."
[2] C. H. Lee, C. Lin and M. Chen, "On Mining General Temporal Association Rules in a Publication Database."
[3] C. Chang, M. Chen and C. Lee, "Mining General Temporal Association Rules for Items with Different Exhibition Periods."

4 Basic Definitions
- Exhibition Period
- Maximal Common Exhibition Period
- Temporal Association Rules
- Publication Database
- Frequent Itemset

5 Basic Definitions: Publication Database
A publication database is a set of transactions in which each transaction T is a set of items, and each item has its own individual exhibition period.
[Figure: Publication database D, showing items A, B, C and D on a timeline with marks at 1990, 1992, 1994 and 2001]

6 Basic Definitions: Exhibition Period
The exhibition period of an item (or set of items) runs from the time it starts to be exhibited until the end time of the transaction database.
Example: in publication database D, items A and B are exhibited from 1990 to 2001, item C from 1992 to 2001 and item D from 1994 to 2001. Each item therefore has its own exhibition period.

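7 Basic Definitions: Maximal Common Exhibition Period
The maximal common exhibition period of itemsets X and Y runs from the latest exhibition-start time among the items of X and Y to their common end time. For example, in D the maximal common exhibition period of items A and C is 1992 to 2001.
Temporal Association Rule
An association rule is considered a temporal association rule if and only if, within its maximal common exhibition period, its support (probability) is no less than the minimum support required and its confidence (conditional probability) is no less than the minimum confidence needed.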
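A minimal Python sketch of the exhibition period and maximal common exhibition period definitions, assuming the exhibition-start years of publication database D (A and B in 1990, C in 1992, D in 1994) and a common end time of 2001. The names `EXHIBITION_START`, `DB_END`, `exhibition_period` and `mcp` are hypothetical and only illustrate the definitions; they are not taken from the PPM papers.

```python
# Hypothetical exhibition-start years for the items of publication database D
# (read off the example); every item is exhibited until the end time 2001.
EXHIBITION_START = {"A": 1990, "B": 1990, "C": 1992, "D": 1994}
DB_END = 2001


def exhibition_period(item):
    """Exhibition period of a single item: its start time to the end of D."""
    return EXHIBITION_START[item], DB_END


def mcp(*itemsets):
    """Maximal common exhibition period of the union of the given itemsets.

    It starts at the latest exhibition-start time among all involved items
    and ends at the common end time of the database.
    """
    items = set().union(*itemsets)
    start = max(EXHIBITION_START[item] for item in items)
    return start, DB_END


if __name__ == "__main__":
    print(exhibition_period("C"))  # (1992, 2001)
    print(mcp({"A"}, {"C"}))       # (1992, 2001)
    print(mcp({"A", "B"}, {"D"}))  # (1994, 2001)
```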

8 Meaning of Symbols Used
$db^{i,n}$: the partial database of D formed by the contiguous partitions from $P_i$ to $P_n$
$|db^{i,n}|$: number of transactions in $db^{i,n}$
$X^{i,n}$: a temporal itemset in partial database $db^{i,n}$
$MCP(X)$: the maximal common exhibition period of itemset X
$(X \Rightarrow Y)^{MCP(XY)}$: a general temporal association rule
$supp((X \Rightarrow Y)^{t,n})$: the support of $X \Rightarrow Y$ in partial database $db^{t,n}$
$conf((X \Rightarrow Y)^{t,n})$: the confidence of $X \Rightarrow Y$ in partial database $db^{t,n}$
min_supp: minimum support threshold required
min_conf: minimum confidence threshold required
min_leng: minimum length of exhibition period required
TI: a maximal temporal itemset
SI: a corresponding temporal sub-itemset of TI

9 Association Rules Generation
Let $L = \{x_1, x_2, \ldots, x_n\}$ be a set of items, and let D be a set of transactions, where each transaction T is a set of items such that $T \subseteq L$. A transaction T is said to support an itemset X if and only if $X \subseteq T$. Conventionally, an association rule is an implication of the form $X \Rightarrow Y$, meaning that the presence of the set X implies the presence of another set Y, where $X \subset L$, $Y \subset L$ and $X \cap Y = \emptyset$.

10 Association Rule Generation
The rule $X \Rightarrow Y$ holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. The rule has support s in D if s% of the transactions in D contain $X \cup Y$. The problem of mining association rules is to find all rules whose support and confidence are no less than the minimum support threshold and the minimum confidence threshold, respectively.
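The support and confidence definitions can be sketched in Python as follows. Itemsets are written as strings of single-letter items, the database `D` is the 12-transaction example used later in these slides, and the helper names `support` and `confidence` are illustrative assumptions. Fractions keep the arithmetic exact, so threshold comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """conf(X => Y) = supp(X u Y) / supp(X); assumes X occurs at least once."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Example database: each string is one transaction of single-letter items.
D = ["BD", "BCD", "BC", "AD", "BCE", "DE",
     "ABC", "CDE", "BCEF", "BF", "AD", "BDF"]

if __name__ == "__main__":
    print(support("BC", D))          # 5/12
    print(confidence("C", "B", D))   # 5/6
```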

11 Traditional Association Rule Mining Algorithms
Conventional association rule mining algorithms work in two steps:
1. Generate all frequent itemsets that satisfy min_supp.
2. Generate all association rules that satisfy min_conf from the frequent itemsets.
However, they have two shortcomings:
- They do not consider the exhibition period of each individual item.
- They lack a fair support-counting basis for each item.

12 Traditional Association Rule Mining Algorithms: Example
Transaction database
TID   Itemset
T1    B D
T2    B C D
T3    B C
T4    A D
T5    B C E
T6    D E
T7    A B C
T8    C D E
T9    B C E F
T10   B F
T11   A D
T12   B D F

13 Traditional Association Rule Mining Algorithms: Example
Assumptions: min_supp = 30%, min_conf = 75%.
With the traditional mining technique, the absolute support threshold is $S_A = \lceil 12 \times 0.3 \rceil = 4$. Thus B, C, D, E and BC are frequent itemsets, and C ⇒ B is a frequent association rule with support 41.67% and confidence 83.33%.
Drawbacks:
- An early publication intrinsically has a higher likelihood of being determined to be a frequent itemset.
- Some discovered rules may have expired, i.e. they may no longer be of interest to users.
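A minimal brute-force sketch of the two traditional steps on the example database, assuming single-letter items and the absolute threshold $S_A = \lceil |D| \times min\_supp \rceil$. It enumerates itemsets level by level instead of using a real candidate-generation scheme such as Apriori, so it is only suitable for toy data; the function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations

D = ["BD", "BCD", "BC", "AD", "BCE", "DE",
     "ABC", "CDE", "BCEF", "BF", "AD", "BDF"]
MIN_SUPP = Fraction(30, 100)
MIN_CONF = Fraction(75, 100)


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def frequent_itemsets(transactions, min_supp):
    """Step 1: all itemsets whose count reaches S_A = ceil(|D| * min_supp)."""
    s_a = math.ceil(len(transactions) * min_supp)  # ceil(12 * 0.3) = 4
    items = sorted(set("".join(transactions)))
    frequent = {}
    for k in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, k):
            c = count(combo, transactions)
            if c >= s_a:
                level["".join(combo)] = c
        if not level:
            break  # no frequent k-itemset, so no frequent (k+1)-itemset either
        frequent.update(level)
    return frequent


def association_rules(frequent, min_conf):
    """Step 2: rules X => Y from each frequent itemset, kept if conf >= min_conf."""
    rules = []
    for itemset, c in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                x = "".join(lhs)
                y = "".join(i for i in itemset if i not in lhs)
                conf = Fraction(c, frequent[x])  # count(X u Y) / count(X)
                if conf >= min_conf:
                    rules.append((x, y, conf))
    return rules


if __name__ == "__main__":
    freq = frequent_itemsets(D, MIN_SUPP)
    print(freq)                               # {'B': 8, 'C': 6, 'D': 7, 'E': 4, 'BC': 5}
    print(association_rules(freq, MIN_CONF))  # [('C', 'B', Fraction(5, 6))]
```

On this data the sketch reports the same frequent itemsets as the slide and C ⇒ B (confidence 5/6) as the only rule; B ⇒ C is rejected with confidence 5/8.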

14 General Temporal Association Rules
A general temporal association rule has the form $(X \Rightarrow Y)^{t,n}$, where t is the latest exhibition-start time of itemsets X and Y, and n denotes the end time of the publication database. The rule $X \Rightarrow Y$ is termed frequent if its support (probability) is no less than the minimum support required and its confidence (conditional probability) is no less than the minimum confidence needed. Instead of one absolute support threshold for every itemset, a relative minimum support is used: $S_{RA} = \lceil |D_X| \times min\_supp \rceil$, where $|D_X|$ is the number of transactions in the partial database covering the exhibition period of itemset X.
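A small sketch of the relative threshold $S_{RA}$, assuming the three monthly partitions of the example database (Jan-01, Feb-01, Mar-01) and assuming that an item's exhibition starts in the first partition in which it occurs, since the slides give no explicit publication dates for these items. The function names are hypothetical.

```python
import math
from fractions import Fraction

# The example database split into monthly partitions P1, P2 and P3.
PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],    # P1: Jan-01
    ["BCE", "DE", "ABC", "CDE"],  # P2: Feb-01
    ["BCEF", "BF", "AD", "BDF"],  # P3: Mar-01
]


def exhibition_start(item, partitions):
    """Assumed: index (1-based) of the first partition in which `item` occurs."""
    for index, part in enumerate(partitions, start=1):
        if any(item in t for t in part):
            return index
    raise ValueError(f"item {item!r} never occurs")


def relative_min_support(itemset, partitions, min_supp):
    """Return (t, |D_X|, S_RA) with S_RA = ceil(|D_X| * min_supp).

    t is the latest exhibition start among the items of `itemset`, and D_X is
    the partial database from partition t to the last partition.
    """
    t = max(exhibition_start(item, partitions) for item in itemset)
    d_x = sum(len(part) for part in partitions[t - 1:])
    return t, d_x, math.ceil(d_x * min_supp)


if __name__ == "__main__":
    for x in ("BC", "CE", "BF"):
        print(x, relative_min_support(x, PARTITIONS, Fraction(30, 100)))
```

With min_supp = 30% this gives thresholds of 4, 3 and 2 transactions for itemsets whose exhibition starts in P1, P2 and P3 respectively.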

15 General Temporal Association Rules: Example
Transaction database
Date     TID   Itemset
Jan-01   T1    B D
         T2    B C D
         T3    B C
         T4    A D
Feb-01   T5    B C E
         T6    D E
         T7    A B C
         T8    C D E
Mar-01   T9    B C E F
         T10   B F
         T11   A D
         T12   B D F
Partial databases: $db^{1,3}$ covers Jan-01 to Mar-01, $db^{2,3}$ covers Feb-01 to Mar-01 and $db^{3,3}$ covers Mar-01.

16 General Temporal Association Rules: Example
Assumptions: min_supp = 30%, min_conf = 75%.
General temporal association rules:
- $(C \Rightarrow B)^{1,3}$ with relative support 41.67% and confidence 83.33%
- $(C \Rightarrow E)^{2,3}$ with relative support 37.5% and confidence 75%
- $(E \Rightarrow C)^{2,3}$ with relative support 37.5% and confidence 75%
- $(B \Rightarrow F)^{3,3}$ with relative support 75% and confidence 100%
- $(F \Rightarrow B)^{3,3}$ with relative support 75% and confidence 100%
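These rules can be reproduced with a brute-force check of every 2-itemset, again assuming that an item's exhibition starts in the first partition where it appears. Support and confidence are both counted in the rule's partial database $db^{t,n}$. This is not PPM, only a reference computation restricted to rules with one item on each side, and the function name is illustrative. Its output includes $(C \Rightarrow B)^{1,3}$ with support 41.67% and confidence 83.33%, matching the rule-generation slides.

```python
from fractions import Fraction
from itertools import combinations

PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],    # P1: Jan-01
    ["BCE", "DE", "ABC", "CDE"],  # P2: Feb-01
    ["BCEF", "BF", "AD", "BDF"],  # P3: Mar-01
]
MIN_SUPP = Fraction(30, 100)
MIN_CONF = Fraction(75, 100)


def temporal_rules_2(partitions, min_supp, min_conf):
    """Brute-force rules (X => Y)^{t,n} with one item per side; assumes min_supp > 0."""
    n = len(partitions)
    items = sorted({i for part in partitions for t in part for i in t})
    # Assumption: an item's exhibition starts in the first partition where it occurs.
    start = {i: next(k for k, part in enumerate(partitions, start=1)
                     if any(i in t for t in part))
             for i in items}
    rules = []
    for x, y in combinations(items, 2):
        t = max(start[x], start[y])                              # latest exhibition start
        db = [tr for part in partitions[t - 1:] for tr in part]  # db^{t,n}
        both = sum(1 for tr in db if x in tr and y in tr)
        supp = Fraction(both, len(db))
        if supp < min_supp:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            conf = Fraction(both, sum(1 for tr in db if lhs in tr))
            if conf >= min_conf:
                rules.append((lhs, rhs, t, n, supp, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, t, n, supp, conf in temporal_rules_2(PARTITIONS, MIN_SUPP, MIN_CONF):
        print(f"({lhs} => {rhs})^{{{t},{n}}}  supp={float(supp):.2%}  conf={float(conf):.2%}")
```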

17 Progressive Partition Miner
Progressive Partition Miner (PPM) was devised to mine general temporal association rules $(X \Rightarrow Y)^{t,n}$. The basic idea of PPM is to first partition the publication database according to the exhibition periods of items, and then progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics.

18 Progressive Partition Miner: Flow Chart
1. Partition the database based on exhibition periods.
2. First database scan: produce candidate 2-TIs.
3. Use the candidate 2-TIs to produce candidate k-TIs and k-SIs.
4. Second database scan: generate the frequent itemsets (TIs and SIs).
5. Rule generation.

19 Progressive Partition Miner: Example
Transaction database, with min_supp = 30% and min_conf = 75%:
Partition  Date     TID   Itemset
P1         Jan-01   T1    B D
                    T2    B C D
                    T3    B C
                    T4    A D
P2         Feb-01   T5    B C E
                    T6    D E
                    T7    A B C
                    T8    C D E
P3         Mar-01   T9    B C E F
                    T10   B F
                    T11   A D
                    T12   B D F

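20 Progressive Partition Miner: Example (first scan)
min_supp = 30% and min_conf = 75%.
After P1: there are four transactions in P1, so the partial minimum support is $\lceil 4 \times 0.3 \rceil = 2$.
C2    Start  Count
BD    1      2
BC    1      2
CD    1      1   (pruned)
AD    1      1   (pruned)
After P1 + P2: the support thresholds are $\alpha = \lceil (4 + 4) \times 0.3 \rceil = 3$ for itemsets starting in P1 and $\beta = \lceil 4 \times 0.3 \rceil = 2$ for itemsets starting in P2.
C2    Start  Count
BD    1      2   (pruned)
BC    1      4
CE    2      2
DE    2      2
AB    2      1   (pruned)
AC    2      1   (pruned)
CD    2      1   (pruned)
BE    2      1   (pruned)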

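21 Progressive Partition Miner: Example (first scan, continued)
After P1 + P2 + P3: the thresholds are $\lceil 12 \times 0.3 \rceil = 4$ for itemsets starting in P1, $\lceil 8 \times 0.3 \rceil = 3$ for itemsets starting in P2 and $\lceil 4 \times 0.3 \rceil = 2$ for itemsets starting in P3.
C2    Start  Count
BC    1      5
CE    2      3
DE    2      2   (pruned)
DF    3      1   (pruned)
BE    3      1   (pruned)
BF    3      3
CF    3      1   (pruned)
EF    3      1   (pruned)
AD    3      1   (pruned)
BD    3      1   (pruned)
Candidate 2-itemsets after the first scan:
Itemset  Count
BC       5
BF       3
CE       3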
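The first scan in slides 20 and 21 can be sketched as follows. This is a simplified reading of the progressive counting step, not the published PPM implementation: each candidate 2-itemset records the partition in which it was (re)introduced and its count since then, and after each partition it is dropped if that count is below $\lceil |db^{start,h}| \times min\_supp \rceil$. The function name and data layout are assumptions.

```python
import math
from fractions import Fraction
from itertools import combinations

PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],    # P1
    ["BCE", "DE", "ABC", "CDE"],  # P2
    ["BCEF", "BF", "AD", "BDF"],  # P3
]


def ppm_first_scan(partitions, min_supp):
    """Progressively count candidate 2-itemsets partition by partition.

    Each candidate maps to [start, count], where start is the 1-based
    partition in which it was (re)introduced. Returns a snapshot of the
    candidate table after every partition.
    """
    sizes = [len(part) for part in partitions]
    c2 = {}
    snapshots = []
    for h, part in enumerate(partitions, start=1):
        for tr in part:
            for pair in combinations(sorted(tr), 2):
                key = "".join(pair)
                if key in c2:
                    c2[key][1] += 1      # keep accumulating since its start
                else:
                    c2[key] = [h, 1]     # new (or previously pruned) candidate
        # Drop candidates whose count since `start` is below ceil(|db^{start,h}| * min_supp).
        for key in list(c2):
            start, cnt = c2[key]
            if cnt < math.ceil(sum(sizes[start - 1:h]) * min_supp):
                del c2[key]
        snapshots.append({key: tuple(v) for key, v in c2.items()})
    return snapshots


if __name__ == "__main__":
    for h, table in enumerate(ppm_first_scan(PARTITIONS, Fraction(30, 100)), start=1):
        print(f"after P{h}:", table)
```

The three snapshots correspond to the P1, P1 + P2 and P1 + P2 + P3 tables; the last one holds BC (start 1, count 5), CE (start 2, count 3) and BF (start 3, count 3).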

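22 Progressive Partition Miner: Example
Candidate 2-itemsets: $BC^{1,3}$, $CE^{2,3}$, $BF^{3,3}$
Candidate 1-itemsets: $B^{1,3}$, $C^{1,3}$, $C^{2,3}$, $E^{2,3}$, $B^{3,3}$, $F^{3,3}$
Level  Candidate itemset  Count
C1     $B^{1,3}$          8
C1     $B^{3,3}$          3
C1     $C^{1,3}$          6
C1     $C^{2,3}$          4
C1     $E^{2,3}$          4
C1     $F^{3,3}$          3
C2     $BC^{1,3}$         5
C2     $BF^{3,3}$         3
C2     $CE^{2,3}$         3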
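A sketch of how the candidate sub-itemsets (SIs) on this slide can be derived from the candidate TIs, assuming each SI simply inherits the exhibition range (t, n) of its TI. Only this derivation is shown; generation of longer candidate k-TIs is omitted, and the names are hypothetical.

```python
from itertools import combinations

# Candidate 2-TIs from the first scan: itemset -> (t, n).
CANDIDATE_TIS = {"BC": (1, 3), "CE": (2, 3), "BF": (3, 3)}


def sub_itemsets(candidate_tis):
    """Every non-empty proper subset of each TI, tagged with the TI's range (t, n)."""
    sis = set()
    for itemset, (t, n) in candidate_tis.items():
        for k in range(1, len(itemset)):
            for sub in combinations(itemset, k):
                sis.add(("".join(sub), t, n))
    return sis


if __name__ == "__main__":
    for name, t, n in sorted(sub_itemsets(CANDIDATE_TIS)):
        print(f"{name}^{{{t},{n}}}")
```

Because the same item can belong to TIs with different ranges, C appears twice, as $C^{1,3}$ (from BC) and $C^{2,3}$ (from CE), and each copy must be counted in its own partial database.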

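23 Progressive Partition Miner: Example
In the second scan of database D, the support of each candidate itemset is counted within its own exhibition range.
Level  Frequent itemset   Count
L1     $B^{1,3}$          8
L1     $B^{3,3}$          3
L1     $C^{1,3}$          6
L1     $C^{2,3}$          4
L1     $E^{2,3}$          4
L1     $F^{3,3}$          3
L2     $BC^{1,3}$         5
L2     $BF^{3,3}$         3
L2     $CE^{2,3}$         3
After the second scan of D, the frequent itemsets (relative support = 30%) are $B^{1,3}$, $C^{1,3}$, $C^{2,3}$, $E^{2,3}$, $B^{3,3}$, $F^{3,3}$, $BC^{1,3}$, $CE^{2,3}$ and $BF^{3,3}$.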
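The second scan can be sketched as counting every candidate $X^{t,n}$ only in its own partial database $db^{t,n}$ and comparing the count with $\lceil |db^{t,n}| \times min\_supp \rceil$. The partition layout, candidate list and function name are assumptions made for illustration.

```python
import math
from fractions import Fraction

PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],    # P1
    ["BCE", "DE", "ABC", "CDE"],  # P2
    ["BCEF", "BF", "AD", "BDF"],  # P3
]
# Candidates (itemset, t, n): the TIs and their SIs from the first scan.
CANDIDATES = [
    ("B", 1, 3), ("C", 1, 3), ("C", 2, 3), ("E", 2, 3), ("B", 3, 3), ("F", 3, 3),
    ("BC", 1, 3), ("CE", 2, 3), ("BF", 3, 3),
]


def second_scan(candidates, partitions, min_supp):
    """Count each candidate in its own db^{t,n}; keep it if count >= ceil(|db^{t,n}| * min_supp)."""
    frequent = {}
    for itemset, t, n in candidates:
        db = [tr for part in partitions[t - 1:n] for tr in part]
        cnt = sum(1 for tr in db if set(itemset) <= set(tr))
        if cnt >= math.ceil(len(db) * min_supp):
            frequent[(itemset, t, n)] = cnt
    return frequent


if __name__ == "__main__":
    for (itemset, t, n), cnt in second_scan(CANDIDATES, PARTITIONS, Fraction(30, 100)).items():
        print(f"{itemset}^{{{t},{n}}}: {cnt}")
```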

24 General Temporal Association: Rule Generation
For $(B \Rightarrow C)^{1,3}$: total transactions = 12 and support count = 5, so support = 5/12 = 41.67%.
$conf(X \Rightarrow Y) = supp(X \cup Y) / supp(X)$, so $conf(B \Rightarrow C) = 5/8 = 62.5\%$.

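25 General Temporal Association: Rule Generation
min_supp = 30% and min_conf = 75%.
Rule                        Support   Confidence
$(B \Rightarrow C)^{1,3}$   41.67%    62.50%
$(C \Rightarrow B)^{1,3}$   41.67%    83.33%
$(B \Rightarrow F)^{3,3}$   75.00%    100.00%
$(F \Rightarrow B)^{3,3}$   75.00%    100.00%
$(C \Rightarrow E)^{2,3}$   37.50%    75.00%
$(E \Rightarrow C)^{2,3}$   37.50%    75.00%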

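26 General Temporal Association: Rule Generation, Pruning
Rules whose confidence is below min_conf = 75% are pruned, which removes $(B \Rightarrow C)^{1,3}$ (confidence 62.50%). The remaining rules are:
Rule                        Support   Confidence
$(C \Rightarrow B)^{1,3}$   41.67%    83.33%
$(B \Rightarrow F)^{3,3}$   75.00%    100.00%
$(F \Rightarrow B)^{3,3}$   75.00%    100.00%
$(C \Rightarrow E)^{2,3}$   37.50%    75.00%
$(E \Rightarrow C)^{2,3}$   37.50%    75.00%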
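Rule generation and pruning over the frequent itemsets can be sketched as follows, assuming the counts from the second scan and the sizes of $db^{1,3}$, $db^{2,3}$ and $db^{3,3}$ (12, 8 and 4 transactions). Confidence is computed inside the rule's own partial database, as in $conf(B \Rightarrow C) = 5/8$; the names `FREQUENT`, `DB_SIZE` and `generate_rules` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Frequent itemsets after the second scan: (itemset, t, n) -> count in db^{t,n}.
FREQUENT = {
    ("B", 1, 3): 8, ("C", 1, 3): 6, ("C", 2, 3): 4, ("E", 2, 3): 4,
    ("B", 3, 3): 3, ("F", 3, 3): 3,
    ("BC", 1, 3): 5, ("CE", 2, 3): 3, ("BF", 3, 3): 3,
}
DB_SIZE = {(1, 3): 12, (2, 3): 8, (3, 3): 4}  # |db^{t,n}|


def generate_rules(frequent, db_size, min_conf):
    """Split candidate rules (X => Y)^{t,n} into kept and pruned lists by min_conf."""
    kept, pruned = [], []
    for (itemset, t, n), cnt in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and Y
        supp = Fraction(cnt, db_size[(t, n)])
        for k in range(1, len(itemset)):
            for lhs in combinations(itemset, k):
                x = "".join(lhs)
                y = "".join(i for i in itemset if i not in lhs)
                # conf = supp(X u Y) / supp(X), both measured in db^{t,n}
                conf = Fraction(cnt, frequent[(x, t, n)])
                rule = (x, y, t, n, supp, conf)
                if conf >= min_conf:
                    kept.append(rule)
                else:
                    pruned.append(rule)
    return kept, pruned


if __name__ == "__main__":
    kept, pruned = generate_rules(FREQUENT, DB_SIZE, Fraction(75, 100))
    for x, y, t, n, supp, conf in kept:
        print(f"({x} => {y})^{{{t},{n}}}  {float(supp):.2%}  {float(conf):.2%}")
    print("pruned:", [(x, y, t, n) for x, y, t, n, _, _ in pruned])
```

The comparison uses `>=`, so rules whose confidence equals min_conf, such as $(C \Rightarrow E)^{2,3}$ at exactly 75%, are kept, as in the table above.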

27 Limitations
- PPM partitions the database by exhibition period rather than by database size, so the partitions are not uniform in size.
- A temporal database is updated continually, but PPM does not handle the incoming updates incrementally.

28 Thanks

