Published byἈράχνη Μιχαηλίδης Modified over 6 years ago
1
Progressive Weighted Miner: An Efficient Method for Time-Constraint Mining
Chang-Hung Lee, Jian Chih Ou, and Ming Syan Chen
Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’03)
Adviser: Jia-Ling Koh
Speaker: Yu-ting Kung
2
Introduction
Introduce a weighted model of transaction-weighted association rules in a time-variant database.
Propose an efficient Progressive Weighted Miner (PWM) algorithm to produce weighted association rules.
3
Introduction (Cont.)
Progressive Weighted Miner (PWM)
The importance of each transaction period is first reflected by a weight that the user assigns to that period.
PWM partitions the time-variant database according to these weighted periods and performs weighted mining on the partitions.
4
Problem Description
Weighted minimum support
Weighted support of X: the weighted support ratio of an itemset X
Quantities used in these definitions:
the amount of partial transactions (the number of transactions in each partition Pi)
the corresponding weight value W(Pi), given by the “weighting function”
the number of transactions in partition Pi that contain itemset X
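The slide names these quantities, but its formulas are not present in this transcript. The following is a reconstruction that agrees with the worked example later in the deck (Min_SW = {4*0.5 + 4*1 + 4*2} * 0.3 and supports computed over 4*0.5 + 4*1 + 4*2). The symbols |P_i|, N_{P_i}(X) and S^W(X) are notation assumed here, not taken from the slide.

```latex
% |P_i|       : number of transactions in partition P_i (assumed symbol)
% W(P_i)      : weight assigned to P_i by the weighting function
% N_{P_i}(X)  : number of transactions in P_i that contain itemset X (assumed symbol)
\[
  \mathrm{Min\_S}^{W} = \Bigl\{ \sum_{i} |P_i| \times W(P_i) \Bigr\} \times \mathit{min\_sup}
\]
\[
  S^{W}(X) = \frac{\sum_{i} N_{P_i}(X) \times W(P_i)}{\sum_{i} |P_i| \times W(P_i)}
\]
```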
5
Problem Description (Cont.)
Weighted confidence
Frequent weighted association rule (X=>Y)W: the rule satisfies the weighted minimum support, and its weighted confidence satisfies min_conf.
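The formulas for this slide are likewise absent from the transcript. A reconstruction consistent with the later worked example, where the confidence of C => B is computed as 5/6 (the weighted count of {BC} over that of {C}), might read as follows. The symbol S^W(X) for the weighted support ratio of X is assumed notation, and the use of "at least" rather than "strictly greater than" is also an assumption.

```latex
% S^W(X): weighted support ratio of itemset X (assumed symbol)
\[
  \mathit{conf}^{W}(X \Rightarrow Y) = \frac{S^{W}(X \cup Y)}{S^{W}(X)}
\]
% (X => Y)^W is a frequent weighted association rule when both conditions hold:
\[
  S^{W}(X \cup Y) \ge \mathit{min\_sup}
  \quad \mathrm{and} \quad
  \mathit{conf}^{W}(X \Rightarrow Y) \ge \mathit{min\_conf}
\]
```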
6
Problem Description (Cont.)
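Example: min_sup = 30%, min_conf = 75%, W(P1) = 0.5, W(P2) = 1 and W(P3) = 2.
Min_SW = {4*0.5 + 4*1 + 4*2} * 0.3 = 4.2
Is (C=>B)W frequent? Yes: its weighted support exceeds min_sup and its weighted confidence exceeds min_conf.
Transaction Database (columns: Date, TID, Itemset):
P1: Jan-02, t1 to t4
P2: Feb-02, t5 to t8
P3: Mar-02, t9 to t12
Itemset column (only partly preserved): t1 = B D, t2 = A, t3 = C, t5 = E, t10 = F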
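The threshold arithmetic in this example can be checked with a short sketch. The partition sizes, weights and min_sup come from the slide; the function and variable names are hypothetical, chosen only for illustration.

```python
# Partition sizes and user-assigned weights from the example on this slide.
partition_sizes = {"P1": 4, "P2": 4, "P3": 4}
weights = {"P1": 0.5, "P2": 1.0, "P3": 2.0}
min_sup = 0.30


def weighted_size(sizes, weights):
    """Sum of (partition size * partition weight) over all partitions."""
    return sum(sizes[p] * weights[p] for p in sizes)


def weighted_min_support(sizes, weights, min_sup):
    """Weighted minimum support count: weighted database size * min_sup."""
    return weighted_size(sizes, weights) * min_sup


total = weighted_size(partition_sizes, weights)
min_sw = weighted_min_support(partition_sizes, weights, min_sup)
print(total, round(min_sw, 2))
```

An itemset is then compared against `min_sw` by its weighted count, not by its raw number of occurrences.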
7
PWM Algorithm
Procedure I: partition the database based on weighted periods
Procedure II (1st scan of the database): use W(Pi) to produce C2
Procedure III: use C2 to produce Ck
Procedure IV (2nd scan of the database): use W(Pi) to generate Lk and the rules (X=>Y)W
8
Procedure I
Time granularity = month, W(P1) = 0.5, W(P2) = 1, W(P3) = 2
Transaction Database (columns: Date, TID, Itemset), split into one partition per month:
P1: Jan-02, t1 to t4
P2: Feb-02, t5 to t8
P3: Mar-02, t9 to t12
Itemset column (only partly preserved): t1 = B D, t2 = A, t3 = C, t5 = E, t10 = F
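Partitioning by month, as Procedure I does, can be sketched as below. This is a minimal illustration, not the authors' implementation: the day of each transaction is made up (the slide gives only the month), and `partition_by_month` and `month_weights` are hypothetical names.

```python
from collections import defaultdict
from datetime import date

# (TID, date) pairs for t1..t12. The day of month is invented for illustration;
# t1-t4 fall in Jan-02, t5-t8 in Feb-02, t9-t12 in Mar-02 as on the slide.
transactions = [(f"t{i}", date(2002, 1 + (i - 1) // 4, 10)) for i in range(1, 13)]

# User-assigned weights per month: W(P1), W(P2), W(P3) from the slide.
month_weights = {(2002, 1): 0.5, (2002, 2): 1.0, (2002, 3): 2.0}


def partition_by_month(transactions):
    """Group TIDs into partitions keyed by (year, month), in time order."""
    groups = defaultdict(list)
    for tid, day in transactions:
        groups[(day.year, day.month)].append(tid)
    return dict(sorted(groups.items()))


partitions = partition_by_month(transactions)
for index, (month, tids) in enumerate(partitions.items(), start=1):
    print(f"P{index}", month, tids, "W =", month_weights[month])
```

A different time granularity (week, quarter) would only change the grouping key.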
9
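Procedure II
Scan P1: Min_SW(P1) = 4*0.5*0.3 = 0.6
C2 after scanning P1 (columns: C2, start, NW(X) count):
BD: start 1, count 2*0.5 = 1
AD: count 1*0.5 = 0.5
Also listed, without surviving values: BC, CD
Scan P2: Min_SW(P1+P2) = 1.8, Min_SW(P2) = 1.2
C2 after scanning P2 (columns: C2, start, NW(X) count):
AB: start 2, count 1*1 = 1
BC: start 1, count 1+2*1 = 3
BD: count 1+0*1 = 1
CE: count 2*1 = 2
Also listed, without surviving values: AC, BE, CD, DE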
10
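Procedure II (Cont.)
Scan P3: Min_SW(P1+P2+P3) = 4.2, Min_SW(P2+P3) = 3.6, Min_SW(P3) = 2.4
C2 after scanning P3 (columns: C2, start, NW(X) count):
AD: start 3, count 1*2 = 2
BC: start 1, count 3+1*2 = 5
BF: count 3*2 = 6
CE: start 2, count 2+1*2 = 4
DE: count 2+0*2 = 2
Also listed, without surviving values: BD, BE, CF, DF, EF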
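The progressive filtering of Procedure II can be sketched as follows. It assumes that a candidate stays in C2 only while its accumulated weighted count reaches min_sup times the weighted size of the partitions from its start partition to the current one, which is how the Min_SW(...) thresholds on these slides appear to be used. The per-partition counts for BC, CE, DE and BF are read off or inferred from the slides' arithmetic; the other candidates are omitted, and `progressive_filter` is a hypothetical name.

```python
# Occurrences of four 2-itemsets in P1, P2, P3, taken from the slides'
# NW(X) arithmetic (for example BC: 2*0.5 + 2*1 + 1*2).
counts = {
    "BC": [2, 2, 1],
    "CE": [0, 2, 1],
    "DE": [0, 2, 0],
    "BF": [0, 0, 3],
}
sizes = [4, 4, 4]          # transactions in P1, P2, P3
weights = [0.5, 1.0, 2.0]  # W(P1), W(P2), W(P3)
min_sup = 0.30


def progressive_filter(counts, sizes, weights, min_sup):
    """Return {itemset: (start partition, weighted count)} after all partitions."""
    kept = {}
    for k in range(len(sizes)):
        for itemset, per_partition in counts.items():
            n = per_partition[k]
            if itemset in kept:
                start, total = kept[itemset]
                kept[itemset] = (start, total + n * weights[k])
            elif n > 0:
                kept[itemset] = (k, n * weights[k])  # first seen in partition k
        # Drop candidates below the threshold for partitions start..k.
        for itemset in list(kept):
            start, total = kept[itemset]
            span = sum(sizes[i] * weights[i] for i in range(start, k + 1))
            if total < min_sup * span:
                del kept[itemset]
    # Report 1-based partition numbers, as the slides do.
    return {s: (start + 1, total) for s, (start, total) in kept.items()}


c2 = progressive_filter(counts, sizes, weights, min_sup)
print(c2)
```

With these inputs DE is dropped after P3 (2 is below Min_SW(P2+P3) = 3.6), and BC, CE and BF remain, matching the C2 used in Procedure III.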
11
Procedure III
After the 1st scan of D, the candidate itemsets are:
{B}, {C}, {E}, {F}, {BC}, {BF}, {CE} (because C2 = {BC, CE, BF} from Procedure II)
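One way to derive these candidates from C2 is sketched below. The slide does not say how Ck is produced from C2, so an Apriori-style step is assumed here: a (k+1)-itemset is a candidate only if all of its k-subsets are candidates. The function names are hypothetical.

```python
from itertools import combinations

# C2 from Procedure II.
c2 = [frozenset("BC"), frozenset("CE"), frozenset("BF")]


def one_itemsets(c2):
    """All single items that occur in some candidate 2-itemset."""
    return sorted({item for pair in c2 for item in pair})


def next_candidates(ck, k):
    """Apriori-style step: (k+1)-itemsets whose k-subsets are all in ck."""
    ck = set(ck)
    items = sorted({item for s in ck for item in s})
    result = []
    for combo in combinations(items, k + 1):
        if all(frozenset(sub) in ck for sub in combinations(combo, k)):
            result.append(frozenset(combo))
    return result


c1 = one_itemsets(c2)
c3 = next_candidates(c2, 2)
print(c1, c3)
```

For this C2 no 3-itemset qualifies (for example {BCE} would need {BE}), which agrees with the slide listing only 1-itemsets and 2-itemsets.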
12
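Procedure IV
After the 2nd scan of D, the candidates are pruned against Min_SW(D) = 4.2.
Candidate itemsets (columns: itemset, NW(X) count):
C1: {B} 3*0.5+2*1+3*2 = 9.5; {C} 2*0.5+3*1+1*2 = 6; {E} 3*1+1*2 = 5; {F} 3*2 = 6
C2: {BC} 2*0.5+2*1+1*2 = 5; {BF} 3*2 = 6; {CE} 2*1+1*2 = 4
Frequent itemsets after pruning (columns: itemset, NW(X) count):
L1: {B} 9.5, {C} 6, {E} 5, {F} 6
L2: {BC} 5, {BF} 6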
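The weighted counts and the pruning step on this slide can be reproduced with a small sketch. The per-partition occurrence counts are the factors in the slide's sums (for example {B}: 3, 2, 3); the names `occurrences` and `weighted_count` are hypothetical.

```python
weights = [0.5, 1.0, 2.0]  # W(P1), W(P2), W(P3)
min_sw = 4.2               # Min_SW(D) from the slide

# Occurrences of each candidate in P1, P2, P3, taken from the slide's sums.
occurrences = {
    "B": [3, 2, 3],
    "C": [2, 3, 1],
    "E": [0, 3, 1],
    "F": [0, 0, 3],
    "BC": [2, 2, 1],
    "BF": [0, 0, 3],
    "CE": [0, 2, 1],
}


def weighted_count(per_partition, weights):
    """NW(X): occurrences in each partition times that partition's weight."""
    return sum(n * w for n, w in zip(per_partition, weights))


nw = {x: weighted_count(c, weights) for x, c in occurrences.items()}
frequent = {x: v for x, v in nw.items() if v >= min_sw}
print(frequent)
```

Only {CE}, with a weighted count of 4, falls below 4.2 and is pruned.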
13
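WAR (weighted association rules)
Rules before pruning (columns: rule, support, confidence):
B => C: support 5/(4*0.5+4*1+4*2) = 35.7%, confidence 5/9.5 = 52.6%
B => F: support 6/(4*0.5+4*1+4*2) = 42.9%, confidence 6/9.5 = 63.2%
C => B: support 35.7%, confidence 5/6 = 83.3%
F => B: support 42.9%, confidence 6/6 = 100%
Rules after pruning with min_conf = 75%:
C => B: support 35.7%, confidence 83.3%
F => B: support 42.9%, confidence 100%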
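The rule table can be recomputed from the weighted counts of the frequent itemsets. This is a sketch under the assumption that support is the weighted count of the 2-itemset over the weighted database size (4*0.5 + 4*1 + 4*2 = 14) and confidence is that count over the weighted count of the antecedent; `rules_from_pair` is a hypothetical helper.

```python
# Weighted counts of the frequent itemsets found in Procedure IV.
nw = {"B": 9.5, "C": 6.0, "E": 5.0, "F": 6.0, "BC": 5.0, "BF": 6.0}
weighted_db_size = 4 * 0.5 + 4 * 1 + 4 * 2  # weighted size of P1 + P2 + P3
min_conf = 0.75


def rules_from_pair(pair, nw, db_size):
    """Yield (antecedent, consequent, support, confidence) for both directions."""
    a, b = pair
    support = nw[pair] / db_size
    for x, y in ((a, b), (b, a)):
        yield x, y, support, nw[pair] / nw[x]


all_rules = [r for pair in ("BC", "BF") for r in rules_from_pair(pair, nw, weighted_db_size)]
kept = [r for r in all_rules if r[3] >= min_conf]
for x, y, sup, conf in kept:
    print(f"{x} => {y}: support {sup:.1%}, confidence {conf:.1%}")
```

B => C and B => F are dropped because their confidences (about 52.6% and 63.2%) are below 75%.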
14
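Conclusion
PWM is developed to generate weighted association rules (WAR) in a time-variant database.
PWM employs a filtering threshold in each partition.
Because the weights reflect the importance of each transaction period, PWM can lead to more interesting results.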