An Efficient Algorithm for Incremental Mining of Association Rules


1 An Efficient Algorithm for Incremental Mining of Association Rules
Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee
RIDE-SDMA’05
Speaker: 董原賓
Advisor: 柯佳伶

2 Introduction
Previous incremental mining algorithms:
  FUP (Fast Update Algorithm)
  FUP2
  Negative border
  ※ They all have to rescan the original database.
Problem:
  Publication-like databases, e.g. publication databases, web log records, etc.
  The original database is normally much larger than the incremental database.
Solution:
  NFUP (New Fast Update Algorithm)

3 Definition
DB: the original database
db: the set of newly added transactions
DB+: DB + db
n, Pn: db is divided into n partitions, db = P1 ∪ P2 ∪ … ∪ Pn-1 ∪ Pn
dbm,n = Pm ∪ Pm+1 ∪ … ∪ Pn-1 ∪ Pn
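The partition notation above can be illustrated with a minimal Python sketch. The function name `db_range` and the toy transactions are assumptions made for illustration; they do not come from the slides.

```python
from typing import List, Set

Transaction = Set[str]

def db_range(partitions: List[List[Transaction]], m: int, n: int) -> List[Transaction]:
    """Return db(m,n) = P_m ∪ ... ∪ P_n, using 1-based partition numbers."""
    merged: List[Transaction] = []
    for part in partitions[m - 1:n]:
        merged.extend(part)
    return merged

# Hypothetical toy data: an incremental database db split into two partitions.
P1 = [{"B", "E"}, {"B", "C", "E"}]
P2 = [{"A", "C"}, {"A", "B", "C"}, {"C", "F"}]
partitions = [P1, P2]

latest_only = db_range(partitions, 2, 2)   # db(2,2): just the latest partition
whole_db = db_range(partitions, 1, 2)      # db(1,2): the whole incremental database
```

With n = 2, `db_range(partitions, 2, 2)` is the latest partition alone, and `db_range(partitions, 1, 2)` is all of db.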

4 Definition
α set: frequent itemsets in DB+
β set: frequent in dbm,n (m ≤ n), but infrequent in dbm-1,n
γ set: frequent in dbm,m, but infrequent in dbm+1,n
X.count: occurrence count
X.start: partition number when X becomes frequent
X.type: denotes one of the three types α, β, and γ
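One possible way to hold the three attributes of an itemset X is a small record type. This is only a sketch: the class names `SetType` and `ItemsetInfo` and the dictionary layout are assumptions, not something the slides prescribe.

```python
from dataclasses import dataclass
from enum import Enum

class SetType(Enum):
    ALPHA = "α"
    BETA = "β"
    GAMMA = "γ"

@dataclass
class ItemsetInfo:
    count: int      # X.count: occurrence count
    start: int      # X.start: partition number when X becomes frequent
    type: SetType   # X.type: one of α, β, γ

# Itemsets are stored as frozensets so they can be used as dictionary keys.
table = {
    frozenset("AB"): ItemsetInfo(count=2, start=2, type=SetType.ALPHA),
}

info = table[frozenset("AB")]
```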

5 FUP (Fast Update Algorithm)
In case 2, the itemset is easily calculated.
In case 3, FUP needs to rescan the original database.

6 NFUP (New Fast Update Algo.)
A backward method that only requires scanning the incremental database.
A frequent itemset in the incremental database is also important, even if it is infrequent in the updated database.
Partition the incremental database (db) by time interval.

7 NFUP
The set of frequent itemsets of DB is known in advance.
NFUP scans the partitions backward: the last partition is scanned first.
In each partition, the process is performed like that of Apriori.
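Because each partition is processed like Apriori, candidate generation is the central step. Below is a minimal sketch of the standard Apriori-gen join and prune steps; it is not taken from the slides, and the function name `apriori_gen` is chosen here for illustration. The input is the set of frequent 2-itemsets that appears in the example of this presentation.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Build candidate (k+1)-itemsets from the frequent k-itemsets."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            # Join step: a and b must differ in exactly one item.
            if len(union) != len(a) + 1:
                continue
            # Prune step: every k-subset of the candidate must be frequent.
            if all(frozenset(s) in frequent_k
                   for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates

frequent_2 = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("CF")}
candidates_3 = apriori_gen(frequent_2)
```

Here {A, C, F} and {B, C, F} are pruned because {A, F} and {B, F} are not frequent, so only {A, B, C} survives as a 3-itemset candidate.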

8 NFUP

9 Scan from Pn to P1 and find the α, β, γ itemsets in db
After P1 is scanned, the occurrence counts are accumulated with the itemsets of DB.

10 The latest partition is scanned first: initialize variables and accumulate the occurrence counts
Still frequent in Pm: accumulate the count
Still frequent in dbm,n: accumulate the count
Only frequent in dbm+1,n: remove from the α set and add into the β set
Not belonging to any set and frequent in Pm: check if Pm is the latest partition
  Yes → α set
  No → γ set
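The decision rules on this slide can be sketched as a single update step for one partition. This is a simplified reading, not the authors' implementation: it handles only the α rule and the "not in any set" rule, it takes all itemset counts of the partition at once instead of level by level, and it assumes an itemset moved to the β set keeps its previous count and start. The function name `scan_partition` and the dictionary layout are hypothetical.

```python
def scan_partition(table, counts, m, n, part_size, range_size, min_sup):
    """Update `table` in place from the counts of partition P_m.

    table:      itemset -> {"type": "α" | "β" | "γ", "start": int, "count": int}
    counts:     itemset -> occurrence count inside P_m
    part_size:  |P_m|;  range_size: |db(m,n)|
    """
    # Itemsets already in the α set: accumulate, or demote to the β set.
    for itemset, info in table.items():
        if info["type"] != "α":
            continue
        total = info["count"] + counts.get(itemset, 0)
        if total >= min_sup * range_size:
            info["count"], info["start"] = total, m   # still frequent in db(m,n)
        else:
            info["type"] = "β"                        # only frequent in db(m+1,n)
    # Itemsets in no set that are frequent in P_m.
    for itemset, c in counts.items():
        if itemset not in table and c >= min_sup * part_size:
            kind = "α" if m == n else "γ"             # latest partition -> α
            table[itemset] = {"type": kind, "start": m, "count": c}
    return table

# Toy run with n = 2 partitions of 3 transactions each (assumed sizes).
table = {}
scan_partition(table, {"A": 2, "F": 2, "D": 1}, m=2, n=2,
               part_size=3, range_size=3, min_sup=0.5)
scan_partition(table, {"A": 1, "F": 0, "E": 3}, m=1, n=2,
               part_size=3, range_size=6, min_sup=0.5)
```

In the toy run, A stays in the α set with an accumulated count of 3, F drops to the β set because 2 < 0.5 x 6, and E enters the γ set because it first becomes frequent in a partition that is not the latest one.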

11 Example
Min sup = 50%, 3 x 0.5 = 1.5
Scan P2, 1-itemsets: {A: 2} {B: 2} {C: 3} {D: 1} {E: 1} {F: 2}
Scan P2, 2-itemsets (after running Apriori-gen): {AB: 2} {AC: 2} {AF: 1} {BC: 2} {BF: 1} {CF: 2}
Scan P2, 3-itemsets (after running Apriori-gen)
For each itemset:
  Check if the itemset belongs to the α set
  Else check that the itemset doesn't belong to any set
    Check if the itemset's count >= 1.5
    Check if P2 is the latest partition: yes → α, no → γ
Sets after scanning P2 (the β set and γ set are empty):
α set   start   count
{A}     2       2
{B}     2       2
{C}     2       3
{F}     2       2
{AB}    2       2
{AC}    2       2
{BC}    2       2
{CF}    2       2
{ABC}   2       2
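The threshold check in this example is simple arithmetic, shown here as a short sketch. The counts are the ones listed on the slide; the helper name `frequent` is an assumption.

```python
min_sup = 0.5
partition_size = 3                       # from "3 x 0.5 = 1.5" on the slide
threshold = min_sup * partition_size     # 1.5

counts_1 = {"A": 2, "B": 2, "C": 3, "D": 1, "E": 1, "F": 2}
counts_2 = {"AB": 2, "AC": 2, "AF": 1, "BC": 2, "BF": 1, "CF": 2}

def frequent(counts, threshold):
    """Return the itemsets whose count reaches the threshold, sorted by name."""
    return sorted(x for x, c in counts.items() if c >= threshold)

frequent_1 = frequent(counts_1, threshold)   # ['A', 'B', 'C', 'F']
frequent_2 = frequent(counts_2, threshold)   # ['AB', 'AC', 'BC', 'CF']
```

D and E (count 1) and AF and BF (count 1) fall below 1.5, which is why they do not enter the α set.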

12 Example
Min sup = 50%, 3 x 0.5 = 1.5
Scan P1, 1-itemsets: {A: 1} {B: 3} {C: 2} {D: 1} {E: 3} {F: 0}
Scan P1, 2-itemsets (after running Apriori-gen): {AB: 1} {AC: 0} {BC: 2} {BE: 3} {CE: 2}
For each itemset:
  Check if the itemset belongs to the α set
    Yes → accumulate the count
    Count < s*|dbm,n| = 0.5 x 6 = 3 → β set
  Else check that the itemset doesn't belong to any set
    Check if the itemset's count >= 1.5
    Check if P1 is the latest partition: yes → α, no → γ
Sets after scanning P1 (old → new where a value changed):
α set   start   count
{A}     2 → 1   2 → 3
{B}     2 → 1   2 → 5
{C}     2 → 1   3 → 5
{AB}    2 → 1   2 → 3
{BC}    2 → 1   2 → 4
β set (moved from the α set)   start   count
{F}     2       2
{AC}    2       2
{CF}    2       2
{ABC}   2       2
γ set   start   count
{E}     1       3
{BE}    1       3
{CE}    1       2
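The split between the α and β sets after P1 follows from adding the P1 counts to the P2 counts and comparing against s*|dbm,n| = 0.5 x 6 = 3. The sketch below replays that arithmetic with the counts from the two example slides; the variable names are chosen here for illustration.

```python
min_sup = 0.5
threshold = min_sup * 6   # |db(1,2)| = 6 transactions, so the threshold is 3

# Counts of the α-set itemsets after P2, and their counts inside P1.
count_p2 = {"A": 2, "B": 2, "C": 3, "F": 2,
            "AB": 2, "AC": 2, "BC": 2, "CF": 2, "ABC": 2}
count_p1 = {"A": 1, "B": 3, "C": 2, "F": 0, "AB": 1, "AC": 0, "BC": 2}

# Itemsets not counted in P1 (CF, ABC) contribute 0.
total = {x: c + count_p1.get(x, 0) for x, c in count_p2.items()}

stays_alpha = sorted(x for x, c in total.items() if c >= threshold)
moves_beta = sorted(x for x, c in total.items() if c < threshold)
```

The totals are A: 3, B: 5, C: 5, AB: 3, and BC: 4, so these five remain in the α set, while F, AC, CF, and ABC stay at 2 and move to the β set.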

13 Example
Sets shown on this slide (old → new where a value changed):
α set   start   count
{A}     1       3 → 7
{B}     1       5 → 8
{C}     1       5 → 9
{AB}    1       3
{BC}    1       4
β set   start   count
{F}     2       2
{AC}    2       2
{CF}    2       2
{ABC}   2       2
{AB}    1       3
{BC}    1       4
γ set   start   count
{E}     1       3
{BE}    1       3
{CE}    1       2
{AE} 3

14 Experiment
Intel Pentium IV 1.5GHz CPU, 640 MB main memory
Microsoft Windows 2000 Professional
Synthetic datasets:

15 Experiment

16 Experiment

17 Experiment

