Maintaining Frequent Itemsets over High-Speed Data Streams


1 Maintaining Frequent Itemsets over High-Speed Data Streams
James Cheng, Yiping Ke, and Wilfred Ng
Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)
Advisor: Jia-Ling Koh
Speaker: Chun-Wei Hsieh
05/26/06

2 Introduction
Existing approximation techniques for mining frequent itemsets are mainly false-positive approaches that use an error parameter ε, where ε = γ * min_sup and γ < 1.
- A smaller ε means a larger number of itemsets has to be maintained.
- A larger ε means lower accuracy.

3 Introduction
MineSW:
- progressively increases ε
- is a false-negative approach
- mines recent data (a sliding window)
- works on batches, not on individual transactions

4 Introduction
Example: w_size = 2, min_sup σ = 3/5
- sup(bc, W1) = 4, sup(bc, W2) = 2
- The set of FIs over W1 is {b, c, bc}
- The set of FIs over W2 is {b, c, d, bd}

5 Preliminaries
Notation is defined for:
- the computed support of an itemset X over a time interval T
- the number of transactions that arrive in a time interval T

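6 MST Function
The MST function requires the support of an itemset to increase progressively as the itemset stays longer in a window.
k: the number of time units an itemset has stayed in the window
MST function: minsup(k) = ceil(m_k * r_k), where r_k = ((1 - r)/w)(k - 1) + r and m_k = σ * (number of transactions in the k time units)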

7 MST Function
For example, let σ = 0.01, r = 0.1 and w = 10, with 2000 transactions in each time unit.
Then r_k = [(1-0.1)/10](k-1) + 0.1 and m_k = 0.01 * 2000 * k:

k     r_k     m_k
1     0.10    20
2     0.19    40
3     0.28    60
4     0.37    80
5     0.46    100
6     0.55    120
7     0.64    140
8     0.73    160
9     0.82    180
10    0.91    200
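The arithmetic on this slide can be reproduced with a short Python sketch. The function names (`relaxed_rate`, `base_count`, `minsup`) are hypothetical, and rounding the product up with a ceiling is an assumption, chosen because it gives minsup(1) = 2 and minsup(2) = 8, the values used on the next slide. Exact fractions are used to avoid floating-point rounding surprises.

```python
import math
from fractions import Fraction


def relaxed_rate(k, r, w):
    """r_k = ((1 - r) / w) * (k - 1) + r, as written on the slide."""
    return (1 - r) / w * (k - 1) + r


def base_count(k, sigma, per_unit):
    """m_k = sigma * (transactions per time unit) * k."""
    return sigma * per_unit * k


def minsup(k, sigma, r, w, per_unit):
    """Assumed threshold: the product m_k * r_k, rounded up."""
    return math.ceil(base_count(k, sigma, per_unit) * relaxed_rate(k, r, w))


if __name__ == "__main__":
    # Parameters from the slide: sigma = 0.01, r = 0.1, w = 10, 2000 per unit.
    sigma, r, w, per_unit = Fraction(1, 100), Fraction(1, 10), 10, 2000
    for k in range(1, w + 1):
        print(k,
              float(relaxed_rate(k, r, w)),
              int(base_count(k, sigma, per_unit)),
              minsup(k, sigma, r, w, per_unit))
```

With the slide's parameters, the threshold starts low (2 for k = 1) and grows faster than linearly because both m_k and r_k increase with k.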

8 MST Function
With Lossy Counting (ε = 20), ab and cd are retained in the windows.
With MineSW, the computed support of ab is:
t1: 3, sup(ab) = 3 > minsup(1) = 2
t2: 0, sup(ab) = 3 < minsup(2) = 8
...
t7: 4, sup(ab) = 4 > minsup(1) = 2
t8: 7, sup(ab) = 11 > minsup(2) = 8
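The trace for ab can be sketched in Python as follows. This is a simplified illustration, not the MineSW algorithm itself: `track` is a hypothetical helper, the parameters (σ = 0.01, r = 0.1, w = 10, 2000 transactions per time unit) are carried over from the previous slide, an itemset is assumed to be kept when its computed support is at least minsup(k), and expiring time units are not modeled. Once the itemset is dropped, counting is assumed to restart from k = 1 the next time it appears.

```python
import math
from fractions import Fraction


def minsup(k, sigma=Fraction(1, 100), r=Fraction(1, 10), w=10, per_unit=2000):
    """Assumed threshold ceil(m_k * r_k), with parameters from the example."""
    r_k = (1 - r) / w * (k - 1) + r
    m_k = sigma * per_unit * k
    return math.ceil(m_k * r_k)


def track(per_unit_supports):
    """Accumulate the computed support of one itemset over time units.

    Returns a list of (k, computed_support, threshold, kept) tuples.
    When the itemset falls below the threshold it is dropped, and the
    accumulated support and k are reset (an assumption of this sketch).
    """
    history = []
    total, k = 0, 0
    for support in per_unit_supports:
        k += 1
        total += support
        threshold = minsup(k)
        kept = total >= threshold
        history.append((k, total, threshold, kept))
        if not kept:
            total, k = 0, 0
    return history


if __name__ == "__main__":
    print(track([3, 0]))  # t1 and t2 from the slide
    print(track([4, 7]))  # t7 and t8 from the slide
```

For the supports at t1 and t2 the sketch keeps ab after the first unit and drops it after the second; for t7 and t8 it keeps ab in both units, matching the comparisons listed on the slide. The time units between t2 and t7 are elided on the slide and are not simulated here.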

9 MineSW Algorithm
- Mine FIs from each batch using the threshold γσ.
- Use a prefix tree to keep the FIs and semi-FIs of the window.
- Each node in the prefix tree holds: item, uid(X), sup(X).
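A prefix tree with the three node fields named above could look like the following Python sketch. The class and method names (`Node`, `PrefixTree`, `add`, `find`) are hypothetical. The slide does not define uid(X); here it is assumed to be the ID of the time unit in which X was first inserted, and sup(X) is assumed to accumulate across time units.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    item: str   # last item of the itemset X represented by this node
    uid: int    # assumed: ID of the time unit in which X was inserted
    sup: int    # computed support of X accumulated so far
    children: dict = field(default_factory=dict)


class PrefixTree:
    def __init__(self):
        # The root represents the empty itemset.
        self.root = Node(item="", uid=0, sup=0)

    def add(self, itemset, sup, uid):
        """Add `sup` to the node for `itemset`, creating the path if needed."""
        node = self.root
        for item in sorted(itemset):
            child = node.children.get(item)
            if child is None:
                # New nodes start with zero support and record the time unit.
                child = Node(item=item, uid=uid, sup=0)
                node.children[item] = child
            node = child
        node.sup += sup
        return node

    def find(self, itemset):
        """Return the node for `itemset`, or None if it is not in the tree."""
        node = self.root
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return None
        return node


if __name__ == "__main__":
    tree = PrefixTree()
    tree.add({"a", "b"}, sup=4, uid=7)
    tree.add({"a", "b"}, sup=7, uid=8)
    node = tree.find({"a", "b"})
    print(node.item, node.uid, node.sup)
```

Sorting the items gives every itemset a single canonical path, so itemsets that share a prefix share nodes. Keeping uid fixed at the insertion time unit lets k (how long X has stayed in the window) be derived from the current time unit.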

10 MineSW Algorithm When the first window is not full:

11 MineSW Algorithm processing the expiring time unit

12 MineSW Algorithm processing the new time unit

13 MineSW Algorithm Pruning and Outputting

14 Approximation Quality
The error bound of the computed support of a semi-frequent itemset X over T_k:
The set of false negatives is:

15 Experiments
- Compared with LCSW
- Machine: 900 MHz CPU, 4 GB RAM
- Data streams: t10i4 and t15i6 (t: the average size of a transaction; i: the average size of a maximal frequent itemset)
- Stream length: 3M transactions
- W_Size: 20 time units
- 1 time unit: 50K transactions

16 Experiments

17 Experiments

18 Experiments

19 Experiments

