Published byきょういち ますはら Modified over 5 years ago
1
Maintaining Frequent Itemsets over High-Speed Data Streams
James Cheng, Yiping Ke, and Wilfred Ng
Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)
Advisor: Jia-Ling Koh
Speaker: Chun-Wei Hsieh
05/26/06
2
Introduction
Existing approximation techniques for mining frequent itemsets are mainly false-positive approaches that use an error parameter, ε.
ε = γ × min_sup, γ ≤ 1
A smaller ε means a larger number of itemsets must be maintained.
A larger ε means lower accuracy.
3
Introduction
MineSW:
progressively increases ε
is a false-negative approach
targets recent data (a sliding window)
works on batches, not on individual transactions
4
Introduction
Example: w_size = 2, min_sup σ = 3/5
sup(bc, W1) = 4, sup(bc, W2) = 2
The set of FIs over W1 is {b, c, bc}.
The set of FIs over W2 is {b, c, d, bd}.
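The sliding-window example can be reproduced with a small brute-force sketch. The transactions below are hypothetical: the slide's own transaction table is not available, so they were chosen only so that the stated supports and FI sets come out the same. The sketch also assumes three transactions per time unit and that an itemset is frequent when its count is at least ⌈σ × |W|⌉.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil

# Hypothetical transactions, three per time unit (not the slide's data).
UNITS = [
    [{"b", "c"}, {"b", "c"}, {"a", "b", "c"}],   # t1
    [{"b", "c", "d"}, {"b", "d"}, {"c", "d"}],   # t2
    [{"b", "d"}, {"b", "c", "d"}, {"c"}],        # t3
]
W_SIZE = 2                  # window size in time units
MIN_SUP = Fraction(3, 5)    # sigma, kept exact to avoid float rounding


def window(start):
    """Flatten the W_SIZE time units beginning at index `start`."""
    return [t for unit in UNITS[start:start + W_SIZE] for t in unit]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def frequent_itemsets(transactions, min_sup):
    """Brute force: count every subset of every transaction."""
    threshold = ceil(min_sup * len(transactions))
    counts = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {"".join(c): n for c, n in counts.items() if n >= threshold}


w1 = frequent_itemsets(window(0), MIN_SUP)   # W1 = t1, t2
w2 = frequent_itemsets(window(1), MIN_SUP)   # W2 = t2, t3
print(sorted(w1), sorted(w2))
```

Enumerating all subsets is exponential in transaction length, so this is only an illustration of what "FIs over a window" means, not a mining algorithm.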
5
Preliminaries
Two notions are used: a computed support and a time interval.
The computed support of an itemset X over a time interval T
The number of transactions that arrive in a time interval T
6
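MST Function
The MST function requires the support of an itemset to increase progressively as it stays longer in a window.
k: the number of time units an itemset has stayed in the window
minsup(k) = ⌈m_k × r_k⌉, where m_k = σ × (number of transactions in the k time units) and r_k = [(1 − r)/w](k − 1) + r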
7
MST Function
For example, let σ = 0.01, r = 0.1, and w = 10, with 2000 transactions in each time unit.
r_k = [(1 − 0.1)/10](k − 1) + 0.1 and m_k = 0.01 × 2000 × k:

k     r_k    m_k
1     0.10    20
2     0.19    40
3     0.28    60
4     0.37    80
5     0.46   100
6     0.55   120
7     0.64   140
8     0.73   160
9     0.82   180
10    0.91   200
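The arithmetic in this example can be checked with a short sketch. It assumes the threshold is minsup(k) = ⌈m_k × r_k⌉, which is inferred from the values minsup(1) = 2 and minsup(2) = 8 quoted elsewhere in these slides; the function and parameter names are illustrative.

```python
from fractions import Fraction
from math import ceil


def relaxation_rate(k, r, w):
    """r_k = ((1 - r) / w) * (k - 1) + r"""
    return (1 - r) / w * (k - 1) + r


def scaled_support(k, sigma, n_per_unit):
    """m_k = sigma * (transactions in k time units), assuming equal-sized units."""
    return sigma * n_per_unit * k


def minsup(k, sigma, r, w, n_per_unit):
    """Assumed form of the threshold: ceil(m_k * r_k)."""
    return ceil(scaled_support(k, sigma, n_per_unit) * relaxation_rate(k, r, w))


# Parameters from the example; Fractions keep the arithmetic exact.
SIGMA = Fraction(1, 100)
R = Fraction(1, 10)
W = 10
N_PER_UNIT = 2000

for k in range(1, W + 1):
    r_k = relaxation_rate(k, R, W)
    m_k = scaled_support(k, SIGMA, N_PER_UNIT)
    print(k, float(r_k), int(m_k), minsup(k, SIGMA, R, W, N_PER_UNIT))
```

Exact fractions are used instead of floats because the ceiling is sensitive to rounding when m_k × r_k lands on an integer.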
8
MST Function
ab and cd are retained in the windows with Lossy Counting (ε = 20).
With MineSW, the computed support of ab:
t1: 3, sup(ab) = 3 > minsup(1) = 2
t2: 0, sup(ab) = 3 < minsup(2) = 8
:
t7: 4, sup(ab) = 4 > minsup(1) = 2
t8: 7, sup(ab) = 11 > minsup(2) = 8
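The trace for ab can be sketched as a small simulation. It assumes σ = 0.01, r = 0.1, w = 10, and 2000 transactions per time unit (these reproduce minsup(1) = 2 and minsup(2) = 8), and it assumes an itemset is kept while its computed support is at least minsup(k). Only the per-unit counts shown on the slide are used; the elided time units are not filled in.

```python
from fractions import Fraction
from math import ceil

# Assumed parameters; they reproduce minsup(1) = 2 and minsup(2) = 8.
SIGMA = Fraction(1, 100)
R = Fraction(1, 10)
W = 10
N_PER_UNIT = 2000


def minsup(k):
    """Assumed threshold: ceil(m_k * r_k) for an itemset kept for k time units."""
    m_k = SIGMA * N_PER_UNIT * k
    r_k = (1 - R) / W * (k - 1) + R
    return ceil(m_k * r_k)


def track(batch_supports):
    """Accumulate per-unit supports; stop once the itemset falls below minsup(k).

    Returns a list of (k, computed_support, threshold, kept) tuples.
    """
    history = []
    computed = 0
    for k, batch_sup in enumerate(batch_supports, start=1):
        computed += batch_sup
        threshold = minsup(k)
        kept = computed >= threshold
        history.append((k, computed, threshold, kept))
        if not kept:
            break   # the itemset is dropped and would restart at k = 1 later
    return history


print(track([3, 0]))   # t1, t2: dropped at k = 2
print(track([4, 7]))   # t7, t8: still kept at k = 2
```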
9
MineSW Algorithm
Mine FIs from each batch with threshold γσ.
Use a prefix tree to keep the FIs and semi-FIs of the window.
Each node in the prefix tree has: item, uid(X), sup(X)
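A node with these three fields might be sketched as follows. This is an illustration only: the slide does not define the fields, so the sketch assumes that item is the last item of the itemset X, that uid(X) records the time unit in which X was inserted, and that sup(X) is the support accumulated since then. The helper names `update` and `find` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Node:
    item: Optional[str]   # last item of the itemset X (None for the root)
    uid: int              # assumed: time unit in which X was inserted
    sup: int = 0          # assumed: computed support of X since uid
    children: Dict[str, "Node"] = field(default_factory=dict)


def update(root: Node, itemset: Iterable[str], batch_sup: int, current_uid: int) -> Node:
    """Add a batch's support for `itemset`, creating nodes on first sight.

    Missing prefix nodes are created with support 0 in this sketch.
    """
    node = root
    for item in sorted(itemset):
        if item not in node.children:
            node.children[item] = Node(item, uid=current_uid)
        node = node.children[item]
    node.sup += batch_sup
    return node


def find(root: Node, itemset: Iterable[str]) -> Optional[Node]:
    """Return the node for `itemset`, or None if it is not in the tree."""
    node: Optional[Node] = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return None
    return node


root = Node(item=None, uid=0)
update(root, {"a", "b"}, 4, current_uid=7)   # ab first seen in time unit 7
update(root, {"a", "b"}, 7, current_uid=8)   # support accumulates in unit 8
ab = find(root, {"a", "b"})
print(ab.item, ab.uid, ab.sup)
```

Sorting the items gives every itemset a single canonical path, so ab and ba map to the same node.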
10
MineSW Algorithm When the first window is not full:
11
MineSW Algorithm processing the expiring time unit
12
MineSW Algorithm processing the new time unit
13
MineSW Algorithm Pruning and Outputting
14
Approximation Quality
The error bound of the computed support of a semi-frequent itemset X over T_k:
The set of false negatives:
15
Experiments
Compared with LCSW
Machine: 900 MHz CPU, 4 GB RAM
Data streams: t10i4, t15i6
t: the average size of a transaction
i: the average size of a maximal frequent itemset
Stream: 3M transactions
W_Size: 20 time units
1 time unit: 50K transactions