MINING FREQUENT ITEMSETS IN A STREAM TOON CALDERS, NELE DEXTERS, BART GOETHALS ICDM2007 Date: 5 June 2008 Speaker: Li, Huei-Jyun Advisor: Dr. Koh, Jia-Ling.

1 MINING FREQUENT ITEMSETS IN A STREAM TOON CALDERS, NELE DEXTERS, BART GOETHALS ICDM2007 Date: 5 June 2008 Speaker: Li, Huei-Jyun Advisor: Dr. Koh, Jia-Ling 1

2 OUTLINE: Introduction; Problem Statement; Properties of Max-Frequency; Algorithm; Experiments; Conclusion 2

3 INTRODUCTION Most previous work on mining frequently occurring itemsets over data streams focuses on one of three models: (1) the sliding window model, (2) the time-fading model, or (3) the landmark model. Each of these models requires a fixed window length or decay factor given by the user. In many applications, however, choosing parameters that are appropriate for every itemset at every timepoint in an evolving stream is almost impossible. 3

4 INTRODUCTION We propose to consider, for each itemset, the window in which it has the highest frequency. We define the current frequency of an itemset as the maximum over all windows from the past until the current state that satisfy a minimal size constraint. When a stream evolves, the length of the window containing the highest frequency for a given itemset can change continuously. This new stream measure turns out to be well suited to the early detection of sudden bursts of occurrences of itemsets, while still taking into account the history of the itemset. 4

5 PROBLEM STATEMENT * STREAMS AND MAX-FREQUENCY S: a stream 〈I_1 I_2 … I_n〉 is a sequence of itemsets; |S| is the length of the stream. I_1 is considered the first and oldest itemset in the stream, and I_n the latest and most recent. count(I, S): the number of sets in the stream S that contain itemset I. S[s, t]: the sub-stream (window) 〈I_s I_{s+1} … I_t〉. last(k, S): the sub-stream of S consisting of the last k itemsets of S. 5

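6 PROBLEM STATEMENT * STREAMS AND MAX-FREQUENCY Definition 1. Given a minimal window size mwl, the max-frequency of itemset I in a stream S is defined as the maximum of the frequencies of I over all windows of size at least mwl that extend from the end of the stream; that is, mfreq(I, S) = max over k = mwl, …, |S| of freq(I, last(k, S)), where last(k, S) is the window of the last k itemsets of S and freq(I, W) is the fraction of itemsets in W that contain I. If the length of the stream is less than mwl, the max-frequency is defined to be 0. 6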

7 PROBLEM STATEMENT * STREAMS AND MAX-FREQUENCY Definition 1. (cont.) The longest window in which the max-frequency is reached is called the maximal window for I in the stream. Its starting point is the smallest index s such that the window from I_s to the end of the stream attains the max-frequency. The parameter mwl is omitted when clear from the context. 7
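
Definition 1 can be checked on small examples with a brute-force sketch like the one below. It is not the summary-based algorithm from the slides: it simply grows the window backwards from the end of the stream and keeps the best frequency seen. The function name `max_frequency`, the 1-based start index, and the assumption that `mwl` is at least 1 are choices made for this illustration.

```python
from fractions import Fraction

def max_frequency(stream, itemset, mwl=1):
    """Brute-force max-frequency of `itemset` (illustrative helper, assumes mwl >= 1).

    Returns (frequency, start), where `start` is the 1-based starting index
    of the maximal (longest) window, or None if the stream is shorter than mwl.
    """
    target = frozenset(itemset)
    n = len(stream)
    best, start = Fraction(0), None
    if n < mwl:
        return best, start  # the definition sets the max-frequency to 0 here
    count = 0
    # k is the window length; the window covers the last k itemsets.
    for k in range(1, n + 1):
        if target <= set(stream[n - k]):
            count += 1
        # ">=" keeps the longest window among ties, i.e. the smallest start.
        if k >= mwl and Fraction(count, k) >= best:
            best, start = Fraction(count, k), n - k + 1
    return best, start

stream = [{"a"}, {"b"}, {"a", "b"}, {"a"}]
result_short = max_frequency(stream, {"a"}, mwl=1)
result_long = max_frequency(stream, {"a"}, mwl=3)
```

With this toy stream, the window covering the last two itemsets already has frequency 1 for {a}, while requiring at least three itemsets pushes the maximal window back to the start of the stream.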

8 PROPERTIES OF MAX-FREQUENCY 8

10 ALGORITHM * THE SUMMARY Let p_1 < p_2 < … < p_r be the borders for itemset A in the stream, ordered from oldest to most recent. Let a_i be the number of occurrences of the target itemset A between two subsequent border positions p_i and p_{i+1} (for i = 1, …, r−1); a_r denotes the number of occurrences of A since the last border. The summary of the stream S_t is defined as the array of pairs (p_1, a_1), …, (p_r, a_r). 10

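11 ALGORITHM * THE SUMMARY We can easily compute the frequency of itemset A for any of the border positions from this summary: the window starting at border p_i and ending at the current position t contains a_i + a_{i+1} + … + a_r occurrences of A among t − p_i + 1 itemsets, so its frequency is (a_i + … + a_r) / (t − p_i + 1). 11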

12 ALGORITHM * THE SUMMARY The frequencies of the target itemset within the blocks between two subsequent border positions are increasing from oldest to most recent; as a consequence, among all borders p_i, the frequency of A in the window starting at p_i is maximal for i equal to r. 12
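
The computation on the summary can be sketched as follows. The sketch assumes 1-based positions, that a_i counts occurrences from p_i up to (but not including) p_{i+1}, and that a_r counts occurrences from p_r up to the current position t. The function names and the example summary are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def border_frequencies(summary, t):
    """Frequency of the target in the window [p_i, t] for every border p_i.

    `summary` is a list of (p_i, a_i) pairs ordered from oldest to most recent.
    """
    result = []
    occurrences = 0
    # Walk from the most recent border backwards, accumulating a_r, a_{r-1}, ...
    for p, a in reversed(summary):
        occurrences += a
        result.append((p, Fraction(occurrences, t - p + 1)))
    result.reverse()
    return result

def block_fractions(summary):
    """Fractions a_i / (p_{i+1} - p_i) for the blocks between subsequent borders."""
    return [Fraction(a, p_next - p)
            for (p, a), (p_next, _) in zip(summary, summary[1:])]

# Hypothetical summary: borders at positions 1, 4 and 8, current position t = 10.
summary = [(1, 1), (4, 2), (8, 3)]
t = 10
freqs = border_frequencies(summary, t)
between = block_fractions(summary)
```

For this made-up summary the block fractions come out as 1/3 and 1/2, and the window frequencies as 3/5, 5/7 and 1, so the largest value belongs to the most recent border.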

13 ALGORITHM * THE SUMMARY 13

14 ALGORITHM * MINIMAL FREQUENCY Until now, we assumed that for the target itemset we need to be able to report its frequency exactly. We will now relax this requirement by setting a minimal frequency threshold minfreq. Let … be a stream with …, and suppose that …. Then we can remove (p_1, a_1) from the left side of the summary. 14

15 ALGORITHM * MINIMAL WINDOW LENGTH In the algorithm without minimal window length, a border q in the stream can be pruned if we can find two blocks, one from p to q − 1 and one from q to r, such that the frequency of the target in the first block is higher than in the second. When we are working with a minimal window length, it could be the case that the suffix of the stream starting at r + 1 does not meet the minimal window length requirement. In that case, even though the window starting at q has lower frequency than the window starting at r + 1, it can still have the highest frequency of all windows that meet the minimal window length requirement! 15
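
The pruning condition without a minimal window length can be explored with a brute-force sketch. It assumes the condition reads: position q can be pruned when some block ending at q − 1 has a strictly higher frequency of the target than some block starting at q. The function `prunable_positions` and the 0/1 occurrence list are hypothetical, and the quadratic search is for illustration only, not the incremental procedure from the slides.

```python
from fractions import Fraction

def prunable_positions(occurs):
    """Positions q (1-based) for which blocks [p, q-1] and [q, r] exist such that
    the target is more frequent in the first block than in the second.

    `occurs[i]` is 1 if the itemset at position i + 1 contains the target, else 0.
    """
    n = len(occurs)
    prefix = [0] * (n + 1)
    for i, x in enumerate(occurs, start=1):
        prefix[i] = prefix[i - 1] + x

    def freq(s, e):
        # Frequency of the target in the block of positions s..e (inclusive).
        return Fraction(prefix[e] - prefix[s - 1], e - s + 1)

    pruned = set()
    for q in range(2, n + 1):
        if any(freq(p, q - 1) > freq(q, r)
               for p in range(1, q)
               for r in range(q, n + 1)):
            pruned.add(q)
    return pruned

occurs = [1, 0, 1, 1, 0, 1]
pruned = prunable_positions(occurs)
remaining = [q for q in range(1, len(occurs) + 1) if q not in pruned]
```

On this toy input the positions that survive are 1, 3 and 6. With a minimal window length of 3, however, the best admissible window starts at position 3 rather than at 6, which is the situation the slide warns about.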

16 ALGORITHM * MINIMAL WINDOW LENGTH 16

17 ALGORITHM * MINIMAL WINDOW LENGTH In order to know the maximal frequency with a minimal window length mwl, it suffices to apply the method without any minimal window length to keep track of the borders for the part of the stream that precedes the minimal window. Then, when we need the max-frequency, we check those borders in the complete stream, and the minimal window itself. 17
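
One way to read this step is sketched below, under the assumption that the tracked borders belong to the part of the stream before the last mwl itemsets. The candidates are then the windows that start at those borders and run to the end of the complete stream, plus the minimal window itself. The function `mfreq_with_mwl`, the 0/1 occurrence list, and the border list passed in are hypothetical; how the borders are maintained is outside this sketch, and mwl is assumed to be at least 1.

```python
from fractions import Fraction

def mfreq_with_mwl(occurs, prefix_borders, mwl):
    """Max-frequency with minimal window length mwl (illustrative, assumes mwl >= 1).

    `occurs[i]` is 1 if the itemset at position i + 1 contains the target.
    `prefix_borders` are 1-based border positions, assumed to be the borders
    of the stream without its last mwl itemsets.
    """
    n = len(occurs)
    if n < mwl:
        return Fraction(0)
    # suffix_count[s] = number of occurrences in positions s..n.
    suffix_count = [0] * (n + 2)
    for s in range(n, 0, -1):
        suffix_count[s] = suffix_count[s + 1] + occurs[s - 1]
    # Candidate starts: the minimal window, plus every border before it.
    starts = [n - mwl + 1] + [p for p in prefix_borders if p <= n - mwl]
    return max(Fraction(suffix_count[s], n - s + 1) for s in starts)

occurs = [1, 0, 1, 1, 0, 1]
value = mfreq_with_mwl(occurs, prefix_borders=[1, 3], mwl=3)
```

Here the minimal window (positions 4 to 6) has frequency 2/3, while the window starting at border 3 reaches 3/4, so the border check is what yields the final answer.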

18 ALGORITHM * MINING ALL ITEMSETS 18

19 ALGORITHM * MINING ALL ITEMSETS We do not need to maintain the summaries of all itemsets, but only of those that were once frequent in the minimal window and that are, at the same time, frequent now within the part of the stream. Furthermore, we need to find the frequent itemsets in the mwl windows. 19

20 EXPERIMENTS 20

21 EXPERIMENTS 21

22 CONCLUSION We presented a new frequency measure for itemsets in streams that does not rely on a fixed window length or a time-decaying factor. An experimental evaluation supported the claim that the new measure can be computed from a summary with extremely small memory requirements, which can be maintained and updated efficiently. The summary of the stream consists of the borders and their corresponding frequencies. 22

