Published by Julien Mayberry. Modified over 9 years ago.
Resource-oriented Approximation for Frequent Itemset Mining from Bursty Data Streams
SIGMOD’14
Toshitaka Yamamoto, Koji Iwanuma, Shoshi Fukuda
Introduction
▪ A data stream: an unbounded sequence of data arriving at high speed
▪ FIM-DS: Frequent Itemset Mining from Data Streams
▪ e.g., {a}:4, {b}:3, {c}:3
▪ Applications of FIM-DS: monitoring surveillance systems, communication networks, the retail industry, ...
▪ A challenging problem in FIM-DS: handling the huge combinatorial number of entries that are generated from each streaming transaction and stored in memory
▪ This study considers approximation techniques for FIM-DS.
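To make the combinatorial problem concrete, the following is a minimal sketch of exact (non-approximate) itemset counting, which enumerates every non-empty subset of each transaction. The names `count_itemsets` and `subsets_per_transaction` and the toy stream are illustrative assumptions, not taken from the paper; the stream is chosen so that the single-item counts agree with the example {a}:4, {b}:3, {c}:3.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(stream):
    """Exact counting: one table entry per distinct subset seen so far."""
    counts = Counter()
    for transaction in stream:
        items = sorted(set(transaction))
        # Enumerate all non-empty subsets of this transaction.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return counts


def subsets_per_transaction(length):
    """Number of non-empty subsets generated by one transaction."""
    return 2 ** length - 1


# Hypothetical toy stream (an assumption for illustration).
stream = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
counts = count_itemsets(stream)
print(counts[frozenset({"a"})], counts[frozenset({"b"})], counts[frozenset({"c"})])
print(subsets_per_transaction(20))
```

A single transaction of length 20 already yields more than a million subsets, which is why a bursty transaction can exhaust memory under exact counting.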
Introduction
Some approximation methods for FIM-DS:
▪ Parameter-oriented approaches
▪ One-scan approximation algorithms
▪ Two types: the deletion approach and the random sampling approach
▪ Provide some guarantee that the resulting itemsets have frequencies with errors bounded by a given parameter
▪ No false negatives under some conditions
Contents
▪ Introduction
▪ Preliminary and Background
▪ Parameter-Oriented vs. Resource-Oriented
▪ LC-SS Algorithm
▪ Skip LC-SS Algorithm (Introduction, Performance, Improvement)
▪ Further Improvements
▪ Conclusion and Future Work
Notations and Terminology
Lossy Counting algorithm
▪ The challenging problem:
▪ The LC algorithm must generate (and check) every subset of each transaction at least once
▪ Combinatorial explosion of memory consumption
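The slide bodies describing Lossy Counting did not survive, so the following is only a hedged sketch: a simplified itemset adaptation of the classic Lossy Counting scheme of Manku and Motwani, processing one transaction at a time. The function names, the `[count, max_error]` entry layout, and the toy stream are assumptions for illustration and may differ from the formulation used in the paper. It shows the point made above: every subset of every transaction is generated and looked up, even though most are pruned later.

```python
import math
from itertools import combinations


def lossy_counting(stream, epsilon):
    """Simplified Lossy Counting over itemsets (illustrative sketch)."""
    width = math.ceil(1 / epsilon)   # bucket width
    table = {}                       # itemset -> [count, max_error]
    n = 0
    for transaction in stream:
        n += 1
        bucket = math.ceil(n / width)
        items = sorted(set(transaction))
        # Every non-empty subset is generated and checked at least once.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                if key in table:
                    table[key][0] += 1
                else:
                    table[key] = [1, bucket - 1]
        # At each bucket boundary, delete entries that cannot be frequent.
        if n % width == 0:
            stale = [k for k, (c, d) in table.items() if c + d <= bucket]
            for key in stale:
                del table[key]
    return table, n


def frequent_itemsets(table, n, support, epsilon):
    """Report itemsets whose count is at least (support - epsilon) * n."""
    return {k: c for k, (c, _) in table.items() if c >= (support - epsilon) * n}


# Hypothetical toy stream (an assumption for illustration).
stream = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}] * 5
table, n = lossy_counting(stream, epsilon=0.25)
print(frequent_itemsets(table, n, support=0.6, epsilon=0.25))
```

The table size here is bounded only through the error parameter, not through a fixed memory budget, so a long transaction still inserts an exponential number of entries before the next pruning step.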
Space-Saving algorithm
Parameter-Oriented vs. Resource-Oriented
▪ Resource-oriented approaches:
▪ Approximation methods
▪ Guarantee a resource-specified constraint on memory consumption or data processing time
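As a hedged illustration of what a memory-bounded guarantee looks like, here is a minimal sketch of the classic Space-Saving counter for single items, which never holds more than a fixed number of entries. This is not the authors' LC-SS algorithm; the function name `space_saving`, the parameter `capacity`, and the `[count, error]` entry layout are assumptions for illustration.

```python
def space_saving(stream, capacity):
    """Classic Space-Saving for single items with at most `capacity` entries."""
    counters = {}  # item -> [estimated_count, max_overestimation]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < capacity:
            counters[item] = [1, 0]
        else:
            # Table is full: replace the entry with the smallest count.
            victim = min(counters, key=lambda k: counters[k][0])
            min_count = counters.pop(victim)[0]
            # The newcomer inherits the evicted count as its possible error.
            counters[item] = [min_count + 1, min_count]
    return counters


# Hypothetical toy stream (an assumption for illustration).
counters = space_saving(list("aababcad"), capacity=2)
print(counters)
```

The memory bound is fixed by `capacity` regardless of the stream, and the price is an overestimation error that depends on what had to be evicted, which is the kind of trade-off a resource-oriented method accepts.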
LC-SS Algorithm
The validity of the LC-SS Algorithm
▪ Theorem 2 guarantees the validity (i.e., completeness and accuracy) of the outputs of Algorithm 1.
Skip LC-SS Algorithm
The Validity of the output
Performance of the Skip LC-SS algorithm
▪ Data: online data on earthquake occurrences in Japan from 1981 to 2013, consisting of 16769 transactions with 1229 items
▪ Implemented in C
▪ Environment: Mac Pro, Mac OS 10.6, 3.33 GHz, 16 GB
Improvement of the Skip LC-SS algorithm
▪ Two bottlenecks:
▪ 1. Updating entries
▪ 2. Replacing entries
▪ The replacement operation tends to be more expensive than the update operation
R-skip
T-skip
Further Improvements
▪ Key idea: use stream reduction to dynamically shrink each transaction
▪ One fact: most items in bursty transactions are non-frequent
▪ The principle of anti-monotonicity: every itemset that contains a non-frequent item cannot be frequent
▪ Eliminate non-frequent items from each transaction
▪ In this paper, the SS-ST algorithm is used to perform the stream reduction
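The idea of eliminating non-frequent items can be sketched as follows. This is only an illustration under stated assumptions: it uses exact per-item counts in place of the SS-ST algorithm, whose details are not given in these slides, and the names `reduce_stream` and `min_support` as well as the toy stream are hypothetical. Because the counts are running counts, an item dropped early could still become frequent later, so this simple filter does not by itself give the guarantees claimed for the authors' method.

```python
from collections import Counter


def reduce_stream(stream, min_support):
    """Yield each transaction with currently non-frequent items removed."""
    item_counts = Counter()
    for seen, transaction in enumerate(stream, start=1):
        # Count each item once per transaction.
        item_counts.update(set(transaction))
        threshold = min_support * seen
        # An itemset containing a non-frequent item cannot be frequent,
        # so such items can be dropped before subsets are generated.
        yield {item for item in transaction if item_counts[item] >= threshold}


# Hypothetical toy stream (an assumption for illustration).
stream = [{"a", "b"}, {"a", "x"}, {"a", "b", "y", "z"}, {"a", "b"}]
reduced = list(reduce_stream(stream, min_support=0.6))
print(reduced)
print(max(len(t) for t in stream), max(len(t) for t in reduced))
```

Shortening the longest transactions matters most, since the number of subsets generated per transaction grows exponentially with its length.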
SS-ST algorithm
Experimental results
▪ Web-log data: 19466 transactions with 9961 items; the maximal length decreases by 29 from 106
▪ Retail data: 88162 transactions with 16470 items; the maximal length decreases by 58 from 76
Conclusion
Future Work
▪ 1. Introduce efficient data structures for the Skip LC-SS algorithm
▪ 2. Investigate an adaptive approach, based on the Skip LC-SS algorithm, that can fit the relevant resource in the context of FIM-DS
Thank you!