Resource-oriented Approximation for Frequent Itemset Mining from Bursty Data Streams SIGMOD’14 Toshitaka Yamamoto, Koji Iwanuma, Shoshi Fukuda.

1 Resource-oriented Approximation for Frequent Itemset Mining from Bursty Data Streams SIGMOD’14 Toshitaka Yamamoto, Koji Iwanuma, Shoshi Fukuda

2 Introduction
▪ A data stream: an unbounded sequence of data arriving at high speed
▪ FIM-DS: Frequent Itemset Mining from Data Streams
▪ Example output (itemset:count): {a}:4, {b}:3, {c}:3
▪ Applications of FIM-DS: monitoring surveillance systems, communication networks, the retail industry, …
▪ A challenging problem of FIM-DS: handling the huge combinatorial number of entries that are generated from each streaming transaction and stored in memory
▪ This study considers approximation techniques for FIM-DS.
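To make the combinatorial problem on this slide concrete, here is a minimal Python sketch of exact (non-approximate) itemset counting. The function names and the five-transaction stream are hypothetical, chosen only so that the counts match the example {a}:4, {b}:3, {c}:3; nothing here is taken from the paper itself.

```python
from collections import Counter
from itertools import combinations


def nonempty_subsets(transaction):
    """Yield every non-empty subset of a transaction as a frozenset."""
    items = sorted(set(transaction))
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)


def count_itemsets(stream):
    """Exact counting: one table entry per distinct itemset seen so far."""
    counts = Counter()
    for transaction in stream:
        counts.update(nonempty_subsets(transaction))
    return counts


# Hypothetical toy stream (an assumption for illustration only).
stream = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
counts = count_itemsets(stream)

# A transaction with n distinct items has 2**n - 1 non-empty subsets,
# so a single long (bursty) transaction can flood the table.
subsets_of_20_items = 2**20 - 1
```

With exact counting, the table grows with the number of distinct itemsets, which is why a single 20-item transaction already contributes over a million entries.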

3 Introduction
Some approximation methods for FIM-DS:
▪ Parameter-oriented approaches
▪ One-scan approximation algorithms
▪ Two types: the deletion approach and the random sampling approach
▪ Provide some guarantee that the resulting itemsets have frequencies with errors bounded by a given parameter
▪ No false negatives under certain conditions

4 Introduction

6 Contents
▪ Introduction
▪ Preliminary and Background
▪ Parameter-Oriented vs. Resource-Oriented
▪ LC-SS Algorithm
▪ Skip LC-SS Algorithm (Introduction, Performance, Improvement)
▪ Further Improvements
▪ Conclusion and Future Work

7 Notations and Terminology

11 Notations and Terminology

12 Lossy Counting algorithm

13 Lossy Counting algorithm

14 Lossy Counting algorithm
▪ The challenging problem:
▪ The LC algorithm must generate (and check) every subset of each transaction at least once
▪ This causes a combinatorial explosion of memory consumption
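The slides' own statement of Lossy Counting did not survive in this transcript, so the following is a sketch of the textbook form (after Manku and Motwani), applied naively to every subset of every transaction. The names `lossy_counting`, `frequent`, `epsilon`, and `sigma` are assumptions for illustration. The returned `peak` shows the problem named above: the table must hold all subsets of a transaction before any pruning can happen.

```python
from itertools import combinations
from math import ceil


def subsets(transaction):
    """Yield every non-empty subset of a transaction as a frozenset."""
    items = sorted(set(transaction))
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def lossy_counting(stream, epsilon):
    """Textbook-style Lossy Counting over itemsets (illustrative sketch).

    Returns (table, n, peak): table maps itemset -> [count, delta],
    n is the number of transactions, peak is the largest table size seen.
    """
    width = ceil(1 / epsilon)      # bucket width in transactions
    table = {}
    n = 0
    peak = 0
    for transaction in stream:
        n += 1
        bucket = ceil(n / width)   # id of the current bucket
        for itemset in subsets(transaction):
            if itemset in table:
                table[itemset][0] += 1
            else:
                # delta bounds how many occurrences may have been missed
                table[itemset] = [1, bucket - 1]
        peak = max(peak, len(table))
        if n % width == 0:         # bucket boundary: prune weak entries
            weak = [s for s, (c, d) in table.items() if c + d <= bucket]
            for itemset in weak:
                del table[itemset]
    return table, n, peak


def frequent(table, n, sigma, epsilon):
    """Report itemsets whose stored count is at least (sigma - epsilon) * n."""
    return {s: c for s, (c, d) in table.items() if c >= (sigma - epsilon) * n}
```

Memory use here is governed by the error parameter `epsilon` and by the data, not by a fixed budget: one 12-item transaction alone forces 4095 entries into the table.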

15 Space-Saving algorithm

16 Space-Saving algorithm
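The content of these two slides is not in the transcript. As a reminder of the general idea, here is a sketch of the standard Space-Saving algorithm (after Metwally et al.) for single items, which keeps at most `k` counters. The function name and the linear-time minimum search are simplifications assumed for illustration; practical implementations use a dedicated structure to find the minimum quickly.

```python
def space_saving(stream, k):
    """Standard Space-Saving over single items (illustrative sketch).

    Returns a dict item -> [count, error] with at most k entries, where
    count overestimates the true frequency by at most error.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < k:
            counters[item] = [1, 0]
        else:
            # Table full: evict an entry with the minimum count and let
            # the newcomer inherit that count as its possible error.
            victim = min(counters, key=lambda x: counters[x][0])
            floor = counters.pop(victim)[0]
            counters[item] = [floor + 1, floor]
    return counters
```

Unlike Lossy Counting, the table size is fixed in advance by `k`, which is the property a resource-oriented method needs.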

18 Parameter-Oriented vs. Resource-Oriented

21 Parameter-Oriented vs. Resource-Oriented

22 Parameter-Oriented vs. Resource-Oriented
▪ Resource-oriented approaches:
▪ Approximation methods
▪ Guarantee that a specified resource constraint is met, on either memory consumption or data processing time

24 LC-SS Algorithm

25 LC-SS Algorithm

26 LC-SS Algorithm

27 LC-SS Algorithm

28 The validity in the LC-SS Algorithm

29 The validity in the LC-SS Algorithm

30 The validity in the LC-SS Algorithm
▪ Theorem 2 guarantees the validity (i.e., completeness and accuracy) of the outputs of Algorithm 1.

31 The validity in the LC-SS Algorithm

33 Skip LC-SS Algorithm

34 Skip LC-SS Algorithm

35 Skip LC-SS Algorithm

36 Skip LC-SS Algorithm

37 Skip LC-SS Algorithm

38 The Validity of the output

39 Performance of the Skip LC-SS algorithm
▪ Data: online data on earthquake occurrences in Japan from 1981 to 2013, consisting of 16769 transactions with 1229 items
▪ Implementation: C
▪ Machine: Mac Pro, Mac OS 10.6, 3.33 GHz, 16 GB

40 Performance of the Skip LC-SS algorithm

41 Performance of the Skip LC-SS algorithm

42 Performance of the Skip LC-SS algorithm

43 Improvement of Skip LC-SS algorithm
▪ Two bottlenecks:
▪ 1. Updating entries
▪ 2. Replacing entries
▪ The replacement operation tends to be more expensive than the update operation

44 R-skip

45 T-skip

50 Further Improvements
▪ Key idea: use stream reduction to dynamically compress each transaction
▪ One fact: most items in bursty transactions are non-frequent
▪ The principle of anti-monotonicity: every itemset that contains a non-frequent item is itself non-frequent
▪ Eliminate non-frequent items from each transaction
▪ In this paper, the SS-ST algorithm is used to perform the stream reduction
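The effect of eliminating non-frequent items can be sketched as follows. This is an offline illustration that uses exact item counts; it is not the SS-ST algorithm, whose details are not in this transcript and which performs the reduction dynamically on the stream. The function name, the `min_count` threshold, and the toy transactions are all assumptions.

```python
from collections import Counter


def reduce_stream(transactions, min_count):
    """Drop items that occur in fewer than min_count transactions.

    Any itemset containing a dropped item occurs in fewer than min_count
    transactions too, so no itemset reaching min_count is lost.
    """
    item_counts = Counter(item for t in transactions for item in set(t))
    keep = {item for item, c in item_counts.items() if c >= min_count}
    return [set(t) & keep for t in transactions]


# Hypothetical toy data: "a" and "b" recur, the other items occur once.
transactions = [
    {"a", "b", "x1", "x2", "x3"},
    {"a", "b"},
    {"a", "y1", "y2"},
    {"a", "b", "z1"},
]
reduced = reduce_stream(transactions, min_count=2)

longest_before = max(len(t) for t in transactions)
longest_after = max(len(t) for t in reduced)
```

In this toy case the longest transaction shrinks from 5 items to 2, so its non-empty subsets drop from 31 to 3, which is the kind of saving the reduction is meant to deliver on bursty transactions.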

51 SS-ST algorithm

52 Experimental results

53 Experimental results
▪ Web-log data: 19466 transactions with 9961 items; the maximal transaction length decreases by 29 from 106

54 Experimental results
▪ Retail data: 88162 transactions with 16470 items; the maximal transaction length decreases by 58 from 76

56 Conclusion

57 Future Work
▪ 1. Introduce efficient data structures for the Skip LC-SS algorithm
▪ 2. Investigate an adaptive approach, based on the Skip LC-SS algorithm, that can fit the relevant resources in the context of FIM-DS

58 Thank you!

