Download presentation
Presentation is loading. Please wait.
Published byAllyson Copeland Modified over 9 years ago
1
Fast and Exact Monitoring of Co-evolving Data Streams Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Naonori Ueda (NTT) Masatoshi Yoshikawa (Kyoto University) ICDM 2014Y. Matsubara et al.1 Speaker: Yasushi Sakurai
2
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.2ICDM 2014
3
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.3ICDM 2014 Multiple distinct patterns different durations e.g., Walking, punching
4
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.4ICDM 2014 Q1: Walking Q2: Running Q3: Punching Queries :
5
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.5ICDM 2014 Q1: Walking Q2: Running Q3: Punching Queries : Q. Can we find subsequences that have the characteristics of query ?
6
Outline ICDM 20146 -Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
7
Background ICDM 2014Y. Matsubara et al.7 Goal: statistical monitoring of time-varying data streams
8
Background ICDM 2014Y. Matsubara et al.8 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream
9
Background ICDM 2014Y. Matsubara et al.9 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream Punch is here!
10
Background ICDM 2014Y. Matsubara et al.10 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream Hidden Markov models (HMMs)
11
Background ICDM 2014Y. Matsubara et al.11 Hidden Markov models (HMMs) Initial state probabilities State transition probabilities Output probabilities state 1 state 3 state 2 Details
12
Background ICDM 2014Y. Matsubara et al.12 -Model: -Sequence: Likelihood: Details state 1 state 3 state 2 Viterbi path Trellis structure
13
Background ICDM 2014Y. Matsubara et al.13 -Model: -Sequence: Likelihood: Details state 1 state 3 state 2 Viterbi path Trellis structure
14
Background ICDM 201414 Given: a data stream X and model Y. Matsubara et al.
15
Background ICDM 201415 Given: a data stream X and model Y. Matsubara et al. Find: subsequences that has a high likelihood value
16
Requirements ICDM 201416 1.Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al.
17
Requirements ICDM 201417 1.Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X
18
Requirements ICDM 201418 1.Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X Walk Jump Walk Jump Punch
19
Requirements ICDM 201419 1.Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X
20
Requirements ICDM 201420 1.Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X
21
Requirements ICDM 201421 1.Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X ignore
22
Requirements ICDM 201422 1.Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X Best match Candidates
23
Problem definition ICDM 201423 Given: data stream Model thresholds Y. Matsubara et al. X
24
Problem definition ICDM 201424 Given: data stream Model thresholds Report: all subsequences that satisfy: 1. 2. only the local maximum, among several overlapping matches Y. Matsubara et al. X Best match
25
Outline ICDM 201425 -Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
26
Previous solution ICDM 201426Y. Matsubara et al. Sliding model method (SMM), by Wilpon et al. Compute likelihood starting from every time-tick
27
Previous solution ICDM 201427Y. Matsubara et al. Sliding model method (SMM), by Wilpon et al. Compute likelihood starting from every time-tick O(n) trellis structures Update O(nk 2 ) for each time-tick
28
Main idea: StreamScan ICDM 201428Y. Matsubara et al. (a) Cumulative likelihood function (b) Subsequence trellis structure (STS)
29
ICDM 201429Y. Matsubara et al. (a) Cumulative likelihood function Details Main idea (a)
30
ICDM 201430Y. Matsubara et al. (a) Cumulative likelihood function Details It requires a single structure It guarantees the best matches Main idea (a)
31
ICDM 201431Y. Matsubara et al. (b) Subsequence trellis structure (STS) Details Main idea (b)
32
ICDM 201432Y. Matsubara et al. (b) Subsequence trellis structure (STS) Details Main idea (b) : Starting position : Cum. likelihood For each cell:
33
ICDM 201433Y. Matsubara et al. Example, Details Algorithm state 1 state 3 state 2
34
ICDM 201434Y. Matsubara et al. Example, Details Algorithm State 10 (1) State 20 (1) State 30 (1) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
35
ICDM 201435Y. Matsubara et al. Example, Details Algorithm State 10 (1) State 20 (1) State 30 (1) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 : Cum. Likelihood : Starting position (STS, k=3)
36
ICDM 201436Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) State 20 (1) 0 (2) State 30 (1) 0 (2) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 candidate! (STS, k=3)
37
ICDM 201437Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) State 20 (1) 0 (2) 37.5 (2) State 30 (1) 0 (2) 0 (3) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
38
ICDM 201438Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) State 30 (1) 0 (2) 0 (3) 0 (4) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
39
ICDM 201439Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) 0 (5) 0 (6) 0 (7) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) 0 (5) 0 (6) 0 (7) State 30 (1) 0 (2) 0 (3) 0 (4) 156.25 (2) 1562.5 (2) 15625 (2) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
40
ICDM 201440Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) 0 (5) 0 (6) 0 (7) 10 (8) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) 0 (5) 0 (6) 0 (7) 0 (8) State 30 (1) 0 (2) 0 (3) 0 (4) 156.25 (2) 1562.5 (2) 15625 (2) 0 (8) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 Report X[2:7] (STS, k=3)
41
ICDM 201441Y. Matsubara et al. StreamScan guarantees -no false dismissals -O(1) space and time per time-tick (Details in paper!) Theoretical analysis Details
42
Outline ICDM 201442 -Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
43
Experiments We answer the following questions… Q1. Effectiveness How successful is StreamScan in capturing sequence patterns? Q2. Scalability How does it scale with the sequence lengths n in terms of time and space? ICDM 201443Y. Matsubara et al.
44
Q1. Effectiveness (MoCap) ICDM 201444Y. Matsubara et al. (Query) (Output)
45
Q1. Effectiveness ICDM 201445Y. Matsubara et al. MoCap Q1. Walk Q2. Jump Q1. Walk Q2. Jump Q3. Moving arms
46
Q2. Scalability Time/space vs. data size (length) : n ICDM 201446Y. Matsubara et al. StreamScan requires constant time/space, i.e., O(1) (n) (time)(space)
47
Q2. Scalability Time/space vs. data size (length) : n ICDM 201447Y. Matsubara et al. StreamScan requires constant time/space, i.e., O(1) (n) (time)(space) SFR’05 1383x SMM’89 479,000x
48
Outline ICDM 201448 -Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
49
StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM 201449Y. Matsubara et al.
50
StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM 201450Y. Matsubara et al.
51
Application (1) Social activity monitoring Given: Web-click sequences (1 month, 5urls) Query: first Sunday ( 5urls: blog, news, dictionary, Q&A, mail ) ICDM 201451Y. Matsubara et al. Time (per 10 seconds) Query
52
Application (1) Social activity monitoring Given: Web-click sequences (1 month, 5urls) Query: first Sunday ( 5urls: blog, news, dictionary, Q&A, mail ) ICDM 201452Y. Matsubara et al. Time (per 10 seconds) Query StreamScan identifies all weekends + holiday Detected!
53
StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM 201453Y. Matsubara et al.
54
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) ICDM 201454Y. Matsubara et al. Application (2)
55
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM 201455Y. Matsubara et al. Application (2) Time (weekly, 10 years)
56
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM 201456Y. Matsubara et al. Application (2) Any extreme behavior? Time (weekly, 10 years) ?
57
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM 201457Y. Matsubara et al. Application (2) Any extreme behavior? Time (weekly, 10 years) (first 3 years) Query ?
58
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM 201458Y. Matsubara et al. Application (2) Time (weekly, 10 years) (first 3 years) Global financial crisis in 2008! Detected!
59
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Flu-related keywords ICDM 201459Y. Matsubara et al. Application (2) (first 3 years) Query ?
60
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Flu-related keywords ICDM 201460Y. Matsubara et al. Application (2) Swine flu pandemic in 2009 Detected!
61
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Seasonal sweets-related keywords ICDM 201461Y. Matsubara et al. Application (2) ? (first 3 years) Query
62
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Seasonal sweets-related keywords ICDM 201462Y. Matsubara et al. Application (2) Release of Android OS, “Gingerbread” “Ice Cream Sandwich” Detected!
63
Outline ICDM 201463 -Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
64
Conclusions StreamScan has the following properties Effective Find fruitful patterns from diverse data streams Exact It guarantees exactness Fast and nimble It requires single scan, O(1) space and time ICDM 2014Y. Matsubara et al.64 ✔ ✔ ✔
65
Thank you! ICDM 201465Y. Matsubara et al. http://www.cs.kumamoto- u.ac.jp/~yasuko/yasuko@cs.kumamoto-u.ac.jp
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.