Fast and Exact Monitoring of Co-evolving Data Streams Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Naonori Ueda (NTT) Masatoshi Yoshikawa (Kyoto University) ICDM 2014Y. Matsubara et al.1 Speaker: Yasushi Sakurai
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.2ICDM 2014
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.3ICDM 2014 Multiple distinct patterns different durations e.g., Walking, punching
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.4ICDM 2014 Q1: Walking Q2: Running Q3: Punching Queries :
Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.5ICDM 2014 Q1: Walking Q2: Running Q3: Punching Queries : Q. Can we find subsequences that have the characteristics of query ?
Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
Background ICDM 2014Y. Matsubara et al.7 Goal: statistical monitoring of time-varying data streams
Background ICDM 2014Y. Matsubara et al.8 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream
Background ICDM 2014Y. Matsubara et al.9 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream Punch is here!
Background ICDM 2014Y. Matsubara et al.10 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream Hidden Markov models (HMMs)
Background ICDM 2014Y. Matsubara et al.11 Hidden Markov models (HMMs) Initial state probabilities State transition probabilities Output probabilities state 1 state 3 state 2 Details
Background ICDM 2014Y. Matsubara et al.12 -Model: -Sequence: Likelihood: Details state 1 state 3 state 2 Viterbi path Trellis structure
Background ICDM 2014Y. Matsubara et al.13 -Model: -Sequence: Likelihood: Details state 1 state 3 state 2 Viterbi path Trellis structure
Background ICDM Given: a data stream X and model Y. Matsubara et al.
Background ICDM Given: a data stream X and model Y. Matsubara et al. Find: subsequences that has a high likelihood value
Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al.
Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X
Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X Walk Jump Walk Jump Punch
Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X
Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X
Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X ignore
Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X Best match Candidates
Problem definition ICDM Given: data stream Model thresholds Y. Matsubara et al. X
Problem definition ICDM Given: data stream Model thresholds Report: all subsequences that satisfy: only the local maximum, among several overlapping matches Y. Matsubara et al. X Best match
Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
Previous solution ICDM Y. Matsubara et al. Sliding model method (SMM), by Wilpon et al. Compute likelihood starting from every time-tick
Previous solution ICDM Y. Matsubara et al. Sliding model method (SMM), by Wilpon et al. Compute likelihood starting from every time-tick O(n) trellis structures Update O(nk 2 ) for each time-tick
Main idea: StreamScan ICDM Y. Matsubara et al. (a) Cumulative likelihood function (b) Subsequence trellis structure (STS)
ICDM Y. Matsubara et al. (a) Cumulative likelihood function Details Main idea (a)
ICDM Y. Matsubara et al. (a) Cumulative likelihood function Details It requires a single structure It guarantees the best matches Main idea (a)
ICDM Y. Matsubara et al. (b) Subsequence trellis structure (STS) Details Main idea (b)
ICDM Y. Matsubara et al. (b) Subsequence trellis structure (STS) Details Main idea (b) : Starting position : Cum. likelihood For each cell:
ICDM Y. Matsubara et al. Example, Details Algorithm state 1 state 3 state 2
ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) State 20 (1) State 30 (1) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) State 20 (1) State 30 (1) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 : Cum. Likelihood : Starting position (STS, k=3)
ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) State 20 (1) 0 (2) State 30 (1) 0 (2) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 candidate! (STS, k=3)
ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) State 20 (1) 0 (2) 37.5 (2) State 30 (1) 0 (2) 0 (3) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) State 30 (1) 0 (2) 0 (3) 0 (4) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) 0 (5) 0 (6) 0 (7) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) 0 (5) 0 (6) 0 (7) State 30 (1) 0 (2) 0 (3) 0 (4) (2) (2) (2) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)
ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) 0 (5) 0 (6) 0 (7) 10 (8) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) 0 (5) 0 (6) 0 (7) 0 (8) State 30 (1) 0 (2) 0 (3) 0 (4) (2) (2) (2) 0 (8) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 Report X[2:7] (STS, k=3)
ICDM Y. Matsubara et al. StreamScan guarantees -no false dismissals -O(1) space and time per time-tick (Details in paper!) Theoretical analysis Details
Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
Experiments We answer the following questions… Q1. Effectiveness How successful is StreamScan in capturing sequence patterns? Q2. Scalability How does it scale with the sequence lengths n in terms of time and space? ICDM Y. Matsubara et al.
Q1. Effectiveness (MoCap) ICDM Y. Matsubara et al. (Query) (Output)
Q1. Effectiveness ICDM Y. Matsubara et al. MoCap Q1. Walk Q2. Jump Q1. Walk Q2. Jump Q3. Moving arms
Q2. Scalability Time/space vs. data size (length) : n ICDM Y. Matsubara et al. StreamScan requires constant time/space, i.e., O(1) (n) (time)(space)
Q2. Scalability Time/space vs. data size (length) : n ICDM Y. Matsubara et al. StreamScan requires constant time/space, i.e., O(1) (n) (time)(space) SFR’ x SMM’89 479,000x
Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM Y. Matsubara et al.
StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM Y. Matsubara et al.
Application (1) Social activity monitoring Given: Web-click sequences (1 month, 5urls) Query: first Sunday ( 5urls: blog, news, dictionary, Q&A, mail ) ICDM Y. Matsubara et al. Time (per 10 seconds) Query
Application (1) Social activity monitoring Given: Web-click sequences (1 month, 5urls) Query: first Sunday ( 5urls: blog, news, dictionary, Q&A, mail ) ICDM Y. Matsubara et al. Time (per 10 seconds) Query StreamScan identifies all weekends + holiday Detected!
StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM Y. Matsubara et al.
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) ICDM Y. Matsubara et al. Application (2)
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Time (weekly, 10 years)
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Any extreme behavior? Time (weekly, 10 years) ?
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Any extreme behavior? Time (weekly, 10 years) (first 3 years) Query ?
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Time (weekly, 10 years) (first 3 years) Global financial crisis in 2008! Detected!
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Flu-related keywords ICDM Y. Matsubara et al. Application (2) (first 3 years) Query ?
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Flu-related keywords ICDM Y. Matsubara et al. Application (2) Swine flu pandemic in 2009 Detected!
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Seasonal sweets-related keywords ICDM Y. Matsubara et al. Application (2) ? (first 3 years) Query
“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Seasonal sweets-related keywords ICDM Y. Matsubara et al. Application (2) Release of Android OS, “Gingerbread” “Ice Cream Sandwich” Detected!
Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.
Conclusions StreamScan has the following properties Effective Find fruitful patterns from diverse data streams Exact It guarantees exactness Fast and nimble It requires single scan, O(1) space and time ICDM 2014Y. Matsubara et al.64 ✔ ✔ ✔
Thank you! ICDM Y. Matsubara et al.