AutoPlait: Automatic Mining of Co-evolving Time Sequences. Yasuko Matsubara (Kumamoto University), Yasushi Sakurai (Kumamoto University), Christos Faloutsos (CMU).

1 AutoPlait: Automatic Mining of Co-evolving Time Sequences. Yasuko Matsubara (Kumamoto University), Yasushi Sakurai (Kumamoto University), Christos Faloutsos (CMU). SIGMOD 2014, Y. Matsubara et al.

2 Motivation Given: co-evolving time series, e.g., MoCap (leg/arm sensors) recording of the “Chicken dance” (figure: sensor values over time)

3 Motivation Given: co-evolving time series, e.g., MoCap (leg/arm sensors) recording of the “Chicken dance” (figure: sensor values over time). Q. Any distinct patterns? Q. If yes, how many? Q. What kind?

4 Motivation Challenges with co-evolving sequences: unknown number of patterns (e.g., beaks); patterns of different durations.

5 Motivation Challenges with co-evolving sequences: unknown number of patterns (e.g., beaks); patterns of different durations. Q. Can we summarize it automatically?

6 Motivation Goal: find patterns that agree with human intuition.

7 Motivation Goal: find patterns that agree with human intuition. AutoPlait: a “fully-automatic” mining algorithm.

8 Importance of “fully-automatic” No magic numbers! In big data mining we cannot afford human intervention. Manual: sensitive to parameter tuning; long tuning steps (hours, days, …). Automatic (no magic numbers): no expert tuning required.

9 Outline - Motivation - Problem definition - Compression & summarization - Algorithms - Experiments - AutoPlait at work - Conclusions

10 Problem definition Key concepts - Bundle (given) - Segment (hidden) - Regime (hidden) - Segment-membership (hidden)

11 Problem definition Bundle (given): a set of d co-evolving sequences, e.g., bundle X with d=4.

12 Problem definition Segment (hidden): convert X into m segments, S, e.g., m=8: s1, s2, s3, s4, s5, s6, s7, s8.

13 Problem definition Regime (hidden): groups of similar segments, where each regime r has its own model, e.g., r=4 regimes such as wings and beaks.

14 Problem definition Segment-membership (hidden): assignment of each segment to a regime, e.g., for m=8: F = { 2, 4, 1, 3, 2, 4, 1, 3 }.

15 Problem definition Given: bundle X

16 Problem definition Given: bundle X. Find: compact description C of X.

17 Problem definition Given: bundle X. Find: compact description C of X, consisting of m segments, r regimes, and the segment-membership.
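The pieces of the compact description C can be pictured as a small data structure. The following is only a sketch: the class names, the field names, and the equal-length segment boundaries are hypothetical, and the per-regime models are left out. Only the membership list F = { 2, 4, 1, 3, 2, 4, 1, 3 } with m=8 and r=4 comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int  # first time tick of the segment (inclusive)
    end: int    # one past the last time tick (exclusive)


@dataclass
class CompactDescription:
    segments: list    # the m segments, in time order
    n_regimes: int    # r
    membership: list  # F: regime id (1..r) for each segment

    def __post_init__(self):
        if len(self.membership) != len(self.segments):
            raise ValueError("F needs exactly one regime id per segment")
        if any(not 1 <= f <= self.n_regimes for f in self.membership):
            raise ValueError("regime ids must lie in 1..r")

    def segments_of(self, regime):
        """Return all segments assigned to the given regime id."""
        return [s for s, f in zip(self.segments, self.membership) if f == regime]


# Hypothetical boundaries: eight equal segments of 100 ticks each.
segments = [Segment(100 * i, 100 * (i + 1)) for i in range(8)]
C = CompactDescription(segments, n_regimes=4, membership=[2, 4, 1, 3, 2, 4, 1, 3])
regime_two = C.segments_of(2)  # the first and fifth segments
```

Keeping F as a flat list next to the segments mirrors the slide notation, where the i-th entry of F is the regime of segment s_i.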

18 Outline - Motivation - Problem definition - Compression & summarization - Algorithms - Experiments - AutoPlait at work - Conclusions

19 Main ideas Goal: compact description of X without user intervention! Challenges: Q1. How to generate ‘informative’ regimes? Q2. How to decide the number of regimes/segments?

20 Main ideas Goal: compact description of X without user intervention! Challenges: Q1. How to generate ‘informative’ regimes? Idea (1): Multi-level chain model. Q2. How to decide the number of regimes/segments? Idea (2): Model description cost.

21 Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes? (Figure: sequences mapped by the model to regimes: beaks, claps, wings)

22 Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes? Idea (1): Multi-level chain model: an HMM-based probabilistic model with “across-regime” transitions. (Figure: sequences mapped by the model to regimes: beaks, wings, claps)

23 Idea (1): MLCM: multi-level chain model Details: the model consists of r regimes (HMMs), each with single-HMM parameters, plus across-regime transition probabilities.

24 Idea (1): MLCM: multi-level chain model Details: r regimes (HMMs), each with single-HMM parameters, plus across-regime transition probabilities. Example with r=2 regimes: Regime 1 “beaks” (k=3: state 1, state 2, state 3) and Regime 2 “wings” (k=2: state 1, state 2), connected by regime switches.
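One way to read the two-level structure is as a generative process: at every tick the chain first decides whether to stay in the current regime or switch, and then picks a state inside the chosen regime. The sketch below samples only the hidden (regime, state) path, with no emissions. The function name, the dictionary layout, all probability values, and the rule that a switch enters the new regime through its initial distribution are assumptions made for illustration, not details taken from the slides.

```python
import random


def sample_regime_path(regimes, delta, n, rng):
    """Sample n hidden (regime, state) pairs from a two-level chain.

    regimes: list of dicts with 'pi' (initial state probabilities) and
             'A' (state transition matrix) for each HMM.
    delta:   r x r across-regime transition probabilities.
    """
    r = len(regimes)
    u = rng.randrange(r)  # assumption: the first regime is chosen uniformly
    i = rng.choices(range(len(regimes[u]["pi"])), weights=regimes[u]["pi"])[0]
    path = [(u, i)]
    for _ in range(n - 1):
        v = rng.choices(range(r), weights=delta[u])[0]
        if v == u:
            # Stay in the regime: follow its own state transition matrix.
            row = regimes[u]["A"][i]
            i = rng.choices(range(len(row)), weights=row)[0]
        else:
            # Regime switch: enter the new regime via its initial distribution.
            pi = regimes[v]["pi"]
            i = rng.choices(range(len(pi)), weights=pi)[0]
        u = v
        path.append((u, i))
    return path


# Hypothetical parameters for the r=2 example: "beaks" (k=3) and "wings" (k=2).
beaks = {"pi": [1.0, 0.0, 0.0],
         "A": [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.2, 0.0, 0.8]]}
wings = {"pi": [0.5, 0.5],
         "A": [[0.9, 0.1], [0.1, 0.9]]}
delta = [[0.95, 0.05], [0.05, 0.95]]
path = sample_regime_path([beaks, wings], delta, n=50, rng=random.Random(0))
```

With across-regime probabilities close to 1 on the diagonal, the sampled path stays in one regime for long stretches, which is the behaviour the segment/regime picture relies on.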

25 Idea (2): model description cost Q2. How to decide the number of regimes/segments? Idea (2): Model description cost: minimize the coding cost to find the “optimal” number of segments/regimes.

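26 Idea (2): model description cost Idea: minimize the encoding cost! $\min\,(\mathrm{Cost}_M(M) + \mathrm{Cost}_C(X \mid M))$, where $\mathrm{Cost}_M(M)$ is the model cost and $\mathrm{Cost}_C(X \mid M)$ is the coding cost. Good compression means good description (number of regimes r and segments m).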

27 Idea (2): model description cost Details: total cost of bundle X, given C.

28 Idea (2): model description cost Details: the total cost of bundle X, given C, sums the following terms: model description cost of Θ; coding cost of X given Θ; number of segments/regimes; segment lengths; segment-membership F; duration/dimensions.
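To make the labeled terms concrete, the sketch below adds them up in bits. It is an assumption-laden illustration, not the slide's formula: integers are charged with the universal code length log*, the membership F is charged log2(r) bits per segment, and the model cost and coding cost are passed in as plain numbers because their definitions are not given in this transcript. The function names `log_star` and `total_cost` are hypothetical.

```python
import math


def log_star(n):
    """Universal code length (in bits) for a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    total = math.log2(2.865064)  # normalizing constant of the universal code
    term = math.log2(n)
    while term > 0:              # add log2 n + log2 log2 n + ... while positive
        total += term
        term = math.log2(term)
    return total


def total_cost(n, d, segment_lengths, r, model_cost, coding_cost):
    """Sum the description-cost terms for one candidate description."""
    m = len(segment_lengths)
    cost = log_star(n) + log_star(d)      # duration / dimensions
    cost += log_star(m) + log_star(r)     # number of segments / regimes
    cost += m * math.log2(r)              # segment-membership F
    # The last segment length is implied by n, so it is not charged here.
    cost += sum(log_star(length) for length in segment_lengths[:-1])
    return cost + model_cost + coding_cost


# Two hypothetical descriptions of the same bundle with equal fit quality.
coarse = total_cost(800, 4, [200] * 4, r=2, model_cost=50.0, coding_cost=1000.0)
fine = total_cost(800, 4, [100] * 8, r=4, model_cost=100.0, coding_cost=1000.0)
```

When the extra segments and regimes do not lower the coding cost, the finer description is strictly more expensive, which is what lets the cost act as a stopping rule without a user-set threshold.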

29 Outline - Motivation - Problem definition - Compression & summarization - Algorithms - Experiments - AutoPlait at work - Conclusions

30 AutoPlait Overview Start: iteration 0 with r=1, m=1. (Figure: regimes found per iteration, iterations 0 to 7)

31 AutoPlait Overview Iteration 1: split into r=2, m=4. (Figure: regimes found per iteration, iterations 0 to 7)

32 AutoPlait Overview Iteration 1: r=2, m=4. Iteration 2: split into r=3, m=6. (Figure: regimes found per iteration, iterations 0 to 7)

33 AutoPlait Overview Iteration 1: r=2, m=4. Iteration 2: r=3, m=6. Iteration 4: split into r=4, m=8. (Figure: regimes found per iteration, iterations 0 to 7)

34 AutoPlait Algorithms 1. CutPointSearch (inner-most loop): find good cut-points/segments. 2. RegimeSplit (inner loop): estimate good regime parameters Θ. 3. AutoPlait (outer loop): search for the best number of regimes (r=2, 3, 4, …).

35 1. CutPointSearch (inner-most loop) Given: bundle; parameters of two regimes. Find: cut-points of segment sets S1, S2.

36 1. CutPointSearch (inner-most loop) Details: a DP algorithm computes the likelihood over the states of both regimes (state 1, state 2, state 3 and state 1, state 2) and decides at each step whether to switch regimes.

37 1. CutPointSearch (inner-most loop) Details: theoretical analysis. Scalability: it needs only a single scan; the running time is stated in terms of n (length of X), d (dimension of X), and k (number of hidden states in a regime). Accuracy: it guarantees the optimal cut points (details in paper).
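The idea of finding cut points with dynamic programming over two regimes can be sketched as a plain Viterbi pass over the combined state space. This is a simplified stand-in, not the single-scan procedure from the paper: it keeps back-pointers, handles only a one-dimensional sequence with Gaussian emissions, and assumes that a regime switch enters the new regime through its initial distribution. The function names, the dictionary layout, and all parameter values are hypothetical.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def _emit(reg, i, xt):
    """Log-density of xt under the Gaussian of state i in regime reg."""
    mean, std = reg["mean"][i], reg["std"][i]
    return -0.5 * math.log(2 * math.pi * std * std) - (xt - mean) ** 2 / (2 * std * std)


def cut_point_search(x, regimes, delta):
    """Return (regime label per tick, cut points) for the most likely path."""
    states = [(u, i) for u, reg in enumerate(regimes) for i in range(len(reg["pi"]))]

    def trans(u, j, v, i):
        if u == v:  # stay inside the regime
            return _log(delta[u][u]) + _log(regimes[u]["A"][j][i])
        return _log(delta[u][v]) + _log(regimes[v]["pi"][i])  # regime switch

    # Assumption: both regimes are equally likely at the first tick.
    score = [_log(1 / len(regimes)) + _log(regimes[u]["pi"][i]) + _emit(regimes[u], i, x[0])
             for u, i in states]
    back = []
    for xt in x[1:]:
        new_score, ptr = [], []
        for v, i in states:
            cands = [score[s] + trans(u, j, v, i) for s, (u, j) in enumerate(states)]
            best = max(range(len(cands)), key=cands.__getitem__)
            ptr.append(best)
            new_score.append(cands[best] + _emit(regimes[v], i, xt))
        back.append(ptr)
        score = new_score

    # Backtrack from the best final state to recover the regime of every tick.
    s = max(range(len(score)), key=score.__getitem__)
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    labels = [states[s][0] for s in path]
    cuts = [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]
    return labels, cuts


# Two hypothetical single-state regimes centred at 0 and 5.
beaks = {"pi": [1.0], "A": [[1.0]], "mean": [0.0], "std": [1.0]}
wings = {"pi": [1.0], "A": [[1.0]], "mean": [5.0], "std": [1.0]}
delta = [[0.9, 0.1], [0.1, 0.9]]
labels, cuts = cut_point_search([0.0] * 5 + [5.0] * 5, [beaks, wings], delta)
```

On this toy input the most likely path switches regimes exactly once, at tick 5, because paying the switch penalty once is far cheaper than explaining five values near 5 with a regime centred at 0.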

38 2. RegimeSplit (inner loop) Given: bundle. Find: two regimes, i.e., 1. find cut-points of segment sets S1, S2; 2. estimate parameters of two regimes.

39 2. RegimeSplit (inner loop) Details: two-phase iterative approach. Phase 1 (CutPointSearch): split segments into two groups. Phase 2 (BaumWelch): update model parameters.
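The alternation between the two phases can be illustrated with a heavily simplified loop. In the sketch below each regime is a single Gaussian, so Phase 2 reduces to refitting a mean and standard deviation instead of running Baum-Welch, and Phase 1 is a two-state Viterbi labelling rather than the full CutPointSearch. The initialization from the minimum and maximum values, the fixed stay probability of 0.9, and all function names are assumptions made for the example.

```python
import math
import statistics


def _logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def _assign(x, params, stay=0.9):
    """Phase 1 stand-in: label every tick with regime 0 or 1 (two-state Viterbi)."""
    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    score = [_logpdf(x[0], *params[0]), _logpdf(x[0], *params[1])]
    back = []
    for xt in x[1:]:
        ptr, new = [], []
        for v in (0, 1):
            cand = [score[u] + (log_stay if u == v else log_switch) for u in (0, 1)]
            u_best = 0 if cand[0] >= cand[1] else 1
            ptr.append(u_best)
            new.append(cand[u_best] + _logpdf(xt, *params[v]))
        back.append(ptr)
        score = new
    v = 0 if score[0] >= score[1] else 1
    labels = [v]
    for ptr in reversed(back):
        v = ptr[v]
        labels.append(v)
    labels.reverse()
    return labels


def regime_split(x, n_iter=10, min_std=1e-3):
    """Alternate labelling (Phase 1) and refitting (Phase 2) until labels settle."""
    spread = max(statistics.pstdev(x), min_std)
    params = [(min(x), spread), (max(x), spread)]  # assumption: crude initial guess
    labels = None
    for _ in range(n_iter):
        new_labels = _assign(x, params)
        if new_labels == labels:
            break  # converged: the grouping no longer changes
        labels = new_labels
        for v in (0, 1):
            pts = [xt for xt, lab in zip(x, labels) if lab == v]
            if pts:  # Phase 2 stand-in: refit the regime to its own points
                params[v] = (statistics.fmean(pts), max(statistics.pstdev(pts), min_std))
    return labels, params


x = [0.1, -0.1, 0.0, 0.2, -0.2] * 4 + [5.1, 4.9, 5.0, 5.2, 4.8] * 4
labels, params = regime_split(x)
```

Stopping when the labels repeat is a convenience of this sketch; a cost-based stopping rule would fit the description-cost idea more closely.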

40 3. AutoPlait (outer loop) Given: bundle. Find: r regimes (r=2, 3, 4, …), i.e., find the full parameter set.

41 3. AutoPlait (outer loop) Split regimes (r=2, 3, …) as long as the cost keeps decreasing, to find the appropriate number of regimes.

42 3. AutoPlait (outer loop) Split regimes (r=2, 3, …) as long as the cost keeps decreasing, to find the appropriate number of regimes, e.g., r=2, m=4; then r=3, m=6; then r=4, m=8.
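The "split as long as the cost keeps decreasing" rule can be sketched as a greedy loop over a stack of candidate regimes. The version below is generic: `try_split` and `cost` are caller-supplied functions, and the toy versions used here (splitting sorted values at the largest gap, and a cost of one unit per regime plus squared deviations) are hypothetical stand-ins for RegimeSplit and the description cost.

```python
def auto_plait(initial, try_split, cost):
    """Keep splitting regimes while a split lowers the total cost."""
    stack = [initial]
    accepted = []
    while stack:
        regime = stack.pop()
        parts = try_split(regime)
        if parts is not None and cost(parts[0]) + cost(parts[1]) < cost(regime):
            stack.extend(parts)      # the split pays off: examine both halves
        else:
            accepted.append(regime)  # no cheaper split: keep this regime
    return accepted


def toy_cost(values, model_cost=1.0):
    """Fixed cost per regime plus squared deviation from the regime mean."""
    mean = sum(values) / len(values)
    return model_cost + sum((v - mean) ** 2 for v in values)


def toy_split(values):
    """Split sorted values at the largest gap; None if there is nothing to split."""
    if len(values) < 2:
        return None
    ordered = sorted(values)
    gap_at = max(range(1, len(ordered)), key=lambda i: ordered[i] - ordered[i - 1])
    return ordered[:gap_at], ordered[gap_at:]


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
regimes = auto_plait(data, toy_split, toy_cost)
```

The fixed per-regime charge plays the role of the model cost: without it, splitting would always look attractive and the loop would end with one regime per value.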

43 Outline - Motivation - Problem definition - Compression & summarization - Algorithms - Experiments - AutoPlait at work - Conclusions

44 Experiments We answer the following questions: Q1. Sense-making: can it help us understand the given bundles? Q2. Accuracy: how well does it find cut-points and regimes? Q3. Scalability: how does it scale in terms of computational time?

45 Q1. Sense-making MoCap data. Compared methods: AutoPlait (NO magic numbers), DynaMMo (Li et al., KDD’09), pHMM (Wang et al., SIGMOD’11).

46 Q1. Sense-making MoCap data: AutoPlait (NO magic numbers).

47 Q2. Accuracy (a) Segmentation (b) Clustering. AutoPlait needs “no magic numbers”. (Figure label: Target)

48 Q3. Scalability Wall clock time vs. data size (length n) for AutoPlait, DynaMMo, and pHMM. AutoPlait scales linearly, i.e., O(n).

49 Outline - Motivation - Problem definition - Compression & summarization - Algorithms - Experiments - AutoPlait at work - Conclusions

50 AutoPlait at work AutoPlait supports various applications, e.g., App1. Model analysis: web-click sequences. App2. Event discovery: Google Trend data.

52 App1. Model analysis (WebClick) Web-click sequences (1 month, 5 URLs: blog, news, dictionary, Q&A, mail), sampled every 10 minutes.

53 App1. Model analysis (WebClick) Web-click sequences (1 month, 5 URLs). AutoPlait finds 2 patterns: weekday/weekend!

54 App1. Model analysis (WebClick) Web-click sequences (1 month, 5 URLs). AutoPlait finds 2 patterns: weekday/weekend! Figure annotation: Monday (but a holiday).

55 App1. Model analysis (WebClick) Details: pattern of the weekday regime. Observation: working hard every weekday (i.e., using dictionary and news sites).

56 App1. Model analysis (WebClick) Details: pattern of the weekend regime. Observation: no more work on the weekend (i.e., blog, mail, Q&A for non-business purposes).

57 AutoPlait at work AutoPlait supports various applications, e.g., App1. Model analysis: web-click sequences. App2. Event discovery: Google Trend data.

58 App2. Event discovery (GoogleTrend) Anomaly detection (flu-related topics, 10 years)

59 App2. Event discovery (GoogleTrend) Anomaly detection (flu-related topics, 10 years). AutoPlait detects 1 unusual spike in 2009 (i.e., swine flu).

60 App2. Event discovery (GoogleTrend) Turning point detection (seasonal sweets topics)

61 App2. Event discovery (GoogleTrend) Turning point detection (seasonal sweets topics). The trend suddenly changed in 2010 (release of the Android OS versions “Gingerbread” and “Ice Cream Sandwich”).

62 App2. Event discovery (GoogleTrend) Trend discovery (game-related topics)

63 App2. Event discovery (GoogleTrend) Trend discovery (game-related topics). It discovers 3 phases of the “game console war” (Xbox & PlayStation / Wii / mobile social games).

64 Outline - Motivation - Problem definition - Compression & summarization - Algorithms - Experiments - AutoPlait at work - Conclusions

65 Conclusions AutoPlait has the following properties: ✔ Effective: finds optimal segments/regimes. ✔ Sense-making: reasonable regimes. ✔ Fully-automatic: no magic numbers. ✔ Scalable: it scales linearly.

66 Thank you! Code: http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html Mail: yasuko@cs.kumamoto-u.ac.jp

