AutoPlait: Automatic Mining of Co-evolving Time Sequences Yasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University) Christos Faloutsos.

Slides:



Advertisements
Similar presentations
A probabilistic model for retrospective news event detection
Advertisements

Yasuko Matsubara (Kyoto University),
Query Chain Focused Summarization Tal Baumel, Rafi Cohen, Michael Elhadad Jan 2014.
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of.
Yasuhiro Fujiwara (NTT Cyber Space Labs)
Streaming Pattern Discovery in Multiple Time-Series Spiros Papadimitriou Jimeng Sun Christos Faloutsos Carnegie Mellon University VLDB 2005, Trondheim,
Hidden Markov Model based 2D Shape Classification Ninad Thakoor 1 and Jean Gao 2 1 Electrical Engineering, University of Texas at Arlington, TX-76013,
Systems Engineering and Engineering Management The Chinese University of Hong Kong Parameter Free Bursty Events Detection in Text Streams Gabriel Pui Cheong.
BBM: Bayesian Browsing Model from Petabyte-scale Data Chao Liu, MSR-Redmond Fan Guo, Carnegie Mellon University Christos Faloutsos, Carnegie Mellon University.
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
WindMine: Fast and Effective Mining of Web-click Sequences SDM 2011Y. Sakurai et al.1 Yasushi Sakurai (NTT) Lei Li (Carnegie Mellon Univ.) Yasuko Matsubara.
HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
1 Lecture 5: Automatic cluster detection Lecture 6: Artificial neural networks Lecture 7: Evaluation of discovered knowledge Brief introduction to lectures.
Unsupervised Learning: Clustering Rong Jin Outline  Unsupervised learning  K means for clustering  Expectation Maximization algorithm for clustering.
Ph.D. DefenceUniversity of Alberta1 Approximation Algorithms for Frequency Related Query Processing on Streaming Data Presented by Fan Deng Supervisor:
An Overview of Text Mining Rebecca Hwa 4/25/2002 References M. Hearst, “Untangling Text Data Mining,” in the Proceedings of the 37 th Annual Meeting of.
Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions.
British Museum Library, London Picture Courtesy: flickr.
Fast Temporal State-Splitting for HMM Model Selection and Learning Sajid Siddiqi Geoffrey Gordon Andrew Moore.
Mining Long Sequential Patterns in a Noisy Environment Jiong Yang, Wei Wang, Philip S. Yu, Jiawei Han SIGMOD 2002.
Sequence labeling and beam search LING 572 Fei Xia 2/15/07.
Scalable Text Mining with Sparse Generative Models
Parsimonious Linear Fingerprinting for Time Series Lei Li joint work with B. Aditya Prakash, Christos Faloutsos School of Computer Science Carnegie Mellon.
『 Data Mining 』 By Jung, hae-sun. 1.Introduction 2.Definition 3.Data Mining Applications 4.Data Mining Tasks 5. Overview of the System 6. Data Mining.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
Data Mining Chun-Hung Chou
Masquerade Detection Mark Stamp 1Masquerade Detection.
Prospector : A Toolchain To Help Parallel Programming Minjang Kim, Hyesoon Kim, HPArch Lab, and Chi-Keung Luk Intel This work will be also supported by.
Graphical models for part of speech tagging
Fast and Exact Monitoring of Co-evolving Data Streams Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Naonori Ueda (NTT) Masatoshi Yoshikawa (Kyoto.
Fundamentals of Hidden Markov Model Mehmet Yunus Dönmez.
Language Models Hongning Wang Two-stage smoothing [Zhai & Lafferty 02] c(w,d) |d| P(w|d) = +  p(w|C) ++ Stage-1 -Explain unseen words -Dirichlet.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
BRAID: Discovering Lag Correlations in Multiple Streams Yasushi Sakurai (NTT Cyber Space Labs) Spiros Papadimitriou (Carnegie Mellon Univ.) Christos Faloutsos.
Hidden Markov Models Usman Roshan CS 675 Machine Learning.
Fast Mining and Forecasting of Complex Time-Stamped Events Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), Christos Faloutsos (CMU), Tomoharu.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
IRCS/CCN Summer Workshop June 2003 Speech Recognition.
Carnegie Mellon School of Computer Science Forecasting with Cyber-physical Interactions in Data Centers Lei Li PDL Seminar 9/28/2011.
Using Inactivity to Detect Unusual behavior Presenter : Siang Wang Advisor : Dr. Yen - Ting Chen Date : Motion and video Computing, WMVC.
Processing Sequential Sensor Data The “John Krumm perspective” Thomas Plötz November 29 th, 2011.
Lei Li Computer Science Department Carnegie Mellon University Pre Proposal Time Series Learning completed work 11/27/2015.
Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling Peter I. Hofgesang Wojtek Kowalczyk ECML/PKDD Discovery.
Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.
Threshold Setting and Performance Monitoring for Novel Text Mining Wenyin Tang and Flora S. Tsai School of Electrical and Electronic Engineering Nanyang.
ACADS-SVMConclusions Introduction CMU-MMAC Unsupervised and weakly-supervised discovery of events in video (and audio) Fernando De la Torre.
John Lafferty Andrew McCallum Fernando Pereira
Streaming Pattern Discovery in Multiple Time-Series Jimeng Sun Spiros Papadimitrou Christos Faloutsos PARALLEL DATA LABORATORY Carnegie Mellon University.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
Data Mining and Decision Support
Automatic Labeling of Multinomial Topic Models
Background 2 Outline 3 Scopus publications 4 Goal and a signal model 5Harmonic signal parameters estimation.
U of Minnesota DIWANS'061 Energy-Aware Scheduling with Quality of Surveillance Guarantee in Wireless Sensor Networks Jaehoon Jeong, Sarah Sharafkandi and.
D YNA MM O : M INING AND S UMMARIZATION OF C OEVOLVING S EQUENCES WITH M ISSING V ALUES Lei Li joint work with Christos Faloutsos, James McCann, Nancy.
Facets: Fast Comprehensive Mining of Coevolving High-order Time Series Hanghang TongPing JiYongjie CaiWei FanQing He Joint Work by Presenter:Wei Fan.
Arizona State University1 Fast Mining of a Network of Coevolving Time Series Wei FanHanghang TongPing JiYongjie Cai.
Definition of the Hidden Markov Model A Seminar Speech Recognition presentation A Seminar Speech Recognition presentation October 24 th 2002 Pieter Bas.
Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.
Forecasting with Cyber-physical Interactions in Data Centers (part 3)
Online Multiscale Dynamic Topic Models
Non-linear Mining of Competing Local Activities
Pre Proposal Time Series Learning completed work
Kijung Shin1 Mohammad Hammoud1
Sequential Data Cleaning: A Statistical Approach
Automatic Segmentation of Data Sequences
Data Warehousing Data Mining Privacy
Online Analytical Processing Stream Data: Is It Feasible?
15-826: Multimedia Databases and Data Mining
Presentation transcript:

AutoPlait: Automatic Mining of Co-evolving Time Sequences Yasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University) Christos Faloutsos (CMU) SIGMOD 2014Y. Matsubara et al.1

Time Motivation Given: co-evolving time-series – e.g., MoCap (leg/arm sensors) “Chicken dance” Y. Matsubara et al.2SIGMOD 2014

Time Motivation Given: co-evolving time-series – e.g., MoCap (leg/arm sensors) “Chicken dance” Y. Matsubara et al.3SIGMOD 2014 Q. Any distinct patterns? Q. If yes, how many? Q. What kind?

Motivation Challenges: co-evolving sequences – Unknown # of patterns (e.g., beaks) – Different durations Y. Matsubara et al.4SIGMOD 2014 Time …

Motivation Challenges: co-evolving sequences – Unknown # of patterns (e.g., beaks) – Different durations Y. Matsubara et al.5SIGMOD 2014 Time … Q. Can we summarize it automatically ?

Motivation Goal: find patterns that agree with human intuition Y. Matsubara et al.6SIGMOD 2014 Time

Motivation Goal: find patterns that agree with human intuition Y. Matsubara et al.7SIGMOD 2014 AutoPlait: “fully-automatic” mining algorithm

Importance of “fully-automatic” No magic numbers!... because, Big data mining: -> we cannot afford human intervention!! Y. Matsubara et al.8SIGMOD 2014 Manual -sensitive to the parameter tuning -long tuning steps (hours, days, …) Automatic (no magic numbers) - no expert tuning required

Outline SIGMOD Motivation -Problem definition -Compression & summarization -Algorithms -Experiments -AutoPlait at work -Conclusions Y. Matsubara et al.

Problem definition SIGMOD Y. Matsubara et al. Key concepts -Bundle: -Segment: -Regime: -Segment-membership: given hidden

Problem definition SIGMOD Y. Matsubara et al. Bundle : set of d co-evolving sequences Bundle X (d=4) given

Problem definition SIGMOD Y. Matsubara et al. Segment: convert X -> m segments, S Segment (m=8) s1s2 s3s4 s5s6 s7 s8 hidden

Problem definition SIGMOD Y. Matsubara et al. Regime: segment groups: : model of regime r Regimes (r=4) wings beaks hidden

Problem definition SIGMOD Y. Matsubara et al. Segment-membership: assignment Segment- membership (m=8) F = { 2, 4, 1, 3, 2, 4, 1, 3 } hidden

Problem definition SIGMOD Given: bundle X Y. Matsubara et al.

Problem definition SIGMOD Given: bundle X Find: compact description C of X Y. Matsubara et al.

Problem definition SIGMOD Given: bundle X Find: compact description C of X Y. Matsubara et al. m segments r regimes Segment-membership

Outline SIGMOD Motivation -Problem definition -Compression & summarization -Algorithms -Experiments -AutoPlait at work -Conclusions Y. Matsubara et al.

Main ideas Goal: compact description of X without user intervention!! Challenges: Q1. How to generate ‘informative’ regimes ? Q2. How to decide # of regimes/segments ? SIGMOD 2014Y. Matsubara et al.19

Main ideas Goal: compact description of X without user intervention!! Challenges: Q1. How to generate ‘informative’ regimes ? Idea (1): Multi-level chain model Q2. How to decide # of regimes/segments ? Idea (2): Model description cost SIGMOD 2014Y. Matsubara et al.20

SIGMOD Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes ? Y. Matsubara et al. Model SequencesRegimes beaks claps wings

SIGMOD Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes ? Idea (1): Multi-level chain model – HMM-based probabilistic model – with “across-regime” transitions Y. Matsubara et al. beaks wings claps Model SequencesRegimes

Idea (1): MLCM: multi-level chain model SIGMOD 2014Y. Matsubara et al.23 r regimes (HMMs) across-regime transition prob. Single HMM parameters Details

Idea (1): MLCM: multi-level chain model SIGMOD 2014Y. Matsubara et al.24 Regimes r=2 Regime 1 (k=3) Regime 2 (k=2) r regimes (HMMs) across-regime transition prob. Single HMM parameters state 1 state 3 state 2 state 1 state 2 Regime switch Regime1 “beaks” Regime2 “wings” Details

SIGMOD Idea (2): model description cost Q2. How to decide # of regimes/segments ? Idea (2): Model description cost Minimize coding cost find “optimal” # of segments/regimes Y. Matsubara et al.

Idea (2): model description cost Idea: Minimize encoding cost! SIGMOD 2014Y. Matsubara et al.26 Good compression Good description (# of r, m) Cost M (M) + Cost c (X|M) Model cost Coding cost min ( )

Idea (2): model description cost Total cost of bundle X, given C SIGMOD 2014Y. Matsubara et al.27 Details

Idea (2): model description cost Total cost of bundle X, given C SIGMOD 2014Y. Matsubara et al.28 Model description cost of Θ Coding cost of X given Θ # of segments/regim es segment lengths segment- membership F duration/dimensi ons Details

Outline SIGMOD Motivation -Problem definition -Compression & summarization -Algorithms -Experiments -AutoPlait at work -Conclusions Y. Matsubara et al.

AutoPlait Overview SIGMOD 2014Y. Matsubara et al.30 Iteration 0 r=1, m=1 Start! Iteration Iteration r=1

AutoPlait Overview SIGMOD 2014Y. Matsubara et al.31 Iteration 1 r=2, m= Iteration r=1 r=2 Split

AutoPlait Overview SIGMOD 2014Y. Matsubara et al.32 Iteration 1 r=2, m=4 Iteration 2 r=3, m=6 Split Iteration r=1 r=2 r=3

AutoPlait Overview SIGMOD 2014Y. Matsubara et al.33 Iteration 1 r=2, m=4 Iteration 2 r=3, m=6 Iteration 4 r=4, m=8 Split Iteration Iteration r=1 r=2 r=3 r=4

AutoPlait Algorithms 1.CutPointSearch Find good cut-points/segments 2.RegimeSplit Estimate good regime parameters Θ 3.AutoPlait Search for the best number of regimes (r=2,3,4…) SIGMOD 2014Y. Matsubara et al.34 Inner-most loop Inner loop Outer loop

1.CutPointSearch Given: - bundle - parameters of two regimes Find: cut-points of segment sets S 1, S 2, SIGMOD 2014Y. Matsubara et al.35 Inner-most loop

1.CutPointSearch SIGMOD 2014Y. Matsubara et al.36 state 1 state 2 state 3 state 2 state 1 switch?? Inner-most loop DP algorithm to compute likelihood: Details

1.CutPointSearch Theoretical analysis Scalability - It takes time (only single scan) - n: length of X - d: dimension of X - k: # of hidden states in regime Accuracy It guarantees the optimal cut points - (Details in paper) SIGMOD 2014Y. Matsubara et al.37 Inner-most loop Details

2.RegimeSplit Given: - bundle Find: two regimes 1. find cut-points of segment sets: S 1, S 2 2. estimate parameters of two regimes: SIGMOD 2014Y. Matsubara et al.38 Inner loop

2.RegimeSplit Two-phase iterative approach -Phase 1: (CutPointSearch) -Split segments into two groups : -Phase 2: (BaumWelch) -Update model parameters: SIGMOD 2014Y. Matsubara et al.39 Phase 1 Phase 2 Inner loop Details

3.AutoPlait Given: - bundle Find: r regimes (r=2, 3, 4, … ) - i.e., find full parameter set SIGMOD 2014Y. Matsubara et al.40 Outer loop

3.AutoPlait Split regimes r=2,3,…, as long as cost keeps decreasing - Find appropriate # of regimes SIGMOD 2014Y. Matsubara et al.41 Outer loop

3.AutoPlait Split regimes r=2,3,…, as long as cost keeps decreasing - Find appropriate # of regimes SIGMOD 2014Y. Matsubara et al.42 r=2, m=4 r=3, m=6 r=4, m=8 Outer loop

Outline SIGMOD Motivation -Problem definition -Compression & summarization -Algorithms -Experiments -AutoPlait at work -Conclusions Y. Matsubara et al.

Experiments We answer the following questions… Q1. Sense-making Can it help us understand the given bundles? Q2. Accuracy How well does it find cut-points and regimes? Q3. Scalability How does it scale in terms of computational time? SIGMOD Y. Matsubara et al.

Q1. Sense-making MoCap data SIGMOD Y. Matsubara et al. AutoPlait (NO magic numbers) DynaMMo (Li et al., KDD’09) pHMM (Wang et al., SIGMOD’11)

Q1. Sense-making MoCap data SIGMOD Y. Matsubara et al. AutoPlait (NO magic numbers)

Q2. Accuracy (a) Segmentation (b) Clustering SIGMOD Y. Matsubara et al. AutoPlait needs “no magic numbers” Target

Q3. Scalability Wall clock time vs. data size (length) : n SIGMOD Y. Matsubara et al. AutoPlait scales linearly, i.e., O(n) AutoPlait DynaMMo pHMM

Outline SIGMOD Motivation -Problem definition -Compression & summarization -Algorithms -Experiments -AutoPlait at work -Conclusions Y. Matsubara et al.

AutoPlait at work AutoPlait is capable of various applications, e.g., App1. Model analysis - Web-click sequences App2. Event discovery - Google Trend data SIGMOD Y. Matsubara et al.

AutoPlait at work AutoPlait is capable of various applications, e.g., App1. Model analysis - Web-click sequences App2. Event discovery - Google Trend data SIGMOD Y. Matsubara et al.

App1. Model analysis (WebClick) Web-click sequences (1 month, 5urls) – 5urls: blog, news, dictionary, Q&A, mail – every 10 minutes SIGMOD Y. Matsubara et al. Time (per 10 minutes)

App1. Model analysis (WebClick) Web-click sequences (1 month, 5urls) SIGMOD Y. Matsubara et al. AutoPlait finds 2 patterns: weekday/weekend !

App1. Model analysis (WebClick) Web-click sequences (1 month, 5urls) SIGMOD Y. Matsubara et al. AutoPlait finds 2 patterns: weekday/weekend ! Monday (but holiday)

App1. Model analysis (WebClick) Pattern of weekday regime SIGMOD Y. Matsubara et al. Observation: Working hard every weekday (i.e., using dictionary, news sites) Details

App1. Model analysis (WebClick) Pattern of weekend regime SIGMOD Y. Matsubara et al. Observation: No more work on weekend (i.e., blog, mail, Q&A for non-business purposes) Details

AutoPlait at work AutoPlait is capable of various applications, e.g., App1. Model analysis - Web-click sequences App2. Event discovery - Google Trend data SIGMOD Y. Matsubara et al.

App2. Event discovery (GoogleTrend) Anomaly detection (flu-related topics,10 years) SIGMOD Y. Matsubara et al.

App2. Event discovery (GoogleTrend) Anomaly detection (flu-related topics,10 years) SIGMOD Y. Matsubara et al. AutoPlait detects 1 unusual spike in 2009 (i.e., swine flu)

App2. Event discovery (GoogleTrend) Turning point detection (seasonal sweets topics) SIGMOD Y. Matsubara et al.

App2. Event discovery (GoogleTrend) Turning point detection (seasonal sweets topics) SIGMOD Y. Matsubara et al. Trend suddenly changed in 2010 (release of android OS “Ginger bread”, “Ice Cream Sandwich”)

App2. Event discovery (GoogleTrend) Trend discovery (game-related topics) SIGMOD Y. Matsubara et al.

App2. Event discovery (GoogleTrend) Trend discovery (game-related topics) SIGMOD Y. Matsubara et al. It discovers 3 phases of “game console war” (Xbox&PlayStation/Wii/Mobile social games)

Outline SIGMOD Motivation -Problem definition -Compression & summarization -Algorithms -Experiments -AutoPlait at work -Conclusions Y. Matsubara et al.

Conclusions AutoPlait has the following properties Effective Find optimal segments/regimes Sense-making Reasonable regimes Fully-automatic No magic numbers Scalable It scales linearly SIGMOD 2014Y. Matsubara et al.65 ✔ ✔ ✔ ✔

Thank you! SIGMOD Y. Matsubara et al. Code: Mail: