Fast and Exact Monitoring of Co-evolving Data Streams Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Naonori Ueda (NTT) Masatoshi Yoshikawa (Kyoto.

Slides:



Advertisements
Similar presentations
Lecture 16 Hidden Markov Models. HMM Until now we only considered IID data. Some data are of sequential nature, i.e. have correlations have time. Example:
Advertisements

FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
LEARNING INFLUENCE PROBABILITIES IN SOCIAL NETWORKS Amit Goyal Francesco Bonchi Laks V. S. Lakshmanan University of British Columbia Yahoo! Research University.
Yasuhiro Fujiwara (NTT Cyber Space Labs)
Lirong Xia Approximate inference: Particle filter Tue, April 1, 2014.
Anomaly Detection in the WIPER System using A Markov Modulated Poisson Distribution Ping Yan Tim Schoenharl Alec Pawling Greg Madey.
Hidden Markov Models Theory By Johan Walters (SR 2003)
1 Hidden Markov Models (HMMs) Probabilistic Automata Ubiquitous in Speech/Speaker Recognition/Verification Suitable for modelling phenomena which are dynamic.
Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI)
Hidden Markov Models (HMMs) Steven Salzberg CMSC 828H, Univ. of Maryland Fall 2010.
Paper Discussion: “Simultaneous Localization and Environmental Mapping with a Sensor Network”, Marinakis et. al. ICRA 2011.
WindMine: Fast and Effective Mining of Web-click Sequences SDM 2011Y. Sakurai et al.1 Yasushi Sakurai (NTT) Lei Li (Carnegie Mellon Univ.) Yasuko Matsubara.
Slide 1 EE3J2 Data Mining EE3J2 Data Mining Lecture 14: Introduction to Hidden Markov Models Martin Russell.
Hidden Markov Models Lecture 5, Tuesday April 15, 2003.
Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.
Ph.D. DefenceUniversity of Alberta1 Approximation Algorithms for Frequency Related Query Processing on Streaming Data Presented by Fan Deng Supervisor:
Themis Palpanas1 VLDB - Aug 2004 Fair Use Agreement This agreement covers the use of all slides on this CD-Rom, please read carefully. You may freely use.
. Hidden Markov Model Lecture #6 Background Readings: Chapters 3.1, 3.2 in the text book, Biological Sequence Analysis, Durbin et al., 2001.
Linear-Space Alignment. Linear-space alignment Using 2 columns of space, we can compute for k = 1…M, F(M/2, k), F r (M/2, N – k) PLUS the backpointers.
Hidden Markov Models Lecture 5, Tuesday April 15, 2003.
Recognition of Human Gait From Video Rong Zhang, C. Vogler, and D. Metaxas Computational Biomedicine Imaging and Modeling Center Rutgers University.
Conditional Random Fields
Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department.
Based on Slides by D. Gunopulos (UCR)
Efficient Estimation of Emission Probabilities in profile HMM By Virpi Ahola et al Reviewed By Alok Datar.
. Hidden Markov Models with slides from Lise Getoor, Sebastian Thrun, William Cohen, and Yair Weiss.
Probabilistic Skyline Operator over sliding Windows Wan Qian HKUST DB Group.
May 11, 2005 Tracking on a Graph Songhwai Oh Shankar Sastry Target trajectoriesEstimated tracks Tracking in Sensor Networks Target tracking is a representative.
Fast Temporal State-Splitting for HMM Model Selection and Learning Sajid Siddiqi Geoffrey Gordon Andrew Moore.
Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes
Isolated-Word Speech Recognition Using Hidden Markov Models
Distributed Constraint Optimization Michal Jakob Agent Technology Center, Dept. of Computer Science and Engineering, FEE, Czech Technical University A4M33MAS.
QUIZ!!  T/F: The forward algorithm is really variable elimination, over time. TRUE  T/F: Particle Filtering is really sampling, over time. TRUE  T/F:
Gene finding with GeneMark.HMM (Lukashin & Borodovsky, 1997 ) CS 466 Saurabh Sinha.
Interleaved Multi-Bank Scratchpad Memories: A Probabilistic Description of Access Conflicts DAC '15, June , 2015, San Francisco, CA, USA.
earthobs.nr.no Temporal Analysis of Forest Cover Using a Hidden Markov Model Arnt-Børre Salberg and Øivind Due Trier Norwegian Computing Center.
BRAID: Discovering Lag Correlations in Multiple Streams Yasushi Sakurai (NTT Cyber Space Labs) Spiros Papadimitriou (Carnegie Mellon Univ.) Christos Faloutsos.
Motif finding with Gibbs sampling CS 466 Saurabh Sinha.
Fast Mining and Forecasting of Complex Time-Stamped Events Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), Christos Faloutsos (CMU), Tomoharu.
BRAID: Stream Mining through Group Lag Correlations Yasushi Sakurai Spiros Papadimitriou Christos Faloutsos SIGMOD 2005.
AutoPlait: Automatic Mining of Co-evolving Time Sequences Yasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University) Christos Faloutsos.
Computing longest common substring and all palindromes from compressed strings Wataru Matsubara 1, Shunsuke Inenaga 2, Akira Ishino 1, Ayumi Shinohara.
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
Hidden Markov Models & POS Tagging Corpora and Statistical Methods Lecture 9.
Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.
Learning to Detect Events with Markov-Modulated Poisson Processes Ihler, Hutchins and Smyth (2007)
Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.
QUIZ!!  In HMMs...  T/F:... the emissions are hidden. FALSE  T/F:... observations are independent given no evidence. FALSE  T/F:... each variable X.
MaskIt: Privately Releasing User Context Streams for Personalized Mobile Applications SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference.
The Misra Gries Algorithm. Motivation Espionage The rest we monitor.
EE515/IS523: Security 101: Think Like an Adversary Evading Anomarly Detection through Variance Injection Attacks on PCA Benjamin I.P. Rubinstein, Blaine.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Elements of a Discrete Model Evaluation.
Learning Sequence Motifs Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 Mark Craven
Kijung Shin Jinhong Jung Lee Sael U Kang
CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov
Automated Speach Recognotion Automated Speach Recognition By: Amichai Painsky.
D YNA MM O : M INING AND S UMMARIZATION OF C OEVOLVING S EQUENCES WITH M ISSING V ALUES Lei Li joint work with Christos Faloutsos, James McCann, Nancy.
1 Applications of Hidden Markov Models (Lecture for CS498-CXZ Algorithms in Bioinformatics) Nov. 12, 2005 ChengXiang Zhai Department of Computer Science.
Graph Indexing From managing and mining graph data.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.
Hidden Markov Models BMI/CS 576
Online Multiscale Dynamic Topic Models
Non-linear Mining of Competing Local Activities
Spatio-temporal Pattern Queries
Sequential Data Cleaning: A Statistical Approach
Online Analytical Processing Stream Data: Is It Feasible?
Hidden Markov Models By Manish Shrivastava.
Discovering Frequent Poly-Regions in DNA Sequences
Presentation transcript:

Fast and Exact Monitoring of Co-evolving Data Streams Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Naonori Ueda (NTT) Masatoshi Yoshikawa (Kyoto University) ICDM 2014Y. Matsubara et al.1 Speaker: Yasushi Sakurai

Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.2ICDM 2014

Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.3ICDM 2014 Multiple distinct patterns different durations e.g., Walking, punching

Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.4ICDM 2014 Q1: Walking Q2: Running Q3: Punching Queries :

Motivation Given: co-evolving data streams – e.g., MoCap (leg/arm sensors) Y. Matsubara et al.5ICDM 2014 Q1: Walking Q2: Running Q3: Punching Queries : Q. Can we find subsequences that have the characteristics of query ?

Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.

Background ICDM 2014Y. Matsubara et al.7 Goal: statistical monitoring of time-varying data streams

Background ICDM 2014Y. Matsubara et al.8 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream

Background ICDM 2014Y. Matsubara et al.9 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream Punch is here!

Background ICDM 2014Y. Matsubara et al.10 Goal: statistical monitoring of time-varying data streams Query (punch) Data stream Hidden Markov models (HMMs)

Background ICDM 2014Y. Matsubara et al.11 Hidden Markov models (HMMs) Initial state probabilities State transition probabilities Output probabilities state 1 state 3 state 2 Details

Background ICDM 2014Y. Matsubara et al.12 -Model: -Sequence: Likelihood: Details state 1 state 3 state 2 Viterbi path Trellis structure

Background ICDM 2014Y. Matsubara et al.13 -Model: -Sequence: Likelihood: Details state 1 state 3 state 2 Viterbi path Trellis structure

Background ICDM Given: a data stream X and model Y. Matsubara et al.

Background ICDM Given: a data stream X and model Y. Matsubara et al. Find: subsequences that has a high likelihood value

Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al.

Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X

Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X Walk Jump Walk Jump Punch

Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X

Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X

Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X ignore

Requirements ICDM Exponential threshold function 2.Minimum length of subsequences 3.Non-overlapping matches Y. Matsubara et al. x1x1 xnxn X Best match Candidates

Problem definition ICDM Given: data stream Model thresholds Y. Matsubara et al. X

Problem definition ICDM Given: data stream Model thresholds Report: all subsequences that satisfy: only the local maximum, among several overlapping matches Y. Matsubara et al. X Best match

Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.

Previous solution ICDM Y. Matsubara et al. Sliding model method (SMM), by Wilpon et al. Compute likelihood starting from every time-tick

Previous solution ICDM Y. Matsubara et al. Sliding model method (SMM), by Wilpon et al. Compute likelihood starting from every time-tick O(n) trellis structures Update O(nk 2 ) for each time-tick

Main idea: StreamScan ICDM Y. Matsubara et al. (a) Cumulative likelihood function (b) Subsequence trellis structure (STS)

ICDM Y. Matsubara et al. (a) Cumulative likelihood function Details Main idea (a)

ICDM Y. Matsubara et al. (a) Cumulative likelihood function Details It requires a single structure It guarantees the best matches Main idea (a)

ICDM Y. Matsubara et al. (b) Subsequence trellis structure (STS) Details Main idea (b)

ICDM Y. Matsubara et al. (b) Subsequence trellis structure (STS) Details Main idea (b) : Starting position : Cum. likelihood For each cell:

ICDM Y. Matsubara et al. Example, Details Algorithm state 1 state 3 state 2

ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) State 20 (1) State 30 (1) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)

ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) State 20 (1) State 30 (1) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 : Cum. Likelihood : Starting position (STS, k=3)

ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) State 20 (1) 0 (2) State 30 (1) 0 (2) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 candidate! (STS, k=3)

ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) State 20 (1) 0 (2) 37.5 (2) State 30 (1) 0 (2) 0 (3) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)

ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) State 30 (1) 0 (2) 0 (3) 0 (4) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)

ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) 0 (5) 0 (6) 0 (7) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) 0 (5) 0 (6) 0 (7) State 30 (1) 0 (2) 0 (3) 0 (4) (2) (2) (2) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 (STS, k=3)

ICDM Y. Matsubara et al. Example, Details Algorithm State 10 (1) 10 (2) 50 (2) 0 (4) 0 (5) 0 (6) 0 (7) 10 (8) State 20 (1) 0 (2) 37.5 (2) 62.5 (2) 0 (5) 0 (6) 0 (7) 0 (8) State 30 (1) 0 (2) 0 (3) 0 (4) (2) (2) (2) 0 (8) x 1 =3x 2 =1x 3 =1x 4 =2x 5 =3x 6 =3x 7 =3x 8 =1 Report X[2:7] (STS, k=3)

ICDM Y. Matsubara et al. StreamScan guarantees -no false dismissals -O(1) space and time per time-tick (Details in paper!) Theoretical analysis Details

Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.

Experiments We answer the following questions… Q1. Effectiveness How successful is StreamScan in capturing sequence patterns? Q2. Scalability How does it scale with the sequence lengths n in terms of time and space? ICDM Y. Matsubara et al.

Q1. Effectiveness (MoCap) ICDM Y. Matsubara et al. (Query) (Output)

Q1. Effectiveness ICDM Y. Matsubara et al. MoCap Q1. Walk Q2. Jump Q1. Walk Q2. Jump Q3. Moving arms

Q2. Scalability Time/space vs. data size (length) : n ICDM Y. Matsubara et al. StreamScan requires constant time/space, i.e., O(1) (n) (time)(space)

Q2. Scalability Time/space vs. data size (length) : n ICDM Y. Matsubara et al. StreamScan requires constant time/space, i.e., O(1) (n) (time)(space) SFR’ x SMM’89 479,000x

Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.

StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM Y. Matsubara et al.

StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM Y. Matsubara et al.

Application (1) Social activity monitoring Given: Web-click sequences (1 month, 5urls) Query: first Sunday ( 5urls: blog, news, dictionary, Q&A, mail ) ICDM Y. Matsubara et al. Time (per 10 seconds) Query

Application (1) Social activity monitoring Given: Web-click sequences (1 month, 5urls) Query: first Sunday ( 5urls: blog, news, dictionary, Q&A, mail ) ICDM Y. Matsubara et al. Time (per 10 seconds) Query StreamScan identifies all weekends + holiday Detected!

StreamScan at work StreamScan is capable of various applications, e.g., App1. Social activity monitoring - Web-click streams App2. Extreme detection in keyword stream - GoogleTrend streams ICDM Y. Matsubara et al.

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) ICDM Y. Matsubara et al. Application (2)

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Time (weekly, 10 years)

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Any extreme behavior? Time (weekly, 10 years) ?

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Any extreme behavior? Time (weekly, 10 years) (first 3 years) Query ?

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Finance-related keywords ICDM Y. Matsubara et al. Application (2) Time (weekly, 10 years) (first 3 years) Global financial crisis in 2008! Detected!

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Flu-related keywords ICDM Y. Matsubara et al. Application (2) (first 3 years) Query ?

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Flu-related keywords ICDM Y. Matsubara et al. Application (2) Swine flu pandemic in 2009 Detected!

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Seasonal sweets-related keywords ICDM Y. Matsubara et al. Application (2) ? (first 3 years) Query

“Extreme monitoring” in keyword stream Given: GoogleTrend streams (10 years, weekly) e.g., Seasonal sweets-related keywords ICDM Y. Matsubara et al. Application (2) Release of Android OS, “Gingerbread” “Ice Cream Sandwich” Detected!

Outline ICDM Motivation -Problem formulation -Main ideas -Experiments -StreamScan at work -Conclusions Y. Matsubara et al.

Conclusions StreamScan has the following properties Effective Find fruitful patterns from diverse data streams Exact It guarantees exactness Fast and nimble It requires single scan, O(1) space and time ICDM 2014Y. Matsubara et al.64 ✔ ✔ ✔

Thank you! ICDM Y. Matsubara et al.