Learning to Detect Events with Markov-Modulated Poisson Processes Ihler, Hutchins and Smyth (2007)



Outline
- Problem: finding unusual activity (events) in the rhythms of natural human activity
- Method:
  - Unsupervised learning
  - Time-varying Poisson process modulated by a hidden Markov process (events)
  - Bayesian framework for parameter learning

Why is it hard?
- Chicken-and-egg problem
  - Where do we start?
- Previous approaches: baseline
  - Simple threshold model
  - Has severe limitations
- Need to quantify the notion of an unusual activity
  - How unusual is a measurement?
  - How persistent is a deviating measurement?

The Data Sets
Two data sets were used.
- Building data
  - Counts of people entering and exiting a building
  - 15 weeks of data
  - 30-minute time bins
  - 29 known events in the 15 weeks
- Freeway traffic data
  - Vehicle counts on a freeway on-ramp
  - 6 months of data
  - 5-minute time bins
  - 78 known events in the 6 months

Building Data Example day

Building Data Example week

Freeway Traffic Data Example day

Freeway Traffic Data Example week

A naïve Poisson model
- Is the data actually Poisson?
- In a Poisson distribution, the mean equals the variance
- Is this the case in our data?
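One way to check the mean-equals-variance property is to compute the ratio of variance to mean for the counts observed in a single recurring time slot. The sketch below is illustrative only: the function name `dispersion_index` and the sample counts are assumptions, not taken from the slides. A ratio well above 1 suggests the counts are more variable than a single Poisson rate would explain.

```python
from statistics import mean, pvariance

def dispersion_index(counts):
    """Return variance / mean; a value near 1 is consistent with Poisson counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return pvariance(counts) / m

# Hypothetical counts for one time slot (same weekday and time) over seven weeks.
# The value 40 stands in for a week in which an event inflated the count.
slot_counts = [12, 15, 9, 14, 40, 11, 13]
ratio = dispersion_index(slot_counts)
print(f"variance / mean = {ratio:.2f}")
```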

A Baseline Model
- Use a simple threshold approach
- We say there is an event if P(N; λ) < ε
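The threshold rule can be sketched in a few lines. This is a minimal illustration, assuming a known baseline rate for every bin; the helper names, the rates, and the value of `eps` are hypothetical.

```python
import math

def poisson_pmf(n, lam):
    """P(N = n) for a Poisson distribution with rate lam > 0."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def threshold_events(counts, rates, eps=1e-3):
    """Flag each bin whose count has probability below eps under its baseline rate."""
    return [poisson_pmf(n, lam) < eps for n, lam in zip(counts, rates)]

# Hypothetical counts and baseline rates for four consecutive bins.
counts = [10, 12, 35, 9]
rates = [10.0, 10.0, 10.0, 10.0]
flags = threshold_events(counts, rates)
print(flags)
```

Each bin is judged on its own, so a run of slightly elevated counts never crosses the threshold, and the baseline rates must be estimated from data that may itself contain events.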

Problems with this Approach
- Hard to detect sustained small variation
- Hard to capture event duration
- Chicken-and-egg problem

The model (1)
Assuming the processes are additive (a fair assumption), the observed count N(t) is the sum of a normal-activity count N_0(t) and an event count N_E(t).

The model (2)

What is a Markov Process?
Two-state example: A = Rainy, B = Sunny

Modelling Events with a Markov Process
We define a three-state Markov chain. z(t) is the state at time t; the three possible states are:
- 0 if there is no event
- +1 if there is a positive event
- -1 if there is a negative event
The chain has transition matrix M_z, with entries z_ij.
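To make the three-state chain concrete, the sketch below simulates a state sequence z(t). The numerical entries of `M_z` are hypothetical values chosen so that events are rare but persistent; they are not the values from the slides.

```python
import random

STATES = (0, +1, -1)  # no event, positive event, negative event

# Hypothetical transition matrix: row = current state, column = next state,
# both ordered as in STATES. Each row sums to 1.
M_z = [
    [0.98, 0.01, 0.01],
    [0.20, 0.80, 0.00],
    [0.20, 0.00, 0.80],
]

def simulate_chain(M, T, seed=0):
    """Draw a state sequence of length T, starting in the no-event state."""
    rng = random.Random(seed)
    idx = 0
    path = []
    for _ in range(T):
        path.append(STATES[idx])
        idx = rng.choices(range(len(STATES)), weights=M[idx])[0]
    return path

z = simulate_chain(M_z, T=48)  # e.g. one day of 30-minute bins
print(z)
```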

Details of the Markov Process
- We give each row of the transition matrix a Dirichlet prior.
- Given z(t), we can model N_E(t) as Poisson with rate γ(t).
- We give this rate a Gamma prior Γ(γ; a_E, b_E), which is independent of t.
- We can then marginalize over γ(t).
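The marginalization has a closed form because the Gamma prior is conjugate to the Poisson likelihood. The derivation below is a standard result rather than a formula recovered from the slide, and it assumes that b_E is the rate (inverse scale) parameter of the Gamma prior; under a scale parameterization, b_E would be replaced by 1/b_E. The result is a negative binomial distribution over the event count.

```latex
p\bigl(N_E(t) \mid z(t) \neq 0\bigr)
  = \int_0^\infty \mathrm{Poisson}\bigl(N_E(t); \gamma\bigr)\,
      \Gamma(\gamma; a_E, b_E)\, d\gamma
  = \frac{\Gamma\bigl(N_E(t) + a_E\bigr)}{\Gamma(a_E)\, N_E(t)!}
    \left(\frac{b_E}{1 + b_E}\right)^{a_E}
    \left(\frac{1}{1 + b_E}\right)^{N_E(t)}
```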

Graphical Model of the Dependencies

Learning the parameters
- If we are given the hidden variables N_0(t), N_E(t) and z(t), we can:
  - compute MAP estimates
  - draw posterior samples of the parameters λ(t) and M_z
- So we can use MCMC: iterate between sampling the hidden variables (given the parameters) and sampling the parameters (given the hidden variables)

Sampling the hidden variables, given the parameters
Rough outline:
- First, use the forward-backward algorithm [Baum et al. 1970] to sample z(t)
- Then, given z(t), determine N_0(t) and N_E(t) by sampling
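The first step can be sketched as forward filtering followed by backward sampling. This is a generic version for any discrete hidden Markov chain, not the authors' code: it assumes the per-state likelihood of each observed count has already been computed and is passed in as `lik`, and all names and example values are hypothetical.

```python
import random

def sample_states(lik, M, pi0, rng):
    """Draw one state sequence from the posterior of a discrete hidden Markov chain.

    lik[t][k]: likelihood of the observation at time t under state index k
    M[j][k]:   transition probability from state j to state k
    pi0[k]:    initial state distribution
    """
    T, K = len(lik), len(pi0)
    alpha = []  # alpha[t][k] = p(state k at time t | observations up to t)
    for t in range(T):
        if t == 0:
            pred = list(pi0)
        else:
            pred = [sum(alpha[t - 1][j] * M[j][k] for j in range(K))
                    for k in range(K)]
        unnorm = [pred[k] * lik[t][k] for k in range(K)]
        total = sum(unnorm)  # assumed positive: some state must explain the data
        alpha.append([u / total for u in unnorm])

    # Backward pass: sample the last state, then each earlier state
    # conditioned on the state already sampled for the following time step.
    z = [0] * T
    z[T - 1] = rng.choices(range(K), weights=alpha[T - 1])[0]
    for t in range(T - 2, -1, -1):
        w = [alpha[t][j] * M[j][z[t + 1]] for j in range(K)]
        z[t] = rng.choices(range(K), weights=w)[0]
    return z

# Hypothetical example with three states and four time steps. The likelihoods
# are one-hot, so the sampled path is forced to follow them.
M = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.3, 0.1, 0.6]]
pi0 = [0.9, 0.05, 0.05]
lik = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0]]
path = sample_states(lik, M, pi0, random.Random(0))
print(path)
```

The returned values are state indices; mapping them to the labels 0, +1 and -1 is left to the caller.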

Sampling the parameters, given the hidden variables
- The conjugate prior distributions give us a straightforward way to compute the posteriors
- Use the sufficient statistics of the data to update the parameters of the posterior
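The two conjugate updates can be illustrated as follows. This is a simplified sketch: it treats a single Poisson rate with a Gamma prior (shape `a`, rate `b`) and ignores any further structure in λ(t), and the function names are hypothetical. The sufficient statistics are the sum and number of counts for the rate, and the transition counts for each row of the transition matrix.

```python
def gamma_posterior(a, b, counts):
    """Posterior (shape, rate) of a Poisson rate with a Gamma(a, b) prior."""
    return a + sum(counts), b + len(counts)

def dirichlet_posterior(alpha, states):
    """Row-wise Dirichlet posterior parameters for a transition matrix.

    alpha[j][k]: prior pseudo-count for transitions from state j to state k
    states:      sampled sequence of state indices
    """
    post = [row[:] for row in alpha]
    for j, k in zip(states, states[1:]):
        post[j][k] += 1  # one observed transition j -> k
    return post

# Hypothetical prior and data.
shape, rate = gamma_posterior(2.0, 1.0, [3, 5, 4])
trans = dirichlet_posterior([[1, 1], [1, 1]], [0, 0, 1, 0])
print(shape, rate, trans)
```

New parameter values are then drawn from these posteriors, for example with `random.gammavariate(shape, 1 / rate)` for the rate, since that function takes a scale rather than a rate.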

Prior distributions of z_ij and γ(t)
- Markov-modulated Poisson processes are sensitive to the choice of priors for z_ij and γ(t)
- In the domains where these models are used, we often have strong ideas about, for example, what constitutes a “rare” event
- Use these ideas to build strong priors into the model, in order to avoid overfitting and to adjust the threshold levels of event detection

Calculating Results
- We are looking to detect unusual events. We can use our model to do this by calculating the posterior probability of an event at each time, p(z(t) ≠ 0 | N).
- We can then compare our predictions with the known event occurrences
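Given state sequences sampled by MCMC, the posterior probability of an event in each time bin can be approximated by the fraction of samples in which z(t) is nonzero. The sketch below assumes the samples are stored as lists of state labels (0, +1, -1); the function name and the data are hypothetical.

```python
def event_probability(samples):
    """Estimate p(z(t) != 0 | data) for each t from sampled state sequences."""
    n = len(samples)
    T = len(samples[0])
    return [sum(1 for z in samples if z[t] != 0) / n for t in range(T)]

# Four hypothetical MCMC samples of z(t) over three time bins.
samples = [
    [0, +1, 0],
    [0, +1, -1],
    [0, 0, 0],
    [0, +1, 0],
]
probs = event_probability(samples)
print(probs)
```

Thresholding these probabilities (for example at 0.5) gives predicted events that can be compared with the known event list.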

Example Posterior Predictions (1)

Example Posterior Predictions (2)

Example Posterior Predictions (3)

Comparison of Predicted Events with Known Events

Other Possible Inferences
- The model can be modified to test the degree of heterogeneity of the time process. We can ask questions like:
  - Are all weekdays essentially the same?
  - Are all afternoons essentially the same?
- We can estimate event attendance

Conclusion
- The model is much more effective than the threshold approach
- Good detection rate
- Difficult to assess the false positive rate
- Possibility for extension

Questions