Learning to Detect Events with Markov-Modulated Poisson Processes Ihler, Hutchins and Smyth (2007)
Outline. Problem: finding unusual activity (events) in rhythms of natural human activity. Method: unsupervised learning; a time-varying Poisson process modulated by a hidden Markov process (events); a Bayesian framework for parameter learning.
Why is it hard? Chicken-and-egg problem: where do we start? Detecting events requires a model of normal activity, but learning normal activity requires knowing which measurements are events. Previous approaches (baseline): a simple threshold model, which has severe limitations. We need to quantify the notion of an unusual activity: how unusual is a measurement, and how persistent is a deviating measurement?
The Data Sets. Two data sets are used. Building data: counts of people entering and exiting a building; 15 weeks of data; 30-minute time bins; 29 known events in the 15 weeks. Freeway traffic data: vehicle counts on a freeway on-ramp; 6 months of data; 5-minute time bins; 78 known events in the 6 months.
Building Data Example day
Building Data Example week
Freeway Traffic Data Example day
Freeway Traffic Data Example week
A naïve Poisson model. Is the data actually Poisson? In a Poisson distribution the mean equals the variance. Is this the case in our data?
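The mean-equals-variance check can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dispersion_by_slot`, the slot labels and the counts are all hypothetical. For Poisson data the variance-to-mean ratio of each time slot should be close to 1.

```python
from statistics import mean, pvariance

def dispersion_by_slot(counts_by_slot):
    """Return {slot: (mean, variance, variance / mean)} for each time slot."""
    out = {}
    for slot, counts in counts_by_slot.items():
        m = mean(counts)
        v = pvariance(counts)  # population variance of the slot's counts
        out[slot] = (m, v, v / m if m > 0 else float("nan"))
    return out

# Hypothetical counts for two time-of-day slots, one value per week.
example = {
    "Mon 09:00": [2, 4, 4, 4, 5, 5, 7, 9],
    "Mon 03:00": [0, 1, 0, 2, 1, 0, 1, 3],
}
for slot, (m, v, ratio) in dispersion_by_slot(example).items():
    print(f"{slot}: mean={m:.2f} var={v:.2f} var/mean={ratio:.2f}")
```

A ratio well above 1 in many slots suggests that something other than a single Poisson rate (for example, occasional events) is inflating the variance.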
A Baseline Model. Use a simple threshold approach: we say there is an event if P(N;λ) < ε
Problems with this Approach. Hard to detect sustained small variation. Hard to capture event duration. Chicken-and-egg problem.
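The threshold rule P(N;λ) < ε can be sketched in a few lines. The helper names `poisson_pmf` and `flag_events`, the rate λ = 10 and the threshold ε = 0.001 are assumptions chosen for illustration; in practice λ would vary with the time of day and day of week.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability P(N = n; lam), computed in log space for stability."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def flag_events(counts, lam, eps):
    """Flag each count whose Poisson probability falls below the threshold eps."""
    return [poisson_pmf(n, lam) < eps for n in counts]

# Hypothetical counts for one time slot with assumed normal rate lam = 10.
counts = [10, 11, 30, 9]
flags = flag_events(counts, lam=10.0, eps=1e-3)
print(flags)
```

Each time bin is judged on its own here, which is why this baseline cannot accumulate evidence from a run of slightly elevated counts.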
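The model (1) The observed count is treated as the sum of a normal-activity count N_0(t) and an event count N_E(t): N(t) = N_0(t) + N_E(t). Assuming the processes are additive is a fair assumption.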
The model (2)
What is a Markov Process? Example with two states: A = Rainy, B = Sunny
Modelling Events with a Markov Process. We define a three-state Markov chain. z(t) is the state at time t; the 3 possible states are: 0 if there is no event, +1 if there is a positive event, -1 if there is a negative event. The chain has transition matrix M_z.
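To make the three-state chain concrete, here is a small simulation sketch. The transition probabilities in `M_z` are hypothetical values chosen so that events are rare but persistent; they are not taken from the paper, and the row/column convention (row = current state, column = next state) is an assumption.

```python
import random

STATES = [0, +1, -1]  # no event, positive event, negative event

# Hypothetical transition matrix: row = current state, column = next state.
M_z = [
    [0.98, 0.01, 0.01],  # from "no event"
    [0.20, 0.80, 0.00],  # from "positive event"
    [0.20, 0.00, 0.80],  # from "negative event"
]

def simulate_chain(trans, length, rng, start=0):
    """Simulate z(1..length), starting from the state at index `start`."""
    idx = start
    path = []
    for _ in range(length):
        path.append(STATES[idx])
        # Draw the next state index using the current state's row as weights.
        idx = rng.choices(range(len(STATES)), weights=trans[idx])[0]
    return path

path = simulate_chain(M_z, 200, random.Random(0))
print(path[:40])
```

With these values, the expected length of an event is 1 / (1 - 0.8) = 5 time bins, which shows how the diagonal entries encode event persistence.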
Details of the Markov Process. We give each row in the transition matrix a Dirichlet prior. Given z(t), we can model N_E(t) as a Poisson with rate γ(t). We give this rate a Gamma prior Γ(γ; a_E, b_E), which is independent of t. We can then marginalize over γ(t), which leaves a negative binomial distribution for N_E(t).
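The marginalization over γ(t) can be checked numerically. The sketch below assumes the Gamma prior is parameterized by shape a_E and rate (inverse scale) b_E, in which case the marginal is a negative binomial with success probability b_E / (1 + b_E). The function names and the values a_E = 2, b_E = 0.5 are illustrative assumptions.

```python
import math

def neg_binomial_pmf(n, a, b):
    """P(N = n) after integrating g ~ Gamma(shape=a, rate=b) out of Poisson(n; g)."""
    p = b / (1.0 + b)
    log_pmf = (math.lgamma(n + a) - math.lgamma(a) - math.lgamma(n + 1)
               + a * math.log(p) + n * math.log(1.0 - p))
    return math.exp(log_pmf)

def mixture_pmf_numeric(n, a, b, upper=100.0, steps=100_000):
    """Midpoint-rule integral of Poisson(n; g) * Gamma(g; a, rate=b) over g."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * h  # midpoint of the k-th sub-interval, never zero
        log_term = (n * math.log(g) - g - math.lgamma(n + 1)      # Poisson part
                    + a * math.log(b) + (a - 1.0) * math.log(g)   # Gamma part
                    - b * g - math.lgamma(a))
        total += math.exp(log_term)
    return total * h

a_E, b_E = 2.0, 0.5  # hypothetical prior parameters
for n in (0, 3, 10):
    print(n, neg_binomial_pmf(n, a_E, b_E), mixture_pmf_numeric(n, a_E, b_E))
```

The closed form and the numerical integral should agree closely, which is the property that lets the sampler work with N_E(t) directly without ever sampling γ(t).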
Graphical Model of the Dependencies
Learning the parameters. If we are given the hidden variables N_0(t), N_E(t) and z(t), we can compute MAP estimates or draw posterior samples of the parameters λ(t) and M_z. So we can use MCMC: iterate between sampling the hidden variables (given the parameters) and sampling the parameters (given the hidden variables).
Sampling the hidden variables, given the parameters. Rough outline: first, use the forward-backward algorithm [Baum et al. 1970] to sample z(t); then, given z(t), determine N_0(t) and N_E(t) by sampling.
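The first step of this outline, sampling z(t) with a forward pass followed by backward sampling, can be sketched generically. The sketch assumes the per-state likelihoods p(N(t) | z(t)) have already been computed and are passed in as `lik`; the function name `sample_states`, the initial distribution and the transition values are hypothetical.

```python
import random

STATES = [0, +1, -1]  # state labels for indices 0, 1, 2

def sample_states(lik, trans, init, rng):
    """Draw one state sequence from p(z | data) by forward filtering, backward sampling.

    lik[t][k]   : likelihood of the observation at time t under state index k
    trans[i][j] : probability of moving from state index i to state index j
    init[k]     : prior probability of state index k at the first time step
    Assumes every observation has non-zero likelihood under some reachable state.
    """
    T, K = len(lik), len(init)
    alpha = []  # alpha[t][k] = p(z(t) = k | observations up to t)
    for t in range(T):
        if t == 0:
            pred = init
        else:
            pred = [sum(alpha[t - 1][i] * trans[i][j] for i in range(K))
                    for j in range(K)]
        unnorm = [pred[k] * lik[t][k] for k in range(K)]
        s = sum(unnorm)
        alpha.append([u / s for u in unnorm])
    # Backward pass: sample the last state, then each earlier state given its successor.
    idx = [0] * T
    idx[T - 1] = rng.choices(range(K), weights=alpha[T - 1])[0]
    for t in range(T - 2, -1, -1):
        w = [alpha[t][i] * trans[i][idx[t + 1]] for i in range(K)]
        idx[t] = rng.choices(range(K), weights=w)[0]
    return [STATES[i] for i in idx]

trans = [[0.98, 0.01, 0.01], [0.20, 0.79, 0.01], [0.20, 0.01, 0.79]]
init = [0.90, 0.05, 0.05]
# Hypothetical likelihoods that favour a positive event in the middle two bins.
lik = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05]]
print(sample_states(lik, trans, init, random.Random(1)))
```

Sampling the whole sequence jointly, rather than one time step at a time, is what lets the sampler move entire events in and out of the state sequence.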
Sampling the parameters, given the hidden variables. The conjugate prior distributions give us a straightforward way to compute the posteriors: use the sufficient statistics of the data as (updating) parameters for the posterior.
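The conjugate updates can be illustrated with a deliberately simplified sketch. It assumes a single constant normal rate with a Gamma(shape, rate) prior rather than a time-varying λ(t), and a Dirichlet prior on each row of the transition matrix; all function names and prior values are hypothetical. The sufficient statistics are the sum of the normal counts and the table of observed state transitions.

```python
import random

def sample_rate_posterior(normal_counts, a, b, rng):
    """Gamma(a, rate=b) prior + Poisson counts -> Gamma(a + sum, rate=b + T) posterior."""
    shape = a + sum(normal_counts)
    rate = b + len(normal_counts)
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate expects a scale

def transition_counts(z_idx, k):
    """Count transitions i -> j in a sequence of state indices 0..k-1."""
    counts = [[0] * k for _ in range(k)]
    for i, j in zip(z_idx, z_idx[1:]):
        counts[i][j] += 1
    return counts

def sample_transition_posterior(z_idx, prior, rng):
    """Sample each row from Dirichlet(prior row + transition counts for that row)."""
    k = len(prior)
    counts = transition_counts(z_idx, k)
    rows = []
    for i in range(k):
        # A Dirichlet draw is a vector of independent Gamma draws, normalized.
        draws = [rng.gammavariate(prior[i][j] + counts[i][j], 1.0) for j in range(k)]
        total = sum(draws)
        rows.append([d / total for d in draws])
    return rows

rng = random.Random(0)
z_idx = [0, 0, 0, 1, 1, 0, 0, 2, 0, 0]            # hypothetical sampled state indices
prior = [[99.0, 0.5, 0.5], [2.0, 8.0, 0.1], [2.0, 0.1, 8.0]]  # hypothetical strong prior
print(sample_rate_posterior([9, 12, 10, 8, 11], a=1.0, b=1.0, rng=rng))
print(sample_transition_posterior(z_idx, prior, rng))
```

Because the priors are conjugate, each update only adds observed counts to the prior parameters, so no numerical optimization is needed inside the MCMC loop.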
Prior distributions of z_ij and γ(t). Markov-modulated Poisson processes are sensitive to the selection of priors for z_ij and γ(t). In the domains where these models are applied, we often have strong ideas about, e.g., what constitutes a “rare” event. We use these ideas to build strong priors into the model, in order to avoid overfitting and to adjust the threshold levels of event detection.
Calculating Results. We want to detect unusual events, and we can use our model to do this by calculating the posterior probability of an event at each time, p(z(t) ≠ 0 | N). We can then compare our predictions with the known event occurrences.
Example Posterior Predictions (1)
Example Posterior Predictions (2)
Example Posterior Predictions (3)
Comparison of Predicted Events with Known Events
Other Possible Inferences. The model can be modified to test the degree of heterogeneity of the time process. We can ask questions like: are all weekdays essentially the same? Are all afternoons essentially the same? We can also estimate event attendance.
Conclusion. The model is much more effective than the threshold approach. Good detection rate. Difficult to assess the false positive rate. Possibility for extension.
Questions