Change-Point Detection Techniques for Piecewise Locally Stationary Time Series
Michael Last, National Institute of Statistical Sciences
Talk for Midyear Anomaly Detection Workshop, 2/3/2006
Stationary Time Series
- We call a time series stationary if the distribution of (x_i, x_k) depends only on the lag l = i - k
- Usually we use weak stationarity, where we only look at the first two moments (equivalent in the Gaussian case)
- Examples: sunspot numbers, Chandler wobble, rainfall (over decades)
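Written out, the weak (second-order) version of the definition above reads as follows. The symbols \mu (constant mean) and \gamma (autocovariance function) are notation introduced here for illustration; they do not appear on the slide.

```latex
\mathbb{E}[x_i] = \mu \quad \text{for all } i,
\qquad
\operatorname{Cov}(x_i, x_k) = \gamma(l), \quad l = i - k .
```

For a Gaussian series the mean and autocovariance determine every joint distribution, which is why the two notions coincide in that case.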
Detecting Changes: Piecewise Stationary Time Series
- Many series are not stationary
  - Earthquakes
  - Speech
  - Finance
- How to model?
  - Try stationary between change-points
Problems With This Approach
- Adak (1998) proposed computing the distance between power spectra computed over small windows; if adjacent windows are close, merge them into a larger window
- Finds too many change-points in earthquakes
  - E.g. the secondary wave tapers off, but change-points will still be detected
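The merge idea can be illustrated with a deliberately simplified greedy pass over per-window spectra. This is a sketch of the general idea only, not Adak's actual algorithm: the function names, the mean absolute log-ratio distance, and the threshold are all assumptions made for the example.

```python
import math


def log_spectral_distance(f, g):
    """Mean absolute log-ratio between two spectra (an assumed distance)."""
    return sum(abs(math.log(a / b)) for a, b in zip(f, g)) / len(f)


def merge_similar_blocks(spectra, threshold):
    """Greedily merge adjacent blocks whose spectra are closer than threshold.

    spectra: list of per-block spectra, all of the same length.
    Returns a list of (first_block, last_block) index pairs.
    """
    # Each segment holds: first block, last block, running mean spectrum, count.
    segments = [[0, 0, list(spectra[0]), 1]]
    for i in range(1, len(spectra)):
        first, _, avg, count = segments[-1]
        if log_spectral_distance(avg, spectra[i]) < threshold:
            merged = [(a * count + s) / (count + 1)
                      for a, s in zip(avg, spectra[i])]
            segments[-1] = [first, i, merged, count + 1]
        else:
            segments.append([i, i, list(spectra[i]), 1])
    return [(first, last) for first, last, _, _ in segments]


# Four toy block spectra: two similar blocks, then two different ones.
blocks = [[1.0, 1.0], [1.1, 0.9], [5.0, 0.2], [5.2, 0.21]]
segments = merge_similar_blocks(blocks, threshold=0.5)
```

With a slowly tapering signal, every adjacent pair of blocks differs a little, so where the cuts fall depends heavily on the threshold; this is the kind of behaviour the slide objects to.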
Time-Varying Power Spectrum
- Power spectrum computed over a window about a point
- Window width selection is an open question
- Does this have features we can use?
  - Yes!
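A minimal sketch of a windowed spectrum estimate, assuming a raw periodogram on each window and a naive DFT so that it runs with the standard library alone. The function names and the `width` and `step` parameters are illustrative, not taken from the talk.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram at Fourier frequencies k = 1 .. n//2 - 1.

    Entry j of the result corresponds to frequency k = j + 1 cycles per window.
    """
    n = len(x)
    mean = sum(x) / n
    out = []
    for k in range(1, n // 2):
        s = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(x))
        out.append(abs(s) ** 2 / n)
    return out


def time_varying_spectrum(x, width, step):
    """Periodogram of each length-`width` window, advanced by `step` samples."""
    centers, spectra = [], []
    for start in range(0, len(x) - width + 1, step):
        centers.append(start + width // 2)
        spectra.append(periodogram(x[start:start + width]))
    return centers, spectra


# Toy series: 4 cycles per 32 samples, then 8 cycles per 32 samples.
x = ([math.cos(2 * math.pi * 4 * t / 32) for t in range(64)]
     + [math.cos(2 * math.pi * 8 * t / 32) for t in range(64)])
centers, spectra = time_varying_spectrum(x, width=32, step=32)
```

In the toy series the dominant frequency moves from k = 4 to k = 8 between the second and third windows, which is the kind of feature a change-point method can exploit.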
Time-Varying Power Spectrum
Finding Abrupt Changes
- What do we mean by abrupt changes?
  - Distance between spectra
  - Spectrum as distribution, K-L information discrimination
- Requirement of local estimation
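For two spectra f and g, the symmetrized K-L discrimination is, up to constants, an average over frequency of f/g + g/f - 2. The sketch below uses a rescaled variant, the average of (f/g + g/f)/2, which equals 1 for identical spectra. That normalization and the function name are assumptions for illustration; the talk's own distance function is not reproduced in the text.

```python
def symmetric_ratio_distance(f_left, f_right):
    """Average over frequencies of 0.5 * (f_left/f_right + f_right/f_left).

    Equals 1 when the spectra agree and grows as they diverge.
    Assumes strictly positive spectral estimates of equal length.
    """
    if len(f_left) != len(f_right):
        raise ValueError("spectra must have the same number of frequencies")
    total = 0.0
    for a, b in zip(f_left, f_right):
        total += 0.5 * (a / b + b / a)
    return total / len(f_left)


same = symmetric_ratio_distance([0.3, 2.0, 5.0], [0.3, 2.0, 5.0])
doubled = symmetric_ratio_distance([1.0, 1.0], [2.0, 2.0])
```

Because only ratios enter, rescaling both spectra by the same constant leaves the distance unchanged, and swapping the two arguments gives the same value.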
Our Distance Function
Theoretical Performance
- Maximum away from change-points converges to 1. Rate of convergence: [formula not recovered from slide]
- Consistently estimated with smoothed periodograms
- Asymptotically normal
- Finite-sample critical values independent of underlying signal
- n is length of window, T is length of series
Example Series
Simulation Results
- Simulations to determine effectiveness of change-point localization and identification
  - Separated tasks
  - 8 types of series with different features
  - Minimal amount of tuning
  - Compared with other methods
- Results:
  - Good localization
  - 65+% correct identification
Data Performance
Primary Wave
Secondary Wave
Speech Segmentation
- Abrupt changes exist at transitions between phonemes
- Can we reliably recover these?
- Given segmented speech, can we meaningfully cluster it?
- Can we interpret clusters?
- Can we use the clusters to deduce speaker, accent, or language?
Time-Varying Power Spectra
Speech
Window Width Considerations
- Need a window with enough data to estimate several frequencies in the range where interesting events happen
  - Below 10 Hz for earthquakes
  - At least down to 20 Hz for audio
- At present, this remains one of the major tuning parameters. In effect, wide windows have low variance but risk higher bias
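The arithmetic behind these bullets: a window of n samples at sampling rate fs has Fourier frequencies spaced fs/n apart, so the window length fixes how many frequencies fall in the band of interest. The sampling rates used below (100 Hz for a seismogram, 8000 Hz for audio) are hypothetical values chosen for the example.

```python
import math


def usable_frequencies(sample_rate_hz, window_len, f_max_hz):
    """Return (frequency spacing in Hz, number of nonzero Fourier frequencies
    at or below f_max_hz), capped at the Nyquist index window_len // 2."""
    spacing = sample_rate_hz / window_len
    k_max = min(math.floor(f_max_hz * window_len / sample_rate_hz),
                window_len // 2)
    return spacing, k_max


def min_window_for_lowest(sample_rate_hz, f_low_hz):
    """Shortest window whose first nonzero Fourier frequency is <= f_low_hz."""
    return math.ceil(sample_rate_hz / f_low_hz)


# Hypothetical 100 Hz seismogram: frequencies available below 10 Hz.
long_window = usable_frequencies(100, 512, 10)   # 512-sample window
short_window = usable_frequencies(100, 64, 10)   # 64-sample window

# Hypothetical 8000 Hz audio: window needed to reach down to 20 Hz.
audio_window = min_window_for_lowest(8000, 20)
```

Under these assumptions the 512-sample window gives 51 frequencies below 10 Hz while the 64-sample window gives only 6, and the audio window needs at least 400 samples.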
How to Assess a Significant Change
- Asymptotic distribution:
  - Test statistic is a sum of variables with an F distribution plus their inverses
  - Asymptotic normality
- Problem: events of interest are in the tail, and asymptotic results break down in the tails of distributions
- Test statistic is signal independent
  - Simulate on white noise, pick significance from there
End of Talk
Slides which may address specific questions follow, but unless I've talked way too fast, there probably won't be time to show these. So let's break for coffee, and if anybody has a burning desire to learn more about what I've said, please come and ask me. I'm happy to answer any questions, and may just have a slide lying around to answer with.
Finding the Change-Point(s)
- Assume correct number of change-points, and find [formula not recovered from slide]
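The criterion on this slide is not recoverable from the text. One plausible reading, offered purely as an assumption, is to take the m largest values of the distance curve subject to a minimum separation between chosen locations. The function name, the `min_gap` parameter, and the toy numbers are hypothetical.

```python
def top_m_peaks(times, distances, m, min_gap):
    """Pick up to m time points with the largest distances,
    keeping any two chosen points at least min_gap apart."""
    order = sorted(range(len(distances)),
                   key=lambda i: distances[i], reverse=True)
    chosen = []
    for i in order:
        if all(abs(times[i] - times[j]) >= min_gap for j in chosen):
            chosen.append(i)
        if len(chosen) == m:
            break
    return sorted(times[i] for i in chosen)


# Toy distance curve with two separated bumps.
times = [10, 20, 30, 40, 50, 60]
distances = [1.0, 1.1, 3.0, 2.9, 1.2, 2.0]
estimates = top_m_peaks(times, distances, m=2, min_gap=20)
```

The minimum gap stops a single broad bump in the distance curve from being counted as several change-points; a natural choice would be on the order of the window width.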
Issues
- How to assess a significant change?
- Uncertainty in location?
- Choosing parameters
  - Window width
  - Smoothing
  - Weights
Choosing Parameters: Window Width
- We need a window width much wider or much narrower than the scale interesting changes happen on
  - Much wider and the series mixes within a window
  - Much narrower and continuity of the time-varying power spectrum kicks in
  - Same scale and oscillations can be detected as big changes
Smoothing
- Makes estimate consistent
- Ruins independence in frequency
- Another tuning parameter
- Bandwidth matters more than shape
- Current heuristic is about the square root of the number of frequencies; seems to work well
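A minimal sketch of the bandwidth heuristic, assuming a flat moving-average kernel whose span defaults to roughly the square root of the number of frequencies. The function name, the rounding to an odd span, and the truncation at the edges are choices made for the example.

```python
import math


def moving_average_smooth(raw, span=None):
    """Smooth a periodogram with a flat kernel.

    span defaults to about sqrt(number of frequencies), rounded up to an
    odd integer; the kernel is truncated at the ends of the frequency range.
    """
    m = len(raw)
    if span is None:
        span = max(1, round(math.sqrt(m)))
    if span % 2 == 0:
        span += 1
    half = span // 2
    out = []
    for k in range(m):
        lo, hi = max(0, k - half), min(m, k + half + 1)
        out.append(sum(raw[lo:hi]) / (hi - lo))
    return out


# A single spike among 16 frequencies is spread over 5 neighbours.
spike = [0.0] * 8 + [5.0] + [0.0] * 7
smoothed = moving_average_smooth(spike)
```

The spike example also shows the cost noted on the slide: after smoothing, neighbouring frequencies share data, so their estimates are no longer independent.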
Weights
- Method for incorporating prior knowledge
- High weights for frequencies where real changes are likely, low where real changes are unlikely
- Akin to placing a prior on which frequencies changes will happen on
- Equivalent to linear filter of signal
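To illustrate how weights can encode prior knowledge, the sketch below applies band-limited weights to a symmetric spectral-ratio distance. Both the distance and the 0/1 band weighting are assumptions for the example; the talk's actual weighting scheme is not given in the text.

```python
def band_weights(freqs_hz, low_hz, high_hz):
    """Equal weight inside [low_hz, high_hz], zero outside, summing to 1."""
    raw = [1.0 if low_hz <= f <= high_hz else 0.0 for f in freqs_hz]
    total = sum(raw)
    if total == 0:
        raise ValueError("no frequencies fall inside the band")
    return [w / total for w in raw]


def weighted_distance(f_left, f_right, weights):
    """Weighted average of 0.5 * (ratio + inverse ratio) across frequencies."""
    return sum(w * 0.5 * (a / b + b / a)
               for w, a, b in zip(weights, f_left, f_right))


# Hypothetical frequency grid; the two spectra differ only at 14 Hz.
freqs = [2.0, 6.0, 10.0, 14.0]
left = [1.0, 1.0, 1.0, 1.0]
right = [1.0, 1.0, 1.0, 100.0]

focused = weighted_distance(left, right, band_weights(freqs, 0.0, 10.0))
uniform = weighted_distance(left, right, [0.25] * 4)
```

With weights concentrated below 10 Hz the difference at 14 Hz is ignored and the distance stays at its no-change value, while uniform weights let that one frequency dominate.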
Speech: Unresolved Issues
- Frequency domain representation of speech differs across speakers, e.g. Jessica speaks at a higher pitch (frequency) than I do
- Can we find a transform to fix this?
- After solving this problem, what is the next problem?