Dr. Hesam Izakian October 2014


1 Cluster-Centric Anomaly Detection and Characterization in Spatial Time Series
Dr. Hesam Izakian, October 2014

2 Outline
Spatial time series
Problem formulation
Anomaly detection in spatial time series: questions
Overall scheme of the proposed method
Time series segmentation
Spatial time series clustering
Assigning anomaly scores to clusters
Visualizing the propagation of anomalies
An outbreak detection scenario
Application
Conclusions

3 Spatial time series
Structure of data
  A set of spatial coordinates
  One or more time series for each point
Examples
  Daily average temperature at different climate stations
  Stock market indexes in different countries
  Number of absent students in different schools
  Number of emergency department (ED) visits in different hospitals
  Measured signals in different parts of the brain
Alberta Health Services records the number of ED visits in different hospitals, absenteeism in different schools, etc., for outbreak detection in the province.

4 Problem formulation
There are N spatial time series.
Objective: find a spatial neighborhood of data, in a time interval, containing a high level of unexpected changes.
Notation
  N: number of spatial time series
  r: number of features in the spatial part of data
  n: length of time series
  x_i(s): spatial part of data
  x_i(t): time series part of data
  l: length of time interval

5 Anomaly detection in spatial time series: questions
Spatial neighborhood of data
  Size of neighborhood
  Overlapping neighborhoods
Unexpected changes (anomalies)
  What kind of changes are expected or not expected
  How to evaluate the level of unexpected changes
Anomaly visualization
Anomaly characterization
  What was the source of the anomaly
  How the anomaly is propagated over time

6 Overall scheme of the proposed method
Revealing the structure of data in various time intervals
Comparing the revealed structures
Diagram stages: spatial time series data, sliding window, spatial time series clustering, anomaly scores, fuzzy relations

7 Time series part segmentation
Sliding window
Spatio-temporal subsequences
Local view of the time series part
The spatial part of data is always fixed
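
The segmentation step can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function name `sliding_windows`, the array layout (one row per spatial point), and the choice to drop windows that do not fit entirely inside the series are assumptions. The spatial coordinates are not touched, since only the time series part is segmented.

```python
import numpy as np

def sliding_windows(series, length, step):
    """Cut the time axis of `series` (shape N x n) into windows.

    Returns a list of (start_index, block) pairs, where each block has
    shape N x length. Windows running past the end are dropped (assumption).
    """
    series = np.asarray(series, dtype=float)
    n = series.shape[1]
    return [(s, series[:, s:s + length])
            for s in range(0, n - length + 1, step)]

# Toy data: 2 spatial points, each with a time series of length 12.
data = np.arange(24, dtype=float).reshape(2, 12)
windows = sliding_windows(data, length=4, step=2)
for start, block in windows:
    print(start, block.shape)
```

Each block, paired with the unchanged spatial coordinates of its rows, forms one set of spatio-temporal subsequences.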

8 Overall scheme of the proposed method
Revealing the structure of data in various time intervals
Comparing the revealed structures
Diagram stages: spatial time series data, sliding window, spatial time series clustering, anomaly scores, fuzzy relations

9 Fuzzy C-Means clustering: visual illustration
Clustering: grouping objects (data) so that objects in the same group are similar and objects in different groups are dissimilar.
K-means is one of the well-known clustering techniques; it uses Boolean membership assignment.

10 Fuzzy C-Means clustering: visual illustration
In fuzzy clustering, membership degrees are employed instead of a Boolean assignment of data to clusters.

11 Fuzzy C-Means clustering…
Partitions N data into c clusters
Result, objective function, and minimization: formulas not preserved in this transcript
Notation
  N: number of data
  c: number of clusters
  u_ik: membership degree of x_k to cluster center v_i
  m: controls the overlap between clusters (fuzziness)
FCM algorithm
  1. Generate a partition matrix U randomly
  2. Calculate cluster centers
  3. Update the partition matrix
  4. If not converged, go to 2
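
The four FCM steps can be sketched in NumPy as below. The slide's formulas did not survive in the transcript, so this sketch uses the standard textbook FCM updates: centers v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m and memberships u_ik proportional to d_ik^(-2/(m-1)), where d_ik is the Euclidean distance between x_k and v_i. The function name `fcm`, its parameters, and the stopping rule are assumptions for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means. X has shape N x d; returns U (c x N), V (c x d)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: random partition matrix whose columns sum to 1.
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means.
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Step 3: update memberships from squared distances (c x N).
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)          # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Step 4: stop once the partition matrix no longer changes.
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers so V matches the returned U.
    W = U ** m
    V = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, V

# Toy data: two well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
U, V = fcm(X, c=2)
labels = U.argmax(axis=0)
print(np.round(U, 3))
```

Each column of U holds the membership degrees of one data point, so the columns sum to 1; larger m gives softer (more overlapping) memberships.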

12 Spatial time series clustering
Reveals the available structure within data in the form of partition matrices
Challenges
  Different sources: spatial part vs. temporal part
  Different dimensionality in each part
  Different structure within each part
A partition matrix is able to express the structure of data in terms of a set of membership degrees

13 Spatial time series clustering…
In spatial time series, we define an adopted FCM objective function (formula not preserved in this transcript)
Characteristics
  When λ=0: only the spatial part of data is used in clustering
  A higher value of λ: a higher impact of the time series part in clustering
  Optimal value of λ: optimal impact of each part in clustering
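
One distance that is consistent with the listed characteristics is a λ-weighted sum of the spatial and temporal squared distances, d² = ||x(s) - v(s)||² + λ ||x(t) - v(t)||². The slide's own formula is not preserved, so the form below is an assumption, and the function name `augmented_sq_distance` is hypothetical. Plugging this distance into the FCM updates in place of the plain Euclidean distance gives a clustering in which λ tunes the weight of the time series part.

```python
import numpy as np

def augmented_sq_distance(xs, xt, vs, vt, lam):
    """Assumed form: spatial squared distance plus lam times temporal squared distance.

    xs, vs: spatial parts of a data point and a cluster center.
    xt, vt: time series parts of the same point and center.
    """
    xs, xt, vs, vt = (np.asarray(a, dtype=float) for a in (xs, xt, vs, vt))
    spatial = ((xs - vs) ** 2).sum()
    temporal = ((xt - vt) ** 2).sum()
    return float(spatial + lam * temporal)

# With lam = 0 only the spatial part contributes.
d_spatial_only = augmented_sq_distance([0, 0], [1, 1, 1], [3, 4], [0, 0, 0], lam=0.0)
d_weighted = augmented_sq_distance([0, 0], [1, 1, 1], [3, 4], [0, 0, 0], lam=2.0)
print(d_spatial_only, d_weighted)
```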

14 Spatial-time series clustering- Optimal value of λ
Reconstruction criterion evaluates the quality of clusters in terms of data granularization and de-granularization
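
A common way to write a reconstruction criterion is sketched below: each data point is rebuilt from the cluster centers and its membership degrees (de-granularization), and the squared differences to the original data are summed. The exact formula used in the talk is not given in the transcript, so the weighting x̂_k = Σ_i u_ik^m v_i / Σ_i u_ik^m and the function name `reconstruction_error` are assumptions. Under this reading, λ would be chosen by clustering with several candidate values and keeping the one with the lowest error.

```python
import numpy as np

def reconstruction_error(X, U, V, m=2.0):
    """Sum of squared differences between X (N x d) and its reconstruction.

    U is the c x N partition matrix and V the c x d matrix of cluster centers.
    """
    X, U, V = (np.asarray(a, dtype=float) for a in (X, U, V))
    W = U ** m                                     # c x N weights
    X_hat = (W.T @ V) / W.sum(axis=0)[:, None]     # N x d reconstruction
    return float(((X - X_hat) ** 2).sum())

X = np.array([[0.0, 0.0], [2.0, 2.0]])
V = np.array([[0.0, 0.0], [2.0, 2.0]])
err_crisp = reconstruction_error(X, np.eye(2), V)            # perfect reconstruction
err_fuzzy = reconstruction_error(X, np.full((2, 2), 0.5), V)  # every point maps to (1, 1)
print(err_crisp, err_fuzzy)
```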

15 Overall scheme of the proposed method
Revealing the structure of data in various time intervals
Comparing the revealed structures
Diagram stages: spatial time series data, sliding window, spatial time series clustering, anomaly scores, fuzzy relations

16 Assigning anomaly scores to clusters in different time windows
Assign an anomaly score to each single subsequence based on historical data
Aggregate the anomaly scores inside the revealed clusters
f_k is the anomaly score calculated for the subsequence corresponding to x_k in the time window
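
The aggregation step could look like the sketch below. The transcript does not preserve the aggregation formula, so a membership-weighted average of the per-subsequence scores f_k is assumed here; the function name `cluster_anomaly_scores` is hypothetical, and how each f_k is derived from historical data is left outside the sketch.

```python
import numpy as np

def cluster_anomaly_scores(U, f):
    """Membership-weighted mean of subsequence scores per cluster (assumed form).

    U: c x N partition matrix of one time window.
    f: length-N vector of anomaly scores f_k for the subsequences.
    """
    U = np.asarray(U, dtype=float)
    f = np.asarray(f, dtype=float)
    return (U @ f) / U.sum(axis=1)

# Toy window: the first two subsequences belong to cluster 0, the third to cluster 1.
U_toy = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
f_toy = np.array([0.2, 0.4, 0.9])
scores = cluster_anomaly_scores(U_toy, f_toy)
print(scores)
```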

17 Overall scheme of the proposed method
Revealing the structure of data in various time intervals
Comparing the revealed structures
Diagram stages: spatial time series data, sliding window, spatial time series clustering, anomaly scores, fuzzy relations

18 Visualizing the propagation of anomalies: fuzzy relations
Objective: quantifying relations between clusters
Each data point in time window W_i is expressed in terms of a set of membership degrees in the partition matrix U_i
So each spatial time series x_k is expressed in U_i as a set of membership degrees

19 Visualizing the propagation of anomalies…
Objective function to construct the relation, and its optimization (formulas not preserved in this transcript)
To construct a fuzzy relation R, we try to estimate the elements of U1 through the elements of U2
o is a sup-t composition (e.g., the max-min operator)
c1: number of clusters in U1, which corresponds to W1
c2: number of clusters in U2, which corresponds to W2
R is a matrix of size c1 × c2 and its elements are in the range [0, 1]
alpha: learning rate
r_ij = 1 means that the jth cluster in U2 has a strong relation with the ith cluster in U1
r_ij = 0 indicates no relation
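
A sketch of how such a relation might be fitted is given below. It assumes the estimate has the form U1 ≈ R o U2 with the max-min composition, a sum-of-squared-errors objective, and a plain subgradient descent with learning rate alpha followed by clipping to [0, 1]. These choices, and the names `max_min_compose`, `relation_loss`, and `fit_relation`, are assumptions made for illustration; the talk's exact objective and update rule are not preserved in the transcript.

```python
import numpy as np

def max_min_compose(R, U2):
    """Max-min composition: result[i, k] = max_j min(R[i, j], U2[j, k])."""
    return np.minimum(R[:, :, None], U2[None, :, :]).max(axis=1)

def relation_loss(R, U1, U2):
    """Sum of squared differences between U1 and its estimate R o U2."""
    return float(((U1 - max_min_compose(R, U2)) ** 2).sum())

def fit_relation(U1, U2, alpha=0.05, n_iter=500, seed=0):
    """Fit R (c1 x c2) by subgradient descent, keeping entries in [0, 1]."""
    U1 = np.asarray(U1, dtype=float)
    U2 = np.asarray(U2, dtype=float)
    rng = np.random.default_rng(seed)
    c1, n_data = U1.shape
    R = rng.random((c1, U2.shape[0]))
    for _ in range(n_iter):
        T = np.minimum(R[:, :, None], U2[None, :, :])   # c1 x c2 x N
        j_star = T.argmax(axis=1)                       # index attaining the max
        err = T.max(axis=1) - U1                        # c1 x N
        grad = np.zeros_like(R)
        for i in range(c1):
            for k in range(n_data):
                j = j_star[i, k]
                # r_ij influences the estimate only when it is the smaller operand.
                if R[i, j] <= U2[j, k]:
                    grad[i, j] += 2.0 * err[i, k]
        R = np.clip(R - alpha * grad, 0.0, 1.0)
    return R

U_a = np.array([[0.9, 0.2], [0.1, 0.8]])
R_fit = fit_relation(U_a, U_a)
print(np.round(R_fit, 2), relation_loss(R_fit, U_a, U_a))
```

With R equal to the identity matrix the composition returns U2 unchanged, which is a convenient sanity check for the operator.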

20 Example
An outbreak in the southern part of Alberta, simulated using NAADSM for 100 days
NAADSM: North American Animal Disease Spread Model
For each station we have its x-y coordinates and a time series of length 100 measuring the rate of infected herds over the 100 days

21 Example…
A sliding window is used
  Length: 20
  Movement: 10
Generated spatio-temporal subsequences:
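
As a worked count, assuming that only windows lying entirely inside the 100-day series are kept, the number of windows for length 20 and movement 10 is:

```latex
\left\lfloor \frac{100 - 20}{10} \right\rfloor + 1 = 9
```

Under that assumption the windows cover days 1 to 20, 11 to 30, and so on up to 81 to 100, with each consecutive pair overlapping by 10 days.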

22

23 Example… The order of clusters is different in different time windows (why?)

24 Example… Calculated anomaly scores for each cluster

25 Example… Only strong fuzzy relations corresponding to anomalous clusters are considered in this figure

26 Example…
One may represent the structure of data in different time intervals using a graph-based representation
  Nodes are clusters
  Edges are relations between clusters
  The numbers reported above the nodes are anomaly scores
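
The graph-based representation can be sketched with plain Python containers. The function name `build_cluster_graph`, the input layout, and the cut-off `threshold` for keeping only strong relations are all hypothetical; the numbers in the toy example are made up for illustration and are not results from the talk.

```python
def build_cluster_graph(scores_by_window, relations, threshold=0.5):
    """Build nodes and edges for clusters across successive time windows.

    scores_by_window[w][i]: anomaly score of cluster i in window w.
    relations[w][i][j]: relation strength between cluster i in window w
                        and cluster j in window w + 1.
    Only relations with strength >= threshold become edges (assumed cut-off).
    """
    nodes = {(w, i): score
             for w, scores in enumerate(scores_by_window)
             for i, score in enumerate(scores)}
    edges = []
    for w, R in enumerate(relations):
        for i, row in enumerate(R):
            for j, strength in enumerate(row):
                if strength >= threshold:
                    edges.append(((w, i), (w + 1, j), strength))
    return nodes, edges

# Two windows with two clusters each, and one relation matrix between them.
toy_scores = [[0.1, 0.8], [0.2, 0.9]]
toy_relations = [[[0.9, 0.1],
                  [0.2, 0.7]]]
nodes, edges = build_cluster_graph(toy_scores, toy_relations)
print(nodes)
print(edges)
```

Following a chain of edges that link high-score nodes gives one way to read how an anomalous cluster carries over from one time window to the next.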

27 Application
Implemented for Agriculture and Rural Development (Government of Alberta)
Using KNIME (Konstanz Information Miner)
Animal health surveillance in Alberta
  Anomaly detection
  Data visualization

28 Conclusions
A framework for anomaly detection and characterization in spatial time series is developed
A sliding window is used to generate a set of spatio-temporal subsequences
Clustering is used to discover the available structure within the spatio-temporal subsequences
An anomaly score is assigned to each revealed spatio-temporal cluster
A fuzzy relation technique is proposed to quantify the relations between clusters in successive time steps
For more information please see
1. Hesam Izakian and Witold Pedrycz, Anomaly Detection and Characterization in Spatial Time Series Data: A Cluster-Centric Approach, IEEE Transactions on Fuzzy Systems, DOI: /TFUZZ , 2014.
2. Hesam Izakian and Witold Pedrycz, Agreement-Based Fuzzy C-Means for Clustering Data with Blocks of Features, Neurocomputing, vol. 127, pp. 266–280, 2014.
3. Hesam Izakian, Witold Pedrycz, and Iqbal Jamal, Clustering Spatio-temporal Data: An Augmented Fuzzy C-Means, IEEE Transactions on Fuzzy Systems, vol. 21, no. 5, pp. 855–868, 2013.

29 Thank you

