The Evolution of Spatial Outlier Detection Algorithms - An Analysis of Design
CSci 8715 Spatial Databases
Ryan Stello, Kriti Mehra
Outline
- Background of the project
- Problem Statement
- Review of issues in traditional outlier detection
- Spatial Outlier Detection
- Outlier Detection in Spatio-Temporal Data
  - Issues with modeling spatio-temporal data
  - Solutions proposed
  - Techniques to model spatio-temporal data
  - Issues with detecting outliers in these models
- Summary
- Suggestions for Future Work
Background of the project
- Survey of papers
- Focus is on spatio-temporal outlier detection
- Classification is done with a view to understanding outlier detection in the spatio-temporal domain
Problem Statement
- Given: Techniques for outlier detection in the traditional, spatial, and spatio-temporal domains
- Find: A classification of spatial outlier detection algorithms that highlights their shortcomings as problem complexity increases to the spatio-temporal domain
- Objective: To inform the reader of the complexity of spatial outlier detection and motivate further efforts
- Constraints: The survey is non-exhaustive
Review of issues in traditional outlier techniques
- Low-dimensional spatial outlier detection
  - Restricted so that imposing a grid is easy
  - Distributive: normalizes the whole data set and pulls out the outliers
    - The average of the values is a typical metric
    - The median could be used instead
  - Iterative: applied to each neighborhood
- High-dimensional spatial outlier detection
  - Statistical: applied to the entire data set, and hence fails
  - Space reduction
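The contrast between the distributive and iterative approaches can be sketched as follows. This is a minimal illustration, not a method taken from the surveyed papers: the function names, the z-score rule, the 8-cell grid neighborhood, and the thresholds are all assumptions made for the example.

```python
from statistics import mean, median, pstdev

def global_outliers(values, z_threshold=2.0):
    """Distributive sketch: normalize the whole data set with its mean and
    standard deviation, then flag indices whose z-score is large."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]

def neighborhood_outliers(grid, diff_threshold):
    """Iterative sketch: visit each grid cell and compare it with the median
    of its (up to 8) adjacent cells. The threshold is a hypothetical knob."""
    rows, cols = len(grid), len(grid[0])
    flagged = []
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if neighbors and abs(grid[r][c] - median(neighbors)) > diff_threshold:
                flagged.append((r, c))
    return flagged

# Made-up readings: one value far from the rest of the data set.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
global_hits = global_outliers(readings)

# Made-up grid: the center cell differs from every adjacent cell.
grid = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
local_hits = neighborhood_outliers(grid, diff_threshold=3)
```

The global version needs only one pass over the data, but it has no notion of location. The neighborhood version depends on how the grid and its adjacency are defined, which is why the restriction to easily gridded data matters.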
Spatial Outlier Detection
- Spatial data requires us to:
  - Define a region
  - Define a proximity relationship
- Only when the region and the proximity relationship are defined can the concept of a spatial outlier be defined
- Various methods have been devised to define the region and the proximity relationship. Which one should be applied to our application?
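To illustrate why the choice of proximity relationship matters, the sketch below runs the same outlier test under two hypothetical definitions of "neighbor": a fixed radius and the k nearest sites. The site names, coordinates, values, and threshold are invented for the example.

```python
from math import dist
from statistics import mean

# Hypothetical sites: name -> ((x, y), non-spatial attribute value)
sites = {
    "a": ((0, 0), 10),
    "b": ((1, 0), 10),
    "c": ((2, 0), 10),
    "d": ((10, 0), 30),
    "e": ((11, 0), 30),
}

def radius_neighbors(sites, name, radius):
    """Proximity definition 1: every other site within a fixed radius."""
    here = sites[name][0]
    return [n for n, (pos, _) in sites.items()
            if n != name and dist(pos, here) <= radius]

def knn_neighbors(sites, name, k):
    """Proximity definition 2: the k nearest other sites."""
    here = sites[name][0]
    others = sorted((n for n in sites if n != name),
                    key=lambda n: dist(sites[n][0], here))
    return others[:k]

def spatial_outliers(sites, neighbors_of, threshold):
    """Flag a site when its value differs from the mean of its neighbors'
    values by more than the threshold."""
    flagged = []
    for name, (_, value) in sites.items():
        nbrs = neighbors_of(sites, name)
        if nbrs:
            neighbor_mean = mean([sites[n][1] for n in nbrs])
            if abs(value - neighbor_mean) > threshold:
                flagged.append(name)
    return flagged

by_radius = spatial_outliers(sites, lambda s, n: radius_neighbors(s, n, 1.5), 10)
by_knn = spatial_outliers(sites, lambda s, n: knn_neighbors(s, n, 3), 10)
```

With the radius definition, sites d and e only see each other and nothing is flagged. With the 3-nearest definition, both are compared against the distant low-valued cluster and are flagged, so the answer depends on the proximity relationship chosen.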
Spatio-Temporal Data
- Increased complexity due to:
  - Issues with spatial outlier detection already exist
  - A new attribute has to be considered: time
  - High dimensionality
- Scenarios:
  - Car making a sharp turn
  - Movement of a cluster of stars
  - Pollution of a lake due to industrial dumping
  - Global warming
- Should the definition of region be the same?
- Should the definition of proximity relationship be the same?
- Would the data model used in these scenarios be the same?
Modeling Spatio-Temporal Data Issues and Solutions
Issues
- Based on the "snapshot" phenomenon: GIS applications take photographs of a region periodically
- It is difficult to determine whether two mobile systems interacted between snapshots
- Drawback: the approach is data-oriented, so data can be recorded only at fixed intervals of time
- Example:
  - F1, F2: initial and final positions of a flock of sheep
  - I1, I2: initial and final positions of rain clouds
  - Did the flock of sheep get wet?
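The sheep-and-rain question can be made concrete with a small sketch. It computes the closest approach of two objects between two snapshots under the assumption that each moves in a straight line at constant speed. That assumption is not in the source and is exactly what the snapshots cannot confirm; the coordinates are invented.

```python
from math import hypot

def min_separation(a_start, a_end, b_start, b_end):
    """Smallest distance between objects A and B between two snapshots,
    ASSUMING straight-line, constant-speed motion for both objects."""
    # Relative position at the first snapshot.
    rx, ry = a_start[0] - b_start[0], a_start[1] - b_start[1]
    # Relative displacement over the whole interval (time runs from 0 to 1).
    vx = (a_end[0] - a_start[0]) - (b_end[0] - b_start[0])
    vy = (a_end[1] - a_start[1]) - (b_end[1] - b_start[1])
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0:
        t = 0.0  # no relative motion: separation is constant
    else:
        # Time of closest approach, clamped to the interval between snapshots.
        t = max(0.0, min(1.0, -(rx * vx + ry * vy) / speed_sq))
    return hypot(rx + t * vx, ry + t * vy)

# Hypothetical positions: the flock and the clouds swap places.
F1, F2 = (0, 0), (10, 0)   # flock of sheep
I1, I2 = (10, 0), (0, 0)   # rain clouds
crossing = min_separation(F1, F2, I1, I2)

# Hypothetical positions: both move along parallel paths 5 units apart.
parallel = min_separation((0, 0), (10, 0), (0, 5), (10, 5))
```

In the first case both snapshots show the flock and the clouds 10 units apart, yet under the linear-motion assumption they meet halfway through the interval. A different assumed path would give a different answer, which is the limitation of purely snapshot-based data.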
Solution Proposed
- Transition from data-based modeling to representation-based modeling
- The representation of the data is required to incorporate both the spatial and the temporal aspects
Spatio-Temporal Representations - Neighborhood-Based - Time Series Matching
Neighborhood-Based
- Determine the neighborhood of each object
- Merge neighborhoods that share edges, based on a common concept
- Process:
  1. Create micro neighborhoods based on the immediate spatial neighborhood to obtain Voronoi polygons. The Voronoi diagram of a set of $n$ objects $O$ is the subdivision of the plane into $n$ polygons, with the property that a point $q$ lies in the polygon corresponding to an object $o_i$ iff $\mathrm{dist}(q, o_i) \le \mathrm{dist}(q, o_j)$ for each $o_j \in O$ with $j \ne i$.
  2. Create macro neighborhoods by merging micro neighborhoods that share an edge.
  3. Detect outliers based on how far a value deviates, judged against a threshold.
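The three steps can be sketched as below. This is an illustrative approximation rather than the surveyed algorithm: the Voronoi step is reduced to a nearest-site membership test that follows the definition above, the list of edge-sharing cells is assumed to be precomputed, and the concept labels, values, and threshold are invented.

```python
from math import dist
from statistics import mean

def voronoi_cell(q, sites):
    """Step 1: index of the site whose Voronoi polygon contains q,
    i.e. the nearest site (ties go to the lowest index)."""
    return min(range(len(sites)), key=lambda i: dist(q, sites[i]))

def merge_micro(n, shared_edges, concept):
    """Step 2: union micro neighborhoods that share an edge AND carry the
    same concept label. shared_edges is an assumed, precomputed input."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in shared_edges:
        if concept[a] == concept[b]:
            parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def macro_outliers(groups, values, threshold):
    """Step 3: flag a member whose value differs from the mean of the rest
    of its macro neighborhood by more than the threshold."""
    flagged = []
    for members in groups:
        for i in members:
            rest = [values[j] for j in members if j != i]
            if rest and abs(values[i] - mean(rest)) > threshold:
                flagged.append(i)
    return sorted(flagged)

sites = [(0, 0), (2, 0), (4, 0), (6, 0)]
cell = voronoi_cell((3.5, 1), sites)

shared_edges = [(0, 1), (1, 2), (2, 3)]
concept = ["urban", "urban", "urban", "rural"]
groups = merge_micro(len(sites), shared_edges, concept)

values = [20, 21, 60, 5]
hits = macro_outliers(groups, values, threshold=30)
```

Site 3 stays in its own macro neighborhood because its concept label differs, so its low value is never compared with the urban sites. Site 2 is flagged because it deviates from the other members of the merged urban neighborhood.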
Extension to the concept
- Is the temporal aspect embedded in the semantic process?
- The representation is more spatial than spatio-temporal
- Extension, e.g. cars: neighborhood overlap
- [Figure: objects X and Y, with an outlier marked]
Outlier Detection in Mobile Objects
- Data for mobile objects contains a large number of outliers
- Metric-based outlier detection is not effective
- Non-metric distance functions: similarity-based
  - A time series is compared against a known or expected time series
- This method is complex because of the difficulty in:
  - Determining the expected time series
  - Deciding the acceptable tolerance for imprecise matches
  - Deciding how much noise is acceptable
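One way to make the tolerance and noise questions concrete is a longest-common-subsequence (LCSS) style similarity, a commonly used non-metric measure for trajectories. The source does not name a specific function, so LCSS is swapped in here as an assumption, and the tolerance `eps`, the cutoff `min_similarity`, and the sample series are hypothetical.

```python
def lcss_length(a, b, eps):
    """Length of the longest common subsequence of two series, where two
    samples match if they differ by at most eps (the match tolerance)."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

def similarity(observed, expected, eps):
    """Fraction of the shorter series that can be matched, in [0, 1]."""
    return lcss_length(observed, expected, eps) / min(len(observed), len(expected))

def is_outlier(observed, expected, eps, min_similarity):
    """Flag the observed series when too little of it matches the expected
    one. min_similarity encodes how much noise is tolerated."""
    return similarity(observed, expected, eps) < min_similarity

expected = [0, 1, 2, 3, 4, 5]
noisy = [0, 1.1, 9, 2.9, 4.2, 5]    # one noisy sample (9), otherwise close
erratic = [9, 8, 9, 8, 9, 8]        # never close to the expected series

matched = lcss_length(noisy, expected, eps=0.5)
noisy_flag = is_outlier(noisy, expected, eps=0.5, min_similarity=0.8)
erratic_flag = is_outlier(erratic, expected, eps=0.5, min_similarity=0.8)
```

Because unmatched samples are simply skipped, a single noisy reading lowers the similarity without dominating it, unlike a summed point-by-point distance. The hard parts remain the ones listed above: choosing the expected series, `eps`, and the cutoff.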
Example of spatio-temporal analysis of a mobile object
- Normal behavior: representing the normal behavior of a car would require defining the possibilities (variations caused by taking an exit and changing lanes)
- Precision: frequency matching, deviance from the norm
Summary
Traditional | Spatial | Spatio-Temporal
Transaction | Neighborhood | Events: where and when does the event occur
Non-spatial attribute | Representation with data, e.g. distance | Abstracting with data to create a model
Future Work
- The past helps to analyze the cause of events
- Food for thought: using spatio-temporal outlier detection to predict the future is more relevant than using it to analyze the past
Questions?