Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data Sheng QIAN SIGKDD 2015
Content 1. Introduction 2. Problem Description 3. The Assembler Method (Stage I: Detecting Individual Evolutions; Stage II: SCP Generation; Time and space complexity) 4. Experiment
Introduction Spatial Co-evolving Patterns (SCP), e.g. AQI Sensors in Beijing
Introduction Challenge: Interesting evolutions are often flooded by trivial fluctuations. The pattern search space is extremely large.
Problem Description Our Interest
Problem Description Symbols: S = {s_1, s_2, ..., s_m}: sensors; l_i: location of s_i; T = {t_1, t_2, ..., t_n}: time domain
Problem Description Definitions
Method: I. Detecting Individual Evolutions Haar Wavelet Transformation
Method: I. Detecting Individual Evolutions Haar Wavelet Transformation c_ij
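The Haar transformation named on this slide can be illustrated with a minimal sketch. It uses the unnormalized average/difference variant on a series whose length is a power of two. The function name `haar_transform` and the layout of the returned coefficients (one list per level, finest level first) are assumptions made for illustration, and so is reading c_ij as a coefficient indexed by level and position; the slides do not specify either.

```python
def haar_transform(signal):
    """Unnormalized Haar decomposition of a sequence of length 2^k.

    Returns (approximation, details). details[0] holds the finest-scale
    difference coefficients and details[-1] the coarsest one.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficient: half the difference of each adjacent pair.
        details.append([(a - b) / 2 for a, b in pairs])
        # Approximation: the average of each adjacent pair.
        approx = [(a + b) / 2 for a, b in pairs]
    return approx[0], details


overall_mean, coefficients = haar_transform([9, 7, 3, 5])
```

Large coefficients at coarse levels mark sustained changes, while small fine-level coefficients correspond to minor fluctuations, which is one way to read the motivation for using wavelets in Stage I.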
Method: I. Detecting Individual Evolutions Evolving interval extraction
Method: I. Detecting Individual Evolutions Mining Frequent Evolutions Segment-and-group approach: 1. Segment: bottom-up. 2. Mean Shift: divide segments into groups such that segments in the same group have similar slopes.
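The segment-and-group idea can be sketched as follows, under stated assumptions. The segmentation is a generic greedy bottom-up piecewise-linear merge and the grouping is a flat-kernel 1-D mean shift over segment slopes. The names `bottom_up_segments`, `mean_shift_1d`, the `max_error` stopping rule, and the endpoint-based slope are illustrative choices, not details taken from the slides.

```python
def fit_error(ys, lo, hi):
    """Sum of squared residuals of the least-squares line over ys[lo..hi]."""
    xs = range(lo, hi + 1)
    n = hi - lo + 1
    mx = sum(xs) / n
    my = sum(ys[lo:hi + 1]) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (ys[x] - my) for x in xs)
    slope = sxy / sxx if sxx else 0.0
    return sum((ys[x] - my - slope * (x - mx)) ** 2 for x in xs)


def bottom_up_segments(ys, max_error):
    """Start from the finest segments and greedily merge the cheapest
    adjacent pair until every possible merge exceeds max_error."""
    segs = [(i, i + 1) for i in range(len(ys) - 1)]
    while len(segs) > 1:
        costs = [fit_error(ys, segs[i][0], segs[i + 1][1])
                 for i in range(len(segs) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        segs[best:best + 2] = [(segs[best][0], segs[best + 1][1])]
    return segs


def segment_slope(ys, seg):
    """Endpoint slope of a segment (a simplification of a fitted slope)."""
    lo, hi = seg
    return (ys[hi] - ys[lo]) / (hi - lo)


def mean_shift_1d(values, bandwidth, tol=1e-9, max_iter=100):
    """Flat-kernel mean shift on scalars; returns one group label per value."""
    modes = []
    for v in values:
        x = v
        for _ in range(max_iter):
            window = [u for u in values if abs(u - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:  # modes close enough to share a group
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


readings = [0, 1, 2, 3, 3, 3, 3, 5, 7, 9]
segments = bottom_up_segments(readings, max_error=0.01)
slopes = [segment_slope(readings, s) for s in segments]
groups = mean_shift_1d(slopes, bandwidth=0.3)
```

The `bandwidth` argument plays the role of the mean shift bandwidth ω listed among the parameters: a larger value merges more slopes into the same evolution group.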
Method: II. SCP Generation The Anti-monotonicity Property
Method: II. SCP Generation Find SCP by intersecting matching timestamps
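A minimal sketch of the intersection step and of the anti-monotonicity it implies, assuming each (sensor, evolution) pair is stored with the set of timestamps at which it occurs and that occurrences must match on exactly the same timestamp. The `matches` dictionary, its keys, and the `support` function are hypothetical illustrations; the slides do not give the data layout or any tolerance on timestamp matching.

```python
def support(matches, pattern):
    """Timestamps at which every (sensor, evolution) item of a non-empty
    pattern occurs, obtained by intersecting the per-item timestamp sets."""
    return set.intersection(*(matches[item] for item in pattern))


# Hypothetical matching timestamps for three (sensor, evolution) items.
matches = {
    ("s1", "rise"): {1, 4, 7, 9},
    ("s2", "rise"): {1, 4, 9, 12},
    ("s3", "drop"): {4, 9, 15},
}

pair = [("s1", "rise"), ("s2", "rise")]
triple = pair + [("s3", "drop")]

pair_support = support(matches, pair)
triple_support = support(matches, triple)
```

Because adding an item can only shrink an intersection, `triple_support` is always a subset of `pair_support`. A pattern that already falls below the minimum support θ therefore cannot become frequent by growing, which is what allows pruning.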
Method: II. SCP Generation SCP Search Tree
Method: II. SCP Generation Neighbor and Parent
Method: II. SCP Generation SCP Search Tree
Method: II. SCP Generation Algorithm
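The search over sensor sets can be sketched in simplified form. This is not the SCP search tree from the slides: it assumes one evolution per sensor, grows connected sensor sets one neighbor at a time, prunes with the minimum support, and skips sets already reached through another parent. The names `grow_patterns`, `neighbors`, `matches`, and `min_support` are all hypothetical.

```python
def grow_patterns(neighbors, matches, min_support):
    """Enumerate connected sensor sets (size >= 2) whose shared timestamps
    number at least min_support. Returns {frozenset(sensors): timestamps}."""
    results = {}

    def expand(members, stamps):
        results[members] = stamps
        # Sensors adjacent to the current set but not yet in it.
        frontier = set().union(*(neighbors[s] for s in members)) - members
        for s in sorted(frontier):
            candidate = members | {s}
            if candidate in results:
                continue  # already generated from a different parent
            shared = stamps & matches[s]
            if len(shared) >= min_support:  # anti-monotone pruning
                expand(candidate, shared)

    for s in sorted(neighbors):
        if len(matches[s]) >= min_support:
            expand(frozenset([s]), matches[s])
    return {k: v for k, v in results.items() if len(k) > 1}


# Hypothetical chain of three sensors a - b - c.
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
matches = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 9}, "c": {2, 3, 9}}
patterns = grow_patterns(neighbors, matches, min_support=2)
```

Raising `min_support` to 3 in this example drops the three-sensor set, since its shared timestamps shrink to two, while both two-sensor sets survive.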
Method: Discussion Time Complexity Segment approach: O(n_e · l_e · l_s) ≈ O(m), since l_s is small and n_e · l_e < m. Mean Shift: O(n_l · k) ≈ O(m), where k is the average number of shifting operations. Second Stage: O(n_G (n|E_G| + n_p^2 · n_s)), where n_G is the number of connected components in G that have SCPs, |E_G| is the number of edges in G, n_p is the maximum number of SCPs on a connected component, and n_s is the maximum support of an SCP.
Method: Discussion Space Complexity Segment & Mean Shift: nearly linear. Second Stage: O(n · n_p · n_s)
Method: Discussion Parameters Setting The minimum support θ: how many occurrences can be considered frequent enough. The distance threshold h: what distance makes two sensors reachable. The change threshold δ: how much change in the reading reflects a significant and unusual behavior. The mean shift bandwidth ω.
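To make the role of the distance threshold h concrete, here is a small sketch that lists which sensor pairs count as reachable. It assumes planar coordinates and Euclidean distance; real deployments would more likely use latitude/longitude with a geodesic distance. The function name `reachable_pairs` and the sample coordinates are hypothetical.

```python
import math


def reachable_pairs(locations, h):
    """Return the sensor pairs whose Euclidean distance is at most h.

    locations maps a sensor id to an (x, y) tuple in arbitrary planar units.
    """
    ids = sorted(locations)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if math.dist(locations[a], locations[b]) <= h
    ]


# Hypothetical sensor positions.
locations = {"a": (0, 0), "b": (3, 4), "c": (10, 0)}
close_pairs = reachable_pairs(locations, h=5)
all_pairs = reachable_pairs(locations, h=10)
```

A small h yields a sparse graph with many small connected components, while a large h connects more sensors and enlarges the pattern search space.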
Experiment Dataset 1. Air is an air quality data set: 180 air quality sensors are deployed in 16 cities in northern China (Beijing, Tianjin, and 14 cities in Hebei Province), and each sensor has measured the hourly AQI during the period –. 2. Bike is the Citi Bike rental data set for the 332 rental docks in New York; we record the number of available bikes at each dock every 30 minutes during –. 3. Syn-Sensor is a collection of 4 synthetic data sets used to evaluate the scalability of Assembler w.r.t. the number of sensors n.
Experiment Illumination
Experiment Efficiency Study Varying θ and h
Experiment Efficiency Study Varying δ and ω
Experiment Scalability
Thank you