Finding Sequential Patterns with Time Lags in Spatio-temporal Earth Science Datasets
G10: Anuj Karpatne, Vijay Borra
Outline: Motivation; Example Dataset; Problem Statement; First-Principle Example; Challenges; Novelty; Proposed Approach; Validation Outline and Future Work
Motivation
1) Increase in average winter temperature in higher latitudes due to global warming → increase in pine beetle infestation
2) Deforestation in Indonesian peat lands for palm-oil plantations → degradation of soil due to poor soil moisture
Both lead to increased susceptibility to forest fires → forest fires
Sources: NASA Earth Observatory, Environmental Protection Agency, National Oceanic and Atmospheric Administration, Natural Resources Canada
An Example Dataset
[Figure: events connected by neighborhood relationships and time lags, leading to active fire]
Problem Definition
Problem Statement: Find sequential patterns in spatio-temporal, real-valued events across multiple variables, with time lags.
Input:
Spatio-temporal gridded data with multiple variables
An event detection scheme for each variable, with its parameters and threshold
Prevalence threshold measures for a sequential pattern
A spatial neighborhood relationship
Statistical parameters of the time lag distribution
Output: Patterns of the form $A \xrightarrow{T} B$, where T denotes the time lag distribution of the event relationship (a vector of time lag values at which the relationship is prevalent)
Examples:
Constraints:
Events are point processes occurring with time lags, not interval-based
Chains of events are detected under a ‘total orderedness’ assumption
Data variables are homogeneous and span the same range of time
Event detection algorithms are robust to noise and seasonality in the time series
The algorithm is correct and complete
Objective: Minimize computational time
First-Principle Example
eA (event, time of occurrence): A1 3; A2 4; A3 6; A4; A5; A6; A7 5; A8; A9; A10; A11; A12; A13; A14 2; A15
eB (event, time of occurrence): B1 6; B2 2; B3 5; B4; B5; B6 4; B7; B8; B9; B10
eC (event, time of occurrence): C1 5; C2 6; C3 3; C4 2; C5; C6 4; C7; C8
Challenges
The size of the search space is exponential.
A vector of prevalence thresholds must be considered for pruning at multiple time lag values.
Novelty (spatio-temporal sequential pattern mining)
Sequential pattern mining for Boolean events without time lag: Huang et al.; Mohan et al. (Cascade)
Moving object trajectory mining: Cao et al.; STAR
Sequential pattern mining for real-valued events with time lag: our approach
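A rough count of the search space, under two assumptions not stated on the slide: each variable appears at most once in a pattern (so patterns have size 2 to n), and each of the k − 1 consecutive links of a size-k pattern can take one of L candidate time lag values. The number of (pattern, lag) combinations is then:

```latex
N(n, L) = \sum_{k=2}^{n} \frac{n!}{(n-k)!} \, L^{k-1}
```

For example, n = 3 variables and L = 4 lags (0 to 3, as in the running example) already give 6 · 4 + 6 · 4² = 120 combinations, and the count grows exponentially in both n and pattern size.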
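Approach: Computing the Prevalence Index PI(t)
For a pattern of size two, $A \xrightarrow{T} B$, and for each possible time lag value t, let $e_{AB}(t)$ be the set of neighboring (A, B) event pairs with time lag t:
Prevalence ratio of A at time lag t: $PR_A(t) = \frac{|\{a : (a, b) \in e_{AB}(t)\}|}{|e_A|}$
Prevalence ratio of B at time lag t: $PR_B(t) = \frac{|\{b : (a, b) \in e_{AB}(t)\}|}{|e_B|}$
Prevalence index of $A \xrightarrow{T} B$ at time lag t: the minimum of the prevalence ratios of A and B at time lag t, $PI(t) = \min(PR_A(t), PR_B(t))$
For patterns of size greater than two: find … as the set of events in … which participate in …; PI is the minimum of … and …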
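The size-two computation can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the pairs of e_AB arrive already filtered for the neighborhood and time lag conditions, each tagged with its lag, and it reads the prevalence ratio as the fraction of distinct instances that participate at that lag. The 0.1 threshold is hypothetical. The toy input uses only the five pairs whose lags follow from the listed times of occurrence (with 15 A and 10 B events), so its output is not the slide's PI values.

```python
from collections import defaultdict


def prevalence_index(pairs, n_a, n_b):
    """Return {t: PI(t)} for a size-two pattern A -> B.

    pairs: iterable of (a_id, b_id, lag) tuples from e_AB.
    n_a, n_b: total number of A and B event instances.
    """
    a_at = defaultdict(set)  # lag t -> distinct A instances in e_AB(t)
    b_at = defaultdict(set)  # lag t -> distinct B instances in e_AB(t)
    for a, b, t in pairs:
        a_at[t].add(a)
        b_at[t].add(b)
    # PI(t) = min(PR_A(t), PR_B(t))
    return {t: min(len(a_at[t]) / n_a, len(b_at[t]) / n_b) for t in sorted(a_at)}


pairs = [("A1", "B1", 3), ("A1", "B3", 2), ("A2", "B3", 1), ("A2", "B1", 2), ("A3", "B1", 0)]
pi = prevalence_index(pairs, n_a=15, n_b=10)
prevalent = [t for t in pi if pi[t] > 0.1]  # hypothetical prevalence threshold
```

Here only lag 2 (two distinct A and two distinct B instances) exceeds the threshold.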
Running Example
Let $e_{AB}$ be the set of pairs of events in (A, B) satisfying:
Spatial neighborhood condition: the two events of each pair fall in the same spatial neighborhood
Non-negative time lag condition: the event in B does not occur before the event in A (time lag t ≥ 0)
$e_{AB}$ (pair, time lag): A1B1 3; A1B3 2; A1B7; A2B1; A2B3 1; A2B7; A3B1; A4B1; A4B3; A4B4; A4B6; A4B10; A6B3; A6B10; A7B10; A8B1; A9B1; A9B3; A9B6; A10B1; A11B3; A12B6; A13B6; A14B6; A14B4; A15B4; A15B10
Pairs grouped by time lag, $e_{AB}(0)$ to $e_{AB}(3)$: A3B1, A2B3, A1B3, A1B1, A10B1, A2B7, A1B7, A4B1, A15B4, A4B6, A2B1, A4B10, A7B10, A4B3, A6B10, A9B6, A4B4, A9B1, A11B3, A6B3, A14B4, A12B6, A8B1, A13B6, A9B3, A14B6
Resulting prevalence indices: PI(0) = 0.2, PI(1) = 0.5, PI(2) = 0.47, PI(3) = 0.3
Output additional statistical properties of the event pair occurrences at the prevalent time lags.
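Building e_AB and grouping it by time lag can be sketched as a brute-force spatial join. The grid positions and the neighborhood rule (Chebyshev distance at most `radius`, i.e. the 8-connected neighborhood for `radius=1`) are hypothetical assumptions. The times of occurrence come from the First-Principle example, and lag 0 is kept because the example has a non-empty e_AB(0) (A3 and B1 both occur at time 6).

```python
from collections import defaultdict
from itertools import product


def build_e_ab(events_a, events_b, radius=1):
    """Group neighboring (A, B) pairs by time lag: {t: [(a_id, b_id), ...]}.

    events_a, events_b: dicts mapping event id -> (row, col, time).
    """
    e_ab = defaultdict(list)
    for (a, (ra, ca, ta)), (b, (rb, cb, tb)) in product(events_a.items(), events_b.items()):
        is_neighbor = max(abs(ra - rb), abs(ca - cb)) <= radius
        lag = tb - ta
        if is_neighbor and lag >= 0:  # B at or after A
            e_ab[lag].append((a, b))
    return dict(e_ab)


# Hypothetical grid cells; times taken from the example tables.
events_a = {"A1": (0, 0, 3), "A2": (0, 1, 4), "A3": (1, 0, 6), "A7": (9, 9, 5)}
events_b = {"B1": (1, 1, 6), "B3": (0, 1, 5)}
e_ab = build_e_ab(events_a, events_b)
```

A3B3 is dropped for its negative lag and A7 is dropped for being outside the neighborhood, consistent with neither pair appearing in the slide's e_AB. A real implementation would use a spatial index instead of comparing all pairs.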
Pruning Strategy
[Figure: pattern lattice with the growing region]
If the set of prevalent time lags is empty, remove the candidate pattern from the list of frequent patterns.
Otherwise, prune the prevalent time lags using statistical properties of the event occurrence distribution over the prevalent time lags: accept the middle quantile of the prevalent time lag distribution as the pruned time lags.
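A possible reading of the middle-quantile rule, sketched in Python. The slide does not fix which quantile is meant; this sketch assumes the interquartile band (25% to 75%) of the lag distribution, weighted by the number of pattern instances at each prevalent lag, and keeps every lag whose probability mass overlaps that band. `prune_lags` and its bounds are hypothetical.

```python
def prune_lags(counts, lower=0.25, upper=0.75):
    """Keep prevalent lags inside a middle quantile of the lag distribution.

    counts: {lag: number of pattern instances}, restricted to prevalent lags.
    Returns the sorted retained lags; [] means the candidate is discarded.
    """
    if not counts:
        return []
    lags = sorted(counts)
    total = sum(counts[t] for t in lags)
    kept, cum = [], 0
    for t in lags:
        below = cum / total      # mass strictly below lag t
        cum += counts[t]
        upto = cum / total       # mass up to and including lag t
        if upto > lower and below < upper:  # lag t overlaps [lower, upper]
            kept.append(t)
    return kept


kept = prune_lags({1: 2, 2: 5, 3: 2, 4: 1})
```

With 10 instances, lag 1 covers 0 to 0.2 and lag 4 covers 0.9 to 1.0, so only lags 2 and 3 survive.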
Proposed Approach
for k = 1 to n − 1 (n = number of variables):
    generate candidate patterns of size k+1 from the frequent patterns of size k
    for each candidate pattern:
        compute the Prevalence Index PI(t) at each time lag t
        generate pattern instance distributions for the time lags with PI(t) > threshold
        if the set of prevalent time lags is empty:
            discard the candidate pattern
        else:
            prune the prevalent time lags using the middle quantile of the distribution
            add the candidate pattern with the pruned time lags to the set of frequent patterns of size k+1
end
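The level-wise loop can be sketched in Python with the per-pattern work abstracted into a callback. `pruned_lags(pattern)` is hypothetical: it stands for computing PI(t), keeping lags above the threshold and applying middle-quantile pruning, returning [] when nothing is prevalent. Candidate generation is an Apriori-style join of two size-k patterns that overlap on k − 1 events; the sketch assumes each variable appears at most once and that only contiguous sub-patterns must be frequent, matching the chained (‘total orderedness’) view.

```python
from typing import Callable, Dict, List, Tuple

Pattern = Tuple[str, ...]


def mine_patterns(
    variables: List[str],
    pruned_lags: Callable[[Pattern], List[int]],
) -> Dict[Pattern, List[int]]:
    """Return {pattern: pruned prevalent lags} for all frequent patterns."""
    frequent: Dict[Pattern, List[int]] = {}
    level: List[Pattern] = [(v,) for v in variables]  # size-1 patterns
    for _ in range(1, len(variables)):  # k = 1 .. n-1, candidates of size k+1
        # Join (A, B) with (B, C) into (A, B, C); for k = 1 this yields
        # every ordered pair of distinct variables.
        candidates = {
            p + (q[-1],)
            for p in level
            for q in level
            if p[1:] == q[:-1] and q[-1] not in p
        }
        level = []
        for cand in sorted(candidates):
            lags = pruned_lags(cand)
            if lags:  # non-empty: keep the candidate with its pruned lags
                frequent[cand] = lags
                level.append(cand)
        if not level:  # no frequent patterns left to extend
            break
    return frequent


# Hypothetical callback results standing in for the PI computation.
toy = {("A", "B"): [1, 2], ("B", "C"): [2], ("A", "B", "C"): [3]}
result = mine_patterns(["A", "B", "C"], lambda p: toy.get(p, []))
```

(A, B, C) is generated only because both (A, B) and (B, C) survived; anti-monotonicity of PI is what makes discarding the other pairs safe.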
Properties of PI and the Proposed Approach
The Prevalence Index is anti-monotonic: the prevalence ratio is anti-monotonic, because an event instance participates in a sequence only if it participates in all the subsequences of the pattern; PI(t), as the minimum of these ratios, inherits this property.
The approach is complete: the Prevalence Index is anti-monotonic for each time lag, the Apriori-based candidate generation technique is complete, and the enumeration of event pairs for each spatial join is complete.
The approach is correct: a pattern $A \xrightarrow{T} B$ is correct only if events in A and B occur in a spatial neighborhood, events in B follow events in A with a time lag distribution T, and the pruning approach is correct.
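The anti-monotonicity argument can be checked on a toy case. The sketch below assumes, hypothetically, that instances of A → B → C are formed by chaining a pair from e_AB with a pair from e_BC that share the same B instance, and it ignores time lags. Every A instance in a chain also appears in e_AB, so the prevalence ratio of A can only stay equal or drop as the pattern grows. The e_AB pairs come from the running example; e_BC is invented for illustration.

```python
def participation_ratio(instances, position, n_total):
    """Fraction of the n_total instances of the event type at `position`
    that appear in at least one pattern instance."""
    return len({inst[position] for inst in instances}) / n_total


def join_chain(e_ab, e_bc):
    """Instances of A -> B -> C: chain (a, b) and (b, c) on a shared B."""
    return [(a, b, c) for (a, b) in e_ab for (b2, c) in e_bc if b == b2]


e_ab = [("A1", "B1"), ("A2", "B3"), ("A4", "B6")]
e_bc = [("B1", "C1"), ("B6", "C6")]  # hypothetical
e_abc = join_chain(e_ab, e_bc)
pr_a_ab = participation_ratio(e_ab, 0, 15)    # PR of A in A -> B
pr_a_abc = participation_ratio(e_abc, 0, 15)  # PR of A in A -> B -> C
```

A2 drops out because B3 has no successor in e_BC, so PR_A falls from 3/15 to 2/15.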
Insight into Real-world Datasets and Future Work
Region of interest: peatland forests in Indonesia (tiles h29v08 and h29v09)
Types of datasets to be used and their respective event detection algorithms:
Vegetation Index (EVI): deforestation (V2delta, Gradual Decrease, Segmentation); forest fire (KD6 + ID6)
Land Surface Temperature: increase in annual land surface temperature because of fire
Thermal Anomaly Index
Precipitation
Soil Moisture
Aerosol Information
[Figures: EVI for Forest Fire; EVI for Deforestation; Land Surface Temperature (Day); Thermal Anomaly Index]