Published by Nigel Freeman. Modified over 6 years ago.
1
A Framework for Mining Sequential Patterns from Spatio-Temporal Event Data Sets
Yan Huang, Liqin Zhang, Pusheng Zhang, IEEE Transactions on Knowledge and Data Engineering, 20 (4), 2008.
Group Webpage: (G10)
Group Members: Anuj Karpatne, Vijay Borra
2
Outline
  Motivation
  Basic Concepts
  Problem Statement
  Challenges
  Key Concepts
  Approach
  Validation
  Novelty
  Contributions
  Assumptions and Suggestions
3
Motivation
Earth Science:
  Global Warming → Bug Lifespan → Forest Fire
  Global Warming → Peatland Deforestation → Reduced Soil Moisture → Forest Fire
Epidemiology:
  Transmission of West Nile Disease: Bird → Mosquito → Human Being
Climatology:
  El Niño → Increase in Forest Fires in Indonesia
  La Niña → Decrease in Forest Fires in Indonesia
4
Basic Concepts
Event Instance: <ID, Location, Time, Event Type>
  ID: nominal
  Location: 2D tuple
  Time: t = 0 to T
  Event Type: categorical
Examples:
  <11, {5,5}, t0, Car Accident>
  <23, {5,2}, t2, Traffic Jam>
  <75, (75.50E, 23.30N), May 2009, Deforestation>
  <83, (75.30E, 23.10N), June 2010, Forest Fire>
Event Type Sequence: Event Type 1 → Event Type 2 → Event Type 3
5
Problem Statement
Input:
  Set of Event Types, F = {f1, f2, …, fk}
  Event Database, D = {e1, e2, …, en}, where ei = <IDi, locationi, timei, Event Typei ∈ F>
    e.g. <a1, {x1,y1}, t1, A>, <b3, {x2,y2}, t2, B>
  User-defined Neighborhood Relation: N(e)
  User-defined Threshold
Output:
  Event type sequences S = {s1, s2, …, sl}, where si = {fi(1) → fi(2) → … → fi(m)}
    e.g. <A → B>, <C → D → B>, <B → C>
Objective:
  Minimize computational cost
Constraints:
  The algorithm is correct and complete
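The inputs above can be sketched as a small data structure. This is only an illustration: the slides leave the neighborhood relation N(e) to the user, so the cylindrical neighborhood used here (within `radius` in space, strictly later in time by at most `window`) and the names `Event` and `in_neighborhood` are assumptions, not definitions from Huang et al.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Event:
    """One event instance <ID, location, time, event type>."""
    id: str
    x: float
    y: float
    t: float
    etype: str


def in_neighborhood(e: Event, other: Event, radius: float, window: float) -> bool:
    """Hypothetical N(e): `other` lies within `radius` of e in space and
    occurs strictly after e, at most `window` time units later."""
    close_in_space = hypot(other.x - e.x, other.y - e.y) <= radius
    follows_in_time = 0 < other.t - e.t <= window
    return close_in_space and follows_in_time


# A toy event database D with event types F = {A, B}.
database = [
    Event("a1", 5, 5, 0, "A"),
    Event("b3", 5, 2, 2, "B"),
    Event("b4", 40, 40, 1, "B"),
]

a1 = database[0]
followers = [e.id for e in database if in_neighborhood(a1, e, radius=4, window=3)]
print(followers)
```

Only b3 falls inside the assumed neighborhood of a1; b4 is too far away in space, and a1 does not follow itself.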
6
Challenges
  Developing a scoring mechanism to assess the significance of a given sequential pattern
  Establishing the interpretability of the scoring mechanism using spatial statistics
  Developing an algorithmic design for mining significant patterns
  Dealing with the memory requirements of the algorithm in the presence of a large database of events
7
Key Concepts
Figure: A sample spatio-temporal data set. Densities of events of B in the neighborhoods of events of A, represented by shades of different intensities. (Source: Fig. 2 of Huang et al.)
8
DensityRatio
f.ε = set of events with event type f
densityRatio(A → B) =
(Source: Fig. 2 of Huang et al.)
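A minimal sketch of how such a ratio could be computed, assuming the density ratio is the average density of B events inside the neighborhoods of A events divided by the overall density of B events in the whole space-time volume. The cylindrical neighborhood (`radius`, `window`), the `(x, y, t)` tuple layout, and the function name are assumptions made for illustration. Boundary effects are ignored.

```python
from math import hypot, pi


def density_ratio(a_events, b_events, radius, window, area, duration):
    """Average density of B events in the neighborhoods of A events,
    divided by the overall density of B events in the whole volume.

    Events are (x, y, t) tuples; the neighborhood of an A event is an
    assumed cylinder of the given spatial radius and time window."""
    if not a_events or not b_events:
        return 0.0
    neighborhood_volume = pi * radius ** 2 * window
    total_local_density = 0.0
    for ax, ay, at in a_events:
        count = sum(
            1
            for bx, by, bt in b_events
            if hypot(bx - ax, by - ay) <= radius and 0 < bt - at <= window
        )
        total_local_density += count / neighborhood_volume
    mean_local_density = total_local_density / len(a_events)
    overall_density = len(b_events) / (area * duration)
    return mean_local_density / overall_density


a_events = [(0, 0, 0), (10, 10, 0)]
b_events = [(0, 1, 1), (10, 9, 1), (50, 50, 5)]
ratio = density_ratio(a_events, b_events, radius=2, window=2, area=100 * 100, duration=10)
print(ratio > 1)
```

Under this reading, a ratio well above 1 means B events are denser just after and near A events than they are overall, and a ratio near 1 means A has no visible influence on where B occurs.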
9
Sequence Index
Event Type Sequence: S
Figure: event sets along S: A.ε = {a1, …, a6}, B.ε = {b1, …, b5}, C.ε = {c1, …, c7}, A.ε = {a1, …, a6}
S[1] = A, S[2] = B, S[3] = C, S[4] = A
S[1:3] = {A → B → C}
17
Sequence Index
Event Type Sequence: S
Figure: event sets along S: A.ε = {a1, …, a6}, B.ε = {b1, …, b5}, C.ε = {c1, …, c7}, A.ε = {a1, …, a6}
Legend:
  Belongs to an event sequence
  Does not belong to an event sequence
  Belongs to a Tail Event Set
S[1] = A, S[2] = B, S[3] = C, S[4] = A
S[1:4] = {A → B → C → A}
Properties:
  Antimonotone Property
  Weak Antimonotone Property
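The slides do not spell out the sequence index in text. A small sketch, assuming the sequence index of a length-2 sequence is its density ratio and that each extension takes the minimum of the prefix's index and the density ratio of the new step. The function name and the ratio values below are made up for illustration.

```python
def sequence_index(step_ratios):
    """Sequence index under the assumed recursive definition.

    step_ratios[k] is the density ratio of the step S[1:k+1] -> S[k+2],
    so a sequence of m event types has m - 1 step ratios."""
    if not step_ratios:
        raise ValueError("a sequence needs at least two event types")
    if len(step_ratios) == 1:
        return step_ratios[0]
    return min(sequence_index(step_ratios[:-1]), step_ratios[-1])


# Made-up step ratios for A -> B, (A -> B) -> C, (A -> B -> C) -> A.
ratios = [3.0, 1.8, 2.5]
prefix_indexes = [sequence_index(ratios[:k]) for k in range(1, len(ratios) + 1)]
print(prefix_indexes)  # [3.0, 1.8, 1.8]
```

Under this assumption the index can never increase as a sequence is extended, so a prefix that falls below the threshold cannot be rescued by a later step. That prefix-wise behavior is what makes depth-first pruning possible.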
18
STS-Miner
Sequential Pattern Tree Node:
  S[k]: Event Type
  Tail Event Set
  DensityRatio(S[1:k-1] → S[k])
Algorithm:
  Start with empty sequence
  Do Depth-first expand (S[1:k])
    For each event type S[k+1]
      Generate candidate pattern S[1:k+1]
      Compute new DensityRatio and Tail Event Set using follow join
      If (DensityRatio > threshold)
        Expand pattern tree by adding node S[k+1]
        Depth-first expand (S[1:k+1])
      else
        Mark S[1:k+1] as terminal node
    end
(Source: Fig. 3 of Huang et al.)
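The depth-first expansion above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation. It assumes a cylindrical neighborhood (`radius`, `window`), a density ratio computed as mean local density over overall density, a sequence index taken as the minimum step ratio, and a hypothetical `max_len` cap as a safety limit. Events are `(x, y, t, type)` tuples.

```python
from math import hypot, pi


def follow_join(tail, candidates, radius, window):
    """For each tail event, find candidate events in its (assumed) neighborhood.
    Returns the new tail event set and the follower count per tail event."""
    new_tail, counts = set(), []
    for x, y, t, _ in tail:
        followers = [
            c for c in candidates
            if hypot(c[0] - x, c[1] - y) <= radius and 0 < c[2] - t <= window
        ]
        counts.append(len(followers))
        new_tail.update(followers)
    return new_tail, counts


def sts_miner(events, radius, window, volume, threshold, max_len=4):
    """Depth-first pattern growth; returns {pattern tuple: sequence index}."""
    by_type = {}
    for e in events:
        by_type.setdefault(e[3], []).append(e)
    neighborhood_volume = pi * radius ** 2 * window
    patterns = {}

    def expand(pattern, tail, index):
        if len(pattern) >= max_len:
            return
        for etype, candidates in sorted(by_type.items()):
            new_tail, counts = follow_join(tail, candidates, radius, window)
            local_density = sum(counts) / len(counts) / neighborhood_volume
            ratio = local_density / (len(candidates) / volume)
            if ratio > threshold and new_tail:
                new_index = ratio if index is None else min(index, ratio)
                patterns[pattern + (etype,)] = new_index
                expand(pattern + (etype,), new_tail, new_index)
            # Otherwise the candidate is a terminal node and is not expanded.

    for etype, typed_events in sorted(by_type.items()):
        expand((etype,), set(typed_events), None)
    return patterns


events = [
    (10, 10, 0, "A"), (50, 50, 0, "A"),
    (10, 11, 1, "B"), (50, 51, 1, "B"),
    (11, 11, 2, "C"), (51, 51, 2, "C"),
]
patterns = sts_miner(events, radius=2, window=1.5, volume=100 * 100 * 10, threshold=1.0)
print(sorted(patterns))
```

On this toy data the sketch reports A → B, A → B → C, and B → C. A → C is not reported because the C events fall outside the assumed time window of the A events.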
19
Algorithm Trace
(Source: Table 1 of Huang et al.)
20
Slicing-STS-Miner
  Used when in-memory operations can't be performed
  Uses the uni-directional property of time to develop a temporal slicing-based algorithm
  Considers one temporal slice at a time, in piecemeal fashion
Three phases of the algorithm:
  Phase 1: Hashing
  Phase 2: Mining and Merging
  Phase 3: Pruning
Hashing:
  Divide the time dimension into overlapping slices
(Source: Fig. 5 of Huang et al.)
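The hashing phase can be illustrated with a short sketch that assigns events to overlapping time slices. The function name, the `(time, payload)` event layout, and the `slice_len` and `overlap` parameters are assumptions. The slides do not say how the slice width or overlap is chosen, though the overlap would presumably need to cover the temporal extent of the neighborhood.

```python
def overlapping_slices(events, slice_len, overlap):
    """Split (time, payload) events into half-open time slices [start, end)
    where consecutive slices share `overlap` time units."""
    if not 0 <= overlap < slice_len:
        raise ValueError("overlap must satisfy 0 <= overlap < slice_len")
    if not events:
        return []
    t_max = max(t for t, _ in events)
    start = min(t for t, _ in events)
    step = slice_len - overlap
    slices = []
    while True:
        end = start + slice_len
        members = [e for e in events if start <= e[0] < end]
        slices.append((start, end, members))
        if end > t_max:  # the last slice already covers the latest event
            break
        start += step
    return slices


events = [(t, f"e{t}") for t in range(10)]
slices = overlapping_slices(events, slice_len=4, overlap=1)
for start, end, members in slices:
    print(start, end, [t for t, _ in members])
```

Events at t = 3 and t = 6 land in two consecutive slices each. This is the duplication in overlapping areas that the mining and merging phase then has to handle.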
21
Slicing-STS-Miner
Mining and merging:
  Process slices in time-increasing order
  Keep updating the pattern tree
  Challenges faced in piecemeal processing:
    Duplicate sequences in overlapping areas of two consecutive slices
    Sequences broken by boundaries of consecutive slices
  Crossing Tail Event Set: e ∈ Crossing Tail Event Set if e ∈ Tail Event Set, or
    e is located in the overlapping region of two consecutive slices
    e' → e, where e' is in slice i, e is in slice i+1, and e', e are not in the overlapping area
  Modified Depth-First-Expand:
    Maintain the Crossing Tail Event Set as a queue, CrossingQ
    Expand nodes in CrossingQ first, until it is empty, before moving to a new slice
Pruning:
  Post-processing step, since it can only be applied once all the slices have been processed
22
Cost Analysis
Notations:
  Fi : Average size of fi.ε ; 0 < < 1; 0 < < 1
  pn : Number of maximal sequential patterns
  ps : Mean length of maximal sequential patterns
  costloadD : Cost of loading data into main memory
  nSTS / nSlicing-STS : Number of times STS / Slicing-STS loads the entire data into main memory
In-Memory Processing:
  costSTS = O(Fi × pn × 2 ps × )
  costSlicing-STS = O(Fi × p´n × 2 p´s × ), where p´n > pn and p´s > ps
For Large Data Sets:
  costSTS = costSTS + nSTS × costloadD
  costSlicing-STS = costSlicing-STS + nSlicing-STS × costloadD
23
Results
Results on Synthetic Data:
  Effect of Sequence Index Thresholds
  Effect of the Average Number of Event Sequences for Each Pattern
  Effect of the Average Pattern Size
  Effect of the Number of Patterns
  Effect of Slicing Size
Real World Applications:
  NPP
  Temperature
  Precipitation
  Solar Radiation
  Evaporation
24
Contributions
Introduction of 2 novel interest measures:
  Density ratio (for sequences of size at most 2)
  Sequence index (otherwise)
Proposed algorithmic designs:
  STS-Miner: A depth-first-expansion-based mining method that exploits the weak antimonotone property of the sequence index.
  Slicing-STS-Miner: Uses temporal slicing to partition the data set into overlapping slices when the number of events is too large to be processed in memory.
25
Novelty
Related Work:
  Sequential pattern mining in market-basket data analysis:
    Events are discrete and considered as transactions in time
    Example data sets: Web log click streams, DNA sequences, and medical treatments
    Limitation: 'Transactionization' is not suited to spatio-temporal data, as space and time are continuous.
  Mining trajectory patterns in spatio-temporal data:
    Trajectory data of different moving objects reveal insights into the underlying travelling patterns of the objects.
    Limitation: The same object has to be tracked at different time instances to obtain trajectory data. Trajectory analysis can only be applied if the trajectories have been provided a priori.
26
Assumptions
  Events are categorical and instantaneous
  Events occur as totally ordered sequences (chains)
  Neighborhood Definition:
    Contiguous
    Discrete
  DensityRatio = 1 implies conditional independence
  Statistical interpretability of DensityRatio is assumed
27
Suggestions
  Continuous and interval-based events
  Graphical models to address partial ordering of event types
  Improvements in neighborhood definition:
    Real-valued, based on spatio-temporal closeness
    Incorporating cyclicity (non-contiguous nature) in neighborhood functions using transformations such as basis functions or kernel functions
    Dynamically expanding or contracting neighborhoods
    Incorporating prior knowledge of influences between event type pairs
  Monte Carlo simulations for interpretability of DensityRatio = 1
  Clearer space and time complexity analysis