Published by Earl McKenzie. Modified over 9 years ago.
1
Cascading spatio-temporal pattern discovery: A summary of results
Pradeep Mohan¹, Shashi Shekhar¹, James A. Shine², James P. Rogers²
¹University of Minnesota, Twin-Cities, {mohan,shekhar}@cs.umn.edu
²Engineering Research and Development Center, Alexandria, VA, {James.A.Shine, James.P.Rogers.II}@usace.army.mil
2
Outline
Introduction: Motivation, Problem Statement, Related Work, Contributions
Interest Measure
CSTP Miner Algorithm
Evaluation and Case Study
Conclusion and Future Work
3
Motivation: Public Safety
Cascading spatio-temporal pattern (CSTP): a partially ordered subset of ST event types whose instances are located together in space and occur in stages over time.
Stages: Bar Closing (B), Assault (A), Drunk Driving (C); also hurricanes, climate change, etc.
Other applications: climate change, epidemiology, evacuation planning.
[Figure: instances B.1-B.2, A.1-A.4 and C.1-C.4 shown at time steps T1, T2, T3 and in the aggregate (T1, T2, T3).]
4
Problem Definition
Input: a) ST framework, b) directed ST neighbor relation R, c) interest measure threshold.
Output: a set of CSTPs with interestingness >= threshold.
Objective: minimize computation costs while discovering statistically meaningful CSTPs.
Constraints: correctness and completeness.
Example: ST join (R) with R = {0.5 miles, 2 min.} and threshold = 0.5, over event types B, A and C.
[Figure: aggregate (T1, T2, T3) of instances B.1-B.2, A.1-A.4 and C.1-C.4.]
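The slide gives R = {0.5 miles, 2 min.} but does not spell out how the directed relation is tested. A minimal sketch, assuming planar coordinates in miles, time in minutes, and a direction that runs from the earlier event to the later one (the `Event` class and `is_st_neighbor` are hypothetical names, not from the slides):

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Event:
    etype: str  # event type, e.g. "B" for bar closing
    x: float    # assumed planar coordinate, in miles
    y: float    # assumed planar coordinate, in miles
    t: float    # assumed time, in minutes


def is_st_neighbor(a: Event, b: Event, max_dist: float = 0.5, max_dt: float = 2.0) -> bool:
    """True if b follows a within max_dt minutes and lies within max_dist miles of a."""
    dt = b.t - a.t
    # Directed: only later events can be neighbors of earlier ones (assumption).
    return 0 < dt <= max_dt and hypot(b.x - a.x, b.y - a.y) <= max_dist


bar = Event("B", 0.0, 0.0, 0.0)
assault = Event("A", 0.3, 0.3, 1.5)
print(is_st_neighbor(bar, assault), is_st_neighbor(assault, bar))
```

Because the relation is directed, swapping the arguments changes the answer, which is what lets a pattern encode "bar closing, then assault" rather than plain co-location.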
5
Challenges and Contributions
Challenges:
Space and time are continuous: many overlapping ST neighborhoods; neighborhood enumeration is computationally challenging.
Conflicting requirements, e.g., statistical interpretation vs. computational scalability.
Exponential candidate space, e.g., candidate CSTPs are exponential in the number of event types.
Contributions:
Interest measures: statistical interpretation, computational structure.
CSTP Miner algorithm: filtering strategies.
Evaluation: experimental evaluation, case study.
6
Limitations of Related Work: ST Data Mining
Limitations:
[ST Co-occurrence] Treats space and time independently. Absence of partial order.
[ST Sequence] Does not account for multiply connected (e.g., nonlinear) patterns. Misses non-linear semantics.
No ST statistical interpretation.
Related work                    ST Sequences        ST Subsets
Partial order                   √                   X
Multiply connected              X                   √
Multiple patterns               √                   √
ST statistical interpretation   X (only spatial)    X
7
Interest Measures
Cascade Participation Ratio (CPR): the conditional probability of observing an instance of the CSTP, having seen an instance of A.
Cascade Participation Index (CPI): a lower bound on the conditional probability of observing an instance of the CSTP, having seen an instance of A, B or C.
[Figure: pattern over B, A, C and the aggregate (T1, T2, T3) of instances B.1-B.2, A.1-A.4 and C.1-C.4.]
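The two measures can be sketched as follows. This assumes that CPR for an event type is the fraction of its instances that take part in at least one instance of the pattern, and that CPI is the minimum CPR over the pattern's event types, which matches the "lower bound" wording on the slide. The function names and the sample pattern instances are hypothetical.

```python
def cascade_participation_ratio(pattern_instances, event_type, totals):
    """Fraction of instances of event_type that appear in some pattern instance."""
    participating = {inst[event_type] for inst in pattern_instances}
    return len(participating) / totals[event_type]


def cascade_participation_index(pattern_instances, totals):
    """Minimum CPR over the event types of the pattern (assumed definition)."""
    if not pattern_instances:
        return 0.0
    types = pattern_instances[0].keys()
    return min(cascade_participation_ratio(pattern_instances, t, totals) for t in types)


# Hypothetical data: 2 bar closings, 4 assaults, 4 drunk-driving events in total.
totals = {"B": 2, "A": 4, "C": 4}
instances = [
    {"B": "B.1", "A": "A.1", "C": "C.1"},
    {"B": "B.1", "A": "A.3", "C": "C.2"},
]
print(cascade_participation_ratio(instances, "B", totals))
print(cascade_participation_index(instances, totals))
```

A candidate would be reported as prevalent when its CPI meets the user-supplied threshold from the problem definition.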
8
Interest Measures: Statistical Interpretation
Spatial statistics: ST K-function (Diggle et al. 1995).
The Cascade Participation Index (CPI) is an upper bound to the ST K-function.
Example (three configurations plotted on X, Y and time axes):
Measure         Config. 1   Config. 2    Config. 3
ST K-function   2/9         3/9 = 1/3    9/9 = 1
CPI             2/3         1            1
9
CSTP Miner Algorithm: Overview
Steps in the flow diagram: Upper Bound Filter, Candidate Generation*, Multi-resolution Filter, Cycle Checking, Compute CPI, Prune CSTP, Prevalent CSTPs.
Diagram labels: R, CPI threshold, filtering choice, cycles removed, pruned CSTPs.
*Using the same strategy as [Kuramochi and Karypis '04].
CPI computation involves an ST join: sort-merge over time, nested loop over space. This is the computational bottleneck.
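One way to read "sort-merge over time, nested loop over space" is sketched below: sort events by time, scan forward only while the time gap stays inside the neighborhood, and test spatial distance inside that window. This is an illustration under that reading, not the authors' Matlab implementation; `st_join` and the sample events are hypothetical.

```python
from math import hypot


def st_join(events, max_dist, max_dt):
    """Return directed pairs (earlier_id, later_id) that satisfy the ST neighbor relation.

    events: iterable of (event_id, x, y, t) tuples.
    """
    ordered = sorted(events, key=lambda e: e[3])  # order by time
    pairs = []
    for i, (id_a, xa, ya, ta) in enumerate(ordered):
        j = i + 1
        # Time window: stop as soon as the gap exceeds max_dt.
        while j < len(ordered) and ordered[j][3] - ta <= max_dt:
            id_b, xb, yb, tb = ordered[j]
            # Spatial check inside the window (the nested loop over space).
            if tb > ta and hypot(xb - xa, yb - ya) <= max_dist:
                pairs.append((id_a, id_b))
            j += 1
    return pairs


sample = [("B.1", 0.0, 0.0, 0.0), ("A.1", 0.2, 0.1, 1.0),
          ("C.1", 0.3, 0.2, 2.5), ("A.2", 5.0, 5.0, 1.0)]
print(st_join(sample, max_dist=0.5, max_dt=2.0))
```

The time sort bounds how far each scan runs, but every event in the window still needs a distance test, which is consistent with the slide calling the join the bottleneck.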
10
Filtering strategies
Enhance savings: filter non-prevalent CSTPs before CPI computation.
Before candidate generation: upper bound (UB) filter. Key idea: CPI has an anti-monotone upper bound.
After candidate generation: multi-resolution ST (MST) filter. Key idea: there exists a low-dimensional embedding in space and time. Overestimate CPI by coarsening the ST dataset; if overestimate(CPI) < threshold, the candidate is pruned.
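The coarsening idea behind the MST filter can be illustrated with a grid. The sketch assumes the grid cell size equals the neighborhood size (d miles in space, t_res time stamps in time); under that assumption any pair of true ST neighbors falls into the same or adjacent cells, so counting at cell level can only overestimate participation. The function names are hypothetical and the slides do not confirm that the filter is implemented this way.

```python
from math import floor


def coarse_cell(x, y, t, d, t_res):
    """Map an event to its (x, y, t) grid cell."""
    return (floor(x / d), floor(y / d), floor(t / t_res))


def may_be_neighbors(cell_a, cell_b):
    """Coarse, directed test: True whenever events in the two cells could be ST neighbors.

    Never returns False for a true neighbor pair when cell size >= neighborhood size.
    """
    (ax, ay, at), (bx, by, bt) = cell_a, cell_b
    return abs(bx - ax) <= 1 and abs(by - ay) <= 1 and 0 <= bt - at <= 1


a = coarse_cell(0.45, 0.45, 1.9, d=0.5, t_res=2.0)
b = coarse_cell(0.60, 0.55, 2.5, d=0.5, t_res=2.0)
far = coarse_cell(3.00, 0.00, 2.5, d=0.5, t_res=2.0)
print(may_be_neighbors(a, b), may_be_neighbors(a, far))
```

A candidate whose CPI computed on the coarse cells already falls below the threshold can be dropped without running the exact ST join.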
11
Evaluation
Real dataset: City of Lincoln, Nebraska, year 2007.
Platform: Matlab 7.0, X5355 2.66 GHz with 16 GB main memory, Linux OS.
Events within an interval of 10 minutes were assigned the same time stamp.
Goals:
a. What is the effect of the number of event types on execution time?
b. What is the effect of the CPI threshold?
c. Other experiments: effect of neighborhood size, dataset size, grid parameters.
12
Experimental Analysis
Questions:
a. What is the effect of the number of event types?
b. What is the effect of the CPI threshold?
Fixed parameters for (a): CPI = 0.2; time neighborhood = 1750 time stamps.
Fixed parameters for (b): number of event types = 5; time neighborhood = 1750 time stamps.
Trends:
a. Pattern size is exponential in the number of event types.
b. MST filter enhances computational savings.
13
Lincoln, NE crime dataset: Case study
Questions: Is bar closing a generator for crime-related CSTPs? Are there other generators (e.g., Saturday nights)?
Observation: crime peaks around bar closing.
[Figure: bar locations in Lincoln, NE.]
K-S test: Saturday night is significantly different from normal-day bar closing (P-value = 1.249 x 10^-7, K = 0.41).
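For reference, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of the two samples. The sketch below computes only that statistic (not the P-value) in plain Python; `ks_statistic` is a hypothetical helper and the sample values are made up, not taken from the Lincoln dataset.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs are step functions, so the maximum gap
    # is attained at one of the observed values.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


# Made-up samples, e.g. incident counts per night for two populations.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A statistic near 0 means the two populations look alike; the slide reports K = 0.41 for Saturday night against normal-day bar closing.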
14
Conclusions
Cascading ST patterns are useful in applications like public safety and climate change science.
Statistically interpretable interest measure.
Complementary filtering strategies.
ST multi-resolution filtering enhances computational performance.
Future work
New interest measure alternatives.
Qualitative comparison with graphical models (e.g., dynamic Bayes nets, hidden Markov models).
15
Acknowledgment
Members of the Spatial Database and Data Mining Research Group, University of Minnesota, Twin-Cities.
This work was supported by grants from the US Army and NSF.
Thank you for your questions, comments and patience!
16
Crime Report Schema Alignment
University of Texas at Dallas
17
Overview
Two different tables from two different data sources. Our goal is to align attributes between the two tables.
Washington DC Incidents Reported (header "NIDCCNence…" is truncated in the source; NID and CCN digits run together):
NID CCN        …ence…     Long          Latitude
376857139 8    Arson      38.87010181   -76.9822237
378751911 0    Theft      38.88852      -76.9370033
377951909 7    Burglary   38.95143      -77.0238048
Lincoln_Nebraska Incidents Reported:
INC_    Time_   Date_        …   Team_Area
45111   21:24   11-17-2007   …   Northwest Team
41000   18:22   12-2-2007    …   Center Team
Lincoln crime codes:
Code    Crime
45111   Arson
41000   Auto-theft
41000   Unauthorized use of motor vehicle
18
Dataset ER Diagram
[ER diagrams for the Washington DC and Lincoln datasets; labels include Crime_type, Bars, Football Match, Incident_2007_reported, Crime and the relationship "located".]
Heterogeneity: Crime is an attribute in the Washington DC dataset, while it is a table in the Lincoln dataset.
19
Schema Alignment
Syntactic matching: keyword-based matching on crime name.
    Lincoln.CrimeType.IncidentClassification = "Robbery"
    Washington.Crime = "Robbery"
Semantic matching: semantically relevant.
A. Specialization vs. generalization
    Lincoln.CrimeType.IncidentClassification = "Death"
    Washington.Crime = "Homicide"
    Death is a superclass of Homicide.
B. Finding semantic matching
    I. Definition of crimes: using shared words to determine similarity.
    II. Relevant words: find relevant words using K-medoid clustering and Normalized Google Distance (NGD)*.
* Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, "Geographically-Typed Semantic Schema Matching," in Proc. of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009. Extended version submitted to Journal of Web Semantics, Springer.
20
I. Finding Semantic Matching using Definition of Crime
Finding shared words to determine similarity.
Larceny-Theft: Unlawful taking, carrying, leading, or riding away of property from the possession or constructive possession of another; attempts to do these acts are included in the definition. [1]
Theft: Illegal taking of another person's property without that person's freely-given consent. [2]
Assault: An act that causes another to apprehend an immediate harmful contact. [3]
Red keywords are common words in crime definitions, while blue keywords are not common.
[1] http://www.fbi.gov/ucr/cius_04/offenses_reported/property_crime/larceny-theft.html
[2] http://en.wikipedia.org/wiki/Theft
[3] http://en.wikipedia.org/wiki/Assult
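The slide does not say how shared words are turned into a score. A minimal sketch, assuming Jaccard overlap of the definition words after dropping a small, hand-picked stop-word list (both the measure and the list are assumptions for illustration):

```python
import re

# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"a", "an", "the", "of", "or", "to", "from", "that",
              "in", "are", "these", "do", "s", "without"}


def keywords(definition):
    """Lower-case content words of a crime definition."""
    return set(re.findall(r"[a-z]+", definition.lower())) - STOP_WORDS


def shared_word_similarity(def_a, def_b):
    """Jaccard overlap of the two keyword sets (assumed similarity measure)."""
    a, b = keywords(def_a), keywords(def_b)
    return len(a & b) / len(a | b) if a | b else 0.0


larceny = ("Unlawful taking, carrying, leading, or riding away of property from the "
           "possession or constructive possession of another; attempts to do these "
           "acts are included in the definition.")
theft = ("Illegal taking of another person's property without that person's "
         "freely-given consent.")
assault = "An act that causes another to apprehend an immediate harmful contact."

print(sorted(keywords(larceny) & keywords(theft)))
print(shared_word_similarity(larceny, theft) > shared_word_similarity(theft, assault))
```

With these definitions, Larceny-Theft and Theft share "taking", "property" and "another", while Theft and Assault share only "another", so the first pair scores higher.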
21
II. K-medoid + NGD Instance Similarity
Step 1: Extract distinct keywords from the compared columns C1 and C2. Keywords extracted from columns = {Arson, Theft, Stolen, …}.
Step 2: Group distinct keywords together into semantic clusters over C1 U C2.
Step 3: Calculate similarity: Similarity = H(C|T) / H(C).
Column 1 (Washington DC): "Arson", "Theft", "Burglary", …
Column 2 (Lincoln): "Arson", "Theft", "Northwest", …
Washington DC:
Offence    Long          Latitude
Arson      38.87010181   -76.9822237
Theft      38.88852      -76.9370033
Burglary   38.95143      -77.0238048
Lincoln:
INC_    Team_Area
Arson   Northwest Team
Theft   Center Team
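The ratio H(C|T) / H(C) in Step 3 can be computed from (column, cluster) pairs. The slide does not define C and T; the sketch assumes C is the column a keyword came from and T is the semantic cluster it was assigned to in Step 2. The function names and the sample assignments are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    counts = [c for c in counts if c]
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts)


def ebd_similarity(pairs):
    """H(C|T) / H(C) for a list of (column, cluster) pairs."""
    h_c = entropy(Counter(col for col, _ in pairs).values())
    if h_c == 0:
        return 0.0  # only one column present, ratio undefined
    by_cluster = {}
    for col, cluster in pairs:
        by_cluster.setdefault(cluster, Counter())[col] += 1
    n = len(pairs)
    # H(C|T) = sum over clusters of p(cluster) * H(C | cluster)
    h_c_given_t = sum(sum(cnt.values()) / n * entropy(cnt.values())
                      for cnt in by_cluster.values())
    return h_c_given_t / h_c


mixed = [("C1", "crime"), ("C2", "crime"), ("C1", "crime"), ("C2", "crime")]
split = [("C1", "crime"), ("C1", "crime"), ("C2", "place"), ("C2", "place")]
print(ebd_similarity(mixed), ebd_similarity(split))
```

Under this reading the ratio is 1 when every cluster mixes keywords from both columns evenly, and 0 when each cluster draws from a single column.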
22
Lincoln, NE crime dataset: Case study
Pop I            Pop II           KS       P-value         α = 0.05   α = 0.2
Sat Night        All Year         0.4187   1.249 x 10^-7   Yes        Yes
Football Night   All Year         0.3400   0.1067          No         Yes
Sat Night        Football Night   0.1987   0.7899          No         No
23
Limitations of Related Work: Traditional Data Mining
Related work: transaction graph mining; sequential pattern mining.
Properties: transaction is a core concept; support as an interest measure.
Limitations:
Transactionization of a continuous framework → non-empty cutsets.
Support (frequency) leads to double counting of overlapping edges.
[Figure: instances A.1-A.4, B.1-B.2 and C.1-C.4 in X-Y-T space, cut by a space partition and a time partition.]
24
Graph Mining Limitations
[Figure: input dataset space (G_I) with instances A.1, A.2, B.1, C.2, C.3, D.1, D.2; output frequent pattern space (G_F) with a pattern over A, B, C and other patterns.]
Embeddings shown: (A.1, B.1, C.2), (A.1, B.1, C.3), (A.2, B.1, C.3).
Overlap graph of embeddings E11, E12, E13, with shared edges (A.1, B.1) and (C.2, B.1).
Maximum independent set (MIS) choices: Choice 1 = {E11, E13} = 2; Choice 2 = {E12} = 1.
Properties:
a. MIS choices are non-unique.
b. No statistical interpretation.
c. Exact solutions are NP-complete; approximate solutions need not be complete.
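The non-uniqueness in property (a) can be reproduced with a brute-force enumeration of maximal independent sets. The sketch assumes that E12 overlaps both E11 and E13 while E11 and E13 do not overlap each other, which is the structure implied by the two choices on the slide; the function name is hypothetical and the exhaustive search is only practical for tiny graphs.

```python
from itertools import combinations


def maximal_independent_sets(nodes, edges):
    """All independent sets that cannot be extended, found by exhaustive search."""
    adjacent = {frozenset(e) for e in edges}

    def independent(subset):
        return all(frozenset(p) not in adjacent for p in combinations(subset, 2))

    result = []
    # Visit larger subsets first, so any non-maximal set is a proper
    # subset of something already collected.
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            candidate = set(subset)
            if independent(candidate) and not any(candidate < kept for kept in result):
                result.append(candidate)
    return result


embeddings = ["E11", "E12", "E13"]
overlaps = [("E11", "E12"), ("E12", "E13")]  # assumed overlap structure
choices = maximal_independent_sets(embeddings, overlaps)
print(choices, [len(c) for c in choices])
```

The two sets give supports of 2 and 1 for the same pattern, so a procedure that stops at the first non-extendable set it finds can report either value.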
25
Related Work
[Taxonomy figure: Data Mining divides into Traditional DM and Spatio-temporal DM; node labels include Sequences, Graphs, Subsets, Association, Transaction, Single Graph, Single Pattern, Multiple Patterns and Our Work.]
Limitations of Related Work
Related work                    ST Sequences       ST Subsets   Process Mining   MIS Graph Patterns
Partial order                   X                  X            √                X
Multiply connected              X                  √            √                X (undirected graphs)
Multiple patterns               √                  √            X                √
ST statistical interpretation   X (only spatial)   X            X                X
26
More experimental analysis
More questions:
What is the effect of temporal neighborhood size?
What is the effect of spatial neighborhood size?
What is the effect of dataset size?
Fixed parameters: a. CPI = 0.2; b. time neighborhood = 1750 time stamps; c. spatial neighborhood = 7 miles; d. dataset size = 4083 instances; e. event types = 5.
Trends:
a. MST filter enhances computational savings.
b. Performance is sensitive to time neighborhood size.
c. Performance is not very sensitive to spatial neighborhood size.
27
Sensitivity of MST filter to grid parameters
What is the sensitivity to the MST grid parameter d? Fixed parameter: grid parameter t = 2000 time stamps.
What is the sensitivity to the MST grid parameter t? Fixed parameter: grid parameter d = 7 miles.
Trends:
a. The MST filter is more sensitive to the temporal parameter (t) than to the spatial parameter (d).