Published by Chrystal Chambers. Modified over 9 years ago.
1
Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques
Pradeep Mohan*, Department of Computer Science, University of Minnesota, Twin-Cities
Advisor: Prof. Shashi Shekhar
Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava
*Contact: mohan@cs.umn.edu
2
Biography
Education
- Ph.D. student, Department of Computer Science and Engineering, University of Minnesota, MN, 2007 to present.
- B.E., Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India, 2003-2007.
Major projects during Ph.D.
- US DoJ/NIJ: Mapping and Analysis for Public Safety
  - CrimeStat.NET Libraries 1.0: modularization of CrimeStat, a tool for the analysis of crime incidents.
  - Performance tuning of spatial analysis routines in CrimeStat.
  - CrimeStat 3.2a - 3.3: addition of new modules for spatial analysis.
- US DOD/ERDC/TEC: Cascade models for multi-scale pattern discovery
  - Designed new interest measures and formulated pattern mining algorithms for identifying patterns from large crime report datasets.
3
Thesis Related Publications
Cascading spatio-temporal pattern discovery (Chapter 2)
- P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery: A summary of results. In Proc. of the 10th SIAM International Conference on Data Mining, 2010 (SDM 2010, full paper acceptance rate 20%).
- P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery. IEEE Transactions on Knowledge and Data Engineering (TKDE). (Accepted regular paper, in press, ~20% acceptance rate.)
Regional co-location pattern discovery (Chapter 3)
- P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers, Z. Jiang, N. Wayant. A spatial neighborhood graph based approach to regional co-location pattern discovery: summary of results. In Proc. of the 19th ACM SIGSPATIAL International Conference on Advances in GIS, 2011 (ACM SIGSPATIAL 2011, full paper acceptance rate 23%).
Crime pattern analysis application (Chapter 4)
- S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou. Crime pattern analysis: A spatial frequent pattern mining approach. In M. Leitner (Ed.), Crime Modeling and Mapping Using Geospatial Technologies, Springer (accepted with revisions).
4
Other Publications
Spatio-temporal data analysis
- X. Zhou, S. Shekhar, P. Mohan, S. Leiss, P. Snyder. Discovering interesting sub-paths in spatiotemporal datasets. In Proc. of the 19th ACM SIGSPATIAL International Conference on Advances in GIS, 2011 (ACM SIGSPATIAL 2011, full paper acceptance rate 23%).
Spatial data analysis
- P. Mohan, R.E. Wilson, S. Shekhar, B. George, N. Levine, M. Celik. Should SDBMS support a join index?: a case study from CrimeStat. In Proc. of the 16th ACM SIGSPATIAL International Conference on Advances in GIS, 2008 (ACM SIGSPATIAL 2008, full paper acceptance rate 19%).
- P. Mohan, X. Zhou, S. Shekhar. Quantifying resolution sensitivity of spatial autocorrelation: A resolution correlogram approach. In Proc. of the 7th International Conference on Geographic Information Science, 2012 (GIScience 2012, full paper).
- S. Shekhar, M.R. Evans, J.M. Kang, P. Mohan. Identifying patterns in spatial information: A survey of methods. WIREs Data Mining and Knowledge Discovery, Wiley Interdisciplinary Reviews, John Wiley and Sons, Inc., 2011 (accepted, in press).
5
Outline
- Introduction
  - Motivation
- Problem Statement
- Our Approach
- Future Work
6
Motivation: Public Safety
- Crime generators and attractors: identifying events (e.g., bar closing, football games) that lead to increased crime.
  Question: What / where are the frequent crime generators?
- Law enforcement planning: identifying frequent crime hotspots.
  Question: Where are the crime hotspots?
- Predictive policing: predicting crime events (e.g., predict the next location of an offense, forecast crime levels around conventions).
  Question: What are the crime levels 1 hour after a football game within a radius of 1 mile?
[Figure: Predicting the next location of burglary. Courtesy: www.startribune.com]
Other applications: Epidemiology. Courtesy: https://www.llnl.gov/str/September02/Hall.html
7
Scientific Domain: Environmental Criminology
- Crime pattern theory. Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16
- Routine activity theory and the crime triangle. Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8
- Crime event: a motivated offender, a vulnerable victim (available at an appropriate location and time), and the absence of a capable guardian.
- Crime generators: settings where offenders and targets come together in time and place, such as large gatherings (e.g., bars, football games).
- Crime attractors: places offering many criminal opportunities, to which offenders may relocate (e.g., drug areas).
8
Outline
- Introduction
- Problem Statement
  - Spatio-temporal frequent pattern mining problem
  - Challenges
- Our Approach
- Future Work
9
Spatio-temporal frequent pattern mining problem
Given:
- Spatial / spatio-temporal framework.
- Crime reports with type, location and / or time.
- Spatial features of interest (e.g., bars).
- Interest measure threshold (P_θ).
- Spatial / spatio-temporal neighbor relation.
Find: Frequent patterns with interestingness >= P_θ.
Objective: Minimize computation costs.
Constraints:
- Correctness and completeness.
- Statistical interpretation (i.e., account for autocorrelation or heterogeneity).
10
Illustration: Output
- Cascading ST patterns (inputs: spatial and temporal neighborhood of 0.5 miles and 20 mins; threshold 0.5).
- Regional co-location patterns (inputs: spatial neighborhood of 1 mile; threshold 0.25).
[Figure: events at Time T1, Time T2 > T1 and Time T3 > T2, plus Aggregate(T1, T2, T3); event types Assault (A), Bar Closing (B), Drunk Driving (C); resulting pattern CSTP P1 over B, A and C.]
11
Challenges
- Spatio-temporal semantics: continuity of space / time; partial order.
- Conflicting requirements: statistical interpretation versus computational scalability.
- Computational cost: exponential set of candidate patterns; # patterns = exponential(# event types).
[Figure: event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4 at Time T1, Time T2 > T1 and Time T3 > T2, plus Aggregate(T1, T2, T3); annotations: "Time partitioning misses relationships", "Space partitioning misses relationships".]
[Figure: lattice of candidate patterns over A, B and C, rooted at {Null}.]
12
Our Contributions
- New spatio-temporal frequent pattern families, e.g., cascading ST patterns and regional co-location patterns.
- Novel interest measures that guarantee statistical interpretation and are computable in polynomial time.
- Scalable algorithms based on properties of spatio-temporal data and interest measures.
- Experimental evaluation using synthetic and real crime datasets.
13
Outline
- Introduction
- Problem Statement
- Our Approach
  - Big Picture
  - Cascading Spatio-temporal pattern discovery
  - Other Frequent Pattern Families
- Future Work
14
Spatio-temporal frequent pattern mining (SFPM)
Process of discovering interesting, useful and non-trivial patterns from spatio-temporal data.

Taxonomy of spatio-temporal frequent patterns (X: unexplored)

| Dimension              | Category          | Input data: Spatial           | Input data: Spatio-temporal (ST) |
| Pattern semantics      | Unordered         | Co-location patterns          | ST co-occurrences                |
| Pattern semantics      | Totally ordered   | X                             | ST sequences                     |
| Pattern semantics      | Partially ordered | X                             | Cascading ST patterns            |
| Statistical foundation | Autocorrelation   | Co-location patterns          | Cascading ST patterns            |
| Statistical foundation | Heterogeneity     | Regional co-location patterns | X                                |

The slide marks part of this taxonomy as "Today's Focus".
15
Cascading ST pattern (CSTP)
Input: Crime reports with location and time.
Output: CSTP
- Partially ordered subsets of ST event types.
- Located together in space.
- Occur in stages over time.
[Figure: events at Time T1, Time T2 > T1 and Time T3 > T2, plus Aggregate(T1, T2, T3); event types Assault (A), Bar Closing (B), Drunk Driving (C); resulting pattern CSTP P1 over B, A and C.]
16
Related Pattern Semantics: ST Data Mining
Spatio-temporal frequent patterns
- Unordered (ST co-occurrence)
- Totally ordered (ST sequences)
- Partially ordered: our work (cascading ST patterns)
- Others
ST co-occurrence [Celik et al. 2008, Cao et al. 2006]
- Designed for moving object datasets by treating trajectories as location time series.
- Performs partitioning over space and time.
ST sequence [Huang et al. 2008, Cao et al. 2005]
- Totally ordered patterns modeled as a chain.
- Does not account for multiply connected (e.g., nonlinear) patterns, so it misses non-linear semantics.
- No ST statistical interpretation.
17
Limitations of Related ST Pattern Semantics
ST sequence
- Total order. Ex.: (B→A, A→C).
- No ST statistical interpretation.
Limitations
- Absence of partial order. Ex.: (B→A, B→C, A→C).
[Figure: event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4 at Time T1, Time T2 > T1 and Time T3 > T2; possible ST sequences drawn among A.1, B.1, C.2, C.1, A.5 and B.2; pattern CSTP P1 over B, A and C.]
18
Interpretation Model: Directed Neighbor Graph (DNG)
- Nodes: individual events.
- Directed edge (N1→N2) iff Neighbor(N1, N2) and After(N2, N1).
[Figure: DNG over event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4 at Time T1, Time T2 and Time T3; event types Assault (A), Bar Closing (B), Drunk Driving (C); pattern CSTP P1 over B, A and C.]
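The edge rule above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the thesis implementation: the function name `build_dng`, the tuple layout `(event_id, x, y, t)`, the Euclidean distance test and the sample coordinates are all assumptions made for the example. The 0.5-mile and 20-minute neighborhoods echo the inputs quoted on the "Illustration: Output" slide.

```python
from math import hypot

# Hypothetical events: (event_id, x in miles, y in miles, t in minutes).
EVENTS = [
    ("B.1", 0.0, 0.0, 0),    # bar closing
    ("A.1", 0.3, 0.0, 10),   # assault nearby, shortly after
    ("C.1", 0.2, 0.2, 15),   # drunk driving nearby, later still
    ("A.2", 5.0, 5.0, 5),    # assault far away
]


def build_dng(events, radius, window):
    """Return directed edges (n1, n2) where n2 is a spatial neighbor of n1
    and occurs strictly after n1, within the temporal window."""
    edges = []
    for id1, x1, y1, t1 in events:
        for id2, x2, y2, t2 in events:
            is_neighbor = hypot(x2 - x1, y2 - y1) <= radius
            is_after = 0 < t2 - t1 <= window
            if is_neighbor and is_after:
                edges.append((id1, id2))
    return edges


if __name__ == "__main__":
    for src, dst in build_dng(EVENTS, radius=0.5, window=20):
        print(f"{src} -> {dst}")
```

Because every edge points forward in time, the resulting graph cannot contain a cycle, which is what lets patterns over it be treated as partial orders.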
19
Statistical Foundation: Interest Measures
Instances of CSTP P1: (B→A, B→C, A→C) are:
- (B1→A1, B1→C1, A1→C1)
- (B1→A3, B1→C2, A3→C2)
- ? (B1→A1; A1→C2; B1→C2)
Cascade Participation Ratio, CPR(CSTP, M): conditional probability of an instance of the CSTP in the neighborhood, given an instance of event type M.
Cascade Participation Index, CPI(CSTP): min( CPR(CSTP, M) ) over all event types M in the CSTP.
[Figure: worked examples on the DNG over A.1 to A.5, B.1 to B.2 and C.1 to C.4; pattern CSTP P1 over B, A and C.]
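A small Python sketch may make the two measures concrete. It assumes CPR(CSTP, M) is estimated as the fraction of instances of event type M that take part in at least one instance of the pattern, which is one reading of the conditional probability above. The edge list is transcribed from the flattened relation table on the multi-resolution filter slide later in this deck, so treat it as a best-effort reconstruction; the function names are hypothetical.

```python
from itertools import product

# Event instances per type (11 nodes in the running example).
INSTANCES = {
    "A": ["A.1", "A.2", "A.3", "A.4", "A.5"],
    "B": ["B.1", "B.2"],
    "C": ["C.1", "C.2", "C.3", "C.4"],
}

# Directed neighbor graph edges, as read from the deck's relation table.
EDGES = {
    ("B.1", "A.1"), ("B.1", "A.3"), ("B.2", "A.2"), ("B.2", "A.4"),
    ("B.1", "C.2"), ("B.1", "C.3"), ("B.2", "C.1"),
    ("A.1", "C.2"), ("A.3", "C.3"), ("A.1", "C.3"), ("A.3", "C.4"),
    ("C.1", "A.5"),
}


def pattern_instances(pattern):
    """Enumerate assignments of one instance per event type such that every
    type-level edge of the pattern is matched by an edge of the graph."""
    types = sorted({t for edge in pattern for t in edge})
    found = []
    for combo in product(*(INSTANCES[t] for t in types)):
        assign = dict(zip(types, combo))
        if all((assign[u], assign[v]) in EDGES for u, v in pattern):
            found.append(assign)
    return found


def cpr(pattern, event_type):
    """Fraction of instances of event_type that participate in the pattern."""
    participating = {inst[event_type] for inst in pattern_instances(pattern)}
    return len(participating) / len(INSTANCES[event_type])


def cpi(pattern):
    """Minimum CPR over all event types in the pattern."""
    types = {t for edge in pattern for t in edge}
    return min(cpr(pattern, t) for t in types)


if __name__ == "__main__":
    print(cpi([("B", "A")]))                          # 0.8
    print(cpi([("B", "C")]))                          # 0.75
    print(cpi([("B", "A"), ("B", "C"), ("A", "C")]))  # 0.4
```

Under these assumptions the single-edge values agree with the 0.8, 0.75, 0.4 and 0.2 shown on the algorithm illustration slides. The brute-force enumeration is exponential in the number of event types and is only meant to show the definitions.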
20
Analytical Evaluation: Statistical Interpretation
The Cascade Participation Index (CPI) is an upper bound on the ST K-function per unit volume.
Spatial statistics: ST K-function (Diggle et al. 1995).
Example (three configurations of A.1, A.2, A.3, B.1 and B.2):

| Measure    | Configuration 1 | Configuration 2 | Configuration 3 |
| ST-K(B→A)  | 2/6 = 0.33      | 3/6 = 0.5       | 6/6 = 1         |
| CPI(B→A)   | 2/3 = 0.66      | 1               | 1               |
21
Comparison with Related Interest Measures

| Measure                                                                   | Key property                                       |
| Frequency                                                                 | Double counting of pattern instances               |
| Maximum Independent Set (MIS) size [Kuramochi and Karypis, 2004]          | NP-complete                                        |
| Scoring criterion for Bayesian networks [Neopolitan, 2003; Chickering, 1996] | NP-complete; learning requires prior specification |
| Lower bound on vertex label frequency                                     | Frequency-based interpretation                     |

Values for CSTP P1 on the example graph (A.1 to A.5, B.1 to B.2, C.1 to C.4):

| Measure                  | Value                                     |
| Frequency                | 3 / (what is the # of transactions?)      |
| MIS                      | 2                                         |
| Lower bound on frequency | min{1, 2, 2} = 1                          |
22
Computational Structure: CSTP Miner Algorithm
Basic idea:
  Initialization
  for k in (1, 2, 3, ..., K-1) and while prevalent CSTPs are found do
    Generate size k candidates.
    Compute CSTP instances / materialize part of the DNG.
    Calculate the interest measure and select prevalent CSTPs.
  end
Not part of a conventional apriori setting:
- Item sets in association rule mining.
- Chemical compounds / subgraphs in graph mining.
- Directed acyclic graphs in CSTP mining.
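The level-wise loop above can be illustrated with a simplified Python sketch. It is not the CSTP Miner itself: growing candidates by adding one prevalent type-level edge at a time, the brute-force instance enumeration, and the names `cpi` and `mine` are assumptions chosen for brevity. The instance and edge data are transcribed from the relation table on the multi-resolution filter slide later in this deck, and 0.33 is the CPI threshold quoted on the illustration slide.

```python
from itertools import product

INSTANCES = {
    "A": ["A.1", "A.2", "A.3", "A.4", "A.5"],
    "B": ["B.1", "B.2"],
    "C": ["C.1", "C.2", "C.3", "C.4"],
}
EDGES = {
    ("B.1", "A.1"), ("B.1", "A.3"), ("B.2", "A.2"), ("B.2", "A.4"),
    ("B.1", "C.2"), ("B.1", "C.3"), ("B.2", "C.1"),
    ("A.1", "C.2"), ("A.3", "C.3"), ("A.1", "C.3"), ("A.3", "C.4"),
    ("C.1", "A.5"),
}


def cpi(pattern):
    """Cascade Participation Index of a pattern given as a set of
    (type, type) edges, computed by brute-force enumeration."""
    types = sorted({t for edge in pattern for t in edge})
    participating = {t: set() for t in types}
    for combo in product(*(INSTANCES[t] for t in types)):
        assign = dict(zip(types, combo))
        if all((assign[u], assign[v]) in EDGES for u, v in pattern):
            for t in types:
                participating[t].add(assign[t])
    return min(len(participating[t]) / len(INSTANCES[t]) for t in types)


def mine(threshold):
    """Level-wise search: start from single-edge patterns, then extend each
    prevalent pattern by one prevalent single edge until nothing survives.
    Assumes threshold > 0, so cyclic candidates (which can have no instances
    in a time-forward graph) score 0 and are dropped."""
    singles = [frozenset([(u, v)]) for u in INSTANCES for v in INSTANCES if u != v]
    level = {p: cpi(p) for p in singles}
    level = {p: score for p, score in level.items() if score >= threshold}
    seeds = list(level)
    prevalent = dict(level)
    while level:
        candidates = {p | s for p in level for s in seeds if not s <= p}
        level = {c: cpi(c) for c in candidates}
        level = {c: score for c, score in level.items() if score >= threshold}
        prevalent.update(level)
    return prevalent


if __name__ == "__main__":
    for pattern, score in sorted(mine(0.33).items(), key=lambda kv: len(kv[0])):
        print(sorted(pattern), round(score, 2))
```

A real miner would also need to handle disconnected candidates and avoid re-enumerating instances at every level; this sketch ignores both for clarity.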
23
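CSTP Miner Algorithm: Illustration
CPI threshold = 0.33. Candidates are evaluated with a spatio-temporal join over the example graph (A.1 to A.5, B.1 to B.2, C.1 to C.4).

| Size-2 candidate | A→B | A→C | B→A | B→C  | C→A | C→B |
| CPI              | 0   | 0.4 | 0.8 | 0.75 | 0.2 | 0   |

[Figure: candidate lattice rooted at {Null}, with larger candidates over C, B and A annotated with the values 0.4, 0.8 and 0.4.]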
24
Computational Structure: CSTP Miner Algorithm
Key bottlenecks
- Interest measure evaluation
- Exponential pattern space
Computational strategies
- Compute the interest measure efficiently: Space-Time Partition join strategy; Time Ordered Nested Loop strategy.
- Reduce irrelevant interest measure evaluation: filtering strategies.
Fixed parameters: spatial neighborhood = 0.62 miles, temporal neighborhood = 1 hr, CPI threshold = 0.0055.
25
CSTP Miner Algorithm: Interest Measure Evaluation
ST join strategies: perform each interest measure computation efficiently.
- Time Ordered Nested Loop (TONL) strategy.
- Space-Time Partitioning (STP) strategy: ST join by plane sweep.
[Figure: time versus space plot of the volume of the ST neighborhood; example graph over A.1 to A.5, B.1 to B.2 and C.1 to C.4 with # edges = 13.]
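One plausible reading of the time-ordered nested loop idea is sketched below in Python: sort events by time once, then let the inner loop stop as soon as the temporal window is exceeded. The slides do not spell out the strategy, so the function name `tonl_join`, the tuple layout `(event_id, x, y, t)` and the sample events are assumptions.

```python
from math import hypot

# Hypothetical events: (event_id, x in miles, y in miles, t in minutes).
EVENTS = [
    ("B.1", 0.0, 0.0, 0),
    ("A.1", 0.3, 0.0, 10),
    ("C.1", 0.2, 0.2, 15),
    ("A.2", 5.0, 5.0, 5),
]


def tonl_join(events, radius, window):
    """Spatio-temporal join over time-sorted events.

    For each event, scan only later events and break once the time gap
    exceeds the window, since every remaining event is even later."""
    ordered = sorted(events, key=lambda e: e[3])
    edges = []
    for i, (id1, x1, y1, t1) in enumerate(ordered):
        for id2, x2, y2, t2 in ordered[i + 1:]:
            if t2 - t1 > window:
                break
            if t2 > t1 and hypot(x2 - x1, y2 - y1) <= radius:
                edges.append((id1, id2))
    return edges


if __name__ == "__main__":
    print(tonl_join(EVENTS, radius=0.5, window=20))
```

The early `break` is the whole point of the ordering: the spatial test runs only for pairs that already satisfy the temporal constraint. A space-time partitioning join would additionally bucket events by location so that distant pairs are never compared.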
26
CSTP Miner Algorithm: Alternative Ideas
Can the neighborhood graph be pre-computed?
Trade-off: storage versus online computation.
Cost of storage
- Pre-computed graph: O(#Edges + #Nodes). Example: 24.
- On-the-fly: O(#Nodes). Example: 11.
Cost of computation
- Pre-computed graph: O(#Edges + #Nodes). Example: 24.
- On-the-fly: O(#Nodes * log(#Nodes)). Example: 38.
Other factors
- Dense versus sparse data.
- Positive ST autocorrelation.
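The example figures above can be reproduced with a short calculation. The sketch assumes the running example has 11 nodes and 13 edges (the edge count is quoted on the join strategy slide) and that the logarithm is base 2, which is the base that yields 38; the slides do not state the base.

```python
from math import log2

NODES = 11   # event instances in the running example
EDGES = 13   # edges of the directed neighbor graph in the running example

# Storage cost
precomputed_storage = EDGES + NODES    # O(#Edges + #Nodes)
on_the_fly_storage = NODES             # O(#Nodes)

# Computation cost
precomputed_compute = EDGES + NODES                 # O(#Edges + #Nodes)
on_the_fly_compute = round(NODES * log2(NODES))     # O(#Nodes * log(#Nodes))

if __name__ == "__main__":
    print(precomputed_storage, on_the_fly_storage)   # 24 11
    print(precomputed_compute, on_the_fly_compute)   # 24 38
```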
27
CSTP Miner Algorithm: Filtering Strategies
Key rationale: enhance savings by filtering non-prevalent candidates early.
Upper bound (UB) filter
- Key idea: CPI has an anti-monotone upper bound.
Multi-resolution ST (MST) filter
- Key idea: there exists a low-dimensional embedding in space and time.
- Overestimate CPI by coarsening the ST dataset.
- If overestimate(CPI) < threshold, the candidate is pruned.
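The coarsening idea can be sketched for a single-edge candidate as follows. This is an illustrative construction, not the filter from the thesis: the grid layout, the cell sizes, the rule that two cells are coarse neighbors when their indices differ by at most ceil(radius / cell size) in space and by 0 to ceil(window / cell size) in time, and all names and sample events are assumptions. The rule is chosen so that every true neighbor pair falls into neighboring cells, which makes the coarse value an overestimate of the exact CPI.

```python
from collections import defaultdict
from math import ceil, floor, hypot

# Hypothetical events: (event_id, event_type, x, y, t).
EVENTS = [
    ("B.1", "B", 0.0, 0.0, 0), ("B.2", "B", 10.0, 10.0, 0),
    ("A.1", "A", 0.3, 0.0, 10), ("A.2", "A", 10.2, 10.0, 12),
    ("A.3", "A", 50.0, 50.0, 100), ("A.4", "A", 0.9, 0.9, 25),
    ("C.1", "C", 90.0, 90.0, 5),
]


def exact_cpi(events, src, dst, radius, window):
    """CPI of the single-edge pattern src -> dst by brute force."""
    hit_src, hit_dst = set(), set()
    for id1, ty1, x1, y1, t1 in events:
        for id2, ty2, x2, y2, t2 in events:
            if (ty1, ty2) == (src, dst) and 0 < t2 - t1 <= window \
                    and hypot(x2 - x1, y2 - y1) <= radius:
                hit_src.add(id1)
                hit_dst.add(id2)
    total = {ty: sum(1 for e in events if e[1] == ty) for ty in (src, dst)}
    return min(len(hit_src) / total[src], len(hit_dst) / total[dst])


def coarse_cpi(events, src, dst, radius, window, cell_s, cell_t):
    """Overestimate of CPI(src -> dst) computed on grid cells only."""
    ks, kt = ceil(radius / cell_s), ceil(window / cell_t)
    counts = {src: defaultdict(int), dst: defaultdict(int)}
    for _id, ty, x, y, t in events:
        if ty in counts:
            cell = (floor(x / cell_s), floor(y / cell_s), floor(t / cell_t))
            counts[ty][cell] += 1
    hit_src, hit_dst = set(), set()
    for c1 in counts[src]:
        for c2 in counts[dst]:
            dx, dy, dt = (b - a for a, b in zip(c1, c2))
            if abs(dx) <= ks and abs(dy) <= ks and 0 <= dt <= kt:
                hit_src.add(c1)
                hit_dst.add(c2)

    def ratio(ty, hits):
        return sum(counts[ty][c] for c in hits) / sum(counts[ty].values())

    return min(ratio(src, hit_src), ratio(dst, hit_dst))


if __name__ == "__main__":
    args = dict(radius=0.5, window=20)
    for src, dst in [("B", "A"), ("B", "C")]:
        over = coarse_cpi(EVENTS, src, dst, cell_s=1.0, cell_t=30, **args)
        verdict = "pruned" if over < 0.33 else "needs exact check"
        print(src, dst, round(over, 2), verdict)
```

In this toy data the B→C candidate is pruned from the coarse pass alone, while B→A survives and would still need the exact join. The saving in practice comes from joining a small number of cells instead of a large number of events.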
28
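CSTP Miner Algorithm: Filtering Strategies (Multi-resolution ST Filter)
Summarizing on a coarser neighborhood yields compression in most cases. CPI threshold = 0.33.

Actual relation (instance pairs per candidate):

| B→A       | B→C       | A→C       | C→A       |
| B.1, A.1  | B.1, C.2  | A.1, C.2  | C.1, A.5  |
| B.1, A.3  | B.1, C.3  | A.3, C.3  |           |
| B.2, A.2  | B.2, C.1  | A.1, C.3  |           |
| B.2, A.4  |           | A.3, C.4  |           |
| CPI: 0.8  | CPI: 0.75 | CPI: 0.4  | CPI: 0.2  |

Coarse relation: the same candidates evaluated over grid-cell pairs such as (0,0), (1,0), (0,2), (1,2), (1,1), (2,0) and (2,1), giving overestimated CPI values of 0.8 (B→A), 0.75 (B→C), 0.8 (A→C) and 0.2 (C→A).
[Figure: time versus space plot of the actual and coarse relations.]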
29
Experimental Evaluation: Experiment Setup
Goals
1. Compare different design decisions of the CSTPM algorithm (performance: run time).
2. Test the effect of parameters on performance: number of event types, dataset size, clumpiness degree.
Experiment platform: CPU 3.2 GHz, RAM 32 GB, OS Linux, Matlab 7.9.
30
Experimental Evaluation: Datasets
Real data (Lincoln, NE dataset)
- Data size: 5 datasets, drawn by increments of 2 months; 5000-33000 instances.
- Event types: drawn by increments of 5 event types; 5-25 event types.
Synthetic data
- Data size: 5 datasets; 5000-26000 instances.
- Event types: 5-25 event types.
- Clumpiness degree: 5-25 instances per event type per cell.
31
Experimental Evaluation: Join strategy performance
Question: What is the effect of dataset size on the performance of join strategies?
Trends: ST partitioning improves performance by a factor of 5-10 on synthetic data and by a factor of 3 on real data.
Fixed parameters: real data (CPI = 0.15, 0.31 miles, 10 days); synthetic data (0.5, 25, 25).
32
Experimental Evaluation: Join strategy performance
Experiment 2: What is the effect of the number of event types on the performance of join strategies?
Trends: ST partitioning improves performance by a factor of 10 on synthetic data and by a factor of 2.5 on real data.
Fixed parameters: real data (CPI = 0.15, 0.31 miles, 10 days); synthetic data (0.5, 25, 25).
33
Experimental Evaluation: Filtering strategy performance
Experiment 3: What is the effect of dataset size on the performance of filtering strategies?
Trends: Filtering improves performance by a factor of 5 on synthetic data and by a factor of 1.5 on real data.
Fixed parameters: real data (CPI = 0.15, 0.435 miles, 10 days); synthetic data (0.65, 70, 70).
34
Experimental Evaluation: Filtering strategy performance
Question: What is the effect of the number of event types on the performance of filtering strategies?
Trends: Filtering improves performance by a factor of 2.5 on synthetic data and by a factor of 1.3 on real data.
Fixed parameters: real data (CPI = 0.15, 0.435 miles, 10 days); synthetic data (0.65, 70, 70).
35
Experimental Evaluation: Filtering strategy performance
Question: What is the effect of clumpiness degree on different design decisions?
Trends:
a. Filtering improves performance by a factor of 40.
b. ST partitioning improves performance by a factor of 10.
Fixed parameters: CPI = 0.5, 15.53 miles, 1.04 days.
36
Lincoln, NE crime dataset: Case study
Is bar closing a generator for crime-related CSTPs?
Questions
- Is bar closing a crime generator?
- Are there other generators (e.g., Saturday nights)?
Observation: Crime peaks around bar closing!
K-S test: Saturday night is significantly different from normal-day bar closing (P-value = 1.249 x 10^-7, K = 0.41).
[Figure: bar locations in Lincoln, NE.]
37
Lincoln, NE crime dataset: Case study
38
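Lincoln, NE crime dataset: Case study

| Pop I          | Pop II         | KS     | P-val.         | Significant at α = 0.05 | Significant at α = 0.2 |
| Sat Night      | All Year       | 0.4187 | 1.249 x 10^-7  | Yes                     | Yes                    |
| Football Night | All Year       | 0.3400 | 0.1067         | No                      | Yes                    |
| Sat Night      | Football Night | 0.1987 | 0.7899         | No                      | No                     |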
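The Yes / No columns follow from comparing each reported P-value with the significance level. The short Python check below uses only the numbers in the table; the decision rule "significant when p < α" is the standard reading of such a table and is stated here as an assumption, and the names are hypothetical.

```python
# (population I, population II, KS statistic, p-value), as reported above.
ROWS = [
    ("Sat Night", "All Year", 0.4187, 1.249e-7),
    ("Football Night", "All Year", 0.3400, 0.1067),
    ("Sat Night", "Football Night", 0.1987, 0.7899),
]


def is_significant(p_value, alpha):
    """Reject the hypothesis of identical distributions when p < alpha."""
    return p_value < alpha


def decisions(rows, alphas=(0.05, 0.2)):
    """Return one tuple of booleans per row, one boolean per alpha."""
    return [tuple(is_significant(p, a) for a in alphas) for *_rest, p in rows]


if __name__ == "__main__":
    for (pop1, pop2, ks, p), verdict in zip(ROWS, decisions(ROWS)):
        print(f"{pop1} vs {pop2}: KS={ks}, p={p}, significant={verdict}")
```

Read this way, Saturday nights differ from the all-year baseline at both levels, football nights differ only at the looser α = 0.2, and Saturday nights are not distinguishable from football nights at either level.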
39
Outline
- Introduction
- Problem Statement
- Our Approach
  - Big Picture
  - Cascading Spatio-temporal pattern discovery
  - Other Frequent Pattern Families
- Future Work
40
Regional co-location patterns (RCP)
Input: Spatial features, crime reports.
Output: RCPs
- Subsets of spatial features.
- Frequently located in certain regions of a study area.
41
Statistical Foundation: Accounting for Heterogeneity
Interest measures: Regional Participation Ratio and Regional Participation Index.
- Conditional probability of observing a pattern instance within a locality, given an instance of a feature within that locality.
- Quantifies the local fraction participating in a relationship.
[Figure: worked examples for both measures.]
42
Performance Tuning: Key Ideas
Key idea: the interest measure shows special pruning properties in certain subsets of the spatial framework.
Maximal locality: key observations
- Collection of connected instances.
- Maximal localities are mutually disjoint.
- Contains several RCPs.
Key properties
- RPI shows the anti-monotonicity property within maximal localities: pruning a co-location {AB} prunes all its supersets (e.g., {ABC}, {ABCD}, etc.).
- RPI within a maximal locality is an upper bound on the RPI of its constituent prevalence localities.
43
Performance Tuning
Prevalence threshold = 0.25.
Steps shown: compute maximal localities (ML1, ML2, ML3), then evaluate candidates over features A, B and C starting from {Null}.
- Candidates annotated in the figure: {AB}, 0.167 (pruned); {BC}, 0.167 (pruned); {AC}, 0.25, 0.25; {AB}, 0.25; {BC}, 0.33; {AC}, 0.25, 0.167.
- {ABC}: pruned automatically due to the anti-monotonicity of RPI.
- No RCP: due to the upper bound property of RPI.
Completeness: pruning a pattern within a maximal locality does not prune any valid RCPs.
Correctness: accepting a pattern involves additional checks so that only prevalent RCPs are reported.
44
Experimental Evaluation: Spatial Neighborhood Size
Question: What is the effect of spatial neighborhood size on the performance of different algorithms?
Fixed parameters: dataset size: 7498 instances; # features: 5; prevalence threshold: 0.07.
Trends
- Run time: ML Pruning outperforms PS Enumeration by a factor of 1.5 - 5.
- # of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 4.13 - 19.
[Figures: run time; # of RCPs.]
45
Experimental Evaluation: Feature Types
Question: What is the effect of the number of feature types on the performance of different algorithms?
Fixed parameters: dataset size: 7498 instances; spatial neighborhood size: 800 feet; prevalence threshold: 0.07.
Trends
- Run time: ML Pruning outperforms PS Enumeration by a factor of 1.2.
- # of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 1.6 - 3.5.
[Figures: run time; # of RCPs.]
46
RCPs from Lincoln Crime Data
This result shows the interaction between alcohol and vandalism, apart from highlighting outbreaks.
47
Conclusions
- Proposed SFPM techniques (e.g., cascading ST patterns and regional co-location patterns) honor ST semantics (e.g., partial order, continuity).
- Interest measures achieve a balance between statistical interpretation and computational scalability.
- Algorithmic strategies exploiting properties of ST data (e.g., multi-resolution filter) and properties of interest measures enhance computational savings.
48
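Future Work: Short and Medium Term
(X: unexplored; ✔: explored in prior work)

| Dimension                 | Category                          | Input data: Spatial | Input data: Spatio-temporal (ST) |
| Pattern semantics         | Unordered                         | ✔                   | ✔                                |
| Pattern semantics         | Totally ordered                   | X                   | ✔                                |
| Pattern semantics         | Partially ordered                 | X                   | CSTP discovery                   |
| Statistical foundation    | Autocorrelation                   | ✔                   | CSTP discovery                   |
| Statistical foundation    | Heterogeneity                     | RCP discovery       | X                                |
| Underlying framework      | Euclidean                         | RCP discovery       | CSTP discovery                   |
| Underlying framework      | Non-Euclidean (networks)          | X                   | X                                |
| Neighbor relation         | User specified                    | RCP discovery       | CSTP discovery                   |
| Neighbor relation         | Algorithm determined              | X                   | X                                |
| Interestingness criterion | Interest measure threshold        | RCP discovery       | CSTP discovery                   |
| Interestingness criterion | Threshold free                    | X                   | X                                |
| Type of data              | Boolean / categorical             | RCP discovery       | CSTP discovery                   |
| Type of data              | Quantitative data (e.g., climate) | X                   | X                                |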
49
Future Work: Long Term
- Exploring interpretation of discovered patterns by law enforcement.
- ST predictive analytics, predictive models based on SFPM, and predictive policing.
- Towards geo-social analytics for policing (e.g., criminal flash mobs, gangs, groups of offenders committing crimes).
- New ST frequent pattern mining algorithms based on depth-first graph enumeration.
- ST frequent pattern mining techniques that account for patron demographic levels.
- Explore evaluation of choropleth maps via ST frequent pattern mining.
50
Acknowledgment
- Members of the Spatial Database and Data Mining Research Group, University of Minnesota, Twin-Cities.
- This work was supported by grants from the U.S. Army, NGA and U.S. DOJ.
- Advisor: Prof. Shashi Shekhar, Computer Science, University of Minnesota.
- Thesis committee.
- U.S. DOJ, National Institute of Justice: Mr. Ronald E. Wilson (Program Manager, Mapping and Analysis for Public Safety), Dr. Ned Levine (Ned Levine and Associates, CrimeStat Program).
- U.S. Army, Topographic Engineering Center: Dr. J.A. Shine (Mathematician and Statistician, Geospatial Research and Engineering Division) and Dr. J.P. Rogers (Additional Director, Topographic Engineering Center).
- Mr. Tom Casady, Public Safety Director (formerly Lincoln Police Chief), Lincoln, NE, USA.
Thank you for your questions, comments and attention!