Predictive Crime Analytics


1 Predictive Crime Analytics

2 Project Objective
The project analyzes crime data provided by the Charlotte-Mecklenburg Police Department (CMPD) and designs predictive models, using classification and neural networks, to predict:
• In how many days a case will be closed
• The number of crimes that will occur
• The case status of an incident
Our goal is to build predictive models based on different inputs, evaluate them, and choose the best one.

3 CMPD Data Retrieval
• Time period: 2011-2016
• 42 tables (6 years × 7 tables), linked by Complaint_No
• Imported into Microsoft SQL Server
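The layout above has one table per year and table type, and all of them share Complaint_No. It can be sketched as follows. This is a minimal illustration, not the project's SQL. Python's built-in sqlite3 stands in for MS SQL Server. The table names (`incident_2011`, `offense_2011`, ...) and the toy rows are hypothetical. Each table type is first stacked across the six years with UNION ALL, and the stacked tables are then joined on Complaint_No.

```python
import sqlite3

# Hypothetical layout: one table per (table type, year), e.g. "incident_2011".
# sqlite3 stands in for MS SQL Server purely for illustration.
YEARS = range(2011, 2017)  # 2011-2016 inclusive


def stack_years(conn: sqlite3.Connection, prefix: str) -> str:
    """Create a view that appends the six yearly tables of one type."""
    selects = " UNION ALL ".join(f"SELECT * FROM {prefix}_{y}" for y in YEARS)
    conn.execute(f"CREATE VIEW {prefix}_all AS {selects}")
    return f"{prefix}_all"


def build_toy_db() -> sqlite3.Connection:
    """Two toy table types per year, one row each, sharing Complaint_No."""
    conn = sqlite3.connect(":memory:")
    for y in YEARS:
        conn.execute(f"CREATE TABLE incident_{y} (Complaint_No TEXT, Zip TEXT)")
        conn.execute(f"CREATE TABLE offense_{y} (Complaint_No TEXT, Offense TEXT)")
        conn.execute(f"INSERT INTO incident_{y} VALUES (?, ?)", (f"{y}-0001", "28208"))
        conn.execute(f"INSERT INTO offense_{y} VALUES (?, ?)", (f"{y}-0001", "TOY_OFFENSE"))
    return conn


conn = build_toy_db()
incidents = stack_years(conn, "incident")
offenses = stack_years(conn, "offense")
# Link the stacked tables on the shared key.
rows = conn.execute(
    "SELECT i.Complaint_No, i.Zip, o.Offense "
    f"FROM {incidents} AS i JOIN {offenses} AS o ON o.Complaint_No = i.Complaint_No"
).fetchall()
```

Stacking the years first means each table type needs only one join, instead of one join per year.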

4 Data Enrichment
CMPD data was combined in one database with:
• Weather
• Unemployment
• Twitter
• Special Events

5 Joining Data
• A Charlotte ZIP code table was used to filter out all records outside the Charlotte area.
• Tables within the CMPD database were joined on the unique Complaint_No.
• Other sources were joined to CMPD on Date: Twitter + CMPD and Special_Events + CMPD.
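Below is a minimal sketch of the ZIP code filter and the date joins. It assumes the incident, tweet and special-event records are already loaded and their dates parsed. The field names `Date`, `Tweet_Count` and `Special_Event`, the ZIP set and all values are hypothetical. Tweets are aggregated to one count per date before the join, so that the join cannot duplicate incident rows.

```python
from collections import Counter
from datetime import date

# Toy inputs; only Complaint_No and Zip come from the slides, the rest is hypothetical.
charlotte_zips = {"28208", "28216", "28226"}  # stand-in for the Charlotte ZIP code table
incidents = [
    {"Complaint_No": "toy-1", "Zip": "28208", "Date": date(2011, 1, 1)},
    {"Complaint_No": "toy-2", "Zip": "28216", "Date": date(2011, 1, 2)},
    {"Complaint_No": "toy-3", "Zip": "99999", "Date": date(2011, 1, 2)},  # outside Charlotte
]
tweet_dates = [date(2011, 1, 1), date(2011, 1, 1), date(2011, 1, 2)]
special_events = {date(2011, 1, 2): "Hypothetical Event"}


def enrich(incidents, zips, tweet_dates, events):
    """Filter to Charlotte ZIP codes, then attach per-date tweet counts and events."""
    tweets_per_day = Counter(tweet_dates)  # one count per date, so rows are not duplicated
    enriched = []
    for inc in incidents:
        if inc["Zip"] not in zips:
            continue
        day = inc["Date"]
        enriched.append({
            **inc,
            "Tweet_Count": tweets_per_day[day],  # Counter returns 0 for unseen dates
            "Special_Event": events.get(day),    # None when no event that day
        })
    return enriched


result = enrich(incidents, charlotte_zips, tweet_dates, special_events)
```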

6 Unemployment + CMPD using (Year-Month)
Unemployment table columns: Year, Period, Labor Force, Employment, Unemployment, Unemployment Rate
Sample rows (partial, 2011): 01 Jan 130927 11.5; 02 Feb 126619 11.1; 03 Mar 122085 10.6; 04 Apr 119310 10.4; 05 May 121838
Incident table columns: Complaint_No, Year, Month, Block_No, Direction, Street_Name, Street_Type, Suffix, City, State, Zip
Sample rows (partial): 2011 01 4425 EDDLEMAN RD CHARLOTTE NC 28208; 2228 BEATTIES FORD 28216; 2300 N TRYON ST; 4027 QUAIL GLENN CT K 28226
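The incident table already carries Year and Month, so the join only needs the unemployment data keyed the same way. The following is a minimal sketch. The rates are the percentage values from the 2011 sample rows, and the function and output field names are hypothetical. Missing months are kept as None, as in a left join, so no incidents are dropped.

```python
# Unemployment rate (%) keyed by (Year, Month), from the 2011 sample rows above.
UNEMPLOYMENT_RATE = {(2011, 1): 11.5, (2011, 2): 11.1, (2011, 3): 10.6, (2011, 4): 10.4}


def join_unemployment(incidents):
    """Left join on (Year, Month): every incident is kept; the rate is None if unmatched."""
    return [
        {**inc, "Unemployment_Rate": UNEMPLOYMENT_RATE.get((inc["Year"], inc["Month"]))}
        for inc in incidents
    ]


joined = join_unemployment([
    {"Complaint_No": "toy-1", "Year": 2011, "Month": 2},
    {"Complaint_No": "toy-2", "Year": 2011, "Month": 6},  # month not in the sample
])
```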

7 Weather + CMPD using Date
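Weather table columns: Date, MeanDewPoint F, MinDewPoint F, MaxHumidity, MeanHumidity, MinHumidity
Sample row: 01/01/2011: MeanDewPoint 51, MinDewPoint 37, MaxHumidity 100, MeanHumidity 89, MinHumidity 78
Other sample rows (partial): 02/01/2011; 05/02/2011 50; 06/02/2011; 07/02/2011 35
Incident table columns: Complaint_No, Incident_From_Date, Block_No, Direction, Street_Name, Street_Type, Suffix, City, State, Zip
Sample rows (partial): 01/01/2011 4425 EDDLEMAN RD CHARLOTTE NC 28208; 2228 BEATTIES FORD 28216; 2300 N TRYON ST; 4027 QUAIL GLENN CT K 28226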
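The weather data is daily, but Incident_From_Date also carries a time (MM/DD/YYYY HHMM). The time part therefore has to be dropped before the join. This minimal sketch assumes the weather dates have already been parsed into calendar dates. The slide's weather dates could be read as MM/DD or DD/MM, so that parsing step is left out. The dictionary layout and function name are hypothetical, and the single weather row is the 01/01/2011 sample.

```python
from datetime import date, datetime
from typing import Optional

INCIDENT_FMT = "%m/%d/%Y %H%M"  # Incident_From_Date, e.g. "01/01/2011 2315"

# Daily weather keyed by calendar date; the one row is the slide's 01/01/2011 sample.
WEATHER = {
    date(2011, 1, 1): {"MeanDewPointF": 51, "MinDewPointF": 37,
                       "MaxHumidity": 100, "MeanHumidity": 89, "MinHumidity": 78},
}


def weather_for(incident_from_date: str) -> Optional[dict]:
    """Drop the HHMM part so the incident matches the daily weather key."""
    day = datetime.strptime(incident_from_date, INCIDENT_FMT).date()
    return WEATHER.get(day)


late_night = weather_for("01/01/2011 2315")
```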

8 Data Processing
• The data format of each field was specified.
• Outliers, defined as values more than 3 standard deviations from the mean, were replaced with the field's mean value.
• Because most algorithms cannot use dates and times directly, we derived duration periods from them.
• Missing entries were replaced as follows: continuous fields with the mean, nominal fields with the mode, and ordinal fields with the median.
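The two processing rules can be sketched with the standard library. Several details are assumptions not stated on the slide. The mean and standard deviation for the 3-SD cut-off are computed over all values of the field, outliers included. The population standard deviation is used. Missing values are represented as None. SPSS's own data preparation may differ in these details.

```python
from statistics import mean, median, mode, pstdev


def replace_outliers(values, k=3.0):
    """Replace values more than k standard deviations from the mean with the mean."""
    mu, sd = mean(values), pstdev(values)
    return [mu if sd > 0 and abs(v - mu) > k * sd else v for v in values]


# Fill rule per measurement level, as listed on the slide.
FILL = {"continuous": mean, "nominal": mode, "ordinal": median}


def impute(values, level):
    """Replace None entries with the mean, mode or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = FILL[level](observed)
    return [fill if v is None else v for v in values]
```

For example, `replace_outliers([10] * 19 + [100])` replaces only the 100, setting it to the field mean of 14.5.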

9 Missing Values Treatment
• Features with more than 50% missing values were excluded.
• Rows with more than 50% missing values were excluded.
• Fields with too many unique categories (more than 100) were excluded.
• Categorical fields with more than 90% of their values in a single category were excluded.
• Sparse categories were merged to maximize association with the target.
• Input fields left with only one category after supervised merging were excluded.
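The exclusion rules above can be expressed as simple filters. This minimal sketch rests on several assumptions. Each record is a dict with the same keys, None marks a missing value, and string-valued fields are treated as categorical. The 90% share is measured over non-missing values. The supervised merging of sparse categories depends on the target and is not shown.

```python
from collections import Counter


def drop_sparse_rows(rows, max_missing=0.5):
    """Keep rows where at most half of the values are missing (None)."""
    return [r for r in rows if sum(v is None for v in r.values()) <= max_missing * len(r)]


def screen_fields(rows, max_missing=0.5, max_categories=100, max_share=0.9):
    """Return the field names that pass the missing-value and category rules."""
    kept = []
    for field in rows[0]:  # assumes every row has the same keys
        column = [r[field] for r in rows]
        observed = [v for v in column if v is not None]
        if len(observed) < (1 - max_missing) * len(column):
            continue  # more than 50% missing
        if isinstance(observed[0], str):  # string fields treated as categorical
            counts = Counter(observed)
            if len(counts) > max_categories:
                continue  # too many unique categories
            if counts.most_common(1)[0][1] > max_share * len(observed):
                continue  # one category dominates
        kept.append(field)
    return kept
```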

10 Analysis: Distribution of incident types

11 Trend in the number of incidents by incident hour and case status

12 Number of Incidents compared by year and day of the week

13 Vehicle thefts by day of the week

14 Most frequently stolen vehicle body types per ZIP code

15 Feature Engineering
Feature engineering is fundamental to applied machine learning. To improve our initial results, we used Microsoft SQL Server Management Studio (SSMS) to create the following features:
• Day of week and time of day
• Time for a case to be closed, calculated from the reported date and the clearance date
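The project computed these features in SSMS. In T-SQL, functions such as DATENAME(weekday, ...) and DATEDIFF(day, ...) cover the same ground. The Python below only mirrors that logic as a sketch. It assumes that the reported and clearance dates use the same MM/DD/YYYY HHMM format as Incident_From_Date and that time to close is counted in calendar days. `Days_To_Close` is a hypothetical field name.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H%M"  # assumed to apply to both reported and clearance dates
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def engineer(reported: str, cleared: str) -> dict:
    """Derive day of week, hour of day and days-to-close for one case."""
    r = datetime.strptime(reported, FMT)
    c = datetime.strptime(cleared, FMT)
    return {
        "WeekDay": DAY_NAMES[r.weekday()],            # locale-independent day name
        "Incident_Hour": r.hour,                      # input for a time-of-day feature
        "Days_To_Close": (c.date() - r.date()).days,  # calendar days
    }


features = engineer("01/01/2011 2315", "01/05/2011 0900")
```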

16 Predictive Modeling (1)
Classification: Decision Tree
Tool: SPSS
Target: Case_Status
Tree Depth: 5
Inputs:
• WeekDay: extracted from Incident_From_Date (MM/DD/YYYY HHMM)
• Month: extracted from Incident_From_Date (MM/DD/YYYY HHMM)
• TimeFrame: extracted from Incident_From_Date (MM/DD/YYYY HHMM)
• Place1: general place type (e.g., Residential, Retail, Open Area)
• Reporting_Agency: agency that took the report (Airport Police, Charlotte-Mecklenburg Police)
• Location_Type: general location type (Indoors, Outdoors, Parking Lot, Parking Deck, Other)
• Temp_Range: discretized from the mean temperature
• Events and Unemployment_Rate: taken from the augmented weather and unemployment datasets
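The slide does not say which SPSS growing method was used (SPSS offers CHAID, CRT and QUEST, among others). As a simplified stand-in, the sketch below shows the core step of growing a tree: each categorical input gets a multiway split, the split is scored by weighted Gini impurity, and the input with the purest children wins. A full tree would repeat this in each child node until depth 5. The toy records and status labels are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, feature, target):
    """Weighted Gini impurity after a multiway split on one categorical input."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def best_split(rows, features, target="Case_Status"):
    """Pick the input whose split leaves the purest child nodes."""
    return min(features, key=lambda f: split_impurity(rows, f, target))


toy_rows = [  # hypothetical records and status labels
    {"TimeFrame": "Night", "WeekDay": "Monday", "Case_Status": "Open"},
    {"TimeFrame": "Night", "WeekDay": "Tuesday", "Case_Status": "Open"},
    {"TimeFrame": "Morning", "WeekDay": "Monday", "Case_Status": "Closed"},
    {"TimeFrame": "Morning", "WeekDay": "Tuesday", "Case_Status": "Closed"},
]
chosen = best_split(toy_rows, ["WeekDay", "TimeFrame"])
```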

17 Incident_From_Date (MM/DD/YYYY HHMM) → Discretization → TimeFrame
Mean Temperature F → Interval Scaling → Temperature_Range
TimeFrame:
• 00:00-03:59 Midnight
• 04:00-07:59 Early Morning
• 08:00-11:59 Morning
• 12:00-15:59 Afternoon
• 16:00-19:59 Evening
• 20:00-23:59 Night
Temperature_Range: 0 - 30 (first interval)
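Both discretizations are simple lookups. This minimal sketch maps an HHMM string to the six 4-hour TimeFrame bins and a mean temperature to a fixed-width interval. Only the first temperature interval (0 - 30) is shown on the slide, so the 30-degree width is a hypothetical parameter. The intervals are treated as half-open, so 30 falls in "30 - 60".

```python
TIME_FRAMES = ["Midnight", "Early Morning", "Morning", "Afternoon", "Evening", "Night"]


def time_frame(hhmm: str) -> str:
    """Map the HHMM part of Incident_From_Date to one of six 4-hour bins."""
    return TIME_FRAMES[int(hhmm[:2]) // 4]


def temperature_range(mean_temp_f: float, width: int = 30) -> str:
    """Label the fixed-width interval containing the mean temperature (hypothetical width)."""
    low = int(mean_temp_f // width) * width
    return f"{low} - {low + width}"
```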

19 Performance Analysis

21 Predictive Modeling (2)
Method: Neural Network (Multilayer Perceptron)
Tool: SPSS
Target: Clearance TimeFrame
Important predictors: Year, Month, Incident_Hour, Number_Of_Tweets, Incidents, MeanTemperature F, WeekDay, MeanSealevelPressure, Day, MeanVisibilityMiles
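To show what a multilayer perceptron computes for a categorical target such as Clearance TimeFrame, here is a forward pass with one hidden layer. It assumes tanh hidden units and a softmax output, a common configuration for classification. The layer sizes and weights are made up. In practice the weights are learned during training, which is not shown, and predictors such as Incident_Hour or MeanTemperature F would be rescaled first.

```python
import math


def mlp_forward(x, W1, b1, W2, b2):
    """One tanh hidden layer, then softmax over the clearance-time classes."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes: 2 standardized predictors, 3 hidden units, 2 target classes.
W1 = [[0.4, -0.2], [0.1, 0.3], [-0.5, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.3, -0.1, 0.2], [-0.2, 0.4, 0.1]]
b2 = [0.0, 0.0]
probs = mlp_forward([0.5, -1.0], W1, b1, W2, b2)
```

The predicted class is the one with the highest probability in `probs`.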

23 Evaluation: “Day” level

24 Predictive Modeling (3)
Method: Neural Network (Multilayer Perceptron)
Tool: SPSS
Target: Number of Incidents
Important predictors: Year, Month, Incident_Hour, Number_Of_Tweets, WeekDay, MeanTemperature F, Unemployment, ClearanceTimeFrame, Day, MeanVisibilityMiles, MeanWindSpeedMPH, Employment
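The "Number of Incidents" target has to be built by aggregation at the level being evaluated (day or month). A minimal sketch follows, assuming Incident_From_Date strings in MM/DD/YYYY HHMM format. The function name and dates are hypothetical. Days or months with no incidents never appear in the counts and would need to be filled in with zero before modeling.

```python
from collections import Counter
from datetime import datetime

FMT = "%m/%d/%Y %H%M"  # Incident_From_Date format


def incident_counts(incident_dates, level="day"):
    """Count incidents per calendar day ("day") or per (year, month) ("month")."""
    parsed = [datetime.strptime(d, FMT) for d in incident_dates]
    if level == "day":
        return Counter(p.date() for p in parsed)
    if level == "month":
        return Counter((p.year, p.month) for p in parsed)
    raise ValueError("level must be 'day' or 'month'")


toy_dates = ["01/01/2011 0100", "01/01/2011 2300", "01/15/2011 1200", "02/01/2011 0800"]
per_day = incident_counts(toy_dates, "day")
per_month = incident_counts(toy_dates, "month")
```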

26 Evaluation: “Month” level

28 Evaluation

29 Accuracy
Model accuracy depends on:
• Sample size
• Ratio between the sample size and the number of features used
• Relationships between features
• Initial weights and biases
• Target variable
• Ratio of training set : test set : validation set
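The last factor, the training : test : validation ratio, can be made explicit with a seeded three-way split. In this minimal sketch the 70 : 20 : 10 default is a hypothetical choice, because the slides do not state the ratio used. A fixed seed makes the split repeatable, so changes in accuracy can be attributed to the model rather than to a different partition.

```python
import random


def split_three(rows, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle once, then cut into training, test and validation partitions."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = int(len(shuffled) * ratios[0])
    n_test = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])  # remainder goes to validation


train, test_set, validation = split_three(range(100))
```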

30 Future Work
• Improve the performance of the individual models
• Add more datasets
• Apply deep-learning methods
• Implement unsupervised learning

31 Credits: Mansi Dubey, Madlen Ivanova, Preneesh Jayaraj

