Predictive Crime Analytics

Project Objective
The project aims to analyze the crime data provided by CMPD and to design predictive models, using classification and neural networks, that predict:
• The number of days in which a case can be closed
• The number of crimes that will occur
• The case status of an incident
Our goal is to build predictive models based on different inputs, evaluate them, and choose the best one.

CMPD Data Retrieval
• Time period: 2011-2016
• 42 tables (6 years × 7 tables)
• Linked by Complaint_No
• Imported into MS SQL Server

Data Enrichment
Sources combined in the database: CMPD Data, Weather, Unemployment, Twitter, Special Events

Joining Data
• The Charlotte Zipcode table was used to filter out all records outside the Charlotte area.
• The tables within the CMPD database were joined on the unique Complaint_No.
• The other data sources were joined to CMPD as follows:
  Twitter + CMPD using Date
  Special_Events + CMPD using Date

Unemployment + CMPD using (Year-Month)

Unemployment:
Period       Labor force  Employment  Unemployment  Unemployment rate
2011 01 Jan  1140797      1009870     130927        11.5
     02 Feb  1143600      1016981     126619        11.1
     03 Mar  1149706      1027621     122085        10.6
     04 Apr  1150753      1031443     119310        10.4
     05 May  1154418      1032580     121838

Incident:
Complaint_No    Year  Month  Block_No  Direction  Street_Name    Street_Type  Suffix  City       State  Zip
20110101000308  2011  01     4425                 EDDLEMAN       RD                   CHARLOTTE  NC     28208
20110101000700               2228                 BEATTIES FORD                                         28216
20110101001104               2300      N          TRYON          ST
20110101001302               4027                 QUAIL GLENN    CT           K                         28226
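The Year-Month join can be sketched in a few lines of Python. This is only an illustration: the project itself worked in MS SQL Server, and the names `unemployment`, `incidents`, and `join_on_year_month` are hypothetical. The sample values are copied from this slide.

```python
# Unemployment rows keyed by (year, month); values copied from the sample on this slide.
unemployment = {
    (2011, 1): {"labor_force": 1140797, "unemployment_rate": 11.5},
    (2011, 2): {"labor_force": 1143600, "unemployment_rate": 11.1},
}

# One incident row from the sample; keys mirror the slide's column headers.
incidents = [
    {"Complaint_No": "20110101000308", "Year": 2011, "Month": 1, "Zip": "28208"},
]


def join_on_year_month(incident_rows, unemployment_by_period):
    """Left-join each incident to the unemployment row for its (Year, Month)."""
    joined = []
    for row in incident_rows:
        # Incidents with no matching period keep only their own fields (assumption).
        extra = unemployment_by_period.get((row["Year"], row["Month"]), {})
        joined.append({**row, **extra})
    return joined


enriched = join_on_year_month(incidents, unemployment)
```

A left join is assumed here so that no incident is dropped when a period is missing from the unemployment data; the slides do not say which join type was used.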

Weather + CMPD using Date

Weather:
Date        MeanDewPoint F  MinDewPoint F  MaxHumidity  MeanHumidity  MinHUmidity
01/01/2011  51              37             100          89            78
Further rows, incomplete in the source: 02/01/2011; 05/02/2011 (50); 06/02/2011; 07/02/2011 (35)

Incident:
Complaint_No    Incident_From_Date  Block_No  Direction  Street_Name    Street_Type  Suffix  City       State  Zip
20110101000308  01/01/2011          4425                 EDDLEMAN       RD                   CHARLOTTE  NC     28208
20110101000700                      2228                 BEATTIES FORD                                         28216
20110101001104                      2300      N          TRYON          ST
20110101001302                      4027                 QUAIL GLENN    CT           K                         28226

Data Processing
• The data format of each field was specified.
• Outliers were replaced with the mean value of the field. The outlier cutoff was set to 3 standard deviations.
• Dates and times cannot be used directly by most algorithms, so we computed durations from them instead.
• All missing data entries were replaced as follows:
  Continuous fields: mean
  Nominal fields: mode
  Ordinal fields: median
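The outlier and missing-value rules on this slide can be sketched as follows. This is a minimal illustration in standard-library Python, not the project's actual processing: the function names are hypothetical, missing entries are assumed to be `None`, the standard deviation is assumed to be the population form, and `median_low` is an assumption chosen so that an ordinal fill is always an existing level.

```python
from statistics import mean, median_low, mode, pstdev


def replace_outliers_with_mean(values, cutoff=3.0):
    """Replace values more than `cutoff` standard deviations from the mean with the mean."""
    present = [v for v in values if v is not None]
    mu = mean(present)
    sigma = pstdev(present)  # population standard deviation (assumption)
    return [
        mu if v is not None and sigma > 0 and abs(v - mu) > cutoff * sigma else v
        for v in values
    ]


def impute(values, kind):
    """Fill None entries: mean for continuous, mode for nominal, median for ordinal."""
    present = [v for v in values if v is not None]
    if kind == "continuous":
        fill = mean(present)
    elif kind == "nominal":
        fill = mode(present)
    elif kind == "ordinal":
        fill = median_low(present)  # keeps the fill value an existing level (assumption)
    else:
        raise ValueError(f"unknown field kind: {kind}")
    return [fill if v is None else v for v in values]
```

For example, `impute([1.0, None, 3.0], "continuous")` fills the gap with the mean of the two observed values.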

Missing Values Treatment
• Features with > 50% missing values were excluded.
• Rows with > 50% missing values were excluded.
• Fields with too many unique categories (> 100 categories) were excluded.
• Categorical fields with > 90% of values in a single category were excluded.
• Sparse categories were merged to maximize association with the target.
• Input fields that have only one category after supervised merging were excluded.
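The threshold-based exclusion rules above can be sketched in standard-library Python. The helper names are hypothetical, missing entries are assumed to be `None`, and the dominance share is assumed to be computed over non-missing values. The supervised merging of sparse categories is not covered by this sketch.

```python
from collections import Counter


def keep_feature(values, is_categorical, max_missing=0.5,
                 max_categories=100, max_dominance=0.9):
    """Return True if a column passes the exclusion thresholds listed on the slide."""
    n = len(values)
    present = [v for v in values if v is not None]
    # Exclude features with more than 50% missing values.
    if n == 0 or (n - len(present)) / n > max_missing:
        return False
    if is_categorical:
        counts = Counter(present)
        # Exclude fields with too many unique categories.
        if len(counts) > max_categories:
            return False
        # Exclude fields where a single category holds more than 90% of the values.
        if counts.most_common(1)[0][1] / len(present) > max_dominance:
            return False
    return True


def drop_sparse_rows(rows, max_missing=0.5):
    """Keep only rows (dicts) whose share of missing fields is at most `max_missing`."""
    return [
        r for r in rows
        if r and sum(v is None for v in r.values()) / len(r) <= max_missing
    ]
```

Note that a field with only one category has a dominance share of 1.0, so the same check also covers the single-category rule.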

Analysis
Distribution of incident types

Number of incidents by incident hour and case status

Number of incidents compared by year and day of the week

Vehicle thefts by day of the week

Vehicle body types stolen most frequently per zip code

Feature engineering
Feature engineering is fundamental to the application of machine learning. To improve our initial results, we used Microsoft SQL Server Management Studio (SSMS) to create the following features:
• Day of week and time of day
• Time period for a case to be closed, calculated from the reported and clearance dates
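As an illustration of these derived features, here is a small Python sketch. The project created them in SQL through SSMS, so this is not the original code; the function names `day_of_week` and `days_to_close` are hypothetical.

```python
from datetime import datetime

# Weekday names indexed by datetime.weekday() (Monday is 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def day_of_week(timestamp):
    """Return the weekday name for an incident timestamp."""
    return WEEKDAYS[timestamp.weekday()]


def days_to_close(reported, cleared):
    """Whole days between the reported date and the clearance date."""
    return (cleared - reported).days


reported = datetime(2011, 1, 1, 8, 30)
cleared = datetime(2011, 1, 11, 9, 0)
features = {
    "WeekDay": day_of_week(reported),
    "Days_To_Close": days_to_close(reported, cleared),
}
```

Counting whole days is an assumption; the slides do not state the unit or rounding used for the clearance period.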

Predictive Modeling (1)
Classification: Decision Tree
Tool: SPSS
Target: Case_Status
Tree Depth: 5
Input:
• WeekDay: feature extracted from Incident_From_Date (MM/DD/YYYY HHMM)
• Month: feature extracted from Incident_From_Date (MM/DD/YYYY HHMM)
• TimeFrame: feature extracted from Incident_From_Date (MM/DD/YYYY HHMM)
• Place1: general place type (e.g., Residential, Retail, Open Area)
• Reporting_Agency: agency that took the report (Airport Police, Charlotte Mecklenburg Police)
• Location_Type: general location type (Indoors, Outdoors, Parking Lot, Parking Deck, Other)
• Temp_Range: discretized feature from the mean temperature
• Events and Unemployment_Rate: taken from the augmented Weather and Unemployment datasets

Incident_From_Date (MM/DD/YYYY HHMM) → Discretization → Time Frame
Mean Temperature F → Interval Scaling → Temperature_Range

Time Frame
00:00 - 03:59  Midnight
04:00 - 07:59  Early Morning
08:00 - 11:59  Morning
12:00 - 15:59  Afternoon
16:00 - 19:59  Evening
20:00 - 23:59  Night

Temperature_Range
0 - 30
31 - 40
41 - 50
51 - 60
61 - 70
71 - 80
81 - 90
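The two discretizations on this slide can be sketched in Python as below. The bin boundaries and labels come from the slide; the function names are hypothetical, and rounding the temperature to a whole degree before binning is an assumption.

```python
from datetime import datetime

# Six four-hour time frames, in order, starting at 00:00.
TIME_FRAMES = ["Midnight", "Early Morning", "Morning",
               "Afternoon", "Evening", "Night"]

# Temperature bins in degrees Fahrenheit, inclusive on both ends.
TEMP_BINS = [(0, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90)]


def time_frame(incident_from_date):
    """Map an 'MM/DD/YYYY HHMM' string to its four-hour time frame label."""
    ts = datetime.strptime(incident_from_date, "%m/%d/%Y %H%M")
    return TIME_FRAMES[ts.hour // 4]


def temperature_range(mean_temp_f):
    """Map a mean temperature to its range label, or None if it falls outside all bins."""
    t = round(mean_temp_f)  # rounding to whole degrees is an assumption
    for low, high in TEMP_BINS:
        if low <= t <= high:
            return f"{low} - {high}"
    return None
```

Because every time frame spans exactly four hours, integer division of the hour by 4 gives the index of the label directly.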

Performance Analysis:

Predictive Modeling (2)
Method: Neural Network - Multilayer perceptron
Tool: SPSS
Target: Clearance TimeFrame
Important Predictors:
• Year
• Month
• Incident_Hour
• Numeber_Of_Tweets
• Incidents
• MeanTemperature F
• WeekDay
• MeanSealevelPressure
• Day
• MeanVisibilityMiles

Evaluation: “Day” level

Predictive Modeling (3)
Method: Neural Network - Multilayer perceptron
Tool: SPSS
Target: Number of Incidents
Important Predictors:
• Year
• Month
• Incident_Hour
• Numeber_Of_Tweets
• WeekDay
• MeanTemperature F
• Unemployement
• ClearanceTimeFrame
• Day
• MeanVisibilityMiles
• MeanWindSpeedMPH
• Employment

Evaluation: “Month” level

Evaluation

Accuracy:
• Sample size
• Ratio between the sample size and the number of features used
• The relationship between features
• Initial weights and biases
• Target variable
• Ratio of training set : test set : validation set

Future Work
• Improve the individual model performance
• Add more datasets
• Apply deep-learning methods
• Implement unsupervised learning

Credits: Mansi Dubey, Madlen Ivanova, Preneesh Jayaraj