Using Machine Learning to Analyze Serial Killer Patterns


Mason Garza, Fernando Martinez
University of Texas at Rio Grande Valley | Department of Computer Science

ABSTRACT

Motivated by the Ted Bundy tapes, we applied machine learning to data on serial killers to look for potential patterns. We gathered data from a large private database, the FGCU/Radford Serial Killer database, and tried to find predictive algorithms for the following:

Motive
Number of Victims
Serial Killers’ Sex

We then analyzed the models produced by these algorithms to gain insight into how these features were predicted.

RESULTS AND ANALYSIS

Results

Motive
11-class: Attention, Enjoyment, Anger, Mental Illness, etc.
Binary class: Enjoyment and Other
Accuracy: 64.6% for the 11-class model, 81.6% for the binary model [Fig. 8]

Number of Victims
Range: 2 - 49
Average: 5
Error: 4.7 victims

Serial Killers’ Sex
Predominantly male
Accuracy: 92.1% [Fig. 7]

Analysis

With random forest, by summing the changes in error when a split is made in the trees, we can estimate the importance of a predictor (PI). The features with the greatest PI were:

Binary Motive: whether or not the killer raped [Fig. 4]
Victims: the year of the first kill and, secondarily, the presence of a possession trophy [Fig. 5]
Serial Killers’ Sex: the birth year of the killer, although this model has more predictors of significance [Fig. 6]

We had older models which were more accurate, but when we analyzed them we found that certain features were too directly tied to the label being predicted. For example, the White Male feature was originally included in the Sex model. We prepared new models which eliminated these features.
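The predictor-importance idea described under Analysis can be illustrated with a small from-scratch sketch. The poster does not name the toolkit or the exact error measure used, so everything here is an assumption: the function names (`gini`, `grow`, `predictor_importance`) are hypothetical, the "change in error" is taken to be the drop in Gini impurity at each split, and the data is synthetic rather than drawn from the FGCU/Radford database.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def grow(rows, labels, importance, n_total, rng, depth, n_try):
    """Split one node recursively, crediting each split's impurity drop to its feature."""
    parent = gini(labels)
    if depth == 0 or parent == 0.0:
        return
    best = None  # (gain, feature index, threshold)
    # Random forest step: only a random subset of features is tried per split.
    for f in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t)
    if best is None or best[0] <= 0.0:
        return
    gain, f, t = best
    # Weight the drop by the share of training rows that reach this node.
    importance[f] += gain * len(labels) / n_total
    for keep_left in (True, False):
        idx = [i for i, r in enumerate(rows) if (r[f] <= t) == keep_left]
        grow([rows[i] for i in idx], [labels[i] for i in idx],
             importance, n_total, rng, depth - 1, n_try)

def predictor_importance(rows, labels, n_trees=30, depth=3, n_try=2, seed=0):
    """Bagged trees with random feature subsets; importances are normalized to sum to 1."""
    rng = random.Random(seed)
    importance = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (bagging)
        grow([rows[i] for i in sample], [labels[i] for i in sample],
             importance, n, rng, depth, n_try)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance

# Synthetic stand-in data: four binary features, and the label copies feature 0.
data_rng = random.Random(1)
rows = [[data_rng.randint(0, 1) for _ in range(4)] for _ in range(200)]
labels = [r[0] for r in rows]
pi = predictor_importance(rows, labels)
```

On this toy data, feature 0 receives almost all of the importance because it determines the label outright. That is the same pattern that would flag a feature such as White Male in the Sex model as too directly tied to the target.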
BACKGROUND

The FBI defines serial killing as “a series of two or more murders, committed as separate events, usually, but not always, by one offender acting alone”. This contrasts with mass murderers, who kill their victims in one act, and organized criminals, who kill for an organization.

The literature on this subject is very limited. There are papers about applying machine learning to general crime data, which we read to inspire our methods.

The database originally included 2870 “serial killer” instances with 179 features.

EXPERIMENT

We preprocessed the data down to ≤ 60 features and 1125 instances:

Scraped data from multiple CSV files and removed non-serial killers and older ones.
Formatted the label data.
Deleted features missing more than 20% of their data.
Deleted instances missing data from the remaining features.
Removed non-numerical data and lists.

Motive: 11-class model and binary model [Fig. 1]
Classification random forest with bagging
Error measured with misclassification rate

Victim: number of victims [Fig. 2]
Regression random forest with bagging
Error measured with root mean squared error

Serial Killer Sex: binary [Fig. 3]

CONCLUSION

We can predict whether a serial killer will kill for enjoyment or for another motive with 81.6% accuracy. We can predict any motive with 64.6% accuracy. The binary classification model for motive tends to weigh “Rape” and the sex of the victims greatly.

We were able to predict the number of victims with an error of <5 victims. Our model for predicting the number of victims weighs the year and whether or not the killer kept possession trophies highly.

We were able to predict the sex of the serial killers with 92.1% accuracy. Our model for predicting sex considers the time period the serial killer was born in, usually a year in the early 20th century.

FUTURE WORK

Erase serial killers that started killing before 1930, and see the effect on the serial killers’ sex model.
Try different data replacement methods to incorporate more data into the model.
Include a wider variety of data, such as textual data and lists.
Train more models for different features.

Figure 1: Motive Error
Figure 2: Number of Victims Error
Figure 3: Sex Error
Figure 4: PIs for Motive model
Figure 5: PIs for Victim model
Figure 6: PIs for Sex model
Figure 7: Serial Killers’ Sex Confusion Matrix
Figure 8: Binary Motive Confusion Matrix

References:
[1] McClendon, Lawrence, and Natarajan Meghanathan. "Using machine learning algorithms to analyze crime data." Machine Learning and Applications: An International Journal (MLAIJ) 2.1 (2015): 1-12.
[2] Kim, Suhong, et al. "Crime Analysis Through Machine Learning." 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). IEEE, 2018.
[3] Radford FGCU Database
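The preprocessing steps and the two error measures listed under EXPERIMENT can be sketched in a few lines. This is only an illustration under stated assumptions: the poster does not show its code, so the helper names (`preprocess`, `rmse`, `misclassification_rate`), the use of `None` to mark a missing value, and the example records (made-up values, not rows from the database) are all hypothetical.

```python
import math

def preprocess(records, max_missing=0.20):
    """Drop sparse features, then incomplete instances, then non-numeric features."""
    columns = list(records[0].keys())  # assumes every record has the same keys
    n = len(records)
    # 1. Delete features missing more than `max_missing` of their values.
    kept = [c for c in columns
            if sum(r[c] is None for r in records) / n <= max_missing]
    # 2. Delete instances missing data in any remaining feature.
    rows = [r for r in records if all(r[c] is not None for c in kept)]
    # 3. Remove non-numerical features (text, lists).
    numeric = [c for c in kept
               if all(isinstance(r[c], (int, float)) for r in rows)]
    return [{c: r[c] for c in numeric} for r in rows]

def rmse(actual, predicted):
    """Root mean squared error, the regression measure named in the poster."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def misclassification_rate(actual, predicted):
    """Fraction of predictions that differ from the true class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

# Made-up records for illustration only.
records = [
    {"birth_year": 1946, "victims": 30, "alias": "a", "notes": None},
    {"birth_year": 1950, "victims": 5, "alias": "b", "notes": None},
    {"birth_year": None, "victims": 3, "alias": "c", "notes": "text"},
    {"birth_year": 1960, "victims": 2, "alias": "d", "notes": None},
    {"birth_year": 1938, "victims": 4, "alias": "e", "notes": None},
]
clean = preprocess(records)
```

Here `notes` is dropped for being mostly missing, the third record is dropped because its `birth_year` is missing, and `alias` is dropped as non-numerical, leaving four instances with two features. The step order follows the poster's list; removing non-numerical features before dropping incomplete instances would be a different design choice that could retain more rows.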