IS 6833 ANALYTICS ASSIGNMENT: PREDICTING HOMICIDE RATE IN ST. LOUIS CITY FOR 2013
Uzair Bhatti, Dan Diecker, Puji Bandi, Latoya Lewis
DEFINITION
Homicide is the killing of one human being by another. Homicide is a general term; it includes murder, manslaughter, and other criminal homicides as well as noncriminal killings. Murder is the crime of intentionally and unjustifiably killing another person. In the U.S., first-degree murder is a homicide committed with premeditation or in the course of a serious felony. Manslaughter is commonly divided into two types. Voluntary manslaughter encompasses any homicide resulting from an intentional act done without malice or premeditation, in the heat of passion or on sudden provocation. Involuntary manslaughter is variously defined in different jurisdictions but often includes an element of unlawful recklessness or negligence. Noncriminal homicides include killings committed in defense of oneself or another and deaths resulting from accidents caused by persons engaged in lawful acts.
HOMICIDE OVERVIEW IN ST. LOUIS
Total murders per year:
2008: 167
2009: 143
2010: 144
2011: 113
2012: 113
St. Louis is ranked the fourth most dangerous city in the US for murders.
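The five annual totals above can be summarized in a few lines of Python. This sketch only restates the slide's figures (year-over-year change, mean per year, and overall percentage change from 2008 to 2012); it is not part of the team's regression model.

```python
# Annual murder totals for St. Louis City, as listed on the overview slide.
totals = {2008: 167, 2009: 143, 2010: 144, 2011: 113, 2012: 113}

years = sorted(totals)

# Year-over-year change in the number of murders.
changes = {y: totals[y] - totals[y - 1] for y in years[1:]}

# Average number of murders per year over the period.
mean_per_year = sum(totals.values()) / len(totals)

# Percentage change from the first to the last year in the series.
pct_change = 100 * (totals[years[-1]] - totals[years[0]]) / totals[years[0]]

for year, delta in changes.items():
    print(f"{year}: {delta:+d}")
print(f"Mean per year: {mean_per_year:.1f}")
print(f"Change {years[0]}-{years[-1]}: {pct_change:.1f}%")
```

The series drops sharply in 2009 and 2011 and is flat in 2010 and 2012, which is worth keeping in mind when reading any single-year prediction.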
OUR APPROACH/OBJECTIVE
Data Segmentation
We collected data by neighborhood and district.
St. Louis City consists of 9 districts, 79 neighborhoods, and 3 patrol zones.
Data Analysis
Formulated four variables that correlate with the homicide rates in neighborhoods and districts.
Analyzed and depicted the relationship between these four variables and homicide occurrence.
Variables
Organized the data in Excel using pivot tables.
Analyzed the data by year, month, and zip code.
Built a regression analysis from all the data collected to predict the murder rate for 2013.
Conclusion
The ultimate goal is to predict the number of homicides and the likely locations of unlawful homicides in St. Louis City for 2013.
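The pivot-table step (counting homicides by year, month, and zip code) can be sketched outside Excel with Python's `collections.Counter`. The incident records below are hypothetical placeholders with made-up zip labels, not St. Louis data; the sketch assumes each incident is one record with a year, a month, and a zip code.

```python
from collections import Counter

# Hypothetical incident records: (year, month, zip code).
# The zip labels are placeholders, not real St. Louis zip codes.
incidents = [
    (2011, 1, "ZIP_A"),
    (2011, 1, "ZIP_B"),
    (2011, 7, "ZIP_A"),
    (2012, 7, "ZIP_A"),
    (2012, 12, "ZIP_C"),
]

# Each Counter plays the role of a pivot table with "count of incidents" as the value.
by_year = Counter(year for year, _, _ in incidents)
by_month = Counter(month for _, month, _ in incidents)
by_zip = Counter(zip_code for _, _, zip_code in incidents)

# Two-way pivot: rows = year, columns = zip code.
by_year_zip = Counter((year, zip_code) for year, _, zip_code in incidents)

print(by_year)
print(by_zip.most_common())
```

A `Counter` keyed on a tuple gives the same counts as a two-way pivot table; keys that never occur simply return 0.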
MURDER FOR PAST FOUR YEARS
MURDER DISTRIBUTION BY ZIP CODE
MURDERS BY MONTH
VARIABLES CONSIDERED
As a team, Group A considered many variables to identify potential relationships with homicide. Because homicides occur largely at random, these variables can only indicate potential relationships; they do not establish causality.
Variables:
Time: year, month
Education (high school diploma)
Home/renter vacancy
Income
Unemployment
Age/gender
Race
Location: districts, zip codes, neighborhoods, and streets
Poverty
Drugs
Gangs/violence
VARIABLES USED TO PREDICT NUMBER OF HOMICIDES AND LOCATION
Variables used to develop the regression model:
Median household income: determined median household income by zip code
Education: determined by the average high school graduation rate by zip code
Vacancy percentage of rented/owned houses: determined average home vacancy by zip code
Unemployment rate
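Because each variable comes from a different source keyed by zip code, the sources have to be combined into one row per zip code before a regression can be run. A minimal sketch, assuming each source is a mapping from zip code to value and that zip codes missing from any source are dropped (an inner join; imputing missing values would be an alternative). All zip labels and values are hypothetical placeholders.

```python
# Hypothetical per-zip tables, one per data source (placeholder values).
income = {"ZIP_A": 31000, "ZIP_B": 45000, "ZIP_C": 27000}
grad_rate = {"ZIP_A": 24.0, "ZIP_B": 30.5}  # ZIP_C missing from this source
vacancy = {"ZIP_A": 18.2, "ZIP_B": 9.7, "ZIP_C": 22.1}
unemployment = {"ZIP_A": 14.1, "ZIP_B": 8.3, "ZIP_C": 16.0}

# Column order of the combined rows: income, graduation rate, vacancy, unemployment.
sources = [income, grad_rate, vacancy, unemployment]

# Inner join on zip code: keep only zip codes present in every source.
common_zips = sorted(set.intersection(*(set(s) for s in sources)))

# One row of predictor values per zip code, in the column order above.
rows = {z: [s[z] for s in sources] for z in common_zips}

print(common_zips)
print(rows)
```

Dropping incomplete zip codes keeps the model honest but shrinks the sample, which is one concrete way the data-availability constraints listed later in this deck show up.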
PREDICTION APPROACH
Based on the available data, we chose a regression model to establish the correlation between the data gathered on St. Louis City and the number of homicides.
The variables used have an established potential relationship with the number of homicides (Source 5).
We used regression analysis to show the relationship between the significant variables and built a regression model to predict future homicides.
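The SUMMARY OUTPUT tables in this deck come from Excel's regression tool, which fits an ordinary least squares (OLS) model. As an illustration only (not the team's workbook), the same kind of fit can be sketched without dependencies by solving the normal equations (XᵀX)b = Xᵀy. The function names `fit_ols` and `solve` are hypothetical; for larger or ill-conditioned data a QR-based solver would be more numerically stable than the normal equations.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] as floats.
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] minimizing the squared error of y on rows."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)
```

With one row of predictor values per observation (for example income, graduation rate, vacancy, and unemployment) and y holding the matching homicide counts, `fit_ols(rows, y)` returns the intercept followed by one coefficient per predictor, in column order.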
CONSTRAINTS FACING THE MODEL
Inconsistent data availability
Data compatibility issues when converting zip codes to districts and districts to neighborhoods
Inadequate data for the required variables
Lack of current data
Each department collects data based on different geographic specifications
REGRESSION OUTPUT WITH ALL VARIABLES
The regression output indicates that the number of homicides is correlated with fluctuations in high school graduation rates.
The correlation of homicides with mean household income, unemployment, and the number of vacant dwellings is weak.
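Whether a single variable's relationship with homicides is "weak" or "strong" can be checked one variable at a time with the Pearson correlation coefficient r, which ranges from -1 to 1; a negative r corresponds to an inverse relationship. This is a minimal sketch; the cutoff for calling a correlation weak is a judgment call that the slide does not state.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson_r(graduation_rates, homicide_counts)` and the same for each other predictor (all hypothetical variable names) gives one number per variable to compare; values near 0 support the "weak" reading.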
[Excel SUMMARY OUTPUT, all variables: Regression Statistics (Standard Error; Observations: 95); ANOVA table (df, SS, MS, F, Significance F for Regression, Residual, Total); coefficient table (Coefficients, Standard Error, t Stat, P-value, Lower/Upper 95%) for Intercept, Mean Household Income, Graduation Rate, Unemployment Rate, and Vacancy]
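The Regression Statistics and ANOVA figures in an Excel SUMMARY OUTPUT follow from a few sums of squares. The sketch below assumes an OLS fit that includes an intercept, with n observations and k predictors; the function name `regression_stats` is hypothetical. It does not compute Significance F, which is the upper-tail probability of the F statistic with (k, n - k - 1) degrees of freedom; in Python that is available as `scipy.stats.f.sf(F, k, n - k - 1)` if SciPy is installed.

```python
import math


def regression_stats(y, y_hat, k):
    """Excel-style summary statistics for an OLS fit with an intercept and k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_reg = ss_total - ss_resid  # holds for OLS fits that include an intercept
    df_reg, df_resid = k, n - k - 1
    r2 = 1 - ss_resid / ss_total
    return {
        "Multiple R": math.sqrt(r2),
        "R Square": r2,
        # Penalizes R Square for the number of predictors used.
        "Adjusted R Square": 1 - (1 - r2) * (n - 1) / df_resid,
        # Standard deviation of the residuals.
        "Standard Error": math.sqrt(ss_resid / df_resid),
        # Ratio of mean squares from the ANOVA table.
        "F": (ss_reg / df_reg) / (ss_resid / df_resid),
    }
```

Adjusted R Square is the figure to compare when predictors are dropped, since plain R Square never decreases when a variable is added.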
REGRESSION OUTPUT WITH DROPPED VARIABLES
A more accurate estimate of homicide numbers using the more strongly correlated data:
[Excel SUMMARY OUTPUT: Regression Statistics (Multiple R, R Square, Adjusted R Square, Standard Error; Observations: 95); ANOVA table (df, SS, MS, F, Significance F for Regression, Residual, Total), with Significance F in the E-10 range; coefficient table (Coefficients, Standard Error, t Stat, P-value, Lower/Upper 95%) for Intercept and Graduation Rate]
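With only one predictor left (graduation rate), the least squares line has a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). A minimal sketch with a hypothetical function name; a negative slope would match the inverse relationship between graduation rate and homicides described in the recommendations.

```python
def fit_simple(x, y):
    """Closed-form least squares for one predictor: returns (intercept, slope)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope
```

This gives the same coefficients as a multiple regression restricted to that single column, so it is a quick cross-check on the reduced model.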
REGRESSION MODEL EQUATION
The number of homicides in 2013 can be predicted from the statistical model.
A combination of variables can be used to predict the number of homicides based on the high school graduation rate, home/rental vacancy, and the unemployment rate.
Because Significance F is less than 0.05, we can still claim that this combination of variables can be used to predict 2013 homicides.
The past-5-year projection for high school degree attainment is 26.5%, whereas the past-3-year projection is 26.6%.
We therefore predict 109 homicides in 2013.
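Once the coefficients are known, a prediction is just the fitted linear equation evaluated at the projected predictor values. A minimal sketch with a hypothetical function name; the coefficients must come from the team's regression output, which this sketch does not reproduce.

```python
def predict_homicides(coefs, features):
    """Evaluate a fitted linear model: coefs = [b0, b1, ..., bk], features = [x1, ..., xk]."""
    intercept, *slopes = coefs
    if len(slopes) != len(features):
        raise ValueError("need exactly one feature value per slope coefficient")
    return intercept + sum(b * x for b, x in zip(slopes, features))
```

Calling it with the fitted coefficients and each projected graduation rate (26.5% and 26.6%) shows how sensitive the 2013 estimate is to the choice of projection window.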
PREDICTION
Based on current trends in the education levels of people living in these areas, this model predicts a decrease in the number of homicides for 2013.
Studies show that the graduation rate for St. Louis City has risen significantly (to a current rate of 26.5%).
Based on past observations of murder occurrence, we predict that zip code … will have the highest murder rate, followed by … and …, respectively.
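Ranking zip codes by their historical homicide counts is one simple way to turn past observations into a location prediction. The counts and zip labels below are hypothetical placeholders, not the deck's data.

```python
from collections import Counter

# Hypothetical historical homicide counts by zip code (placeholder labels and values).
history = Counter({"ZIP_A": 41, "ZIP_B": 17, "ZIP_C": 29, "ZIP_D": 8})

# Rank zip codes from highest to lowest past count.
# most_common() sorts by count; equal counts keep their insertion order.
ranking = [zip_code for zip_code, _ in history.most_common()]
top_three = ranking[:3]

print(top_three)
```

This ranking assumes the spatial pattern of past years carries into 2013; it does not use the regression model at all.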
RECOMMENDATIONS
Education level is a well-recorded data source and can be used to estimate future trends in homicides.
The high school graduation rate has an inverse relationship with the homicide rate.
Future data gathering should be limited to data points that are strongly correlated with homicides and easy to gather.
Benefits:
Ease of data maintenance
Easier "what if?" analysis when there are fewer data points to consider
Ease of use and timeliness of predictions: quicker to respond and to deploy resources where needed
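With a single-predictor model, a "what if?" analysis reduces to evaluating the fitted line over a range of hypothetical graduation rates. A minimal sketch; the function name is hypothetical and the intercept and slope must be supplied from an actual fit.

```python
def what_if(intercept, slope, grad_rates):
    """Predicted homicides for each hypothetical graduation rate (in percent)."""
    return {rate: intercept + slope * rate for rate in grad_rates}
```

For example, `what_if(b0, b1, [25.0, 26.5, 28.0, 30.0])` with fitted values b0 and b1 shows how many fewer homicides the model expects for each percentage-point gain in graduation rate; with fewer predictors there are fewer inputs to vary, which is the benefit the recommendation points to.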
REFERENCES
y_facts.xhtml
y_facts.xhtml
Missouri.html (homicide overview in St. Louis)
Missouri.html (4th most dangerous city in the US for murders)
tes.pdf