Which way will 2016 swing? BIT 5534 Group 3 Final Project

Virginia Tech, Spring 2015
Mike King, Nicole Oliver, Ravi Kosuri

Agenda
Background, Problem Statement, Data Mining, Modeling, Results
This presentation starts with the background and the problem statement, then covers our data mining approach, and finally presents the specific models we created to predict the 2016 election results.

Background
Politico, a popular political journalism website, proposed a positive correlation between population density and Democratic voting. We develop a predictive model that uses factors such as population density and demographic data to predict the outcome of the 2016 election. Potential practical applications: private and public sector entities that conduct government business can plan ahead, with impacts ranging across transportation, healthcare, defense, taxes, and the judiciary, to name a few; and parties can spend their campaign dollars in the counties where they have the highest chance of winning.

The background for this issue centers on an article published by Politico. The article suggested a relationship between population density and voting patterns: if a city or county passes the benchmark of 800 persons per square mile, it votes Democratic at a rate of 66% or better. Conversely, cities or counties with fewer than 800 persons per square mile vote Republican at a rate of 66% or better. The finding was based on 2012 and 2013 election results, and one of the goals of this project was to explore the relationship behind this benchmark and apply it in a predictive model for the upcoming 2016 national election. Additional research suggested that gender and ethnicity may also have strong predictive power for the outcome of a national election, so we calculated these values from previous census data and projected them to 2016 to strengthen our predictive models. Our research aims to develop predictive models that turn factors like population density and demographic data into a prediction for the 2016 presidential election. The results have a range of practical applications; for example, private and public sector entities that conduct government business can plan ahead for the outcome of a national election. The impact could span many sectors, such as transportation, healthcare, defense, taxes, and the judiciary. This information can also help political parties decide where to spend their campaign dollars, for instance in counties in battleground states where their influence would be most effective and could swing the state's electoral college votes.
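As a rough illustration, the Politico benchmark can be written as a simple threshold rule on population density. This is a minimal sketch rather than part of the original project: the function names and the two counties are hypothetical, and a county at exactly 800 persons per square mile is treated as below the benchmark (an assumption, since the article only says a county must "pass" 800).

```python
# Minimal sketch of the density benchmark attributed to Politico:
# counties above 800 persons per square mile lean Democratic,
# counties at or below it lean Republican. All data here is hypothetical.

POLITICO_THRESHOLD = 800.0  # persons per square mile, from the article


def density(population: float, land_area_sq_mi: float) -> float:
    """Population density in persons per square mile."""
    if land_area_sq_mi <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_sq_mi


def politico_lean(persons_per_sq_mi: float, threshold: float = POLITICO_THRESHOLD) -> str:
    """Party the benchmark predicts; exactly at the threshold counts as below (assumption)."""
    return "DEM" if persons_per_sq_mi > threshold else "REP"


if __name__ == "__main__":
    # Hypothetical counties: (name, population, land area in sq mi)
    counties = [("County A", 450_000, 300.0), ("County B", 60_000, 500.0)]
    for name, pop, area in counties:
        d = density(pop, area)
        print(f"{name}: {d:.0f} persons/sq mi -> {politico_lean(d)}")
```

A rule like this is the baseline the project set out to test; the models later in the presentation let the data choose the break point instead of fixing it at 800.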

Problem Statement
Test the hypothesis proposed by Politico about the relationship between population density and voting patterns. Explore the additional effect of demographics on voting patterns. Develop a predictive model for the 2016 presidential election in the swing states of Ohio, Virginia, Nevada, Colorado, Iowa and Florida based on variables such as population density and demographic data.

Our problem statement covers testing the hypothesis proposed by Politico about the relationship between population density and voting patterns, and determining whether population density is a viable variable for predicting the 2016 election. We also explore the additional effect of various demographic measures on these voting patterns, to determine whether demographic values have predictive power in a national election. Our approach then develops a predictive model for the 2016 election, drawing on additional research that identified the 2016 battleground states as Ohio, Virginia, Nevada, Colorado, Iowa, and Florida. Our predictive models use population density and demographic data as independent variables to predict the voting behavior of the counties in these battleground states.

Data Mining
Data Sources, Data Description, Pre-processing Data
Data Sources: public websites such as census.gov for population demographics and voting results. Data Description: demographic data and 2012 voting results for each county; population data and land area of each county; focusing only on the swing states of VA, OH, FL, NV, CO, IA. Pre-processing Data: projecting population growth for each county in 2016; calculating the projected population density for each county; calculating the projected demographic makeup of each county.

For the data mining activity itself, we started with publicly available data from websites such as census.gov and politico.com for population demographics and voting results. Once this data was collected, we recorded each variable and its type in a Variable Dictionary that documents the details of each indicator. This began with the demographic data and 2012 voting results for each county in these swing states. In addition, we collected population data and land area for each county to create variables such as 2012 population density. Our research focused only on the battleground states of VA, OH, FL, NV, CO, and IA, since the importance of swing states has been magnified in previous elections. Pre-processing applied current growth rates from previous census data to project the population of each county in 2016. With these projections, we calculated the projected population density for each county as a new variable, followed by the projected demographic makeup of each county.
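The pre-processing steps can be sketched as follows, assuming each county (and each demographic group within it) keeps a constant compound annual growth rate and that land area does not change. The presentation does not say exactly how growth rates were applied, so the compounding model, the function names, and all figures below are illustrative assumptions, not the project's census data.

```python
from __future__ import annotations

# Sketch of the pre-processing: project 2016 population, density, and
# demographic shares. Assumes constant compound annual growth (hypothetical).


def annual_growth_rate(pop_start: float, pop_end: float, years: int) -> float:
    """Constant compound annual growth rate between two population estimates."""
    if pop_start <= 0 or years <= 0:
        raise ValueError("need a positive starting population and year span")
    return (pop_end / pop_start) ** (1.0 / years) - 1.0


def project_population(pop_base: float, rate: float, years_ahead: int) -> float:
    """Carry a base population forward at a constant annual growth rate."""
    return pop_base * (1.0 + rate) ** years_ahead


def project_density(pop_projected: float, land_area_sq_mi: float) -> float:
    """Projected persons per square mile; land area is assumed unchanged."""
    return pop_projected / land_area_sq_mi


def project_group_shares(start: dict[str, float], end: dict[str, float],
                         years: int, years_ahead: int) -> dict[str, float]:
    """Project each demographic group separately, then convert counts to shares."""
    projected = {
        group: project_population(end[group],
                                  annual_growth_rate(start[group], end[group], years),
                                  years_ahead)
        for group in end
    }
    total = sum(projected.values())
    return {group: count / total for group, count in projected.items()}


if __name__ == "__main__":
    # Hypothetical county: 100,000 people in 2010, 104,040 in 2012, 200 sq mi.
    rate = annual_growth_rate(100_000, 104_040, years=2)  # about 2% per year
    pop_2016 = project_population(104_040, rate, years_ahead=4)
    print(f"2016 population: {pop_2016:,.0f}")
    print(f"2016 density: {project_density(pop_2016, 200.0):.1f} persons/sq mi")

    shares = project_group_shares(
        start={"group_a": 80_000, "group_b": 20_000},  # 2010 counts (hypothetical)
        end={"group_a": 80_800, "group_b": 23_240},    # 2012 counts (hypothetical)
        years=2, years_ahead=4,
    )
    print({g: round(s, 3) for g, s in shares.items()})
```

Projecting each group with its own rate, rather than scaling today's shares, lets a faster-growing group gain share by 2016, which is what a projected demographic makeup needs to capture.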

Modeling
Techniques for generating predictive models: Nominal Logistic Regression, Neural Networks, Decision Trees
Our modeling covered three of the most common data mining models, starting with nominal logistic regression. For this approach, we created a binary dummy variable, “Party”, which indicates whether a county's majority vote was Democratic or Republican. Next came Neural Networks, chosen for the high predictive power of their hidden-layer nodes. The third approach was a Decision Tree, chosen for the strong exploratory data analysis and transparent results it provides.
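To make the “Party” dummy variable and the logistic model concrete, here is a from-scratch sketch: a one-predictor logistic regression on log population density, fitted by batch gradient descent. This stands in for, and is not, the nominal logistic fit the team ran in JMP; the function names, learning rate, epoch count, and county figures are all hypothetical.

```python
from __future__ import annotations

import math

# Sketch: build the binary 'Party' target and fit P(DEM) = sigmoid(b0 + b1 * x),
# where x is log10 population density. Data below is hypothetical.


def party_dummy(dem_votes: int, rep_votes: int) -> int:
    """Binary 'Party' target: 1 if Democrats won the county majority, else 0."""
    return 1 if dem_votes > rep_votes else 0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: list[float], ys: list[int],
                 lr: float = 0.1, epochs: int = 20_000) -> tuple[float, float]:
    """Minimize the average log loss by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


if __name__ == "__main__":
    # Hypothetical counties: (persons per sq mi, Democratic votes, Republican votes)
    counties = [(50, 3_000, 7_000), (120, 8_000, 12_000), (300, 20_000, 26_000),
                (600, 41_000, 39_000), (900, 44_000, 46_000), (1500, 90_000, 60_000),
                (3000, 150_000, 90_000), (7000, 300_000, 120_000)]
    xs = [math.log10(d) for d, _, _ in counties]  # density on a log scale
    ys = [party_dummy(dem, rep) for _, dem, rep in counties]
    b0, b1 = fit_logistic(xs, ys)
    for d in (100, 1039, 5000):
        print(f"{d:>5} persons/sq mi -> P(DEM) = {sigmoid(b0 + b1 * math.log10(d)):.2f}")
```

Using log density is a design choice here, since county densities span several orders of magnitude; the presentation does not say whether the team transformed density.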

Model Performance - Regression
For the logistic regression model, we used a Receiver Operating Characteristic (ROC) curve to measure performance. The Area Under the Curve (AUC) for this ROC curve was a fair result. An AUC of 0.5 corresponds to random guessing, so values between 0.5 and 1 indicate useful discrimination, with values closer to 1 indicating more accurate classification.
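The AUC itself has a simple interpretation: it is the probability that a randomly chosen Democratic county receives a higher predicted P(DEM) than a randomly chosen Republican county, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the scores are hypothetical, and JMP computes its reported AUC internally.

```python
from __future__ import annotations

# Sketch: AUC as the fraction of (positive, negative) pairs ranked correctly.
# O(n^2) in the number of counties, which is fine at county scale.


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted P(DEM) for four counties; 1 = Democratic majority
    p_dem = [0.9, 0.4, 0.5, 0.2]
    party = [1, 1, 0, 0]
    print(roc_auc(p_dem, party))
```

For a two-level target, scoring REP with 1 - P(DEM) compares exactly the same pairs, so the DEM and REP curves give the same AUC.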

Model Performance - Neural Networks
For the Neural Network model, we also used a ROC curve to measure performance. The AUC for both target levels, DEM and REP, was also a fair result.

Model Performance - Decision Trees
For the Decision Tree model, we used another ROC curve to measure performance. The AUC values for the model as a whole and for the Democrat and Republican levels were good results. Since values closer to 1 indicate more accurate classification, the decision tree performed best of the three model types.

Model Comparison
The Decision Tree model gives the best fit and predictive performance compared to Neural Networks and Regression.
In a direct model comparison using JMP, we compared the R-square values of each modeling approach. The Decision Tree has a higher R-square than the logistic regression model or the neural network, along with a lower root-mean-square error (RMSE) and a lower misclassification rate.
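Two of these comparison metrics are easy to reproduce from each model's predicted probabilities. The sketch below uses one common definition of each: RMSE between the observed 0/1 “Party” value and the predicted P(DEM) (for a two-level target this equals 1 minus the predicted probability of the level that actually occurred), and the share of counties whose predicted party, using a 0.5 cutoff, is wrong. JMP's report may differ in detail; the function names and predictions are hypothetical.

```python
from __future__ import annotations

import math

# Sketch: RMSE and misclassification rate from predicted P(DEM) and the
# observed 0/1 'Party' value of each county. Predictions are hypothetical.


def rmse(p_dem: list[float], y: list[int]) -> float:
    """Root-mean-square error between observed outcomes and predicted P(DEM)."""
    if not y or len(p_dem) != len(y):
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((yi - pi) ** 2 for pi, yi in zip(p_dem, y)) / len(y))


def misclassification_rate(p_dem: list[float], y: list[int], cutoff: float = 0.5) -> float:
    """Share of counties whose predicted party (P(DEM) > cutoff) differs from the observed one."""
    if not y or len(p_dem) != len(y):
        raise ValueError("inputs must be non-empty and the same length")
    wrong = sum(1 for pi, yi in zip(p_dem, y) if int(pi > cutoff) != yi)
    return wrong / len(y)


if __name__ == "__main__":
    observed = [1, 0, 0, 1]
    # Two hypothetical competing models scored on the same four counties
    models = {"model_1": [0.9, 0.2, 0.6, 0.4], "model_2": [0.8, 0.3, 0.4, 0.7]}
    for name, preds in models.items():
        print(name, round(rmse(preds, observed), 3), misclassification_rate(preds, observed))
```

Lower is better for both metrics, so in this toy comparison model_2 would be preferred, in the same way the Decision Tree was preferred in the project.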

Results
Population density has a statistically significant impact on county voting patterns. Population densities above 1039 people per square mile lean Democrat with almost 80% probability. The percentage of Black and Hispanic population also contributes significantly to county voting patterns; higher Black and Hispanic populations favor Democrats. Surprising result: higher Hispanic male populations in lower population density areas favor Republicans. Decision Trees provided the best predictive performance. Florida, Ohio and Iowa are predicted to flip to the Republican column in 2016, while Virginia, Colorado and Nevada are expected to stay Blue.

Our results show that population density does have a statistically significant impact on county voting patterns. They were not identical to the original study, which cited 800 persons per square mile as the break point: the first break in our dataset came at 1039 persons per square mile, reasonably close to the original figure. Above this break point, counties lean Democratic with almost 80% probability, which is consistent with the original study's figure of 66% or better. We also found that the percentages of Black and Hispanic population contribute significantly to county voting patterns, with higher percentages favoring Democrats. One result was surprising: in lower population density areas, higher Hispanic male populations favor Republicans. This reverses the overall Hispanic trend but aligns with the original population density finding. Comparing the predictive modeling approaches, Decision Trees provided the best predictive performance on common metrics: higher R-square, lower root-mean-square error (RMSE), lower misclassification rate, and superior performance on the ROC curve. Based on the Decision Tree model and the independent variables above, Florida, Ohio and Iowa are predicted to flip to the Republican column in 2016, while Virginia, Colorado and Nevada are expected to stay Blue.
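A break point such as 1039 persons per square mile is what a decision tree's first split on density produces: it tries every cut between adjacent observed densities and keeps the one that best separates Democratic from Republican counties. JMP's Partition platform ranks candidate splits by its own LogWorth statistic; the sketch below swaps in Gini impurity, a different but common criterion, so it illustrates the idea rather than reproducing JMP. The function names and county data are hypothetical.

```python
from __future__ import annotations

# Sketch: find the single best density split (a one-level decision tree)
# using weighted Gini impurity. Labels: 1 = Democratic majority, 0 = Republican.


def gini(labels: list[int]) -> float:
    """Gini impurity of a binary label list: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_density_split(densities: list[float],
                       labels: list[int]) -> tuple[float, float, float]:
    """Return (threshold, weighted impurity, P(DEM) above threshold) for the best split."""
    pairs = sorted(zip(densities, labels))
    xs = [d for d, _ in pairs]
    n = len(pairs)
    best: tuple[float, float, float] | None = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no cut possible between equal densities
        threshold = (xs[i - 1] + xs[i]) / 2.0  # midpoint between neighbors
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score, sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct density values")
    return best


if __name__ == "__main__":
    # Hypothetical counties: persons per sq mi and observed Party
    densities = [50, 120, 300, 600, 900, 1500, 3000, 7000]
    party = [0, 0, 0, 0, 1, 0, 1, 1]
    threshold, impurity, p_dem_above = best_density_split(densities, party)
    print(f"split at {threshold:.0f} persons/sq mi; P(DEM) above = {p_dem_above:.2f}")
```

The share of Democratic counties above the chosen split plays the role of the "almost 80% probability" reported for the project's 1039 break point.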

Predicted results for 2016
State | Regression | Decision Trees
Virginia | Rep (55% to 44%) | Dem (51% to 49%)
Ohio | Dem (82% to 17%) | Rep (73% to 27%)
Iowa | Rep (52% to 48%) | Rep (77% to 23%)
Florida | Rep (59% to 41%) | Rep (56% to 44%)
Colorado | Dem (53% to 47%) | Dem (64% to 36%)
Nevada | Dem (54% to 46%) | Dem (60% to 40%)
The graph on this slide displays our outcome percentages for each battleground state.

Conclusion
Statistically significant relationship between population density and voting patterns. Generated a predictive model using population density and demographic information. Predicted the 2016 presidential race outcome in 6 swing states.

Swing counties by state:
Virginia: Loudoun County, Poquoson city, Chesterfield County, Emporia city, Galax city, Bedford city
Florida: Sarasota County, Palm Beach County
Colorado: Jefferson County, Arapahoe County
Ohio: Butler County
Iowa: Polk County
Also listed: Lee County

In conclusion, we found that population density does have a statistically significant impact on county voting patterns. We generated models to investigate and support this relationship, and provided a prediction for the 2016 presidential race in six swing states. The counties listed above are the closest to the swing line in the battleground states, so they would be the most sensitive to a change in population density and the most likely to tip toward either party. These counties will most likely see the most campaign dollars during the upcoming election because of their potential to swing the state, and potentially the national election result. Thank you.
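One simple way to shortlist counties "closest to the swing line" is to rank them by how far their projected density sits from the break point. The presentation does not describe how the swing counties were selected, so the absolute-distance ranking, the function name, and the county data below are hypothetical; only the 1039 persons per square mile break point comes from the Results.

```python
from __future__ import annotations

# Sketch: rank counties by distance between projected density and the break
# point. Distance metric and data are assumptions for illustration only.

BREAK_POINT = 1039.0  # persons per sq mi, first break reported in the Results


def closest_to_break(counties: dict[str, float], k: int = 3,
                     break_point: float = BREAK_POINT) -> list[str]:
    """Return the k county names whose density is nearest the break point."""
    return sorted(counties, key=lambda name: abs(counties[name] - break_point))[:k]


if __name__ == "__main__":
    # Hypothetical projected 2016 densities (persons per sq mi)
    projected = {"County A": 1000.0, "County B": 2500.0, "County C": 1100.0,
                 "County D": 300.0, "County E": 900.0}
    print(closest_to_break(projected))
    print(closest_to_break(projected, break_point=800.0))  # Politico's benchmark
```

An absolute distance treats 100 persons per square mile the same everywhere; ranking by a ratio or log difference would be an equally reasonable alternative.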
