Which way will 2016 swing? BIT 5534 Group 3 Final Project

Presentation transcript:

Which way will 2016 swing? BIT 5534 Group 3 Final Project Virginia Tech Spring 2015 Mike King, Nicole Oliver, Ravi Kosuri

Agenda
Background
Problem Statement
Data Mining
Modeling
Results

The agenda for this presentation starts with the background, followed by the problem statement, our data mining approach, and then the specific models that were created to predict this election’s results.

Background
Politico, a popular political journalism website, proposed a positive correlation between population density and voting patterns.
Goal: develop a predictive model that uses factors such as population density and demographic data to predict the outcome of the 2016 elections.
Potential practical applications:
Private and public sector entities that conduct government business can plan ahead. The impact can be wide ranging, covering transportation, healthcare, defense, taxes, and the judiciary, to name a few.
Parties can spend their campaign dollars in counties where they have the highest chance of winning.

The background for this project centers on an article published by Politico, a popular political journalism website. The article suggested a relationship between population density and voting patterns: if a city or county passes the benchmark of 800 persons per square mile, it votes Democratic at 66% or better. The opposite was also found to hold: counties or cities with fewer than 800 persons per square mile vote Republican at 66% or better. The finding was based on 2012 and 2013 election results, and one of the goals of this paper was to explore this relationship and apply it to a predictive model for the upcoming national election in 2016. Additional research suggested that gender and ethnicity may also have great predictive power for the outcome of a national election, so their values were calculated from previous census data and projected to 2016 to bolster our predictive models. Our research is intended to develop predictive models that turn factors like population density and demographic data into a prediction for the 2016 presidential election. This research could have a range of practical applications; for example, private and public sector entities that conduct government business could plan ahead for the result of a national election.
The impact could cover a wide range of sectors, such as transportation, healthcare, defense, taxes, and the judiciary, to name a few. This information can also help political parties decide where to spend their campaign dollars, such as counties in battleground states where their influence could be most effective and swing the state's Electoral College votes.

Problem Statement
Test the hypothesis proposed by Politico about the relationship between population density and voting patterns.
Explore the additional effect of demographics on voting patterns.
Develop a predictive model for the 2016 presidential election in the swing states of Ohio, Virginia, Nevada, Colorado, Iowa and Florida based on variables such as population density and demographic data.

Our problem statement starts with testing the hypothesis proposed by Politico about the relationship between population density and voting patterns, to determine whether population density is a viable variable for predicting the 2016 election. We also want to explore the additional effect of various demographic measures on these voting patterns, to determine whether demographic values have predictive power for a national election. We then develop a predictive model for the 2016 election, using additional research that identified the battleground states for 2016 as Ohio, Virginia, Nevada, Colorado, Iowa, and Florida. Our predictive models incorporate the independent variables of population density and demographic data, and use them to predict voting behavior in the counties of these battleground states.

Data Mining
Data Sources: Public websites such as census.gov for population demographics and voting results.
Data Description: Demographic data and voting results in 2012 for each county. Population data and land area of each county in 2010-2013. Focusing only on the swing states of VA, OH, FL, NV, CO, IA.
Pre-processing Data: Projecting population growth for each county in 2016. Calculating the projected population density for each county. Calculating the projected demographic makeup of each county.

For the data mining activity itself, we started with publicly available data from websites such as census.gov and politico.com for population demographics and voting results. Once this data was collected, we described it by recording each variable and its type in a Variable Dictionary, so that the details of each indicator could be matched up. The data began with the demographic data and 2012 voting results for each county in these swing states. In addition, population data and land area for 2010-2013 were collected, to create variables that include population density in 2012 and 2016. As for the states, our research focused only on the battleground states of VA, OH, FL, NV, CO, and IA, since the importance of swing states has been magnified in previous elections. Our data mining approach included pre-processing: we took growth rates from previous census data and used them to project the population of each county in 2016. Once these values were obtained, we calculated the projected population density for each county and presented it as a new variable, and then calculated the projected demographic makeup of each county.
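The projection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the transcript does not say how growth rates were applied, so the sketch assumes a constant compound annual growth rate, and the function names and the county figures are hypothetical.

```python
def project_population(pop_start, pop_end, years_observed, years_ahead):
    """Extrapolate population, assuming the observed compound
    annual growth rate simply continues (an assumption, not the
    method confirmed by the presentation)."""
    annual_rate = (pop_end / pop_start) ** (1.0 / years_observed) - 1.0
    return pop_end * (1.0 + annual_rate) ** years_ahead


def population_density(population, land_area_sq_mi):
    """Persons per square mile."""
    return population / land_area_sq_mi


# Hypothetical county: 100,000 people in 2010, 103,030.1 in 2013
# (exactly 1% growth per year), with 120 square miles of land.
projected_2016 = project_population(100_000, 103_030.1,
                                    years_observed=3, years_ahead=3)
density_2016 = population_density(projected_2016, 120.0)
print(round(projected_2016), round(density_2016, 1))
```

The same two functions could be applied per demographic group to project the demographic makeup, again assuming each group's recent growth rate persists.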

Modeling
Techniques for generating predictive models:
Nominal Logistic Regression
Neural Networks
Decision Trees

Our modeling approaches covered three of the most common data mining models available, starting with nominal logistic regression. For this modeling approach, we needed to create a binary dummy variable, “Party”, indicating whether the majority of a county's votes went to the Democratic or the Republican candidate. The second approach was Neural Networks, chosen for the high predictive power they derive from their underlying nodes. The third modeling approach was a Decision Tree, chosen for the powerful exploratory data analysis and transparent results that it provides.
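As a minimal sketch of the “Party” dummy variable, assuming it is derived from raw county vote counts: the encoding (1 for a Democratic majority, 0 for a Republican one), the tie handling, and the county names and vote totals below are all hypothetical choices made for illustration.

```python
def party_dummy(dem_votes, rep_votes):
    """Return 1 if the county's majority vote was Democratic, 0 if Republican.

    Ties have no majority party; raising an error here is an
    assumption about how such rows should be treated.
    """
    if dem_votes == rep_votes:
        raise ValueError("tie: no majority party")
    return 1 if dem_votes > rep_votes else 0


# Hypothetical (county, Democratic votes, Republican votes) rows.
counties = [("County A", 5200, 4800), ("County B", 3100, 6900)]
encoded = {name: party_dummy(dem, rep) for name, dem, rep in counties}
print(encoded)
```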

Model Performance - Regression

To measure the performance of the Logistic Regression model, we used a Receiver Operating Characteristic (ROC) graph. For this specific ROC curve, the Area Under Curve (AUC) is .73836, which is a fair result. An AUC of .5 corresponds to random guessing, and values closer to 1 indicate more accurate performance.
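For readers unfamiliar with the metric, the AUC can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that pairwise definition on made-up labels and scores; it is not the JMP computation and does not reproduce the figures reported here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model scores, higher meaning "more likely positive".
    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six counties (1 = Democratic majority).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(auc)
```

A model that scores every county identically gets .5 under this definition, which is why .5 is the baseline rather than an acceptable result.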

Model Performance – Neural Networks

To measure the performance of the Neural Network model, we again used a ROC graph. For this specific ROC curve, the Area Under Curve is .7870 for both target variables DEM and REP, which are also fair results.

Model Performance - Decision Trees

To measure the performance of the Decision Tree model, we used another ROC graph. For this specific ROC curve, the Area Under Curve is .8780 for the entire model, with .8504 for Democrat and .8512 for Republican, which are good results. Since values closer to 1 indicate more accurate performance, the decision tree performed the strongest of the applied modeling types.

Model Comparison
The Decision Tree model gives the best fit and predictive performance compared to Neural Networks and Regression.

This slide shows a direct model comparison in JMP, including the Rsquare values for each modeling approach. On each measure, the Decision Tree performs better than the Logistic Regression model or the Neural Network: it has higher Rsquare values, a lower root-mean-square error (RMSE), and a lower misclassification rate.
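Two of the comparison measures named above are easy to state in code. The sketch below assumes each model outputs a predicted probability of a Democratic majority and that RMSE is taken between that probability and the 0/1 outcome; JMP's exact definitions may differ, and the numbers are invented for illustration.

```python
import math


def misclassification_rate(actual, predicted_prob, threshold=0.5):
    """Share of cases where the thresholded prediction disagrees
    with the actual 0/1 outcome."""
    wrong = sum(1 for y, p in zip(actual, predicted_prob)
                if (p >= threshold) != (y == 1))
    return wrong / len(actual)


def rmse(actual, predicted_prob):
    """Root-mean-square error between 0/1 outcomes and predicted probabilities."""
    squared = [(y - p) ** 2 for y, p in zip(actual, predicted_prob)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical outcomes (1 = Democratic majority) and model probabilities.
actual = [1, 0, 1, 0]
probs = [0.8, 0.3, 0.4, 0.6]
print(misclassification_rate(actual, probs), rmse(actual, probs))
```

Lower is better for both measures, which is the sense in which the Decision Tree is said to outperform the other two models.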

Results
Population density has a statistically significant impact on county voting patterns.
Population densities above 1039 people/square mile lean Democrat with almost 80% probability.
The percentage of Black and Hispanic population also contributes significantly to county voting patterns.
Higher Black and Hispanic populations favor Democrats.
Surprising result: higher Hispanic male populations in lower population density areas favor Republicans.
Decision Trees provided the best predictive performance.
Florida, Ohio and Iowa are predicted to flip to the Republican column in 2016, while Virginia, Colorado and Nevada are expected to stay Blue.

Our results show that population density does have a statistically significant impact on county voting patterns. Our results were not identical to the original study, which cited 800 persons per square mile as the break point, but they were close: the first break in our dataset fell at 1039 persons per square mile. Above this break point, we found that counties lean Democratic with almost 80% probability, which is consistent with the original study. We also found that the percentage of Black and Hispanic population contributed significantly to county voting patterns, with higher percentages favoring Democrats. There was one surprising result: higher Hispanic male populations in lower population density areas favor Republicans, a switch from the overall demographic trend that nonetheless aligns with the original population density finding. In the comparison between the predictive modeling approaches, Decision Trees provided the best predictive performance based on common metrics: higher Rsquare, lower root-mean-square error (RMSE), lower misclassification rate, and superior performance on the ROC curve. The prediction of our study, based on the independent variables listed above, is that Florida, Ohio and Iowa will flip to the Republican column in 2016, while Virginia, Colorado and Nevada are expected to stay Blue.
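The “first break” of a decision tree is the threshold on one variable that best separates the two classes. As a rough illustration of how such a break point can be found, the sketch below scans candidate thresholds on population density and picks the one with the lowest weighted Gini impurity. Gini impurity is a swapped-in, commonly used criterion; JMP's partition platform may use a different one, and the densities and labels are toy values, not the project's data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single split.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_impurity = weighted
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_threshold, best_impurity


# Toy data: persons per square mile and majority party (1 = Democratic).
densities = [50, 200, 650, 900, 1200, 3000]
majority = [0, 0, 0, 0, 1, 1]
threshold, impurity = best_split(densities, majority)
print(threshold, impurity)
```

Because the threshold is learned from the data, a different dataset can yield a different break point, which is one plausible reason the 1039 figure differs from Politico's 800.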

Predicted results for 2016

State      Regression          Decision Trees
Virginia   Rep (55% to 44%)    Dem (51% to 49%)
Ohio       Dem (82% to 17%)    Rep (73% to 27%)
Iowa       Rep (52% to 48%)    Rep (77% to 23%)
Florida    Rep (59% to 41%)    Rep (56% to 44%)
Colorado   Dem (53% to 47%)    Dem (64% to 36%)
Nevada     Dem (54% to 46%)    Dem (60% to 40%)

And here is the graph displaying our outcome percentages for each battleground state.

Conclusion
Statistically significant relationship between population density and voting patterns.
Generated predictive model using population density and demographic information.
Predicted the 2016 presidential race outcome in 6 swing states.

Swing counties between 700-900 persons per sq mile

State      County               Persons Per Sq Mile
Virginia   Loudoun County       704.634027
Florida    Sarasota County      707.1123765
           Palm Beach County    707.7330736
Colorado   Jefferson County     728.0871357
           Arapahoe County      777.1085286
           Poquoson city        782.8554562
           Chesterfield County  783.3707965
Ohio       Butler County        798.7778997
           Emporia city         806.338838
           Galax city           822.3968257
Iowa       Polk County          822.4350556
           Bedford city         831.1570468
           Lee County           857.9602404

Our conclusion is that population density does have a statistically significant impact on county voting patterns. We generated applicable models to investigate and support this relationship, and provided a prediction for the 2016 presidential race. The table lists the specific counties in battleground states that are closest to the swing line; these would be the most sensitive to a change in population density that shifts potential voters toward either party. These counties will most likely see the most campaign dollars during the upcoming election because of their potential to swing the state, and potentially the national election result. Thank you.