
Using Data Analytics to Predict Liquor Sales in Iowa State




1 Using Data Analytics to Predict Liquor Sales in Iowa State
Group #10: Phil Brown, Jurgens Vestil, Akhil Vyas
BIT 5534 – Spring 2018
Hello and welcome to our presentation. We are Philip Brown, Jurgens Vestil, and Akhil Vyas of Group Ten. Our presentation today is about using data analytics to predict liquor sales in the State of Iowa.

2 Project Overview
Business Problem
Data Description
Data Preparation
Model Development and Results
Summary of Findings
Conclusion
Today we will go over our business problem, our data description, the data preparation steps that we used for our model, a description of our results, and a summary of our findings, and then finally we will move on to our conclusion.

3 Business Problem
By law, the State of Iowa is required to log every sale of alcohol.
We intend to use this data to gain insight into the population based on the types of liquor that they buy.
This will help answer many questions about alcohol sales activity in the State of Iowa, such as how frequently alcohol is sold.
We aim to share the results with liquor vendors to help them determine how much inventory to buy.

4 Data Description
Our data came from the Iowa Department of Commerce.
The dataset contains spirits purchase information for Iowa Class “E” liquor licensees by product and date of purchase, from 2012 to 2015.
It has 24 attributes and 12 million entries.
Source:

5 Data Preparation
Split the CSV file into multiple parts
Randomly selected CSV files from those parts and combined them into one
Reduced the data from 12 million entries to 7,000 entries
Reduced the variables from 24 to 13
Removed rows with missing values, and outliers
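The preparation steps above can be approximated in a few lines of Python. This is only a minimal sketch, not the procedure the group actually ran: it swaps the "split and randomly pick files" step for reservoir sampling over rows, uses a 1.5 × IQR rule as one possible definition of an outlier, and the column names and values in the demo data are assumptions made up for illustration.

```python
import csv
import io
import random
import statistics


def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k rows from an iterable of any length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def prepare(rows, keep_columns, numeric_column):
    """Keep selected columns, drop rows with empty fields, drop IQR outliers."""
    cleaned = []
    for row in rows:
        subset = {c: (row.get(c) or "").strip() for c in keep_columns}
        if all(subset.values()):  # any empty field counts as a missing value
            cleaned.append(subset)
    values = [float(r[numeric_column]) for r in cleaned]
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [r for r in cleaned if low <= float(r[numeric_column]) <= high]


# Tiny in-memory stand-in for the real CSV export (hypothetical values).
demo_csv = io.StringIO(
    "Sale (Dollars),Bottles Sold,City\n"
    "10.0,1,Ames\n"
    "12.0,2,Ames\n"
    "11.5,1,\n"
    "13.0,2,Des Moines\n"
    "9.5,1,Iowa City\n"
    "5000.0,3,Ames\n"
    "12.5,2,Cedar Rapids\n"
    "11.0,1,Ames\n"
    "10.5,2,Des Moines\n"
)
rows = list(csv.DictReader(demo_csv))
sample = reservoir_sample(rows, k=len(rows))  # the project kept 7,000 rows
columns = ["Sale (Dollars)", "Bottles Sold", "City"]
prepared = prepare(sample, columns, "Sale (Dollars)")
print(len(rows), "rows in,", len(prepared), "rows after cleaning")
```

Reservoir sampling reads the file once and never holds more than k rows in memory, which is why it is a convenient substitute when the source file has millions of entries.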

6 Data Exploration
We generated a scatterplot matrix to examine the relationships between the dependent and independent variables.
Multiple regression was performed on the target (Sales).
To verify correlation, we generated a scatterplot matrix to examine the relationships between the response variable “Sales” and the explanatory variables. From this plot, we identified the highly correlated variables: the total volume of liquor ordered in gallons, Volume Sold (Gallons), and the amount that the alcoholic beverage vendor paid for each bottle of liquor ordered, State Bottle Cost. With these variables removed, we then performed multiple regression analysis on the response variable “Sales”, which produced an R-squared value of 79% with an RMSE of 58.5.
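For readers who want to see how the two reported metrics are computed, here is a minimal sketch of a correlation matrix followed by an ordinary least squares fit with R-squared and RMSE. It assumes NumPy and runs on synthetic data; the variable names and coefficients are hypothetical and do not reproduce the project's figures. The RMSE here divides by n, whereas some statistics packages divide by the residual degrees of freedom.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns coef, R^2, RMSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)         # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total variation
    r_squared = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(ss_res / len(y)))        # divides by n, not n - p
    return coef, r_squared, rmse


# Synthetic stand-in data (hypothetical relationship, not the Iowa data).
rng = np.random.default_rng(0)
bottles_sold = rng.integers(1, 25, size=200)
bottle_cost = rng.uniform(5, 30, size=200)
sales = 3.0 + 12.0 * bottles_sold + 2.0 * bottle_cost + rng.normal(0, 5, 200)

# Pairwise correlations: the numeric counterpart of a scatterplot matrix.
corr = np.corrcoef(np.column_stack([bottles_sold, bottle_cost, sales]),
                   rowvar=False)

predictors = np.column_stack([bottles_sold, bottle_cost])
coef, r_squared, rmse = fit_linear_regression(predictors, sales)
print("R-squared:", round(r_squared, 3), "RMSE:", round(rmse, 2))
```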

7 Data Exploration (cont.)
Stepwise regression was performed to determine the significant variables. These variables are: State Bottle Cost, Volume Sold, Bottles Sold, and Bottle Volume.
To determine the significant variables, we ran stepwise regression with the Stopping Rule set to P-value Threshold, and with Prob to Enter and Prob to Leave set to […]. This produced a higher R-squared value of 88.31% and an RMSE of 45.77. It should be noted that for the categorical variables, we found that some of the values are significant while others are not. With stepwise regression performed, we identified the significant variables in the dataset. These variables are: State Bottle Cost, Volume Sold (L), Bottles Sold, and Bottle Volume (ml).
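The idea behind stepwise selection can be illustrated with a simplified forward pass. Note the substitution: the sketch below adds a predictor only when it raises adjusted R-squared by a minimum amount, which is a stand-in for the P-value Threshold stopping rule described above, not a reimplementation of it. NumPy, the `min_gain` parameter, the variable names, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit with intercept on the given columns."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = float(((y - design @ coef) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def forward_select(X, y, names, min_gain=0.01):
    """Greedily add the predictor that most improves adjusted R^2."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break  # no candidate improves the fit enough; stop
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return [names[j] for j in selected], best


# Synthetic data: two informative predictors and two pure-noise columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 4.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=1.0, size=300)
names = ["bottle_cost", "noise_a", "volume_sold", "noise_b"]  # hypothetical

chosen, score = forward_select(X, y, names)
print(chosen, round(score, 3))
```

A true stepwise procedure would also test whether previously entered variables should leave the model at each step; this sketch only enters variables.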

8 Model Development
Principal Component Analysis: 3 principal components chosen based on cumulative percentage
Decision Tree Analysis: Number of splits = 155; R-squared of 0.86 and 0.83 for Training and Validation, respectively
K-Means Clustering: 7 initial clusters selected; produced a lower R-squared value than the other methods
Time Series Analysis: Transfer Function Model selected with an R-squared value of 0.80 and MAE of
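Choosing the number of principal components by cumulative percentage can be sketched as follows. This is an illustrative example under stated assumptions: it uses NumPy, standardizes the inputs, takes the eigenvalues of the resulting correlation matrix, and keeps the smallest number of components whose cumulative share of variance reaches a threshold. The 0.90 threshold and the synthetic six-variable dataset are assumptions; the slide does not state the cutoff that was used.

```python
import numpy as np


def choose_components(X, threshold=0.90):
    """Return (k, cumulative variance shares, scores on the first k PCs)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    corr = np.cov(Z, rowvar=False)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)           # ascending order
    order = np.argsort(eigvals)[::-1]                 # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose cumulative share reaches the threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    scores = Z @ eigvecs[:, :k]
    return k, cumulative, scores


# Synthetic data: six observed variables driven by three hidden factors.
rng = np.random.default_rng(2)
factors = rng.normal(size=(500, 3))
loadings = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
])
observed = factors @ loadings + rng.normal(scale=0.1, size=(500, 6))

k, cumulative, scores = choose_components(observed, threshold=0.90)
print("components kept:", k)
print("cumulative share:", np.round(cumulative, 3))
```

The retained scores can then serve as predictors in a regression on the response, which is one common way a "PCA model" is evaluated with R-squared.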

9 Model Evaluation
Both the PCA models and the Decision Tree Analyses performed well, with R-squared values of 0.93 and 0.86, respectively.

10 Model Evaluation (cont.)

11 Conclusion
The PCA model was statistically the best model, with the highest R-squared value.
Based on the Time Series Analysis results, we found that alcohol sales peak from October to February.
The higher the cost of alcohol, the higher the sales. Consumers tend to choose more expensive brands.
Hawkeye Vodka was the number one selling item throughout the year in Des Moines, Iowa City, and Cedar Rapids.
Black Velvet, a discount Canadian whisky, produces the most sales around New Year’s.

