Using Data Analytics to Predict Liquor Sales in Iowa State

Presentation transcript:

Using Data Analytics to Predict Liquor Sales in Iowa State
Group #10: Phil Brown, Jurgens Vestil, Akhil Vyas
BIT 5534 – Spring 2018
Hello and welcome to our presentation. We are Philip Brown, Jurgens Vestil, and Akhil Vyas, Group Ten. Our presentation today is about using data analytics to predict liquor sales in the State of Iowa.

Project Overview
Business Problem
Data Description
Data Preparation
Model Development and Results
Summary of Findings
Conclusion
Today we will go over our business problem, our data description, the data preparation steps we used for our model, a description of our results, a summary of our findings, and finally our conclusion.

Business Problem
By law, the State of Iowa is required to log every sale of alcohol.
We intend to use this data to gain insights into the backgrounds of the population based on the type of liquor that they buy.
This will help to answer many questions about alcohol sales activity in the State of Iowa, such as how frequently alcohol is sold.
We aim to share the results with liquor vendors to help them determine how much inventory to buy.

Data Description
Our data came from the Iowa Department of Commerce.
The dataset contains spirits purchase information of Iowa Class “E” liquor licensees by product and date of purchase from 2012 to 2015.
It has 24 attributes and 12 million entries.
Source: https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy

Data Preparation
Split the CSV file into multiple parts.
Randomly selected CSV files and combined them into one.
Data was reduced from 12 million entries to 7,000 entries.
Reduced variables from 24 to 13.
Removed rows with missing values, and outliers.
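
The preparation steps above can be sketched in Python using only the standard library. This is a hypothetical illustration, not the group's actual workflow: the slides do not say which tool was used for this step. It swaps the "split the file and pick parts at random" approach for reservoir sampling, which draws a fixed-size random sample in one pass. The column names, the demo values, and the 1.5 × IQR outlier rule are all assumptions.

```python
import csv
import io
import random
import statistics


def sample_rows(lines, k, seed=0):
    """Reservoir-sample up to k rows from a CSV stream in a single pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(csv.DictReader(lines)):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def drop_missing(rows, columns):
    """Keep only the chosen columns and drop rows with any empty value."""
    kept = []
    for row in rows:
        values = {c: (row.get(c) or "").strip() for c in columns}
        if all(values.values()):
            kept.append(values)
    return kept


def drop_outliers(rows, column, factor=1.5):
    """Drop rows outside [Q1 - factor*IQR, Q3 + factor*IQR] (assumed rule)."""
    values = [float(r[column]) for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [r for r in rows if low <= float(r[column]) <= high]


# Tiny in-memory stand-in for the real CSV (made-up values).
demo = io.StringIO(
    "Sales,Bottles Sold\n"
    + "".join(f"{10 + i},{i % 3 + 1}\n" for i in range(10))
    + ",2\n500,40\n"
)
rows = sample_rows(demo, k=12)
rows = drop_missing(rows, ["Sales", "Bottles Sold"])
rows = drop_outliers(rows, "Sales")
print(len(rows), "rows kept")
```

With the real file, the same functions would be called with `open(path, newline="")` in place of the in-memory demo and `k=7000` to match the sample size on the slide.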

Data Exploration
We generated a scatterplot matrix to examine the relationships between the dependent and independent variables.
Multiple regression was performed on the target (Sales).
To verify correlation, we generated a scatterplot matrix to examine the relationships between the response variable “Sales” and the explanatory variables. From this plot, we identified the highly correlated variables: the total volume of liquor ordered in gallons, Volume Sold (Gallons), and the amount that the alcoholic beverage vendor paid for each bottle of liquor ordered, State Bottle Cost. With these variables removed, we then performed Multiple Regression Analysis on the response variable “Sales”, which produced an R-squared value of 79% with an RMSE value of 58.5.
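
For readers unfamiliar with the statistics quoted above, here is a minimal Python sketch of how a pairwise correlation, R-squared, and RMSE are commonly computed. It is illustrative only: the function names are made up for this example, and the RMSE here divides by n, whereas statistical packages often divide by the residual degrees of freedom, so values may differ slightly from those reported on the slide.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the response."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```

Because RMSE is expressed in the units of the response (here, Sales), it is read as a typical prediction error rather than as a percentage.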

Data Exploration (cont.)
Stepwise Regression was performed to determine significant variables.
These variables are: State bottle cost, Volume sold, bottles sold, and bottle Volume.
To determine the significant variables, we ran Stepwise Regression with the Stopping Rule set to the P-value Threshold, and with the Prob to Enter and Prob to Leave set to 0.25. This produced a higher R-squared value of 88.31% and an RMSE of 45.77. It should be noted that for the categorical variables, some of the values are significant while others are not. With Stepwise performed, we identified the significant variables in the dataset. These variables are: State bottle cost, Volume sold (L), bottles sold, and bottle Volume (ml).

Model Development
Principal Component Analysis: 3 principal components chosen based on cumulative percentage
Decision Tree Analysis: Number of splits = 155; R-squared of 0.86 and 0.83 for Training and Validation, respectively
K-Means Clustering: 7 initial clusters selected; produced a lower R-squared value than the other methods
Time Series Analysis: Transfer Function Model selected with an R-squared value of 0.80 and MAE of 27.29
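
Choosing the number of principal components "based on cumulative percentage" usually means keeping the smallest number of components whose eigenvalues account for a chosen share of the total variance. The sketch below shows that rule in Python. The slides give neither the eigenvalues nor the cutoff, so the values used here are made up and the 75% threshold is an assumption.

```python
def components_for_threshold(eigenvalues, threshold):
    """Return (k, share): the smallest number of components whose
    eigenvalues cover at least `threshold` of the total variance."""
    total = sum(eigenvalues)
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k, running / total
    return len(ordered), 1.0


# Hypothetical eigenvalues, not taken from the presentation.
eigenvalues = [4.0, 2.5, 1.5, 1.0, 1.0]
k, share = components_for_threshold(eigenvalues, threshold=0.75)
print(f"keep {k} components covering {share:.0%} of the variance")
```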

Model Evaluation
Both the PCA models and the Decision Tree Analyses performed well, with R-squared values of 0.93 and 0.86, respectively.

Model Evaluation (cont.)

Conclusion
The PCA model was statistically the best model, with the highest R-squared value.
We found that alcohol sales peak from October to February, based on Time Series Analysis results.
The higher the cost of alcohol, the higher the sales. Consumers tend to choose more expensive brands.
Hawkeye Vodka was the number one selling item throughout the year in Des Moines, Iowa City, and Cedar Rapids.
Black Velvet, a discount Canadian Whiskey, produces the most sales around New Year’s.