Predicting Government Spending on Professional Services

Presentation transcript:

Predicting Government Spending on Professional Services
BIT 5534 – Applied Business Intelligence and Analytics
Lars Gustavson, Tapan Puntikura, Kevin Marinak

Hi, my name is Lars Gustavson, and I will be presenting on behalf of Group 4, which includes myself, Tapan Puntikura, and Kevin Marinak. For our semester project, we applied data mining approaches and techniques to develop a model in JMP software to aid in predicting government spending on professional services.

Problem Description
In the business of federal contracting, companies are very dependent on the budget planning and spending trends across federal agencies. This is particularly true for consulting firms that provide professional program management services, as agency spending on these services tends to fluctuate more than spending on other products. As such, it is crucial for service providers to be able to anticipate federal spending trends. Doing so enables firms to efficiently allocate resources for marketing and business development. It also enables them to appropriately invest in staff and capability development for the services that the government demands. Without proper planning and foresight, firms’ success and growth are left to chance.
Primary question: How much are agencies expected to spend on professional services in the future?

We chose this project and topic because we each work for different consulting firms providing various professional services. Each of our companies provides services to the federal government, so we understand the importance and challenge of anticipating government spending. There are many factors that influence when and where the government will allocate funds, including agency priorities, taxpayer pressures, and operational needs. For service providers, it is critical to be able to anticipate spending in order to plan marketing, business development, and capability investments. Ultimately, a service provider wants to allocate its resources efficiently to provide the services the government wants. Without proper planning and foresight, a company’s success and growth are left to chance. Therefore, the primary question we sought to answer is “How much are agencies expected to spend on professional services in the future?”

Data Source
The primary data source for this project was the open source USA Spending data feed (www.usaspending.gov/data). This site provides historical data on federal contracts, loans, grants, and direct payments. The dataset had 225 attributes separated into nine categories. Data was collected for all contract awards from the start of government fiscal year 2013 (Oct. 1, 2012) to date. This resulted in a dataset with 54,852 records.

In order to develop the model, data was collected from USAspending.gov, then analyzed and prepared. This is an open source data feed that provides historical data on federal contracts, including information on purchasing agencies, vendors, competition factors, and types of products and services. The collected dataset included federal government contracts since October 2012. The dataset had 225 attributes and 54,852 records.

Data Exploration & Preparation
Data Exploration – Studied the 225 potential variables and selected the response variable “dollarsobligated”.
Data Preparation – The dataset was prepared by removing unnecessary (redundant and irrelevant) attributes. Basic linear fit models, scatterplots, and correlations were analyzed to assess the significance of independent variables in relation to the target variable. The number of attributes was reduced from the original 225 to 65.
Data Transformation – Data transformation efforts required removal of additional redundant variables, replacement of missing values, and data coding. Some of the numerical attributes were more useful after they were discretized by applying grouping ranges and coding the variables.

Our group spent a considerable amount of time exploring and preparing the data. This included studying the 225 potential variables and selecting the response variable “dollarsobligated”. It was also clear through exploration and analysis that most of the variables were redundant or irrelevant. Basic linear fit models, scatterplots, and correlations were analyzed to assess the significance of independent variables in relation to the target variable. Through this analysis, we were able to reduce the number of potential input variables to 65. Additional transformations, including replacing missing values and applying discretizing ranges, were also done to make the dataset more manageable.
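
The two transformations mentioned above, replacing missing values and discretizing a numerical attribute into ranges, can be illustrated with a minimal Python sketch. The group did this work in JMP, so this is not their procedure: the median fill rule, the bin edges, the labels, and the sample durations are all assumptions made up for illustration.

```python
from statistics import median

# Hypothetical bin edges (in days) for discretizing a numeric attribute;
# the ranges the group actually used are not given in the presentation.
BINS = [(0, 90, "short"), (90, 365, "medium"), (365, float("inf"), "long")]


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def discretize(value):
    """Map a numeric value to the label of the range that contains it."""
    for low, high, label in BINS:
        if low <= value < high:
            return label
    raise ValueError(f"value {value} is outside every range")


durations = [30, None, 400, 120, None, 45]  # made-up contract durations
filled = fill_missing(durations)             # missing entries become the median
coded = [discretize(v) for v in filled]      # numeric values become range labels
print(filled)
print(coded)
```

Filling with the median rather than the mean is one common choice when a variable is heavily skewed, as contract values and durations often are; other fill rules would fit the same structure.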

Model Development
Linear Model – JMP’s model fit tool was used to develop an appropriate linear regression model for predicting the target variable. This included trial and error of attribute selection using stepwise regression, as well as optimizing the number of input variables based on model complexity and accuracy. R-Square = 0.22
Neural Network – A neural network model was developed in an attempt to improve on the overall fit achieved by the linear model. Various versions of the model were tested for accuracy, with the best fit ultimately being tested against a validation set of data that was not included in the training set. Validating the model against new inputs tested its accuracy against known target values. R-Square = 0.39

Once the dataset was prepared, we developed a linear regression model using the stepwise feature in JMP. This involved trial and error with attributes in order to optimize the number of input variables based on model complexity and accuracy. A snapshot of the linear predicted plot is shown at the top of the slide. This model resulted in a low R-Square of 0.22 with a root mean square error of over 2 million. In hopes of improving on the fit of the linear model, we also developed a neural network model with one hidden layer. Multiple versions of this model were tested against a validation dataset, resulting in an R-Square of 0.39. A sample diagram of one of the tested models is shown at the bottom of the slide.
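
The idea of scoring a model on validation data that was held out of training can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the JMP workflow the group used: the data is synthetic, the model is a one-variable least-squares line rather than a stepwise regression or neural network, and the 30% validation fraction and function names are hypothetical.

```python
import random


def split_holdout(rows, validation_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold back a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * validation_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = b0 + b1 * x on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def r_squared(rows, b0, b1):
    """1 - SS_residual / SS_total, computed on the given rows."""
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - ss_res / ss_tot


# Synthetic data: y is a linear function of x plus Gaussian noise.
rng = random.Random(1)
data = [(x, 3.0 * x + rng.gauss(0, 10)) for x in range(50)]

train, validation = split_holdout(data)
b0, b1 = fit_line(train)                 # coefficients come from training rows only
score = r_squared(validation, b0, b1)    # fit quality on rows the model never saw
print(f"validation R-Square: {score:.2f}")
```

Because the coefficients are estimated from the training rows alone, the validation R-Square reflects how well the model generalizes to new inputs rather than how well it memorized the data it was fitted on.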

Findings & Results
Linear Model – The mean value of a typical contract was $449,441. However, the low R-Square value and high root mean squared error (RMSE) indicate that the model is not very effective at predicting the value of a typical contract.
Neural Network – An improvement over the linear model, but its overall fit was still on the low side. The large amount of variance within the dataset made the modelling technique somewhat unreliable.
Key Input Variables – Six attributes were identified as having the most significance in determining the amount of dollars obligated.

Response Variable: Dollarsobligated
Input Variable                     P Value
Vendorname
Fundingrequestingofficeid          2.4623E-31
Currentcompletiondate              1.5975E-27
Performancebasedservicecontract    1.8508E-06
Typeofcontractpricing              0.0007
Statecode                          0.00395

Unfortunately, our group found that the results from the linear model were not acceptable. The low R-Square value of 0.22 and high root mean squared error indicate that the model is not very effective at predicting the value of a typical contract. The neural network model was an improvement over the linear model; however, it also resulted in an unacceptably low R-Square of 0.39. The data mining analysis did, however, identify six attributes that have the most influence on the response variable, “dollarsobligated”. These key attributes are listed at the bottom of the slide.
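
To make concrete why the RMSE counts as high, the short Python sketch below shows how RMSE is calculated and compares the figures quoted above. The three-point example is made up, and because the presentation only says the RMSE was "over 2 million", 2,000,000 is used as an assumed lower bound rather than the actual value.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Tiny made-up example to show the calculation itself.
example = rmse([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(f"example RMSE: {example:.2f}")

# Figures quoted in the presentation: a mean contract value of $449,441 and
# an RMSE "over 2 million"; 2,000,000 is an assumed lower bound, not the
# exact reported value.
MEAN_CONTRACT_VALUE = 449_441
RMSE_LOWER_BOUND = 2_000_000
ratio = RMSE_LOWER_BOUND / MEAN_CONTRACT_VALUE
print(f"RMSE is at least {ratio:.1f}x the mean contract value")
```

A typical prediction error several times larger than the average contract value is consistent with the group's reading that the linear model is not effective at predicting the value of a typical contract.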

Conclusion
The extracted USAspending.gov dataset turned out to be less useful than the group originally expected. Many of the input variables were duplicative in nature and/or statistically insignificant. All of the models that were developed had low R-Square values, indicating a poor overall fit and inferior predictive ability. The group concluded that the dataset did not contain attributes that were sufficient for developing a model to determine how money would be allocated for future contract awards.

In conclusion, our group initially believed the USAspending.gov dataset would contain the proper indicators for predicting government spending on professional services; however, the data mining and analysis revealed the dataset to be less useful than originally thought. Many of the input variables were redundant and/or irrelevant, which led to their exclusion or to an insignificant influence on the model. As a result, neither the linear regression model nor the neural network model achieved a good enough fit, and both provided inferior predictive ability. Overall, our group has concluded that the dataset does not contain attributes that are sufficient for developing a model to determine how money would be allocated for future contract awards. We believe that the inherent nature of the government contracting process induces variability in contract awards that cannot be modeled with the chosen dataset.