Predicting Government Spending on Professional Services

BIT 5534 – Applied Business Intelligence and Analytics
Lars Gustavson, Tapan Puntikura, Kevin Marinak

Hi, my name is Lars Gustavson and I will be presenting on behalf of Group 4, which includes myself, Tapan Puntikura, and Kevin Marinak. For our semester project, we applied data mining approaches and techniques to develop a model, using JMP software, to aid in predicting government spending on professional services.

Problem Description

In the business of federal contracting, companies are very dependent on budget planning and spending trends across federal agencies. This is particularly true for consulting firms that provide professional program management services, as agency spending on these services tends to fluctuate more variably than spending on other products. As such, it is crucial for service providers to be able to anticipate federal spending trends. Doing so enables firms to efficiently allocate resources for marketing and business development. It also enables them to appropriately invest in staff and capability development for the services that the government demands. Without proper planning and foresight, firms’ success and growth are left to chance.

Primary question: How much are agencies expected to spend on professional services in the future?

We chose this project and topic because we each work for different consulting firms providing various professional services. Each of our companies provides services to the federal government, so we understand the importance and challenge of anticipating government spending. There are many factors that influence when and where the government will allocate funds, including agency priorities, taxpayer pressures, and operational needs. For service providers, it is critical to be able to anticipate spending in order to plan marketing, business development, and capability building. Ultimately, a service provider wants to be able to efficiently allocate its resources to provide the services the government wants. Without proper planning and foresight, a company’s success and growth are left to chance. Therefore, the primary question we sought to answer is “How much are agencies expected to spend on professional services in the future?”

Data Source

The primary data source for this project was the open source USAspending.gov data feed. This site provides historical data on federal contracts, loans, grants, and direct payments. The dataset had 225 attributes separated into nine categories. Data was collected for all contract awards from the start of government fiscal year 2013 (October) to date. This resulted in a dataset with 54,852 records.

In order to develop the model, data was collected from USAspending.gov, then analyzed and prepared. This is an open source data feed that provides historical data on federal contracts, including information on purchasing agencies, vendors, competition factors, and types of products and services. The collected dataset included federal government contracts since the start of fiscal year 2013 in October. The dataset had 225 attributes and 54,852 records.

Data Exploration & Preparation
Data Exploration – Studied the 225 potential variables and selected the response variable “dollarsobligated”.

Data Preparation – The dataset was prepared by removing unnecessary (redundant and irrelevant) attributes. Basic linear fit models, scatterplots, and correlations were analyzed to assess the significance of independent variables in relation to the target variable. The number of attributes was reduced from the original 225 to 65.

Data Transformation – Data transformation efforts required removal of additional redundant variables, replacing missing values, and data coding. Some of the numerical attributes were more useful after they were discretized by applying grouping ranges and coding the variables.

Our group spent a considerable amount of time exploring and preparing the data. This included studying the 225 potential variables and selecting the response variable “dollarsobligated”. It was also clear through exploration and analysis that most of the variables were redundant or irrelevant. Basic linear fit models, scatterplots, and correlations were analyzed to assess the significance of independent variables in relation to the target variable. Through this analysis, we were able to reduce the number of potential input variables to 65. Additional transformations, including replacing missing values and applying discretizing ranges, were also done to make the dataset more manageable.
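The preparation steps described above (dropping uninformative attributes, replacing missing values, and discretizing numerical attributes into ranges) can be illustrated with a minimal Python sketch. This is not the JMP workflow the group used. The 50% missing-value threshold, the median fill, the range edges, and the column names "numberofoffers", "country", and "legacyflag" are all assumptions made purely for illustration; only "dollarsobligated" comes from the dataset as described.

```python
from statistics import median

MISSING = (None, "")  # values treated as missing in this sketch


def drop_uninformative(rows, max_missing=0.5):
    """Keep only columns that vary and are not mostly missing.

    The 0.5 threshold is an illustrative assumption, not the group's rule.
    """
    keep = []
    for col in rows[0].keys():
        present = [r[col] for r in rows if r[col] not in MISSING]
        missing_share = 1 - len(present) / len(rows)
        if missing_share <= max_missing and len(set(present)) > 1:
            keep.append(col)
    return [{c: r[c] for c in keep} for r in rows]


def fill_missing_with_median(rows, col):
    """Replace missing values in a numerical column with the column median."""
    fill = median(r[col] for r in rows if r[col] not in MISSING)
    for r in rows:
        if r[col] in MISSING:
            r[col] = fill


def discretize(value, edges, labels):
    """Map a number to a range label; labels has one more entry than edges."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


# Hypothetical records; every column except "dollarsobligated" is invented.
rows = [
    {"dollarsobligated": 1000.0, "numberofoffers": 1, "country": "US", "legacyflag": ""},
    {"dollarsobligated": 250000.0, "numberofoffers": None, "country": "US", "legacyflag": ""},
    {"dollarsobligated": 50000.0, "numberofoffers": 5, "country": "US", "legacyflag": "Y"},
    {"dollarsobligated": 2000000.0, "numberofoffers": 3, "country": "US", "legacyflag": ""},
]
cleaned = drop_uninformative(rows)             # drops "country" and "legacyflag"
fill_missing_with_median(cleaned, "numberofoffers")
for r in cleaned:
    r["offers_group"] = discretize(r["numberofoffers"], [2, 4], ["single", "few", "many"])
```

In this sketch a constant column and a mostly empty column are removed, which mirrors the idea of discarding redundant or irrelevant attributes before modeling.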

Model Development

Linear Model – JMP’s model fit tool was used to develop an appropriate linear regression model for predicting the target variable. This included trial and error of attribute selection using stepwise regression, as well as optimizing the number of input variables based on model complexity and accuracy. R-Square = 0.22

Neural Network – A Neural Network model was developed as an attempt to improve on the overall fit achieved by the Linear Model. Various versions of the model were tested for accuracy, with the best fit ultimately being tested against a validation set of data that was not included in the training set. Validating the model against new inputs tested its accuracy against known target values. R-Square = 0.39

Once the dataset was prepared, we developed a linear regression model using the stepwise feature in JMP. This involved trial and error of attributes in order to optimize the number of input variables based on model complexity and accuracy. A snapshot of the linear predicted plot is shown on the top of the slide. This model resulted in a low R-Square of 0.22 with a root mean square error of over 2 million. In hopes of improving on the fit from the linear model, we also developed a neural network model with one hidden layer. Multiple versions of this model were tested against a validation dataset, resulting in an R-Square of 0.39. A sample diagram of one of the tested models is shown on the bottom of the slide.
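The idea of testing a model against validation data that was not part of the training set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it fits a one-predictor least-squares line rather than the stepwise regression or neural network built in JMP, and the 30% validation share, the fixed seed, and the function names are hypothetical choices.

```python
import random


def train_validation_split(records, validation_share=0.3, seed=0):
    """Shuffle the records and hold out a share of them for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_share)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Made-up (input, target) pairs; the model only ever sees the training part.
pairs = [(x, 2 * x + 1) for x in range(10)]
train, validation = train_validation_split(pairs)
model = fit_line(train)
validation_predictions = [predict(model, x) for x, _ in validation]
```

Comparing the validation predictions with the known validation targets then gives an estimate of how the model behaves on inputs it has not seen.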

Findings & Results

Linear Model – The mean value for a typical contract was $449,441. However, the low R-Square value and high root mean squared error (RMSE) indicate that the model is not very effective at predicting the value of a typical contract.

Neural Network – An improvement over the Linear Model, but its overall fit was still on the low side. The large amount of variance within the dataset made the modeling technique somewhat unreliable.

Key Input Variables – Six attributes were identified as having the most significance in determining the amount of dollars obligated.

Response Variable    Input Variable                     P Value
Dollarsobligated     Vendorname
                     Fundingrequestingofficeid          2.4623E-31
                     Currentcompletiondate              1.5975E-27
                     Performancebasedservicecontract    1.8508E-06
                     Typeofcontractpricing              0.0007
                     Statecode

Unfortunately, our group found that the results from the linear model were not acceptable. The low R-Square value of 0.22 and high root mean squared error indicate that the model is not very effective at predicting the value of a typical contract. The Neural Network model was an improvement over the Linear Model; however, it also resulted in an unacceptably low R-Square of 0.39. The data mining analysis did, however, identify six attributes that have the most influence on the response variable, “dollarsobligated”. These key attributes are listed on the bottom of the slide.
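For reference, the two fit measures quoted above can be computed from actual and predicted values as in the following Python sketch. R-Square is the share of the variance in the response that the model explains, so a value of 0.22 means roughly 22% of the variation in dollars obligated is accounted for. The function names are illustrative, and this RMSE divides by the number of records; statistical packages may instead divide by the error degrees of freedom, so reported values can differ slightly.

```python
from math import sqrt


def r_square(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the response."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(actual))
```

A model that always predicts the mean of the response scores an R-Square of 0, and a perfect model scores 1, which is why values of 0.22 and 0.39 are read as weak fits.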

Conclusion

The extracted USAspending.gov dataset turned out to be less useful than the group originally expected. Many of the input variables were duplicative in nature and/or statistically insignificant. All of the models that were developed had low R-Square values, indicating a poor overall fit and inferior predictive ability. The group concluded that the dataset did not contain attributes that were sufficient for developing a model with the purpose of determining how money would be allocated for future contract awards.

In conclusion, our group initially believed the USAspending.gov dataset would contain the proper indicators for predicting government spending on professional services; however, the data mining and analysis revealed the dataset to be less useful than originally thought. Many of the input variables were redundant and/or irrelevant, leading to their exclusion or to an insignificant influence on the model. As a result, neither the linear regression model nor the neural network model achieved a good enough fit, and both provided inferior predictive ability. Overall, our group has concluded that the dataset does not contain attributes that are sufficient for developing a model with the purpose of determining how money would be allocated for future contract awards. We believe that the inherent nature of the government contracting process induces variability in contract awards that cannot be modeled with the chosen dataset.
