NBA Draft Prediction
BIT 5534
May 2nd, 2018
Nate Grefe, Eric Porea, Tyler Neff
Contents
Business Problem
Questions Investigated
Data Source
Pre-Processing
Modeling
Results
Conclusion
Business Problem
NBA teams draft prospects every year, and those picks can have an immense impact on the success of the franchise.
Even a quick review of historical data shows how widely draft picks vary in success.
Predictive modeling can support a better draft process for the team.
The goal is to give teams a powerful tool, with many potential applications, to improve their success.
Questions Investigated
Where will the NBA prospect be picked?
What variables are most effective in predicting the draft round and pick of a prospect?
What variables affect a higher draft pick?
Data Source
Data set from data.world
  - Using statistics from basketball-reference.com
College basketball player information from 1989 to 2016
  - Pick number
  - Draft round
  - Points per game
  - Minutes played per game
  - etc.
23 total attributes with 32,605 data points
  - Numerical & categorical data
Pre-Processing
Data set not complete
  - Some players have no data (injury, never played, etc.)
  - Holes in data for some attributes
Players with no data removed from the data set
Missing data points left blank
  - Not appropriate to assume or estimate missing instances
18 attributes and 24,677 data points after cleaning the data
Standardized the attributes to be used so they can be compared accurately
  - Points per game, assists per game, rebounds per game, minutes played per game, field goal percentage, three point percentage, and free throw percentage
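The cleaning and standardization steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes "standardized" means z-scoring (subtract the mean, divide by the standard deviation), and the column names and player records are made up for the example.

```python
from statistics import mean, pstdev

# Hypothetical column names; the real data set uses its own attribute names.
STAT_NAMES = ["points_per_game", "assists_per_game"]

def drop_players_without_stats(players):
    """Keep only players with at least one recorded statistic."""
    return [p for p in players if any(p.get(s) is not None for s in STAT_NAMES)]

def standardize(values):
    """Z-score a column, leaving missing entries (None) blank."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    mu = mean(present)
    sigma = pstdev(present)
    if sigma == 0:
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sigma for v in values]

# Made-up records for illustration only.
players = [
    {"name": "A", "points_per_game": 10.0, "assists_per_game": 2.0},
    {"name": "B", "points_per_game": 20.0, "assists_per_game": None},
    {"name": "C", "points_per_game": None, "assists_per_game": None},
    {"name": "D", "points_per_game": 30.0, "assists_per_game": 6.0},
]

kept = drop_players_without_stats(players)          # player C is removed
z_points = standardize([p["points_per_game"] for p in kept])
z_assists = standardize([p["assists_per_game"] for p in kept])
```

Player B keeps a blank assists value after standardization, matching the choice not to estimate missing instances.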
Modeling
Logistic Regression
  - A binary approach to predict the round in which a player is drafted
  - Round 1 or Round 2
Linear Regression
  - Approach used to predict a continuous variable
  - The specific draft pick the player will be (1-60)
Neural Network
  - Ability to learn from experience and model non-linear relationships
  - Can be used for both binary and continuous variables
  - Fast execution times, advantageous for teams “on the clock” to select the best player available
Decision/Classification Tree
  - Can provide a visual representation of decision making
  - Easier for those unfamiliar with data mining to follow
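To show why a tree is easy to follow, here is a minimal sketch of how a single tree node could choose a split on one player statistic. The slides do not state which split criterion was used; Gini impurity is assumed here purely for illustration, and the points-per-game values and rounds are made up.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) for the best 'value <= threshold' split."""
    best_threshold, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Made-up example: points per game and the round each player was drafted in.
points_per_game = [8, 11, 14, 17, 20, 23]
draft_round = [2, 2, 2, 1, 1, 1]

threshold, impurity = best_split(points_per_game, draft_round)
```

The resulting rule reads as a plain sentence ("points per game above the threshold goes to one branch"), which is the kind of output a non-specialist can follow.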
Modeling
Techniques used to model binary (draft round) and continuous (draft pick) target variables
Principal Component Analysis
  - Feature engineering to determine which attributes from the original model have the greatest impact
  - A component is significant if its eigenvalue is greater than 1
Stepwise Regression
  - Uses a “stepping” process to determine the significant attributes from the original model
  - P-values of attributes determine their ability to predict the target variable
Linear/Logistic Regression
  - Utilized both PCA components and stepwise variables for regression analysis
Neural Network
  - No specified functional form needed
  - “Black box” issue for interpretation
Decision Tree
  - Decision criteria used to split nodes based on player statistics
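The eigenvalue-greater-than-1 rule can be sketched as below. This is an illustration under assumptions, not the project's actual analysis: it assumes the components come from the correlation matrix of the standardized attributes, uses NumPy, and runs on made-up numbers for three statistics.

```python
import numpy as np

def components_to_keep(X):
    """Return the correlation-matrix eigenvalues (descending) and how many exceed 1."""
    corr = np.corrcoef(X, rowvar=False)          # columns are attributes
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    n_keep = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, n_keep

# Made-up values: points per game, minutes per game, free throw percentage.
X = np.array([
    [5.0, 10.0, 0.70],
    [10.0, 18.0, 0.65],
    [15.0, 25.0, 0.80],
    [20.0, 30.0, 0.60],
    [25.0, 35.0, 0.75],
    [12.0, 20.0, 0.72],
])

eigenvalues, n_keep = components_to_keep(X)
```

Because the eigenvalues of a correlation matrix sum to the number of attributes, an eigenvalue above 1 marks a component that explains more variance than any single standardized attribute does on its own.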
Results – Binary Modeling (Predicting Draft Round)
Utilized generated R^2 and RMSE values to compare models
Generated confusion matrices with accompanying calculations to further compare
The Decision/Classification Tree was most efficient, with R^2 of 0.43
Logistic Regression utilizing Principal Components was the least accurate
Columns: R^2, RMSE, Accuracy, Specificity, Precision
Original model: 0.26, 0.43, 72%, 82%, 63%
Principal Component Analysis: 0.23, 0.44, 71%, 49%, 83%
Stepwise Regression: 0.27, 62%, 78%
Neural Network: 0.48, 80%, 66%
Decision Tree: 0.41, 76%, 77%
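The confusion-matrix calculations named in the table (accuracy, specificity, precision) can be sketched as follows. The slides do not say which round was treated as the positive class; Round 1 is assumed here, and the actual and predicted rounds are made up.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary target."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def classification_metrics(actual, predicted, positive=1):
    """Accuracy, specificity, and precision; None when a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
    }

# Made-up draft rounds for eight players (assumption: Round 1 is "positive").
actual_round = [1, 1, 1, 2, 2, 2, 2, 1]
predicted_round = [1, 1, 2, 2, 2, 1, 2, 1]

metrics = classification_metrics(actual_round, predicted_round)
```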
Results – Continuous Modeling (Predicting Draft Pick)
Utilized R^2 and RMSE metrics to compare the various models
The Decision/Classification Tree again was the most efficient
The Neural Network produced was the least efficient

Model                          R^2    RMSE
Original model                 0.29   13.10
Principal Component Analysis   0.28   13.23
Stepwise Regression            0.30   13.15
Neural Network                 0.26   13.69
Decision Tree                  0.43   12.82
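For reference, the two comparison metrics can be computed as sketched below, using the standard definitions: RMSE is the square root of the mean squared error, and R^2 is one minus the ratio of residual to total sum of squares. The pick numbers are made up and are not drawn from the project's data.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(actual, predicted):
    """Share of variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted draft picks for five players.
actual_pick = [1, 10, 30, 45, 60]
predicted_pick = [5, 12, 25, 50, 55]

pick_rmse = rmse(actual_pick, predicted_pick)
pick_r2 = r_squared(actual_pick, predicted_pick)
```

RMSE is in the same units as the target, so a value near 13 in the table means predictions are off by roughly 13 picks in a typical case.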
Conclusion
We were able to generate a sound business case for the use of predictive analytics for NBA draft decisions
Obtained a sufficient dataset for training purposes and standardized the data for modeling purposes
Utilized feature engineering to narrow down the variables that impact our predicted variables
Developed and compared a variety of data mining techniques/models to predict both draft round selection and draft pick selection
Concluded that, for our dataset, Decision/Classification Trees were best able to explain the variance in both the draft round and draft pick variables
Variable Dictionary
Attribute (Data Type): Description
Draft Year (Continuous): The year in which the player was drafted to play in the NBA.
Round (Categorical): The round in which the player was drafted. The NBA draft consists of two (2) rounds.
Pick: The number in the draft at which the player was selected. Currently 60 players are drafted each year in total, with the first 30 picks in round 1 and picks 31-60 in round 2.
Field Goal Percentage (Numerical): The number of field goals made divided by the number of field goal attempts. Includes both 2-point and 3-point field goals and attempts.
Three Point Percentage: The number of 3-point field goals made divided by the number of 3-point field goals attempted.
Free Throw Percentage: The number of free throws made divided by the number of free throws attempted.
Minutes Played per Game: The average number of minutes played each game by the player. Calculated by dividing total minutes played by the number of games played by the player.
Average Points per Game: The average points scored by the player each game. Calculated by dividing the total points scored by the number of games played by the player.
Average Total Rebounds per Game: The average rebounds secured by the player per game. Calculated by dividing the total rebounds by the number of games played by the player.
Average Assists per Game: The average number of assists a player has per game. Calculated by dividing total assists by the number of games played by the player.
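The calculations described in the dictionary are simple ratios, sketched below. The function names and season totals are hypothetical, and the round lookup assumes the current 60-pick format described above (picks 1-30 in round 1, picks 31-60 in round 2).

```python
def per_game(total, games_played):
    """Per-game average: a season total divided by games played (None if no games)."""
    return total / games_played if games_played else None

def shooting_percentage(made, attempted):
    """Shots made divided by shots attempted, as a fraction (None if no attempts)."""
    return made / attempted if attempted else None

def round_from_pick(pick):
    """Map a pick number to its round under the current 60-pick format (assumption)."""
    if not 1 <= pick <= 60:
        raise ValueError("pick must be between 1 and 60")
    return 1 if pick <= 30 else 2

# Made-up season totals for one player.
points_per_game = per_game(620, 31)
field_goal_pct = shooting_percentage(210, 450)
no_attempts = shooting_percentage(0, 0)
rounds = [round_from_pick(p) for p in (1, 30, 31, 60)]
```

Returning None when there are no games or attempts keeps the value blank rather than estimating it, in line with how missing data points are handled in pre-processing.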