NBA Draft Prediction BIT 5534 May 2nd 2018

Presentation transcript:

NBA Draft Prediction BIT 5534 May 2nd 2018 Nate Grefe Eric Porea Tyler Neff

Contents
- Business Problem
- Questions Investigated
- Data Source
- Pre-Processing
- Modeling
- Results
- Conclusion

Business Problem
- NBA teams draft prospects every year, and those picks can have an immense impact on the success of the franchise
- Even a quick review of historical data shows how much the success of draft picks varies
- Predictive modeling can support a better draft process that helps the team
- The goal is to give teams a powerful tool with many potential applications to improve their success

Questions Investigated
- Where will the NBA prospect be picked?
- What variables are most effective in predicting the draft round and pick of a prospect?
- What variables affect a higher draft pick?

Data Source
- Data set from data.world, using statistics from basketball-reference.com
- College basketball player information from 1989 to 2016
  - Pick number
  - Draft round
  - Points per game
  - Minutes played per game
  - etc.
- 23 total attributes with 32,605 data points
  - Numerical and categorical data

Pre-Processing
- Data set not complete
  - Some players have no data (injury, never played, etc.)
  - Holes in data for some attributes
- Players with no data removed from data set
- Missing data points left blank
  - Not appropriate to assume or estimate missing instances
- 18 attributes and 24,677 data points after cleaning data
- Standardized attributes to be used for accurate comparison
  - Points per game, assists per game, rebounds per game, minutes played per game, field goal percentage, three point percentage, and free throw percentage
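The standardization step above can be sketched as a z-score transform that leaves blank entries blank, matching the decision not to estimate missing values. This is a minimal illustration only: the slides do not say which tool or formula was used, and the function name `standardize` and the sample values are assumptions.

```python
from math import sqrt
from typing import List, Optional, Sequence


def standardize(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Z-score the observed values; missing entries (None) stay missing."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    mean = sum(observed) / len(observed)
    # Sample standard deviation (n - 1 in the denominator); an assumption,
    # since the slides do not state which variant was used.
    variance = sum((v - mean) ** 2 for v in observed) / (len(observed) - 1)
    std = sqrt(variance)
    if std == 0:
        raise ValueError("a constant column cannot be standardized")
    return [None if v is None else (v - mean) / std for v in values]


# Made-up points-per-game values with one blank entry.
points_per_game = [12.0, None, 18.0, 6.0]
z_scores = standardize(points_per_game)
```

After this transform each attribute has mean 0 and unit spread, so points, assists, and shooting percentages can be compared on one scale.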

Modeling
- Logistic Regression
  - A binary approach to predict which round a player is drafted in
  - Round 1 or Round 2
- Linear Regression
  - Approach used to predict a continuous variable
  - The specific draft pick the player will be (1-60)
- Neural Network
  - Ability to learn from experience and model non-linear relationships
  - Ability to be used for binary and continuous variables
  - Fast execution times, advantageous for teams “on the clock” to select the best player available
- Decision/Classification Tree
  - Can provide visual representation of decision making
  - Easier for those unfamiliar with data mining to follow
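As a rough illustration of the binary approach, a fitted logistic regression turns a weighted sum of player statistics into a Round 1 probability and then applies a cutoff. The weights, intercept, feature names, and 0.5 threshold below are invented for the example; none of them come from the presentation.

```python
from math import exp
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def predict_round(
    features: Dict[str, float],
    weights: Dict[str, float],
    intercept: float,
    threshold: float = 0.5,
) -> Tuple[float, int]:
    """Return (probability of Round 1, predicted round)."""
    z = intercept + sum(weights[name] * features[name] for name in weights)
    p_round1 = sigmoid(z)
    return p_round1, 1 if p_round1 >= threshold else 2


# Hypothetical coefficients on standardized statistics.
weights = {"points_per_game": 0.8, "assists_per_game": 0.3}
intercept = -0.2

strong_prospect = {"points_per_game": 1.5, "assists_per_game": 0.5}
weak_prospect = {"points_per_game": -1.0, "assists_per_game": -1.0}

p_strong, round_strong = predict_round(strong_prospect, weights, intercept)
p_weak, round_weak = predict_round(weak_prospect, weights, intercept)
```

The linear regression approach uses the same weighted sum without the sigmoid, reading the result directly as a predicted pick number.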

Modeling
- Techniques used to model binary (draft round) and continuous (draft pick) target variables
- Principal Component Analysis
  - Feature engineering to determine which attributes have the greatest impact from the original model
  - Significant if eigenvalue greater than 1
- Stepwise Regression
  - Allow “stepping” process to determine significant attributes from original model
  - P-values of attributes will determine ability to predict target variable
- Linear/Logistic Regression
  - Utilized both PCA components and stepwise variables for regression analysis
- Neural Network
  - No specified functional form needed
  - “Black box” issue for interpretation
- Decision Tree
  - Decision criteria used to split nodes based on player statistics
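The "eigenvalue greater than 1" rule for principal components can be sketched as follows. The eigenvalues of a correlation matrix are computed here with a small Jacobi rotation routine so the example needs no external libraries; the presentation does not say how its eigenvalues were obtained, and the correlation matrix below is made up.

```python
from math import atan, cos, pi, sin
from typing import List


def jacobi_eigenvalues(matrix: List[List[float]], max_sweeps: int = 50,
                       tol: float = 1e-18) -> List[float]:
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2
                           for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle that zeroes a[p][q]; pi/4 when the diagonals match.
                theta = pi / 4 if diff == 0 else 0.5 * atan(2 * a[p][q] / diff)
                c, s = cos(theta), sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_components(eigenvalues: List[float], cutoff: float = 1.0) -> List[float]:
    """Keep only components whose eigenvalue exceeds the cutoff."""
    return [value for value in eigenvalues if value > cutoff]


# Made-up correlation matrix for three standardized statistics.
correlation = [
    [1.0, 0.6, 0.6],
    [0.6, 1.0, 0.6],
    [0.6, 0.6, 1.0],
]
eigenvalues = jacobi_eigenvalues(correlation)
kept = kaiser_components(eigenvalues)
explained = sum(kept) / sum(eigenvalues)  # share of total variance retained
```

For a correlation matrix the eigenvalues sum to the number of attributes, so an eigenvalue above 1 means the component carries more variance than any single standardized attribute does on its own.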

Results – Binary Modeling (Predicting Draft Round)
- Utilized generated R^2 and RMSE values to compare models
- Generated confusion matrices with accompanying calculations to further compare
- The Decision/Classification Tree was most efficient, with R^2 of 0.43
- Logistic Regression utilizing Principal Components was the least accurate

Values per model, in the column order R^2, RMSE, Accuracy, Specificity, Precision:
Original model: 0.26, 0.43, 72%, 82%, 63%
Principal Component Analysis: 0.23, 0.44, 71%, 49%, 83%
Stepwise Regression: 0.27, 62%, 78%
Neural Network: 0.48, 80%, 66%
Decision Tree: 0.41, 76%, 77%
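The accuracy, specificity, and precision figures are standard confusion-matrix calculations. The sketch below shows how they follow from the four cell counts, assuming Round 1 is treated as the positive class (the slides do not state which class was positive) and using made-up labels.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, specificity, and precision from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),  # share of true Round 2 picks identified
        "precision": tp / (tp + fp),    # share of Round 1 predictions that were right
    }


# Made-up draft rounds for eight players.
actual_rounds = [1, 1, 1, 2, 2, 2, 2, 1]
predicted_rounds = [1, 1, 2, 2, 2, 1, 2, 1]

counts = confusion_counts(actual_rounds, predicted_rounds)
metrics = classification_metrics(*counts)
```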

Results – Continuous Modeling (Predicting Draft Pick)
- Utilized R^2 and RMSE metrics to compare the various models
- The Decision/Classification Tree again was the most efficient
- The Neural Network produced was the least efficient

Model                          R^2    RMSE
Original model                 0.29   13.10
Principal Component Analysis   0.28   13.23
Stepwise Regression            0.30   13.15
Neural Network                 0.26   13.69
Decision Tree                  0.43   12.82
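For reference, the two comparison metrics can be computed from actual and predicted pick numbers as sketched below. The definitions are the usual textbook ones; the pick values are invented and the function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of variance in the actual values explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target (picks)."""
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(ss_res / len(actual))


# Made-up actual and predicted draft picks.
actual_picks = [5, 15, 25, 35]
predicted_picks = [10, 10, 30, 30]

fit = r_squared(actual_picks, predicted_picks)
error = rmse(actual_picks, predicted_picks)
```

Because RMSE is expressed in picks, a value near 13 means a typical prediction misses the actual selection by roughly 13 draft positions.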

Conclusion
- We were able to generate a sound business case for the use of predictive analytics for NBA draft decisions
- Obtained a sufficient dataset for training purposes and standardized the data for modeling purposes
- Utilized feature engineering to narrow down variables which impact our predicted variables
- Developed and compared a variety of data mining techniques/models to predict both draft round selection and draft pick selection
- Concluded that for our dataset, Decision/Classification Trees were most able to explain the variance in both the draft round and draft pick variables

Variable Dictionary
Attribute (Data Type): Description
- Draft Year (Continuous): The year in which the player was drafted to play in the NBA.
- Round (Categorical): The round that the player was drafted in. The NBA draft consists of two (2) rounds.
- Pick: The number in the draft where the player was selected. Currently 60 players are drafted each year in total, with the first 30 picks being in round 1 and picks 31-60 occurring in round 2.
- Field Goal Percentage (Numerical): The number of field goals made divided by the number of field goal attempts. Includes both 2-point and 3-point field goals and attempts.
- Three Point Percentage: The number of 3-point field goals made divided by the total number of 3-point field goals attempted.
- Free Throw Percentage: The number of free throws made divided by the number of free throws attempted.
- Minutes Played per Game: The average number of minutes played each game by the player. Calculated by dividing total minutes played by the number of games played by the player.
- Average Points per Game: The average points scored by the player each game. Calculated by dividing the total points scored by the number of games played by the player.
- Average Total Rebounds per Game: The average rebounds secured by the respective player per game. Calculated by dividing the total rebounds by the number of games played by the player.
- Average Assists per Game: The average number of assists a player has per game. Calculated by dividing total assists by the number of games played by the player.
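The derived attributes in the dictionary are simple ratios, and the round follows from the pick number. The sketch below restates those definitions in code; the function names and season totals are illustrative, and the pick-to-round mapping assumes the current 30-picks-per-round format described above, which may not hold for every draft year in the data set.

```python
from typing import Optional


def per_game(total: float, games_played: int) -> float:
    """Average per game: season total divided by games played."""
    return total / games_played


def shooting_percentage(made: int, attempted: int) -> Optional[float]:
    """Made shots divided by attempts; None when there were no attempts."""
    return made / attempted if attempted else None


def round_for_pick(pick: int) -> int:
    """Round 1 for picks 1-30, round 2 for picks 31-60 (current format)."""
    if 1 <= pick <= 30:
        return 1
    if 31 <= pick <= 60:
        return 2
    raise ValueError("pick must be between 1 and 60")


# Made-up season totals for one player.
points_per_game = per_game(total=300, games_played=20)
free_throw_pct = shooting_percentage(made=45, attempted=60)
no_three_attempts = shooting_percentage(made=0, attempted=0)
```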