Basketball Position Classification

Brandon Hardesty, Matt Saldaña, Audrey Bunn

Informal Problem Statement
Use classification algorithms to predict a basketball player’s most effective position, either forward or guard. NBA players’ statistics serve as the learning data and as the basis for comparison during classification.
Presenter: Audrey

Formal Problem Statement
Let P be a set of basketball players of length 90. Set P has four subsets: G, F, X, and Y. Subset G is a list of the top 10 NBA guard statistics for the season, and subset F is a list of the top 10 NBA forward statistics for the season. Subset X is a list of NCAA statistics for 25 guards, and subset Y is a list of NCAA statistics for 25 forwards. For all players p_i in G and F, p_i is mapped to forward if its statistics, s_i, are most similar to set F’s average statistics. Similarly, p_i is mapped to guard if s_i is most similar to set G’s average statistics.

P = All players
G = NBA guards
F = NBA forwards
X = NCAA guards
Y = NCAA forwards
p_i = Position classified players
s_i = Position specific statistics
Presenter: Brandon
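The mapping described above can be written compactly. This is only a sketch: it assumes each player's statistics form a vector and that "most similar" means smallest distance under some metric d, which the slides do not name. The symbols \bar{s}_G and \bar{s}_F (the average statistics of G and F) are notation introduced here, and sending exact ties to forward is an arbitrary choice.

```latex
\[
\text{position}(p_i) =
\begin{cases}
\text{guard}   & \text{if } d(s_i, \bar{s}_G) < d(s_i, \bar{s}_F), \\
\text{forward} & \text{otherwise,}
\end{cases}
\qquad
\bar{s}_G = \frac{1}{|G|} \sum_{p_j \in G} s_j, \quad
\bar{s}_F = \frac{1}{|F|} \sum_{p_j \in F} s_j .
\]
```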

Program Use
- AAU & collegiate programs
- NBA front offices
- Companies investing in big data in sports
- Personal interest
- Algorithm analysis
Presenter: Spicy

Context
- Defining Modern NBA Player Positions - Applying Machine Learning to Uncover Functional Roles in Basketball, by Han Man
- Using Machine Learning to Find the 8 Types of Players in the NBA, by Alex Cheng
- Related work utilizes K-means clustering, DBSCAN, and hierarchical clustering
- Similarities: data source, normalization, classifying by position
- Differences: related work classifies players within the same league; statistics used for classification
Presenter: Spicy

Statistical Evaluation
- Average points per game*
- Average rebounds per game*
- Free throw percentage
- 3-point percentage
* All per 36-minute game
Presenter: Audrey
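The per-36-minute scaling of the starred statistics can be sketched as follows. The function names `per_36` and `feature_vector`, and the choice to pass season totals plus total minutes, are assumptions made for illustration; the slides do not show how the team computed these values.

```python
def per_36(total: float, minutes_played: float) -> float:
    """Scale a counting stat (e.g. total points) to a per-36-minute rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total / minutes_played * 36.0


def feature_vector(points, rebounds, minutes, ft_pct, three_pct):
    """Build the four-stat vector listed on the slide.

    Points and rebounds are scaled per 36 minutes; the two shooting
    percentages are already rates, so they pass through unchanged.
    """
    return [per_36(points, minutes), per_36(rebounds, minutes), ft_pct, three_pct]
```

For example, a hypothetical player with 500 points and 250 rebounds in 1,000 minutes would be represented with 18.0 points and 9.0 rebounds per 36 minutes.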

Forwards vs Guards Statistically
Presenter: Brandon

Implemented Algorithms
- Learning Vector Quantization
- K-Nearest Neighbors
- Brute Force Comparison

Brute Force Method
- Compares inputted data to the average stats of the learning data
- Position with the most “winning” comparisons is the classification
- Breaks ties with point differentials
Pros:
- Easy to comprehend
- Low RAM usage
Cons:
- Very naïve
- Inaccurate
- Variable run times/slow
Presenter: Audrey
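A minimal sketch of the brute force idea, under stated assumptions: a position "wins" a comparison when the player's value for that stat is closer to that position's average, and the tie-breaker sums the absolute differences across all stats. The slides do not spell out either rule, so both are interpretations, and the function names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise average of a list of equal-length stat vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def brute_force_classify(player, guard_avg, forward_avg):
    """Classify by per-stat 'wins', breaking ties with total differential."""
    guard_wins = forward_wins = 0
    guard_diff = forward_diff = 0.0
    for value, g, f in zip(player, guard_avg, forward_avg):
        dg, df = abs(value - g), abs(value - f)
        guard_diff += dg
        forward_diff += df
        if dg < df:
            guard_wins += 1      # stat is closer to the guard average
        elif df < dg:
            forward_wins += 1    # stat is closer to the forward average
    if guard_wins != forward_wins:
        return "guard" if guard_wins > forward_wins else "forward"
    # Tie-breaker round (assumed form): smaller summed differential wins.
    return "guard" if guard_diff <= forward_diff else "forward"
```

The tie-breaker branch only runs for borderline players, so the work per player is not uniform.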

Learning Vector Quantization Method
LVQ takes training vectors and codebook vectors as inputs. It iterates through the training vectors and, for each one, finds the closest codebook vector. That codebook vector is then moved closer to the training instance if their classes match, or further away if they differ, by a learning rate times the difference between the vectors. After training has finished, the test data is classified by finding the closest codebook vector.
Pros:
- Accurate
- Fastest
Cons:
- Not the most accurate
- Memory intensive
Presenter: Brandon
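The update rule described above corresponds to the basic LVQ1 scheme, which can be sketched as below. Euclidean distance, the linearly decaying learning rate, and the default values for `lr` and `epochs` are assumptions for illustration; the slides do not state which settings the team used.

```python
import math


def nearest(codebooks, x):
    """Index of the codebook vector closest to x (Euclidean distance)."""
    return min(range(len(codebooks)), key=lambda i: math.dist(codebooks[i][0], x))


def train_lvq(train, codebooks, lr=0.3, epochs=20):
    """train and codebooks are lists of (vector, label) pairs."""
    books = [(list(vec), label) for vec, label in codebooks]  # copy, do not mutate input
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # assumed linear decay schedule
        for x, label in train:
            vec, book_label = books[nearest(books, x)]
            # Pull toward x when the classes match, push away when they differ.
            sign = 1.0 if book_label == label else -1.0
            for j in range(len(vec)):
                vec[j] += sign * rate * (x[j] - vec[j])
    return books


def predict_lvq(books, x):
    """Label of the closest trained codebook vector."""
    return books[nearest(books, x)][1]
```

Classification only compares the test vector against the codebook vectors, not the full training set, which is consistent with LVQ having the fastest run times in this project.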

K-Nearest Neighbors Method
- Given: 2 vectors and a value of K
- Select the K entries from the training set that are closest to the test value
- Make the classification prediction by evaluating the training instances closest to the test data
Pros:
- Most accurate
- Low RAM usage
Cons:
- Not the fastest
Presenter: Spicy
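The two steps above can be sketched as a majority vote among the K closest training entries. Euclidean distance and the function name `knn_classify` are assumptions; the default of K = 3 follows the value mentioned in the Q&A later in the presentation.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3):
    """train is a list of (vector, label) pairs; x is the test vector."""
    # Step 1: keep the k training entries closest to the test value.
    neighbors = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    # Step 2: predict the label that is most common among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With two classes and an odd K, the vote cannot tie. Every prediction measures the distance to each training entry, which may help explain why this method is not the fastest.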

Experimental Procedure
- Run college data tests (25 guards, 25 forwards) against the NBA learning data to classify
- Multiple runs with different n values (6, 12, 18, 24, 30, 36, 42, 48)
- Track total run time
- Track accuracy percentage
- Track memory usage
Presenter: Brandon
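A measurement harness along these lines could track the three quantities listed above. This is a sketch, not the team's tooling: the use of `time.perf_counter` and `tracemalloc` from the Python standard library, and the function name `evaluate`, are assumptions, since the slides do not say how run time or memory was measured.

```python
import time
import tracemalloc


def evaluate(classify, test_set):
    """Run classify(vector) over (vector, label) pairs and report metrics."""
    tracemalloc.start()
    start = time.perf_counter()
    correct = sum(1 for x, label in test_set if classify(x) == label)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return {
        "accuracy_pct": 100.0 * correct / len(test_set),
        "run_time_ms": elapsed_ms,
        "peak_bytes": peak_bytes,
    }
```

To mirror the multiple runs, one could call `evaluate(classifier, test_set[:n])` for each n in (6, 12, 18, 24, 30, 36, 42, 48) and record the resulting dictionaries.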

Run Time Comparison: In Milliseconds
Presenter: Audrey

Accuracy Comparison
Presenter: Spicy

RAM Usage
Presenter: Brandon

Conclusion
Best algorithm for this project: K-Nearest Neighbors
- Most accurate
- Moderate total run times
- Lowest RAM usage
Learning Vector Quantization comes in second
- Moderately accurate
- Fastest run times
- Highest RAM usage
Brute Force = worthless
Presenter: Audrey

Future Work
- Predict wins and losses of games
- Predict tournament winners
- Different sports
- Different normalization technique
- Utilizing more in-game statistics
- Expanding on the number of positions
Presenter: Spicy

Five Questions

Q. Why are the brute force run times so variable? What explains the spike in total run time between 24 test players and 42 test players?
A. The spike in run times comes from the “tie breaker rounds.” Some of the players tested during those runs were borderline players, meaning their stats fall almost in between those of a forward and a guard. The additional calculations needed to create point differentials between the inputted data and these borderline players increased the run time for brute force.

Q. Why does K-Nearest Neighbors take more time than Learning Vector Quantization?
A. K-Nearest Neighbors has a longer run time because the algorithm has to run through the inputted data four times. It has to run through the data this many times to find the three closest points (our K value) to the feature vector.

Q. Why does LVQ use more RAM to classify?
A. The Learning Vector Quantization algorithm has to use more RAM to store the codebook vectors and the data set.

Q. What would be a more accurate brute force method?
A. Running the point differential “tie breaker round” from the beginning instead of the initial “majority rules” method. Point differentials would be more direct, fine-tuned, and accurate.

Q. Is there a better classification algorithm out there to solve this problem?
A. LVQ input data with the same number of attributes as the training data. The data that is being classified by the LVQ algorithm.

Presenter: Brandon
