Basketball Position Classification


Basketball Position Classification
Brandon Hardesty, Matt Saldaña, Audrey Bunn

Informal Problem Statement
Utilize classification algorithms to predict a basketball player's most effective position, either forward or guard. NBA players' statistics serve both as the comparison basis in the classification and as the training data for the algorithms.
(Presenter: Audrey)

Formal Problem Statement
Let P be a set of 90 basketball players. P has four subsets: G, F, X, and Y. Subset G is a list of the top 10 NBA guards' statistics for the 2017-2018 season, and subset F is a list of the top 10 NBA forwards' statistics for the same season. Subset X is a list of NCAA statistics for 25 guards, and subset Y is a list of NCAA statistics for 25 forwards. For every player p_i in X and Y, p_i is mapped to forward if its statistics s_i are most similar to set F's average statistics; likewise, p_i is mapped to guard if s_i is most similar to set G's average statistics (formalized below).
P = all players
G = NBA guards
F = NBA forwards
X = NCAA guards
Y = NCAA forwards
p_i = a player being position-classified
s_i = that player's position-specific statistics
(Presenter: Brandon)
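In symbols, this is a nearest-mean rule. A minimal sketch, assuming a distance function d and per-class mean statistics; the slide implies both through "most similar to average statistics" without defining them:

\bar{s}_G = \frac{1}{|G|} \sum_{p_j \in G} s_j,
\qquad
\bar{s}_F = \frac{1}{|F|} \sum_{p_j \in F} s_j

\text{position}(p_i) =
\begin{cases}
  \text{forward} & \text{if } d(s_i, \bar{s}_F) < d(s_i, \bar{s}_G) \\
  \text{guard}   & \text{otherwise}
\end{cases}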

Program Use
- AAU and collegiate programs
- NBA front offices
- Companies investing in big data in sports
- Personal interest
- Algorithm analysis
(Presenter: Spicy)

Context
- "Defining Modern NBA Player Positions: Applying Machine Learning to Uncover Functional Roles in Basketball" by Han Man
- "Using Machine Learning to Find the 8 Types of Players in the NBA" by Alex Cheng
Both works utilize K-means clustering, DBSCAN, and hierarchical clustering.
Similarities to our project: data source, normalization, classifying by position.
Differences: they classify players within a single league, and they use different statistics for classification.
(Presenter: Spicy)

Statistical Evaluation
- Average points per game*
- Average rebounds per game*
- Free throw percentage
- 3-point percentage
*All normalized to a per-36-minute game (sketched below)
(Presenter: Audrey)
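As a sketch, the per-36 normalization and the resulting four-feature vector might look like this (the field names are illustrative; the slide shows no code):

def per36(total, minutes):
    # scale a counting stat to a per-36-minute rate
    return total / minutes * 36.0

def feature_vector(raw):
    # raw: dict of a player's season totals; field names are illustrative
    return [
        per36(raw["points"], raw["minutes"]),
        per36(raw["rebounds"], raw["minutes"]),
        raw["ft_made"] / raw["ft_att"],        # free throw percentage
        raw["three_made"] / raw["three_att"],  # 3-point percentage
    ]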

Forwards vs. Guards, Statistically
(Presenter: Brandon)

Implemented Algorithms
- Learning Vector Quantization
- K-Nearest Neighbors
- Brute Force Comparison

Brute Force Method
Compares the input data to the average stats of the learning data, one statistic at a time. The position with the most "winning" comparisons is the classification; ties are broken with point differentials (sketched below).
Pros:
- Easy to comprehend
- Low RAM usage
Cons:
- Very naïve
- Inaccurate
- Variable/slow run times
(Presenter: Audrey)
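A minimal sketch of this brute-force comparison, assuming the four features above and position averages passed in as dicts (all names here are illustrative, not the authors' code):

FEATURES = ["pts_per36", "reb_per36", "ft_pct", "three_pct"]

def brute_force_classify(player, avg_guard, avg_forward):
    # count, stat by stat, which position's average the player sits closer to
    guard_wins = forward_wins = 0
    guard_diff = forward_diff = 0.0
    for f in FEATURES:
        d_g = abs(player[f] - avg_guard[f])
        d_f = abs(player[f] - avg_forward[f])
        guard_diff += d_g
        forward_diff += d_f
        if d_g < d_f:
            guard_wins += 1
        elif d_f < d_g:
            forward_wins += 1
    if guard_wins != forward_wins:
        # "majority rules": the position with more winning comparisons
        return "guard" if guard_wins > forward_wins else "forward"
    # tie breaker: the smaller total point differential wins
    return "guard" if guard_diff < forward_diff else "forward"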

Learning Vector Quantization Method
LVQ takes training vectors and codebook vectors as inputs. It iterates through the training vectors and, for each one, finds the closest codebook vector. That codebook vector is then moved closer to the training instance if their classes match, or further away if they differ, by a learning rate times the difference between the vectors. Once training has finished, test data is classified by the label of its closest codebook vector (sketched below).
Pros:
- Fastest run times
- Reasonably accurate
Cons:
- Not the most accurate
- Memory intensive
(Presenter: Brandon)
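A minimal LVQ sketch of the update rule just described; the learning-rate schedule, epoch count, and function names are illustrative, since the slide does not specify them:

import math

def train_lvq(train, codebooks, lr=0.3, epochs=20):
    # train: list of (feature_vector, label) pairs
    # codebooks: list of [feature_vector, label] entries, adjusted in place
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x, label in train:
            # find the closest codebook vector (the best matching unit)
            bmu = min(codebooks, key=lambda c: math.dist(c[0], x))
            for i in range(len(x)):
                step = rate * (x[i] - bmu[0][i])
                # pull the unit toward same-class instances, push it away otherwise
                bmu[0][i] += step if bmu[1] == label else -step
    return codebooks

def lvq_classify(x, codebooks):
    # classify by the label of the closest codebook vector
    return min(codebooks, key=lambda c: math.dist(c[0], x))[1]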

K-Nearest Neighbors Method
Given training vectors, a test vector, and a value of K: select the K entries from the training set that are closest to the test value, then make the classification prediction by majority vote among those K nearest training instances (sketched below).
Pros:
- Most accurate
- Low RAM usage
Cons:
- Not the fastest
(Presenter: Spicy)
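A minimal k-NN sketch using Euclidean distance and majority vote; the function name is illustrative, and the default k=3 follows the K value mentioned in the Q&A below:

import math
from collections import Counter

def knn_classify(x, train, k=3):
    # train: list of (feature_vector, label) pairs
    # rank training instances by Euclidean distance to x, take the k closest
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    # majority vote among the k nearest labels
    return Counter(label for _, label in neighbors).most_common(1)[0][0]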

Experimental Procedure
- Classify the college test data (25 guards, 25 forwards) against the NBA learning data
- Multiple runs with different n values (6, 12, 18, 24, 30, 36, 42, 48 test players)
- Track total run time
- Track accuracy percentage
- Track memory usage (measurement sketched below)
(Presenter: Brandon)
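A sketch of how one trial might be timed and measured, assuming Python's time and tracemalloc modules stand in for whatever harness the authors actually used (the dict keys are illustrative):

import time
import tracemalloc

def run_trial(classify, test_players):
    # classify: function mapping a stats dict/vector to "guard" or "forward"
    # test_players: list of dicts with "stats" and "true_position" keys
    tracemalloc.start()
    start = time.perf_counter()
    correct = sum(classify(p["stats"]) == p["true_position"]
                  for p in test_players)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed_ms, correct / len(test_players), peak_bytes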

Run Time Comparison (in milliseconds)
(Presenter: Audrey)

Accuracy Comparison
(Presenter: Spicy)

RAM Usage
(Presenter: Brandon)

Conclusion
Best algorithm for this project: K-Nearest Neighbors
- Most accurate
- Moderate total run times
- Lowest RAM usage
Learning Vector Quantization comes in second
- Moderately accurate
- Fastest run times
- Highest RAM usage
Brute Force = worthless
(Presenter: Audrey)

Future Work
- Predict wins and losses of games
- Predict tournament winners
- Apply the approach to different sports
- Try a different normalization technique
- Utilize more in-game statistics
- Expand the number of positions
(Presenter: Spicy)

Five Questions

Q. Why are the brute force run times so variable? What explains the spike in total run time between 24 test players and 42 test players?
A. The spike in run times comes from the "tie breaker rounds." Some of the players tested in those runs were borderline players, meaning their stats fall almost exactly between those of a forward and a guard. The additional calculations needed to compute point differentials between the input data and these borderline players increased the run time for brute force.

Q. Why does K-Nearest Neighbors take more time than Learning Vector Quantization?
A. K-Nearest Neighbors has a longer run time because the algorithm has to run through the input data four times. It needs that many passes to find the three closest points (our K value) to the feature vector.

Q. Why does LVQ use more RAM to classify?
A. The Learning Vector Quantization algorithm has to use more RAM because it stores both the codebook vectors and the data set.

Q. What would be a more accurate brute force method?
A. Running the point differential "tie breaker round" from the beginning instead of starting with the "majority rules" method. Point differentials would be more direct, finer-grained, and more accurate.

Q. Is there a better classification algorithm out there to solve this problem?
A. LVQ input data: data with the same number of attributes as the training data; that is, the data being classified by the LVQ algorithm.

(Presenter: Brandon)