Jermaine Carn Advisor: Nick Webb

Slides:



Advertisements
Similar presentations
On Comparing Classifiers : Pitfalls to Avoid and Recommended Approach
Advertisements

Presented by: Adrianna W.
Kirkwood Lacrosse Club Statistics and Scoring Based on the 2014 NCAA Statistics Guide.
Predicting basketball RPI. What is RPI? Ratings Percentage Index Based on win/loss percentage throughout the season. Not necessarily a predictor of a.
EXPLOITATION IN MEN’S COLLEGE BASKETBALL David Berri Southern Utah University Robert Brown California State University, San Marcos Dan Rascher University.
Forks Fury U Basketball
BASKETBALL. Basketball  Basic Rules  Offense  Defense  Court and Positions  Techniques  Red is input.
Oliver Reimer Matthew Crites Brian Jones.  Determine the winner of a NFL game between two teams.  How? ◦ What aspects of a team are most important in.
BASKETBALL ON PAPER 1 Math & Basketball: Astronomy used to be just watching pretty stars, too Dean Oliver Author, Basketball on Paper Consultant to the.
Introduction Offensive strategies of the National Football League have seemingly shifted towards a “West Coast” style offense, relying more heavily on.
On Comparing Classifiers: Pitfalls to Avoid and Recommended Approach Published by Steven L. Salzberg Presented by Prakash Tilwani MACS 598 April 25 th.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining Classification: Evaluation February 23,
Using a Feed-forward ANN to predict NBA Games. About my ANN -Trained incrementally using back propagation -Currently it only uses sigmoid activation -Outputs.
1 1 Slide Evaluation. 2 2 n Interactive decision tree construction Load segmentchallenge.arff; look at dataset Load segmentchallenge.arff; look at dataset.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A data mining approach to the prediction of corporate failure.
NBA Statistics and Information Technology By: Greg Stitt.
By Kelly Galarneau. Don’t Attack Me Until the End.  My probability model is not perfect, and never will be.  We will discuss the assumptions and limitations.
Hoos going to win the NCAA Tournament? 5 minute persuasive talk By: Jim Boley
By: Jim Ogle Charles Henkel Kyle Marcangelo.  We are trying to find what team would win in a hypothetical game based on their stats for the
TEAM HANDBALL History: Created in Germany in 1890’s for the training of gymnasts. Two versions: 11 man outdoor game and a 7 man indoor game. The indoor.
Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall Chapter 5: Credibility: Evaluating What’s Been Learned.
Do Now What experience do you have with basketball? What are the 5 fundamentals of basketball? Who is your favorite team and player?
Feature Selection Poonam Buch. 2 The Problem  The success of machine learning algorithms is usually dependent on the quality of data they operate on.
Basketball. History Gameplay Basic Rules Offense Techniques Defense Court and Positions Vocabulary.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
Made by : Calliope – Christine Despotidou MAY 2015.
Next, this study employed SVM to classify the emotion label for each EEG segment. The basic idea is to project input data onto a higher dimensional feature.
BASKETBALL.
Basketball’s Cinderella Stories: What Makes a Successful Underdog in the NCAA Tournament? Alinna Brown.
Introduction to basketball and electronic scoring.
Basketball Irene Kotsiou A1. General rules of basketball Games are played in four quarters of 10 Five players from each team may be on the court at one.
BASKETBALL. ABOUT BASKETBALL Basketball is a sport in which two teams of five players each try to score points on one another by throwing a ball through.
Machine Learning Reading: Chapter Classification Learning Input: a set of attributes and values Output: discrete valued function Learning a continuous.
 The purpose of the Rep Team (All-Star) try outs:  Identify the players that will represent CBHMA for the Atom/Peewee/Bantam/Midget A (B and C if numbers.
Scholastic Bowl Coach Mrs. Ferree.
Training of mind in Football game
Chapter 7 Designing Competition Formats
Handball.
Heal Point System NHIAA February 10, 2009.
Chapter 7 Exploring Measures of Variability
3v3 Intramural Basketball
MARCH MADNESS.
Predictive Analytics at the NHL
Results for all features Results for the reduced set of features
Intramural Ultimate Frisbee
NCAA Basketball Tournament: Predicting Performance
Maroochydore R.S.L. Clippers
Data Science Algorithms: The Basic Methods
Chapter 2 Comparing Two Proportions
Higher: Factors Impacting on Performance- Physical - Tactics
Dipartimento di Ingegneria «Enzo Ferrari»,
School of Computing Science
Evaluating classifiers for disease gene discovery
NBA Draft Prediction BIT 5534 May 2nd 2018
Objectives The student will be able to: use Sigma Notation
Skill training Drill practice Modified and small-sided games
Jeff Harrison & Philip Tan
SPORTS ALGEBRA Center for Research in Mathematics and Science Education North Carolina State University Nancy Smith Michelle Longest.
Implementing AdaBoost
BASKETBALL.
NBA Analytics Zyrus Johnson.
Ensemble learning.
CORRECT STATSKEEPING PROCEDURE
Team Hand Ball.
YOUR TEACHERS HAVE GONE MAD!!!
National 5: Factors Impacting on Performance- Physical - Tactics
Jump-Shot Or Drive? (Using Mixed Strategy Nash Equilibria to Predict Player Behavior) By Patrick Long.
Basketball Position Classification
BASKETBALL.
March Madness Data Crunch Overview
Presentation transcript:

Jermaine Carn Advisor: Nick Webb Data Mining & Machine Learning Approach to Predicting the March Madness Tournament Jermaine Carn Advisor: Nick Webb

Motivation & Hypothesis To find a model for predicting games in the NCAA Basketball Tournament that relied on statistics. More objective approach. Gain in depth knowledge about specific matchup. Hypothesis: a hybrid model will provide higher classification accuracy than simply picking the higher seed. Mention what hybrid model is

Example Bracket 1 in 9.2 quintillion chance of correctly completing the bracket Former President Obama pulled out a 63% for the 2016 Tournament

Data How, What & Where 1008 total instances sports-reference.com/cbb/ (Season statistical totals): FGM, FGA, 3PM, FTM, FTA, ORB, DRB, TOV, OPP ORB, OPP DRB. basketball.realgm.com/ncaa/team-stats: ORB and DRB http://www.cbssports.com/collegebasketball/rankings/nitty-gritty- report: Rating Percetnage Index (RPI) Computed Dean Oliver’s Four Factors (eFG, TR, ORBR, FTR) from aforementioned stats. 1008 total instances There are box score stats available for each team like fga, fta. Scraped those from different websites Used 2000-2015 for training purposes and held out 2016 for testing.

Advanced Stats Rating Percentage Index (RPI) = (WP × 0.25) + (OWP × 0.50) + (OOWP × 0.25) Dean Oliver “Four Factors”: (40%) Effective Field Goal Percentage (eFG) Measure of shooting efficiency that accounts for three pointers being worth more. (25%) Turnover Ratio (TR) An estimate of turnovers per possession. (20%) Offensive Rebounding Percentage (ORBR) Measure of offensive rebounds obtained out of all possible offensive rebounds. (15%) Free Throw Rate (FTR) Measures how often a team gets to the free throw line. The RPI is a measure of strength of schedule. Strength of schedule refers to the degree of difficulty a given team opponents are in comparison to another team's opponents. The statistician Dean Oliver discusses a measurement called the ”Four Factors” that determine the success of a basketball team. The "Four Factors”, with their corresponding weights in parentheses.

Experiment 1: All Data 2000-2015 Data 1008 instance and all attributes Higher Seed: 64.6825% Algorithm ZeroR OneR J48 JRip IBk (k=1) SMO LibSVM Accuracy 49.6032 64.6825 82.0437 74.4048 83.4325 60.7143 86.8056 All algorithms were statistically worse than LibSVM at 5% level.

Experiment 2: Matchup Data Looking at 2 different matchups: Possible Upsets and Toss Ups. Possible Upsets Dataset Defined as when the difference in seeding is greater than 4. 286 instances Toss Up Dataset Defined as difference in seeding being no larger than 4. 394 instances Dataset for Possible Upset contains half upsets and half non upsets. Dataset for Toss Up consist of all possible toss up games. Make clear that the following is on 2000-2015 Mention that i am trying to learn over the entire tournament and look these different types of matchups

Experiment 2: Results Possible Upset Toss Ups Algorithm ZeroR OneR J48 JRip IBk (k=1) SMO LibSVM Higher Seed Accuracy 48.951 68.5315 79.7203 69.9301 81.1189 67.1329 84.6154 56.993 Algorithm ZeroR OneR J48 JRip IBk (k=1) SMO LibSVM Higher Seed Accuracy 47.9167 80.7292 87.5 85.4167 89.0625 72.3958 93.2292 61.4583 Using the WEKA interface, ran these 7 algorithms to find accuracy levels. A comparison of the algorithms resulted in all of the algorithms being statistically worse than LibSVM except for IBk in which there was no statistical difference all at P(x≥ .05).

Attribute Subset Selection Started with 35 attributes Possible Upset Final Subset (5 attributes): Free Throw Attempts (FTA), Defensive Rebounds (DRB), Field Goal Attempts (FGA 2), Free Throw Attempts (FTA 2), Defensive Rebounds (DRB 2) Toss Ups Final Subset (4 attributes): Field Goals Made (FGM), Field Goal Attempts (FGA), Free Throw Attempts (FTA), Free Throw Made (FTM) Note that I removed seed and year from the upset and toss up dataset and started with 35 attributes. Using WEKA and comparing different subsets using significance test, I found the following subsets for each matchup PROVIDE INSIGHT ON FINAL SUBSET!

Experiment 3: 2016 Data Baseline 1: Picking higher seed 68.254% Baseline 2: General Model LibSVM: 68.254% Hybrid Model Possible Upsets LibSVM: 75% Toss Ups IBk: 68.4211% Overall Accuracy: 73.02% Mention what the General Model is: using entire dataset and finding subset of attributes then testing the 2016 tournament over it. Hybrid is breaking the 2016 tournament into the two matchups than testing over the respective 2000-2015 data

Discussion & Future Work More data points Additional statistics Discuss what results and subsets tell me about the matchups (Reaffirms that its hard and where the subset tells me where the focus should be). Possible expansion to including regular season games as well. Then dividing those into Possible Upsets and Toss Ups. Additional statistics should be included to get a better picture of the identity of a team. Stats such as win/loss streak going into the tournament, True Shooting Percentage and other advanced stats. This may give us a better picture of the makeup of a team being that box scores stats can be influenced by the level of competition played over the season.