By Andrew Finley. Research Question: Is it possible to predict a football player's professional performance based on collegiate performance?

Presentation transcript:

By Andrew Finley

Research Question
- Is it possible to predict a football player's professional performance based on collegiate performance? That is, is it possible to accurately predict a player's NFL statistics using only their collegiate statistics?
- Why: too many "busts"
- How:
  - Gather statistics for both NCAA and NFL players
  - Use the statistics and ML algorithms to train a program
  - Use the program to predict unseen examples

Presentation Outline
- Related Works: alternate applications of machine learning in sport
- My Approach
  - Machine Learning: Classification
  - Decision Tree Algorithm
- Implementation
  - Statistics to predict
  - Gather and format statistics
  - Insert into Weka (ML software)
  - Build decision tree
- Results and Analysis
  - Cross-validation
  - Feature selection

Related Works
- Mr. NFL/NCAA (predicts games): classification using linear regression on team statistics
- FFtoday.com (predicts fantasy football stats): linear regression on fantasy football statistics
- Draft Tek (predicts the NFL draft): ranks college players and takes a matrix of team needs at every position
- SABRmetrics: uses statistical analysis to create new baseball statistics
  - Example: RUNS = (0.41) 1B + (0.82) 2B + (1.06) 3B + (1.42) HR
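The RUNS example is a weighted sum of hit counts, and a short Python sketch can make the arithmetic concrete. The coefficients come from the slide; the function name and the season line are invented for illustration and are not real player data.

```python
# Linear-weights run estimate from the slide:
# RUNS = 0.41*1B + 0.82*2B + 1.06*3B + 1.42*HR
WEIGHTS = {"1B": 0.41, "2B": 0.82, "3B": 1.06, "HR": 1.42}

def estimated_runs(hits):
    """Return the weighted sum of hit counts; missing hit types count as 0."""
    return sum(WEIGHTS[kind] * hits.get(kind, 0) for kind in WEIGHTS)

# Hypothetical season line, invented for illustration only.
season = {"1B": 100, "2B": 30, "3B": 5, "HR": 20}
print(round(estimated_runs(season), 2))
```

The same pattern, a fixed set of coefficients applied to raw counts, is what the linear regression approaches listed above would learn from data rather than take as given.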

Machine Learning
- Type: Supervised Learning (Classification)
  - The program is given a set of examples (instances) from which it learns to classify unseen examples
  - Each instance is a set of attribute values with a known class
  - The goal is to generate a set of rules that will correctly classify new examples
- Algorithm: Decision Tree

- Create a graph (tree) from the training data
- The leaves are the classes, and the branches are attribute values
- The goal is to make the smallest tree possible that covers all instances
- Use the tree to make a set of classification rules
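To illustrate how a tree with classes at the leaves and attribute values on the branches turns into classification rules, here is a minimal Python sketch. The tree, the attribute names, and the class labels are all hypothetical; this is not the tree produced in the project, and it does not show how the tree is learned.

```python
# Hypothetical tree: internal nodes are (attribute, {value: subtree}),
# leaves are class labels. Attribute names and values are invented.
tree = ("college_rush_yds", {
    "high": "starter",
    "low": ("team_rank", {
        "top25": "starter",
        "unranked": "non-starter",
    }),
})

def tree_to_rules(node, conditions=()):
    """Walk the tree and return one (conditions, class) rule per leaf."""
    if isinstance(node, str):  # leaf: emit the path accumulated so far
        return [(list(conditions), node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

def classify(node, instance):
    """Follow the branches that match the instance's attribute values."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node

for conds, label in tree_to_rules(tree):
    print(" AND ".join(f"{a} = {v}" for a, v in conds), "=>", label)
```

Each root-to-leaf path becomes one rule, so a smaller tree yields a shorter rule set.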

My Data
- I narrowed my predictions down to just quarterbacks and running backs
- Input (NCAA): individual and team stats from every year of college play, as well as team rankings, strength of schedule, height, and weight
  - Combine data was not included due to lack of participation
- Output (NFL):
  - RB: Yards/Carry, Total Rushing Yards, and Rushing TDs, for each of the first 3 seasons, starting after 3 seasons
  - QB: Total Passing Yards, Passing TDs, Interceptions, and QB Rating, for each of the first 3 seasons, starting after 3 seasons

Data Retrieval
- Step 1: Find statistics
  - Online: NFL.com, NCAA.org
  - Collegio Football: database software
- Step 2: Extract data
  - Python scripts parsed the necessary statistics off the websites
  - Statistics from Collegio were exported manually
- Step 3: Convert data into the correct format
  - Python scripts were used to combine the data into 2 large .csv files, one for RB and one for QB
  - Missing data is filled in as accurately as possible
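The combining step can be sketched roughly as follows. This is not the project's actual script: the record layout, the field names, and the "?" placeholder for missing values are assumptions made for illustration, and the sample records are invented.

```python
import csv
import io

# Hypothetical scraped records keyed by player name; field names are invented.
ncaa = {"Player A": {"School": "School X", "RushYds1": 900}}
nfl = {"Player A": {"Team1": "Team Y", "NFLRushYds1": 700}}

def merge_records(ncaa, nfl):
    """Join NCAA and NFL stats for players that appear in both sources."""
    rows = []
    for player in sorted(ncaa.keys() & nfl.keys()):
        row = {"Player": player}
        row.update(ncaa[player])
        row.update(nfl[player])
        rows.append(row)
    return rows

def to_csv(rows, missing="?"):
    """Write rows to CSV text, marking absent values with a placeholder."""
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval=missing)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(merge_records(ncaa, nfl)))
```

Keeping only players present in both sources is one possible design choice; a script could instead keep every college player and leave the NFL columns as missing.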

Example
Player: Ronnie Brown, School: Auburn

NCAA data (blue on the slide), one set of columns per college year (suffix 1, 2, 3):
Year, Pos, Cl, G, Rush Yds, Car, Rush TD, Yds/Car, RushYds/G, Rec Yds, Rec, Rec TD, Yds/Rec, Rec/G, RecYds/G, PR, PR Yds, PR TD, Yds/PR, PR/G, KR, KR Yds, KR TD, Yds/KR, KR/G, Ret TD, Tot Yds, Tot TD, TotYds/G
- Year1: 2002, RB, So
- Year2: 2003, RB, Jr
- Year3: 2004, RB, Sr
- Height: 6'-1'', Weight: 230

NFL data (red on the slide), one set of columns per NFL season (suffix 1, 2, 3):
Season, Team, G, GS, Att, RushYds, RushAvg, RushLng, RushTD, Rec, RecYds, RecAvg, RecLng, RecTD, FUM, Lost, Starting
- Season1: 2005, Miami Dolphins, Starting = TRUE
- Season2: 2006, Miami Dolphins, Starting = TRUE
- Season3: 2007, Miami Dolphins, Starting = TRUE

(Remaining values not shown.)

Weka Data Processing
- Weka is a collection of machine learning algorithms built in Java
- It accepts .csv files only in a particular format
- Preprocessing:
  - Apply filters to fix missing stats
  - Remove all NFL data except the statistic being predicted
  - Classify the desired statistic: if numeric, separate into ranges; if nominal, separate by values
  - Specify attributes
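Separating a numeric statistic into ranges can be sketched in plain Python. The project did this inside Weka; the equal-width binning below is only one possible scheme and is an assumption, as are the function name and the sample yardage values.

```python
def to_ranges(values, bins):
    """Map each numeric value to the label of the equal-width bin it falls in."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1  # avoid zero width when all values match
    labels = []
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), bins - 1)
        start = low + index * width
        labels.append(f"[{start:g}, {start + width:g})")
    return labels

# Hypothetical rushing-yard totals, invented for illustration only.
print(to_ranges([200, 450, 800, 1200], bins=2))
```

After this step the predicted statistic is a nominal class, which is what a classification algorithm such as a decision tree expects.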

Building the Tree
- The tree is constructed from the specified attributes
- Weka converts the tree to classification rules
- Accuracy is measured using cross-validation
- Cross-validation: break the training data into a specified number of sets, then use each set once as the test data while the rest are used as training data
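The cross-validation split described here can be sketched in a few lines of Python. Weka performs this internally; the function names and the `train_fn` and `accuracy_fn` callbacks are hypothetical stand-ins for whatever learner and scoring are used.

```python
def k_fold_splits(instances, k):
    """Yield (train, test) pairs; each instance is in a test set exactly once."""
    folds = [instances[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def cross_validate(instances, k, train_fn, accuracy_fn):
    """Average the accuracy of models trained on k-1 folds and tested on the rest."""
    scores = [accuracy_fn(train_fn(train), test)
              for train, test in k_fold_splits(instances, k)]
    return sum(scores) / len(scores)

# Ten placeholder instances split into 5 folds of 2.
for train, test in k_fold_splits(list(range(10)), 5):
    print("test:", test, "train size:", len(train))
```

Because every instance is tested by a model that never saw it during training, the averaged accuracy is a fairer estimate than accuracy measured on the training data itself.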

Initial Results
- Initial runs using all attributes failed: they created a 1-layer tree that mapped every instance to false for the predicted statistic
- The accuracy varies greatly with slight changes to the attributes used
- Tree size seems to increase as the number of attributes used decreases

Analysis
- The initial 1-layer tree gave an accuracy of 68%
- This tree makes no use of the attributes, so it serves as a baseline that I should be able to beat
- Attribute selection needs to improve
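A tree that always predicts the same class scores exactly the share of instances that belong to that class. The sketch below shows that calculation; the 68/32 class split is an assumption chosen to match the reported 68%, not a figure taken from the project's data.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most common class and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Hypothetical class column: 68 of 100 players labelled False.
labels = [False] * 68 + [True] * 32
print(majority_baseline(labels))
```

Any tree that uses the attributes in a meaningful way should score above this figure under cross-validation.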

Next
- Improve attribute selection to optimize accuracy
- (If time) Implement other algorithms to compare accuracy

Questions?