1
Football for KMS: NFL ’01
Abhijit Kumar, Kaijia Bao, Vishal Rupani
April 30th, 2008
Course Instructor: Prof. Hsinchun Chen
2
Agenda
- Objectives
- Literature Overview
- Knowledge Discovery
- Statistical Analysis
- Data Mining Techniques
- Key Findings
- KMS Demonstration
- Conclusion
Team: Abhi, Vishal, Kai
- Data Collection, Client Relations, Final Presentation
- Data Cleaning, Statistical Analysis, Final Paper
- Data Import, Data Transformation, Data Mining
3
Research Objectives
- Pattern identification
- Descriptive Statistics
- Data Mining Techniques
- Prediction
- Developing a strategy
- Fantasy League
4
Literature Overview
- Moneyball: The Art of Winning an Unfair Game, Michael Lewis
- Las Vegas Odds: www.VegasInsider.com
- NFL Fantasy League: www.Nfl.com/fantasy
5
Knowledge Discovery Process
DATA
- Pro-Football: 3 tables, 40 columns, 82,346 rows
- Lisa Ordóñez: 1 table, 90 columns, 50,417 rows
- SQL 2005 IS
TRANSFORMATION
- Dependent Variables: Play Decision, Intended Player, Play Direction, Yards
- Calculated Variables: GameNum, IsPlayChal, PlayZone, TotalOffTO, PlayDecision, QtrTimeLeft, HalfTimeLeft, GameTimeLeft
- Independent Variables: Defense, Down, GAP, Halftime Left, Off Ydl, Offense, Play Zone, QTR, ToGo, Total Off TO
- SQL 2005 AS
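The slide names QtrTimeLeft, HalfTimeLeft and GameTimeLeft as calculated variables but does not say how they were derived. One plausible derivation is sketched below, assuming regulation 15-minute quarters, a quarter number from 1 to 4, and a quarter clock in seconds. The function name `time_left` and its input format are hypothetical, not taken from the presentation.

```python
QUARTER_SECONDS = 15 * 60  # regulation NFL quarter length, in seconds


def time_left(qtr: int, qtr_seconds_left: int) -> dict:
    """Derive half- and game-level clocks from the quarter clock.

    Assumption: regulation play only (qtr in 1..4); overtime is not handled.
    """
    if not 1 <= qtr <= 4:
        raise ValueError("qtr must be 1..4 (overtime not handled)")
    if not 0 <= qtr_seconds_left <= QUARTER_SECONDS:
        raise ValueError("qtr_seconds_left out of range")
    # Full quarters still to come in the current half: 1 for Q1/Q3, 0 for Q2/Q4.
    quarters_left_in_half = qtr % 2
    # Full quarters still to come in the game.
    quarters_left_in_game = 4 - qtr
    return {
        "QtrTimeLeft": qtr_seconds_left,
        "HalfTimeLeft": qtr_seconds_left + quarters_left_in_half * QUARTER_SECONDS,
        "GameTimeLeft": qtr_seconds_left + quarters_left_in_game * QUARTER_SECONDS,
    }
```

For example, with 5:00 left in the first quarter, `time_left(1, 300)` gives 1,200 seconds left in the half and 3,000 seconds left in the game.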
6
Knowledge Discovery Process
DATA
- Pro-Football: 3 tables, 40 columns, 82,346 rows
- Lisa Ordóñez: 1 table, 90 columns, 53,000 rows
- SQL 2005 IS
TRANSFORMATION
- Dependent Variables
- Calculated Variables
- Independent Variables
- SQL 2005 AS
PROCESSING
- Simple Statistics: Play Decision, Intended Player, Play Direction, Yards
- MS Excel 2007
MINING
- Models: ID3, Neural Networks
- Accuracy: Lift Charts, Classification Matrix
- SQL 2005 AS
7
Dependency Network
9
Intended Player: Statistics
Top 3 intended players for passes, for the 4 teams that played in the semi-finals:
- H.Ward (142), P.Burress (121), B.Shaw (44)
- T.Brown (143), D.Patten (93), M.Edwards (39)
- T.Holt (133), M.Faulk (104), I.Bruce (103)
- J.Thrash (107), D.Staley (89), T.Pinkston (83)
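Counts like these are per-team tallies of pass targets. A minimal sketch of that tally follows, assuming each pass play is available as an (offense, intended player) pair. The function name `top_targets` and the input shape are assumptions for illustration, not the team's actual Excel workflow.

```python
from collections import Counter, defaultdict


def top_targets(plays, n=3):
    """Return the n most frequent intended players for each offense.

    `plays` is an iterable of (offense, intended_player) pairs, one per pass.
    """
    counts = defaultdict(Counter)
    for offense, player in plays:
        counts[offense][player] += 1
    # most_common(n) returns (player, count) pairs in descending count order.
    return {team: tally.most_common(n) for team, tally in counts.items()}
```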
10
Play Direction: Statistics
Direction of rushes for all plays in the 2001 season
[Diagram: rush directions Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End]
11
Play Direction: Statistics
Direction of rushes for all plays in the 2001 season
[Chart: Number of Rushes by Direction]
12
Yardage: Statistics
Yardage during each down for Pass and Rush
[Chart: Average Yards Covered by Yards To Go, for Passes and Rushes]
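The chart's underlying numbers are group averages: mean yards gained for each combination of play type and yards to go. A small sketch of that aggregation is below, assuming each play is a (play type, yards to go, yards gained) tuple. The function name `average_yards` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def average_yards(plays):
    """Mean yards gained, keyed by (play_type, yards_to_go)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of yards, play count]
    for play_type, to_go, yards in plays:
        bucket = totals[(play_type, to_go)]
        bucket[0] += yards
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```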
13
Play Decision: Statistics
Play decisions for the 4 teams that played in the semi-finals
[Chart: Number of Decisions by Play Decision Type]
14
Play Decision: Analysis Overview
- Discovery of what environmental and/or game factors affect play decision
- Discovery of football expert knowledge through data mining
- Prediction of play decisions based on game factors
15
Play Decision: ID3 Analysis
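ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain, that is, the largest drop in entropy of the target. The sketch below shows only that attribute-selection step on plain Python dicts. It is a generic illustration of the textbook rule, not a reproduction of how the SQL 2005 AS model was configured, and the function names are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction from splitting `rows` (a list of dicts) on `attribute`."""
    base = entropy([r[target] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


def best_split(rows, attributes, target):
    """ID3's selection rule: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))
```

Applied to rows keyed by the slide's variable names (for example `Down`, `QTR`, `PlayDecision`), `best_split` returns the attribute that would sit at the root of the tree.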
17
Play Decision: Accuracy
18
Rush Accuracy: Lift Chart
19
Field Goal Accuracy: Lift Chart
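A lift chart ranks cases by the model's predicted probability for one target class (Rush or Field Goal here) and shows how quickly the true cases of that class are captured compared with random selection. The sketch below computes the points of such a curve. The function name `cumulative_lift`, the equal-width steps and the input lists are assumptions for illustration.

```python
def cumulative_lift(scores, actuals, steps=10):
    """Return (population_fraction, captured_fraction, lift) for each step.

    scores  : predicted probability of the target class, one per case
    actuals : True where the case really belongs to the target class
    """
    # Rank cases from most to least confident prediction.
    ranked = [a for _, a in sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)]
    total_pos = sum(ranked)
    if total_pos == 0:
        raise ValueError("no positive cases to capture")
    points = []
    for i in range(1, steps + 1):
        cutoff = round(len(ranked) * i / steps)
        pop = i / steps
        captured = sum(ranked[:cutoff]) / total_pos
        # Lift > 1 means the model beats random selection at this depth.
        points.append((pop, captured, captured / pop))
    return points
```

By construction the last point is always (1.0, 1.0, 1.0): once the whole population is selected, every true case is captured and the lift over random is 1.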
20
Play Decision: Classification Matrix
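A classification matrix cross-tabulates predicted against actual play decisions, and the diagonal cells (predicted equals actual) give the overall accuracy. A minimal sketch follows, assuming two equal-length lists of labels. The function name `classification_matrix` is hypothetical.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count (predicted, actual) pairs and report overall accuracy."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    counts = Counter(zip(predicted, actual))
    # Diagonal cells are the cases where the prediction matched the outcome.
    correct = sum(n for (p, a), n in counts.items() if p == a)
    return counts, correct / len(actual)
```

Off-diagonal cells such as `counts[("Pass", "Rush")]` show which decisions the model confuses with each other.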
21
Play Decision: Key Findings
- Football strategy can be discovered from data rather than from knowledge experts
- Top 3 factors affecting the decision: Down, Off Ydl, Time
- Model accuracy differs depending on which decision is being predicted
- Team-specific strategies may be discovered with more data
22
Play Direction: Analysis Overview
- Discover each team’s strengths and weaknesses in defense and/or offense
- Prediction of play directions based on game factors
[Diagram: rush directions Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End]
23
Play Direction: Accuracy
24
Play Direction: Key Findings (ID3)
25
Intended Player: Analysis Overview
- Discover each team’s favored recipient of a pass
- Prediction of intended player based on game factors
26
Intended Player: Lift Chart
27
Intended Player: Key Findings
- There are 400+ intended players
- Not enough data to accurately predict intended players
- Not enough data to gain knowledge beyond the statistical models
28
Conclusions
PLAY DECISION
- Accurate
- Gained knowledge
PLAY DIRECTION
- Less accurate
- Enough data to gain knowledge
INTENDED PLAYERS
- Insufficient data
- No knowledge gained
- Need to increase sample size
29
Future Direction
Increase sample set
- More instances of different scenarios
Incorporate additional information
- Pro-football-Reference.com
- VegasInsider.com (odds for favorites)
Extend analysis
- Nested case (historical performance)
30
References
- Prof. Lisa Ordóñez, Professor in Statistics
- Steve Aldrich, Author of Moneyball in Football
- About Football, Glossary of terms
32
- Research Objectives
- Literature Overview
- Knowledge Discovery
- Statistics: Intended Player
- Statistics: Play Direction
- Statistics: Yardage
- Statistics: Play Decision
- Accuracy: Lift Chart
- Charts
- Analysis: Play Decision
- Analysis: Play Direction
- Analysis: Intended Player
- Conclusions
- Future Directions
- System Design
33
Backup Slide Section
34
Data Collection
- Sources: Football Outsiders, Pro-Football
- Initial Dataset: 55,000 rows, 90 columns
- Cleaning (Hierarchy, Relevance): 47,033 rows, 30 columns
- Processing: Dependent: 4, Independent: 10, Calculated: 9
- Analysis
35
System Design
[Diagram labels: NFL Season 2001; Football Data; DB; NFL KMS; Model Building; Testing/Accuracy; Pattern Analysis; Formations; Substitutions; Play Decisions; Field Strategy; Defense Strategy; Metrics (Accuracy, Performance)]
36
Yards Analysis
- Yards gained on the play is used as a metric to measure effort
- Discover how environmental and/or game factors affect players’ efforts
Key Findings: top 4 environmental factors
- Off Ydl
- Time
- Down
- Gap