Predicting Success in the National Football League: An in-depth look at the factors that differentiate the winning teams from the losing teams. Benjamin Rollins, Center for Quality and Applied Statistics, Rochester Institute of Technology
NFL background: 32 teams; 2 conferences (NFC and AFC); 16-game regular season; 1 bye week
Why the NFL? There are many theories about the optimal style: "the year of the quarterback" vs. a ground-and-pound style. Third-down conversion is always thought to be important, but is it? And is it the most important factor?
Data: Game-by-game data for seasons 2000 to …; … variables per team. Start by analyzing just 2012. Variables: Score, Rush Yards, Pass Attempts, Pass Completions, Pass Yards, Interceptions, Fumbles, # of Sacks, Sack Yards, Penalty Yards, First Downs, Third Down %, Rush Attempts
Third Down Conversion Percentage for each team.
Pattern?
Game by Game: Summaries of the season data do not fully capture what we want to show. Each game is unique. Use the difference of each variable between the two teams in a game.
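The per-game differencing can be sketched as follows. This is a minimal illustration, not the talk's actual preprocessing: the function name `stat_differences`, the field names, and the box-score numbers are all hypothetical.

```python
from typing import Dict

def stat_differences(team: Dict[str, float],
                     opponent: Dict[str, float]) -> Dict[str, float]:
    """Return team-minus-opponent differences for every shared statistic."""
    return {f"{name}Diff": team[name] - opponent[name]
            for name in team if name in opponent}

# Hypothetical box scores for the two teams in a single game.
home = {"Score": 27, "FirstDown": 22, "ThirdDownPct": 45.0, "RushAtt": 31}
away = {"Score": 20, "FirstDown": 18, "ThirdDownPct": 33.0, "RushAtt": 24}

diffs = stat_differences(home, away)
print(diffs)
```

Each game then contributes one row of differences, so a positive `ScoreDiff` means the first team won.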
Response: Score. Quantitative. If a team scored more points than its opponent, it obviously won.
Is There a Relationship?
Regression
Linear Model
Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      …          …          …        …
ThirdDown%       …          …          …     <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: … on 254 degrees of freedom
Multiple R-squared: …, Adjusted R-squared: …
F-statistic: … on 1 and 254 DF, p-value: < 2.2e-16
Multicollinearity?
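For readers who want to see what a one-predictor fit like this computes, here is a minimal least-squares sketch. The data values are invented for illustration and the helper `simple_ols` is hypothetical; it is not the code that produced the output above.

```python
from typing import List, Tuple

def simple_ols(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1.0 - ss_res / ss_tot

# Made-up per-game differences (team minus opponent), not real NFL data.
third_down_pct_diff = [-20.0, -10.0, 0.0, 10.0, 20.0]
score_diff = [-9.0, -6.0, 1.0, 4.0, 10.0]

intercept, slope, r_squared = simple_ols(third_down_pct_diff, score_diff)
print(f"intercept={intercept:.2f} slope={slope:.2f} R^2={r_squared:.3f}")
```

The slope is read as the expected change in score difference per one-point change in third-down percentage difference.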
Is it in the reduced model?
Coefficients:
                  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          …          …          …        …
FirstDownDiff        …          …          …     < 2e-16 ***
ThirdDownPctDiff     …          …          …     …e-07 ***
RushAttDiff          …          …          …     …e-06 ***
PassAttDiff          …          …          …     < 2e-16 ***
PassYdsDiff          …          …          …     …e-08 ***
PassIntDiff          …          …          …     < 2e-16 ***
FumblesDiff          …          …          …     < 2e-16 ***
SackNumDiff          …          …          …     …e-10 ***
PenYdsDiff           …          …          …     … ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: … on 246 degrees of freedom
Multiple R-squared: …, Adjusted R-squared: …
F-statistic: … on 9 and 246 DF, p-value: < 2.2e-16
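One common way to screen predictors for multicollinearity before reducing a model is to look at pairwise correlations and the variance inflation factor (VIF). The sketch below is an assumption about how such a check could look, not the procedure used in the talk; the data and the helper names `pearson_r` and `vif_two_predictors` are hypothetical. For exactly two predictors, the VIF is 1 / (1 - r^2).

```python
import math
from typing import List

def pearson_r(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(a: List[float], b: List[float]) -> float:
    """VIF when the model contains only these two predictors."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Made-up differences: pass attempts and completions tend to move together.
pass_att_diff = [-12.0, -5.0, 0.0, 6.0, 11.0]
pass_comp_diff = [-8.0, -2.0, 1.0, 3.0, 7.0]

r = pearson_r(pass_att_diff, pass_comp_diff)
vif = vif_two_predictors(pass_att_diff, pass_comp_diff)
print(f"r={r:.3f} VIF={vif:.1f}")
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign that one of the two variables is redundant.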
Is there a better response? Is score difference our true goal?
Response: Win. Most games in the NFL are close. The result of the game is much more important than the margin. Binary.
Logistic Regression
Coefficients:
                  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)          …          …          …        …
FirstDownDiff        …          …          …        … ***
ThirdDownPctDiff     …          …          …        … **
PassAttDiff          …          …          …        … ***
PassYdsDiff          …          …          …        … ***
PassIntDiff          …          …          …        … ***
FumblesDiff          …          …          …        … ***
PenYdsDiff           …          …          …        … ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: … on 255 degrees of freedom
Residual deviance: … on 248 degrees of freedom
AIC: …
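To make the binary-response model concrete, here is a small logistic regression with one predictor, fitted by plain gradient ascent on the log-likelihood. This is a stand-in technique chosen for brevity (statistical packages typically use iteratively reweighted least squares), and the data, learning rate, and step count are assumptions for illustration only.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x: List[float], y: List[int],
                 lr: float = 0.1, steps: int = 5000) -> Tuple[float, float]:
    """Fit P(win) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # gradient of the log-likelihood
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up first-down differences and results (1 = win, 0 = loss).
first_down_diff = [-8.0, -5.0, -3.0, -1.0, 1.0, 2.0, 4.0, 7.0]
win = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(first_down_diff, win)
p_ahead = sigmoid(b0 + b1 * 7.0)    # team with 7 more first downs
p_behind = sigmoid(b0 + b1 * -8.0)  # team with 8 fewer first downs
print(f"b0={b0:.3f} b1={b1:.3f} P(win|+7)={p_ahead:.2f} P(win|-8)={p_behind:.2f}")
```

A game is then classified as a predicted win when the fitted probability exceeds 0.5.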
Interpretation
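A standard way to interpret a logistic regression coefficient is as an odds ratio: exponentiating the coefficient gives the multiplicative change in the odds of winning per one-unit increase in that difference, holding the others fixed. The coefficient below is a hypothetical value, not one of the fitted estimates, which did not survive in this transcript.

```python
import math

def odds_ratio(coefficient: float, change: float = 1.0) -> float:
    """Multiplicative change in the odds for a given change in the predictor."""
    return math.exp(coefficient * change)

# Hypothetical coefficient for a difference variable (not from the talk).
hypothetical_beta = 0.25

one_unit = odds_ratio(hypothetical_beta)        # effect of +1 in the predictor
four_units = odds_ratio(hypothetical_beta, 4.0)  # effect of +4 in the predictor
print(f"+1: odds x {one_unit:.3f}, +4: odds x {four_units:.3f}")
```

A negative coefficient gives an odds ratio below 1, meaning the odds of winning shrink as that difference grows.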
How good is this model?
Full data
               Predicted Loss   Predicted Win
Actual Loss         113               16
Actual Win            9              118
Overall error: …
Test data
               Predicted Loss   Predicted Win
Actual Loss          18                4
Actual Win            3               14
Overall error: …
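The overall error of a confusion matrix is the share of off-diagonal counts. The sketch below assumes the fused digits on the slide read as 113/16/9/118 for the full data (which sums to 256 games) and 18/4/3/14 for the test data; that reading, and the helper name `overall_error`, are assumptions.

```python
from typing import List

def overall_error(matrix: List[List[int]]) -> float:
    """Misclassification rate: off-diagonal counts divided by all counts."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total

# Rows are actual (Loss, Win); columns are predicted (Loss, Win).
full = [[113, 16], [9, 118]]
holdout = [[18, 4], [3, 14]]

full_error = overall_error(full)
holdout_error = overall_error(holdout)
print(f"full: {full_error:.3f}, test: {holdout_error:.3f}")
```

Under that reading, 25 of 256 games (about 9.8%) are misclassified on the full data and 7 of 39 (about 17.9%) on the test data.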
Tree Analysis
Comparison over the years
Conclusion: The number of rush attempts is a very important variable in predicting the result of an NFL game. Third-down conversion percentage is important as well. Fewer mistakes, more carries, and a better third-down conversion percentage usually result in a better team.
Questions? Data Source: … Benjamin Rollins, Center for Quality and Applied Statistics, Rochester Institute of Technology