Football in the 90s Curtis Olswold University of Iowa 22S Honors Project.

Sunday Afternoon Ritual
A team's strategy, or tactical approach to a game, is not unique. Three dominant types of offense exist in the NFL:
1) Pass oriented
2) Run oriented
3) Balanced (run and pass oriented)

Purpose
Construct a model to predict the points a team scores.
Determine the probability that a team wins given certain factors.
Investigate whether there is a significant difference in the points a team scores by year and week.

Why Am I Doing This?
To determine which variables affect the number of points a team scores, and how.
To determine how effective those variables are at predicting the outcome of the game: win or lose.
Rules change almost annually in the NFL to increase the number of points a team scores. Is it really working?

The Variables
Score, Rushing Yards, Passing Yards, Completions, Outcome, Passing Attempts, Rushing Attempts, Interceptions, Fumbles

The Sample
A random sample was drawn from the population of every regular season week from the 1990 season to the 1999 season. Individual team names were not identified. From each week of each year, a sample of 5 teams was randomly chosen. This gave a sample of 850 observations.

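Regression Model
$$\text{Score} = b_0 + b_1(\text{Rushing Attempts}) + b_2(\text{Rushing Yards}) + b_3(\text{Passing Attempts}) + b_4(\text{Pass Completions}) + b_5(\text{Passing Yards}) + b_6(\text{Interceptions}) + b_7(\text{Fumbles})$$
Here $b_0$ is the estimated intercept and $b_1, \dots, b_7$ are the estimated slopes.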

Statistics of the Model
R² is the proportion of variability in Score that is explained by the model. Adjusted R² measures how efficiently the predictor variables are used: it penalizes overcomplicating the model. For this model, the R-Square and Adj R-Sq values indicate that the model explains over 50% of the variability in Score and is not overly complex.
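The penalty that adjusted R² applies can be made concrete with a short sketch. It uses the standard textbook formula, with the 850 observations and 7 predictors described in these slides. The R² value of 0.55 is a hypothetical placeholder, since the slide states only that the model explains over 50% of the variability.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# n = 850 and 7 predictors come from the slides.
# 0.55 is a HYPOTHETICAL R^2, used only for illustration.
example = adjusted_r_squared(0.55, n_obs=850, n_predictors=7)
print(f"adjusted R^2 = {example:.4f}")
```

With a sample this large relative to the number of predictors, the adjustment is small, which is in line with the slide's remark that the model is not overly complex.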

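Significance of Predictors
Parameter estimates (reported columns: DF, Parameter Estimate, Standard Error, t Value, Pr > |t|):

Variable                      Pr > |t|
Intercept                     [...]
ratt (rushing attempts)       [...]
ryds (rushing yards)          <.0001
patt (passing attempts)       <.0001
comp (completions)            [...]
pyds (passing yards)          <.0001
int (interceptions)           [...]
fumble (fumbles)              <.0001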

Interpretation of Significance
Every parameter is significantly different from zero. This means that each of the variables adds to the precision of the model. The significance level is 0.05: if a parameter were truly zero (that is, if the variable did not help in prediction), there would be only a 5% chance of wrongly rejecting that hypothesis.

Interpretation of the Model
For every rush attempt a team makes, the model predicts an additional 3/20 (0.15) of a point, holding the other variables fixed. Every yard that a team gains on the ground adds 3/50 (0.06) of a point. When a rush attempt results in a fumble, the team's predicted score decreases by 1 9/100 (1.09) points.

Interpretation of the Model
Each pass attempt decreases a team's predicted score by 11/25 (0.44) of a point. However, every completion increases the point total by 1/5 (0.20) of a point, and every yard gained from a completed pass increases the team's score by 2/25 (0.08) of a point. If a pass attempt is intercepted, the points scored decrease by just over ½ of a point.

Interpretation of the Model
All of this can be summed up quite simply: a rushing team is superior to a passing team. This is magnified if the team is able to gain substantial yardage per rush. Conversely, if a team passes many times but completes a good percentage of its passes for good yardage, the negative effect of the pass attempt statistic is less pronounced.
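The slope interpretations above can be combined into a small sketch that adds up the slope terms for a given stat line. The coefficients are the rounded fractions quoted on the interpretation slides. The interception slope of -0.5 is an assumption (the slides say only "just over ½"), and the intercept is left out because its value is not given in this transcript. The result is therefore useful for comparing game plans, not for producing an absolute predicted score.

```python
# Slope estimates as rounded on the interpretation slides (points per unit).
COEFFICIENTS = {
    "rush_attempts": 0.15,   # 3/20
    "rush_yards": 0.06,      # 3/50
    "pass_attempts": -0.44,  # -11/25
    "completions": 0.20,     # 1/5
    "pass_yards": 0.08,      # 2/25
    "fumbles": -1.09,        # -(1 + 9/100)
    # ASSUMPTION: the slides say only "just over 1/2"; -0.5 is a placeholder.
    "interceptions": -0.5,
}


def score_contribution(stats: dict) -> float:
    """Sum of slope * value over the supplied stats (intercept NOT included)."""
    return sum(COEFFICIENTS[name] * value for name, value in stats.items())


# Hypothetical stat line: twenty passes, twelve completed for 150 yards.
# The attempts cost points, but completions and yardage outweigh that cost.
passing_game = {"pass_attempts": 20, "completions": 12, "pass_yards": 150}
passing_points = score_contribution(passing_game)
print(round(passing_points, 2))
```

Keeping the coefficients in a dictionary makes the assumed interception value easy to replace once the actual estimate is known.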

Examples of Prediction
[Table: actual score, predicted value, and confidence limits for the mean, for each example observation]

Explanation of the Predictions
The above are the predictions for 9 observations of the sample. None is exact, nor should exactness be expected. They are, however, relatively close to the actual values.

Residual: Error vs. Predicted
[Plot: residual (prediction error) versus predicted value]
Test of First and Second Moment Specification (reported columns: DF, Chi-Square, Pr > ChiSq)

Diagnostic Checking
Residual, or prediction error:
1) Constant variance: The plot shows the residuals fanning out slightly in a megaphone shape. The Cook and Weisberg formal test (1) indicates that the null hypothesis of constant variance cannot be rejected.
2) Normality: The regression coefficients do not rely on the assumption of normal residuals to be asymptotically normal (2).

Diagnostics Continued
Variance inflation is a measure of the multicollinearity (a linear relationship among 2 or more predictors) of the variables. None of the variance inflation values indicates severe multicollinearity.
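As a brief illustration of what the variance inflation diagnostic measures, the sketch below uses the standard definition: the factor for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others. The R_j² inputs here are hypothetical, not values from this project. A common rule of thumb treats values above roughly 10 as a sign of severe multicollinearity, though that threshold is a convention rather than something stated in the slides.

```python
def variance_inflation_factor(r_squared_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing
    predictor j on all of the other predictors."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


# HYPOTHETICAL R_j^2 values, from unrelated to strongly related predictors.
for r2 in (0.0, 0.5, 0.9):
    print(r2, round(variance_inflation_factor(r2), 2))
```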

Example of a Run vs. Pass Oriented Offense
If a team rushes the ball 35 times in a game, gaining 150 yards with 1 fumble, and passes 12 times for 115 yards and no interceptions, then on average it will score 23.5 points. If it throws an interception, the points scored drop to [...].
Now suppose a team rushes 12 times for 75 yards without fumbling and passes 35 times, completing 19 for 315 yards with 2 interceptions. It will average [...] points.

Example of a Balanced Offense
A team that rushes the ball 25 times for 110 yards with 1 fumble and passes 22 times, completing 12 for 145 yards with 2 interceptions, will score on average [...] points per game. If it throws only one interception, the points scored become [...].

Statistical Comparison of Offenses
Definitions:
1) An offense is run oriented if its rush attempts are at least 1.5 times its pass attempts.
2) An offense is pass oriented if its pass attempts are at least 1.5 times its rush attempts.
3) If a team's passing and rushing attempts are within a factor of 1.5 of each other, the offense is balanced.
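The three definitions translate directly into a small classifier. This is a minimal sketch. The function name and the handling of a game with no offensive plays are assumptions, not part of the original analysis.

```python
def classify_offense(rush_attempts: int, pass_attempts: int, ratio: float = 1.5) -> str:
    """Label an offense as 'run', 'pass', or 'balanced' using the 1.5x rule."""
    if rush_attempts < 0 or pass_attempts < 0:
        raise ValueError("attempt counts cannot be negative")
    if rush_attempts == 0 and pass_attempts == 0:
        # ASSUMPTION: the slides do not cover this case; treat it as balanced.
        return "balanced"
    if rush_attempts >= ratio * pass_attempts:
        return "run"
    if pass_attempts >= ratio * rush_attempts:
        return "pass"
    return "balanced"


# The three game plans used in the worked examples on the earlier slides.
for rushes, passes in [(35, 12), (12, 35), (25, 22)]:
    print(rushes, passes, classify_offense(rushes, passes))
```

Applied to the earlier examples, 35 rushes against 12 passes is run oriented, 12 against 35 is pass oriented, and 25 against 22 is balanced.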

The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for score
This test controls the Type I experimentwise error rate.

Alpha: 0.05
Error Degrees of Freedom: 847
Error Mean Square: [...]
Critical Value of Studentized Range: [...]

Comparisons significant at the 0.05 level are indicated by ***.
(Reported columns: orient Comparison, Difference Between Means, Simultaneous 95% Confidence Limits)

Run - Bala     ***
Run - Pass     ***
Bala - Pass    ***

Interpretation of Comparisons
There is a difference between the orientations of teams. In fact, they are all different from each other! Run oriented teams score more points on average than both pass oriented and balanced offenses. Balanced offenses score more than pass oriented teams. Why? Possibly because more time is used by running the football than by passing.

Determining the Probability of Winning the Game
A win is given a value of 1. If a team ties or loses, it is given a value of 0. The only variables in a coach's immediate control are whether the team runs or passes the ball on offense. For this reason, only rushing and passing attempts will be used as independent variables.

Distribution of Wins and Losses

Frequencies of Game Outcomes
[Table: frequency, percent, cumulative frequency, and cumulative percent for each outcome: Loss, Tie, Win]

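Method of Analysis
Logistic regression will be used to model the probability that a team wins. The form of the model is:
$$P(\text{Win}) = \frac{e^{\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes}}}{1 + e^{\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes}}}$$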

Estimation of Parameters
Analysis of Maximum Likelihood Estimates (reported columns: DF, Estimate, Standard Error, Chi-Square, Pr > ChiSq):

Parameter    Pr > ChiSq
Intercept    <.0001
ratt         <.0001
patt         <.0001

Interpretation of the Model
Both rushing and passing attempts are significant factors in determining the probability of winning a game. The parameter estimates are on the log-odds scale (the natural logarithm of the odds of winning). Odds ratios will give more insight into how the probability of winning is affected by rushing and passing.

Fit of the Model
Hosmer and Lemeshow Goodness-of-Fit Test (reported columns: Chi-Square, DF, Pr > ChiSq)
The test shows no evidence of lack of model fit.

Odds Ratios
[Table: point estimate and 95% Wald confidence limits for each effect: ratt, patt]

Interpretation of the Odds Ratios
For a one-attempt increase in rushing attempts, the odds in favor of winning are multiplied by [...]. For every additional pass attempt, the odds in favor of winning are multiplied by [...].
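To show how a log-odds coefficient turns into the multiplier described here, the following sketch converts a coefficient to an odds ratio and applies it to a starting probability. The coefficient 0.1 and the starting probability 0.5 are hypothetical values chosen only for illustration. The estimated odds ratios themselves are not given in this transcript.

```python
import math


def odds_ratio(beta: float) -> float:
    """Multiplier applied to the odds for a one-unit increase in the predictor."""
    return math.exp(beta)


def probability_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


# HYPOTHETICAL coefficient and starting probability, for illustration only.
beta = 0.1
start_probability = 0.5

new_odds = probability_to_odds(start_probability) * odds_ratio(beta)
new_probability = odds_to_probability(new_odds)
print(f"odds ratio = {odds_ratio(beta):.3f}, new probability = {new_probability:.3f}")
```

An odds ratio above 1 raises the probability of winning with each additional attempt, and one below 1 lowers it.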

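Conversion into Probabilities
The equation for the probability of winning a game, which is algebraically equivalent to the logistic form $e^{z}/(1 + e^{z})$:
$$P(\text{Win}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes})}}$$
Substituting the maximum likelihood estimates for the parameters yields:
$$P(\text{Win}) = \frac{1}{1 + e^{-(\hat\beta_0 + \hat\beta_1 \cdot \text{rushes} + \hat\beta_2 \cdot \text{passes})}}$$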
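Written with a negative exponent, the expression 1 / (1 + e^(-z)) is the same function as e^z / (1 + e^z) from the Method of Analysis slide. The sketch below evaluates both forms to confirm that they agree. The three coefficient values are hypothetical stand-ins, since the fitted estimates are not given in this transcript.

```python
import math


def win_probability(rushes: int, passes: int,
                    beta0: float, beta1: float, beta2: float) -> float:
    """P(Win) = 1 / (1 + exp(-(beta0 + beta1*rushes + beta2*passes)))."""
    z = beta0 + beta1 * rushes + beta2 * passes
    return 1.0 / (1.0 + math.exp(-z))


def win_probability_ratio_form(rushes: int, passes: int,
                               beta0: float, beta1: float, beta2: float) -> float:
    """Same model written as exp(z) / (1 + exp(z))."""
    z = beta0 + beta1 * rushes + beta2 * passes
    return math.exp(z) / (1.0 + math.exp(z))


# HYPOTHETICAL coefficients, for illustration only (not the fitted estimates).
coefficients = dict(beta0=-0.2, beta1=0.1, beta2=-0.08)

p_first = win_probability(25, 25, **coefficients)
p_second = win_probability_ratio_form(25, 25, **coefficients)
print(f"{p_first:.6f} {p_second:.6f}")
```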

Some Examples
For a team rushing 25 times and passing 25 times, the model yields a probability of 0.458 that the team will win the game, or a 45.8% chance of winning. If a team rushes 35 times and passes only 15 times, its probability of winning is 0.827, or nearly an 83% chance of victory. Now suppose that team rushes only 15 times and passes 35 times: the probability changes to 0.129, or a 13% chance of winning.
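The stated probabilities can be checked for internal consistency on the log-odds scale. Every example in these slides, including the 50 rushes and no passes case with a 98.5% chance, involves 50 total plays. Trading one pass for one rush should therefore change the log-odds by a constant amount, namely the rushing coefficient minus the passing coefficient. The sketch below recovers that difference from the quoted probabilities alone. The individual coefficients cannot be separated this way, and none are assumed.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# (rushes, passes, stated probability of winning), all from the slides.
EXAMPLES = [(15, 35, 0.129), (25, 25, 0.458), (35, 15, 0.827), (50, 0, 0.985)]

baseline_rushes, _, baseline_p = EXAMPLES[1]
implied_slopes = []
for rushes, passes, p in EXAMPLES:
    if rushes == baseline_rushes:
        continue
    # Change in log-odds per pass converted into a rush, relative to 25/25.
    slope = (logit(p) - logit(baseline_p)) / (rushes - baseline_rushes)
    implied_slopes.append(slope)
    print(f"{rushes} rushes / {passes} passes: implied slope {slope:.4f}")
```

All three comparisons give a slope close to 0.174, so the examples are mutually consistent. On the odds scale, this corresponds to the odds of winning being multiplied by roughly 1.19 for each pass replaced by a rush.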

What the Model Does Not Suggest
Because the model predicts a higher success rate when a team rushes the ball, it may seem that a team should never pass. Indeed, the model gives a 98.5% chance of victory for 50 rushes and no passes. Obviously, if the other team knows you are never going to pass, you won't be able to move the ball 10 yards on 3 plays very consistently. This shows how real-world circumstances aren't always modeled perfectly.

Another Consideration
The model also does not take into account strategy changes in the middle of a game. For example, the farther behind a team is, the more passes it will attempt. Why? Passing takes less time off the game clock.

Do the Year and Week Affect Points Scored?
Year in and year out, rules are changed to increase scoring. Rule changes include:
(1) 2-point conversions allowed
(2) Defensive line encroachment rules
(3) The 5-yard bump rule on receivers
(4) Etc.

2 Way ANOVA for Points Scored, Year and Week
[Table: ANOVA summary (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F) for Model, Error, and C Total; fit statistics (R-Square, Coeff Var, Root MSE, tscore Mean); Type III tests (DF, Type III SS, Mean Square, F Value, Pr > F) for year and week]

Interpretation of the 2 Way ANOVA
The model indicates there is not sufficient evidence to conclude that Year and Week have an effect on how many points are scored. In this model, then, the points a team scores do not differ detectably from week to week or from year to year.

Conclusion
All 3 statistical models point towards rushing attempts as the important statistic in determining the points a team scores and whether or not it wins the game. Ball control is thus the essence of winning a football game. This is most readily seen in a team that rushes with consistency. A team is in a better position to win if it can run the ball and pass only occasionally.

References
(1) Sanford Weisberg, Applied Linear Regression, 2nd Edition, pp. [...]. John Wiley and Sons, 1985.
(2) Neter, Kutner, Nachtsheim, Wasserman, Applied Linear Statistical Models, 4th Edition, pp. [...]. Irwin (Chicago), 1996.
(3) Professor Kate Cowles, University of Iowa, Department of Statistics and Actuarial Science.
(4) Data collected from: