
Predicting Success in the National Football League: An in-depth look at the factors that differentiate the winning teams from the losing teams.
Benjamin Rollins, Center for Quality and Applied Statistics, Rochester Institute of Technology

NFL background: 32 teams in 2 conferences (NFC and AFC); a 16-game regular season with 1 bye week.

Why NFL? There are many theories as to the optimal style of play: the year of the quarterback vs. the ground-and-pound style. Third-down conversion is always thought to be important, but is it? And is it the most important factor?

Data: Game-by-game records beginning with the 2000 season, with the variables below recorded per team. Start with analyzing just 2012.
Variables: Score, Rush Yards, Pass Attempts, Pass Completions, Pass Yards, Interceptions, Fumbles, # of Sacks, Sack Yards, Penalty Yards, First Downs, Third Down %, Rush Attempts.

Third Down Conversion Percentage for each team.

Pattern?

Game by Game: Season-level summaries of the data do not fully capture what we want to show. Each game is unique, so use the difference of each variable between the two teams in a game.
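The "difference of variables" step can be sketched as follows. This is only a minimal illustration, assuming each game is stored as a pair of per-team stat dictionaries. The helper `game_differences`, the field names, and all numbers are hypothetical and are not taken from the presentation's data.

```python
def game_differences(team_a, team_b, fields):
    """Return one row per game: team A's value minus team B's for each field."""
    return {f"{name}Diff": team_a[name] - team_b[name] for name in fields}


# Hypothetical single game (made-up numbers, for illustration only).
team_a = {"Score": 27, "FirstDown": 22, "ThirdDownPct": 45.0, "RushAtt": 31}
team_b = {"Score": 20, "FirstDown": 18, "ThirdDownPct": 33.0, "RushAtt": 24}

fields = ["Score", "FirstDown", "ThirdDownPct", "RushAtt"]
row = game_differences(team_a, team_b, fields)

# Binary outcome from team A's point of view: 1 if A outscored B, else 0.
row["Win"] = 1 if row["ScoreDiff"] > 0 else 0
print(row)
```

Working with one row of differences per game keeps each game's context (opponent, pace, conditions) inside the comparison instead of averaging it away over a season.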

Response of Score: Quantitative. If a team scored more points than its opponent, it won.

Is There a Relationship?

Regression

Linear Model
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
ThirdDown%                                  <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: on 254 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 254 DF, p-value: < 2.2e-16
Multicollinearity?
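As a rough sketch of what a one-predictor linear model like the one above estimates, the least-squares slope and intercept can be computed directly. The function `fit_simple_ols` and the toy data are hypothetical. The slides' own estimates are not reproduced here. The 254 residual degrees of freedom reported above are consistent with 256 observations minus 2 estimated parameters, and 32 teams playing 16 games each gives 256 games.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    residual_df = n - 2  # two estimated parameters: intercept and slope
    return intercept, slope, r_squared, residual_df


# Hypothetical toy data: third-down percentage difference vs. score difference.
third_down_diff = [-10, -5, 0, 5, 10]
score_diff = [-6, -1, 1, 2, 9]

intercept, slope, r_squared, residual_df = fit_simple_ols(third_down_diff, score_diff)
print(intercept, slope, r_squared, residual_df)
```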

Is it in the reduced model?
Coefficients:
                  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
FirstDownDiff                                    < 2e-16 ***
ThirdDownPctDiff                                 e-07 ***
RushAttDiff                                      e-06 ***
PassAttDiff                                      < 2e-16 ***
PassYdsDiff                                      e-08 ***
PassIntDiff                                      < 2e-16 ***
FumblesDiff                                      < 2e-16 ***
SackNumDiff                                      e-10 ***
PenYdsDiff                                       ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: on 246 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 9 and 246 DF, p-value: < 2.2e-16

Is there a better response? Is score difference our true goal?

Response of Win: Most games in the NFL are close, so the result of the game is much more important than the margin. The response is binary.

Logistic Regression
Coefficients:
                  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)
FirstDownDiff                                    ***
ThirdDownPctDiff                                 **
PassAttDiff                                      ***
PassYdsDiff                                      ***
PassIntDiff                                      ***
FumblesDiff                                      ***
PenYdsDiff                                       ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 255 degrees of freedom
Residual deviance: on 248 degrees of freedom
AIC:
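To show how coefficients from a logistic model of this kind turn into a win probability and an odds ratio, here is a minimal sketch. The fitted estimates are not available in this transcript, so the coefficient values below are hypothetical placeholders, as are the helper names `win_probability` and `odds_ratio`.

```python
import math


def win_probability(intercept, coefs, diffs):
    """Inverse-logit of the linear predictor: p = 1 / (1 + exp(-eta))."""
    eta = intercept + sum(coefs[name] * diffs[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds of winning when a predictor rises by delta."""
    return math.exp(coef * delta)


# Hypothetical coefficients (NOT the values fitted in the presentation).
coefs = {"FirstDownDiff": 0.10, "PassIntDiff": -0.80}

even_game = win_probability(0.0, coefs, {"FirstDownDiff": 0, "PassIntDiff": 0})
one_more_int = win_probability(0.0, coefs, {"FirstDownDiff": 0, "PassIntDiff": 1})
print(even_game, one_more_int, odds_ratio(coefs["PassIntDiff"]))
```

With a zero intercept and all differences at zero the model returns a probability of 0.5. A negative coefficient, as assumed here for interceptions, pushes the probability below 0.5 and gives an odds ratio below 1.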

Interpretation

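How good is this model?
Full data (rows: actual, columns: predicted)
Actual  Predicted Loss  Predicted Win
Loss    113             16
Win     9               118
Overall error:
Test data (rows: actual, columns: predicted)
Actual  Predicted Loss  Predicted Win
Loss    18              4
Win     3               14
Overall error: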
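The overall error of a classification table is the share of off-diagonal counts. The sketch below recomputes it from the counts in the two tables above, assuming they read as 113/16 and 9/118 for the full data (these sum to 256, matching the number of games) and 18/4 and 3/14 for the test data. The helper `overall_error` is hypothetical, and the results are derived here rather than quoted from the slides.

```python
def overall_error(matrix):
    """matrix[actual][predicted] holds counts; error is the misclassified share."""
    total = sum(sum(row.values()) for row in matrix.values())
    wrong = sum(
        count
        for actual, row in matrix.items()
        for predicted, count in row.items()
        if actual != predicted
    )
    return wrong / total


# Counts as read from the flattened tables (assumed split of the digits).
full = {"Loss": {"Loss": 113, "Win": 16}, "Win": {"Loss": 9, "Win": 118}}
test = {"Loss": {"Loss": 18, "Win": 4}, "Win": {"Loss": 3, "Win": 14}}

print(round(overall_error(full), 4))
print(round(overall_error(test), 4))
```

Under that reading, 25 of 256 games are misclassified in the full data (about 9.8%) and 7 of 39 in the test data (about 18%).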

Tree Analysis

Comparison over the years

Conclusion: Rush attempts are a very important variable in predicting the result of an NFL game. Third-down conversion percentage is important as well. Fewer mistakes, more carries, and a better third-down conversion percentage usually result in a better team.

Questions?
Data Source
Benjamin Rollins, Center for Quality and Applied Statistics, Rochester Institute of Technology