Analysis of MLS Season Data Using Poisson Regression with R

Presentation transcript:

Analysis of 2016-17 MLS Season Data Using Poisson Regression with R
Ian Campbell
Dr. Bahaeddine Taoufik, Dr. Nancy Cowden, Dr. Kevin Peterson

Main Idea
The main goal of this project is to analyze the 2016-17 MLS season, and to explore the statistical software R, by fitting a Poisson regression model and using it to make inferences about how goals are scored from predictor variables such as passes, possession time, and shots.

The Data
MLS 2016-17 season
Table 1: Team data per game
Each entry corresponds to an individual game.

Variable Correlation

Goals per Match
574 entries
Discrete data
Right skewed
Figure 1: Illustration of the distribution of goals per match

Poisson Distribution
$$f(k;\lambda) = \Pr(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$
X is the discrete variable, k = 0, 1, 2, 3, ...
λ = mean of the discrete variable
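As a quick illustration of the Poisson formula on this slide, the following Python sketch evaluates the probability mass function with the standard library only. The function name `poisson_pmf` and the rate of 1.5 are assumptions chosen for illustration; they are not taken from the slides, whose analysis was carried out in R.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson variable with mean lam, for k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical rate, chosen only to show the shape of the distribution.
lam = 1.5
for k in range(6):
    print(f"Pr(X = {k}) = {poisson_pmf(k, lam):.4f}")
```

The probabilities over all k sum to 1, and for small rates such as this one most of the mass sits on low counts, which is the right-skewed shape typical of goals per match.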

Poisson Distribution
Key assumptions for the Poisson distribution:
Independence: The number of goals is not affected by the time in the match.
Homogeneity: All variables are independent of each other.
The time period is constant.
The mean and variance of the Poisson distribution are the same.
Testing and graphing were done with R.
Goals per match, observed vs. expected count:
Observed mean = 1.483
Observed variance = 1.629
Difference = 0.146
Figure 2: Actual vs. Poisson distribution
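The observed-versus-expected comparison can be sketched in Python as follows. The sample size (574), mean, and variance are the values quoted on the slides. The helper name `expected_counts` and the choice to tabulate 0 through 6 goals are assumptions made for illustration, and the original comparison was produced in R.

```python
import math

N_ENTRIES = 574            # number of entries, from the slides
OBSERVED_MEAN = 1.483      # from the slides
OBSERVED_VARIANCE = 1.629  # from the slides

def expected_counts(n, lam, max_k):
    """Expected number of entries with k goals (k = 0..max_k) under Poisson(lam)."""
    return [n * lam ** k * math.exp(-lam) / math.factorial(k)
            for k in range(max_k + 1)]

# Assumption: 0 to 6 goals covers nearly all of the probability mass.
counts = expected_counts(N_ENTRIES, OBSERVED_MEAN, max_k=6)
for k, c in enumerate(counts):
    print(f"{k} goals: expected about {c:.1f} entries")

# A variance-to-mean ratio near 1 is consistent with the Poisson
# assumption that the mean and variance are equal.
dispersion_ratio = OBSERVED_VARIANCE / OBSERVED_MEAN
print(f"variance / mean = {dispersion_ratio:.3f}")
```

A ratio only slightly above 1 is an informal sign that the raw counts are close to Poisson; it does not replace a formal over-dispersion test on the fitted model.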

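Poisson Regression Model
$$\ln \hat{y}_i = b_0 + b_1 X_{i,1} + b_2 X_{i,2} + b_3 X_{i,3} + \dots + b_k X_{i,k}, \qquad 1 \le i \le n$$
ŷ_i = predicted response
X_{i,1}, ..., X_{i,k} = predictor variables
b_0 = estimated intercept
b_1, ..., b_k = estimated coefficients
In matrix form:
$$\begin{pmatrix} \ln(\hat{y}_1) \\ \vdots \\ \ln(\hat{y}_n) \end{pmatrix} = \begin{pmatrix} 1 & X_{1,1} & X_{1,2} & X_{1,3} & X_{1,4} & \cdots & X_{1,k} \\ \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n,1} & X_{n,2} & X_{n,3} & X_{n,4} & \cdots & X_{n,k} \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}$$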

Data Analysis
H0: The response variable, goals scored, is not over-dispersed
Ha: The response variable, goals scored, is over-dispersed
A final test must be run with R to verify over-dispersion.
Predictor variables:
Passes made (Passes)
Amount of time that the team possessed the ball during the 90-minute game (Possesion.Rate)
Red cards received (Red.Cards)
Corners taken (Corners)
Free kicks taken (Free.Kicks)
Penalty kicks taken (Penalty.Kicks)
Shots taken (Shots)
Shots on target (SoT)
The model will follow the equation:
$$\ln(\text{Goals Scored}) = b_0 + b_1(\text{Passes}) + b_2(\text{Possession}) + b_3(\text{Shots}) + b_4(\text{SoT}) + b_5(\text{Corners}) + b_6(\text{Penalty Kicks}) + b_7(\text{Red Cards}) + b_8(\text{Free Kicks})$$

R Analysis
Insignificant variables: red cards, free kicks
Adjusted equation:
$$\ln(\text{Goals Scored}) = b_0 + b_1(\text{Passes}) + b_2(\text{Possession}) + b_3(\text{Shots}) + b_4(\text{SoT}) + b_5(\text{Corners}) + b_6(\text{Penalty Kicks})$$

Adjusted Quasi-Poisson Model
Significance values increased
Standard errors decreased
Estimations improved
Final equation:
$$\ln(\text{Goals Scored}) = -2.92\times10^{-1} + 1.64\times10^{-3}(\text{Passes}) - 1.39\times10^{-2}(\text{Possession}) + 3.81\times10^{-2}(\text{Shots}) + 6.23\times10^{-2}(\text{SoT}) - 1.88\times10^{-2}(\text{Corners}) + 3.24\times10^{-1}(\text{Penalty Kicks})$$
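To show how the final equation turns match statistics into a prediction, here is a minimal Python sketch. The coefficients are copied from the slide. The function name `predict_goals` and the example game statistics are hypothetical, and each input is assumed to be in the same units the model was fitted with (the slides do not state the units of the possession variable).

```python
import math

# Coefficients as printed in the final equation on the slide.
COEFFICIENTS = {
    "Intercept": -2.92e-1,
    "Passes": 1.64e-3,
    "Possession": -1.39e-2,
    "Shots": 3.81e-2,
    "SoT": 6.23e-2,
    "Corners": -1.88e-2,
    "Penalty Kicks": 3.24e-1,
}

def predict_goals(game):
    """Expected goals: exponentiate the linear predictor (log link)."""
    linear = COEFFICIENTS["Intercept"]
    for name, value in game.items():
        linear += COEFFICIENTS[name] * value
    return math.exp(linear)

# Hypothetical single-game statistics, invented purely for illustration.
example_game = {"Passes": 400, "Possession": 50, "Shots": 12,
                "SoT": 4, "Corners": 5, "Penalty Kicks": 0}
print(f"Predicted goals: {predict_goals(example_game):.2f}")
```

Because the model is linear on the log scale, adding one unit of a predictor multiplies the predicted goals by the exponential of its coefficient, whatever the values of the other inputs.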

Interpretation
Each interpretation holds only when that particular variable increases by one unit (e.g., one pass or one shot) and all other variables are held constant.

| Variable | Estimated coefficient | Percent change | Interpretation |
|---|---|---|---|
| Passes | 1.64e-3 | $(e^{0.00164} - 1) \times 100\% = 0.164\%$ | For every pass made, the expected number of goals increases by 0.164% on average |
| Shots | 3.81e-2 | $(e^{0.0381} - 1) \times 100\% = 3.88\%$ | For every shot taken, the expected number of goals increases by 3.88% on average |
| Shots on Target | 6.23e-2 | $(e^{0.0623} - 1) \times 100\% = 6.42\%$ | For every shot on target, the expected number of goals increases by 6.42% on average |
| Possession | -1.39e-2 | $(e^{-0.0139} - 1) \times 100\% = -1.38\%$ | For every minute of possession a team has, the expected number of goals decreases by 1.38% on average |
| Penalty Kicks | 3.24e-1 | $(e^{0.324} - 1) \times 100\% = 38.26\%$ | For every penalty kick awarded, the expected number of goals increases by 38.26% on average |
| Corners | -1.88e-2 | $(e^{-0.0188} - 1) \times 100\% = -1.86\%$ | For every corner taken, the expected number of goals decreases by 1.86% on average |
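The percentage in each row comes from the same transformation of the coefficient, which the following Python sketch reproduces. The coefficients are the rounded values printed on the slide and the function name `percent_change` is an assumption for illustration. Because the inputs are rounded, the last digit may differ slightly from the figures in the table.

```python
import math

# Rounded coefficients as printed on the slide (full-season model).
coefficients = {
    "Passes": 1.64e-3,
    "Shots": 3.81e-2,
    "Shots on Target": 6.23e-2,
    "Possession": -1.39e-2,
    "Penalty Kicks": 3.24e-1,
    "Corners": -1.88e-2,
}

def percent_change(b):
    """Percent change per one-unit increase in a predictor: (e^b - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0

for name, b in coefficients.items():
    print(f"{name:16s} {b:+.2e} -> {percent_change(b):+.3f}%")
```

A positive coefficient gives a percentage increase and a negative one a decrease. For small coefficients the percentage is close to the coefficient times 100, which is why the passes and possession rows look nearly unchanged by the transformation.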

Penalty Kicks
Diagram dimensions: 24' (goal width), 8' (goal height), 36' (penalty spot to goal line)
Average speed: 70 mph
Reaches goal line in 0.7 seconds
Average time to reach either post: 0.6 seconds
Average human reaction time: 0.25 seconds

Penalty Kick Dataset
93 entries
All penalty kicks taken, whether missed or scored

Over-Dispersion Test
H0: The response variable, goals scored, is not over-dispersed
Ha: The response variable, goals scored, is over-dispersed
P-value greater than 0.05
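The slides do not say which R routine produced this p-value. As a rough companion diagnostic, and not the test used in the presentation, the Pearson dispersion statistic can be computed from observed counts and fitted means: values near 1 suggest the Poisson variance assumption is reasonable, while values well above 1 suggest over-dispersion. In the Python sketch below the function name, the goal counts, and the fitted means are all hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    # Under a Poisson model the variance of each count equals its fitted mean.
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / dof

# Hypothetical goals and fitted means, invented for illustration only.
observed_goals = [0, 1, 1, 2, 3, 0, 2, 1]
fitted_means = [0.8, 1.1, 1.3, 1.6, 2.1, 0.9, 1.7, 1.2]
phi = pearson_dispersion(observed_goals, fitted_means, n_params=2)
print(f"dispersion estimate = {phi:.2f}")
```

A quasi-Poisson fit uses an estimate of this kind to scale the standard errors, which is one reason to check dispersion before settling on a plain Poisson model.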

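Penalty Kick Poisson
Adjusted estimation
Previous penalty kick coefficient: 0.324
The smaller data set results in lower significance values
Adjusted model:
$$\ln(\text{Goals Scored}) = -2.92\times10^{-1} + 1.34\times10^{-3}(\text{Passes}) - 2.32\times10^{-2}(\text{Possession}) + 8.17\times10^{-3}(\text{Shots}) + 7.77\times10^{-2}(\text{SoT}) - 2.76\times10^{-3}(\text{Corners}) + 5.95\times10^{-1}(\text{Penalty Kicks})$$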

Interpretation
Each interpretation holds only when that particular variable increases by one unit (e.g., one pass or one shot) and all other variables are held constant.

| Variable | Estimated coefficient | Percent change | Interpretation |
|---|---|---|---|
| Penalty Kicks | 5.95e-1 | $(e^{0.595} - 1) \times 100\% = 81\%$ | For every penalty kick taken, the expected number of goals increases by 81% on average |
| Shots | 8.17e-3 | $(e^{0.00817} - 1) \times 100\% = 0.82\%$ | For every shot taken, the expected number of goals increases by 0.82% on average |
| Shots on Target | 7.77e-2 | $(e^{0.0777} - 1) \times 100\% = 8.07\%$ | For every shot on target, the expected number of goals increases by 8.07% on average |
| Possession | -2.32e-2 | $(e^{-0.0232} - 1) \times 100\% = -2.29\%$ | For every minute of possession a team has, the expected number of goals decreases by 2.29% on average |
| Passes | 1.34e-3 | $(e^{0.00134} - 1) \times 100\% = 0.134\%$ | For every pass made, the expected number of goals increases by 0.134% on average |
| Corners | -2.76e-3 | $(e^{-0.00276} - 1) \times 100\% = -0.28\%$ | For every corner taken, the expected number of goals decreases by 0.28% on average |

No Penalty Kick Dataset
481 data points
Games in which no penalty kicks were taken

Over-Dispersion Test
H0: The response variable, goals scored, is not over-dispersed
Ha: The response variable, goals scored, is over-dispersed
P-value greater than 0.05

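Non-Penalty Kick Poisson Model
Equation:
$$\ln(\text{Goals Scored}) = -3.11\times10^{-1} + 1.50\times10^{-3}(\text{Passes}) - 1.24\times10^{-2}(\text{Possession}) + 3.88\times10^{-2}(\text{Shots}) + 5.94\times10^{-2}(\text{SoT}) - 1.75\times10^{-2}(\text{Corners})$$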

Interpretation
Each interpretation holds only when that particular variable increases by one unit (e.g., one pass or one shot) and all other variables are held constant.

| Variable | Estimated coefficient | Percent change | Interpretation |
|---|---|---|---|
| Shots | 3.88e-2 | $(e^{0.0388} - 1) \times 100\% = 3.95\%$ | For every shot taken, the expected number of goals increases by 3.95% on average |
| Shots on Target | 5.94e-2 | $(e^{0.0594} - 1) \times 100\% = 6.11\%$ | For every shot on target, the expected number of goals increases by 6.11% on average |
| Possession | -1.24e-2 | $(e^{-0.0124} - 1) \times 100\% = -1.23\%$ | For every minute of possession a team has, the expected number of goals decreases by 1.23% on average |
| Passes | 1.50e-3 | $(e^{0.00150} - 1) \times 100\% = 0.15\%$ | For every pass made, the expected number of goals increases by 0.15% on average |
| Corners | -1.75e-2 | $(e^{-0.0175} - 1) \times 100\% = -1.73\%$ | For every corner taken, the expected number of goals decreases by 1.73% on average |

Conclusion
Soccer is highly variable, and different games are played differently.
Further investigation could improve the model by finding the best predictors.
This method could be applied to other leagues with better players.

Acknowledgments
Dr. Bahaeddine Taoufik
Dr. Nancy Cowden
Dr. Kevin Peterson

Questions