Published by Beatrice Haynes. Modified over 6 years ago.
1
Analysis of 2016-17 MLS Season Data Using Poisson Regression with R
Ian Campbell, Dr. Bahaeddine Taoufik, Dr. Nancy Cowden, Dr. Kevin Peterson
2
Main Idea
The main goal of this project is to analyze the MLS season and to explore the statistical software R by fitting a Poisson regression model, making inferences about how goals are scored from predictor variables such as passes, possession time, and shots.
3
The Data MLS 2016-17 Season Table 1: Team Data Per Game
Each entry corresponds to an individual game
4
Variable Correlation
5
Goals per Match
574 entries
Discrete data
Right-skewed
Figure 1: Illustration of the distribution of goals per match
6
Poisson Distribution
$f(k;\lambda) = \Pr(X = k) = \dfrac{\lambda^{k} e^{-\lambda}}{k!}$
X is the discrete variable, k = 0, 1, 2, …
λ = mean of the discrete variable
7
Poisson Distribution Key assumptions for the Poisson Distribution
Independence: The number of goals is not affected by the time in the match.
Homogeneity: All variables are independent of each other.
The time period is constant.
The mean and variance of the Poisson distribution are the same.
Testing and graphing were done with R.
Goals per Match, Observed vs. Expected Count: observed mean = 1.483, observed variance = 1.629, difference = 0.146.
Figure 2: Actual vs. Poisson distribution
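The observed-versus-expected comparison can be sketched outside R as well. The following Python snippet is only an illustration, not the authors' code: it takes the reported mean (1.483), variance (1.629), and entry count (574) from the slides, and assumes the expected counts are obtained by multiplying the Poisson probabilities by the number of entries.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson variable with mean lam, k = 0, 1, 2, ..."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


N_ENTRIES = 574            # entries reported on the "Goals per Match" slide
OBSERVED_MEAN = 1.483      # observed mean reported on the slide
OBSERVED_VARIANCE = 1.629  # observed variance reported on the slide

# Expected number of entries with k goals if goals were exactly Poisson
# with mean equal to the observed mean (assumption of this sketch).
expected_counts = {
    k: N_ENTRIES * poisson_pmf(k, OBSERVED_MEAN) for k in range(8)
}

# For a Poisson variable the variance equals the mean, so this ratio
# would be 1; values above 1 point toward over-dispersion.
dispersion_ratio = OBSERVED_VARIANCE / OBSERVED_MEAN

for k, count in expected_counts.items():
    print(f"{k} goals: expected {count:6.1f} entries")
print(f"variance / mean = {dispersion_ratio:.3f}")
```

The printed expected counts are what a Poisson curve in a plot like Figure 2 would show; the observed counts themselves are not in the transcript, so they are not reproduced here.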
8
Poisson Regression Model
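$\ln(\hat{y}_i) = b_0 + b_1 X_{i,1} + b_2 X_{i,2} + b_3 X_{i,3} + \dots + b_k X_{i,k}, \quad 1 \le i \le n$
$\hat{y}_i$ = predicted response
$X_{i,1}, \dots, X_{i,k}$ = predictor variables
$b_0$ = estimated intercept
$b_1, \dots, b_k$ = estimated coefficients
In matrix form:
$$
\begin{pmatrix} \ln(\hat{y}_1) \\ \vdots \\ \ln(\hat{y}_n) \end{pmatrix}
=
\begin{pmatrix}
1 & X_{1,1} & X_{1,2} & X_{1,3} & X_{1,4} & \dots & X_{1,k} \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & X_{n,1} & X_{n,2} & X_{n,3} & X_{n,4} & \dots & X_{n,k}
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{pmatrix}
$$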
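To make the log link concrete, here is a minimal Python sketch of how a fitted model of this form turns predictor values into a predicted count: each row of the design matrix (with a leading 1 for the intercept) is combined with the coefficients, and the result is exponentiated. The function name `predict_counts` and all numbers are hypothetical and chosen only for illustration; they are not taken from the MLS data.

```python
import math


def predict_counts(design_rows, coefficients):
    """Return y_hat_i = exp(b0 + b1*X_i1 + ... + bk*X_ik) for each row.

    Each row must start with 1.0 so that the first coefficient acts
    as the intercept b0.
    """
    predictions = []
    for row in design_rows:
        if len(row) != len(coefficients):
            raise ValueError("each row needs one entry per coefficient")
        linear_predictor = sum(b * x for b, x in zip(coefficients, row))
        predictions.append(math.exp(linear_predictor))
    return predictions


# Hypothetical coefficients b0, b1, b2 (illustration only).
b = [0.1, 0.02, -0.5]

# Hypothetical design matrix: the leading 1.0 multiplies the intercept.
X = [
    [1.0, 10.0, 0.0],
    [1.0, 20.0, 1.0],
]

y_hat = predict_counts(X, b)
for i, value in enumerate(y_hat, start=1):
    print(f"row {i}: predicted count = {value:.3f}")
```

Because the model is linear on the log scale, the predicted counts are always positive, which is one reason the log link suits count data such as goals.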
9
Data Analysis
H0: The response variable, goals scored, is not over-dispersed
Ha: The response variable, goals scored, is over-dispersed
A final test must be run in R to check for over-dispersion.
Predictor variables:
Passes made (Passes)
Amount of time that the team possessed the ball during the 90-minute game (Possesion.Rate)
Red cards received (Red.Cards)
Corners taken (Corners)
Free kicks taken (Free.Kicks)
Penalty kicks taken (Penalty.Kicks)
Shots taken (Shots)
Shots on target (SoT)
The model will follow the equation:
$\ln(\text{Goals Scored}) = b_0 + b_1(\text{Passes}) + b_2(\text{Possession}) + b_3(\text{Shots}) + b_4(\text{SoT}) + b_5(\text{Corners}) + b_6(\text{Penalty Kicks}) + b_7(\text{Red Cards}) + b_8(\text{Free Kicks})$
10
R Analysis
Insignificant variables: Red cards, Free kicks
Adjusted equation:
$\ln(\text{Goals Scored}) = b_0 + b_1(\text{Passes}) + b_2(\text{Possession}) + b_3(\text{Shots}) + b_4(\text{SoT}) + b_5(\text{Corners}) + b_6(\text{Penalty Kicks})$
11
Adjusted Quasi-Poisson Model
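Significance values increased
Standard errors decreased
Estimates improved
Final equation:
$\ln(\text{Goals Scored}) = -2.92 \times 10^{-[\ldots]} + 1.64 \times 10^{-3}(\text{Passes}) - 1.39 \times 10^{-2}(\text{Possession}) + 3.81 \times 10^{-2}(\text{Shots}) + 6.23 \times 10^{-2}(\text{SoT}) - 1.88 \times 10^{-2}(\text{Corners}) + 3.24 \times 10^{-1}(\text{Penalty Kicks})$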
12
Interpretation
The interpretation of each coefficient holds only when that variable increases by one unit (for example, one pass or one shot) and all other variables are held constant. For a coefficient $b$, the percent change in the expected number of goals is $(e^{b} - 1) \times 100\%$.

| Variable | Estimated coefficient | $(e^{b} - 1) \times 100\%$ | Interpretation |
|---|---|---|---|
| Passes | 1.64e-3 | 0.164% | For every pass made, the expected number of goals increases by 0.164% on average |
| Shots | 3.81e-2 | 3.88% | For every shot taken, the expected number of goals increases by 3.88% on average |
| Shots on Target | 6.23e-2 | 6.42% | For every shot on target, the expected number of goals increases by 6.42% on average |
| Possession | -1.39e-2 | -1.38% | For every minute of possession a team has, the expected number of goals decreases by 1.38% on average |
| Penalty Kicks | 3.24e-1 | 38.38% | For every penalty kick awarded, the expected number of goals increases by 38.38% on average |
| Corners | -1.88e-2 | -1.86% | For every corner taken, the expected number of goals decreases by 1.86% on average |
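The conversion from a coefficient to a percent change can be checked with a few lines of Python. This is a small sketch rather than the authors' R code; the helper name `percent_change` is hypothetical, and the coefficients are the rounded values shown on the slide, so the last digit of a result can differ slightly from a percentage computed from unrounded estimates.

```python
import math


def percent_change(coefficient: float) -> float:
    """Percent change in the expected count for a one-unit increase
    in a predictor of a log-link (Poisson) model: (e^b - 1) * 100."""
    return (math.exp(coefficient) - 1.0) * 100.0


# Rounded coefficients as reported on the interpretation slide.
reported_coefficients = {
    "Passes": 1.64e-3,
    "Shots": 3.81e-2,
    "Possession": -1.39e-2,
    "Corners": -1.88e-2,
}

for name, b in reported_coefficients.items():
    print(f"{name}: {percent_change(b):+.3f}% per one-unit increase")
```

A negative coefficient gives a negative percent change, so the same formula covers both increases and decreases.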
13
Penalty Kicks
Average speed: 70 mph
Reaches goal line in 0.7 seconds
Average time to reach either post: 0.6 seconds
Average human reaction time: 0.25 seconds
[Figure: goal diagram labeled 24', 8', and 36']
14
Penalty Kick Dataset 93 entries
All penalty kicks taken; missed or scored
15
Over-Dispersion Test
H0: The response variable, goals scored, is not over-dispersed
Ha: The response variable, goals scored, is over-dispersed
P-value greater than 0.05
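The slides do not say which R test was used, so the following Python sketch swaps in a classic index-of-dispersion test as an illustration of the idea. It treats equal mean and variance (no over-dispersion) as the null hypothesis, assumes the reported variance is a sample variance with an n − 1 denominator, and uses a normal approximation to the chi-square distribution. Since no summary statistics are given for the penalty kick subset, the example reuses the full-season figures from the earlier slide (mean 1.483, variance 1.629, 574 entries).

```python
import math


def index_of_dispersion_test(mean: float, variance: float, n: int):
    """One-sided index-of-dispersion test.

    H0: variance equals mean (no over-dispersion).
    Ha: variance exceeds mean (over-dispersion).
    Returns (statistic, z, p_value).
    """
    df = n - 1
    # Under H0 the statistic is approximately chi-square with df
    # degrees of freedom.
    statistic = df * variance / mean
    # Normal approximation to chi-square(df), reasonable for large df.
    z = (statistic - df) / math.sqrt(2.0 * df)
    # Upper-tail probability of a standard normal variable.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return statistic, z, p_value


statistic, z, p_value = index_of_dispersion_test(
    mean=1.483, variance=1.629, n=574
)
print(f"statistic = {statistic:.1f}, z = {z:.2f}, p = {p_value:.3f}")
```

A p-value above 0.05 means the null hypothesis of no over-dispersion is not rejected, which is the situation in which an ordinary Poisson model is kept instead of a quasi-Poisson one.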
16
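Penalty Kick Poisson
Adjusted estimation. Previous value: 0.324
Smaller data set, resulting in lower significance values
Adjusted model:
$\ln(\text{Goals Scored}) = -2.92 \times 10^{-[\ldots]} + 1.34 \times 10^{-3}(\text{Passes}) - 2.32 \times 10^{-2}(\text{Possession}) + 8.17 \times 10^{-3}(\text{Shots}) + 7.77 \times 10^{-2}(\text{SoT}) - 2.76 \times 10^{-3}(\text{Corners}) + 5.95 \times 10^{-1}(\text{Penalty Kicks})$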
17
Interpretation
The interpretation of each coefficient holds only when that variable increases by one unit (for example, one pass or one shot) and all other variables are held constant. For a coefficient $b$, the percent change in the expected number of goals is $(e^{b} - 1) \times 100\%$.

| Variable | Estimated coefficient | $(e^{b} - 1) \times 100\%$ | Interpretation |
|---|---|---|---|
| Penalty Kicks | 5.95e-1 | 81% | For every penalty kick taken, the expected number of goals increases by 81% on average |
| Shots | 8.17e-3 | 8.51% | For every shot taken, the expected number of goals increases by 8.51% on average |
| Shots on Target | 7.77e-2 | 8.07% | For every shot on target, the expected number of goals increases by 8.07% on average |
| Possession | -2.32e-2 | -2.29% | For every minute of possession a team has, the expected number of goals decreases by 2.29% on average |
| Passes | 1.34e-3 | 0.134% | For every pass made, the expected number of goals increases by 0.134% on average |
| Corners | -2.76e-3 | -0.28% | For every corner taken, the expected number of goals decreases by 0.28% on average |
18
No Penalty Kick Dataset
481 data points Games that did not have penalty kicks present
19
Over-Dispersion Test
H0: The response variable, goals scored, is not over-dispersed
Ha: The response variable, goals scored, is over-dispersed
P-value greater than 0.05
20
Non-Penalty Kick Poisson
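Model equation:
$\ln(\text{Goals Scored}) = -3.11 \times 10^{-[\ldots]} + 1.50 \times 10^{-3}(\text{Passes}) - 1.24 \times 10^{-2}(\text{Possession}) + 3.88 \times 10^{-2}(\text{Shots}) + 5.94 \times 10^{-2}(\text{SoT}) - 1.75 \times 10^{-2}(\text{Corners})$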
21
Interpretation
The interpretation of each coefficient holds only when that variable increases by one unit (for example, one pass or one shot) and all other variables are held constant. For a coefficient $b$, the percent change in the expected number of goals is $(e^{b} - 1) \times 100\%$.

| Variable | Estimated coefficient | $(e^{b} - 1) \times 100\%$ | Interpretation |
|---|---|---|---|
| Shots | 3.88e-2 | 3.95% | For every shot taken, the expected number of goals increases by 3.95% on average |
| Shots on Target | 5.94e-2 | 6.11% | For every shot on target, the expected number of goals increases by 6.11% on average |
| Possession | -1.24e-2 | -1.23% | For every minute of possession a team has, the expected number of goals decreases by 1.23% on average |
| Passes | 1.50e-3 | 0.15% | For every pass made, the expected number of goals increases by 0.15% on average |
| Corners | -1.75e-2 | -1.75% | For every corner taken, the expected number of goals decreases by 1.75% on average |
22
Conclusion
Soccer is highly variable, and different games are played differently.
Investigate ways to improve the model by finding the best predictors.
Apply this method to other leagues with better players.
23
Acknowledgments Dr. Bahaeddine Taoufik Dr. Nancy Cowden
Dr. Kevin Peterson
24
Questions