Generalized Linear Mixed Model: English Premier League Soccer – 2003/2004 Season
Introduction
English Premier League Soccer (Football)
20 teams; each plays all others twice (home/away)
Games consist of two halves (45 minutes each), with no overtime
Each team is on offense and defense for 38 games (38 first halves and 38 second halves)
Response variable: goals in a half
Potential independent variables:
Fixed factors: Home dummy, Half2 dummy, Game # (1-38)
Random factors: offensive team, defensive team
Distribution of response: Poisson?
Preliminary Summary
Summary of Previous Slide
Teams vary extensively on offense and defense
Offense (goals scored): min = 38, max = 73, mean = 50.6, SD = 8.85
Defense (goals allowed): min = 26, max = 79, mean = 50.6, SD = 13.75
Strong negative correlation between offense and defense: r = -0.80
Home teams outscore away teams 1.3:1
Second half outscores first half 1.2:1
No evidence of autocorrelation in total goals scored over weeks (Durbin-Watson statistic = 2.03)
“Marginal Analysis” – No Team Effects
Breakdown of goals by Home/Half2 (380 games)
Summary of Previous Slide
Means (variances) for the 4 half types:
Home/1st Half: Mean = Variance =
Away/1st Half: Mean = Variance =
Home/2nd Half: Mean = Variance =
Away/2nd Half: Mean = Variance =
Thus, means and variances are in strong agreement
Chi-square statistics for testing for Poisson: df = (4 categories - 1) - (1 parameter estimated) = 2
P-values all exceed 0.50 (.8505, .5440, .7353, .6957)
Goals scored are consistent with a Poisson distribution
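The goodness-of-fit test on this slide can be written out as below. This is a sketch only: the slide states 4 categories and 1 estimated parameter, but the category definitions (0, 1, 2, and 3 or more goals) and the symbols $O_k$, $E_k$, $\hat\lambda$ and $n$ are assumptions made here for illustration.

```latex
% Pearson chi-square test for Poisson, applied separately to each half type.
% Assumed categories: k = 0, 1, 2, 3+ goals; n = number of halves of that type.
\[
  X^2 = \sum_{k=0}^{3} \frac{(O_k - E_k)^2}{E_k},
  \qquad
  E_k = n \, \frac{e^{-\hat\lambda} \hat\lambda^{k}}{k!} \ (k = 0, 1, 2),
  \qquad
  E_{3+} = n - E_0 - E_1 - E_2
\]
% \hat\lambda is the sample mean of goals for that half type (the 1 estimated parameter).
\[
  \text{df} = (4 - 1) - 1 = 2,
  \qquad
  \text{P-value} = P\left(\chi^2_2 \ge X^2\right)
\]
```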
Generalized Linear Models
Dependent variable: goals scored
Distribution: Poisson
Link function: log
Independent variables: Home and Half2 dummy variables
Models: Model 1 (main effects of Home and Half2) and Model 2 (adds the Home x Half2 interaction)
Models fit using generalized linear model software packages
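As a rough illustration, the two Poisson regressions could be fit in SAS with PROC GENMOD, as sketched below. The slides do not show the call that was used, so this is an assumption; it also assumes a data set named one with numeric variables goals, home and half2, as built in the SAS code at the end of the deck.

```sas
/* Sketch only: assumes data set ONE has one row per team-half with
   numeric variables GOALS, HOME (0/1) and HALF2 (0/1). */

/* Model 1: log(mean goals) = b0 + b1*home + b2*half2 */
proc genmod data=one;
  model goals = home half2 / dist=poisson link=log;
run;

/* Model 2: adds the home-by-half2 interaction term */
proc genmod data=one;
  model goals = home half2 home*half2 / dist=poisson link=log;
run;
```

Comparing the log likelihoods reported by the two runs gives the likelihood ratio test for the interaction.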
Parameter Estimates / Model Fit – Model 1
Distribution: Poisson
Link Function: Log
Dependent Variable: goals
Number of Observations Read: 1520
Number of Observations Used: 1520
Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Algorithm converged.
Parameter Estimates / Model Fit – Model 1
Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq
Rows: Intercept, home, half, Scale
Pr > ChiSq: Intercept <.0001; home <.0001; half; Scale
NOTE: The scale parameter was held fixed.
Parameter Estimates / Model Fit – Model 2
Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Algorithm converged.
Parameter Estimates / Model Fit – Model 2
Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq
Rows: Intercept, home, half, home*half, Scale
Pr > ChiSq: Intercept <.0001; home; half; home*half; Scale
NOTE: The scale parameter was held fixed.
Testing for Home/Half2 Interaction
$H_0$: No Home x Half2 interaction ($\beta_{HomeHalf2} = 0$)
$H_A$: Home x Half2 interaction ($\beta_{HomeHalf2} \ne 0$)
Test 1 – Wald Test
Test 2 – Likelihood Ratio Test
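The numerical results for these two tests did not survive in the slide text, so only the standard forms are sketched here. The symbols $\hat\beta_{HomeHalf2}$, $SE$, $\ln L_1$ and $\ln L_2$ (the maximized log likelihoods of Models 1 and 2) are notation assumed for this illustration.

```latex
% Test 1: Wald test, using the interaction estimate and its standard error from Model 2
\[
  X^2_{W} = \left( \frac{\hat\beta_{HomeHalf2}}{SE(\hat\beta_{HomeHalf2})} \right)^{2},
  \qquad
  \text{P-value} = P\left(\chi^2_1 \ge X^2_{W}\right)
\]
% Test 2: likelihood ratio test, comparing Model 1 (reduced) with Model 2 (full)
\[
  X^2_{LR} = -2\left( \ln L_1 - \ln L_2 \right),
  \qquad
  \text{P-value} = P\left(\chi^2_1 \ge X^2_{LR}\right)
\]
```

Both statistics have 1 degree of freedom because Model 2 adds a single parameter to Model 1.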
Testing for Main Effects for Home & Half2
Wald tests only reported here (both effects are very significant)
Tests based on Model 1 (no-interaction model)
Interpreting the GLM
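With a log link, exponentiating a coefficient gives a multiplicative effect on the expected number of goals. A worked sketch follows, using the Home and Half2 estimates reported in the results table of this deck (.2624 and .1783). The symbol names are assumptions, and the combined figure relies on the no-interaction model.

```latex
\[
  \frac{E[\text{goals} \mid \text{Home}=1]}{E[\text{goals} \mid \text{Home}=0]}
  = e^{\hat\beta_{Home}} = e^{0.2624} \approx 1.30
\]
\[
  \frac{E[\text{goals} \mid \text{Half2}=1]}{E[\text{goals} \mid \text{Half2}=0]}
  = e^{\hat\beta_{Half2}} = e^{0.1783} \approx 1.20
\]
% Home team in the second half versus away team in the first half
\[
  e^{0.2624 + 0.1783} = e^{0.4407} \approx 1.55
\]
```

These ratios agree with the 1.3:1 and 1.2:1 scoring ratios in the preliminary summary.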
Incorporating Random (Team) Effects
Teams clearly vary in terms of offensive and defensive skills (see slide 3)
Since many factors are inputs into team abilities (players, coaches, chemistry), we will treat team offensive and defensive effects as random
There will be 20 random offensive effects (one per team) and 20 random defensive effects
Random Team Effects
All effects are on the log scale for goals scored
Offense effects: $o_i \sim NID(0, \sigma_o^2)$
Defense effects: $d_i \sim NID(0, \sigma_d^2)$
The estimation process assumes $COV(o_i, d_i) = 0$, which seems a stretch (but we can still “observe” the covariance of the estimated random effects)
Mixed Effects Model
Fixed effects: Intercept, Home, Half2
Random effects: Offteam, Defteam
Conditional model (on random effects)
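The equation for the conditional model is not preserved in the slide text. A plausible reconstruction from the stated fixed and random effects is sketched below. The symbols $Y_{ijkl}$, $\lambda_{ijkl}$ and the $\beta$ coefficients are assumptions; $\beta_{Home} Home_i$ and $\beta_{Half2} Half2_j$ correspond to the fixed-effect terms written HOME i and HALF2 j on the BLUP slide.

```latex
% i = home/away, j = half, k = offensive team, l = defensive team
\[
  Y_{ijkl} \mid o_k, d_l \sim \text{Poisson}(\lambda_{ijkl})
\]
\[
  \log \lambda_{ijkl} = \beta_0 + \beta_{Home}\, Home_i + \beta_{Half2}\, Half2_j + o_k + d_l
\]
\[
  o_k \sim NID(0, \sigma_o^2), \qquad d_l \sim NID(0, \sigma_d^2), \qquad COV(o_k, d_l) = 0
\]
```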
Model in Matrix Notation - Example
League has 3 teams: A, B, C
Order of entry of games:
Order of entry of scores within game: Home/1st, Away/1st, Home/2nd, Away/2nd
3 offense effects, 3 defense effects, 24 observations
Model – Based on 3 Teams
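The matrices themselves are not preserved in the slide text, so here is a hedged sketch of what four of the 24 rows could look like. It assumes a game with A at home against B, fixed-effect columns ordered (Intercept, Home, Half2), and team columns ordered (A, B, C); the symbols $X$, $Z_o$, $Z_d$ are notation chosen for this illustration.

```latex
% log(lambda) = X beta + Z_o o + Z_d d, with X, Z_o and Z_d each 24 x 3
% Rows for one game, A (home) vs B (away), in the stated within-game order:
\[
\begin{array}{l|ccc|ccc|ccc}
 & \multicolumn{3}{c|}{X} & \multicolumn{3}{c|}{Z_o} & \multicolumn{3}{c}{Z_d} \\
\text{Row} & \text{Int} & \text{Home} & \text{Half2} & A & B & C & A & B & C \\ \hline
\text{Home/1st} & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
\text{Away/1st} & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
\text{Home/2nd} & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
\text{Away/2nd} & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0
\end{array}
\]
```

With 3 teams there are 6 games (each pair meets home and away), and 4 rows per game give the 24 observations.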
Sequence of Potential Models
1. No fixed or random effects (common mean)
2. Fixed home and second half effects, no random effects
3. Fixed home and second half effects, random offense team effects
4. Fixed home and second half effects, random defense team effects
5. Fixed home and second half effects, random offense and defense team effects
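One way to fit this sequence is sketched below with PROC GLIMMIX. Note that this is a swapped-in technique: the deck itself uses the older %glimmix macro, so results may differ in detail. The data set name one and its variables are taken from the SAS code at the end of the deck; everything else is an assumption.

```sas
/* Sketch only: Model 5 (random offense and defense team effects).
   Assumes data set ONE with GOALS, HOME, HALF2, OFFTEAM, DEFTEAM. */
proc glimmix data=one;
  class offteam defteam;
  model goals = home half2 / dist=poisson link=log solution;
  random offteam defteam;
  /* Optional: random _residual_; adds an extra-dispersion scale
     parameter, which the %glimmix macro estimates by default. */
run;

/* Model 3: use  random offteam;  in place of the RANDOM statement above.
   Model 4: use  random defteam;
   Model 2: omit the RANDOM statement.
   Model 1: also drop home and half2, i.e.  model goals = / dist=poisson link=log; */
```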
Results – Estimates (P-Values)
Columns: Model, Home, Half2, $\sigma_o^2$, $\sigma_d^2$, Residual $\sigma^2$, -2lnL, AIC, BIC
Model 1: (.0001) N/A
Model 2: (.0001) .2624 (.0001) .1783 (.0052) N/A
Model 3: (.0001) .2624 (.0001) .1783 (.0050) (.143*) N/A
Model 4: (.0001) .2624 (.0001) .1783 (.0040) N/A .0588 (.012*)
Model 5: (.0001) .2624 (.0001) .1783 (.0039) .0084 (.162*) .0549 (.012*)
* Based on Z-test, not preferred
Likelihood Ratio Test: $H_0: \sigma_o^2 = 0$ vs $H_A: \sigma_o^2 > 0$; TS = 6.7; $P = 0.5\,P(\chi^2_1 \ge 6.7) = .005$
Based on AIC and BIC, the model with both offense and defense effects is best
No interaction found between team effects and Home or Half2
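The factor of 0.5 in the likelihood ratio P-value arises because the null value of a variance component lies on the boundary of the parameter space. A sketch of the usual reasoning follows; the 50:50 mixture result is the standard one for testing a single variance component, and the notation is assumed for illustration.

```latex
% Under H_0 the LR statistic is asymptotically a 50:50 mixture of a point mass at 0
% (written chi^2_0) and a chi-square with 1 degree of freedom.
\[
  X^2_{LR} \ \overset{H_0}{\sim} \ \tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1
\]
\[
  \text{P-value} = \tfrac{1}{2} P\left(\chi^2_0 \ge 6.7\right) + \tfrac{1}{2} P\left(\chi^2_1 \ge 6.7\right)
  = 0 + \tfrac{1}{2} P\left(\chi^2_1 \ge 6.7\right) \approx 0.005
\]
```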
Goodness of Fit
We test whether the Poisson GLMM is an appropriate model by means of the deviance
$H_0$: Model fits
$H_A$: Model lacks fit
Deviance = 1570.7
DF = N - #fixed parms = 1520 - 3 = 1517
P-value = $P(\chi^2_{1517} \ge 1570.7)$
No evidence of lack of fit*
* If we use the scaled deviance, we do reject, where scaled deviance = 1570.7/0.9531 = 1647.9
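The two chi-square tail probabilities can be checked with a short DATA step such as the sketch below. The variable names are hypothetical; the deviance, degrees of freedom and scale value are those quoted on the slide.

```sas
/* Sketch only: upper-tail chi-square probabilities for the
   deviance and the scaled deviance. Variable names are made up. */
data _null_;
  deviance = 1570.7;
  df       = 1520 - 3;       /* N minus number of fixed-effect parameters */
  scale    = 0.9531;         /* residual (extra-dispersion) estimate */
  scaled   = deviance / scale;
  p_dev    = sdf('chisquare', deviance, df);  /* P(chi-square >= deviance) */
  p_scaled = sdf('chisquare', scaled, df);    /* P(chi-square >= scaled deviance) */
  put df= scaled= p_dev= p_scaled=;
run;
```

The first probability should come out well above 0.05 and the second below it, which would match the slide's two conclusions.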
Best Linear Unbiased Predictors (BLUPs)
Estimated team (random) effects (teams with high defense values allow more goals)
Estimated fixed effects
For each half ijkl, compute $\exp\{\hat\beta_0 + HOME_i + HALF2_j + o_k + d_l\}$ as the BLUP, where $\hat\beta_0$ is the estimated intercept
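If PROC GLIMMIX is used in place of the %glimmix macro, the predicted team effects and the predicted goals per half could be obtained as sketched below. The output data set name blups and the variable name pred_goals are hypothetical, and this is not the code behind the slide.

```sas
/* Sketch only: predicted goals per half on the data scale.
   BLUPS and PRED_GOALS are made-up names. */
proc glimmix data=one;
  class offteam defteam;
  model goals = home half2 / dist=poisson link=log solution;
  /* SOLUTION on RANDOM prints the predicted offense and defense effects */
  random offteam defteam / solution;
  /* PRED(BLUP ILINK) = exp(fixed part + predicted random effects) */
  output out=blups pred(blup ilink)=pred_goals;
run;
```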
Comparison of BLUPs with Actual Scores
For each team half, we have an actual score and a BLUP
Correlation between actual & BLUP =
Concordant pairs of halves (one scores higher on both actual and BLUP than the other) =
Discordant pairs of halves =
“Gamma” = (Concordant - Discordant)/(Concordant + Discordant) =
Evidence of some positive association between actual and predicted scores
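For reference, the “Gamma” on this slide is the Goodman-Kruskal gamma. The counts did not survive in the slide text, so only the definition is given; $C$ and $D$ are notation assumed here for the numbers of concordant and discordant pairs.

```latex
\[
  \hat\gamma = \frac{C - D}{C + D}, \qquad -1 \le \hat\gamma \le 1
\]
% Pairs tied on either the actual score or the BLUP are excluded from both C and D.
% gamma > 0 indicates that halves with higher BLUPs tend to have higher actual scores.
```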
Sources
Data: SoccerPunter.com
Methods:
Littell, Milliken, Stroup, Wolfinger (1996). “SAS System for Mixed Models”
Wolfinger, R. and M. O’Connell (1993). “Generalized Linear Mixed Models: A Pseudo-Likelihood Approach,” J. Statist. Comput. Simul., Vol. 48, pp
SAS Code
/* Read one record per team-half */
data one;
  infile 'engl2003d.dat';
  input hteam $ 1-20 rteam $21-40 goals half2 56 home 64 round 71-73;
  /* The scoring side is the offense; its opponent is the defense */
  if home=1 then do;
    offteam=hteam; defteam=rteam;
  end;
  else do;
    offteam=rteam; defteam=hteam;
  end;
run;

/* Poisson GLMM with random offense and defense team effects (GLIMMIX macro) */
%include 'glmm800.sas';
%glimmix(data=one, procopt=method=reml,
  stmts=%str(
    class offteam defteam;
    model goals = home half2 / s;
    random offteam defteam / s;
  ),
  error=poisson, link=log);
run;