Generalized Linear Mixed Model English Premier League Soccer – 2003/2004 Season.

1 Generalized Linear Mixed Model English Premier League Soccer – 2003/2004 Season

2 Introduction
English Premier League Soccer (Football)
• 20 teams; each plays all others twice (home/away)
• Games consist of two halves (45 minutes each)
• No overtime
• Each team is on offense and defense for 38 games (38 first halves and 38 second halves)
• Response variable: goals in a half
• Potential independent variables
    Fixed factors: Home dummy, Half2 dummy, Game # (1-38)
    Random factors: Offensive Team, Defensive Team
• Distribution of response: Poisson?

3 Preliminary Summary

4 Summary of Previous Slide
Teams vary extensively on offense and defense
• Offense: min=38, max=73, mean=50.6, SD=8.85
• Defense: min=26, max=79, mean=50.6, SD=13.75
• Strong negative correlation between off/def: r=-0.80
Home teams outscore away teams 1.3:1
Second half outscores first half 1.2:1
No evidence of autocorrelation in total goals scored over weeks (Durbin-Watson statistic = 2.03)

5 “Marginal Analysis” – No Team Effects Break Down Goals by Home/Half2 (380 Games)

6 Summary of Previous Slide
Means (variances) for the 4 half types:
• Home/1st half: mean = 0.692, variance = 0.689
• Away/1st half: mean = 0.521, variance = 0.514
• Home/2nd half: mean = 0.813, variance = 0.912
• Away/2nd half: mean = 0.637, variance = 0.628
• Thus, means and variances are in strong agreement
Chi-square statistics for testing for Poisson:
• df = (4 categories - 1) - (1 parameter estimated) = 2
• P-values all exceed 0.50 (.8505, .5440, .7353, .6957)
• Goals scored are consistent with a Poisson distribution
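The chi-square check described on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the slide does not say which 4 categories were used, so the cells 0, 1, 2 and 3+ goals are a guess, the function name `poisson_gof` is made up, and the counts in the usage line are hypothetical, not the league data.

```python
import math

def poisson_gof(counts):
    """Pearson chi-square test of a Poisson fit.

    counts[k] is the number of halves with k goals for k = 0, 1, 2,
    and counts[3] is the number of halves with 3 or more goals
    (an assumed categorisation).
    """
    n = sum(counts)
    # Estimate the Poisson mean; the open-ended 3+ cell is treated as
    # exactly 3 goals, which is a simplification.
    mean = sum(k * c for k, c in enumerate(counts)) / n
    probs = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(3)]
    probs.append(1.0 - sum(probs))          # probability of 3 or more goals
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    df = (len(counts) - 1) - 1              # (4 categories - 1) - (1 parameter)
    p_value = math.exp(-stat / 2.0)         # chi-square upper tail, exact for df = 2
    return stat, df, p_value

# Hypothetical counts for one half type over 380 games (NOT the real data).
stat, df, p_value = poisson_gof([190, 132, 45, 13])
print(f"X2 = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

A large p-value, as reported for all four half types on the slide, means the observed counts give no evidence against the Poisson assumption.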

7

8 Generalized Linear Models
Dependent Variable: Goals Scored
Distribution: Poisson
Link Function: log
Independent Variables: Home, Half2 dummy variables
Models:
Model fit using generalized linear model software packages
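The model equations on this slide did not survive extraction. Judging from the parameter tables on slides 9-12, the two models appear to be the following; the symbol names are assumptions, and μ denotes the expected number of goals in a half.

```latex
\begin{align*}
\text{Model 1:}\quad \log(\mu) &= \beta_0 + \beta_{\text{Home}}\,\text{Home} + \beta_{\text{Half2}}\,\text{Half2} \\
\text{Model 2:}\quad \log(\mu) &= \beta_0 + \beta_{\text{Home}}\,\text{Home} + \beta_{\text{Half2}}\,\text{Half2}
  + \beta_{\text{Home}\times\text{Half2}}\,(\text{Home}\cdot\text{Half2})
\end{align*}
```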

9 Parameter Estimates / Model Fit – Model 1

Distribution                   Poisson
Link Function                  Log
Dependent Variable             goals
Number of Observations Read    1520
Number of Observations Used    1520

Criteria For Assessing Goodness Of Fit
Criterion             DF        Value    Value/DF
Deviance            1517    1650.4574      1.0880
Scaled Deviance     1517    1650.4574      1.0880
Pearson Chi-Square  1517    1549.2570      1.0213
Scaled Pearson X2   1517    1549.2570      1.0213
Log Likelihood             -1411.0226
Algorithm converged.

10 Parameter Estimates / Model Fit – Model 1

Analysis Of Parameter Estimates
Parameter   DF   Estimate   Std Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept    1    -0.6397      0.0588      -0.7549     -0.5245           118.48       <.0001
home         1     0.2624      0.0634       0.1381      0.3866            17.12       <.0001
half2        1     0.1783      0.0631       0.0546      0.3020             7.98       0.0047
Scale        0     1.0000      0.0000       1.0000      1.0000
NOTE: The scale parameter was held fixed.

11 Parameter Estimates / Model Fit – Model 2

Criteria For Assessing Goodness Of Fit
Criterion             DF        Value    Value/DF
Deviance            1516    1650.3613      1.0886
Scaled Deviance     1516    1650.3613      1.0886
Pearson Chi-Square  1516    1549.7072      1.0222
Scaled Pearson X2   1516    1549.7072      1.0222
Log Likelihood             -1410.9745
Algorithm converged.

12 Parameter Estimates / Model Fit – Model 2

Analysis Of Parameter Estimates
Parameter    DF   Estimate   Std Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept     1    -0.6519      0.0711      -0.7912     -0.5126            84.15       <.0001
home          1     0.2839      0.0941       0.0995      0.4683             9.10       0.0026
half2         1     0.2007      0.0958       0.0129      0.3885             4.39       0.0363
home*half2    1    -0.0395      0.1274      -0.2891      0.2101             0.10       0.7566
Scale         0     1.0000      0.0000       1.0000      1.0000
NOTE: The scale parameter was held fixed.

13 Testing for Home/Half2 Interaction
H0: No Home x Half2 interaction (β_HomeHalf2 = 0)
HA: Home x Half2 interaction (β_HomeHalf2 ≠ 0)
Test 1 – Wald Test
Test 2 – Likelihood Ratio Test
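The calculations for the two tests were lost from this slide, but both can be recomputed from the Model 1 and Model 2 output on slides 9-12. The following Python sketch does so with the standard library only; the variable and function names are illustrative, and the inputs are the estimate, standard error and log likelihoods copied from those slides.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Interaction estimate and standard error from the Model 2 output.
est, se = -0.0395, 0.1274
# Log likelihoods from the Model 1 (no interaction) and Model 2 output.
loglik_model1 = -1411.0226
loglik_model2 = -1410.9745

# Test 1: Wald chi-square = (estimate / standard error)^2 on 1 df.
wald_chisq = (est / se) ** 2
wald_p = chi2_sf_1df(wald_chisq)

# Test 2: likelihood ratio chi-square = 2 * (lnL_full - lnL_reduced) on 1 df.
lr_chisq = 2.0 * (loglik_model2 - loglik_model1)
lr_p = chi2_sf_1df(lr_chisq)

print(f"Wald: X2 = {wald_chisq:.4f}, p = {wald_p:.4f}")
print(f"LR:   X2 = {lr_chisq:.4f}, p = {lr_p:.4f}")
```

Both statistics are close to 0.10, in line with the chi-square and p-value (0.7566) printed for home*half2, so neither test gives evidence of an interaction.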

14 Testing for Main Effects for Home & Half2 Wald tests only reported here (both effects are very significant) Tests based on Model 1 (no interaction model)

15 Interpreting the GLM
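The body of this slide was not captured. One standard way to read a log-link Poisson model is to exponentiate the coefficients, which turns them into multiplicative effects on the expected number of goals. The Python sketch below applies that reading to the Model 1 estimates from slide 10; it is an illustration, not a reconstruction of the original slide, and the names are made up.

```python
import math

# Model 1 estimates (log scale) from the parameter table on slide 10.
b0, b_home, b_half2 = -0.6397, 0.2624, 0.1783

# exp(coefficient) is the multiplicative effect on expected goals.
home_ratio = math.exp(b_home)      # home vs. away
half2_ratio = math.exp(b_half2)    # second half vs. first half

def fitted_mean(home, half2):
    """Expected goals in a half under Model 1 (home and half2 are 0/1 dummies)."""
    return math.exp(b0 + b_home * home + b_half2 * half2)

print(f"Home/Away ratio:  {home_ratio:.3f}")
print(f"Half2/Half1 ratio: {half2_ratio:.3f}")
for home in (1, 0):
    for half2 in (0, 1):
        label = f"{'Home' if home else 'Away'}/{'2nd' if half2 else '1st'} half"
        print(f"{label}: fitted mean = {fitted_mean(home, half2):.3f}")
```

The two ratios come out near 1.3 and 1.2, matching the home and second-half scoring ratios quoted on slide 4, and the fitted means can be set against the observed means on slide 6.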

16 Incorporating Random (Team) Effects Teams clearly vary in terms of offensive and defensive skills (see slide 3) Since many factors are inputs into team abilities (players, coaches, chemistry), we will treat team offensive and defensive effects as Random There will be 20 random offensive effects (one per team) and 20 defensive effects

17 Random Team Effects
All effects are on the log scale for goals scored
Offense effects: o_i ~ NID(0, σ_o²)
Defense effects: d_i ~ NID(0, σ_d²)
In the estimation process we assume COV(o_i, d_i) = 0, which seems a stretch (but we can still "observe" the covariance of the estimated random effects)

18 Mixed Effects Model
Fixed Effects (β): Intercept, Home, Half2
Random Effects: Offteam, Defteam
Conditional Model (on Random Effects)
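The conditional model itself was lost in extraction. A form consistent with the random-effects definitions on slide 17 and the prediction formula on slide 24 would be the following; the indexing (i for home status, j for half, k for the offensive team, l for the defensive team) is an assumption taken from slide 24.

```latex
\begin{align*}
Y_{ijkl} \mid o_k, d_l &\sim \text{Poisson}(\mu_{ijkl}) \\
\log(\mu_{ijkl}) &= \beta_0 + \beta_{\text{Home}}\,\text{Home}_i + \beta_{\text{Half2}}\,\text{Half2}_j + o_k + d_l \\
o_k &\sim \text{NID}(0, \sigma_o^2), \qquad d_l \sim \text{NID}(0, \sigma_d^2)
\end{align*}
```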

19 Model in Matrix Notation - Example
• League has 3 teams: A, B, C
• Order of entry of games: A@B, A@C, B@C, B@A, C@A, C@B
• Order of entry of scores within game: Home/1st, Away/1st, Home/2nd, Away/2nd
• 3 offense effects, 3 defense effects, 24 observations

20 Model – Based on 3 Teams
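The matrices for the 3-team example were not captured. As a rough sketch, the fixed-effects and random-effects design matrices can be built in Python from the entry order given on slide 19. Two assumptions are made here: "A@B" is read as team A playing away at team B, and the column order of each matrix is chosen for illustration only.

```python
teams = ["A", "B", "C"]
# "A@B" is read as A playing away at B, so B is the home team (an assumption).
games = ["A@B", "A@C", "B@C", "B@A", "C@A", "C@B"]

X = []  # fixed-effects rows: [intercept, home, half2]
Z = []  # random-effects rows: [off_A, off_B, off_C, def_A, def_B, def_C]

for game in games:
    away, home = game.split("@")
    # Score order within a game: Home/1st, Away/1st, Home/2nd, Away/2nd.
    for half2 in (0, 1):
        for is_home in (1, 0):
            offense = home if is_home else away
            defense = away if is_home else home
            X.append([1, is_home, half2])
            z_row = [0] * (2 * len(teams))
            z_row[teams.index(offense)] = 1                 # offensive team indicator
            z_row[len(teams) + teams.index(defense)] = 1    # defensive team indicator
            Z.append(z_row)

print(len(X), "observations")
for x_row, z_row in zip(X, Z):
    print(x_row, z_row)
```

Each row of Z has exactly two ones: one marking the team on offense and one marking the team on defense for that half.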

21 Sequence of Potential Models
1. No fixed or random effects (common mean)
2. Fixed home and second half effects, no random effects
3. Fixed home and second half effects, random offense team effects
4. Fixed home and second half effects, random defense team effects
5. Fixed home and second half effects, random offense and defense team effects

22 Results – Estimates (P-Values)

Model   β0               βHome           βHalf2          σ²o              σ²d             σ²Res    -2lnL    AIC      BIC
1       -.407  (.0001)   N/A             N/A             N/A              N/A             1.044    5001.9   5003.9   5009.3
2       -.6397 (.0001)   .2624 (.0001)   .1783 (.0052)   N/A              N/A             1.0213   4992.3   4994.3   4999.6
3       -.6413 (.0001)   .2624 (.0001)   .1783 (.0050)   .01004 (.143*)   N/A             1.0099   4985.6   4989.6   4991.6
4       -.6592 (.0001)   .2624 (.0001)   .1783 (.0040)   N/A              .0588 (.012*)   0.9630   4958.6   4962.6   4964.6
5       -.6605 (.0001)   .2624 (.0001)   .1783 (.0039)   .0084 (.162*)    .0549 (.012*)   0.9531   4951.9   4957.9   4960.9

* Based on Z-test, not preferred

Likelihood Ratio Test
H0: σ²o = 0 vs HA: σ²o > 0
TS: 4958.6 - 4951.9 = 6.7
P = 0.5 P(χ²(1) ≥ 6.7) = .005

Based on AIC and BIC, the model with both offense and defense effects is best
No interaction found between team effects and home or half2
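The likelihood ratio test for the offense variance can be checked with a few lines of Python. The -2lnL values are taken from the results for Models 4 and 5; the factor 0.5 halves the chi-square tail probability, as on the slide, because the null value of the variance lies on the boundary of the parameter space. Names are illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# -2 ln L for Model 4 (defense effects only) and Model 5 (offense and defense).
neg2loglik_defense_only = 4958.6
neg2loglik_both = 4951.9

lr_stat = neg2loglik_defense_only - neg2loglik_both
# Halved tail probability for a variance component tested at zero.
p_value = 0.5 * chi2_sf_1df(lr_stat)

print(f"TS = {lr_stat:.1f}, P = {p_value:.4f}")
```

The result rounds to the .005 shown on the slide, in contrast to the Z-test p-value of .162 for the same variance component, which is why the Z-test is flagged as not preferred.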

23 Goodness of Fit
We test whether the Poisson GLMM is an appropriate model by means of the deviance
H0: Model fits
HA: Model lacks fit
Deviance = 1570.7
DF = N - # fixed parameters = 1520 - 3 = 1517
P-value = P(χ²(1517) ≥ 1570.7) = 0.1646
No evidence of lack of fit*
* If we use the scaled deviance, we do reject, where scaled deviance = 1570.7/0.9531 = 1647.9
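The p-value above can be approximated without a statistics library. The sketch below uses the Wilson-Hilferty cube-root normal approximation to the chi-square tail, which is a substitute for whatever exact routine the author used and is accurate for large degrees of freedom such as 1517. The function name is illustrative.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square_df >= x) via the Wilson-Hilferty approximation."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # standard normal upper tail

deviance = 1570.7
df = 1520 - 3                 # N minus the number of fixed parameters
residual_scale = 0.9531       # residual variance estimate for Model 5

p_deviance = chi2_sf_wilson_hilferty(deviance, df)
scaled_deviance = deviance / residual_scale
p_scaled = chi2_sf_wilson_hilferty(scaled_deviance, df)

print(f"Deviance:        {deviance:.1f}, p = {p_deviance:.4f}")
print(f"Scaled deviance: {scaled_deviance:.1f}, p = {p_scaled:.4f}")
```

The first p-value agrees with the 0.1646 on the slide to the accuracy of the approximation, while the scaled deviance gives a p-value below 0.05, consistent with the footnote.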

24 Best Linear Unbiased Predictors (BLUPs)
Estimated Team (Random) Effects (teams with high Defense values allow more goals)
Estimated Fixed Effects
For each half ijkl, compute exp{-0.6605 + HOME_i + HALF2_j + o_k + d_l} as the BLUP
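The prediction formula can be written as a small Python function. The fixed effects are the Model 5 estimates from the results on slide 22. The table of estimated team effects was not captured, so the offense and defense values in the usage line are hypothetical placeholders, and the function name is made up.

```python
import math

# Model 5 fixed-effect estimates (log scale).
INTERCEPT, HOME, HALF2 = -0.6605, 0.2624, 0.1783

def predicted_goals(home, half2, offense_effect, defense_effect):
    """Predicted goals in a half for one offensive team against one defensive team.

    home and half2 are 0/1 dummies; offense_effect and defense_effect are the
    estimated random effects (log scale) of the two teams involved.
    """
    eta = INTERCEPT + HOME * home + HALF2 * half2 + offense_effect + defense_effect
    return math.exp(eta)

# Hypothetical team effects, for illustration only (NOT estimates from the data):
# a slightly above-average offense against a better-than-average defense.
example = predicted_goals(home=1, half2=1, offense_effect=0.10, defense_effect=-0.20)
print(f"Predicted goals: {example:.3f}")
```

A positive defense effect raises the prediction, which is why teams with high Defense values are the ones that allow more goals.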

25 Comparison of BLUPs with Actual Scores
For each team half, we have the actual score and the BLUP
Correlation between actual and BLUP = 0.2655
Concordant pairs of halves (one scores higher on both actual and BLUP than the other) = 452471
Discordant pairs of halves = 355617
"Gamma" = (452471 - 355617)/(452471 + 355617) = 0.1199
Evidence of some positive association between actual and predicted scores
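The gamma statistic on this slide is the Goodman-Kruskal measure: concordant minus discordant pairs, divided by their sum, with tied pairs ignored. A minimal Python sketch follows; the function name and the toy data are illustrative, while the two pair counts are the ones quoted on the slide.

```python
def goodman_kruskal_gamma(actual, predicted):
    """Count concordant and discordant pairs and return (C, D, gamma).

    Pairs tied on either variable are ignored. Raises ZeroDivisionError
    if every pair is tied.
    """
    concordant = discordant = 0
    n = len(actual)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (actual[i] - actual[j]) * (predicted[i] - predicted[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    gamma = (concordant - discordant) / (concordant + discordant)
    return concordant, discordant, gamma

# Gamma recomputed from the pair counts reported on the slide.
slide_gamma = (452471 - 355617) / (452471 + 355617)
print(f"Gamma from slide counts: {slide_gamma:.4f}")

# Toy data (hypothetical) showing the pair counting itself.
toy = goodman_kruskal_gamma([0, 1, 2, 1], [0.5, 0.9, 0.7, 0.6])
print(toy)
```

With 1520 halves there are 1520 × 1519 / 2 = 1,154,440 pairs in total, so a large share of pairs are tied, mostly because many halves have the same actual score.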

26 Sources
Data: SoccerPunter.com
Methods:
Littell, R., Milliken, G., Stroup, W., and Wolfinger, R. (1996). "SAS System for Mixed Models."
Wolfinger, R. and O'Connell, M. (1993). "Generalized Linear Mixed Models: A Pseudo-Likelihood Approach," J. Statist. Comput. Simul., Vol. 48, pp. 233-243.

27 SAS Code

data one;
  infile 'engl2003d.dat';
  input hteam $ 1-20 rteam $ 21-40 goals 47-48 half2 56 home 64 round 71-73;
  /* The scoring side is on offense; its opponent is on defense. */
  if home=1 then do; offteam=hteam; defteam=rteam; end;
  else do; offteam=rteam; defteam=hteam; end;
run;

/* GLIMMIX macro: Poisson errors, log link, random team effects. */
%include 'glmm800.sas';
%glimmix(data=one, procopt=method=reml,
  stmts=%str(
    class offteam defteam;
    model goals = home half2 / s;
    random offteam defteam / s;
  ),
  error=poisson, link=log);
run;

