Generalized Linear Mixed Model: English Premier League Soccer – 2003/2004 Season


Introduction
English Premier League Soccer (Football)
- 20 teams; each plays all others twice (home/away)
- Games consist of two halves (45 minutes each)
- No overtime
- Each team is on offense and defense for 38 games (38 first halves and 38 second halves)
- Response variable: goals in a half
- Potential independent variables
  - Fixed factors: Home dummy, Half2 dummy, Game # (1-38)
  - Random factors: Offensive Team, Defensive Team
- Distribution of response: Poisson?

Preliminary Summary

Summary of Previous Slide
- Teams vary extensively on offense and defense
  - Offense: min=38, max=73, mean=50.6, SD=8.85
  - Defense: min=26, max=79, mean=50.6, SD=13.75
  - Strong negative correlation between offense and defense: r=-0.80
- Home teams outscore away teams 1.3:1
- Second half outscores first half 1.2:1
- No evidence of autocorrelation in total goals scored over weeks (Durbin-Watson statistic = 2.03)

“Marginal Analysis” – No Team Effects Break Down Goals by Home/Half2 (380 Games)

Summary of Previous Slide
- Means (variances) for 4 half types (the numeric values are not preserved in this transcript):
  - Home/1st Half
  - Away/1st Half
  - Home/2nd Half
  - Away/2nd Half
  - Thus, means and variances are in strong agreement
- Chi-square statistics for testing for Poisson:
  - Df = (4 categories - 1) - (1 parameter estimated) = 2
  - P-values all exceed 0.50 (.8505, .5440, .7353, .6957)
  - Goals scored are consistent with a Poisson distribution
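The chi-square check described on this slide can be sketched in Python. This is only an illustration under stated assumptions: the slide says there are 4 categories and 1 estimated parameter, but it does not say how goals were binned, so the bins 0, 1, 2, and 3 or more are an assumption, and the `goals` list below is hypothetical, not the league data.

```python
import math

def poisson_gof(goals):
    """Chi-square goodness-of-fit test of goal counts against a Poisson distribution.

    Assumed bins: 0, 1, 2, and 3-or-more goals (4 categories).
    """
    n = len(goals)
    mean = sum(goals) / n  # the single estimated parameter
    observed = [sum(1 for g in goals if g == k) for k in range(3)]
    observed.append(n - sum(observed))  # 3 or more
    probs = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(3)]
    probs.append(1.0 - sum(probs))  # upper tail, P(Y >= 3)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = (4 - 1) - 1
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, df, p_value

# Hypothetical goals-per-half for one half type (380 halves); not the real data.
goals = [0] * 150 + [1] * 130 + [2] * 65 + [3] * 25 + [4] * 10
stat, df, p_value = poisson_gof(goals)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

A large p-value, as reported for all four half types on the slide, means the observed counts do not contradict the Poisson assumption.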

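Generalized Linear Models
- Dependent variable: goals scored
- Distribution: Poisson
- Link function: log
- Independent variables: Home, Half2 dummy variables
- Models:
  - Model 1 (main effects): $\log(\mu) = \beta_0 + \beta_{Home} Home + \beta_{Half2} Half2$
  - Model 2 (with interaction): $\log(\mu) = \beta_0 + \beta_{Home} Home + \beta_{Half2} Half2 + \beta_{HomeHalf2} Home \times Half2$
- Models are fit using generalized linear model software packages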

Parameter Estimates / Model Fit – Model 1
Distribution: Poisson
Link Function: Log
Dependent Variable: goals
Number of Observations Read: 1520
Number of Observations Used: 1520
Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Algorithm converged.

Parameter Estimates / Model Fit – Model 1
Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq
Rows: Intercept, home, half, Scale
Pr > ChiSq: Intercept <.0001, home <.0001
NOTE: The scale parameter was held fixed.

Parameter Estimates / Model Fit – Model 2
Criteria For Assessing Goodness Of Fit
Columns: Criterion, DF, Value, Value/DF
Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood
Algorithm converged.

Parameter Estimates / Model Fit – Model 2
Analysis Of Parameter Estimates
Columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square, Pr > ChiSq
Rows: Intercept, home, half, home*half, Scale
Pr > ChiSq: Intercept <.0001
NOTE: The scale parameter was held fixed.

Testing for Home/Half2 Interaction
- $H_0$: No Home x Half2 interaction ($\beta_{HomeHalf2} = 0$)
- $H_A$: Home x Half2 interaction ($\beta_{HomeHalf2} \ne 0$)
- Test 1: Wald test
- Test 2: Likelihood ratio test

Testing for Main Effects for Home & Half2 Wald tests only reported here (both effects are very significant) Tests based on Model 1 (no interaction model)

Interpreting the GLM
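Because the link is the log, exponentiating a coefficient gives a multiplicative effect on the expected number of goals in a half. A minimal Python sketch, using the Home and Half2 estimates (.2624 and .1783) that appear in the deck's results table; the variable names are illustrative only.

```python
import math

# Fixed-effect estimates on the log scale, as reported in the deck's results table.
beta_home = 0.2624
beta_half2 = 0.1783

# With a log link, exp(beta) is the multiplicative change in expected goals.
home_ratio = math.exp(beta_home)    # home vs. away, within the same half
half2_ratio = math.exp(beta_half2)  # second vs. first half, at the same venue

print(f"Home/Away rate ratio:   {home_ratio:.2f}")
print(f"Half2/Half1 rate ratio: {half2_ratio:.2f}")
```

These ratios agree with the descriptive summary earlier in the deck, where home teams outscore away teams about 1.3:1 and second halves outscore first halves about 1.2:1.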

Incorporating Random (Team) Effects Teams clearly vary in terms of offensive and defensive skills (see slide 3) Since many factors are inputs into team abilities (players, coaches, chemistry), we will treat team offensive and defensive effects as Random There will be 20 random offensive effects (one per team) and 20 defensive effects

Random Team Effects
- All effects are on the log scale for goals scored
- Offense effects: $o_i \sim NID(0, \sigma_o^2)$
- Defense effects: $d_i \sim NID(0, \sigma_d^2)$
- In the estimation process we assume $COV(o_i, d_i) = 0$, which seems a stretch (but we can still "observe" the covariance of the estimated random effects)

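Mixed Effects Model
- Fixed effects: Intercept, Home, Half2 ($\beta$)
- Random effects: Offteam, Defteam ($o$, $d$)
- Conditional model (on random effects): $\log E[Y \mid o_k, d_l] = \beta_0 + \beta_{Home} HOME + \beta_{Half2} HALF2 + o_k + d_l$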

Model in Matrix Notation - Example
- League has 3 teams: A, B, C
- Order of entry of games:
- Order of entry of scores within game: Home/1st, Away/1st, Home/2nd, Away/2nd
- 3 offense effects, 3 defense effects, 24 observations
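The 3-team example can be sketched by building the fixed-effect design matrix X (intercept, home, half2) and the random-effect design matrix Z (3 offense columns followed by 3 defense columns). The game ordering used here is an assumption, since the slide's own ordering is not preserved in the transcript; the within-game ordering follows the slide.

```python
teams = ["A", "B", "C"]

# Assumed game order (home team listed first); the slide's ordering is not recoverable.
games = [(h, a) for h in teams for a in teams if h != a]

X = []  # fixed effects: intercept, home dummy, half2 dummy
Z = []  # random effects: offense indicators for A, B, C, then defense indicators for A, B, C

for home_team, away_team in games:
    for half2 in (0, 1):
        # Within each half the home side's score is entered first, then the away side's,
        # giving Home/1st, Away/1st, Home/2nd, Away/2nd for every game.
        for home in (1, 0):
            offense = home_team if home else away_team
            defense = away_team if home else home_team
            X.append([1, home, half2])
            z_row = [0] * 6
            z_row[teams.index(offense)] = 1
            z_row[3 + teams.index(defense)] = 1
            Z.append(z_row)

print(len(X), "observations")
```

Each row of Z contains exactly two ones: one marking the team on offense and one marking the team on defense for that observation.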

Model – Based on 3 Teams

Sequence of Potential Models
1. No fixed or random effects (common mean)
2. Fixed home and second half effects, no random effects
3. Fixed home and second half effects, random offense team effects
4. Fixed home and second half effects, random defense team effects
5. Fixed home and second half effects, random offense and defense team effects

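Results – Estimates (P-Values)
Columns: Model, $\beta_0$, $\beta_{Home}$, $\beta_{Half2}$, $\sigma_o^2$, $\sigma_d^2$, $\sigma_{Res}^2$, -2lnL, AIC, BIC
- Model 1: $\beta_0$ (P=.0001); Home, Half2, and random effects N/A
- Model 2: $\beta_0$ (P=.0001); $\beta_{Home}$ = .2624 (P=.0001); $\beta_{Half2}$ = .1783 (P=.0052); random effects N/A
- Model 3: $\beta_0$ (P=.0001); $\beta_{Home}$ = .2624 (P=.0001); $\beta_{Half2}$ = .1783 (P=.0050); $\sigma_o^2$ (P=.143*); $\sigma_d^2$ N/A
- Model 4: $\beta_0$ (P=.0001); $\beta_{Home}$ = .2624 (P=.0001); $\beta_{Half2}$ = .1783 (P=.0040); $\sigma_o^2$ N/A; $\sigma_d^2$ = .0588 (P=.012*)
- Model 5: $\beta_0$ (P=.0001); $\beta_{Home}$ = .2624 (P=.0001); $\beta_{Half2}$ = .1783 (P=.0039); $\sigma_o^2$ = .0084 (P=.162*); $\sigma_d^2$ = .0549 (P=.012*)
* Based on Z-test, not preferred
Likelihood ratio test: $H_0: \sigma_o^2 = 0$ vs $H_A: \sigma_o^2 > 0$; TS = 6.7; P = $0.5 \, P(\chi_1^2 \ge 6.7)$ = .005
- Based on AIC and BIC, the model with both offense and defense effects is best
- No interaction found between team effects and home or half2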
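The likelihood ratio p-value on this slide halves the chi-square tail probability because the null value of the variance sits on the boundary of the parameter space. A minimal Python sketch of that arithmetic, using only the standard library; the helper name is illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square with 1 df: P(Z^2 >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

lr_stat = 6.7  # test statistic reported on the slide (difference in -2lnL)

# The variance is tested on the boundary (sigma^2 = 0), so the tail probability is halved.
p_value = 0.5 * chi2_sf_1df(lr_stat)
print(f"P = {p_value:.4f}")
```

The result rounds to the .005 shown on the slide, a much stronger signal for the offense variance than the Z-test p-values marked with an asterisk.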

Goodness of Fit
We test whether the Poisson GLMM is an appropriate model by means of the Scaled Deviance
- $H_0$: Model fits; $H_A$: Model lacks fit
- Deviance = 1570.7
- DF = N - #fixed parms = 1520 - 3 = 1517
- P-value = $P(\chi^2_{1517} \ge 1570.7)$
- No evidence of lack of fit*
* If we use the scaled deviance, we do reject, where scaled deviance = 1570.7/0.9531 = 1647.9
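The two tail probabilities behind this slide can be approximated without a statistics library. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is a substitute chosen here for self-containment and not necessarily how the slide's p-value was computed; with 1517 degrees of freedom the approximation is very close.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x) via Wilson-Hilferty."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    # Standard normal upper tail evaluated at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

df = 1520 - 3                 # observations minus fixed-effect parameters
deviance = 1570.7             # from the slide
scaled = deviance / 0.9531    # scaled deviance, as in the slide's footnote

p_deviance = chi2_sf_wilson_hilferty(deviance, df)
p_scaled = chi2_sf_wilson_hilferty(scaled, df)
print(f"deviance: p = {p_deviance:.3f}; scaled deviance: p = {p_scaled:.3f}")
```

At the 0.05 level the unscaled deviance shows no lack of fit while the scaled deviance does, which matches the slide's conclusion and its footnote.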

Best Linear Unbiased Predictors (BLUPs)
- Estimated team (random) effects (teams with high defense values allow more goals)
- Estimated fixed effects
- For each half $ijkl$, compute $\exp\{\hat\beta_0 + \hat\beta_{Home} HOME_i + \hat\beta_{Half2} HALF2_j + \hat o_k + \hat d_l\}$ as the BLUP
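The prediction formula can be sketched as a small Python function. The Home and Half2 coefficients come from the deck's results table; the intercept and the two team effects below are hypothetical placeholders, because their estimates are not preserved in the transcript.

```python
import math

def predicted_goals(beta0, beta_home, beta_half2, home, half2, off_effect, def_effect):
    """Predicted goals in one half: exp of the linear predictor plus both team effects."""
    eta = beta0 + beta_home * home + beta_half2 * half2 + off_effect + def_effect
    return math.exp(eta)

BETA_HOME, BETA_HALF2 = 0.2624, 0.1783  # from the results table
BETA0 = -0.5                            # hypothetical intercept
off_effect, def_effect = 0.05, -0.10    # hypothetical team effects (log scale)

home_2nd = predicted_goals(BETA0, BETA_HOME, BETA_HALF2, 1, 1, off_effect, def_effect)
away_2nd = predicted_goals(BETA0, BETA_HOME, BETA_HALF2, 0, 1, off_effect, def_effect)
print(f"home, 2nd half: {home_2nd:.3f}; away, 2nd half: {away_2nd:.3f}")
```

Whatever the intercept and team effects are, the ratio of the home to the away prediction for the same matchup and half is always exp(.2624), since every other term cancels.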

Comparison of BLUPs with Actual Scores
- For each team-half, we have the actual score and the BLUP
- Quantities reported: correlation between actual and BLUP; number of concordant pairs of halves (one scores higher on both actual and BLUP than the other); number of discordant pairs of halves
- "Gamma" = (Concordant - Discordant)/(Concordant + Discordant)
- Evidence of some positive association between actual and predicted scores
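The gamma statistic on this slide counts concordant and discordant pairs of halves. A minimal Python sketch, using a brute-force pass over all pairs; the four data points are hypothetical and only show the mechanics. Pairs tied on either variable are neither concordant nor discordant and are skipped.

```python
from itertools import combinations

def goodman_kruskal_gamma(actual, predicted):
    """Gamma = (C - D) / (C + D) over all pairs of observations, ignoring ties."""
    concordant = discordant = 0
    for (a1, p1), (a2, p2) in combinations(list(zip(actual, predicted)), 2):
        sign = (a1 - a2) * (p1 - p2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical actual goals and predictions for four halves; not the league data.
actual = [0, 1, 2, 1]
predicted = [0.9, 0.5, 1.4, 0.7]
gamma = goodman_kruskal_gamma(actual, predicted)
print(f"gamma = {gamma:.2f}")
```

The pairwise loop is quadratic in the number of halves; for the 1520 halves in this data set that is roughly 1.15 million pairs, which is still quick to evaluate.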

Sources:
- Data: SoccerPunter.com
- Methods: Littell, Milliken, Stroup, Wolfinger (1996). "SAS System for Mixed Models"
- Wolfinger, R. and M. O'Connell (1993). "Generalized Linear Mixed Models: A Pseudo-Likelihood Approach," J. Statist. Comput. Simul., Vol. 48, pp

SAS Code
data one;
  infile 'engl2003d.dat';
  input hteam $ 1-20 rteam $21-40 goals half2 56 home 64 round 71-73;
  /* The offense is the scoring side: the home team when home=1, otherwise the road team */
  if home=1 then do; offteam=hteam; defteam=rteam; end;
  else do; offteam=rteam; defteam=hteam; end;
run;
/* Load the GLIMMIX macro, then fit the Poisson GLMM with a log link */
%include 'glmm800.sas';
%glimmix(data=one, procopt=method=reml,
  stmts=%str(
    class offteam defteam;
    model goals = home half2 /s;
    random offteam defteam /s;
  ),
  error=poisson, link=log);
run;