Advanced quantitative methods for social scientists (2017–2018) LC & PVK
Session 2: Multilevel analysis in Stata (with a focus on random slope models for comparative research)
Louis Chauvel, University of Luxembourg, PEARL Institute for Research on Socio-Economic Inequality (IRSEI)

Outline

Background
- Standard multiple regressions versus random effects models
- Fixed effects and random effects
- Basics on notations in multilevel analysis
- 2-level models / random effect / random slope
- Generalization: higher-level models and cross-classified models

Method with example: the PISA survey
- Fitting models: random effects and random slopes
- Post-estimation techniques: BLUPs, multilevel tools (mlt)
- Understanding and presenting results

Examples of publication
Further developments on panel analysis
xtmixed as a pervasive command

Chauvel L, Leist AK. Socioeconomic hierarchy and health gradient in Europe: the role of income inequality and of social origins. International Journal for Equity in Health. 2015;14:132. doi: /s y.
Chauvel L, Hartung A. More Inequality, More Viscosity? Intergenerational Mobility in International Comparison. PAA Annual Meeting, Washington DC, March 31 – April 2, 2016.

Main references
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks: Sage Publications.
Rabe-Hesketh, S., & Skrondal, A. Multilevel and longitudinal modeling using Stata. Stata Press.
Gelman, A., & Hill, J. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Multilevel (2L) data structure
Simple example: 2-level data, that is, individuals (level 1) nested within countries (level 2)
Level 2: Country 1, Country 2, Country 3, Country 4
Level 1: individuals I1, I2, I3, I4 within each country
NB: Minimum 20 level-2 groups

Typical example: PISA 2012
- Educational performance at age circa 15
- Many countries (68)
- Parental/family backgrounds
- Performance variation by country
- Influence of parental background (Social Reproduction) by country
- Explanation of Social Reproduction variation? Country GDP/capita, gini etc.
The old solution: a series of standard OLS regressions
To open the dataset and prepare it … * PROGRAM SEGMENT 0
To process the old solution … * PROGRAM SEGMENT 1

[Figure: country-level scatter plot; labelled points include FRANCE, LUX, HKG, ALBANIA]

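* R collects one row per country: country code, coefficients on ST04Q01, f1, stdage, and the constant
matrix R=J(1,5,.)
levelsof cco
foreach i of numlist `r(levels)' {
    di `i'
    ta cnt if `i'==cco
    * capture skips countries where the regression fails
    capture {
        quietly: reg PV1READ ST04Q01 f1 stdage if `i'==cco
        matrix A=e(b)
        noisily matrix li A
        matrix C=`i',A
        matrix R=R \ C
    }
}
mat li R
preserve
clear
svmat R
* R5 = constant (country score), R3 = coefficient on f1 (social reproduction), R1 = country code
gen CountryScore=R5
gen SocReproduction=R3
gen cco=R1
two scatter SocR Cou, ml(cco)
reg SocR Cou
reg SocR Cou if R1!=1
restore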

Multilevel Data: why?
Multilevel models respect the structure of the data we have
1. Clustered data and correlated errors in each cluster
2. ML relaxes the assumption of uncorrelated (independent) errors
3. Partitioning variance-covariance components
Question: At what level is most of the variance?
- Conceptually: Different levels and their effects?
- Statistically: Are your data clustered?
- Empirically: Are there variations both at L1 and L2?
… And we can “easily” refine the models
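The question "at what level is most of the variance?" can be checked with an empty (null) random intercept model. The lines below are a minimal sketch, not one of the course's program segments. They assume the PISA variables used elsewhere in these slides (PV1READ, cco) are in memory, and that estat icc is available after xtmixed in the installed Stata version (Stata 12 or later, as far as I know).

```stata
* Null model: no covariates, one random intercept per country
xtmixed PV1READ || cco: , mle
* Share of the total variance located at the country level (intraclass correlation)
estat icc
```

A value close to 0 would suggest that little of the variance is between countries; a larger value supports the multilevel specification.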

Fixed Effects Model (FEM) & Random Effects (REM)
J groups; i indexes cases within group j
aj is a separate intercept for each group
Equivalent to a “within-group” model: all variables are centered around the mean of each group
In practice: FEM = standard OLS with a separate intercept for each of the J groups (dummy variable approach) => group differences as a fixed effect
* PROGRAM SEGMENT 2
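The equation on this slide did not survive in the transcript. A plausible reconstruction, assuming a single covariate x and writing the group intercept as \alpha_j (the slide's "aj"), with \beta and \varepsilon_{ij} as assumed symbols, is:

```latex
% Fixed effects model: one intercept per group, common slope
y_{ij} = \alpha_j + \beta x_{ij} + \varepsilon_{ij}, \qquad j = 1, \dots, J

% Equivalent within-group form: every variable centered on its group mean
y_{ij} - \bar{y}_{j} = \beta \, (x_{ij} - \bar{x}_{j}) + (\varepsilon_{ij} - \bar{\varepsilon}_{j})
```

Subtracting the group means removes \alpha_j, which is why the within estimator uses only variation inside each group.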

Random Effects
Alternatively, treat group effects as random effects
No separate estimate for each group effect; instead, model their distribution
A simple random intercept model (notation from Rabe-Hesketh & Skrondal), where:
- b is the main intercept
- Zeta (z) is a random effect for each group: it allows each of the j groups to have its own intercept; assumed to be independent & normally distributed
- Error (e) is the error term for each case; also assumed to be independent & normally distributed
NB: Minimum 20 level-2 groups
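The model equation itself is missing from the transcript. A sketch of the simple random intercept model in the notation the slide describes (b, zeta, e) follows; the variance symbols \psi and \theta are assumptions borrowed from Rabe-Hesketh & Skrondal's usual notation and are not shown on the slide.

```latex
% Random intercept model without covariates
y_{ij} = \beta + \zeta_j + \varepsilon_{ij}

% Distributional assumptions (independent of each other)
\zeta_j \sim N(0, \psi), \qquad \varepsilon_{ij} \sim N(0, \theta)

% Intraclass correlation implied by the two variance components
\rho = \frac{\psi}{\psi + \theta}
```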

xtreg syntax
* PROGRAM SEGMENT 4
* Comparing FE and RE models
xtreg PV1READ ST04Q01 f1 stdage, i(cco) fe
- PV1READ: dependent variable
- ST04Q01 f1 stdage: X explanatory variables
- i(cco): level-2 group variable
- fe: FE or RE model (fe or re)

Usual Solution => Hausman Specification Test
Best model?
Fixed effects: consistent, most reliably so as N grows very large
But less efficient than random effects when within-group variation is low (between-group variation is big) and the sample is small (not the case for PISA…)
Hausman specification test: a tool to help evaluate the fit of fixed vs. random effects
Logic:
- Both fixed & random effects models are consistent if the models are properly specified
- However, some model violations cause random effects models to be inconsistent. Ex: if X variables are correlated with the group-level random effect
In short: the models should give the same results… If not, random effects may be biased
- If results are similar, use the most efficient model: random effects
- If results diverge, odds are that the random effects model is biased. In that case use fixed effects…

Hausman Specification Test
Strategy:
- Estimate both fixed & random effects models
- Save the estimates each time
- Finally invoke the Hausman test
Ex (here with the “old” xtreg Stata command):
* PROGRAM SEGMENT 4
* Comparing FE and RE models
xtreg PV1READ ST04Q01 f1 stdage, i(cco) fe
est store femod
xtreg PV1READ ST04Q01 f1 stdage, i(cco) re
est store remod
esttab femod remod
hausman femod remod

Linear Fixed Intercepts Model
. xtreg PV1READ ST04Q01 parentalbckgrnd stdage, i(cco) fe
Fixed-effects (within) regression        Number of obs =
Group variable: ccode                    Number of groups =
R-sq: within =                           Obs per group: min =
      between =                                         avg =
      overall =                                         max =
F(3,413120) =
corr(u_i, Xb) =                          Prob > F =
PV1READ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
ST04Q01 |
parentalbckgrnd |
stdage |
_cons |
sigma_u |      (SD of u: intercepts)
sigma_e |      (SD of e)
rho |          (fraction of variance due to u_i: intra-class correlation)
F test that all u_i=0: F(66, ) =         Prob > F =

Linear Random Intercepts Model
. xtreg PV1READ ST04Q01 parentalbckgrnd stdage, i(cco) re
Random-effects GLS regression            Number of obs =
Group variable: ccode                    Number of groups =
R-sq: within =                           Obs per group: min =
      between =                                         avg =
      overall =                                         max =
Wald chi2(3) =
corr(u_i, X) = 0 (assumed)               Prob > chi2 =
PV1READ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
ST04Q01 |
parentalbckgrnd |
stdage |
_cons |
sigma_u |      (SD of u: intercepts)
sigma_e |      (SD of e)
rho |          (fraction of variance due to u_i: intra-class correlation)
Note: the model assumes normal uj, uncorrelated with the X variables

Hausman Specification Test
Example: PISA reading score, FE vs. RE
. hausman femod remod
             ---- Coefficients ----
             | (b)     (B)     (b-B)        sqrt(diag(V_b-V_B))
             | femod   remod   Difference   S.E.
ST04Q01      |
parentalbc~d |
stdage       |
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B) =
Prob>chi2 =
(V_b-V_B is not positive definite)
Direct comparison of coefficients: a non-significant p-value indicates that the models yield similar results… OK

Within & Between Effects / Centering
Why do we do multilevel models? => To understand the role of inequality between and within countries
So: “centering” variables, with both grand mean and group mean centering
Grand mean centering: computing variables as deviations from the overall mean
- Should be done systematically for X variables
Group mean centering: computing variables as deviations from the group mean
- Useful for decomposing within vs. between effects => relative role of inequality between and within countries
- Often used in conjunction with aggregate group mean variables

Within & Between Effects
You can estimate BOTH within- and between-group effects in a single model
Strategy: split a variable (e.g., household possession score) into two new variables…
1. Group mean household possession score
2. Within-group deviation from the mean household possession score
Often called “group mean centering”
Then, put both variables into a random effects model
The model will estimate separate coefficients for between vs. within effects
Ex:
* PROGRAM SEGMENT 5
* Assessing within and between effects
egen betwparentalbckgrnd=mean(parentalbckgrnd), by(cco)
gen withinparentalbckgrnd=parentalbckgrnd-betwparentalbckgrnd
xtreg PV1READ ST04Q01 stdage betw withi, i(cco) re
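The split described above corresponds to the following model, written here as a sketch with other covariates omitted. The symbols \beta_B and \beta_W (between and within coefficients) are labels introduced for illustration and do not appear on the slide.

```latex
% x_{ij}: parental background of student i in country j
% \bar{x}_{j}: country mean (betwparentalbckgrnd)
% x_{ij} - \bar{x}_{j}: deviation from the country mean (withinparentalbckgrnd)
y_{ij} = \beta_1 + \beta_B \, \bar{x}_{j} + \beta_W \, (x_{ij} - \bar{x}_{j}) + \zeta_j + \varepsilon_{ij}
```

If \beta_B and \beta_W are equal, the model collapses to the ordinary random intercept model with a single coefficient on x_{ij}; a gap between them signals that the between-country and within-country gradients differ.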

Linear Random Intercepts Model
. xtreg PV1READ ST04Q01 stdage betw withi, i(cco) re
Random-effects GLS regression            Number of obs =
Group variable: ccode                    Number of groups =
R-sq: within =                           Obs per group: min =
      between =                                         avg =
      overall =                                         max =
Wald chi2(4) =
corr(u_i, X) = 0 (assumed)               Prob > chi2 =
PV1READ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
ST04Q01 |
stdage |
betwparentalbckgrnd |
withinparentalbckgrnd |
_cons |
sigma_u |
sigma_e |
rho |          (fraction of variance due to u_i)
Parental background has a huge effect both within and between countries

Generalizing: Random Coefficients (=Random slopes)
The linear random intercept model allows random variation in the intercept (mean) across groups
But the same idea can be applied to other coefficients
That is, slope coefficients can ALSO be random!
Random Coefficient Model, where:
- Zeta-1 is a random intercept component = differences between countries
- Zeta-2 is a random slope component = country-specific inequality effect
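The two equations on this slide are missing from the transcript. A reconstruction in Rabe-Hesketh & Skrondal style notation, assuming one level-1 covariate x (here, parental background) and \beta_1, \beta_2 as the fixed intercept and slope, would read:

```latex
% Random coefficient model: intercept and slope both vary across groups j
y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) \, x_{ij} + \varepsilon_{ij}

% Which can be written as: fixed part plus random part
y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_{1j} + \zeta_{2j} x_{ij} + \varepsilon_{ij}
```

Here \zeta_{1j} shifts the whole line of country j up or down, while \zeta_{2j} makes the line steeper or flatter than the average gradient \beta_2.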

Linear Random Coefficient Model
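Both intercepts and slopes vary randomly across the j groups (Rabe-Hesketh & Skrondal)
[Figure: country-specific regression lines of PV1READ (y-axis) on parentalbckgrnd (x-axis). Annotations: inequality between countries (intercepts) and inequality within country (slopes) vary randomly]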

xtmixed syntax
* PROGRAM SEGMENT 6
* a first random slope model
xtmixed allows random intercepts & slopes
“Mixed” models are models that have both fixed and random components
xtmixed [depvar] [fixed equation] || [random eq], options
xtmixed PV1READ ST04Q01 stdage || cco: parentalbckgrnd , iter(5) diff mle cov(unstr)
- PV1READ: dependent variable
- ST04Q01 stdage: fixed effect variables
- cco: RE level-2 variable
- parentalbckgrnd: slope variable
- iter(5) diff mle: estimation options
- cov(unstr): cov(unstructured)
cov(unstr) relaxes constraints regarding the covariance among random effects (see Rabe-Hesketh & Skrondal)
The Stata default treats random terms (intercept, slope) as totally uncorrelated… not always reasonable

Example: PISA 2012
. xtmixed PV1READ ST04Q01 stdage || cco: parentalbckgrnd , iter(5) diff mle cov(unstr)
Mixed-effects ML regression              Number of obs =
Group variable: ccode                    Number of groups =
                                         Obs per group: min =
                                                        avg =
                                                        max =
Wald chi2(2) =
Log likelihood =                         Prob > chi2 =
PV1READ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
ST04Q01 |
stdage |
_cons |
.../...

Ex: PISA 2012 (cont’d)
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
ccode: Unstructured |
sd(parent~d) |
sd(_cons) |
corr(parent~d,_cons) |
sd(Residual) |
LR test vs. linear regression: chi2(3) = 1.5e+05    Prob > chi2 =
“_cons” (constant): intercepts for countries
“parent~d”: slopes
Non-zero SDs indicate that both intercepts and slopes vary
If some of the estimates are not significant => you can simplify the model

What about the random slopes?
Slopes = within country parental background gradient of inequality

What about the random slopes?
* PROGRAM SEGMENT 8
* like 6 with BLUP predictors of intercepts and slopes
Best linear unbiased predictions (BLUPs)
[Figure: BLUPs of country slopes plotted against BLUPs of country intercepts]
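PROGRAM SEGMENT 8 itself is not reproduced in the transcript. The lines below are a minimal sketch of how BLUPs of the country intercepts and slopes can be obtained after the random slope model of segment 6. The new variable names (u_slope, u_cons, onepercountry) are hypothetical, and the plot is only one possible way to display the result.

```stata
* Same random slope model as in PROGRAM SEGMENT 6
xtmixed PV1READ ST04Q01 stdage || cco: parentalbckgrnd, mle cov(unstr)
* BLUPs of the random effects: the slope comes first, the intercept last
predict u_slope u_cons, reffects
* Flag one observation per country so each country is plotted once
egen onepercountry = tag(cco)
twoway scatter u_slope u_cons if onepercountry, mlabel(cco)
```

The predicted values are deviations from the fixed part of the model, so a country with a positive u_slope has a steeper parental background gradient than average.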

Multilevel Model Notation
The random coefficient (random slope) model can be expressed in a single equation: the Random Coefficient Model
However, it is common to separate the levels:
- Level 1 equation
- Intercept equation
- Slope equation
Gamma = constant; u = random effect
Here, we specify a random component for the level-1 constant & slope
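The equations for this slide are missing from the transcript. A reconstruction using the symbols the slide names (gamma for constants, u for random effects), with one level-1 covariate x as an assumption, is sketched below.

```latex
% Level 1 equation
y_{ij} = \beta_{1j} + \beta_{2j} x_{ij} + \varepsilon_{ij}

% Level 2: intercept equation
\beta_{1j} = \gamma_1 + u_{1j}

% Level 2: slope equation
\beta_{2j} = \gamma_2 + u_{2j}

% Substituting the level-2 equations into level 1 gives the single-equation form
y_{ij} = \gamma_1 + \gamma_2 x_{ij} + u_{1j} + u_{2j} x_{ij} + \varepsilon_{ij}
```

The substituted form has the same structure as the single-equation random coefficient model: \gamma_1 and \gamma_2 play the role of the fixed intercept and slope, and u_{1j}, u_{2j} the role of the random intercept and slope components.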

Cross-Level Interactions
Does context (i.e., level-2) influence the effect of level-1 variables? Example: Effect of country inequality (gini) on lower achievements Can you think of others?

Cross-level interactions
Idea: specify a level-2 variable that affects a level-1 slope
- Level 1 equation
- Intercept equation
- Slope equation with interaction
Cross-level interaction: the level-2 variable Z affects the slope (B2) of a level-1 X variable
Coefficient g3 reflects the size of the interaction (effect on B2 per unit change in Z)
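The three equations are missing from the transcript. A reconstruction consistent with the slide's description (Z enters only the slope equation, with coefficient g3 written as \gamma_3) is sketched below; whether the original intercept equation also contained Z is not recoverable, so it is left out here as an assumption.

```latex
% Level 1 equation
y_{ij} = \beta_{1j} + \beta_{2j} x_{ij} + \varepsilon_{ij}

% Intercept equation
\beta_{1j} = \gamma_1 + u_{1j}

% Slope equation with interaction: level-2 variable z_j shifts the slope
\beta_{2j} = \gamma_2 + \gamma_3 z_j + u_{2j}

% Single-equation form after substitution
y_{ij} = \gamma_1 + \gamma_2 x_{ij} + \gamma_3 z_j x_{ij} + u_{1j} + u_{2j} x_{ij} + \varepsilon_{ij}
```

The product term z_j x_{ij} is what has to be built by hand as an interaction variable before estimation.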

Cross-level Interactions
Cross-level interaction in single-equation form: Random Coefficient Model with cross-level interaction
Stata strategy: manually compute cross-level interaction variables
Ex: Poverty*WelfareState, Gender*SingleSexSchool
Then, put the interaction variable in the “fixed” model
Interpretation: the B3 coefficient indicates the impact of each unit change in Z on slope B2
If B3 is positive, an increase in Z results in a larger B2 slope.
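A minimal sketch of this strategy with the PISA variables used in the earlier slides. It assumes a country-level variable named gini has already been merged onto the student file (the slides mention gini as a possible level-2 explanation but do not show the variable), and pbXgini is a hypothetical name for the interaction term. Including the main effect of gini in the fixed part is a common modelling choice, not something the slide specifies.

```stata
* ASSUMPTION: gini is a level-2 variable, constant within each country (cco)
gen pbXgini = parentalbckgrnd * gini
* Interaction enters the fixed part; the slope of parentalbckgrnd stays random
xtmixed PV1READ ST04Q01 stdage parentalbckgrnd gini pbXgini || cco: parentalbckgrnd, mle cov(unstr)
```

The coefficient on pbXgini is then read as the change in the parental background slope per unit change in gini.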

Beyond 2-level models
Sometimes data have 3 levels or more
- Ex: School, classroom, individual
- Ex: Family, individual, time (repeated measures)
Can be dealt with in xtmixed
xtmixed syntax: specify the “fixed” equation and then the random effects, starting with the “top” level
xtmixed var1 var2 var3 || schoolid: var2 || classid: var3
Again, specify unstructured covariance: cov(unstr)

Advice about building models
Raudenbush & Bryk (2002):
- Start by building the level 1 model
- Then build the level 2 model
- Keep a close eye on the level 2 N
