Published by Henry Berry. Modified over 9 years ago.
1 Scottish Social Survey Network: Master Class 1
Data Analysis with Stata
Dr Vernon Gayle and Dr Paul Lambert
23rd January 2008, University of Stirling
The SSSN is funded under Phase II of the ESRC Research Development Initiative
2 Multilevel data and analysis with Stata (in 15 minutes)
3 Generalised linear model
Y = BX + e
Y = outcome variable(s)
X = explanatory variables
e = error term for each individual response
Generalised linear mixed models
– Adding complexity to the GLM, such as by disaggregating the error structures
4 The work of statistical modelling
Y_i = BX_i + e_i
Most of the time:
– we have a single Y
– we ignore e
– we concentrate on what goes into B
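As a minimal sketch of what "concentrating on B" means, the following plain-Python example fits Y_i = BX_i + e_i by least squares for a single explanatory variable. The data are made-up toy values (not BHPS), and the function name ols_simple is hypothetical. The residuals play the role of e_i.

```python
# Toy data (hypothetical, not from the BHPS): one explanatory variable x, one outcome y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_simple(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_simple(x, y)

# The estimated error term e_i for each case is the residual from the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

With an intercept in the model the residuals sum to zero by construction, which is one reason e is easy to ignore in everyday modelling.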
5 Example
Data: British Household Panel Survey 2005 adult interviews (7k adults in work)
Y = GHQ scale score for adults in employment (General Health Questionnaire; higher = worse subjective well-being)
X = various possible measures, including gender, age, marital status, occupational advantage, education, partner’s GHQ
You can run this example, the files are at:
6 Results from four linear models
Model             1           2           3           4
Cons              11.03**     6.29**      6.14**      6.56**
Fem                           1.25**      1.28**      1.39**
Age                           0.22**      0.23**      0.22**
Age-squared                   -0.0024**   -0.0026**   -0.0024**
Cohab             -0.33*      -0.77**     -0.76**     -1.52**
Own CAMSIS                                -0.01*      -0.01
Father’s CAMSIS                                       0.01
Degree/Diploma                                        -0.05
Vocational qual                                       -0.13
No qual                                               -0.11
Works > 10hrs                                         0.13
Partner’s GHQ                                         0.08**
R2                0.0009      0.0234      0.0244      0.0293
7 Some regression assumptions
All variables are measured without error
All relevant predictors of the dependent variable are included in the analysis
Expected value of the error is zero
Homoscedasticity of the error (constant error variance)
No autocorrelation (no relation between error terms for different cases)
– [above using: Menard, S. 1995. Applied Logistic Regression Analysis, London: Sage.]
8 Multilevel modelling
What if there was some connection between some of the cases within the dataset?
– This occurs by design in certain projects, e.g. in educational research, where the sample includes multiple children from the same school
– Some connections (‘hierarchical clusters’) are standard in most social surveys
10 How to account for hierarchy / clustering in individual data?
1. We could try a unique dummy variable for every cluster
– Country: Y = BX + scot + wal + nir + e
– ‘areg’ in Stata allows several hundred variables like this
– often called a ‘hierarchical fixed effect’
– but many hierarchies have too many clusters for this to be satisfactory
2. We could use higher level explanatory variables
– e.g. average unemployment rate in local authority district
– these are also ‘hierarchical fixed effects’
3. We could try telling the model that we expect the error terms to be related
– these are ‘hierarchical random effects’ = multilevel models
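Option 1 above can be sketched outside Stata. A regression with one dummy per cluster gives the same slope as a regression on within-cluster deviations from the cluster means, so the plain-Python example below demeans by cluster instead of building dummies. The data are toy values chosen for illustration (not BHPS), and the helper name slope_through_means is hypothetical. This is not a description of how areg is implemented internally.

```python
from collections import defaultdict

# Toy data (hypothetical): (cluster, x, y). Within each cluster y rises with x,
# but clusters with higher x have lower y overall.
rows = [
    ("a", 1, 10), ("a", 2, 11), ("a", 3, 12),
    ("b", 4, 6), ("b", 5, 7), ("b", 6, 8),
    ("c", 7, 2), ("c", 8, 3), ("c", 9, 4),
]


def slope_through_means(xs, ys):
    """Least-squares slope of ys on xs (a model with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Pooled regression: ignores the clustering altogether.
pooled_slope = slope_through_means([r[1] for r in rows], [r[2] for r in rows])

# Fixed-effects regression: subtract each cluster's mean from x and y first.
groups = defaultdict(list)
for cluster, x, y in rows:
    groups[cluster].append((x, y))

x_dev, y_dev = [], []
for members in groups.values():
    mean_x = sum(x for x, _ in members) / len(members)
    mean_y = sum(y for _, y in members) / len(members)
    for x, y in members:
        x_dev.append(x - mean_x)
        y_dev.append(y - mean_y)

within_slope = slope_through_means(x_dev, y_dev)

print(f"pooled slope = {pooled_slope:.2f}, within-cluster slope = {within_slope:.2f}")
```

In this contrived example the pooled and within-cluster slopes have opposite signs, which shows why the choice of how to handle clusters can matter. The cluster means computed along the way are also the kind of higher level explanatory variable described in option 2.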
11 Creating a multilevel model
Linear model:
Y_i = BX_i + e_i
Multilevel model (‘random intercepts’):
Y_ij = BX_ij + u_j + e_ij
Multilevel model (‘random coefficients’):
Y_ij = BX_ij + UB_j + u_j + e_ij
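To make the random intercepts equation concrete, the sketch below simulates data from Y_ij = b0 + u_j + e_ij with no explanatory variables and recovers the two variance components with a simple one-way ANOVA (method of moments) estimator. All numbers are hypothetical choices for illustration. Multilevel software such as xtmixed estimates these components by likelihood-based methods, so this is an illustration of the model and not of Stata's estimator.

```python
import random

random.seed(2008)  # fixed seed so the sketch is repeatable

J, n = 200, 10                    # hypothetical: 200 clusters of 10 individuals each
b0, sd_u, sd_e = 11.0, 2.0, 3.0   # hypothetical intercept and standard deviations

# Simulate Y_ij = b0 + u_j + e_ij: u_j is shared by everyone in cluster j.
clusters = []
for j in range(J):
    u_j = random.gauss(0.0, sd_u)
    clusters.append([b0 + u_j + random.gauss(0.0, sd_e) for _ in range(n)])

grand_mean = sum(sum(c) for c in clusters) / (J * n)
cluster_means = [sum(c) / n for c in clusters]

# Mean square within clusters estimates var(e).
msw = sum(
    (y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c
) / (J * (n - 1))

# Mean square between clusters estimates n * var(u) + var(e) in a balanced design.
msb = n * sum((m - grand_mean) ** 2 for m in cluster_means) / (J - 1)

var_e = msw
var_u = (msb - msw) / n
icc = var_u / (var_u + var_e)  # share of total variance lying between clusters

print(f"var(u) = {var_u:.2f}, var(e) = {var_e:.2f}, ICC = {icc:.2f}")
```

The simulated values imply var(u) = 4 and var(e) = 9, so the estimates should land near those figures, subject to sampling variation.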
12 How to implement multilevel models?
In SPSS and Stata, extension specifications can be added to a standard model command to specify the simplest random intercepts model
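13 Stata examples
* Linear regression that ignores clustering
regress ghq fem age age2 cohab
* Same model, with standard errors robust to clustering within ohid
regress ghq fem age age2 cohab, robust cluster(ohid)
* Random intercepts model, with ohid identifying the clusters
xtmixed ghq fem age age2 cohab || ohid: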
14 Comments
Models which ignore clustering should be unbiased but inefficient
The simplest multilevel model:
– Shouldn’t change coefficient estimates (ignoring clustering leaves them unbiased)
– Should change confidence intervals (ignoring clustering is inefficient)
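The point that clustering mainly affects uncertainty, not the point estimate, can be sketched with the simplest possible model: estimating a mean. The example below simulates clustered data (hypothetical parameter values) and compares the usual standard error with a cluster-robust one built from within-cluster sums of residuals. It omits the small-sample corrections that Stata applies, so it should not be expected to reproduce Stata's figures exactly.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

J, n = 100, 10            # hypothetical: 100 clusters of 10 individuals
sd_u, sd_e = 2.0, 3.0     # hypothetical cluster-level and individual-level SDs

y = []        # outcomes
cluster = []  # cluster identifier for each outcome
for j in range(J):
    u_j = random.gauss(0.0, sd_u)
    for _ in range(n):
        y.append(u_j + random.gauss(0.0, sd_e))
        cluster.append(j)

N = len(y)
mean_y = sum(y) / N                   # the point estimate is the same either way
resid = [yi - mean_y for yi in y]

# Usual standard error of the mean: treats all N cases as independent.
naive_se = math.sqrt(sum(r * r for r in resid) / (N - 1) / N)

# Cluster-robust standard error: sum residuals within each cluster first,
# so that correlated errors inside a cluster are not counted as independent.
cluster_sums = [0.0] * J
for r, j in zip(resid, cluster):
    cluster_sums[j] += r
robust_se = math.sqrt(sum(s * s for s in cluster_sums)) / N

print(f"mean = {mean_y:.3f}, naive SE = {naive_se:.3f}, cluster-robust SE = {robust_se:.3f}")
```

Because cases in the same cluster share u_j, the naive standard error is too small here, and confidence intervals based on it would be too narrow.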
17 3-level model in Stata (xtmixed)
18 The same model in MLwiN
19 A controversial claim about Stata
Stata is the best package to use for multilevel modelling, because:
– It is integrated with data management capacity: easy to change variables; change cases; add higher level explanatory variables; etc
– It has a wide range of hierarchical model estimators
– It allows easy comparison between long-standing hierarchical estimators (from economics) and new random effects models
By contrast:
– Other mainstream packages don’t have an adequate range of model estimators
– Specialist packages (e.g. MLwiN; HLM) do have more advanced modelling estimators, but they inhibit data manipulation / serious model building