1 Scottish Social Survey Network: Master Class 1 Data Analysis with Stata Dr Vernon Gayle and Dr Paul Lambert 23rd January 2008, University of Stirling.

Presentation transcript:

1 Scottish Social Survey Network: Master Class 1
Data Analysis with Stata
Dr Vernon Gayle and Dr Paul Lambert
23rd January 2008, University of Stirling
The SSSN is funded under Phase II of the ESRC Researcher Development Initiative

2 Multilevel data and analysis with Stata (in 15 minutes)

3 Generalised linear model
Y = BX + e
Y = outcome variable(s)
X = explanatory variables
e = error term for each individual response
Generalised linear mixed models
– Adding complexity to the GLM, such as by disaggregating the error structures

4 The work of statistical modelling
Y_i = BX_i + e_i
Most of the time:
– we have a single Y
– we ignore e
– we concentrate on what goes into B

5 Example
Data: British Household Panel Survey 2005 adult interviews (7k adults in work)
Y = GHQ scale score for adults in employment (General Health Questionnaire; higher = worse subjective well-being)
X = various possible measures, including gender, age, marital status, occupational advantage, education, partner’s GHQ
You can run this example; the files are at:

6 Results from four linear models (columns: models 1, 2, 3, 4)
Cons: 11.03**, 6.29**, 6.14**, 6.56**
Fem: 1.25**, 1.28**, 1.39**
Age: 0.22**, 0.23**, 0.22**
Age-squared: **, **, **
Cohab: -0.33*, -0.77**, -0.76**, -1.52**
Own CAMSIS: -0.01*, -0.01
Father’s CAMSIS: 0.01
Degree/Diploma: -0.05
Vocational qual: -0.13
No qual: -0.11
Works > 10hrs: 0.13
Partner’s GHQ: 0.08**
R
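
A minimal sketch of how a table like this could be built in Stata. The file name is hypothetical (the download location is not given in the transcript), and the contents of models 1 and 2 are an assumption inferred from which rows have entries; the variable names are those used in the Stata examples slide.

```stata
* Hypothetical file name: substitute the example data file for this session
use "bhps_2005_work.dta", clear

* Model 1 (assumed): GHQ on cohabitation only
regress ghq cohab
estimates store m1

* Model 2 (assumed): add gender, age and age-squared
regress ghq fem age age2 cohab
estimates store m2

* Side-by-side coefficients with significance stars, plus N and R-squared
estimates table m1 m2, star stats(N r2)
```

Models 3 and 4 would be stored and added to the `estimates table` call in the same way once the extra explanatory variables are named.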

7 Some regression assumptions
– All variables are measured without error
– All relevant predictors of the dependent variable are included in the analysis
– Expected value of the error is zero
– Homoscedasticity of the error (constant error variance)
– No autocorrelation (no relation between error terms for different cases)
[above using: Menard, S. Applied Logistic Regression Analysis, London: Sage.]
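
Some of these assumptions can be examined after fitting the model. The sketch below uses standard Stata post-estimation commands; the file name is hypothetical, and these checks are illustrative rather than the set used in the class.

```stata
* Hypothetical file name: substitute the example data file for this session
use "bhps_2005_work.dta", clear

regress ghq fem age age2 cohab

* Breusch-Pagan / Cook-Weisberg test for non-constant error variance
estat hettest

* Residual-versus-fitted plot: look for fanning or curvature
rvfplot, yline(0)

* Ramsey RESET test, a rough check for omitted terms
estat ovtest

* Save the residuals and inspect their distribution
predict ehat, residuals
histogram ehat, normal
```

None of these commands tests for relationships between the error terms of different cases.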

8 Multilevel modelling
What if there was some connection between some of the cases within the dataset?
– This occurs by design in certain projects, e.g. in educational research, where the sample includes multiple children from the same school
– Some connections (‘hierarchical clusters’) are standard in most social surveys

9

10 How to account for hierarchy / clustering in individual data?
1. We could try a unique dummy variable for every cluster
– Country: Y = BX + scot + wal + nir + e
– ‘areg’ in Stata allows several hundred variables like this
– often called a ‘hierarchical fixed effect’
– but many hierarchies have too many clusters for this to be satisfactory
2. We could use higher level explanatory variables
– e.g. average unemployment rate in local authority district
– these are also ‘hierarchical fixed effects’
3. We could try telling the model that we expect the error terms to be related
– these are ‘hierarchical random effects’ = multilevel models
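
The three options can be sketched in Stata as follows, taking the household identifier `ohid` as the cluster. The file name and the derived variable `hh_n` are hypothetical, chosen only for illustration.

```stata
* Hypothetical file name: substitute the example data file for this session
use "bhps_2005_work.dta", clear

* 1. One dummy per cluster, absorbed rather than listed explicitly
areg ghq fem age age2 cohab, absorb(ohid)

* 2. A higher level explanatory variable, here the number of
*    sampled adults in each household (hypothetical variable hh_n)
bysort ohid: generate hh_n = _N
regress ghq fem age age2 cohab hh_n

* 3. Random effects: tell the model that errors are related within households
xtmixed ghq fem age age2 cohab || ohid:
```

With household clusters, option 1 estimates several thousand household effects from very few cases each, which illustrates why it is often unsatisfactory.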

11 Creating a multilevel model
Linear model: Y_i = BX_i + e_i
Multilevel model (‘random intercepts’): Y_ij = BX_ij + u_j + e_ij
Multilevel model (‘random coefficients’): Y_ij = BX_ij + UB_j + u_j + e_ij
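
A short worked note on the random intercepts model, for individual i in cluster j. The normality and independence of u_j and e_ij are the usual assumptions for this model and are not stated on the slide; the symbols sigma_u and sigma_e are introduced here for illustration.

```latex
\begin{align}
Y_{ij} &= B X_{ij} + u_j + e_{ij},
  \qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2) \\
\operatorname{Var}(Y_{ij} \mid X_{ij}) &= \sigma_u^2 + \sigma_e^2 \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\end{align}
```

Here rho is the intra-cluster correlation: the share of the unexplained variance that lies between clusters. When sigma_u is zero the model reduces to the single-level linear model.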

12 How to implement multilevel models? In SPSS and Stata, there are extension specifications which can be made in order to specify the simplest random intercepts model

13 Stata examples
regress ghq fem age age2 cohab
regress ghq fem age age2 cohab, robust cluster(ohid)
xtmixed ghq fem age age2 cohab || ohid:
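
For readers on more recent Stata releases, the same three models might be written as below. This is a sketch under two assumptions: a hypothetical file name, and a release of Stata 13 or later, where `xtmixed` is available under the name `mixed` and `vce(cluster ...)` is the current spelling of `robust cluster(...)`.

```stata
* Hypothetical file name: substitute the example data file for this session
use "bhps_2005_work.dta", clear

* Single-level model, ignoring household clustering
regress ghq fem age age2 cohab

* Same coefficients, with cluster-robust standard errors
regress ghq fem age age2 cohab, vce(cluster ohid)

* Random intercepts for households
mixed ghq fem age age2 cohab || ohid:

* Intra-cluster correlation implied by the random intercepts model
estat icc
```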

14 Comments
Models which ignore clustering should be unbiased but inefficient
The simplest multilevel model:
– Shouldn’t change coefficient estimates (unbiased)
– Should change confidence intervals (inefficient)
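
One way to check these comments against the data is to store each model and tabulate coefficients and standard errors side by side. The file name is hypothetical, and the stored names `ols`, `clus` and `ml` are arbitrary labels.

```stata
* Hypothetical file name: substitute the example data file for this session
use "bhps_2005_work.dta", clear

regress ghq fem age age2 cohab
estimates store ols

regress ghq fem age age2 cohab, vce(cluster ohid)
estimates store clus

xtmixed ghq fem age age2 cohab || ohid:
estimates store ml

* Coefficients with standard errors beneath them;
* equations(1) lines up the fixed part of the multilevel model
* with the single-equation regressions
estimates table ols clus ml, b(%9.3f) se(%9.3f) equations(1)
```

The first two columns share identical coefficients by construction, so any difference there appears only in the standard errors.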

15

16

17 3-level model in Stata (xtmixed)
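
The slide content itself is not in the transcript. As a sketch of the general syntax only, a 3-level random intercepts model lists the levels from highest to lowest. The identifier `region` is hypothetical, as is the file name.

```stata
* Hypothetical file name: substitute the example data file for this session
use "bhps_2005_work.dta", clear

* Individuals (level 1) within households (ohid) within regions
* (region is a hypothetical higher level identifier)
xtmixed ghq fem age age2 cohab || region: || ohid:
```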

18 The same model in MLwiN

19 A controversial claim about Stata
Stata is the best package to use for multilevel modelling, because:
– It is integrated with data management capacity: easy to change variables; change cases; add higher level explanatory variables; etc.
– It has a wide range of hierarchical model estimators
– It allows easy comparison between long-standing hierarchical estimators (from economics) and new random effects models
By contrast:
– Other mainstream packages don’t have an adequate range of model estimators
– Specialist packages (e.g. MLwiN; HLM) do have more advanced modelling estimators, but they inhibit data manipulation / serious model building
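
As an illustration of the comparison mentioned in the third point, the panel-data estimators from economics can be run on the same household clusters. This is a sketch with a hypothetical file name, not a recommendation about which estimator to prefer.

```stata
* Hypothetical file name: substitute the example data file for this session
use "bhps_2005_work.dta", clear

* Declare the household as the grouping unit
xtset ohid

* Fixed effects ('within') estimator
xtreg ghq fem age age2 cohab, fe
estimates store fe

* Random effects (GLS) estimator
xtreg ghq fem age age2 cohab, re
estimates store re

* Hausman test comparing the two sets of coefficients
hausman fe re

* Random effects by maximum likelihood, comparable to xtmixed
xtreg ghq fem age age2 cohab, mle
```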