Introduction to Multilevel Modeling Stephen R. Porter Associate Professor Dept. of Educational Leadership and Policy Studies Iowa State University Lagomarcino.

Presentation transcript:

Introduction to Multilevel Modeling Stephen R. Porter Associate Professor Dept. of Educational Leadership and Policy Studies Iowa State University Lagomarcino Hall Ames, IA

2 Goals of the workshop Understand why multilevel modeling is important and understand basic 2-level models. Become an informed consumer of multilevel research. Know how to estimate some simple models using the software package HLM. Have a thorough grounding in the basics so you can learn more complicated multilevel techniques (3-level, SEM, etc.) on your own.

3 Schedule 1st day: Review and discuss multilevel terminology and theory. Begin reviewing choices in model building. 2nd day: Estimate simple 2-level models using the student version of HLM. Discuss model building in detail.

Introduction

5 Why multilevel modeling? Nested data are very common in higher education. Analysis of nested data poses a unit-of-analysis problem: should we analyze the individual or the group? Unfortunately, we often can't choose one over the other. Traditional linear models offer a simple view of a complex world; they generally assume the same effects across groups. If effects do differ across groups, we can explain these differences with multilevel modeling.

6 Unit of analysis problem: individual, group or both? Example: studying what affects student retention (1000 students per college) in a group of colleges (n=50). Total dataset N=50,000. We can assign college-level variables to each individual, but … We end up estimating the standard errors for college-level variables using N=50,000. Yet we only have 50 different college observations, so N really equals 50.

7 Unit of analysis problem: individual, group or both? Alternatively, we can average student data for each college so that we have 1 observation per college (N=50). Now we have reduced variance on our student-level variables. We also have variables which measure both individual student characteristics (SAT score=aptitude/preparation) and college environment (average SAT score=selectivity).

8 What are nested data? Simply put, sub-units are grouped (or “nested”) within larger units. Often the data are observations of individuals nested within groups. Key: individuals within groups are more similar to one another than to individuals in other groups. We can empirically verify this. Sometimes data are multiple observations nested within an individual.

9 Students/faculty nested within departments/disciplines Note that this could be one institution, or individuals from several different institutions. Examples: student satisfaction, gains in skills; faculty salaries, research productivity.

10 Students/faculty nested within institutions Examples: student satisfaction, retention

11 Time periods nested within students Example: grade-point average

Terms and theory

13 Terminology: HLM HLM stands for hierarchical linear models. It is both a statistical technique and a software package. People also use the term multilevel models. Economists often refer to these models as random-coefficient regression models.

14 Terminology: levels Level-1 variables: These are the variables that are nested within groups. Typically these are individual-level variables. Level-2 variables: Typically these are unit-level variables. Note that growth models have time periods at level-1, and individuals at level-2.

15 Terminology: variance Numbers represent people; each number is a person's question response on a 5-point Likert scale; 6 groups. [Three figure panels, not reproduced in this transcript: variance between groups only; variance within groups only; variance both between and within groups.]
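The three cases on this slide can be illustrated with a small sketch. The response values below are hypothetical (the slide's own numbers did not survive in the transcript), and the helper name `variance_components` is made up for illustration. It splits the total sum of squares into a between-group part and a within-group part.

```python
from statistics import mean


def variance_components(groups):
    """Return (between, within) sums of squares for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between: how far each group mean sits from the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: how far each person sits from their own group mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return between, within


# Hypothetical 5-point Likert responses: 6 groups of 3 people each.
between_only = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5], [3, 3, 3]]
within_only = [[2, 3, 4] for _ in range(6)]
both = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 1, 2], [4, 5, 5], [2, 4, 5]]

for name, data in [("between only", between_only),
                   ("within only", within_only),
                   ("both", both)]:
    b, w = variance_components(data)
    print(f"{name}: between SS = {b:.2f}, within SS = {w:.2f}")
```

When everyone in a group gives the same answer, the within-group part is zero; when every group has the same mean, the between-group part is zero.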

16 Terminology: random and fixed Fixed effects are coefficients that are constant across groups; they do not vary. These are typical OLS coefficients. Random effects are coefficients that can vary across groups. This means the coefficient can take a different value for each group. E.g., if we allow an intercept for each group, then the intercept is said to be random. It is random because we assume it is stochastic. Yet we can also explain some of this variance with other variables.

17 One way to think about multilevel models: “slopes-as-outcomes” Suppose we estimate 1 regression equation for each group, e.g., for the 1,000 students in school A, the 1,000 students in school B, etc. The result is 50 regression equations. We then take the slope coefficients for each school, as well as information about each school such as private/public status, and make a new dataset. We run a regression model on these 50 observations using the slope coefficients (or intercepts) as the dependent variable and public/private status as the independent variable. The result is a single set of coefficients for the school dataset.
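The two-step idea described on this slide can be sketched in plain Python. Everything here is simulated: the true slopes, the `private` indicator, and the helper `ols` are assumptions made for illustration, and this two-step procedure is only a way to think about the model, not how HLM actually estimates it.

```python
import random
from statistics import mean


def ols(x, y):
    """One-predictor OLS; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


random.seed(0)

# Simulate 50 schools with 1,000 students each (hypothetical data).
schools = []
for j in range(50):
    private = j % 2                    # hypothetical school-level indicator
    true_slope = 2.0 + 1.0 * private   # assumed: steeper slope in private schools
    x = [random.gauss(0, 1) for _ in range(1000)]
    y = [50 + true_slope * xi + random.gauss(0, 5) for xi in x]
    schools.append((private, x, y))

# Step 1: one regression per school gives 50 intercepts and 50 slopes.
school_rows = []
for private, x, y in schools:
    intercept, slope = ols(x, y)
    school_rows.append((private, intercept, slope))

# Step 2: regress the 50 slopes on the school-level variable.
g0, g1 = ols([row[0] for row in school_rows], [row[2] for row in school_rows])
print(f"average slope in public schools: {g0:.2f}")
print(f"change in slope for private schools: {g1:.2f}")
```

The second-step coefficients play the role that the level-2 coefficients play in a multilevel model: they describe how the student-level slope differs by school type.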

18 Now for some algebra! You must learn some of the basic mathematical notation used in multilevel modeling. As we will see, the program HLM uses this notation to express the models that you estimate. Understanding these basic symbols and expressions will allow you to tackle more complex analyses, and understand other researchers’ more complex analyses.

19 A level-1 model: multiple students in one school (familiar OLS equation) Student i is viewed as having average achievement in the school, plus a positive deviation due to SES, plus a positive or negative deviation due to the unique circumstances of the student.
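The equation shown on this slide did not survive in the transcript. A plausible reconstruction, assuming the standard notation of the HLM literature (the symbols $\beta_0$, $\beta_1$, and $r_i$ are assumptions, not taken from the slide), is:

```latex
% Y_i     : achievement of student i
% beta_0  : average achievement in the school (intercept)
% beta_1  : effect of SES on achievement (slope)
% r_i     : deviation due to the unique circumstances of student i
Y_i = \beta_0 + \beta_1 (\mathrm{SES}_i) + r_i
```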

20 A level-1 model: multiple students in multiple schools Now we’re estimating the equation from before for each school. Each school can have a different average achievement (or intercept), and a different impact of SES on achievement (or slope).
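The slide's equation is missing from the transcript. In the usual HLM notation (assumed here, not confirmed by the source), allowing each school $j$ its own intercept and slope amounts to adding a $j$ subscript to every term:

```latex
% Y_{ij}     : achievement of student i in school j
% beta_{0j}  : average achievement in school j (school-specific intercept)
% beta_{1j}  : effect of SES on achievement in school j (school-specific slope)
% r_{ij}     : student-level error
Y_{ij} = \beta_{0j} + \beta_{1j} (\mathrm{SES}_{ij}) + r_{ij}
```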

21 Need to make some additional assumptions about the coefficients, because they vary Student-level errors are normally distributed. Gammas: we expect the average achievement for school j to equal the average school mean for all J schools, and the slope of SES for school j to equal the average of the slopes for all J schools. Taus: these are the variances of the intercepts and slopes, and the covariance between them.
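Written out, these assumptions might look as follows. The notation is the conventional one from the HLM literature and is an assumption here, since the slide's formulas are not in the transcript: $\beta_{0j}$ and $\beta_{1j}$ are the intercept and SES slope for school $j$, and $r_{ij}$ is the student-level error.

```latex
% Student-level errors
r_{ij} \sim N(0, \sigma^2)

% Gammas: expected values of the school-specific coefficients
E(\beta_{0j}) = \gamma_{00}, \qquad E(\beta_{1j}) = \gamma_{10}

% Taus: variances of the intercepts and slopes, and their covariance
\mathrm{Var}(\beta_{0j}) = \tau_{00}, \qquad
\mathrm{Var}(\beta_{1j}) = \tau_{11}, \qquad
\mathrm{Cov}(\beta_{0j}, \beta_{1j}) = \tau_{01}
```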

22 Level-2 model: explaining the Level-1 coefficients Since our intercepts and slopes vary by school, we can now model why they vary. Suppose we hypothesize that levels of achievement and impact of SES are related to whether a school is public or Catholic. We need equations for the intercept and slope to describe our hypothesis:

23 Level-2 model: continued
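The level-2 equations themselves are missing from the transcript. Under the hypothesis stated on the previous slide, and assuming a school-level indicator $\mathrm{CATHOLIC}_j$ (1 if school $j$ is Catholic, 0 if public; the variable name is hypothetical), they would typically be written as:

```latex
% Intercept equation: mean achievement in school j
\beta_{0j} = \gamma_{00} + \gamma_{01} (\mathrm{CATHOLIC}_j) + u_{0j}

% Slope equation: effect of SES on achievement in school j
\beta_{1j} = \gamma_{10} + \gamma_{11} (\mathrm{CATHOLIC}_j) + u_{1j}
```

Here $u_{0j}$ and $u_{1j}$ are school-specific errors: the part of each school's intercept and slope that school type does not explain.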

24 So math achievement of an individual student in school j is explained by: mean achievement in public schools, plus the impact of a school being Catholic on mean achievement (if j is Catholic); the effect of SES on achievement, plus the impact of a school being Catholic on how SES affects achievement (again, if j is Catholic); and student- and school-specific error terms.
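This verbal summary corresponds to a single combined equation. The sketch below assumes conventional HLM notation and a hypothetical indicator $\mathrm{CATHOLIC}_j$ (1 if school $j$ is Catholic, 0 if public); none of these symbols appear in the transcript itself.

```latex
Y_{ij} =
    \gamma_{00}                                                   % mean achievement in public schools
  + \gamma_{01} (\mathrm{CATHOLIC}_j)                             % impact of being Catholic on mean achievement
  + \gamma_{10} (\mathrm{SES}_{ij})                               % effect of SES on achievement
  + \gamma_{11} (\mathrm{CATHOLIC}_j)(\mathrm{SES}_{ij})          % impact of being Catholic on the SES effect
  + u_{0j} + u_{1j} (\mathrm{SES}_{ij})                           % school-specific error terms
  + r_{ij}                                                        % student-specific error term
```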

25 Summary Level-1: explain the dependent variable. Level-2: explain the slopes and the intercepts.

Some practical aspects of multilevel modeling

27 Questions to answer Can you use multilevel techniques to study your dependent variable? Should you use multilevel techniques to study your dependent variable? How will you center your level-1 and level-2 predictors? Which of the level-1 coefficients will be explained at level-2? I.e., are they fixed or random? How does my model perform?

28 Can I use HLM? HLM requires a large amount of data. Minimums: Number of groups: 30, but most recommend 50+. Number of individuals within groups: 5-10, but can be as low as 1. Average group size: 10; obviously, more is better.

29 Should I be using HLM? How much of the variance in your dependent variable is explained by group membership? Intraclass correlation coefficient (ICC) = var between groups / (var between groups + var within groups)
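The ICC formula is simple enough to compute by hand. A minimal sketch follows; the function name `icc` and the two variance values are hypothetical, and in practice the variance components would come from a model with no predictors (the null model).

```python
def icc(var_between, var_within):
    """Share of total variance that lies between groups."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical variance components from a null model.
var_between_groups = 8.6
var_within_groups = 39.1

share = icc(var_between_groups, var_within_groups)
print(f"ICC = {share:.3f}")
```

An ICC near zero means group membership explains almost none of the variance; a larger value means individuals within groups resemble one another, which is the situation multilevel models are designed for.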

30 Centering variables Whether and how you center is a very important decision: interpretation of results depends on your choice. Important because the intercept at level-1 is also a dependent variable. Centering refers to subtracting a mean from your independent variables. The transformed value for an individual measures how much they deviate (+/-) from the mean.

31 Centering variables Suppose we center verbal SAT scores around a student mean of 500. How would we interpret a regression coefficient if all variables were similarly transformed?

32 Centering variables Why would we want to center? Variable may lack a natural zero point, such as SAT score. Stability of estimates at level-1 affected by location of variables. Location at level-2 is less important.

33 Centering variables Generally two types of centering. For a specific variable: Grand mean centering – subtract the mean for the entire sample from each observation in the sample. Group mean centering – subtract the mean for each group from each member of the group. To fully understand the implications of centering, see the discussion in Bryk and Raudenbush (2002) pp
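The two types of centering can be compared side by side. The sketch below uses made-up SAT scores for two hypothetical colleges; the function names are illustrative only.

```python
from statistics import mean


def grand_mean_center(groups):
    """Subtract the mean of the entire sample from every observation."""
    all_values = [x for values in groups.values() for x in values]
    grand = mean(all_values)
    return {name: [x - grand for x in values] for name, values in groups.items()}


def group_mean_center(groups):
    """Subtract each group's own mean from the members of that group."""
    return {name: [x - mean(values) for x in values]
            for name, values in groups.items()}


# Hypothetical verbal SAT scores for students in two colleges.
sat = {
    "College A": [450, 500, 550],
    "College B": [600, 650, 700],
}

print("grand mean centered:", grand_mean_center(sat))
print("group mean centered:", group_mean_center(sat))
```

After grand mean centering, a student's value still reflects which college they attend; after group mean centering it only reflects where they stand relative to their own college, so differences between colleges are removed from the level-1 variable.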

34 Fixed or random? It would be nice to have everything random; that is, a different set of coefficients for each group. But due to HLM demands on data, usually only the intercept and a few variables can be random. Important: if you treat the coefficient for gender as random and you have a group without females, that group will be dropped. Generally you should run parallel models for intercept and slopes, as in our theory example.

35 Model statistics Goodness of fit: proportion of variance explained at level-1; variance explained at level-2.
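One common way to obtain these statistics, assumed here because the slide does not give a formula, is to compare each variance component in the fitted model with the same component in the null model. The function name and all numbers below are hypothetical.

```python
def proportion_variance_explained(var_null, var_model):
    """Proportional reduction in a variance component relative to the null model."""
    if var_null <= 0:
        raise ValueError("null-model variance must be positive")
    return (var_null - var_model) / var_null


# Hypothetical variance components.
level1_null, level1_model = 39.1, 36.7   # within-group (level-1) variance
level2_null, level2_model = 8.6, 2.3     # between-group (level-2) intercept variance

print(f"level-1: {proportion_variance_explained(level1_null, level1_model):.3f}")
print(f"level-2: {proportion_variance_explained(level2_null, level2_model):.3f}")
```

Unlike an OLS R-squared, this quantity can come out negative if a variance component grows when predictors are added, so it is best read as a rough guide.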

36 Some thoughts about building your models Before using HLM, run OLS regressions for the sample and for each group. Building the null model: This should be your first step. Calculate the ICC. Building the level-1 models: Should be theory driven. Use a step-up approach. Be cautious about what you leave as random; it's often difficult to leave more than the intercept and one variable as random.

37 Some thoughts about building your models Building the level-2 models: Rule of thumb: 10 observations per variable. Use parallel models. Many scholars drop insignificant variables at both levels. (I disagree with this.)