Multilevel Modeling

Multilevel Question: It turns out that simple random sampling is very expensive. Imagine traveling to Moscow, Idaho, to give the survey to a single student. A cheaper design samples in stages. The population is divided into subsets, which are conventionally called primary sampling units or psu's. In a two-stage sample, a sample is first drawn from the primary sampling units (the first-stage sample). Then, within each psu included in the first-stage sample, a sample of population elements is drawn (the second-stage sample). This can be extended to situations with more than two levels, e.g., individuals within households within municipalities, and is then called a multistage sample.
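The two stages can be made concrete with a minimal Python sketch. It assumes simple random sampling without replacement at both stages. The function name `two_stage_sample` and the school population are hypothetical illustrations, not part of the original slides.

```python
import random

def two_stage_sample(population, n_psu, n_per_psu, seed=None):
    """Draw a two-stage sample (hypothetical helper).

    population: dict mapping a psu label (e.g., a school) to a list of its elements.
    Stage 1: draw n_psu psu's by simple random sampling without replacement.
    Stage 2: within each sampled psu, draw n_per_psu elements without replacement.
    """
    rng = random.Random(seed)
    first_stage = rng.sample(sorted(population), n_psu)
    return {psu: rng.sample(population[psu], n_per_psu) for psu in first_stage}

# Hypothetical population: 20 schools with 30 students each.
schools = {f"school{s}": [f"s{s}_student{i}" for i in range(30)] for s in range(20)}
sample = two_stage_sample(schools, n_psu=4, n_per_psu=5, seed=1)
for school, students in sample.items():
    print(school, students)
```

With this design, only 4 schools need a visit instead of up to 20 for a simple random sample of the same 20 students.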

These are examples of two-level data structures, but extensions to multiple levels are possible, for example:
10 cities
  -> in each city: 5 schools
  -> in each school: 2 classes
  -> in each class: 5 students
  -> each student given the test twice
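Counting the units at each level shows how quickly such a design grows. The Python sketch below builds one record per test score. This is the "long" layout that mixed-model procedures such as PROC MIXED usually work with. The field names are illustrative assumptions.

```python
from itertools import product

# Design from the example: 10 cities, 5 schools per city, 2 classes per school,
# 5 students per class, and each student given the test twice.
records = [
    {"city": c, "school": s, "class": k, "student": i, "occasion": t}
    for c, s, k, i, t in product(range(10), range(5), range(2), range(5), range(2))
]

# A student is identified by the full path of indices above it in the hierarchy.
n_students = len({(r["city"], r["school"], r["class"], r["student"]) for r in records})
print(n_students, len(records))
```

Each student contributes two records, so 500 students yield 1,000 test scores.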

What is Multilevel or Hierarchical Linear Modeling? Nested Data Structures

Individuals undivided: unit of analysis = individuals

Individuals nested within groups: unit of analysis = individuals + classes

… and further nested: unit of analysis = individuals + classes + schools

Examples of Multilevel Data Structures: neighborhoods are nested within communities; families are nested within neighborhoods; children are nested within families.

Examples of Multilevel Data Structures: schools are nested within districts; classes are nested within schools; students are nested within classes.

Multilevel Data Structures
Level 4: District (l)
Level 3: School (k)
Level 2: Class (j)
Level 1: Student (i)

2nd Type of Nesting: Repeated Measures Nested Within Individuals (focus = change or growth)

Time Points Nested Within Individuals

Nested Data: Data nested within a group tend to be more alike than data from individuals selected at random, because the dynamics of the group tend to exert an effect on its members.

Nested Data: The intraclass correlation (ICC) measures the clustering and dependence of the data. It ranges from 0 (very independent) to 1.0 (very dependent). Details are discussed later.
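In variance terms, the ICC is the share of the total variance that lies between groups: ICC = τ² / (τ² + σ²), where τ² is the between-group variance and σ² is the within-group variance. The minimal Python sketch below uses the classic one-way ANOVA estimator. It assumes equal group sizes, and the data are hypothetical.

```python
from statistics import mean

def icc_oneway(groups):
    """ANOVA estimator of the intraclass correlation for balanced groups.

    groups: list of equally sized lists of observations (one list per group);
    needs at least 2 groups with at least 2 observations each.
    """
    J, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this simple estimator assumes equal group sizes")
    grand = mean(y for g in groups for y in g)
    means = [mean(g) for g in groups]
    msb = n * sum((m - grand) ** 2 for m in means) / (J - 1)          # between-group mean square
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (J * (n - 1))  # within
    icc = (msb - msw) / (msb + (n - 1) * msw)
    return max(icc, 0.0)  # negative estimates are often truncated at 0

clustered = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]    # groups differ, members alike
unclustered = [[10, 20, 30], [11, 21, 31], [12, 22, 32]]  # groups alike, members differ
print(icc_oneway(clustered), icc_oneway(unclustered))
```

The first data set is close to the "very dependent" end of the scale. The second is at the "very independent" end.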

Multilevel Modeling Seems New, But…: It is an extension of general linear modeling, which covers simple linear regression, multiple linear regression, ANOVA, ANCOVA, and repeated measures ANOVA.

Why Multilevel Modeling vs. Traditional Approaches? Traditional approaches use a single level:
1. Individual-level analysis (ignore the group)
2. Group-level analysis (aggregate the data and ignore the individuals)

Problems with Traditional Approaches, 1. Individual-level analysis (ignore the group): This violates the assumption that the observations are independent, which leads to misestimated standard errors (the standard errors are smaller than they should be).
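The understated standard errors can be seen in a minimal Python sketch. It compares the naive standard error of the overall mean, which treats every observation as independent, with one computed from the group means. The group-means approach is valid only when all groups have the same size. The data are hypothetical and strongly clustered.

```python
from math import sqrt
from statistics import mean, stdev

def naive_se(groups):
    """SE of the overall mean, wrongly treating all observations as independent."""
    ys = [y for g in groups for y in g]
    return stdev(ys) / sqrt(len(ys))

def cluster_se(groups):
    """SE of the overall mean based on the group means (equal group sizes assumed):
    the independent units are the groups, not the individuals."""
    ms = [mean(g) for g in groups]
    return stdev(ms) / sqrt(len(ms))

groups = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]  # hypothetical clustered data
print(round(naive_se(groups), 2), round(cluster_se(groups), 2))
```

Here, the naive standard error is about half the size of the one that respects the grouping.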

Problems with Traditional Approaches, 2. Group-level analysis (aggregate the data and ignore the individuals): This causes aggregation bias. The meaning of a variable at Level 1 (e.g., individual-level SES) may not be the same as its meaning at Level 2 (e.g., school-level SES).
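To illustrate aggregation bias, consider the hypothetical data below. The invented numbers are constructed so that, within each school, y falls as x rises, while the school averages rise together. The regression on aggregated data (Level 2) therefore points in the opposite direction from the within-school (Level 1) relationship. A minimal Python sketch follows.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical (x, y) pairs for students in three schools.
schools = {
    "A": [(1, 12), (2, 11), (3, 10)],
    "B": [(4, 15), (5, 14), (6, 13)],
    "C": [(7, 18), (8, 17), (9, 16)],
}

# Group-level analysis: regress the school mean of y on the school mean of x.
mx = [mean(x for x, _ in pts) for pts in schools.values()]
my = [mean(y for _, y in pts) for pts in schools.values()]
between = ols_slope(mx, my)

# Within-school relationship: center x and y on their school means, then pool.
dx, dy = [], []
for pts in schools.values():
    cx, cy = mean(x for x, _ in pts), mean(y for _, y in pts)
    dx += [x - cx for x, _ in pts]
    dy += [y - cy for _, y in pts]
within = ols_slope(dx, dy)

print(between, within)  # 1.0 -1.0
```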

Example: (table of systolic (SBP) and diastolic (DBP) blood pressure for each patient, before and after; the values were not recoverable)
Paired t-test: the average change in DBP is significantly different from zero (p = ).
Unpaired t-test: the average change in DBP is significantly different from zero (p = 0.036).
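The patient values in the table were not recovered, so the Python sketch below uses hypothetical before/after DBP readings. It computes only the two t statistics (p-values would require the t distribution). The paired test works with within-patient differences, which removes the between-patient variability that the unpaired test leaves in.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic of the mean within-patient change (after - before)."""
    d = [a - b for b, a in zip(before, after)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def unpaired_t(before, after):
    """Pooled two-sample t statistic that ignores which readings belong together."""
    nx, ny = len(before), len(after)
    sp2 = ((nx - 1) * stdev(before) ** 2 + (ny - 1) * stdev(after) ** 2) / (nx + ny - 2)
    return (mean(after) - mean(before)) / sqrt(sp2 * (1 / nx + 1 / ny))

before = [90, 100, 110, 85, 95]  # hypothetical DBP readings, one per patient
after = [86, 97, 104, 82, 90]
print(round(paired_t(before, after), 2), round(unpaired_t(before, after), 2))
```

Both statistics share the same numerator. However, the paired version is roughly ten times larger here because the measurements are nested within patients.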

Multilevel Approach: 2 or more levels can be considered simultaneously, so that both within-group and between-group variability can be analyzed.

How Many Levels Are Usually Examined? 2 or 3 levels are very common, e.g., 15 students × 10 classes × 10 schools = 1,500 students.

Types of Outcomes:
Continuous scale (achievement, attitudes)
Binary (pass/fail)
Categorical with 3+ categories

Effect for estimation of a mean: This applies when the sample is a two-stage sample drawn by random sampling with replacement at each stage, or when the sampling fractions are so low that the difference between sampling with and without replacement is negligible.

Effect for estimation of a mean: Since considerations for the choice of a design are always of an approximate nature, only designs in which each level-two unit contains the same number of level-one units are considered here. Level-two units will sometimes be referred to as clusters. The number of level-two units is denoted N, and the number of level-one units within each level-two unit is denoted n. These numbers are called the level-two sample size and the cluster size, respectively, and the total sample size is Nn. If the number of level-one units actually fluctuates between level-two units, the average number of sampled level-one units per level-two unit will almost always be a reasonable approximation for n.
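With N clusters of equal size n, the standard (Kish) design effect for the mean is 1 + (n − 1)ρ_I, where ρ_I is the intraclass correlation. This is the factor by which the variance of the mean grows relative to a simple random sample of Nn units. The minimal Python sketch below uses hypothetical helper names and numbers. It converts the design effect into an "effective" sample size.

```python
def design_effect(n, icc):
    """Design effect for the mean in a two-stage sample with equal cluster size n."""
    return 1 + (n - 1) * icc

def effective_sample_size(N, n, icc):
    """Number of independent observations that N clusters of size n are worth."""
    return N * n / design_effect(n, icc)

# Hypothetical design: N = 40 clusters of n = 25 level-one units, ICC = 0.10.
print(design_effect(25, 0.10), effective_sample_size(40, 25, 0.10))
```

Even a modest ICC of 0.10 reduces 1,000 observations to the information content of roughly 294 independent ones.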

Effect for estimation of a mean: Suppose that the mean of some variable Y is to be estimated in a population that has a two-level structure. As an example, Y could be the duration of a hospital stay after a certain operation, assuming there are no complications or additional health problems. The model used is a random-intercept model.
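The slide's equation was not recovered. Assuming the standard random-intercept (empty) model with independent cluster effects U_j and individual deviations R_ij, the variance of the sample mean shows where the design effect comes from. The symbols below are our notation (i = level-one unit, j = cluster), with N and n as defined above.

```latex
Y_{ij} = \mu + U_j + R_{ij}, \qquad
\operatorname{Var}(U_j) = \tau^2, \quad \operatorname{Var}(R_{ij}) = \sigma^2,
\qquad i = 1,\dots,n, \; j = 1,\dots,N

\operatorname{Var}(\bar{Y}_{\cdot\cdot})
  = \frac{\tau^2 + \sigma^2/n}{N}
  = \frac{\tau^2 + \sigma^2}{Nn}\,\bigl(1 + (n-1)\rho_I\bigr),
\qquad \rho_I = \frac{\tau^2}{\tau^2 + \sigma^2}
```

The factor (τ² + σ²)/(Nn) is the variance of the mean under simple random sampling of Nn units. Therefore, 1 + (n − 1)ρ_I is the design effect.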

Effect for estimation of a mean:
1. This increase in complexity carries over to regression, etc.
2. This is a relatively simple model. More complex models lead to more complex calculations that require large covariance matrices.

Easier Case: An alternative is to add a dummy variable for each level-2 unit. The effect of each level-2 unit is then a constant (fixed), not a random variable.
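A dummy per level-2 unit gives each unit its own fixed intercept. The minimal Python sketch below fits that model for a single covariate x. Instead of building the dummy columns, it uses the standard "within" shortcut: the least-squares slope equals the slope computed from x and y centered on their unit means. The function name and data are hypothetical.

```python
from statistics import mean

def fixed_effects_fit(data):
    """Fit y = alpha_j + beta * x with one fixed intercept per level-2 unit j.

    data: dict mapping a unit label to a list of (x, y) pairs.
    """
    centered = []
    for pts in data.values():
        cx, cy = mean(x for x, _ in pts), mean(y for _, y in pts)
        centered += [(x - cx, y - cy) for x, y in pts]
    beta = sum(dx * dy for dx, dy in centered) / sum(dx * dx for dx, _ in centered)
    # Each unit's intercept makes its fitted line pass through the unit's means.
    alphas = {j: mean(y for _, y in pts) - beta * mean(x for x, _ in pts)
              for j, pts in data.items()}
    return beta, alphas

# Hypothetical data generated exactly from y = alpha_j + 2x (alpha = 5 and 0).
data = {"unit1": [(1, 7), (2, 9), (3, 11)], "unit2": [(2, 4), (4, 8), (5, 10)]}
beta, alphas = fixed_effects_fit(data)
print(beta, alphas)
```

A model like this spends one parameter per level-2 unit, which is the price of treating those effects as fixed.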

Software to Do Multilevel Modeling (SAS users): PROC MIXED. It extends the general linear modeling (simple linear regression, multiple linear regression, ANOVA, ANCOVA, repeated measures ANOVA) handled by PROC REG, PROC GLM, and PROC ANOVA.

Example: Family and Gender. The response variable Height measures the heights (in inches) of 18 individuals. The individuals are classified according to Family and Gender.

data heights;
   input Family Gender $ Height @@;
   datalines;
1 F 67  1 F 66  1 F 64  1 M 71  1 M 72
2 F 63  2 F 63  2 F 67  2 M 69  2 M 68  2 M 70
3 F 63  3 M 64
4 F 67  4 F 66  4 M 67  4 M 67  4 M 69
;
run;

This example differs from "Effects…" because there are now more cluster levels, but no random intercepts.

Example: Family and Gender. The PROC MIXED statement invokes the procedure. The CLASS statement instructs PROC MIXED to consider both Family and Gender as classification variables. As a result, dummy (indicator) variables are created corresponding to all of the distinct levels of Family and Gender. For these data, Family has four levels and Gender has two levels.

proc mixed data=heights;
   class Family Gender;
   model Height = Gender Family Family*Gender / s;
run;

The S option requests a solution for the fixed-effects parameters, along with their approximate standard errors.
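Because the model contains Gender, Family, and Family*Gender, every Family × Gender combination effectively gets its own mean. All 8 cells contain data, so the least-squares fitted values are simply the cell means. The Python sketch below computes them from the 18 heights above and can be used to check the fitted values.

```python
from collections import defaultdict
from statistics import mean

# (Family, Gender, Height) records from the heights data set.
heights = [
    (1, "F", 67), (1, "F", 66), (1, "F", 64), (1, "M", 71), (1, "M", 72),
    (2, "F", 63), (2, "F", 63), (2, "F", 67), (2, "M", 69), (2, "M", 68), (2, "M", 70),
    (3, "F", 63), (3, "M", 64),
    (4, "F", 67), (4, "F", 66), (4, "M", 67), (4, "M", 67), (4, "M", 69),
]

cells = defaultdict(list)
for family, gender, height in heights:
    cells[(family, gender)].append(height)

# Fitted value of the saturated Family x Gender model = mean of each cell.
cell_means = {cell: mean(hs) for cell, hs in sorted(cells.items())}
for (family, gender), m in cell_means.items():
    print(family, gender, round(m, 2))
```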

Family and Gender: Run the program simple-proc_mixed2.sas. What happens when you try to use the CLASS statement in PROC REG? In ordinary linear regression there is only one set of coefficients, whereas in HLM coefficients are estimated for each group unit (e.g., each school).

Dorsal Shells in Lizards: Two-sample t-test: the small observed difference is not significant (p = ).

Mother Effect: We have 102 lizards from 29 mothers, and mother effects might be present. Hence, a comparison between male and female animals should be based on within-mother comparisons.

Mother Effect (figure: number of dorsal shells, plotted by mother)

First Choice: Test for a ‘sex’ effect, correcting for ‘mother’ effects. β can be interpreted as the average difference between males and females within each mother. This is a more complex example than "Effect…" because there is now a variable x_ij for each observation.
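The model equation was not recovered from the slide. A plausible form is given below. It assumes that x_ij is a 0/1 indicator (1 = male) for lizard i of mother j, and that the mother effects α_j are fixed, with one of them constrained (for example, set to 0) for identifiability.

```latex
Y_{ij} = \mu + \alpha_j + \beta x_{ij} + \varepsilon_{ij},
\qquad j = 1,\dots,29 \text{ (mothers)}, \quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Under this coding, β is the male-minus-female difference in the expected number of dorsal shells for any given mother, which matches the interpretation above.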

SAS program:

proc mixed data = lizard;
   class mothc;
   model dors = sex mothc;
run;

Output (F tests; partly unrecoverable): Source, F Value, Pr > F. SEX: values not recoverable. MOTHC: F Value = 3.95, Pr > F < (value not recoverable).
1. Highly significant mother effect.
2. Significant gender effect.
3. Many degrees of freedom (28, for 29 mothers) are spent on estimating the mother effect, which is not even of interest.

Notice that in the previous example, "Family and Gender", gender was used to define a level (cluster), whereas here it is just a variable. In the previous example, was it assumed that individuals of the same gender were "clustered" (correlated)? Now gender is just an input variable.

Later in this semester… Note the different nature of the two factors:
SEX: defines 2 groups of interest.
MOTHER: defines 29 groups that are not of real interest, because a new sample would involve other mothers.
In practice, one therefore considers the factor ‘mother’ as a random factor, while the factor ‘sex’ is a fixed effect. Thus, the model is a mixed model. In general, models can contain multiple fixed and/or random factors. (Figure: fixed-effects model vs. random-effects model, as in the slides of "Effect…".)

Later in this semester… With ‘mother’ as a random factor:

proc mixed data = lizard;
   class mothc;
   model dors = sex / solution;
   random mothc;
run;
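Treating ‘mother’ as random changes how the individual mother effects are estimated. Instead of estimating one free parameter per mother, each mother's deviation is predicted and shrunk toward zero by the factor τ² / (τ² + σ²/n). The minimal Python sketch below illustrates this shrinkage for a balanced random-intercept model. The variances are treated as known, and the data are hypothetical; PROC MIXED instead estimates the variance components (by REML by default).

```python
from statistics import mean

def predict_random_effects(groups, tau2, sigma2):
    """Shrunken predictions of the group (e.g., mother) effects in a balanced
    random-intercept model with known between- (tau2) and within- (sigma2) variance."""
    n = len(groups[0])
    grand = mean(y for g in groups for y in g)
    shrink = tau2 / (tau2 + sigma2 / n)  # between 0 and 1
    return [shrink * (mean(g) - grand) for g in groups]

groups = [[10, 12], [20, 22]]  # hypothetical: 2 mothers, 2 offspring each
raw = [mean(g) - mean(y for g in groups for y in g) for g in groups]
pred = predict_random_effects(groups, tau2=4.0, sigma2=2.0)
print(raw, pred)
```

The raw deviations of ±5 are pulled in to ±4. Shrinkage is stronger when groups are small or when the within-group variance is large relative to τ².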

More Terminology:
Fixed coefficient: a regression coefficient that does not vary across individuals.
Random coefficient: a regression coefficient that does vary across individuals.

Is a variable a random or a fixed effect? (LaMotte 1983, pp. 138–139)
If the treatment levels used are the only ones about which inferences are sought => fixed effect.
If inferences are sought about a broader collection of treatment effects than those used in the experiment, or if the treatment levels are not selected purposefully => random effect.

More Terminology:
Balanced design: equal number of observations per unit.
Unbalanced design: unequal number of observations per unit.
Unconditional model: the simplest level-2 model, with no predictors of the level-1 parameters (e.g., intercept and slope).
Conditional model: the level-2 model contains predictors of the level-1 parameters.

Weighted Data. Problem: minority voters make up a different share of the voting population than of the people who have a phone (figure: share of minority and white voters in each group). Solution: give more "weight" to the minority people with a telephone.

Weighted Data: Weighting is not limited to 2 categories (figure: voting population vs. people who have a phone, split into Minority/Dem., Minority/Rep., White/Dem., and White/Rep.). How many categories? As many as are significant.

Proportion: Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with a phone. Needless to say, in reality this is a much more complex issue. A sampling weight for a given data point is the number of units in the target population that the sample point represents.
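In this example, each group's weight is its share of the voting population divided by its share among the people reached by phone. This is a simple cell-weighting (post-stratification) idea. The Python sketch below uses exact fractions and a hypothetical helper name. It reproduces the weights used on the next slides: 2 for minority voters and 0.8 for white voters.

```python
from fractions import Fraction as F

def cell_weights(pop_share, sample_share):
    """Weight per group = population share / share among the people sampled."""
    return {g: pop_share[g] / sample_share[g] for g in pop_share}

weights = cell_weights(
    pop_share={"minority": F(1, 3), "white": F(2, 3)},
    sample_share={"minority": F(1, 6), "white": F(5, 6)},
)
print({g: str(w) for g, w in weights.items()})  # {'minority': '2', 'white': '4/5'}
```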

Which weight do we need to use? Oversimplified example (don't take it seriously). (Figure: share of minority and white voters among people who have a phone, in the voting population in 2008, and in the voting population in 2010.)

Proportion: Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with a phone.
1. Proportions of minority and white people who answer the phone survey: 1/6 and 5/6 (e.g., 100 and 500 of 600 respondents).
2. 75% of minority respondents will vote for candidate X.
3. 50% of white respondents will vote for candidate X.
4. Non-weighted conclusion: 325/600 = 54.17% of the voters will vote for candidate X.
5. Weighted conclusion:
   minority: 75% of minority people with a phone => 0.75 × 1/6 = 12.5% of people with a phone; × weight 2 = 25% of the voting population
   white: 50% of white people with a phone => 0.5 × 5/6 = 41.67% of people with a phone; × weight 0.8 = 33.33% of the voting population
   total: 25% + 33.33% = 58.33%
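The same calculation can be expressed as a weighted proportion, Σ w·(supporters) / Σ w·(respondents). The Python sketch below assumes that the 600 respondents split 100 minority / 500 white, in line with the 1/6 vs. 5/6 shares, which reproduces the 325 supporters. The function name is hypothetical.

```python
def weighted_share(groups):
    """groups: list of (respondents, supporters, weight), one tuple per category."""
    total = sum(n * w for n, _, w in groups)
    yes = sum(s * w for _, s, w in groups)
    return yes / total

# 100 minority respondents (75 support X) and 500 white respondents (250 support X).
unweighted = weighted_share([(100, 75, 1), (500, 250, 1)])
weighted = weighted_share([(100, 75, 2), (500, 250, 0.8)])
print(f"{unweighted:.2%} {weighted:.2%}")  # 54.17% 58.33%
```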

SAS Weighted Mean:

proc means data=sashelp.class;
   var height;
run;

proc means data=sashelp.class;
   weight weight;
   var height;
run;

The second step weights each student's height by the data set's Weight variable.

Weighted PROC MIXED:

proc mixed data=sashelp.class covtest;
   class Sex;
   model height=Sex Age / solution;
run;

proc mixed data=sashelp.class covtest;
   class Sex;
   model height=Sex Age / solution;
   weight weight;
run;

Notice the (fairly small) difference in, say, the coefficients of the model (Solution for Fixed Effects / Estimates).

Farms Example: The sample is stratified by regions within Iowa and Nebraska. Regress on farm area, with a separate intercept and slope for each state.
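A separate intercept and slope per state amounts to fitting one simple regression within each state. The point estimates match those of a single model with state, area, and state × area terms, although that single model pools the residual variance. The minimal Python sketch below ignores the stratification and any sampling weights, and the farm data and response y are hypothetical.

```python
from statistics import mean

def simple_ols(points):
    """Return (intercept, slope) of the least-squares line through (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Hypothetical (farm area, y) pairs for each state.
farms = {
    "Iowa": [(100, 60), (200, 110), (300, 160)],
    "Nebraska": [(100, 40), (300, 80), (500, 120)],
}
fits = {state: simple_ols(pts) for state, pts in farms.items()}
for state, (intercept, slope) in fits.items():
    print(state, round(intercept, 3), round(slope, 3))
```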

References
LaMotte, L. R. (1983). Fixed-, random-, and mixed-effects models. In S. Kotz, N. L. Johnson, and C. B. Read (Eds.), Encyclopedia of Statistical Sciences.