Multi-level Analysis Recognizing the Problem Maureen Smith, MD PhD Depts. of Population Health Sciences and Family Medicine University of Wisconsin-Madison.


Target Audience
Those of you likely to sit down in front of a computer and try to do this!
Make sure you have pencil and paper.

Goals Today
Understand
–How data and multi-level modeling relate
–How ubiquitous the underlying concepts are
–What the typical output means
–How much trouble you can get into
We will not
–Spend a lot of time on statistical tests
–Spend a lot of time on software

A day in the life of a researcher
We have data
–ID (observation #)
–X (variable 1)
–Y (variable 2)
We want to use the value of X to explain the value of Y
[Table: ID, X, Y]

Welcome to the fantasy world of linear regression
A simple model: $y_i = \text{intercept} + \text{slope} \cdot x_i + \text{error}_i$
$i$ indexes observations ($1 \ldots N$)
Assumptions
–Linearity
–Independence
–Normality
–Constant variance

Reality check
How often are observations truly independent of one another?
–Each dot indicates the geographic location of a teenager
–Orange or green indicates hair color
Do these teenagers look independent?

Clustering (Artificial or natural)

1) Clustering introduced in sampling
Multistage sampling
–Circles represent city blocks
–Blocks are randomly sampled (not all blocks are selected)
–All persons in a sampled block are surveyed to determine attitudes
Persons in one block are more like their neighbors than like persons who live in another block
Nesting or clustering of data
–Persons within blocks
[Figure: Blocks 1 to 4]

Effect of sample design on errors
Errors in linear regression
–Assume independence
–Each person => info
–Each person worth “1”
If clustering occurs
–Observations are not independent
–Each person => less info
–Each person worth < “1”
[Figure: Blocks 1 to 4]

Simple linear regression won’t work!
Clustering violates the assumption of independence
If you don’t account for it
–Standard errors are too small
–Coefficients look more significant than they are
–“You think there is more information in the data than actually exists”

How much information is lost? “Design Effect”
If designing a study using multistage sampling, you need to increase the sample size to account for the loss of information
Design effect
–Each observation is “worth less”
–Need to estimate your “effective” sample size
–Used for sample size calculations in multi-stage sampling
$N_{\text{effective}} = N / \text{Design effect}$

Questions – Pair up!
Multi-stage sample design
–City blocks N = 3
–Persons N = 26
Design effect = 2
1. What is the effective sample size?
2. What sample size would you use in your power calculations?
[Figure: Blocks 1 to 4]
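The design-effect arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula N effective = N / design effect using the numbers on the slide; the function names and the target of 100 independent observations in the second call are assumptions made for the example, not part of the talk.

```python
def effective_sample_size(n_observations, design_effect):
    """Effective sample size: N divided by the design effect."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n_observations / design_effect


def required_sample_size(n_needed_if_independent, design_effect):
    """Inflate an independent-sample requirement to offset clustering."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n_needed_if_independent * design_effect


# Slide numbers: 26 persons, design effect of 2.
print(effective_sample_size(26, 2))    # 13.0

# Hypothetical planning case: a power calculation that calls for
# 100 independent observations (an assumed number, not from the slides).
print(required_sample_size(100, 2))    # 200
```

The second function runs the same relation in reverse, which is how the design effect is typically used when planning a multi-stage sample.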

2) Clustering introduced naturally
Analyze costs of care for hospitalized patients
Patients in one hospital are more like each other than like patients in another hospital
Nesting or clustering of data
–Patients within hospitals
[Figure: Hospitals 1 to 4]

Effect of natural clusters on errors
Same effect on errors
–Observations are not independent
–Each person => less info
–Each person worth < “1”
Simple linear regression won’t work!
[Figure: Hospitals 1 to 4]

Accounting for Clustering Do we care?

What do we do?
First question: do we care?
–Is clustering a nuisance? OR
–Is clustering an interesting phenomenon?
The answer leads to different analytic strategies

If clustering is a nuisance
Example: multi-stage sampling
–Don’t care how people vary within city blocks versus between city blocks
–Clustering is artificially imposed by the sampling design
–Not interested in measuring it
–Just want to correct for it
Use analytic strategies that correct for clustering

How to correct errors for clustering
Robust estimates of variance
–Stata “, robust cluster (____)”
–SAS empirical estimates of variance
Programs that account for complex survey design (weights, strata, clusters)
–Stata “svy” commands
–SAS “survey___” commands
Other strategies

If clustering is interesting
Example: examine costs for hospitalized patients
Split out the variation in costs
–How much variation is due to differences in patients?
–How much variation is due to differences in hospitals?
Examine factors that explain variation in costs
–Characteristics of patients
–Characteristics of hospitals
Analytic strategy = Multi-level modeling!

Questions
1. Identify 3 patient characteristics that might explain variation in costs
2. Identify 3 hospital characteristics that might explain variation in costs
3. Do you think more of the variation in costs is explained by the patient or the hospital?

Representing Clustering in a Model (Multi-level models) (Hierarchical linear models) (Random effects models)

The concept of “levels”
Our example has 2 levels
–Micro = patients (N=26); micro-level = “units”
–Macro = hospitals (N=3); macro-level = “groups”
Characteristics at each level
–Patient characteristics
–Hospital characteristics
[Figure: Hospitals 1 to 4]

Data Structure - Patient
[Table: Patient ID, Hospital ID, Age (X), Cost (Y)]
Patient-level data (= “unit-level data”)
Y represents a patient characteristic
–Cost (thousands of dollars)
X represents a patient characteristic
–Age
–Note: understand the possible mechanism at each step
–“Older patients are sicker and tend to cost more”

Simple Linear Regression
$y_i = a + b x_i + e_i$
$i$ indexes patients ($i = 1$ to $N$)
Relates x to y
Both variables are patient characteristics
Remember the assumptions

Questions
$\text{cost}_i = a + b(\text{age}_i) + e_i$
1. Is there a problem with this model when applied to these data?
2. If so, what?
[Table: Patient ID, Hospital ID, Age (X), Cost (Y)]

The Problem
The model does not account for the clustering of patients within hospitals
–Data have a structure that is not represented
–$e_i$: the assumption of independence is not met
[Table: Patient ID, Hospital ID, Age (X), Cost (Y)]

Do we care?
If clustering is a nuisance => Stata robust option
If clustering is interesting => Multilevel model

Data Structure - Hospital
[Table: Hospital ID, Beds (W)]
Hospital-level data (= “group-level data”)
W represents a hospital characteristic
–Number of beds in the hospital
Possible mechanism = “Bigger hospitals do things more expensively”
–More technology
–More high-cost specialists

Combined Data Structure
[Table: Hospital ID, Patient ID, Age (X), Cost (Y)] Patient-level data
+
[Table: Hospital ID, Beds (W)] Hospital-level data
= ?

Combined Data Structure
[Table: Patient ID, Hospital ID, Age (X), Cost (Y), Beds (W)]
Patient- and hospital-level data
Age (X) and Cost (Y)
–Variation between patients
Beds (W)
–Variation only between hospitals
–No variation within hospitals
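A small Python sketch may make the combined structure concrete. It attaches the hospital-level variable to each patient row by matching on hospital ID. All values below (ages, costs, bed counts) are invented for illustration, and the field names are assumptions chosen to mirror the slide's column headings.

```python
# Patient-level rows (hypothetical values; cost in thousands of dollars).
patients = [
    {"patient_id": 1, "hospital_id": 1, "age": 71, "cost": 12.0},
    {"patient_id": 2, "hospital_id": 1, "age": 64, "cost": 9.5},
    {"patient_id": 3, "hospital_id": 2, "age": 80, "cost": 20.1},
    {"patient_id": 4, "hospital_id": 2, "age": 58, "cost": 14.3},
]

# Hospital-level rows keyed by hospital ID (hypothetical bed counts).
hospitals = {1: {"beds": 250}, 2: {"beds": 600}}


def combine(patient_rows, hospital_rows):
    """Copy each patient row and attach that patient's hospital-level W."""
    combined = []
    for patient in patient_rows:
        row = dict(patient)  # copy so the input rows are left untouched
        row["beds"] = hospital_rows[patient["hospital_id"]]["beds"]
        combined.append(row)
    return combined


combined = combine(patients, hospitals)
for row in combined:
    print(row)
```

In the combined rows, age and cost differ from patient to patient, while beds repeats the same value for every patient in a hospital, which is the "no variation within hospitals" point on the slide.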

WARNING – Equations coming up! Remember - In multi-level modeling … SUBSCRIPTS ARE YOUR FRIENDS!

Simple Linear Regression (one approach to modeling this data structure)
$y_{ij} = a + b x_{ij} + d w_j + e_{ij}$
$j$ indexes hospitals ($j = 1$ to $N$)
$i$ indexes patients within hospitals ($i = 1$ to $n_j$)
$\text{cost}_{ij} = a + b(\text{age}_{ij}) + d(\text{beds}_j) + e_{ij}$
Frequently used

Questions
$\text{cost}_{ij} = a + b(\text{age}_{ij}) + d(\text{beds}_j) + e_{ij}$
1. Is there a problem with this model when applied to these data?
2. If so, what?
[Table: Patient ID, Hospital ID, Age (X), Cost (Y), Beds (W)]

The Problem, Part 2
You must assume that all of the data structure is represented by the explanatory variables
It is unlikely that this will account for the clustering of patients within hospitals
–Assumes that all clustering within hospitals is explained by the number of beds in the hospital (W)
–If “beds” does not explain all clustering, then the assumption of independence is not met for $e_{ij}$

How do we represent the clustering?
Let the regression coefficients vary from group to group
$y_{ij} = a_j + b_j x_{ij} + d w_j + e_{ij}$
Groups $j$ can have higher or lower values of $a_j$ and $b_j$
Why not create $d_j$?

Starting simple – random intercept
Model the clustering between groups
–Let only the intercept ($a_j$) vary from group to group
–Take out all group-level variables (W)
$y_{ij} = a_j + b x_{ij} + e_{ij}$
Groups $j$ have higher or lower values of $a_j$ only
Assumes some groups tend to have, on average, higher or lower values of Y

Question
$y_{ij} = a_j + b x_{ij} + e_{ij}$
1. Why take the group-level variable (W) out of this model?
2. Must W be taken out of the model?

How do we want to model variation between groups?
W is a “partial” way to model variation between groups
–If included, it will pick up part of the variation between groups
–“Part of the variation in costs between hospitals will be explained by the number of beds in the hospital”
Goal of a random intercept model
–Model the actual structure of the data
–Let groups vary, on average, in Y
–“Let the hospitals vary, on average, in cost”

How do we actually do it?
$y_{ij} = a_j + b x_{ij} + e_{ij}$
Split $a_j$ into $(a_0 + u_j)$
$y_{ij} = a_0 + u_j + b x_{ij} + e_{ij}$
$a_0$ = average intercept (constant)
$u_j$ = deviation from the average intercept for group $j$
= conditional on X, individuals in group $j$ have Y values that are $u_j$ higher than the overall average
“Conditional on patient age, patients in Hospital j have costs that are $u_j$ higher than the average costs for all patients”

Representation as Equations
Single Equation vs. Multiple Equation Representation
(1) $y_{ij} = a_0 + b x_{ij} + u_j + e_{ij}$
OR
(2) $y_{ij} = a_j + b x_{ij} + e_{ij}$
$a_j = a_0 + u_j$
$u_j$ = deviation from the overall average for group $j$

What do we do with $u_j$? Part 1 – Fixed effects
Are groups $j$ regarded as unique?
–Do you want to draw conclusions about each group?
TREAT AS “FIXED EFFECTS”
Create J – 1 indicator variables (0/1), where J is the number of groups
Leads to J – 1 regression parameters

Questions
$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + u_j + e_{ij}$
1. For our data, what does this equation look like if $u_j$ is modeled as a fixed effect?
2. Are all indicator variables in a model also fixed effects?
[Table: Patient ID, Hospital ID, Age (X), Cost (Y)]

Modeling $u_j$ as a fixed effect ($u_j$ = “differences between hospitals”)
$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + c(\text{hosp2}_{ij}) + e_{ij}$
hosp2 = 0/1
–1 = patient $i$ in hospital 2, 0 = patient $i$ in hospital 1
Do we need index $j$? No. Why?
$\text{cost}_i = a_0 + b(\text{age}_i) + c(\text{hosp2}_i) + e_i$
What assumptions does this model make?
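The indicator-variable coding can be sketched in Python. The helper below is hypothetical (it is not from any package): it builds one 0/1 column for every hospital except a reference hospital, so that a hospital ID column like the one on the slide becomes columns such as hosp2. The hospital IDs in the usage lines are made up.

```python
def hospital_indicators(hospital_ids, reference=None):
    """Return (names, rows) of 0/1 indicators, omitting the reference hospital."""
    levels = sorted(set(hospital_ids))
    if reference is None:
        reference = levels[0]  # default: lowest hospital ID is the reference
    if reference not in levels:
        raise ValueError("reference hospital not present in the data")
    others = [h for h in levels if h != reference]
    names = [f"hosp{h}" for h in others]
    # One row per patient; a row of all zeros marks the reference hospital.
    rows = [[1 if h == other else 0 for other in others] for h in hospital_ids]
    return names, rows


names, rows = hospital_indicators([1, 1, 2, 2, 3])
print(names)  # ['hosp2', 'hosp3']
print(rows)   # [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1]]
```

With three hospitals the sketch produces two indicator columns, one fewer than the number of groups, and each column would receive its own regression coefficient in a fixed-effects model.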

What do we do with $u_j$? Part 2 – Random effects
Three issues
–Are groups regarded as a sample from a population?
–Do you want to test the effect of group-level variables (remember W = number of beds)?
–Do you have small group sizes (2-50 or 100)?
TREAT AS “RANDOM EFFECTS”
Model $u_j$ explicitly
Additional assumption that $u_j$ is i.i.d.
–Groups (hospitals) are considered exchangeable
Can include group-level explanatory variables (W)

Questions
$y_{ij} = a_0 + b(x_{ij}) + u_j + e_{ij}$
1. For our data, what does this equation look like if $u_j$ is modeled as a random effect?
2. How would we include our hospital-level explanatory variable?
[Table: Patient ID, Hospital ID, Age (X), Cost (Y), Beds (W)]

Modeling $u_j$ as a random effect ($u_j$ = “differences between hospitals”)
$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + u_j + e_{ij}$
$u_j$ = deviation from the average cost for hospital $j$
= estimated using HLM, SAS, Stata (get a number!)
$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + d(\text{beds}_j) + u_j + e_{ij}$
Uses the number of beds in the hospital to explain some of the variation in $u_j$
Question: what happens to $u_j$ if the number of beds explains all of the differences between hospitals?

Question
$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + c(\text{hosp2}_{ij}) + d(\text{beds}_j) + u_j + e_{ij}$
Is this equation an example of random or fixed effects?
What are the challenges in modeling this equation?

It is GARBAGE!!!!!
Totally meaningless
–Models $u_j$ as a fixed effect as well as a random effect with a hospital-level covariate
Probably won’t run (if it does, don’t believe anything you see)
Can’t model $u_j$ (variation in the average cost between hospitals) as both a random effect and a fixed effect at the same time!

What we have done so far
We discussed
–Clustering (artificial and natural)
–Accounting for clustering: nuisance = robust estimates of variance; interesting = multilevel models
–Representing clustering in a simple model: fixed effects; random effects with group-level explanatory variables

What we will do next
Sitting down at the computer
–Modeling random effects using SAS proc mixed
–Random coefficients other than the intercept (briefly)
Repeated measures
Non-continuous outcomes
Computer programs

SAS Proc Mixed Sitting in front of the computer (Major reality check!)

Random Effects Models for Continuous Outcomes
SAS proc mixed
–Powerful and dangerous
–Poor documentation (use Singer 1998)
–Defaults may not be appropriate
–Single level representation of equations
Stata xtreg
–Good documentation
–Defaults usually ok
–Single level representation of equations

Example of a Hospital Epidemic
Revised from Singer 1998
–[…],185 patients admitted to 160 hospitals
–[…] patients per hospital
–Hospital with N=67 in Washington DC
Patient-level outcome is the severity of new disease (MATHACH)
Question: what does an equation look like if hospital is modeled as a random effect?
–Use single and multiple equation representation
[Table: Patient ID, Hospital, MATHACH (Y)]

Modeling hospital as a random effect ($u_j$ = “differences between hospitals”)
$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$
and
$\text{MATHACH}_{ij} = a_j + e_{ij}$
$a_j = a_0 + u_j$
$u_j$ = deviation from the average MATHACH severity score for hospital $j$
Write code to model using SAS proc mixed (single equation)

Code for SAS proc mixed (Described using Singer’s terminology)
    proc mixed noclprint covtest;
      class hospital;
      model MATHACH = / solution;
      random intercept / subject=hospital;
    run;
The model statement indicates fixed effects (in this case, only one fixed effect, the intercept $a_0$, which is implied)
The random statement indicates random effects (intercept and error term) and the specification of the level 2 units (hospitals). The error term $e_{ij}$ is implied.

STOP! Something seems strange!
How can an intercept be fixed and random?
Proc mixed terminology is derived from the multiple equation representation
$\text{MATHACH}_{ij} = a_j + e_{ij}$
$a_j = a_0 + u_j$
Fixed and random in proc mixed refer to the modeling of $a_j$, not $u_j$
BUT the decision to model hospitals as fixed or random applies to $u_j$, not $a_j$ (earlier discussion)

Important Lesson (This is a source of major confusion)
Use of the terminology (fixed vs. random) differs widely
Even when the term in question (e.g., $u_j$) is agreed upon, definitions not only differ but are often incompatible (Gelman 2004 Ann Stat)
Solution
–Don’t just throw terminology around
–Draw the dataset, write down the equation(s) and circle the terms that you want to model
–Make sure you understand exactly where those terms are on the computer output

Code for SAS proc mixed
    proc mixed noclprint covtest;
      class hospital;
      model MATHACH = / solution;
      random intercept / subject=hospital;
    run;
$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$
Fixed effects: $a_0$ is implied (representing the fixed part of the intercept).
The random statement adds the random effect for hospital, $u_j$ (representing the random part of the intercept).
The random effect for patient, $e_{ij}$, is implied (represents the error term).

SAS Output
$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$

Yikes! More info needed

Question: What IS a random effect?
$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$
How do we describe it?
What do we want to know?
How do you model those subscripts?

A random effect is a random variable
Random variables are described by distributions
Start with the easy one (basic error term)
–$e_{ij} \sim N(0, \sigma^2)$
–The estimated parameter is $\sigma^2$
Same approach to the new term
–$u_j \sim N(0, \tau)$
–The estimated parameter is $\tau$
Assume that $e_{ij}$ and $u_j$ are independent
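One consequence of these assumptions is worth writing out, as a sketch rather than something stated on the slide. Because $u_j$ and $e_{ij}$ are independent, their variances add, and two different patients in the same hospital share only $u_j$. The notation $i \neq i'$ for two different patients in hospital $j$ is introduced here for illustration, and the error terms of different patients are assumed independent of each other.

```latex
\begin{aligned}
\operatorname{Var}(u_j + e_{ij}) &= \tau + \sigma^2 \\
\operatorname{Cov}(u_j + e_{ij},\; u_j + e_{i'j}) &= \tau \qquad (i \neq i') \\
\operatorname{Corr}(u_j + e_{ij},\; u_j + e_{i'j}) &= \frac{\tau}{\tau + \sigma^2}
\end{aligned}
```

The last ratio is the share of the total variance that lies between hospitals rather than within them.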

Interpreting this equation
$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$
One estimated fixed effect
–$a_0$ describes the average MATHACH score for the hospitals
Two random effects (variances are estimated)
–$\tau$ describes variability in hospital means (variability in average MATHACH between hospitals)
–$\sigma^2$ describes variability in MATHACH within hospitals
What is this equation called?
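To see what the two variance terms do, it can help to generate data from this equation. The Python sketch below draws one $u_j$ per hospital and one $e_{ij}$ per patient using only the standard library. The function name, the parameter values in the usage lines, and the seed are all assumptions for illustration; nothing here reproduces the actual data set.

```python
import random
import statistics


def simulate_random_intercept(n_groups, n_per_group, a0, tau, sigma2, seed=0):
    """Draw y_ij = a0 + u_j + e_ij with u_j ~ N(0, tau), e_ij ~ N(0, sigma2)."""
    rng = random.Random(seed)
    data = []
    for j in range(n_groups):
        u_j = rng.gauss(0.0, tau ** 0.5)  # one draw shared by the whole group
        for _ in range(n_per_group):
            e_ij = rng.gauss(0.0, sigma2 ** 0.5)  # fresh draw for each patient
            data.append((j, a0 + u_j + e_ij))
    return data


def group_means(data):
    """Average outcome within each group."""
    by_group = {}
    for j, y in data:
        by_group.setdefault(j, []).append(y)
    return {j: statistics.mean(ys) for j, ys in by_group.items()}


# Hypothetical parameter values, chosen only for illustration.
sample = simulate_random_intercept(n_groups=5, n_per_group=20,
                                   a0=10.0, tau=4.0, sigma2=25.0)
for j, mean_y in group_means(sample).items():
    print(j, round(mean_y, 2))
```

Setting tau to 0 in this sketch removes the hospital-level shifts, and setting sigma2 to 0 makes every patient within a hospital identical, which are the two extremes the variance components describe.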

Code for SAS proc mixed
    proc mixed noclprint covtest;
      class hospital;
      model MATHACH = / solution;
      random intercept / subject=hospital;
    run;
covtest requests hypothesis tests for the variance and covariance components ($\tau$ and $\sigma^2$)
class identifies categorical variables
/solution prints estimates and hypothesis tests for fixed effects ($a_0$ in this case)

SAS Output (three output slides)
$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$

ICC - A cool thing
Use the output to calculate the ICC (intraclass correlation coefficient)
Figure out the portion of the total variance that occurs between (as opposed to within) hospitals
–Bigger means more clustering
ICC $= \tau / (\tau + \sigma^2) = 8.6 / (8.6 + \sigma^2) = 0.18$
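The ICC formula is easy to check numerically. In the Python sketch below, the first function is the slide's formula; the second back-solves for the within-hospital variance implied by the two rounded figures on the slide (8.6 and 0.18). That back-solved value is an approximation derived from rounded numbers, not the estimate from the SAS output, and both function names are assumptions.

```python
def icc(tau, sigma2):
    """Share of total variance lying between groups: tau / (tau + sigma2)."""
    return tau / (tau + sigma2)


def implied_sigma2(tau, icc_value):
    """Within-group variance that reproduces a given ICC for a given tau."""
    return tau * (1 - icc_value) / icc_value


# Rounded figures from the slide: tau = 8.6, ICC = 0.18.
sigma2_approx = implied_sigma2(8.6, 0.18)
print(round(sigma2_approx, 1))            # roughly 39, from rounded inputs
print(round(icc(8.6, sigma2_approx), 2))  # 0.18
```

Because the ICC depends only on the ratio of the two variances, a larger between-hospital variance or a smaller within-hospital variance both push it up.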

Adding predictor variables
Patient-level covariate is SES
Hospital-level covariate (MEANSES) is an aggregate of patient SES
MEANSES and SES are centered at the grand mean (mean of 0)
[Table: Patient ID, Hospital, SES (X), MATHACH (Y), MEANSES (W)]

Question
What does the equation look like if we add a hospital-level characteristic (MEANSES)?
What do we expect might happen to the estimate of $u_j$?

Solution
$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$
Use the average SES in hospitals to explain some of the variation in MATHACH between hospitals
We would expect that the variability of $u_j$ (represented by $\tau$) might decrease

Code for SAS proc mixed
    proc mixed noclprint covtest;
      class hospital;
      model MATHACH = MEANSES / solution ddfm=bw;
      random intercept / subject=hospital;
    run;
$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$
Added fixed effect for MEANSES (recall that the fixed effect for $a_0$ is implied)
ddfm=bw uses the between/within method for computing denominator degrees of freedom (read about it)
Keep the random effect for hospital, $u_j$ (representing the random part of the intercept).
The random effect for patient, $e_{ij}$, is implied (represents the error term).

SAS Output
Estimate $d$ for MEANSES
Estimate of $\tau$ decreases from 8.6 to 2.6

Interpreting the Numbers
$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$
$a_0$ = […] = ?
$d$ = 5.86 = ?
$u_j$ = ?
$e_{ij}$ = ?
Remember $\tau$ = 2.64
Remember $\sigma^2$ = […]
Question: what is the real-world meaning of each term? What does each term say about MATHACH and hospitals?

Interpreting the Output: Conditional Fixed Effects
$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$
$a_0$ = intercept; estimates average MATHACH among hospitals when all other predictors are 0
–MEANSES is centered at the grand mean (has a mean of 0)
–Intercept is average MATHACH in a hospital of average MEANSES
$d$ = estimated coefficient on MEANSES
–A 1-unit increase in average SES in a hospital is associated with a 5.86-unit increase in MATHACH

Interpreting the Output: Conditional Random Effects
$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$
$\tau$ = describes variability in average MATHACH between hospitals after accounting for MEANSES
–Decreased from 8.6 to 2.6
–MEANSES explains a large portion of between-hospital variation in MATHACH
$\sigma^2$ = residual variability in MATHACH among patients within hospitals
–Does not change significantly

More Cool Things
$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$
Parameter     Var. Comp.    Add MEANSES
$a_0$         […]           […]
$d$           not est.      5.86
$\tau$        8.61          2.64
$\sigma^2$    […]           […]
Write down everything on this slide

Questions
What percent of the explainable variation in mean hospital MATHACH scores is explained by MEANSES?
What is the (residual) ICC among hospitals after accounting for MEANSES?

Solutions
69% of the explainable variation in average MATHACH between hospitals is explained by MEANSES
–$(8.61 - 2.64) / 8.61 = 0.69$
Residual ICC $= 2.64 / (2.64 + \sigma^2) = 0.06$
–Correlation among patients within hospitals that have the same average SES
–Can you drop the random effect for hospitals? Has a sufficient amount of $u_j$ been picked up by MEANSES?
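The 69% figure can be reproduced with a short Python sketch, using the two between-hospital variance estimates quoted on the slides (8.61 before adding MEANSES, 2.64 after). The function name is an assumption made for the example.

```python
def proportion_explained(tau_unconditional, tau_conditional):
    """Proportional drop in between-group variance after adding a predictor."""
    if tau_unconditional <= 0:
        raise ValueError("unconditional variance must be positive")
    return (tau_unconditional - tau_conditional) / tau_unconditional


# Between-hospital variance before and after adding MEANSES (from the slides).
share = proportion_explained(8.61, 2.64)
print(round(share, 2))  # 0.69
```

The calculation compares only the hospital-level variance component, which is why it speaks to variation in hospital averages and not to variation among individual patients.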

Random Coefficients Briefly!

How do we represent the clustering?
Let the regression coefficients vary from group to group
$y_{ij} = a_j + b_j x_{ij} + d w_j + e_{ij}$
Groups $j$ can have higher or lower values of $a_j$ and $b_j$

Representation as Equations
Multiple Equation Representation
$y_{ij} = a_j + b_j x_{ij} + e_{ij}$
$a_j = a_0 + u_j$
$b_j = b_0 + v_j$
$e_{ij} \sim N(0, \sigma^2)$
$u_j \sim N(0, \tau_u)$
$v_j \sim N(0, \tau_v)$
$\operatorname{cov}(u_j, v_j) = \tau_{uv}$
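For comparison with the single-equation form used earlier for the random intercept model, the multiple equations can be collapsed by substitution. This step is a sketch added here; it uses only the symbols already defined on the slide.

```latex
\begin{aligned}
y_{ij} &= (a_0 + u_j) + (b_0 + v_j)\, x_{ij} + e_{ij} \\
       &= \underbrace{a_0 + b_0 x_{ij}}_{\text{fixed part}}
          + \underbrace{u_j + v_j x_{ij} + e_{ij}}_{\text{random part}}
\end{aligned}
```

The term $v_j x_{ij}$ is what makes the random part depend on $x$, which is one reason centering matters so much when interpreting these models.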

Doing Random Coefficients
Can definitely do it
Extremely complex to model
Difficult to interpret (must be very careful about centering)
Maureen’s opinion
–Usually like hitting a gnat with a sledgehammer
–Almost impossible to explain in the real world

Repeated Measures One case where you might want to think about random coefficients

What do the data look like?
Repeated measures of SBP over time are the level-1 units
The time point of each measurement is a level-1 characteristic
Each set of measurements is clustered within level-2 units (patients)
Age represents a level-2 characteristic
[Table: Obs #, SBP (Y), Time (X), Patient, Age (W)]
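The layout described here is often called "long" format: one row per measurement, with the patient-level variable repeated on every row for that patient. The Python sketch below builds such rows from a per-patient list of readings. The SBP readings, ages, field names, and the use of visit order (0, 1, 2, ...) as the time variable are all assumptions for illustration.

```python
# Hypothetical per-patient records: one list of SBP readings per patient.
patient_records = [
    {"patient": 1, "age": 62, "sbp_by_visit": [140, 136, 131]},
    {"patient": 2, "age": 55, "sbp_by_visit": [128, 130]},
]


def to_long_format(records):
    """One row per measurement (level 1), carrying patient-level age (level 2)."""
    rows = []
    obs = 0
    for record in records:
        for time, sbp in enumerate(record["sbp_by_visit"]):
            obs += 1
            rows.append({
                "obs": obs,                   # observation number
                "sbp": sbp,                   # level-1 outcome (Y)
                "time": time,                 # level-1 characteristic (X)
                "patient": record["patient"], # level-2 unit
                "age": record["age"],         # level-2 characteristic (W)
            })
    return rows


for row in to_long_format(patient_records):
    print(row)
```

Patients need not have the same number of measurements in this layout; each simply contributes as many rows as readings.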

Question
What is the equation to model SBP, accounting for clustering within patients?
–As a function of time?
–Adding in patient age?
[Table: Obs #, SBP (Y), Time (X), Patient, Age (W)]

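Solutions
$\text{SBP}_{ij} = a_0 + b(\text{TIME}_{ij}) + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$
$\text{SBP}_{ij} = a_0 + b(\text{TIME}_{ij}) + d(\text{AGE}_j) + u_j + e_{ij}$, with $u_j \sim N(0, \tau)$ and $e_{ij} \sim N(0, \sigma^2)$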

Code for SAS proc mixed
    proc mixed noclprint covtest;
      class patient;
      model SBP = TIME / solution ddfm=bw;
      random intercept / subject=patient type=un;
    run;
$\text{SBP}_{ij} = a_0 + b(\text{TIME}_{ij}) + u_j + e_{ij}$
type=un specifies that you want the structure of the variance-covariance matrix of the random effects to be unstructured (previously the default structure was used; with a random intercept alone, that implies compound symmetry among the measurements within a subject).

Non-continuous outcomes Slightly more complex extensions

Basics are similar
Generalized linear mixed models
Move to Stata
–SAS has multiple problems
–xt commands are useful
–GLLAMM is great and flexible
Skrondal & Rabe-Hesketh have a good introductory paper

Choosing a Computer Program
Decision points
–Distribution of dependent variable
–Ease of use and compatibility
–Single vs. multiple equation representation
Major choices
–SAS vs. Stata
–Specialized programs (HLM, MLwiN)

Thank you! (All those identifying symptoms of MATHACH, please proceed to the closest Washington DC hospital)