Sharon Wolf NYU Abu Dhabi Additional Insights Summer Training Institute June 15, 2015



• Conceptual overview
• Analytic considerations
• Power/Minimum detectable differential effects
• Cross-level interactions in MLM
• Centering variables
• Recommendations and tips

When is the story in the subgroups?

Guide questions about how to target resources most efficiently:
• How widespread are the effects of an intervention?
• Is the intervention effective for a specific subgroup?
• Is the intervention effective for any subgroup?
Exploratory* versus confirmatory subgroup findings

• Two examples from welfare reform in the United States and their different policy implications.
• Michalopoulos & Schwartz (2000) assessed two types of subgroups:
  1. A range of person-level subgroups (e.g., education level, prior employment experience, and risk of depression).
  2. The nature of the program and program office practices.

Characteristics believed to be related to the need for a particular intervention or the likelihood of benefiting from it:
• Demographic characteristics, e.g., gender, age, education level
• Risk factors, e.g., past smoking, drug abuse, severity of disease, poverty status
• Combinations of characteristics, e.g., gender and age; cumulative levels of risk/risk index

• Exogenous to the intervention: not affected by the intervention or correlated with its receipt (all pre-random-assignment characteristics).
• Endogenous to the intervention: affected by the intervention or correlated with its receipt (e.g., dosage of the intervention). Valid causal inferences are much more difficult.
  • Gambia: does higher “dosage” (i.e., higher attendance) → more learning?
  • Increased attendance could bring less advantaged students into the intervention group, biasing the average treatment effect (ATE) downward.

• Exploratory subgroup analyses
  • Provide a basis for hypothesis generation
  • Essential step in the scientific method
  • Should be considered suggestive
• Confirmatory subgroup analyses
  • Appropriate basis for testing hypotheses
  • Provide strong evidence if findings are (a) consistent with existing findings, (b) large enough in magnitude to be meaningful, and (c) robust.
Bloom & Michalopoulos

• Internal contextual considerations
  • Features of findings internal to a given study
  • E.g., pattern across all outcomes for a particular subgroup in a study
• External contextual considerations
  • Features of findings external to a given study
  • E.g., consistency with prior study findings

1. What is the impact of the program for each subgroup?
2. What are the relative impacts of the program across subgroups?

Minimum detectable differential effects

1. Did the program work for a particular subgroup?
  • Assess impacts separately for this subgroup
  • Assess power to detect impacts for this subgroup
2. Were the effects different for particular subgroups?
  • Assess impacts using a cross-level interaction
  • Assess power to detect a cross-level interaction

• Minimum Detectable Effect Size (MDES): the smallest true effect, in standard deviations of the outcome, that is detectable for a given level of power and statistical significance.
Accepted parameters:
• Power: 80%
• Statistical significance level:

• ρ = intraclass correlation
• δ = MDES
• λ = non-centrality parameter
• J = number of clusters
• n = number of units per cluster

[Equations not preserved in the transcript.] Formulas were given for:
• Main effect
• Main effect with covariate
• Cluster-level moderator
• Individual-level moderator

• The number of clusters (highest-level units) is more important than the size of the cluster (lower-level units) in reducing the MDES.
• A higher intra-cluster correlation (ICC) increases the MDES (i.e., if $\tau_{00}$ is relatively large).
• A larger proportion of variance in the outcome explained by L1 and L2 variables (i.e., $R^2_{|X}$ and $R^2_{|W}$) reduces the MDES.

• Estimating impacts for a subgroup of individuals within clusters (a subexperiment) maintains a significant portion of power because the number of clusters (L2 units) remains the same.
• The only statistical difference between the subexperiment and the full experiment is the number of L1 units per cluster.

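The points above about clusters, cluster size, and subgroup power can be checked numerically. The sketch below assumes a standard two-level cluster-randomized design with treatment assigned at the cluster level, a two-tailed test, and the normal-approximation multiplier (roughly 2.8 for 80% power at a 0.05 significance level). The function name `mdes`, the proportion treated `p`, and every example value are illustrative assumptions, not figures from the slides.

```python
from math import sqrt
from statistics import NormalDist


def mdes(J, n, rho, p=0.5, r2_l1=0.0, r2_l2=0.0, alpha=0.05, power=0.80):
    """Approximate MDES for a two-level cluster-randomized trial.

    J: number of clusters; n: units per cluster; rho: intraclass correlation;
    p: proportion of clusters treated (assumed); r2_l1 / r2_l2: share of
    L1 / L2 outcome variance explained by covariates.
    Normal approximation: the multiplier ignores degrees of freedom.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    between = rho * (1 - r2_l2) / (p * (1 - p) * J)
    within = (1 - rho) * (1 - r2_l1) / (p * (1 - p) * J * n)
    return multiplier * sqrt(between + within)


# Hypothetical design: 40 clusters of 20 units, ICC = 0.10.
full = mdes(J=40, n=20, rho=0.10)
more_clusters = mdes(J=80, n=20, rho=0.10)    # double the clusters
bigger_clusters = mdes(J=40, n=40, rho=0.10)  # double the cluster size
subgroup = mdes(J=40, n=10, rho=0.10)         # half of each cluster, same J

for label, value in [
    ("full sample", full),
    ("twice as many clusters", more_clusters),
    ("twice as large clusters", bigger_clusters),
    ("subgroup (half of each cluster)", subgroup),
]:
    print(f"{label:32s} MDES = {value:.3f}")
```

Under these assumptions, doubling the number of clusters lowers the MDES far more than doubling cluster size, and analyzing half of each cluster raises the MDES only modestly because J is unchanged.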

• Minimum Detectable Effect Size Difference (MDESD): the smallest true difference in program impacts between two subgroups, in standard deviations of the outcome, that is detectable for a given level of power and statistical significance.
Accepted parameters:
• Power: 80%
• Statistical significance level:

[Equations not preserved in the transcript.] Formulas were given for:
• Main effect
• Main effect with covariate
• Cluster-level moderator
• Individual-level moderator

• Within-level variance becomes increasingly important. Implications include:
  • The number of cases per cluster (lower-level units) becomes more important for increasing power.
  • The intra-cluster correlation (ICC) has less influence on power (though it still matters).
  • A larger proportion of variance in the outcome explained by L1 variables (not L2; i.e., $R^2_{|X}$) increases power.
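To illustrate why within-cluster quantities dominate here, the sketch below uses one commonly cited form of the MDESD for a binary individual-level moderator in a two-level cluster-randomized trial. It assumes the moderator's slope does not vary randomly across clusters, so the between-cluster variance cancels out of the subgroup contrast and only the within-cluster share (1 − ρ) remains. This is not necessarily the formula shown on the original slide; the function name `mdesd`, the subgroup proportion `q`, and all numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mdesd(J, n, rho, p=0.5, q=0.5, r2_l1=0.0, alpha=0.05, power=0.80):
    """Approximate MDESD for a binary L1 moderator (nonrandom slope assumed).

    p: proportion of clusters treated; q: proportion of individuals in the
    subgroup (both assumed). Normal-approximation multiplier.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    within = (1 - rho) * (1 - r2_l1)
    return multiplier * sqrt(within / (p * (1 - p) * q * (1 - q) * J * n))


base = mdesd(J=40, n=20, rho=0.10)
double_n = mdesd(J=40, n=40, rho=0.10)  # larger clusters
double_J = mdesd(J=80, n=20, rho=0.10)  # more clusters
with_l1_covariate = mdesd(J=40, n=20, rho=0.10, r2_l1=0.40)

print(f"base: {base:.3f}")
print(f"double n: {double_n:.3f}, double J: {double_J:.3f}")
print(f"with L1 covariate (R2 = 0.40): {with_l1_covariate:.3f}")
```

In this simplified form, adding units per cluster helps exactly as much as adding clusters, and only L1 covariates reduce the MDESD. A model with a randomly varying slope would bring a cluster-level variance term back in.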

Assessing individual-level moderation in cluster-randomized trials using multi-level models

1. Lower-level direct effects. Does an L1 predictor X (e.g., student gender) have a relationship with the L1 outcome variable Y (e.g., student reading)?
2. Cross-level direct effects. Does an L2 predictor (e.g., school treatment status) have a relationship with an L1 outcome variable Y (e.g., student reading)?
3. Cross-level interaction effects. Does the nature or strength of the relationship between an L1 variable (e.g., gender) and the outcome (e.g., reading) change as a function of a higher-level variable (e.g., school treatment status)?


• Level 2 (e.g., school treatment status) and Level 1 (e.g., student gender) variables interact to produce an effect on the outcome (e.g., student reading scores).
• In terms of your impact estimation equation:
  (a) Add a Level 1 predictor (the moderator).
  (b) Expand the Level 2 model to include a fixed slope ($\beta_1$).
  (c) Add a Level 2 predictor (treatment status) to the slope.

[Annotated equation; labels: added L1 predictor (moderator); expanded L2 slope; L2 predictor added to the slope]

Coefficient for the cross-level interaction: $\gamma_{11}$
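Written out, steps (a) to (c) correspond to a two-level model of the following form. This is a sketch in standard multilevel notation, chosen to agree with the γ coefficients used on the later slides. The residual names $r_{ij}$ and $u_{0j}$ are assumptions, and the Level 1 slope $\beta_{1j}$ is treated as fixed (no random slope term), as step (b) indicates.

```latex
\begin{align*}
\text{Level 1:}\quad  & Y_{ij} = \beta_{0j} + \beta_{1j} M_{ij} + r_{ij} \\
\text{Level 2:}\quad  & \beta_{0j} = \gamma_{00} + \gamma_{01} T_j + u_{0j} \\
                      & \beta_{1j} = \gamma_{10} + \gamma_{11} T_j \\
\text{Combined:}\quad & Y_{ij} = \gamma_{00} + \gamma_{01} T_j + \gamma_{10} M_{ij}
                        + \gamma_{11} M_{ij} T_j + u_{0j} + r_{ij}
\end{align*}
```

Substituting the Level 2 equations into Level 1 produces the product term $M_{ij} T_j$, whose coefficient $\gamma_{11}$ is the cross-level interaction.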


• “Simple regression equation”: calculate the expected values of $Y_{ij}$ under different conditions of $T_j$ and $M_{ij}$.
• For continuous moderators, plot at one standard deviation below the mean, at the mean, and at one standard deviation above the mean of M.
• It may also be useful to choose additional values that are informative in specific contexts.

$$E(Y_{ij} \mid M_{ij}, T_j) = \gamma_{00} + \gamma_{01} T_j + \gamma_{10} M_{ij} + \gamma_{11} M_{ij} T_j$$
Under control conditions:
$$E(Y_{ij} \mid M_{ij}, T_j = 0) = \gamma_{00} + \gamma_{10} M_{ij}$$
Under treatment conditions:
$$E(Y_{ij} \mid M_{ij}, T_j = 1) = \gamma_{00} + \gamma_{01} + \gamma_{10} M_{ij} + \gamma_{11} M_{ij}$$
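A minimal sketch of this calculation follows, evaluating the expected outcome under control and treatment at one standard deviation below the mean, at the mean, and at one standard deviation above the mean of the moderator. The coefficient values and the moderator's mean and standard deviation are made up for illustration.

```python
def expected_y(m, t, g00, g01, g10, g11):
    """E(Y | M = m, T = t) from the combined cross-level interaction model."""
    return g00 + g01 * t + g10 * m + g11 * m * t


# Hypothetical fixed-effect estimates (not from the slides).
coef = dict(g00=50.0, g01=4.0, g10=2.0, g11=-1.5)
m_mean, m_sd = 0.0, 1.0  # assumes a centered, standardized moderator

effects = {}
for label, m in [
    ("-1 SD", m_mean - m_sd),
    ("mean", m_mean),
    ("+1 SD", m_mean + m_sd),
]:
    control = expected_y(m, 0, **coef)
    treated = expected_y(m, 1, **coef)
    # The treatment effect at a given M equals g01 + g11 * M.
    effects[label] = treated - control
    print(f"M at {label:5s}: control = {control:.1f}, "
          f"treated = {treated:.1f}, effect = {effects[label]:.1f}")
```

The difference between the treated and control lines at each moderator value is the conditional treatment effect, which changes with M only when $\gamma_{11}$ is nonzero.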

If we need it

Implications for interpreting effect estimates and detecting impact variation

• How do you want to interpret the intercept in your model? The coefficients?
• Example: school diversity/cultural awareness program
  • H1: Improved sense of belonging for minority students (L1 moderator).
  • H2: Improved sense of belonging for minority students in less diverse schools (L1 and L2 moderators).
• The distribution of the moderator variable across clusters needs to be considered.

• CGM = centering at the grand mean
  • Deviations are calculated from the sample mean for all individuals
  • L1 variables are centered using all individuals; L2 variables using all clusters
• CWC = centering within clusters
  • Also known as group-mean centering
  • Deviations are calculated around the mean of the cluster j to which case i belongs

[Figure: Y (outcome) against X (predictor) for Cluster 1, Cluster 2, and Cluster 3. The distribution of M is highly variable across clusters.]

[Two figures: Y (outcome) against X (predictor), with the grand mean of M marked (CGM).]

CGM:
• Does not affect the rank order of scores on the variable. The complex, multilevel association between the L1 and L2 variables is unaffected.
• Yields scores that are correlated with variables at both levels of the hierarchy. (This is a critical difference from CWC.)
• Produces an interaction coefficient ($\gamma_{11}$) that is a weighted combination of the within- and between-cluster regression coefficients.

[Two figures: Y (outcome) against X (predictor), with the cluster means M1, M2, and M3 marked (CWC). The distribution of M is highly concentrated within clusters.]

CWC:
• Affects the rank order of scores on the variable within the sample.
• Produces scores that are uncorrelated with Level 2 variables (because the centered variable has a mean of zero within every cluster).
• Produces an interaction coefficient ($\gamma_{11}$) that is an unbiased estimate of the Level 1 association.
• $\gamma_{11}$ is a pure estimate of the cross-level interaction, no longer confounded with the Level 2 interaction.

[Figure: Y (outcome) against X (predictor). The distribution of M is even across clusters.]

[Two figures: the same case with the grand mean of M marked (CGM) and with the cluster means M1, M2, and M3 marked (CWC).]
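The two centering schemes described on the preceding slides can be sketched as follows. The data are hypothetical: three clusters whose moderator values differ sharply in level, which is the situation where the choice of centering matters most. Variable names are illustrative.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical moderator values M for individuals nested in three clusters.
records = [
    ("A", 1.0), ("A", 2.0), ("A", 3.0),
    ("B", 4.0), ("B", 5.0), ("B", 6.0),
    ("C", 7.0), ("C", 8.0), ("C", 9.0),
]


def mean_by_cluster(pairs):
    """Return {cluster: mean of the values observed in that cluster}."""
    groups = defaultdict(list)
    for cluster, value in pairs:
        groups[cluster].append(value)
    return {cluster: fmean(values) for cluster, values in groups.items()}


grand_mean = fmean(m for _, m in records)
cluster_means = mean_by_cluster(records)

# CGM: deviation from the mean of all individuals.
cgm = [(c, m - grand_mean) for c, m in records]
# CWC: deviation from the mean of the individual's own cluster.
cwc = [(c, m - cluster_means[c]) for c, m in records]

# CGM keeps between-cluster differences; CWC removes them entirely.
cgm_cluster_means = mean_by_cluster(cgm)
cwc_cluster_means = mean_by_cluster(cwc)
print("CGM cluster means:", cgm_cluster_means)
print("CWC cluster means:", cwc_cluster_means)
```

Because every cluster mean of the CWC scores is zero, those scores cannot correlate with any cluster-level variable, whereas the CGM scores still carry the between-cluster differences.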

• Centering will affect estimates more if the predictor variable is not evenly distributed across clusters.
• A cross-level interaction term using CGM provides a coefficient estimate that is a mix of the L1 and L2 effects.
• A cross-level interaction term using CWC provides a pure estimate of the L1 relationship.
• Decisions on how to center depend on your data and your research question (!!).

• Predictor: Treatment status (L2)
• Individual-level moderator: Student age (L1) (continuous)
• Outcome: Reading score
• Some options on how to center the data and what it means for interpreting your moderated effect…


Distortions to statistical inferences can occur when multiple related hypothesis tests are conducted. Suggested approaches:
1. Explicitly distinguish between exploratory and confirmatory findings.
2. Minimize the number of confirmatory hypothesis tests conducted by a given study.
3. Create an omnibus hypothesis test about the intervention’s effects that considers all outcome measures and subgroups together (e.g., a composite measure of individual outcomes).
4. Consider a family-wise error correction (this reduces statistical power considerably).
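As one concrete instance of a family-wise error correction, the sketch below compares the Bonferroni adjustment with the Holm step-down procedure, which controls the same error rate and never rejects fewer hypotheses than Bonferroni. The slides do not say which correction is intended; both procedures and the p-values here are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each test whose p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value to alpha / (m - k + 1)
    and stop at the first comparison that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank = 0 for the smallest p-value
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also retained
    return reject


# Hypothetical p-values for four confirmatory subgroup tests.
p_values = [0.010, 0.200, 0.015, 0.005]
uncorrected = [p <= 0.05 for p in p_values]
bonferroni = bonferroni_reject(p_values)
holm = holm_reject(p_values)

print("uncorrected:", uncorrected)
print("Bonferroni: ", bonferroni)
print("Holm:       ", holm)
```

With these made-up values, the third test survives the Holm procedure but not Bonferroni, which shows how the choice of correction affects the power cost.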

1. Calculate ρ for all levels.
2. Determine your research question and the relevant approach to assessing subgroup effects.
3. Calculate the power needed to detect a subgroup effect (either for a particular subgroup or for a cross-level interaction, depending on your research question).
4. Rescale (i.e., center) predictor variables as needed.
5. Assess the practical significance of your findings (i.e., calculate effect sizes).
6. Report results for each step of the model-building process, including all coefficients, standard errors, and variance components.
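Steps 1 and 5 can be sketched from the variance components of a two-level model. The example assumes ρ is computed from an unconditional (no-predictor) model as the between-cluster share of total variance, and that effect sizes are expressed by dividing a coefficient by the total outcome standard deviation, which is one common convention among several. All numbers and names are hypothetical.

```python
from math import sqrt


def icc(tau00, sigma2):
    """Intraclass correlation: between-cluster variance over total variance."""
    return tau00 / (tau00 + sigma2)


def standardized_effect(coefficient, tau00, sigma2):
    """Coefficient in units of the total outcome standard deviation
    (an assumed convention; other denominators are also used)."""
    return coefficient / sqrt(tau00 + sigma2)


# Hypothetical variance components from an unconditional two-level model.
tau00 = 20.0   # between-cluster variance of the intercept
sigma2 = 80.0  # within-cluster (residual) variance

rho = icc(tau00, sigma2)
effect_size = standardized_effect(3.0, tau00, sigma2)  # hypothetical estimate of 3 points

print(f"rho = {rho:.2f}")
print(f"effect size = {effect_size:.2f} SD")
```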

• Aguinis, H., Gottfredson, R. K., & Culpepper, S. A. (2013). Best-practice recommendations for estimating cross-level interaction effects using multilevel modeling. Journal of Management.
• Bloom, H. S. (Ed.). (2005). Learning more from social experiments: Evolving analytic approaches. Russell Sage Foundation.
• Bloom, H. & Michalopoulos, C. (2013). When Is the Story in the Subgroups? MDRC Working Paper.
• Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121.
• Mathieu, J. E., Aguinis, H., Culpepper, S. A., & Chen, G. (2012). Understanding and estimating the power to detect cross-level interaction effects in multilevel modeling. Journal of Applied Psychology, 97(5).