IES Workshop on Evaluating State and District Level Interventions


1 IES Workshop on Evaluating State and District Level Interventions
Mark W. Lipsey, Director, Center for Evaluation Research and Methodology, Vanderbilt University
David Holdzkom, Assistant Superintendent for Evaluation and Research, Wake County Public School System, North Carolina
April 24, 2008, Washington, DC

2 Purpose To help schools, districts, and states design and implement rigorous evaluations of the effects of promising practices, programs, and policies on educational outcomes.

3 Why encourage locally initiated impact evaluation?
Many interventions are not effective; users and interested others need to know. The interventions most relevant to improving outcomes are those that schools and districts believe are promising and feasible. IES has funding to support research initiated by schools, districts, and states.

4 What kinds of interventions might be evaluated?
Practices, e.g., one-on-one tutoring, educational software, acceleration of high ability students, cooperative learning. Programs, e.g., Reading Recovery, Ladders to Literacy, Cognitive Tutor algebra, Saxon Math, Caring School Community (character education). Policies, e.g., reduced class size, pre-K, alternative high schools, all year calendar.

5 Key Issues in Designing Impact Evaluations for Education Interventions

6 Logic Models, Variables, and Evaluation Questions

7 Logic model: 1. Specifying the problem the intervention addresses
Nature of the need: What and for whom (e.g., kindergarten students who aren’t ready for school). Why (e.g., poor pre-literacy skills, inappropriate school behavior). Rationale/evidence supporting the intervention target (e.g., at entry K students need to be ready to learn or they will begin to fall behind; research shows school readiness can be enhanced for at-risk 4 year olds).

8 Logic model: 2. Specifying the planned intervention
What the intervention does that addresses the need: Content: What the students should know or be able to do; why this meets the need. Pedagogy: Instructional techniques and methods to be used; why appropriate. Delivery system: How the intervention will arrange to deliver the instruction. The key factors or core ingredients most essential and distinctive to the intervention.

9 Logic model: 3. Specifying the theory of change
Target Population: 4 year old at-risk children
Intervention: Pre-K with literacy curriculum
Proximal Outcomes: Positive attitudes to school; Improved pre-literacy skills; Learn appropriate school behavior
Distal Outcomes: Increased school readiness; Greater learning gains in K

10 Mapping variables onto the intervention theory: Sample characteristics
[Diagram: the theory of change from slide 9, annotated with sample characteristics.]
Sample descriptors:
* Basic demographics
* Diagnostic, need/eligibility identification
* Baseline performance
Potential moderators:
* Setting, context
* Personal and family characteristics
* Prior experience

11 Mapping variables onto the intervention theory: Intervention characteristics
[Diagram: the theory of change from slide 9, annotated with intervention characteristics.]
Independent variable:
* T vs. C comparison conditions
Generic fidelity:
* T and C exposure to the generic aspects of the intervention (type, amount, quality)
Specific fidelity:
* T and C (?) exposure to distinctive aspects of the intervention (type, amount, quality)
Potential moderators:
* Characteristics of personnel
* Intervention setting, context, e.g., class size

12 Mapping variables onto the intervention theory: Intervention outcomes
[Diagram: the theory of change from slide 9, with an added "Exposed to intervention" step and annotated with outcome variables.]
Focal dependent variables:
* Pretests (pre-intervention)
* Posttests (at end of intervention)
* Follow-ups (lagged after end of intervention)
Other dependent variables:
* Side effects: possible unplanned positive or negative outcomes
* Mediators: DVs on causal pathways from intervention to other DVs

13 Research questions: Relationships of (possible) interest
Intervention effects: Causal relationship between intervention and outcomes. Duration of effects post-intervention. Moderator relationships: Differential intervention effects for different subgroups. Mediator relationships: Stepwise causal relationship with effects on a proximal outcome causing effects on a distal outcome.

14 Research Designs for Estimating Intervention Effects

15 What is an intervention effect and why is it so difficult to determine?

16 Research designs to discuss
Two strong ones 1. Randomized experiment 2. Regression-discontinuity Two weak ones 3. Nonrandomized comparison groups with statistical controls 4. Comparative interrupted time series

17 1. Randomized experiment
[Diagram: A research sample of students, teachers, classrooms, schools, etc. is blocked on the pretest (high, med high, med low, low) and randomly assigned to conditions. One group receives the experimental intervention and the other does not. The intervention effect is the difference between the two groups' posttest outcomes.]
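The pretest blocking and random assignment on this slide can be sketched in Python. This is only an illustration, not the presenters' procedure: the function name, the default of four blocks, and the made-up pretest scores are all assumptions.

```python
import random

def blocked_random_assignment(pretest_by_id, n_blocks=4, seed=0):
    """Assign units to 'T' or 'C' at random within pretest blocks.

    pretest_by_id: dict mapping a unit id to its pretest score.
    Returns a dict mapping each unit id to 'T' or 'C'.
    """
    rng = random.Random(seed)
    # Rank units from lowest to highest pretest score.
    ranked = sorted(pretest_by_id, key=lambda uid: pretest_by_id[uid])
    block_size = -(-len(ranked) // n_blocks)  # ceiling division
    assignment = {}
    for start in range(0, len(ranked), block_size):
        block = ranked[start:start + block_size]
        rng.shuffle(block)
        half = len(block) // 2
        # The first half of the shuffled block receives the intervention.
        for uid in block[:half]:
            assignment[uid] = "T"
        for uid in block[half:]:
            assignment[uid] = "C"
    return assignment

# Hypothetical sample: 40 students with made-up pretest scores.
pretests = {f"s{i:02d}": 50 + (i * 7) % 31 for i in range(40)}
groups = blocked_random_assignment(pretests)
```

Because assignment happens inside each block, the T and C groups are balanced on the pretest by construction, which is the point of blocking.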

18 Circumstances conducive to randomized experiments
More demand than supply for program: allocate scarce resource by lottery. New program that can be phased in: wait list control has delayed start. Pull-out or add-on program for selected students: randomly select from among those eligible. Volunteers willing to opt in for a chance to receive the program.

19 Example: Junior high algebra curriculum
The Moore, Oklahoma, Independent School District conducted a study of the effectiveness of the Cognitive Tutor Algebra I program on students in its junior high school system. Students in 5 junior high schools were randomly assigned to either the Cognitive Tutor Algebra I course or the ‘practice as usual’ algebra courses. Cognitive Tutor teachers received the curriculum materials and 4 days of training. Outcome measures included the ETS Algebra I end-of-course exam, course grades, and a survey of student attitudes towards mathematics.

20 Example: Alternative high school for students at risk of dropping out
Horizon High School in Las Vegas identified 9th and 10th grade students behind grade level and at risk of dropping out. A random sample of these students was assigned to attend an alternative high school that featured a focus on cooperative learning, small group instruction, and support services. Outcomes were compared for the alternative and regular high schools on dropout rates, self-esteem, drug use, and arrest rates.

21 Example: Remedial reading programs for elementary students
The Allegheny Intermediate Unit (AIU), which serves 42 suburban school districts in Allegheny County, Pennsylvania, randomly assigned 50 schools to one of four commercially available remedial reading interventions. Within each school struggling readers in grades 3 and 5 were identified and randomly assigned to instruction as usual or the remedial reading program designated for that school. In each program, 3 students met with a trained teacher one hour/day for 20 weeks. Measures of reading skill were administered at the beginning and end of the school year for program and control students.

22 2. Regression-discontinuity (aka the cutting-point design)
When well-executed, its ability to provide an unbiased estimate of the intervention effect is strong– comparable to a randomized experiment. It is adaptable to many circumstances where it may be difficult to apply a randomized design.

23 Corresponding regression equation (T: 1=treatment, 0=control)
Consider first a posttest-on-pretest regression for a randomized experiment with no effect. [Figure: posttest (Y) plotted against pretest (S) for the T and C groups, with mean Y and mean S marked.]
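The equation itself did not survive in this transcript. A standard form consistent with the slide's notation (Y for posttest, S for pretest, T coded 1=treatment, 0=control) would be the following; the coefficient symbols are assumptions, not taken from the slide.

```latex
Y_i = \beta_0 + \beta_1 S_i + \beta_2 T_i + \varepsilon_i ,
\qquad T_i \in \{0, 1\}
```

Under this reading, "no effect" corresponds to \beta_2 = 0, and the Δ shown on the following slides corresponds to \beta_2.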

24 Pretest-posttest randomized experiment, now with an intervention effect
[Figure: posttest (Y) plotted against pretest (S). T and C share the same mean S, and the T and C values of mean Y differ by Δ.]

25 Consider now the same regression with no effect but with a cutting point applied
[Figure: posttest (Y) plotted against the selection variable (S), with a cutting point separating T from C. C mean Y and T mean Y are marked.]

26 Regression discontinuity scatterplot (null case)
[Figure: scatterplot of posttest (Y) against the selection variable (S), with T and C cases on either side of the cutting point.]

27 Now add an intervention effect
[Figure: posttest (Y) plotted against the selection variable (S), with a difference of Δ between C and T at the cutting point.]

28 Regression discontinuity scatterplot with effect
[Figure: scatterplot of posttest (Y) against the selection variable (S), with T and C cases on either side of the cutting point.]

29 The effect estimated by R-D is the same as that from the randomized experiment
[Figure: posttest (Y) plotted against the selection variable (S), with a difference of Δ between C and T at the cutting point.]

30 The selection variable for R-D
A continuous quantitative variable measured on every candidate for assignment to T or C who will participate in the study. Assignment to T or C strictly on the basis of the score obtained and the predetermined cutting point. Does not have to correlate highly with the outcome variable. Can be tailored to represent an appropriate basis for the assignment decision in the setting.

31 Special issues with the R-D design
Correctly fitting the functional form: possibility that it is not linear (curvilinear functions; interaction with the cutting point). Statistical power: requires about 3 times the sample size of a comparable randomized experiment; covariates correlated with the outcome but not the selection variable are helpful.
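A regression-discontinuity estimate that allows for an interaction with the cutting point can be sketched as follows. This is an illustration on simulated data, not the presenters' analysis: the function names, the simulated model, and all parameter values are assumptions. The selection variable is centered at the cutting point so that the coefficient on T is the estimated effect at the cutting point.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate_rd(n, effect, cutoff=0.0, noise_sd=1.0, seed=1):
    """Simulate a sharp R-D: units scoring below the cutoff receive T."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)          # selection variable S
        t = 1.0 if s < cutoff else 0.0      # strict assignment by cutting point
        sc = s - cutoff                     # center S at the cutting point
        rows.append([1.0, sc, t, sc * t])   # intercept, S, T, S x T
        y.append(10.0 + 2.0 * s + effect * t + rng.gauss(0.0, noise_sd))
    return rows, y

rows, y = simulate_rd(n=2000, effect=3.0)
b0, b_s, b_t, b_st = ols(rows, y)
print(f"Estimated effect at the cutting point: {b_t:.2f}")
```

The S x T column lets the slope differ on each side of the cutting point. Curvilinear forms could be checked by adding squared terms in the same way.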

32 Circumstances conducive to the regression-discontinuity design
The situation involves a selection from some larger group of who will, or should, receive the intervention and who will not. The basis for selection is or can be made explicit and systematic enough to be captured in a quantitative rating or ranking. The allocation of the intervention can be made strictly on the basis of the selection score and cutting point in a high proportion of cases. Exceptions can be identified in advance and exempted from the scheme.

33 Example: Effects of universal pre-k in Tulsa, Oklahoma
Eligibility for pre-k determined strictly on the basis of age– cutoff by birthday. Overall sample of 1,567 children just beginning pre-k plus 1,461 children just beginning kindergarten who had been in pre-k the previous year. WJ Letter-Word, Spelling, and Applied Problems as outcome variables.

34 Entry into pre-k selected by birthday
[Figure: WJ test score plotted against age. C: born after September 1, no pre-K yet, tested at beginning of pre-K year. T: born before September 1, completed pre-K, tested at beginning of K. The difference at the cutoff is marked "?".]

35 Samples and testing
[Diagram: timeline across Year 1 and Year 2. First cohort: pre-k, then kindergarten. Second cohort: pre-k. Points labeled "Administer WJ tests" mark when each cohort is tested.]

36 Excerpts from Regression Analysis
B coefficients (* p<.05)
Variable                    Letter-Word   Spelling   Applied Probs
Treatment (T)                   3.00*       1.86*        1.94*
Age: Days ± from Sept 1          .01         .01*         .02*
Free lunch                     -1.28*       -.89*       -1.38*
Black                            .04        -.44*       -2.34*
Hispanic                       -1.70*       -.48*       -3.66*
Female                           .92*       1.05*         .76*
Mother’s educ: HS                .59*        .57*        1.25*
Also listed: Days² (.00), Days x T (-.01), and Days² x T; the columns for these values are not recoverable here.
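Read as a list of predictors, the rows above suggest a regression of roughly the following form. This is an assumed reconstruction from the variable names only, not the authors' stated specification: D_i stands for days from the September 1 cutoff and X_i for the background covariates (free lunch, ethnicity, gender, mother's education).

```latex
Y_i = \beta_0 + \beta_1 T_i + \beta_2 D_i + \beta_3 D_i^2
      + \beta_4 (D_i \times T_i) + \beta_5 (D_i^2 \times T_i)
      + \boldsymbol{\gamma}^{\top} \mathbf{X}_i + \varepsilon_i
```

In this form, \beta_1 is the estimated pre-k effect at the birthday cutoff, and the squared and interaction terms address the functional-form concerns raised for the R-D design.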

37 3. Nonrandomized comparison groups with statistical controls
Statistical controls: Analysis of covariance and multiple regression. Matching on the control variables. Propensity scores derived from the control variables.

38 Nonequivalent comparison analog to the randomized experiment
[Diagram: From a population of students, teachers, classrooms, schools, etc., units are selected through some nonrandom, more-or-less natural process. One group receives the experimental intervention and the other does not. The difference between the two groups' outcomes is labeled "Intervention effect (??)".]

39 Issues for obtaining good intervention effect estimates from nonrandomized comparison groups
The fundamental problem: selection bias. Knowing/measuring the variables necessary and sufficient to statistically control for the selection bias (characteristics related to the outcome on which the groups differ). Using an analysis model that properly adjusts for the selection bias, given appropriate control variables.

40 Nonequivalent comparison groups: Pretest/covariate and posttest means
[Figure: posttest (Y) plotted against pretest/covariate(s) (X) for T and C, showing the difference in posttest means and the difference in pretest/covariate means.]

41 Nonequivalent comparison groups: Covariate-adjusted treatment effect estimate
[Figure: posttest (Y) plotted against pretest/covariate(s) (X) for T and C, with Δ marking the covariate-adjusted difference.]

42 Covariate-adjusted treatment effect estimate with a relevant covariate left out
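[Figure: posttest (Y) plotted against pretest/covariate(s) (X) for T and C, with Δ marking the estimated difference when a relevant covariate is left out.]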
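The contrast between the raw difference in posttest means and the covariate-adjusted estimate can be illustrated with a one-covariate analysis of covariance. This is a sketch under stated assumptions: the function name and the data are made up, and the single covariate is assumed to capture all of the selection bias, which real data rarely guarantee.

```python
from statistics import mean

def ancova_adjusted_effect(x_t, y_t, x_c, y_c):
    """Return (raw difference in posttest means, covariate-adjusted difference)."""
    mx_t, my_t, mx_c, my_c = mean(x_t), mean(y_t), mean(x_c), mean(y_c)
    # Pooled within-group slope of posttest on the covariate.
    sxy = (sum((x - mx_t) * (y - my_t) for x, y in zip(x_t, y_t))
           + sum((x - mx_c) * (y - my_c) for x, y in zip(x_c, y_c)))
    sxx = (sum((x - mx_t) ** 2 for x in x_t)
           + sum((x - mx_c) ** 2 for x in x_c))
    slope = sxy / sxx
    raw = my_t - my_c
    # Remove the part of the raw difference explained by the covariate gap.
    adjusted = raw - slope * (mx_t - mx_c)
    return raw, adjusted

# Made-up data: the T group starts 4 points lower on the pretest,
# and the true intervention effect is 2 points.
x_c = [10.0, 12.0, 14.0, 16.0, 18.0]
x_t = [6.0, 8.0, 10.0, 12.0, 14.0]
y_c = [5.0 + 1.5 * x for x in x_c]
y_t = [5.0 + 1.5 * x + 2.0 for x in x_t]
raw, adjusted = ancova_adjusted_effect(x_t, y_t, x_c, y_c)
print(raw, adjusted)
```

Here the raw difference has the wrong sign because the groups differ on the pretest, while the adjusted estimate recovers the built-in effect. Leaving the covariate out of the model amounts to reporting the raw difference.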

43 Using control variables via matching
Groupwise matching: select control comparison to be groupwise similar to intervention group, e.g., schools with similar demographics, geography, etc. Generally a good idea. Individual matching: select individuals from the potential control pool that match intervention individuals on one or more observed characteristics. May not be a good idea.

44 Potential problems with individual level matching
Basic problem with nonequivalent designs– need to match on all relevant variables to obtain a good estimate of the intervention effect. If match on too few variables, may omit some that are important to control. If try to match on too many variables, the sample will be restricted to the cases that can be matched; may be overly narrow. If must select disproportionately from one tail of the treatment distribution and the other tail of the control distribution, may have regression to the mean artifact.

45 Regression to the mean: Matching on the pretest
Area where matches can be found

46 Propensity scores as control variables
The propensity score is the probability of being in the intervention group instead of the comparison group. It is estimated (“predicted”) from data on the characteristics of the individuals already in each group, typically using logistic regression. It thus combines all the control variables into a single variable optimized to differentiate the intervention sample from the control sample.
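A propensity score of this kind can be estimated with a small logistic regression. The sketch below uses plain gradient ascent on simulated data so that it has no dependencies; the function names, learning rate, step count, and the simulated selection model are all assumptions, and a statistics package would normally be used instead.

```python
import math
import random

def propensity(w, xi):
    """Predicted probability of being in the intervention group."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(X, t, steps=2000, lr=0.1):
    """Fit a logistic regression of group membership (1=T, 0=C) on covariates
    by gradient ascent; returns [intercept, coef_1, ..., coef_k]."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            err = ti - propensity(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

# Simulated, made-up data: two standardized covariates drive selection into T.
rng = random.Random(42)
X, t = [], []
for _ in range(400):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    p_true = 1.0 / (1.0 + math.exp(-(0.8 * x1 - 0.5 * x2)))
    X.append([x1, x2])
    t.append(1 if rng.random() < p_true else 0)

w = fit_propensity(X, t)
scores = [propensity(w, xi) for xi in X]
```

Each unit's score can then be used to form matched groups or entered as a covariate.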

47 One option: Use the propensity score to create matched groups
[Figure: treatment group and control group cases sorted into propensity score quintiles, with matches drawn within quintiles.]
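Grouping by propensity score quintile can be sketched as a stratified comparison: sort cases by score, split them into five strata, and average the T minus C difference within strata. The function name and the data are assumptions made for illustration, and the scores are taken as given.

```python
from statistics import mean

def stratified_effect(scores, treated, outcomes, n_strata=5):
    """Weighted average of within-stratum T-minus-C outcome differences."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = -(-len(order) // n_strata)  # ceiling division
    diffs, weights = [], []
    for start in range(0, len(order), size):
        stratum = order[start:start + size]
        y_t = [outcomes[i] for i in stratum if treated[i] == 1]
        y_c = [outcomes[i] for i in stratum if treated[i] == 0]
        if y_t and y_c:  # skip strata where no matches can be found
            diffs.append(mean(y_t) - mean(y_c))
            weights.append(len(stratum))
    return sum(d * w for d, w in zip(diffs, weights)) / sum(weights)

# Made-up data: higher-scoring cases are more often in T, the outcome rises
# with the score, and the built-in intervention effect is 2 points.
scores, treated, outcomes = [], [], []
for k in range(1, 6):
    for j in range(6):
        s = k / 6
        tr = 1 if j < k else 0
        scores.append(s)
        treated.append(tr)
        outcomes.append(10.0 * s + 2.0 * tr)

naive = (mean(y for y, tr in zip(outcomes, treated) if tr == 1)
         - mean(y for y, tr in zip(outcomes, treated) if tr == 0))
stratified = stratified_effect(scores, treated, outcomes)
print(naive, stratified)
```

In this constructed example the naive comparison overstates the effect because T cases have higher scores, while the stratified comparison recovers it.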

48 Another option: Use the propensity score as a covariate in ANCOVA or MR
[Figure: posttest (Y) plotted against propensity score (P) for T and C, with Δ marking the adjusted difference.]

49 Circumstances appropriate for the nonequivalent comparison design
A stronger design is truly not feasible. A sample of relatively comparable units not receiving the intervention is available. A full account can be given of the differences between the groups potentially related to the outcomes of interest. Data on those differences can be obtained and used for statistical control.

50 Example: Effects of a professional development program for teachers
In the Montgomery County Public Schools, MD, some 3rd grade teachers had received the Studying Skillful Teaching training, some had not. The reading and math achievement test scores for students of teachers with and without training were compared. Analysis of covariance was used to test for differences in student outcomes with a propensity score control variable and covariates representing teacher credentials, student pretest, reduced/free lunch status, ethnicity, and special ed or ELL service.

51 4. Comparative interrupted time series
[Figure: mean achievement by school year for 9th grade program schools and 9th grade other schools, with program onset marked.]

52 Requirements for a good intervention effect estimate from comparative interrupted time series
The fundamental problem: changes stemming from other sources. Sufficient pre-intervention time series data showing relative stability. No other potentially influential event coincides with the program onset or staggered onsets if available. Comparison time series for very similar units in same environment but without the program. An analysis model that properly estimates changes and differences with autocorrelated data.
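The core comparison can be sketched as a change-in-means contrast: the pre-to-post change in the program series minus the same change in the comparison series. This is a deliberately simplified illustration with made-up numbers and an assumed function name; it ignores trends and autocorrelation, which the analysis model called for above would need to handle.

```python
from statistics import mean

def cits_effect(program, comparison, onset):
    """Pre-to-post change in the program series minus the change in the comparison series.

    program, comparison: outcome values per school year, in time order.
    onset: index of the first post-program year.
    """
    prog_change = mean(program[onset:]) - mean(program[:onset])
    comp_change = mean(comparison[onset:]) - mean(comparison[:onset])
    return prog_change - comp_change

# Made-up yearly mean achievement: 3 years before and 5 years after onset.
program = [50, 51, 50, 55, 56, 55, 56, 55]
comparison = [52, 53, 52, 53, 52, 53, 52, 53]
effect = cits_effect(program, comparison, onset=3)
print(round(effect, 2))
```

Subtracting the comparison series' change is what guards against changes stemming from other sources that affect both sets of schools.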

53 Circumstances appropriate for comparative interrupted time series
A stronger design is truly not feasible. Time series data on a relevant outcome for those exposed to the program are available for periods before and after the onset of the program. Sufficient data points are available, with no change in the nature of the measure, to establish stable statistical trends. Data on the same measure over the same time period are available for comparable cases without the program.

54 Example: The ninth grade Success Academy in Philadelphia
The Success Academy grouped 9th graders together in small learning communities with a specialized curriculum and a small group of dedicated teachers. Implemented by 7 of the 22 nonselective high schools. The outcomes were attendance, academic credit earned, promotion to 10th grade, achievement test scores, and graduation rates. Outcomes are compared for 9th graders during the 3 years prior and 5 years after program onset and for the program schools vs. a matched group of schools without the program.

55 Other Important Aspects of the Research Plan

56 Statistical power
Statistical power = probability of statistical significance when there is an effect. Power is mainly a function of:
* alpha level for significance testing
* minimum effect size to detect in standard deviation units
* the sample size: number of students, classrooms, schools, etc.
* the covariates included in the analysis
* the research design and corresponding analysis model
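How these ingredients combine can be seen in a normal-approximation power calculation for a simple two-group comparison with individual random assignment. This is a rough sketch, not a replacement for dedicated power software: the function name is an assumption, groups are assumed equal in size, the test is two-tailed, and the small chance of significance in the wrong direction is ignored.

```python
from statistics import NormalDist

def power_two_groups(effect_size, n_per_group, alpha=0.05, r2_covariates=0.0):
    """Approximate power to detect a standardized mean difference.

    effect_size: minimum effect to detect, in standard deviation units.
    r2_covariates: proportion of outcome variance explained by covariates.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the standardized difference, shrunk by the covariates.
    se = ((2.0 / n_per_group) * (1.0 - r2_covariates)) ** 0.5
    return z.cdf(effect_size / se - z_crit)

base = power_two_groups(0.5, 64)
with_pretest = power_two_groups(0.5, 64, r2_covariates=0.5)
print(round(base, 2), round(with_pretest, 2))
```

Raising the sample size, the effect size, alpha, or the variance explained by covariates each increases the computed power.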

57 Power: Critical considerations
A realistic identification of the minimal effect size with practical significance that the research should be powered to detect. The unit that is assigned to conditions (students, classrooms, schools, etc.). The intracluster correlations (ICC) expected for student outcomes when students are nested within the units assigned. The expected correlations with outcomes of any covariates measured on the units assigned to conditions. The number of schools, classrooms, students, etc. available for the study. Specific design issues such as the need for 3-4 times as many units for regression-discontinuity as for a comparable randomized experiment.
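The role of the intracluster correlation can be illustrated with the usual design effect, 1 + (m - 1) x ICC for clusters of size m, and a normal-approximation power calculation for assignment of whole clusters. The function names and example values are assumptions; the sketch assumes equal cluster sizes, half the clusters in each condition, and no covariates, and it ignores degrees-of-freedom corrections that matter when clusters are few.

```python
from statistics import NormalDist

def design_effect(cluster_size, icc):
    """Variance inflation from assigning intact clusters rather than individuals."""
    return 1.0 + (cluster_size - 1) * icc

def power_cluster_randomized(effect_size, n_clusters, cluster_size, icc, alpha=0.05):
    """Approximate power when n_clusters units (e.g., schools) are assigned to T or C."""
    z = NormalDist()
    total_n = n_clusters * cluster_size
    # Variance of the standardized difference in means, inflated by the design effect.
    var = 4.0 * design_effect(cluster_size, icc) / total_n
    return z.cdf(effect_size / var ** 0.5 - z.inv_cdf(1 - alpha / 2))

# Hypothetical study: 40 schools of 25 students each, ICC of 0.15.
clustered = power_cluster_randomized(0.25, 40, 25, icc=0.15)
unclustered = power_cluster_randomized(0.25, 40, 25, icc=0.0)
print(round(clustered, 2), round(unclustered, 2))
```

With the same 1,000 students, power drops sharply once outcomes are correlated within schools, which is why the number of assigned units matters more than the number of students.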

58 Computer program for power estimation in multilevel designs
Raudenbush, S. W., Spybrook, J., Liu, X., & Congdon, R. (2006). Optimal design for longitudinal and multilevel research: Documentation for the “Optimal Design” software. Optimal Design Version 1.76

59 Multilevel Data Analysis
Applicable when sampling and assignment to conditions occur with one unit (e.g., classrooms, schools) and outcomes are measured on units nested within (e.g., students). Requires specialized computer programs, e.g., HLM, MLwiN, SAS Proc Mixed, SPSS Mixed Models.
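For students (i) nested within schools (j) that are assigned to conditions, a basic two-level model of the kind these programs fit can be written as follows. The notation is a common textbook convention and an assumption here, not taken from the slides.

```latex
\begin{aligned}
\text{Level 1 (students):}\quad & Y_{ij} = \beta_{0j} + e_{ij}, & e_{ij} &\sim N(0, \sigma^2) \\
\text{Level 2 (schools):}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} T_j + u_{0j}, & u_{0j} &\sim N(0, \tau^2) \\
\text{Intracluster correlation:}\quad & \mathrm{ICC} = \frac{\tau^2}{\tau^2 + \sigma^2}
\end{aligned}
```

In this form \gamma_{01} is the intervention effect, and the ICC is the share of outcome variance that lies between schools.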

60 References and readings
Experimental and quasi-experimental design
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
Bickman, L., & Rog, D. J. (Eds.) (2008). The SAGE Handbook of Applied Social Research Methods (Second Edition). Sage Publications.
Regression-discontinuity
Hahn, J., Todd, P., & Van der Klaauw, W. (2002). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1).
Cappelleri, J. C., & Trochim, W. (2000). Cutoff designs. In Chow, Shein-Chung (Ed.), Encyclopedia of Biopharmaceutical Statistics. NY: Marcel Dekker.
Cappelleri, J., Darlington, R. B., & Trochim, W. (1994). Power analysis of cutoff-based randomized clinical trials. Evaluation Review, 18.

61 Nonequivalent comparison designs
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1).
Luellen, J. K., Shadish, W. R., & Clark, M. H. (2005). Propensity scores: An introduction and experimental test. Evaluation Review, 29(6).
Schochet, P. Z., & Burghardt, J. (2007). Using propensity scoring to estimate program-related subgroup impacts in experimental program evaluations. Evaluation Review, 31(2).
Time series
Bloom, H. S. (2003). Using “short” interrupted time-series analysis to measure the impact of whole school reforms. Evaluation Review, 27(1), 3-49.
Chatfield, C. (2003). The analysis of time series: An introduction (Sixth Ed.). Chapman & Hall/CRC.

62 Examples used in this presentation
Multilevel analysis
Hox, J. (2002). Multilevel analysis: Techniques and applications. Lawrence Erlbaum.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Second ed.). Sage Publications.
Examples used in this presentation
Morgan, P., & Ritter, S. (2002). An experimental study of the effects of Cognitive Tutor® Algebra I on student knowledge and attitude. Carnegie Learning, Inc.
Dynarski, M., Gleason, P., Rangarajan, A., & Wood, R. (1998). Impacts of Dropout Prevention Programs. Final Report. Mathematica Policy Research, Princeton, NJ.
Torgesen, J., Myers, D., Schirm, A., et al. (2006). National Assessment of Title I Interim Report to Congress: Volume II: Closing the Reading Gap, First Year Findings from a Randomized Trial of Four Reading Interventions for Striving Readers. Washington, DC: U.S. Department of Education, Institute of Education Sciences.

63 W. T. Gormley, T. Gayer, D. Phillips, & B. Dawson (2005)
Gormley, W. T., Gayer, T., Phillips, D., & Dawson, B. (2005). The effects of universal pre-k on cognitive development. Developmental Psychology, 41(6).
Modarresi, S., & Wolanin, N. (2007). The effects of Studying Skillful Teaching training program on students’ reading and mathematics achievement. Evaluation Brief, February. Montgomery County Public Schools, MD.
Kemple, J. J., Herlihy, C. M., & Smith, T. J. (2005). Making progress toward graduation: Evidence from the Talent Development High School model. New York: MDRC. [Includes 9th grade Success Academy].
Raudenbush, S. W., Spybrook, J., Liu, X., & Congdon, R. (2006). Optimal design for longitudinal and multilevel research: Documentation for the “Optimal Design” software. Optimal Design Version 1.76.

