POLS 7170X Master's Seminar: Program/Policy Evaluation
Class 6-7
Brooklyn College-CUNY
Shang E. Ha

Quasi-Experimental Impact Assessment

- A randomized field experiment is the strongest research design for assessing program impact.
- When a randomized design is not feasible, there are alternative research designs that an evaluator can use.
- Even when well crafted and implemented, these alternative designs may still yield biased estimates of program effects. Such biases systematically exaggerate or diminish program effects, and the direction the bias will take usually cannot be known in advance.

Bias in Estimation of Program Effects

- A program effect is the difference between the observed outcome for targets exposed to the program and the outcome that would have occurred for those same targets, all other things being equal, had they not been exposed to the program.
- Bias comes into the picture when either the measurement of the outcome with program exposure or the estimate of what the outcome would have been without program exposure is higher or lower than the corresponding "true" value.

Bias: An Example

- A reading program for young children that emphasizes vocabulary development.
- We have an appropriate vocabulary test for measuring the outcome.
- We use this test to measure the children's vocabulary before and after the program, i.e., a simple pre-post design [Exhibit 9-A].
- The problem: the vocabulary of young children tends to increase over time anyway, so the pre-post difference mixes the program effect with natural growth.

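A toy simulation (in Python, with invented numbers) makes the problem concrete: a program with zero true effect still looks effective in a pre-post comparison when scores grow naturally over time.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
pre = rng.normal(50, 10, n)        # vocabulary scores before the program
maturation = rng.normal(5, 2, n)   # natural vocabulary growth over the year
true_effect = 0.0                  # suppose the program itself does nothing
post = pre + maturation + true_effect

# The pre-post "program effect" is ~5 points of pure maturation bias
print(post.mean() - pre.mean())
```
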
Selection Bias

- A group comparison design for which the groups have not been formed through randomization is known as a nonequivalent comparison design.
- When the equivalence between the treatment group and the control group does not hold, the difference in outcomes between the groups produces a form of bias in the estimate of program effects: selection bias.

Selection Bias (cont.)

- Example: a program in which a group of individuals volunteer to participate, with those who do not volunteer used as the control group.
- Because we are unlikely to know what all the relevant differences are between volunteers and nonvolunteers, we have limited ability to determine the nature and extent of the bias.

Attrition

- Bias can occur even in well-executed randomized field experiments, through attrition:
  - Targets drop out of the intervention or control group and cannot be reached.
  - Targets refuse to cooperate in outcome measurement.
- Cf. failure to treat, where assigned targets never receive the intervention in the first place.

Other Sources of Bias

- Secular trends: relatively long-term trends in the community, region, or country.
  - In a period when a community's birth rate is declining, a program to reduce fertility may appear effective because of bias stemming from that downward trend.
- Interfering events: short-term events.
  - A natural disaster may make it appear that a program to increase community cooperation has been effective, when in reality it is the crisis situation that has brought community members together.
- Maturation: natural maturational and developmental processes can produce considerable change independently of the program.
  - A program to improve preventive health practices among adults may seem ineffective because health generally declines with age.

Matching

- The intervention group is typically specified first; the evaluator then constructs a control group by selecting targets unexposed to the intervention that match those in the intervention group on selected characteristics.
- To the extent that the matching falls short of equating the groups on characteristics that will influence the outcome, selection bias will be introduced into the resulting program effect estimate.

Matching Procedures

- Individual matching: draw a "partner" for each target who receives the intervention from the pool of potential targets unexposed to the program (see the sketch below).
  - Relevant matching variables: age, gender, father's occupation, hours of work, etc.
- Aggregate matching: individuals are not matched case by case; instead, the overall distributions in the intervention and control groups on each matching variable are made comparable.

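A minimal sketch of individual matching, assuming numeric matching variables and a greedy nearest-neighbor rule without replacement (one possible matching rule among several, not a prescribed procedure):

```python
import numpy as np

def individual_match(treated_X, pool_X):
    """Greedy 1:1 nearest-neighbor matching on standardized covariates.

    treated_X, pool_X: 2-D arrays (units x matching variables, e.g. age,
    hours of work). Returns, for each treated unit, the row index of its
    matched "partner" in the unexposed pool (without replacement).
    """
    mu, sd = pool_X.mean(axis=0), pool_X.std(axis=0)
    t = (treated_X - mu) / sd          # standardize so no variable dominates
    p = (pool_X - mu) / sd
    available = list(range(len(p)))
    matches = []
    for row in t:
        dists = np.linalg.norm(p[available] - row, axis=1)
        matches.append(available.pop(int(dists.argmin())))
    return matches
```

If the pool runs out before every treated unit is matched, the leftover intervention cases have no partners; in practice they are discarded, which is the loss-of-cases problem discussed next.
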
Problems of Individual Matching

- Individual matching is usually preferable to aggregate matching.
- But individual matching is more time-consuming and difficult to execute when many matching variables are involved.
- Matching by individuals can sometimes result in a drastic loss of cases: if matching persons cannot be found for some individuals in the intervention group, those unmatched individuals have to be discarded as data sources.

Statistical Controls

- The functional equivalent of matching.
- Any program effect estimate based on a simple comparison of the outcomes for the intervention and control groups must be presumed to include selection bias.
- If the relevant differences between the groups can be measured, statistical techniques can be used to control for the differences that would otherwise lead to biased program effect estimates.

Statistical Controls: Illustration

Panel A: Outcome Comparison

             Participants   Non-Participants
Ave. Wage    $7.75          $8.20

Panel B: Outcome Comparison after Adjusting for Educational Attainment

             Participants                  Non-Participants
             Less than HS   High School    Less than HS   High School
Ave. Wage    $7.60          $8.10          $7.75          $8.50

Panel C: Outcome Comparison after Adjusting for Education and Employment

             Participants              Non-Participants
             <HS/Unemp   HS/Unemp      <HS/Unemp   <HS/Emp   HS/Unemp   HS/Emp
Ave. Wage    $7.60       $8.10         $7.50       $7.83     $8.00      $8.60

(<HS = less than high school; Unemp/Emp = unemployed/employed)

How to Read Regression Results?

- The adjustments shown in the previous slide were accomplished in a very simple way to illustrate the logic of statistical controls.
- In actual applications, the evaluator would generally use multivariate statistical methods to control for a number of group differences simultaneously.

How to Read Regression Results? (cont.)

Regression Results Predicting Improvement in Test Scores (scale: 0-100)

                        Coefficient   Standard Error
Program Intervention    12.34*        3.45
Constant                56.23*        20.23

# of Students = 5,000
* Statistically significant at p < .05

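As an illustration of where such a table comes from, here is a sketch using statsmodels on simulated data; the numbers are invented to echo the table above, and the coefficient on the intervention dummy is the program effect estimate.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: 'treated' is a 0/1 program-intervention indicator,
# 'gain' is each student's improvement in test score.
rng = np.random.default_rng(0)
treated = rng.integers(0, 2, size=5000)
gain = 56.0 + 12.0 * treated + rng.normal(0, 25.0, size=5000)

X = sm.add_constant(treated)   # adds the intercept ("Constant") term
result = sm.OLS(gain, X).fit()
print(result.summary())        # slope ("x1") ~ 12, with its SE and p-value
```
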
Simple Pre-Post Studies

- Outcomes are measured on the same targets before program participation and again after participation has lasted long enough for effects to be expected.
  - E.g., the effects of Medicare (before eligibility vs. after eligibility).
- In general, simple pre-post designs provide biased estimates of program effects that have little value for purposes of impact assessment.

Quasi-Experimental Design: Cautions

- The advantages of quasi-experimental research designs for program impact assessment rest entirely on their practicality and convenience in situations where randomized field experiments are not feasible.
- Under favorable circumstances and carefully done, quasi-experiments can yield estimates of program effects comparable to those derived from randomized designs, but they can also produce wildly erroneous results.

The Magnitude of a Program Effect

- The most direct way to characterize the magnitude of a program effect is simply the numerical difference between the mean outcomes of the two groups (treatment/intervention group vs. control group).
- The problem with a simple numerical difference between means: it is very specific to the particular measurement instrument.
  - E.g., the effect of a program on knowledge about drug abuse: the treatment group scores .17 and the control group .15, a difference of .02. But what is the scale of the outcome variable "knowledge about drug abuse"? Without knowing the scale, .02 cannot be interpreted.

Standardized Mean Difference

- The standardized mean difference expresses the mean outcome difference between the intervention group and a control group in standard deviation units.
- The standard deviation is a statistical index of the variation across individuals or other units on a given measure; it provides information about the range or spread of the scores.
- Describing the size of a program effect in standard deviation units indicates how large it is relative to the range of scores found between the lowest and highest ones recorded in the study.
- A preschool program example (standardized mean difference effect sizes):
  - A test of reading readiness: .50 (the mean score for the intervention group is half a standard deviation higher than that for the control group).
  - A test of advancing vocabulary: .35.

Standardized Mean Difference (cont.)

- By convention, the standardized mean difference effect size is given a positive value when the outcome is more favorable for the intervention group and a negative value if the control group is favored.
- See [Exhibit 10-A] for the formula; a sketch of the usual computation follows below.

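The pooled-standard-deviation version is the most common formulation; here is a minimal sketch (this reconstructs the standard formula, not the textbook's Exhibit 10-A itself, and the example inputs are hypothetical):

```python
import math

def standardized_mean_difference(mean_i, mean_c, sd_i, sd_c, n_i, n_c):
    """Mean outcome difference in standard deviation units (pooled SD)."""
    pooled_var = ((n_i - 1) * sd_i**2 + (n_c - 1) * sd_c**2) / (n_i + n_c - 2)
    return (mean_i - mean_c) / math.sqrt(pooled_var)

# Preschool reading-readiness example: hypothetical means, SDs, group sizes
print(standardized_mean_difference(55.0, 50.0, 10.0, 10.0, 100, 100))  # 0.50
```
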
Odds Ratio

- The odds ratio tends to be preferred when outcome variables are binary (e.g., pregnant or not, graduation or not).
- An odds ratio indicates how much smaller or larger the odds of an outcome event are for the intervention group compared to the control group.
- An odds ratio of 1.0 indicates even odds: participants in the intervention group were no more and no less likely than controls to experience the outcome.
- Odds ratios greater (smaller) than 1.0 indicate that intervention group members were more (less) likely to experience the outcome.
- An odds ratio of 2.0 means that the odds of the outcome for members of the intervention group were twice the odds for members of the control group.

Odds Ratio (cont.)

                      Positive Outcome   Negative Outcome
Intervention Group    p                  1-p
Control Group         q                  1-q

Odds Ratio = [p/(1-p)] / [q/(1-q)]

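A direct translation of the formula into code (the example proportions are hypothetical):

```python
def odds_ratio(p, q):
    """Odds ratio: odds of the outcome in the intervention group (p)
    relative to the odds in the control group (q)."""
    return (p / (1 - p)) / (q / (1 - q))

# E.g., 60% of the intervention group graduates vs. 40% of controls:
print(odds_ratio(0.60, 0.40))  # 2.25 -> intervention odds are 2.25x control odds
```
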
Statistical Significance

- We would like to know whether an observed program effect is real or occurred by chance (statistical noise).
- If the estimate of the program effect is large relative to the expected level of statistical noise, we can be relatively confident that we have detected a real effect and not a chance pattern of noise.
- If the program effect estimate is small relative to statistical noise, we will have little confidence that we have observed a real program effect.
- Conventionally, statistical significance is set at the .05 alpha level. That is, the chance of a pseudo-effect produced by noise being as large as the observed program effect is 5% or less; we have a 95% confidence level that the observed effect is not simply the result of statistical noise.

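As an illustration, a two-sample t test on simulated data; the group means echo the drug-abuse knowledge example from an earlier slide, while the sample sizes and standard deviations are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(0.17, 0.10, size=200)  # simulated knowledge scores
control = rng.normal(0.15, 0.10, size=200)

t_stat, p_value = stats.ttest_ind(treatment, control)
# p_value < .05 -> the difference is unlikely to be pure statistical noise
print(t_stat, p_value)
```
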
Type I and Type II Errors

                                   Population Circumstances
Results of Significance    Intervention and Control   Intervention and Control
Test on Sample Data        Means Differ               Means Do Not Differ
-------------------------------------------------------------------------------
Significant Difference     Correct conclusion         Type I error
                           (Prob. = 1-β)              (Prob. = α)
Not a Significant          Type II error              Correct conclusion
Difference                 (Prob. = β)                (Prob. = 1-α)

Type I and Type II Errors (cont.)

- Type I error: finding statistical significance when there is no program effect.
  - Easy to control: the conventional alpha level of .05 means that the probability of a Type I error is held to 5% or less.
- Type II error: not obtaining statistical significance when there is a program effect.
  - More difficult to control: the study design has to have adequate statistical power, i.e., the probability that an estimate of the program effect will be statistically significant when it represents a real effect.

Statistical Power

- Statistical power is a function of:
  - the effect size to be detected;
  - the sample size;
  - the type of statistical significance test used;
  - the alpha level set to control Type I error (usually fixed at .05).
- When program effects are not statistically significant, this result is generally taken as an indication that the program failed to produce effects.
- This interpretation of statistically nonsignificant results is technically incorrect if the lack of statistical significance was the result of an underpowered study rather than the program's failure to produce meaningful effects (see the example calculation below).

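A sketch of a power calculation for a two-group comparison using statsmodels; the target effect size and power are illustrative values, not prescribed ones:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a standardized mean difference
# of .35 with 80% power at the conventional .05 alpha level
n_per_group = analysis.solve_power(effect_size=0.35, alpha=0.05, power=0.80)
print(round(n_per_group))  # ~130 per group

# Power of an underpowered study: only 30 per group for the same effect
print(analysis.solve_power(effect_size=0.35, alpha=0.05, nobs1=30))  # ~0.27
```

A study with only 30 cases per group would detect a true effect of .35 roughly a quarter of the time, so a nonsignificant result from it says little about whether the program worked.
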
Practical Significance

- Statistical significance ≠ practical significance.
- A small statistical effect may represent a program effect of considerable practical significance.
- A large statistical effect for a program may be of little practical significance.

Moderator Variables

- A moderator variable characterizes subgroups in an impact assessment for which the program effects may differ.
  - E.g., men vs. women, white vs. black, young vs. old.
- One important role of moderator analysis is to avoid premature conclusions about program effectiveness based only on the overall mean program effects.

Mediator Variables

- A mediator variable is an intervening variable that comes between program exposure and some key outcome; it represents a step on the causal pathway by which the program is expected to bring about change in the outcome.
- Exploring mediator relationships helps the evaluator and the program stakeholders better understand what processes occur among the target population as a result of exposure to the program.

Meta-Analysis

- A statistical analysis of the effect estimates from multiple studies of a topic (see the pooling sketch below):
  - Reports of all available impact assessment studies of a particular intervention or type of program are first collected.
  - The program effects on selected outcomes are encoded.
  - Other descriptive information about the evaluation methods, program participants, and nature of the intervention is also recorded.
- A key threat is publication bias: studies with null or negative findings are less likely to be published, and therefore less likely to be collected, which can inflate the pooled effect.

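A minimal sketch of the pooling step, assuming a fixed-effect (inverse-variance weighted) model and made-up effect sizes:

```python
import numpy as np

def fixed_effect_pool(effects, ses):
    """Inverse-variance weighted mean effect size across studies,
    with the standard error of the pooled estimate."""
    effects, ses = np.asarray(effects), np.asarray(ses)
    w = 1.0 / ses**2                       # precision weights
    pooled = (w * effects).sum() / w.sum()
    return pooled, np.sqrt(1.0 / w.sum())

# Hypothetical standardized mean differences from five impact assessments
print(fixed_effect_pool([0.50, 0.35, 0.10, 0.42, 0.28],
                        [0.20, 0.15, 0.25, 0.18, 0.12]))
```
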