Measuring Progress: Strategies for Monitoring and Evaluation
Rebecca Stoltzfus
“Feedback is the breakfast of champions.” Dannon Nutrition Leadership Institute, 1999
“Any program worth implementing is worth evaluating.” sometime in my graduate studies
“Experienced evaluators design their evaluations to address the specific questions of concern to decision-makers.” Habicht J-P, Victora CG & Vaughan JP. Evaluation designs for adequacy, plausibility and probability of public health programme performance and impact. Internat J Epidemiol 1999; 28:
Starting Questions:
WHO? – Who will the evaluation inform?
WHY? – What questions will it answer?
HOW? – What evaluation design will provide the answers, with sufficient confidence and at lowest cost?
Who are the decision-makers?
– Community organizations and members
– Program implementors
– Policy-makers
– Donor agencies
– Researchers
Different decision-makers need different information, in different forms, to guide their decisions.
Why? What questions do they need to have answered?
Four types of data:
– Provision
– Utilization
– Coverage
– Impact
Questions of Provision
Are the services and supplies available?
– Number of clinics offering iron pills
Are they accessible?
– Proportion of population < 10 km from clinic
Is their quality adequate?
– Proportion of staff with key knowledge about iron
Questions of Utilization
Are the services being used?
– Are clinics being attended?
– Are improved seeds being purchased?
– Are people getting microcredit?
Questions of Coverage
Is the target population getting the intervention?
– Proportion of pregnant women who attend MCH clinic at least twice in pregnancy
– Proportion of children sleeping under a bednet
– Proportion of poor accessing credit
Coverage is significantly more difficult to assess than utilization because it requires a population-based denominator.
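The difference between utilization and coverage can be made concrete with a small calculation. This is only a sketch: the function name `coverage` and all of the numbers are hypothetical, and it assumes the denominator comes from a population-based source such as a household survey rather than from clinic registers.

```python
def coverage(reached_in_target: int, target_population: int) -> float:
    """Proportion of the target population that received the intervention."""
    if target_population <= 0:
        raise ValueError("target_population must be positive")
    if not 0 <= reached_in_target <= target_population:
        raise ValueError("reached_in_target must be between 0 and target_population")
    return reached_in_target / target_population


# Hypothetical utilization figure: visits counted from clinic registers.
# It needs no denominator, but it cannot say who was missed.
clinic_visits = 1200

# Hypothetical coverage figures: a household survey of pregnant women
# supplies the population-based denominator.
pregnant_women_surveyed = 400
attended_at_least_twice = 260

estimate = coverage(attended_at_least_twice, pregnant_women_surveyed)
print(f"Utilization: {clinic_visits} clinic visits")
print(f"Coverage: {estimate:.0%} of surveyed pregnant women attended at least twice")
```

The utilization count comes straight from service records, while the coverage estimate cannot be computed until someone has enumerated or sampled the target population.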
Questions of Impact
Have health outcomes improved?
– Maternal mortality, perinatal mortality, child development, child mortality
Have crop yields increased?
Have household incomes increased?
Three Basic Designs
Monitoring (Adequacy)
– Measuring target indicators over time
Plausibility Evaluation
– Building a reasonable argument for causality, without a randomized trial
Probability Evaluation
– Establishing cause and effect by randomly allocating program and non-program areas
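The random allocation behind a probability evaluation might look like the following. This is a minimal sketch, not a prescribed procedure: the function `allocate_areas`, the village names, and the seed are all hypothetical, and it assumes simple (unstratified) randomization of whole areas into two groups.

```python
import random


def allocate_areas(areas, seed):
    """Randomly split areas into program and non-program groups."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced and audited
    shuffled = list(areas)     # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2  # with an odd count, the extra area goes to non_program
    return {
        "program": sorted(shuffled[:half]),
        "non_program": sorted(shuffled[half:]),
    }


# Hypothetical list of ten eligible villages.
villages = [f"village_{i:02d}" for i in range(1, 11)]
allocation = allocate_areas(villages, seed=2024)
print("Program areas:    ", allocation["program"])
print("Non-program areas:", allocation["non_program"])
```

Because chance alone decides which areas receive the program, differences between the two groups at the outset are not systematically related to the outcome.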
Strength of Evidence
[Figure: the three designs in order, Monitoring (Adequacy), Plausibility Evaluation, Probability Evaluation, with confidence and cost both increasing along that order. The figure also carries the label "Impact Evaluation".]
Why Randomize?
In impact evaluations we want to know:
– Are those who received the intervention better off? (easy)
– Is that benefit attributable to the intervention? (hard)
To attribute the benefit causally, we need to know what would have happened to those who received the intervention had they not received it (the counterfactual).
Many options:
– Crossover designs with self as control
– Pre-post with non-randomized control; difference in differences
– Propensity score matching
– Constructing a counterfactual from other survey data
– Instrumental variables or other econometric methods
– Randomized control; considered most rigorous
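The pre-post option with a non-randomized control can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name and the prevalence figures are hypothetical, and the estimate is only interpretable as a program effect if the two areas would have followed the same trend without the program.

```python
def difference_in_differences(program_pre, program_post,
                              comparison_pre, comparison_post):
    """Change in the program area minus change in the comparison area."""
    program_change = program_post - program_pre
    comparison_change = comparison_post - comparison_pre  # stands in for the counterfactual trend
    return program_change - comparison_change


# Hypothetical anemia prevalence, in percent, before and after the program.
effect = difference_in_differences(
    program_pre=50.0, program_post=35.0,        # program area fell 15 points
    comparison_pre=48.0, comparison_post=43.0,  # comparison area fell 5 points
)
print(f"Estimated program effect: {effect:+.1f} percentage points")
```

In this made-up example the simple pre-post change in the program area is 15 points, but 5 of those points also occurred where there was no program, so only the remaining 10 are credited to the intervention.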
Randomization does not justify “closurization”
“For want of a good name for it, I have chosen a terrible one.... You randomize and then you close your eyes.”
The Axes of WHAT and HOW
Where does your work sit?

              Monitoring   Plausibility   Probability
Provision
Utilization
Coverage
Impact