POLS 7170X Master’s Seminar: Program/Policy Evaluation
Class 6-7, Brooklyn College-CUNY
Shang E. Ha

Quasi-Experimental Impact Assessment
• A randomized field experiment is the strongest research design for assessing program impact
• When a randomized design is not feasible, there are alternative research designs that an evaluator can use
  - Even when well crafted and implemented, these alternative designs may still yield biased estimates of program effects; such biases systematically exaggerate or diminish program effects, and the direction the bias may take cannot usually be known in advance

Bias in Estimation of Program Effects
• A program effect: the difference between the outcome observed for targets exposed to the program and the outcome that would have occurred for those same targets, all other things being equal, had they not been exposed to the program
• Bias comes into the picture when either the measurement of the outcome with program exposure or the estimate of what the outcome would have been without program exposure is higher or lower than the corresponding “true” value

Bias: An Example
• A reading program for young children that emphasizes vocabulary development
  - We have an appropriate vocabulary test for measuring the outcome
  - We use this test to measure the children’s vocabulary before and after the program
  - We conduct a simple pre-post comparison [Exhibit 9-A]
• The vocabulary of young children tends to increase over time anyway, so the pre-post gain overstates the program effect!

Selection Bias
• A group comparison design in which the groups have not been formed through randomization is known as a nonequivalent comparison design
• When equivalence between the treatment group and the control group does not hold, preexisting differences between the groups contribute to the difference in outcomes, producing a form of bias in the estimate of program effects (selection bias)

Selection Bias
• Consider a program in which a group of individuals volunteer to participate and those who do not volunteer are used as the control group
  - Because we are unlikely to know all the relevant differences between volunteers and nonvolunteers, we have limited ability to determine the nature and extent of the bias

Attrition
• Even in well-executed randomized field experiments, bias can occur
• Attrition
  - Targets drop out of the intervention or control group and cannot be reached
  - Targets refuse to cooperate in outcome measurement
  - Cf. failure to treat

Other Sources of Bias
• Secular trends: relatively long-term trends in the community, region, or country
  - In a period when a community’s birth rate is declining, a program to reduce fertility may appear effective because of bias stemming from that downward trend
• Interfering events: short-term events
  - A natural disaster may make it appear that a program to increase community cooperation has been effective, when in reality it is the crisis situation that has brought community members together
• Maturation: natural maturational and developmental processes can produce considerable change independently of the program
  - A program to improve preventive health practices among adults may seem ineffective because health generally declines with age

Matching
• The intervention group is typically specified first, and the evaluator then constructs a control group by selecting targets unexposed to the intervention that match those in the intervention group on selected characteristics
• To the extent that the matching falls short of equating the groups on characteristics that will influence the outcome, selection bias will be introduced into the resulting program effect estimate

Matching Procedures
• Individual matching: draw a “partner” for each target who receives the intervention from the pool of potential targets unexposed to the program
  - Relevant matching variables: age, gender, father’s occupation, hours of work, etc.
• Aggregate matching: individuals are not matched case by case; instead, the overall distributions of the intervention and control groups on each matching variable are made comparable

Problems of Individual Matching
• Individual matching is usually preferable to aggregate matching
• But individual matching is more time-consuming and difficult to execute for a large number of matching variables
• Matching by individuals can sometimes result in a drastic loss of cases
  - If matching persons cannot be found for some individuals in the intervention group, those unmatched individuals have to be discarded as data sources

Statistical Controls
• The functional equivalent of matching
• Any program effect estimate based on a simple comparison of the outcomes for the intervention and control groups must be presumed to include selection bias
• If the relevant differences between the groups can be measured, statistical techniques can be used to attempt to control for the differences between groups that would otherwise lead to biased program effect estimates

Statistical Controls: Illustration

Panel A: Outcome Comparison
              Participants    Non-Participants
  Ave. Wage   $7.75           $8.20

Panel B: Outcome Comparison after Adjusting for Educational Attainment
              Participants                    Non-Participants
              Less than HS    High School     Less than HS    High School
  Ave. Wage   $7.60           $8.10           $7.75           $8.50

Panel C: Outcome Comparison after Adjusting for Education and Employment
              Participants                            Non-Participants
              Less than HS/UnEm    HS/UnEm            Less than HS/UnEm   Less than HS/Emp   HS/UnEm   HS/Emp
  Ave. Wage   $7.60                $8.10              $7.50               $7.83              $8.00     $8.60
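A rough sketch of the logic behind these panels, assuming a small hypothetical individual-level data set (the column names and wage values below are invented, not taken from the slides): the adjusted comparison simply computes mean wages within strata of the control variable, separately for participants and non-participants, instead of comparing the two groups overall.

# Minimal sketch of the stratified ("statistical control") comparison illustrated above.
# The data frame and its values are hypothetical; only the logic is illustrated.
import pandas as pd

df = pd.DataFrame({
    "participant": [1, 1, 1, 1, 0, 0, 0, 0],
    "education":   ["<HS", "<HS", "HS", "HS", "<HS", "<HS", "HS", "HS"],
    "wage":        [7.50, 7.70, 8.00, 8.20, 7.60, 7.90, 8.40, 8.60],
})

# Panel A logic: unadjusted comparison of mean wages
print(df.groupby("participant")["wage"].mean())

# Panel B logic: compare mean wages within educational-attainment strata
print(df.groupby(["education", "participant"])["wage"].mean())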

How to Read Regression Results?
• The adjustments shown in the previous slide were made in a very simple way to illustrate the logic of statistical controls
• In actual application, the evaluator would generally use multivariate statistical methods to control for a number of group differences simultaneously

How to Read Regression Results?

Regression Results Predicting Improvement in Test Scores (scale 0-100)
                          Coefficient    Standard Error
  Program Intervention    12.34*         3.45
  Constant                56.23*         20.23
  # of Students = 5,000
  * Statistically significant at p < .05
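A table like this would typically come from a multivariate model. The sketch below is a minimal, hypothetical version of that step: an OLS regression of a simulated outcome on a binary program indicator using statsmodels. The variable names (score_gain, program) and the simulated numbers are assumptions for illustration; a real impact assessment would also include the covariates serving as statistical controls.

# Minimal sketch of the regression behind a "coefficient / standard error" table.
# Data and variable names are hypothetical; only the logic is illustrated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "program": rng.integers(0, 2, n),   # 1 = intervention group, 0 = control group
})
# Simulated outcome: a baseline gain plus a program effect plus noise
df["score_gain"] = 56 + 12 * df["program"] + rng.normal(0, 20, n)

model = smf.ols("score_gain ~ program", data=df).fit()
print(model.summary())   # the coefficient on 'program' is the estimated program effect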

Simple Pre-Post Studies
• Outcomes are measured on the same targets before program participation and again after participation long enough for effects to be expected
  - E.g., the effects of Medicare (before eligibility vs. after eligibility)
• In general, simple pre-post designs provide biased estimates of program effects that have little value for purposes of impact assessment

Quasi-Experimental Design: Cautions
• The advantages of quasi-experimental research designs for program impact assessment rest entirely on their practicality and convenience in situations where randomized field experiments are not feasible
• Under favorable circumstances and carefully done, quasi-experiments can yield estimates of program effects that are comparable to those derived from randomized designs, but they can also produce wildly erroneous results

The Magnitude of a Program Effect
• The most direct way to characterize the magnitude of the program effect is simply the numerical difference between the means of the two outcome values (treatment/intervention group vs. control group)
• Problem with a simple numerical difference between means: it is very specific to the particular measurement instrument
  - E.g., the effect of a program on knowledge about drug abuse: treatment group .17 and control group .15, a .02 increase
  - What is the scale of the outcome variable “knowledge about drug abuse”? Without knowing the scale, .02 is hard to interpret

Standardized Mean Difference
• The standardized mean difference expresses the mean outcome difference between the intervention group and a control group in standard deviation units
• The standard deviation is a statistical index of the variation across individuals or other units on a given measure; it provides information about the range or spread of the scores
• Describing the size of a program effect in standard deviation units indicates how large it is relative to the range of scores, from lowest to highest, recorded in the study
  - Example: a preschool program’s standardized mean difference effect sizes
  - A test of reading readiness: .50 (the mean score for the intervention group is half a standard deviation higher than that for the control group)
  - A test of advancing vocabulary: .35

Standardized Mean Difference
• By convention, the standardized mean difference effect size is given a positive value when the outcome is more favorable for the intervention group and a negative value when the control group is favored
• See [Exhibit 10-A] for the formula
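The exhibit itself is not reproduced in this transcript; the standardized mean difference is conventionally computed with the pooled standard deviation in the denominator. A sketch of that standard form (Exhibit 10-A may present an equivalent variant):

ES = \frac{\bar{X}_{\text{intervention}} - \bar{X}_{\text{control}}}{sd_{\text{pooled}}},
\qquad
sd_{\text{pooled}} = \sqrt{\frac{(n_i - 1)s_i^2 + (n_c - 1)s_c^2}{n_i + n_c - 2}}

where s_i, s_c and n_i, n_c are the standard deviations and sample sizes of the intervention and control groups.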

Odds Ratio
• The odds ratio tends to be preferred when outcome variables are binary (e.g., pregnant or not, graduated or not)
• An odds ratio indicates how much smaller or larger the odds of an outcome event are for the intervention group compared to the control group
  - An odds ratio of 1.0 indicates even odds; that is, participants in the intervention group were no more and no less likely than controls to experience the outcome
  - Odds ratios greater (smaller) than 1.0 indicate that intervention group members were more (less) likely to experience the outcome
  - An odds ratio of 2.0 means that the odds of experiencing the outcome for members of the intervention group were twice the odds for members of the control group

Odds Ratio

                        Positive Outcome    Negative Outcome
  Intervention Group    p                   1 - p
  Control Group         q                   1 - q

  Odds Ratio = [p/(1-p)] / [q/(1-q)]
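A hypothetical worked example (numbers invented for illustration): suppose 60% of the intervention group graduates (p = .60) and 40% of the control group graduates (q = .40). Then the odds ratio is [.60/.40] / [.40/.60] = 1.50 / 0.67 ≈ 2.25. The intervention group’s odds of graduating are about 2.25 times the control group’s odds, even though its graduation rate is only 1.5 times as high, which is why an odds ratio of 2.0 should not be read as “twice as likely.”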

Statistical Significance
• We would like to know whether an observed program effect is real or occurred by chance (statistical noise)
• If the estimate of the program effect is large relative to the expected level of statistical noise, we can be relatively confident that we have detected a real effect and not a chance pattern of noise
• If the program effect estimate is small relative to statistical noise, we have little confidence that we have observed a real program effect
• Conventionally, statistical significance is set at the .05 alpha level: the chance of a pseudo-effect produced by noise being as large as the observed program effect is 5% or less, i.e., we have 95% confidence that the observed effect is not simply the result of statistical noise
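A minimal sketch of a significance test in this spirit, using simulated outcome data (the group means, spreads, and sample sizes below are invented): a two-sample t test of the intervention-control difference, judged against the conventional .05 alpha level.

# Minimal sketch: testing whether an intervention-control difference in mean
# outcomes is statistically significant at the conventional .05 alpha level.
# The data are simulated; variable names are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(loc=52, scale=10, size=200)   # simulated outcomes, intervention group
control = rng.normal(loc=50, scale=10, size=200)     # simulated outcomes, control group

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Difference is statistically significant at the .05 level")
else:
    print("Difference is not statistically significant at the .05 level")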

Type I and Type II Errors

  Results of Significance          Population Circumstances
  Test on Sample Data              Intervention and Control      Intervention and Control
                                   Means Differ                  Means Do Not Differ
  Significant Difference           Correct conclusion            Type I Error
                                   (Prob. = 1 - β)               (Prob. = α)
  Not a Significant Difference     Type II Error                 Correct conclusion
                                   (Prob. = β)                   (Prob. = 1 - α)

Type I and Type II Errors
• Type I error: finding statistical significance when there is no program effect
  - Easy to control: the conventional .05 alpha level means that the probability of a Type I error is held to 5% or less
• Type II error: not obtaining statistical significance when there is a real program effect
  - Difficult to control: the study design has to have adequate statistical power (the probability that an estimate of the program effect will be statistically significant when it represents a real effect)

Statistical Power
• Statistical power is a function of:
  - The effect size to be detected
  - The sample size
  - The type of statistical significance test used
  - The alpha level set to control Type I error (usually fixed at .05)
  [Example]
• When program effects are not statistically significant, this result is generally taken as an indication that the program failed to produce effects
• This interpretation of statistically nonsignificant results is technically incorrect if the lack of statistical significance was the result of an underpowered study rather than a failure of the program to produce meaningful effects (see the power calculation sketch below)
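A minimal power-calculation sketch for a two-group comparison of means, assuming hypothetical planning values (a standardized effect size of .25, the conventional .05 alpha level, 80% desired power); it uses statsmodels' TTestIndPower.

# Minimal sketch of a power calculation for a two-group comparison of means.
# The planning values (effect size, alpha, desired power, sample size) are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many cases per group are needed to detect a standardized mean difference
# of .25 with 80% power at the .05 alpha level?
n_per_group = analysis.solve_power(effect_size=0.25, alpha=0.05, power=0.80)
print(f"Required sample size per group: {n_per_group:.0f}")

# Conversely: with only 50 cases per group, what power do we have for that effect size?
power = analysis.solve_power(effect_size=0.25, nobs1=50, alpha=0.05)
print(f"Power with 50 cases per group: {power:.2f}")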

Practical Significance
• Statistical significance ≠ practical significance
  - A small statistical effect may represent a program effect of considerable practical significance
  - A large statistical effect for a program may be of little practical significance

Moderator Variables
• A moderator variable characterizes subgroups in an impact assessment for which the program effects may differ
  - Men vs. women, white vs. black, young vs. old
• One important role of moderator analysis is to avoid premature conclusions about program effectiveness based only on the overall mean program effects

Mediator Variables
• A mediator variable is an intervening variable that comes between program exposure and some key outcome; it represents a step on the causal pathway by which the program is expected to bring about change in the outcome
• Exploration of mediator relationships helps the evaluator and the program stakeholders better understand what processes occur among the target population as a result of exposure to the program

Meta-Analysis
• A statistical analysis of the effect estimates from multiple studies of a topic
  - Reports of all available impact assessment studies of a particular intervention or type of program are first collected
  - The program effects on selected outcomes are encoded
  - Other descriptive information about the evaluation methods, program participants, and nature of the intervention is also recorded
• Publication bias: studies with statistically significant or favorable results are more likely to be published, which can distort the pooled estimate
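A minimal sketch of the core meta-analytic computation under a simple fixed-effect model: each study's effect size is weighted by the inverse of its squared standard error, and the weighted mean is taken as the pooled estimate. The effect sizes and standard errors below are invented for illustration.

# Minimal sketch: fixed-effect (inverse-variance weighted) mean effect size.
# Effect sizes and standard errors are invented for illustration only.
import numpy as np

effect_sizes = np.array([0.50, 0.35, 0.10, 0.42])   # standardized mean differences from 4 studies
std_errors = np.array([0.20, 0.15, 0.25, 0.18])     # their standard errors

weights = 1.0 / std_errors**2                        # inverse-variance weights
pooled = np.sum(weights * effect_sizes) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"Pooled effect size: {pooled:.3f} (SE = {pooled_se:.3f})")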