Balancing Rigor and Reality: Evaluation Designs for 4-H Youth Development Programs
Mary E. Arnold, Ph.D.
4-H Youth Development Specialist, Program Planning and Evaluation
Oregon State University
Elements of Rigor
Evaluation design
Conceptualization of program constructs & outcomes
Measurement strategies
Timeframe of the evaluation study
Program integrity
Program participation and attrition
Statistical analyses
Braverman, M. T., & Arnold, M. E. (2008). An evaluator’s balancing act: Maintaining rigor while being responsive to multiple stakeholders. In M. T. Braverman, M. Engel, R. A. Rennekamp, & M. E. Arnold (Eds.), Program evaluation in a complex organizational system: Lessons from Cooperative Extension. New Directions for Evaluation, 120,
Rigor and the 4-H Organization
Who determines standards of rigor?
How do decisions about evaluation methods get made?
How, and to what extent, is the quality of a completed evaluation determined?
Post-Only Design
(X O)
O = “Observation” (data collection); X = “Intervention” (program); E = Experimental group (program participants); C = Control group (non-participants)
Evaluation question example: What life skills do campers report developing at 4-H camp?
Attending this camp gave me the opportunity to:
(1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree)
To work with others as a team
To feel good about myself
To be independent
To make me want to try new things
To be responsible
To cooperate with others
To talk to others more easily
To work through disagreements
What can be said?
Percentage of youth ratings for each item (frequencies)
Mean ratings for each item
Ranking of items from highest to lowest mean rating
What is the level of rigor?
What cannot be said?
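The three descriptive summaries listed above take only a few lines to compute. The following is a minimal Python sketch, assuming each item's ratings are stored as a list of integers on the 1-4 scale; the ratings, the `responses` dictionary, and the function names `summarize` and `rank_items` are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter
from statistics import mean

# Hypothetical ratings on the 1-4 scale (1 = Strongly Disagree ... 4 = Strongly Agree).
# Item wording follows the camp survey; the numbers are made up for illustration.
responses = {
    "To work with others as a team": [4, 4, 3, 4, 3],
    "To feel good about myself": [3, 4, 3, 2, 4],
    "To be independent": [2, 3, 3, 3, 4],
}

SCALE = (1, 2, 3, 4)


def summarize(ratings):
    """Return (percentage of youth choosing each scale point, mean rating) for one item."""
    counts = Counter(ratings)
    n = len(ratings)
    percentages = {point: 100 * counts.get(point, 0) / n for point in SCALE}
    return percentages, mean(ratings)


def rank_items(responses):
    """Rank items from highest to lowest mean rating."""
    means = {item: mean(ratings) for item, ratings in responses.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for item, ratings in responses.items():
        pct, avg = summarize(ratings)
        print(f"{item}: {pct} mean={avg:.2f}")
    for rank, (item, avg) in enumerate(rank_items(responses), start=1):
        print(rank, item, round(avg, 2))
```

The sketch treats the four response options as equally spaced numbers when computing means, which is common practice for items like these even though the scale is ordinal; the frequency table does not rely on that assumption.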
Post-Only Control Group Design
E (X O)
C (--- O)
Evaluation question example: Do youth who attend 4-H summer science camp have better science skills than youth who do not attend?
Please fill in the circle that tells how much you currently can use each of the following skills when you work on a science investigation:
(Never / Sometimes / Usually / Always)
I can use scientific knowledge to form a question  O O O O
I can ask a question that can be answered by collecting data  O O O O
I can design a scientific procedure to answer a question  O O O O
What can be said?
Percentage of youth ratings for each item, for each group (frequencies)
Mean ratings for each item, for each group
Ranking of items from highest to lowest mean rating, for each group
Comparison between groups for each of the above
With enough cases, a statistical test for significant differences between the two groups can be conducted
What is the level of rigor?
What cannot be said?
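One common choice for the between-group test mentioned above is Welch's two-sample t-test, which does not assume the two groups have equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library only; the function name `welch_t` and the example ratings are hypothetical, and converting the statistic to a p-value is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two ratings; equal variances are not assumed.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard errors of each group mean, using sample variances (n - 1).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    campers = [3, 4, 3, 4, 4, 3]      # hypothetical ratings for one item
    non_campers = [2, 3, 2, 3, 3, 2]
    t, df = welch_t(campers, non_campers)
    print(f"t = {t:.2f}, df = {df:.1f}")  # t = 3.16, df = 10.0
```

If SciPy is available, `scipy.stats.ttest_ind(campers, non_campers, equal_var=False)` runs the same test and also reports a p-value. Because the ratings are ordinal, some evaluators prefer a rank-based alternative such as the Mann-Whitney U test (`scipy.stats.mannwhitneyu`).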
One-Group Pre-Test/Post-Test
(O X O)
Evaluation question example: Do youth have higher levels of positive youth development at the end of the program than they did at the beginning?
Please indicate your level of agreement with each item:
(Strongly Disagree / Disagree / Agree / Strongly Agree)
I feel good about my scholastic ability  O O O O
I feel accepted by my friends  O O O O
I can figure out right from wrong  O O O O
I can do things that make a difference  O O O O
What can be said?
Percentage of youth ratings for each item, before and after the program (frequencies)
Mean ratings for each item, before and after the program
Pre- and post-program comparisons for each of the above
With enough cases, a statistical test for significant differences between pre- and post-program ratings can be conducted
What is the level of rigor?
What cannot be said?
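For a one-group pre/post design, the usual form of that test is a paired-samples t-test, which requires matching each youth's pre-program rating to the same youth's post-program rating (for example, with an ID code). A minimal sketch, assuming complete matched pairs; the function name `paired_t` and the ratings are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom for matched pre/post ratings.

    pre[i] and post[i] must belong to the same youth; at least two youth are needed.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must hold one rating per youth, in the same order")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the per-youth differences
    if sd == 0:
        raise ValueError("all differences are identical; the t statistic is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    pre = [2, 3, 2, 3, 2]   # hypothetical ratings for one item, same five youth
    post = [3, 3, 3, 4, 3]
    t, df = paired_t(pre, post)
    print(f"t = {t:.2f}, df = {df}")  # t = 4.00, df = 4
```

Unmatched surveys (for example, anonymous forms) cannot be analyzed this way; in that case only the group-level summaries can be compared.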
Retrospective Pre-Test
(O X O), with both observations collected after the program
Evaluation question example: Do youth have higher levels of positive youth development at the end of the program than they did at the beginning?
For each of the following items, please indicate how you felt before participating in this program and how you feel now, after participating in this program.
(1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree)
I feel accepted by my friends  Before: O O O O  After: O O O O
I can figure out right from wrong  Before: O O O O  After: O O O O
I can do things that make a difference  Before: O O O O  After: O O O O
What can be said?
Percentage of youth ratings for each item, before and after the program (frequencies)
Mean ratings for each item, before and after the program
Pre- and post-program comparisons for each of the above
With enough cases, a statistical test for significant differences between pre- and post-program ratings can be conducted
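Because each youth gives the "before" and "after" ratings on the same end-of-program form, every response is already a matched pair. A minimal sketch of the per-item summaries, assuming each youth's answers are stored as a `(before, after)` tuple; the `retro` dictionary and the `retro_summary` function are hypothetical, and a paired-samples test could be applied to the same pairs.

```python
from statistics import mean

# Hypothetical retrospective answers: one (before, after) pair per youth on the
# 1-4 scale, with both ratings given on the same end-of-program form.
retro = {
    "I feel accepted by my friends": [(2, 3), (3, 3), (2, 4), (3, 4)],
    "I can figure out right from wrong": [(3, 4), (4, 4), (3, 3), (2, 3)],
}


def retro_summary(pairs):
    """Mean before, mean after, mean change, and percent of youth rating 'after' higher."""
    before = [b for b, _ in pairs]
    after = [a for _, a in pairs]
    improved = sum(1 for b, a in pairs if a > b)
    return {
        "mean_before": mean(before),
        "mean_after": mean(after),
        "mean_change": mean(after) - mean(before),
        "pct_improved": 100 * improved / len(pairs),
    }


if __name__ == "__main__":
    for item, pairs in retro.items():
        print(item, retro_summary(pairs))
```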
Control Group Pre-Test/Post-Test
E (O X O)
C (O --- O)
Evaluation question example: Do youth have higher levels of positive youth development at the end of the program than they did at the beginning?
Please indicate your level of agreement with each item:
(Strongly Disagree / Disagree / Agree / Strongly Agree)
I feel good about my scholastic ability  O O O O
I feel accepted by my friends  O O O O
I can figure out right from wrong  O O O O
I can do things that make a difference  O O O O
What can be said?
Percentage of youth ratings for each item, for both groups, before and after the program (frequencies)
Mean ratings for each item, for both groups, before and after the program
Pre- and post-program comparisons between groups
With enough cases, a statistical test for significant differences between the groups, before and after the program, can be conducted
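With a control group, one simple comparison is the difference between the two groups' average gains (post minus pre), sometimes called a difference-in-differences estimate. A minimal sketch, assuming ratings are matched youth by youth within each group; the function names `mean_gain` and `difference_in_gains` and the data are hypothetical. A formal test could apply a two-sample test to the per-youth gains or use analysis of covariance with the pre-test as a covariate; neither is implemented here.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-minus-pre change for one group; pre[i] and post[i] are the same youth."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be matched youth by youth")
    return mean(after - before for before, after in zip(pre, post))


def difference_in_gains(e_pre, e_post, c_pre, c_post):
    """Experimental group's mean gain minus the control group's mean gain."""
    return mean_gain(e_pre, e_post) - mean_gain(c_pre, c_post)


if __name__ == "__main__":
    # Hypothetical ratings for one item.
    e_pre, e_post = [2, 3, 2, 3], [3, 4, 3, 3]
    c_pre, c_post = [2, 3, 3, 2], [2, 3, 3, 3]
    print(mean_gain(e_pre, e_post), mean_gain(c_pre, c_post))  # 0.75 0.25
    print(difference_in_gains(e_pre, e_post, c_pre, c_post))   # 0.5
```

Subtracting the control group's gain removes change that would likely have happened anyway (for example, from maturation or the passage of time), which is the main reason this design is more rigorous than the one-group pre/post design.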
Time Series Design with Control Group
E (O O X O O)
C (O O --- O O)
Evaluation question example: Do youth have higher levels of positive youth development at the end of the program than they did at the beginning?
Please indicate your level of agreement with each item:
(Strongly Disagree / Disagree / Agree / Strongly Agree)
I feel good about my scholastic ability  O O O O
I feel accepted by my friends  O O O O
I can figure out right from wrong  O O O O
I can do things that make a difference  O O O O
What can be said?
Percentage of youth ratings for each item, for both groups, before and after the program (frequencies)
Mean ratings for each item, for both groups, before and after the program
Pre- and post-program comparisons between groups
With enough cases, a statistical test for significant differences between the groups, before and after the program, can be conducted
In some cases, more sophisticated analyses, such as latent growth curve modeling
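The repeated observations before and after the program make it possible to check whether scores were already changing before youth took part. The sketch below computes each group's mean at every wave and compares the change between the two pre-program waves with the change across the program. It is a descriptive illustration only, not latent growth curve modeling, and it assumes every youth was observed at every wave; the names `wave_means`, `change_summary`, and `n_pre`, and all data, are hypothetical.

```python
from statistics import mean


def wave_means(group):
    """Mean rating at each wave; group is a list of equal-length per-youth score lists."""
    return [mean(scores_at_wave) for scores_at_wave in zip(*group)]


def change_summary(group, n_pre=2):
    """Compare the baseline trend with the change across the program.

    n_pre is the number of waves observed before the program (2 in E (O O X O O)).
    """
    m = wave_means(group)
    return {
        "baseline_change": m[n_pre - 1] - m[0],            # first to last pre-program wave
        "change_across_program": m[n_pre] - m[n_pre - 1],  # last pre wave to first post wave
    }


if __name__ == "__main__":
    # Hypothetical ratings for one item at waves 1-4; X falls between waves 2 and 3.
    program = [[2, 2, 3, 3], [2, 3, 4, 4]]
    control = [[2, 2, 2, 3], [3, 3, 3, 3]]
    for label, group in (("E", program), ("C", control)):
        print(label, wave_means(group), change_summary(group))
```

A change across the program that is larger than both the group's own baseline trend and the control group's change over the same waves is more persuasive evidence of a program effect than a single pre/post difference.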
Time Series Design with Control Group
O O O O X O O O O O O O X O O O