1
Educational Assessment
Assessment Issues and Program Evaluation Procedures
2
Outcomes Assessment, a.k.a. How Do I Know If I’m Doing What I Think I’m Doing?
1st: Identify what you are trying to do. This may include general outcomes and specific outcomes. For example:
  Increase the number of women entering the fields of math and engineering (general)
  Improve high school girls’ attitudes about math and engineering (specific)
2nd: Identify ways to accurately assess whether these outcomes are occurring.
3rd: Establish a procedure for program evaluation.
3
Identify What You Are Trying To Do
Some examples:
  Change attitudes about math and engineering
  Increase girls’ sense of self-efficacy in math and engineering
  Improve motivation to engage in math and engineering
  Increase skills in math and engineering
  Increase the number of girls who go on to major in math and engineering from your high school
  Increase the number of women who graduate from college with math and engineering majors
Some of these are assessments of attitude, some are assessments of skills, and some are assessments of behavior.
Because long-term outcome assessment is often difficult, we’d like to be able to assess attitudes that should theoretically predict those long-term changes in behavior. It is especially good if we have some empirical knowledge about such a relationship: for example, we know that a sense of self-efficacy in reading is related to the development of future reading skills. I don’t know how much empirical evidence we have for math/engineering, so long-term follow-up would still be really useful.
For now, I’m going to talk in more detail about how to accurately assess attitudes and motivation, which is typically (and most easily) done with questionnaires.
4
Critical Issues for Assessment Tools
Reliability
  Consistency of test scores
  The extent to which performance is not affected by measurement error
Validity
  The extent to which a test actually measures what it is supposed to measure
Reliability: use scale example.
Sometimes your assessment tools are scales that have already been shown to be reliable and valid; use these when you can. Sometimes you must make up your own scale, and then it will be important to evaluate whether it is reliable and valid. Also, a scale that is reliable and valid for one purpose may not be for another purpose, so it is good to always check in your own data.
5
Types of Reliability
Test-Retest
  Correlation between scores on the same test taken on two separate occasions by the same individuals
  Limits: Practice effects, recall of former responses
Alternate Form
  Correlation of scores obtained on two parallel forms
  Limits: May have practice effects, alternate forms often not available
You probably won’t use the first two, but you should know about them.
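Both of these reliability types come down to correlating two sets of scores from the same people. As a minimal sketch, assuming the usual Pearson correlation is the statistic wanted (the slide does not name one) and using made-up scores, the calculation might look like this:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical attitude-scale totals for the same five girls,
# measured on two occasions (illustrative numbers only).
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]
test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {test_retest_r:.2f}")
```

For alternate-form reliability the same function would be applied to scores from the two parallel forms instead of two occasions.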
6
Types of Reliability
Split-half
  Correlation between two halves of a test
  Limits: Shortens the test, which affects reliability; difficult with tests that measure different things in the same test (heterogeneous tests)
Kuder-Richardson and Coefficient Alpha
  Inter-item consistency: Average correlation of each item with every other item
  Limits: Not useful for heterogeneous tests
These are better, because they require only one administration. If you plan to publish the results of your program evaluation, you should be sure to check your measure using one of these techniques:
  K-R for yes-no responses
  Alpha for continuous scale responses
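For readers who want to check inter-item consistency by hand, here is a minimal sketch of coefficient alpha using the standard variance-based formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name and the data are illustrative assumptions, not part of the slides. Applied to items scored 0/1, the same calculation gives the Kuder-Richardson (KR-20) value.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("every respondent needs the same k >= 2 item scores")
    # Variance of each item across respondents.
    item_variances = [pvariance([r[i] for r in responses]) for i in range(k)]
    # Variance of each respondent's total score.
    total_variance = pvariance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical yes/no (1/0) answers from five girls on a four-item scale.
yes_no = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"alpha (KR-20 for 0/1 items) = {coefficient_alpha(yes_no):.2f}")
```

Statistical packages report the same quantity; the sketch is only meant to show what goes into it.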
7
Types of Validity
Content Validity
  Checking to make sure that you’ve picked questions that cover the areas you want to cover, thoroughly and well.
  Difficulties: “Adequate sampling of the item universe.” Important to ensure that all major aspects are covered by the test items and in the correct proportions
  Specific Procedures: Content validity is built into the test from the outset through the choice of appropriate items.
8
Types of Validity
Concurrent and Predictive Validity
  Definition: The relationship between a test and some criterion. The practical validity of a test for a specific purpose.
  Examples:
    Do high school girls who score high on this test go on to succeed in college as engineering majors? (predictive)
    Do successful women engineering majors score high on this test? (concurrent)
  Difficulties: Criterion contamination; trainers must not know examinees’ test scores
  Specific Procedures: Unlimited; they depend on the purpose of the test
9
Types of Validity
Construct Validity
  Definition: The extent to which the test may be said to measure a theoretical construct or trait
  Any data throwing light on the nature of the trait and the conditions affecting its development and manifestations represent appropriate evidence for this validation
  Example: I have designed a program to lower girls’ math phobia. The girls who complete my program should have lower scores on the Math Phobia Measure compared to their scores before the program and compared to the scores of girls who have not completed the program.
10
Optimizing Reliability & Validity
Here are some tips for making sure your test will be reliable and valid for your purpose (circumstances that affect reliability and validity):
  The more questions the better (the number of test items)
  Ask questions several times in slightly different ways (homogeneity)
  Get as many people as you can in your program (N)
  Get different kinds of people in your program (sample heterogeneity)
  (Linear relationship between the test and the criterion)
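The first tip, that more questions are better, can be made concrete with the Spearman-Brown formula. That formula is a standard psychometric result brought in here for illustration; it is not on the slide, and it assumes the added questions are comparable in quality to the existing ones. A minimal sketch:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    reliability   -- reliability of the current test, between 0 and 1
    length_factor -- new length / old length (2 means twice as many items)
    Assumes the added items behave like the existing ones.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a 10-item scale with reliability 0.60,
# lengthened to 20 comparable items.
doubled = spearman_brown(0.60, 2)
print(f"predicted reliability with twice as many items: {doubled:.2f}")
```

The same function with a length factor of 2 is the usual correction applied to a split-half correlation, since each half is only half as long as the full test.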
11
Selecting and Creating Measures
1. Define the construct(s) that you want to measure clearly
2. Identify existing measures, particularly those with established reliability and validity
3. Determine whether those measures will work for your purpose and identify any areas where you may need to create a new measure or add new questions
4. Create additional questions/measures
5. Identify criteria that your measure should correlate with or predict, and develop procedures for assessing those criteria
12
Measuring Outcomes
Pre and post tests
  Involves giving the measure before the intervention/training and again after the intervention, in order to measure change as a result of the intervention
  Important to identify what you are trying to change with your intervention (the constructs) in order to use measures that will pick up that change
  Be sure to avoid criterion contamination
  Limitations: If your group is preselected for the program, the variability will be restricted
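As a rough sketch of how pre/post change might be summarized, the example below computes each participant's gain, the mean gain, and a paired t statistic. The choice of a paired t statistic is an assumption (the slides do not prescribe a test), and the scores are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_change(pre, post):
    """Return (mean gain, paired t statistic) for matched pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched pre and post scores for at least 2 people")
    # Gain for each participant: post score minus her own pre score.
    gains = [after - before for before, after in zip(pre, post)]
    mean_gain = mean(gains)
    sd_gain = stdev(gains)
    if sd_gain == 0:
        raise ValueError("t is undefined when every participant has the same gain")
    t_statistic = mean_gain / (sd_gain / sqrt(len(gains)))
    return mean_gain, t_statistic


# Hypothetical self-efficacy scores for five girls before and after a program.
pre_scores = [10, 12, 9, 14, 11]
post_scores = [13, 14, 10, 18, 12]
mean_gain, t_statistic = paired_change(pre_scores, post_scores)
print(f"mean gain = {mean_gain:.1f}, t = {t_statistic:.2f}, df = {len(pre_scores) - 1}")
```

A gain on its own does not show that the program caused the change; it only describes how much scores moved.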
13
Measuring Outcomes
Follow-up Procedures
  These may involve re-administering your pre/post measure after some interval following the end of the program, or assessing any other criterion that should theoretically be predicted by your intervention, such as:
    choosing to take math/engineering courses
    choosing to major in math/engineering
    choosing a career in math/engineering
14
Measuring Outcomes
Control Groups
  One critical problem faced by anyone who conducts an intervention is whether any observed changes are related to the intervention or to some other factor (e.g., time, preselection, etc.).
  The only way to be sure that your intervention is causing the desired changes is to use a control group.
  The control group must be the same as the treatment group in every way (usually achieved by random assignment to groups), except that the control group does not receive the intervention.
  Any differences between these groups can then be attributed to the intervention.
How do you know whether the girls who chose to attend your program would not have gone on to major in math/engineering anyway?
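A minimal sketch of random assignment, with a simple comparison of mean gains between the two groups. The function names, the fixed seed, and the numbers are illustrative assumptions; a real evaluation would also test whether the difference is larger than chance.

```python
import random
from statistics import mean


def randomly_assign(participant_ids, seed=None):
    """Shuffle participants and split them into (treatment, control) halves."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_gains(treatment_gains, control_gains):
    """Mean pre-to-post gain in the treatment group minus that in the control group."""
    return mean(treatment_gains) - mean(control_gains)


# Hypothetical roster of 20 applicants, identified by number.
treatment_ids, control_ids = randomly_assign(range(1, 21), seed=1)
print("treatment:", sorted(treatment_ids))
print("control:  ", sorted(control_ids))

# Hypothetical gains (post minus pre) observed later in each group.
print("difference in mean gains:", difference_in_gains([3, 2, 4], [1, 0, 2]))
```

Because chance alone decides who lands in which group, preselection cannot explain a difference between them.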
15
Measuring Outcomes
Alternatives to randomly assigned control groups:
  Matched controls
  Comparison groups
  Comparison across programs
Remember, you’ll need to use the same assessment and follow-up procedures for both groups.
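To illustrate the first alternative, here is one simple way matched controls might be chosen: pair each program participant with the non-participant whose pre-test score is closest, using each non-participant at most once. Matching on a single pre-test score and the greedy pairing order are assumptions made for the sketch; in practice you might match on several characteristics.

```python
def match_controls(participants, candidates):
    """Greedy nearest-score matching without replacement.

    participants -- dict of participant id -> pre-test score
    candidates   -- dict of non-participant id -> pre-test score
    Returns a dict of participant id -> matched candidate id.
    """
    available = dict(candidates)  # copy, so the caller's dict is untouched
    matches = {}
    # Work through participants from lowest to highest score.
    for pid, score in sorted(participants.items(), key=lambda item: item[1]):
        if not available:
            break  # more participants than candidates; the rest stay unmatched
        closest = min(available, key=lambda cid: abs(available[cid] - score))
        matches[pid] = closest
        del available[closest]  # each candidate can be used only once
    return matches


# Hypothetical pre-test scores.
program_girls = {"p1": 10, "p2": 20}
other_girls = {"c1": 19, "c2": 11, "c3": 30}
print(match_controls(program_girls, other_girls))
```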
16
Comparing Across Programs
In order to compare successfully across programs, you will also need to assess:
  Program characteristics
  Participant characteristics
So you will also need to ask yourselves:
  What are the important aspects of the programs that I should know about?
  What are the important characteristics of the girls that I should know about?
The procedure most of you will probably use is comparison across programs, and this is a big part of why we all came together for this workshop. So I want to talk a bit about how to do this.
17
An Ideal Outcome Assessment
1. All participants fill out initial questionnaires
2. Participants are randomly assigned to conditions
3. Treatment group receives intervention; control group receives no intervention
4. All participants fill out post-questionnaires
5. All participants are followed through college and to first job
18
A More Realistic Outcome Assessment?
1. Girls involved in each program fill out pre-tests and client characteristics
2. Girls participate in programs
3. Girls fill out post-questionnaires
4. Each program reports data & program characteristics
5. Programs conducting follow-ups report follow-up data