CRJS 4466 PROGRAM & POLICY EVALUATION
LECTURE #4
- Test #1 results
- Evaluation projects
- Questions?
Measurement in Program Evaluation
Test-measurement theory: observed score on measure = true score + error
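A small sketch of how an observed score breaks down into a true score and error, assuming the two parts simply add. The numbers are invented for illustration; the errors are chosen to average zero and to be uncorrelated with the true scores, which is what lets the variances add up. The "reliability" ratio computed at the end is a common textbook summary, not something stated on the slide.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors for four cases.
# The errors average zero and are uncorrelated with the true scores.
true_scores = [10, 20, 30, 40]
errors = [1, -1, -1, 1]

# Observed score = true score + error (assumed additive).
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)
var_error = pvariance(errors)
var_observed = pvariance(observed)

# Share of the observed variance that comes from the true scores.
reliability = var_true / var_observed

print("observed scores:", observed)
print("variances (true, error, observed):", var_true, var_error, var_observed)
print("reliability ratio:", round(reliability, 3))
```

The less error variance there is relative to true-score variance, the closer this ratio gets to 1.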
Deductive/Inductive Model
Theory level: Concept - Proposition - Concept
Variable - Hypothesis - Variable
Operationalization (on each side of the diagram)
Indicator(s) - Indicator(s)
Empirical level
Conceptual Framework
Intended Inputs -> Program Components -> Intended Outputs -> Intended Outcomes
Measures (the label appears twice in the diagram)
An Example
- Let's begin with an example
- The photo radar program in BC is intended to reduce the number of speed-related motor vehicle collisions on BC roadways
- We can model it: Photo Radar Program -> Fewer speed-related motor vehicle collisions
An Example
- If we want to measure the performance of the program, we need to translate the intended outcome into observables
- Our conceptual framework for measurement outlines the process
Table 4-2: Program Logic of the Vancouver Radar Camera Intervention
Measurement Process | Our Example | Criteria/Issues for Measurement
Construct | Speed-related motor vehicle collisions | Is the construct clearly stated?
Measurement procedures (the actual steps we use to gather the data) | Attending police officer's assessment of whether speed was a contributing factor; recorded in an accident report; entered into a database | Are the measurement procedures valid and reliable?
Figure 4-2: Measuring Constructs in Evaluations
Measuring Mental Constructs
STIMULI
-We ask survey questions
-We try to control how the questions are asked
-Intended survey questions or survey items are stimuli
While we are asking the questions, uncontrolled things happen:
-Interviewer characteristics
-Setting characteristics
-Interviewee characteristics
-Instrument characteristics
The Person's: KNOWLEDGE, ATTITUDES, EXPERIENCE
RESPONSES
-Valid and reliable responses to survey items (useful data)
-Responses to uncontrolled stimuli (noise)
-These produce invalid or unreliable data
The challenge is to separate useful data from noise
Validity and Reliability of Measures
- Validity: does the variable actually measure the corresponding construct?
  - In our example of the photo radar program, do we believe that police officers can actually tell whether speed was a contributing factor in a motor vehicle accident?
- Reliability: if we repeat the measurement process for a construct in a given situation, do we get the same result?
  - In a given accident situation, would independent observers reach the same conclusions about speed being a contributing factor?
Types of Validity
There are different ways of assessing validity; several are relevant here
- Face validity: do we judge the measurement process/variable to validly represent the construct?
- Content validity: would experts in the field say that the measure captures the meaning of the construct?
- Concurrent validity: does the measure correlate with another measure that is valid?
  - Example: measuring crime levels (police reports and victim surveys)
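One way to check concurrent validity is to correlate the two measures. The sketch below computes a Pearson correlation between police-reported crime counts and victim-survey counts. The five pairs of numbers are invented for illustration, and `pearson_r` is a helper written for this example, not part of any library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical counts for five neighbourhoods (invented data).
police_reports = [120, 150, 90, 200, 170]
victim_survey = [300, 360, 250, 480, 410]

r = pearson_r(police_reports, victim_survey)
print("correlation between the two crime measures:", round(r, 3))
```

A high correlation supports concurrent validity only to the extent that the comparison measure is itself valid.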
Types of Reliability
We can also assess reliability in different ways
- Having two or more independent observers take measurements in a given situation
  - Two police officers completing accident report forms
- Having the same observer repeat the measurement process in a given situation
  - Police officer repeats the assessment of possible contributing causes of the accident
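Agreement between two independent observers can be summarized with a simple percentage, or with Cohen's kappa, which discounts the agreement expected by chance. Kappa is a standard statistic that is brought in here for illustration; the slide does not name it. The ten accident codings below are invented (1 = speed judged a contributing factor, 0 = not).

```python
def percent_agreement(a, b):
    """Share of cases on which the two observers gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    categories = set(a) | set(b)
    # Chance agreement: product of each observer's rate, summed over categories.
    p_expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codings of the same ten accidents by two officers.
officer_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
officer_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

agreement = percent_agreement(officer_a, officer_b)
kappa = cohens_kappa(officer_a, officer_b)
print("percent agreement:", agreement)
print("Cohen's kappa:", round(kappa, 3))
```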
Tests for Checking Reliability
- Test-retest method: take the same measurement more than once.
- Split-half method: make more than one measurement of a social concept (e.g., prejudice), split the items into two sets, and compare the results.
- Use established measures.
- Check the reliability of research workers.
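A rough sketch of the split-half idea: score the odd-numbered and even-numbered items separately, then correlate the two half scores. The item responses are invented, `pearson_r` is a helper written for this example, and the Spearman-Brown step at the end is a common adjustment added here as an assumption; the slide does not mention it.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical answers from five respondents to six items (each scored 1-5).
item_scores = [
    [1, 2, 1, 2, 1, 1],
    [2, 2, 3, 2, 2, 3],
    [3, 3, 3, 4, 3, 3],
    [4, 4, 5, 4, 4, 4],
    [5, 5, 4, 5, 5, 5],
]

# Split the items into two halves: items 1, 3, 5 versus items 2, 4, 6.
half_a = [sum(row[0::2]) for row in item_scores]
half_b = [sum(row[1::2]) for row in item_scores]

r_half = pearson_r(half_a, half_b)

# Spearman-Brown adjustment: estimates reliability of the full-length scale.
split_half_reliability = 2 * r_half / (1 + r_half)

print("half scores:", half_a, half_b)
print("correlation between halves:", round(r_half, 3))
print("adjusted split-half reliability:", round(split_half_reliability, 3))
```

The same `pearson_r` helper could be applied to test-retest data by correlating scores from the first and second administrations.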
Characteristics of Variables
- Variables can categorize (nominal variables)
  - Categories must be mutually exclusive and jointly exhaustive
  - In a job training program, clients could be categorized as being on social assistance or not
- Variables can rank (ordinal variables)
  - Categories are ranked from less to more
  - In a job training program, clients could be asked to rate the program: not beneficial, somewhat beneficial, very beneficial
- Variables can count (interval and ratio variables)
  - There is a unit of measurement
  - Number of weeks of job training
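The type of variable limits which summaries make sense. The sketch below uses invented records for five job training clients and applies one conventional summary per type: the mode for a nominal variable, the median for an ordinal variable, and the mean for a count. The pairing of summaries to types is a common rule of thumb, not something stated on the slide.

```python
from statistics import mean, median, mode

# Hypothetical records for five job training clients (invented data).
on_assistance = ["yes", "no", "yes", "yes", "no"]                    # nominal
rating = ["somewhat", "very", "not", "very", "somewhat"]             # ordinal
weeks_trained = [4, 12, 2, 10, 6]                                    # count

# Nominal: categories have no order, so report the most common one.
most_common_status = mode(on_assistance)

# Ordinal: categories can be ranked, so take the middle rank.
rank = {"not": 1, "somewhat": 2, "very": 3}
label = {value: key for key, value in rank.items()}
median_rating = label[median([rank[r] for r in rating])]

# Count: there is a unit of measurement, so an average is meaningful.
average_weeks = mean(weeks_trained)

print("most common assistance status:", most_common_status)
print("median rating:", median_rating)
print("average weeks of training:", average_weeks)
```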
Likert Item and Response Categories
Item: Improved pre-harvest planning, quicker reforestation, and better planting maintenance would reduce the need for chemical or mechanical treatments.
Response categories: Strongly Agree / Agree / Neither / Disagree / Strongly Disagree
Please circle the appropriate response
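Before analysis, circled Likert responses are usually converted to numeric codes. The sketch below assumes a 5-to-1 coding (5 = Strongly Agree); that scheme and the six responses are illustrative assumptions, not taken from the survey.

```python
from collections import Counter

# Assumed coding scheme: 5 = Strongly Agree down to 1 = Strongly Disagree.
LIKERT_CODES = {
    "Strongly Agree": 5,
    "Agree": 4,
    "Neither": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}

# Hypothetical circled responses from six respondents.
responses = ["Agree", "Strongly Agree", "Neither", "Agree", "Disagree", "Agree"]

# Convert each label to its numeric code.
codes = [LIKERT_CODES[r] for r in responses]

# Tally how many respondents chose each category.
counts = Counter(responses)

print("codes:", codes)
for category in LIKERT_CODES:
    print(category, counts[category])
```

Because the categories are ranked but the distance between them is not fixed, the codes are best treated as ordinal.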
Example Questions
Question 8: Do you think that your police services would improve if your police department and all other police departments (emphasis in the original) in the West Shore area combined into one department?
_____ Yes   _____ No   _____ Undecided
Question 9: Have you discussed this question of police consolidation with friends or neighbors?
_____ Yes   _____ No   _____ Undecided
Question 10: Are you for or against combining your police department with police departments in surrounding municipalities?
_____ Yes   _____ No   _____ Undecided
Examples of Validity and Reliability Issues Applicable to Surveys
Source of the Problem | Validity: Bias | Reliability: Random Error
Interviewer | race, gender, appearance, interjections, interviewer reactions to responses | inconsistency in the way questions are worded/spoken
Respondent | old age, handicaps, suspicion | wandering attention
Instrument | biased questions, response set, question order | single measures to measure client perceptions of the program
Interviewing situation/interviewing method | privacy, confidentiality, anonymity | noise, interruptions
Data processing | biased coding, biased categories (particularly for qualitative data) | coding errors, intercoder reliability problems
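The difference between bias and random error can be shown with numbers. In this sketch, with invented data, a biased measure is consistently off in one direction, while a noisy measure scatters around the true value. `average_error` and `spread` are helper names made up for this example.

```python
from statistics import mean, pstdev

# Hypothetical true value of the thing being measured.
true_value = 50

# Invented repeated measurements of that same value.
biased_measure = [55, 55, 56, 54]   # consistently too high (validity problem)
noisy_measure = [42, 58, 47, 53]    # scattered around the truth (reliability problem)


def average_error(measurements):
    """How far the measurements are from the true value on average (bias)."""
    return mean(measurements) - true_value


def spread(measurements):
    """How much the measurements vary among themselves (random error)."""
    return pstdev(measurements)


for name, data in [("biased", biased_measure), ("noisy", noisy_measure)]:
    print(name, "average error:", average_error(data), "spread:", round(spread(data), 2))
```

Averaging many noisy measurements tends to cancel random error, but no amount of averaging removes a bias.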
Four Levels of Measurement
1. Nominal - attributes offer names or labels for characteristics (gender, birthplace).
2. Ordinal - variables with attributes we can logically rank and order.
Four Levels of Measurement
3. Interval - the distances separating attributes are meaningful (temperature scale).
4. Ratio - the attributes composing a variable are based on a true zero point (age).
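A short illustration of why a true zero point matters. On an interval scale such as temperature, differences are meaningful but ratios change with the unit chosen; on a ratio scale such as age, ratios stay the same. The specific temperatures and ages are arbitrary examples.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion between the two temperature scales."""
    return c * 9 / 5 + 32


# Interval scale: is 20 degrees "twice as hot" as 10 degrees?
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Differences, however, compare the same way on both scales.
diff_ratio_celsius = (30 - 20) / (20 - 10)
diff_ratio_fahrenheit = (
    (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))
    / (celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10))
)

# Ratio scale: a 40-year-old is twice as old as a 20-year-old in any unit.
ratio_age_years = 40 / 20
ratio_age_months = (40 * 12) / (20 * 12)

print("temperature ratio in Celsius vs Fahrenheit:", ratio_celsius, ratio_fahrenheit)
print("difference ratio in Celsius vs Fahrenheit:", diff_ratio_celsius, diff_ratio_fahrenheit)
print("age ratio in years vs months:", ratio_age_years, ratio_age_months)
```

The temperature ratio depends on the scale because zero degrees is an arbitrary point, not the absence of temperature.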