1
MEASUREMENT CHARACTERISTICS
Error & Confidence
Reliability, Validity, & Usability
2
ERROR & CONFIDENCE
Reducing error
- All assessment scores have error
- We want to minimize error so that scores are accurate
- Protocols & periodic staff training/retraining
Increasing confidence
- Results lead to correct placement
- Assessments that produce valid, reliable, and usable results
3
ASSESSMENT RESULTS
Norm-referenced
- Individual's score is compared to others in their peer/norm group
- e.g., school tests reported as percentile ranks (such as scoring at the 95th percentile)
- The norm group needs to be representative of the test takers the test was designed for
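As an illustrative sketch only (the scores and norm group below are invented, not from the slides), a norm-referenced interpretation can be expressed as a percentile rank: the percentage of the norm group scoring at or below the individual.

```python
# Norm-referenced interpretation: where does one score fall relative to a norm group?
# All scores below are hypothetical.
norm_group = [12, 15, 18, 20, 21, 22, 24, 25, 27, 30]  # scores from a representative peer group
individual = 25

# Percentile rank = percentage of the norm group scoring at or below the individual.
percentile_rank = 100 * sum(score <= individual for score in norm_group) / len(norm_group)
print(f"percentile rank = {percentile_rank:.0f}")  # 80: scored as well as or better than 80% of peers
```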
4
ASSESSMENT RESULTS
Criterion-referenced
- Individual's score is compared to a preset standard or criterion
- The standard doesn't change based on the individual or group
- e.g., A = 250-295 points
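By contrast, a criterion-referenced interpretation checks the score against a fixed standard. A minimal sketch; only the A = 250-295 range comes from the slide, and the other cutoffs are hypothetical.

```python
# Criterion-referenced interpretation: the standard is set in advance and does not
# change with the group. Only the A range (250-295) is from the slide; the B and C
# cutoffs are hypothetical.
def letter_grade(points: int) -> str:
    if 250 <= points <= 295:
        return "A"
    if 200 <= points < 250:  # hypothetical cutoff
        return "B"
    if 150 <= points < 200:  # hypothetical cutoff
        return "C"
    return "Below criterion"

print(letter_grade(262))  # "A", regardless of how anyone else scored
```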
5
VALIDITY
- Describes how well the assessment results match their intended purpose
- Are you measuring what you think you are measuring?
- Concerns the relationship between program content & assessment content
- An assessment does not have validity for all purposes, populations, or times
6
VALIDITY
- Depends on different types of evidence
- Is a matter of degree (no tool is perfect)
- Is a unitary concept
- This is a change from the past: the former "types" of validity are now treated as sources of evidence (e.g., content validity is now content-related evidence)
7
FACE VALIDITY
- Not listed in text
- Do the items seem to fit?
8
CONTENT VALIDITY (Content-related evidence)
- How well does the assessment measure the subject or content?
- Representativeness & completeness: covers all major areas
- Nonstatistical: review of literature or expert opinion
- Blueprint of major components
- Per Austin (1991), the minimum requirement for any assessment
9
CRITERION-RELATED VALIDITY (Criterion-related evidence)
- Comparison of results with an external criterion measure
- Statistical: reported as a validity or correlation coefficient
- Ranges from +1 to -1 (±1 is a perfect relationship; 0 = no relationship)
- r = .73 is better than r = .52
- r = ±.40 to ±.70 is the acceptable range
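A minimal sketch of how a criterion-related validity coefficient can be computed (the scores below are invented): it is the Pearson correlation between scores on the assessment and scores on the criterion measure.

```python
import numpy as np

# Hypothetical data: 10 clients' scores on the assessment and on an external criterion measure.
assessment = np.array([34, 41, 28, 45, 39, 30, 48, 36, 42, 33])
criterion  = np.array([56, 70, 49, 77, 66, 52, 80, 60, 72, 55])

# The validity coefficient is the Pearson correlation between the two sets of scores.
r = np.corrcoef(assessment, criterion)[0, 1]
print(f"validity coefficient r = {r:.2f}")  # closer to +/-1 means a stronger relationship
```

The same computation underlies both predictive validity (the criterion is measured later, e.g., success in college) and concurrent validity (the criterion is measured at about the same time).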
10
CRITERION-RELATED VALIDITY (Criterion-related evidence)
- Coefficients of .30 to .40 may be used if statistically significant
- If validity is reported, it is generally criterion-related validity
- 2 types: predictive & concurrent
11
PREDICTIVE VALIDITY
- The ability of an assessment to predict future behaviors or outcomes
- Measures are taken at different times
- e.g., ACT or SAT scores & later success in college
- e.g., Leisure Satisfaction scores predicting discharge
12
CONCURRENT VALIDITY
- More than one instrument measures the same content
- Desire to predict one set of scores from another set taken at the same or nearly the same time, measuring the same variable
13
CONSTRUCT VALIDITY (Construct-related evidence)
- Theoretical/conceptual
- Content- & criterion-related validity contribute to construct validity
- Research concerning the conceptual framework on which the assessment is based also contributes to construct validity
- Not demonstrated in a single project or statistical measure
- Few TR assessments have it; their focus is behavior, not constructs
14
CONSTRUCT VALIDITY (Construct-related evidence)
- Factor analysis
- Convergent validity (what it measures)
- Divergent validity (what it doesn't measure)
- Expert panels are used here too
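One common form of construct-related evidence is the pattern of correlations: the instrument should correlate strongly with an established measure of the same construct (convergent) and weakly with a measure of an unrelated construct (divergent). A sketch with invented scores:

```python
import numpy as np

# Invented scores for 8 clients on three instruments.
new_scale       = np.array([10, 14, 9, 17, 12, 15, 11, 16])   # instrument being validated
related_scale   = np.array([22, 30, 20, 35, 25, 31, 24, 33])  # established measure of the same construct
unrelated_scale = np.array([4, 2, 3, 3, 5, 2, 1, 4])          # measure of an unrelated construct

convergent = np.corrcoef(new_scale, related_scale)[0, 1]    # expected to be high
divergent  = np.corrcoef(new_scale, unrelated_scale)[0, 1]  # expected to be near zero
print(f"convergent r = {convergent:.2f}, divergent r = {divergent:.2f}")
```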
15
THREATS TO VALIDITY
- The assessment should be valid for its intended use (e.g., research instruments)
- Unclear directions
- Unclear or ambiguous terms
- Items at an inappropriate level for subjects
- Items not related to the construct being measured
16
THREATS TO VALIDITY
- Too few items
- Too many items
- Items with an identifiable pattern of response
- Method of administration
- Testing conditions
- Subjects' health, reluctance, attitudes
- See Stumbo, 2002, pp. 41-42
17
VALIDITY
- You can't get valid results without reliable results, but you can get reliable results without valid results
- Reliability is a necessary but not sufficient condition for validity
- See Stumbo, 2002, p. 54
18
RELIABILITY
- Accuracy or consistency of a measurement; reproducible results
- Statistical in nature: r is between 0 & 1 (with 1 being perfect)
- Should not be lower than .80
- Tells what portion of the variance is non-error variance
- Increases with the length of the test & the spread of scores
19
STABILITY (Test-retest)
- How stable is the assessment?
- Results should not be overly influenced by the passage of time
- The same group is assessed 2 times with the same instrument & the results of the 2 testings are correlated
- Are the 2 sets of scores alike?
- Time effects (longer or shorter intervals)
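A minimal sketch of a test-retest (stability) coefficient, assuming the same hypothetical group is assessed twice with the same instrument and the two administrations are correlated:

```python
import numpy as np

# Invented data: the same 8 clients assessed twice with the same instrument.
time_1 = np.array([18, 25, 22, 30, 27, 21, 24, 29])
time_2 = np.array([19, 24, 23, 31, 26, 20, 25, 28])

# The stability coefficient is the correlation between the two administrations.
stability = np.corrcoef(time_1, time_2)[0, 1]
print(f"test-retest reliability = {stability:.2f}")
```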
20
EQUIVALENCY (Equivalent forms)
- Also known as parallel-form or alternative-form reliability
- How closely correlated are 2 or more forms of the same assessment?
- 2 forms have been developed & demonstrated to measure the same construct
- Forms have similar but not the same items (e.g., the NCTRC exam)
- Short & long forms are not equivalent
21
INTERNAL CONSISTENCY
- How closely are items on the assessment related?
- Split-half methods: 1st half vs. 2nd half, odd/even items, or matched random subsets
- If the items can't be divided: Cronbach's alpha or Kuder-Richardson
- Spearman-Brown formula (corrects a split-half correlation to full-test length)
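A hedged sketch of two of these estimates, using an invented matrix of item scores (rows = respondents, columns = items): an odd/even split-half correlation corrected with the Spearman-Brown formula, and Cronbach's alpha.

```python
import numpy as np

# Invented item scores: 6 respondents (rows) x 4 items (columns), each item scored 1-5.
items = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])

# Split-half (odd/even items), corrected to full length with the Spearman-Brown formula.
odd_total  = items[:, 0::2].sum(axis=1)   # items 1 and 3
even_total = items[:, 1::2].sum(axis=1)   # items 2 and 4
r_half = np.corrcoef(odd_total, even_total)[0, 1]
split_half = 2 * r_half / (1 + r_half)
print(f"split-half reliability (Spearman-Brown corrected) = {split_half:.2f}")

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_variance = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```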
22
INTERRATER RELIABILITY
- Percentage of agreements relative to the number of observations
- There is a difference between agreement & accuracy: raters are compared to each other
- 80% agreement is the commonly used standard
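A minimal sketch of simple percent agreement between two raters, using invented observation records:

```python
# Invented records: two raters independently coded whether the target behavior
# occurred (1) or not (0) in each of 10 observation intervals.
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(f"interrater agreement = {percent_agreement:.0f}%")  # 8 of 10 intervals -> 80%
```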
23
INTERRATER RELIABILITY
- Simple agreement: number of agreements & disagreements
- Point-to-point agreement: takes each data point into consideration
- Percentages of agreement for the occurrence of the target behavior
- Kappa index
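The kappa index corrects percent agreement for the agreement expected by chance. A hedged sketch, computed by hand from the same kind of invented two-rater data:

```python
from collections import Counter

# Invented codes from two raters over 10 intervals (1 = behavior occurred, 0 = did not).
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
n = len(rater_a)

# Observed proportion of agreement.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability the raters assign the same code by chance,
# based on each rater's own base rates.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in set(rater_a) | set(rater_b))

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"Cohen's kappa = {kappa:.2f}")  # agreement beyond what chance alone would produce
```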
24
INTRARATER RELIABILITY
- Not in text
- The rater's ratings are compared with his or her own ratings
25
RELIABILITY
- Manuals often give this information
- High reliability doesn't indicate validity
- Generally a longer test has higher reliability
- Lessens the influence of chance or guessing
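The point that a longer test is generally more reliable can be illustrated with the Spearman-Brown prophecy formula, which estimates reliability when the test length is multiplied by a factor n (the numbers below are illustrative):

```python
def spearman_brown(reliability: float, n: float) -> float:
    """Estimated reliability of a test whose length is multiplied by n."""
    return n * reliability / (1 + (n - 1) * reliability)

r = 0.70  # illustrative reliability of the original test
print(f"original length: {r:.2f}")
print(f"doubled length:  {spearman_brown(r, 2):.2f}")    # about .82
print(f"halved length:   {spearman_brown(r, 0.5):.2f}")  # about .54
```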
26
FAIRNESS
- Reduction or elimination of undue bias
- Language, ethnic or racial backgrounds, gender
- Free of stereotypes & biases
- Beginning to be a concern for TR
27
USABILITY & PRACTICALITY
- Nonstatistical
- Is this tool better than any other tool on the market, or one I can design?
- Time, cost, staff qualifications, ease of administration, scoring, etc.