Published by Emery McDowell. Modified over 8 years ago.

Measurement Error
All systematic effects acting to bias recorded results:
-- Unclear questions
-- Ambiguous questions
-- Unclear instructions
-- Socially acceptable answers
-- Respondents do not know the answers
-- Deliberate lies

VALIDITY
ADDRESSES SYSTEMATIC ERROR IN MEASUREMENT.
A VALID INSTRUMENT IS ONE THAT MEASURES WHAT IT PURPORTS TO MEASURE, ALL IT PURPORTS TO MEASURE, AND ONLY WHAT IT PURPORTS TO MEASURE.

FACE VALIDITY
“ITEMS LOOK AS THOUGH THEY MEASURE WHAT IS IMPORTANT”
-- LOOK VALID TO THE DEVELOPER, OR
-- LOOK VALID TO THOSE WHO WILL COMPLETE THE INSTRUMENT
-- APPEAL AND APPEARANCE OF THE INSTRUMENT
-- A “FIELD TEST” IS USED TO VERIFY

CONTENT VALIDITY
-- DOES THE TEST GIVE A FAIR MEASURE ON SOME IMPORTANT SET OF TASKS?
-- DOES IT REPRESENT THE CONTENT OF THE DOMAIN?
-- PROCEDURE: A PANEL OF EXPERTS LOGICALLY COMPARES THE ITEMS TO THE DOMAIN TO BE MEASURED, PRODUCING A “JURIED” INSTRUMENT
-- NOT EXPRESSED AS A NUMBER

PREDICTIVE VALIDITY (CRITERION-RELATED VALIDITY)
-- DO TEST SCORES PREDICT A CERTAIN FUTURE PERFORMANCE?
-- GIVE THE TEST AND USE THE RESULTS TO PREDICT THE OUTCOME SOME TIME LATER
-- CORRELATION OFTEN USED
-- COULD ALSO BE USED FOR “KNOWN SOURCE”: WILL THIS LEADERSHIP MEASURE DIFFERENTIATE BETWEEN PEOPLE WHO WILL BECOME GOOD LEADERS AND THOSE WHO WILL NOT?

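The correlation mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation is the coefficient intended; the function name pearson_r and all of the scores are hypothetical and not taken from the presentation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no spread")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: test scores now, and a performance rating
# collected for the same six people some time later.
test_scores = [52, 61, 70, 48, 85, 77]
later_rating = [3.1, 3.4, 4.0, 2.8, 4.6, 4.1]

validity_coefficient = pearson_r(test_scores, later_rating)
print(f"predictive validity coefficient: {validity_coefficient:.2f}")
```

The same calculation would serve for concurrent validity; only the timing of the criterion measure differs.
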
CONCURRENT VALIDITY (CRITERION-RELATED VALIDITY)
-- COMPARE RESULTS WITH SOME CURRENT PERFORMANCE
-- CORRELATION OFTEN USED
-- “KNOWN SOURCE” COULD ALSO APPLY
-- OFTEN USED TO MAKE A TEST TO SUBSTITUTE FOR A LESS CONVENIENT PROCEDURE

CONSTRUCT VALIDITY
-- WANT TO KNOW WHAT PSYCHOLOGICAL OR OTHER PROPERTY OR PROPERTIES CAN “EXPLAIN” THE VARIANCE OF THE TEST
-- “HOW CAN SCORES ON THIS TEST BE EXPLAINED PSYCHOLOGICALLY?”
-- “WHAT PSYCHOLOGICAL CONSTRUCTS UNDERLIE THIS TEST?”
-- PROCEDURE: FACTOR ANALYSIS OR HYPOTHESIS TESTING

RELIABILITY
DOES AN INSTRUMENT CONSISTENTLY MEASURE WHATEVER IT IS MEASURING?
SYNONYMS:
-- DEPENDABILITY
-- STABILITY
-- CONSISTENCY
-- PREDICTABILITY
-- ACCURACY
RELIABILITY IS DEFINED THROUGH ERROR: THE MORE ERROR, THE GREATER THE UNRELIABILITY; THE LESS ERROR, THE GREATER THE RELIABILITY.

TEST-RETEST (TWO ADMINISTRATIONS)
-- A “COEFFICIENT OF STABILITY” IS PRODUCED
-- THE TEST IS ADMINISTERED TO THE SAME GROUP ON TWO OCCASIONS; CORRELATE THE TWO SETS OF SCORES OR CALCULATE % AGREEMENT ON ITEMS
-- THE COEFFICIENT CAN CHANGE WITH TIME; THUS, ALWAYS REPORT THE INTERVAL: r (one-week) = .77
-- THE LONGER THE TIME BETWEEN TESTS, THE LOWER THE COEFFICIENT

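The "% agreement on items" option can be sketched as below. This is an illustration only: it assumes agreement means the same respondent gave an identical answer to the same item on both occasions, and the function name and responses are hypothetical.

```python
def percent_agreement(first, second):
    """Percentage of items answered identically on two administrations."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty response lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Hypothetical answers from one respondent to the same ten items,
# given one week apart.
week_1 = ["A", "C", "B", "D", "A", "A", "C", "B", "D", "B"]
week_2 = ["A", "C", "B", "D", "B", "A", "C", "B", "A", "B"]

agreement = percent_agreement(week_1, week_2)
print(f"item agreement (one-week): {agreement:.0f}%")
```

As with the correlation coefficient, the interval between administrations belongs in the report alongside the figure.
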
ONE ADMINISTRATION
1. COEFFICIENT OF EQUIVALENCY, OR
2. COEFFICIENT OF INTERNAL CONSISTENCY
1. EQUIVALENCY: TELLS HOW WELL A TEST AGREES WITH ANOTHER EQUIVALENT MEASURE MADE AT THE SAME TIME (PARALLEL FORM PROCEDURE)
-- EXAMPLE: NEED TWO SIMILAR TESTS FOR A PRETEST-POSTTEST STUDY, CALLED PARALLEL FORM OR EQUIVALENT FORM TESTS
-- PROCEDURE: ADMINISTER THE TEST, SCORE WITH ITEM ANALYSIS, MAKE TWO EQUALLY DIFFICULT AND DISCRIMINATING TESTS

INTERNAL CONSISTENCY (ONE ADMINISTRATION)
A. SPLIT-HALF METHOD
B. ALPHA OR K-R METHODS
PROCEDURE: (1) GIVE THE TEST; (2) CALCULATE RELIABILITY
FOR A: SPLIT THE TEST IN HALF (ODD VS. EVEN ITEMS, OR FIRST HALF VS. SECOND HALF) AND CORRELATE THE TWO HALVES (r). SINCE r IS A FUNCTION OF TEST LENGTH, AND EACH HALF IS ONLY HALF AS LONG AS THE FULL TEST, CORRECT WITH THE SPEARMAN-BROWN CORRECTION.

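The split-half procedure can be sketched as follows, assuming an odd-even split and the usual Spearman-Brown form r_full = 2r / (1 + r). The function names and the small 0/1 score matrix are hypothetical illustrations.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores: one row per respondent, one numeric score per item.

    Returns (r_half, r_full): the correlation between the odd-item and
    even-item half scores, and the Spearman-Brown corrected estimate
    for the full-length test.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical right/wrong (1/0) scores: 6 respondents, 6 items.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(scores)
print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")
```

For a positive half-test correlation the corrected value is always the larger of the two, which is the point of the correction.
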
ALPHA OR K-R
-- K-R 20, K-R 21 & CRONBACH’S ALPHA
-- SPLIT TESTS IN ALL POSSIBLE WAYS AND INTERCORRELATE
-- CANNOT USE WITH SPEED TESTS
-- USE K-R 20 & 21 FOR RIGHT/WRONG ITEMS; USE ALPHA FOR MULTIPLE RESPONSE CATEGORIES, e.g., Likert-type scales

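A minimal sketch of the two coefficients, using the standard textbook formulas: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and K-R 20 with each item variance replaced by p × q. Population variances are used throughout so that the two agree on right/wrong data; the function names and the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """item_scores: one row per respondent, one numeric score per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_var_sum = sum(pvariance(col) for col in zip(*item_scores))
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have no spread")
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(item_scores):
    """K-R 20 for items scored 1 (right) or 0 (wrong)."""
    n = len(item_scores)
    k = len(item_scores[0])
    pq_sum = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n          # proportion answering the item correctly
        pq_sum += p * (1 - p)     # variance of a 0/1 item
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have no spread")
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical right/wrong scores: 6 respondents, 6 items.
right_wrong = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# Hypothetical 5-point Likert-type responses: 5 respondents, 4 items.
likert = [
    [5, 4, 5, 4],
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 1, 2, 2],
    [1, 2, 1, 1],
]

print(f"K-R 20 = {kr20(right_wrong):.2f}")
print(f"alpha  = {cronbach_alpha(likert):.2f}")
```

On right/wrong data the two functions return the same value, since K-R 20 is the special case of alpha for dichotomous items.
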
FACTORS INFLUENCING RELIABILITY (R)
1. > ITEMS = > RELIABILITY
2. > TIME = > RELIABILITY
3. R
4. > OBJECTIVE SCORING = > R
5. > PROBABILITY OF SUCCESS BY CHANCE = < RELIABILITY
6. > INACCURACY IN SCORING = < R
7. > HOMOGENEOUS MATERIAL = > R
8. > COMMON EXPERIENCE OF STUDENTS = > R
9. > TRICK QUESTIONS = < R
10. > MISINTERPRETATION OF ITEMS = < R

IMPROVING r
-- WRITE UNAMBIGUOUS ITEMS
-- ADD MORE ITEMS OF EQUAL KIND AND DIFFICULTY (see page 29)
-- USE CLEAR AND STANDARD INSTRUCTIONS
-- r DEPENDS ON THE SPREAD OF SCORES; THUS, A LOW r COULD BE BECAUSE EVERYONE SCORES ABOUT THE SAME
-- A TEST CAN BE RELIABLE FOR ONE LEVEL OF ABILITY AND NOT ANOTHER
-- THE VALIDITY COEFFICIENT CANNOT EXCEED THE SQUARE ROOT OF r

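Two of these points lend themselves to a quick calculation. The sketch below assumes the general Spearman-Brown formula applies to lengthening a test, which holds only if the added items are comparable to the existing ones, and it treats the square root of r as the ceiling on validity. Function names and figures are hypothetical.

```python
from math import sqrt


def lengthened_reliability(r, k):
    """Spearman-Brown estimate of reliability after making a test
    k times as long with items of equal kind and difficulty."""
    if k <= 0:
        raise ValueError("length factor k must be positive")
    return k * r / (1 + (k - 1) * r)


def max_validity(r):
    """Upper bound on a validity coefficient for a test with reliability r."""
    if not 0 <= r <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sqrt(r)


# Hypothetical case: a test with r = .60 is doubled in length.
doubled = lengthened_reliability(0.60, 2)
print(f"estimated r after doubling: {doubled:.2f}")
print(f"validity ceiling at r = .64: {max_validity(0.64):.2f}")
```

Under these assumptions, doubling a test with r = .60 gives an estimate of about .75, and a test with r = .64 cannot show a validity coefficient above about .80.
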
ACCEPTABLE r?
NUNNALLY: … DEPENDS ON USE. “IN EARLY STAGES OF RESEARCH ON HYPOTHESIZED MEASURES OF A CONSTRUCT, ONE SAVES TIME AND MONEY BY WORKING WITH INSTRUMENTS THAT HAVE ONLY MODEST r; AN r OF .50 TO .60 WILL SUFFICE.”

SUITABILITY
-- REALLY A PART OF VALIDITY: “IS THE INSTRUMENT SUITABLE FOR THE AUDIENCE?”
-- FIELD TEST
-- READABILITY: FOG INDEX (many others available)
1. TAKE A RANDOM SAMPLE OF 100 WORDS AND COUNT THE NUMBER OF SENTENCES. # WORDS / # SENTENCES = AVERAGE SENTENCE LENGTH (ASL)
2. COUNT THE WORDS, IN THOSE 100, WITH 3 OR MORE SYLLABLES, OMITTING COMBINATIONS OF EASY WORDS (BUTTER-FLY), WORDS MADE 3 SYLLABLES BY ADDING “ED” OR “ES” (CREATED), AND CAPITALIZED WORDS = % HARD WORDS (%HW)
3. # YEARS OF EDUCATION = 0.4 × (ASL + %HW)
[MOST PEOPLE PREFER TO READ 2 GRADE LEVELS BELOW THEIR LEVEL OF EDUCATION.]

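The three Fog Index steps reduce to a short calculation once the counts are in hand. The sketch below takes the counts as inputs rather than trying to count syllables automatically, since the exclusion rules in step 2 call for human judgment. The function name and the sample counts are hypothetical.

```python
def fog_index(num_words, num_sentences, num_hard_words):
    """Approximate years of education needed to read a passage.

    num_words      -- words in the sample (the slide uses 100)
    num_sentences  -- sentences in that sample
    num_hard_words -- words of 3+ syllables left after the exclusions
    """
    if num_words <= 0 or num_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    if not 0 <= num_hard_words <= num_words:
        raise ValueError("hard-word count must lie between 0 and num_words")
    asl = num_words / num_sentences                    # average sentence length
    pct_hard = 100.0 * num_hard_words / num_words      # % hard words
    return 0.4 * (asl + pct_hard)


# Hypothetical sample: 100 words, 5 sentences, 12 hard words.
grade_level = fog_index(100, 5, 12)
print(f"Fog Index: {grade_level:.1f} years of education")
```

With these counts, ASL is 20 and %HW is 12, so the index comes out at about 12.8 years.
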