Published by Louisa Pope. Modified over 9 years ago.
1
1 EPSY 546: LECTURE 1 INTRODUCTION TO MEASUREMENT THEORY George Karabatsos
2
2 What is test theory?
3
3 WHAT IS A TEST? Test: A procedure for obtaining a sample of person behavior from a specified domain of items.
4
4 WHAT IS A TEST? Test: A procedure for obtaining a sample of person behavior from a specified domain of items. General: Exam, questionnaire, survey, judge-observed task, etc.
5
5 ITEM RESPONSE SCORING Test item responses are “scored”. Some Examples: Dichotomous : 1 = Correct, 0 = Incorrect (Scored from possibly a multiple choice test item)
6
6 ITEM RESPONSE SCORING Test item responses are “scored”. Some Examples: “Rating Scale”: 1 = Strongly Disagree 2 = Disagree 3 = Agree 4 = Strongly Agree
7
7 ITEM RESPONSE SCORING Test item responses are “scored”. Some Examples: Partial Credit: 1 = Completely incorrect 2 = Partially correct 3 = Completely correct
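The dichotomous scoring rule described above can be sketched in a few lines. This is only an illustration: the answer key, the responses, and the function name `score_dichotomous` are hypothetical and do not come from the lecture.

```python
def score_dichotomous(responses, key):
    """Score each response 1 if it matches the key, otherwise 0."""
    return [1 if r == k else 0 for r, k in zip(responses, key)]


# Hypothetical multiple-choice answer key and one examinee's answers.
key = ["B", "D", "A", "C"]
responses = ["B", "A", "A", "C"]

item_scores = score_dichotomous(responses, key)  # one 0/1 score per item
total_score = sum(item_scores)                   # total test score
```

Rating-scale and partial-credit items would replace the 0/1 rule with a lookup from response category to an ordered integer score.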
8
8 WHAT TESTS DO Tests are designed to measure latent traits that manifest in the responses to the test items.
9
9 LATENT VARIABLES Some substantive examples of latent traits: –Exam: Ability on long division. –Attitude Questionnaire: Agreement towards capital punishment. –Survey: Frequency of drug use. –Survey: Quality of life.
10
10 LATENT VARIABLES Latent trait = latent variable = psychological trait/variable/attribute = unidimensional variable = construct
11
11 LATENT VARIABLES For measurement, latent variables are often numerically represented either: –by total test score (person or item), –or by parameters of “person ability” or “item difficulty”.
12
12 Some Challenges of latent trait measurement (5) 1. No single approach to the measurement of a latent trait is universally accepted.
13
13 Some Challenges of latent trait measurement (5) 1. No single approach to the measurement of a latent trait is universally accepted. ** Two theorists may select different items to measure the same latent trait (e.g., math ability).
14
14 Some Challenges of latent trait measurement (5) 2. Psychological measurements are usually based on limited samples of behavior.
15
15 Some Challenges of latent trait measurement (5) 2. Psychological measurements are usually based on limited samples of behavior. ** Practically impossible to confront respondents with all possible items that represent the latent trait (e.g., all long division items)
16
16 Some Challenges of latent trait measurement (5) 2. Psychological measurements are usually based on limited samples of behavior. ** N = 1, for each person on an item.
17
17 Some Challenges of latent trait measurement (5) 3. Latent trait measurement obtained is always subject to error.
18
18 Some Challenges of latent trait measurement (5) 3. Latent trait measurement obtained is always subject to error. Random: sampling error of respondents, and of items; inherent unreliability of respondents (e.g., boredom, lucky guess, carelessness).
19
19 Some Challenges of latent trait measurement (5) 3. Latent trait measurement obtained is always subject to error. Systematic: Cheating on exam; Response bias; item does not measure latent trait; misscoring; test form out of order.
20
20 Some Challenges of latent trait measurement (5) 4. Establishing measurement scales for the latent trait.
21
21 Some Challenges of latent trait measurement (5) 4. Establishing measurement scales for the latent trait. Stevens (1946): “the assignment of numerals to objects or events according to rules.” (NOT!)
22
22 Some Challenges of latent trait measurement (5) 4. Establishing measurement scales for the latent trait. Michell: Measurement requires tests of the hypothesis that the variable is quantitative. (Echoing Luce, Krantz, Suppes, and Tversky in the three volumes of Foundations of Measurement.)
23
23 Some Challenges of latent trait measurement (5) 5. Latent traits must also demonstrate relationships to other important traits or observable phenomena.
24
24 Some Challenges of latent trait measurement (5) 5. Latent traits must also demonstrate relationships to other important traits or observable phenomena. **Measurements of latent traits have value when they can be related to other traits or events in the real world.
25
25 WHAT IS TEST THEORY? The study of the 5 pervasive measurement problems just described, and developing/applying methods for their resolution.
26
26 TEST THEORY COURSE Become aware of the logic and mathematical models that underlie practices in test use and construction.
27
27 TEST THEORY COURSE Awareness of these models, including their assumptions and limitations, should lead to an improved practice in test construction and more intelligent use of test information in decision making.
28
28 TEST THEORY COURSE Test theory provides a general framework for viewing the process of instrument development. Test theory is distinct from the more applied subject of educational and psychological assessment, which focuses on the administration and interpretation of specific tests.
29
29 Process of Test Construction
30
30 TEST CONSTRUCTION 10 steps can be followed to construct a test for the measurement of persons (and items). (C&A, Chapter 4)
31
31 TEST CONSTRUCTION 1. Identify the primary purpose(s) for which the test measurements will be used.
32
32 TEST CONSTRUCTION 1. Identify the primary purpose(s) for which the test measurements will be used. 2. Hypothesize items that define the latent trait of interest.
33
33 TEST CONSTRUCTION 3. Prepare a set of test specifications, delineating the proportion of items that should focus on each type of behavior identified in Step 2.
34
34 TEST CONSTRUCTION 3. Prepare a set of test specifications, delineating the proportion of items that should focus on each type of behavior identified in Step 2. 4. Construct an initial pool of items.
35
35 TEST CONSTRUCTION 5. Have items reviewed and revised.
36
36 TEST CONSTRUCTION 5. Have items reviewed and revised. 6. Hold preliminary item tryouts (and revise).
37
37 TEST CONSTRUCTION 5. Have items reviewed and revised. 6. Hold preliminary item tryouts (and revise). 7. Field test the items on a large sample representative of the examinee population for whom the test is intended. (PILOT STUDY)
38
38 TEST CONSTRUCTION 8. Determine statistical properties of the items, and when appropriate, eliminate items that do not meet pre-established criteria.
39
39 TEST CONSTRUCTION 8. Determine statistical properties of the items, and when appropriate, eliminate items that do not meet pre-established criteria. 9. Design and conduct reliability and validity studies for the final form of the test.
40
40 TEST CONSTRUCTION 10. Develop guidelines for administration, scoring, and interpretation of the test scores. (e.g., prepare norm tables, suggest recommended cutting scores or standards for performance, etc.)
41
41 Statistical Concepts for Test Theory
42
42 BASIC STATISTICS (C&A2) Frequency tables and graphs. Distribution. Normal distribution (p.d.f., c.d.f.). Central tendency: mode, median, mean. Variability: variance, standard deviation. Z-scores. For infinite populations.
43
43 BASIC STATISTICS (C&A2) Relationship between two variables Scatterplot. Pearson’s correlation coefficient. Ordinary linear regression. Standard error of Y predictions, for a given regression equation.
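A minimal sketch of the two-variable statistics listed above, computed on a small made-up data set. The data and variable names are assumptions for illustration. The standard error of estimate is computed in its population form (dividing by N), which matches the "infinite populations" framing; sample-based texts often divide by N - 2 instead.

```python
from math import sqrt
from statistics import fmean

# Hypothetical paired scores (e.g., X = test score, Y = criterion).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

mx, my = fmean(x), fmean(y)
sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares of X
syy = sum((b - my) ** 2 for b in y)                   # sum of squares of Y
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products

r = sxy / sqrt(sxx * syy)      # Pearson's correlation coefficient
slope = sxy / sxx              # ordinary least-squares slope
intercept = my - slope * mx    # ordinary least-squares intercept

# Standard error of the Y predictions (population form, divides by N).
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
se_estimate = sqrt(sum(e ** 2 for e in residuals) / n)
```

In this form the standard error equals the standard deviation of Y multiplied by sqrt(1 - r^2), so predictions become more precise as the correlation grows.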
44
44 BASIC STATISTICS (C&A5) Statistics: Test Items. Mean and total score for an item, over respondents (item difficulty). Variance of responses on a test item. Inter-item correlation (Pearson’s product-moment correlation or phi correlation).
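The item statistics named above can be sketched as follows. The response matrix and helper names are hypothetical. For 0/1 items, Pearson's correlation computed on the item scores is the phi correlation, and the item variance equals p(1 - p), where p is the item mean.

```python
from math import sqrt
from statistics import fmean, pvariance

# Hypothetical scored responses: rows = respondents, columns = items.
X = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def column(data, j):
    """Return the scores of all respondents on item j."""
    return [row[j] for row in data]


def item_difficulty(data, j):
    """Item mean over respondents (proportion correct for 0/1 items)."""
    return fmean(column(data, j))


def item_variance(data, j):
    """Population variance of the responses to item j."""
    return pvariance(column(data, j))


def pearson(x, y):
    """Pearson correlation; equals phi when both variables are 0/1."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / sqrt(pvariance(x) * pvariance(y))


difficulties = [item_difficulty(X, j) for j in range(3)]
r_01 = pearson(column(X, 0), column(X, 1))
```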
45
45 VARIANCE OF TEST SCORES AND TEST ITEMS Since tests are usually scored by the sum of the item scores, it follows that there should be some relationship between individual item variances and the variance of the total test scores.
46
46 VARIANCE OF TEST SCORES AND TEST ITEMS In fact, since the measurement of individual differences is a central goal of testing, one goal of test construction should be to maximize the variance of the total test scores. The reliability and validity of a test depend on this variance.
47
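47 VARIANCE OF TEST SCORES AND TEST ITEMS Covariance between items i and j: $\sigma_{ij} = \frac{1}{N} \sum_{n=1}^{N} (X_{ni} - \mu_i)(X_{nj} - \mu_j)$, where N = number of respondents, J = number of items, $X_{ni}$ = score of respondent n on item i, and $\mu_i$ = population mean of item i.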
48
48 VARIANCE OF TEST SCORES AND TEST ITEMS Variance-Covariance Matrix
49
49 VARIANCE OF TEST SCORES AND TEST ITEMS Total Test Score Variance = Sum of item variances + sum of item covariances: $\sigma_X^2 = \sum_{i=1}^{J} \sigma_i^2 + \sum_{i \neq j} \sigma_{ij}$
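The decomposition of total test score variance can be verified numerically. The sketch below uses a small hypothetical response matrix and population (divide-by-N) variances and covariances; the variable names are illustrative only.

```python
from statistics import fmean, pvariance

# Hypothetical scored responses: rows = respondents, columns = items.
X = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def covariance(x, y):
    """Population covariance; covariance(x, x) is the variance of x."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])


J = len(X[0])
items = [[row[j] for row in X] for j in range(J)]
totals = [sum(row) for row in X]  # total test score per respondent

sum_item_variances = sum(covariance(col, col) for col in items)
# Sum over all ordered pairs i != j, so each covariance is counted twice.
sum_item_covariances = sum(
    covariance(items[i], items[j])
    for i in range(J)
    for j in range(J)
    if i != j
)
total_variance = pvariance(totals)
```

With these data both sides equal 1.25: the item variances contribute 0.625 and the off-diagonal covariances contribute the other 0.625.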
50
50 VARIANCE OF TEST SCORES AND TEST ITEMS Implications of Equation (first term): Total test score variance increases as the number of items (J) is increased (except when the added items have a non-positive correlation with the other items).
51
51 VARIANCE OF TEST SCORES AND TEST ITEMS Implications of Equation (second term): Test score variance increases when items are added that have positive covariances with the other test items.
52
52 VARIANCE OF TEST SCORES AND TEST ITEMS Implications of Equation: Test score variance is maximized when: –items are equal in difficulty (this increases item covariances), –and of “medium” difficulty (this increases item variances).
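The "medium difficulty" claim can be illustrated for dichotomous items, where the variance of an item with proportion correct p is p(1 - p). The grid of difficulty values below is an arbitrary choice for illustration.

```python
def item_variance(p):
    """Variance of a 0/1 item whose proportion correct is p."""
    return p * (1 - p)


# Hypothetical item difficulties (proportion of respondents correct).
difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = {p: item_variance(p) for p in difficulties}

# The difficulty with the largest item variance.
best = max(variances, key=variances.get)
```

The variance peaks at p = 0.5 and shrinks symmetrically toward very easy or very hard items, which is why such items add little to total score variance.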
53
53 Introduction To Scaling
54
54 4 SCALES OF MEASUREMENT 1. Nominal Scale: –Used for classification. –Assigns the same numbers to objects that are equivalent, and a different number to objects that are not.
55
55 4 SCALES OF MEASUREMENT 1. Nominal Scale: –Class of admissible transformations: class of one-to-one transformations, i.e., $n_i(x) = n_i(y)$ iff $n_j(x) = n_j(y)$ for all scales i, j, and objects x, y.
56
56 4 SCALES OF MEASUREMENT 2. Ordinal Scale: –With respect to some attribute, this scale orders objects in magnitude, but does not measure distances between the objects. –Example: Ranking
57
57 4 SCALES OF MEASUREMENT 2. Ordinal Scale: –Class of admissible transformations: class of increasing monotonic transformations, i.e., $n_i(x) > n_i(y)$ iff $n_j(x) > n_j(y)$ for all scales i, j, and objects x, y.
58
58 4 SCALES OF MEASUREMENT 3. Interval Scale: –Involves the numerical representation of a relation on the differences between entities with respect to some attribute (no absolute zero point). –Example: temperature measurement (Fahrenheit, Celsius).
59
59 4 SCALES OF MEASUREMENT 3. Interval Scale: –Class of admissible transformations: class of positive linear transformations, $n_j(x) = a\,n_i(x) + b$ for $a > 0$ and any real $b$, e.g., $C = (5/9)F - 160/9$.
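A short numerical check of what a positive linear transformation preserves, using the Fahrenheit-to-Celsius conversion from the slide. The temperature values are made up. Ratios of differences survive the change of scale, while ratios of the raw values do not, which is what "no absolute zero point" means in practice.

```python
def f_to_c(f):
    """Positive linear transformation with a = 5/9 and b = -160/9."""
    return (5 / 9) * f - 160 / 9


# Hypothetical temperatures in degrees Fahrenheit.
f = [32.0, 50.0, 68.0, 104.0]
c = [f_to_c(v) for v in f]  # approximately 0, 10, 20, 40 degrees Celsius

# Ratios of differences are invariant under the transformation.
diff_ratio_f = (f[3] - f[2]) / (f[1] - f[0])
diff_ratio_c = (c[3] - c[2]) / (c[1] - c[0])

# Ratios of raw values are not: 104/50 in F versus about 40/10 in C.
raw_ratio_f = f[3] / f[1]
raw_ratio_c = c[3] / c[1]
```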
60
60 4 SCALES OF MEASUREMENT 4. Ratio Scale: –Has properties of order, equal distance between units, and an absolute zero point. –Non-zero measurements on this scale may be expressed as ratios of one another. –Examples: Length, weight, etc.
61
61 4 SCALES OF MEASUREMENT 4. Ratio Scale: –Class of admissible transformations: class of multiplicative transformations, $n_i(x) = c\,n_j(x)$ for $c > 0$.
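For a ratio scale, a change of unit multiplies every measurement by the same positive constant, so ratios of measurements are unchanged. The sketch below uses an inches-to-centimetres conversion as an assumed example; the lengths are made up.

```python
CM_PER_INCH = 2.54  # multiplicative constant c > 0

# Hypothetical lengths measured in inches.
lengths_in = [2.0, 5.0, 10.0]
lengths_cm = [CM_PER_INCH * x for x in lengths_in]

# "Five times as long" holds in either unit.
ratio_in = lengths_in[2] / lengths_in[0]
ratio_cm = lengths_cm[2] / lengths_cm[0]
```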
62
62 MEASUREMENT As mentioned earlier, establishing a measurement scale for a given variable requires hypothesis tests. The measurement of directly observable, physical phenomena is easily obtainable and verifiable.
63
63 MEASUREMENT However, this is not the case for the measurement of “latent” psychological phenomena (e.g., ability, intelligence, attitudes, beliefs, etc.), which are not directly observable.
64
64 CONJOINT MEASUREMENT The axioms of conjoint measurement can be tested to determine whether latent traits are measurable on an ordinal or interval scale.
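As a rough sketch of how one conjoint-measurement axiom might be checked, the code below tests independence (single cancellation) on a matrix of proportions correct. The layout (rows = person score groups, columns = items), the data, and the function names are assumptions made for illustration. The check ignores sampling error, which a real analysis would have to take into account.

```python
def rows_independent(P):
    """True if no pair of rows changes order from one column to another."""
    for a in range(len(P)):
        for b in range(a + 1, len(P)):
            diffs = [pa - pb for pa, pb in zip(P[a], P[b])]
            # A violation: row a is above row b in one column, below in another.
            if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
                return False
    return True


def columns_independent(P):
    """Apply the same check to the columns by transposing the matrix."""
    transposed = [list(col) for col in zip(*P)]
    return rows_independent(transposed)


# Hypothetical proportions correct; both orderings are consistent.
P_ok = [
    [0.2, 0.4, 0.6],
    [0.3, 0.5, 0.8],
    [0.5, 0.7, 0.9],
]

# Hypothetical matrix in which the row order reverses across columns.
P_bad = [
    [0.2, 0.6],
    [0.5, 0.4],
]
```

Row independence says the ordering of person groups does not depend on the item; column independence says the ordering of items does not depend on the person group.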
65
65 INDEPENDENCE AXIOM (row)
66
66 Monotone Homogeneity (MH)
67
67 2PL:
68
68 3PL:
69
69 4PL:
70
70 INDEPENDENCE AXIOM (column)
71
71 ISOP (Scheiblechner 1995)
72
72 RASCH-1PL:
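The formulas for the logistic models named on these slides did not survive in the transcript. The sketch below uses the common textbook parameterization, which may differ from the lecture's notation: theta is person ability, b item difficulty, a discrimination, c a lower asymptote, and d an upper asymptote. All function names are hypothetical.

```python
from math import exp


def irt_4pl(theta, b, a=1.0, c=0.0, d=1.0):
    """P(correct) = c + (d - c) / (1 + exp(-a * (theta - b)))."""
    return c + (d - c) / (1.0 + exp(-a * (theta - b)))


def rasch(theta, b):
    """Rasch / 1PL: a = 1, c = 0, d = 1."""
    return irt_4pl(theta, b)


def irt_2pl(theta, b, a):
    """2PL: adds item discrimination a."""
    return irt_4pl(theta, b, a=a)


def irt_3pl(theta, b, a, c):
    """3PL: adds a lower asymptote c (e.g., for guessing)."""
    return irt_4pl(theta, b, a=a, c=c)


# When ability equals difficulty, the Rasch probability is one half.
p_rasch = rasch(0.0, 0.0)
# With a lower asymptote of 0.2 the same point gives 0.2 + 0.8 / 2.
p_3pl = irt_3pl(0.0, 0.0, a=1.5, c=0.2)
```

Under this parameterization each model is a special case of the next. With a positive discrimination, every curve increases in theta.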
73
73 Thomsen condition (e.g., double cancellation)
75
75 MH analysis ICC Crossings
76
76 DM analysis
77
77 Model Selection & Evaluation
78
78 Model Assessment: Detailed
79
79 Model Assessment: Detailed
Person Fit
Examinee    Item Responses    Posterior Predictive P-value
2154        110100            .67
279         101001            .12
987         000011            .00