1 EPSY 546: LECTURE 1 SUMMARY George Karabatsos
2 REVIEW
3 Test (& types of tests) REVIEW
4 Test (& types of tests) Item response scoring paradigms REVIEW
5 Test (& types of tests) Item response scoring paradigms Data paradigm of test theory (typical) REVIEW
6 DATA PARADIGM
7 Latent Trait $\theta \in \mathbb{R}$ (unidimensional) REVIEW: Latent Trait
8 Latent Trait $\theta \in \mathbb{R}$ (unidimensional) Real Examples of Latent Traits REVIEW: Latent Trait
9 Item Response Function (IRF) REVIEW: IRF
10 Item Response Function (IRF) –Represents different theories about latent traits. REVIEW: IRF
11 Item Response Function (IRF) –Dichotomous response: $P_j(\theta) = \Pr[X_j = 1 \mid \theta] = \Pr[\text{Correct response to item } j \mid \theta]$ REVIEW: IRF
12 Item Response Function (IRF) –Polychotomous response: $P_{jk}(\theta) = \Pr[X_j > k \mid \theta] = \Pr[\text{Exceed category } k \text{ of item } j \mid \theta]$ REVIEW: IRF
13 Item Response Function (IRF) –Dichotomous or Polychotomous response: $E_j(\theta) = [\text{Expected rating for item } j \mid \theta]$, with $0 < E_j(\theta) < K$ REVIEW: IRF
14 IRF: Dichotomous items
15 IRF: Polychotomous items
16 The unweighted total score $X_{+n}$ stochastically orders the latent trait $\theta$ (Huynh, 1994; Grayson, 1988) REVIEW: SCALES
17 4 Scales of Measurement –Conjoint Measurement REVIEW: SCALES
18 Conjoint Measurement –Row Independence Axiom REVIEW
19 Conjoint Measurement –Row Independence Axiom Property: Ordinal Scaling and unidimensionality of $\theta$ (test score) REVIEW
20 INDEPENDENCE AXIOM (row)
21 Conjoint Measurement –Row Independence Axiom Property: Ordinal Scaling and unidimensionality of $\theta$ (test score) IRF: Non-decreasing over $\theta$ REVIEW
22 Conjoint Measurement –Row Independence Axiom Property: Ordinal Scaling and unidimensionality of $\theta$ (test score) IRF: Non-decreasing over $\theta$ Models: MH, 2PL, 3PL, 4PL, True Score, Factor Analysis REVIEW
23 2PL:
24 3PL:
25 4PL:
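The formulas on the 2PL, 3PL and 4PL slides did not survive extraction. As a hedged sketch, the functions below use a common textbook parameterization (discrimination `a`, difficulty `b`, lower asymptote `c`, upper asymptote `d`, no scaling constant). These parameter names and the absence of a scaling constant are assumptions, not the lecture's own notation.

```python
import math

def irf_4pl(theta, a=1.0, b=0.0, c=0.0, d=1.0):
    """Pr[X_j = 1 | theta] under a four-parameter logistic IRF (assumed form).

    a: discrimination, b: difficulty, c: lower asymptote, d: upper asymptote.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_3pl(theta, a, b, c):
    """3PL: the 4PL with the upper asymptote fixed at 1."""
    return irf_4pl(theta, a=a, b=b, c=c, d=1.0)

def irf_2pl(theta, a, b):
    """2PL: the 3PL with the lower asymptote fixed at 0."""
    return irf_4pl(theta, a=a, b=b, c=0.0, d=1.0)

# At theta == b the logistic term equals 0.5, so each IRF sits midway
# between its lower and upper asymptotes.
p2 = irf_2pl(0.5, a=1.2, b=0.5)
p3 = irf_3pl(0.5, a=1.2, b=0.5, c=0.2)
p4 = irf_4pl(0.5, a=1.2, b=0.5, c=0.2, d=0.9)
```

With `a > 0`, each of these functions is non-decreasing in `theta`, which is the property the row independence axiom requires.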
26 Monotone Homogeneity (MH)
27 Conjoint Measurement –Column Independence Axiom (adding) REVIEW
28 Conjoint Measurement –Column Independence Axiom (adding) Property: Ordinal Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) REVIEW
29 INDEPENDENCE AXIOM (column)
30 Conjoint Measurement –Column Independence Axiom (adding) Property: Ordinal Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and non-intersecting over $\theta$ REVIEW
31 Conjoint Measurement –Column Independence Axiom (adding) Property: Ordinal Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and non-intersecting over $\theta$ Models: DM, ISOP REVIEW
32 DM/ISOP (Scheiblechner 1995)
33 Conjoint Measurement –Thomsen Condition (adding) REVIEW
34 Conjoint Measurement –Thomsen Condition (adding) Property: Interval Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) REVIEW
35 Thomsen condition (e.g., double cancellation)
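The figure for this slide is not available. As a rough illustration, one instance of double cancellation on a 3x3 table of response probabilities (rows are ordered person levels, columns are ordered items) can be checked as follows. The helper names and both example tables are hypothetical; a full test would examine every 3x3 submatrix and every variant of the condition.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response (used to build an additive table)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def double_cancellation_holds(m):
    """Check one double-cancellation instance on a 3x3 table m[row][col].

    Premise:    m[1][0] >= m[0][1]  and  m[2][1] >= m[1][2]
    Conclusion: m[2][0] >= m[0][2]
    Returns True when the premise fails or the conclusion holds.
    """
    premise = m[1][0] >= m[0][1] and m[2][1] >= m[1][2]
    return (not premise) or m[2][0] >= m[0][2]

thetas = [-1.0, 0.0, 1.0]        # rows: persons, low to high
difficulties = [1.0, 0.0, -0.5]  # columns: items, hard to easy
additive = [[rasch_p(t, b) for b in difficulties] for t in thetas]

# Made-up table: rows and columns are monotone, yet the instance fails.
non_additive = [
    [0.20, 0.50, 0.62],
    [0.50, 0.60, 0.65],
    [0.55, 0.70, 0.90],
]

rasch_ok = double_cancellation_holds(additive)      # True
other_ok = double_cancellation_holds(non_additive)  # False
```

The second table shows that monotone rows and columns alone do not guarantee double cancellation.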
36 Conjoint Measurement –Thomsen Condition (adding) Property: Interval Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and parallel (non-intersecting) over $\theta$ REVIEW
37 Conjoint Measurement –Thomsen Condition (adding) Property: Interval Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and parallel (non-intersecting) over $\theta$ Models: Rasch Model, ADISOP REVIEW
38 RASCH-1PL:
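The Rasch formula on this slide was lost in extraction. A common statement is $P_j(\theta) = \exp(\theta - b_j) / (1 + \exp(\theta - b_j))$, where the difficulty symbol $b_j$ is an assumption here. The sketch below illustrates what "parallel" means for two hypothetical items: on the logit scale, the gap between their IRFs is the same at every $\theta$.

```python
import math

def rasch_irf(theta, b):
    """Rasch (1PL) IRF in its common form; b is the item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
b_easy, b_hard = -0.5, 1.0  # hypothetical item difficulties

# Logit gap between the easy and the hard item at each theta.
gaps = [logit(rasch_irf(t, b_easy)) - logit(rasch_irf(t, b_hard)) for t in thetas]
# Every gap equals b_hard - b_easy = 1.5, so the curves never intersect.
```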
39 5 Challenges of Latent Trait Measurement REVIEW
40 5 Challenges of Latent Trait Measurement Test Theory attempts to address these challenges REVIEW
41 Test Construction (10 Steps) REVIEW
42 Test Construction (10 Steps) Basic Statistics of Test Theory REVIEW
43 Total Test Score ($X_+$) variance = Sum[Item Variances] + Sum[Item Covariances] REVIEW
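The decomposition can be checked numerically. The sketch below uses a small made-up 0/1 response matrix and assumes the covariance sum runs over all ordered pairs of distinct items, so each unordered pair is counted twice.

```python
from itertools import combinations
from statistics import mean, pvariance

def pcov(x, y):
    """Population covariance of two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Made-up data: rows are persons, columns are items.
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
items = list(zip(*scores))             # one tuple of scores per item
totals = [sum(row) for row in scores]  # total score per person

var_total = pvariance(totals)
sum_item_var = sum(pvariance(col) for col in items)
# Each unordered pair (i, j) appears twice in the sum over i != j.
sum_item_cov = 2 * sum(pcov(x, y) for x, y in combinations(items, 2))
```

Here `var_total` equals `sum_item_var + sum_item_cov` up to floating-point rounding.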
44 EPSY 546: LECTURE 2 TRUE SCORE TEST THEORY AND RELIABILITY George Karabatsos
45 TRUE SCORE MODEL Theory: Test score is a random variable. $X_{+n} = T_n + e_n$, where $X_{+n}$ is the observed test score of person $n$, $T_n$ is the true test score (unknown), and $e_n$ is random error (unknown).
46 TRUE SCORE MODEL The observed person test score $X_{+n}$ is a random variable (according to some distribution) with mean $T_n = E(X_{+n})$ and variance $\sigma^2(X_{+n}) = \sigma^2(e_n)$.
47 TRUE SCORE MODEL The observed person test score $X_{+n}$ is a random variable (according to some distribution) with mean $T_n = E(X_{+n})$ and variance $\sigma^2(X_{+n}) = \sigma^2(e_n)$. Random error $e_n = X_{+n} - T_n$ is distributed with mean $E(e_n) = E(X_{+n} - T_n) = 0$ and variance $\sigma^2(e_n) = \sigma^2(X_{+n})$.
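48 TRUE SCORE MODEL True Score: $T_n = E(X_n) = \sum_{s=0}^{S} s \, p_{ns}$, where $T_n$ is the true score of person $n$, $E(X_n)$ is the expected score of person $n$, $s \in \{0, 1, \ldots, s, \ldots, S\}$ is a possible score, and $p_{ns} = \Pr[\text{Person } n \text{ has test score } s]$.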
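Read as an expectation, the true score is the probability-weighted sum of the possible scores. A minimal sketch, with a hypothetical score distribution for one person:

```python
def true_score(p_ns):
    """Expected test score of one person.

    p_ns[s] is Pr[person n has test score s] for s = 0, 1, ..., S.
    """
    if abs(sum(p_ns) - 1.0) > 1e-9:
        raise ValueError("score probabilities must sum to 1")
    return sum(s * p for s, p in enumerate(p_ns))

# Hypothetical distribution over the scores 0, 1, 2, 3.
p_n = [0.1, 0.2, 0.4, 0.3]
t_n = true_score(p_n)  # 0*0.1 + 1*0.2 + 2*0.4 + 3*0.3 = 1.9
```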
49 TRUE SCORE MODEL 3 Assumptions: 1) Over the population of examinees, error has a mean of 0: $E[e] = 0$. 2) Over the population of examinees, true scores and error scores have 0 correlation: $\rho[T, e] = 0$.
50 TRUE SCORE MODEL 3 Assumptions: 3) For a set of persons, the correlation of the error scores between two testings is zero: $\rho[e_1, e_2] = 0$. –“Two testings”: when a set of persons take two separate tests, or complete two testing occasions with the same form. –The two sets of person scores are assumed to be randomly chosen from two independent distributions of possible observed scores.
51 TRUE SCORE ESTIMATION
54 TRUE SCORE ESTIMATION Test reliability is the proportion of variance of observed scores that is explained by the variance of the true scores.
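The variance-ratio reading of reliability can be illustrated with a small simulation. The normal distributions, their parameters and the sample size below are assumptions chosen for illustration only; true scores and errors are drawn independently, in line with the model's zero-correlation assumption.

```python
import random
from statistics import pvariance

random.seed(0)
n_persons = 20000

# Simulated true scores (variance 100) and independent errors (variance 25).
true_scores = [random.gauss(50, 10) for _ in range(n_persons)]
errors = [random.gauss(0, 5) for _ in range(n_persons)]
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability as true-score variance over observed-score variance.
# In the population this is 100 / (100 + 25) = 0.8; the sample value is close.
reliability = pvariance(true_scores) / pvariance(observed)
```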
55 TEST RELIABILITY The error of measurement.
56 TEST RELIABILITY The standard error of measurement (random error).
57 TEST RELIABILITY The standard error of measurement (random error). Estimated $(1 - \alpha) \times 100\%$ confidence interval around the test score:
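The slide's formulas were lost. A common textbook form takes the standard error of measurement as the observed-score standard deviation times the square root of one minus the reliability, and builds a normal-theory interval around the observed score. The sketch below assumes that form. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error_of_measurement(sd_x, reliability):
    """SEM = sd_x * sqrt(1 - reliability) (common textbook form)."""
    return sd_x * math.sqrt(1.0 - reliability)

def score_confidence_interval(x, sd_x, reliability, alpha=0.05):
    """Normal-theory (1 - alpha) * 100% interval around observed score x."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * standard_error_of_measurement(sd_x, reliability)
    return (x - half_width, x + half_width)

sem = standard_error_of_measurement(10.0, 0.84)            # 10 * sqrt(0.16) = 4.0
lower, upper = score_confidence_interval(60.0, 10.0, 0.84)  # about (52.16, 67.84)
```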
58 TEST RELIABILITY It is desirable for a test to be Reliable.
59 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test.
60 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test. Indicates the precision of a set of test scores in the sample.
61 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test. Indicates the precision of a set of test scores in the sample. Random and systematic error can affect the reliability of a test.
62 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test. Test developers have a responsibility to demonstrate the reliability of scores obtained from their tests.
63 ESTIMATING RELIABILITY Estimated item variance; estimated total test score variance
64 ESTIMATING RELIABILITY Estimated covariance between items i and j; estimated total test score variance
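The equations on these slides did not survive, but the labelled terms (item variances, item covariances, total score variance) match coefficient alpha. Assuming that is the intended estimator, a minimal sketch on made-up data:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha from a persons-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / total score variance)
    """
    k = len(scores[0])
    items = list(zip(*scores))
    totals = [sum(row) for row in scores]
    sum_item_var = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1.0 - sum_item_var / pvariance(totals))

# Made-up data: rows are persons, columns are items.
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)  # 1.5 * (1 - 0.72 / 1.04), about 0.46
```

Because the total score variance equals the sum of item variances plus the sum of item covariances, the same value results from `k / (k - 1)` times the ratio of summed covariances to total variance. Using sample (n - 1) variances throughout gives the same ratio.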
65 OTHER FORMS OF RELIABILITY Test-Retest Reliability: The correlation between persons’ test scores over two administrations of the same test.
66 OTHER FORMS OF RELIABILITY Split-Half Reliability (using Spearman-Brown correction for test length): $\rho_{AB}$ is the correlation between scores of Test A and Test B
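The Spearman-Brown formula itself is missing from the slide. Its standard form for a test lengthened by a given factor is sketched below; with a factor of 2 it steps the half-test correlation up to full-test reliability. The function name and the example correlation are hypothetical.

```python
def spearman_brown(r_ab, length_factor=2.0):
    """Projected reliability of a test lengthened by length_factor.

    r_ab is the correlation between the two halves (Test A and Test B).
    With length_factor = 2 this is 2 * r_ab / (1 + r_ab).
    """
    return length_factor * r_ab / (1.0 + (length_factor - 1.0) * r_ab)

split_half_reliability = spearman_brown(0.6)  # 1.2 / 1.6 = 0.75
```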
67 TEST VALIDITY VALIDITY: A test is valid if it measures what it claims to measure. Types: Face, Content, Concurrent, Predictive, Construct.
68 TEST VALIDITY Face validity: When the test items appear to measure what the test claims to measure. Content validity: When the content of the test items, according to experts, adequately represents the latent trait that the test intends to measure.
69 TEST VALIDITY Concurrent validity: When the test, measuring a particular latent trait, correlates highly with another test that measures the same trait. Predictive validity: When the scores of the test predict some meaningful criterion.
70 TEST VALIDITY Construct validity: A test has construct validity when the results of using the test fit hypotheses concerning the nature of the latent trait. The better the fit, the higher the construct validity.
71 RELIABILITY & VALIDITY Up to a point, reliability and validity increase together, but then any further increase in reliability (over ~.96) decreases validity. For example, when there is perfect reliability (perfect correlations between items), the test items are essentially paraphrases of each other.
72 RELIABILITY & VALIDITY “If the reliability of the items were increased to unity, all correlations between items would also become unity, and a person passing one item would pass all items and another failing one item would fail all the other items. Thus all the possible scores would be a perfect score of one or zero…Is the dichotomy of scores the best that would be expected for items with equal difficulty?” (Tucker, 1946, on the attenuation paradox) (see also Loevinger, 1954)