1
Interpretations of item thresholds for the partial credit model
Margaret Wu, University of Melbourne
2
Partial Credit Model
Polytomous items: the scoring categories for an item may be 0, 1, 2, … Dichotomous items are a special case of polytomous items, so there is no need to make a distinction. If there are K categories, then there are K−1 item thresholds. When writing scoring rubrics, we must ensure that students with higher abilities have a higher probability of obtaining higher scores; that is, the scoring categories are "ordered".
3
Example: PISA PS X601Q01 Cinema Outing
Which movie should the boys consider watching? (Yes/No for each movie: Children in the Net, Monsters from the Deep, Carnivore, Pokamin, Enigma, King of the Wild.)
Scoring rubrics:
Full credit, Code 2: Yes, No, No, No, Yes, Yes, in that order.
Partial credit, Code 1: One incorrect answer.
No credit, Code 0: Other responses.
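As an illustration only (the function name score_cinema_outing and the Python representation are assumptions, not part of the PISA materials), the rubric amounts to counting matches against the key Yes, No, No, No, Yes, Yes: six matches earn code 2, exactly five earn code 1, anything else earns code 0.

```python
# Hypothetical sketch of the Cinema Outing rubric (codes 2 / 1 / 0).
# The key follows the slide: Yes, No, No, No, Yes, Yes, in that order.
KEY = ["Yes", "No", "No", "No", "Yes", "Yes"]

def score_cinema_outing(responses):
    """Return 2 if all six answers are correct, 1 if exactly one is incorrect, else 0."""
    n_correct = sum(r == k for r, k in zip(responses, KEY))
    if n_correct == 6:
        return 2   # Full credit (code 2)
    if n_correct == 5:
        return 1   # Partial credit (code 1): one incorrect answer
    return 0       # No credit (code 0)

# Example: one incorrect answer (the first movie) gives partial credit.
print(score_cinema_outing(["No", "No", "No", "No", "Yes", "Yes"]))  # prints 1
```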
4
Partial credit versus dichotomous scoring
Whenever possible, the more score categories into which an item can divide students into ordered ability groups, the more discriminating the test will be, and the higher the test reliability. When the data fit the PCM, the maximum score of a PCM item corresponds to the discriminating power of the item.
5
How many score categories?
As many as can clearly distinguish student ability levels. More is better than fewer, provided the data fit the model. A higher score must correspond to higher ability; i.e., the scores must be ordered.
6
Maximum score for an item
The maximum score should not simply be the number of distinguishable score categories (e.g., "there are 5 categories, so the maximum score is 5"). This is NOT the right scoring method. A weight needs to be applied regardless of the number of score categories, and some items may be weighted more than others. How do we decide on the item weight (the maximum score of an item)?
7
Deciding on item weight (maximum score of an item)
Some people suggest:
The length of time test takers need to answer an item.
The "importance" of an item (to the construct).
The amount of "information" an item provides for the ability estimate.
Should item difficulty be taken into account? Actually no, not directly.
8
An example: measuring the propensity for developing skin cancer
Possible indicators: skin colour, eye colour, exposure to the sun, family history.
Skin colour: extremely fair, fair, medium, olive, brown, black (6 types, 0–6)
Eye colour: blue/green, brown, black (3 types, 0–3)
Hair colour: grey, blond, light brown, dark brown, black (0–5)
Exposure to the sun: rarely, occasionally, frequently (0–3)
Family history: yes, no (0–1)
9
Ordered scoring categories
Given two adjacent score categories, a higher ability student will have a higher chance of being in the higher score category. The local step (i.e., from step 1 to step 2, or from step 2 to step 3) follows a simple Rasch model: P(X = k | X ∈ {k−1, k}, θ) = exp(θ − δ_k) / (1 + exp(θ − δ_k)).
10
Expected score increases with ability
It can be proved that the expected score E(X) is an increasing function of θ.
11
Parameterisation of PCM
Masters (1982). Example: a 3-category item.
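In the usual notation, with ability $\theta$, item thresholds $\delta_1, \dots, \delta_m$ and maximum item score $m$, the Masters (1982) parameterisation of the PCM gives

$$
P(X = x \mid \theta) \;=\; \frac{\exp\!\left(\sum_{k=1}^{x}(\theta - \delta_k)\right)}{\sum_{h=0}^{m}\exp\!\left(\sum_{k=1}^{h}(\theta - \delta_k)\right)},
\qquad x = 0, 1, \dots, m,
$$

where the empty sum (for $x = 0$ or $h = 0$) is taken to be $0$. For the 3-category example ($m = 2$), writing $D = 1 + e^{\theta-\delta_1} + e^{2\theta-\delta_1-\delta_2}$,

$$
P(X=0) = \frac{1}{D}, \qquad
P(X=1) = \frac{e^{\theta-\delta_1}}{D}, \qquad
P(X=2) = \frac{e^{2\theta-\delta_1-\delta_2}}{D}.
$$

Dividing $P(X=k)$ by $P(X=k-1) + P(X=k)$ recovers the simple Rasch form of the local step quoted earlier.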
12
Graphical interpretations of the delta parameters
[Figure: category probability curves p0, p1 and p2 for a 3-category item, with the thresholds δ1 and δ2 marked on the ability scale.]
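A minimal numerical sketch of this picture, assuming the Masters parameterisation above and the illustrative thresholds δ = (−1, 1) (not values from the slide): the category probabilities p0, p1, p2 are evaluated over a range of abilities, and each threshold δ_k is the ability at which the curves for categories k−1 and k cross.

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Category probabilities P(X = 0..m | theta) under the partial credit model."""
    # Cumulative sums of (theta - delta_k), with 0 prepended for the score-0 category.
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    expcum = np.exp(cum)
    return expcum / expcum.sum()

deltas = (-1.0, 1.0)   # illustrative thresholds for a 3-category item
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p0, p1, p2 = pcm_probs(theta, deltas)
    print(f"theta={theta:+.1f}  p0={p0:.3f}  p1={p1:.3f}  p2={p2:.3f}")

# At theta = delta_1 = -1, p0 equals p1; at theta = delta_2 = +1, p1 equals p2:
# each threshold marks where two adjacent category probability curves intersect.
```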
13
Very few respondents are in the middle category
A 3-category item with δ1 > δ2 (a reversal in the parameters), but category ordering still holds. [Figure: category probability curves with δ2 lying to the left of δ1.]
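The same point can be checked numerically. A hedged sketch, using the same pcm_probs helper as in the previous snippet and the illustrative reversed thresholds δ = (1, −1): the middle category is never the single most probable one, so relatively few respondents land in it, yet the expected score still increases with ability.

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Category probabilities under the PCM (same sketch as before)."""
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    expcum = np.exp(cum)
    return expcum / expcum.sum()

deltas = (1.0, -1.0)                # delta_1 > delta_2: a "reversal" in the parameters
thetas = np.linspace(-3.0, 3.0, 13)
expected = []
for theta in thetas:
    p = pcm_probs(theta, deltas)
    expected.append(p @ np.arange(3))                 # E(X | theta)
    assert not (p[1] > p[0] and p[1] > p[2])          # middle category never dominates

assert np.all(np.diff(expected) > 0)                  # E(X | theta) still increases
print(np.round(expected, 3))
```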
14
Example: TIMSS M032764, Cost of Telephone Plans. Scoring rubrics:
2: answer of 150, with correct working shown.
1: answer of 150 with no working shown.
1: correct method but incorrect computation.
1: answer of 30.
0: other answers.
15
TIMSS M032764 item analysis
[Item analysis output for item 99 (M032764): discrimination 0.35; table of item thresholds, item deltas, weighted MNSQ, and per-category statistics (label, score, count, % of total, point-biserial, t, p, PV1 average, PV1 SD).]
16
TIMSS M032764 item characteristic curve
17
Parameter Estimation
The values of the delta parameters are determined by the number of respondents (frequency) in each score category i. The values of the delta parameters are NOT determined by who obtained score i, that is, by whether low or high ability students obtained a particular score. The PCM does not stipulate any particular pattern of frequencies across the score categories. So we can have, say, 200 scoring 0, 500 scoring 1 and 300 scoring 2, OR 500 scoring 0, 100 scoring 1 and 400 scoring 2.
18
Some simulation results
Generating deltas and score frequencies (for scores 0, 1, 2, 3) for five simulated items:
Item 1: generating deltas (-2, 0, 2); frequencies 586 / 1938 / 1960 / 516
Item 2: generating deltas (-1, 0, 1); frequencies 994 / 1558 / 1453 / 995
Item 3: generating deltas (0, 0, 0); frequencies 1547 / 921 / 941 / 1591
Item 4: generating deltas (1, 0, -1); frequencies 2035 / 471 / 482 / 2012
Item 5: generating deltas (2, 0, -2); frequencies 2298 / 195 / 202 / 2305
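A minimal sketch of how such a table can be produced. The generating thresholds are taken from the table, and the sample size of 5,000 matches the column totals; the standard normal ability distribution and the random seed are assumptions, not stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed; the slide's seed is unknown

def pcm_probs(theta, deltas):
    """Category probabilities under the PCM."""
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    expcum = np.exp(cum)
    return expcum / expcum.sum()

items = {1: (-2, 0, 2), 2: (-1, 0, 1), 3: (0, 0, 0), 4: (1, 0, -1), 5: (2, 0, -2)}
thetas = rng.standard_normal(5000)  # assumed N(0, 1) ability distribution

for item, deltas in items.items():
    scores = np.array([rng.choice(4, p=pcm_probs(t, deltas)) for t in thetas])
    freqs = np.bincount(scores, minlength=4)
    print(f"Item {item}  deltas={deltas}  score frequencies: {freqs.tolist()}")
```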
19
Expected Score Curves
20
Item Characteristic Curves
21
Item Discrimination
Generating deltas: Item 1 (-2, 0, 2), Item 2 (-1, 0, 1), Item 3 (0, 0, 0), Item 4 (1, 0, -1), Item 5 (2, 0, -2).
Which item is more "discriminating"? If the deltas are close together, does it mean the item does not separate students?
22
Discrimination of PCM items
If the data fit the PCM, then items with the same maximum score will have the same discrimination, but each item will discriminate students at different ability levels.
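A hedged numerical illustration of the second point, reusing the PCM sketch from earlier (the fact that the slope of the expected score curve at θ equals the category variance Var(X | θ) is a standard property of the PCM): items 1 and 5 from the simulation have the same maximum score, but their expected score curves are steep in different ability regions.

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Category probabilities under the PCM."""
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    expcum = np.exp(cum)
    return expcum / expcum.sum()

def expected_score_slope(theta, deltas):
    """Slope of E(X | theta); under the PCM this equals Var(X | theta)."""
    p = pcm_probs(theta, deltas)
    x = np.arange(len(p))
    mean = p @ x
    return p @ (x - mean) ** 2

for theta in (-2.0, 0.0, 2.0):
    s1 = expected_score_slope(theta, (-2, 0, 2))   # item 1: spread thresholds
    s5 = expected_score_slope(theta, (2, 0, -2))   # item 5: reversed thresholds
    print(f"theta={theta:+.0f}  item 1 slope: {s1:.2f}   item 5 slope: {s5:.2f}")

# Item 1 keeps a moderate slope over a wide ability range, while item 5 is much
# steeper near theta = 0 and flatter in the tails: same maximum score, but the
# items discriminate best at different ability levels.
```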
23
Consider the dichotomous case
24
Should deltas be certain distances apart?
No. Provided that the test as a whole can discriminate students at various ability levels, there need not be any restriction on the separation of the deltas for individual items.
25
In conclusion
As long as the items fit the PCM, reversed thresholds do not indicate an issue with an item. PCM thresholds do not need to be certain distances apart, as long as there are items that discriminate students at different ability levels. The maximum score of a PCM item needs to reflect the discriminating power of the item, not the number of possible categories.