Differential Item Functioning
Anatomy of the name

DIFFERENTIAL
– Differential calculus? No: comparing two groups
ITEM
– Focus on ONE item at a time
– Not the whole test
FUNCTIONING
– All we have is the item performance (1 or 0)
– Not about the content or format of the item

Is there any differential item functioning between the groups?
Why do we care about DIF?

Validation process of the test
– The test should be bias-free against minorities
– Necessary but not sufficient: inference or interpretation beyond the statistical data must be involved

Bias? DIF? Impact?
– DIF: conditional on ability
– Bias: pejorative in nature
– Impact: not conditional on ability
Definition of DIF

An item has no DIF if the probability of getting the item right depends only on ability, not on group membership. An item has DIF if, at the same ability level, that probability also depends on group membership.
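In symbols (a standard statement of this definition; the notation X for the item score, θ for ability, and G for group is assumed here, not taken from the slides):

```latex
% No DIF: once ability is fixed, group membership carries no information
P(X = 1 \mid \theta, G = g) = P(X = 1 \mid \theta) \quad \text{for every group } g
% DIF: at some ability level the focal and base probabilities differ
\exists\, \theta :\; P(X = 1 \mid \theta, G = \mathrm{focal}) \neq P(X = 1 \mid \theta, G = \mathrm{base})
```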
Causes & Types of DIF

Causes
– Construct-irrelevant variance
– Opportunity to learn

Types
– Adverse
– Benign
[Slide diagram, "Causes (K-12)": construct-irrelevant variance paired with adverse DIF (MP responsibility); opportunity to learn paired with benign DIF (field/client responsibility).]
Some DIF Examples
– Meaning of "ascend" in an MCAS vocabulary test
– Potato salad example in an NAEP biology test
– Train schedule in an urban area in an LSAT logical reasoning problem
– Color of a lemon, from ETS
Empirical Evidence

A DIF index is a kind of function.
Inputs:
– Item response vector
– Total score
– Group indicator
Output:
– A number called the DIF index
Feverish World of DIF

Every categorical data analysis method can be used, since the DIF index is simply a mathematical function with an item response vector as its main input.
– Mantel-Haenszel method
– Standardization method
– Logistic regression method
– Dimensionality analysis
– IRT-based methods
One question, many answers
– Mantel-Haenszel method: differences in the constant odds ratio
– Standardization method: differences in proportion correct
– Logistic regression method: the group-variable coefficient estimate
– Dimensionality analysis: a second dimension in the data
– IRT-based methods: the area between two ICCs
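As one concrete illustration, here is a minimal sketch of the logistic regression approach (the function name, variable names, and the use of statsmodels are assumptions, not from the slides). The item response is regressed on the matching score plus a group indicator; a group coefficient near zero suggests no uniform DIF.

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item_correct, total_score, group):
    """Logistic regression DIF check for one item.

    item_correct : 0/1 responses to the studied item
    total_score  : matching variable (e.g., total test score)
    group        : 0 = base/reference group, 1 = focal group
    """
    X = sm.add_constant(np.column_stack([total_score, group]))
    fit = sm.Logit(item_correct, X).fit(disp=0)
    # params[2] is the group coefficient; a value near 0 means no uniform DIF
    return fit.params[2], fit.pvalues[2]
```

Adding a score-by-group interaction term to the model would additionally test for non-uniform DIF.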
[Figure: the area between two item characteristic curves (ICCs), one for males and one for females.]
DIF in MP

Standardization method
– Index describing the degree of DIF: the standardized P-difference
Comparing groups
– Male vs. female
– White vs. Black
– White vs. Hispanic
Minimum of 200 examinees in a group
Classification of DIF
– A: standardized P-difference within [-0.05, 0.05] (negligible)
– B: within [-0.10, -0.05) or (0.05, 0.10] (low)
– C: outside [-0.10, 0.10] (high)
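The rule is simple enough to state directly in code (a sketch; the function name is mine):

```python
def dif_category(std_p_dif):
    """Map a standardized P-difference to the A/B/C DIF categories."""
    size = abs(std_p_dif)
    if size <= 0.05:
        return "A"  # negligible
    if size <= 0.10:
        return "B"  # low
    return "C"      # high
```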
Some more Jargon

Matching variable
– Conditional variable
– Total score, theta score, or an external measure
Focal group
– The study group
Base group
– The reference group
[Figures: the White group (base group) and the Black group (focal group), each with the item of interest. We can now study this item of interest for both the White group and the Black group.]
Impact vs. DIF

Impact
– Difference between two groups in performance at the item level (and at the total score level)
DIF
– Difference between two groups in performance at the item level AFTER the groups are matched on ability
Standardized P-Difference
1) Match the groups by score level
2) At every score level, get the proportion correct for each group
3) Apply a weight to each difference in proportion correct
4) Accumulate these weighted differences across all score levels
5) Divide the sum of the weighted differences by the sum of the weights
Formal Definition of Standardized P-Difference

$$\mathrm{STD\ P\text{-}DIF} = \frac{\sum_{m} w_m \,(P_{fm} - P_{bm})}{\sum_{m} w_m}$$

where
– $w_m$: weighting factor at score level m
– $P_{fm}$: proportion correct of the focal group at score level m
– $P_{bm}$: proportion correct of the base group at score level m
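A minimal sketch of this computation in Python (function and variable names are mine; taking w_m to be the focal-group count at each score level is an assumption, though it is the usual choice in the standardization method):

```python
import numpy as np

def standardized_p_dif(item, score, group, max_score):
    """Standardized P-difference for one dichotomous item.

    item  : array of 0/1 responses to the studied item
    score : matching total score for each examinee
    group : array with 'f' for focal and 'b' for base examinees
    """
    num = den = 0.0
    for m in range(max_score + 1):
        focal = item[(score == m) & (group == "f")]
        base = item[(score == m) & (group == "b")]
        if len(focal) == 0 or len(base) == 0:
            continue  # the score level must occur in both groups
        w = len(focal)  # assumed weight: focal-group count at this level
        num += w * (focal.mean() - base.mean())
        den += w
    return num / den
```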
Does it work?

If we know in advance which items have DIF, we can test whether the method catches them. We simulated data from a 40-item test in which one item had DIF: it was made more difficult for one group than for the other. We then ran the standardized P-difference procedure on every item. Ideally, the method makes the right decision on each one.
Data Simulation plan

Examinees
– 2,000 examinees in the focal group and 8,000 in the base group
– Focal group ability: ~N(0, 1)
– Base group ability: ~N(1, 1)
Items
– 40 MC items only
– 41 score levels (from 0 to 40)
DIF setting
– Only 1 item has DIF
– Its focal-group difficulty parameter is 1.0 higher than its base-group one
– All other items have the same parameters for both groups
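A sketch of this simulation under a Rasch (1PL) model; the slide does not name an IRT model, so the logistic form, the item-difficulty distribution, and which item carries the DIF are assumptions here:

```python
import numpy as np

rng = np.random.default_rng(0)
N_FOCAL, N_BASE, N_ITEMS = 2000, 8000, 40
DIF_ITEM = 0  # index of the DIF item (arbitrary choice for this sketch)

# Abilities as specified on the slide
theta = np.concatenate([rng.normal(0, 1, N_FOCAL),   # focal group
                        rng.normal(1, 1, N_BASE)])   # base group
focal = np.arange(theta.size) < N_FOCAL

# Item difficulties (assumed ~N(0,1)); the DIF item is 1.0 harder for the focal group
b = rng.normal(0, 1, N_ITEMS)
b_matrix = np.tile(b, (theta.size, 1))
b_matrix[focal, DIF_ITEM] += 1.0

# Rasch model: P(correct) = logistic(theta - difficulty)
p = 1 / (1 + np.exp(-(theta[:, None] - b_matrix)))
responses = (rng.random(p.shape) < p).astype(int)
scores = responses.sum(axis=1)
```

Feeding each column of responses, together with scores and the group labels, into the standardized P-difference sketch above should flag only the DIF item.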
[Figures: simulation results for ITEM 26, for ITEM 27, and for ITEM 26 and ITEM 27 together.]
Some more complexity?

Double differential functioning?
– Discrimination parameter or point-biserial correlation
How big is big?
– Hypothesis testing
Spoiled onion in the basket?
– Purification of the matching criterion
Polytomous items?
– Testlet DIF