Reflections and illustrations on DIF Paul De Boeck K.U.Leuven 25th IRT workshop Twente, October 2009.



Reflections and illustrations on DIF Paul De Boeck University of Amsterdam 25th IRT workshop Twente, October 2009

Is DIF a dead topic? A non-explanatory approach Paul De Boeck University of Amsterdam 25th IRT workshop Twente, October 2009

Is there life after death for DIF? A non-explanatory approach Paul De Boeck K.U.Leuven 25th IRT workshop Twente, October 2009

The three DIF generations (Zumbo, Language Assessment Quarterly)
1st generation: from “item bias” to “differential item functioning”
2nd generation: modeling item responses, IRT, multidimensional models
3rd generation: explanation of DIF
The end of history: “.. the pronouncements I hear from some quarters that psychometric and statistical research on DIF is dead or near dying..”

Outline
- Issues
- Reflections and more
- Possible answers

Issues
- Anchoring
- Statistic
- Indeterminacies

I apologize ...
There are already so many methods: yes
The best among the existing methods are very good methods: yes
They are standard and good practice: yes
Do we really need more? No, therefore no real issues
And still ...

1. Anchoring
Blind, iterative: purification
- all other items in step 1
- nonrejected items in following steps
A priori set, test
They work, based on pragmatism and a heuristic, or on prior theory. What more can one want?

2. Statistic and its distribution
Based on difference per item or set of items
- MH statistic
- ST-p-DIF
- $B_U$ from SIBTEST
- LR test statistic
- Raju distance
- Other
Parameter estimates
They work. What more can one want?

3. Indeterminacies with an IRT modeling approach
Basic model is
- 1PL or Rasch model for uniform DIF
- 2PL for uniform and non-uniform DIF type 1
- 2PL multidimensional for uniform and non-uniform DIF type 2

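Difficulties: uniform DIF
Additive or translational indeterminacy
$\beta_{fi} = \beta_{ri} + \delta_{\beta i}$
$\beta^*_{fi} = \beta_{ri} + \delta^*_{\beta i}$
$\delta^*_{\beta i} = \delta_{\beta i} + c_\beta$
$\gamma^* = \gamma - c_\beta$
$\beta_{fi}$, $\beta_{ri}$: focal group and reference group difficulties
$\delta_{\beta i}$: DIF effect
$\gamma$: group effect
$^*$: transformed values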
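The translational indeterminacy can be checked numerically. The sketch below assumes the additive logit used in the ANOVA slide (ability plus group effect plus item effect plus DIF effect); the parameter values and the function name `focal_logit` are made up for illustration.

```python
import numpy as np

def focal_logit(theta, gamma, beta_ref, delta):
    """Focal-group logit: ability + group effect + item effect + DIF effect."""
    return theta + gamma + beta_ref + delta

# Hypothetical values, chosen only to illustrate the indeterminacy.
beta_ref = np.array([-1.0, 0.0, 0.5, 1.2])   # reference-group item parameters
delta = np.array([0.0, 0.0, 0.6, -0.3])      # DIF effects
gamma, theta, c = 0.4, 0.25, 0.7             # group effect, ability, arbitrary shift

original = focal_logit(theta, gamma, beta_ref, delta)
# Shift every DIF effect by c and compensate in the group effect.
shifted = focal_logit(theta, gamma - c, beta_ref, delta + c)

same_fit = np.allclose(original, shifted)
print(same_fit)
```

Both parameter sets give identical logits, so the data alone cannot separate a common shift in the DIF effects from the group effect.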

Invariance of DIF explanation
$\delta_{\beta i} = \sum_{k=0} \omega_k X_{ik} \; (+\, \varepsilon_i)$
$X_{ik}$: value of item $i$ on item covariate $k$
$\omega_k$: weight of covariate $k$ in explaining DIF
$k=0$ for intercept
$\omega_{k>0}$ are translation invariant, and only these covariates have explanatory value
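A small least-squares check illustrates the invariance claim: adding a constant to all DIF effects changes only the intercept weight. The covariate values, DIF effects, and the helper `fit_weights` are hypothetical.

```python
import numpy as np

def fit_weights(delta_values, covariate):
    """Ordinary least squares of DIF effects on an intercept and one covariate."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    weights, *_ = np.linalg.lstsq(design, delta_values, rcond=None)
    return weights  # [omega_0 (intercept), omega_1 (covariate weight)]

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # hypothetical item covariate
delta = np.array([0.1, 0.3, 0.4, 0.8, 0.9])  # hypothetical DIF effects
c = 0.7                                      # arbitrary translation

w = fit_weights(delta, x)
w_shifted = fit_weights(delta + c, x)
print(w, w_shifted)
```

The covariate weight is the same in both fits, while the intercept absorbs the shift `c`.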

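Degrees of discrimination: non-uniform DIF type 1
Multiplicative indeterminacy
$\alpha_{fi} = \alpha_{ri} \times \delta_{\alpha i}$
$\alpha^*_{fi} = \alpha_{ri} \times \delta^*_{\alpha i}$
$\delta^*_{\alpha i} = \delta_{\alpha i} \times c_\alpha$
$\sigma^*_{\theta f} = \sigma_{\theta f} / c_\alpha$
additive formulation: discrimination DIF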
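The multiplicative case can be checked the same way. This sketch assumes the focal-group ability is written as $\sigma_{\theta f}$ times a standard normal score `z`; all numbers are illustrative.

```python
import numpy as np

alpha_ref = np.array([0.8, 1.0, 1.3])    # hypothetical reference discriminations
delta_alpha = np.array([1.0, 1.2, 0.9])  # hypothetical multiplicative DIF effects
sigma_f, c = 1.1, 1.5                    # focal-group ability SD, arbitrary rescaling
z = 0.4                                  # standardized ability of one focal person

# alpha_fi * theta_fp before and after the rescaling by c.
term = (alpha_ref * delta_alpha) * (sigma_f * z)
term_star = (alpha_ref * delta_alpha * c) * ((sigma_f / c) * z)

same_term = np.allclose(term, term_star)
print(same_term)
```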

Loadings for multidimensional models
The indeterminacies look a little embarrassing, because the results depend on one’s choice.

Reflections
- Random item effects
- Item mixture models
- Robust statistics

Intro: Beliefs
- DIF is gradual: why not a random item effect?
- DIF or no DIF: why not a latent class of DIF items?
- DIF items are a minority: why not identify outliers?

Where is the DIF?

Intro: ANOVA approach
$\eta_{gpi} = \ln\left(\Pr(Y_{gpi}=1)/\Pr(Y_{gpi}=0)\right)$
$\eta_{gpi} = \mu$ (overall mean)
$+\ \lambda_{gp} = \alpha\theta_{gp}$ (person effect, ability; $\theta_{gp} \sim N(0,1)$)
$+\ \lambda_i = \beta_i$ (item effect, overall item difficulty)
$+\ \lambda_g = \gamma_g$ (group effect)
$+\ \lambda_{gp}$ (interaction p x g does not exist)
$+\ \lambda_{gpi} = \alpha'_i\theta_{gp}$ (interaction pwg x i)
$+\ \lambda_{gi} = \beta'_{gi}$ (interaction i x g: uniform DIF)
$+\ \lambda_{gpi} = \alpha''_{gi}\theta_{gp}$ (interaction pwg x i x g: non-uniform DIF type 1)
2PL version
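As a rough illustration, the decomposition can be written as a function that sums the terms into a logit and converts it to a probability. The function and argument names are hypothetical, and the default of zero for the interaction terms is an assumption made here for convenience.

```python
import math

def logit_anova(theta, mu, alpha, beta_i, gamma_g,
                alpha_i=0.0, beta_gi=0.0, alpha_gi=0.0):
    """Sum the ANOVA-style terms of the slide into one logit."""
    person = alpha * theta             # person effect
    person_by_item = alpha_i * theta   # interaction pwg x i
    uniform_dif = beta_gi              # interaction i x g
    nonuniform_dif = alpha_gi * theta  # interaction pwg x i x g
    return (mu + person + beta_i + gamma_g
            + person_by_item + uniform_dif + nonuniform_dif)

def probability(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

# Uniform DIF shifts the logit by the same amount at every ability level.
low = logit_anova(-1.0, 0.0, 1.0, 0.2, 0.0, beta_gi=0.5) - logit_anova(-1.0, 0.0, 1.0, 0.2, 0.0)
high = logit_anova(2.0, 0.0, 1.0, 0.2, 0.0, beta_gi=0.5) - logit_anova(2.0, 0.0, 1.0, 0.2, 0.0)
print(low, high, probability(0.0))
```

With `alpha_gi` nonzero instead, the shift would depend on `theta`, which is what makes the DIF non-uniform.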

$+\ \lambda_{gpi} = \alpha'_i\theta_{gp}$
$+\ \lambda_{gpi} = \alpha''_{gi}\theta_{gp}$
interaction pwg x i x g is non-uniform DIF type 1
$+\ \lambda_{gpi} = \alpha'_i\theta_{gp1}$
$+\ \lambda_{gpi} = \alpha''_{gi}\theta_{gp2}$
interaction pwg x i x g is non-uniform DIF type 2

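Secondary dimension DIF
$\eta_{gpi} = (\alpha_i + g\delta_{\alpha i})\theta_{gp} + (\beta_i + g\delta_{\beta i}) + \lambda_g$
$\phantom{\eta_{gpi}} = \alpha_i\theta_{gp} + g\delta_{\alpha i}\theta_{gp} + (\beta_i + g\delta_{\beta i}) + \lambda_g$
Secondary-dimension DIF
$\eta_{gpi} = \alpha_i\theta_{gp1} + g\delta_{\alpha i}\theta_{gp2} + (\beta_i + g\delta_{\beta i}) + \lambda_g$
$g = 0$: reference group; $g = 1$: focal group
Cho, De Boeck & Wilson, NCME 2009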

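Secondary-dimension DIF can explain uniform DIF
$\eta_{gpi} = \alpha_i\theta_{gp1} + g\delta_{\alpha i}\theta_{gp2} + (\beta_i + g\delta_{\beta i}) + \lambda_g$
With $\theta_{gp2} = \mu_{\theta g2} + \theta'_{gp2}$: $g\delta_{\alpha i}\theta_{gp2} = g\delta_{\alpha i}\mu_{\theta g2} + g\delta_{\alpha i}\theta'_{gp2}$, where the mean term $g\delta_{\alpha i}\mu_{\theta g2}$ acts as uniform DIF $g\delta_{\beta i}$
Cho, De Boeck & Wilson, NCME 2009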
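The algebra can be verified with a few numbers. In this sketch a nonzero mean on the secondary dimension is moved into the uniform DIF term; the function name `eta_secondary` and all values are illustrative assumptions.

```python
import math

def eta_secondary(theta1, theta2, g, alpha, delta_alpha, beta, delta_beta, lam):
    """Logit of the secondary-dimension DIF model as written on the slide."""
    return alpha * theta1 + g * delta_alpha * theta2 + (beta + g * delta_beta) + lam

theta1, theta2_centered = 0.3, -0.8   # primary ability, centered secondary ability
mu2 = 0.6                             # focal-group mean on the secondary dimension
alpha, delta_alpha, beta, lam = 1.1, 0.5, -0.2, 0.4

# Version 1: secondary dimension has mean mu2, no explicit uniform DIF.
with_mean = eta_secondary(theta1, mu2 + theta2_centered, 1,
                          alpha, delta_alpha, beta, 0.0, lam)
# Version 2: centered secondary dimension, uniform DIF equal to delta_alpha * mu2.
with_uniform_dif = eta_secondary(theta1, theta2_centered, 1,
                                 alpha, delta_alpha, beta, delta_alpha * mu2, lam)

equivalent = math.isclose(with_mean, with_uniform_dif)
print(equivalent)
```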

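Different from the MIMIC model
Secondary dimension DIF
[Figure: two path diagrams, one with nodes $\eta_{gpi}$, $\theta_{gp1}$, $\theta_{gp2}$, $G$ and one with nodes $\eta_{gpi}$, $\theta_{gp1}$, $g\theta_{gp2}$, $G$]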

1. Random item effects
Within group random item effects
$(\beta_{ri}, \beta_{fi}) \sim N(\mu_{\beta r}, 0, \sigma^2_{\beta r}, \sigma^2_{\beta f}, \rho_{\beta r \beta f})$
$(\beta_i, \beta_{f-g\,i}) \sim N(\mu_{\beta r}, 0, \sigma^2_{\beta}, \sigma^2_{\beta f-g}, \rho_{\beta \beta f-g})$
small number of parameters
Idea based on Longford et al in Holland and Wainer (1993): for the MH there is evidence that the true DIF parameters are distributed continuously
Van den Noortgate & De Boeck, JEBS, 2005
Gonzalez, De Boeck & Tuerlinckx, Psychological Methods, 2008
De Boeck, Psychometrika, 2008
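To make the random item idea concrete, the sketch below draws reference and focal item parameters from a bivariate normal distribution and looks at their difference. The means, standard deviations, correlation, seed, and sample size are assumptions chosen for illustration, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(2009)

# Hypothetical distributional parameters for (beta_ri, beta_fi).
means = [0.0, 0.0]
sd_r, sd_f, rho = 1.0, 1.0, 0.9
cov = [[sd_r**2, rho * sd_r * sd_f],
       [rho * sd_r * sd_f, sd_f**2]]

betas = rng.multivariate_normal(means, cov, size=2000)  # columns: reference, focal
dif = betas[:, 1] - betas[:, 0]                         # item-level DIF effects

sample_rho = np.corrcoef(betas[:, 0], betas[:, 1])[0, 1]
print(round(sample_rho, 2), round(dif.var(), 2))
```

With equal variances, the DIF variance is $2\sigma^2(1-\rho)$, so a correlation close to 1 corresponds to little DIF overall; the whole pattern is described by a handful of distributional parameters rather than one DIF parameter per item.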

2. Latent class of DIF items
Asymmetric DIF is exported to other items
This is avoided when DIF items are removed; appropriate removal eliminates the interaction
Basis of the purification process
Let us make a latent class for the items to be removed, and identify the DIF items on the basis of their posterior probability

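Item mixture model
$\eta_{gpi} \mid c_i = 0 \;=\; \theta_{gp} + \beta_i$ (non-DIF class)
$\eta_{gpi} \mid c_i = 1 \;=\; \theta_{gp} + \beta_{gi}$ (DIF class)

            non-DIF                    DIF
reference   $\theta_{rp} + \beta_i$    $\theta_{rp} + \beta_{0i}$
focal       $\theta_{fp} + \beta_i$    $\theta_{fp} + \beta_{1i}$

Frederickx, Tuerlinckx, De Boeck & Magis, resubmitted 2009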

Further model specifications:
- item effects are random
- normal for the non-DIF items
- bivariate normal for the DIF item difficulties
- group specific normals for abilities
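The flagging logic of a latent item class can be illustrated with a deliberately simplified stand-in: a two-component normal mixture on an observed difficulty difference, with the posterior class probability computed by Bayes' rule. This is not the MCMC item mixture model of the talk; the prior class size and the two standard deviations are assumptions.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_dif(d, prior_dif=0.25, sd_clean=0.2, sd_dif=1.0):
    """Posterior probability that an item with difference d belongs to the DIF class.

    Assumed setup: clean items have differences close to zero (sd_clean),
    DIF items have widely spread differences (sd_dif).
    """
    weight_dif = prior_dif * normal_pdf(d, 0.0, sd_dif)
    weight_clean = (1.0 - prior_dif) * normal_pdf(d, 0.0, sd_clean)
    return weight_dif / (weight_dif + weight_clean)

for d in (0.0, 0.5, 1.5):
    print(d, round(posterior_dif(d), 3))
```

An item would be flagged when its posterior probability exceeds a chosen cutoff, which mirrors the role of the posterior probability in the mixture approach.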

Simulation study
1PL
P = 500, I = 20, 50 (x 2)
#DIF = 0, 5 (1.5, 1, 0.5, -1, -1.5) (x 2)
$\mu_{\theta 1} = 0$, $\mu_{\theta 2} = 0, 0.5$ (x 2)
= 16
$\mu_\beta = \mu_{\beta 0} = \mu_{\beta 1}$, $\sigma^2_\beta = \sigma^2_{\beta 0} = \sigma^2_{\beta 1} = 1$, $\rho_{\beta 0 \beta 1} = 0$
five replications
MCMC, WinBUGS
prior for the $\beta$ variance: inverse gamma, half normal, uniform
distributional parameters are estimated
posterior probability determines whether an item is flagged as DIF
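One cell of such a design could be generated along the following lines. This is a sketch under assumptions: the additive logit $\theta + \beta$ is taken from the ANOVA slide, P = 500 is read as persons per group, and the seed and the helper name `simulate_1pl` are hypothetical.

```python
import numpy as np

def simulate_1pl(n_persons, beta, delta, mu_theta_focal, rng):
    """Simulate 0/1 responses for a reference and a focal group under the 1PL."""
    theta_ref = rng.normal(0.0, 1.0, size=(n_persons, 1))
    theta_foc = rng.normal(mu_theta_focal, 1.0, size=(n_persons, 1))
    p_ref = 1.0 / (1.0 + np.exp(-(theta_ref + beta)))
    p_foc = 1.0 / (1.0 + np.exp(-(theta_foc + beta + delta)))  # DIF enters here
    y_ref = (rng.random(p_ref.shape) < p_ref).astype(int)
    y_foc = (rng.random(p_foc.shape) < p_foc).astype(int)
    return y_ref, y_foc

rng = np.random.default_rng(1)
n_items = 20
beta = rng.normal(0.0, 1.0, size=n_items)      # item effects with variance 1
delta = np.zeros(n_items)
delta[:5] = [1.5, 1.0, 0.5, -1.0, -1.5]        # the five DIF sizes from the slide

y_ref, y_foc = simulate_1pl(500, beta, delta, 0.5, rng)
print(y_ref.shape, y_foc.shape)
```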

Results simulation study

Method                   average #errors
LRT                      1.64
MH                       1.39
ST-p-DIF                 0.65
mixture inverse gamma    0.30
mixture normal           0.36
mixture uniform          0.40

The item mixture does better than or as well as every traditional method in all 16 cells

More results
- results of the mixture model are not affected by DIF being asymmetrical
- nor by the true distribution of item difficulties (normal vs uniform)

3. DIF items are outliers
Outlying with respect to the item difficulty difference between reference and focal group
Types of difference:
- simple difference
- standardized: divided by standard error
- Raju distance: first equal mean difficulty linking, then standardize
$\tau_i = \frac{I}{(I-1)^2} \times \frac{(d_i - d_.)^2}{s_d^2}$ is Beta$(0.5, (I-2)/2)$ distributed if $d_i$ is normally distributed
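The statistic and its Beta reference distribution could be computed as below. The sketch assumes that $s_d^2$ is the usual sample variance with denominator $I-1$ and uses `scipy.stats.beta.sf` for upper-tail probabilities; the difference values are invented.

```python
import numpy as np
from scipy import stats

def tau_statistics(d):
    """tau_i = I / (I - 1)^2 * (d_i - mean(d))^2 / s_d^2 for each item."""
    d = np.asarray(d, dtype=float)
    n_items = d.size
    s2 = d.var(ddof=1)  # assumption: sample variance with denominator I - 1
    return n_items / (n_items - 1) ** 2 * (d - d.mean()) ** 2 / s2

def tau_p_values(d):
    """Upper-tail probabilities under Beta(0.5, (I - 2) / 2)."""
    d = np.asarray(d, dtype=float)
    return stats.beta.sf(tau_statistics(d), 0.5, (d.size - 2) / 2)

# Hypothetical differences for 20 items; the last two are shifted.
d = np.array([0.1, -0.1] * 9 + [1.0, 1.0])
tau = tau_statistics(d)
p_values = tau_p_values(d)
print(np.round(tau, 3))
```

Because the outlying items also inflate the mean and the variance they are compared against, the classic statistic can understate how unusual they are, which motivates the robust variant.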

Go robust:
- $d_.$ is replaced by the median
- $s_d$ is replaced by the mean absolute deviation
Taking advantage of the fact that interitem variation is an approximation of the se if robustly estimated
De Boeck, Psychometrika 2008
Magis & De Boeck, 2009, rejected
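A minimal robust variant might look as follows. Two choices here are assumptions rather than details from the talk: the mean absolute deviation is taken around the median and rescaled by $\sqrt{\pi/2}$ so that it estimates a standard deviation under normality, and items are flagged with a normal cutoff of 1.96.

```python
import numpy as np

def robust_z(d):
    """Standardize differences with the median and a rescaled mean absolute deviation."""
    d = np.asarray(d, dtype=float)
    center = np.median(d)
    mean_abs_dev = np.mean(np.abs(d - center))
    scale = mean_abs_dev * np.sqrt(np.pi / 2.0)  # assumption: normal-consistency factor
    return (d - center) / scale

# Hypothetical differences for 20 items; items 19 and 20 carry DIF.
d = np.array([0.1, -0.1] * 9 + [1.0, 1.0])
z = robust_z(d)
flagged = np.flatnonzero(np.abs(z) > 1.96) + 1  # 1-based item numbers
print(flagged)
```

Since the median and the absolute deviations are less affected by the two shifted items than the mean and variance would be, the clean items stay close to zero and only the outlying items exceed the cutoff in this example.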

20 items, nrs 19 and 20 are the true DIF items

Simple difference

Standardized difference

Raju

Simulation study
1PL
P = 500, I = 20, 40 (x 2)
%DIF = 0%, 10%, 20% (x 3)
size of DIF = 0.2, 0.4, 0.6, 0.8, 1.0 (x 5)
$\mu_{\theta 1} = 0$, $\mu_{\theta 2} = 0, 1$ (x 2)
= replications

Results, 0% DIF
Methods: MH, SIBTEST, Logistic, Raju classic, Raju robust
Type 1 errors ≈ 5%

Results, DIF size = 1, P = 1000, I = 40, equal $\mu_\theta$
[Table: Type 1 error and power at 10% DIF and 20% DIF for MH, SIBTEST, Logistic, Raju classic, Raju robust; values not preserved]

Results are similar for unequal mean abilities.
Results are similar but less pronounced for smaller P and smaller DIF size.

Possible answers
Anchoring? Anchor set membership is a binary latent item variable, or, the clean set of items
Statistic? The robust statistic also works for nonparametric approaches
Indeterminacy? (go explanatory)
- no issue for the random item model, look at the cov
- equal means in the item mixture approach
- equal means for the Raju distance

Item mixtures and robust statistics do in one step what purification does in several steps, item by item, and through different purification steps; purification is approximate.
They both give a rationale for solving the indeterminacy issue.
The random item effect approach is not sensitive to the indeterminacy.

Si no è utile è ben ispirazione (If it is not useful, it is at least good inspiration)
Good for other purposes, or for a broader concept than DIF: qualitative differences between groups
- Random item models
- Item mixture models
- Robust statistics
IRT

Thank you, and stay alive

Dimensionality uniform DIF and non-uniform DIF type