The DIF-Free-Then-DIF Strategy for the Assessment of Differential Item Functioning

Similar presentations
The effect of differential item functioning in anchor items on population invariance of equating Anne Corinne Huggins University of Florida.

DIF Analysis. Galina Larina. March 2012, University of Ostrava.
Item Response Theory in Health Measurement
1 Characterizing the Impact of Horizontal Heat Transfer on the Linear Relation of Heat Flow and Heat Production Ronald G. Resmini Department of Geography.
Experimental Design, Response Surface Analysis, and Optimization
Factoring Polynomials
Confidence Intervals Underlying model: Unknown parameter We know how to calculate point estimates E.g. regression analysis But different data would change.
Simple Linear Regression
Using ranking and DCE data to value health states on the QALY scale using conventional and Bayesian methods Theresa Cain.
Jingchen Liu, Geongjun Xu and Zhiliang Ying (2012)
Testing factorial invariance in multilevel data: A Monte Carlo study Eun Sook Kim Oi-man Kwok Myeongsun Yoon.
How Science Works Glossary AS Level. Accuracy An accurate measurement is one which is close to the true value.
The Analysis of Variance
The Calibration Process
Statistical Methods in Computer Science Hypothesis Testing I: Treatment experiment designs Ido Dagan.
Lecture 19 Simple linear regression (Review, 18.5, 18.8)
Chapter 14 Inferential Data Analysis
Simple Linear Regression. Introduction In Chapters 17 to 19, we examine the relationship between interval variables via a mathematical equation. The motivation.
1 Simple Linear Regression 1. review of least squares procedure 2. inference for least squares lines.
Correlation Question 1 This question asks you to use the Pearson correlation coefficient to measure the association between [educ4] and [empstat]. However,
Relationships Among Variables
PSY 307 – Statistics for the Behavioral Sciences
Measurement & Experimental Design. Types of Variables Dependent variables – those variables we expect to change as the result of an experimental manipulation.
Measurement and Data Quality
Objectives of Multiple Regression
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
DIFFERENTIAL ITEM FUNCTIONING AND COGNITIVE ASSESSMENT USING IRT-BASED METHODS Jeanne Teresi, Ed.D., Ph.D. Katja Ocepek-Welikson, M.Phil.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
A statistical method for testing whether two or more dependent variable means are equal (i.e., the probability that any differences in means across several.
CJT 765: Structural Equation Modeling Class 7: fitting a model, fit indices, comparingmodels, statistical power.
University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay. Some basic statistical concepts, statistics.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 15 Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple.
t(ea) for Two: Test between the Means of Different Groups When you want to know if there is a ‘difference’ between the two groups in the mean Use “t-test”.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
Rasch trees: A new method for detecting differential item functioning in the Rasch model Carolin Strobl Julia Kopf Achim Zeileis.
Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Spring 2015 Room 150 Harvill.
Measurement Bias Detection Through Factor Analysis Barendse, M. T., Oort, F. J. Werner, C. S., Ligtvoet, R., Schermelleh-Engel, K.
Calibration of Response Data Using MIRT Models with Simple and Mixed Structures Jinming Zhang Jinming Zhang University of Illinois at Urbana-Champaign.
Chapter 16 Data Analysis: Testing for Associations.
Discussion of time series and panel models
CROSS-VALIDATION AND MODEL SELECTION Many Slides are from: Dr. Thomas Jensen -Expedia.com and Prof. Olga Veksler - CS Learning and Computer Vision.
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning W. Holmes Finch.
Scaling and Index Construction
University of Ostrava Czech republic 26-31, March, 2012.
Sampling Fundamentals 2 Sampling Process Identify Target Population Select Sampling Procedure Determine Sampling Frame Determine Sample Size.
INTRODUCTORY LECTURE 3 Lecture 3: Analysis of Lab Work Electricity and Measurement (E&M)BPM – 15PHF110.
Logistic Regression Analysis Gerrit Rooks
Item Parameter Estimation: Does WinBUGS Do Better Than BILOG-MG?
1 Chapter 14: Repeated-Measures Analysis of Variance.
Université d’Ottawa / University of Ottawa 2001 Bio 4118 Applied Biostatistics L14.1 Lecture 14: Contingency tables and log-linear models Appropriate questions.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
SW388R7 Data Analysis & Computers II Slide 1 Principal component analysis Strategy for solving problems Sample problem Steps in principal component analysis.
Chapter 13 Understanding research results: statistical inference.
ChE 551 Lecture 04 Statistical Tests Of Rate Equations 1.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
1 Simple Linear Regression Chapter Introduction In Chapters 17 to 19 we examine the relationship between interval variables via a mathematical.
Stats Methods at IC Lecture 3: Regression.

Inference for Least Squares Lines
Linear Regression.
Regression Analysis Module 3.
Correlation, Regression & Nested Models
CJT 765: Structural Equation Modeling
Detecting Item Parameter Drift
I. Statistical Tests: Why do we use them? What do they involve?
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

The DIF-Free-Then-DIF Strategy for the Assessment of Differential Item Functioning

Establishing a common metric for DIF assessment. Three approaches:
Equal-mean-difficulty (EMD)
All-other-item (AOI)
Constant-item (CI)

Equal-mean-difficulty (EMD): the mean item difficulties of the test are constrained to be equal across the two groups. Software: ConQuest, BILOG-MG, PARSCALE. The EMD assumption holds when (a) the test contains no DIF items, or (b) the test contains multiple DIF items but some favor the reference group and others favor the focal group by identical amounts, i.e., ASA ≈ 0. When ASA >> 0, EMD is severely violated. When the EMD method is applied, the overall amount of DIF in the test is forced to be balanced between groups; consequently, about half of the items will be classified as favoring the reference group and the other half as favoring the focal group, i.e., DTF = 0.
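As a toy illustration of that artifact (not taken from the slides), the snippet below adds a constant 0.6-logit shift to three of ten items for the focal group; once the group mean difficulties are constrained to be equal, the shift is spread across all items, so the seven truly DIF-free items pick up artifactual DIF of the opposite sign and the test-level DIF sums to zero. The Rasch-type difficulty values are made up.

```python
import numpy as np

b_ref = np.linspace(-1.5, 1.5, 10)   # reference-group item difficulties (illustrative)
b_foc = b_ref.copy()
b_foc[:3] += 0.6                     # constant DIF: three items are harder for the focal group

# EMD: force equal mean difficulty in both groups, then compare item by item.
dif_emd = (b_foc - b_foc.mean()) - (b_ref - b_ref.mean())
print(np.round(dif_emd, 2))          # DIF items show +0.42, clean items show -0.18; they cancel out
```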

All-other-item (AOI): all items except the studied item are treated as DIF-free and serve as the matching variable. Software: Winsteps, IRTLRDIF, CFA-based DIF detection. The AOI assumption holds when (a) the test is perfect, i.e., contains no DIF items at all, or (b) the studied item is the only DIF item in the test.

Constant-item (CI): a subset of items is selected to serve as anchors and establish a common metric for DIF assessment. The CI method yields appropriate DIF assessment even when the test contains 40% DIF items. In addition, the longer the anchor set, the higher the power to detect DIF, although four anchors are generally enough to yield a high power rate. The superiority of the CI method has been verified for both CFA-based DIF assessment (the MIMIC method) and IRT-based DIF detection techniques.

Scale purification procedure. Three major steps:
(a) calibrate the item parameters separately for each group, compute the linking coefficients (a slope and an intercept) to place the reference and focal groups on a common metric, and assess each item for DIF;
(b) re-link the group metrics using only those items not identified as having DIF in the previous step, and assess each item for DIF;
(c) repeat step (b) until the same items are identified as having DIF in two consecutive iterations.
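A minimal sketch of this iterative loop is shown below. It is not the authors' implementation: the per-group "calibration" is a crude stand-in (logit of the proportion incorrect) for a real IRT calibration, linking is done by a mean shift only (no slope), and the 0.5-logit flagging threshold is likewise an assumption.

```python
import numpy as np

def crude_difficulties(responses):
    """Placeholder calibration: logit of the proportion answering incorrectly.
    A real application would fit an IRT model separately in each group."""
    p = responses.mean(axis=0).clip(0.01, 0.99)
    return np.log((1.0 - p) / p)

def purify(ref, foc, threshold=0.5, max_iter=20):
    """Iterate linking and DIF flagging until the flagged item set stabilizes."""
    n_items = ref.shape[1]
    b_ref = crude_difficulties(ref)
    b_foc = crude_difficulties(foc)
    flagged = set()
    for _ in range(max_iter):
        anchors = [i for i in range(n_items) if i not in flagged]
        # Re-link using only the items not currently flagged as DIF (step b):
        # shift the focal-group difficulties onto the reference metric.
        shift = b_ref[anchors].mean() - b_foc[anchors].mean()
        dif = (b_foc + shift) - b_ref
        new_flagged = {i for i in range(n_items) if abs(dif[i]) > threshold}
        if new_flagged == flagged:   # step c: same items flagged in consecutive rounds
            break
        flagged = new_flagged
    return flagged, dif
```

Called on two 0/1 response matrices (examinees by items) for the reference and focal groups, `purify` returns the flagged item set and the linked item-level difficulty differences.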

DIF assessment methods with a scale purification procedure provide a substantial improvement over the EMD and AOI methods, but they deteriorate when there is a high percentage of DIF items (> 20%, constant DIF pattern).

Implementing the DFTD Strategy. Woods' rank-based (RB) method for selecting DIF-free items:
(a) assess item 1 for DIF using all other items as anchors, and repeat this step until the last item is assessed;
(b) for each item, compute the ratio of the likelihood ratio statistic (LR) to its number of free parameters (df), i.e., LR/df;
(c) rank order the items by the LR/df ratio;
(d) designate the desired number of items (generally 4) with the smallest ratios as anchors.
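The ranking step itself is simple; the sketch below assumes the per-item LR statistics and degrees of freedom have already been obtained (e.g., from an IRTLRDIF run with all other items as anchors). The item labels and values are made up for illustration.

```python
def select_anchors(lr_df, n_anchors=4):
    """lr_df maps item id -> (LR statistic, number of free parameters).
    Returns the n_anchors items with the smallest LR/df ratios."""
    ranked = sorted(lr_df, key=lambda item: lr_df[item][0] / lr_df[item][1])
    return ranked[:n_anchors]

# Hypothetical LR results for a 10-item test (2 free parameters per item under a 2PL model).
example = {1: (6.2, 2), 2: (1.1, 2), 3: (0.4, 2), 4: (9.8, 2), 5: (2.7, 2),
           6: (0.9, 2), 7: (5.3, 2), 8: (0.6, 2), 9: (3.4, 2), 10: (1.8, 2)}
print(select_anchors(example))   # -> [3, 8, 6, 2]
```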

Findings:
(a) the RB method has a very high rate of accuracy in selecting clean anchors when DIF magnitude and sample size are large; its performance with small samples is unknown;
(b) the rate of accuracy decreases as the percentage of DIF items increases, so a purification procedure is needed;
(c) the designated anchors yield more accurate DIF assessment than the AOI method in terms of Type I error and power rates, which supports the DFTD strategy.

Rank-based scale purification method (RB-S): adds a scale purification procedure to the RB method.
(a) assess item 1 for DIF using all other items as anchors, and repeat this step until the last item is assessed;
(b) remove the items identified as having DIF from the anchors and re-assess all items for DIF using the new anchors; repeat step (b) until the same set of items is identified as having DIF in two consecutive iterations;
(c) designate the desired number of items with the smallest LR/df ratios as anchors.
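A sketch of the RB-S control flow is given below, with the purification loop wrapped around the ranking step. The per-item DIF test is passed in as a callable (`lr_test`) because the slides leave it to the software in use (e.g., an IRT likelihood-ratio test); the function names, the critical-value comparison, and the default of four anchors are assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple

def rbs_anchors(items: List[int],
                lr_test: Callable[[int, List[int]], Tuple[float, int]],
                crit: float,
                n_anchors: int = 4) -> List[int]:
    """Rank-based anchor selection with scale purification (RB-S).
    lr_test(item, anchors) returns (LR statistic, free parameters) for that item."""
    flagged: Set[int] = set()
    results: Dict[int, Tuple[float, int]] = {}
    while True:
        for item in items:
            # Anchors = every other item not currently flagged as having DIF.
            anchors = [i for i in items if i != item and i not in flagged]
            results[item] = lr_test(item, anchors)
        new_flagged = {i for i, (lr, _) in results.items() if lr > crit}
        if new_flagged == flagged:     # same flag set in two consecutive iterations
            break
        flagged = new_flagged
    # Designate the items with the smallest LR/df ratios as the final anchors.
    ranked = sorted(items, key=lambda i: results[i][0] / results[i][1])
    return ranked[:n_anchors]
```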

Simulation Study One. Design:
(1) method: RB vs. RB-S;
(2) impact: reference group ability ~ N(0,1); focal group ability ~ N(0,1) or N(-1,1);
(3) sample size: 250/250, 500/500, 1000/1000;
(4) test length: 20 or 40 items;
(5) percentage of DIF items: 10%, 20%, 30%, 40%;
(6) DIF pattern: constant or balanced.

100 replications. Data-generating model: 2PL logistic model. DIF magnitude: ~ N(0.6, 0.01). Software: IRTLRDIF. Dependent variables: rate of accuracy, Type I error rate, and power.
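A minimal sketch of this data generation (not the authors' code) is shown below: 2PL responses for a reference and a focal group, with uniform DIF injected by shifting the difficulties of the DIF items for the focal group by draws from N(0.6, 0.01), read here as mean 0.6 and variance 0.01. The discrimination and difficulty distributions are assumptions, since the slide does not give them.

```python
import numpy as np

def simulate_2pl(theta, a, b, rng):
    """Dichotomous responses under the 2PL model."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(2012)
n_items, n_dif = 20, 4                         # 20-item test, 20% DIF condition
a = rng.lognormal(0.0, 0.2, n_items)           # assumed discriminations
b = rng.normal(0.0, 1.0, n_items)              # assumed difficulties

theta_ref = rng.normal(0.0, 1.0, 500)          # impact condition: N(0,1) vs. N(-1,1)
theta_foc = rng.normal(-1.0, 1.0, 500)

b_foc = b.copy()
b_foc[:n_dif] += rng.normal(0.6, np.sqrt(0.01), n_dif)   # constant pattern: all shifts hurt the focal group

resp_ref = simulate_2pl(theta_ref, a, b, rng)
resp_foc = simulate_2pl(theta_foc, a, b_foc, rng)
```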

Results: rate of accuracy.
(a) Both the RB and RB-S methods yield a rate of accuracy higher than the chance level;
(b) the larger the sample size and the longer the test, the higher the rate of accuracy;
(c) the RB and RB-S methods yield similar rates of accuracy when ASA is very close to zero (e.g., under the balanced DIF pattern);
(d) the RB-S method yields a higher rate of accuracy than the RB method when ASA is far from zero (e.g., a high percentage of DIF items under the constant pattern), which supports the DFTD strategy.

[Figure] Relationship between the ASA and the rate of accuracy for the RB and RB-S methods under the large-sample condition (R1000/F1000; 1c) in Study 1.

Results: Type I error and power rates.
(a) When ASA is close to zero, the RB and RB-S methods both yield well-controlled Type I error rates as well as high power rates.
(b) When ASA is far from zero, the RB method yields an inflated Type I error rate and a deflated power rate, whereas the RB-S method maintains good control of the Type I error rate and a high power rate.

Simulation Study Two. Purpose: compare the RB-S method with the AOI, AOI-S, and PA methods.
RB-S method: use the RB-S method to select four anchors, then assess the remaining items for DIF with the designated anchors (purification procedure, DFTD).
AOI method: all other items are treated as anchors when assessing the studied item for DIF (no purification, no DFTD).
AOI-S method: the AOI method with a scale purification procedure (purification procedure, no DFTD).
PA method: pure-anchor method, in which four clean anchors are selected by design and the remaining items are then assessed for DIF.

Design:
(1) method: RB-S, AOI, AOI-S, PA;
(2) impact: reference group ability ~ N(0,1); focal group ability ~ N(0,1) or N(-1,1);
(3) percentage of DIF items: 10%, 20%, 30%, 40%;
(4) DIF pattern: constant or balanced;
(5) sample size: 500/500;
(6) test length: 20 items.
Dependent variables: Type I error rate and power.

Results.
(a) When ASA is very close to zero, all methods yield a well-controlled Type I error rate, and the AOI and AOI-S methods yield a higher power rate than the RB-S and PA methods.
(b) When ASA is large, only the RB-S and PA methods maintain control of the Type I error rate and the power rate, whereas the AOI method suffers seriously from the large ASA, followed by the AOI-S method.

[Figure] Relationship between the ASA and the mean Type I error rate for the AOI, AOI-S, PA, and RB-S methods in Study 2.

Conclusions and Discussion.
Limitations of IRTLRDIF in implementing the CI method.
Other methods for selecting DIF-free items, e.g., the iterative constant-item method (Shih & Wang, 2009).
Nonuniform DIF; the Rasch model, 3PL, GPCM, ...
Implementing DFTD in IRT-based and non-IRT-based DIF assessment methods (MH, LR).

ASA

ASA ≈ 0: the test contains no DIF items, or it contains multiple DIF items whose effects cancel out between groups (balanced pattern).
ASA > 0: the test as a whole favors the reference group.
ASA < 0: the test as a whole favors the focal group.
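The slides do not spell out how ASA is computed; one plausible operationalization, consistent with the interpretations above and assumed here for illustration only, is the average signed difference between focal- and reference-group item difficulties (the average signed area between the group ICCs for Rasch-type items).

```python
import numpy as np

def asa(b_ref, b_foc):
    """Average signed focal-minus-reference difficulty difference (assumed reading of ASA).
    Positive values mean the test as a whole favors the reference group."""
    return float(np.mean(np.asarray(b_foc) - np.asarray(b_ref)))

# One item is 0.6 logits harder for the focal group, the rest are DIF-free -> ASA > 0.
print(asa([0.0, 0.5, -0.5, 1.0], [0.6, 0.5, -0.5, 1.0]))   # 0.15
```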