Differential Item Functioning

Similar presentations
Test Development.
Chapter 8 Flashcards.
DIF Analysis Galina Larina of March, 2012 University of Ostrava.
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT
Item Response Theory in Health Measurement
Measurement Reliability and Validity
Item Analysis: A Crash Course Lou Ann Cooper, PhD Master Educator Fellowship Program January 10, 2008.
Advanced Topics in Standard Setting. Methodology Implementation Validity of standard setting.
Validity In our last class, we began to discuss some of the ways in which we can assess the quality of our measurements. We discussed the concept of reliability.
Part II Knowing How to Assess Chapter 5 Minimizing Error p115 Review of Appl 644 – Measurement Theory – Reliability – Validity Assessment is broader term.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
Chapter 4 Validity.
Statistical Methods Chichang Jou Tamkang University.
Statistics in HRM Kenneth M. York School of Business Administration Oakland University.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Item Analysis Prof. Trevor Gibbs. Item Analysis After you have set your assessment: How can you be sure that the test items are appropriate?—Not too easy.
Examing Rounding Rules in Angoff Type Standard Setting Methods Adam E. Wyse Mark D. Reckase.
Part 5 Staffing Activities: Employment
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Discriminant Analysis Testing latent variables as predictors of groups.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Correlation Nabaz N. Jabbar Near East University 25 Oct 2011.
McGraw-Hill © 2006 The McGraw-Hill Companies, Inc. All rights reserved. Correlational Research Chapter Fifteen.
Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid.
DIFFERENTIAL ITEM FUNCTIONING AND COGNITIVE ASSESSMENT USING IRT-BASED METHODS Jeanne Teresi, Ed.D., Ph.D. Katja Ocepek-Welikson, M.Phil.
Kaizen–What Can I Do To Improve My Program? F. Jay Breyer, Ph.D. Presented at the 2005 CLEAR Annual Conference September Phoenix,
You got WHAT on that test? Using SAS PROC LOGISTIC and ODS to identify ethnic group Differential Item Functioning (DIF) in professional certification exam.
STRONG TRUE SCORE THEORY- IRT LECTURE 12 EPSY 625.
Unanswered Questions in Typical Literature Review 1. Thoroughness – How thorough was the literature search? – Did it include a computer search and a hand.
Understanding Statistics
The ABC’s of Pattern Scoring Dr. Cornelia Orr. Slide 2 Vocabulary Measurement – Psychometrics is a type of measurement Classical test theory Item Response.
Cara Cahalan-Laitusis Operational Data or Experimental Design? A Variety of Approaches to Examining the Validity of Test Accommodations.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
CHAPTER 6, INDEXES, SCALES, AND TYPOLOGIES
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
A MULTIDIMENSIONAL APPROACH TO THE IDENTIFICATION OF TEST FAIRNESS EXPLORATION OF THREE MULTIPLE-CHOICE SSC PAPERS IN PAKISTAN Syed Muhammad Fahad Latifi.
Empirical Bayes DIF Assessment Rebecca Zwick, UC Santa Barbara Presented at Measured Progress August 2007.
Correlational Research Chapter Fifteen Bring Schraw et al.
Dimensionality of the latent structure and item selection via latent class multidimensional IRT models FRANCESCO BARTOLUCCI.
6. Evaluation of measuring tools: validity Psychometrics. 2012/13. Group A (English)
Discriminant Analysis Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor.
1 Differential Item Functioning in Mplus Summer School Week 2.
Evaluating Risk Adjustment Models Andy Bindman MD Department of Medicine, Epidemiology and Biostatistics.
MOI UNIVERSITY SCHOOL OF BUSINESS AND ECONOMICS CONCEPT MEASUREMENT, SCALING, VALIDITY AND RELIABILITY BY MUGAMBI G.K. M’NCHEBERE EMBA NAIROBI RESEARCH.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
The ABC’s of Pattern Scoring
The Practice of Social Research Chapter 6 – Indexes, Scales, and Typologies.
Chapter 8: Confidence Intervals based on a Single Sample
Reliability performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
FIT ANALYSIS IN RASCH MODEL University of Ostrava Czech republic 26-31, March, 2012.
Chapter 6 - Standardized Measurement and Assessment
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
Lesson 2 Main Test Theories: The Classical Test Theory (CTT)
DISCRIMINANT ANALYSIS. Discriminant Analysis  Discriminant analysis builds a predictive model for group membership. The model is composed of a discriminant.
Intro to Research Methods
Indexes, Scales, and Typologies
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
Evaluation of measuring tools: validity
Introduction to the Validation Phase
Validity and Reliability
MANA 4328 Dennis C. Veit Measurement MANA 4328 Dennis C. Veit 1.
Week 3 Class Discussion.
Week 10 Slides.
Reliability and Validity of Measurement
15.1 The Role of Statistics in the Research Process
Evaluating Multi-item Scales
Presentation transcript:

Differential Item Functioning

Anatomy of the name
DIFFERENTIAL
– Differential calculus?
– Comparing two groups
ITEM
– Focus on ONE item at a time
– Not the whole test
FUNCTIONING
– All we have is the item performance (1 or 0)
– Not about the content or format of the item
Is there any Differential Item Functioning between groups?

Why do we care about DIF?
Validation process of the test
– Bias-free against minorities: necessary but not sufficient
– Inference or interpretation beyond the statistical data must be involved
Bias? DIF? Impact?
– DIF: conditional on ability
– Bias: pejorative in nature
– Impact: not conditional on ability

Definition of DIF
An item has no DIF if the probability of getting the item right depends only on ability, not on group membership. An item has DIF if, conditional on ability, the probability of getting the item right also depends on group membership.
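
In symbols, a minimal restatement of this definition (the notation, with theta for ability, G for group membership, and X_i for the response to item i, is ours rather than the slides'):

```latex
% No DIF: at every ability level, both groups have the same probability
% of answering item i correctly.
P(X_i = 1 \mid \theta, G = \text{focal}) = P(X_i = 1 \mid \theta, G = \text{reference})
% DIF is present whenever this equality fails for some value of \theta.
```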

Causes & Types of DIF Causes –Construct irrelevant variance –Opportunity to learn Types –Adverse –Benign

Causes (K-12)
[Diagram: construct-irrelevant variance and opportunity to learn, linked to adverse and benign DIF and to responsibility (MP vs. the field/client)]

Some DIF Examples
– Meaning of "ascend" in an MCAS vocabulary test
– Potato salad item in a NAEP biology test
– Train schedule in an urban area in an LSAT logical-reasoning problem
– Color of a lemon, from ETS

Empirical Evidence
A DIF index is a kind of function.
Inputs:
– Item response vector
– Total score
– Group indicator
Output:
– A number called the DIF index
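
As a rough sketch of that input/output shape (the name and signature below are hypothetical, introduced only for illustration; concrete versions appear on later slides):

```python
import numpy as np

def dif_index(item_responses: np.ndarray,  # 0/1 responses to ONE item, one per examinee
              total_scores: np.ndarray,    # matching variable, e.g. the total test score
              is_focal: np.ndarray) -> float:
    """Placeholder: every DIF statistic maps these three inputs to a single number."""
    raise NotImplementedError  # Mantel-Haenszel, STD P-DIF, etc. fill this in below
```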

Feverish World of DIF
Every categorical data analysis method can be used, since a DIF index is simply a mathematical function with the item response vector as its main input.
– Mantel-Haenszel method
– Standardization method
– Logistic regression method
– Dimensionality analysis
– IRT-based methods
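
As one concrete illustration, here is a minimal sketch (ours, not from the slides) of the Mantel-Haenszel approach for a single item, stratifying on total score; the rescaling of the common odds ratio by -2.35 ln(alpha) to the ETS delta metric is a standard convention assumed here.

```python
import numpy as np

def mantel_haenszel_dif(item_responses, total_scores, is_focal):
    """Mantel-Haenszel common odds ratio for one item, stratified by total score."""
    is_focal = np.asarray(is_focal, dtype=bool)
    num, den = 0.0, 0.0
    for m in np.unique(total_scores):
        at_m = total_scores == m
        ref_right = np.sum(at_m & ~is_focal & (item_responses == 1))  # A_m
        ref_wrong = np.sum(at_m & ~is_focal & (item_responses == 0))  # B_m
        foc_right = np.sum(at_m & is_focal & (item_responses == 1))   # C_m
        foc_wrong = np.sum(at_m & is_focal & (item_responses == 0))   # D_m
        n_m = ref_right + ref_wrong + foc_right + foc_wrong
        if n_m == 0:
            continue                        # no examinees at this score level
        num += ref_right * foc_wrong / n_m  # sum of A_m * D_m / N_m
        den += ref_wrong * foc_right / n_m  # sum of B_m * C_m / N_m
    alpha_mh = num / den                    # common odds ratio; 1.0 means no DIF
    return alpha_mh, -2.35 * np.log(alpha_mh)  # second value: ETS "MH D-DIF" delta scale
```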

One question, many answers
– Mantel-Haenszel method: difference expressed as a constant odds ratio
– Standardization method: difference in proportion correct
– Logistic regression method: coefficient estimate for the group variable
– Dimensionality analysis: a second dimension in the data
– IRT-based methods: area between two ICCs
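
For the logistic regression entry, a minimal sketch (ours; the slides name the method but give no code) using statsmodels: with the total score already in the model, a significant coefficient on the group indicator signals uniform DIF, and a group-by-score interaction term could be added to probe non-uniform DIF.

```python
import numpy as np
import statsmodels.api as sm

def logistic_regression_dif(item_responses, total_scores, is_focal):
    """Fit logit P(correct) = b0 + b1*score + b2*group; b2 captures uniform DIF."""
    X = np.column_stack([
        np.ones_like(total_scores, dtype=float),  # intercept
        np.asarray(total_scores, dtype=float),    # matching variable
        np.asarray(is_focal, dtype=float),        # group indicator (1 = focal)
    ])
    fit = sm.Logit(item_responses, X).fit(disp=0)
    return fit.params[2], fit.pvalues[2]          # group coefficient and its p-value
```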

Area between two ICCs
[Figure: item characteristic curves for the male and female groups; DIF corresponds to the area between the two ICCs]
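
A small numerical sketch (ours) of that area measure, assuming 2PL item characteristic curves; closed-form expressions exist (e.g., Raju's area statistics), but numerical integration over a grid of ability values is enough to illustrate the idea.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def unsigned_area_between_iccs(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=2001):
    """Unsigned area between the reference- and focal-group ICCs over [lo, hi]."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc))
    return np.trapz(gap, theta)

# Example: same discrimination, but the focal group finds the item 1.0 logit harder,
# so the unsigned area is approximately 1.0.
print(unsigned_area_between_iccs(a_ref=1.0, b_ref=0.0, a_foc=1.0, b_foc=1.0))
```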

DIF in MP
Standardization method
Index describing the degree of DIF
– Standardized P-Difference
Comparing groups
– Male vs. Female
– White vs. Black
– White vs. Hispanic
Minimum of 200 examinees in a group

Classification of DIF
– A: within [-0.05, 0.05], negligible
– B: within [-0.1, -0.05) or (0.05, 0.1], low
– C: outside [-0.1, 0.1], high
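
A tiny sketch (ours) of this classification rule, applied to a standardized P-difference value:

```python
def classify_dif(std_p_diff: float) -> str:
    """Map a standardized P-difference to the A/B/C categories above."""
    magnitude = abs(std_p_diff)
    if magnitude <= 0.05:
        return "A"  # negligible
    if magnitude <= 0.10:
        return "B"  # low
    return "C"      # high
```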

Some more jargon
Matching variable
– Conditioning variable
– Total score, theta score, or an external measure
Focal group
– The studied group
Base group
– The reference group

[Figure: the item of interest as seen by the White group (base group) and the Black group (focal group)]

[Figure: the White group and the Black group shown side by side]
We can now study this item of interest for both the White group and the Black group.

Impact vs. DIF
Impact
– Difference between the two groups in performance at the item level (and at the total score level)
DIF
– Difference between the two groups in performance at the item level AFTER the groups have been matched with respect to ability

Standardized P-Difference
1) Match the groups by score level
2) At every score level, get the proportion correct for each group
3) Apply a weight to the difference in proportion correct at each level
4) Accumulate these weighted differences across all score levels
5) Divide the sum of the weighted differences by the sum of the weights

Formal Definition of Standardized P-Difference

STD P-DIF = [ Σ_m w_m (P_fm - P_bm) ] / [ Σ_m w_m ]

w_m: weighting factor at score level m
P_fm: proportion correct in the focal group at score level m
P_bm: proportion correct in the base group at score level m
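
A minimal sketch (ours) of that formula, taking the weight w_m to be the number of focal-group examinees at score level m, a common convention the slides do not spell out:

```python
import numpy as np

def standardized_p_difference(item_responses, total_scores, is_focal):
    """STD P-DIF = sum_m w_m * (P_fm - P_bm) / sum_m w_m, with w_m = focal-group N at level m."""
    is_focal = np.asarray(is_focal, dtype=bool)
    weighted_diff, total_weight = 0.0, 0.0
    for m in np.unique(total_scores):
        foc = is_focal & (total_scores == m)
        base = ~is_focal & (total_scores == m)
        if foc.sum() == 0 or base.sum() == 0:
            continue                        # a level needs both groups to contribute
        p_fm = item_responses[foc].mean()   # focal-group proportion correct at level m
        p_bm = item_responses[base].mean()  # base-group proportion correct at level m
        w_m = foc.sum()                     # weight: number of focal examinees at level m
        weighted_diff += w_m * (p_fm - p_bm)
        total_weight += w_m
    return weighted_diff / total_weight
```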

[Slide: review of the summation notation Σ used in the formula above]

Does it work?
If we know in advance which items have DIF, we can test whether the method catches that DIF properly. We simulated data from a 40-item test. One item had DIF: we made it more difficult for one group than for the other. We then ran the Standardized P-Difference procedure on every item. Ideally, the method makes the right decision for each item.

Data Simulation Plan
Examinees
– 2000 examinees in the focal group and 8000 in the base group
– Focal group ability: ~N(0, 1)
– Base group ability: ~N(1, 1)
Items
– 40 MC items only
– 41 score levels (from 0 to 40)
DIF setting
– Only 1 item has DIF
– For that item, the focal-group difficulty parameter is 1.0 higher than the base-group one
– All other items have the same parameters for both groups
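
A minimal sketch (ours) of such a simulation, assuming a Rasch (1PL) response model, an arbitrary spread of difficulty values, and item index 25 as the DIF item; the slides fix the group sizes, ability distributions, and the +1.0 difficulty shift, but not these remaining details. The evaluation loop at the end reuses the standardized_p_difference and classify_dif sketches from the earlier slides.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_focal, n_base, n_items = 2000, 8000, 40
theta = np.concatenate([rng.normal(0.0, 1.0, n_focal),   # focal-group abilities ~ N(0, 1)
                        rng.normal(1.0, 1.0, n_base)])    # base-group abilities  ~ N(1, 1)
is_focal = np.arange(n_focal + n_base) < n_focal

b = np.linspace(-2.0, 2.0, n_items)        # base-group difficulties (assumed values)
dif_item = 25                              # assumed index of the single DIF item
b_person = np.tile(b, (theta.size, 1))     # per-examinee copy of the difficulties
b_person[is_focal, dif_item] += 1.0        # DIF: item is 1.0 harder for the focal group

p_correct = 1.0 / (1.0 + np.exp(-(theta[:, None] - b_person)))   # Rasch probabilities
responses = (rng.random(p_correct.shape) < p_correct).astype(int)
total_scores = responses.sum(axis=1)       # 41 possible score levels, 0..40

# Evaluate every item with the standardized P-difference sketched above.
for i in range(n_items):
    value = standardized_p_difference(responses[:, i], total_scores, is_focal)
    print(i, round(float(value), 3), classify_dif(value))
```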

[Figure: results for Item 26]

[Figure: results for Item 27]

[Figure: Item 26 and Item 27 compared]

Some more complexity?
Double differential functioning?
– Discrimination parameter or point-biserial correlation
How big is big?
– Hypothesis testing
Spoiled onion in the basket?
– Purification of the matching criterion
Polytomous items
– Testlet DIF