DIF detection using OLR


DIF detection using OLR
Paul K. Crane, MD MPH
Internal Medicine, University of Washington

Outline
Statistical background
DIFdetect package
What do we do when we find DIF?
DIF adjustments to PARSCALE code
How good are adjusted scores?
Discussion

Statistical background
Recall the definition of DIF: it occurs when demographic characteristic(s) interfere with the relationship expected between ability level and responses to an item. This is a conditional definition: we have to control for ability level, or else we can't differentiate between DIF and differential test impact.

Logistic regression applied to DIF detection
Swaminathan and Rogers (1990) tested two models, where X is the ability (matching) score and f is the logistic function:
P(Y=1|X, group) = f(β1X + β2*group + β3*X*group)
P(Y=1|X) = f(β1X)
They compared the difference in −2 log likelihoods between these two models to a chi-squared distribution with 2 df, so uniform and non-uniform DIF are tested at the same time.
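A minimal sketch of this 2-df likelihood ratio test, written in Python rather than the STATA used by DIFdetect. Assumptions: both models include an intercept (left implicit in f above), the fitter is a plain Newton-Raphson routine written only for illustration, and the simulated data and coefficient values are hypothetical. The helper names (solve, fit_logistic, swaminathan_rogers_test) are ours, not DIFdetect's.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; each row of X starts with 1 (intercept).
    Returns (coefficients, log-likelihood at the final coefficients)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def swaminathan_rogers_test(x, group, y):
    """2-df LR test: full model (x, group, x*group) versus reduced model (x only).
    Returns (G2, p_value), where G2 is the drop in -2 log likelihood."""
    full = [[1.0, xi, g, xi * g] for xi, g in zip(x, group)]
    reduced = [[1.0, xi] for xi in x]
    _, ll_full = fit_logistic(full, y)
    _, ll_reduced = fit_logistic(reduced, y)
    g2 = max(0.0, (-2.0 * ll_reduced) - (-2.0 * ll_full))
    p_value = math.exp(-g2 / 2.0)  # exact chi-squared survival function for 2 df
    return g2, p_value


# Hypothetical simulated data: one item with uniform DIF favouring group 1.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
group = [rng.randint(0, 1) for _ in range(1000)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(1.5 * xi + 1.0 * g))) else 0
     for xi, g in zip(x, group)]
g2, p = swaminathan_rogers_test(x, group, y)
```

Because the two models are nested, G2 is non-negative, and with 2 df its chi-squared tail probability has the closed form exp(−G2/2), so no statistics library is needed.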

Camilli and Shepard (1994)
Recommended a two-step procedure: first test for non-uniform DIF, then for uniform DIF, using three models:
P(Y=1|X, group) = f(β1X + β2*group + β3*X*group)
P(Y=1|X, group) = f(β1X + β2*group)
P(Y=1|X) = f(β1X)
The −2 log likelihoods of each successive pair of models are compared, so non-uniform DIF (first versus second model) and uniform DIF (second versus third model) are determined in two separate steps.
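Given the −2 log likelihoods of the three nested models, the two steps reduce to two 1-df chi-squared tests. A minimal sketch; the function names, the default α = 0.05, and the example −2 log likelihood values are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """P(chi-squared with 1 df > x), using P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def two_step_dif(m2ll_interaction, m2ll_group, m2ll_x_only, alpha=0.05):
    """Arguments are -2 log likelihoods of:
      f(b1*X + b2*group + b3*X*group), f(b1*X + b2*group), f(b1*X).
    Step 1 (non-uniform DIF) compares the first two models;
    step 2 (uniform DIF) compares the last two. Each test has 1 df."""
    p_nonuniform = chi2_sf_1df(m2ll_group - m2ll_interaction)
    p_uniform = chi2_sf_1df(m2ll_x_only - m2ll_group)
    return {
        "nonuniform": p_nonuniform < alpha,
        "uniform": p_uniform < alpha,
        "p_nonuniform": p_nonuniform,
        "p_uniform": p_uniform,
    }


# Hypothetical -2 log likelihoods for one item.
result = two_step_dif(2000.0, 2001.0, 2030.0)
```

Here the interaction term lowers −2 log likelihood by only 1.0 (p ≈ 0.32), while the group term lowers it by 29.0, so this hypothetical item would show uniform but not non-uniform DIF.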

Millsap and Everson (1994)
Dismissive of "observed score" techniques such as logistic regression: X is built from the test items, several of which have DIF, so adjusting for X is theoretically problematic. Advocated latent-variable approaches such as IRT for DIF detection. A very influential publication.

Zumbo (1999)
Extended the Swaminathan and Rogers framework to the ordinal logistic regression case to handle polytomous items. Did not address the latent trait, and used a single step rather than two steps.
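For a polytomous item, one standard form of ordinal logistic regression is the cumulative-logit (proportional odds) model. The sketch below computes category probabilities under that model with the same ability, group, and ability×group terms. The parameterization (cutpoints subtracted from the linear predictor), the function names, and all numeric values are assumptions for illustration, not Zumbo's or DIFdetect's code.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def olr_category_probs(ability, group, b1, b2, b3, cutpoints):
    """Category probabilities for an ordinal item under a cumulative-logit model:
        P(Y > k) = logistic(b1*ability + b2*group + b3*ability*group - cutpoints[k])
    cutpoints must be strictly increasing; returns len(cutpoints) + 1 probabilities."""
    eta = b1 * ability + b2 * group + b3 * ability * group
    # exceed[k] = P(Y > k - 1): 1 for the lowest category, 0 beyond the highest.
    exceed = [1.0] + [logistic(eta - c) for c in cutpoints] + [0.0]
    return [exceed[k] - exceed[k + 1] for k in range(len(cutpoints) + 1)]


def expected_score(probs):
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical 4-category item with uniform DIF (b2 > 0, b3 = 0).
cuts = [-1.0, 0.0, 1.5]
p0 = olr_category_probs(0.5, 0, 1.2, 0.8, 0.0, cuts)
p1 = olr_category_probs(0.5, 1, 1.2, 0.8, 0.0, cuts)
```

With a single cutpoint the model reduces to the binary logistic model above; at the same ability level, group 1 here has a higher expected item score than group 0, which is what uniform DIF looks like for an ordinal item.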

Crane, van Belle, Larson (2004)
Pointed out that the logistic regression model is a re-parameterization of the IRT model as long as IRT-derived θ estimates are used as ability scores. Addressed multiple hypothesis testing for non-uniform DIF and found no difference among four different adjustment techniques.
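The re-parameterization can be checked with a few lines of algebra. The derivation below is our own worked illustration, assuming a dichotomous item, the two-parameter logistic (2PL) IRT model without the 1.7 scaling constant, and an explicit intercept β0 that the models above leave implicit inside f.

```latex
% 2PL IRT model for a dichotomous item (scaling constant D omitted)
P(Y=1 \mid \theta) = \frac{1}{1 + \exp[-a(\theta - b)]}
\;\Longleftrightarrow\;
\operatorname{logit} P(Y=1 \mid \theta) = a\theta - ab = \beta_0 + \beta_1\theta,
\qquad a = \beta_1,\quad b = -\beta_0/\beta_1

% Adding the group and interaction terms from the DIF models
\operatorname{logit} P(Y=1 \mid \theta, \mathrm{group})
  = (\beta_0 + \beta_2\,\mathrm{group}) + (\beta_1 + \beta_3\,\mathrm{group})\,\theta
\;\Rightarrow\;
a_{\mathrm{group}} = \beta_1 + \beta_3\,\mathrm{group},\quad
b_{\mathrm{group}} = -\frac{\beta_0 + \beta_2\,\mathrm{group}}{\beta_1 + \beta_3\,\mathrm{group}}
```

So β2 ≠ 0 with β3 = 0 shifts only the item difficulty between groups (uniform DIF), while β3 ≠ 0 makes the item discrimination differ between groups (non-uniform DIF).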

Crane et al. (2004), part 2
The biggest change concerns the specific criteria for uniform DIF. Recognized that non-uniform and uniform DIF are analogous to effect modification and confounding, respectively, and employed epidemiological thinking about how to detect confounding relationships from the data.

Crane et al. (2004), part 3
The same models are used (though now with θ rather than X):
P(Y=1|θ, group) = f(β1θ + β2*group)
P(Y=1|θ) = f(β1'θ)
Determine the impact of including the group term on the magnitude of the relationship between θ and item responses by computing |(β1 − β1')/β1|. If this is large, uniform DIF (confounding) is present. This approach draws on Maldonado and Greenland's simulation study of confounder selection strategies.
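The uniform-DIF criterion is a change-in-estimate calculation. A minimal sketch; the function names, the 10% threshold, and the coefficient values are hypothetical choices, since the optimal criterion is still an open question.

```python
def uniform_dif_change(beta1_with_group, beta1_without_group):
    """|(b1 - b1') / b1|: b1 from f(b1*theta + b2*group), b1' from f(b1'*theta)."""
    return abs((beta1_with_group - beta1_without_group) / beta1_with_group)


def flag_uniform_dif(beta1_with_group, beta1_without_group, threshold=0.10):
    # threshold is a hypothetical cut-off, not a recommended value
    return uniform_dif_change(beta1_with_group, beta1_without_group) > threshold


# Hypothetical coefficients for two items.
change_a = uniform_dif_change(1.20, 1.05)  # 0.125: a 12.5% change in beta1
change_b = uniform_dif_change(1.20, 1.18)  # about 0.017
```

Unlike a significance test, this criterion does not grow with sample size: it asks whether adding the group term materially changes the θ coefficient, exactly as one would screen for a confounder.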

Work still pending
"Optimal" criteria for uniform and non-uniform DIF are unknown. Should α be adjusted for multiple hypotheses, and if so, for how many? What effect size should be used for non-uniform DIF? In huge data sets we are likely to find a significant interaction term. What proportional change in β1 constitutes significant uniform DIF (UDIF)?

DIFdetect package
Can be downloaded from www.alz.washington.edu/DIFDETECT/welcome.html. A user-friendly, STATA-based package.

Outline revisited
Statistical background
DIFdetect package
What do we do when we find DIF?
DIF adjustments to PARSCALE code
How good are adjusted scores?
Discussion

What to do when we find DIF?
In educational settings, items with DIF are often discarded. That is an unattractive option for us: our tests are already short, so dropping items loses variation and precision. DIF doesn't mean that the item doesn't measure the underlying construct at all, just that it does so differently in different groups.

What do we do (2)
We need a technique that incorporates items found to have DIF differently from DIF-free items. There is precedent for this approach in Reise, Widaman, and Pugh (1993): constrain parameters for DIF-free items to be identical across groups, and estimate parameters for items found to have DIF separately in the appropriate groups.

Compensatory DIF
Compensatory DIF occurs when DIF in some items leads to erroneous findings in other items, producing both false-positive and false-negative DIF findings. We therefore iterate for each covariate until a stable solution is reached (i.e., the same items are identified with DIF on separate runs of DIFdetect).
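A sketch of the iteration loop, under stated assumptions: detect is a hypothetical callback standing in for one DIFdetect run (rescoring ability with the currently flagged items treated as group-specific, then testing every item), and toy_detect is a made-up example in which item5 looks like DIF only until item2 is accounted for. The loop would be run once per covariate.

```python
def iterate_dif(detect, items, max_rounds=20):
    """Repeat DIF detection until two consecutive runs flag the same items.
    detect(items, flagged) must return the set of items flagged on that run.
    Returns (stable_flagged_set, rounds_used)."""
    flagged = frozenset()
    for rounds in range(1, max_rounds + 1):
        new_flagged = frozenset(detect(items, flagged))
        if new_flagged == flagged:
            return set(flagged), rounds
        flagged = new_flagged
    raise RuntimeError("no stable set of DIF items within max_rounds")


def toy_detect(items, flagged):
    # Hypothetical stand-in: item5 is a false positive caused by DIF in item2.
    if "item2" in flagged:
        return {"item2"}
    return {"item2", "item5"}


stable, rounds = iterate_dif(toy_detect, ["item1", "item2", "item5"])
```

In this toy run the first pass flags item2 and item5, the second pass (with both treated as group-specific) flags only item2, and the third pass confirms that set, so the loop stops after three rounds.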

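Adjustments to PARSCALE
Create a new dataset that treats items according to their DIF status. In the example, items 1 and 3 have no DIF and item 2 has DIF. Items 1 and 3 stay as single items shared by all groups; item 2 is split into separate items for Group 1, Group 2, and Group 3, each coded missing for respondents outside that group.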

Modified data set (respondent ID, then one character per item; X = missing):
0001 12XX2
0002 12XX4
0003 01XX3
…
0132 1X2X2
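The virtual-item layout can be sketched in Python. This is not the authors' STATA code; the function name virtual_item_record, the 4-digit zero-padded ID, single-digit responses, and group labels 1 to 3 are assumptions chosen so that the sketch reproduces the data rows shown above.

```python
def virtual_item_record(person_id, group, responses, dif_items, groups):
    """Build one fixed-width record.
    responses: single-digit item responses in item order (None = missing).
    dif_items: 0-based indices of items found to have DIF; each becomes one
    column per group, holding the response for that group and 'X' otherwise.
    DIF-free items stay a single column shared by all groups."""
    out = []
    for i, r in enumerate(responses):
        code = "X" if r is None else str(r)
        if i in dif_items:
            out.extend(code if g == group else "X" for g in groups)
        else:
            out.append(code)
    return f"{person_id:04d} " + "".join(out)


# Item 2 (index 1) has DIF; groups are labelled 1, 2, 3.
row_1 = virtual_item_record(1, 1, [1, 2, 2], {1}, [1, 2, 3])
row_132 = virtual_item_record(132, 2, [1, 2, 2], {1}, [1, 2, 3])
```

Because each group's version of item 2 is missing for everyone else, the IRT software can estimate separate parameters for it in each group while items 1 and 3 keep a single set of parameters, which is the Reise, Widaman, and Pugh style of partial invariance.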

PARSCALE code
We need new lines (new blocks) for all the new items we create. We are automating this step as an extension to DIFdetect; the current best advice is to use a large table in Word. Creating the new items is easy; we have STATA code for creating virtual items.

Preparation of data for PARSCALE

Reminder of PARSCALE tips
When outfiling from STATA, use wide format and commas, and change missing values to .x. Open the file in Word and replace ".x" with X. Remember to change 2-digit numbers to their appropriate letters.
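A sketch of the recoding step, which could stand in for the find-and-replace in Word. It assumes responses have already been read into Python as integers, with missing values as None. The function names, the 4-digit ID, and the mapping 10 to A, 11 to B, and so on (skipping X, which stays reserved for missing) are hypothetical and must match whatever response codes your PARSCALE setup expects.

```python
import string

# Letters for codes 10 and above; "X" is skipped so it stays reserved for missing.
LETTERS = string.ascii_uppercase.replace("X", "")


def parscale_code(value):
    """Encode one response as a single character (hypothetical convention)."""
    if value is None:
        return "X"
    if 0 <= value <= 9:
        return str(value)
    if 10 <= value < 10 + len(LETTERS):
        return LETTERS[value - 10]
    raise ValueError(f"cannot encode response {value!r} as one character")


def encode_row(person_id, values):
    """One fixed-width line: zero-padded ID, a space, then one character per item."""
    return f"{person_id:04d} " + "".join(parscale_code(v) for v in values)


line = encode_row(7, [3, None, 12, 0])
```

Keeping every response to exactly one character means the item columns line up by position, which is why 2-digit codes have to be turned into letters in the first place.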

It gets complicated…
This is the CASI, first run of education DIF, after looking at gender and age:

Table helps with PARSCALE code

Adjusted scores related to dementia and CIND
In the ACT study, controlling for CASI score (continuous), the odds ratio for a low DIF-adjusted IRT score (among those with low CASI scores) was 2.9 (1.8-4.9), adjusted for gender, education, and age. The strict 2-stage sample design leads to verification bias.
In the CSHA, controlling for 3MS score (continuous), the weighted odds ratio for a low DIF-adjusted IRT score was 1.6 (1.1-2.3) for dementia and 1.4 (1.2-1.8) for CIND, adjusted for education and language. Sampling and weighting were used to deal with verification bias.

Incorporation of adjusted scores into analyses
Here we are in novel territory. Is there a reason not to adjust scores for DIF? Questions and comments.

Comparison of OLR with other techniques
OLR is more flexible: it can examine continuous covariates (e.g., education) without dichotomizing or grouping them. DIFdetect is very fast. When IRT-derived θ scores are used, OLR is a re-parameterization of IRT analyses. DIFdetect's OLR incorporates the epidemiological concepts of confounding and effect modification. A special issue of Medical Care edited by Teresi is forthcoming.