You got WHAT on that test? Using SAS PROC LOGISTIC and ODS to identify ethnic group Differential Item Functioning (DIF) in professional certification exam questions.

Similar presentations
Copyright © 2006 Educational Testing Service Listening. Learning. Leading. Using Differential Item Functioning to Investigate the Impact of Accommodations.

DIF Analysis. Galina Larina. March 2012, University of Ostrava.
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
Rebecca Sleeper July  Statistical  Analysis of test taker performance on specific exam items  Qualitative  Evaluation of adherence to optimal.
Item Response Theory in Health Measurement
Introduction to Item Response Theory
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
Simple Logistic Regression
Models for Measuring. What do the models have in common? They are all cases of a general model. How are people responding? What are your intentions in.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
Discrete-Event Simulation: A First Course Steve Park and Larry Leemis College of William and Mary.
Data mining and statistical learning, lecture 5 Outline  Summary of regressions on correlated inputs  Ridge regression  PCR (principal components regression)
Item Analysis Prof. Trevor Gibbs. Item Analysis After you have set your assessment: How can you be sure that the test items are appropriate?—Not too easy.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Aim: How do we use SPSS to create and interpret scatterplots? SPSS Assignment 1 Due Friday 2/12.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Computerized Adaptive Testing: What is it and How Does it Work?
Item Response Theory for Survey Data Analysis EPSY 5245 Michael C. Rodriguez.
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
Kaizen–What Can I Do To Improve My Program? F. Jay Breyer, Ph.D. Presented at the 2005 CLEAR Annual Conference September Phoenix,
Modern Test Theory Item Response Theory (IRT). Limitations of classical test theory An examinee’s ability is defined in terms of a particular test The.
Translation and Cross-Cultural Equivalence of Health Measures.
Is the Force Concept Inventory Biased? Investigating Differential Item Functioning on a Test of Conceptual Learning in Physics Sharon E. Osborn Popp, David.
Chapter 9 Audit Sampling: An Application to Substantive Tests of Account Balances This presentation focuses (like my course) on MUS. It omits the effect.
The ABC’s of Pattern Scoring Dr. Cornelia Orr. Slide 2 Vocabulary Measurement – Psychometrics is a type of measurement Classical test theory Item Response.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
SAS PROC IRT July 20, 2015 RCMAR/EXPORT Methods Seminar 3-4pm Acknowledgements: - Karen L. Spritzer - NCI (1U2-CCA )
Intelligent Systems Laboratory (iLab), Department of Computer Science and Information Engineering, Southern Taiwan University. Evaluation for the Test Quality of Dynamic Question Generation by Particle Swarm Optimization for Adaptive Testing.
April 6 Logistic Regression –Estimating probability based on logistic model –Testing differences among multiple groups –Assumptions for model.
Lab 5: Item Analyses. Quick Notes Load the files for Lab 5 from course website –
Basic Measurement and Statistics in Testing. Outline Central Tendency and Dispersion Standardized Scores Error and Standard Error of Measurement (Sm)
A COMPARISON METHOD OF EQUATING CLASSIC AND ITEM RESPONSE THEORY (IRT): A CASE OF IRANIAN STUDY IN THE UNIVERSITY ENTRANCE EXAM Ali Moghadamzadeh, Keyvan.
Differential Item Functioning. Anatomy of the name DIFFERENTIAL –Differential Calculus? –Comparing two groups ITEM –Focus on ONE item at a time –Not the.
Using the IRT and Many-Facet Rasch Analysis for Test Improvement “ALIGNING TRAINING AND TESTING IN SUPPORT OF INTEROPERABILITY” Desislava Dimitrova, Dimitar.
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning W. Holmes Finch.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Item Factor Analysis Item Response Theory Beaujean Chapter 6.
Reliability performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
Summary of Bayesian Estimation in the Rasch Model H. Swaminathan and J. Gifford Journal of Educational Statistics (1982)
FIT ANALYSIS IN RASCH MODEL University of Ostrava Czech republic 26-31, March, 2012.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
Rating Scale Examples. A helpful resource
Chapter 1 Introduction to Statistics. Section 1.1 Fundamental Statistical Concepts.
Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient- Reported.
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
Evaluating Multi-Item Scales
Item Analysis: Classical and Beyond
Which of these is “a boy”?
III Choosing the Right Method Chapter 10 Assessing Via Tests
His Name Shall Be Revered …
Mohamed Dirir, Norma Sinclair, and Erin Strauts
Analyzing test data using Excel Gerard Seinhorst
Descriptive Statistics
Multitrait Scaling and IRT: Part I
Investigations into Comparability for the PARCC Assessments
Tests are given for 4 primary reasons.
Presentation transcript:

You got WHAT on that test? Using SAS PROC LOGISTIC and ODS to identify ethnic group Differential Item Functioning (DIF) in professional certification exam questions. Steve Grilli, Life Office Management Association

PRESENTATION OUTLINE
Introduce LOMA
Intro to Educational Stats
LOMA's SAS Item Analysis Program
"DIF" Defined
Logistic Regression
LOMA's SAS DIF Identification Program
Conclusions

About LOMA
Founded as an international association of insurance and financial services companies
Located in Atlanta, GA, with local partners around the world
Purpose: to facilitate information sharing, improve company operations and management, and provide industry-specific employee development

LOMA By the Numbers
80+ years of experience
1,200+ members in 80 countries
13 professional education programs
Courses available in 7 languages
100,000+ annual examination enrollments
More than 10,000 conference and meeting attendees each year
1,200 individuals serving on more than 50 LOMA committees

Educational Statistics: "Classical" Item Analysis vs. IRT
Item Response Theory – the three-parameter logistic (3PL) model, with discrimination parameter a, difficulty parameter b, and "pseudo-guessing" parameter c. Used in Computer Adaptive Testing (CAT).
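The 3PL item characteristic curve named above can be written out explicitly (standard IRT notation, not shown on the slide):

```latex
P(X = 1 \mid \theta) \;=\; c + \frac{1 - c}{1 + e^{-a(\theta - b)}}
```

Here a controls the slope (discrimination), b locates the curve on the ability scale (difficulty), and c sets the lower asymptote that low-ability examinees reach by guessing.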

Classical Item Analysis
Biserial correlation between performance on a dichotomous test item (X = 1 if the student got it correct; 0 otherwise) and a continuous variable – the score on the entire exam.
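The biserial coefficient has a closed form. With p the proportion of examinees answering the item correctly, Φ⁻¹ the probit function, and φ the standard normal density:

```latex
r_{\mathrm{bis}} \;=\; \frac{\bar{Y}_1 - \bar{Y}}{s_Y} \cdot \frac{p}{\varphi\!\left(\Phi^{-1}(p)\right)}
```

where Ȳ₁ is the mean exam score of examinees who answered the item correctly, and Ȳ and s_Y are the mean and standard deviation of all exam scores.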

ITEM ANALYSIS – SAS CODE

/* CALCULATE BISERIAL CORRELATIONS FOR AN AREA OF EXAM QUESTIONS */
DATA NEXT;
  SET PXDAT;
  SET ADD;
  SET YI;
  ARRAY P PX1-PX&R;
  ARRAY ZCAL 3 Z1-Z&R;
  ARRAY BISA 3 BISA1-BISA&R;
  ARRAY BIS 3 BIS1-BIS&R;
  ARRAY YI YIMEAN1-YIMEAN&R;
  DO OVER P;
    ZCAL = PROBIT(P);
    BISA = .39894 / EXP((ZCAL*ZCAL)/2);
  END;
  DO OVER BIS;
    BIS = ((YI - YMEAN) / YSTD) * (P / BISA);
  END;
PROC TRANSPOSE DATA=NEXT OUT=BIS PREFIX=BIS;
  VAR BIS1-BIS&R;
RUN;
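The same calculation can be sketched outside SAS. This stdlib-Python version (hypothetical data; population SD used for s_Y) mirrors the PROBIT and normal-ordinate steps of the DATA step above:

```python
from statistics import NormalDist, mean, pstdev

def biserial(item, total):
    """Biserial correlation between a dichotomous item (0/1) and total exam
    scores, mirroring the SAS steps: ZCAL = PROBIT(P), the normal ordinate
    .39894/EXP(Z*Z/2), then ((Y1bar - Ybar)/S_Y) * (P / ordinate)."""
    p = sum(item) / len(item)            # proportion who got the item right
    z = NormalDist().inv_cdf(p)          # ZCAL = PROBIT(P)
    ordinate = NormalDist().pdf(z)       # BISA: N(0,1) density at z
    y1_bar = mean(t for i, t in zip(item, total) if i == 1)
    return (y1_bar - mean(total)) / pstdev(total) * (p / ordinate)

# Hypothetical responses: higher scorers tend to answer the item correctly.
item_right = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
exam_score = [95, 90, 88, 60, 85, 55, 62, 58, 80, 50]
r_bis = biserial(item_right, exam_score)   # ~1.2 here (biserial can exceed 1)
```

A strongly discriminating item yields a large positive value; LOMA's exception report flags items below .200.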

ITEM ANALYSIS – SAS OUTPUT

ITEM ANALYSIS
PAPER EXAMS
COURSE 290   FORM 1265   04M

COURSE: 290   ITEM: 1
               1    2*   OMIT
1,180  UPPER 3RD
1,181  MIDDLE 3RD
1,180  LOWER 3RD
3,541  TOTAL
BISERIAL CORRELATION:        CONFIDENCE:

COURSE: 290   ITEM:
               *    5    6    OMIT
1,180  UPPER 3RD
1,181  MIDDLE 3RD
1,180  LOWER 3RD
3,541  TOTAL
BISERIAL CORRELATION:        CONFIDENCE: 100.0

ITEM ANALYSIS – SAS OUTPUT

PAPER ITEM ANALYSIS EXCEPTION REPORT
COURSE 290   FORM 1265   04M

ERROR CODES
E1: BISERIAL CORRELATION LESS THAN .200
E2: FEWER THAN 50% OF THE UPPER GROUP CHOSE RIGHT ANSWER
E3: 25% OR MORE OF UPPER GROUP CHOSE A SPECIFIC DISTRACTOR
E4: DISCRIMINATION CONFIDENCE LESS THAN 90% (50 OR MORE STUDENTS)
(NOTE: PROBLEM ANSWERS IN PARENTHESES FOR E2 AND E3)

ITEM   PROBLEMS
53     E1 E4
71     E3(1)

DIFFERENTIAL ITEM FUNCTIONING (DIF)
"An item displays DIF if examinees from different groups have differing probabilities or likelihoods of success on the item after conditioning or matching on the ability the item is intended to measure." -- NCME
DIF is a necessary but not a sufficient condition for item bias.
Item bias exists when members of one group are less likely to answer an item correctly because of some aspect of the item or the testing situation that is not relevant to the purpose of the testing.

TYPES OF DIF
Two types of DIF: uniform and non-uniform.
Uniform DIF occurs when one group's advantage is roughly constant across the ability scale.
Non-uniform DIF occurs when the advantage varies at different ability levels; i.e., ability and group membership interact.

DIF DETECTION
Experts recommend the use of logistic regression to detect DIF.
LOMA chose this method for its conceptual clarity, its ability to detect non-uniform DIF, and the ease with which existing SAS software could be employed in its detection.

LOGISTIC REGRESSION
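In general form, logistic regression models the log-odds of a correct response as a linear function of the predictors:

```latex
\ln\!\left(\frac{P(X=1)}{1 - P(X=1)}\right) \;=\; \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```

Because the left side is a log-odds rather than a probability, the fitted probabilities always stay between 0 and 1.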

LOMA DIF LOGISTIC MODEL
Theta is the ability measure (score on the exam)
E is education: 1 if BA or higher; 0 otherwise
G is group membership – generally US vs. China
Theta x G is the interaction term to test for non-uniform DIF
G x E is the interaction of group and education
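Written out as an equation (the coefficient labels are editorial, not from the slide), the model is:

```latex
\ln\!\left(\frac{P(X=1)}{1 - P(X=1)}\right) \;=\; \beta_0 + \beta_1\theta + \beta_2 E + \beta_3 G + \beta_4(\theta \times G) + \beta_5(G \times E)
```

A significant group term (β₃) with no significant interaction signals uniform DIF; a significant θ × G interaction (β₄) signals non-uniform DIF.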

DIF LOGISTIC MODEL: SAS CODE

PROC LOGISTIC DESCENDING;
  ODS OUTPUT TypeIII=MODEL&I GlobalTests=GT&I;
  CLASS EDCODE (PARAM=REF REF='A')
        GRP    (PARAM=REF REF='US');
  MODEL RES&I = GRADE EDCODE GRP GRP*GRADE GRP*EDCODE /
        SELECTION=STEPWISE INCLUDE=1 SLE=.01 SLS=.01 HIER=MULTIPLE;
RUN;
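The stepwise selection above amounts to a series of significance tests between nested logistic models. A stdlib-Python sketch of the likelihood-ratio logic for the uniform-DIF test (simulated data and a toy fitting routine; not the SAS implementation):

```python
import math
import random

def fit_logistic(X, y, steps=800, lr=0.5):
    """Full-batch gradient ascent on the logistic log-likelihood.
    Returns (weights, maximized log-likelihood)."""
    w = [0.0] * len(X[0])
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
        ll += math.log(p if yi else 1.0 - p)
    return w, ll

# Simulated examinees: the item is uniformly harder for the focal group.
random.seed(1)
rows = []
for _ in range(300):
    g = random.randint(0, 1)          # 0 = reference group, 1 = focal group
    theta = random.gauss(0.0, 1.0)    # matching criterion (exam score)
    logit = 0.8 * theta - 1.5 * g     # the -1.5*g term is built-in uniform DIF
    y = 1 if random.random() < 1.0 / (1.0 + math.exp(-logit)) else 0
    rows.append((theta, g, y))

ys = [y for _, _, y in rows]
_, ll_reduced = fit_logistic([[1.0, t] for t, g, _ in rows], ys)     # score only
_, ll_full = fit_logistic([[1.0, t, g] for t, g, _ in rows], ys)     # score + group

# 1-df likelihood-ratio chi-square; > 6.63 corresponds to p < .01, the same
# significance level as the SLE=.01/SLS=.01 criteria in the SAS stepwise model.
lr_chisq = 2.0 * (ll_full - ll_reduced)
```

If `lr_chisq` exceeds the critical value, the group term enters the model and the item is flagged for uniform DIF; the same comparison against a model adding GROUP*SCORE tests for non-uniform DIF.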

SAS DIF PROGRAM: OUTPUT

DIFFERENTIAL ITEM FUNCTIONING REPORT
COURSE  M
REFERENCE GROUP: UNITED STATES
FOCAL GROUP: CHINA

ITEM  MODEL PREDICTORS            DIF TYPE      LR CHI SQ   CONFIDENCE
  1   SCORE                                                 %
  2   SCORE                                                 %
  3   SCORE                                                 %
  4   SCORE, GROUP                UNIFORM                   %
  5   SCORE, GROUP                UNIFORM                   %
  6   SCORE, GROUP                UNIFORM                   %
  7   SCORE, GROUP                UNIFORM                   %
  8   SCORE                                                 %
  9   SCORE, GROUP, GROUP*SCORE   NON-UNIFORM               %

SAS DIF PROGRAM: FILE OUTPUT

[290,M04,17]  1=US v CH -- S -- NONE
[290,M04,18]  1=US v CH -- S -- NONE
[290,M04,19]  1=US v CH -- S, G, G*S -- NON-U
[290,M04,20]  1=US v CH -- S, ED, G -- U (2)

Item: S0420-E (QID=9780)   CR: 4   DiffiEst: 81   Codes: mc; 0, r   TextRef: O&S, c. 11, pp.   Mandatory: Y

Most of an insurer's customers can be characterized as either external or internal. However, some customers have characteristics of both internal and external customers. One example of an insurance customer who has characteristics of both internal and external customers is
(1) a third-party administrator
(2) a policy beneficiary
(3) an individual policyowner
(4) a general agent

[Response-count table by Item, Region, Group, N, options 1-6 (4 keyed), Omit, CRR, with Upper/Middle/Lower 3rd rows]
S0420 /04  All     Bis. 0.466, Conf. 100.0   DIF: US v CH -- S, ED, G -- U (2)
S0420 /04  US/Can  Bis. 0.264, Conf. 100.0   E4
S0420 /04  Int'l   Bis. 0.527

pp. 288, 289

CONCLUSIONS
DIF needs to be monitored, given the increasing globalization of testing.
SAS PROC LOGISTIC and ODS provide a simple and effective means of DIF detection.