Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

Similar presentations
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.

Item Response Theory in Health Measurement
Models for Measuring. What do the models have in common? They are all cases of a general model. How are people responding? What are your intentions in.
Overview of Main Survey Data Analysis and Scaling National Research Coordinators Meeting Madrid, February 2010.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
Latent Change in Discrete Data: Rasch Models
Multivariate Data Analysis Chapter 11 - Structural Equation Modeling.
LECTURE 5 TRUE SCORE THEORY. True Score Theory OBJECTIVES: - know basic model, assumptions - know definition of reliability, relation to TST - be able.
Using Growth Models for Accountability Pete Goldschmidt, Ph.D. Assistant Professor California State University Northridge Senior Researcher National Center.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach Mark D. Reckase Tianli Li Michigan State University.
How to deal with missing data: INTRODUCTION
Modeling Achievement Trajectories When Attrition is Informative Betsy J. Feldman & Sophia Rabe-Hesketh.
Comparison of Reliability Measures under Factor Analysis and Item Response Theory —Ying Cheng, Ke-Hai Yuan, and Cheng Liu Presented by Zhu Jinxin.
Today Concepts underlying inferential statistics
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Measurement, Control, and Stability of Multiple Response Styles Using Reverse Coded Items Eric Tomlinson Daniel Bolt University of Wisconsin-Madison Ideas.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 9. Hypothesis Testing I: The Six Steps of Statistical Inference.
SHOWTIME! STATISTICAL TOOLS IN EVALUATION CORRELATION TECHNIQUE SIMPLE PREDICTION TESTS OF DIFFERENCE.
Introduction to plausible values National Research Coordinators Meeting Madrid, February 2010.
©2006 Prentice Hall Business Publishing, Auditing 11/e, Arens/Beasley/Elder Audit Sampling for Tests of Details of Balances Chapter 17.
Module G Variables Sampling Accounting 4081Module G.
©2010 Prentice Hall Business Publishing, Auditing 13/e, Arens//Elder/Beasley Audit Sampling for Tests of Details of Balances Chapter 17.
©2012 Pearson Education, Auditing 14/e, Arens/Elder/Beasley Audit Sampling for Tests of Details of Balances Chapter 17.
Inference for Regression BPS chapter 23 © 2010 W.H. Freeman and Company.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Inferential Statistics.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 22 Using Inferential Statistics to Test Hypotheses.
Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.
User Study Evaluation Human-Computer Interaction.
Educational Research: Competencies for Analysis and Application, 9th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
Rasch trees: A new method for detecting differential item functioning in the Rasch model Carolin Strobl Julia Kopf Achim Zeileis.
Scales & Indices. Measurement Overview: Using multiple indicators to create variables. Two-step process.
6. Evaluation of measuring tools: validity Psychometrics. 2012/13. Group A (English)
The KOPPITZ-2 A revision of Dr. Elizabeth Koppitz’
SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA,
MEASUREMENT. Measurement: The assignment of numbers to observed phenomena according to certain rules. Rules of Correspondence: Defines measurement in a given.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
© 2006 by The McGraw-Hill Companies, Inc. All rights reserved. 1 Chapter 12 Testing for Relationships Tests of linear relationships –Correlation 2 continuous.
CHAPTER 14 AUDIT SAMPLING FOR TESTS OF DETAIL OF BALANCES.
Chapter 6: Analyzing and Interpreting Quantitative Data
Constructs AKA... Latent variables, Unmeasured variables, Factors, Unobserved variables.
Item Factor Analysis Item Response Theory Beaujean Chapter 6.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Chapter 16: Correlation. So far… We’ve focused on hypothesis testing Is the relationship we observe between x and y in our sample true generally (i.e.
Reliability: performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
Latent regression models. Where does the probability come from? Why isn’t the model deterministic? Each item tests something unique – We are interested.
1 Hester van Eeren Erasmus Medical Centre, Rotterdam Halsteren, August 23, 2010.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
Tutorial I: Missing Value Analysis
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
The accuracy of averages We learned how to make inference from the sample to the population: Counting the percentages. Here we begin to learn how to make.
Lesson 2 Main Test Theories: The Classical Test Theory (CTT)
Analysis…Measures of Central Tendency How can we make SENSE of our research data???
Audit Sampling for Tests of Details of Balances
assessing scale reliability
Classical Test Theory Margaret Wu.
Item Analysis: Classical and Beyond
STATISTICAL TOOLS FOR AUDITING
12 Inferential Analysis.
Assessing Student Learning
National Conference on Student Assessment
15.1 The Role of Statistics in the Research Process
Multitrait Scaling and IRT: Part I
Presentation transcript:

Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models 25/11/2013

Missing responses in competence tests
A. Not-administered items
B. Omitted items
C. Not-reached items
B & C are commonly observed in large-scale tests.
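A minimal sketch (my own illustration, not code from the study) of how B and C are usually told apart in practice: missing entries after a person's last observed response count as not reached, earlier ones as omitted.

```python
import numpy as np

def classify_missing(responses):
    """Split one person's missing entries into omitted vs. not-reached.

    responses : 1-D array in test-booklet order; np.nan marks a missing
    response. The trailing run of missing entries is treated as not
    reached; any missing entry before the last observed response is
    treated as omitted.
    """
    responses = np.asarray(responses, dtype=float)
    missing = np.isnan(responses)
    observed_idx = np.flatnonzero(~missing)
    omitted = np.zeros_like(missing)
    not_reached = np.zeros_like(missing)
    if observed_idx.size == 0:
        not_reached[:] = missing  # nothing answered: count all as not reached
    else:
        last_obs = observed_idx[-1]
        omitted[:last_obs + 1] = missing[:last_obs + 1]
        not_reached[last_obs + 1:] = missing[last_obs + 1:]
    return omitted, not_reached

# Example (1-based item numbering): items 2 and 4 omitted, items 6-7 not reached
om, nr = classify_missing([1, np.nan, 0, np.nan, 1, np.nan, np.nan])
```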

Dealing with missing responses
Classical approaches:
– Simply ignore missing responses
– Score missing responses as incorrect
– Score missing responses as fractionally correct
– Two-stage procedure
Imputation-based approaches:
– Disadvantages in IRT models
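For concreteness, a sketch of the three simplest classical treatments applied to a person-by-item 0/1 response matrix; the function name and the `n_options` default are my own choices, not from the slides.

```python
import numpy as np

def score_missing(X, treatment, n_options=4):
    """Recode missing entries (np.nan) in a person-by-item 0/1 matrix.

    treatment:
      'ignore'     - leave as np.nan; the IRT likelihood then simply
                     skips these person-item pairs
      'incorrect'  - score every missing response as 0
      'fractional' - score as the chance level 1/n_options
    """
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X)
    if treatment == 'incorrect':
        X[miss] = 0.0
    elif treatment == 'fractional':
        X[miss] = 1.0 / n_options
    elif treatment != 'ignore':
        raise ValueError(treatment)
    return X
```

As usually described in the literature, the two-stage procedure combines two of these treatments: item parameters are first estimated with missing responses ignored, and person parameters are then estimated with missing responses scored as incorrect.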

Model-based approaches for nonignorable missing-data mechanisms
– Latent approach for modeling missing responses due to omitted items
– Latent approach for modeling missing responses due to not-reached items
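In the literature (e.g., Holman & Glas, 2005) this latent approach is typically a between-item two-dimensional Rasch-type model: the responses X_pi load on the ability θ_p, the missing indicators D_pi (1 = missing) load on a latent missing propensity ξ_p, and nonignorability enters through the correlation of the two dimensions. A sketch in the Rasch case (my notation):

```latex
P(X_{pi}=1 \mid \theta_p) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)},
\qquad
P(D_{pi}=1 \mid \xi_p) = \frac{\exp(\xi_p - \gamma_i)}{1 + \exp(\xi_p - \gamma_i)},
\qquad
(\theta_p, \xi_p) \sim N_2\!\left(\mathbf{0},
\begin{pmatrix} \sigma_\theta^2 & \rho\,\sigma_\theta\sigma_\xi \\
\rho\,\sigma_\theta\sigma_\xi & \sigma_\xi^2 \end{pmatrix}\right).
```

A nonzero ρ is what makes the missingness nonignorable; separate ξ dimensions can be used for omitted and for not-reached items.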

Manifest approach for missing responses
– Compute a missing indicator
– Regress θ on the manifest variable (latent regression)
Comparison between the two approaches
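A sketch of what the latent regression step typically looks like: the manifest indicator is the person's proportion of missing responses, which enters the population model for θ (my notation):

```latex
\bar{d}_p = \frac{1}{I}\sum_{i=1}^{I} d_{pi},
\qquad
\theta_p = \beta_0 + \beta_1 \bar{d}_p + \varepsilon_p,
\quad \varepsilon_p \sim N(0, \sigma^2).
```

The comparison with the latent approach then comes down to replacing the latent ξ_p with the observed \bar{d}_p, i.e., assuming the manifest indicator captures the missing propensity without error.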

Performance of model-based approaches
Model-based approaches perform better if the corresponding assumptions are met:
– Unbiased estimates
– Higher reliability (defined below)
Is the model assumption plausible?
– A single latent (or manifest) variable for missing responses
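"Higher reliability" is typically judged via the EAP reliability; one common definition (not necessarily the exact one used in this study) is the variance of the EAP estimates relative to that variance plus the mean posterior variance:

```latex
\mathrm{Rel}_{\mathrm{EAP}}
= \frac{\operatorname{Var}\!\big(\hat{\theta}^{\mathrm{EAP}}\big)}
       {\operatorname{Var}\!\big(\hat{\theta}^{\mathrm{EAP}}\big)
        + \overline{\operatorname{Var}(\theta_p \mid \mathbf{x}_p)}}.
```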

Research questions
1. Test the appropriateness of the unidimensionality assumption of the omission indicator.
2. Are the missing responses ignorable? That is, are the model-based approaches needed?
3. Evaluate the performance of the different approaches regarding item and person estimation.

Real data
National Educational Panel Study (NEPS)
Reading (59 items) and mathematics (28 items)
N = 5194
Average missing rate per person (reading and mathematics, respectively):
– Omitted items: 5.37% and 5.15%
– Not-reached items: 13.46% and 1.32%

Analysis
Five approaches (models):
– M1: missing responses scored as incorrect
– M2: two-stage procedure
– M3: ignoring missing responses
– M4: manifest approach
– M5: latent approach

Analysis (cont.)
Four kinds of missing responses:
a) Omitted items only
b) Not-reached items only
c) Composite across both
d) Dealing with both separately (Figure 2)
Two competence domains

Results
Dimensionality of the omission indicators
– Acceptable WMNSQ (weighted mean square) item fit
– Point-biserial correlation between the occurrence of a missing response on an item and the overall missing tendency (see the sketch below)
Amount of ignorability
– A long story…
– Five conclusions
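A sketch of the second check (my own illustration, assuming np.nan marks missing responses): for each item, correlate its 0/1 missing indicator with the person's missing tendency on the remaining items, so an item is not correlated with itself.

```python
import numpy as np

def missing_point_biserials(X):
    """Point-biserial correlation, per item, between that item's 0/1
    missing indicator and each person's missing tendency on the
    remaining items.

    X : person-by-item response matrix; np.nan marks missing responses.
    Items that are missing for everyone or no one yield nan.
    """
    D = np.isnan(X).astype(float)  # person-by-item missing indicators
    n_items = D.shape[1]
    rs = np.empty(n_items)
    for i in range(n_items):
        rest = np.delete(D, i, axis=1).mean(axis=1)  # rest-score missing tendency
        rs[i] = np.corrcoef(D[:, i], rest)[0, 1]     # point-biserial = Pearson here
    return rs
```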

Item parameter estimates

Person parameter estimates

Complete-case simulation
True model: Figure 2b
One condition for omission
– Mean omission rate per item: 3.7%
Two conditions for time limits
– Positive or negative correlation between latent ability and missing propensity
– Mean percentage of not-reached items per person: 13.4% and 12.5%
Two simulation datasets were produced.
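A hedged sketch of a time-limit condition of this kind (the paper's exact generating design differs; the 13% target rate and the exponential link below are my simplifications): draw correlated ability θ and missing propensity ξ, generate Rasch responses, then blank a trailing block of items whose length grows with ξ.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 5000, 30
rho = 0.5  # sign of the theta-xi correlation defines the condition

# Correlated ability (theta) and missing propensity (xi)
cov = [[1.0, rho], [rho, 1.0]]
theta, xi = rng.multivariate_normal([0.0, 0.0], cov, size=n_persons).T

# Rasch responses for all items
beta = np.linspace(-2.0, 2.0, n_items)
prob = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
X = (rng.random((n_persons, n_items)) < prob).astype(float)

# Not-reached items: higher xi -> longer unreached tail;
# exp(xi - 0.5) has mean 1, so the average rate is roughly 13%
tail = np.clip(np.round(n_items * 0.13 * np.exp(xi - 0.5)),
               0, n_items - 1).astype(int)
for p, k in enumerate(tail):
    if k > 0:
        X[p, n_items - k:] = np.nan
```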

Results

Discussion
Model-based approaches successfully account for the nonignorability of the missing responses.
It was found that the missing propensity was not needed to model item responses. (Why not?)
The findings are limited to low-stakes tests.
Is there a general missing propensity across competence domains and time?