FIT ANALYSIS IN RASCH MODEL. University of Ostrava, Czech Republic, 26-31 March 2012.


Two questions relating to fit analysis: How do we assess the fit of the real data to the chosen measurement model? What should we do if the data do not fit the model?

Two approaches to assessing fit: (1) fit statistics based on standardized residuals; (2) chi-square criteria, which assess the closeness of the model and empirical characteristic curves.

Let $a_{ni}$ be the scored response for the interaction of person $n$ ($n = 1, \dots, N$) with item $i$ ($i = 1, \dots, I$); in the dichotomous case it is 1 or 0. Let $P_{ni}$ be the probability of a correct response for person $n$ on item $i$, and let $x_{ni}$ be the standardized residual: $x_{ni} = (a_{ni} - P_{ni}) / \sqrt{P_{ni}(1 - P_{ni})}$.

Properties of the standardized residuals $x_{ni}$: The mean is 0 and the variance is 1: $M(x_{ni}) = 0$, $D(x_{ni}) = 1$. In theory the values range over $(-\infty, +\infty)$; in practice they usually fall between -10 and +10. If $P_{ni} = 0.99$ and $a_{ni} = 0$, then $x_{ni} \approx -9.95$; similarly, if $P_{ni} = 0.01$ and $a_{ni} = 1$, then $x_{ni} \approx 9.95$. Positive values correspond to correct responses: $x_{ni} > 0$ if $a_{ni} = 1$. Negative values correspond to incorrect responses: $x_{ni} < 0$ if $a_{ni} = 0$. The values are assumed to follow the standard normal distribution $N(0, 1)$.
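The two extreme cases quoted above can be checked numerically. The sketch below assumes the usual definition of the standardized residual, $(a_{ni} - P_{ni}) / \sqrt{P_{ni}(1 - P_{ni})}$, which is consistent with the values on the slide; the function name is hypothetical.

```python
import math

def standardized_residual(a, p):
    """Standardized residual (a - p) / sqrt(p * (1 - p)) for a scored response a (0 or 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (a - p) / math.sqrt(p * (1.0 - p))

# Unexpected incorrect response to a very easy item.
x_unexpected_wrong = standardized_residual(0, 0.99)
# Unexpected correct response to a very hard item.
x_unexpected_right = standardized_residual(1, 0.01)

print(round(x_unexpected_wrong, 2), round(x_unexpected_right, 2))
```

Both residuals have magnitude close to 9.95, which is why values near the practical limits of -10 and +10 signal highly unexpected responses.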

Dependence of the standardized residual on the difference $\theta_n - \delta_i$
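The dependence named in this slide title can be written out explicitly. The derivation below is a sketch that assumes the dichotomous Rasch model for $P_{ni}$ and the residual definition $(a_{ni} - P_{ni}) / \sqrt{P_{ni} q_{ni}}$ with $q_{ni} = 1 - P_{ni}$; the original figure is not available in the transcript.

```latex
P_{ni} = \frac{e^{\theta_n - \delta_i}}{1 + e^{\theta_n - \delta_i}},
\qquad
q_{ni} = 1 - P_{ni},
\qquad
\frac{P_{ni}}{q_{ni}} = e^{\theta_n - \delta_i}

x_{ni} = \frac{a_{ni} - P_{ni}}{\sqrt{P_{ni}\, q_{ni}}} =
\begin{cases}
\sqrt{q_{ni} / P_{ni}} = e^{-(\theta_n - \delta_i)/2}, & a_{ni} = 1, \\[4pt]
-\sqrt{P_{ni} / q_{ni}} = -e^{(\theta_n - \delta_i)/2}, & a_{ni} = 0.
\end{cases}
```

Under these assumptions the residual grows exponentially in magnitude as a response becomes more unexpected: a correct answer from a person far below the item difficulty, or an incorrect answer from a person far above it.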

The statistics $x_{ni}$ take both positive and negative values, so they cancel each other when summed, and summing residuals across items or persons is not informative. The solution to this problem is to square the standardized residuals: $y_{ni} = x_{ni}^2$.

Properties of $y_{ni}$: The statistic $y_{ni}$ takes only non-negative values. In theory the values of $y_{ni}$ range over $[0, +\infty)$; in practice they usually range from 0 to 100, and most of them lie in the range (0, 2). The expected value and variance of $y_{ni}$ are $M(y_{ni}) = 1$ and $D(y_{ni}) = 2$. The statistic $y_{ni} = x_{ni}^2$ can be treated as having a $\chi^2$ distribution with one degree of freedom.

Distribution features of the statistics $y_{ni}$: These squared standardized residuals are only approximate chi-squares. The statistic $y_{ni}$ would have an exact $\chi^2$ distribution if the following conditions were met: 1) $a_{ni}$ were a continuous variable (rather than a discrete one); 2) the exact values of the probability $P_{ni}$ were known (in fact only estimates of this probability, based on parameter estimates, are known); 3) the data fit the measurement model.
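A short calculation illustrates why the chi-square description is only approximate for a discrete response. It is a sketch that assumes $a_{ni}$ is Bernoulli with known success probability $P_{ni}$ and $q_{ni} = 1 - P_{ni}$, so that $y_{ni}$ takes the value $q_{ni}/P_{ni}$ with probability $P_{ni}$ and $P_{ni}/q_{ni}$ with probability $q_{ni}$.

```latex
M(y_{ni}) = P_{ni}\,\frac{q_{ni}}{P_{ni}} + q_{ni}\,\frac{P_{ni}}{q_{ni}} = q_{ni} + P_{ni} = 1

D(y_{ni}) = P_{ni}\,\frac{q_{ni}^2}{P_{ni}^2} + q_{ni}\,\frac{P_{ni}^2}{q_{ni}^2} - 1
          = \frac{P_{ni}^3 + q_{ni}^3}{P_{ni}\, q_{ni}} - 1
          = \frac{1}{P_{ni}\, q_{ni}} - 4
```

Under these assumptions the expected value equals 1 exactly, but the variance equals the chi-square value of 2 only when $P_{ni} q_{ni} = 1/6$; it is 0 at $P_{ni} = 0.5$ and grows without bound as $P_{ni}$ approaches 0 or 1.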

Person-fit statistic for an examinee $n$: $\chi_n^2 = \sum_{i=1}^{I} y_{ni}$. This statistic is the sum of the values $y_{ni}$ for the examinee across all items. It has an approximate $\chi^2$ distribution with $df = I$. A problem: each number of degrees of freedom has a different critical value, so no single critical value can be used.

Item-fit statistic for an item $i$: $\chi_i^2 = \sum_{n=1}^{N} y_{ni}$. This statistic is the sum of the values $y_{ni}$ for the item across all examinees. It has an approximate $\chi^2$ distribution with $df = N$. The same problem arises: each number of degrees of freedom has a different critical value, so no single critical value can be used.

A possible solution to the critical value problem: transform the chi-square statistic into a mean square by dividing it by its degrees of freedom (Outfit MNSQ in Winsteps). Person-fit statistic for an examinee $n$: $U_n^{(1)} = \frac{1}{I} \sum_{i=1}^{I} y_{ni}$. Item-fit statistic for an item $i$: $U_i^{(1)} = \frac{1}{N} \sum_{n=1}^{N} y_{ni}$.
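The mean-square transformation can be sketched in a few lines. This is a minimal illustration, not the Winsteps implementation: it assumes dichotomous responses, model probabilities that are already estimated, and degrees of freedom equal to the number of items (for persons) or examinees (for items), as stated on the slides. The function names and the small data set are hypothetical.

```python
def squared_residuals(responses, probs):
    """Squared standardized residuals y = (a - p)^2 / (p * (1 - p)), one per response."""
    return [
        [(a - p) ** 2 / (p * (1.0 - p)) for a, p in zip(row_a, row_p)]
        for row_a, row_p in zip(responses, probs)
    ]

def outfit_mnsq(responses, probs):
    """Unweighted mean squares: one value per person (rows) and one per item (columns)."""
    y = squared_residuals(responses, probs)
    n_persons, n_items = len(y), len(y[0])
    person_fit = [sum(row) / n_items for row in y]
    item_fit = [sum(y[n][i] for n in range(n_persons)) / n_persons
                for i in range(n_items)]
    return person_fit, item_fit

# Hypothetical data: 2 examinees, 3 items.
responses = [[1, 1, 0], [1, 0, 0]]
probs = [[0.8, 0.5, 0.2], [0.6, 0.3, 0.1]]
person_fit, item_fit = outfit_mnsq(responses, probs)
print(person_fit, item_fit)
```

Because every squared residual enters with equal weight, a single highly unexpected response (for example $y_{ni}$ near 99) can dominate the average.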

Properties of the mean-square statistics $U_n^{(1)}$ and $U_i^{(1)}$: The statistics $U_n^{(1)}$ and $U_i^{(1)}$ take values in the range $[0, +\infty)$. Their expected value is 1: $M(U_n^{(1)}) = M(U_i^{(1)}) = 1$. The statistics $U_n^{(1)}$ and $U_i^{(1)}$ are very sensitive to outliers (unexpected correct or incorrect responses).

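To counteract this sensitivity to outliers, weighted versions of the person-fit and item-fit statistics were developed (Infit MNSQ in Winsteps): $U_n^{(2)} = \sum_{i=1}^{I} P_{ni} q_{ni} y_{ni} / \sum_{i=1}^{I} P_{ni} q_{ni}$ and $U_i^{(2)} = \sum_{n=1}^{N} P_{ni} q_{ni} y_{ni} / \sum_{n=1}^{N} P_{ni} q_{ni}$, where $q_{ni} = 1 - P_{ni}$.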

Properties of the weighted fit statistics: Each squared standardized residual $y_{ni}$ is weighted by the variance $D(a_{ni}) = P_{ni} \cdot q_{ni}$, where $q_{ni} = 1 - P_{ni}$, before it is summed. The value $D(a_{ni})$ is smallest for items whose difficulty does not match the ability level of the examinee, so the contribution of these items to the statistics $U_n^{(2)}$ and $U_i^{(2)}$ is reduced. The statistics $U_n^{(2)}$ and $U_i^{(2)}$ take values in the range $[0, +\infty)$ and have an expected value of 1.
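The weighting described above can be sketched as follows. The code assumes the common information-weighted form in which the weighted sum is divided by the sum of the weights $P_{ni} q_{ni}$ so that the expected value stays at 1; since $P_{ni} q_{ni} \, y_{ni} = (a_{ni} - P_{ni})^2$, the numerator reduces to a sum of squared raw residuals. Function names and data are hypothetical.

```python
def infit_mnsq(responses, probs):
    """Information-weighted mean squares per person (rows) and per item (columns)."""
    n_persons, n_items = len(responses), len(responses[0])

    def weighted(cells):
        # Numerator: sum of squared raw residuals; denominator: sum of variances p * q.
        num = sum((a - p) ** 2 for a, p in cells)
        den = sum(p * (1.0 - p) for _, p in cells)
        return num / den

    person_fit = [weighted(list(zip(responses[n], probs[n]))) for n in range(n_persons)]
    item_fit = [weighted([(responses[n][i], probs[n][i]) for n in range(n_persons)])
                for i in range(n_items)]
    return person_fit, item_fit

# Hypothetical data: 2 examinees, 3 items.
responses = [[1, 1, 0], [1, 0, 0]]
probs = [[0.8, 0.5, 0.2], [0.6, 0.3, 0.1]]
infit_person, infit_item = infit_mnsq(responses, probs)
print(infit_person, infit_item)
```

A response with $P_{ni}$ close to 0 or 1 has a small weight, so an isolated surprise on a badly targeted item moves this statistic much less than it moves the unweighted mean square.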

Total fit statistics: The MNSQ statistics $U_n^{(1)}$, $U_n^{(2)}$, $U_i^{(1)}$ and $U_i^{(2)}$ are called total fit statistics. The better the real data fit the Rasch model, the closer the observed values of $U_n^{(1)}$, $U_n^{(2)}$, $U_i^{(1)}$ and $U_i^{(2)}$ are to the expected value of 1. If the real data do not fit the model, the observed values of the MNSQ statistics differ from 1.

The problem with critical values of total fit statistics: Critical values of the MNSQ statistics differ across samples and tests. The distributions of the MNSQ statistics are approximate and, as a rule, the empirical distributions differ from the theoretical ones. So we cannot use a single set of critical values derived from the theoretical distribution of the MNSQ statistics.

Interpretation of total fit statistics values: An item-fit statistic of 1.3 can be interpreted as indicating noise in the item response pattern: there is 30% more variation in the data than the model predicts (underfit). An item-fit statistic of 0.8 can be interpreted as indicating a Guttman pattern: there is 20% less variation in the data than the model predicts (overfit).

Recommendations on the interpretation of fit statistics

Transforming the MNSQ statistics to standardized form (ZSTD in Winsteps): Two kinds of transformation convert the mean square to an approximate t-statistic: the logarithmic transformation and the cube-root transformation.

Properties of standardized fit statistics: The standardized fit statistics $t$ have an approximately standard normal distribution $N(0, 1)$, so common critical values can be used: at the 0.05 significance level the acceptable values lie in the range (-2, +2). Simulation studies have shown that the standardized fit statistics have more consistent distributional properties across varying sample sizes than the MNSQ statistics do.
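The slides do not reproduce the transformation formulas, so the sketch below assumes the widely used Wilson-Hilferty cube-root form, $t = (U^{1/3} - 1)(3/s) + s/3$, where $s$ is the model standard deviation of the mean square. For an unweighted mean square over dichotomous responses, $s$ is derived here from the Bernoulli variance of each squared residual, $1/(P q) - 4$. Function names and input values are hypothetical, and the result should be read as an illustration rather than as the exact Winsteps computation.

```python
import math

def outfit_mnsq_sd(probs):
    """Model SD of an unweighted mean square averaged over len(probs) dichotomous responses."""
    n = len(probs)
    # Variance of each squared standardized residual under the model: 1 / (p * q) - 4.
    var_sum = sum(1.0 / (p * (1.0 - p)) - 4.0 for p in probs)
    return math.sqrt(max(var_sum, 0.0)) / n

def zstd_cube_root(mnsq, sd):
    """Cube-root (Wilson-Hilferty) standardization of a mean square."""
    if sd <= 0.0:
        raise ValueError("sd must be positive")
    return (mnsq ** (1.0 / 3.0) - 1.0) * (3.0 / sd) + sd / 3.0

probs = [0.8, 0.5, 0.2]          # hypothetical model probabilities for one person
sd = outfit_mnsq_sd(probs)
t = zstd_cube_root(1.3, sd)      # standardize a hypothetical mean square of 1.3
within_limits = -2.0 < t < 2.0   # acceptance range at the 0.05 level
print(round(sd, 4), round(t, 4), within_limits)
```

With so few responses the mean square of 1.3 is well inside the (-2, +2) range, which illustrates why the standardized form depends on the amount of data behind the mean square.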

Statistics for item fit analysis: the total item-fit statistic $U_i^{(1)}$ (MNSQ Outfit); the standardized item-fit statistic $t_i^{(1)}$ (ZSTD Outfit); the weighted total item-fit statistic $U_i^{(2)}$ (MNSQ Infit); the standardized weighted item-fit statistic $t_i^{(2)}$ (ZSTD Infit). A combination of item-fit statistics provides the best opportunity to detect poorly fitting items.

Recommended ranges for the total item-fit statistics for different test types

Some reasons for poor item fit (underfit: MNSQ fit statistics greater than 1.2; ZSTD fit statistics greater than +2): The test is not unidimensional. Bad items (mistakes in the keys, bad distractors in MC items, errors in the items, etc.). Particular features of examinee behavior (guessing, carelessness, etc.).

About items that overfit (MNSQ fit statistics less than 0.8; ZSTD fit statistics less than -2): The response patterns of these items are too perfect (Guttman patterns), which does not agree with the probabilistic nature of the model. Such overly perfect patterns can be a consequence of the higher discrimination of these items. A possible reason is a violation of local independence: an examinee's response to one item affects his or her response to another. Such items do not contribute to the measurement of ability.
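The thresholds quoted on the last two slides can be turned into a simple screening rule. The sketch below uses the cut-offs 0.8, 1.2 and ±2 from the slides; requiring the mean square and its standardized form to agree before flagging an item is an assumption made here for illustration, and the function name is hypothetical.

```python
def classify_fit(mnsq, zstd, low=0.8, high=1.2, z_crit=2.0):
    """Label a fit result as 'underfit', 'overfit', or 'acceptable'."""
    if mnsq > high and zstd > z_crit:
        return "underfit"   # more variation than the model predicts
    if mnsq < low and zstd < -z_crit:
        return "overfit"    # less variation than the model predicts
    return "acceptable"

# Hypothetical (MNSQ, ZSTD) pairs for three items.
labels = [classify_fit(1.35, 3.1), classify_fit(0.72, -2.6), classify_fit(1.05, 0.4)]
print(labels)
```

Applying the rule to both the outfit and the infit pair of an item gives a quick way to combine the four statistics listed earlier.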

The second approach to item fit: chi-square statistics with $df = s - 1$, where $s$ is the number of groups into which the sample is divided.
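The formula for this statistic is not preserved in the transcript. The sketch below assumes one common Pearson-type form: examinees are divided into $s$ ability groups, and for each group the observed proportion of correct responses is compared with the proportion expected under the model. The function name, the input layout and the numbers are hypothetical.

```python
def item_chi_square(groups):
    """Pearson-type item-fit chi-square over ability groups.

    groups: list of (group_size, observed_proportion, expected_proportion).
    Returns the statistic and its degrees of freedom, s - 1.
    """
    chi2 = 0.0
    for n, observed, expected in groups:
        chi2 += n * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return chi2, len(groups) - 1

# Hypothetical item: three ability groups of 100 examinees each.
groups = [(100, 0.30, 0.25), (100, 0.55, 0.50), (100, 0.70, 0.75)]
chi2, df = item_chi_square(groups)
print(round(chi2, 4), df)
```

The result depends on how many groups are formed, which is the practical difficulty raised on the next slide.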

Problems with using the chi-square approach: Into how many groups should the sample be divided: 3, 5, 10, or more? In many testing situations this statistic has low efficiency. In Rasch measurement, preference is given to the item-fit statistics described above. In addition, the Winsteps software produces confidence intervals for the ICC based on the chi-square approach.

Example of fit analysis: test description. The test contains 50 items divided into three parts: part 1 has 32 MC items with 4 options (A1-A32); part 2 has 12 open-ended items with a short answer (B1-B12); part 3 has 6 items with a free constructed response (C1-C6). Most of the items were scored dichotomously, and a few were scored polytomously. The total sample size is 655.

Item statistics

ICC of a poorly fitting item: the discriminating power of this item is lower than that of the other items.

ICC of a good item

ICCs of the first 9 items (A1-A9)

ICC of an item with two fit statistics below the lower critical value: the discriminating power of this item is higher than that of the other items.

Statistics for person fit analysis: the total person-fit statistic $U_n^{(1)}$ (MNSQ Outfit); the standardized person-fit statistic $t_n^{(1)}$ (ZSTD Outfit); the weighted total person-fit statistic $U_n^{(2)}$ (MNSQ Infit); the standardized weighted person-fit statistic $t_n^{(2)}$ (ZSTD Infit). A combination of person-fit statistics provides the best opportunity to detect poorly fitting examinees.

Some reasons for poor person fit: bad items; personal features of the examinee; gaps in the examinee's knowledge; violation of the test conditions.

Example of person analysis

Analysis of examinee profiles