IRT Equating Kolen & Brennan, 2004. IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person.

Slides:



Advertisements
Similar presentations
The effect of differential item functioning in anchor items on population invariance of equating Anne Corinne Huggins University of Florida.
Advertisements

Estimation of Means and Proportions
Properties of Least Squares Regression Coefficients
Estimation  Samples are collected to estimate characteristics of the population of particular interest. Parameter – numerical characteristic of the population.
A Method for Estimating the Correlations Between Observed and IRT Latent Variables or Between Pairs of IRT Latent Variables Alan Nicewander Pacific Metrics.
Sampling Distributions
Behavioural Science II Week 1, Semester 2, 2002
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Intro to Statistics for the Behavioral Sciences PSYC 1900
Ch. 6 The Normal Distribution
k r Factorial Designs with Replications r replications of 2 k Experiments –2 k r observations. –Allows estimation of experimental errors Model:
PROBABILITY AND SAMPLES: THE DISTRIBUTION OF SAMPLE MEANS.
7-2 Estimating a Population Proportion
C82MCP Diploma Statistics School of Psychology University of Nottingham 1 Overview Central Limit Theorem The Normal Distribution The Standardised Normal.
Chapter 9 Hypothesis Testing II. Chapter Outline  Introduction  Hypothesis Testing with Sample Means (Large Samples)  Hypothesis Testing with Sample.
Chapter 9 Hypothesis Testing II. Chapter Outline  Introduction  Hypothesis Testing with Sample Means (Large Samples)  Hypothesis Testing with Sample.
EPSY 8223: Test Score Equating
Choosing Statistical Procedures
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
AM Recitation 2/10/11.
Linking & Equating Psych 818 DeShon. Why needed? ● Large-scale testing programs often require multiple forms to maintain test security over time or to.
Adventures in Equating Land: Facing the Intra-Individual Consistency Index Monster * *Louis Roussos retains all rights to the title.
The Probability of a Type II Error and the Power of the Test
RMTD 404 Lecture 8. 2 Power Recall what you learned about statistical errors in Chapter 4: Type I Error: Finding a difference when there is no true difference.
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
Chapter 7 Estimates and Sample Sizes
Chapter 9 Hypothesis Testing II: two samples Test of significance for sample means (large samples) The difference between “statistical significance” and.
Lecture 7 Introduction to Hypothesis Testing. Lecture Goals After completing this lecture, you should be able to: Formulate null and alternative hypotheses.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
Maximum Likelihood Estimator of Proportion Let {s 1,s 2,…,s n } be a set of independent outcomes from a Bernoulli experiment with unknown probability.
Estimating a Population Proportion
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
1 Conceptual Issues in Observed-Score Equating Wim J. van der Linden CTB/McGraw-Hill.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Lecture 2 Review Probabilities Probability Distributions Normal probability distributions Sampling distributions and estimation.
Statistical Inference for the Mean Objectives: (Chapter 9, DeCoursey) -To understand the terms: Null Hypothesis, Rejection Region, and Type I and II errors.
Estimation of a Population Mean
1 EPSY 546: LECTURE 1 SUMMARY George Karabatsos. 2 REVIEW.
Scaling and Equating Joe Willhoft Assistant Superintendent of Assessment and Student Information Yoonsun Lee Director of Assessment and Psychometrics Office.
Chapter 7 Point Estimation of Parameters. Learning Objectives Explain the general concepts of estimating Explain important properties of point estimators.
University of Ostrava Czech republic 26-31, March, 2012.
1 LINKING AND EQUATING OF TEST SCORES Hariharan Swaminathan University of Connecticut.
Estimation. The Model Probability The Model for N Items — 1 The vector probability takes this form if we assume independence.
1 Sampling Distribution of Arithmetic Mean Dr. T. T. Kachwala.
5.5.3 Means and Variances for Linear Combinations
Education 793 Class Notes Inference and Hypothesis Testing Using the Normal Distribution 8 October 2003.
Handout Six: Sample Size, Effect Size, Power, and Assumptions of ANOVA EPSE 592 Experimental Designs and Analysis in Educational Research Instructor: Dr.
LECTURE 14 NORMS, SCORES, AND EQUATING EPSY 625. NORMS Norm: sample of population Intent: representative of population Reality: hope to mirror population.
Chapter 8 Estimation ©. Estimator and Estimate estimator estimate An estimator of a population parameter is a random variable that depends on the sample.
5. Evaluation of measuring tools: reliability Psychometrics. 2011/12. Group A (English)
IRT Equating Kolen & Brennan, 2004 & 2014 EPSY
Nonequivalent Groups: Linear Methods Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2 nd ed.). New.
Sampling Distributions
Vertical Scaling in Value-Added Models for Student Learning
Booklet Design and Equating
Hypothesis Testing: Hypotheses
Chapter 9 Hypothesis Testing.
Evaluation of measuring tools: reliability
Chapter 7 Sampling Distributions.
Sampling Distribution
Sampling Distribution
Discrete Event Simulation - 4
Hypothesis Testing.
Chapter 7 Sampling Distributions.
Chapter 7 Sampling Distributions.
Inference on the Mean of a Population -Variance Known
Lecture Slides Elementary Statistics Twelfth Edition
Last Update 12th May 2011 SESSION 41 & 42 Hypothesis Testing.
Chapter 7 Sampling Distributions.
Presentation transcript:

IRT Equating Kolen & Brennan, 2004

IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person abilities independent of the particular items (sample invariance) When forms are scaled similarly, a person would obtain the same ability estimate regardless of the specific form taken

IRT Equating Because of the sample invariance properties of IRT, equating test forms is simply a process of placing item parameters on the same scale Samples of 2000 seem adequate while 3000 is preferred for calibrating 3-PL models – 3000 is more than needed typically for linear equating – 3000 is not enough for most equipercentile methods

IRT Scale Transformation With nonequivalent groups, parameters from different tests need to be placed on the same scale. If an IRT model fits the data, any linear transformation of the  -scale also fits the data. A linear equation can be used to convert IRT parameter estimates to the same scale. For 2 groups on a common scale, the Ms and SDs will differ.

Item Parameters Scale I Items – a Ij – b Ij – c Ij Person –  Ii Scale J Items – a Jj – b Jj – c Jj Person –  Ji

Transformation Scale I and Scale J are 3-PL IRT scales that differ by a linear transformation  -values for two scales are related as:  Ji = A  Ii + B

Transformation Equations Item parameters on the 2 scales are related Note that the lower asymptote is independent of the scale transformation; because the c parameter is estimated from the probability metric and not the ability metric, no transformation is needed.

A and B Constants

Example (Table 6.1)  Ji = A  Ii + B =.5(-2.00) + (-.5) = -1.5

Expressing A & B by Groups It is more typical to express these relations in terms of groups of items or people The means and SDs are based on groups of items or persons

Example (Table 6.1)

In Practice In nonequivalent group designs, a common- item set (anchor items) is used on each form. Parameter estimates from common items are used to find the scaling constants Conversion formulas using  can only be used when the same population is used to estimate parameters on both scales

Estimating Parameters: Random Groups Equating Design Groups are assumed to be randomly equivalent If the same IRT scaling convention is used (e.g., M = 0, SD = 1), parameter estimates for the two forms are on the same scale No further transformation is necessary

Estimating Parameters: Single Group Design with Counterbalancing The parameters for all examinees on each form can be estimated simultaneously.

Estimating Parameters: Nonequivalent Group Anchor Test Design Item parameters estimated on common items can be used to find the transformation constants Alternatively, parameters on Forms X and Y can be estimated simultaneously through concurrent calibration – considering unique items on each form as “not administered”

Mean/Sigma Transformation Uses the means and SDs of the b-parameters from the common items The values of A and B are then substituted into the rescaling equations.  Ji = A  Ii + B, and those for the item parameters

Equating True Scores The equating is complete if we report ability estimates;  s are the same (within measurement error) when estimated from item parameters on the same scale. If scaled scores are reported,  can be used to estimate true scores on the two forms which can be used as equated scores through the TCC. However, we equate observed scores, not true scores; studies suggest similar results

Equating True Scores 1.  s are transformed to true scores on both Form X and Y 2.Create conversion of true scores to the reporting scale (using observed scores) New Form Observed Score   i  Old Form Observed Score  Scaled Score No justification exists for using observed scores.

Equating Observed Scores The IRT model is used to produce an estimated distribution of observed scores on each form Observed score distributions are equated through equipercentile methods