Investigating item difficulty change by item positions under the Rasch model. Luc Le & Van Nguyen, 17th International Meeting of the Psychometric Society.

Investigating item difficulty change by item positions under the Rasch model
Luc Le & Van Nguyen
17th International Meeting of the Psychometric Society, Hong Kong, July 2011

Research Rationale
IRT item parameters can vary with item context, content, format, position, instruction, and sample size.
Effects of different item positions on common-item equating have been reported for:
California Achievement Test (CAT; Yen, 1980)
Graduate Record Examination (GRE; Kingston & Dorans, 1982)
NAEP reading (Zwick, 1991)
ACT math and reading (Pommerich & Harris, 2003)
PISA science (Le, 2009)

Study Questions
How does an item's difficulty change when its position in the test changes?
Does gender affect this relationship?
Do ability levels affect this relationship?

Study Method
Data: Graduate Skills Assessment (GSA) administered in Colombia in 2010.
78 multiple-choice items in three domains of generic skills: Problem Solving (PS), Critical Thinking (CT), and Interpersonal Understandings (IP).
26 items in each domain (1 CT item was removed).
8 test forms in a complete rotation design; each item appears in 6 different positions.
8,000 Colombian university students (50% male and 50% female) were randomly assigned to each test form.
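The slides do not spell out the booklet rotation itself; as a purely illustrative sketch (Python, with toy values such as ITEMS and N_FORMS that are not from the GSA), one simple way to rotate a fixed item sequence so that every item occupies different positions across forms is to shift it by a form-specific offset:

```python
# Toy rotation: 26 items, 8 forms. Form f presents the item sequence
# shifted by a form-specific offset, so each item sits at a different
# position in different forms. Illustration only, not the GSA design.
ITEMS = [f"Q{i:02d}" for i in range(1, 27)]
N_FORMS = 8

forms = {f: ITEMS[f * 3:] + ITEMS[:f * 3] for f in range(N_FORMS)}

# Positions occupied by the first item across the eight forms
positions = {f: forms[f].index("Q01") + 1 for f in range(N_FORMS)}
print(positions)  # {0: 1, 1: 24, 2: 21, 3: 18, 4: 15, 5: 12, 6: 9, 7: 6}
```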

Study Method: Analysis design
Step 1: Randomly select 1,000 candidates from each test form.
Step 2: Calibrate the items in each domain with a three-faceted Rasch model (adjusting for test form).
Step 3: For each item, examine the difference in difficulty estimates between each pair of forms in relation to the difference in item position (a sketch follows below).
Step 4 (gender effect): Repeat Steps 1-3 separately for males and females.
Step 5 (ability-level effect): Repeat Steps 1-3 separately for lower- and higher-ability groups.
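A rough sketch of Step 3 (Python; item_pos, item_diff and the values inside them are hypothetical placeholders, not the study's data or the authors' code): for every item, pair up the forms in which it appears, compute the difficulty and position differences, and correlate the two (cf. Figure 7).

```python
import statistics
from itertools import combinations

# Hypothetical inputs:
#   item_pos[item][form]  -> position of the item in that form
#   item_diff[item][form] -> form-specific difficulty estimate (logits)
item_pos = {"PS01": {1: 5, 2: 23, 3: 41}, "PS02": {1: 12, 2: 30, 4: 48}}
item_diff = {"PS01": {1: -0.30, 2: -0.18, 3: -0.05},
             "PS02": {1: 0.42, 2: 0.55, 4: 0.61}}

pairs = []  # one record per (item, pair of forms)
for item, positions in item_pos.items():
    for f1, f2 in combinations(sorted(positions), 2):
        pos_change = positions[f2] - positions[f1]
        diff_change = item_diff[item][f2] - item_diff[item][f1]
        pairs.append((item, f1, f2, pos_change, diff_change))

# Correlation between position change and difficulty change (cf. Figure 7)
pos = [p[3] for p in pairs]
dif = [p[4] for p in pairs]
r = statistics.correlation(pos, dif)  # Pearson r, Python 3.10+
print(f"{len(pairs)} form pairs, r = {r:.2f}")
```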

Three-faceted Rasch model

$$P(x_{nij} = 1) = \frac{\exp(\theta_n - \delta_i - \lambda_j)}{1 + \exp(\theta_n - \delta_i - \lambda_j)}, \qquad x_{nij} = 0, 1$$

where
$\delta_i$: difficulty parameter of item $i$
$\delta_{ij} = \delta_i + \lambda_j$: difficulty parameter of item $i$ in form $j$
$\lambda_j$: difficulty of test form $j$
$x_{nij}$: response (score) of examinee $n$ to item $i$ in form $j$
$\theta_n$: examinee ability
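A minimal numerical sketch of the model (Python; the parameter values below are invented for illustration): the log-odds of a correct response is the examinee's ability minus the combined item and form difficulty.

```python
import math

def p_correct(theta, delta_i, lambda_j):
    """P(x = 1) under the three-faceted Rasch model:
    log-odds = ability - (item difficulty + form difficulty)."""
    logit = theta - (delta_i + lambda_j)
    return 1.0 / (1.0 + math.exp(-logit))

# An average examinee (theta = 0) on an item of difficulty 0.2 logits
# in a form that is 0.05 logits harder than average (illustrative values).
print(round(p_correct(theta=0.0, delta_i=0.2, lambda_j=0.05), 3))  # 0.438
```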

Results

Table 1. Test form difficulty estimates and standard errors (in logits), by domain (PS, CT, IP), for the eight test forms. All estimated form difficulties were small, within roughly 0.1 logits of zero, indicating that the rotated forms were of similar overall difficulty.

Figure 1. Mean of item difficulty estimates by item position order – PS items (item positions 11, 29, 32, 49, 51, 69)

Figure 2. Mean of item difficulty estimates by item position order – CT items (item positions 11, 29, 32, 48, 51, 70)

Figure 3. Mean of item difficulty estimates by item position order – IP items (item positions 10, 27, 30, 47, 50, 68)

Table 2. Frequency of item position changes between pairs of test forms, by size of the position difference, for PS, CT and IP items (in total: 728 pairs for PS, 700 for CT, 728 for IP).

Figure 4. Mean of item difficulty difference by item position change – PS items

Figure 5. Mean of item difficulty difference by item position change – CT items

Figure 6. Mean of item difficulty difference by item position change – IP items

Table 4. Substantial differences in item difficulty (0.3 logits or more)

                 PS               CT               IP
            Pairs     %      Pairs     %      Pairs     %
Easier          3   0.4          1   0.1          5   0.7
Harder        173  23.8         86  12.3        109  15.0
None          552  75.8        613  87.6        614  84.3
Total         728 100.0        700 100.0        728 100.0
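A small sketch of how such a classification could be derived (Python; the `pairs` records and their values are hypothetical examples, only the 0.3-logit cut-off comes from the table):

```python
from collections import Counter

THRESHOLD = 0.3  # logits, the cut-off used in Table 4

# Hypothetical (item, form1, form2, position change, difficulty change) records
pairs = [
    ("PS01", 1, 2, 18, 0.12),
    ("PS01", 1, 3, 36, 0.41),
    ("PS02", 2, 4, 18, -0.35),
]

def classify(diff_change):
    """Label a form pair by how the item's difficulty changed between forms."""
    if diff_change <= -THRESHOLD:
        return "Easier"
    if diff_change >= THRESHOLD:
        return "Harder"
    return "None"

counts = Counter(classify(d) for *_, d in pairs)
for label in ("Easier", "Harder", "None"):
    print(f"{label:6s} {counts[label]:3d} ({100 * counts[label] / len(pairs):.1f}%)")
```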

Figure 7. Correlation between difference of item difficulty estimates and item position change

Summary
Items tended to become more difficult when located towards the end of the test.
The positive relationship between item difficulty difference and position change differed across item domains.
The relationship was stronger for males than for females.
The relationship differed between lower- and higher-ability groups.

Application
The findings call for caution in test linking designs and common-item equating procedures.
In horizontal equating, common items from different test forms should be placed in similar test positions.
In vertical equating, both item positions and the different ability levels should be considered; a simple solution is to place the common items at the beginning of the test.

Further study
Which kinds of items (in terms of item characteristics) are most vulnerable to changes in item position?

Thank you