PARCC Field Test Study: Comparability of High School Mathematics End-of-Course Assessments
National Conference on Student Assessment, San Diego, June 2015
Overview
PARCC field test
EOC study design
Statistical analysis
SME review of item maps
Introduction
To assist states in aligning instruction to the CCSSM, model course pathways were developed for high school mathematics, with standards organized into two sequences of coursework designed to lead to college and career readiness and to prepare students for study in more advanced mathematics courses.
– The Traditional pathway is based on the organization of high school coursework typically seen in the United States.
o It includes two algebra courses and a geometry course, with some data, probability and statistics included in each course. (Algebra 1, Geometry, Algebra 2)
– The Integrated pathway provides a more integrated approach to secondary mathematics that is less common in the United States, but typical internationally.
o It includes a sequence of three courses, each of which includes number, algebra, geometry, probability and statistics. (Integrated Mathematics 1, 2, 3)
Study Overview
The HS EOC comparability study was designed to address the following research questions:
1. What degree of comparability (e.g., linked or concorded) can be achieved between the assessments of the two course sequences? Can the comparability be achieved at the course level or only at the aggregate level?
2. How do the psychometric properties of items that are used in assessments of both course sequences compare? More specifically, can a single calibration suffice for an item used in both course sequences, or must an item be separately calibrated for use in each?
Overview of Field Test Design
To the extent possible, the Field Test was designed to reflect future operational administrations
– 2 separate administrations: PBA in March, EOY in April
– Dual mode administration
– PBA and EOY field test forms constructed to full operational test blueprints and requirements
FT data collection design
– 2 conditions: 1) Full summative (FS, PBA+EOY), 2) PBA or EOY but not both
– Linking through common items across forms and conditions, and randomly equivalent groups
– Oversampling to reach target sample size of 1,200 valid cases per form
– Initial design of 6 FS forms per test title for scoring/scaling and research studies; modified in response to recruitment challenges
EOC Study Data Collection
Primary FT data (CBT as per RFP)
– Traditional and Integrated forms with common items
o Original design had 6 Condition 1 (FS, PBA & EOY) forms for each EOC
– Number of forms reduced for all EOCs, with greater reduction and redistribution for Integrated
o Linkage across same-level courses (Alg1/Math1, Geometry/Math2, Alg2/Math3), and diagonally as per PARCC frameworks
– For each EOC
o Sample recruitment challenges, sought volunteers
o Target of 1,200 valid cases per form not met despite forms reduction; persistent gaps for Integrated Math
Data Status – Traditional Math
[Table: for Algebra 1, Geometry, and Algebra 2 under Cond 1 FS (PBA+EOY), separate PBA and EOY panels each list Form, Valid Cases, Number of core items per Form, and Possible score points per Form. Cell values were not preserved.]
Data Status – Integrated Math
[Table: for Integrated Math 1, Integrated Math 2, and Integrated Math 3 under Cond 1 FS (PBA+EOY), Cond 2 PBA, and Cond 2 EOY, separate PBA and EOY panels each list Form, Valid Cases, Number of core items per Form, and Possible score points per Form. Cell values were not preserved.]
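Core Items, N Common (Points)
Type of Link: Pathway. Each entry is the number of common items, with score points in parentheses, between Traditional Condition 1 and the Integrated forms named.
ALG1 / IM1
– PBA: Integrated Cond 1 6(15), Integrated Cond 2 4(9), Total 10(24)
– EOY: Integrated Cond 1 7(10), Integrated Cond 2 18(23), Total 25(33)
GEOM / IM2
– PBA: 1(3)
– EOY: Integrated Cond 1 3(4), Integrated Cond 2 8(9), Total 11(13)
ALG2 / IM3
– PBA: Integrated Cond 1 2(7), Integrated Cond 2 2(2), Total 4(9)
– EOY: Integrated Cond 1 8(10), Integrated Cond 2 16(26), Total 24(36)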
Analysis Plan
Classical item analysis – cross-sequence examination of relative item difficulties
Cross-sequence DIF
Comparative analyses of factor structure
Cross-sequence linking
– Separate calibrations (1PL), linking with mean-mean procedure
Item maps
– For examination of consistency of item difficulties
– For examination of consistency of meaning of scores at key points with respect to KSAs
Item Difficulty for Common Items
Calculate summary statistics of item difficulties (p-values) for common items administered in each pathway
Convert common item p-values to z-scores and plot to examine the consistency of relative difficulty across the pathways
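The z-value comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how p-values were converted to z-scores, so the sketch assumes an inverse-normal transformation (standardizing the p-values within each group would be another reading). The function names and the p-values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import NormalDist, mean


def p_to_z(p_values):
    """Convert proportion-correct p-values (strictly between 0 and 1)
    to normal deviates. The sign is flipped so that harder items
    (lower p) receive larger z-values. ASSUMPTION: inverse-normal
    transformation; the source does not specify the conversion."""
    std_normal = NormalDist()
    return [-std_normal.inv_cdf(p) for p in p_values]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical p-values for five common items (NOT study data).
traditional_p = [0.82, 0.64, 0.45, 0.30, 0.71]
integrated_p = [0.78, 0.60, 0.41, 0.33, 0.65]

traditional_z = p_to_z(traditional_p)
integrated_z = p_to_z(integrated_p)

# A high correlation means the common items keep roughly the same
# relative difficulty in both course populations.
r = pearson(traditional_z, integrated_z)
print(f"correlation of common-item z-values: {r:.3f}")
```

Plotting traditional_z against integrated_z gives the kind of z-value plot shown on the following slides; items far from the main trend are candidates for exclusion from the linking set.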
Z-Value Plot: PBA Algebra 1 vs. Mathematics 1 12
Z-Value Plot: EOY Algebra 1 vs. Mathematics 1 13
Z-Value Plot: EOY Geometry vs. Mathematics 2 14
Z-Value Plot: EOY Algebra 2 vs. Mathematics 3 15
Z-Score Summary
Algebra 1, Mathematics 1: Correlations indicate consistency of common item relative difficulty in the two EOC populations, at levels considered sufficient to support linking
Geometry, Mathematics 2: Lower correlation, typically considered insufficient for linking
Algebra 2, Mathematics 3: Correlation at level considered sufficient to support linking
Students per EOC Test Item
Separate Calibrations, Linking
For dichotomous items, the 1PL model (Rasch)
For polytomous items, the one-parameter partial credit (1PPC) model
After separate calibrations, examined correlations of item difficulty parameter estimates for the EOC pair common items.
Item parameter estimates for each EOC course pair were placed on the same scale using the common item linking mean-mean procedure.
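A minimal sketch of mean-mean linking for separately calibrated Rasch difficulties follows. Under the 1PL model the slope is fixed, so the linking reduces to a single additive constant: the difference between the mean common-item difficulties on the two scales. The item identifiers, difficulty values, and function names are hypothetical, and treating Algebra 1 as the reference scale is an assumption made only for the example.

```python
from statistics import mean


def mean_mean_shift(b_ref_common, b_new_common):
    """Additive linking constant that moves the new scale onto the
    reference scale, computed from common-item difficulties only."""
    return mean(b_ref_common) - mean(b_new_common)


def apply_shift(b_by_item, shift):
    """Apply the linking constant to every item on the new scale."""
    return {item: b + shift for item, b in b_by_item.items()}


# Hypothetical difficulty estimates from two separate calibrations
# (NOT study data). C1-C3 are common items; A1 and M1 are unique.
algebra1_b = {"C1": -0.50, "C2": 0.20, "C3": 1.10, "A1": 0.40}
math1_b = {"C1": -0.20, "C2": 0.55, "C3": 1.35, "M1": -0.75}
common = ["C1", "C2", "C3"]

shift = mean_mean_shift(
    [algebra1_b[i] for i in common],
    [math1_b[i] for i in common],
)
math1_linked = apply_shift(math1_b, shift)

print(f"linking constant: {shift:.3f}")
for item, b in math1_linked.items():
    print(item, round(b, 3))
```

After the shift, the common items have the same mean difficulty on both scales, and the unique Mathematics 1 items can be read on the Algebra 1 scale.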
Correlations of Common Item Difficulty Parameter Estimates
Algebra 1 with Mathematics 1: .92
Algebra 2 with Mathematics 3: .92
Geometry with Mathematics 2: .84
Item Maps
Item maps for each course included both course-specific items and common items, separately identified.
The common items provide the vehicle for aligning the items from the two courses.
The criterion for locating an item on the map is a specified response probability (RP67)
- Metric: Scale score=(RP67 theta * 100)
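For a dichotomous Rasch item, the RP67 location is the ability at which the probability of a correct response is 0.67. A hedged sketch follows; it assumes the plain logistic form of the Rasch model with no scaling constant, covers dichotomous items only (not the partial credit case), and applies the metric exactly as written on the slide. The function names are hypothetical.

```python
import math

RP = 0.67  # response probability criterion named on the slide


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model
    (ASSUMPTION: logistic form with no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rp_theta(b, rp=RP):
    """Ability at which an item of difficulty b is answered correctly
    with probability rp: solve rasch_prob(theta, b) = rp for theta."""
    return b + math.log(rp / (1.0 - rp))


def map_location(b, rp=RP):
    """Item map location using the metric as written on the slide:
    RP67 theta multiplied by 100."""
    return rp_theta(b, rp) * 100


# Hypothetical difficulty value (NOT study data).
b = 0.5
theta_67 = rp_theta(b)
print(f"RP67 theta: {theta_67:.3f}")
print(f"check probability: {rasch_prob(theta_67, b):.2f}")
print(f"map location: {map_location(b):.1f}")
```

Because log(0.67 / 0.33) is positive, every item sits on the map above its own difficulty parameter, which is why an RP67 map places items higher than a map based on a 50% response probability would.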
Item Map: Algebra 1 vs. Int. Math 1 21
Expert Review (Subject Matter Experts) Rating Scale
Question: Does obtaining a Score of X (showing what a student knows and can do in terms of item content) for Test I match what it means to obtain a Score of X in Test II?
Responses:
1 Yes, very much so
2 For the most part, but there are some differences
3 Somewhat, but weakly
4 No, not at all
Experts Review: First Set of Ratings
Interpret the meaning of scores at key points on the scale in terms of the KSAs represented by the distribution of items in the vicinity of the score.
Key scale scores: 550, 650, 750
Review items located near the 3 scale points and interpret performance on the two tests.
All items and item-specific information were provided.
Side-by-side comparison of maps for designated Traditional-Integrated EOC pairs
Compare the distribution of items on each item map
Examine pattern of common item performance across EOCs, and relative to unique items within
Rating Tasks
Provide ratings at values of 550, 650, 750, and Overall for each of the following:
Course level
– Algebra 1 / Mathematics 1
– Geometry / Mathematics 2
– Algebra 2 / Mathematics 3
Aggregate level (end of 3-course sequence)
– Traditional Sequence / Integrated Sequence
Item Map: Geometry vs. Int. Math 2 25
Item Map: Algebra 2 vs. Int. Math 3 26
Ratings following Session 1 Algebra 1 with Mathematics 1 27
Ratings following Session 1 Geometry with Mathematics 2
Group Discussion of Item Maps/Ratings
SMEs discussed results and were given the opportunity to change ratings during the 2nd meeting
Second ratings for Algebra 1 / Mathematics 1 indicated less comparability than initial ratings
Second ratings for Traditional Pathway with Integrated Pathway indicated more comparability than initial ratings
Item Mapping Summary
Algebra 1 with Mathematics 1
– Responses were close to evenly distributed among ratings of 1 to 3
Algebra 2 with Mathematics 3
– Modal response was (2) For the most part
– 87.5% of responses were either 1 or 2
Geometry with Mathematics 2
– Modal response was that the math skills were not comparable.
– 67% of responses were either (3) Somewhat, but weakly or (4) No, not at all
Aggregate level
– Majority of the responses were (2) For the most part, but there are some differences
Limitations
Results from field test data do not always translate directly to operational administration results.
The small sample sizes, especially for the Integrated Mathematics courses, make firm conclusions problematic.
Operational administrations should yield larger samples and therefore more stable results, which should allow for firmer conclusions.
Conclusions
The data suggest separate scales for Geometry and Mathematics 2
– Concordance tables may be a possibility for aligning scores, if common item correlations are high enough; however, this will likely yield concordant scores that differ substantially in meaning, that is, in the underlying knowledge, skills, and abilities needed to obtain each score.
For the Algebra 1/Mathematics 1 and Algebra 2/Mathematics 3 comparisons, the data from the small samples do not strongly support concurrent calibration.
– Depending on operational results, options for reporting may include linking of the separate IRT scales to support a common reporting scale, or concordance tables to align scores.