Lietta Scott, PhD, Arizona Department of Education
How are States using accessibility and accommodation research to improve their tests? Arizona’s Experience
Arizona’s New Online Assessment System
In 2015, Arizona began moving its paper-and-pencil multiple-choice assessments online. Working with AIR, we took the opportunity to:
- Study the effectiveness and validity of an English glossary accommodation for English learners (ELs)
- Examine how various item types translate between the two delivery modes (paper and online)
2015 Glossary Study - Grades 3 and 7
- ELA: 2 reading passages with associated items
- Math: 6 items in grade 3; 13 items in grade 7
- Glossing differs from a dictionary: definitions are context-specific
- Glossaries included both a written version and an audio file of the English definition
- Glossing rules were based on those of SBAC (Solano-Flores, 2012)
2015 Glossary Study - Grades 3 and 7
Questions asked:
- Are glossaries effective?
- Do they help ELs without changing the construct being measured?
- How easy are they to implement?
2015 Glossary Study - Grades 3 and 7
Data:
- All students who took the statewide assessment online (40% of students)
- Students randomized to the study or control condition
- Items randomized
- Study items included with all other field-test items (~130) across 7 field-test slots per test
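The randomization design on this slide can be sketched in a few lines of Python. This is a minimal illustration, not AIR’s procedure: the helper name `randomize`, the even study/control split, and the reading of "7 field-test slots" as seven distinct items drawn per student from a pool of about 130 are all assumptions.

```python
import random
from collections import Counter

def randomize(student_ids, item_pool, n_slots=7, seed=0):
    """Hypothetical sketch: assign each student to 'study' or 'control'
    and fill that student's field-test slots with randomly drawn items."""
    rng = random.Random(seed)          # seeded so the assignment is reproducible
    students = list(student_ids)
    rng.shuffle(students)              # random order, then alternate -> balanced groups
    plan = {}
    for i, sid in enumerate(students):
        condition = "study" if i % 2 == 0 else "control"
        items = rng.sample(item_pool, n_slots)  # n_slots distinct items from the pool
        plan[sid] = (condition, items)
    return plan

# Illustrative sizes only: 1,000 students and a pool of 130 field-test items.
pool = [f"item{i:03d}" for i in range(130)]
plan = randomize(range(1000), pool, seed=42)
counts = Counter(condition for condition, _ in plan.values())
print(counts["study"], counts["control"])  # 500 500
```

Shuffling and then alternating keeps the two groups the same size; a real design might also stratify by grade or EL status.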
2015 Glossary Study - Grades 3 and 7
Results – Math:
- Negative impact (fewer correct responses) for both EL and non-EL students in both grades
- Perhaps glossing words that are irrelevant to answering the item pulls students’ focus away from the relevant item content.
2015 Glossary Study - Grades 3 and 7
Results – ELA:
- Grade 3: negative impact for both EL and non-EL students
- Grade 7: positive impact for EL students and no impact for non-EL students
- Perhaps the literal interpretation of figurative, metaphorical, or colloquial words was key
Arizona’s 2017 Glossary Research
- Studied the reliability of creating glossary entries
- Changed the glossing rules:
  - Glossed "culturally bound language" and unusual usages
  - Did not gloss words that were irrelevant to solving the particular problem (to remove distraction)
  - Still did not gloss words that would cue the answer
- Studied all grades in both subjects
- Added one more condition: English glossary with audio plus a Spanish translation of the word with audio (Spanish speakers make up 80% of AZ ELs)
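The 2017 glossing rules can be read as a small decision rule. The sketch below is hypothetical: the `Word` fields and the `should_gloss` function are illustrative names, and in practice each flag is a human judgment made with a decision tree and reference guide, not something code can compute.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """A candidate word in an item; each flag is a hypothetical rater judgment."""
    text: str
    culturally_bound: bool      # idiom, colloquialism, or culture-specific reference
    unusual_usage: bool         # familiar word used in an unfamiliar sense
    relevant_to_solving: bool   # needed to understand what the item asks
    cues_answer: bool           # a definition would give away the answer

def should_gloss(word: Word) -> bool:
    """Apply the 2017 glossing rules in order."""
    if word.cues_answer:
        return False  # never gloss words that would cue the answer
    if not word.relevant_to_solving:
        return False  # skip irrelevant words to remove distraction
    return word.culturally_bound or word.unusual_usage

print(should_gloss(Word("piece of cake", True, False, True, False)))  # True
print(should_gloss(Word("table", False, True, False, False)))         # False
```

Checking the answer-cueing rule first means it overrides every other reason to gloss.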
% of Glossed Words in Common
Glossing is harder than it looks. Without reliable, reproducible glossing, how can glossaries have reliable effects?
Percent of glossed words in common between two individual raters or two teams of raters:

| Approach to glossing | Training | % of glossed words in common |
|---|---|---|
| 1 person | 1 week, trained with decision tree | 10 |
| 3 people; 2 of 3 must agree on gloss | 1 week, trained with decision tree and standard reference guide | 48 |
| 2 of 3 must agree on selected data* | 1 week; all words listed in a spreadsheet; information extracted from the reference guide and entered into the spreadsheet | 59 |

* Used in data collection
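The slide does not define how "% of glossed words in common" was calculated. The sketch below assumes it is the number of words glossed by both raters (or teams) divided by the number glossed by either, and shows how a 2-of-3 team decision could be formed. The function names and example words are hypothetical.

```python
from collections import Counter

def team_gloss(rater_selections, min_votes=2):
    """Words selected by at least `min_votes` raters (e.g., 2 of 3)."""
    votes = Counter(w for words in rater_selections for w in set(words))
    return {w for w, n in votes.items() if n >= min_votes}

def percent_in_common(a, b):
    """Assumed metric: words glossed by both, as a percent of words glossed by either."""
    a, b = set(a), set(b)
    either = a | b
    if not either:
        return 100.0  # neither side glossed anything: treat as full agreement
    return 100.0 * len(a & b) / len(either)

# Two hypothetical three-person teams rating the same passage.
team1 = team_gloss([{"bark", "fair", "bank"}, {"bark", "bank"}, {"fair", "pitch"}])
team2 = team_gloss([{"bark", "pitch"}, {"bark", "fair"}, {"pitch", "bank"}])
print(percent_in_common(team1, team2))  # 25.0
```

Requiring two of three votes filters out idiosyncratic picks by a single rater, which is one plausible reason the team approaches agree more often than a single person does.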
2017 Glossary Study – ELA
[Figure: effect by group and condition for grades 3–11. Rows: Non-EL – Glossary; Non-EL – Glossary w/ Spanish translation; EL – Glossary; EL – Glossary w/ Spanish translation. Legend: negative effect (> .95 confidence); positive effect (> .95 confidence).]
2017 Glossary Study – Mathematics
[Figure: effect by group and condition for grades 3–11. Rows: Non-EL – Glossary; Non-EL – Glossary w/ Spanish translation; EL – Glossary; EL – Glossary w/ Spanish translation. Legend: negative effect (> .95 confidence); positive effect (> .95 confidence).]
2017 Glossary Study – Conclusions
- Glossaries help ELs without affecting non-ELs
- Glossing needs to be done reliably and according to studied guidelines
- In ELA, glossaries may be a distraction in grades 3 and 4
- Effects are larger in upper grades, especially in math
- Increase in the probability of a correct response:
  - Lower grades: 1–2 percentage points
  - Upper grades: 10+ percentage points
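The slides report effects at "> .95 confidence" and in percentage points but do not describe the statistical model. As a rough, hypothetical illustration only, the sketch below compares proportion correct between a glossary group and a control group using a normal-approximation 95% interval on the difference; the function name and the counts in the usage lines are invented, not study data.

```python
import math

def effect_in_points(correct_g, n_g, correct_c, n_c, z=1.96):
    """Glossary-minus-control difference in proportion correct, in percentage
    points, with a normal-approximation 95% interval (a stand-in method)."""
    p_g, p_c = correct_g / n_g, correct_c / n_c
    diff = p_g - p_c
    se = math.sqrt(p_g * (1 - p_g) / n_g + p_c * (1 - p_c) / n_c)
    low, high = diff - z * se, diff + z * se
    if low > 0:
        label = "positive effect"
    elif high < 0:
        label = "negative effect"
    else:
        label = "no detectable effect"
    return 100 * diff, 100 * low, 100 * high, label

# Invented counts for illustration only (not study data).
print(effect_in_points(620, 1000, 520, 1000)[3])  # positive effect
print(effect_in_points(510, 1000, 500, 1000)[3])  # no detectable effect
```

With 1,000 students per group, a 10-point gain clears the interval easily while a 1-point gain does not, which hints at why small lower-grade effects need large samples to detect.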
Item Level Mode Comparability
- Ran differential item functioning (DIF) analyses on items by mode (paper vs. online)
- Examined the types of items that showed mode DIF
- Explored reasons for mode DIF by examining how the items display
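The slides do not say which DIF statistic was used. Mantel-Haenszel is one common choice, so the sketch below uses it as a stand-in: students are matched on total score, paper is treated as the reference group and online as the focal group, and the common odds ratio is converted to the ETS delta scale (MH D-DIF = -2.35 ln α). The A/B/C cutoffs here omit the significance tests the full ETS rules also apply, and all names and data are hypothetical.

```python
import math
from collections import defaultdict

def mh_mode_dif(responses):
    """Mantel-Haenszel DIF for one item.
    responses: iterable of (total_score, mode, correct); mode is "paper"
    (reference group) or "online" (focal group)."""
    # Per score level: [paper right, paper wrong, online right, online wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for score, mode, correct in responses:
        offset = 0 if mode == "paper" else 2
        strata[score][offset + (0 if correct else 1)] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")
    alpha = num / den                  # > 1 means the item is harder online
    delta = -2.35 * math.log(alpha)    # ETS delta scale; negative = harder online
    return alpha, delta

def ets_category(delta):
    """Simplified A/B/C flag by magnitude only (no significance test)."""
    size = abs(delta)
    return "A" if size < 1.0 else ("B" if size < 1.5 else "C")

# Hypothetical item: at one matched score level, 80% correct on paper, 60% online.
data = ([(10, "paper", True)] * 80 + [(10, "paper", False)] * 20
        + [(10, "online", True)] * 60 + [(10, "online", False)] * 40)
alpha, delta = mh_mode_dif(data)
print(round(alpha, 2), round(delta, 2), ets_category(delta))  # 2.67 -2.3 C
```

Matching on total score is what separates mode DIF from an overall mode effect: the item is flagged only if students of the same ability do worse on it in one mode.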
Mode DIF Rare in Both Subjects
- ELA items: 1.7% (6 of 350), all in grades 3–6
  - Editing items (drop-down) were harder online: the correct response was visible in the box without clicking, so students did not make a choice because the sentence was already correct.
- Math items: 4.3% (17 of 391), across all grades
  - Open equation items were harder than gridded response items in all grades, perhaps because gridded response limits the number of options available for the student to consider.