Identifying the gaps in state assessment systems
CCSSO Large-Scale Assessment Conference
Nashville, June 19, 2007
Sue Bechard, Office of Inclusive Educational Assessment
Ken Godin
Research Questions Of all the students who are not proficient, how can states identify those who are in the assessment gap? Who are the students in the gaps, what are their attributes, and how do they perform?
Gap identification process
- Conduct exploratory interviews with teachers to identify the assessment gaps
- Review student assessment data
- Review teacher judgment data
- Operationalize gap criteria
- Conduct focused teacher interviews to confirm gap criteria
Parker and Saxon: Teacher views of students and assessments
Bechard and Godin: Finding the real assessment gaps
Data sources
- State assessment data – grade 8 mathematics results from two systems
  - General large-scale test results
  - Demographics (special programs, ethnicity, gender)
  - Teachers’ judgments of students’ classroom work
  - Student questionnaires completed at time of test
  - Accommodations used at time of test
- State databases for additional student demographic data
  - Disability classification
  - Free/reduced lunch
  - Attendance
- Student-focused teacher interviews
Why use teacher judgment of students’ classroom performance?
- Gap 1: the test may not reflect classroom performance. Teachers see students performing proficiently in class, but test results are below proficient.
- Gap 2: the test may not be relevant for instructional planning. Teachers rate students’ class work as low as possible and test results are at “chance” level. No information is generated on what students can do.
Teacher judgment instructions
The instructions were clear that this was to be a judgment of the student’s demonstrated achievement on GLE-aligned academic material in the classroom, not a prediction of test performance.
- NECAP: The teacher judgment field consisted of 12 possibilities: each of the 4 achievement levels had low, medium, and high divisions.
- MEA: The teacher judgment field consisted of 4 possibilities, one per achievement level.
For comparisons across the two systems, we used a version of the NECAP judgments collapsed down to the 4 achievement levels.
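The collapsing of the 12 NECAP judgment possibilities into 4 achievement levels can be illustrated with a minimal sketch. The integer coding (1 = low Level 1 through 12 = high Level 4) and the function name `collapse_judgment` are assumptions made for illustration; the slides do not say how the judgment field was actually coded.

```python
def collapse_judgment(fine_rating: int) -> int:
    """Map a fine-grained judgment (1-12) to an achievement level (1-4).

    Assumed coding (hypothetical, not from the slides):
      1, 2, 3    -> Level 1 (low, medium, high)
      4, 5, 6    -> Level 2
      7, 8, 9    -> Level 3
      10, 11, 12 -> Level 4
    """
    if not 1 <= fine_rating <= 12:
        raise ValueError(f"fine_rating must be in 1..12, got {fine_rating}")
    # Each achievement level spans three consecutive fine-grained ratings.
    return (fine_rating - 1) // 3 + 1


if __name__ == "__main__":
    for rating in range(1, 13):
        print(rating, "->", collapse_judgment(rating))
```

Under this coding, the collapsed NECAP ratings sit on the same 4-point scale as the MEA judgments, which is what makes the cross-system comparison possible.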
Research on validity of teacher judgment
While there are some conflicting results, the most accurate judgments were found when:
- teachers were given specific evaluation criteria
- levels of competency were clearly delineated
- criterion-referenced tests in mathematics or reading were the matching measure
- criterion-referenced tests reflected the same content as did classroom assessments
- judgments were of older students who had no exceptional characteristics, and
- teachers were asked to assign ratings to students, not to rank-order them
Validation of teacher judgment data from NECAP and MEA
The data were collected to establish “Round 1” cutpoints (of 3 rounds) during standard setting. Validation studies were conducted which asked:
- Were there differences between the sample of students with non-missing teacher judgment data and the rest of the population?
- Were there suspicious trends in the judgment data suggesting that teachers did not take the task seriously?
- How did teacher judgments compare with students’ actual test scores?
Results of these investigations were considered supportive of using the teacher judgment data for standard setting.
Teacher judgment vs. test performance (NECAP)
Teacher judgment vs. test performance (MEA)
† Students within error of the bottom of the scale (i.e., chance score) are a subset of Achievement Level 1.
Operationalizing the gap definitions using teacher judgment
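As a rough illustration, the gap descriptions given earlier (gap 1: proficient in class but below proficient on the test; gap 2: lowest possible teacher rating and a chance-level test score) could be turned into a classification rule along the following lines. This is a sketch under stated assumptions, not the authors' actual criteria: the proficiency cut at level 3, the rule that chance-level scores are checked first, the "other" label, and the name `classify_gap` are all hypothetical.

```python
PROFICIENT_LEVEL = 3  # assumption: levels 3 and 4 count as proficient
LOWEST_LEVEL = 1      # lowest of the 4 collapsed achievement levels


def classify_gap(teacher_level: int, test_level: int, at_chance: bool) -> str:
    """Assign a student to a group from collapsed 4-level ratings.

    teacher_level: teacher judgment of classroom work (1-4)
    test_level:    achievement level earned on the test (1-4)
    at_chance:     True if the test score is within error of chance
    """
    for name, level in (("teacher_level", teacher_level),
                        ("test_level", test_level)):
        if not 1 <= level <= 4:
            raise ValueError(f"{name} must be in 1..4, got {level}")

    # Assumption: chance-level scores take precedence over the gap 1 rule.
    if at_chance:
        return "gap 2" if teacher_level == LOWEST_LEVEL else "non-gap 2"
    if test_level < PROFICIENT_LEVEL:
        # Below proficient on the test: compare with classroom judgment.
        return "gap 1" if teacher_level >= PROFICIENT_LEVEL else "non-gap 1"
    return "other"  # proficient on the test; outside the gap analysis


if __name__ == "__main__":
    print(classify_gap(teacher_level=3, test_level=2, at_chance=False))
    print(classify_gap(teacher_level=1, test_level=1, at_chance=True))
```

With fine-grained NECAP ratings, the "lowest possible" check would instead test for the low division of Level 1, which is one reason gap 2 percentages can differ between fine-grained and gross-grained ratings.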
Student questionnaires (answered after taking the test)
1. How difficult was the mathematics test?
   A. harder than my regular mathematics schoolwork
   B. about the same as my regular mathematics schoolwork
   C. easier than my regular mathematics schoolwork
2. How hard did you try on the mathematics test?
   A. I tried harder on this test than I do on my regular mathematics schoolwork.
   B. I tried about the same as I do on my regular mathematics schoolwork.
   C. I did not try as hard on this test as I do on my regular mathematics schoolwork.
Accommodations (used during the mathematics test)
NECAP: 16 accommodations listed by category:
- Setting
- Scheduling/timing
- Presentation formats
- Response formats
MEA: 21 accommodations listed by category:
- Setting
- Scheduling
- Modality
- Equipment
- Recording
Student-focused teacher interviews
Student profile data
- math test scores (both overall and on subtests)
- specific responses to released math test items
- student’s responses to the questionnaire
- special program status
- accommodations used during testing
Teacher interview questions
- Questions regarding perceptions of the students in each gap on various aspects of gap criteria
- 17 Likert scale questions on the student’s class work and participation in classroom activities
Student-focused teacher interview samples
NECAP sample:
- 20 8th grade math and special ed teachers
- 7 schools across three states (NH, RI, and VT)
- 51 students: gap 1=19, gap 2=18, and comparison group=14
MEA sample:
- 7 8th grade math and special ed teachers
- 3 schools
- 14 students: gap 1=4, non-gap 1=3, gap 2=2, non-gap 2=5, and comparison group=0
Results: Percentages of students in the gaps (NECAP)
Gap 2 and non-gap 2 percentages differ depending on whether fine-grained or gross-grained ratings are used.
Results: Percentages of students in the gaps (MEA)
Accommodations use (NECAP) Students in gap 1 were significantly less likely to use accommodations than students in non-gap 1. Only a small percentage of students in gap 1 used any accommodations at all. The majority of students in both gap 2 and non-gap 2 used one or more accommodations.
Accommodations use (MEA) Similar patterns of accommodations use are seen for gap 1 on the MEA as in NECAP.
Performance of students in gap 1 compared to non-gap 1 on the NECAP + Statistically higher than expected - Statistically lower than expected
Performance of students in gap 1 compared to non-gap 1 on the MEA + Statistically higher than expected - Statistically lower than expected
Special program status of students in gap 1 (NECAP)
The majority of students in gap 1 were in general education. Students with IEPs were under-represented in gap 1 and over-represented in non-gap 1.
+ Statistically higher than expected
- Statistically lower than expected
Special program status of students in gap 1 (MEA) There were similar gap 1 compositions in MEA. + Statistically higher than expected - Statistically lower than expected
Disability designations in gap 1
Learning disabilities (NECAP)
- Gap 1: 57.7% of the IEP gap 1 group (n=208)
- Non-gap 1: 49.7% of the IEP non-gap 1 group (n=860)
- Comparison: 49.2% of the IEP comparison group (n=83)
- Total population: 52% of students with IEPs (N=4,465)
Disability designations only seen in non-gap 1:
- NECAP: Students with learning impairments, deafness, multiple disabilities and traumatic brain injury
- MEA: Students with learning impairments and traumatic brain injury
Additional characteristics of students in gap 1 compared to non-gap 1
Gap 1 students:
- Were more likely female and white
- Had the fewest absences
- Had higher SES
- Found the state test about the same level of difficulty as class work
- Exhibited academic and mathematics-appropriate behaviors in class
Performance of students in gap 2 on the test (NECAP and MEA)
By definition, students in both gap 2 and non-gap 2 scored no better than chance on the assessment.
Special program status of students in gap 2 (NECAP) The majority of students in gap 2 and non-gap 2 were students with IEPs.
Special program status of students in gap 2 (MEA) MEA results show the majority of the students in gap 2 had IEPs. The percentages of students in general education in gap 2 and non-gap 2 groups are higher than in NECAP.
Disability designations in gap 2
Learning disabilities: Fewer than half of the students in gap 2 groups had learning disabilities in both systems.
Other disability designations differed between the two systems.
NECAP
- Students who were deaf/blind and those with multiple disabilities were only found in gap 2.
- Students with hearing impairments, deafness and traumatic brain injury were only found in non-gap 2.
MEA
- Students with hearing impairments were only in gap 2.
- Students with visual impairments or blindness were only in non-gap 2.
Additional characteristics of students in gap 2 compared to non-gap 2 Students in gap 2 were very similar to students in non-gap 2 on most variables. Students from both groups felt that the test was as hard as or harder than their schoolwork. They tried as hard as or harder on the test as in class. They used mathematics tools in the classroom (e.g., calculators).
Summary: How many students are in the gaps?
10.9% of the total student population in two systems are in gaps 1 & 2.
NECAP: Gap 1 = 8.6%, Gap 2 = 2.3%
MEA: Gap 1 = 7.1%, Gap 2 = 4.3%
Summary
We found substantial differences between the composition of the gap 1 and non-gap 1 groups, which held in both systems. Gap 1 students may have characteristics and behaviors that mask their difficulties. Non-gap 1 students are those generally thought to be in the “achievement gap”.
Summary (cont.)
Low-performing students in gap 2 and non-gap 2 share many characteristics. Their extremely low performances in both classroom activities and the test raise issues about the relevancy of the general assessment for them.
Conclusions For students in gap 1, increase focus on classroom supports and training on how to transfer their knowledge and skills from classroom to assessment environments. For students in non-gap 1, examine expectations and opportunities to learn. Providing a different test based on modified academic achievement standards is premature. Students with IEPs in gap 2 and non-gap 2 may benefit from the 2% option for AYP and an alternate assessment based on modified academic achievement standards (AA-MAAS). There will be challenges designing a test based on MAAS that is strictly aligned with grade level content.