National Conference on Student Assessment Austin, Texas June 28, 2017

1 National Conference on Student Assessment Austin, Texas June 28, 2017
Reporting Overall Language Proficiency Aligned with 2012 English Language Development (ELD) Standards
Overall session: introductory comments about the new standards, ESSA requirements, challenges, and goals, plus the study that CDE has done looking at reporting domains versus symbolic (grades K–2) versus oral and productive/receptive (grades 3 and above). Both ELPA21 and ELPAC have been designed with standards based on integrated skills in mind. There are two programs: one that is more mature and one that is under development.
Comments prior to my talk
Copyright © 2017 by Educational Testing Service. All rights reserved. ETS and the ETS logo are registered trademarks of Educational Testing Service (ETS). MEASURING THE POWER OF LEARNING is a trademark of ETS.

2 Reporting Overall Language Proficiency Aligned with Integrated Skills Represented in 2012 ELD Standards
Timeline for English Language Proficiency Assessments for California (ELPAC) Activities
Assessment Design
Standard Setting Design
Score Reporting
For context, both the timeline and the ELPAC design will give us some points of comparison between ELPA21 and ELPAC, and may guide the discussion later. I think the two programs are similar in design and in their goals for integrated skills. In my presentation, I will focus on the standard setting (SS) method, which incorporates the integration of skills into the reporting of performance levels, and then provide a brief glimpse of what lies just ahead for ELPAC.

3 ELPAC Development Timeline
Summative Assessment Field Test (SA FT) SA FT Analyses SA Standard Setting October 2017 State Board Approval Initial Assessment (IA) Field Test IA Standard Setting February 2018 Operational Assessment Spring/Fall 2018
On the slide, orange marks YOU ARE HERE.

4 ELPAC Development Timeline
Test / Blueprint Design PLD Development Item Development Scoring, Item analysis Dimensionality Analysis Vertical Scale Development Summative Assessment Standard Setting Workshop State Board Approval Initial Assessment Scoring and Analyses Forms Assembly Standard Setting Workshop Operational Assessment Summative Assessment: February 2018 Initial Assessment: Summer 2018
The first arrow shows where we are in the work; the second arrow shows where we will focus today.

5 ELPAC Design

6 Overview of ELPAC Design
Aligned to the new, more challenging set of standards (2012 California ELD)
Administered at seven grades/grade spans: kindergarten (K), 1, 2, 3–5, 6–8, 9–10, and 11–12
Administered in four domains: Listening, Speaking, Reading, and Writing
Paper-based assessment
Two separate assessments each year: Initial Assessment (IA) and Summative Assessment (SA)
The IA will be shorter than the SA
This is a change from CELDT, which had one test administered to students in kindergarten and grade 1. ELPAC met the need we heard from CA educators and the ELL field (*is that true, from the EL field?).

7 Overview of ELPAC (continued)
K and grade 1: administered one-on-one
Grades 2–12: group administration for Listening, Reading, and Writing
Speaking: administered to all students one-on-one; scored by a trained test examiner in real time
Listening: grades K–2, read aloud to students by a test examiner; grades 3–12, read aloud to students via recorded audio
Writing: all constructed-response items; centrally scored

8 Overview of ELPAC Design
ELPAC task types are set within a communicative context.
Some task types involve integrated skills, e.g., speaking with listening.
Test is designed to report four performance levels on the composite (overall) score.
Recommendation for reporting scales will follow field test analyses.
ERIC: This is a place for discussion. What reporting scales will be reported? Will we report four performance levels for each student on two subscales (e.g., Listening and Speaking; Reading and Writing)? Four performance levels on the composite (overall or total) score were planned. The next bullet says the reporting scales are TBD, so I changed the bullet to say "on the composite (total) score." I will point out that we are analyzing the FT data to see if we can report, for example, R+W and S+L, and that it has not been determined whether we would report performance levels for those subscores at the individual or aggregate level, or whether they would be used for accountability. Could any of those three be right?
ERIC: The levels we develop at standard setting are called performance levels. Proficiency levels are called out in the standards, and I think it is safer to keep those two terms in those two places, because we do not have the same number or names for the levels as are used in the standards. What do you say?

9 Standard Setting Design

10 General Performance Level Descriptors (PLDs)—Excerpts
Level: Description
Level 4: English learners at this level have fully functional receptive (listening and reading) and productive (speaking and writing) skills.
Level 3: English learners at this level have moderately functional receptive (listening and reading) and productive (speaking and writing) skills.
Level 2: English learners at this level have somewhat functional receptive (listening and reading) and productive (speaking and writing) skills.
Level 1: English learners at this level have limited functional receptive (listening and reading) and productive (speaking and writing) English skills.
At a high level, there are four PLDs. I display only the gist here to show that integrated skills are considered in the PLDs.

11 Performance Level Descriptors
Level 1, Level 2, Level 3, Level 4 (Proficient)
The general PLDs are structured so that, for the summative assessment, a recommendation to consider an English learner for reclassification would be based on the threshold between level 3 and level 4. For the IA, a student whose results fall at or above the threshold between level 3 and level 4 would be considered Initial Fluent English Proficient (IFEP). These recommendations for IFEP and reclassification will be reconsidered by the SBE upon adoption of the specific threshold scores.

12 General PLDs
Level: Description
Level 4: English learners at this level have fully functional receptive (listening and reading) and productive (speaking and writing) skills. They can use English to learn and communicate in meaningful ways that are appropriate to different tasks, purposes, and audiences in a variety of social and academic contexts. They may need occasional linguistic support to engage in familiar social and academic contexts; they may need light support to communicate on less familiar tasks and topics.
Level 3: English learners at this level have moderately functional receptive (listening and reading) and productive (speaking and writing) skills. They can sometimes use English to learn and communicate in meaningful ways in a range of topics and content areas. They need light to minimal linguistic support to engage in familiar social and academic contexts; they need moderate support to communicate on less familiar tasks and topics.
Point out highlighted language

13 Summative Assessment Standard Setting
New and different process for ELPAC
Use the integrated approach of the new ELD standards as the foundation for the standard setting design
Two rounds of judgments for each domain
Reading and Listening: Bookmark Method
Speaking and Writing: Performance Profile Method
Round three addresses integration across domains
Round Three: Composite Score Judgments
Reading, Writing, Listening, and Speaking considered simultaneously (compensatory model)
Holistic judgments on score profiles
Recommend three threshold scores

14 Reading and Listening Bookmark Method: Two Rounds
Includes item judgments, feedback, and impact data
Result of Round 2: Recommended cut scores for each domain (Reading and Listening)

15 Writing and Speaking Performance Profile Method—Two rounds
Panelists listen to or read through student samples representative of the range of total scores.
Profiles represent the most frequently occurring patterns (outlier profiles are not used).
Judgments are made on student performance across all tasks, not at the item level.
Result of Round 2: Recommended cut scores for each domain (Writing and Speaking)
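
As a minimal sketch of how "most frequently occurring patterns" might be pulled from field test data, the Python below groups task-score profiles by total score and keeps the most common profile at each total. The function name, the five-task layout, and the student scores are all hypothetical; the presentation does not describe how profiles were actually selected.

```python
from collections import Counter, defaultdict

def most_frequent_profiles(task_scores, top_n=1):
    """Map each total score to its top_n most common task-score profiles."""
    by_total = defaultdict(Counter)
    for profile in task_scores:
        # Profiles with the same total are counted together.
        by_total[sum(profile)][tuple(profile)] += 1
    return {
        total: [profile for profile, _ in counter.most_common(top_n)]
        for total, counter in sorted(by_total.items())
    }

# Hypothetical scores on five Speaking tasks (illustration only).
students = [
    (1, 2, 2, 1, 2),
    (1, 2, 2, 1, 2),
    (2, 2, 1, 1, 2),
    (0, 4, 0, 4, 0),  # unusual pattern with the same total; not selected
    (2, 3, 3, 2, 3),
]
profiles = most_frequent_profiles(students)
for total, selected in profiles.items():
    print(total, selected)
```

Keeping only the most common profile per total is one simple way to leave outlier profiles out of the panelists' materials; other selection rules are possible.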

16 Sample Speaking Score Profiles
TASKS (Grade 6–8) Talk about a Scene Speech Functions Speaking—Support an Opinion Present and Discuss Information Summarize an Academic Presentation Total Score 1 2 3 5 6 8 4 10 12 14 16 7 19 21 23 9 26

18 Round 3 Judgments for Total Score
Panelists consider holistically which patterns of domain scores, and the total scores they produce, would be acceptable as cut scores.
Initial discussion is based on Round 2 domain judgments.
The sum of Reading, Listening, Writing, and Speaking is placed on a temporary reporting scale.
Panelists consider:
Patterns of domain scores resulting in the Round 2 suggested cut scores and adjacent scores;
Impact of the recommended total score cut scores on ELPAC test takers (percent in each level); and
Pattern of English language arts/literacy (ELA) scores in relation to ELPAC.

19 Round Three: Composite Score Judgments
Round 2 Threshold Score Recommendations by Domain
Columns: Reading, Listening, Speaking, Writing, Sum
Level 2: 12 10 14 11 47
Level 3: 17 21 66
Level 4: 25 28 95
For example, this might be the initial recommended total score: a profile of the four recommended domain cuts, summed to a total. Panelists may start to consider the nature of the integrated score, and (next slide)
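
The compensatory idea on this slide can be sketched in a few lines of Python: the composite is a plain sum of the four domain scores, and three threshold scores split the total scale into four levels. The thresholds 47, 66, and 95 are the illustrative sums from the table above, not operational ELPAC cuts, and the function names are assumptions made for this example.

```python
from bisect import bisect_right

# Illustrative threshold scores on the summed scale, taken from the example
# table on this slide (minimum totals for Levels 2, 3, and 4).
THRESHOLDS = [47, 66, 95]

def composite_score(reading, listening, speaking, writing):
    """Compensatory composite: a plain sum of the four domain scores."""
    return reading + listening + speaking + writing

def performance_level(total, thresholds=THRESHOLDS):
    """Return level 1-4: one plus the number of thresholds at or below total."""
    return bisect_right(thresholds, total) + 1

# The Level 2 row of the table: 12 + 10 + 14 + 11.
total = composite_score(reading=12, listening=10, speaking=14, writing=11)
print(total, performance_level(total))
```

Because only the sum is compared with the thresholds, a higher score in one domain can offset a lower score in another, which is the property panelists are asked to weigh in Round 3.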

20 Review Composite Profiles for Each Level at Adjacent Total Scores
Frequently Occurring Profiles Total Reading Listening Speaking Writing Sum Level 3 20 12 13 65 19 14 17 21 66 18 22 15 67 16
Panelists will also be presented with the range of possible combinations of domain scores that sum to the same total score, as well as possible combinations for adjacent total scores (e.g., +/- 1 score point on each domain). Only Level 3 is displayed. Panelists will be given an opportunity to consider whether they would revise the cut score on any individual domain in light of these combinations, and to discuss the rationales for their judgments. Considerations may include a desire to limit the extent to which a high score in one domain can compensate for a low score in another domain.
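
A rough sketch of how such combinations could be enumerated follows. It uses the Level 2 domain cuts from the Round 2 example table (12, 10, 14, 11), because that row is fully specified, and varies each domain by one point either way. The `floors` argument is a hypothetical way to express a limit on compensation; nothing here is taken from the actual panel materials.

```python
from itertools import product

# Level 2 domain cuts from the Round 2 example table (illustrative values).
ROUND2_CUTS = {"reading": 12, "listening": 10, "speaking": 14, "writing": 11}

def adjacent_profiles(cuts, spread=1):
    """Yield (total, profile) for every profile within +/- spread of each cut."""
    domains = list(cuts)
    ranges = [range(cuts[d] - spread, cuts[d] + spread + 1) for d in domains]
    for scores in product(*ranges):
        yield sum(scores), dict(zip(domains, scores))

def profiles_for_total(cuts, total, floors=None, spread=1):
    """Profiles summing to total, optionally requiring a minimum per domain."""
    floors = floors or {}
    return [
        profile
        for profile_total, profile in adjacent_profiles(cuts, spread)
        if profile_total == total
        and all(profile[d] >= floors.get(d, 0) for d in profile)
    ]

same_total = profiles_for_total(ROUND2_CUTS, total=47)
with_floor = profiles_for_total(ROUND2_CUTS, total=47, floors={"speaking": 14})
print(len(same_total), len(with_floor))
```

Comparing the two lists shows how a floor on one domain shrinks the set of profiles that reach the same total, which is one way to picture the trade-off panelists discuss.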

21 Impact Data: Based on ELPAC Performance
Performance Level: Percent Classified
Level 1: 28
Level 2: 22
Level 3: 32
Level 4: 18
Panelists will consider two types of impact data. The traditional type is the percent of students who would fall into each level based on Round 2 results, prior to Round 3 judgments.
*Data are for illustration purposes only, not based on actual scores.
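
Impact data of this kind is a simple tabulation: classify every total score with the recommended thresholds and report the percent in each level. The sketch below assumes the illustrative thresholds 47, 66, and 95 from the Round 2 example and uses made-up total scores; it is not based on actual ELPAC data.

```python
from bisect import bisect_right
from collections import Counter

def impact_data(total_scores, thresholds):
    """Percent of test takers classified into each level (1..len(thresholds)+1)."""
    if not total_scores:
        raise ValueError("at least one score is required")
    # A score at or above a threshold falls in the higher level.
    counts = Counter(bisect_right(thresholds, s) + 1 for s in total_scores)
    n = len(total_scores)
    return {
        level: 100.0 * counts[level] / n
        for level in range(1, len(thresholds) + 2)
    }

# Hypothetical total scores, for illustration only.
scores = [30, 41, 47, 52, 66, 70, 88, 95, 99, 60]
impact = impact_data(scores, thresholds=[47, 66, 95])
for level, percent in impact.items():
    print(f"Level {level}: {percent:.0f}%")
```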

22 ELA Performance Levels
Impact Data Comparing ELA and ELPAC Performance Levels*
ELA Performance Levels ELPAC Levels 1 2 3 4 12.5 10.00 2.50 0.00 8.75 11.25 5.00 7.50 3.75
*Data are for illustration purposes only, not based on actual scores.
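
The second type of impact data is a cross-tabulation: the percent of students in each combination of ELPAC level and ELA level. A minimal sketch, assuming four levels on each test and using hypothetical student pairs rather than real results:

```python
from collections import Counter

def cross_tab_percent(pairs, n_elpac=4, n_ela=4):
    """Percent of students in each (ELPAC level, ELA level) cell.

    Rows are ELPAC levels and columns are ELA levels, both starting at 1.
    """
    counts = Counter(pairs)
    n = len(pairs)
    return [
        [100.0 * counts[(elpac, ela)] / n for ela in range(1, n_ela + 1)]
        for elpac in range(1, n_elpac + 1)
    ]

# Hypothetical (ELPAC level, ELA level) pairs, for illustration only.
pairs = [(1, 1), (1, 1), (1, 2), (2, 2), (3, 2), (3, 3), (4, 3), (4, 4)]
table = cross_tab_percent(pairs)
for elpac_level, row in enumerate(table, start=1):
    print(f"ELPAC {elpac_level}:", " ".join(f"{cell:6.2f}" for cell in row))
```

Reading across a row shows how students at one ELPAC level are spread over the ELA levels, which is the comparison panelists are asked to keep in mind.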

23 Summary of Standard Setting Judgments
Use the integrated approach of the new ELD standards across four domains as the foundation for the standard setting design
Explicitly acknowledge the compensatory nature of the total composite score
Include the ELA score as a point of comparison for ELPAC judgments
Panelists are asked to consider a total score as a composite across Reading, Listening, Speaking, and Writing
This method for standard setting allows panelists to consider what might be the lower bound for each domain score without requiring four "passing scores."

24 Work in the Near Future: Dimensionality & Vertical Scaling
Prior to the standard setting and score reporting, analyses will be conducted to consider the possible reporting scales.

25 Dimensionality of the Composite
Item-level factor analytic approach using multidimensional item response theory
Evaluate four competing models:
Two two-factor models: (1) oral language skills and written language skills; (2) receptive and productive language skills
Single-factor model: all four language skills psychometrically indistinguishable
Four-factor model: each of the four language skills considered a unique skill
Models fitted to each grade/grade span and considered as a whole to facilitate vertical scales
Finally, after these analyses are complete, the hypothesized models will be fitted to each grade/grade span test. The model fit results across grades/grade spans will be considered as a whole to facilitate the anticipated vertical scales.
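
To make the four competing structures concrete, the sketch below writes each model as an assignment of domains to factors and shows which items would load on which factor for a small test form. It only lays out the hypothesized structures; it does not fit a multidimensional IRT model. The factor labels, model names, and the six-item form are hypothetical.

```python
from collections import defaultdict

DOMAINS = ("listening", "speaking", "reading", "writing")

# Domain-to-factor assignments for the four competing models named on the slide.
MODELS = {
    "single_factor": {d: "language" for d in DOMAINS},
    "oral_written": {
        "listening": "oral", "speaking": "oral",
        "reading": "written", "writing": "written",
    },
    "receptive_productive": {
        "listening": "receptive", "reading": "receptive",
        "speaking": "productive", "writing": "productive",
    },
    "four_factor": {d: d for d in DOMAINS},
}

def loading_pattern(item_domains, model_name):
    """Group item indices by the factor their domain is assigned to."""
    assignment = MODELS[model_name]
    pattern = defaultdict(list)
    for index, domain in enumerate(item_domains):
        pattern[assignment[domain]].append(index)
    return dict(pattern)

# Hypothetical six-item form; each entry is the domain the item measures.
items = ["listening", "listening", "speaking", "reading", "writing", "writing"]
for name in MODELS:
    print(name, loading_pattern(items, name))
```

Each pattern corresponds to the set of loadings a confirmatory analysis would leave free for that model, with all other loadings fixed at zero.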

26 Discussion

