1
EALTA MILSIG: Standardising the assessment of writing across nations
Ülle Türk, Language Testing Unit, Estonian Defence Forces
STANAG 6001 testing conference, 7-9 July 2009, Zagreb, Croatia
2
Outline
Background
Aims of the project
Procedure
Standard setting
Results
Conclusions
3
Background: EALTA
EALTA = European Association for Language Testing and Assessment
Established in 2004 as a professional association for language testers in Europe.
Mission: to promote the understanding of theoretical principles of language testing and assessment, and the improvement and sharing of testing and assessment practices throughout Europe.
Annual conferences
Discussion lists, including specialist lists
4
Background: MILSIG
March 2008 – MILSIG mailing list established
EALTA conference in 2008: a meeting of language testers working in the military
Participating countries/institutions: Denmark, Estonia, Latvia, Lithuania, SHAPE, Slovenia, Sweden
Agreement to co-operate in standardising writing assessment
5
Aims of the project
To select a number of sample scripts that:
- have been written in response to a variety of prompts
- demonstrate English language proficiency at STANAG levels 1-3 (4)
- could later be used as benchmark performances in assessing writing and in rater training, and as sample performances for teachers and test takers
To study the possibility of carrying out standardisation via .
6
Procedure and timeline
Each participating country/institution selects 4 scripts, including problem scripts, at levels 1-3 – end of May
Scripts are collected, coded and sent to all participants – middle of June
Scripts are marked following the procedures established in each country – end of September
- STANAG level descriptors used
- Weak, standard and strong performances at each level identified
- Comments provided
Results analysed; decisions taken
7
Participants
Denmark (1), Estonia (5), Latvia (4), Lithuania (3), SHAPE (2), Slovenia (5)
8
Standard setting procedures
Council of Europe: a manual, Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR)
- Pilot version: September 2003
- Final version: January 2009
'Relating an examination or test to the CEFR can best be seen as a process of "building an argument" based on a theoretical rationale.' (p. 9)
Stages in the manual: Familiarisation, Specification, Standardisation training/benchmarking, Standard setting, Validation
10
Table 5.2: Time Management for Assessing Written Performance Samples
Introductory tasks (Familiarisation): 60 min
Working with Standardised Samples:
- Phase 1: Illustration with circa 3 illustrative performances
- Break
- Phase 2: Controlled practice with circa 3-5 illustrative performances
- Phase 3: Free Stage with circa 3-5 illustrative performances
- Lunch
Benchmarking Local Samples:
- Individual rating and group discussion of high, middle and low performances
- Individual rating of circa 5 more performances
11
Familiarisation: Raters rating descriptors
Mean correlation: 0.89 (SD = 0.04); range: 0.83 (R14) to 0.98 (R05)
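One way to read this figure: during familiarisation each rater assigned a STANAG level to a set of descriptors, and each rater's assignments were then correlated, presumably with the intended (key) levels. A minimal sketch of such a check is below; the descriptor levels and the rater's answers are invented, and only the rater IDs and the summary figures above come from the slide.

```python
# Sketch only: correlate one rater's descriptor-level assignments with a key.
# All data values here are invented for illustration.

from statistics import correlation  # Pearson's r, Python 3.10+

# Intended STANAG levels of ten descriptors (hypothetical)
key = [1, 1, 2, 2, 2, 3, 3, 3, 2, 1]

# Levels assigned by one rater during familiarisation (hypothetical)
rater_r05 = [1, 1, 2, 2, 3, 3, 3, 3, 2, 1]

print(f"r = {correlation(key, rater_r05):.2f}")

# Repeating this for every rater yields one r per rater; the slide reports
# their mean (0.89), SD (0.04) and range (0.83 for R14 to 0.98 for R05).
```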
12
Task types and original ratings
27 scripts in total: 6 L1, 14 L2, 7 L3
- 12 letters: 3 L1, 8 L2, 1 L3
- 4 (+ 5) essays: 2 L1, 4 L2, 3 L3
- 1 report: L3
- 1 memorandum: L2
- A first draft of a lecture (2): 1 L2, 1 L3
- Paper for a newsletter (1) and paper/letter/essay (1): 1 L1, 1 L3
13
Rating scripts
Task: Use STANAG 6001 writing descriptors, NOT your own rating scale.
- If the script was written for a STANAG test in your country/institution, which level would it be awarded?
- Do you consider it a weak, standard or strong performance at the awarded level? Why?
14
Analysis of ratings
Coding: L1 weak = 1, L1 standard = 2, L1 strong = 3
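The coding is easiest to picture as a small lookup table. The sketch below is illustrative only: the continuation beyond L1 (L2 weak = 4 … L3 strong = 9) is an assumption based on the 1-3 / 4-6 / 7-9 bands used on the later "Script ratings" slide, and the example judgements are invented.

```python
# Illustrative 9-point coding of (level, strength) judgements.
# Values above L1 are assumed from the 1-3/4-6/7-9 bands on slide 16.

CODES = {
    ("L1", "weak"): 1, ("L1", "standard"): 2, ("L1", "strong"): 3,
    ("L2", "weak"): 4, ("L2", "standard"): 5, ("L2", "strong"): 6,
    ("L3", "weak"): 7, ("L3", "standard"): 8, ("L3", "strong"): 9,
}

def code(level: str, strength: str) -> int:
    """Turn one rater's verbal judgement into its numeric code."""
    return CODES[(level, strength)]

# Example: three invented judgements of the same script, averaged
ratings = [code("L2", "standard"), code("L2", "strong"), code("L3", "weak")]
print(sum(ratings) / len(ratings))  # 6.0
```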
15
Scripts recoded
MILSIGPR_01–MILSIGPR_12a = MSP-01–MSP-12
MILSIGPR_12b = MSP-13
MILSIGPR_12c = MSP-14
MILSIGPR_12d = MSP-15
MILSIGPR_12e = MSP-16
MILSIGPR_12f = MSP-17
MILSIGPR_12g = MSP-18
MILSIGPR_12h = MSP-19
MILSIGPR_13 = MSP-20
MILSIGPR_14 = MSP-21
etc.
16
Script ratings
Mean rating: 2.8–7.8 (st dev: 0.00–1.47)
- 1-3 (L1): 1 script (6 scripts)
- 4-6 (L2): 24 scripts (12 scripts)
- 7-9 (L3): 2 scripts (7 scripts)
15 scripts (55.6%): agreement on the level, though usually not on whether it is a weak, standard or strong performance at that level
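As a rough illustration of how the 1-9 codes map back onto these bands, here is a minimal sketch with invented ratings (not the project's data) that bands each rating into a STANAG level and checks whether the raters agree on the level:

```python
# Minimal sketch with invented ratings: band 1-9 codes into STANAG levels
# (1-3 = L1, 4-6 = L2, 7-9 = L3) and check level agreement across raters.

def band(code: int) -> str:
    """Map a 1-9 rating code to its STANAG level band."""
    return "L1" if code <= 3 else "L2" if code <= 6 else "L3"

def agree_on_level(ratings: list[int]) -> bool:
    """True if all raters placed the script at the same level,
    even if they disagree on weak/standard/strong within it."""
    return len({band(r) for r in ratings}) == 1

# Two hypothetical scripts, each rated by five raters
script_a = [2, 3, 3, 2, 3]   # all within L1
script_b = [6, 7, 7, 8, 6]   # straddles L2 and L3

for name, ratings in [("script A", script_a), ("script B", script_b)]:
    mean = sum(ratings) / len(ratings)
    print(name, f"mean = {mean:.1f}", band(round(mean)), agree_on_level(ratings))
```

On this view, the 55.6% on the slide is simply the share of the 27 scripts for which such level agreement held (15 of 27).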
17
Three examples
MILSIGPR_07 (MSP-07): a lot of grammatical mistakes and spelling errors, very basic range – not enough for Level 2.
MILSIGPR_13 (MSP-20): the task is at level 3, but the writing is not coherent, very incorrect, sometimes difficult to understand the meaning and very uninteresting – getting even worse towards the end.
MILSIGPR_14 (MSP-21): well written, with control of grammar, good vocabulary, and abstract concepts and arguments clearly conveyed; the person might be able to write at a high level 3, but does not quite prove it here.
18
Mean ratings for scripts
Mean rating: 5.2 (SD = 1.44)
19
Script ratings by country
Country/institution   Mean   St dev
C01                   5.3    1.27
C02                   5.0    1.26
C03                   5.8    1.30
C04                   5.1    1.64
C05                    –     1.70
C06                    –     1.48
20
Correlations between country ratings
Pairwise values: 0.741, 0.670, 0.761, 0.644, 0.768, 0.746, 0.584, 0.692, 0.585, 0.730, 0.786, 0.845, 0.715 (range 0.584–0.845)
N = 27 (N = 23 for some pairs); all significant at the 0.01 level
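Below is a sketch of how such pairwise correlations might be computed when countries have rated overlapping but not identical sets of scripts, which is one plausible reason N differs between pairs (27 vs 23). The per-script ratings are invented; only the MSP script codes follow the naming used earlier.

```python
# Sketch with invented data: Pearson correlation between two countries'
# per-script mean ratings, using only the scripts both countries rated.

from statistics import correlation  # Python 3.10+

def pairwise_r(a: dict[str, float], b: dict[str, float]) -> tuple[float, int]:
    """Correlation over the scripts rated by both countries, plus the pair's N."""
    shared = sorted(a.keys() & b.keys())
    return correlation([a[s] for s in shared], [b[s] for s in shared]), len(shared)

# Hypothetical per-script mean ratings (script code -> mean of that country's raters)
c01 = {"MSP-01": 4.5, "MSP-02": 5.0, "MSP-03": 6.5, "MSP-04": 3.0}
c02 = {"MSP-01": 4.0, "MSP-02": 5.5, "MSP-03": 6.0}  # this country skipped MSP-04

r, n = pairwise_r(c01, c02)
print(f"r = {r:.3f}, N = {n}")
```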
21
Mean ratings by task type
Task type (n)   Range   Mean   St dev
Letter (12)       –     4.6    1.03
Essay (9)         –     5.4    1.33
Other (6)         –     5.9    1.21
22
Conclusions Such a project is indeed needed!
23
Way forward
1 L1 script, 12 L2 scripts, 2 L3 scripts
- Analysis of scripts: good benchmarks?
- Collecting more scripts, particularly at L3; scripts based on a variety of task types
- Did we start at the wrong end? Looking at scripts that caused disagreement: can we reach agreement? What features make them problematic?
- Expanding the circle to include more countries
24
References
EALTA website: http://www.ealta.eu.org
Council of Europe. Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR): _EN.asp