Published by Sheena Powers
1
Testing the Test – Serbian STANAG 6001 English Language Test
STANAG 6001 Testing Team, PELT Directorate, Serbian MOD
STANAG 6001 Testing Workshop, Brno, Czech Republic, 6-8 September 2016
2
General and Specific Concerns
Any kind of testing or examination involves some general and some specific points of concern. On the general points, which are relevant to any kind of language examination, we are governed by the principles set out in the Principles of Good Practice for ALTE Examinations (Association of Language Testers in Europe). Specific points of concern arise from the following:
- STANAG 6001 is a high-stakes examination;
- it is a language proficiency test, testing general English in a military setting;
- it is a criterion-referenced test, based on the STANAG 6001 table of level descriptors, and incommensurate with other criterion-referenced tests (e.g. Cambridge ESOL exams, IELTS, etc.) and language proficiency scales (CEFR, ALTE levels, etc.).
3
Limiting Factors
Bearing this in mind, there are many serious constraints when designing the test (including things beyond your control):
- What are the actual needs of the particular nation? (NATO member? PfP member? MD member? Test all levels? Test L4?)
- What kind of test? (Multi-level 1-2-3? Bi-level L1/2, L2/3? Single-level?)
- STANAG 6001 language descriptors are uniform and not open to individual or national interpretation.
- Number of test takers per cycle
- Number of testing cycles per year
- Testing facilities at your disposal: premises (small or large testing rooms?), amenities (multimedia equipment? PCs/laptops? headphones/loudspeakers?), staff (number of invigilators? trained OPI-ers?), etc.
4
Your Responsibilities
Things you are in control of and can make individual decisions on include:
- test format (based on the test specifications you designed);
- number of questions, types of questions, elicitation techniques, etc.;
- rating criteria (analytic, holistic, or mixed?), cut-off scores, etc.
However, even these decisions are heavily influenced by the constraints above. Whatever your test eventually comes to be, it has to meet the following examination qualities:
- validity
- reliability
- impact
- practicality
5
Quick Overview of the Serbian STANAG 6001 Test Particulars:
Levels: multilevel (1-2-3)
Receptive skills
- Test: 40-question pen-and-paper test
- Type of questions: MCQ, T/F, CR, matching
- Scoring: objective
- Method: modified REDS, establishing cut-off scores for each level
Productive skills
- Test: adaptable test with multilevel prompts
- Type of questions: ranging from simple questions/tasks to descriptive preludes
- Scoring: subjective, rater's judgment (based on an analytic scale)
- Method: mixed (analytic-holistic), establishing MAC for each level
No. of candidates per testing cycle:
No. of testing cycles per year: 3-4
Test results validity: 3 years
Partial testing / retesting of individual skills: not possible
6
Testing the Test
Test analyses are done in different modes and at different stages of test development and test administration.
1. Qualitative analysis: questionnaires, feedback forms, and comments from both test takers and invigilators/interlocutors, after each pretesting and official test administration.
   Quantitative analysis: various statistical operations (MS Excel, SPSS) after each pretesting and official test administration.
2. Analysis of individual items: FV (facility value), DI (discrimination index), calibration against anchor items for each level, variance, distractor efficiency analysis.
   Analysis of the entire reading/listening test: total score analysis and discrete-level analysis; central tendency (mean, median, mode); dispersion (standard deviation, range, variance); distribution: normal or skewed (skewness, kurtosis), histograms; reliability coefficients (Cronbach's alpha).
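Facility value and discrimination index are the simplest of the item statistics listed above to compute. The sketch below assumes dichotomously scored items (1 = correct, 0 = incorrect) and uses one common definition of DI, the upper-lower group method with 27% groups; the presentation does not say which DI formula the team uses, and the function names are hypothetical.

```python
def facility_value(item_scores: list[int]) -> float:
    """Proportion of candidates who answered the item correctly (0.0-1.0)."""
    if not item_scores:
        raise ValueError("no scores")
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores: list[int], total_scores: list[float],
                         group_fraction: float = 0.27) -> float:
    """Upper-lower group DI: correct answers in the top group minus the bottom group,
    divided by group size.

    Assumption: the groups are the top and bottom `group_fraction` of candidates
    ranked by total test score (ties keep candidate order).
    """
    n = len(item_scores)
    if n != len(total_scores) or n < 2:
        raise ValueError("need matching score lists for at least 2 candidates")
    group_size = max(1, round(n * group_fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i])  # lowest total first
    lower, upper = ranked[:group_size], ranked[-group_size:]
    upper_correct = sum(item_scores[i] for i in upper)
    lower_correct = sum(item_scores[i] for i in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical data for six candidates (not from the presentation).
item = [1, 1, 1, 0, 0, 0]
totals = [34, 30, 27, 15, 12, 9]
print(facility_value(item))                # 0.5
print(discrimination_index(item, totals))  # 1.0
```

A DI near 1 means the item separates strong from weak candidates well; values near 0 or negative flag items worth reviewing alongside the distractor analysis.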
7
Testing the Test
3. Analysis of receptive skills
   Qualitative: the usual methods + verbal protocol
   Quantitative: statistical analysis
   Analysis of productive skills
   Qualitative: feedback from interlocutors/candidates, comments both on and off the record
   Quantitative: correlations, inter-/intra-rater reliability
4. Analysis of the test
   Qualitative: after test administration, in the form of a report
   Quantitative: analysis of the achieved SLPs after test administration
8
Testing the Test
5. Relating final test results to:
- ECL/ALCPT scores (reading, listening)*
- previously achieved SLPs
- STANAG SLPs acquired abroad (Hungary, Germany...)
- Pro-achievement test results from the Military Academy (MA), intensive courses and similar tests
- CEFR and other certificates acquired in the civilian sector (foreign language schools, British Council, Cambridge ESOL and IELTS certificates, etc.)
- BAT (at some point, hopefully) for external benchmarking purposes and criterion-related validity
*When and if possible
9
Scoring Criteria for STANAG 6001 Speaking & Writing Tests
- An interlocutor frame (scripted interview) in the speaking test enhances standardization and reduces variability among raters.
- Analytic rating scales enhance reliability in speaking and writing tests by making scores more consistent, and also reduce "rater-candidate interaction" and bias.
- Recorded speaking responses and writing responses are cross-rated for a higher degree of consistency/reliability.
10
Rating Scale for STANAG 6001 Speaking Test
Candidate no. | Speaking task no. | Discourse adequacy, coherence and length | Fluency, pronunciation and general intonation | Lexical competence and accuracy | Grammatical competence and accuracy | Awarded level
11
Inter-Rater Reliability
*Calculated on 12 randomly selected, independently rated speaking samples
Candidate | Rater B | Rater S | Rater N
Cand1 | 2+ | 3 | 2
Cand2 1+ Cand3 Cand4 Cand5 1 0+ Cand6 Cand7 Cand8 Cand9 Cand10 Cand11 Cand12
Correlations (Pearson, N = 12):
Rater B vs Rater S: r = .872** (Sig. 2-tailed .000)
Rater B vs Rater N: r = .502 (Sig. 2-tailed .096)
Rater S vs Rater N: r = .538 (Sig. 2-tailed .071)
**Correlation is significant at the 0.01 level (2-tailed).
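The rater correlations above are Pearson coefficients computed on STANAG levels. To correlate levels such as 2+ numerically they first have to be coded as numbers; the sketch below assumes a plus level is coded as half a step above its base level (2+ becomes 2.5), a common convention that the presentation does not state. The function names and the ratings in the example are hypothetical.

```python
import math


def level_to_number(level: str) -> float:
    """Code a STANAG 6001 level such as '2' or '2+' as a number.

    Assumption: a plus level sits half a step above its base level.
    """
    base = level.rstrip("+")
    return float(base) + (0.5 if level.endswith("+") else 0.0)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical ratings of the same five samples by two raters.
rater_a = ["2+", "1+", "3", "1", "2"]
rater_b = ["3", "1+", "3", "0+", "2"]
r = pearson_r([level_to_number(v) for v in rater_a],
              [level_to_number(v) for v in rater_b])
print(round(r, 3))
```

On Python 3.10+, `statistics.correlation(x, y)` returns the same coefficient; the significance values in the SPSS output would still need a separate t-test on r.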
12
Scoring Criteria for STANAG 6001 Reading & Listening Tests
Scoring criteria for STANAG 6001 Reading & Listening Comprehension Tests
*Adapted REDS method (originally: Sustained = %, Developing = %, Emerging = 40-50%, Random = 0-35%)
Total: 40 questions. Maximum: 40 points / 100%.
Sustained:
- Level 1: 8 points out of 10 / 80%
- Level 2: 11 points out of 15 / 73.3%
- Level 3: 11 points out of 15 / 73.3%
13
Scoring Criteria for STANAG 6001 Reading & Listening Tests
Level 1 (no. of points): S 8-10 | D 6-7 | E 4-5 | R 0-3
Level 2: 0-5
Level 3:
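The Level 1 bands (S 8-10, D 6-7, E 4-5, R 0-3) and the Sustained cut-offs (8/10, 11/15) translate directly into a lookup. This is a sketch only: the function and table names are hypothetical, and since the Developing/Emerging boundaries for Levels 2 and 3 are not reproduced in this copy, only the Level 1 table is filled in.

```python
# Minimum points needed for each REDS band, checked from highest to lowest.
# Level 1 values (10 items) are taken from the scoring table.
LEVEL_1_BANDS = [("Sustained", 8), ("Developing", 6), ("Emerging", 4), ("Random", 0)]


def reds_band(points: int, max_points: int, bands: list[tuple[str, int]]) -> str:
    """Return the REDS band for a raw score on one level's set of items."""
    if not 0 <= points <= max_points:
        raise ValueError(f"score {points} outside 0..{max_points}")
    for label, minimum in bands:
        if points >= minimum:
            return label
    raise ValueError("bands must include one starting at 0")


def percent(points: int, max_points: int) -> float:
    """Raw score as a percentage, rounded to one decimal place."""
    return round(100 * points / max_points, 1)


print(reds_band(7, 10, LEVEL_1_BANDS))  # Developing
print(percent(8, 10))                   # 80.0
print(percent(11, 15))                  # 73.3
```

Level 2 and Level 3 tables would be built the same way from their own cut-offs (Sustained from 11 of 15 points); the per-level bands then feed the level-awarding table.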
14
Scoring Criteria for STANAG 6001 Reading & Listening Tests
SIMPLIFIED TABLE FOR AWARDING LEVELS: LEVEL 1 LEVEL 2 LEVEL 3 AWARDED LEVEL SUSTAINED 3 DEVELOPING 2+ EMERGING/RANDOM 2 RANDOM 1+ - 1 0+
15
Statistical Operations in Reading Test Analysis
June 2015 (118 candidates)
Average rating* (base levels): Listening 2.06 | Speaking 1.59 | Reading 2.01 | Writing 1.57
Mode rating: 2
Reading Test, June 2016: Reliability Statistics
Cronbach's Alpha: .769 | Cronbach's Alpha Based on Standardized Items: .759 | N of Items: 40
Statistics (N valid = 118)
Statistic | *Level 1 (10 items) | Level 2 (15 items) | Level 3 (15 items)
Mean | 9.88 | 11.62 | 6.42
Median | 10.00 | 12.00 | 6.00
Mode | 10 | 12 | 4
Std. Deviation | .417 | 1.912 | 3.393
Variance | .174 | 3.657 | 11.511
Skewness | -4.386 | -.678 | .226
Std. Error of Skewness | .223 | .223 | .223
Kurtosis | 22.795 | .155 | -.966
Std. Error of Kurtosis | .442 | .442 | .442
Range | 3 | 8 | 15
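The reliability and distribution figures above come from SPSS. For a quick cross-check outside SPSS, the sketch below computes Cronbach's alpha from a candidates-by-items score matrix, plus the adjusted sample skewness and kurtosis and their standard errors, using the bias-corrected formulas that SPSS and Excel's SKEW/KURT report. The function names are hypothetical. The standard errors depend only on N, so for N = 118 they reproduce the .223 and .442 in the table.

```python
import math
import statistics


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; one row per candidate, one column per item."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least 2 items")
    item_vars = [statistics.variance(col) for col in zip(*scores)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def skewness(x: list[float]) -> float:
    """Adjusted sample skewness (G1), as reported by SPSS and Excel SKEW."""
    n, m, s = len(x), statistics.mean(x), statistics.stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def kurtosis(x: list[float]) -> float:
    """Adjusted sample excess kurtosis (G2), as reported by SPSS and Excel KURT."""
    n, m, s = len(x), statistics.mean(x), statistics.stdev(x)
    fourth = sum(((v - m) / s) ** 4 for v in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def se_skewness(n: int) -> float:
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


print(round(se_skewness(118), 3), round(se_kurtosis(118), 3))  # 0.223 0.442
```

Dividing a skewness or kurtosis value by its standard error gives a rough test of departure from normality; for Level 1, -4.386 / .223 is far beyond 2, which matches a ceiling effect on a level most candidates have clearly mastered.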
16
Statistical Operations in Reading Test Analysis
17
Statistical Operations in Reading Test Analysis
The distribution of candidates' scores (0-15) per level shows some overlap among outliers, but most scores do not overlap. (*Level 1 is excluded because it has fewer items (10).)
18
Statistical Operations in Reading Test Analysis
The distribution of item facility indexes also shows some overlap.
19
Statistical Operations in Reading Test Analysis
Average facility value per level*: L1 = 98.8%, L2 = 77.5%, L3 = 42.8%
20
Comparison Chart
*Approximations for comparison purposes, not equivalences
STANAG 6001 Levels ECL/ALCPT Score CEFR Scale ALTE Levels Cambridge ESOL Certificates IELTS Band score 1 50 – 65 A1 Basic user KET 3.0 – 4.0 A2 2 66 – 85 B1 Independent user PET 4.5 – 5.0 (Threshold) B2 3 FCE 5.5 – 6.5 (Plus) 86 – 100 C1 Competent user 4 CAE 7.0 – 9.0 C2 5 CPE
21
Test Results Correlation (*different test construct)
Columns: Candidate No. | Rank | Last name | First name | ALCPT score | ALCPT value | STANAG L, R | SLP
1 xxx 85 2 2232 58 1221 3 61 2121 4 73 2222 5 78 6 77 3222 7 79 8 82 2221 9 65 10 11 80 12 86 13 72 14 68 15 62 16 32 1121 17 75 18 19 76 20 39 21 56 22 69 23 70 24 21+21+ 25 55 26 27 63 11+11+ 28 29 30 81 31 59 2122 33 2111 34 48 1111 35 2121+ 36 21+21 37 38 40 1010
Correlation between ALCPT value and STANAG L, R: Pearson r = .675** (Sig. 2-tailed .000, N = 38)
**Correlation is significant at the 0.01 level (2-tailed).
22
SLPs Re-Testing Results
Testing cycle | October 2015 | February 2016 | July 2016
No. of retested candidates | 70 | 76 | 56
Confirmed SLP | 16 (22.9%) | 10 (13.2%) | 24 (42.9%)
Slightly weaker SLP (by at least a "+" in one of the four skills) | 19 (27.1%) | 15 (19.7%) | 7 (12.5%)
Slightly improved SLP (by at least a "+" in one of the four skills) | 35 (50%) | 51 (67.1%) | 25 (44.6%)
*The results of retested candidates are as expected. There is typically a 3-5 year gap between testing and retesting, during which most candidates have had some language training that improved their skills. However, these shifts are not dramatic:
2222 ↔ 222+2 ↔ 2+222 ↔ 2+22+2
21+21+ ↔ 2221+ ↔ 21+21
3232 ↔ 32+32 ↔ 32+32+
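Retest outcomes like these can be classified automatically by comparing the old and new four-skill SLPs. The sketch below assumes the standard SLP order (listening, speaking, reading, writing) and codes a plus level as half a step. The presentation does not say how an SLP that rises in one skill and falls in another was counted, so the sketch reports such cases separately as "mixed"; that category and all function names are assumptions.

```python
import re

SKILLS = ("listening", "speaking", "reading", "writing")  # order of digits in an SLP


def parse_slp(slp: str) -> list[str]:
    """Split an SLP such as '21+21+' into four levels: ['2', '1+', '2', '1+']."""
    levels = re.findall(r"[0-5]\+?", slp)
    if len(levels) != len(SKILLS) or "".join(levels) != slp:
        raise ValueError(f"not a four-skill SLP: {slp!r}")
    return levels


def level_value(level: str) -> float:
    # Assumption: '+' adds half a step (e.g. '2+' -> 2.5).
    return int(level[0]) + (0.5 if level.endswith("+") else 0.0)


def compare_slps(old: str, new: str) -> str:
    """Classify a retest result as confirmed, improved, weaker or mixed."""
    deltas = [level_value(b) - level_value(a)
              for a, b in zip(parse_slp(old), parse_slp(new))]
    if all(d == 0 for d in deltas):
        return "confirmed"
    if all(d >= 0 for d in deltas):
        return "improved"
    if all(d <= 0 for d in deltas):
        return "weaker"
    return "mixed"


print(compare_slps("2222", "222+2"))   # improved
print(compare_slps("32+32+", "3232"))  # weaker
```

Counting the labels over one testing cycle and dividing by the number of retested candidates gives the percentages in the table (e.g. 16 / 70 = 22.9%).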
23
Pretesting Locally at the Military Academy
- Selected senior-year Military Academy cadets with four years of continuous English language training
- The upside: cheap, easy to organize, a good sample, test-taker demographics similar enough
- The downside: certain limitations due to the cadets' lack of real-life and job experience
- Pretesting abroad is currently unavailable due to budget cuts and organizational complexity.
- Pretesting materials are secure because cadets are normally tested in a separate testing session and are not eligible for retesting for another 3 years.
24
Cooperation with Other Language Professionals
Cooperation with English language professionals, experts and teachers within the defence system exists at all levels and in all forms (Military Academy Department of Foreign Languages, GS J-7 Training and Doctrine Department – Group for English Language Training, PELT part-time English language experts and lecturers, etc.).
English teachers act as invigilators and interlocutors, and as expert judges when determining content and face validity, cut-off scores, feedback, etc.
25
Cooperation with Other Functional Units in HR Sector
- Reporting the test results
- Interpreting STANAG 6001 language proficiency levels to people who are not language professionals
- Consulting with the personnel departments in the MoD and GS, the Centre for Peacekeeping Operations, the Military Academy, the National Defence School, etc. on language-related career matters: language requirements for appointments, attending courses abroad, participation in PK missions, etc.
26
Thank you for your attention
Questions?