1
Are We All On the Same Page? An Exploratory Study of OPI Ratings Across NATO Countries Using the NATO STANAG 6001 Scale*
Julie J. Dubeau
Canadian Defence Academy
BILC Conference, San Antonio, Texas, May
*This research was conducted as an MA thesis, Carleton University, September 06
2
Presentation Outline
  Context
  Research Questions
  Literature Review
  Methodology
  Results
    Ratings
    Raters
    Scale use
  Conclusion
3
NATO Language Testing Context
  Standardized Language Profile (SLP) based on the NATO Standardization Agreement (NATO STANAG) 6001 Language Proficiency Levels
  26 NATO countries, 20 Partnership for Peace (PfP) countries
  Interoperability is essential
4
Research Questions
The overarching research question was: How comparable or consistent are ratings across NATO raters and countries?
5
Research Questions
  Research questions pertaining to the ratings (RQ1)
  Research questions pertaining to raters’ training and background (RQ2)
  Research questions pertaining to the rating process and to the scale (RQ3)
6
Literature Review
  Testing Constructs: What are we testing?
  Rater Variance: How do raters vary?
7
Methodology
  Design of study: Exploratory survey
  Participants: Recruited at Sofia BILC 05
    103 raters from 18 countries and 2 NATO units
    Control group
8
Methodology
Instrumentation & Procedure & Analysis
  Rater data questionnaire
  2 Oral Proficiency Interviews (OPIs), A & B
  Questionnaire accompanying each sample OPI
9
Methodology
Analysis
  Rating comparisons
    Original ratings
    ‘Plus’ ratings
  Rater comparisons
    Training
    Background
10
Methodology
  Country-to-country comparisons
    Within-country dispersion
  Rating process
    Rating factors
    Rater/scale interaction
    Scale user-friendliness
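One way to look at country-to-country differences and within-country dispersion is to compute each country's mean rating and the spread of ratings inside it. A minimal sketch in Python: the country labels and ratings are invented for illustration (they are not study data), plus levels are coded as .5 (an assumption), and `country_stats` is a hypothetical helper, not the study's procedure.

```python
from collections import defaultdict
from statistics import mean, stdev

# HYPOTHETICAL ratings (not study data): (country label, rating).
# ASSUMPTION: plus levels are coded as +0.5, so 1+ becomes 1.5.
ratings = [
    ("C1", 1.0), ("C1", 2.0),
    ("C2", 1.5), ("C2", 1.5),
    ("C3", 2.0), ("C3", 1.0), ("C3", 1.5),
]


def country_stats(ratings):
    """Return {country: (mean rating, sample standard deviation)}."""
    by_country = defaultdict(list)
    for country, rating in ratings:
        by_country[country].append(rating)
    stats = {}
    for country, values in by_country.items():
        # stdev needs at least two ratings; report 0.0 for a single rater.
        spread = stdev(values) if len(values) > 1 else 0.0
        stats[country] = (mean(values), spread)
    return stats


stats = country_stats(ratings)
country_means = [m for m, _ in stats.values()]
# Spread of the country means: a rough measure of between-country difference.
between_country = stdev(country_means)
```

In this invented data every country has the same mean, yet raters inside C1 and C3 disagree with each other, so the dispersion is entirely within countries rather than between them.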
11
Results RQ1 - Summary
Ratings: To compare OPI ratings in NATO countries, and to explore the efficacy of ‘plus levels’ or plus ratings.
  Some rater-to-rater differences
  ‘Plus’ levels brought ratings closer to the mean
  Some country-to-country differences
  Greater ‘within-country’ dispersion
  Low correlation between samples A & B
12
Results All Ratings for Sample A (level 1)
Levels   Numbers   %
1        46        44.7
1+       14        13.6
2        40        38.8
2+       2         1.9
3        1         1.0
Total    103       100.0
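The percentage column can be checked from the counts. A minimal sketch in Python, assuming N = 103 raters and assuming the counts for levels 2+ and 3 are 2 and 1 (these two counts are inferred from the reported percentages and the total); `percent_table` is a hypothetical helper name.

```python
# Counts of Sample A ratings per STANAG level, as read from the slide.
# ASSUMPTION: the counts for "2+" (2) and "3" (1) are inferred from the
# reported percentages (1.9 and 1.0) and the total of 103.
counts = {"1": 46, "1+": 14, "2": 40, "2+": 2, "3": 1}


def percent_table(counts):
    """Return {level: percentage of all ratings}, rounded to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


if __name__ == "__main__":
    # Print one row per level: level, count, percentage.
    for level, pct in percent_table(counts).items():
        print(f"{level:<3} {counts[level]:>4} {pct:>6.1f}")
```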
13
Results All Ratings (with +) for Sample A
Levels                  Numbers   %
Within Level 1 range    70        68.0
Within Level 2 range    32        31.1
Within Level 3 range    1         1.0
Total                   103       100.0
14
View of OPI ratings sample A
[Figure: stacked bar chart of Sample A ratings, adjusted scores with ‘pluses’. Y-axis: Count. Bars: within level 1, within level 2, within level 3.]
15
All Countries’ Means for Sample A
[Figure: overall country mean rating for Sample A, by country number (1 to 20). Axis: Overall Country Mean, 1.00 to 2.40.]
16
Results All Ratings for Sample B (level 2)
Levels   Numbers   %
1        2         1.9
1+       1         1.0
2        47        45.6
2+       8         7.8
3        34        33.0
3+       4         3.9
Total    96        93.2
17
View of OPI ratings sample B
18
All Countries’ Means for Sample B
19
Samples A & B
  Spearman rank-order correlation coefficient: ρ = .57
  Pearson product-moment correlation coefficient: r = .55
  = low statistical correlations between the two sets of data (Samples A & B)
  = no consistency from raters
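For reference, both coefficients can be computed from each rater's pair of ratings (Sample A, Sample B). A minimal sketch in Python using only the standard library: the ratings are invented (they are not study data), plus levels are coded as .5 (an assumption), and the helper names are hypothetical. Spearman's ρ is computed here as Pearson's r on average ranks, which copes with the many tied ratings a short scale produces.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL ratings by the same six raters (not study data).
# ASSUMPTION: plus levels are coded as +0.5, so 1+ becomes 1.5.
sample_a = [1.0, 1.5, 2.0, 1.0, 2.0, 1.5]
sample_b = [2.0, 2.0, 3.0, 2.5, 3.0, 2.0]

rho = spearman(sample_a, sample_b)
r = pearson(sample_a, sample_b)
```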
20
Results RQ2 - Summary
Raters: To investigate rater training and scale training and see how (or if) they impacted the ratings, and to explore how various background characteristics impacted the ratings.
  Trained raters scored within the mean, especially for sample B
  Experienced raters did not do as well as scale-trained raters
  Full-time raters closer to mean
  ‘New’ NATO raters closer to mean
  No difference in ratings between native-speaker (NS) and non-native-speaker (NNS) raters
21
Tester (Rater) Training
[Figure: bar chart of tester (rater) training. Y-axis: Frequency. None to little: 36.73%; substantial to lots: 63.27%.]
22
Rating B and Tester Training Crosstabulation
                   Summary of Tester Trg
Score B correct?   Little   Lots   Total
Yes                14       44     58
No                 20       14     34
Missing            2        4      6
Total              36       62     98
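The crosstabulation can be summarised as the share of each training group that rated Sample B correctly. A minimal sketch in Python, assuming the counts read as Little = 14 Yes / 20 No / 2 Missing and Lots = 44 Yes / 14 No / 4 Missing, and assuming missing ratings stay in each group's denominator; `share_correct` is a hypothetical helper.

```python
# Counts read from the crosstabulation (rating of Sample B by tester training).
# ASSUMPTION: "Missing" ratings are kept in each group's denominator.
crosstab = {
    "Little": {"Yes": 14, "No": 20, "Missing": 2},
    "Lots": {"Yes": 44, "No": 14, "Missing": 4},
}


def share_correct(crosstab):
    """Return {group: percent of the group whose rating was correct}."""
    return {
        group: round(100 * cells["Yes"] / sum(cells.values()), 1)
        for group, cells in crosstab.items()
    }


shares = share_correct(crosstab)
```

On these counts, about 39% of raters with little or no tester training and about 71% of raters with substantial training gave the correct rating for Sample B.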
23
STANAG Scale Training
[Figure: bar chart of STANAG scale training. Y-axis: Percent. None to little: 60.0%; substantial to lots: 40.0%.]
24
Rating B and STANAG Training Crosstabulation
                    Summary of STANAG Trg
Rating B correct?   Little   Lots   Total
Yes                 28       29     57
No                  24       8      32
Missing             5        1      6
Total               57       38     95
25
Years Experience
[Figure: bar chart of raters' years of experience. Y-axis: Frequency. 0 to 1 year: 19.8%; 2 to 3 years: 14.85%; 4 to 5 years: 15.84%; 5 years +: 49.5%.]
26
Rating B and 4 Yrs Experience Crosstabulation
Rating B correct?   3 yrs or less   4 yrs or more   Total
Yes                 26              34              60
No                  6               29              35
Missing             3               3               6
Total               35              66              101
27
Results Raters’ Background
  Work in Testing Full-time? Yes (33.0 %), No (65.0 %)
  Full-time testers more reliable
  60% were NNS
  53% were from ‘older’ NATO countries
28
‘Old’ & ‘New’ NATO Countries
            Rating B Correct?
New NATO?   Yes   No   Other/Missing   Total
Yes         27    6    4               37
No          27    26   2               55
Total       54    32   6               92
29
‘Old’ & ‘New’ NATO Countries
            Summary of Tester Trg
New NATO?   Little   Lots   Total
Yes         6        30     36
No          23       28     51
Total       29       58     87
30
Results RQ3 - Summary
Scale: To explore the ways in which raters used the various STANAG statements and rating factors to arrive at their ratings.
  Rating process did not affect ratings significantly
  Rating factors not equal everywhere
  3 main ‘types’ of raters emerged:
    Evidence-based
    Intuitive
    Extra-contextual
31
Results An ‘evidence-based’ rating for Sample B (level 2):
This candidate’s performance cannot be rated as 2+. Grammatical/structural control is inadequate and does not rise above (even occasionally) into the upper level. Mispronunciation detracts from the delivery and can be problematic. No evidence of well-controlled but extended discourse. No clear evidence of the use of even some complex structures that might raise the performance to the + level. Finally, there is no evidence that the performance rises and crosses into level 3. (Rater 36)
32
Results An ‘intuitive’ rating for Sample A (level 1):
I would say that just about every single sentence in the interpretation of the level 2 speaking could be applied to this man. And because of that I would say that he is literally at the top of level 2. He is on the verge of level 3 literally. So I would automatically up him to a low 3. (Rater 1)
33
Results An ‘extra-contextual’ rating for Sample A (level 1):
I wouldn’t give him a 2 plus but I would give him a 3 minus. I have to admit that I am basing that decision on the fact that by demonstrating he is a high 2 in every single aspect of the description of a level 2, I would give him a sort of vote of confidence that in any job abroad he might have a hard time at first but I think he could handle really working in the language. (Rater 1)
34
Results An ‘extra-contextual’ rating for Sample A (level 1):
Yes! I would be happy to give him a 1+. Since we do not use ‘plus levels’ I am afraid that rating him as a clear 1 would disadvantage him and, for this reason, I would rather give him a very low 2. (Rater 20)
35
Results An ‘extra-contextual’ rating for Sample A (level 1):
I got to question 7 and re-read the STANAG document and now I think ‘2’ is more appropriate. (Rater 95)
***
Level 3 is the basic level needed for officers in (my country). I think the candidate could perform the tasks required of him. He could easily be bulldozed by native speakers in a meeting, but would hold his own with non-native speakers. He makes mistakes that very rarely distort meaning and are rarely disturbing. (Rater 95)
36
Results Control group:
  Comparable ratings to the less-trained group of participants
  Evidence-based ratings
37
Implications
  Plus levels beneficial
  Training uneven
  Frequent re-training
  Different grids
  Institutional perspectives
38
Limitations & Future Research
  OPIs new to some participants
  Future research could:
    Get participants to test
    Investigate rating grids
    Look at other skills
39
Conclusion
So, are we all on the same page?
YES! BUT…
  Plus levels were instrumental in bridging gap
  Training was found to be key to reliability
  More in-country norming should be the first step toward international benchmarking
40
Thank You! Questions?
Are We All On the Same Page? An Exploratory Study of OPI Ratings Across NATO Countries Using the NATO STANAG 6001 Scale
Julie J. Dubeau
The full thesis is available on the CDA website (a condensed article is also forthcoming)