Introduction to the Validation Phase
Relating language examinations to the Common European Framework of Reference for Languages
Gábor Szabó
ECML RelEx Workshop, Graz, May 2009
Suggested Linking Procedures in the Manual
Familiarisation with the CEFR
Linking on the basis of specification of examination content
Standardisation and benchmarking
Standard setting
Validation: checking that exam results relate to CEFR levels as intended
What is validity? Does the test measure what it intends to measure?
The degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.
Traditional classification of validity:
Content validity
Construct validity
Criterion-related validity
Face validity
More modern approach: validity seen as a single unitary construct
Aspects of validity
Content validity
Operational validity: pilots and pretests
Psychometric aspects
Procedural validity of standardization
Internal validity of standard setting
External validation
Content validity
Does the test accurately reflect both the syllabus on which it is based and the descriptors in the CEFR?
Does the content specification cover all areas to be assessed in suitable proportions?
Operational validity
Do pilot populations accurately represent the target population of the test?
Is the pilot test takers’ performance representative of their true ability? (response validity)
These concepts are explained below.
Psychometric aspects
Do the test’s psychometric qualities support validity claims?
CTT-based results
Test-level data: reliability figures; mean, mode, median; standard deviation; measurement error; score distribution
Item-level data: facility values; discrimination indices
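The test- and item-level CTT figures listed above can be sketched for a small dichotomous (0/1) response matrix. This is a minimal illustration, not the workshop's own analysis; the data and function names are invented:

```python
# Minimal sketch of CTT indices for a dichotomous response matrix:
# rows = candidates, columns = items. Illustrative data only.

def facility_values(responses):
    """Proportion of candidates answering each item correctly."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]

def kr20(responses):
    """KR-20 reliability estimate for dichotomous items."""
    n = len(responses)
    k = len(responses[0])
    p = facility_values(responses)
    pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - pq / var)

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(facility_values(responses))
print(round(kr20(responses), 3))
```

A full pilot analysis would add discrimination indices (e.g. point-biserial correlations) and the score distribution alongside these figures.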
Psychometric aspects
Do the test’s psychometric qualities support validity claims?
IRT-based results
Psychometric aspects IRT
[Figure: item characteristic curves — probability of success (Px) plotted against ability from low to high, for items of low to high difficulty]
Psychometric aspects IRT
[Figure: item Y of a given difficulty, with persons distributed along the ability scale from low to high]
Psychometric aspects IRT
[Figure: map of persons and items — persons (X) and items plotted against a common logit measure scale]
Psychometric aspects IRT
[Table: item statistics in entry order — score, count, measure, error, infit and outfit MNSQ, and point-biserial correlation for items 1–10]
Psychometric aspects
Do the test’s psychometric qualities support validity claims?
IRT-based results
Item difficulty figures
Person ability figures
Fit statistics: items, persons
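The quantities behind these figures follow from the Rasch model, in which the probability of success depends only on the gap between person ability and item difficulty, both on the logit scale. A minimal sketch, with an intentionally naive difficulty estimate (a full Rasch calibration would estimate all parameters jointly):

```python
import math

# Rasch relationship: P(correct) depends only on ability minus difficulty.
# naive_difficulty_logit is a crude illustration, not a real calibration.

def rasch_probability(theta, b):
    """P(correct) for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def naive_difficulty_logit(facility):
    """Log-odds of an incorrect response: crude item difficulty."""
    return math.log((1.0 - facility) / facility)

# A person whose ability equals the item's difficulty has a 50% chance:
print(rasch_probability(0.0, 0.0))             # 0.5
# An easier item (b = -1) gives the same person a higher chance:
print(round(rasch_probability(0.0, -1.0), 3))  # 0.731
# An item answered correctly by 80% of candidates sits near -1.39 logits:
print(round(naive_difficulty_logit(0.8), 2))   # -1.39
```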
Procedural validity of standardization
Has the standard-setting procedure had its intended effects?
Was the training effective?
Did the judges feel free to follow their own insights?
Internal validity of standard setting
Can the judges’ judgements be trusted?
Are judges consistent within themselves?
Are judges consistent with each other?
Can the aggregated standard be considered the definitive standard?
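The inter-judge consistency question can be illustrated with a simple pairwise exact-agreement index. The judges, items, and CEFR ratings below are invented for demonstration; operational standard setting would use richer statistics (e.g. kappa or many-facet Rasch analysis):

```python
from itertools import combinations

# Pairwise exact agreement on the CEFR levels each judge assigned
# to the same set of items. Illustrative data only.
ratings = {
    "judge_1": ["B1", "B2", "B2", "C1", "B2", "B1"],
    "judge_2": ["B1", "B2", "B1", "C1", "B2", "B1"],
    "judge_3": ["B1", "B2", "B2", "C1", "C1", "B1"],
}

def pairwise_agreement(ratings):
    """Mean proportion of items on which each pair of judges agrees."""
    pairs = list(combinations(ratings.values(), 2))
    props = [
        sum(a == b for a, b in zip(r1, r2)) / len(r1) for r1, r2 in pairs
    ]
    return sum(props) / len(props)

print(round(pairwise_agreement(ratings), 3))  # 0.778
```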
External validation
Establishing the validity of a test in relation to an external point of reference (the CEFR)
Correlation analysis
Validation of standardization
Teacher judgements
Application of anchor tests
External validation Correlation analysis
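A correlation analysis of this kind can be sketched as a Pearson correlation between scores on the test to be validated and an external criterion measure. The scores below are invented for illustration:

```python
import math

# Pearson correlation between exam scores and an external criterion.
# Both score lists are invented for demonstration.

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

exam_scores      = [55, 62, 47, 70, 58, 66, 51, 74]
criterion_scores = [52, 60, 50, 68, 55, 70, 48, 72]
print(round(pearson_r(exam_scores, criterion_scores), 3))
```

A high correlation supports, but does not by itself establish, the claimed link: the criterion measure must itself be validly linked to the CEFR.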
External validation Validation of standardization
To what extent is the judges’ standard valid?
Decision tables: criterion test vs. test to be validated
Contingency value
External validation Validation of standardization – decision tables

                    Test 2: B2   Test 2: below B2
Test 1: B2              152            12
Test 1: below B2          8           128

Contingency value = (152 + 128) / 300 = 93.333%
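The contingency value can be computed directly from a decision table as the proportion of candidates classified identically by both tests. A minimal sketch, with an illustrative helper name:

```python
# Contingency value for a decision table: rows are one test's B2
# classifications, columns the other's. Cell values from the table above.

def contingency_value(table):
    """Proportion of candidates classified identically by both tests."""
    agreements = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return agreements / total

table = [
    [152, 12],   # Test 1 says B2:       152 agree, 12 disagree
    [8, 128],    # Test 1 says below B2:   8 disagree, 128 agree
]
print(round(contingency_value(table) * 100, 3))  # 93.333
```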
External validation Teacher judgements
To what extent do decisions based on test results coincide with teacher judgements?
Decision tables
Box plots
The least certain form of external empirical validation
Heavily dependent on teachers’ interpretation of CEFR descriptors
External validation Application of anchor tests
To what extent do item logit values coincide between the anchor test and the test to be validated?
To what extent do candidates’ ability logit values coincide across the two tests?
Checking item and person fit
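The anchor-item comparison can be sketched by estimating the shift between the two calibrations' scale origins and the residual displacement of each anchor item. The logit values and the 0.5-logit flag threshold below are illustrative assumptions, not figures from the workshop:

```python
# Same anchor items calibrated in two tests: their difficulty logits
# should agree up to a constant shift of scale origin. Invented values.
anchor_in_reference = [-1.2, -0.4, 0.1, 0.8, 1.5]
anchor_in_new_test  = [-0.9, -0.2, 0.4, 1.0, 1.9]

# Mean shift between the two scales (the linking constant):
shift = sum(
    b - a for a, b in zip(anchor_in_reference, anchor_in_new_test)
) / len(anchor_in_reference)

# Displacement of each item after removing the shift; large values
# (here, > 0.5 logits, an illustrative threshold) would flag drift.
displacements = [
    round(b - a - shift, 2)
    for a, b in zip(anchor_in_reference, anchor_in_new_test)
]
print(round(shift, 2), displacements)
```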
External validation Application of anchor tests
[Figures: item logit values of the anchor test plotted against the test to be validated — graphics not recoverable from the transcript]
External validation An example
Design: applying tests already linked to the CEFR as a point of reference
ECL task vs. reference task
Estimating IRT-based item difficulties
Comparing item difficulty logits in the two tasks
External validation An example: English B2 reading
[Figures: comparison of item difficulty logits for the English B2 reading tasks — graphics not recoverable from the transcript]
External validation Potential problems with empirical external validation
Availability of reference tests: ”less frequently taught languages” (LFTL)
Acceptance of reference tests: increasing criticism of reference tests
Problems with reference tests: often little or no evidence of an actual link between reference tests and the CEFR; task properties (item number)
External validation German B2 reading
[Figure: German B2 reading results — graphic not recoverable from the transcript]
External validation Suggested solutions
Availability: support for LFTL to develop and share tasks (ECML)
Acceptance: setting up transparent, consensus-based criteria for tasks to be accepted as reference points (this would also handle the task-quality issue)
Emphasize candidate-centred methods, but only where judges’/teachers’ familiarity with the CEFR is documented and guaranteed