Introduction to the Validation Phase
Relating language examinations to the Common European Framework of Reference for Languages
Gábor Szabó
ECML RelEx Workshop, Graz, May 2009
Suggested Linking Procedures in the Manual
Familiarisation with the CEFR
Linking on the basis of specification of examination content
Standardisation and benchmarking
Standard setting
Validation: checking that exam results relate to CEFR levels as intended
What is validity? Does the test measure what it intends to measure?
The degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.
Traditional classification of validity:
Content validity
Construct validity
Criterion-related validity
Face validity
More modern approach: validity seen as a single unitary construct
Aspects of validity
Content validity
Operational validity: pilots and pretests
Psychometric aspects
Procedural validity of standardization
Internal validity of standard setting
External validation
Content validity
Does the test accurately reflect both the syllabus on which it is based and the descriptors in the CEFR?
Does the content specification cover all areas to be assessed in suitable proportions?
Operational validity
Do pilot populations accurately represent the target population of the test?
Is the pilot test takers’ performance representative of their true ability? (response validity)
These concepts are explained below.
Psychometric aspects
Do the test’s psychometric qualities support validity claims?
CTT-based results
Test-level data: reliability figures; mean, mode, median; standard deviation; measurement error; score distribution
Item-level data: facility values; discrimination indices
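The test- and item-level CTT figures listed above can be sketched for a small dichotomous (0/1) response matrix. This is a minimal illustration, not the workshop's own analysis; the data and function names are invented:

```python
# Minimal sketch of CTT indices for a dichotomous response matrix:
# rows = candidates, columns = items. Illustrative data only.

def facility_values(responses):
    """Proportion of candidates answering each item correctly."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]

def kr20(responses):
    """KR-20 reliability estimate for dichotomous items."""
    n = len(responses)
    k = len(responses[0])
    p = facility_values(responses)
    pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - pq / var)

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(facility_values(responses))
print(round(kr20(responses), 3))
```

A full pilot analysis would add discrimination indices (e.g. point-biserial correlations) and the score distribution alongside these figures.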
Psychometric aspects
Do the test’s psychometric qualities support validity claims?
IRT-based results
Psychometric aspects IRT
[Figure: item characteristic curves — probability of success (Px) plotted against ability from low to high, for items of low to high difficulty]
Psychometric aspects IRT
[Figure: item Y of a given difficulty, with persons distributed along the ability scale from low to high]
Psychometric aspects IRT
[Figure: map of persons and items — persons (X) and items plotted against a common logit measure scale]
Psychometric aspects IRT
[Table: item statistics in entry order — score, count, measure, error, infit and outfit MNSQ, and point-biserial correlation for items 1–10]
Psychometric aspects
Do the test’s psychometric qualities support validity claims?
IRT-based results
Item difficulty figures
Person ability figures
Fit statistics: items, persons
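The quantities behind these figures follow from the Rasch model, in which the probability of success depends only on the gap between person ability and item difficulty, both on the logit scale. A minimal sketch, with an intentionally naive difficulty estimate (a full Rasch calibration would estimate all parameters jointly):

```python
import math

# Rasch relationship: P(correct) depends only on ability minus difficulty.
# naive_difficulty_logit is a crude illustration, not a real calibration.

def rasch_probability(theta, b):
    """P(correct) for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def naive_difficulty_logit(facility):
    """Log-odds of an incorrect response: crude item difficulty."""
    return math.log((1.0 - facility) / facility)

# A person whose ability equals the item's difficulty has a 50% chance:
print(rasch_probability(0.0, 0.0))             # 0.5
# An easier item (b = -1) gives the same person a higher chance:
print(round(rasch_probability(0.0, -1.0), 3))  # 0.731
# An item answered correctly by 80% of candidates sits near -1.39 logits:
print(round(naive_difficulty_logit(0.8), 2))   # -1.39
```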
Procedural validity of standardization
Has the standard-setting procedure had its intended effects?
Was the training effective?
Did the judges feel free to follow their own insights?
Internal validity of standard setting
Can the judges’ judgements be trusted?
Are judges consistent within themselves?
Are judges consistent with each other?
Can the aggregated standard be considered the definitive standard?
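The inter-judge consistency question can be illustrated with a simple pairwise exact-agreement index. The judges, items, and CEFR ratings below are invented for demonstration; operational standard setting would use richer statistics (e.g. kappa or many-facet Rasch analysis):

```python
from itertools import combinations

# Pairwise exact agreement on the CEFR levels each judge assigned
# to the same set of items. Illustrative data only.
ratings = {
    "judge_1": ["B1", "B2", "B2", "C1", "B2", "B1"],
    "judge_2": ["B1", "B2", "B1", "C1", "B2", "B1"],
    "judge_3": ["B1", "B2", "B2", "C1", "C1", "B1"],
}

def pairwise_agreement(ratings):
    """Mean proportion of items on which each pair of judges agrees."""
    pairs = list(combinations(ratings.values(), 2))
    props = [
        sum(a == b for a, b in zip(r1, r2)) / len(r1) for r1, r2 in pairs
    ]
    return sum(props) / len(props)

print(round(pairwise_agreement(ratings), 3))  # 0.778
```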
External validation
Establishing the validity of a test in relation to an external point of reference (the CEFR)
Correlation analysis
Validation of standardization
Teacher judgements
Application of anchor tests
External validation Correlation analysis
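A correlation analysis of this kind can be sketched as a Pearson correlation between scores on the test to be validated and an external criterion measure. The scores below are invented for illustration:

```python
import math

# Pearson correlation between exam scores and an external criterion.
# Both score lists are invented for demonstration.

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

exam_scores      = [55, 62, 47, 70, 58, 66, 51, 74]
criterion_scores = [52, 60, 50, 68, 55, 70, 48, 72]
print(round(pearson_r(exam_scores, criterion_scores), 3))
```

A high correlation supports, but does not by itself establish, the claimed link: the criterion measure must itself be validly linked to the CEFR.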
External validation Validation of standardization
To what extent is the judges’ standard valid?
Decision tables: criterion test vs. test to be validated
Contingency value
External validation Validation of standardization – decision tables

                    Test 2: B2   Test 2: below B2
Test 1: B2              152            12
Test 1: below B2          8           128

Contingency value = (152 + 128) / 300 = 93.333%
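The contingency value can be computed directly from a decision table as the proportion of candidates classified identically by both tests. A minimal sketch, with an illustrative helper name:

```python
# Contingency value for a decision table: rows are one test's B2
# classifications, columns the other's. Cell values from the table above.

def contingency_value(table):
    """Proportion of candidates classified identically by both tests."""
    agreements = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return agreements / total

table = [
    [152, 12],   # Test 1 says B2:       152 agree, 12 disagree
    [8, 128],    # Test 1 says below B2:   8 disagree, 128 agree
]
print(round(contingency_value(table) * 100, 3))  # 93.333
```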
External validation Teacher judgements
To what extent do decisions based on test results coincide with teacher judgements?
Decision tables
Box plots
The least certain form of external empirical validation
Heavily dependent on teachers’ interpretation of CEFR descriptors
External validation Application of anchor tests
To what extent do item logit values coincide between the anchor test and the test to be validated?
To what extent do candidates’ ability logit values coincide across the two tests?
Checking item and person fit
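The anchor-item comparison can be sketched by estimating the shift between the two calibrations' scale origins and the residual displacement of each anchor item. The logit values and the 0.5-logit flag threshold below are illustrative assumptions, not figures from the workshop:

```python
# Same anchor items calibrated in two tests: their difficulty logits
# should agree up to a constant shift of scale origin. Invented values.
anchor_in_reference = [-1.2, -0.4, 0.1, 0.8, 1.5]
anchor_in_new_test  = [-0.9, -0.2, 0.4, 1.0, 1.9]

# Mean shift between the two scales (the linking constant):
shift = sum(
    b - a for a, b in zip(anchor_in_reference, anchor_in_new_test)
) / len(anchor_in_reference)

# Displacement of each item after removing the shift; large values
# (here, > 0.5 logits, an illustrative threshold) would flag drift.
displacements = [
    round(b - a - shift, 2)
    for a, b in zip(anchor_in_reference, anchor_in_new_test)
]
print(round(shift, 2), displacements)
```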
External validation Application of anchor tests
[Figures: item logit values of the anchor test plotted against the test to be validated — graphics not recoverable from the transcript]
External validation An example
Design: applying tests already linked to the CEFR as a point of reference
ECL task vs. reference task
Estimating IRT-based item difficulties
Comparing item difficulty logits in the two tasks
External validation An example: English B2 reading
[Figures: comparison of item difficulty logits for the English B2 reading tasks — graphics not recoverable from the transcript]
External validation Potential problems with empirical external validation
Availability of reference tests: ”less frequently taught languages” (LFTL)
Acceptance of reference tests: increasing criticism of reference tests
Problems with reference tests: often little or no evidence of an actual link between reference tests and the CEFR; task properties (item number)
External validation German B2 reading
[Figure: German B2 reading results — graphic not recoverable from the transcript]
External validation Suggested solutions
Availability: support for LFTL to develop and share tasks (ECML)
Acceptance: setting up transparent, consensus-based criteria for tasks to be accepted as reference points (this would also handle the task-quality issue)
Emphasize candidate-centred methods, but only where judges’/teachers’ familiarity with the CEFR is documented and guaranteed