Published by Barnaby Stewart. Modified over 9 years ago.
1
VALIDITY AND TEST VALIDATION
Prepared by Olga Simonova, Inna Chmykh, Svetlana Borisova, Olga Kuznetsova
Based on materials by Anthony Green
2
Validity
ABC Test of English: Results
Ivana 45%
Irina 78%
Which student is better at English?
3
Validity
[Diagram: Language Ability and Assessment tasks]
Some aspects may not be tested: construct under-representation.
4
Validity Language Ability
Some abilities that are important to success in a test may not be connected to real-world language abilities: ability to cope with exam stress; awareness of how multiple-choice questions are written; willingness to guess etc. These are construct irrelevant factors.
5
What is validity?
Tests are tools for helping us to make good decisions.
Construct relevance: a test of maths (even if it’s very reliable) can’t tell us about someone’s ability to sing; a test of written grammar can’t tell us much about someone’s ability to hold a conversation.
Construct representation: does the test cover all aspects of the relevant abilities?
6
What is validity?
‘validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests’ (American Educational Research Association et al., 1999)
This means that test results can be valid for one purpose and for one particular population of test takers, but not for others. A test may be valid for placement purposes on a general language course, but not for employment selection.
7
Building a validation argument
What do we want the results to mean? What evidence can we collect to find out if scores really support this interpretation?
evaluation – the test taker’s performance is a fair reflection of his/her abilities;
generalization – similar scores would be obtained if the test taker was given a different form of the test, or if the raters scoring his/her performance were different;
explanation – the test reflects a coherent theory of language ability;
utilisation – the tested abilities are relevant to the decision being made about the test taker.
8
Validation in the assessment cycle:
at different stages in the cycle, different questions need to be answered; different types of validity may be more relevant at each stage; tests made for different purposes raise different issues.
9
Building a validation argument:
Evaluation – the test taker’s performance is a fair reflection of his/her abilities. Test form and administration.
Generalization – similar scores would be obtained if the raters scoring his/her performance were different. Test score and rating scales.
Explanation – the test reflects a coherent theory of language ability. Specification.
Utilisation – the tested abilities are relevant to the decision being made about the test taker. Test purpose and target language use domain.
10
VALIDITY AND TEST VALIDATION
11
Validity in test design
“Tests for the measurement of language abilities must be constructed according to a coherent validity framework based on the latest developments in theory and practice.” (Weir, 2005)
12
Socio-cognitive approach
(O’Sullivan & Weir, 2010)
[Diagram: context validity; cognitive validity; test task performance; scoring validity; consequential validity; criterion-related validity]
13
Content (context) validity
Content validity is based on subject experts' judgments of test content. Does the content of the test adequately cover all the aspects of language ability we are interested in for making this decision?
14
Content (context) validity
A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. (Hughes, 2003)
The term content validity was traditionally used to refer to the content coverage of the task. Context validity is preferred as a more inclusive superordinate which signals the need to consider the discoursal, social and cultural context as well as the linguistic parameters under which the task is performed (its operations and conditions). (Weir and Shaw, 2005)
15
Cognitive (or theory-based) validity
Do test takers go through the same mental processes when responding to test tasks as when they use language in the real world in the situations we are interested in?
16
Cognitive (or theory-based) validity
Theory-based validity involves collecting a priori evidence through piloting and trialling before the test event, for example through verbal reports from test takers on the cognitive processing activated by the test task, and a posteriori evidence involving statistical analysis of scores following test administration. (Weir and Shaw, 2005)
17
Scoring validity
Scoring validity accounts for the extent to which test scores:
are based on appropriate criteria;
exhibit consensual agreement in their marking;
are as free as possible from measurement error;
are stable over time;
engender confidence as reliable decision making indicators.
(Weir and Shaw, 2005)
18
Scoring validity = reliability
Are the test scores consistent enough for us to have confidence in the results?
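One way to examine scoring consistency is to measure how far two raters agree beyond what chance alone would produce. The sketch below is a minimal illustration, assuming two raters award band scores to the same set of scripts. Cohen's kappa is used here as an illustrative statistic and is not named in the slides; the function name cohens_kappa and the sample ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    n = len(rater_a)
    # Proportion of scripts on which the two raters gave the same band.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's band frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical band throughout
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-5) from two raters for the same ten scripts.
rater_1 = [3, 4, 2, 5, 3, 3, 4, 2, 1, 4]
rater_2 = [3, 4, 2, 4, 3, 2, 4, 2, 1, 5]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

A value near 1 suggests the raters apply the scale in much the same way; a value near 0 suggests their agreement is little better than chance.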
19
Criterion-related validity
Criterion-related validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated. (Hughes, 2003)
Are the results of the test consistent with other evidence we have about test takers’ abilities?
Criterion-related validity takes two forms:
concurrent validity;
predictive validity.
20
Concurrent validity
“involves the comparison of the test scores with some other measures of the same candidates taken at roughly the same time as the test.” (Alderson et al., 1995:177)
Do scores on our test agree with the results of other tests of the same abilities?
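Comparing scores on two measures taken at roughly the same time is often done with a correlation coefficient. The following is a minimal sketch, assuming the same six candidates sat both a new test and an established one. The Pearson coefficient is one common choice and is not prescribed by the slides; the function name pearson_r and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical percentage scores for the same six candidates on both tests.
new_test = [45, 78, 62, 51, 88, 70]
established_test = [50, 80, 60, 55, 90, 65]
r = pearson_r(new_test, established_test)
print(f"r = {r:.2f}")
```

A high positive coefficient is evidence that the two tests rank candidates similarly, although it cannot show that either test measures the intended abilities.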
21
Predictive validity
Predictive validity entails the comparison of test scores with some other measure for the same candidates taken some time after the test has been given. (Alderson et al., 1995)
The degree to which a test can predict candidates' future performance. (Hughes, 2003)
Did the test accurately predict which test takers were going to perform best in their jobs/ in class/ etc.?
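Because the question here is whether the test identified who would later perform best, a rank-based comparison can be informative. The sketch below assumes entry test scores and later supervisor ratings for the same candidates. Spearman's rank correlation is used as an illustration and is not named in the slides; the function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (lowest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean = (len(rx) + 1) / 2  # mean rank, with or without ties
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var_x = sum((a - mean) ** 2 for a in rx)
    var_y = sum((b - mean) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one measure is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: entry test scores and supervisor ratings (1-5) a year later.
entry_scores = [45, 78, 62, 51, 88, 70]
later_ratings = [2, 4, 3, 3, 5, 4]
rho = spearman_rho(entry_scores, later_ratings)
print(f"rho = {rho:.2f}")
```

Ranks are used rather than raw values because the later measure may be on a coarse scale, such as a five-point rating, where only the ordering of candidates is meaningful.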
22
Consequential validity (impact)
Does the introduction and use of the test have the intended social consequences? Is there any: bias in scoring and interpretation of results? unfairness in test use? positive or negative effect on teaching and learning?
23
Face validity
Face validity refers to the test's “surface credibility or public acceptability” (Alderson et al., 1995:172). Bachman (1990:307) states that “face validity is the appearance of real life.”
Do test takers/ teachers/ politicians/ the public generally believe in the value of the test?
24
Face validity
The assessment is credible to users: it looks as though it measures the skills or abilities of interest. For example, a multiple choice grammar test does not look as though it really tests the ability to speak English in real-world situations. All kinds of evidence could be used to show that people who pass the test are actually able to communicate effectively, but users may not be convinced because test takers are not actually required to speak. If the test does not have face validity, it is unlikely to be successful.
25
Construct validity
In recent years the term construct validity has been used to refer to the general, overarching notion of validity. It is not enough to assert that a test has construct validity; empirical evidence is needed. (Hughes, 2003)
The arguments for using the test as a reasonable justification for taking any decision must be presented and examined: this process is validation.
26
Round-up: suitable data for test validity
Face validity
Questionnaires to and interviews with candidates, administrators and other users.
Context validity
a) Compare test content with specifications/syllabus.
b) Questionnaires to and interviews with 'experts' such as teachers, subject specialists, applied linguists.
c) Expert judges rate test items and texts according to a precise list of criteria.
Cognitive validity
Students introspect on their test-taking procedures, either concurrently or retrospectively. Keystroke logs. Eye-tracking.
Concurrent validity
a) Compare students' test scores with their scores on another test.
b) Compare students' test scores with teachers' rankings.
c) Compare students' test scores with other measures of ability such as students' teacher rating.
27
Suitable data for test validity
Predictive validity
a) Compare students' test scores with their scores on tests taken some time later.
b) Compare students' test scores with success in final exam.
c) Compare students' test scores with other measures of their ability taken some time later, such as employers' assessments of their ability.
Construct validity
a) Compare performance on each subtest with other subtests.
b) Compare performance on each subtest with total of all other subtests.
d) Compare students' test score with students' biodata and psychological characteristics.
e) Multitrait-multimethod studies.
f) Factor analysis.
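The comparison of each subtest with the total of all other subtests can be sketched as a set of correlations. This is a minimal illustration, assuming every student has a score on every subtest and that scores are listed in the same student order. The Pearson coefficient is an assumed choice of statistic; the function names, subtest names and scores are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(var_x * var_y)


def subtest_rest_correlations(scores):
    """Correlate each subtest with the summed scores of all the other subtests.

    `scores` maps a subtest name to a list of scores, one per student.
    """
    names = list(scores)
    n_students = len(scores[names[0]])
    result = {}
    for name in names:
        # Total of every subtest except the one under examination.
        rest = [
            sum(scores[other][s] for other in names if other != name)
            for s in range(n_students)
        ]
        result[name] = pearson_r(scores[name], rest)
    return result


# Hypothetical subtest scores for five students.
scores = {
    "reading": [12, 18, 15, 9, 20],
    "listening": [11, 17, 14, 10, 19],
    "writing": [10, 16, 15, 8, 18],
}
correlations = subtest_rest_correlations(scores)
for name, value in correlations.items():
    print(f"{name}: {value:.2f}")
```

Leaving the subtest out of the total avoids inflating the correlation, since a subtest always correlates with a total that includes its own scores.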
28
Who is a validator?
Roles: Designers; Producers; Organisers; Administrators; Assessees; Scorers; Users.
Example validity questions:
Does the design of the test reflect an adequate theory of language?
Is an appropriate balance of abilities required for success on the test?
Do the test items reflect the designers’ intentions?
Is the test organised and administered in a way that will ensure fairness?
Do assessees respond to the test tasks in a way that reflects realistic language processing?
Do scorers consistently and accurately capture the qualities of test takers’ performance?
Are decisions taken by users justified by the test?
29
Who is a validator?
Assessment developers (teachers, testing agencies): to check the quality of their own work; to showcase the quality of their tests.
Assessment users: to check that tests are giving them accurate and relevant information.
Independent agencies: to enforce/ encourage good quality assessment.
30
Conclusion
Test validation, according to Alderson et al. (1995:193), is 'time-consuming and difficult'. However, it is essential, as a test without validity cannot be useful as a decision making tool. Applied linguists and teachers should focus more of their efforts on practical research in this field.