1
Chap. 2 Principles of Language Assessment
2
Practicality
A practical test: (1) is not excessively expensive, (2) stays within appropriate time constraints, (3) is relatively easy to administer, and (4) has a scoring/evaluation procedure that is specific and time-efficient.
3
Reliability
A reliable test is consistent and dependable. Given on two different occasions or scored by different people, it should yield similar results.
Student-related reliability: scores may be made unreliable by temporary illness, fatigue, a “bad day”, anxiety, and other physical or psychological factors.
4
Rater Reliability
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores of the same test (lack of attention to scoring criteria, inexperience, inattention, preconceived biases).
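One way to put a number on agreement between two scorers is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not name a statistic, so the choice of kappa, the function name `cohen_kappa`, and the band scores below are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same set of scripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of scripts")
    n = len(rater_a)
    # Observed agreement: share of scripts given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's own distribution of scores.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave one identical score to every script
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-5) from two raters for the same ten essays.
scores_a = [3, 4, 2, 5, 3, 4, 4, 2, 3, 5]
scores_b = [3, 4, 3, 5, 3, 4, 3, 2, 3, 4]
kappa = cohen_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates consistent scoring; a value near 0 means the raters agree about as often as chance alone would produce, which points to the causes listed on the slide.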
5
Intra-rater unreliability, inconsistency within a single scorer, occurs because of unclear scoring criteria, fatigue, bias toward “good” and “bad” students, or carelessness.
Test Administration Reliability
Unreliability may also result from the conditions in which the test is administered. Examples: street noise, temperature, desks and chairs, the amount of light.
6
Reliability & Validity
Test Reliability
The test itself can cause measurement errors. Examples: a long test, a timed test, ambiguous test items, or a test item with more than one correct answer.
Validity: the degree to which a test measures what it is supposed to measure or can be used successfully for the purposes for which it is intended.
7
Validity
For example, a valid test of reading ability actually measures reading ability.
Five types of validity: content validity, criterion-related validity, construct validity, consequential validity, and face validity.
Content validity: A test adequately and sufficiently measures the particular skills/behavior it sets out to measure.
8
Validity
Examples: A speaking test that requires the learner actually to speak within an authentic context has content validity (T). An oral test that asks students to answer multiple-choice questions requiring grammatical judgments does not (F).
Direct testing involves the test-taker in actually performing the target task, e.g. producing target words orally.
9
Validity
Indirect testing tests the learner with a task that is related to the target task. For example, in a test of oral production, marking stressed syllables in a list of written words is only indirect testing.
Criterion-related validity: a form of validity in which a test is compared or correlated with an outside criterion measure.
10
Criterion-Related Validity
Concurrent validity: A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. For example, a high score on the final exam will be substantiated by actual proficiency in the language.
Predictive validity: A test accurately predicts future performance, e.g. a language aptitude test predicts second/foreign language ability.
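Comparing a test with an outside criterion is often done by correlating the two sets of scores. The sketch below assumes Pearson's r as the correlation measure. The slides do not prescribe one, and the function name `pearson_r` and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between test scores and a criterion measure."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: final-exam scores and an independent proficiency rating
# collected at about the same time for the same six learners.
final_exam = [55, 62, 70, 74, 81, 88]
criterion = [50, 65, 68, 79, 80, 90]
validity_coefficient = pearson_r(final_exam, criterion)
print(f"r = {validity_coefficient:.2f}")
```

For concurrent validity the criterion is measured at about the same time as the test. For predictive validity the same calculation would use a criterion collected later, such as subsequent course results.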
11
Construct Validity
A construct is any theory, hypothesis, or method that attempts to explain observed phenomena in our universe of perceptions. For example, “proficiency” and “communicative competence” are linguistic constructs.
Construct validity: The test items reflect the essential aspects of the theory on which the test is based (e.g. the relationship between a test of communicative competence and the theory of communicative competence).
12
Construct Validity
The scoring analysis for the interview includes: pronunciation, fluency, grammatical accuracy, vocabulary use, and socio-linguistic appropriateness. If a proficiency interview evaluates only pronunciation and grammar, its construct validity is questionable. (TOEFL)
13
Consequential Validity
Consequential validity includes all the consequences of a test, including the accuracy in measuring intended criteria, the impact on the preparation of test-takers, the effect on the learner, and the social consequences of a test’s interpretation and use.
14
Face Validity
Face validity refers to the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, based on subjective judgment. Face validity means that the students perceive the test to be valid. (Does the test, on the face of it, appear from the learner’s perspective to test what it is designed to test?)
15
Authenticity
The language is as natural as possible. Items are contextualized rather than isolated. Topics are meaningful (relevant, interesting). Items are organized thematically. Tasks represent, or closely approximate, real-world tasks.
16
Washback
Washback is the effect of testing on teaching and learning. It generally refers to the effects tests have on instruction in terms of how students prepare for the test. Students’ incorrect responses, correct responses, and strategies for success can serve as learning devices. Comment generously and specifically on students’ test performance.
17
Washback
In reality, letter grades and numerical scores give no information of intrinsic interest to the student. Instead, give praise for strengths and offer constructive criticism of weaknesses. Formative tests provide washback by giving the learner information on progress toward goals. In summative tests, teachers tend to offer no washback other than grades.
18
Applying Principles to the Evaluation
(1) Are the test procedures practical? (administrative details, time frame, smooth administration, materials and equipment, cost, scoring system, reporting results)
(2) Is the test reliable? (clean test sheet, audible sound amplification, equally visible video input, lighting, temperature, objective scoring procedures)
19
Intra-rater reliability guidelines:
(consistent sets of criteria, uniform attention, double-checking for consistency, the same standards applied to all, avoidance of fatigue)
(3) Does the procedure demonstrate content validity? (two steps)
A: Are classroom objectives identified and appropriately framed?
20
B: Are lesson objectives represented in the form of test specifications?
(4) Is the procedure face valid and “biased for best”? Conditions for face validity: a. Directions are clear. b. The structure of the test is organized logically. c. Its difficulty level is appropriately pitched.
21
d. The test has no surprises.
e. Timing is appropriate.
(5) Are the test tasks as authentic as possible? a. as natural as possible b. as contextualized as possible c. interesting, enjoyable, and/or humorous d. thematic organization e. real-world tasks
22
(6) Does the test offer beneficial washback to the learner?
(content validity, preparation time before the test, reviewing after the test, self-assessment, and peer discussion of the test results)