TSL 3112 – LANGUAGE ASSESSMENT
BASIC PRINCIPLES OF ASSESSMENT: RELIABILITY & VALIDITY
LECTURE OBJECTIVES
Explain the difference between validity and reliability.
Distinguish the different types of validity and reliability in tests and other instruments in language assessment.
Suggest ways to ensure reliability and validity in language assessment.
(Main reference: Brown, H. Douglas, Language Assessment: Principles and Classroom Practices.)
WHAT ARE RELIABILITY & VALIDITY?
Reliability is the degree to which an assessment tool produces stable and consistent results. Validity refers to how well a test measures what it purports to measure. Why is this distinction necessary? Reliability is necessary but not sufficient on its own: for a test to be valid, it must also be reliable. For example, if your scale is off by 3 kg, it reads your weight every day with an excess of 3 kg. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 3 kg to your true weight: it is not a valid measure of your weight.
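A quick numeric sketch of the scale example (a minimal illustration with invented readings, not taken from Brown): repeated readings with a constant +3 kg bias are highly consistent (reliable) yet systematically wrong (not valid).

```python
import statistics

true_weight = 70.0                       # the person's actual weight in kg
readings = [73.0, 73.1, 72.9, 73.0]      # daily readings from a scale with a +3 kg bias

mean_reading = statistics.mean(readings)
spread = statistics.stdev(readings)

# Small spread = consistent day to day = reliable
print(f"mean reading: {mean_reading:.1f} kg, spread: {spread:.2f} kg")
# Constant +3 kg error = systematically wrong = not valid
print(f"bias: {mean_reading - true_weight:+.1f} kg")
```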
1- Reliability
“A reliable test is consistent and dependable”: if you give the test to the same students on two different occasions, it should yield the same results. Four factors can contribute to the unreliability of a test:
Student-related reliability
Rater reliability (scoring)
Test administration reliability
Test reliability
1- Reliability a. Student-related reliability
relates to factors within the students themselves, such as illness, fatigue, anxiety, or other physical and psychological factors that can make the observed score unreflective of the student's actual performance. It also includes test-wiseness, or strategies for efficient test-taking.
b. Rater reliability
relates to human subjectivity, bias, and error. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, due to lack of attention, inexperience, or even bias. Intra-rater unreliability is common among classroom teachers because of unclear criteria or bias toward particular weak or strong students. (How can this problem be solved? See page 28.)
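One common way to check inter-rater reliability (a sketch, not the procedure from page 28 of the text; the scores below are invented) is to correlate two raters' scores on the same set of scripts: a low correlation signals inconsistent scoring.

```python
import numpy as np

# Hypothetical scores given by two raters to the same ten essays (0-20 scale)
rater_a = np.array([14, 12, 18, 9, 16, 11, 15, 13, 17, 10])
rater_b = np.array([15, 11, 17, 10, 14, 12, 16, 12, 18, 9])

# Pearson correlation between the two raters' scores
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"inter-rater correlation: r = {r:.2f}")  # values near 1 indicate consistent scoring
```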
1- Reliability c. Test administration reliability
relates to the conditions in which the test is administered. Factors such as heat, light, noise, confusing directions, and different amounts of testing time allowed to different students can affect students' scores.
d. Test reliability
relates to the test itself. Is it too long? Is the time limit appropriate? Is the test poorly written: is there more than one answer to a question? Is the wording of the test clear? Is the test subjective or objective?
Applying the reliability principle – read page 41.
1- Reliability Factors that can affect the reliability of a test
Test length factors: longer tests produce higher reliabilities (see the sketch after this list).
Teacher-student factors: a good teacher-student relationship helps increase the consistency of the results.
Environment factors: a favourable environment improves the reliability of the test.
Test administration factors: test administrators should strive to provide clear and accurate instructions.
Marking factors: clear marking criteria, applied consistently, improve reliability.
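The claim that longer tests produce higher reliabilities can be quantified with the Spearman-Brown prophecy formula, a standard psychometric result (not cited on the slide itself): if a test with reliability r is lengthened by a factor n, the predicted reliability is n·r / (1 + (n − 1)·r).

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a 20-item test with reliability 0.70 to 40 comparable items:
print(f"{spearman_brown(0.70, 2):.2f}")    # ~0.82: longer test, higher reliability
print(f"{spearman_brown(0.70, 0.5):.2f}")  # ~0.54: halving the test lowers reliability
```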
2- Validity
Tests themselves are not valid or invalid; instead, we validate the use of a test score. Validity is a matter of degree, not all or none. Test validity refers to “the extent to which inferences made from the assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.” A test of reading ability must actually measure reading ability. Does the test content, and do its items, match the course content or unit being taught? (Read Brown, page 30.)
2- Validity a. Content-related validity
is achieved when students are asked to perform the behaviour that is being measured. For example, if you are assessing a student's ability to speak a second language but you ask him/her to answer paper-and-pencil multiple-choice questions that require grammatical judgement, you are not achieving content validity (direct vs. indirect testing). Does the test follow the logic of the lesson or unit? Are the objectives of the lessons clear and present in the test? A test should assess real course objectives and test performance directly. (Read pages 30–32.)
2- Validity b. Criterion-related validity
is demonstrated when the test is effective in predicting a criterion, or indicators of a construct; it looks at the relationship between a test score and an outcome. For example, SAT scores are used to determine whether a student will be successful in college; first-year grade point average becomes the criterion for success. Looking at the relationship between test scores and the criterion can tell you how valid the test is for determining success in college. The criterion can be any measure of success for the behaviour of interest.
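A criterion-related validation study of the kind described above typically reports the correlation between test scores and the criterion. A minimal sketch with invented admission-test scores and first-year GPAs (not real SAT data):

```python
import numpy as np

# Hypothetical admission-test scores and the later first-year GPAs of the same students
test_scores = np.array([1050, 1200, 980, 1340, 1110, 1260, 900, 1400])
first_year_gpa = np.array([2.8, 3.2, 2.5, 3.7, 3.0, 3.4, 2.3, 3.8])

# The validity coefficient: how strongly test scores relate to the criterion
validity_coefficient = np.corrcoef(test_scores, first_year_gpa)[0, 1]
print(f"criterion validity coefficient: r = {validity_coefficient:.2f}")
```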
2- Validity
A criterion-related validation study can be either predictive of later behaviour or a concurrent measure of behaviour or knowledge.
i) Concurrent validity
A test has concurrent validity when its results are supported by other concurrent performance beyond the assessment itself. Statements of concurrent validity indicate the extent to which test scores may be used to estimate an individual's present standing on a criterion. E.g., a high score on a final English language exam will be substantiated by actual proficiency in the language.
2- Validity ii) Predictive validity
is the form of criterion-related validity that indexes the degree to which a test score predicts some criterion measure obtained at a future time. It assesses a test-taker's likelihood of future success and is especially important in the case of placement tests.
2- Validity c. Construct-related validity
Construct validity refers to the degree to which a test or other measure assesses the underlying theoretical construct it is supposed to measure (i.e., the test is measuring what it is purported to measure). It is concerned with the theoretical relationships among constructs and the corresponding observed relationships among measures. (Read page 33.)
2- Validity c. Construct-related validity (continued)
As an example, think about a general knowledge test of basic algebra. If a test is designed to assess knowledge of facts concerning rate, time, distance, and their interrelationships, but the test questions are phrased in long and complex reading passages, then perhaps reading skill is inadvertently being measured instead of factual knowledge of basic algebra. To demonstrate construct validity, evidence is required both that the test measures what it purports to measure (in this case, basic algebra) and that it does not measure irrelevant attributes (reading ability).
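One simple form of construct-validity evidence for the algebra example (a convergent/discriminant check with invented scores, not a method prescribed by the text) is to confirm that the test correlates more strongly with another algebra measure than with a reading measure:

```python
import numpy as np

# Hypothetical scores for the same students on three measures
algebra_test  = np.array([55, 72, 48, 90, 63, 81, 40, 95])  # the test being validated
algebra_other = np.array([58, 70, 50, 88, 60, 84, 45, 92])  # an established algebra measure
reading_test  = np.array([60, 62, 75, 68, 55, 80, 58, 70])  # an unrelated reading measure

convergent = np.corrcoef(algebra_test, algebra_other)[0, 1]   # should be high
discriminant = np.corrcoef(algebra_test, reading_test)[0, 1]  # should be much lower

# A high discriminant correlation would suggest reading skill is contaminating the test
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```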
2- Validity d. Consequential validity (impact)
Some professionals feel that, in the real world, the consequences that follow from the use of assessments are important indications of validity. There are two levels of impact:
i) macro level: the effect on society and educational systems;
ii) micro level: the effect on individual test-takers.
2- Validity e. Face validity
A test is said to have face validity if it “looks like” it is going to measure what it is supposed to measure. Face validity is not empirical: one is saying that the test “appears it will work,” as opposed to “it has been shown to work.” It refers to “the extent to which students view the assessment as fair, relevant, and useful for improving learning.” Students will judge a test to be face valid if:
the directions are clear;
it is organized in a logical way;
it contains no surprises;
the time allowed is appropriate;
it is appropriate in difficulty.
2- Validity
These types of validity are summarized by Davies (1968) as follows:

Type of validity     The test...
Face                 Looks like a good one to the learner/layman
Content              Accurately reflects the syllabus it is based on
Predictive           Accurately predicts future performance
Concurrent           Gives similar results to already-validated tests or other immediate external criteria (e.g. a teacher's subjective assessment)
Construct            Reflects closely a valid theory of foreign language learning that it takes as its model
Tutorial
Read Exercise 6 (page ) and, in small groups, evaluate the scenarios provided.
Based on samples of formative and summative assessments, discuss the aspects of reliability and validity that must be considered in these assessments.
Discuss measures that a teacher can take to ensure high validity of language assessment for the primary classroom.
References
Brown, H. Douglas. Language Assessment: Principles and Classroom Practices. Pearson Education, Inc.
Chitravelu, Nesamalar. ELT Methodology: Principles and Practice. Penerbit Fajar Bakti Sdn. Bhd.