1
RELIABILITY AND VALIDITY OF ASSESSMENT
2
ITEM ANALYSIS
3
Item analysis has to be done before a meaningful and scientific inference about the test can be made in terms of its validity, reliability, objectivity and usability. It is the process of examining students’ responses to individual test items in order to assess the quality of the items and of the test as a whole.
4
The tools include: item difficulty, item discrimination and item distractors.
5
THE PURPOSES OF ITEM ANALYSIS
6
To improve test items and identify unfair items.
To reveal which questions were most difficult. If a particular distractor is the most often chosen answer, the item must be examined. To identify common misconceptions among students about a particular concept.
7
To improve the quality of tests.
If items are too hard, teachers can adjust the way they teach.
8
Item Difficulty
9
It is the percentage of students taking the test who answered the item correctly.
The higher the value, the easier the item.
10
D = (R / N) × 100, where R = number of pupils who answered the item correctly and N = total number of pupils who attempted the item.
11
Example: number of pupils who answered the item correctly (R) = 40.
Total number of pupils who attempted the item (N) = 50. D = (40 / 50) × 100 = 80%.
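As a quick illustration (not part of the original slides), the calculation can be sketched in Python; the function name is illustrative and the figures simply mirror the example above:

```python
def item_difficulty(correct, attempted):
    """Item difficulty D = (R / N) x 100: the percentage of pupils
    who answered the item correctly."""
    return (correct / attempted) * 100

# Figures from the example: 40 of 50 pupils answered the item correctly.
print(item_difficulty(40, 50))  # 80.0 -> a fairly easy item
```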
12
Ideal difficulty levels for multiple-choice items
Format – Ideal difficulty:
Five-response multiple-choice – 70
Four-response multiple-choice – 74
Three-response multiple-choice – 77
True-false – 85
13
Item Discrimination
14
The ability of an item to differentiate among students on the basis of how well they know the material being tested. A good item discriminates between those who do well on the test and those who do poorly. The higher the discrimination index, the better the item.
15
DI = (RU – RL) / (N / 2), where RU = number of correct responses from the upper group, RL = number of correct responses from the lower group, and N = total number of pupils who attempted the item.
16
Example: total score – 60; total sample – 50; upper group – 25;
lower group – 25. RU = 22, RL = 10. DI = (22 – 10) / (50 / 2) = 12 / 25 = 0.48.
17
Interpretation: 0.40 or higher – very good items.
0.30 to 0.39 – good items. 0.20 to 0.29 – fairly good items. 0.19 or less – poor items. So the item in the example (DI = 0.48) is a very good item.
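A minimal Python sketch of the discrimination index and the interpretation bands above, assuming equal-sized upper and lower groups (the data mirror the example; the function names are illustrative):

```python
def discrimination_index(upper_correct, lower_correct, total_pupils):
    """DI = (RU - RL) / (N / 2), where N is the combined size of the
    upper and lower groups."""
    return (upper_correct - lower_correct) / (total_pupils / 2)

def interpret(di):
    """Interpretation bands from the slide above."""
    if di >= 0.40:
        return "very good item"
    if di >= 0.30:
        return "good item"
    if di >= 0.20:
        return "fairly good item"
    return "poor item"

# Example above: RU = 22, RL = 10, N = 50.
di = discrimination_index(22, 10, 50)
print(round(di, 2), interpret(di))  # 0.48 very good item
```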
18
Distractors
19
Analyzing the distractors (i.e., the incorrect alternatives) is useful in determining the relative usefulness of the decoys in each item. If students consistently fail to select a particular alternative, it is probably totally implausible and therefore of little use as a decoy in multiple-choice items.
20
One way to study responses to distractors is with a frequency table that shows the proportion of students who selected each alternative. Remove or replace distractors selected by few or no students, because students find them implausible.
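One way such a frequency table might be produced in Python; the responses and option labels below are hypothetical:

```python
from collections import Counter

# Hypothetical responses of ten students to one item;
# 'B' is the keyed answer, 'A', 'C' and 'D' are the distractors.
responses = ["B", "B", "A", "B", "C", "B", "B", "A", "B", "C"]
options = ["A", "B", "C", "D"]

counts = Counter(responses)
total = len(responses)
for option in options:
    print(f"{option}: {counts[option]} students ({counts[option] / total:.0%})")
# 'D' is never chosen, so it is probably implausible and a candidate
# for removal or replacement.
```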
21
RELIABILITY
22
Reliability is the degree to which an assessment tool produces stable and consistent results.
23
TYPES OF RELIABILITY
24
Test-retest reliability
Parallel forms reliability
Inter-rater reliability
Internal consistency
Form equivalence (alternate form)
25
Test-retest reliability
26
Obtained by administering the same test twice over a period of time to a group of individuals.
Scores from Time 1 and Time 2 can then be correlated to evaluate the test for stability. Also known as temporal stability.
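A minimal sketch of the Time 1 / Time 2 correlation with hypothetical scores (statistics.correlation gives Pearson’s r and requires Python 3.10+):

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Hypothetical total scores for the same five students on two occasions.
time1 = [55, 48, 60, 42, 50]
time2 = [53, 50, 59, 44, 49]

print(round(correlation(time1, time2), 2))  # close to 1.0 -> stable scores
```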
27
Parallel forms reliability
28
It is obtained by administering different versions of an assessment tool to the same group of individuals. Scores from the two versions can then be correlated to evaluate the consistency of results across alternate versions.
29
Inter-rater reliability
30
Used to assess the degree to which different judges or raters agree in their assessment decisions.
Useful because human observers will not necessarily interpret answers the same way.
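As an illustration only, one simple index of such agreement is the proportion of cases on which two raters give the same decision (the ratings below are hypothetical; the slides do not prescribe a particular statistic):

```python
# Hypothetical pass/fail decisions by two raters on the same eight scripts.
rater1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater2 = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "fail"]

agreements = sum(a == b for a, b in zip(rater1, rater2))
print(agreements / len(rater1))  # 0.875 -> the raters agree on 7 of 8 scripts
```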
31
Internal consistency reliability
32
It is used to evaluate the degree to which different test items that probe the same construct produce similar results. The two types are average inter-item correlation and split-half reliability.
33
Average inter-item correlation
Obtained by taking all of the items on a test that probe the same construct, determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients.
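A sketch of that procedure with hypothetical 0/1 item scores (Python 3.10+ for statistics.correlation):

```python
from itertools import combinations
from statistics import correlation, mean  # Python 3.10+

# Hypothetical scores (0 = wrong, 1 = right) of six students
# on three items that probe the same construct.
items = {
    "item1": [1, 0, 1, 1, 0, 1],
    "item2": [1, 0, 1, 0, 0, 1],
    "item3": [1, 1, 1, 1, 0, 1],
}

# Correlate every pair of items, then average the coefficients.
pair_rs = [correlation(items[a], items[b]) for a, b in combinations(items, 2)]
print(round(mean(pair_rs), 2))  # average inter-item correlation
```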
34
Split-half reliability
“Splitting in half” all items of a test to form two “sets” of items. The total score for each “set” is computed. The correlation between the two total “set” scores is then determined to obtain the split-half reliability.
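A sketch of the procedure using an odd/even split of the items, with a hypothetical students × items score matrix (the choice of split is an assumption; any halving works the same way):

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Hypothetical scores (0/1) of five students on six items.
scores = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
]

# Form the two "sets" (even-numbered vs odd-numbered items) and total
# each student's score on each set.
half_a = [sum(row[0::2]) for row in scores]
half_b = [sum(row[1::2]) for row in scores]

print(round(correlation(half_a, half_b), 2))  # split-half reliability
```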
35
Form equivalence (Alternate form)
36
Also known as alternate form reliability.
Two different forms of the test, based on the same content, are administered on one occasion to the same examinees. Reliability is stated as the correlation between the scores on Test 1 and Test 2.
37
VALIDITY
38
An indication of how well an assessment actually measures what it is supposed to measure.
Refers to the accuracy of an assessment. It is the veracity of an assessment instrument.
39
TYPES OF VALIDITY
40
Face validity
Construct validity
Content validity
Criterion-related validity
Formative validity
Sampling validity
41
Face Validity
42
Measure of the extent to which an examination looks like an examination in the subject concerned and at the appropriate level. Candidates, teachers and the public have expectations as to what an examination looks like and how it is conducted.
43
Construct Validity
44
The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory. It is also known as theoretical construct.
45
Content Validity
46
The extent to which a measure adequately represents all facets of a concept.
It is the extent to which the content of the test matches the instructional objectives.
47
Criterion-Related validity
48
The degree to which content on a test (the predictor) correlates with performance on relevant criterion measures (a concrete criterion in the “real” world).
49
Formative Validity
50
When applied to outcomes assessment, it is used to assess how well a measure is able to provide information to help improve the program under study.
51
Sampling Validity
52
It is similar to content validity.
It ensures that the measure covers the broad range of areas within the concept under study.
53
FACTORS THAT CAN LOWER VALIDITY
54
Unclear directions
Difficult reading vocabulary and sentence structure
Ambiguity in statements
Inadequate time limits
Inappropriate level of difficulty
55
Cont’d Poorly constructed test items.
Test items inappropriate for the outcomes being measured. Tests that are too short. Administration and scoring.
56
Cont’d Improper arrangement of items (e.g., complex items placed before easy ones).
Identifiable patterns of answers. Teaching. Students. Nature of the criterion.
57
WAYS TO IMPROVE VALIDITY AND RELIABILITY
58
IMPROVING RELIABILITY
First, calculate the item-test correlations and rewrite or reject any that are too low. Second, look at the items that did correlate well and write more like them. The longer the test, the higher the reliability, up to a point.
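A sketch of the first step, correlating each item with the total test score (the 0/1 data are hypothetical; Python 3.10+ for statistics.correlation):

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Hypothetical 0/1 item scores of six students on three items.
items = {
    "item1": [1, 0, 1, 1, 0, 1],
    "item2": [0, 1, 0, 0, 1, 0],
    "item3": [1, 0, 1, 1, 0, 1],
}
totals = [sum(student) for student in zip(*items.values())]

for name, scores in items.items():
    print(name, round(correlation(scores, totals), 2))
# Rewrite or reject items whose correlation with the total is too low.
```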
59
IMPROVING VALIDITY
Make sure your goals and objectives are clearly defined and operationalized. Expectations of students should be written down. Match your assessment measure to your goals and objectives.
60
Cont’d Have the test reviewed by faculty at other schools to obtain feedback from an outside party who is less invested in the instrument. Get students involved; have the students look over the assessment for troublesome wording.
61
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
62
The two do not necessarily go hand-in-hand.
We can illustrate it as follows. Reliable but not valid - an archer who always hits about the same place but not near the bullseye.
63
Valid but not reliable - archer who hits various places centered around the bullseye, but not very accurately. Neither reliable nor valid - an archer who hits various places all off to the same side of the bullseye.
64
Cont’d Both reliable and valid - archer who hits consistently close to the bullseye. A valid assessment is always reliable, but a reliable assessment is not necessarily valid.
65
FACTORS IN RESOLVING CONFLICTS BETWEEN VALIDITY AND RELIABILITY
66
Validity is paramount. Validity will not damage educational effectiveness but excessive concern for reliability or costs may do so. Staff costs are limited by the credits in the workload planning system being used.
67
Cont’d Student time costs are limited by the planned learning hours allocated to them. Reliability cannot be 100% for any one assessment and may need to be compromised. Between-marker reliability can be improved by marker training and monitoring.
68
Cont’d Clear, detailed criteria will maximise examiner reliability and validity. Educationally effective coursework assessments can often be designed so as also to prevent plagiarism.
69
Cont’d Where each student produces a number of similar assignments, they can be randomly sampled. Self and peer assessment can reduce staff costs and can serve as a learning activity. High-reliability assessment is costly and so should be used only where it is critical.
70
Cont’d Programme-wide design of assessment can avoid the worst of the conflicts. Designing good assessments is a creative, challenging task that demands expertise in the teaching of the subject and time, and is improved by peer support and review.
71
QUESTIONS