ASSESSMENT INSTRUMENTS AND PROCEDURES
Assessment methods
Performance test, checklist, rating scale, portfolio, peer assessment, self-assessment, observation, achievement test
Checklist
A checklist is a list of sequential behaviors, arranged in categories, used to determine whether the child exhibits the behaviors or skills listed.
Evaluating and Assessing With Checklists
Curriculum objectives are used both to plan instruction and to evaluate children’s performance on those same objectives. After the planned activities, children are assessed to determine how well they learned. Evaluation is carried out through observation during the activities and through specific assessment tasks.
Advantages of Using Checklists
Checklists are easy to use and update, require little training, and are available whenever evaluation is needed. They are flexible, can be used with a variety of assessment strategies, and allow behaviors to be recorded frequently.
Disadvantages of Using Checklists
Checklists can be time consuming. Teachers find it difficult to adapt their teaching and evaluation routines to include checklists. If there are too many checklists, the teacher can be overwhelmed by assessment and record keeping. Teachers may not regard checklist assessments as valid measures. Finally, checklists do not indicate how well a child performs, only whether a behavior is present.
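A checklist can be kept as a simple yes/no record per child. The Python sketch below is illustrative only: the behavior items and the helpers `record_observation` and `summary` are hypothetical, not taken from any published checklist. It also makes the last disadvantage concrete, since the record shows only whether each behavior was seen, not how well it was performed.

```python
from datetime import date

# Hypothetical checklist items; a real checklist would list behaviors taken
# from the curriculum objectives, arranged in categories.
CHECKLIST = [
    "Holds pencil correctly",
    "Writes first name",
    "Copies simple shapes",
]


def record_observation(observed: set[str], when: date) -> dict:
    """Return a dated yes/no record for every item on the checklist."""
    return {
        "date": when,
        "items": {item: item in observed for item in CHECKLIST},
    }


def summary(record: dict) -> str:
    """Count how many listed behaviors were observed (presence only)."""
    done = sum(record["items"].values())
    return f"{done}/{len(CHECKLIST)} behaviors observed on {record['date']}"


rec = record_observation({"Writes first name", "Copies simple shapes"}, date(2024, 3, 1))
print(summary(rec))  # 2/3 behaviors observed on 2024-03-01
```

Keeping one dated record per observation makes it easy to record behaviors frequently and compare records over time.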
Example of a checklist: http://www.eed.state.ak.us/tls/frameworks/wrldlang/wlinstr3.html#Writing
Rating Scales
A rating scale is used to determine the degree to which the child exhibits a behavior, or the quality of that behavior. Each trait is rated on a continuum, and the observer decides where the child fits on the scale.
Rating scales make a qualitative judgment about the extent to which a behavior is present. They consist of a set of characteristics or qualities to be judged by a systematic procedure. Numerical and graphic rating scales are used most frequently.
Types of Rating Scales
Numerical rating scales: a sequence of numbers is assigned to descriptive categories, and the rater marks a number to indicate the degree to which a characteristic is present.
Graphic rating scales: a set of categories is described at certain points along a continuous line, and the rater can mark his or her judgment at any location on the line.
Advantages of Using Rating Scales
Rating scales can be used for behaviors not easily measured by other means. They are quick and easy to complete, let the user apply knowledge of the child gained at other times, and require minimal training. They are easy to design using consistent descriptors (e.g., always, sometimes, rarely, never) and can describe the child’s steps toward understanding or mastery.
Disadvantages of Using Rating Scales: Reliability
Rating scales are highly subjective; rater error and bias are common problems. Raters may rate a child on the basis of previous interactions, or on an emotional rather than an objective basis. Ambiguous terms make them unreliable, because raters are likely to interpret the rating categories differently (e.g., do all raters agree on what “sometimes” means?).
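A numerical rating scale can be sketched as a mapping from consistent descriptors to numbers. In the Python sketch below, the 1 to 4 numbering, the sample traits, and the helpers `rate` and `disagreements` are assumptions for illustration. Comparing two raters' numbers for the same child is one simple way to surface the ambiguity problem described under reliability.

```python
# Hypothetical 4-point numerical rating scale built from consistent descriptors.
SCALE = {"never": 1, "rarely": 2, "sometimes": 3, "always": 4}


def rate(ratings: dict[str, str]) -> dict[str, int]:
    """Convert each trait's descriptor into its number on the scale."""
    return {trait: SCALE[descriptor] for trait, descriptor in ratings.items()}


def disagreements(a: dict[str, int], b: dict[str, int]) -> list[str]:
    """Traits on which two raters gave the same child different numbers."""
    return [trait for trait in a if a[trait] != b.get(trait)]


# Two raters observing the same (hypothetical) child.
rater_1 = rate({"Shares materials": "sometimes", "Follows directions": "always",
                "Waits for turn": "rarely"})
rater_2 = rate({"Shares materials": "rarely", "Follows directions": "always",
                "Waits for turn": "rarely"})

print(rater_1)  # {'Shares materials': 3, 'Follows directions': 4, 'Waits for turn': 2}
print(disagreements(rater_1, rater_2))  # ['Shares materials']
```

A trait that raters often disagree on may point to a descriptor (such as "sometimes") that needs a clearer definition.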
Portfolio
A student portfolio is a systematic collection of student work and related material that depicts a student's activities, accomplishments, and achievements in one or more school subjects.
A process portfolio documents the stages of learning and provides a progressive record of student growth. A product portfolio demonstrates mastery of a learning task or a set of learning objectives and contains only the best work.
Advantages
Portfolios assess and promote critical thinking. They encourage students to become accountable and responsible for their own learning (i.e., self-directed, active, peer-supported, adult learning), can serve as the starting point for a discussion between student and tutor, and facilitate reflection and self-assessment. They can accommodate diverse learning styles, although they do not suit every learning style.
Portfolios can also monitor and assess students’ progress over time and assess performance, with practical application of theory, in real-time naturalistic settings (i.e., authentic assessment). They use multiple methods of assessment, take into account the judgment of multiple assessors, and have high face, content, and construct validity.
Disadvantages
When portfolios are used for summative assessment, students may be reluctant to reveal weaknesses. Portfolios are personal documents, and ethical issues of privacy and confidentiality may arise when they are used for assessment. It may also be difficult to verify whether the material submitted is the candidate’s own work.
Portfolios take a long time to complete and assess. The portfolio process involves a large amount of paperwork. Portfolio assessment may produce unacceptably low inter-rater reliability, especially if the assessment rubrics are not properly prepared or are used by untrained assessors.
Peer Assessment
Students individually assess each other's contributions using a predetermined list of criteria. Grades are derived by a predetermined process, most commonly an average of the marks awarded by the members of the group.
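The most common grading process, averaging the marks each student receives from the rest of the group, can be sketched as follows. The names, the marks out of 10, and the choice to exclude self-marks are hypothetical assumptions for illustration.

```python
from statistics import mean

# Hypothetical marks out of 10, stored as marks[assessor][assessee].
# Students do not mark themselves in this sketch.
marks = {
    "Ana": {"Ben": 8, "Cai": 6},
    "Ben": {"Ana": 9, "Cai": 5},
    "Cai": {"Ana": 7, "Ben": 8},
}


def peer_grades(marks: dict[str, dict[str, int]]) -> dict[str, float]:
    """Average the marks each student received from the other group members."""
    received: dict[str, list[int]] = {}
    for given in marks.values():
        for assessee, mark in given.items():
            received.setdefault(assessee, []).append(mark)
    return {student: mean(ms) for student, ms in sorted(received.items())}


grades = peer_grades(marks)
print(grades)  # Ana and Ben average 8, Cai averages 5.5
```

Collecting each assessor's marks separately, rather than agreeing on them as a group, also reflects the suggestion below that students submit their assessments independently.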
Advantages
Agreed marking criteria mean there is little confusion about assignment outcomes and expectations. Peer assessment encourages student involvement and responsibility, encourages students to reflect on their role and contribution to the group work, and develops students’ judgment skills. Students are involved in the process and are encouraged to take partial ownership of it.
Peer assessment provides more relevant feedback, because the feedback is generated by peers. Some students consider it fair because each student is judged on his or her own contribution. When it operates successfully, it can reduce a lecturer's marking load. It can also help reduce the ‘free rider’ problem, as students know their contribution will be graded by their peers.
Disadvantages
Additional briefing time can increase a lecturer’s workload. The reliability of grades is at some risk, since peer pressure to award elevated grades, or friendships, may influence the assessment; this can be reduced if students submit their assessments independently of the group. Students tend to award everyone the same mark, may feel ill equipped to undertake the assessment, and may be reluctant to make judgments about their peers. At the other extreme, a student may be discriminated against if the others ‘gang up’ on one group member.
Self-Assessment
Self-assessment is similar to peer assessment, but students assess their own contribution as well as their peers’, using an established set of criteria.
Advantages
Self-assessment encourages student involvement and responsibility, encourages students to reflect on their role and contribution to the group work, allows students to see and reflect on their peers’ assessment of their contribution, and develops students’ judgment skills.
Disadvantages
Self-assessment can increase lecturer workload, because students must be briefed on the process and given ongoing guidance in evaluating themselves. It risks being perceived as a way of presenting inflated grades, and therefore as unreliable. Students may also feel ill equipped to undertake the assessment.
Observations
Classroom observation is another form of ongoing assessment. Most teachers can "read" their students, noticing when they are bored, frustrated, excited, motivated, and so on. As a teacher picks up these cues, she or he can adjust instruction accordingly. It is also beneficial for teachers to make observational notes, referred to as anecdotal notes, which document and describe student learning in areas such as concept development, reading, social interaction, and communication skills.
Usefulness of observation
Observation helps teachers discover students’ interests, assess students’ developmental levels, see what strategies children use to attain their goals, identify the skills children need to practice, and learn a lot about students’ personalities.
Limitations
Observation is time consuming. It works well for observing one individual but is difficult to use when observing a group. Observers are expected to keep themselves apart from the children, which is difficult for a teacher to do.
Follow these steps to create a rubric with a rating scale:
1. Select the performance target (based on your objectives or standards).
2. Define the performance task (outline all expectations).
3. Determine the dimensions that will be assessed. For example, if you are creating a rubric for a research paper, you might evaluate research, content, mechanics, and style.
4. For each dimension, identify at least three different "degrees" of performance. The more detail you can include, the better. For example, if one dimension is "research," the degrees might be: Exemplary = at least 5 sources; Intermediate = at least 3 sources; Novice = fewer than 3 sources.
5. Assign points (numbers) and/or words (e.g., novice, intermediate, proficient) as the scale for evaluating the learning outcomes.
6. Add a column to record the score for each dimension, as well as a row for the total score.
7. Distribute copies of the rubric to students when they begin the task, so that they know exactly how they will be assessed.
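Steps 3 to 6 can be sketched in Python for the research-paper example. The source-count thresholds for "research" follow step 4. The 3/2/1 points per degree and the sample degrees for content, mechanics, and style are assumptions. The helper names `research_degree` and `score_paper` are hypothetical.

```python
# Assumed points per degree (step 5); thresholds for "research" follow step 4.
POINTS = {"Exemplary": 3, "Intermediate": 2, "Novice": 1}


def research_degree(num_sources: int) -> str:
    """Degree for the research dimension based on the number of sources."""
    if num_sources >= 5:
        return "Exemplary"
    if num_sources >= 3:
        return "Intermediate"
    return "Novice"


def score_paper(degrees: dict[str, str]) -> tuple[dict[str, int], int]:
    """Return the score for each dimension and the total (step 6)."""
    scores = {dim: POINTS[deg] for dim, deg in degrees.items()}
    return scores, sum(scores.values())


degrees = {
    "research": research_degree(4),  # 4 sources -> Intermediate
    "content": "Exemplary",          # judged by the teacher
    "mechanics": "Intermediate",
    "style": "Novice",
}
scores, total = score_paper(degrees)
for dim, pts in scores.items():
    print(f"{dim:<10}{pts}")
print(f"{'total':<10}{total}")  # total = 2 + 3 + 2 + 1 = 8
```

The printed rows mirror the score column and total row described in step 6.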
The following link presents some assessment strategies with sample scales, rubrics, and other tools: http://www.eed.state.ak.us/tls/frameworks/mathsci/ms5_2as1.htm
Paper and Pencil Assessment
Paper-pencil assessment is often the first choice for formal assessment because of its practicality. It may use recognition or recall tasks.
Recognition: multiple choice, true-false, matching
Recall: short answer, essay, word problems
Paper-pencil assessment often measures only lower-level skills. It can be used to measure higher-level skills, but such questions take more time to write; essays are more often used for this purpose.
Matching Items
Keep the items in each column homogeneous. Have more items in one column than in the other.
Multiple-Choice Items
Present distractors that are clearly wrong to students who know the material but plausible to students who haven’t mastered it. Avoid putting negatives in both the stem and the alternatives. Use “all of the above” or “none of the above” seldom, if at all. Avoid giving logical clues about the correct answer.
Short-Answer and Completion Items
Indicate the type of response required. For completion items, include only one or two blanks per item.
General Guidelines for Constructing Paper-Pencil Assessments
Define tasks clearly and unambiguously. Decide whether students should have access to reference materials. Specify scoring criteria in advance. Place easier and shorter items at the beginning of the instrument. Set parameters for students’ responses.
Item Assessment
After you create your objective assessment items and give your test, how can you be sure that the items are appropriate, that is, neither too difficult nor too easy?
Item Difficulty/Facility
Item difficulty is simply the proportion of students who answered an item correctly. It is measured with the Difficulty Index, which teachers compute as the proportion of students who answered the test item correctly. A higher value therefore means an easier item.
For example, say you gave a multiple-choice quiz with four answer choices (A, B, C, and D) to 30 students. The following shows how many students selected answer choices for Questions 1 and 2 (* marks the correct answer; remaining responses are not shown):
Question 1: 24* chose the correct answer; 3 chose a distractor
Question 2: A 12*, B 13, another distractor 2
The difficulty of an item is found by dividing the number of students who chose the correct answer by the total number of students. For Question 1, the difficulty is 24/30 = 0.80. As a rule of thumb, if the item difficulty is above .75, the item is easy; if it is below .25, the item is difficult. By this rule, Question 1 is an easy item, since 80% of the students answered it correctly.
For Question 2, however, the difficulty index is 12/30 = 0.40. Since more students chose distractor B than the correct answer A, distractor B should be examined for errors.
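The Difficulty Index and the rule of thumb can be written as two small functions. In this Python sketch, the function names are illustrative, and the label "moderate" for the middle band is an assumption. The counts are those for Questions 1 and 2, out of 30 students.

```python
def difficulty_index(num_correct: int, num_students: int) -> float:
    """Proportion of students answering the item correctly (higher = easier)."""
    return num_correct / num_students


def classify(p: float) -> str:
    """Rule of thumb: above .75 is easy, below .25 is difficult."""
    if p > 0.75:
        return "easy"
    if p < 0.25:
        return "difficult"
    return "moderate"


for question, correct in [("Question 1", 24), ("Question 2", 12)]:
    p = difficulty_index(correct, 30)
    print(question, round(p, 2), classify(p))
# Question 1 0.8 easy
# Question 2 0.4 moderate
```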
Discrimination Index
The discrimination index refers to how well an assessment item differentiates between high and low scorers. In other words, you should expect high-performing students to select the correct answer for each question more often than low-performing students. If this is true, the item is said to have a positive discrimination index (between 0 and 1).
If, however, you find that more of the low-performing students got a specific item correct, then the item has a negative discrimination index (between -1 and 0).
The table displays the results of ten questions on a quiz. The students are arranged with the top overall scorers at the top of the table; "1" indicates the answer was correct and "0" that it was incorrect.
Upper group: the top-scoring half of the students. Lower group: the bottom-scoring half.
The discrimination index is found by subtracting the number of students in the lower group who answered the item correctly from the number in the upper group who answered it correctly, then dividing by the number of students in each group (here, five). For Question 1, that means subtracting 4 from 4 and dividing by 5, which gives a discrimination index of 0.
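The subtraction and division can be applied directly to one column of such a 1/0 table. In this Python sketch, the function name is illustrative. The column for Question 1 is hypothetical: only its group totals, 4 correct in each half, match the worked example. The sketch also assumes an even number of students so that the two groups are the same size.

```python
def discrimination_index(item_column: list[int]) -> float:
    """Discrimination index for one item.

    item_column holds 1 (correct) or 0 (incorrect) for each student, ordered
    from the highest to the lowest overall scorer. Assumes an even number of
    students so the upper and lower groups are the same size.
    """
    half = len(item_column) // 2
    upper, lower = item_column[:half], item_column[half:]
    return (sum(upper) - sum(lower)) / half


# Hypothetical column for Question 1: 4 of the top 5 and 4 of the bottom 5 correct.
question_1 = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(discrimination_index(question_1))  # 0.0
```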
Analysis from the table
Question 2 had a difficulty index of .30 (meaning it was fairly difficult) and a negative discrimination index of -0.6 (meaning that low-performing students were more likely to answer it correctly). This question should be carefully analyzed and probably deleted or changed. The "best" overall question is Question 3, which had a moderate difficulty level (.60) and discriminated extremely well (0.8).
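The review above can be expressed as a simple decision rule over each item's two indices. In this Python sketch, the wording of the `review` outcomes and the rule itself are assumptions, not a standard procedure. The difficulty thresholds reuse the .75/.25 rule of thumb. Question 1's difficulty of .80 follows from 4 + 4 correct answers out of 10.

```python
# Item statistics from the analysis: (difficulty, discrimination).
items = {
    "Question 1": (0.80, 0.0),
    "Question 2": (0.30, -0.6),
    "Question 3": (0.60, 0.8),
}


def review(difficulty: float, discrimination: float) -> str:
    """Illustrative decision rule; not a standard procedure."""
    if discrimination < 0:
        return "revise or delete: low scorers did better"
    if discrimination == 0:
        return "check: does not separate high and low scorers"
    if difficulty > 0.75 or difficulty < 0.25:
        return "keep, but difficulty is extreme"
    return "keep"


for name, (p, d) in items.items():
    print(f"{name}: p={p:.2f}, D={d:+.1f} -> {review(p, d)}")

best = max(items, key=lambda name: items[name][1])
print("Highest discrimination:", best)  # Question 3
```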
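Standard deviation
The standard deviation is a measure of how spread out numbers are. Its symbol is σ (the Greek letter sigma). For a population,
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where σ is the standard deviation, x is each value in the population, μ is the population mean, and N is the number of values.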
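The population formula translates directly into code: find the mean, average the squared deviations by dividing by N, and take the square root. The helper name `population_sd` and the data values in this Python sketch are illustrative. Python's `statistics.pstdev` computes the same quantity; `statistics.stdev` instead divides by N - 1 (sample standard deviation).

```python
import math
import statistics


def population_sd(values: list[float]) -> float:
    """sigma = sqrt(sum((x - mu)**2) / N), dividing by N (whole population)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean 5
print(population_sd(data))        # 2.0
print(statistics.pstdev(data))    # same result from the standard library
```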
A normal distribution of data means that most of the values in a data set are close to the average, while relatively few lie toward one extreme or the other. When the values are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small. When the values are spread apart and the bell curve is relatively flat, the standard deviation is relatively large.
One standard deviation away from the mean in either direction on the horizontal axis (the red area on the above graph) accounts for about 68 percent of the people in this group. Two standard deviations from the mean (the red and green areas) account for roughly 95 percent, and three standard deviations (the red, green, and blue areas) account for about 99.7 percent.
Suppose we have a set of scores that are normally distributed. The range is from 0 to 200, the mean and median are 100, and the standard deviation is 20. In a normal curve, the standard deviation indicates precisely how the scores are distributed.
Note that the percentage of scores is marked off by standard deviations on either side of the mean. The range between 80 and 120 (one standard deviation on either side of the mean) contains 68.26% of the cases. In other words, in a normal distribution roughly two thirds of the scores lie within one standard deviation of the mean. Going out two standard deviations on either side of the mean (60 to 140) includes 95.44% of the scores, and going out three standard deviations (40 to 160) encompasses 99.74% of the scores.
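For a normal distribution, the share of scores within k standard deviations of the mean equals erf(k/√2). The Python sketch below uses that fact with the example's mean of 100 and standard deviation of 20 to print each score band and its percentage. The results agree with the standard table values of 68.26%, 95.44%, and 99.74%, up to rounding in the last digit.

```python
import math

MEAN, SD = 100, 20  # the example distribution described in the text

shares = []
for k in (1, 2, 3):
    # Proportion of a normal distribution within k standard deviations of the mean.
    share = math.erf(k / math.sqrt(2))
    shares.append(share)
    low, high = MEAN - k * SD, MEAN + k * SD
    print(f"{low} to {high}: {share:.2%}")
# 80 to 120: 68.27%
# 60 to 140: 95.45%
# 40 to 160: 99.73%
```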
The following are test scores for 11 students. Determine the standard deviation of the scores, using a table with one row per score x to assist you.
92, 66, 99, 75, 69, 51, 89, 75, 54, 45, 69
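One way to check the exercise is to build the working table in code, with columns x, x - μ, and (x - μ)², then apply the population formula (divide by N). The column layout is an assumption about how the table would be set up. Dividing by N - 1 instead (`statistics.stdev`) would give the sample standard deviation, about 17.26.

```python
import math
import statistics

scores = [92, 66, 99, 75, 69, 51, 89, 75, 54, 45, 69]
n = len(scores)
mu = sum(scores) / n  # 784 / 11

print(f"{'x':>4} {'x - mu':>8} {'(x - mu)^2':>11}")
sum_sq = 0.0
for x in scores:
    dev = x - mu
    sum_sq += dev ** 2
    print(f"{x:>4} {dev:>8.2f} {dev ** 2:>11.2f}")

sigma = math.sqrt(sum_sq / n)  # population standard deviation (divide by N)
print(f"mean = {mu:.2f}, sum of squares = {sum_sq:.2f}, sigma = {sigma:.2f}")
# mean = 71.27, sum of squares = 2978.18, sigma = 16.45
print(f"sample SD (N - 1) = {statistics.stdev(scores):.2f}")
```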