Published by Ursula Stone. Modified over 9 years ago.
1
CREATING EFFECTIVE QUESTIONS FOR ASSESSMENT AND AS AIDS IN LEARNING IN TODAY'S PHARMACOLOGY PROGRAMS
George A. Dunaway, Ph.D.
Emeritus Professor, Department of Pharmacology
Southern Illinois University School of Medicine, Springfield, IL
IS THERE MORE TO TESTING THAN WRITING QUESTIONS?
Experimental Biology Meetings, April 10, 2011, Washington, DC
2
MAKING THE MOST OF AN ITEM ANALYSIS
Item analysis information can be used to make important decisions for high-risk examinations.
  For the present examination:
    Validity of each question, to decide whether to retain or omit exam questions
    Whole-test validity, to make pass/fail decisions
  For subsequent examinations, it can provide insights into improvement of questions.
3
ITEM ANALYSIS INFORMATION
Test information
  Date, number of examinees, and number of test items
  High and low scores, median and mean scores, SEM and SD
  Test reliability, e.g., Cronbach’s alpha, which can be interpreted as the mean of all possible split-half coefficients
  Ranking of individual examination scores
Question information
  Difficulty, i.e., % answering correctly, or “p” value
  Discrimination power of each question, e.g., biserial or point biserial (rpb)
  Frequency of selection of each option
  Test performance of the group selecting each option, whether correct or incorrect
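The test-level and question-level quantities listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0/1 response matrix is hypothetical, the function names are invented for this sketch, and SEM is left out because the slide does not say whether it means the standard error of the mean or of measurement. Cronbach's alpha is computed with the usual variance formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import mean, median, pvariance, stdev

# Hypothetical scored responses: one row per examinee, one column per
# question, 1 = answered correctly, 0 = answered incorrectly.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def item_difficulty(rows):
    """Return the p value of each question (fraction answering correctly)."""
    n_items = len(rows[0])
    return [mean(row[i] for row in rows) for i in range(n_items)]


def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


totals = [sum(row) for row in responses]
test_info = {
    "examinees": len(responses),
    "items": len(responses[0]),
    "high": max(totals),
    "low": min(totals),
    "median": median(totals),
    "mean": mean(totals),
    "sd": stdev(totals),  # sample SD (n - 1 denominator)
    "alpha": cronbach_alpha(responses),
}
p_values = item_difficulty(responses)

print(test_info)
print(p_values)
```

With real data, the rows would come from the scored answer sheets; the ranking of individual scores is simply `sorted(totals, reverse=True)`.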
4
MAKING SENSE OF THE POINT BISERIAL CORRELATION COEFFICIENT
$$r_{pb} = \frac{Y_c - Y_t}{S}\left[\frac{N_c}{N_t - N_c}\cdot\frac{N_t}{N_t - 1}\right]^{1/2}$$
$r_{pb}$ = discrimination power of a question, i.e., how well the ranking of students on each question correlates with their ranking by test average
$Y_c$ = mean test score of students answering the question correctly
$Y_t$ = mean test score of all students
$S$ = standard deviation of the test scores
$N_c$ = number answering the question correctly
$N_t$ = total number answering the question
If $Y_c > Y_t$, a positive correlation apparently exists between the whole test population and the question population.
The breadth of $S$ has an inverse effect on $r_{pb}$.
Population variations are weighted using the last term.
The magnitude of $r_{pb}$ suggests the extent of correlation of student scores on the question and on the test.
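As a rough check on the formula, the sketch below computes the point biserial coefficient term by term and compares it with an ordinary Pearson correlation between the 0/1 question score and the test score. It assumes that S is the sample standard deviation (the N_t - 1 denominator), which is the reading under which the two calculations agree exactly. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def point_biserial(correct, scores):
    """r_pb from the slide's formula; S is assumed to be the sample SD."""
    n_t = len(scores)
    right = [s for c, s in zip(correct, scores) if c]
    n_c = len(right)
    if n_c == 0 or n_c == n_t:
        raise ValueError("r_pb is undefined if all or none answer correctly")
    y_c = mean(right)   # mean test score of those answering correctly
    y_t = mean(scores)  # mean test score of all students
    s = stdev(scores)   # sample standard deviation of test scores
    return (y_c - y_t) / s * sqrt(n_c / (n_t - n_c) * n_t / (n_t - 1))


def pearson(x, y):
    """Plain Pearson correlation, used here only as a cross-check."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Hypothetical data: 1 = answered this question correctly, with each
# student's overall test score (%) in the same position.
correct = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [92, 85, 80, 78, 74, 70, 66, 58]

r_formula = point_biserial(correct, scores)
r_pearson = pearson(correct, scores)
print(r_formula, r_pearson)
```

The same function can be applied to any answer option, not only the keyed one, by passing 1 for students who chose that option; that is how a per-option rpb such as those in the examples can be obtained.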
5
SETTING STANDARDS FOR QUESTIONS
Average (p value) on the question is similar to the test average.
Majority of students (~70%) chose the correct answer.
All responses have been selected, i.e., no options were easily eliminated by guessing.
Performance of individuals on the question correlates with their performance on the whole test.
Realistically, we often have to settle for less.
6
PRACTICAL USE OF ITEM ANALYSIS TO EVALUATE TEST QUESTIONS
Consider MCQs used in a high-risk testing environment.
Use item analysis information to consider two aspects:
  Question suitability for the current high-risk examination
  Potential modifications for use on a later exam
Item analysis data that are particularly useful include the p value, rpb, frequency of selection of each option, and test scores of the students choosing each option.
For each question, determine the following:
  What is the p value?
  How well does the population answering correctly correlate with the whole test population (rpb)?
  How well does the population selecting each incorrect option correlate with the whole test population?
  What is the frequency of selecting each option?
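The per-option figures named above (frequency of selection and test scores of the students choosing each option) can be tabulated as in this minimal sketch. The answer choices, scores, and the `option_summary` helper are hypothetical; a five-option A to E question is assumed.

```python
from statistics import mean


def option_summary(choices, scores, options="ABCDE"):
    """Fraction selecting each option and mean test score of that group."""
    by_option = {opt: [] for opt in options}
    for choice, score in zip(choices, scores):
        by_option.setdefault(choice, []).append(score)
    n = len(choices)
    return {
        opt: {
            "selected": len(group) / n,
            # No mean exists for an option that nobody selected.
            "mean_score": mean(group) if group else None,
        }
        for opt, group in by_option.items()
    }


# Hypothetical data: the option each student chose on one question and
# that student's overall test score (%).
choices = ["E", "A", "E", "B", "C", "E", "A", "A"]
scores = [88, 66, 81, 69, 73, 79, 64, 71]

summary = option_summary(choices, scores)
for opt, row in summary.items():
    print(opt, row)
```

Listing every option up front means an option that nobody selected still appears, with a frequency of zero.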
7
GENERALITIES USING ITEM ANALYSIS INFORMATION
If the p value is too high, the question is likely too easy and testing only memorization.
If the p value is too low, the question is likely confusing, poorly written, or testing obscure information not in the learning issues.
  Action step: For high or low p values, (1) revise and retain the tested concept, (2) discard the question, and/or (3) improve learning resources.
If the rpb for the correct-answer population shows a poor correlation with the whole test population:
  Action step: Consider a possible keying error or a poorly written question that needs editing.
If one or more of the incorrect options are selected too rarely or too often:
  Action step: Consider revisions that exploit predictable misconceptions.
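These rules of thumb can be turned into a simple screening pass, sketched below. The numeric cut-offs (`p_high`, `p_low`, `rpb_min`, `freq_low`) are placeholders chosen for illustration only; the slide's own thresholds are not preserved in this transcript, so they should be replaced with whatever a program actually uses. The item data and the `review_item` helper are likewise hypothetical.

```python
def review_item(p_value, keyed_rpb, option_freq, keyed_option,
                p_high=0.90, p_low=0.30, rpb_min=0.15, freq_low=0.05):
    """Return review flags for one question; all cut-offs are placeholders."""
    flags = []
    if p_value > p_high:
        flags.append("p value high: possibly too easy")
    elif p_value < p_low:
        flags.append("p value low: possibly confusing, poorly written, or obscure")
    if keyed_rpb < rpb_min:
        flags.append("low rpb for keyed answer: check keying and wording")
    for option, freq in option_freq.items():
        if option == keyed_option:
            continue
        if freq < freq_low:
            flags.append(f"option {option} rarely selected")
        elif freq > option_freq[keyed_option]:
            flags.append(f"option {option} selected more often than keyed answer")
    return flags


# Hypothetical question: nearly everyone answers correctly and the
# distractors attract almost no one.
flags = review_item(
    p_value=0.95,
    keyed_rpb=0.05,
    option_freq={"A": 0.95, "B": 0.03, "C": 0.02, "D": 0.0, "E": 0.0},
    keyed_option="A",
)
for flag in flags:
    print(flag)
```

A flag is only a prompt for human review; the action steps (revise, discard, or improve learning resources) remain a judgment call.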
8
ITEM ANALYSIS INFORMATION FROM RECENT EXAMINATION (2010)
Student test scores were segregated into deciles, which yielded a “pseudo-normal” distribution.
  Mean: 74.0% (Median score: 73.7%)
  Std. Dev: 6%
  High score: 92.2%
  Low score: 55.7%
  Test Reliability: 0.88
Outcomes
  Pass: Test average no lower than 1 SD below the mean, i.e., score ≥ 68%
  Concern*: Test average between 1 and 2 SD below the mean, i.e., 62% < score < 68%
  Failure*: Score ≤ 62%
  *Unit-specific remediation(s) required at end of term
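The outcome bands follow directly from the mean and standard deviation reported on this slide (74 - 6 = 68 and 74 - 12 = 62). A minimal sketch, with a hypothetical function name and the slide's figures as default arguments:

```python
def classify_outcome(score, test_mean=74.0, sd=6.0):
    """Place a test average (%) into the pass / concern / failure bands."""
    pass_cut = test_mean - sd        # 1 SD below the mean: 68%
    fail_cut = test_mean - 2 * sd    # 2 SD below the mean: 62%
    if score >= pass_cut:
        return "pass"
    if score > fail_cut:
        return "concern"
    return "failure"


for score in (92.2, 68.0, 67.9, 62.0, 55.7):
    print(score, classify_outcome(score))
```

Because the cut-offs are derived from the cohort's own mean and SD, they move with each examination.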
9
EXAMPLE 1
p value: 0.26

Option  Selected  Mean test score  rpb
A       18%       67.1%            -0.109
B       19%       68.6%            -0.118
C       20%       72.8%            -0.008
D       17%       65.8%            -0.088
E       26%       81.4%            +0.339

What are the primary concerns for this question?
1. Are there concerns with the p value for the question?
2. What is suggested by the rpb for the correct-answer population?
3. Was the average test performance of those selecting incorrect options consistent with their test performance?
4. What was the distribution of selection of the test question options?
5. Keep (test-worthy) or discard (not suitable for current test)?
6. For later use, what potential question-specific modifications would you consider?
10
EXAMPLE 2
p value: 0.77

Option  Selected  Mean test score  rpb
A       02%       65.6%            -0.228
B       77%       77.4%            +0.182
C       05%       73.5%            -0.045
D       08%       67.8%            -0.213
E       08%       71.1%            -0.159

What are the primary concerns for this question?
1. Are there concerns with the p value for the question?
2. How well does the rpb for the correct-answer population correlate with the whole test population?
3. Was the average test performance of those selecting incorrect options consistent with test performance?
4. Was there adequate selection of the test question options?
5. Keep (test-worthy) or discard (not suitable for current test)?
6. For later use, what potential question-specific modifications would you consider?
11
EXAMPLE 3
p value: 0.18

Option  Selected  Mean test score  rpb
A       18%       74.0%            -0.066
B       06%       62.6%            -0.452
C       06%       74.4%            -0.019
D       12%       72.9%            -0.109
E       58%       77.0%            +0.351

What are the primary concerns for this question?
1. Are there concerns with the p value for the question?
2. How well does the rpb for the correct-answer population correlate with the whole test population?
3. Was the average test performance of those selecting incorrect options consistent with test performance?
4. Was there adequate selection of the test question options?
5. Keep (test-worthy) or discard (not suitable for current test)?
6. For later use, what potential question-specific modifications would you consider?
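One way to screen an item like Example 3 for the "possible keying error" mentioned earlier is sketched below, using the figures from the table. Two things here are assumptions rather than statements from the slides: the keyed option is not given, so it is inferred as the single option whose selection rate equals the p value, and the heuristic itself (keyed answer with a non-positive rpb while another option has the highest, positive rpb) is only one plausible reading of the guidance. Function names are hypothetical.

```python
# Example 3 figures from the table: fraction selecting each option, mean
# test score of that group (%), and point biserial for that option.
example_3 = {
    "A": {"selected": 0.18, "mean_score": 74.0, "rpb": -0.066},
    "B": {"selected": 0.06, "mean_score": 62.6, "rpb": -0.452},
    "C": {"selected": 0.06, "mean_score": 74.4, "rpb": -0.019},
    "D": {"selected": 0.12, "mean_score": 72.9, "rpb": -0.109},
    "E": {"selected": 0.58, "mean_score": 77.0, "rpb": 0.351},
}
P_VALUE = 0.18


def infer_keyed_option(options, p_value, tol=0.005):
    """Assumption: the keyed option is the one whose selection rate equals p."""
    matches = [opt for opt, row in options.items()
               if abs(row["selected"] - p_value) < tol]
    return matches[0] if len(matches) == 1 else None


def suspect_keying_error(options, keyed):
    """Return the option that looks like the true answer, or None."""
    best = max(options, key=lambda opt: options[opt]["rpb"])
    if best != keyed and options[keyed]["rpb"] <= 0 < options[best]["rpb"]:
        return best
    return None


keyed = infer_keyed_option(example_3, P_VALUE)
suspect = suspect_keying_error(example_3, keyed)
print("inferred key:", keyed, "| option to re-check:", suspect)
```

A flagged item still needs a content expert to confirm whether the key is wrong or the question is simply misleading.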
12
EXAMPLE 4
p value: 0.06

Option  Selected  Mean test score  rpb
A       45%       76.5%            +0.209
B       05%       66.5%            -0.267
C       06%       71.3%            -0.136
D       30%       75.4%            +0.101
E       06%       70.3%            -0.171

What are the primary concerns for this question?
1. Are there concerns with the p value for the question?
2. How well does the rpb for the correct-answer population correlate with the whole test population?
3. Was the average test performance of those selecting incorrect options consistent with test performance?
4. Was there adequate selection of the test question options?
5. Keep (test-worthy) or discard (not suitable for current test)?
6. For later use, what potential question-specific modifications would you consider?
13
APPENDIX
14
GOOD TEST RESULTS ARE GENERATED BY BOTH GOOD TEACHING AND QUESTIONS
Give the students a reasonable expectation of test material.
  That is, a reasonable set of objectives or expected outcomes and references for attaining the information.
Use a question format that tests the information to be learned by assessing skills, facts, and knowledge in the context in which they will be used.
  That is, when the student applies it as a professional.
Common mistakes reducing test effectiveness
  Lack of reasonable or predictable association between expectations and tested material
  Questions that do not require adequate understanding of tested material
  Poor syntax and grammar, which make expectations difficult to predict and lead to poor responses
15
COMMON (AVOIDABLE) MISTAKES LEADING TO UNRELIABLE MCQ ASSESSMENTS
Question construction gives clues to the correct answer or allows elimination of incorrect answers.
Heterogeneous or nonparallel content choices
Series of True/False options with no particular relevance to the stem
Use of “all of the above are correct” or “none of the above are correct” as answer choices
16
COMPOSING AND ASSESSING ESSAY QUESTIONS (EQS)
EQs require significant time and effort to compose.
EQs can be difficult to grade effectively and objectively.
The level of knowledge that can be assessed by EQs is somewhat different from other types of questions.
  That is, a well-designed EQ can assess conceptual clarity, organizational skills, and problem-solving skills.
  Further, teachers can gain insight into the effectiveness of their teaching and curriculum design.
The benefit to the student is that this type of problem-solving environment simulates science career experiences.
  EQ assessment gives the student and teacher insight into the student's basic knowledge and the ability to use it with an existing knowledge base to solve problems.
  Another advantage is that EQs can be stimulating and exciting for graduate students, which could reduce test anxiety.
  With feedback, the student can use this experience to identify the status and accessibility of their knowledge and to recognize the need for improvement.
17
ELEMENTS OF EQ CONSTRUCTION
Examine learning objectives to decide what information is to be tested.
  Incorporate into an EQ as many concepts from the learning objectives as is practical, to minimize the probes needed for their assessment.
After deciding on the concepts to evaluate for each EQ:
  Determine the extent of knowledge to be required.
  Consider what background knowledge the student should know and what you will provide.
Create a “scenario” that poses a problem requiring recall and use of the information (new and existing) to be tested.
  To make the student’s response easier to evaluate, the conundrum can have multiple embedded problems and distractors of varying levels of conceptual difficulty.
  The goal is to measure the student’s ability to use their new knowledge effectively in concert with an existing knowledge base.
  Provocative EQs present situations that have not been previously discussed and are likely to be unfamiliar, but can be analyzed using the student's knowledge.
After reading, the goal of the EQ is to provoke:
  Recall of all acquired knowledge relevant to the problem
  Assembly of the information
  Coherent integration of all knowledge (previous and newly acquired)
  Composition of a cogent response
18
ELEMENTS OF EQ CONSTRUCTION
A critical aspect of EQ construction is minimizing unintended distractions and ensuring an understanding of the breadth and depth of the expected response.
  Students should understand clearly what problem(s) must be considered.
  Remember that, unlike with MCQs, test takers do not have response options to cue them into what is being asked.
  Ambiguous questions often mislead informed students.
Expectations should be consistent with how the graduate student will need to use the information to solve career-associated problems.
  Instead of fact recall, questions should probe concepts primarily or secondarily associated with research or work-related experiences.
Ambiguities can also be reduced by avoiding words with multiple usages that could be confusing and by using correct grammar, spelling, and punctuation.
Provide clear instruction concerning how the question SHOULD and SHOULD NOT be answered.
  For example, indicate that responses to be graded should be complete sentences and that outlines will not substitute for answers.
  Adding a time expectation for answering each question is instructive.
Ask others to read your question for clarity.
  Ask the reader if they can tell you what you are asking of the student.
  Reviewers will not need to know the correct answer to provide feedback.
19
EQ ASSESSMENT
Prior to grading, construct for each question an outline listing all of the components that you expect for a perfect score.
  Assign relative point values for each component to obtain a score.
  Use the outline to explain to the student what you expected.
Use anonymity until final grading decisions are made.
Prior to assessment, make sets of identical questions.
  Referring to the question outline, evaluate all responses to one question before proceeding to grade the next question.
Using a grading outline for each question has many advantages.
  It is easier to be consistent and fair.
  It facilitates consistent discussion of grading standards.
  It minimizes more subtle forms of unrecognized bias.
Depending on EQ risk (e.g., course exam or Ph.D. exam), a percent grade or pass/fail recommendation can be chosen.
  For example, if a percentage grade is not needed, student performance could be categorized as (1) meeting, (2) exceeding, or (3) beneath passing standards.
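The grading outline with relative point values lends itself to a small scoring helper, sketched here under loud assumptions: the rubric components, their point values, the anonymized response, and the two category cut-offs (`passing`, `exceeding`) are all hypothetical and would be set by the examiner.

```python
# Hypothetical grading outline for one EQ: component -> maximum points.
rubric = {
    "identifies the mechanism": 4,
    "integrates prior knowledge": 3,
    "composes a cogent response": 3,
}


def score_response(outline, awarded):
    """Convert the points awarded per component into a percent grade."""
    earned = 0
    for component, max_points in outline.items():
        points = awarded.get(component, 0)  # missing component earns 0
        if not 0 <= points <= max_points:
            raise ValueError(f"invalid points for {component!r}")
        earned += points
    return 100 * earned / sum(outline.values())


def categorize(percent, passing=70.0, exceeding=90.0):
    """Map a percent grade to a category; both cut-offs are placeholders."""
    if percent >= exceeding:
        return "exceeding"
    if percent >= passing:
        return "meeting"
    return "beneath"


# Responses are keyed by an anonymous code until grading is final.
awarded_by_code = {
    "R-017": {
        "identifies the mechanism": 4,
        "integrates prior knowledge": 2,
        "composes a cogent response": 2,
    },
}
percent = score_response(rubric, awarded_by_code["R-017"])
print(percent, categorize(percent))
```

Scoring every response against the same outline, one question at a time, is what makes the resulting grades comparable across students.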