Designing a Classroom Test
Anthony Paolo, PhD
Director of Assessment & Evaluation, Office of Medical Education, & Psychometrician for CTC Teaching & Learning Technologies
September 2008

Content
– Purpose of classroom test
– Test blueprint & specifications
– Item writing
– Assembling the test
– Item analysis

Purpose of Classroom Test
– Establish a basis for assigning grades
– Determine how well each student has achieved course objectives
– Diagnose student problems
– Identify areas where instruction needs improvement
– Motivate students to study
– Communicate what material is important

Test Blueprint
– Ensure the test assesses what you want to measure
– Ensure the test assesses the level or depth of learning you want to measure

Bloom's Revised Cognitive Taxonomy

Remembering & Understanding
– Remembering: Retrieving, recognizing, recalling relevant knowledge.
– Understanding: Constructing meaning from information through interpreting, classifying, summarizing, inferring, explaining.
– ITEM TYPES: MC, T/F, Matching, Short Answer

Applying & Analyzing
– Applying: Implementing a procedure or process.
– Analyzing: Breaking material into constituent parts, determining how the parts relate to one another and to an overall structure or purpose through differentiating, organizing, and attributing.
– ITEM TYPES: MC, Short Answer, Problems, Essay

Evaluating & Creating
– Evaluating: Making judgments based on criteria & standards through checking and critiquing.
– Creating: Putting elements together to form a coherent or functional whole; reorganizing elements into a new pattern or structure through generating, planning, or producing.
– ITEM TYPES: MC, Essay

Test Blueprint – Learning Level (number of test items)

Content/Objective    Knows facts (Recall)    Understanding    Applies Principles (Application)    Total
Krebs Cycle                                                                                        10
Aquaporins                                                                                         10
Cell Types                    5                     0                        0                      5
Total                        10                     7                        8                     25

Test Specifications
To ensure the test covers the content and/or objectives in the proper proportions.

Test Specifications

Topic          Time spent on topic    % of total class time    Number (%) of test items
Krebs Cycle    10 hrs                 40%                      10 (40%)
Aquaporins     10 hrs                 40%                      10 (40%)
Cell Types     5 hrs                  20%                      5 (20%)
Total          25 hrs                 100%                     25 (100%)
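
As a quick illustration of the proportional allocation shown above, here is a small Python sketch. The topic hours and the 25-item total come from the table; the simple rounding rule is my assumption.

```python
# Allocate test items in proportion to class time spent on each topic.
# Topic hours and the 25-item total are taken from the table above; simple
# rounding is an assumption (adjust by hand if rounded counts don't sum to 25).
hours = {"Krebs Cycle": 10, "Aquaporins": 10, "Cell Types": 5}
total_items = 25

total_hours = sum(hours.values())
for topic, h in hours.items():
    share = h / total_hours
    print(f"{topic}: {share:.0%} of class time -> {round(total_items * share)} items")
```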

Item Writing – General Guidelines 1
– Present a single, clearly defined problem that is based on a significant concept rather than trivial or esoteric ideas
– Use simple, precise & unambiguous wording
– Exclude extraneous or irrelevant information
– Eliminate any systematic pattern of answers that may allow guessing correctly

Item Writing – General Guidelines 2
– Avoid cultural, racial, ethnic & sexual bias; avoid presupposed knowledge that favors one group over another ("fly ball" favors those who know baseball)
– Refrain from providing unnecessary clues to the correct answer
– Avoid negatively phrased items (e.g., except, not)
– Arrange answers in alphabetical/numerical order

Item Writing – General Guidelines 3
– Avoid "None of the above" or "All of the above" type answers
– Avoid "Both A & B" or "Neither A nor B" type answers

Item Writing – Correct Answer
The correct answer is often the one that:
– Is longer
– Is more qualified or more general
– Uses familiar phraseology
– Is grammatically correct for the item stem
– Is 1 of the 2 similar statements
– Is 1 of the 2 opposite statements

Item Writing – Wrong Answer
The wrong answer is usually the one that:
– Is the first or last option
– Contains extreme words (always, never, nonsense, etc.)
– Contains unexpected language or technical terms
– Contains flippant remarks or completely unreasonable statements

Item Writing – Grammatical Cues

Item Writing – Logical Cues

Item Writing – Absolute Terms

Item Writing – Word Repeats

Item Writing – Vague Terms

Item Writing
Effective test items match the desired depth of learning as directly as possible.

Applying & Analyzing
– Applying: Implementing a procedure or process.
– Analyzing: Breaking material into constituent parts, determining how the parts relate to one another and to an overall structure or purpose through differentiating, organizing, and attributing.
– ITEM TYPES: MC, Short Answer, Problems, Essay

Comparison of MC & Essay 1
– Depth of learning: Essay – can measure application and more complex outcomes; poor for recall. MC – can be designed to measure application and more complex outcomes as well as recall.
– Item prep: Essay – fewer test items, less prep time. MC – relatively large number of items, more prep time.
– Content sampling: Essay – limited, few items. MC – broader content sampling.

Comparison of MC & Essay 2
– Encouragement: Essay – encourages organization, integration & effective expression of ideas. MC – encourages development of a broad background of knowledge & abilities.
– Scoring: Essay – time consuming, requires special measures for consistent results. MC – easy to score with consistent results.

Item Writing – Application
MC application-of-knowledge items tend to have long vignettes that require decisions. Case et al. at the NBME investigated how increasing the level of interpretation, analysis and synthesis required to answer a question affects item performance (Academic Medicine, 1996;71).

Item Writing - Application

Preparing & Assembling the Test
– Provide general directions
  – Time allowed (allow enough time to complete the test)
  – How items are scored
  – How to record answers
  – How to record name/ID
– Arrange items systematically
– Provide adequate space for short-answer and essay responses
– Placement of easier & harder items

Interpreting Test Scores
– Teachers: high scores = good instruction; low scores = poor students
– Students: high scores = smart, well-prepared; low scores = poor teaching, bad test

Interpreting Test Scores
– High scores: test too easy, only measured simple educational objectives, biased scoring, cheating, unintentional clues to right answers
– Low scores: test too hard, tricky questions, content not covered in class, grader bias, insufficient time to complete the test

Item Analysis
The main purpose of item analysis is to improve the test. Analyze items to identify:
– Potential mistakes in scoring
– Ambiguous/tricky items
– Alternatives that do not work well
– Problems with time limits

Reliability
The reliability of a test refers to the extent to which the test is likely to produce consistent results.
– Types of reliability: test-retest, split-half, internal consistency
– Reliability coefficients range from 0 (no reliability) to 1 (perfect reliability)
– Internal consistency is usually measured by Kuder-Richardson 20 (KR-20) or Cronbach's coefficient alpha
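
Test-scoring software typically reports an internal consistency coefficient for you. For readers who want to see what KR-20 and coefficient alpha actually compute, here is a minimal Python sketch, assuming a students-by-items matrix of 0/1 scores; all data are invented.

```python
# Minimal sketch of Cronbach's alpha for a 0/1 item-score matrix
# (rows = students, columns = items); for dichotomous items this equals KR-20.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    k = scores.shape[1]                   # number of items
    item_vars = scores.var(axis=0).sum()  # sum of item variances (p*q for 0/1 items)
    total_var = scores.sum(axis=1).var()  # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

X = np.array([[1, 1, 1, 0],   # invented responses: 6 students x 4 items
              [1, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
print(f"alpha / KR-20 = {cronbach_alpha(X):.2f}")
```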

Internal Consistency Reliability
High reliability means that the questions on the test tended to hang together: students who answered a given question correctly were more likely to answer other questions correctly. Low reliability means that the questions tended to be unrelated to each other in terms of who answered them correctly.

Reliability Coefficient Interpretation
General guidelines for homogeneous tests:
– .80 and above – very good reliability
– .70 to .80 – good reliability; a few test items may need to be improved
– .50 to .70 – somewhat low; several items will likely need improvement (unless the test is short, 15 or fewer items)
– .50 and below – questionable reliability; the test likely needs revision
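
If exams are scored in a spreadsheet or script, the bands above are easy to encode. A rough sketch follows; the boundaries and wording are taken from the guidelines, and assigning boundary values to the higher band is my choice.

```python
# Encode the guideline bands above (homogeneous tests only).
def interpret_reliability(r: float) -> str:
    if r >= 0.80:
        return "Very good reliability"
    if r >= 0.70:
        return "Good reliability; a few test items may need to be improved"
    if r >= 0.50:
        return "Somewhat low; several items will likely need improvement (unless short test, 15 or fewer items)"
    return "Questionable reliability; test likely needs revision"

print(interpret_reliability(0.66))   # falls in the .50-.70 band
```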

Item Difficulty 1
– The proportion of students that got the item correct (ranges from 0% to 100%)
– Helps evaluate whether an item is suited to the level of examinee being tested
– Very easy or very hard items cannot adequately discriminate between student performance levels
– The spread of student scores is maximized with items of moderate difficulty

Item Difficulty 2
Moderate item difficulty is the point halfway between a perfect score and a chance score.

Item format     Moderate difficulty level
4-option MC     63%
5-option MC     60%
10-option MC    55%
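
A quick check of the halfway rule, assuming the chance score is simply 100% divided by the number of options (the 63% entry is 62.5% rounded up):

```python
# Moderate difficulty = midpoint between a perfect score (100%) and the
# chance score (100% / number of options).
def moderate_difficulty(n_options: int) -> float:
    chance = 100 / n_options
    return (100 + chance) / 2

for n in (4, 5, 10):
    print(f"{n}-option MC: {moderate_difficulty(n):.1f}%")   # 62.5%, 60.0%, 55.0%
```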

Item Discrimination 1
– How well does the item separate those who know the material from those who do not?
– In LXR, discrimination is measured by the point-biserial (rpb) correlation, which ranges from -1 to 1
– rpb is the correlation between item and exam performance

Item Discrimination 2
– A positive rpb means that those scoring higher on the exam were more likely to answer the item correctly (better discrimination)
– A negative rpb means that high scorers on the exam answered the item wrong more frequently than low scorers (poor discrimination)
– A desirable rpb correlation is or higher
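
As a concrete illustration with made-up scores, the point-biserial is just the Pearson correlation between a 0/1 item score and the total exam score; scipy.stats.pointbiserialr returns the same number. Some programs correlate the item with the total excluding that item to avoid slightly inflating rpb.

```python
# Point-biserial sketch with invented data: correlate one item's 0/1 scores
# with each student's total exam score.
import numpy as np

item   = np.array([1, 0, 1, 1, 0, 1, 0, 1])          # 1 = answered this item correctly
totals = np.array([22, 14, 20, 25, 12, 18, 15, 24])  # total exam scores

rpb = np.corrcoef(item, totals)[0, 1]
print(f"rpb = {rpb:.2f}")   # positive here: high scorers tended to get the item right
```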

Evaluation of Distractors
– Distractors are designed to fool those who do not know the material; those who do not know the answer guess among the choices
– Distractors should be equally popular (# expected per distractor = # who answered the item wrong / # of distractors)
– Distractors ideally have a low or negative rpb
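
A tiny worked example of the expected-count rule above, with invented counts:

```python
# If 40 students miss a 5-option item, each of the 4 distractors should draw
# roughly 40 / 4 = 10 students; options far from that count deserve a look.
n_wrong = 40
n_distractors = 4
print(n_wrong / n_distractors)   # 10.0
```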

LXR Example 1 (* correct answer)

Options: A*, B, C, D, E
% choosing: 99% 0% 1% 0%
Avg % correct on exam: 85.3% 0% 82.0% 0%
rpb:
Very easy item; would probably review the alternatives to make sure they are not ambiguous and/or do not provide clues that they are wrong.

LXR Example 2 (* correct answer)

Option                   A       B       C*      D       E
% choosing               0%      24%     74%     2%      0%
Avg % correct on exam    0%      80.7%   87.2%   78.7%   0%
rpb
Three of the alternatives are not functioning well; would review them.

LXR Example 3 (* correct answer)

Option                   A       B       C*      D       E
% choosing               3%      1%      17%     6%      76%
Avg % correct on exam    83.0%   80.0%   83.4%   82.2%   86.8%
rpb
Probably a miskeyed item. The correct answer is likely option E.

LXR Example 4 (* correct answer)

Option                   A       B*      C       D       E
% choosing               13%     49%     3%      25%     9%
Avg % correct on exam    81.5%   87.4%   82.3%   84.5%   82.4%
rpb
Relatively hard item with good discrimination. Would review alternatives C & D to see why they attract a relatively low and high number of students, respectively.

LXR Example 5 (* correct answer)

Option                   A       B*      C       D       E
% choosing               3%      69%     1%      6%      21%
Avg % correct on exam    83.0%   85.3%   80.0%   82.2%   86.8%
rpb
Poor discrimination for the correct choice "B"; choice "E" actually does a better job of discriminating. Would review the item for proper keying, ambiguous wording, proper wording of alternatives, etc. This item needs revision.
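
Pulling the last two examples together, one check is easy to automate if your item-analysis report gives a per-option rpb: flag the item whenever a distractor discriminates better than the keyed answer, as in Examples 3 and 5. The values below are hypothetical.

```python
# Flag possible miskeyed or ambiguous items: any distractor with a higher rpb
# than the keyed answer warrants review.
def flag_suspect_options(option_rpb: dict, key: str) -> list:
    return [opt for opt, r in option_rpb.items() if opt != key and r > option_rpb[key]]

rpbs = {"A": -0.05, "B": 0.02, "C": 0.04, "D": -0.10, "E": 0.28}   # hypothetical
print(flag_suspect_options(rpbs, key="C"))   # ['E'] -> check the key and the wording
```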

Resources
– Constructing Written Test Questions for the Basic and Clinical Sciences (National Board of Medical Examiners)
– How to Prepare Better Multiple-Choice Test Items: Guidelines for University Faculty (Brigham Young University: testing.byu.edu/info/handbooks/betteritems.pdf)

Thank you for your time. Questions?