UMDNJ-New Jersey Medical School
Developing a Departmental Examination for a Third Year Family Medicine Clerkship
Judy C. Washington, MD, Jesse Crosson, PhD, Chantal Brazeau, MD
Educational Goals and Objectives
Participants attending this lecture-discussion will leave understanding:
- How to develop a departmental exam from an existing database
- How to apply the National Board of Medical Examiners (NBME) standards to exam kit questions
- How to validate a test instrument using existing resources
- How a statistician's assistance can simplify the process of developing a standardized exam from an exam kit
Our Team
Our team had no experience in designing a test, but…
- Dr. Washington
  - took courses in curriculum design/evaluation
  - served as a content expert
- Dr. Brazeau
  - has experience with standard setting with the NBME
- Dr. Crosson
  - has the statistical expertise
  - assisted in the analysis of the reports from the testing center
  - instructed them how to do the year-end summary analysis
Rationale
- Current SHELF Exam
  - not relevant to the 20 common problems and other important concepts in Family Medicine
  - well standardized
- Clerkship faculty need a reliable examination for the third year clerkship
- Using Sloane’s Essentials of Family Medicine (4th edition), the Exam Kit could help solve the dilemma
- Used both exams during the transition (curved)
Developing the question database
- Categorizing and discarding questions
  - Predoc education committee (five faculty)
  - NBME standards were used
- Selecting the items
  - Developed a database of hard/easy questions
  - Rewrote or developed new questions
- Developing the test
  - Allowed the computer to select the questions
  - 60 hard/40 easy for a total of 100 questions
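The computer selection step could look roughly like the sketch below. It assumes the database is simply two lists of item identifiers (one hard, one easy) and that items are drawn at random without replacement. The function name `assemble_exam`, the identifiers, the pool sizes, and the final shuffle are illustrative assumptions, not details of the software we used.

```python
import random

def assemble_exam(hard_pool, easy_pool, n_hard=60, n_easy=40, seed=None):
    """Draw n_hard and n_easy distinct items at random and mix their order."""
    if len(hard_pool) < n_hard or len(easy_pool) < n_easy:
        raise ValueError("question pool is too small for the requested mix")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    exam = rng.sample(hard_pool, n_hard) + rng.sample(easy_pool, n_easy)
    rng.shuffle(exam)  # assumption: hard and easy items are interleaved
    return exam

# Hypothetical item identifiers standing in for the real question database
hard_pool = [f"H{i:03d}" for i in range(150)]
easy_pool = [f"E{i:03d}" for i in range(120)]

exam = assemble_exam(hard_pool, easy_pool, seed=2002)
print(len(exam), "questions selected")
```

Keeping the hard and easy pools separate lets the 60/40 mix be enforced exactly on every draw.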
NBME Standards
- Testwiseness: avoid absolute terms, grammatical cues, and a long correct answer
- Irrelevant difficulty: avoid complex options, "none of the above", tricky stems, and vague terms
- Other guidelines: avoid negatively phrased items; long stems with short options are best; avoid trivial facts; test important concepts
Validating the Exam
- Administered the first version (July 2002)
  - 100 questions
  - Students were responsible for the entire text
  - Students continued to take the SHELF
- Discarded non-discriminatory questions
  - Too easy/too hard
- Calculating the reliability coefficient
  - Calculated by our testing center; Dr. Crosson determined the reliability of the number
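One common way to screen for questions that are too easy or too hard is to look at the proportion of students who answered each item correctly. The sketch below assumes responses are scored 0/1. The cutoffs (above 0.90 is too easy, below 0.30 is too hard) are illustrative assumptions, not the thresholds our testing center applied.

```python
def item_difficulty(responses):
    """Return the proportion of students answering each item correctly.

    responses: one list of 0/1 item scores per student.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(student[i] for student in responses) / n_students
            for i in range(n_items)]

def flag_items(p_values, too_easy=0.90, too_hard=0.30):
    """Flag items outside the acceptable difficulty band (assumed cutoffs)."""
    flags = {}
    for i, p in enumerate(p_values):
        if p > too_easy:
            flags[i] = "too easy"
        elif p < too_hard:
            flags[i] = "too hard"
    return flags

# Made-up scores: five students, four items
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
p_values = item_difficulty(responses)
flags = flag_items(p_values)
print(p_values)
print(flags)
```

An item that everyone answers correctly (or that almost no one does) cannot separate stronger from weaker students, which is why such items are candidates for discarding or rewriting.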
Reliability Coefficient and Discrimination Index
- Reliability Coefficient
  - The extent to which the test is likely to produce consistent scores
  - Types: inter-correlations between items, length and content of the test
  - Ranges from 0 (no reliability) to 1.00 (perfect reliability)
- Discrimination Index
  - Difference between the % correct in the upper and lower group
- Point Biserial Correlation
  - Correlation between examinees' performance on the item (right or wrong) and total test score
- Mean average score
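The three statistics above can be sketched in a few lines. The slides do not say which reliability formula the testing center used. KR-20 is shown here as an assumption because it is a standard choice for right/wrong items. The 27% upper and lower groups are a common convention and also an assumption, and the data are made up.

```python
from statistics import mean, pstdev, pvariance

def kr20(responses):
    """KR-20 reliability for 0/1 item scores (assumed formula, see lead-in)."""
    k = len(responses[0])
    totals = [sum(student) for student in responses]
    p = [mean([student[i] for student in responses]) for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

def discrimination_index(responses, item, fraction=0.27):
    """Proportion correct in the upper group minus that in the lower group."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return mean([s[item] for s in upper]) - mean([s[item] for s in lower])

def point_biserial(responses, item):
    """Pearson correlation between the item score (0/1) and the total score.

    Undefined (division by zero) if every student has the same item score.
    """
    x = [student[item] for student in responses]
    y = [sum(student) for student in responses]
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

# Made-up scores: six students, four items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
reliability = kr20(responses)
d_item0 = discrimination_index(responses, 0)
r_item0 = point_biserial(responses, 0)
print(round(reliability, 3), round(d_item0, 3), round(r_item0, 3))
```

Note that this point biserial uses the total score including the item itself. Some item-analysis reports instead correlate the item with the total of the remaining items.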
Validating the Exam
- Fourth version of the exam (year 1) had a high reliability coefficient
  - reliability coefficient was .85
  - lowest grade was 56 and highest grade was 89, with a mean of 72
  - test was curved to the mean during 2002-2003 but not 2003-2004
- Administered the same exam January-June 2003 (year 1) and July 2003-June 2004 (year 2)
  - Cumulative reliability coefficient was .78 (.8 to .9 was our goal)
Challenges
- Two exams were viewed as jumping through hoops
  - Not using the SHELF 2004-2005
- Significant number of students failed (18)
- The departmental exam was viewed as not representative of what was covered
  - Exam now covers fewer chapters that include the 20 common problems
Challenges
- Creating an exam that was felt to be more representative
  - We reviewed the old exam
  - Discarded questions
  - Used questions from the database that we created and from the MUSC site
- Validating a new exam
  - Decided to use the Modified Angoff Procedure
  - Convened faculty to review the exam and calculate the score
Modified Angoff procedure
- A group of experts discusses the characteristics of a “borderline” examinee
- For each item on the test, judges estimate the percentage of borderline examinees who would answer the item correctly
- The pass/fail standard is the average of the percentages for the items
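The arithmetic of the procedure can be sketched as follows: average the judges' estimates for each item, then average those item values to get the pass/fail standard. The ratings below are made up for illustration, and the function name `angoff_cut_score` is hypothetical.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """Compute per-item Angoff values and the overall pass/fail standard.

    ratings[j][i] is judge j's estimate (0-100) of the percentage of
    borderline examinees who would answer item i correctly.
    """
    n_items = len(ratings[0])
    item_means = [mean([judge[i] for judge in ratings])
                  for i in range(n_items)]
    return item_means, mean(item_means)

# Made-up estimates: three judges, four items
ratings = [
    [80, 60, 70, 90],
    [70, 50, 80, 90],
    [75, 55, 75, 90],
]
item_means, cut_score = angoff_cut_score(ratings)
print(item_means)  # per-item Angoff values
print(cut_score)   # pass/fail standard, in percent
```

Because every judge rates every item, the per-item values also show which questions the panel considers easy for a borderline student, which is useful when deciding which items to rewrite.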
Angoff Scores
Challenges
- The present exam
  - Mean score is the average of the Angoff Score: 71.97
  - Curved to 81.61 (based on best rotation from last year)
  - Reliability coefficient is lower: 0.66
  - 4 students failed
- Making changes to improve reliability
  - Reconvened faculty to review 19 questions that are too easy
  - Recalculate the Angoff of these questions and the mean of the test, or…
  - Rewrite questions, calculate the Angoff, and then the mean of the test
- Next exam to be given February 11, 2005
Discussion
- Questions about the process?
- What resources do you already have to allow you to do this process?
  - Could your testing center provide the analysis?
  - Do you have a statistician?
- Do you find students complain about the content of the exam?
  - This helps make it valid, especially when you have to report to the Dean of Education
- Is anyone using a similar process?
Resources
- Constructing Written Test Questions for the Basic and Clinical Sciences, Section IV. National Board of Medical Examiners. http://www.nbme.org/PDF/2001iwgsec4.pdf