EXAMS: WHAT ARE THEY REALLY GOOD FOR, ANYWAY?
What professors say about exams. What students hear about exams.
The standard assumption in nearly all education research is that learning occurs while students study and encode material. It is also generally assumed that testing is a relatively neutral event that only measures the learning that occurred during study but is not, in and of itself, a learning experience. Thus, if learning happens exclusively during study periods, and if tests are neutral assessments, then additional study trials should have a strong positive effect on learning, whereas additional test trials should produce little effect. If repeated study and/or test trials of already-learned material do benefit learning, this would contradict the conventional wisdom that students should drop material they have already learned from further study or testing in order to focus their efforts on material they have not yet learned. This latter strategy is implicitly endorsed by many contemporary theories of study-time allocation and is built into many popular study methods (e.g., flash cards).
Some numbers for your consideration:
The number of hours the Coursemaster spends preparing, setting up, proctoring, and grading each Anatomy exam:
Making up exam: ~2 hours
Setting up exam: ~2 hours
Proctoring exam: ~1 hour
Grading exam: ~5 hours
TOTAL/EXAM: ~10 hours
TOTAL FOR 3 EXAMS: ~30 HOURS
The total number of hours the Coursemaster devotes to teaching for the entire course is ~140 hours. Thus, the Coursemaster spends roughly 20% as much time on duties related solely to testing as he does on teaching the entire course. In a Pass/Fail grading system, this effort is made solely to identify those students who do not meet a minimal level of competency in the subject (usually a grade of at least 60-65%). In over 25 years of teaching at WUMS, the proportion of such students has normally been <1%.
Questions to ponder: Are exams a useful way for both students and faculty to be spending their time? Are exams simply anxiety-producing neutral events that only measure learning that occurred during study, or do they contribute to learning in some fundamental way?
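The time figures on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the hour values are the approximate ones quoted on the slide, and the variable names are illustrative.

```python
# Approximate hours per exam, taken from the figures quoted on the slide.
hours_per_exam = {
    "making up": 2,
    "setting up": 2,
    "proctoring": 1,
    "grading": 5,
}
exams_per_course = 3
teaching_hours = 140  # approximate total teaching hours for the course

hours_one_exam = sum(hours_per_exam.values())
hours_all_exams = hours_one_exam * exams_per_course
exam_share = hours_all_exams / teaching_hours

print(f"Hours per exam: {hours_one_exam}")
print(f"Hours for all exams: {hours_all_exams}")
print(f"Exam time relative to teaching time: {exam_share:.0%}")
```

The ratio works out to 30/140, or about 21%, which the slide rounds to ~20%.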
In a recent experiment, 40 Washington University undergraduates were asked to learn a list of 40 Swahili-English word pairs. Students learned the list across a total of 8 alternating study (S) and test (T) periods. The first study period consisted of 40 study trials (5 secs/pair) followed by 40 test trials (8 secs/pair). After that, the number of study and test trials varied according to the condition.
1. In the first condition, subjects repeatedly studied, and were tested on, the entire list of 40 word pairs in each study and test period (denoted ST).
2. In the second condition, once a word pair was “learned” (i.e., recalled) it was dropped from further study but still tested in each subsequent test period (denoted SnT).
3. In the third condition, “learned” pairs were dropped from further testing but still studied in each subsequent study period (denoted STn).
4. In the fourth condition, “learned” pairs were dropped from both study and test periods (denoted SnTn). This condition represents what conventional wisdom and many educators instruct students to do: study something until it is learned (i.e., can be recalled) and then drop it from further study and testing (“non-cumulative” exams?).
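The four conditions differ only in whether an already-recalled pair is dropped from later study periods, later test periods, both, or neither. A minimal Python sketch of that rule is below. The class, the function, and the placeholder pair labels are all hypothetical illustrations, not the materials or code used in the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    drop_from_study: bool  # stop studying a pair once it has been recalled
    drop_from_test: bool   # stop testing a pair once it has been recalled


CONDITIONS = [
    Condition("ST", False, False),
    Condition("SnT", True, False),
    Condition("STn", False, True),
    Condition("SnTn", True, True),
]


def next_period(condition, all_pairs, recalled):
    """Return (pairs to study, pairs to test) for the next study/test period.

    `recalled` holds the pairs recalled at least once on earlier tests.
    """
    remaining = [p for p in all_pairs if p not in recalled]
    to_study = remaining if condition.drop_from_study else list(all_pairs)
    to_test = remaining if condition.drop_from_test else list(all_pairs)
    return to_study, to_test


# Hypothetical toy list: four placeholder labels, two of them already recalled.
pairs = ["pair1", "pair2", "pair3", "pair4"]
recalled = {"pair1", "pair2"}
for cond in CONDITIONS:
    study, test = next_period(cond, pairs, recalled)
    print(f"{cond.name}: study {len(study)}, test {len(test)}")
```

With nothing recalled yet, every condition studies and tests the whole list, which matches the identical first period described above.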
Which learning strategy do you think led to a steeper learning curve of the study material?
Jeffrey D. Karpicke and Henry L. Roediger III, “The Critical Importance of Retrieval for Learning,” Science, Vol. 319, 15 Feb. 2008
Cumulative performance during the learning phase. There were no differences in the learning curves of the four learning strategies. IT DIDN’T MAKE THE SLIGHTEST BIT OF DIFFERENCE! The students in all four conditions also predicted they would recall about 50% of the word pairs when they were retested in 1 week’s time.
THIS IS NOT WHAT HAPPENED!
Figure: Proportion of word pairs recalled on the retest taken 1 week after “learning”, with the total number of trials in each condition.
BEARING THESE RESULTS IN MIND, WHERE DO WE GO FROM HERE? Because our medical student body is so select, we can afford to be more creative in our “testing” of students. Our exams should NOT simply test the basic “competency level” of our students. Our students ARE competent! One possibility: let the NBME filter out the fewer than 1% who aren’t competent for one reason or another. Medicine, like all science today, is a collaborative enterprise. Therefore, we need to find creative ways to make our testing reflect this new paradigm. For example, can we find ways to “team test” and still be confident that each individual student is doing their part? Peer pressure may be a powerful motivator in this regard. Testing should be frequent, non-threatening (self-testing?), and collaborative.
KEEPING IT FRESH: THE QUESTION OF THE WEEK (our attempt to combine competency skills, testing benefits, and benign peer pressure)
WEEK 1: Can these two x-rays be from the same patient? If so, where is the injury? Is the spinal cord damaged? If so, which part?
WEEKS 2-3: Would this patient need to be on a respirator? Why or why not? Would any part of the autonomic nervous system be damaged? Where would you insert a needle to obtain a CSF sample from this patient? What anatomical landmarks would you use for this procedure?
WEEKS 6-7: Would you expect any loss of bladder control and/or sexual function? Why or why not?
WEEKS 9-11: Would you expect any motor and/or sensory loss in this patient’s upper limbs? Thorax? Abdomen? Lower limbs?
WEEK: What might be a cause of elevated CSF pressure? In such a condition, would you still consider collecting a CSF sample the same way you suggested in WEEK 2?
SO WHAT’S THE TAKE HOME MESSAGE FOR WUMS STUDENTS? STUDY LESS AND TEST MORE!
THE END Keeping It Fresh: The Challenge of An Old Subject (Anatomy) Middle-Aged Professors and Young Students