Problem and Research Approach

In today's climate of increasingly large computer science classes, educators face the challenge of delivering high-quality, individualized feedback on programming assignments. This feedback is essential to student learning. Peer review is a proven learning approach that allows students to observe and critique different solutions to a problem, as well as to receive detailed feedback on their own work. The overarching purpose of this research is to investigate whether peer review can be an adequate substitute for instructor review. We also investigate whether a combination of reviewers with specific demographics and personal characteristics could prove more effective than a single reviewer.

Research Questions

Can peer review adequately substitute for instructor review?
What reviewer demographics make for a sufficient peer reviewer?
Will students feel that they are getting reliable feedback from peers?

Results and Reflections

The following results were obtained by performing statistical analysis on the data collected from the reviews and from the post-review survey.

Result: Comparing the mean of the reviewer-given grades with the mean of the instructor-given grades, we find a significant difference between the two.
Reflection: Reviewer-given scores differ significantly from instructor-given scores, so peer reviews drawn from all reviewers are not an adequate substitute for instructor reviews. This is not entirely surprising, however, because some students are poor reviewers.

Result: We analyzed the absolute differences between reviewer-given and instructor-given scores using a paired t-test and identified 20 reviewers (out of 35) for whom there was no statistically significant difference between their scores and the instructor's.
Reflection: Taken together, the best reviewers produce scores comparable to the instructor's. This raises the possibility of identifying the best reviewers in a class and relying more heavily on their reviews as substitutes for instructor reviews (a sketch of this screening step appears after this section).

Result: Looking at reviewees' feedback on their reviewers in the categories of amount of detail, accuracy, and helpfulness, and comparing ratings for all reviewers against those for the best reviewers, we find that the only category showing a significant difference is accuracy.
Reflection: This is expected and reassuring: accuracy of the review is important, and the best reviewers should generate more accurate reviews than the pool of all reviewers (which includes the bad reviewers). It was somewhat surprising, however, that review detail and helpfulness did not distinguish the outstanding reviews.

Result: Having identified the better reviewers, we compared their demographics and personal characteristics against those of the less accurate reviewers and found no significant correlations connecting any particular characteristic to being a good reviewer.
Reflection: We find the absence of correlations rather unexpected. We plan to repeat the experiment with different demographic measures to see whether this affects the correlations.
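To make the screening step concrete, the sketch below shows one way the comparison could be run with SciPy's paired t-test: pool all score pairs for the overall comparison, then test each reviewer's scores against the instructor's. This is an illustration under assumed data, not the study's actual analysis code; the reviewer IDs, score values, and significance threshold are hypothetical placeholders.

    # Hypothetical illustration of the paired t-test screening; the data
    # below are placeholders, not the scores collected in the study.
    from scipy import stats

    # {reviewer_id: [(reviewer_score, instructor_score), ...]}
    scores = {
        "r01": [(85, 90), (78, 80), (92, 88)],
        "r02": [(60, 85), (70, 92), (55, 75)],
        "r03": [(88, 86), (74, 75), (91, 93)],
        # ... one entry per reviewer (35 in the study)
    }

    # Overall comparison: pool every (reviewer, instructor) pair.
    pairs = [p for ps in scores.values() for p in ps]
    overall = stats.ttest_rel([r for r, _ in pairs], [i for _, i in pairs])
    print(f"overall: t = {overall.statistic:.2f}, p = {overall.pvalue:.3f}")

    # Per-reviewer screening: flag reviewers whose scores show no
    # significant difference from the instructor's (assumed alpha = 0.05).
    good = []
    for rid, ps in scores.items():
        res = stats.ttest_rel([r for r, _ in ps], [i for _, i in ps])
        if res.pvalue >= 0.05:
            good.append(rid)
    print("reviewers comparable to the instructor:", good)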
Research Model

Measures: pre-survey, review assignments, post-review survey.
Independent variables: reviewee; reviewer demographics and personal characteristics.*
Dependent variables: accuracy; helpfulness; difference between instructor-given and reviewer-given scores.

*Demographics and personal characteristics include gender, age, year in school, patience, problem-solving ability, ability to look at something new and figure it out, honesty, general curiosity, fairness, ability to give quality advice, type of learner, learning speed, personality type, coding confidence, amount of experience in the subject, average grade in the subject (A, B, C, D, or F), and amount of review experience.

Acknowledgements

The author acknowledges the Blugold Fellowship of the University of Wisconsin – Eau Claire for funding this research. Special thanks to the student subjects in the experiment for their participation and cooperation, and to Dr. Joline Morrison and Dr. Mike Morrison for help with the work. The author also acknowledges the previous student researchers associated with this project: Luke Komiskey, Brandon Holt, Daphne Brinkerhoff, and Greg Boettcher.

Methods

The following experimental steps were carried out in a second-semester freshman computer science programming class:
1. A pre-survey collected the demographic and personal data (listed in the footnote of the research model above) describing the student subjects.
2. The students completed assignments for the class and were then asked to review three peer assignments each, answering detailed questions provided by the instructor and giving a final score for the assignment. Each student thus composed three reviews and received three reviews; thirty-five usable reviews were completed.
3. A post-review survey asked the students to compare the feedback they received from peers with the feedback they received from the instructor on the criteria of brevity, accuracy, and helpfulness. This survey also directly addressed whether students find feedback from peers reliable.

Result: When asked after the reviews were completed whether they found feedback from peers reliable, 27 of 33 students (about 82 percent) said "yes."
Reflection: This is highly important: even if peer review is shown to be an adequate statistical substitute for instructor review, that result will not matter unless students trust the feedback and accept it as reliable.

Future Directions

Refine the demographic and review-feedback questions in hopes of finding a pattern of characteristics that classifies good reviewers (see the sketch after this list).
Have student subjects perform more reviews in order to gather more data.
Gather data over time to see whether students improve as reviewers.
Manipulate different factors to see whether they improve review results (examples include training reviewers or providing a rubric for the assignment being graded).
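The first future direction amounts to screening each measured characteristic for association with reviewer quality. The sketch below shows one plausible form of that screening, correlating each characteristic with the good-reviewer flag via a point-biserial correlation; the characteristic names, ratings, and flags are hypothetical placeholders, not data from the study.

    # Hypothetical screening of reviewer characteristics against the
    # good-reviewer flag; all values below are invented placeholders.
    from scipy import stats

    # One row per reviewer: numeric self-ratings from the pre-survey.
    characteristics = {
        "patience":          [4, 2, 5, 3, 4],
        "coding_confidence": [3, 5, 2, 4, 3],
        "review_experience": [1, 0, 2, 1, 3],
    }
    # 1 = scores matched the instructor's in the t-test screening, else 0.
    is_good_reviewer = [1, 0, 1, 1, 0]

    # A low p-value would suggest the characteristic helps predict
    # reviewer quality; the study found no such correlations.
    for name, values in characteristics.items():
        r, p = stats.pointbiserialr(is_good_reviewer, values)
        print(f"{name:18s} r = {r:+.2f}  p = {p:.3f}")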