Electronic Essay Graders
Jay Lubomirski
Topics
- How electronic essay graders evaluate writing samples
- Comparing electronic graders to human graders
- Gaming the system
ETS e-rater®
- Educational Testing Service (ETS) is a non-profit test administration organization
- Responsible for tests such as the GRE®, SAT® Subject Tests, and TOEFL® test
- Criterion® Service: an online writing evaluation service
- e-rater® scoring engine: the system that scores essays written within the Criterion® Service
e-rater®
- Started in 1998, with new versions released since
- Focuses on writing quality rather than content
- Uses natural language processing (NLP) to analyze grammar, usage, mechanics, and development
- Goal: predict the score a human grader would give an essay
Process
- e-rater® is fed a sample set of essays written to the same prompt (question), together with the scores a human grader gave them
- e-rater® builds a model of those essays and of how their features relate to the human scores
- e-rater® is then fed the evaluation essays to score
- Underlying assumption: "good essays resemble other good essays"
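e-rater®'s actual model is proprietary and is not described on this slide. To illustrate the "good essays resemble other good essays" assumption, the sketch below swaps in a deliberately simple nearest-neighbour approach: each essay becomes a bag-of-words vector, similarity is cosine similarity, and a new essay's predicted score is the similarity-weighted average of the human scores of its most similar training essays. The function names (`train`, `predict`) and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts, with punctuation stripped from word edges."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(scored_essays: list[tuple[str, float]]) -> list[tuple[Counter, float]]:
    """'Training' here just stores each essay's word vector with its human score."""
    return [(bag_of_words(text), score) for text, score in scored_essays]


def predict(model: list[tuple[Counter, float]], essay: str, k: int = 3) -> float:
    """Similarity-weighted mean of the human scores of the k most similar essays."""
    if not model or k < 1:
        raise ValueError("need a non-empty model and k >= 1")
    vec = bag_of_words(essay)
    ranked = sorted(((cosine(vec, v), s) for v, s in model), reverse=True)
    neighbours = ranked[:k]
    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim == 0:
        # No word overlap with any neighbour: fall back to an unweighted mean.
        return sum(s for _, s in neighbours) / len(neighbours)
    return sum(sim * s for sim, s in neighbours) / total_sim


if __name__ == "__main__":
    model = train([
        ("The rising cost of college is driven by administrative spending and new buildings.", 5),
        ("College is expensive for many reasons.", 3),
        ("I like college.", 1),
    ])
    query = "Administrative spending and new buildings drive the rising cost of college."
    print(round(predict(model, query, k=1), 2))  # → 5.0
```

Note what this sketch shares with the concern raised later in the deck: it rewards surface resemblance to high-scoring essays and has no way to check whether the claims in an essay are true.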
Grammar & Lexical Complexity
- The grammar checker looks for 30 error types, including:
  - Subject-verb agreement
  - Homophone errors
  - Misspellings
  - Overuse of vocabulary
- The lexical complexity scorer computes a word frequency index for the essay and compares it against the word frequencies in the model
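The slide does not define the word frequency index, so the following is only one plausible reading, sketched under stated assumptions: build a frequency table from the model essays, score an essay by the average log relative frequency of its words (rarer words pull the index down), and compare that to the model essays' average. The smoothing of unseen words and all function names are hypothetical, not ETS's formula.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens (apostrophes kept, e.g. "don't")."""
    return re.findall(r"[a-z']+", text.lower())


def build_frequency_table(model_essays: list[str]) -> tuple[Counter, int]:
    """Word counts over the model (training) essays, plus the total token count."""
    counts: Counter = Counter()
    for essay in model_essays:
        counts.update(tokenize(essay))
    return counts, sum(counts.values())


def word_frequency_index(essay: str, table: Counter, total: int) -> float:
    """Mean log relative frequency of the essay's words in the model corpus.

    Words never seen in the model are treated as if seen once (a hypothetical
    smoothing choice). Lower, more negative values mean rarer vocabulary.
    """
    words = tokenize(essay)
    if not words or total == 0:
        return 0.0
    logs = [math.log(max(table[w], 1) / total) for w in words]
    return sum(logs) / len(logs)


def compare_to_model(essay: str, model_essays: list[str]) -> float:
    """Essay's index minus the model essays' average index (negative = rarer words)."""
    if not model_essays:
        raise ValueError("need at least one model essay")
    table, total = build_frequency_table(model_essays)
    model_mean = sum(word_frequency_index(e, table, total) for e in model_essays) / len(model_essays)
    return word_frequency_index(essay, table, total) - model_mean


if __name__ == "__main__":
    model = ["the cost of college is high", "the cost of the college is rising"]
    print(compare_to_model("exorbitant tuition escalates", model) < 0)  # → True
```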
Organization and Development
- Automatically identifies sentences that belong to essay-discourse categories:
  - Introductory material
  - Thesis
  - Main ideas
  - Supporting ideas
  - Conclusion
- Organization is determined by computing the length of the discourse elements
- The result is scored against the model
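A minimal sketch of the length-based organization step, assuming the sentences have already been labelled with a discourse category (the automatic sentence classifier itself is out of scope here, so the labels are supplied by hand). The category strings, the function names, and the scoring rule (1 minus the total variation distance between the essay's and the model's length proportions) are hypothetical choices for illustration.

```python
CATEGORIES = (
    "introductory material",
    "thesis",
    "main idea",
    "supporting idea",
    "conclusion",
)


def element_lengths(labelled_sentences: list[tuple[str, str]]) -> dict[str, int]:
    """Total word count per discourse category, given (label, sentence) pairs."""
    lengths = {category: 0 for category in CATEGORIES}
    for label, sentence in labelled_sentences:
        if label not in lengths:
            raise ValueError(f"unknown discourse category: {label!r}")
        lengths[label] += len(sentence.split())
    return lengths


def organization_score(lengths: dict[str, int], model_lengths: dict[str, int]) -> float:
    """1.0 when the essay's length proportions match the model's, 0.0 when disjoint.

    Hypothetical rule: 1 minus the total variation distance between the two
    proportion distributions. Missing categories count as length 0.
    """
    essay_total = sum(lengths.values())
    model_total = sum(model_lengths.values())
    if essay_total == 0 or model_total == 0:
        return 0.0
    distance = 0.5 * sum(
        abs(lengths.get(c, 0) / essay_total - model_lengths.get(c, 0) / model_total)
        for c in CATEGORIES
    )
    return 1.0 - distance


if __name__ == "__main__":
    essay = element_lengths([
        ("thesis", "College costs are rising because of administrative spending"),
        ("main idea", "Administrators have multiplied"),
        ("conclusion", "Spending must be controlled"),
    ])
    model = {"introductory material": 30, "thesis": 20, "main idea": 60,
             "supporting idea": 120, "conclusion": 30}
    print(round(organization_score(essay, model), 2))
```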
Scoring the Systems
- In 2012, Mark Shermis compared 9 electronic grading systems (8 commercial, 1 open source) on essays written to 8 prompts
- The essays came from high school writing assessments and had already been graded by human readers
- The results showed that electronic essay scoring could produce scores similar to those of human readers
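The slide says the machine scores were "similar" to human scores without naming a measure. Comparisons of this kind typically quantify agreement with statistics such as exact agreement, adjacent agreement (scores within one point), and quadratic weighted kappa. Below is a minimal sketch of all three, assuming integer scores on a known scale; the function names and the toy scores are illustrative, not data from the Shermis study.

```python
def agreement(human: list[int], machine: list[int]) -> tuple[float, float]:
    """Return (exact, adjacent) agreement rates between two lists of scores."""
    if len(human) != len(machine) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(human)
    exact = sum(h == m for h, m in zip(human, machine)) / n
    adjacent = sum(abs(h - m) <= 1 for h, m in zip(human, machine)) / n
    return exact, adjacent


def quadratic_weighted_kappa(human: list[int], machine: list[int],
                             min_score: int, max_score: int) -> float:
    """Chance-corrected agreement; large disagreements are penalized quadratically."""
    k = max_score - min_score + 1
    if k < 2:
        raise ValueError("score scale needs at least two points")
    if len(human) != len(machine) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        if not (min_score <= h <= max_score and min_score <= m <= max_score):
            raise ValueError(f"score out of range: {h}, {m}")
        observed[h - min_score][m - min_score] += 1
    n = len(human)
    human_hist = [sum(row) for row in observed]
    machine_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            disagreement += weight * observed[i][j]
            expected += weight * human_hist[i] * machine_hist[j] / n
    # expected == 0 only when both raters give one identical constant score.
    return 1.0 - disagreement / expected if expected else 1.0


if __name__ == "__main__":
    human = [1, 2, 3, 4]
    machine = [2, 2, 3, 3]
    print(agreement(human, machine))                               # → (0.5, 1.0)
    print(round(quadratic_weighted_kappa(human, machine, 1, 4), 3))  # → 0.667
```

Exact agreement alone can look low on wide score scales, which is why adjacent agreement and a chance-corrected statistic are usually reported alongside it.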
Problems with electronic grading systems
- These systems analyze language structure; they cannot verify the facts presented in an essay
- Les Perelman, Director of Writing at MIT, wrote an essay that received the top score from e-rater®
- The essay prompt was about the rising cost of college
- Perelman based his essay on the premise that college costs are so high because "Teaching assistants are paid an excessive amount of money."
“In conclusion, as Oscar Wilde said, ‘I can resist everything except temptation.’ Luxury dorms are not the problem. The problem is greedy teaching assistants. It gives me an organizational scheme that looks like an essay, it limits my focus to one topic and three subtopics so I don’t wander about thinking irrelevant thoughts, and it will be useful for whatever writing I do in any subject. I don’t know why some teachers seem to dislike it so much. They must have a different idea about education than I do.”
Sources
- Winerip, M. (2012). “Facing a Robo-Grader? Just Keep Obfuscating Mellifluously.” New York Times, April 22. Retrieved 4/13/2013 from essays.html
- Ramineni, C. (2012). “Evaluation of the e-rater® Scoring Engine for the GRE® Issue and Argument Prompts.” Educational Testing Service. Retrieved 4/13/2013.
- Kolowich, S. (2012). “Large study shows little difference between human and robot essay graders.” Inside Higher Ed. Retrieved 4/11/2013 from between-human-and-robot-essay-graders
- Shermis, M. (2012). “Contrasting State-of-the-Art Automated Scoring of Essays: Analysis.” Retrieved 4/13/2013.
- Dikli, S. (2006). “An Overview of Automated Scoring of Essays.” Journal of Technology, Learning, and Assessment, 5(1).