Automated Assessment of Programming Exercises
Programming Exercises
- Exercises fall into two categories: large and small.
- The two use cases are similar in some of what a system must provide, and different in the rest.
- We will start with systems for “small” exercises.
CodingBat
- Pretty basic, but a good yardstick to measure against.
- Fairly widely used in CS.
- Small exercises.
- Lots of sites have answers.
- http://codingbat.com
Codecademy
- http://codeacademy.com
Dynamic vs. Static Assessment
- The goal is to assess (automatically) and give feedback.
- Dynamic assessment: the runtime behavior of the student’s answer.
  - We generally use this as a proxy for “Does it compute the right function?”
  - A heuristic: unit tests.
- Static assessment: evaluation of the “quality” of the student’s answer (distinct from correctness).
  - Does it do things “in the right way”?
  - Heuristics on the code, NOT its behavior.
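To make the distinction concrete, here is a minimal sketch in Python. It assumes a hypothetical exercise ("sum a list recursively") and a hypothetical student answer; the names `dynamic_check` and `calls_itself` are invented for illustration and do not come from any of the systems discussed here. The answer passes the unit tests (dynamic) but fails a simple heuristic on the code (static).

```python
import ast

# Hypothetical student answer to "sum a list recursively".
STUDENT_SOURCE = '''
def list_sum(items):
    total = 0
    for x in items:
        total += x
    return total
'''

def dynamic_check(source, func_name, cases):
    """Dynamic: run the answer on (args, expected) pairs; return the failures."""
    namespace = {}
    exec(source, namespace)  # assumption: a real grader would sandbox this
    func = namespace[func_name]
    return [(args, expected) for args, expected in cases
            if func(*args) != expected]

def calls_itself(source, func_name):
    """Static: does the function's code contain a call to its own name?"""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return any(isinstance(n, ast.Call)
                       and isinstance(n.func, ast.Name)
                       and n.func.id == func_name
                       for n in ast.walk(node))
    return False

CASES = [(([],), 0), (([1, 2, 3],), 6), (([-1, 1],), 0)]
failures = dynamic_check(STUDENT_SOURCE, "list_sum", CASES)
is_recursive = calls_itself(STUDENT_SOURCE, "list_sum")
print("unit tests passed:", not failures)   # True
print("uses recursion:", is_recursive)      # False
```

The answer computes the right function, so behavior alone cannot show that it ignored the exercise's intent; only inspecting the code does.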
OpenDSA Programming Exercises
- Geared for small exercises (write/edit 5-20 lines of code).
- Evaluated in a “sandbox” on a server.
- Java-only right now.
- Process:
  1. Check for compiler errors; return if any.
  2. Dynamic analysis: run unit tests. Give feedback if there are problems.
  3. Static analysis: check various heuristics.
- http://algoviz.org/OpenDSA/Books/RecurTutor/html/CodeCompletionEx.html
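The three-stage process can be sketched as a short pipeline that stops at the first failing stage. This is an illustration only: OpenDSA's exercises are Java-only and its implementation is not shown in these notes, so the sketch below uses Python, and the function `assess`, its parameters, and the "no loops" heuristic are all assumptions.

```python
import ast

def assess(source, func_name, cases, banned_nodes=(ast.While, ast.For)):
    """Return (stage, message) for the first failing stage, else ("ok", ...)."""
    # Stage 1: compile; report syntax errors and stop.
    try:
        code = compile(source, "<student>", "exec")
    except SyntaxError as err:
        return ("compile", f"line {err.lineno}: {err.msg}")

    # Stage 2: dynamic analysis via unit tests.
    namespace = {}
    try:
        exec(code, namespace)  # assumption: a real system runs this in a sandbox
        func = namespace[func_name]
        for args, expected in cases:
            actual = func(*args)
            if actual != expected:
                return ("tests", f"input {args!r}: got {actual!r}, expected {expected!r}")
    except Exception as err:
        return ("tests", f"runtime error: {type(err).__name__}: {err}")

    # Stage 3: static analysis; here a single hypothetical heuristic.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, banned_nodes):
            return ("static", f"line {node.lineno}: loops are not allowed here")
    return ("ok", "all checks passed")

CASES = [((0,), 1), ((5,), 120)]
GOOD = "def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)\n"
BROKEN = "def fact(n)\n    return 1\n"
WRONG = "def fact(n):\n    return n\n"
LOOPY = ("def fact(n):\n    result = 1\n"
         "    for i in range(2, n + 1):\n        result *= i\n"
         "    return result\n")

for label, src in [("good", GOOD), ("broken", BROKEN), ("wrong", WRONG), ("loopy", LOOPY)]:
    print(label, assess(src, "fact", CASES))
```

Ordering the stages this way means a student sees the cheapest, most actionable feedback first: a syntax error is reported before any test runs, and style heuristics are only applied to answers that already behave correctly.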
Feedback: Program Visualization
- Algorithm Visualization (AV) vs. Program Visualization (PV)
  - AV: tutorial presentation of an algorithm. The algorithm is “baked into” the AV.
  - PV: visual feedback generated from a program to help understand that particular program.
- jGRASP
- JhavePOP http://jhave.org/jhavepop
- Code Mirror (PythonTutor, Guo) http://pythontutor.com
- PV does not address whether the program is “correct”; it helps users decide that for themselves.
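The raw material for a program visualization is a trace of the program's state as it runs. The sketch below records the local variables before each line of one function executes, using Python's standard `sys.settrace` hook. It is not how any of the tools listed above are implemented; `trace_locals` and `running_max` are hypothetical names chosen for this example.

```python
import sys

def trace_locals(func, *args):
    """Run func(*args); record (line offset, locals) before each line executes."""
    steps = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # ignore every other function
        if event == "line":
            offset = frame.f_lineno - target.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, steps

def running_max(values):
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

result, steps = trace_locals(running_max, [3, 7, 5])
for offset, local_vars in steps:
    print(f"line +{offset}: {local_vars}")
print("result:", result)
```

A visualizer would render each recorded step graphically. Note that the trace only shows what the program did; judging whether that is what it should have done is left to the reader.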
Khan Academy (Programming)
- HTML5-based.
- Integrates tutorial content, quizzes, and programming exercises.
- At least some exercises are HEAVILY scaffolded, and heavily scripted with detailed requirements.
- https://www.khanacademy.org/computing/computer-programming/programming
Web-CAT
- Designed to support grading of large programs.
- Supports aspects of both manual grading and automated grading.
- Style checking
- Unit testing (test cases “correct” or not, time constraint)
- Testing vs. debugging
- Code coverage of student tests
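"Code coverage of student tests" asks how much of the solution the student's own tests actually exercise. The sketch below measures line coverage of a single function with the standard `sys.settrace` and `dis.findlinestarts` facilities. It is a toy under stated assumptions, not Web-CAT's mechanism: `line_coverage`, `classify`, and the test inputs are hypothetical, and a real grader would rely on a dedicated coverage tool.

```python
import dis
import sys

def line_coverage(func, test_inputs):
    """Return (covered, missing) line offsets of func after running test_inputs."""
    code = func.__code__
    first = code.co_firstlineno
    executable = {line - first
                  for _, line in dis.findlinestarts(code) if line is not None}
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None
        if event in ("call", "line"):
            executed.add(frame.f_lineno - first)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for args in test_inputs:
            func(*args)
    finally:
        sys.settrace(previous)
    return executed & executable, executable - executed

def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

# Hypothetical student tests that never try zero.
covered, missing = line_coverage(classify, [(5,), (-1,)])
print("missing line offsets:", sorted(missing))

# Adding the zero case exercises the remaining branch.
covered_all, missing_all = line_coverage(classify, [(5,), (-1,), (0,)])
print("missing after adding classify(0):", sorted(missing_all))
```

Offsets are counted from the `def` line, so offset 4 is the `return "zero"` statement that the first set of tests never reaches.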