Exploring the Relationship Between Novice Programmer Confusion and Achievement
By: Diane Marie Lee, Ma. Mercedes Rodrigo, Ryan Baker, Jessica Sugay, Andrei Coronel
Affective States and Achievement
Recent studies have illustrated the relationship between affective states and achievement
Negative affective states have a negative impact on students’ achievement (Craig et al., 2006; Rodrigo, 2009; Lagud, 2010)
  Craig et al. (AutoTutor): boredom was negatively correlated with learning gains
  Rodrigo: boredom and confusion were associated with lower achievement in the programming course
  Lagud (Aplusix): the highest levels of boredom and confusion appeared among low-achieving students
Confusion
Double-edged / dual nature (D’Mello, 2009)
  Harmful
  Helpful
Goal
Use a discovery-with-models approach to find the relationship between novice programmer confusion and achievement
Data Collection
149 students enrolled in CS21a – Introduction to Computing I
Four lab sessions
BlueJ IDE
BlueJ plug-in (Jadud and Henriksen, 2009), connected to a SQLite database server
Data Collection
Compilation logs = all submissions made to the compiler
Compilation logs include:
  Computer number
  Timestamp
  Code
  Error message (if any)
  And many more!
Data Collection
Total of 340 student-lab sessions
Total of 13,528 compilation logs collected
13,000+ compilation logs would be too hard to label
Confusion can’t be seen in a single compilation, so a group of compilations must be examined
Data Labeling
Sorted the compilations by student and by Java class name
Grouped the compilations into clips
  Clip = 8 compilations
  Total: 2,386 clips
Raters were asked to label a sample of 664 clips
  Wanted a representative sample, so 2 clips were taken per student
  However, some students had fewer than 16 compilations during the whole lab session
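The sorting and clipping steps above can be sketched in a few lines of Python. This is an illustration, not the authors' code: the field names `student`, `class_name`, and `timestamp` are hypothetical, and the sketch assumes that clips do not overlap, that compilations are ordered by time within a class, and that a trailing group of fewer than 8 compilations is dropped. The slides do not state any of these choices.

```python
from itertools import groupby

CLIP_SIZE = 8  # from the slides: one clip = 8 compilations


def make_clips(compilations, clip_size=CLIP_SIZE):
    """Group compilation records into fixed-size clips per (student, class).

    Each record is a dict with hypothetical keys 'student', 'class_name',
    and 'timestamp'. Assumption: clips are non-overlapping and incomplete
    trailing groups are discarded.
    """
    def owner(record):
        return (record["student"], record["class_name"])

    # Sort by student, then Java class name, then time (time order is assumed).
    ordered = sorted(compilations, key=lambda r: (*owner(r), r["timestamp"]))

    clips = []
    for _, group in groupby(ordered, key=owner):
        records = list(group)
        # Step through the group in blocks of clip_size, keeping full blocks only.
        for start in range(0, len(records) - clip_size + 1, clip_size):
            clips.append(records[start:start + clip_size])
    return clips
```

Under these assumptions, a student with 20 compilations in one class yields two clips and the last 4 compilations are left out, which is consistent with some students contributing fewer than 2 clips.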
Data Labeling
Used low-fidelity text replays
  Maintain good inter-rater reliability and are efficient in aiding coders to label student disengagement (Baker et al., 2006)
Labels: Confused, Not Confused, Bad Clip
Cohen’s Kappa between raters: 0.77
Patterns the coders read as confusion:
  The same error appeared in the same general vicinity within the code for several consecutive compilations. The coders inferred that the student did not know what was causing the error or how to fix it.
  An assortment of errors appeared in consecutive compilations and remained unresolved. The coders inferred that the student was experimenting with solutions, changing the actual error message but not addressing the real source of the error.
  Code malformations showed a poor understanding of Java constructs, e.g. “return outside method”. The coders inferred that the student did not grasp even the basics of program construction, despite the availability of written aids such as Java code samples and explanatory slides.
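For readers unfamiliar with the agreement statistic reported above, Cohen's Kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The sketch below is a generic implementation of that standard formula; the label codes and example data are made up for illustration and are not the study's ratings.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)

    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the raters' marginal label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up example: C = Confused, N = Not Confused, B = Bad Clip.
rater_1 = ["C", "C", "N", "N", "B", "N", "C", "N"]
rater_2 = ["C", "N", "N", "N", "B", "N", "C", "C"]
kappa = cohens_kappa(rater_1, rater_2)  # 6 of 8 agree; kappa = 11/19, about 0.58
```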
Data Labeling
Filtered out “bad clips”
Removed clips where raters disagreed on the label
Left with 418 clips for model construction
Model Construction
Used RapidMiner version 5.1
Features were mined from the clips
J48 decision trees with 10-fold batch cross-validation at the student level
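Cross-validation at the student level is read here as keeping all clips from one student in the same fold, so that a model is never tested on a student it was trained on. The sketch below shows one way to build such folds in plain Python; it is not the RapidMiner setup used in the study, and the function name, shuffling, and round-robin assignment are assumptions made for illustration.

```python
import random


def student_level_folds(clip_students, n_folds=10, seed=0):
    """Split clip indices into folds so each student falls in exactly one fold.

    clip_students[i] is the (hypothetical) student id of clip i.
    Returns a list of n_folds lists of clip indices.
    """
    students = sorted(set(clip_students))
    rng = random.Random(seed)
    rng.shuffle(students)

    # Assign students to folds round-robin after shuffling.
    fold_of = {student: i % n_folds for i, student in enumerate(students)}

    folds = [[] for _ in range(n_folds)]
    for index, student in enumerate(clip_students):
        folds[fold_of[student]].append(index)
    return folds
```

Each fold then serves once as the test set while the remaining nine are used for training.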
Model Construction
Feature set used (time- and error-related features):
  Average time between compilations
  Maximum time between compilations
  Average time between compilations with errors
  Maximum time between compilations with errors
  Number of compilations with errors
  Number of pairs of consecutive compilations ending with the same error
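The six features listed above could be computed from one clip roughly as follows. This is a sketch under stated assumptions: each compilation is a dict with hypothetical keys `timestamp` (seconds) and `error` (the error message, or `None` for a clean compile), and "time between compilations with errors" is interpreted as the gap between successive compilations that produced an error. The slides do not define these features precisely.

```python
def gaps(timestamps):
    """Differences between successive timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def mean_or_zero(values):
    return sum(values) / len(values) if values else 0.0


def clip_features(clip):
    """Time- and error-related features for one clip (a list of compilations)."""
    times = [c["timestamp"] for c in clip]
    error_times = [c["timestamp"] for c in clip if c["error"] is not None]

    all_gaps = gaps(times)
    error_gaps = gaps(error_times)  # assumed meaning of "between compilations with errors"

    # Consecutive compilations that both ended with the same error message.
    same_error_pairs = sum(
        1
        for first, second in zip(clip, clip[1:])
        if first["error"] is not None and first["error"] == second["error"]
    )

    return {
        "avg_time_between": mean_or_zero(all_gaps),
        "max_time_between": max(all_gaps, default=0.0),
        "avg_time_between_errors": mean_or_zero(error_gaps),
        "max_time_between_errors": max(error_gaps, default=0.0),
        "num_error_compilations": len(error_times),
        "num_same_error_pairs": same_error_pairs,
    }
```

A feature dictionary of this kind, one per clip, is the sort of input a decision tree learner such as J48 would be trained on.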
Kappa of the resulting model: 0.86
Data Relabeling
Model was coded as a Java program
Had the program relabel all 2,386 clips
Generated three sets of Confused/Not Confused sequences
We counted the number of occurrences of each state or sequence per student within each set. The total number of sequences per student varied. We therefore normalized the data by dividing the number of occurrences of each state or sequence per student by the total number of occurrences for that student.
Correlated each student’s percentage of each sequence with their midterm exam score
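The counting and normalization described above can be illustrated with a short sketch. It assumes that each student's relabeled clips form an ordered string of `C` (Confused) and `N` (Not Confused) labels and that sequences are taken as overlapping windows of a fixed length; both the encoding and the windowing are assumptions, since the slides do not say how sequences were extracted.

```python
from collections import Counter


def sequence_proportions(labels, length):
    """Share of each label sequence of the given length for one student.

    labels: ordered per-clip labels, e.g. "NCCN" (hypothetical encoding).
    Returns a dict mapping each observed sequence to count / total count.
    """
    windows = [
        "".join(labels[i:i + length])
        for i in range(len(labels) - length + 1)
    ]
    counts = Counter(windows)
    total = sum(counts.values())
    if total == 0:
        return {}  # student has fewer clips than the sequence length
    return {sequence: count / total for sequence, count in counts.items()}


pairs = sequence_proportions("NCCN", 2)    # NC, CC, CN each make up 1/3
triples = sequence_proportions("NCCN", 3)  # {'NCC': 0.5, 'CCN': 0.5}
```

Dividing by the student's own total makes students with many clips comparable to students with few.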
Results
Relationship with midterm

Sequence                     r       p
Not Confused-Not Confused    .064    .539
Not Confused-Confused        .139    .180
Confused-Not Confused        .144    .163
Confused-Confused           -.229    .026
Results
Relationship with midterm (N = Not Confused, C = Confused)

Sequence    r       p
NNN        -.015    .901
NNC         .014    .909
NCN         .062    .610
NCC        -.046    .704
CNN         .233    .05
CNC         .163    .174
CCN         .052    .665
CCC        -.337    .004
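The r values in the results tables relate each student's normalized sequence percentage to the midterm score. Assuming Pearson's correlation coefficient was used (the slides report R and P but do not name the test), r can be computed as in the sketch below; the p-value additionally requires the t distribution and is usually taken from a statistics library such as `scipy.stats.pearsonr`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)
```

Here `xs` would hold one sequence's percentage per student (for example CCC) and `ys` the same students' midterm scores, in the same order.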
Conclusion
Prolonged confusion has a negative impact on students’ performance
Resolved confusion has a positive impact on students’ performance
A certain amount of confusion is needed for learning
On-going Work
Support the incorporation of tools for automatic detection of confusion in computer science learning environments
Redoing the sampling and clipping method
Thank you
Questions?
Confusion = thrashing = repeatedly getting errors
Area for future work = go deeper into the confusion literature