A First Look at Novice Compilation Behavior Using BlueJ By Cole Spicer
Introduction
Where does correct code come from?
When and what students choose to compile
How do students tend to program in lab?
What types of errors do they encounter?
The experiment was conducted at the University of Kent

The researchers conducted this experiment to understand the programming behavior of beginning computer science students, because perfect code doesn't just fall from the sky. Good programming habits are learned through repetitive teaching and experience, and by getting students to approach each problem in a constructive way. The experiment in the paper analyzes beginning students' programming behavior by looking at the types of errors they encounter, the sequence of those errors, and the time between them.
Introduction cont…
Look into novice compilation behavior
Goal: to help inform the teaching of programming and program development

The experiment analyzes beginning students' programming behavior by looking at the types of errors they encounter, the sequence of errors within a session, and the time between errors. The goal is to give feedback to both professors and students. With this feedback, professors can look at the data and see which points they need to stress more. For example, if a student is consistently forgetting the semicolon at the end of each statement, the professor can see this and mention it to the student in the next lab session, and the student's semicolon errors may decrease.
BlueJ
Java programming development environment for small-scale software development

For those who do not know, BlueJ is a Java programming environment for small-scale software development. Here are a couple of screenshots of BlueJ.
Previous Work
Most studies explore the cognitive psychology of novice programmers
Planning
Pedagogic IDE
Error rates
Error message design
Characterizing novices
Few studies explore the programmer's behavior

Previous work related to compilation behavior tends to deal only with the cognitive psychology of novice programmers, that is, with what the programmer understands. Some cognitive studies characterize the planning and problem-solving process. These studies don't help us much, because they do not address the syntactic problems students face while programming. There are also studies on designing programming languages for novices and the environments that support those languages, but few studies have looked at how students actually use these environments. Other previous studies are not necessarily behavioral but are still important to this study: those dealing with error rates and error message design. They provide models and ideas for analyzing the process students go through while programming. Another study, on characterizing novices, classified students as either stoppers or movers. Stoppers were students who constantly asked for help every step of the way, while movers tried to figure out problems on their own. The study also mentions extreme movers, who paid little attention to compiler feedback and just hacked away until they figured it out. But we wouldn't know anything about that, would we?
Methods
Observed novice compilation behavior in classroom tutorial sessions
Met once a week for one hour
63 students
Worked through one or two problems to help illustrate concepts from that week's lecture

For the study, 63 students signed consent forms to have their data recorded. Labs met once a week.
Methods cont…
Set BlueJ up to report at compile time:
Complete source from the student's session
Metadata
Username
Research site
Client-side index indicating compilation number in the current sequence
Compilation result
Filename
Start time of compile
Etc.
Shipped to a server for storage and later analysis

BlueJ was set up to report, at compile time, the student's complete source along with metadata: username, research site (Kent, for this experiment), a client-side index indicating the compilation number in the current sequence, the compilation result, the filename, the start time of the compile, and so on. The data was then shipped to a server for storage and later analysis.
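To make the list of reported fields concrete, here is a minimal sketch of what one compile-time report could look like as a record. The class name, field names, and JSON encoding are assumptions made for illustration; the slides do not say how BlueJ actually packaged or transmitted the data.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
import json


@dataclass
class CompileEvent:
    """One compile-time report (hypothetical field names)."""
    username: str      # who compiled
    site: str          # research site, e.g. "Kent"
    index: int         # compilation number in the current sequence
    success: bool      # compilation result
    filename: str      # file that was compiled
    started: datetime  # start time of the compile
    source: str        # complete source at compile time

    def to_json(self) -> str:
        # Serialize the record so it could be shipped to a server.
        record = asdict(self)
        record["started"] = self.started.isoformat()
        return json.dumps(record)


event = CompileEvent(
    username="student01",
    site="Kent",
    index=3,
    success=False,
    filename="Ticket.java",
    started=datetime(2005, 10, 3, 14, 5, 0),
    source="public class Ticket { int price }",
)
payload = event.to_json()
```

Keeping the complete source in every record is what makes the later analyses possible, such as counting the characters changed between two compiles.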
Course Marks
4 marked assignments
Three were take-home coursework
One was an in-class exam
Intended to give students and instructors a sense of where they stood

During the term the 63 students had four marked assignments: three were take-home coursework and the final one was an in-class exam. These assignments were used to give the professor feedback about where the students stood at that point.
Grade Distributions
Assessments

Here's a box-and-whisker plot. The point of this graph is to see where the 63 students participating in the experiment stood compared to the entire class, and as you can see, they're pretty similar. This tells us that the results from this experiment can be used to represent the entire class.
Attendance
University of Kent programming courses: average absence is one class session out of eight
Students in the study missed significantly more
Only six students managed to attend seven of nine sessions
Reasons may be lack of motivation

Attendance was a problem for the researchers. At the University of Kent, the average absence for programming courses is one class session out of eight. Students in the experiment missed significantly more: the researchers state that students missed one session out of two, and only six students managed to make it to seven of the nine sessions. The likely reason was lack of motivation. Once students realized that they were not getting extra credit for the class and were not getting paid for their efforts, there was little incentive to come.
Analysis of Results
A minority of error types accounts for the majority of errors dealt with by students
Mostly quick-fix errors
This can be seen as the user just letting the compiler do the thinking for them

The researchers say that a minority of error types accounts for the majority of errors dealt with by students. These are mostly quick-fix errors, so a typical programming behavior is to write some code and then go back and fix all the errors. The researchers also say that this may look like the student is just letting the compiler do the work for them. I know from my own experience: I use Eclipse, where automatic compilation gives you instant feedback and automatic completion finishes statements for you. Next, the researchers look into error types and their distribution to give a clearer picture of novice programming behavior.
Error Types and Distribution
Total of 1,926 errors
Of the 42 error types, the five most common accounted for 58% of all errors:
Missing semicolon: 18%
Unknown symbol (variable): 12%
Bracket expected: 12%
Illegal start of expression: 9%
Unknown symbol (class): 7%
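As a quick check on the arithmetic, the five percentages on this slide do add up to the stated 58%. The sketch below also shows one way such a "top n share" could be computed from a raw list of error messages. The function name and the sample data are hypothetical, not taken from the study.

```python
from collections import Counter

# Percentages as given on the slide.
slide_shares = {
    "Missing semicolon": 18,
    "Unknown symbol: variable": 12,
    "Bracket expected": 12,
    "Illegal start of expression": 9,
    "Unknown symbol: class": 7,
}
top_five_total = sum(slide_shares.values())  # 18 + 12 + 12 + 9 + 7


def top_share(errors, n):
    """Fraction of all errors covered by the n most common error types."""
    if not errors:
        return 0.0
    counts = Counter(errors)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(errors)


# Hypothetical log of error messages, one entry per failed compile.
sample_log = ["; expected", "; expected", "unknown variable", "bracket expected"]
share_of_top_one = top_share(sample_log, 1)
```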
Compilation Errors

Looking at this graph, we can see the frequency of errors. The type and frequency of the errors a student must deal with after compiling their code play a significant role in determining their subsequent behavior.
Time Between Compilations
51% of all compilation events occurred within 30 seconds of the previous event
20% of all compilation events involved more than two minutes of work time between events
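Figures like these come from measuring the gap between each compile event and the one before it. The following is a minimal sketch, assuming compile start times are available in seconds and that the 30-second figure is read as an upper bound on the gap. The function name and the sample timestamps are made up for illustration.

```python
def gap_summary(times):
    """Return (fraction of gaps under 30 s, fraction of gaps over 120 s).

    `times` is an ordered list of compile start times in seconds.
    """
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return 0.0, 0.0
    quick = sum(gap < 30 for gap in gaps) / len(gaps)
    slow = sum(gap > 120 for gap in gaps) / len(gaps)
    return quick, slow


# Hypothetical session: five compiles, so four gaps (10, 15, 175, 20 seconds).
quick_fraction, slow_fraction = gap_summary([0, 10, 25, 200, 220])
```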
Time Between Compiles
Time Between Compiles cont…
The graph doesn't tell us anything about compilation results
So analyze the compile events as pairs
F means the compilation ended in an error
T means the compilation was a success

Now we want to rebuild the time-between-compilations graph using these pairs, so that we can understand more about the students' programming behavior.
Compilation Pairs

First, the researchers figured that the T T compilation pair, at 30%, was over-represented, because there are many ways to compile in BlueJ and some of them recompile all files in a project. So they left it out of the graph. The graph shows when students recompile quickly and when they do not. When a student encounters an error, they are likely to recompile quickly. After a successful compile, over 60% of the time they wait more than two minutes before recompiling. The researchers say they do not know exactly what goes on in those two minutes, but a substantial amount of work is done: over 100 characters are modified.
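Grouping the time gaps by the outcome of each consecutive pair of compiles can be sketched as below. This is only an illustration of the pairing idea, with F for a failed compile and T for a successful one. The function name, the tuple layout, and the sample events are assumptions, not the researchers' code.

```python
from collections import defaultdict


def gaps_by_pair(events):
    """Group gaps between consecutive compiles by outcome pair.

    `events` is an ordered list of (time_in_seconds, succeeded) tuples.
    Returns a dict such as {"FT": [18, 15], ...}.
    """
    groups = defaultdict(list)
    for (t0, ok0), (t1, ok1) in zip(events, events[1:]):
        label = ("T" if ok0 else "F") + ("T" if ok1 else "F")
        groups[label].append(t1 - t0)
    return dict(groups)


# Hypothetical session: fail, fail, success, fail, success.
pairs = gaps_by_pair([(0, False), (12, False), (30, True), (200, False), (215, True)])
```

In this made-up session the only long gap follows a success (the T F pair), which is the pattern the graph reports for real students.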
Amount of Work
Most work is done in the two-minute span following a successful compile
Frequent compiling doesn't mean a lot of work is being done
The three most common errors are typically handled in less than 30 seconds and require little change to the source
Quick Recompile
Since most of the work is done after a successful compile, what are students doing when they recompile quickly?

From the previous graph we discovered that a lot of time is spent between compilation events when the first event was successful, and that most of the work is performed in this time. So the researchers wanted to know how much work is done between compile events when recompilation occurs quickly. These tables show the seconds spent and the characters changed after each of the three most common errors.
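The slides do not say how "characters changed" was measured. One plausible way, shown here purely as an assumption, is to diff the source of two consecutive compiles with Python's standard `difflib` and count the characters in the regions that differ.

```python
from difflib import SequenceMatcher


def chars_changed(before, after):
    """Count characters inserted, deleted, or replaced between two sources.

    This metric is an assumption for illustration, not the study's definition.
    """
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the larger side of each differing region.
            changed += max(i2 - i1, j2 - j1)
    return changed


# A typical quick fix: adding the missing semicolon changes one character.
semicolon_fix = chars_changed("int x = 1", "int x = 1;")
```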
Session Analysis
“Session” represents the sequence of compiles from one class period
A student typically compiles 10 times per session

Remember that our goal is to see whether we can identify different characteristics of compilation behavior. So now the researchers wanted to investigate the behavior of individuals within the population. Let's analyze a session. A session is the sequence of compiles from one class period. On average, a student compiles 10 times per session.
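Counting compiles per session amounts to grouping compile events by student and class period. A minimal sketch follows, with made-up student names and counts. It also shows why a typical value such as the median can sit near 10 even when a few students compile far more often.

```python
from collections import Counter
from statistics import mean, median


def compiles_per_session(events):
    """Count compile events per (username, session) key."""
    return Counter(events)


# Hypothetical events: one (username, session number) entry per compile.
events = [("ann", 1)] * 10 + [("bob", 1)] * 9 + [("cy", 1)] * 30
counts = compiles_per_session(events)

typical = median(counts.values())  # the middle student
average = mean(counts.values())    # pulled upward by the heavy compiler
```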
Session Analysis cont…

From this we can see that most students do compile around 10 times. But this doesn't represent the whole population: some students compile double or even triple that. So what kind of behavior is that?
Session Analysis cont…
We can see the students who have typical compilation patterns, and the students who compile more than the average
Students who compile more than the average could be very meticulous or just sloppy
Some students do not even trust the error message reported by BlueJ

It would be nice if all the outlying data came from meticulous students, but the data collected seems to suggest otherwise. This shows up in the data as repetitive errors: 21% of the time, the exact same error occurs with no change to the source code.
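A repeated error with no change to the source can be detected by comparing each compile with the one before it. The sketch below is one possible reading, assuming the rate is taken over consecutive compiles that report the same error. The slides do not state the exact denominator, and the function name and sample data are hypothetical.

```python
def unchanged_repeat_rate(compiles):
    """Among consecutive compiles with the same error, return the fraction
    where the source did not change at all.

    `compiles` is an ordered list of (error_message_or_None, source_text).
    """
    repeats = 0
    unchanged = 0
    for (err0, src0), (err1, src1) in zip(compiles, compiles[1:]):
        if err0 is not None and err0 == err1:
            repeats += 1
            if src0 == src1:
                unchanged += 1
    return unchanged / repeats if repeats else 0.0


# Hypothetical session: the student recompiles once without editing,
# then edits but still hits the same error, then fixes it.
session = [
    ("';' expected", "int x"),
    ("';' expected", "int x"),
    ("';' expected", "int x = 1"),
    (None, "int x = 1;"),
]
rate = unchanged_repeat_rate(session)
```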
Session Analysis cont…
Shaping Behavior
The typical behavior of students in this experiment is to make changes and then come back and correct all the syntax errors that resulted from the most recent addition of code.

Now, what can be done to shape this behavior? What can be done to change the programmer's behavior?
Shaping Behavior cont…
Encourage students to make fewer semicolon mistakes
Introduce highlighting of bracket pairs
Highlight spaces where an expected token is supposed to be
Then observe how students' behavior changes
Shaping Behavior cont…
Don't want students to become dependent on development environments
Want all shaping of behavior to be in the students' best interest
Future Work
What behaviors are classified as good or bad?
How can these behaviors be detected?