Evaluation of the Advice Generator of an Intelligent Learning Environment Maria Virvou, Katerina Kabassi Department of Informatics University of Piraeus
Evaluation of educational software Formative evaluation involving human tutors. A formative evaluation occurs during design and early development of a project. Oriented to the immediate needs of developers. Increase the likelihood that the final product will achieve its stated goals. The evaluation should take place before the implementation to minimize the cost of early design errors.
Intelligent Learning Environment Protected learning environment for novice users of graphical user interfaces. File manipulation program similar to the Windows 98/NT Explorer. Monitors users’ actions and reasons about them. In case of an error, the system provides spontaneous advice.
Human Plausible Reasoning (HPR) Collins, A., Michalski, R. (1989): a descriptive theory of human plausible inference. The theory consists of: –A representation of plausible inference patterns, such as deductions, inductions and analogies. –A set of parameters, e.g. similarity, typicality. –A system relating the different plausible inference patterns and the different certainty parameters.
IFM’s architecture Intelligent Tutoring Systems’ architecture IFM’s components –Domain representation, –Advice generator, –User modeller, –User interface.
Operation of the system 1. The user issues a command. 2. The system reasons about the user’s action. 3. If the action is considered unexpected, the system generates alternative commands. Otherwise, the command is executed. 4. The user is not obliged to follow the system’s advice. The user can: –execute his/her initial action, –execute a new action.
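The four steps above can be sketched as a single control cycle. This is a minimal illustration, not IFM's implementation: the function name `handle_command` and the four callables passed to it are hypothetical stand-ins for the reasoning, advice-generation, user-interface and execution components.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Advice:
    """Alternative commands proposed for an unexpected action."""
    alternatives: List[str]


def handle_command(
    command: str,
    is_unexpected: Callable[[str], bool],
    generate_alternatives: Callable[[str], List[str]],
    choose: Callable[[str, Advice], str],
    execute: Callable[[str], None],
) -> str:
    """Run one cycle: reason about the command, advise if needed, execute."""
    if is_unexpected(command):
        advice = Advice(generate_alternatives(command))
        # The user is free to keep the initial command or to pick another one.
        command = choose(command, advice)
    execute(command)
    return command
```

Because `choose` receives both the initial command and the advice, the user's freedom to ignore the advice (step 4) is part of the cycle rather than a special case.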
Certainty parameters
–Degree of typicality of the usage of a command in the set of the total number of executed commands.
–Degree of similarity of a command or an object to another command or object, respectively.
–Frequency of an error set in the set of all errors.
–Dominance of an error in the set of all errors.
Degree of certainty = 0.4 * similarity + w * typicality + 0.3 * frequency + 0.1 * dominance, where w is the weight of typicality.
Evaluation IFM’s aim: Provision of plausible advice similar to human tutors’ advice. HPR for simulating the human plausible reasoning of a human advisor. Evaluation’s aim: How close is IFM’s reasoning to human tutors’ reasoning? Trying to reveal what the human tutors’ way of thinking was.
Methods of evaluation Heuristic evaluation –The expert reviewers critique to determine conformance with a short list of design heuristics –Questionnaire Cognitive walkthrough –The experts simulate users walking through the interface to carry out typical tasks –Comment on real-life examples
Results of the Questionnaires 60% of the human tutors thought that the similarity between objects or commands was the most important aspect when generating advice. 55% believed that the identification of the most frequent error of an individual student was the second most important aspect to take into account. 55% of the human tutors believed that the frequency of an observed error should be taken into account, but not as first priority. Finally, 60% believed that the frequency of execution of a command was the last aspect to be taken into account.
Additional Results Similarity between two commands. 60% of the human tutors believed that the relative position of commands in the graphical representation was more important than the similarity of their result when executed. This proportion was even greater (60% - 85%) when the human tutors commented on the real-life protocols.
Users’ Protocols Human tutors were asked to comment on real-life examples. The protocols were collected in an empirical study (early stages of the life-cycle). Human tutors were also given the information provided by the user model. The examples were given as input to IFM. IFM’s advice was compared with the human tutors’ comments.
A real-life example Undesired action: A user issues a command for creating a Microsoft Word document and claims s/he did not intend to do so. 55% of the human tutors suggested: –Create a bitmap image, because this command was next to the one issued. IFM suggested: 1. Create a text document, because this command’s result was similar to that of the command issued (human tutors’ second alternative). 2. Create a sound file, because this was his/her most usual action (human tutors’ third alternative).
Analysis of the results Connection of the results with the certainty parameters used by the advice generator. IFM could successfully generate the alternative actions to be suggested to the student but not in the right order. The categorization of the alternative actions was not completely successful and had to be refined. Refinement of the formula for the calculation of the certainty.
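Refinement of the formula
Original: degree of certainty = 0.4 * similarity + w * typicality + 0.3 * frequency + 0.1 * dominance
Refined: degree of certainty = 0.4 * similarity + w' * typicality + 0.2 * frequency + 0.3 * dominance
(w and w' are the weights of typicality.)
–The identification of the most frequent error of a particular student was the second factor to be taken into account (55%): the weight of dominance is now 0.3, instead of 0.1.
–The fact that a student has repeated an error many times must be taken into account, but not as first priority (60%): the weight of frequency is now 0.2, instead of 0.3.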
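The effect of the re-weighting on the order of the suggested alternatives can be illustrated with a small sketch. Several things here are assumptions rather than facts from the slides: that the degree of certainty is a weighted sum of the four parameters, that 0.4 is the weight of similarity, and that the typicality weight (not stated in the source) is whatever makes the weights sum to 1. The candidate names and parameter values are invented for illustration only.

```python
from typing import Dict, List

Params = Dict[str, float]

# Weights stated on the slides. The typicality weight is NOT given there.
ORIGINAL_STATED: Params = {"similarity": 0.4, "frequency": 0.3, "dominance": 0.1}
REFINED_STATED: Params = {"similarity": 0.4, "frequency": 0.2, "dominance": 0.3}


def complete_weights(stated: Params) -> Params:
    """Add a typicality weight, ASSUMING all four weights sum to 1."""
    weights = dict(stated)
    weights["typicality"] = round(1.0 - sum(stated.values()), 10)
    return weights


def certainty(params: Params, weights: Params) -> float:
    """Weighted sum of the certainty parameters (assumed form of the formula)."""
    return sum(weights[name] * params[name] for name in weights)


def rank(candidates: Dict[str, Params], weights: Params) -> List[str]:
    """Order candidate explanations by decreasing degree of certainty."""
    return sorted(
        candidates,
        key=lambda name: certainty(candidates[name], weights),
        reverse=True,
    )


# Hypothetical candidates that differ only in frequency and dominance.
CANDIDATES: Dict[str, Params] = {
    "often_repeated_error": {
        "similarity": 0.5, "typicality": 0.5, "frequency": 0.9, "dominance": 0.1,
    },
    "dominant_error": {
        "similarity": 0.5, "typicality": 0.5, "frequency": 0.1, "dominance": 0.9,
    },
}

if __name__ == "__main__":
    print(rank(CANDIDATES, complete_weights(ORIGINAL_STATED)))
    print(rank(CANDIDATES, complete_weights(REFINED_STATED)))
```

Under these assumptions the two candidates swap places when the refined weights are used, which is the kind of change in ordering the refinement was meant to produce.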
Calculation of the similarity Similarity of objects –the similarity of their names (70%). –their relative distance in the graphical representation. Similarity of commands –their relative distance in the graphical representation (60%). –the similarity of the commands’ result (40%). IFM was successful in this calculation
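One possible reading of the percentages above is that they act as weights in a combined similarity score; the sketch below follows that reading and should be taken as an assumption, not as IFM's actual calculation. The 0.3 weight for object distance is the assumed complement of the stated 70%, `difflib.SequenceMatcher` is a stand-in for whatever name comparison IFM uses, and the `proximity` function and its `span` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(name_a: str, name_b: str) -> float:
    """Similarity of two names in [0, 1] (stand-in string measure)."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def proximity(pos_a: int, pos_b: int, span: int) -> float:
    """1.0 for the same position, falling to 0.0 at `span` or more apart."""
    if span <= 0:
        raise ValueError("span must be positive")
    return max(0.0, 1.0 - abs(pos_a - pos_b) / span)


def object_similarity(name_a: str, name_b: str,
                      pos_a: int, pos_b: int, span: int) -> float:
    """Names weighted 0.7 (from the slide); distance weighted 0.3 (assumed)."""
    return (0.7 * name_similarity(name_a, name_b)
            + 0.3 * proximity(pos_a, pos_b, span))


def command_similarity(pos_a: int, pos_b: int, span: int,
                       result_similarity: float) -> float:
    """Distance weighted 0.6 and result similarity 0.4 (from the slide)."""
    return 0.6 * proximity(pos_a, pos_b, span) + 0.4 * result_similarity
```

For example, `object_similarity("project1", "project2", 0, 1, 10)` scores two adjacent folders with nearly identical names as highly similar.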
Conclusions The evaluation aimed to assess: –How successful HPR was at reproducing human tutors’ reasoning. –The usability of the system (future plans). The evaluation revealed: –The learning environment was quite successful at generating alternative commands. –The order in which the alternative commands were presented to students differed from the one chosen by the majority of human experts. –The evaluation contributed to the refinement of the adaptation of IFM into the learning environment.
Comments & Questions Maria Virvou: Katerina Kabassi:
Simple example 1. Cut(A:\program1.pas) 2. Paste(A:\project2\) The system finds the action suspect: replacement of file A:\project2\program1.pas. System’s advice: Change directory to A:\project1\ [Figure: directory tree of A:\ with folders project1 and project2 and files program1.pas and program2.pas]
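The check that makes this paste suspect can be sketched as follows. This is only an illustration of detecting that a paste would replace an existing file; the function name `paste_would_overwrite` and the explicit list of existing files are hypothetical, and the slides do not describe how IFM performs the check.

```python
from pathlib import PureWindowsPath
from typing import Iterable


def paste_would_overwrite(source: str, target_dir: str,
                          existing_files: Iterable[str]) -> bool:
    """Return True if pasting `source` into `target_dir` replaces a file."""
    destination = PureWindowsPath(target_dir) / PureWindowsPath(source).name
    # PureWindowsPath compares case-insensitively, as Windows file names do.
    return destination in {PureWindowsPath(path) for path in existing_files}


if __name__ == "__main__":
    # Only the two files named explicitly in the example are listed here.
    existing = [r"A:\program1.pas", r"A:\project2\program1.pas"]
    print(paste_would_overwrite(r"A:\program1.pas", r"A:\project2", existing))
    print(paste_would_overwrite(r"A:\program1.pas", r"A:\project1", existing))
```

With the files from the example, pasting into A:\project2\ is flagged while pasting into A:\project1\ is not, which matches the advice to change directory.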
Questionnaires When you suggest an alternative action to a user who has just executed an action that you consider unintended, which of the following issues do you take into account, and in what order of significance? –Object or command similarity (an object is a file or a folder and a command is, for example, cut or copy). –The user’s error frequency (e.g. the user has repeatedly selected an unintended command that neighbours the one s/he meant). –In case of error diagnosis, whether it is the user’s most common mistake relative to the other types of error s/he commits.