
1 Dynamic Detection of Novice vs. Skilled Use Without a Task Model CHI 2007 Proceedings Amy Hurst, Scott E. Hudson, Jennifer Mankoff Carnegie Mellon University

2 Motivation Create intelligent user interfaces. Main idea: if an application could detect a user's expertise, the software could automatically adapt to better match that expertise.

3 Uses Support an adaptive interface with one more useful piece of information: whether the user is a novice or skilled. Provide tailored intelligent help: descriptive vs. brief, depending on user skill. Automatically generate such information only when a user is likely to need it.

4 Challenges in Skill Detection Detection must be application independent. It must be done dynamically, continuously, and unobtrusively.

5 Approach to Dynamic Detection Given the following: a set of features that quantify interaction, such as mouse motion, and a set of training data containing examples of novice and skilled use, it is possible to determine which features are most predictive of skilled use and to train a classifier that, given unlabeled test data, returns an indication of novice or skilled behavior.
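A minimal sketch of that pipeline in Python, assuming a feature matrix (one row of interaction features per menu action) and 0/1 novice/skilled labels; the specific features and classifier type are described on later slides, so a decision tree is used here only to match that later choice:

```python
# A minimal sketch: learn from labeled examples of novice and skilled
# interaction features, then classify unlabeled behavior.
# X_train, y_train, and X_new are placeholders for data introduced later.
from sklearn.tree import DecisionTreeClassifier

def train_skill_classifier(X_train, y_train):
    """X_train: per-action feature vectors; y_train: 0 = novice, 1 = skilled."""
    return DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

def classify(clf, X_new):
    return clf.predict(X_new)  # an indication of novice vs. skilled behavior
```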

6 Difference in Expertise - Qualitative Knowledge, speed, and comfort. Experts tend to use domain knowledge in the head, or recall, to achieve their goal; novices rely on knowledge in the world, or recognition. These differences manifest themselves as measurable differences in user actions.

7 Difference in Expertise - Quantitative Use of menus  Selecting the correct menu item: experts typically recall the location or use keyboard shortcuts.  The size and organization of a menu affect selection time.  Skilled users memorize item locations.  Novices typically do not know which menu item to search for or where it may be located.

8 Difference in Expertise - Quantitative Skilled Performance Modeling  Fitts' Law and the Steering Law – both indicate that the distance to and size of a target affect selection speed.  These were not used directly, but the underlying properties of mouse motion, velocity and acceleration, are useful.  Keystroke-Level Modeling (KLM)
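For reference, the standard Shannon formulation of Fitts' Law and the steering law for a path of varying width (neither is spelled out on the slide) are:

```latex
% Fitts' Law: movement time MT to a target of width W at distance D,
% with empirically fitted constants a and b.
MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)

% Steering law: time to steer along a path C whose width at arc
% position s is W(s).
MT = a + b \int_C \frac{ds}{W(s)}
```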

9 Difference in Expertise - Quantitative Mouse vs. keyboard and other data  Monitoring on-screen dialogs, help browsers, and keyboard logs  Possible keyboard features: detecting actions followed by an immediate undo; detecting use of keyboard shortcuts  These were not used because such events occur less frequently than mouse-based features.

10 Creating a Classifier – predictive model Rather than directly engineering the best set of features a priori, the authors used a large set of plausible features and machine-learning techniques to determine which features were most predictive when used in a statistical model. This allows speculatively trying a range of features: many may turn out to be useless, while some may prove useful. This approach requires a large set of labeled training data.

11 Classifier – Data Collection Modified GIMP to streamline tasks  Removed menu bars, toolboxes, 'close' and 'quit'  All tasks accomplished through pop-up menus  Mouse events logged via XNEE, which received data directly from the X11 windowing system  GTK used to log menu interactions (mouse enter/exit of menu items)  Carefully avoided any information that could not be gathered in an application-independent fashion, such as the specific height and width of menu items
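A minimal sketch of the GTK side of this logging, assuming PyGObject/GTK 3; the paper's actual instrumentation (and its use of XNEE for raw X11 events) is not detailed on the slide, so the handlers and log format here are illustrative:

```python
# A minimal sketch of logging menu item enter/exit events in GTK 3 via
# PyGObject. The log format is an assumption for illustration.
import time
import gi
gi.require_version("Gtk", "3.0")
from gi.repository import Gtk

log = []  # (timestamp, event type, menu item label)

def on_enter(item, event):
    log.append((time.time(), "enter", item.get_label()))
    return False  # let GTK continue normal event handling

def on_leave(item, event):
    log.append((time.time(), "leave", item.get_label()))
    return False

def instrument_menu(menu):
    """Attach enter/leave logging to every item of a Gtk.Menu, recursively."""
    for item in menu.get_children():
        item.connect("enter-notify-event", on_enter)
        item.connect("leave-notify-event", on_leave)
        if item.get_submenu() is not None:
            instrument_menu(item.get_submenu())
```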

12 Classifier – Data Collection Detecting Informative Moments  Actions that are readily isolated and indicative of a phenomenon that can be easily and accurately labeled  Menu selection: starts with a right click that opens a pop-up menu and ends with a left click that either selects a menu item or dismisses the menu without a selection
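A minimal sketch of that segmentation over a raw event stream, assuming each logged event is a dict with a type and an X11-style mouse button number (1 = left, 3 = right); the field names are illustrative:

```python
# Segment a raw event stream into menu actions: each action starts with
# a right-click press (button 3) and ends at the next left-click press
# (button 1), which either selects an item or dismisses the menu.
def segment_menu_actions(events):
    """events: dicts like {'t': 12.3, 'type': 'press', 'button': 3}."""
    actions, current = [], None
    for ev in events:
        if current is None:
            if ev.get("type") == "press" and ev.get("button") == 3:
                current = [ev]  # menu opened
        else:
            current.append(ev)
            if ev.get("type") == "press" and ev.get("button") == 1:
                actions.append(current)  # selection or dismissal
                current = None
    return actions
```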

13 Classifier – Data Collection Participants (paid)  Short questionnaire To verify novice status To determine experience with image editing and drawing applications Reading test – a 153-word passage  Participant status Did not know the location of most menu items All but four knew to select 'undo' from the 'edit' menu, so this item was removed from the analysis Reading speeds were above the adult average: 230–612 wpm Mostly Windows users

14 Classifier – Data Collection Method  Tasks designed to be repetitive and to progress from novice to skilled behavior Clear, specific, sequential instructions on paper Two separate tasks in fixed order 1. Draw transparent shapes and change the background pattern 2. Draw letters and shapes and color them with solid colors or gradients Each task – seven identical trials Each trial – ten menu selections Extreme outliers removed from the training data set  Difficulty staying on task, skipped sections of trials, technical failures

15 Classifier – Data Collection Labeling and Validating Novice vs. Skilled Behavior The first trial of the first task was labeled as novice; the final trial of both tasks was labeled as skilled Menu search samples labeled as novice – 600 Menu search samples labeled as skilled – 700 Validation  Subjective  users' impressions of their own performance after each trial  Objective  compared performance times of menu selections within trials against KLM-predicted times

16 Classifier – Data Collection Subjective impressions: plot of the average of participants' subjective responses to a question asked after each trial. Task #1: "I had no problem locating the menu items in this trial." Task #2: "It was easy for me to complete this trial without external help."

17 Classifier – Data Collection Objective analysis of the data Compared performance times of menu selections within trials against KLM-predicted times Analysis divided into groups defined by submenu depth (expertise develops more quickly in higher-level menus) The analysis showed that users progressed through a learning curve  For second-level submenus, users were beating KLM-predicted times by the fourth trial of the first task  Users reached KLM-predicted times for third-level submenus by the end of the second task Also analyzed variation across trials of the most promising feature: the ratio of the time to make a menu selection to the depth of that selection
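As a rough illustration, a KLM prediction for a pop-up menu selection can be assembled from the standard operator times of Card, Moran, and Newell; the exact operator sequence the authors used is not given on the slide, so the model below is an assumption:

```python
# A rough KLM sketch using standard operator times: M (mental
# preparation) = 1.35 s, P (point with the mouse) = 1.1 s, and
# B (mouse button press or release) = 0.1 s. The per-level operator
# sequence is an assumption, not the paper's exact model.
M, P, B = 1.35, 1.1, 0.1

def klm_menu_time(depth):
    """Predicted time to select an item `depth` submenu levels down."""
    open_menu = B + B            # right-click press and release
    per_level = P                # point at the target item on each level
    select = B + B               # left-click press and release
    return M + open_menu + (depth + 1) * per_level + select
```

For example, klm_menu_time(2) predicts roughly 5.05 seconds for a second-level submenu selection under these assumptions.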

18 Classifier – Data Collection Plot of a promising menu feature’s mean for each trial number. Note the rise in the learning curve between the first and second tasks.

19 Classifier – Data Collection Candidate Features  Features derived from low-level motion
Total Time (seconds): elapsed time within the action, starting when the menu opened and ending when it closed. (Range: 0.504 – 143)
X and Y Mouse Velocity (pixels/second): average velocity of the mouse during a menu operation in the X and Y directions. (Range: X: 24756 – 35745; Y: 30116 – 37789)
X and Y Mouse Acceleration (pixels/second²): average unsigned acceleration of the mouse during a menu operation in the X and Y directions. (Range: X: 0 – 242041107; Y: 0 – 1770018051.8)
Dwell Time (seconds): time spent dwelling (not moving) during the interaction sequence. (Range: 0 – 112)

20 Classifier – Data Collection Candidate Features  Features related to the interaction technique
Average Dwell Time (seconds/count): time spent dwelling divided by the number of menu items visited. (Range: 0 – 3.581)
Number of Opened Submenus (count): total number of submenus that the user opened while searching. (Range: 0 – 59)
Selection Depth (count): depth of the selection; used conditionally, in combination with other features. (Range: 0 – 3)
Menu Item Visits (count): total number of menu items visited or passed through during the menu action. (Range: 0 – 160)
Unique Item Visits (count): number of unique menu items visited. (Range: 1 – 57)
Selected Item Dwell Time (seconds): time spent dwelling within the menu item that was ultimately selected, summed over all visits to that item. (Range: 0 – 22)

21 Classifier – Data Collection Candidate Features  Features related to performance models
KLM Diff (seconds): difference between the KLM-predicted time and the actual time for the action. (Range: 0.54 – 143.196)
KLM Ratio (dimensionless): KLM-predicted time divided by the actual time for the action. (Range: 0.003 – 3.488)
Time Depth Ratio (seconds/depth): time to make a menu selection divided by the depth of that selection. (Range: 0 – 1.368)
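A minimal sketch of computing a few of these features from one segmented menu action, assuming timestamped (t, x, y) mouse samples; the dwell threshold and field layout are illustrative assumptions:

```python
# Compute a subset of the candidate features for one menu action.
DWELL_SPEED = 5.0  # pixels/second below which the mouse counts as dwelling

def action_features(samples, depth, klm_predicted):
    """samples: [(t, x, y), ...] in time order; depth: selection depth >= 1."""
    total_time = samples[-1][0] - samples[0][0]
    dwell = vx_sum = vy_sum = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue
        vx, vy = abs(x1 - x0) / dt, abs(y1 - y0) / dt
        vx_sum += vx * dt  # time-weighted, for averaging below
        vy_sum += vy * dt
        if max(vx, vy) < DWELL_SPEED:
            dwell += dt
    return {
        "total_time": total_time,
        "x_velocity": vx_sum / total_time,
        "y_velocity": vy_sum / total_time,
        "dwell_time": dwell,
        "klm_diff": abs(klm_predicted - total_time),
        "klm_ratio": klm_predicted / total_time,
        "time_depth_ratio": total_time / depth,
    }
```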

22 Classifier – Data Collection Feature Selection  Used an analysis of information gain to rank the information content of each feature in isolation  Top 10 features: Average Y Acceleration, KLM Diff, Time Depth Ratio, KLM Ratio, Total Time, Dwell Time, Average Dwell Time, Selected Item Dwell Time, Menu Item Visits, Number of Opened Submenus
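A minimal sketch of that ranking, using scikit-learn's mutual_info_classif as a stand-in for the information-gain computation (the slide does not say which implementation was used); X, y, and feature_names are placeholders:

```python
# Rank candidate features by estimated mutual information with the
# novice/skilled label, highest first.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features(X, y, feature_names):
    """X: (n_samples, n_features) array; y: 0 = novice, 1 = skilled."""
    scores = mutual_info_classif(X, y, random_state=0)
    order = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in order]
```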

23 Classifier – Data Collection Building and validating the classifier  C4.5 decision tree learning algorithm, implemented in the WEKA machine learning environment  Other learning algorithms considered  Bayesian Networks  Naïve Bayes  Support Vector Machines  Linear Discriminant Analysis  Testing the classifier Used a traditional 10-fold cross-validation test: hold out 10% of the data, build the classifier with the remaining 90%, and test its accuracy in predicting the 10% held-out set Ten trials performed with 10 disjoint hold-out sets  The classifier achieved 91% accuracy.
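A minimal sketch of that evaluation; the paper used a C4.5 decision tree in WEKA, so scikit-learn's CART-based DecisionTreeClassifier is only a close stand-in, and X and y are placeholders for the feature matrix and labels:

```python
# Estimate classifier accuracy with 10-fold cross-validation: each of
# 10 disjoint folds is held out once while the tree is trained on the
# remaining 90% of the data.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y):
    clf = DecisionTreeClassifier(random_state=0)
    scores = cross_val_score(clf, X, y, cv=10)
    return scores.mean()  # the paper reports 91% accuracy for its setup
```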

24 Classifier – Data Collection Trends the classifier looked for when making a classification  Novice behavior Low average Y acceleration (mouse moved slowly, stopped, changed direction) Longer time to make a selection at a given submenu depth Large total number of menu item visits and unique item visits  Skilled behavior High average Y acceleration Faster navigation to deeper menu items Low total number of menu item visits

25 Implementation – Closing the loop Prototype application to adapt to expertise Functions across GTK applications Displays the name and an expertise-tailored description

26 Implementation – Closing the loop Validating the Ability to Detect Expertise Used 4 paid participants Used the modified GIMP application Study consisted of two tasks 1. A scripted task to become familiar with the application 2. A free-form task to draw a scene Each participant used a different strategy in the free-form task  Participants 1 and 2 mostly used menu items from the scripted task  Participants 3 and 4 explored the menus first  Participant 4 had difficulties with the scripted task and remained classified as a novice

27 Implementation – Closing the loop Moving average of live classifier predictions for repetitive and free-form tasks. The vertical bars indicate the transition between tasks.
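A minimal sketch of the smoothing behind this plot, assuming a 0/1 novice/skilled prediction per menu action and an illustrative window size:

```python
# Smooth live per-action predictions with a fixed-width moving average.
from collections import deque

class ExpertiseSmoother:
    def __init__(self, window=10):
        self.recent = deque(maxlen=window)  # last `window` predictions

    def update(self, prediction):
        """prediction: 0 for novice, 1 for skilled; returns smoothed value."""
        self.recent.append(prediction)
        return sum(self.recent) / len(self.recent)  # fraction judged skilled
```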

28 Conclusion and Future Work Skill differences are often ignored by applications These differences can be detected easily and accurately, and used to better adapt to user needs Future work includes  Validating the technique across multiple applications and operating systems  Exploring performance in a wider range of real-world situations

29 Questions and Comments?

