Usability Evaluation, part 2
REVIEW: A Test Plan Checklist, 1
Goal of the test?
Specific questions you want to answer?
Who will be the experimenter?
Who are the users going to be?
How many users are needed?
What kind of instruction will the users be given?
What tasks will you ask the users to perform?
What criteria will be used to determine the end of each task?

REVIEW: A Test Plan Checklist, 2
What aids will be made available to users?
To what extent will the experimenter be allowed to help the users?
What data is going to be collected and how will it be analyzed?
What is the criterion for judging the interface a success?

REVIEW: A Test Plan Checklist, 3
Where and when will the evaluation be done?
How long will the evaluation take?
What computer support?
What software?
Initial state of the system?
Are there any system/network load requirements?

REVIEW: The Role of the Experimenter
Ensures that the room, computer, etc. are all ready
During testing:
  – Should not interfere!
  – If a user is bogged down, can give a small hint
  – If the user is hopelessly off track, can fake an equipment problem

REVIEW: How Many Users?
Huge individual differences between users!
  – Up to a factor of 10
A single point determines an infinite number of lines
Some data is better than none

REVIEW: Which Tasks?
Keep close to the real tasks
  – may need to shorten some for time reasons
  – may need to provide users with background information

REVIEW: When in the Process?
Remember: early is better
Formative vs. summative evaluation
  – Formative (during design): drives design modifications
  – Summative (after design): evaluation of the “finished” product, comparison to a baseline, rigorous statistics

What to Measure
Process data (qualitative)
  – problems, questions, reactions
  – what users are thinking
Bottom-line data (quantitative)
  – mostly used later, for usability measurement
  – not as useful early in design
Asking users questions
  – problematic: users will answer, but their answers may not be reliable

Running the test
Preparation
Introduction
Test
Debriefing

Running the test - Prep
Room ready?
Equipment ready?
Interface in the start state?

Running the test - Intro
Cover the following with the user
  – Evaluating the interface, not the user
  – No personal stake
  – Released version will differ
  – Confidentiality reminder: system, results
  – Voluntary participation
  – Welcome to ask questions
  – Specific instructions
  – Any questions?

Running the test
Refrain from interacting with the user
  – Exception: if the user is clearly stuck, a small hint is OK
If several observers are present, designate one as lead

Running the test - debriefing
Fill out any questionnaires
Ask follow-up questions
Discussion
Any other comments?

Summary of Evaluation Techniques
Method                 # Users        Lifecycle
Cognitive Walkthrough  None           Early design
Heuristic Evaluation   None           Early design
Performance measures   10+            Competitive Analysis
Thinking Aloud         3-5            Iterative early evaluation
Observation            3+             Task analysis

Summary of Evaluation Techniques
Method                 # Users        Lifecycle
Questionnaire          30             Task Analysis, Follow-up
Interview              5              Task Analysis
Focus Group            6-9 per group  Task Analysis
Logging Use            20+            Final testing, Follow-up
User feedback          100+           Follow-up

Performance Measures
Concrete, quantitative measures of usability
  – learning time
  – use time for specific tasks and users
  – features used / not used
  – error rates
  – measures of user satisfaction
Comparative usability goals
  – prior versions
  – competitors

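Several of the quantitative measures above can be computed from simple per-task records. A minimal Python sketch, assuming a hypothetical record format (`TaskRecord` with time on task, error count, and a completion flag) and a hypothetical `target_ratio` for expressing a comparative goal against a prior version or a competitor:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One user's attempt at one task (hypothetical record format)."""
    user: str
    task: str
    seconds: float   # time on task
    errors: int      # errors observed during the attempt
    completed: bool  # did the user finish the task?


def summarize(records: list[TaskRecord], task: str) -> dict[str, float]:
    """Mean time on task, errors per attempt, and completion rate for one task."""
    rows = [r for r in records if r.task == task]
    if not rows:
        raise ValueError(f"no records for task {task!r}")
    return {
        "mean_seconds": mean(r.seconds for r in rows),
        "errors_per_attempt": sum(r.errors for r in rows) / len(rows),
        "completion_rate": sum(r.completed for r in rows) / len(rows),
    }


def meets_goal(measure: float, baseline: float, target_ratio: float) -> bool:
    """Lower-is-better goal (e.g. time, errors): measure <= target_ratio * baseline."""
    return measure <= target_ratio * baseline
```

For example, a goal such as "at least 10% faster than the prior version" would be checked with `meets_goal(summary["mean_seconds"], baseline_seconds, 0.9)`, where `baseline_seconds` is a hypothetical value measured on the prior version.
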
Things to Watch
Goals should be realistic
  – 100% is never realistic
Many goals go beyond the application UI
  – training, manuals
Testing goals should help improve the UI
  – give detail, not just good/bad

Think-Aloud Method – Very Useful!
Give the user one or more tasks and ask them to complete the tasks as best they can using the interface
User is asked to think aloud
  – ask questions (although they won’t be answered)
  – explain decisions, identify confusion
Tester records the session
  – avoids interfering as much as possible; intervenes only when the test would otherwise end
  – explains to the subject that questions won’t be answered

A variation – Constructive Interaction
Get two subjects to work together, so they’ll naturally talk about their goals, problems they encounter, strategies for overcoming the problems, etc.

Logging Use
Users interact with the interface
  – specific interactions are logged for data analysis, depending on the study’s goals
Often combined with questionnaires
  – can be administered as individual tasks are completed, at study completion, or both
Can often be done in the complete absence of evaluators

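Logging can be as simple as appending timestamped events to a file. A minimal sketch, assuming a JSON-lines file and hypothetical event names (`add_site`, `questionnaire`); which interactions to log depends on the study’s goals, and questionnaire answers can go into the same log as each task is completed:

```python
import json
import time
from pathlib import Path


class InteractionLogger:
    """Append-only interaction log, one JSON object per line.

    Event names and detail fields are hypothetical; choose them to match
    the specific goals of the study.
    """

    def __init__(self, path: str, participant: str) -> None:
        self.path = Path(path)
        self.participant = participant

    def log(self, event: str, **details) -> None:
        record = {
            "t": time.time(),  # wall-clock timestamp, seconds since the epoch
            "participant": self.participant,
            "event": event,
            "details": details,
        }
        # Reopen in append mode for each event, so earlier events are
        # already written to disk if the session crashes.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


def read_log(path: str) -> list[dict]:
    """Load every logged record for later analysis."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Usage: create one logger per participant, e.g. `InteractionLogger("p01.jsonl", "P01")`, call `log("add_site", site="example.org")` as actions happen, and `log("questionnaire", item="ease_of_use", rating=4)` after a task. Because no evaluator is needed to write the log, this fits unattended studies.
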
Running your tests
Have a Plan
Be Prepared
Provide an Introduction
Run the Test
Don’t Overlook Debriefing

Test Plan Details, 1
Should use the think-aloud method
Description of the testing logistics
  – How will the prototype be made available to the user?
  – Where and when will the test take place?
Goals: specific design decisions you want to test
User instructions
  – both orientation instructions and specific instructions for the tasks to be done

Test Plan Details, 2
Data to be collected; form of collection
How the test will be done
  – how many group members at each test (each member needs to participate in only one test; if you have several users, you may want to split up roles)
  – roles (e.g., orienting the user, logging user actions on a tracking form, post-test interview): who does what?
Anticipated problems and how you will address them
Merge the best of the individual test plans to create a unified group test plan

Summary
Be Organized
Be Prepared
Do a pilot

Example
Evaluating a prototype web information management interface, TopicShop
  – an information workspace for visualizing, exploring, and organizing collections of web sites

Work area initially is empty
This window displays a set of sites and “meta-information” for each site

Several sites added to work area
Group started, annotation added
Linked views
Status is visible

Evaluation, 1
Goal
  – Is this interface “better”?
Getting specific
Better than what?
  – Selected a baseline for a comparison study: Google for browsing + Chrome bookmarks for organizing
Better at what?
  – Selecting “good” sites
  – Organizing them into personally meaningful groups

Evaluation, 2
Specific questions
  – Do users find more good sites with one interface than the other?
  – Do users take more time to complete their tasks in one interface than the other?
  – …

Evaluation, 3
Task
  – Select the 15 best sites and organize them into groups
Users
  – Computer- and internet-experienced college students
How many users?
  – 40 subjects
Training?
  – 15-minute tutorial and practice session
  – …

Evaluation, 4
Task completion criteria
  – 15 sites selected, at least one subgroup created; subjects decided when they were done
User aids
  – Could refer back to the tutorial sheet
Help from experimenter?
  – None
Initial state
  – Empty work area
  – Site profiles area populated with sites from a particular topic, sorted by number of in-links

Evaluation, 5
What data is collected?
  – All actions logged and timestamped
  – Selected sites
  – Position of sites
  – Groups of sites
  – Labels for groups and individual sites
  – Survey
Measures
  – # of good sites selected (what is a “good” site?)
  – Time taken

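Both measures can be derived from the collected data. A sketch under three assumptions not stated on the slides: each log record is a dict with hypothetical `participant` and `t` (timestamp in seconds) fields, "time taken" is approximated as first-to-last logged action, and a reference set of "good" sites already exists (how to build one is exactly the open question above):

```python
def time_taken_by_participant(records: list[dict]) -> dict[str, float]:
    """Seconds from each participant's first to last logged action.

    Assumes each record has hypothetical 'participant' and 't' fields.
    """
    first: dict[str, float] = {}
    last: dict[str, float] = {}
    for r in records:
        p, t = r["participant"], r["t"]
        first[p] = min(t, first.get(p, t))
        last[p] = max(t, last.get(p, t))
    return {p: last[p] - first[p] for p in first}


def good_sites_selected(selected: list[str], good_reference: set[str]) -> int:
    """Number of selected sites that appear in a reference set of 'good' sites.

    How the reference set is built (e.g., expert ratings) is a study-design
    decision; here it is simply passed in.
    """
    return len(set(selected) & set(good_reference))
```

Keeping the definition of a "good" site outside the code makes it explicit that this number is only as meaningful as the reference set behind it.
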
Evaluation, 6
Evaluation
  – Various comparisons between the TopicShop and Google+bookmarks conditions, using appropriate statistical tests

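One common choice for comparing a measure such as time taken between two independent groups is Welch's t-test, which does not assume equal variances. The sketch below assumes a between-subjects design (each subject used only one of the two interfaces), which the slides do not state, and computes only the t statistic and Welch-Satterthwaite degrees of freedom; for a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test. Whether a t-test is appropriate at all depends on the measure and its distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples whose variances may differ."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group's mean (variance() divides by n - 1).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```

Usage: `t, df = welch_t(topicshop_times, baseline_times)`, where both lists are hypothetical per-subject values for one measure in each condition.
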