SBD: Usability Evaluation
Chris North
cs3724: HCI

Usability Evaluation
Analytic Methods: usability inspection, expert review
– Heuristic evaluation
– Cognitive walkthrough
– GOMS analysis
Empirical Methods:
– Usability testing
  » Field or lab
  » Observation, problem identification
– Controlled experiment
  » Formal controlled scientific experiment
  » Comparisons, statistical analysis

Heuristic Evaluation

Nielsen's 10 Heuristics
1. Visible status, feedback (e.g. WYSIWYG)
2. User control, undo, exits (e.g. wizards)
3. Familiar, speak the user's language (e.g. Acrobat error message)
4. Consistent, standards (e.g. multi-close in Word, PPT)
5. Recognition over recall (e.g. web navigation, phone menu)
6. Efficient, expert shortcuts (e.g. Word bold)
7. Aesthetic, minimalist (e.g. phone book)
8. Prevent errors (e.g. HomeFinder)
9. Error recovery (e.g. undo, back)
10. Help, task-based (e.g. IIS help doc)

Shneiderman's 8 Golden Rules
1. Consistency (e.g. multi-close in Word, PPT)
2. Shortcuts for experts (e.g. Word bold)
3. Feedback (e.g. WYSIWYG)
4. Sequences with closure (e.g. wizards)
5. Prevent errors, rapid recovery (e.g. undo)
6. Easy reversal (e.g. HomeFinder)
7. User control (e.g. ClipIt modal)
8. Reduce memory load (e.g. web navigation, phone menu)

Speak the User’s Language

Help Documentation
Context help, help doc, UI

Usability Testing

Formative: helps guide design
– Do it early in the design process; once the architecture is finalized, it's too late!
– A few users
– Usability problems, incidents
– Qualitative feedback from users
– Quantitative usability specification

Usability Specification Table

Scenario task                       | Worst case | Planned target | Best case (expert) | Observed
Find most expensive house for sale? | 1 min.     | 10 sec.        | 3 sec.             | ??? sec.
…
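
The Observed column gets filled in during testing and compared against the targets. Below is a minimal sketch of that check in Python; the task name, thresholds, and observed time are hypothetical placeholders rather than data from the slides.

```python
# Minimal sketch: check an observed task time against a usability specification.
# Task name, thresholds, and the observed value are hypothetical placeholders.

spec = {
    "Find most expensive house for sale": {
        "worst_case_sec": 60,      # 1 min.
        "planned_target_sec": 10,  # 10 sec.
        "best_case_sec": 3,        # 3 sec. (expert)
    }
}

observed = {"Find most expensive house for sale": 14.2}  # hypothetical measurement

for task, limits in spec.items():
    t = observed[task]
    if t <= limits["planned_target_sec"]:
        verdict = "meets the planned target"
    elif t <= limits["worst_case_sec"]:
        verdict = "acceptable, but misses the planned target"
    else:
        verdict = "fails even the worst-case level"
    print(f"{task}: {t:.1f}s -> {verdict}")
```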

Usability Test Setup
– Set of benchmark tasks
  » Easy to hard, specific to open-ended
  » Coverage of different UI features
  » E.g. "find the 5 most expensive houses for sale"
– Consent forms
  » Not needed unless video-taping the user's face (new rule)
– Experimenters:
  » Facilitator: instructs the user
  » Observers: take notes, collect data, video-tape the screen
  » Executor: runs the prototype if it is faked
– Users
  » 3-5 users; quality, not quantity

Usability Test Procedure
– Goal: mimic real life
  » Do not cheat by showing them how to use the UI!
– Initial instructions
  » "We are evaluating the system, not you."
– Repeat:
  » Give the user a task
  » Ask the user to "think aloud"
  » Observe; note mistakes and problems
  » Avoid interfering; hint only if the user is completely stuck
– Interview
  » Verbal feedback
  » Questionnaire
– ~1 hour / user

Usability Lab
E.g. McBryde 102

Data
– Note taking
  » E.g. "user keeps clicking on the wrong button…"
– Verbal protocol: think aloud
  » E.g. the user thinks that button does something else…
– Rough quantitative measures
  » HCI metrics: e.g. task completion time, …
– Interview feedback and surveys
– Video-tape of the screen & mouse
– Eye tracking, biometrics?

Analyze
– Initial reaction: "stupid user!", "that's developer X's fault!", "this sucks"
– Mature reaction: "how can we redesign the UI to solve that usability problem?"
  » The user is always right
– Identify usability problems
  » Learning issues: e.g. can't figure out or didn't notice a feature
  » Performance issues: e.g. arduous, tiring to solve tasks
  » Subjective issues: e.g. annoying, ugly
– Problem severity: critical vs. minor

Cost-Importance Analysis
– Importance 1-5 (task effect, frequency):
  » 5 = critical, major impact on the user, frequent occurrence
  » 3 = user can complete the task, but with difficulty
  » 1 = minor problem, small speed bump, infrequent
– Ratio = importance / cost; sort by this
– 3 categories: must fix, next version, ignored

Problem | Importance | Solutions | Cost | Ratio I/C
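
The ratio-and-sort step is mechanical once problems have been scored. A minimal Python sketch follows; the problem names, importance and cost scores, and the bucket thresholds are invented for illustration.

```python
# Minimal sketch of cost-importance analysis: rank usability problems by
# importance / cost and bucket them. Problems, scores, and cutoffs are made up.

problems = [
    {"problem": "Zoom feature not discoverable", "importance": 5, "cost": 2},
    {"problem": "Labels overlap at small sizes", "importance": 3, "cost": 4},
    {"problem": "Ugly splash screen",            "importance": 1, "cost": 1},
]

for p in problems:
    p["ratio"] = p["importance"] / p["cost"]

# Sort by ratio, highest first: best "bang for the buck" gets fixed first.
problems.sort(key=lambda p: p["ratio"], reverse=True)

for rank, p in enumerate(problems, start=1):
    if p["ratio"] >= 1.5:
        bucket = "must fix"
    elif p["ratio"] >= 0.5:
        bucket = "next version"
    else:
        bucket = "ignore"
    print(f"{rank}. {p['problem']}: ratio {p['ratio']:.2f} -> {bucket}")
```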

Refine UI
– Simple solutions vs. major redesigns
– Solve problems in order of importance/cost
– Example:
  » Problem: the user didn't know he could zoom in to see more…
  » Potential solutions:
    – Better zoom button icon, tooltip
    – Add a zoom bar slider (like moosburg)
    – Icons for different zoom levels: boundaries, roads, buildings
    – NOT more "help" documentation!!! You can do better.
– Iterate: test, refine, test, refine, test, refine, …
  » Until? The UI meets the usability specification

Project: Usability Evaluation
– Usability evaluation: informal test
  » 3 users, not (tainted) HCI students
  » Simple data collection (biometrics optional!)
  » Exploit this opportunity to improve your design
– Report:
  » Procedure (users, tasks, specs, data collection)
  » Usability problems identified, specs not met
  » Design modifications
  » Revised implementation plan

Controlled Experiments

Usability Test vs. Controlled Experiment
Usability test:
– Formative: helps guide design
– Single UI, early in the design process
– Few users
– Usability problems, incidents
– Qualitative feedback from users
Controlled experiment:
– Summative: measures the final result
– Compares multiple UIs
– Many users, strict protocol
– Independent & dependent variables
– Quantitative results, statistical significance

What is Science?
Measurement, modeling

Scientific Method
1. Form hypothesis
2. Collect data
3. Analyze
4. Accept/reject hypothesis

How to "prove" a hypothesis in science?
– It is easier to disprove things, by counterexample
– Null hypothesis = the opposite of the hypothesis
– Disprove the null hypothesis
– Hence, the hypothesis is proved

Empirical Experiment
Typical question: which visualization is better in which situations?
Spotfire vs. TableLens

Cause and Effect
– Goal: determine "cause and effect"
  » Cause = visualization tool (Spotfire vs. TableLens)
  » Effect = user performance time on task T
– Procedure:
  » Vary the cause
  » Measure the effect
– Problem: random variation
  » Cause = vis tool OR random variation?
  » Real world → collected data; random variation → uncertain conclusions

Stats to the Rescue
– Goal: show that the measured effect is unlikely to result from random variation
– Hypothesis: cause = visualization tool (e.g. Spotfire ≠ TableLens)
– Null hypothesis: the visualization tool has no effect (e.g. Spotfire = TableLens); any measured difference is just random variation
– Stats: if the null hypothesis were true, the measured effect would arise only rarely from random variation
– Hence: the null hypothesis is unlikely to be true
– Hence: the hypothesis is likely to be true
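
One concrete way to estimate how often an effect of the observed size would arise from random variation alone is a permutation test. The sketch below illustrates that idea; it is not the slides' own procedure, and the timing data are invented.

```python
# Minimal sketch: estimate how often a difference as large as the observed one
# arises from random variation alone, using a permutation test.
import random

# Hypothetical task-completion times in seconds for two tools.
spotfire = [42, 51, 38, 47, 55, 44, 49, 40]
tablelens = [35, 39, 31, 42, 37, 33, 40, 36]

observed_diff = abs(sum(spotfire) / len(spotfire) - sum(tablelens) / len(tablelens))

pooled = spotfire + tablelens
n = len(spotfire)
more_extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)  # pretend tool labels are arbitrary (the null hypothesis)
    diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / n)
    if diff >= observed_diff:
        more_extreme += 1

p_value = more_extreme / trials  # chance of a difference this large under the null
print(f"observed difference: {observed_diff:.1f}s, p ≈ {p_value:.3f}")
```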

Variables
– Independent variables (what you vary) and treatments (the variable values):
  » Visualization tool: Spotfire, TableLens, Excel
  » Task type: find, count, pattern, compare
  » Data size (# of items): 100, 1000, …
– Dependent variables (what you measure):
  » User performance time
  » Errors
  » Subjective satisfaction (survey)
  » HCI metrics
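
Crossing the treatments of the independent variables yields the cells of the experimental design. Here is a minimal sketch of that enumeration; only the data sizes visible on the slide are listed, since the slide's list is truncated.

```python
# Minimal sketch: enumerate the experimental conditions (cells) as the cross
# product of the independent-variable treatments named on the slide.
from itertools import product

independent_vars = {
    "tool": ["Spotfire", "TableLens", "Excel"],
    "task": ["find", "count", "pattern", "compare"],
    "data_size": [100, 1000],  # slide's list is truncated; only listed sizes included
}

conditions = [dict(zip(independent_vars, values))
              for values in product(*independent_vars.values())]

print(len(conditions), "cells")  # 3 tools x 4 tasks x 2 sizes = 24 cells
print(conditions[0])             # e.g. {'tool': 'Spotfire', 'task': 'find', 'data_size': 100}
```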

Example: 2 x 3 design (n users per cell)
Independent variable 1: visualization tool (rows); independent variable 2: task type (columns).
Cells hold the measured user performance times (dependent variable).

          | Task 1 | Task 2 | Task 3
Spotfire  |        |        |
TableLens |        |        |

Groups
– "Between-subjects" variable
  » One group of users for each variable treatment
  » Group 1: 20 users, Spotfire
  » Group 2: 20 users, TableLens
  » Total: 40 users, 20 per cell
– "Within-subjects" (repeated) variable
  » All users perform all treatments
  » Counter-balance the order effect:
    Group 1: 20 users, Spotfire then TableLens
    Group 2: 20 users, TableLens then Spotfire
  » Total: 40 users, 40 per cell
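
Counter-balancing can be done by alternating which treatment each participant sees first. A minimal sketch, with hypothetical participant IDs:

```python
# Minimal sketch: counterbalance presentation order for a within-subjects design
# by alternating which tool each participant sees first. Participant IDs are made up.

tools = ["Spotfire", "TableLens"]
participants = [f"P{i:02d}" for i in range(1, 41)]  # 40 hypothetical users

orders = {}
for i, p in enumerate(participants):
    # Even-indexed participants see Spotfire first, odd-indexed see TableLens first.
    orders[p] = list(tools) if i % 2 == 0 else list(reversed(tools))

print(orders["P01"])  # ['Spotfire', 'TableLens']
print(orders["P02"])  # ['TableLens', 'Spotfire']
```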

Issues
– Eliminate or measure extraneous factors
  » Randomized
– Fairness
  » Identical procedures, …
– Bias
– User privacy, data security
– IRB (Institutional Review Board)

Procedure
For each user (× n users):
– Sign legal forms
– Pre-survey: demographics
– Instructions
  » Do not reveal the true purpose of the experiment
– Training runs
– Actual runs
  » Give task
  » Measure performance
– Post-survey: subjective measures

Data
Measured dependent variables, one spreadsheet row per user:

User | Spotfire task 1 | Spotfire task 2 | Spotfire task 3 | TableLens task 1 | TableLens task 2 | TableLens task 3

Step 1: Visualize it
– Dig out interesting facts
– Qualitative conclusions
– Guide the stats
– Guide future experiments
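
Before any statistics, it helps to plot every observation per condition rather than just the averages (the "show me the data" point made a few slides below). A minimal matplotlib sketch with invented timing data:

```python
# Minimal sketch: plot all per-user times for each tool, not just the averages,
# as a box plot with the raw points overlaid. Data values are hypothetical.
import matplotlib.pyplot as plt

spotfire = [42, 51, 38, 47, 55, 44, 49, 40]
tablelens = [35, 39, 31, 42, 37, 33, 40, 36]

fig, ax = plt.subplots()
ax.boxplot([spotfire, tablelens])
ax.set_xticks([1, 2])
ax.set_xticklabels(["Spotfire", "TableLens"])
# Overlay the individual observations so no data point is hidden by the summary.
ax.scatter([1] * len(spotfire), spotfire, alpha=0.6)
ax.scatter([2] * len(tablelens), tablelens, alpha=0.6)
ax.set_ylabel("Task completion time (sec)")
plt.show()
```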

Step 2: Stats
Average user performance times (dependent variable) per cell:

          | Task 1 | Task 2 | Task 3
Spotfire  |        |        |
TableLens |        |        |

TableLens better than Spotfire?
– Problem with averages: lossy
– Compares only 2 numbers
– What about the 40 data values? (Show me the data!)
[Bar chart: average performance time (secs), Spotfire vs. TableLens]

The real picture
– Need stats that compare all the data
[Chart: Spotfire vs. TableLens performance times (secs), showing all data points]

Statistics
– t-test
  » Compares 1 dependent variable across 2 treatments of 1 independent variable
– ANOVA (Analysis of Variance)
  » Compares 1 dependent variable across n treatments of m independent variables
– Result: p = probability that a difference this large would arise from random variation alone (i.e. if the null hypothesis were true)
  » "Statistical significance" level; typical cut-off: p < 0.05
  » Hypothesis confidence = 1 − p
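
As a concrete illustration, a two-sample t-test in Python with SciPy on hypothetical between-subjects timing data (a within-subjects design would use a paired test instead):

```python
# Minimal sketch: an independent-samples t-test comparing two tools.
# Timing data are hypothetical, not from the slides.
from scipy import stats

spotfire = [42, 51, 38, 47, 55, 44, 49, 40]
tablelens = [35, 39, 31, 42, 37, 33, 40, 36]

t_stat, p_value = stats.ttest_ind(spotfire, tablelens)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Statistically significant difference at the 0.05 level")
else:
    print("No significant difference detected")
```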

In Excel

p < 0.05
– Woohoo! Found a "statistically significant" difference
– The averages determine which is 'better'
– Conclusion:
  » Cause = visualization tool (e.g. Spotfire ≠ TableLens)
  » The vis tool has an effect on user performance for task T …
  » "95% confident that TableLens is better than Spotfire …"
  » NOT "TableLens beats Spotfire 95% of the time"
– 5% chance of being wrong!
– Be careful about generalizing

p > 0.05
– Hence, no difference?
  » The vis tool has no effect on user performance for task T …?
  » Spotfire = TableLens?
  » NOT! We did not detect a difference, but the tools could still differ
  » A potential real effect did not overcome the random variation
  » Provides evidence for Spotfire = TableLens, but not proof
– Boring; basically found nothing
– How?
  » Not enough users
  » Need better tasks, data, …
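
"Not enough users" can be made concrete with a power analysis. The sketch below uses statsmodels to estimate how many users per group are needed to reliably detect an effect; the effect size, alpha, and power values are assumed for illustration, not taken from the slides.

```python
# Minimal sketch: how many users per group would be needed to detect a
# medium-sized effect? (Cohen's d = 0.5 is an assumed value, not from the slides.)
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"~{n_per_group:.0f} users per group")  # roughly 64 per group for d = 0.5
```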

Data Mountain
Robertson, "Data Mountain" (Microsoft)

Data Mountain: Experiment
– Data Mountain vs. IE Favorites
– 32 subjects
– Organize 100 pages, then retrieve them based on cues
– Independent variables:
  » UI: Data Mountain (old, new), IE
  » Cue: title, summary, thumbnail, all 3
– Dependent variables:
  » User performance time
  » Error rates: wrong pages, failed to find within 2 min
  » Subjective ratings

Data Mountain: Results
– Spatial memory!
– Limited scalability?