Qualitative Evaluation

Lecture Outline  Evaluation objectives  Evaluation methods Human Subjects  “Think Aloud”  Wizard of Oz No Human Subjects  Heuristic evaluation  Cognitive walkthrus  GOMS

Huh?

I’ll be dead before …

Evaluation objectives  Anticipate what will happen when real users start using your system.  Give test users representative tasks to attempt, and keep track of whether they can complete them.

Two axes  Human – non-human  Qualitative – Quantitative

[Figure: two-axis chart (Quantitative vs. Qualitative, Non-Human vs. Human) positioning Cognitive Walk-Thru, Heuristic Evaluation, GOMS, Think Aloud, and Wizard of Oz]

Non-human subject methods  Heuristic evaluation  Cognitive walkthrus

Heuristic Evaluation (1)  A small set of HCI experts independently assess the interface (in two passes) for adherence to usability principles (heuristics).  Evaluators rate the severity of each violation to prioritize key fixes, and explain why the interface violates the heuristic.  Evaluators communicate afterwards to aggregate findings, but not during the evaluation.  Since the evaluators are not using the system as such (to perform a real task), it is possible to perform heuristic evaluation of user interfaces that exist on paper only and have not yet been implemented.

Heuristic Evaluation (2)  10 Usability Heuristics (by Jakob Nielsen)
1. Visibility of system status: The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
2. Match between system and the real world: The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
3. User control and freedom: Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
4. Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
5. Error prevention: Even better than good error messages is a careful design which prevents a problem from occurring in the first place.
6. Recognition rather than recall: Make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
7. Flexibility and efficiency of use: Accelerators -- unseen by the novice user -- may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
8. Aesthetic and minimalist design: Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
9. Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
10. Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.

Heuristic Evaluation (3)  Severity rating
0 = no problem
1 = cosmetic problem
2 = minor usability problem
3 = major usability problem; should fix
4 = catastrophe; must fix
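One way to turn independent severity ratings into a fix list is to average them per problem and sort. The sketch below assumes three evaluators and made-up problem ids (P1 to P3); the names `ratings` and `prioritize` are illustrative, and averaging is only one possible aggregation rule.

```python
from statistics import mean

# Severity scale from the slide above.
SEVERITY_LABELS = {
    0: "no problem",
    1: "cosmetic problem",
    2: "minor usability problem",
    3: "major usability problem; should fix",
    4: "catastrophe; must fix",
}

# Hypothetical data: problem id -> one rating per evaluator.
ratings = {
    "P1": [4, 3, 4],
    "P2": [2, 2, 1],
    "P3": [1, 0, 1],
}


def prioritize(ratings):
    """Return (problem, average severity) pairs, most severe first."""
    averaged = {problem: mean(scores) for problem, scores in ratings.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


for problem, average in prioritize(ratings):
    label = SEVERITY_LABELS[round(average)]
    print(f"{problem}: average severity {average:.2f} ({label})")
```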

Heuristic Evaluation (4)  Usability matrix Each row represents one evaluator. Each column represents one of the usability problems. Each black square shows whether the evaluator represented by the row found the usability problem. The more rows blacked out within a column, the more obvious the problem.
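The matrix described above can be sketched as a list of boolean rows. The evaluator names and problem ids here are hypothetical; a filled cell is printed as `#`, and the per-column count shows how many evaluators found each problem.

```python
# Hypothetical findings: evaluator -> set of problem ids that evaluator reported.
findings = {
    "evaluator_A": {"P1", "P2", "P4"},
    "evaluator_B": {"P1", "P3"},
    "evaluator_C": {"P1", "P2"},
}


def build_matrix(findings):
    """Rows are evaluators, columns are problems; True means 'found it'."""
    problems = sorted(set().union(*findings.values()))
    evaluators = sorted(findings)
    matrix = [[p in findings[e] for p in problems] for e in evaluators]
    return evaluators, problems, matrix


def found_counts(problems, matrix):
    """Count the filled cells in each column."""
    return {p: sum(row[j] for row in matrix) for j, p in enumerate(problems)}


evaluators, problems, matrix = build_matrix(findings)
for evaluator, row in zip(evaluators, matrix):
    print(evaluator, "".join("#" if cell else "." for cell in row))

counts = found_counts(problems, matrix)
print(counts)
```

Columns with the highest counts are the problems that most evaluators noticed.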

Heuristic Evaluation (5)  Use 3-5 evaluators; any more and you get diminishing returns.  Using more than 5 evaluators also costs more money!
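The diminishing returns can be illustrated with the model published by Nielsen and Landauer, in which the share of problems found by n evaluators is 1 - (1 - L)^n. This model and the value L = 0.31 are not from the lecture; 0.31 is the average they reported, and the real rate varies by project, so treat the numbers as a rough sketch.

```python
def proportion_found(evaluators, discovery_rate=0.31):
    """Share of usability problems found by a given number of evaluators.

    discovery_rate is the assumed chance that one evaluator finds a given
    problem (0.31 is an often-quoted average, not a value from the lecture).
    """
    return 1.0 - (1.0 - discovery_rate) ** evaluators


for n in range(1, 11):
    gain = proportion_found(n) - proportion_found(n - 1)
    print(f"{n:2d} evaluators: {proportion_found(n):.0%} found (+{gain:.0%})")
```

Under these assumptions each added evaluator contributes less than the previous one, which is the cost argument the slide makes.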

Cognitive Walkthroughs (1)  Cognitive walkthrough is a formalized way of imagining people’s thoughts and actions when they use an interface for the first time.  Start with a prototype or a detailed design description of the interface and known end-users.  Try to tell a believable story about each action a user has to take to do the task.  If you can’t tell a believable story about an action, then you've located a problem with the interface.  Walkthroughs focus most clearly on problems that users will have when they first use an interface.

Cognitive Walkthroughs (2)
1. You need a description or a prototype of the interface. It doesn’t have to be complete, but it should be fairly detailed. Details such as exactly what words are in a menu can make a big difference.
2. You need a task description. The task should usually be one of the representative tasks you’re using for task-centered design, or some piece of that task.
3. You need a complete, written list of the actions needed to complete the task with the interface.
4. You need an idea of who the users will be and what kind of experience they’ll bring to the job. This is an understanding you should have developed through your task and user analysis. Ideally, you have developed detailed user personas either through customer surveys or ethnographic studies.

Cognitive Walkthroughs (3)  For each action in the list, ask:
1. Will users be trying to produce whatever effect the action has? (Example: safely remove hardware in Windows)
2. Will users see the control (button, menu, switch, etc.) for the action? (Example: hidden cascading icons in Windows menus and Taskbar)
3. Once users find the control, will they recognize that it produces the effect they want?
4. After the action is taken, will users understand the feedback they get, so they can go on to the next action with confidence?
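A walkthrough can be recorded as one set of four yes/no answers per action, with any "no" marking a spot where the believable story breaks down. The sketch below is only one possible record format; the `ActionReview` class and the two sample actions are hypothetical.

```python
from dataclasses import dataclass

# The four walkthrough questions, paraphrased from the slide above.
QUESTIONS = (
    "Will users be trying to produce the effect the action has?",
    "Will users see the control for the action?",
    "Will users recognize that the control produces the effect they want?",
    "Will users understand the feedback they get after the action?",
)


@dataclass
class ActionReview:
    action: str
    answers: tuple  # four booleans, one per question, in order

    def problems(self):
        """Return the questions that were answered 'no' for this action."""
        return [q for q, ok in zip(QUESTIONS, self.answers) if not ok]


# Hypothetical walkthrough of two actions in a meal-ordering task.
walkthrough = [
    ActionReview("Open the menu screen", (True, True, True, True)),
    ActionReview("Press Low-Fat", (True, True, False, True)),
]

for review in walkthrough:
    for question in review.problems():
        print(f"Problem at '{review.action}': {question}")
```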

Human subject  Wizard of Oz  Think aloud

Human subjects (1)  The best test users are people who are representative of the people you expect to have as users.  Obtain voluntary, informed consent for testing.  If you are working in an organization that receives federal research funds, you are obligated to comply with formal rules and regulations that govern the conduct of tests, including getting approval from a review committee for any study that involves human participants.

Human subjects (2)  Train test users as they are likely to receive training in the field  You should always do a pilot study as part of any usability test. Do this twice, once with colleagues, to get out the biggest bugs, and then with real users.  Keep variability to a minimum. Do not provide one user more guidance or “Help” than another.

Human subjects (3)  During the test
  Make clear to test users that they are free to stop participating at any time. Avoid putting any pressure on them to continue.
  Monitor the attitude of your test users carefully, especially if they become upset with themselves when things don’t go well.
  Stress that it is your system, not the users, that is being tested.
  You cannot provide any help beyond what they would receive in the field!

Collecting Data (1)  Process Data Qualitative observations of what the test users are doing and thinking as they work through the tasks.  Bottom-Line Data Quantitative data on how long the user spent on the experiment, how many mistakes, how many questions, etc.

“Think Aloud” (1)  “Tell me what you are thinking about as you work.”  Encourage the user to talk while working, to voice what they’re thinking, what they are trying to do, questions that arise as they work, things they read.  Tell the user that you are not interested in their secret thoughts but only in what they are thinking about their task.  Record (videotape, tape, written notes) their comments.  Convert the words and actions into data about your prototype using a coding sheet

“Think Aloud” – Coding (2)
Time | Action / Statement | Error state | Type | Comment
00:00 | Start | | | Given task to create a menu for two kids aged 6 and 8.
00:10 | “I see multiple ways in which I can start the ordering process like suggested menu or low-fat menu. Ok, I’ll start at low-fat.” | No | Deciding goal; Selecting action |
00:12 | Press Low-Fat. | No | Interface action |
00:15 | “Oh this is low-fat for adults. My kids wouldn’t eat steamed broccoli and fish.” | Yes | Interpreting system state |

“Think Aloud” – Coding (3)  Coding Scheme
Cognitive Ergonomics Issues: Searching, Learning, Interpreting, Recalling, Memorizing, Selecting
Physical Ergonomics Issues: Screen resolution, audio amplitude, text size, icon size
Affective Issues: Emotion
Content Issues: Relevance of content, Information design preference, Color and Font choice
Computer Interaction Activity: Mouse movement, Mouse selection, Keyboard action, Spoken command

Getting “hard data”  Time to task completion  % of tasks completed  % of tasks completed per unit time (speed)  Ratio of successes to failures  Time spent in error state  Time spent recovering from errors  % or number of errors per number of actions  Frequency of getting help  Number of times user loses control of system  Number of times user expresses frustration
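Several of these measures fall out directly from a per-task log. The sketch below assumes a hypothetical `TaskResult` record and four made-up sessions; it computes only a few of the measures listed, and the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    completed: bool
    seconds: float       # time spent on the task
    errors: int          # number of errors observed
    actions: int         # number of actions the user performed
    help_requests: int   # number of times the user asked for help


def summarize(results):
    """Compute a few bottom-line measures from a list of task results."""
    done = [r for r in results if r.completed]
    total_actions = sum(r.actions for r in results)
    return {
        "pct_completed": 100.0 * len(done) / len(results),
        "mean_time_completed": sum(r.seconds for r in done) / len(done) if done else None,
        "errors_per_action": sum(r.errors for r in results) / total_actions if total_actions else None,
        "help_requests": sum(r.help_requests for r in results),
    }


# Hypothetical sessions, one per test user.
results = [
    TaskResult(True, 95.0, 1, 20, 0),
    TaskResult(True, 120.0, 3, 30, 1),
    TaskResult(False, 300.0, 6, 50, 2),
    TaskResult(True, 85.0, 0, 20, 0),
]

summary = summarize(results)
for name, value in summary.items():
    print(name, value)
```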

Time | Statement | Error | Code | Comment
1 | “I am supposed to find out how many restaurants there are.” | No | Searching |
2 | Selected Restaurant menu item. | Error | |

Wizard of Oz  “Faking the implementation”  You emulate and simulate unimplemented functions and generate the feedback users should see.  Uses
  Testing systems that need to respond to unpredictable user input
  Testing which input techniques and sensing mechanisms best represent the interaction
  Finding out the kinds of problems people will have with the devices and techniques
  Very early stage testing (and quite useful for intelligent room)

Quantitative Evaluation

When to progress to quantitative  Qualitative methods are best for formative assessments  Quantitative methods are best for summative assessments

GOMS (1)  GOMS means Goals Operators Methods Selection rules

GOMS (2)  Goal Go from North Sydney to University of Sydney  Operators Locate train station, board correct train, alight at Central  Methods Walk, take bus, take ferry, take train, bike, drive  Selection rules Example: Walking is inexpensive but slow Example: Taking a bus is subject to uncertain road conditions

GOMS (3)  Goals = something the user wants to do; may have subgoals which are ordered hierarchically  Operators = specific actions performed in service of a goal; no sub-operators  Methods = sequences of operators that accomplish a goal  Selection rules = how to choose among methods

GOMS (4)  Keystroke-Level Model (KLM) To estimate execution time for a task, list the sequence of operators and then total the execution times for the individual operators. In particular, specify the method used to accomplish each particular task instance.
 Six Operators
K: press a key or button
P: point with a mouse to a target on a display
H: home hands on the keyboard or other device
D: draw a line segment on a grid
M: mentally prepare to do an action or a closely related series of primitive actions
R: system response time during which the user has to wait for the system
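A KLM estimate is just a sum over the operator sequence. The sketch below uses operator times commonly quoted from Card, Moran, and Newell (K = 0.28 s, P = 1.10 s, H = 0.40 s, M = 1.35 s); these values are not given in the lecture and should be treated as assumptions. D and R depend on the drawing and on the system, so they are supplied per step. The sample sequence is hypothetical.

```python
# Assumed fixed operator times in seconds (commonly quoted KLM values).
OPERATOR_SECONDS = {"K": 0.28, "P": 1.10, "H": 0.40, "M": 1.35}


def klm_time(sequence):
    """Total the execution time of a KLM operator sequence.

    Each step is either a letter K, P, H, or M, or a tuple such as
    ("R", 0.5) or ("D", 2.0) giving an explicit time in seconds.
    """
    total = 0.0
    for step in sequence:
        if isinstance(step, tuple):
            operator, seconds = step
            if operator not in ("D", "R"):
                raise ValueError(f"explicit time not expected for {operator!r}")
            total += seconds
        else:
            total += OPERATOR_SECONDS[step]
    return total


# Hypothetical method: home on the mouse, prepare mentally, point at an icon,
# press the button, point at the trash, release the button, wait for the system.
drag_to_trash = ["H", "M", "P", "K", "P", "K", ("R", 0.5)]
print(f"Estimated time: {klm_time(drag_to_trash):.2f} s")
```

Comparing two interfaces then amounts to writing one sequence per interface for the same task and comparing the totals.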

GOMS (5)

GOMS (6)  Card, Moran, and Newell GOMS (CMN-GOMS) Like GOMS, CMN-GOMS has a strict goal hierarchy, but methods are represented in an informal program form that can include submethods and conditionals. Used to predict operator sequences.

GOMS (7)  Natural GOMS Language (NGOMSL) An NGOMSL model is constructed by performing a top-down, breadth-first expansion of the user’s top-level goals into methods, until the methods contain only primitive operators, typically keystroke-level operators. Like CMN-GOMS, NGOMSL models explicitly represent the goal structure, and so they can represent high-level goals.  NGOMSL provides learning time as well as execution time predictions.

GOMS (8)  Comparative Example Goal = remove a directory Comparison = Apple Macintosh MacOS X and Windows XP K-L-M Method

Hypothesis Testing (1)  Stating and testing a hypothesis allows the designer To provide data about cognitive process and human performance limitations To compare systems and fine-tune interaction  By Controlling variables and conditions in the test Removing experimenter bias

Hypothesis Testing (2)  A hypothesis IS A proposed explanation for a natural or artificial phenomenon  A hypothesis IS NOT A tautology (i.e., could not possibly be disproved)

Hypothesis Writing (1)  A good hypothesis (Interactive Menu Project) There is no difference in the time to complete a meal order between a dialog driven interface and a menu driven interface regardless of the expertise level of the subject.  A bad hypothesis (Interactive Menu Project) The meal order entry system is easy to use.

Hypothesis Writing (2)  A good hypothesis includes Independent variables that are to be altered  Aspects of the testing environment that you manipulate independent of a subject’s behaviour  Classifying the subjects into different categories (novice, expert) Example from Interactive Menu Project  UI Genre: Dialog driven; Menu driven  User Type: Expert, Novice

Hypothesis Writing (3)  A good hypothesis also includes Dependent variables that you will measure  Quantitative measurements and observations of phenomena that depend on the subject’s interaction with the system and on the independent variables Example  Interactive Menu Project Order entry time Number of selection errors made Count of interaction methods

Methods of Quantitative Analysis  Mean, Median and Standard Deviation  Correlation  ANOVA (analysis of variance)

Mean, Median and Standard Deviation  The mean is the expected value of a measured quantity.  The median is defined as the middle of a distribution: half the values are above the median and half are below the median.  The standard deviation tells you how tightly clustered the values are around the mean.
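All three summaries are available in Python's standard `statistics` module. The order entry times below are made up for illustration; `stdev` computes the sample standard deviation (dividing by n - 1).

```python
from statistics import mean, median, stdev

# Hypothetical order entry times in seconds for five test users.
order_times = [42.0, 55.0, 48.0, 61.0, 44.0]

average = mean(order_times)    # sum of the values divided by their count
middle = median(order_times)   # half the values lie above, half below
spread = stdev(order_times)    # sample standard deviation around the mean

print(f"mean={average:.1f}s median={middle:.1f}s stdev={spread:.2f}s")
```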

Correlation (1)  Used when you want to find a relationship between two variables, where one is usually considered the independent variable and the other the dependent variable.  The correlation coefficient ranges from -1 to +1
  Up to +1 when there is a direct relationship
  0 when there is no (linear) relationship
  Down to -1 when there is an inverse relationship
 Notes A correlation does not imply causality. An apparent relationship may also come from a biased sample or from a sample that is too small.

Correlation (2)  Example – Is there a correlation between the number of words people say while playing Monopoly and how much fun they’re having?  Independent variable: Number of words  Dependent variable: Fun
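A minimal sketch of the Pearson correlation coefficient for this kind of question, written out by hand so each step is visible. The word counts and fun ratings are invented for illustration and are not the lecture's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical observations: words spoken per game and a 1-7 fun rating.
words = [120, 200, 340, 410, 560]
fun = [3, 4, 4, 6, 7]

r = pearson_r(words, fun)
print(f"r = {r:.2f}")
```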

Correlation (3) R=0.64

ANOVA (1)  ANOVA is ANalysis Of VAriance.  Used when you want to find out whether the means of a measured quantity (observation) differ by a statistically significant amount across different test cases (factor levels).  In the procedure described here, the number of replicates (observations per factor level) must be the same in each factor level. This is called a balanced ANOVA.

ANOVA (2)  Example Suppose you want to test the completion time for ordering a meal with the Interactive Menu. You decide to classify your users into three age groups (the youngest being 5-12). Then you measure the amount of time it takes each user to complete the order entry. There is likely to be a different mean time to order among the three age groups. What you want to know is whether the groups really are different. That is, is there statistical evidence that age causes the difference among the mean order entry times?

ANOVA (3)  The null hypothesis – There is no real effect of age on order entry time; the groups are likely to have different mean completion times through chance variation alone.  The standard deviation of the expected mean, σ / √N, gives the likely size of that chance variation. In the equation, σ is the standard deviation of the completion times for all groups and N is the number of people per group (must be the same).

ANOVA (4)  Collect the completion time for each group.  Calculate the mean completion time and standard deviation for each group.  If the standard deviation of those means is “significantly” larger than the standard deviation of the expected mean, we have evidence that the null hypothesis is not correct and instead age has an effect.

ANOVA (5)  You of course would typically use a statistical analysis package! This is what the package does.

ANOVA (6)  Calculates the sum of squares of the averages (between groups)  Calculates the sum of squares of the errors (within groups)  Calculates the mean square of the averages  Calculates the mean squared error  Calculates the F-ratio  Looks up the P-value: the probability of an F-ratio at least this large arising by chance if the null hypothesis is true
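The steps above can be sketched for a one-way ANOVA in plain Python. The completion times are made up, and the function stops at the F-ratio: converting F to a P-value needs the F distribution, which a statistics package would supply.

```python
def one_way_anova(groups):
    """Return the intermediate quantities and F-ratio for a one-way ANOVA."""
    k = len(groups)                         # number of factor levels
    n_total = sum(len(g) for g in groups)   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Sum of squares of the averages: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Sum of squares of the errors: observations around their own group mean.
    ss_error = sum((x - m) ** 2
                   for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)       # mean square of the averages
    ms_error = ss_error / (n_total - k)     # mean squared error
    return {
        "ss_between": ss_between,
        "ss_error": ss_error,
        "ms_between": ms_between,
        "ms_error": ms_error,
        "f_ratio": ms_between / ms_error,
    }


# Hypothetical order entry times (seconds), three users per age group.
times_by_age_group = [
    [61.0, 58.0, 64.0],
    [45.0, 47.0, 43.0],
    [50.0, 52.0, 48.0],
]
result = one_way_anova(times_by_age_group)
print(f"F = {result['f_ratio']:.2f}")
```

A large F-ratio means the group means vary more than the within-group scatter would explain, which is evidence against the null hypothesis.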

Usability Evaluation Summary
 Select appropriate evaluation technique based on availability of human subjects and fidelity of prototype
  Wizard of Oz is suitable for early stage; “think aloud” is not
  Heuristic evaluation and cognitive walkthroughs are good for mid-stage reviews; GOMS is overkill until details of interface have been finalized
 Set clear metrics and objectives for evaluation
  Everyone should agree what is being tested.
  If the results of human subject tests are ambiguous, you either need a larger sample set (more time and money) or your testing procedure was too variable.
 Agree how to use results of feedback before testing

Quantitative Analysis
 Controlled experiments that test a hypothesis can provide convincing evidence on specific usability issues
 In practice, often used in
  Extremely complex interfaces (aviation)
  High-risk (medical instruments)
  High-use (manufacturing)
  Academic research when developing new interface genres