Involving Users in Interface Evaluation Marti Hearst (UCB SIMS) SIMS 213, UI Design & Development April 8, 1999.



Outline
• Why do user testing?
• Informal studies
  – collecting and analyzing process data
  – ethical considerations
• Formal studies
  – choosing variables
  – interaction effects
  – special considerations for studies involving users
Adapted from slide by James Landay

What is Usability?
• The extent to which a product can be used by specified users to achieve specified goals with
  – effectiveness
  – efficiency
  – satisfaction
  in a specified context of use. [ISO9241]
• Usability evaluation is a methodology for measuring these usability aspects of a user interface

Why do User Testing?
• Can’t tell how good or bad a UI is until:
  – people use it!
• Other methods are based on evaluators:
  – may know too much
  – may not know enough (about tasks, etc.)
• Summary: Hard to predict what real users will do
Adapted from slide by James Landay

Two Main Approaches
• Less formal: get a feeling for how users will use the interface
  – participants work through task scenarios
  – gather process data
  – obtain user preference information
• Formal studies
  – isolate the effects of particular UI components
  – compare competing designs
  – make quantitative measurements
Adapted from slide by James Landay

Why Two Main Approaches?
• Informal Study
  – Process data is easier to gather
  – Gives an overview of where the big problems are
• Formal Study
  – need many participants to prove your points (obtain statistical significance)
  – experiments that isolate effects properly often end up measuring things that are too fine-grained to really inform UI design
Adapted from slide by James Landay

Informal Study
• Select tasks
• Select participant groups
• Decide methodology for collecting data and what kinds of data to collect
• Do the study
• Analyze the results
• Make recommendations for changes to the design
Adapted from slide by James Landay

Selecting Tasks
• Should reflect what real tasks will be like
• Tasks from analysis & design can be used
  – may need to shorten if they
    » take too long
    » require background that the test user won’t have
• Avoid bending tasks in the direction of what your design best supports
• May have to simplify in order to produce usable results
Adapted from slide by James Landay

Choosing Participants
• Should be representative of eventual users in terms of
  – job-specific vocabulary / knowledge
  – tasks
• If you can’t get real users, get an approximation
  – system intended for doctors
    » get medical students
  – system intended for electrical engineers
    » get engineering students
• Use incentives to get participants
Adapted from slide by James Landay

Deciding on Data to Collect
• Process data
  – observations of what users are doing & thinking
  – kinds of errors made
  – general strategies used and not used
Adapted from slide by James Landay

The “Thinking Aloud” Method
• Need to know what users are thinking, not just what they are doing
• Ask users to talk while performing tasks
  – tell us what they are thinking
  – tell us what they are trying to do
  – tell us questions that arise as they work
  – tell us things they read
• Make a recording or take good notes
  – make sure you can tell what they were doing
Adapted from slide by James Landay

Thinking Aloud (cont.)
• Prompt the user to keep talking
  – “tell me what you are thinking”
• Only help on things you have pre-decided
  – keep track of anything you do give help on
• Recording
  – use a digital watch/clock
  – take notes, plus if possible
    » record audio and video (or event logs)
Adapted from slide by James Landay

Ethical Considerations
• Sometimes tests can be distressing
• You have a responsibility to alleviate this
  – make participation voluntary, with an informed consent form
  – avoid pressure to participate
  – let them know they can stop at any time [Gomoll]
  – stress that you are testing the system, not them
  – make collected data as anonymous as possible
• Often must get official approval for use of human subjects
  – There is a campus exception for class projects
Adapted from slide by James Landay

User Test Proposal
• A report that contains
  – objective
  – description of the system being tested
  – hypotheses
  – task environment & materials
  – participants
  – methodology
  – tasks
  – test measures
• A good strategy:
  – Get this approved & then reuse it when writing up your results
Adapted from slide by James Landay

Using the Test Results
• Summarize the data
  – make a list of all critical incidents (CI)
    » positive & negative
  – include references back to original data
  – try to judge why each difficulty occurred
• What does the data tell you?
  – Did the UI work the way you thought it would?
  – Is something missing?
Adapted from slide by James Landay

Using the Results (cont.)
• Update task analysis and rethink design
  – rate severity & ease of fixing CIs
  – fix the severe problems & also make the easy fixes
• Will thinking aloud give the right answers?
  – not always
  – if you ask a question, people will always give an answer, even if it has nothing to do with the facts
    » try to avoid specific questions
Adapted from slide by James Landay
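The "rate severity & ease of fixing" step can be sketched as a small triage routine. This is only an illustration: the `CriticalIncident` record, the 1 to 4 rating scales, the thresholds, and the sample incidents are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class CriticalIncident:
    description: str
    severity: int    # hypothetical scale: 1 (cosmetic) .. 4 (blocks the task)
    fix_effort: int  # hypothetical scale: 1 (trivial) .. 4 (needs a redesign)


def fix_list(incidents, severe_at=3, easy_at=1):
    """Keep incidents that are severe or easy to fix, most severe first."""
    chosen = [
        ci for ci in incidents
        if ci.severity >= severe_at or ci.fix_effort <= easy_at
    ]
    # Sort by descending severity, then by ascending effort as a tie-breaker.
    return sorted(chosen, key=lambda ci: (-ci.severity, ci.fix_effort))


# Invented sample incidents, for illustration only.
incidents = [
    CriticalIncident("Could not find the Save command", severity=4, fix_effort=2),
    CriticalIncident("Typo in a settings label", severity=1, fix_effort=1),
    CriticalIncident("Would have preferred another icon color", severity=1, fix_effort=3),
    CriticalIncident("Misread the search results layout", severity=3, fix_effort=4),
]

for ci in fix_list(incidents):
    print(ci.severity, ci.fix_effort, ci.description)
```

The minor, hard-to-fix incident is left off the list, which mirrors the advice to fix severe problems and make the easy fixes.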

Measuring User Preference
• How much users like or dislike the system
  – often use Likert scales
  – or have them choose among statements
    » “best UI I’ve ever…”, “better than average”…
  – hard to be sure what data will mean
    » novelty of UI, feelings, not realistic setting, etc.
  – Shneiderman’s QUIS is a general example (in the reader)
• If many give you low ratings -> trouble
• Can get some useful data by asking
  – what they liked, disliked, where they had trouble, best part, worst part, etc. (redundant questions)
Adapted from slide by James Landay
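One way to check for the "many low ratings" warning sign is to tabulate the Likert responses. The sketch below assumes a 1 to 5 scale and uses invented ratings; the function name and the "below midpoint" count are illustrative choices, not something the slides prescribe.

```python
from collections import Counter
from statistics import median


def summarize_likert(ratings, low=1, high=5):
    """Summarize ordinal ratings: median, per-value counts, count below midpoint."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    if any(r < low or r > high for r in ratings):
        raise ValueError("rating outside the scale")
    counts = Counter(ratings)
    midpoint = (low + high) / 2
    return {
        "n": len(ratings),
        "median": median(ratings),
        "distribution": {v: counts[v] for v in range(low, high + 1)},
        "below_midpoint": sum(1 for r in ratings if r < midpoint),
    }


# Invented ratings from five participants, for illustration only.
ratings = [1, 2, 2, 4, 5]
summary = summarize_likert(ratings)
print(summary)
```

The median and the full distribution are reported rather than a mean, because Likert responses are ordinal and the spacing between points is not guaranteed to be equal.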

Formal Usability Studies
• Situations in which these are useful
  – to determine time requirements for task completion
  – to compare two designs on measurable aspects
    » time required
    » number of errors
    » effectiveness for achieving very specific tasks
• Do not combine with thinking-aloud
  – talking can affect speed & accuracy (neg. & pos.)
• Require Experiment Design
Adapted from slide by James Landay

Experiment Design
• Experiment design involves determining how many experiments to run and which attributes to vary in each experiment
• Goal: isolate which aspects of the interface really make a difference

Experiment Design
• Decide on
  – Response variables
    » the outcome of the experiment
    » usually the system performance
    » aka dependent variable(s)
  – Factors (aka attributes)
    » aka independent variables
  – Levels (aka values for attributes)
  – Replication
    » how often to repeat each combination of choices

Experiment Design
• Studying a system; ignoring users
• Say we want to determine how to configure the hardware for a personal workstation (from Jain 91, The Art of Computer Systems Performance Analysis)
  – Hardware choices
    » which CPU (three types)
    » how much memory (four amounts)
    » how many disk drives (from 1 to 3)
  – Workload characteristics
    » administration, management, scientific

Experiment Design
• We want to isolate the effect of each component for the given workload type.
• How do we do this?
  – WL1 CPU1 Mem1 Disk1
  – WL1 CPU1 Mem1 Disk2
  – WL1 CPU1 Mem1 Disk3
  – WL1 CPU1 Mem2 Disk1
  – WL1 CPU1 Mem2 Disk2
  – …
• There are (3 CPUs)*(4 memory sizes)*(3 disk-drive counts)*(3 workload types) = 108 combinations!
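The combination count can be checked by enumerating the full factorial design. A minimal sketch: only the number of levels per factor and the three workload names come from the slides, so the CPU and memory labels below are placeholders.

```python
from itertools import product

# Factor levels. The CPU and memory labels are placeholders; the slides
# give only how many levels each factor has.
factors = {
    "workload": ["administration", "management", "scientific"],
    "cpu": ["CPU1", "CPU2", "CPU3"],
    "memory": ["Mem1", "Mem2", "Mem3", "Mem4"],
    "disks": [1, 2, 3],
}

# A full factorial design runs every combination of levels once.
design = [dict(zip(factors, combo)) for combo in product(*factors.values())]

print(len(design))   # 3 * 3 * 4 * 3 = 108
print(design[0])
```

Adding replication multiplies the run count again: repeating each combination r times gives 108 * r runs.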

Experiment Design
• One strategy to reduce the number of comparisons needed:
  – pick just one attribute
  – vary it
  – hold the rest constant
• Problems:
  – inefficient
  – might miss effects of interactions
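To see how much this strategy cuts the run count, the sketch below compares it with the full factorial design for the workstation example. It assumes one shared baseline configuration and that each factor is then stepped through its remaining levels; that counting convention is an assumption, not something stated on the slide.

```python
# Number of levels per factor, as in the workstation example.
levels = {"workload": 3, "cpu": 3, "memory": 4, "disks": 3}


def full_factorial_runs(levels):
    """Every combination of levels is run once."""
    runs = 1
    for n in levels.values():
        runs *= n
    return runs


def one_at_a_time_runs(levels):
    """One baseline run, plus (n - 1) extra runs for each factor varied alone."""
    return 1 + sum(n - 1 for n in levels.values())


print(full_factorial_runs(levels))   # 108
print(one_at_a_time_runs(levels))    # 10
```

The saving is large, but every non-baseline run informs the estimate for only one factor, and no run ever changes two factors together, so an interaction between factors cannot be observed.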

Interactions among Attributes

Non-interacting        Interacting
      A1  A2                 A1  A2
B1     3   5           B1     3   5
B2     6   8           B2     6   9

[Figure: plots of the two tables, with points labeled A1, A2, B1, B2]
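The difference between the two tables can be made concrete with a difference-of-differences check. The sketch reads the slide's flattened values as two 2x2 tables (rows B1/B2, columns A1/A2: 3, 5 / 6, 8 and 3, 5 / 6, 9); that reading, and the function name, are assumptions.

```python
def interaction_contrast(table):
    """Effect of moving A1 -> A2 at level B2, minus the same effect at level B1.

    Zero means the effect of A does not depend on B (no interaction).
    """
    effect_at_b1 = table["B1"]["A2"] - table["B1"]["A1"]
    effect_at_b2 = table["B2"]["A2"] - table["B2"]["A1"]
    return effect_at_b2 - effect_at_b1


non_interacting = {"B1": {"A1": 3, "A2": 5}, "B2": {"A1": 6, "A2": 8}}
interacting = {"B1": {"A1": 3, "A2": 5}, "B2": {"A1": 6, "A2": 9}}

print(interaction_contrast(non_interacting))  # 0: A adds 2 at both levels of B
print(interaction_contrast(interacting))      # 1: A adds 2 at B1 but 3 at B2
```

With real measurements the contrast would rarely be exactly zero, so replication is needed to tell a genuine interaction from noise.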

Experiment Design
• Another strategy: figure out which attributes are important first
• Do this by just comparing a few major attributes at a time
  – if an attribute has a strong effect, include it in future studies
  – otherwise assume it is safe to drop it
• This strategy also allows you to find interactions between a few attributes

Special Considerations for Formal Studies with Human Participants
• Studies involving human participants vs. measuring automated systems
  – people get tired
  – people get bored
  – people (may) get upset by some tasks
  – learning effects
    » people will learn how to do the tasks (or the answers to questions) if repeated
    » people will (usually) learn how to use the system over time

More Special Considerations
• High variability among people
  – especially when involved in reading/comprehension tasks
  – especially when following hyperlinks! (can go all over the place)

Between Groups vs. Within Groups
• Do participants see only one design or both?
• Between groups
  – two groups of test users
  – each group uses only 1 of the systems
• Within groups experiment
  – one group of test users
    » each person uses both systems
    » can’t use the same tasks (learning)
  – best for low-level interaction techniques
• Why is this a consideration?
  – People often learn during the experiment.
Adapted from slide by James Landay
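The two assignment schemes can be sketched as follows. Alternating the order of the designs in the within-groups case is counterbalancing, a common way to spread learning effects evenly; the slides do not mention it, so treat it as an added assumption. Participant IDs, design names, and function names are hypothetical.

```python
import random


def assign_between(participants, designs, seed=0):
    """Between groups: each participant uses exactly one design."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment to groups
    return {p: [designs[i % len(designs)]] for i, p in enumerate(shuffled)}


def assign_within(participants, designs, seed=0):
    """Within groups: everyone uses both designs; the order alternates."""
    if len(designs) != 2:
        raise ValueError("this sketch handles exactly two designs")
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    orders = [list(designs), list(reversed(designs))]
    return {p: orders[i % 2] for i, p in enumerate(shuffled)}


participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
designs = ["A", "B"]

print(assign_between(participants, designs))
print(assign_within(participants, designs))
```

In the within-groups case each participant would still need a different task set for the second design, since repeating the same tasks lets learning carry over.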

Summary
• User testing is important, but takes time/effort
• Use real tasks & representative participants
• Be ethical & treat your participants well
• Want to know what people are doing & why
  – collect process data
  – early testing can be done on mock-ups (low-fi)
• More on formal studies next time.