Evaluation
How do we test the interaction design? Several dimensions:
- Qualitative vs. quantitative assessments
- Conceptual vs. physical design
Why Evaluate
Five good reasons:
- Problems are fixed before the product is shipped
- The team can concentrate on real (not imaginary) problems
- Engineers can develop code instead of debating their personal preferences
- Time to market is sharply reduced
- A solid, tested design will sell better
When to Evaluate
Formative evaluations:
- Conducted during requirements specification and design
- Consider alternatives
Summative evaluations:
- Assess the success of a finished product
- Determine whether the product satisfies the requirements
What to Evaluate
A huge variety of user interaction features can (and should) be evaluated, such as:
- Sequence of links in a web search
- Enjoyment experienced by game users
- System response time
- Signal detection performance in data analysis
Gould's principles:
- Focus on users and their tasks
- Observe, measure, and analyze user performance
- Design iteratively
Qualitative Assessment
Informal:
- Simply ask users how they like the system
- Listen to "hallway" conversations about systems
Formal:
- Develop survey instruments to ask specific questions, e.g.:
  - How long did it take you to become comfortable?
  - Which task is the most difficult to accomplish?
- Hold focus group discussions about the system
Quantitative Assessment
- Identify usability criteria (from the requirements) to test
- Design human performance experiments to test these, e.g.:
  - Measure response time or time to complete a task
  - Measure error rate or incidence of "dead ends"
- This can be used during the design process to compare alternative designs (see the sketch below)
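As a hedged illustration of the measurements this slide describes, the sketch below computes time-on-task and error rate from hypothetical session records for two alternative designs. The data layout and field names are assumptions for illustration, not part of the course material.

```python
# Minimal sketch (assumed data format): each session record holds the design
# variant tested, task start/end timestamps in seconds, and the number of
# errors or "dead ends" the participant hit.
from statistics import mean

sessions = [
    {"design": "A", "start": 0.0, "end": 41.2, "errors": 1},
    {"design": "A", "start": 0.0, "end": 37.5, "errors": 0},
    {"design": "B", "start": 0.0, "end": 55.9, "errors": 2},
    {"design": "B", "start": 0.0, "end": 48.3, "errors": 1},
]

def summarize(design):
    """Return mean time-on-task and the fraction of sessions with errors."""
    runs = [s for s in sessions if s["design"] == design]
    times = [s["end"] - s["start"] for s in runs]
    error_rate = sum(s["errors"] > 0 for s in runs) / len(runs)
    return mean(times), error_rate

for design in ("A", "B"):
    t, e = summarize(design)
    print(f"Design {design}: mean time-on-task {t:.1f} s, error rate {e:.0%}")
```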
An Evaluation Framework
- Evaluation must be an intentional, planned process; ad hoc evaluations are of very little value
- The details of the particular framework can vary from team to team
- What is important is that the framework be crafted in advance, and that all team members understand it
Evaluation Paradigms
Evaluation paradigms are the beliefs and practices (perhaps underpinned by theory) that guide a user study. We'll discuss four core evaluation paradigms:
- Quick and dirty evaluation
- Usability testing
- Field studies
- Predictive evaluation
Quick and Dirty
- Informal feedback from users
- Can be conducted at any stage
- Emphasis is on speed, not quality
- Often consultants are used as surrogate users
Usability Testing
- Measuring typical users' performance on carefully prepared tasks that are typical for the system
- Metrics can include such things as:
  - Error rate and time to completion
  - Observations/recordings/logs of interaction
  - Questionnaires
- Strongly controlled by the evaluator
What is usability? An Operational Definition
- Effective, efficient, safe, and productive
- Easy to learn, to remember, and to use
- As well as satisfying, enjoyable, pleasing, motivating, and fulfilling
Field Studies
- Done in natural settings
- Try to learn what users do and how
- Artifacts are collected: video, notes, sketches, etc.
Two approaches:
- As an outsider looking on
  - Qualitative techniques are used to gather data, which may be analyzed qualitatively or quantitatively
- As an insider
  - Easier to capture the role of the social environment
Predictive Evaluation
- Uses models of typical users (heuristic or theoretical)
- Users themselves need not be present
- Cheaper, faster
- Tried and true heuristics can be useful, e.g. "speak the users' language"
- A simple theoretical model of this kind is sketched below
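The slide does not name a specific model; as a hedged illustration of a theoretical predictive model, the sketch below applies Keystroke-Level-Model-style operator times to estimate expert task time without any users present. The operator values are the commonly cited textbook approximations, and the task encoding is a made-up example.

```python
# Keystroke-Level Model (KLM) style estimate: predict expert task time by
# summing standard operator times. Operator values are common approximations;
# the example task encoding is an assumption for illustration.
OPERATOR_SECONDS = {
    "K": 0.2,   # press a key or button
    "P": 1.1,   # point with a mouse at a target
    "H": 0.4,   # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_seconds(operators):
    """Sum operator times for a sequence like 'M P K H K K K K'."""
    return sum(OPERATOR_SECONDS[op] for op in operators.split())

# Example: prepare mentally, point at a field, click, home to keyboard, type 4 keys
task = "M P K H K K K K"
print(f"Predicted completion time: {predict_seconds(task):.2f} s")
```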
Evaluation Techniques
- Observing users
- Asking users their opinions
- Asking experts their opinions
- Testing users' performance
- Modeling users' task performance to predict the efficacy of the interface
Techniques vs. Paradigms
How each technique is used within each evaluation paradigm:
- Observing users: quick and dirty - see how users behave; usability testing - video and logging; field studies - the central technique
- Asking users: quick and dirty - discussions, focus groups; usability testing - questionnaires & interviews; field studies - may interview
- Asking experts: quick and dirty - provide critiques; predictive - heuristics early in design
- User testing: usability testing - test typical users, typical tasks; field studies - can measure performance, but difficult
- Modeling users' performance: predictive - models used to predict efficacy; N/A in the other paradigms
DECIDE
- Determine the overall goals that the evaluation addresses
- Explore the specific questions to be answered
- Choose the evaluation paradigm and techniques
- Identify the practical issues that must be addressed
- Decide how to deal with the ethical issues
- Evaluate, interpret, and present the data
Determine the goals
- What are the high-level goals of the evaluation? Who wants the evaluation, and why?
- The goals should guide the evaluation, e.g.:
  - Check that evaluators understood users' needs
  - Identify the metaphor underlying the design
  - Ensure that the interface is consistent
  - Investigate the degree to which technology influences working practices
  - Identify how the interface of an existing product could be engineered to improve its usability
Explore the questions
This amounts to hierarchical question development:
- "Is the user interface good?"
  - "Is the system easy to learn?"
    - "Are functions easy to find?"
    - "Is the terminology confusing?"
  - "Is response time too slow?"
    - "Is login time too slow?"
    - "Is calculation time too slow?"
  - …
Choose the evaluation paradigm and techniques
- Choosing one or more evaluation paradigms
  - Can use different paradigms in different stages
  - Can use multiple paradigms in a single stage
- Combinations of techniques can be used to obtain different perspectives
Identify the practical issues
Practical issues must be considered BEFORE beginning any evaluation:
- Users: an adequate number of representative users must be found
- Facilities and equipment: how many cameras? Where? Film?
- Schedule and budget: both are always less than would be ideal
- Expertise: assemble the correct evaluation team
Decide how to deal with ethical issues
Experiments involving humans must be conducted within strict ethical guidelines:
- Tell participants the goals and what will happen
- Explain that personal information is confidential
- Explain that they are free to stop at any time
- Pay subjects when possible: it establishes a formal relationship
- Avoid using quotes that reveal identity
- Ask users' permission to quote them, and show them the report
- Example: the Yale "shock" experiments (Milgram), 1961-62
Evaluate, interpret and present the data
Decide what data to collect and how to analyze them. Questions that need to be asked:
- Reliability: is the result reproducible?
- Validity: does the measure capture what it is supposed to?
- Biases: do biases distort the results?
- Scope: how generalizable are the findings?
- Ecological validity: how important is the evaluation environment? Does it match the real environment of interest?
Observing Users
- Ethnography – observing the social environment and recording observations which help to understand the function and needs of the people in it
- Users can be observed in controlled laboratory conditions or in the natural environments in which the products are used – i.e. the field
Goals, questions and paradigms
- Goals and questions should guide all evaluation studies
- Ideally, these are written down
- Goals help to guide the observation, because there is always so much going on
What and when to observe
- Insider or outsider?
- Laboratory or field? Control vs. realism
- What times are critical times (especially for field observations)?
Approaches to observation
- Quick and dirty
  - Informal
  - Insider or outsider
- Observation in usability testing
  - Formal
  - Video, interaction logs, performance data
  - Outsider
- Observation in field studies
  - Outsider, participant, or ethnographer (participant or not?)
How to observe
Techniques of observation and data gathering vary with the setting, as the next slides describe.
In controlled environments
- Decide location: a temporary lab in the user's environment? A remote laboratory?
- Equipment
- Hard to know what the user is thinking
  - The "Think Aloud" technique can help, but speaking can alter the interaction
  - Having two subjects work together can help
In the field
- Who is present? What are their roles?
- What is happening? (Include body language and tone)
- When does the activity occur?
- Where is it happening?
- Why is it happening?
- How is the activity organized?
Participant observation and ethnography
- In this case, the observer/evaluator must be accepted into the group
- Honesty about the purpose is important, both ethically and to gain trust
- There is disagreement in the field about the distinction between ethnography and participant observation
- Do ethnographers begin with any assumptions?
Data collection
- Notes plus still camera
- Audio recording plus still camera
- Video
Indirect observation: tracking users' activities
- Diaries
- Interaction logging (a minimal logging sketch follows)
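As a hedged sketch of what interaction logging can look like in practice, the snippet below timestamps user-interface events and appends them to a log file so they can later be replayed or quantified. The event names and file format are assumptions for illustration, not a prescribed logging scheme.

```python
# Minimal interaction logger sketch: each UI event is appended to a
# newline-delimited JSON file with a timestamp, so event counts and the
# time between events can be computed later.
import json
import time

LOG_PATH = "interaction_log.jsonl"  # assumed location

def log_event(event_type, **details):
    record = {"t": time.time(), "event": event_type, **details}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage during a session:
log_event("task_start", task="find_product")
log_event("click", target="search_button")
log_event("error", kind="dead_end", page="results")
log_event("task_end", task="find_product")
```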
Analyzing, interpreting, and presenting data
- Observation produces large quantities of data of various types
- How to analyze and interpret the data depends on the research questions first developed
Qualitative analysis to tell a story
- The ensemble of data (notes, video, diaries, etc.) is used to help designers, as a team, understand the users
- There is much room for evaluator bias in these techniques
Qualitative analysis for categorization
- A taxonomy can be developed into which users' behaviors can be placed
- The coding can be done by different observers, with the discrepancies used as a measure of observer bias (a simple agreement calculation is sketched below)
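The slide does not name a specific statistic; as a hedged illustration, the sketch below computes raw percent agreement and Cohen's kappa (a standard chance-corrected agreement measure) for two observers who coded the same behaviors into categories. The category labels are made up.

```python
# Compare two observers' category codes for the same observed events.
# Percent agreement is the simple overlap; Cohen's kappa corrects for the
# agreement expected by chance from each observer's category frequencies.
from collections import Counter

observer_a = ["navigate", "search", "error", "search", "navigate", "error"]
observer_b = ["navigate", "search", "error", "navigate", "navigate", "error"]

n = len(observer_a)
observed = sum(a == b for a, b in zip(observer_a, observer_b)) / n

counts_a, counts_b = Counter(observer_a), Counter(observer_b)
expected = sum(
    (counts_a[c] / n) * (counts_b[c] / n)
    for c in set(observer_a) | set(observer_b)
)

kappa = (observed - expected) / (1 - expected)
print(f"Percent agreement: {observed:.0%}, Cohen's kappa: {kappa:.2f}")
```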
Quantitative data analysis
- Observations, interaction logs, and results are gathered and quantified: counted and measured
- Analysis using statistical reasoning can be used to draw conclusions
  - What is statistical significance? What is a t-test? (A minimal example follows)
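As a hedged example of the statistical reasoning mentioned above, the sketch below runs an independent-samples t-test on made-up task-completion times for two designs using SciPy; the numbers are illustrative only.

```python
# Independent-samples t-test on (made-up) task completion times, in seconds,
# for two alternative designs. A small p-value (conventionally < 0.05) would
# suggest the difference in mean times is statistically significant.
from scipy.stats import ttest_ind

design_a_times = [41.2, 37.5, 44.0, 39.8, 42.1, 38.6]
design_b_times = [55.9, 48.3, 52.7, 50.1, 47.9, 53.4]

statistic, p_value = ttest_ind(design_a_times, design_b_times)
print(f"t = {statistic:.2f}, p = {p_value:.4f}")
```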
Case Study: Evaluating a Data Representation
Feeding the findings back into design
- Ideally, the design team will participate in post-evaluation discussions of qualitative data
- Reports to designers should include artifacts such as quotes, anecdotes, pictures, and video clips
- Depending on the design team, quantitative data may or may not be compelling