Asking users & experts and Testing & modeling users (Ref: Ch. 13-14)

1 Asking users & experts and Testing & modeling users (Ref: Ch. 13-14)

2 The aims
- Discuss the role of interviews & questionnaires in evaluation.
- Teach basic questionnaire design.
- Describe how to do interviews, heuristic evaluation & walkthroughs.
- Describe how to collect, analyze & present data.
- Discuss strengths & limitations of these techniques.

3 Interviews
- Unstructured: not directed by a script. Rich but not replicable.
- Structured: tightly scripted, often like a questionnaire. Replicable but may lack richness.
- Semi-structured: guided by a script, but interesting issues can be explored in more depth. Can provide a good balance between richness and replicability.

4 Basics of interviewing
- Remember the DECIDE framework.
- Goals and questions guide all interviews.
- Two types of questions: 'closed questions' have a predetermined answer format, e.g., 'yes' or 'no'; 'open questions' do not have a predetermined format.
- Closed questions are quicker and easier to analyze.
- DECIDE: Determine the goals the evaluation addresses; Explore the specific questions to be answered; Choose the evaluation paradigm and techniques to answer the questions; Identify the practical issues; Decide how to deal with the ethical issues; Evaluate, interpret and present the data.

5 Things to avoid when preparing interview questions
- Long questions.
- Compound sentences - split them into two.
- Jargon & language that the interviewee may not understand.
- Leading questions that make assumptions, e.g., why do you like ...?
- Unconscious biases, e.g., gender stereotypes.

6 Components of an interview
- Introduction: introduce yourself, explain the goals of the interview, reassure about the ethical issues, ask to record, present an informed consent form.
- Warm-up: make the first questions easy & non-threatening.
- Main body: present questions in a logical order.
- A cool-off period: include a few easy questions to defuse tension at the end.
- Closure: thank the interviewee and signal the end, e.g., switch the recorder off.

7 The interview process
- Use the DECIDE framework for guidance.
- Dress in a similar way to participants.
- Check recording equipment in advance.
- Devise a system for coding names of participants to preserve confidentiality.
- Be pleasant.
- Ask participants to complete an informed consent form.

8 Probes and prompts
- Probes: devices for getting more information, e.g., 'would you like to add anything?'
- Prompts: devices to help the interviewee, e.g., help with remembering a name.
- Remember that probing and prompting should not create bias.
- Too much can encourage participants to try to guess the answer.

9 Group interviews
- Also known as 'focus groups'.
- Typically 3-10 participants.
- Provide a diverse range of opinions.
- Need to be managed to ensure everyone contributes, the discussion isn't dominated by one person, and the agenda of topics is covered.

10 Analyzing interview data
- Depends on the type of interview.
- Structured interviews can be analyzed like questionnaires.
- Unstructured interviews generate data like that from participant observation.
- It is best to analyze unstructured interviews as soon as possible to identify topics and themes from the data.

11 Asking Users: Questionnaires
- Questions can be closed or open.
- Closed questions are easiest to analyze, and the analysis may be done by computer.
- Can be administered to large populations.
- Paper, email & the web are used for dissemination.
- An advantage of electronic questionnaires is that data goes into a database & is easy to analyze.
- Sampling can be a problem when the size of a population is unknown, as is common online.

12 Questionnaire style
- Varies according to goal, so use the DECIDE framework for guidance.
- Questionnaire formats can include: 'yes'/'no' checkboxes; checkboxes that offer many options; Likert rating scales (1, 2, 3, 4, 5); semantic scales; open-ended responses.
- Likert scales have a range of points; 3, 5, 7 & 9 point scales are common, and there is debate about which is best.
- Example semantic scale items:
  Attractive |___|_X_|___|___|___| Ugly
  Clear      |___|___|_X_|___|___| Confusing
  Dull       |___|___|___|___|___| Colorful
  Exciting   |___|_X_|___|___|___| Boring
  Annoying   |___|___|___|___|_X_| Pleasing
  Poor       |___|___|___|_X_|___| Well-designed
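The scale items above are normally coded as numbers before analysis. Below is a minimal sketch of that step, assuming a 5-point coding where 5 is the favourable end; the adjective pairs and marked boxes are illustrative, not taken from a real questionnaire.

```python
# Hypothetical example: coding 5-point semantic scale responses.
# Box positions are counted 1..5 from the left; the adjective pairs and the
# respondent's marks are made up for illustration.

def score(position: int, positive_on_left: bool = True) -> int:
    """Map a marked box (1..5) to a 1..5 score where 5 is the favourable end."""
    return 6 - position if positive_on_left else position

marks = {"Attractive-Ugly": 2, "Clear-Confusing": 3, "Exciting-Boring": 2}
scores = {item: score(pos) for item, pos in marks.items()}

print(scores)                                        # {'Attractive-Ugly': 4, ...}
print("mean rating:", round(sum(scores.values()) / len(scores), 2))
```

Handling the scale polarity explicitly (the `positive_on_left` flag) also covers the point, made on the next slide, about deciding whether phrases are all positive, all negative, or mixed.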

13 Developing a questionnaire
- Provide a clear statement of purpose & guarantee participants anonymity.
- Plan questions - if developing a web-based questionnaire, design it off-line first.
- Decide whether phrases will all be positive, all negative, or mixed.
- Pilot test the questions - are they clear, and is there sufficient space for responses?
- Decide how the data will be analyzed & consult a statistician if necessary.

14 Encouraging a good response
- Make sure the purpose of the study is clear.
- Promise anonymity.
- Ensure the questionnaire is well designed.
- Offer a short version for those who do not have time to complete a long questionnaire.
- If mailed, include a stamped addressed envelope.
- Follow up with emails, phone calls or letters.
- Provide an incentive.
- A 40% response rate is high; 20% is often acceptable.

15 Advantages of online questionnaires
- Responses are usually received quickly.
- No copying and postage costs.
- Data can be collected in a database for analysis.
- Time required for data analysis is reduced.
- Errors can be corrected easily.
- Disadvantage: sampling is problematic if the population size is unknown.
- Disadvantage: it is hard to prevent individuals from responding more than once.

16 Problems with online questionnaires
- Sampling is problematic if the population size is unknown.
- It is hard to prevent individuals from responding more than once.

17 Developing a web-based questionnaire
- Produce an error-free interactive electronic version from the original paper-based one.
- Make the questionnaire accessible from all common browsers and readable on different-size monitors and from different network locations.
- Make sure information identifying each respondent is captured and stored confidentially, because the same person may submit several completed surveys.
- User-test the survey with pilot studies before distributing it.
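One hedged way to act on the respondent-identification point is to store only a salted hash of an identifier, so repeat submissions can be spotted without keeping personal details in the clear. The sketch below assumes an email address is collected and that a secret salt is available; both are hypothetical choices, not something the slides prescribe.

```python
# Hypothetical sketch: detect repeat submissions while storing only a salted
# hash of the respondent identifier (e.g. an email address), not the identifier.
import hashlib

SALT = b"survey-2024"          # assumed secret salt, kept out of the responses table
seen_hashes: set[str] = set()

def register_submission(respondent_id: str) -> bool:
    """Return True if this is the respondent's first submission, False otherwise."""
    digest = hashlib.sha256(SALT + respondent_id.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False           # duplicate: same person has already responded
    seen_hashes.add(digest)
    return True

print(register_submission("alice@example.com"))   # True  (first response accepted)
print(register_submission("alice@example.com"))   # False (repeat response flagged)
```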

18 Questionnaire data analysis & presentation
- Present results clearly - tables may help.
- Simple statistics can say a lot, e.g., mean, median, mode, standard deviation.
- Percentages are useful, but give the population size.
- Bar graphs show categorical data well.
- More advanced statistics can be used if needed.
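The simple statistics listed above can be computed with the Python standard library; the ratings in this sketch are made up purely for illustration.

```python
# Illustrative descriptive statistics for a set of 5-point questionnaire ratings.
# The ratings below are made up; only the statistics named on the slide are computed.
import statistics
from collections import Counter

ratings = [4, 5, 3, 4, 2, 4, 5, 3, 4, 4]        # hypothetical responses (n = 10)

print("n      =", len(ratings))
print("mean   =", statistics.mean(ratings))
print("median =", statistics.median(ratings))
print("mode   =", statistics.mode(ratings))
print("stdev  =", round(statistics.stdev(ratings), 2))

# Percentages per category, reported together with the population size.
counts = Counter(ratings)
for value in sorted(counts):
    print(f"rating {value}: {counts[value] / len(ratings):.0%} of {len(ratings)}")
```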

19 Asking experts
- Inspections: experts use their knowledge of users & technology to review software usability.
- Expert critiques (crits) can be formal or informal reports.
- Heuristic evaluation is a review guided by a set of heuristics.
- Walkthroughs involve stepping through a pre-planned scenario, noting potential problems.

20 Heuristic evaluation
- Developed by Jakob Nielsen in the early 1990s.
- Based on heuristics distilled from an empirical analysis of 249 usability problems.
- These heuristics have been revised for current technology, e.g., HOMERUN for the web.
- Heuristics are still needed for mobile devices, wearables, virtual worlds, etc.
- Design guidelines form a basis for developing heuristics.

21 Nielsen's heuristics
- Visibility of system status
- Match between system and real world
- User control and freedom
- Consistency and standards
- Help users recognize, diagnose, and recover from errors
- Error prevention
- Recognition rather than recall
- Flexibility and efficiency of use
- Aesthetic and minimalist design
- Help and documentation

22 Discount evaluation
- Heuristic evaluation is referred to as discount evaluation when 5 evaluators are used.
- Empirical evidence suggests that on average 5 evaluators identify 75-80% of usability problems (see the sketch below).
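The 75-80% figure is usually explained with Nielsen and Landauer's problem-discovery model, in which the proportion of problems found by n evaluators is 1 - (1 - lambda)^n, where lambda is the chance that a single evaluator detects a given problem. The sketch below checks that arithmetic for a few illustrative lambda values; the lambda for any real product is an assumption that varies by project.

```python
# Problem-discovery curve 1 - (1 - lam)**n (Nielsen & Landauer's model).
# The lambda values tried here are illustrative; real values vary by project.

def proportion_found(lam: float, n_evaluators: int) -> float:
    """Expected proportion of usability problems found by n evaluators."""
    return 1 - (1 - lam) ** n_evaluators

for lam in (0.20, 0.25, 0.30):
    found = [round(proportion_found(lam, n), 2) for n in range(1, 8)]
    print(f"lambda={lam:.2f}: {found}")
# With lambda around 0.25-0.30, five evaluators find roughly 75-85% of problems,
# and each additional evaluator adds progressively less - the 'discount' argument.
```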

23 Three stages for doing heuristic evaluation
- Briefing session to tell the experts what to do.
- Evaluation period of 1-2 hours in which each expert works separately, takes one pass to get a feel for the product, and takes a second pass to focus on specific features.
- Debriefing session in which the experts work together to prioritize problems.

24 Advantages and problems
- Few ethical & practical issues to consider.
- It can be difficult & expensive to find experts.
- The best experts have knowledge of the application domain & users.
- Biggest problems: important problems may get missed, and many trivial problems are often identified.

25 Cognitive walkthroughs
- Focus on ease of learning.
- The designer presents an aspect of the design & usage scenarios.
- One or more experts walk through the design prototype with the scenario.
- The experts are told the assumptions about the user population, context of use, and task details.
- The experts are guided by three questions.

26 The 3 questions
- Will the correct action be sufficiently evident to the user?
- Will the user notice that the correct action is available?
- Will the user associate and interpret the response from the action correctly?
As the experts work through the scenario, they note problems.

27 Pluralistic walkthrough
- A variation on the cognitive walkthrough theme.
- Performed by a carefully managed team.
- The panel of experts begins by working separately.
- Then there is a managed discussion that leads to agreed decisions.
- The approach lends itself well to participatory design.

28 Key points
- Structured, unstructured and semi-structured interviews, focus groups & questionnaires.
- Closed questions are easiest to analyze & can be replicated.
- Open questions are richer.
- Formats include check boxes, Likert & semantic scales.
- Expert evaluation: heuristic evaluation & walkthroughs.
- Relatively inexpensive because no users are involved.
- Heuristic evaluation is relatively easy to learn.
- May miss key problems & identify false ones.

29 Testing & modeling users

30 The aims
- Describe how to do user testing.
- Discuss the differences between user testing, usability testing and research experiments.
- Discuss the role of user testing in usability testing.
- Discuss how to design simple experiments.
- Describe GOMS, the keystroke level model and Fitts' law, and discuss when these techniques are useful.
- Describe how to do a keystroke level analysis.

31 Experiments, user testing & usability testing
- Experiments test hypotheses to discover new knowledge by investigating the relationship between two or more things, i.e., variables.
- User testing is applied experimentation in which developers check that the system being developed is usable by the intended user population for their tasks.
- Usability testing uses a combination of techniques, including user testing & user satisfaction questionnaires.

32 User testing is not research
User testing:
- Aim: improve products
- Few participants
- Results inform design
- Not perfectly replicable
- Controlled conditions
- Procedure planned
- Results reported to developers
Research experiments:
- Aim: discover knowledge
- Many participants
- Results validated statistically
- Replicable
- Strongly controlled conditions
- Experimental design
- Scientific paper reports results to the community

33 User testing
- Goals & questions focus on how well users perform tasks with the product.
- Comparison of products or prototypes is common.
- A major part of usability testing.
- The focus is on time to complete a task & the number & type of errors.
- Informed by video & interaction logging.
- User satisfaction questionnaires provide data about users' opinions.

34 Testing conditions
- Usability lab or other controlled space.
- Major emphasis on selecting representative users and developing representative tasks.
- 5-10 users are typically selected.
- Tasks usually last no more than 30 minutes.
- The test conditions should be the same for every participant.
- An informed consent form explains the ethical issues.

35 Type of data (Wilson & Wixon, '97)
- Time to complete a task.
- Time to complete a task after a specified time away from the product.
- Number and type of errors per task.
- Number of errors per unit of time.
- Number of navigations to online help or manuals.
- Number of users making a particular error.
- Number of users completing a task successfully.
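A minimal sketch of how such measures might be logged per task during a session; the record fields and example values are my own illustrative choices, not a format defined in the reference above.

```python
# Hypothetical per-task log record for the measures listed on the slide.
from dataclasses import dataclass, field

@dataclass
class TaskRecord:
    participant: str
    task: str
    seconds_to_complete: float
    errors: list[str] = field(default_factory=list)    # type/description of each error
    help_lookups: int = 0                               # navigations to online help/manuals
    completed: bool = True

records = [
    TaskRecord("P1", "book a flight", 212.0, errors=["wrong date format"], help_lookups=1),
    TaskRecord("P2", "book a flight", 148.5),
]

# Example derived measures: errors per unit of time and completion rate for the task.
total_minutes = sum(r.seconds_to_complete for r in records) / 60
total_errors = sum(len(r.errors) for r in records)
print("errors per minute:", round(total_errors / total_minutes, 2))
print("completion rate:", sum(r.completed for r in records) / len(records))
```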

36 How many participants is enough for user testing?
- The number is largely a practical issue.
- It depends on the schedule for testing, the availability of participants, and the cost of running tests.
- Typically 5-10 participants.
- Some experts argue that testing should continue until no new insights are gained.

37 Experiments
- Predict the relationship between two or more variables.
- The independent variable is manipulated by the researcher.
- The dependent variable depends on the independent variable.
- Typical experimental designs have one or two independent variables.

38 Experimental designs
- Different participants: a single group of participants is allocated randomly to the experimental conditions.
- Same participants: all participants appear in all conditions.
- Matched participants: participants are matched in pairs, e.g., based on expertise or gender.
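As an illustration of the first design, the sketch below randomly allocates a single participant pool to conditions; the participant codes, conditions and group sizes are all assumptions made for the example.

```python
# Hypothetical random allocation for a different-participants design:
# each participant is assigned to exactly one experimental condition.
import random

participants = [f"P{i:02d}" for i in range(1, 13)]   # 12 made-up participant codes
conditions = ["interface A", "interface B"]

random.seed(7)                        # fixed seed so the sketch is reproducible
random.shuffle(participants)

allocation = {cond: participants[i::len(conditions)] for i, cond in enumerate(conditions)}
for cond, group in allocation.items():
    print(cond, group)                # six participants per condition
```

In a same-participants design every participant would instead appear in both groups (with the order counterbalanced), and in a matched design participants would first be paired on relevant characteristics before being split between conditions.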

39 Advantages & disadvantages

40 Predictive models
- Provide a way of evaluating products or designs without directly involving users.
- Psychological models of users are used to test designs.
- Less expensive than user testing.
- Usefulness is limited to systems with predictable tasks, e.g., telephone answering systems, mobiles, etc.
- Based on expert behavior.

41 GOMS (Card et al., 1983)
- Goals: the state the user wants to achieve, e.g., find a website.
- Operators: the cognitive processes & physical actions performed to attain those goals, e.g., decide which search engine to use.
- Methods: the procedures for accomplishing the goals, e.g., drag the mouse over the field, type in keywords, press the go button.
- Selection rules: determine which method to select when there is more than one available.
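One informal way to picture the four components is as nested data for the 'find a website' example above; the encoding below is illustrative only and is not a canonical GOMS notation.

```python
# Illustrative (non-canonical) encoding of a small GOMS model for "find a website".
goms_model = {
    "goal": "find a website",
    "operators": ["decide which search engine to use", "type", "click", "read results"],
    "methods": {
        "use search engine": ["move cursor to search field", "type in keywords",
                              "press the go button"],
        "type URL directly": ["move cursor to address bar", "type the URL",
                              "press return"],
    },
    # Selection rule: choose between competing methods.
    "selection_rule": lambda url_known: "type URL directly" if url_known
                                        else "use search engine",
}

print(goms_model["selection_rule"](url_known=False))   # -> "use search engine"
```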

42 Benefits and limitations of GOMS
- Helps make decisions about the effectiveness of new products.
- Allows comparative analyses to be performed for different interfaces.
- It is difficult or impossible to predict how an average user will carry out their tasks.

43 Keystroke level model GOMS has also been developed further into a quantitative model - the keystroke level model. This model allows predictions to be made about how long it takes an expert user to perform a task.

44 Response times for keystroke-level operators (table of operator times not reproduced in this transcript)
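Since the operator-time table itself did not survive transcription, the sketch below substitutes the widely published Card, Moran and Newell values (roughly K = 0.2 s per keystroke for a skilled typist, P = 1.1 s to point with a mouse, B = 0.1 s per mouse button press or release, H = 0.4 s to move a hand between keyboard and mouse, M = 1.35 s of mental preparation). Treat these numbers as assumptions standing in for the missing table, not as the slide's own figures.

```python
# Keystroke-level prediction sketch using commonly published Card, Moran &
# Newell operator times (seconds). These values are an assumption standing in
# for the untranscribed table above.
OPERATOR_TIME = {
    "K": 0.20,   # press a key or button (skilled typist)
    "P": 1.10,   # point with a mouse at a target on screen
    "B": 0.10,   # press or release a mouse button
    "H": 0.40,   # home hands between keyboard and mouse
    "M": 1.35,   # mental preparation
}

def predict(sequence: str) -> float:
    """Sum operator times for a string of KLM operators, e.g. 'MHPBB'."""
    return sum(OPERATOR_TIME[op] for op in sequence)

# Example task: think, move hand to mouse, point at a search field, click,
# move hand back to keyboard, type a 6-letter keyword, think, point at 'go', click.
sequence = "M" + "H" + "P" + "BB" + "H" + "K" * 6 + "M" + "H" + "P" + "BB"
print(f"predicted expert time: {predict(sequence):.2f} s")
```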

45 Fitts' Law (Paul Fitts, 1954)
- The law predicts that the time to point at an object using a device is a function of the distance from the target object & the object's size.
- The further away & the smaller the object, the longer the time to locate it and point at it.
- Useful for evaluating systems for which the time to locate an object is important, such as handheld devices like mobile phones.

46 Fitts' Law
T = k log2(D/S + 0.5), with k ~ 100 ms
- T: time to move the hand to a target
- D: distance between the hand and the target
- S: size of the target
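A quick numeric check of the formula, assuming the base-2 logarithm of the Welford formulation that the 100 ms constant belongs to; the target distances and sizes are made up, and any consistent unit works since only the ratio D/S matters.

```python
# Fitts' law sketch: T = k * log2(D/S + 0.5) with k ~ 100 ms.
# The target distances and sizes below are made-up illustrations.
import math

K_MS = 100.0   # empirical constant from the slide, in milliseconds

def pointing_time_ms(distance: float, size: float) -> float:
    """Predicted time to point at a target of width `size` at `distance` away."""
    return K_MS * math.log2(distance / size + 0.5)

for distance, size in [(80, 20), (160, 20), (160, 5)]:
    print(f"D={distance:4}, S={size:3} -> {pointing_time_ms(distance, size):6.1f} ms")
# Farther and/or smaller targets take longer - the relationship the slide describes.
```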

47 Key points
- User testing is a central part of usability testing.
- Testing is done in controlled conditions.
- User testing is an adapted form of experimentation.
- Experiments aim to test hypotheses by manipulating certain variables while keeping others constant.
- The experimenter controls the independent variable(s) but not the dependent variable(s).
- There are three types of experimental design: different-participants, same-participants, & matched-participants.
- GOMS, the keystroke level model, & Fitts' law predict expert, error-free performance.
- Predictive models are used to evaluate systems with predictable tasks, such as telephones.