
1 Usability Assessment, Evaluation and Testing Laura and Julie

2 Readings and Reference
Chapter 12 of the protobook

3 Usability Assessment, Evaluation and Testing
Alongside task analysis, the assessment of usability is the most common form of human factors activity. Recall that when we discussed the Usability Engineering lifecycle, we noted that the decision to move forward or backward through the lifecycle is typically based on an assessment of the prototype. In Usability Engineering, this assessment is often made on the basis of usability.
Without evaluation and feedback, the only source of information about usability is the developer.
– The developer's viewpoint may be biased, skewed, or simply incomplete due to a lack of domain knowledge.

4 Why Evaluate?
To demonstrate a weakness or strength of a design feature during the design process
To evaluate the adequacy of an overall design or product, or of particular design features
Because guidelines, principles, etc. do not always apply

5 Why Evaluate?
Because guidelines, principles, etc. are not always persuasive
Because designers require feedback
Because all people (including designers) make mistakes
– Evaluation and assessment of designs and completed products can be viewed as a form of proofreading

6 Guiding Principles for Evaluation
Identify quantifiable goals and characteristics of the user interface that can be used as a criterion for "good enough"
Set these up as early as possible in the development cycle. We called these "usability requirements" when we discussed the specification of user interfaces.
Measure at several points during the project

7 Eason (1984): Usability is a function of...
System characteristics
Task characteristics
User characteristics

8 Steps in Evaluation
Determine what aspect of usability to evaluate
– What is the acceptable level of this attribute?
Determine how you will measure each usability characteristic
– Will your determination and measure be objective or subjective?

9 Types of Evaluations
Predictive Model
– The developer builds a design (or competing designs) and identifies prototypic tasks. Based on human performance data, the developer can estimate the time to complete each task on the design or designs. GOMS analysis is an example; a keystroke-level sketch follows this list.
Heuristic Evaluation by Experts
– Using guidelines, experts evaluate the design for usability characteristics.
Usability Testing
– Collect data from users working with prototype designs.
Analysis Tools
– Automated tools can analyze designs for conformance to guidelines, especially guidelines relating to accessibility. These tools may also be used during development to make suggestions.
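To make the predictive-model idea concrete, here is a minimal sketch of a Keystroke-Level Model (KLM) estimate, the simplest member of the GOMS family. The operator times are the classic published averages; the two task breakdowns and their names are hypothetical, invented purely for illustration.

```python
# Keystroke-Level Model (KLM) sketch: predict task time by summing
# standard operator times. The task breakdowns below are hypothetical.

KLM_TIMES = {
    "K": 0.2,   # press a key (average skilled typist)
    "P": 1.1,   # point at a target with a mouse
    "B": 0.1,   # press or release a mouse button
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation for an action
}

def estimate_seconds(operators):
    """Predicted completion time for a sequence of KLM operators."""
    return sum(KLM_TIMES[op] for op in operators)

# Hypothetical design A: delete a booking through menus (point and click twice).
design_a = ["M", "P", "B", "M", "P", "B"]
# Hypothetical design B: the same task through a keyboard shortcut.
design_b = ["M", "H", "K", "K"]

print(f"Design A: {estimate_seconds(design_a):.2f} s")   # 5.10 s
print(f"Design B: {estimate_seconds(design_b):.2f} s")   # 2.15 s
```

The appeal of such a model is that comparing competing designs costs nothing but analysis time; no users are needed until the design space has been narrowed.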

10 Process of Evaluation
Just like software or usability development, evaluation should go through a development process. The steps:
– Determine whether evaluation is feasible. What is the context?
– Define the problem to solve.
– Design an evaluation solution.
– Implement your solution: do the evaluation.
– Assess your evaluation. Did you achieve the goals that you had set out for yourself in the evaluation?
Just as in software or usability engineering, the result of each stage of this process should be fully documented or specified.

11 Approaches to Evaluation
Formative Evaluation
Summative Evaluation

12 Formative Evaluation
Occurs in order to help designers refine and form their designs.
The focus in formative evaluation is on identifying problems and potential solutions.

13 Formative Evaluation - How To?
Do it as early as possible
– the 10% rule
Do it several times
The Maner overhead gives an example of a lifecycle model for formative evaluation.

14 Benefits of Formative Evaluation
Improves analysis and understanding of requirements for the eventual user interface
Allows modification of the interaction and interface design
Tests the interaction and interface design

15 Summative Evaluation
Concerned with summarizing the overall impact and effectiveness of a system.
It is judged against quantifiable goals or characteristics that can be used as a criterion for "good enough".

16 Summative Evaluation Objectives
Determine what the acceptable levels are for each of the dimensions.
Determine how to measure each usability characteristic.
If the specification for the project included "usability requirements", these can often be used in summative evaluation.

17 Usability Testing
Usability testing is our focus - this is what we expect you to do for your project. For your project, we assume that usability testing is feasible! Essentially, what you will have to decide is:
– What usability attributes do you wish to test? (problem definition)
– How will you operationalize the attributes? How will you measure the attributes that you have selected? (design of evaluation) In other words, if you are interested in "ease of learning", how will you measure it? (One possible operationalization is sketched after this slide.)
– What is your testing protocol? (design of evaluation) In other words, what standard tasks will subjects do while you are collecting your data?
– Was your evaluation reasonable? What do your results mean? (evaluation and interpretation of evaluation) How can you improve your design to accommodate the data that you collected?
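As a concrete illustration of operationalizing an attribute, here is a minimal sketch that treats "ease of learning" as the improvement in completion time over repeated trials of the same benchmark task. The trial times are hypothetical, and this is only one of several defensible operational definitions.

```python
# Sketch: operationalizing "ease of learning" as improvement across
# repeated trials of one benchmark task. Times are hypothetical.
from statistics import mean

trial_times = [210.0, 160.0, 130.0, 118.0, 112.0]  # seconds per trial

initial = trial_times[0]
settled = mean(trial_times[-2:])        # average of the last two trials
improvement = (initial - settled) / initial

print(f"Initial trial: {initial:.0f} s, settled performance: {settled:.0f} s")
print(f"Improvement: {improvement:.0%}")
# A steep early drop suggests the interface is learnable even if it is
# not immediately obvious; little or no drop suggests either immediate
# ease of use or a persistent difficulty - the numbers alone cannot say which.
```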

18 Detail - Developing Your Usability Tests
Define the aspects of your design that you wish to evaluate.
Determine the usability attributes that you wish to measure.
Identify measurements for your attributes.
Identify what subjects will do to produce data.
Refine the tasks that you will use as benchmark tasks.
Determine procedures. Write a script of your interaction with the users. Write a script for the users of the steps that they will follow.
Pilot test your evaluation and revise.
Do it! Collect data.
Analyze the data - compare it to the expected levels (a comparison sketch follows this list).
Make recommendations based on the severity of the problems you identify.
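The "compare to expected levels" step can be as simple as checking summary statistics against the usability requirements from your specification. Below is a minimal sketch; the thresholds and observations are hypothetical.

```python
# Sketch: compare collected benchmark data to expected (acceptance) levels.
# Requirements and observations are hypothetical.
from statistics import mean

REQUIREMENTS = {
    "mean_task_time_s": 120.0,    # mean completion time must not exceed this
    "max_errors_per_task": 1.0,   # mean errors per task must not exceed this
}

# Hypothetical observations: (completion time in seconds, error count).
observations = [(95, 0), (140, 1), (110, 0), (130, 2), (100, 0)]

mean_time = mean(t for t, _ in observations)
errors_per_task = sum(e for _, e in observations) / len(observations)

print(f"Mean task time: {mean_time:.1f} s "
      f"(pass: {mean_time <= REQUIREMENTS['mean_task_time_s']})")
print(f"Errors per task: {errors_per_task:.2f} "
      f"(pass: {errors_per_task <= REQUIREMENTS['max_errors_per_task']})")
```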

19 Evaluation for Our Project
Identify the Eason System Characteristic(s) that you wish to evaluate.
Operationalize that characteristic into a specific description of how you will measure it.
Justify why your measurement is appropriate for your system characteristic(s). You are arguing for "external validity": that your operational definition of your system characteristic is reasonable for the real world.
Identify a benchmark task or tasks for your system. Once again, you should justify why your tasks are characteristic of the tasks that the user interface will support. A good choice of tasks can enhance the external validity of your assessment.
Type out the steps of your assessment protocol. The protocol should be exactly the same for all participants, no matter who administers it.
Execute your evaluation.
Evaluate your results. Even if your project interface is, overall, usable for your Eason system characteristic(s), it is likely that you will find some aspects of your interface that could be improved.

20 We Need More Details About...
Usability attributes
Benchmark tasks
Types of measurements
Protocols and ideas for data collection

21 Commonly-Used Types of Usability Attributes (Eason System Characteristics)
Ease of Learning - typically suggested by
– initial performance
– first impression
Ease of Use - typically suggested by
– long-term performance
– long-term user satisfaction
Task Match - suggested by
– feature use in the interface
– the degree to which use of the interface matches the user's mental model of the procedure to accomplish the task

22 Benchmark Tasks
Develop a set of task scenarios which capture the critical characteristics of the tasks likely to be performed with the system.
These scenarios are usually descriptions of real-world tasks that a user can be expected to understand, but a scenario does not describe how the task is done in this particular system.
You may have such scenarios from your initial requirements analysis.
Hix and Hartson (1993) call these "scripts".

23 Example Script
"You are a system administrator for a software system which schedules and allocates resources such as company cars, meeting rooms, etc. Unfortunately, one of the meeting rooms has been unexpectedly scheduled for refurbishment, which will take two months, beginning in July. Your task is to notify those people who have booked the room for July/August and to provide alternative resources."

24 Types of Measurements
Objective observation
– Example: user performance while carrying out a "benchmark" task
– Note: the subject cannot impose an opinion.
Subjective feedback
– Example: user opinions, e.g., "which menu style do you prefer?"
– Note: may be biased if the subject has another "agenda", such as impressing the experimenter.

25 Types of Measurements
Qualitative (non-numeric data and results)
– Examples: lists of problems, verbal protocols
Quantitative (numeric data)
– Examples: time to complete a benchmark task, number of errors

26 Summary: Types of Measurements From Users

               | objective                        | subjective
quantitative   | time, number of errors           | responses on a questionnaire
qualitative    | observation on a benchmark task  | feedback from focus groups

27 How Do Evaluations Differ?
Data that is quantitative and objective lends itself to a study of whether the design meets an acceptance criterion.
Data that is qualitative may lend support for the way that the design matches "big" or "abstract" goals.
– It may aid in explaining the outcome of quantitative studies.

28 Usability Measures
Time
Errors
Verbal protocols
Visual protocols
Eye movements
Actual patterns of use
Dribble files
Attitudes
Customer support activity

29 Time
A classic measurement for psychologists, and within HCI.
Easy to measure, easily understood, and easy to analyze statistically (a small analysis sketch follows).
We usually measure the time to perform a task. This style of measurement is quantitative and objective.
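Because completion times are plain numbers, standard descriptive statistics apply directly. Here is a minimal sketch; the times are hypothetical, and with samples this small a t-based confidence interval would be more defensible than the normal approximation used here.

```python
# Sketch: summarizing task-completion times. Times are hypothetical.
from statistics import mean, stdev

task_times = [95.2, 140.8, 110.0, 130.5, 99.7, 118.3]  # seconds

m = mean(task_times)
s = stdev(task_times)
# Rough 95% confidence interval for the mean (normal approximation;
# a t-distribution would be more appropriate for n this small).
half_width = 1.96 * s / len(task_times) ** 0.5

print(f"n = {len(task_times)}, mean = {m:.1f} s, sd = {s:.1f} s")
print(f"Approximate 95% CI: {m - half_width:.1f} .. {m + half_width:.1f} s")
```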

30 Time (2)
A problem with time measures is that they are not easily compared unless the tasks, etc., stay constant.
Time usually contributes best to summative evaluation.

31 Errors
Another very popular metric is the number and type of errors.
This measure can be both qualitative (type of error) and quantitative (count of errors), contributing to both summative and formative evaluations.

32 Errors (2)
Errors are actually very hard to define, however - and especially hard to count.
Also, the word "error" has very negative connotations, which may not be helpful in user testing.

33 Errors (3)
We can distinguish many types of errors.

34 Verbal Protocols
Encourage users to talk out loud through their actions, intentions, thoughts, etc.
The data is qualitative but can be focused on either subjective or objective feedback.
Protocols can be collected concurrently or retrospectively. Concurrent protocols are hard for users, but tend to be more reliable.

35 Verbal Protocols (2)
Retrospective protocols are easier for subjects, but may lead to rationalizations of actions now perceived to be incorrect.
You may be able to collect concurrent protocols more easily by using two users working together on a task - this gives natural dialogue (if dialogue occurs).

36 Visual Protocols
Taking video of users (often with multiple cameras - perhaps direct from the monitor).
Gives very rich information, but knowledge of the video may make users cautious and may lead to unnatural behavior.
The data is typically qualitative and can be either objective or subjective.

37 Eye Movements
Rarely collected, but can be a very rich source of information.
Especially useful in trying to understand why a user fails to notice important information.
Calibration and interpretation are very difficult. The equipment is expensive. Some users experience vertigo.
Forms:
– head-mounted eye tracker
– the user's head in a fixed position, so that all that moves is the eyes

38 Field Studies
Data is collected about actual patterns of use.
Rather than looking at unit or benchmark tasks (in a lab setting), we can place prototypes in actual work settings and observe actual patterns of use.

39 Dribble Files
These are files that record every action taken while using a system.
They can produce excessive quantities of data, which is hard to analyze (a small parsing sketch follows).
They give a record of errors, error recovery, and patterns of use.
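Analyzing a dribble file is mostly a matter of counting and pattern-matching over the event stream. Here is a minimal sketch; the log format and the event names (error_*, undo) are hypothetical, since every system defines its own.

```python
# Sketch: mining a dribble file (action-by-action log) for error counts
# and recovery patterns. Log format and event names are hypothetical.
from collections import Counter

dribble = """\
0.0 open_dialog
1.2 click_delete
1.9 error_no_selection
3.1 undo
4.0 select_item
5.2 click_delete
6.0 confirm
"""

actions = [line.split()[1] for line in dribble.splitlines()]
counts = Counter(actions)

errors = sum(n for name, n in counts.items() if name.startswith("error_"))

print(f"Total actions: {len(actions)}")
print(f"Errors logged: {errors}, recoveries via undo: {counts['undo']}")
print("Most common actions:", counts.most_common(3))
```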

40 Attitudes Toward the System
Questionnaires and interviews can be used to assess attitudes towards a new piece of technology.
Tools for measuring attitudes are not easily constructed, but attitude is an important measure nonetheless.

41 Subjective Evaluation of System Features
A survey of reactions to interface features.
The challenge is to get the questionnaire to match the task.
QUIS (the Questionnaire for User Interaction Satisfaction) is an example; a scoring sketch follows.
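Scoring such a questionnaire usually reduces to averaging ratings per item or per section. The sketch below assumes QUIS-style 1-9 rating scales; the item names and responses are hypothetical stand-ins, not actual QUIS content.

```python
# Sketch: scoring a QUIS-style satisfaction questionnaire.
# Items, responses, and the 1..9 scale grouping are hypothetical.
from statistics import mean

# participant -> {item: rating on a 1..9 scale}
responses = {
    "P1": {"screen_layout": 7, "terminology": 6, "learning": 8},
    "P2": {"screen_layout": 5, "terminology": 7, "learning": 6},
    "P3": {"screen_layout": 8, "terminology": 8, "learning": 9},
}

for item in sorted(next(iter(responses.values()))):
    ratings = [r[item] for r in responses.values()]
    print(f"{item}: mean = {mean(ratings):.1f} (n = {len(ratings)})")
```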

42 Customer Support Activity
If you are evaluating a real, marketed system, then you can measure activity in the customer/technical support services.
This data is usually politically sensitive, but it can be very valuable.

43 Building Your Protocol
Specify what/how you will solicit your participants.
Obtain their informed consent to the evaluation.
Provide training if necessary. Perhaps your assessment occurs during training.
Perform your evaluation.
Debrief your participants.

44 What to Evaluate?
Pencil-and-paper/storyboard prototypes
Prototyped or demonstration systems
Real systems

45 Usability Testing Exercise
Suppose that the ABC company is going to implement company-wide email. ABC has never had company-wide email before and is considering several mail readers.
– Develop a rational usability testing strategy and materials to apply to the candidate mail readers.
Would your tests be summative or formative? What types of measurements and usability assessments would you use?
Write a script for the data collector and for the users in your test.
Consult your term project materials for ideas of what works and what does not work.
