cs414: Empirical User Studies. Karrie Karahalios, Eric Gilbert. 6 April 2007. Some slides courtesy of Brian Bailey and John Hart.

Messages: Conduct a user study to gain a more precise measure of the usability of an interface or system. A user study complements low-fidelity techniques but requires a larger investment than low-fi prototyping. Provide a positive experience for users!

In Context of Task-Centered UI Design

Empirical User Studies: Measure performance, error rate, learnability and retention, satisfaction, tolerable network delay, and so on; adapt the measures to your particular interface and context. Compare results to usability goals. Identify usability issues and resolve them.

Overview of Doing Empirical User Studies: develop materials; prepare for the study; conduct the study; analyze results and iterate; learn from the experience.

Prepare for the Study: identify usability goals; develop experimental tasks and design; recruit users; instrument software/hardware.

Identify Usability Goals: Identify the questions you want answered; questions should be specific and measurable. Examples: Can a user perform each task in < 30 s? After only five minutes of instruction, can a user perform each task with < 2 errors? Are users rating the interface at least a ‘3’ for overall satisfaction on a 5-point scale?
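A minimal sketch of how goals like the three examples above could be checked against logged measurements. The `check_goals` helper, its parameter names, and the sample numbers are hypothetical, and reading "at least a 3" as a mean rating is an assumption rather than something the slide states.

```python
from statistics import mean

def check_goals(task_times_s, error_counts, satisfaction_ratings):
    """Compare measurements against the three example usability goals."""
    return {
        # Goal 1: every task completed in under 30 seconds.
        "time_under_30s": all(t < 30 for t in task_times_s),
        # Goal 2: fewer than 2 errors on every task.
        "fewer_than_2_errors": all(e < 2 for e in error_counts),
        # Goal 3 (assumed reading): mean rating of at least 3 on a 5-point scale.
        "satisfaction_at_least_3": mean(satisfaction_ratings) >= 3,
    }

# Hypothetical measurements, for illustration only.
results = check_goals(
    task_times_s=[12.4, 28.9, 31.2],
    error_counts=[0, 1, 1],
    satisfaction_ratings=[4, 3, 5],
)
print(results)
```

Writing each goal as an explicit pass/fail check makes it easy to see which goals a design iteration still misses.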

Develop Experimental Design: The structure of the experiment covers what users will do, in what order, where, etc. Between groups: users are randomly assigned to treatment groups (a control group and an experimental group). Within groups: each user performs under all conditions, with the order randomized; this is cheaper because it uses fewer participants.
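The two designs above can be sketched as follows. The function names, participant IDs, and condition labels are hypothetical; the sketch uses simple shuffling, which is one way to randomize among several.

```python
import random

def assign_between_groups(participants, conditions, seed=None):
    """Between groups: shuffle, then deal participants into groups round-robin."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, p in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(p)
    return groups

def within_group_orders(participants, conditions, seed=None):
    """Within groups: every participant gets all conditions in a shuffled order."""
    rng = random.Random(seed)
    orders = {}
    for p in participants:
        order = list(conditions)
        rng.shuffle(order)
        orders[p] = order
    return orders

# Hypothetical participant IDs and condition labels.
people = [f"P{i}" for i in range(1, 9)]
print(assign_between_groups(people, ["control", "experimental"], seed=1))
print(within_group_orders(people, ["dynamic", "static", "radial"], seed=1))
```

Passing a seed makes the assignment reproducible, which helps when the assignment has to be documented alongside the results.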

Experimental Variables: What gets changed and what is its effect? Independent variables are the variables you manipulate, e.g., number of menu items, lighting conditions, mouse vs. keys. Dependent variables are the measured part, e.g., speed of menu choice, reaction time to stimuli. Variable type matters: discrete or continuous.

Recruit Users: Typically want about 8 to 12 users; the number depends on the desired confidence in the results. 12 is the magic number for the ANOVA test (more later). This could be the most challenging aspect of the study: expect about a 0.1% to 10% response rate. You may need IRB approval, especially if you want to publish. Give users a compelling reason to participate.

Demographic Diversity: It is important to target your user population. Example: if you are developing for Firefox, make sure that you use people already familiar with Firefox. Beyond that, it is also important to recruit a diversity of users: age, sex, education, occupation... This can tell you important things about your system and help you generalize.

Instrument Software/Hardware: Log performance and errors (if possible). Determine media capture needs: ensure that you have access to equipment and manage the physical layout of the testing space. Anything else that you need?

Conduct the Study: Give the user an overview of the study. Introduce your system and allow for practice. Have users work through the tasks. Collect experimental measures (e.g., performance and error data). Have users fill out the questionnaire, if any. Debrief the user. The entire session should last less than 60 minutes.

Tell the User At Least: the purpose of the study, but not necessarily details of what you are testing; what they will be doing (the tasks); that they are not being tested, the interface/system is; that they can quit at any time and that quitting will not affect their relationship with you, the university, the company, etc.; about the equipment in the room; whether their face and/or actions will be recorded; how to think aloud (if you are collecting verbal data); whether you will be available to answer questions; that their data will be viewed only in aggregate form; how long the session will take.

Make Users Feel Comfortable: Offer breaks at boundary points. Offer to send results in aggregate form or allow users to see the improved interface. Develop understandable instructions. Do not “defend” your interface. Do not make subjective comments about users, ease or difficulty of tasks, etc.

Analyze Results and Iterate: Analyze data using statistical methods (ANOVAs and chi-squared tests are common); take a stats course, e.g., Stat 320, for more detail. Did you meet the goals? How far from the goals are you?

t-tests and ANOVAs: t-tests compare two random samples and determine whether the samples are statistically significantly different, e.g., are dynamic menus better than static menus? ANOVAs (analysis of variance) compare n random samples and determine whether the samples are statistically significantly different, e.g., which is best: dynamic, static, or radial menus? Both assume the samples come from normal distributions, and both produce p-values.
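To make the ANOVA comparison concrete, here is a minimal sketch that computes the one-way ANOVA F statistic by hand using only the standard library. The `one_way_anova_f` helper and the menu timing data are hypothetical. Turning F into a p-value needs the F distribution, which a statistics package would supply and which is left out here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical selection times in seconds for three menu styles.
dynamic = [1.0, 2.0, 3.0]
static = [2.0, 3.0, 4.0]
radial = [3.0, 4.0, 5.0]
print(one_way_anova_f([dynamic, static, radial]))  # (3.0, 2, 6)
```

With exactly two groups, this F equals the square of the equal-variance t statistic, which is why the t-test can be viewed as the two-sample special case.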

Normal Distributions: Bell curve, $y = e^{-x^2}$. Arises from a sum of independent events, e.g., the sum of dice rolls; total time = t-find + t-home + t-click; total number of errors.
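The dice example can be worked out exactly. The sketch below builds the probability distribution of the sum of several fair dice by repeated convolution; the `dice_sum_distribution` helper is hypothetical. Even with three dice the text histogram already shows the bell shape.

```python
from collections import Counter
from fractions import Fraction

def dice_sum_distribution(num_dice, sides=6):
    """Exact distribution of the sum of num_dice fair dice."""
    dist = {0: Fraction(1)}
    for _ in range(num_dice):
        new = Counter()
        # Add one more die: spread each running total over all faces.
        for total, prob in dist.items():
            for face in range(1, sides + 1):
                new[total + face] += prob / sides
        dist = dict(new)
    return dist

dist = dice_sum_distribution(3)
for total in sorted(dist):
    # One '#' per outcome out of the 216 equally likely rolls.
    print(f"{total:2d} {'#' * int(dist[total] * 216)}")
```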

p-values: Short for probability value. The probability of observing a difference at least as large as the one in your experiment if random chance alone were at work. An expression of the confidence of your result. Typically, a difference is called statistically significant when p < 0.05.
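One way to make the idea of a p-value concrete is an exact permutation test. Note that this is a different technique from the t-tests and ANOVAs named in these slides, used here only as an illustration. It asks how often a difference at least as large as the observed one appears when the group labels are reshuffled. The helper name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p_value(sample_a, sample_b):
    """Two-sided p-value for the difference in means over all relabelings."""
    pooled = list(sample_a) + list(sample_b)
    observed = abs(mean(sample_a) - mean(sample_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(sample_a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical task times: only 2 of the 20 relabelings are this extreme.
print(exact_permutation_p_value([1, 2, 3], [4, 5, 6]))  # 0.1
```

Enumerating every relabeling is only feasible for small samples; the number of splits grows quickly with sample size.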

Partial eta-squared: Some ANOVAs produce partial eta-squared values in addition to p-values. They are becoming widespread in HCI literature, and you may soon see them in a usability report. Partial eta-squared values measure effect size, offering a practical measure of significance.
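As a sketch of what the number means: partial eta-squared is the effect's sum of squares divided by the effect plus error sums of squares, and it can equivalently be recovered from a reported F value and its degrees of freedom. The function names and input values below are hypothetical.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of (effect + error) variation attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)

def partial_eta_squared_from_f(f_stat, df_effect, df_error):
    """Same quantity, recovered from a reported F and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)

# Hypothetical ANOVA: SS_effect = 6, SS_error = 6, so F(2, 6) = 3.0.
print(partial_eta_squared(6.0, 6.0))            # 0.5
print(partial_eta_squared_from_f(3.0, 2, 6))    # 0.5
```

The second form is handy when a report gives only F and the degrees of freedom, as many papers do.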

Advantages of Empirical User Studies: Measure performance (time, error rate). Measure user satisfaction. Give a realistic experience of the interface: realistic system response; move among tasks seamlessly; the user, not the designer, is in control. Focus will be on the details; most big issues should already be resolved.

Disadvantages of Empirical User Studies: Users typically must come to the lab, which makes it more difficult to recruit them, and users may have anxiety. Large setup effort is involved: software instrumentation, hardware setup, questionnaire design, IRB approval, etc. The prototype may crash.

An Example of How This Gets Used in Practice: “The Impact of Delayed Visual Feedback on Collaborative Performance” by Darren Gergle, presented at CHI 06. What is the relationship between delayed visual feedback and collaboration? How much network delay can be tolerated, e.g., in architectural planning, telesurgery, and remote repair?

The Collaborative Puzzle Task: The experimental task was for a helper to guide a worker through a visual puzzle over a network connection.

Independent Variables: Only one, the visual delay in the helper’s view window. Delays were sampled from this distribution over the range 60 to 3300 ms: $T_n = T_{n-1} \cdot e^{0.05}$, with $T_1 = 60$.
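The recurrence has the closed form $T_n = 60 \cdot e^{0.05(n-1)}$, so each delay level is about 5.1% longer than the previous one. A minimal sketch that lists the levels follows; the `delay_levels` helper and the rule of stopping at the first value above 3300 ms are assumptions, since the slide gives only the range.

```python
import math

def delay_levels(first_ms=60.0, growth=0.05, max_ms=3300.0):
    """Delay levels T_n = first_ms * e^(growth * (n - 1)), up to max_ms."""
    levels = []
    n = 1
    while True:
        t = first_ms * math.exp(growth * (n - 1))
        if t > max_ms:  # assumed stopping rule
            break
        levels.append(t)
        n += 1
    return levels

levels = delay_levels()
print(len(levels))                # number of levels under the assumed rule
print(round(levels[55]), "ms")    # T_56
print(round(levels[68]), "ms")    # T_69
```

Under this reading, the 939 ms and 1798 ms boundaries quoted in the analysis fall on the sequence, at $T_{56}$ and $T_{69}$.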

Dependent Variables: Only one, task performance time. Participants were asked to perform the puzzle task as quickly and accurately as possible.

Quantitative Analysis Using ANOVA: “For delays between 60ms and 939ms, we found no evidence to indicate any impact of delayed visual feedback on task performance (SE = (2.87), $F_{1,610} = .028$, $p = .87$).” Here p > 0.05, so the samples are not significantly different. “However, for delay rates between 939ms and 1798ms there is a significant impact on task performance ($F_{1,610} = 13.57$, $p < .001$).” Since p < 0.001, this result is highly significant.

Graph of Delay vs. Performance
