CS414 Empirical User Studies
Karrie Karahalios, Eric Gilbert
6 April 2007
Some slides courtesy of Brian Bailey and John Hart

Messages
- Conduct a user study to gain a more precise measure of the usability of an interface or system
- Complements low-fidelity techniques
- Requires a larger investment than low-fi prototyping
- Provide a positive experience for users!

In Context of Task-Centered UI Design

Empirical User Studies
- Measure performance, error rate, learnability and retention, satisfaction, tolerable network delay… adapt to your particular interface and context
- Compare results to usability goals
- Identify usability issues and resolve them

Overview of Doing Empirical User Studies
- Develop materials
- Prepare for the study
- Conduct the study
- Analyze results and iterate
- Learn from the experience

Prepare for the Study
- Identify usability goals
- Develop experimental tasks and design
- Recruit users
- Instrument software/hardware

Identify Usability Goals
- Identify questions you want answered; questions should be specific and measurable
- Examples:
  - Can a user perform each task in < 30 s?
  - After only five minutes of instruction, can a user perform each task with < 2 errors?
  - Do users rate the interface at least a '3' for overall satisfaction on a 5-point scale?
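Specific, measurable goals like these can be written down as explicit checks on the data a study produces. A minimal sketch, where the task times, error counts, and satisfaction ratings are invented purely for illustration:

```python
# Hypothetical measurements from one pilot session (illustrative data only).
task_times_s = [12.4, 28.1, 19.7, 25.0]   # seconds per task
task_errors = [0, 1, 0, 2]                # errors per task
satisfaction = [4, 3, 5, 3, 4]            # 5-point scale, one rating per user

# Encode each usability goal as a boolean check against the data.
goals = {
    "every task completed in < 30 s": all(t < 30 for t in task_times_s),
    "every task completed with < 2 errors": all(e < 2 for e in task_errors),
    "mean satisfaction >= 3": sum(satisfaction) / len(satisfaction) >= 3,
}

for goal, met in goals.items():
    print(f"{'PASS' if met else 'FAIL'}: {goal}")
```

Writing goals this way makes "did you meet the goals?" a mechanical question at analysis time rather than a judgment call.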

Develop Experimental Design
- Structure of the experiment: what will users do, in what order, where, etc.
- Between groups: users are randomly assigned to treatment groups
  - Control group
  - Experimental group
- Within groups: each user performs under all conditions
  - Order randomized
  - Cheaper because it uses fewer participants
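For a within-groups design, randomizing the condition order per participant guards against learning and fatigue effects biasing one condition. A minimal sketch, where the condition names are invented for illustration:

```python
import random

# Hypothetical conditions for a within-groups menu study.
conditions = ["static menu", "dynamic menu", "radial menu"]

def condition_order(participant_id: int) -> list:
    """Return an independently shuffled condition order for one participant.

    Seeding by participant ID makes the assignment reproducible, so the
    same order can be regenerated at analysis time.
    """
    rng = random.Random(participant_id)
    order = conditions.copy()
    rng.shuffle(order)
    return order

for pid in range(1, 4):
    print(f"participant {pid}: {condition_order(pid)}")
```

With more conditions or strong order effects, a counterbalanced design such as a Latin square is often used instead of pure randomization.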

Experimental Variables
- What gets changed, and what is its effect?
- Independent variables: the variables you manipulate
  - e.g., # of menu items, lighting conditions, mouse vs. keys
- Dependent variables: the measured part
  - e.g., speed of menu choice, reaction time to stimuli
- Variable type matters: discrete vs. continuous

Recruit Users
- Typically want about 8-12 users; depends on desired confidence in the results
- 12 is the magic number for the ANOVA test (more later)
- This could be the most challenging aspect of the study
  - Expect about a 0.1% to 10% response rate
  - May need IRB approval, especially if you want to publish
- Give users a compelling reason to participate

Demographic Diversity
- It is important to target your user population
  - Example: if you are developing for Firefox, make sure that you use people already familiar with Firefox
- Beyond that, it is also important to gain a diversity of different types of users: age, sex, education, occupation, …
- That diversity can tell you important things about your system, and help you generalize

Instrument Software/Hardware
- Log performance and errors (if possible)
- Determine media capture needs
  - Ensure that you have access to equipment
  - Manage the physical layout of the testing space
- Anything else that you need?

Conduct the Study
- Give the user an overview of the study
- Introduce your system, allow for practice
- Have users work through the tasks
- Collect experimental measures (e.g., performance and error data)
- Fill out the questionnaire, if any
- Debrief the user
- The entire session should last less than 60 minutes

Tell the User At Least:
- The purpose of the study, but not necessarily the details of what you are testing
- What they will be doing (the tasks)
- That they are not being tested; the interface/system is
- That they can quit at any time, and that quitting will not affect their relationship with you, the university, the company, etc.
- About the equipment in the room
- Whether their face and/or actions will be recorded
- How to think aloud (if you are collecting verbal data)
- Whether you will or will not be available to answer questions
- That their data will be viewed only in aggregate form
- How long the session will take

Make Users Feel Comfortable
- Offer breaks at boundary points
- Offer to send results in aggregate form, or allow users to see the improved interface
- Develop understandable instructions
- Do not "defend" your interface
- Do not make subjective comments about users, the ease or difficulty of tasks, etc.

Analyze Results and Iterate
- Analyze data using statistical methods (ANOVAs and chi-squared tests are common)
  - Take a stats course, e.g., Stat 320, for more detail
- Did you meet the goals? How far from the goals are you?

t-tests and ANOVAs
- t-tests compare two random samples and determine if the samples are statistically significantly different
  - e.g., are dynamic menus better than static menus?
- ANOVAs (analysis of variance) compare n random samples and determine if the samples are statistically significantly different
  - e.g., which is best: dynamic, static, or radial menus?
- Both assume the samples come from normal distributions, and both produce p-values.
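A two-sample t statistic can be computed directly from the sample means and variances. A minimal pure-Python sketch of Welch's t-test (the menu-selection times are invented for illustration; a real analysis would use a stats package, which also gives the p-value):

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic: the difference in means divided by
    its standard error, without assuming equal variances."""
    na, nb = len(sample_a), len(sample_b)
    se = math.sqrt(variance(sample_a) / na + variance(sample_b) / nb)
    return (mean(sample_a) - mean(sample_b)) / se

# Hypothetical selection times (seconds) for static vs. dynamic menus.
static_menu = [4.1, 3.9, 4.3]
dynamic_menu = [3.0, 3.2, 2.9]

t = welch_t(static_menu, dynamic_menu)
print(f"t = {t:.2f}")  # a large |t| suggests a real difference between menus
```

The t statistic is then compared against the t distribution (with the appropriate degrees of freedom) to obtain the p-value discussed below.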

Normal Distributions
- Bell curve: y = exp(-x^2)
- Occurs from sums of independent events
  - e.g., sum of dice rolls
  - Total time = t_find + t_home + t_click
  - Total # of errors
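The "sum of independent events" point is the central limit theorem in action, and it is easy to see by simulation: sums of dice rolls pile up in a bell shape around the expected value. A minimal sketch:

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed for reproducibility

# Simulate the sum of three fair dice many times.
sums = [sum(rng.randint(1, 6) for _ in range(3)) for _ in range(100_000)]

mean_sum = sum(sums) / len(sums)
print(f"mean of 3-dice sums: {mean_sum:.2f}")  # expected value is 3 * 3.5 = 10.5

# Crude text histogram: counts cluster near 10-11 and taper toward 3 and 18.
counts = Counter(sums)
for s in range(3, 19):
    print(f"{s:2d} {'#' * (counts[s] // 500)}")
```

The same logic is why total task time (find + home + click) and total error counts tend toward a normal distribution.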

p-values
- "p" stands for probability: the probability of seeing a difference at least as large as the one you observed if only random chance were at work
- An expression of the confidence in your result
- Typically, a difference is called statistically significant when p < 0.05
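The "random chance" idea can be made concrete with a permutation test: repeatedly shuffle the group labels and count how often chance alone produces a difference as large as the observed one. A minimal sketch, with invented task times:

```python
import random

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Estimate the p-value for the difference in means between a and b
    by shuffling group labels and counting equally extreme differences."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(a)]
        shuffled_b = pooled[len(a):]
        diff = abs(sum(shuffled_a) / len(a) - sum(shuffled_b) / len(b))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

fast = [1.1, 0.9, 1.0, 1.2, 0.8]   # hypothetical task times, condition A
slow = [2.0, 2.2, 1.9, 2.1, 2.3]   # hypothetical task times, condition B

p = permutation_p_value(fast, slow)
print(f"p ≈ {p:.4f}")
```

Here the two groups are cleanly separated, so random shuffles almost never match the observed difference and the estimated p falls well below 0.05.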

Partial eta-squared
- Some ANOVAs produce partial eta-squared values in addition to p-values
- They are becoming widespread in the HCI literature; you may see them soon in a usability report
- Partial eta-squared is an effect-size measure: it speaks to practical significance, not just statistical significance

Advantages of Empirical User Studies
- Measure performance (time, error rate)
- Measure user satisfaction
- Give a realistic experience of the interface
  - Realistic system response
  - Move among tasks seamlessly
  - Designers are not in control; the user is
- Focus will be on the details; most big issues should already be resolved

Disadvantages of Empirical User Studies
- Users typically must come to the lab
  - Makes it more difficult to recruit them
  - Users may have anxiety
- Large setup effort involved
  - Software instrumentation, hardware setup, questionnaire design, IRB approval, etc.
- Prototype may crash

An Example of How This Gets Used in Practice
- "The Impact of Delayed Visual Feedback on Collaborative Performance" by Darren Gergle, presented at CHI 06
- What is the relationship between delayed visual feedback and collaboration? How much network delay can be tolerated?
- e.g., architectural planning, telesurgery, and remote repair

The Collaborative Puzzle Task
- The experimental task was for a helper to guide a worker through a visual puzzle over a network connection

Independent Variables
- Only one: visual delay in the helper's view window
- Delays (in ms) were sampled from a geometric progression: T_n = T_(n-1) * e^0.05, with T_1 = 60
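Unrolling that recurrence gives the closed form T_n = 60 * e^(0.05 * (n - 1)), with each delay level about 5% longer than the previous one. A minimal sketch; the choice of 69 levels here is derived from the ~1798 ms ceiling reported in the analysis, not stated on this slide:

```python
import math

def delay_ms(n: int) -> float:
    """Closed form of T_n = T_(n-1) * e**0.05 with T_1 = 60 (delay in ms)."""
    return 60 * math.exp(0.05 * (n - 1))

# 69 levels span roughly the 60 ms to 1798 ms range from the analysis below.
levels = [delay_ms(n) for n in range(1, 70)]
print(f"first: {levels[0]:.0f} ms, last: {levels[-1]:.0f} ms")
```

A geometric progression like this samples short delays densely and long delays sparsely, matching how perceptible a fixed increment of delay is at each scale.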

Dependent Variables
- Only one: task performance time
- Participants were asked to perform the puzzle task as quickly and accurately as possible

Quantitative Analysis Using ANOVA
- "For delays between 60ms and 939ms, we found no evidence to indicate any impact of delayed visual feedback on task performance (SE = 2.87, F(1,610) = .028, p = .87)."
  - p > 0.05, so the samples are not significantly different
- "However, for delay rates between 939ms and 1798ms there is a significant impact on task performance (F(1,610) = 13.57, p < .001)."
  - Since p < .001, this result is highly significant

Graph of Delay vs. Performance