Evaluation: Controlled Experiments
Chris North, cs3724: HCI

Presentations: dan constantin, grant underwood, mike gordon
Vote: UI Hall of Fame/Shame?

Next
- Apr 4: Proj 2, final implementation
- Presentations: UI critique or HW2 results
- Thurs: matt ketner, sam altman
- Next Tues: karen molye, steve kovalak
- Next Thurs:

Review
3 approaches for navigating large information spaces?
- Detail only
- Zoom
- Overview+detail
- Focus+context

Review: Visualizing Trees
2 approaches: connection, containment
- Hyperbolic: 100s of nodes + structure
- TreeMap: 1000s of nodes + attributes
- 3D: infovis design is critical, not just VRML

Process
Design → Develop → Evaluate, in continuous iteration

UI Evaluation
Early evaluation:
- Wizard of Oz
- Role playing and scenarios
Mid evaluation:
- Expert reviews
- Heuristic evaluation
- Usability testing
- Controlled experiments
Late evaluation:
- Data logging
- Online surveys

Controlled Experiments
A scientific experiment with real users.
Typical HCI goal: which UI is better?

What is Science?
- Measurement
- Modeling

Scientific Method
1. Form hypothesis
2. Collect data
3. Analyze
4. Accept/reject hypothesis

Deep Questions
- Is "computer science" science?
- How can you "prove" a hypothesis with science?

Empirical Experiment
Typical question: which UI is better in which situations?
Lifelines (zooming) vs. Perspective Wall (focus+context)

More Rigorous Question
Does UI (Lifelines or PerspWall) have an effect on user performance time for task X for such-and-such users?
Null hypothesis: no effect, Lifelines = PerspWall
We want to disprove this: provide a counter-example, show an effect.

Variables
Independent variables (what you vary) and treatments (the variable values):
- User interface: Lifelines, Perspective Wall, text UI
- Task type: find, count, pattern, compare
- Data size (# of items): 100, 1000,
Dependent variables (what you measure):
- User performance time
- Errors
- Subjective satisfaction (survey), retention, learning time
- HCI metrics

Example: 2 x 3 Design
n users per cell. Ind var 1: UI (rows); ind var 2: task type (columns). Each cell holds the measured user performance times (dep var).

            Task 1    Task 2    Task 3
LifeLines   [times]   [times]   [times]
PerspWall   [times]   [times]   [times]

Groups
"Between subjects" variable: 1 group of users for each variable treatment
- Group 1: 20 users, Lifelines
- Group 2: 20 users, PerspWall
- Total: 40 users, 20 per cell
"Within subjects" (repeated) variable: all users perform all treatments, counter-balancing the order effect
- Group 1: 20 users, Lifelines then PerspWall
- Group 2: 20 users, PerspWall then Lifelines
- Total: 40 users, 40 per cell
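The counterbalanced within-subjects assignment above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original study: the user IDs and the 40-user split are the slide's hypothetical numbers.

```python
from itertools import cycle

# Counterbalancing for the within-subjects design: half the users see
# Lifelines first, half see PerspWall first, so any practice or fatigue
# effect from doing one UI before the other cancels out across groups.
conditions = ["Lifelines", "PerspWall"]
orders = [conditions, conditions[::-1]]

users = [f"user{i:02d}" for i in range(40)]
assignment = {u: order for u, order in zip(users, cycle(orders))}

print(assignment["user00"])  # ['Lifelines', 'PerspWall']
print(assignment["user01"])  # ['PerspWall', 'Lifelines']
```

Alternating the two orders gives 20 users per order, so both treatments appear equally often in the first and second position.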

Issues
- Fairness: randomization, identical procedures, bias
- User privacy, data security
- Legal permissions

Procedure
For each of the n users:
1. Sign legal forms
2. Pre-survey: demographics
3. Instructions (do not reveal the true purpose of the experiment)
4. Training runs
5. Actual runs
6. Post-survey: subjective measures

Data
Measured dependent variables, collected in a spreadsheet: Lifelines tasks 1, 2, 3; PerspWall tasks 1, 2, 3.

Averages
The same 2 x 3 grid (UI x task type), now holding the average of the measured user performance times (dep var) in each cell.

PerspWall Better Than Lifelines?
Problem with averages: lossy. Comparing only 2 numbers; what about all 40 data values? (Show me the data!)
[Chart: average task 1 perf time (secs), Lifelines vs. PerspWall]
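The "averages are lossy" point is easy to demonstrate: two samples can have identical means while one is far noisier than the other. The times below are made-up illustrative values, not data from the study.

```python
from statistics import mean, stdev

# Hypothetical task-1 performance times (secs): identical averages,
# wildly different spread -- a bare mean hides this completely.
lifelines = [48, 50, 52, 49, 51]
perspwall = [20, 80, 35, 65, 50]

print(mean(lifelines), mean(perspwall))  # 50 50
print(round(stdev(lifelines), 1))        # 1.6
print(round(stdev(perspwall), 1))        # 23.7
```

Comparing only the two averages would call these treatments equivalent; looking at the spread shows they behave very differently, which is why the analysis needs statistics over all the data values.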

The Real Picture
Need statistics that take all the data into account.
[Chart: distribution of perf times (secs), Lifelines vs. PerspWall]

Statistics
- t-test: compares 1 dep var on 2 treatments of 1 ind var (2 cells)
- ANOVA (Analysis of Variance): compares 1 dep var on n treatments of m ind vars (n x m cells)
Result: is there a "significant difference" between treatments?
p = significance level (confidence); typical cut-off: p < 0.05
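As a rough sketch of what the t-test computes, here is a hand-rolled two-sample (Welch) t statistic on hypothetical performance times; in practice one would use a stats package, and the p-value cut-off corresponds to a critical t value that depends on the degrees of freedom (roughly 2.2 for samples this small).

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample (Welch) t statistic: the difference of the sample
    means divided by the standard error of that difference."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical task-1 performance times (secs), 8 users per treatment
lifelines = [52, 48, 55, 60, 47, 51, 58, 49]
perspwall = [41, 44, 38, 46, 40, 43, 39, 45]

t = welch_t(lifelines, perspwall)
print(round(t, 2))  # 5.32 -- far above the ~2.2 critical value, so p < 0.05
```

A large |t| means the gap between the averages is big relative to the noise in the samples, which is exactly what "statistically significant difference" summarizes.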

p < 0.05
Woohoo! Found a "statistically significant difference"; the averages indicate which is "better".
Conclusion: UI has an "effect" on user performance for task 1; PerspWall gives better user performance than Lifelines for task 1.
This means "95% confident that PerspWall is better than Lifelines", NOT "PerspWall beats Lifelines 95% of the time".
Found a counter-example to the null hypothesis (Lifelines = PerspWall); hence Lifelines ≠ PerspWall.

p > 0.05
Hence, the same? UI has no effect on user performance for task 1? Lifelines = PerspWall? NOT!
We did not detect a difference, but there could still be one: we did not find a counter-example to the null hypothesis. This provides evidence for Lifelines = PerspWall, but not proof.
Boring! Basically found nothing. How? Not enough users; need better tasks, data, ...