Empirical Evaluation Chris North cs5984: Information Visualization.


Evaluating Visualizations
  Expert Review
    » Examination by visualization expert
  Heuristic Evaluation
    » Principles, guidelines
    » Algorithmic
  Usability Evaluation
    » Observation, problem identification
  Empirical Experiment **
    » Controlled scientific experiment, “user study”
    » Comparisons, statistical analysis

What is Science?
  Measurement
  Modeling

Scientific Method
  1. Form hypothesis
  2. Collect data
  3. Analyze
  4. Accept/reject hypothesis

Deep Questions
  Is ‘computer science’ science?
  How can you “prove” a hypothesis with science?

Empirical Experiment
  Typical question: Which visualization is better in which situations?
  [Images: Lifelines, Perspective Wall]

More Rigorous Question
  Does Vis Tool (Lifelines or PerspWall) have an effect on user performance time for task X?
  Null hypothesis:
    » No effect
    » Lifelines = PerspWall
  Want to disprove, provide counter-example, show an effect

Variables
  Independent Variables (what you vary) and treatments (the variable values):
    Visualization tool
      » Lifelines, Perspective Wall, Text UI
    Task type
      » Find, count, pattern, compare
    Data size (# of items)
      » 100, 1000,
  Dependent Variables (what you measure):
    User performance time
    Errors
    Subjective satisfaction (survey)
  HCI metrics!

Example: 2 x 3 design
  Ind Var 1: Vis. Tool (rows)
  Ind Var 2: Task Type (columns)
  n users per cell; each cell holds the measured user performance times (dep var)

                 Task1    Task2    Task3
    Lifelines
    Persp. Wall
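As a rough sketch, the cells of this 2 x 3 design can be enumerated in Python. The treatment names come from the slides; the variable names (`vis_tools`, `task_types`, `times`) are assumptions made for illustration.

```python
from itertools import product

# Treatments taken from the slides; the variable names are illustrative.
vis_tools = ["Lifelines", "PerspWall"]        # independent variable 1
task_types = ["Task1", "Task2", "Task3"]      # independent variable 2

# Each (tool, task) pair is one cell of the 2 x 3 design.
cells = list(product(vis_tools, task_types))

# One (initially empty) list of measured performance times per cell.
times = {cell: [] for cell in cells}

for tool, task in cells:
    print(f"{tool:10s} {task}")
```

Each list in `times` would later receive one performance time per user assigned to that cell.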

Groups
  “Between subjects” variable
    » 1 group of users for each variable treatment
    » Group 1: 20 users, Lifelines
    » Group 2: 20 users, PerspWall
    » Total: 40 users, 20 per cell
  “Within subjects” (repeated) variable
    » All users perform all treatments
    » Counterbalance the order to cancel out order effects
    » Group 1: 20 users, Lifelines then PerspWall
    » Group 2: 20 users, PerspWall then Lifelines
    » Total: 40 users, 40 per cell
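The counterbalanced within-subjects assignment above could be sketched as follows. The function name `assign_orders` and the fixed seed are assumptions for illustration; the 40 users and the two orders follow the slide.

```python
import random

def assign_orders(user_ids, treatments=("Lifelines", "PerspWall"), seed=0):
    """Randomly split users into two equal groups, one per treatment order."""
    rng = random.Random(seed)   # fixed seed only so the sketch is repeatable
    shuffled = list(user_ids)
    rng.shuffle(shuffled)       # random assignment guards against selection bias
    half = len(shuffled) // 2
    first, second = treatments
    orders = {}
    for uid in shuffled[:half]:
        orders[uid] = (first, second)    # Group 1: Lifelines then PerspWall
    for uid in shuffled[half:]:
        orders[uid] = (second, first)    # Group 2: PerspWall then Lifelines
    return orders

orders = assign_orders(range(40))
```

Every user still performs both treatments, so each tool ends up with 40 measurements, half collected first and half collected second.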

Issues
  Randomized
  Fairness
  Identical procedures
  Bias
  User privacy, data security

Procedure
  For each user (repeat for all n users):
    Sign legal forms
    Pre-Survey: demographics
    Instructions
      » Do not reveal true purpose of experiment
    Training runs
    Actual runs
    Post-Survey: subjective measures

Data
  Measured dependent variables
  Spreadsheet columns: Lifelines task 1, 2, 3; PerspWall task 1, 2, 3

Averages
  Ind Var 1: Vis. Tool (rows)
  Ind Var 2: Task Type (columns)
  Each cell holds the average of the measured user performance times (dep var)

                 Task1    Task2    Task3
    Lifelines
    Persp. Wall

PerspWall better than Lifelines?
  Problem with averages: lossy
    » Compares only 2 numbers
    » What about the 40 data values? (Show me the data!)
  [Chart: performance time (secs), Lifelines vs. PerspWall]

The real picture
  Need stats that take all data into account
  [Chart: performance time (secs), Lifelines vs. PerspWall]
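A small sketch of why averages alone are lossy: the two scenarios below have identical means but very different spreads. All numbers are made up for illustration and are not data from any study.

```python
from statistics import mean, stdev

# Made-up performance times in seconds, chosen only to illustrate the point.
# Scenario A: little variation between users.
lifelines_a = [29, 30, 30, 31, 30]
perspwall_a = [24, 25, 25, 26, 25]
# Scenario B: same averages, but users vary widely.
lifelines_b = [10, 50, 30, 45, 15]
perspwall_b = [5, 45, 25, 40, 10]

for name, data in [("Lifelines A", lifelines_a), ("PerspWall A", perspwall_a),
                   ("Lifelines B", lifelines_b), ("PerspWall B", perspwall_b)]:
    print(f"{name}: mean={mean(data):.1f}  sd={stdev(data):.1f}")
```

In both scenarios PerspWall averages 5 seconds faster, yet only in scenario A do the two sets of times barely overlap, which is the information a statistical test uses and a bare average discards.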

Statistics
  t-test
    » Compares 1 dep var on 2 treatments of 1 ind var
  ANOVA: Analysis of Variance
    » Compares 1 dep var on n treatments of m ind vars
  Result: “significant difference” between treatments?
    » p = probability of seeing a difference at least this large if the null hypothesis were true
    » Typical cut-off for significance: p < 0.05
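The two test statistics can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: it uses the pooled-variance (Student) t-test and a one-way ANOVA (a single independent variable only), the function names are hypothetical, and the times are made up. Turning a statistic into a p-value additionally requires the t or F distribution, which a statistics package would supply.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Student's t for two independent samples, using the pooled variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df

def f_statistic(*groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    k = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total number of observations
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)

# Made-up performance times in seconds for task 1 (illustration only).
lifelines = [30, 32, 28, 35, 31, 29]
perspwall = [25, 27, 24, 28, 26, 26]
text_ui = [40, 38, 42, 41, 39, 40]

t, df = t_statistic(lifelines, perspwall)
f, dfs = f_statistic(lifelines, perspwall, text_ui)
print(f"t = {t:.2f} (df = {df})")
print(f"F = {f:.2f} (df = {dfs})")
```

With exactly two groups the one-way F statistic equals the square of the pooled t statistic, so the t-test can be seen as the two-treatment special case.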

p < 0.05
  Woohoo! Found a “statistically significant difference”
  Averages determine which is ‘better’
  Conclusion:
    » Vis Tool has an “effect” on user performance for task1
    » PerspWall gives better user performance than Lifelines for task1
    » Informally: “95% confident that PerspWall better than Lifelines” (strictly: a difference this large would occur less than 5% of the time if there were no real effect)
    » Not “PerspWall beats Lifelines 95% of time”
  Found a counter-example to the null hypothesis
    » Null hypothesis: Lifelines = PerspWall
    » Hence: Lifelines ≠ PerspWall

p > 0.05
  Hence, same?
    » Vis Tool has no effect on user performance for task1?
    » Lifelines = PerspWall?
  NOT!
    » We did not detect a difference, but they could still be different
    » Did not find a counter-example to the null hypothesis
    » Provides evidence for Lifelines = PerspWall, but not proof
  Boring! Basically found nothing
  How?
    » Not enough users
    » Need better tasks, data, …
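One way to see why "not enough users" can hide a real difference: for two equal-sized groups with the same standard deviation, the t statistic grows with the square root of the group size. The sketch below assumes a hypothetical 5-second mean difference and a 10-second standard deviation; the function name is illustrative.

```python
from math import sqrt

def t_for_equal_groups(mean_diff, sd, n_per_group):
    """t statistic when both groups share the same size and standard deviation."""
    return mean_diff / (sd * sqrt(2 / n_per_group))

# Same assumed effect (5 s) and spread (10 s), different numbers of users.
for n in (5, 20, 80):
    print(n, round(t_for_equal_groups(5.0, 10.0, n), 2))
```

Quadrupling the number of users per group doubles t, so an effect that goes undetected with a handful of users may become detectable with more.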

Data Mountain
  Robertson, “Data Mountain” (Microsoft)
    » Quoc, Reenal

Assignment
  Thurs: Visualization Development
    Bederson, “Jazz”
      » Jun, Rohit
  Literature Review due Thurs
  Homework #2 due Thurs, Oct 4