Empirical Evaluation Chris North cs5984: Information Visualization.


1 Empirical Evaluation Chris North cs5984: Information Visualization

2 Evaluating Visualizations
  Expert Review
    » Examination by visualization expert
  Heuristic Evaluation
    » Principles, guidelines
    » Algorithmic
  Usability Evaluation
    » Observation, problem identification
  Empirical Experiment **
    » Controlled scientific experiment, “user study”
    » Comparisons, statistical analysis

3 What is Science?
  Measurement
  Modeling

4 Scientific Method
  1. Form hypothesis
  2. Collect data
  3. Analyze
  4. Accept/reject hypothesis

5 Deep Questions Is ‘computer science’ science? How can you “prove” a hypothesis with science?

6 Empirical Experiment
  Typical question: Which visualization is better in which situations?
  [Figures: Lifelines; Perspective Wall]

7 More Rigorous Question
  Does Vis Tool (Lifelines or PerspWall) have an effect on user performance time for task X?
  Null hypothesis: no effect
    » Lifelines = PerspWall
  Want to disprove it: provide a counter-example, show an effect
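The question on this slide can be written as a pair of hypotheses. The symbols below are assumptions introduced for illustration: each \mu stands for the mean user performance time on task X with the named tool.

```latex
H_0:\ \mu_{\text{Lifelines}} = \mu_{\text{PerspWall}}
\qquad\text{versus}\qquad
H_1:\ \mu_{\text{Lifelines}} \neq \mu_{\text{PerspWall}}
```

The experiment tries to reject H_0; it never proves H_1 outright.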

8 Variables
  Independent variables (what you vary) and treatments (the variable values):
    Visualization tool
      » Lifelines, Perspective Wall, Text UI
    Task type
      » Find, count, pattern, compare
    Data size (# of items)
      » 100, 1000, 1000000
  Dependent variables (what you measure):
    User performance time
    Errors
    Subjective satisfaction (survey)
  HCI metrics!

9 Example: 2 x 3 design
  Ind Var 1: Vis. Tool (rows); Ind Var 2: Task Type (columns)

                  Task1      Task2      Task3
    Lifelines     n users    n users    n users
    Persp. Wall   n users    n users    n users

  Each cell holds the measured user performance times (dep var), n users per cell
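A minimal sketch of how the cells of the 2 x 3 design could be enumerated. The names `tools`, `tasks`, and `n_per_cell` are assumptions for illustration, and the value 20 is only an example figure.

```python
from itertools import product

# Treatments of the two independent variables from the 2 x 3 example.
tools = ["Lifelines", "PerspWall"]
tasks = ["Task1", "Task2", "Task3"]

# Every (tool, task) pair is one cell of the design.
cells = list(product(tools, tasks))

# Assumed number of users per cell (illustrative only).
n_per_cell = 20

# If BOTH variables were between-subjects, each cell would need its own users.
users_if_all_between = len(cells) * n_per_cell

for tool, task in cells:
    print(f"{tool:10s} {task}: {n_per_cell} users")
```

Making a variable within-subjects lets the same users fill several cells, which is what reduces the total head count.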

10 Groups
  “Between subjects” variable
    » 1 group of users for each variable treatment
    » Group 1: 20 users, Lifelines
    » Group 2: 20 users, PerspWall
    » Total: 40 users, 20 per cell
  “Within subjects” (repeated) variable
    » All users perform all treatments
    » Counterbalance the order to control for order effects
    » Group 1: 20 users, Lifelines then PerspWall
    » Group 2: 20 users, PerspWall then Lifelines
    » Total: 40 users, 40 per cell
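A rough sketch of the counterbalanced within-subjects assignment described on this slide. The function name `assign_within_subjects`, the integer user IDs, and the fixed seed are assumptions for illustration, not part of the original lecture.

```python
import random

def assign_within_subjects(user_ids, treatments=("Lifelines", "PerspWall"), seed=0):
    """Randomly split users into two equal groups with opposite treatment orders."""
    users = list(user_ids)
    if len(users) % 2 != 0:
        raise ValueError("need an even number of users for two equal groups")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    rng.shuffle(users)         # random assignment guards against selection bias
    half = len(users) // 2
    first, second = treatments
    orders = {}
    for uid in users[:half]:
        orders[uid] = (first, second)   # Group 1: Lifelines then PerspWall
    for uid in users[half:]:
        orders[uid] = (second, first)   # Group 2: PerspWall then Lifelines
    return orders

orders = assign_within_subjects(range(1, 41))  # 40 hypothetical user IDs
```

Every user appears in both treatment cells, so 40 users yield 40 measurements per cell, and each order is used by exactly half of them.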

11 Issues
  Randomized
  Fairness
  Identical procedures
  Bias
  User privacy, data security

12 Procedure
  For each user (* n users):
    Sign legal forms
    Pre-survey: demographics
    Instructions
      » Do not reveal true purpose of experiment
    Training runs
    Actual runs
    Post-survey: subjective measures

13 Data
  Measured dependent variables
  Spreadsheet: Lifelines task 1, 2, 3; PerspWall task 1, 2, 3
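One possible way to hold the measured times and reduce them to per-cell averages, sketched with the standard library only. The numbers below are made up for illustration and are NOT the study's data; the long (one row per measurement) layout and the name `cell_means` are assumptions.

```python
from statistics import mean

# Hypothetical performance times in seconds (invented for this sketch).
# One row per measurement: (tool, task, time_secs).
rows = [
    ("Lifelines", "Task1", 35.0),
    ("Lifelines", "Task1", 40.0),
    ("PerspWall", "Task1", 28.0),
    ("PerspWall", "Task1", 32.0),
    ("Lifelines", "Task2", 50.0),
    ("Lifelines", "Task2", 60.0),
    ("PerspWall", "Task2", 52.0),
    ("PerspWall", "Task2", 54.0),
]

def cell_means(rows):
    """Group times by (tool, task) cell and average each cell."""
    cells = {}
    for tool, task, secs in rows:
        cells.setdefault((tool, task), []).append(secs)
    return {cell: mean(times) for cell, times in cells.items()}

means = cell_means(rows)
for (tool, task), avg in sorted(means.items()):
    print(f"{tool:10s} {task}: {avg:.1f} s")
```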

14 Averages
  Ind Var 1: Vis. Tool (rows); Ind Var 2: Task Type (columns)
  Measured user performance times (dep var)

                  Task1    Task2    Task3
    Lifelines     37.2     54.5     103.7
    Persp. Wall   29.8     53.2     145.4

15 PerspWall better than Lifelines?
  Problem with averages: lossy
  Compares only 2 numbers
  What about the 40 data values? (Show me the data!)
  [Chart: Perf time (secs) for Lifelines and PerspWall]

16 The real picture
  Need stats that take all data into account
  [Chart: Perf time (secs) for Lifelines and PerspWall]
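A small sketch of why averages alone are lossy. Both pairs of samples below are invented for illustration: they have the same 8-second gap between means, but very different spread, so only the first pair suggests a reliable difference.

```python
from statistics import mean, stdev

# Hypothetical times in seconds (made up; not the study's data).
tight_lifelines = [39.0, 40.0, 41.0]
tight_perspwall = [31.0, 32.0, 33.0]

wide_lifelines = [10.0, 40.0, 70.0]
wide_perspwall = [2.0, 32.0, 62.0]

# The difference between averages is identical in both scenarios.
tight_gap = mean(tight_lifelines) - mean(tight_perspwall)
wide_gap = mean(wide_lifelines) - mean(wide_perspwall)

# The spread is not: the wide samples overlap almost completely.
tight_spread = stdev(tight_lifelines)
wide_spread = stdev(wide_lifelines)

print(f"gap: {tight_gap} vs {wide_gap}")
print(f"spread: {tight_spread} vs {wide_spread}")
```

A test that uses all the data weighs the gap between means against this spread.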

17 Statistics
  t-test
    » Compares 1 dep var on 2 treatments of 1 ind var
  ANOVA: Analysis of Variance
    » Compares 1 dep var on n treatments of m ind vars
  Result: “significant difference” between treatments?
  p = probability of seeing a difference at least this large if the null hypothesis were true
  Typical cut-off (significance level): p < 0.05
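A minimal sketch of the t-test idea for a between-subjects comparison, using only the standard library. It computes Student's two-sample t statistic (equal-variance form) and compares it with a critical value from a standard t table instead of computing p directly. The sample values, the name `pooled_t`, and the choice of the equal-variance form are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Return Student's two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Hypothetical performance times in seconds (made up; not the study's data).
lifelines = [36.0, 38.0, 40.0, 42.0, 44.0]
perspwall = [28.0, 30.0, 32.0, 34.0, 36.0]

t, df = pooled_t(lifelines, perspwall)

# Two-tailed critical value for alpha = 0.05 and df = 8, from a standard t table.
T_CRIT_DF8 = 2.306
significant = abs(t) > T_CRIT_DF8  # equivalent to p < 0.05 for this df

print(f"t = {t:.2f}, df = {df}, significant at 0.05: {significant}")
```

For a within-subjects design the same users produce both samples, so a paired test on the per-user differences would be the more usual choice.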

18 p < 0.05
  Woohoo! Found a “statistically significant difference”
  Averages determine which is ‘better’
  Conclusion:
    » Vis Tool has an “effect” on user performance for task1
    » PerspWall better user performance than Lifelines for task1
    » “95% confident that PerspWall better than Lifelines”
    » Not “PerspWall beats Lifelines 95% of time”
  Found a counter-example to the null hypothesis
    » Null hypothesis: Lifelines = PerspWall
    » Hence: Lifelines ≠ PerspWall

19 p > 0.05
  Hence, same?
    » Vis Tool has no effect on user performance for task1?
    » Lifelines = PerspWall?
  NOT!
    » We did not detect a difference, but they could still be different
    » Did not find a counter-example to the null hypothesis
    » Provides evidence for Lifelines = PerspWall, but not proof
  Boring! Basically found nothing
  How?
    » Not enough users
    » Need better tasks, data, …
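The "not enough users" point can be illustrated with a small simulation, sketched here under stated assumptions: a true 5-second difference is built into simulated data (normal distributions with invented means 40 and 35 and standard deviation 10), and the code counts how often a two-sample t-test at the 0.05 level detects it. The function names, parameters, and seed are all hypothetical; the critical values come from a standard t table.

```python
import random
from math import sqrt
from statistics import mean, variance

def detects_difference(a, b, t_crit):
    """True if the pooled two-sample t statistic exceeds the critical value."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return abs(t) > t_crit

def detection_rate(n, t_crit, trials=500, seed=1):
    """Fraction of simulated experiments that find the (real) difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The two tools truly differ by 5 seconds on average (assumed values).
        a = [rng.gauss(40.0, 10.0) for _ in range(n)]
        b = [rng.gauss(35.0, 10.0) for _ in range(n)]
        hits += detects_difference(a, b, t_crit)
    return hits / trials

small = detection_rate(n=5, t_crit=2.306)   # 5 users per group, df = 8
large = detection_rate(n=50, t_crit=1.984)  # 50 users per group, df = 98

print(f"detected with 5 users per group:  {small:.0%}")
print(f"detected with 50 users per group: {large:.0%}")
```

With few users most simulated experiments end with p > 0.05 even though the difference is real, which is why a non-significant result does not show that the tools are the same.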

20 Data Mountain
  Robertson, “Data Mountain” (Microsoft)
    » Quoc, Reenal

21 Assignment
  Thurs: Visualization Development
    Bederson, “Jazz”
      » Jun, Rohit
  Literature Review due Thurs
  Homework #2 due Thurs, Oct 4

