Spring 20116.813/6.831 User Interface Design and Implementation1 Lecture 15: Experiment Analysis.

Slides:



Advertisements
Similar presentations
Prepared by Lloyd R. Jaisingh
Advertisements

Inferential Statistics and t - tests
Statistics Review – Part II Topics: – Hypothesis Testing – Paired Tests – Tests of variability 1.
PSY 307 – Statistics for the Behavioral Sciences Chapter 20 – Tests for Ranked Data, Choosing Statistical Tests.
Conducting a User Study Human-Computer Interaction.
Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test (19.5) Table I χ 2 test for independence (19.9) Table.
Comparing Two Population Means The Two-Sample T-Test and T-Interval.
MF-852 Financial Econometrics
Probability & Statistical Inference Lecture 7 MSc in Computing (Data Analytics)
Inferential Stats for Two-Group Designs. Inferential Statistics Used to infer conclusions about the population based on data collected from sample Do.
Statistics for Managers Using Microsoft® Excel 5th Edition
Lecture 9: One Way ANOVA Between Subjects
Inferences About Means of Two Independent Samples Chapter 11 Homework: 1, 2, 4, 6, 7.
4-1 Statistical Inference The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population.
PSY 307 – Statistics for the Behavioral Sciences Chapter 19 – Chi-Square Test for Qualitative Data Chapter 21 – Deciding Which Test to Use.
Today Concepts underlying inferential statistics
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Chapter 14 Inferential Data Analysis
Chapter 12 Inferential Statistics Gay, Mills, and Airasian
Choosing Statistical Procedures
ANOVA Chapter 12.
AM Recitation 2/10/11.
Overview of Statistical Hypothesis Testing: The z-Test
Hypothesis testing – mean differences between populations
ANOVA Greg C Elvers.
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
Statistical Analysis A Quick Overview. The Scientific Method Establishing a hypothesis (idea) Collecting evidence (often in the form of numerical data)
PROBABILITY & STATISTICAL INFERENCE LECTURE 6 MSc in Computing (Data Analytics)
Comparing Two Population Means
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Inferential Statistics.
Chapter 11 HYPOTHESIS TESTING USING THE ONE-WAY ANALYSIS OF VARIANCE.
t(ea) for Two: Test between the Means of Different Groups When you want to know if there is a ‘difference’ between the two groups in the mean Use “t-test”.
A Repertoire of Hypothesis Tests  z-test – for use with normal distributions and large samples.  t-test – for use with small samples and when the pop.
Conducting a User Study Human-Computer Interaction.
Statistics 11 Confidence Interval Suppose you have a sample from a population You know the sample mean is an unbiased estimate of population mean Question:
1 Psych 5500/6500 t Test for Two Independent Means Fall, 2008.
Conducting a User Study Human-Computer Interaction.
1 CS 391L: Machine Learning: Experimental Evaluation Raymond J. Mooney University of Texas at Austin.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 10-1 Chapter 10 Analysis of Variance Statistics for Managers Using Microsoft.
INTRODUCTION TO ANALYSIS OF VARIANCE (ANOVA). COURSE CONTENT WHAT IS ANOVA DIFFERENT TYPES OF ANOVA ANOVA THEORY WORKED EXAMPLE IN EXCEL –GENERATING THE.
Chapter 19 Analysis of Variance (ANOVA). ANOVA How to test a null hypothesis that the means of more than two populations are equal. H 0 :  1 =  2 =
1 ConceptsDescriptionHypothesis TheoryLawsModel organizesurprise validate formalize The Scientific Method.
Educational Research Chapter 13 Inferential Statistics Gay, Mills, and Airasian 10 th Edition.
Human-Computer Interaction. Overview What is a study? Empirically testing a hypothesis Evaluate interfaces Why run a study? Determine ‘truth’ Evaluate.
1 Chapter 8 Introduction to Hypothesis Testing. 2 Name of the game… Hypothesis testing Statistical method that uses sample data to evaluate a hypothesis.
Chapter 13 - ANOVA. ANOVA Be able to explain in general terms and using an example what a one-way ANOVA is (370). Know the purpose of the one-way ANOVA.
Experimental Psychology PSY 433 Appendix B Statistics.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 14 Comparing Groups: Analysis of Variance Methods Section 14.1 One-Way ANOVA: Comparing.
1 ANALYSIS OF VARIANCE (ANOVA) Heibatollah Baghi, and Mastee Badii.
ETM U 1 Analysis of Variance (ANOVA) Suppose we want to compare more than two means? For example, suppose a manufacturer of paper used for grocery.
Stats Lunch: Day 3 The Basis of Hypothesis Testing w/ Parametric Statistics.
Chapter Eight: Using Statistics to Answer Questions.
Chapter 12 Introduction to Analysis of Variance PowerPoint Lecture Slides Essentials of Statistics for the Behavioral Sciences Eighth Edition by Frederick.
© Copyright McGraw-Hill 2004
Testing Hypotheses about a Population Proportion Lecture 31 Sections 9.1 – 9.3 Wed, Mar 22, 2006.
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Kin 304 Inferential Statistics Probability Level for Acceptance Type I and II Errors One and Two-Tailed tests Critical value of the test statistic “Statistics.
Analysis of Variance STAT E-150 Statistical Methods.
STATS 10x Revision CONTENT COVERED: CHAPTERS
Chapter 13 Understanding research results: statistical inference.
Hypothesis Tests u Structure of hypothesis tests 1. choose the appropriate test »based on: data characteristics, study objectives »parametric or nonparametric.
ENGR 610 Applied Statistics Fall Week 8 Marshall University CITE Jack Smith.
Chapter 7: Hypothesis Testing. Learning Objectives Describe the process of hypothesis testing Correctly state hypotheses Distinguish between one-tailed.
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
S519: Evaluation of Information Systems Social Statistics Inferential Statistics Chapter 9: t test.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 1 FINAL EXAMINATION STUDY MATERIAL III A ADDITIONAL READING MATERIAL – INTRO STATS 3 RD EDITION.
Fall UI Design and Implementation1 Lecture 18: Experiment Design & Analysis.
When the means of two groups are to be compared (where each group consists of subjects that are not related) then the excel two-sample t-test procedure.
Inferential Statistics
Kin 304 Inferential Statistics
Presentation transcript:

Spring /6.831 User Interface Design and Implementation1 Lecture 15: Experiment Analysis

UI Hall of Fame or Shame? Spring /6.831 User Interface Design and Implementation2

Nanoquiz closed book, closed notes submit before time is up (paper or web) we’ll show a timer for the last 20 seconds Spring /6.831 User Interface Design and Implementation

1.To maximize external validity, a good research method to use is: (choose one best answer) A. formative evaluation B. field study C. survey D. lab study 2.When deciding between between-subjects and within-subjects designs for an experiment, the most important issues to consider are: (choose all best answers) A. the setting of the experiment B. ordering effects C. individual differences D. tasks 3.Louis Reasoner is running a user study to compare two input devices, and he decides to let participants in his user study decide which device they’ll use in the study. This decision threatens: (choose one best answer) A. reliability B. external validity C. internal validity D. experimenter bias Spring /6.831 User Interface Design and Implementation

Today’s Topics Hypothesis testing Graphing with error bars T test ANOVA test Spring /6.831 User Interface Design and Implementation5

Experiment Analylsis Hypothesis: Mac menubar is faster to access than Windows menubar –Design: between-subjects, randomized assignment of interface to subject Spring /6.831 User Interface Design and Implementation6 Windows Mac

Statistical Testing Compute a statistic summarizing the experimental data mean(Win) mean(Mac) Apply a statistical test –t test: are two means different? –ANOVA (ANalysis Of VAriance): are three or more means different? Test produces a p value –p value = probability that the observed difference happened purely by chance –If p < 0.05, then we are 95% confident that there is a difference between Windows and Mac Spring /6.831 User Interface Design and Implementation7

Standard Error of the Mean Spring /6.831 User Interface Design and Implementation8 N = 4 : Error bars overlap, so can’t conclude anything N=10: Error bars are disjoint, so Windows may be different from Mac

Graphing Techniques Pros –Easy to compute –Give a feel for your data Cons –Not a substitute for statistical testing Spring /6.831 User Interface Design and Implementation9 max min 25 th percentile 75 th percentile median Windows Max Error bars Tukey box plots

Quick Intro to R R is an open source programming environment for data manipulation –includes statistics & charting Get the data in win = scan() # win = [625, 480, … ] mac = scan() # mac = [647, 503, … ] Compute with it means = c(mean(win), mean(mac)) # means = [584, 508] stderrs = c(sd(win)/sqrt(10), sd(mac)/sqrt(10)) # stderrs = [23.29, 26.98] Graph it plot = barplot(means, names.arg=c("Windows", "Mac"), ylim=c(0,800)) error.bar(plot, means, stderrs) Spring /6.831 User Interface Design and Implementation10

Spring /6.831 User Interface Design and Implementation11 Hypothesis Testing Our hypothesis: position of menubar matters –i.e., mean(Mac times) < mean(Windows times) –This is called the alternative hypothesis (also called H1) If we ’ re wrong: position of menu bar makes no difference –i.e., mean(Mac) = mean(Win) –This is called the null hypothesis (H0) We can ’ t really disprove the null hypothesis –Instead, we argue that the chance of seeing a difference at least as extreme as what we saw is very small if the null hypothesis is true

Spring /6.831 User Interface Design and Implementation12 Statistical Significance Compute a statistic from our experimental data X = mean(Win) – mean(Mac) Determine the probability distribution of the statistic assuming H0 is true Pr( X=x | H0) Measure the probability of getting the same or greater difference Pr ( X > x0 | H0 ) one-sided test 2 Pr ( X > |x0| | H0) two-sided test If that probability is less than 5%, then we say –“ We reject the null hypothesis at the 5% significance level ” –equivalently: “ difference between menubars is statistically significant (p <.05) ” Statistically significant does not mean scientifically important

Spring /6.831 User Interface Design and Implementation13 T test T test compares the means of two samples A and B Two-sided: –H0: mean(A) = mean(B) –H1: mean(A) <> mean(B) One-sided: –H0: mean(A) = mean(B) –H1: mean(A) < mean(B) Assumptions: –samples A & B are independent (between- subjects, randomized) –normal distribution –equal variance

Running a T Test (Excel) Spring /6.831 User Interface Design and Implementation14 WinMac WindowsMac Mean Variance Observations44 Pooled Variance Hypothesized Mean Difference0 df6 t Stat0.336 P(T<=t) one-tail0.374 t Critical one-tail1.943 P(T<=t) two-tail0.748 t Critical two-tail2.447

Running a T Test Spring /6.831 User Interface Design and Implementation15 WinMac WindowsMac Mean Variance Observations10 Pooled Variance Hypothesized Mean Difference0 df18 t Stat2.130 P(T<=t) one-tail0.024 t Critical one-tail1.734 P(T<=t) two-tail0.047 t Critical two-tail2.101

Running a T Test (in R) t.test(win, mac) Spring /6.831 User Interface Design and Implementation16

Using Factors in R time = c(win,mac) menubar = factor(c(rep(“win”,10),rep(“mac”,10))) time = [ 625, 480, …, 647, 503, …] menubar = [ win, win, …, mac, mac, …] t.test(time ~ menubar) Spring /6.831 User Interface Design and Implementation17

Spring /6.831 User Interface Design and Implementation18 Paired T Test For within-subject experiments with two conditions Uses the mean of the differences (each user against themselves) H0: mean(A_i – B_i) = 0 H1: mean(A_i – B_i) <> 0 (two-sided test) or mean(A_i – B_i) > 0 (one-sided test)

Reading a Paired T Test Spring /6.831 User Interface Design and Implementation19 WinMac WindowsMac Mean Variance Observations10 Pearson Correlation0.370 Hypothesized Mean Difference0 df9 t Stat2.675 P(T<=t) one-tail0.013 t Critical one-tail1.833 P(T<=t) two-tail0.025 t Critical two-tail2.262

Running a Paired T Test (in R) t.test(times ~ menubar, paired=TRUE) Spring /6.831 User Interface Design and Implementation20

Spring /6.831 User Interface Design and Implementation21 Analysis of Variance (ANOVA) Compares more than 2 means One-way ANOVA –1 independent variable with k >= 2 levels –H0: all k means are equal –H1: the means are different (so the independent variable matters)

Running a One-Way ANOVA (Excel) Spring /6.831 User Interface Design and Implementation22 WinMacBottom GroupsCountSumAverageVariance Windows Mac Bottom Total Source of VariationSSdfMSFP-valueF crit Between Groups Within Groups Total

Running ANOVA (in R) time = [ 625, 480, …, 647, 503, …, 485, 436, …] menubar = [ win, win, …, mac, mac, …, btm, btm, …] fit = aov(time ~ menubar) summary(fit) Spring /6.831 User Interface Design and Implementation23

Running Within-Subjects ANOVA (in R) time = [ 625, 480, …, 647, 503, …, 485, 436, …] menubar = [ win, win, …, mac, mac, …, btm, btm, …] subject = [ u1, u1, u2, u2, …, u1, u1, u2, u2 …, u1, u1, u2, u2, …] fit = aov(time ~ menubar + Error(subject/menubar)) summary(fit) Spring /6.831 User Interface Design and Implementation24

Tukey HSD Test Tests pairwise differences for significance after a significant ANOVA test –More stringent than multiple pairwise t tests Be careful in general about applying multiple statistical tests Spring /6.831 User Interface Design and Implementation25 Win vs. Mac3.201 Mac vs. Bottom0.242 Win vs. Bottom3.443 critical value3.5

Tukey HSD Test (in R) TukeyHSD(fit) Spring /6.831 User Interface Design and Implementation26

Spring /6.831 User Interface Design and Implementation27 Two-Way ANOVA 2 independent variables with j and k levels, respectively Tests whether each variable has an effect independently Also tests for interaction between the variables

Two-way Within-Subjects ANOVA (in R) time = [ 625, 480, …, 647, 503, …, 485, 436, …] menubar = [ win, win, …, mac, mac, …, btm, btm, …] device = [ mouse, pad, …, mouse, pad, …, mouse, pad, …] subject = [ u1, u1, u2, u2, …, u1, u1, u2, u2 …, u1, u1, u2, u2, …] fit = aov(time ~ menubar*device + Error(subject/menubar*device)) summary(fit) Spring /6.831 User Interface Design and Implementation28

Other Tests Two discrete-valued variables –“does past experience affect menubar preference?” independent var { WinUser, MacUser} dependent var {PrefersWinMenu, PrefersMacMenu} –contingency table –Fisher exact test and chi square test Two (or more) scalar variables –Regression Spring /6.831 User Interface Design and Implementation29 PrefersWinPrefersMac WinUser259 MacUser819

Tools for Statistical Testing Web calculators Excel Statistical packages –Commercial: SAS, SPSS, Stata –Free: R Spring /6.831 User Interface Design and Implementation30

Summary Use statistical tests to establish significance of observed differences Graphing with error bars is cheap and easy, and great for getting a feel for data Use t test to compare two means, ANOVA to compare 3 or more means Spring /6.831 User Interface Design and Implementation31

UI Hall of Fame or Shame? Spring /6.831 User Interface Design and Implementation32