Day 10 Analysing usability test results
Objectives
- To learn more about how to understand and report quantitative test results
- To learn about some basic statistical terms
- To learn about t-tests
- To learn whether obtained results are “significant”
Analysing and presenting results - qualitative
Qualitative data
- You could group comments from participants that seem to go together and report how many participants had each problem
- For example:
  - 4 stated that they did not realise you had to click on “see it” to find the price of an item
  - 2 stated that they searched the whole site and didn’t find the price of the item
- In your report, you could give quotations to back up what you are saying (this shows that the finding comes from the participants and is not just your subjective feeling, which makes the report more objective)
Analysing and presenting results - quantitative
Quantitative data
- You might want to report things like:
  - how many clicks people took on a task compared to an expert
  - how much time people took on a task with design A compared with design B
  - how many errors novice users made compared with another group who are experienced users of certain types of software
- But just reporting the numbers is not enough!
The garden.com study from yesterday
- For each task, they reported:
  - the means (calculated for the participants)
  - whether the mean for the participants differed from an expert (Y/N)
  - the standard deviation (a measure of how much the data are dispersed around the mean … how consistent the data are)
- But this is not enough!
Statistical tests
- Notice that they also said they did statistical tests “to determine whether real differences exist” between the participants and the expert
- They should have given more details:
  - which statistical test they used
  - the values obtained from the statistical tests
Statistical test – some background
- The normal curve
- The standard deviation
- Types of “experiments”
- p-values
- t-tests
  - single sample, with hypothesised mean
  - independent samples
  - correlated samples
The normal curve and standard deviations
The standard deviation
- A measure of the variability of the data about the mean
- A large standard deviation means the values obtained from the subjects vary a lot from the mean
- A small standard deviation means the values obtained from the subjects vary little from the mean
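To make the idea concrete, here is a minimal sketch using Python's standard `statistics` module. The task times are invented for illustration and are not taken from the garden.com study.

```python
import statistics

# Hypothetical task times in seconds (invented for illustration only)
consistent_times = [28, 30, 31, 29, 32]
varied_times = [10, 50, 30, 15, 45]

for label, times in [("consistent", consistent_times), ("varied", varied_times)]:
    mean = statistics.mean(times)
    sd = statistics.stdev(times)  # sample standard deviation (divides by n - 1)
    print(f"{label}: mean = {mean:.1f} s, SD = {sd:.1f} s")
```

Both sets have the same mean (30 s), but the SDs are very different (about 1.6 s and 17.7 s), so the mean alone hides how consistent the participants were.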
Why is the standard deviation important?
- Table 7 of the garden.com study: compare task 1 and task 3
- Statistical tests can take this SD difference into account
- The appropriate statistical test is the t-test
The t-test
- The t-test will tell you whether one mean is really different from another
- That is, it is a statistical test to compare means
- There are 3 kinds of t-test:
  - Single sample: when you are comparing the participants’ mean with an expert’s value (we’ll call this the hypothesised mean)
  - Independent samples: when you are comparing performance by two groups
  - Correlated samples: when you are comparing one group tested in two different situations
Single sample test
- Where you have one group of subjects and test their mean against one fixed value
- For example, one value is obtained from one expert, which is then assumed to be the mean for some expert group
- Or it could be more like a benchmark: you compare the mean of the participants to some benchmark value
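A minimal sketch of the single sample calculation, using only the Python standard library. The click counts and the expert value are invented, and `one_sample_t` is a hypothetical helper name. Instead of computing a p-value, the sketch compares t with the critical value from a standard t table.

```python
import math
import statistics

def one_sample_t(sample, hypothesised_mean):
    """Return (t, df) for a single sample t-test."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - hypothesised_mean) / standard_error
    return t, n - 1

# Hypothetical data: clicks per participant, and one expert's click count
participant_clicks = [12, 15, 11, 14, 13, 16, 12, 15]
expert_clicks = 10

t, df = one_sample_t(participant_clicks, expert_clicks)

# Critical value for a two-tailed test at p = .05 with df = 7 (standard t table)
CRITICAL_T_DF7 = 2.365
print(f"t({df}) = {t:.2f}, significant at .05: {abs(t) > CRITICAL_T_DF7}")
```

With these made-up numbers the participants' mean (13.5 clicks) is far enough from the expert's 10 clicks, relative to the spread of the data, that t exceeds the critical value.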
Independent samples
- This is where you have data from different groups of subjects
- For example, you have novices and experienced users and you are comparing the means of the two groups
[Diagram: one condition with Group 1 members, another condition with Group 2 members]
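A minimal sketch of the independent samples calculation, again with invented data. It assumes the two groups have roughly equal variances (the classic pooled-variance form of the test); `independent_t` is a hypothetical helper name.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, df) for an independent samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Hypothetical click counts for two different groups of people
novices = [14, 18, 16, 20, 17]
experienced = [10, 12, 11, 13, 9]

t, df = independent_t(novices, experienced)

# Critical value for a two-tailed test at p = .05 with df = 8 (standard t table)
CRITICAL_T_DF8 = 2.306
print(f"t({df}) = {t:.2f}, significant at .05: {abs(t) > CRITICAL_T_DF8}")
```

Note that the degrees of freedom come from both groups together (5 + 5 - 2 = 8), because each group contributes its own set of people.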
Correlated samples
- This is where you use the same subjects for two different measures and want to compare them
- For example, you give subjects 2 tasks and see if they found one harder than the other
[Diagram: the same group members take part in Condition 1 and Condition 2]
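A minimal sketch of the correlated samples calculation with invented data. Because each subject does both tasks, the test works on each subject's difference between the two tasks; `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a correlated (paired) samples t-test."""
    # One difference per subject: second measure minus first measure
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    t = statistics.mean(differences) / standard_error
    return t, n - 1

# Hypothetical times in seconds; position i is the same subject in both lists
task_1_times = [30, 42, 35, 50, 38, 45]
task_2_times = [36, 45, 41, 52, 47, 49]

t, df = paired_t(task_1_times, task_2_times)

# Critical value for a two-tailed test at p = .05 with df = 5 (standard t table)
CRITICAL_T_DF5 = 2.571
print(f"t({df}) = {t:.2f}, significant at .05: {abs(t) > CRITICAL_T_DF5}")
```

Here every subject was slower on task 2, so the differences are consistent even though the subjects differ a lot from one another in overall speed.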
The p-value
- When you run a statistical test, you get a p-value
- p-value stands for probability value
- The aim of statistical tests is to determine whether the results could have occurred by chance
- If it is very unlikely that certain results occurred by chance, then there is probably some other reason; for example, maybe novice users get more confused than expert users
The importance of p < .05
- Usually, if results could be obtained by chance fewer than 5 times in a hundred, we say the results are significant
- When you do a statistical test, you will get a p-value expressed as a decimal; for example, p = .04 (if there were no real difference, results like these would occur by chance only about 4 times in 100)
- Any p < .05 is significant: you can assume your observed differences are unlikely to be due to chance alone
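The idea of "results occurring by chance" can be shown directly. The sketch below is not a t-test: it is an exact permutation test, swapped in because it needs only the standard library and makes the meaning of p visible. It asks how many of the possible ways of splitting the ten (invented) scores into two groups of five give a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

# Hypothetical click counts (invented for illustration)
novices = [14, 18, 16, 20, 17]
experts = [10, 12, 11, 13, 9]

observed = abs(mean(novices) - mean(experts))
pooled = novices + experts
total = sum(pooled)

splits = 0
as_extreme = 0
# Try every way of choosing which 5 of the 10 scores are labelled "novice"
for chosen in combinations(range(len(pooled)), len(novices)):
    group_sum = sum(pooled[i] for i in chosen)
    rest_sum = total - group_sum
    diff = abs(group_sum / len(novices) - rest_sum / len(experts))
    splits += 1
    if diff >= observed - 1e-9:  # small tolerance for float rounding
        as_extreme += 1

p_value = as_extreme / splits
print(f"{as_extreme} of {splits} splits are as extreme: p = {p_value:.4f}")
print("significant at .05:", p_value < 0.05)
```

Only 2 of the 252 possible splits are as extreme as the observed one, so p is about .008, well below .05.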
One- and two-tailed t-tests
- One-tailed test: used when you have predicted the direction of the difference; for example, novices will use more clicks than experts
- Two-tailed test: used when you have predicted a difference, but have not stated the direction of the difference; for example, there will be a difference in performance between males and females
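The practical effect of this choice can be sketched as follows. The t value is hypothetical, `is_significant` is a hypothetical helper, and the two critical values are the standard t table entries for p = .05 with 9 degrees of freedom.

```python
# Critical t values for p = .05 and df = 9, from a standard t table
CRITICAL_ONE_TAILED = 1.833
CRITICAL_TWO_TAILED = 2.262

def is_significant(t, predicted_direction=None):
    """Check t against the df = 9 critical values.

    predicted_direction: None for a two-tailed test, or +1 / -1 for the
    sign of the difference that was predicted before the study.
    """
    if predicted_direction is None:
        return abs(t) > CRITICAL_TWO_TAILED
    # One-tailed: only a difference in the predicted direction can count
    return t * predicted_direction > CRITICAL_ONE_TAILED

t = 2.0  # hypothetical t statistic with df = 9

print(is_significant(t))      # two-tailed: False
print(is_significant(t, +1))  # one-tailed, direction as predicted: True
print(is_significant(t, -1))  # one-tailed, opposite direction: False
```

The same t value can be significant with a one-tailed test and not with a two-tailed test, which is why the direction must be predicted before looking at the data.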
Today’s lab
- We will run some t-tests on some fake data