Published by Egbert Howard. Modified over 6 years ago.
1
CS 594: Empirical Methods in HCC
Experimental Research in HCI (Part 2)
Dr. Debaleena Chattopadhyay
Department of Computer Science
debaleena.com
hci.cs.uic.edu
2
Agenda
- Discuss course project details
- Revisiting Parametric Statistics
- Non-Parametric Statistics
- Categorical Data
3
Course Project Details
Part 1 (15%): Research proposal. Research design and conceptualization of a chosen research topic. Due 9/26.
Part 2 (25%): Data analysis. Results and discussion. Due at midterm and finals.
4
Course Project Details (cont.)
- You may deal with the same research topic for Part 1 and Part 2: conceptualize, collect, and analyze data.
- You may use different topics for Part 1 and Part 2. For example, use data that you collected before but have not analyzed (must get instructor approval beforehand).
5
Course Project Details (cont.)
Part 1
- Scope of research
- Conceptualization: research questions
- Operationalization
- Metrics: what would you measure? Validity and reliability?
- Hypotheses
- How would you collect data?
- How would you analyze data?
- Why is this methodology suitable?
- Explain how the data collected and the anticipated results will help you answer the research questions.
6
Course project: Part 1
Research scope must not be trivial
- NO simple usability tests
Your proposal will be evaluated on the following:
- Correctness of operationalization and RQs
- Quality of metrics
- Quality of data collection plan
- Correctness of rationale for the chosen empirical method
- Degree of difficulty of the research proposal (10%)
Proposals will be evaluated in two dimensions: degree of difficulty and execution appraisal.
7
Revisiting Parametric Statistics
8
Q1 What does a significant test statistic tell us?
a) There is an important effect.
b) The null hypothesis is false.
c) There is an effect in the population of sufficient magnitude to warrant interpretation.
d) All of the above.
10
Q2 A Type II error is when:
a) We conclude that there is an effect in the population when in fact there is not.
b) We conclude that there is not an effect in the population when in fact there is.
c) We conclude that the test statistic is significant when in fact it is not.
d) The data we have entered in R are different from the data collected.
12
Q3 Which of these statements about statistical power is not true?
a) Power is the ability of a test to detect an effect, given that an effect of a certain size exists in the population.
b) We can use power to determine how big a sample is required to detect an effect of a certain size.
c) Power is linked to the probability of making a Type II error.
d) All of the above are true.
14
Q4 Which of the following are assumptions underlying the use of parametric tests (based on the normal distribution)?
a) Some feature of the data should be normally distributed.
b) The samples being tested should have approximately equal variances.
c) Your data should be at least interval level.
d) All of the above.
16
Q5 The Shapiro-Wilk test can be used to test:
a) Whether data are normally distributed.
b) Whether group variances are equal.
c) Whether scores are measured at the interval level.
d) Whether group means differ.
18
Q6 The correlation between two variables A and B is .12 with a significance of p < .01. What can we conclude?
a) That there is a substantial relationship between A and B.
b) That there is a small relationship between A and B.
c) That variable A causes variable B.
d) All of the above.
20
Normality
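As a minimal sketch of a normality check, the Shapiro-Wilk test from Q5 is available in base R as shapiro.test(). The scores below are simulated and the variable names are made up for illustration.

```r
# Simulated scores for illustration only; the seed just makes the run repeatable.
set.seed(42)
scores <- rnorm(30, mean = 50, sd = 10)

# H0: the scores come from a normally distributed population.
# A small p-value (e.g., below .05) would suggest a departure from normality.
normCheck <- shapiro.test(scores)
print(normCheck)
```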
21
Homogeneity of variance
22
T-test (independent)
23
T-test (dependent)
24
Non-Parametric Statistics
25
When to use non-parametric tests?
- Data are not normally distributed.
- Data are not measured at interval level.
Non-parametric tests are sometimes referred to as distribution-free tests, with the explanation that they make no assumptions about the distribution of the data. Technically, this isn’t true: they do make distributional assumptions (e.g., the ones in this chapter all assume a continuous distribution), but these are less restrictive than those of their parametric counterparts.
26
Common Non-parametric Tests in use
- Wilcoxon rank-sum test / Mann–Whitney test (similar to independent t-test)
- Wilcoxon signed-rank test (similar to dependent t-test)
- Friedman’s test (similar to repeated-measures ANOVA)
- Kruskal–Wallis test (similar to one-way ANOVA)
27
Comparing two independent conditions: the Wilcoxon rank-sum test
When you want to test differences between two conditions and different participants have been used in each condition, you have two choices:
- Wilcoxon rank-sum test
- Mann–Whitney test
28
Wilcoxon rank-sum test
If you have the data for different groups stored in a single column:
    newModel <- wilcox.test(outcome ~ predictor, data = dataFrame, paired = FALSE/TRUE)
If you have the data for different groups stored in two columns:
    newModel <- wilcox.test(scores group 1, scores group 2, paired = FALSE/TRUE)
- outcome is a variable that contains the scores for the outcome measure (in this case sundayBDI or wedsBDI).
- predictor is a variable that tells us to which group a score belongs (in this case drug).
- scores group 1 is a variable that contains the scores for the first group.
- scores group 2 is a variable that contains the scores for the second group.
29
Example output
For example, a neurologist might collect data to investigate the depressant effects of certain recreational drugs. She tested 20 clubbers in all: 10 were given an ecstasy tablet to take on a Saturday night and 10 were allowed to drink only alcohol. Levels of depression were measured using the Beck Depression Inventory (BDI) the day after and midweek.
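A minimal sketch of how this design could be analyzed with wilcox.test(), using both data layouts. The BDI scores are invented for illustration (5 per group rather than 10) and are not the study's data. The formula calls leave out paired, since unpaired is the default and newer R releases may reject that argument in the formula interface.

```r
# Hypothetical BDI scores, invented for illustration (not the study's data).
drugData <- data.frame(
  drug      = factor(rep(c("Ecstasy", "Alcohol"), each = 5)),
  sundayBDI = c(15, 35, 16, 18, 19, 16, 14, 20, 13, 17),
  wedsBDI   = c(28, 36, 34, 24, 39, 5, 6, 30, 8, 9)
)

# Layout 1: all scores in one column, group membership in another.
# exact = FALSE requests the normal approximation, which avoids
# warnings about exact p-values when scores are tied.
sunModel <- wilcox.test(sundayBDI ~ drug, data = drugData, exact = FALSE)
wedModel <- wilcox.test(wedsBDI ~ drug, data = drugData, exact = FALSE)

# Layout 2: one vector of scores per group.
ecstasy <- drugData$wedsBDI[drugData$drug == "Ecstasy"]
alcohol <- drugData$wedsBDI[drugData$drug == "Alcohol"]
wedModel2 <- wilcox.test(ecstasy, alcohol, paired = FALSE, exact = FALSE)

print(sunModel)
print(wedModel)
print(wedModel2)
```

The two layouts report different W values for the same comparison because W is computed for whichever group comes first (the first factor level in the formula version, the first vector otherwise); the two-sided p-value is the same.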
30
Wilcoxon signed-rank test
Used in situations in which there are two sets of scores to compare, but these scores come from the same participants. As such, think of it as the nonparametric equivalent of the dependent t-test.
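As a rough sketch, the signed-rank test can be run with the same wilcox.test() function by setting paired = TRUE. The scores are invented for illustration and assume the same eight participants were measured on both days.

```r
# Hypothetical BDI scores for the same 8 participants on two days
# (invented for illustration).
sundayBDI <- c(15, 35, 16, 18, 19, 17, 27, 20)
wedsBDI   <- c(28, 36, 34, 24, 39, 32, 25, 29)

# paired = TRUE ranks the within-participant differences (sundayBDI - wedsBDI).
# V is the sum of the ranks of the positive differences.
signedModel <- wilcox.test(sundayBDI, wedsBDI, paired = TRUE)
print(signedModel)
```

Only one participant scored higher on Sunday than on Wednesday here, so V is small.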
31
Kruskal–Wallis test
The one-way independent ANOVA has a non-parametric counterpart called the Kruskal–Wallis test. When the data are collected using different participants in each group, we input the data using a coding variable. So, the data editor will have two columns of data. The first column is a factor. The one-way ANOVA is also called a single-factor analysis of variance because there is only one independent variable or factor.
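A minimal sketch of the two-column layout described above, analyzed with base R's kruskal.test(). The group labels and scores are invented for illustration.

```r
# Hypothetical scores for three independent groups
# (values and names invented for illustration).
kwData <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 4)),
  score = c(12, 15, 11, 18,   # group A
            20, 17, 23, 25,   # group B
            30, 28, 22, 35)   # group C
)

# The factor column codes group membership; the second column holds the scores.
kwModel <- kruskal.test(score ~ group, data = kwData)
print(kwModel)

# Mean rank per group helps show the direction of any difference.
kwData$rank <- rank(kwData$score)
print(tapply(kwData$rank, kwData$group, mean))
```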
32
Kruskal–Wallis test (example output)
33
Kruskal–Wallis test (example output)
34
Differences between several related groups: Friedman’s ANOVA
Used for testing differences between conditions when there are more than two conditions and the same participants have been used in all conditions. If you have violated some assumption of parametric tests then this test can be a useful way around the problem.
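As a rough sketch, base R's friedman.test() accepts a matrix with one row per participant and one column per condition. The scores and names below are invented for illustration.

```r
# Hypothetical scores: 5 participants (rows) x 3 conditions (columns),
# invented for illustration.
scores <- matrix(
  c(10, 12, 15,
     8, 11,  9,
    14, 16, 20,
     7,  9, 13,
    11, 10, 17),
  ncol = 3, byrow = TRUE,
  dimnames = list(participant = paste0("p", 1:5),
                  condition   = c("c1", "c2", "c3"))
)

# Scores are ranked within each participant, then rank sums are compared
# across conditions.
friedModel <- friedman.test(scores)
print(friedModel)
```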
35
Friedman’s ANOVA (example)
36
Categorical Data
37
Chi-square test; contingency table
There is one problem with the chi-square test: the sampling distribution of the test statistic is only approximately a chi-square distribution. The larger the sample is, the better this approximation becomes, and in large samples the approximation is good enough not to worry about the fact that it is an approximation. However, in small samples the approximation is not good enough, making significance tests of the chi-square statistic inaccurate. This is why you often read that to use the chi-square test the expected frequencies in each cell must be greater than 5 (see section 18.5). When the expected frequencies are greater than 5, the sampling distribution is probably close enough to a perfect chi-square distribution for us not to worry. However, when the expected frequencies are too low, it probably means that the sample size is too small and that the sampling distribution of the test statistic deviates too much from a chi-square distribution to be of any use.
Fisher came up with a method for computing the exact probability of the chi-square statistic that is accurate when sample sizes are small. This method is called Fisher’s exact test.
Therefore, we use ‘expected frequencies’. One way to estimate the expected frequencies would be to say ‘well, we’ve got 200 cats in total, and four categories, so the expected value is simply 200/4 = 50’.
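A minimal sketch of these ideas on a 2 x 2 contingency table, using base R's chisq.test() and fisher.test(). The counts and the row and column labels are invented for illustration; they are only chosen to total 200. The sketch contrasts the naive 200/4 guess with the standard expected frequencies based on the table margins (row total times column total, divided by n), which is what chisq.test() reports.

```r
# Hypothetical counts for 200 cats (invented for illustration).
cats <- matrix(c(30, 50, 10, 110), nrow = 2,
               dimnames = list(reward = c("Food", "Affection"),
                               dance  = c("Yes", "No")))

# Naive guess: spread the 200 cats evenly over the four cells.
naiveExpected <- sum(cats) / length(cats)   # 200 / 4 = 50

# Expected frequencies that respect the margins: row total * column total / n.
expected <- outer(rowSums(cats), colSums(cats)) / sum(cats)

# Pearson chi-square test without the continuity correction.
chiModel <- chisq.test(cats, correct = FALSE)
print(chiModel$expected)   # same values as 'expected' above
print(chiModel)

# Fisher's exact test, an option when expected frequencies are small.
print(fisher.test(cats))
```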
39
Upcoming:
- Proposal due Sep 26, 11:59pm
- Start working on your annotated bibliography
- Post your slides on Piazza after class presentations