1
Chi-square Basics
2
The Chi-square distribution
Positively skewed, but becomes more symmetrical with increasing degrees of freedom
Mean = k, where k = degrees of freedom
Variance = 2k
Assuming a normally distributed dataset and sampling a single $z^2$ value at a time: $\chi^2(1) = z^2$
If more than one… $\chi^2(N) = \sum_{i=1}^{N} z_i^2$
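The relationship between squared z scores and the chi-square distribution can be checked by simulation. The sketch below is illustrative only: the function names, the seed, and the number of draws are assumptions, not part of the slides. It sums k squared standard-normal values many times and compares the sample mean and variance with the stated values k and 2k.

```python
import random
import statistics

def sample_chi_square(k, rng):
    """Draw one chi-square(k) value as the sum of k squared standard-normal z scores."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def simulate(k, n_draws=20000, seed=0):
    """Return the sample mean and sample variance of n_draws simulated chi-square(k) values."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) so the run is repeatable
    draws = [sample_chi_square(k, rng) for _ in range(n_draws)]
    return statistics.fmean(draws), statistics.variance(draws)

for k in (1, 4, 10):
    mean, var = simulate(k)
    print(f"k={k}: mean ~ {mean:.2f} (theory {k}), variance ~ {var:.2f} (theory {2 * k})")
```

The simulated values should land close to k and 2k, with some sampling noise that shrinks as the number of draws grows.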
3
Why used?
Chi-square analysis is primarily used to deal with categorical (frequency) data
We measure the “goodness of fit” between our observed outcome and the expected outcome for some variable
With two variables, we test in particular whether they are independent of one another, using the same basic approach
4
One-dimensional
Suppose we want to know how people in a particular area will vote in general, and we go around asking them. How will we go about seeing what’s really going on?

Republican   Democrat   Other
20           30         10
5
Hypothesis: Dems should win district
Solution: chi-square analysis to determine if our outcome is different from what would be expected if there were no preference
6
Plug in to formula

             Republican   Democrat   Other
Observed     20           30         10
Expected     20           20         20

Expected counts assume no preference: N/3 = 60/3 = 20 per category
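The formula itself did not survive in the slide text. The sketch below assumes the standard goodness-of-fit statistic, the sum over categories of (Observed − Expected)² / Expected, with equal expected counts under the no-preference null. The function name and the alpha = .05 level are assumptions; the slides do not state the level used.

```python
import math

def chi_square_gof(observed, expected=None):
    """Return (chi-square statistic, degrees of freedom) for a one-way table.

    If expected is omitted, equal expected counts (no preference) are assumed.
    """
    n = sum(observed)
    if expected is None:
        expected = [n / len(observed)] * len(observed)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1

observed = [20, 30, 10]  # Republican, Democrat, Other
statistic, df = chi_square_gof(observed)

# 5.991 is the standard tabled critical value for df = 2 at alpha = .05
# (the alpha level is an assumption, not given in the slides).
CRITICAL_DF2_ALPHA_05 = 5.991

# For df = 2 only, the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
print("Reject H0" if statistic > CRITICAL_DF2_ALPHA_05 else "Fail to reject H0")
```

With these counts the statistic is 0 + 5 + 5 = 10 on 2 degrees of freedom, which exceeds 5.991.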
7
Reject H0
The district will probably vote Democratic
However…
8
Conclusion
Note that all we can really conclude is that our data differ from the expected outcome given a situation
Although it would appear that the district will vote Democratic, we can really only conclude that people were not responding by chance
Regardless of the position of the frequencies, we’d have come up with the same result
In other words, it is a non-directional test regardless of the prediction
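The point about the position of the frequencies can be checked directly. This sketch (the helper name is hypothetical) reassigns the counts 20, 30 and 10 to the three categories in every possible order and recomputes the statistic against equal expected counts.

```python
from itertools import permutations

def chi_square_uniform(observed):
    """Chi-square statistic against equal expected counts (no preference)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

labels = ("Republican", "Democrat", "Other")
results = {}
for counts in permutations([20, 30, 10]):
    # Same three counts, assigned to the categories in a different order each time.
    results[counts] = chi_square_uniform(counts)
    print(dict(zip(labels, counts)), "->", results[counts])
```

Every ordering gives the same value, so the statistic cannot tell a Democratic lead from a Republican one. It only says the counts are not what equal preference would produce.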
9
More complex
What do stats kids do with their free time?

          TV   Nap   Worry   Stare at Ceiling
Males     30   40    20      10
Females   20   30    40      10
10
Is there a relationship between gender and what the stats kids do with their free time?
Expected = (Ri*Cj)/N
Example for males TV: (100*50)/200 = 25

          TV   Nap   Worry   Stare at Ceiling   Total
Males     30   40    20      10                 100
Females   20   30    40      10                 100
Total     50   70    60      20                 200
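The expected-frequency rule (Ri*Cj)/N can be applied to every cell at once. In this sketch the function name is an assumption, and the female counts (20, 30, 40, 10) are taken from the observed values and column totals shown elsewhere in the deck.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [
    [30, 40, 20, 10],  # Males:   TV, Nap, Worry, Stare at Ceiling
    [20, 30, 40, 10],  # Females: TV, Nap, Worry, Stare at Ceiling
]
expected = expected_counts(observed)
for label, row in zip(("Males", "Females"), expected):
    print(label, row)
```

Because both row totals are 100, the two rows of expected counts are identical: 25, 35, 30 and 10.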
11
df = (R-1)(C-1)
R = number of rows
C = number of columns

              TV        Nap       Worry     Stare at Ceiling   Total
Males (E)     30 (25)   40 (35)   20 (30)   10 (10)            100
Females (E)   20 (25)   30 (35)   40 (30)   10 (10)            100
Total         50        70        60        20                 200
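Putting the observed and expected counts together gives the test statistic. The sketch below assumes the usual sum of (O − E)² / E over all cells and an alpha level of .05, neither of which is stated explicitly on the slide. The function name is hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected = (Ri * Cj) / N
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)  # df = (R-1)(C-1)
    return statistic, df

observed = [
    [30, 40, 20, 10],  # Males
    [20, 30, 40, 10],  # Females
]
statistic, df = chi_square_independence(observed)

# 7.815 is the standard tabled critical value for df = 3 at alpha = .05
# (the alpha level is an assumption).
CRITICAL_DF3_ALPHA_05 = 7.815

print(f"chi-square = {statistic:.3f}, df = {df}")
print("Reject H0" if statistic > CRITICAL_DF3_ALPHA_05 else "Fail to reject H0")
```

Each row contributes 1 + 25/35 + 100/30 + 0, so the total is about 10.10 on 3 degrees of freedom, above the 7.815 cutoff.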
12
Interpretation
Reject H0: there is some relationship between gender and how stats students spend their free time
13
Other
Important point about the non-directional nature of the test: the chi-square test by itself cannot speak to specific hypotheses about the way the results would come out
Not useful for ordinal data because of this
14
Assumptions
Normality
Rule of thumb is that we need at least 5 for each expected frequency
Inclusion of non-occurrences
Must include all responses, not just the positive ones
Independence
Not that the variables are independent or related (that’s what the test can be used for), but rather, as with our t-tests, that the observations (data points) don’t have any bearing on one another
To help with the last two, make sure that your N equals the total number of people who responded
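Two of these checks are mechanical and can be scripted. The sketch below is illustrative: the function name, the warning wording, and the `n_respondents` parameter are assumptions. It flags any expected count under 5 and any mismatch between the table total and the number of people who responded.

```python
def check_assumptions(table, n_respondents, min_expected=5):
    """Return a list of warnings for an R x C table of observed counts."""
    warnings = []
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # N should equal the number of respondents: each person is counted exactly
    # once, and non-occurrences are not left out of the table.
    if n != n_respondents:
        warnings.append(f"table total {n} != number of respondents {n_respondents}")

    # Rule of thumb from the slide: every expected frequency should be at least 5.
    smallest = min(r * c / n for r in row_totals for c in col_totals)
    if smallest < min_expected:
        warnings.append(f"smallest expected count {smallest:.2f} is below {min_expected}")

    return warnings

observed = [
    [30, 40, 20, 10],  # Males
    [20, 30, 40, 10],  # Females
]
print(check_assumptions(observed, n_respondents=200))  # no warnings expected
print(check_assumptions([[3, 1], [2, 4]], n_respondents=12))
```

A matching N does not prove independence of observations, since that depends on how the data were collected, but a mismatch is a quick sign that someone was counted twice or dropped.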
15
Measures of Association
Contingency coefficient
Phi
Cramer’s Phi
Odds Ratios
Kappa
These were discussed in 5700
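The slide only names these measures. As a rough illustration, the sketch below uses the standard textbook definitions of two of them, which are assumptions here rather than anything confirmed by the deck: the contingency coefficient, sqrt(chi² / (chi² + N)), and Cramér's phi (also called Cramér's V), sqrt(chi² / (N * min(R-1, C-1))). The function names are hypothetical, and the inputs are the chi-square value 212/21 (about 10.10) and N = 200 from the gender by free-time example.

```python
import math

def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient: sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi_square / (chi_square + n))

def cramers_phi(chi_square, n, n_rows, n_cols):
    """Cramer's phi (Cramer's V): sqrt(chi2 / (N * min(R-1, C-1)))."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

chi_square = 212 / 21  # statistic for the 2 x 4 gender by free-time table
n = 200

c = contingency_coefficient(chi_square, n)
v = cramers_phi(chi_square, n, n_rows=2, n_cols=4)
print(f"contingency coefficient = {c:.3f}, Cramer's phi = {v:.3f}")
```

Both values come out a little above 0.2, which suggests that the relationship, although significant, is modest in size.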