Published by Kenneth Ball. Modified over 9 years ago.
1
Chi-square Basics
2
The Chi-square distribution
Positively skewed, but becomes more symmetrical as the degrees of freedom increase
Mean = k, where k = degrees of freedom
Variance = 2k
Assuming a normally distributed dataset and sampling a single $z^2$ value at a time
– $\chi^2_{(1)} = z^2$
– If more than one: $\chi^2_{(N)} = \sum_{i=1}^{N} z_i^2$
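As a quick check of the mean and variance claims, the sketch below simulates a chi-square variable with k degrees of freedom as the sum of k squared standard-normal z scores, then compares the sample mean and variance with k and 2k. The helper name `simulate_chi_square`, the seed, and the number of draws are illustrative assumptions, not part of the slides.

```python
import random
import statistics


def simulate_chi_square(k: int, n_draws: int, seed: int = 0) -> list[float]:
    """Draw n_draws values of a chi-square(k) variable by summing
    k squared standard-normal z scores for each draw (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    ]


draws = simulate_chi_square(k=3, n_draws=100_000)
print(round(statistics.fmean(draws), 2))      # should be close to k = 3
print(round(statistics.pvariance(draws), 2))  # should be close to 2k = 6
```

Rerunning with a larger k should also show the histogram of `draws` becoming less skewed, in line with the first point on the slide.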
3
Why used?
Chi-square analysis is primarily used with categorical (frequency) data
We measure the "goodness of fit" between the observed outcome and the expected outcome for some variable
With two variables, we use the same basic approach to test whether they are independent of one another
4
One-dimensional
Suppose we want to know how people in a particular area will vote in general, so we go around asking them. How do we determine what is really going on?

Republican   Democrat   Other
    20          30        10
5
Hypothesis: Democrats should win the district
Solution: use chi-square analysis to determine whether our outcome differs from what would be expected if there were no preference
6
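Plug in to the formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed and E the expected frequency
With no preference, each party is expected to get 60 / 3 = 20 of the 60 responses

             Republican   Democrat   Other
Observed         20          30        10
Expected         20          20        20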
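The arithmetic for this slide can be sketched in a few lines. `chi_square_gof` is a hypothetical helper name. For the p-value, the sketch relies on the fact that with exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form $e^{-x/2}$; that shortcut does not hold for other df.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [20, 30, 10]                          # Republican, Democrat, Other
n = sum(observed)                                # 60 respondents
expected = [n / len(observed)] * len(observed)   # no preference: 20 each

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1                           # 3 categories - 1 = 2

# Closed-form upper-tail probability, valid only because df == 2
p_value = math.exp(-chi2 / 2)

print(chi2, df, round(p_value, 4))  # 10.0 2 0.0067
```

Since p is well below .05, this matches the decision on the next slide to reject the null hypothesis.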
7
Reject H₀
The district will probably vote Democratic
However…
8
Conclusion
Note that all we can really conclude is that our data differ from the outcome expected under the null hypothesis of no preference
– Although it appears that the district will vote Democratic, we can really only conclude that respondents were not choosing at random
– We would have obtained the same result no matter which categories held which frequencies
– In other words, it is a non-directional test regardless of the prediction
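The non-directional point can be checked directly. In this sketch (the helper `chi_square_gof` is a hypothetical name), every rearrangement of the observed counts 20, 30, 10 against the no-preference expectation of 20 per party produces the same statistic, so the test cannot tell us which party is ahead.

```python
from itertools import permutations


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [20, 20, 20]
stats = {
    perm: chi_square_gof(perm, expected)
    for perm in permutations([20, 30, 10])  # all 6 orderings of the counts
}
for perm, stat in stats.items():
    print(perm, stat)  # every line shows a statistic of 10.0
```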
9
More complex
What do stats kids do with their free time?

           TV   Nap   Worry   Stare at Ceiling
Males      30   40    20      10
Females    20   30    40      10
10
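Is there a relationship between gender and what the stats kids do with their free time?
Expected frequency for the cell in row i, column j: $E_{ij} = \frac{R_i C_j}{N}$, where $R_i$ is the row total, $C_j$ the column total, and N the grand total
Example for males, TV: (100 × 50)/200 = 25

           TV   Nap   Worry   Stare at Ceiling   Total
Males      30   40    20      10                 100
Females    20   30    40      10                 100
Total      50   70    60      20                 200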
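The expected-frequency rule $E_{ij} = R_i C_j / N$ can be applied to the whole table at once. This is a minimal sketch; `expected_frequencies` is a hypothetical helper, and the table is entered as a list of rows in the slide's order.

```python
def expected_frequencies(table):
    """Expected count for each cell under independence: E_ij = (R_i * C_j) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Males, Females; columns: TV, Nap, Worry, Stare at Ceiling
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]
expected = expected_frequencies(observed)
print(expected[0])  # [25.0, 35.0, 30.0, 10.0]
```

Because both row totals are 100, the males and females rows receive identical expected counts.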
11
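df = (R − 1)(C − 1)
– R = number of rows
– C = number of columns
– Here, df = (2 − 1)(4 − 1) = 3

Observed frequencies, with expected frequencies (E) in parentheses:

              TV        Nap       Worry     Stare at Ceiling   Total
Males (E)     30 (25)   40 (35)   20 (30)   10 (10)            100
Females (E)   20 (25)   30 (35)   40 (30)   10 (10)            100
Total         50        70        60        20                 200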
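Putting the pieces together for this table, the sketch sums $(O - E)^2 / E$ over all eight cells and computes df = (R − 1)(C − 1). The observed and expected counts are copied from the slide; the critical value 7.815 is the standard tabled chi-square value for α = .05 with df = 3, hard-coded here as an assumption rather than computed.

```python
# Observed counts and the expected counts shown in parentheses on the slide
observed = [[30, 40, 20, 10], [20, 30, 40, 10]]
expected = [[25, 35, 30, 10], [25, 35, 30, 10]]

# Pearson chi-square: sum of (O - E)^2 / E over every cell
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

n_rows, n_cols = len(observed), len(observed[0])
df = (n_rows - 1) * (n_cols - 1)  # (2 - 1)(4 - 1) = 3

CRITICAL_05_DF3 = 7.815  # tabled critical value, alpha = .05, df = 3
print(round(chi2, 3), df, chi2 > CRITICAL_05_DF3)  # 10.095 3 True
```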
12
Interpretation
Reject H₀: there is some relationship between gender and how stats students spend their free time
13
Other
An important point about the non-directional nature of the test: the chi-square test by itself cannot address specific hypotheses about how the results will come out
Because of this, it is not useful for ordinal data (the statistic ignores the order of the categories)
14
Assumptions
Normality
– Rule of thumb: each expected frequency should be at least 5
Inclusion of non-occurrences
– Must include all responses, not just the positive ones
Independence
– This does not mean that the variables are independent or related (that is what the test can be used to assess), but rather, as with our t-tests, that the observations (data points) have no bearing on one another
To help with the last two, make sure that your N equals the total number of people who responded
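Two of these checks are mechanical enough to script. The sketch below uses hypothetical helper names: `check_expected_counts` flags cells whose expected frequency falls below the rule-of-thumb minimum of 5, and `n_matches_respondents` compares the table total with the number of people who responded. Matching totals is a necessary check, not proof that each person was counted exactly once.

```python
def check_expected_counts(expected, minimum=5):
    """Return (row, col) positions of cells whose expected count is below minimum."""
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


def n_matches_respondents(observed, n_respondents):
    """True if the table total equals the number of people who responded."""
    return sum(sum(row) for row in observed) == n_respondents


# Free-time example from the earlier slides
observed = [[30, 40, 20, 10], [20, 30, 40, 10]]
expected = [[25.0, 35.0, 30.0, 10.0], [25.0, 35.0, 30.0, 10.0]]

print(check_expected_counts(expected))       # [] (every expected count is at least 5)
print(n_matches_respondents(observed, 200))  # True
```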
15
Measures of Association
Contingency coefficient
Phi
Cramer's Phi
Odds Ratios
Kappa
These were discussed in 5700
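As one illustration of these measures, the sketch below computes Cramér's phi (often written Cramér's V) with the standard formula $\sqrt{\chi^2 / (N(\min(R, C) - 1))}$. The helper name is hypothetical. The chi-square value of about 10.095 is computed from the observed and expected counts on slide 11 (N = 200, a 2 × 4 table).

```python
import math


def cramers_phi(chi2, n, n_rows, n_cols):
    """Cramér's phi: sqrt(chi2 / (N * (min(R, C) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Free-time example: chi-square of about 10.095 on a 2 x 4 table, N = 200
print(round(cramers_phi(10.095, 200, 2, 4), 3))  # 0.225
```

For any table with only two rows or two columns, min(R, C) − 1 = 1, so Cramér's phi reduces to ordinary phi, $\sqrt{\chi^2 / N}$.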