The Chi-square Statistic
Calculating Probabilities
Probability
Probability of an event happening = (Number of ways it can happen) / (Total number of outcomes)
Coin Toss Example
A balanced coin flipped in an unbiased way lands heads or tails, each with an equal 50% chance.
Chance of heads = 1 of 2 possible outcomes = 1/2
What if the last 4 coin flips were heads? What is the chance of the next flip resulting in tails? It is still 50%, because each flip is independent of the flips before it.
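One way to explore the coin toss question is a quick simulation. The sketch below is illustrative only: the function name, the number of simulated sequences, and the seed are assumptions, not part of the slides. It keeps only the sequences that start with 4 heads and counts how often the fifth flip is tails.

```python
import random

def estimate_tails_after_streak(n_sequences=200_000, streak=4, seed=0):
    """Estimate P(tails on the next flip | the previous `streak` flips were heads)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    streaks_seen = 0
    tails_after_streak = 0
    for _ in range(n_sequences):
        # Flip streak + 1 fair coins; True means heads.
        flips = [rng.random() < 0.5 for _ in range(streak + 1)]
        if all(flips[:streak]):        # the first `streak` flips were all heads
            streaks_seen += 1
            if not flips[streak]:      # the next flip came up tails
                tails_after_streak += 1
    return tails_after_streak / streaks_seen

estimate = estimate_tails_after_streak()
print(f"Estimated chance of tails after 4 heads: {estimate:.3f}")
```

The estimate should land close to 0.5, since a fair coin has no memory of earlier flips.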
Probability of Failure
Know the odds!
Example: when rolling a die, the chance of your number coming up is 1/6 (about 16.7%).
More importantly, the chance that a number you didn't pick comes up is 1 – 1/6 = 5/6 (about 83.3%).
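The die arithmetic can be checked with exact fractions. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

sides = 6                      # a standard six-sided die
p_hit = Fraction(1, sides)     # chance that your chosen number comes up
p_miss = 1 - p_hit             # chance that any other number comes up

print(f"P(your number)  = {p_hit} = {float(p_hit):.1%}")
print(f"P(other number) = {p_miss} = {float(p_miss):.1%}")
```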
The Chi-square Test
The Chi-square test checks whether the observed results match the expected results.
As with the dice rolls: if you rolled a die 100 times, did you indeed observe each number about 1/6 of the time?
You can compare the observed values with the expected values in the test to see whether the die is faulty or loaded.
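To make the die example concrete, the sketch below simulates 100 rolls of a fair die and lines up the observed count for each face against the expected count of 100/6. The function name and the seed are assumptions made for illustration, and a simulated die stands in for a real one.

```python
import random
from collections import Counter

def roll_and_tabulate(n_rolls=100, sides=6, seed=1):
    """Roll a simulated fair die and return (observed, expected) counts per face."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    rolls = [rng.randint(1, sides) for _ in range(n_rolls)]
    counts = Counter(rolls)
    observed = [counts[face] for face in range(1, sides + 1)]
    expected = [n_rolls / sides] * sides  # a fair die gives each face 1/6 of the rolls
    return observed, expected

observed, expected = roll_and_tabulate()
for face, (o, e) in enumerate(zip(observed, expected), start=1):
    print(f"face {face}: observed {o:3d}, expected {e:.1f}")
```

The observed counts will rarely equal the expected counts exactly; the test asks whether the gap is larger than chance alone would explain.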
Goodness of fit
This test is used to decide whether the observed (experimental) values differ from the expected (theoretical) values.
Goodness of Fit
Free from Assumptions
The chi-square goodness of fit test depends only on the observed and expected frequencies and the degrees of freedom.
It needs no assumption about the distribution of the parent population from which the samples are taken.
Because it does not involve any population parameters or characteristics, it is called a non-parametric or distribution-free test.
It can be applied to samples of many sizes, although the chi-square approximation becomes unreliable when expected frequencies are small (a common rule of thumb asks for at least 5 per category).
It is generally performed on discrete (count) data.
It is all about expectations
O_i = an observed frequency (i.e. count) for measurement i
E_i = an expected (theoretical) frequency for measurement i, asserted by the null hypothesis
Another way to look at it
The value of the Chi-squared statistic = the sum, over all categories, of (square of the difference between observed and expected value) / (expected value):
$\chi^2 = \sum_i (O_i - E_i)^2 / E_i$
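The statistic described above can be sketched as a short function. The function name and the sample counts (60 made-up die rolls) are illustrative assumptions, not data from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up example: 60 rolls of a die, so a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.2f}")
```

Here the squared differences are 4, 4, 1, 1, 0, 0, each divided by 10, so the statistic is about 1.0. Larger values mean the observed counts sit further from what the null hypothesis predicts.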
Expected Value
$E_i = N \left( F(Y_u) - F(Y_l) \right)$
F = the cumulative distribution function for the distribution being tested
Y_u = the upper limit for class i (maximum possible observations for any category)
Y_l = the lower limit for class i (minimum possible observations for any category)
N = the sample size
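These symbols are normally combined as E_i = N * (F(Y_u) - F(Y_l)): the sample size times the probability that the tested distribution assigns to class i. The sketch below assumes that formula and, purely for illustration, a continuous uniform distribution on [0, 4] split into four equal classes; both function names are hypothetical.

```python
def expected_frequencies(cdf, class_limits, n):
    """E_i = n * (F(upper_i) - F(lower_i)) for each (lower, upper) class."""
    return [n * (cdf(upper) - cdf(lower)) for lower, upper in class_limits]

def uniform_cdf(y, low=0.0, high=4.0):
    """CDF of a continuous uniform distribution on [low, high] (assumed for the example)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)

class_limits = [(0, 1), (1, 2), (2, 3), (3, 4)]  # four equal-width classes
expected = expected_frequencies(uniform_cdf, class_limits, n=50)
print(expected)
```

With 50 observations and four equally likely classes, each class expects 12.5 observations.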
Hypothesis testing
Choose a level of alpha, usually 0.05.
This means accepting a 5% chance of rejecting the null hypothesis when it is actually true, which corresponds to a 95% confidence level.
Degrees of Freedom = Number of groups – 1
Example: The number of cubs delivered by bears in a wild population is tested to see whether twins are no more likely than any other litter size, i.e. the null hypothesis is that 0, 1, 2, and 3 cubs are equally likely. (N = 50 females)

Number of cubs   0      1      2      3
Observed         1      5      35     9
Expected         12.5   12.5   12.5   12.5

df = 4 – 1 = 3
CHI-SQUARE DISTRIBUTION TABLE
Decision Rule
Based on the alpha and the degrees of freedom, look up the critical value in the table.
For our example, alpha = 0.05 and df = 3, so the critical value is 7.82.
If chi-square is greater than 7.82, reject the null hypothesis that every litter size (0 to 3 cubs) is equally likely.
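The table lookup and comparison can be sketched in a few lines. The dictionary below holds standard upper-tail critical values for alpha = 0.05 only, for 1 to 5 degrees of freedom; the names `CRITICAL_VALUES_05` and `reject_null` are hypothetical.

```python
# Upper-tail chi-square critical values for alpha = 0.05, keyed by degrees
# of freedom (standard table values, rounded to 3 decimals).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def reject_null(chi_square, df, table=CRITICAL_VALUES_05):
    """Return True when the statistic exceeds the critical value for `df`."""
    if df not in table:
        raise KeyError(f"no critical value stored for df={df}")
    return chi_square > table[df]

print(reject_null(8.0, df=3))  # 8.0 is above 7.815
print(reject_null(7.0, df=3))  # 7.0 is below 7.815
```

A fuller table, or a statistics library, would be needed for other alpha levels or more degrees of freedom.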
Calculate the value

Number of cubs   0      1      2      3
Observed         1      5      35     9
Expected         12.5   12.5   12.5   12.5

Chi-square = (1-12.5)^2/12.5 + (5-12.5)^2/12.5 + (35-12.5)^2/12.5 + (9-12.5)^2/12.5 = 10.58 + 4.5 + 40.5 + 0.98 = 56.56
Since 56.56 > 7.82, we reject the null hypothesis that every litter size from 0 to 3 cubs is equally likely.
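The same arithmetic can be reproduced in code, using the four observed counts that appear in the calculation (1, 5, 35, and 9 females with 0, 1, 2, and 3 cubs). The sketch also reports how much each litter size contributes to the total, which can help when reading the result; the variable names are illustrative.

```python
# Observed number of females by litter size (0 to 3 cubs), from the example.
observed = {0: 1, 1: 5, 2: 35, 3: 9}

n_females = sum(observed.values())     # 50 females in total
expected = n_females / len(observed)   # equal chances: 12.5 per litter size

# Each category's term in the chi-square sum.
contributions = {
    cubs: (count - expected) ** 2 / expected
    for cubs, count in observed.items()
}
chi_square = sum(contributions.values())

for cubs, part in contributions.items():
    print(f"{cubs} cubs: {part:6.2f} ({part / chi_square:.0%} of total)")
print(f"chi-square = {chi_square:.2f}")

largest = max(contributions, key=contributions.get)
```

Most of the statistic comes from the 2-cub category, where far more litters were observed than equal chances would predict.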
Interpret the result Since we rejected the null hypothesis, what conclusions (inferences) can we come to?
Two-Way Table Method

Observed        Column 1               Column 2               Row Totals
Row 1                                                         Row 1 Total (R1T)
Row 2                                                         Row 2 Total (R2T)
Column Totals   Column 1 Total (C1T)   Column 2 Total (C2T)   Grand Total (GT)

Each value in the expected values table is calculated by multiplying the row total by the column total and dividing by the grand total, for each cell's location.

Expected   Column 1     Column 2
Row 1      R1T*C1T/GT   R1T*C2T/GT
Row 2      R2T*C1T/GT   R2T*C2T/GT
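The row total times column total over grand total rule can be sketched as a small function that works for any number of rows and columns. The function name and the sample counts are made up for illustration.

```python
def expected_counts(observed):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]  # zip(*) walks the columns
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Made-up 2x2 table: row totals 30 and 70, column totals 40 and 60.
observed = [[10, 20],
            [30, 40]]
expected = expected_counts(observed)
for row in expected:
    print(row)
```

For this table the expected counts are 12, 18, 28, and 42, and they add up to the same row, column, and grand totals as the observed counts.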
2-Way Chi-Square
Conditions: simple random samples, categorical data
Degrees of freedom equals (number of rows minus 1) times (number of columns minus 1): DF = (r – 1) * (c – 1)
Test statistic is calculated as before, but this time summed over every cell of the table: $\chi^2 = \sum_{r,c} (O_{r,c} - E_{r,c})^2 / E_{r,c}$
P-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one calculated.
Two-Way Table Example

Observed        Democrat   Republican   Row Totals
Male            20         30           50
Female          30         20           50
Column Totals   50         50           100

Each value in the expected values table is calculated by multiplying the row total (50) by the column total (50) and dividing by the grand total (100), for each cell's location.

Expected   Democrat   Republican
Male       25         25
Female     25         25

Calculating the Chi-square statistic: ((20-25)^2/25) + ((30-25)^2/25) + ((30-25)^2/25) + ((20-25)^2/25), or (25/25) + (25/25) + (25/25) + (25/25), or 1 + 1 + 1 + 1, or 4.
Compare Chi-square to table
For the example, Chi-square = 4.
The degrees of freedom are (2 – 1) * (2 – 1) = 1.
The critical value for alpha = 0.05 and df = 1 is 3.841.
Since 4 > 3.841, we can reject the null hypothesis that political party is independent of gender at the 0.05 significance level.
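As a cross-check on the table lookup, the whole gender and party example can be run end to end, including an exact p-value. The sketch assumes the closed form that holds for 1 degree of freedom only, where the upper-tail probability equals erfc(sqrt(x / 2)); other degrees of freedom need a different formula or a statistics library. No continuity correction is applied, to match the hand calculation.

```python
import math

# Observed counts: rows are Male, Female; columns are Democrat, Republican.
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square}, df = {df}, p-value = {p_value:.4f}")
```

The p-value comes out a little under 0.05, which agrees with the statistic of 4 sitting just above the critical value of 3.841.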