1
Dr. Richard Jackson jackson_r@mercer.edu
Chi Square (χ²)
Dr. Richard Jackson
Chi square is a very important statistic in the medical and clinical literature. The symbol for chi square is the Greek letter chi with the squared sign, χ².
© Mercer University 2005 All Rights Reserved
2
Description
Non-Parametric
Data Expressed as Frequencies or % Converted to Frequencies
Cells Must Be Independent
Compares “Observed” Frequencies to “Expected” Frequencies
Chi square is a nonparametric statistic in that it does not involve the calculation of means or standard deviations. As stated earlier, nonparametric statistics are not as powerful as parametric statistics like the t test and analysis of variance. Chi square is the appropriate statistic when the data are expressed as frequencies, or as percentages that can be converted into frequencies. The data are arranged in individual cells, and the cells must be independent, meaning that the measurement of any one individual must not appear in more than one cell. Chi square compares the observed frequencies to the expected frequencies. That is, it compares the data observed in the study to what we would expect to occur by chance alone if there were no difference between the groups being investigated.
3
“Observed” Versus “Expected”
10 Tosses: 7H, 3T (observed)
Expected = 5H, 5T
Determine Probability of 7H + 3T by chance
To get a better understanding of observed versus expected frequencies, let's consider the example of tossing a coin 10 times. The observed data might be 7 heads and 3 tails. What would be expected by chance alone is 5 heads and 5 tails. We can determine the probability of getting 7 heads and 3 tails by chance alone, and in that way we compare the expected values to the observed values.
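The probability mentioned above can be worked out directly from the binomial distribution. The sketch below assumes a fair coin (probability 0.5 of heads on each toss); the helper name `prob_heads` is illustrative and not part of the lecture.

```python
from math import comb

def prob_heads(k, n, p=0.5):
    """Binomial probability of exactly k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance of exactly the observed result: 7 heads and 3 tails
p_exactly_7 = prob_heads(7, 10)

# Chance of a result at least this lopsided toward heads: 7, 8, 9 or 10 heads
p_7_or_more = sum(prob_heads(k, 10) for k in range(7, 11))

print(round(p_exactly_7, 4))  # 0.1172
print(round(p_7_or_more, 4))  # 0.1719
```

Either way the observed split is not unusual for a fair coin, which is the kind of judgment the chi square test formalizes for tables of counts.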
4
Clinical Example: Observed Data
            Sick   Not Sick   Total
Drug           5         45      50
Placebo       10         40      50
Total         15         85     100

Let's consider a clinical example, one we described early on in the course. It was the study where we tested the difference between a drug and a placebo in preventing motion sickness. We divided 100 people into two groups of 50 each. Half got the drug and half got the placebo, and we determined how many got sick in each group. 5 of 50 people in the drug group got sick and 10 of 50 in the placebo group got sick. This represents our observed data.
5
Clinical Example: Expected (If no difference)
         S      NS
D      7.5    42.5
P      7.5    42.5

Our expected data are calculated by determining what we would expect to occur by chance alone, meaning if there were no difference between the drug group and the placebo group. Note that 15 people in the entire study got sick. If there were no difference between the groups, we would expect half of those to come from the drug group and half from the placebo group. Half of 15 is 7.5, so the expected frequency for the sick column is 7.5 in each group. The same is true for the 85 people who did not get sick: half of 85 is 42.5 in each group. These are our expected values, that which we would expect to occur by chance alone if there were no difference between the groups being investigated.
6
Null Hypothesis for Chi Square
Observed Frequencies Do Not Differ from Those Expected to Occur by Chance
Calculated Chi Square (like F and t) Has a p Value
χ² from Clinical Example = 1.96, p = 0.0616
Conclusion: Accept Ho (no difference)
The null hypothesis for chi square analysis is not so easily expressed in terms of symbols and equations. Basically, the null hypothesis is that the observed values do not differ significantly from the expected values, where the expected values are those that we expect to occur by chance alone. In chi square analysis we calculate a value for chi square and, like the F and the t we have already discussed, the chi square value has a p value associated with it. In the clinical example we just discussed, the calculated value for chi square comes out to be 1.96, which has a p value of 0.0616. Assuming that we had selected, a priori, a maximum significance level of 0.05, we would be led to accept the null hypothesis because our p value is greater than 0.05. Our conclusion would be that our observed values do not differ significantly from our expected values, and that there is no statistically significant difference between the drug group and the placebo group. Even though at first blush 5/50 looks different from 10/50, the difference is not statistically significant.
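As a rough check, the sketch below recomputes chi square for the motion sickness table from its four cells. The function names are illustrative, and the p value uses a closed form that holds only for 1 degree of freedom. Run as written, it reproduces the chi square of 1.96 but gives a p value of roughly 0.16 rather than the 0.0616 printed on the slide; both figures are above 0.05, so the conclusion is the same.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi square for a 2 x 2 table of observed counts."""
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count
            total += (obs - exp) ** 2 / exp
    return total

def p_value_df1(chi_square):
    """Upper-tail probability of chi square with 1 degree of freedom."""
    return erfc(sqrt(chi_square / 2))

observed = [[5, 45], [10, 40]]  # rows: drug, placebo; columns: sick, not sick
chi2 = chi_square_2x2(observed)
p = p_value_df1(chi2)

print(round(chi2, 2))  # 1.96
print(p > 0.05)        # True
```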
7
Contingency Table
Data in Frequencies
Example above is 2 x 2
2 x 2 Table AKA Fourfold Table
The data in chi square analyses are contained in tables that are known as contingency tables. The data are expressed in frequencies, and the example we just went through is referred to as a 2 x 2 table. Another name for that is a fourfold table.
8
In 2 x 2 Table: No Expected Frequency < 5; If So, Use Fisher Exact
> 2 x 2: if > 20% of expected cells < 5, or if any of the expected cells < 1, then collapse table (combine adjacent categories) or use correction formula
In a study involving a 2 x 2 table, no expected frequency should be less than 5. If one or more of the expected frequencies is less than 5, it is not appropriate to use chi square in the analysis. Instead, another statistic known as the Fisher exact test should be used. For a contingency table that is larger than 2 x 2, if more than 20% of the expected cells are less than 5, or if any of the expected cells is less than 1, then it is necessary either to collapse the table, combining adjacent categories to bring the cell values up, or to use a correction formula.
9
Example of Fisher Exact
Observed     0    ≥ 1
P            6      2
D            1      7

This is an example of the use of the Fisher exact test. We have two groups of patients, a placebo group and a drug group, with 8 in each. What we are observing is how many had zero seizures (left-hand column) versus how many had one or more seizures. In the placebo group, 6 had zero seizures and 2 had one or more seizures; in the drug group the counts are 1 and 7 respectively.
10
Example of Fisher Exact
Expected     0    ≥ 1
P          3.5    4.5
D          3.5    4.5

The expected values are shown on this slide; you may want to change the label on your copy from Observed to Expected. Note that all 4 of the cells have expected values less than 5. Therefore, chi square is not the appropriate statistic to use.
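The rule of thumb from the previous slide can be applied mechanically. This is a minimal sketch with hypothetical helper names (`expected_counts`, `chi_square_advice`); it derives the expected counts from the row and column totals and then applies the thresholds stated in the lecture.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_advice(table):
    """Apply the lecture's small-expected-count rules to a table of counts."""
    cells = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        # 2 x 2 table: any expected count below 5 rules out chi square
        return "fisher exact" if min(cells) < 5 else "chi square"
    # Larger tables: more than 20% of cells below 5, or any cell below 1
    share_small = sum(e < 5 for e in cells) / len(cells)
    if share_small > 0.20 or min(cells) < 1:
        return "collapse categories or use a correction"
    return "chi square"

seizures = [[6, 2], [1, 7]]  # rows: placebo, drug; columns: 0, one or more
print(expected_counts(seizures))    # [[3.5, 4.5], [3.5, 4.5]]
print(chi_square_advice(seizures))  # fisher exact
```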
11
Fisher Exact
p = 0.04
Reject Ho
The appropriate statistic to use is the Fisher exact test, which in this case has a p value of 0.04. The conclusion is that we reject the null hypothesis, since the p value is less than 0.05. We conclude that there is a significant difference, with more seizures among individuals in the drug group than in the placebo group.
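The Fisher exact p value can be reproduced by summing hypergeometric probabilities. The sketch below assumes a two-sided test that adds up every table with the same row and column totals that is no more probable than the observed one. That is a common convention, but the lecture does not say which variant the software uses, and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the upper left cell equals x, with margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

seizures = [[6, 2], [1, 7]]  # rows: placebo, drug; columns: 0, one or more
p = fisher_exact_two_sided(seizures)
print(round(p, 2))  # 0.04
```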
12
Calculation of Chi Square (Example)
            Died   Survived
Drug           4         51
Placebo       14         38

Let's take a look at how we might calculate chi square by hand, without using the computer. We have a study involving a drug group and a placebo group, recording how many in each group died versus survived. There are 55 people in the drug group and 52 people in the placebo group, and we observed that 4/55 died in the drug group and 14/52 died in the placebo group.
13
Hand Calculation of Chi Square (Example)
Calculate Expected Value for Each Cell
(Row Total x Column Total) / N
The first step in the analysis is to calculate the expected value for each cell. We have 4 cells here, so we calculate an expected value for each one by multiplying the cell's row total by the cell's column total and dividing by N. Let's go through that calculation in the next slide.
14
Determination of Expected Values
Upper Left Cell     (55)(18)/107     9.25
Upper Right Cell    (55)(89)/107    45.75
Lower Left Cell     (52)(18)/107     8.75
Lower Right Cell    (52)(89)/107    43.25

For the upper left cell, the row total is 55 and the column total is 18. Multiplying those two together and dividing by the total number of subjects in the study, 107, gives us an expected value for that cell of 9.25. For the upper right cell it is 55 times 89 divided by 107, giving us a value of 45.75, and so on for the other two cells. Note that we cannot use the same procedure here as we did before, that is, simply dividing the column total by two, because we have an unequal number of subjects in our drug group and our placebo group.
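The same arithmetic can be sketched in a few lines. The variable names are illustrative; the counts are the observed died and survived figures from the example.

```python
# Observed counts from the example: (died, survived) for each group
observed = [
    [4, 51],   # drug
    [14, 38],  # placebo
]

row_totals = [sum(row) for row in observed]        # [55, 52]
col_totals = [sum(col) for col in zip(*observed)]  # [18, 89]
n = sum(row_totals)                                # 107

# Expected count for a cell = row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
# [9.25, 45.75]
# [8.75, 43.25]
```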
15
Calculation of Chi Square
χ² = Σ (O − E)² / E
Example for Upper Left Cell: (O − E)² / E = (4 − 9.25)² / 9.25 = 2.98
Repeat for Each Cell and Sum
χ² = 7.38
To calculate the value of chi square, we carry out the same calculation for each of the four cells and add the 4 results together. For each cell we take the observed value minus that cell's expected value, square it, and divide by that cell's expected value. For the upper left cell, we take the observed value 4 minus the expected value 9.25, square that, then divide by the cell's expected value, giving us the value 2.98. We repeat this process for each of the four cells and then add the four numbers. When we do that, we end up with a chi square value of 7.38 for this analysis.
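A short sketch of the cell-by-cell sum follows. It keeps the expected counts unrounded, which is an assumption about how the 7.38 was obtained; variable names are illustrative.

```python
observed = [[4, 51], [14, 38]]  # rows: drug, placebo; columns: died, survived

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

contributions = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count, unrounded
        contributions.append((o - e) ** 2 / e)  # (O - E)^2 / E for this cell

chi_square = sum(contributions)
print(round(contributions[0], 2))  # 2.98 (upper left cell)
print(round(chi_square, 2))        # 7.38
```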
16
Chi Square Test Using Tabled Values (See Table I)
Compare Calculated Value With Tabled Value
If Calculated Value > Tabled Value, Reject Null Hypothesis
If Calculated Value < Tabled Value, Accept Null Hypothesis
Then, as we did for the t test analysis, we compare our calculated value to our table value. If our calculated value exceeds the table value, we reject the null hypothesis. If it is less than our table value, we accept our null hypothesis. Take a look at the table of chi square values associated with this module. The calculated value in our example is 7.38, and we want to compare it to a table value.
17
Example
Calculated Value = 7.38
df = (R − 1)(C − 1) = 1
Level of Significance = 0.05
Tabled Value = 3.841
Reject Null Hypothesis (p < 0.05)
To determine the table value, we enter the table at the appropriate degrees of freedom. The df for chi square is given by the formula (R − 1)(C − 1), that is, rows minus 1 times columns minus 1. In this case we had 2 rows and 2 columns in our data, so our degrees of freedom is equal to 1. Reading across the top of the table, we choose the level of significance, which is traditionally 0.05. Therefore, our table value is 3.841. Our calculated value was 7.38. Since the calculated value exceeds the table value, we reject the null hypothesis at p less than 0.05. In other words, the probability of committing a type 1 error is less than 5 out of 100. Therefore we can be confident in saying that there is a statistically significant difference between the drug and placebo groups.
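The table lookup amounts to a simple comparison, sketched below with illustrative names. The critical value 3.841 is the tabled value quoted above. The p value line is an extra that uses a closed form valid only for 1 degree of freedom, and it is not a figure taken from the lecture.

```python
from math import erfc, sqrt

def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (R - 1)(C - 1)."""
    return (n_rows - 1) * (n_cols - 1)

CRITICAL_05_DF1 = 3.841  # tabled chi square value for df = 1, alpha = 0.05

calculated = 7.38
df = degrees_of_freedom(2, 2)
reject_null = calculated > CRITICAL_05_DF1

# Upper-tail probability for df = 1 only (not a general formula)
p = erfc(sqrt(calculated / 2))

print(df)           # 1
print(reject_null)  # True
print(p < 0.05)     # True
```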
18
Statistix Calculation (See Table II)
Expected Values
Chi Square
Provides p-value
If you take a look at Table II, you will see the results of the Statistix software calculation for chi square. The analysis provides the observed value for each cell, the expected value for each cell, and the chi square calculation. At the bottom of the page you get the overall chi square value, 7.38, and the p value for this example. We would reject the null hypothesis and conclude that there was a statistically significant difference, since the reported p value is of course less than 0.05.
19
Chi Square in > 2 x 2 Table
       MI    SI    NI
D      60    32    28
P      28    17    45

df = 2, χ² = 16.27, p = …
This slide illustrates the use of chi square in a contingency table that is larger than 2 x 2. In this situation we are testing the difference between a drug group and a placebo group as to whether the patients were moderately improved (MI), significantly improved (SI), or not improved (NI). As you see, the data are expressed in frequencies. The df is equal to 2 in this case. The calculated value comes out to be 16.27, and its corresponding p value leads us to reject the null hypothesis and conclude that there is a statistically significant difference between the drug group and the placebo group with regard to improvement levels, with more of the drug group having moderate and significant improvement and fewer having no improvement.
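The same procedure extends to any number of rows and columns. The sketch below is a minimal version with illustrative names. It assumes the flattened slide reads as Drug = 60, 32, 28 and Placebo = 28, 17, 45; with those counts it gives a chi square of about 16.2, close to the 16.27 on the slide (the small gap may come from rounding or from how the table was read). The p value uses a closed form that holds only for 2 degrees of freedom.

```python
from math import exp

def chi_square(table):
    """Pearson chi square for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for row, r in zip(table, row_totals):
        for obs, c in zip(row, col_totals):
            expected = r * c / n
            total += (obs - expected) ** 2 / expected
    return total

improvement = [
    [60, 32, 28],  # drug: MI, SI, NI (as read from the slide)
    [28, 17, 45],  # placebo: MI, SI, NI
]

chi2 = chi_square(improvement)
df = (len(improvement) - 1) * (len(improvement[0]) - 1)
p = exp(-chi2 / 2)  # upper-tail probability, valid for df = 2 only

print(df)         # 2
print(p < 0.05)   # True
```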
20
Goodness of Fit
Expected Values Not Calculated As Above
Expected Values Known
Another type of chi square is referred to as goodness of fit. In this chi square analysis the expected values are not calculated from the table. Instead, the expected values are already known from previous experience or other studies. The following slide gives us an example of that.
21
Example of Goodness of Fit
200 Cancer Patients by Blood Type

              O      A      B     AB
Observed     76    104     16      4

In this study, it was observed that among 200 cancer patients an inordinately large number had type A blood. The question was raised as to whether the distribution of these 200 cancer patients by blood type differs from what we would expect in the normal population. Here we see the observed data for 200 patients who had this particular type of cancer. The researcher compared these observed frequencies to what we would expect to occur by chance in the normal population, to see whether a significantly greater number of people with this type of cancer also had type A blood.
22
“Expected” Blood Type of Population
200 Cancer Patients by Blood Type

              O      A      B     AB
Expected     90     80     24      6

χ² = 12.3, p < 0.01, Reject Ho
It is known that 45% of the population has type O blood, so out of 200 we would expect 90 to have type O. We would expect 40% to have type A, so 80 out of 200; 12% to have type B, so 24; and 3% to have type AB, so 6. If we compare our observed values to our expected values and calculate chi square, it comes out to be 12.3, which has a p value of less than 0.01. Therefore we reject the null hypothesis and conclude that among cancer patients there is an inordinately large number of individuals with type A blood. This may be reason for additional research into the relationship that might exist between type A blood and this particular type of cancer.
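A goodness of fit calculation uses the same sum of (O − E)² / E, only with the expected counts supplied rather than derived from the table. The sketch below uses the observed and expected counts from the two slides. Two things in it are assumptions not stated in the lecture: the degrees of freedom are taken as the number of categories minus 1, and 11.345 is the standard tabled chi square value for 3 degrees of freedom at the 0.01 level. With the counts as transcribed, the sum comes to about 12.7 rather than the slide's 12.3; both exceed the critical value, so the decision is the same.

```python
observed = {"O": 76, "A": 104, "B": 16, "AB": 4}
expected = {"O": 90, "A": 80, "B": 24, "AB": 6}  # 45%, 40%, 12%, 3% of 200

# Sum of (O - E)^2 / E over the four blood types
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

df = len(observed) - 1    # assumption: categories minus 1
CRITICAL_01_DF3 = 11.345  # standard tabled value, df = 3, alpha = 0.01

reject_null = chi_square > CRITICAL_01_DF3
print(df)           # 3
print(reject_null)  # True
```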
23
Summary of Chi Square
Data Expressed as Frequencies
Compares Observed to Expected
To summarize, chi square is the statistic of choice for determining whether there is a difference between groups when the data are expressed as frequencies, that is, nominal data. Basically, it compares the observed data to what we would expect to occur by chance alone if there were no difference between the groups being observed.
24
How to Use Statistix to Analyze Data with Chi Square
Select Statistics, Association Tests, Two by Two Tables
Enter cell counts then OK
Read Chi Square Value and p (Pearson’s)
Use Statistix to Work Problems at End of Module
To analyze data using the Statistix software package, simply select Statistics, Association Tests, Two by Two Tables, enter each cell count, then click OK. You can read the chi square value and its corresponding p value in the lower left-hand portion of the table. Now take a look at the problems at the end of this module and use Statistix to work some of them.