Download presentation
Presentation is loading. Please wait.
1
PRESENTED BY: Dhruv J. Patel M. Pharm. 1 nd sem.(2017-2018) 1 DHRUV J PATEL
2
A chi-square test, also written as χ² test, is any statistical hypothesis test wherein the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. The chi-square test is an important test amongst the several tests of significance developed by statisticians. It is was developed by Karl Pearson in1900. 2 DHRUV J PATEL
3
CHI SQUARE TEST is a non parametric test not based on any assumption or distribution of any variable. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. This statistical test follows a specific distribution known as chi square distribution. In general The test we use to measure the differences between what is observed and what is expected according to an assumed hypothesis is called the chi-square test. 3 DHRUV J PATEL
4
The term "non-parametric" refers to the fact that the chi-square tests do not require assumptions about population parameters nor do they test hypotheses about population parameters. Some examples of hypothesis tests, such as the t tests and ANOVA, are parametric tests and they do include assumptions about parameters and hypotheses about parameters. 6 4 DHRUV J PATEL
5
The most obvious difference between the chi- square tests and the other hypothesis tests we have considered (t and ANOVA) is the nature of the data. For chi-square, the data are frequencies rather than numerical scores. 7 5 DHRUV J PATEL
6
1) PARAMETRIC TEST: The test in which, the population constants like mean,std deviation, std error, correlation coefficient, proportion etc. and data tend to follow one assumed or established distribution such as normal, binomial, poisson etc. 2) NON PARAMETRIC TEST: the test in which no constant of a population is used. Data do not follow any specific distribution and no assumption are made in these tests. E.g. to classify good, better and best we just allocate arbitrary numbers or marks to each category. 6 DHRUV J PATEL
7
3) HYPOTHESIS: It is a definite statement about the population parameters. 4) NULL HYPOTHESIS: (H0) states that no association exists between the two cross- tabulated variables in the population, and therefore the variables are statistically independent. E.g. if we want to compare 2 methods method A and method B for its superiority, and if the assumption is that both methods are equally good, then this assumption is called as NULL HYPOTHESIS. 7 DHRUV J PATEL
8
5) ALTERNATIVE HYPOTHESIS: (H1) proposes that the two variables are related in the population. If we assume that from 2 methods, method A is superior than method B, then this assumption is called as ALTERNATIVE HYPOTHESIS. 6) DEGREE OF FREEDOM: It denotes the extent of independence (freedom) enjoyed by a given set of observed frequencies Suppose we are given a set of n observed frequencies which are subjected to k independent constraints(restrictions) then, d.f. = (number of frequencies) – (number of independent constraints on them) In other terms, df = (r – 1)(c – 1) where r = the number of rows c = the number of columns. 8 DHRUV J PATEL
9
7) CONTINGENCY TABLE: When the table is prepared by enumeration of qualitative data by entering the actual frequencies, and if that table represents occurance of two sets of events, that table is called the contingency table. (Latin, con- together, tangere- to touch). It is also called as an association table. 9 DHRUV J PATEL
10
This test (as a non-parametric test) is based on frequencies and not on the parameters like mean and standard deviation. The test is used for testing the hypothesis and is not useful for estimation. This test can also be applied to a complex contingency table with several classes and as such is a very useful test in research work. This test is an important non-parametric test as no rigid assumptions are necessary in regard to the type of population, no need of parameter values and relatively less mathematical details are involved. 10 DHRUV J PATEL
11
1. Although test is conducted in terms of frequencies it can be best viewed conceptually as a test about proportions. 2. χ 2 test is used in testing hypothesis and is not useful for estimation. 3. Chi-square test can be applied to complex contingency table with several classes. 4. Chi-square test has a very useful property i.e., ‘the additive property’. If a number of sample studies are conducted in the same field, the results can be pooled together. This means that χ 2 -values can be added. 11 DHRUV J PATEL
12
1. Quantitative data. 2. One or more categories. 3. Independent observations. 4. Adequate sample size (at least 10). 5. Simple random sample. 6. Data in frequency form. 7. All observations must be used. 12 DHRUV J PATEL
13
1) Goodness of fit of distributions 2) Test of independence of attributes 3) Test of homogenity. 13 DHRUV J PATEL
14
This is used when you have one independent variable, and you want to compare an observed frequency-distribution to a theoretical expected frequency-distribution. For the example described above, there is a single independent variable (in this example “age group”) with a number of different levels (17-20, 21-30, 31-40, 41-50, 51-60 and over 60). 14 DHRUV J PATEL
15
The statistical question is: do the frequencies you actually observe differ from the expected frequencies by more than chance alone? In this case, we want to know whether or not our observed frequencies of traffic accidents occur equally frequently for the different ages groups (so that our theoretical frequency-distribution contains the same number of individuals in each of the age bands). This test enables us to see how well does the assumed theoretical distribution (such as Binomial distribution, Poisson distribution or Normal distribution) fit to the observed data. 15 DHRUV J PATEL
16
Where: O is the observed frequency, and E is the expected frequency. 16 DHRUV J PATEL
17
The chi-square test for goodness-of-fit uses frequency data from a sample to test hypotheses about the shape or proportions of a population. Each individual in the sample is classified into one category on the scale of measurement. The data, called observed frequencies, simply count how many individuals from the sample are in each category. 17 DHRUV J PATEL
18
This test can also be used to test whether the occurance of events follow uniformity or not e.g. the admission of patients in government hospital in all days of week is uniform or not can be tested with the help of chi square test. χ² (calculated) < χ² (tabulated), then null hypothesis is accepted, and it can be concluded that there is a uniformity in the occurance of the events. (uniformity in the admission of patients through out the week). 18 DHRUV J PATEL
19
Remember, qualitative data is where you collect data on individuals that are categories or names. Then you would count how many of the individuals had particular qualities. An example is that there is a theory that there is a relationship between breastfeeding and autism. To determine if there is a relationship, researchers could collect the time period that a mother breastfed her child and if that child was diagnosed with autism. 19 DHRUV J PATEL
20
Then you would have a table containing this information. Now you want to know if each cell is independent of each other cell. Remember, independence says that one event does not affect another event. Here it means that having autism is independent of being breastfed. What you really want is to see if they are not independent. In other words, does one affect the other? If you were to do a hypothesis test, this is your alternative hypothesis and the null hypothesis is that they are independent. 20 DHRUV J PATEL
21
There is a hypothesis test for this and it is called the Chi-Square Test for Independence. Technically it should be called the Chi-Square Test for Dependence, but for historical reasons it is known as the test for independence. Just as with previous hypothesis tests, all the steps are the same except for the assumptions and the test statistic. 21 DHRUV J PATEL
22
In probability theory and statistics, the chi- squared distribution (also chi-square or χ 2 - distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-square distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, e. g., in hypothesis testing or in construction of confidence intervals. DHRUV J PATEL 22
23
When it is being distinguished from the more general noncentral chi-squared distribution, this distribution is sometimes called the central chi- squared distribution. The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks. DHRUV J PATEL 23
24
Various chi-square tests to deal with cases involving frequency data Enumeration data Categorical data Qualitative data Contingency table DHRUV J PATEL 24
25
DHRUV J PATEL 25 Where, –O = observed data in each category –E = expected data in each category based on the experimenter’s hypothesis –S = Sum of the calculations for each category
26
DHRUV J PATEL 26 1 Draw a chi-square table. Each row will be whom you voted for, giving us two columns for Obama and Romney. Each row will be where you live, giving us three rows – rural, suburban and urban.
27
DHRUV J PATEL 27 2 Calculate totals for each row and column. The purpose of the first column total is to find out how many votes Obama got from all areas. Similarly, the purpose of the first row total is to find out how rural votes were cast for either candidate.
28
DHRUV J PATEL 28 3 Calculate probabilities for each row and column. These will be the individual probabilities of voting Obama, voting Romney, living in the country, etc… For example, the Obama column total tells us that 54 out of 100 people polled voted Obama, so probability of voting Obama is 0.54.
29
DHRUV J PATEL 29 4 Calculate the joint probabilities of belonging to each category. For example, probability of being rural and an Obama voter is found by multiplying the probability of voting Obama (0.54) with the probability of living in the country (0.13). So, 0.54 x 0.13 = A person has a 0.0702 chance of being a rural Obama voter. In doing so, we assume that where you live and whom you voted for are independent. This assumption, called the null hypothesis, may well be wrong, and we will test it later by testing the joint probabilities it yielded.
30
DHRUV J PATEL 30 5 Based on these joint probabilities, how many people do we expect to belong to each category? We multiply the joint probability for each category by 100, the number of people.
31
DHRUV J PATEL 31 6 These expected numbers are based on the assumption (hypothesis) that whom you voted for and where you live are independent. We can test this hypothesis by holding these expected numbers against the actual numbers we have.. First, we need our chi-square value..
32
Basically, the equation asks that, for each category, you find the discrepancy between the observed number and expected number, square it, and then divide it by the expected number. Finally, add up the figures for each category. I got 0.769 as my chi-square value. DHRUV J PATEL 32
33
7 Look at a chi-square table. Note that out degrees of freedom in a chi square test is. In our case, with 3 rows and 2 columns, we get 2 degrees of freedom. For a 0.05 level of significance and 2 degrees of freedom, we get a threshold (minimum) chi- square value of 5.991. Since our chi-square value 0.769 is smaller than the minimum, we cannot reject the null hypothesis that where you live and who you voted for are independent. DHRUV J PATEL 33
34
1. N, the total frequency, should be reasonably large, say greater than 50. 2. The sample observations should be independent. This implies that no individual item should be included twice or more in the sample. 3. The constraints on the cell frequencies, if any, should be linear (i.e., they should not involve square and higher powers of the frequencies) such as ∑f o = ∑f e = N. DHRUV J PATEL 34
35
4. No theoretical frequency should be small. Small is a relative term. Preferably each theoretical frequency should be larger than 10 but in any case not less than 5. If any theoretical frequency is less than 5 then we cannot apply χ 2 -test as such. In that case we use the technique of “pooling” which consists in adding the frequencies which are less than 5 with the preceding or succeeding frequency (frequencies) so that the resulting sum is greater than 5 and adjust for the degrees of freedom accordingly. DHRUV J PATEL 35
36
5. The given distribution should not be replaced by relative frequencies or proportions but the data should be given in original units. 6. Yates’ correction should be applied in special circumstances when df = 1 (i.e. in 2 x 2 tables) and when the cell entries are small. 7. χ 2 -test is mostly used as a non-directional test (i.e. we make a two-tailed test). However, there may be cases when χ 2 tests can be employed in making a one-tailed test. In one-tailed test we double the P-value. For example with df = 1, the critical value of χ 2 at 05 level is 2.706 (2.706 is the value written under. 10 level) and the critical value of; χ 2 at.01 level is 5.412 (the value is written under the.02 level). DHRUV J PATEL 36
37
1. Testing the divergence of observed results from expected results when our expectations are based on the hypothesis of equal probability. 2. Chi-square test when expectations are based on normal distribution. 3. Chi-square test when our expectations are based on predetermined results. 4. Correction for discontinuity or Yates’ correction in calculating χ 2. 5. Chi-square test of independence in contingency tables. DHRUV J PATEL 37
38
In many families where the parents could have produced children of all four blood groups, the total number of children with each blood group was : –Blood group A –Blood group B –Blood group AB –Blood group O 26 31 39 24 –TOTAL120 Those numbers are the results observed and have been put into the O column in this table
39
Blood group Observed (o) Expected (E) (O-E)(O-E) 2 E A26 B31 AB39 O24 2 (O E)2E 2 (O E)2E Now we need to work out the expected results. This is done by using the ratio that was described before 1:1:1:1 …. We just divide the total amount observed by 4. So we would expect to have 30 people of each blood group… 120 / 4 = 30
40
Blood group Observed (o) Expected (E) (O-E)(O-E) 2 E A2630 B3130 AB3930 O2430 2 ( O E) 2 E Then we need to write that in the E column. Next we need to do O-E.
41
Blood group Observed (o) Expected (E) (O-E)(O-E) 2 E A263026-30 = -4 B313031-30 = 1 AB393039-30 = 9 O243024-30 = -6 Then we square all the ( O – E) results… Blood group Observed (o) Expected (E) (O-E)(O-E) 2 E A263026-30 = -4- 4 2 = 16 B313031-30 = 11 2 = 1 AB393039-30 = 99 2 = 81 O243024-30 = -6-6 2 = 36
42
Blood group Observed (o) Expected (E) (O-E)(O-E) 2 E A263026-30 = -4- 4 2 = 1616/30 = 0.53 B313031-30 = 11 2 = 11/30 = 0.03 AB393039-30 = 99 2 = 8181/30 = 2.7 O243024-30 = -6-6 2 = 3636/30 = 1.2 Now, we divide all the ( O – E ) 2 results by E Finally, we add up the ( O – E) 2 / E column…
43
Blood group Observed (o) Expected (E) (O-E)(O-E) 2 E A263026-30 = -4- 4 2 = 1616/30 = 0.53 B313031-30 = 11 2 = 11/30 = 0.03 AB393039-30 = 99 2 = 8181/30 = 2.7 O243024-30 = -6-6 2 = 3636/30 = 1.2 E 2 (O E) 2 4.46 So the result is 4.46 Then we need to look up this result on a probability table.
45
In statistics, Yates's correction for continuity (or Yates's chi-squared test) is used in certain situations when testing for independence in a contingency table. In some cases, Yates's correction may adjust too far, and so its current use is limited. The following is Yates's corrected version of Pearson's chi-squared statistics. DHRUV J PATEL 45
46
Pure mathematics is, in its way, the poetry of logical ideas. - Albert Einstein DHRUV J PATEL 46
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.