Chi-square Goodness of Fit

Presentation on theme: "Chi-square Goodness of Fit"— Presentation transcript:

1 Chi-square Goodness of Fit
Friday, Dec. 3
Chi-square Goodness of Fit
Chi-square Test of Independence: Two Variables
Summing up!

2

3

4

5

6 [Figure: pea color genotype crosses (gg, yy, yg, gy) with genotype percentages; only two 25% labels are legible]

7

8
Pea Color    freq Observed    freq Expected
Yellow
Green
TOTAL

9 2  Chi Square Goodness of Fit Pea Color freq Observed freq Expected
Yellow Green TOTAL 2 = (fo - fe)2 fe i=1 k d.f. = k - 1, where k = number of categories of in the variable.

10 “… the general level of agreement between Mendel’s expectations and his reported results shows that it is closer than would be expected in the best of several thousand repetitions. The data have evidently been sophisticated systematically, and after examining various possibilities, I have no doubt that Mendel was deceived by a gardening assistant, who knew only too well what his principal expected from each trial made…” -- R. A. Fisher

11 2  Mendel's Cooking! Chi Square Goodness of Fit
Pea Color freq Observed freq Expected Yellow Green TOTAL 2 = (fo - fe)2 fe i=1 k d.f. = k - 1, where k = number of categories of in the variable.

12 Peas to Kids: Another Example
Goodness of Fit
At my children’s school science fair last year, where participation was voluntary but strongly encouraged, I counted about 60 boys and 40 girls who had submitted entries. Since I would expect a 50:50 ratio if there were no gender preference for submission, does this observation deviate beyond chance level?

13
             Boys    Girls
Expected:
Observed:

14
             Boys    Girls
Expected:
Observed:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e}$$

15 2  Boys Girls Expected: 50 50 Observed: 60 40 (fo - fe)2 = fe
k (fo - fe)2 = fe i=1 For each of k categories, square the difference between the observed and the expected frequency, divide by the expected frequency, and sum over all k categories.

16 2  Boys Girls Expected: 50 50 Observed: 60 40 (fo - fe)2 = = +
k (fo - fe)2 (60-50)2 (40-50)2 = = + = 4.00 fe i=1 For each of k categories, square the difference between the observed and the expected frequency, divide by the expected frequency, and sum over all k categories.

17 2  Boys Girls Expected: 50 50 Observed: 60 40 (fo - fe)2 = = +
k (fo - fe)2 (60-50)2 (40-50)2 = = + = 4.00 fe i=1 For each of k categories, square the difference between the observed and the expected frequency, divide by the expected frequency, and sum over all k categories. This value, chi-square, will be distributed with known probability values, where the degrees of freedom is a function of the number of categories (not n). In this one-variable case, d.f. = k - 1.

18 2  Boys Girls Expected: 50 50 Observed: 60 40 (fo - fe)2 = = +
k (fo - fe)2 (60-50)2 (40-50)2 = = + = 4.00 fe i=1 For each of k categories, square the difference between the observed and the expected frequency, divide by the expected frequency, and sum over all k categories. This value, chi-square, will be distributed with known probability values, where the degrees of freedom is a function of the number of categories (not n). In this one-variable case, d.f. = k - 1. Critical value of chi-square at =.05, d.f.=1 is 3.84, so reject H0.
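The boys-and-girls calculation above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `chi_square_gof` is hypothetical, and the critical value 3.84 is simply the one quoted on the slide for α = .05, d.f. = 1.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square, d.f.) for a one-variable goodness-of-fit test.

    Hypothetical helper: sums (fo - fe)^2 / fe over the k categories
    and uses d.f. = k - 1, as on the slide.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1
    return chi_sq, df


observed = [60, 40]  # boys, girls counted at the science fair
expected = [50, 50]  # 50:50 split under H0

chi_sq, df = chi_square_gof(observed, expected)

CRITICAL_05_DF1 = 3.84  # critical value quoted on the slide (alpha = .05, d.f. = 1)
reject_h0 = chi_sq > CRITICAL_05_DF1

print(f"chi-square = {chi_sq:.2f}, d.f. = {df}, reject H0: {reject_h0}")
# prints: chi-square = 4.00, d.f. = 1, reject H0: True
```

Because the statistic depends only on the observed and expected counts per category, the same helper would work for any number of categories, provided the matching critical value is looked up for d.f. = k - 1.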

19 Chi-square Test of Independence
Are two nominal level variables related or independent from each other? Is race related to SES, or are they independent?

20
               White    Black    Total
SES   Hi         12        3       15
      Lo         16       16       32
Total            28       19       47

21 The expected frequency of any given cell is
(Row n × Column n) / Total n
               White    Black    Total
SES   Hi         12        3       15
      Lo         16       16       32
Total            28       19       47

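22
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_o - f_e)^2}{f_e}$$
At d.f. = (r - 1)(c - 1), where r is the number of rows and c is the number of columns.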

23 The expected frequency of any given cell is
(Row n × Column n) / Total n
           White          Black          Total
Hi       (15×28)/47     (15×19)/47        15
Lo       (32×28)/47     (32×19)/47        32
Total        28             19            47

24 The expected frequency of any given cell is
(Row n × Column n) / Total n
           White                  Black                  Total
Hi       (15×28)/47 = 8.94      (15×19)/47 = 6.06         15
Lo       (32×28)/47 = 19.06     (32×19)/47 = 12.94        32
Total        28                     19                    47
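The expected-frequency rule can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name `expected_frequencies` is hypothetical, and the observed counts are the ones from the race-by-SES table shown earlier in the deck.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row n * column n) / total n.

    `observed` is a list of rows, each a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total_n = sum(row_totals)
    return [[r * c / total_n for c in col_totals] for r in row_totals]


# Rows: Hi SES, Lo SES. Columns: White, Black.
observed = [[12, 3],
            [16, 16]]

expected = expected_frequencies(observed)
rounded = [[round(fe, 2) for fe in row] for row in expected]

for row in rounded:
    print(row)
# prints:
# [8.94, 6.06]
# [19.06, 12.94]
```

Note that the expected counts keep the same row and column totals as the observed table, which is why only the marginal totals are needed to compute them.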

25 2  Please calculate: = (fo - fe)2 fe 12 3 15 8.94 6.06 16 16 32
r=1 r c=1 c 12 3 15 8.94 6.06 16 16 32 19.06 12.94 47 28 19
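One way to check a hand calculation for this exercise is a short Python sketch like the one below. It is not from the slides: the function name `chi_square_independence` is hypothetical, and it works from unrounded expected frequencies, so a hand result based on the rounded values in the table may differ slightly in the second decimal place.

```python
def chi_square_independence(observed):
    """Return (chi-square, d.f.) for an r x c table of observed counts.

    Expected counts are (row n * column n) / total n, and
    d.f. = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total_n = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total_n
            chi_sq += (fo - fe) ** 2 / fe

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df


# Rows: Hi SES, Lo SES. Columns: White, Black.
observed = [[12, 3],
            [16, 16]]

chi_sq, df = chi_square_independence(observed)
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}")
```

The resulting value can then be compared with the critical value for the corresponding d.f., exactly as in the goodness-of-fit example.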

26 Important assumptions:
Independent observations.
Observations are mutually exclusive.
Expected frequencies should be reasonably large:
    d.f. 1: at least 5
    d.f. 2: > 2
    d.f. > 3: if all expected frequencies but one are greater than or equal to 5 and if the one that is not is at least equal to 1.

27 Univariate Statistics:
Interval    Mean      one-sample t-test
Ordinal     Median
Nominal     Mode      Chi-squared goodness of fit

28 Bivariate Statistics
                                        Y
X           Nominal    Ordinal                       Interval
Nominal     χ²         Rank-sum, Kruskal-Wallis H    t-test, ANOVA
Ordinal                Spearman r_s (rho)
Interval                                             Pearson r, Regression

29 Who said this? "The definition of insanity is doing the same thing over and over again and expecting different results".

30 Who said this?
"The definition of insanity is doing the same thing over and over again and expecting different results".
I hate this quote!

31 I don’t like it because from a statistical point of view, it is insane to do the same thing over and over again and expect the same results! More to the point, the wisdom of statistics lies in understanding that repeating things some ways ends up with results that are more the same than others. Hmm. Think about this for a moment. Statistics allows one to understand the expected variability in results even when the same thing is done, as a function of σ and N.
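The claim that expected variability is a function of σ and N can be illustrated with a small simulation. This is only a sketch under assumptions: the population values (mean 100, σ = 10), the sample sizes, and all function names are hypothetical choices made for illustration, and the population is assumed to be normal.

```python
import math
import random


def standard_error(sigma, n):
    """Theoretical standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(mu, sigma, n, repetitions, rng):
    """Do the 'same thing' many times: draw N values, record the sample mean."""
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


def spread(values):
    """Sample standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


rng = random.Random(0)   # fixed seed so the run is repeatable
MU, SIGMA = 100.0, 10.0  # hypothetical population values

results = {}
for n in (25, 100):
    means = simulate_sample_means(MU, SIGMA, n, repetitions=2000, rng=rng)
    results[n] = (standard_error(SIGMA, n), spread(means))
    print(f"N={n}: predicted SE = {results[n][0]:.2f}, "
          f"observed spread of means = {results[n][1]:.2f}")
```

Repeating the same sampling procedure gives different means every time, but the spread of those means should sit close to σ/√N and shrink as N grows, which is the sense in which the variability itself is predictable.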

32 Your turn! Given this start, explain why uncle Albert heads us down the wrong path. In your answer, make sure you refer to the error statistic (e.g., standard error of the mean, standard error of the difference between means, Mean Square within) as well as the sample size N. In short, explain why statistical thinking is beautiful, and why Albert Einstein (if he ever said it) was wrong.

