Non-Parametric Statistics Part I: Chi-Square
χ² Operates on FREQUENCY Data
Suppose we have a plot of land on which we hope to harvest wood. Maple is more valuable than oak, and oak is more valuable than pine. We take a sample of the trees (the whole plot is too big to count) and ask whether the three types occur in significantly unequal amounts (α = .05). We cannot get a mean from these data, but there are clear differences between the amounts in each category. This is categorical (nominal) data expressed as frequencies, so we use the χ² test.
Table: # of trees (Pine, Maple, Oak)
χ²: Homogeneity
Table: # of trees observed and # of trees expected (Pine, Maple, Oak)
Expected frequency per category = (sum of the three observed counts)/3 = 245
What are the null and alternative hypotheses?
H0: The groups have equal frequencies.
H1: The groups do not have equal frequencies.
Find the critical value: χ² table (df = k - 1 = 3 - 1 = 2) = 5.99
Calculate the obtained statistic: χ² = Σ (O - E)²/E
Make a decision: Our obtained value is larger than our critical value. Reject the null; the groups do not have equal frequencies.
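The table value of 5.99 can be cross-checked without a printed table. For df = 2 specifically, the χ² distribution is an exponential distribution with mean 2, so the upper-tail critical value has the closed form -2 ln(α). The sketch below assumes that special case only; the function name is illustrative, and any other df needs a table or a statistics library.

```python
import math

def chi2_critical_df2(alpha: float) -> float:
    """Upper-tail critical value of the chi-square distribution with df = 2.

    Valid only for df = 2, where P(X > x) = exp(-x / 2).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2.0 * math.log(alpha)

print(round(chi2_critical_df2(0.05), 2))  # 5.99
print(round(chi2_critical_df2(0.01), 2))  # 9.21
```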
χ²: Homogeneity Example
Is political affiliation distributed equally in our class? (use α = .01)
Table: # of people observed and # of people expected (Democrat, Republican, Other)
Expected frequency per category = (sum of the three observed counts)/3 = 10
What are the null and alternative hypotheses?
H0: The groups have equal frequencies.
H1: The groups do not have equal frequencies.
Find the critical value: χ² table (df = k - 1 = 3 - 1 = 2) = 9.21
Calculate the obtained statistic: χ² = 5
Make a decision: Our obtained value is smaller than our critical value. Retain the null; we have no evidence that the groups differ in frequency.
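The homogeneity computation can be sketched in a few lines of Python. The observed counts for this example are not available here, so the counts 15, 10, 5 below are an assumption: they sum to 30 (which gives the expected frequency of 10) and happen to reproduce χ² = 5, but they are not necessarily the class's actual data, and their assignment to categories is arbitrary. The function name is illustrative.

```python
def chi_square_homogeneity(observed):
    """Chi-square statistic against equal expected frequencies."""
    k = len(observed)
    expected = sum(observed) / k  # same expected count for every category
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical counts (assumption, not taken from the slide).
observed = [15, 10, 5]
statistic = chi_square_homogeneity(observed)
critical = 9.21  # chi-square table, df = 2, alpha = .01

print(statistic)  # 5.0
print("reject H0" if statistic > critical else "retain H0")  # retain H0
```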
χ²: Goodness of Fit
Five years ago the tree lot was also sampled. Has the composition of the lot changed since then (use α = .05)?
Tables: # of trees (Pine, Maple, Oak) in the 2014 sample and in the 2009 sample
We need a different expected value, based on the previous sample. Notice that we are trying to compare the frequencies from two time points, but the total # of trees categorized in 2014 (735) is different from the 2009 total (473)! So we convert the 2009 counts to proportions and apply them to the 2014 total.
Pine proportion = 255/473 = 0.54
Maple proportion = 115/473 = 0.24
Oak proportion = 103/473 = 0.22
Pine expected = 0.54(735) = 396.9
Maple expected = 0.24(735) = 176.4
Oak expected = 0.22(735) = 161.7
Table: # of trees expected (Pine, Maple, Oak)
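The rescaling step can be sketched as follows. The 2009 counts (255, 115, 103) and the 2014 total (735) come from the calculations above; the variable names are illustrative, and rounding each proportion to two decimals before multiplying is an assumption that mirrors how the proportions are written out.

```python
counts_2009 = {"Pine": 255, "Maple": 115, "Oak": 103}
total_2014 = 735  # total number of trees categorized in 2014

total_2009 = sum(counts_2009.values())  # 473

expected_2014 = {}
for tree, count in counts_2009.items():
    proportion = round(count / total_2009, 2)  # proportions rounded to two decimals
    expected_2014[tree] = round(proportion * total_2014, 1)

print(expected_2014)  # {'Pine': 396.9, 'Maple': 176.4, 'Oak': 161.7}
```

Using unrounded proportions would give slightly different expected counts (for example, 255/473 × 735 is about 396.2 rather than 396.9), so the rounding choice is worth stating whenever the statistic is reported.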
χ²: Goodness of Fit Example
Tables: # of trees observed and # of trees expected (Pine, Maple, Oak)
What are the null and alternative hypotheses?
H0: The composition of the lot has not changed.
H1: The composition of the lot has changed.
Find the critical value: χ² table (df = k - 1 = 3 - 1 = 2) = 5.99
Calculate the obtained statistic: χ² = Σ (O - E)²/E
Make a decision: Our obtained value is larger than our critical value. Reject the null; the composition of the lot has changed.
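A goodness-of-fit statistic differs from the homogeneity case only in that each category has its own expected count. The sketch below uses the expected counts implied by the 2009 proportions (0.54, 0.24, 0.22 of 735). The 2014 observed counts are not available here, so the observed list is a placeholder chosen only to sum to 735; the statistic it produces is therefore illustrative and is not the value from this example.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

total_2014 = 735
# Expected counts for Pine, Maple, Oak from the 2009 proportions.
expected = [p * total_2014 for p in (0.54, 0.24, 0.22)]
# PLACEHOLDER observed counts (assumption, not the real 2014 data).
observed = [440, 150, 145]

statistic = chi_square_gof(observed, expected)
critical = 5.99  # chi-square table, df = 2, alpha = .05
decision = "reject H0" if statistic > critical else "retain H0"
print(round(statistic, 2), decision)
```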
χ²: Independence
Table: # of trees by forest (rows: Mirkwood, Old Forest) and tree type (columns: Pine, Maple, Oak)
H0: Tree type and forest are independent.
H1: Tree type and forest are not independent.
χ²: Independence Example (assume α = .05)
What are the null and alternative hypotheses?
H0: Tree type and forest are independent.
H1: Tree type and forest are not independent.
Find the critical value: χ² table (df = 2) = 5.99
df for this test is (r - 1)(c - 1). We have 2 rows and 3 columns, so (2 - 1)(3 - 1) = 2.
Calculate the obtained statistic:
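χ²: Independence
How to calculate expected values:
Expected value = (R × C)/grand total, where R is the row total and C is the column total for the cell.
Table: # of trees by forest (rows: Mirkwood, Old Forest) and tree type (columns: Pine, Maple, Oak), with row totals R and column totals C
Grand total: 1500
Expected Old Forest-Pine = (798 × 356)/1500 = 189.39
Expected Mirkwood-Pine = (702 × 356)/1500 = 166.61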
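The two Pine cells can be checked with a short sketch of the (R × C)/grand total rule. The row totals (702 and 798), the Pine column total (356), and the grand total (1500) are the figures given above; the function and variable names are illustrative.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count under independence: (R x C) / grand total."""
    return row_total * col_total / grand_total

grand_total = 1500
row_totals = {"Mirkwood": 702, "Old Forest": 798}
pine_total = 356  # column total for Pine

for forest, r in row_totals.items():
    e = expected_count(r, pine_total, grand_total)
    print(f"{forest}-Pine expected: {e:.2f}")
# Mirkwood-Pine expected: 166.61
# Old Forest-Pine expected: 189.39
```

The two expected counts sum to the Pine column total (356), which is a quick sanity check on any column of expected values.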
χ²: Independence
Tables: Observed values and expected values by forest (rows: Mirkwood, Old Forest) and tree type (columns: Pine, Maple, Oak)
$E = \frac{R \times C}{\text{grand total}}$
$\chi^2 = \sum \frac{(O - E)^2}{E}$
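Putting the pieces together, the whole independence calculation can be sketched as one function that builds every expected value from the row and column totals and then sums (O - E)²/E over all cells. The cell counts of the Mirkwood / Old Forest table are not available here, so the example uses a small toy table that is purely an assumption; the function name is illustrative.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Toy 2 x 3 table (assumption, NOT the forest data).
toy_table = [
    [10, 20, 30],
    [20, 20, 20],
]
statistic, df = chi_square_independence(toy_table)
print(round(statistic, 2), df)  # 5.33 2
```

For this toy table the statistic falls below the df = 2 critical value of 5.99, so the null of independence would be retained at α = .05.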
χ²: Independence Example (assume α = .05)
What are the null and alternative hypotheses?
H0: Tree type and forest are independent.
H1: Tree type and forest are not independent.
Find the critical value: χ² table (df = 2) = 5.99
df for this test is (r - 1)(c - 1). We have 2 rows and 3 columns, so (2 - 1)(3 - 1) = 2.
Calculate the obtained statistic: χ² = Σ (O - E)²/E, summed over all six cells
Make a decision: Our obtained value is larger than our critical value. Reject the null; tree type and forest are not independent.