Chi-square test (χ² test)


1 Chi-square test (χ² test)

2 Chi-square test
Used to test the counts of categorical data.
Three types:
– Goodness of fit (univariate)
– Independence (bivariate)
– Homogeneity (univariate with two samples)

3 χ² distribution
[Figure: χ² density curves for df = 3, df = 5, and df = 10]

4 Chi-square distributions

5 χ² distribution
Different df have different curves.
Skewed right.
As df increases, the curve shifts toward the right and becomes more like a normal curve.
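
The shape claims on this slide can be checked numerically. The sketch below relies on standard facts about a χ² distribution with df degrees of freedom: its mean is df, its variance is 2·df, and its skewness is √(8/df). The function name `chi2_shape` is made up for illustration.

```python
import math

def chi2_shape(df):
    """Return (mean, standard deviation, skewness) of a chi-square distribution."""
    if df <= 0:
        raise ValueError("df must be positive")
    return df, math.sqrt(2 * df), math.sqrt(8 / df)

# The df values 3, 5, 10 match the plotted curves; 100 is an extra illustration.
for df in (3, 5, 10, 100):
    mean, sd, skew = chi2_shape(df)
    print(f"df={df:>3}  mean={mean:>3}  sd={sd:.2f}  skewness={skew:.2f}")
```

The mean grows with df (the curve moves right) while the skewness shrinks toward 0, the value for a normal curve.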

6 χ² assumptions
SRS: a reasonably random sample.
Counts: we have counts of categorical data, and we expect each category to happen at least once.
Sample size: to ensure that the sample size is large enough, we should expect at least five in each category.
***Be sure to list expected counts!!
Combine these together: all expected counts are at least 5.

7 χ² formula
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
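
As a minimal sketch of the χ² statistic, assuming the usual Pearson form (the sum over categories of (observed - expected)² / expected), the calculation can be written as below. The function name and the two-category counts are hypothetical and serve only as an illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, for illustration only: 2 categories, 30 observations.
stat = chi_square_statistic([10, 20], [15, 15])
print(round(stat, 3))  # 3.333
```

Each category contributes 25/15, so the two terms add up to 10/3.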

8 χ² goodness of fit test
Uses univariate data.
Want to see how well the observed counts "fit" what we expect the counts to be.
Use the χ² cdf function on the calculator to find p-values.
Based on df: df = number of categories - 1
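
For readers without a calculator at hand, the p-value is the area under the χ² curve to the right of the test statistic. The sketch below computes that area for whole-number df using only Python's standard library. The function name `chi2_upper_tail` and the category count are assumptions for illustration. Where SciPy is installed, `scipy.stats.chi2.sf` gives the same tail area.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with integer df exceeds x), i.e. the p-value for statistic x."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Base case of the recurrence: df = 2 for even df, df = 1 for odd df.
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))
    # Each step raises the degrees of freedom by 2 and adds one more term.
    while a < df / 2:
        tail += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return tail

categories = 5          # hypothetical number of categories
df = categories - 1     # df = number of categories - 1
print(chi2_upper_tail(9.488, df))  # close to 0.05
```

The loop uses the identity that adding 2 degrees of freedom adds one closed-form term to the tail area, which avoids any numerical integration.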

9 Hypotheses – written in words
H0: the category proportions are as hypothesized (for example, all equal)
Ha: at least one proportion differs from its hypothesized value
Be sure to write in context!

10 Example: Does the color of a car influence the chance that it will be stolen? Of 830 cars reported stolen, 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. It is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors.

Category  Color  Observed  Expected
1         White  140       .15 * 830 = 124.5
2         Blue   100       .15 * 830 = 124.5
3         Red    270       .35 * 830 = 290.5
4         Black  230       .30 * 830 = 249
5         Other   90       .05 * 830 = 41.5

11
Category  Color  Observed  Expected
1         White  140       124.5
2         Blue   100       124.5
3         Red    270       290.5
4         Black  230       249
5         Other   90        41.5

Let π1, π2, ..., π5 denote the true proportions of stolen cars that fall into the 5 color categories.
H0: π1 = .15, π2 = .15, π3 = .35, π4 = .30, π5 = .05
Ha: H0 is not true.
α = .01
Test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
Assumptions: The sample was a random sample of stolen cars. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.

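12 Calculations:
$\chi^2 = \frac{(140-124.5)^2}{124.5} + \frac{(100-124.5)^2}{124.5} + \frac{(270-290.5)^2}{290.5} + \frac{(230-249)^2}{249} + \frac{(90-41.5)^2}{41.5}$
$= 1.93 + 4.82 + 1.45 + 1.45 + 56.68 = 66.33$
P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with 4 df. The computed value is larger than 18.46 (the tabled value with an upper-tail area of .001 for 4 df), so P-value < .001.
Because P-value < α, H0 is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.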
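
The stolen-car calculation can be reproduced with a short script. This is a sketch under the assumptions already stated in the example (observed counts, hypothesized proportions, α = .01). The helper names are made up for illustration, and the tail-area helper handles whole-number df only.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_upper_tail(x, df):
    """Upper-tail area of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2:
        tail += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return tail

# Data from the example: white, blue, red, black, other.
observed = [140, 100, 270, 230, 90]
proportions = [0.15, 0.15, 0.35, 0.30, 0.05]
alpha = 0.01

n = sum(observed)                        # 830 stolen cars
expected = [p * n for p in proportions]  # 124.5, 124.5, 290.5, 249, 41.5
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1                   # 5 categories - 1 = 4
p_value = chi2_upper_tail(stat, df)

print(f"chi-square = {stat:.2f}, df = {df}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Almost all of the statistic comes from the "other" category, where 90 cars were observed against 41.5 expected.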

13 χ² test for independence
Used with categorical, bivariate data from ONE sample.
Used to see if the two categorical variables are associated (dependent) or not associated (independent).

14 Assumptions & formula remain the same!

15 Hypotheses – written in words
H0: the two variables are independent
Ha: the two variables are dependent
Be sure to write in context!

16 A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.

17 If beef preference is independent of geographic region, how would we expect this table to be filled in?

        North  South  Total
Cut A      90     60    150
Cut B     165    110    275
Cut C      45     30     75
Total     300    200    500

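18 Expected counts
Assuming H0 is true,
$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
For example, Cut A in the North: 150 × 300 / 500 = 90.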
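
The expected counts for the beef example can be filled in from the margins alone. The sketch below assumes the usual rule under independence, expected = (row total × column total) / table total, and uses the totals given in the example. The variable names are illustrative.

```python
# Marginal totals from the beef example.
row_totals = {"Cut A": 150, "Cut B": 275, "Cut C": 75}
col_totals = {"North": 300, "South": 200}
table_total = 500

# Expected count for each cell if cut preference is independent of region.
expected = {
    cut: {region: r * c / table_total for region, c in col_totals.items()}
    for cut, r in row_totals.items()
}

for cut, row in expected.items():
    print(cut, row)
```

Every row keeps the same 300:200 split between North and South, which is what independence means for this table.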

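19 Degrees of freedom
$df = (r - 1)(c - 1)$, where r is the number of rows and c is the number of columns (totals not included).
Or cover up one row & one column & count the number of cells remaining!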
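
Both ways of finding the degrees of freedom for a two-way table can be sketched in a few lines, assuming the standard rule df = (rows - 1)(columns - 1) with totals left out. The function names are made up for illustration. The cell values are the expected counts from the beef example, though only the table's shape matters here.

```python
def df_by_formula(n_rows, n_cols):
    """Degrees of freedom for an r x c table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def df_by_covering(table):
    """Cover up one row and one column, then count the cells that remain."""
    remaining = [row[1:] for row in table[1:]]
    return sum(len(row) for row in remaining)

# Beef example: 3 cuts (rows) by 2 regions (columns), totals excluded.
beef = [[90, 60], [165, 110], [45, 30]]
print(df_by_formula(len(beef), len(beef[0])))  # 2
print(df_by_covering(beef))                    # 2
```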

20 χ² test for homogeneity
Used with a single categorical variable from two (or more) independent samples.
Used to see if the two populations are the same (homogeneous).

21 Assumptions & formula remain the same!
Expected counts & df are found the same way as in the test for independence.
The only change is the hypotheses!

22 Hypotheses – written in words
H0: the proportions for the two (or more) distributions are the same
Ha: at least one of the proportions for the distributions is different
Be sure to write in context!

23 The following data is on drinking behavior for independently chosen random samples of male and female students. Does there appear to be a gender difference with respect to drinking behavior? (Note: low = 1-7 drinks/wk, moderate = 8-24 drinks/wk, high = 25 or more drinks/wk)

24 Assumptions:
Have 2 random samples of students.
All expected counts are greater than 5.
H0: the proportions of drinking behaviors are the same for female & male students
Ha: at least one of the proportions of drinking behavior is different for female & male students
P-value = .000, df = 3, α = .05
Since P-value < α, I reject H0. There is sufficient evidence to suggest that drinking behavior is not the same for female & male students.

Expected counts:
            Male   Female
0          158.6    167.4
Low        554.0    585.0
Moderate   230.1    243.0
High        38.4     40.6
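
The sample-size condition and the degrees of freedom for this example can be checked directly from the expected counts listed on the slide. This is a small sketch. The function name is hypothetical, and the observed counts are not reproduced in this transcript, so the test statistic itself is not recomputed here.

```python
# Expected counts as listed on the slide (rows: drinking level, columns: gender).
expected = {
    "0":        {"Male": 158.6, "Female": 167.4},
    "Low":      {"Male": 554.0, "Female": 585.0},
    "Moderate": {"Male": 230.1, "Female": 243.0},
    "High":     {"Male": 38.4,  "Female": 40.6},
}

def all_expected_at_least(counts, minimum=5):
    """True if every expected count meets the minimum (5 by convention)."""
    return all(v >= minimum for row in counts.values() for v in row.values())

n_rows = len(expected)
n_cols = len(next(iter(expected.values())))
df = (n_rows - 1) * (n_cols - 1)

print(all_expected_at_least(expected))  # True
print(df)                               # 3
```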

