Chi-square test (χ² test)


The chi-square test is used to test the counts of categorical data.
Three types:
– Goodness of fit (univariate)
– Independence (bivariate)
– Homogeneity (univariate with two samples)

χ² distribution [Figure: χ² density curves for df = 3, df = 5, and df = 10]

χ² distribution: Different df have different curves. The curves are skewed right. As df increases, the curve shifts toward the right and becomes more like a normal curve.
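The effect of df on the curve can be put in numbers. The sketch below relies on standard properties of the χ² distribution (mean = df, peak at df - 2, skewness = sqrt(8/df)); the helper name `chi2_shape` is made up for this illustration and is not part of the slides.

```python
import math

def chi2_shape(df):
    """Summary of a chi-square curve with df degrees of freedom."""
    return {
        "mean": df,                     # the centre moves right as df grows
        "peak": max(df - 2, 0),         # location of the highest point of the curve
        "skewness": math.sqrt(8 / df),  # positive = skewed right, 0 = symmetric
    }

# The three curves shown on the slide
for df in (3, 5, 10):
    s = chi2_shape(df)
    print(f"df={df:2d}  mean={s['mean']:2d}  peak at {s['peak']}  skewness={s['skewness']:.2f}")
```

The skewness stays positive but shrinks as df grows, which is the "becomes more like a normal curve" behaviour described above.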

χ² assumptions
– SRS: a reasonably random sample
– Counts: have counts of categorical data, and we expect each category to happen at least once
– Sample size: to ensure that the sample size is large enough, we should expect at least five in each category
Be sure to list the expected counts! The last two conditions combine into one: all expected counts are at least 5.
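The expected-count condition is easy to check mechanically. This is a minimal sketch; the function name `small_expected_counts` and the numbers in `expected` are hypothetical and chosen only to show one failing category.

```python
def small_expected_counts(expected, minimum=5):
    """Return the expected counts below the minimum (an empty list means the condition is met)."""
    return [e for e in expected if e < minimum]

# Hypothetical expected counts, for illustration only
expected = [12.5, 7.0, 4.2, 26.3]
too_small = small_expected_counts(expected)
print("condition met" if not too_small else f"too small: {too_small}")
```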

χ² formula

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
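The χ² statistic adds up (observed - expected)² / expected over every category. A small sketch of that sum follows; the function name and the three-category counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories, 30 observations in total
observed = [8, 12, 10]
expected = [10, 10, 10]
statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.2f}")  # (4 + 4 + 0) / 10 = 0.80
```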

χ² goodness-of-fit test
– Uses univariate data
– Want to see how well the observed counts “fit” what we expect the counts to be
– Use the χ² cdf function on the calculator to find p-values
– Based on df, where df = number of categories - 1
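Away from a calculator, the p-value is the area under the χ² curve to the right of the test statistic. The sketch below swaps in a closed-form expression that holds for whole-number df, using only the Python standard library; the name `chi2_sf` and the example statistic of 0.8 with three categories are assumptions for illustration. If SciPy is installed, `scipy.stats.chi2.sf` gives the same tail area.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with whole-number df."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: an erfc term plus a finite sum with half-integer powers
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return tail

categories = 3          # hypothetical number of categories
df = categories - 1     # df = number of categories - 1
p_value = chi2_sf(0.8, df)
print(f"df = {df}, p-value = {p_value:.3f}")
```

As a sanity check, the familiar table values 3.841 (df = 1) and 7.815 (df = 3) should both return a tail area close to .05.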

Hypotheses – written in words
H0: the proportions are equal
Ha: at least one proportion is not the same
Be sure to write in context!

Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to claim that successful people are more likely to be born under some signs than others?

Aries   23    Libra        18    Leo       20
Taurus  20    Scorpio      21    Virgo     19
Gemini  18    Sagittarius  19    Aquarius  24
Cancer  23    Capricorn    22    Pisces    29

How many would you expect in each sign if there were no difference between them? If CEOs were equally likely to be born under each sign, we would expect 256/12 = 21.333 in each sign.
How many degrees of freedom? Since there are 12 signs, df = 12 - 1 = 11.
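The slide stops at the expected counts and df. One way to carry the zodiac example through to a test statistic and p-value is sketched below, assuming the goodness-of-fit procedure described earlier. The function names are made up for this sketch, and `chi2_sf` is a standard-library stand-in for the calculator's χ² cdf function.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with whole-number df."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return tail

# Counts for the 12 signs, Aries through Pisces as listed on the slide
observed = [23, 20, 18, 23, 18, 21, 19, 22, 20, 19, 24, 29]
n = sum(observed)                               # 256 CEOs
expected = [n / len(observed)] * len(observed)  # 256/12 in every sign
df = len(observed) - 1                          # 12 - 1 = 11
statistic = chi_square_statistic(observed, expected)
p_value = chi2_sf(statistic, df)
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
```

A p-value above the chosen α would mean failing to reject the hypothesis that CEOs are equally likely to be born under each sign.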

A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g of Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g of hazelnuts, and 446 g of peanuts. You wonder whether your mix is significantly different from what the company advertises.
Why is the chi-square goodness-of-fit test NOT appropriate here? Because we do NOT have counts of the types of nuts; we have weights.
What might you do instead of weighing the nuts in order to use chi-square? We could count the number of each type of nut and then perform a χ² test.

Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short). A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively.
What are the expected counts? We expect 9/16 of the 100 flies to have yellow bodies and normal wings (Y & N), and so on for the other categories: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25.
df? Since there are 4 categories, df = 4 - 1 = 3.
Are the results consistent with the theoretical distribution predicted by the genetic model? (see next page)

Assumptions: Have a random sample of fruit flies. All expected counts are greater than 5.
Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
H0: The proportions of fruit flies are the same as the theoretical model.
Ha: At least one of the proportions of fruit flies is not the same as the theoretical model.
P-value = χ²cdf(5.671, 10^99, 3) = .129, α = .05
Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the distribution of fruit flies is not the same as the theoretical model.
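The numbers in the fruit-fly test can be reproduced in a few lines. This sketch builds the expected counts from the 9:3:3:1 ratio and, in place of the calculator's χ²cdf(5.671, 10^99, 3), uses a closed-form tail area that is valid only for df = 3. The variable names are illustrative.

```python
import math

ratio = [9, 3, 3, 1]           # Y & N, Y & S, E & N, E & S
observed = [59, 20, 11, 10]
n = sum(observed)              # 100 flies

# Expected counts under the genetic model: 56.25, 18.75, 18.75, 6.25
expected = [n * r / sum(ratio) for r in ratio]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1         # 4 - 1 = 3

# Upper-tail area of the chi-square distribution, closed form for df = 3 only
half = statistic / 2
p_value = math.erfc(math.sqrt(half)) + 2 * math.sqrt(half / math.pi) * math.exp(-half)

print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
# Matches the slide: chi-square = 5.671, p-value = .129
```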

χ² test for independence: Used with categorical, bivariate data from ONE sample. Used to see if the two categorical variables are associated (dependent) or not associated (independent).

Assumptions & formula remain the same!

Hypotheses – written in words
H0: the two variables are independent
Ha: the two variables are dependent
Be sure to write in context!

A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.

If beef preference is independent of geographic region, how would we expect this table to be filled in?

         North   South   Total
Cut A       90      60     150
Cut B      165     110     275
Cut C       45      30      75
Total      300     200     500

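Expected counts: Assuming H0 is true, expected count = (row total × column total) / table total. For example, for Cut A in the North: (150 × 300) / 500 = 90.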

Degrees of freedom: df = (number of rows - 1) × (number of columns - 1). Or cover up one row & one column & count the number of cells remaining!
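For a two-way table, each expected count is (row total × column total) / table total, and df is (rows - 1) × (columns - 1). The sketch below applies both rules to the margins of the beef table; the function name `expected_from_margins` is made up for this illustration.

```python
def expected_from_margins(row_totals, col_totals):
    """Expected counts under independence: row total * column total / table total."""
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

row_totals = [150, 275, 75]   # Cut A, Cut B, Cut C
col_totals = [300, 200]       # North, South

expected = expected_from_margins(row_totals, col_totals)
df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (3 - 1) * (2 - 1) = 2

for cut, row in zip(["Cut A", "Cut B", "Cut C"], expected):
    print(cut, row)
print("df =", df)
```

The output reproduces the filled-in table: 90 and 60 for Cut A, 165 and 110 for Cut B, 45 and 30 for Cut C.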

Suppose that in the actual sample of 500 consumers the observed numbers were as follows:

         North   South   Total
Cut A      100      50     150
Cut B      150     125     275
Cut C       50      25      75
Total      300     200     500

Is there sufficient evidence to suggest that geographic region and beef preference are not independent? (Is there a difference between the expected and observed counts?)

Assumptions: Have a random sample of people. All expected counts are greater than 5.

Expected counts:
         North   South
Cut A       90      60
Cut B      165     110
Cut C       45      30

H0: geographic region and beef preference are independent
Ha: geographic region and beef preference are dependent
P-value = .0226, df = 2, α = .05
Since p-value < α, I reject H0. There is sufficient evidence to suggest that geographic region and beef preference are dependent.
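The beef test can be checked end to end from the observed table. This is a sketch under the same expected-count rule as above; the function name is illustrative, and the p-value uses the closed form exp(-χ²/2), which is valid only when df = 2.

```python
import math

def expected_table(observed):
    """Expected counts under independence, computed from the observed table's margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Rows: Cut A, Cut B, Cut C; columns: North, South
observed = [[100, 50], [150, 125], [50, 25]]
expected = expected_table(observed)

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (3 - 1) * (2 - 1) = 2

# Upper-tail area of the chi-square distribution, closed form for df = 2 only
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.4f}")
# Matches the slide: p-value = .0226
```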

χ² test for homogeneity: Used with a single categorical variable from two (or more) independent samples. Used to see if the two (or more) populations are the same (homogeneous).

Assumptions & formula remain the same! Expected counts & df are found the same way as in the test for independence. The only change is the hypotheses!

Hypotheses – written in words
H0: the proportions for the two (or more) distributions are the same
Ha: at least one of the proportions for the distributions is different
Be sure to write in context!

The following data are on drinking behavior for independently chosen random samples of male and female students. Does there appear to be a gender difference with respect to drinking behavior? (Note: low = 1-7 drinks/wk, moderate = 8-24 drinks/wk, high = 25 or more drinks/wk)

            Men   Women   Total
None        140     186     326
Low         478     661    1139
Moderate    300     173     473
High         63      16      79
Total       981    1036    2017

Assumptions: Have 2 independent random samples of students. All expected counts are greater than 5.

Expected counts (row total × column total / table total):
            Men   Women
None      158.6   167.4
Low       554.0   585.0
Moderate  230.1   242.9
High       38.4    40.6

H0: the proportions of drinking behaviors are the same for female & male students
Ha: at least one of the proportions of drinking behavior is different for female & male students
P-value ≈ 0 (less than .001), df = 3, α = .05
Since p-value < α, I reject H0. There is sufficient evidence to suggest that drinking behavior is not the same for female & male students.
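Because expected counts and df are found the same way as in the test for independence, the drinking-behavior test can be run with the same steps. The sketch below assumes expected count = row total × column total / table total, so the unequal sample sizes (981 men, 1036 women) are reflected in the expected counts. The function name is illustrative, and the p-value uses a closed form valid only for df = 3.

```python
import math

def expected_table(observed):
    """Expected counts from the margins: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Rows: None, Low, Moderate, High; columns: Men, Women
observed = [[140, 186], [478, 661], [300, 173], [63, 16]]
expected = expected_table(observed)

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (4 - 1) * (2 - 1) = 3

# Upper-tail area of the chi-square distribution, closed form for df = 3 only
half = statistic / 2
p_value = math.erfc(math.sqrt(half)) + 2 * math.sqrt(half / math.pi) * math.exp(-half)

for label, row in zip(["None", "Low", "Moderate", "High"], expected):
    print(f"{label:9s}" + "".join(f"{e:8.1f}" for e in row))
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The statistic is far out in the right tail for df = 3, which is why the p-value rounds to .000.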
