Chi-square test, or χ² test. What if we are interested in seeing if my “crazy” dice are considered “fair”? What can I do?

Presentation transcript:

1 Chi-square test, or χ² test

2 What if we are interested in seeing if my “crazy” dice are considered “fair”? What can I do?

3 Chi-square test: used to test the counts of categorical data. Three types:
– Goodness of fit (univariate)
– Independence (bivariate)
– Homogeneity (univariate with two samples)

4 Chi-square distributions

5 Upper-tail Areas for Chi-square Distributions

6 χ² distribution
– Different df have different curves
– Skewed right
– Cannot take on negative values
– As df increases, the curve shifts toward the right and becomes more like a normal curve
– Each curve has a mode at df − 2 (for df ≥ 2) and a mean at df

7 χ² assumptions
– SRS: a reasonably random sample
– Counts: we have counts of categorical data, and we expect each category to happen at least once
– Sample size: to ensure that the sample size is large enough, we should expect at least five in each category. Be sure to list expected counts!
Combine these together: all expected counts are at least 5.
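The sample-size condition above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names `expected_counts` and `all_expected_at_least` are hypothetical, and the fair-die numbers are purely illustrative.

```python
def expected_counts(n, proportions):
    """Expected count per category: sample size times the hypothesized proportion."""
    return [n * p for p in proportions]


def all_expected_at_least(expected, minimum=5):
    """True when every expected count meets the 'at least 5' condition."""
    return all(e >= minimum for e in expected)


# Illustrative assumption: a fair six-sided die, so each face has proportion 1/6.
fair_die = [1 / 6] * 6

enough = expected_counts(60, fair_die)   # about 10 expected rolls per face
too_few = expected_counts(24, fair_die)  # about 4 expected rolls per face

print(all_expected_at_least(enough))
print(all_expected_at_least(too_few))
```

With 60 rolls the condition holds; with only 24 rolls each face is expected about 4 times, so the chi-square approximation would not be trusted.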

8 Hypotheses, written in words. H₀: the proportions are equal. Hₐ: at least one proportion is not the same. Be sure to write in context!

9 χ² formula: χ² = Σ (observed − expected)² / expected, summed over all categories

10 χ² goodness-of-fit test
– Uses univariate data (one sample, one variable)
– Want to see how well the observed counts “fit” what we expect the counts to be
– Use the χ²cdf function on the calculator to find p-values
– Based on df: df = number of categories − 1

11 Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to claim that successful people are more likely to be born under some signs than others?
Aries   23    Libra        18    Leo       20
Taurus  20    Scorpio      21    Virgo     19
Gemini  18    Sagittarius  19    Aquarius  24
Cancer  23    Capricorn    22    Pisces    29
How many would you expect in each sign if there were no difference between them? I would expect CEOs to be born equally under all signs, so 256/12 ≈ 21.333.
How many degrees of freedom? Since there are 12 signs, df = 12 − 1 = 11.

12 Assumptions: We have a random sample of CEOs. All expected counts are greater than 5. (I expect 21.33 CEOs to be born under each sign.)
H₀: The proportions of CEOs born under each sign are the same.
Hₐ: At least one of the proportions of CEOs born under each sign is different.

13 2.) Compute the residuals (observed − expected).
Sign         Observed  Expected (256/12)  Residual
Aries        23        21.333              1.667
Taurus       20        21.333             -1.333
Gemini       18        21.333             -3.333
Cancer       23        21.333              1.667
Leo          20        21.333             -1.333
Virgo        19        21.333             -2.333
Libra        18        21.333             -3.333
Scorpio      21        21.333             -0.333
Sagittarius  19        21.333             -2.333
Capricorn    22        21.333              0.667
Aquarius     24        21.333              2.667
Pisces       29        21.333              7.667

14 3.) Square the residuals.
Sign         Observed  Expected (256/12)  Residual  Residual²
Aries        23        21.333              1.667     2.778889
Taurus       20        21.333             -1.333     1.776889
Gemini       18        21.333             -3.333    11.108889
Cancer       23        21.333              1.667     2.778889
Leo          20        21.333             -1.333     1.776889
Virgo        19        21.333             -2.333     5.442889
Libra        18        21.333             -3.333    11.108889
Scorpio      21        21.333             -0.333     0.110889
Sagittarius  19        21.333             -2.333     5.442889
Capricorn    22        21.333              0.667     0.444889
Aquarius     24        21.333              2.667     7.112889
Pisces       29        21.333              7.667    58.782889

15 4.) Compute the component for each cell: (observed − expected)² / expected.
Sign         Observed  Expected (256/12)  Residual  Residual²  Component
Aries        23        21.333              1.667     2.778889  0.130262
Taurus       20        21.333             -1.333     1.776889  0.083293
Gemini       18        21.333             -3.333    11.108889  0.520737
Cancer       23        21.333              1.667     2.778889  0.130262
Leo          20        21.333             -1.333     1.776889  0.083293
Virgo        19        21.333             -2.333     5.442889  0.255139
Libra        18        21.333             -3.333    11.108889  0.520737
Scorpio      21        21.333             -0.333     0.110889  0.005198
Sagittarius  19        21.333             -2.333     5.442889  0.255139
Capricorn    22        21.333              0.667     0.444889  0.020854
Aquarius     24        21.333              2.667     7.112889  0.333422
Pisces       29        21.333              7.667    58.782889  2.755491

16 5.) Find the sum of the components (that’s the chi-square statistic).
Sign         Observed  Expected (256/12)  Residual  Residual²  Component
Aries        23        21.333              1.667     2.778889  0.130262
Taurus       20        21.333             -1.333     1.776889  0.083293
Gemini       18        21.333             -3.333    11.108889  0.520737
Cancer       23        21.333              1.667     2.778889  0.130262
Leo          20        21.333             -1.333     1.776889  0.083293
Virgo        19        21.333             -2.333     5.442889  0.255139
Libra        18        21.333             -3.333    11.108889  0.520737
Scorpio      21        21.333             -0.333     0.110889  0.005198
Sagittarius  19        21.333             -2.333     5.442889  0.255139
Capricorn    22        21.333              0.667     0.444889  0.020854
Aquarius     24        21.333              2.667     7.112889  0.333422
Pisces       29        21.333              7.667    58.782889  2.755491
Σ = 5.094
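The residual, squaring, component, and summing steps can be sketched in a few lines of Python. This is an illustration rather than part of the slides; the variable names are hypothetical, and the counts are the zodiac counts from the table. The expected value is kept unrounded here, so intermediate digits can differ slightly from the table, which rounds 256/12 to 21.333.

```python
# Observed zodiac counts for the 256 CEOs, as listed in the slides.
observed = {
    "Aries": 23, "Taurus": 20, "Gemini": 18, "Cancer": 23,
    "Leo": 20, "Virgo": 19, "Libra": 18, "Scorpio": 21,
    "Sagittarius": 19, "Capricorn": 22, "Aquarius": 24, "Pisces": 29,
}

n = sum(observed.values())    # total number of CEOs
expected = n / len(observed)  # same expected count for every sign under H0

rows = []
for sign, count in observed.items():
    residual = count - expected           # step 2: observed - expected
    squared = residual ** 2               # step 3: square the residual
    component = squared / expected        # step 4: divide by expected
    rows.append((sign, count, residual, squared, component))

chi_square = sum(row[4] for row in rows)  # step 5: sum the components
df = len(observed) - 1                    # number of categories - 1

for sign, count, residual, squared, component in rows:
    print(f"{sign:<12}{count:>4}{residual:>9.3f}{squared:>11.3f}{component:>10.4f}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
```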

17 P-value = χ²cdf(5.094, 10^99, 11) = .9265; α = .05. Since p-value > α, I fail to reject H₀. There is not sufficient evidence to suggest that the CEOs are born under some signs more than under others.
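For readers without the calculator, the upper-tail area that χ²cdf(5.094, 10^99, 11) returns can be approximated with Python's standard library. The sketch below is an assumption-laden illustration, not the calculator's algorithm: `chi2_upper_tail` is a hypothetical helper that works only for positive integer df, using the exact starting values for df = 1 and df = 2 and a standard recurrence for the upper incomplete gamma function.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-h)              # exact tail for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(h))   # exact tail for df = 1
    # Each pass raises the degrees of freedom by 2 until a reaches df / 2.
    while a < df / 2.0:
        tail += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return tail


p_value = chi2_upper_tail(5.094, 11)
alpha = 0.05
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The result agrees with the .9265 reported on the slide, so the decision is the same: fail to reject H₀.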

18 Let’s test our dice!

19 Suppose we roll a six-sided die 60 times and obtain the outcomes recorded in the table. Do you think this die is fair?
Outcome  Tallies  Observed rolls  Expected value (60/6)  (Observed − expected)²  (Observed − expected)² / expected
1                                 10
2
3
4
5
6
Totals            60                                                              Σ =
Assumptions: SRS. All expected counts are > 5.
Hypotheses: H₀: The proportion of rolls is the same for each number. Hₐ: The proportion of rolls is not the same for each number.
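The table is left blank for real rolls. As a stand-in, the following Python sketch simulates 60 rolls and fills in the same columns. Everything here is hypothetical: the rolls are pseudo-random rather than data from the slides, `roll_and_test` and the seed are arbitrary choices, and 11.07 is the standard table value for an upper-tail area of .05 with df = 5.

```python
import random
from collections import Counter

# Standard chi-square table value: df = 5, upper-tail area .05.
CRITICAL_VALUE = 11.07


def roll_and_test(n_rolls=60, seed=1):
    """Simulate die rolls (not real data) and return counts and the statistic."""
    rng = random.Random(seed)
    tallies = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    observed = [tallies[face] for face in range(1, 7)]  # missing faces count as 0
    expected = n_rolls / 6                               # 10 per face for 60 rolls
    chi_square = sum((o - expected) ** 2 / expected for o in observed)
    return observed, chi_square


observed, chi_square = roll_and_test()
print("observed rolls:", observed)
print(f"chi-square = {chi_square:.3f}")
if chi_square > CRITICAL_VALUE:
    print("reject H0: evidence that the proportions differ")
else:
    print("fail to reject H0: no evidence that the die is unfair")
```

Replacing the simulated `observed` list with tallies from an actual die gives the worksheet's answer.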

20 “Since the p-value < (>) α, I reject (fail to reject) H₀. There is (is not) sufficient evidence to suggest that Hₐ.” Be sure to write Hₐ in context (words)! P-value = χ²cdf(χ², 10^99, df)

21 Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short). A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively. What are the expected counts? df? Are the results consistent with the theoretical distribution predicted by the genetic model? (see next page)
Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25. For example, we expect 9/16 of the 100 flies to have yellow bodies and normal wings (Y & N).
Since there are 4 categories, df = 4 − 1 = 3.

22 Assumptions: We have a random sample of fruit flies. All expected counts are greater than 5. Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
H₀: The proportions of fruit flies are the same as the theoretical model.
Hₐ: At least one of the proportions of fruit flies is not the same as the theoretical model.
P-value = χ²cdf(5.671, 10^99, 3) = .129; α = .05. Since p-value > α, I fail to reject H₀. There is not sufficient evidence to suggest that the distribution of fruit flies is not the same as the theoretical model.
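The fruit-fly numbers can be reproduced from the 9:3:3:1 ratio. This Python sketch is illustrative only: the variable names are hypothetical, and the p-value uses the closed-form upper tail that holds specifically for df = 3, not a general-purpose routine.

```python
import math

ratio = [9, 3, 3, 1]         # yellow & normal, yellow & short, ebony & normal, ebony & short
observed = [59, 20, 11, 10]  # counts found by the researcher

n = sum(observed)            # 100 flies
expected = [n * r / sum(ratio) for r in ratio]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Upper-tail area for df = 3 only: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

print("expected counts:", expected)
print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3f}")
```

The largest contribution comes from the ebony & normal group (11 observed against 18.75 expected), yet the total is still too small to reject H₀ at α = .05.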

23 A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g of Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g of hazelnuts, and 446 g of peanuts. You wonder whether your mix is significantly different from what the company advertises.
Why is the chi-square goodness-of-fit test NOT appropriate here? Because we do NOT have counts of each type of nut; we have weights.
What might you do instead of weighing the nuts in order to use chi-square? We could count the number of each type of nut and then perform a χ² test.

24 Example: Does the color of a car influence the chance that it will be stolen? Of 830 cars reported stolen, 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. It is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors.
Category  Color  Observed  Expected
1         White  140       .15 × 830 = 124.5
2         Blue   100       .15 × 830 = 124.5
3         Red    270       .35 × 830 = 290.5
4         Black  230       .30 × 830 = 249
5         Other   90       .05 × 830 = 41.5

25
Category  Color  Observed  Expected
1         White  140       124.5
2         Blue   100       124.5
3         Red    270       290.5
4         Black  230       249
5         Other   90       41.5
Let π₁, π₂, ..., π₅ denote the true proportions of stolen cars that fall into the 5 color categories.
H₀: π₁ = .15, π₂ = .15, π₃ = .35, π₄ = .30, π₅ = .05
Hₐ: H₀ is not true.
α = .01
Test statistic: χ² = Σ (observed − expected)² / expected
Assumptions: The sample was a random sample of stolen cars. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.

26 Calculations: χ² = 1.93 + 4.82 + 1.45 + 1.45 + 56.68 = 66.33
P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with 4 df. The computed value is larger than 18.46, so P-value < .001.
Because P-value < α, H₀ is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.
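The five components and their total can be checked with a short Python sketch. It is an illustration under stated assumptions: the names are hypothetical, and the p-value line uses the closed form that holds specifically for df = 4 (any even df has a similar finite sum).

```python
import math

colors = ["White", "Blue", "Red", "Black", "Other"]
observed = [140, 100, 270, 230, 90]
hypothesized = [0.15, 0.15, 0.35, 0.30, 0.05]  # proportions among all cars

n = sum(observed)                              # 830 stolen cars
expected = [n * p for p in hypothesized]
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)

# Upper-tail area for df = 4 only: exp(-x/2) * (1 + x/2).
half = chi_square / 2
p_value = math.exp(-half) * (1 + half)

for color, e, c in zip(colors, expected, components):
    print(f"{color:<6} expected {e:7.1f}  component {c:6.2f}")
print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.2e}")
print("reject H0" if p_value < 0.01 else "fail to reject H0")
```

Almost all of the statistic comes from the “Other” category, where 90 thefts were observed against 41.5 expected.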

