Chi-square test or χ² test


What if we are interested in seeing if my “crazy” dice are considered “fair”? What can I do?

Chi-square test
Used to test the counts of categorical data.
Three types:
- Goodness of fit (univariate)
- Independence (bivariate)
- Homogeneity (univariate with two samples)

χ² distribution [figure: density curves for df = 3, df = 5, and df = 10]

χ² distribution
- Different df have different curves
- Skewed right
- As df increases, the curve shifts toward the right and becomes more like a normal curve
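The shape change can be checked numerically. The sketch below relies on standard facts that are not on the slide: a χ² distribution with df degrees of freedom has mean df, variance 2·df, and skewness sqrt(8/df). The function name `chi2_shape` is made up for illustration.

```python
import math

def chi2_shape(df):
    """Return (mean, variance, skewness) of a chi-square distribution with df degrees of freedom."""
    return df, 2 * df, math.sqrt(8 / df)

# The three curves shown on the slide: df = 3, 5, 10
for df in (3, 5, 10):
    mean, var, skew = chi2_shape(df)
    print(f"df={df:2d}  mean={mean}  variance={var}  skewness={skew:.3f}")
```

The mean grows with df (the curve shifts right) while the skewness shrinks toward 0 (the curve looks more normal).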

χ² assumptions
- SRS: a reasonably random sample
- Have counts of categorical data, and we expect each category to happen at least once
- Sample size: to ensure that the sample is large enough, we should expect at least five in each category
***Be sure to list expected counts!!

χ² formula
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
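The χ² statistic adds up (observed - expected)² / expected over every category. A minimal sketch of that arithmetic follows; the function name and the two-category counts are made up purely to show the calculation and are not from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, for illustration only:
# (12 - 10)^2 / 10 + (8 - 10)^2 / 10 = 0.4 + 0.4
print(chi_square_statistic([12, 8], [10, 10]))
```

When the observed counts match the expected counts exactly, the statistic is 0; larger values mean a worse fit.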

χ² goodness of fit test
- Based on df: df = number of categories - 1
- Uses univariate data
- Want to see how well the observed counts “fit” what we expect the counts to be
- Use the χ²cdf function on the calculator to find p-values
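On the calculator, χ²cdf(lower, upper, df) gives the area under the χ² curve between the two bounds, so using 10^99 as the upper bound gives the right-tail p-value. The sketch below shows one way to get the same tail area with only the Python standard library. It assumes df is a positive whole number and uses a standard recurrence for the upper incomplete gamma function; the name `chi2_sf` is a hypothetical helper, not a calculator or library function.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square distribution with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    if x < 0:
        raise ValueError("x must be non-negative")
    y = x / 2.0
    # Starting values: Q(1, y) = e^-y for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * e^-y / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# 3.841 is the usual table critical value for df = 1 at the 5% level
print(round(chi2_sf(3.841, 1), 3))
```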

Hypotheses – written in words
H0: the observed counts equal the expected counts
Ha: the observed counts are not equal to the expected counts
Be sure to write in context!

Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to claim that successful people are more likely to be born under some signs than others?

Aries 23     Libra 18        Leo 20
Taurus 20    Scorpio 21      Virgo 19
Gemini 18    Sagittarius 19  Aquarius 24
Cancer 23    Capricorn 22    Pisces 29

How many would you expect in each sign if there were no difference between them? How many degrees of freedom?
I would expect CEOs to be born equally under all signs, so the expected count for each sign is 256/12 = 21.333.
Since there are 12 signs, df = 12 – 1 = 11.

Assumptions:
- Have a random sample of CEOs
- All expected counts are greater than 5. (I expect 21.33 CEOs to be born under each sign.)
H0: The number of CEOs born under each sign is the same.
Ha: The number of CEOs born under each sign is not the same.
P-value = χ²cdf(5.094, 10^99, 11) = .9265
α = .05
Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that CEOs are more likely to be born under some signs than others.
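The zodiac calculation can be reproduced step by step. The sketch below uses the counts from the slide and should land close to the slide's values (χ² of about 5.094 and a p-value of about .9265). The helper `chi2_sf` is a hypothetical stand-in for the calculator's χ²cdf with an upper bound of 10^99, and it assumes a whole-number df.

```python
import math

observed = {
    "Aries": 23, "Libra": 18, "Leo": 20, "Taurus": 20,
    "Scorpio": 21, "Virgo": 19, "Gemini": 18, "Sagittarius": 19,
    "Aquarius": 24, "Cancer": 23, "Capricorn": 22, "Pisces": 29,
}

n = sum(observed.values())        # total number of CEOs
expected = n / len(observed)      # equal share for each of the 12 signs
df = len(observed) - 1
stat = sum((o - expected) ** 2 / expected for o in observed.values())

def chi2_sf(x, df):
    """Right-tail area of a chi-square distribution (integer df), standard library only."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

p_value = chi2_sf(stat, df)
print(f"n={n}, expected per sign={expected:.2f}, df={df}")
print(f"chi-square={stat:.3f}, p-value={p_value:.4f}")
```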

A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g of Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g of hazelnuts, and 446 g of peanuts. You wonder whether your mix is significantly different from what the company advertises.
Why is the chi-square goodness-of-fit test NOT appropriate here? What might you do instead of weighing the nuts in order to use chi-square?
Because we do NOT have counts of the type of nuts. We could count the number of each type of nut and then perform a χ² test.

Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short). A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively.
What are the expected counts? df? Are the results consistent with the theoretical distribution predicted by the genetic model? (see next page)
Since there are 4 categories, df = 4 – 1 = 3.
Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
We expect 9/16 of the 100 flies to have yellow bodies and normal wings (Y & N).

Assumptions:
- Have a random sample of fruit flies
- All expected counts are greater than 5.
Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
H0: The distribution of fruit flies is the same as the theoretical model.
Ha: The distribution of fruit flies is not the same as the theoretical model.
P-value = χ²cdf(5.671, 10^99, 3) = .129
α = .05
Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the distribution of fruit flies is not the same as the theoretical model.
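The expected counts and the test statistic for the fruit flies follow directly from the 9:3:3:1 ratio. The sketch below reproduces them from the slide's numbers. Instead of a p-value it compares the statistic with 7.815, the usual table critical value for df = 3 at α = .05; that value is not on the slides and is added here as an assumption for the comparison.

```python
ratio = [9, 3, 3, 1]            # Y & N, Y & S, E & N, E & S
observed = [59, 20, 11, 10]

n = sum(observed)               # 100 flies
expected = [n * r / sum(ratio) for r in ratio]
df = len(observed) - 1
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_05_DF3 = 7.815         # table value for alpha = .05, df = 3 (not from the slides)

print("expected counts:", expected)
print(f"chi-square={stat:.3f}, df={df}")
print("reject H0" if stat > CRITICAL_05_DF3 else "fail to reject H0")
```

The statistic falls below the critical value, which agrees with the slide's conclusion from the p-value.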

χ² test for independence
- Used with categorical, bivariate data from ONE sample
- Used to see if the two categorical variables are associated (dependent) or not associated (independent)

Assumptions & formula remain the same!

Hypotheses – written in words
H0: the two variables are independent
Ha: the two variables are dependent
Be sure to write in context!

A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.

If beef preference is independent of geographic region, how would we expect this table to be filled in?

         North   South   Total
Cut A       90      60     150
Cut B      165     110     275
Cut C       45      30      75
Total      300     200     500

Expected Counts
Assuming H0 is true,
$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
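Under independence, each expected cell count is (row total × column total) / table total. The sketch below applies that rule to the beef example's totals; the variable names are illustrative.

```python
row_totals = {"Cut A": 150, "Cut B": 275, "Cut C": 75}
col_totals = {"North": 300, "South": 200}
n = 500  # total customers in the sample

# expected count = row total * column total / table total
expected = {
    cut: {region: r * c / n for region, c in col_totals.items()}
    for cut, r in row_totals.items()
}

for cut, cells in expected.items():
    print(cut, cells)
```

For example, Cut A in the North is 150 × 300 / 500 = 90.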

Degrees of freedom
$df = (r - 1)(c - 1)$, where r is the number of rows and c is the number of columns.
Or cover up one row & one column & count the number of cells remaining!

Now suppose that in the actual sample of 500 consumers the observed numbers were as follows: (on your paper)
Is there sufficient evidence to suggest that geographic regions and beef preference are not independent? (Is there a difference between the expected and observed counts?)

Assumptions:
- Have a random sample of people
- All expected counts are greater than 5.
Expected counts:
       N     S
A     90    60
B    165   110
C     45    30
H0: geographic region and beef preference are independent
Ha: geographic region and beef preference are dependent
P-value = .0226, df = 2
α = .05
Since p-value < α, I reject H0. There is sufficient evidence to suggest that geographic region and beef preference are dependent.

χ² test for homogeneity
- Used with a single categorical variable from two (or more) independent samples
- Used to see if the two populations are the same (homogeneous)

Assumptions & formula remain the same! Expected counts & df are found the same way as test for independence. Only change is the hypotheses!
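Because independence and homogeneity share the same arithmetic, one routine can serve both tests. The sketch below computes the χ² statistic and df for any two-way table of counts; the function name is made up for illustration. The observed tables from the slides are not reproduced in this transcript, so the routine is run on the beef example's expected table, where observed equals expected and the statistic should come out as 0.

```python
def chi_square_two_way(table):
    """Return (chi-square statistic, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # expected count under H0
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Beef example, expected counts (rows: cuts A, B, C; columns: North, South)
beef_expected = [[90, 60], [165, 110], [45, 30]]
stat, df = chi_square_two_way(beef_expected)
print(stat, df)
```

A table with 4 rows and 2 columns, like the drinking-behavior data, would give df = (4 - 1)(2 - 1) = 3.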

Hypotheses – written in words
H0: the two (or more) distributions are the same
Ha: the distributions are different
Be sure to write in context!

The following data is on drinking behavior for independently chosen random samples of male and female students. Does there appear to be a gender difference with respect to drinking behavior? (Note: low = 1-7 drinks/wk, moderate = 8-24 drinks/wk, high = 25 or more drinks/wk)

Assumptions:
- Have 2 independent random samples of students
- All expected counts are greater than 5.
Expected counts:
               M       F
0          158.6   167.4
Low        554.0   585.0
Moderate   230.1   243.0
High        38.4    40.6
H0: drinking behavior is the same for female & male students
Ha: drinking behavior is not the same for female & male students
P-value ≈ .000, df = 3
α = .05
Since p-value < α, I reject H0. There is sufficient evidence to suggest that drinking behavior is not the same for female & male students.