Chapter 14 Inference for Distribution of Categorical Variables: Chi-Squared Procedures.


1 Chapter 14 Inference for Distribution of Categorical Variables: Chi-Squared Procedures

2 Chapter Objectives
– Understand and conduct chi-square tests: goodness of fit, homogeneity of populations, association/independence
– Given a two-way table, compute conditional distributions
– Use technology to conduct a chi-square significance test

3 Introduction
– The chi-square goodness of fit test allows us to determine whether a specified population distribution seems valid.
– We can compare two or more population proportions using a chi-square test for homogeneity of populations.
– The chi-square test of association/independence uses the information in a two-way table to determine whether the distribution of one variable has been influenced by another variable.

4 Questions that Chi-Square can answer:
1. Are you more likely to have a car accident when using your cell phone?
2. Does our distribution of M&M colors match the distribution stated by the Mars company?
3. Is there an association between a franchise firm having an exclusive territory and the success of the business?

5 14.1 Test for Goodness of Fit Suppose you want to know how likely it is that you get only 2 red M&M's in a bag of 52. You could conduct the z-test described in Chapter 12, which would tell you how likely this one proportion would be. However, if you wanted to look at all six sample proportions (one for each color), conducting 6 different tests of significance would be quite inefficient.

6 Chi-Square can help! The chi-square test for goodness of fit can be used to see whether the observed sample distribution is significantly different from the hypothesized population distribution. In this example, we can test all 6 color proportions at the same time to see if there is a significant difference.


8 M&M Activity Population of interest: M&M candies. H₀: The distribution of candy colors is as given by Mars. Hₐ: At least one of the proportions is different from the stated distribution. Complete activity.

9 Car Collisions While On Cell Phones Are car collisions while on cell phones equally likely for each day of the week? A study of 699 drivers who were using a cell phone while involved in a crash examined this question.

10 Car Collisions ex. cont'd. H₀: Car accidents involving cell phone use are equally likely on each day of the week: p_Sunday = p_Monday = … = p_Saturday = 1/7. Hₐ: The probability of a motor vehicle accident involving a cell phone varies from day to day (at least one of the proportions differs from 1/7).

11 Car Collisions Data
DAY                    Sun  Mon  Tue  Wed  Thu  Fri  Sat
NUMBER of collisions    20  133  126  159  136  113   12
In this table, it is clear that the number of collisions is NOT equally distributed across the days. The chart on the original slide makes the same point, pairing each day's expected count (about 100) with its observed count.

12 Car Collision Statistics
DAY         Observed   Expected               (Observed − Expected)²/Expected
Sunday          20     699 × 1/7 = 99.857     63.86
Monday         133     99.857                 11.00
Tuesday        126     99.857                  6.84
Wednesday      159     99.857                 35.03
Thursday       136     99.857                 13.08
Friday         113     99.857                  1.73
Saturday        12     99.857                 77.30
Total                                         208.84
χ² = 208.84
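The arithmetic in this table can be reproduced with a short script. This is only a sketch using the observed counts from the slide; the variable names are illustrative, and the equal 1/7 probabilities come from the null hypothesis.

```python
# Observed collision counts, Sunday through Saturday (from the slide).
observed = [20, 133, 126, 159, 136, 113, 12]
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

n = sum(observed)         # total number of drivers in the study
expected = [n / 7] * 7    # under H0 each day has probability 1/7

# Each day's contribution (O - E)^2 / E; their sum is the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for day, c in zip(days, contributions):
    print(f"{day}: {c:.2f}")
print(f"chi-square = {chi_square:.2f}")
```

Working from unrounded terms gives about 208.85; the slide's 208.84 comes from adding the already-rounded contributions.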

13 Car Collision Are the conditions met? All expected counts at least 1? Yes. No more than 20% less than 5? Yes. What are the degrees of freedom? 7 – 1 = 6. The p-value represents the probability, assuming H₀ is true, of observing a value of χ² at least as extreme as the one actually observed.
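The two expected-count conditions can be checked mechanically. The helper below is a hypothetical sketch (its name and structure are not from the slides) that applies the "all at least 1, no more than 20% below 5" rule to the car collision expected counts.

```python
def expected_counts_ok(expected):
    """Return True if every count is >= 1 and at most 20% of counts are below 5."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return all_at_least_one and share_below_five <= 0.20

# Car collision example: 699 crashes spread evenly over 7 days under H0.
expected = [699 / 7] * 7
conditions_met = expected_counts_ok(expected)

# Degrees of freedom for goodness of fit: number of categories minus 1.
df = len(expected) - 1

print(conditions_met, df)
```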

14 P-value of Car Collision For our example, with 6 degrees of freedom, a significance level of .05 requires a critical value of χ² = 12.59. Even at a significance level of .0005 we would need a χ² of only 24.10. Since our χ² is 208.84, far larger than the critical value at any of these significance levels, we have enough evidence to reject H₀. In other words, the difference in the distribution of collisions per day is statistically significant.
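The table lookups above can be cross-checked numerically. For an even number of degrees of freedom the upper-tail chi-square probability has a closed form, P(χ² > x) = e^(−x/2) · Σ (x/2)^k / k! for k = 0 to df/2 − 1. The sketch below uses that identity with only the standard library; the function name is illustrative, and it deliberately does not handle odd df.

```python
import math

def chi_square_upper_tail_even_df(x, df):
    """P(chi-square > x) for positive even df, via the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    term, total = 1.0, 1.0          # k = 0 term of the sum
    for k in range(1, df // 2):
        term *= half / k            # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

# Tail areas at the two critical values quoted on the slide (df = 6).
p_at_12_59 = chi_square_upper_tail_even_df(12.59, 6)
p_at_24_10 = chi_square_upper_tail_even_df(24.10, 6)

# P-value for the observed statistic.
p_value = chi_square_upper_tail_even_df(208.84, 6)

print(p_at_12_59, p_at_24_10, p_value)
```

The first two values come out close to .05 and .0005, matching the table, and the p-value for 208.84 is effectively zero.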

15 χ² on the Graphing Calculator The Goodness of Fit Test appears on the TI-84, not the TI-83. For the TI-84:
– Enter observed values in L1 and expected values in L2.
– Stat → Test → χ²GOF-Test, with Observed: L1 and Expected: L2
– Enter df
– Calculate
This will give the χ² test statistic and the p-value.

16 χ² on the Graphing Calculator For the TI-83:
– Enter observed values in L1 and expected values in L2.
– Define L3 as (L1 − L2)²/L2
– 2nd List → MATH → sum(L3)
This will give the χ² test statistic. Use the table to determine the p-value, or use χ²cdf(χ², upper bound, df).

17 Chi-Square Properties
– The total area under a chi-square curve is 1.
– Each curve (except for df = 1 and df = 2, which are highest at 0 and only decrease) begins at 0, increases to a peak, then approaches the horizontal axis asymptotically from above.
– Each chi-square curve is skewed right; as the df increases, the curve becomes more and more symmetrical and appears more Normal.

18 Goodness of Fit Tests Notes
– In the chi-square test for goodness of fit, we test the null hypothesis that a categorical variable has a specified distribution.
– If we find significance, we can conclude that our variable has a distribution different from the specified one.
– These tests do not specifically tell us which categories have the greatest differences.
– In our car collisions example, which days have the largest difference in observed vs. expected values? Saturday and Sunday
HW: p846 #14.3, 10

19 14.2 Inference for Two-Way Tables Goodness of fit tests work for one-way tables; for two-way tables we need a new approach. We could conduct multiple proportion tests and compare the results, but that would be inefficient and also a bit unclear. The Chi-Square Test for Homogeneity of Populations makes this much simpler.

20 χ² Tests for Homogeneity of Populations We are testing one categorical variable measured across two or more populations. H₀: the distribution is the same across all of the populations being compared. Hₐ: the distribution is not the same in all of the populations.


22 Conditions
1. Data must come from independent SRSs from the populations of interest.
2. All expected counts are at least 1 and no more than 20% are less than 5.
Once again we will be using a four-step process.

23 Does background music influence wine purchases? 3 types of music (None, French, and Italian) were played at random times and the counts of 3 types of wine sold (French, Italian and Other) were taken to see if there are any significant differences based on type of music.

24 Wine example In this example, we have three different populations:
pop 1: bottles of wine sold when no music is playing
pop 2: bottles of wine sold when French music is playing
pop 3: bottles of wine sold when Italian music is playing
Variable of interest: the distribution of types of wine selected for each music type

25 Wine example H₀: The proportions of each wine type sold are the same in all three populations; that is, the distribution of wine selected is the same for all three music types. Hₐ: The distributions of wine types are not all the same.

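26 Calculating Chi-Square χ² = Σ (observed − expected)² / expected, where the sum runs over every cell of the two-way table and the expected counts are found by expected count = (row total × column total) / table total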
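A minimal sketch of this calculation for a two-way table follows. The 2 × 2 counts are hypothetical, chosen only to make the arithmetic easy to follow (they are not the wine data), and the function names are illustrative. Each expected count is (row total × column total) / table total.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical observed counts, for illustration only.
hypothetical_table = [[10, 20],
                      [30, 40]]

expected = expected_counts(hypothetical_table)
chi_square = chi_square_statistic(hypothetical_table)

print(expected)
print(round(chi_square, 4))
```

For this hypothetical table the expected counts are 12, 18, 28 and 42, so every cell is off by 2 and the statistic is 4/12 + 4/18 + 4/28 + 4/42.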

27 Calculate the expected counts Note: expected counts need not be whole numbers

28 Calculations The calculator will compute the results for you, but you do need to demonstrate that you understand the formula. df = (r – 1) × (c – 1) = (3 – 1) × (3 – 1) = 4

29 P-value and conclusion P(χ² > 18.28) for df = 4 is between 0.001 and 0.0025. Conclusion: We have strong evidence to reject H₀ and conclude that the distribution of wine types sold differs with the type of music being played.
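As a quick check on the table lookup, the tail area for df = 4 can be computed directly, since for 4 degrees of freedom P(χ² > x) = e^(−x/2) · (1 + x/2). The sketch below takes the slide's χ² = 18.28 as given and derives df from the 3 × 3 table shape; the variable names are illustrative.

```python
import math

rows, cols = 3, 3                  # three wine types by three music types
df = (rows - 1) * (cols - 1)       # degrees of freedom for a two-way table

chi_square = 18.28                 # test statistic quoted on the slide

# Closed form for the upper tail of a chi-square distribution, valid for df = 4 only.
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

print(df, p_value)
```

The result lands just above 0.001, consistent with the bracket read from the table.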

30 Using your calculator
– Enter the values from your two-way table into Matrix A (in this case a 3 × 3).
– Perform a chi-square test: [Stat] > [Test] > [χ²-Test]
– Make sure Matrix A is the Observed matrix and Matrix B is the Expected matrix, then hit Calculate.
– Notice that Matrix B now contains your expected values, so you can see where, if at all, major deviations occurred.

31 Caution The chi-square test only confirms that there is some relationship. It does not in itself tell us what population our conclusion describes. If the study was done in one market on a Saturday, the results may apply only to Saturday shoppers at this market. HW: pg. 855 #14.11, 14.15, 14.17

32 Chi-Square Test of Association/Independence Two categorical variables are measured across a single population. We draw a single sample and break it down into categories.

33 Example: Franchises that succeed
– A franchise is a business that has multiple locations owned by individuals, rather than by the parent company (think Applebee's, Home Depot, Dunkin' Donuts, etc.).
– The franchisee pays money to the parent company in exchange for an established brand.
– Some franchises offer exclusive-territory clauses in their contracts, which state that the franchisee will be the only representative of the parent company in a specified area.

34 How does the presence of an exclusive-territory clause in the contract relate to the survival of the business? A study designed to address this question collected data from a sample of 170 new franchise firms. Two categorical variables were measured for each firm. First: whether the firm was successful, based on whether it was still franchising as of a certain date. Second: whether or not the firm had an exclusive-territory clause.

35 Performing the Test Here are the results: H₀: There is no association between success and exclusive territory. Hₐ: There is an association between success and exclusive territory.

36 Conditions The data come from an SRS. We must check the expected counts. Since all expected counts are at least 5, our conditions are met.

37 Calculations df = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1 P-value: between 0.01 and 0.02
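The p-value bracket for df = 1 can also be verified without a table. With one degree of freedom, χ² is the square of a standard Normal variable, so P(χ² > x) = erfc(√(x/2)). The sketch below takes χ² = 5.91, the value reported in the interpretation of this example, as given; the variable names are illustrative.

```python
import math

rows, cols = 2, 2                  # success (yes/no) by exclusive territory (yes/no)
df = (rows - 1) * (cols - 1)

chi_square = 5.91                  # test statistic reported for the franchise data

# Upper-tail probability for df = 1 only, using the Normal relationship.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(df, p_value)
```

The computed value is roughly 0.015, inside the 0.01 to 0.02 range given on the slide.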

38 Interpretation We have sufficient evidence ( χ 2 = 5.91, df = 1, 0.01 < P < 0.02) of an association between success and an exclusive territory in the population of franchisees. pg. 874 #14.21-23

39 What type of chi-square test should you use?
– Are the proportions of employees of different races represented at a company the same as in the population? GOF Test
– A particular company has two offices, one in New York and one in Minneapolis. Are the proportions of employees of different races the same at each office? Homogeneity of Populations

40 Which type of chi-square test should you use?
– Does the level of education received affect whether you will become a millionaire or not? Association/Independence
pg. 876 #14.26, 14.29, 14.39

