
1 Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 18 Inference for Counts

2 18.1 Chi-Squared Tests
Retailers can customize the online shopping experience by learning more about their customers. For example, Amazon wants to know whether income level affects what shoppers look for (camera or phone) when they visit electronics.
- Use a chi-squared test of independence to answer this question.

3 18.1 Chi-Squared Tests
Contingency Table: Purchase Category vs. Household Income (555 visitors to Amazon)

4 18.1 Chi-Squared Tests: Observations from Contingency Table
- Association is evident, suggesting that income and choice of product are dependent.
- Households with lower incomes seem more likely to purchase a phone; those with higher incomes, a camera.
- Are these differences in purchase rates merely the result of sampling variation?

5 18.2 Test of Independence
Chi-squared test of independence: tests the independence of two categorical variables using counts in a contingency table.

6 18.2 Test of Independence: Hypotheses for the chi-squared test
$H_0$: Household Income and Purchase Category are independent.
$H_a$: Household Income and Purchase Category are not independent.
Or, writing $p$ for the conditional probability of a purchase type within each income segment:
$H_0: p_{25} = p_{50} = p_{75} = p_{100} = p_{100+}$
$H_a$: $p_{25}, p_{50}, p_{75}, p_{100}, p_{100+}$ are not all equal.

7 18.2 Test of Independence: Hypotheses for the chi-squared test
- The null hypothesis describes five segments of the population defined by household income.
- The null assumes that the conditional probabilities of purchase type given income level are equal across the five segments.
- The alternative hypothesis is vague; it does not indicate why the null is false.

8 18.2 Test of Independence: Calculating $\chi^2$
- Measures the distance between the observed contingency table and a hypothetical contingency table.
- The hypothetical contingency table obeys $H_0$ while being consistent with the observed marginal counts.

9 18.2 Test of Independence: Calculating $\chi^2$
The null hypothesis determines the expected cell counts in the hypothetical table.

10 18.2 Test of Independence: Calculating $\chi^2$
Accumulates the squared deviations between the observed and expected counts (in the hypothetical table), each divided by the expected count, across all cells:
$\chi^2 = \sum_{\text{cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
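The calculation described on slides 8 to 10 can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's code: the function names and the 2 x 3 table of counts are hypothetical (the Amazon table itself is not reproduced in this transcript). Under independence, each expected count is taken as row total times column total divided by the grand total, which keeps the hypothetical table consistent with the observed margins.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 3 table of counts (NOT the Amazon data from the slides).
table = [
    [30, 20, 10],
    [20, 30, 40],
]
print(expected_counts(table))
print(round(chi_squared(table), 3))
```

The expected table has the same row and column totals as the observed one, so the statistic measures only the departure from independence.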

11 18.2 Test of Independence: Calculating $\chi^2$
For the retail data on purchase category and household income, the chi-squared statistic is $\chi^2 = 33.925$.

12 18.2 Test of Independence: Plots of the chi-squared test
Mosaic Plot for Retail Data

13 18.2 Test of Independence: Plots of the chi-squared test
Mosaic Plot for Independent Variables

14 18.2 Test of Independence: Conditions
- No lurking explanation for association.
- Data are random samples from indicated segments of the population.
- Categories defining the table are mutually exclusive.
- Expected cell counts are not too small.

15 18.2 Test of Independence: The chi-squared distribution
- Sampling distribution of the chi-squared statistic if the null hypothesis is true.
- Right-skewed.
- Assigns probabilities to positive values only.
- Identified by its degrees of freedom (df).
- Approaches a normal distribution as df increases.

16 18.2 Test of Independence: The chi-squared distribution

17 18.2 Test of Independence: Getting the p-value
The df for the $\chi^2$ test of independence are based on the size of the contingency table:
df = $(r - 1)(c - 1)$, where $r$ = number of rows and $c$ = number of columns.

18 18.2 Test of Independence: Getting the p-value, Retail Example
Observed $\chi^2 = 33.925$ with 4 df.
From the $\chi^2$ table, $P(\chi^2 > 9.4877) = 0.05$; since $33.925 > 9.4877$, we can reject $H_0$ at $\alpha = 0.05$.
The p-value is therefore less than 0.05; the exact p-value is about 0.0000008.
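The p-values quoted in the retail example can be checked without a table. The sketch below is an illustration under one stated restriction: it uses a closed form for the upper tail of the chi-squared distribution that holds only when df is even, which covers the 4 df here. The function name is hypothetical; a general-purpose alternative would be a library routine such as `scipy.stats.chi2.sf`.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-squared with `df` degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Values taken from the slide: critical value 9.4877 and observed 33.925, both with 4 df.
print(chi2_upper_tail_even_df(9.4877, 4))
print(chi2_upper_tail_even_df(33.925, 4))
```

The first call should land very close to 0.05 and the second near the tiny p-value reported on the slide.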

19 18.2 Test of Independence: Getting the p-value, Retail Example

20 18.2 Test of Independence: Summary of the chi-squared test of independence

21 18.2 Test of Independence: Chi-squared test of independence, Checklist
- No obvious lurking variable.
- SRS condition.
- Contingency table condition.
- Sample size condition: expected cell frequencies are at least 10; expected cell frequencies as low as 5 are permitted with at least 4 df.
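The sample size condition in the checklist is mechanical enough to express as a small check. The sketch below assumes the rule exactly as stated on the slide (threshold 10, relaxed to 5 with at least 4 df); the function names and the expected counts are hypothetical.

```python
def degrees_of_freedom(table):
    """df = (r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


def sample_size_condition_met(expected, df):
    """Slide rule: every expected count >= 10, or >= 5 when df >= 4."""
    threshold = 5 if df >= 4 else 10
    return all(e >= threshold for row in expected for e in row)


# Hypothetical expected counts for a 2 x 5 table (not the slide data).
expected = [
    [12.0, 30.0, 40.0, 22.0, 6.0],
    [18.0, 45.0, 60.0, 33.0, 9.0],
]
df = degrees_of_freedom(expected)
print(df, sample_size_condition_met(expected, df))
```

Note that the check is applied to expected counts, not observed counts, so it can only be run after the expected table has been computed.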

22 18.2 Test of Independence: Connection to two-sample tests
- For a 2 × 2 table, the chi-squared test reduces to the two-sided version of the two-sample test of the difference between proportions.
- If the 95% confidence interval for $p_1 - p_2$ does not include zero, then the chi-squared test has a p-value less than 0.05 and $H_0$ is rejected.
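The connection to the two-sample test can be seen numerically: for a 2 x 2 table, the square of the pooled two-proportion z statistic equals the chi-squared statistic. The sketch below is illustrative only; the counts (40 of 100 versus 25 of 100) and the function names are hypothetical.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Two-sample z statistic for p1 - p2 using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def chi_squared_2x2(x1, n1, x2, n2):
    """Chi-squared statistic for the 2 x 2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    stat = 0.0
    for row, row_total in zip(observed, (n1, n2)):
        for o, col_total in zip(row, col_totals):
            e = row_total * col_total / n
            stat += (o - e) ** 2 / e
    return stat


# Hypothetical counts: 40 of 100 in group 1, 25 of 100 in group 2.
z = pooled_z(40, 100, 25, 100)
chi2 = chi_squared_2x2(40, 100, 25, 100)
print(z ** 2, chi2)
```

The identity is exact for the pooled test statistic. A confidence interval for $p_1 - p_2$ is usually built with an unpooled standard error, so its agreement with the chi-squared test can differ slightly in borderline cases.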

23 4M Example 18.1: RETAIL CREDIT
Motivation: Managers of a chain worry that some methods of recruiting customers for store credit, called channels, produce more problems than other channels. Is the channel used related to the status of the customer's account a year later?

24 4M Example 18.1: RETAIL CREDIT
Method: Data were collected for 630 accounts on the variables Channel and Status after 12 months.

25 4M Example 18.1: RETAIL CREDIT
Method: Check Conditions
- No obvious lurking variable. Difficult to check without knowing more about the channels.
- SRS condition reasonably met.
- Contingency table condition satisfied.
- Sample size condition must be checked after computing expected cell frequencies.

26 4M Example 18.1: RETAIL CREDIT
Mechanics: Mosaic Plot

27 4M Example 18.1: RETAIL CREDIT
Mechanics: Expected Counts
Sample size condition satisfied. $\chi^2 = 9.158$ with 4 df; p-value = 0.057. Cannot reject $H_0$ at $\alpha = 0.05$.

28 4M Example 18.1: RETAIL CREDIT
Message: Observed rates of late payments and early closure do not differ statistically significantly among credit accounts opened a year ago through in-store, mailing, and Web channels. Because the p-value is close to 0.05, it may be worthwhile to monitor accounts developed through mailings.

29 18.3 General Versus Specific Hypotheses
- The chi-squared test cannot match the power of a more specific test.
- A 95% confidence interval for the difference in proportions of late payments from accounts developed via the mailing channel versus the other two channels (combined into one) does not contain zero.

30 18.4 Tests of Goodness of Fit
Chi-squared test of goodness of fit: a test of the distribution of a single categorical variable.

31 18.4 Tests of Goodness of Fit: Testing for randomness
- Do shoppers purchase big-ticket items more often on some days of the week than on others?
- Are cars made on some days more likely to have defects than the cars made on other days?
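A question like the defects-by-day one can be framed as a goodness-of-fit test against equal probabilities for each day. The sketch below is a minimal illustration: the function name and the five daily counts are hypothetical, and df is taken as the number of categories minus one because no parameters are estimated from the data.

```python
def goodness_of_fit(observed, probabilities):
    """Return (chi-squared statistic, df, expected counts) for one categorical variable."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # no parameters estimated from the data
    return stat, df, expected


# Hypothetical counts of defective cars built Monday through Friday.
defects = [30, 20, 18, 22, 30]
stat, df, expected = goodness_of_fit(defects, [1 / 5] * 5)
print(round(stat, 3), df)
```

Unlike the test of independence, the expected counts here come from probabilities fixed in advance by the null hypothesis rather than from the margins of a table.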

32 4M Example 18.2: DETECTING ACCOUNTING FRAUD
Motivation: Managers would like to have a systematic method to audit purchase amounts on invoices to uncover fraud.

33 4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method: Managers collected a sample of $n = 135$ invoices. Amounts ranged from $100 to $100,000, with an average of $42,000. Leading digits for the amounts should follow a distribution known as Benford's law.

34 4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method: Probabilities based on Benford's law

35 4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method: Counts of leading digits in sample of invoices

36 4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method: Check Conditions
All conditions are satisfied. The smallest expected count is 6.2. Because there are more than 4 degrees of freedom, the relaxed sample size condition is used.
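The smallest expected count of 6.2 can be reproduced from Benford's law, which assigns leading digit $d$ the probability $\log_{10}(1 + 1/d)$. The sketch below uses the sample size $n = 135$ from the slides; the function and variable names are hypothetical.

```python
import math


def benford_probabilities():
    """Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


n = 135  # number of invoices in the sample, from the slides
expected = {d: n * p for d, p in benford_probabilities().items()}

for d, e in expected.items():
    print(d, round(e, 1))
```

The smallest expected count belongs to the digit 9, and with nine categories the test has 9 - 1 = 8 degrees of freedom, which is why the relaxed threshold of 5 applies.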

37 4M Example 18.2: DETECTING ACCOUNTING FRAUD
Mechanics: $\chi^2 = 19.1$ with 8 df. p-value = 0.014. Reject $H_0$.

38 4M Example 18.2: DETECTING ACCOUNTING FRAUD
Message: The distribution of leading digits in these invoice amounts differs statistically significantly from the form predicted by Benford's law. This supports the suspicion that the digits are atypical and may indicate fraud.

39 18.4 Tests of Goodness of Fit: Testing the fit of a probability model
- How do we know whether the observed counts match a particular distribution?

40 4M Example 18.3: WEB HITS
Motivation: Managers of the Web site plan to use a Poisson model to summarize how often users click on ads. If it fits well, they will use this model to summarize concisely the volume of traffic headed to advertisers and to measure the effects of changes in the Web site on traffic patterns.

41 4M Example 18.3: WEB HITS
Method: Data were collected on a sample of 685 users who visited the Web site during a recent weekday evening.

42 4M Example 18.3: WEB HITS
Method: Check Conditions
The SRS and contingency table conditions are satisfied. However, the last three categories must be combined in order to meet the sample size condition.
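Combining sparse categories, as this slide describes, can be sketched as pooling trailing cells until the last expected count is large enough. Everything specific below is an assumption for illustration: the Poisson rate of 0.55, the observed counts, the threshold of 10, and the function names are hypothetical; only the sample size of 685 comes from the slides.

```python
import math


def poisson_expected(n, lam, max_k):
    """Expected counts for 0, 1, ..., max_k - 1 clicks and a final 'max_k or more' cell."""
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k)]
    probs.append(1.0 - sum(probs))
    return [n * p for p in probs]


def merge_tail(observed, expected, minimum):
    """Pool trailing cells until the last expected count reaches `minimum`."""
    observed, expected = list(observed), list(expected)
    while len(expected) > 2 and expected[-1] < minimum:
        last_observed = observed.pop()
        last_expected = expected.pop()
        observed[-1] += last_observed
        expected[-1] += last_expected
    return observed, expected


# Hypothetical counts of users clicking 0, 1, 2, 3, 4, and 5 or more ads (n = 685).
observed = [395, 218, 59, 11, 2, 0]
expected = poisson_expected(685, 0.55, 5)  # rate 0.55 is an assumed value
merged_observed, merged_expected = merge_tail(observed, expected, minimum=10)
print(merged_observed)
print([round(e, 1) for e in merged_expected])
```

When the Poisson rate is estimated from the same data, one further degree of freedom is lost, so four cells after merging give 4 - 1 - 1 = 2 df.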

43 4M Example 18.3: WEB HITS
Mechanics

44 4M Example 18.3: WEB HITS
Mechanics: $\chi^2 = 0.345$ with 2 df. p-value = 0.84. Cannot reject $H_0$.

45 4M Example 18.3: WEB HITS
Message: The distribution of the number of ads clicked by users is consistent with a Poisson distribution. Managers of the Web site can use this model to summarize user behavior.

46 Best Practices
- Remember the importance of experiments.
- State your hypotheses before looking at the data.
- Plot the data.
- Think when you interpret a p-value.

47 Pitfalls
- Don't confuse statistical significance with substantive significance.
- Don't use a chi-squared test when the expected frequencies are too small.
- Don't cherry-pick comparisons.
- Don't use the number of observations to find the degrees of freedom of chi-squared.

