Published by Fay Bishop. Modified over 8 years ago.
1
Chapter 16: Analysis of Categorical Data
2
Learning Objectives
LO1: Use the chi-square goodness-of-fit test to analyze probabilities of multinomial distribution trials along a single dimension.
LO2: Use the chi-square test of independence to perform contingency analysis.
3
χ² Goodness-of-Fit Test LO1
In Chapter 5, the binomial distribution was used to analyze experiments or trials that had only two possible outcomes.
An extension of this problem is the multinomial distribution, in which more than two possible outcomes can occur.
The χ² goodness-of-fit test is used to analyze probabilities of multinomial distribution trials along a single dimension.
It compares the expected (theoretical) frequencies of categories from a population distribution with the observed (actual) frequencies to determine whether what was observed differs from what was expected.
4
Formulating Test of Hypothesis LO1
Hypothesize
– Step 1: State the hypotheses
Test
– Step 2: Select the appropriate statistical test for the problem
– Step 3: Set the α value
– Step 4: Determine the degrees of freedom
– Step 5: Determine the expected frequencies
– Step 6: Calculate the observed value of chi-square
Action
– Step 7: Decide whether to reject or fail to reject the null hypothesis
Business Implication
– Use the information to answer research questions
5
Small Expected Values of a Category LO1
When the expected value of a category is small, a large chi-square value can be obtained erroneously, leading to a Type I error.
Control: to guard against this error, the chi-square goodness-of-fit test should not be used when any expected frequency is less than 5.
If the observed data produce expected values of less than 5, it may be possible to combine adjacent categories (when meaningful) to create larger frequencies.
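One way to apply this rule can be sketched in Python. The helper name `merge_adjacent` and the sample counts are hypothetical, and the greedy left-to-right policy is only one option; whether a given merge is meaningful remains a judgment call for the analyst.

```python
def merge_adjacent(observed, expected, minimum=5.0):
    """Greedily pool adjacent categories until each pooled expected count >= minimum.

    Hypothetical helper: walks the categories left to right, accumulating
    counts until the running expected frequency reaches `minimum`.
    Any leftover tail is folded into the last pooled category.
    """
    merged_obs, merged_exp = [], []
    run_obs, run_exp = 0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
            run_obs, run_exp = 0, 0.0
    if run_exp > 0 or run_obs > 0:
        if merged_exp:
            # Fold the undersized tail into the previous pooled category.
            merged_obs[-1] += run_obs
            merged_exp[-1] += run_exp
        else:
            # Everything together is still below the minimum; return one category.
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
    return merged_obs, merged_exp


# Made-up illustration data (not from the slides).
obs, exp = merge_adjacent([3, 10, 12, 4, 1], [2.0, 11.0, 10.0, 4.5, 2.5])
print(obs, exp)
```

Note that pooling categories reduces the number of categories and therefore the degrees of freedom of the test.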
6
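χ² Goodness-of-Fit Test LO1
The test statistic for a chi-square goodness-of-fit test is computed as

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad df = k - 1 - c$$

where f_o is the observed frequency of a category, f_e is its expected frequency, k is the number of categories, and c is the number of parameters estimated from the sample data.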
7
Milk Sales Data for Demonstration Problem 16.1 LO1

Month        Litres of Milk
January       1,610
February      1,585
March         1,649
April         1,590
May           1,540
June          1,397
July          1,410
August        1,350
September     1,495
October       1,564
November      1,602
December      1,655
TOTAL        18,447
8
Hypotheses and Decision Rules for Demonstration Problem 16.1 LO1
9
Calculations for Demonstration Problem 16.1 LO1

Month        f_o       f_e         (f_o - f_e)²/f_e
January       1,610    1,537.25     3.44
February      1,585    1,537.25     1.48
March         1,649    1,537.25     8.12
April         1,590    1,537.25     1.81
May           1,540    1,537.25     0.00
June          1,397    1,537.25    12.80
July          1,410    1,537.25    10.53
August        1,350    1,537.25    22.81
September     1,495    1,537.25     1.16
October       1,564    1,537.25     0.47
November      1,602    1,537.25     2.73
December      1,655    1,537.25     9.02
Totals       18,447                74.37
10
Calculations for Demonstration Problem 16.1 LO1
The observed chi-square value of 74.37 is greater than the critical value of 24.725, so the decision is to reject the null hypothesis.
The data provide enough evidence to indicate that the distribution of milk sales is not uniform.
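The milk sales calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the figures shown on the slides; the variable names are illustrative, and the critical value is copied from the slide rather than computed (24.725 matches the tabled chi-square value for α = 0.01 with 11 degrees of freedom).

```python
# Observed monthly milk sales in litres (Demonstration Problem 16.1).
milk_litres = {
    "January": 1610, "February": 1585, "March": 1649, "April": 1590,
    "May": 1540, "June": 1397, "July": 1410, "August": 1350,
    "September": 1495, "October": 1564, "November": 1602, "December": 1655,
}

observed = list(milk_litres.values())
total = sum(observed)

# Under a uniform null hypothesis every month has the same expected frequency.
expected = total / len(observed)

# Chi-square statistic: sum of (f_o - f_e)^2 / f_e over all categories.
chi_square = sum((f_o - expected) ** 2 / expected for f_o in observed)

CRITICAL_VALUE = 24.725  # taken from the slide, not computed here
reject_null = chi_square > CRITICAL_VALUE

print(f"expected per month = {expected:.2f}")
print(f"chi-square = {chi_square:.2f}, reject H0: {reject_null}")
```

Unrounded arithmetic can differ from the slide's total in the last decimal place, because the slide sums terms that were already rounded to two decimals.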
11
Calculations for Demonstration Problem 16.1 LO1
12
Bank Customer Arrival Data for Demonstration Problem 16.2 LO1

Number of Arrivals    Observed Frequencies
0                      7
1                     18
2                     25
3                     17
4                     12
≥5                     5
13
Hypotheses and Decision Rules for Demonstration Problem 16.2 LO1
14
Calculations for Demonstration Problem 16.2: Estimating the Mean Arrival Rate LO1

Number of Arrivals X    Observed Frequencies f    f·X
0                        7                          0
1                       18                         18
2                       25                         50
3                       17                         51
4                       12                         48
≥5                       5                         25
Total                   84                        192

Mean arrival rate: $\lambda = \frac{\sum f \cdot X}{\sum f} = \frac{192}{84} \approx 2.3$
15
Calculations for Demonstration Problem 16.2: Poisson Probabilities for λ = 2.3 LO1

Number of Arrivals X    Expected Probabilities P(X)    Expected Frequencies n·P(X)
0                       0.1003                          8.42
1                       0.2306                         19.37
2                       0.2652                         22.28
3                       0.2033                         17.08
4                       0.1169                          9.82
≥5                      0.0838                          7.04
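The expected probabilities and frequencies above can be sketched in Python with the standard library alone. The function name `poisson_pmf` is illustrative; λ = 2.3 and n = 84 come from the slides, and the open-ended last category is assumed to be "5 or more arrivals", computed as the remaining probability mass.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of exactly x arrivals when the mean rate is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 2.3  # estimated mean arrival rate from the slides
n = 84     # total number of observed intervals

# P(X = 0) through P(X = 4), then the tail P(X >= 5) as the remainder.
probabilities = [poisson_pmf(x, lam) for x in range(5)]
probabilities.append(1.0 - sum(probabilities))

# Expected frequency of each category is n * P(X).
expected = [n * p for p in probabilities]

labels = ["0", "1", "2", "3", "4", ">=5"]
for label, p, e in zip(labels, probabilities, expected):
    print(f"{label:>3}  P(X) = {p:.4f}  expected = {e:.2f}")
```

Every expected frequency here is above 5, so no categories need to be combined before running the test.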
16
χ² Calculations for Demonstration Problem 16.2 LO1

Number of Arrivals X    Observed Frequencies f    Expected Frequencies n·P(X)    (f_o - f_e)²/f_e
0                        7                          8.42                          0.24
1                       18                         19.37                          0.10
2                       25                         22.28                          0.33
3                       17                         17.08                          0.00
4                       12                          9.82                          0.48
≥5                       5                          7.04                          0.59
Total                   84                         84.00                          1.74
17
Calculations for Demonstration Problem 16.2 LO1
The observed chi-square value of 1.74 is less than the critical value of 9.4877, so the decision is not to reject the null hypothesis.
The data do not provide enough evidence to indicate that the distribution of bank arrivals is not Poisson.
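A short Python sketch of this decision, using the observed and expected frequencies exactly as printed on the slides. It assumes the usual goodness-of-fit rule df = k - 1 - c, where k is the number of categories and c is the number of parameters estimated from the sample (here c = 1, the mean arrival rate); the critical value is copied from the slide and matches the tabled value for α = 0.05 with 4 degrees of freedom.

```python
# Observed counts and Poisson expected counts for 0, 1, 2, 3, 4, >=5 arrivals.
observed = [7, 18, 25, 17, 12, 5]
expected = [8.42, 19.37, 22.28, 17.08, 9.82, 7.04]

# Chi-square statistic: sum of (f_o - f_e)^2 / f_e over all categories.
chi_square = sum((f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected))

# Degrees of freedom: k categories, minus 1, minus c estimated parameters.
k = len(observed)
c = 1  # the mean arrival rate was estimated from the sample
df = k - 1 - c

CRITICAL_VALUE = 9.4877  # taken from the slide, not computed here
reject_null = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
```

Because the null hypothesis states that the arrivals follow a Poisson distribution, failing to reject it means the data are consistent with a Poisson model; it does not prove that the model is correct.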
18
Calculations for Demonstration Problem 16.2 LO1
19
χ² Test of Independence LO2
Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.
Qualitative Variables
Nominal Data
20
χ² Test of Independence: Investment Example LO2
Where do you reside?
A. Large town   B. Medium town   C. Small town   D. Rural area
Which type of financial investment are you most likely to make today?
E. Stocks   F. Bonds   G. Treasury bills

Contingency Table

                      Type of Financial Investment
Geographic Region     E       F       G       Total
A                                     O_13    n_A
B                                             n_B
C                                             n_C
D                                             n_D
Total                 n_E     n_F     n_G     N
21
χ² Test of Independence: Investment Example LO2

Contingency Table

                      Type of Financial Investment
Geographic Region     E       F       G       Total
A                             e_12            n_A
B                                             n_B
C                                             n_C
D                                             n_D
Total                 n_E     n_F     n_G     N
22
χ² Test of Independence: Formulas LO2

Expected frequencies:

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

where i = the row, j = the column, n_i = the total of row i, n_j = the total of column j, and N = the total of all frequencies.

Calculated (observed) χ²:

$$\chi^2 = \sum\sum \frac{(f_o - f_e)^2}{f_e}$$

where df = (r - 1)(c - 1), r = the number of rows, and c = the number of columns.
23
χ² Test of Independence: Gasoline Preference Versus Income Category LO2
24
Contingency Table for the Gas Consumer Example LO2
25
Gasoline Preference Versus Income Category: Expected Frequencies LO2

Observed frequencies, with expected frequencies in parentheses:

                        Type of Gasoline
Income                  Regular         Premium        Extra Premium    Total
Less than $30,000        85 (66.15)     16 (24.46)      6 (16.40)       107
$30,000 to $49,999      102 (87.78)     27 (32.46)     13 (21.76)       142
$50,000 to $99,000       36 (45.13)     22 (16.69)     15 (11.19)        73
At least $100,000        15 (38.95)     23 (14.40)     25 (9.65)         63
Total                   238             88             59               385

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

$$e_{11} = \frac{(107)(238)}{385} = 66.15, \quad e_{12} = \frac{(107)(88)}{385} = 24.46, \quad e_{13} = \frac{(107)(59)}{385} = 16.40, \;\ldots$$
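The expected frequencies for the gasoline example can be sketched in Python as follows. The counts are those on the slide; the variable names are illustrative, and each expected cell is the row total times the column total divided by the grand total.

```python
# Observed counts: rows are income categories, columns are
# Regular, Premium, Extra Premium (from the slide).
observed = [
    [85, 16, 6],     # Less than $30,000
    [102, 27, 13],   # $30,000 to $49,999
    [36, 22, 15],    # third income category
    [15, 23, 25],    # At least $100,000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# e_ij = (row i total) * (column j total) / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print("  ".join(f"{e:6.2f}" for e in row))
```

All twelve expected cells are above 5 here, so no rows or columns need to be combined.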
26
Gasoline Preference Versus Income Category: χ² Calculation LO2
27
Gasoline Preference Versus Income Category LO2
The observed chi-square value of 70.78 is greater than the critical value of 16.8119, so the decision is to reject the null hypothesis.
The data provide enough evidence to indicate that the type of gasoline preferred is not independent of income.
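The full test of independence for the gasoline data can be sketched in Python. The function name `chi_square_independence` is hypothetical, the observed counts are those in the contingency table, and the critical value is copied from the slide (16.8119 matches the tabled value for α = 0.01 with 6 degrees of freedom).

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows of observed counts. Expected counts are
    row total * column total / grand total, and df = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / grand_total
            chi_square += (f_o - f_e) ** 2 / f_e

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df


# Rows: income categories; columns: Regular, Premium, Extra Premium.
gasoline = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

chi_square, df = chi_square_independence(gasoline)
CRITICAL_VALUE = 16.8119  # taken from the slide, not computed here
reject_null = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
```

The statistic computed from unrounded expected frequencies can differ slightly from the slide's 70.78, which appears to be built from rounded intermediate values; the decision is the same either way.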
28
Gasoline Preference Versus Income Category: χ² Calculation LO2
29
Gasoline Preference Versus Income Category: Minitab Output LO2
30
Important Points of Interest LO2
Chi-square tests indicate whether two distributions are the same or not; they do not tell you in what specific way they differ.
The chi-square test of independence indicates whether two variables are independent or not, but it does not reveal the nature of the relationship between them.
Chi-square techniques are an outgrowth of the binomial distribution and the inferential techniques for analyzing population proportions.
Both the chi-square test of independence and the chi-square goodness-of-fit test require that expected values be greater than or equal to 5. If they are not, combine adjacent rows or columns until all expected values are 5 or greater.
31
COPYRIGHT Copyright © 2014 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (The Canadian Copyright Licensing Agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his or her own use only and not for distribution or resale. The author and the publisher assume no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.