Published by Penelope Robinson. Modified over 9 years ago.
McGraw-Hill/Irwin. Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
A PowerPoint Presentation Package to Accompany Applied Statistics in Business & Economics, 4th edition, by David P. Doane and Lori E. Seward. Prepared by Lloyd R. Jaisingh.
15-2 Chapter 15: Chi-Square Tests
Chapter Contents
15.1 Chi-Square Test for Independence
15.2 Chi-Square Tests for Goodness-of-Fit
15.3 Uniform Goodness-of-Fit Test
15.4 Poisson Goodness-of-Fit Test
15.5 Normal Chi-Square Goodness-of-Fit Test
15.6 ECDF Tests (Optional)
15-3 Chapter 15: Chi-Square Tests
Chapter Learning Objectives
LO15-1: Recognize a contingency table.
LO15-2: Find degrees of freedom and use the chi-square table of critical values.
LO15-3: Perform a chi-square test for independence on a contingency table.
LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.
LO15-5: Explain the GOF test for a Poisson distribution.
LO15-6: Use computer software to perform a chi-square GOF test for normality.
LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.
15-4 15.1 Chi-Square Test for Independence (LO15-1: Recognize a contingency table.)
Contingency Tables
A contingency table is a cross-tabulation of n paired observations into categories. Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) headings.
15-5 15.1 Chi-Square Test for Independence (LO15-2: Find degrees of freedom and use the chi-square table of critical values. LO15-3: Perform a chi-square test for independence on a contingency table.)
Chi-Square Test
In a test of independence for an $r \times c$ contingency table, the hypotheses are
$H_0$: Variable A is independent of variable B
$H_1$: Variable A is not independent of variable B
Use the chi-square test for independence to test these hypotheses. This nonparametric test is based on frequencies. The n data pairs are classified into r rows and c columns, and the observed frequency $f_{jk}$ in each cell is then compared with the expected frequency $e_{jk}$. The critical value comes from the chi-square probability distribution with the following degrees of freedom (see Appendix E for table values):
$$\text{d.f.} = (r-1)(c-1)$$
where r = number of rows in the table and c = number of columns in the table.
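As a quick check on the d.f. rule and the Appendix E lookup, the sketch below computes $(r-1)(c-1)$ and the right-tail critical value $\chi^2_R$. It assumes SciPy is installed; `scipy.stats.chi2.ppf(1 - α, d.f.)` returns the value with right-tail area α, which is what Excel's `CHISQ.INV.RT(α, d.f.)` also returns. The helper names are illustrative only.

```python
from scipy.stats import chi2


def independence_df(r: int, c: int) -> int:
    """Degrees of freedom for an r x c contingency table."""
    return (r - 1) * (c - 1)


def right_tail_critical_value(alpha: float, df: int) -> float:
    """Chi-square value with right-tail area alpha (the Appendix E entry)."""
    return float(chi2.ppf(1 - alpha, df))


# A 3 x 4 table has (3 - 1)(4 - 1) = 6 degrees of freedom.
df = independence_df(3, 4)
crit = right_tail_critical_value(0.05, df)
print(df, round(crit, 3))  # 6 12.592
```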
15-6 15.1 Chi-Square Test for Independence (LO15-3)
Expected Frequencies
Assuming that $H_0$ is true, the expected frequency of row j and column k is
$$e_{jk} = \frac{R_j C_k}{n}$$
where $R_j$ = total for row j (j = 1, 2, …, r), $C_k$ = total for column k (k = 1, 2, …, c), and n = sample size.
Steps in Testing the Hypotheses
Step 1: State the hypotheses.
$H_0$: Variable A is independent of variable B
$H_1$: Variable A is not independent of variable B
Step 2: Specify the decision rule. Calculate d.f. = (r − 1)(c − 1). For a given $\alpha$, look up the right-tail critical value $\chi^2_R$ from Appendix E or by using Excel.
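The expected-frequency formula $e_{jk} = R_j C_k / n$ is easy to apply to a whole table at once. This is a minimal pure-Python sketch; the function name and the 2 × 2 counts are hypothetical.

```python
def expected_frequencies(table):
    """Return e_jk = R_j * C_k / n for every cell of an r x c table of counts."""
    row_totals = [sum(row) for row in table]         # R_j
    col_totals = [sum(col) for col in zip(*table)]   # C_k
    n = sum(row_totals)                              # sample size
    return [[r_j * c_k / n for c_k in col_totals] for r_j in row_totals]


observed = [[20, 30],
            [30, 20]]  # hypothetical counts
print(expected_frequencies(observed))  # [[25.0, 25.0], [25.0, 25.0]]
```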
15-7 15.1 Chi-Square Test for Independence (LO15-3)
Steps in Testing the Hypotheses (continued)
Step 3: Calculate the expected frequencies $e_{jk} = R_j C_k / n$ for every cell.
Step 4: Calculate the test statistic. The chi-square test statistic is
$$\chi^2_{\text{calc}} = \sum_{j=1}^{r}\sum_{k=1}^{c}\frac{(f_{jk}-e_{jk})^2}{e_{jk}}$$
Step 5: Make the decision. Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_R$ or if the p-value $\le \alpha$.
Small Expected Frequencies
The chi-square test is unreliable if the expected frequencies are too small. Rules of thumb:
Cochran's Rule requires that $e_{jk} > 5$ for all cells.
A looser rule allows up to 20% of the cells to have $e_{jk} < 5$.
Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell.
If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
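The steps above fit in one function: expected frequencies, the double-sum statistic, the decision, and a check against the small-frequency rules of thumb. This is a sketch assuming SciPy for the chi-square distribution; the function name, the returned dictionary keys, and the counts are hypothetical. If you compare against `scipy.stats.chi2_contingency`, pass `correction=False`, because by default it applies a Yates continuity correction to 2 × 2 tables, which the formula above does not use.

```python
from scipy.stats import chi2


def chi_square_independence(table, alpha=0.05):
    """Chi-square test for independence on an r x c table of observed counts."""
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequencies under H0: e_jk = R_j * C_k / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Test statistic: sum over all cells of (f_jk - e_jk)^2 / e_jk
    stat = sum((f - e) ** 2 / e
               for f_row, e_row in zip(table, expected)
               for f, e in zip(f_row, e_row))
    df = (r - 1) * (c - 1)
    critical = float(chi2.ppf(1 - alpha, df))
    p_value = float(chi2.sf(stat, df))
    # Diagnostics for the small-expected-frequency rules of thumb
    cells = [e for e_row in expected for e in e_row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    any_below_1 = any(e < 1 for e in cells)
    return {
        "stat": stat, "df": df, "critical": critical, "p_value": p_value,
        "reject": stat > critical,
        "share_below_5": share_below_5, "any_below_1": any_below_1,
    }


result = chi_square_independence([[20, 30], [30, 20]])  # hypothetical counts
print(result["stat"], result["df"], round(result["p_value"], 4), result["reject"])
# 4.0 1 0.0455 True
```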
15-8 15.1 Chi-Square Test for Independence (LO15-3)
Cross-Tabulating Raw Data
Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories (Figure 14.6).
Why Do a Chi-Square Test on Numerical Data?
The researcher may believe there's a relationship between X and Y but doesn't want to use regression.
There are outliers or anomalies that prevent us from assuming that the data came from a normal population.
The researcher has numerical data for one variable but not the other.
Test of Two Proportions
For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions, if the samples are large enough to ensure normality. The hypotheses are:
$H_0\colon \pi_1 = \pi_2$
$H_1\colon \pi_1 \ne \pi_2$
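The equivalence between the 2 × 2 chi-square test and the two-tailed z test can be checked numerically: with the pooled standard error and no continuity correction, $z^2 = \chi^2_{\text{calc}}$. This sketch uses only the standard library; the function names and the sample counts are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: pi1 = pi2 (two-tailed test)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(table):
    """Chi-square statistic (no continuity correction) for a 2 x 2 table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[j][k] - rows[j] * cols[k] / n) ** 2 / (rows[j] * cols[k] / n)
               for j in range(2) for k in range(2))


# Hypothetical data: 20 of 50 successes in sample 1, 30 of 50 in sample 2.
z = two_proportion_z(20, 50, 30, 50)
chi_sq = chi_square_2x2([[20, 30],   # sample 1: successes, failures
                         [30, 20]])  # sample 2: successes, failures
print(round(z, 4), round(z ** 2, 4), round(chi_sq, 4))  # -2.0 4.0 4.0
```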
15-9 15.2 Chi-Square Tests for Goodness-of-Fit
Purpose of the Test
The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population. The chi-square test will be used because it is versatile and easy to understand.
Multinomial GOF Test
A multinomial distribution is defined by any k probabilities $\pi_1, \pi_2, \ldots, \pi_k$ that sum to unity. For example,
$H_0\colon \pi_1 = .13,\ \pi_2 = .13,\ \pi_3 = .24,\ \pi_4 = .20,\ \pi_5 = .16,\ \pi_6 = .14$
$H_1$: At least one of the $\pi_j$ differs from the hypothesized value.
No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are d.f. = c − m − 1 = 6 − 0 − 1 = 5.
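For the multinomial example, the expected frequencies are $e_j = n\pi_j$. The sketch below applies them to a set of observed counts that are invented purely for illustration, assuming SciPy's `scipy.stats.chisquare`, whose default `ddof=0` gives d.f. = c − 1, matching m = 0.

```python
from scipy.stats import chisquare

# Hypothesized multinomial probabilities under H0 (must sum to 1)
pi = [0.13, 0.13, 0.24, 0.20, 0.16, 0.14]
# Hypothetical observed counts for a sample of n = 100 (illustrative only)
observed = [10, 15, 25, 22, 14, 14]

assert abs(sum(pi) - 1.0) < 1e-9
n = sum(observed)
expected = [n * p for p in pi]  # e_j = n * pi_j

# m = 0 parameters estimated, so the default ddof=0 gives d.f. = c - 1 = 5
result = chisquare(observed, f_exp=expected)
print(round(result.statistic, 4), round(result.pvalue, 4))
```

With these made-up counts the statistic is far below the 5-d.f. critical value, so $H_0$ would not be rejected.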
15-10 15.2 Chi-Square Tests for Goodness-of-Fit
Hypotheses for GOF
The hypotheses are:
$H_0$: The population follows a _____ distribution
$H_1$: The population does not follow a _____ distribution
The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).
Test Statistic and Degrees of Freedom for GOF
$$\chi^2_{\text{calc}} = \sum_{j=1}^{c}\frac{(f_j-e_j)^2}{e_j}$$
where $f_j$ = the observed frequency of observations in class j and $e_j$ = the expected frequency in class j if $H_0$ were true. The test statistic follows the chi-square distribution with degrees of freedom d.f. = c − m − 1, where c is the number of classes used in the test and m is the number of parameters estimated.
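A generic GOF helper only needs the observed and expected frequencies plus the number of estimated parameters m, since d.f. = c − m − 1. This is a sketch assuming SciPy; the function name and the example counts are hypothetical.

```python
from scipy.stats import chi2


def chi_square_gof(observed, expected, m=0, alpha=0.05):
    """Chi-square GOF test with d.f. = c - m - 1.

    observed -- f_j for each of the c classes
    expected -- e_j for each class under H0
    m        -- number of parameters estimated from the sample
    """
    c = len(observed)
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = c - m - 1
    p_value = float(chi2.sf(stat, df))
    critical = float(chi2.ppf(1 - alpha, df))
    return stat, df, p_value, stat > critical


# Hypothetical example: c = 4 classes, one parameter estimated (m = 1)
stat, df, p, reject = chi_square_gof([18, 22, 30, 30], [25, 25, 25, 25], m=1)
print(round(stat, 2), df, round(p, 4), reject)  # 4.32 2 0.1153 False
```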
15-11 15.3 Uniform Goodness-of-Fit Test (LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.)
Uniform Distribution
The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence. The chi-square test for a uniform distribution compares all c groups simultaneously. The hypotheses are:
$H_0\colon \pi_1 = \pi_2 = \cdots = \pi_c = 1/c$
$H_1$: Not all $\pi_j$ are equal
The test can be performed on data that are already tabulated into groups. Calculate the expected frequency $e_j = n/c$ for each cell. The degrees of freedom are d.f. = c − 1, since no parameters are estimated for the uniform distribution. Obtain the critical value $\chi^2_R$ from Appendix E for the desired level of significance $\alpha$. The p-value can be obtained from Excel. Reject $H_0$ if the p-value $\le \alpha$.
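For tabulated data every class has the same expected frequency, so the whole test is a few lines. This sketch assumes SciPy for the p-value; the function name and the five class counts are hypothetical.

```python
from scipy.stats import chi2


def uniform_gof_tabulated(observed, alpha=0.05):
    """Uniform GOF test on counts already tabulated into c groups."""
    c = len(observed)
    n = sum(observed)
    e = n / c                       # every class has expected frequency n/c
    stat = sum((f - e) ** 2 / e for f in observed)
    df = c - 1                      # no parameters estimated
    p_value = float(chi2.sf(stat, df))
    return stat, df, p_value, p_value <= alpha


stat, df, p, reject = uniform_gof_tabulated([18, 22, 20, 25, 15])  # hypothetical
print(round(stat, 2), df, reject)  # 2.9 4 False
```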
15-12 15.3 Uniform Goodness-of-Fit Test (LO15-4)
Uniform GOF Test: Raw Data
First form c bins of equal width and create a frequency distribution.
Calculate the observed frequency $f_j$ for each bin.
Define $e_j = n/c$.
Perform the chi-square calculations.
The degrees of freedom are d.f. = c − 1, since no parameters are estimated for the uniform distribution.
Obtain the critical value from Appendix E for a given significance level and make the decision.
Maximize the test's power by defining the bin width as … (As a result, the expected frequencies will be as large as possible.)
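For raw data, the binning step comes first. The sketch below assumes the hypothesized uniform range is a known interval [a, b] that contains all the data, and splits it into c bins of width (b − a)/c; that bin-width choice, the function name, and the data are assumptions for illustration. SciPy supplies the p-value.

```python
from scipy.stats import chi2


def uniform_gof_raw(data, a, b, c, alpha=0.05):
    """Bin raw data on [a, b] into c equal-width bins and test uniformity."""
    width = (b - a) / c
    observed = [0] * c
    for x in data:
        j = min(int((x - a) / width), c - 1)  # x == b falls in the last bin
        observed[j] += 1
    n = len(data)
    e = n / c                                 # expected frequency per bin
    stat = sum((f - e) ** 2 / e for f in observed)
    df = c - 1
    p_value = float(chi2.sf(stat, df))
    return observed, stat, df, p_value <= alpha


data = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]  # hypothetical raw data
observed, stat, df, reject = uniform_gof_raw(data, a=0, b=100, c=5)
print(observed, stat, df, reject)  # [2, 2, 2, 2, 2] 0.0 4 False
```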
15-13 15.3 Uniform Goodness-of-Fit Test (LO15-4)
Uniform GOF Test: Raw Data (continued)
Calculate the mean and standard deviation of the uniform distribution. For a continuous uniform distribution on [a, b],
$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{\sqrt{12}}$$
and for a discrete uniform distribution on the integers a, a + 1, …, b,
$$\mu = \frac{a+b}{2}, \qquad \sigma = \sqrt{\frac{(b-a+1)^2-1}{12}}$$
If the data are not skewed and the sample size is large (n > 30), then the sample mean is approximately normally distributed. So, test the hypothesized uniform mean using
$$z_{\text{calc}} = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
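The z test for the uniform mean can be sketched with the standard library alone, using `math.erfc` for the two-tailed normal p-value. The `discrete` flag switches between the two standard-deviation formulas. The function names and the sample figures (n = 36, $\bar{x}$ = 55 on [0, 100]) are hypothetical.

```python
import math


def uniform_mean_sd(a, b, discrete=False):
    """Mean and standard deviation of a uniform distribution on a..b."""
    mu = (a + b) / 2
    if discrete:  # integers a, a + 1, ..., b
        sigma = math.sqrt(((b - a + 1) ** 2 - 1) / 12)
    else:         # continuous on [a, b]
        sigma = (b - a) / math.sqrt(12)
    return mu, sigma


def uniform_mean_z(x_bar, n, a, b, discrete=False):
    """z = (x_bar - mu) / (sigma / sqrt(n)) with a two-tailed p-value."""
    mu, sigma = uniform_mean_sd(a, b, discrete)
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))  # = 2 * P(Z > |z|)
    return z, p_two_tailed


z, p = uniform_mean_z(x_bar=55.0, n=36, a=0, b=100)
print(round(z, 4))  # 1.0392
```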
15-14 15.4 Poisson Goodness-of-Fit Test (LO15-5: Explain the GOF test for a Poisson distribution.)
Poisson Data-Generating Situations
In a Poisson distribution model, X represents the number of events per unit of time or space. X is a discrete nonnegative integer (X = 0, 1, 2, …). Event arrivals must be independent of each other. The Poisson is sometimes called a model of rare events because X typically has a small mean.
Poisson Goodness-of-Fit Test
The mean $\lambda$ is the only parameter. If $\lambda$ is unknown, it must be estimated from the sample. Use the estimated $\lambda$ to find the Poisson probability P(X) for each value of X. Compute the expected frequencies. Perform the chi-square calculations. Make the decision.
You may need to combine classes until the expected frequencies become large enough for the test (at least until $e_j > 2$).
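Combining classes is a judgment call. One simple rule, assumed here for illustration, sweeps left to right, merging adjacent classes until the accumulated expected frequency exceeds the threshold (2, as on the slide), then folds any leftover tail into the last kept class. The helper name `combine_classes` and the frequencies are hypothetical.

```python
def combine_classes(observed, expected, threshold=2.0):
    """Merge adjacent classes until each expected frequency exceeds threshold."""
    new_obs, new_exp = [], []
    f_acc, e_acc = 0, 0.0
    for f, e in zip(observed, expected):
        f_acc += f
        e_acc += e
        if e_acc > threshold:          # class is large enough: keep it
            new_obs.append(f_acc)
            new_exp.append(e_acc)
            f_acc, e_acc = 0, 0.0
    if f_acc or e_acc:                 # leftover tail that never reached threshold
        if new_exp:
            new_obs[-1] += f_acc
            new_exp[-1] += e_acc
        else:
            new_obs.append(f_acc)
            new_exp.append(e_acc)
    return new_obs, new_exp


obs, exp = combine_classes([10, 15, 12, 8, 3, 1, 1],
                           [9.0, 14.0, 13.0, 8.0, 4.0, 1.5, 0.5])
print(obs, exp)  # [10, 15, 12, 8, 5] [9.0, 14.0, 13.0, 8.0, 6.0]
```

Totals are preserved, so the merged classes still sum to n; after merging, c (and therefore d.f.) refers to the reduced number of classes.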
15-15 15.4 Poisson Goodness-of-Fit Test (LO15-5)
Poisson GOF Test: Tabulated Data
Calculate the sample mean as
$$\hat{\lambda} = \frac{1}{n}\sum_{j=1}^{c} x_j f_j$$
Using this estimated mean, calculate the Poisson probabilities either by using the Poisson formula
$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$
or Excel. For c classes with m = 1 parameter estimated, the degrees of freedom are d.f. = c − m − 1 = c − 2. Obtain the critical value for a given $\alpha$ from Appendix E. Make the decision.
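Here is the tabulated-data procedure end to end. It is a sketch under two stated assumptions: the tabulated values are x = 0, 1, …, K with no observation above K (so $\hat{\lambda}$ is exact), and the last class is treated as "K or more" when computing expected frequencies so that the probabilities sum to 1. The function name and the counts are hypothetical; SciPy supplies the p-value.

```python
import math
from scipy.stats import chi2


def poisson_gof_tabulated(x_values, observed, alpha=0.05):
    """Poisson GOF test; x_values must be 0, 1, ..., K (last class = 'K or more')."""
    n = sum(observed)
    lam = sum(x * f for x, f in zip(x_values, observed)) / n  # lambda-hat
    probs = [lam ** x * math.exp(-lam) / math.factorial(x) for x in x_values[:-1]]
    probs.append(1 - sum(probs))          # P(X >= K) for the last class
    expected = [n * p for p in probs]
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1 - 1            # c - m - 1 with m = 1
    p_value = float(chi2.sf(stat, df))
    return lam, expected, stat, df, p_value <= alpha


lam, expected, stat, df, reject = poisson_gof_tabulated(
    [0, 1, 2, 3, 4], [30, 40, 20, 7, 3])  # hypothetical frequencies
print(lam, [round(e, 2) for e in expected], round(stat, 3), df, reject)
```

With these made-up counts every expected frequency already exceeds 2, so no classes need to be combined before testing.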
15-16 15.5 Normal Chi-Square Goodness-of-Fit Test (LO15-6: Use computer software to perform a chi-square GOF test for normality.)
Normal Data-Generating Situations
Two parameters, the mean $\mu$ and the standard deviation $\sigma$, fully describe the normal distribution. Unless $\mu$ and $\sigma$ are known a priori, they must be estimated from a sample by $\bar{x}$ and s. Using these statistics, the chi-square goodness-of-fit test can be used.
Method 1: Standardizing the Data
Transform the sample observations $x_1, x_2, \ldots, x_n$ into standardized values
$$z_i = \frac{x_i-\bar{x}}{s}$$
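Method 1 can be sketched with Python's `statistics` module: standardize with $\bar{x}$ and s, count the z-values in classes on the z scale, and take expected frequencies from standard normal areas (`statistics.NormalDist().cdf`). The class cut-points at −1, 0, and 1 and the 12-value sample are hypothetical; a sample this small keeps the arithmetic visible but would fail the small-expected-frequency rules in a real test.

```python
from statistics import NormalDist, mean, stdev


def standardize(data):
    """z_i = (x_i - x_bar) / s, using the sample mean and standard deviation."""
    x_bar, s = mean(data), stdev(data)
    return [(x - x_bar) / s for x in data]


data = [12, 15, 9, 20, 17, 14, 11, 16, 13, 18, 10, 19]  # hypothetical sample
z = standardize(data)

# Illustrative classes on the z scale: (-inf, -1], (-1, 0], (0, 1], (1, inf)
cuts = [-1.0, 0.0, 1.0]
observed = [0] * (len(cuts) + 1)
for zi in z:
    observed[sum(zi > cut for cut in cuts)] += 1  # index of the class holding zi

# Expected frequencies: n times the standard normal area of each class
cdf = [0.0] + [NormalDist().cdf(cut) for cut in cuts] + [1.0]
expected = [len(data) * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
print(observed, [round(e, 2) for e in expected])  # [2, 4, 4, 2] [1.9, 4.1, 4.1, 1.9]
```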
15-17 15.5 Normal Chi-Square Goodness-of-Fit Test (LO15-6)
Method 2: Equal Bin Widths
To obtain equal-width bins, divide the exact data range into c groups of equal width.
Step 1: Count the sample observations in each bin to get observed frequencies $f_j$.
Step 2: Convert the bin limits into standardized z-values by using the formula $z = (x - \bar{x})/s$.
Step 3: Find the normal area within each bin assuming a normal distribution.
Step 4: Find expected frequencies $e_j$ by multiplying each normal area by the sample size n.
Classes may need to be collapsed from the ends inward to enlarge expected frequencies.
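The four steps of Method 2 translate directly into code. One assumption is added here: the first and last bins are treated as open-ended when computing normal areas, so that the expected frequencies sum to n. The function name and the data are hypothetical; only the standard library is used.

```python
from statistics import NormalDist, mean, stdev


def equal_width_normal_gof(data, c):
    """Method 2 sketch: equal-width bins over the exact data range."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / c
    # Step 1: observed frequencies in c equal-width bins
    observed = [0] * c
    for x in data:
        observed[min(int((x - lo) / width), c - 1)] += 1
    # Step 2: interior bin limits as z-values (end bins left open)
    limits = [lo + j * width for j in range(1, c)]
    z_limits = [(x - x_bar) / s for x in limits]
    # Step 3: normal area within each bin
    cdf = [0.0] + [NormalDist().cdf(z) for z in z_limits] + [1.0]
    areas = [b - a for a, b in zip(cdf, cdf[1:])]
    # Step 4: expected frequencies
    expected = [n * area for area in areas]
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return observed, expected, stat


data = [12, 15, 9, 20, 17, 14, 11, 16, 13, 18, 10, 19]  # hypothetical sample
observed, expected, stat = equal_width_normal_gof(data, c=4)
print(observed, [round(e, 2) for e in expected], round(stat, 3))
```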
15-18 15.5 Normal Chi-Square Goodness-of-Fit Test (LO15-6)
Method 3: Equal Expected Frequencies
Define histogram bins in such a way that an equal number of observations would be expected within each bin under the null hypothesis. Define bin limits so that $e_j = n/c$; that is, a normal area of 1/c is desired in each of the c bins. The first and last classes must be open-ended for a normal distribution, so to define c bins, we need c − 1 cut-points. The upper limit of bin j can be found directly by using Excel. Alternatively, find $z_j$ for bin j using Excel and then calculate the upper limit for bin j as
$$x_j = \bar{x} + z_j s$$
Once the bins are defined, count the observations $f_j$ within each bin and compare them with the expected frequencies $e_j = n/c$.
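Method 3 needs the c − 1 cut-points $x_j = \bar{x} + z_j s$, where $z_j$ has left-tail area j/c. In this sketch `statistics.NormalDist().inv_cdf` plays the role of Excel's `NORM.S.INV(j/c)`; `NORM.INV(j/c, x̄, s)` would give $x_j$ directly. Function names and data are hypothetical, and a value equal to a cut-point is counted in the lower bin (an arbitrary tie rule).

```python
from bisect import bisect_left
from statistics import NormalDist, mean, stdev


def equal_expected_cutpoints(data, c):
    """Upper limits x_j = x_bar + z_j * s for bins j = 1, ..., c - 1."""
    x_bar, s = mean(data), stdev(data)
    return [x_bar + NormalDist().inv_cdf(j / c) * s for j in range(1, c)]


def equal_expected_gof(data, c):
    """Count observations per bin and compare with e_j = n / c."""
    cuts = equal_expected_cutpoints(data, c)
    observed = [0] * c
    for x in data:
        observed[bisect_left(cuts, x)] += 1  # ties go to the lower bin
    e = len(data) / c
    stat = sum((f - e) ** 2 / e for f in observed)
    return cuts, observed, stat


data = [12, 15, 9, 20, 17, 14, 11, 16, 13, 18, 10, 19]  # hypothetical sample
cuts, observed, stat = equal_expected_gof(data, c=4)
print([round(x, 3) for x in cuts], observed, round(stat, 3))
```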
15-19 15.6 ECDF Tests (LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.)
There are many alternatives to the chi-square test based on the empirical cumulative distribution function (ECDF).
The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequencies of the n data values. The K-S test is not recommended for grouped data. The K-S test assumes that no parameters are estimated; if parameters are estimated, use a Lilliefors test. Both of these tests are done by computer.
The Anderson-Darling (A-D) test is widely used to test for non-normality because of its power. The A-D test is based on a probability plot: when the data fit the hypothesized distribution closely, the probability plot will be close to a straight line.
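To make "largest absolute difference" concrete, the sketch below computes the K-S statistic by hand from the sorted data and checks it against `scipy.stats.kstest`, then runs `scipy.stats.anderson` for the A-D test. It assumes NumPy and SciPy; the simulated sample and the fully specified hypothesized normal (mean 50, standard deviation 10, so no parameters are estimated) are illustrative assumptions. For the estimated-parameter case, statsmodels provides a `lilliefors` function in `statsmodels.stats.diagnostic`; it is left out here to keep the sketch to SciPy.

```python
import numpy as np
from scipy import stats


def ks_statistic(data, cdf):
    """Largest absolute gap between the ECDF of the data and a hypothesized CDF."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    F = cdf(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - F)         # ECDF step top above the curve
    d_minus = np.max(F - (i - 1) / n)  # curve above the ECDF step bottom
    return float(max(d_plus, d_minus))


rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=200)  # simulated data

# K-S with fully specified parameters (none estimated from the sample)
hyp = stats.norm(loc=50, scale=10)
d_manual = ks_statistic(sample, hyp.cdf)
ks = stats.kstest(sample, hyp.cdf)
print(round(d_manual, 4), round(ks.statistic, 4), round(ks.pvalue, 3))

# Anderson-Darling test for normality (mean and sd estimated internally)
ad = stats.anderson(sample, dist="norm")
print(round(ad.statistic, 3))
```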