15 Chi-Square Tests Chi-Square Test for Independence

Prepared by Lloyd R. Jaisingh

Chapter 15: Chi-Square Tests. Topics: Chi-Square Test for Independence; Chi-Square Tests for Goodness-of-Fit; Uniform Goodness-of-Fit Test; Poisson Goodness-of-Fit Test; Normal Chi-Square Goodness-of-Fit Test; ECDF Tests (Optional). McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Chi-Square Test for Independence Contingency Tables A contingency table is a cross-tabulation of n paired observations into categories. Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.

Chi-Square Test for Independence Contingency Tables For example:

Chi-Square Test for Independence In a test of independence for an r x c contingency table, the hypotheses are H0: Variable A is independent of variable B; H1: Variable A is not independent of variable B. Use the chi-square test for independence to test these hypotheses. This nonparametric test is based on frequencies. The n data pairs are classified into c columns and r rows, and the observed frequency f_jk in each cell is compared with the expected frequency e_jk.

Chi-Square Test for Independence Chi-Square Distribution The critical value comes from the chi-square probability distribution with ν degrees of freedom, where ν = (r – 1)(c – 1), r = number of rows in the table, and c = number of columns in the table. Appendix E contains critical values for right-tail areas of the chi-square distribution. A chi-square distribution with ν degrees of freedom has mean ν and variance 2ν.

Chi-Square Test for Independence Chi-Square Distribution Consider the shape of the chi-square distribution:

Chi-Square Test for Independence Expected Frequencies Assuming that H0 is true, the expected frequency of row j and column k is: e_jk = R_j C_k / n, where R_j = total for row j (j = 1, 2, …, r), C_k = total for column k (k = 1, 2, …, c), and n = sample size.
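
The calculation can be sketched in a few lines of Python; the 2 x 3 table below is hypothetical, invented purely to illustrate the formula e_jk = R_j C_k / n.

```python
# Expected frequencies for a hypothetical 2 x 3 contingency table.
observed = [
    [20, 30, 50],
    [30, 20, 50],
]

row_totals = [sum(row) for row in observed]        # R_j
col_totals = [sum(col) for col in zip(*observed)]  # C_k
n = sum(row_totals)                                # sample size

# e_jk = R_j * C_k / n for every cell
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)
```

Note that the expected frequencies preserve the observed row and column totals, as the next slide states.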

Chi-Square Test for Independence Expected Frequencies The table of expected frequencies is: The e_jk always sum to the same row and column totals as the observed frequencies.

Chi-Square Test for Independence Steps in Testing the Hypotheses Step 1: State the Hypotheses H0: Variable A is independent of variable B; H1: Variable A is not independent of variable B. Step 2: State the Decision Rule Calculate ν = (r – 1)(c – 1). For a given α, look up the right-tail critical value χ²_R from Appendix E or by using Excel. Reject H0 if the test statistic exceeds χ²_R.

Chi-Square Test for Independence Steps in Testing the Hypotheses For example, for ν = 6 and α = .05, χ²_.05 = 12.59.

Chi-Square Test for Independence Steps in Testing the Hypotheses Here is the rejection region.

Chi-Square Test for Independence Steps in Testing the Hypotheses Step 3: Calculate the Expected Frequencies e_jk = R_j C_k / n For example,

Chi-Square Test for Independence Steps in Testing the Hypotheses Step 4: Calculate the Test Statistic The chi-square test statistic is χ² = Σ_j Σ_k (f_jk – e_jk)² / e_jk, summed over all r x c cells. Step 5: Make the Decision Reject H0 if the test statistic exceeds χ²_R, or if the p-value < α.
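
Steps 3 through 5 can be sketched end to end; the 2 x 3 table here is hypothetical, and the critical value 5.99 (right-tail χ² for α = .05 with 2 degrees of freedom) comes from a standard chi-square table such as Appendix E.

```python
# Hypothetical 2 x 3 table; test H0: row variable is independent of column variable.
observed = [[20, 30, 50], [30, 20, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum over all cells of (f_jk - e_jk)^2 / e_jk
chi_sq = sum((f - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for f, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1) = 2
critical = 5.99                                    # chi-square .05 right-tail, 2 d.f.

print(chi_sq, df, chi_sq > critical)   # 4.0 2 False -> do not reject H0
```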

Chi-Square Test for Independence Small Expected Frequencies The chi-square test is unreliable if the expected frequencies are too small. Rules of thumb: Cochran's Rule requires that e_jk ≥ 5 for all cells; a less strict rule allows up to 20% of the cells to have e_jk < 5. Most statisticians agree that a chi-square test is infeasible if e_jk < 1 in any cell. If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.

Chi-Square Test for Independence Small Expected Frequencies For example, here are some test results from MegaStat.

Chi-Square Test for Independence Test of Two Proportions For a 2 x 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions, provided the samples are large enough to ensure normality. The hypotheses are: H0: p1 = p2; H1: p1 ≠ p2. The z test statistic is z = (p̂1 – p̂2) / sqrt[ p̄(1 – p̄)(1/n1 + 1/n2) ], where p̄ is the pooled proportion; the square of this z statistic equals the 2 x 2 chi-square statistic.
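
The equivalence can be checked numerically: for any 2 x 2 table, z² from the pooled two-proportion test matches the chi-square statistic. The counts below are hypothetical.

```python
import math

# Hypothetical data: successes and sample sizes in two groups.
x1, n1 = 40, 100      # group 1: 40 successes out of 100
x2, n2 = 30, 100      # group 2: 30 successes out of 100

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                       # pooled proportion
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Chi-square statistic from the equivalent 2 x 2 contingency table
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_t = [sum(r) for r in observed]
col_t = [sum(c) for c in zip(*observed)]
n = sum(row_t)
chi_sq = sum((observed[j][k] - row_t[j] * col_t[k] / n) ** 2
             / (row_t[j] * col_t[k] / n)
             for j in range(2) for k in range(2))

print(round(z ** 2, 6), round(chi_sq, 6))   # the two values agree: z^2 = chi^2
```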

Chi-Square Test for Independence Cross-Tabulating Raw Data Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories. For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories:

Chi-Square Test for Independence 3-Way Tables and Higher More than two variables can be compared using contingency tables. However, it is difficult to visualize a higher order table. For example, you could visualize a cube as a stack of tiled 2-way contingency tables. Major computer packages permit 3-way tables.

Chi-Square Test for Goodness-of-Fit Purpose of the Test The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population. The chi-square test will be used because it is versatile and easy to understand.

Chi-Square Test for Goodness-of-Fit Hypotheses for GOF The hypotheses are: H0: The population follows a _____ distribution; H1: The population does not follow a _____ distribution. The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

Chi-Square Test for Goodness-of-Fit Test Statistic and Degrees of Freedom for GOF Assuming n observations, the observations are grouped into c classes and then the chi-square test statistic is found using: χ² = Σ_{j=1}^{c} (f_j – e_j)² / e_j, where f_j = the observed frequency of observations in class j and e_j = the expected frequency in class j if H0 were true.

Chi-Square Test for Goodness-of-Fit Test Statistic and Degrees of Freedom for GOF If the proposed distribution gives a good fit to the sample, the test statistic will be near zero. The test statistic follows the chi-square distribution with ν = c – m – 1 degrees of freedom, where c is the number of classes used in the test and m is the number of parameters estimated from the sample.
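
A minimal sketch of the statistic and its degrees of freedom, with made-up frequencies for c = 5 classes and no estimated parameters (m = 0):

```python
# Chi-square GOF statistic for c classes (hypothetical frequencies).
observed = [18, 25, 22, 19, 16]         # f_j
expected = [20, 20, 20, 20, 20]         # e_j under H0

chi_sq = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

c = len(observed)       # number of classes
m = 0                   # parameters estimated from the sample (none here)
df = c - m - 1
print(round(chi_sq, 2), df)   # 2.5 4
```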

Chi-Square Test for Goodness-of-Fit Data-Generating Situations Instead of “fishing” for a good-fitting model, visualize a priori the characteristics of the underlying data-generating process. Mixtures: A Problem Mixtures occur when more than one data-generating process is superimposed on top of one another.

Chi-Square Test for Goodness-of-Fit Eyeball Tests A simple “eyeball” inspection of the histogram or dot plot may suffice to rule out a hypothesized population. Small Expected Frequencies Goodness-of-fit tests may lack power in small samples. As a guideline, a chi-square goodness-of-fit test should be avoided if n < 25.

Uniform Goodness-of-Fit Test Multinomial Distribution A multinomial distribution is defined by any k probabilities p1, p2, …, pk that sum to unity. For example, consider the following “official” proportions of M&M colors.

Uniform Goodness-of-Fit Test Multinomial Distribution The hypotheses are H0: p1 = .30, p2 = .20, p3 = .10, p4 = .10, p5 = .10, p6 = .20; H1: At least one of the pj differs from the hypothesized value. No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are ν = c – m – 1 = 6 – 0 – 1 = 5.

Uniform Goodness-of-Fit Test Uniform Distribution The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence. The chi-square test for a uniform distribution compares all c groups simultaneously. The hypotheses are: H0: p1 = p2 = … = pc = 1/c; H1: Not all pj are equal.

Uniform Goodness-of-Fit Test Uniform GOF Test: Grouped Data The test can be performed on data that are already tabulated into groups. Calculate the expected frequency e_j for each cell. The degrees of freedom are ν = c – 1 since there are no parameters for the uniform distribution. Obtain the critical value χ²_α from Appendix E for the desired level of significance α. The p-value can be obtained from Excel. Reject H0 if the p-value < α.

Uniform Goodness-of-Fit Test Uniform GOF Test: Raw Data First form c bins of equal width and create a frequency distribution. Calculate the observed frequency f_j for each bin. Define e_j = n/c. Perform the chi-square calculations. The degrees of freedom are ν = c – 1 since there are no parameters for the uniform distribution. Obtain the critical value from Appendix E for a given significance level α and make the decision.
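
The raw-data procedure can be sketched as follows. The sample here is artificial, constructed to spread exactly evenly over [0, 1) so the statistic comes out to zero; real data would not fit this perfectly.

```python
# Uniform GOF on raw data in [0, 1), using c equal-width bins.
data = [(7 * i) % 100 / 100 for i in range(100)]   # artificial, exactly uniform
c = 5

# Observed frequency f_j per bin; expected e_j = n/c under uniformity
f = [0] * c
for x in data:
    f[min(int(x * c), c - 1)] += 1
e = len(data) / c

chi_sq = sum((fj - e) ** 2 / e for fj in f)
df = c - 1                   # no parameters estimated for the uniform
print(f, chi_sq, df)         # [20, 20, 20, 20, 20] 0.0 4
```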

Uniform Goodness-of-Fit Test Uniform GOF Test: Raw Data Maximize the test’s power by defining bin width as As a result, the expected frequencies will be as large as possible.

Uniform Goodness-of-Fit Test Uniform GOF Test: Raw Data Calculate the mean and standard deviation of the (discrete) uniform distribution as: μ = (a + b)/2 and σ = sqrt([(b – a + 1)² – 1]/12). If the data are not skewed and the sample size is large (n > 30), then the sample mean is approximately normally distributed, so the hypothesized uniform mean can also be tested.

Poisson Goodness-of-Fit Test Poisson Data-Generating Situations In a Poisson distribution model, X represents the number of events per unit of time or space. X is a discrete nonnegative integer (X = 0, 1, 2, …) Event arrivals must be independent of each other. Sometimes called a model of rare events because X typically has a small mean.

Poisson Goodness-of-Fit Test The mean λ is the only parameter. Assuming that λ is unknown and must be estimated from the sample, the steps are: Step 1: Tally the observed frequency f_j of each X-value. Step 2: Estimate the mean λ from the sample. Step 3: Use the estimated λ to find the Poisson probability P(X) for each value of X.

Poisson Goodness-of-Fit Test Step 4: Multiply P(X) by the sample size n to get expected Poisson frequencies e_j. Step 5: Perform the chi-square calculations. Step 6: Make the decision. You may need to combine classes until expected frequencies become large enough for the test (at least until e_j > 2).

Poisson Goodness-of-Fit Test Poisson GOF Test: Tabulated Data Calculate the sample mean as: λ̂ = (Σ_{j=1}^{c} x_j f_j) / n. Using this estimated mean, calculate the Poisson probabilities either with the Poisson formula P(x) = λ^x e^(–λ) / x! or with Excel.
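
Steps 1 through 4 can be sketched for hypothetical tabulated arrival counts (x_j = number of events, f_j = observed frequency of that count):

```python
import math

x = [0, 1, 2, 3, 4]          # observed event counts
f = [30, 35, 20, 10, 5]      # hypothetical frequencies of each count
n = sum(f)

# Estimated mean: lambda-hat = sum(x_j * f_j) / n
lam = sum(xj * fj for xj, fj in zip(x, f)) / n

# Poisson probabilities P(x) = lam^x e^(-lam) / x! and expected frequencies
p = [lam ** xj * math.exp(-lam) / math.factorial(xj) for xj in x]
e = [n * pj for pj in p]

print(lam)   # 1.25
```

Classes with small e_j would then be combined before the chi-square step, and m = 1 parameter (λ) was estimated, so ν = c – m – 1.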

Poisson Goodness-of-Fit Test Poisson GOF Test: Tabulated Data For c classes with m = 1 parameter estimated, the degrees of freedom are ν = c – m – 1 = c – 2. Obtain the critical value for a given α from Appendix E. Make the decision.

Normal Chi-Square Goodness-of-Fit Test Normal Data-Generating Situations Two parameters, μ and σ, fully describe the normal distribution. Unless μ and σ are known a priori, they must be estimated from the sample by using x̄ and s. Using these statistics, the chi-square goodness-of-fit test can be applied.

Normal Chi-Square Goodness-of-Fit Test Method 1: Standardizing the Data Transform the sample observations x1, x2, …, xn into standardized values. Count the sample observations f_j within intervals of the form x̄ + ks and compare them with the known frequencies e_j based on the normal distribution.

Normal Chi-Square Goodness-of-Fit Test Method 1: Standardizing the Data Advantage is a standardized scale. Disadvantage is that data are no longer in the original units.

Normal Chi-Square Goodness-of-Fit Test Method 2: Equal Bin Widths To obtain equal-width bins, divide the exact data range into c groups of equal width. Step 1: Count the sample observations in each bin to get observed frequencies f_j. Step 2: Convert the bin limits into standardized z-values by using the formula z = (x – x̄)/s.

Normal Chi-Square Goodness-of-Fit Test Method 2: Equal Bin Widths Step 3: Find the normal area within each bin assuming a normal distribution. Step 4: Find expected frequencies ej by multiplying each normal area by the sample size n. Classes may need to be collapsed from the ends inward to enlarge expected frequencies.

Normal Chi-Square Goodness-of-Fit Test Method 3: Equal Expected Frequencies Define histogram bins in such a way that an equal number of observations would be expected within each bin under the null hypothesis. Define bin limits so that e_j = n/c, i.e., a normal area of 1/c in each of the c bins. The first and last classes must be open-ended for a normal distribution, so to define c bins we need c – 1 cutpoints.

Normal Chi-Square Goodness-of-Fit Test Method 3: Equal Expected Frequencies The upper limit of bin j can be found directly by using Excel. Alternatively, find z_j for bin j using Excel and then calculate the upper limit for bin j as x̄ + z_j s. Once the bins are defined, count the observations f_j within each bin and compare them with the expected frequencies e_j = n/c.
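
Instead of Excel, the standard-normal cutpoints can be computed with Python's standard library (`statistics.NormalDist`, Python 3.8+). The sample statistics x̄ = 50 and s = 10 below are hypothetical.

```python
from statistics import NormalDist

c = 4                     # number of equal-expected-frequency bins
xbar, s = 50.0, 10.0      # hypothetical sample mean and std. dev.

# z_j cutpoints put area j/c to the left, so each bin has normal area 1/c
z = [NormalDist().inv_cdf(j / c) for j in range(1, c)]

# Upper limit of bin j in the original units: xbar + z_j * s
cutpoints = [xbar + zj * s for zj in z]
print([round(v, 2) for v in cutpoints])   # [43.26, 50.0, 56.74]
```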

Normal Chi-Square Goodness-of-Fit Test Method 3: Equal Expected Frequencies Standard normal cutpoints for equal area bins.

Normal Chi-Square Goodness-of-Fit Test Histograms The fitted normal histogram gives visual clues as to the likely outcome of the GOF test. Histograms reveal outliers or other non-normality issues. Because visual impressions can vary, a formal test is still needed.

Normal Chi-Square Goodness-of-Fit Test Critical Values for Normal GOF Test Since two parameters, μ and σ, are estimated from the sample, the degrees of freedom are ν = c – m – 1 = c – 3. At least 4 bins are needed to ensure at least 1 degree of freedom.

ECDF Tests Kolmogorov-Smirnov and Lilliefors Tests There are many alternatives to the chi-square test based on the Empirical Cumulative Distribution Function (ECDF). The Kolmogorov-Smirnov (K-S) test statistic D is the largest absolute difference between the actual and expected cumulative relative frequency of the n data values: D = Max |Fa – Fe| The K-S test is not recommended for grouped data.

ECDF Tests Kolmogorov-Smirnov and Lilliefors Tests Fa is the actual cumulative frequency at observation i. Fe is the expected cumulative frequency at observation i under the assumption that the data came from the hypothesized distribution. The K-S test assumes that no parameters are estimated. If parameters are estimated, use a Lilliefors test. Both of these tests are done by computer.
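
For intuition, D can be computed by hand for a small hypothetical sample tested against the uniform distribution on [0, 1], whose CDF is F(x) = x. (In practice the K-S test is done by computer, as the slide notes.)

```python
# K-S statistic D = max |Fa - Fe| for a test of uniformity on [0, 1].
data = sorted([0.10, 0.22, 0.41, 0.55, 0.60, 0.78, 0.81, 0.95])
n = len(data)

# The ECDF jumps from i/n to (i+1)/n at the i-th sorted value, so the
# largest gap at each observation is the bigger of the two differences.
D = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(data))
print(round(D, 3))   # 0.175
```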

ECDF Tests Kolmogorov-Smirnov and Lilliefors Tests K-S test for uniformity.

ECDF Tests Kolmogorov-Smirnov and Lilliefors Tests K-S test for normality.

ECDF Tests Anderson-Darling Tests The Anderson-Darling (A-D) test is widely used for non-normality because of its power. The A-D test is based on a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test statistic measures the overall distance between the actual and the hypothesized distributions, using a weighted squared distance.

ECDF Tests Anderson-Darling Tests with MINITAB

Applied Statistics in Business and Economics End of Chapter 15