Chapter 26 Comparing Counts.

Objectives: Chi-Square Model; Chi-Square Statistic; Knowing when and how to use the Chi-Square Tests (Goodness of Fit, Test of Independence, Test of Homogeneity); Standardized Residual.

Categorical Data Chi-square tests are used when we have counts for the categories of a categorical variable.
Goodness-of-Fit Test: Allows us to test whether a certain population distribution seems valid. This is a one-variable, one-sample test.
Test of Independence: Cross-categorizes one group on two variables to see if there is an association between the variables. This is a two-variable, one-sample test.
Test for Homogeneity: Compares the observed distributions for several groups to each other to see if there is a difference among the populations. This is a one-variable, many-samples test.

Chi-Square Model Just like the Student's t-models, chi-square has a family of models that depend on the degrees of freedom. Unlike the Student's t-models, a chi-square distribution is not symmetric. It’s skewed right. A chi-square test is always a one-sided, right-tailed test.

The Chi-Square (χ²) Distribution - Properties It is a continuous distribution. It is not symmetric. It is skewed to the right. The distribution depends on the degrees of freedom. The value of a χ² random variable is always nonnegative. There are infinitely many χ² distributions, since each is uniquely defined by its degrees of freedom.

The Chi-Square (χ²) Distribution - Properties For small degrees of freedom, the χ² distribution is very skewed to the right. As the degrees of freedom increase, the χ² distribution becomes more and more symmetrical.

The Chi-Square (χ²) Distribution - Properties Since we will be using the χ² distribution for the tests in this chapter, we will need to be able to find critical values associated with the distribution.

Critical Value Explanation of the term critical or rejection region: A critical or rejection region is a range of test statistic values for which the null hypothesis will be rejected. Values in this range indicate that there is a significant, or large enough, difference between the postulated parameter value and the corresponding point estimate for the parameter.

Critical Value Explanation of the term non-critical or non-rejection region: A non-critical or non-rejection region is a range of test statistic values for which the null hypothesis will not be rejected. Values in this range indicate that there is not a significant, or large enough, difference between the postulated parameter value and the corresponding point estimate for the parameter.

Critical Value [Diagram labels: Critical Region (Rejection Region); Non-Critical Region (Non-Rejection Region).]

The Chi-Square (χ²) Distribution - Properties Notation: $\chi^2_{\alpha,\,df}$. Explanation of the notation: $\chi^2_{\alpha,\,df}$ is a χ² value with df degrees of freedom such that an area of α (the significance level) lies to the right of that χ² value.

The Chi-Square (χ²) Distribution - Properties [Diagram explaining the notation $\chi^2_{\alpha,\,df}$.]

The Chi-Square (χ²) Distribution - Table Values for the random variable with the appropriate degrees of freedom can be obtained from the tables in the formula booklet. Example: What is the value of $\chi^2_{0.05,\,10}$?

The Chi-Square (χ²) Distribution - Table [Table excerpt: the α = 0.05 column and the df = 10 row meet at the χ² critical value.]

The Chi-Square (χ²) Distribution - Table Solution: From the table in the formula booklet, $\chi^2_{0.05,\,10} = 18.307$.

The Chi-Square (χ²) Distribution - Table Your Turn: What is the value of $\chi^2_{0.10,\,20}$?

The Chi-Square (χ²) Distribution - Table $\chi^2_{0.10,\,20} = 28.41$
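
The table lookups above can also be reproduced numerically. The following is a minimal sketch that is not part of the original slides. It uses only Python's standard library, and the helper names `chi2_sf` and `chi2_critical` are hypothetical. The right-tail area is evaluated with the standard recurrence for the regularized upper incomplete gamma function (valid for whole-number degrees of freedom), and the critical value is then found by bisection.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_critical(alpha, df):
    """Value whose right-tail area is alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_critical(0.05, 10), 3))
print(round(chi2_critical(0.10, 20), 3))
```

The two printed values should agree with the table entries quoted on the slides (18.307 and 28.41) up to rounding.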

Chi-Square (χ²) Test for Goodness of Fit

Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit test. As usual, there are assumptions and conditions to consider…

Assumptions and Conditions Counted Data Condition: Check that the data are counts for the categories of a categorical variable. Independence Assumption: The counts in the cells should be independent of each other. Randomization Condition: The individuals who have been counted and whose counts are available for analysis should be a random sample from some population.

Assumptions and Conditions Sample Size Assumption: We must have enough data for the methods to work. Expected Cell Frequency Condition: We should expect to see at least 5 individuals in each cell. This is similar to the condition that np and nq be at least 10 when we tested proportions.

Calculations Since we want to examine how well the observed data reflect what would be expected, it is natural to look at the differences between the observed and expected counts (Obs – Exp).

Calculations (cont.) The test statistic, called the chi-square (or chi-squared) statistic, is found by summing the squared deviations between the observed and expected counts, each divided by its expected count: $\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$

One-Sided or Two-Sided? The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals. If the observed counts don’t match the expected, the statistic will be large—it can’t be “too small.” So the chi-square test is always one-sided. If the calculated statistic value is large enough, we’ll reject the null hypothesis.

One-Sided or Two-Sided? The mechanics may work like a one-sided test, but the interpretation of a chi-square test is in some ways many-sided. There are many ways the null hypothesis could be wrong. There’s no direction to the rejection of the null model—all we know is that it doesn’t fit.

Expected Frequencies If the expected frequencies are not all equal, each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category: E = np.

Expected Frequencies The chi-square goodness of fit test is always a right-tailed test. For the chi-square goodness-of-fit test, the expected frequencies should be at least 5. When the expected frequency of a class or category is less than 5, this class or category can be combined with another class or category so that the expected frequency is at least 5.

Goodness-of-fit Test Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$. Critical values: 1. Found in the table using k – 1 degrees of freedom, where k = number of categories. 2. Goodness-of-fit hypothesis tests are always right-tailed.

EXAMPLE There are 4 TV sets that are located in the student center of a large university. At a particular time each day, four different soap operas (1, 2, 3, and 4) are viewed on these TV sets. The percentages of the audience captured by these shows during one semester were 25 percent, 30 percent, 25 percent, and 20 percent, respectively. During the first week of the following semester, 300 students are surveyed.

EXAMPLE (Continued) (a) If the viewing pattern has not changed, what number of students is expected to watch each soap opera? Solution: Based on the information, the expected values will be: 0.25 × 300 = 75, 0.30 × 300 = 90, 0.25 × 300 = 75, and 0.20 × 300 = 60.
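
The expected counts in part (a) follow directly from E = np. A short sketch of that arithmetic, assuming only the shares and sample size stated in the example (the variable names are illustrative), also checks the Expected Cell Frequency Condition:

```python
# Audience shares for soap operas 1 to 4, as stated in the example.
shares = [0.25, 0.30, 0.25, 0.20]
n = 300  # students surveyed

# Under the null hypothesis of an unchanged viewing pattern, E = n * p.
expected = [n * p for p in shares]

# Expected Cell Frequency Condition: every expected count should be at least 5.
condition_met = all(e >= 5 for e in expected)

print([round(e, 1) for e in expected], condition_met)
```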

EXAMPLE (Continued) (b) Suppose that the actual observed numbers of students viewing the soap operas are given in the following table. Test whether these numbers indicate a change at the 1 percent level of significance.

EXAMPLE (Continued) Solution: Given α = 0.01 and k = 4 categories, df = 4 – 1 = 3 and $\chi^2_{0.01,\,3} = 11.345$. The observed and expected frequencies are given below.

EXAMPLE (Continued) Solution (continued): The χ² test statistic is computed below.

EXAMPLE (Continued) Solution (continued): P-value = 0.6828. Since P > α, we fail to reject the null hypothesis.

EXAMPLE (Continued) Solution (continued): Diagram showing the rejection region.

The Chi-Square test for Goodness of Fit

Your Turn The Advanced Placement (AP) Statistics examination was first administered in May 1997. Students’ papers are graded on a scale of 1–5, with 5 being the highest score. Over 7,600 students took the exam in the first year, and the distribution of scores was as follows (not including exams that were scored late).
Score      5      4      3      2      1
Percent    15.3   22.0   24.8   19.8   18.1
A distance learning class that took AP Statistics via satellite television had the following distribution of grades:
Score      5      4      3      2      1
Frequency  7      13     7      6      2

Your Turn Carry out an appropriate test to determine if the distribution of scores for students enrolled in the distance learning program is significantly different from the distribution of scores for all students who took the inaugural exam.
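
One way to check the mechanics of this exercise is sketched below. It is not the slides' own solution: the variable names are illustrative, and because the exercise does not state a significance level, the sketch stops at the statistic and P-value. For 4 degrees of freedom the right-tail area of the chi-square distribution has the closed form exp(–x/2)(1 + x/2), which avoids any table lookup.

```python
import math

# Score categories 5, 4, 3, 2, 1, in the order given on the slide.
percents = [15.3, 22.0, 24.8, 19.8, 18.1]   # all students, inaugural exam
observed = [7, 13, 7, 6, 2]                 # distance learning class

n = sum(observed)
expected = [n * p / 100 for p in percents]  # E = n * p for each score

# Chi-square statistic: sum of (O - E)^2 / E over the five categories.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Right-tail area for df = 4 (closed form for this df only).
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

print(f"n = {n}, chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.3f}")
```

Every expected count here is at least 5, so the Expected Cell Frequency Condition holds. Whether a single class can be treated as a random sample is a separate judgment that the code does not address.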

χ² Test of Independence

Independence Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other. A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table.

Definition Test of Independence: This method tests the null hypothesis that the row variable and column variable in a contingency table are not related. (The null hypothesis is the statement that the row and column variables are independent.) One way to explain why the null hypothesis is always that the variables are independent is to say that independence means no (= 0) relationship, which matches previous problems where the null hypothesis has equality. (See page 590 of the text.)

Assumptions and Conditions The assumptions and conditions are the same as for the chi-square goodness-of-fit test: Counted Data Condition: The data must be counts. Randomization Condition and 10% Condition: As long as we don’t want to generalize, we don’t have to check these conditions. Expected Cell Frequency Condition: The expected count in each cell must be at least 5.

Test of Independence Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$. Critical values: 1. Found in the table using degrees of freedom = (r – 1)(c – 1), where r is the number of rows and c is the number of columns. 2. Tests of independence are always right-tailed. This is the same chi-square formula as for multinomial tables. (See page 591 of the text.)

Tests of Independence H0: The row variable is independent of the column variable. H1: The row variable is dependent on (related to) the column variable. This procedure cannot be used to establish a direct cause-and-effect link between the variables in question. Dependence means only that there is a relationship between the two variables.

Expected Frequency for Contingency Tables $E = n \cdot p = (\text{table total}) \cdot \frac{\text{row total}}{\text{table total}} \cdot \frac{\text{column total}}{\text{table total}} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$, where n is the table total and p is the probability of a cell.

$E = \frac{(\text{row total})(\text{column total})}{\text{table total}}$, where the table total is the total number of all observed frequencies in the table. (Page 592 of the text has an example of finding an E value.)

Observed and Expected Frequencies We will use the mortality table from the Titanic to find expected frequencies (Exercise #11 on page 601).
            Men     Women   Boys    Girls   Total
Survived    332     318     29      27      706
Died        1360    104     35      18      1517
Total       1692    422     64      45      2223
For the upper left hand cell, we find: $E = \frac{(706)(1692)}{2223} = 537.360$

Observed and Expected Frequencies Find the expected frequency for the lower left hand cell, assuming independence between the row variable and the column variable: $E = \frac{(1517)(1692)}{2223} = 1154.640$

Observed and Expected Frequencies (expected frequencies in parentheses)
            Men               Women           Boys          Girls         Total
Survived    332 (537.360)     318 (134.022)   29 (20.326)   27 (14.291)   706
Died        1360 (1154.640)   104 (287.978)   35 (43.674)   18 (30.709)   1517
Total       1692              422             64            45            2223
To interpret this result for the lower left hand cell, we can say that although 1360 men actually died, we would have expected 1154.64 men to die if survivability is independent of whether the person is a man, woman, boy, or girl.

Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl. H0: Whether a person survived is independent of whether the person is a man, woman, boy, or girl. H1: Surviving the Titanic and being a man, woman, boy, or girl are dependent. The test statistic chi-square values need to be compared with the chi-square critical value found in Table A-4.

Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl.
$\chi^2 = \frac{(332 - 537.36)^2}{537.36} + \frac{(318 - 134.022)^2}{134.022} + \frac{(29 - 20.326)^2}{20.326} + \frac{(27 - 14.291)^2}{14.291} + \frac{(1360 - 1154.64)^2}{1154.64} + \frac{(104 - 287.978)^2}{287.978} + \frac{(35 - 43.674)^2}{43.674} + \frac{(18 - 30.709)^2}{30.709}$
$\chi^2 = 78.481 + 252.555 + 3.702 + 11.302 + 36.525 + 117.536 + 1.723 + 5.260 = 507.084$

Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl. The number of degrees of freedom is (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3. Critical value: $\chi^2_{0.05,\,3} = 7.815$. Since 507.084 > 7.815, we reject the null hypothesis. P-value: P = P(χ² > 507.084) ≈ 0, so P < α and we reject the null hypothesis. Surviving the Titanic and being a man, woman, boy, or girl are dependent.

2 = 507.084 2 = 7.815 (from Table ) Test Statistic with  = 0.05 and (r – 1) (c– 1) = (2 – 1) (4 – 1) = 3 degrees of freedom Critical Value 2 = 7.815 (from Table )

EXAMPLE A survey was done by a car manufacturer concerning a particular make and model. A group of 500 potential customers were asked whether they purchased their current car because of its appearance, its performance rating, or its fixed price (no negotiating). The results, broken down by gender responses, are given on the next slide.

EXAMPLE (Continued) Question: Do females feel differently than males about the three different criteria used in choosing a car, or do they feel basically the same?

Solution Use a χ² test for independence. The null hypothesis is that the criterion used is independent of gender, while the alternative hypothesis is that the criterion used is dependent on gender.

Solution (continued) The degrees of freedom is given by (number of rows – 1)(number of columns – 1). df = (2 – 1)(3 – 1) = 2.

Solution (continued) Calculate the row and column totals. These row and column totals are called marginal totals.

Solution (continued) Computation of the expected values The expected value for a cell is the row total times the column total divided by the table total.

Solution (continued) Let us use α = 0.01. So df = (2 – 1)(3 – 1) = 2 and $\chi^2_{0.01,\,2} = 9.210$.

Solution (continued) The χ² test statistic is computed in the same manner as was done for the goodness-of-fit test.

Solution (continued) Diagram showing the rejection region.

Your Turn: Could eye color be a warning signal for hearing loss in patients suffering from meningitis? British researcher Helen Cullington recorded the eye color of 130 deaf patients, and noted whether the patient’s deafness had developed following treatment for meningitis. Her data are summarized in the table. Test an appropriate hypothesis and state your conclusion.

Test of Homogeneity

Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square test of homogeneity. A test of homogeneity is actually the generalization of the two-proportion z-test.

Comparing Observed Distributions (cont.) The statistic that we calculate for this test is identical to the chi-square statistic for independence. In this test, however, we ask whether choices are the same among different groups (i.e., there is no model). The expected counts are found directly from the data, and the degrees of freedom differ from those of the goodness-of-fit test: they are (r – 1)(c – 1), as in the test of independence.

Assumptions and Conditions The assumptions and conditions are the same as for the chi-square goodness-of-fit test: Counted Data Condition: The data must be counts. Randomization Condition and 10% Condition: As long as we don’t want to generalize, we don’t have to check these conditions. Expected Cell Frequency Condition: The expected count in each cell must be at least 5.

Test for Homogeneity In a chi-square test for homogeneity of proportions, we test whether different populations have the same proportion of individuals with some characteristic. The procedures for performing a test of homogeneity are identical to those for a test of independence.
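
The earlier remark that a test of homogeneity generalizes the two-proportion z-test can be checked numerically: for two groups and a yes/no response, the chi-square statistic equals the square of the pooled z statistic. The counts below are hypothetical, invented only for this illustration, and the variable names are assumptions.

```python
import math

# Hypothetical counts (not from the slides): "yes" answers out of n in two groups.
yes_1, n_1 = 30, 100
yes_2, n_2 = 45, 100

# Two-proportion z-test using the pooled proportion.
p_1, p_2 = yes_1 / n_1, yes_2 / n_2
pooled = (yes_1 + yes_2) / (n_1 + n_2)
z = (p_1 - p_2) / math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))

# The same data as a 2 x 2 table: rows are yes/no, columns are the two groups.
observed = [[yes_1, yes_2], [n_1 - yes_1, n_2 - yes_2]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)

print(z ** 2, chi_square)  # the two values agree up to rounding error
```

With more than two groups there is no single z statistic, which is why the chi-square form is the one that generalizes.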

Example: The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below:
       1992   2002   2008
Yes    418    479    525
No     602    541    485
Test the claim that the proportion of individuals that feel being a teacher is an occupation of very great prestige is the same for each year at the α = 0.01 level of significance.

Solution Step 1: The null hypothesis is a statement of “no difference” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows: $H_0: p_{1992} = p_{2002} = p_{2008}$. $H_1$: At least one of the proportions is different from the others. Step 2: The level of significance is α = 0.01.

Solution Step 3: (a) The expected frequencies are found by multiplying the appropriate row and column totals and then dividing by the total sample size. They are given in parentheses in the table below, along with the observed frequencies.
       1992            2002            2008
Yes    418 (475.554)   479 (475.554)   525 (470.892)
No     602 (544.446)   541 (544.446)   485 (539.108)

Solution Step 3: Since none of the expected frequencies are less than 5, the requirements are satisfied. The test statistic is $\chi^2 = \sum \frac{(O - E)^2}{E} \approx 24.74$.

Solution: Classical Approach Step 4: There are r = 2 rows and c = 3 columns, so we find the critical value using (2 – 1)(3 – 1) = 2 degrees of freedom. The critical value is $\chi^2_{0.01,\,2} = 9.210$.

Solution: Classical Approach Step 5: Since the test statistic, χ² ≈ 24.74, is greater than the critical value, 9.210, we reject the null hypothesis.

Solution: P-Value Approach Step 4: There are r = 2 rows and c = 3 columns, so we find the P-value using (2 – 1)(3 – 1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of χ² ≈ 24.74, which is approximately 0.

Solution: P-Value Approach Step 5: Since the P-value is less than the level of significance α = 0.01, we reject the null hypothesis.

Solution Step 6: There is sufficient evidence to reject the null hypothesis at the α = 0.01 level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years.
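
The steps of this solution can be collected into one small function. This is a sketch under stated assumptions, not the textbook's procedure: `chi_square_two_way` is a hypothetical helper name, and the P-value line relies on the fact that with 2 degrees of freedom the right-tail area of the chi-square distribution is exactly exp(–x/2).

```python
import math

def chi_square_two_way(observed):
    """Return (statistic, df, expected) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # E = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Columns: 1992, 2002, 2008. Rows: yes, no.
prestige = [[418, 479, 525], [602, 541, 485]]
statistic, df, expected = chi_square_two_way(prestige)

# Right-tail area for df = 2 (closed form for this df only).
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.2f}, df = {df}, P-value < 0.01: {p_value < 0.01}")
```

Because the mechanics of homogeneity and independence tests are identical, the same function applies to any two-way table of counts; only the hypotheses and the sampling design differ.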

Example: Should Dentists Advertise? It may seem hard to believe, but until the 1970s most professional organizations prohibited their members from advertising. In 1977, the U.S. Supreme Court ruled that prohibiting doctors and lawyers from advertising violated their free speech rights.

Should Dentists Advertise? The paper “Should Dentists Advertise?” (J. of Advertising Research (June 1982): 33 – 38) compared the attitudes of consumers and dentists toward the advertising of dental services. Separate samples of 101 consumers and 124 dentists were asked to respond to the following statement: “I favor the use of advertising by dentists to attract new patients.”

Example: Should Dentists Advertise? Possible responses were: strongly agree, agree, neutral, disagree, strongly disagree. The authors were interested in determining whether the two groups, dentists and consumers, differed in their attitudes toward advertising.

Example: Should Dentists Advertise? This is done by a chi-squared test of homogeneity; that is, we are testing the claim that different populations have the same distribution across the categories of some characteristic. So how should we state the null and alternative hypotheses for this test?

Example: Should Dentists Advertise? H0: The true category proportions for all responses are the same for both populations of consumers and dentists. Ha: The true category proportions for all responses are not the same for both populations of consumers and dentists.

Observed Data [Table of observed counts: the row totals are 101 consumers and 124 dentists, 225 respondents in all.] How do we determine the expected cell count under the assumption of homogeneity? That’s right, the expected cell counts are estimated from the sample data (assuming that H0 is true) by using $E = \frac{(\text{row total})(\text{column total})}{\text{table total}}$.

Expected Values So the calculation for the first cell gives 19.30.

Expected Values
             Strongly agree   Agree   Neutral   Disagree   Strongly disagree   Total
Consumers    19.30            30.08   14.36     14.36      22.89               101
Dentists     23.70            36.92   17.64     17.64      28.11               124
Total                                                                          225

Test Statistic Now we can calculate the χ² test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E} = 84.47$

Sampling Distribution The two-way table for this situation has 2 rows and 5 columns, so the appropriate degrees of freedom is (2 – 1)(5 – 1) = 4. Chi-squared critical value: χ²* = 9.49 (the α = 0.05 value for 4 degrees of freedom). Since χ² = 84.47 > χ²* = 9.49, reject the null hypothesis.

P-value P-value: P = P(χ² > 84.47) ≈ 0. Reject the null hypothesis. Conclusion: With a P-value ≈ 0, reject the null hypothesis. The true category proportions for all responses are not the same for both populations of consumers and dentists.

Your Turn An advertising firm has decided to ask 92 customers at each of three local shopping malls if they are willing to take part in a market research survey. According to previous studies, 38% of Americans refuse to take part in such surveys. At α = 0.01, test the claim that the proportions are equal.

Your Turn
                       Mall A   Mall B   Mall C   Total
Will participate       52       45       36       133
Will not participate   40       47       56       143
Total                  92       92       92       276

Chi-Square and Causation Chi-square tests are common, and tests for independence are especially widespread. We need to remember that a small P-value is not proof of causation. Since the chi-square test for independence treats the two variables symmetrically, we cannot differentiate the direction of any possible causation even if it existed. And, there’s never any way to eliminate the possibility that a lurking variable is responsible for the lack of independence.

Chi-Square and Causation (cont.) In some ways, a failure of independence between two categorical variables is less impressive than a strong, consistent, linear association between quantitative variables. Two categorical variables can fail the test of independence in many ways. Examining the standardized residuals can help you think about the underlying patterns.
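
As an illustration of examining standardized residuals, the sketch below reuses the Titanic counts from the independence example. It assumes the usual definition, (Obs – Exp)/√Exp for each cell, which these slides name but do not spell out. The variable names are illustrative.

```python
import math

# Titanic counts; columns are men, women, boys, girls.
observed = [
    [332, 318, 29, 27],    # survived
    [1360, 104, 35, 18],   # died
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Standardized residual for each cell: (Obs - Exp) / sqrt(Exp).
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for label, row in zip(("Survived", "Died"), residuals):
    print(label, [round(c, 2) for c in row])

# The squared residuals add up to the chi-square statistic.
print(round(sum(c ** 2 for row in residuals for c in row), 3))
```

The sign of each residual shows the direction of the departure from independence: negative for men who survived (fewer than expected), positive for women who survived (more than expected). Its size shows how much that cell contributes to the statistic.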

CHI-SQUARE INFERENCE TEST FOR GOODNESS OF FIT Used to determine if a particular population distribution fits a specified form HYPOTHESES: H0: Actual population percents are equal to hypothesized percentages Ha: Actual population percents are different from hypothesized percentages

CHI-SQUARE INFERENCE TEST FOR INDEPENDENCE Used to determine if two variables within a single population are independent HYPOTHESES: H0: There is no relationship between the two variables in the population Ha: There is a dependent relationship between the two variables in the population

CHI-SQUARE INFERENCE TEST FOR HOMOGENEITY Used to determine if two separate populations are similar in respect to a single variable HYPOTHESES: H0: There are no differences among proportions of success in the populations Ha: There are differences among proportions of success in the populations