Chapter 20 Two Categorical Variables: The Chi-Square Test

Outline
• Two-way tables
• The problem of multiple comparisons
• The chi-square test
• The chi-square distributions

Relationships: Categorical Variables
• Chapter 19: compare proportions of successes for two groups
  – "Group" is the explanatory variable (2 levels)
  – "Success or failure" is the outcome (2 values)
• Chapter 20: "Is there a relationship between two categorical variables?"
  – may have 2 or more groups (1st variable)
  – may have 2 or more outcomes (2nd variable)

1. Two-Way Tables

Quality of life      Canada   United States
Much better              75             541
Somewhat better          71             498
About the same           96             779
Somewhat worse           50             282
Much worse               19              65
Total                   311            2165

Two-Way Tables
• When there are two categorical variables, the data are summarized in a two-way table.
• The number of observations falling into each combination of the two categorical variables is entered in the corresponding cell of the table.
• Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table.

Example 20.1
Data from patients' own assessment of their quality of life relative to what it had been before their heart attack (data from patients who survived at least a year). The counts are those in the two-way table above.

Compare the Canadian group to the U.S. group in terms of feeling much better: 75 Canadians reported feeling much better, compared with 541 Americans. The groups appear very different, but look at the group totals: 311 Canadians versus 2165 Americans.

Compare the Canadian group to the U.S. group in terms of feeling much better by changing the counts to percents (each count divided by its column total). With this fairer comparison, the groups appear very similar in terms of feeling much better (24% vs. 25%).

Quality of life      Canada   United States
Much better             24%             25%
Somewhat better         23%             23%
About the same          31%             36%
Somewhat worse          16%             13%
Much worse               6%              3%
Total                  100%            100%
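A minimal Python sketch of the percent calculation, assuming the counts are stored as plain lists in the row order of the table (the names `COUNTS` and `column_percents` are illustrative, not from the text). Each count is divided by its column (country) total:

```python
# Observed counts from the quality-of-life table, in the order
# Much better, Somewhat better, About the same, Somewhat worse, Much worse.
COUNTS = {
    "Canada": [75, 71, 96, 50, 19],
    "United States": [541, 498, 779, 282, 65],
}


def column_percents(counts):
    """Return each count as a percent of its group (column) total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


for country, counts in COUNTS.items():
    rounded = [round(p) for p in column_percents(counts)]
    print(country, sum(counts), rounded)
# Canada 311 [24, 23, 31, 16, 6]
# United States 2165 [25, 23, 36, 13, 3]
```

The rounded percents reproduce the table; each column sums to 100% because every percent shares that column's total as its denominator.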

Is there a relationship between the explanatory variable (Country) and the response variable (Quality of life)?
• Look at the distributions of the response variable (Quality of life) for each level of the explanatory variable (Country), given by the column percents in the table above (p. 531).
• Question: Is there a significant difference between these two conditional distributions?

Significance Test
• If the distributions of the second variable are nearly the same for each category of the first variable, we say that there is no association between the two variables.
• If there are significant differences in the distributions, we say that there is an association between the two variables.
• A significance test is needed to draw a conclusion.

Hypothesis Test
• Hypotheses:
  – Null: the percentages for one variable are the same for every level of the other variable (no difference in conditional distributions; no real relationship).
  – Alternative: the percentages for one variable vary over levels of the other variable (a real relationship).

• Null hypothesis: the percentages for one variable are the same for every level of the other variable (no real relationship).
• For example, we could look at the differences in percentages between Canada and the U.S. for each level of "Quality of life": 24% vs. 25% for those who felt "Much better", 23% vs. 23% for "Somewhat better", and so on.
• Problem of multiple comparisons!

2. Multiple Comparisons
• Problem: how to do many comparisons at the same time with some overall measure of confidence in all the conclusions
• Two steps:
  – an overall test for any differences
  – a follow-up analysis to decide which parameters (or groups) differ and how large the differences are
• Follow-up analyses can be quite complex; we will look only at the overall test for a relationship between two categorical variables
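To see why many separate comparisons are a problem, here is a rough Python calculation of the chance of at least one false "significant" difference among k tests, each run at level alpha. It assumes the tests are independent, which the five percentage comparisons in this example are not exactly, so treat it only as an illustration; the function name is hypothetical.

```python
def chance_of_any_false_alarm(alpha, k):
    """P(at least one of k independent level-alpha tests rejects a true H0)."""
    return 1 - (1 - alpha) ** k


# One comparison per quality-of-life category (5 categories).
print(round(chance_of_any_false_alarm(0.05, 5), 3))  # 0.226
```

Even though each comparison alone has a 5% false-alarm rate, the chance that at least one of five does is over 20%, which is why a single overall test comes first.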

Hypothesis Test
• H0: no real relationship between the two categorical variables that make up the rows and columns of a two-way table
• To test H0, compare the observed counts in the table (the original data) with the expected counts (the counts we would expect if H0 were true):
  – if the observed counts are far from the expected counts, that is evidence against H0 in favor of a real relationship between the two variables

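3. Expected Counts
• The expected count in any cell of a two-way table, when H0 is true, is
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
• For the observed data below, find the expected value for each cell:

Quality of life      Canada   United States   Total
Much better              75             541     616
Somewhat better          71             498     569
About the same           96             779     875
Somewhat worse           50             282     332
Much worse               19              65      84
Total                   311            2165    2476

• For example, the expected count of Canadians who feel "Much better" (row 1, column 1) is
$$\frac{616 \times 311}{2476} \approx 77.37$$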
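The expected-count formula can be applied to every cell at once. This Python sketch assumes the table is a list of rows (quality-of-life categories) with one column per country; `OBSERVED` and `expected_counts` are illustrative names.

```python
# Observed counts: rows are quality-of-life categories
# (Much better ... Much worse); columns are Canada, United States.
OBSERVED = [
    [75, 541],
    [71, 498],
    [96, 779],
    [50, 282],
    [19, 65],
]


def expected_counts(observed):
    """Expected count for every cell if H0 (no relationship) is true:
    row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


for row in expected_counts(OBSERVED):
    print([round(e, 2) for e in row])
```

The first printed row is about 77.37 (Canada) and 538.63 (United States), matching the hand calculation. The expected counts keep the same row and column totals as the observed table.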

Observed counts:

Quality of life      Canada   United States
Much better              75             541
Somewhat better          71             498
About the same           96             779
Somewhat worse           50             282
Much worse               19              65

Expected counts:

Quality of life      Canada   United States
Much better           77.37          538.63
Somewhat better       71.47          497.53
About the same       109.91          765.09
Somewhat worse        41.70          290.30
Much worse            10.55           73.45

• Compare the two tables to see whether the data support the null hypothesis.

4. Chi-Square Statistic
• To determine whether the differences between the observed and expected counts are statistically significant (that is, whether they show a real relationship between the two categorical variables), we use the chi-square statistic:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
where the sum is over all cells in the table.
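A direct Python translation of the sum, as a sketch assuming the same list-of-rows layout for the table (`chi_square_statistic` is an illustrative name). It recomputes each expected count from the totals and adds up the cell contributions:

```python
def chi_square_statistic(observed):
    """X^2 = sum over all cells of (observed - expected)^2 / expected,
    where expected = row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            x2 += (obs - exp) ** 2 / exp
    return x2


OBSERVED = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
print(round(chi_square_statistic(OBSERVED), 1))  # 11.7
```

The largest single contribution comes from the "Much worse" Canadian cell (19 observed vs. about 10.55 expected).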

Chi-Square Statistic
• The chi-square statistic measures the distance of the observed counts from the expected counts:
  – it is always zero or positive
  – it is zero only when the observed counts are exactly equal to the expected counts
  – large values of $X^2$ are evidence against H0, because they show that the observed counts are far from what we would expect if H0 were true


5. Chi-Square Test
• Calculate the value of the chi-square statistic.
• Find the P-value in order to reject or fail to reject H0:
  – use a chi-square table for the chi-square distributions (next few slides), or
  – read it from computer output.

Chi-Square Distributions
• A family of distributions that take only positive values and are skewed to the right
• A specific chi-square distribution is specified by giving its degrees of freedom (similar to the t distributions)

Chi-Square Test
• The chi-square test for a two-way table with r rows and c columns uses critical values from the chi-square distribution with $(r - 1)(c - 1)$ degrees of freedom.
• The P-value is the area to the right of $X^2$ under the density curve of this chi-square distribution:
  – use the chi-square table
  – $\text{P-value} = P(\chi^2 \geq X^2_{\text{obs}})$
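For readers without a table or statistical software, here is a self-contained Python sketch of the df formula and the right-tail area. It swaps in a standard power series for the regularized incomplete gamma function (a chi-square distribution is a gamma distribution with shape df/2 and scale 2); the function names are illustrative. A library routine such as SciPy's `scipy.stats.chi2.sf` computes the same area.

```python
import math


def degrees_of_freedom(r, c):
    """df for a two-way table with r rows and c columns."""
    return (r - 1) * (c - 1)


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square(df) density curve.

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2) and returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:  # stop once new terms are negligible
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


df = degrees_of_freedom(5, 2)
print(df, round(chi2_upper_tail(11.725, df), 4))  # 4 0.0195
```

For the 5 × 2 Canada/U.S. table this gives df = 4 and a right-tail area of about 0.0195 beyond $X^2 = 11.725$.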

Table E: Chi-Square Table
• See page 660 in the text for Table E ("Chi-Square Table").
• Using Table E works exactly like using the t table (Table C, page 655), as discussed in Chapter 16.
• For the given degrees of freedom (df) in the left margin of Table E, locate the $X^2$ critical value ($x^*$) in the body of the table; the probability (p) of lying to the right of this value is in the top margin. Finding where the observed $X^2$ falls among these critical values gives the P-value for a chi-square test.
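The table lookup can be mimicked in Python. The sketch below stores a few columns of the df = 4 row of a standard chi-square table (critical values rounded to two decimals) and reports which two tail probabilities bracket the P-value. The row layout and the function name `bracket_p_value` are illustrative assumptions, not a copy of Table E.

```python
# (upper-tail probability p, critical value x*) for df = 4,
# ordered by increasing critical value.
DF4_ROW = [
    (0.25, 5.39), (0.20, 5.99), (0.10, 7.78), (0.05, 9.49),
    (0.025, 11.14), (0.02, 11.67), (0.01, 13.28), (0.005, 14.86),
    (0.001, 18.47),
]


def bracket_p_value(x2, row):
    """Return (low, high) such that low < P-value <= high.

    high is None when x2 is below every critical value (P-value > low);
    low is None when x2 exceeds every critical value (P-value <= high).
    """
    high = None
    for p, critical in row:
        if x2 >= critical:
            high = p        # x2 is beyond x*, so P-value <= p
        else:
            return p, high  # x2 is below x*, so P-value > p
    return None, high


print(bracket_p_value(11.725, DF4_ROW))  # (0.01, 0.02)
```

As on the slide, an observed $X^2$ between two critical values puts the P-value between their two tail probabilities.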

Case Study: Health Care in Canada and the U.S.
• Using the observed counts from Example 20.1: $X^2 = 11.725$ and $\text{df} = (r - 1)(c - 1) = (5 - 1)(2 - 1) = 4$.
• Look in the df = 4 row of Table E: the value $X^2 = 11.725$ falls between the critical values for p = 0.02 (11.67) and p = 0.01 (13.28). Thus the P-value for this chi-square test is between 0.01 and 0.02 (it is actually about 0.0195).
** P-value < 0.05, so we conclude that there is a significant relationship between country and quality of life **

6. Uses of the Chi-Square Test
• Tests the null hypothesis H0: no relationship between two categorical variables, when you have a two-way table from either of these situations:
  – Independent SRSs from each of several populations, with each individual classified according to one categorical variable. [Example: Health Care case study: two samples (Canadians and Americans); each individual classified according to "Quality of life".]
  – A single SRS, with each individual classified according to both of two categorical variables. [Example: a sample of 8235 subjects, each classified according to "Job Grade" (1, 2, 3, or 4) and "Marital Status" (Single, Married, Divorced, or Widowed).]

Chi-Square Test: Requirements
• The chi-square test is an approximate method; it becomes more accurate as the counts in the cells of the table get larger.
• The following must be satisfied for the approximation to be accurate:
  – No more than 20% of the expected counts are less than 5.
  – All individual expected counts are 1 or greater.
  – In particular, all four expected counts in a 2 × 2 table should be 5 or greater.
• If these requirements fail, two or more groups must be combined to form a new, smaller two-way table.
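The rule of thumb can be written as a short Python check, assuming the table is a list of rows of counts; `approximation_ok` is a hypothetical helper name. It computes the expected counts and then applies the conditions listed above:

```python
def expected_counts(observed):
    """row total * column total / table total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def approximation_ok(observed):
    """Rule-of-thumb check on expected counts before a chi-square test."""
    cells = [e for row in expected_counts(observed) for e in row]
    if len(observed) == 2 and len(observed[0]) == 2:
        # 2 x 2 table: all four expected counts should be 5 or more.
        return all(e >= 5 for e in cells)
    # Otherwise: at most 20% of expected counts below 5, none below 1.
    few_small = sum(e < 5 for e in cells) <= 0.2 * len(cells)
    return few_small and min(cells) >= 1


OBSERVED = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
print(approximation_ok(OBSERVED))          # True
print(approximation_ok([[2, 8], [3, 7]]))  # False
```

The Canada/U.S. table passes easily (its smallest expected count is about 10.55), while the small 2 × 2 example fails because two expected counts are 2.5.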

Summary: Steps to Do a Chi-Square Test
1. Find the row totals, column totals, and grand total.
2. Find the expected count for each cell.
3. Find the test statistic $X^2$ and its degrees of freedom, $\text{df} = (r - 1)(c - 1)$.
4. Use Table E to find the P-value: $\text{P-value} = P(\chi^2 \geq X^2_{\text{obs}})$.
5. Compare the P-value with the significance level and draw a conclusion.
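Putting the five steps together, here is a self-contained Python sketch (function names are illustrative). Step 4 replaces the Table E lookup with a direct power-series computation of the right-tail area, an assumption made so the example needs no external library:

```python
import math


def upper_tail(x, df):
    """Area to the right of x under the chi-square(df) curve (series method)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:
        term *= z / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi_square_test(observed, alpha=0.05):
    # Step 1: row totals, column totals, grand total.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Step 2: expected count for each cell.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Step 3: test statistic and degrees of freedom.
    x2 = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    # Step 4: P-value = area to the right of x2.
    p_value = upper_tail(x2, df)
    # Step 5: compare with the significance level.
    return x2, df, p_value, p_value <= alpha


OBSERVED = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
x2, df, p, reject = chi_square_test(OBSERVED)
print(f"X^2 = {x2:.1f}, df = {df}, P = {p:.4f}, reject H0: {reject}")
```

For the Canada/U.S. table this gives $X^2 \approx 11.7$ with df = 4 and P ≈ 0.0195, so H0 is rejected at the 0.05 level, in line with the case study.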

Example 20.7 & 20.8: Marital Status and Job Level
Two-way table: Job Grade (rows) by Marital Status (columns: Single, Married, Divorced, Widowed)
• Do these data show a statistically significant relationship between marital status and job grade?