Two-Way Tables and The Chi-Square Test*


CHAPTER 24: Two-Way Tables and the Chi-Square Test*
Statistics: Concepts and Controversies, Eighth Edition
David S. Moore and William I. Notz
Lecture Presentation

Chapter 24 Concepts
Two-Way Tables
Inference for a Two-Way Table
The Chi-Square Test
Using the Chi-Square Test
Simpson’s Paradox

Two-Way Tables

Two-Way Tables
To display relationships between two categorical variables, use a two-way table like the table of admission status and gender of applicants. Admission status is the row variable because each row in the table describes one of the possible admission decisions for an applicant. Gender is the column variable because each column describes one gender. The entries in the table are the counts of applicants in each admission status–by–gender class.
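A two-way table is a grid of counts indexed by the levels of the two variables. The Python sketch below is our own illustration, not taken from the text. It tallies individual records into such a grid. The six applicant records and the helper name `two_way_table` are hypothetical.

```python
from collections import Counter

# Hypothetical applicant records (admission status, gender), invented for illustration.
records = [
    ("Admitted", "Male"), ("Not admitted", "Female"),
    ("Admitted", "Female"), ("Not admitted", "Male"),
    ("Admitted", "Male"), ("Not admitted", "Female"),
]

def two_way_table(pairs, row_levels, col_levels):
    """Count the records that fall in each (row level, column level) cell."""
    counts = Counter(pairs)  # missing cells count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

rows = ["Admitted", "Not admitted"]   # row variable: admission status
cols = ["Male", "Female"]             # column variable: gender
table = two_way_table(records, rows, cols)

# Print the table with row and column labels.
print(f"{'':<14}" + "".join(f"{c:>8}" for c in cols))
for r, counts in zip(rows, table):
    print(f"{r:<14}" + "".join(f"{n:>8}" for n in counts))
```

Each record adds 1 to exactly one cell, so the cell counts always sum to the number of records.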

Two-Way Tables
To describe relationships among categorical variables, calculate appropriate percentages from the counts given, such as the percentage of applicants of each gender who were admitted.
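As a sketch of the percentage calculation, the Python below computes column percentages for the admissions table. The counts do not appear on this slide. They are assumed here by combining the program-level figures from the chapter's Simpson’s paradox example:

- English: 20 male and 40 female applicants, one-fourth of each admitted.
- Engineering: 60 male and 20 female applicants, half of each admitted.

The helper `column_percentages` is a hypothetical name.

```python
# Rows = admission status, columns = gender.
# ASSUMED counts, combined from the program-level figures described above.
rows = ["Admitted", "Not admitted"]
cols = ["Male", "Female"]
table = [
    [35, 20],  # admitted: 5 + 30 male, 10 + 10 female
    [45, 40],  # not admitted
]

def column_percentages(table):
    """Percent of each column's total that falls in each row."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * n / total for n, total in zip(row, col_totals)] for row in table]

pct = column_percentages(table)
for r, row in zip(rows, pct):
    print(r, [f"{p:.1f}%" for p in row])
```

Column percentages are the natural choice here because gender is the explanatory variable. Comparing the percentage admitted within each gender (about 44% of males versus about 33% of females) is what reveals the apparent association.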

Inference for a Two-Way Table
We often gather data and arrange them in a two-way table to see if two categorical variables are related to each other. The sample data are easy to investigate: turn them into percentages and look for an association between the row and column variables. Is the association in the sample evidence of an association between these variables in the entire population? Or could the sample association easily arise just from the luck of random sampling? This is a question for a significance test.

Inference for a Two-Way Table
Cocaine addicts need the drug to feel pleasure. Perhaps giving them a medication that fights depression will help them resist cocaine. A three-year study compared an antidepressant called desipramine, lithium (a standard treatment for cocaine addiction), and a placebo. The subjects were 72 chronic users of cocaine who wanted to break their drug habit. Equal numbers of subjects (24 per treatment) were randomly assigned to the three treatments. Here are the counts and percentages of the subjects who succeeded in not using cocaine during the study:

Inference for a Two-Way Table
The sample proportions of subjects who did not use cocaine are quite different. In particular, the percentage of subjects in the desipramine group who did not use cocaine was much higher than in the lithium or placebo groups. Are these data good evidence that there is a relationship between treatment and outcome in the population of all cocaine addicts?

Inference for a Two-Way Table
H0: In the population of all cocaine addicts, there is no association between the treatment an addict receives and whether the addict succeeds in not using cocaine.
To test H0, we compare the observed counts in a two-way table with the expected counts. These are the counts we would expect, apart from random variation, if H0 were true. If the observed counts are far from the expected counts, that is evidence against H0.
Expected Counts
The expected count in any cell of a two-way table when H0 is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
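The expected-count rule takes only a few lines of Python. The slide's table of counts is not reproduced in this transcript, so the observed counts below are assumed for illustration only. They respect the stated 24 subjects per treatment; replace them with the real table. `expected_counts` is a hypothetical helper name.

```python
def expected_counts(table):
    """Expected cell counts under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# ASSUMED illustrative counts, 24 subjects per treatment.
# Columns: desipramine, lithium, placebo.
observed = [
    [14, 6, 4],    # did not use cocaine during the study
    [10, 18, 20],  # used cocaine
]

for row in expected_counts(observed):
    print(row)
```

Expected counts depend only on the row and column totals, so in general they need not be whole numbers.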

The Chi-Square Test
To see whether the data give evidence against the null hypothesis of “no relationship,” compare the counts in the two-way table with the counts we would expect if there really were no relationship. The significance test uses a statistic that measures how far apart the observed and expected counts are.
Chi-Square Statistic
The chi-square statistic, denoted X², is a measure of how far the observed counts in a two-way table are from the expected counts. The formula for the statistic is

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The sum runs over all cells of the table.
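A minimal Python sketch of the chi-square statistic follows. It computes each cell's expected count from the margins and adds up (observed − expected)² / expected. The observed counts are assumed for illustration (24 subjects per treatment, as in the cocaine study) because the slide's table is not reproduced here. The function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed):
    """X^2 = sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count for cell (i, j)
            x2 += (obs - expected) ** 2 / expected
    return x2

# ASSUMED illustrative counts; columns: desipramine, lithium, placebo.
observed = [[14, 6, 4], [10, 18, 20]]
print(chi_square_statistic(observed))
```

Each cell's term is small when its observed count is close to its expected count, so a large X² signals a poor fit to "no relationship."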

The Chi-Square Test
The Chi-Square Distributions
When the null hypothesis of no association is true, the sampling distribution of the chi-square statistic X² is approximately a chi-square distribution. The chi-square distributions are a family of distributions that take only nonnegative values and are skewed to the right. A particular chi-square distribution is specified by its degrees of freedom. The chi-square test for a two-way table with r rows and c columns uses critical values from the chi-square distribution with (r − 1)(c − 1) degrees of freedom. Large values of X² are evidence against H0, so the P-value is the area to the right of the observed X² under this distribution.
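Python's standard library has no chi-square distribution. This sketch therefore computes the upper-tail area (the P-value) from a recurrence for the regularized incomplete gamma function:

Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1)

The recurrence starts from the closed forms for 1 or 2 degrees of freedom. In practice a statistics library would normally be used instead, for example SciPy's `scipy.stats.chi2.sf`. The value X² = 10.5 comes from counts assumed for illustration in the cocaine study (14, 6, and 4 successes out of 24 in the three groups), not from the slide. The helper names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x), integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    # Closed form for df = 1 (odd df) or df = 2 (even df) ...
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    # ... then step up two degrees of freedom at a time.
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c two-way table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

x2 = 10.5                        # statistic from ASSUMED illustrative counts
df = degrees_of_freedom(2, 3)    # 2 outcomes x 3 treatments
p = chi2_sf(x2, df)
print(df, round(p, 4))
```

With 2 degrees of freedom the P-value is about 0.005, which would be significant at the 0.01 level. The real table may give a different value.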

Using the Chi-Square Test
Like our test for a population proportion, the chi-square test uses some approximations that become more accurate as we take more observations. Here is a rough rule for when it is safe to use this test.
Cell Counts Required for the Chi-Square Test
You can safely use the chi-square test when no more than 20% of the expected counts are less than 5, and all individual expected counts are 1 or greater.
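The rough rule above can be written as a small Python check. The function name `chi_square_safe` is hypothetical. The first table of expected counts corresponds to counts assumed for illustration in the cocaine study. The second table is invented so that half its cells fall below 5.

```python
def chi_square_safe(expected):
    """Rule of thumb: at most 20% of expected counts below 5, and all at least 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1

print(chi_square_safe([[8, 8, 8], [16, 16, 16]]))   # expected counts from ASSUMED cocaine data
print(chi_square_safe([[4.2, 10.5], [2.8, 7.0]]))   # hypothetical table: 2 of 4 cells below 5
```

The check is applied to the expected counts, not the observed ones. A cell can have a small observed count without violating the rule.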

Using the Chi-Square Test
There is a clear trend: as the anger score increases, so does the percentage of people who develop coronary heart disease (CHD). Is this relationship between anger and heart disease statistically significant?
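The anger table itself is not reproduced in this transcript. As a hedged sketch, the Python below runs the full test on counts assumed for illustration: 8,474 people split into low, moderate, and high anger groups. These numbers are our assumption, not taken from the slide, so substitute the actual table before relying on the result. Because the table is 2 × 3, df = 2, and the upper-tail probability has the closed form exp(−X²/2).

```python
import math

# ASSUMED illustrative counts; columns: low, moderate, high anger score.
observed = [
    [53, 110, 27],      # developed CHD
    [3057, 4621, 606],  # did not develop CHD
]

def chi_square_statistic(observed):
    """X^2 = sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (obs - expected) ** 2 / expected
    return x2

# Percentage with CHD in each anger group (the trend described in the text).
col_totals = [sum(col) for col in zip(*observed)]
pct_chd = [100 * chd / total for chd, total in zip(observed[0], col_totals)]

x2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2
p = math.exp(-x2 / 2)  # chi-square upper-tail probability when df = 2
print([round(v, 1) for v in pct_chd], round(x2, 2), df, p)
```

With these assumed counts the sketch gives X² ≈ 16.1 and P ≈ 0.0003. That is in line with the unadjusted P-value quoted in the discussion of the study.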

Using the Chi-Square Test
Can we conclude that proneness to anger causes heart disease? This is an observational study, not an experiment. It isn’t surprising to find that some lurking variables are confounded with anger. The study report used advanced statistics to adjust for many differences among the three anger groups. The adjustments raised the P-value from P = 0.0003 to P = 0.02 because the lurking variables explain some of the heart disease. This is still good evidence for a relationship if a significance level of 0.05 is used. Because the study started with a random sample of people who had no CHD and followed them forward in time, and because many lurking variables were measured and accounted for, it does give some evidence for causation.

Simpson’s Paradox
As is the case with quantitative variables, the effects of lurking variables can change or even reverse relationships between two categorical variables.

Simpson’s Paradox
Almost half of the males but only one-third of the females who applied were admitted. Isn’t this proof of discrimination against women?

Simpson’s Paradox
In its defense, the university produces a three-way table that classifies applicants by gender, admission decision, and the program to which they applied. We now see that engineering admitted exactly half of all applicants, both male and female, and that English admitted one-fourth of both males and females. There is no association between gender and admission decision in either program.

Simpson’s Paradox
How can no association in either program produce strong association when the two are combined? Look at the data: English is hard to get into, and mainly females apply to that program. Electrical engineering is easier to get into and attracts mainly male applicants. English had 40 female and 20 male applicants, while engineering had 60 male and only 20 female applicants. The original two-way table, which did not take account of the difference between programs, was misleading.
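The reversal can be checked directly from the figures given here. English had 20 male and 40 female applicants and admitted one-fourth of each. Engineering had 60 male and 20 female applicants and admitted half of each. The admitted counts in this Python sketch are derived from those stated fractions, and exact fractions keep the rates free of rounding. The dictionary layout and helper name are our own illustration.

```python
from fractions import Fraction

# (applicants, admitted) by program and gender, derived from the stated figures.
programs = {
    "English":     {"Male": (20, 5),  "Female": (40, 10)},  # one-fourth admitted
    "Engineering": {"Male": (60, 30), "Female": (20, 10)},  # half admitted
}

def admission_rate(applied, admitted):
    """Fraction of applicants who were admitted."""
    return Fraction(admitted, applied)

# Within each program, males and females are admitted at the same rate.
for name, by_gender in programs.items():
    rates = {g: admission_rate(*counts) for g, counts in by_gender.items()}
    print(name, {g: str(r) for g, r in rates.items()})

# Combining the programs ignores the lurking variable "program" and creates an association.
combined = {}
for by_gender in programs.values():
    for gender, (applied, admitted) in by_gender.items():
        prev_applied, prev_admitted = combined.get(gender, (0, 0))
        combined[gender] = (prev_applied + applied, prev_admitted + admitted)
combined_rates = {g: admission_rate(*c) for g, c in combined.items()}
print("Combined", {g: f"{float(r):.1%}" for g, r in combined_rates.items()})
```

The combined rates come out to 7/16 (about 44%) for males and 1/3 for females. This matches "almost half" and "one-third," even though the rates are equal within each program.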

Simpson’s Paradox
An association or comparison that holds for all of several groups can disappear or even reverse direction when the data are combined to form a single group. This situation is called Simpson’s paradox. Simpson’s paradox is just an extreme form of the fact that observed associations can be misleading when there are lurking variables. Remember the caution from Chapter 15: beware the lurking variable.
