Lecture 9 Chapter 22. Tests for two-way tables


Objectives (PSLS Chapter 22)
The chi-square test for two-way tables (Award: NHST Test for Independence) [B level]
- Two-way tables
- Hypotheses for the chi-square test for two-way tables
- Expected counts in a two-way table
- Conditions for the chi-square test
- Chi-square test for two-way tables
- Simpson’s paradox

Two-way tables
An experiment has a two-way factorial design if two categorical factors are studied with several levels of each factor. Two-way tables organize data about two categorical variables, each with any number of levels/treatments, obtained from a factorial design or a two-way observational study.
Example: High school students were asked whether they smoke, and whether their parents smoke.
- First factor: parent smoking status
- Second factor: student smoking status

Marginal distribution
The marginal distributions (in the “margins” of the table) summarize each factor independently.
Marginal distribution for parental smoking:
- P(both parents) = 1780/5375 = 33.1%
- P(one parent) = 2239/5375 = 41.7%
- P(neither parent) = 1356/5375 = 25.2%

Conditional distribution
The cells of the two-way table represent the intersection of a given level of one factor with a given level of the other factor. Dividing each cell count by its row total gives the conditional distributions.
Conditional distribution of student smoking for different parental smoking statuses:
- P(student smokes | both parents) = 400/1780 = 22.5%
- P(student smokes | one parent) = 416/2239 = 18.6%
- P(student smokes | neither parent) = 188/1356 = 13.9%
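The marginal and conditional percentages above can be checked with a few lines of Python. This is a minimal sketch that uses only the counts quoted on the slides; the dictionary names are illustrative, not part of the lecture.

```python
# Counts quoted on the slides: students who smoke, and all students,
# for each parental smoking status.
smokers = {"both parents": 400, "one parent": 416, "neither parent": 188}
row_totals = {"both parents": 1780, "one parent": 2239, "neither parent": 1356}

grand_total = sum(row_totals.values())  # all students in the sample

# Marginal distribution of parental smoking: row total / grand total.
marginal = {k: n / grand_total for k, n in row_totals.items()}

# Conditional distribution of student smoking given parental status:
# cell count / row total.
conditional = {k: smokers[k] / row_totals[k] for k in row_totals}

for k in row_totals:
    print(f"{k:15s} marginal = {marginal[k]:.1%}   "
          f"P(student smokes | {k}) = {conditional[k]:.1%}")
```

The marginal shares sum to 100%, while the conditional percentages do not, because each is computed within a different row.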

Hypotheses
A two-way table has r rows and c columns.
Statistical hypotheses:
- H0: There is no association between the row and column variables in the table.
- Ha: There is an association/relationship between the two variables.
We will compare actual counts from the sample data with the counts we would expect if the null hypothesis of no relationship were true.

Cocaine addiction
Cocaine produces short-term feelings of physical and mental well-being. To maintain the effect, the drug may have to be taken more frequently and at higher doses. After stopping use, users will feel tired, sleepy and depressed.
A study compares the rates of successful rehabilitation for cocaine addicts following 1 of 3 treatment options:
1: antidepressant treatment (desipramine)
2: standard treatment (lithium)
3: placebo (“sugar pill”)

Cocaine addiction
What should the expected cell counts be if relapse is independent of the treatment, in other words, if there is no association between treatment and relapse?

Expected counts in a two-way table
A two-way table has r rows and c columns. H0 states that there is no association between the row and column variables (factors) in the table.
The expected count in any cell of a two-way table when H0 is true is:
expected count = (row total × column total) / table total
The expected count is the average count you would get for that cell if the null hypothesis were true.
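The expected-count rule (row total × column total / table total) can be sketched as a small function. The function name `expected_counts` is an assumption made for illustration. The smoking table is rebuilt from the totals quoted earlier in the lecture, with non-smoker counts obtained by subtraction.

```python
def expected_counts(observed):
    """Expected counts under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Single-cell arithmetic of the form used in the cocaine example: 25 * 26 / 74.
single_cell = 25 * 26 / 74
print(round(single_cell, 2))

# Smoking table. Rows: both / one / neither parent smokes.
# Columns: student smokes, student does not smoke (row total minus smokers).
observed = [[400, 1380], [416, 1823], [188, 1168]]
expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The expected counts in each row add up to the observed row total, so the margins of the table are preserved under H0.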

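Expected relapse counts, by treatment (desipramine, lithium, placebo) and relapse (no, yes); the table and the observed % versus expected % comparison are not preserved. Each expected count is a treatment group's size multiplied by the overall proportion for that outcome (0.35 or 0.65), for example 25 × 26 / 74 ≈ 8.78.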

Situations appropriate for the chi-square test
The chi-square test for two-way tables looks for evidence of association between two categorical variables (factors) in sample data. Use it when looking for associations between two categorical/nominal variables. The samples can be drawn either:
- By randomly selecting SRSs from different populations (or from a population subjected to different treatments)
  - girls vaccinated for HPV or not, among 8th graders and 12th graders
  - remission or no remission for different treatments
- Or by taking 1 SRS and classifying the individuals according to 2 categorical variables (factors)
  - 11th graders’ smoking status and parents’ status

We can safely use the chi-square test when:
- no more than 20% of expected counts are less than 5 (< 5)
- all individual expected counts are 1 or more (≥ 1)
What goes wrong? With small expected cell counts, the sampling distribution of the test statistic is not well approximated by the chi-square distribution.
Statistician’s note: If one factor has many levels and too many expected counts are too low, you might be able to “collapse” some of the levels (regroup them) and thus have large-enough expected counts.
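The two conditions can be written as a quick check on a table of expected counts. This is a sketch only: the function name and the two example tables are hypothetical and do not come from the lecture data.

```python
def chi_square_conditions_ok(expected):
    """True if at most 20% of expected counts are below 5 and all are >= 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1

# Hypothetical expected-count tables, for illustration only.
ok_table = [[10.0, 12.0], [8.0, 20.0]]      # every expected count is at least 5
sparse_table = [[0.5, 12.0], [8.0, 20.0]]   # one expected count is below 1

print(chi_square_conditions_ok(ok_table))
print(chi_square_conditions_ok(sparse_table))
```

When the check fails, collapsing levels as described in the statistician's note and recomputing the expected counts is one way to proceed.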

The chi-square test for two-way tables
H0: there is no association between the row and column variables
Ha: H0 is not true
The χ² statistic sums over all r × c cells in the table:
χ² = Σ (observed count − expected count)² / expected count
When H0 is true, the χ² statistic follows approximately a χ² distribution with (r − 1)(c − 1) degrees of freedom.
P-value: P(χ² variable ≥ calculated χ² | H0 is true)

Table D
Example: df = 6. If χ² = 15.9, the P-value is between 0.01 and 0.02.
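The Table D lookup can be cross-checked numerically. For an even number of degrees of freedom, the upper-tail probability of the chi-square distribution has a closed form, so the sketch below needs only the standard library. The function name is an assumption for illustration, and the formula deliberately does not cover odd df.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    # exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)

# The Table D example: df = 6 and chi-square = 15.9.
p = chi2_upper_tail_even_df(15.9, 6)
print(round(p, 4))
```

The result falls between 0.01 and 0.02, in agreement with the bounds read from Table D.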

Table of counts: “actual/expected,” for desipramine, lithium, and placebo by no relapse and relapse (values not preserved). With three rows and two columns: df = (3 − 1)(2 − 1) = 2.
We compute the X² statistic and look it up in Table D (the computed value and the Table D bounds are not preserved). The P-value is very small (JMP gives P = …) and we reject H0.
There is a significant relationship between treatment type (desipramine, lithium, placebo) and outcome (relapse or not).

Interpreting the χ² output
When the χ² test is statistically significant: The largest components indicate which condition(s) are most different from H0. You can also compare the observed and expected counts, or compare the computed proportions in a graph.
The largest X² component, 4.41, is for desipramine/no relapse. Desipramine has the highest success rate (see graph).
(Table of χ² components for desipramine, lithium, and placebo by no relapse and relapse: values not preserved.)

Influence of parental smoking
Here is a computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students’ smoking habits). What does it tell you?
- Is the sample size sufficient?
- What are the hypotheses?
- Are the data ok for a χ² test?
- What else should you ask?
- What is your interpretation?
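The computer output itself is not reproduced here, but the test can be re-run by hand. The sketch below rebuilds the table from the counts quoted earlier in the lecture (non-smoker counts are row totals minus smokers) and computes the expected counts, the χ² components, the statistic, and the P-value. It relies on the fact that for df = 2 the chi-square upper tail is exactly exp(−x/2); variable names are illustrative.

```python
import math

# Rows: both parents smoke, one parent smokes, neither parent smokes.
# Columns: student smokes, student does not smoke.
observed = [[400, 1380], [416, 1823], [188, 1168]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under H0 and the per-cell chi-square components.
expected = [[r * c / n for c in col_totals] for r in row_totals]
components = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi2 = sum(sum(row) for row in components)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the chi-square upper tail has the exact form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.2e}")
for label, row in zip(["both", "one", "neither"], components):
    print(label, [round(c, 2) for c in row])
```

All expected counts are far above 5, so the conditions for the test are met. The largest components come from the smoking column for students with two smoking parents and with no smoking parents, which is where the observed counts depart most from H0.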

Caution with categorical data
An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's paradox.
Kidney stones
A study compared the success rates of two different procedures for removing kidney stones: open surgery and percutaneous nephrolithotomy (PCNL), a minimally invasive technique. (Overall failure rates were shown in a figure; only the value 17% is preserved.)
It turns out that, for any given patient, PCNL is more likely to result in failure. Can you think of a reason why?

The procedures are not chosen randomly by surgeons! In fact, the minimally invasive procedure is most likely used for smaller stones (with a good chance of success), whereas open surgery is likely used for more problematic conditions.
For both small stones and large stones, open surgery has a lower failure rate. This is Simpson’s paradox. The more challenging cases with large stones tend to be treated more often with open surgery, making it appear as if the procedure were less reliable overall.
Beware of lurking variables!
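The reversal is easier to see with numbers. The slide's own table is not preserved, so the counts below are assumed purely for illustration; they were chosen so that one overall failure rate comes out near the 17% shown on the slide. The sketch compares failure rates within each stone size and then after pooling.

```python
# (failures, patients) per procedure and stone size.
# ASSUMED counts for illustration only; they are not taken from the slides.
counts = {
    "open surgery": {"small stones": (6, 87), "large stones": (71, 263)},
    "PCNL": {"small stones": (36, 270), "large stones": (25, 80)},
}

# Failure rate within each stone-size group.
by_size = {
    proc: {size: f / p for size, (f, p) in groups.items()}
    for proc, groups in counts.items()
}

# Failure rate after pooling both stone sizes.
overall = {
    proc: sum(f for f, _ in groups.values()) / sum(p for _, p in groups.values())
    for proc, groups in counts.items()
}

for proc in counts:
    sizes = ", ".join(f"{s}: {r:.1%}" for s, r in by_size[proc].items())
    print(f"{proc:13s} {sizes}, overall: {overall[proc]:.1%}")
```

With these assumed counts, open surgery fails less often within each stone size, yet more often overall, because most of its patients are large-stone cases. Stone size plays the role of the lurking variable.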
