Statistical analyses for two-way contingency tables. HRP 261, January 10, 2005. Read Chapter 2 of Agresti.

Overview of statistical tests for two-way contingency tables

Table size: 2x2
  Tests/measures of association: risk ratio, odds ratio, chi-square, difference in proportions, Fisher's exact test (use when any cell count is less than 5)

Table size: RxC
  Tests/measures of association: chi-square, Fisher's exact test (use when any cell count is less than 5)

Review: The Risk Ratio (RR)

                    Exposure (E)    No Exposure (~E)
Disease (D)              a                 b
No Disease (~D)          c                 d
Total                   a+c               b+d

RR = [a/(a+c)] / [b/(b+d)] = risk in the exposed / risk in the unexposed

Review: The Odds Ratio (OR)

                    Exposure (E)        No Exposure (~E)
Disease (D)         a = P(D & E)        b = P(D & ~E)
No Disease (~D)     c = P(~D & E)       d = P(~D & ~E)

OR = odds of disease in the exposed / odds of disease in the unexposed = (a/c) / (b/d) = ad/bc

Practice Problem: Suppose the following data were collected on a random (cross-sectional) sample of teenagers. Calculate the odds ratio and risk ratio for the association between exposure to smoking in movies and smoking behavior.

                                                   Ever tried smoking    Never smoked
Exposed to high occurrence of smoking in films            100                 300
Lower exposure to smoking in films                         30                 370

Answer

                                                   Ever tried smoking    Never smoked
Exposed to high occurrence of smoking in films            100                 300
Lower exposure to smoking in films                         30                 370

OR = (100*370)/(30*300) = 4.11
RR = (100/400 = .25)/(30/400 = .075) = 3.33
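
[Supplementary sketch, not from the original slides] A minimal SAS program that reproduces these estimates with PROC FREQ; the dataset and variable names (movies, HighExposure, EverSmoked, Count) are invented for illustration. The RELRISK option prints the odds ratio and the cohort (column 1) relative risk.

data movies;
   input HighExposure EverSmoked Count;
   datalines;
1 1 100
1 0 300
0 1 30
0 0 370
;
run;

proc freq data=movies order=data;
   * ORDER=DATA keeps the exposed row and the ever-smoked column listed first;
   tables HighExposure*EverSmoked / relrisk;
   weight Count;
run;

The odds ratio in the output should be close to the hand calculation above (4.11), and the column 1 relative risk close to 3.33.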

Null distribution of a difference in proportions

Standard error of a proportion: sqrt[ p(1-p)/n ]

Standard error of the difference of two proportions: sqrt[ p1(1-p1)/n1 + p2(1-p2)/n2 ]
(The variance of a difference is the sum of the variances, for independent samples.)

Under the null hypothesis (p1 = p2 = p), the standard error can be estimated by sqrt[ p(1-p)(1/n1 + 1/n2) ], where p is simply the overall proportion of positive outcomes in the total sample. The difference is still (approximately) normally distributed.

Null distribution of a difference in proportions: under the null, the difference of proportions is approximately normal, centered at 0, with the standard error above. [Figure: normal curve for the difference of proportions, centered at 0.]

Difference of proportions: Example. An August 2003 research article in Developmental and Behavioral Pediatrics reported the following about a sample of UK kids: when given a choice of a non-branded chocolate cereal vs. CoCo Pops, 36 of 37 girls (97%) and 27 of 38 boys (71%) preferred the CoCo Pops. Is this evidence that girls are more likely to choose brand-named products?

Answer
1. Hypotheses: H0: p♂ - p♀ = 0; Ha: p♂ - p♀ ≠ 0 [two-sided]
2. Null distribution of the difference of two proportions: approximately normal with mean 0. The null says the p's are equal, so estimate the standard error using the overall observed p = (36+27)/(37+38) = 63/75 = .84, giving s.e. = sqrt[ .84*.16*(1/37 + 1/38) ] ≈ .085.
3. Observed difference in our experiment = 27/38 - 36/37 = .71 - .97 = -.26.
4. Calculate the p-value of what you observed: Z = -.26/.085 ≈ -3.1, two-sided p ≈ .002. The p-value is sufficiently low for us to reject the null; there does appear to be a difference in gender preferences here.
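
[Supplementary sketch, not from the original slides] A small SAS version of the same comparison; the dataset and variable names (cereal, girl, cocopops) are invented. The RISKDIFF option gives the difference of proportions with a Wald confidence interval, and CHISQ gives the corresponding chi-square test (the square of the Z statistic above).

data cereal;
   input girl cocopops count;
   datalines;
1 1 36
1 0 1
0 1 27
0 0 11
;
run;

proc freq data=cereal order=data;
   * Girls are row 1 and choosing CoCo Pops is column 1;
   tables girl*cocopops / riskdiff chisq;
   weight count;
run;

The column 1 risk difference should be close to the .26 difference computed by hand above.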

Difference of two proportions applied to the smoking-in-films data:

                                      Ever smoked    Never smoked    Total
High exposure to smoking in films          100            300          400
Low exposure                                30            370          400

p1 = 100/400 = .25; p2 = 30/400 = .075; observed difference = .175
Pooled p = 130/800 = .1625; s.e. = sqrt[ .1625*.8375*(1/400 + 1/400) ] ≈ .026
Z = .175/.026 ≈ 6.7; p < .0001

Corresponding confidence interval: (p1 - p2) ± 1.96*sqrt[ p1(1-p1)/n1 + p2(1-p2)/n2 ]
(For the confidence interval, use the unpooled standard error.)
Here: .175 ± 1.96*sqrt[ .25*.75/400 + .075*.925/400 ] = .175 ± .050 = (.125, .225)
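
[Supplementary sketch, not from the original slides] A SAS data-step version of the confidence-interval arithmetic, using the smoking-in-films counts above.

data _null_;
   p1 = 100/400;  p2 = 30/400;  n1 = 400;  n2 = 400;
   diff = p1 - p2;
   se = sqrt( p1*(1-p1)/n1 + p2*(1-p2)/n2 );  * unpooled standard error for the CI;
   lower = diff - 1.96*se;
   upper = diff + 1.96*se;
   put diff= se= lower= upper=;
run;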

Chi-square test for comparing proportions (of a categorical variable) between groups. Chi-Square Test of Independence: when both your predictor and outcome variables are categorical, they may be cross-classified in a contingency table and compared using a chi-square test of independence. A contingency table with R rows and C columns is an R x C contingency table.

Example Asch, S.E. (1955). Opinions and social pressure. Scientific American, 193,

The Experiment A Subject volunteers to participate in a “visual perception study.” Everyone else in the room is actually a conspirator in the study (unbeknownst to the Subject). The “experimenter” reveals a pair of cards…

The Task Cards: [Figure: one card shows the standard line; a second card shows the comparison lines A, B, and C.]

The Experiment Everyone goes around the room and says which comparison line (A, B, or C) is correct; the true Subject always answers last – after hearing all the others’ answers. The first few times, the 7 “conspirators” give the correct answer. Then, they start purposely giving the (obviously) wrong answer. 75% of Subjects tested went along with the group’s consensus at least once.

Further Results: In a further experiment, the group size (the number of conspirators) was varied. Does the group size alter the proportion of subjects who conform?

The Chi-Square test

Observed counts (conformed? yes/no) by number of group members, over 500 experiments (100 at each of five group sizes). [The observed cell counts appeared as a table on the original slide.] Apparently, conformity was less likely with either fewer or more group members.

In total, 235 subjects conformed out of 500 experiments. Overall likelihood of conforming = 235/500 = .47.

Calculating the expected counts, in general. Null hypothesis: the variables are independent. Recall that under independence, P(A)*P(B) = P(A&B). Therefore, calculate the marginal probability of A and the marginal probability of B, and multiply P(A)*P(B)*N to get the expected cell count. Equivalently, expected count = (row total * column total) / grand total.

Expected frequencies if there were no association between group size and conformity: each group-size column has 100 experiments, so the expected counts are Yes = .47*100 = 47 and No = .53*100 = 53 in every column.

Do the observed and expected counts differ by more than chance alone would produce?

Chi-Square test: X2 = sum over all cells of (observed - expected)^2 / expected. Degrees of freedom = (rows-1)*(columns-1) = (2-1)*(5-1) = 4.

The chi-square distribution is the distribution of a sum of squared standard normal deviates. The expected value and variance of a chi-square random variable: E(X) = df, Var(X) = 2*df.

Chi-Square test: here X2 ≈ 85 on (rows-1)*(columns-1) = (2-1)*(5-1) = 4 degrees of freedom. Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom, this indicates statistical significance. Here 85 >> 4.
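
[Supplementary sketch, not on the original slide] A one-step SAS check of the p-value for a chi-square statistic of 85 on 4 degrees of freedom, using the built-in PROBCHI function.

data _null_;
   pvalue = 1 - probchi(85, 4);  * upper-tail area of a chi-square(4) distribution at 85;
   put pvalue=;                  * essentially zero, far below .0001;
run;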

Caveat: when the expected count in any cell is very small (<5), Fisher's exact test is used as an alternative to the chi-square test.

Same data, but use the chi-square test:

                                      Ever smoked    Never smoked    Total
High exposure to smoking in films          100            300          400
Low exposure                                30            370          400

Expected counts (e.g., 400*130/800 = 65 for the first cell) give
X2 = (100-65)^2/65 + (300-335)^2/335 + (30-65)^2/65 + (370-335)^2/335 ≈ 45 on 1 df; p < .0001.
(Note that X2 ≈ Z^2 from the difference-of-proportions test: 6.7^2 ≈ 45.)
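
[Supplementary sketch, not from the original slides] The same chi-square test in PROC FREQ, reusing the hypothetical movies dataset from the odds-ratio sketch earlier. The EXPECTED option prints the expected cell counts alongside the observed ones.

proc freq data=movies order=data;
   tables HighExposure*EverSmoked / chisq expected;  * Pearson chi-square plus expected cell counts;
   weight Count;
run;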

Fisher’s Exact Test

Fisher's "Tea-tasting experiment" (p. 40 Agresti). Claim: Fisher's colleague (call her "Cathy") claimed that, when drinking tea, she could distinguish whether milk or tea was added to the cup first. To test her claim, Fisher designed an experiment in which she tasted 8 cups of tea (4 cups had milk poured first, 4 had tea poured first). Null hypothesis: Cathy's guessing abilities are no better than chance. Alternative hypotheses: Right-tail: she guesses right more than expected by chance. Left-tail: she guesses wrong more than expected by chance.

Fisher's "Tea-tasting experiment" (p. 40 Agresti). Experimental results:

                         Poured first
                         Milk    Tea
Guessed milk first         3      1      4
Guessed tea first          1      3      4
                           4      4

Fisher's Exact Test, Step 1: Identify the tables that are as extreme as or more extreme than what actually happened. Here she identified 3 of the 4 milk-poured-first cups correctly. Is that good luck or real talent? The only way she could have done better is to identify 4 of 4 correctly.

Observed table (3 correct):               More extreme table (4 correct):

                 Poured first                              Poured first
                 Milk    Tea                               Milk    Tea
Guessed milk       3      1     4          Guessed milk      4      0     4
Guessed tea        1      3     4          Guessed tea       0      4     4
                   4      4                                  4      4

Fisher's Exact Test, Step 2: Calculate the probability of these tables, assuming fixed marginal totals (hypergeometric probabilities):

P(observed table, 3 correct) = [C(4,3)*C(4,1)] / C(8,4) = 16/70 = .229
P(more extreme table, 4 correct) = [C(4,4)*C(4,0)] / C(8,4) = 1/70 = .014

Step 3: To get the left-tail and right-tail p-values, consider the probability mass function of X, where X = the number of correct identifications of the cups with milk poured first:

P(X=0) = 1/70  = .014
P(X=1) = 16/70 = .229
P(X=2) = 36/70 = .514
P(X=3) = 16/70 = .229
P(X=4) = 1/70  = .014

Right-hand tail probability: P(X >= 3) = 17/70 = .243
Left-hand tail probability (testing the null hypothesis that she's systematically wrong): P(X <= 3) = 69/70 = .986
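
[Supplementary sketch, not on the original slide] A quick SAS data step that reproduces this hypergeometric probability mass function with the built-in PDF function (population size 8, 4 milk-first cups in the population, 4 cups guessed as milk-first).

data _null_;
   do x = 0 to 4;
      p = pdf('HYPER', x, 8, 4, 4);  * P(X = x) for a hypergeometric distribution with N=8, R=4, n=4;
      put x= p=;
   end;
run;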

SAS code and output for generating Fisher's exact statistics for the 2x2 table:

            Milk    Tea
Milk          3      1      4
Tea           1      3      4
              4      4

data tea;
   input MilkFirst GuessedMilk Freq;
   datalines;
1 1 3
1 0 1
0 1 1
0 0 3
;
run;

data tea2;  * Fix quirky reversal of SAS 2x2 tables;
   set tea;
   MilkFirst = 1 - MilkFirst;
   GuessedMilk = 1 - GuessedMilk;
run;

proc freq data=tea2;
   tables MilkFirst*GuessedMilk / exact;
   weight Freq;
run;

SAS output

Statistics for Table of MilkFirst by GuessedMilk

Statistic                       DF      Value      Prob
--------------------------------------------------------
Chi-Square                       1     2.0000    0.1573
Likelihood Ratio Chi-Square      1     2.0930    0.1480
Continuity Adj. Chi-Square       1     0.5000    0.4795
Mantel-Haenszel Chi-Square       1     1.7500    0.1859
Phi Coefficient                        0.5000
Contingency Coefficient                0.4472
Cramer's V                             0.5000

WARNING: 100% of the cells have expected counts less than 5.
         Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)         3
Left-sided Pr <= F          0.9857
Right-sided Pr >= F         0.2429
Table Probability (P)       0.2286
Two-sided Pr <= P           0.4857

Sample Size = 8

Sample Size needed for comparing two proportions: Example: I am going to run a case-control study to determine if pancreatic cancer is linked to drinking coffee. If I want 80% power to detect a 10% difference in the proportion of coffee drinkers among cases vs. controls (if coffee drinking and pancreatic cancer are linked, we would expect that a higher proportion of cases would be coffee drinkers than controls), how many cases and controls should I sample? About half the population drinks coffee.

Derivation of a sample size formula:

Variance of a proportion = p(1-p)/n
Variance of a difference of 2 (independent) proportions = p1(1-p1)/n1 + p2(1-p2)/n2
The standard error of the difference of two proportions is: sqrt[ p1(1-p1)/n1 + p2(1-p2)/n2 ]

Derivation of a sample size formula, continued: here, if we assume equal sample sizes (n per group) and that, under the null hypothesis, the proportion of coffee drinkers is .5 in both cases and controls, then s.e.(diff) = sqrt[ .5*.5/n + .5*.5/n ] = sqrt(.5/n).

Sample size: difference of proportions. Null distribution: the difference is 0; d_null = 0. Alternative distribution: the difference is .10; d_alternative = .10. Critical value = 0 + 1.96*(standard error) = 1.96*sqrt(.5/n). If you are 1.96 Z-scores above 0, then you will reject the null. [Figure: null and alternative normal curves with the critical value marked.]

Because it’s difficult to tell people to look up the area to the right of –Z, we make a little fudge here and present the formula without the negative sign and allow people to look up area to the left of +Z.

For 80% power: there is 80% of the area to the left of a Z-score of .84 on a standard normal curve; therefore, there is 80% of the area to the right of -.84. Setting Zβ = .84 and Zα/2 = 1.96 gives n per group = (1.96 + .84)^2 * 2*(.5)*(.5) / (.10)^2 = 392. It would take 392 cases and 392 controls to have 80% power! Total = 784.
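
[Supplementary sketch, not on the original slide] A small SAS data step for this arithmetic, using PROBIT for the normal quantiles.

data _null_;
   z_alpha = probit(0.975);  * 1.96 for a two-sided alpha of .05;
   z_beta  = probit(0.80);   * .84 for 80% power;
   p = 0.5;  d = 0.10;
   n_per_group = (z_alpha + z_beta)**2 * 2*p*(1-p) / d**2;
   put n_per_group=;         * about 392 per group;
run;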

Question 2: How many total cases and controls would I have to sample to get 80% power for the same study, if I sample 2 controls for every case? Ask yourself, what changes here?

Different size groups: with n cases and 2n controls, the variance of the difference is .25/n + .25/(2n) = .375/n, so n = (1.96 + .84)^2 * .375 / (.10)^2 = 294. Need: 294 cases and 2*294 = 588 controls, 882 total. Note: you get the best power for the lowest total sample size if you keep both groups equal (882 > 784). You would only want to make the groups unequal if there were an obvious difference in the cost or ease of collecting data on one group; e.g., cases of pancreatic cancer are rare and take time to find.

General sample size formula (per group, equal group sizes):

n per group = (Zα/2 + Zβ)^2 * 2*σ^2 / (difference to detect)^2

where σ^2 is the variance of the outcome and Zα/2 and Zβ are the standard normal quantiles corresponding to the desired type I error and power.

General sample size needs when the outcome is binary:

n per group = (Zα/2 + Zβ)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2

(or, using a common p under the null, 2*p(1-p) in the numerator, as in the coffee example above).
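
[Supplementary sketch, not from the original slides] A PROC POWER version of the coffee example under the assumptions above (proportions .5 vs. .6, two-sided alpha .05, 80% power); PROC POWER's calculation may differ slightly from the hand formula.

proc power;
   twosamplefreq test = pchi             /* Pearson chi-square test of two proportions */
      groupproportions = (0.5 0.6)       /* coffee drinking: controls vs. cases        */
      alpha = 0.05
      power = 0.80
      npergroup = .;                     /* solve for the per-group sample size        */
run;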