Chapter 18 Cross-Tabulated Counts Part A 11/24/2018 November 18 Basic Biostat
Chapter 18, Part A:
  18.1 Types of Samples
  18.2 Naturalistic and Cohort Samples
  18.3 Chi-Square Test of Association
Types of Samples
  I. Naturalistic Samples ≡ simple random sample or complete enumeration of the population
  II. Purposive Cohorts ≡ select a fixed number of individuals in each exposure group
  III. Case-Control ≡ select a fixed number of diseased and non-diseased individuals
Naturalistic (Type I) Sample
  Random sample of study base
  How did we study the relationship between CMV (the exposure) and restenosis (the disease) with a naturalistic sample?
  A population was identified and sampled.
  The sample was classified as CMV+ and CMV−.
  Disease occurrence (restenosis) was studied and compared in the groups.
Purposive Cohorts (Type II sample)
  Fixed numbers in exposure groups
  How would we study CMV and restenosis with a purposive cohort design?
  A population of CMV+ individuals would be identified. From this population, select, say, 38 individuals.
  A population of CMV− individuals would be identified. From this population, select, say, 38 individuals.
  Disease occurrence (restenosis) would be studied and compared among the groups.
Case-Control (Type III sample)
  Set number of cases and non-cases
  How would we study CMV and restenosis with a case-control design?
  A population of patients who experienced restenosis (cases) would be identified. From this population, select, say, 38 individuals.
  A population of patients who did not restenose (controls) would be identified. From this population, select, say, 38 individuals.
  The exposure (CMV) would be studied and compared among the groups.
Naturalistic Sample Illustrative Example
  SRS, N = 585
  Cross-classify education level (categorical exposure) and smoking status (categorical disease)
  Tally the R-by-C table ("cross-tab")

             Smoke?
  Edu.      +     −    Tot
  HS       12    38     50
  JC       18    67     85
  JC+      27    95    122
  UG       32   239    271
  Grad      5    52     57
  Total    94   491    585
Cross-tabulation (cont.)

             Smoke?
  Educ.     +     −    Tot
  HS       12    38     50
  JC       18    67     85
  Some     27    95    122
  UG       32   239    271
  Grad      5    52     57
  Total    94   491    585

  Row margins: the Tot column (row totals)
  Column margins: the Total row (column totals)
  Total: N = 585 (bottom-right cell)
Cross-tabulation of counts
  For uniformity, we will always:
  put the exposure variable in rows
  put the disease variable in columns
Exposure / Disease relationship
  Use conditional proportions to describe relationships between exposure and disease
Conditional Proportions
  Exposure / Disease Relationship
  In naturalistic and cohort samples, use row percents!
  R-by-2 Table

             +     −    Total
  Grp 1     a1    b1    n1
  Grp 2     a2    b2    n2
    ↓
  Grp R     aR    bR    nR
  Total     m1    m2    N
Example: prevalence of smoking by education
  Lower education is associated with higher prevalence of smoking (a negative association between education and smoking)
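The row percents behind this example can be recomputed from the education-by-smoking cross-tab. The following is a minimal Python sketch, assuming the counts are held in a plain dictionary; the names `counts` and `row_proportions` are illustrative and do not come from the slides.

```python
# Counts from the naturalistic sample (N = 585):
# (smokers, non-smokers) for each education level.
counts = {
    "HS": (12, 38),
    "JC": (18, 67),
    "JC+": (27, 95),
    "UG": (32, 239),
    "Grad": (5, 52),
}


def row_proportions(table):
    """Proportion with the outcome in each row: a_i / n_i, where n_i = a_i + b_i."""
    return {group: a / (a + b) for group, (a, b) in table.items()}


prevalence = row_proportions(counts)
for group, p in prevalence.items():
    print(f"{group:<5} {100 * p:5.1f}%")
```

Run as written, this prints prevalences of about 24.0%, 21.2%, 22.1%, 11.8%, and 8.8% for HS through Grad.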
Relative Risks
  Let group 1 represent the least exposed group
Illustration: RRs
  Note trend
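Each relative risk divides a group's proportion by the proportion in group 1. The sketch below uses the education and smoking counts and assumes the high-school row serves as group 1 (the reference); the function name `relative_risks` is hypothetical.

```python
# Counts from the naturalistic sample: (smokers, non-smokers) per education level.
counts = {
    "HS": (12, 38),
    "JC": (18, 67),
    "JC+": (27, 95),
    "UG": (32, 239),
    "Grad": (5, 52),
}


def relative_risks(table, reference):
    """Relative risk of each group versus the reference group: p_i / p_reference."""
    props = {group: a / (a + b) for group, (a, b) in table.items()}
    p_ref = props[reference]
    return {group: p / p_ref for group, p in props.items()}


# Assumption: HS is treated as group 1 (the reference row).
rr = relative_risks(counts, reference="HS")
for group, value in rr.items():
    print(f"{group:<5} RR = {value:.2f}")
```

With HS as the reference, the relative risks come out near 1.00, 0.88, 0.92, 0.49, and 0.37, which shows the downward trend with increasing education.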
k Levels of Disease
  Efficacy of Echinacea example.
  Randomized controlled clinical trial: echinacea vs. placebo in treatment of URI
  Exposure ≡ echinacea vs. placebo
  Disease ≡ severity of illness
  Source: JAMA 2003, 290(21), 2824-30
Row Percents for Echinacea Example
  Echinacea group fared slightly worse than placebo group
Chi-Square Test of Association
  A. H0: no association in population
     Ha: association in population
  B. Test statistic
Observed

  Degree   Smoke +   Smoke −    Tot
  HS          12        38       50
  JC          18        67       85
  JC+         27        95      122
  UG          32       239      271
  Grad         5        52       57
  Total       94       491      585
Expected

           Smoke +                      Smoke −                       Total
  HighS    (50 × 94) ÷ 585 = 8.034      (50 × 491) ÷ 585 = 41.966       50
  JC       13.658                       71.342                          85
  Some     19.603                       102.397                        122
  UG       43.545                       227.455                        271
  Grad     9.159                        47.841                          57
  Total    94                           491                            585
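The expected counts follow the pattern shown in the first row: row total × column total ÷ N. A minimal Python sketch of that calculation, with the hypothetical helper name `expected_counts`:

```python
# Observed counts: rows are education levels, columns are (Smoke +, Smoke −).
observed = [
    [12, 38],   # HS
    [18, 67],   # JC
    [27, 95],   # JC+ / Some
    [32, 239],  # UG
    [5, 52],    # Grad
]


def expected_counts(obs):
    """Expected count for each cell: (row total × column total) ÷ N."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 3) for e in row])
```

The first printed row is `[8.034, 41.966]`, matching the worked cells in the table.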
Continuity-Corrected Chi-Square
  Pearson's ("uncorrected") chi-square
  Yates' continuity-corrected chi-square
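In their usual textbook forms, the two statistics can be written as below. This is a sketch of the standard definitions, assuming O_i denotes the observed count and E_i the expected count in cell i; the subscript c for the corrected statistic is a notational assumption.

```latex
X^2_{\text{stat}} = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
\qquad
X^2_{\text{stat},c} = \sum_{i} \frac{\left(\lvert O_i - E_i \rvert - \tfrac{1}{2}\right)^2}{E_i}
```

Both are referred to a chi-square distribution with (R − 1)(C − 1) degrees of freedom.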
Chi-Square Hand Calc.
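The hand calculation sums (O − E)² ÷ E over all ten cells of the education-by-smoking table. The sketch below repeats that arithmetic in Python for the uncorrected (Pearson) statistic; the function name `pearson_chi_square` is illustrative.

```python
# Observed counts: rows are education levels, columns are (Smoke +, Smoke −).
observed = [
    [12, 38],   # HS
    [18, 67],   # JC
    [27, 95],   # JC+ / Some
    [32, 239],  # UG
    [5, 52],    # Grad
]


def pearson_chi_square(obs):
    """Return (X2, df) for an R-by-C table of counts, using the sum of (O - E)^2 / E."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(obs):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            x2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df


x2_stat, df = pearson_chi_square(observed)
print(f"X2 = {x2_stat:.2f}, df = {df}")
```

This reproduces the statistic of 13.20 with 4 df used in the P-value lookup.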
Chi-Square P-value
  X²stat = 13.20 with 4 df
  Table E: go to the 4 df row, bracket the chi-square statistic, and look up the right-tail (P-value) regions
  Example: X²stat falls between 11.14 (P = .025) and 13.28 (P = .01), so .01 < P < .025

  Right tail   0.975   0.25   0.20   0.15   0.10   0.05   0.025   0.01    0.005
  df = 4       0.48    5.39   5.99   6.74   7.78   9.49   11.14   13.28   14.86
Illustration: X²stat = 13.20 with 4 df
  The P-value = area under the curve (AUC) in the tail beyond X²stat
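The tail area can also be computed directly instead of bracketed from a table. The sketch below relies on a closed form for the chi-square right tail that holds only when df is even (which covers df = 4); the function name `chi_square_right_tail` is hypothetical, and in practice a statistics package such as SciPy (`scipy.stats.chi2.sf`) would normally be used.

```python
import math


def chi_square_right_tail(x2, df):
    """Right-tail area of a chi-square distribution.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, valid for even df only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x2 / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


p_value = chi_square_right_tail(13.20, 4)
print(f"P = {p_value:.4f}")
```

The result is about 0.0103, consistent with the bracket .01 < P < .025.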
WinPEPI > Compare2 > F1
  [Screenshots: input screen (row 5 not visible) and output]
Chi-Square, cont.
  How the chi-square works: when observed values = expected values, the chi-square statistic is 0. As the differences between observed and expected values get larger, evidence against H0 mounts.
  Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values that are less than 5.
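The small-sample rule can be checked mechanically before running the test. A minimal sketch, assuming the rule is applied to the expected counts of the table; `small_expected_share` is a hypothetical helper name.

```python
# Observed counts: rows are education levels, columns are (Smoke +, Smoke −).
observed = [
    [12, 38],   # HS
    [18, 67],   # JC
    [27, 95],   # JC+ / Some
    [32, 239],  # UG
    [5, 52],    # Grad
]


def small_expected_share(obs, threshold=5):
    """Fraction of cells whose expected count falls below the threshold."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    cells = [r * c / n for r in row_totals for c in col_totals]
    return sum(e < threshold for e in cells) / len(cells)


share = small_expected_share(observed)
ok_to_use = share <= 0.20  # rule: no more than 20% of cells with expected < 5
print(f"{share:.0%} of cells have expected counts below 5; chi-square OK: {ok_to_use}")
```

For the education-by-smoking table the smallest expected count is 8.034, so no cell falls below 5 and the rule is satisfied.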
Chi-Square, cont.
  Supplement chi-square tests with descriptive statistics. Chi-square statistics do not quantify effects.
  For 2-by-2 tables, the (uncorrected) chi-square test and the two-sided z test of proportions produce identical P-values.
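The equivalence for 2-by-2 tables can be seen numerically: the squared z statistic equals the uncorrected chi-square statistic. The sketch below uses hypothetical counts invented for illustration (they are not from the slides) and assumes the z test uses the pooled standard error.

```python
import math

# Hypothetical 2-by-2 counts for illustration only (not from the slides):
# rows = exposure groups, columns = (disease +, disease −).
a1, b1 = 20, 30
a2, b2 = 10, 40

n1, n2 = a1 + b1, a2 + b2
p1, p2 = a1 / n1, a2 / n2
p_pooled = (a1 + a2) / (n1 + n2)

# z statistic for two independent proportions, using the pooled standard error.
z = (p1 - p2) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Pearson (uncorrected) chi-square for the same table.
observed = [[a1, b1], [a2, b2]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
x2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

# Two-sided z P-value and chi-square (1 df) right-tail P-value.
p_z = math.erfc(abs(z) / math.sqrt(2))
p_x2 = math.erfc(math.sqrt(x2 / 2))

print(f"z^2 = {z ** 2:.4f}, X2 = {x2:.4f}")
print(f"P (z test) = {p_z:.4f}, P (chi-square) = {p_x2:.4f}")
```

The match holds for the uncorrected statistic; applying Yates' correction to the chi-square would break the exact agreement with this z test.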
Discussion and demo on power and sample size
  For estimation
  For testing
  Power
  Sample size