Published by Willis Whitehead. Modified over 9 years ago.
1
September 15
2
In Chapter 18:
18.1 Types of Samples
18.2 Naturalistic and Cohort Samples
18.3 Chi-Square Test of Association
18.4 Test for Trend
18.5 Case-Control Samples
18.6 Matched Pairs
3
§18.1 Types of Samples
The prior chapter considered categorical response variables with two possible outcomes. This chapter considers categorical variables with any number of possible outcomes.
4
Types of Samples, cont.
Data may be generated by:
I. Naturalistic samples. An SRS, with the data then cross-classified according to the explanatory variable and the response variable.
II. Purposive cohort samples. Fixed numbers of individuals selected according to the explanatory factor.
III. Case-control samples. Fixed numbers of individuals selected according to the outcome variable.
5
Naturalistic Samples Take an SRS from the population; then cross-classify individuals with respect to explanatory and response variables.
6
Purposive Cohort Samples Select predetermined numbers of exposed and nonexposed individuals; then ascertain outcomes in individuals.
7
Case-Control Samples
Identify individuals who are positive for the outcome (cases); then sample the population for individuals who are negative for the outcome (controls).
8
§18.2 Naturalistic and Cohort Samples
Data from a naturalistic sample are shown in this 5-by-2 table. For uniformity, we always put the explanatory variable in the rows of such tables. Totals are tallied in the table margins.
                 Smoke+   Smoke−   Total
High school        12       38       50
Assoc. degree      18       67       85
Some college       27       95      122
UG degree          32      239      271
Grad degree         5       52       57
Total              94      491      585
9
Marginal Distributions
For naturalistic samples (only), describe the marginal distributions. These may be reported graphically or as percentages.
Top figure: column marginal distribution
Bottom figure: row marginal distribution
10
Conditional Percents
The relationship between the row variable and the column variable is explored with conditional percents. There are two types:
Row percents: used in cohort and naturalistic samples (describe prevalence and incidence)
Column percents: used in case-control samples
11
Incidence and Prevalence (Naturalistic and Cohort Samples Only)
The top table demonstrates the notation for an R-by-C table (R rows and C columns), here with C = 2. For naturalistic and cohort samples, the row percents in column 1 are the group incidences or prevalences, $\hat{p}_i = a_i / n_i$.
             Smoke+   Smoke−   Total
Group 1        a1       b1       n1
Group 2        a2       b2       n2
...            ...      ...      ...
Group R        aR       bR       nR
Total          m1       m2       N
12
Prevalences - Example
This table shows prevalence by education level. Example of calculation, prevalence in group 1 (high school): $\hat{p}_1 = a_1 / n_1 = 12/50 = 0.240$ (24.0%).
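The same row-percent calculation can be applied to every row of the education table. A minimal Python sketch, assuming the counts are stored as (Smoke+, Smoke−) pairs; the dictionary layout and the `prevalences` helper are illustrative names, not part of the text.

```python
# (Smoke+, Smoke-) counts by education level, from the §18.2 table.
table = {
    "High school": (12, 38),
    "Assoc. degree": (18, 67),
    "Some college": (27, 95),
    "UG degree": (32, 239),
    "Grad degree": (5, 52),
}

def prevalences(rows):
    """Row percent in column 1 for each group: p_i = a_i / n_i, with n_i = a_i + b_i."""
    return {group: a / (a + b) for group, (a, b) in rows.items()}

for group, p in prevalences(table).items():
    print(f"{group:14s} {p:6.1%}")
```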
13
Relative Risks, R-by-2 Tables
Let group 1 represent the least exposed group. Relative risks are calculated as follows: $\widehat{RR}_i = \hat{p}_i / \hat{p}_1$ for $i = 1, \dots, R$ (so $\widehat{RR}_1 = 1$).
14
RRs, R-by-2 Tables, Example
This table lists the RRs for the illustrative data. Example of calculation: $\widehat{RR}_2 = \hat{p}_2 / \hat{p}_1 = (18/85) / (12/50) = 0.212 / 0.240 = 0.88$. Notice the downward dose-response in the RRs.
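A short Python sketch of the RR calculation for all five education levels, assuming the first row (high school) is the reference group 1; `relative_risks` is a hypothetical helper written for this illustration.

```python
# (group, Smoke+ count a_i, Smoke- count b_i) from the §18.2 table.
# Row 1 (High school) is the reference group.
rows = [
    ("High school", 12, 38),
    ("Assoc. degree", 18, 67),
    ("Some college", 27, 95),
    ("UG degree", 32, 239),
    ("Grad degree", 5, 52),
]

def relative_risks(rows):
    """RR_i = p_i / p_1, where p_i = a_i / (a_i + b_i) is the prevalence in group i."""
    _, a1, b1 = rows[0]
    p1 = a1 / (a1 + b1)
    return [(name, (a / (a + b)) / p1) for name, a, b in rows]

for name, rr in relative_risks(rows):
    print(f"{name:14s} RR = {rr:.2f}")
```

With these counts the RRs come out to about 1.00, 0.88, 0.92, 0.49 and 0.37 from the lowest to the highest education level.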
15
Odds Ratios, R-by-2 Tables (optional)
The odds of an event is the ratio of successes to failures: $\text{odds}_i = a_i / b_i$. The odds ratio associated with exposure level i in an R-by-2 table is $\widehat{OR}_i = \dfrac{a_i / b_i}{a_1 / b_1}$.
Interpretation: ORs are interpreted much like RRs; e.g., OR ≈ 1 implies no association (see chapter for details).
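A minimal Python sketch of the odds-ratio formula applied to the §18.2 smoking data, again taking high school as the reference row; the `odds_ratios` helper is illustrative only.

```python
# (group, Smoke+ count a_i, Smoke- count b_i) from the §18.2 table.
rows = [
    ("High school", 12, 38),
    ("Assoc. degree", 18, 67),
    ("Some college", 27, 95),
    ("UG degree", 32, 239),
    ("Grad degree", 5, 52),
]

def odds_ratios(rows):
    """OR_i = (a_i / b_i) / (a_1 / b_1), with row 1 as the reference group."""
    _, a1, b1 = rows[0]
    ref_odds = a1 / b1
    return [(name, (a / b) / ref_odds) for name, a, b in rows]

for name, or_i in odds_ratios(rows):
    print(f"{name:14s} OR = {or_i:.2f}")
```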
17
Responses with More than Two Levels of Outcome
Efficacy of Echinacea. A randomized controlled clinical trial pitted echinacea against placebo in the treatment of upper respiratory symptoms in children. The response variable was severity of illness, classified as mild, moderate, or severe.
Source: JAMA 2003, 290(21), 2824-30
18
Echinacea, Conditional Distributions
Row percents are calculated to determine the incidence of each outcome. Example of calculation, top-right table cell (data on the prior slide): % severe with echinacea = 48 / 329 × 100% = 14.6%.
Conclusion: the treatment group fared slightly worse than the control group; 14.6% of the treatment group experienced severe symptoms, compared with 10.9% of the control group.
19
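§18.3 Chi-Square Test of Association
A. Hypotheses. $H_0$: no association in the population versus $H_a$: association in the population.
B. Test statistic. $X^2_{stat} = \sum \frac{(O - E)^2}{E}$, summed over all R × C cells, where O is the observed count and $E = (\text{row total} \times \text{column total}) / N$ is the count expected under $H_0$; df = (R − 1)(C − 1).
C. P-value. Convert the $X^2_{stat}$ to a P-value with Table E or a software program.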
20
Chi-Square Test - Example
Data below reveal a negative association between smoking and education level. Let us test $H_0$: no association in the population vs. $H_a$: association in the population.
21
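χ², Expected Frequencies
Under $H_0$, the expected frequency for each cell is $E = \frac{\text{row total} \times \text{column total}}{N}$. For example, the expected number of smokers among high school graduates is $50 \times 94 / 585 = 8.03$.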
22
Chi-Square Statistic - Example
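The expected counts and the Pearson statistic for the smoking-by-education table can be reproduced with a few lines of Python. This is a sketch using only the standard library; `pearson_chi_square` is a hypothetical helper name, and it returns the expected table so the individual E values can be inspected.

```python
# Observed counts (Smoke+, Smoke-) from the §18.2 education table.
observed = [
    [12, 38],
    [18, 67],
    [27, 95],
    [32, 239],
    [5, 52],
]

def pearson_chi_square(obs):
    """Return (X2, df, expected) for an R-by-C table of observed counts."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    # Expected count under H0: (row total x column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (O - E)^2 / E over every cell
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(obs) - 1) * (len(col_totals) - 1)
    return x2, df, expected

x2, df, expected = pearson_chi_square(observed)
print(f"X2 = {x2:.2f} with {df} df")
```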
23
Chi-Square Test, P-value
$X^2_{stat}$ = 13.20 with 4 df. Using Table E, find the row for 4 df, then find the chi-square values in this row that bracket 13.20. The bracketing values are 11.14 (P = .025) and 13.28 (P = .01). Thus, .01 < P < .025 (closer to .01).
Probability in right tail:
df    0.975   0.25   0.20   0.15   0.10   0.05   0.025   0.01    0.005
4     0.48    5.39   5.99   6.74   7.78   9.49   11.14   13.28   14.86
24
Illustrative example
$X^2_{stat}$ = 13.20 with 4 df. The P-value is the area under the curve (AUC) of the χ² distribution with 4 df in the right tail beyond $X^2_{stat}$.
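For an even number of degrees of freedom the right-tail area has a closed form, so the exact P-value for this example can be computed without tables. The Python sketch below assumes even df (true here, df = 4); for general df a library routine such as `scipy.stats.chi2.sf(x, df)` would be the usual choice. The helper name is hypothetical.

```python
import math

def chi_square_sf_even_df(x, df):
    """Right-tail area P(X2 > x) for a chi-square distribution with EVEN df.

    Uses the closed form exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

p = chi_square_sf_even_df(13.20, 4)
print(f"P = {p:.4f}")
```

The result is about 0.0103, consistent with the Table E bracket of .01 < P < .025, closer to .01.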
25
Chi-Square By Computer Here are results for the illustrative data from WinPepi > Compare2.exe > Program F Categorical Data
26
Yates’ Continuity-Corrected Chi-Square Statistic
Two different chi-square statistics are used in practice.
Pearson’s chi-square statistic (covered above) is $X^2 = \sum \frac{(O - E)^2}{E}$.
Yates’ continuity-corrected chi-square statistic is $X^2_c = \sum \frac{(|O - E| - \tfrac{1}{2})^2}{E}$.
The continuity-corrected method produces smaller chi-square statistics and larger P-values.
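To see the two statistics side by side, the Python sketch below computes both for a 2-by-2 table, using the esophageal cancer counts from §18.5 (96 and 104 exposed and nonexposed cases; 109 and 666 exposed and nonexposed controls) purely as illustrative input. It applies the Yates formula literally, cell by cell; `chi_squares_2x2` is a hypothetical helper.

```python
def chi_squares_2x2(a, b, c, d):
    """Return (Pearson X2, Yates X2_c) for the 2-by-2 table [[a, b], [c, d]]."""
    obs = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    n = a + b + c + d
    pearson = yates = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n          # expected count under H0
            diff = abs(obs[i][j] - e)        # |O - E|
            pearson += diff ** 2 / e
            yates += (diff - 0.5) ** 2 / e   # continuity correction of 1/2
    return pearson, yates

pearson, yates = chi_squares_2x2(96, 109, 104, 666)
print(f"Pearson X2 = {pearson:.2f}, Yates X2_c = {yates:.2f}")
```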
27
Chi-Square, cont.
1. How the chi-square works. When observed values equal expected values, the chi-square statistic is 0. As the differences between observed and expected values grow, the statistic gets larger and the evidence against $H_0$ mounts.
2. Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values less than 5.
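The small-sample rule of thumb is easy to check mechanically. A minimal Python sketch, assuming a table stored as a list of rows; the function names and the tiny 2-by-2 counterexample are illustrative, not from the text.

```python
def expected_counts(obs):
    """Expected count for each cell under H0: row total * column total / N."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_is_appropriate(obs, min_expected=5.0, max_fraction=0.20):
    """True unless more than 20% of cells have expected counts below 5."""
    cells = [e for row in expected_counts(obs) for e in row]
    n_small = sum(1 for e in cells if e < min_expected)
    return n_small / len(cells) <= max_fraction

education = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]
print(chi_square_is_appropriate(education))         # smallest expected count is about 8.03
print(chi_square_is_appropriate([[3, 1], [2, 4]]))  # all four expected counts are below 5
```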
28
Chi-Square, cont.
3. Supplement chi-squares with measures of association. Chi-square statistics do not measure the strength of association; use descriptive statistics or RRs to quantify strength.
4. For 2-by-2 tables, the (uncorrected) chi-square test and the z test for two proportions (Ch 17) produce identical P-values. The relationship between the statistics is $X^2_{stat} = z_{stat}^2$.
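A quick numerical check of $X^2_{stat} = z_{stat}^2$, sketched in Python. It assumes the pooled two-proportion z statistic and compares high school (12 of 50 smoke) with grad degree (5 of 57 smoke) from the §18.2 table, chosen only as an example; both helper names are hypothetical.

```python
import math

def two_proportion_z(a1, n1, a2, n2):
    """Pooled z statistic for H0: p1 = p2."""
    p1, p2 = a1 / n1, a2 / n2
    pooled = (a1 + a2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def pearson_2x2(a, b, c, d):
    """Uncorrected Pearson X2 for the 2-by-2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

z = two_proportion_z(12, 50, 5, 57)
x2 = pearson_2x2(12, 38, 5, 52)
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, X2 = {x2:.3f}")
```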
29
§18.4 Test for Trend
See pp. 431–436.
30
§18.5 Case-Control Samples
Case-control sampling method:
1. Identify all cases in the population.
2. From the same source population, randomly select a series of non-cases (controls).
3. Ascertain the exposure status of cases and controls.
4. Cross-tabulate the exposure status of cases and controls.
This provides an efficient way to study rare outcomes.
31
Incidence Density Sampling
This advanced concept allows students to see that case-control studies are a type of longitudinal “time-failure” design: as each case is identified in the population, one or more noncases (controls) are selected at random at the time the case occurs.
32
Odds Ratio
Cross-tabulate the counts of cases and controls according to their exposure status:
                 Cases   Controls   Total
Exposed            a1       b1        n1
Nonexposed         a2       b2        n2
Total              m1       m2        N
The odds ratio is the cross-product ratio $\widehat{OR} = \frac{a_1 b_2}{b_1 a_2}$. With incidence density sampling, the OR is a direct estimate of the rate ratio in the population!
33
Case-Control Illustrative Example
Cases: men diagnosed with esophageal cancer.
Controls: noncases selected at random from electoral lists in the same region.
Exposure: alcohol consumption, dichotomized at 80 g/day.
$\widehat{OR} = (96 \times 666) / (109 \times 104) = 5.64$.
Interpretation: the rate ratio associated with high alcohol consumption is about 5.6.
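The cross-product ratio takes one line of Python. This sketch plugs in the esophageal cancer counts used in this section's 90% CI example (96 exposed and 104 nonexposed cases; 109 exposed and 666 nonexposed controls); `odds_ratio` is an illustrative helper name.

```python
def odds_ratio(a1, b1, a2, b2):
    """Cross-product ratio (a1 * b2) / (b1 * a2).

    a1, b1 = exposed cases, exposed controls
    a2, b2 = nonexposed cases, nonexposed controls
    """
    return (a1 * b2) / (b1 * a2)

or_hat = odds_ratio(96, 109, 104, 666)
print(f"OR = {or_hat:.2f}")  # → OR = 5.64
```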
34
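(1 − α)100% CI for the OR
Note the use of the natural logarithmic scale: $\ln\widehat{OR} \pm z_{1-\alpha/2} \cdot SE_{\ln\widehat{OR}}$, where $SE_{\ln\widehat{OR}} = \sqrt{\frac{1}{a_1} + \frac{1}{b_1} + \frac{1}{a_2} + \frac{1}{b_2}}$. Take antilogs ($e^x$) of the limits to return to the OR scale.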
35
90% CI for the OR – Example
         Cases   Controls
E+         96       109
E−        104       666
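The log-scale interval for this example can be sketched in Python with only the standard library; `statistics.NormalDist().inv_cdf` supplies the z quantile (1.645 for 90%). The helper name is hypothetical, and this is the large-sample log-scale method, not WinPepi’s exact or mid-P procedure.

```python
import math
from statistics import NormalDist

def or_confidence_interval(a1, b1, a2, b2, conf=0.95):
    """Return (OR, lower, upper) using the log-scale standard error.

    a1, b1 = exposed cases, exposed controls
    a2, b2 = nonexposed cases, nonexposed controls
    """
    or_hat = (a1 * b2) / (b1 * a2)
    se = math.sqrt(1 / a1 + 1 / b1 + 1 / a2 + 1 / b2)   # SE of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)         # e.g. 1.645 for 90%
    log_or = math.log(or_hat)
    return or_hat, math.exp(log_or - z * se), math.exp(log_or + z * se)

or_hat, lo, hi = or_confidence_interval(96, 109, 104, 666, conf=0.90)
print(f"OR = {or_hat:.2f}, 90% CI {lo:.2f} to {hi:.2f}")
```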
36
Case-Control - Example Results from WinPepi > Compare2.exe > A. WinPepi uses a slightly different formula than ours; the Mid-P results are similar to ours.
37
Case-Control Studies with Multiple Levels of Exposure With an ordinal exposure, compare each exposure level to the non-exposed group (next slide):
38
Case-Control, Ordinal Levels of Exposure
Note the dose-response relationship.
39
18.6 Matched Pairs
With matched-pair samples, each participant is carefully matched to a unique individual as part of the selection process. This technique is used to mitigate confounding by the matching factor. Matching may be used in both cohort and case-control samples.
40
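Here is the notation for matched-pair case-control data (each cell counts pairs):
                Case E+   Case E−
Control E+         a         b
Control E−         c         d
The odds ratio associated with exposure is $\widehat{OR} = c / b$.
The confidence interval is $\exp\left(\ln\widehat{OR} \pm z_{1-\alpha/2}\sqrt{\frac{1}{b} + \frac{1}{c}}\right)$.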
41
Matched Pairs - Example
A matched case-control study found 45 pairs in which the case but not the control had a low fruit/vegetable diet, and 24 pairs in which the control but not the case had a low fruit/vegetable diet.
                Case E+   Case E−
Cntl E+         unknown     24
Cntl E−           45      unknown
$\widehat{OR} = c / b = 45 / 24 = 1.88$. The odds ratio suggests 88% higher risk in low fruit/vegetable consumers.
42
Matched Pair Example, cont.
The data are compatible with ORs between 1.14 and 3.07. WinPepi’s PairEtc.exe program A calculates exact confidence intervals for ORs from matched-pair data; hand-calculated limits will be similar except in small samples.
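A Python sketch of the hand calculation for matched pairs. It assumes b counts pairs in which only the control is exposed and c counts pairs in which only the case is exposed, so the OR is c / b; it uses the large-sample log-scale interval, not the exact interval that WinPepi reports. `matched_pair_or` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def matched_pair_or(b, c, conf=0.95):
    """Return (OR, lower, upper) for matched-pair data.

    b = discordant pairs with control exposed, case not exposed
    c = discordant pairs with case exposed, control not exposed
    Concordant pairs do not enter the calculation.
    """
    or_hat = c / b
    se = math.sqrt(1 / b + 1 / c)                  # SE of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.96 for 95%
    log_or = math.log(or_hat)
    return or_hat, math.exp(log_or - z * se), math.exp(log_or + z * se)

or_hat, lo, hi = matched_pair_or(b=24, c=45)
print(f"OR = {or_hat:.3f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With b = 24 and c = 45 this gives OR = 1.875 and 95% limits of roughly 1.14 to 3.08, close to the limits quoted above.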
43
Hypothesis Test, Matched Pairs
A. $H_0$: OR = 1
B. McNemar’s test statistic: $z_{McN} = \frac{c - b}{\sqrt{b + c}}$, which uses only the discordant pairs b and c.
C. P-value. Convert the z statistic to a P-value with Table B or Table F. If fewer than 5 discordancies are expected, use an exact binomial procedure (see text).
44
Hypothesis Test, Example
                Case E+   Case E−
Control E+      unknown     24
Control E−        45      unknown
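McNemar’s z and its two-sided P-value for these data, sketched in Python with the standard library. It assumes the uncorrected form z = (c − b)/√(b + c), with c = 45 (case exposed, control not) and b = 24 (control exposed, case not); `mcnemar_z` is an illustrative name.

```python
import math
from statistics import NormalDist

def mcnemar_z(b, c):
    """Uncorrected McNemar z = (c - b) / sqrt(b + c) with a two-sided normal P-value."""
    z = (c - b) / math.sqrt(b + c)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

b, c = 24, 45
# Under H0 each discordant pair is equally likely to fall either way,
# so (b + c) / 2 discordancies of each type are expected.
print(f"expected discordancies of each type: {(b + c) / 2}")
z, p = mcnemar_z(b, c)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

Here z ≈ 2.53 and the two-sided P ≈ 0.011; with 34.5 expected discordancies of each type, the normal approximation is reasonable.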