Published by Evelyn Morrison. Modified over 8 years ago.
1
Contingency tables
Brian Healy, PhD
2
Types of analysis: independent samples
Outcome         Explanatory    Analysis
Continuous      Dichotomous    t-test, Wilcoxon test
Continuous      Categorical    ANOVA, linear regression
Continuous      Continuous     Correlation, linear regression
Dichotomous     Dichotomous    Chi-square test, logistic regression
Dichotomous     Continuous     Logistic regression
Time to event   Dichotomous    Log-rank test
3
Example
MS is known to have a genetic component
Several single nucleotide polymorphisms (SNPs) have been associated with susceptibility to MS
Question: Do patients with susceptibility SNPs experience more sustained progression than patients without susceptibility SNPs?
4
Data
Initially, we will focus on presence vs. absence of SNPs
Among our 190 GA-treated patients, 74 had the SNP and 116 did not
–12 patients with the SNP experienced sustained progression
–13 patients without the SNP experienced sustained progression
5
Another way to look at the data
Rather than investigating two proportions, we can look at a 2x2 table of the same data
           SNP+   SNP-   Total
Prog         12     13      25
No prog      62    103     165
Total        74    116     190
6
Question
In our analysis, we assume that the margins are fixed
If there were no relationship between the two variables, what would we expect the values in the table to be?
7
Example
As an example, use this table
           SNP+   SNP-   Total
Prog                       100
No prog                    100
Total        50    150     200
Expected count in each SNP+ cell: 50*100/200 = 25
Expected count in each SNP- cell: 150*100/200 = 75
8
Expected table
Expected table for our analysis
           SNP+                  SNP-                   Total
Prog       25*74/190 = 9.74      25*116/190 = 15.3         25
No prog    165*74/190 = 64.3     165*116/190 = 100.7      165
Total      74                    116                      190
How different are our observed data from the expected table?
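The expected-count rule on this slide (row total times column total, divided by the grand total) can be sketched in a few lines of Python. This is only an illustration using the observed table from the slides; the variable names are my own.

```python
# Observed 2x2 table from the slides.
# Rows: Prog, No prog. Columns: SNP+, SNP-.
observed = [[12, 13],
            [62, 103]]

row_totals = [sum(row) for row in observed]         # progression margins
col_totals = [sum(col) for col in zip(*observed)]   # SNP margins
n = sum(row_totals)                                 # grand total

# Expected count under no association = row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("Prog", "No prog"), expected):
    print(label, [round(x, 2) for x in row])
```

Note that the expected table keeps the same margins as the observed table; only the split inside the table changes.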
9
Does our data show an effect?
To test for an association between the outcome and the predictor, we would like to know whether our observed table differs from the expected table
How could we investigate whether our table is different?
10
Chi-square distribution
The χ² statistic, the sum over the four cells of (observed - expected)² / expected, approximately follows a chi-square distribution with 1 degree of freedom under the null hypothesis
Assume x is a normal random variable with mean=0 and variance=1
–x² has a chi-square distribution with 1 degree of freedom
11
Chi-square distribution
[Figure: chi-square density with 1 degree of freedom; the area to the right of χ² = 3.84 is 0.05]
12
Critical information for χ²
For 1 degree of freedom, the cut-off for α=0.05 is 3.84
–For the normal distribution, this is 1.96
–Note 1.96² = 3.84
Inherently two-sided since it is squared
13
Hypothesis test with χ²
1) H0: No association between SNP and progression
2) Dichotomous outcome, dichotomous predictor
3) χ² test
4) Test statistic: χ² = 0.99
5) p-value=0.32
6) Since the p-value is greater than 0.05, we fail to reject the null hypothesis
7) We conclude that there is no significant association between SNP and progression
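For readers who want to reproduce the test statistic and p-value without a statistics package, here is a minimal Python sketch. It assumes the uncorrected Pearson statistic, and it uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so the upper-tail probability can be written with `math.erfc`. The function names are illustrative.

```python
import math

# Rows: Prog, No prog. Columns: SNP+, SNP-.
observed = [[12, 13], [62, 103]]

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n   # expected count
            stat += (table[i][j] - e) ** 2 / e
    return stat

def chi2_upper_tail_1df(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

stat = chi_square_2x2(observed)
p_value = chi2_upper_tail_1df(stat)
print(round(stat, 2), round(p_value, 2))
```

The same tail function also recovers the critical value: `chi2_upper_tail_1df(3.84)` is very close to 0.05.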
14
[Statistical software output with the χ² statistic and the p-value highlighted]
15
Hypothesis test comparison
Yesterday, we completed this same test using a comparison of proportions
Let’s compare the results
Method                Test statistic   p-value
Test of proportions   z = 0.996        p = 0.32
χ² test               χ² = 0.992       p = 0.32
We get the same result! (Note that 0.996² = 0.992.)
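The agreement between the two methods can be checked numerically. The sketch below assumes the test of proportions used the pooled standard error under the null hypothesis, which is the version whose square equals the uncorrected χ² statistic; the slides do not say which version was used.

```python
import math

a, m1 = 12, 74     # progressions and group size, SNP+
b, m2 = 13, 116    # progressions and group size, SNP-

p1, p2 = a / m1, b / m2
p_pooled = (a + b) / (m1 + m2)   # overall proportion with progression

# z statistic for a difference in proportions with the pooled standard error
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / m1 + 1 / m2))
z = (p1 - p2) / se_pooled

print(round(z, 3), round(z ** 2, 3))
```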
16
Question: Continuity correction
What is a continuity correction and when should I use it?
–Continuity correction subtracts ½ from each absolute difference |observed - expected| in the numerator of the χ² statistic before squaring
–Designed to improve performance of the normal approximation
–Use the default in STATA (or other stat package), but know which you are using
–Less important today since exact tests are easily used
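To see how much the correction matters for our table, here is a rough Python sketch comparing the uncorrected statistic with one common form of the correction (usually attributed to Yates), in which ½ is subtracted from each |observed - expected| before squaring. Whether a given package applies exactly this form by default is something to check in its documentation.

```python
# Rows: Prog, No prog. Columns: SNP+, SNP-.
observed = [[12, 13], [62, 103]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

uncorrected = 0.0
corrected = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n
        diff = abs(observed[i][j] - e)
        uncorrected += diff ** 2 / e
        # Continuity correction: shrink |O - E| by 1/2 (never below zero)
        corrected += max(diff - 0.5, 0.0) ** 2 / e

print(round(uncorrected, 3), round(corrected, 3))
```

The corrected statistic is always the smaller of the two, so the corrected test is the more conservative one.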
17
Question: Why 1 degree of freedom?
We used a χ² distribution with 1 degree of freedom, but there are 4 numbers. Why?
–For our analysis, we assume that the margins are fixed.
–If we pick one number in the table, the rest of the numbers are known
Example: if the Prog, SNP+ cell is 3, the other cells must be 22, 71 and 94
           SNP+   SNP-   Total
Prog          3     22      25
No prog      71     94     165
Total        74    116     190
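The one-degree-of-freedom argument can be made concrete: with the margins fixed, a single cell determines the other three by subtraction. The helper below is a small illustrative sketch (the function name is my own).

```python
def fill_table(prog_snp_pos, row_totals=(25, 165), col_totals=(74, 116)):
    """Given fixed margins and the (Prog, SNP+) cell, return the full 2x2 table."""
    a = prog_snp_pos
    b = row_totals[0] - a    # Prog, SNP-
    c = col_totals[0] - a    # No prog, SNP+
    d = row_totals[1] - c    # No prog, SNP-
    return [[a, b], [c, d]]

print(fill_table(3))     # [[3, 22], [71, 94]]
print(fill_table(12))    # [[12, 13], [62, 103]], the observed table
```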
18
Question: Normal approximation
We are using a normal approximation, but yesterday we talked about this being less than perfect. When can we use this test?
–Rule of thumb: All cells larger than 5 (usually applied to the expected counts)
–Large samples
What should I do if I do not have large samples?
–Fisher’s exact test
19
Fisher’s exact test
Remember that a p-value is the probability of the observed value or something more extreme
Fisher’s exact test looks at all tables with the same margins and adds up the probabilities of those that are as extreme or more extreme than the observed table under the null hypothesis of no association
Same concept as exact test from Wilcoxon test
Easy to compute this in STATA
20
Hypothesis test with exact test
1) H0: No association between SNP and progression
2) Dichotomous outcome, dichotomous predictor
3) Exact test
4) Test statistic: NA
5) p-value=0.38
6) Since the p-value is greater than 0.05, we fail to reject the null hypothesis
7) We conclude that there is no significant association between SNP and progression
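A minimal sketch of how an exact p-value of this kind can be computed by hand in Python. It assumes one common two-sided definition: with the margins fixed, the top-left cell follows a hypergeometric distribution, and the p-value sums the probabilities of all tables that are no more likely than the observed one. Packages can differ in how they define "more extreme", so treat this as an illustration rather than a description of what STATA does internally.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))   # smallest possible top-left cell
    hi = min(row1, col1)             # largest possible top-left cell
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

p = fisher_exact_two_sided(12, 13, 62, 103)
print(round(p, 2))
```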
21
[Statistical software output with the two-sided p-value highlighted]
22
Results
Our results were very similar to the other tests in part because we have a large sample size
–Normal approximation ok
In small samples, larger differences are possible
23
Types of studies
In a cohort study, people are enrolled based on exposure status so we can somewhat control how many exposed and unexposed people we have
In a case-control study, people are enrolled based on disease status so that we ensure that we have both diseased and non-diseased people
24
Measures of association
Risk difference: RD = p1 - p2, where p1 = P(Disease+|Exposure+) and p2 = P(Disease+|Exposure-)
–Do these added together equal 1?
–Why?
–Under the null, what is the risk difference?
Relative risk (risk ratio): RR = p1 / p2
–Under the null, what is the relative risk?
25
P(Disease+|Exposure+) = a/m1 = p1
–What is another name for this quantity?
–Prevalence in patients with exposure
P(Disease+|Exposure-) = b/m2 = p2
RD = a/m1 - b/m2
Difference between proportions
             Exposure
Disease      Y      N      Total
Y            a      b      n1
N            c      d      n2
Total        m1     m2     N
26
Confidence interval for RD
Several confidence intervals are available for the RD
–Asymptotic normal distribution: the estimated RD is approximately normal with standard error SE = sqrt(p1(1-p1)/m1 + p2(1-p2)/m2)
–Confidence interval: RD ± 1.96*SE for a 95% interval
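As a sketch of the asymptotic interval, the Python below applies the usual large-sample (Wald) formula, SE = sqrt(p1(1-p1)/m1 + p2(1-p2)/m2), to the SNP data from the earlier slides. The choice of this particular interval is an assumption on my part; other intervals for the RD exist.

```python
import math

a, m1 = 12, 74     # progressions and group size, SNP+ (exposed)
b, m2 = 13, 116    # progressions and group size, SNP- (unexposed)

p1, p2 = a / m1, b / m2
rd = p1 - p2

# Large-sample standard error of the difference between two proportions
se = math.sqrt(p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2)

z = 1.96           # normal cut-off for a 95% interval
ci = (rd - z * se, rd + z * se)
print(round(rd, 3), tuple(round(x, 3) for x in ci))
```

Because the interval contains 0, it leads to the same conclusion as the χ² test.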
27
Estimate of RR: RR = (a/m1) / (b/m2)
             Exposure
Disease      Y      N      Total
Y            a      b      n1
N            c      d      n2
Total        m1     m2     N
28
Confidence interval for RR
To construct a confidence interval we use a normal approximation
In addition, the CI is based on a log transformation of the RR
–log(RR)=ln(RR)
–I will use ln and log to represent the natural logarithm
Quick math: e^ln(RR) = RR
29
ln(RR)
Why do we use the ln(RR)?
–It is generally easier to deal with subtraction rather than division
–ln(RR) = ln(p1/p2) = ln(p1) - ln(p2)
We can estimate the standard error for the ln(RR) using the following formula
SE[ln(RR)] = sqrt(1/a - 1/m1 + 1/b - 1/m2)
30
Confidence interval
Now that we have an estimate of the variance, we can create a confidence interval for ln(RR) using our standard normal approximation
95% CI for ln(RR): ln(RR) ± 1.96*SE[ln(RR)]
To create a confidence interval for RR, we transform this confidence interval
95% CI for RR: (e^(ln(RR) - 1.96*SE[ln(RR)]), e^(ln(RR) + 1.96*SE[ln(RR)]))
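The log-transform recipe can be sketched in Python as follows. It assumes the standard large-sample standard error for ln(RR), sqrt(1/a - 1/m1 + 1/b - 1/m2), written in the a, b, m1, m2 notation of the 2x2 table, and applies it to the SNP data.

```python
import math

a, m1 = 12, 74     # progressions and group size, SNP+ (exposed)
b, m2 = 13, 116    # progressions and group size, SNP- (unexposed)

rr = (a / m1) / (b / m2)

# Large-sample standard error on the log scale
se_log_rr = math.sqrt(1 / a - 1 / m1 + 1 / b - 1 / m2)

z = 1.96
log_lo = math.log(rr) - z * se_log_rr
log_hi = math.log(rr) + z * se_log_rr

# Back-transform the interval for ln(RR) to the RR scale
ci_rr = (math.exp(log_lo), math.exp(log_hi))
print(round(rr, 2), tuple(round(x, 2) for x in ci_rr))
```

The resulting interval is not symmetric around the estimated RR, which is expected because it is symmetric on the log scale.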
31
[Statistical software output showing the estimated proportions in the two groups and the p-value from the chi-square test]
Given the confidence interval, would you reject the null hypothesis? Why?
32
Interpretation of RD
The estimated risk difference is 0.05.
–The interpretation of this is that the risk of progression for patients with the susceptibility allele is 5 percentage points higher than for patients without the allele
The 95% confidence interval for the risk difference is (-0.052, 0.152)
–Is there a significant difference between the allele groups?
What was the confidence interval for the difference between the proportions that we investigated two classes ago?
–95% CI: (-0.052, 0.152)
33
Interpretation of RR
The estimated relative risk is 1.45.
–The interpretation of this is that the risk of progression for patients with the susceptibility allele is 1.45 times the risk for patients without the allele
The 95% confidence interval for the relative risk is (0.70, 3.00)
–Is there a significant difference between the allele groups?
34
RD and RR
Now that we know how to estimate these measures, can we estimate these with any study design?
–Not directly
–In a cohort study, the probabilities of interest, P(Disease|Exposure), are estimated
–In a case-control study, the probabilities cannot be estimated directly so more information is required
35
Bayes theorem-technical
The relationship between P(Disease|Exposure) and P(Exposure|Disease) can be shown using Bayes theorem
P(D+|E+) = P(E+|D+)*P(D+) / [P(E+|D+)*P(D+) + P(E+|D-)*P(D-)]
Therefore, if we knew P(D+), we could estimate P(D+|E+) from a case-control study
–P(D+) is prevalence
–Usually we do not know this so we can’t directly estimate the relative risk or risk difference
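To illustrate why P(D+) is the missing ingredient, here is a small Python sketch of Bayes theorem, P(D+|E+) = P(E+|D+)P(D+) / [P(E+|D+)P(D+) + P(E+|D-)P(D-)]. It plugs in the exposure proportions from the SNP table (12/25 among progressors, 62/165 among non-progressors) purely as example inputs; the function name and the alternative prevalence of 0.01 are hypothetical.

```python
def p_disease_given_exposed(p_e_given_d, p_e_given_not_d, p_d):
    """Bayes theorem: P(D+|E+) from P(E+|D+), P(E+|D-) and the prevalence P(D+)."""
    numerator = p_e_given_d * p_d
    return numerator / (numerator + p_e_given_not_d * (1 - p_d))

# With the prevalence seen in the table (25/190) we recover P(D+|E+) = 12/74
with_table_prevalence = p_disease_given_exposed(12 / 25, 62 / 165, 25 / 190)

# With a different (hypothetical) prevalence the answer changes,
# even though the exposure proportions are identical
with_rare_disease = p_disease_given_exposed(12 / 25, 62 / 165, 0.01)

print(round(with_table_prevalence, 3), round(with_rare_disease, 3))
```

The exposure proportions alone, which are what a case-control study estimates, do not pin down the risk in the exposed group.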
36
Odds ratio
Odds: odds = p / (1 - p)
Odds ratio: OR = [p1/(1-p1)] / [p2/(1-p2)]
–Under the null, what is the OR?
37
             Exposure
Disease      Y      N      Total
Y            a      b      n1
N            c      d      n2
Total        m1     m2     N
OR = [(a/m1)/(c/m1)] / [(b/m2)/(d/m2)] = ad/bc
This is the estimate of the odds ratio from a cohort study
38
             Exposure
Disease      Y      N      Total
Y            a      b      n1
N            c      d      n2
Total        m1     m2     N
OR = [(a/n1)/(b/n1)] / [(c/n2)/(d/n2)] = ad/bc
This is the estimate of the odds ratio from a case-control study
39
Amazing!!
Estimated odds ratio from each kind of study ends up being the same thing!
Therefore, we can complete a case-control study and get an estimate that we really care about, which is the effect of the exposure on the disease
This relationship is why the odds ratio is so commonly used
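A quick numerical check of this equality, sketched in Python with the SNP table standing in for the generic a, b, c, d cells: the odds of disease compared across exposure groups (cohort view) and the odds of exposure compared across disease groups (case-control view) both reduce to the cross-product ad/bc.

```python
# Cells of the 2x2 table: rows = disease (Y, N), columns = exposure (Y, N)
a, b, c, d = 12, 13, 62, 103
m1, m2 = a + c, b + d    # exposure (column) totals
n1, n2 = a + b, c + d    # disease (row) totals

# Cohort view: odds of disease within each exposure group
or_cohort = ((a / m1) / (c / m1)) / ((b / m2) / (d / m2))

# Case-control view: odds of exposure within each disease group
or_case_control = ((a / n1) / (b / n1)) / ((c / n2) / (d / n2))

cross_product = (a * d) / (b * c)
print(round(or_cohort, 2), round(or_case_control, 2), round(cross_product, 2))
```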
40
Confidence interval for OR
In order to calculate a confidence interval for the OR, we will investigate Woolf’s approximation
–Other approximations and exact intervals are available in STATA (Exact is default)
Woolf’s approximation focuses on a log transformation of the OR, as for the RR
–log(OR)=ln(OR)
Quick math: e^ln(OR) = OR
41
Woolf’s approximation gives us
SE[ln(OR)] = sqrt(1/a + 1/b + 1/c + 1/d)
Using our normal approximation, we can create a confidence interval for ln(OR) using
ln(OR) ± 1.96*SE[ln(OR)]
The confidence interval for OR
(e^(ln(OR) - 1.96*SE[ln(OR)]), e^(ln(OR) + 1.96*SE[ln(OR)]))
42
Example
In yesterday’s class, we discussed a study in which we wanted to estimate the effect of a SNP on disease progression
–What type of study was this?
–Cohort study because we followed people forward over time
Let’s estimate the odds ratio and confidence interval for this study
43
CI for OR
Based on this table, the estimated OR = (12*103)/(13*62) = 1.53
95% CI: (0.66, 3.57)
Should we reject the null hypothesis of OR=1?
           SNP+   SNP-   Total
Prog         12     13      25
No prog      62    103     165
Total        74    116     190
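The numbers on this slide can be reproduced with a short Python sketch of Woolf’s approximation, assuming the usual standard error on the log scale, sqrt(1/a + 1/b + 1/c + 1/d), and the 1.96 normal cut-off.

```python
import math

# Rows: Prog, No prog. Columns: SNP+, SNP-.
a, b = 12, 13
c, d = 62, 103

odds_ratio = (a * d) / (b * c)

# Woolf's standard error for ln(OR)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96
ci_or = (math.exp(math.log(odds_ratio) - z * se_log_or),
         math.exp(math.log(odds_ratio) + z * se_log_or))
print(round(odds_ratio, 2), tuple(round(x, 2) for x in ci_or))
```

Since the interval includes 1, we would not reject the null hypothesis of OR=1.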
44
Interpretation of OR
The estimated odds ratio is 1.53.
–The interpretation of this is that the ODDS of progression for patients with the susceptibility allele are 1.53 times the ODDS for patients without the allele
The 95% confidence interval for the odds ratio is (0.66, 3.57)
–Is there a significant association between SNP and disease?
45
[Statistical software output showing the estimated OR and the estimated CI (Woolf)]
47
OR vs. RR
Although the odds ratio is interesting, the relative risk is more intuitive
If we have a rare disease, which is often the case for a case-control study, 1-p1 and 1-p2 are both close to 1, so OR = [p1/(1-p1)] / [p2/(1-p2)] ≈ p1/p2 = RR
Therefore, in these cases, the odds ratio is also an estimate of the relative risk
In other cases, odds ratio provides valid estimate of relative risk (see other courses)
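The rare-disease argument is easy to see numerically. The sketch below uses made-up risks (all with a true relative risk of 2) to show the odds ratio drifting away from the relative risk as the disease becomes more common; none of these numbers come from the study.

```python
def rr_and_or(p1, p2):
    """Relative risk and odds ratio for risks p1 (exposed) and p2 (unexposed)."""
    rr = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return rr, odds_ratio

# Hypothetical risks, from very rare to common; RR is 2 in every row
for p1, p2 in [(0.002, 0.001), (0.02, 0.01), (0.4, 0.2)]:
    rr, odds_ratio = rr_and_or(p1, p2)
    print(p1, p2, round(rr, 3), round(odds_ratio, 3))
```

For the rarest case the two measures agree to two decimal places, while for the common outcome the odds ratio clearly overstates the relative risk.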
48
Hypothesis test with CI
1) H0: No association between SNP and progression (RD=0)
2) Dichotomous outcome, dichotomous predictor
3) Risk difference 95% confidence interval
4) Test statistic: Estimated RD=0.05, 95% CI: (-0.052, 0.152)
5) p-value>0.05
6) Since the p-value is greater than 0.05, we fail to reject the null hypothesis
7) We conclude that there is no significant association between SNP and progression
49
Hypothesis test with CI
1) H0: No association between SNP and progression (RR=1)
2) Dichotomous outcome, dichotomous predictor
3) Relative risk 95% confidence interval
4) Test statistic: Estimated RR=1.45, 95% CI: (0.70, 3.00)
5) p-value>0.05
6) Since the p-value is greater than 0.05, we fail to reject the null hypothesis
7) We conclude that there is no significant association between SNP and progression
50
Hypothesis test with CI
1) H0: No association between SNP and progression (OR=1)
2) Dichotomous outcome, dichotomous predictor
3) Odds ratio 95% confidence interval
4) Test statistic: Estimated OR=1.53, 95% CI: (0.66, 3.57)
5) p-value>0.05
6) Since the p-value is greater than 0.05, we fail to reject the null hypothesis
7) We conclude that there is no significant association between SNP and progression