Lecture 8 – Comparing Proportions

BMS 617 Lecture 8 – Comparing Proportions Marshall University Genomics Core Facility

Types of independent and dependent variables
Previously we looked at t-tests. A (two-class) t-test is applicable when:
- The dependent (outcome) variable is a continuous, interval variable
- The independent (input) variable is a nominal (or ordinal) variable with two possible values
For example, in the GRHL2 comparison, the independent variable was “Basal type” (with values “Basal A” and “Basal B”) and the dependent variable was “log2 expression of GRHL2”.
Marshall University School of Medicine

Independent and dependent variables that are both nominal
Another class of tests involves independent and dependent variables that are both nominal. This is very common in clinical studies:
- Independent: treated vs not treated
- Dependent: disease vs no disease

Contingency Tables
In such experiments, data are usually presented in a contingency table, which shows how the value of the dependent variable is contingent on the independent variable. The aim is to compare the proportions between the two groups: is A/(A+B) different from C/(C+D)?

                                      Disease   No disease   Total
Exposed to risk, or treated           A         B            A+B
Not exposed to risk, or not treated   C         D            C+D
Total                                 A+C       B+D          N=A+B+C+D

Types of study
There are four different study designs that lead to data presented in contingency tables:
- Cross-sectional studies: A sample is selected at random from the population. The sample is then divided into two groups depending on prior treatment or exposure to a risk factor. Disease prevalence is compared between the groups.
- Prospective (longitudinal) studies: Two groups are selected, one with exposure to the risk factor (or treated) and one without. The groups are then followed over time to see how many in each group develop the disease.
- Experimental studies: Samples are selected and divided randomly into two groups. One group receives the treatment (or is exposed to the risk); the other does not. Incidence of disease is compared between the groups.
- Case-control studies: Two groups of samples are selected, one with the disease (cases) and one without (controls). Each group is examined to see how many were treated or exposed to the risk prior to the study.

Example experimental study
Study from Frye et al. (1996, NEJM), which compared two treatments for coronary artery disease: CABG (Coronary Artery Bypass Graft) and PTCA (Percutaneous Transluminal Coronary Angioplasty). The 1829 patients in the study were randomly assigned to CABG or PTCA. The outcome is 5-year survival.

        Survived 5 years   Did not survive 5 years   Total
CABG    542                372                       914
PTCA    537                378                       915
Total   1079               750                       1829

Calculations for Frye data
Here "risk" means the risk of not surviving 5 years.
The risk for the CABG group is 372/914 = 40.7%.
The risk for the PTCA group is 378/915 = 41.3%.
The relative risk (for the PTCA group compared to the CABG group) is 41.3%/40.7% = 1.01: the PTCA group is 1.01 times as likely not to survive 5 years as the CABG group.
The 95% confidence interval for the relative risk is […] to 1.031.
There is little difference between the risk in these groups.
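The arithmetic above can be reproduced in a few lines. This is a minimal sketch; the helper name `risk` and the variable names are illustrative and not part of the lecture.

```python
def risk(events, total):
    """Proportion of a group that experienced the event."""
    return events / total


# Counts from the Frye et al. table: deaths within 5 years / group size
cabg_risk = risk(372, 914)
ptca_risk = risk(378, 915)

# Relative risk of the PTCA group compared to the CABG group
relative_risk = ptca_risk / cabg_risk

print(f"CABG risk: {cabg_risk:.1%}")
print(f"PTCA risk: {ptca_risk:.1%}")
print(f"Relative risk (PTCA vs CABG): {relative_risk:.3f}")
```

A relative risk close to 1 corresponds to the "little difference" conclusion drawn from these data.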

Frye data for diabetic patients
Frye et al. also examined a subgroup of their patients who had diabetes. This is still an experimental study: the investigators controlled which patients received which treatment and observed the outcome.

Diabetic patients   Survived 5 years   Did not survive 5 years   Total
CABG                93                 87                        180
PTCA                69                 104                       173
Total               162                191                       353

Risk and relative risk for diabetic patients
Risk for the CABG group is 87/180 = 48.3%.
Risk for the PTCA group is 104/173 = 60.1%.
Relative risk for the PTCA group (relative to the CABG group) is 60.1%/48.3% = 1.244.
For diabetic patients, the risk of dying within 5 years if you receive PTCA is 1.244 times the risk of dying within 5 years if you receive CABG.
The 95% confidence interval is […] to 1.632.

Another example: AZT and HIV
Cooper et al. (via Motulsky): a study of the effectiveness of AZT in preventing HIV from developing into AIDS. The study followed 936 patients, randomly treated either with AZT or with a placebo. After three years, the investigators compared the proportion for whom the disease had progressed to AIDS.

          Disease progressed   No progression   Total
AZT       76                   399              475
Placebo   129                  332              461
Total     205                  731              936

Risk and relative risk for AZT
We can perform the same analyses as before:
Risk for the AZT group is 76/475 = 16%.
Risk for the placebo group is 129/461 = 28%.
Relative risk for the AZT group is 16%/28% = 0.57: the AZT group is 0.57 times as likely to experience disease progression within three years as the placebo group.
The 95% confidence interval for the relative risk is […] to 0.736.
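The lecture does not say how its confidence interval was computed. One common choice, assumed here, is the log-scale normal approximation (often attributed to Katz): compute the standard error of ln(RR) and exponentiate the limits. The function name `relative_risk_ci` and the z value are illustrative assumptions. For the AZT counts, this sketch gives an upper limit of about 0.736.

```python
import math


def relative_risk_ci(a, n1, c, n2, z=1.959964):
    """Relative risk of group 1 vs group 2 with an approximate 95% CI.

    a, c   : number of events in group 1 and group 2
    n1, n2 : group sizes
    Uses the log-scale normal approximation (an assumed method,
    not necessarily the one used in the lecture).
    """
    rr = (a / n1) / (c / n2)
    # Standard error of ln(RR)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# AZT: 76 of 475 progressed; placebo: 129 of 461 progressed
rr, lower, upper = relative_risk_ci(76, 475, 129, 461)
print(f"RR = {rr:.3f}, approx. 95% CI {lower:.3f} to {upper:.3f}")
```

Other interval methods exist and can give somewhat different limits, so a mismatch with a published interval does not by itself indicate an error.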

The attributable risk
The difference between the two incidence rates is called the attributable risk:
Attributable risk = 28% - 16% = 12%
A risk of 12% (of the disease progressing in three years) is attributable to not taking AZT.
The 95% confidence interval for the attributable risk is from 6.68% to 17.28%.
Attributable risk is an intuitively useful value when studying risk factors.

Number Needed to Treat (NNT)
Since the attributable risk is 12%, we can interpret this as follows: of those who didn’t receive AZT, 12% (about 1 in 8) progressed to the full disease in 3 years because they didn’t receive AZT. Another 16% also progressed to the disease, but they would have done so anyway.
Another way to look at this is that, for every 8 people or so who are treated, one is prevented from progressing to the full disease.
The Number Needed to Treat (NNT) is the reciprocal of the attributable risk:
NNT = 1/0.1198 = 8.35 (using the unrounded attributable risk of 11.98%)
On average, 8.35 people need to be treated with AZT to prevent 1 from progressing to the full disease in 3 years.
The 95% confidence interval is computed from the 95% CI for the attributable risk: 1/0.1728 to 1/0.0668, or 5.79 to 14.97.
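A short sketch of the attributable risk and NNT arithmetic, using the AZT counts and the confidence limits quoted in the lecture. Variable names are illustrative. Note that the larger attributable-risk limit gives the smaller NNT limit, because taking the reciprocal reverses the order.

```python
# Risks from the AZT table (proportion whose disease progressed)
azt_risk = 76 / 475
placebo_risk = 129 / 461

# Attributable risk: the difference between the two risks
attributable_risk = placebo_risk - azt_risk

# Number needed to treat: reciprocal of the attributable risk
nnt = 1 / attributable_risk

# 95% CI limits for the attributable risk, as quoted in the lecture
ar_low, ar_high = 0.0668, 0.1728

# Reciprocals swap the order of the limits
nnt_low, nnt_high = 1 / ar_high, 1 / ar_low

print(f"Attributable risk: {attributable_risk:.2%}")
print(f"NNT: {nnt:.2f} (95% CI {nnt_low:.2f} to {nnt_high:.2f})")
```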

Significance tests for contingency tables
The confidence intervals for relative risk and/or attributable risk provide plenty of information about the differences between proportions in the contingency table. If needed, a p-value can also be provided. A p-value is always associated with a null hypothesis; here, the null hypothesis is that the proportion of “positive” outcomes is independent of the treatment group.

Fisher’s Exact Test and Chi-squared tests
The best statistical test for a contingency table is Fisher’s Exact Test. For very large counts, this test is computationally prohibitive; in that case, a Chi-squared test can be used as an approximation. Historically, Chi-squared tests were always used, but increased computing power makes this unnecessary.

p-value for the CABG-PTCA study
For the Frye et al. study, the null hypothesis is: the 5-year survival rate is the same for those treated with PTCA as for those treated with CABG.
Using Fisher’s Exact Test for these data, we get p=0.8122.
Assuming there is no difference in the survival rates between those treated with PTCA and those treated with CABG, there is an 81.22% chance of seeing a difference at least as big as the one observed.

p-value for Frye’s data with diabetic patients
For the restriction to diabetic patients, the p-value, by Fisher’s exact test, is 0.0325.
Assuming the survival rate for diabetic patients is the same for those treated with PTCA as for those treated with CABG, there is a 3.25% chance of seeing a difference at least as large as the one observed.
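To make the test concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test. It assumes one common definition of "two-sided": sum the hypergeometric probabilities of every table with the same margins that is no more probable than the observed table. The lecture does not state which software or definition produced its p-values, so small differences are possible. The function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the observed margins
    whose probability does not exceed that of the observed table
    (an assumed, but widely used, two-sided definition).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_ways = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total_ways

    p_observed = prob(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Small relative tolerance so floating-point ties are counted
    p_value = sum(
        p for p in (prob(x) for x in range(smallest, largest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Frye et al.: all patients, then the diabetic subgroup
p_all = fisher_exact_two_sided(542, 372, 537, 378)
p_diabetic = fisher_exact_two_sided(93, 87, 69, 104)
print(f"All patients: p = {p_all:.4f}")
print(f"Diabetic subgroup: p = {p_diabetic:.4f}")
```

The sum runs over every possible value of one cell, which is why the test becomes expensive as the counts grow.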

Case-control studies
In a case-control study, the investigator selects two groups of subjects:
- One group with the disease (or outcome of interest)
- One group without
Compare this to a prospective or experimental study, where the investigator controls the groups based on the independent variable (treatment or risk factor). In a case-control study, the investigator then looks back within each group to see how many were exposed to the risk factor or treatment. This is sometimes called a “retrospective study”.

Example: cholera vaccine
Example (Lucas et al. via Motulsky): a case-control study to measure the effectiveness of a vaccine for cholera.
The scientifically ideal experiment is to recruit subjects, randomly give half the vaccine and half a placebo, and follow them to see how many in each group develop cholera. However:
- The study would take many years
- It would be unethical if you believe the vaccine works
Instead, the investigators recruited a group of 43 subjects who had contracted cholera and 172 who had not, and compared how many had been vaccinated in each group.

Lucas et al. study

                 Cases (cholera)   Controls
Vaccinated       10                94
Not vaccinated   33                78
Total            43                172

Note that in this study, the investigators control the column totals. In the previous examples, the investigators controlled the row totals. This makes an important difference to the interpretation of calculations.

Relative risk is meaningless in case-control studies
It makes no sense to compute the risk or relative risk in case-control studies. The risk is the number affected in each group divided by the total in each group. In a case-control study, this is determined merely by the investigator's choice of how many subjects to place in each group.

Odds ratios
Results of a case-control study are summarized as an odds ratio.
In our example, for the cholera group, the odds of being vaccinated are the number vaccinated divided by the number not vaccinated: 10/33 = 0.303.
The odds of being vaccinated for the controls are 94/78 = 1.205.
The odds ratio is the ratio of the odds: 0.303/1.205 = 0.251.
The odds of having been vaccinated for a cholera victim are 0.251 times the odds of having been vaccinated for a control.
The 95% confidence interval for this odds ratio is […] to […].
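The odds-ratio arithmetic can be sketched as follows. The confidence interval here uses the log-scale normal approximation (often called Woolf's method), which is an assumption: the lecture does not state how its interval was obtained. The function name `odds_ratio_ci` is illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with an approximate 95% CI.

    For the cholera data: a, b = vaccinated cases, vaccinated controls;
    c, d = unvaccinated cases, unvaccinated controls.
    The interval uses the log-scale normal approximation (assumed method).
    """
    odds_cases = a / c       # odds of vaccination among cases
    odds_controls = b / d    # odds of vaccination among controls
    odds_ratio = odds_cases / odds_controls
    # Standard error of ln(OR)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


odds_ratio, lower, upper = odds_ratio_ci(10, 94, 33, 78)
print(f"OR = {odds_ratio:.3f}, approx. 95% CI {lower:.3f} to {upper:.3f}")
```

With this approximation the whole interval lies below 1, meaning the odds of having been vaccinated are lower among the cholera cases.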

Odds ratio and relative risk
If the disease (or other outcome) is rare, then the odds ratio is an approximation to the relative risk. Rare means less than about 10% of the population. So, if we assume cholera is rare in this population, vaccinated individuals have about 25% of the risk of getting cholera that unvaccinated individuals have.
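The reason for this approximation can be written out using the cell labels A, B, C, D from the generic contingency table (rows: exposed or treated vs not; columns: disease vs no disease). This is a standard derivation, shown here as a sketch:

```latex
\[
\mathrm{RR} = \frac{A/(A+B)}{C/(C+D)},
\qquad
\mathrm{OR} = \frac{A/B}{C/D} = \frac{AD}{BC}.
\]
If the disease is rare, then $A \ll B$ and $C \ll D$, so $A+B \approx B$ and $C+D \approx D$, giving
\[
\mathrm{RR} \approx \frac{A/B}{C/D} = \mathrm{OR}.
\]
The odds ratio is also unchanged if rows and columns are swapped:
\[
\frac{A/C}{B/D} = \frac{AD}{BC},
\]
```

The last identity is why a case-control study, which fixes the column totals, can still estimate the odds ratio: the odds ratio of exposure among cases vs controls equals the odds ratio of disease among exposed vs unexposed.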

Statistical test for case-control studies
The statistical test used for case-control studies is also Fisher’s exact test. The null hypothesis in our example is that the proportion who received the vaccine is the same for those with cholera as for those without.
Fisher’s exact test gives p=0.0003 in this case.
So if there is no difference in the proportion who received the vaccine between those with and those without cholera, the chance of seeing data showing at least as strong a relationship between the two due to sampling would be 0.0003 (or 0.03%).

Fisher’s exact test and Chi-squared tests
The best test to use to produce a p-value associated with a contingency table is Fisher’s exact test. Because this is a computationally intensive test, historically Chi-squared tests were used as an approximation.
- A “standard” chi-squared test will give a lower p-value than is accurate, potentially much lower if the sample size is small
- A correction to the chi-squared test is available, called the “Yates continuity correction”, which generally gives a higher p-value than is correct
Use Fisher’s exact test. If you are forced to use the Chi-squared test, use the Yates continuity correction. For decent sample sizes it makes little difference.
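The effect of the Yates correction can be seen with a small pure-Python sketch of the chi-squared test for a 2x2 table. It uses the shortcut formula for 2x2 tables and the fact that, with one degree of freedom, the p-value equals erfc(sqrt(statistic/2)). The function name `chi_squared_2x2` is illustrative, and the example table is the diabetic subgroup from Frye et al.

```python
import math


def chi_squared_2x2(a, b, c, d, yates=False):
    """Chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With yates=True, applies the
    Yates continuity correction.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink |ad - bc| by n/2, not below zero
        diff = max(0.0, diff - n / 2)
    statistic = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


stat_plain, p_plain = chi_squared_2x2(93, 87, 69, 104)
stat_yates, p_yates = chi_squared_2x2(93, 87, 69, 104, yates=True)
print(f"Uncorrected: chi2 = {stat_plain:.3f}, p = {p_plain:.4f}")
print(f"Yates:       chi2 = {stat_yates:.3f}, p = {p_yates:.4f}")
```

For this table the uncorrected p-value comes out lower than the Yates-corrected one, which matches the direction of the biases described above.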
