Daniel O. Stram, Mark Huberman, Anna Wu. IS RESIDUAL CONFOUNDING A REASONABLE EXPLANATION FOR THE APPARENT PROTECTIVE EFFECTS OF BETA-CAROTENE FOUND IN EPIDEMIOLOGICAL STUDIES OF LUNG CANCER IN SMOKERS?
Background: A large number of epidemiological studies (case-control and cohort studies) have reported an inverse (protective) relationship between beta-carotene intake and lung cancer risk among smokers and, more generally, between fruit and vegetable intake and lung cancer risk.
Review (Ziegler et al 1996): Of 25 retrospective studies of intake, 16 showed a protective effect of carotenoids, and 6 were strongly significant; only 4 studies showed an opposite trend, and only 1 reported a marginally significant increase in risk. Of 6 prospective studies of blood micronutrient levels, 5 showed protective effects of beta-carotene (4 strongly significant), and none showed increases in risk. Many of these studies show approximately a doubling of risk in the “low” vs. “high” beta-carotene group.
Intervention Trials: Three randomized studies (CARET, ATBC, PHS) have failed to find reduced lung cancer risk in smokers given beta-carotene. Two of these studies actually noted an increase in risk in the beta-carotene group.
What is the likely cause for the differences between observational and intervention studies? Albanes et al (ATBC investigator) list 2 distinct explanations: (1) serum levels and “usual” intake represent long-term exposure to beta-carotene, with different effects on risk than supplementation; (2) high beta-carotene intake is associated with other dietary or lifestyle practices that are protective.
What about tobacco itself? Tobacco use is an extraordinarily important risk factor for lung cancer. A number of studies have noted that intake or serum levels of beta-carotene are reduced in current smokers compared to ex-smokers, and in ex-smokers compared to never smokers. Studies that fail to adequately control for smoking would be subject to biases due to this inverse association. All the observational studies did, however, include smoking history as a variable in their analyses.
Is controlling for self-reports of smoking history sufficient to address confounding? This depends upon: the nature and magnitude of the errors in self-reports of smoking in assessing true exposure; the strength of the association between beta-carotene intake or serum levels and true smoking exposure; and the shape and strength of the relationship between true smoking and lung cancer risk. Large errors + strong relationships between true dose and beta-carotene and between true dose and risk = high probability of “Residual Confounding”.
Goals of the paper: Develop a model for residual confounding that is consistent with the literature regarding self-reports of smoking, true lung dose, beta-carotene, and lung cancer risk. Use this model to compute risk differences between smokers with “high vs. low” beta-carotene intake levels due solely to differences in “true lung dose”. Suggest future research for observational studies.
Simplifications: Concentrate solely upon a model for current smokers, even though most observational studies included ex-smokers.
What should the model look like? Model for errors in smoking reports (z): allow for both classical and Berkson components of error in the distribution of z and true lung dose (x). Reports of number of cigarettes / day may be “symmetrically” distributed around the truth (classical error model). Conditional on the number of cigarettes, true lung dose may show additional random variation from person to person due to inhalation differences etc. (Berkson). Multiplicative errors seem more reasonable than additive errors.
Conditional on Z = log(z), the log X = log(x) of true exposure x is assumed to be normally distributed (so that x is lognormal), with mean linear in Z with slope b. Taking b = 1 gives the Berkson model, while b = Var(X)/Var(Z) gives the classical error model. The mean of Z and Var(Z) are set so that 95 percent of smokers report between 5 and 60 cigarettes per day, with median = 20. We worked with 3 models (purely classical, purely Berkson, and “mixed”).
On the arithmetic scale, the correlation Rz,x between x and z depends on the Berkson and classical error variances in a “mixed” B&C model.
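As an illustration of how such an arithmetic-scale correlation can be computed, the sketch below uses the standard result for two jointly normal log-variables: Corr(e^X, e^Z) = (exp(Cov(X,Z)) - 1) / sqrt((exp(Var X) - 1)(exp(Var Z) - 1)). The decomposition X = T + e_B, Z = T + e_C with a shared component T is an assumption made here for illustration, not necessarily the parameterization used on the slide; the function names are hypothetical.

```python
import math

def lognormal_corr(var_x, var_z, cov_xz):
    """Correlation of x = exp(X) and z = exp(Z) when (X, Z) is bivariate normal."""
    num = math.exp(cov_xz) - 1.0
    den = math.sqrt((math.exp(var_x) - 1.0) * (math.exp(var_z) - 1.0))
    return num / den

def mixed_model_corr(var_t, var_berkson, var_classical):
    """Assumed mixed model on the log scale (illustrative only):
    X = T + e_B (Berkson part), Z = T + e_C (classical part),
    with T, e_B, e_C independent normals, so Cov(X, Z) = Var(T)."""
    return lognormal_corr(var_t + var_berkson, var_t + var_classical, var_t)

# Purely Berkson: var_classical = 0. Purely classical: var_berkson = 0.
print(mixed_model_corr(0.4, 0.2, 0.0))
print(mixed_model_corr(0.4, 0.0, 0.2))
print(mixed_model_corr(0.4, 0.1, 0.1))
```

Under this assumed decomposition the slope of X on Z is Var(T) / (Var(T) + Var(e_C)), which is 1 in the purely Berkson case and Var(X)/Var(Z) in the purely classical case.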
How large is Rz,x? High Error Model: As a lower bound we chose Rz,x = 0.55, which is a commonly reported value for the correlation between cotinine measurements and self-reports of smoking. This assumes that cotinine is a “nearly perfect” biomarker of true lung dose. Low Error Model: We chose Rz,x = 0.85 as an upper bound (which seems high for a self-reported exposure of anything).
Model for beta-carotene and true lung dose: We chose a semi-lognormal model so that log beta-carotene, B, is linear in true lung dose with a negative slope; this is similar to the models that Stryker fit to measurements of serum beta-carotene. This model is parameterized by the correlation RB,x.
Why a negative RB,x? Smokers have poor diets. There probably is some kind of direct action of nicotine or another tobacco constituent on taste; anecdotally, I have heard ex-smokers say that their taste for sweet foods is much stronger after quitting. There may also be a direct action of smoking on serum levels of beta-carotene conditional on intake. This has been reported in several papers and partly served as a rationale for believing that replacing “lost” beta-carotene should be protective.
How large is RB,x? We assume that RB,x = -0.25. It must be larger in magnitude than RB,z; in fact we have RB,z = RB,x × Rz,x. Under the high error model this gives an observed correlation of -0.14. This correlation seems to be consistent with the (very few) direct examinations of RB,z that are given in the literature.
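The arithmetic behind the observed correlation can be checked in a few lines. This sketch assumes the product rule RB,z = RB,x × Rz,x, which is consistent with the quoted value of -0.14 under the high error model (-0.25 × 0.55); the function name is hypothetical.

```python
def observed_correlation(r_b_x, r_z_x):
    """Correlation between beta-carotene and *reported* smoking, assuming
    B and z are related only through true dose x (product rule)."""
    return r_b_x * r_z_x

R_B_X = -0.25                       # assumed value from the slide
for label, r_z_x in [("high error", 0.55), ("low error", 0.85)]:
    print(label, round(observed_correlation(R_B_X, r_z_x), 4))
```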
Model for Lung Cancer Risk among current smokers: Doll and Peto (J Epidemiol and Community Health 1978) describe the British Doctors data using RR = (1 + (1/6) × cigarettes/day)^2. We also considered a model of the form RR = 1 + (13/9) × cigarettes/day, which agrees with the quadratic model at the values 0 and 40 cigarettes/day.
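A quick check that the two dose-response curves coincide at 0 and 40 cigarettes/day, and how they differ in between. The function names are illustrative only.

```python
def rr_quadratic(cigs_per_day):
    """Doll and Peto form: RR = (1 + cigarettes/6)^2."""
    return (1.0 + cigs_per_day / 6.0) ** 2

def rr_linear(cigs_per_day):
    """Linear alternative: RR = 1 + (13/9) * cigarettes."""
    return 1.0 + (13.0 / 9.0) * cigs_per_day

for c in (0, 10, 20, 40, 60):
    print(c, round(rr_quadratic(c), 2), round(rr_linear(c), 2))
```

Between 0 and 40 cigarettes/day the linear model gives the higher relative risk; above 40 the quadratic model does.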
Note that this model is in terms of self-reported smoking. Under the lognormal measurement error model the observed relationship is attenuated relative to the true model. In particular, if an observed model for risk involves a term z^n, then our lognormal model implies that a term x^(n/b) appears in the model using true dose.
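The exponent n/b can be motivated by a short calculation. The sketch below assumes that, conditional on Z = log z, the log true dose X = log x is normal with mean a + bZ and variance σ² (the symbols a and σ² are introduced here for illustration).

```latex
% Assumption: X \mid Z \sim N(a + bZ,\ \sigma^2), with x = e^{X},\ z = e^{Z}.
\begin{align*}
E\left[x^{m} \mid z\right]
  &= E\left[e^{mX} \mid Z\right]
   = \exp\!\left(m(a + bZ) + \tfrac{1}{2} m^{2}\sigma^{2}\right) \\
  &= \exp\!\left(ma + \tfrac{1}{2} m^{2}\sigma^{2}\right)\, z^{mb}.
\end{align*}
% A term x^{m} on the true-dose scale therefore appears as a multiple of
% z^{mb} on the reported scale. Matching an observed term z^{n} requires
% mb = n, i.e. m = n/b.
```

When b < 1, as in the classical model, n/b > n, so the risk function is steeper on the true-dose scale than on the reported scale.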
Putting it all together: What do we want to compute? RR = RR(“low intake”, z) / RR(“high intake”, z) for specific values of reported smoking, z. How do we compute it? By (numerical) integration over the distributions of x given both z and B.
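A minimal sketch of this kind of numerical integration follows. Every parameter value, the cut-points defining “low” and “high” B, and the use of the quadratic risk function on the true-dose scale are assumptions chosen for illustration; they are not the values used to produce the results in this talk.

```python
import math

def norm_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def true_rr(x):
    # Assumed risk on the true-dose scale (illustrative choice).
    return (1.0 + x / 6.0) ** 2

def mean_rr(z, b_cut, below, a=0.0, b=1.0, s=0.5, c=0.0, d=-0.02, tau=1.0, n=2000):
    """E[RR(x) | z, B below (or above) b_cut] by midpoint-rule integration.

    Assumed model: log x | z ~ N(a + b*log z, s^2), and
    B | x ~ N(c + d*x, tau^2), with B conditionally independent of z given x.
    """
    mu = a + b * math.log(z)
    lo, hi = mu - 8.0 * s, mu + 8.0 * s
    h = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        X = lo + (i + 0.5) * h                      # midpoint on the log scale
        x = math.exp(X)
        dens = norm_pdf((X - mu) / s) / s           # density of log x given z
        p_below = norm_cdf((b_cut - c - d * x) / tau)
        w = dens * (p_below if below else 1.0 - p_below)
        num += true_rr(x) * w * h
        den += w * h
    return num / den

def confounded_rr(z, low_cut, high_cut, **params):
    """Ratio of mean risk in the low-B group to that in the high-B group at fixed z."""
    return mean_rr(z, low_cut, True, **params) / mean_rr(z, high_cut, False, **params)

print(confounded_rr(20.0, -0.5, 0.5))          # above 1: confounding by true dose alone
print(confounded_rr(20.0, -0.5, 0.5, d=0.0))   # equals 1 when B is unrelated to dose
```

Beta-carotene has no causal effect anywhere in this sketch, so any ratio above 1 arises purely from the association between B and unmeasured true dose.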
Results
Conclusions: Under the high measurement error model and linear-quadratic dose response model, “all” of the effect of beta-carotene may be due to confounding with unmeasured tobacco exposure; that is, relative risks of close to 2 are evident.
The linear model reduces the strength of the residual confounding, but the bias is still important for the high error model. The results are somewhat dependent upon whether a Berkson, Classical, or Mixed measurement error model is assumed.
Comparison of Classical & Berkson error models (with the same value of Rz,x): The classical model implies attenuation and greater nonlinearity in the risk function on the true dose scale. The classical model also implies that Var(X) is smaller than Var(Z); the opposite is true for the Berkson model. Conditioning on “high” and “low” values of B produces a smaller difference in E{X|Blow} vs E{X|Bhigh} for a classical than for a Berkson model.
The Berkson model implies much less change in the risk function, but the larger Var(X) yields larger variation in E{X|B}. These two effects tend to “cancel”. The model yielding the largest residual confounding is actually “mixed”.
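The variance ordering between the two error models can be illustrated on the log scale. This sketch assumes a classical model Z = X + U and a Berkson model X = Z + e with independent normal errors, and it takes the log-scale correlation rho as given (the slides fix the arithmetic-scale Rz,x instead, so the numbers are only indicative). The function name and input values are hypothetical.

```python
def var_log_true_dose(var_z, rho, model):
    """Var(X) implied by Var(Z) and the log-scale correlation rho.

    classical: Z = X + U  ->  rho^2 = Var(X)/Var(Z)
    berkson:   X = Z + e  ->  rho^2 = Var(Z)/Var(X)
    """
    if model == "classical":
        return rho ** 2 * var_z
    if model == "berkson":
        return var_z / rho ** 2
    raise ValueError("model must be 'classical' or 'berkson'")

VAR_Z, RHO = 0.4, 0.6               # illustrative values only
print(var_log_true_dose(VAR_Z, RHO, "classical"))  # smaller than Var(Z)
print(var_log_true_dose(VAR_Z, RHO, "berkson"))    # larger than Var(Z)
```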
Is the high error model reasonable? Correlations between cotinine and self-reports of smoking range from approx. 0.4 to 0.7 in various publications. This allows little room for the “high error” model unless cotinine is a “perfect” estimate of true exposure: Corr(Cotinine, z) = Corr(Cotinine, x) × Corr(z, x), implying that Corr(z, x) > Corr(Cotinine, z). However, these reported correlations are all from small, carefully focused studies; it is likely that smoking reporting errors in larger, less focused studies are larger than in these special studies.
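The product relationship can be inverted to see what a reported cotinine correlation implies about Corr(z, x). The cotinine validity values below are hypothetical inputs chosen for illustration, not estimates from the literature.

```python
def implied_corr_z_x(corr_cotinine_z, corr_cotinine_x):
    """Solve Corr(Cotinine, z) = Corr(Cotinine, x) * Corr(z, x) for Corr(z, x)."""
    return corr_cotinine_z / corr_cotinine_x

# If cotinine were a perfect biomarker, Corr(z, x) would equal the reported 0.55.
print(implied_corr_z_x(0.55, 1.0))
# With an imperfect biomarker (hypothetical validity 0.8), Corr(z, x) must be higher.
print(implied_corr_z_x(0.55, 0.8))
```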
Many epidemiological studies report that reported smoking duration is a much better predictor of lung cancer risk than smoking amount. To the extent that this is true (and careful analysis is required because of the relation between smoking duration and age) I take this as evidence that self-reports of smoking are rather poor in these studies.
Is RB,x = -0.25 reasonable? This is a very strong negative correlation, but the high error model implies a much weaker observed correlation of RB,z = -0.14. One paper (Stryker) reported a correlation RB,z = -0.26 for males; this, however, included nonsmokers in the analysis (the correlation would be smaller if only smokers were considered).
It has been under-appreciated that a direct action of tobacco smoke on serum beta-carotene (conditional on intake), which has been reported in the literature, increases the potential for residual confounding if tobacco intake is measured poorly. Remember that the cohort studies that measured serum beta-carotene found stronger effects than the other observational studies.
Suggestions for Research: The correlation between serum beta-carotene levels and reported smoking should be better reported in studies. Levels of beta-carotene in smokers < ex-smokers < never smokers have been commonly reported, but only rarely have correlations for current smokers been calculated (I found RB,z = -0.10 for the MEC). The correlation of serum beta-carotene levels and serum cotinine levels should be reported where possible.
Cohort studies with stored blood samples should use cotinine as well as reported smoking in joint analyses of lung cancer risk and beta-carotene. This is planned for the Multi-Ethnic Cohort Study of Diet and Cancer (Kolonel et al 2000, Am J of Epidemiol).