Bias can get by us November Epidemiology 511 W. A. Kukull
Bias
  Systematic error that leads to an incorrect estimate of an association
    – anticipate and eliminate or minimize in the study design phase
    – may be impossible to account for in analysis
    – usually introduced by the investigator (or subjects)
  Main categories: selection bias and information bias
Bias is a systematic error (diagram after Rothman, 2002)
  Random error decreases with study size; systematic error remains
  [Figure: Error (y-axis) versus Study size (x-axis), with curves for Random error and Systematic error]
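The contrast on this slide can be sketched numerically. The snippet below is a minimal illustration, not part of the original lecture: it uses the standard error of an estimated proportion as a stand-in for random error, and the true risk (0.10) and the size of the systematic error (0.02) are hypothetical values chosen only for the example.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an estimated proportion p based on n subjects."""
    return math.sqrt(p * (1 - p) / n)

TRUE_RISK = 0.10  # hypothetical true risk
BIAS = 0.02       # hypothetical systematic error; does not depend on n

for n in (100, 1_000, 10_000, 100_000):
    se = standard_error(TRUE_RISK + BIAS, n)
    # Random error shrinks roughly as 1/sqrt(n); the bias term is unchanged.
    print(f"n={n:>7}  random error (SE)={se:.4f}  systematic error={BIAS:.4f}")
```

However large the study, the estimate stays centered on TRUE_RISK + BIAS rather than on TRUE_RISK.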
Direction of Bias
Control of Bias
  Careful study design is primary
    – Selection bias: permanent flaw
  Choice of study groups
  Data collection; data sources
    – objective, closed-ended questions
    – trained interviewers: reliability assessment
    – wide variety of factors to “blind” interviewer and subject to the hypothesis
Selection Bias
  Selection of “cases” or “controls” leads to an apparent disease-exposure association
  Selection, or follow-up (f/u) and diagnosis (dx), of “exposed” or “unexposed” leads to an apparent disease-exposure association
  The “apparent” association is due to a systematic error in the design or conduct of the study
Selection bias
  Common element:
    – The association between exposure and disease is different for those who are studied than for those who would be eligible but are not studied
    – Case-control: subject selection is influenced by probability of exposure history
    – Cohort: non-random loss to follow-up influences the association measure (RR)
“Population” base
  [Diagram: “Population” base (Framinghammer City) and study enrollees along a Time axis; loss, death, refusals before disease develops; disease cases and non-diseased]
Selection Bias
  [Diagram: Reference Population, Study Sample, Non-Reference; probabilities of being included in the study within exposure (or disease) categories, laid out as Dis / No Dis by Exp / Not Exp]
Example: selection bias (after Szklo & Nieto, 2000)
  True reference population
  [2×2 table: Exp / Not Exp by Disease / No disease; cell counts not preserved]
  OR = 4.0
Unbiased sample re: exposure status
  Sample 50% of Diseased and 10% of Not Diseased, but with the true reference proportions of “exposed” in each
  [2×2 table: Exp / Not Exp by D / Not D; cell counts not preserved]
  OR = 4.0
Biased exposure probability
  Sampling among “diseased” ONLY is biased (60% exposed, not the true 50%) due to a flawed design or strategy
  [2×2 table: Exp / Not Exp by Dis / Not Dis; cell counts not preserved]
  OR = 6.0
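The three odds ratios in this example (4.0, 4.0, 6.0) can be reproduced with a short sketch. The cell counts below are hypothetical: they were chosen only so that 50% of the diseased are exposed (as stated on the slide) and 20% of the non-diseased are exposed (an assumption inferred from OR = 4.0). They are not taken from the original tables.

```python
def odds_ratio(exp_dis, exp_nodis, unexp_dis, unexp_nodis):
    """Cross-product odds ratio for a 2x2 exposure-by-disease table."""
    return (exp_dis * unexp_nodis) / (exp_nodis * unexp_dis)

# Hypothetical reference population: 50% of diseased and 20% of non-diseased exposed.
reference = dict(exp_dis=500, exp_nodis=1800, unexp_dis=500, unexp_nodis=7200)

# Unbiased sample: 50% of the diseased and 10% of the non-diseased,
# drawn without regard to exposure, so exposure proportions are preserved.
unbiased = dict(exp_dis=250, exp_nodis=180, unexp_dis=250, unexp_nodis=720)

# Biased sample: same sizes, but 60% of the sampled diseased are exposed.
biased = dict(exp_dis=300, exp_nodis=180, unexp_dis=200, unexp_nodis=720)

for label, table in (("reference", reference), ("unbiased", unbiased), ("biased", biased)):
    print(f"{label:>9}: OR = {odds_ratio(**table):.1f}")
```

Sampling fractions that differ by disease status alone leave the OR unchanged; it shifts only when inclusion within the diseased also depends on exposure.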
Basic example: Case-control study (after Hernan et al, 2004)
  Is prior HRT use associated with MI?
  Select women with incident MI as cases
  Select controls (unintentionally) from women with a high frequency of hip fracture
  HRT is known to decrease osteoporosis
  Is the HRT-MI association likely to be biased? Why/how?
Hospital-based case-control study: Berkson’s bias (after Schwartzbaum et al, 2003)
  Premise: diseases have different probabilities of hospital admission
    – Pr(brain injury) > Pr(allergic rhinitis)
    – Pr( >2 diseases) > Pr( 1 disease)
    – Diseases unassociated in the population could be associated in hospitalized patients
  Then, a risk factor for one disease could appear to be a risk factor for the other
Berkson’s bias/Admission bias (after Sackett, 1979)
  [Two 2×2 tables: Resp. disease (Yes / No) by Bone disease (Yes / No); cell counts not preserved]
  General population: OR = 1.06
  Hospitalized in last 6 months: OR = 4.06
Loss to follow-up: Selection bias in a cohort study
  Effects of anti-retroviral therapy hx on AIDS risk in HIV+ patients
    – Pts. with more symptoms may drop out early
    – Pts. with more therapy side effects may drop out
  Restricting analysis to non-dropouts can produce a biased result
  Subject dropout is rarely “at random”
    – Statistical missing data strategies
Selection Biases
  Non-response/Missing data bias: characteristics may differ between early, late and nonresponders
    – Missing data proportions differ
    – Analyses restricted to complete data will be biased
    – Non-responders in case-control studies may have different exposure histories
Healthy Worker selection bias
  Do rubber industry workers have excess mortality compared with the U.S. population of the same age and sex?
    – SMR = 82 for rubber workers
  General population includes people who are unable to work because of illness
    – All-cause death rates are usually higher in the general pop. than among workers
    – Use unexposed workers as a comparison group
Contributors to selection bias
  Choice of comparison group or sampling frame
  Self-selection, volunteers
  Loss to follow-up (cohort)
  Initial non-response
    – primarily case-control studies
  Selective survival
  Differences in disease detection (surveillance or detection bias)
Examples
  Unmasking bias:
    – physicians followed OC users more closely because of use-related cautions and thus detected more thrombophlebitis
    – Frequent visits => more comorbidity
  Prevalent case and survival bias
    – Smoking and Alzheimer’s disease
    – Among AD cases, smokers may have shorter survival than non-smokers
Prevalent case bias
  Longer disease duration increases chance of selection
  [Diagram: Time axis with a cross-sectional sample]
Example: volunteer/self-selection
  Leukemia in troops present at atomic test site
    – 76% of all troops were traced
    – of the 76%, 82% were tracked down by investigators
    – of the 76%, 18% contacted investigators on their own initiative
    – 4 leukemia cases were among the 18% and 4 among the 82%
  Self-referral bias?
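A quick calculation shows why the slide raises the question of self-referral bias. The sketch below uses only the percentages and case counts given above. The function name is illustrative, and the total number of traced troops is not needed because it cancels in the ratio.

```python
# Shares of the traced troops, and leukemia cases in each group (from the slide).
SELF_REFERRED_SHARE = 0.18
INVESTIGATOR_TRACED_SHARE = 0.82
CASES_SELF_REFERRED = 4
CASES_INVESTIGATOR_TRACED = 4

def relative_case_frequency(cases_a, share_a, cases_b, share_b):
    """Ratio of case frequencies in group a versus group b.

    Each frequency is cases / (share * N); the unknown N cancels in the ratio.
    """
    return (cases_a / share_a) / (cases_b / share_b)

ratio = relative_case_frequency(
    CASES_SELF_REFERRED, SELF_REFERRED_SHARE,
    CASES_INVESTIGATOR_TRACED, INVESTIGATOR_TRACED_SHARE,
)
print(f"Leukemia frequency, self-referred vs. investigator-traced: {ratio:.2f}")
```

Leukemia was roughly 4.6 times as frequent among troops who contacted investigators on their own initiative, which is the pattern expected if men with disease were more likely to come forward.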
Information Bias
  Inadequacies and inaccuracies in data collection or measurement
  Common to all subjects?
    – will usually reduce the observed association
  Different in each comparison group?
    – may exaggerate association
Information Bias
  Systematic errors in obtaining needed exposure (or diagnosis) information
    – non-differential misclassification: “random” error, usually biases toward the “null”
    – differential misclassification: error differs between the study groups and may bias the estimated effect in either direction
Example: True classification of family history for a hypothetical disease ‘X’
  [2×2 table: Positive Family Hx / No Family Hx by Disease X / No Disease; cell counts not preserved]
  OR = (value not preserved)
Example: Non-differential misclassification
  Family Hx accuracy: cases 65%; controls 65%
  [2×2 table: Family Hx / No Family Hx by Disease X / No X; cell counts not preserved]
  OR = (value not preserved)
Example: Differential misclassification
  Family Hx accuracy: cases 85%; controls 25%
  [2×2 table: Family Hx / No Family Hx by Disease X / No X; cell counts not preserved]
  OR = (value not preserved)
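Because the cell counts for these three slides did not survive, the following is only a sketch of the mechanism, not a reconstruction of the lecture's tables. It assumes that “accuracy” means the probability that a truly positive family history is reported (sensitivity), that nobody without a family history reports one, and that the true table has the hypothetical counts shown.

```python
def odds_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg):
    """Odds ratio for family history (pos/neg) among cases versus controls."""
    return (case_pos * ctrl_neg) / (case_neg * ctrl_pos)

def misclassify(pos, neg, sensitivity):
    """Move unreported true positives into the 'no family history' cell."""
    reported_pos = pos * sensitivity
    reported_neg = neg + pos * (1 - sensitivity)
    return reported_pos, reported_neg

# Hypothetical true classification (not from the slides); true OR = 4.0.
TRUE = dict(case_pos=50, case_neg=50, ctrl_pos=20, ctrl_neg=80)

def observed_or(sens_cases, sens_controls):
    case_pos, case_neg = misclassify(TRUE["case_pos"], TRUE["case_neg"], sens_cases)
    ctrl_pos, ctrl_neg = misclassify(TRUE["ctrl_pos"], TRUE["ctrl_neg"], sens_controls)
    return odds_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg)

true_or = odds_ratio(**TRUE)
nondifferential_or = observed_or(0.65, 0.65)  # same accuracy in both groups
differential_or = observed_or(0.85, 0.25)     # cases report far better than controls

print(f"true OR             = {true_or:.2f}")
print(f"non-differential OR = {nondifferential_or:.2f}")
print(f"differential OR     = {differential_or:.2f}")
```

Under these assumptions the non-differential error pulls the OR toward 1.0, while the differential error pushes it well above the true value. With other error patterns the differential case could just as easily bias downward.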
Cohort study: true classification of persons who hypothetically develop esophageal reflux (ER) (after Koepsell & Weiss, Chapt 10)
                   Esoph. reflux   No esoph. reflux   Total
  Chew tobacco          10               990           1,000
  Do not chew           10             9,990          10,000
  RR = 10.0
What if only 90% of the true cases were identified due to diagnostic inaccuracy?
                   Esoph. reflux   No esoph. reflux     Total
  Chew tobacco      10(0.9)=9        990+1=991          1,000
  Do not chew       10(0.9)=9      9,990+1=9,991       10,000
  RR = 10.0
What if 1.0% of the well persons were misdiagnosed as having ER when they did not have it?
                   Esoph. reflux   No esoph. reflux       Total
  Chew tobacco      10+10=20        990(.99)=980          1,000
  Do not chew       10+100=110    9,990(.99)=9,890       10,000
  RR = 1.82
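The three relative risks in this cohort example (10.0, 10.0, 1.82) can be checked with a short sketch. The chewer counts (10 cases among 1,000) come from the slides. The non-chewer counts (10 cases among 10,000) are an assumption inferred from the surviving fragments and the stated RR values. Misclassified counts are rounded to whole persons, as the slides appear to do.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Relative risk: risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

N_CHEW, TRUE_CASES_CHEW = 1_000, 10
N_NO_CHEW, TRUE_CASES_NO_CHEW = 10_000, 10  # inferred, not shown intact on the slide

def observed_cases(true_cases, n, sensitivity=1.0, false_positive_rate=0.0):
    """Diagnosed cases after missed cases and false positives, in whole persons."""
    well = n - true_cases
    return round(true_cases * sensitivity) + round(well * false_positive_rate)

def observed_rr(**error):
    chew = observed_cases(TRUE_CASES_CHEW, N_CHEW, **error)
    no_chew = observed_cases(TRUE_CASES_NO_CHEW, N_NO_CHEW, **error)
    return risk_ratio(chew, N_CHEW, no_chew, N_NO_CHEW)

rr_true = observed_rr()
rr_missed_cases = observed_rr(sensitivity=0.9)              # 90% of true cases found
rr_false_positives = observed_rr(false_positive_rate=0.01)  # 1% of the well misdiagnosed

print(f"true RR = {rr_true:.2f}")
print(f"RR with 90% sensitivity = {rr_missed_cases:.2f}")
print(f"RR with 1% false positives = {rr_false_positives:.2f}")
```

Missing the same fraction of cases in both groups leaves the RR at 10.0. A small false-positive rate applied to the much larger pool of well persons drags it down to 1.82.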
Information Bias
  Example: MI and smoking
    – smokers with new MI may be less likely to respond to a mailed questionnaire than non-smokers with new MI
    – if the non-response is related to exposure and disease, the potential for bias exists
  Proxy reports of exposure
    – Relationship, proximity influence agreement
Information Biases (after Sackett)
  Diagnostic suspicion bias: knowledge of the subject’s prior history influences intensity of diagnostic effort
  Exposure suspicion bias: disease with a “known” cause may increase the search for that cause
Information Biases (after Sackett)
  Recall bias: cases more (or less) likely to report than controls
  Family information bias: information from a family is stimulated by a new case in the family, and by their need to explain why
Exposure and disease viewed through (after Maclure & Schneeweiss, 2001)
  Background random factors (chance)
  Correlated causes, confounding
  Diagnostic inaccuracy
  Exposure accuracy
  Missing data, database errors
  Group/hypothesis formation
  Case-control selection
  Cohort loss to f/u
  Analysis, modeling, interpretation
  Publication bias
    – Editors and experts
Evaluation of Bias: What would the RR look like if ???
  What is the direction and likely effect if bias is active?
    – IS A TRUE ASSOCIATION MASKED?
    – IS A SPURIOUS ASSOCIATION REPORTED?
  Can the potential for recall bias be estimated?
    – second control group with another illness?
Is Selection Bias Present? (after Grimes and Schulz, Lancet 2002;359:248-52)
  In a cohort study, are participants in the exposed and unexposed groups similar in all respects except for exposure?
  In a case-control study, are cases and controls similar in important respects except for the disease in question?
Is Information Bias Present? (after Grimes and Schulz, Lancet 2002;359:248-52)
  In a cohort study, is information about outcome obtained in the same way for those exposed and unexposed?
  In a case-control study, is information about exposure gathered in the same way for cases and controls?
Is Confounding Present? (after Grimes and Schulz, Lancet 2002;359:248-52)
  Could the results be accounted for by the presence of another factor (e.g., age, smoking, sexual behavior, diet) associated with the exposure and outcome but not directly in the causal pathway?
  Confounding is the subject of another lecture…
If not bias or confounding, are the results due to “chance”? (after Grimes and Schulz, Lancet 2002;359:248-52)
  What is the RR or OR and the 95% confidence interval? Does the CI include 1.0?
  Is the difference (association) statistically significant and, if not, did the study have adequate power to find a clinically important difference (association)?
    – What is the p-value?
    – Is the p-value inflated by multiple comparisons?
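As an illustration of the “does the CI include 1.0?” question, here is a minimal sketch of an odds ratio with a 95% confidence interval computed on the log scale (the Woolf method, chosen here for simplicity; the lecture does not specify a method). The case-control counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (log-scale, Woolf method).

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    or_estimate = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_estimate) - z * se_log_or)
    upper = math.exp(math.log(or_estimate) + z * se_log_or)
    return or_estimate, lower, upper

# Hypothetical study: 30 of 100 cases and 20 of 100 controls exposed.
or_estimate, lower, upper = odds_ratio_ci(a=30, b=20, c=70, d=80)
includes_null = lower <= 1.0 <= upper

print(f"OR = {or_estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("CI includes 1.0" if includes_null else "CI excludes 1.0")
```

With these counts the interval spans 1.0, so chance cannot be ruled out as an explanation, and the follow-up question on the slide about adequate power becomes relevant.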
Bias and study designs: Important sources
  Case-control
    – Knowledge of disease status may influence determination of exposure status
    – Knowledge of exposure status may influence which subjects are selected
    – Recall bias
  Cohort
    – loss to follow-up; differential misdiagnosis
    – Information bias
Epidemiologic Reasoning
  Use the tools, statistics and calculations
  Use knowledge of biology, behavior and disease pathogenesis
  Make educated guesses about the effect of bias and confounding to guide study design and analysis and eliminate untoward effects
  Try to make causal inferences
Conclusion
  What sources of bias are common to which study designs?
  How can we evaluate bias? “Sensitivity analysis”: “What if….”
  Confounding may still impact results even if bias is eliminated, but it can be dealt with in analysis.