Three main points to be covered: (1) the nature, weaknesses, and (sometimes) strength of studies using group-level observations; (2) the cohort study as gold standard, with its assumptions and limitations; (3) the concept of the study base, which links the case-control design to the cohort design.
Studies making observations on groups of individuals vs. on individuals: Studies using group-level data are usually called ecological studies. Two main weaknesses: the ecological fallacy and very limited control of confounding. One (sometimes) strength: some exposures may be best measured at the area or group level.
Example from Szklo and Nieto of grouped data from cohorts in the Seven Countries Study
Ecological Fallacy: We cannot tell whether the relationship between the predictor and the outcome at the group level holds at the individual level. In this example: within the cohorts, are the individuals eating more saturated fat the same individuals experiencing more CHD deaths? Sometimes called confounding at the group level.
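A small numeric sketch may help make the ecological fallacy concrete. The numbers below are invented for illustration (they are not Seven Countries Study data), and the helper names `pearson` and `group_mean` are hypothetical. Three groups are constructed so that the group means of exposure and outcome rise together, while within every group the individuals with higher exposure have the lower outcome.

```python
from math import sqrt

def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

def group_mean(pairs):
    """Mean exposure and mean outcome for one group."""
    n = len(pairs)
    return (sum(x for x, _ in pairs) / n, sum(y for _, y in pairs) / n)

# Hypothetical (exposure, outcome) values for individuals in three groups.
groups = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Association computed from group means only (what an ecological study sees).
group_level_r = pearson([group_mean(p) for p in groups.values()])

# Association among individuals inside each group.
within_group_r = {name: pearson(p) for name, p in groups.items()}

print("group-level r:", round(group_level_r, 3))
for name, r in within_group_r.items():
    print("within group", name, "r:", round(r, 3))
```

In this constructed example the group-level correlation is positive while every within-group correlation is negative, so the group-level pattern alone cannot tell us what holds for individuals.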
Confounding in group data: Even if there is no ecological fallacy, we are still left with possible confounding: some third variable (e.g., exercise) may cause an increase in CHD deaths and also be related to fat consumption. It is difficult to control for because measures may not be available. Even if data are available, we don’t know the relationship of the confounding variable to the other two variables at the individual level.
Example of the potential strength of measures at group level: effect of the 1988 floods in Bangladesh on children. Children sampled 6 months before the flood and 5 months after. Outcomes: enuresis and aggressive behavior. Individual-level predictor: individual danger of drowning. No association was seen at the individual level. At the group level, the before-and-after flood comparison showed a significant difference.
Situations where group-level variables may be better: exposures without much within-group variability or with a threshold effect (e.g., salt consumption in the U.S.); herd immunity in studying infectious disease (vaccination levels may be more informative than individual behavior); exposures that have powerful effects at the group level (the Bangladesh flood, which may also be an example of a threshold effect).
Ecological Studies: Summary. As the text emphasizes, the common view that they are only hypothesis-generating is inadequate. They are the weakest design for establishing causality but have a role because they are inexpensive and easy to do. For some situations and kinds of data they may actually be superior. Some variables can only be measured at the group level (policies and laws, environment).
Cohort Study Design: Mimics an individual’s progress through life and the accompanying disease risk. Gold standard because the exposure/risk factor is observed before the outcome occurs. A randomized trial is a cohort design with exposure assigned rather than observed. Other study designs can be understood by how they sample the experience of a cohort.
Cohort study design (figure): censored observations = losses to follow-up; minimum loss to follow-up (1%).
Time of cohort follow-up vs. time when measurements are made: Concurrent cohorts give the most control because measurements are made at the same time as cohort assembly and follow-up (most texts call these prospective cohorts). Non-concurrent cohorts rely on obtaining measurements made in the past (most texts call these retrospective cohorts). Mixed cohorts obtain some measures made in the past and the rest at the same time as follow-up.
Selecting a non-concurrent cohort from a current administrative database: It is not a cohort study if you sample persons currently in the database in order to ensure retrospective data from past years; that is a cross-sectional sample, with no loss to follow-up by definition. You must sample individuals from some baseline in the past in the database, then ascertain outcomes and losses to follow-up from that time forward.
A non-concurrent cohort cannot be defined by presence at the end of follow-up. Figure labels: "Not the cohort"; "This is the cohort".
Main Threats to Validity of a Cohort Study: (1) Subjects lost during follow-up: the number of losses is less important than how the losses are related to the outcome and the risk factor. (2) Ascertainment of outcome: ascertainment should be complete and unbiased with respect to the risk factor.
Subjects lost during follow-up: If losses are random, only power is affected. If disease incidence is the focus, losses related to the outcome bias the results. If the association of the risk factor with disease is the focus, losses bias the results only if they are related to both the outcome and the risk factor. If losses introduce bias in the outcome, the censoring is called informative censoring.
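The loss-to-follow-up rules above can be checked with simple arithmetic on a 2x2 table. The sketch below uses invented counts and retention fractions, and it assumes the odds ratio as the measure of association, for which retention that depends on outcome alone cancels exactly (with other measures, such as the risk ratio, the cancellation is not exact). The names `retained`, `odds_ratio`, and `risk` are hypothetical helpers.

```python
# Hypothetical complete cohort: counts by (exposure, outcome).
full = {
    ("exposed", "case"): 200, ("exposed", "noncase"): 800,
    ("unexposed", "case"): 100, ("unexposed", "noncase"): 900,
}

def retained(table, keep):
    """Apply a retention fraction to every cell of the table."""
    return {cell: n * keep[cell] for cell, n in table.items()}

def odds_ratio(t):
    return (t[("exposed", "case")] * t[("unexposed", "noncase")]) / (
        t[("exposed", "noncase")] * t[("unexposed", "case")])

def risk(t, group):
    """Cumulative incidence among subjects still observed in one group."""
    return t[(group, "case")] / (t[(group, "case")] + t[(group, "noncase")])

# Assumed retention fractions for three loss patterns.
scenarios = {
    # Losses unrelated to exposure or outcome.
    "random": {cell: 0.8 for cell in full},
    # Losses related to outcome only (cases are lost more often).
    "outcome only": {cell: (0.5 if cell[1] == "case" else 0.9) for cell in full},
    # Losses related to both: exposed cases are lost more often.
    "both": {cell: (0.5 if cell == ("exposed", "case") else 0.9) for cell in full},
}

results = {}
for name, keep in scenarios.items():
    t = retained(full, keep)
    results[name] = {"or": odds_ratio(t), "risk_exposed": risk(t, "exposed")}

true_or = odds_ratio(full)             # 2.25 in the complete cohort
true_risk = risk(full, "exposed")      # 0.20 in the complete cohort
for name, r in results.items():
    print(name, "OR:", round(r["or"], 3), "risk in exposed:", round(r["risk_exposed"], 3))
```

With these assumed numbers, random losses leave both the odds ratio and the incidence intact, outcome-related losses distort the incidence but not the odds ratio, and losses tied to both exposure and outcome distort the odds ratio as well.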
The crucial issue is who is leaving the cohort: what bias do the censored observations (losses to follow-up) introduce? The same issue arises with ascertainment of outcome events.
Two Cohort Studies of HCV/HIV Coinfection and Risk of AIDS
Swiss HIV Cohort (Greub, Lancet, 2000): 3111 patients, ‘96-’99; at least two visits; median follow-up 28 mos; HCV+ more rapid disease progression; Adj RH = 1.7 (95% CI = ); no loss to follow-up info
Johns Hopkins Cohort (Sulkowski, JAMA, 2002): 1955 patients, ‘95-’01; at least two visits; median follow-up 25 mos; HCV not associated with disease progression; Adj RH = 1.0 (95% CI = ); no loss to follow-up info
Case-Control Design: Concept of the Study Base. Study base = the population that gave rise to the cases (Szklo and Nieto call it the “reference population”). It is the key concept that shows the link between the case-control design and the cohort design. A case-control design using the study base concept is most easily understood in the setting of a cohort study.
Nested Case-Control Study within a Cohort Study: study base = cohort; controls sampled each time a case is diagnosed = incidence density sampling.
Nested Case-Control Study: In the text example, 4 cases occur at 4 different points in time, giving rise to 4 risk sets of cases and controls. Controls for each case are selected at random in each risk set from cohort subjects under follow-up at the time. It follows from the random selection that a control can later become a case. Results can be just as valid as using the entire cohort; the design gives an unbiased estimate of the rate ratio.
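Risk-set (incidence density) sampling as described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the text's example: the `Subject` record, the function `risk_set_sample`, and the four-person cohort are all hypothetical, tied event times are not handled, and a subject counts as under follow-up at time t if entry <= t < exit.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Subject:
    sid: int        # subject identifier
    entry: float    # time follow-up starts
    exit: float     # time of diagnosis (cases) or censoring (non-cases)
    is_case: bool

def risk_set_sample(cohort, m, rng):
    """For each case, sample up to m controls from subjects still under
    follow-up at the case's diagnosis time. Future cases are eligible."""
    sampled = {}
    cases = sorted((s for s in cohort if s.is_case), key=lambda s: s.exit)
    for case in cases:
        t = case.exit
        at_risk = [s.sid for s in cohort
                   if s.sid != case.sid and s.entry <= t < s.exit]
        k = min(m, len(at_risk))
        sampled[case.sid] = sorted(rng.sample(at_risk, k))
    return sampled

# Hypothetical cohort: subjects 1 and 2 become cases, 3 and 4 are censored.
cohort = [
    Subject(1, 0.0, 2.0, True),
    Subject(2, 0.0, 5.0, True),
    Subject(3, 0.0, 3.0, False),
    Subject(4, 0.0, 6.0, False),
]

risk_sets = risk_set_sample(cohort, m=3, rng=random.Random(0))
print(risk_sets)
```

With m=3 every eligible subject is drawn, so the risk set for subject 1 (diagnosed at t=2) includes subject 2, who is diagnosed later at t=5. By then subject 3 has been censored, so only subject 4 remains eligible as a control.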
Definition of a Primary Study Base: Primary study base = a population that gives rise to cases and that can be defined before the cases appear, by a geographical area or some other identifiable entity such as a health delivery system. A case-control study nested within a cohort is a special case.
Examples of Primary Study Bases: residents of San Francisco during 2001; members of the Kaiser Permanente system in the Bay Area during 2001; military personnel stationed at California bases during 2001.
Example of Case-Control Incidence Density Sampling in a Primary Study Base: Use a cancer registry covering San Francisco County to identify all new cases of glioma during a defined time period. At the time each new glioma case is reported, randomly sample two controls from current residents of San Francisco.
Incidence Density Sampling in a Primary Study Base (e.g., San Francisco County): a nested case-control study in an open cohort, with new subjects (new residents) entering the primary study base.
Case-Control Incidence Density Sampling in a Primary Study Base: Same as nested case-control sampling in a cohort study, with the exception that in-migration of new persons requires one additional assumption. Just as losses to the study base should not bias the results, additions to the study base should not introduce bias.
Case-Based Case-Control Study: The Secondary Study Base. Secondary study base = the population that gave rise to the cases, identified after the cases are diagnosed: those persons who would have been among the cases if they had developed the disease during the time period of study. Start with the cases and then attempt to identify the hypothetical cohort that gave rise to them.
Primary vs. Secondary Base: The main problem with a primary base is often ascertainment of all cases. The main problem with a secondary base is the definition of the base.
Case-Based Case-Control Studies and the Secondary Study Base: The source of cases is often one or more hospitals or other medical facilities. The problem is identifying the population who would come to those institutions if they were diagnosed with the disease. Careful consideration has to be given to the factors causing someone to show up at that institution with that diagnosis.
Case-control study starting with a sample of cases and identifying the secondary study base: sampling can be incidence density, just as in a primary study base.
Case-Based Case-Control Studies. Example: glioma cases seen at UCSF. Difficult because referrals come from many areas. One possible control group might be UCSF patients with a different neurologic disease. Patients from a similar tertiary referral clinic are another possible control group.
Text example of a case-based case-control design, showing sampling of prevalent controls from the secondary study base.
Cross-Sectional Study Design
Case-based design using prevalent cases: essentially the same as a cross-sectional design.
Example of a case-based design using prevalent cases: Sampling glioma patients under treatment in a hospital during the study period. Survival is poor, so patients in treatment will over-represent those who live longest. The nature of the bias is variable and not predictable.
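The over-representation of long survivors among prevalent cases can be illustrated with the steady-state approximation that prevalence is proportional to incidence times mean duration. All numbers below are invented (they are not glioma data), and the scenario is an assumption chosen for clarity: the exposure has no effect on incidence but shortens survival.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)

# Assumed inputs (hypothetical).
p_exposed_incident = 0.5                              # exposed share of new cases
mean_duration = {"exposed": 1.0, "unexposed": 3.0}    # years alive with disease
p_exposed_controls = 0.5                              # exposed share of the study base

# Steady state: each case contributes to the prevalent pool in
# proportion to how long it survives.
w_exposed = p_exposed_incident * mean_duration["exposed"]
w_unexposed = (1 - p_exposed_incident) * mean_duration["unexposed"]
p_exposed_prevalent = w_exposed / (w_exposed + w_unexposed)

or_incident = odds(p_exposed_incident) / odds(p_exposed_controls)
or_prevalent = odds(p_exposed_prevalent) / odds(p_exposed_controls)

print("exposed share among prevalent cases:", p_exposed_prevalent)
print("OR using incident cases:", round(or_incident, 3))
print("OR using prevalent cases:", round(or_prevalent, 3))
```

Under these assumptions the incident-case odds ratio is 1 (no effect on disease occurrence), while the prevalent-case odds ratio falls to about 0.33 purely because exposed cases leave the prevalent pool sooner. With different survival patterns the distortion could go in either direction.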
Study base and case-control design. Critical features of the best case-control design: (1) cases need to consist of all, or a random sample, of the subjects in the study base experiencing the outcome; (2) controls need to consist of a sample of the study base that can be used to estimate the distribution of the exposure (risk factor) in the base.
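A short arithmetic sketch shows why these two features matter: when controls reflect the distribution of exposed and unexposed person-time in the study base, the case-control odds ratio reproduces the cohort rate ratio. The counts and person-years below are hypothetical, and the controls are set to their expected values under incidence density sampling rather than drawn at random.

```python
# Hypothetical study base: person-years at risk and cases by exposure.
person_years = {"exposed": 10_000, "unexposed": 30_000}
cases = {"exposed": 40, "unexposed": 60}

# Cohort analysis: incidence rates and their ratio.
rate = {g: cases[g] / person_years[g] for g in cases}
rate_ratio = rate["exposed"] / rate["unexposed"]

# Case-control analysis: all cases, plus 400 controls whose exposure
# distribution mirrors the person-time in the base (expected counts).
n_controls = 400
total_py = sum(person_years.values())
controls = {g: n_controls * person_years[g] / total_py for g in person_years}

odds_ratio = (cases["exposed"] * controls["unexposed"]) / (
    cases["unexposed"] * controls["exposed"])

print("rate ratio:", round(rate_ratio, 3))
print("odds ratio:", round(odds_ratio, 3))
```

Both quantities come out to 2 here. In a real study the controls are a random sample, so the odds ratio would match the rate ratio only up to sampling error.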
Summary Points: Ecological studies are weak in showing cause but have some valuable features. The nature, not the size, of losses to follow-up is crucial in cohort studies. The key to case-control design is specifying and sampling the study base. Case-control results can be as valid as cohort results if the study is properly designed and measurements are made without bias.