Epidemiological study design

Presentation on theme: "Epidemiological study design"— Presentation transcript:

1 Epidemiological study design
Chakrarat Pittayawonganon, MD, MPH FETP, Bureau of Epidemiology Department of Disease Control Ministry of Public Health

2 Review of the previous lesson Counts, rate, ratio, proportion: numerator versus denominator; is the numerator a subset of the denominator? Rates: instantaneous rate (km/hr), average rate (30 deaths/year). Prevalence and incidence are defined with respect to time (a point in time or a period of time) and distinguish existing cases from newly arising cases. Incidence: new cases of a disease that develop over a period of time. Prevalence: existing cases of a disease at a particular point in time or over a period of time. Cumulative incidence = individual risk (new cases / number disease-free at the start of follow-up). Problems: dynamic cohorts, and deaths from diseases other than the disease of interest (competing risk).

3 Review of the previous lesson: relationship of incidence and prevalence
Prevalence rate, attack rate, incidence rate: each is defined over a period of time or at a point in time. Denominator: the population at risk of the disease, or the total population. Consider their importance, interpretation, and use. They are obtained in different ways, for example from disease surveillance or from surveys. Attack rate = the percentage of the susceptible population that becomes ill with the disease. Relationship of incidence and prevalence: P = I x D, where P = prevalence, I = incidence, D = duration of the disease.
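The relationship P = I x D can be illustrated with a small calculation. The following is a minimal sketch, assuming a stable population and a low prevalence (the conditions under which the approximation is usually applied); the function name and the numbers are hypothetical and do not come from the slides.

```python
def prevalence_from_incidence(incidence_per_person_year: float,
                              duration_years: float) -> float:
    """Approximate steady-state prevalence as P = I x D."""
    return incidence_per_person_year * duration_years


# Hypothetical disease: 2 new cases per 1,000 person-years,
# lasting 5 years on average.
p = prevalence_from_incidence(0.002, 5.0)
print(f"Approximate prevalence: {p:.1%}")
```

A disease with the same incidence but a shorter duration would show a proportionally lower prevalence, which is why prevalence alone says little about how quickly new cases arise.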

4 Quiz Which ones of these “rates” are true rates? ____ Attack rate
____ Incidence rate ____ Five-year survival rate ____ Infant mortality rate ____ Prevalence rate ____ Age-specific incidence rate ____ Case-fatality rate ____ Cause-specific mortality rate Confusing Risk and rate

5 Quiz Which ones of these “rates” are true rates?
__F__ Attack rate Proportion: Cases/Total N
__T__ Incidence rate (IR; 0 to infinity)
__F__ Five-year survival rate Proportion: Survivors/Total cases
__F__ Infant mortality rate Proportion: Infant deaths/Total infants
__F__ Prevalence rate Proportion: Existing cases/Total population
__T__ Age-specific incidence rate
__F__ Case-fatality rate Proportion: Fatal cases/Total cases
__T__ Cause-specific mortality rate (Deaths caused by a specific disease per 1,000 population per year)

6 Descriptive Studies Organize and summarize data according to time, place, and person. Describe natural history of disease Extent of public health problem Identify populations at greatest risk Allocation of health care resources Suggest hypothesis about causation

7 Study question → Study design → Results → Answer
Study design can be experimental or observational. Errors can be random or systematic (bias or confounding). The results reflect both TRUTH and ERROR. ERROR: random, or systematic (selection bias, information bias).

8 Design tree: major epidemiologic study design
Study designs. Descriptive: case report; case series; descriptive study based on rates. Analytic: randomized; non-randomized (quasi-experiment); cohort study (prospective, retrospective); case-control study; cross-sectional study; longitudinal study; other.

9 Caveats What is the difference between a cohort and a cohort study design?
Do I need to specify a particular hypothesis for my study? Specify the study hypothesis before undertaking data collection. A two-sided hypothesis does not specify the direction of the association. Statistical analysis is based on inferential reasoning: drawing conclusions about a population based on observations of a sample of that population. Can my study have more than one question? Absolutely.

10 What is a cohort? Cohort: Latin word for one of the 10 divisions of a Roman legion. A group of individuals sharing the same experience and followed up for a specified period of time. Examples: birth cohort; occupational cohort (chemical plant workers); a Rapid Response Team. Slide 10 What is a cohort? A cohort is defined as a group of subjects followed during a period of time. The name cohort derives from the military organization of the ancient Roman Empire, in which one cohort was composed of 300 soldiers out of the total of 3000 composing a legion. In epidemiology, the term cohort defines a group of individuals who share the same environment and experience. The cohort is followed by the investigators over a certain period of time. Some examples of cohorts are: a birth cohort (children born in the same year); an occupational cohort (people working in the same plant or agricultural field); a group of persons involved in the same outbreak investigation as a Rapid Response Team.

11 Application in real situations
Cohort study: Is follow-up required? There must be evidence that subjects were not yet ill and later became ill, especially in a retrospective cohort study. The exception is an infectious disease outbreak investigation, where a pre-illness status can be assumed (this differs between schools of teaching). Is it necessary to study the entire population of the area? A cohort study can be carried out in part of the population, provided the boundaries are clear, for example a group of people, a classroom, a dormitory building, or a specific time period. The association between exposure/risk and outcome/disease can be analyzed by dividing those who are not yet ill according to whether or not they have the exposure/risk under study.

12 Tips and notes Disease-free does not imply healthy: it is incorrect to conclude that the population at risk is healthy. Population at risk and a cohort: closed and open (dynamic) cohorts. Closed cohort: can estimate a risk or an incidence rate with little distortion, provided that the period of follow-up is short enough and competing risks are small enough in relation to the disease under study. Dynamic cohort: cannot directly estimate risk (new people are added during the follow-up period); however, the incidence rate is suitable when precise information on the amount of follow-up time is available.
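The distinction between risk in a closed cohort and incidence rate in a dynamic cohort can be sketched as follows. This is an illustrative example only: the function names and all numbers are hypothetical, and the dynamic cohort is represented simply as a list of (years at risk, became a case) pairs.

```python
def cumulative_incidence(new_cases: int, disease_free_at_start: int) -> float:
    """Risk in a closed cohort: new cases / people disease-free at the start."""
    return new_cases / disease_free_at_start


def incidence_rate(new_cases: int, person_time: float) -> float:
    """Rate in any cohort: new cases / total person-time at risk."""
    return new_cases / person_time


# Hypothetical closed cohort: 1,000 disease-free people, 50 new cases.
risk = cumulative_incidence(50, 1000)

# Hypothetical dynamic cohort: (years at risk contributed, became a case?).
# People enter and leave at different times, so only person-time is summed.
follow_up = [(2.0, False), (0.5, True), (1.5, False), (1.0, True), (3.0, False)]
cases = sum(1 for _, became_case in follow_up if became_case)
person_years = sum(years for years, _ in follow_up)
rate = incidence_rate(cases, person_years)

print(f"Risk: {risk:.1%}")
print(f"Rate: {rate:.2f} cases per person-year")
```

The rate needs only each person's time at risk, which is why it remains usable when membership of the cohort changes during follow-up.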

13 Cohort studies Intuitive approach to studying disease incidence and risk factors: 1. Start with a population at risk 2. Measure characteristics at baseline 3. Follow up the population over time with a) surveillance or b) re-examination 4. Compare event rates in people with and without characteristics of interest Cohort studies, of which intervention studies are a special case, are an "intuitive" approach to studying the incidence of disease and associations between risk factors and health outcomes. We start by identifying a "population at risk", a group of people who are in fact at risk (as far as we know) of developing the disease. For a chronic disease, that generally means that they have not yet developed it. We then measure the characteristics of this population at baseline, i.e., the start of a period of follow-up. We follow up the population over time, either through (a) surveillance of routinely detected cases (people get symptoms and go to their health care providers, who diagnose them or, sometimes, people get screened and then referred for diagnosis) or (b) re-examining people periodically or at the end of the follow-up. We then compare the incidence of events (development of the outcome) in people with characteristics we suspect may increase risk to the incidence in people without those characteristics.

14 Cohort studies Can be large or small Can be long or short
Can be simple or elaborate Can be local or multinational For rare outcomes need many people and/or lengthy follow-up May have to decide what characteristics to measure long in advance Cohort studies are, of course, quite diverse. There are large cohort studies (e.g., the Nurses Health Study with nearly 100,000 participants, or even the entire population of a country, by making use of the comprehensive population registries that exist in Scandinavian countries) and small ones with only a few dozen participants. Cohort studies can be long, continuing over many years (e.g., the Framingham study), or short (e.g., a cohort of Egyptian newborns followed for 1 year for rotavirus; Naficy, et al. AJE 1999;150:770-7). Cohort studies can be simple or elaborate, local or multinational. Since we generally need to have at least tens of cases, preferably hundreds, and sometimes thousands, if incidence is low we need to follow many people and/or follow them for a long time. Among other things, that means that we often have to decide what risk characteristics we will study long before we get the results, by which time understanding and measurement technology will have evolved.

15 Prospective Cohort Study
Timeline: the study starts, then exposure occurs, then disease occurs. The population is classified as exposed (exp +) or unexposed (exp -), and each group is followed for illness (ill + / ill -). Selection of population; prospective assessment of exposure and disease. Examples: growth-nutrition studies; folic acid and neural tube defects. Slide 15 Prospective cohort study Cohort studies can be prospective or retrospective. The prospective studies represented in this slide are based on the following: the population is selected before the disease under investigation develops, and any potential subject who is found to have the disease at enrolment is excluded. A cohort study sometimes begins by enrolling everyone in a defined population regardless of exposure status, then characterizing each person's exposure status after enrolment, as described in this diagram. The population is then followed over a certain period of time, and the occurrence or not of the disease under investigation is noted in the exposed group as well as in the unexposed group. The occurrence of disease among persons with different exposures is thus compared to assess whether the exposures are associated with an increased risk of disease.

16 Prospective cohort study
Timeline: exposure occurs, then the study starts, then disease occurs. Exposed (exp +) and unexposed (exp -) groups are each followed for illness (ill + / ill -). Selection based on exposure; prospective assessment of disease. Examples: Chernobyl, industrial accidents, flood victims. Slide 16 Prospective cohort study A cohort study can also begin with the enrolment of persons based on their exposure status. The population is divided into a group of people exposed to one or more defined risk factors and a reference group of unexposed people. The population is followed over a certain period of time, and the occurrence or not of the disease(s) under investigation is noted in the exposed group as well as in the unexposed group. The occurrence of disease among persons with different exposures is thus compared to assess whether the exposures are associated with an increased risk of the disease.

17 Retrospective cohort study Transversal studies
Timeline (real time): exposure occurs, then disease occurs; the study takes place now, classifying people by exposure (exp + / exp -) and illness (ill + / ill -). Selection based on population; retrospective assessment of exposure and disease. Examples: food-borne outbreaks, closed-environment outbreaks (schools, prisons, etc.). Slide 17 Retrospective cohort study A cohort study in which subjects are enrolled after the disease has already occurred is called a retrospective cohort study. Usually these kinds of studies are done when a specific population (cohort) can be easily defined and studied, for example all of the participants of a wedding ceremony, the passengers of an airplane or a cruise ship where an outbreak has occurred, the students of a school, or the soldiers of a specific barrack. All enrolled persons, ill and not ill, provide information on both their exposure and whether or not they became ill. The attack rates among exposed and unexposed people are then compared to identify the risk factor(s) associated with the health event under investigation.

18 Effect measures in cohort studies
Hypothesis: is the incidence among the exposed higher than among the unexposed? Absolute measure: risk difference, $RD = I_{e+} - I_{e-}$. Relative measure: relative risk (rate ratio or risk ratio), $RR = I_{e+} / I_{e-} = \frac{a/(a+b)}{c/(c+d)}$. Slide 18 Effect measures in cohort studies Computing the two incidences, among the exposed and among the unexposed, allows us to compute an absolute measure, the difference between the two incidences, and a relative measure, the relative risk. As the slide shows, the relative risk is the risk among the exposed divided by the risk among the unexposed; incidence is a measure of risk. If RR equals one, there is no association: the risk in the two groups, exposed and unexposed, is the same. Values of RR greater than one suggest an association between the exposure and the occurrence of the event. Values of RR below one suggest a protective association between the exposure and the occurrence of the event. A cohort study usually uses a sample and not the entire population, so these interpretations are valid only if the results are statistically significant. Significance can be judged from the confidence interval (CI): a 95% CI that excludes one indicates a statistically significant association. The 95% CI can be roughly interpreted as the range of values that, in the absence of bias, has a 95% chance of including the true value. The risk difference (RD), or attributable risk (AR), is a measure of association that provides information about the absolute effect of the exposure, that is, the excess risk of disease in those exposed compared with those unexposed. The AR quantifies the risk of disease in the exposed group that can be considered attributable to the exposure.
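The effect measures on this slide can be written out in full. In the sketch below, the cell labels are an assumption inferred from the slide's expression a/(a+b) and c/(c+d): a and b are the exposed who did and did not become ill, c and d the unexposed who did and did not become ill. The confidence interval shown is the common log-based approximation; it is not given on the slide and is included only to illustrate how a CI for RR is typically obtained.

```latex
\begin{align*}
I_{e+} &= \frac{a}{a+b}, \qquad I_{e-} = \frac{c}{c+d} \\[4pt]
RD &= I_{e+} - I_{e-} \\[4pt]
RR &= \frac{I_{e+}}{I_{e-}} = \frac{a/(a+b)}{c/(c+d)} \\[4pt]
95\%\ \mathrm{CI}(RR) &\approx \exp\!\left( \ln RR \pm 1.96
  \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}} \right)
\end{align*}
```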

19 Presentation of cohort data Population at risk
Does HIV infection increase the risk of developing TB among a population of drug users?

HIV status   Drug users (f/u 2 years)   TB cases   Incidence (%)
HIV +        215                        8
HIV -        289                        1

Source: Selwyn et al., New York, 1989
Slide 19 Presentation of cohort data: Population at risk In this example, we assess the RR in a prospective cohort study. The study asked whether HIV infection increased the risk of developing tuberculosis among a population of drug users followed up for two years. The incidence among the HIV-positive is 3.7% (8/215). The incidence among the HIV-negative is 0.3% (1/289). Dividing the rounded percentages gives RR = 3.7/0.3, or about 12; the unrounded risks give (8/215)/(1/289), or about 10.8.

20 Presentation of cohort data Population at risk
Does HIV infection increase the risk of developing TB among a population of drug users?

HIV status   Drug users (f/u 2 years)   TB cases   Incidence (%)
HIV +        215                        8          3.7 (8/215)
HIV -        289                        1          0.3 (1/289)

Source: Selwyn et al., New York, 1989

21 Presentation of cohort data Population at risk
Does HIV infection increase the risk of developing TB among a population of drug users?

HIV status   Drug users (f/u 2 years)   TB cases   Incidence (%)   Relative risk
HIV +        215                        8          3.7 (8/215)     12
HIV -        289                        1          0.3 (1/289)

Source: Selwyn et al., New York, 1989
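The figures in this example can be recomputed directly from the counts (8 TB cases among 215 HIV-positive and 1 among 289 HIV-negative drug users). The sketch below is illustrative: the function name is hypothetical, and the 95% confidence interval uses the common log-based approximation, which the slides do not specify. The ratio is computed from unrounded risks, so it differs somewhat from a figure obtained by dividing the rounded percentages.

```python
import math


def risk_ratio(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int,
               z: float = 1.96):
    """Return risks, risk ratio, risk difference and an approximate 95% CI."""
    risk_exp = cases_exp / n_exp
    risk_unexp = cases_unexp / n_unexp
    rr = risk_exp / risk_unexp
    rd = risk_exp - risk_unexp
    # Standard error of ln(RR), log-based (Katz) approximation.
    se = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    ci = (math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se))
    return risk_exp, risk_unexp, rr, rd, ci


r1, r0, rr, rd, (lo, hi) = risk_ratio(8, 215, 1, 289)
print(f"Incidence HIV+: {r1:.1%}, HIV-: {r0:.1%}")
print(f"RR = {rr:.1f}, RD = {rd:.1%}, 95% CI {lo:.1f} to {hi:.1f}")
```

With only one case in the unexposed group the interval is very wide, which is a reminder that the point estimate alone says little about precision.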

22 Advantages and disadvantages of cohort studies
Advantages: Can measure incidence and risks. Good for rare exposures. Clear temporal relationship between exposure and outcome. Less subject to selection bias. Disadvantages: Requires a large sample size. Latency period. Loss to follow-up. Ethical considerations. Resource intensive. High cost. Time-consuming. Slide 22 Advantages and disadvantages of cohort studies What are the advantages and disadvantages of a cohort study? The main advantages are that it is possible to measure incidence, and thus the actual risk associated with the exposure; that it is good for rare exposures; and that it can show a clear temporal relationship between exposure and outcome. One of the main disadvantages of cohort studies, in particular prospective cohort studies, is that a large sample size is required, which demands considerable resources in cost and time. Ethical problems can arise, especially if the study is conducted when good evidence of an association between the exposure and the disease already exists. In studies with a long period of follow-up, many enrolled persons can drop out for reasons that cannot be controlled, which reduces the power and validity of the study.

23 Case-Control Study
Timeline (real time): exposure occurred, then disease occurred; the study takes place now. Selection based on disease status (ill + / ill -); retrospective assessment of exposure (exp + / exp -). Slide 23 Case-control study A case-control study is a type of observational analytic epidemiologic investigation in which subjects are selected on the basis of whether they do (cases) or do not (controls) have a particular disease under study. The groups are then compared with respect to the proportion having a history of an exposure. Case-control studies offer a number of advantages for evaluating the association between an exposure and a disease. The case-control design offers a solution to the difficulties of studying diseases with very long latency periods, since investigators can identify affected and unaffected individuals and then look backward in time to assess their antecedent exposure, rather than having to wait a number of years for the disease to develop.

24 When is it desirable to conduct a case-control study?
When exposure data are expensive or difficult to obtain - Ex: Pesticide study described earlier When disease has long induction and latent period - Ex: Cancer, cardiovascular disease

25 When is it desirable to conduct a case-control study?
When the disease is rare Ex: Studying risk factors for birth defects When little is known about the disease Ex. Early studies of AIDS, H5 When underlying population is dynamic Ex: Studying breast cancer on Cape Cod

26 Advantages and disadvantages of case-control studies
Advantages: Suitable for rare diseases. Can explore several exposures. Low cost. Rapid. Can cope with long latency. Small sample size. No ethical problems. Disadvantages: Cannot calculate the risk. Not suitable for rare exposures. Temporal relationship difficult to establish. Subject to bias: selection of controls, recall bias. Slide 26 Advantages and disadvantages of case-control studies The case-control study has its share of advantages and disadvantages. Case-control studies are well suited to the study of rare diseases because once the cases have been found, you can look backward to find the possible risk factors associated with them. The design is suitable for studying several exposures at the same time. It is quick and cheap, does not require a big sample, and does not pose ethical problems because it is a retrospective study: what we are studying has already happened. As to limits and disadvantages: case-control studies compute the OR, which is an estimate of the RR, so they do not directly measure risk or incidence. The design is not suitable for rare exposures unless the study is very large or the exposure is common among those with the disease. The temporal relationship between exposure and disease may be difficult to establish because the study is retrospective. Finally, probably the greatest limitation of case-control studies is that they are more susceptible to bias than other analytical study designs, in particular selection bias and recall bias.

27 Example: Is gastro-esophageal reflux a risk factor for esophagus cancer?
How were cases selected? Were cases representative of patients with disease? How were controls selected? Were controls representative of patients from source population without disease? How were risk factors measured? How did they minimize measurement bias for risk factors? How were outcomes measured? How did they minimize measurement bias for outcomes?

28 Setting-up a case-control study
Identify group of cases. Identify group of controls. Question both groups for possible exposure. Measure the frequency of exposure occurrence in both groups. Compare the frequency of exposure between cases and controls. Slide 28 Setting-up a case-control study What we need to set up a case-control study: Identify the group of cases, usually all or part of the people affected by a disease during an outbreak. Identify a group of controls, meaning people who share the same conditions, environment and risk BUT who are healthy or without the specific disease under study. Question both groups about exposure history. In practical terms, this means producing a questionnaire and administering it to the cases and controls. The questionnaire is about possible exposures: What did you eat? What did you drink? Where did you go? Where do you live? Once you have the results, measure the proportion of cases and the proportion of controls who were exposed to each specific exposure. Compare the frequency of exposure between the cases and the controls.

29 Case-control studies FROM SOURCE POPULATION:
Select cases with outcome (representative of cases in source population). Select controls without outcome (same exposure distribution to RF as source population): hospital, clinic, neighborhood, population. Can be > 1 control per case (increases power and face validity, and decreases selection bias). Outcome can be disease, disability or positive outcome. Measure strength of association of RF and outcome with OR (~RR).

30 Two Characteristics of Cases
Representativeness: Ideally, cases are a random sample of all cases of interest in the source population (e.g. from vital data, registry data). More commonly they are a selection of available cases from a medical care facility. (e.g. from hospitals, clinics) Methods of selection: Selection may be from incident or prevalent cases Incident cases are those derived from ongoing ascertainment of cases over time Prevalent cases are derived from a cross-sectional survey

31 Selection of Cases Population-based cases:
Include all subjects or a random sample of all subjects with the disease at a single point or during a given period of time in the defined population. Hospital-based cases: All patients in a hospital department at a given time

32 Controls Definition: A sample of the source population that gave rise to the cases. Purpose: To estimate the exposure distribution in the source population that produced the cases.

33 Characteristics of Controls
Who is the best control? Where should controls come from? If cases are a random sample of all cases in the population, then controls should be a random sample of all non-cases in the population sampled at the same time (i.e. from the same study base). But if study cases are not a random sample of the universe of all cases, it is not likely that a random sample of the population of non-cases will constitute a good control population.

34 Three Qualities Needed in Controls
Comparability is more important than representativeness in the selection of controls The control should be at risk of the disease The control should resemble the case in all respects except for the presence of disease

35 Comparability vs. Representativeness
Usually, cases in a case-control study are not a random sample of all cases in the population. If so, the controls must be selected in the same way (and with the same biases) as the cases. It follows from the above that a pool of potential controls must be defined. This is a universe of people from whom controls may be selected (study base).

36 Three Qualities Needed in Controls
Cases emerge within a study base. Controls should emerge from the same study base, except that they are not cases. For example, if cases are selected exclusively from hospitalized patients, controls must also be selected from hospitalized patients.

37 Three Qualities Needed in Controls
If cases must have gone through a certain ascertainment process (e.g. screening), controls must have also. (e.g. mammogram-detected breast cancer) If cases must have reached a certain age before they can become cases, so must controls. (thus we always match on age) If the exposure of interest is cumulative over time, the controls and cases must each have the same opportunity to be exposed to that exposure. (if the case has to work in a factory to be exposed to benzene, the control must also have worked where he/she could be exposed to benzene)

38 Sources of controls a) Population of defined area b) Hospital patients
c) Probability sample of total population d) Neighbors (i) walk (door to door) (ii) phone (random digit dialing) (iii) letter carrier routes e) Friends or associates of cases f) Siblings, spouses or other relatives g) Other

39 Selection of Controls General population controls: Most often used when cases are selected from a defined geographic population. Sources: registries, households, telephone sampling, drivers' license records. Advantages: assured that they come from the same base population as the cases. Disadvantages: time consuming, expensive, hard to contact and to get cooperation from (the non-response rate may be high); may remember exposures differently than cases (recall bias).

40 Selecting Controls Hospital controls
Used most often when cases are selected from a hospital population Easy to identify; less recall bias; higher response rate Example: Study of cigarette smoking and myocardial infarction among women. Cases identified from admissions to hospital coronary care units. Controls drawn from surgical, orthopedic, and medical unit of same hospital. Controls included patients with musculoskeletal and abdominal disease, trauma, and other non-coronary conditions.

41 Hospital controls
Advantages: Same selection factors that led cases to hospital led controls to hospital. Easily identifiable and accessible (so less expensive than population-based controls). Accuracy of exposure recall comparable to that of cases, since controls are also sick. More willing to participate than population-based controls. Disadvantages: Since hospital-based controls are ill, they may not accurately represent the exposure history in the population that produced the cases. Hospital catchment areas may be different for different diseases.

42 What illnesses make good hospital controls?
Those illnesses that have no relation to the risk factor(s) under study Example: Should respiratory diseases be used as controls for a study of smoking and myocardial infarction? Do they represent the distribution of smoking in the entire population that gave rise to the cases of MI?

43 Selecting Controls Special control groups include friends, spouses, siblings, and deceased individuals. These special controls are rarely used. Some cases are not able to nominate controls because they have few appropriate friends, are widowed, or are only or adopted children. Dead controls are tricky to use because they are more likely than living controls to have smoked and drunk alcohol.

44 Misconception about Control Selection
Representativeness. Wrong: controls should be representative of all persons with the disease, or of the entire non-diseased population. Correct: the source population for the cases is the one that the controls should represent. Exposure opportunity: not needed, as in a real follow-up study.

45 Basic Analysis For one control
Data are expressed in a four-fold table, and an odds ratio is calculated (relative risks have no meaning here; why?)

             Cases   Controls
Exposed      a       b
Unexposed    c       d

OR = ad/bc
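The odds ratio from the four-fold table can be written out as below, using the slide's cell labels (a and c are exposed and unexposed cases, b and d exposed and unexposed controls). One common answer to the slide's question is that the investigator fixes the numbers of cases and controls, so a/(a+b) is not the risk of disease among the exposed. The confidence interval is the usual log-based (Woolf) approximation; it does not appear on the slide and is added here only as an illustration.

```latex
\begin{align*}
OR &= \frac{a/c}{b/d} = \frac{ad}{bc} \\[4pt]
95\%\ \mathrm{CI}(OR) &\approx \exp\!\left( \ln OR \pm 1.96
  \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \right)
\end{align*}
```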

46 Multiple Exposure Levels
Exposure level   Cases   Controls   OR
High             A1      B1         OR1
Medium           A2      B2         OR2
Low              A3      B3         OR3
Not exposed      C       D          Reference

Slide 46 Multiple exposure levels If an association between an exposure and a health event has been established, epidemiologists often determine whether there is any difference related to the level of exposure, as described in the hypothetical table in the slide. This analysis is called dose-response: the risk of disease increases with the amount of exposure. The way to do this analysis is to draw a two-by-N table where N represents the categories or doses of exposure. In our example we have defined three categories of exposure: high, medium and low. The unexposed category is used as the reference, meaning that each OR is computed using the C and D cells with the respective A and B. Ex: OR1 = (A1 x D) / (B1 x C), OR2 = (A2 x D) / (B2 x C), and so on.

47 Relation of hepatocellular adenoma to duration of oral contraceptive use in 79 cases and 220 controls

Months of OC use   Cases   Controls   Odds ratio
0-12               7       121
13-36              11      49
37-60              20      23
61-84              21      20
>= 85              20      7
Total              79      220

Source: Rooks & col. 1979
Slide 47 Relation of hepatocellular adenoma to duration of oral contraceptive use in 79 cases and 220 controls In this example, the epidemiologists wanted to study how the duration of oral contraceptive (OC) use affects the development of hepatocellular adenoma. The study showed that the longer OCs were used, the higher the risk of developing the disease.

48 Relation of hepatocellular adenoma to duration of oral contraceptive use in 79 cases and 220 controls

Months of OC use   Cases   Controls   Odds ratio
0-12               7       121        Ref.
13-36              11      49         3.9
37-60              20      23         15.0
61-84              21      20         18.1
>= 85              20      7          49.7
Total              79      220

Source: Rooks & col. 1979
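The odds ratios in this table can be recomputed from the case and control counts given on slides 47 and 48, with the 0-12 month category as the reference. The sketch below is illustrative, and the function and variable names are hypothetical. The first three recomputed values agree with the slide; for the >= 85 month category the counts as transcribed give roughly 49.4 rather than the 49.7 shown, so either the slide's value or one of its counts may be slightly off.

```python
def odds_ratio(cases: int, controls: int, ref_cases: int, ref_controls: int) -> float:
    """OR for one exposure level against the reference level: (A x D) / (B x C)."""
    return (cases * ref_controls) / (controls * ref_cases)


# (cases, controls) by months of oral contraceptive use, from the slide.
reference = (7, 121)  # 0-12 months
levels = {
    "13-36": (11, 49),
    "37-60": (20, 23),
    "61-84": (21, 20),
    ">=85": (20, 7),
}

ors = {label: odds_ratio(cases, controls, *reference)
       for label, (cases, controls) in levels.items()}

for label, value in ors.items():
    print(f"{label:>6} months: OR = {value:.1f}")
```

The steadily increasing odds ratios across exposure levels are what the slides call a dose-response relationship.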

49 Do you believe their results?
Selection bias? Cases, controls Measurement bias? Outcomes, Risk factors Causation? Strength of association: between exposure and illnesses Dose response frequency, severity, duration of symptoms Biological plausibility: too subjective, causal/non-causal

50 Case-Control Studies: Biases
Bias in measurement of risk factors because: Retrospective measurement Differential recall bias Decrease measurement bias for outcomes and RF by: Standardize definitions, instrument and process Train assessors Use data recorded before outcome is known Blinding of subject and observer Re-analyze data with more conservative definitions

51 Case-Control Studies: decrease biases
Decrease selection bias by: Population-based sample (cases from a registry; controls from the same population, e.g. by random digit dialing). Sample cases and controls in the same way (same clinic) so that risk factors/exposure are the same. Minimize non-participants. More than one control group (increases power and generalizability). Matching: case and control are comparable on a RF that is not of interest or not modifiable, e.g. age, gender. Advantages: increased precision, decreased confounding. Disadvantages: loss of data, increased time, cost, complexity, irreversible. Are cases representative of all those with the outcome? Consider the misdiagnosed, the undiagnosed, and the dead. Convenience.

52 SIX ISSUES IN MATCHING CONTROLS, CASE-CONTROL STUDY
 1. Identify the pool from which controls may come. This pool is likely to reflect the way controls were ascertained (hospital, screening test, telephone survey). 2. Control selection is usually through matching. Matching variables (e.g. age), and matching criteria (e.g. control must be within the same 5 year age group) must be set up in advance. 3. Controls can be individually matched or frequency matched

53 SIX ISSUES IN MATCHING CONTROLS, CASE-CONTROL STUDY
INDIVIDUAL MATCHING: search for one (or more) controls who have the required MATCHING CRITERIA. PAIRED or TRIPLET MATCHING is when there is one or two controls individually matched to each case. FREQUENCY MATCHING: select a population of controls such that the overall characteristics of the group match the overall characteristics of the cases. e.g. if 15% of cases are under age 20, 15% of the controls are also.

54 SIX ISSUES IN MATCHING CONTROLS, CASE-CONTROL STUDY
 4. AVOID OVER-MATCHING. Match only on factors known to be causes of the disease. 5. Obtain POWER by matching MORE THAN ONE CONTROL PER CASE. In general, N of controls should be ≤ 4, because there is no further gain of power above four controls per case. 6. Obtain GENERALIZABILITY by matching more than ONE TYPE OF CONTROL.
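The "no more than four controls per case" guideline can be illustrated with a common rule of thumb that is not stated on the slide: the statistical efficiency of r controls per case, relative to an unlimited number of controls, is roughly r / (r + 1). The function name below is hypothetical, and the sketch assumes equal cost per subject and a fixed number of cases.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of r controls per case relative to
    unlimited controls, using the rule of thumb r / (r + 1)."""
    if controls_per_case < 1:
        raise ValueError("need at least one control per case")
    return controls_per_case / (controls_per_case + 1)


# Show how the gain from each extra control shrinks.
for r in range(1, 7):
    print(f"{r} control(s) per case: efficiency ~ {relative_efficiency(r):.3f}")
```

Under this approximation, going from one to four controls raises efficiency from 0.50 to 0.80, while a fifth control adds only about 0.03.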

55 Paired Analysis Case Exposed Unexposed Both Mixed Controls Neither

56 McNemar chi2 = (t - s)^2 / (t + s)
Paired Analysis For one control Case Exposed Unexposed r s Controls t u (r and u are concordant pairs; s and t are discordant pairs) McNemar chi2 = (t - s)^2 / (t + s)

57 More points about case-control analysis
The odds ratio is a good estimate of the relative risk when the disease is rare (prevalence <20%). Can be extended to N>1 controls. Statistical testing is by simple chi-square (unmatched analysis) or by McNemar's chi-square (matched-pairs analysis). Can be extended to multiple strata (Mantel-Haenszel chi-square).
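The rare-disease point can be checked numerically. This is a minimal sketch with hypothetical risks (not from the slides) that compares the risk ratio with the odds ratio for a rare and a common outcome; the function name is an assumption made for illustration.

```python
def risk_ratio_and_odds_ratio(risk_exposed: float, risk_unexposed: float):
    """Return (risk ratio, odds ratio) for two risks strictly between 0 and 1."""
    rr = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return rr, odds_exposed / odds_unexposed


# Hypothetical rare outcome: risks of 2% vs 1%.
rr_rare, or_rare = risk_ratio_and_odds_ratio(0.02, 0.01)
# Hypothetical common outcome: risks of 40% vs 20%.
rr_common, or_common = risk_ratio_and_odds_ratio(0.40, 0.20)

print(f"rare:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")      # RR = 2.00, OR = 2.02
print(f"common: RR = {rr_common:.2f}, OR = {or_common:.2f}")  # RR = 2.00, OR = 2.67
```

With the same risk ratio of 2, the odds ratio stays close to it when the outcome is rare and drifts away from it as the outcome becomes common.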

58 Matching

59 Case-control study of lung cancer and uranium mining
Cases Controls

60 Cases Controls

61 Disease status Cases Controls

62 Matching types Individual (pair wise) Group (frequency)

63 Matching continuous variables
Category matching: Case is a 42 year old black male. Divide controls by age group: 30-34, 35-39, 40-44, 45-49, etc. Control is a black male from the same age group (40-44). Caliper matching: Control is a black male aged 42 ± 5 years.
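The two rules can be sketched for the age variable alone (race and sex would be checked separately). The function names, the band start of 30 and the band width of 5 are assumptions taken from the age groups listed on the slide.

```python
def category_match(case_age: int, control_age: int,
                   band_start: int = 30, band_width: int = 5) -> bool:
    """True if both ages fall in the same band (30-34, 35-39, 40-44, ...)."""
    case_band = (case_age - band_start) // band_width
    control_band = (control_age - band_start) // band_width
    return case_band == control_band


def caliper_match(case_age: int, control_age: int, caliper: int = 5) -> bool:
    """True if the control's age is within +/- caliper years of the case's age."""
    return abs(case_age - control_age) <= caliper


case_age = 42
for control_age in (39, 44, 45, 48):
    print(control_age,
          "category:", category_match(case_age, control_age),
          "caliper:", caliper_match(case_age, control_age))
```

For the 42-year-old case, a 39-year-old control fails category matching (different band) but passes caliper matching, which shows that the two rules accept different sets of controls.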

64 Individual matching Possible combinations Data layout

65 Individual matching Note: these are pairs not individuals

66 X Y Note: in calculating matched odds ratio (mOR) only discordant pairs are taken into account

67 Example: Matched case-control study of work at a uranium mine and reduced sperm count. Cases: 400 men with low sperm count diagnosed in a Utah clinic. Controls: 400 healthy men matched on race, age, area of residence, smoking and drinking habits.

68 Results Matched pairs in which both men worked in uranium mine: 8
Matched pairs in which case had mine exposure but control did not: 18 Matched pairs in which case had no mining background but control did: 4 Matched pairs in which neither had worked in the mines: 370
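As a worked sketch of the matched analysis for these results, the code below applies the standard matched-pairs formulas: the matched odds ratio is the ratio of the two discordant pair counts, and McNemar's chi-square (1 df, no continuity correction) is the squared difference of the discordant counts divided by their sum. Variable and function names are illustrative.

```python
def matched_odds_ratio(case_only_exposed: int, control_only_exposed: int) -> float:
    """Matched OR: uses only the discordant pairs."""
    return case_only_exposed / control_only_exposed


def mcnemar_chi2(case_only_exposed: int, control_only_exposed: int) -> float:
    """McNemar chi-square with 1 df, without continuity correction."""
    diff = case_only_exposed - control_only_exposed
    return diff ** 2 / (case_only_exposed + control_only_exposed)


# Pair counts from the Results slide.
both_exposed = 8      # concordant: case and control both worked in the mine
case_only = 18        # discordant: only the case was exposed
control_only = 4      # discordant: only the control was exposed
neither = 370         # concordant: neither was exposed

total_pairs = both_exposed + case_only + control_only + neither
m_or = matched_odds_ratio(case_only, control_only)
chi2 = mcnemar_chi2(case_only, control_only)

print(f"pairs = {total_pairs}")       # pairs = 400
print(f"matched OR = {m_or:.2f}")     # matched OR = 4.50
print(f"McNemar chi2 = {chi2:.2f}")   # McNemar chi2 = 8.91
```

The chi-square of about 8.91 exceeds 3.84, the 5% critical value for 1 degree of freedom, so the association is statistically significant at that level.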

69 What if we performed unmatched analyses?
A wrong result
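To see why the unmatched analysis is misleading here, the sketch below breaks the 400 matched pairs from the Results slide (8 both exposed, 18 case only, 4 control only, 370 neither) into an ordinary 2x2 table and compares the crude odds ratio with the matched one. Variable names are illustrative.

```python
# Pair counts from the Results slide.
both_exposed, case_only, control_only, neither = 8, 18, 4, 370
n_pairs = both_exposed + case_only + control_only + neither  # 400

# Ignore the pairing: count exposed and unexposed individuals per group.
cases_exposed = both_exposed + case_only         # 26
cases_unexposed = n_pairs - cases_exposed        # 374
controls_exposed = both_exposed + control_only   # 12
controls_unexposed = n_pairs - controls_exposed  # 388

# Crude (unmatched) odds ratio: cross-product of the 2x2 table.
unmatched_or = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
# Matched odds ratio: ratio of the discordant pairs.
matched_or = case_only / control_only

print(f"unmatched OR = {unmatched_or:.2f}")  # unmatched OR = 2.25
print(f"matched OR   = {matched_or:.2f}")    # matched OR   = 4.50
```

Ignoring the matching roughly halves the estimate (2.25 versus 4.50), which is the "wrong result" this slide refers to.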

70 Application in real situations
Case-control study: Use a case-control design to study uncommon diseases. Cases and controls should originate from the same population. With individual matching, controls are individually linked to cases. With frequency matching, controls are chosen as a group to have a similar distribution as the cases on the matched variable. Perform matched studies with very small sample sizes or when you have multiple-category nominal variables. Cannot be used for determining the prevalence or incidence of a disease.

71 Cross-sectional studies
Snapshot: measure exposure and outcome variables at one point in time. Main outcome measure is prevalence: P = (Number of people with disease x at time t) / (Number of people at risk for disease x at time t). Prevalence = k x Incidence x Duration. NB: prevalence versus incidence (get over time). Relative prevalence: prevalence in group with RF compared to those without.
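The three quantities on this slide can be sketched as small functions. All numbers below are hypothetical, the function names are illustrative, and the incidence-duration relation is taken with k = 1 under a steady-state assumption.

```python
def point_prevalence(cases_at_t: int, population_at_t: int) -> float:
    """People with the disease at time t divided by people at risk at time t."""
    return cases_at_t / population_at_t


def prevalence_from_incidence(incidence_rate: float, mean_duration: float) -> float:
    """Steady-state approximation: prevalence = incidence x duration (k = 1)."""
    return incidence_rate * mean_duration


def relative_prevalence(prev_with_rf: float, prev_without_rf: float) -> float:
    """Prevalence in the group with the risk factor relative to the group without."""
    return prev_with_rf / prev_without_rf


# Hypothetical survey: 50 cases among 1,000 people at one point in time.
p = point_prevalence(50, 1000)
# Hypothetical disease: 0.01 new cases per person-year, lasting 5 years on average.
p_steady = prevalence_from_incidence(0.01, 5)
# Hypothetical subgroups: 30/300 with the risk factor, 20/700 without.
rp = relative_prevalence(point_prevalence(30, 300), point_prevalence(20, 700))

print(f"point prevalence = {p:.3f}")
print(f"incidence x duration = {p_steady:.3f}")
print(f"relative prevalence = {rp:.2f}")
```

In this made-up example both routes give a prevalence of 0.05, and the group with the risk factor has 3.5 times the prevalence of the group without it.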

73 Cross-sectional studies - Strengths
Useful baseline assessment Generalizable results if population based sample Study multiple outcomes and exposures Immediate outcome assessment and no loss to follow-up, therefore faster, cheaper, easier Can measure prevalence Hypothesis generating for causal links Serial surveys eg, Census Serial surveys to avoid learning effect from baseline measurement with cohort study

74 Cross-sectional studies - Weaknesses
Provide limited information. Cannot establish sequence of events. Not for causation or prognosis (inc, RR, AR). Look for biological plausibility in causal links. Impractical for rare diseases if population-based sample (eg, gastric ca 1/10,000). Could use in rare disease registry (Kaposi's sarcoma in AIDS). Prone to bias (selection, measurement).

75 Bias in cross-sectional studies
Selection Bias (eg, NSSP study): Is study population representative of target population? Is there systematic increase or decrease of RF? Measurement Bias: Outcome: Misclassified (dead, misdiagnosed, undiagnosed). Length-biased sampling: cases are overrepresented if illness has long duration and underrepresented if short duration (Prev = k x I x duration). Risk Factor: Recall bias. Prevalence-incidence bias: RF affects disease duration not incidence, eg, HLA-A2. Is there systematic increase or decrease of RF? Child care increases likelihood of going to MD's office. NSSP increases likelihood of going to MD's office. Does child care increase risk of NSSP? Prevalence-incidence bias: HLA-A2 affects survival of children with leukaemia, not a RF for poor prognosis.

76 Cross-sectional studies - Uses
Prevalence used in planning: Individual: pre-treatment probability for Rx and Dx. Population: health care services. Describe distribution of variables (Census, NHANES, Table 1). Examine associations among variables. Hypothesis generating for causal links. Prediction rule, eg, Ottawa ankle rule: XR if 3 factors present. Ottawa ankle rule: XR if age > 55 yrs, unable to wt bear and bone tenderness on malleolus.

77 Observational studies
Cohort Exposure to outcome Case control Outcome to exposure Cross-sectional Exposure and outcome ALL ARE PRONE TO BIAS Selection Bias Population based sample, large sample, selection criteria, matching Measurement Bias Standardization, training, prospective data collection, blinding

