Analysis and presentation of Case-control study data Chihaya Koriyama February 14 (Lecture 1)


Study design in epidemiology
Observational study
–Individual level: case-control study, cohort study
–Population level: ecological study
Intervention study

Why a case-control study?
In a cohort study, you need a large number of subjects to obtain a sufficient number of cases, especially if you are interested in a rare disease.
–Gastric cancer incidence in Japanese males: / 100,000 person-years
A case-control study is more efficient in terms of study operation, time, and cost.

Comparison of the study designs
                      Case-control        Cohort
Rare diseases         suitable            not suitable
Number of diseases    1                   more than 1
Sample size           relatively small    needs to be large
Control selection     difficult           easier
Study period          relatively short    long
Recall bias           yes                 no
Risk difference       not available       available

Case-control study: sequence of determining exposure and outcome status
Step 1: Determine and select the cases of your research interest
Step 2: Select appropriate controls
Step 3: Determine exposure status in both cases and controls

Case ascertainment
What is the definition of a case?
–Cancer (diagnosed clinically or pathologically?)
–Virus carriers (asymptomatic patients) → you need to screen for the antibody
–Are deceased cases included?
You have to describe the following points:
–the definition
–when, where, and how the cases were selected

Who will be controls?
Control ≠ non-case
–Controls must also be at risk of developing the disease in the future.
–“Controls” are expected to be a representative sample of the catchment population from which the cases arise.
–In a case-control study of gastric cancer, a person who has undergone gastrectomy cannot be a control, since he or she can never develop gastric cancer.

Various types of case-control studies
1) A population-based case-control study: both cases and controls are recruited from the population.
2) A case-control study nested in a cohort: both cases and controls are members of the cohort.
3) A hospital-based case-control study: both cases and controls are patients who are hospitalized or outpatients. Controls with diseases associated with the exposure of interest should be avoided.

The following points should be recorded (and described in your paper):
–The list (number) of eligible cases whose medical records were unavailable
–The list (number) of subjects who refused, if possible with the reasons for refusal
–The length of the interview
–The list (number) of subjects lacking measurement data, with the reasons

Exploratory or Analytic Exploratory case-control studies –There is no specific a priori hypothesis about the relationship between exposure and outcome. Analytic case-control studies –Analytic studies are designed to test specific a priori hypotheses about exposure and outcome.

Case-control study - information Sources of the information of exposure and potential confounding factors –Existing records –Questionnaires –Face-to-face / telephone interviews –Biological specimens –Tissue banks –Databases on biochemical and environmental measurements

Temporality is essential in Hill’s criteria
Timeline: Disease onset → Initial symptoms → Clinical diagnosis
–Between disease onset and initial symptoms: the study exposure is unlikely to be altered at this stage because of the disease.
–Between initial symptoms and clinical diagnosis: the study exposure is more likely to be altered at this stage because of the symptoms.
Essential Epidemiology (WA Oleckno)

Bias should be minimized Bias & Confounding –Selection bias –Detection bias –Information bias (recall bias) –Confounding Confounding can be controlled by statistical analyses but we can do nothing about bias after data collection.

Case-control studies are potential sources of many biases, so they should be carefully designed, analyzed, and interpreted.

How can we solve the problem of confounding in a case-control study?
“Prevention” at study design
–Limitation
–Matching: this prevents confounding in a cohort study, but not in a case-control study

Matching in a case-control study
–Subjects are matched by confounding factor(s) to increase the efficiency of the statistical analysis.
–Matching by itself cannot control confounding; a conditional logistic analysis is required.

Overmatching
Matching by factor(s) strongly related to the exposure that is your main interest
–You CANNOT see the difference in exposure status between cases and controls

How can we solve the problem of confounding?
“Treatment” at statistical analysis
–Stratification by a confounder
–Multivariate analysis

What you should describe in the materials and methods:
1. Study design
2. Definition of eligible cases and controls
–Inclusion / exclusion criteria of cases and controls
3. Number of respondents and response rate
4. Main exposure and other factors, including potential confounding factors
5. Sources of the information on exposure and other factors
6. Matched factors, if any
7. The number of subjects used in statistical analyses
8. Statistical test(s) and model(s)
9. Name and version of the statistical software

Assuring adequate study power
The following information is necessary:
–The confidence level desired (usually 95%, corresponding to a significance level of 0.05)
–The level of power desired (80-95%)
–The ratio of controls to cases
–The expected frequency of the exposure in the control group
–The smallest odds ratio one would like to be able to detect (based on practical significance)
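The five inputs listed above are enough to sketch a sample size calculation for an unmatched case-control study. The block below is a minimal illustration, not the lecturer's method: it assumes a standard normal-approximation formula for comparing two proportions, and the function name `cases_needed` and the example values (exposure frequency 0.4, odds ratio 2) are hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(p0, odds_ratio, controls_per_case=1.0, alpha=0.05, power=0.80):
    """Approximate number of cases for an unmatched case-control study.

    p0                -- expected exposure frequency among controls
    odds_ratio        -- smallest odds ratio worth detecting
    controls_per_case -- ratio of controls to cases
    alpha             -- two-sided significance level
    power             -- desired power
    (Normal-approximation formula; an assumption, not from the slides.)
    """
    r = controls_per_case
    # Exposure frequency among cases implied by p0 and the odds ratio
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    # Weighted average exposure frequency across both groups
    p_bar = (p1 + r * p0) / (1.0 + r)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) ** 2 * p_bar * (1.0 - p_bar) * (r + 1.0)
         / (r * (p1 - p0) ** 2))
    return math.ceil(n)


# Hypothetical inputs: 40% of controls exposed, detect OR = 2
print(cases_needed(0.4, 2.0))                         # 1 control per case
print(cases_needed(0.4, 2.0, controls_per_case=2.0))  # 2 controls per case
```

Raising the number of controls per case lowers the number of cases required, which is why the control-to-case ratio appears in the list.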

Statistical analysis: “Matched” vs. “Unmatched” studies
The procedures for analyzing the results of case-control studies differ depending on whether the cases and controls are matched or unmatched.
Matched                                     Unmatched
・McNemar’s test                             ・Chi-square test
・Conditional logistic regression analysis   ・Unconditional logistic regression analysis

Advantages of pair matching in case-control studies
–Assures comparability between cases and controls on the selected variables
–May simplify the selection of controls by eliminating the need to identify a random sample
–Useful in small studies where obtaining cases and controls that are similar on potentially confounding factors may otherwise be difficult
–Can assure adequate numbers of subjects with specified characteristics so as to permit statistical comparisons
Essential Epidemiology (WA Oleckno)

Disadvantages of pair matching in case-control studies
–May be difficult or costly to find a sufficient number of controls
–Eliminates the possibility of examining the effects of the matched variables on the outcome
–Can increase the difficulty or complexity of controlling for confounding by the remaining unmatched variables
–Overmatching
–Can result in a greater loss of data, since a pair of subjects has to be eliminated even if only one subject is not responsive
Essential Epidemiology (WA Oleckno)

An example of an unmatched case-control study
Lung cancer cases and controls (N=100 each), classified by smoking status. Smokers: NOT recently started.
              Cases   Controls
Smoker          70       40
Non-smoker      30       60
Odds ratio =

Risk measure in a case-control study
Odds = prevalence / (1 - prevalence)
Odds ratio = odds in cases / odds in controls
              Disease + (case)   Disease - (control)
Exposure +           a                   c
Exposure -           b                   d
Exposure odds in cases = a / b
Exposure odds in controls = c / d
Odds ratio = (a / b) / (c / d) = (a × d) / (b × c)
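As a worked illustration of the formula, the sketch below applies it to the unmatched lung cancer example (70 smokers and 30 non-smokers among cases, 40 and 60 among controls). The function name `odds_ratio` is an illustrative choice; the cell labels a, b, c, d follow the table on this slide.

```python
def odds_ratio(a, b, c, d):
    """Exposure odds ratio from a 2x2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls
    """
    odds_cases = a / b        # exposure odds in cases
    odds_controls = c / d     # exposure odds in controls
    return odds_cases / odds_controls


# Unmatched example: smokers / non-smokers among cases and controls
print(odds_ratio(a=70, b=30, c=40, d=60))
# Same value via the cross-product form (a*d)/(b*c)
print((70 * 60) / (30 * 40))
```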

An example of a matched case-control study
Lung cancer cases and controls matched by sex & age (N=100), classified by smoking status. Smokers: NOT recently started.
                       Case: smoker   Case: non-smoker
Control: smoker             30               10
Control: non-smoker         40               20
Notice that this is the distribution of 100 matched pairs.

McNemar’s test
                       Case: smoker   Case: non-smoker
Control: smoker             30               10
Control: non-smoker         40               20
Chi-square (test) statistic: $\chi^2 = (40 - 10)^2 / (40 + 10) = 18$, where the degree of freedom is 1.
Odds ratio = 40 / 10 = 4
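The calculation on this slide uses only the discordant pairs (40 pairs where only the case smokes, 10 where only the control smokes). A minimal sketch, with the hypothetical function name `mcnemar` and a p-value from the chi-square distribution with 1 degree of freedom added for illustration:

```python
import math


def mcnemar(case_only_exposed, control_only_exposed):
    """McNemar's chi-square (no continuity correction) and matched-pair OR.

    case_only_exposed    -- pairs where the case is exposed, the control is not
    control_only_exposed -- pairs where the control is exposed, the case is not
    """
    b, c = case_only_exposed, control_only_exposed
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    matched_or = b / c
    return chi2, p_value, matched_or


chi2, p_value, matched_or = mcnemar(40, 10)
print(chi2, matched_or, p_value)
```

The concordant pairs (30 and 20) do not enter the statistic or the odds ratio.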

Logistic regression analysis
Logistic regression is used to model the probability of a binary response as a function of a set of variables thought to possibly affect the response (called covariates).
Y = 1: case (with the disease)
Y = 0: control (no disease)

One could imagine trying to fit a linear model (since this is the simplest model!) for the probabilities, but often this leads to problems: in a linear model, fitted probabilities can fall outside of 0 to 1. Because of this, linear models are seldom used to fit probabilities.

In a logistic regression analysis, the logit of the probability is modelled, rather than the probability itself.
p = probability of getting disease
$\operatorname{logit}(p) = \log\frac{p}{1-p}$
As always, we use the natural log. The logit is therefore the log odds, since odds = p / (1-p).

Simple logistic regression (with a continuous covariate)
Suppose we give each of several beetles some dose of a potential toxic agent (x = dose), and we observe whether the beetle dies (Y=1) or lives (Y=0). One of the simplest models we can consider is to assume that the relationship between the logit of the probability of death and the dose is linear, i.e.,
$\operatorname{logit}(p_x) = \log\frac{p_x}{1 - p_x} = \alpha + \beta x$
where $p_x$ = probability of death for a given dose x, and $\alpha$ and $\beta$ are unknown parameters to be estimated from the data.

The values of $\alpha$ and $\beta$ determine whether and how steeply the dose-response curve rises (or falls) and where it is centered.
If $\beta = 0$, $p_x$ is constant over x
If $\beta > 0$, $p_x$ increases with x
If $\beta < 0$, $p_x$ decreases with x
$H_0: \beta = 0$ is the null hypothesis in a “test of trend” when x is a continuous variable. Knowledge of $\beta$ gives us insight into the direction and degree of the association between outcome and exposure.
$p_x = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
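The relationship between the logit and the probability can be checked numerically. The sketch below implements the logit and its inverse and evaluates the dose-response curve; the parameter values (intercept -2, slope 0.5) and the function names are hypothetical, chosen only to show that a positive slope makes the probability rise with dose.

```python
import math


def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of the logit: e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_at_dose(alpha, beta, x):
    """Probability of the event at dose x under logit(p_x) = alpha + beta * x."""
    return inv_logit(alpha + beta * x)


alpha, beta = -2.0, 0.5  # hypothetical parameter values
for dose in (0, 2, 4, 6, 8):
    print(dose, round(prob_at_dose(alpha, beta, dose), 3))
```

With these values the curve is centered at dose 4, where the linear predictor is zero and the probability is one half.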

Simple logistic regression (with a dichotomous covariate)
Suppose we are considering a case-control study where the response variable is disease (case) / non-disease (control) and the predictor variable is exposed / non-exposed, which we “code” as an indicator variable, or dummy variable.
Y = 1 if diseased (D1), Y = 0 if not diseased (D0)
x = 1 if exposed (E1), x = 0 if not exposed (E0)
and $p_x = P(\text{disease given exposure } x) = P(Y = 1 \mid x)$, x = 0, 1.
Thus, $p_1$ = probability of disease among exposed, and $p_0$ = probability of disease among non-exposed.

Definition of odds ratio
In case of exposure (X=1): $\operatorname{logit}(P_{E1}) = \text{intercept} + \beta$
In case of non-exposure (X=0): $\operatorname{logit}(P_{E0}) = \text{intercept}$
If you want to obtain the odds ratio of the exposure group:
$\mathrm{OR} = \frac{P_{E1} / (1 - P_{E1})}{P_{E0} / (1 - P_{E0})}$
$\log(\mathrm{OR}) = \log\frac{P_{E1}}{1 - P_{E1}} - \log\frac{P_{E0}}{1 - P_{E0}} = \operatorname{logit}(P_{E1}) - \operatorname{logit}(P_{E0}) = (\text{intercept} + \beta) - \text{intercept} = \beta$
$\mathrm{OR} = e^{\beta}$
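The derivation can be verified with numbers. The sketch below reuses the unmatched smoking example (70/30 smokers/non-smokers among cases, 40/60 among controls) and computes the slope as the difference of the two logits. The variable names are illustrative. In a case-control sample the proportions of cases among exposed and non-exposed are not population risks, but the difference of their logits still recovers the log odds ratio.

```python
import math


def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


# Counts from the unmatched example
cases_exposed, cases_unexposed = 70, 30
controls_exposed, controls_unexposed = 40, 60

# Proportion of cases among exposed and among non-exposed subjects in the sample
p_e1 = cases_exposed / (cases_exposed + controls_exposed)
p_e0 = cases_unexposed / (cases_unexposed + controls_unexposed)

intercept = logit(p_e0)            # logit for the non-exposed group
beta = logit(p_e1) - intercept     # slope = log odds ratio
odds_ratio_from_beta = math.exp(beta)

print(round(beta, 4), round(odds_ratio_from_beta, 4))
```

The exponentiated slope agrees with the cross-product ratio (70 × 60) / (30 × 40) from the 2 × 2 table.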

Simple logistic regression (with a covariate having more than two categories)
Suppose we are considering a case-control study where the predictor variable is current smoker / ex-smoker / non-smoker, which we “code” as two dummy variables.
Original data              Dummy variables
Case   Smoking status      SMK1 (X1)   SMK2 (X2)
1      Current                 1           0
0      Ex-smoker               0           1
1      Non-smoker              0           0
1      Ex-smoker               0           1
0      Non-smoker              0           0
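The coding in the table can be written as a small function. This is a minimal sketch with hypothetical names (`dummy_code`, `LEVELS`, `REFERENCE`); it mirrors the table, with non-smokers as the reference category that receives zeros in both columns.

```python
LEVELS = ("Current", "Ex-smoker")   # categories that get their own dummy variable
REFERENCE = "Non-smoker"            # reference category: all dummies are 0


def dummy_code(status):
    """Return (SMK1, SMK2) for a smoking status, as in the table."""
    if status != REFERENCE and status not in LEVELS:
        raise ValueError(f"unknown smoking status: {status!r}")
    return tuple(1 if status == level else 0 for level in LEVELS)


# The five rows of the table: (case indicator, smoking status)
rows = [(1, "Current"), (0, "Ex-smoker"), (1, "Non-smoker"),
        (1, "Ex-smoker"), (0, "Non-smoker")]
for case, status in rows:
    smk1, smk2 = dummy_code(status)
    print(case, status, smk1, smk2)
```

A covariate with k categories needs k - 1 dummy variables, which is why three smoking categories produce two columns.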

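Logistic regression model of the previous example
$\operatorname{logit}(P) = \alpha + \beta_1 X_1 + \beta_2 X_2$
In case of current smoker ($X_1 = 1$, $X_2 = 0$): $\operatorname{logit}(P_{current}) = \alpha + \beta_1$
In case of ex-smoker ($X_1 = 0$, $X_2 = 1$): $\operatorname{logit}(P_{ex}) = \alpha + \beta_2$
In case of non-smoker ($X_1 = 0$, $X_2 = 0$): $\operatorname{logit}(P_{non}) = \alpha$
$\mathrm{OR}_{current} = e^{\beta_1}$
$\mathrm{OR}_{ex} = e^{\beta_2}$
$\mathrm{OR}_{non} = 1$ (referent)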

Wald’s test for no association
The null hypothesis of no association between outcome and exposure corresponds to $H_0: \mathrm{OR} = 1$ or $H_0: \beta = \log \mathrm{OR} = 0$.
Using logistic regression results, we can test this hypothesis with the standardized coefficient (the estimate divided by its standard error), i.e., Wald’s test.
Note: Stata and SAS present two-sided Wald’s test p-values.
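For a single dichotomous exposure, Wald's test can be sketched directly from the 2 × 2 table, because the estimated log odds ratio and its large-sample standard error have closed forms. The example below uses the unmatched smoking table (70, 30, 40, 60); the function name `wald_test` is hypothetical, and statistical packages obtain the same quantities by fitting the model.

```python
import math
from statistics import NormalDist


def wald_test(a, b, c, d):
    """Two-sided Wald test of H0: log OR = 0 from a 2x2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls (all cells non-zero)
    """
    beta = math.log((a * d) / (b * c))          # estimated log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # large-sample standard error
    z = beta / se                               # standardized coefficient
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, z, p_value


beta, se, z, p_value = wald_test(70, 30, 40, 60)
print(round(beta, 3), round(se, 3), round(z, 2), p_value)
```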

Likelihood Ratio Test (LRT)
An alternative way of testing hypotheses in a logistic regression model is with the use of a likelihood ratio test. The likelihood ratio test is specifically designed to test between nested hypotheses.
$H_0: \log\frac{P_x}{1 - P_x} = \alpha$
$H_A: \log\frac{P_x}{1 - P_x} = \alpha + \beta x$
and we say that $H_0$ is nested in $H_A$.

Likelihood Ratio Test (LRT)
In order to test $H_0$ vs. $H_A$, we compute the likelihood ratio test statistic:
$G = -2 \log\frac{L_{H_0}}{L_{H_A}} = 2\,(\log L_{H_A} - \log L_{H_0}) = (-2 \log L_{H_0}) - (-2 \log L_{H_A})$
where $L_{H_A}$ is the maximized likelihood under the alternative hypothesis $H_A$ and $L_{H_0}$ is the maximized likelihood under the null hypothesis $H_0$. If the null hypothesis $H_0$ were true, we would expect the likelihood ratio test statistic to be close to zero.
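With one dichotomous exposure, both maximized likelihoods can be written down without an iterative fit: under the null hypothesis every subject has the same probability of being a case, and under the alternative each exposure group has its own. The sketch below computes G for the unmatched smoking table (70, 30, 40, 60). The function names are hypothetical, and the p-value assumes the usual large-sample chi-square reference distribution with 1 degree of freedom.

```python
import math


def binomial_log_lik(cases, controls):
    """Maximized binomial log-likelihood for one group (non-zero counts)."""
    p = cases / (cases + controls)
    return cases * math.log(p) + controls * math.log(1.0 - p)


def lrt_2x2(a, b, c, d):
    """Likelihood ratio test of no association in a 2x2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls
    """
    log_l_h0 = binomial_log_lik(a + b, c + d)                    # intercept only
    log_l_ha = binomial_log_lik(a, c) + binomial_log_lik(b, d)   # intercept + exposure
    g = 2.0 * (log_l_ha - log_l_h0)
    # Upper tail of chi-square with 1 df: P(chi2_1 > g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


g, p_value = lrt_2x2(70, 30, 40, 60)
print(round(g, 2), p_value)
```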

Wald’s test vs. LRT
In general, the LRT often works a little better than the Wald test, in that the test statistic more closely follows a $\chi^2$ distribution under $H_0$. But the Wald test often works very well and usually gives similar results. More importantly, the LRT can more easily be extended to multivariate hypothesis tests, e.g., $H_0: \beta_1 = \beta_2 = 0$ vs. $H_A$: at least one of $\beta_1$, $\beta_2$ is not 0.

World J. Gastroenterology 2006

Recruitment of cases
Patients newly diagnosed as G.C., Sep. 2000 ~ Dec. 2002: 395
Exclusion categories:
–Refused to participate in the study
–Lived in Valle del Cauca less than 5 years
–Recurrent cases
–Could not be contacted
81 cases were excluded
216 cases
173 formalin-fixed paraffin-embedded blocks
We could not obtain the information on tumor location for 23 cases, and those cases were excluded from the tumor location specific analysis.

Recruitment of controls
Potential controls
Exclusion categories:
–Lived in Valle del Cauca less than 5 years
–Refused to participate in the study
–History of G.C.
431 controls
Matched by sex, age (5-year), hospital, date of administration
Case : control = 1 : 2
Major diseases of controls:
–cardiovascular diseases (208)
–trauma (117)
–infectious diseases (38)
–urological disorders (21)

Stata command: xi:logistic casocon i.fumar
i.fumar    _Ifumar_0-2    (naturally coded; _Ifumar_0 omitted)
Logistic regression    Number of obs = 647
LR chi2(2) = 4.24
Prob > chi2 =
Log likelihood =    Pseudo R2 =
casocon | Odds Ratio  Std. Err.  z  P>|z|  [95% Conf. Interval]
_Ifumar_1 |
_Ifumar_2 |
The P>|z| column gives Wald’s test p values.

Smoking     | gastric cancer 0   1 | Total
Never   0   |                      | 266
Ex-     1   |                      | 234
Current 2   |                      |
Total       |                      | 647

Results of conditional logistic regression analysis using the same data
Stata command: xi:clogit casocon i.fumar, group(identi) or
Conditional (fixed-effects) logistic regression    Number of obs = 647
LR chi2(2) = 4.64
Prob > chi2 =
Log likelihood =    Pseudo R2 =
casocon | Odds Ratio  Std. Err.  z  P>|z|  [95% Conf. Interval]
_Ifumar_1 |
_Ifumar_2 |
The P>|z| column gives Wald’s test p values.
Summary table: rows Fumar=0, Fumar=1, Fumar=2; columns Case, Control, OR (95%CI)
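The overall LR chi-square statistics reported in the two outputs (4.24 for the unconditional model and 4.64 for the conditional model) each have 2 degrees of freedom, one per smoking dummy variable. For 2 degrees of freedom the upper tail of the chi-square distribution has a closed form, so the corresponding p-value can be sketched without a statistics package. The function name `chi2_upper_tail_2df` is hypothetical, and the printed values are a calculation from the reported statistics, not figures read from the slides.

```python
import math


def chi2_upper_tail_2df(x):
    """P(chi-square with 2 df > x); closed form exp(-x / 2), valid for 2 df only."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.exp(-x / 2.0)


for label, lr_chi2 in (("logistic", 4.24), ("clogit", 4.64)):
    print(label, lr_chi2, round(chi2_upper_tail_2df(lr_chi2), 3))
```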

GC risk by smoking in Cali, Colombia: results of tumor-location specific analysis
P = 0.51 (P value by LRT)
This test examines the difference in the magnitude of the association between smoking and GC risk among the 3 tumor sites.