BIOST 536 Lecture 4 1 Lecture 4 – Logistic regression: estimation and confounding Linear model.

1 BIOST 536 Lecture 4 1 Lecture 4 – Logistic regression: estimation and confounding Linear model

2 BIOST 536 Lecture 4 2 Logistic regression estimation
Modeling a binary outcome
- We model P(Y=1|X=x), not a continuous Y
- A linear model for the probability has problems: the left-hand side must satisfy 0 ≤ Pr ≤ 1, but the linear combination of the β’s and X’s on the right-hand side can fall anywhere in (-∞,∞), so it does not do well
- Instead assume that P(Y=1|X=x) depends on X only through the linear combination Z = β0 + β1X1 + … + βpXp
- Need to link Z to P(Y=1|Z=z), so we use the logistic function P(Y=1|Z=z) = exp(z) / (1 + exp(z))

3 BIOST 536 Lecture 4 3 Logistic function
- Symmetric: we could model either Pr(Y=1) or Pr(Y=0) and the covariate effects would be the same in magnitude, with only the signs of the coefficients reversed
- Epidemiologists prefer the logistic model since the OR is easily derived
- Mathematically convenient form
- Maximum likelihood equations have a simple form
- Need to solve these equations by iteration
- Iteration rarely goes awry if the data are not too sparse
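The symmetry property can be checked numerically. The sketch below is only an illustration: the coefficient values are hypothetical and the function names are chosen here, not taken from the lecture. It shows that modeling Pr(Y=0) instead of Pr(Y=1) simply flips the sign of the linear predictor.

```python
import math

def logistic(z):
    """Map a linear predictor z in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the logistic function: the log odds of p."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.5, 0.85

for x in (0, 1):
    z = beta0 + beta1 * x
    p_y1 = logistic(z)    # Pr(Y=1 | X=x)
    p_y0 = logistic(-z)   # Pr(Y=0 | X=x): same model with all signs reversed
    print(f"x={x}: Pr(Y=1)={p_y1:.4f}  Pr(Y=0)={p_y0:.4f}  sum={p_y1 + p_y0:.4f}")
```

Because logistic(-z) = 1 - logistic(z), the two probabilities always sum to 1, which is why the choice of which outcome to label as 1 changes only the signs of the coefficients.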

4 BIOST 536 Lecture 4 4 Link functions
- The logistic function is very close to the probit model (both are symmetric)
- The probit model is used more in classical diagnostic testing and ROC analysis
- “Logistic” and “probit” are called link functions: they link P(Y|X) to the covariates through the linear combination
- Other link functions include the “complementary log-log” (asymmetric)
- Some general regression programs in Stata expect you to specify the link function
- We will assume a logistic link function here, but can test it in Stata using the linktest command
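A rough numerical comparison of the three links mentioned above, written as a sketch rather than anything from the course materials. The rescaling factor of 1.7 applied to the logistic curve is an assumption (a commonly used approximation for matching the probit curve), and the function names are made up for this example.

```python
import math

def inv_logit(z):
    """Logistic inverse link."""
    return 1.0 / (1.0 + math.exp(-z))

def inv_probit(z):
    """Probit inverse link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_cloglog(z):
    """Complementary log-log inverse link."""
    return 1.0 - math.exp(-math.exp(z))

def symmetry_gap(inverse_link, z):
    """Zero when F(z) + F(-z) = 1, i.e. when the link is symmetric."""
    return inverse_link(z) + inverse_link(-z) - 1.0

SCALE = 1.7  # assumed rescaling so the logistic curve lines up with the probit curve
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(inv_logit(SCALE * z) - inv_probit(z)) for z in grid)

print("max |logistic(1.7 z) - probit(z)| on [-4, 4]:", round(max_gap, 4))
for name, link in (("logit", inv_logit), ("probit", inv_probit), ("cloglog", inv_cloglog)):
    print(name, "symmetry gap at z=1:", round(symmetry_gap(link, 1.0), 4))
```

The logit and probit gaps are zero up to rounding, while the complementary log-log gap is clearly nonzero, which is the asymmetry the slide refers to.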

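5 BIOST 536 Lecture 4 5 Simple estimation example
                 Exposed (X=1)   Unexposed (X=0)   Total
Case (Y=1)             34               18            52
Control (Y=0)          66               82           148
Total                 100              100           200
Adopt a logistic model, so P(Y=1|X=x) = exp(β0 + β1x) / (1 + exp(β0 + β1x))
- For the unexposed, the probability of being a case is exp(β0) / (1 + exp(β0))
- For the unexposed, the probability of being a control is 1 / (1 + exp(β0))
- For the exposed, the probability of being a case is exp(β0 + β1) / (1 + exp(β0 + β1))
- For the exposed, the probability of being a control is 1 / (1 + exp(β0 + β1))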

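7 BIOST 536 Lecture 4 7
- The log likelihood is a well-behaved surface that is a function of the betas
- The maximum log likelihood is -111.2429, achieved at β0 = logit(0.18) ≈ -1.516 and β1 = log(OR) ≈ 0.853
- For the unexposed, the estimate is exp(β0) / (1 + exp(β0)) = 18/100 = 0.18, which is just the proportion of cases in the unexposed group
- For the exposed, the estimate is exp(β0 + β1) / (1 + exp(β0 + β1)) = 34/100 = 0.34, which is just the proportion of cases in the exposed group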
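The maximum quoted on this slide can be reproduced directly from the 2x2 table in the estimation example. The following is a minimal sketch; the variable and function names are chosen here, and the closed-form estimates used (the log odds in the unexposed and the log odds ratio) apply to this saturated two-group model only.

```python
import math

# Counts from the simple estimation example.
cases_exposed, controls_exposed = 34, 66
cases_unexposed, controls_unexposed = 18, 82

def log_likelihood(beta0, beta1):
    """Binomial log likelihood of the logistic model P(Y=1|X=x) = expit(beta0 + beta1*x)."""
    p_unexposed = 1.0 / (1.0 + math.exp(-beta0))
    p_exposed = 1.0 / (1.0 + math.exp(-(beta0 + beta1)))
    return (cases_exposed * math.log(p_exposed)
            + controls_exposed * math.log(1.0 - p_exposed)
            + cases_unexposed * math.log(p_unexposed)
            + controls_unexposed * math.log(1.0 - p_unexposed))

# Closed-form maximum likelihood estimates for this two-group model.
beta0_hat = math.log(cases_unexposed / controls_unexposed)
beta1_hat = math.log((cases_exposed * controls_unexposed)
                     / (controls_exposed * cases_unexposed))
max_ll = log_likelihood(beta0_hat, beta1_hat)

print("beta0_hat:", round(beta0_hat, 4))
print("beta1_hat:", round(beta1_hat, 4))
print("maximized log likelihood:", round(max_ll, 4))
```

Moving either coefficient away from these values lowers the log likelihood, which is one quick way to convince yourself that the surface peaks there.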

8 BIOST 536 Lecture 4 8
- Now do this under the null hypothesis that exposure does not make a difference, i.e. β1 = 0
- Then the likelihood depends only on β0, so the log likelihood is a function of β0 alone
- The maximum log likelihood is -114.6114, achieved at β0 = logit(0.26) ≈ -1.046
- Ignoring exposure, the estimated probability of being a case is exp(β0) / (1 + exp(β0)) = 52/200 = 0.26, which is the overall proportion of cases
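The null-model figures can be checked the same way. This short sketch uses only the marginal totals from the example table (52 cases, 148 controls); the variable names are illustrative.

```python
import math

# Marginal totals from the example table, ignoring exposure.
cases, controls = 52, 148
total = cases + controls

p_hat = cases / total                          # overall proportion of cases
beta0_null = math.log(p_hat / (1.0 - p_hat))   # intercept-only estimate
null_ll = cases * math.log(p_hat) + controls * math.log(1.0 - p_hat)

print("overall proportion of cases:", p_hat)
print("beta0 under the null:", round(beta0_null, 4))
print("maximized null log likelihood:", round(null_ll, 4))
```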

9 BIOST 536 Lecture 4 9 Tests comparing nested models
Want to decide if the more complex model is significantly better than the simpler model
Possible tests comparing nested models (the complex model includes all covariates of the simpler model):
1. Likelihood ratio test: direct comparison of the difference in log-likelihoods
   - Preferred test
   - Does not change with reparametrization
2. Score test: test computed at the null hypothesis values
   - Very similar to LR test
   - Sometimes can be computed when the LR test cannot
   - Many of our common tests are score tests
3. Wald test: depends on the normality of the distribution of the estimates
   - Can change with reparametrization
   - P-value given in Stata output for individual variables
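To make the three tests concrete, here is a sketch that applies each of them to the 2x2 table from the estimation example (34 and 18 cases, 66 and 82 controls). It relies on two facts specific to this simple setting, which should be read as assumptions of the sketch: the score test of no exposure effect equals the Pearson chi-square without continuity correction, and the Wald standard error of the log odds ratio is the square root of the summed reciprocal cell counts. Function names are made up for the example.

```python
import math

# Cell counts: a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
a, b = 34, 18
c, d = 66, 82
n = a + b + c + d

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def binomial_ll(k, m):
    """Log likelihood of k events in m trials at the observed proportion k/m."""
    p = k / m
    return k * math.log(p) + (m - k) * math.log(1.0 - p)

# 1. Likelihood ratio test: twice the difference in maximized log likelihoods.
ll_full = binomial_ll(a, a + c) + binomial_ll(b, b + d)
ll_null = binomial_ll(a + b, n)
lr_stat = 2.0 * (ll_full - ll_null)

# 2. Score test: for a single binary covariate this is the Pearson chi-square.
score_stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# 3. Wald test: squared ratio of the estimate to its standard error.
beta1_hat = math.log((a * d) / (b * c))
se_beta1 = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
wald_stat = (beta1_hat / se_beta1) ** 2

for name, stat in (("LR", lr_stat), ("Score", score_stat), ("Wald", wald_stat)):
    print(f"{name:5s} chi2 = {stat:.3f}  p = {chi2_1df_pvalue(stat):.4f}")
```

The three statistics are close but not identical here, which matches the slide's point that the tests are similar in practice while differing in how they are constructed.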

12 BIOST 536 Lecture 4 12 Summary about estimation
- Maximum likelihood estimation is preferred for binary outcome data
- The log likelihood depends on the betas, which in turn depend on which covariates are in the model
- In some cases the beta estimates are related to familiar values, but usually we have to iterate to get the estimates
- The difference in log likelihoods can be used to test one model against another when the models are nested
- Odds ratios are related to the logistic regression coefficients: for a binary exposure, OR = exp(β)
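The iteration mentioned in this summary can be sketched with Newton-Raphson on the two-parameter model from the estimation example. This is an illustrative implementation written for this note, not the algorithm of any particular package; the starting values, tolerance, and function name are assumptions. In this example the iteration should land on the same values as the closed-form estimates (the log odds in the unexposed and the log odds ratio).

```python
import math

# Grouped data from the estimation example: (x, number of cases, group size).
groups = [(0, 18, 100), (1, 34, 100)]

def newton_raphson(data, tol=1e-10, max_iter=25):
    """Solve the logistic score equations for (beta0, beta1) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0  # assumed starting values
    for iteration in range(1, max_iter + 1):
        u0 = u1 = 0.0            # score vector
        i00 = i01 = i11 = 0.0    # information matrix entries
        for x, y, m in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - m * p
            weight = m * p * (1.0 - p)
            u0 += resid
            u1 += resid * x
            i00 += weight
            i01 += weight * x
            i11 += weight * x * x
        # Newton step: solve the 2x2 system (information) * step = score.
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("Newton-Raphson did not converge")

b0_hat, b1_hat, n_iter = newton_raphson(groups)
print("beta0:", round(b0_hat, 4), "beta1:", round(b1_hat, 4), "iterations:", n_iter)
print("odds ratio exp(beta1):", round(math.exp(b1_hat), 4))
```

With sparse data (for example a zero cell) the information matrix can become nearly singular and the steps can diverge, which is the situation the lecture warns about.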

13 BIOST 536 Lecture 4 13 Example
- Study of identification of domestic violence (DV) in a medical setting (PI: Robert Thompson, MD; Co-PI: Fred Rivara, MD)
- Clinics were randomized to be either intervention (2 clinics) or control clinics (3 clinics)
- Intervention clinics received training in DV detection and some support services; clinic enrollees received materials
- Questions:
  1. Did the intervention improve detection?
  2. Did it improve the rate of physicians asking about DV?

20 BIOST 536 Lecture 4 20 Confounding
- A confounding variable is related to both disease and exposure occurrence, and so distorts the observed relationship between exposure and disease
- We can adjust for the confounder explicitly by modeling or implicitly through stratification
- Two necessary relationships:
  1. Confounder must be related to exposure in the data
  2. Confounder must be independently related to disease in the population
- Example 1: Suppose (1) age → exposure to menopausal estrogens and (2) age → endometrial cancer. Should we control for age?
- Example 2: (1) menopausal estrogens → endometrial hyperplasia and (2) endometrial hyperplasia → endometrial cancer. Do we want to control for endometrial hyperplasia in studying the association between exposure to menopausal estrogens and endometrial cancer?

21 BIOST 536 Lecture 4 21 Failure to account for confounding can increase or decrease the odds ratio (hypothetical data from Breslow & Day, Volume 1)

22 BIOST 536 Lecture 4 22 Observational cohort study

25 BIOST 536 Lecture 4 25 If this model is correct, would the Mantel-Haenszel approach give an alternative estimate of the odds ratio?
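As a point of comparison for this question, the sketch below computes a crude odds ratio and the Mantel-Haenszel summary odds ratio across two confounder strata. The counts are invented for this illustration (they are not the Breslow & Day data shown in the lecture) and were chosen so that the odds ratio is 2 within each stratum while the crude odds ratio is far larger.

```python
# Each stratum: (exposed cases, unexposed cases, exposed non-cases, unexposed non-cases).
# Invented counts for illustration only.
strata = [
    (10, 50, 90, 900),    # confounder level 0
    (400, 30, 400, 60),   # confounder level 1
]

def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Crude table: collapse over the confounder by summing cells.
crude_table = tuple(sum(cells) for cells in zip(*strata))
crude_or = odds_ratio(*crude_table)
stratum_ors = [odds_ratio(*table) for table in strata]
mh_or = mantel_haenszel_or(strata)

print("stratum-specific ORs:", stratum_ors)
print("crude OR:", round(crude_or, 2))
print("Mantel-Haenszel OR:", round(mh_or, 2))
```

When the stratum-specific odds ratios are equal, as in these made-up tables, the Mantel-Haenszel estimate recovers that common value, whereas the crude estimate is distorted by the confounder.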

26 BIOST 536 Lecture 4 26
- Prefer the initial parametrization because we may want to test whether the interaction coefficient is 0 (no interaction), or whether the exposure and interaction coefficients are both 0 (no association of exposure with outcome for either confounder level)
- The α’s give us the baseline levels for the two confounder levels

27 BIOST 536 Lecture 4 27 Case-control study

28 BIOST 536 Lecture 4 28 The 1st part depends on the sampling and the 2nd part on exposure and confounders

29 BIOST 536 Lecture 4 29 Case-control log odds ratio is the difference between two logits (exposed versus unexposed)
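A small numerical check of this point, written as a sketch under stated assumptions: the cohort model is taken to be logit P(Y=1|E,C) = α + βE + γC with hypothetical coefficient values, and cases and controls are assumed to be sampled with hypothetical fractions that do not depend on exposure or the confounder. Under those assumptions the case-control logit equals the cohort logit plus log(π_case / π_control), so differences between logits (log odds ratios) are unchanged.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical cohort model coefficients.
alpha, beta, gamma = -3.0, 0.7, 1.2
# Hypothetical sampling fractions for cases and controls.
pi_case, pi_control = 0.9, 0.05

def p_cohort(e, c):
    """P(Y=1 | E=e, C=c) in the source population."""
    return expit(alpha + beta * e + gamma * c)

def p_case_control(e, c):
    """P(Y=1 | E=e, C=c, sampled) under outcome-dependent sampling."""
    p = p_cohort(e, c)
    sampled_cases = pi_case * p
    sampled_controls = pi_control * (1.0 - p)
    return sampled_cases / (sampled_cases + sampled_controls)

for c in (0, 1):
    log_or = logit(p_case_control(1, c)) - logit(p_case_control(0, c))
    shift = logit(p_case_control(0, c)) - logit(p_cohort(0, c))
    print(f"C={c}: case-control log OR = {log_or:.4f}, intercept shift = {shift:.4f}")

print("beta =", beta, " log(pi_case/pi_control) =", round(math.log(pi_case / pi_control), 4))
```

The intercept shift is the part that "depends on the sampling"; without knowing the sampling fractions it cannot be separated from α, which is why absolute risks are not recoverable from the case-control data alone.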

31 BIOST 536 Lecture 4 31 Example of sampling proportions

32 BIOST 536 Lecture 4 32 Summary about confounding
- We can control for confounding by stratification or modeling
- For a cohort sample with a binary exposure, binary confounder, and binary outcome, the probability model is a logistic model in the exposure and confounder
- For a case-control sample with a binary exposure, binary confounder, and binary outcome, the probability model has the same logistic form, but its intercept also depends on the sampling fractions, which may be unknown
- The odds ratios can be estimated from either cohort or case-control studies, but estimates of absolute risk can be made only from a cohort study, unless the sampling probabilities are known

