Exact Logistic Regression Larry Cook. Outline Review the logistic regression model Explore an example where model assumptions fail –Brief algebraic interlude.


1 Exact Logistic Regression Larry Cook

2 Outline
Review the logistic regression model
Explore an example where model assumptions fail
–Brief algebraic interlude
Explore an example with a different issue where logistic regression fails
Computational considerations
Example SAS code

3 Logistic Regression
Model a binary outcome, Y, with one or more predictors
–Success/failure
–Disease/not disease
Model outcome in terms of the log odds of a success
$\log(\text{odds of } Y_i) = \alpha + \beta x_i + \dots$

4 Why Log Odds? Canonical link function Makes a binary outcome continuous Solves this problem –Probability is constrained to [0,1] –Odds are constrained to [0, ∞) Log odds are in (-∞, ∞) Exponentiating coefficients gives us estimates of odds ratios
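The mapping from probability to odds to log odds described on this slide can be sketched in a few lines of Python. The function names `log_odds` and `inverse_log_odds` are illustrative choices, not taken from the presentation.

```python
import math

def log_odds(p):
    """Map a probability in (0, 1) to the log odds (logit), which is unbounded."""
    return math.log(p / (1.0 - p))

def inverse_log_odds(z):
    """Map any real-valued log odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probability lives in [0, 1], odds in [0, inf), log odds in (-inf, inf).
for p in (0.01, 0.5, 0.99):
    odds = p / (1.0 - p)
    print(f"p={p:.2f}  odds={odds:.4f}  log odds={log_odds(p):+.4f}")
```

Because the log odds are unbounded, a linear predictor can take any value and still map back to a valid probability.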

5 Example: Motor Vehicle Crash Fatalities What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not? –Outcome: Hospitalized/killed or not –Covariate: safety belt use

6 Hospital/Killed * Restraint Use OR = 0.22, p-value < 0.001

7 Example: Motor Vehicle Crash Fatalities What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not? –Outcome: Hospitalized/killed or not –Covariates: safety belt use, gender, age, alcohol, rural area

8 Logistic Regression Output
Parameter       Estimate   Odds Ratio   P-value
Intercept       -0.261                  < 0.001
Male            -0.576     0.56         < 0.001
Restraint Use   -1.430     0.24         < 0.001
Alcohol          1.065     2.90         < 0.001
Night            0.194     1.21         0.011
Rural            0.135     1.14         < 0.001
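As a quick check of the "exponentiating coefficients gives odds ratios" point, the odds-ratio column of this output can be recomputed from the estimate column. This is only a sketch; the dictionary name `estimates` is an illustrative choice.

```python
import math

# Coefficient estimates copied from the output table (log odds scale).
estimates = {
    "Male": -0.576,
    "Restraint Use": -1.430,
    "Alcohol": 1.065,
    "Night": 0.194,
    "Rural": 0.135,
}

# Exponentiating a log odds coefficient gives the corresponding odds ratio.
odds_ratios = {name: round(math.exp(b), 2) for name, b in estimates.items()}

for name, ratio in odds_ratios.items():
    print(f"{name:15s} OR = {ratio:.2f}")
```

The rounded values agree with the odds ratios reported in the table.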

9 Assumptions Conditional probabilities follow a logistic function of the independent variables Observations are independent Asymptotics –Sample size is large enough –Minimum of 50 to 100 observations –10 successes/failures per variable

10 Corneal Graft Rejections What if studying a rare disease? Data for eight kids in young age group and eight in the older age group Hypothesis is that rejection is more likely in older children

11 Graft Rejections
                       Young (< 4 y.o.) (X = 0)   Older (> 4 y.o.) (X = 1)   Total
No Rejection (Y = 0)   7                          2                          9
Rejection (Y = 1)      1                          6                          7
Total                  8                          8                          16
OR = 21, p-value = 0.012, 100% of cells have expected counts < 5!!!
Fisher’s Exact Test p-value (2-sided) = 0.0406; (1-sided) = 0.0203
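The odds ratio, the expected cell counts, and both Fisher p-values on this slide can be reproduced with the standard library alone. The sketch below assumes the two-sided Fisher p-value sums the probabilities of all tables no more likely than the observed one, which is a common convention; the variable names are illustrative.

```python
from math import comb

# Observed 2x2 table: rows = rejection status, columns = age group.
young_no, older_no = 7, 2   # no rejection
young_rej, older_rej = 1, 6  # rejection
n_young, n_older = young_no + young_rej, older_no + older_rej
total_rej = young_rej + older_rej
n_total = n_young + n_older

# Sample odds ratio of rejection, older vs. young.
odds_ratio = (older_rej * young_no) / (older_no * young_rej)

# Expected counts under independence: row total * column total / grand total.
row_totals = (young_no + older_no, total_rej)
col_totals = (n_young, n_older)
expected = [r * c / n_total for r in row_totals for c in col_totals]

def table_prob(k):
    """Hypergeometric probability of k rejections in the older group."""
    return comb(n_older, k) * comb(n_young, total_rej - k) / comb(n_total, total_rej)

p_obs = table_prob(older_rej)
one_sided = sum(table_prob(k) for k in range(older_rej, total_rej + 1))
two_sided = sum(table_prob(k) for k in range(total_rej + 1)
                if table_prob(k) <= p_obs * (1 + 1e-9))

print(odds_ratio, expected, round(one_sided, 4), round(two_sided, 4))
```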

12 Let’s Tackle the Graft Rejection Example as Logistic Regression

13 Graft Rejections
               Young (< 4 y.o.)   Older (> 4 y.o.)   Total
No Rejection   7                  2                  9
Rejection      1                  6                  7
Total          8                  8                  16
Sample size << 50!
Don’t have 10 successes or 10 failures!

14 Exact (Conditional) Logistic Regression Rather than using the unconditional logistic regression, we will condition on nuisance parameters Use conditional maximum likelihood for estimation and inference

15 Warning Algebra Ahead Proceed with Caution

16 Logistic Model

17 Likelihood of a Sample

18 Sufficient Statistics

19 Conditioning
If we are only trying to describe the relationship between rejection and age, do we care about the value of the intercept?
Remove the intercept, $\alpha$, from the likelihood by conditioning on its sufficient statistic, $t_0 = \sum y_i$.
Let $S(t_0)$ = set of all tables with $\sum y_i = t_0$ and the observed sample sizes

20 Conditional Likelihood
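The equations for the algebra slides did not survive in this transcript. As a hedged reconstruction, the standard form of the conditional likelihood for a single covariate is sketched below. The symbols $\alpha$ (intercept), $\beta$ (slope), and $c(u)$ are notational assumptions and may differ from those on the original slides.

```latex
% Model: logit P(y_i = 1) = \alpha + \beta x_i, with sufficient statistics
% t_0 = \sum_i y_i (for \alpha) and t_1 = \sum_i x_i y_i (for \beta).
\begin{align*}
L(\alpha, \beta)
  &= \prod_i \frac{\exp\{y_i(\alpha + \beta x_i)\}}{1 + \exp(\alpha + \beta x_i)}
   = \frac{\exp(\alpha t_0 + \beta t_1)}{\prod_i \{1 + \exp(\alpha + \beta x_i)\}} \\[4pt]
P(T_1 = t_1 \mid T_0 = t_0)
  &= \frac{c(t_1)\, e^{\beta t_1}}{\sum_{u} c(u)\, e^{\beta u}}
\end{align*}
% c(u) = number of outcome vectors in S(t_0) with \sum_i x_i y_i = u.
% The intercept \alpha cancels, so the conditional likelihood depends on \beta only.
```

Maximizing the second expression in $\beta$ gives the conditional maximum likelihood estimate, and setting $\beta = 0$ gives the null distribution used for exact p-values.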

21 Estimation

22 Inference

23 End of Algebra Back to Example

24 Graft Rejections
                       Young (< 4 y.o.) (X = 0)   Older (> 4 y.o.) (X = 1)   Total
No Rejection (Y = 0)   7                          2                          9
Rejection (Y = 1)      1                          6                          7
Total                  8                          8                          16
Sufficient Statistics
$t_0 = \sum y_i$ = # of rejections = 7
$t_1 = \sum x_i y_i$ = 0*(# of rejections in young) + 1*(# of rejections in old) = 0*1 + 1*6 = 6

25 Conditional Distribution for Graft Rejection Need to calculate all possible tables that have exactly 7 rejections Calculate how often each of the tables occur Calculate CMLE Calculate how rare our table is to obtain p-value

26 Reference Set
Yng_NR   Yng_R   Old_NR   Old_R   t0   t1   Count    P[Table]
1        7       8        0       7    0    8        0.0007
2        6       7        1       7    1    224      0.0196
3        5       6        2       7    2    1,568    0.1371
4        4       5        3       7    3    3,920    0.3427
5        3       4        4       7    4    3,920    0.3427
6        2       3        5       7    5    1,568    0.1371
7        1       2        6       7    6    224      0.0196
8        0       1        7       7    7    8        0.0007
Total                             7         11,440   1.000
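The reference set above can be generated directly: for each possible number of rejections in the older group, the count is a product of two binomial coefficients. This is a minimal sketch; the tuple layout of `rows` is an illustrative choice.

```python
from math import comb

n_young, n_old, t0 = 8, 8, 7  # group sizes and total number of rejections

rows = []
for old_rej in range(t0 + 1):
    young_rej = t0 - old_rej
    if young_rej > n_young or old_rej > n_old:
        continue  # not a valid table for these group sizes
    # Number of outcome vectors that collapse to this table.
    count = comb(n_young, young_rej) * comb(n_old, old_rej)
    # With X = 0 for young and X = 1 for older, t1 equals old_rej.
    rows.append((n_young - young_rej, young_rej, n_old - old_rej, old_rej, old_rej, count))

total = sum(row[-1] for row in rows)
for yng_nr, yng_r, old_nr, old_r, t1, count in rows:
    print(yng_nr, yng_r, old_nr, old_r, t0, t1, count, round(count / total, 4))
print("total", total)
```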

27 Estimate $\beta$ and Find a p-value
t1   Count   P[Table]
0    8       0.0007
1    224     0.0196
2    1,568   0.1371
3    3,920   0.3427
4    3,920   0.3427
5    1,568   0.1371
6    224     0.0196
7    8       0.0007

28 Estimate and p-value
t1   Count   P[Table]
0    8       0.0007
1    224     0.0196
2    1,568   0.1371
3    3,920   0.3427
4    3,920   0.3427
5    1,568   0.1371
6    224     0.0196
7    8       0.0007
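Given the counts in this table, the conditional maximum likelihood estimate and the exact p-values can be sketched as follows. The code assumes the conditional probability of each value of t1 is proportional to count × exp(β·t1), and it finds the estimate by bisection on the conditional mean, which is one simple option rather than the method any particular package uses. Function names are illustrative.

```python
import math

counts = {0: 8, 1: 224, 2: 1568, 3: 3920, 4: 3920, 5: 1568, 6: 224, 7: 8}
t1_obs = 6  # observed sufficient statistic for the age effect

def cond_probs(beta):
    """Conditional distribution of t1 given t0 at a given beta."""
    weights = {u: c * math.exp(beta * u) for u, c in counts.items()}
    norm = sum(weights.values())
    return {u: w / norm for u, w in weights.items()}

def cond_mean(beta):
    return sum(u * p for u, p in cond_probs(beta).items())

def cmle(t1, lo=-20.0, hi=20.0, tol=1e-10):
    """Solve E[T1 | beta] = t1 by bisection (the mean increases with beta)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cond_mean(mid) < t1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

beta_hat = cmle(t1_obs)

# Exact p-values use the conditional distribution at beta = 0.
null = cond_probs(0.0)
p_one_sided = sum(p for u, p in null.items() if u >= t1_obs)
p_two_sided = sum(p for p in null.values() if p <= null[t1_obs] * (1 + 1e-9))

print(beta_hat, math.exp(beta_hat), p_one_sided, p_two_sided)
```

The exact p-values coincide with the Fisher's exact test values quoted earlier for the same table, since both come from the same conditional distribution.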

29 Confidence Interval
Lower Bound, $\beta_-$
If $t_1 = t_{1,\min}$
–$\beta_- = -\infty$
Otherwise
–$\beta_-$ is the value of $\beta$ that produces an upper p-value of $\alpha/2$
Upper Bound, $\beta_+$
If $t_1 = t_{1,\max}$
–$\beta_+ = \infty$
Otherwise
–$\beta_+$ is the value of $\beta$ that produces a lower p-value of $\alpha/2$
(Here $\alpha$ is the significance level.)
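The confidence-interval rule on this slide can be sketched for the graft rejection counts. The code below assumes a 95% interval (significance level 0.05) and uses plain bisection; the helper names `tail` and `bisect_increasing` are hypothetical.

```python
import math

counts = [8, 224, 1568, 3920, 3920, 1568, 224, 8]  # index = value of t1
t1_obs = 6
level = 0.05  # significance level, so each tail gets level / 2
t1_min, t1_max = 0, len(counts) - 1

def tail(beta, first, last):
    """P(first <= T1 <= last | beta) under the conditional distribution."""
    weights = [c * math.exp(beta * u) for u, c in enumerate(counts)]
    return sum(weights[first:last + 1]) / sum(weights)

def upper_p(beta):
    return tail(beta, t1_obs, t1_max)  # P(T1 >= observed), increasing in beta

def lower_p(beta):
    return tail(beta, t1_min, t1_obs)  # P(T1 <= observed), decreasing in beta

def bisect_increasing(f, target, lo=-30.0, hi=30.0, tol=1e-10):
    """Find beta with f(beta) = target for an increasing function f."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Infinite bounds apply when the observed statistic sits at an extreme.
beta_lower = (-math.inf if t1_obs == t1_min
              else bisect_increasing(upper_p, level / 2))
beta_upper = (math.inf if t1_obs == t1_max
              else bisect_increasing(lambda b: -lower_p(b), -level / 2))

print(beta_lower, beta_upper)
```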

30 Final Stats for Graft Rejection

31 Example 2 PECARN C-Spine Study

32 Case Control Study
          Not Present   Present   Total
Control   1,057         2         1,059
Case      540           0         540
Total     1,597         2         1,599
Any problems estimating the odds ratio?
Could exact logistic regression help?

33 What sufficient statistics are needed?
                  Not Present (X = 0)   Present (X = 1)   Total
Control (Y = 0)   1,057                 2                 1,059
Case (Y = 1)      540                   0                 540
Total             1,597                 2                 1,599
$\sum y = 2$
$\sum xy = 0$

34 Conditional Density
Case P   Case NP   Ctrl P   Ctrl NP   t0   t1   Count       P[Table]
0        540       2        1,057     2    0    560,211     0.438
1        539       1        1,058     2    1    571,860     0.448
2        538       0        1,059     2    2    145,530     0.114
Total                                 2         1,277,601   1.000
One-sided p-value = 0.438
Two-sided p-value = 2*0.438 = 0.876
95% confidence interval (-∞, 2.345)
Point estimate?
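The conditional density and the finite upper confidence bound on this slide can be approximated with a short sketch. It assumes the upper bound is the β at which the lower p-value equals 0.025 and solves for it by bisection; variable names are illustrative.

```python
import math
from math import comb

n_case, n_ctrl, t0 = 540, 1059, 2  # group sizes and number with the finding present
t1_obs = 0                          # cases with the finding present

# Count of outcome vectors for each possible number of cases with the finding.
counts = [comb(n_case, k) * comb(n_ctrl, t0 - k) for k in range(t0 + 1)]
total = sum(counts)
probs = [c / total for c in counts]

# The observed t1 is at its minimum, so the lower bound is -infinity
# and the one-sided p-value is the probability of the observed table.
p_one_sided = sum(probs[: t1_obs + 1])

def lower_p(beta):
    """P(T1 <= observed | beta); decreasing in beta."""
    weights = [c * math.exp(beta * u) for u, c in enumerate(counts)]
    return sum(weights[: t1_obs + 1]) / sum(weights)

lo, hi = -30.0, 30.0
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    if lower_p(mid) > 0.025:
        lo = mid
    else:
        hi = mid
beta_upper = (lo + hi) / 2

print(counts, total, round(p_one_sided, 3), beta_upper)
```

The counts and probabilities match the table, and the solved upper bound lands close to the 2.345 shown on the slide; small differences in the last digit may reflect rounding or the solver used.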

35 Median Unbiased Estimate

36 One More Example Dose Response

37 Toxicology Experiment
Dose    0     1     2     3     Total
Lived   99    97    95    90    381
Died    1     3     5     10    19
Total   100   100   100   100   400
400 mice randomized to one of four levels of a drug
Drug administered to each animal
Outcome is the number of deaths in each dose level
$\sum y = 19$
$\sum xy = 3 + 10 + 30 = 43$
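The two sufficient statistics on this slide follow directly from the grouped data. A minimal sketch, with illustrative list names:

```python
doses = [0, 1, 2, 3]       # dose level, used as the covariate x
died = [1, 3, 5, 10]       # deaths per dose level (sum of y within the group)
lived = [99, 97, 95, 90]   # survivors per dose level

t0 = sum(died)                                 # sufficient statistic for the intercept
t1 = sum(x * d for x, d in zip(doses, died))   # sufficient statistic for the dose slope

print(t0, t1)
```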

39 Exact vs. Unconditional
           Exact          Unconditional
Estimate   0.710          0.712
SE         0.246          0.246
OR         2.03           2.04
CI         (1.26, 3.52)   (1.26, 3.30)
p-value    0.002          0.004

40 Computational Issues

41 Counting All the Tables
One of the main hurdles for conditional logistic regression is counting all the tables in the sample space
–Graft rejections: 11,440 possibilities
–PECARN C-Spine: 1,277,601
–Toxicology: 2.79 × 10^33
Obviously we don’t want to generate tables one at a time
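The first two sizes quoted on this slide equal the number of ways to place the observed successes among all subjects, that is, a single binomial coefficient. The sketch below assumes that same counting rule applies to the toxicology data.

```python
from math import comb

# Ways to place t0 successes among N subjects: C(N, t0).
graft = comb(16, 7)        # 7 rejections among 16 children
pecarn = comb(1599, 2)     # 2 subjects with the finding among 1,599
tox_19 = comb(400, 19)     # 19 deaths among 400 mice
tox_20 = comb(400, 20)     # the same formula with 20 deaths, for comparison

print(graft, pecarn, f"{tox_19:.3e}", f"{tox_20:.3e}")
```

With 19 deaths this rule gives roughly 1.5 × 10^32, while the slide's 2.79 × 10^33 is what it gives for 20 deaths, so one of the two toxicology figures may contain a slip. Either way the count is far too large to enumerate table by table.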

42 Network Algorithm Graphical representation of the sample space Nodes represent a partial sum of the sufficient statistic Arcs have combinatorial weighting value One path through the graph represents a table in the sample space

43 Example
        X = 1   X = 2   X = 3   X = 4   Total
Y = 0   3       2       2       1       8
Y = 1   0       1       1       2       4
Total   3       3       3       3       12
Sufficient Statistics
$t_0 = \sum y_i = 4$
$t_1 = \sum x_i y_i$ = 1*0 + 2*1 + 3*1 + 4*2 = 13

44
(1,0) (2,0)
(1,1) (2,1) (3,1)
(0,0) (1,2) (2,2) (3,2) (4,4)
(1,3) (2,3) (3,3)
(2,4) (3,4)
        X = 1   X = 2   X = 3   X = 4   Total
Y = 0   1       3       1       3       8
Y = 1   2       0       2       0       4

45
(1,0) (2,0)
(1,1) (2,1) (3,1)
(0,0) (1,2) (2,2) (3,2) (4,4)
(1,3) (2,3) (3,3)
(2,4) (3,4)
        X = 1   X = 2   X = 3   X = 4   Total
Y = 0   3       2       2       1       8
Y = 1   0       1       1       2       4

46 Network Representation of the Sample Space
(1,0) (2,0)
(1,1) (2,1) (3,1)
(0,0) (1,2) (2,2) (3,2) (4,4)
(1,3) (2,3) (3,3)
(2,4) (3,4)
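The idea behind the network representation, processing one covariate group at a time and merging paths that reach the same partial sums, can be sketched as a small dynamic program. This is an illustration in the spirit of the network algorithm and is not claimed to be the implementation in LogXact or SAS; the function name `conditional_counts` is hypothetical, and the state here tracks both partial sums rather than only the one shown in the node labels.

```python
from collections import defaultdict
from math import comb

def conditional_counts(x, n, t0):
    """Return {t1: count} over all tables with sum(y) == t0.

    x[k] is the covariate value of group k and n[k] its size.
    Each state is (partial sum of y, partial sum of x*y); the stored
    weight is the number of outcome vectors that reach that state.
    """
    layer = {(0, 0): 1}
    for k, (xk, nk) in enumerate(zip(x, n)):
        remaining = sum(n[k + 1:])  # subjects still to be processed
        nxt = defaultdict(int)
        for (s, t), weight in layer.items():
            for yk in range(nk + 1):
                s2 = s + yk
                # Prune arcs that can no longer end at sum(y) == t0.
                if s2 > t0 or t0 - s2 > remaining:
                    continue
                nxt[(s2, t + xk * yk)] += weight * comb(nk, yk)
        layer = nxt
    return {t: w for (s, t), w in sorted(layer.items())}

# Four groups of size 3 with X = 1..4 and four successes in total.
example = conditional_counts([1, 2, 3, 4], [3, 3, 3, 3], 4)
print(example)
```

Calling `conditional_counts([0, 1], [8, 8], 7)` reproduces the graft rejection counts without listing tables one at a time.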

47 What About Multiple Covariates? More Conditioning!

48 Osteogenic Sarcoma
LogXact Manual
46 patients surgically treated for osteogenic sarcoma and then observed for disease recurrence within 3 years
Covariates
–Sex: Male = 1, Female = 0
–Any Osteoid Pathology (AOP): Present = 1, not = 0
Interested in the effect of AOP

49 Osteogenic Sarcoma
Covariate Group   No Recurrence (y = 0)   Recurrence (y = 1)   Group Size (n_i)   Sex (x_1)   AOP (x_2)
1                 8                       0                    8                  0           0
2                 5                       2                    7                  0           1
3                 9                       4                    13                 1           0
4                 7                       11                   18                 1           1
Total             29                      17                   46

50 Estimating the Effect of AOP
New statistics to condition on
–Group sizes
–Sufficient statistic for intercept, $\sum y = 17$
–Sufficient statistic for coefficient for sex, $\sum x_1 y = 15$
Calculate the conditional distribution of $\sum x_2 y$
–Sufficient statistic for coefficient for AOP
–Number of cases with AOP in recurrence (= 13)
–Given exactly 17 with recurrence, 15 of which are males

51 Network Algorithm
The network algorithm uses two passes
–First pass conditions on the intercept: keeps all tables with exactly 17 cases in recurrence
–Second pass removes arcs that don’t produce the sufficient statistic for sex: drops all tables that don’t have 15 males in recurrence
Proceed with estimation & inference as before

52 $P[\sum x_2 y = t_2 \mid 17 \text{ in recurrence and } 15 \text{ males}]$
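For a data set this small, the conditional distribution of the AOP statistic can be obtained by brute-force enumeration instead of the network algorithm. The sketch below assumes the four covariate groups from the osteogenic sarcoma table and conditions on 17 recurrences, 15 of them male; variable names are illustrative.

```python
from itertools import product
from math import comb, prod

# (group size n_i, sex x1, AOP x2) for the four covariate groups.
groups = [(8, 0, 0), (7, 0, 1), (13, 1, 0), (18, 1, 1)]
t0, t_sex, t2_obs = 17, 15, 13

dist = {}
for ys in product(*(range(n + 1) for n, _, _ in groups)):
    if sum(ys) != t0:
        continue  # condition on the intercept's sufficient statistic
    if sum(y * x1 for y, (_, x1, _) in zip(ys, groups)) != t_sex:
        continue  # condition on the sufficient statistic for sex
    t2 = sum(y * x2 for y, (_, _, x2) in zip(ys, groups))
    weight = prod(comb(n, y) for y, (n, _, _) in zip(ys, groups))
    dist[t2] = dist.get(t2, 0) + weight

total = sum(dist.values())
probs = {t2: w / total for t2, w in sorted(dist.items())}

# Exact one-sided p-value for the AOP effect at beta_2 = 0.
p_upper = sum(p for t2, p in probs.items() if t2 >= t2_obs)
print(probs, p_upper)
```

Only the distribution of the AOP statistic is left after conditioning, so estimation and inference proceed as in the single-covariate case.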

53 Results

54 LR Test for Both Variables
To test whether the coefficients for sex and AOP are both zero simultaneously, we need the joint conditional density
–All possible combinations of males and patients with AOP in recurrence given exactly 17 patients in recurrence
–Determine how rare it is to have 15 recurrent males AND 13 recurrent AOP patients

55 SAS Examples

56 Conclusion
Exact (conditional) logistic regression
–Useful method when asymptotic assumptions are not met or with separation
–Utilizes conditioning to remove nuisance parameters from the likelihood
–Very computationally intensive method
–Network algorithm speeds up calculations

57 Questions?

