
STT520-420: Biostatistics Analysis, Dr. Cuixian Chen


1 STT520-420: Biostatistics Analysis Dr. Cuixian Chen
Chapter 8: Fitting Parametric Regression Models

2 Fitting Parametric regression model
Relationship between a survival variable Y and an explanatory variable X. We consider a regression model with Y = survival r.v. (the dependent/response variable) and X = covariate (also called the regressor or independent/explanatory variable). For example, for heart transplant surgery, Y = survival time following heart transplant. Write XT = (X1, …, Xp) when there are p explanatory variables (XT means the transpose of X). Write Yx for the response Y when X=x.

3 Fitting Parametric regression model
Two parametric regression models: the proportional hazards model (PHM) and the accelerated lifetime, or accelerated failure time, model (AFT). Objective: check model suitability, run diagnostics, and interpret the results. The two models coincide only when Y ~ Weibull(α, β); otherwise we need to choose between the proportional hazards model and the accelerated lifetime model.

4 Proportional Hazards Model
Recall: Yx denotes the response Y when X=x. Q: What effect do the covariates have on the hazard function, and how should this effect be modeled? In the proportional hazards model, covariates have a multiplicative effect on a basic hazard function called the baseline hazard. Let hx(y) denote the hazard function of Y when x is the vector of observed covariate values.

5 Proportional Hazards Model
Def 8.1: Let Yx denote the response depending on an observed vector X=x. A proportional hazards model for Yx is hx(y)=h0(y)*g1(x), where g1 is a positive function of x with g1(x) ≥ 0 and g1(0) = 1, and h0(y) is called the baseline hazard: it is the hazard function for an individual having g1(x) = 1. Often g1(x)=exp(β1x1+…+βpxp), so that hx(y)=h0(y)*exp(β1x1+…+βpxp).

6 Proportional Hazards Model
Note how the “proportional” enters the picture (see p. 144 for definitions): take the ratio of the hazards for two different individuals, distinguished by the values x1 and x2 that the explanatory variables take on for them. Then hx1(y)/hx2(y) = g1(x1)/g1(x2); note that the “baseline” hazard cancels out. Simplest case: g1(x)=exp(β1x1+…+βpxp), where g1(x) ≥ 0 and g1(0) = 1, and the baseline hazard occurs when x=0. The process of fitting this model follows the usual process of finding the best estimates of β1, …, βp.

7 Proportional Hazards Model (PHM)
Standard PHM in Def. 8.1 becomes: hx(y)=h0(y)*exp(β1x1+…+βpxp). The baseline hazard is obtained when x=0 (all covariates = 0). Next, estimate β using the given responses and covariates. NOTE: hx(y) is the product of two functions: the baseline hazard h0(y), which doesn’t involve the covariates, and a second factor, which doesn’t involve the survival time y. This is called the Cox PHM. Good estimates of β and of the hazard and survival curves can be obtained in many different and varied situations; i.e., this model is very robust. It is called semiparametric since we don’t have to assume a particular model for the baseline survival function (that part is distribution-free).

8 Example 8.1, page 145. PHM with a group membership covariate: there is only one covariate, namely the “group” effect (usually control group: x=0 and treatment group: x=1). The proportional hazard (or the hazard ratio) is h1(y)/h0(y) = exp(β), where h1(y) is the hazard for x=1. So, if we could get an estimate of β (call it β-hat), we could then have an estimate of the hazard ratio between two individuals in the two groups, i.e., exp(β-hat), so we could say that the hazard in the treatment group is estimated to be exp(β-hat) times the hazard in the control group.

9 Proportional Hazards Model
Note on page 145 in (8.3) that the proportional hazards model has a so-called “power” effect on the baseline survival function: Sx(y) = [S0(y)]^g1(x). Example 8.1 shows the effect of a single covariate X=group: S1(y) = [S0(y)]^exp(β). Notice also that the ratio of two hazards cancels out the baseline hazard and leaves a function that is constant over time.
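The power effect can be checked numerically. The sketch below is illustrative only: it borrows the baseline hazard h0(y)=2y from Example #1, while the coefficient beta=0.7, the evaluation point y=1.3, and all function names are assumptions made for this sketch, not values from the slides.

```python
import math

def cumulative_hazard(hazard, y, steps=10_000):
    """Trapezoid-rule approximation of the integral of hazard(t) over [0, y]."""
    width = y / steps
    total = 0.5 * (hazard(0.0) + hazard(y))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width

def baseline_hazard(y):
    # Illustrative baseline hazard h0(y) = 2y (an assumption for this sketch).
    return 2.0 * y

beta = 0.7                 # hypothetical regression coefficient
x = 1.0                    # covariate value, e.g. treatment group
g1 = math.exp(beta * x)    # multiplicative effect g1(x) = exp(beta * x)

def ph_hazard(y):
    # Proportional hazards: hx(y) = h0(y) * g1(x).
    return baseline_hazard(y) * g1

y = 1.3
s0 = math.exp(-cumulative_hazard(baseline_hazard, y))   # baseline survival S0(y)
sx = math.exp(-cumulative_hazard(ph_hazard, y))         # survival under the PHM

print(sx, s0 ** g1)                             # the two values agree
print(ph_hazard(0.5) / baseline_hazard(0.5))    # hazard ratio at y = 0.5
print(ph_hazard(2.0) / baseline_hazard(2.0))    # same ratio at y = 2.0
```

The last two lines print the same number at different times, which is the "constant over time" property of the hazard ratio.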

10 Leukemia Remission Time with PHM and SAS
SAS has a procedure that easily estimates the β’s in the proportional hazards model. With the PHM, use SAS code to estimate the β’s in the remission times data.

11 PROC PHREG in SAS
SAS has a procedure that easily estimates the β’s in the proportional hazards model, for example in the remission times data:
proc phreg data=remission;
  model remtime*censor(0)=grp;
run;
/* or if we put a second covariate in */
proc phreg data=remission;
  model remtime*censor(0)=grp logWBC;
run;
/* note the use of the numeric variable grp, defined as grp=1 if group='pl' and 0 otherwise */

12 Example #1 to PHM
Suppose that the baseline hazard function is h0(y)=2y for all y. (a) Compute the baseline survival function. (b) If Yx satisfies the proportional hazards model and g1(x)=exp(5x), find the hazard function for Yx, that is, hx(y). (c) Under the assumptions from (b), what is the survival function of Yx?
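One way to work the three parts, offered as a sketch rather than the textbook's official solution. It assumes y ≥ 0 and uses the standard relation S(y) = exp(-∫ h(t) dt over [0, y]); the symbol H_0 for the cumulative baseline hazard is introduced here for convenience.

```latex
\begin{align*}
\text{(a)}\quad H_0(y) &= \int_0^y 2t\,dt = y^2, & S_0(y) &= e^{-H_0(y)} = e^{-y^2},\\
\text{(b)}\quad h_x(y) &= h_0(y)\,g_1(x) = 2y\,e^{5x},\\
\text{(c)}\quad S_x(y) &= \exp\!\Big(-\int_0^y 2t\,e^{5x}\,dt\Big)
                        = \exp\!\big(-y^2 e^{5x}\big)
                        = \big[S_0(y)\big]^{e^{5x}}.
\end{align*}
```

Part (c) is an instance of the power effect: the baseline survival function is raised to the power g1(x).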

13 Example #2 to PHM
Suppose that the baseline hazard function is h0(y)=(2/9)*y^2 for all y. (a) Compute the baseline survival function. (b) If Yx satisfies the proportional hazards model and g1(x)=exp(7x), find the hazard function for Yx, that is, hx(y). (c) Under the assumptions from (b), what is the survival function of Yx?

14 Accelerated Lifetimes Model
Def 8.2: The accelerated lifetime model is Yx*g2(x)=Y0, where g2(x)>=0 and g2(0)=1. That is, the covariates act directly on lifetime, so as to speed it up or retard its progress. Often g2(x)=exp[-(β1x1+…+βpxp)]; that is, Yx=Y0*exp[β1x1+…+βpxp]. We can show that the accelerated lifetime model satisfies: Sx(y)=S0(y*g2(x)) and hx(y)=g2(x)*h0(y*g2(x)).
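The survival and hazard functions implied by Def 8.2 can be derived in two lines. This is a sketch that assumes g2(x) is strictly positive (so the inequality can be rescaled) and that S_0 is differentiable; S_0 and h_0 denote the survival and hazard functions of Y_0.

```latex
\begin{align*}
S_x(y) &= P(Y_x > y)
        = P\big(Y_x\,g_2(x) > y\,g_2(x)\big)
        = P\big(Y_0 > y\,g_2(x)\big)
        = S_0\big(y\,g_2(x)\big),\\
h_x(y) &= -\frac{d}{dy}\log S_x(y)
        = g_2(x)\,h_0\big(y\,g_2(x)\big).
\end{align*}
```

Unlike the proportional hazards model, the covariates here rescale the time axis inside the baseline functions.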

15 Accelerated Lifetimes Model
Def 8.2: The accelerated lifetime model is Yx*g2(x)=Y0, where g2(x)>=0 and g2(0)=1. Often g2(x)=exp[-(β1x1+…+βpxp)]. Recall: A proportional hazards model for Yx is hx(y)=h0(y)*g1(x), where g1(x)>=0 and g1(0)=1. Often g1(x)=exp(β1x1+…+βpxp), so that hx(y)=h0(y)*exp(β1x1+…+βpxp).

16 Example #1 to AFT
Suppose that the baseline hazard function is h0(y)=4y for all y. (a) Compute the baseline survival function. (b) If Yx satisfies the accelerated lifetime model and g2(x)=exp(2x), find the hazard function for Yx, that is, hx(y). (c) Under the assumptions from (b), what is the survival function of Yx?
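A possible worked solution, given as a sketch and not as the textbook's answer. It assumes y ≥ 0 and uses two consequences of Def 8.2 (Yx*g2(x)=Y0): S_x(y) = S_0(y g_2(x)) and h_x(y) = g_2(x) h_0(y g_2(x)). The symbol H_0 for the cumulative baseline hazard is introduced here.

```latex
\begin{align*}
\text{(a)}\quad H_0(y) &= \int_0^y 4t\,dt = 2y^2, & S_0(y) &= e^{-2y^2},\\
\text{(b)}\quad h_x(y) &= g_2(x)\,h_0\big(y\,g_2(x)\big)
                        = e^{2x}\cdot 4\,y\,e^{2x} = 4y\,e^{4x},\\
\text{(c)}\quad S_x(y) &= S_0\big(y\,e^{2x}\big) = \exp\!\big(-2y^2 e^{4x}\big).
\end{align*}
```

Because the baseline hazard is linear in y, the factor e^{2x} enters twice, once from rescaling time and once from the chain rule.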

17 Example #2 to AFT
Suppose that the baseline hazard function is h0(y)=3y for all y. (a) Compute the baseline survival function. (b) If Yx satisfies the accelerated lifetime model and g2(x)=exp(5x), find the hazard function for Yx, that is, hx(y). (c) Under the assumptions from (b), what is the survival function of Yx?
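A numerical check of this example can be sketched in Python. It relies on the relations Sx(y)=S0(y*g2(x)) and hx(y)=g2(x)*h0(y*g2(x)) implied by Def 8.2; the evaluation point (x=0.2, y=0.4) and the function names are arbitrary choices for illustration, and the closed forms being checked are one possible solution rather than the textbook's stated answer.

```python
import math

def baseline_hazard(y):
    # h0(y) = 3y, as given in the example.
    return 3.0 * y

def baseline_survival(y):
    # S0(y) = exp(-integral of 3t over [0, y]) = exp(-1.5 * y^2).
    return math.exp(-1.5 * y * y)

def aft_survival(y, x):
    # Accelerated lifetime model: Sx(y) = S0(y * g2(x)) with g2(x) = exp(5x).
    g2 = math.exp(5.0 * x)
    return baseline_survival(y * g2)

def aft_hazard(y, x):
    # Accelerated lifetime model: hx(y) = g2(x) * h0(y * g2(x)).
    g2 = math.exp(5.0 * x)
    return g2 * baseline_hazard(y * g2)

x, y = 0.2, 0.4   # arbitrary evaluation point

# Candidate closed forms: hx(y) = 3y*exp(10x), Sx(y) = exp(-1.5*y^2*exp(10x)).
closed_hazard = 3.0 * y * math.exp(10.0 * x)
closed_survival = math.exp(-1.5 * y * y * math.exp(10.0 * x))

# Independent check: the hazard is minus the derivative of log survival.
eps = 1e-6
numeric_hazard = -(math.log(aft_survival(y + eps, x))
                   - math.log(aft_survival(y - eps, x))) / (2.0 * eps)

print(aft_hazard(y, x), closed_hazard, numeric_hazard)
print(aft_survival(y, x), closed_survival)
```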

18 Examples to PHM/AFT, --- EX8.1, page 159
For the following baseline hazard function: (a) sketch the baseline hazard function; (b) sketch the proportional hazards function obtained by multiplying by a factor g1(x)=2; (c) compare this, on the same graph, with the accelerated lifetime hazard, which is accelerated by a factor g2(x)=2; (d) find the baseline survival function; (e) find the survival function for case (b); (f) find the survival function for case (c).

19 Examples to PHM/AFT, --- Extra EX
For the following baseline hazard function: (a) sketch the baseline hazard function; (b) sketch the proportional hazards function obtained by multiplying by a factor g1(x)=2; (c) compare this, on the same graph, with the accelerated lifetime hazard, which is accelerated by a factor g2(x)=2; (d) find the baseline survival function; (e) find the survival function for case (b); (f) find the survival function for case (c).

20 Proportional Hazards for Weibull data
For Y ~ Weibull(, the survival function is: Covariates are usually introduced into the Weibull model in the position of the scale parameter . Here we assume (x)(x), where (x)0.

21 Accelerated lifetime for Weibull data
If Yx is Weibull( ) , subjecting to Type I censoring, then it follows Where Then AFT for Weibull data gives: ~ ~

22 Recall: Weibull Prob Plots (chap4)
Recall the power hazard model: … Substitute …, then take the logarithm: … Take the base-10 logarithm again to create the log-life variable: … We now have: … Note: base-10 logs are traditionally used for lifetime data, but the natural log is equivalent up to a difference of scale.

23 Recall: Weibull Prob Plots (chap4)
If the data fit the Weibull model, the Weibull probability plot follows a straight line with slope … and intercept …

24 PROC LIFEREG in SAS
Yx can assume the following distributions: Weibull, exponential, gamma, log-logistic, and log-normal. E.g.: “/ dist=weibull”. Note: all AFT models are named for the distribution of Yx, not of log(Yx) or epsilon. However, the choice of model can make a substantial difference. Graphical methods for evaluating model fit: If Yx ~ Exp(β), then the plot of -log S(y) against y should be a straight line through the origin. If Yx ~ Weibull(α, β), then the plot of log[-log S(y)] against log(y) should be a straight line. In PROC LIFETEST, plots=(ls, lls) gives both plots.
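The Weibull straight-line property can be illustrated with a few lines of Python. This is a sketch under an assumed parametrization, S(y) = exp(-(y/alpha)^beta) with alpha as scale and beta as shape; the slides' own Weibull convention is not visible in the transcript, and the parameter values and time points below are hypothetical. With real data, S(y) would be replaced by an estimate such as the Kaplan-Meier curve, so the points would only be approximately linear.

```python
import math

alpha, beta = 20.0, 1.5   # hypothetical scale and shape parameters

def weibull_survival(y):
    # Assumed parametrization: S(y) = exp(-(y / alpha) ** beta).
    return math.exp(-((y / alpha) ** beta))

times = [2.0, 5.0, 10.0, 20.0, 40.0]

# Coordinates of the "lls" plot: log(y) on the x-axis, log(-log S(y)) on the y-axis.
points = [(math.log(y), math.log(-math.log(weibull_survival(y)))) for y in times]

# Slopes between consecutive points; a straight line gives identical slopes.
slopes = [
    (points[i + 1][1] - points[i][1]) / (points[i + 1][0] - points[i][0])
    for i in range(len(points) - 1)
]
print(slopes)   # every slope is (numerically) equal to beta
```

Under this parametrization, log[-log S(y)] = beta*log(y) - beta*log(alpha), so the slope recovers the shape parameter and the intercept carries the scale.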

25 About Dataset Recid
This dataset has 432 male inmates who were released from Maryland state prisons. They were followed for a year after their release, and the dates of arrests were recorded. The variables are defined as:
week = week of the first arrest after release from prison. This is the event-time variable.
arrest = 1 for those arrested in the year, 0 for those not arrested in the year. This is the censoring indicator (0 = censored).
fin = 1 if the inmate received financial aid after release, 0 otherwise.
age = age in years at the time of release.
race = 1 if the inmate was black, 0 otherwise.
wexp = 1 if the inmate had full-time work experience before incarceration, 0 otherwise.
mar = 1 if the inmate was married at the time of release, 0 otherwise.
paro = 1 if the inmate was released on parole, 0 otherwise.
prio = number of convictions the inmate had prior to incarceration.
educ = education level (not sure about the levels here...).
emp1-emp52 = 0-1 variables indicating whether the inmate was employed during each week after release.

26 PROC LIFEREG in SAS with Exponential
/* With covariates, the PROBPLOT statement produces nonparametric estimates of the survivor function using a modified Kaplan-Meier method that adjusts for covariates. */
proc lifereg data=recid;
  model week*arrest(0)=fin age race wexp mar paro prio / dist=exponential;
  probplot;
  title "Lifereg Exponential";
run; quit;

27 PROC LIFEREG in SAS with Weibull
/* With covariates, the PROBPLOT statement produces nonparametric estimates of the survivor function using a modified Kaplan-Meier method that adjusts for covariates. */
proc lifereg data=recid;
  model week*arrest(0)=fin age race wexp mar paro prio / dist=weibull;
  probplot;
  title "Lifereg Weibull";
run; quit;
/* Log Likelihood = */

28 PROC LIFEREG with Weibull and NO covariates
/* Null model with NO covariates */
proc lifereg data=recid;
  model week*arrest(0)= / dist=weibull;
  probplot;
  title "Lifereg Weibull";
run; quit;
/* Log Likelihood = */

29 PROC LIFEREG in SAS with Weibull: Hypothesis Testing #1
For the likelihood ratio test: likelihood ratio = -2*(logL(null) - logL(full)) = -2*(… - (…)) = …, where the null model has no covariates. Since the covariate vector (fin age race wexp mar paro prio) has 7 dimensions, the chi-square test has 7 degrees of freedom. It follows that the p-value is less than … You can use the following command in R to compute the p-value: 1-pchisq(…, 7). Therefore, we reject the null hypothesis and conclude that at least one of the coefficients is nonzero.
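The same calculation can be sketched in Python instead of R. The log-likelihood values below are hypothetical placeholders, not the values from the SAS output (which are not preserved in this transcript), and the helper names are invented for this example. The chi-square tail probability uses the classical closed-form series for odd degrees of freedom, so it needs only the standard library; it stands in for R's 1-pchisq.

```python
import math

def chi2_sf_odd_df(stat, df):
    """Upper-tail chi-square probability P(X > stat) for odd df (stat >= 0)."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this sketch handles odd degrees of freedom only")
    tail = math.erfc(math.sqrt(stat / 2.0))          # exact answer for df = 1
    series = 0.0
    odd_product = 1.0                                 # 1 * 3 * ... * (2j - 1)
    for j in range(1, (df - 1) // 2 + 1):
        odd_product *= 2 * j - 1
        series += stat ** (j - 0.5) / odd_product
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-stat / 2.0) * series

def likelihood_ratio_test(loglik_null, loglik_full, df):
    """Return the LR statistic -2*(logL(null) - logL(full)) and its p-value."""
    stat = -2.0 * (loglik_null - loglik_full)
    return stat, chi2_sf_odd_df(stat, df)

# Hypothetical log-likelihoods, for illustration only.
loglik_null = -335.0   # Weibull model with no covariates
loglik_full = -320.0   # Weibull model with the 7 covariates

stat, p_value = likelihood_ratio_test(loglik_null, loglik_full, df=7)
print(stat, p_value)
```

A quick sanity check on the helper: the 5% critical value of a chi-square with 7 degrees of freedom is about 14.067, and the function returns roughly 0.05 there.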

30 PROC LIFEREG in SAS with Weibull Hypothesis Testing #2
/* Wald test */ Wald statistics for testing whether a single coefficient is zero are simple to calculate. The statistic is defined as (beta-hat / SE(beta-hat))^2. For example, for fin we have estimate = … with SE = …; then the Wald statistic = (… / …)^2 = 3.8906 with 1 degree of freedom. You can use the following command in R to compute the p-value: 1-pchisq(3.8906, 1) = … Therefore, we reject the null hypothesis (though not very confidently, since the p-value is so close to 0.05) and conclude that the coefficient is nonzero. /* Score test */ The score test is not straightforward to obtain in PROC LIFEREG, so we skip it here.
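For readers without R at hand, the Wald p-value can be sketched in Python using only the standard library. With 1 degree of freedom, the chi-square upper tail reduces to the complementary error function, which is what this stand-in for 1-pchisq(·, 1) uses. The statistic 3.8906 is the one quoted above; the function name is invented for this example.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square with 1 df: P(X > stat)."""
    # If Z is standard normal, Z^2 is chi-square(1), so
    # P(Z^2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

wald_stat = 3.8906                  # Wald statistic for fin, from the slide
p_value = chi2_sf_1df(wald_stat)
print(p_value)                      # just under 0.05
```

The 5% critical value for 1 degree of freedom is about 3.841, so a statistic of 3.8906 lands only slightly inside the rejection region, which matches the caution expressed above.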

31 Example 7.4, page 135
Use SAS to read in the switch failure time data (see website). Then get estimates of the Weibull parameters for both the log-transformed and non-transformed data using PROC LIFEREG. Notice that we may use the NOLOG option in the MODEL statement to not take logs of the data:
proc lifereg data=switch;
  model u*censor(0)= / nolog dist=weibull;
  title 'Modeling u=log(Y) w/ NOLOG option';
run;
/* separate step so that each fit is printed under its own title */
proc lifereg data=switch;
  model y*censor(0)= / dist=weibull;
  title 'Modeling non-transformed Y';
run; quit;
Return to Section 4.4 (p ) and use SAS to get a probability plot (formula 4.4) of this data…

32 Example 7.4, page 135 (Case 1)
With the NOLOG option, if Y follows a Weibull distribution, then log(Y) follows an extreme value distribution. Note that the dependent variable is Y.

33 Example 7.4, page 135 (Case 2--Most used)
With the default setting, Y follows a Weibull distribution. Note that the dependent variable is log(Y).

34 Example 7.4, page 135 (Case 3)
With the default setting, Y follows a Weibull distribution. If we take log(Y) as our input, then the dependent variable is log(log(Y)).

35 Diagnostics for choosing between models
Read section 8.5 on diagnostics for choosing between models. Recall the empirical survivor plot.
Diagnostic 1: Plot the empirical survivor plots for each of the groups in a classification variable on the same plot. They should be “location shifted versions of the baseline survival function when an accelerated lifetimes model is appropriate”.
Diagnostic 2: “Compare the standard deviations of the data in each group (or when enough data points permits, the s.d.s at each covariate level). This allows us to check the constancy of sigma in the AFT model.”

36 Diagnostics for choosing between models
Diagnostic 3: “Plot [log(-log S(y)) against y for each group] to see if they are parallel - if so, the proportional hazards is correct”. Note: “These curves will be straight lines when log(y) is plotted as the x-coordinate (rather than y) and the Weibull model fits the data well.” Go over Example 8.3 and use either R or SAS (or both!) to reproduce the diagnostic plots for this data.

37 Example 8.3, page 152: Diagnostic 1 in R

38 Example 8.3, page 152: Diagnostic 3 in R

39 Example 8.3, page 152: Diagnostics in SAS
Plots=(lls) gives Figure 8.2, page 154.

40 PROC LIFEREG in SAS
The only differences between the AFT model and the usual linear regression model are that there is a σ before εi and that the dependent variable is logged. With exact data, take Y = log T and use the linear regression model with Y as the dependent variable. With censored data, use MLE with a distributional assumption on ε. For each distribution of ε, there is a corresponding distribution for T. Incidentally, all AFT models are named for the distribution of T rather than for the distribution of ε or log T.
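The exact-data case can be sketched with ordinary least squares on log T. The data below are hypothetical (six uncensored lifetimes in two groups), and the variable names are invented for this example; it is not taken from the recid or switch datasets. Censored observations would need maximum likelihood instead, as PROC LIFEREG does.

```python
import math

# Hypothetical uncensored data: group indicator x and lifetime t.
x = [0, 0, 0, 1, 1, 1]
t = [5.0, 8.0, 12.0, 9.0, 15.0, 22.0]

# The AFT model is linear on the log-time scale, so regress log(t) on x.
log_t = [math.log(value) for value in t]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(log_t) / n

# Ordinary least squares for a single covariate.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_t))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# exp(slope) is the multiplicative effect of x = 1 on the time scale.
time_ratio = math.exp(slope)
print(intercept, slope, time_ratio)
```

With a 0-1 covariate the slope is simply the difference in mean log lifetime between the two groups, so exp(slope) is the ratio of their geometric mean lifetimes.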

41 Accelerated Lifetimes Model
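For the general accelerated lifetime model, Yx*g2(x)=Y0 implies that log(Yx) = μ0 - log g2(x) + σZ, where μ0 = E[log Y0], σ^2 = Var[log Y0], and Z is a random variable with mean 0 and variance 1.

42 Accelerated Lifetimes Model
This provides the log-linear model log(Yx) = μ0 + β1x1+…+βpxp + σZ, where g2(x)=exp[-(β1x1+…+βpxp)]. Definition 8.2 also provides …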

43 Recall: Weibull Model (Chap 2), page 27
Weibull model: assume Y ~ Weibull(α, β) and take the log of the survival data, say X=loge(Y). Then X=loge(Y) follows Extreme-Value(u, b). The survival function is given by S(x) = exp[-exp((x-u)/b)]. The original Weibull parameters can be estimated if u and b are estimated by u-hat and b-hat: …

44 Recall: Weibull Model (Chap 2), page 27
If Y~ Weibull(, then loge(Y) ~ Extreme-Value(u b). where Standardized: ~

