REGRESSION MODEL FITTING & IDENTIFICATION OF PROGNOSTIC FACTORS
BISMA FAROOQI
Objective
- Focus on parametric regression models for survival data
- Build an understanding of the commonly used parametric regression methods
- Link the survival time of an individual to covariates using a specified probability distribution within the regression setting
What is Prognosis? It is the prediction of the future of an individual patient with respect to duration, course, and outcome of a disease. Prognosis plays an important role in medical practice but it is often difficult to sort out which characteristics of a patient (also called explanatory variables) are most closely related to it. Therefore, a statistical analysis is needed to prepare a compact summary of the data that can reveal their relationship.
PRELIMINARY EXAMINATION OF DATA
The Categories of Dependent Variables:
The response variable in prognostic studies or clinical trials can be dichotomous, polychotomous, or continuous.
- Dichotomous dependent variables: response or nonresponse, life or death, and presence or absence of a given disease.
- Polychotomous dependent variables: different grades of symptoms (e.g., no evidence of disease, minor symptom, major symptom) and scores of psychiatric reactions (e.g., feeling well, tolerable, depressed, or very depressed).
- Continuous dependent variables: length of survival from start of treatment or length of remission, both measured on a numerical scale by a continuous range of values.
The Categories of Independent Variables:
A prognostic variable (or independent variable) may be either numerical or nonnumeric.
Numerical prognostic variables:
- discrete, such as the number of previous strokes
- continuous, such as age
Continuous variables can be made discrete by grouping patients into subcategories (e.g., four age subgroups: <20, 20-39, 40-59, and ≥60).
Nonnumeric prognostic variables:
- unordered (e.g., race or diagnosis)
- ordered (e.g., severity of disease may be primary, local, or metastatic).
Steps in Data Examination:
Before any statistical computation is carried out, the data need to be examined carefully. The preliminary examination usually involves the following steps:
- Obtain correlation coefficients between variables to detect significantly correlated variables. A highly correlated variable whose prognostic value has been shown in other studies should not be deleted.
- For qualitative prognostic variables, use dummy variables. For example, with cell types A, B, and O, let the dummy variable x1 take the value 1 for cell type A and 0 otherwise, and let x2 take the value 1 for cell type B and 0 otherwise. For two categories (e.g., sex), only one dummy variable is needed: x is 1 for a male and 0 for a female.
- Apply a transformation (such as the logarithm) to the prognostic variables where it gives a better description of the data.
- Remove from the multivariate analysis those prognostic factors that have little or no effect on the dependent variable.
- Deal with missing data. The approach depends on the proportion of data that is missing:
  - observations with missing data may be dropped if they make up a relatively small proportion of the sample
  - for a nominal or categorical independent variable, treat individuals with missing information as a separate group
  - for quantitatively measured variables (e.g., age), the mean of the available values can be used for a missing value. This principle can also be applied to nominal data.
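The preliminary steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four patient records, the covariate names (wbc, stage), and the helper function names are hypothetical and do not come from the slides. It shows a correlation coefficient, dummy coding for cell types A, B, and O (with O as the reference), a log transformation, mean imputation for a missing age, and a separate group for a missing category.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def dummy_code(value, levels):
    """Return k-1 indicators for a k-level factor; the last level is the reference."""
    return [1 if value == level else 0 for level in levels[:-1]]

def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def missing_as_group(values, label="missing"):
    """Treat missing categorical values (None) as a group of their own."""
    return [label if v is None else v for v in values]

# Hypothetical records, invented for illustration only.
patients = [
    {"age": 34.0, "wbc": 2300.0, "cell_type": "A", "sex": "male", "stage": "primary"},
    {"age": None, "wbc": 9100.0, "cell_type": "B", "sex": "female", "stage": None},
    {"age": 58.0, "wbc": 4400.0, "cell_type": "O", "sex": "female", "stage": "metastatic"},
    {"age": 46.0, "wbc": 15000.0, "cell_type": "A", "sex": "male", "stage": "local"},
]

ages = impute_mean([p["age"] for p in patients])
stages = missing_as_group([p["stage"] for p in patients])

design = []
for p, age, stage in zip(patients, ages, stages):
    x1, x2 = dummy_code(p["cell_type"], ["A", "B", "O"])  # O is the reference
    design.append({
        "age": age,
        "log_wbc": math.log(p["wbc"]),        # logarithmic transformation
        "x1": x1,
        "x2": x2,
        "sex": 1 if p["sex"] == "male" else 0,  # one dummy for two categories
        "stage": stage,
    })

# Correlation between two numeric covariates after preprocessing.
r_age_logwbc = pearson([row["age"] for row in design],
                       [row["log_wbc"] for row in design])
```

Mean imputation is simple but shrinks the variability of the imputed covariate, so it is best reserved for cases where only a small proportion of values is missing.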
GENERAL STRUCTURE OF PARAMETRIC REGRESSION MODELS
Commonly Used Parametric Models:
The most commonly used parametric models are:
- Exponential
- Weibull
- Lognormal
- Log-logistic
- Gamma
- Gompertz
The first two are covered in this discussion.
The distributions generally involve two parameters: a scale parameter (λ) and a shape parameter (γ). The shape parameter is assumed constant across individuals.
Maximum likelihood estimation (MLE) is used to estimate the parameters. When the likelihood equations have no closed-form solution, the Newton-Raphson iterative procedure is applied.
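As an illustration of maximum likelihood with Newton-Raphson, the sketch below fits a Weibull distribution without covariates to right-censored data. It assumes the parameterization S(t) = exp(-(λt)^γ), which the slides do not state. Under that assumption λ has a closed form once γ is known, so Newton-Raphson is applied only to the profile score equation for γ. The function name weibull_mle and the sample data are hypothetical.

```python
import math

def weibull_mle(times, events, gamma=1.0, tol=1e-10, max_iter=100):
    """MLE of (lam, gamma) for S(t) = exp(-(lam * t) ** gamma).

    times  : positive follow-up times
    events : 1 if the failure was observed, 0 if the time is right-censored
    """
    d = sum(events)                                   # number of failures
    logs = [math.log(t) for t in times]
    sum_log_fail = sum(l for l, e in zip(logs, events) if e)

    for _ in range(max_iter):
        w = [t ** gamma for t in times]
        s0 = sum(w)
        s1 = sum(wi * li for wi, li in zip(w, logs))
        s2 = sum(wi * li * li for wi, li in zip(w, logs))

        # Profile score for gamma and its derivative (always negative).
        score = d / gamma + sum_log_fail - d * s1 / s0
        slope = -d / gamma ** 2 - d * (s2 * s0 - s1 * s1) / s0 ** 2

        step = score / slope
        new_gamma = gamma - step
        while new_gamma <= 0:                         # keep the shape positive
            step /= 2
            new_gamma = gamma - step

        converged = abs(new_gamma - gamma) < tol
        gamma = new_gamma
        if converged:
            break

    # Closed-form scale estimate given the shape.
    lam = (d / sum(t ** gamma for t in times)) ** (1.0 / gamma)
    return lam, gamma

# Hypothetical survival times (arbitrary units); 0 marks a censored time.
times = [2.0, 3.5, 5.0, 6.1, 8.0, 9.4, 12.0, 15.0]
events = [1, 1, 0, 1, 1, 0, 1, 1]
lam_hat, gamma_hat = weibull_mle(times, events)
```

Starting the iteration at γ = 1 corresponds to starting from the exponential model. In a regression setting the same idea extends to a vector of parameters, with the score vector and the matrix of second derivatives replacing the scalar score and slope.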
Likelihood Inference of Regression Models
Hypothesis Testing
Exponential Model
The exponential distribution is a useful form of the survival distribution when the hazard function (the instantaneous failure rate) is constant and does not depend on time. In that case the plot of log[-log S(t)] against log t is approximately a straight line with slope 1. In the biomedical field a constant hazard function is usually unrealistic, so this model often does not apply.
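A minimal sketch of the exponential case, without covariates, is given here. It takes the straight-line check to mean the plot of log[-log S(t)] against log t, and it assumes right-censored data with a 0/1 event indicator. The function names and sample data are hypothetical. For this model the maximum likelihood estimate of the constant hazard has a closed form: the number of observed failures divided by the total follow-up time.

```python
import math

def exponential_mle(times, events):
    """Closed-form MLE of the constant hazard: failures / total follow-up time."""
    return sum(events) / sum(times)

def exp_survival(t, lam):
    """Survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def loglog_slope(survival, t1, t2):
    """Slope of log[-log S(t)] against log t between two time points."""
    y1 = math.log(-math.log(survival(t1)))
    y2 = math.log(-math.log(survival(t2)))
    return (y2 - y1) / (math.log(t2) - math.log(t1))

# Hypothetical survival times (arbitrary units); 0 marks a censored time.
times = [2.0, 3.5, 5.0, 6.1, 8.0, 9.4, 12.0, 15.0]
events = [1, 1, 0, 1, 1, 0, 1, 1]

lam_hat = exponential_mle(times, events)   # 6 failures over 61 time units
slope = loglog_slope(lambda t: exp_survival(t, lam_hat), 1.0, 10.0)
```

With real data the same slope would be read from a plot based on an estimated survival curve, where it is only approximately 1 even if the exponential model holds.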
Practical Approach
Weibull Model
The hazard function changes with time. The plot of log[-log S(t)] against log t is approximately a straight line, but its slope is γ rather than 1.
- The hazard function always increases when γ > 1
- The hazard function always decreases when γ < 1
- The model reduces to the exponential regression model when γ = 1
The exponential hazard function is constant, whereas the Weibull hazard function is monotonic: increasing when γ > 1 and decreasing when γ < 1.
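The behaviour of the Weibull hazard for different shape values can be checked numerically. The sketch below assumes the parameterization h(t) = λγ(λt)^(γ-1) with S(t) = exp(-(λt)^γ), which the slides do not state. The function names, the time grid, and the value λ = 0.1 are hypothetical choices for illustration.

```python
import math

def weibull_hazard(t, lam, gamma):
    """Hazard h(t) = lam * gamma * (lam * t) ** (gamma - 1)."""
    return lam * gamma * (lam * t) ** (gamma - 1)

def weibull_survival(t, lam, gamma):
    """Survival function S(t) = exp(-(lam * t) ** gamma)."""
    return math.exp(-((lam * t) ** gamma))

def trend(values):
    """Classify a sequence as increasing, decreasing, constant, or mixed."""
    pairs = list(zip(values, values[1:]))
    if all(math.isclose(a, b) for a, b in pairs):
        return "constant"
    if all(b > a for a, b in pairs):
        return "increasing"
    if all(b < a for a, b in pairs):
        return "decreasing"
    return "mixed"

def loglog_slope(lam, gamma, t1, t2):
    """Slope of log[-log S(t)] against log t between two time points."""
    y1 = math.log(-math.log(weibull_survival(t1, lam, gamma)))
    y2 = math.log(-math.log(weibull_survival(t2, lam, gamma)))
    return (y2 - y1) / (math.log(t2) - math.log(t1))

grid = [0.5, 1.0, 2.0, 4.0, 8.0]   # hypothetical time points
lam = 0.1                          # hypothetical scale parameter

shapes = {}
for gamma in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, lam, gamma) for t in grid]
    shapes[gamma] = trend(hazards)

slope_gamma_2 = loglog_slope(lam, 2.0, 1.0, 10.0)
```

The dictionary shapes records the trend of the hazard for each shape value, and slope_gamma_2 recovers the shape parameter as the slope of the log[-log S(t)] plot.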
THANK YOU