1
Canadian Bioinformatics Workshops
3
Module 8, Part II Clinical Data Integration and Survival Analysis
Anna Lapuk, PhD
Bioinformatics for Cancer Genomics
August 29 – September 2, 2011
4
Module 8 Part II overview
Clinical data and survival analysis theory
Lab on survival analysis
5
Analytic techniques for biomarker discovery
Disease characterization
“Clinical data”
Whole genome/whole transcriptome data
6
Clinical data (variables)
ID
Race
Family history (yes/no)
Nodal status (yes/no; number of nodes involved)
Radiation
Chemo
Hormone therapy
Protein IHC
Stage
Size
Age at diagnosis
Estrogen receptor level
Progesterone level
SBR grade
Overall outcome (dead/alive)
Overall survival time
Disease specific outcome (dead/alive)
Disease specific survival time
Recurrence status (yes/no)
Time to recurrence
Distant recurrence status (yes/no)
Time to distant recurrence

The time variables are survival times (time to a given end point) and are handled with survival analysis.
7
Survival analysis

Goal: Estimate the probability of an individual surviving for a given time period (e.g., one year).
Technique: Kaplan-Meier survival curve, life table.

Goal: Compare the survival experience of two different groups of individuals (e.g., drug vs. placebo).
Technique: Logrank test (comparison of different K-M curves).

Goal: Detect clinical/genomic/epidemiologic variables that contribute to the risk (are associated with poor outcome).
Technique: Univariate or multivariate Cox regression model.
8
Survival data
Survival time is the time from a fixed starting point to an end point.
The event of interest is almost never observed in all subjects (censoring of data).
Special analytical techniques are therefore needed.

Starting points: surgery, diagnosis, treatment
End points: death, recurrence, relapse
9
Censored observations
Censored observations arise whenever the dependent variable of interest is the time to a terminal event and the duration of the study is limited.
A censored observation is incomplete: the event of interest had not occurred by the time of the analysis.
Type I and Type II censoring (time fixed / proportion of subjects fixed)
Right and left censoring

Event of interest            Censored observation
Death from the disease       Still alive
Survival of marriage         Still married
Drop-out time from school    Still in school

Type I and II censoring. Type I censoring describes the situation in which a test is terminated at a particular point in time, so that the remaining items are only known not to have failed up to that time (e.g., we start with 100 light bulbs and terminate the experiment after a certain amount of time). In this case the censoring time is fixed and the number of items failing is a random variable. In Type II censoring the experiment continues until a fixed proportion of items have failed (e.g., we stop the experiment after exactly 50 light bulbs have failed). In this case the number of items failing is fixed and time is the random variable.

Left and right censoring. A further distinction reflects the "side" of the time dimension on which censoring occurs. Consider again the experiment that starts with 100 light bulbs and is terminated after a certain amount of time. Here the researcher knows exactly when the experiment started, so censoring always occurs on the right side of the time continuum (right censoring). Censoring can also occur on the left side (left censoring). For example, in biomedical research one may know that a patient entered the hospital on a particular date and survived for a certain amount of time thereafter, without knowing exactly when the symptoms of the disease first occurred or were diagnosed.
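The light bulb example can be sketched in a few lines of Python. This is only an illustration: the function names and the failure times are hypothetical, and each observation is encoded as a (time, event flag) pair, which is a common convention rather than something taken from the slides.

```python
def apply_type1_censoring(event_times, study_end):
    """Type I censoring: stop at a fixed time; later failures are censored.

    Returns (observed_time, event_flag) pairs, where event_flag is 1 if the
    failure was observed and 0 if the item was censored at study_end.
    """
    records = []
    for t in event_times:
        if t <= study_end:
            records.append((t, 1))          # failure observed during the study
        else:
            records.append((study_end, 0))  # still working at study end: censored
    return records


def apply_type2_censoring(event_times, n_failures):
    """Type II censoring: stop as soon as n_failures items have failed."""
    cutoff = sorted(event_times)[n_failures - 1]  # time of the last observed failure
    return [(t, 1) if t <= cutoff else (cutoff, 0) for t in event_times]


# Hypothetical failure times (hours) for five light bulbs.
true_failure_times = [120, 340, 560, 800, 1500]

type1 = apply_type1_censoring(true_failure_times, study_end=600)
type2 = apply_type2_censoring(true_failure_times, n_failures=2)
print(type1)
print(type2)
```

In both cases the censored items carry a right-censored time: the true failure time is only known to be larger than the recorded one.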
10
Types of censoring
11
Survival time and probability
Survival probability for a given length of time can be calculated by considering time in intervals.
The probability of surviving two months is the probability of surviving month 1 multiplied by the probability of surviving month 2 given that the patient has survived month 1 (a conditional probability).
Survival probability = p1 x p2 x p3 x p4 x ... x pj
pj is the probability of surviving month j among those still known to be alive after (j-1) months.
In practice, each time interval contains exactly one case.
Diagram (left): how patients proceed through the study. The first 6 months are accrual; the next 12 months are the study period.
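The product of conditional probabilities can be illustrated with a short Python sketch. The monthly probabilities below are made up for the example, and the function name is hypothetical.

```python
def cumulative_survival(interval_probs):
    """Running product p1, p1*p2, p1*p2*p3, ... of conditional survival probabilities."""
    curve = []
    running = 1.0
    for p in interval_probs:
        running *= p          # survive this interval, given survival so far
        curve.append(running)
    return curve


# Hypothetical conditional probabilities of surviving months 1, 2 and 3.
monthly = [0.9, 0.8, 0.5]
curve = cumulative_survival(monthly)
for month, s in enumerate(curve, start=1):
    print(f"P(survive {month} months) = {s:.2f}")
```

With these values the probability of surviving two months is 0.9 x 0.8 = 0.72, and three months 0.36.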
12
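Kaplan-Meier Curve
[Figure: Kaplan-Meier curve. Axes: survival probability (0 to 1) vs. time (months); censored observations are marked on the curve.]
Show a particular time point and explain how to use the curve to estimate the survival probability of a patient from another cohort.
The survival probability is the running product of the conditional survival probabilities at each distinct survival time point: at every time point with failures, the current value is multiplied by (r - f) / r. Time points with only censored observations contribute no factor, but censored subjects are removed from the number at risk afterwards.
r: number still at risk
f: number of failures (subjects who reached the end point)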
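A minimal Python sketch of the Kaplan-Meier estimate, written from the r and f definitions above. The data are hypothetical, the function names are illustrative, and the sketch assumes the usual convention that a subject censored at time t is still counted as at risk at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times  : follow-up time of each subject
    events : 1 = failure (end point reached), 0 = censored
    """
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)        # failures and censored subjects both leave the risk set
    at_risk = len(times)            # r: number still at risk
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        f = failures[t]             # f: failures at this time (0 if only censoring)
        if f > 0:
            survival *= (at_risk - f) / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Read the step function: S(t) is the value at the last failure time <= t."""
    prob = 1.0
    for event_time, s in curve:
        if event_time > t:
            break
        prob = s
    return prob


# Hypothetical follow-up times in months; event flag 0 marks a censored subject.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(survival_at(curve, 2.5))
```

Because the curve is a step function, the estimate for 2.5 months is simply the value reached at the last failure time before 2.5 months.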
13
Kaplan-Meier Curve
[Figure: Kaplan-Meier curve with censored observations marked. Axes: survival probability vs. time (months).]
What is the probability that a patient survives 2.5 months?
14
Kaplan-Meier Curve
[Figure: Kaplan-Meier curves for treated and untreated patients. Axes: survival probability vs. time (months).]
Are the survival experiences significantly different?
Comparing the proportions of subjects still alive at any single time point is not adequate.
15
Logrank test
A non-parametric method for testing the null hypothesis that the compared groups are samples from the same population with regard to survival experience. (It does not tell how different the groups are.)
16
Divide time scale into intervals
[Figure: Kaplan-Meier curves for treated and untreated patients with the time axis divided into intervals. Axes: survival probability vs. time (months).]
Compare the proportions at every time interval and summarize them across intervals (similar to a chi-square test).
17
Logrank test: compare survival experience of two different groups of individuals
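The comparison is summed over k time intervals.
O: observed number of events
E: expected number of events
V: variance of (O - E)
Chi-square test: χ2 = sum of (O - E)^2 / E, summed over categories
Logrank test: χ2 = [sum of (O - E)]^2 / (sum of V), with O, E and V taken for one group and summed over the k intervals
The statistic is then compared with the χ2 distribution with (number of groups - 1) degrees of freedom, i.e. 1 degree of freedom when two groups are compared.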
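A two-group logrank test can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: each distinct event time is treated as one interval, the variance uses the standard hypergeometric form, the function name is hypothetical, and the data are made up.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, p-value on 1 df)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)        # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)        # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a                              # O for group A
        expected_a += d * n_a / n                      # E for group A under the null
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)  # V

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))           # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical survival times in months; every subject reached the end point.
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

The p-value answers only whether the survival experiences differ; it says nothing about the size of the difference.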
18
Hazard ratio
The hazard ratio (R) compares two groups that differ in treatment, prognostic variables, etc.
It measures relative survival in the two groups over the complete period studied.
R = 0.43: the relative risk (hazard) of poor outcome in group 1 is 43% of that in group 2.
R = 2.0: the rate of failure in group 1 is twice the rate in group 2.
Unlike the logrank test, R tells how different the groups are.
Note: R is computed for the entire period of study and may not be consistent across time intervals. K-M curves are essential for visually inspecting the consistency of the difference in survival experience. R can also be computed for different time points to see how consistent it is.
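One common way to estimate R uses the observed (O) and expected (E) numbers of events from the logrank calculation: R = (O1 / E1) / (O2 / E2). The sketch below assumes that estimator; the counts are hypothetical and were chosen only so that the result matches the R = 0.43 example.

```python
def hazard_ratio(observed_1, expected_1, observed_2, expected_2):
    """Estimate R as the ratio of observed/expected event counts in two groups."""
    return (observed_1 / expected_1) / (observed_2 / expected_2)


# Hypothetical counts: group 1 has fewer events than expected, group 2 more.
r = hazard_ratio(observed_1=6, expected_1=10, observed_2=14, expected_2=10)
print(f"R = {r:.2f}")
```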
19
Cox proportional hazards model
Used to investigate the effect of several variables on survival experience.
A multivariate proportional hazards regression model described by D.R. Cox for modeling survival times.
It is called a proportional hazards model because it estimates the ratio of the risks (hazard ratio or relative hazard).
There are multiple predictor variables (such as prognostic markers whose individual contribution to the outcome is assessed in the presence of the others) and one outcome variable.
The logrank test does not allow the effect of multiple variables on survival experience to be investigated; even the stratified logrank test is not adequate for this. The Cox regression model serves this purpose.
20
Hazard function
h(t) = h0(t) x exp(b1X1 + b2X2 + ... + bpXp)
Prognostic index (PI)
PI = b1X1 + b2X2 + ... + bpXp
X1 ... Xp: independent variables of interest
b1 ... bp: regression coefficients to be estimated
h0(t): baseline (underlying) hazard function
Assumption: the effect of the variables is constant over time and additive on a particular scale. (Similarly to K-M.)
The hazard function is closely related to the survival curve and represents the risk of dying in a very short time interval after a given time, assuming survival thus far.
The cumulative hazard H(t) is obtained by adding up the hazards from 0 to time t; H0(t) is the cumulative baseline (underlying) hazard, and under the model H(t) = H0(t) x exp(PI).
The probability of surviving to time t is S(t) = exp[-H(t)], so this probability can be estimated for every individual with given values of the variables in the model.
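The relations between the prognostic index, the cumulative hazard and the survival probability can be sketched as follows. All numbers (coefficients, covariate values and the baseline cumulative hazard) are hypothetical, as are the function names; in a real analysis they would come from a fitted Cox model.

```python
import math


def prognostic_index(coefs, covariates):
    """PI = b1*X1 + b2*X2 + ... + bp*Xp."""
    return sum(b * x for b, x in zip(coefs, covariates))


def survival_probability(baseline_cum_hazard, coefs, covariates):
    """S(t) = exp[-H(t)] with H(t) = H0(t) * exp(PI)."""
    cum_hazard = baseline_cum_hazard * math.exp(prognostic_index(coefs, covariates))
    return math.exp(-cum_hazard)


# Hypothetical model with two covariates and a baseline cumulative hazard at some time t.
coefs = [0.7, -0.05]       # b1, b2 (assumed values)
patient = [1, 10]          # X1 (binary), X2 (continuous)
h0_t = 0.1                 # H0(t), assumed

pi = prognostic_index(coefs, patient)
s_patient = survival_probability(h0_t, coefs, patient)
s_baseline = survival_probability(h0_t, coefs, [0, 0])
print(f"PI = {pi:.2f}, S(t) = {s_patient:.3f}, baseline S(t) = {s_baseline:.3f}")
```

A positive PI raises the cumulative hazard above baseline, so the patient's survival probability is lower than the baseline value.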
21
Interpretation of the Cox model
Cox regression model fitted to data from the PBC trial of azathioprine vs placebo (n=216)

Variable              Regression coef (b)   SE(b)    exp(b)
Serum bilirubin       2.510                 0.316    12.31
Age                                                  1.01
Cirrhosis             0.879                 0.216    2.41
Serum albumin                               0.0181   0.95
Central cholestasis   0.679                 0.275    1.97
Therapy               0.52                  0.207    1.68

Coefficient:
Sign: positive or negative association with poor survival
Magnitude: the increase in log hazard for an increase of 1 in the value of the covariate
Altman D, 1991
22
Interpretation of the Cox model
Cox regression model fitted to data from the PBC trial of azathioprine vs placebo (n=216)

Variable              Regression coef (b)   SE(b)    exp(b)   Hazard relative to baseline after an increase of 1 in the variable
Serum bilirubin       2.510                 0.316    12.31    1231%
Age                                                  1.01     101%
Cirrhosis             0.879                 0.216    2.41     241%
Serum albumin                               0.0181   0.95     95%
Central cholestasis   0.679                 0.275    1.97     197%
Therapy               0.52                  0.207    1.68     168%

Coefficient:
Sign: positive or negative association with poor survival
Magnitude: the increase in log hazard for an increase of 1 in the value of the covariate. If the value changes by 1, the hazard changes exp(b) times.

Important points: coefficient sign and magnitude. Covariates may be binary or continuous. In general, an increase of 1 in a covariate multiplies the hazard by exp(b). Illustrate on albumin and therapy.

We call H(t) / H0(t) the hazard ratio. The coefficients b1 ... bp are estimated by Cox regression and can be interpreted in a similar manner to those of multiple logistic regression.
Suppose the covariate (risk factor) is dichotomous and is coded 1 if present and 0 if absent. Then exp(bi) can be interpreted as the instantaneous relative risk of an event, at any time, for an individual with the risk factor present compared with an individual with the risk factor absent, given that both individuals are the same on all other covariates.
Suppose the covariate is continuous. Then exp(bi) is the instantaneous relative risk of an event, at any time, for an individual with an increase of 1 in the value of the covariate compared with another individual, given that both individuals are the same on all other covariates.
Modified from Altman D, 1991
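The exp(b) column can be recomputed from the coefficients, and the standard errors allow an approximate 95% confidence interval, exp(b ± 1.96 x SE(b)). The sketch below uses only the rows of the table for which both b and SE(b) are shown; the confidence interval is a standard Wald-type approximation added here for illustration and is not part of the slide.

```python
import math

# (b, SE(b)) taken from the table rows that show both values.
cox_rows = {
    "Serum bilirubin": (2.510, 0.316),
    "Cirrhosis": (0.879, 0.216),
    "Central cholestasis": (0.679, 0.275),
    "Therapy": (0.52, 0.207),
}


def hazard_ratio_with_ci(b, se, z=1.96):
    """Return exp(b) and an approximate 95% confidence interval for it."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


results = {name: hazard_ratio_with_ci(b, se) for name, (b, se) in cox_rows.items()}
for name, (hr, low, high) in results.items():
    print(f"{name:20s} exp(b) = {hr:5.2f}  (95% CI {low:.2f} to {high:.2f})")
```

An interval that excludes 1 suggests that the variable is associated with the hazard after adjustment for the other variables in the model.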
23
Survival curves based on Cox model
The estimated cumulative hazard function H(t) is a step function, and the survival function can be expressed through it (some programs give S(t) instead of H(t)). It can then be plotted as a survival curve. The power of the analysis depends on the number of terminal events, for example deaths. It can take many years of follow-up to compile data that give sufficient power. That is why studies often use other end points that may be more frequent, such as time to recurrence. Increasing the sample size is another way to achieve power. Estimating the required sample size is not simple; there are nomograms that allow one to calculate an appropriate sample size. Altman D, 1991
24
Survival curves based on Cox model
The power of the analysis depends on the number of terminal events (deaths).
Higher power requires longer follow-up times.
Alternative, more frequent end points can be used, such as recurrence.
Estimating the sample size needed to achieve the required power is a hard task. Nomograms help.
Altman D, 1991
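As an alternative to a nomogram, the dependence of power on the number of events can be illustrated with Schoenfeld's approximation for a two-group comparison: events = (z_alpha + z_beta)^2 / (p1 x p2 x (ln R)^2), where p1 and p2 are the allocation proportions. This formula is not from the slides; it is swapped in here as a rough sketch, and the function name and default values are assumptions.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Approximate number of events needed to detect a given hazard ratio
    with a two-sided test (Schoenfeld's approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (
        allocation * (1 - allocation) * log(hazard_ratio) ** 2
    )
    return ceil(events)


# Smaller effects (hazard ratio closer to 1) need many more events.
for hr in (0.5, 0.75):
    print(f"hazard ratio {hr}: about {required_events(hr)} events")
```

The result is a number of events, not of patients: the number of patients and the follow-up time must be large enough for that many events to occur.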
25
What Have We Learned?
Clinical data is a highly important component and is intrinsically different from genomic/transcriptomic data.
Survival data is a special type of data requiring special methodology.
Main applications of survival analysis:
  Estimating the survival probability of a patient for a given length of time under given circumstances (Kaplan-Meier survival curve).
  Comparing the survival experiences of groups of patients: is the drug working? (logrank test)
  Investigating risk factors that contribute to the outcome, to make a prognosis for a given patient and choose appropriate therapy (Cox regression model).
26
Questions?
27
References
Practical Statistics for Medical Research. Altman DG. Chapman & Hall/CRC.
Pharmacogenetics and pharmacogenomics: development, science, and translation. Weinshilboum RM, Wang L. Annu Rev Genomics Hum Genet. 2006;7: PMID:
Pharmacogenomics: candidate gene identification, functional validation and mechanisms. Wang L, Weinshilboum RM. Hum Mol Genet Oct 15;17(R2):R PMID:
End-sequence profiling: sequence-based analysis of aberrant genomes. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk A, Kuo WL, Magrane G, De Jong P, Gray JW, Collins C. Proc Natl Acad Sci U S A Jun 24;100(13): PMID:
A Review of Trastuzumab-Based Therapy in Patients with HER2-positive Metastatic Breast Cancer. Church DN, Price CGA. Clinical Medicine: Therapeutics 2009:

Other useful references:
The hallmarks of cancer. Hanahan D, Weinberg RA. Cell Jan 7;100(1): PMID:
Aberrant and alternative splicing in cancer. Venables JP. Cancer Res Nov 1;64(21): PMID:
28
We are on a Coffee Break & Networking Session