How to analyze your organism’s chance of survival?



Data: the time between two distinct events, recorded for many subjects (objects, organisms). The first event is predefined, while the second is typically some specific kind of transition. The time between these two events is called the transition time (or survival time).
Subjects can exit the study for reasons other than the event of interest. Such observations are called censored data: the true transition time is longer than the time we registered, but we don't know by how much.
[Figure: graphic representation of transition times along a time axis.]

Examples of transition times, each with possible reasons for censoring:
• Medical analysis, diagnosis to death from that disease:
  - End of the study
  - The patient wants to leave the study
  - Death from other causes
• Plant survival study, start of experiment to death from environmental factors:
  - End of the study
  - Experimenter accidentally dropped the pot on the ground
• Start of larval stage until transition to adult:
  - End of the study
  - Death of the larva
• Matriculation to master's degree of students:
  - End of the study
  - Student dropped out
  - Death of student (hopefully not!)
If we disregard censored data, we can seriously underestimate the transition time!
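The warning about underestimation can be made concrete with a small simulation. This is only a sketch under assumed conditions: exponential transition times with mean 10 and uniformly distributed censoring times, neither of which comes from the slides, and all variable names are hypothetical.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
n <- 5000
true.time <- rexp(n, rate = 1 / 10)          # assumed true transition times, mean 10
cens.time <- runif(n, min = 0, max = 20)     # assumed time at which each subject leaves the study
observed  <- pmin(true.time, cens.time)      # the time we actually register
event     <- true.time <= cens.time          # TRUE = transition seen, FALSE = censored

# Two naive summaries that ignore the meaning of censoring
naive.drop  <- mean(observed[event])         # censored subjects thrown away
naive.treat <- mean(observed)                # censored times treated as transitions

c(true = mean(true.time), drop = naive.drop, treat = naive.treat)
```

Both naive means come out below the true mean, because long transition times are the ones most likely to be cut short by censoring.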

• Subject: Whatever the start and end events belong to.
• Transition: The end event.
• Transition time: Time between the two events of interest.
• Censoring: When a subject leaves the study in a different way than through the specified transition.
• Age: Time from the start event to the present time, for subjects that have neither been censored nor undergone transition.
• Treatment: Same as a covariate (independent or explanatory variable) in ordinary regression. Finding the effect of a treatment on survival is typically the goal of survival analysis.

The transition time will typically vary, so we need a statistical distribution for it. The distribution f(t) describes the probability density of the transition times, t. A sharp peak around a value means that most survival times are found near that value. A histogram of actual transition times will approach this distribution as the amount of data grows.
[Figure: Distribution of age at death for the population of the U.S.A., 2003, as derived from the histogram of ages.]

Survival curve, S(t): the probability of not going through a transition before a given age, t. A histogram of the ages of the subjects at a given time will look like this curve. In more mathematical terms, it is the integral of the probability density over all times larger than t: $S(t) = \int_t^\infty f(u)\,du$.
[Figure: Survival curve for the U.S.A., 2003, from the histogram of ages.]

Hazard rate, h(t): the chance per unit time of going through a transition in the next short time interval, given that the subject has not done so earlier: $h(t) = f(t)/S(t)$.
[Figure: Hazard rate for the U.S.A., 2003, derived from the histogram of ages.]

Concepts (each shown with its graph on the slide):
• Transition time distribution, f(t)
• Survival curve, S(t)
• Hazard rate, h(t)
If we have an expression for one of these concepts, the other two can be derived. They are just different ways of looking at the same thing.
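A numerical sketch of how the three concepts determine each other, using the standard relations S(t) = integral of f from t to infinity, h(t) = f(t)/S(t), S(t) = exp(-integral of h from 0 to t) and f(t) = -dS/dt. The Weibull distribution and its parameter values are an arbitrary choice for illustration, not taken from the slides.

```r
shape <- 2; scale <- 3                       # hypothetical Weibull parameters
f <- function(t) dweibull(t, shape, scale)                        # density
S <- function(t) pweibull(t, shape, scale, lower.tail = FALSE)    # survival curve
h <- function(t) f(t) / S(t)                                      # hazard rate

t <- 2.5
S.from.f <- integrate(f, lower = t, upper = Inf)$value      # S from f
S.from.h <- exp(-integrate(h, lower = 0, upper = t)$value)  # S from h
f.from.S <- -(S(t + 1e-6) - S(t - 1e-6)) / 2e-6             # f as minus the slope of S

rbind(S = c(direct = S(t), from.f = S.from.f, from.h = S.from.h),
      f = c(direct = f(t), from.S = f.from.S, NA))
```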

Exponential distribution:
• Form: $f(t) = \lambda e^{-\lambda t}$
• Usage:
  - Unstable elementary particles
  - Radioactive isotopes
  - Time between phone calls
  - Lifetime of a particular copy of DNA for microbial organisms? PS: Conditioned on the state of the organism itself and its environment.
• Special quality, memoryless: given survival past $t_0$, the remaining time $t - t_0$ has the same distribution as the original transition time, $f(t \mid t > t_0) = f(t - t_0)$. If the survival probability drops to 50% at t=5, it will drop to 25% at t=10 and to 12.5% at t=15.
• Survival curve: $S(t) = e^{-\lambda t}$
• Hazard rate: $h(t) = \lambda$. Constant hazard, which is reasonable since the distribution is memoryless.
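The halving example and the memoryless property can be checked directly. In this sketch the rate is chosen so that survival is 50% at t=5, as in the example; the values of t0 and s are arbitrary.

```r
lambda <- log(2) / 5                  # rate that gives S(5) = 0.5
S <- function(t) exp(-lambda * t)     # survival curve of the exponential distribution

halving <- S(c(5, 10, 15))            # survival at t = 5, 10, 15

# Memoryless: surviving s more time units, given survival past t0,
# is as likely as surviving s time units from the start.
t0 <- 7; s <- 3
conditional   <- S(t0 + s) / S(t0)
unconditional <- S(s)

halving
c(conditional = conditional, unconditional = unconditional)
```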

• Assume microbial survival is conditionally exponentially distributed.
• Contributions from genetics and environment spread out the death rate, $\lambda$, according to the gamma distribution, $\Gamma(a,b)$ (shape a, rate b).
• Result (Pareto distribution):
  - $f(t) = (a/b)\,(1 + t/b)^{-(a+1)}$
  - $S(t) = (1 + t/b)^{-a}$
  - $h(t) = a/(b + t)$
• Dropping hazard rate. Reasonable: a subject that has reached old age probably has good genes and/or a good environment, while among the young, bad genes and/or bad environments are over-represented.
[Figure: f(t) for a=1, b=1.]
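The Pareto result can be checked numerically by averaging the exponential survival curve over a gamma-distributed death rate. This sketch assumes the gamma distribution is parameterised by shape a and rate b, which is the reading that reproduces the formulas above; the values of a, b and t are arbitrary.

```r
a <- 2; b <- 3; t <- 1.7              # hypothetical parameter values and time point

# Average exp(-lambda * t) over lambda ~ Gamma(shape = a, rate = b)
mix.S <- integrate(function(lambda) exp(-lambda * t) * dgamma(lambda, shape = a, rate = b),
                   lower = 0, upper = Inf)$value

pareto.S <- (1 + t / b)^(-a)                   # closed-form survival curve
pareto.f <- (a / b) * (1 + t / b)^(-(a + 1))   # closed-form density
pareto.h <- pareto.f / pareto.S                # hazard, should equal a / (b + t)

c(mixture = mix.S, closed.form = pareto.S)
c(hazard = pareto.h, a.over.b.plus.t = a / (b + t))
```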

• Cartoon model of aging, the uniform distribution: $f(t) = I(0 < t < a)/a$, where a is the maximal age.
• All outcomes below that are equally probable.
• Survival curve: $S(t) = (1 - t/a)\,I(0 < t < a)$
• Hazard rate: $h(t) = 1/(a - t)$ for $0 < t < a$. The hazard rate is inversely proportional to the distance to a: the closer to the maximum attainable age, the greater the risk of dying.

• Often observed in engineering: a hazard rate that is higher for small and large times than for moderate times (a "bathtub" shape).
• Can also be reasonable for complex biological organisms, for instance humans.
• It is possible to start by modeling the hazard rate and derive the transition time distribution from it.
[Figure: Estimated h(t) from census data, U.S.A., 2003.]

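• Increasing hazard ⇔ the survival curve bends downwards on the log scale.
• Decreasing hazard ⇔ the survival curve bends upwards on the log scale.
This follows from $\frac{d}{dt}\log S(t) = -h(t)$: the hazard is minus the slope of the log-survival curve.
Examples (hazard rate h(t) and survival curve S(t) shown side by side on the slide):
• Uniform transition time (cartoon of aging): increasing hazard.
• Pareto distribution (varying genes/environment, possibly also a model for vulnerability in early life): decreasing hazard.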
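A quick check of the bending direction for the two examples, using second differences of log S(t) on a grid. The parameter values (a = 1 for the uniform, a = b = 1 for the Pareto) and the grid are arbitrary choices for this sketch.

```r
t <- seq(0.1, 0.9, by = 0.1)               # grid inside (0, a) for the uniform case

logS.unif   <- log(1 - t / 1)              # uniform, a = 1: S(t) = 1 - t/a
logS.pareto <- -1 * log(1 + t / 1)         # Pareto, a = b = 1: S(t) = (1 + t/b)^(-a)

# Second differences approximate the curvature of log S(t)
d2.unif   <- diff(logS.unif, differences = 2)
d2.pareto <- diff(logS.pareto, differences = 2)

bends.down <- all(d2.unif < 0)     # increasing hazard: log-survival bends downwards
bends.up   <- all(d2.pareto > 0)   # decreasing hazard: log-survival bends upwards
c(uniform.bends.down = bends.down, pareto.bends.up = bends.up)
```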

Kaplan-Meier is a non-parametric way of estimating the survival curve.
• Similar to histograms, in that it simply summarizes the data.
• Performed by first noting at which times, $t_j$, there are transitions in the data.
• The number of transitions, $m_j$, and the number of subjects "at risk", $y_j$ (both subjects that will transit later and subjects that are later censored), are then used: $\hat{S}(t) = \prod_{t_j \le t} (1 - m_j / y_j)$.
• Technical, R code (survival package): survfit(Surv(t.event, censoring.status) ~ 1). The second argument of Surv is the event indicator (1 = transition observed, 0 = censored). Use plot() on the result to see the curve.
[Figure: Survival plot for plant experiment, all plants.]
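The estimator can be computed by hand and compared with survfit. The eight observations below are made up for illustration; they are not the plant data from the slides.

```r
library(survival)

# Hypothetical toy data: observed times and event indicator (1 = transition, 0 = censored)
time  <- c(2, 3, 3, 5, 6, 8, 8, 9)
event <- c(1, 1, 0, 1, 0, 1, 1, 0)

t.j <- sort(unique(time[event == 1]))                        # times with transitions
m.j <- sapply(t.j, function(s) sum(time == s & event == 1))  # transitions at t.j
y.j <- sapply(t.j, function(s) sum(time >= s))               # subjects at risk at t.j

km.hand <- cumprod(1 - m.j / y.j)                            # product over t.j <= t

fit    <- survfit(Surv(time, event) ~ 1)                     # same estimate from the package
km.pkg <- summary(fit, times = t.j)$surv

data.frame(t.j, m.j, y.j, km.hand, km.pkg)
```

The censored subjects never produce a drop in the curve, but they do count in y.j for as long as they are still in the study.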

You can get a confidence interval for this estimated curve.
R code: survfit(Surv(t.event, censoring.status) ~ 1, conf.type="plain", conf.int=0.95)
Divide your dataset into subgroups with different treatments and you get a feel for the difference between these treatments. (In this plant study, the treatment is day length, "dlen".)
R code: survfit(Surv(t.event, censoring.status) ~ dlen, conf.type="plain", conf.int=0.95)
PS: Note that using confidence intervals to say whether there is a difference means invoking a large number of dependent tests. Not OK.

The log-rank test does a model comparison between grouping the data according to treatment and not grouping the data. It compares the observed number of events in each group to the number expected if the groups have the same transition time distribution, and gives a p-value for the null hypothesis (no effect of the different treatments).
R code: survdiff(Surv(t.event, censoring) ~ treatment)
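A sketch of the test on simulated data. The group labels, rates and censoring scheme are assumptions made up for this example; the p-value is computed from the chi-square statistic that survdiff returns.

```r
library(survival)

set.seed(42)                                          # arbitrary seed
n <- 200
treatment <- rep(c("short", "long"), each = n / 2)    # hypothetical two-level treatment
rate      <- ifelse(treatment == "short", 0.10, 0.05) # assumed hazard per group
true.time <- rexp(n, rate)
cens.time <- runif(n, 0, 40)                          # assumed censoring times
t.event   <- pmin(true.time, cens.time)               # registered time
censoring <- as.numeric(true.time <= cens.time)       # 1 = transition seen, 0 = censored

test <- survdiff(Surv(t.event, censoring) ~ treatment)

# Chi-square statistic with (number of groups - 1) degrees of freedom
p.value <- pchisq(test$chisq, df = length(test$n) - 1, lower.tail = FALSE)

print(test)
p.value
```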

Strengths and weaknesses of the Kaplan-Meier approach:
Strengths:
• No model assumptions, so it can be used for almost any dataset.
• Allows categorical explanatory variables.
• Model-independent confidence intervals and model comparison, too.
Weaknesses:
• Model assumptions would give stronger inference; that is forgone here.
• No inference for the transition time distribution or the hazard rate.
• No way to stepwise add and test more explanatory variables (treatments). Explanatory variables can only be added by subdividing the dataset into smaller and smaller pieces.
• No way to incorporate continuous explanatory variables.

Cox regression:
• Addresses (almost all) the weaknesses of the Kaplan-Meier approach.
• Does so with a single model assumption: proportional hazards.
• Separates the time dependency from the variable dependency in the hazard rate.
• Hazard ratio: the hazard rate for one choice of explanatory variables divided by the hazard rate for another choice.
• Allows for continuous explanatory variables and additive effects of different categorical variables.
• R code: coxph(Surv(t.event, censoring) ~ var1 + var2 + var3)
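A sketch of a Cox fit with one continuous and one binary explanatory variable. The data are simulated with known effects (0.5 for var1 and log 2 for var2, both invented for this example) so that the estimates can be compared with the truth.

```r
library(survival)

set.seed(7)                                    # arbitrary seed
n <- 2000
var1 <- rnorm(n)                               # hypothetical continuous variable
var2 <- rbinom(n, 1, 0.5)                      # hypothetical binary variable
rate <- 0.1 * exp(0.5 * var1 + log(2) * var2)  # proportional hazards by construction
true.time <- rexp(n, rate)
cens.time <- runif(n, 0, 30)                   # assumed censoring times
t.event   <- pmin(true.time, cens.time)
censoring <- as.numeric(true.time <= cens.time)  # 1 = transition seen, 0 = censored

fit <- coxph(Surv(t.event, censoring) ~ var1 + var2)

coef(fit)        # estimated log-hazard effects
exp(coef(fit))   # estimated hazard ratios per unit change
```

Note that no baseline hazard has to be specified: only the effects of the variables on the hazard are estimated.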

Proportional hazard regression: $h(t \mid x) = h_0(t)\,e^{\beta x}$, or $lh(t \mid x) = lh_0(t) + \beta x$, where $lh = \log(h)$.
Assume one explanatory variable, x. What happens if we change it from x to x+1?
• The log-hazard rate changes by an additive term $\beta$.
• The hazard changes by a multiplicative factor $e^{\beta}$.
• The log-survival curve also changes by a multiplicative factor $e^{\beta}$. (This can be compared to results from the Kaplan-Meier estimator.)
• The survival curve changes from $S(t)$ to $S(t)^{\exp(\beta)}$. This is not so easy to see in a plot.
Example with $\beta = \log 2 \approx 0.69$:
• $lh(t \mid x=1) = lh(t \mid x=0) + 0.69$
• $h(t \mid x=1) = 2\,h(t \mid x=0)$
• $\log S(t \mid x=1) = 2 \log S(t \mid x=0)$
• $S(t \mid x=1) = S(t \mid x=0)^2$
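The four statements in the example can be verified numerically for any baseline. The baseline cumulative hazard below is a made-up choice; only the proportional hazard structure matters.

```r
beta <- log(2)                           # effect used in the example, about 0.69

H0 <- function(t) (t / 4)^1.5            # hypothetical baseline cumulative hazard
h0 <- function(t) 1.5 / 4 * (t / 4)^0.5  # its derivative, the baseline hazard
h  <- function(t, x) h0(t) * exp(beta * x)        # proportional hazard model
S  <- function(t, x) exp(-H0(t) * exp(beta * x))  # resulting survival curve

t <- c(1, 2, 5)
hazard.ratio  <- h(t, 1) / h(t, 0)             # multiplicative factor exp(beta)
loghaz.shift  <- log(h(t, 1)) - log(h(t, 0))   # additive term beta
logsurv.ratio <- log(S(t, 1)) / log(S(t, 0))   # multiplicative factor exp(beta)

rbind(hazard.ratio, loghaz.shift, logsurv.ratio,
      S.x1 = S(t, 1), S.x0.squared = S(t, 0)^2)
```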

• Cox regression gives a (partial) likelihood, so more model comparison techniques are available:
  - Likelihood-ratio test
  - AIC/BIC
  - Wald test (comparison between estimate and standard error, implemented in R)
• Stepwise adding or removing of extra variables is possible, either by likelihood-based methods or by the Wald test.
• Full model exploration by information criteria is also possible (though prohibitively costly when the number of explanatory variables is high).
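A sketch of these comparisons for two nested Cox models. The data are simulated under assumed effects, and the variable names are hypothetical. The likelihood-ratio statistic is built from the fitted partial log-likelihoods that coxph stores in its loglik component.

```r
library(survival)

set.seed(11)                                   # arbitrary seed
n <- 1000
var1 <- rnorm(n)                               # hypothetical continuous variable
var2 <- rbinom(n, 1, 0.5)                      # hypothetical binary variable
true.time <- rexp(n, 0.1 * exp(0.5 * var1 + log(2) * var2))
cens.time <- runif(n, 0, 30)                   # assumed censoring times
t.event   <- pmin(true.time, cens.time)
censoring <- as.numeric(true.time <= cens.time)

fit1 <- coxph(Surv(t.event, censoring) ~ var1)          # smaller model
fit2 <- coxph(Surv(t.event, censoring) ~ var1 + var2)   # adds var2

# Likelihood-ratio test: twice the gain in fitted partial log-likelihood
lr.stat <- 2 * (fit2$loglik[2] - fit1$loglik[2])
lr.p    <- pchisq(lr.stat, df = 1, lower.tail = FALSE)

c(lr.stat = lr.stat, lr.p = lr.p)
c(AIC.fit1 = AIC(fit1), AIC.fit2 = AIC(fit2))   # lower is better
summary(fit2)$coefficients                      # Wald z and p-value per variable
```

The call anova(fit1, fit2) reports the same likelihood-ratio test in table form.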

Testing the proportional hazard assumption:
• Implemented in R: cox.zph(…)
• What makes the hazard non-proportional can be viewed using the Kaplan-Meier-estimated log-survival curve (or plot(cox.zph(…))).
• For this plant experiment, the survival curves for short day length and long day length seem to part company at around day … . This doesn't necessarily invalidate the Cox regression, but it makes the hazard ratio an average effect.
[Figure: Kaplan-Meier estimate for the log-survival curve.]
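A sketch of the check on simulated data in which the hazards are deliberately not proportional. The variable dlen mimics the day-length treatment, but the data, distributions and parameter values are invented for this example and are not the plant data.

```r
library(survival)

set.seed(3)                                          # arbitrary seed
n <- 400
dlen <- rep(c("short", "long"), each = n / 2)        # hypothetical day-length groups

# Short days: Weibull with increasing hazard. Long days: constant hazard.
# The ratio of the two hazards therefore changes over time.
true.time <- ifelse(dlen == "short",
                    rweibull(n, shape = 2, scale = 10),
                    rexp(n, rate = 0.1))
cens.time <- runif(n, 0, 30)                         # assumed censoring times
t.event   <- pmin(true.time, cens.time)
censoring <- as.numeric(true.time <= cens.time)      # 1 = transition seen, 0 = censored

fit <- coxph(Surv(t.event, censoring) ~ dlen)
zp  <- cox.zph(fit)      # test of the proportional hazard assumption

print(zp)                # a small p-value suggests non-proportional hazards
# plot(zp)               # residual plot showing how the effect drifts over time
```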