1
Logistic regression
2
Analysis of proportion data
We know how many times an event occurred and how many times it did not occur. We want to know whether these proportions are affected by a treatment or a factor.
Examples:
Proportion dying
Proportion responding to a treatment
Proportion of one sex
Proportion flowering
3
The old-fashioned way
People used to model these data using percentages as the response variable…
The problems with this are:
Errors are not normally distributed!
The variance is not constant!
The response is bounded (0-1)!
We lose information on the sample size!
4
However…
Some data, such as percentage of plant cover, are better analyzed using conventional models (normal errors and constant variance) following the arcsine transformation (with the transformed response measured in radians)…
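A minimal sketch of the arcsine transformation in R, assuming the usual arcsine square-root form applied to proportions. The vector `cover_pct` is made up for illustration and is not data from these slides.

```r
# Hypothetical percentage cover values (illustrative only)
cover_pct <- c(0, 5, 25, 50, 75, 95, 100)

# Convert percentages to proportions, then apply the arcsine
# square-root transformation; the result is in radians (0 to pi/2)
cover_asin <- asin(sqrt(cover_pct / 100))

print(cover_asin)
```

The transformed values could then serve as the response in an ordinary linear model with normal errors.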
6
If the response variable takes the form of a percentage change in some measurement
It is usually better to use one of the following:
Analysis of covariance, using final weight as the response variable and initial weight as the covariate
A relative growth rate as the response variable, measured as log(final/initial)
Both can be analyzed with normal errors without further transformation!
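Both alternatives can be sketched in a few lines of R. The weights and the `treatment` factor below are hypothetical values invented for illustration; only the model forms follow the slide.

```r
# Hypothetical initial and final weights for six individuals
initial   <- c(10.2, 8.7, 12.1, 9.5, 11.0, 9.9)
final     <- c(14.8, 11.9, 15.0, 13.3, 16.1, 12.4)
treatment <- factor(c("control", "control", "control",
                      "treated", "treated", "treated"))

# Option 1: analysis of covariance, final weight as the response
# and initial weight as the covariate
ancova <- lm(final ~ initial + treatment)

# Option 2: relative growth rate as the response
rgr     <- log(final / initial)
rgr_fit <- lm(rgr ~ treatment)
```

Either fit uses ordinary normal errors, so no percentage response is needed.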
7
Rationale for logistic regression
The traditional transformation of proportion data was arcsine. This transformation took care of the error distribution. There is nothing wrong with this transformation, but a simpler approach is often preferable, and is likely to produce a model that is easier to interpret…
8
The logistic curve
The logistic curve is commonly used to describe data on proportions. It asymptotes at 0 and 1, so that negative proportions and responses of more than 100% cannot be predicted.
9
Binomial errors
If p is the proportion of individuals observed to respond in a given way, then the proportion of individuals that respond in alternative ways is 1 - p, and we shall call this proportion q.
n is the size of the sample (or number of attempts).
An important point is that the variance of the binomial distribution is not constant. In fact, the variance of a binomial distribution with mean np is:
$s^2 = npq$
So the variance changes with the mean: it is zero at p = 0 and p = 1, and largest at p = 0.5.
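The dependence of the binomial variance on p can be checked numerically. This is a small sketch using the standard result that the variance is n times p times q; the sample size `n = 20` is an arbitrary choice for illustration.

```r
n <- 20                       # arbitrary sample size (assumption)
p <- seq(0, 1, by = 0.1)      # range of proportions
q <- 1 - p

# Binomial variance: n * p * q
variance <- n * p * q

# Mean and variance side by side
print(data.frame(p = p, mean = n * p, variance = variance))
```

The variance vanishes at both ends of the range and peaks in the middle, which is why a constant-variance model is inappropriate for proportions.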
10
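The logistic model
The logistic model for p as a function of x is given by:
$p = \frac{e^{a+bx}}{1+e^{a+bx}}$
This model is bounded since:
$p \to 0$ as $x \to -\infty$ and $p \to 1$ as $x \to +\infty$ (for $b > 0$; the two limits swap when $b < 0$)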
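A short R sketch of the logistic model, assuming the usual form p = exp(a + bx) / (1 + exp(a + bx)). The function name `logistic` and the parameter values are illustrative choices, not taken from the slides.

```r
# Logistic curve for p as a function of x, with intercept a and slope b
logistic <- function(x, a, b) {
  exp(a + b * x) / (1 + exp(a + b * x))
}

# Evaluate at extreme and central values of x (a = 0, b = 1 assumed)
x <- c(-50, -2, 0, 2, 50)
p <- logistic(x, a = 0, b = 1)

print(p)
```

However large or small x becomes, the predicted values stay between 0 and 1.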
11
The trick of linearizing the logistic model is a simple transformation known as the logit:
$\ln\left(\frac{p}{q}\right) = a + bx$
See the class website for a fuller description of the logit transformation.
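To see the linearization at work, the sketch below generates proportions from a logistic curve and applies the logit, ln(p / (1 - p)). The values of `a` and `b` are arbitrary assumptions chosen for illustration.

```r
# Logit transformation: log odds of p
logit <- function(p) log(p / (1 - p))

a <- 0.5                      # arbitrary intercept (assumption)
b <- 1.2                      # arbitrary slope (assumption)
x <- seq(-3, 3, by = 0.5)

# Proportions on the logistic curve
p <- exp(a + b * x) / (1 + exp(a + b * x))

# On the logit scale the relationship with x is a straight line
z <- logit(p)
print(cbind(x = x, p = p, logit = z, linear = a + b * x))
```

The `logit` and `linear` columns agree up to rounding error, so a linear predictor can be fitted on the logit scale.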
12
Hypericum cumulicola
Small short-lived perennial herb
Narrowly endemic and endangered
Flowers are small and bisexual
Self-compatible, but requires pollinators to set seed
Menges et al. (1999); Dolan et al. (1999); Boyle and Menges (2001)
13
Demographic data
15 populations (various patch sizes)
>80 individuals per population each year
Data on height and number of reproductive structures
Survival between August 1994 and August 1995
14
Histogram of height (cm) Hypericum cumulicola (1994)
15
Call:
glm(formula = survival ~ height, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) <2e-16 ***
height <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on 878 degrees of freedom
Residual deviance: on 877 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 4
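A model of this form can be reproduced in outline as follows. The data here are simulated, not the Hypericum cumulicola data: the height range and the coefficients used to generate survival are arbitrary assumptions, and only the sample size of 879 is chosen to match the 878 null degrees of freedom reported above.

```r
set.seed(1)

n      <- 879                          # matches 878 null degrees of freedom
height <- runif(n, min = 5, max = 60)  # simulated heights in cm (assumption)

# Simulated survival (1 = survived, 0 = died) from an arbitrary logistic curve
true_p   <- plogis(-1 + 0.05 * height)
survival <- rbinom(n, size = 1, prob = true_p)

# Logistic regression with binomial errors (logit link by default)
fit <- glm(survival ~ height, family = binomial)

summary(fit)
```

The coefficients in the summary are on the logit scale, so they describe the change in log odds of survival per centimetre of height.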
16
Calculating a given proportion
You can back-transform from logits (z) to proportions (p) by:
$p = \frac{1}{1 + 1/e^{z}}$
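The back-transformation can be sketched in R as below, assuming the standard inverse logit p = 1 / (1 + 1/exp(z)). The logit values in `z` are arbitrary examples; base R's `plogis()` computes the same quantity.

```r
# Example values on the logit scale (illustrative only)
z <- c(-2, 0, 2)

# Back-transform from logits to proportions
p <- 1 / (1 + 1 / exp(z))

# plogis() is the built-in inverse logit and gives the same result
p_builtin <- plogis(z)

print(cbind(z = z, p = p, plogis = p_builtin))
```

For a fitted `glm` object, `predict(..., type = "response")` returns predictions that are already back-transformed to the proportion scale.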
17
Survival vs. height
18
Survival vs. Rep. Structures