Lecture 7 - Binomial and Logistic Regression


1 Lecture 7 - Binomial and Logistic Regression
C. D. Canham
Likelihood Methods in Ecology, April 2011, Granada, Spain
Lecture 7: Analysis of Categorical and Ordinal Data: Binomial and Logistic Regression

2 Example: analysis of windthrow data
Traditionally: summarize variation in the degree and type of damage, across species and tree sizes, from the storm as a whole...
A likelihood alternative: use the spatial variation in storm intensity that occurs within a given storm to estimate parameters of functions that describe susceptibility to windthrow, as a function of variation in storm severity and individual tree attributes...
There are whole issues of journals devoted to summarizing the effects of notable storms (e.g. the issue of Biotropica containing papers devoted to Hurricane Hugo). Traditional approaches often lack generality because they essentially treat each storm as unique. While this is undoubtedly true, it misses the point that individual storms almost always contain variability in storm intensity that can be exploited to develop a more general, predictive understanding of how forests respond to a storm regime.

3 Types of Response Variables
(with examples from the analysis of windthrow data)
BINARY: only two possible outcomes (yes/no; lived/died; etc.). This is termed a "Bernoulli trial."
CATEGORICAL: multiple categories (uprooted, snapped, ...)
ORDINAL: ordered categories (degree of damage): none, light, medium, heavy, complete canopy loss (usually estimated visually)
CONTINUOUS: just what the term implies, but rarely used in analyses of wind damage because of the difficulty of quantifying damage accurately...
There are different statistical approaches, depending on the nature of the response variable. The first three kinds of response variables are most commonly used because of the difficulty of actually measuring the amount of damage to an individual tree.

4 Analysis of Binary Data
Binomial Regression: used when the individual "trial" is not the unit of study, but rather when there are replicates of a set of trials (e.g. seedlings in a quadrat). In the past, this type of dataset was often analyzed by converting the response variable to a percentage, and then doing regression on the percentages (after ugly transformations...). The model predicts the underlying binomial probability that would produce the observed number of successes given the number of trials.
Logistic Regression: used when the individual Bernoulli trial is the unit of study (e.g. did the tree die...). The model predicts the probability of "success" in a given trial.
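The distinction above can be made concrete in R: a Bernoulli trial is just a binomial with one trial, so `dbinom()` covers both cases. The data values here are hypothetical.

```r
# A Bernoulli trial is just a binomial with size = 1, so dbinom() covers
# both cases (data values here are hypothetical)
p <- 0.6

# Binomial regression case: quadrat-level data, e.g. 7 of 10 seedlings survived
ll_binomial <- dbinom(x = 7, size = 10, prob = p, log = TRUE)

# Logistic regression case: each tree is its own Bernoulli trial
died <- c(1, 0, 0, 1, 1)
ll_logistic <- sum(dbinom(x = died, size = 1, prob = p, log = TRUE))
```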

5 Steps in a likelihood analysis for binomial regression
In R:
1. Specify the "scientific model" that predicts the probability of "success" as a function of a set of independent variables. Note that your scientific model should predict expected values bounded by 0 and 1 (since the predicted value is a probability).
2. Define the likelihood function (using dbinom):

binom_log_lh_function <- function(successes, trials, p) {
  dbinom(x = successes, size = trials, prob = p, log = TRUE)
}

3. Set up optimization to find the parameters of the scientific model that maximize the likelihood across the dataset.
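The three steps above can be sketched end-to-end with `optim()`. This is a minimal illustration on simulated data; the function and variable names, the inverse-logit scientific model, and the parameter values are all hypothetical, not taken from the lecture's dataset.

```r
# Sketch of the three steps, on simulated data (all names hypothetical)
set.seed(42)

# Step 1: scientific model -- inverse-logit keeps predictions between 0 and 1
pred_prob <- function(b0, b1, x) 1 / (1 + exp(-(b0 + b1 * x)))

# Hypothetical dataset: 50 quadrats of 20 seedlings ("trials") each
x         <- runif(50, 0, 10)
trials    <- rep(20, 50)
successes <- rbinom(50, size = trials, prob = pred_prob(-2, 0.5, x))

# Step 2: negative log-likelihood, built on dbinom (optim minimizes)
neg_ll <- function(par) {
  -sum(dbinom(x = successes, size = trials,
              prob = pred_prob(par[1], par[2], x), log = TRUE))
}

# Step 3: find the maximum-likelihood parameter estimates
fit <- optim(par = c(0, 0), fn = neg_ll)
fit$par  # should land near the true values (-2, 0.5)
```

Note that optim() minimizes by default, which is why the function returns the negative log-likelihood.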

6 Analysis of Binary Data: Traditional Logistic Regression
Consider a sample space consisting of two outcomes (A, B), where the probability that event A occurs is p.
Definition: logit = log of an odds ratio, i.e. log[p/(1-p)]
Benefits of logits:
- A logit is a continuous variable
- It ranges from negative when p < 0.5 to positive when p > 0.5
There are clear benefits of expressing the model in logits rather than in terms of raw probabilities (which would require functional forms bounded at 0 and 1...). These benefits were particularly important to traditional statistical packages, which could then treat logistic regression as a form of linear regression... Standard logistic regression involves fitting a linear function to the logit:
logit(p) = log[p/(1-p)] = b0 + b1X1 + b2X2 + ... + bnXn
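A quick numeric check of the logit's properties, with arbitrarily chosen probabilities:

```r
# Quick numeric check of the logit's properties (values chosen arbitrarily)
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(z) 1 / (1 + exp(-z))

logit(0.5)              # 0: the logit is negative below p = 0.5, positive above
logit(0.9)              # log(9), about 2.197
inv_logit(logit(0.25))  # round trip recovers 0.25
```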

7 What if your terms are multiplicative?
Example: assume that the probability of windthrow is a joint (multiplicative) function of
- storm severity, and
- tree size
In addition, assume that the effect of DBH is nonlinear... A model that incorporates these terms can be written as:
logit(Pisj) = as + cs·Si·DBHisj^bs
However, there will be cases where the model you would like to test involves multiplicative terms (i.e. interactions), and these cannot be handled in traditional logistic regression (at least without additional transformation). Categorical modeling procedures (log-linear models) in traditional packages will allow interaction terms, but they are clumsy to use. Likelihood methods can incorporate interaction terms easily (at least in principle). The challenge is with the data: if the two multiplicative terms are highly correlated, an estimator will have great difficulty finding a maximum likelihood solution because of parameter tradeoffs...

8 A little more detail...
Pisj is the probability of windthrow of the jth individual of species s in plot i; DBHisj is the DBH of that individual; as, bs, and cs are species-specific estimated parameters; and Si is the estimated storm severity in plot i.
NOTE: storm severity is an arbitrary index, and was allowed to range from 0 to 1.
NOTE: you can think of this as a hierarchical model, with trees nested in plots, and Si as the plot term.
But don't you have to measure storm severity (not estimate it)? If you understand that you can estimate storm severity, rather than measure it, then you understand the most important concept of the likelihood method... The model requires estimating a large number of parameters: 3 for each species in the dataset, and 1 for each plot. Thus, plots need to be large enough to hold a minimum of 30-40 trees. If they are too large, the assumption of uniform storm severity within the plot will be suspect; if they are too small, you won't have enough data points to fit this large number of parameters...
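One plausible reading of the multiplicative model described above can be sketched as an R function (see Canham et al. 2001 for the published form; the parameter values here are purely illustrative):

```r
# One plausible reading of the model described above (see Canham et al. 2001
# for the published form); parameter values here are purely illustrative
p_windthrow <- function(a, b, c, S, dbh) {
  z <- a + c * S * dbh^b  # a, b, c: species-specific; S: plot storm severity (0-1)
  1 / (1 + exp(-z))       # inverse-logit keeps the prediction in (0, 1)
}

# Risk rises with storm severity (and, for b > 0, with tree size)
p_low  <- p_windthrow(a = -4, b = 0.8, c = 0.3, S = 0.2, dbh = 50)
p_high <- p_windthrow(a = -4, b = 0.8, c = 0.3, S = 0.9, dbh = 50)
```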

9 Likelihood Function for Logistic Regression
It couldn't be any easier, since the scientific model is already expressed as a probabilistic equation. The probability model and the scientific model are identical in this example.

loglikelihood <- function(pred, observed) {
  ifelse(observed == 1, log(pred), log(1 - pred))
}
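Using the slide's likelihood function over a small hypothetical dataset (the function is repeated here so the sketch runs on its own):

```r
# The slide's likelihood function, repeated so this sketch is self-contained
loglikelihood <- function(pred, observed) {
  ifelse(observed == 1, log(pred), log(1 - pred))
}

pred     <- c(0.9, 0.2, 0.7)  # predicted probabilities from the scientific model
observed <- c(1,   0,   1)    # observed fates (1 = windthrown)
ll1 <- sum(loglikelihood(pred, observed))

# Identical result via dbinom, since a Bernoulli trial is a binomial with size = 1
ll2 <- sum(dbinom(observed, size = 1, prob = pred, log = TRUE))
```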

10 Example: Windthrow in the Adirondacks
Highly variable damage due to:
- variation within the storm
- topography
- susceptibility of species within a stand
The report written by Jerry Jenkins provides an excellent synthesis of the ecological and policy implications of the storm...
Reference: Canham, C. D., Papaik, M. J., and Latty, E. F. 2001. Interspecific variation in susceptibility to windthrow as a function of tree size and storm severity for northern temperate tree species. Canadian Journal of Forest Research 31:1-10.

11 The dataset
Study area: a 15 x 6 km area perpendicular to the storm path.
43 circular 0.125-ha plots (19.95 m radius); 20 of the 43 were in old-growth forests. The plots were chosen to span a wide range of apparent damage.
All trees > 10 cm DBH were censused, and tallied as windthrown if uprooted or if the stem was < 45° from the ground.
We censused windthrow in the 43 plots between June 6 and July 15. Windthrow was assessed on all trees > 10 cm DBH rooted within the plot. Saplings (stems > 2 cm DBH and < 10 cm DBH) were censused in a 5 m radius subplot at the center of each plot. Individuals were considered windthrown by the storm if they had been uprooted so that the stem was less than 45° from the ground, or if the trunk had been broken below the crown. The plots were distributed in an approximately 15 km x 6 km area running perpendicular to the storm path.

12 Critical data requirements
- Variation in storm severity across plots
- Variation in DBH and species mixture within plots
Without considerable variation in both tree sizes and species composition within plots, storm severity (a plot attribute) would be confounded with tree size and the species-specific parameters (a, b, and c).

13 The analysis...
7 species comprised 97% of stems; only stems of those 7 species were included in the dataset for analysis.
# of parameters = 64 (43 plots + 3 parameters for each of 7 species)
Parameters were estimated using simulated annealing. See Canham et al. (2001) for details.

14 Model evaluation
[Figure: observed vs. predicted windthrow; numbers above bars are the number of observations in each class; the solid line is a 1:1 relationship.]
There is no analogue to traditional measures of R2 for logistic regression. This visual method is the best I have been able to come up with. There are various other single-number measures in the literature, but none of them are very meaningful, from my point of view.

15 Estimating Storm Severity
The method stretches the observed variation in "severity" across the allowed range of 0 to 1. Overall, plots with high estimated severity had a high fraction of density or basal area windthrown, but there was considerable scatter. This presumably reflects the fact that some plots had a high fraction windthrown because they were made up of particularly susceptible trees. Note that we did not sample any plots in which all trees > 10 cm DBH were blown down, so we may need to wait for new storms to sample the most extreme events. However, given the relative resistance of small stems, it would appear that we came close to sampling truly "catastrophic" disturbance...

16 Results: Big trees...
There was striking variation among species in the susceptibility of large trees to windthrow. The functions were still quite different for medium-sized stems.

17 Little trees...
We observed almost no windthrow of stems < 10 cm DBH. The predicted curves for 10 cm stems were more similar to one another than for larger stems, but there was still considerable variation among species...

18 New twists
- Effects of partial harvesting on the risk of windthrow to residual trees
- Effects of proximity to the edges of clearings on the risk of windthrow
Research with Dave Coates in the cedar-hemlock forests of interior B.C.

19 Effects of harvest intensity and proximity to edge...
Equation (1): basic model; the probability of windthrow is a species-specific function of tree size and storm severity.
Equation (2): introduces the effect of prior harvest removal to equation (1) by adding basal area removal, and assumes the effect is independent and additive.
Equation (3): assumes the effects of prior harvest interact with tree size.
Models 1a-3a test models in which separate c coefficients are estimated for "edge" vs. "non-edge" trees (edge = any tree within 10 m of a forest edge).
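The three nested model forms can be sketched as follows. These are illustrative reconstructions, not the published equations; the harvest coefficient `d` and the basal-area-removal term `R` are hypothetical names introduced here.

```r
# Hedged sketch of the three nested model forms described above; the exact
# published equations differ, and the names d and R are illustrative.
# R = basal area removed by the prior harvest
inv_logit <- function(z) 1 / (1 + exp(-z))

p_eq1 <- function(a, b, c, S, dbh)       inv_logit(a + c * S * dbh^b)            # Eq. (1)
p_eq2 <- function(a, b, c, d, S, dbh, R) inv_logit(a + c * S * dbh^b + d * R)    # Eq. (2): additive
p_eq3 <- function(a, b, c, d, S, dbh, R) inv_logit(a + (c + d * R) * S * dbh^b)  # Eq. (3): interacts with size

# With no harvest removal (R = 0), both extended models collapse to Eq. (1)
```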

20 Other issues...
Is the risk of windthrow independent of the fate of neighboring trees? (not likely)
Should we examine spatially explicit models that factor in the "nucleating" process of spread of windthrow gaps?...

21 Analysis for CATEGORICAL Response Variables
An extension of the binary case?: estimate a complete set of species-specific parameters for each of n-1 categories (assuming that the set of categories is complete and mutually exclusive...).
# of parameters required = P + (n-1)*(3*S), where P = # of plots, S = # of species, and n = # of response categories. (Is this feasible?...)
I have not tried this, but in principle it would seem reasonable to use the same storm severity estimates for each category, and just estimate n-1 sets of the species-specific parameters for the n categories. As long as the list of categories is complete and mutually exclusive, you wouldn't need to estimate n sets of parameters, since the probability of the nth event is simply 1 minus the cumulative probability of the other n-1 events... In effect, each category is treated as a possible outcome with probability p, and all other outcomes combined have probability 1-p. Then estimate separate sets of species-specific parameters for the n-1 outcomes (leaving the no-damage category unestimated), and 1 set of site severity indices. In reality, once a tree snaps off, it is no longer available to be uprooted, so the probabilities are effectively conditional, and this problem may be much harder than it appears...

22 Analysis for ORDINAL Response Variables
The categories in this case are ranked (e.g. none, light, heavy damage). The analysis shifts to cumulative probabilities...

23 Simple Ordinal Logistic Regression
If γk = P(y ≤ Yk | X) (i.e. the probability that an observation y will be less than or equal to ordinal level Yk, for k = 1..n-1 levels, given a vector of X explanatory variables), then simple ordinal logistic regression fits a model of the form:
logit(γk) = αk + βX
The most common approach for extending logistic regression to an ordinal scale (i.e. a range of damage levels) is often called "parallel slopes" logistic regression, because it assumes that the coefficients β do not change across levels. Instead, additional "intercept" terms are added to the model, so that the logits for the cumulative probabilities of the ordinal levels are a set of parallel lines with different intercepts.
Remember: the probability that an event will fall into a single class k (rather than the cumulative probability) is simply P(y = Yk) = γk - γk-1.

24 The "Parallel Slopes" form of ordinal logistic regression
The challenge: since the response categories are ordinal, and the model predicts cumulative probabilities, we need a scientific model that generates predictions that keep the categories in order (i.e. the predicted cumulative probability that a response is in or below level k must be greater than the predicted cumulative probability for level k-1).
The parallel slopes solution: allow the intercept term in the equation for the logit to vary among the k ordinal responses, while the slope stays constant. (Note that you only need k-1 intercepts...)
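The parallel slopes idea above can be verified numerically: with ordered intercepts and a single shared slope, the cumulative probabilities stay in order for any value of x. The intercepts, slope, and x below are illustrative values.

```r
# Parallel-slopes sketch: k-1 ordered intercepts, one shared slope, so the
# cumulative probabilities stay in order for every x (values illustrative)
inv_logit <- function(z) 1 / (1 + exp(-z))

alpha <- c(-1, 0.5, 2)  # k - 1 = 3 ordered intercepts (4 ordinal levels)
beta  <- -0.8           # single shared slope
x     <- 1.3

cum_p <- inv_logit(alpha + beta * x)  # P(y <= level k) for k = 1..3
all(diff(cum_p) > 0)                  # TRUE: ordering is preserved
```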

25 In our case...
We will use the multiplicative model, but with the same assumption that the differences among the levels can be estimated from variation in the a parameter:
logit(γksij) = aks + cs·Si·DBHisj^bs
where aks, cs, and bs are species-specific parameters (s = 1..S species), and Si are the estimated storm severities for the i = 1..N plots.
# of parameters: N + (K-1+2)*S, where N = # of plots, K = # of ordinal response levels, and S = # of species.

26 The Likelihood Function Stays the Same
Again, since the scientific model is already expressed as a probabilistic equation, the probability model and the scientific model are identical. The probability that an event will fall into a single class k (rather than the cumulative probability) is simply P(y = Yk) = γk - γk-1.
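Recovering single-class probabilities from the cumulative ones is one line of R; the cumulative probabilities below are illustrative values, with the top level's cumulative probability fixed at 1.

```r
# Single-class probabilities from the cumulative ones:
# P(y = k) = P(y <= k) - P(y <= k-1), with P(y <= K) = 1 for the top level
cum_p   <- c(0.2, 0.55, 0.85)    # illustrative cumulative probs, levels 1..3
class_p <- diff(c(0, cum_p, 1))  # individual probabilities for all 4 levels
sum(class_p)                     # 1: the categories are exhaustive
```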

27 Hurricane Damage in Puerto Rico
Storm damage assessment in the permanent plot at the Luquillo LTER site:
Hurricane Hugo - 1989
Hurricane Georges - 1998
Combined the data into a single analysis: 136 plots, 13 species (including 1 lumped category for "other" species), and 3 damage levels:
- no or light damage
- partial damage
- complete canopy loss
Total # of parameters = 188 (15,647 trees)
The Hurricane Hugo dataset contained damage assessments over the entire Luquillo permanent plot. I aggregated the 20 x 20 m plots into contiguous 40 x 40 m plots, starting at the origin of the plot, to provide a small enough sample of plots (96 instead of ~400), each containing sufficient numbers of individuals for the analysis. I omitted the top row of 20 x 20 m plots (since the 25 rows of plots don't divide in half evenly...). When combined with the 40 plots from the Hurricane Georges dataset (30 x 30 m plots), this gave us a dataset with 136 plots and 15,647 trees, for which damage was coded as either (1) no or light damage, (2) partial damage (medium damage from Hugo, or medium or high damage from Georges), or (3) complete canopy loss.
Reference: Canham, C. D., J. Thompson, J. K. Zimmerman, and M. Uriarte. Variation in susceptibility to hurricane damage as a function of storm intensity in Puerto Rican tree species. Biotropica, in press.

28 Parameter Estimation with Simulated Annealing
I ran the simulated annealing algorithm for 5 million iterations, but it's not obvious that it has converged yet... Solving simultaneously for 188 parameters in a dataset containing > 15,000 trees takes time!
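R's own `optim()` includes a simulated-annealing option (`method = "SANN"`), which gives a feel for the approach. The toy quadratic surface below stands in for the real 188-parameter likelihood; names and values are illustrative.

```r
# R's optim() includes a simulated-annealing option (method = "SANN");
# a toy surface stands in for the real 188-parameter likelihood
set.seed(1)
neg_ll <- function(par) (par[1] - 2)^2 + (par[2] + 1)^2  # stand-in for -log L

fit <- optim(par = c(0, 0), fn = neg_ll, method = "SANN",
             control = list(maxit = 20000))
fit$par  # near (2, -1), though SANN offers no convergence guarantee
```

As the slide notes, simulated annealing gives no clear convergence signal, which is why a run over many parameters needs very large iteration counts.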

29 Model Evaluation
The model does not fit as well as the simple logistic regression for the Adirondack data. In particular, the model tends to overestimate complete damage (i.e. fewer trees were observed with complete damage than predicted for the classes with a high predicted probability of complete damage...).

30 Comparison of the two storms...
Remember that the estimator will effectively stretch the observed range of storm severity to the limits allowed (set to 0-1). Thus, the estimated maximum for Hugo (obviously the more severe storm) is 1, while the maximum estimated plot severity for Georges was lower. Curiously, the variance in storm severity within the study site was very similar for the two storms...
[Table: statistics on variation in storm severity from Hurricanes Hugo and Georges]

31 Support for the Storm Severity Parameter Estimates
Support limits for the 136 estimates of storm severity were not particularly "tight". Remember that the storm severity parameter values range from 0 to 1.

32 Support for the Species-specific Parameters
Strength of support for the species-specific parameters was better, but still not great...
[Figure: range of the 1.92-unit support intervals, as a % of the parameter estimate]

33 Estimated damage functions
These are the estimated functions for a sample of 4 of the 13 species, arranged in rough order of increasing vulnerability. These are not the cumulative probabilities, but the individual probabilities of experiencing one of the 3 damage levels...

34 Critical assumptions
- The probability of damage to a tree in Georges was independent of damage in Hugo (actually true...)
- The "parallel slopes" model is reasonable
- Others?
Combining data from the two storms at the same site is risky. There are obviously reasons to suspect that the level of damage in the first storm influences the risk of damage in a subsequent storm. This could be incorporated in the analysis, but I haven't tried to do it yet...

