Hierarchical Models
Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
- Light limitation of trees
  - Simple non-hierarchical model
  - Adding individual differences in max growth rate
- LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment
R.A. Fisher's ticks
A simple example: we want to know (for some reason) the average number of ticks on sheep. We round up 60 sheep and count the ticks on each one. Does a Poisson distribution fit the data?
A single mean governs the pattern
(DAG: a single parameter, the mean, gives rise to the data)
Exercise: Write out a simple Bayesian model for estimating the average number of ticks per sheep. How would you estimate that mean?
model{
  # prior
  lambda ~ dgamma(0.001, 0.001)
  # likelihood
  for(i in 1:60){
    y[i] ~ dpois(lambda)
  }
}
Each sheep has its own mean (a.k.a. random effect)
(DAG: hyperparameters → parameter for each sheep → data)
In the simple model, the shape and rate of the gamma prior must be numbers. Hierarchical models, in contrast, have quantities that appear on both sides of the |.
Hierarchical Model

model{
  # priors
  a ~ dgamma(0.001, 0.001)
  b ~ dgamma(0.001, 0.001)
  # likelihood
  for(i in 1:60){
    lambda[i] ~ dgamma(a, b)
    y[i] ~ dpois(lambda[i])
  }
} # end of model
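Written out in P() notation, the factorization this code (and its DAG) implies is:

$$P(a, b, \boldsymbol{\lambda} \mid \mathbf{y}) \propto \prod_{i=1}^{60} \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid a, b)\; P(a)\,P(b)$$

with gamma(0.001, 0.001) priors on a and b.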
Hidden Processes in Ecology
The things we are able to observe are not usually what we seek to understand. The things we wish to estimate but cannot observe we call latent. We assume that the true, latent state gives rise to the observations with some uncertainty: observation error. We have a model that predicts the true, latent state imperfectly. All the things that influence the true, latent state that are not represented in our model we treat stochastically as process variance. All this means that we need a model for the data (with uncertainty) and a model for the process (with uncertainty).
Blueprint for Hierarchical Bayes
- The probabilistic basis for factoring complex models into sensible components via rules of conditioning
- Directed Acyclic Graphs (DAGs), again
- Conditioning from DAGs
- JAGS and MCMC from conditioning
Conditioning
Remember from basic laws of probability that:

$$P(z_1, z_2) = P(z_1 \mid z_2)\,P(z_2)$$

This generalizes to:

$$P(z_1, z_2, \ldots, z_n) = P(z_1 \mid z_2, \ldots, z_n)\,P(z_2 \mid z_3, \ldots, z_n)\cdots P(z_{n-1} \mid z_n)\,P(z_n)$$

where the components z_i may be scalars or subvectors of z and the sequence of their conditioning is arbitrary.
So what does this mean? Every ordering of the conditioning is correct. Any quantity that appears on the rhs of a | and never appears on the lhs of a | must end up as an unconditional P().
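For example, with z = {z_1, z_2, z_3}, all of these factorizations are equally valid:

$$P(z_1, z_2, z_3) = P(z_1 \mid z_2, z_3)\,P(z_2 \mid z_3)\,P(z_3) = P(z_3 \mid z_1, z_2)\,P(z_1 \mid z_2)\,P(z_2) = P(z_2 \mid z_3, z_1)\,P(z_3 \mid z_1)\,P(z_1)$$

In each ordering, the variable that never appears on the left of a | (here z_3, z_2, and z_1, respectively) ends up in an unconditional P().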
A general, hierarchical model for ecological processes
Define:
- θ: a vector of parameters that can be decomposed into two additional vectors, θ = {θ_p, θ_d}, where θ_d are parameters in a model of the data and θ_p are parameters in the model of the process. Both vectors include parameters representing uncertainty.
- y: a vector or matrix of data, including responses and covariates.
- μ: the unobserved, true state of the system: the size of a population, the number of sites occupied, the mass of C in a gram of soil, etc.
A general, hierarchical model for ecological processes
We will make use of the following three laws from probability theory:
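A reasonable reading, assuming the standard trio used throughout this material: conditioning (the chain rule), Bayes' theorem, and marginalization:

$$P(z_1, z_2) = P(z_1 \mid z_2)\,P(z_2)$$
$$P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$
$$P(z_1) = \int P(z_1, z_2)\,dz_2$$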
A general, hierarchical model for ecological processes

$$\underbrace{P(\theta_p, \theta_d, \mu \mid y)}_{\text{posterior}} \propto \underbrace{P(y \mid \mu, \theta_d)}_{\text{data model}}\;\underbrace{P(\mu \mid \theta_p)}_{\text{process model}}\;P(\theta_d)\,P(\theta_p)$$

where y is the data, μ is the prediction of the true state by a process model, θ_d are the parameters in the data model (including observation uncertainty), and θ_p are the parameters in the process model (including process uncertainty).
This general set-up provides a basis for Gibbs sampling or JAGS code. Gibbs sampling: for each step in the chain, we cycle over each parameter and sample from its posterior distribution based on the probability densities where it appears, treating the other parameters at their current values. That is, we sample from the full conditional distributions:

$$P(\theta_d \mid \cdot) \propto P(y \mid \mu, \theta_d)\,P(\theta_d)$$
$$P(\theta_p \mid \cdot) \propto P(\mu \mid \theta_p)\,P(\theta_p)$$
$$P(\mu \mid \cdot) \propto P(y \mid \mu, \theta_d)\,P(\mu \mid \theta_p)$$

remembering that there may be several more steps because θ_d and θ_p are vectors of parameters.
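As a concrete example, in the tick model above the full conditional for each λ_i is available in closed form by gamma-Poisson conjugacy:

$$P(\lambda_i \mid \cdot) \propto \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid a, b) = \text{gamma}(\lambda_i \mid a + y_i,\; b + 1)$$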
Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
- Light limitation of trees
  - Simple non-hierarchical model
  - Adding individual differences in max growth rate
- LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment
Steps
1. Diagram the unknowns and knowns for a single observation (a DAG).
2. Using P() notation, write out an expression for the posterior distribution, using the arrows in the DAG as a guide.
3. Choose appropriate distributions for the P()'s.
4. Take products over all observations.
5. Implement via MCMC or JAGS.
Developing a simple Bayesian model for light limitation of trees
- μ_i = prediction of growth for tree i
- x_i = light measured for tree i
- α = maximum growth rate at high light
- c = minimum light requirement
- γ = slope of curve at low light
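A saturating function consistent with these definitions (assuming the Michaelis-Menten-style form commonly used for this example) is:

$$\mu_i = \frac{\alpha (x_i - c)}{\frac{\alpha}{\gamma} + (x_i - c)}$$

At high light, μ_i approaches α; at x_i = c, growth is zero; and near c, the curve rises with slope γ.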
A simple Bayesian model
Write out the numerator of Bayes' law.
A simple Bayesian model Now write out the full model by choosing appropriate distributions for the P()’s. Assume that the y’s can take negative values. There are n observations.
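One full model, as a sketch (assuming a normal likelihood, since the y's can be negative, and vague gamma priors on the positive parameters):

$$P(\alpha, \gamma, c, \sigma \mid \mathbf{y}) \propto \prod_{i=1}^{n} \text{normal}(y_i \mid \mu_i, \sigma^2)\;\text{gamma}(\alpha \mid 0.001, 0.001)\,\text{gamma}(\gamma \mid 0.001, 0.001)\,\text{gamma}(c \mid 0.001, 0.001)\,\text{gamma}(\sigma \mid 0.001, 0.001)$$

with μ_i given by the light-limitation function above.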
JAGS code
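A minimal JAGS sketch of this model, assuming normal errors and the growth function above (node names alpha, gamma, c, and tau are illustrative):

model{
  # priors (vague, on positive parameters)
  alpha ~ dgamma(0.001, 0.001)   # maximum growth rate
  gamma ~ dgamma(0.001, 0.001)   # slope at low light
  c ~ dgamma(0.001, 0.001)       # minimum light requirement
  tau ~ dgamma(0.001, 0.001)     # precision = 1/sigma^2
  sigma <- 1/sqrt(tau)
  # likelihood
  for(i in 1:n){
    mu[i] <- alpha*(x[i] - c)/(alpha/gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
}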
Now assume that there is individual variation in α (the max growth rate), such that each α_i is drawn from a distribution of α's. Sketch the DAG; write out the posterior and the joint conditional distribution using P() notation. Write out the distribution of the full data set and choose appropriate probability functions.
Adding complexity using a hierarchical model: individual variation in the α's
a, b: hyper-parameters controlling the distribution of the α's
JAGS code
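A minimal sketch of the hierarchical version, assuming each tree's maximum growth rate α_i is drawn from a gamma distribution with hyper-parameters a and b (node names illustrative):

model{
  # hyper-priors controlling the distribution of the alpha's
  a ~ dgamma(0.001, 0.001)
  b ~ dgamma(0.001, 0.001)
  # priors on the remaining parameters
  gamma ~ dgamma(0.001, 0.001)
  c ~ dgamma(0.001, 0.001)
  tau ~ dgamma(0.001, 0.001)
  for(i in 1:n){
    alpha[i] ~ dgamma(a, b)   # individual maximum growth rate
    mu[i] <- alpha[i]*(x[i] - c)/(alpha[i]/gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
}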
Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Light limitation of trees
  - Simple non-hierarchical model
  - Adding individual differences in max growth rate
- LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment
So, what is a "data model"? Let μ_i be the true (latent) value of some quantity of interest and let y_i be an observation of it: y_i ~ f_d(μ_i, θ_data), where f_d is a data model. In the simplest data model, θ_data consists solely of the standard deviation of the observations, σ_data. In this case we assume there is no bias; that is, on average y = μ. We may also have something like θ_data = [ν, η, σ_data], where ν and η are parameters in a deterministic model that corrects for bias in the y's.
Decomposition
We conduct an experiment in which we observe 5 replications of the disappearance of litter at 10 points in time. Replications could be unbalanced. We want to estimate the parameters in a model of decomposition. Our dependent variable is the true mass of litter at time t. Our model is a simple decaying exponential:

$$y = M_0 e^{-kt}$$

We are confident that there is no bias in the estimates of the mass of litter, but there are errors in the observations at each time for which we must account. These errors occur because we get different values when we observe replicates from the same sample.
Process model
Process model with true mass: $y = M_0 e^{-kt}$
Process model with true mass and observed mass
We can think of sampling variance and process variance in this way: if we increase the number of blue points (i.e., decrease sampling error), their average would asymptotically approach the red points. However, no matter how many blue points we observed, the red points would remain some distance (the process error) from our process model estimates.
Diagram the knowns and unknowns and write out the posterior and the joint conditional distribution for a single observation.
Data model: y_ij is the observed mass remaining.
Process model: μ_i is the true value of mass remaining. It is "latent", i.e., not observable.
Parameter model: priors on the unknowns.
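With distributions attached, one sketch (assuming normal distributions at both levels, consistent with the no-bias assumption; time is indexed by i, replicate by j):

$$y_{ij} \sim \text{normal}(\mu_i,\ \sigma^2_{\text{data}})$$
$$\mu_i \sim \text{normal}(M_0 e^{-k t_i},\ \sigma^2_{\text{proc}})$$

giving the posterior:

$$P(M_0, k, \boldsymbol{\mu}, \sigma_{\text{data}}, \sigma_{\text{proc}} \mid \mathbf{y}) \propto \prod_{i}\Big[\prod_{j} P(y_{ij} \mid \mu_i, \sigma_{\text{data}})\Big] P(\mu_i \mid M_0, k, \sigma_{\text{proc}})\; P(M_0)\,P(k)\,P(\sigma_{\text{data}})\,P(\sigma_{\text{proc}})$$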
Putting it all together: JAGS pseudo code
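A minimal sketch, assuming the normal data and process models above; the names n.times, t, and n.reps (which allows unbalanced replication) are illustrative:

model{
  # priors
  M0 ~ dgamma(0.001, 0.001)         # initial mass
  k ~ dgamma(0.001, 0.001)          # decay rate
  tau.data ~ dgamma(0.001, 0.001)   # sampling (observation) precision
  tau.proc ~ dgamma(0.001, 0.001)   # process precision
  for(i in 1:n.times){
    # process model: true mass with process error
    mu[i] ~ dnorm(M0*exp(-k*t[i]), tau.proc)
    # data model: replicate observations of the true mass
    for(j in 1:n.reps[i]){
      y[i,j] ~ dnorm(mu[i], tau.data)
    }
  }
}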
Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
- Light limitation of trees
  - Simple non-hierarchical model
  - Adding individual differences in max growth rate
- LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment
Accounting for errors in covariates
We now want to add a way to represent uncertainty in the observations of light; in other words, we want to model the data as well as the process. This makes sense: a single measurement of light cannot represent the light that is "seen" by a tree. For each tree, we take 10 different measurements of light at different points in the canopy. For now, we assume that the growth rates (the y's) are measured perfectly. There are n = 50 trees with j = 10 light observations per tree, for a total of n·j = 500 observations. We index the observations within a plant using i[j], meaning tree i containing light measurement j. The total set of light measurements is indexed 1, ..., N = n·j. This notation is useful because the number of measurements per tree can vary.
Data model, process model, parameter model, hyper-parameters: same as before, but assuming perfect observations of x and y.
x_i[j] reads: tree i containing light observation j. Write out the Bayesian model using P() notation, now with a data model for the light observations as well as the process model, parameter model, and hyper-parameters.
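One way to write it, as a sketch (assuming each tree has a true, latent light level χ_i that generates its readings, and individual α_i's as before; the symbol χ is illustrative):

$$P(\boldsymbol{\alpha}, \gamma, c, a, b, \boldsymbol{\chi}, \sigma_{\text{data}}, \sigma_{\text{proc}} \mid \mathbf{y}, \mathbf{x}) \propto \prod_{i=1}^{n}\Big[\prod_{j} P(x_{i[j]} \mid \chi_i, \sigma_{\text{data}})\Big] P(y_i \mid \mu_i, \sigma_{\text{proc}})\,P(\alpha_i \mid a, b)\,P(\chi_i)\; P(\gamma)P(c)P(a)P(b)P(\sigma_{\text{data}})P(\sigma_{\text{proc}})$$

with $\mu_i = \alpha_i(\chi_i - c)/(\alpha_i/\gamma + \chi_i - c)$.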
What is this? N is the number of light readings; n is the number of trees.
JAGS pseudo-code
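A minimal sketch, assuming a latent true light chi[i] per tree and an index vector tree[m] mapping each of the N light readings to its tree (all names illustrative):

model{
  # hyper-priors and priors
  a ~ dgamma(0.001, 0.001)
  b ~ dgamma(0.001, 0.001)
  gamma ~ dgamma(0.001, 0.001)
  c ~ dgamma(0.001, 0.001)
  tau.data ~ dgamma(0.001, 0.001)   # precision of the light readings
  tau.proc ~ dgamma(0.001, 0.001)   # process precision
  for(i in 1:n){
    chi[i] ~ dgamma(0.001, 0.001)   # true (latent) light for tree i
    alpha[i] ~ dgamma(a, b)         # individual maximum growth rate
    mu[i] <- alpha[i]*(chi[i] - c)/(alpha[i]/gamma + chi[i] - c)
    y[i] ~ dnorm(mu[i], tau.proc)   # process variance; y assumed measured perfectly
  }
  # data model for the N light readings
  for(m in 1:N){
    x[m] ~ dnorm(chi[tree[m]], tau.data)
  }
}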
Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
- Light limitation of trees
  - Simple non-hierarchical model
  - Adding individual differences in max growth rate
- LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment
Adding an experimental treatment
We now want to estimate the effect of CO2 enrichment on the effect of light on growth. We have two levels of CO2: ambient (365 ppm) and elevated (565 ppm). We want to examine the effect of treatment on maximum growth rate. We use an indicator variable, u = 0 for ambient (the control) and u = 1 for elevated. We want to estimate a new parameter, k, that represents the additional growth due to CO2.
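One plausible way to embed k (an assumption here: that it enters the maximum growth rate additively):

$$\mu_i = \frac{(\alpha_i + k u_i)(\chi_i - c)}{\frac{\alpha_i + k u_i}{\gamma} + (\chi_i - c)}$$

so that each tree's maximum growth rate is α_i under ambient CO2 and α_i + k under elevated CO2.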
x_i[j] reads: tree i containing light observation j. Write out the Bayesian model using P() notation, now including the treatment effect (data model, process model, parameter model, hyper-parameters).
N is the number of light readings; n is the number of trees.
How to develop a hierarchical model
1. Develop a deterministic model of the ecological process. This may be mechanistic or empirical; it represents your hypothesis about how the world works.
2. Diagram the relationships among data, processes, and parameters.
3. Using P() notation, write out a data model and a process model, using the diagram as a guide to conditioning.
4. For each P() from above, choose a probability density function to represent the unknowns based on your knowledge of how the data arise. Are they counts? 0-1? Proportions? Always positive? Etc.
5. Think carefully about how to subscript the data and the estimates of your process model. Add product symbols to develop total likelihoods over all the subscripts.
6. Think about how the data arise. If you can simulate the data properly, that is all you need to know to do all the steps above.
These steps are not rigidly sequential.