Hierarchical Models.


Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
  - Light limitation of trees: simple non-hierarchical model; adding individual differences in max growth rate
  - LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment

R. A. Fisher's ticks
A simple example: we want to know (for some reason) the average number of ticks on sheep. We round up 60 sheep and count the ticks on each one. Does a Poisson distribution fit the data?

A single mean governs the pattern (DAG: Parameter -> Data)

Exercise: Write out a simple Bayesian model for estimating the average number of ticks per sheep. How would you estimate that mean?

model{
  # prior
  lambda ~ dgamma(0.001, 0.001)
  # likelihood
  for(i in 1:60){
    y[i] ~ dpois(lambda)
  }
}

Each sheep has its own mean, a.k.a. a random effect (DAG: Hyperparameters -> Parameters -> Data)

Hierarchical models have quantities that appear on both sides of the "|". At the top of the hierarchy, the shape and rate must be numbers.

Hierarchical Model

model{
  # Priors
  a ~ dgamma(0.001, 0.001)
  b ~ dgamma(0.001, 0.001)
  for(i in 1:60){
    lambda[i] ~ dgamma(a, b)
    y[i] ~ dpois(lambda[i])
  }
} # end of model

INSERT DAG

Hidden Processes in Ecology
- The things we are able to observe are usually not the things we seek to understand.
- The things we wish to estimate but cannot observe we call latent.
- We assume that the true, latent state gives rise to the observations with some uncertainty: observation error.
- We have a model that predicts the true, latent state imperfectly. All the things that influence the true latent state that are not represented in our model we treat stochastically as process variance.
- All this means that we need a model for the data (with uncertainty) and a model for the process (with uncertainty).

Blueprint for Hierarchical Bayes
- The probabilistic basis for factoring complex models into sensible components via rules of conditioning
- Directed acyclic graphs (DAGs) again
- Conditioning from DAGs
- JAGS and MCMC from conditioning

Conditioning
Remember from basic laws of probability that
P(z_1, z_2) = P(z_1 | z_2) P(z_2).
This generalizes to
P(z_1, ..., z_n) = P(z_1 | z_2, ..., z_n) P(z_2 | z_3, ..., z_n) ... P(z_{n-1} | z_n) P(z_n),
where the components z_i may be scalars or subvectors of z and the sequence of their conditioning is arbitrary.

So what does this mean? Because the sequence of conditioning is arbitrary, many different factorizations are correct. Any quantity that appears on the right-hand side of a "|" and never appears on the left-hand side of a "|" must end up in a marginal P() of its own.

A general, hierarchical model for ecological processes
Define:
- θ: a vector of parameters that can be decomposed into two additional vectors, θ = {θ_p, θ_d}, where θ_d are parameters in a model of the data and θ_p are parameters in the model of the process. Both vectors include parameters representing uncertainty.
- y: a vector or matrix of data, including responses and covariates.
- μ: the unobserved, true state of the system: the size of a population, the number of sites occupied, the mass of C in a gram of soil, etc.

A general, hierarchical model for ecological processes

We will make use of the following three laws from probability theory:
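Presumably these are the familiar rules of conditioning, independence, and total probability:
P(a, b) = P(a | b) P(b)
P(a, b) = P(a) P(b)   if a and b are independent
P(a) = ∫ P(a | b) P(b) db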

A general, hierarchical model for ecological processes

Components of the factored model:
- the data
- parameters in the data model, including observation uncertainty
- predictions of the true state by a process model
- parameters in the process model, including process uncertainty
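Putting these components together, the factored posterior presumably takes the standard hierarchical form (a sketch; the symbols on the original slide may differ):
[μ, θ_p, θ_d | y] ∝ [y | μ, θ_d] [μ | θ_p] [θ_p] [θ_d]
that is, posterior ∝ data model × process model × parameter models.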

This general set-up provides a basis for Gibbs sampling or JAGS code. Gibbs sampling: for each step in the chain, we cycle over each parameter and sample from its posterior distribution based on the probability densities in which it appears, holding the other parameters at their current values at that step in the chain. We sample from the full conditional distribution, remembering that there may be several more steps because θ_d and θ_p are vectors of parameters.
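For example, under the factorization above, each full conditional is proportional to the product of every density in which that quantity appears (a sketch, treating θ_d and θ_p as single blocks):
[θ_d | ·] ∝ [y | μ, θ_d] [θ_d]
[θ_p | ·] ∝ [μ | θ_p] [θ_p]
[μ | ·] ∝ [y | μ, θ_d] [μ | θ_p]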

Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
  - Light limitation of trees: simple non-hierarchical model; adding individual differences in max growth rate
  - LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment

Steps
1. Diagram the unknowns and knowns for a single observation (a DAG).
2. Using P() notation, write out an expression for the posterior distribution, using the arrows in the DAG as a guide.
3. Choose appropriate distributions for the P()'s.
4. Take products over all observations.
5. Implement via MCMC or JAGS.

Developing a simple Bayesian model for light limitation of trees
- μ_i = prediction of growth for tree i
- x_i = light measured for tree i
- α = maximum growth rate at high light
- c = minimum light requirement
- γ = slope of the curve at low light
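With these definitions, the deterministic process model is presumably a saturating curve; one form consistent with the parameter descriptions (a sketch, not necessarily the slide's exact equation) is
μ_i = α (x_i - c) / (α/γ + (x_i - c)),
so that growth approaches the maximum α at high light and the slope of the curve near x_i = c is γ.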

A simple Bayesian model
Write out the numerator of Bayes' law.

A simple Bayesian model
Now write out the full model by choosing appropriate distributions for the P()'s. Assume that the y's can take negative values. There are n observations.
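A sketch of one such specification (assuming normally distributed observations, since the y's can be negative, and vague priors on the positive parameters; the slide's exact choices may differ):
P(α, γ, c, σ² | y) ∝ ∏_{i=1}^{n} normal(y_i | μ_i, σ²) × gamma(α | 0.001, 0.001) × gamma(γ | 0.001, 0.001) × gamma(c | 0.001, 0.001) × inverse-gamma(σ² | 0.001, 0.001),
with μ_i = α(x_i - c)/(α/γ + x_i - c).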

JAGS code
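A minimal JAGS sketch of this model (variable names and prior choices are illustrative, not necessarily those on the original slide):

model{
  # vague priors on the positive parameters
  alpha ~ dgamma(0.001, 0.001)   # maximum growth rate at high light
  gamma ~ dgamma(0.001, 0.001)   # slope of the curve at low light
  c ~ dgamma(0.001, 0.001)       # minimum light requirement
  tau ~ dgamma(0.001, 0.001)     # precision = 1/sigma^2
  # likelihood over the n trees
  for(i in 1:n){
    mu[i] <- alpha * (x[i] - c) / (alpha / gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
}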

Now assume that there is individual variation in α (the maximum growth rate), such that each α_i is drawn from a distribution of α's. Sketch the DAG; write out the posterior and the joint conditional distribution using P() notation. Write out the distribution of the full data set and choose appropriate probability functions.

Adding complexity using a hierarchical model: individual variation in the α's. The hyper-parameters a and b control the distribution of the α's.


JAGS code
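A minimal JAGS sketch of the hierarchical version (again, names and prior choices are illustrative):

model{
  # hyper-priors on the distribution of individual maximum growth rates
  a ~ dgamma(0.001, 0.001)
  b ~ dgamma(0.001, 0.001)
  # priors on the remaining parameters
  gamma ~ dgamma(0.001, 0.001)
  c ~ dgamma(0.001, 0.001)
  tau ~ dgamma(0.001, 0.001)
  for(i in 1:n){
    alpha[i] ~ dgamma(a, b)   # each tree draws its own maximum growth rate
    mu[i] <- alpha[i] * (x[i] - c) / (alpha[i] / gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
}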

Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Light limitation of trees: simple non-hierarchical model; adding individual differences in max growth rate
- LIDET decomposition: sampling errors in the y's
- Errors in covariates
- Adding a treatment

So, what is a "data model"? Let μ_i be the true (latent) value of some quantity of interest and let y_i be an observation of μ_i, so that y_i ~ f_d(μ_i, θ_data), where f_d is a data model. In the simplest data model, θ_data is just the standard deviation of the data model, σ_data. In this case we assume there is no bias, that is, on average y = μ. We may also have something like θ_data = [v, η, σ_data], where v and η are parameters in a deterministic model that corrects for bias in the y's.
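For instance (illustrative forms, not necessarily those on the slide):
y_i ~ normal(μ_i, σ²_data)            (no bias)
y_i ~ normal(g(μ_i, v, η), σ²_data)   (where g is a deterministic model that corrects for bias, with parameters v and η)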

Decomposition
We conduct an experiment where we observe 5 replications of the disappearance of litter at 10 points in time. Replications could be unbalanced. We want to estimate the parameters in a model of decomposition. Our dependent variable is the true mass of litter at time t. Our model is a simple decaying exponential,
y = M_0 e^(-kt).
We are confident that there is no bias in the estimates of the mass of litter, but there are errors in the observations at each time for which we must account. These errors occur because we get different values when we observe replicates from the same sample.

Process model

Process model with true mass: y = M_0 e^(-kt)

Process model with true mass and observed mass
We can think of sampling variance and process variance in this way: if we increase the number of observations (i.e., decrease sampling error), their average would asymptotically approach the true mass. However, no matter how many observations we made, the true mass would remain some distance away (i.e., process error) from our process-model estimates.

Diagram the knowns and unknowns and write out the posterior and the joint conditional distribution for a single observation.

Data model: y_ij is the observed mass remaining.
Process model: μ_i is the true value of mass remaining. It is "latent", i.e., not observable.
Parameter model.

Data model Process model Parameter model

Putting it all together: JAGS pseudo code
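A minimal JAGS sketch along these lines (a sketch under assumptions: normally distributed sampling error around the latent mass, lognormal process error, vague priors; the indexing and distributions on the original slide may differ):

model{
  # priors
  M0 ~ dgamma(0.001, 0.001)      # initial mass
  k  ~ dgamma(0.001, 0.001)      # decay rate
  tau.p ~ dgamma(0.001, 0.001)   # process precision
  tau.d ~ dgamma(0.001, 0.001)   # sampling (data) precision
  for(i in 1:n.times){
    # process model: latent true mass at time t[i], with process error
    mu.hat[i] <- M0 * exp(-k * t[i])
    mu[i] ~ dlnorm(log(mu.hat[i]), tau.p)
    # data model: replicate observations of the latent mass
    for(j in 1:n.reps[i]){
      y[i, j] ~ dnorm(mu[i], tau.d)
    }
  }
}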

Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
  - Light limitation of trees: simple non-hierarchical model; adding individual differences in max growth rate
  - LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment

Accounting for errors in covariates
We now want to add a way to represent uncertainty in the observations of light; in other words, we want to model the data as well as the process. This makes sense: a single measurement of light cannot represent the light that is "seen" by a tree. For each tree, we take 10 different measurements of light at different points in the canopy. For now, we assume that the growth rates (the y's) are measured perfectly. There are n = 50 trees with j = 10 light observations per tree, for a total of n*j = 500 observations. We index the observations within a tree using i[j] to mean tree i containing light measurement j. The total set of light measurements is indexed 1, ..., N = n*j. This notation is useful because the number of measurements per tree can vary.

Same as before, but assuming perfect observations of x and y (data model, process model, parameter model, hyper-parameters).

x_i[j] reads "tree i containing light observation j". Write out the Bayesian model using P() notation (data model, process model, parameter model, hyper-parameters).

Data model Process model Parameter model Hyper-parameters

What is this? N is the number of light readings; n is the number of trees.

JAGS pseudo-code
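A minimal JAGS sketch of the errors-in-covariates model (names such as x.obs, x.true, and the index vector tree[] are illustrative; the original slide's notation may differ):

model{
  # vague priors (illustrative)
  alpha ~ dgamma(0.001, 0.001)   # maximum growth rate
  gamma ~ dgamma(0.001, 0.001)   # slope at low light
  c ~ dgamma(0.001, 0.001)       # minimum light requirement
  tau.p ~ dgamma(0.001, 0.001)   # process precision
  tau.x ~ dgamma(0.001, 0.001)   # precision of the light measurements
  # data model: N light readings, tree[j] gives the tree for reading j
  for(j in 1:N){
    x.obs[j] ~ dnorm(x.true[tree[j]], tau.x)
  }
  # process model: n trees, growth predicted from the true (latent) light
  for(i in 1:n){
    x.true[i] ~ dgamma(0.001, 0.001)   # prior on the latent light for tree i
    mu[i] <- alpha * (x.true[i] - c) / (alpha / gamma + x.true[i] - c)
    y[i] ~ dnorm(mu[i], tau.p)
  }
}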

Roadmap: Models of increasing complexity
- Poisson plots: individual differences via random effects
- Hidden process models
  - Light limitation of trees: simple non-hierarchical model; adding individual differences in max growth rate
  - LIDET decomposition: sampling errors in the y's
  - Errors in covariates
  - Adding a treatment

Adding an experimental treatment
We now want to estimate the effect of CO2 enrichment on the effect of light on growth. We have two levels of CO2: ambient (365 ppm) and elevated (565 ppm). We want to examine the effect of treatment on maximum growth rate. We use an indicator variable u = 0 for ambient (the control) and u = 1 for elevated. We want to estimate a new parameter, k, that represents the additional growth due to CO2.
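One natural way to write this (a sketch; the slide's exact parameterization may differ) is to let each tree's maximum growth rate depend on its treatment indicator u_i:
α_i = α + k u_i,   μ_i = α_i (x_i - c) / (α_i/γ + x_i - c),
so that k is the increase in maximum growth rate under elevated CO2.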

x_i[j] reads "tree i containing light observation j". Write out the Bayesian model using P() notation (data model, process model, parameter model, hyper-parameters).

Data model Process model Parameter model Hyper-parameters

N is the number of light readings; n is the number of trees.

How to develop a hierarchical model
1. Develop a deterministic model of the ecological process. This may be mechanistic or empirical. It represents your hypothesis about how the world works.
2. Diagram the relationships among data, processes, and parameters.
3. Using P() notation, write out a data model and a process model, using the diagram as a guide to conditioning.
4. For each P() above, choose a probability density function to represent the unknowns, based on your knowledge of how the data arise. Are they counts? 0-1? Proportions? Always positive? etc.
5. Think carefully about how to subscript the data and the estimates of your process model. Add product symbols to develop total likelihoods over all the subscripts.
6. Think about how the data arise. If you can simulate the data properly, you know everything you need to carry out the steps above.
These steps are not rigidly sequential.