Bayesian modeling for ordinal substrate size using EPA stream data Megan Dailey Higgs Jennifer Hoeting Brian Bledsoe* Department of Statistics, Colorado.


Bayesian modeling for ordinal substrate size using EPA stream data
A spatial model for ordered categorical data
Megan Dailey Higgs, Jennifer Hoeting, Brian Bledsoe*
Department of Statistics, Colorado State University
*Department of Civil Engineering, Colorado State University

Substrate size in streams
► Influences in-stream physical habitat
► Often indicative of stream health
► EPA collected data at 485 sites in Washington and Oregon between 1994 and 2004

Data Collection Protocol
► At a site:
  ▪ 11 transects x 5 points along each transect
  ▪ Choose particle under the sharp end of a stick
  ▪ Visually estimate and classify size

Creating the response
► For a site:
  ▪ Transform the original size classes to log10(geometric mean) for all sample points
  ▪ Find the median for the site
► Geometric mean
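The two steps above can be sketched in a few lines. This is only an illustration: the class names and bounds in `SIZE_CLASSES_MM` are hypothetical placeholders rather than the EPA protocol's actual size classes, and the sketch assumes the geometric mean of a class is taken over its lower and upper bounds.

```python
import math
from statistics import median

# Hypothetical size classes as (lower, upper) bounds in mm.
# These are illustrative values, not the classes used in the EPA protocol.
SIZE_CLASSES_MM = {
    "sand": (0.06, 2.0),
    "fine_gravel": (2.0, 16.0),
    "coarse_gravel": (16.0, 64.0),
    "cobble": (64.0, 250.0),
}


def log10_geometric_mean(lower, upper):
    """log10 of the geometric mean sqrt(lower * upper) of a class's bounds."""
    return 0.5 * (math.log10(lower) + math.log10(upper))


def site_response(point_classes):
    """Median over a site's sample points of log10(geometric mean)."""
    return median(
        log10_geometric_mean(*SIZE_CLASSES_MM[c]) for c in point_classes
    )


# Three sample points at one site; a real site would have 11 x 5 = 55 points.
y_site = site_response(["sand", "cobble", "fine_gravel"])
```

Because every sample point takes one of a small number of class values, the site median can only land on a limited set of values, which is why the response behaves like an ordered categorical variable.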

The response
► Y_i = median[log10(geometric mean)] for site i
► Transformation provides a more symmetric, continuous-like variable
  ▪ Typically modeled as a continuous variable
  ▪ Predictive models have performed poorly
► Response is an ordered categorical variable
  ▪ 12 categories (6 with very few observations)

Ordered categorical data
► Y_i is a categorical response variable with K ordered values: {1,…,K}
► Modeling objectives:
  ▪ Explain the variation in the ordered response from covariate(s)
  ▪ Incorporate the spatial dependence
  ▪ Estimate, predict, and create maps of Pr(Y_i ≤ k) and Pr(Y_i = k)

Formulating the spatial model
Spatial model for ordered categorical data = Non-spatial model for ordered categorical data + Spatial model for binary and count data
► Non-spatial model for ordered categorical data
  ▪ Albert & Chib (1993, 1997)
► Spatial model for binary and count data
  ▪ Diggle, Tawn, & Moyeed (1998)
  ▪ Gelfand & Ravishanker (1998)
  ▪ Generalized geostatistical models with a latent Gaussian process
  ▪ Metropolis-Hastings within Gibbs sampling approach

Latent variable formulation
► Define latent variable, Z_i, such that Z_i = X_i'β + ε_i
  ▪ ε_i ~ N(0,1) for the probit model
  ▪ ε_i ~ Standard Logistic for the logit model
► Define the categorical response, Y_i ∈ {1,…,K}, using Z_i and ordered cut-points, θ = (θ_1, …, θ_{K-1}), where 0 = θ_1 < θ_2 < … < θ_{K-1} < θ_K = ∞
  Y_i = 1 if Z_i < θ_1
  Y_i = k if θ_{k-1} ≤ Z_i < θ_k, for k = 2, …, K-1
  Y_i = K if Z_i ≥ θ_{K-1}
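A minimal sketch of how the cut-points turn a latent value into a category, for the probit case. The function names and the example values of β and θ are assumptions made for illustration; only the thresholding rule comes from the slide.

```python
import bisect
import random


def categorize(z, cutpoints):
    """Map latent z to a category in 1..K.

    cutpoints holds the finite, increasing values theta_1 < ... < theta_{K-1}.
    Y = 1 if z < theta_1, Y = k if theta_{k-1} <= z < theta_k,
    and Y = K if z >= theta_{K-1}.
    """
    # bisect_right counts the cut-points that are <= z.
    return bisect.bisect_right(cutpoints, z) + 1


def simulate_probit_response(x, beta, cutpoints, rng):
    """Draw Z = x'beta + eps with eps ~ N(0, 1) and return (Z, Y)."""
    z = sum(xj * bj for xj, bj in zip(x, beta)) + rng.gauss(0.0, 1.0)
    return z, categorize(z, cutpoints)


# Illustrative values only: K = 4 categories with theta_1 fixed at 0.
theta = [0.0, 1.0, 2.5]
rng = random.Random(0)
z, y = simulate_probit_response([1.0, 0.3], [0.5, 1.2], theta, rng)
```

Swapping the N(0,1) draw for a standard logistic draw would give the logit version of the same construction.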

Latent variable formulation
► Thus,
  Pr(Y_i ≤ k | θ, β) = Pr(Z_i < θ_k)
  Pr(Y_i = k | θ, β) = Pr(θ_{k-1} ≤ Z_i < θ_k)
  ▪ If Z_i ~ N(X_i'β, 1), then
    Pr(Y_i ≤ k | θ, β) = Φ(θ_k - X_i'β)
    Pr(Y_i = k | θ, β) = Φ(θ_k - X_i'β) - Φ(θ_{k-1} - X_i'β)
    where Φ is the N(0,1) cdf
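As a worked illustration of these formulas, the category probabilities can be computed by differencing the cumulative probabilities. The helper names and the example inputs are assumptions; `eta` stands for the linear predictor X_i'β.

```python
import math


def std_normal_cdf(x):
    """N(0,1) cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(eta, cutpoints):
    """Return [Pr(Y = 1), ..., Pr(Y = K)] for linear predictor eta.

    cutpoints holds the finite values theta_1 < ... < theta_{K-1};
    theta_K = infinity contributes a cumulative probability of 1.
    """
    cumulative = [std_normal_cdf(t - eta) for t in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Illustrative values: K = 3 categories, eta = 0.
probs = category_probs(0.0, [0.0, 1.0])
```

The probabilities sum to one by construction, since the differences telescope to the final cumulative value of 1.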

Spatial cumulative model
► Z_i = X_i'β + W_i + ε_i is the latent variable
  ▪ where ε_i ~ N(0,1)
    W ~ N(0, σ²H(φ)), (H(φ))_ij = ρ(s_i - s_j; φ)
    Z_i | β, W_i ~ N(X_i'β + W_i, 1)
► Pr(Y_i ≤ k | β, θ, W_i) = Pr(Z_i < θ_k) = Φ(θ_k - X_i'β - W_i)
  where θ = (θ_1, …, θ_K) is a vector of cut-points such that 0 = θ_1 < θ_2 < … < θ_{K-1} < θ_K = ∞

Fitting the spatial model
► The likelihood
► Estimating β = (β_0, β_1), the spatial parameters (σ², φ), and θ = (θ_2, …, θ_{K-1})
► Transform θ to real-valued, unrestricted cut-points: α = (α_2, …, α_{K-1}), where
  α_2 = log(θ_2)
  α_k = log(θ_k - θ_{k-1}), for k = 3, …, K-1
► MCMC sampling
  ▪ Metropolis-Hastings within Gibbs sampling
  ▪ Prior:
    ► β: flat and conjugate Normal
    ► σ² and φ: independent uniform priors
    ► α: multivariate normal
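The cut-point transformation can be sketched as a pair of inverse functions. The function names and the name `alpha` for the transformed values are hypothetical; the mapping itself is the log of successive differences described above, with θ_1 fixed at 0.

```python
import math


def to_unrestricted(theta):
    """Map free cut-points [theta_2, ..., theta_{K-1}] to real values.

    The first entry is log(theta_2) and each later entry is
    log(theta_k - theta_{k-1}); theta_1 is fixed at 0.
    """
    alpha, previous = [], 0.0
    for t in theta:
        alpha.append(math.log(t - previous))
        previous = t
    return alpha


def to_cutpoints(alpha):
    """Invert the transform by accumulating exp(alpha_k)."""
    theta, previous = [], 0.0
    for a in alpha:
        previous += math.exp(a)
        theta.append(previous)
    return theta


# Any real-valued vector maps back to positive, increasing cut-points.
theta_from_proposal = to_cutpoints([-1.0, 0.0, 2.0])
```

Working on the unrestricted scale means a sampler can propose any real-valued vector (for example with a multivariate normal proposal) without ever violating the ordering constraint on θ.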

Simulated data
► Simulated data at a subset of the original locations (n = 82)
  ▪ Cluster infill around the 82 sites (n = 120)
  ▪ Spatial process:
    ► W is a stationary Gaussian process with E[W(s)] = 0 and Cov[W(s_i), W(s_j)] = σ²ρ(s_i - s_j; φ)
    ► Exponential correlation function: ρ(d) = exp(-dφ)
  ▪ Covariate:
    ► Distance weighted stream power
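One way to simulate such a spatial process is to build the covariance matrix and multiply its Cholesky factor by independent N(0,1) draws. The sketch below assumes Euclidean distance between 2-D coordinates, an exponential correlation of the form exp(-dφ), and made-up locations and parameter values; none of these specifics are taken from the authors' simulation.

```python
import math
import random


def exp_cov(coords, sigma2, phi):
    """Covariance matrix with entries sigma2 * exp(-phi * d_ij)."""
    return [
        [sigma2 * math.exp(-phi * math.dist(a, b)) for b in coords]
        for a in coords
    ]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive definite matrix a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return lower


def draw_gaussian_process(coords, sigma2, phi, rng):
    """Draw W ~ N(0, sigma2 * H(phi)) at the given distinct locations."""
    lower = cholesky(exp_cov(coords, sigma2, phi))
    e = [rng.gauss(0.0, 1.0) for _ in coords]
    return [
        sum(lower[i][k] * e[k] for k in range(i + 1))
        for i in range(len(coords))
    ]


# Hypothetical site coordinates and parameter values.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
w = draw_gaussian_process(sites, sigma2=2.0, phi=0.5, rng=random.Random(1))
```

Adding X_i'β and an independent N(0,1) error to each W_i, then thresholding at the cut-points, would give simulated ordered responses at the same sites.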

Preliminary Results
► Posterior quantities
  ▪ Based on 1000 iterations (burn-in = 1000)
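Posterior means and SDs of a probability such as Pr(Y_i ≤ k) are typically obtained by evaluating the probability at each retained MCMC draw and then summarizing those values. The sketch below assumes each draw is stored as a pair (linear predictor X_i'β + W_i, list of cut-points); that storage layout and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def std_normal_cdf(x):
    """N(0,1) cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_le_k(k, linear_predictor, cutpoints):
    """Pr(Y <= k) = Phi(theta_k - linear predictor); cutpoints[k-1] is theta_k."""
    return std_normal_cdf(cutpoints[k - 1] - linear_predictor)


def posterior_mean_sd(draws, k, burn_in):
    """Mean and SD of Pr(Y <= k) over the draws kept after burn-in."""
    kept = draws[burn_in:]
    probs = [prob_le_k(k, lp, cuts) for lp, cuts in kept]
    return mean(probs), stdev(probs)


# Toy chain: three burn-in draws followed by two retained draws.
toy_draws = [(0.0, [0.0, 1.0])] * 3 + [(0.0, [0.0, 1.0]), (1.0, [0.0, 1.0])]
post_mean, post_sd = posterior_mean_sd(toy_draws, k=1, burn_in=3)
```

Repeating this calculation site by site gives the values that would be plotted in maps of the posterior mean and SD.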

Posterior mean of the spatial process

Posterior SD of the spatial process

Posterior mean and SD for Pr(Y_i = 2)

Posterior mean and SD for Pr(Y_i = 5)

Posterior mean and SD for Pr(Y_i ≤ 5)

Future Work
► Convergence and mixing for the spatial model
► Models and methods for large data sets
  ▪ Spectral parameterization of the spatial process
    ► Wikle (2002), Paciorek & Ryan (2005), Royle & Wikle (2005)
  ▪ Importance sampling
    ► Gelfand & Ravishanker (1998), Gelfand, Ravishanker, & Ecker (2000)
  ▪ Sub-sampling
► Investigate different spatial correlation functions and distance metrics
  ▪ Traditional
  ▪ Stream based
► Model selection for the spatial model

Funding and Affiliations FUNDING/DISCLAIMER The work reported here was developed under the STAR Research Assistance Agreement CR awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the authors and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this presentation. Megan’s research is also partially supported by the PRIMES National Science Foundation Grant DGE CR

Thank you

Subset of data (n small = 82)

Sample path plot - Example

Surface for estimating the spatial parameters (σ², φ)

Sample path plot – Avoiding plateau