Spatial Prediction of Coho Salmon Counts on Stream Networks

Spatial Prediction of Coho Salmon Counts on Stream Networks. Dan Dalthorp and Lisa Madsen, Oregon State University, September 8, 2005

Sponsors: U.S. EPA STAR grant # CR-829095; U.S. EPA Program for Cooperative Research on Aquatic Indicators at Oregon State University, grant # CR-83168201-0.

Outline
• Introduction: (i) coho salmon data; (ii) GEEs for spatial data
• Latent process model for spatially correlated counts
• Estimation and results
• Cross-validation
• Simulation study
• Conclusions and future research

Coho Salmon Data Adult Coho salmon counts at selected points in Oregon coastal stream networks for 1998 through 2003. Euclidean distance between sampled points. Stream distance between sampled points.

Coastal Stream Networks and Sampling Locations

GEEs for Spatially Correlated Data: Liang and Zeger's (1986) pioneering paper in Biometrika introduced GEEs for longitudinal data. Zeger (1988) developed GEE analysis for a time series of counts using a latent process model. McShane, Albert, and Palmatier (1997) adapted Zeger's model and analysis to spatially correlated count data. Gotway and Stroup (1997) used GEEs to model and predict spatially correlated binary and count data. Lin and Clayton (2005) developed asymptotic theory for GEE estimators of the parameters in a spatially correlated logistic regression model.

The Latent Process Model Suppose: The latent process allows for overdispersion and spatial correlation in the counts.

The Marginal Model These assumptions imply: For now, we assume a simple constant-mean model and a one-parameter exponential correlation function:
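The slide's formulas are not recoverable here, so the following sketch assumes the standard form of Zeger's (1988) latent process model: Y_i | ε_i ~ Poisson(μ ε_i), with E(ε_i) = 1, Var(ε_i) = σ², and Corr(ε_i, ε_j) = exp(-d_ij / θ), where θ is a range parameter (parameterizations of the exponential model vary). Under those assumptions E(Y_i) = μ, Var(Y_i) = μ + σ²μ², and Cov(Y_i, Y_j) = σ²μ² exp(-d_ij / θ). The function names are hypothetical.

```python
import math

def exp_corr(d, theta):
    """One-parameter exponential correlation, rho(d) = exp(-d / theta) (assumed form)."""
    return math.exp(-d / theta)

def marginal_cov(mu, sigma2, theta, dist):
    """Marginal covariance of the counts under the assumed constant-mean latent process model.

    dist is an n x n matrix of distances between sites (Euclidean or stream network).
    """
    n = len(dist)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Poisson variance mu plus the extra-Poisson (overdispersion) term
                cov[i][j] = mu + sigma2 * mu ** 2
            else:
                # Counts are conditionally independent, so covariance comes only from the latent process
                cov[i][j] = sigma2 * mu ** 2 * exp_corr(dist[i][j], theta)
    return cov
```

Comparing Euclidean and stream network covariance models amounts to passing a different `dist` matrix; the mean and variance formulas do not change.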

Estimating the Model Parameters To estimate parameters solve estimating equations: where

Iterative Modified Scoring Algorithm Step 0: Calculate initial estimates Step 1: Update .

Step 2: Update . Step 3: Update . Iterate steps 1, 2, and 3 until convergence.
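The estimation loop can be sketched in Python, but this is not the authors' modified scoring algorithm. The slides' update formulas, and which parameter each step updates, are lost. The sketch therefore assumes the order μ, then σ², then θ, and substitutes simpler updates under an assumed Zeger-style model with Var(Y_i) = μ + σ²μ² and Cov(Y_i, Y_j) = σ²μ² exp(-d_ij / θ):

- μ comes from the generalized least squares solution of the estimating equation 1ᵀV⁻¹(y − μ1) = 0, with V held at the current values.
- σ² comes from the method of moments.
- θ comes from a grid search that matches residual cross-products.

`solve`, `cov_matrix`, and `fit_latent` are hypothetical names.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x

def cov_matrix(mu, sigma2, theta, dist):
    """Assumed marginal covariance: Poisson + overdispersion on the diagonal, exponential decay off it."""
    n = len(dist)
    return [[mu + sigma2 * mu ** 2 if i == j
             else sigma2 * mu ** 2 * math.exp(-dist[i][j] / theta)
             for j in range(n)] for i in range(n)]

def fit_latent(y, dist, thetas, tol=1e-8, max_iter=50):
    """Hypothetical iterative fit of (mu, sigma2, theta); assumes positive mean and n >= 2."""
    n = len(y)
    # Step 0: moment-based initial estimates
    mu = sum(y) / n
    s2 = sum((v - mu) ** 2 for v in y) / (n - 1)
    sigma2 = max((s2 - mu) / mu ** 2, 1e-6)  # clamp keeps sigma2 positive
    theta = thetas[len(thetas) // 2]
    for _ in range(max_iter):
        old = (mu, sigma2, theta)
        # Step 1: GLS mean, mu = 1'V^{-1}y / 1'V^{-1}1 with V at the current estimates
        w = solve(cov_matrix(mu, sigma2, theta, dist), [1.0] * n)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # Step 2: moment equation sum[(y_i - mu)^2 - mu] = n * sigma2 * mu^2
        sigma2 = max(sum((yi - mu) ** 2 - mu for yi in y) / (n * mu ** 2), 1e-6)
        # Step 3: theta on the grid that best matches residual cross-products
        pairs = [((y[i] - mu) * (y[j] - mu), dist[i][j])
                 for i in range(n) for j in range(i + 1, n)]
        theta = min(thetas, key=lambda t: sum(
            (p - sigma2 * mu ** 2 * math.exp(-d / t)) ** 2 for p, d in pairs))
        if max(abs(a - b) for a, b in zip(old, (mu, sigma2, theta))) < tol:
            break
    return mu, sigma2, theta
```

Holding V fixed while updating μ, then refreshing V, mirrors the "iterate until convergence" structure of the slides. Each step's formula, however, is a stand-in.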

Assessing Model Fit – Estimating the Mean
Year   Sample mean   Estimate (Euclidean distance)   Estimate (stream distance)
1998   6.2451        6.0                             6.4941
1999   9.0025        8.7286                          9.0765
2000   11.92         10.898                          11.481
2001   31.359        31.597                          34.541
2002   46.494        46.782                          46.725
2003   44.453        41.005                          41.829

Assessing Model Fit – Estimating the Variance
Year   Sample variance   Estimate (Euclidean distance)   Estimate (stream distance)
1998   222.07            221.59
1999   443.65            442.61                          442.54
2000   384.59            384.75                          383.90
2001   2508.6            2502.3                          2512.3
2002   9286.6            9265.4
2003   3650.2            3653.4                          3648.4
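The mean and variance tables can be combined into a rough moment estimate of the overdispersion parameter. This assumes the latent-process form Var(Y) = μ + σ²μ² (a Zeger-style assumption, not a formula shown on the slide), which gives σ² = (variance − mean) / mean². The sample column of the variance slide is treated as variances, as the slide title indicates. The names are hypothetical.

```python
# Sample means and sample variances transcribed from the mean and variance tables
sample_mean = {1998: 6.2451, 1999: 9.0025, 2000: 11.92,
               2001: 31.359, 2002: 46.494, 2003: 44.453}
sample_var = {1998: 222.07, 1999: 443.65, 2000: 384.59,
              2001: 2508.6, 2002: 9286.6, 2003: 3650.2}

def implied_sigma2(mean, var):
    """Moment estimate of sigma^2, assuming var = mean + sigma^2 * mean^2."""
    return (var - mean) / mean ** 2

sigma2_by_year = {yr: implied_sigma2(sample_mean[yr], sample_var[yr]) for yr in sample_mean}
for yr, s2 in sorted(sigma2_by_year.items()):
    print(yr, round(s2, 2))
```

In 1998 the variance exceeds the mean by a factor of about 36 (222.07 / 6.2451), far beyond the Poisson variance-to-mean ratio of 1. That gap is the overdispersion the latent process is meant to absorb.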

Assessing Model Fit – Estimating the Range (Euclidean Distance)

Assessing Model Fit – Estimating the Range (Stream Distance)

Cross-validation to compare predictions based on three different assumptions about the underlying spatial process:
1. Null model (spatial independence):
2. Spatial correlation as a function of Euclidean distance (ed):
3. Spatial correlation as a function of stream network distance (id)
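Leave-one-out cross-validation for the three covariance assumptions can be sketched as follows. Several simplifications are assumptions:

- The predictor is a simple-kriging-style best linear predictor with known mean, ŷ_0 = μ + c_0ᵀV⁻¹(y − μ1). It is not necessarily the exact Gotway and Stroup (1997) form.
- The parameters are held fixed rather than re-estimated for each left-out site.
- The covariance is the assumed latent-process form σ²μ² exp(-d / θ).

Passing a Euclidean or a stream distance matrix as `dist` gives the ed or id model, and `independent=True` gives the null model. `solve`, `cov`, and `loo_errors` are hypothetical names.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x

def cov(mu, sigma2, theta, d):
    """Assumed covariance between counts at distance d > 0."""
    return sigma2 * mu ** 2 * math.exp(-d / theta)

def loo_errors(y, dist, mu, sigma2, theta, independent=False):
    """Leave-one-out prediction errors y_k - yhat_k, with parameters held fixed."""
    n = len(y)
    errors = []
    for k in range(n):
        others = [i for i in range(n) if i != k]
        if independent:
            # Null model: no spatial correlation, so the predictor is just the mean
            pred = mu
        else:
            v = [[mu + sigma2 * mu ** 2 if i == j else cov(mu, sigma2, theta, dist[i][j])
                  for j in others] for i in others]
            c0 = [cov(mu, sigma2, theta, dist[k][i]) for i in others]
            lam = solve(v, c0)  # kriging weights lam = V^{-1} c0
            pred = mu + sum(l * (y[i] - mu) for l, i in zip(lam, others))
        errors.append(y[k] - pred)
    return errors
```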

1. Bias? Not an issue...
Covariance model:
Year   Euclidean   Stream distance
1998   -0.001      -0.047
1999   0.007       -0.037
2000   0.013       0.011
2001   -0.005      -0.005
2002   -0.008      -0.007
2003   -0.002      0.020
2. Precision?
Covariance model:
Year   Null    Euclidean   Stream distance
1998   14.72   13.25       14.00
1999   20.58   19.75       21.17
2000   20.05   19.83       19.74
2001   48.69   34.38       37.75
2002   98.53   97.04       97.35
2003   60.49   60.92       58.61
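The bias and precision summaries can be computed from a vector of cross-validation errors. The slide does not define its precision measure. This sketch uses the average error for bias and the root mean squared prediction error for precision, which is one common choice (an assumption). The function names are hypothetical.

```python
import math

def bias(errors):
    """Average prediction error; values near zero indicate roughly unbiased predictions."""
    return sum(errors) / len(errors)

def rmspe(errors):
    """Root mean squared prediction error; smaller means more precise predictions."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

For example, errors of 1, -1, 2, -2 have bias 0 and root MSPE √2.5 ≈ 1.58.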

Variances of predicted values
Year   Null   Euclidean   Stream
1998   0.04   10.32       4.95
1999   0.05   12.07       7.68
2000   0.04   11.46       6.40
2001   0.13   38.08       33.36
2002   0.22   15.74       10.65
2003   0.14   24.34       25.99
Odds(|E_ed| < |E_id|)
Year   Odds
1998   256:152
1999   267:132
2000   266:171
2001   197:198
2002   266:171
2003   222:197
Total  1474:1021
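The odds compare, site by site, the absolute cross-validation errors from the Euclidean-distance (ed) and stream-distance (id) models. A minimal sketch of the tally follows. It assumes the second count is the number of sites where the Euclidean error is not smaller, ties included, following the usual definition of odds. The per-year counts copied from the table sum to the Total row. `tally` and `odds_by_year` are hypothetical names.

```python
# Per-year counts from the odds table: (sites with |E_ed| < |E_id|, all other sites)
odds_by_year = {1998: (256, 152), 1999: (267, 132), 2000: (266, 171),
                2001: (197, 198), 2002: (266, 171), 2003: (222, 197)}

def tally(e_ed, e_id):
    """Return (count where |E_ed| < |E_id|, count where it is not) for paired errors."""
    wins = sum(abs(a) < abs(b) for a, b in zip(e_ed, e_id))
    losses = sum(abs(a) >= abs(b) for a, b in zip(e_ed, e_id))
    return wins, losses

total = tuple(map(sum, zip(*odds_by_year.values())))
print(total)  # (1474, 1021), matching the Total row
```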

Simulations: For each year, 8 scenarios that mimic the sample means, variances, and ranges from the data were simulated.
Mean and variance constant:
1. Euclidean spatial correlation
2. Stream network spatial correlation
Mean varies randomly by stream network; variance = 3.66 μ^1.741:
3. Euclidean spatial correlation; long range
4. Euclidean spatial correlation; medium range
5. Euclidean spatial correlation; short range
6. Stream network spatial correlation; long range
7. Stream network spatial correlation; medium range
8. Stream network spatial correlation; short range

Simulation procedure
1. Simulate a vector Z of correlated lognormal-Poisson counts covering all sampling sites (n ≈ 400).
2. Estimate the parameters (μ, σ², range) via latent process regression from the simulated data at a subset of the sampling sites (blue).
3. Predict Z at the remaining sites (red, m ≈ 400) using the predictor of Gotway and Stroup (1997).
4. Repeat 100 times for each scenario (8) and year (6).
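Step 1 can be sketched in Python. The slides say only "correlated lognormal-Poissons", so the construction here is an assumption:

- A Gaussian field z with covariance τ²exp(-d / θ) is drawn by Cholesky factorization.
- The latent multiplier is ε = exp(z − τ²/2) with τ² = log(1 + σ²), so that E(ε) = 1 and Var(ε) = σ².
- The counts are Poisson(μ ε).

With this construction the correlation of ε is (exp(τ²ρ) − 1) / σ², close to but not exactly ρ = exp(-d / θ). All names are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def poisson(lam, rng):
    """Poisson draw by counting unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_counts(mu, sigma2, theta, dist, rng):
    """One draw of spatially correlated lognormal-Poisson counts at all sites."""
    n = len(dist)
    tau2 = math.log(1.0 + sigma2)  # chosen so that Var(eps) = sigma2
    low = cholesky([[tau2 * math.exp(-dist[i][j] / theta) for j in range(n)] for i in range(n)])
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [sum(low[i][k] * g[k] for k in range(i + 1)) for i in range(n)]
    # eps = exp(z - tau2 / 2) has mean 1; counts are Poisson with mean mu * eps
    return [poisson(mu * math.exp(zi - tau2 / 2.0), rng) for zi in z]
```

A draw covering all sites can then be split into an estimation subset (blue) and a prediction subset (red), as in steps 2 and 3.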

Use Euclidean distance or stream distance in covariance model? Evaluation of predictions via two measures: where:

Summary of Findings
Cross-validations:
1. MSPEs were about the same for Euclidean distance and stream network distance;
2. Absolute errors were usually smaller with Euclidean distance (odds 1474:1021 over 1998 through 2003);
3. Population spikes were more likely to be detected with Euclidean distance.
Simulations:
1. Euclidean spatial process: the Euclidean covariance model gives smaller MSPE than the stream network distance covariance model;
2. Stream network process: Euclidean covariance model MSPEs are comparable to those of the stream distance model EXCEPT when network means varied and the range of correlation was large.

Future work
-- Incorporate covariates (with some misaligned data);
-- Incorporate downstream distances/flow ratios;
-- Spatio-temporal modeling;
-- Rank correlations in place of covariances;
-- Model selection;
-- Non-random data.