Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design
Mark Delorey, F. Jay Breidt, Colorado State University

Abstract
In aquatic resources, a two-stage sampling design can be employed to make the best use of what are often limited time and financial resources. Even when resources can be focused in this way, sample sizes are often not large enough to support model-free inferences. The availability of auxiliary information for the regions of interest suggests using a model in our inferences. Breidt, Claeskens, and Opsomer (2003) propose incorporating this auxiliary information through a class of model-assisted estimators based on penalized spline regression in single-stage sampling. Zheng and Little (2003) also use penalized spline regression, in a model-based approach to finite population estimation in a two-stage sample. In a survey context, weights computed from a set of auxiliary information are often applied to many study variables. With this approach, model-assisted estimators should fare better than model-based estimators. We compare the two through a series of simulations.

Funding/Disclaimer
This research is funded by U.S. EPA – Science To Achieve Results (STAR) Program Cooperative Agreements # CR – and # CR –. The work reported here was developed under the STAR Research Assistance Agreement CR and CR awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and of STARMAP, the Program he represents. EPA does not endorse any products or commercial services mentioned in this poster.
Case A: Cluster-Level Auxiliaries (our focus)
– Auxiliary information is available for all clusters in the population.
– This leads to regression modeling of quantities associated with the clusters, such as cluster totals.
– Cluster-level quantities can be predicted for all clusters, and population quantities can be computed from the cluster estimates.
– Example: a lake is a cluster; the auxiliary information is elevation.

Case B: Complete Element-Level Auxiliaries
– Auxiliary information is available for all elements in the population.
– This leads to regression modeling of quantities associated with the elements.
– Cluster and population quantities can then be computed from element-level estimates and observations.
– Example: an EMAP hexagon is a cluster and a lake is an element; the auxiliary information is elevation.

Case C: Limited Element-Level Auxiliaries
– Auxiliary information is available for all elements, but only in the selected clusters.
– This leads to regression modeling of quantities associated with the elements.
– Regression estimators of cluster-level quantities are available only for the clusters selected in the first-stage sample.
– Example: aerial photography of selected sites (clusters) gives, for each point (element) in a site, the percent of land that is forested, urban, or industrial.

Case D: Limited Cluster-Level Auxiliaries
– Auxiliary information is available only for the clusters in the first-stage sample.
– This is not a very interesting case: a design-based estimator can be used for population quantities, although in some cases good estimators for population quantities are not available.
– Example: the cluster is a lake and the auxiliary information is a measure of size that is not available until the site is visited.

Generating Responses
– The population has 500 PSUs; the number of SSUs per cluster is distributed as Uniform(50, 400).
– The vector of PSU means is generated as $\boldsymbol{\mu} = m(\boldsymbol{\pi}_I) + \boldsymbol{\eta}$, where $m(\cdot)$ is one of eight mean functions and $\boldsymbol{\eta} \sim N(\mathbf{0}, \sigma_\eta^2 \mathbf{I})$.
– First-order inclusion probabilities are proportional to size (pps); in practice, auxiliary data are often proportional to the size of the cluster.
The response of interest is $y_{ij} = \mu_i + \varepsilon_{ij}$,
where y ij is the jth element in the ith cluster and  ij ~iid N(0,  2 ) Two-Stage Sampling The population of elements U = {1,…, k,…, N} is partitioned into clusters or primary sampling units (PSUs), U 1,…, U i,…,. So, where N i is the number of elements or secondary sampling units (SSUs) in U i. First stage: A sample of clusters, s I, is selected based on a design, p I (  ) with inclusion probabilities  Ii and  Iij. –  Ii and  Iij are the first and second order inclusion probabilities, respectively Second stage: For every i  s I, a sample s i is drawn from U i based on the design p i (  | s I ) Typically require second stage design to be invariant and independent of the first stage Two-Stage Sampling with Aquatic Resources Time and expense constraints may make two-stage sampling more efficient Auxiliary information may be available on different scales The Estimators (for population totals) Horvitz-Thompson (HT) where Model-assisted where is the PSU total predicted by the model Model-based where is the ith cluster mean predicted by the model Comments on Simulation Results 500 samples from each of the populations were drawn H-T = Horvitz-Thompson estimator M-A: lin = Model-assisted estimator using a linear model M-B: pmmra = Model-based estimator using a penalized spline and including a random effect for PSU M-A: pmm = Model-assisted estimator using a penalized spline with no random effect for PSU Point represents MSE Estimator :MSE Model-assisted estimator with radom effect for PSU Vertical black bars represent approximate 95% confidence intervals Model-assisted estimator with random effect for PSU is as efficient or more efficient than model-based estimator; we do not appear to lose efficiency (with respect to MSE) by using model-assisted non-parametric methods Notes on the Models and Model Parameters 3 different models used – Linear – Penalized spline with random effect for PSU – Penalized spline with no random effect for PSU In a survey context, such as 
those found in environmental monitoring, it is often desirable to obtain a single set of survey weights that can be used to estimate totals for any study variable. To accommodate this:
– The smoothing parameter for the spline is selected by fixing the degrees of freedom of the smooth rather than by a data-driven approach.
– The variance component for the PSU effect is estimated under the linear model, and the resulting covariance matrix and corresponding survey weights are applied to samples from the other data sets.
– In this kind of survey context, model-assisted estimators have good efficiency properties and should be superior to model-based estimators, which rely on correct specification of the variance components.
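As a rough illustration of why fixing the degrees of freedom yields reusable weights, the following Python/NumPy sketch builds a small population loosely patterned on the "Generating Responses" settings, draws a two-stage sample, and computes the HT and model-assisted estimators of a population total. Many details are assumptions, not taken from the poster: Poisson pps sampling at the first stage, simple random sampling without replacement of 10 SSUs per selected PSU, the mean function `m`, the noise scales, a truncated linear spline basis with 10 quantile knots, a design-weighted penalized least-squares fit of estimated cluster means, and the hypothetical helper names (`cluster_ht`, `smoother`, `ma_direct`, `ma_weights`). Because the smoothing parameter is fixed before any response is seen, $\hat{\boldsymbol\beta} = \mathbf{A}\mathbf{z}$ with $z_i = \hat t_{i\pi}/N_i$, and with $\hat m_i = N_i \mathbf{x}_i^\top \hat{\boldsymbol\beta}$ the model-assisted estimator becomes $\sum_{i \in s_I} w_i \hat t_{i\pi}$ with $w_i = 1/\pi_{Ii} + [\mathbf{A}^\top(\mathbf{c}_U - \mathbf{c}_s)]_i / N_i$, where $\mathbf{c}_U = \sum_{i \in U_I} N_i \mathbf{x}_i$ and $\mathbf{c}_s = \sum_{i \in s_I} N_i \mathbf{x}_i / \pi_{Ii}$. These weights do not involve $y$.

```python
import numpy as np

rng = np.random.default_rng(2005)

# --- Hypothetical population loosely following "Generating Responses" ---
N_I = 500                                     # number of PSUs (clusters)
N_i = rng.integers(50, 401, size=N_I)         # SSUs per cluster, uniform on 50..400
n_I, n_i = 50, 10                             # expected PSU sample size, SSUs per PSU (assumed)
pi_I = n_I * N_i / N_i.sum()                  # pps first-order inclusion probabilities
x = (pi_I - pi_I.min()) / (pi_I.max() - pi_I.min())  # auxiliary, rescaled to [0, 1]

def m(u):
    # Hypothetical mean function; not one of the poster's eight functions
    return 1.0 + 2.0 * np.sin(2.0 * np.pi * u)

mu = m(x) + rng.normal(0.0, 0.5, size=N_I)    # PSU means: mu_i = m(x_i) + eta_i
y = [mu[i] + rng.normal(0.0, 1.0, size=N_i[i]) for i in range(N_I)]  # y_ij = mu_i + eps_ij
y2 = [np.exp(0.3 * v) for v in y]             # a second, hypothetical study variable

# --- Two-stage sample: Poisson pps at stage 1, SRSWOR of n_i SSUs at stage 2 ---
s_I = np.flatnonzero(rng.random(N_I) < pi_I)
idx = {i: rng.choice(N_i[i], size=n_i, replace=False) for i in s_I}

def cluster_ht(pop):
    """HT estimates of cluster totals: sum of y / pi_{j|i}, with pi_{j|i} = n_i / N_i."""
    return np.array([N_i[i] * pop[i][idx[i]].mean() for i in s_I])

# --- Penalized spline (truncated linear basis) with fixed degrees of freedom ---
knots = np.quantile(x, np.linspace(0.0, 1.0, 12)[1:-1])   # 10 interior knots (assumed)

def basis(u):
    return np.column_stack([np.ones_like(u), u, np.maximum(u[:, None] - knots, 0.0)])

def smoother(df_target=4.0):
    """Matrix A with beta_hat = A @ z; lambda is chosen so that trace(X_s A) = df_target."""
    Xs, w_s = basis(x[s_I]), 1.0 / pi_I[s_I]
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))          # penalize knot coefficients only
    def fit(lam):
        return np.linalg.solve(Xs.T @ (w_s[:, None] * Xs) + lam * D, Xs.T * w_s)
    lo, hi = -6.0, 6.0                                     # bisection on log10(lambda)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if np.trace(Xs @ fit(10.0 ** mid)) > df_target:   # df decreases as lambda grows
            lo = mid
        else:
            hi = mid
    return fit(10.0 ** lo)

def ma_direct(t_hat, A):
    """Model-assisted estimator: sum_U m_i + sum_s (t_i - m_i) / pi_Ii, m_i = N_i * mu_hat_i."""
    m_hat = N_i * (basis(x) @ (A @ (t_hat / N_i[s_I])))
    return m_hat.sum() + ((t_hat - m_hat[s_I]) / pi_I[s_I]).sum()

def ma_weights(A):
    """Survey weights w_i with t_MA = sum_s w_i * t_i; they do not depend on y."""
    Xs, Xu = basis(x[s_I]), basis(x)
    c_U = (N_i[:, None] * Xu).sum(axis=0)
    c_s = (N_i[s_I][:, None] * Xs / pi_I[s_I][:, None]).sum(axis=0)
    return 1.0 / pi_I[s_I] + (A.T @ (c_U - c_s)) / N_i[s_I]

A = smoother(4.0)
w = ma_weights(A)          # computed once, then applied to every study variable
for name, pop in [("y", y), ("y2", y2)]:
    t_hat = cluster_ht(pop)
    print(name,
          "true:", sum(v.sum() for v in pop),
          "HT:", (t_hat / pi_I[s_I]).sum(),
          "MA:", w @ t_hat)
```

The sketch omits the random effect for PSU and fits estimated cluster means on the auxiliary variable, which is one plausible reading of Case A rather than the exact fit used in the poster's simulations. Because the intercept and slope are unpenalized, the weights also reproduce the known total of cluster sizes, $\sum_{i \in s_I} w_i N_i = \sum_{i \in U_I} N_i$, up to floating-point error.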