Variance estimation for systematic designs in spatial surveys
Rachel Fewster, Department of Statistics, University of Auckland



Line transect sampling: a method of estimating the density of objects in a survey region.

Line transect sampling: density, D

$$D = \frac{\text{\# detections per unit area}}{p}$$

where p is the probability of detecting an object in the searched area.

Estimate the variance of the ratio by the Delta method: “squared CVs add”.

$$D = \frac{\text{\# detections per unit area}}{p}$$

Numerator: ENCOUNTER RATE. Denominator, p: easy.
ENCOUNTER RATE VARIANCE: largest and most difficult component. Usually >70% of total variance.
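As a rough illustration of “squared CVs add”, the sketch below combines a CV for the encounter rate with a CV for p, assuming the two components are independent. The function name and the numbers are hypothetical and are not taken from the talk.

```python
import math

def cv_of_density(cv_encounter_rate, cv_p):
    """Delta-method CV of D = (encounter-rate term) / p.

    Assumes the two components are independent, so that
    cv(D)^2 = cv(encounter rate)^2 + cv(p)^2.
    """
    return math.sqrt(cv_encounter_rate ** 2 + cv_p ** 2)

# Hypothetical inputs, chosen only for illustration.
cv_er = 0.20   # CV of the encounter rate
cv_p = 0.10    # CV of the detection probability

cv_d = cv_of_density(cv_er, cv_p)
# Share of the total squared CV that comes from the encounter rate.
share_er = cv_er ** 2 / cv_d ** 2
print(f"cv(D) = {cv_d:.3f}, encounter-rate share = {share_er:.0%}")
```

With these made-up inputs the encounter rate contributes most of the squared CV, which is the situation the slide describes.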

Encounter rate and its variance
The encounter rate, n/L, estimates the mean number of detections per unit line length.

Inferential framework: which Var(n/L)?
Variance is defined over conceptual survey repeats.
First survey: animals from spatial p.d.f.; select lines; detect animals; find n/L.
Gained value of n/L from first survey.

Inferential framework: which Var(n/L)?
Second survey: same animals, new positions; select new lines; detect new animals; find new n/L.
Gained value of n/L from second survey.
Overall, this gives Var(n/L) across the repeated surveys. This is our ENCOUNTER RATE VARIANCE.

How to estimate Var(n/L)?
To estimate a variance, use repeated observations with the same variance.
Random-line estimator: makes no assumptions about the unknown distribution of objects; random variables are IID with respect to the design.
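A minimal sketch of one commonly used random-line estimator of Var(n/L), which treats the k lines as independent replicates and weights each line by its length. The slide does not give a formula, so the specific form below and the function name are assumptions for illustration.

```python
import numpy as np

def random_line_var(n, l):
    """Estimate Var(n/L) treating the lines as independent replicates.

    n : detections on each line
    l : length of each line
    Uses (k / (L^2 (k - 1))) * sum_i l_i^2 (n_i / l_i - n/L)^2,
    one commonly used form; assumed here, not taken from the slides.
    """
    n = np.asarray(n, dtype=float)
    l = np.asarray(l, dtype=float)
    k = n.size
    if k < 2:
        raise ValueError("need at least two lines")
    L = l.sum()
    overall_rate = n.sum() / L
    return k / (L ** 2 * (k - 1)) * np.sum(l ** 2 * (n / l - overall_rate) ** 2)

# Hypothetical data: three lines of equal length.
print(random_line_var([2, 4, 6], [1.0, 1.0, 1.0]))
```

For equal line lengths this reduces to the sample variance of the per-line counts divided by the number of lines.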

Systematic Survey Designs
Surveys usually use SYSTEMATIC transect lines, instead of random lines. The grid has a random start-point.

Systematic lines give LOWER VARIANCE than random lines in trended populations. But the variance is HARD TO ESTIMATE. A systematic sample has NO REPETITION: it is a sample of size 1!

Variance for systematic designs
There is no general design-unbiased variance estimator for data from a single systematic sample.
Approaches to systematic variance estimation are:
1. Ignore the problem and use estimators for random lines (approach used to date)
2. Use some form of post-stratification
3. Model the autocorrelation in the systematic sample

Approach 2, post-stratification, is the approach in Fewster et al., Biometrics, 2009.
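To make the post-stratification idea concrete, here is a hedged sketch: consecutive systematic lines are grouped into small strata, a random-line estimator is applied within each stratum, and the stratum variances are combined with weights (L_h / L)^2. The grouping rule, the function names, and the within-stratum formula are assumptions for illustration and are not claimed to be the estimators of Fewster et al. (2009).

```python
import numpy as np

def random_line_var(n, l):
    """Length-weighted random-line estimate of Var(n/L) (assumed form)."""
    n = np.asarray(n, dtype=float)
    l = np.asarray(l, dtype=float)
    k, L = n.size, l.sum()
    rate = n.sum() / L
    return k / (L ** 2 * (k - 1)) * np.sum(l ** 2 * (n / l - rate) ** 2)

def poststratified_var(n, l, size=2):
    """Group consecutive lines into strata of `size` lines and combine.

    Any leftover lines are absorbed into the last stratum.
    Var(n/L) is estimated as sum_h (L_h / L)^2 * Var_h(n_h / L_h).
    """
    n = np.asarray(n, dtype=float)
    l = np.asarray(l, dtype=float)
    k = n.size
    if k < 2:
        raise ValueError("need at least two lines")
    starts = list(range(0, k, size))
    if k % size and len(starts) > 1:
        starts.pop()                      # merge the short final group
    bounds = starts + [k]
    L = l.sum()
    total = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        L_h = l[a:b].sum()
        total += (L_h / L) ** 2 * random_line_var(n[a:b], l[a:b])
    return total

# Hypothetical trended counts on six equal-length systematic lines.
counts = [1, 2, 3, 4, 5, 6]
lengths = [1.0] * 6
print(random_line_var(counts, lengths), poststratified_var(counts, lengths))
```

On this made-up trended example the post-stratified estimate is much smaller than the random-line estimate, because neighbouring lines are compared with each other rather than with the overall mean.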

Stratified variance estimators: results
But the stratified estimators are still biased sometimes, e.g. with a high sampling fraction or population clustering.
Can we do better…?


Historical Note
Many estimators for systematic designs originated in social statistics (discrete surveys). Correlation will clearly exist in responses of neighbours, but modelling the correlation is hard!

But space is continuous!
As a strip changes position very slightly it still covers many of the same objects.
Idea:
1. Divide the region into hundreds of tiny ‘striplets’
2. Allow the number of objects available in each striplet to be random variables X_1, X_2, …, X_J
3. The number of objects available in any full strip is the sum of the objects in the constituent striplets

[Figure: number of objects available in each striplet, against striplet position. The expected number of objects per striplet is shown together with the random number of objects per striplet, X_1, X_2, …, X_J ~ Multinomial.]
The number of objects available in any full strip is the sum of the objects in the constituent striplets.
Full strip at this position: 10 objects. Full strip at next position: 7 objects. Full strip at next position: 8 objects ... etc.
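The striplet idea can be sketched in a few lines: draw the striplet counts once, then obtain the number of objects available to a full strip at every possible position as a moving sum. The number of striplets, the strip width, the total N, and the linear trend below are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

J = 200        # hypothetical number of striplets across the region
width = 10     # hypothetical number of striplets in one full strip
N = 500        # hypothetical total number of objects in the region

# Hypothetical trend: expected objects per striplet rises linearly,
# rescaled so that the expectations sum to N.
mu = np.linspace(1.0, 4.0, J)
mu *= N / mu.sum()

# Random number of objects in each striplet.
X = rng.multinomial(N, mu / N)

# Objects available to a full strip starting at striplet s is
# X[s] + ... + X[s + width - 1]; a cumulative sum gives all positions at once.
csum = np.concatenate(([0], np.cumsum(X)))
strip_totals = csum[width:] - csum[:-width]

print(strip_totals[:3])   # totals for the first three strip positions
```

Neighbouring strip positions share all but two striplets, which is why their totals are strongly correlated.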

Recap: We want the variance in the encounter rate, n/L, over:
1. Moving grid
2. Moving objects
3. Detections
Account for:
1. Large-scale trends
2. Small-scale noise

1. Trends in object density across the region
[Figure: observed number of detections per unit search area against x-coordinate; points correspond to observed transects.]
Fit a GAM to give a fitted object density for any search strip in the region.
For any striplet j, we now have an expected number of objects available, μ_j.

Account for: 1. Large-scale trends
[Figure: expected number of objects per striplet, μ_j, against striplet position.]

Account for: 2. Small-scale noise
Random number of objects per striplet: X_1, X_2, …, X_J ~ Multinomial(N, μ_j/N)
[Figure: number of objects available in each striplet, against striplet position.]
The striplet idea means we correctly model the autocorrelation between systematic grids.

Recap: We want the variance in the encounter rate, n/L, over:
1. Moving grid
2. Moving objects
3. Detections
Variance in number of objects available is taken care of (1 & 2).
Variance in detections is Binomial given #objects available (1 & 2).

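Law of Total Variance:

$$\operatorname{Var}(n) = E_b\left[\operatorname{Var}(n \mid b)\right] + \operatorname{Var}_b\left[E(n \mid b)\right]$$

where b is the grid placement. The mean and variance of the number of detections, n, given the grid placement, are all that's needed.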
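The decomposition can be illustrated numerically. The sketch below is not the striplet estimator from the talk; it is a simplified stand-in that assumes a multinomial model for striplet counts, a constant detection probability g0 inside searched strips, equally likely grid placements, and strips that wrap around the region edge. Under those assumptions n given b is Binomial(N, π_b) with π_b = Σ_j μ_j g_j(b) / N. All names and settings are hypothetical.

```python
import numpy as np

def var_over_grid_placements(mu, spacing, width, g0=1.0):
    """Var(n) via the law of total variance over grid placements b.

    mu      : expected number of objects in each striplet (length J)
    spacing : striplets between successive strip starts; also the number
              of possible placements B in this discrete approximation
    width   : striplets in one full strip
    g0      : assumed constant detection probability inside a strip
    Returns (total, within, between).
    """
    mu = np.asarray(mu, dtype=float)
    N = mu.sum()
    j = np.arange(mu.size)
    cond_mean = np.empty(spacing)
    cond_var = np.empty(spacing)
    for b in range(spacing):
        # Striplets searched under placement b (strips wrap at the edge).
        covered = ((j - b) % spacing) < width
        g = np.where(covered, g0, 0.0)
        pi_b = (mu * g).sum() / N          # P(a given object is detected | b)
        cond_mean[b] = N * pi_b            # E(n | b)
        cond_var[b] = N * pi_b * (1 - pi_b)  # Var(n | b), binomial
    within = cond_var.mean()               # E_b[Var(n | b)]
    between = cond_mean.var()              # Var_b[E(n | b)], placements equally likely
    return within + between, within, between

# Hypothetical comparison: flat versus trended expected counts.
flat = var_over_grid_placements(np.full(100, 2.0), spacing=20, width=2)
trend = var_over_grid_placements(np.linspace(1.0, 3.0, 100), spacing=20, width=2)
print(flat, trend)
```

With flat expected counts every placement covers the same expected number of objects, so the between-placement term vanishes; with a trend it does not.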

Striplet variance estimator:

Simulation Results: two scenarios, (a) 3 habitat types but no clustering; (b) clustering included.

Simulation Results: Red lines give correct answers

Simulation Results: Ignoring the systematic design: appalling performance!

Simulation Results: Post-stratification: improvement but still clear bias

Simulation Results: Striplet method: huge improvement!

Spotted Hyena in the Serengeti

Short grass plains: prey herds congregate in wet season. Long grass plains: unattractive in wet season.

Spotted Hyena in the Serengeti
Wet season: non-territorial ‘commuters’ (n=186)
Dry season: territorial residents (n=53)

Wet season: highly clustered.
cv(n/L) is:
- ignoring systematic design: 17%
- using poststratification: 14%
- using striplets: 7%!
Overall cv(D) is:
- ignoring systematic design: 20%
- using poststratification: 17%
- using striplets: 11%
The estimator matters!

Dry season: not clustered; small n.
cv(n/L) is:
- ignoring systematic design: 15%
- using poststratification: 12%
- using striplets: 13%
Overall cv(D) is:
- ignoring systematic design: 23%
- using poststratification: 20%
- using striplets: 21%
Not much difference.

In Revision, Biometrics

Conclusions
1. For a systematic design, variance estimators based on random lines are not adequate for trended or clustered populations: the variance can be highly overestimated.
2. Post-stratification improves estimation for trended populations, but is far from perfect.
3. The new ‘striplet’ method is a huge improvement in all line/strip situations trialled to date.

Striplet variance estimator:
B is the number of possible grids, in discrete approximation
μ_j is the fitted #objects in striplet j
g_j(b) is the fitted P(detection) in striplet j

Application: British Columbia multi-species marine survey (Williams & Thomas, JCRM 2008)
Select species with greatest and least trends in encounter rate for illustration.

Greatest trend: Dall’s Porpoise. Highest encounter rates on short lines. Worst case!

Least trend: floating plastic garbage. No trend in encounter rate with line length.

Results
Dall’s Porpoise: previous reported CV=31%; stratified methods: reported CV=19%.
- Estimated CV=31% using Poisson-based estimator with no adjustment for systematic lines
- Estimated CV=19% using design-based estimator with post-stratification and overlapping strata

Results
Floating garbage: previous reported CV=15%; stratified methods: reported CV=14%.
For an untrended population, there is little difference between the different estimators.



1. Trends in object density across the region
[Figure: number of detections per unit area against x-coordinate.]
Fit a GAM to give a fitted object density for any search strip in the region.
For any new grid placement, we now have an expected number of objects available for that grid.