Published by Gwendoline Gabriella Lawrence. Modified over 9 years ago.
1 Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys. Breda Munoz, Virginia Lesser. R82-9096-01
2 This presentation was supported under STAR Research Assistance Agreement No. CR82-9096-01 awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this presentation are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.
3 Outline
- Missing data in environmental surveys
- Nonignorable missing data mechanism
- Model-based approach for nonignorable missing data
- Design-based estimation and nonignorable missing data
- Illustration
- Summary
4 Missing Data in Environmental Surveys. Researchers in environmental studies must obtain access to selected sites to gather field data. Denial of access is a common problem in environmental surveys; the resulting unit nonresponse affects the results of data analysis.
5 Response Disposition, 1995/1996 EMAP North Dakota Prairie Wetlands Studies (Lesser, 2001)
Result                      1995   1996
Private landowners
  Agreed to access           43%    40%
  Refused access             36%    37%
  Undeliverable               2%     2%
  Not returned/no contact    16%    14%
Public land                   3%     7%
Total                       100%   100%
6 Introduction. The 1995-1997 Maryland Biological Stream Survey (Boward et al., 1999): overall access denial rate of 10%. ODFW habitat surveys, overall rate of access denial (Flitcroft et al., 2002): 1998: 10.0%; 1999: 6.0%; 2000: 12.5%.
7 Assumptions: a probability sampling design to collect outcomes of a spatial random process Y; a collection of sampling sites selected using the probability sampling design; auxiliary variables.
8 Missing Mechanism: Missing Completely at Random (MCAR). Diagram relating the variables X1, X2, Y and R (Smith, Skinner and Clark, 1999; Little and Rubin, 2002).
9 Missing Mechanism: Missing at Random (MAR). Diagram relating the variables X1, X2, Y and R (Smith, Skinner and Clark, 1999; Little and Rubin, 2002).
10 Missing Mechanism: Nonignorable. Diagram relating the variables X1, X2, Y and R (Smith, Skinner and Clark, 1999; Little and Rubin, 2002).
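The three mechanisms on the preceding slides can be contrasted with a small simulation. This is a minimal sketch, not the authors' simulation: the logistic response model, its coefficients, and the names `response_indicator` and `respondent_bias` are assumptions made for illustration. It draws R ~ Bernoulli(p) where p is constant (MCAR), depends only on an observed covariate x (MAR), or depends on the outcome y itself (nonignorable).

```python
import numpy as np

def response_indicator(x, y, mechanism, rng):
    """Draw R ~ Bernoulli(p) under one of three missing-data mechanisms."""
    if mechanism == "MCAR":
        logit = np.full(y.shape, 0.5)   # p is the same for every unit
    elif mechanism == "MAR":
        logit = 0.5 + x                 # p depends only on the observed covariate
    elif mechanism == "nonignorable":
        logit = 0.5 + y                 # p depends on the outcome itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    p = 1.0 / (1.0 + np.exp(-logit))
    return rng.random(y.shape) < p      # True = responded

def respondent_bias(mechanism, n=200_000, seed=1):
    """Respondent mean of y minus the full-sample mean of y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)              # assumed auxiliary variable
    y = x + rng.normal(size=n)          # assumed outcome, correlated with x
    r = response_indicator(x, y, mechanism, rng)
    return y[r].mean() - y.mean()
```

Under MCAR the respondent mean stays close to the full-sample mean. Under the nonignorable setting it is shifted, and because the response probability depends on y, adjusting on x alone cannot be expected to remove the shift.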
11 Model-based Approach. Under a nonignorable mechanism, we model the joint probability of the data and the missing-mechanism indicator (“response” indicator), R(s_i) ~ Bernoulli(p_i). The joint model has two components: a data model and a missing-mechanism model, with covariates.
12 Model-assisted estimation and nonignorable missing data Assume the parameter of interest: Total of the response Y R
13 Model-assisted estimation and nonignorable missing data Continuous form of the Horvitz-Thompson estimator for the total (Cordy, 1993): Let be a collection of fixed values
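The estimator formula itself did not survive on this slide. As a minimal sketch, assuming the usual form in which each sampled value is divided by the inclusion density at its site, the total can be estimated as below. The function name `horvitz_thompson_total` and the argument `incl_density` are illustrative, not taken from the presentation.

```python
import numpy as np

def horvitz_thompson_total(y, incl_density):
    """Estimate a total as sum_i y(s_i) / pi(s_i).

    y            : observed values at the sampled sites
    incl_density : inclusion density pi(s_i) at each sampled site (assumed known)
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(incl_density, dtype=float)
    if np.any(pi <= 0):
        raise ValueError("inclusion densities must be positive")
    return float(np.sum(y / pi))

# Illustration: n sites drawn independently and uniformly over a domain of
# area A have inclusion density n / A, so the estimate is A * mean(y).
area, n = 10.0, 3
estimate = horvitz_thompson_total([1.0, 2.0, 3.0], np.full(n, n / area))
```

With nonresponse, only the observed sites contribute to the sum, which is why the later slides introduce adjustment weights.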
14 Model-assisted estimation (cont.) Sample size n: n* observed, n - n* missing (nonignorable missing).
15 Model-assisted estimation (cont.) denotes the
16 Model-assisted estimation (cont.) Likelihood:
17 Model-assisted estimation (cont.) Reparameterize the model parameters (Baker and Laird, 1988): expected cell counts
18 Model-assisted estimation (cont.) Use the EM algorithm to estimate the expected counts of the missing cells, M_ij. E-step:
19 Model-assisted estimation (cont.) M-step: iterative proportional fitting (IPF) (Bishop et al., 1975), an algorithm based on fitting marginal totals. The EM algorithm always converges to a solution when using IPF in the M-step (Baker and Laird, 1988).
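The E-step and M-step formulas are not recoverable from the slides, so the following is only a sketch of how an EM iteration with IPF can look for a two-way table with nonresponse. It assumes a log-linear model in which nonresponse depends on the outcome category (fitted margins XY and YR). The notation `n_obs` (fully classified counts), `m_row` (nonrespondent counts known only by auxiliary row) and all function names are assumptions for illustration, and positive observed counts are assumed.

```python
import numpy as np

def _ratio(target, fitted):
    """Elementwise target / fitted, returning 0 where fitted is 0."""
    target, fitted = np.broadcast_arrays(target, fitted)
    return np.divide(target, fitted, out=np.zeros(target.shape), where=fitted > 0)

def ipf_xy_yr(z, n_cycles=50, tol=1e-10):
    """IPF: fit expected counts mu[i, j, r] that match the XY and YR margins of z."""
    mu = np.ones_like(z, dtype=float)
    for _ in range(n_cycles):
        prev = mu.copy()
        mu *= _ratio(z.sum(axis=2), mu.sum(axis=2))[:, :, None]  # match XY margin
        mu *= _ratio(z.sum(axis=0), mu.sum(axis=0))[None, :, :]  # match YR margin
        if np.max(np.abs(mu - prev)) < tol:
            break
    return mu

def em_step(n_obs, m_row, M):
    """One EM iteration: fit the model by IPF, then reallocate the missing counts."""
    z = np.stack([n_obs, M], axis=2)      # r = 0: respondents, r = 1: nonrespondents
    mu = ipf_xy_yr(z)                     # M-step on the completed table
    mu_miss = mu[:, :, 1]
    cond = _ratio(mu_miss, mu_miss.sum(axis=1, keepdims=True))
    return m_row[:, None] * cond          # E-step: expected missing cell counts M_ij

def em_missing_counts(n_obs, m_row, n_iter=500, tol=1e-8):
    """Iterate em_step from an even split of each row's missing count."""
    n_obs = np.asarray(n_obs, dtype=float)
    m_row = np.asarray(m_row, dtype=float)
    n_cols = n_obs.shape[1]
    M = np.repeat(m_row[:, None] / n_cols, n_cols, axis=1)
    for _ in range(n_iter):
        M_new = em_step(n_obs, m_row, M)
        converged = np.max(np.abs(M_new - M)) < tol
        M = M_new
        if converged:
            break
    return M
```

Each iteration preserves the row totals of the missing counts, since the E-step only redistributes `m_row[i]` across the outcome categories. EM can approach its solution slowly, so the iteration cap and tolerance here are arbitrary choices.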
20 Model-assisted estimation (cont.) Possible estimators for the total of Y. Cell adjustment: adjustment weight (Little and Rubin, 2002)
21 Model-assisted estimation (cont.) Column adjustment:
22 Model-assisted estimation (cont.) Row adjustment:
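The weight formulas for the cell, column and row adjustments are missing from the slides. One plausible reading, offered here only as an assumption, is the usual inverse-response-rate weight computed at the cell, column, or row level from the observed counts and the estimated missing counts M_ij. The names `n_obs`, `M` and `adjustment_weights` are illustrative.

```python
import numpy as np

def adjustment_weights(n_obs, M):
    """Assumed weights: (observed + estimated missing) / observed.

    n_obs : observed counts by auxiliary class (rows) and outcome class (columns)
    M     : estimated missing counts for the same cells
    Returns the cell-level, column-level and row-level weights.
    """
    n_obs = np.asarray(n_obs, dtype=float)
    M = np.asarray(M, dtype=float)
    cell = (n_obs + M) / n_obs
    column = (n_obs.sum(axis=0) + M.sum(axis=0)) / n_obs.sum(axis=0)
    row = (n_obs.sum(axis=1) + M.sum(axis=1)) / n_obs.sum(axis=1)
    return cell, column, row

cell_w, col_w, row_w = adjustment_weights([[30, 10], [20, 20]], [[6, 5], [2, 10]])
```

Under this reading, the row weights use only the row totals of M, which are the known nonrespondent counts by auxiliary class, so the row adjustment does not depend on how the model allocates missing units across outcome classes.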
23 Model-assisted estimation (cont.) Variance estimators are obtained using the bootstrap (Efron, 1994). The bootstrap produces asymptotically valid variance estimates.
24 Illustration. We simulate a continuous multivariate normal spatial random process for y. Population: John Day Middle Fork stream reaches; 143 stream reaches divided into survey segments (~1 mile); 6536 survey segments; area of 785 mi².
25 Illustration. The population of stream reaches was stratified into 6 strata based on the number of survey segments: “<10”, “10-20”, “20-30”, “30-50”, “50-100”, “>100”. Nonignorable missing data was generated as: Missing rates of 15%, 30% and 50% were created.
27 Population Summary
            Stratum 1  Stratum 2  Stratum 3  Stratum 4  Stratum 5  Stratum 6
Size              246        433        269       1059       1208       3321
Class 1        64.23%     65.13%     64.31%     65.44%     65.48%     61.70%
Class 2        35.77%     34.87%     35.69%     34.56%     34.52%     38.30%
Minimum         -2.07      -2.99      -3.96      -2.18      -2.37      -5.47
Mean             1.63       1.68       1.66       1.70       1.73       1.80
Max              7.01       7.95       8.04       6.15       8.65       9.87
28 Illustration. Sample size n = 100. Allocation proportional to the number of survey segments in each stratum. Q_1 = first sample quantile.
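Proportional allocation of n = 100 across the six strata can be worked out from the stratum sizes in the population summary. The sketch below assumes a largest-remainder rounding rule so that the allocations sum exactly to n. The slides do not say how rounding was handled, and `proportional_allocation` is an illustrative name.

```python
import numpy as np

def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units proportionally to stratum sizes.

    Fractional shares are rounded down, then the leftover units go to the
    strata with the largest fractional parts (an assumed rounding rule).
    """
    sizes = np.asarray(stratum_sizes, dtype=float)
    exact = n * sizes / sizes.sum()
    alloc = np.floor(exact).astype(int)
    leftover = int(n - alloc.sum())
    order = np.argsort(-(exact - alloc))   # largest fractional parts first
    alloc[order[:leftover]] += 1
    return alloc

# Stratum sizes (survey segments) from the population summary.
allocation = proportional_allocation([246, 433, 269, 1059, 1208, 3321], 100)
# allocation is [4, 7, 4, 16, 18, 51]
```

The largest stratum receives about half the sample, and the smallest strata receive only a handful of units each.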
30 Modified Bootstrap
We draw 1000 random samples of size 100 from the observed sample:
- independently across strata
- maintaining proportional allocation
- maintaining the row totals by the auxiliary variable
For each of the 1000 samples, we compute the estimates. We obtain a standard error and MSE for each estimate. We repeat this process 1000 times.
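The resampling scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it resamples with replacement within each combination of stratum and auxiliary class, which keeps the per-stratum sample sizes (and hence the allocation) and the row totals by the auxiliary variable fixed. The `estimator` argument is a hypothetical callable standing in for whichever estimator of the total is being evaluated.

```python
import numpy as np

def resample_indices(stratum, aux_class, rng):
    """Resample unit indices with replacement within each (stratum, class) group."""
    stratum = np.asarray(stratum)
    aux_class = np.asarray(aux_class)
    pieces = []
    for s in np.unique(stratum):
        for c in np.unique(aux_class[stratum == s]):
            group = np.flatnonzero((stratum == s) & (aux_class == c))
            pieces.append(rng.choice(group, size=group.size, replace=True))
    return np.concatenate(pieces)

def bootstrap_se(y, stratum, aux_class, estimator, n_boot=1000, seed=0):
    """Bootstrap standard error of estimator(y) under the grouped resampling."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    reps = np.array([
        estimator(y[resample_indices(stratum, aux_class, rng)])
        for _ in range(n_boot)
    ])
    return reps.std(ddof=1), reps
```

In a simulation study such as the one described, this routine would sit inside an outer loop over simulated samples, so that the bootstrap standard errors can be compared with the spread of the estimates across simulations.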
31 Summary