Variance Estimation When Donor Imputation is Used to Fill in Missing Values Jean-François Beaumont and Cynthia Bocci Statistics Canada Third International Conference on Establishment Surveys Montréal, June 18-21, 2007

2 Overview
  Context
  Donor imputation
  Variance estimation
  Simulation study
  Conclusion

3 Context
  Population parameter to be estimated:
    Domain total: [equation missing]
  Estimator in the case of full response: [equation missing]
    Calibration estimator
    Horvitz-Thompson estimator

4 Donor Imputation
  Imputed estimator: [equation missing]
  With donor imputation, the imputed value is [equation missing]
  A variety of methods can be considered in order to find a donor l(k) for the recipient k
  with [equation missing]
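The formulas on this slide did not survive extraction. As a hedged sketch only, one common way to write a donor-imputed estimator of a domain total is shown below. All notation is an assumption rather than the authors' own: s is the sample, s_r and s_m are the sets of respondents and nonrespondents, w_k is the estimation weight, d_k is the domain indicator, and y_k^* is the imputed value.

```latex
\begin{aligned}
\hat{\theta}_I &= \sum_{k \in s_r} w_k d_k y_k + \sum_{k \in s_m} w_k d_k y_k^{*},
\qquad y_k^{*} = y_{l(k)}, \quad l(k) \in s_r, \\
               &= \sum_{k \in s_r} W_k y_k,
\qquad W_k = w_k d_k + \sum_{j \in s_m :\, l(j) = k} w_j d_j .
\end{aligned}
```

The second line only regroups the first: each respondent's value carries its own weight plus the weights of every recipient it donates to.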

5 Donor Imputation
  Two simple examples:
    Random hot-deck imputation within classes
    Nearest-neighbour imputation
  Practical considerations that add some complexity to the imputation process:
    Post-imputation edit rules
    Hierarchical imputation classes
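The two simple donor-selection rules named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of a single matching variable x with absolute distance, and the convention that a respondent is its own donor are all assumptions made here. Edit rules and hierarchical classes are not handled.

```python
import numpy as np

def random_hot_deck(responded, classes, rng):
    """Return a donor index l(k) for every unit; respondents map to themselves.

    Each nonrespondent draws a donor at random, with replacement, among
    the respondents of its own imputation class (assumed convention).
    """
    donor = np.arange(len(responded))
    for c in np.unique(classes):
        in_class = classes == c
        donors = np.flatnonzero(in_class & responded)
        recipients = np.flatnonzero(in_class & ~responded)
        if len(recipients) == 0:
            continue
        if len(donors) == 0:
            raise ValueError(f"class {c!r} has recipients but no donor")
        donor[recipients] = rng.choice(donors, size=len(recipients), replace=True)
    return donor

def nearest_neighbour(x, responded):
    """Return l(k): the respondent whose x is closest to x_k (first one on ties)."""
    donors = np.flatnonzero(responded)
    donor = np.arange(len(x))
    for k in np.flatnonzero(~responded):
        donor[k] = donors[np.argmin(np.abs(x[donors] - x[k]))]
    return donor

# Small made-up example: units 1 and 3 are nonrespondents.
x = np.array([1.0, 2.0, 3.5, 7.0, 8.0])
y = np.array([10.0, np.nan, 14.0, np.nan, 20.0])
responded = ~np.isnan(y)
l = nearest_neighbour(x, responded)   # array([0, 0, 2, 4, 4])
y_imputed = y[l]                      # array([10., 10., 14., 20., 20.])
```

Returning the donor index rather than the imputed values keeps the link between recipient and donor, which a variance estimator that conditions on the selected donors needs.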

6 Imputation Model
  Most imputation methods can be justified by an imputation model: [equation missing]
  The donor imputed estimator is assumed to be approximately unbiased under the model: [equation missing]
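Since the model itself is missing from the transcript, the following is only a plausible form, written under assumed notation: x_k is a vector of auxiliary variables available for all sample units, the subscript m denotes moments under the imputation model, and \hat{\theta} is the full-response estimator.

```latex
\begin{aligned}
E_m(y_k \mid x_k) &= \mu(x_k), \qquad
V_m(y_k \mid x_k) = \sigma^2(x_k), \qquad
\mathrm{Cov}_m(y_k, y_j \mid x_k, x_j) = 0 \quad (k \neq j), \\
E_m\bigl(\hat{\theta}_I - \hat{\theta} \mid s, s_r\bigr) &\approx 0 .
\end{aligned}
```

The second line is the approximate model-unbiasedness assumption stated on the slide: given the sample s and the respondents s_r, the donor's value has roughly the same model mean as the value it replaces.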

7 Current Variance Estimation Methods
  Assuming negligible sampling fractions
    Chen and Shao (2000, JOS) for NN imputation
    Resampling methods
  Our method is closely related to:
    Rancourt, Särndal and Lee (1994, Proc. SRMS): assumes a ratio model holds
    Brick, Kalton and Kim (2004, SM): condition on the selected donors

8 Imputation Model Approach
  Variance decomposition of Särndal (1992, SM): [equation missing]
  For any donor imputation method, we have: [equation missing]
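The decomposition is not preserved in the transcript. A hedged reconstruction of its usual form is given below, under assumed notation: p is the sampling design, q the response mechanism, m the imputation model, \hat{\theta} the full-response estimator and \hat{\theta}_I the imputed estimator. It also assumes that \hat{\theta} is design-unbiased and that the nonresponse error has zero model expectation given s and s_r.

```latex
\begin{aligned}
\hat{\theta}_I - \theta &= (\hat{\theta} - \theta) + (\hat{\theta}_I - \hat{\theta}), \\
E_m E_p E_q (\hat{\theta}_I - \theta)^2 &\approx V_{\mathrm{SAM}} + V_{\mathrm{NR}} + 2\,V_{\mathrm{MIX}}, \\
V_{\mathrm{SAM}} &= E_m V_p(\hat{\theta}), \\
V_{\mathrm{NR}}  &= E_p E_q V_m(\hat{\theta}_I - \hat{\theta} \mid s, s_r), \\
V_{\mathrm{MIX}} &= E_p E_q \,\mathrm{Cov}_m(\hat{\theta}_I - \hat{\theta},\ \hat{\theta} - \theta \mid s, s_r).
\end{aligned}
```

The three terms correspond to the sampling variance, the nonresponse variance and the mixed component that the following slides estimate one at a time.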

9 Estimation of the nonresponse variance
  The estimation of the nonresponse variance is achieved by estimating [symbol missing]
  Noting that the nonresponse error is: [equation missing]
  Then, the nonresponse variance estimator is: [equation missing]
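To make the idea concrete, here is a minimal sketch of a model-based nonresponse variance estimate that conditions on the selected donors. It is not the slide's formula, which is missing. It assumes that the nonresponse error is a linear combination of y-values with coefficients fixed by the weights and the donor links, that model errors are uncorrelated, and that an estimate of the model variance for each unit is already available. The names wd (weight times domain indicator), donor and sigma2 are hypothetical.

```python
import numpy as np

def nonresponse_variance(wd, donor, responded, sigma2):
    """Model variance of (imputed estimator - full-response estimator).

    wd        : w_k * d_k for every sample unit (assumed known for all units)
    donor     : donor index l(k); respondents point to themselves
    responded : boolean response indicator
    sigma2    : estimated model variance of y_k for every sample unit
    """
    wd = np.asarray(wd, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    # W[k]: total weight attached to y_k in the imputed estimator
    # (own weight plus the weights of all recipients that use k as donor).
    W = np.bincount(donor, weights=wd, minlength=len(wd))
    # Coefficient of y_k in the nonresponse error:
    #   respondents    -> W_k - wd_k  (extra weight gained as a donor)
    #   nonrespondents -> -wd_k       (their true value is what is missing)
    coef = np.where(responded, W - wd, -wd)
    return float(np.sum(coef ** 2 * sigma2))

# Made-up example: five units with equal weights, units 1 and 3 imputed.
wd = np.full(5, 2.0)
donor = np.array([0, 0, 2, 4, 4])
responded = np.array([True, False, True, False, True])
v_nr = nonresponse_variance(wd, donor, responded, np.ones(5))  # 16.0
```

Because the coefficients depend only on the weights and on who donated to whom, the same function applies to any donor method, as long as donor selection does not depend on the y-values themselves.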

10 Estimation of the mixed component
  Similarly, the estimation of the mixed component is achieved by estimating [symbol missing]
  The mixed component estimator is: [equation missing]
  This component can be either positive or negative and may not always be negligible

11 Estimation of the sampling variance
  Let [symbol missing] be the full-response variance estimator
  The strategy consists of:
    Estimating [symbol missing]
    Replacing the unknown [symbol missing] by their estimates
  This leads to the sampling variance estimator: [equation missing]

12 Estimation of the sampling variance
  This strategy is essentially equivalent to:
    Randomly imputing the missing values using the imputation model
    Computing the full-response sampling variance estimator by treating these imputed values as true values
    Repeating this process a large number of times and taking the average of the sampling variance estimates
  Similar to the multiple imputation sampling variance estimator
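The three steps listed on this slide can be sketched as a small Monte Carlo loop. This is an illustration under stated assumptions, not the authors' estimator: the design is taken to be simple random sampling without replacement with a Horvitz-Thompson estimator of a total, random imputations are drawn from a normal distribution with the estimated model mean mu and variance sigma2, and all function and argument names are hypothetical.

```python
import numpy as np

def srs_total_variance(y, N):
    """Full-response variance estimator of the HT total under SRSWOR."""
    n = len(y)
    return N ** 2 * (1 - n / N) * np.var(y, ddof=1) / n

def sampling_variance_mc(y, responded, mu, sigma2, N, rng, n_rep=1000):
    """Average the full-response variance estimator over random imputations."""
    y = np.asarray(y, dtype=float)
    missing = ~responded
    estimates = np.empty(n_rep)
    for b in range(n_rep):
        # Step 1: randomly impute the missing values from the imputation model.
        y_b = np.where(responded, y, 0.0)
        noise = rng.standard_normal(missing.sum())
        y_b[missing] = mu[missing] + np.sqrt(sigma2[missing]) * noise
        # Step 2: treat the imputed values as true values.
        estimates[b] = srs_total_variance(y_b, N)
    # Step 3: average over the repetitions.
    return float(estimates.mean())
```

For a variance estimator that is quadratic in y, the average depends on the imputation model only through its first two moments, so the normal distribution used here is a convenience rather than a requirement.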

13 Simulation study
  Generated a population of size 1000
  Two y-variables:
    LIN: Linear relationship between y and x
    NLIN: Nonlinear relationship between y and x
  Two different sample sizes:
    Small sampling fraction: n=50
    Large sampling fraction: n=500
  Response probability depends on x with an average of 0.5

14 Simulation study
  Imputation: nearest-neighbour imputation using x as the matching variable
  Estimation of [symbol missing]:
    LIN: Linear model in perfect agreement with the LIN y-variable
    NPAR: Nonparametric estimation using the TPSPLINE procedure of SAS

15 Simulation study
  Two objectives:
    Compare the two ways of estimating [symbol missing]: LIN and NPAR
    Compare three nonparametric methods:
      NPAR
      NPAR_Naïve: NPAR with the sampling variance being estimated by the naïve sampling variance (Brick, Kalton and Kim, 2004)
      CS: method of Chen and Shao (2000)

16 Results: Large sampling fraction
  Method        Relative Bias in %        RRMSE in %
                y-LIN      y-NLIN      y-LIN      y-NLIN
  LIN            -2.4       358.4       15.7       514.1
  NPAR           -0.3       -18.8       21.5        54.6

17 Results: Small sampling fraction
  Method        Relative Bias in %        RRMSE in %
                y-LIN      y-NLIN      y-LIN      y-NLIN
  NPAR           -4.9       -13.3       41.8       245.4
  NPAR_Naïve     -5.9       -10.4       42.1       265.8
  CS             -9.1        -9.4       52.8       257.8

18 Results: Large sampling fraction
  Method        Relative Bias in %        RRMSE in %
                y-LIN      y-NLIN      y-LIN      y-NLIN
  NPAR           -0.3       -18.8       21.5        54.6
  NPAR_Naïve     -0.3       -12.0       21.8        69.1
  CS             33.9        59.6       53.7       118.7
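The slides do not define the two performance measures reported in the results tables. A minimal sketch using the usual Monte Carlo definitions is given below. It assumes v_hat holds the variance estimates from the simulated samples and v_true is the true variance of the imputed estimator, typically approximated from the same simulation. Both names are hypothetical.

```python
import numpy as np

def relative_bias_pct(v_hat, v_true):
    """Relative bias of a variance estimator, in percent."""
    return 100.0 * (np.mean(v_hat) - v_true) / v_true

def rrmse_pct(v_hat, v_true):
    """Relative root mean squared error of a variance estimator, in percent."""
    v_hat = np.asarray(v_hat, dtype=float)
    return 100.0 * np.sqrt(np.mean((v_hat - v_true) ** 2)) / v_true

# Made-up estimates scattered symmetrically around a true variance of 100.
print(relative_bias_pct([90.0, 110.0], 100.0))  # 0.0
print(rrmse_pct([90.0, 110.0], 100.0))          # 10.0
```

Under these definitions an estimator can show a small relative bias and still have a large RRMSE if it is unstable from sample to sample.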

19 Conclusion
  Nonparametric estimation of [symbol missing] seems beneficial (robust) with nearest-neighbour imputation
  Our proposed method is valid even for large sampling fractions
  It seems to be slightly better to use our sampling variance estimator instead of the naïve sampling variance estimator

20 Conclusion
  Work done in the context of developing a variance estimation system (SEVANI)
  Methodology implemented in the next version (2.0) of SEVANI
  Estimation of [symbol missing]:
    Linear model
    Nonparametric estimation

21 Thanks
  For more information, please contact:
    Jean-François Beaumont, Jean-Francois.Beaumont@statcan.ca
    Cynthia Bocci, Cynthia.Bocci@statcan.ca
