Investigating improvements in quality of survey estimates by updating auxiliary information in the sampling frame using returned and modelled data
Alan Bentley, Salah Merad and Kevin Moore
Overview
  Motivation
  Modelling
  Evaluation of benefits to estimation
Motivation
  Employment Headcount – current size stratifier
    0-9; 10-19; 20-49; 50-99; ; 300+
  Issues
    Burden on businesses with large number of Part Time employees
    Homogeneity of strata
  Full Time Equivalent (FTE) Employees – suggested as alternative
    FTE = Full Time + 0.5 * Part Time
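The FTE definition on this slide can be illustrated with a minimal sketch. The function name `fte` and the two example businesses are hypothetical; only the formula FTE = Full Time + 0.5 * Part Time comes from the slide.

```python
def fte(full_time: float, part_time: float, part_time_weight: float = 0.5) -> float:
    """Full Time Equivalent employees: Full Time + 0.5 * Part Time."""
    return full_time + part_time_weight * part_time


# Two hypothetical businesses with the same headcount of 40.
mostly_full_time = {"full_time": 38, "part_time": 2}
mostly_part_time = {"full_time": 10, "part_time": 30}

for name, business in [("mostly full time", mostly_full_time),
                       ("mostly part time", mostly_part_time)]:
    headcount = business["full_time"] + business["part_time"]
    print(name, "headcount:", headcount, "FTE:", fte(**business))
```

Both businesses fall in the same headcount band, yet their FTE values differ, which is the homogeneity issue the slide raises.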
Motivation
  Updating of register via a sample survey – Business Register and Employment Survey (BRES)
    Large businesses updated every year
    Small businesses less often
  Regression Modelling – suggested to improve timeliness of frame data
    Predict Full Time & Part Time; or Full Time Equivalent – for every local unit
Data Available
  Survey Data (current Business Register)
    Employees
    Region
    Industry
    Age
    Time of last update
    Number of local units in enterprise group
  Administrative Data
    Employees (from PAYE – Pay As You Earn)
    Turnover (from VAT – Value Added Tax)
Data Structure
  [Figure: diagram of the data sources BR, BRS, PAYE and VAT and their combinations (BR+BRS, BR+BRS+PAYE, BR+BRS+PAYE+VAT); label: "at least one of"]
Regression Modelling
  FTE as dependent variable
  Modelling for businesses with employment <100
Regression Modelling
  Model identified includes the following covariates:
    Register employees
    PAYE employees
    VAT turnover
    Number of local units in enterprise group
    Time of last update
    Region
    Industry
    Significant interactions of these
Variable Transformations
Log Transformation
Model Residuals
Model Residuals – After Noise Added
Test for Constant Variance
  Breusch-Pagan test for heteroscedasticity
  Squared residuals regressed against covariates in substantive model
  Under null hypothesis: ~
  Strong evidence to reject the null hypothesis: residuals appear to have non-constant variance
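The test described on this slide can be sketched as follows. The slide text does not show which form of the statistic was used, so this sketch assumes the common studentized (Koenker) form: LM = n * R² from the auxiliary regression of squared residuals on the covariates, compared with a chi-square distribution whose degrees of freedom equal the number of covariates. It uses a single hypothetical covariate and simulated data, not the authors' model.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (residuals, R-squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in residuals) / sst
    return residuals, r2


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test with one covariate (assumed form)."""
    residuals, _ = simple_ols(x, y)
    squared = [e * e for e in residuals]
    # Auxiliary regression: squared residuals on the same covariate.
    _, r2_aux = simple_ols(x, squared)
    lm = max(len(x) * r2_aux, 0.0)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Simulated data whose error standard deviation grows with x.
rng = random.Random(0)
x = [rng.uniform(1.0, 10.0) for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, xi) for xi in x]

lm, p_value = breusch_pagan(x, y)
print(f"LM = {lm:.1f}, p-value = {p_value:.2g}")
```

A small p-value, as on the slide, is evidence against constant residual variance.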
Explanatory Power of the Model
  Model: R² (%)
  Full Model: 81.5
  Simple Model (register employees as only predictor): 79.6
Domain analysis of R²
  [Table: R² by industry. Columns: Industry; Simple Model; Full Model; Difference. Rows: Manufacturing; Electricity, Gas & Water; Construction; Wholesale; Hotels and Restaurants.]
Model validation by data splitting
  Full data split into training and validation sets (50%)
  R² (%): Training 81.7; Validation 81.4
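The data-splitting check can be sketched as below: fit on a random half, then compare R² on the training half with R² on the held-out half using the same fitted coefficients. The data are simulated and the single covariate (`log_register`) is a hypothetical stand-in for the full set of covariates.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(intercept, slope, x, y):
    """R-squared of a fixed line evaluated on the data (x, y)."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


# Simulated log-scale data (hypothetical coefficients and noise level).
rng = random.Random(1)
log_register = [rng.uniform(0.0, 4.6) for _ in range(1000)]
log_fte = [0.1 + 0.95 * v + rng.gauss(0.0, 0.5) for v in log_register]

# Random 50/50 split into training and validation sets.
indices = list(range(len(log_register)))
rng.shuffle(indices)
train_idx, valid_idx = indices[:500], indices[500:]
x_train = [log_register[i] for i in train_idx]
y_train = [log_fte[i] for i in train_idx]
x_valid = [log_register[i] for i in valid_idx]
y_valid = [log_fte[i] for i in valid_idx]

intercept, slope = fit_line(x_train, y_train)
r2_train = r_squared(intercept, slope, x_train, y_train)
r2_valid = r_squared(intercept, slope, x_valid, y_valid)
print(f"training R2 = {r2_train:.3f}, validation R2 = {r2_valid:.3f}")
```

A validation R² close to the training R² suggests little overfitting.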
Model validation by bootstrap
  Bootstrap samples drawn from the full data, sampling with replacement (Efron, 1983)
  Over-optimism less than 0.05%
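One common way to estimate over-optimism by bootstrap is sketched below. It assumes the following procedure, which the slide text does not spell out: refit the model on each bootstrap sample, take the difference between its R² on the bootstrap sample and its R² on the original data, and average these differences. The data, the single covariate and the number of replicates are all hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(intercept, slope, x, y):
    """R-squared of a fixed line evaluated on the data (x, y)."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def bootstrap_optimism(x, y, n_boot=200, seed=0):
    """Average of (R2 on bootstrap sample) - (R2 on original data)."""
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        intercept, slope = fit_line(bx, by)
        total += r_squared(intercept, slope, bx, by) - r_squared(intercept, slope, x, y)
    return total / n_boot


# Simulated log-scale data (hypothetical coefficients and noise level).
rng = random.Random(3)
x = [rng.uniform(0.0, 4.6) for _ in range(300)]
y = [0.1 + 0.95 * xi + rng.gauss(0.0, 0.5) for xi in x]

apparent_r2 = r_squared(*fit_line(x, y), x, y)
optimism = bootstrap_optimism(x, y)
corrected_r2 = apparent_r2 - optimism
print(f"apparent R2 = {apparent_r2:.4f}, optimism = {optimism:.4f}")
```

Subtracting the estimated optimism from the apparent R² gives an optimism-corrected measure of fit.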
Back-transformation
  Simple back-transformation will give underestimates of the dependent variable on the original scale
  Wooldridge (2000) gives an adjustment for the log back-transformation:
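The adjustment formula itself is not present in the slide text. As a hedged illustration of why an adjustment is needed, the sketch below compares the naive back-transformation exp(fitted log value) with two standard multiplicative corrections for log models: a normal-theory factor exp(σ̂²/2) and a smearing factor (the mean of exp(residual)). Which of these, if either, matches the adjustment the authors used is an assumption. Data and coefficients are simulated.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Simulated data: log(y) is linear in x with normal errors (hypothetical).
rng = random.Random(2)
x = [rng.uniform(0.0, 4.0) for _ in range(1000)]
log_y = [1.0 + 0.5 * xi + rng.gauss(0.0, 0.8) for xi in x]
y = [math.exp(v) for v in log_y]

intercept, slope = fit_line(x, log_y)
fitted_log = [intercept + slope * xi for xi in x]
residuals = [ly - f for ly, f in zip(log_y, fitted_log)]

# Residual variance on the log scale (two estimated coefficients).
sigma2 = sum(e * e for e in residuals) / (len(x) - 2)
normal_factor = math.exp(sigma2 / 2.0)
smearing_factor = sum(math.exp(e) for e in residuals) / len(residuals)

naive_mean = sum(math.exp(f) for f in fitted_log) / len(fitted_log)
actual_mean = sum(y) / len(y)
print(f"mean of y:                {actual_mean:.2f}")
print(f"naive back-transform:     {naive_mean:.2f}")
print(f"normal-theory adjusted:   {normal_factor * naive_mean:.2f}")
print(f"smearing adjusted:        {smearing_factor * naive_mean:.2f}")
```

Both factors exceed 1, so the adjusted predictions are scaled up relative to the naive ones.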
Benefits to business survey estimation
  Monthly Production Inquiry (MPI)
  Monthly Inquiry into Distribution Services Sector (MIDSS)
  Using an expansion estimator:
  Assuming Neyman allocation, variance due to stratification:
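The formulas on this slide are not present in the text. As a sketch under stated assumptions, the code below uses the standard textbook results for stratified simple random sampling: the expansion estimator of a total has variance Σ N_h² (1 − n_h/N_h) S_h² / n_h, and under Neyman allocation (n_h proportional to N_h S_h) this reduces to (Σ N_h S_h)² / n − Σ N_h S_h². The stratum sizes, standard deviations and sample size are hypothetical, and the authors' variance indicator may differ in detail.

```python
import math


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Sample sizes n_h proportional to N_h * S_h (may be non-integer)."""
    total = sum(N * S for N, S in zip(stratum_sizes, stratum_sds))
    return [n * N * S / total for N, S in zip(stratum_sizes, stratum_sds)]


def expansion_variance(stratum_sizes, stratum_sds, sample_sizes):
    """Variance of the expansion estimator of a total under stratified SRS."""
    return sum(
        N ** 2 * (1.0 - n_h / N) * S ** 2 / n_h
        for N, S, n_h in zip(stratum_sizes, stratum_sds, sample_sizes)
    )


def neyman_variance(stratum_sizes, stratum_sds, n):
    """Closed form of the same variance when Neyman allocation is used."""
    total = sum(N * S for N, S in zip(stratum_sizes, stratum_sds))
    within = sum(N * S ** 2 for N, S in zip(stratum_sizes, stratum_sds))
    return total ** 2 / n - within


# Hypothetical strata: population sizes and standard deviations of turnover.
N_h = [5000, 1200, 300]
S_h = [2.0, 10.0, 60.0]
n = 400

n_h = neyman_allocation(N_h, S_h, n)
v_direct = expansion_variance(N_h, S_h, n_h)
v_closed = neyman_variance(N_h, S_h, n)
print("allocation:", n_h)
print("variance (direct):", v_direct)
print("variance (closed form):", v_closed)
```

Evaluating the closed form with S_h computed under different stratification variables gives one way to compare stratifiers at a fixed total sample size.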
Impact on Monthly Surveys
  [Table: variance indicator by stratification variable. Columns: Stratification Variable; MPI Turnover; MIDSS Turnover. Rows: Register Employment; Register FTE; Modelled FTE.]
Concluding Remarks
  Model identified for predicting FTE employees
    High R² and high predictive power
    Non-constant variance
    Large reliance on one covariate – employment headcount
  Benefits to sample design and estimation
    FTE a useful frame variable
    Greatest benefit to sampling in service industries
    Additional benefit from modelling appears small
Areas for further work
  Improvements to modelling
    Heteroscedasticity – Multilevel modelling?
    More recent data (2005 – 2008)
    BRES data
  Improvements to evaluation
    Impact on other business sample surveys
    Impact at industry level
    Impact under ratio estimation
    Correlations between modelled FTE and survey variables: FTE as auxiliary
    Pilot study
Questions? Thank you for listening Contact: