Published by Leslie Parker. Modified over 9 years ago.
1
Investigating improvements in quality of survey estimates by updating auxiliary information in the sampling frame using returned and modelled data
Alan Bentley, Salah Merad and Kevin Moore
2
Overview
  Motivation
  Modelling
  Evaluation of benefits to estimation
3
Motivation
Employment headcount: current size stratifier
  Size bands: 0-9; 10-19; 20-49; 50-99; 100-299; 300+
Issues
  Burden on businesses with a large number of part-time employees
  Homogeneity of strata
Full Time Equivalent (FTE) employees: suggested as an alternative
  FTE = Full Time + 0.5 * Part Time
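The FTE definition and the size bands above can be illustrated with a short sketch. The business used here is hypothetical, and applying the headcount bands unchanged to FTE values is an assumption made only for illustration; the slides do not say which bands an FTE stratifier would use.

```python
from bisect import bisect_right

# Lower bounds and labels of the size bands listed on the slide.
BAND_LOWER_BOUNDS = [0, 10, 20, 50, 100, 300]
BAND_LABELS = ["0-9", "10-19", "20-49", "50-99", "100-299", "300+"]


def fte(full_time, part_time):
    """Full Time Equivalent employees: FTE = Full Time + 0.5 * Part Time."""
    return full_time + 0.5 * part_time


def size_band(size):
    """Return the size band containing a non-negative size measure.

    Fractional values (possible for FTE) fall in the band whose lower
    bound they have reached, so 9.5 is placed in "0-9" (an assumption).
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    return BAND_LABELS[bisect_right(BAND_LOWER_BOUNDS, size) - 1]


# Hypothetical business: 4 full-time and 30 part-time employees.
headcount = 4 + 30
print(size_band(headcount))      # band when stratifying by headcount
print(size_band(fte(4, 30)))     # band when stratifying by FTE
```

A business with many part-time employees drops to a lower band under FTE, which is the burden issue raised on the slide.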
4
Motivation
Register updated via a sample survey: the Business Register and Employment Survey (BRES)
  Large businesses updated every year
  Small businesses updated less often
Regression modelling: suggested to improve the timeliness of frame data
  Predict Full Time and Part Time, or Full Time Equivalent, for every local unit
5
Data Available
Survey data (current Business Register)
  Employees
  Region
  Industry
  Age
  Time of last update
  Number of local units in enterprise group
Administrative data
  Employees (from PAYE, Pay As You Earn)
  Turnover (from VAT, Value Added Tax)
6
Data Structure
[Diagram: data sources labelled BR, BRS, PAYE and VAT, annotated "at least one of"]
7
Regression Modelling
Dependent variable: FTE
Modelling for businesses with employment below 100
8
Regression Modelling
Model identified includes the following covariates:
  Register employees
  PAYE employees
  VAT turnover
  Number of local units in enterprise group
  Time of last update
  Region
  Industry
  Significant interactions of these
9
Variable Transformations
10
Log Transformation
11
Model Residuals
12
Model Residuals – After Noise Added
13
Test for Constant Variance
Breusch-Pagan test for heteroscedasticity
  Squared residuals regressed against the covariates in the substantive model
  Under the null hypothesis of constant variance, the test statistic is asymptotically chi-squared distributed
  Strong evidence to reject the null hypothesis: residuals appear to have non-constant variance
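The mechanics of the test can be sketched for a model with a single covariate. This is an illustration only: it uses the studentized (n times R-squared) form of the statistic, which may differ from the variant used in the presentation, and the data are a deterministic toy example rather than register data. With one covariate the reference distribution is chi-squared with 1 degree of freedom, whose upper tail equals erfc(sqrt(x / 2)).

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_one_covariate(x, y):
    """Return (LM statistic, p-value) for the n * R^2 form of the test."""
    a, b = fit_line(x, y)
    squared_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: squared residuals on the covariate.
    lm = max(len(x) * r_squared(x, squared_resid), 0.0)
    # Upper tail of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Toy data: the error size grows in proportion to x (non-constant variance).
x = list(range(1, 41))
y = [2 + 3 * xi + xi * (1 if xi % 2 == 0 else -1) for xi in x]

lm, p_value = breusch_pagan_one_covariate(x, y)
print(f"LM = {lm:.1f}, p-value = {p_value:.2g}")
```

A small p-value leads to rejecting constant variance, which is the conclusion the slide reports for the FTE model.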
14
Explanatory Power of the Model
Model                                                  R² (%)
Full model                                             81.5
Simple model (register employees as only predictor)    79.6
15
Domain analysis of R² (%)
Industry                    Simple Model   Full Model   Difference
Manufacturing               82.1           84.2         2.1
Electricity, Gas & Water    68.0           68.8         0.9
Construction                62.9           68.1         5.2
Wholesale                   81.6           83.4         1.8
Hotels and Restaurants      66.3           73.3         7.0
16
Model validation by data splitting
Full data split into training and validation sets (50% each)
Data set      R² (%)
Training      81.7
Validation    81.4
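The data-splitting check can be sketched as follows: fit on the training half, then compute R² on the validation half using the training coefficients. The data are synthetic and the one-covariate model is a stand-in for the full model, so the numbers printed are unrelated to those in the table.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(coef, x, y):
    """R-squared of fixed coefficients evaluated on the data (x, y)."""
    a, b = coef
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic data (an assumption for illustration only).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(400)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]

# Random 50/50 split into training and validation sets.
idx = list(range(len(x)))
rng.shuffle(idx)
train, valid = idx[:200], idx[200:]
x_train, y_train = [x[i] for i in train], [y[i] for i in train]
x_valid, y_valid = [x[i] for i in valid], [y[i] for i in valid]

coef = fit_line(x_train, y_train)             # fitted on training data only
r2_train = r_squared(coef, x_train, y_train)
r2_valid = r_squared(coef, x_valid, y_valid)  # out-of-sample R-squared
print(f"training R2 = {r2_train:.3f}, validation R2 = {r2_valid:.3f}")
```

A validation R² close to the training R², as in the table, suggests the model is not overfitted to the training half.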
17
Model validation by bootstrap
Bootstrap samples drawn from the full data, sampling with replacement (Efron, 1983)
Over-optimism less than 0.05%
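One common way to estimate over-optimism by bootstrap is sketched below: refit the model on each bootstrap sample, then average the difference between its R² on that sample and its R² on the original data. Whether this matches the exact procedure used in the presentation is an assumption, and the data and the number of replicates (`n_boot`) are illustrative.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(coef, x, y):
    """R-squared of fixed coefficients evaluated on the data (x, y)."""
    a, b = coef
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def bootstrap_optimism(x, y, n_boot=200, seed=0):
    """Average of (R2 on bootstrap sample) - (R2 on original data)."""
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(n_boot):
        # Sample n units with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        coef = fit_line(xb, yb)
        total += r_squared(coef, xb, yb) - r_squared(coef, x, y)
    return total / n_boot


# Synthetic data (an assumption for illustration only).
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]

optimism = bootstrap_optimism(x, y)
print(f"estimated optimism in R2: {optimism:.5f}")
```

Subtracting the estimated optimism from the apparent R² gives an optimism-corrected figure; a value near zero indicates little overfitting.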
18
Back-transformation
Simple back-transformation will give underestimates of the dependent variable on the original scale
Wooldridge (2000) gives an adjustment for the log back-transformation
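The sketch below illustrates two widely used adjustment factors for predictions from a log-scale model: exp(s²/2), which relies on normally distributed errors, and the average of exp(residual), which does not. Treating either as the adjustment shown in the presentation is an assumption; the data and model are synthetic.

```python
import math
import random

# Synthetic data: log(y) is linear in x with normal errors (illustration only).
rng = random.Random(7)
x = [rng.uniform(0, 4) for _ in range(500)]
y = [math.exp(1 + 0.5 * xi + rng.gauss(0, 0.8)) for xi in x]
log_y = [math.log(v) for v in y]

# Ordinary least squares of log(y) on x.
n = len(x)
mx, ml = sum(x) / n, sum(log_y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (li - ml) for xi, li in zip(x, log_y))
slope = sxy / sxx
intercept = ml - slope * mx

fitted_log = [intercept + slope * xi for xi in x]
resid = [li - fi for li, fi in zip(log_y, fitted_log)]

# Residual variance estimate (n - 2 degrees of freedom for two coefficients).
sigma2 = sum(r ** 2 for r in resid) / (n - 2)

normal_factor = math.exp(sigma2 / 2)                   # assumes normal errors
smearing_factor = sum(math.exp(r) for r in resid) / n  # no normality needed

naive = [math.exp(f) for f in fitted_log]              # simple back-transformation
adjusted = [smearing_factor * v for v in naive]        # adjusted predictions

print(f"normal-theory factor = {normal_factor:.3f}")
print(f"smearing factor      = {smearing_factor:.3f}")
print(f"mean y = {sum(y) / n:.2f}, mean naive = {sum(naive) / n:.2f}, "
      f"mean adjusted = {sum(adjusted) / n:.2f}")
```

Both factors exceed 1, so the adjusted predictions are scaled up from the simple back-transformation, which corrects the direction of the bias described on the slide.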
19
Benefits to business survey estimation
  Monthly Production Inquiry (MPI)
  Monthly Inquiry into Distribution Services Sector (MIDSS)
Estimator: expansion estimator
Variance indicator: variance due to stratification, assuming Neyman allocation
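The two quantities named above can be sketched with standard textbook formulas for stratified simple random sampling without replacement: the expansion estimate of a total, sum over strata of (N_h / n_h) times the sample sum, and the variance under Neyman allocation, (sum of N_h S_h)² / n minus the sum of N_h S_h². The stratum sizes, standard deviations and sample values are hypothetical, and the scaling of the variance indicator used in the presentation is not known.

```python
def expansion_estimate(strata):
    """Expansion estimate of a population total.

    strata: list of (N_h, sample_values) pairs, where N_h is the stratum
    population size and sample_values are the sampled y-values.
    """
    return sum(N_h * sum(ys) / len(ys) for N_h, ys in strata)


def neyman_allocation(N, S, n):
    """Sample sizes n_h proportional to N_h * S_h (not rounded)."""
    weights = [N_h * S_h for N_h, S_h in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]


def neyman_variance(N, S, n):
    """Variance of the estimated total under Neyman allocation."""
    first = sum(N_h * S_h for N_h, S_h in zip(N, S)) ** 2 / n
    second = sum(N_h * S_h ** 2 for N_h, S_h in zip(N, S))
    return first - second


# Hypothetical strata: population sizes and within-stratum standard deviations.
N = [1000, 200, 50]
S = [2.0, 10.0, 40.0]

print(neyman_allocation(N, S, n=100))
print(neyman_variance(N, S, n=100))

# Hypothetical sample values per stratum.
sample = [(1000, [1, 2, 3]), (200, [10, 20]), (50, [100])]
print(expansion_estimate(sample))
```

A stratification variable that makes strata more homogeneous lowers the S_h values and therefore this variance, which is the basis for comparing register employment, register FTE and modelled FTE as stratifiers.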
20
Impact on Monthly Surveys
Variance indicator
Stratification Variable   MPI Turnover   MIDSS Turnover
Register Employment       32.4           181.5
Register FTE              31.9           141.7
Modelled FTE              31.6           133.0
21
Concluding Remarks
Model identified for predicting FTE employees
  High R² and high predictive power
  Non-constant variance
  Large reliance on one covariate: employment headcount
Benefits to sample design and estimation
  FTE a useful frame variable
  Greatest benefit to sampling in service industries
  Additional benefit from modelling appears small
22
Areas for further work
Improvements to modelling
  Heteroscedasticity: multilevel modelling?
  More recent data (2005-2008)
  BRES data
Improvements to evaluation
  Impact on other business sample surveys
  Impact at industry level
  Impact under ratio estimation
  Correlations between modelled FTE and survey variables: FTE as auxiliary
Pilot study
23
Questions?
Thank you for listening
Contact: alan.bentley@ons.gov.uk