Investigating improvements in quality of survey estimates by updating auxiliary information in the sampling frame using returned and modelled data Alan.

Presentation transcript:

Investigating improvements in quality of survey estimates by updating auxiliary information in the sampling frame using returned and modelled data Alan Bentley, Salah Merad and Kevin Moore

Overview: Motivation; Modelling; Evaluation of benefits to estimation

Motivation. Employment headcount is the current size stratifier, with bands 0-9; 10-19; 20-49; 50-99; ; 300+. Issues: burden on businesses with a large number of part-time employees; homogeneity of strata. Full Time Equivalent (FTE) employees are suggested as an alternative: FTE = Full Time + 0.5*Part Time
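The FTE definition above can be illustrated with a minimal sketch. The function name and the two example businesses are hypothetical, chosen only to show why headcount and FTE can rank businesses differently.

```python
def full_time_equivalent(full_time: int, part_time: int) -> float:
    """FTE as defined on the slide: full-time plus half of part-time."""
    return full_time + 0.5 * part_time


# Two hypothetical businesses, both with an employment headcount of 20.
mostly_full_time = full_time_equivalent(full_time=18, part_time=2)
mostly_part_time = full_time_equivalent(full_time=2, part_time=18)

print(mostly_full_time)  # 19.0
print(mostly_part_time)  # 11.0
```

Both businesses fall in the same headcount band, but their FTE values differ substantially, which is the homogeneity issue the slide raises.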

Motivation. The register is updated via a sample survey, the Business Register and Employment Survey (BRES): large businesses are updated every year, small businesses less often. Regression modelling is suggested to improve the timeliness of frame data: predict Full Time and Part Time, or Full Time Equivalent, for every local unit.

Data Available. Survey data (current Business Register): employees; region; industry; age; time of last update; number of local units in enterprise group. Administrative data: employees (from PAYE, Pay As You Earn); turnover (from VAT, Value Added Tax).

Data Structure [diagram: overlap of the BR, BRS, PAYE and VAT data sources; label: "at least one of"]

Regression Modelling. Dependent variable: FTE. Modelling for businesses with employment below 100.

Regression Modelling. The model identified includes the following covariates: register employees; PAYE employees; VAT turnover; number of local units in enterprise group; time of last update; region; industry; and significant interactions of these.

Variable Transformations

Log Transformation

Model Residuals

Model Residuals – After Noise Added

Test for Constant Variance. Breusch-Pagan test for heteroscedasticity: squared residuals are regressed against the covariates in the substantive model. Under the null hypothesis: ~ . There is strong evidence to reject the null hypothesis: the residuals appear to have non-constant variance.
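The test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single covariate, synthetic data, and the studentised form of the statistic (n times the R2 of the auxiliary regression), which is asymptotically chi-squared with one degree of freedom per covariate under the null. All function names are hypothetical.

```python
import random


def ols_fit(x, y):
    """Simple least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R^2 of the simple regression of y on x."""
    a, b = ols_fit(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """LM statistic: n * R^2 from regressing squared residuals on x."""
    a, b = ols_fit(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)


random.seed(1)
x = [random.uniform(1, 10) for _ in range(500)]
# Synthetic data: error spread grows with x, so the variance is not constant.
y_hetero = [2 + 3 * xi + random.gauss(0, xi) for xi in x]
# Synthetic data: constant error spread.
y_homo = [2 + 3 * xi + random.gauss(0, 3) for xi in x]

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value, chi-squared with 1 d.f.
lm_hetero = breusch_pagan_lm(x, y_hetero)
lm_homo = breusch_pagan_lm(x, y_homo)
print("heteroscedastic data:", lm_hetero, lm_hetero > CHI2_1_CRIT_5PCT)
print("homoscedastic data:", lm_homo, lm_homo > CHI2_1_CRIT_5PCT)
```

With several covariates the auxiliary regression would be a multiple regression and the degrees of freedom would equal the number of covariates.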

Explanatory Power of the Model. R2: Full Model 81.5; Simple Model (register employees as only predictor) 79.6.

Domain analysis of R2 [table: R2 by industry, with columns Simple Model, Full Model and Difference; rows: Manufacturing; Electricity, Gas & Water; Construction; Wholesale; Hotels and Restaurants]

Model validation by data splitting. The full data are split 50% into a training set and a validation set. R2: Training 81.7; Validation 81.4.
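The data-splitting check above can be sketched as follows. This is an illustrative example only: the data are synthetic, the model is a one-covariate regression on the log scale rather than the full model, and the names and coefficient values are assumptions.

```python
import math
import random


def ols_fit(x, y):
    """Simple least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """R^2 of a given fitted line, evaluated on the points (x, y)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


random.seed(0)
# Synthetic records: log FTE roughly linear in log register employees.
records = []
for _ in range(2000):
    log_emp = random.uniform(0, math.log(100))
    log_fte = -0.2 + 0.95 * log_emp + random.gauss(0, 0.4)
    records.append((log_emp, log_fte))

# 50% split into training and validation sets.
random.shuffle(records)
half = len(records) // 2
train, valid = records[:half], records[half:]
x_tr, y_tr = [r[0] for r in train], [r[1] for r in train]
x_va, y_va = [r[0] for r in valid], [r[1] for r in valid]

# Fit on the training half only, then score both halves with that fit.
a, b = ols_fit(x_tr, y_tr)
r2_train = r_squared(x_tr, y_tr, a, b)
r2_valid = r_squared(x_va, y_va, a, b)
print("training R2:", r2_train)
print("validation R2:", r2_valid)
```

A validation R2 close to the training R2, as on the slide, suggests the model is not overfitted to the training data.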

Model validation by bootstrap. Bootstrap samples are drawn from the full data by sampling with replacement (Efron, 1983). Over-optimism is less than 0.05%.
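One common way to estimate over-optimism by bootstrap is sketched below. It assumes the usual optimism estimate: refit the model on each bootstrap sample, then average the difference between its R2 on that sample and its R2 on the original data. The slide does not say exactly which variant was used, and the data, model and names here are synthetic and hypothetical.

```python
import random


def ols_fit(x, y):
    """Simple least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """R^2 of a given fitted line, evaluated on the points (x, y)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


random.seed(42)
n = 300
x = [random.uniform(0, 4.6) for _ in range(n)]
y = [-0.2 + 0.95 * xi + random.gauss(0, 0.4) for xi in x]

# Apparent performance: fit and evaluate on the same (full) data.
a, b = ols_fit(x, y)
apparent_r2 = r_squared(x, y, a, b)

optimism_terms = []
for _ in range(200):
    # Bootstrap sample: n draws with replacement from the full data.
    idx = [random.randrange(n) for _ in range(n)]
    xb = [x[i] for i in idx]
    yb = [y[i] for i in idx]
    ab, bb = ols_fit(xb, yb)
    # Performance on the bootstrap sample minus performance on the full data.
    optimism_terms.append(r_squared(xb, yb, ab, bb) - r_squared(x, y, ab, bb))

optimism = sum(optimism_terms) / len(optimism_terms)
corrected_r2 = apparent_r2 - optimism
print("apparent R2:", apparent_r2)
print("estimated optimism:", optimism)
print("optimism-corrected R2:", corrected_r2)
```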

Back-transformation. Simple back-transformation will give underestimates of the dependent variable on the original scale. Wooldridge (2000) gives an adjustment for the log back-transformation.
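The slide's own formula is not preserved in this transcript. As a hedged sketch, two standard adjustments for a model of the form log(y) = x'beta + u are shown below. The first assumes normal errors with variance sigma^2. The second drops the normality assumption and averages the exponentiated residuals. Which form the presentation used is not known, and the symbols are assumptions.

```latex
% Model on the log scale, with fitted value \widehat{\log y} and residuals \hat{u}_i
\log y = x'\beta + u

% Adjustment under normal errors, u \sim N(0, \sigma^2):
\hat{y} = \exp\!\left(\hat{\sigma}^2 / 2\right)\,\exp\!\left(\widehat{\log y}\right)

% Adjustment without assuming normality:
\hat{y} = \hat{\alpha}_0 \,\exp\!\left(\widehat{\log y}\right),
\qquad
\hat{\alpha}_0 = \frac{1}{n}\sum_{i=1}^{n} \exp\!\left(\hat{u}_i\right)
```

Both correction factors exceed 1 whenever the residuals vary, which is why the unadjusted back-transformation exp(fitted log y) underestimates on the original scale.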

Benefits to business survey estimation. Surveys considered: Monthly Production Inquiry (MPI) and Monthly Inquiry into Distribution Services Sector (MIDSS). Estimates use an expansion estimator. The variance due to stratification is computed assuming Neyman allocation.
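The formulas shown on this slide are not preserved in the transcript. The textbook forms of the two quantities named above are sketched here under assumed notation: stratum h has N_h units, n_h of them sampled (set s_h), with stratum standard deviation S_h of the survey variable, and n is the total sample size. The presentation's exact variance indicator may differ.

```latex
% Expansion estimator of a total under stratified simple random sampling
\hat{T} = \sum_{h} \frac{N_h}{n_h} \sum_{i \in s_h} y_i

% Its variance for a general allocation
\operatorname{Var}(\hat{T}) = \sum_{h} N_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_h^2

% Neyman allocation
n_h = n \,\frac{N_h S_h}{\sum_{k} N_k S_k}

% Substituting the Neyman allocation into the variance
\operatorname{Var}_{\text{Neyman}}(\hat{T})
  = \frac{1}{n}\left(\sum_{h} N_h S_h\right)^{2} - \sum_{h} N_h S_h^{2}
```

Under this form, a stratification variable that makes strata more homogeneous lowers the S_h values and so lowers the variance for a fixed n. That is the basis for comparing register employment, register FTE and modelled FTE as stratifiers.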

Impact on Monthly Surveys [table: variance indicator for MPI Turnover and MIDSS Turnover, by stratification variable: Register Employment; Register FTE; Modelled FTE]

Concluding Remarks. A model was identified for predicting FTE employees: it has high R2 and high predictive power, but non-constant variance and a large reliance on one covariate, employment headcount. Benefits to sample design and estimation: FTE is a useful frame variable; the greatest benefit to sampling is in service industries; the additional benefit from modelling appears small.

Areas for further work. Improvements to modelling: heteroscedasticity (multilevel modelling?); more recent data (2005 – 2008); BRES data. Improvements to evaluation: impact on other business sample surveys; impact at industry level; impact under ratio estimation; correlations between modelled FTE and survey variables (FTE as auxiliary). Pilot study.

Questions? Thank you for listening Contact: