Examining the use of administrative data for annual business statistics Joanna Woods, Ria Sanderson, Tracy Jones, Daniel Lewis.

Presentation transcript:


Overview
- Background
  - Motivation
  - Admin data
  - Variables of interest
- Methods tested
  - Discontinuing the survey
  - Cut-off sampling
- Results
- Conclusions

Motivation
- Drive to increase the use of admin data for business statistics
  - reduce survey costs
  - decrease burden on survey respondents
- One possibility: replace survey data with admin data
  - Some variables have admin data directly available
  - Other variables do not have a direct source of admin data available

Annual Business Survey
- The Annual Business Survey (ABS) collects financial variables
- Target population = UK economy
- Stratified simple random sample by industry, region & employment
- Samples approximately 60,000 businesses
- Businesses with employment > 249 are completely enumerated
- Ratio estimation
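
The ratio estimation step named on this slide can be illustrated with a minimal sketch. The slides do not say which auxiliary variable or which form of the estimator the ABS uses, so the function name, the per-stratum separate ratio form, and all figures below are assumptions made only for illustration.

```python
def ratio_estimate(sample_y, sample_x, population_x_total):
    """Ratio estimate of a stratum total: X_total * (sum of y / sum of x)."""
    sum_x = sum(sample_x)
    if sum_x == 0:
        raise ValueError("auxiliary total in the sample must be non-zero")
    return population_x_total * sum(sample_y) / sum_x


# Hypothetical strata: (sampled y, sampled auxiliary x, known population total of x).
strata = {
    "small": ([10.0, 20.0, 30.0], [1.0, 2.0, 3.0], 60.0),
    # Completely enumerated stratum: the sample covers the whole population.
    "large": ([500.0, 700.0], [50.0, 70.0], 120.0),
}

# Sum the per-stratum ratio estimates to get the overall total.
total = sum(ratio_estimate(y, x, big_x) for y, x, big_x in strata.values())
print(total)
```

In a completely enumerated stratum the sample total of x equals the population total, so the estimate reduces to the observed sum of y.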

Available administrative data
- Two main sources available:
  - VAT turnover data
  - Company accounts data (balance sheet variables)
- These overlap with, but do not fully cover, the target population
- Properties of these data sources are different

Survey population and admin data
[Diagram: the survey population and the administrative data overlap; the overlap is the MATCHED PART]

Administrative data sources
- VAT turnover
  - Created annual data sets for
  - Matched to units in the survey population
  - Match rate 73-75%, few missing values
- Company Accounts (balance sheets)
  - Annual data from April 2003 to March 2009
  - Complex matches to units in survey population
  - Low match rate and many missing values

ABS variables
- ABS variables which do not have admin data directly available include
  - Total Acquisitions: investment in land, existing buildings, and computers
  - Total Disposals: sale of land and existing buildings
- Proportion of zeros varies within each sizeband
  - Total Acquisitions: 71% for 0-9 emp, 9% for > 250 emp
  - Total Disposals: 93% for 0-9 emp, 43% for > 250 emp

Acquisitions & Disposals

Methods Tested
- Aim: to see if admin data sources can be helpful as auxiliary variables in estimating these totals to reduce the sample size.
- Discontinuing the survey
  - Predict values for investment variables based on models derived from past survey data.
- Cut-off sampling
  - Stop sampling some businesses
  - Use admin data to estimate for these units
  - Consider simple ratio adjustment

Methods Tested: Considerations
- Discontinuing the survey
  - Advantages
    - No survey is required (provided admin data is available for all)
  - Disadvantages
    - Model parameters fixed, cannot respond to changes in economy, may introduce bias
    - Different models required for different survey variables
- Cut-off sampling
  - Advantages
    - Reduces the burden placed on small businesses
    - Reduces survey costs
  - Disadvantages
    - Still requires a survey component
    - May introduce bias

Methods tested: Discontinuing the survey
- Produce models using past survey & admin data to produce estimates
  - Linear model: predict values for positive returns
  - Logistic model: predict probability of positive return
- Build a model using data from last survey
- Model covariates can be admin data variables
- Apply model to future years & evaluate results.

Methods tested: Discontinuing the survey - Linear model
- Aim: predict values for acquisitions/disposals
- Have skewed data, use log transformation
- Use positive returns from year t to create a model
- Apply model to year t+1, t+2... to get predicted value for each business
- Back transform prediction to get back to original linear scale
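
The fit-then-back-transform steps above can be sketched as follows. This is a simplified illustration with a single covariate (log turnover) fitted by ordinary least squares; the models in the presentation include further covariates, and the function names and data here are hypothetical. The back-transform is a plain exp(); whether the authors applied any retransformation correction is not stated in the slides.

```python
import math


def fit_log_linear(turnover, acquisitions):
    """OLS fit of log(acquisitions) = a + b * log(turnover), positive returns only."""
    pairs = [
        (math.log(x), math.log(y))
        for x, y in zip(turnover, acquisitions)
        if x > 0 and y > 0  # zero returns are excluded from the linear model
    ]
    n = len(pairs)
    mean_x = sum(lx for lx, _ in pairs) / n
    mean_y = sum(ly for _, ly in pairs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in pairs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, turnover):
    """Predict on the log scale, then back-transform to the original scale."""
    return math.exp(a + b * math.log(turnover))


# Hypothetical year-t returns; the last business returned zero and is dropped.
turnover_t = [100.0, 1000.0, 10000.0, 500.0]
acquisitions_t = [10.0, 100.0, 1000.0, 0.0]

a, b = fit_log_linear(turnover_t, acquisitions_t)
pred = predict(a, b, 2000.0)  # apply the year-t model to a year t+1 turnover
print(a, b, pred)
```

A plain exp() of a log-scale prediction tends to sit closer to the median than the mean of a skewed variable, which is worth keeping in mind when totals are built from such predictions.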

Methods tested: Discontinuing the survey - Logistic model
- Aim: predict probability of company returning a positive value
- Use all returned data from year t to model the probability of a business returning a positive value
- Apply model to predicted values in year t+1
- Multiply linear model prediction & logistic model probability to produce predicted value for every unit
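
The final step, multiplying the linear model prediction by the logistic model probability, can be sketched as below. The coefficients are hypothetical placeholders rather than values from the presentation, both models are reduced to a single covariate (log turnover), and the function names are illustrative only.

```python
import math


def positive_probability(intercept, slope, log_turnover):
    """Logistic model: probability that a business returns a positive value."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * log_turnover)))


def conditional_value(a, b, log_turnover):
    """Log-linear model: back-transformed value, given a positive return."""
    return math.exp(a + b * log_turnover)


def predicted_value(logit_coefs, linear_coefs, turnover):
    """Predicted value for one unit: probability times conditional value."""
    lt = math.log(turnover)
    p = positive_probability(logit_coefs[0], logit_coefs[1], lt)
    return p * conditional_value(linear_coefs[0], linear_coefs[1], lt)


# Hypothetical coefficients (assumptions, not estimates from the survey).
logit_coefs = (-math.log(1000.0), 1.0)  # probability 0.5 at turnover 1000
linear_coefs = (math.log(0.1), 1.0)     # value = 0.1 * turnover when positive

units = [1000.0, 3000.0]  # year t+1 turnover for two businesses
total = sum(predicted_value(logit_coefs, linear_coefs, t) for t in units)
print(total)
```

Every unit receives a non-zero prediction under this scheme, even though many businesses actually return zero; the estimated total relies on the probabilities averaging out across units.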

Results: Discontinuing the survey (Acquisitions)
Best linear model for predicting log(total acquisitions):
- Intercept
- Standard Industrial Classification (SIC) at three digit level
- Region
- Employment band
- log turnover
- log turnover * SIC section
R-squared = 0.66

Results: Discontinuing the survey (Acquisitions)
Best logistic model for predicting probability of a positive return:
- Intercept
- SIC division level
- Region
- Employment band
- log turnover
Produced one of the lowest AIC values

Results: Discontinuing the survey

Methods tested: Cut-off sampling
- Reduces burden but introduces bias
- Create a cut-off, based on employment
- Stop sampling below the cut-off
- Use sample information above the cut-off to estimate for units below the cut-off in an effort to reduce bias
- Missing data and match rates are the main difficulty => can't be applied to full survey population, still need a sample

Simple ratio adjustment
Estimate for units below the cut-off =
  (Total of auxiliary variable below cut-off) × (Estimate of variable of interest above cut-off) / (Estimate of auxiliary variable above cut-off)
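
A small worked example of the simple ratio adjustment, assuming an admin auxiliary variable such as VAT turnover is available for units below the cut-off. The function names and all figures are hypothetical and serve only to show the arithmetic.

```python
def below_cutoff_estimate(aux_total_below, y_estimate_above, aux_estimate_above):
    """Scale the auxiliary total below the cut-off by the ratio observed above it."""
    if aux_estimate_above == 0:
        raise ValueError("auxiliary estimate above the cut-off must be non-zero")
    return aux_total_below * y_estimate_above / aux_estimate_above


def total_estimate(aux_total_below, y_estimate_above, aux_estimate_above):
    """Overall total: survey estimate above the cut-off plus the adjusted part below."""
    below = below_cutoff_estimate(aux_total_below, y_estimate_above, aux_estimate_above)
    return y_estimate_above + below


# Hypothetical figures for one industry division.
aux_below = 200.0   # admin turnover total for units below the cut-off
y_above = 900.0     # survey estimate of acquisitions above the cut-off
aux_above = 1800.0  # estimate of turnover above the cut-off

below = below_cutoff_estimate(aux_below, y_above, aux_above)
total = total_estimate(aux_below, y_above, aux_above)
print(below, total)
```

The adjustment assumes the ratio of the variable of interest to the auxiliary variable is the same below and above the cut-off; where that does not hold, the estimate is biased.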

Results: Simple ratio adjustment

Conclusions
- Discontinuing survey: not an option for this variable
  - Under predicts
  - Growth rates differ
- Cut-off sampling with simple ratio adjustment
  - can give reasonable results in some divisions but not all
  - sample size savings can be made where method works well but is dependent on match rate
  - multiple auxiliary variables are required

Any questions?