Download presentation
Presentation is loading. Please wait.
Published byMeredith Harvey Modified over 6 years ago
1
Improvements in editing methods and processes for use of Value Added Tax data in UK National Accounts Martina Portanti and Robert Breton Office for National Statistics, UK
2
Overview ONS use of VAT turnover data Data challenges
Current methods micro editing and processing Recommended methods improvements System Timeline
3
ONS use of VAT data Historically VAT used as a source for the Business register 2013 ESSNet on Administrative data provided opportunity to explore methods to use VAT turnover data for producing business statistics Ad-hoc system built to process data for further research/exploration Since then, attempt to re-use system for statistical production purposes Limitations in both methods and system
4
Data challenges Data needs to be calendarised to a monthly series
Data needs to be linked and apportioned to Reporting Unit level to ensure correct industrial classification and identify business employment sizeband 1. Businesses can file VAT returns on any of 16 patterns (monthly; 3 x quarterly; 12 x annual). 2. VAT returns are available at VAT reporter level while ONS business statistics are compiled from information collected at reporting unit (RU) level.
5
Current process Three editing stages Take on After apportionment
1 Three editing stages Take on After apportionment After calendarisation Structural validation at stage 1 2 Content validation at stage 2 and 3 Thousands pounds rule Quarterly pattern rule Suspicious turnover rule 3
6
Issues Position of editing modules Validation methods Cleaning methods
Macro-editing stage Position of editing module. Only structural validation takes place at the VAT unit level; recommendation to include a content validation module earlier in the process Validation method – less problematic as current rules set to pick up systematic errors where the error mechanism is known; the suspicious turnover rule could be reviewed as necessary to ensure that is still the most efficient way of identifying large errors Over-reliance upon automatic cleaning. The suspicious turnover rule, in particular, is likely to cause overediting. A business that fail the suspicious turnover rule will have its current return discard and replaced by an imputed value using ratio imputation. The macro-editing stage (not captured in the diagram above) is quite resource intense. While this can be expected because of the new source, inefficiencies in the micro-editing stage also likely drive the amount of resources required at macro-level
7
Target process Key changes:
1 Key changes: Content validation module at VAT unit level Selective editing stage No editing after calendarisation 2 2 3
8
Recommended improvements/1
1. Content validation module at VAT unit level “The early, the better” £000 rule and quarterly pattern rule Error detection Error correction may still be required at RU level
9
Recommended improvements/2
2. Selective editing stage Clerical investigation of influential cases earlier in the process Accept Impute Manual construction Decrease macro-editing effort Identify genuine business changes Improve knowledge of new data source Improve micro-editing methods
10
Recommended improvements/3
3. No editing after calendarisation Ensure impact of editing vs calendarisation can be easily separated Allow more sophisticated calendarisation methods to be applied Streamline data processing system
11
Current system SAS-macro based system utilising 20 main macros
Around 200 files created to process each data period 1 day processing Lack of interactive features to query and clean data Lack of flexibility to test different methods/sequencing
12
Ongoing system development
Distributing computing technology SPARK Jupyter notebook Oozie Processing time down to 5 minutes to process 4 years of data Agile approach with collaborative work across methodologists, data scientists and IT developers Spark (coding language to take advantage of distributing power to split processing across nodes –py spark Oozie orchestration tool to put together sparkjobs in a workflow Jupyter notebook development tool to have code and graphs in the same place; interactive tool to quickly develop, explore data and communicate with business areas Then code can be lifted and put into a spark module for production processes
13
Jupyter notebook/1
14
Jupyter notebook/2
15
Jupyter notebook/3
16
Oozie workflow
17
Timeline July 2017 – delivery of a VAT turnover processing system
July 2017-December 2017 – National Accounts testing within NA systems End of 2017 – publication of Q3 quarterly national accounts estimates including VAT data for the first time
18
Thank you Contact details Martina Portanti martina.portanti@ons.gov.uk
Head of Processing, Editing and Imputation (Business surveys) Survey Methodology and Statistical Computing division Robert Breton Head of Data Sources Data Science Economic Statistics Data Sources
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.