1 Methods for detecting errors in VAT Turnover data Phil Lewis Processing, Editing and Imputation branch Business Statistics Methods-Survey Methodology.

Presentation transcript:

1 Methods for detecting errors in VAT Turnover data Phil Lewis Processing, Editing and Imputation branch Business Statistics Methods-Survey Methodology

2 Outline of talk
Detecting suspicious patterns
Methods for detecting unit errors
Consider 5 methods
Comparing methods
Results
Conclusion and recommendations

3 Detecting suspicious patterns One of the problems with VAT Turnover data is that it is often not possible to re-contact businesses to get an idea of their true Turnover figure. It is often possible to identify errors in VAT Turnover data by considering the pattern of reported Turnover over a period.

4 Hoogland (2010)
i. Zero Turnover in three quarters, positive Turnover in the other quarter
ii. Zero Turnover in one quarter, positive Turnover in the other three quarters
iii. Same Turnover in all four quarters
iv. Same Turnover for three quarters, a different (positive) Turnover value in the other quarter
v. Negative Turnover in any of the quarters
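The five Hoogland (2010) patterns can be checked mechanically for a business with four quarterly returns. The following is a minimal sketch, not taken from the talk: the function name and the pattern labels are hypothetical, and it assumes exactly four quarterly values per business. Patterns can overlap (three zeros and one positive value matches both i and iv).

```python
from collections import Counter

def hoogland_patterns(quarters):
    """Return the set of suspicious-pattern labels (i-v) matched by four
    quarterly Turnover values. Names and labels are illustrative only."""
    if len(quarters) != 4:
        raise ValueError("expected exactly four quarterly values")
    found = set()
    zeros = sum(1 for q in quarters if q == 0)
    positives = sum(1 for q in quarters if q > 0)
    # i: zero in three quarters, positive in the other quarter
    if zeros == 3 and positives == 1:
        found.add("i")
    # ii: zero in one quarter, positive in the other three
    if zeros == 1 and positives == 3:
        found.add("ii")
    counts = Counter(quarters)
    # iii: same Turnover in all four quarters
    if len(counts) == 1:
        found.add("iii")
    # iv: same Turnover in three quarters, a different positive value in the other
    if len(counts) == 2:
        (_, n_common), (other, _) = counts.most_common()
        if n_common == 3 and other > 0:
            found.add("iv")
    # v: negative Turnover in any quarter
    if any(q < 0 for q in quarters):
        found.add("v")
    return found
```

Returning a set of labels rather than a single flag keeps the overlapping cases visible for whoever reviews the record.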

5 Methods for detecting unit errors in reported VAT Turnover If [condition not preserved in this transcript] then assume the current VAT Turnover has been reported in thousands of pounds and multiply by 1000 to get a figure in pounds.
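The condition used on this slide is not reproduced in the transcript, so the following sketch should be read as one possible shape for such a rule rather than the rule from the talk. It assumes, hypothetically, that the current value is compared with a reference value for the same business (for example a previous return) and that a ratio close to 1/1000 signals reporting in thousands. The names `reference`, `lower` and `upper` are assumptions, and the bounds are deliberately left without defaults.

```python
def correct_unit_error(current, reference, lower, upper):
    """Return (value, corrected_flag).

    Assumed rule (not from the source): if current / reference lies strictly
    between `lower` and `upper` (bounds bracketing 0.001), treat `current`
    as reported in thousands of pounds and multiply it by 1000.
    """
    if reference <= 0:
        # No usable reference value, so leave the reported figure alone.
        return current, False
    ratio = current / reference
    if lower < ratio < upper:
        return current * 1000, True
    return current, False
```

For example, with hypothetical bounds of 0.0005 and 0.002, a return of 1,500 against a reference of 1,450,000 would be scaled up to 1,500,000, while a return of 1,400,000 would be left unchanged.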

6 1 – Quartile distances in industry Turnover Based on a method described in Hoogland and Van Haren (2007) to identify unusually large or small Turnover by locating extreme values in the distribution of VAT Turnover within a particular industry and size class.

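7 Suspicious Turnover is identified as follows, where Q1, Median and Q3 are the quartiles of VAT Turnover within the industry and size class. Turnover is suspicious if
Turnover > Q3 + [C × (Q3 – Median)] or
Turnover < Q1 – [C × (Median – Q1)]
C may be given different values for different industry and size classes.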
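As an illustration, the quartile distance rule can be applied to the Turnover values of one industry and size class as follows. This is a sketch under assumptions: the function name is hypothetical, and the quartile definition (Python's "inclusive" interpolation) is a choice made here, since the talk does not say how quartiles were computed.

```python
from statistics import quantiles

def quartile_distance_flags(turnovers, c):
    """Flag Turnover values outside the quartile distance bounds for a
    single industry and size class. Needs at least two values."""
    # Quartile definition is an assumption; other conventions differ slightly.
    q1, median, q3 = quantiles(turnovers, n=4, method="inclusive")
    upper = q3 + c * (q3 - median)   # Q3 + C x (Q3 - Median)
    lower = q1 - c * (median - q1)   # Q1 - C x (Median - Q1)
    return [t > upper or t < lower for t in turnovers]

# One class with a single extreme value, using C = 10
flags = quartile_distance_flags([10, 12, 11, 13, 12, 11, 10, 1000], c=10)
```

Because the bounds are built from distances to the median on each side, a skewed class gets asymmetric limits, and a larger C flags fewer businesses.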

8 2 – Period on period ratios Method 2 comes from De Jong (2003) and involves calculating period on period ratios for each business, based on the contribution that business’s Turnover makes to its class. For each business calculate: [formula not preserved in this transcript]

9 Then calculate [formula not preserved in this transcript], where [symbol not preserved] is the value of Score in period t.

10 3 – Comparison with reporting history for the business The method is described in slightly different forms in Hoogland and Van Haren (2007), Lorenz (2010) and Röstel (2010). Note that this method only identifies suspiciously large Turnover.

11 If Turnover > £100 million and Turnover > 10 × mean Turnover for the business in the past 24 months, then treat as suspicious.
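The rule above translates directly into a short check. In this sketch the function and parameter names are hypothetical, `history` is assumed to hold the business's reported Turnover values from the past 24 months, and a business with no history is simply not flagged (a choice made here, not stated in the talk).

```python
from statistics import mean

def suspicious_vs_history(turnover, history, floor=100_000_000, multiple=10):
    """Flag Turnover that exceeds both the absolute floor (£100 million)
    and `multiple` times the business's mean Turnover over its history."""
    if not history:
        # Assumption: without reporting history the rule cannot be applied.
        return False
    return turnover > floor and turnover > multiple * mean(history)
```

Both conditions must hold, which is why the method only picks up suspiciously large values from large reporters.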

12 4 – Quartile differences combined with measure of influence Refinement to method 1, inspired by Hoogland et al (2009). Calculate the influence as the proportion of VAT Turnover the business contributes to the total VAT Turnover in the industry and size class. Combine detection of suspicious values using quartile differences with the influence.

13 Identify unusual Turnover values using the quartile distances measure described in method 1. Reminder of method 1, suspicious Turnover:
Turnover > Q3 + [C × (Q3 – Median)] or
Turnover < Q1 – [C × (Median – Q1)]

14 Then for each business calculate [formula not preserved in this transcript]. This method effectively subsets businesses failing the quartile distance method, so that only the most influential are viewed as being suspicious.
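The exact combination formula for method 4 is not reproduced in the transcript. The sketch below shows one way the described idea could work: apply the quartile distance rule, compute each business's influence as its share of total class Turnover, and keep only failing businesses whose influence reaches a cut-off. The cut-off `min_influence`, the function name and the quartile convention are all assumptions.

```python
from statistics import quantiles

def influential_suspects(turnovers, c, min_influence):
    """Return indices of businesses that fail the quartile distance rule
    AND contribute at least `min_influence` of total class Turnover."""
    q1, median, q3 = quantiles(turnovers, n=4, method="inclusive")
    upper = q3 + c * (q3 - median)
    lower = q1 - c * (median - q1)
    total = sum(turnovers)
    suspects = []
    for i, t in enumerate(turnovers):
        fails_quartile_rule = t > upper or t < lower
        # Influence: share of the class total contributed by this business.
        influence = t / total if total > 0 else 0.0
        if fails_quartile_rule and influence >= min_influence:
            suspects.append(i)
    return suspects

# Both 10000 and 1 fail the quartile rule with C = 8, but only 10000 is influential.
picked = influential_suspects(
    [100, 120, 110, 130, 120, 110, 100, 10000, 1], c=8, min_influence=0.01
)
```

An alternative with the same effect is to rank the failing businesses by influence and review only the top few, which avoids fixing a cut-off in advance.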

15 5 – Hidiroglou-Berthelot method
Compare to the previous period’s value:
Form the ratio r = current VAT Turnover / previous VAT Turnover
Transform the ratio, with m the median of the ratios: if r < m then t = (r – m) / r, otherwise t = (r – m) / m
Define E = t × [max{current VAT Turnover, previous VAT Turnover}]^V

16 Then calculate [formulas not preserved in this transcript]. Suspicious businesses are then identified as follows: If [condition not preserved] or [condition not preserved].
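Since the formulas on this slide did not survive, the sketch below follows the commonly published form of the Hidiroglou-Berthelot (1986) edit rather than anything confirmed by the talk: distances d_Q1 = max(E_M – E_Q1, |A × E_M|) and d_Q3 = max(E_Q3 – E_M, |A × E_M|), with a business flagged if E < E_M – C × d_Q1 or E > E_M + C × d_Q3. The function name, the default for the guard term A and the quartile convention are assumptions, and all Turnover values are assumed to be strictly positive.

```python
from statistics import median, quantiles

def hidiroglou_berthelot_flags(current, previous, c, v=1.0, a=0.05):
    """Flag businesses with the standard Hidiroglou-Berthelot edit.
    Assumes strictly positive current and previous Turnover."""
    ratios = [cur / prev for cur, prev in zip(current, previous)]
    m = median(ratios)
    # Symmetrising transformation of the ratio around its median.
    t = [(r - m) / r if r < m else (r - m) / m for r in ratios]
    # Effect: transformed ratio weighted by the size of the business.
    effects = [ti * max(cur, prev) ** v
               for ti, cur, prev in zip(t, current, previous)]
    e_q1, e_m, e_q3 = quantiles(effects, n=4, method="inclusive")
    d_q1 = max(e_m - e_q1, abs(a * e_m))
    d_q3 = max(e_q3 - e_m, abs(a * e_m))
    return [e < e_m - c * d_q1 or e > e_m + c * d_q3 for e in effects]

previous = [100, 100, 100, 100, 100, 100, 100]
current = [100, 102, 98, 101, 99, 103, 5000]
hb_flags = hidiroglou_berthelot_flags(current, previous, c=10)
```

Raising V gives more weight to large businesses, so a given relative change is more likely to be flagged for a large business than for a small one.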

17 A key difference between survey and administrative data is that with administrative data it is often not possible to re-contact the business and ask them to confirm any suspicious values. Evaluation of detection methods is not straightforward and cannot usually be definitive.

18 Comparing methods Diagnostics include the proportion of businesses identified as suspicious within each industry and size class and the average size (employment) and VAT Turnover of suspicious businesses compared with the rest of the class.
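These diagnostics are simple to compute once each business carries a suspicious flag. The sketch below assumes a hypothetical record layout (dictionaries with the keys `cls`, `employment`, `turnover` and `suspicious`); none of these names come from the talk.

```python
from collections import defaultdict
from statistics import mean

def class_diagnostics(records):
    """Per industry and size class: proportion flagged, plus mean employment
    and mean Turnover for flagged businesses versus the rest."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec["cls"]].append(rec)

    def avg(rows, key):
        # None when a group is empty, so callers can tell it apart from 0.
        return mean(r[key] for r in rows) if rows else None

    result = {}
    for cls, rows in by_class.items():
        flagged = [r for r in rows if r["suspicious"]]
        rest = [r for r in rows if not r["suspicious"]]
        result[cls] = {
            "prop_flagged": len(flagged) / len(rows),
            "mean_employment_flagged": avg(flagged, "employment"),
            "mean_employment_rest": avg(rest, "employment"),
            "mean_turnover_flagged": avg(flagged, "turnover"),
            "mean_turnover_rest": avg(rest, "turnover"),
        }
    return result

sample = [
    {"cls": "A", "employment": 50, "turnover": 1000, "suspicious": True},
    {"cls": "A", "employment": 10, "turnover": 100, "suspicious": False},
    {"cls": "A", "employment": 20, "turnover": 200, "suspicious": False},
    {"cls": "A", "employment": 30, "turnover": 300, "suspicious": False},
]
diag = class_diagnostics(sample)
```

Running the same function on the flags produced by each method gives a like-for-like comparison across methods.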

19 Results of testing detection methods with VAT data If businesses with larger Turnover values are of more importance, method 4 (Quartile differences & influence) and method 5 (Hidiroglou-Berthelot) offer the flexibility to give higher weight to those businesses.

20 If good quality historic data are available, then method 2 (Period on period ratios) and method 3 (Comparison with history) are likely to give good results.

21 Method 1 (Quartile differences) and the related method 4 (Quartile differences & influence) should be effective in identifying extreme values when only the current period data are available.

22 Results of testing detection methods with VAT data

23 Estimated false hits

24 Conclusion and recommendations Each of these methods uses parameters which can be fine-tuned to identify an appropriate number of suspicious businesses. The effective values of these parameters are likely to differ between data sources. Therefore, rather than prescribe specific values, it is recommended that the parameters are set through analysis of the effect of the method on the VAT data under consideration.

25 Before applying any detection methods
Suspicious patterns: it is recommended that VAT data are checked for these patterns before implementing any other error detection method.
Unit errors: relatively easy to identify and correct. It is recommended that an automatic method is developed to detect and correct any unit errors in VAT Turnover data, before applying any other rules.

26 The final recommendation is that in developing methods for detecting errors in VAT Turnover data, it is always useful to understand the data source and the possible errors that may be found in it. In many cases, it will be necessary to liaise with the data providers to get this information.

27 References:
De Jong, A. "Impect: Recent developments in harmonized processing and selective editing", Proceedings of UNECE Work Session on Statistical Data Editing, Madrid, October 2003: Web.
Hidiroglou, M. A. and Berthelot, J.-M. "Statistical Editing and Imputation for Periodic Business Surveys", Survey Methodology, June 1986, Vol. 12, No. 1, pp 73-83: Journal.
Hoogland, J. "Editing strategies for VAT data", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web.
Hoogland, J. and Van Haren, G. "Editing and integrating VAT and SBS data", Proceedings of the third International Conference on Establishment Surveys (ICES-III), Montreal, June 2007: CD ROM.

28 References:
Hoogland, J., Van Bemmel, K. and De Wolf, P-P. "Detection of potential influential errors in VAT turnover data used for short term statistics", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web.
Lorenz, R. "The integrated system of editing administrative data for STS in Germany", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web.
Seyb, A., Stewart, J., Chiang, G., Tinkler, I., Kupferman, L., Cox, V. and Allan, D. "Automated editing and imputation system for administrative financial data in New Zealand", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web.

29 Extra information For method 2, we used a threshold of 25 as a compromise between the monthly and quarterly data. For method 3, we used the thresholds described in Hoogland and Van Haren (2007). For method 5, the Hidiroglou-Berthelot rule, we used a value of V = 1 to give extra weight to businesses with larger Turnover, as this has been shown to work well with business data in the past. The value of C for this method was 250. Method 1 used a value of C = 10 in the quartile method to give the same proportion of failures. For method 4 we chose a value of C = 8 in the quartile method and then prioritised the businesses failing that method by VAT Turnover to give a similar proportion of failures as methods 1 and 5.