UNITED NATIONS - ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing Use of administrative data.

UNITED NATIONS - ECONOMIC COMMISSION FOR EUROPE
CONFERENCE OF EUROPEAN STATISTICIANS
Work Session on Statistical Data Editing
Use of administrative data for selective editing: the case of business investments
Marco Di Zio (Istat), Paolo Forestieri (Istat), Ugo Guarnera (Istat), Massimiliano Iommi (Istat), Antonio Regano (Istat)
Paris, 28-30/04/2014

Outline
1. Motivations
2. Selective editing using SeleMix
3. Application of SeleMix to investment data
4. Validation
5. Conclusions and future research
Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

Motivations (1/4)
- Investments (Gross Fixed Capital Formation) are of great interest in National Accounts because:
  o They are an important component of final demand (and hence of GDP)
  o They are an input for estimating capital stock and consumption of fixed capital (both needed to produce productivity and profitability indicators)
- Traditionally, the National Accounts department at Istat performs some editing of SBS data (it is not a simple "user" of SBS data)
- However, the selective editing procedure we describe is not "National Accounts specific"

Motivations (2/4)
- Administrative data have recently entered the SBS production process at Istat
- Istat currently has access to a rich set of administrative data on the variables reported in profit and loss accounts and (only for corporations and limited companies) in balance sheets.
- Investment data are reported only in the Notes to Financial Statements
  o The Notes comprise a summary of significant accounting policies and details of the values reported in the profit and loss account and in the balance sheet
- Istat has access to the explanatory notes of corporations and limited companies only in the form of non-standardized text files (one for each company)
- There is no statistical database that could be used to produce SBS data on investment or to correct data automatically
  o investment values are only gathered through the SBS survey
- The notes are nevertheless a valuable source for recovering the "true" value of investment for a limited number of firms

Motivations (3/4)

Motivations (4/4)
- Recovering investment data from the explanatory notes is a time-consuming process.
- The goal of the selective editing framework is precisely to minimize the number of checks by focusing on the units where editing has the highest expected benefit.
- Typically, selective editing is useful if the "true" value can be recovered with high reliability at the review stage
- In the case of investments, unit values are revised by consulting the Explanatory Notes.

Selective editing using SeleMix (1/3)
- The selective editing method is based on explicitly modelling both the true (error-free) data and the error mechanism (see Di Zio and Guarnera, 2013, for details).
- The model assumptions can be summarized as follows:
  o True data are thought of as n realizations of a random p-vector Y that, conditional on a set of q covariates X, is normally distributed with mean vector BX and covariance matrix Σ
  o The intermittent nature of the error is modelled through a Bernoullian random variable
  o Conditional on the presence of error, we assume a Gaussian additive error with zero mean and covariance matrix proportional to Σ
- The distribution of true data conditional on observed data is a mixture of a point mass, corresponding to absence of error, and a Gaussian distribution, corresponding to presence of error
- The model parameters can be estimated by maximizing the likelihood function based on the observed data
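The assumptions above can be illustrated with a small univariate simulation. This is only a sketch, not the SeleMix code: the parameters are treated as known rather than estimated by maximum likelihood, and the names `pi` (error probability), `lam` (variance inflation factor, so that the error variance is `(lam - 1) * sigma2`), `posterior_error_prob` and all numeric values are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def posterior_error_prob(y, mu, sigma2, pi, lam):
    """P(error | observed y) under the two-component Gaussian mixture."""
    clean = (1.0 - pi) * normal_pdf(y, mu, sigma2)
    contaminated = pi * normal_pdf(y, mu, lam * sigma2)
    return contaminated / (clean + contaminated)

rng = np.random.default_rng(0)
n, beta, sigma2, pi, lam = 1000, 1.0, 0.25, 0.05, 20.0  # hypothetical values

x = rng.normal(10.0, 1.0, size=n)                # error-free covariate
y_true = beta * x + rng.normal(0.0, np.sqrt(sigma2), size=n)  # true data
is_error = rng.random(n) < pi                    # Bernoullian error indicator
eps = rng.normal(0.0, np.sqrt((lam - 1.0) * sigma2), size=n)  # additive error
y_obs = y_true + is_error * eps                  # observed data

# Probability that each observed value is affected by an error
tau = posterior_error_prob(y_obs, beta * x, sigma2, pi, lam)
```

Values close to the regression prediction get a posterior error probability below `pi`, while values far from it get a probability close to one.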

Selective editing using SeleMix (2/3)
- The selective editing strategy uses the estimated distribution of true data conditional on observed data to build a score function.
- Score function = difference between observed and anticipated value
- The anticipated value is a weighted average of the observed value and a synthetic value.
  o The weights are given by the probability of being in error.
  o The synthetic value is in turn a weighted average of the observed value and a robust estimate of the regressed value.
- Once all the observations have been ordered according to this score function, it is possible to estimate the residual error remaining in the data after the correction of the first k units (k = 1, ..., n).
- The number of most critical units to be edited can be chosen so that the estimate of the residual error is below a prefixed threshold
- SeleMix makes it possible to relate the editing effort explicitly to the accuracy of the target estimates.
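The strategy on this slide can be sketched in Python as follows. This is a minimal illustration, not the SeleMix implementation. The function name `select_units`, the variance inflation factor `lam`, the form of the synthetic value (the conditional expectation of the true value given an error, under an assumed Gaussian model with error covariance `(lam - 1)` times Σ) and the use of the sum of the remaining scores as the residual-error estimate are all assumptions.

```python
import numpy as np

def select_units(y_obs, y_pred, tau, lam, weights, threshold):
    """Rank units by score and pick the smallest k meeting the threshold.

    y_pred : (robust) regression prediction for each unit
    tau    : estimated probability that each unit is in error
    """
    y_obs, y_pred, tau, weights = (
        np.asarray(a, dtype=float) for a in (y_obs, y_pred, tau, weights)
    )
    # Synthetic value: weighted average of observed and regressed value
    y_syn = y_obs / lam + (1.0 - 1.0 / lam) * y_pred
    # Anticipated value: weighted average of observed and synthetic value
    y_ant = (1.0 - tau) * y_obs + tau * y_syn
    total = np.sum(weights * y_ant)
    # Score: weighted difference between observed and anticipated value,
    # relative to the estimated total
    score = weights * np.abs(y_obs - y_ant) / abs(total)
    order = np.argsort(-score)
    sorted_score = score[order]
    # residual[k] = estimated error left after correcting the top k units
    residual = np.concatenate((np.cumsum(sorted_score[::-1])[::-1], [0.0]))
    k = int(np.argmax(residual <= threshold))
    return order[:k], score, residual

# Hypothetical data: the third unit is far from its prediction
selected, score, residual = select_units(
    y_obs=[100.0, 105.0, 1000.0, 98.0],
    y_pred=[100.0, 100.0, 100.0, 100.0],
    tau=[0.01, 0.01, 0.99, 0.01],
    lam=20.0,
    weights=[1.0, 1.0, 1.0, 1.0],
    threshold=0.05,
)
```

With these made-up inputs only the third unit is selected, because correcting it alone brings the estimated residual error below the 5% threshold.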

Selective editing using SeleMix (3/3)
Notation:
- Y*: true data
- Y: observed data
- X: covariates (no error)
- B: regression coefficients
- U: residuals
- I: Bernoullian variable
The slide gives the equations for the true data model, the error model and the distribution of observed data.
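The three equations are not preserved in the transcript. A reconstruction consistent with the assumptions listed on slide (1/3) is sketched below. The symbols π (error probability), ε (additive error) and λ > 1 (variance inflation factor) are assumptions introduced here, and the exact parametrization of the proportionality constant in the original slide may differ.

```latex
% True data model
Y^{*} = BX + U, \qquad U \sim N(0, \Sigma)

% Error model (intermittent additive Gaussian error)
Y = Y^{*} + I\,\varepsilon, \qquad
I \sim \mathrm{Bernoulli}(\pi), \qquad
\varepsilon \sim N\bigl(0, (\lambda - 1)\Sigma\bigr)

% Distribution of observed data (two-component Gaussian mixture)
f_{Y}(y \mid x) = (1 - \pi)\, N(y;\, Bx,\, \Sigma) + \pi\, N(y;\, Bx,\, \lambda\Sigma)
```

Under this parametrization, the covariance of an erroneous observation is Σ + (λ − 1)Σ = λΣ, which is proportional to Σ as stated in the model assumptions.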

Application of SeleMix to investment data (1/3)
- Data source: Istat Annual Survey on Economic and Financial Accounts of Large Enterprises, year 2010
  o Actually a census, covering all enterprises operating in Italy with at least 100 employees
  o Industrial and services sectors, excluding financial services
  o EC Structural Business Statistics
- Units: responding enterprises (5280 observations)
- Target variable: gross investment in tangible goods
- Covariates: investment of the same firm in the previous year (or, if it is not available, depreciation for the current year)
- Covariates are considered free of errors, since they have been treated and officially released

Application of SeleMix to investment data (2/3)
- Estimation domains:
  o Model estimated separately for each NACE Rev. 2 section
  o Impact of errors evaluated at the level of 64 industries, corresponding to the A*64 classification of economic activity that is used to disseminate National Accounts data
- The threshold used for the selection of critical units is chosen such that the estimated relative residual error for each industry is 5%
  o threshold chosen taking into account the number of units that can reasonably be checked with the available resources
- The model is estimated on the subset of data with positive values of the analyzed variables
  o predictions are also computed for units whose observed value is zero
- A weight accounting for non-response is computed and used in the score function

Application of SeleMix to investment data (3/3)
- Hit-rate not very high, but the number of selected units is very low

N. Obs.   Selected   % Selected   Edited   Hit-Rate %
...       ...        ...          65       44.5%

SBS Original Value (A)   Post-editing Value (B)   (B - A)       (B - A) / A
... mln EUR              ... mln EUR              ... mln EUR   -8%

- Notwithstanding the low number of selected units, quite a high impact on the estimated value: efficient procedure

Validation (1/5)
- The assessment of the quality of the procedure cannot be based only on its efficiency
- It is also necessary to quantify the impact of the residual errors on the estimates.
- The ideal validation of the procedure would consist in revising all the non-selected observations of the sample by looking at their reported values in the notes
  o We could find the error left in each non-selected unit (residual error)
  o We could compute the impact of residual errors on the released estimate
- The manual revision of all non-selected observations is not feasible in practice unless a large investment in resources is planned.
- Alternative strategy: manual revision of all the non-selected observations in some industries
- The analysis of the results in each industry is quite informative, because the threshold used (set equal to 5%) refers to each industry

Validation (2/5)
- Three industries were manually revised
- Industries 17 and 32: satisfactory results
  o Post-editing residual percent error (RE) close to the expected one (5%)
  o Strong improvement obtained by selecting few units
- Industry 12: not very satisfactory
  o The selective procedure does not select any observation and the error is 12.6%.

Industry                                                                               N. Obs.   N. Selected obs.   N. Edited obs.   RE - raw data   RE - post editing
12 - Manufacture of basic pharmaceutical products and pharmaceutical preparations    ...       ...                ...              ...%            ...
17 - Manufacture of computer, electronic and optical products                        ...       ...                ...              ...%            7.2%
32 - Water transport                                                                  ...       ...                ...              ...%            -5.5%
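The slides do not spell out how RE is computed. One plausible reading, given here only as an assumption, is the percent difference between the (weighted) industry total computed on the data and the total computed on the fully revised values. The function name `relative_error_pct` and the four-firm industry below are made up for illustration.

```python
import numpy as np

def relative_error_pct(values, reference, weights=None):
    """Percent error of a weighted total with respect to a reference total."""
    values = np.asarray(values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if weights is None:
        w = np.ones_like(values)
    else:
        w = np.asarray(weights, dtype=float)
    total_ref = np.sum(w * reference)
    return 100.0 * (np.sum(w * values) - total_ref) / total_ref

# Hypothetical industry with four firms
revised = [10.0, 20.0, 30.0, 40.0]      # values recovered from the notes
raw = [12.0, 20.0, 45.0, 40.0]          # values as reported in the survey
post_editing = [12.0, 20.0, 30.0, 40.0] # only the third firm was selected and corrected

re_raw = relative_error_pct(raw, revised)            # error before editing
re_post = relative_error_pct(post_editing, revised)  # residual error after editing
```

In this toy case correcting a single firm reduces the error of the total from 17% to 2%, which mirrors the kind of comparison reported for industries 17 and 32.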

Validation (3/5)
- What happened in industry 12?
- The error is due almost entirely to one company that was not selected by the editing procedure:
  o The company made the same misclassification error in 2009 and 2010, and the error in 2009 was not detected
- The company registered as investment of the year both expenditures made in 2010 and reclassifications across different asset categories made during the year
  o Only expenditures of the year should actually be registered as investment of the year
- The company followed the same criteria (and so made the same error) when compiling the SBS questionnaire in both 2009 and 2010, and the error in 2009 was not detected.
- The values of expenditures and reclassifications in 2010 are quite similar to the values in 2009 and are not outliers in the cross-section distribution

Validation (4/5)
- Reported only to illustrate the type of error. Eni is not the firm we referred to in the previous slide

Validation (5/5)
- From the point of view of the statistical model:
  o The current value of investment is not atypical conditional on the historical value.
  o In addition, since the 2009 data had been treated and officially released, they were used as error-free covariates in the model.
- This makes the selection of this error almost impossible.
- SeleMix can also handle the case where no auxiliary variable is available.
- In this application, we could also have treated the investment of the previous year as a variable with errors
  o the observation would have been selected only if the values of the investments were jointly considered outliers
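A toy calculation shows why such an error is nearly invisible to the model. The sketch assumes a univariate Gaussian contamination model on the log scale, with the previous year's value used as an error-free predictor. The parameter values, the firm figures and the function name `error_prob` are all hypothetical.

```python
import numpy as np

def error_prob(residual, sigma2, pi, lam):
    """Posterior probability of error given the regression residual."""
    clean = (1.0 - pi) * np.exp(-0.5 * residual**2 / sigma2) / np.sqrt(sigma2)
    contaminated = (
        pi * np.exp(-0.5 * residual**2 / (lam * sigma2)) / np.sqrt(lam * sigma2)
    )
    return contaminated / (clean + contaminated)

sigma2, pi, lam = 0.25, 0.05, 20.0  # hypothetical model parameters

# Firm A repeats the same misclassification in both years:
# it reports 180 in 2009 and 180 in 2010, while the correct value is 100.
resid_a = np.log(180.0) - np.log(180.0)

# Firm B reports 100 in 2009 and, by mistake, 900 in 2010.
resid_b = np.log(900.0) - np.log(100.0)

tau_a = error_prob(resid_a, sigma2, pi, lam)  # persistent error
tau_b = error_prob(resid_b, sigma2, pi, lam)  # one-off error
```

Firm A's reported value agrees perfectly with its own (erroneous) history, so its error probability, and hence its score, is close to zero, whereas firm B's one-off error is flagged with a probability close to one.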

Summing up
- The selective editing procedure proved to be quite efficient: strong improvements in the results were obtained by selecting few units
- The survey managers consider the selective editing procedure feasible in terms of costs.
- The analysis performed to validate the method is encouraging and should be extended to other estimation domains.
- One result of the validation analysis is that it is difficult to select an observation with an erroneous investment that is not atypical with respect to the historical value.

Future developments
- Using other administrative variables as covariates
  o Expenditures for amortizable goods reported in VAT declarations
  o A derived variable based on the assets at the end of the year minus assets at the beginning plus depreciation and revaluation
- Identifying observations with an erroneous investment that is not atypical with respect to the historical value
- Increasing the hit-rate.
- Experimenting with an alternative validation procedure, based on samples of non-selected observations.
  o PPS sampling proportional to the scores estimated with SeleMix
  o Revising all the observations belonging to the domains where the residual error estimated with the previous scheme is considerably far from the prefixed threshold
- Removing errors from the historical values
- Understanding whether there are other important error mechanisms affecting the quality of the procedure.
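The sampling-based validation idea could look roughly like the sketch below. The slides do not specify the design or the estimator, so this assumes PPS sampling with replacement and the standard Hansen-Hurwitz estimator of the total residual error. The function names and the data are hypothetical, and scores are assumed to be strictly positive.

```python
import numpy as np

def pps_sample(scores, m, rng):
    """Draw m units with replacement, with probability proportional to score."""
    scores = np.asarray(scores, dtype=float)
    p = scores / scores.sum()
    idx = rng.choice(len(scores), size=m, replace=True, p=p)
    return idx, p

def hansen_hurwitz_total(errors_sampled, p_sampled):
    """Estimate the population total of the errors from a PPS sample."""
    errors_sampled = np.asarray(errors_sampled, dtype=float)
    p_sampled = np.asarray(p_sampled, dtype=float)
    return float(np.mean(errors_sampled / p_sampled))

rng = np.random.default_rng(1)

# Hypothetical scores of the non-selected units in one domain
scores = np.array([0.004, 0.001, 0.010, 0.002, 0.003])

# Errors that a manual review of the notes would reveal (made up here;
# in practice they are known only for the sampled units)
errors = 50.0 * scores

idx, p = pps_sample(scores, m=3, rng=rng)
estimated_total_error = hansen_hurwitz_total(errors[idx], p[idx])
```

Sampling proportionally to the scores concentrates the manual review on the units most likely to carry residual error. In the toy data the errors are exactly proportional to the scores, so the estimator reproduces the true total whatever sample is drawn.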

Thank you