European Conference on Quality in Official Statistics
Session on Quality reporting
M. Carla Congia Fabio
ISTAT - Italy
Quality reporting in a short-term business survey based on administrative data
Rome, 8-11 July 2008

Outline
- The Italian Oros Survey
- Quality issues in using administrative data
- Peculiarities of data quality assessment
- Oros quality indicators and reporting
- Final remarks

The Oros Survey
- Since 2003 the Oros survey has released quarterly indicators on gross wages and total labour cost per full-time equivalent (FTE), covering enterprises of all sizes in the private non-agricultural sector (sections C to K of NACE Rev. 1.1).
- It is based on extensive use of administrative data from the National Social Security Institute (INPS), combined with survey data on large firms with more than 500 employees (Monthly Large Enterprise Survey).
- Provisional estimates, based on the “provisional population”, are released with a 70-day delay.
- Final estimates are produced after 5 quarters, on the basis of the “whole population” and complete, updated information.
- It also meets the requirements of the European regulations: STS (Short-Term Statistics) and LCI (Labour Cost Index, an hourly labour cost index).

The administrative source
- National Social Security Institute (INPS): all Italian firms in the private sector with at least one employee must pay monthly social security contributions to INPS (roughly 1.3 million employers and 12 million employees).
- DM10 form: the monthly declaration is a highly detailed grid in which information on total employment, wage bills, paid days, overtime hours and social contributions is identified by specific administrative codes (about 5,000 valid codes). Each DM10 is spread over several records (8 on average).
- Data capturing: every firm transmits the DM10 to INPS in electronic format each month, no later than 30 days after the reference period. The complete raw declarations are then forwarded to Istat 35 days after the end of the reference period (about 10 million records each month).

The strategy for exploiting administrative data: a constraint became an opportunity
- At first INPS could not aggregate the DM10 data into the format required for Oros within the very tight schedule.
- So the Istat strategy became “Catch what you can, as quick as you can”: a move from a typical “one collection for one single output/product” approach to a focus on the “whole data source”, that is, the wage and contribution system.
Advantages
- The microdata are exactly those sent by firms, which allows more direct control of the aggregation/translation process.
- A lot of information is available for many other statistical purposes.
Disadvantages
- A complex preliminary phase of checks and computation within each DM10 is needed to obtain the target variables at micro level.
- A lot of the data is not necessarily useful for short-term objectives.

Quality issues in using INPS administrative data
- The Oros challenge is to produce short-term indicators by processing a huge quantity of very detailed microdata on a very tight schedule, while coping with frequent changes in the basic INPS metadata: enterprises have to use the DM10 form to take advantage of labour cost reduction policies, and these contribution laws change continuously.
- After preliminary studies, INPS data were considered suitable for Oros purposes, but statisticians still have no quarterly ex-ante control over the quality of the raw administrative data.
- Only a complex quality-oriented production process can assure ex-post quality while coping with unusual problems.

Quality issues in using INPS administrative data
- Final key checks - macroediting
- Fragmented and insufficient INPS metadata
- In-house metadata database
- Highly disaggregated raw data
- Preliminary checks and accurate translation into statistical variables
- Integration with LE Survey data
- Checks to avoid double counting
- Continuous legislation changes

Peculiarities of data quality assessment
- For the quality assessment of administrative data, Eurostat recommends producing a source-specific report and a product-specific one.
- In the Oros case the non-conventional use of administrative data means that the two reports overlap, while new approaches to administrative data quality assessment are explored empirically.
Oros practice has been developed trying:
- to find better tools to assess quality;
- to manage the measurement of rather new indicators on efficient and stable data capturing, completeness and consistency of metadata, stable translation/retrieval of target statistical variables, and correct integration with LE survey data;
- to produce quality indicators quarterly along the whole production process;
- to meet both Istat and Eurostat requests on quality reporting.

Oros quality reporting: an overview
[Overview chart. Axes: target group; frequency (once, annually, quarterly); from process to product; from qualitative to quantitative.]
Target groups: Oros producers; internal users (of micro and macro data); top managers and central quality managers; external expert users (Eurostat, IMF, ECB, ...); general public.
Reporting tools: Oros Process Monitoring Report; Survey Documentation and Methodological Handbook; LCI Quality Report; Istat Quality Report; Quarterly LCI meta information; Metadata in SDDS; Oros PR explanatory notes; SIDI information system for survey documentation.

Survey Documentation and Methodological Handbook
Initial basic quality assessment of the INPS administrative source, to evaluate its suitability for the production of quarterly labour market indicators:
- Concepts and definitions of variables and population
- Translation scheme of administrative information into statistical variables
- Coverage
- Reference time
- Accuracy
- Stability over time
It also contains more on the survey methods and a description of the whole production process.

Metadata in SDDS format
- Metadata in the Special Data Dissemination Standard (SDDS) format, which is used to deliver information to the IMF
- Base page: data, access by the public, integrity and quality
- Summary methodology statements: key features enabling users to assess the suitability of the data for their purposes
- Entirely qualitative and compiled once; updated following relevant changes in the methodology
- Compiled for the 3 outputs and different users (efforts to systematize): Oros (ConIstat, the short-term indicators TS database on the Istat website), LCI (Eurostat) and STS (Eurostat)

Process Monitoring Report 1
- Quantitative indicators to keep quality continuously under control and improve it along the whole Oros production process.
- Some of them are also warning indicators: they signal decisive problems or detect sources of error.
Main quality indicators for some key steps of the process:
Data capturing
- Number of monthly records
- Number of DM10 forms
- Time lag between scheduled and actual delivery dates
Metadata database updating
- Date of the last update of DM10 metadata on the INPS website
- Number of new and expired DM10 codes by type
- Rate of new DM10 codes to include/exclude
- Number of official INPS acts to analyse

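Two of the indicators listed on this slide can be sketched in a few lines of Python. This is only an illustration, not the Oros implementation: the function names, the use of calendar days for the lag, and the representation of DM10 codes as plain strings are all assumptions.

```python
from datetime import date


def delivery_lag_days(scheduled: date, actual: date) -> int:
    """Calendar days between scheduled and actual delivery (positive = late)."""
    return (actual - scheduled).days


def code_changes(previous: set[str], current: set[str]) -> dict[str, int]:
    """Count codes that appeared or disappeared between two metadata snapshots."""
    return {
        "new": len(current - previous),      # in the current list only
        "expired": len(previous - current),  # in the previous list only
    }


# Hypothetical values, for illustration only.
lag = delivery_lag_days(date(2008, 5, 5), date(2008, 5, 8))
changes = code_changes({"A", "B", "C"}, {"B", "C", "D", "E"})
```

A positive lag could act as one of the warning indicators mentioned above, and a burst of new codes would signal that the metadata database needs updating before the next processing run.
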
Process Monitoring Report 2
Preliminary checks on administrative data
- DM10 codes error rate = Number of impossible codes / Total number of codes
- DM10 codes edit rate = Number of codes changed by editing / Number of impossible codes
- Rate of duplicate units = Number of duplicate units / Total number of units
Micro editing
- Edit rate = Number of units edited / Total number of units in scope for the item
- Total contribution to key estimates from edited values = Total weighted quantity for edited values / Total weighted quantity for all final values

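The ratios on this slide can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the slides: an “impossible” code is taken to be one missing from the list of valid codes, a duplicate is every occurrence of a unit identifier beyond the first, and each record is a hypothetical (weight, value, edited flag) tuple. All names and data are invented for the example.

```python
from collections import Counter
from typing import Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def code_error_rate(codes: list[str], valid_codes: set[str]) -> Optional[float]:
    """Share of codes not found in the valid list (assumed meaning of 'impossible')."""
    impossible = sum(1 for code in codes if code not in valid_codes)
    return ratio(impossible, len(codes))


def duplicate_unit_rate(unit_ids: list[str]) -> Optional[float]:
    """Share of units that repeat an identifier already seen."""
    counts = Counter(unit_ids)
    duplicates = sum(n - 1 for n in counts.values())
    return ratio(duplicates, len(unit_ids))


def edited_contribution(records: list[tuple[float, float, bool]]) -> Optional[float]:
    """Weighted quantity of edited values over weighted quantity of all final values."""
    edited = sum(w * v for w, v, is_edited in records if is_edited)
    total = sum(w * v for w, v, _ in records)
    return ratio(edited, total)


# Hypothetical data, for illustration only.
error_rate = code_error_rate(["A1", "B2", "ZZ", "A1", "Q9"], {"A1", "B2", "C3"})
edit_rate = ratio(1, 2)  # 1 code changed by editing out of 2 impossible codes
dup_rate = duplicate_unit_rate(["u1", "u2", "u2", "u3"])
contribution = edited_contribution(
    [(1.0, 100.0, False), (2.0, 50.0, True), (1.0, 200.0, False)]
)
```

Returning None for a zero denominator keeps an undefined rate (for example, an edit rate in a month with no impossible codes) distinct from a genuine rate of zero.
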
Process Monitoring Report 3
Integration with LE survey data
- Number of units manually checked due to record linkage problems (i.e. mergers or split-ups recorded at different times)
Macroediting
- Number of suspicious aggregates identified automatically by TERROR or through graphical checks
- Number of outliers treated at micro or macro level
- Total contribution to the estimates from treated values
- Length of the homogeneous time series

Istat Quality Report
- Still experimental: Oros has been involved in the pilot test.
- Quality indicators within the framework of a qualitative report coherent with the Eurostat quality components.
- Disseminated within the System on Quality (SIQual), available on the Istat website.
- External-user oriented.
- Subset of standard quality indicators appropriately chosen from those available in the Information System for Survey Documentation (SIDI):
  - Response rate
  - Indicators on the revision policy (MR, MAR)
  - Timeliness of provisional data release
  - Timeliness of definitive data release
  - Length of the homogeneous time series
- Description of non-sampling error, relevance, accessibility.

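The slides do not define the revision indicators MR and MAR. Assuming they stand for mean revision and mean absolute revision, as is common in revision analysis, and taking a revision to be the final estimate minus the provisional one, they could be computed as in the sketch below. The function names and figures are hypothetical, and other definitions (for example relative revisions) are possible.

```python
def revisions(provisional: list[float], final: list[float]) -> list[float]:
    """Revision per period, assumed here to be final minus provisional."""
    if len(provisional) != len(final) or not provisional:
        raise ValueError("series must be non-empty and of equal length")
    return [f - p for p, f in zip(provisional, final)]


def mean_revision(provisional: list[float], final: list[float]) -> float:
    """MR: average signed revision; far from zero suggests systematic bias."""
    r = revisions(provisional, final)
    return sum(r) / len(r)


def mean_absolute_revision(provisional: list[float], final: list[float]) -> float:
    """MAR: average size of revisions regardless of sign."""
    r = revisions(provisional, final)
    return sum(abs(x) for x in r) / len(r)


# Hypothetical year-on-year growth rates (%), for illustration only.
provisional = [2.0, 3.0, 1.0, 4.0]
final = [2.5, 2.5, 1.5, 4.5]
mr = mean_revision(provisional, final)
mar = mean_absolute_revision(provisional, final)
```

Reading the two together is the point: an MR near zero with a large MAR indicates revisions that are sizeable but cancel out, while an MR close to the MAR indicates revisions that mostly go in one direction.
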
LCI Quality Report
- Required by Eurostat to evaluate the quality of the national LCIs used to produce the European aggregate index.
- The LCI was established with a “harmonization of output” rather than a “harmonization of input” approach.
- Since 2004 the LCI QR has been produced annually.
- Standard structure based on the Eurostat dimensions of quality, with a further aspect: “completeness”.
- Main standard quality indicators used: revision policy (MR, MAR) and timeliness of provisional data release.
- Description of the method for compiling hours worked (the LCI denominator).
Quarterly LCI meta information
- Standard template
- Mainly qualitative
- Release-specific
- Changes in the labour market (collective agreements, laws) that have an impact on wages and labour cost
- Reasons for revisions in NSA, WDA and SA data

Final remarks
- The innovative quarterly use of administrative data in Oros makes it necessary to monitor particular aspects of quality that are not usually considered in the standard quality assessment approach suggested by Eurostat.
- Several specific indicators to assess the quality of the process, in particular the metadata updating and the translation/aggregation of raw INPS data, have been implemented, but they need to be further systematized.
- These specific indicators are essential from the producer's point of view, but they could also be used to report to users on the quality of some key issues.
- On the other hand, the Oros survey satisfies the internal (SIDI, SIQual) and external (Eurostat) requests for standard quality reports.
- A better integration of all the reviewed quality reporting tools is desirable but only partially achievable.