Wiesbaden Group on Business Registers, International Roundtable on Business Survey Frames, Tallinn, 27 – 30 September 2010. Simonetta Cozzi, ISTAT, Italy.

1 Realisation of Outlets Register for the implementation of new probability sample strategy on the Consumer Price Index Survey

2 Introduction
The presentation describes the process used to create a first version of the Outlets Register. The register was set up as part of the project “Development of instruments for implementing a new sampling strategy and completing a new data collection system in the production of a Consumer Price Index”. The project was launched following the directives of a scientific committee, composed of university professors and researchers from various institutions, established at ISTAT to steer and monitor the innovation process for the production of the CPI.

4 Summary
- The current Consumer Price Index (CPI)
- The new sampling strategy proposed for the CPI
- Outlets Register
- Quality survey

5 Current Consumer Price Index: characteristics
Three sampling stages for the local survey:
- The first-stage units (PSUs) are 83 municipalities.
- The second-stage units are the outlets purposively chosen in each PSU (about 40,000). The selection is made through a kind of quota sampling, so as to be representative of consumer behaviour.
- The most sold elementary items of the fixed basket* are observed in each selected outlet (about 400,000).
The collection of prices is carried out in two different ways:
1. centrally (about 60 products), by Istat staff, for products and services subject to national pricing policies and for prices that are difficult to observe directly;
2. locally (about 500 products), by staff of the Municipal Statistical Offices, directly from individual outlets at territorial level.
* At the beginning of this year, Istat defined a fixed basket which, relying on purposive selection, includes 562 representative elementary items (products or services), each consisting of a group of products that are as similar as possible and relatively homogeneous.

6 Current Consumer Price Index: issues
- The purposive sampling strategy used in the current survey structure prevents the exact computation of the sampling precision (the standard error) of the current CPI estimates.
- Not all municipalities are included in the survey, which could cause bias if the excluded municipalities display price movements that systematically differ from those of the included municipalities.
- The selection criterion of the “most sold” elementary item of the product in each outlet under-represents smaller brands and products and could introduce a bias of unknown size.

7 New sampling strategy proposed: overview
To improve the quality of the CPIs, a new probability sampling strategy has been proposed by the division for Information Technology and Methodology.
The new sampling design is based on:
- the hypothesis that turnover is a good proxy of household consumer expenditure;
- the availability of a sampling frame consisting of a list of outlets defined on the basis of the information collected in the BR.
Relying on the availability of the frame, the sampling design consists of a three-stage selection:
1. Local districts (selected within the geographical area through balanced sampling)
2. Outlets (D distinct samples, one for each product, drawn through a coordinated selection method to obtain a high level of overlap among the samples selected for each type of product)
3. Items (based on an iterative hierarchical drawing of groups of products)
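
The slide does not say which coordinated selection method is used for the outlet samples. As an illustration only, the sketch below uses one well-known option, Poisson sampling with permanent random numbers (PRNs): each outlet keeps the same random number for every product, so outlets with a large turnover for several products tend to be selected in all of the corresponding samples. The function names, the outlet identifiers and the turnover figures are hypothetical, and the inclusion probability is simply taken proportional to turnover and capped at 1.

```python
import random


def assign_prn(outlet_ids, seed=2010):
    """Give every outlet one permanent random number (PRN) in [0, 1)."""
    rng = random.Random(seed)
    return {oid: rng.random() for oid in outlet_ids}


def select_sample(turnover, prn, n):
    """Poisson sampling with PRNs: keep an outlet when its PRN is below its
    inclusion probability, taken proportional to turnover and capped at 1.
    n is the expected sample size before capping."""
    total = sum(turnover.values())
    if total <= 0:
        return set()
    return {oid for oid, t in turnover.items()
            if prn[oid] < min(1.0, n * t / total)}


# Hypothetical turnover of four outlets for two types of product.
bread = {"A": 500, "B": 300, "C": 150, "D": 50}
milk = {"A": 450, "B": 350, "C": 100, "D": 100}

prn = assign_prn(bread)  # drawn once and reused for every product
sample_bread = select_sample(bread, prn, n=2)
sample_milk = select_sample(milk, prn, n=2)
print("bread:", sorted(sample_bread))
print("milk:", sorted(sample_milk))
print("overlap:", sorted(sample_bread & sample_milk))
```

With Poisson sampling the realised sample size is random; a fixed-size coordinated design would need a different scheme, which the slide leaves unspecified.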

8 Outlets Register
The new sampling strategy for the CPI survey is based on the availability of a complete and sufficiently up-to-date list of outlets (Outlets Register).
The set-up of the Outlets Register has been carried out by the division for “Statistical Registers and Administrative Data”, in order to provide a useful sampling frame for the CPIs.
The information base of the Outlets Register should contain:
- Identification codes of outlets
- Economic activity code
- Types of products sold 1…n (Coicop classification)
- Turnover for each type of product sold

9 Outlets Register: administrative sources used
The basis for setting up the Outlets Register is the availability of the Statistical Register of Local Units (ASIA-UL) which, every year, supplies information on the territorial location, economic activity and employees of the local units of enterprises; this information was previously available only every ten years from the Industry and Services Census.
The economic activity code makes it possible to identify:
- the local units that can be classified as outlets
- the type of products sold
Setting up the Outlets Register requires the integration of different administrative and statistical sources in order to obtain a register with the information needed to apply the proposed sampling strategy.

10 Outlets Register: administrative sources used
Studi di Settore (Sds), “Business sector analysis”
- Introduced by the Italian tax administration to calculate reference revenue levels for taxpayers; they cover small and medium-sized firms and independent workers with an annual turnover under 7.5 million.
- The Sds are based on a sophisticated statistical procedure that aims to estimate a reasonable turnover value for each taxpayer.
- In order to estimate a plausible level of turnover, data are collected from all firms that report similar activity codes.
- Data include structural variables (surface area of offices and warehouses, number of employees, type of customers, product output) and accounting variables (mainly costs).

11 Outlets Register: administrative sources used
Number of taxpayers in Studi di settore by sector of activity. Years 2005-2008
80% of Italian firms, i.e. about 3.8 million, are eligible to be audited on the basis of the Sds.

12 Outlets Register: administrative sources used
Issues with the Sds
- Different information structures with different recorded variables (different degrees of complexity in the statistical translation)
- Different definitions and classifications, e.g. the classification of products sold (they must be translated into a statistical framework before use)
- Lack of local unit identification (only the municipality code, without addresses)

13 Outlets Register: administrative sources used
Retail Trade Register - Nielsen
The register contains information about 28 thousand local units of enterprises in the retail food trade sector: hypermarkets, supermarkets, discount stores; 15 thousand stores (with a sales area of between 100 and 400 m²).
The information available is:
- identification variables of the unit
- size in terms of employees
- number of cash desks
- sales area
- type of counters present (frozen food, vegetables, meat, …)
- sales potential indexes, which supply an estimate of the turnover of the main types of product sold by each local unit

14 Outlets Register: ATECO-COICOP correspondence table
In order to apply the proposed sampling strategy, each outlet must be assigned information about the types of products sold (TPs).
To identify the TPs sold by each outlet, the division for Structural Statistics and Consumer Prices has developed a correspondence table between the classification of economic activities (ATECO, the national version of NACE) and the classification of products (COICOP, Classification of Individual Consumption by Purpose).

15 The correspondence between ATECO and COICOP is in some cases 1 to 1, namely an ATECO code is associated with a single product group, and in others 1 to n, where n can vary from 2 to 44 (for example, 44 product groups are associated with the ATECO code 47.11.1 – hypermarket).

Type of correspondence | Number of ATECO codes
1-1 | 144
1-n | 47

The association between TPs and ATECO codes is fairly straightforward for TPs related to goods, but it involves some uncertainty when dealing with services.
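
A minimal sketch of how the two types of correspondence could be counted from the table, assuming the table is stored as (ATECO code, COICOP group) pairs. The pairs below are illustrative and are not taken from the actual ISTAT table; `correspondence_types` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def correspondence_types(pairs):
    """Label each ATECO code '1-1' or '1-n' by the number of distinct
    COICOP groups linked to it."""
    groups = defaultdict(set)
    for ateco, coicop in pairs:
        groups[ateco].add(coicop)
    return {ateco: "1-1" if len(linked) == 1 else "1-n"
            for ateco, linked in groups.items()}


# Illustrative rows only (not the real correspondence table).
pairs = [
    ("47.22", "01.1"),    # specialised meat retailer: one product group
    ("47.11.1", "01.1"),  # hypermarket: several product groups
    ("47.11.1", "01.2"),
    ("47.11.1", "02.1"),
]

types = correspondence_types(pairs)
print(types)
print(Counter(types.values()))
```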

16 Outlets Register: Macro phases of realization
The set-up of the Outlets Register for the food trade includes five macro phases of work:
1. Integration of ASIA-UL and the ATECO/COICOP correspondence table
2. Analysis of the administrative sources considered
3. Integration of the administrative sources
4. Imputation of the turnover to each outlet
5. Checking

17 Outlets Register: Phase 1 – Integration of ASIA-UL and the ATECO/COICOP correspondence table
The subset of local units containing the outlets selling goods in the food trade is defined by selecting all the outlets with an ATECO code linked to a COICOP code of the food trade.
The output of this phase is a list of outlets, LU0, with the following information:
- identification characteristics (name, fiscal code, address, etc.)
- economic activity code
- types of products sold (Coicop code at group level)
- employees
- enterprise turnover
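
The selection in Phase 1 can be sketched as a simple filter, assuming each ASIA-UL record carries an ATECO code and the correspondence table is available as a mapping from ATECO codes to COICOP groups. Record fields, codes and the set of food-trade groups are hypothetical.

```python
def build_lu0(local_units, ateco_to_coicop, food_groups):
    """Keep the local units whose ATECO code is linked to at least one
    food-trade COICOP group, and attach the linked groups to each unit."""
    lu0 = []
    for unit in local_units:
        linked = ateco_to_coicop.get(unit["ateco"], set()) & food_groups
        if linked:
            lu0.append({**unit, "coicop_groups": sorted(linked)})
    return lu0


# Hypothetical inputs.
food_groups = {"01.1", "01.2"}
ateco_to_coicop = {
    "47.22": {"01.1"},
    "47.11.1": {"01.1", "01.2", "05.6"},
    "47.71": {"03.1"},  # clothing retailer: no food-trade group
}
local_units = [
    {"fiscal_code": "FC001", "ateco": "47.22", "employees": 3},
    {"fiscal_code": "FC002", "ateco": "47.71", "employees": 5},
    {"fiscal_code": "FC003", "ateco": "47.11.1", "employees": 120},
]

lu0 = build_lu0(local_units, ateco_to_coicop, food_groups)
for unit in lu0:
    print(unit["fiscal_code"], unit["coicop_groups"])
```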

18 Outlets Register: Phase 1 – Integration ASIA-UL correspondence table ATECO/COICOP Outlets by enterprise typology and number of products sold (percentage value)

19 Outlets Register: Phase 2 – Source analysis
Each source used to build the Outlets Register needs a specific pre-treatment before being compared and integrated:
- standardization and normalization of common variables such as location and fiscal code
- checks and decoding activities (e.g. location, at municipality code level) to transform the values of a variable from the source's own classification into statistical codes
- coherence tests on the identification codes used in the linkage process

20 Outlets Register: Phase 2 – Source analysis
Sds:
- identification of the potentially useful Sds among the 206 Sds available
- analysis of the selected Sds, which gather different kinds of information
- identification of the products sold
- development of editing and imputation procedures for using the data
Retail Trade Register:
- comparison between this source and the “survey on the local units of large enterprises” (IULGI) to evaluate the quality of this source
- analysis of the methodology used for the sales potential indexes and of their usefulness for estimating product turnover

21 Outlets Register: Phase 2 – Source analysis
Sds used to set up the Outlets Register

Denomination | Code | Number of enterprises
Retail sale of meat | TM02U | 28 thousand
Retail sale in non-specialised stores with food, beverages (minimarkets, supermarkets, etc.) | TM01U | 63 thousand
Retail sale of fruit and vegetables | TM27A | 15 thousand
Retail sale of fish | TM27B | 5 thousand
Retail sale of frozen products | TM30U | 900
Retail sale of cakes | TD01 | 2 thousand
Retail sale of flour confectionery | TD12U | 5 thousand

Information useful for determining the turnover of the outlets:
- percentage of profits made by each type of product sold, as a ratio of the total profits made at enterprise level
- percentage of profits made by each outlet, as a ratio of the total profits, at local unit level (only for TM01U and TM02U)

22 Outlets Register: Phase 2 – Source analysis
To use the Sds data to estimate the turnover of the outlets of traditional distribution, it is necessary to define correspondence tables between the product classifications used in the Sds and the Coicop classification at group level.

23 Outlets Register: Phase 3 – Sources integration
The integration process starts by matching the above-mentioned sources with the first subset of local units containing the outlets selling products in the food trade (LU0), using the primary key for unit identification (fiscal code).
Outlets’ distribution by source presence and products sold
More than 128 thousand outlets (56%) are present in at least one of the sources considered; for these outlets, the percentage of turnover per product (at enterprise level), the percentage of turnover per local unit (in some cases, and for multi-location enterprises), and the sales potential indexes are available.
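
The matching step can be sketched as a lookup on the fiscal code, assuming each source has been reduced to the set of fiscal codes it contains. The source names, fiscal codes and helper functions are hypothetical; the share returned by `coverage` corresponds to the kind of figure quoted above (outlets present in at least one source).

```python
def flag_sources(lu0, sources):
    """Attach to each LU0 record the names of the sources that contain
    its fiscal code."""
    flagged = []
    for unit in lu0:
        found = sorted(name for name, codes in sources.items()
                       if unit["fiscal_code"] in codes)
        flagged.append({**unit, "sources": found})
    return flagged


def coverage(flagged):
    """Share of outlets present in at least one source."""
    if not flagged:
        return 0.0
    return sum(1 for unit in flagged if unit["sources"]) / len(flagged)


# Hypothetical inputs.
lu0_units = [{"fiscal_code": code} for code in ("FC001", "FC002", "FC003", "FC004")]
sources = {
    "sds": {"FC001", "FC003"},
    "retail_trade_register": {"FC003"},
}

flagged = flag_sources(lu0_units, sources)
for unit in flagged:
    print(unit["fiscal_code"], unit["sources"])
print("coverage:", coverage(flagged))
```

Because the Sds carry no local unit identifier, a match on the fiscal code links information at enterprise level; distributing it among the outlets of a multi-location enterprise is left to the imputation phase.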

24 Outlets Register: Phase 4 – Turnover imputation

25 The next steps are:
1. Imputation of the turnover for each outlet
- For single-location enterprises, the turnover of the outlet coincides with the turnover of the enterprise.
- For multi-location enterprises, the turnover of each outlet has to be imputed, using models relating the turnover to auxiliary information such as the economic activity.
2. Imputation of the turnover for all the TPs sold by the outlet
- For outlets selling only one product, the enterprise turnover is attributed entirely to the product sold.
- For outlets selling p products, the turnover for each product has to be estimated, using as training data the information on TP turnover known from the Sds.
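
The two deterministic cases above (single-location enterprises and single-product outlets) translate directly into code. For the other two cases the slides refer to models that are not described, so the sketch below substitutes simple stand-ins that are assumptions of this example and not the method used: turnover is split among outlets in proportion to employees, and among products in proportion to given percentage shares (such as those reported in the Sds). All names and figures are hypothetical.

```python
def outlet_turnover(enterprise_turnover, outlets):
    """Split enterprise turnover among its outlets.
    Single-location: the outlet gets the whole turnover.
    Multi-location: ASSUMED split proportional to employees
    (equal split when no employees are recorded)."""
    if len(outlets) == 1:
        return {outlets[0]["id"]: enterprise_turnover}
    total_emp = sum(o["employees"] for o in outlets)
    if total_emp == 0:
        return {o["id"]: enterprise_turnover / len(outlets) for o in outlets}
    return {o["id"]: enterprise_turnover * o["employees"] / total_emp
            for o in outlets}


def product_turnover(turnover, shares):
    """Split an outlet's turnover among the products it sells.
    shares maps each COICOP group to a percentage; the values are
    normalised, so one product always receives the whole turnover."""
    total = sum(shares.values())
    return {tp: turnover * share / total for tp, share in shares.items()}


# Hypothetical multi-location enterprise with two outlets.
by_outlet = outlet_turnover(1000, [
    {"id": "LU1", "employees": 3},
    {"id": "LU2", "employees": 1},
])
by_product = product_turnover(by_outlet["LU1"], {"01.1": 60, "01.2": 40})
print(by_outlet)
print(by_product)
```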

26 Outlets Register: Survey quality
To verify the quality of the Outlets Register and of the proposed probability sampling design for the CPI, the division for Structural Statistics and Consumer Prices will carry out a quality survey in October.
Three main themes will be evaluated during this survey:
- coherence between the list of outlets in the selected sample and the actual business distribution in the Municipalities involved, considering a delay of around 18 months between the reference time of the starting list and the moment at which the sample is extracted;
- correctness of the methods for turnover imputation;
- testing of different schemes for selecting the outlet sample.

27 Outlets Register: Survey quality Municipalities involved in the survey

28 Outlets Register: Survey quality
The survey will be divided into three phases:
- a first phase, to be carried out jointly by Istat and the Municipalities, during which the sample of outlets is extracted and the coherence between the extracted sample and the trade structure of each individual Municipality is verified. During this phase, the software for selecting the sample will be tested, and questionnaires will be prepared containing the information held in the Outlets Register (COICOP group turnover for each selected unit).
- a second phase, to be carried out partly in the field, during which the researchers of each Municipality check the coherence of the information given in the questionnaire with the actual situation of the outlets.
- a third phase, to be carried out mainly through office work and in some cases with a return to the field, during which the results of the second phase are evaluated.
The results will help to fine-tune the statistical methods and to improve the methodologies used.

