Sampling design issues in Italian experience on scanner data and the possible integration with microdata coming from traditional data collection Claudia.

Slides:



Advertisements
Similar presentations
By: Saad Rais, Statistics Canada Zdenek Patak, Statistics Canada
Advertisements

Innovation data collection: Advice from the Oslo Manual South East Asian Regional Workshop on Science, Technology and Innovation Statistics.
Innovation Surveys: Advice from the Oslo Manual South Asian Regional Workshop on Science, Technology and Innovation Statistics Kathmandu,
Innovation Surveys: Advice from the Oslo Manual National training workshop Amman, Jordan October 2010.
The Bulgarian CPI And The Index Of A Small Basket Of Goods And Services Joint UNECE/ILO Meeting on Consumer Price Indices Geneva, May 2006 National.
Methodological approaches to recording certain types of services in the Consumer Price Index in Belarus Ekaterina Grikhanova, National Statistical Committee.
1 Third Workshop on ICP Western Asia Beirut, October 2004 Design of ICP price survey Sultan Ahmad, Consultant Based on Keith.
Sampling Strategy for Establishment Surveys International Workshop on Industrial Statistics Beijing, China, 8-10 July 2013.
Quality in Italian consumer price survey: optimal allocation of resources and indicators to monitor the data collection process Federico Polidoro, Rosabel.
A Probability Sample Strategy for improving the quality of the Consumer Price Index Survey using the Information of the Business Register Luigi Biggeri,
UNECE Workshop on Consumer Price Indices Session 1: The Concept, Scope and Coverage of the CPI Presentation by Cengiz Erdoğan, TurkStat October Istanbul,
TURKISH STATISTICAL INSTITUTE Agricultural Statistics Department TURKISH STATISTICAL INSTITUTE Agricultural Statistics Department Agricultural Structure.
Ranked Set Sampling: Improving Estimates from a Stratified Simple Random Sample Christopher Sroka, Elizabeth Stasny, and Douglas Wolfe Department of Statistics.
A new sampling method: stratified sampling
28-30 April 2014UNECE - Work session on statistical data editing Data editing and scanner data I. Léonard, G. Varlet and P. Sillard Insee.
Sébastien FAIVRE INSEE Workshop on scanner data, Stockholm, /06/2012 Would scanner data improve the French CPI?
1 Price index of pharmaceutical products, other medical products and appliances in the HICP/CPI Pia Skare Rønnevik.
UNECE Workshop on Consumer Price Indices Session 3: Calculation Expenditure Weight Presentation by Cengiz Erdoğan, TurkStat October Istanbul, Turkey.
Combining administrative and survey data: potential benefits and impact on editing and imputation for a structural business survey UNECE Work Session on.
United Nations Economic Commission for Europe Statistical Division Calculation of the consumer price index - Recommendations of the CPI Manual Joint EFTA/UNECE/SSCU.
Rudi Seljak, Metka Zaletel Statistical Office of the Republic of Slovenia TAX DATA AS A MEANS FOR THE ESSENTIAL REDUCTION OF THE SHORT-TERM SURVEYS RESPONSE.
Regional Seminar on Developing a Program for the Implementation of the 2008 SNA and Supporting Statistics Dr. Cem BAŞ September 2013 Ankara - Turkey.
1 Experiences from compilation of the Hungarian CPI UNECE/ILO Meeting on CPI Geneva, May 2006 Hungarian Central Statistical Office Consumer Price.
Slide 1 © 2002 McGraw-Hill Australia, PPTs t/a Introductory Mathematics & Statistics for Business 4e by John S. Croucher 1 Index numbers n Learning Objectives.
Chap 20-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 20 Sampling: Additional Topics in Sampling Statistics for Business.
9 th Workshop on Labour Force Survey Methodology – Rome, May 2014 The Italian LFS sampling design: recent and future developments 9 th Workshop on.
Nobel Prize - Economics Three Amigos Eugene Fama - U. Chicago Lars Peter Hansen - U. Chicago Robert Shiller - Yale Financial Economics American.
Workshop on Price Index Compilation Issues February 23-27, 2015 Sample Design, Selection, and Maintenance Gefinor Rotana Hotel, Beirut, Lebanon.
Use of web scraping and text mining techniques in the Istat survey on “Information and Communication Technology in enterprises” Giulio Barcaroli(*), Alessandra.
Constant Price Estimates Expert Group Meeting on National Accounts Cairo May 12-14, 2009 Presentation points.
Q20101 National accounts revisions: Italian manufacturing productivity analysis Alessandro Faramondi Istat – National Statistical Institute.
Composite Price Index  Unweighted Aggregative method  Unweighted Average of Relatives method  Weighted Aggregative Method  Paasche Index  Laspeyres.
Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census Loredana Di Consiglio,
Luigi Biggeri 1, Rita De Carli 2 and Tiziana Laureti 3 1 Istat and University of Firenze, Italy; 2 Istat, Italy; 3 University of Tuscia, Italy Geneva,
ICP Workshop, Tunis Nov. 03 Overview of the Sample Framework.
1 Biggeri L*., Brunetti* A. and. Laureti° T *Italian National Statistical Institute (Istat), Rome, Italy; ° University of Tuscia, Viterbo, Italy Geneva,
Workshop on Price Index Compilation Issues February 23-27, 2015 Market Basket Items and Weights Gefinor Rotana Hotel, Beirut, Lebanon.
ECE/ILO Meeting on Consumer Price Indices, May 10-12, 2006, Geneva1 E-COMMERCE IN THE ISRAELI CPI Yoel Finkel Merav Yiftach Central Bureau of Statistics,
United Nations Economic Commission for Europe Statistical Division UNECE Workshop on Consumer Price Indices Istanbul, Turkey,10-13 October 2011 Session.
ISI Satellite Conference on Agricultural Statistics, Maputo, August 2009 Integrated survey framework Using Household Expenditure Surveys for Food.
Meeting of the Group of Experts on Consumer Price Indices. Geneva May 2014, UNECE Workshop 7: Seasonal products. Palais des Nations, Geneva - Meeting.
UNECE Workshop on Consumer Price Indices Session 2: Sampling of Outlets and Products Presentation by Cengiz Erdoğan, TurkStat October Istanbul, Turkey.
Impact of updating weights on tracking performance and volatility: Industry survey G. Bruno, L. Crosilla, P. Margani, A. Righi EU Workshop on Recent Developments.
UNECE Workshop on Consumer Price Indices Session 11: The HICP in non-EU Member Country Presentation by Cengiz Erdoğan, TurkStat October Istanbul,
UNECE Workshop on Consumer Price Indices Session 4: Practical Experiences with Calculating Elementary Indices and Treatment of Missing Prices Presentation.
Workshop on Price Index Compilation Issues February 23-27, 2015 Imputation of Missing Values, Seasonal Products and Quality Changes Gefinor Rotana Hotel,
INCREASING HERBAL PRODUCT CONSUMPTION IN THAILAND DURING THE PERIOD Assoc Prof Dr Arthorn Riewpaiboon Department of Pharmaceutical Botany Faculty.
Meeting of the Group of Experts on Consumer Price Indices. Geneva May 2014, UNECE Workshop 7: Seasonal products. Palais des Nations, Geneva - Meeting.
1 The use of scanner data on non-food products Statistics Norway.
Scanner data in the Luxembourg HICP/CPI Moving towards implementation Claude Lamboray Vanda Guerreiro Scanner Data Workshop ISTAT 1-2 October 2015.
The Swedish experience with scanner data From sampling to index calculation Paulina Jonéus.
The Use of Scanner Data for Price Statistics in the Netherlands Huib van de Stadt Head of Unit Prices and Business Cycle Statistics Netherlands 15 October.
Performance Indicators Workshop for African countries on the Implementation of International Recommendations for Distributive Trade Statistics May.
 Definition  Unweighted and Weighted Index Numbers ( Simple Index Numbers, Laspeyre’s, Paasche’s Index, Fisher’s “Ideal” Index)  CPI ( Consumer Price.
1 Outline  Introduction  Methodology for compiling CPI  Data Collection and Compilation.
CPI Measurement Problems The Case of Malawi By Charles Machinjili National Statistical Office Malawi.
5/25/2016Indices and Price Analyses Unit 1 The Household Expenditure Survey (HES) Presented by: The Indices and Price Analyses Unit Statistical.
United Nations Economic Commission for Europe Statistical Division UNECE Workshop on Consumer Price Indices Istanbul, Turkey,10-13 October 2011 Session.
UNECE Workshop on Consumer Price Indices Session 5: Calculation of a Chained CPI Presentation by Cengiz Erdoğan, TurkStat October Istanbul, Turkey.
United Nations Economic Commission for Europe Statistical Division UNECE Workshop on Consumer Price Indices Istanbul, Turkey,10-13 October 2011 Session.
The impact of European Commission regulation for the treatment of seasonality in HICP Italian clothing Presented by Monica Montella Italian National Statistical.
HARMONISED CONSUMER PRICE INDEX Nadiežda Alejeva Head, Price Statistics Division Statistics Lithuania WORKSHOP, 13 JULY 2012, YEREVAN 1.
Copyright 2010, The World Bank Group. All Rights Reserved. Producer prices, part 2 Measurement issues Business Statistics and Registers 1.
Wiesbaden Group on Business Registers International Roundtable on Business Survey Frames Tallinn, 27 – 30 September 2010 Simonetta Cozzi,ISTAT, Italy Wiesbaden.
Chaining? User needs and good practice!
Realisation of Outlets Register for the implementation of new probability sample strategy on the Consumer Price Index Survey Wiesbaden Group on Business.
OECD SHORT-TERM ECONOMIC STATISTICS WORKING PARTY (STESWP) MEETING
Price Indices for External Trade of Goods Eleonora Baghy 26/05/2013
Annual routines with chaining
Some Implementation Issues of Scanner Data
Presentation transcript:

Sampling design issues in Italian experience on scanner data and the possible integration with microdata coming from traditional data collection Claudia De Vitiis In collaboration with: C. Casciano, N. Cibella, A. Guandalini, F. Inglese, G. Seri, M. Terribili, F. Tiero ISTAT - ITALY Workshop scanner data. Rome 1-2 October 2015

Summary 1.Aims of the presentation 2.The new general sampling design for CPI 3.The context of the first experiments of sampling from Scanner Data 4.Selection of elementary items from Scanner Data 5.First results 6.Open issues and conclusions Workshop scanner data. Rome 1-2 October 2015

 Scanner data is a big opportunity to introduce improvements in the CPI compiling not only for the data collection but also with regards the sampling perspective  This presentation focuses on a comparison among indices of elementary aggregates compiled using different sub-sets of series obtained through different “selection schemes”  Elementary Index Bias  Elementary Index Sampling Variance  First experiment on a small data set:  One province, some consumption segments (Italian COICOP6), permanent series 1. Aims of the presentation Workshop scanner data. Rome 1-2 October 2015

The current sample strategy of the CPI survey (at territorial level)  Three purposive sampling stages: – The first stage units are the chief towns of provinces (established by law) – The second stage units are the outlets, purposively chosen in each PSU to be representative of the consumer behaviour – The most sold items of a fixed basked of products are observed in the sample outlets  The elementary indices are obtained at municipality level by unweighted geometric mean  The general index is calculated by subsequent aggregation of elementary indices, using weights at different levels based on population and national account data on consumer expenditure 2. The new general sampling design for CPI Workshop scanner data. Rome 1-2 October 2015

A working group established at ISTAT is developing a probability sample strategy  Based on the hypothesis that turnover is a good proxy of final household consumption (expenditure)  Outlets and items are selected using probabilities proportional to the turnover  Selection list for the outlets (local units) is obtained from business register, ASIA-UL  ASIA-PV  The archive contains information useful for the selection and the definition of the inclusion probability  For the item level different approaches will coexist at the beginning:  Scanner data for food and grocery in the modern distribution allow the use of sampling methods and index compilation using weights from quantities (or expenditures)  For traditional distribution and the other sectors, data collection and index compilation will continue unchanged at first 2. The new general sampling design for CPI Workshop scanner data. Rome 1-2 October 2015

 Among the first analyses on the scanner data universe we carried out some tentative experiments of sample selection of series  Data referred to the first Italian provinces for which ISTAT got data, for year 2014  Weekly data on turnover and quantities per EAN-code (GTIN) and outlet allow obtaining weekly unit values  One series is considered present in a specific month if it has associated a non zero turnover in at least one of the three central weeks of the month  The first issue we analysed is the continuity of series (=EAN by outlet) and the coverage of a panel of permanent series  Following figures show examples of the coverage rate of a fixed basket of series taken in December 2013 during 2014 months 3. The context of the first experiments of sampling from Scanner Data Workshop scanner data. Rome 1-2 October 2015

Figure 1a. Presence of series fixed in Dec2013 in 2014 months - Coverage of single series and total turnover (all products, Turin province) Workshop scanner data. Rome 1-2 October 2015

Figure 1b. Presence of series fixed in Dec2013 in 2014 months - Coverage of single series and total turnover (Coicop 6 digits - Coffee segment, Turin province) Workshop scanner data. Rome 1-2 October 2015

Very first exercise on permanent series  The permanent series are defined as having non-zero turnover for at least one relevant week (one of the first three full weeks) every month for 13 months (Dec 2013-Dec 2014)  After having verified that permanent series provide a good coverage of the total turnover  For these first analyses we focus on this PANEL, postponing the issues related to the item life cycle, replacement of missing items and seasonality  Our reference universe for the first experiments is constituted only of permanent series and indices are evaluated on this sub-set of series for each consumption segment 3. The context of the first experiments of sampling from SD Workshop scanner data. Rome 1-2 October 2015

Table 1. Total turnover for all series, relevant week series and panel series, five Italian provinces (2014) Workshop scanner data. Rome 1-2 October 2015

 For the outlets of Turin for which we have data, we focus on three consumption segments (Coicop 6 digits):  Coffee ( )  Pasta ( )  Mineral water ( ) 3. The context of the first experiments of sampling from SD Table 2. Total turnover for all series, relevant week series and panel series, 3 segments in Turin (2014) Workshop scanner data. Rome 1-2 October 2015

 Comparison of probabilistic and non-probabilistic sampling selection schemes for different aggregation index formula for elementary aggregates  Cut-off selection of series based on thresholds of covered turnover: the index is compiled using all series covering 60% or 80% of all turnover in each outlet for the consumption segment, in previous year 2013  Probability sampling: pps (size= previous year turnover) for two sampling rates (5% and 10%), selection of 500 samples  Reference method: most sold items in each outlet for representative products (current fixed basket approach) 4. The selection of elementary items from SD Workshop scanner data. Rome 1-2 October 2015

Table 3. Percentage and average number of items per outlet covering 60 and 80% of turnover, 3 segments in Turin (2014) Consumption segment Percentage of SeriesAverage Number of Items per Outlet Turnover threshold 60% Turnover threshold 80% Total Covering 60% of turnover Covering 80% of turnover Coffee 16,236, Pasta 23,444, Mineral water 12,126,33449

 Sample series are selected from a sample of outlets: 30 out of 127 of outlets of retail trade modern distribution in Turin province  Outlet are selected by stratified SRS sampling with allocation proportional to turnover of strata (6 chain by 2 types of outlet)  For each sample we compiled the elementary fixed base indices for 12 months with three classical aggregation formulas: Jevons (unweighted), Fisher (ideal) and Lowe (weights from quantities of previous year)  For the sample selection and weighting of indices we refer to total annual turnover  Comparison of each estimate with the corresponding universe value, evaluated on the complete set of panel series 4. The selection of elementary items from SD Workshop scanner data. Rome 1-2 October 2015

 For each aggregation formula comparison of values obtained on the different subset of series  Bias  Variance 5. First results Workshop scanner data. Rome 1-2 October 2015

Figure 2a. Lowe Indices for elementary aggregate : comparison of universe and different sub-sets (Coffee -Turin 2014)  The mean of the sample estimates is perfectly overlapped to the “true” value U  Cut-off samples over-estimate but follow trend  Most sold items under-estimates and alter trend Workshop scanner data. Rome 1-2 October 2015

Figure 2b. Fisher Indices for elementary aggregate : comparison of universe and different sub- sets (Coffee -Turin 2014)  The mean of the sample estimates is quite overlapped to the “true” value U  Cut-off samples over- estimate but follow the trend  Most sold items under-estimate and alter trend Workshop scanner data. Rome 1-2 October 2015

Figure 2c. Jevons Indices for elementary aggregate : comparison of universe and different sub-sets (Coffee -Turin 2014)  The mean of the sample estimates strongly over- estimates the “true” value U and stresses trend  Cut-off samples over- estimate but follow quite the trend  M ost sold items substancially follow trend and levels Workshop scanner data. Rome 1-2 October 2015

Figure 3. Lowe Indices for elementary aggregate : comparison of universe and different sub-sets (Coffee and Pasta -Turin 2014)  Pasta: The mean of the sample estimates and cut-off values are overlapped to the “true” value U  Most sold items over-estimates and alter trend  Cut-off values and best selling items show opposite performance for the two product: this can be explained by the different number of items and turnover distributions? Workshop scanner data. Rome 1-2 October 2015

Figure 4. Fisher and Jevons Indices for elementary aggregate : comparison of universe and different sub-sets (Pasta -Turin 2014)  The mean of the sample estimates is quite overlapped to the “true” value U  Cut-off samples slightly under-estimate but follow the trend  Most sold items stress trend Workshop scanner data. Rome 1-2 October 2015  The mean of the sample estimates strongly over-estimates the “true” value U and accentuates trend  Cut-off samples over-estimate but quite follow trend  Most sold items substancially follow trend and levels

Figure 5a. Confidence band for Lowe Indices for elementary aggregate in comparison with universe values, pps sample (Coffee Turin 2014) Workshop scanner data. Rome 1-2 October 2015

Figure 5b. Confidence band for Jevons Indices for elementary aggregate of Coffee in comparison with of universe values, pps sample (Turin 2014) Workshop scanner data. Rome 1-2 October 2015

Table 4. Bias and Relative Sampling Error distribution of monthly Lowe, Fisher and Jevons indices for pps samples of series Bias Consumption Segment Sampling rate Lowe IndexFisher IndexJevons Index minmaxminmaxminmax Coffee 5% % Pasta 5% % Sampling relative error (%) Consumption Segment Sampling rate Sample size Lowe IndexFisher IndexJevons Index minmaxminmaxminmax Coffee 5% % Pasta 5% % Workshop scanner data. Rome 1-2 October 2015

The results highlight the following first evidences about the performance of different series selection schemes  Cut-off based sample are much less biased with respect to a selection of most sold items: in general cut-off slightly overestimate while the most sold items approach underestimate inflation (even inflation vs deflation); anyway the results depend on the product category  Probability pps sample produces approximately unbiased estimates for indices using weights (Lowe and Fisher), though the second one shows a very high variance.  Probability srs sample produces approximately unbiased estimates when using unweighted indices (Jevons)  Sample scheme is not neutral with respect to index choice  Increasing the sampling rate produce a remarkable improvement of the bias in all indices, in addition to an obvious reduction of sampling error  First replication on other product segments show similar evidence but depending on the distribution of turnover Workshop scanner data. Rome 1-2 October 2015

 The sample allocation criteria are still under study, both for outlet and items; analysis of variability will be made  In-depth studies should take into consideration the cycle of life of items and all related implications (new items, replacements…)  A big issue to deal with is the integration between scanner data and traditional data for index compilation, different hypothesis are under evaluation  Combining indices obtained with different approaches  Gradually abandon manual collection, at least for food and grocery, considering the high expenditure coverage of modern distribution  Aim at define and realise a probability sampling also for traditional distribution 6. Open issues and conclusions Workshop scanner data. Rome 1-2 October 2015

Thank you for your attention ! Workshop scanner data. Rome 1-2 October 2015