WORKSHOP ON SCANNER DATA Geneva 10 May 2010 Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)

2 Historical overview – NL
Supermarkets
- Mid 90s: first contacts with chain(s)
- 2002: first implementation: 1/2 chain(s)
  Yearly Laspeyres (labour intensive)
  - Construction of yearly basket of items
  - Manual linking of items to COICOP groups
  - Manual replacement of disappearing items
  Reduction of ca 10 000 monthly price quotes in field survey

3 Historical overview – NL, cont.
Supermarkets
- 2010: extension: 6 chains
  Monthly chained Jevons (efficient process)
  - No manual linking of items
  - No explicit replacements
  Extra reduction of ca 5 000 monthly price quotes in field survey

4 Historical overview – N
- 1997: first contact with one chain
  Gradually contact with more chains
  Implementation in the CPI
  - Only price information of specific representative items
- 2002: scanner data from all the chains (no questionnaires - big incentive)
- Aug 2005: expanded use for COICOP 01
  Price and quantity information for all items in representative outlets

5 Questions to be answered when dealing with scanner data
- How/where to acquire scanner data?
- Which statistical method?
- How to link items to COICOP?
- How to deal with all kinds of particularities in the data?
- Development of a new computer system?

6 Source of scanner data
- Market research companies
  Cleaned data
  (Very) expensive
  Two-stage delivery chain (timeliness)
- Companies/chains
  Raw data
  Cheap (NL/N do not pay)
  Direct contact with original supplier

7 Negotiations with companies
- Time-consuming process
  Negotiations can take up to a year or more, including meetings, sending test data, analysing data etc.
  Be aware that the company may face some set-up costs, e.g. for preparing the data extractions
- Can the company provide what you want/need?
  E.g. information to link items to COICOP automatically

8 Negotiations with companies, cont.
- Focus on advantages for companies
  Minor costs once established (just a copy of their sales administration)
  No questionnaires or monthly visits of price collectors
- Other incentives for companies?
  Money – not likely
  Information
  - E.g. company price development compared to overall price development

9 Negotiations with companies, cont.
- Establishing good routines with the companies is essential
  Strict time schedules
  No changes in formats once implemented

10 Pre-production work
- Take your time analysing the data
  Enormous amount of data
  - N: over 300 000 price observations each month, divided into about 14 000 items
  Build shadow system (prototype)
  - Compare the new price indexes based on scanner data with the old method for a certain period of time before implementation
  - Discover possible problems in advance
  Unexpected situations will arise for sure

11 Pre-production work
- Ideas for analysing the data:
  Is the same EAN always the same item?
  Extreme price changes
  Specific price development at the beginning or end of an EAN's life cycle
  - If structural: risk of bias!
  All kinds of dynamics in the data
  Missing prices
  Do properties of the data change over time?
  Etc.

12 Methodology / IT-system
- Find a methodology that:
  Delivers good indexes (e.g. no bias)
  Can deal with all particularities in the data
- Build an IT-system that supports the chosen methodology
- Learn from the experiences of other countries using scanner data

13 Properties of data
Consequences for methodology NL and N
- High attrition rate of items

14 Properties of data, cont.
Consequences for methodology NL and N
- How to deal with the high attrition rate of items
  NL: monthly chained index
  N: monthly chained index

15 Properties of data, cont.
Consequences for methodology NL and N
- Sales: low prices combined with enormous increase in quantities sold

16 Properties of data, cont.
Consequences for methodology NL and N
- Consequences of sales:
  Single observations can have extremely high influence on the elementary index
  Risk of bias when applying monthly chaining and explicit weights

17 Properties of data, cont.
Consequences for methodology NL and N
- Bias is not just theoretical! Example for detergents:

Formula      Weekly index               Monthly index
             I(200835; 200501=100)      I(200808; 200501=100)
Laspeyres    7 794 207.27               11 301.04
Paasche      0.0000033                  0.88
Fisher       5.10                       99.89
Törnqvist    7.40                       101.53
Jevons       78.76                      91.75
Walsh        33.78                      107.72
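
The mechanism behind such numbers can be illustrated with a small sketch. The Python below is only an illustration, not the code of either office: the two-item data set is invented and all function names are hypothetical. One item goes on sale, consumers stock up, and then every price and quantity returns to its starting value, so a drift-free index should end at 1.

```python
import math

# Bilateral links from period 0 to period 1; p and q are dicts item -> value.
def laspeyres(p0, p1, q0, q1):
    return sum(p1[k] * q0[k] for k in p0) / sum(p0[k] * q0[k] for k in p0)

def paasche(p0, p1, q0, q1):
    return sum(p1[k] * q1[k] for k in p0) / sum(p0[k] * q1[k] for k in p0)

def fisher(p0, p1, q0, q1):
    return math.sqrt(laspeyres(p0, p1, q0, q1) * paasche(p0, p1, q0, q1))

def jevons(p0, p1, q0, q1):
    # Unweighted geometric mean of price ratios; quantities are ignored.
    return math.exp(sum(math.log(p1[k] / p0[k]) for k in p0) / len(p0))

def chained(link, prices, quantities):
    """Multiply period-to-period links into a chained index (first period = 1)."""
    result = 1.0
    for t in range(1, len(prices)):
        result *= link(prices[t - 1], prices[t], quantities[t - 1], quantities[t])
    return result

# Invented data: item A is on sale in period 1, sells little in period 2
# (households stocked up), and is back to normal in period 3.
prices = [{"A": 2.0, "B": 2.0}, {"A": 1.0, "B": 2.0},
          {"A": 2.0, "B": 2.0}, {"A": 2.0, "B": 2.0}]
quantities = [{"A": 10, "B": 10}, {"A": 50, "B": 10},
              {"A": 2, "B": 10}, {"A": 10, "B": 10}]

for name, link in [("Laspeyres", laspeyres), ("Paasche", paasche),
                   ("Fisher", fisher), ("Jevons", jevons)]:
    print(f"{name:10s} {chained(link, prices, quantities):.4f}")
```

On this toy data the chained Laspeyres ends at about 1.29, the chained Paasche at about 0.64 and the chained Fisher at about 0.90, while the unweighted chained Jevons returns to 1.00. Törnqvist and Walsh are left out only to keep the sketch short.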

18 Properties of data, cont.
Consequences for methodology NL and N
- How to deal with sales?
  NL: crude weighting on item level: w=0 or 1
  N: manual checks of price ratios that contribute most to elementary results: “critical observations”

19 Properties of data, cont.
Consequences for methodology NL and N
- Implausible price changes
  NL: price changes (p_t/p_{t-1}) of more than a factor 4 are deleted
  - Changes of +5000% and -99% do actually occur
  N: price changes (p_t/p_{t-1}) of more than a factor 3 are deleted
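
A filter of this kind might look as follows. This is a hedged sketch: the function names are hypothetical, and treating the rule symmetrically (deleting ratios above the factor and below its reciprocal) is an assumption, since the slide only says "more than a factor 4" (NL) or "factor 3" (N).

```python
def is_plausible(p_prev, p_cur, max_factor=4.0):
    """True if the price ratio p_cur / p_prev lies within [1/max_factor, max_factor].

    The symmetric lower bound is an assumption, not stated on the slide.
    """
    ratio = p_cur / p_prev
    return 1.0 / max_factor <= ratio <= max_factor

def drop_implausible(prev_prices, cur_prices, max_factor=4.0):
    """Keep current prices of matched items whose price change is plausible."""
    return {k: cur_prices[k]
            for k in cur_prices
            if k in prev_prices
            and is_plausible(prev_prices[k], cur_prices[k], max_factor)}

prev_prices = {"a": 1.0, "b": 1.0, "c": 2.0}
cur_prices = {"a": 51.0, "b": 0.01, "c": 3.0}   # +5000%, -99%, +50%
print(drop_implausible(prev_prices, cur_prices))                   # NL-style factor 4
print(drop_implausible(prev_prices, cur_prices, max_factor=3.0))   # N-style factor 3
```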

20 Properties of data, cont.
Consequences for methodology NL and N
- Temporarily missing prices

21 Properties of data, cont.
Consequences for methodology NL and N
- How to deal with temporarily missing prices:
  NL: imputation of missing prices
  N: no adjustments, but imputing prices is considered for the near future

22 Properties of data, cont.
Consequences for methodology NL and N
- Quality differences
  Items with the same EAN are considered to be identical
  Items with different EANs are treated as different items (no matching)
- How to deal with quality differences:
  NL: adjustment only in exceptional cases: manual intervention
  N: no adjustment

23 Actual method - NL
- Data received: for each item, each week:
  - EAN
  - Short description
  - (Chain-specific) product group
    Used to link items to COICOP automatically
  - Expenditures
  - Quantities sold

24 Actual method – NL, cont.
- Price of item:
  Unit value based on first three weeks of month
- Unweighted price index at elementary level:
  Monthly chained Jevons on selection of items
- Weighted price index for higher aggregates:
  Yearly chained Laspeyres
  Weights based on scanner data of all 52 weeks of previous year
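
The first two steps can be sketched in a few lines. This is an illustration under assumptions, not the production system: function names and the input layout (weekly expenditure/quantity pairs, monthly dicts of item prices) are hypothetical, and each monthly link is computed over the items present in both months.

```python
import math

def unit_value(weekly_records):
    """Unit value price: total expenditure divided by total quantity.

    weekly_records: (expenditure, quantity) pairs, e.g. for the first
    three weeks of the month. Returns None if nothing was sold.
    """
    total_exp = sum(e for e, _ in weekly_records)
    total_qty = sum(q for _, q in weekly_records)
    return total_exp / total_qty if total_qty > 0 else None

def jevons_link(prev_prices, cur_prices):
    """Unweighted geometric mean of price ratios over matched items."""
    matched = [k for k in prev_prices if k in cur_prices]
    if not matched:
        raise ValueError("no matched items between the two months")
    log_mean = sum(math.log(cur_prices[k] / prev_prices[k]) for k in matched) / len(matched)
    return math.exp(log_mean)

def chained_jevons(monthly_prices):
    """Monthly chained Jevons index, first month = 100."""
    index = [100.0]
    for prev, cur in zip(monthly_prices, monthly_prices[1:]):
        index.append(index[-1] * jevons_link(prev, cur))
    return index

print(unit_value([(10.0, 5), (6.0, 4), (4.0, 1)]))
months = [{"a": 1.0, "b": 2.0},
          {"a": 2.0, "b": 2.0, "c": 5.0},   # item c appears
          {"b": 2.0, "c": 5.0}]             # item a disappears
print(chained_jevons(months))
```

Because every link uses only the items available in both adjacent months, new and disappearing items enter and leave without explicit replacements.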

25 Actual method – NL, cont.
- Item selection at elementary level
  Items with low expenditures: w=0
  Other items: w=1
- Threshold of low (average) expenditure share:
  - Example: threshold = 1% for χ=2 and N=50
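
The threshold formula itself is not legible on this slide. The sketch below assumes a threshold of 1/(χ·N), which reproduces the example (χ=2 and N=50 give 1%), and assumes that "average" means the mean of an item's expenditure shares in the two adjacent months. Both are assumptions; the function names are hypothetical.

```python
def share_threshold(n_items, chi):
    """Assumed threshold 1 / (chi * N); matches the slide's example only."""
    return 1.0 / (chi * n_items)

def select_items(exp_prev, exp_cur, chi=1.25):
    """Return crude weights w (0 or 1) for items present in both months.

    exp_prev, exp_cur: dicts item -> expenditure in the previous and
    current month. An item gets w=1 if its average expenditure share
    over the two months exceeds the threshold (assumed definition).
    """
    items = [k for k in exp_prev if k in exp_cur]
    total_prev = sum(exp_prev[k] for k in items)
    total_cur = sum(exp_cur[k] for k in items)
    threshold = share_threshold(len(items), chi)
    weights = {}
    for k in items:
        avg_share = 0.5 * (exp_prev[k] / total_prev + exp_cur[k] / total_cur)
        weights[k] = 1 if avg_share > threshold else 0
    return weights

print(share_threshold(50, 2))
print(select_items({"a": 70, "b": 25, "c": 5}, {"a": 60, "b": 30, "c": 10}, chi=2))
```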

26 Actual method – NL, cont.
- Determination of threshold value
  - Simulations lead to:
    Optimal value: χ=1.25
    - Ca 50% of items are excluded (on average)
    - Elementary index based on 80 to 85% of total expenditures
- Elementary level (chain dependent) comparable with COICOP6

27 Actual method – NL, cont.
- Refinements:
  Extreme price changes are excluded (factor 4)
  Missing prices are imputed
  Dump prices at the end of an item's life cycle are excluded (see paper)

28 Actual method – NL. What advantages were achieved?
- Indexes are of higher quality
  Compared with the old scanner data method
  Compared with the field survey
- Response burden for companies is lower
  No price collection in the shops
- Efficiency gains?
  Yes: more or less automatic production process
  Investment costs (IT-system) were (very) high

29 Illustrations  Price indexes based on five supermarkets

30 Illustrations  Price indexes based on five supermarkets

31 Actual method - N
- Data received: for each item, in the midweek of the month:
  - EAN/PLU
  - Short description
  - (Chain-specific) product group
  - Calculated average price
  - Quantity sold
  - Expenditure

32 Actual method – N, cont.
- Sample of representative outlets
  Stratified by chain and concept
- Matching EAN/PLU with COICOP6
- Weighted Jevons price index at elementary level, with expenditure shares of current and base period: monthly chained Törnqvist index
- Scanner data weights between the COICOP6 groups
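
One monthly link of such an index could be computed as below. This is a minimal sketch assuming the standard Törnqvist form, in which each log price ratio is weighted by the mean of the item's expenditure shares in the two months; the function name and the input layout are hypothetical.

```python
import math

def tornqvist_link(prev, cur):
    """Törnqvist index between two adjacent months.

    prev, cur: dicts item -> (price, expenditure). Only items present in
    both months are used; shares are computed over those matched items.
    """
    matched = [k for k in prev if k in cur]
    total_prev = sum(prev[k][1] for k in matched)
    total_cur = sum(cur[k][1] for k in matched)
    log_index = 0.0
    for k in matched:
        weight = 0.5 * (prev[k][1] / total_prev + cur[k][1] / total_cur)
        log_index += weight * math.log(cur[k][0] / prev[k][0])
    return math.exp(log_index)

prev = {"a": (10.0, 50.0), "b": (20.0, 50.0)}
cur = {"a": (11.0, 50.0), "b": (20.0, 50.0)}
print(tornqvist_link(prev, cur))  # item a rises 10% with an average share of one half
```

Multiplying these monthly links together gives the monthly chained index.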

33 Actual method – N, cont.
- Higher aggregates:
  Yearly chained Laspeyres
  Weights from HES (NR as of 2011)
- Exclude strongly seasonal items only available for a certain period of the year
- Manual control and possibly exclusion of extreme contributions to elementary results

34 Actual method – N. What advantages were achieved?
- Indexes of higher quality?
  New methodology led to a reduction of e.g. sampling and measurement errors, but also to new biases
  Much more data – more detailed price indexes
  Considering both prices and quantities
  Many indexes have improved, others have not
- Low response burden for companies
  No questionnaires
- Efficiency gains?
  Automatic production process which requires some manual intervention
  - Resources demanded not much higher than before
  High investment costs (IT-system)

35 New methodology
- Newly developed index (Ivancic, Diewert, Fox)
  Rolling year GEKS price index
  - Source: GEKS algorithm for purchasing power parities (International Comparison Programme)
  GEKS index is transitive by construction
  - Chained index equals direct index
  - No chain drift
  A geometric mean of direct superlative price indexes

36 New methodology, cont.
- Bilateral indexes (Törnqvist or Fisher) between entities j and l (l=1..M) and between entities k and l, respectively
- Purchasing power parities: entity is country
- Scanner data: entity is month
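
The formula that this caption describes is not legible here. For reference, the GEKS index is usually written as below; this is the standard textbook form, given as an assumption about what the slide showed. Here P^{jl} denotes the bilateral index with entity j as base and entity l as comparison.

```latex
P_{\mathrm{GEKS}}^{jk}
  = \prod_{l=1}^{M} \left( \frac{P^{jl}}{P^{kl}} \right)^{1/M}
  = \prod_{l=1}^{M} \left( P^{jl} \, P^{lk} \right)^{1/M}
```

The second equality holds when the bilateral index satisfies the time reversal test (P^{kl} = 1/P^{lk}), as Törnqvist and Fisher do.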

37 New methodology, cont.
- Expanding the time period leads to revising all previous GEKS indexes
- Solution: rolling version (chaining) etc.
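
A compact sketch of GEKS and a rolling variant is given below, assuming Törnqvist as the bilateral index. The data are invented, the function names are hypothetical, and the default window of 13 months is an assumption (the slide does not state a window length). The rolling version fixes the first window and then extends the series by chaining the latest month-to-month GEKS link from a window that moves forward, so earlier values are not revised.

```python
import math

def tornqvist(p0, q0, p1, q1):
    """Bilateral Törnqvist index from period 0 to period 1 over matched items."""
    items = [k for k in p0 if k in p1]
    e0 = sum(p0[k] * q0[k] for k in items)
    e1 = sum(p1[k] * q1[k] for k in items)
    return math.exp(sum(
        0.5 * (p0[k] * q0[k] / e0 + p1[k] * q1[k] / e1) * math.log(p1[k] / p0[k])
        for k in items))

def geks(prices, quantities, base, current, window):
    """GEKS index from month `base` to `current`, linked via each month in `window`."""
    log_sum = 0.0
    for l in window:
        to_link = tornqvist(prices[base], quantities[base], prices[l], quantities[l])
        from_link = tornqvist(prices[l], quantities[l], prices[current], quantities[current])
        log_sum += math.log(to_link * from_link)
    return math.exp(log_sum / len(window))

def rolling_geks(prices, quantities, window_len=13):
    """Rolling GEKS: full GEKS on the first window, then chained links."""
    n = len(prices)
    first = list(range(min(window_len, n)))
    index = [geks(prices, quantities, 0, t, first) for t in first]
    for t in range(window_len, n):
        window = list(range(t - window_len + 1, t + 1))
        index.append(index[-1] * geks(prices, quantities, t - 1, t, window))
    return index

# Invented data: a sale on item A in month 1, after which everything
# returns to the month-0 situation by month 3.
prices = [{"A": 2.0, "B": 2.0}, {"A": 1.0, "B": 2.0},
          {"A": 2.0, "B": 2.0}, {"A": 2.0, "B": 2.0}]
quantities = [{"A": 10, "B": 10}, {"A": 50, "B": 10},
              {"A": 2, "B": 10}, {"A": 10, "B": 10}]
months = list(range(len(prices)))

direct = geks(prices, quantities, 0, 3, months)
chained_tornqvist = 1.0
for t in range(1, len(prices)):
    chained_tornqvist *= tornqvist(prices[t - 1], quantities[t - 1], prices[t], quantities[t])

print(f"GEKS 0->3:              {direct:.4f}")
print(f"chained Törnqvist 0->3: {chained_tornqvist:.4f}")
print(rolling_geks(prices, quantities, window_len=3))
```

On this toy data the GEKS index returns to 1 while the monthly chained Törnqvist ends at about 0.89, which is the kind of chain drift the method is meant to remove.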

38 RYGEKS and NL
- RYGEKS specifically developed for Statistics Netherlands as a remedy for not weighting at elementary level
  Not (yet) applied in practice
  Used as benchmark
  - Finding optimal threshold value
  Current method (NL) resembles RYGEKS quite well (on average)
  - No bias found

39 RYGEKS and NL: Illustrations



42 RYGEKS and NL, cont.
- Plans for near future:
  Shadow system based on RYGEKS indexes
  Continuous benchmark for current method
  Implementation when RYGEKS is widely accepted?
  - More (international) analysis needed

43 RYGEKS and N
- RYGEKS indexes tested on Norwegian scanner data at different levels: EAN, elementary and aggregated COICOP levels
- For COICOP 01, a monthly chained Törnqvist index was compared with a monthly chained RYGEKS index
- The results indicate some bias in the Törnqvist index

44 RYGEKS and N, cont.
- Small deviations for many COICOP aggregates
  Milk, cheese and eggs; Oils and fats; Vegetables; Fish

45 RYGEKS and N, cont.
- While others show larger deviations
  Meat; Sugar, jam and chocolate

46 RYGEKS and N, cont.

47
- Causes of bias:
  Missing prices
  Seasonal items (not excluded)
  Price and quantity oscillating over time
- Shadow system for calculating RYGEKS indexes on a monthly basis established
  Too early to be implemented

48 Scanner data in other branches?
- NL: expanding to other branches desirable
  Data available (e.g. durables)
  Problem of quality changes
  Analysis needed
- N: continuously working to expand scanner data
  - Increasing pressure from chains and outlets
  Data available for pharmaceutical products, wine and spirits (state monopoly) and petrol
  - Mostly price information implemented
  Have tried to cover clothing, but the matched-item model was unsuccessful

