Backcasting
United Nations Statistics Division
Overview
Any change in a classification creates a break in time series, because the series are suddenly based on differently defined categories.
Backcasting is the process of describing data collected before the “break” in terms of the new classification.
Overview
There is no single “best method”. Factors influencing the decision include:
– type of statistical series that requires backcasting (raw data, aggregates, indices, growth rates, ...)
– statistical domain of the time series
– availability of micro-data
– availability of "dual-coded" micro-data (i.e. businesses are classified according to both the old and the new classification)
– length of the "dual-coded" period
– frequency of the existing time series
– required level of detail of the backcast series
– cost / resource considerations
Main methods
– “Micro-data approach” (re-working of individual data)
– “Macro-data approach” (proportional approach)
– Hybrids thereof
Micro-data approach
– Consists of assigning a new activity code (= new classification) to all units in every past period (as far back as backcasting is desired)
– No other change is required
– Statistics are then compiled by standard aggregation
– Census vs. survey (weight adjustment issue)
Micro-data approach
Census
– All in-scope units are selected and therefore have a weight of one.
– Each unit is recoded, and the re-aggregation can then take place.
Survey
– The non-observed units in the population influence the outcome via the sampling weights.
– Therefore all units in the population (both observed and non-observed) need to be coded.
– Re-aggregation of the sample units under the new classification can then occur.
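A minimal Python sketch of the recode-and-re-aggregate step, assuming every unit record already carries its new code. The function names (`aggregate_by_new_code`, `poststratified_weights`) and all records are hypothetical. For the survey case, the weight adjustment is assumed to be a simple post-stratification: each sampled unit's weight is the population count for its new code divided by the sample count for that code. This only works if every population unit, observed or not, has been recoded.

```python
from collections import Counter, defaultdict

def aggregate_by_new_code(units, weights):
    """Re-aggregate one past period under the new classification:
    weighted sum of turnover by new code."""
    totals = defaultdict(float)
    for unit, weight in zip(units, weights):
        totals[unit["new_code"]] += weight * unit["turnover"]
    return dict(totals)

def poststratified_weights(sample, population_new_codes):
    """Assumed weight adjustment: weight = N_g / n_g, where N_g counts population
    units recoded to new code g and n_g counts sampled units with new code g.
    This needs a new code for every population unit, observed or not."""
    population_counts = Counter(population_new_codes)
    sample_counts = Counter(u["new_code"] for u in sample)
    return [population_counts[u["new_code"]] / sample_counts[u["new_code"]] for u in sample]

# Census: every in-scope unit is recoded and has weight 1 (hypothetical records).
census = [
    {"old_code": "1A", "new_code": "1B", "turnover": 100.0},
    {"old_code": "1A", "new_code": "2B", "turnover": 50.0},
    {"old_code": "2A", "new_code": "2B", "turnover": 30.0},
]
census_totals = aggregate_by_new_code(census, [1.0] * len(census))

# Survey: 5 sampled units from a population of 13 units that all carry a new code.
sample = [
    {"old_code": "1A", "new_code": "1B", "turnover": 10.0},
    {"old_code": "1A", "new_code": "1B", "turnover": 20.0},
    {"old_code": "1A", "new_code": "2B", "turnover": 5.0},
    {"old_code": "2A", "new_code": "2B", "turnover": 7.0},
    {"old_code": "2A", "new_code": "2B", "turnover": 9.0},
]
population_new_codes = ["1B"] * 4 + ["2B"] * 9
survey_totals = aggregate_by_new_code(sample, poststratified_weights(sample, population_new_codes))
print(census_totals, survey_totals)
```

In a real survey the original design weights and strata would normally enter the adjustment. The simple ratio used here only shows why the non-observed units need new codes as well.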
Micro-data approach
– Requires detailed information from past periods (for all units to be recoded)
  – More detailed than just the old code
– If the information is available, results are more reliable than those from macro-data approaches
Micro-data approach
Issues:
– Resource intensive
– Solutions are needed if unit information is not available for a period (not collected, no response):
  – Nearest neighbor: the elementary unit is back-calculated in the same way as its “closest unit”
  – Transition matrix approach: conversion coefficients are used at the elementary level
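The two fallback options can be sketched as follows. The notion of "closest" used here (same old code, nearest employment) and the coefficient values are assumptions for illustration; a statistical office would choose its own matching variables. Both function names are hypothetical.

```python
def nearest_neighbour_code(unit, donors):
    """Give a unit without usable detail the new code of its 'closest' fully coded unit.
    Assumed closeness: same old code, smallest absolute difference in employment."""
    candidates = [d for d in donors if d["old_code"] == unit["old_code"]]
    if not candidates:
        raise ValueError(f"no donor with old code {unit['old_code']}")
    closest = min(candidates, key=lambda d: abs(d["employment"] - unit["employment"]))
    return closest["new_code"]

def transition_matrix_split(unit, coefficients):
    """Spread the unit's turnover over new codes using the conversion
    coefficients of its old code (elementary-level transition matrix)."""
    return {new: share * unit["turnover"] for new, share in coefficients[unit["old_code"]].items()}

# Hypothetical fully coded donor units and one unit lacking detail for the period.
donors = [
    {"old_code": "1A", "new_code": "1B", "employment": 10},
    {"old_code": "1A", "new_code": "2B", "employment": 200},
    {"old_code": "2A", "new_code": "2B", "employment": 12},
]
unit = {"old_code": "1A", "employment": 15, "turnover": 80.0}
coefficients = {"1A": {"1B": 0.75, "2B": 0.25}}  # hypothetical conversion coefficients

print(nearest_neighbour_code(unit, donors))
print(transition_matrix_split(unit, coefficients))
```

The nearest-neighbour option assigns the whole unit to a single new code. The transition-matrix option instead splits its value in proportion to the coefficients.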
Macro-data approach
– Also called the “proportional method”
– Calculates ratios (“proportions”, “conversion coefficients”) in a fixed dual-coding period and then applies them to all previous periods
– The ratios are calculated at the macro level
– They can be based on the number of units (counts) or on size variables such as turnover or employment
– Results are more approximate
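A sketch of estimating conversion coefficients over a dual-coding period, either from unit counts or from a size variable. The register records and `conversion_coefficients` are hypothetical.

```python
from collections import defaultdict

def conversion_coefficients(dual_coded, size_var=None):
    """beta[old][new]: share of an old code's units (size_var=None) or of its
    size-variable total (e.g. size_var="employment") falling in each new code."""
    totals = defaultdict(lambda: defaultdict(float))
    for unit in dual_coded:
        totals[unit["old_code"]][unit["new_code"]] += 1.0 if size_var is None else unit[size_var]
    beta = {}
    for old, by_new in totals.items():
        old_total = sum(by_new.values())
        beta[old] = {new: value / old_total for new, value in by_new.items()}
    return beta

register = [  # hypothetical dual-coded units (old and new code known)
    {"old_code": "1A", "new_code": "1B", "employment": 30},
    {"old_code": "1A", "new_code": "1B", "employment": 10},
    {"old_code": "1A", "new_code": "2B", "employment": 60},
    {"old_code": "2A", "new_code": "2B", "employment": 5},
]
by_count = conversion_coefficients(register)
by_employment = conversion_coefficients(register, "employment")
print(by_count)
print(by_employment)
```

With counts, 1A splits 2/3 to 1B and 1/3 to 2B. Weighted by employment, the same units give 0.4 and 0.6, which is why the choice of variable matters.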
Macro-data approach
– In its simple form, applies the growth rates of the former time series to the revised level for the whole historical period
– More sophisticated methods may use adjustments based on experts' knowledge
  – Example: mobile phones
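The simple form can be sketched as carrying the old series' growth rates backwards from the revised level. `backcast_by_growth_rates` and the numbers are hypothetical. Because the growth rates are preserved, this is equivalent to scaling every old value by the ratio of the revised level to the old level in the link period.

```python
def backcast_by_growth_rates(old_series, new_level):
    """Backcast the revised series from its level in the link period (new_level)
    by applying the old series' period-to-period growth rates backwards.
    old_series[-1] must refer to the same period as new_level."""
    backcast = [new_level]
    for t in range(len(old_series) - 1, 0, -1):
        growth = old_series[t] / old_series[t - 1]  # old-series growth from t-1 to t
        backcast.append(backcast[-1] / growth)
    return backcast[::-1]  # chronological order

old = [80.0, 90.0, 100.0]                       # hypothetical old-classification series
revised = backcast_by_growth_rates(old, 120.0)  # 120.0 = revised level in the last period
print(revised)
```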
Macro-data approach
– Assumes that the same set of coefficients applies to all periods
  – i.e. it assumes that the way the variable of interest is distributed from the old categories to the new ones has not changed over time
– Applied to aggregates; does not consider micro-data
– Relatively simple and cheap to implement
Macro-data approach: Steps
1 – Estimation of conversion coefficients
  – Done for the dual-coding period
  – Longer/multiple periods help in overcoming “infant problems” of the new classification and allow for correction of data
  – Based on the selection of a specific variable
2 – Calculation of aggregates using the conversion coefficients
  – Weighted linear combination
3 – Linking the different segments
  – Old – overlap – new series
  – Breaks caused mainly by changes in the field of observation
    – Simple factor or “wedging”
4 – Final adjustment
  – Seasonal etc.
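Step 3 offers two ways of linking. The sketch below assumes that the last value of the old segment is the overlap period. It reads "wedging" as phasing the adjustment in linearly, with factor 1 at the start of the old segment and the full new/old ratio at the overlap; this is one possible reading, not a prescribed scheme. The function names and values are hypothetical.

```python
def link_simple_factor(old, new_at_overlap):
    """Rescale the whole old segment by the new/old ratio in the overlap period (old[-1])."""
    ratio = new_at_overlap / old[-1]
    return [x * ratio for x in old]

def link_wedge(old, new_at_overlap):
    """Phase the same ratio in linearly: factor 1 in the first period,
    the full ratio in the overlap period."""
    ratio = new_at_overlap / old[-1]
    k = len(old) - 1
    if k == 0:
        return [new_at_overlap]
    return [x * (1 + (ratio - 1) * t / k) for t, x in enumerate(old)]

old_segment = [100.0, 100.0, 100.0]  # hypothetical; last value is the overlap period
simple = link_simple_factor(old_segment, 110.0)
wedged = link_wedge(old_segment, 110.0)
print(simple, wedged)
```

The simple factor shifts the whole history by the same proportion. The wedge keeps the earliest values unchanged while still meeting the new series at the overlap. In both cases the linked segment is then followed by the new series after the overlap period.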
Macro-data approach: Hypothetical example
– Basics of conversion matrices, using a simple, artificial example
– Convert from classification A to classification B:
  – a = 3 codes in A (1A, 2A, 3A)
  – b = 5 codes in B (1B, 2B, 3B, 4B, 5B)
  – N (count of units) = 115
Dual-coded business register
Derive summary totals
[Tables: the dual-coded register cross-tabulated by new code (rows 1B to 5B) and old code (columns 1A, 2A, 3A), with totals, either as unit counts (N) or as employment (Emp).]
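Conversion matrix A to B: counts
[Tables: counts N by new code (rows 1B to 5B) and old code (columns 1A, 2A, 3A), and the corresponding conversion coefficients beta; each column of beta totals 1.]
– The conversion coefficient from 1A to 1B is the share of 1A units that are coded 1B.
– Conversion is via linear combination.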
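Conversion is via linear combinations, and the aggregate totals are the same under both classifications (each column of conversion coefficients sums to 1, so every old-code value is fully redistributed).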
Apply these proportions to each time point of the series.
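The conversion from A to B can be sketched with an illustrative matrix; the values are hypothetical, not those of the slides. Each column holds the conversion coefficients of one old code and sums to 1. Applying the same proportions at every time point therefore leaves the aggregate total unchanged. `convert` and the data are hypothetical.

```python
NEW_CODES = ["1B", "2B", "3B", "4B", "5B"]
# Hypothetical conversion matrix: rows = new codes, columns = old codes 1A, 2A, 3A.
# Each column sums to 1.
BETA = [
    [0.50, 0.25, 0.00],
    [0.50, 0.25, 0.00],
    [0.00, 0.50, 0.20],
    [0.00, 0.00, 0.30],
    [0.00, 0.00, 0.50],
]

def convert(old_values):
    """Linear combination: value of new code j = sum over i of BETA[j][i] * value of old code i."""
    return [sum(b * x for b, x in zip(row, old_values)) for row in BETA]

# Hypothetical old-classification values (1A, 2A, 3A) for two earlier time points.
old_series = {1: [40.0, 80.0, 100.0], 2: [44.0, 84.0, 110.0]}
# Apply the same proportions to each time point.
new_series = {t: convert(values) for t, values in old_series.items()}
for t, values in new_series.items():
    print(t, dict(zip(NEW_CODES, values)), "total:", sum(values))
```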
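Example – ISIC Rev.3 to Rev.4
– Conversion at the Section level
– Denote the turnover (y) of ISIC Rev.3 Sections C, D and E and of out-of-scope units (Z) by $y^{(3)}_{C}, y^{(3)}_{D}, y^{(3)}_{E}, y^{(3)}_{Z}$
– Denote the turnover (y) of ISIC Rev.4 Sections B, C, D and E and of out-of-scope units (Z) by $y^{(4)}_{B}, y^{(4)}_{C}, y^{(4)}_{D}, y^{(4)}_{E}, y^{(4)}_{Z}$
  – Rev.3: C Mining and quarrying; D Manufacturing; E Electricity, gas and water supply
  – Rev.4: B Mining and quarrying; C Manufacturing; D Electricity, gas, steam and air conditioning supply; E Water supply; sewerage, waste management and remediation activities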
Conversion matrix
– The conversion coefficient from Rev.3 Section C to Rev.4 Section B is the share of Rev.3 Section C turnover that is classified in Rev.4 Section B.
Turnover: summary table (rows: Rev.4 sections; columns: Rev.3 sections; Z = out of scope)

Rev.4 \ Rev.3              C              D              E              Z          Total
B                 19,829,202              0              0         14,178     19,843,380
C                    211,632  1,297,621,607          2,142      4,975,276  1,302,810,657
D                          0        101,624    147,814,407     25,793,423    173,709,454
E                      6,834      7,712,001      8,342,747     18,977,634     35,039,216
Z                    101,654     44,961,905        783,298  3,152,252,617  3,198,099,474
Total             20,149,322  1,350,397,137    156,942,594  3,202,013,128  4,729,502,181

Example cell: 19,829,202 is the turnover of activities classified in the old classification as Rev.3 Section C and in the new classification as Rev.4 Section B.
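Conversion matrix: weights (each cell of the turnover summary table divided by its Rev.3 column total, in %, rounded)

Rev.4 \ Rev.3        C        D        E        Z
B                98.41     0.00     0.00     0.00
C                 1.05    96.09     0.00     0.16
D                 0.00     0.01    94.18     0.81
E                 0.03     0.57     5.32     0.59
Z                 0.50     3.33     0.50    98.45
Total           100.00   100.00   100.00   100.00

– Of the Rev.3 Section C activities, 98.41% are reclassified to Rev.4 Section B, 1.05% to Rev.4 Section C, and so on.
– Rev.4 Section C activities are a combination of 1.05% of Rev.3 Section C, 96.09% of Rev.3 Section D, and 0.16% of the Rev.3 activities that do not belong to the Rev.3 industrial sector (Z).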
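Conversion via linear combination
Equations for converting the total series from Rev.3 to Rev.4, where $y^{(3)}_{i}$ is the turnover of Rev.3 Section $i$, $y^{(4)}_{j}$ is the turnover of Rev.4 Section $j$, and $\beta_{ji}$ is the conversion coefficient (weight) from Rev.3 Section $i$ to Rev.4 Section $j$:
$$y^{(4)}_{j} = \beta_{jC}\, y^{(3)}_{C} + \beta_{jD}\, y^{(3)}_{D} + \beta_{jE}\, y^{(3)}_{E} + \beta_{jZ}\, y^{(3)}_{Z}, \qquad j \in \{B, C, D, E, Z\}$$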
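The section-level example can be checked in a few lines. The sketch below builds the weights from the turnover summary table and applies the linear combination. The cells of the Rev.4 Section B row are garbled in the slide; the values used here (19,829,202 for Rev.3 C, 14,178 for Z, 0 for D and E) were derived from the row and column totals. `conversion_matrix` and `to_rev4` are hypothetical names.

```python
REV3 = ["C", "D", "E", "Z"]
REV4 = ["B", "C", "D", "E", "Z"]
# Turnover summary table: rows = Rev.4 sections, columns = Rev.3 sections (Z = out of scope).
TURNOVER = [
    [19_829_202, 0, 0, 14_178],
    [211_632, 1_297_621_607, 2_142, 4_975_276],
    [0, 101_624, 147_814_407, 25_793_423],
    [6_834, 7_712_001, 8_342_747, 18_977_634],
    [101_654, 44_961_905, 783_298, 3_152_252_617],
]

def conversion_matrix(table):
    """beta[j][i] = turnover in (Rev.4 j, Rev.3 i) / total Rev.3 i turnover; columns sum to 1."""
    col_totals = [sum(row[i] for row in table) for i in range(len(table[0]))]
    return [[cell / col_totals[i] for i, cell in enumerate(row)] for row in table]

def to_rev4(beta, rev3_values):
    """Linear combination: y4[j] = sum over i of beta[j][i] * y3[i]."""
    return [sum(b * y for b, y in zip(row, rev3_values)) for row in beta]

BETA = conversion_matrix(TURNOVER)
for code, row in zip(REV4, BETA):
    print(code, [f"{100 * b:.2f}" for b in row])

# Converting the dual-coded period's own Rev.3 totals reproduces the Rev.4 totals.
rev3_totals = [sum(row[i] for row in TURNOVER) for i in range(len(REV3))]
rev4_converted = to_rev4(BETA, rev3_totals)
```

For an earlier period, `to_rev4(BETA, ...)` would be called with that period's Rev.3 section totals, in the order C, D, E, Z.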
Comparison
– Micro-data approach better retains the structural evolution of the economy
– Micro-data approach does not require the choice of a specific variable
– Macro-data approach reflects evolution based on a fixed ratio for a fixed variable
  – Seasonal patterns may be distorted
– Macro-data approach is more cost-efficient
  – No consideration of micro-data necessary
– Assumptions underlying the macro-data approach become invalid over longer periods
  – “Benchmark years” might help to measure the effect, if data are available
Other options
– Combinations of both approaches are possible
  – Ratios for the macro-data approach could be calculated for shorter periods only
  – Micro-data approach could be used for specific years and the macro-data approach for interpolation between these years
    – E.g. based on the availability of census data
– Many factors can influence the choice (see the beginning), but data availability is a key practical factor
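One way to combine the approaches, sketched under stated assumptions. Coefficients are computed with the micro-data approach in two benchmark (e.g. census) years and linearly interpolated for the years in between. They are then applied macro-style. The years, codes, coefficients and function names are hypothetical, and linear interpolation is only one possible choice.

```python
def interpolate_coefficients(beta_start, beta_end, year_start, year_end, year):
    """Linearly interpolate conversion coefficients between two benchmark years
    whose coefficients came from the micro-data approach."""
    w = (year - year_start) / (year_end - year_start)
    result = {}
    for old in beta_start:
        new_codes = set(beta_start[old]) | set(beta_end[old])
        result[old] = {
            new: (1 - w) * beta_start[old].get(new, 0.0) + w * beta_end[old].get(new, 0.0)
            for new in new_codes
        }
    return result

def apply_coefficients(beta, old_totals):
    """Macro step: new-code totals as linear combinations of old-code totals."""
    new_totals = {}
    for old, value in old_totals.items():
        for new, share in beta[old].items():
            new_totals[new] = new_totals.get(new, 0.0) + share * value
    return new_totals

# Hypothetical benchmark coefficients for two census years.
beta_2000 = {"1A": {"1B": 0.8, "2B": 0.2}, "2A": {"2B": 1.0}}
beta_2010 = {"1A": {"1B": 0.6, "2B": 0.4}, "2A": {"2B": 1.0}}
beta_2005 = interpolate_coefficients(beta_2000, beta_2010, 2000, 2010, 2005)
new_2005 = apply_coefficients(beta_2005, {"1A": 100.0, "2A": 50.0})
print(beta_2005, new_2005)
```

Each interpolated set is a weighted average of two sets that each sum to 1. As a result, the interpolated coefficients of every old code still sum to 1, and totals are preserved.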