A Review of Benchmarking Methods
G Brown, N Parkin, and N Stuttard, ONS

Overview
  Introduction
  What is benchmarking?
  What we did and why
  Some methods for benchmarking
  Some quality measures
  Comparison of methods
  Summary

Introduction
  Purpose: to recommend a benchmarking method to ONS and the wider GSS
  Benchmarking combines two time series of the same phenomenon, measured at different frequencies
  Result: the benchmarked series is of higher quality
  Work funded by the Quality Improvement Fund

What we did and why
  Identified appropriate benchmarking methods
  Tested using several hundred ONS time series
  Used range of quality measures to rank methods
  Made judgment to combine results from different quality measures
  Recommended a benchmarking method
  Update of ONS computer systems prompted examination of methods

Benchmarking
  Want good estimates of levels and growth
  Have two series measuring the same phenomenon at different frequencies
  Higher frequency: more timely, accurate growths
    o Indicator series
  Lower frequency: delayed, more accurate levels
    o Benchmark series

Benchmarking
  Resulting high frequency series
    o Benchmarked series
  Has good estimates of growth combined with good estimates of level

Benchmarking
  Two types of relation between indicator and benchmark:
    o Point in time
    o Average

Benchmarking, point in time
  Example: unemployment, monthly and quarterly
  Benchmarks apply to the third month in each quarter
  Third monthly estimate in each quarter is forced to equal the benchmark

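A minimal sketch of the point-in-time constraint, assuming monthly data benchmarked to quarterly values. The function name and the linear interpolation of adjustment ratios between benchmarked months are illustrative assumptions, not one of the methods reviewed here.

```python
def benchmark_point_in_time(indicator, benchmarks, period=3):
    """Force every `period`-th indicator value to equal its benchmark.

    Adjustment ratios are known at the benchmarked months and are linearly
    interpolated in between (an illustrative choice only). Assumes the
    indicator is non-zero at the benchmarked months.
    """
    # 0-based positions of the benchmarked months: 2, 5, 8, ...
    knots = [period * (q + 1) - 1 for q in range(len(benchmarks))]
    ratios = [b / indicator[k] for b, k in zip(benchmarks, knots)]

    result = []
    for t, value in enumerate(indicator):
        if t <= knots[0]:
            r = ratios[0]                 # before the first benchmark: hold first ratio
        elif t >= knots[-1]:
            r = ratios[-1]                # after the last benchmark: carry last ratio forward
        else:
            q = (t - knots[0]) // period  # index of the knot at or before t
            left, right = knots[q], knots[q + 1]
            w = (t - left) / (right - left)
            r = (1 - w) * ratios[q] + w * ratios[q + 1]
        result.append(value * r)
    return result


# Toy numbers, made up for illustration
monthly = [100.0, 102.0, 104.0, 103.0, 105.0, 108.0, 107.0]
quarterly = [110.0, 112.0]                # benchmarks for months 3 and 6
benchmarked = benchmark_point_in_time(monthly, quarterly)
```

In this toy run the third and sixth monthly values equal the two benchmarks, and the seventh month, which has no benchmark yet, reuses the last ratio.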
Benchmarking, average
  Example: turnover, monthly and quarterly
  Benchmarks apply to each month in each quarter
  Average turnover of the three months in each quarter is forced to equal the benchmark

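A minimal sketch of the average constraint, using simple pro rata scaling within each quarter. Pro rata scaling is an assumption chosen for brevity; it is not one of the methods compared in this review, and it can introduce steps between the last month of one quarter and the first month of the next.

```python
def benchmark_average_pro_rata(indicator, benchmarks, period=3):
    """Scale each quarter's months so that their average equals the benchmark."""
    result = []
    for q, benchmark in enumerate(benchmarks):
        block = indicator[q * period:(q + 1) * period]
        factor = benchmark / (sum(block) / len(block))
        result.extend(value * factor for value in block)
    return result


# Toy numbers, made up for illustration
monthly = [10.0, 12.0, 14.0, 13.0, 15.0, 17.0]
quarterly = [13.0, 14.0]
benchmarked = benchmark_average_pro_rata(monthly, quarterly)
```

The two quarterly averages of `benchmarked` match `quarterly`, but the adjustment factor jumps from 13/12 to 14/15 at the quarter boundary, which is the kind of distortion the smoother methods try to avoid.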
Non-negativity
  Most indicator series must be non-negative
  In those cases the benchmarked series must be non-negative too
  Process of benchmarking can produce negative values in the benchmarked series

Benchmarking methods
  Methods suggested by ONS, variants with different splines
    o proc Expand (in SAS)
    o INTER
    o Kruger
  Denton
  Cholette-Dagum
  Constrained versions of the above for non-negativity

ONS methods (and variants)
  Summary: fits smooth curve through knots
  1. Aggregate indicator series
  2. Calculate ratio of aggregated to benchmark
  3. Augment with fore/backcasts using X-12-ARIMA
  4. Interpolate to frequency of indicator
  5. Multiply indicator by interpolated series
  6. Iterate 1 to 5
  Variants use different ways to interpolate

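The six steps can be sketched roughly as follows for the average case. Several simplifications are assumptions of this sketch: the ratio is taken as benchmark divided by aggregated indicator (so that the multiplication in step 5 moves the indicator towards the benchmark), linear interpolation stands in for the cubic splines, and holding the end ratios flat stands in for X-12-ARIMA fore/backcasts. All function names are hypothetical.

```python
def interpolate_linear(knot_positions, knot_values, n):
    """Interpolate knot values to positions 0..n-1, holding end values flat."""
    out, j = [], 0
    for t in range(n):
        if t <= knot_positions[0]:
            out.append(knot_values[0])
        elif t >= knot_positions[-1]:
            out.append(knot_values[-1])
        else:
            while knot_positions[j + 1] < t:
                j += 1
            left, right = knot_positions[j], knot_positions[j + 1]
            w = (t - left) / (right - left)
            out.append((1 - w) * knot_values[j] + w * knot_values[j + 1])
    return out


def ratio_benchmark(indicator, benchmarks, period=3, max_iter=50, tol=1e-10):
    """Iterative ratio benchmarking for the average case (sketch only)."""
    n_bench = len(benchmarks)
    # One knot per benchmark, placed at the middle month of its quarter
    knots = [q * period + period // 2 for q in range(n_bench)]
    series = list(indicator)
    for _ in range(max_iter):
        # Step 1: aggregate the current series to the benchmark frequency
        means = [sum(series[q * period:(q + 1) * period]) / period
                 for q in range(n_bench)]
        # Step 2: ratio of benchmark to aggregate (orientation assumed)
        ratios = [b / m for b, m in zip(benchmarks, means)]
        if max(abs(r - 1.0) for r in ratios) < tol:
            break                          # constraint met, stop iterating
        # Steps 3 and 4: extend flat at both ends, interpolate to monthly
        monthly_ratios = interpolate_linear(knots, ratios, len(series))
        # Step 5: multiply the series by the interpolated ratios
        series = [v * r for v, r in zip(series, monthly_ratios)]
        # Step 6: loop back to step 1
    return series


# Toy numbers, made up for illustration; the last two months have no benchmark
monthly = [10.0, 11.0, 12.0, 12.0, 13.0, 14.0, 13.0, 15.0, 16.0, 16.0, 17.0]
quarterly = [12.0, 13.5, 15.5]
benchmarked = ratio_benchmark(monthly, quarterly)
```

Iteration is needed because a single pass of interpolated ratios does not reproduce the quarterly averages exactly; each pass shrinks the remaining discrepancy.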
Interpolation
  Three types of cubic spline
  1. Proc Expand (point in time/average)
  2. INTER (average)
  3. Kruger (point in time)
  Progressively less prone to produce negative values

Denton type
  Summary: try to preserve movements in the indicator
  Minimise a penalty function of the differences or relative differences between indicator and benchmarked series
  Minimisation uses either special methods or off-the-shelf methods for quadratic minimisation
  Denton is usually set up to minimise first differences or proportionate first differences

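A minimal sketch of an additive first-difference Denton-type benchmark for the average case, written as a quadratic minimisation with equality constraints and solved through its first-order (KKT) linear system. The plain Gaussian elimination solver and all names are assumptions of this sketch; a production system would use an off-the-shelf solver.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def denton_additive(indicator, benchmarks, period=3):
    """Minimise the sum of squared first differences of the adjustment
    d = benchmarked - indicator, subject to period averages = benchmarks."""
    n, k = len(indicator), len(benchmarks)
    # Q = D'D, where D takes first differences of the adjustment
    q = [[0.0] * n for _ in range(n)]
    for t in range(1, n):
        q[t][t] += 1.0
        q[t - 1][t - 1] += 1.0
        q[t][t - 1] -= 1.0
        q[t - 1][t] -= 1.0
    # A averages the months within each benchmarked period
    a = [[1.0 / period if j * period <= t < (j + 1) * period else 0.0
          for t in range(n)] for j in range(k)]
    # KKT system: [2Q A'; A 0] [d; lambda] = [0; benchmark - A * indicator]
    kkt = [[2.0 * q[i][j] for j in range(n)] + [a[r][i] for r in range(k)]
           for i in range(n)]
    kkt += [a[r][:] + [0.0] * k for r in range(k)]
    gap = [b - sum(a[r][t] * indicator[t] for t in range(n))
           for r, b in enumerate(benchmarks)]
    d = solve_linear(kkt, [0.0] * n + gap)[:n]
    return [z + adj for z, adj in zip(indicator, d)]


# Toy numbers, made up for illustration; months 7 and 8 have no benchmark
monthly = [10.0, 11.0, 12.0, 12.0, 13.0, 14.0, 13.0, 15.0]
quarterly = [12.0, 14.0]
benchmarked = denton_additive(monthly, quarterly)
```

The proportionate variant would penalise changes in the ratio of benchmarked to indicator instead of changes in their difference.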
Denton and Cholette-Dagum
  For indicator points with no benchmark:
    o Denton carries forward the most recent difference between benchmark and indicator
    o Cholette-Dagum assumes the difference decays to zero in a defined way
  Cholette-Dagum is flexible in the way this decay is modelled
  We assume:
    o Decay is geometric
    o Rate of decay fixed in advance for all series

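The contrast in treating unbenchmarked points can be illustrated with a small sketch. It shows only the extrapolation of the last difference, not the full Cholette-Dagum estimation. The function name is hypothetical, and reading the 0.8 in "Cholette-Dagum (0.8)" as the geometric decay rate is an assumption.

```python
def extrapolate_difference(last_difference, horizon, decay=1.0):
    """Project the last benchmarked-minus-indicator difference forward.

    decay=1.0 carries the difference forward unchanged (Denton-like);
    0 < decay < 1 shrinks it geometrically towards zero (Cholette-Dagum-like).
    """
    return [last_difference * decay ** h for h in range(1, horizon + 1)]


# Toy value, made up for illustration: the last difference is 2.0
denton_like = extrapolate_difference(2.0, 4)
cholette_dagum_like = extrapolate_difference(2.0, 4, decay=0.8)
```

With decay 0.8 the projected differences are roughly 1.6, 1.28, 1.02 and 0.82, so later unbenchmarked points revert towards the indicator.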
Non-negativity
  ONS suggestion:
    o Benchmark on log scale
    o Exponentiate
    o Distribute residual differences
  Optimisation approach for Denton type:
    o Set up basic method as a matrix problem
    o Add constraints as part of matrix setup
    o Solve using off-the-shelf optimiser in SAS

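A minimal sketch of the log-scale idea for the average case. The toy additive benchmarker and the proportional way of distributing the residual are assumptions made to keep the example short; the source does not say how ONS distributes residual differences. Inputs must be strictly positive for the logarithm to exist.

```python
import math


def additive_shift(indicator, benchmarks, period=3):
    """Toy additive benchmarker: shift each period so its mean equals the benchmark."""
    out = []
    for q, benchmark in enumerate(benchmarks):
        block = indicator[q * period:(q + 1) * period]
        shift = benchmark - sum(block) / len(block)
        out.extend(value + shift for value in block)
    return out


def benchmark_on_log_scale(indicator, benchmarks, period=3):
    # 1. Benchmark on the log scale (strictly positive inputs required)
    logged = additive_shift([math.log(v) for v in indicator],
                            [math.log(b) for b in benchmarks], period)
    # 2. Exponentiate: every value is now positive
    positive = [math.exp(v) for v in logged]
    # 3. Distribute the residual: the mean of exponentials is not the
    #    exponential of the mean, so rescale each period proportionally
    out = []
    for q, benchmark in enumerate(benchmarks):
        block = positive[q * period:(q + 1) * period]
        factor = benchmark / (sum(block) / len(block))
        out.extend(value * factor for value in block)
    return out


# Toy numbers, made up for illustration
monthly = [1.0, 5.0, 9.0]
quarterly = [3.0]
plain = additive_shift(monthly, quarterly)              # first value goes negative
on_logs = benchmark_on_log_scale(monthly, quarterly)    # all values stay positive
```

Here the plain additive adjustment pushes the first month below zero, while the log-scale route meets the same average with positive values throughout.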
Time series used for testing
  Mixture of:
    o Monthly to quarterly
    o Quarterly to annual
    o Average and point in time
  Different lengths
  Included some awkward series (to test non-negativity)

How the methods were compared
  1. Failures: program fails to benchmark
  2. Verification of benchmarking constraint: benchmarked not equal to benchmark
  3. Preserving change: size and direction
  4. Revisions: size and bias when perturbing or adding a benchmark
  5. Smoothness: relative variance of indicator and benchmarked
  6. Closeness: between indicator and benchmarked

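Some of these measures might be computed along the following lines. The exact formulas used in the review are not given in the slides, so every definition below is an assumption: the constraint check uses the average case, smoothness is taken as a ratio of variances of first differences, and closeness as a mean absolute relative difference. Revisions are omitted because they require re-running a method with perturbed or added benchmarks.

```python
from statistics import mean, pvariance


def diffs(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def constraint_gap(benchmarked, benchmarks, period=3):
    """Largest absolute gap between period averages and benchmarks."""
    return max(abs(sum(benchmarked[q * period:(q + 1) * period]) / period - b)
               for q, b in enumerate(benchmarks))


def change_preservation(indicator, benchmarked):
    """Mean absolute gap in period-to-period change, and share of
    periods in which both series move in the same direction."""
    di, db = diffs(indicator), diffs(benchmarked)
    size = mean(abs(x - y) for x, y in zip(di, db))
    direction = mean(1.0 if x * y > 0 or x == y == 0 else 0.0
                     for x, y in zip(di, db))
    return size, direction


def smoothness(indicator, benchmarked):
    """Variance of benchmarked changes relative to indicator changes."""
    return pvariance(diffs(benchmarked)) / pvariance(diffs(indicator))


def closeness(indicator, benchmarked):
    """Mean absolute relative difference between the two series."""
    return mean(abs(b - i) / abs(i) for i, b in zip(indicator, benchmarked))


# Toy numbers, made up for illustration: benchmarked is the indicator plus 10%
indicator = [10.0, 12.0, 14.0, 13.0, 15.0, 17.0]
benchmarked = [v * 1.1 for v in indicator]
gap = constraint_gap(benchmarked, [13.2, 16.5])
size, direction = change_preservation(indicator, benchmarked)
smooth = smoothness(indicator, benchmarked)
close = closeness(indicator, benchmarked)
```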
How the methods were compared
  For each of preserving change, revisions, smoothness and closeness:
    o Calculate the measure for each method, for each time series, for different lengths of the series
    o Rank methods for each series and length
    o Average the ranks over all series
    o Plot and compare average ranks by length

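The ranking step can be sketched as below, assuming a lower score is better and ignoring ties. The data layout, the function name, and the method labels A, B and C are hypothetical, and the scores are made up purely to show the mechanics.

```python
from collections import defaultdict


def average_ranks(scores):
    """scores[(series, length)][method] holds one quality measure.

    Returns {length: {method: average rank over all series}}.
    """
    ranks = defaultdict(lambda: defaultdict(list))
    for (series, length), per_method in scores.items():
        # Rank the methods for this series and length (1 = best)
        ordered = sorted(per_method, key=per_method.get)
        for rank, method in enumerate(ordered, start=1):
            ranks[length][method].append(rank)
    # Average the ranks over all series, separately for each length
    return {length: {m: sum(r) / len(r) for m, r in methods.items()}
            for length, methods in ranks.items()}


# Made-up scores for three placeholder methods
scores = {
    ("s1", 24): {"A": 0.2, "B": 0.5, "C": 0.3},
    ("s2", 24): {"A": 0.4, "B": 0.1, "C": 0.3},
    ("s1", 48): {"A": 0.1, "B": 0.2, "C": 0.3},
}
result = average_ranks(scores)
```

The resulting average ranks per length are what would be plotted against series length, one line per method.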
Recommended method
  Around 100 plots compared
  Judgment made on overall best performing method
  Based on good performance and lack of bad performance
  Recommended method: Cholette-Dagum (0.8)

Summary
  Aim: recommend method for benchmarking to ONS and wider GSS
  Update of ONS computer systems prompted examination of methods
  Used several quality measures to rank methods
  Made judgment to combine results from different quality measures
  Recommended: Cholette-Dagum (0.8)

Any questions?