Migration of a large survey onto a micro-economic platform
Val Cox
April 2014
Micro-economic Platform (MEP)
Standardises and automates processes
- Provides more efficient processing and more analysis
Enables Statistics NZ to gain more from available data
- Basic principle: use administrative data wherever possible, with surveys filling the gaps
- Objective: bring core information about every business in the economy into the Longitudinal Business Database, so that Statistics NZ can respond quickly to changing needs for economic statistics
Aim of paper
To discuss the challenges of building a non-response imputation package for a large survey on the MEP
- Rationalises the use of Banff for outlier detection and imputation, and of SEVANI (System for Estimation of Variance due to Nonresponse and Imputation) to estimate sampling and non-sampling errors
Annual Enterprise Survey (AES)
Provides statistics on the financial performance and position of New Zealand businesses
- Captures about 90% of New Zealand's GDP
Uses four major data sources
- Three administrative sources (covering 72% of the population)
- One postal survey
AES before MEP
Editing strategy of AES on MEP
Guided by the Methodological Standard for E&I
Key objective of standard
- Editing is fit-for-purpose and enables continuous improvement of processes and data quality
Key principles used
- Automate editing processes where possible
- Use Statistics NZ standard editing tools, wherever possible, to achieve standardisation
Editing system of AES in MEP
Uses Banff to automate and standardise editing and imputation processes
Uses analytical views to assess the quality of the edited data
Challenges and solutions
A. Sheer volume of data
- 28 questionnaires, 113 industries and 180 variables
Solution: Use of a “thin slice” approach
- Restrict dataset to one questionnaire and one industry to show all stages of E&I are working
- Once successful, expand dataset to include more industries until all 28 questionnaires are replicated
- Successful in determining optimal level of automation for correcting failed edits
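The “thin slice” idea can be illustrated with a minimal Python sketch. The record layout, the field names (`questionnaire`, `industry`, `sales`) and the codes `Q01`, `A01`, `B02` are hypothetical and are not taken from the MEP. The point is only that the same E&I stages can be exercised first on a narrow subset and then on progressively wider ones.

```python
def thin_slice(records, questionnaires, industries):
    """Keep only units whose questionnaire and industry fall inside the current slice."""
    return [
        r for r in records
        if r["questionnaire"] in questionnaires and r["industry"] in industries
    ]

# Hypothetical unit records; real AES data has 28 questionnaires and 113 industries.
records = [
    {"unit": 1, "questionnaire": "Q01", "industry": "A01", "sales": 120.0},
    {"unit": 2, "questionnaire": "Q01", "industry": "B02", "sales": 80.0},
    {"unit": 3, "questionnaire": "Q02", "industry": "A01", "sales": 45.0},
]

# First slice: one questionnaire, one industry.
first_slice = thin_slice(records, {"Q01"}, {"A01"})

# Once the E&I stages work on the first slice, widen it to more industries.
wider_slice = thin_slice(records, {"Q01"}, {"A01", "B02"})
```

Keeping the slice definition as two sets makes it cheap to widen the test dataset without touching the E&I steps themselves.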
Challenges and solutions
B. Determining which variable is erroneous when groups of variables must add or subtract to a total
- Banff “errorloc” procedure always recommends changing one variable by a large amount
- Change is done by “deterministic” procedure
Solution: Assign weights to variables
- Assign lower weights to more reliable variables so Banff doesn’t change their values
- Examples: totals, gross profit (since respondents use this to determine the tax they pay)
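A small Python sketch may help show why a one-variable change is always available when a balance edit fails. It is not Banff's errorloc algorithm and does not model the weights. It only lists, for a hypothetical edit of the form "components sum to the total", every single-variable change that would satisfy the edit. The variable names and values are invented for illustration.

```python
def single_variable_fixes(components, total_name, total, tol=0.5):
    """List every one-variable change that makes sum(components) equal the total.

    Returns an empty dict when the edit already passes within `tol`.
    """
    discrepancy = total - sum(components.values())
    if abs(discrepancy) <= tol:
        return {}
    # Changing any one component by the full discrepancy satisfies the edit...
    fixes = {name: value + discrepancy for name, value in components.items()}
    # ...and so does changing the total instead.
    fixes[total_name] = total - discrepancy
    return fixes

# Hypothetical failed edit: sales + other_income should equal total_income.
components = {"sales": 500.0, "other_income": 40.0}
fixes = single_variable_fixes(components, "total_income", 600.0)
```

Every candidate absorbs the whole discrepancy in a single variable, so some additional rule is needed to decide which candidate is acceptable. On the slide above, that role is played by the weights.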
Challenges and solutions
C. Outlier detection
- Old system detects outliers in 3 key variables but unlinks the whole unit (all variables)
- Banff does univariate outlier detection
Solution: Compared 2 E&I runs of data
- 1st run had only the 3 key variables set as outliers and 2nd run had all variables included in outlier steps
- Decision: Choose variables to be set as outliers based on the effect on the totals
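The comparison of two runs can be sketched as a per-variable comparison of totals. The Python below is an illustration under stated assumptions: the variable names, the values and the 5% review threshold are all hypothetical, and each list stands for the final unit values of one variable after an E&I run.

```python
def relative_change_in_totals(run_a, run_b):
    """Per-variable relative difference between the totals of two E&I runs."""
    changes = {}
    for var, values_a in run_a.items():
        total_a = sum(values_a)
        total_b = sum(run_b[var])
        # The relative change is undefined when the baseline total is zero.
        changes[var] = None if total_a == 0 else (total_b - total_a) / total_a
    return changes

# Hypothetical final values: run 1 treats only key variables as outliers,
# run 2 includes every variable in the outlier steps.
run_key_only = {"sales": [100.0, 200.0, 300.0], "interest": [10.0, 20.0, 30.0]}
run_all_vars = {"sales": [100.0, 200.0, 300.0], "interest": [10.0, 20.0, 15.0]}

changes = relative_change_in_totals(run_key_only, run_all_vars)

# Hypothetical rule: look more closely at variables whose total moves by more than 5%.
REVIEW_THRESHOLD = 0.05
to_review = sorted(
    var for var, change in changes.items()
    if change is not None and abs(change) > REVIEW_THRESHOLD
)
```

A table of this kind gives one number per variable, which makes it easier to decide variable by variable whether outlier treatment changes the published totals enough to matter.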
Challenges and solutions
D. Running imputation one variable at a time would have been very time-consuming
Solution: Group variables
- By imputation method (4 methods)
- By industry (some industries have different characteristics)
- By type of variable (e.g. some variables can be negative)
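The grouping can be pictured as building one batch of variables per combination of method, industry group and variable type, so that each batch is imputed in a single pass. In this Python sketch the method labels (`method_1`, `method_2`), the industry groups and the variable names are hypothetical, since the slides do not name the four methods.

```python
from collections import defaultdict

def group_variables(specs):
    """Group variable names by (imputation method, industry group, sign type)."""
    groups = defaultdict(list)
    for spec in specs:
        key = (spec["method"], spec["industry_group"], spec["can_be_negative"])
        groups[key].append(spec["name"])
    return dict(groups)

# Hypothetical variable specifications.
specs = [
    {"name": "sales", "method": "method_1", "industry_group": "general", "can_be_negative": False},
    {"name": "purchases", "method": "method_1", "industry_group": "general", "can_be_negative": False},
    {"name": "net_profit", "method": "method_1", "industry_group": "general", "can_be_negative": True},
    {"name": "interest_income", "method": "method_2", "industry_group": "finance", "can_be_negative": False},
]

groups = group_variables(specs)

# One imputation pass per group instead of one per variable.
for key, names in groups.items():
    print(key, names)
```

With 180 variables, the number of passes is then bounded by the number of distinct groups rather than by the number of variables.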
Challenges and solutions
E. Imputation failed for some variables
- Some imputation cells were too small
Solution: Merged small imputation cells
- Each imputation stage was run twice, the first without cell merging and the second with cell merging, resulting in 8 imputation stages
- Use of a “catch-all” stage at the end (9th stage) to carry out mean imputation by industry
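The staged approach can be sketched in Python for a single variable. This is an illustration, not the Banff configuration used on the MEP: it uses mean imputation at every stage for simplicity, and the cell definitions (industry by size band, with medium and large merged in the second pass), the minimum cell size of 2 and the data are all hypothetical.

```python
from collections import defaultdict

def impute_mean(records, var, cell_key, min_size, stage):
    """Fill missing `var` with the cell mean where the cell has enough reported values."""
    cells = defaultdict(list)
    for r in records:
        # Only reported values act as donors; values imputed earlier are excluded.
        if r["stage"] is None and r[var] is not None:
            cells[cell_key(r)].append(r[var])
    for r in records:
        donors = cells.get(cell_key(r), [])
        if r[var] is None and len(donors) >= min_size:
            r[var] = sum(donors) / len(donors)
            r["stage"] = stage

def fine_cell(r):
    return (r["industry"], r["size"])

def merged_cell(r):
    # Hypothetical merge: medium and large size bands share one cell.
    size = "medium_large" if r["size"] in ("medium", "large") else r["size"]
    return (r["industry"], size)

def industry_cell(r):
    return r["industry"]

rows = [
    (1, "A", "small", 10.0), (2, "A", "small", 20.0), (3, "A", "small", None),
    (4, "A", "medium", 50.0), (5, "A", "large", 100.0), (6, "A", "large", None),
    (7, "B", "small", 7.0), (8, "B", "large", None),
]
records = [
    {"unit": u, "industry": i, "size": s, "sales": v, "stage": None}
    for u, i, s, v in rows
]

impute_mean(records, "sales", fine_cell, min_size=2, stage="unmerged")
impute_mean(records, "sales", merged_cell, min_size=2, stage="merged")
impute_mean(records, "sales", industry_cell, min_size=1, stage="catch_all")

result = {r["unit"]: (r["sales"], r["stage"]) for r in records}
```

Recording the stage that filled each value makes it possible to see afterwards how many units fell through to the merged cells or to the catch-all stage.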
Challenges and solutions
F. Challenges with no solutions
- Analysis of improvements in the E&I was slow, as it took several hours to run E&I and write back to the main data storage area to view data in a cube
- Attempt to replicate published results as closely as possible created a dilemma: When to stop trying?
- What was the “right” answer?
SEVANI
Provided a standardised and automated method to report on estimates of variances due to sampling as well as non-response and imputation
Challenges:
- Can produce output for only one variable at a time
- Requires a lot of parameters to set up
- MEP is unit-based so can’t easily output SEVANI results
Solution:
- Use of a macro to identify variable names
- Created SAS code to set up parameters
- Output SEVANI results outside MEP
Next steps
Educate the users of the new system on MEP
Identify potential areas to make improvements in the editing and imputation system
Create a new MEP collection for Charities data to include its own editing and imputation system