Published by Nathan Hunter. Modified over 9 years ago.
1
Migration of a large survey onto a micro-economic platform
Val Cox
April 2014
2
Micro-economic Platform (MEP)
- Standardises and automates processes
  - Provides more efficient processing and allows more analysis
- Enables Statistics NZ to gain more from available data
  - Basic principle: use administrative data wherever possible, with surveys filling the gaps
  - Objective: bring core information about every business in the economy into the Longitudinal Business Database, so that Statistics NZ can respond quickly to changing needs for economic statistics
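The "administrative data first, surveys fill the gaps" principle can be pictured as a per-unit source-priority rule: for one business and one variable, take the value from the highest-priority source that has one. This is a minimal Python sketch under stated assumptions; the source names, their priority order and the record layout are hypothetical and are not the MEP's actual design.

```python
from typing import Dict, Optional, Tuple

# Hypothetical source names, highest priority first: administrative
# sources come before the postal survey.
SOURCE_PRIORITY = ["admin_1", "admin_2", "admin_3", "survey"]


def combine_sources(values: Dict[str, Optional[float]]) -> Optional[Tuple[str, float]]:
    """Return (source, value) from the highest-priority source with a value.

    None means no source covers this business, i.e. a gap that
    imputation would later have to fill.
    """
    for source in SOURCE_PRIORITY:
        value = values.get(source)
        if value is not None:
            return source, value
    return None


# A business covered by administrative data that was also surveyed.
print(combine_sources({"admin_1": 1200.0, "survey": 1180.0}))
# A business covered only by the survey.
print(combine_sources({"admin_1": None, "survey": 530.0}))
```

A fixed priority list keeps the rule transparent; a real platform may instead choose sources per variable or per industry.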
3
Aim of paper
- To discuss the challenges of building a non-response imputation package for a large survey on the MEP
  - Rationalises the use of Banff for outlier detection and imputation, and of SEVANI (System for Estimation of Variance due to Nonresponse and Imputation) to estimate the variance due to sampling and non-sampling errors
4
Annual Enterprise Survey (AES)
- Provides statistics on the financial performance and financial position of New Zealand businesses
  - Captures about 90% of New Zealand's GDP
- Uses four major data sources
  - Three administrative (covering 72% of the population)
  - One postal survey
5
AES before MEP
6
Editing strategy of AES on MEP
- Guided by the Methodological Standard for E&I (editing and imputation)
- Key objective of the standard
  - Editing is fit for purpose and enables continuous improvement of processes and data quality
- Key principles used
  - Automate editing processes where possible
  - Use Statistics NZ standard editing tools wherever possible, to achieve standardisation
7
Editing system of AES in MEP
- Uses Banff to automate and standardise editing and imputation processes
- Uses analytical views to assess the quality of the edited data
8
Challenges and solutions
A. Sheer volume of data
- 28 questionnaires, 113 industries and 180 variables
Solution: use a “thin slice” approach
- Restrict the dataset to one questionnaire and one industry to show that all stages of E&I are working
- Once successful, expand the dataset to include more industries until all 28 questionnaires are replicated
- Successful in determining the optimal level of automation for correcting failed edits
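The “thin slice” approach amounts to running the whole E&I pipeline on a small, growing subset of the data and stopping as soon as a run fails. The Python sketch below shows only that control flow, under stated assumptions: the (questionnaire, industry) pairs, the step size and the `run_e_and_i` callback are hypothetical stand-ins for the real pipeline.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (questionnaire, industry); values below are hypothetical


def expanding_slices(pairs: List[Pair], step: int) -> List[List[Pair]]:
    """Growing prefixes of `pairs`: one pair first, then `step` more each time."""
    if not pairs:
        return []
    sizes = list(range(1, len(pairs), step)) + [len(pairs)]
    return [pairs[:n] for n in sizes]


def run_thin_slice(pairs: List[Pair], step: int,
                   run_e_and_i: Callable[[List[Pair]], bool]) -> Optional[List[Pair]]:
    """Run the whole E&I pipeline on each slice in turn.

    Returns None if every slice passes, otherwise the first failing
    slice, so problems surface while the data are still small.
    """
    for data_slice in expanding_slices(pairs, step):
        if not run_e_and_i(data_slice):
            return data_slice
    return None


pairs = [("Q01", "I01"), ("Q01", "I02"), ("Q02", "I03"), ("Q02", "I04"), ("Q03", "I05")]
print([len(s) for s in expanding_slices(pairs, step=2)])
# Stand-in pipeline that "breaks" once industry I05 is included.
print(run_thin_slice(pairs, 2, lambda s: ("Q03", "I05") not in s))
```

Returning the first failing slice, rather than only a pass/fail flag, points directly at the smallest dataset that reproduces the problem.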
9
Challenges and solutions
B. Determining which variable is erroneous when groups of variables must add or subtract to a total
- The Banff “errorloc” procedure always recommends changing one variable by a large amount
- The change is then made by the “deterministic” procedure
Solution: assign weights to variables
- Assign lower weights to more reliable variables so Banff doesn’t change their values
- Examples: totals, and gross profit, since respondents use it to determine the tax they pay
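Why a single balance edit leads to one variable being changed by a large amount can be illustrated with a Fellegi-Holt style minimum-cost rule, the principle behind error localisation. This Python sketch is not Banff's errorloc code: it handles one linear edit only, and each variable carries a hypothetical *cost of change*, so reliable variables such as totals are given a high cost. It does not reproduce Banff's own weight parameter or its conventions.

```python
from typing import Dict, Optional, Tuple


def localise_and_fix(values: Dict[str, float],
                     coefs: Dict[str, float],
                     change_cost: Dict[str, float],
                     tol: float = 1e-9) -> Tuple[Optional[str], Dict[str, float]]:
    """Minimum-cost error localisation for ONE linear balance edit.

    The edit is sum(coefs[v] * values[v]) == 0, e.g. total - a - b == 0.
    change_cost[v] is the (hypothetical) cost of changing v; missing
    entries default to 1.0.
    """
    residual = sum(c * values[v] for v, c in coefs.items())
    if abs(residual) <= tol:
        return None, dict(values)  # edit already satisfied
    # With a single edit, changing any one variable with a non-zero
    # coefficient can remove the residual, so the cheapest solution
    # changes exactly one variable. Ties go to the first in dict order.
    candidates = [v for v, c in coefs.items() if c != 0]
    chosen = min(candidates, key=lambda v: change_cost.get(v, 1.0))
    fixed = dict(values)
    # "Deterministic" step: the only value of `chosen` satisfying the edit.
    fixed[chosen] = values[chosen] - residual / coefs[chosen]
    return chosen, fixed


record = {"total_income": 1000.0, "sales": 800.0, "other_income": 150.0}
edit = {"total_income": 1.0, "sales": -1.0, "other_income": -1.0}

# Equal costs: the reported total is changed to match its components.
print(localise_and_fix(record, edit, {}))
# The total is made expensive to change, so a component absorbs the difference.
print(localise_and_fix(record, edit, {"total_income": 10.0, "sales": 2.0, "other_income": 1.0}))
```

The whole discrepancy (here 50) always lands on one variable, which is the behaviour the slide describes; the costs only decide which variable that is.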
10
Challenges and solutions
C. Outlier detection
- The old system detects outliers in 3 key variables but unlinks the whole unit (all variables)
- Banff does univariate outlier detection
Solution: compared 2 E&I runs of the data
- The 1st run had only the 3 key variables set as outliers; the 2nd had all variables included in the outlier steps
- Decision: choose the variables to be set as outliers based on their effect on the totals
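The difference between whole-unit treatment and variable-by-variable (univariate) detection, and its effect on totals, can be sketched as follows. As a plainly swapped-in technique, the sketch uses a simple median/MAD rule, not Banff's own outlier method. The variable names, the data and the threshold `k` are hypothetical.

```python
from statistics import median
from typing import Dict, List, Set, Tuple

Unit = Dict[str, float]


def mad_outliers(values: List[float], k: float = 3.0) -> Set[int]:
    """Indices of values more than k median absolute deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return set()  # no spread: this simple rule cannot flag anything
    return {i for i, v in enumerate(values) if abs(v - med) > k * mad}


def compare_strategies(units: List[Unit], key_vars: List[str], all_vars: List[str],
                       k: float = 3.0) -> Dict[str, Tuple[float, float]]:
    """For each variable, total of the values set aside as outliers under
    (a) whole-unit treatment driven by the key variables, and
    (b) variable-by-variable univariate detection.
    key_vars must be a subset of all_vars."""
    flags = {var: mad_outliers([u[var] for u in units], k) for var in all_vars}
    # (a) a unit flagged on any key variable is set aside for ALL variables
    whole_units: Set[int] = set().union(*(flags[v] for v in key_vars))
    report = {}
    for var in all_vars:
        by_unit = sum(units[i][var] for i in whole_units)
        by_variable = sum(units[i][var] for i in flags[var])  # (b)
        report[var] = (by_unit, by_variable)
    return report


units = [
    {"sales": 100.0, "wages": 40.0, "other": 5.0},
    {"sales": 110.0, "wages": 42.0, "other": 6.0},
    {"sales": 95.0, "wages": 38.0, "other": 4.0},
    {"sales": 105.0, "wages": 41.0, "other": 5.0},
    {"sales": 900.0, "wages": 39.0, "other": 6.0},   # unusual sales only
    {"sales": 102.0, "wages": 40.0, "other": 60.0},  # unusual "other" only
]
for var, (by_unit, by_variable) in compare_strategies(units, ["sales", "wages"],
                                                      ["sales", "wages", "other"]).items():
    print(var, by_unit, by_variable)
```

In this made-up example the whole-unit rule sets aside ordinary wages and "other" values of the unit with unusual sales, while missing the unusual "other" value of another unit; comparing such totals is one way to judge which variables to treat as outliers.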
11
Challenges and solutions
D. Running imputation one variable at a time would have been very time-consuming
Solution: group variables
- By imputation method (4 methods)
- By industry (some industries have different characteristics)
- By type of variable (e.g. some variables can be negative)
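Grouping variables so that several share one imputation run can be sketched as a simple batching step keyed on method, industry group and sign type. This is a minimal Python sketch; the field names, method labels and industry groups are hypothetical, not the actual AES configuration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class VarSpec(NamedTuple):
    """How one variable is imputed in one industry group (hypothetical fields)."""
    name: str
    method: str            # one of the imputation methods, e.g. "ratio"
    industry_group: str    # industries with similar characteristics
    can_be_negative: bool  # e.g. profit can be negative, sales cannot


BatchKey = Tuple[str, str, bool]


def group_for_imputation(specs: List[VarSpec]) -> Dict[BatchKey, List[str]]:
    """Collect variables that can share one imputation run.

    Variables with the same method, industry group and sign type go
    into one batch, so the number of runs equals the number of batches
    rather than the number of variables.
    """
    batches: Dict[BatchKey, List[str]] = defaultdict(list)
    for spec in specs:
        batches[(spec.method, spec.industry_group, spec.can_be_negative)].append(spec.name)
    return dict(batches)


specs = [
    VarSpec("sales", "ratio", "retail", False),
    VarSpec("purchases", "ratio", "retail", False),
    VarSpec("profit", "ratio", "retail", True),
    VarSpec("interest_income", "mean", "finance", False),
]
for key, names in group_for_imputation(specs).items():
    print(key, names)
```

Keeping sign type in the key reflects the slide's point that variables which can be negative need different handling from those that cannot.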
12
Challenges and solutions
E. Imputation failed for some variables
- Some imputation cells were too small
Solution: merge small imputation cells
- Each imputation stage was run twice, first without cell merging and then with cell merging, resulting in 8 imputation stages
- A “catch-all” stage at the end (9th stage) carries out mean imputation by industry
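The staged fallback (try a fine cell, retry with merged cells, then a catch-all mean by industry) can be sketched in Python. The sketch is deliberately smaller than the AES set-up: it uses one method (ratio imputation on a hypothetical auxiliary variable `aux`) run twice plus the catch-all, i.e. 3 stages rather than 9. The cell definitions and the minimum donor count are assumptions.

```python
from typing import Dict, List, Tuple

MIN_DONORS = 3  # hypothetical minimum number of donors per imputation cell


def fine_cell(r: dict) -> tuple:
    return (r["industry"], r["size_band"])


def merged_cell(r: dict) -> tuple:
    return (r["industry"],)  # size bands merged within the industry


# (stage name, cell function, uses ratio imputation?, minimum donors)
STAGES = [
    ("ratio, fine cell", fine_cell, True, MIN_DONORS),
    ("ratio, merged cell", merged_cell, True, MIN_DONORS),
    ("catch-all: mean by industry", merged_cell, False, 1),
]


def impute(records: List[dict]) -> Dict[int, Tuple[str, float]]:
    """Impute missing y stage by stage; returns {index: (stage, value)}.
    Records that no stage can impute are left out of the result."""
    observed = [r for r in records if r["y"] is not None]  # donors: reported values only
    result: Dict[int, Tuple[str, float]] = {}
    for i, r in enumerate(records):
        if r["y"] is not None:
            continue
        for name, cell_of, uses_ratio, needed in STAGES:
            if uses_ratio and r["aux"] is None:
                continue  # ratio imputation needs the unit's own auxiliary value
            pool = [d for d in observed
                    if cell_of(d) == cell_of(r) and (not uses_ratio or d["aux"] is not None)]
            if len(pool) < needed:
                continue  # cell too small: fall through to the next stage
            if uses_ratio:
                aux_total = sum(d["aux"] for d in pool)
                if aux_total == 0:
                    continue
                value = r["aux"] * sum(d["y"] for d in pool) / aux_total
            else:
                value = sum(d["y"] for d in pool) / len(pool)
            result[i] = (name, value)
            break
    return result


records = [
    {"industry": "A", "size_band": "small", "aux": 10.0, "y": 20.0},
    {"industry": "A", "size_band": "small", "aux": 20.0, "y": 40.0},
    {"industry": "A", "size_band": "small", "aux": 30.0, "y": 60.0},
    {"industry": "A", "size_band": "small", "aux": 15.0, "y": None},
    {"industry": "A", "size_band": "large", "aux": 100.0, "y": 210.0},
    {"industry": "A", "size_band": "large", "aux": 50.0, "y": None},
    {"industry": "B", "size_band": "small", "aux": None, "y": None},
    {"industry": "B", "size_band": "small", "aux": 4.0, "y": 8.0},
    {"industry": "B", "size_band": "small", "aux": None, "y": 12.0},
]
for i, (stage, value) in impute(records).items():
    print(i, stage, value)
```

Only reported values are used as donors, so the result does not depend on the order in which records are imputed.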
13
Challenges and solutions
F. Challenges with no solutions
- Analysing improvements to the E&I was slow, as it took several hours to run E&I and write the results back to the main data storage area before the data could be viewed in a cube
- Trying to replicate published results as closely as possible created a dilemma: when to stop trying?
- What was the “right” answer?
14
SEVANI
- Provided a standardised and automated method to report estimates of variance due to sampling as well as to nonresponse and imputation
Challenges:
- Can produce output for only one variable at a time
- Requires many parameters to be set up
- MEP is unit-based, so SEVANI results can't easily be output
Solution:
- Used a macro to identify variable names
- Wrote SAS code to set up the parameters
- Output SEVANI results outside MEP
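The workaround (find the variable names, build one parameter set per variable, run once per variable, and write a variable-level table outside the unit-based store) was done in SAS. The Python sketch below only mirrors that workflow. The flag-naming convention, the template keys and `fake_estimate` are hypothetical stand-ins; none of them is SEVANI's actual interface.

```python
import csv
import io
from typing import Callable, Dict, List

FLAG_SUFFIX = "_impflag"  # hypothetical naming convention for imputation flags

# Settings shared by every run; names are hypothetical, not SEVANI's parameters.
TEMPLATE = {"strata": "stratum", "weight": "design_weight", "imp_cell": "imp_cell"}


def find_variables(columns: List[str]) -> List[str]:
    """Variables to process: columns that have a matching imputation-flag column."""
    present = set(columns)
    return [c for c in columns if not c.endswith(FLAG_SUFFIX) and c + FLAG_SUFFIX in present]


def build_params(variables: List[str]) -> List[Dict[str, str]]:
    """One parameter set per variable, since each run handles one variable."""
    return [dict(TEMPLATE, variable=v, imp_flag=v + FLAG_SUFFIX) for v in variables]


def run_all(columns: List[str],
            estimate: Callable[[Dict[str, str]], Dict[str, float]]) -> str:
    """Run one variance estimation per variable and return a CSV table
    with one row per variable, for use outside the unit-based store."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["variable", "var_sampling", "var_nonresponse"])
    writer.writeheader()
    for params in build_params(find_variables(columns)):
        writer.writerow({"variable": params["variable"], **estimate(params)})
    return out.getvalue()


# Stand-in for the real estimation run, returning placeholder values.
def fake_estimate(params: Dict[str, str]) -> Dict[str, float]:
    return {"var_sampling": 1.0, "var_nonresponse": 0.5}


columns = ["unit_id", "stratum", "sales", "sales_impflag", "wages", "wages_impflag"]
print(run_all(columns, fake_estimate))
```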
15
Next steps
- Educate users of the new system on MEP
- Identify potential areas for improvement in the editing and imputation system
- Create a new MEP collection for Charities data, with its own editing and imputation system