The Application for Statistical Processing at SURS
Andreja Smukavec, SURS
Rudi Seljak, SURS
UNECE Statistical Data Confidentiality Work Session
Helsinki, 5–7 October 2015
Old system
Stove-pipe oriented production
–Ad-hoc solutions were developed for each particular survey
Survey methodologists' drive for improvement was crucial
–"Our data are not confidential"
Process metadata were not organized
–Difficulties when a survey methodologist resigns
Renovation
An internal project started in 2012
–IT, General Methodology and subject-matter specialists
–Build a global solution appropriate for most of the surveys
–Solution which covers most of the parts of statistical production:
   Data validation
   Data editing and imputation
   Aggregation and standard error estimation
   Statistical disclosure control for tabular data
   Tabulation
Renewed system
Generalised metadata driven application
–Database of process metadata
   MS Access -> ORACLE
   For each survey instance
–General SAS code
–GUI for process metadata
–Different microdata environments allowed, just some basic rules for the structure of microdata databases
   Ad hoc SAS program for preparation of microdata
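The idea of general code driven by process metadata can be illustrated with a minimal sketch. It is written in Python for illustration only; the application described here uses SAS with metadata stored in ORACLE. The survey name, variable names, rule layout and the `validate` function are all hypothetical and are not taken from the SURS system.

```python
import operator

# Comparison operators that a metadata rule may refer to by name.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Hypothetical process metadata: one row per validation rule and survey.
VALIDATION_RULES = [
    {"survey": "DEMO", "variable": "turnover", "op": ">=", "value": 0},
    {"survey": "DEMO", "variable": "employees", "op": ">", "value": 0},
]


def validate(records, rules, survey):
    """Return (record index, variable) pairs that violate a rule of the survey.

    The function itself is generic: everything survey-specific comes
    from the rule rows, not from the code.
    """
    active = [rule for rule in rules if rule["survey"] == survey]
    failures = []
    for index, record in enumerate(records):
        for rule in active:
            check = OPS[rule["op"]]
            if not check(record[rule["variable"]], rule["value"]):
                failures.append((index, rule["variable"]))
    return failures


records = [
    {"turnover": 100, "employees": 5},
    {"turnover": -1, "employees": 0},
]
failures = validate(records, VALIDATION_RULES, "DEMO")
```

Changing a rule then means editing a metadata row rather than the program, which is the property the renewed system relies on.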
Schematic presentation of the renewed system
[Diagram labels: different microdata databases; ad-hoc program; general SAS program; database of process metadata; metadata repository; data on tables and variables; application for management; different kinds of output]
Tabular data protection
1. Calculation of primary sensitivity for seven types of statistics: number, total, share, ratio, average…
–Threshold, p%-rule, (n,k)-dominance rule
–"Holding rule" + sampling weights
–Zeroes unsafe
2. Secondary suppression applied in case of sensitive statistics (number and total)
–SAS-Tool (Excel file with metadata, Tau-Argus, SAS macros)
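The three primary sensitivity rules named above can be sketched as follows, using their usual textbook definitions for a cell total built from non-negative respondent contributions. This is a minimal Python illustration, not the SAS implementation in the application: the function names and default parameter values are assumptions, and holdings, sampling weights and the treatment of zero cells are left out.

```python
def is_sensitive_threshold(contributions, min_contributors=3):
    """Threshold rule: a non-empty cell with too few contributors is sensitive."""
    return 0 < len(contributions) < min_contributors


def is_sensitive_p_percent(contributions, p=10.0):
    """p%-rule: sensitive if the total minus the two largest contributions
    is less than p% of the largest contribution."""
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False
    largest = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0.0
    remainder = sum(ordered) - largest - second
    return remainder < (p / 100.0) * largest


def is_sensitive_dominance(contributions, n=1, k=75.0):
    """(n,k)-dominance rule: sensitive if the n largest contributions
    exceed k% of the cell total."""
    ordered = sorted(contributions, reverse=True)
    total = sum(ordered)
    if total <= 0:
        return False
    return sum(ordered[:n]) > (k / 100.0) * total


dominated_cell = [100.0, 5.0, 3.0]       # one respondent dominates
balanced_cell = [40.0, 35.0, 30.0, 25.0]  # contributions are spread out
small_cell = [50.0, 50.0]                 # only two respondents
```

Under these assumed parameters, `dominated_cell` passes the threshold rule but fails both the p%-rule and the dominance rule, while `small_cell` is caught by the threshold rule.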
Tabular data protection
Results for each survey instance saved in the database with statistics (ORACLE)
–Statuses for lower precision
–Confidentiality flags for the type of primary and secondary suppression
3 types of tabulation (codelists)
–Excel format (the most user-friendly)
–plain text format (.tab, .hrc) for Tau-Argus
–plain text format (.csv) for PX-Edit (SURS's publication tool)
Tabulation & Tabular Data Protection
Parameters for SDC in MetaSOP
Tabulation in MetaSOP
Processing in MetaSOP
Example of 3-dimensional table
[Two table panels, "After aggregation" and "After use of SAS-Tool"; headers CC_SI / Dim_2 and Dim_3 with categories TOT, F, O; cells carry the flags E, z, zz and zzz; cell values not recoverable]
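Why a primary suppression usually needs secondary suppressions can be shown with a small check on a two-dimensional table whose row and column totals are published. If a suppressed cell is the only suppressed cell in its row or column, it can be recomputed by subtracting the published cells from the total. The sketch below, in Python, only detects that situation. It is a necessary condition for protection, not a sufficient one, and it is not the algorithm used by Tau-Argus or the SAS-Tool. The function name and table layout are hypothetical.

```python
def exposed_cells(suppressed):
    """Return the suppressed interior cells that can be recomputed by subtraction.

    `suppressed` is a list of rows of booleans for the interior cells of a
    table; all row and column totals are assumed to be published.
    """
    n_rows = len(suppressed)
    n_cols = len(suppressed[0]) if suppressed else 0
    exposed = set()
    for r in range(n_rows):
        cols = [c for c in range(n_cols) if suppressed[r][c]]
        if len(cols) == 1:  # lone suppression in this row
            exposed.add((r, cols[0]))
    for c in range(n_cols):
        rows = [r for r in range(n_rows) if suppressed[r][c]]
        if len(rows) == 1:  # lone suppression in this column
            exposed.add((rows[0], c))
    return exposed


# Only the primary suppression: the cell is recoverable from the totals.
primary_only = [
    [True, False, False],
    [False, False, False],
    [False, False, False],
]

# With three secondary suppressions, no row or column has a lone suppression.
with_secondary = [
    [True, True, False],
    [True, True, False],
    [False, False, False],
]
```

Here `exposed_cells(primary_only)` reports the cell at position (0, 0), while `exposed_cells(with_secondary)` reports nothing.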
New organization
Old system:
–Every survey had its own programmer and its own general methodologist
Renewed system:
–General methodologist and IT expert ("support team") help the subject-matter specialist to
   insert and edit the process metadata (except for SDC) in the application
   run particular parts of the statistical process
Advantages
The subject-matter personnel's skills improve (higher quality of data)
The process metadata can be changed easily and the procedure can be repeated in a short time (flexibility)
The rules for data processing are gathered in one place (transparency)
Drawbacks
High risk of syntax errors when inserting metadata expressions
Subject-matter personnel have to learn some new skills (SAS expressions)
An error during execution can cause a problem if the support team is busy or not available
Challenges for the future
Introduce the application successfully into production
–Adjusting to changes by the subject-matter specialists
–Building a qualified support team
Adding new functionalities
–Indices
–Secondary suppression for other types of statistics
–GUI instead of the Excel file for the SAS-Tool
Thank you for your attention.