The Application for Statistical Processing at SURS Andreja Smukavec, SURS Rudi Seljak, SURS UNECE Statistical Data Confidentiality Work Session Helsinki,


1 The Application for Statistical Processing at SURS Andreja Smukavec, SURS Rudi Seljak, SURS UNECE Statistical Data Confidentiality Work Session Helsinki, 5 – 7 October 2015

2 Old system
Stove-pipe oriented production
  – Ad-hoc solutions were developed for a particular survey
Survey methodologists' striving for improvement was crucial
  – "Our data are not confidential"
Process metadata were not organized
  – Difficulties when a survey methodologist resigns

3 Renovation
An internal project started in 2012
  – IT, General Methodology and subject-matter specialists
  – Build a global solution appropriate for most of the surveys
  – Solution which covers most of the parts of statistical production: data validation; data editing and imputation; aggregation and standard error estimation; statistical disclosure control for tabular data; tabulation

4 Renewed system
Generalised metadata-driven application
  – Database of process metadata: MS Access -> ORACLE; for each survey instance
  – General SAS code
  – GUI for process metadata
  – Different microdata environments allowed, with just some basic rules for the structure of microdata databases; ad hoc SAS program for preparation of microdata
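
The idea of a metadata-driven application can be illustrated with a minimal sketch: the rules live in a table of process metadata, and one general routine applies whatever rules it finds for a survey. The sketch is in Python rather than SAS, and the rule layout, the survey code "DEMO" and the variable names are hypothetical; the source does not describe the actual metadata structure.

```python
import operator
from typing import Any, Dict, List, Tuple

# Hypothetical process metadata: one row per validation rule.
# In a real system these rows would be read from the metadata database.
RULES: List[Dict[str, Any]] = [
    {"survey": "DEMO", "variable": "turnover", "op": ">=", "value": 0},
    {"survey": "DEMO", "variable": "employees", "op": "<=", "value": 10000},
]

# Mapping from the operator stored in the metadata to a Python function.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}


def validate(
    records: List[Dict[str, Any]],
    rules: List[Dict[str, Any]],
    survey: str,
) -> List[Tuple[int, str]]:
    """Return (record index, variable) pairs that violate a rule of the survey."""
    failures = []
    for index, record in enumerate(records):
        for rule in rules:
            if rule["survey"] != survey:
                continue  # rule belongs to another survey
            check = OPS[rule["op"]]
            if not check(record[rule["variable"]], rule["value"]):
                failures.append((index, rule["variable"]))
    return failures
```

With this layout, changing a rule means editing a metadata row, not the general code, which is the flexibility the renewed system aims for.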

5 Schematic presentation of the renewed system
[Diagram. Labels: different microdata databases; ad-hoc SAS programs; general SAS program; database of process metadata; metadata repository (data on tables and variables); application for management; different kinds of output]

6 Tabular data protection
1. Calculation of primary sensitivity for seven types of statistics: number, total, share, ratio, average…
  – Threshold, p%-rule, (n,k)-dominance rule
  – "Holding rule" + sampling weights
  – Zeroes unsafe
2. Secondary suppression applied in case of sensitive statistics (number and total)
  – SAS-Tool (Excel file with metadata, Tau-Argus, SAS macros)
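
The three primary sensitivity rules named on this slide can be sketched as follows, using their common textbook formulations. This is an illustration in Python, not the SURS SAS code: the default parameter values are assumptions, contributions are assumed non-negative, and the holding rule, sampling weights and the treatment of zeroes are not covered.

```python
from typing import Sequence


def threshold_sensitive(contributions: Sequence[float], min_contributors: int = 3) -> bool:
    """Threshold rule: a cell with too few contributors is sensitive."""
    return len(contributions) < min_contributors


def p_percent_sensitive(contributions: Sequence[float], p: float = 10.0) -> bool:
    """p%-rule: sensitive if the cell total minus the two largest
    contributions is smaller than p% of the largest contribution."""
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False
    largest = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0.0
    remainder = sum(ordered) - largest - second
    return remainder < (p / 100.0) * largest


def dominance_sensitive(contributions: Sequence[float], n: int = 1, k: float = 75.0) -> bool:
    """(n,k)-dominance rule: sensitive if the n largest contributions
    together exceed k% of the cell total."""
    ordered = sorted(contributions, reverse=True)
    total = sum(ordered)
    if total <= 0:
        return False
    return sum(ordered[:n]) > (k / 100.0) * total
```

For example, a cell with contributions 100, 90 and 5 passes a threshold of three contributors but fails the p%-rule with p = 10, because the remainder of 5 is below 10% of the largest contribution.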

7 Tabular data protection
Results for each survey instance saved in the database with statistics (ORACLE)
  – Statuses for lower precision
  – Confidentiality flags for the type of primary and secondary suppression
3 types of tabulation (codelists)
  – Excel format (the most user-friendly)
  – plain text format (.tab, .hrc) for Tau-Argus
  – plain text format (.csv) for PX-Edit (SURS’s publication tool)

8 Tabulation & Tabular Data Protection

9 Parameters for SDC in MetaSOP

10 Tabulation in MetaSOP

11 Processing in MetaSOP

12 Example of 3-dimensional table
Rows: CC_SI and Dim_2. Columns: the Dim_3 categories TOT, F, O.

After aggregation

CC_SI  Dim_2  TOT          F         O
              1209943548   1.09E+09  1.23E+08
       1      37700934.42  35625442  2075493
       11     47110694.48  46417660  693034.1
       2      733763444.2  6.62E+08  71456295
       21     517712620.1  4.8E+08   37489998
       22     161044502.5  1.1E+08   50837088
       23     37903335.85  37783060  120275.8
       24     343495995.1  2.86E+08  57438583
11     TOT    59283130.99  56199883  3083248
       1      64428657.15  62453677  1974980
       11     21989840.69  21609892  379948.2
       2      69502173.33  67377101  2125073
       21     13959568.67  13959569  -
       22     338148.7639  338148.8  z
       23     7911125.122  7911125   -
       24     27886089.54  26016025  1870064
12     TOT    215349659.2  2.04E+08  11792968
       1      5993635.356  5993635   -
       11     2035728.954  2035729   -
       2      55635358.28  54430511  1204847
       21     146242216.3  1.43E+08  2783876
       22     4164502.417  3872003   292499.2
       23     38774447.75  34931862  3842585
       24     42332750.72  37447112  4885639
21     TOT    176972728    1.76E+08  1323998
       1      2248602.352  2248602   z
       11     166013.5624  166013.6  z
       2      372993785.9  3.69E+08  4134769
       21     418831917.8  4.08E+08  10337323
       22     29411096.08  29411096  z
       23     56581.5975   56581.6   z
       24     88244091.34  86483431  1760660

After use of SAS-Tool

CC_SI  Dim_2  TOT          F         O
              1209943548   1.09E+09  1.23E+08
       1      37700934.42  35625442  2075493
       11     47110694.48  46417660  693034.1
       2      733763444.2  6.62E+08  71456295
       21     517712620.1  4.8E+08   37489998
       22     161044502.5  1.1E+08   50837088
       23     37903335.85  37783060  120275.8
       24     343495995.1  2.86E+08  57438583
11     TOT    59283130.99  56199883  3083248
       1      64428657.15  z         z
       11     21989840.69  z         z
       2      69502173.33  z         z
       21     13959568.67  13959569  -
       22     338148.763   z         z
       23     7911125.122  7911125   -
       24     27886089.54  z         z
12     TOT    215349659.2  2.04E+08  11792968
       1      5993635.356  5993635   -
       11     2035728.954  2035729   -
       2      55635358.28  54430511  1204847
       21     146242216.3  1.43E+08  2783876
       22     4164502.417  z         z
       23     38774447.75  z         z
       24     42332750.72  z         z
21     TOT    176972728    1.76E+08  1323998
       1      z            z         z
       11     z            z         z
       2      z            z         z
       21     418831917.8  4.08E+08  10337323
       22     29411096.08  z         z
       23     z            z         z
       24     88244091.34  z         z
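
Why the second table suppresses more cells than the first can be shown with a small sketch: when a row total is published and exactly one cell in the row is hidden, the hidden value follows by subtraction. The function name and the figures in the usage note are illustrative only, and the check covers a single additive relation; tools such as Tau-Argus consider all additive relations of the table.

```python
from typing import Dict, Optional


def recover_single_suppressed(
    total: float, cells: Dict[str, Optional[float]]
) -> Dict[str, float]:
    """Return the value an intruder can derive from a published row total.

    `cells` maps a category to its published value, or to None if the
    cell is suppressed. Exact recovery by subtraction is possible only
    when exactly one cell is suppressed.
    """
    hidden = [name for name, value in cells.items() if value is None]
    if len(hidden) != 1:
        return {}
    known = sum(value for value in cells.values() if value is not None)
    return {hidden[0]: total - known}
```

With a published total of 100 and F = 80, a suppressed O is recovered as 20. Suppressing F as well removes this particular disclosure path, which matches the pattern in the table after use of the SAS-Tool, where a primary suppression in O is accompanied by a suppression in F.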

13 New organization
Old system:
  – Every survey had its own programmer and its own general methodologist
Renewed system:
  – General methodologist and IT expert ("support team") help the subject-matter specialist to insert and edit the process metadata (except for SDC) in the application and to run particular parts of the statistical process

14 Advantages
The subject-matter personnel's skills improve (higher quality of data)
The process metadata can be changed easily and the procedure can be repeated in a short time (flexibility)
The rules for data processing are gathered in one place (transparency)

15 Drawbacks
High risk of syntax errors when inserting metadata expressions
Subject-matter personnel have to learn some new skills (SAS expressions)
An error during execution can cause problems if the support team is busy or not available

16 Challenges for the future
Introduce the application successfully into production
  – Adjusting to changes by the subject-matter specialists
  – Building a qualified support team
Adding new functionalities
  – Indices
  – Secondary suppression for other types of statistics
  – GUI instead of the Excel file for the SAS-Tool

17 Thank you for your attention.

