Metadata driven application for aggregation and tabular protection Andreja Smukavec SURS.



Practice at SURS
- Data for each survey are tabulated more than once:
  – Standard error estimation
  – Statistical disclosure control
  – Tabulation
- Tabular protection (cell suppression) is quite a time-consuming operation.
- Applying precision requirements and applying safety rules are two separate processes.

Project Standardization of Data Treatment
- Started in 2011
- Internal project
- Composed of two major parts:
  – editing and imputations
  – aggregation, tabulation, sampling error estimation and confidentiality treatment
- Final aim: construction of a metadata driven application (MetaSOP)

Metadata driven application
- All parameters for a particular survey will be provided in a metadata database (MS Access, with a transfer to Oracle by the end of 2013).
- The general code (SAS) will read from the metadata database and execute the required processes.
- The data will then be disseminated.
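The metadata driven idea can be illustrated with a minimal sketch. It is written in Python rather than SAS, and the survey name, the process names and the `run_survey` helper are all hypothetical; the real application reads its parameters from the metadata database, not from a dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical metadata for one survey. In the presentation these parameters
# live in a metadata database (MS Access, later Oracle), not in a dict.
SURVEY_METADATA = {
    "survey": "EXAMPLE_SURVEY",  # made-up survey name
    "processes": ["aggregation", "sampling_error", "safety_rules", "tabulation"],
}


def run_survey(metadata: dict, registry: Dict[str, Callable[[dict], str]]) -> List[str]:
    """Execute, in order, every process that the metadata asks for."""
    log = []
    for name in metadata["processes"]:
        if name not in registry:
            raise KeyError(f"no general code registered for process {name!r}")
        log.append(registry[name](metadata))
    return log


def make_step(label: str) -> Callable[[dict], str]:
    """Build a placeholder step; a real step would do the actual processing."""
    def step(metadata: dict) -> str:
        return f"{label} done for {metadata['survey']}"
    return step


# The "general code": one generic routine per process, shared by all surveys.
REGISTRY = {
    name: make_step(name)
    for name in ["aggregation", "sampling_error", "safety_rules", "tabulation"]
}

if __name__ == "__main__":
    for line in run_survey(SURVEY_METADATA, REGISTRY):
        print(line)
```

The point of the design is that adding a survey means adding metadata, not code: the same registry of general routines serves every survey.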

Metadata database
- It consists of:
  – Table of derived and dummy variables
  – Table of demanded statistics
  – Table of domains
  – Metadata for safety rules
  – Metadata for sampling error estimation
  – Metadata for tabulation
- Still missing:
  – Metadata for tabular data protection (input files for Tau Argus, SAS-Tool)
  – Metadata for publication

Boundaries table in the metadata database
- Metadata for precision requirements:
  – Boundaries for distinction between precise, less precise and imprecise data
  – Type of sample
- Metadata for safety rules:
  – Parameters for dominance rule, p% rule, threshold
  – Holding indicator
  – Safety margin for threshold
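As an illustration of the three safety rules named above, here is a minimal Python sketch using their common textbook definitions. The function names and the parameter values in the example are assumptions, not the values used at SURS. The holding indicator and the safety margin are not modelled, and contributions are assumed to be non-negative.

```python
from typing import Sequence


def below_threshold(contributions: Sequence[float], min_count: int) -> bool:
    """Threshold rule: too few contributors in the cell."""
    return len(contributions) < min_count


def dominance_sensitive(contributions: Sequence[float], n: int, k: float) -> bool:
    """(n, k) dominance rule: the n largest contributors exceed k% of the total."""
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sorted(contributions, reverse=True)[:n]
    return sum(top_n) > k / 100 * total


def p_percent_sensitive(contributions: Sequence[float], p: float) -> bool:
    """p% rule: total minus the two largest contributions is below p% of the largest."""
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False
    largest = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0.0
    remainder = sum(ordered) - largest - second
    return remainder < p / 100 * largest


if __name__ == "__main__":
    cell = [80.0, 10.0, 5.0, 5.0]  # made-up contributions to one cell
    print(below_threshold(cell, min_count=3))
    print(dominance_sensitive(cell, n=1, k=75))
    print(p_percent_sensitive(cell, p=15))
```

A cell would be flagged as primary sensitive when any rule selected for its type of statistic returns `True`.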

Types of statistics and corresponding safety rules

Precision requirements - safety rules
We included four possibilities:
- Precision requirements and safety rules are used.
  – If the statistic is imprecise (CV > 30%) and primary sensitive, then we set its status to safe.
  – If the statistic is less precise and primary sensitive, then it is still primary sensitive.
- Only precision requirements are used.
- Only safety rules are used.
- Neither precision requirements nor safety rules are used.
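The four possibilities can be sketched as a small decision function. This is a Python illustration only: the 30% boundary for imprecise data comes from the slide, while the lower boundary (`less_precise_from`), the function names and the "not assessed" label for the cases without safety rules are assumptions.

```python
def precision_class(cv: float, less_precise_from: float, imprecise_from: float = 30.0) -> str:
    """Classify a statistic by its coefficient of variation (in percent)."""
    if cv > imprecise_from:
        return "imprecise"
    if cv > less_precise_from:
        return "less precise"
    return "precise"


def primary_status(cv: float, sensitive: bool, use_precision: bool,
                   use_safety: bool, less_precise_from: float) -> str:
    """Combine precision requirements and safety rules into one status."""
    if not use_safety:
        # Assumption: without safety rules no sensitivity status is assigned.
        return "not assessed"
    if use_precision and sensitive:
        if precision_class(cv, less_precise_from) == "imprecise":
            # Imprecise and primary sensitive: status is set to safe.
            return "safe"
    return "primary sensitive" if sensitive else "safe"


if __name__ == "__main__":
    # Hypothetical lower boundary of 10% for "less precise".
    print(primary_status(35.0, True, True, True, less_precise_from=10.0))
    print(primary_status(20.0, True, True, True, less_precise_from=10.0))
```

With both sets of rules switched on, only the imprecise case changes the outcome of the safety rules; a less precise statistic keeps its primary sensitive status.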

General SAS code
- It consists of three SAS macros:
  – Preparation of the micro-data file
  – Computation of all needed statistics, determination of the primary sensitive ones and attribution of the corresponding standard error and coefficient of variation
  – Tabulation into Excel
- Still missing:
  – SAS macros for confidentiality treatment (preparing Tau Argus input files, code from SAS Tool for creation of safe tables with method Cell Suppression)
  – SAS macros for preparing files for publication on our data portal

Table with statistics
- The metadata database and the general code together produce the table with statistics.
- SAS environment; transfer to Oracle planned by the end of 2013
- It consists of:
  – Value of the statistic for each domain
  – CV and standard error of the statistic
  – Primary sensitivity status
- Still missing:
  – Final confidentiality status (SAS-Tool)
  – Final dissemination status
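One row of such a table could be represented as follows. This is a hypothetical Python sketch: the class and field names are invented for illustration, the standard error is taken as given because its estimation depends on the sample design, and the two fields listed as still missing are left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatisticRow:
    """One row of the table with statistics (field names are hypothetical)."""
    domain: str
    value: float
    standard_error: float
    primary_sensitive: bool
    # Listed as "still missing" in the presentation, so left unset here.
    final_confidentiality_status: Optional[str] = None
    final_dissemination_status: Optional[str] = None

    @property
    def cv_percent(self) -> Optional[float]:
        """Coefficient of variation in percent: 100 * SE / |value|."""
        if self.value == 0:
            return None  # CV is undefined for a zero estimate
        return 100.0 * self.standard_error / abs(self.value)


if __name__ == "__main__":
    row = StatisticRow(domain="region A", value=200.0,
                       standard_error=30.0, primary_sensitive=False)
    print(row.domain, row.cv_percent)
```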

Metadata driven application

MetaSOP

Drawbacks and benefits
- Drawbacks:
  – Not all surveys are suitable (the application can be used for only some of them)
  – A better suppression pattern is usually found by individual treatment
  – A lot of work for the first reference period
- Benefits:
  – Better transparency (all metadata available in one place)
  – The survey methodologist controls the whole process
  – The risk of errors will decrease
  – Only a small amount of work for subsequent reference periods

Future plans
- Missing parts of the metadata database, the table with statistics and the general code have to be added.
- The MetaSOP application has to be upgraded.
- A system for all the corresponding code lists has to be developed.

Thank you for your attention!