Anco Hundepool Sarah Giessing


ProdCom Rounding
Anco Hundepool, Sarah Giessing

Overview
- Current situation
- Cell suppression or rounding
- ProdCom solution
- Prototype
- Further work

Current situation (MS)
- Member states (MS) collect and process the data.
- Each MS applies its own confidentiality rules (dominance rule, frequency rule, p% rule).
- Each MS publishes its own data first.
- The process is carried out individually by each MS.
- The MS then sends the data to Eurostat.

Current situation (Eurostat)
- Eurostat receives the MS data plus confidentiality information.
- Problem: Eurostat cannot publish many EU-aggregates because some countries are blocked.
- Eurostat should respect/safeguard the national confidential cells.
- Result: many EU-aggregates are blocked.

ProdCom conf. chapter
- If one country is confidential, the EU-aggregate is confidential.
- If 2 countries are confidential and both are singletons, the EU-aggregate is blocked.
- If 2 countries are confidential and one is a zero-cell, the EU-aggregate is blocked.
- If 2 or more countries are confidential and there is dominance, the EU-aggregate is blocked.
- If all countries are safe, the EU-aggregate is OK.

ProdCom conf. chapter
- Result: many ProdCom aggregates are blocked.
- The usefulness of this publication is 'limited'.
- Is there a way out?

ProdCom rounding project
- Cell suppression = replacing the confidential cell value by an interval.
- The minimum size of the interval is determined by the sensitivity rule (distance to a safe cell).
- The dominance rule and the p% rule give a clear formula.
- The frequency rule does not, so we choose a percentage.
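The "clear formula" for the dominance and p% rules can be sketched as follows. This is a minimal illustration using the standard textbook definitions of the two rules, not the project's own code; the function names and the default p = 10 are assumptions.

```python
def dominance_upper_protection(contributions, n=1, k=85.0):
    """Required upper protection under an (n, k)-dominance rule.

    A cell is sensitive when its n largest contributions exceed k% of the
    cell total. The protection is the distance from the cell total to the
    smallest total for which that is no longer the case (0.0 if safe).
    """
    total = sum(contributions)
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return max(0.0, 100.0 / k * top_n - total)


def p_percent_upper_protection(contributions, p=10.0):
    """Required upper protection under a p% rule (0.0 if the cell is safe).

    A cell is sensitive when the total minus the two largest contributions
    is less than p% of the largest contribution. Assumes a non-empty list.
    """
    ordered = sorted(contributions, reverse=True)
    largest = ordered[0]
    remainder = sum(ordered[2:])  # everything except the two largest
    return max(0.0, p / 100.0 * largest - remainder)
```

For the example cell used later in the slides (contributions 5200, 300, 300 under a (1,85)-dominance rule) the first function gives roughly 318, which is consistent with the statement that a protection of 400 would be enough.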

ProdCom rounding project
- Conclusion: Eurostat should respect these protection intervals.
- Eurostat can publish approximations or intervals of the EU-aggregates.
- EU-intervals should respect the MS-intervals.
- So build those intervals.

τ-ARGUS
- Standard solution: cell suppression.
- Finding a safe pattern while minimising information loss is a hard problem.
- Efficient solutions are offered by τ-ARGUS.
- These lead to too many suppressions at the EU level (because suppression of additional member-state-level cells is not allowed).

Rounding
- Rounded value = interval.
- 37 rounded to 40 (base 10) = [35, 44].
- If the interval is large enough, the rounded value (= interval) can be published.
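The equivalence between a rounded value and an interval can be illustrated with a small sketch. It assumes non-negative integer cell values and round-half-up rounding; the function names are made up for this illustration.

```python
def round_to_base(value, base):
    """Round a non-negative integer to the nearest multiple of base (half up)."""
    half = base // 2
    return (value + half) // base * base


def rounding_interval(rounded, base):
    """Smallest and largest integer that round_to_base maps onto `rounded`."""
    half = base // 2
    low = rounded - half
    return low, low + base - 1
```

With these definitions `round_to_base(37, 10)` is 40 and `rounding_interval(40, 10)` is `(35, 44)`, matching the slide.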

Simple rounding does not work
- Intervals are often smaller than the rounding base because of table relations.
- Sometimes the rounding can be undone.
- It does not always provide upper protection: even with a large rounding base the upper bound may still be sensitive (dominance problem!).

Simple rounding does not work
Intervals are often smaller than the rounding base.
Example: cell value 5800 with contributions 5200, 300, 300 is sensitive under the (1,85)-dominance rule.
Protection of 400 would be enough (5200 is not dominant in, for instance, 6200).
Still, simple rounding with base 1000 may not provide sufficient protection. Assume the table relation:
  Original:              6400 |  5800    600
  After rounding:        6000 |  6000   1000
  Implied by rounding: < 6500 | < 6000  ≥ 500
So the upper bound for 5800 is 6000. But 5200 is still dominant in 6000!
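The bound derivation in this example can be reproduced with a few lines of interval arithmetic. This is only a sketch for a single relation total = cell + other; the function names are illustrative, and the half-open rounding intervals are treated as closed for simplicity.

```python
def upper_bound_from_relation(rounded_total, rounded_other, base):
    """Upper bound on a cell implied by the relation total = cell + other.

    Each rounded value is assumed to lie within base/2 of the true value.
    """
    total_upper = rounded_total + base / 2            # e.g. total < 6500
    other_lower = max(0.0, rounded_other - base / 2)  # e.g. other >= 500
    return total_upper - other_lower


def is_dominant(largest, total, k=85.0):
    """True if the largest contribution exceeds k% of the given total."""
    return largest > k / 100.0 * total


bound = upper_bound_from_relation(6000, 1000, 1000)
still_sensitive = is_dominant(5200, bound)
```

Here `bound` is 6000.0 and `still_sensitive` is `True`, whereas the rounding interval of the cell taken on its own (upper end 6500) would not leave 5200 dominant. The loss of protection comes entirely from the table relation.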

Simple rounding does not work
Sometimes rounding can be undone.
Example:
  Original: 28 | 7 7 7 7
  Rounded:  30 | 5 5 5 5
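Why this rounding can be undone is easiest to see by brute force. The sketch below assumes the example uses rounding base 5, integer cell values and round-half-up rounding (none of which is stated on the slide), and enumerates every original table that is consistent with the published rounded one.

```python
from itertools import product


def rounding_interval(rounded, base):
    """Integers that round (half up) to `rounded` for the given base."""
    low = rounded - base // 2
    return low, low + base - 1


def consistent_tables(rounded_total, rounded_cells, base):
    """All integer inner cells compatible with the rounded row and its total."""
    total_low, total_high = rounding_interval(rounded_total, base)
    ranges = []
    for cell in rounded_cells:
        low, high = rounding_interval(cell, base)
        ranges.append(range(low, high + 1))
    return [cells for cells in product(*ranges)
            if total_low <= sum(cells) <= total_high]


candidates = consistent_tables(30, [5, 5, 5, 5], base=5)
```

Under these assumptions each inner cell lies in [3, 7] and the total in [28, 32], so the only table left in `candidates` is `(7, 7, 7, 7)`: the original values are recovered exactly.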

Simple rounding does not work
It does not always provide upper protection.
Example: cell value 5400 with contributions 4800, 300, 300 is sensitive under the (1,85)-dominance rule.
Protection of 300 would be enough (4800 is not dominant in, for instance, 5700).
Still, simple rounding with base 1000 does not provide sufficient protection:
  Cell value:      5400
  After rounding:  5000 (so the true value is < 5500)
The upper bound 5500 is too close: 4800 is still dominant in 5500!

Controlled Rounding
- Controlled rounding preserves additivity and avoids underprotection problems.
- It is very difficult in higher dimensions.
- Bounds are twice as large.
- τ-ARGUS has a rounder with a solution similar to cell suppression, based on optimisation techniques.
- Originally designed for frequency tables, but it can also be used to protect magnitude data.

Rounding
- Use the τ-ARGUS rounder to compute EU-aggregates.
- Which rounding base? At least larger than the largest protection interval.
- If that is not enough, iterate until it is.

Rounding base
Two series:
- Simple (powers of 10)
- 10, 20, …, 80, 90, 100, 200, …, 800, 900, 1000, 2000, …, 8000, 9000, 10000
Version 2: less information loss, but more difficult to explain.
Both have been implemented. Your choice.
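The two candidate series and the choice of a starting base can be sketched as below. This assumes that the "simple" series consists of the pure powers of 10 and that the finer series is the listed one (10, 20, …, 90, 100, 200, …); the cut-off at 10000 and all function names are illustrative only.

```python
def simple_series(max_base=10_000):
    """Pure powers of 10: 10, 100, 1000, ..."""
    bases, power = [], 10
    while power <= max_base:
        bases.append(power)
        power *= 10
    return bases


def fine_series(max_base=10_000):
    """10, 20, ..., 90, 100, 200, ..., 900, 1000, ... up to max_base."""
    bases, power = [], 10
    while power <= max_base:
        bases.extend(m * power for m in range(1, 10) if m * power <= max_base)
        power *= 10
    return bases


def first_base_above(series, largest_protection_interval):
    """First rounding base strictly larger than the largest protection interval."""
    for base in series:
        if base > largest_protection_interval:
            return base
    return None  # no base in the series is large enough
```

For a largest protection interval of 318, the simple series starts at base 1000 while the finer series starts at 400, which illustrates why the finer series loses less information. If the starting base turns out to be insufficient, the next base in the series would be tried.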

Results (rounding base)
Rounding base (% EU total)   Frequency Proc 1   Frequency Proc 2
0-<1                                23                 70
1-<5                               190                277
5-<10                               82                109
10-<20                              92                 58
20-<50                             107                 15
50+                                 39                  4

Results (Iterations)
Number of iterations   Frequency Proc 1   Frequency Proc 2
1                            448                162
2                             84                195
3                                               111
4                                                48
5+                                               17

Rounding
Parameters to be chosen:
- Protection interval for frequency-unsafe cells (%)
- Protection interval for secondaries (%)
- Protection interval for zero-cells (interval)
In the above examples: Freq: 10%, Sec: 10%, Zero: 100.
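One way to hold these three parameters and turn them into a protection interval per cell is sketched below. It assumes that the two percentages are taken relative to the cell value; the class, field and status names are hypothetical and do not come from the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionParameters:
    freq_unsafe_pct: float = 10.0      # Freq: 10%
    secondary_pct: float = 10.0        # Sec: 10%
    zero_cell_interval: float = 100.0  # Zero: 100 (absolute interval)


def protection_interval(value, status, params=ProtectionParameters()):
    """Protection interval for a cell, given its (hypothetical) status label."""
    if status == "zero":
        return params.zero_cell_interval
    if status == "freq_unsafe":
        return value * params.freq_unsafe_pct / 100.0
    if status == "secondary":
        return value * params.secondary_pct / 100.0
    raise ValueError(f"unknown cell status: {status!r}")
```

With the defaults above, a frequency-unsafe cell with value 2000 gets a protection interval of 200, and a zero-cell gets the fixed interval of 100.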

Prototype
- Indicate the data file.
- Indicate the rules file.
- Set a few parameters.
- Then everything is done automatically.

Program

Prototype
Output:
- File with EU-aggregates
  - Original values if safe
  - Rounded values + rounding base if unsafe
  - Comma-separated; easily loaded in Excel
- Report/log file

Future
- A bit more testing
- Extend the aggregates

Conclusions
- A solution based on rounding has been created.
- It looks satisfactory for the ProdCom problem.
- Most EU-aggregates are rounded only moderately; a few might still be considered suppressed.
- The result is a much more useful ProdCom publication.