G-Confid: Turning the tables on disclosure risk
Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality
Ottawa, Canada, 30 October 2013
Peter Wright
Overview by component
G-Confid: a cell suppression application
- Works with tables of any size and any number of dimensions (subject to hardware and memory limitations)
- Available for SAS 9.2 and 9.3 and for SAS Enterprise Guide (EG) 4.3 and 5.1
- PROC SENSITIVITY identifies sensitive cells: highlights, inputs, strategies
- Macro SUPPRESS creates a suppression pattern: inputs, outputs, strategies
- Macro AUDIT audits a suppression pattern
PROC SENSITIVITY identifies confidential cells
Highlights:
- Choice of sensitivity rule: p-percent, (n,k), or an arbitrary (customized) rule
- Allows multiple decomposition
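The slide names the p-percent and (n,k) rules without giving their formulas. As a rough illustration, the Python sketch below computes the textbook forms of both sensitivity measures for a single cell, where a positive value means the cell is sensitive. G-Confid's own parametrization (for example the `srule="pq.20"` string in the SAS example) may differ, and the function names are hypothetical.

```python
def p_percent_sensitivity(contributions, p):
    """Textbook p% rule: S = (p/100)*x1 - (T - x1 - x2); the cell is sensitive if S > 0.

    x1 >= x2 are the two largest contributions and T is the cell total.
    """
    xs = sorted(contributions, reverse=True)
    x1 = xs[0] if len(xs) > 0 else 0.0
    x2 = xs[1] if len(xs) > 1 else 0.0
    total = sum(xs)
    return p * x1 / 100.0 - (total - x1 - x2)


def nk_sensitivity(contributions, n, k):
    """Textbook (n,k) dominance rule: S = (sum of n largest) - (k/100)*T; sensitive if S > 0."""
    xs = sorted(contributions, reverse=True)
    return sum(xs[:n]) - k * sum(xs) / 100.0


# Hypothetical contributions of four enterprises to one cell.
cell = [500.0, 300.0, 40.0, 10.0]
print(p_percent_sensitivity(cell, p=20))  # 100 - 50 = 50.0 -> sensitive
print(nk_sensitivity(cell, n=2, k=85))    # 800 - 722.5 = 77.5 -> sensitive
```

In both rules the remainder of the cell (everything outside the largest contributors) is what protects the dominant enterprises, so a larger remainder lowers the sensitivity.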
Inputs for PROC SENSITIVITY
- Definition of the hierarchy (or hierarchies) for each table dimension
- Microdata file containing:
  - Classification variables (e.g., geography, industry)
  - Enterprise identifier
  - Enterprise value
Tip: to reduce the sensitivity of a cell by the value of an enterprise's contribution, set that enterprise's identifier to missing (the enterprise is then treated as anonymous).
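To see how these inputs fit together, the sketch below aggregates hypothetical microdata into cells: it sums each enterprise's records within a cell, rolls detail cells up to a regional total, and keeps anonymous values (identifier missing, here `None`) apart. Under the assumption that an anonymous value enters only the cell total and never counts among the largest contributors, it lowers the p% sensitivity by exactly its own value, which is the effect the tip describes. The data, hierarchy and function names are hypothetical; this is not how PROC SENSITIVITY is implemented.

```python
from collections import defaultdict

# Hypothetical microdata records: (region, industry, enterprise_id, value).
# An enterprise_id of None marks an anonymous enterprise.
MICRODATA = [
    ("East", "A", "E1", 500.0),
    ("East", "A", "E1", 100.0),  # a second record for E1 in the same cell
    ("East", "A", "E2", 300.0),
    ("East", "A", None, 80.0),   # anonymous contribution
    ("West", "A", "E3", 200.0),
]

# Hypothetical one-level hierarchy for the region dimension: "0" = East + West.
REGION_PARENT = {"East": "0", "West": "0"}


def build_cells(records):
    """Map each cell to ({enterprise_id: total value}, anonymous total)."""
    named = defaultdict(lambda: defaultdict(float))
    anonymous = defaultdict(float)
    for region, industry, ent_id, value in records:
        # Each record feeds its detail cell and the regional total cell.
        for reg in (region, REGION_PARENT[region]):
            cell = (reg, industry)
            if ent_id is None:
                anonymous[cell] += value
            else:
                named[cell][ent_id] += value
    cells = set(named) | set(anonymous)
    return {c: (dict(named[c]), anonymous[c]) for c in cells}


def p_percent_sensitivity(named, anonymous, p):
    """p% rule in which anonymous value counts only toward the cell total."""
    xs = sorted(named.values(), reverse=True)
    x1 = xs[0] if len(xs) > 0 else 0.0
    x2 = xs[1] if len(xs) > 1 else 0.0
    total = sum(xs) + anonymous
    return p * x1 / 100.0 - (total - x1 - x2)


cells = build_cells(MICRODATA)
named, anon = cells[("East", "A")]
print(named, anon)                               # {'E1': 600.0, 'E2': 300.0} 80.0
print(p_percent_sensitivity(named, anon, p=20))  # 120 - 80 = 40.0
print(p_percent_sensitivity(named, 0.0, p=20))   # 120.0 without the anonymous value
```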
Example of SAS code to run PROC SENSITIVITY

    proc sensitivity data=microfile outconstraint=consfile outcell=cellfile
                     outlargest=largestfile
                     hierarchy="0 East West; ;"
                     srule="pq.20"
                     range="East A B: West C D; : : ;"
                     minresp=5;
       id Enterpriseid;
       var Income;
       dimension EastWest Industry;
    run;
Strategies using PROC SENSITIVITY
- Use the MINRESP=r option to set the minimum number of respondents
- Any cell with fewer than r respondents is assigned a sensitivity of max{1, S}, where S is the sensitivity of the cell
- Only positive (> 0) values are counted as respondents
- The MINRESP rule is ignored for any cell that includes a value contributed by an anonymous enterprise
Note: MINRESP can be used without applying a sensitivity rule.
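The MINRESP rule can be sketched as a post-processing step on one cell's sensitivity. The function and argument names are hypothetical, and treating "no sensitivity rule" as S = 0 is an assumption; under it, a cell below the respondent threshold always ends up with sensitivity 1, i.e., it is flagged as sensitive.

```python
def apply_minresp(values, sensitivity, minresp, has_anonymous=False):
    """Apply a MINRESP-style floor to one cell's sensitivity.

    values: contributions of identified enterprises in the cell.
    sensitivity: S from the chosen rule (assumed 0.0 when no rule is applied).
    has_anonymous: True if an anonymous enterprise contributes to the cell.
    """
    if has_anonymous:
        # The rule is ignored for cells with an anonymous contribution.
        return sensitivity
    respondents = sum(1 for v in values if v > 0)  # only positive values count
    if respondents < minresp:
        return max(1.0, sensitivity)
    return sensitivity


# Hypothetical cell: three positive contributions and one zero.
cell = [120.0, 45.0, 10.0, 0.0]
print(apply_minresp(cell, sensitivity=-30.0, minresp=5))  # 1.0: only 3 respondents
print(apply_minresp(cell, sensitivity=0.0, minresp=3))    # 0.0: 3 respondents suffice
print(apply_minresp(cell, sensitivity=-30.0, minresp=5, has_anonymous=True))  # -30.0
```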
Strategies using PROC SENSITIVITY (continued)
- To reduce oversuppression, apply rules that make use of sampling weights
- Example: if the sampling weight w_i > 3, make the enterprise anonymous (set its ID value to missing); G-Confid then uses its contribution to reduce the sensitivity of the cell
Find more strategies in Tambay and Fillion (Proceedings of the Joint Statistical Meetings (JSM) 2013).
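The weight-based strategy can be sketched as a flagging step applied before the sensitivity calculation. The sketch assumes, as above, a p% rule in which anonymous values count only toward the cell total; `value` stands for the enterprise's contribution to the cell, and the record layout and function names are hypothetical. Anonymizing a heavily weighted enterprise matters most when it would otherwise be one of the two largest contributors.

```python
WEIGHT_THRESHOLD = 3.0  # the slide's example threshold: w_i > 3


def anonymize_by_weight(records, threshold=WEIGHT_THRESHOLD):
    """records: list of (enterprise_id, weight, value).
    Return a copy in which enterprises with weight > threshold get id None (anonymous)."""
    return [(None if w > threshold else ent, w, v) for ent, w, v in records]


def p_percent_sensitivity(records, p):
    """p% rule; anonymous values (id None) count only toward the cell total."""
    named = {}
    anonymous = 0.0
    for ent, _weight, value in records:
        if ent is None:
            anonymous += value
        else:
            named[ent] = named.get(ent, 0.0) + value
    xs = sorted(named.values(), reverse=True)
    x1 = xs[0] if len(xs) > 0 else 0.0
    x2 = xs[1] if len(xs) > 1 else 0.0
    total = sum(xs) + anonymous
    return p * x1 / 100.0 - (total - x1 - x2)


# Hypothetical cell: E2 carries a sampling weight of 4.
cell = [("E1", 1.0, 800.0), ("E2", 4.0, 150.0), ("E3", 1.0, 30.0), ("E4", 1.0, 20.0)]
print(p_percent_sensitivity(cell, p=20))                       # 110.0 -> sensitive
print(p_percent_sensitivity(anonymize_by_weight(cell), p=20))  # -10.0 -> not sensitive
```

Here anonymizing E2 moves its 150 into the protective remainder, so the cell no longer needs primary suppression.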
Macro SUPPRESS: complementary suppression
- Uses the SAS/OR® LP solver
- Input files: (i) a cell sensitivities file and (ii) a linear constraints file
- Syntax:
    %Suppress(InCell=, Constraint=, CFunction1=, CFunction2=, CVar1=, CVar2=,
              OutCell=, ByVars=, OutComplement=, ScaleCost=);
- The output file gives each cell's final status (Suppress or Publish) and its net variation (the largest amount by which the cell was "moved")
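The linear constraints file describes how the cells of a table add up. As a sketch, assuming one-level hierarchies such as the `hierarchy="0 East West; ;"` example, read here as "0 is the total of East and West" (an interpretation the slides do not spell out), the additivity equations of a two-dimensional table can be enumerated as below. The names and output layout are hypothetical and do not reproduce G-Confid's constraints file format.

```python
from itertools import product

# Hypothetical hierarchies (parent -> children), one per table dimension:
# "0" = East + West for the region dimension, "T" = A + B for industry.
HIERARCHIES = [
    {"0": ["East", "West"]},
    {"T": ["A", "B"]},
]


def codes(hierarchy):
    """All codes in one dimension, in order of appearance, without duplicates."""
    seen = []
    for parent, children in hierarchy.items():
        for code in [parent, *children]:
            if code not in seen:
                seen.append(code)
    return seen


def with_code(rest, dim, code):
    """Build a cell key by inserting `code` at position `dim` of `rest`."""
    key = list(rest)
    key.insert(dim, code)
    return tuple(key)


def additivity_constraints(hierarchies):
    """List of (total_cell, [component_cells]) meaning total = sum(components)."""
    all_codes = [codes(h) for h in hierarchies]
    constraints = []
    for dim, hierarchy in enumerate(hierarchies):
        other_dims = [c for i, c in enumerate(all_codes) if i != dim]
        for parent, children in hierarchy.items():
            # One equation per combination of codes in the other dimensions.
            for rest in product(*other_dims):
                constraints.append(
                    (with_code(rest, dim, parent),
                     [with_code(rest, dim, child) for child in children])
                )
    return constraints


for total, parts in additivity_constraints(HIERARCHIES):
    print(total, "=", " + ".join(str(p) for p in parts))
```

A 3 x 3 table (two detail codes plus a total in each dimension) yields six equations: three row sums and three column sums, including the ones involving marginal totals.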
Strategies using the macro SUPPRESS
- Choice of cost functions (functions of the cell total, tot):
  - SIZE = tot
  - DIGITS = log(tot + 1)
  - CONSTANT = 1
  - INFORMATION = log(tot + 1) / (tot + 1)
- Can run the LP process twice to reduce the number of suppressions (e.g., SIZE or DIGITS, then INFORMATION)
- Can favour publishing certain cells by assigning them higher cost values (by default, cost = tot)
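The four cost functions, and the idea of favouring publication by raising a cell's cost, can be written down directly. This sketch assumes natural logarithms and a hypothetical multiplicative `factor` for favoured cells; the slides do not say how the macro scales such costs. It only computes costs; the LP that minimizes the total cost of the complementary suppressions is not reproduced.

```python
import math

# Cost functions of the cell total `tot`, as listed on the slide.
# Natural logarithms are assumed; the slide does not give the base.
COST_FUNCTIONS = {
    "SIZE": lambda tot: tot,
    "DIGITS": lambda tot: math.log(tot + 1),
    "CONSTANT": lambda tot: 1.0,
    "INFORMATION": lambda tot: math.log(tot + 1) / (tot + 1),
}


def cell_costs(totals, function="SIZE", favoured=(), factor=10.0):
    """Suppression cost of each cell; favoured cells are made `factor` times
    more expensive so that a cost-minimizing LP prefers to publish them."""
    cost = COST_FUNCTIONS[function]
    return {
        cell: cost(tot) * (factor if cell in favoured else 1.0)
        for cell, tot in totals.items()
    }


totals = {"small": 9.0, "large": 99.0}  # hypothetical cell totals
print(cell_costs(totals, "SIZE"))         # {'small': 9.0, 'large': 99.0}
print(cell_costs(totals, "INFORMATION"))  # the large cell is now the cheaper one
print(cell_costs(totals, "SIZE", favoured={"large"}))  # {'small': 9.0, 'large': 990.0}
```

SIZE and DIGITS both grow with the cell total, so large cells are expensive to suppress. INFORMATION decreases once tot exceeds e - 1 (about 1.72), so it makes large cells comparatively cheap.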
Macro AUDIT: validates a suppression pattern
- Uses the LP solver to calculate the minimum and maximum values of each suppressed cell
- Provides a result for each cell: protection achieved, protection not achieved, or exact disclosure
Coming soon: using the Shuttle algorithm (Buzzigoli and Giusti (2006)) to pre-set starting intervals narrower than the default values (0.5 tot and 1.5 tot), which reduces run time
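The "coming soon" idea can be illustrated with simple bound propagation over additive constraints: starting from the default [0.5 tot, 1.5 tot] interval for each suppressed cell, each equation repeatedly narrows the bounds of its terms. This is a simplified sketch in the spirit of the Shuttle algorithm of Buzzigoli and Giusti (2006), not their published procedure or G-Confid's code; the function names, the non-negativity assumption and the status labels are hypothetical. Because propagation only discards values that violate some constraint, the LP minimum and maximum always lie inside the propagated bounds, so these bounds can serve as narrower starting intervals, and a collapsed interval already signals exact disclosure.

```python
def shuttle_bounds(values, suppressed, constraints, max_iter=100):
    """Bound propagation on additive constraints, in the spirit of the Shuttle algorithm.

    values: {cell: true value} (assumed non-negative);
    suppressed: set of suppressed cells;
    constraints: list of (total_cell, [component_cells]), meaning total = sum(components).
    Returns {cell: (lower, upper)} for every suppressed cell.
    """
    # Published cells are known exactly; suppressed cells start at [0.5*tot, 1.5*tot].
    lo = {c: 0.5 * v if c in suppressed else v for c, v in values.items()}
    hi = {c: 1.5 * v if c in suppressed else v for c, v in values.items()}

    def tighten(cell, new_lo, new_hi):
        changed = False
        if new_lo > lo[cell]:
            lo[cell], changed = new_lo, True
        if new_hi < hi[cell]:
            hi[cell], changed = new_hi, True
        return changed

    for _ in range(max_iter):
        changed = False
        for total, parts in constraints:
            # The total lies between the sums of its parts' bounds.
            changed |= tighten(total, sum(lo[p] for p in parts), sum(hi[p] for p in parts))
            # Each part equals the total minus the other parts.
            for p in parts:
                others_lo = sum(lo[q] for q in parts if q != p)
                others_hi = sum(hi[q] for q in parts if q != p)
                changed |= tighten(p, lo[total] - others_hi, hi[total] - others_lo)
        if not changed:
            break
    return {c: (lo[c], hi[c]) for c in suppressed}


def audit(values, suppressed, constraints, tol=1e-9):
    """Flag suppressed cells whose propagated interval collapses to a point.

    Propagated bounds are outer bounds: a collapsed interval proves exact
    disclosure, but a wide one still needs the LP-based audit."""
    result = {}
    for cell, (lower, upper) in shuttle_bounds(values, suppressed, constraints).items():
        status = "exact disclosure" if upper - lower <= tol else "interval"
        result[cell] = (lower, upper, status)
    return result


# Hypothetical one-row table: t = a + b + c, with t published.
values = {"a": 10.0, "b": 20.0, "c": 30.0, "t": 60.0}
constraints = [("t", ["a", "b", "c"])]
print(audit(values, {"a", "b"}, constraints))  # a in [5, 15], b in [15, 25]
print(audit(values, {"a"}, constraints))       # a in [10, 10]: exact disclosure
```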
Conclusion
PROC SENSITIVITY
- Use a pre-defined or customized sensitivity rule
- Can do multiple decomposition
- MINRESP option
- Can apply weighting strategies
Macro SUPPRESS
- Can favour cells to publish (or suppress)
Macro AUDIT
Coming soon: additive controlled rounding
For more information, please contact: Peter Wright