Confidentiality in Published Statistical Tables
Annu Cabrera
22.9.2015
Study visit of the State Statistical Service of Ukraine (SSSU)

Statistics Act (280/2004), Section 11
“Statistics shall be compiled so that those whom they concern are not directly or indirectly identifiable from them, unless the data concerning identification are public by virtue of this Act.”
Statistics Act + theory & methodology → Guidelines
How to apply theory in practice? Which methods to use? → Practice

Guidelines on the protection of tabulated data I
- Internal guidelines
  - Tabulated business data
  - Tabulated personal data
- Guidelines renewed in 2013
- The old guidelines dated from 2000 and 2002, and since then
  - legislation had changed
  - data protection methods and tools had developed
  - different departments of the agency had developed their own statistical disclosure control (SDC) practices and adopted their own standards…
→ Need for consistency in practices

Guidelines on the protection of tabulated data II
- The guidelines describe protection methods and practices in general
- Departments are required to write down more specific SDC instructions for each set of statistics they publish
- These instructions must be available and easy to find within Statistics Finland
→ Protection methods and practices can be compared across different statistics
→ Information on methods specific to certain statistics is not lost even if the production team changes

Recommendations for business data
- Default threshold rule: information on fewer than 3 units (business/enterprise/corporation group) cannot be disseminated
- If the data are recent and their disclosure could affect the market situation or the activity of an individual enterprise, a dominance rule should be used alongside the threshold rule
- If protection can be achieved by not disseminating the identity and number of data suppliers, this is recommended
  - Estimates based on sample data
  - No information on which units belong to the sample
- Confidential data can be released if the data supplier consents to their publication

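The threshold and dominance rules for business data can be illustrated with a minimal Python sketch. The slides give only the threshold of 3 units; the dominance rule is shown here in its common (n, k) form, where a cell is sensitive if its n largest contributors account for more than k per cent of the cell total. The function names and the values n = 1 and k = 75 are assumptions chosen for illustration, not the parameters used at Statistics Finland.

```python
from typing import Sequence


def violates_threshold(contributions: Sequence[float], min_units: int = 3) -> bool:
    """Return True if the cell is based on fewer than min_units units."""
    return len(contributions) < min_units


def violates_dominance(contributions: Sequence[float], n: int = 1, k: float = 75.0) -> bool:
    """Return True if the n largest contributors exceed k per cent of the cell total.

    n and k are hypothetical illustration values, not official parameters.
    """
    magnitudes = sorted((abs(c) for c in contributions), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        return False  # nothing to dominate in an empty or all-zero cell
    top_share = 100.0 * sum(magnitudes[:n]) / total
    return top_share > k


def is_disclosive(contributions: Sequence[float]) -> bool:
    """A cell is treated as disclosive if either rule is violated."""
    return violates_threshold(contributions) or violates_dominance(contributions)


# Turnover of the enterprises contributing to one table cell (made-up figures).
print(is_disclosive([10.0, 20.0]))        # two units only: threshold rule violated
print(is_disclosive([90.0, 5.0, 5.0]))    # largest unit holds 90 %: dominance rule violated
print(is_disclosive([40.0, 30.0, 30.0]))  # neither rule violated
```

Checking the two rules separately makes it easy to apply the dominance rule only to those statistics where the data are recent and market-sensitive, as the recommendation describes.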
Recommendations for personal data
- Assessment of the sensitivity of the information contained in a given statistical output
- Default threshold rule: information on a small number of units (persons/households) or on small classes/groups of units should not be disseminated
- In tabulation
  - a cell frequency is too small if it is less than 3
  - a class/group frequency is too small if it is less than 10

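As a rough illustration of the two frequency limits for personal data, the following Python sketch scans a frequency table and lists the cells below 3 and the groups below 10. It assumes that a "class/group frequency" means the total count of a row in the table; that reading, the function name and the example figures are assumptions made for this sketch. How empty cells should be treated is left open here, so zeros are flagged like any other count below 3.

```python
from typing import Dict, List, Tuple

CELL_MIN = 3    # cell frequency limit from the guidelines
GROUP_MIN = 10  # class/group frequency limit from the guidelines


def small_frequencies(
    table: Dict[str, Dict[str, int]],
    cell_min: int = CELL_MIN,
    group_min: int = GROUP_MIN,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (cells below cell_min, groups whose total is below group_min)."""
    small_cells: List[Tuple[str, str]] = []
    small_groups: List[str] = []
    for group, cells in table.items():
        # Assumption: the group frequency is the sum of the group's cells.
        if sum(cells.values()) < group_min:
            small_groups.append(group)
        for category, count in cells.items():
            if count < cell_min:
                small_cells.append((group, category))
    return small_cells, small_groups


# Made-up counts of persons by area and labour market status.
example = {
    "Area A": {"employed": 120, "unemployed": 2},
    "Area B": {"employed": 5, "unemployed": 4},
}
cells, groups = small_frequencies(example)
print(cells)   # cells with a frequency below 3
print(groups)  # groups with a total frequency below 10
```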
Data protection in practice
- Usually the same people who compile and prepare statistics for dissemination also implement the necessary data protection methods
- An SDC expert from the Standards and Methods department can assist with implementation if needed
- If new practices or methods need to be applied in statistics production, this is normally done in co-operation between the statistics production team and the SDC expert

Protection methods
- Active data protection methods need to be applied if data on individual units are at risk of being disclosed from a given set of statistics, i.e. if the threshold rule or the dominance rule is violated
- Protection methods used at Statistics Finland:
  - Re-designing the table / changing the classification
  - Cell suppression
    - Primary suppressions (disclosive cells)
    - Secondary suppressions (non-disclosive cells)
      - Additional cells need to be suppressed to prevent re-calculation of disclosive cells from table marginals / totals

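Why secondary suppressions are needed can be shown with a small Python sketch of a single table row with a published total. If only the disclosive cell is suppressed, its value follows by subtraction from the total. The secondary choice below (suppress the smallest remaining cell) is a deliberately naive heuristic used only for illustration; it is not the method used at Statistics Finland or in τ-Argus, and it ignores the fact that two suppressed cells can still be bounded from their published sum. All function names and figures are hypothetical.

```python
from typing import List, Optional


def recover_single(cells: List[Optional[int]], total: int) -> Optional[int]:
    """If exactly one cell is suppressed (None), recompute it from the total."""
    missing = [i for i, c in enumerate(cells) if c is None]
    if len(missing) != 1:
        return None  # nothing suppressed, or too many unknowns for subtraction
    return total - sum(c for c in cells if c is not None)


def primary_suppress(values: List[int], min_count: int = 3) -> List[Optional[int]]:
    """Suppress only the disclosive cells (frequency below min_count)."""
    return [None if v < min_count else v for v in values]


def suppress_row(values: List[int], min_count: int = 3) -> List[Optional[int]]:
    """Primary suppression plus one naive secondary suppression when needed."""
    suppressed = {i for i, v in enumerate(values) if v < min_count}
    if len(suppressed) == 1:
        # A lone suppressed cell is recoverable from the row total, so
        # also hide the smallest non-disclosive cell (naive heuristic).
        candidates = [i for i in range(len(values)) if i not in suppressed]
        if candidates:
            suppressed.add(min(candidates, key=lambda i: values[i]))
    return [None if i in suppressed else v for i, v in enumerate(values)]


row = [2, 15, 8, 40]  # made-up cell frequencies
row_total = sum(row)  # the published marginal, 65

print(recover_single(primary_suppress(row), row_total))  # disclosive cell recomputed
print(suppress_row(row))                                 # two cells hidden
print(recover_single(suppress_row(row), row_total))      # no longer recoverable
```

In a real table the same check has to hold for every row and column total simultaneously, which is why tables with more than a few explanatory variables quickly become hard to protect by hand.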
Tools for cell suppression
- “Manually” choosing secondary suppressions
  - Works only with simple tables, i.e. tables with only a few explanatory variables
- SAS
  - SAS code specific to certain statistics
  - Not a flexible solution if new explanatory variables or classifications are used
- τ-Argus
  - Can handle more complex tables (hierarchical structure, linked tables)
  - Integration with other production tools and software is not easy
  - Used only in a few business statistics
