Eurostat Statistical Disclosure Control
Presented by Peter-Paul de Wolf, Statistics Netherlands (CBS)

Content
Introduction
What’s the problem?
– Specific for business statistics
Formalising the problem
What to do?
– Methods
– Software
Summary

Introduction
General definition of confidential data: data that cannot be published “as is”
» By law (e.g. statistical law)
» Sensitive data (what’s sensitive?)
» Respondent considers it confidential
» …

Introduction
Physical protection
– Entrance
– Network
Legal protection
– Oath
Statistical Disclosure Control
– Protection of statistical output

What’s the problem?
Statistical output
Microdata
– Not often in case of business data
– Obvious: each record represents a single respondent
Tabular data
– In business data often magnitude tables
– Sometimes frequency tables
– But: aggregated data?!?!?!?

What’s the problem (frequency table)
Cell value itself not sensitive:
– All contributions are equal (1)
Spanning variables
– Identifying, e.g. NACE, Region
– Sensitive, e.g. “environmental offence” (illegal dumping of waste, illegal fishing, oil spills, …)

What’s the problem (frequency table)
Example: number of ship-owners, by Region (rows) and Environmental offence (columns: Yes, No, Total)
Three variants of the table are shown on successive slides, labelled A, B and C (cell counts not preserved).

What’s the problem (magnitude table)
Turnover (10^6 €) of instrument-producing companies
Region    A    B    C    Total
Harps
Organs
Pianos
Other
Total
(cell values not preserved; the second build of the slide adds a “?”)

Formalising the problem
Suppose cell (Piano, A) consists of
Company X: 81 × 10^6 €
Company Y: 5 × 10^6 €
Other three: 2 × 10^6 € each
Total: 92 × 10^6 €
Company Y can subtract its own contribution from the total: 92 - 5 = 87, which is within 7.4% of Company X's 81!
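
The arithmetic on this slide can be replayed in a few lines. This is only a sketch of the reasoning, using the figures given for cell (Piano, A); the variable names are illustrative and not part of the presentation.

```python
# Contributions to cell (Piano, A) in millions of euros, as given on the slide.
contributions = {"X": 81, "Y": 5, "other_1": 2, "other_2": 2, "other_3": 2}

# The published cell value is the sum of all contributions.
total = sum(contributions.values())

# Company Y knows its own contribution and subtracts it from the published total.
# The result is an upper estimate of Company X's contribution.
upper_estimate_of_x = total - contributions["Y"]

# Relative error of that estimate with respect to X's true contribution.
relative_error = (upper_estimate_of_x - contributions["X"]) / contributions["X"]

print(total, upper_estimate_of_x, f"{relative_error:.1%}")  # 92 87 7.4%
```

The estimate is off only by the combined contribution of the three small companies (6), which is why the cell is considered sensitive.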

Formalising the problem
General, objective rules needed
– Threshold rule
– Dominance rule or (n,k)-rule
– p%-rule
p%-rule is favoured over (n,k)-rule and implies minimum of 3 contributors
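
The three rules can be sketched as follows, using their usual textbook formulations. The parameter defaults (a threshold of 3, n = 2 with k = 85, p = 10) are assumptions chosen for illustration; the slides do not state any values, and the function names are hypothetical.

```python
def threshold_unsafe(contribs, min_count=3):
    """Threshold rule: unsafe if the cell has fewer than min_count contributors."""
    return len(contribs) < min_count


def dominance_unsafe(contribs, n=2, k=85.0):
    """(n,k)-rule: unsafe if the n largest contributions exceed k% of the cell total."""
    ordered = sorted(contribs, reverse=True)
    return sum(ordered[:n]) > k / 100 * sum(ordered)


def p_percent_unsafe(contribs, p=10.0):
    """p%-rule: unsafe if the total minus the two largest contributions is
    less than p% of the largest, i.e. the second-largest contributor can
    estimate the largest to within p%. Assumes non-negative contributions."""
    ordered = sorted(contribs, reverse=True)
    if not ordered:
        return False  # empty cell: nothing to disclose
    largest = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0
    remainder = sum(ordered) - largest - second
    return remainder < p / 100 * largest


piano_a = [81, 5, 2, 2, 2]  # cell (Piano, A) from the example, in 10^6 euros

print(threshold_unsafe(piano_a))   # False: five contributors
print(dominance_unsafe(piano_a))   # True: 81 + 5 = 86 exceeds 85% of 92
print(p_percent_unsafe(piano_a))   # True: remainder 6 is below 10% of 81
print(p_percent_unsafe([50, 50]))  # True: with two contributors the remainder is 0
```

The last call illustrates why the p%-rule implies a minimum of 3 contributors: with one or two contributors the remainder is always zero, so the cell is unsafe for any p > 0.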

What to do?
Redesign table
– Combine rows/columns
– Define different categories
Rounding
Add noise
Cell suppression
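
As an illustration of the rounding option, here is a minimal sketch of conventional rounding to a fixed base. The slide does not say which rounding variant is meant, so the method, the base of 5 and the cell values are all assumptions.

```python
def round_to_base(value, base=5):
    """Round a non-negative integer to the nearest multiple of base."""
    return base * ((value + base // 2) // base)


cells = [12, 7, 4]   # hypothetical interior cell values
total = sum(cells)   # 23

rounded_cells = [round_to_base(v) for v in cells]
rounded_total = round_to_base(total)

print(rounded_cells, sum(rounded_cells), rounded_total)  # [10, 5, 5] 20 25
```

Rounding each cell independently hides the exact values, but the rounded cells no longer add up to the rounded total (20 versus 25).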

Cell suppression
Table: instruments (Harps, Organs, Pianos, Other, Total) by Region (A, B, C, D, Total)
Successive builds of the slide mark suppressed cells with X:
– full table, no suppressions
– 3 cells suppressed
– 6 cells suppressed
– 9 cells suppressed
(cell values and the positions of the X marks are not preserved)
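
The growing number of X marks reflects that suppressing only the sensitive cells is usually not enough: a lone suppressed cell in a row or column can be recomputed from the published total. The sketch below shows that check on a small hypothetical table (the numbers are invented, since the slide values are not available). It only detects exact recovery by subtraction, not the bounds an intruder might still derive.

```python
def recoverable_cells(rows, row_totals, col_totals):
    """Return {(r, c): value} for suppressed cells (None) that can be
    recomputed from the published totals by repeated subtraction."""
    table = [list(row) for row in rows]
    recovered = {}
    changed = True
    while changed:
        changed = False
        # A row with exactly one suppressed cell gives that cell away.
        for r, row in enumerate(table):
            missing = [c for c, v in enumerate(row) if v is None]
            if len(missing) == 1:
                c = missing[0]
                value = row_totals[r] - sum(v for v in row if v is not None)
                table[r][c] = value
                recovered[(r, c)] = value
                changed = True
        # The same holds for a column with exactly one suppressed cell.
        for c in range(len(col_totals)):
            column = [table[r][c] for r in range(len(table))]
            missing = [r for r, v in enumerate(column) if v is None]
            if len(missing) == 1:
                r = missing[0]
                value = col_totals[c] - sum(v for v in column if v is not None)
                table[r][c] = value
                recovered[(r, c)] = value
                changed = True
    return recovered


# Hypothetical 3 x 3 table; the true value of cell (2, 0) is 17.
row_totals = [80, 49, 61]
col_totals = [45, 101, 44]

primary_only = [[20, 50, 10], [8, 19, 22], [None, 32, 12]]
with_secondary = [[20, 50, 10], [None, None, 22], [None, None, 12]]

print(recoverable_cells(primary_only, row_totals, col_totals))    # {(2, 0): 17}
print(recoverable_cells(with_secondary, row_totals, col_totals))  # {}
```

With three additional (secondary) suppressions forming a rectangle, every affected row and column contains two unknowns, so plain subtraction no longer reveals the sensitive cell.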

Software
Latest version can be found on
New Open Source version available end 2014

Contact/info
Glossary, handbook, project info
– Wiley book