Published by Prosper Simmons. Modified over 9 years ago.
1
Setting the Stage: How De-Identification Came into U.S. Law, and Why the Debate Matters Today
Professor Peter Swire
Ohio State University/Future of Privacy Forum
FPF Conference on DeIdentification
National Press Club
December 5, 2011
2
Overview
U.S. history: Census, federal agency statistics, & HIPAA
Why Deidentification (DeID) matters today
– The debate – it works or it doesn’t
– Three threat models
– Analogy to law enforcement
Big picture – useful for many tasks, even with the limits shown by scientists
3
Census, Statistics & DeID
Many years of Census experience
– Highly useful data
– Deidentified
Periodic opposition to mandatory reporting
Needed strong confidentiality promises
– Suppress small cell size (e.g., the only home in a census tract)
– Fuzz data
– Strict rules against release even for national security purposes
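The small-cell suppression mentioned on this slide can be sketched in a few lines of Python. This is only an illustration, not the Census Bureau's actual procedure: the record layout, the `suppress_small_cells` helper, and the threshold of 3 are all assumptions made for the example.

```python
from collections import Counter

def suppress_small_cells(records, cell_of, threshold=3):
    """Tabulate records per cell, withholding counts below the threshold.

    threshold=3 is a hypothetical value chosen for this sketch.
    """
    counts = Counter(cell_of(r) for r in records)
    # None marks a suppressed cell: publishing it could single out a household.
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

# Hypothetical data: one record per home, tagged with its census tract.
homes = [{"tract": "A"}] * 4 + [{"tract": "B"}]  # tract B has only one home
table = suppress_small_cells(homes, cell_of=lambda r: r["tract"])
print(table)
```

Tract B, with a single home, is withheld, while tract A's count is published unchanged. "Fuzzing" would instead perturb the published values; it is not shown here.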
4
Federal Agency Statistics
Codification in Confidential Information Protection & Statistical Efficiency Act of 2002 (CIPSEA)
– Good history by Sylvester & Lohr
Basic rule: if collect data for statistical purposes, use only for statistical purposes, don’t ReID
Funny thing: same culture & practice for years in private sector polling (Gallup-style) and market research
Many years of practice here
Perhaps a basic guideline going forward?
5
HIPAA
1999-2000 regs informed by Sweeney research
Safe harbor – delete a lot of specified data fields
Expert (I pushed for this) – where statistical basis, can achieve DeID based on risk, not safe harbor
Data use agreements – release for research, with enforceable promise not to ReID
In short:
– If scrubbed enough, can release publicly
– If scrubbed less, then enforceable promise not to ReID
6
Why It Matters Today
Now data mining far beyond specialized researchers
– The Internet (commercial since only 1993) gives me access to data
– Storage & processing on my laptop > mainframe of 25 years ago
– Search is way better
– The erosion of practical obscurity
– “they” really may figure out who “we” are
7
The Debate is Joined
Ohm (and others) draw on Sweeney-type research
– DeID likely to lead to ReID
Yakowitz (and others) respond
– Benefits of public data enormous
– Practical risk/harm from ReID low
Anonymization creates huge risks or low risks?
Worth doing anonymization/DeID at all?
Today’s conference to shed light on this …
8
Threat Models – Which Attackers?
Three types of attackers on “anonymized” data:
– Insiders “peeping”
– Outside hackers intruding
– The public who doesn’t get into the database
DeID often effective for first two
Ohm/Yakowitz debate primarily on the third
9
Insiders Peeping
Swire 2009 Peeping article, at peterswire.net
Threat: employee or employee of sub-contractor sees the data and “peeps”
– Sees celebrity information – Clooney
– Sees information about friend/family/ex
– Sees information to create harm (ID theft, blackmail)
Anonymization useful part of anti-peeping strategy
– Employee doesn’t search or stumble upon Clooney
– Employee may lack tools to do Sweeney-type analysis
– Audit logs catch employees who try
– Give employees access to statistical data, not PII
10
Outside Hackers
Hacker may intrude for a short while
– Anonymization may prevent “ah hah” – Clooney
Hacker may download database
– If so, then hacker becomes similar to the public
– May or may not be good at Sweeney-type tricks
– May be focused on specific types of information, and not try to ReID
Less-than-perfect DeID may substantially reduce incidence of ReID
11
Re-ID by “The Public”
So, masking may help against some threats
The debate, though, is whether “the public” (i.e., the experts) can ReID
Sweeney & other research provides startling & important results of ReID
– Can everything be ReIdentified?
12
ReID & 2 Famous Studies
Date of birth, zip, & gender -> 80%+ unique
– Yes
– BUT, DOB is off-the-charts different
  – Gender – splits population in half
  – DOB = 366 (days) x 80 (years) = over 25,000 cells
  – Moral – DOB ridiculously strong to ReID
Netflix study – can Re-ID over 60% of movie reviews
– BUT, takes known IMDb reviewers and matches to Netflix
– Can ReID a lot, but not a big effect
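The cell arithmetic on this slide can be worked through in Python to show why date of birth dominates. This is a rough sketch under stated assumptions, not a reproduction of the published study: the ZIP code population of 20,000 is hypothetical, and residents are assumed to fall into cells uniformly and independently, which real populations do not.

```python
DAYS = 366      # possible birthdays, as on the slide
YEARS = 80      # birth years covered, as on the slide
GENDERS = 2     # gender only splits the population in half

dob_cells = DAYS * YEARS        # cells from date of birth alone
cells = dob_cells * GENDERS     # cells per ZIP code once gender is added

def expected_unique_fraction(residents, cells):
    """Chance that nobody else in the ZIP code shares a given person's cell,
    assuming residents land in cells uniformly and independently."""
    return (1 - 1 / cells) ** (residents - 1)

zip_residents = 20_000  # hypothetical ZIP code population
unique_share = expected_unique_fraction(zip_residents, cells)
print(dob_cells, cells, round(unique_share, 2))
```

Date of birth alone yields 29,280 cells, and gender merely doubles that to 58,560. Under these simplified assumptions roughly 70% of residents in the hypothetical ZIP code would be alone in their cell, which is the sense in which DOB is "ridiculously strong" for ReID.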
13
Law Enforcement Analogy
So, is ReID generally easy or hard, useful or useless?
Consider cop with a bunch of clues (male, tall, red hair, etc.)
– Enough to ReID? No
– Helpful to ReID? Yes
– A matter of how much legwork, analysis, extra data is available and accurate
– Very big range for difficulty of finding the suspect
– Same is true for ability of “the public” to ReID, to name the suspect
14
Conclusion
Issue matters today -- more data potentially available to “the public”
History of useful anonymization in statistics
– If collect data for statistical purposes, use only for statistical purposes, store that way, don’t ReID
DeID helps against insider & hacker threats
ReID by “the public” varies widely in the effort needed to find the “suspect”
Our conference today to help policymakers learn where DeID likely to be most useful