The State of the Art. Cynthia Dwork, Microsoft Research.

Similar presentations
Survey design. What is a survey?? Asking questions – questionnaires Finding out things about people Simple things – lots of people What things? What people?

Cipher Techniques to Protect Anonymized Mobility Traces from Privacy Attacks Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip and Nageswara S. V. Rao.
29e CONFÉRENCE INTERNATIONALE DES COMMISSAIRES À LA PROTECTION DES DONNÉES ET DE LA VIE PRIVÉE 29 th INTERNATIONAL CONFERENCE OF DATA PROTECTION AND PRIVACY.
Differentially Private Recommendation Systems Jeremiah Blocki Fall A: Foundations of Security and Privacy.
Raef Bassily Adam Smith Abhradeep Thakurta Penn State Yahoo! Labs Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds Penn.
“Mortgages, Privacy, and Deidentified Data” Professor Peter Swire Ohio State University Center for American Progress Consumer Financial Protection Bureau.
Privacy Enhancing Technologies
UTEPComputer Science Dept.1 University of Texas at El Paso Privacy in Statistical Databases Dr. Luc Longpré Computer Science Department Spring 2006.
Kunal Talwar MSR SVC [Dwork, McSherry, Talwar, STOC 2007] TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A AA A.
Differential Privacy 18739A: Foundations of Security and Privacy Anupam Datta Fall 2009.
Finding Personally Identifying Information Mark Shaneck CSCI 5707 May 6, 2004.
Bayesian Networks, Lecture 9. Edited by Dan Geiger from Nir Friedman's slides.
Anatomy: Simple and Effective Privacy Preservation Israel Chernyak DB Seminar (winter 2009)
Frank McSherry Researcher Microsoft Research, Silicon Valley.
Privacy: Challenges and Opportunities Tadayoshi Kohno Department of Computer Science and Engineering University of Washington.
Privacy without Noise Yitao Duan NetEase Youdao R&D Beijing China CIKM 2009.
Foundations of Privacy Lecture 11 Lecturer: Moni Naor.
Current Developments in Differential Privacy Salil Vadhan Center for Research on Computation & Society School of Engineering & Applied Sciences Harvard.
Chapter 1 Database Systems. Good decisions require good information derived from raw facts Data is managed most efficiently when stored in a database.
Private Analysis of Graphs
The Complexity of Differential Privacy Salil Vadhan Harvard University TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.:
Defining and Achieving Differential Privacy Cynthia Dwork, Microsoft TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.:
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be.
Math 409/409G History of Mathematics
JSM, Boston, August 8, 2014 Privacy, Big Data and The Public Good: Statistical Framework Stefan Bender (IAB)
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved OPIM 303-Lecture #5 Jose M. Cruz Assistant Professor.
Data Publishing against Realistic Adversaries Johannes Gerhrke Cornell University Ithaca, NY Michaela Götz Cornell University Ithaca, NY Ashwin Machanavajjhala.
Thwarting Passive Privacy Attacks in Collaborative Filtering Rui Chen Min Xie Laks V.S. Lakshmanan HKBU, Hong Kong UBC, Canada UBC, Canada Introduction.
Differentially Private Marginals Release with Mutual Consistency and Error Independent of Sample Size Cynthia Dwork, Microsoft TexPoint fonts used in EMF.
Dimensions of Privacy 18739A: Foundations of Security and Privacy Anupam Datta Fall 2009.
Lecture 17 Page 1 CS 236 Online Privacy CS 236 On-Line MS Program Networks and Systems Security Peter Reiher.
Data Anonymization (1). Outline  Problem  concepts  algorithms on domain generalization hierarchy  Algorithms on numerical data.
Personalized Social Recommendations – Accurate or Private? A. Machanavajjhala (Yahoo!), with A. Korolova (Stanford), A. Das Sarma (Google) 1.
Differential Privacy Some contents are borrowed from Adam Smith’s slides.
Creating Open Data whilst maintaining confidentiality Philip Lowthian, Caroline Tudor Office for National Statistics 1.
Differential Privacy: Theoretical & Practical Challenges Salil Vadhan Center for Research on Computation & Society John A. Paulson School of Engineering.
Privacy-preserving data publishing
A Whirlwind Tour of Differential Privacy
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 13, 2013.
Fundamental Limitations of Networked Decision Systems Munther A. Dahleh Laboratory for Information and Decision Systems MIT AFOSR-MURI Kick-off meeting,
Anonymity and Privacy Issues --- re-identification
Probabilistic km-anonymity (Efficient Anonymization of Large Set-valued Datasets) Gergely Acs (INRIA) Jagdish Achara (INRIA)
Graph Data Management Lab, School of Computer Science Personalized Privacy Protection in Social Networks (VLDB2011)
THRio Database Linkage and THRio Database Issues.
Differential Privacy (1). Outline  Background  Definition.
Differential Privacy Xintao Wu Oct 31, Sanitization approaches Input perturbation –Add noise to data –Generalize data Summary statistics –Means,
A Brief Maximum Entropy Tutorial Presenter: Davidson Date: 2009/02/04 Original Author: Adam Berger, 1996/07/05
Introduction to Machine Learning © Roni Rosenfeld,
1 Differential Privacy Cynthia Dwork Mamadou H. Diallo.
Auditing Information Leakage for Distance Metrics Yikan Chen David Evans TexPoint fonts used in EMF. Read the TexPoint manual.
Unraveling an old cloak: k-anonymity for location privacy
Sequential Off-line Learning with Knowledge Gradients Peter Frazier Warren Powell Savas Dayanik Department of Operations Research and Financial Engineering.
Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise.
Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies Florian Tramèr, Zhicong Huang, Erman Ayday,
A hospital has a database of patient records, each record containing a binary value indicating whether or not the patient has cancer. -suppose.
Introduction to Machine Learning
University of Texas at El Paso
ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION International Journal on Uncertainty, Fuzziness and Knowledge-based Systems,
Reconciling Public Policy with New Theories of Privacy
Database Systems: Design, Implementation, and Management Tenth Edition
Privacy-preserving Release of Statistics: Differential Privacy
Human Cells Human genomics
Thinking (and teaching) Quantitatively about Bias and Privacy
Differential Privacy in Practice
Current Developments in Differential Privacy
Published in: IEEE Transactions on Industrial Informatics
Some contents are borrowed from Adam Smith’s slides
Differential Privacy (1)
Presentation transcript:


Pre-Modern Cryptography: propose a scheme; break the scheme; repeat.

Modern Cryptography: propose a definition; break the definition; propose a STRONGER definition; break that one too; eventually, exhibit algorithms satisfying the definition [Goldwasser-Micali 82, Goldwasser-Micali-Rivest 85].


No Algorithm? Propose a definition and find no algorithm satisfying it. Why?

Provably No Algorithm? Then the definition is bad: propose a WEAKER or DIFFERENT definition, and look again for an algorithm.

The Privacy Dream: a curator C transforms the original database into a sanitized data set. Census, financial, and medical data; OTC drug purchases; social networks; MOOC data; call and text records; energy consumption; loan, advertising, and applicant data; ad clicks; product correlations; query logs; …

Fundamental Law of Info Recovery: "overly accurate" estimates of "too many" statistics are blatantly non-private [Dinur-Nissim 03; Dwork-McSherry-Talwar 07; Dwork-Yekhanin 08; De 12; Muthukrishnan-Nikolov 12; …].

Anonymization (aka De-Identification): remove "personally identifying information" (the name), leaving sex, DOB, zip, symptoms, previous admissions, medications, family history, … Is that enough?

William Weld's Medical Records: voter registration data (name, address, ZIP, birth date, sex, date registered, party affiliation, last voted) was linked to "anonymized" HMO data (ZIP, birth date, sex, ethnicity, visit date, diagnosis, procedure, medication, total charge) via the shared attributes ZIP, birth date, and sex [Sweeney 97].
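The Weld-style linkage can be illustrated with a toy join on the quasi-identifiers. All names and records below are fabricated for illustration; only the mechanism (joining two tables on ZIP, birth date, and sex) comes from the slide.

```python
# Toy linkage attack: an "anonymized" medical table still carries
# quasi-identifiers (ZIP, DOB, sex) that join against a public voter roll.

voter_roll = [
    {"name": "W. Weld", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
    {"name": "A. Smith", "zip": "02139", "dob": "1971-02-12", "sex": "F"},
]

medical_records = [  # "de-identified": names removed, diagnoses kept
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "X"},
    {"zip": "02139", "dob": "1971-02-12", "sex": "F", "diagnosis": "Y"},
]

def link(voters, records):
    """Join the two tables on the shared quasi-identifiers."""
    quasi = ("zip", "dob", "sex")
    key = lambda r: tuple(r[q] for q in quasi)
    by_key = {key(r): r for r in records}
    return {v["name"]: by_key[key(v)]["diagnosis"]
            for v in voters if key(v) in by_key}

print(link(voter_roll, medical_records))  # every "anonymous" record re-identified
```

In the real attack the join keys (ZIP, birth date, sex) uniquely identified a large fraction of the U.S. population, which is why removing names alone fails.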

Anonymization revisited: removing "personally identifying information" (name) while keeping sex, DOB, zip, symptoms, previous admissions, medications, family history, … did not prevent re-identification.

De-anonymization of the Netflix Prize data set [Narayanan-Shmatikov 08].

Re-ID Not the Only Worry: the same propose/break cycle played out for syntactic privacy definitions: k-anonymity [Samarati-Sweeney 98], l-diversity [Machanavajjhala-Gehrke-Kifer-Venkitasubramaniam 06], m-invariance [Xiao-Tao 07], t-closeness [Li-Li-Venkatasubramanian 07], each with algorithms satisfying the definition, each broken in turn.

Culprit: Diverse Background Info. Voter rolls identified Weld (male, DOB, zip); IMDb ratings identified Netflix subscribers.

Billing for Targeted Advertisements [Korolova 12]: microtargeted ad campaigns, combined with the advertiser's campaign reports, can reveal a targeted individual's private attributes.

Product Recommendations: X's preferences influence Y's experience. By combining evolving "people who bought this also bought…" similar-items lists with a little knowledge (e.g., from your blog) of what you bought, an adversary can infer purchases you did not choose to publicize [Calandrino-Kilzer-Narayanan-Felten-Shmatikov 11].

SNP: single nucleotide polymorphism (A, C, G, T). Given published SNP statistics of a GWAS case group, one "can" test membership in the case group using the target's DNA and HapMap reference statistics [Homer et al. 08].
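A Homer-style membership test can be sketched in a few lines. This is a toy simulation, not the paper's statistic: the SNP count, case-group size, frequencies, and the simple correlation score below are all illustrative assumptions. The point it demonstrates is the slide's: a target's alleles track the published case-group frequencies only if the target's genome contributed to them.

```python
import random

random.seed(0)
M, N = 2000, 50  # number of SNPs, case-group size (illustrative)

# Reference-population allele frequencies (stand-in for HapMap statistics).
f_pop = [random.uniform(0.2, 0.8) for _ in range(M)]

# Sample a case group and publish its per-SNP allele frequencies (the "GWAS statistics").
case_group = [[1 if random.random() < f_pop[i] else 0 for i in range(M)]
              for _ in range(N)]
f_case = [sum(g[i] for g in case_group) / N for i in range(M)]

def score(target):
    # Toy membership statistic: do the target's alleles correlate with the
    # case group's deviations from the reference population?
    return sum((target[i] - f_pop[i]) * (f_case[i] - f_pop[i]) for i in range(M))

member = case_group[0]
outsider = [1 if random.random() < f_pop[i] else 0 for i in range(M)]
print(score(member) > score(outsider))  # membership shifts the score upward
```

Aggregated across thousands of SNPs, each individual's tiny contribution to the published frequencies becomes statistically detectable, which is why even aggregate GWAS releases can leak membership.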

Culprit: Diverse Background Info. Voter rolls (Weld); IMDb (Netflix); blogs (product recommendations); billing reports (targeted ads); HapMap (GWAS statistics).

How Should We Approach Privacy?
• "Computer science got us into this mess, can computer science get us out of it?" (Sweeney, 2012)
• Complexity of this type requires a mathematically rigorous theory of privacy and its loss.
• We cannot discuss tradeoffs between privacy and statistical utility without a measure that captures cumulative harm over multiple uses.
• Other fields (economics, ethics, policy) cannot be brought to bear without a "currency," or measure, of privacy with which to work.

Useful Databases that Teach
• Database teaches that smoking causes cancer.
• Smoker S's insurance premiums rise.
• Premiums rise even if S is not in the database!
• Learning that smoking causes cancer is the whole point.
• Smoker S enrolls in a smoking-cessation program.
• Differential privacy: limit harms to the teachings, not to participation.
The outcome of any analysis is essentially equally likely, independent of whether any individual joins, or refrains from joining, the database. Equivalently: the likelihood of any possible harm to me (high premiums, arrest, purchases revealed to a co-worker, …) is essentially independent of whether I join.

Differential Privacy [Dwork-McSherry-Nissim-Smith '06]
The parameter ε of the definition is the "privacy loss." It is impossible to know the actual probabilities of bad events; we can still control the change in risk due to joining the database.
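For reference, the standard ε-differential-privacy condition of [Dwork-McSherry-Nissim-Smith 06], where \(\mathcal{M}\) is a randomized mechanism and \(x, x'\) are databases differing in a single row:

```latex
\[
\forall\, x, x' \text{ differing in one row},\ \forall\, S \subseteq \mathrm{Range}(\mathcal{M}):
\quad \Pr[\mathcal{M}(x) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(x') \in S].
\]
```

Because the bound holds in both directions (swap \(x\) and \(x'\)), no event becomes much more or much less likely when one individual joins the database.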

Differential Privacy
• Nuanced measure of privacy loss
• Captures cumulative harm over multiple uses, multiple databases
• Adversary's background knowledge is irrelevant
• Immune to re-identification attacks, etc.
• "Programmable": construct complicated private analyses from simple private building blocks
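A minimal, self-contained illustration of the definition (an addition, not from the slides): randomized response reports a sensitive bit truthfully with probability 3/4 and flipped with probability 1/4. Its privacy loss can be computed exactly by enumerating the two output distributions, yielding ε = ln 3.

```python
from fractions import Fraction
from math import log

def rr_output_dist(bit):
    """Randomized response: report the true bit with prob 3/4, flip with 1/4.
    Returns the exact output distribution over {0, 1}."""
    p_true = Fraction(3, 4)
    return {bit: p_true, 1 - bit: 1 - p_true}

# Privacy loss: worst-case log-ratio of output probabilities over the two
# neighboring "databases" (bit = 0 vs. bit = 1) and both possible outputs.
d0, d1 = rr_output_dist(0), rr_output_dist(1)
eps = max(abs(log(float(d0[o] / d1[o]))) for o in (0, 1))
print(eps)  # ln 3 ≈ 1.0986: randomized response is (ln 3)-differentially private
```

The worst-case ratio is (3/4)/(1/4) = 3, so no observer can update their belief about the true bit by more than a factor of 3, whichever output they see.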

Recall: Fundamental Law. "Overly accurate" estimates of "too many" statistics are blatantly non-private [Dinur-Nissim 03; Dwork-McSherry-Talwar 07; Dwork-Yekhanin 08; De 12; Muthukrishnan-Nikolov 12; …].

Answer Only Questions Asked: data analysts submit queries q1, q2, q3, … to a curator C holding the database; C returns answers a1, a2, a3, …

Intuition

Algorithms, geometry, learning theory, complexity theory, cryptography, statistics, machine learning, programming languages, verification, databases, economics,…

Not a Panacea: the Fundamental Law of Information Recovery still holds.

Challenge: The Meaning of Loss
• Sometimes the theory gives exactly the right answer: a large loss in differential privacy translates to an "obvious" real-life privacy breach, under circumstances known to be plausible.
• Other times? Do all large losses translate to such realizable privacy breaches, or is the theory too pessimistic?

Policy Recommendation [Dwork-Mulligan 14]

"Just a Few"? Randomly choose a few rows and publish them in their entirety. OK?

Fundamental Law
• There is a (LARGE) set of statistics S such that an analyst holding even a remotely accurate estimate of EVERY statistic in S can completely violate any reasonable notion of privacy; a private mechanism must therefore be very inaccurate on some statistic in S.
• There is a very simple way of designing small sets of statistics T such that an analyst holding estimates, of 80% of the statistics in T, that beat the sampling error can completely violate any reasonable notion of privacy.
"Overly accurate" estimates of "too many" statistics are blatantly non-private [Dinur-Nissim 03; Dwork-McSherry-Talwar 07; Dwork-Yekhanin 08; De 12; Muthukrishnan-Nikolov 12; …].
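A toy sketch of the reconstruction attacks behind the Fundamental Law. The parameters, the {-1, 0, 1} noise model, and the brute-force search are illustrative assumptions; the cited papers give far more efficient attacks and sharper bounds. The idea: release many noisy subset counts over a bit-vector database, then pick the candidate database most consistent with the released answers.

```python
import random

random.seed(1)
n, m = 10, 60  # database size, number of released statistics
secret = [random.randint(0, 1) for _ in range(n)]

# Release m random subset-count queries, each perturbed by small bounded noise
# ("overly accurate" answers to "too many" statistics).
queries = [[i for i in range(n) if random.random() < 0.5] for _ in range(m)]
released = [sum(secret[i] for i in q) + random.choice([-1, 0, 1]) for q in queries]

def max_deviation(candidate):
    """Worst-case disagreement between a candidate database and the releases."""
    return max(abs(sum(candidate[i] for i in q) - a)
               for q, a in zip(queries, released))

# Dinur-Nissim-style brute force: the candidate database most consistent with
# the noisy answers is (essentially) the secret one.
best = min(([int(b) for b in format(c, f"0{n}b")] for c in range(2 ** n)),
           key=max_deviation)

errors = sum(b != s for b, s in zip(best, secret))
print(errors)  # near-perfect reconstruction despite the noise
```

With noise of magnitude o(√n) and enough queries, near-exact reconstruction is guaranteed; adding noise large enough to block the attack is exactly the price the Fundamental Law says must be paid.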