
Privacy by Learning the Database Moritz Hardt DIMACS, October 24, 2012

Isn’t privacy the opposite of learning the database?

Curator holds a data set D = multi-set over a universe U. Analyst poses a query set Q. Curator releases a privacy-preserving structure S that is accurate on Q.

Data set D as an N-dimensional histogram, where N = |U| and D[i] = # elements in D of type i. The normalized histogram is a distribution over the universe. A statistical query q (aka linear/counting query) is a vector q in [0,1]^N, with answer q(D) := ⟨q, D⟩ ∈ [0,1] on the normalized histogram.
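A minimal sketch of this representation, assuming records are given as indices into the universe (the function names are illustrative, not from the talk):

```python
import numpy as np

# Data set D as a normalized histogram over a universe of size N:
# D[i] = fraction of records of type i, so D sums to 1.
def normalized_histogram(records, N):
    D = np.zeros(N)
    for r in records:          # each record is an index in {0, ..., N-1}
        D[r] += 1.0
    return D / len(records)

# A statistical (linear/counting) query is a vector q in [0,1]^N;
# its answer q(D) = <q, D> lies in [0,1].
def answer(q, D):
    return float(np.dot(q, D))
```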

Why statistical queries?
– Perceptron, ID3 decision trees, PCA/SVM, k-means clustering [BlumDworkMcSherryNissim’05]
– Any SQ-learning algorithm [Kearns’98] – includes “most” known PAC-learning algorithms
– Lots of data analysis reduces to multiple statistical queries

Curator’s wildest dream: release a synopsis that answers every query in Q exactly while preserving privacy. This seems hard!

Curator’s 2nd attempt: maximize the entropy of the released histogram subject to (approximately) matching D on every query in Q. Intuition: Entropy implies privacy.

Two pleasant surprises:
– Approximately solved by the multiplicative weights update [Littlestone’89, ...]
– Can easily be made differentially private

Why did learning theorists care to solve privacy problems 20 years ago? Answer: Entropy implies generalization

Learner sees an example set Q labeled by an unknown concept and outputs a hypothesis h accurate on all examples. Maximizing entropy implies the hypothesis generalizes.

Privacy                                     Learning
Sensitive database                          Unknown concept
Queries labeled by answer on DB             Examples labeled by concept
Synopsis approximates DB on query set       Hypothesis approximates target concept on examples
Must preserve privacy                       Must generalize

How can we solve this? It is a concave maximization subject to linear constraints, so the Ellipsoid method applies. We’ll take a different route.

Start with uniform D_0 and ask: “What’s wrong with it?” Some query q violates a constraint! Minimize the entropy loss subject to correcting q. Closed form expression for D_{t+1}? Well...

Closed form expression for D_{t+1}? YES! Relax. Approximate. Think.

Multiplicative Weights Update

[Figure: the histograms D_t (approximation) and D (true) over the N bins, at step t.]

At step t, suppose q(D_t) < q(D). [Figure: query q highlights the bins where D_t underweights D.]

After step t: [Figure: the update has shifted the weight of D_t toward D on the support of q.]

Multiplicative Weights Update. Algorithm: D_0 uniform. For t = 1...T: find a bad query q; D_{t+1} = Update(D_t, q). How quickly do we run out of bad queries?
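A non-private sketch of this loop, assuming the query set Q is given as the rows of a matrix; the step size eta and threshold alpha are illustrative parameters:

```python
import numpy as np

def mw_fit(D, Q, alpha, eta, T):
    """Maintain an approximation Dt of the true histogram D, updating it
    multiplicatively whenever some query in Q is off by more than alpha."""
    N = len(D)
    Dt = np.ones(N) / N                    # D_0: uniform distribution
    for _ in range(T):
        errs = Q @ Dt - Q @ D              # q(D_t) - q(D) for every row q of Q
        i = int(np.argmax(np.abs(errs)))   # the worst ("bad") query
        if abs(errs[i]) < alpha:
            break                          # no bad queries left: done
        # Multiplicative weights update: shift mass of Dt toward D along q_i.
        Dt = Dt * np.exp(-eta * np.sign(errs[i]) * Q[i])
        Dt /= Dt.sum()                     # re-normalize to a distribution
    return Dt
```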

Put Φ_t := RE(D ‖ D_t), the relative entropy between the true histogram and the current approximation. Progress Lemma: if q is bad, the potential drops by Ω(α²).

Facts: Φ_0 ≤ log N and Φ_t ≥ 0. Combined with the Progress Lemma (each bad query costs Ω(α²) potential), there are at most O(log N / α²) steps. Error bound: once no bad query remains, every query in Q is answered to within α.
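A worked version of the standard potential argument behind these facts (the slide's own formulas were images, so the constants here are the usual ones rather than the talk's):

```latex
% Potential: relative entropy between true and current histogram.
\Phi_t := \mathrm{RE}(D \,\|\, D_t), \qquad \Phi_0 \le \log N, \qquad \Phi_t \ge 0.
% If q is bad, i.e. |q(D) - q(D_t)| > \alpha, an MW step of size \eta gives
\Phi_t - \Phi_{t+1} \ \ge\ \eta\,\alpha - \eta^2 \ =\ \tfrac{\alpha^2}{4}
  \quad\text{for } \eta = \tfrac{\alpha}{2},
% so the number of bad steps is at most
T \ \le\ \frac{4 \log N}{\alpha^2}.
```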

Algorithm: D_0 uniform. For t = 1...T: find a bad query q; D_{t+1} = Update(D_t, q). What about privacy? Finding the bad query is the only step that interacts with D.

Differential Privacy [Dwork-McSherry-Nissim-Smith-06]. Two data sets D, D’ are called neighboring if they differ in one element. Definition (Differential Privacy): A randomized algorithm M(D) is called (ε,δ)-differentially private if for any two neighboring data sets D, D’ and all events S: Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D’) ∈ S] + δ.

Laplacian Mechanism [DMNS’06]. Given query q: 1. Compute q(D). 2. Output q(D) + Lap(1/(ε_0 n)). Fact: this satisfies ε_0-differential privacy. Note: the sensitivity of q is 1/n.
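A sketch of the mechanism, assuming q and the normalized histogram D are numpy vectors as above (names illustrative):

```python
import numpy as np

# Laplacian mechanism for a statistical query q of sensitivity 1/n:
# releasing q(D) + Lap(1/(eps0*n)) is eps0-differentially private.
def laplace_answer(q, D, n, eps0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    true_answer = float(np.dot(q, D))   # q(D) on the normalized histogram
    return true_answer + rng.laplace(scale=1.0 / (eps0 * n))
```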

Query selection: for each query q_1, q_2, q_3, ..., q_k, consider its violation |q(D) − q(D_t)|.

Query selection: add Lap(1/(ε_0 n)) noise to each violation |q(D) − q(D_t)|.

Query selection: pick the maximal noisy violation.

Query selection: pick the maximal noisy violation. Lemma [McSherry-Talwar’07]: the selected index satisfies ε_0-differential privacy, and w.h.p. the selected violation is at least the maximum violation minus O(log k / (ε_0 n)).
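A sketch of this noisy selection step ("report noisy max"), under the same vector conventions as above:

```python
import numpy as np

# Noisy selection: perturb each violation with Laplace noise and release
# only the argmax. Per [McSherry-Talwar'07], releasing the winning index
# satisfies eps0-differential privacy.
def select_bad_query(Q, D, Dt, n, eps0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    violations = np.abs(Q @ D - Q @ Dt)    # |q(D) - q(D_t)| per query
    noisy = violations + rng.laplace(scale=1.0 / (eps0 * n), size=len(Q))
    return int(np.argmax(noisy))
```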

Algorithm: D_0 uniform. For t = 1...T: noisy selection of q; D_{t+1} = Update(D_t, q), also using the noisy answer in the update rule. Now each step satisfies ε_0-differential privacy! What is the total privacy guarantee? New error bound: the analysis must also account for the selection and answer noise (quantified in the theorems below).

T-fold composition of ε_0-differential privacy satisfies:
Answer 1 [DMNS’06]: ε_0·T-differential privacy.
Answer 2 [DRV’10]: (ε,δ)-differential privacy with ε ≈ ε_0·√(T log(1/δ)), for small enough ε_0.
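The two accounting rules side by side; the DRV'10 expression below is the standard statement of that theorem, not a formula taken from the slide:

```python
import math

# Total privacy cost of T adaptive invocations of an eps0-private mechanism.
def basic_composition(eps0, T):
    return eps0 * T                        # [DMNS'06]: linear growth

def advanced_composition(eps0, T, delta):
    # [DRV'10]: eps0*sqrt(2T ln(1/delta)) + T*eps0*(e^eps0 - 1); for small
    # eps0 the first (square-root) term dominates.
    return (eps0 * math.sqrt(2 * T * math.log(1 / delta))
            + T * eps0 * (math.exp(eps0) - 1))
```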

Composition Theorems + Error bound: optimize T and ε_0 given ε, δ.
Theorem 1. On databases of size n, MW achieves ε-differential privacy with error α = Õ((log|Q| · log N / (εn))^{1/3}).
Theorem 2. MW achieves (ε,δ)-differential privacy with error α = Õ((log|Q| · √(log N) / (εn))^{1/2}).
Optimal dependence on |Q| and n.

Offline (non-interactive): release a structure S accurate on all of Q ✔ [H-Ligett-McSherry12, Gupta-H-Roth-Ullman11]. Online (interactive): answer an adaptively chosen sequence q_1, q_2, ... with answers a_1, a_2, ... ? [H-Rothblum10]. See also: Roth-Roughgarden10, Dwork-Rothblum-Vadhan10, Dwork-Naor-Reingold-Rothblum-Vadhan09, Blum-Ligett-Roth08.

Private MW Online [H-Rothblum’10]. Given query q_t:
If |q_t(D_t) − q_t(D)| < α/2 + Lap(1/(ε_0 n)):
– Output q_t(D_t)
Otherwise:
– Output q_t(D) + Lap(1/(ε_0 n))
– D_{t+1} = Update(D_t, q_t)
Achieves the same error bounds!
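A sketch of this interactive loop under the same conventions as before; updating from the noisy answer via the MW step above is an assumption of this sketch, and eta/alpha/eps0 are illustrative parameters:

```python
import numpy as np

def private_mw_online(D, queries, n, alpha, eta, eps0, rng=None):
    """Sketch of online Private MW [H-Rothblum'10]; queries arrive one at a
    time as vectors in [0,1]^N."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(D)
    Dt = np.ones(N) / N                    # D_0: uniform
    answers = []
    for q in queries:
        est, real = float(q @ Dt), float(q @ D)
        if abs(est - real) < alpha / 2 + rng.laplace(scale=1.0 / (eps0 * n)):
            answers.append(est)            # lazy round: answer from D_t
        else:
            noisy = real + rng.laplace(scale=1.0 / (eps0 * n))
            answers.append(noisy)          # busy round: noisy true answer
            # MW update of D_t toward the noisy answer.
            Dt = Dt * np.exp(-eta * np.sign(est - noisy) * q)
            Dt /= Dt.sum()
    return answers
```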

Overview: Privacy Analysis.
Offline setting: T << n steps – simple analysis using the composition theorems.
Online setting: k >> n invocations of Laplace – the composition theorems alone don't give a small privacy loss!
Idea: analyze the privacy loss like a lazy random walk (goes back to Dinur-Dwork-Nissim'03).

Privacy Loss as a lazy random walk: the accumulated privacy loss stays put in lazy rounds and moves only in busy rounds (busy round = noisy answer close to forcing an update). W.h.p. the total loss is bounded by O(sqrt(#busy)).

Formalizing the random walk: imagine the output of PMW is a 0/1 indicator vector v, where v_t = 1 if round t is an update and 0 otherwise. Recall: very few updates, so the vector is sparse. Theorem: the vector v is (ε,δ)-differentially private.

Let D, D’ be neighboring DBs and let P, Q be the corresponding output distributions. Approach: 1. Sample v from P. 2. Consider X = log(P(v)/Q(v)). 3. Argue Pr{ |X| > ε } ≤ δ. Lemma: (3) implies (ε,δ)-differential privacy. Intuition: X is the privacy loss.

Privacy loss in round t. We’ll show: 1. X_t = 0 if t is not busy. 2. |X_t| ≤ ε_0 if t is busy. 3. The number of busy rounds is O(#updates). Total privacy loss: E[X_1 + ... + X_k] ≤ O(ε_0² · #updates) [DRV’10], and Azuma gives strong concentration around the expectation.

Defining the “busy” event. Update condition: |q_t(D_t) − q_t(D)| ≥ α/2 + Lap(1/(ε_0 n)). Busy event: the noisy comparison falls within a small slack of the threshold α/2, so the round was close to forcing an update.

Offline (non-interactive) ✔ and online (interactive) ✔: both settings are now handled.

What we can do:
– Offline/batch setting: every set of linear queries.
– Online/interactive setting: every sequence of adaptive and adversarial linear queries.
– Theoretical performance: nearly optimal in the worst case. For instance-by-instance guarantees see H-Talwar10, Nikolov-Talwar (upcoming!), different techniques.
– Practical performance: compares favorably to previous work! See Katrina’s talk.
Are we done?

What we would like to do. Running time: linear dependence on |U|, but |U| is exponential in the number of attributes of the data. Can we get poly(n)?
– No, in the worst case for synthetic data [DNRRV09], even for simple query classes [Ullman-Vadhan10].
– No, in the interactive setting without restricting the query class [Ullman12].
What can we do about it?

Look beyond the worst case! Find meaningful assumptions on data, queries, models, etc. Design better heuristics! In this talk: get more mileage out of learning theory!

Privacy                                     Learning
Sensitive database                          Unknown concept
Queries labeled by answer on DB             Examples labeled by concept
Synopsis approximates DB on query set       Hypothesis approximates target concept on examples

Can we turn this into an efficient reduction? Yes. [H-Rothblum-Servedio’12]

Informal Theorem: There is an efficient differentially private release mechanism for a query class Q provided that there is an efficient PAC-learning algorithm for a related concept class Q’. Interfaces nicely with existing learning algorithms:
– Learning based on polynomial threshold functions [Klivans-Servedio]
– Harmonic Sieve [Jackson] and its extension [Jackson-Klivans-Servedio]

Database as a function: view D as the function F mapping each query q to its answer q(D). Observation: it is enough to learn the threshold functions F_t, for t = α, 2α, ..., (1−α), in order to approximate F.
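A small sketch of this threshold trick, assuming queries are the rows of a matrix Q (helper names are hypothetical):

```python
import numpy as np

# View the database as a function F on queries, F(q) = q(D), and learn the
# Boolean thresholds F_t(q) = 1 iff F(q) >= t for t = alpha, 2*alpha, ...
def threshold_labels(Q, D, t):
    return (Q @ D >= t).astype(int)        # labeled examples for level t

# Knowing (hypotheses for) all the F_t pins F(q) down to within alpha:
def approx_answer(q, hypotheses, alpha):
    # hypotheses[j] approximates F_t for t = (j+1)*alpha; counting how many
    # thresholds fire recovers F(q) up to an additive alpha.
    return alpha * sum(h(q) for h in hypotheses)
```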

High-level idea: feed the learning algorithm labeled examples and obtain a hypothesis h such that h(q) ≈ q(D) on most queries. Observation: if all labels are privacy-preserving, then so will be the hypothesis h.

Main hurdles:
– Privacy requires noise, and noise might defeat the learning algorithm.
– Can only generate |D| examples efficiently before running out of privacy.

Threshold Oracle (“F(x) > t”?): compute a = F(x) + N; if |a − t| is tiny, output “fail”; else if a > t, output 1; else output 0. Ensures: 1. privacy, 2. “removes” the noise, 3. complexity independent of |D|. Generate samples: 1. pick x_1, x_2, ..., x_m; 2. receive b_1, b_2, ..., b_m in {0, 1, fail} from the oracle; 3. remove all “failed” examples; 4. pass the remaining labeled examples (y_1, l_1), ..., (y_r, l_r) on to the learner.
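A sketch of the oracle and the example generator, with an illustrative failure slack parameter standing in for the slide's "tiny" threshold:

```python
import numpy as np

def threshold_oracle(F, x, t, n, eps0, slack, rng=None):
    """Answer 'is F(x) > t?' through Laplace noise, refusing ("fail") when
    the noisy value is too close to the threshold to call."""
    rng = np.random.default_rng() if rng is None else rng
    a = F(x) + rng.laplace(scale=1.0 / (eps0 * n))
    if abs(a - t) < slack:
        return "fail"                      # too close: fail rather than leak
    return 1 if a > t else 0

def generate_examples(F, xs, t, n, eps0, slack):
    # Label candidate points, drop the failures, and pass the rest to the
    # (non-private) learner; private labels make the hypothesis private.
    labeled = ((x, threshold_oracle(F, x, t, n, eps0, slack)) for x in xs)
    return [(x, b) for x, b in labeled if b != "fail"]
```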

Application: Boolean Conjunctions, an important class of queries in differential privacy [BCDKMT07, KRSU10, GHRU11, HMT12, ...]. Records are rows of True/False values over attributes such as: Salary > $50k | Syphilis | Height > 6’1 | Weight < 180 | Male. Universe U = {0,1}^d.

Informal Corollary (subexponential algorithm for conjunctions). There is a differentially private release algorithm with running time poly(|D|) such that for any distribution over Boolean conjunctions the algorithm is w.h.p. α-accurate provided that |D| ≥ 2^{Õ(√d)}. Previous: 2^{O(d)}.
Informal Corollary (small width). There is a differentially private release algorithm with running time poly(|D|) such that for any distribution over width-k Boolean conjunctions the algorithm is w.h.p. α-accurate provided that |D| ≥ d^{O(√k)}. Previous: d^{O(k)}.

Follow-up work. Thaler-Ullman-Vadhan12: the distributional relaxation can be removed, giving exp(O(d^{1/2})) complexity for all Boolean conjunctions. Idea: use the polynomial encodings from the learning algorithm directly.

Summary:
– Derived a simple and powerful private data release algorithm from first principles.
– The privacy/learning analogy serves as a guiding principle – and can be turned into an efficient reduction.
Can we use these ideas outside theory and in new settings?

Thank you

Open problems:
– Is PMW close to instance optimal?
– Is there a converse to the privacy-to-learning reduction?
– No barriers for cut/spectral analysis of graphs/matrices (the universe is small).
– Releasing k-way conjunctions in time poly(n) with error poly(d, k).