Raef Bassily Computer Science & Engineering Pennsylvania State University New Tools for Privacy-Preserving Statistical Analysis IBM Research Almaden February 23, 2015.




Similar presentations
1+eps-Approximate Sparse Recovery Eric Price MIT David Woodruff IBM Almaden.

The Average Case Complexity of Counting Distinct Elements David Woodruff IBM Almaden.
Statistical Machine Learning- The Basic Approach and Current Research Challenges Shai Ben-David CS497 February, 2007.
I have a DREAM! (DiffeRentially privatE smArt Metering) Gergely Acs and Claude Castelluccia, INRIA 2011.
Shortest Vector In A Lattice is NP-Hard to approximate
Raef Bassily Penn State Local, Private, Efficient Protocols for Succinct Histograms Based on joint work with Adam Smith (Penn State) (To appear in STOC.
Foundations of Cryptography Lecture 10 Lecturer: Moni Naor.
Introduction to Histograms Presented By: Laukik Chitnis
Support Vector Machines and Kernels Adapted from slides by Tim Oates Cognition, Robotics, and Learning (CORAL) Lab University of Maryland Baltimore County.
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
Raef Bassily Adam Smith Abhradeep Thakurta Penn State Yahoo! Labs Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds Penn.
FilterBoost: Regression and Classification on Large Datasets Joseph K. Bradley 1 and Robert E. Schapire 2 1 Carnegie Mellon University 2 Princeton University.
Seminar in Foundations of Privacy 1.Adding Consistency to Differential Privacy 2.Attacks on Anonymized Social Networks Inbal Talgam March 2008.
Models and Security Requirements for IDS. Overview The system and attack model Security requirements for IDS –Sensitivity –Detection Analysis methodology.
Simulation Where real stuff starts. ToC 1.What, transience, stationarity 2.How, discrete event, recurrence 3.Accuracy of output 4.Monte Carlo 5.Random.
Privacy without Noise Yitao Duan NetEase Youdao R&D Beijing China CIKM 2009.
Locally Decodable Codes Uri Nadav. Contents What is Locally Decodable Code (LDC) ? Constructions Lower Bounds Reduction from Private Information Retrieval.
Foundations of Privacy Lecture 11 Lecturer: Moni Naor.
Collecting Correlated Information from a Sensor Network Micah Adler University of Massachusetts, Amherst.
Part I: Classification and Bayesian Learning
. PGM 2002/3 – Tirgul6 Approximate Inference: Sampling.
Finding Almost-Perfect
Foundations of Cryptography Lecture 2 Lecturer: Moni Naor.
Mathematical Programming in Support Vector Machines
Private Analysis of Graphs
New Protocols for Remote File Synchronization Based on Erasure Codes Utku Irmak Svilen Mihaylov Torsten Suel Polytechnic University.
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
1 Privacy-Preserving Distributed Information Sharing Nan Zhang and Wei Zhao Texas A&M University, USA.
Data mining and machine learning A brief introduction.
Adaptive CSMA under the SINR Model: Fast convergence using the Bethe Approximation Krishna Jagannathan IIT Madras (Joint work with) Peruru Subrahmanya.
Trust-Aware Optimal Crowdsourcing With Budget Constraint Xiangyang Liu 1, He He 2, and John S. Baras 1 1 Institute for Systems Research and Department.
Hardness of Learning Halfspaces with Noise Prasad Raghavendra Advisor Venkatesan Guruswami.
Universität Dortmund, LS VIII
Privacy of Correlated Data & Relaxations of Differential Privacy CompSci Instructor: Ashwin Machanavajjhala 1Lecture 16: Fall 12.
Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.
Raef Bassily Computer Science & Engineering Pennsylvania State University New Tools for Privacy-Preserving Statistical Analysis Yahoo! Labs Sunnyvale February.
Foundations of Privacy Lecture 5 Lecturer: Moni Naor.
Amplification and Derandomization Without Slowdown Dana Moshkovitz MIT Joint work with Ofer Grossman (MIT)
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
1  The Problem: Consider a two class task with ω 1, ω 2   LINEAR CLASSIFIERS.
1  Problem: Consider a two class task with ω 1, ω 2   LINEAR CLASSIFIERS.
Massive Data Sets and Information Theory Ziv Bar-Yossef Department of Electrical Engineering Technion.
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International.
Probabilistic km-anonymity (Efficient Anonymization of Large Set-valued Datasets) Gergely Acs (INRIA) Jagdish Achara (INRIA)
1 Introduction to Quantum Information Processing CS 467 / CS 667 Phys 467 / Phys 767 C&O 481 / C&O 681 Richard Cleve DC 3524 Course.
A Brief Maximum Entropy Tutorial Presenter: Davidson Date: 2009/02/04 Original Author: Adam Berger, 1996/07/05
Yang, et al. Differentially Private Data Publication and Analysis. Tutorial at SIGMOD’12 Part 4: Data Dependent Query Processing Methods Yin “David” Yang.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
The Message Passing Communication Model David Woodruff IBM Almaden.
Learning with General Similarity Functions Maria-Florina Balcan.
Approximation Algorithms based on linear programming.
Page 1 CS 546 Machine Learning in NLP Review 2: Loss minimization, SVM and Logistic Regression Dan Roth Department of Computer Science University of Illinois.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
University of Texas at El Paso
Data Transformation: Normalization
MIRA, SVM, k-NN Lirong Xia.
Boosting and Additive Trees (2)
Understanding Generalization in Adaptive Data Analysis
Generalization and adaptivity in stochastic convex optimization
Privacy and Fault-Tolerance in Distributed Optimization Nitin Vaidya University of Illinois at Urbana-Champaign.
Machine Learning Basics
Privacy-Preserving Classification
Differential Privacy in Practice
Differential Privacy in the Local Setting
Vitaly (the West Coast) Feldman
Current Developments in Differential Privacy
Privacy-preserving Prediction
Generalization bounds for uniformly stable algorithms
Presentation transcript:

Raef Bassily Computer Science & Engineering Pennsylvania State University New Tools for Privacy-Preserving Statistical Analysis IBM Research Almaden February 23, 2015

Privacy in Statistical Databases
[Figure: individuals contribute records x_1, ..., x_n to a curator A; users (government, researchers, businesses, or a malicious adversary) send queries and receive answers.]
Two conflicting goals: utility vs. privacy. Balancing these goals is tricky:
- No control over external sources of information (the internet, social networks, anonymized datasets).
- Ad-hoc anonymization schemes are unreliable: [Narayanan-Shmatikov'08], [Korolova'11], [Calendrino et al.'12], ...
Need algorithms with robust, provable privacy guarantees.

This work
Gives efficient algorithms for statistical data analyses with optimal accuracy under rigorous, provable privacy guarantees.

Differential privacy [DMNS'06, DKMMN'06]
- Datasets x and x′ are called neighbors if they differ in one record.
- Require: neighboring datasets induce close distributions on outputs.
- Def.: A randomized algorithm A (using its own local random coins) is (ε, δ)-differentially private if, for all neighbor data sets x, x′ and for all events S, Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x′) ∈ S] + δ.
- "Almost the same" conclusions will be reached from the output regardless of whether any individual opts into or opts out of the data set.
- Think of ε as a small constant and δ as negligible in n.
- Worst-case definition: DP gives the same guarantee regardless of the side information of the attacker.
- Two regimes: ε-differential privacy (δ = 0), and (ε, δ)-differential privacy (δ > 0).
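As a concrete textbook illustration of the definition (not from the slides): the Laplace mechanism for a counting query satisfies ε-differential privacy. A minimal Python sketch, assuming each record is a 0/1 value:

```python
import numpy as np

def laplace_count(records, epsilon):
    """epsilon-DP release of a count of 0/1 records.

    Changing one record changes the true count by at most 1
    (sensitivity 1), so adding Laplace noise of scale 1/epsilon
    satisfies epsilon-differential privacy.
    """
    true_count = float(np.sum(records))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: 1000 users, about 30% of whom hold some sensitive attribute.
records = np.random.binomial(1, 0.3, size=1000)
print(laplace_count(records, epsilon=0.5))
```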

Two models for private data analysis
- Centralized model: individuals hand their records x_1, ..., x_n to a trusted curator running an algorithm A; A is differentially private w.r.t. datasets of size n.
- Local model: the curator is untrusted; each individual i applies a local randomizer Q_i to their own record x_i and sends only the report y_i = Q_i(x_i); each Q_i is differentially private w.r.t. datasets of size 1.

This talk
1. Differentially private algorithms for: Convex Empirical Risk Minimization in the centralized model; Estimating Succinct Histograms in the local model.
2. Generic framework for relaxing Differential Privacy.

This talk
1. Differentially private algorithms for: Convex Empirical Risk Minimization in the centralized model; Estimating Succinct Histograms in the local model.
2. Generic framework for relaxing Differential Privacy.

Example of Convex ERM: Support Vector Machines
- Goal: classify data points of different "types", i.e., find a hyperplane separating two different "types" of data points (e.g., tested +ve vs. tested -ve).
- Many applications, e.g., medical studies: disease classification based on protein structures.
- The coefficients of the hyperplane are the solution of a convex optimization problem defined by the data set; the solution is given by a linear combination of only a few data points, called support vectors.

Convex empirical risk minimization
- Dataset D = (x_1, ..., x_n).
- Convex constraint set C.
- Loss function L(θ; D) = (1/n) Σ_{i=1..n} ℓ(θ; x_i), where ℓ(·; x) is convex for every x.
- Goal: find a "parameter" θ in C that minimizes L(θ; D).
- Actual minimizer: θ* = argmin over θ in C of L(θ; D).
- Output θ̂ in C such that the excess risk L(θ̂; D) − L(θ*; D) is small.
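Written out, a worked instance of this setup, using the SVM hinge loss as the example loss (the averaged form of L and the notation below are my choices, not read off the slide):

```latex
% ERM objective and excess risk, instantiated with the SVM hinge loss.
% Dataset D = (x_1, ..., x_n); each record x_i = (a_i, y_i), a_i in R^d, y_i in {-1, +1}.
\[
  L(\theta; D) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; x_i),
  \qquad
  \ell\bigl(\theta; (a, y)\bigr) = \max\bigl(0,\, 1 - y\,\langle \theta, a \rangle\bigr),
\]
\[
  \theta^{*} = \operatorname*{argmin}_{\theta \in \mathcal{C}} L(\theta; D),
  \qquad
  \text{excess risk of an output } \hat{\theta} \in \mathcal{C}:\;
  L(\hat{\theta}; D) - L(\theta^{*}; D).
\]
```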

Other examples
- Median
- Linear regression

Why is privacy hard to maintain in ERM?
- Dual form of SVM: the solution typically contains a subset of the exact data points in the clear.
- Median: the minimizer is always a data point.

Private convex ERM [Chaudhuri-Monteleoni '08, Chaudhuri-Monteleoni-Sarwate '11]
Studied by [Chaudhuri et al. '11, Rubinstein et al. '11, Kifer-Smith-Thakurta '12, Smith-Thakurta '13, ...].
Setting: an algorithm A (with its own random coins) takes the dataset, the convex set C, and the loss ℓ, and outputs a parameter in C.
Privacy: A is differentially private in its input dataset.
Utility is measured by the (worst-case) expected excess risk.

Best previous work [Chaudhuri et al. '11, Kifer et al. '12] addresses a special case (smooth loss functions); applying it to many problems (e.g., SVM, median, ...) introduces large additional error.
Contributions [B, Smith, Thakurta '14]: this work improves on the best previous excess risk bounds.
1. New algorithms with optimal excess risk, assuming only that the loss function is Lipschitz and the parameter set C is bounded. (Separate set of algorithms for strongly convex loss.)
2. Matching lower bounds.

Results (dataset size n; normalized bounds: the loss is 1-Lipschitz on a parameter set C of diameter 1)
- ε-DP: optimal excess risk, via exponential sampling (inspired by [McSherry-Talwar'07]).
- (ε, δ)-DP: optimal excess risk, via noisy stochastic gradient descent (rigorous analysis of & improvements to [McSherry-Williams'10], [Jain-Kothari-Thakurta'12], and [Chaudhuri-Sarwate-Song'13]).
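For reference, the tight bounds from the [B, Smith, Thakurta '14] paper in this normalized setting are, up to logarithmic factors (stated here from the paper, with d the dimension of the parameter space):

```latex
% Tight excess-risk bounds (up to logarithmic factors) for 1-Lipschitz convex
% loss over a parameter set C of diameter 1 in R^d, dataset size n.
\[
  \varepsilon\text{-DP:}\quad
  \tilde{\Theta}\!\left(\frac{d}{\varepsilon n}\right),
  \qquad\qquad
  (\varepsilon,\delta)\text{-DP:}\quad
  \tilde{\Theta}\!\left(\frac{\sqrt{d}}{\varepsilon n}\right).
\]
```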

Exponential sampling
- Define a probability distribution over C with density proportional to exp(−(ε/Δ) · L(θ; D)) (for the appropriate sensitivity scaling Δ), and output a sample from C according to this distribution.
- An instance of the exponential mechanism [McSherry-Talwar'07].
- Efficient construction based on rapidly mixing MCMC: uses [Applegate-Kannan'91] as a subroutine; provides a purely multiplicative convergence guarantee; does not follow directly from existing results.
- Tight utility analysis via a "peeling" argument: exploits the structure of convex functions (the level sets A_1, A_2, ... of the loss are decreasing in volume) and shows that, with high probability, the sampled point has small excess risk.
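A toy Python sketch of this exponential-sampling step, discretizing C to a grid instead of using the paper's MCMC sampler (the grid, the example loss, and the sensitivity value are illustrative assumptions of mine, not the paper's construction):

```python
import numpy as np

def exponential_mechanism_erm(data, loss, theta_grid, epsilon, sensitivity):
    """Toy exponential-mechanism sampler over a finite grid of parameters.

    The paper samples from a continuous log-concave density with a rapidly
    mixing MCMC; here we just discretize C for illustration.  `sensitivity`
    bounds how much the empirical loss can change when one record changes.
    """
    # Empirical loss at every grid point.
    scores = np.array([np.mean([loss(theta, x) for x in data]) for theta in theta_grid])
    # Density proportional to exp(-eps * L(theta; D) / (2 * sensitivity)).
    logits = -epsilon * scores / (2 * sensitivity)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return theta_grid[np.random.choice(len(theta_grid), p=probs)]

# Example: privately estimate a 1-D median via the absolute-deviation loss.
data = np.random.normal(0.5, 0.2, size=500).clip(0, 1)
grid = np.linspace(0, 1, 201)
theta_hat = exponential_mechanism_erm(
    data, loss=lambda t, x: abs(t - x), theta_grid=grid,
    epsilon=1.0, sensitivity=1.0 / len(data))
print(theta_hat)
```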

Noisy stochastic gradient descent
- Run SGD with noisy gradient queries for sufficiently many iterations.
Our contributions:
- Tight privacy analysis.
- Stochastic sampling of the data ⇒ privacy amplification.
- Running SGD for many iterations (T = n² iterations) ⇒ optimal excess risk.
Remarks: the stochastic part is only for efficiency. Empirically, [CSS'13] showed that few iterations are enough in some cases.
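A minimal Python sketch of the noisy-SGD idea (projected gradient steps with Gaussian noise, averaged over T = n² iterations). The noise scale and step size below are only illustrative placeholders; the paper calibrates them precisely, via privacy amplification by sampling and composition, to get (ε, δ)-DP with optimal excess risk:

```python
import numpy as np

def noisy_sgd(data, grad, dim, epsilon, delta, lipschitz=1.0, radius=1.0):
    """Simplified noisy projected SGD for private convex ERM.

    Each of T = n^2 steps perturbs the gradient of the loss at one randomly
    sampled record with Gaussian noise, takes a step, and projects back onto
    the ball of radius `radius`; the average iterate is returned.  The noise
    scale is an illustrative stand-in for the paper's calibrated value.
    """
    n = len(data)
    T = n * n
    sigma = lipschitz * np.sqrt(T * np.log(1.0 / delta)) / (n * epsilon)  # illustrative
    theta = np.zeros(dim)
    avg = np.zeros(dim)
    for t in range(1, T + 1):
        x = data[np.random.randint(n)]                           # stochastic sample
        g = grad(theta, x) + np.random.normal(0.0, sigma, dim)   # noisy gradient
        theta = theta - (radius / (sigma * np.sqrt(t) + 1e-12)) * g  # step ~ 1/sqrt(t)
        norm = np.linalg.norm(theta)
        if norm > radius:                                        # project onto C
            theta *= radius / norm
        avg += theta / T
    return avg

# Example: private 1-D median via the absolute-deviation loss |theta - x|.
data = np.random.normal(0.4, 0.1, size=300).clip(-1, 1)
theta_hat = noisy_sgd(data, grad=lambda th, x: np.sign(th - x),
                      dim=1, epsilon=1.0, delta=1e-6)
print(theta_hat)   # lands roughly near the sample median (~0.4)
```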

Generalization error
- For a distribution P over the data, the generalization error at θ is the expected loss of θ on a fresh sample from P.
- Bounds on the generalization error hold for the output of any (ε, δ)-DP algorithm, and there is an (ε, δ)-DP algorithm attaining them; for generalized linear models the resulting bound is optimal.

This talk
1. Differentially private algorithms for: Convex Empirical Risk Minimization in the centralized model; Estimating Succinct Histograms in the local model.
2. Generic framework for relaxing Differential Privacy.

A conundrum
[Figure: users hold items such as Finance.com, Fashion.com, WeirdStuff.com; the server wants to know, e.g., "How many users like Business.com?"]
How can the server compute aggregate statistics about users without storing user-specific information?

Succinct histograms
- Set of items (e.g., websites) = [d] = {1, ..., d}; set of users = [n]; the server is untrusted.
- The frequency of an item a is f(a) = (♯ users holding a)/n.
- Goal: produce a succinct histogram, i.e., a list of frequent items ("heavy hitters") and estimates of their frequencies, while providing rigorous privacy guarantees to the users.
- A succinct histogram is a short list of (item, estimated frequency) pairs; every item not on the list is implicitly estimated to have frequency 0.

Local model of differential privacy
- Algorithm Q is ε-local differentially private (LDP) if for any pair of values v, v′ ∈ [d] and for all events S, Pr[Q(v) ∈ S] ≤ e^ε · Pr[Q(v′) ∈ S].
- Here v_i is the item of user i, and z_i = Q_i(v_i) is the differentially private report of user i; the server aggregates z_1, ..., z_n into a succinct histogram.
- LDP protocols for frequency estimation are used in the Chrome web browser (RAPPOR) [Erlingsson-Korolova-Pihur'14] and serve as a basis for other estimation tasks [Dwork-Nissim'04].
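The textbook example of an ε-LDP randomizer is randomized response. A minimal Python sketch (for a single bit per user; this is not the heavy-hitters protocol of the talk, just an illustration of the local model):

```python
import numpy as np

def randomized_response(bit, epsilon):
    """Textbook epsilon-LDP randomizer for a single private bit.

    The true bit is reported with probability e^eps / (e^eps + 1) and
    flipped otherwise, so any two inputs induce output distributions
    within a multiplicative factor e^eps of each other.
    """
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if np.random.rand() < p_truth else 1 - bit

def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the fraction of users whose bit is 1."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (np.mean(reports) - (1 - p)) / (2 * p - 1)

# Example: 10,000 users, 30% hold the item, epsilon = 1.
bits = np.random.binomial(1, 0.3, size=10_000)
reports = [randomized_response(b, epsilon=1.0) for b in bits]
print(estimate_frequency(reports, epsilon=1.0))   # close to 0.3
```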

Performance measures
- Error is measured by the worst-case estimation error, i.e., the largest gap between an item's estimated and true frequency.
- A protocol is efficient if it runs in time poly(log(d), n).
- Communication complexity is measured by the number of bits transmitted per user.
- Note that d is very large, e.g., the number of all possible URLs; log(d) = the number of bits needed to describe a single URL.

Contributions [B, Smith '15]
1. An efficient ε-LDP protocol with optimal error: it runs in time poly(log(d), n) and estimates all frequencies up to the optimal error.
2. A matching lower bound on the error.
3. A generic transformation reducing the communication complexity to 1 bit per user.
Previous protocols either ran in time polynomial in d rather than log(d) (too slow) [Mishra-Sandler'06, Hsu-Khanna-Roth'12, EKP'14] or had larger error (too much error) [HKR'12]; the best previous lower bound on the error was weaker.

Design paradigm
- Reduction from a simpler problem with a unique heavy hitter (the UHH problem).
- UHH: at least a given fraction of users hold the same item, while the rest hold no item.
- An efficient protocol with optimal error for UHH ⇒ an efficient protocol with optimal error for the general problem.

Construction for the UHH problem
- Each user holds either v* or "no item"; v* is unknown to the server. Goal: find v* and estimate f(v*).
- Pipeline (similar to [Duchi et al.'13]): each user encodes v* with an error-correcting code, passes the codeword through a noising operator (the local randomizer), and sends the result z_i; the server aggregates the reports, rounds, and decodes.
- Key idea: the aggregate of the reports concentrates around f(v*) times the codeword of v*, so the frequency f(v*) relative to the added noise acts as the signal-to-noise ratio; decoding succeeds when f(v*) is sufficiently large compared to the noise level.
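A self-contained toy simulation of this encode → noise → aggregate → decode pipeline, under simplifying assumptions of mine: the code is a random ±1 matrix rather than a true error-correcting code, and each user reports a single coordinate via randomized response. With the illustrative numbers below, the correlation score of the planted item stands out from the noise floor, which is exactly the signal-to-noise condition described above.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, n, eps = 1000, 256, 50_000, 2.0   # items, code length, users, privacy
p = np.exp(eps) / (np.exp(eps) + 1.0)   # randomized-response "keep" probability

# Public random code: one +/-1 codeword per item (shared with the server).
code = rng.choice([-1.0, 1.0], size=(d, m))

# Data: a 10% heavy hitter (item 7); everyone else holds "no item" (None).
items = [7 if rng.random() < 0.10 else None for _ in range(n)]

def local_report(item):
    """Report one random codeword coordinate via randomized response."""
    j = rng.integers(m)
    true_sign = code[item, j] if item is not None else rng.choice([-1.0, 1.0])
    sign = true_sign if rng.random() < p else -true_sign
    return j, sign

reports = [local_report(v) for v in items]

# Server: debias, average per coordinate, then decode against the code.
sums = np.zeros(m); counts = np.zeros(m)
for j, sign in reports:
    sums[j] += sign / (2 * p - 1)        # unbiased estimate of the true sign
    counts[j] += 1
y_hat = sums / np.maximum(counts, 1)     # y_hat ~ f(v*) * code[v*]
scores = code @ y_hat / m                # correlation with every codeword
v_hat = int(np.argmax(scores))
print(v_hat, round(float(scores[v_hat]), 3))   # recovers item 7, frequency ~0.10
```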

Construction for the general setting
- Key insight: decompose the general scenario into multiple instances of UHH via hashing, and run K parallel copies of the UHH protocol on these instances (each user participates in the copy that its item hashes to).
- This guarantees that, w.h.p., every heavy hitter (every item whose frequency is above the threshold) is allocated a "collision-free" copy of the UHH protocol.
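A quick sanity check of the hashing step (the number of copies K and the number of heavy hitters below are arbitrary illustrative values): with enough copies, a random hash separates the heavy hitters into distinct UHH instances with high probability.

```python
import random

def collision_free(num_heavy, K, trials=10_000):
    """Empirical probability that a random hash gives every heavy hitter its own bucket."""
    ok = 0
    for _ in range(trials):
        buckets = [random.randrange(K) for _ in range(num_heavy)]
        ok += len(set(buckets)) == num_heavy
    return ok / trials

print(collision_free(num_heavy=5, K=200))   # ~0.95 with these illustrative numbers
```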

Recap: construction of succinct histograms
Efficient private protocol for a unique heavy hitter (UHH) ⇒ efficient private protocol for estimating all heavy hitters: time poly(log(d), n), and all frequencies are estimated up to the optimal error.

Transforming to a protocol with 1-bit reports
- Generate a public random string s_i for each user i.
- User i sends a single biased bit B_i = Gen(Q_i, v_i, s_i), chosen so that, conditioned on B_i = 1, the public string s_i has the same distribution as the output of the local randomizer Q_i on v_i.
- Server: IF B_i = 1, THEN treat s_i as the report of user i; ELSE ignore user i.
- This transformation works for any local protocol, not only heavy hitters. Key idea: what matters is only the distribution of the output of each local randomizer.
- The public string does not depend on private data, so it can be generated by the untrusted server.
- For our heavy-hitters protocol, this transformation gives essentially the same error and computational efficiency (Gen can be computed in time O(log(d))).
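A minimal sketch of the idea behind Gen, instantiated (my choice, not the paper's construction) with binary randomized response as the local randomizer: the public string plays the role of a proposed output, and the user's single bit is an accept/reject decision calibrated so that accepted strings are distributed exactly like the randomizer's output.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 1.0
p = np.exp(eps) / (np.exp(eps) + 1.0)     # randomized-response keep-probability

def q_prob(v, s):
    """Pr[Q(v) = s] for the binary randomized-response randomizer Q."""
    return p if s == v else 1.0 - p

def one_bit_report(v, s):
    """Gen(Q, v, s): send B = 1 with probability Pr[Q(v) = s] / (C * mu(s)).

    Here the public string s is uniform over {0, 1} (mu(s) = 1/2) and
    C = 2p, so the acceptance probability is at most 1.  Conditioned on
    B = 1, the public string s is distributed exactly like Q(v).
    """
    accept_prob = q_prob(v, s) / (2 * p * 0.5)
    return rng.random() < accept_prob

# Sanity check: among users holding the same v, the public strings of the
# accepted users should look like fresh samples of Q(v).
v = 1
samples = []
for _ in range(200_000):
    s = int(rng.integers(2))              # public random string (1 bit here)
    if one_bit_report(v, s):
        samples.append(s)
print(np.mean(samples))                   # close to p (~0.73 for eps = 1)
```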

This talk
1. Differentially private algorithms for: Convex Empirical Risk Minimization in the centralized model; Estimating Succinct Histograms in the local model.
2. Generic framework for relaxing Differential Privacy.

Attacker’s side information A A queries answers ) ( Curator x1x1 xixi xnxn.... Attacker internet social networks anonymized datasets.... Attacker’s side information is the main reason privacy is hard.

Attacker’s side information A A queries answers ) ( Curator x1x1 xixi xnxn.... Omniscient attacker.... everything except x i Differential privacy is robust against arbitrary side information. Attackers typically have limited knowledge. Contributions [B, Groce, Katz, Smith’13]: Rigorous framework for formalizing and exploiting limited adversarial information: coupled-worlds privacy Algorithms with higher accuracy than is possible under differential privacy Contributions [B, Groce, Katz, Smith’13]: Rigorous framework for formalizing and exploiting limited adversarial information: coupled-worlds privacy Algorithms with higher accuracy than is possible under differential privacy

Exploiting attacker’s uncertainty [BGKS’13] A A queries answers ) ( Curator x1x1 xixi xnxn.... Attacker.... Side info in Δ for any side information in Δ, Given some restricted class of attacker’s knowledge Δ, the output of A must “look the same” to the attacker regardless of whether any single individual is in or out of the computation.

Distributional Differential Privacy [BGKS'13]
- A (with its own local random coins) is (ε, δ)-DDP if, for any distribution in Δ on the data set, for any index i, for any value v of a data entry, and for any event S, the distribution of the output is (ε, δ)-close whether or not record i is included in the computation (conditioned on that record taking the value v).
- This implies: for any distribution in Δ and for all i, with probability at least 1 − δ, almost the same inferences will be made about Alice whether or not Alice's data is present in the data set.

What can we release exactly and privately? Under modest distributional assumptions, we can release several exact statistics while satisfying DDP:
- Sums, whenever the data distribution has a small uniform component.
- Histograms, constructed from a random sample from the population.
- Stable functions, i.e., functions for which there is only a small probability that the output changes when any single entry of the dataset changes.

Conclusions
- Privacy is a pressing concern in "Big Data", but hard to define intuitively.
- Differential privacy is a sound, rigorous approach: robust against arbitrary side information.
This work:
- the first efficient differentially private algorithms with optimal accuracy guarantees for essential tasks in statistical data analysis;
- a generic definitional framework for privacy, relaxing DP.