Privacy-preserving Prediction
Vitaly Feldman (Google Brain)
with Cynthia Dwork
Privacy-preserving learning
Input: dataset S = ((x₁, y₁), …, (xₙ, yₙ))
Goal: given x, predict y
A differentially private learning algorithm A guarantees that the output model h = A(S) is distributed nearly identically to A(S′) for every neighboring dataset S′
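For reference, the standard definition the slide appeals to: the trained model must be distributed nearly identically on neighboring datasets.

```latex
% \epsilon-differential privacy of the training algorithm A:
% for all neighboring datasets S, S' (differing in one example)
% and every set of models O,
\Pr[A(S) \in O] \le e^{\epsilon} \cdot \Pr[A(S') \in O]
```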
Trade-offs
Linear regression in ℝ^d: with ε-DP, requires a factor Ω(d/ε) more data [Bassily, Smith, Thakurta 14]
Learning a linear classifier over {0,1}^d: requires a factor Ω(d/ε) more data [Feldman, Xiao 13]
MNIST: accuracy ≈ 95% with small (ε, δ) vs. 99.8% without privacy [AbadiCGMMTZ 16]
Prediction
Users need predictions, not models
Fits many existing systems: prediction APIs; for many such applications it suffices to make the prediction interface differentially private
[Figure: users submit queries p₁, …, p_t ∈ X to a prediction API backed by S and receive predicted labels v₁, …, v_t]
Attacks
Black-box membership inference attacks achieve high accuracy [Shokri, Stronati, Song, Shmatikov 17; LongBWBWTGC 18; SalemZFHB 18]
Learning with DP prediction
Accuracy-privacy trade-off for answering a single prediction query
Differentially private prediction: M : (X×Y)ⁿ × X → Y is an ε-DP prediction algorithm if for every x ∈ X, the output M(S, x) is ε-differentially private with respect to S
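Written out, the definition quantifies over single queries: only the output label for the queried point is protected.

```latex
% \epsilon-DP prediction: for every x \in X, all neighboring
% datasets S, S', and every label y \in Y,
\Pr[M(S, x) = y] \le e^{\epsilon} \cdot \Pr[M(S', x) = y]
```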
Differentially private aggregation
Label aggregation [HCB 16; PAEGT 17; PSMRTE 18; BTT 18]:
Split S into k disjoint parts S₁, …, S_k of size m each (n = km)
Run a (non-DP) learning algorithm A on each part to obtain models h₁, …, h_k
On query x, aggregate the labels h₁(x), …, h_k(x) with a differentially private aggregator, e.g., the exponential mechanism: output y with probability ∝ e^{ε·|{i : h_i(x) = y}|/2} (see the sketch below)
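A minimal sketch of this subsample-and-aggregate scheme, assuming models are callables mapping a point to a label; the names `train_subsample_models`, `dp_aggregate_label`, and the learner `fit` are illustrative, not from the paper.

```python
import numpy as np

def train_subsample_models(X, y, k, fit):
    """Split the n = k*m examples into k disjoint parts and run the
    (non-DP) learner `fit` on each part, returning k models."""
    parts = np.array_split(np.arange(len(X)), k)
    return [fit(X[idx], y[idx]) for idx in parts]

def dp_aggregate_label(models, x, labels, epsilon, rng=None):
    """Exponential mechanism over candidate labels:
    Pr[output = y] is proportional to exp(epsilon * votes(y) / 2).
    One training example affects only one model, hence one vote,
    so answering a single query this way is epsilon-DP w.r.t. S."""
    if rng is None:
        rng = np.random.default_rng()
    votes = np.array([sum(1 for h in models if h(x) == y)
                      for y in labels], dtype=float)
    logits = epsilon * (votes - votes.max()) / 2.0  # stabilized softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return labels[rng.choice(len(labels), p=probs)]
```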
Classification via aggregation
PAC model: let C be a class of functions over X; for every distribution P over X×{0,1}, output h such that w.h.p. Pr_{(x,y)∼P}[h(x) ≠ y] ≤ Opt_P(C) + α

Excess error α:
                 Non-private         ε-DP prediction                              ε-DP model
Realizable case: Θ(VCdim(C)/n)       Θ(VCdim(C)/(εn))                             Θ(Rdim(C)/(εn))
Agnostic:        Θ(√(VCdim(C)/n))    O((VCdim(C)/(εn))^{1/3}) + Θ(√(VCdim(C)/n))  Θ(√(Rdim(C)/(εn)))

Representation dimension [Beimel, Nissim, Stemmer 13]:
VCdim(C) ≤ Rdim(C) ≤ VCdim(C)·log|X| [KLNRS 08]
For many classes, Rdim(C) = Ω(VCdim(C)·log|X|) [F., Xiao 13]
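To make the gap concrete, one instance (the thresholds class used on a later slide), where VCdim(C) = 1 while Rdim(C) = Θ(log m):

```latex
% Thresholds over X = \{1, \dots, m\}: VCdim(C) = 1 and
% Rdim(C) = \Theta(\log m), so in the realizable case
\alpha_{\text{DP model}} = \Theta\!\left(\frac{\log m}{\epsilon n}\right)
\qquad \text{vs.} \qquad
\alpha_{\text{DP prediction}} = \Theta\!\left(\frac{1}{\epsilon n}\right)
```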
Prediction stability
À la [Bousquet, Elisseeff 02]: A : (X×Y)ⁿ × X → ℝ is uniformly γ-stable if for every pair of neighboring S, S′ and every x ∈ X, |A(S, x) − A(S′, x)| ≤ γ
Convex regression: given F = {f(w, ·) : w ∈ K}, for P over X×Y minimize ℓ_P(w) = E_{(x,y)∼P}[ℓ(f(w, x), y)] over convex K ⊆ ℝ^d, where ℓ(f(w, x), y) is convex in w for all (x, y)

Excess loss for convex 1-Lipschitz regression over the ℓ₂ ball of radius 1:
Non-private: Θ(1/√n)    ε-DP prediction: O(1/√(εn))    ε-DP model: Ω(1/√n + √d/(εn))
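Uniform stability bounds the sensitivity of the prediction value itself, so a single real-valued query can be answered with the standard Laplace mechanism; a minimal sketch, where the stable learner `predict` and its stability parameter `gamma` are assumed inputs:

```python
import numpy as np

def dp_stable_prediction(predict, S, x, gamma, epsilon, rng=None):
    """epsilon-DP prediction from a uniformly gamma-stable regressor.

    Uniform stability gives |predict(S, x) - predict(S', x)| <= gamma
    for all neighboring S, S', so the prediction has l1-sensitivity
    gamma and Laplace noise of scale gamma / epsilon is epsilon-DP.
    """
    if rng is None:
        rng = np.random.default_rng()
    return predict(S, x) + rng.laplace(scale=gamma / epsilon)
```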
DP prediction implies generalization
Beyond aggregation: threshold functions on the line {1, …, m}

Excess error for agnostic learning:
Non-private: Θ(√(1/n))    ε-DP prediction: Θ(√(1/n) + 1/(εn))    ε-DP model: Θ(√(1/n) + log(m)/(εn))

More generally, DP prediction itself implies generalization.
Conclusions
Natural setting for learning with privacy
Better accuracy-privacy trade-offs
Paper (COLT 2018): https://arxiv.org/abs/1803.10266
Open problems:
- General agnostic learning
- Other general approaches
- Handling multiple queries [BTT 18]