Privacy-preserving Prediction


Privacy-preserving Prediction Vitaly Feldman (Google Brain), joint work with Cynthia Dwork

Privacy-preserving learning
Input: dataset S = ((x_1, y_1), …, (x_n, y_n))
Goal: given x, predict y
A differentially private learning algorithm A outputs a model h; the output distributions A(S) and A(S′) must be close for every pair of neighboring datasets S, S′.

Trade-offs
Linear regression in R^d: with ε-DP, needs a factor Ω(d/ε) more data [Bassily, Smith, Thakurta 14]
Learning a linear classifier over {0,1}^d: needs a factor Ω(d/ε) more data [Feldman, Xiao 13]
MNIST: accuracy ≈ 95% with small (ε, δ) vs. 99.8% without privacy [AbadiCGMMTZ 16]

Prediction
Users need predictions, not models
Fits many existing systems: a prediction API receives queries p_1, p_2, …, p_t ∈ X from users and returns predictions v_1, v_2, …, v_t
Since many existing applications already serve predictions through such an API, it is natural to ask for differential privacy at the prediction interface.

Attacks
Black-box membership inference attacks succeed with high accuracy [Shokri, Stronati, Song, Shmatikov 17; LongBWBWTGC 18; SalemZFHB 18]

Learning with DP prediction
Accuracy-privacy trade-off for a single prediction query
Definition: M : (X × Y)^n × X → Y is an ε-DP prediction algorithm if for every x ∈ X, the output M(S, x) is ε-differentially private with respect to S.
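The simplest instance of this definition (a baseline sketch, not the paper's algorithm): for binary labels, train any non-private model on S and answer the single query with randomized response. Whatever the model predicts, each label is output with probability between 1/(1+e^ε) and e^ε/(1+e^ε), so the output distributions on neighboring datasets differ by a factor of at most e^ε.

```python
import math
import random

def dp_predict(h, x, epsilon, rng=random.random):
    """Answer one prediction query with eps-DP via randomized response.

    h is any model (possibly trained non-privately on S); its binary
    prediction h(x) in {0, 1} is kept with probability e^eps/(1+e^eps)
    and flipped otherwise, so Pr[output = y] always lies in
    [1/(1+e^eps), e^eps/(1+e^eps)] regardless of the training data."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    y = h(x)
    return y if rng() < p_keep else 1 - y
```

With ε → ∞ this returns the model's label unchanged; with ε = 0 it is a fair coin — the accuracy-privacy trade-off in its crudest form.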

Differentially private aggregation
Label aggregation [HCB 16; PAEGT 17; PSMRTE 18; BTT 18]:
Split S (n = km) into k disjoint subsets S_1, …, S_k
Run a (non-DP) learning algorithm A on each subset to obtain models h_1, …, h_k
On query x, aggregate the labels h_1(x), …, h_k(x) with a differentially private mechanism, e.g. the exponential mechanism:
Pr[output = y] ∝ exp(ε · |{i : h_i(x) = y}| / 2)
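A minimal sketch of this subsample-and-aggregate scheme (function names and the `train` learner are illustrative assumptions, not from the slides):

```python
import math
import random
from collections import Counter

def exp_mech_label(votes, labels, epsilon, rng=random.random):
    """Exponential mechanism over labels: Pr[y] ~ exp(eps * c_y / 2),
    where c_y = |{i : h_i(x) = y}|. Changing one training example
    changes one teacher, hence one vote, so each count has sensitivity 1."""
    counts = Counter(votes)
    m = max(counts.get(y, 0) for y in labels)  # shift for numerical stability
    weights = [math.exp(epsilon * (counts.get(y, 0) - m) / 2) for y in labels]
    r = rng() * sum(weights)
    acc = 0.0
    for y, w in zip(labels, weights):
        acc += w
        if r <= acc:
            return y
    return labels[-1]

def private_predict(S, x, k, train, epsilon, labels, rng=random.random):
    """Split S into k disjoint parts, train a (non-DP) model on each,
    and privately aggregate the k votes on the query point x."""
    part = len(S) // k
    models = [train(S[i * part:(i + 1) * part]) for i in range(k)]
    return exp_mech_label([h(x) for h in models], labels, epsilon, rng)
```

For large ε this behaves like a plain majority vote; the point of the aggregation schemes cited above is that the per-prediction output is differentially private even though the individual models h_i are not.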

Classification via aggregation
PAC model: let C be a class of functions over X. For every distribution P over X × {0,1}, output h such that w.h.p.
Pr_{(x,y)∼P}[h(x) ≠ y] ≤ Opt_P(C) + α

Excess error α:
Realizable: non-private Θ(VCdim(C)/n); ε-DP prediction Θ(VCdim(C)/(εn)); ε-DP model Θ(Rdim(C)/(εn))
Agnostic: non-private Θ(√(VCdim(C)/n)); ε-DP prediction O((VCdim(C)/(εn))^{1/3}) + √(VCdim(C)/n); ε-DP model Θ(√(Rdim(C)/(εn)))

Rdim is the representation dimension [Beimel, Nissim, Stemmer 13]:
VCdim(C) ≤ Rdim(C) ≤ VCdim(C) · log|X| [KLNRS 08]
For many classes Rdim(C) = Ω(VCdim(C) · log|X|) [Feldman, Xiao 13]

Prediction stability
À la [Bousquet, Elisseeff 02]: A : (X × Y)^n × X → R is a uniformly γ-stable algorithm if for every pair of neighboring datasets S, S′ and every x ∈ X: |A(S, x) − A(S′, x)| ≤ γ
Convex regression: given F = {f(w, x) : w ∈ K}, for P over X × Y minimize ℓ_P(w) = E_{(x,y)∼P}[ℓ(f(w, x), y)] over convex K ⊆ R^d, where ℓ(f(w, x), y) is convex in w for all x, y
Excess loss for convex 1-Lipschitz regression over the ℓ_2 ball of radius 1:
non-private Θ(1/√n); ε-DP prediction O(1/√(εn)); ε-DP model Ω(1/√n + √d/(εn))
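A toy illustration of uniform prediction stability (my example, not the slides'): one-dimensional ridge regression has a closed form, and the regularizer bounds how far replacing one example can move the prediction on a fixed query point, in the spirit of [Bousquet, Elisseeff 02].

```python
def ridge_1d(S, lam):
    """Closed-form minimizer of (1/n) * sum (w*x - y)^2 + lam * w^2
    over pairs (x, y) in S:  w = sum(x*y) / (sum(x^2) + lam * n)."""
    n = len(S)
    sxy = sum(x * y for x, y in S)
    sxx = sum(x * x for x, _ in S)
    return sxy / (sxx + lam * n)

def predict(S, x, lam=1.0):
    """Prediction algorithm A(S, x) = w_S * x."""
    return ridge_1d(S, lam) * x

# Replace one example (neighboring datasets) and compare predictions
# on the same query point; the gap plays the role of gamma.
S = [(0.5, 1.0), (-0.3, 0.2), (0.8, -0.5), (0.1, 0.9)]
S_prime = S[:-1] + [(0.7, -1.0)]
gap = abs(predict(S, 0.6) - predict(S_prime, 0.6))
```

For strongly convex objectives like this one, [Bousquet, Elisseeff 02] bound γ by O(1/(λn)); ε-DP prediction strengthens this pointwise stability to a guarantee on the whole output distribution.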

DP prediction implies generalization
Beyond aggregation: threshold functions on the line {1, …, m}
Excess error for agnostic learning:
non-private Θ(1/√n); ε-DP prediction Θ(1/√n + 1/(εn)); ε-DP model Θ(1/√n + log m / (εn))

Conclusions
Natural setting for learning with privacy
Better accuracy-privacy trade-off than releasing a DP model
Paper (COLT 2018): https://arxiv.org/abs/1803.10266
Open problems:
General agnostic learning
Other general approaches
Handling multiple queries [BTT 18]