Exploiting Unintended Feature Leakage in Collaborative Learning
Luca Melis∗, UCL, luca.melis.14@alumni.ucl.ac.uk
Congzheng Song∗, Cornell University, cs2296@cornell.edu
Emiliano De Cristofaro, UCL & Alan Turing Institute, e.decristofaro@ucl.ac.uk
Vitaly Shmatikov, Cornell Tech, shmat@cs.cornell.edu
Collaborative machine learning
- Participants 1, 2, 3 each hold their own dataset (Dataset 1, 2, 3) and train their own model (Model 1, 2, 3)
- Participants periodically exchange model parameters
- Training data never leave participants’ machines
Collaborative machine learning: synchronized gradient updates
Collaborative machine learning: federated learning with model averaging
Key idea
- Any useful ML model reveals something about the population from which the training data was drawn
- Goal here: infer “unintended” features that hold only for certain subsets of the training data
Inferences
- Membership Inference
- Passive Property Inference
- Active Property Inference
Threat Model
- K participants (1 adversary, 1 target)
- Algorithm 1 (synchronized gradient updates)
  - K = 2: the adversary observes gradient updates computed on a single batch of the target’s data
  - K > 2: the adversary observes an aggregation of gradient updates from all other participants
- Algorithm 2 (model averaging): the adversary observes the result of two-step aggregation
  (1) every participant aggregates the gradients computed on each local batch
  (2) the server aggregates the updates from all participants
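The K = 2 versus K > 2 distinction above can be illustrated with a minimal sketch. It assumes the server simply sums the submitted updates and that the adversary subtracts its own contribution from what it sees; the function names, the vector size, and the random updates are hypothetical and not taken from the slides.

```python
import numpy as np

def server_aggregate(updates):
    """Sum the gradient updates submitted by all participants in one round."""
    return np.sum(updates, axis=0)

def seen_by_adversary(aggregate, own_update):
    """Remove the adversary's own update from the aggregate it observes."""
    return aggregate - own_update

rng = np.random.default_rng(0)
dim = 4  # toy number of model parameters (assumption)
adversary = rng.normal(size=dim)
target = rng.normal(size=dim)
third = rng.normal(size=dim)

# K = 2: what remains is exactly the target's update.
seen_k2 = seen_by_adversary(server_aggregate([adversary, target]), adversary)

# K = 3: what remains mixes the target's update with the third participant's.
seen_k3 = seen_by_adversary(server_aggregate([adversary, target, third]), adversary)
```

With more participants the target's signal is diluted in the sum, which is why the multi-party setting is treated separately from the two-party one.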
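Threat Model - Embedding layer
- Non-numeric, discrete, sparse inputs are mapped to a low-dimensional vector representation
- The embedding matrix is treated as a model parameter
- Its gradient is sparse: only the rows for inputs that appear in the batch are non-zero
- The adversary infers information from the non-zero gradient rows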
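A minimal NumPy sketch of why the embedding gradient is sparse, and what its non-zero rows give away. The vocabulary size, embedding width, word ids, and the random upstream gradient are made-up values for illustration only.

```python
import numpy as np

vocab_size, embed_dim = 8, 3          # toy sizes (assumption)
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embed_dim))  # embedding matrix, a model parameter

def embedding_grad(batch_ids, upstream):
    """Gradient of the loss w.r.t. E for the lookup E[batch_ids].

    Each looked-up row receives the upstream gradient of its position;
    rows for ids that never appear in the batch stay exactly zero.
    """
    grad = np.zeros_like(E)
    np.add.at(grad, batch_ids, upstream)  # scatter-add, handles repeated ids
    return grad

batch_ids = np.array([1, 5, 5, 6])    # word ids in the target's batch
upstream = rng.normal(size=(len(batch_ids), embed_dim))
grad = embedding_grad(batch_ids, upstream)

# An observer of the gradient recovers which ids were used in the batch.
leaked_ids = np.flatnonzero(np.any(grad != 0, axis=1))
```

Here `leaked_ids` holds the distinct ids 1, 5 and 6: the gradient reveals which words occurred, but not their order or how often they occurred.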
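Membership Inference
- Interpretation: given a data record and a model, was the record IN the model’s training data?
- Importance: membership itself can be sensitive, e.g., a disease record
- Implementation in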
Membership Inference - Experiment
(a) Idea
- Test Bag of Words (BoW): the input whose membership is to be inferred
- Batch Bag of Words (BoW): the words of the target’s data in each batch
- The input is inferred to be in the batch if its test BoW is a subset of the batch BoW
(b) Datasets
- Yelp-health: vocabulary containing 5,000 words
- FourSquare: 30,000 locations
(c) Result
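The subset test between the test BoW and the batch BoW could look roughly like the sketch below. It assumes the batch BoW is read off the non-zero rows of an observed embedding-layer gradient; the helper names and the toy gradient are hypothetical.

```python
import numpy as np

def batch_bow_from_gradient(embedding_grad):
    """Batch BoW: ids whose embedding-gradient row is non-zero."""
    rows = np.flatnonzero(np.any(embedding_grad != 0, axis=1))
    return set(rows.tolist())

def infer_membership(test_ids, batch_bow):
    """Guess 'in' when every word of the test input occurs in the batch BoW."""
    return set(test_ids) <= batch_bow

# Toy observed gradient: only rows 2, 3 and 7 were touched by the batch.
grad = np.zeros((10, 2))
grad[[2, 3, 7]] = 1.0

bow = batch_bow_from_gradient(grad)
in_guess = infer_membership([2, 7, 7], bow)   # all words present in the batch
out_guess = infer_membership([2, 4], bow)     # word 4 never appeared
```

A positive answer only says that all of the input's words occurred somewhere in the batch, so short inputs made of common words can produce false positives.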
Passive Property Inference
- Interpretation
  - The property does not necessarily hold for all members of a class
  - The property is not necessarily related to the training objective
  - Detect properties in a single batch
  - Detect properties in a participant’s entire dataset
  - Example: Bob’s photos, main task gender classification; infer whether Alice also appears, whether people wear glasses, and when a property appears
- Assumption: the adversary has data labeled with and without the property
- Idea
  - Generate aggregated updates based on the data with the property and updates based on the data without the property
  - Train a binary batch property classifier on these labeled updates and feed it the observed updates
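The idea above can be sketched end to end on a toy problem. Everything concrete here is an assumption made for illustration: a linear model with squared loss stands in for the shared model, the "property" is a shift in one input feature, and a nearest-centroid rule is swapped in as the binary batch property classifier for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, BATCH, SHIFT = 5, 64, 3.0
w = np.ones(DIM)  # current parameters of the toy shared model (hypothetical)

def sample_batch(has_property):
    """Toy data: the property shifts feature 0; main-task labels ignore it."""
    x = rng.normal(size=(BATCH, DIM))
    if has_property:
        x[:, 0] += SHIFT
    y = rng.choice([-1.0, 1.0], size=BATCH)
    return x, y

def batch_gradient(x, y):
    """Gradient of the mean squared loss 0.5 * (x @ w - y)**2 w.r.t. w."""
    residual = x @ w - y
    return (residual[:, None] * x).mean(axis=0)

def make_updates(n, has_property):
    return np.stack([batch_gradient(*sample_batch(has_property)) for _ in range(n)])

# Adversary: generate labeled updates from auxiliary data, then fit the classifier.
centroid_with = make_updates(200, True).mean(axis=0)
centroid_without = make_updates(200, False).mean(axis=0)

def batch_has_property(update):
    """Binary batch property classifier (nearest centroid)."""
    d_with = np.linalg.norm(update - centroid_with)
    d_without = np.linalg.norm(update - centroid_without)
    return d_with < d_without

# Feed the classifier updates computed on batches the adversary never saw.
observed = np.vstack([make_updates(100, True), make_updates(100, False)])
truth = np.array([True] * 100 + [False] * 100)
predicted = np.array([batch_has_property(u) for u in observed])
accuracy = float((predicted == truth).mean())
```

The main-task labels carry no information about the property in this toy setup; the classifier works only because the gradients depend on the inputs that produced them.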
Passive Property Inference - idea
Single-batch Property Inference - Experiment ex1
- t-SNE projection of the features from different layers
Single-batch Property Inference - Experiment
ex2
- Main task: review-score classification
- Inference: specialty of doctors
ex3
- Inference: whether certain people appear
Dynamic Property Occurrence Inference
- Main task: determine if people in the image are of the same gender
- Inference: whether and when a certain person appears in the other participant’s photos
Inference against well-generalized models
- Main task: sentiment
- Inference: authors’ gender
- Dataset: annually expanded student-written essays and reviews
  - Labels: Truthful/Deceptive or Positive/Negative
  - Attributes of the author: gender, age, sexual orientation, region of origin, personality profile
  - Attributes of the document: timestamp, genre, topic, veracity, sentiment
Active Property Inference
- Goal: make the main model learn separable representations for the data with and without the property
- The adversary performs additional local computations and submits the resulting values into the collaborative learning protocol
- Main task: gender classification
- Inference: presence of ID 4
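One way the adversary’s additional local computation could be set up is a multi-task objective that mixes the main-task loss with a property-prediction loss. The form and the symbols below are assumptions for illustration (y is the main-task label, p the property label, W the shared parameters, and α a mixing weight); the slides do not state them.

```latex
\[
L_{mt} = \alpha \cdot L(x, y; W) + (1 - \alpha) \cdot L(x, p; W)
\]
```

Under this sketch, updates computed from the mixed loss push the shared parameters W toward representations that also separate inputs with the property from inputs without it.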
Multi-party experiments A. Synchronized SGD
Multi-party experiments B. Model averaging
Defense
A. Sharing fewer gradients
B. Dimensionality reduction
Defense
C. Dropout
D. Participant-level differential privacy
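Defense A (sharing fewer gradients) can be sketched as follows. The selection rule, keeping only the largest-magnitude entries, is an assumption made for illustration; the slides do not say how the shared subset is chosen, and `share_top_fraction` is a hypothetical helper.

```python
import numpy as np

def share_top_fraction(gradient, fraction):
    """Keep the largest-magnitude entries of a gradient and zero out the rest."""
    flat = gradient.ravel()
    k = max(1, int(round(fraction * flat.size)))      # number of entries to share
    keep = np.argpartition(np.abs(flat), -k)[-k:]     # indices of the k largest |values|
    shared = np.zeros_like(flat)
    shared[keep] = flat[keep]
    return shared.reshape(gradient.shape)

g = np.array([[0.5, -3.0, 0.1],
              [2.0, -0.2, 0.05]])
shared = share_top_fraction(g, fraction=1 / 3)        # share 2 of the 6 entries
```

Only the entries -3.0 and 2.0 survive in this example; every other entry is withheld from the other participants.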
Limitations
A. Auxiliary data: more targeted inference attacks require specialized auxiliary data that may not be available
B. Number of participants: some federated-learning applications involve thousands or millions of users
C. Undetectable properties: it may not be possible to infer some properties from model updates
D. Attribution of inferred properties: the adversary may not be able to attribute inputs with an inferred property to a specific participant in multi-party scenarios
Thanks!