Stealing DNN models: Attacks and Defenses
Mika Juuti, Sebastian Szyller, Alexey Dmitrenko, Samuel Marchal, N. Asokan
Background
Machine learning is increasingly popular: it gives companies a business advantage. Prediction APIs give clients black-box access to the model and automate tedious decision-making.
An attacker wants to compromise:
- Model confidentiality ~ model extraction: does the stolen model agree with the target model (classification results)?
- Model integrity (prediction quality) ~ transferable adversarial examples: are adversarial examples crafted on the stolen model also adversarial examples on the target model?
[Figure: a prediction service provider exposes the target ML model through an API; a client queries it (e.g. "speed limit 80 km/h"); the attacker uses the same API to build a stolen model.]
[1] Tramer et al. Stealing ML models via prediction APIs. USENIX Security '16.
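The two attacker goals above reduce to two measurable quantities. Below is a minimal sketch of how one might compute them, assuming `target_predict` and `stolen_predict` are callables (names chosen for this illustration) that map a batch of inputs to predicted class labels; the target model is only reachable through the prediction API.

```python
import numpy as np

def agreement(target_predict, stolen_predict, x_test):
    """Fraction of test inputs on which the stolen model reproduces the
    target model's classification (model-confidentiality metric)."""
    return np.mean(target_predict(x_test) == stolen_predict(x_test))

def targeted_transferability(target_predict, x_adv, y_target):
    """Fraction of adversarial examples, crafted on the stolen model to be
    classified as y_target, that the *target* model also classifies as
    y_target (model-integrity metric)."""
    return np.mean(target_predict(x_adv) == y_target)
```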
Generic model extraction with synthetic samples
1. Select model hyperparameters -- hyperparameters, layers, ... [1, 2]
2. Initial data collection -- unlabeled seed samples for the attack
Duplication rounds [3]:
3. Query API for predictions -- get labels / classification probabilities
4. Train attacker model ("stolen model") -- update model
5. Generate new queries -- probe decision boundaries with synthetic samples
6. Terminate -- query budget exceeded
[1] Oh et al. Towards Reverse Engineering Black-box Neural Networks. ICLR '18.
[2] Wang and Gong. Stealing Hyperparameters in Machine Learning. IEEE S&P '18.
[3] Papernot et al. Practical black-box attacks against machine learning. AsiaCCS '17.
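A minimal sketch of the duplication-round loop above. All helper names (`query_api`, `fit_substitute`, `synthesize`) are placeholders introduced for this illustration, not an API from the cited papers.

```python
def extract_model(seed_samples, query_api, fit_substitute, synthesize,
                  query_budget):
    """Generic extraction loop: query, retrain, synthesize, repeat."""
    pool_x, pool_y = [], []
    queries = list(seed_samples)          # initial unlabeled seed data
    used = 0
    substitute = None
    while used + len(queries) <= query_budget:
        labels = query_api(queries)       # get labels / probabilities
        pool_x.extend(queries)
        pool_y.extend(labels)
        used += len(queries)
        substitute = fit_substitute(pool_x, pool_y)  # update stolen model
        queries = synthesize(substitute, pool_x)     # new boundary probes
    return substitute                     # budget exceeded: terminate
```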
Datasets
MNIST: B&W digits
- Server trained with 55,000 images
- 4 layers (2 conv + 2 dense)
- Seed samples: 10 ~ 500 images
- 10 classes
GTSRB: traffic sign recognition
- Server trained with 39,000 images
- 5 layers (2 conv + 3 dense)
- Seed samples: 43 ~ 2150 images
- 8 macro-classes (listed in the extra slides)
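For concreteness, here is a Keras sketch of the MNIST target model's shape (2 conv + 2 dense layers, 10 classes). Only the layer layout and class count come from the slide; filter counts, kernel sizes, pooling, and the optimizer are assumptions.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

mnist_target = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),   # conv layer 1 (assumed width)
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # conv layer 2 (assumed width)
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # dense layer 1 (assumed width)
    layers.Dense(10, activation="softmax"),    # dense layer 2: 10 digit classes
])
mnist_target.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
```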
Comparative evaluation with state-of-the-art
- Better targeted transferability: Jb-topk targets a specific class.
- Better agreement: fully training substitute models is crucial.
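The Jb-topk idea named above, targeting a specific class when generating synthetic queries, can be sketched as a Jacobian-based perturbation that steps toward a chosen target class. The signed-gradient update and step size below follow Papernot-style Jacobian augmentation and are an illustration under those assumptions, not the exact Jb-topk rule.

```python
import tensorflow as tf

def targeted_synthetic(substitute, x, target_class, lmbda=0.1):
    """Perturb samples x so the substitute's score for target_class rises,
    yielding synthetic queries that probe the boundary toward that class."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        score = substitute(x)[:, target_class]   # probability of target class
    grad = tape.gradient(score, x)
    return x + lmbda * tf.sign(grad)              # step toward target class
```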
Take-aways
- Agreement and transferability do not go hand-in-hand.
- Probabilities improve transferability.
- Dropout may help on complex data.
- Synthetic samples with probabilities boost transferability.
- Thinner, deeper networks are more resilient to transferable adversarial examples.
Attacks: common characteristics
- Seed samples ~ novelty in queries: establish the initial decision boundaries.
- Synthetic samples ~ similar to existing samples: refine the boundaries.
- Idea: study the distribution of queries to detect model extraction attacks.
Defence: PRADA (Protection Against DNN Model Extraction Attacks)
Stateful defense: detects lack of novelty in client queries.
- Lazy clustering: keeps track of submitted queries and adds "novel" samples to a growing set.
- Parameters:
  - W: window size – speed of detection
  - Δ: change in growth rate – recall
- Idea of detection: a kink in the growth curve of the novel-sample set (sketched below).
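A hedged sketch of the detection idea as described on this slide: keep a growing set of "novel" queries, measure how fast it grows per window of W queries, and flag a client when the growth rate drops sharply (the kink). The L2-distance novelty criterion and the exact kink test below are assumptions filling in details the slide does not give; the full PRADA criterion in the paper may differ.

```python
import numpy as np

class NoveltyDetector:
    """Per-client detector: growth-rate kink in the set of novel queries."""

    def __init__(self, window=100, delta=0.5, novelty_threshold=1.0):
        self.window = window        # W: window size (speed of detection)
        self.delta = delta          # Δ: tolerated drop in growth rate (recall)
        self.threshold = novelty_threshold  # assumed L2 novelty threshold
        self.novel = []             # growing set of novel queries
        self.growth = []            # novel samples added per window
        self._added_this_window = 0
        self._seen = 0

    def observe(self, query):
        """Process one client query; return True if an extraction alarm fires."""
        q = np.asarray(query, dtype=float).ravel()
        if not self.novel or min(np.linalg.norm(q - n) for n in self.novel) > self.threshold:
            self.novel.append(q)
            self._added_this_window += 1
        self._seen += 1
        if self._seen % self.window == 0:       # end of a window of W queries
            self.growth.append(self._added_this_window)
            self._added_this_window = 0
            if len(self.growth) >= 2 and self.growth[-2] > 0:
                if self.growth[-1] / self.growth[-2] < self.delta:
                    return True                 # growth-rate kink: alarm
        return False
```

A benign client drawing natural samples keeps adding novel queries at a steady rate, so no kink appears; an extraction attack that switches from seed samples to look-alike synthetic samples stops contributing novel queries and triggers the alarm.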
Detection efficiency
- All known model extraction attacks are detected.
- Detection is slowest on the Tramer attack [1] ~ an inefficient attack that requires >500k queries to succeed (conservative estimate).
[1] Tramer et al. Stealing ML models via prediction APIs. USENIX Security '16.
[2] Papernot et al. Practical black-box attacks against machine learning. AsiaCCS '17.
False positives
- Tested with randomly sampled data from different distributions:
  - Traffic signs: German / Belgian signs
  - Digits: MNIST / USPS
- No false positives.
- PRADA relies on the relative data distribution, i.e. on client behavior.
Summary
- Model extraction is a serious threat to model owners.
- We propose a new attack that outperforms the state-of-the-art.
- Take-aways for model extraction attacks & transferability.
- PRADA detects all known model extraction attacks.
Paper: https://arxiv.org/abs/1805.02628
Extra
Structured tests
- Exp 1: What is the impact of natural seed samples? The attacker uses only seed samples.
- Exp 2: What advantage do synthetic samples give? Continue the attack with synthetic samples; compare our new attack with existing ones.
- Exp 3: What effect does model complexity have on transferability? Compare (1) the number of layers and (2) the number of parameters.
1. Impact of using only natural seed samples: MNIST
[Figure note: the number of adversarial examples tested is always 90 for x ≥ 100 and always 70 for x ≥ 430.]
- Agreement and transferability do not go hand-in-hand.
- Probabilities improve transferability.
- Dropout may help on complex data.
2. Benefits of using synthetic samples
- (4) Synthetic samples with probabilities boost transferability.
(untargeted transferability results in the extra slides)
3. Effect of model complexity
- (5) Thinner, deeper networks are more resilient to transferable adversarial examples.
[Figure: MNIST transferability vs. model complexity]
GTSRB macro-classes
The 43 GTSRB sign classes are grouped into 8 macro-classes: RW (red-white) circle, gray circle, warning, priority, yield, stop, no entry, and blue circle.
[Table: mapping of individual sign classes (no passing, right-of-way, priority road, yield, stop, no vehicles, no entry, dangerous curves, road work, pedestrians, children, bicycles, ice, wildlife, end of limits, turn/keep directions, roundabout, end of no passing, ...) to these macro-classes.]
1. Impact of using only natural seed samples: MNIST
1. Impact of using only natural seed samples: GTSRB
2. Synthetic samples: untargeted transferability
Detection efficiency – sequential data from client
- Sequential data is co-dependent, so good parameters are more conservative.
- All known model extraction attacks are detected.
- Detection at the first window overlap: seed samples + window size (50).
- No difference on the Tramer attack [1]: it requires >500k queries to succeed (conservative estimate).
[1] Tramer et al. Stealing ML models via prediction APIs. USENIX Security '16.
[2] Papernot et al. Practical black-box attacks against machine learning. AsiaCCS '17.