Published by Augustine Bryant. Modified over 9 years ago.
1
On the Hardness of Evading Combinations of Linear Classifiers
Daniel Lowd, University of Oregon
Joint work with David Stevens
2
Machine learning is used more and more in adversarial domains…
– Intrusion detection
– Malware detection
– Phishing detection
– Detecting malicious advertisements
– Detecting fake reviews
– Credit card fraud detection
– Online auction fraud detection
– Email spam filtering
– Blog spam filtering
– OSN spam filtering
– Political censorship
…and more every year!
3
Evasion Attack
1. System designers deploy a classifier.
2. An attacker learns about the model through interaction (and possibly other information sources).
3. An attacker uses this knowledge to evade detection by changing its behavior as little as possible.
Example: A spammer sends test emails to learn how to modify a spam so that it gets past a spam filter.
Question: How easily can the attacker learn enough to mount an effective attack?
4
Adversarial Classifier Reverse Engineering (ACRE)
Task: Find the negative instance “closest” to x_a. (We will also refer to this distance as a “cost” to be minimized.)
Problem: The adversary doesn’t know the classifier!
[Figure: feature space with axes X1 and X2, divided into positive (+) and negative (−) regions, with the instance x_a marked.]
[Lowd&Meek,’05]
5
Adversarial Classifier Reverse Engineering (ACRE)
Task: Find the negative instance “closest” to x_a, to within a factor of k.
Given:
– One positive and one negative instance, x+ and x−
– A polynomial number of membership queries
[Figure: the X1/X2 feature space with the classifier unknown (question marks), apart from one known positive instance, one known negative instance, and x_a.]
[Lowd&Meek,’05]
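One way to write the "within a factor of k" goal as a formula is sketched below. The symbols are assumptions for illustration: c is the deployed classifier, a(x) is the adversary's cost (distance from x_a), and x is the instance the attacker returns.

```latex
\[
c(x) = -
\qquad\text{and}\qquad
a(x) \;\le\; k \cdot \min_{y \,:\, c(y) = -} a(y)
\]
```

With k = 1 the attacker would have to find the exact minimum-cost negative instance. Larger k allows a proportionally more expensive evasion.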
6
Example: Linear Classifiers
With continuous features and L1 distance, find the optimal point by doing a line search in each dimension.
[Figure: line searches from x_a along X1 and X2 toward the negative region.]
Somewhat more efficient methods exist for the continuous case [Nelson&al.,2012].
However, with binary features, we can’t do line searches.
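The per-dimension line search can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes a membership oracle `is_positive`, unit cost per unit of movement in every dimension, and a search radius `max_step` (all hypothetical names). For a linear classifier under L1 distance, the cheapest evasion moves along a single axis, so the sketch keeps the shortest axis-aligned crossing.

```python
from typing import Callable, Optional, Sequence, Tuple


def axis_line_search(
    is_positive: Callable[[Sequence[float]], bool],
    x_a: Sequence[float],
    max_step: float = 1e3,   # assumed search radius, not from the slides
    tol: float = 1e-6,
) -> Optional[Tuple[int, float]]:
    """Return (dimension, signed step) of the cheapest single-axis evasion."""
    best: Optional[Tuple[int, float]] = None
    for i in range(len(x_a)):
        for sign in (1.0, -1.0):
            far = list(x_a)
            far[i] += sign * max_step
            if is_positive(far):
                continue  # no boundary crossing within max_step this way
            lo, hi = 0.0, max_step  # lo is still positive, hi is negative
            while hi - lo > tol:
                mid = (lo + hi) / 2
                probe = list(x_a)
                probe[i] += sign * mid
                if is_positive(probe):
                    lo = mid
                else:
                    hi = mid
            if best is None or hi < abs(best[1]):
                best = (i, sign * hi)
    return best


# Hypothetical linear classifier used only to exercise the sketch.
w, b = [1.0, -4.0, 2.0], -2.0


def is_positive(x: Sequence[float]) -> bool:
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


x_a = [3.0, 0.0, 1.5]
dim, step = axis_line_search(is_positive, x_a)
print(dim, round(step, 3))
```

Each binary search costs about log2(max_step / tol) queries, so the total is linear in the number of dimensions. The same idea does not carry over to binary features, where there is no continuum to search along.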
7
Attacking Linear Classifiers with Boolean Features
Can efficiently find an evasion with at most twice the optimal cost, assuming unit cost for each “change”. [Lowd&Meek,’05]
METHOD: Iteratively reduce cost in two ways:
1. Remove any unnecessary change: O(n)
2. Replace any two changes with one: O(n^3)
[Figure: three diagrams of the changes (weights w_i, w_j, w_k, w_l, w_m, w_p) separating x_a from the negative instances x−, y, and y′, relative to c(x).]
Also known: Any convex-inducing classifier with continuous features is ACRE-learnable. [Nelson&al.,2012]
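The two cost-reduction steps can be sketched as below. This is an illustrative reading of the method, not the published implementation: `is_positive` is an assumed membership oracle, instances are 0/1 lists, and a "change" is a feature on which the candidate differs from x_a.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set


def flip(x_a: Sequence[int], changes: Set[int]) -> List[int]:
    """Apply a set of feature flips to x_a."""
    x = list(x_a)
    for i in changes:
        x[i] = 1 - x[i]
    return x


def reduce_changes(
    is_positive: Callable[[Sequence[int]], bool],
    x_a: Sequence[int],
    x_minus: Sequence[int],
) -> Set[int]:
    """Shrink the change set between x_a and a known negative instance."""
    n = len(x_a)
    changes = {i for i in range(n) if x_a[i] != x_minus[i]}
    improved = True
    while improved:
        improved = False
        # Step 1: remove any single change that is not needed.
        for i in sorted(changes):
            if not is_positive(flip(x_a, changes - {i})):
                changes = changes - {i}
                improved = True
                break
        if improved:
            continue
        # Step 2: replace any two changes with one currently unused change.
        unused = [f for f in range(n) if f not in changes]
        for i, j in combinations(sorted(changes), 2):
            for f in unused:
                trial = (changes - {i, j}) | {f}
                if not is_positive(flip(x_a, trial)):
                    changes = trial
                    improved = True
                    break
            if improved:
                break
    return changes


# Hypothetical linear classifier: positive when the weighted sum exceeds 3.5.
w = [3, 1, 1, 1, 1]


def is_positive(x: Sequence[int]) -> bool:
    return sum(wi * xi for wi, xi in zip(w, x)) > 3.5


x_a = [1, 1, 1, 1, 1]
x_minus = [1, 0, 0, 0, 0]           # known negative instance, cost 4
result = reduce_changes(is_positive, x_a, x_minus)
print(sorted(result), len(result))  # final change set and its cost
```

Every accepted step lowers the number of changes by one, so the loop terminates after at most n improvements. In this toy example the starting evasion costs 4 and the optimum is 2 (the heavy feature plus any light one).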
8
What about non-linear classifiers?
This work: We consider when the positive or negative class is an intersection of half-spaces, or polytope, representable by combinations of linear classifiers:
– Positive class is conjunction of linear classifiers. Example: One classifier to identify each legitimate user.
– Positive class is disjunction of linear classifiers. Example: One classifier for each type of attack.
We show that the attack problem is hard in general, but easy when the half-spaces are defined over disjoint features.
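As a small illustration of the two combinations, the sketch below builds a conjunction and a disjunction from linear component classifiers. The helper names (`linear`, `conjunction`, `disjunction`) and the example weights are made up for this sketch.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], bool]


def linear(w: Sequence[float], b: float) -> Classifier:
    """Half-space classifier: positive when w.x + b > 0."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b > 0


def conjunction(components: Sequence[Classifier]) -> Classifier:
    """Positive only if every component is positive (positive class is a polytope)."""
    return lambda x: all(c(x) for c in components)


def disjunction(components: Sequence[Classifier]) -> Classifier:
    """Positive if any component is positive (negative class is a polytope)."""
    return lambda x: any(c(x) for c in components)


# Two hypothetical components, each looking at a different feature.
c1 = linear([1.0, 0.0], -0.5)   # positive when feature 0 is on
c2 = linear([0.0, 1.0], -0.5)   # positive when feature 1 is on

conj = conjunction([c1, c2])
disj = disjunction([c1, c2])

print(conj([1, 0]), disj([1, 0]))
```

To evade a conjunction it is enough to make one component negative, while evading a disjunction requires making every component negative.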
9
Hardness Results
With continuous features and L1 costs, near-optimal evasion of a polytope requires only polynomially many queries. [Nelson et al., 2012]
With discrete features, we show that exponentially many queries are required in the worst case. Proofs work for any fixed approximation ratio k.
Key Idea: Construct a set of component classifiers so there is no clear path from “distant” to “close” negative instances.
10
Hardness of Evading Disjunctions
(Instance is negative only if all component classifiers mark it as negative.)
[Figure: construction with n/2k component classifiers; one feature group is labeled n/2+1.]
Two ways to evade:
– Include all light-green features (cost: n/2+1)
– Include all dark-green features (cost: n/2k)
Challenge:
– If you don’t guess all dark-green features, some classifier remains positive.
– If you include extra red features, all classifiers become positive.
Guessing a low-cost instance requires exponentially many queries!
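A short calculation shows why the expensive evasion is not good enough for a k-approximation. It reads the slide's "n/2k" as n/(2k), which is an assumption about the intended grouping.

```latex
\[
\frac{\text{cost of light-green evasion}}{\text{cost of dark-green evasion}}
= \frac{n/2 + 1}{n/(2k)}
= \frac{k\,(n + 2)}{n}
= k + \frac{2k}{n}
> k
\]
```

So an attacker who only ever finds the light-green evasion exceeds the allowed factor of k, and must instead locate the dark-green feature set.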
11
Hardness of Evading Conjunctions
(Instance is negative if any component classifier marks it as negative.)
[Figure: component classifiers c1 and c2 over n/2 light-green features and n/4k dark-green features.]
To evade c2: Include > ½ the light-green features (cost: n/4+1).
To evade c1: Include all dark-green features (cost: n/4k), or all light-green features (cost: n/2), or a combo.
Two cases:
– When > ½ the light-green features are included, c2 is negative so dark-greens have no effect on the class label.
– When […] ½ the dark-green features to evade c1.
Adversary must guess n/8k features!
12
Restriction: Disjoint Features
In practice, classifiers do not always represent the worst case.
In some applications, each classifier in the set works on a different set of features:
– Image or fingerprint biometrics classifiers
– Separate image spam and HTML spam classifiers
This simple restriction makes attacks easy!
13
Evading Disjoint Disjunctions
(Instance is negative only if all component classifiers mark it as negative.)
Theorem: The linear attack from [Lowd&Meek,2005] is at most twice optimal on disjoint disjunctions.
Proof Sketch:
– When features are disjoint, the optimal evasion is to evade each component classifier optimally.
– When the algorithm terminates, there is no way to reduce the cost with individual changes or pairs of changes, so each separate evasion is at most twice optimal.
Example: [Figure: two component classifiers over disjoint features, c1(x) with weights w_i, w_j, w_k, w_l and c2(x) with weights w_m, w_n, w_o, each shown with x_a and x−.]
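The first step of the proof sketch can be checked by brute force on a toy case: with disjoint features, the cheapest evasion of the disjunction costs the sum of the cheapest evasions of its components. The classifiers and weights below are invented for the check, and `min_cost` is an exhaustive helper, not part of the attack.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence


def min_cost(
    is_positive: Callable[[Sequence[int]], bool],
    x_a: Sequence[int],
    features: Sequence[int],
) -> Optional[int]:
    """Exhaustively find the fewest flips among `features` that yield a negative."""
    for r in range(len(features) + 1):
        for subset in combinations(features, r):
            x = list(x_a)
            for i in subset:
                x[i] = 1 - x[i]
            if not is_positive(x):
                return r
    return None


# Hypothetical components over disjoint feature sets {0,1,2} and {3,4}.
def c1(x: Sequence[int]) -> bool:
    return 2 * x[0] + x[1] + x[2] > 1.5


def c2(x: Sequence[int]) -> bool:
    return x[3] + x[4] > 0.5


def disj(x: Sequence[int]) -> bool:
    return c1(x) or c2(x)


x_a = [1, 1, 1, 1, 1]
cost1 = min_cost(c1, x_a, [0, 1, 2])
cost2 = min_cost(c2, x_a, [3, 4])
total = min_cost(disj, x_a, list(range(5)))
print(cost1, cost2, total)
```

The exhaustive search is exponential and only serves as a sanity check on small inputs. The attack itself relies on the local single-change and pairwise moves instead.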
14
Evading Disjoint Conjunctions
(Instance is negative if any component classifier marks it as negative.)
Theorem: By repeating the linear attack with different constraints, we can efficiently find an attack that is at most twice optimal.
Proof Sketch:
– Each component classifier has some optimal evasion. The optimal overall attack is the cheapest of these attacks.
– Running the linear attack once finds a good evasion against some classifier. Since it’s an evasion, one classifier must be negative. All feature changes for other classifiers can be removed. Since no individual change or pair of changes reduces the cost, this evasion is at most twice optimal.
– By rerunning the linear attack restricted to features we haven’t used before, eventually we will find good evasions against all component classifiers.
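The repeated, restricted attack might look like the sketch below. Several details are assumptions made for illustration: each run starts from the instance with every still-allowed feature flipped, the loop stops when that starting point is no longer negative, and the classifiers and weights are invented. The attacker only uses the oracle `is_positive` and does not know the feature partition.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Set


def flip(x_a: Sequence[int], changes: Set[int]) -> List[int]:
    x = list(x_a)
    for i in changes:
        x[i] = 1 - x[i]
    return x


def linear_attack(
    is_positive: Callable[[Sequence[int]], bool],
    x_a: Sequence[int],
    allowed: Set[int],
) -> Optional[Set[int]]:
    """Shrink a change set using only features in `allowed`."""
    changes = set(allowed)
    if is_positive(flip(x_a, changes)):
        return None  # assumed stopping rule: no evasion left in `allowed`
    improved = True
    while improved:
        improved = False
        for i in sorted(changes):                    # drop unneeded changes
            if not is_positive(flip(x_a, changes - {i})):
                changes, improved = changes - {i}, True
                break
        if improved:
            continue
        unused = sorted(allowed - changes)
        for i, j in combinations(sorted(changes), 2):  # two changes -> one
            for f in unused:
                trial = (changes - {i, j}) | {f}
                if not is_positive(flip(x_a, trial)):
                    changes, improved = trial, True
                    break
            if improved:
                break
    return changes


def repeated_attack(
    is_positive: Callable[[Sequence[int]], bool], x_a: Sequence[int]
) -> Optional[Set[int]]:
    """Rerun the attack on unused features and keep the cheapest evasion."""
    allowed = set(range(len(x_a)))
    best: Optional[Set[int]] = None
    while allowed:
        found = linear_attack(is_positive, x_a, allowed)
        if not found:
            break
        if best is None or len(found) < len(best):
            best = found
        allowed -= found
    return best


# Hypothetical disjoint components: c_a on features {0,1}, c_b on {2,3,4,5}.
def conj(x: Sequence[int]) -> bool:
    c_a = 3 * x[0] + x[1] > 2.5
    c_b = x[2] + x[3] + x[4] + x[5] > 1.5
    return c_a and c_b


x_a = [1, 1, 1, 1, 1, 1]
best = repeated_attack(conj, x_a)
print(sorted(best), len(best))
```

In this toy case the first run already reaches the single-feature evasion of the first component, and the second run, restricted to the remaining features, finds a costlier evasion of the other component. Keeping the cheapest result mirrors the proof sketch's observation that the optimal attack is the cheapest per-component evasion.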
15
Experiments
Data: 2005 TREC spam corpus
Component classifiers: LR (SVM, NB in paper)
Features partitioned into 3 or 5 sets:
– Randomly
– Spammy / Neutral / Hammy [Jorgensen et al., 2008]
Fixed overall false negative rate to 10%.
We attempted to disguise 100 different spams. To make this more challenging, we first added 100 random “spammy” features to each spam.
16
Results: Attack Cost
17
Results: Attack Optimality
18
Results: Attack Efficiency
Number of queries before algorithms terminate:
– Conjunction: ~1,000,000 (Restricted: ~50,000)
– Disjunction: ~10,000,000 (Restricted: ~700,000)
19
1 million queries is not very efficient!
The purpose of this experiment is to understand how performance depends on different factors, not the exact number of queries.
In practice, the adversary’s job is much easier:
– We added 100 spammy features to make it harder.
– Additional background knowledge could make this much easier.
– Restricted vocabulary reduces queries 10x with minimal increase in attack cost (90% of the time, still within 2x of optimal).
– Attackers don’t need guarantees of optimality.
20
Results: Attack Efficiency
Number of queries before our attack is within twice optimal:
– Conjunction: ~3,000 / ~100,000
– Disjunction: ~10,000 / ~300,000
Attacks are even easier with background knowledge and without 100 spammy words.
21
Discussion and Conclusion
Evading discrete classifiers is provably harder than evading continuous classifiers.
– Linear: k-approximation vs. (1+ε)-approximation
– Polytope: Exponential vs. polynomial queries
Interesting sub-classes of discrete non-linear classifiers are still vulnerable.
– Disjoint features are a sufficient condition
– Open question: What other sub-classes are vulnerable?
Conjunction (convex spam) is theoretically harder but practically easier.
– In addition to worst-case bounds, we need realistic simulations that can be applied to specific classifiers.