Published by Thomasina Burns. Modified over 9 years ago.
1
Polygraph: Automatically Generating Signatures for Polymorphic Worms. Presented by: Devendra Salvi. Paper by: James Newsome, Brad Karp, Dawn Song.
2
Introduction: Why an automated signature generation technique? Lessons from previous worm detection implementations. What is a polymorphic worm?
3
Polymorphic worm design. Characteristics of a polymorphic worm: invariant bytes, wildcard bytes, and code bytes. Creating a polymorphic worm. Assumptions: perfectly obfuscated code; code obfuscation.
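To make the three byte classes concrete, here is a toy Python sketch of how one polymorphic instance might be assembled. Everything in it is hypothetical: the framing string, the `RETURN_ADDRESS` value, the `WORM_BODY` placeholder and the single-byte XOR encoding stand in for a real exploit and a real obfuscator (a real decoder stub would itself be obfuscated). Each call keeps the invariant bytes fixed, draws fresh wildcard bytes, and re-encodes the code bytes under a new key.

```python
import random

# Hypothetical exploit layout: illustrative placeholders, not a real exploit.
INVARIANT_FRAMING = b"GET / HTTP/1.1\r\nHost: "   # exploit framing: must appear verbatim
RETURN_ADDRESS = b"\xde\xad\xbe\xef"              # control-flow value: must appear verbatim
WORM_BODY = b"placeholder for the worm's code"    # plaintext worm code, never sent as-is


def make_instance(rng: random.Random) -> bytes:
    """Assemble one polymorphic instance from invariant, wildcard and code bytes."""
    # Wildcard bytes: any value works, so each instance uses random filler.
    wildcard = bytes(rng.randrange(256) for _ in range(rng.randrange(8, 16)))
    # Code bytes: the body is XOR-encoded under a fresh non-zero key. The leading
    # key byte stands in for a decoder stub (left unobfuscated in this toy).
    key = rng.randrange(1, 256)
    code = bytes([key]) + bytes(b ^ key for b in WORM_BODY)
    return INVARIANT_FRAMING + wildcard + RETURN_ADDRESS + code


def decode_body(instance: bytes) -> bytes:
    """Undo the toy encoding, as the decoder stub would on the victim host."""
    code = instance[instance.index(RETURN_ADDRESS) + len(RETURN_ADDRESS):]
    key = code[0]
    return bytes(b ^ key for b in code[1:])
```

Two instances drawn from the same generator differ byte-for-byte, yet both still carry the framing and the return address; those shared bytes are the invariant content that signatures for polymorphic worms have to target.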
4
Polymorphic worm design (contd.). The two chief sources of invariant content are exploit framing (e.g., reserved protocol keywords) and the exploit payload (bytes that alter control flow).
5
Invariant content in a polymorphic worm: the Apache multiple-host-header vulnerability, as used by the Apache-Knacker exploit. Legend: unshaded area = wildcard bytes; lightly shaded = code bytes; heavily shaded = invariant content bytes.
6
Invariant content in a polymorphic worm (contd.): the BIND TSIG vulnerability, exploited by the Lion worm. Legend: unshaded area = wildcard bytes; lightly shaded = code bytes; heavily shaded = invariant content bytes.
7
Invariant content in polymorphic worms (contd.): CodeRed, AdmWorm, Slapper, and the Clet polymorphic engine. Boxed bytes are found in at least 20% of Clet’s outputs; shaded bytes are found in all of Clet’s outputs.
8
Polymorphic signatures. Are substring signatures insufficient? A substring signature works only when two conditions hold: (1) a single invariant substring exists across payload instances of the same worm, i.e., the substring is sensitive, in that it matches all worm instances; and (2) the invariant substring is long enough to be specific, i.e., it does not occur in any non-worm payloads destined for the same IP protocol and port. A polymorphic worm need not satisfy both conditions. Signature classes for polymorphic worms: conjunction signatures, token-subsequence signatures, and Bayes signatures.
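The signature classes differ mainly in how a flow is tested against the tokens. A minimal Python sketch of the conjunction and token-subsequence matchers follows; the function names are illustrative, not Polygraph's code, and Bayes matching is left out here because it needs trained per-token weights. For the token-subsequence check, taking the earliest occurrence of each token in turn is enough to decide whether the ordered tokens appear, which is exactly what a regular expression of the form `.*t1.*t2.*` expresses.

```python
import re
from typing import Sequence


def matches_conjunction(flow: bytes, tokens: Sequence[bytes]) -> bool:
    """Conjunction signature: every token occurs somewhere, in any order."""
    return all(t in flow for t in tokens)


def matches_subsequence(flow: bytes, tokens: Sequence[bytes]) -> bool:
    """Token-subsequence signature: tokens occur in the given order, without
    overlapping. Scanning for the earliest occurrence of each token suffices."""
    pos = 0
    for t in tokens:
        i = flow.find(t, pos)
        if i == -1:
            return False
        pos = i + len(t)  # the next token must start after this one ends
    return True


def subsequence_regex(tokens: Sequence[bytes]):
    """The same signature as a regex; DOTALL lets '.' cross newlines in payloads."""
    return re.compile(b".*".join(re.escape(t) for t in tokens), re.DOTALL)
```

For an HTTP request, `matches_conjunction` accepts `[b"Host", b"GET"]` regardless of order, while `matches_subsequence` with the same list rejects it because `Host` follows `GET` in the payload.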
9
Polygraph. The Polygraph monitor incorporates the Polygraph signature generator.
10
Polygraph (contd.). Design goals of the Polygraph signature generator: signature quality; efficient signature generation; efficient signature matching; generation of small signature sets; robustness against noise and multiple worms; robustness against evasion and subversion.
11
Algorithm for signature generation. Preprocessing: token extraction. All distinct substrings of at least a minimum length that occur in at least K samples of the suspicious pool are extracted as tokens. For example, if “http” occurs in K samples, “ttp” is not considered a distinct token unless it also appears in K samples other than as a substring of “http”. This is the first step of the algorithm, and it filters out irrelevant substrings of the suspicious flows.
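A brute-force Python sketch of this preprocessing step, under stated assumptions: a token must occur in at least `k` suspicious samples (the K above) and be at least `min_len` bytes long, and a shorter substring is kept only if at least `k` samples contain an occurrence of it that is not inside an already-accepted longer token. All function names are hypothetical, and the quadratic enumeration of substrings is for clarity only, not efficiency.

```python
from collections import Counter
from typing import Iterator, List, Sequence


def _occurrences(s: bytes, t: bytes) -> Iterator[int]:
    """Start offsets of every (possibly overlapping) occurrence of t in s."""
    i = s.find(t)
    while i != -1:
        yield i
        i = s.find(t, i + 1)


def _covered(s: bytes, i: int, n: int, accepted: Sequence[bytes]) -> bool:
    """True if s[i:i+n] lies inside some occurrence of an accepted token."""
    return any(j <= i and i + n <= j + len(u)
               for u in accepted for j in _occurrences(s, u))


def extract_tokens(samples: Sequence[bytes], k: int, min_len: int) -> List[bytes]:
    # Count, for each substring, how many samples contain it (once per sample).
    counts: Counter = Counter()
    for s in samples:
        counts.update({s[i:j] for i in range(len(s))
                       for j in range(i + min_len, len(s) + 1)})
    # Longest candidates first, so containing tokens are decided before sub-tokens.
    candidates = sorted((t for t, c in counts.items() if c >= k),
                        key=len, reverse=True)
    tokens: List[bytes] = []
    for t in candidates:
        # Support: samples with at least one occurrence of t outside accepted tokens.
        support = sum(
            any(not _covered(s, i, len(t), tokens) for i in _occurrences(s, t))
            for s in samples)
        if support >= k:
            tokens.append(t)
    return tokens
```

With the samples `b"http aXttp"`, `b"http bYttp"` and `b"http cZttp"` (k = 3, min_len = 2), the sketch keeps `b"http "` and also `b"ttp"`, because “ttp” appears in every sample outside of “http”; sub-tokens such as `b"tt"` are dropped.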
12
Algorithm for signature generation (contd.): generating single signatures. Generating conjunction signatures: an unordered token list. Generating token-subsequence signatures: an ordered token list, expressed as a regular expression, e.g., “.*one.*two.*” or “.*o.*n.*e.*z.*”. Generating Bayes signatures: compare Pr[L(x) = worm | x] with Pr[L(x) = ¬worm | x], where

$$\frac{\Pr[L(x)=\mathrm{worm}\mid x]}{\Pr[L(x)=\lnot\mathrm{worm}\mid x]}=\frac{\Pr[L(x)=\mathrm{worm}]\prod_{1\le i\le n}\Pr[x_i=1\mid L(x)=\mathrm{worm}]}{\Pr[L(x)=\lnot\mathrm{worm}]\prod_{1\le i\le n}\Pr[x_i=1\mid L(x)=\lnot\mathrm{worm}]}$$
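Taking logarithms turns the product above into a sum of per-token scores, and the prior ratio Pr[L(x) = worm] / Pr[L(x) = ¬worm] becomes a constant that can be folded into a threshold. The Python sketch below follows that idea, estimating each probability as a token's frequency in the suspicious pool (worm) and the innocuous pool (¬worm). The `eps` floor and the threshold rule (the lowest value that keeps the innocuous-pool false positive rate at or below `max_fp`) are assumptions made for this illustration; both pools are assumed non-empty.

```python
import math
from typing import Dict, Sequence


def train_bayes(tokens: Sequence[bytes], suspicious: Sequence[bytes],
                innocuous: Sequence[bytes], eps: float = 1e-3) -> Dict[bytes, float]:
    """Per-token score log(Pr[x_i = 1 | worm] / Pr[x_i = 1 | not worm])."""
    scores = {}
    for t in tokens:
        p_worm = sum(t in f for f in suspicious) / len(suspicious)
        p_not = sum(t in f for f in innocuous) / len(innocuous)
        # eps is an assumed floor that keeps log() finite for unseen tokens.
        scores[t] = math.log(max(p_worm, eps) / max(p_not, eps))
    return scores


def bayes_score(flow: bytes, scores: Dict[bytes, float]) -> float:
    """Log-odds evidence: sum of the scores of the tokens present in the flow."""
    return sum(s for t, s in scores.items() if t in flow)


def pick_threshold(scores: Dict[bytes, float], innocuous: Sequence[bytes],
                   max_fp: float = 0.01) -> float:
    """Lowest innocuous score whose FP rate is at most max_fp.
    A flow matches the signature when bayes_score(flow) > threshold."""
    inno = sorted(bayes_score(f, scores) for f in innocuous)
    for th in inno:
        if sum(s > th for s in inno) / len(inno) <= max_fp:
            return th
    return math.inf  # not reached for a non-empty pool: the maximum has FP 0
```

A token seen in every worm sample but in no innocuous flow receives a large positive score, while a protocol keyword present in both pools scores near zero, so the threshold is driven by the rare, worm-specific tokens.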
13
Practical signature generation: generating multiple signatures. The suspicious flow pool could contain more than one type of worm, and could also contain innocuous flows. Bayes algorithm implementation. Conjunction algorithms require clustering, so that each cluster contains similar flows; Polygraph uses hierarchical clustering.
14
Practical signature generation (contd.): hierarchical clustering. Clusters are merged iteratively. For each of the O(s²) pairs among the s clusters, the algorithm computes what the merged signature would be; the two clusters whose merged signature has the lowest false positive rate are merged. [Figure: example merging of clusters S1 through S6.]
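A minimal Python sketch of greedy hierarchical clustering with conjunction signatures. It assumes that every suspicious flow starts in its own cluster, that false positive rates are measured on the innocuous pool, that merging stops once even the best merge would exceed a bound `max_fp`, and that clusters smaller than `min_size` (3, as in the experimental setup) are discarded as noise. The function names and the stopping rule are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple


def conjunction_signature(cluster: Sequence[bytes],
                          tokens: Sequence[bytes]) -> FrozenSet[bytes]:
    """Conjunction signature of a cluster: the tokens present in every flow."""
    return frozenset(t for t in tokens if all(t in f for f in cluster))


def fp_rate(sig: FrozenSet[bytes], innocuous: Sequence[bytes]) -> float:
    """Fraction of innocuous flows that the signature matches."""
    if not sig:
        return 1.0  # an empty conjunction would match every flow
    return sum(all(t in f for t in sig) for f in innocuous) / len(innocuous)


def hierarchical_cluster(suspicious, innocuous, tokens, max_fp=0.0, min_size=3):
    clusters: List[List[bytes]] = [[f] for f in suspicious]  # one flow per cluster

    def merged_fp(pair: Tuple[int, int]) -> float:
        i, j = pair
        merged = clusters[i] + clusters[j]
        return fp_rate(conjunction_signature(merged, tokens), innocuous)

    while len(clusters) > 1:
        # Score the merged signature of every one of the O(s^2) cluster pairs.
        i, j = min(combinations(range(len(clusters)), 2), key=merged_fp)
        if merged_fp((i, j)) > max_fp:
            break  # even the best merge yields too many false positives
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    # Small leftover clusters are treated as noise and produce no signature.
    return [(c, conjunction_signature(c, tokens))
            for c in clusters if len(c) >= min_size]
```

Merging two different worms, or a worm with a noise flow, leaves only generic tokens such as `GET` and `HTTP` in the conjunction, which match innocuous traffic; the false positive rate therefore keeps such clusters apart.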
15
Performance of each Polygraph signature generation algorithm. Experimental setup: token-extraction threshold K = 3, minimum token length α = 2, and minimum cluster size 3. All experiments were run on desktop machines with 1.4 GHz Intel® Pentium® III processors running Linux kernel 2.4.20. Signatures are generated for polymorphic versions of three real-world exploits: the Apache-Knacker exploit, the ATPhttpd exploit, and the BIND-TSIG exploit. Network traces: several HTTP and DNS network traces are used as input to, and to evaluate, Polygraph signature generation.
16
Results: single polymorphic worm, Apache-Knacker signatures. For each algorithm, the correct signature is generated 100% of the time for all experiments where the suspicious pool size is greater than 2, and 0% of the time where the suspicious pool size is only 2.
17
Results (contd.): single polymorphic worm, BIND-TSIG signatures. These signatures were successfully generated for suspicious pools containing at least 3 worm samples.
18
Results (contd.): single polymorphic worm plus noise. False negatives: the clustering-based signatures produce 0% false negatives, while the Bayes signatures do so only until the noise level exceeds 80%, at which point they cause 100% false negatives. Figures (a) and (b) show the additional false positives that result from the addition of noise.
19
Results (contd.): multiple polymorphic worms plus noise. False negatives: similar to the single polymorphic worm plus noise case. False positives: very similar to the single polymorphic worm plus noise case when there is only one type of worm in the suspicious pool.
20
Potential attacks on Polygraph. Overtraining attacks: the conjunction and token-subsequence algorithms are designed to extract the most specific signature possible from a worm, and an attacker may try to exploit this property so that the generated signature is not general enough. Innocuous pool poisoning: an attacker could determine what signatures Polygraph would generate for a worm, then create otherwise innocuous flows that match these signatures and try to get them into Polygraph’s innocuous flow pool. Long-tail attack: an exploit may already have occurred by the time a full signature match is seen.
21
Strengths: The paper introduces a preventive measure against polymorphic worms. The signature generation technique is automated. Since the algorithms work efficiently for polymorphic worms, including situations where more than one worm may be present in the data flow, the approach is also practical.
22
Weaknesses: Each signature generation algorithm, applied individually, can be evaded. By the time Polygraph produces a signature, the vulnerable host may already be infected.
23
Suggested improvement: all three algorithms can be run simultaneously, and the signature with the fewest false positives and false negatives used.
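This suggested improvement amounts to a selection step. A minimal Python sketch, assuming each candidate signature (conjunction, token-subsequence, Bayes) is wrapped as a predicate over flow bytes and scored on labeled worm and innocuous flows. Summing the FP and FN counts is one possible reading of “fewest false positives and false negatives”, and the names `Matcher`, `evaluate` and `pick_best` are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical wrapper type: any signature exposed as "does this flow match?".
Matcher = Callable[[bytes], bool]


def evaluate(match: Matcher, worm_flows: Sequence[bytes],
             innocuous_flows: Sequence[bytes]) -> Tuple[int, int]:
    """Return (false positives, false negatives) of one signature."""
    fp = sum(match(f) for f in innocuous_flows)
    fn = sum(not match(f) for f in worm_flows)
    return fp, fn


def pick_best(candidates: Dict[str, Matcher], worm_flows: Sequence[bytes],
              innocuous_flows: Sequence[bytes]) -> str:
    """Name of the candidate with the smallest combined FP + FN count."""
    return min(candidates,
               key=lambda name: sum(evaluate(candidates[name], worm_flows,
                                             innocuous_flows)))
```

Weighting false positives more heavily than false negatives (or the reverse) would only change the `key` function; the selection logic stays the same.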