Published by Steven Ray. Modified over 9 years ago.
1
Hunting for Metamorphic Engines
Wing Wong
Mark Stamp
2
In This Paper…
Analyze metamorphic malware
  o Hacker-produced metamorphic code
Measure similarity of software
  o Based on n-gram analysis
Compute scores
  o Based on n-grams and
  o Based on HMMs
This paper is a baseline for future work
3
Motivation
Many virus construction kits available
  o Many can produce metamorphic code
So anybody can create a “new” version of existing malware
  o Virtually no technical expertise required
How “effective” is the resulting metamorphic code?
Can we detect metamorphic malware?
4
Background
Encrypted, polymorphic, metamorphic
  o Metamorphic == body polymorphic
Metamorphic vs cloned software
  o Clone is the norm, but metamorphic could offer advantages to the good guy too…
From the theory, we know malware detection is NP-complete
  o And metamorphic is at least as hard
  o But what about the practical situation?
5
Metamorphism
Metamorphic code changes its “shape”
Well-known examples
  o W95/Regswap
  o W32/Ghost
  o W95/Zperm
  o MetaPHOR
6
Metamorphism
General techniques available
  o Insertion
  o Substitution
  o Transposition
  o Deletion
Some easier than others
Some more effective against certain detection strategies
7
Virus Construction Kits
In this paper, we consider
  o PS-MPC (Phalcon/Skism Mass Produced Code generator)
  o G2 (Second Generation virus generator)
  o MPCGEN (Mass Produced Code GENerator)
  o NGVCK (Next Generation Virus Construction Kit)
  o VCL32 (Virus Creation Lab for Win32)
8
Virus Construction Kits
Did not consider MetaPHOR
  o Difficult to work with, finicky
All of the kits we consider claim to be metamorphic
Are they really?
  o How can we measure “metamorphism”?
If they are highly metamorphic, can we still detect them?
9
Brief Review of Malware Detection
First generation
  o Signature scanning, wildcards OK
Second generation
  o Approximate signature scanning; e.g., ignore NOP instructions
Code emulation
Heuristic analysis
  o Static or dynamic, false positives…
10
Machine Learning
Consider the following
  o Data Mining, Neural Networks, HMMs
Data Mining
  o Malware-related previous work
  o Generic approach
Neural Networks
  o Previous work based on byte trigrams
  o Developed and used at IBM
11
Hidden Markov Models
Train HMM on metamorphic family
Then we can score any file to see how “close” it is to the family
What to use to train such an HMM?
  o Raw bytes in exe?
  o Disassembled code?
  o Opcode sequence?
More on this later…
12
Software Similarity
How to quantify metamorphism?
In general, how to measure similarity of software?
Given program 1 and program 2…
We develop a score
  o Score of 0 means “no similarity”
  o Score of 1 means “virtually identical”
13
N-gram Similarity
Given executable files X and Y
Extract opcode sequences from each
  o Suppose X has n opcodes
  o Suppose Y has m opcodes
How to compare the sequences?
Many possible ways; here we use n-gram analysis
  o That is, we compare subsequences
14
N-gram Similarity
Extracted opcode sequences
  o X = (x_0, x_1, …, x_{n-1}) and Y = (y_0, y_1, …, y_{m-1})
Compare subsequences of length k
  o Then x_i, x_{i+1}, …, x_{i+k-1} matches y_j, y_{j+1}, …, y_{j+k-1} if they are the same in any order
  o For each such match, plot the point (i, j)
  o Remove any segments of fewer than p points
Then score = (fraction of x axis covered + fraction of y axis covered) / 2
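The scoring steps on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it treats a "segment" as a run of consecutive matches along a diagonal, measures axis coverage over window start positions, and uses the illustrative defaults `k=3` and `p=5`. The function name `ngram_similarity` is hypothetical.

```python
from collections import defaultdict

def ngram_similarity(x, y, k=3, p=5):
    """Similarity in [0, 1] between two opcode sequences (lists of strings)."""
    n, m = len(x), len(y)
    if n < k or m < k:
        return 0.0
    # Index every length-k window of Y by its sorted contents, so that
    # windows containing the same opcodes in any order compare equal.
    index = defaultdict(list)
    for j in range(m - k + 1):
        index[tuple(sorted(y[j:j + k]))].append(j)
    # Each match is a point (i, j) on the plot.
    matches = set()
    for i in range(n - k + 1):
        for j in index.get(tuple(sorted(x[i:i + k])), ()):
            matches.add((i, j))
    # Assumption: a segment is a diagonal run of consecutive points.
    covered_x, covered_y = set(), set()
    for (i, j) in matches:
        if (i - 1, j - 1) in matches:
            continue  # not the start of a run
        length = 0
        while (i + length, j + length) in matches:
            length += 1
        if length >= p:  # drop segments shorter than p points
            for t in range(length):
                covered_x.add(i + t)
                covered_y.add(j + t)
    # Assumption: coverage is the fraction of window positions covered.
    frac_x = len(covered_x) / (n - k + 1)
    frac_y = len(covered_y) / (m - k + 1)
    return (frac_x + frac_y) / 2
```

Under these assumptions, comparing a sequence of at least k + p - 1 opcodes with itself yields a solid main diagonal and a score of 1, and two sequences with no opcodes in common score 0.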
15
N-gram Similarity Example
16
N-gram Similarity
Score is between 0 and 1
If program X is identical to program Y
  o Main diagonal is a solid line
  o And score = 1
Minimum score is 0
The smaller the score, the less similar the programs are
17
Typical N-gram Similarity: normal (Cygwin utility) files
18
Typical N-gram Similarity: NGVCK
19
Typical N-gram Similarity: G2
20
N-gram Similarity
Compare members of a “family” with each other
21
N-gram Similarity
In graphical form…
22
N-gram Similarity
Conclusion?
G2 viruses more similar to each other than expected
  o So, they are not very metamorphic
  o Ditto for most of the other generators
But, NGVCK viruses more different from each other than expected
  o So, they are highly metamorphic
Implication wrt signature detection?
23
NGVCK Similarity
Compare NGVCK to other families…
24
NGVCK Similarity
Conclusion?
NGVCK viruses very different from each other
  o Implies highly metamorphic…
  o …so, signature detection will fail
But NGVCK viruses are even more different from normal files
  o Then what about detection?
25
Aside: Similar Similarity Measures to Consider?
Given opcode sequences
  o Edit distance
  o Other sequence comparison techniques
  o Statistical measures
Considering raw bytes
  o Statistical measures
  o Entropy and other “structural” measures
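As one illustration of the first alternative listed, edit distance can be computed directly on opcode sequences. The sketch below is the standard Levenshtein dynamic program with unit costs; the function name `edit_distance` and the choice of unit costs are assumptions, since the slides do not specify a variant.

```python
def edit_distance(a, b):
    """Levenshtein distance between two opcode sequences (lists of strings)."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, ai in enumerate(a, 1):
        cur = [i]
        for j, bj in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ai
                cur[j - 1] + 1,            # insert bj
                prev[j - 1] + (ai != bj),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]
```

A smaller distance means more similar sequences, so unlike the n-gram score it would need rescaling before it could be read on the same 0-to-1 scale.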
26
Hidden Markov Models
Generic view of HMM
27
HMM Notation
28
HMM for Metamorphic Detection
Train HMM
  o Extract opcodes from family executables
  o Append opcode sequences
  o Train a model, i.e., determine matrices
Use trained HMM to score files
  o Given a file, extract opcode sequence
  o Score sequence against the model
  o Compare to predetermined threshold
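The data-preparation part of the training step (extract, then append) can be sketched as follows. This is a minimal sketch, not the authors' tooling: it assumes opcode sequences have already been extracted as lists of mnemonic strings, and the names `build_training_data` and `encode`, as well as the catch-all `<other>` symbol for opcodes not seen in training, are illustrative assumptions.

```python
def build_training_data(family_sequences):
    """Map opcodes to integer symbols and append all sequences into one."""
    table = {"<other>": 0}  # assumed catch-all for opcodes unseen in training
    observations = []
    for seq in family_sequences:
        for opcode in seq:
            if opcode not in table:
                table[opcode] = len(table)
            observations.append(table[opcode])
    return table, observations

def encode(seq, table):
    """Encode a file's opcode sequence with the training symbol table."""
    return [table.get(op, table["<other>"]) for op in seq]

# Hypothetical usage with two tiny "family" sequences.
table, train_obs = build_training_data([["mov", "add"], ["mov", "ret"]])
test_obs = encode(["mov", "xor"], table)
```

The concatenated `train_obs` list is what a training routine would consume to determine the model matrices, and `test_obs` is what would later be scored against the trained model.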
29
HMM Scoring: Fine Points
Score computed as log likelihood of the scored sequence
  o We normalize score to “log likelihood per opcode” (LLPO)
  o Why?
How to quantify effectiveness?
  o ROC curves are very useful
  o Specifically, area under ROC curve (AUC)
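A sketch of the LLPO score may help with the "Why?" above. It uses the standard scaled forward algorithm and assumes a discrete HMM given as an initial distribution `pi`, a transition matrix `A`, and an emission matrix `B` with integer-encoded opcodes; these names and the function name are illustrative, not taken from the paper.

```python
import math

def log_likelihood_per_opcode(obs, pi, A, B):
    """LLPO of an integer-encoded opcode sequence under a discrete HMM.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n_states = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        log_prob += math.log(scale)           # accumulate log P(obs)
        alpha = [a / scale for a in alpha]    # rescale to avoid underflow
    return log_prob / len(obs)                # normalize by sequence length
```

The raw log likelihood grows more negative with every additional opcode, so long files would score lower than short ones regardless of content. Dividing by the length makes scores of files of different sizes comparable against a single threshold.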
30
Results
HMM scoring for NGVCK family
31
HMM Scoring: Bottom Line
Signature detection works for the metamorphic families, except NGVCK
For NGVCK, we can use HMM
  o Classification is 100% when compared to normal (benign) files
  o Some misclassifications of other malware (is that good or bad?)
Should include ROC curves, AUC, …
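Since the slide notes that ROC curves and AUC should be included, here is a minimal sketch of how AUC could be computed from two lists of scores. It uses the pairwise interpretation of AUC (the probability that a randomly chosen family file scores higher than a randomly chosen benign file, counting ties as one half). The function name and the scores in the usage lines are made up for illustration and are not results from the paper.

```python
def roc_auc(family_scores, benign_scores):
    """Area under the ROC curve, assuming higher score means 'family'."""
    wins = 0.0
    for f in family_scores:
        for b in benign_scores:
            if f > b:
                wins += 1.0
            elif f == b:
                wins += 0.5  # ties count as half
    return wins / (len(family_scores) * len(benign_scores))

# Hypothetical LLPO-style scores: perfect separation gives AUC = 1.0.
auc = roc_auc([-2.1, -2.4, -2.0], [-6.3, -5.8, -7.1])
```

An AUC of 1.0 corresponds to the "100% classification" case, where some threshold separates the two sets perfectly, while 0.5 corresponds to scores that carry no information.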
32
HMM States: 3 State Model
33
N-gram Score
Can also score files using N-grams
Randomly select NGVCK file
  o Extract its opcode sequence
Given a file we want to score
  o Extract its opcode sequence
  o N-gram similarity to NGVCK sequence
  o Higher similarity, classify as NGVCK
  o Lower similarity, classify as “not NGVCK”
34
N-gram Score
Results?
For NGVCK, obtain ideal separation
  o There exists a threshold for which…
  o …we can separate NGVCK from normal
Surprisingly strong results
  o For such a simple similarity score
Why does this work?
  o We come back to this at the end…
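"Ideal separation" can be checked mechanically: a separating threshold exists exactly when the lowest family score exceeds the highest normal score. The sketch below assumes higher similarity means "NGVCK"; the function name and the midpoint choice are illustrative assumptions.

```python
def separating_threshold(family_scores, normal_scores):
    """Return a threshold that separates the two sets, or None if none exists."""
    highest_normal = max(normal_scores)
    lowest_family = min(family_scores)
    if lowest_family > highest_normal:
        # Any value strictly between the two works; take the midpoint.
        return (highest_normal + lowest_family) / 2
    return None
```

Files scoring above the returned threshold would be classified as NGVCK and files below it as "not NGVCK".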
35
Compare to Commercial AV
Tested the following on our virus sets
  o eTrust, avast!, AVG
These scanners detected most of the viruses from weak families
  o That is, G2, VCL32, etc.
But none of the NGVCK viruses were detected by any of the 3 scanners
36
Conclusion
HMM effective at detecting the highly metamorphic NGVCK malware family
N-gram similarity also effective
NGVCK not detected by commercial AV
So, this detection improves the state of the art
Practical considerations?
37
Lessons Learned?
Why can we detect NGVCK family?
  o In spite of high metamorphism, code is statistically different from normal
“Improved” metamorphic malware?
  o Metamorphism must be sufficient to evade signature detection
  o But, metamorphic family must be statistically similar to normal
38
Future Work
Build a better metamorphic generator
  o Some progress here, but still detectable using other detection methods
  o Still need better generators…
Develop and test other detection strategies
  o Lots of work done here too
  o But lots more to do
39
References
W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology 2(3):211-229, 2006
M. Stamp, A revealing introduction to hidden Markov models