Hunting for Metamorphic Engines Wing Wong Mark Stamp Hunting for Metamorphic Engines 1.

1 Hunting for Metamorphic Engines
Wing Wong, Mark Stamp

2 In This Paper…
• Analyze metamorphic malware
  o Hacker-produced metamorphic code
• Measure similarity of software
  o Based on n-gram analysis
• Compute scores
  o Based on n-grams, and
  o Based on HMMs
• This paper is a baseline for future work

3 Motivation
• Many virus construction kits available
  o Many can produce metamorphic code
• So anybody can create a “new” version of existing malware
  o Virtually no technical expertise required
• How “effective” is the resulting metamorphic code?
• Can we detect metamorphic malware?

4 Background
• Encrypted, polymorphic, metamorphic
  o Metamorphic == body polymorphic
• Metamorphic vs cloned software
  o Clone is the norm, but metamorphic could offer advantages to the good guy too…
• From the theory, we know malware detection is NP-complete
  o And metamorphic is at least as hard
  o But what about the practical situation?

5 Metamorphism
• Metamorphic code changes its “shape”
• Well-known examples
  o W95/Regswap
  o W32/Ghost
  o W95/Zperm
  o MetaPHOR

6 Metamorphism
• General techniques available
  o Insertion
  o Substitution
  o Transposition
  o Deletion
• Some easier than others
• Some more effective against certain detection strategies

7 Virus Construction Kits
• In this paper, we consider
  o PS-MPC (Phalcon/Skism Mass Produced Code generator)
  o G2 (Second Generation virus generator)
  o MPCGEN (Mass Produced Code GENerator)
  o NGVCK (Next Generation Virus Construction Kit)
  o VCL32 (Virus Creation Lab for Win32)

8 Virus Construction Kits
• Did not consider MetaPHOR
  o Difficult to work with, finicky
• All of these claim to be metamorphic
• Are they really?
  o How can we measure “metamorphism”?
• If they are highly metamorphic, can we still detect them?

9 Brief Review of Malware Detection
• First generation
  o Signature scanning, wildcards OK
• Second generation
  o Approximate signature scanning; e.g., ignore NOP instructions
• Code emulation
• Heuristic analysis
  o Static or dynamic, false positives…
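
To make the first- and second-generation ideas concrete, here is a minimal sketch of signature scanning with wildcards that can optionally ignore NOP bytes. The function name `find_signature`, the use of `None` as a wildcard, and treating every 0x90 byte as a NOP are assumptions made for illustration; a real scanner would work on disassembled instructions rather than filtering raw bytes.

```python
from typing import Optional, Sequence

NOP = 0x90  # single-byte x86 NOP; assumed here to be the only filler that is skipped


def find_signature(data: bytes,
                   signature: Sequence[Optional[int]],
                   skip_nops: bool = False) -> int:
    """Return the original offset of the first match of `signature`, or -1.

    A `None` entry in the signature is a wildcard matching any single byte.
    With skip_nops=True, NOP bytes in `data` are ignored before matching
    (a crude form of approximate signature scanning).
    """
    if not signature:
        return -1
    # Keep track of where each retained byte came from in the original data.
    if skip_nops:
        positions = [i for i, b in enumerate(data) if b != NOP]
    else:
        positions = list(range(len(data)))
    stream = [data[i] for i in positions]

    for start in range(len(stream) - len(signature) + 1):
        if all(s is None or s == stream[start + k]
               for k, s in enumerate(signature)):
            return positions[start]
    return -1


# Hypothetical byte string: a code fragment padded with NOPs.
sample = bytes([0xB8, 0x90, 0x01, 0x90, 0xCD, 0x21])
pattern = [0xB8, None, 0xCD, 0x21]
exact_hit = find_signature(sample, pattern)
approx_hit = find_signature(sample, pattern, skip_nops=True)
```

In this made-up sample the exact scan misses the pattern because of the inserted NOPs, while the NOP-skipping scan finds it at the start of the data.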

10 Machine Learning
• Consider the following
  o Data Mining, Neural Networks, HMMs
• Data Mining
  o Malware-related previous work
  o Generic approach
• Neural Networks
  o Previous work based on byte trigrams
  o Developed and used at IBM

11 Hidden Markov Models
• Train HMM on metamorphic family
• Then we can score any file to see how “close” it is to the family
• What to use to train such an HMM?
  o Raw bytes in exe?
  o Disassembled code?
  o Opcode sequence?
• More on this later…

12 Software Similarity
• How to quantify metamorphism?
• In general, how to measure similarity of software?
• Given program 1 and program 2…
• We develop a score
  o Score of 0 means “no similarity”
  o Score of 1 means “virtually identical”

13 N-gram Similarity
• Given executable files X and Y
• Extract opcode sequences from each
  o Suppose X has n opcodes
  o Suppose Y has m opcodes
• How to compare the sequences?
• Many possible ways; here we use n-gram analysis
  o That is, we compare subsequences

14 N-gram Similarity
• Extracted opcode sequences
  o $X = (x_0, x_1, \ldots, x_{n-1})$ and $Y = (y_0, y_1, \ldots, y_{m-1})$
• Compare subsequences of length $k$
  o Then $x_i, x_{i+1}, \ldots, x_{i+k-1}$ matches $y_j, y_{j+1}, \ldots, y_{j+k-1}$ if they are the same in any order
  o For each such match, plot the point $(i, j)$
  o Remove any segments shorter than $p$ points
• Then score = (fraction of x axis covered + fraction of y axis covered) / 2
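
The scoring procedure on this slide can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `ngram_similarity`, the default values `k=3` and `p=5`, the restriction to diagonal segments (consecutive points $(i, j), (i+1, j+1), \ldots$), and measuring coverage as a fraction of window start positions are all assumptions.

```python
from collections import defaultdict
from typing import Sequence


def ngram_similarity(x: Sequence[str], y: Sequence[str],
                     k: int = 3, p: int = 5) -> float:
    """Score in [0, 1]: average fraction of each axis covered by long matches."""
    nx, ny = len(x) - k + 1, len(y) - k + 1  # number of length-k windows
    if nx <= 0 or ny <= 0:
        return 0.0

    # Index the windows of y by their sorted contents, so that
    # "the same in any order" becomes a dictionary lookup.
    windows_y = defaultdict(list)
    for j in range(ny):
        windows_y[tuple(sorted(y[j:j + k]))].append(j)

    # Every matching pair of windows is a plotted point (i, j).
    matches = set()
    for i in range(nx):
        for j in windows_y.get(tuple(sorted(x[i:i + k])), ()):
            matches.add((i, j))

    # Walk each diagonal segment from its first point; keep it only if it
    # has at least p points, and record which positions it covers.
    covered_x, covered_y = set(), set()
    for (i, j) in matches:
        if (i - 1, j - 1) in matches:
            continue  # not the start of a segment
        length = 0
        while (i + length, j + length) in matches:
            length += 1
        if length >= p:
            covered_x.update(range(i, i + length))
            covered_y.update(range(j, j + length))

    return (len(covered_x) / nx + len(covered_y) / ny) / 2


# Hypothetical opcode sequence used only to exercise the function.
opcodes = ["mov", "push", "call", "add", "sub", "pop", "ret", "jmp", "cmp", "xor"]
same = ngram_similarity(opcodes, opcodes)
reversed_order = ngram_similarity(opcodes, list(reversed(opcodes)))
```

Comparing a sequence with itself yields one solid main diagonal. Comparing it with its reversal yields only isolated points, which the length threshold `p` discards.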

15 N-gram Similarity Example

16 N-gram Similarity
• Score is between 0 and 1
• If program X identical to program Y
  o Main diagonal is a solid line
  o And score = 1
• Minimum score is 0
• The smaller the score, the less similar the programs

17 Typical N-gram Similarity
• Normal (cygwin utility) files

18 Typical N-gram Similarity
• NGVCK

19 Typical N-gram Similarity
• G2

20 N-gram Similarity
• Compare members of a “family” with each other

21 N-gram Similarity
• In graphical form…

22 N-gram Similarity Conclusion?
• G2 viruses more similar to each other than expected
  o So, they are not very metamorphic
  o Ditto for most of the other generators
• But, NGVCK viruses more different from each other than expected
  o So, they are highly metamorphic
• Implication wrt signature detection?

23 NGVCK Similarity
• Compare NGVCK to other families…

24 NGVCK Similarity Conclusion?
• NGVCK viruses very different from each other
  o Implies highly metamorphic…
  o …so, signature detection will fail
• But NGVCK viruses are even more different from normal files
  o Then what about detection?

25 Aside: Similar Similarity Measures to Consider?
• Given opcode sequences
  o Edit distance
  o Other sequence comparison techniques
  o Statistical measures
• Considering raw bytes
  o Statistical measures
  o Entropy and other “structural” measures
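
As one example of the alternatives listed here, edit distance applies directly to opcode sequences. The sketch below uses the standard Levenshtein dynamic program; the function names and the normalization to a 0-to-1 similarity are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ai in enumerate(a, start=1):
        curr = [i]
        for j, bj in enumerate(b, start=1):
            cost = 0 if ai == bj else 1
            curr.append(min(prev[j] + 1,          # delete ai
                            curr[j - 1] + 1,      # insert bj
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence, b: Sequence) -> float:
    """Map the distance to [0, 1], where 1 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical opcode sequences: the second has one inserted NOP.
original = ["mov", "add", "ret"]
padded = ["mov", "nop", "add", "ret"]
distance = edit_distance(original, padded)
```

The quadratic running time makes this practical only for fairly short sequences.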

26 Hidden Markov Models
• Generic view of HMM

27 HMM Notation

28 HMM for Metamorphic Detection
• Train HMM
  o Extract opcodes from family executables
  o Append opcode sequences
  o Train a model, i.e., determine matrices
• Use trained HMM to score files
  o Given a file, extract opcode sequence
  o Score sequence against the model
  o Compare to predetermined threshold
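
The "append opcode sequences" step can be sketched as below. It assumes the opcodes have already been extracted by a disassembler; the function name `build_training_sequence` and the first-seen integer encoding are hypothetical choices, and the training step itself (for example Baum-Welch re-estimation) is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def build_training_sequence(
        opcode_sequences: Sequence[Sequence[str]]
) -> Tuple[List[int], Dict[str, int]]:
    """Concatenate per-file opcode sequences into one observation sequence.

    Each distinct opcode is mapped to an integer symbol in order of first
    appearance, which is the form an HMM trainer typically expects.
    """
    symbol_of: Dict[str, int] = {}
    observations: List[int] = []
    for sequence in opcode_sequences:
        for opcode in sequence:
            # setdefault assigns the next free index to an unseen opcode.
            observations.append(symbol_of.setdefault(opcode, len(symbol_of)))
    return observations, symbol_of


# Hypothetical opcode sequences from two family members.
family = [["mov", "add"], ["add", "ret"]]
observations, symbol_of = build_training_sequence(family)
```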

29 HMM Scoring: Fine Points
• Score computed as log likelihood of the scored sequence
  o We normalize score to “log likelihood per opcode” (LLPO)
  o Why?
• How to quantify effectiveness?
  o ROC curves are very useful
  o Specifically, area under ROC curve (AUC)
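
A minimal sketch of LLPO scoring with the scaled forward algorithm follows. It assumes the usual HMM parameters: a state transition matrix `A`, an observation matrix `B` indexed by state and symbol, and an initial distribution `pi`. The function name and the toy model are hypothetical. Dividing by the sequence length matters because the raw log likelihood shrinks as a sequence grows, so unnormalized scores of files with different lengths are not comparable.

```python
import math
from typing import List, Sequence


def log_likelihood_per_opcode(obs: Sequence[int],
                              A: List[List[float]],
                              B: List[List[float]],
                              pi: List[float]) -> float:
    """Return log P(obs | model) divided by len(obs)."""
    if not obs:
        raise ValueError("cannot score an empty sequence")
    n_states = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Forward step: propagate through A, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n_states))
                     * B[j][obs[t]]
                     for j in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns zero probability
        # Rescale to avoid underflow; the logs of the scales sum to log P.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob / len(obs)


# Toy two-state model (hypothetical values): every symbol has probability 0.5.
A = [[0.5, 0.5], [0.5, 0.5]]
B = [[1.0, 0.0], [0.0, 1.0]]
pi = [0.5, 0.5]
short_score = log_likelihood_per_opcode([0, 1, 1, 0], A, B, pi)
long_score = log_likelihood_per_opcode([0, 1, 1, 0, 1, 0, 0, 1], A, B, pi)
```

For this toy model the per-opcode score is the same for the short and the long sequence, whereas the unnormalized log likelihoods would differ by a factor of two.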

30 Results
• HMM scoring for NGVCK family

31 HMM Scoring: Bottom Line
• Signature detection works for the metamorphic families, except NGVCK
• For NGVCK, we can use HMM
  o Classification is 100% when compared to normal (benign) files
  o Some misclassifications of other malware (is that good or bad?)
• Should include ROC curves, AUC, …
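
For the suggested AUC measurement, one simple approach is the pairwise (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen family file scores higher than a randomly chosen benign file, with ties counted as one half. The sketch below assumes that higher scores indicate family membership; the function name and the scores are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def auc(positive_scores: Sequence[float],
        negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0   # correctly ranked pair
            elif pos == neg:
                wins += 0.5   # a tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical LLPO scores: family files score higher than benign files.
family_llpo = [-2.1, -2.3, -2.6]
benign_llpo = [-5.0, -6.2, -4.8]
separation = auc(family_llpo, benign_llpo)
```

An AUC of 1.0 corresponds to the ideal separation described on this slide, and 0.5 corresponds to scores that carry no information.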

32 HMM States: 3 State Model

33 N-gram Score
• Can also score files using N-grams
• Randomly select NGVCK file
  o Extract its opcode sequence
• Given a file we want to score
  o Extract its opcode sequence
  o N-gram similarity to NGVCK sequence
  o Higher similarity, classify as NGVCK
  o Lower similarity, classify as “not NGVCK”

34 N-gram Score Results?
• For NGVCK, obtain ideal separation
  o There exists a threshold for which…
  o …we can separate NGVCK from normal
• Surprisingly strong results
  o For such a simple similarity score
• Why does this work?
  o We come back to this at the end…
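
The threshold idea can be sketched as follows, assuming the similarity scores have already been computed against the reference NGVCK file. The function names, the midpoint rule for choosing the threshold, and the example scores are hypothetical and are not values from the paper.

```python
from typing import Optional, Sequence


def separating_threshold(family_scores: Sequence[float],
                         normal_scores: Sequence[float]) -> Optional[float]:
    """Return a threshold that perfectly separates the two sets, if any.

    Ideal separation exists exactly when every family score exceeds
    every normal score; the midpoint of the gap is returned in that case.
    """
    lowest_family = min(family_scores)
    highest_normal = max(normal_scores)
    if lowest_family <= highest_normal:
        return None  # the score ranges overlap, so no threshold works
    return (lowest_family + highest_normal) / 2


def classify(score: float, threshold: float) -> str:
    """Higher similarity to the reference file means NGVCK."""
    return "NGVCK" if score >= threshold else "not NGVCK"


# Made-up similarity scores against a reference NGVCK file.
ngvck_scores = [0.75, 0.5]
normal_scores = [0.25, 0.125]
threshold = separating_threshold(ngvck_scores, normal_scores)
```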

35 Compare to Commercial AV
• Tested the following on our virus sets
  o eTrust, avast!, AVG
• These scanners detected most of the viruses from weak families
  o That is, G2, VCL32, etc.
• But none of the NGVCK viruses detected by any of the 3 scanners

36 Conclusion
• HMM effective at detecting the highly metamorphic NGVCK malware family
• N-gram similarity also effective
• NGVCK not detected by commercial AV
• So, this detection improves the state of the art
• Practical considerations?

37 Lessons Learned?
• Why can we detect NGVCK family?
• In spite of high metamorphism, code is statistically different from normal
• “Improved” metamorphic malware?
• Metamorphism must be sufficient to evade signature detection
• But, metamorphic family must be statistically similar to normal

38 Future Work
• Build a better metamorphic generator
  o Some progress here, but still detectable using other detection methods
  o Still need better generators…
• Develop and test other detection strategies
  o Lots of work done here too
  o But lots more to do

39 References
• W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology 2(3):211-229, 2006
• M. Stamp, A revealing introduction to hidden Markov models

