PHMMs for Metamorphic Detection
Mark Stamp

2 Viruses
- Viruses and worms: types of malware
  - Various definitions are used
  - For our purposes, "virus" is used generically
- How to detect malware?
  - Signature detection is used most often
  - In simplest form, search for a string of bits found in the malware
  - Could also include wildcards, heuristics, etc.
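
As a rough illustration of the simplest form of signature detection, here is a minimal Python sketch that scans a file's bytes for a fixed signature in which some positions are wildcards. The signature bytes, the `find_signature` name, and the use of `None` as the wildcard marker are hypothetical choices for this example; real scanners use much faster multi-pattern matching.

```python
from typing import Optional, Sequence

WILDCARD = None  # marks a signature position that matches any byte


def find_signature(data: bytes, signature: Sequence[Optional[int]]) -> int:
    """Return the offset of the first match of `signature` in `data`, or -1.

    Each signature element is either a byte value (0-255) or WILDCARD.
    """
    m = len(signature)
    for start in range(len(data) - m + 1):
        if all(s is WILDCARD or data[start + i] == s
               for i, s in enumerate(signature)):
            return start
    return -1


# Hypothetical signature: 90 E8 ?? C3
sig = [0x90, 0xE8, WILDCARD, 0xC3]
print(find_signature(bytes([0x00, 0x90, 0xE8, 0x7F, 0xC3]), sig))  # 1
print(find_signature(bytes([0x90, 0xE8, 0x7F, 0xC2]), sig))        # -1
```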

3 Metamorphic Viruses
- Metamorphic viruses change "shape"
  - For each instance, the internal structure changes
  - But the function stays the same
- If the change is sufficient, signature detection fails
- In principle, metamorphic malware is among the most difficult to detect
  - But not many have been seen in the wild
  - Why not???

4 Metamorphic Detection
- How to detect metamorphic malware?
- Previous research: HMMs are effective
  - Train a model on opcodes extracted from viruses of a metamorphic "family"
  - Determine a threshold score
  - Then, to score an unknown exe, extract its opcodes and score them against the model
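
The scoring step can be sketched with the standard (scaled) forward algorithm, which computes the log-likelihood of an opcode sequence under a trained HMM. This sketch assumes opcodes have already been mapped to integer symbol indices and that `A`, `B`, and `pi` came from training; the parameter values below are hypothetical, and dividing by the sequence length (so files of different sizes can share one threshold) is an assumption, not necessarily the exact scoring used in this work.

```python
import math
from typing import List, Sequence


def score(A: List[List[float]], B: List[List[float]], pi: List[float],
          obs: Sequence[int]) -> float:
    """Scaled forward algorithm: return log P(obs | model) / len(obs).

    A[i][j]: probability of a transition from hidden state i to state j
    B[i][k]: probability that state i emits symbol k (an opcode index)
    pi[i]:   initial probability of state i
    """
    n = len(pi)
    log_prob = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][o] for i in range(n)]
        else:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                     for i in range(n)]
        c = sum(alpha)                  # P(obs) is the product of these factors
        alpha = [a / c for a in alpha]  # normalize to avoid underflow
        log_prob += math.log(c)
    return log_prob / len(obs)


# Hypothetical 2-state model over a 2-symbol opcode alphabet
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
print(score(A, B, pi, [0, 1]))  # equals log(0.209) / 2
```

An unknown file would then be classified by comparing its score to the threshold chosen during training.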

5 Profile HMM
- A standard HMM does not take positional information into account
- A profile HMM is analogous to defining an HMM at each position in a sequence
  - Position info is taken into account
  - So, a PHMM uses more information
  - This might yield stronger models

6 PHMMs
- Will a PHMM outperform an HMM?
- Possible advantage of PHMM
  - Uses more information…
  - …since position within the sequence is taken into account
- Possible disadvantages of PHMM
  - More complex, more costly to compute
  - Might overfit the data
  - "More" is not always "better"

7 The Plan
1. Extract opcodes from viruses of a metamorphic family
2. Pairwise align the opcode sequences
3. Generate a multiple sequence alignment (MSA) from the pairwise alignments
4. Generate a PHMM from the MSA
5. Determine a threshold and error rates

8 Metamorphic Techniques
- Morphing is usually applied at the asm level
- Many techniques can be used, such as…
  - Equivalent code substitution
    - Register swap
    - Different code, same function
  - Garbage code/dead code insertion
  - Code reordering
    - Subroutine reordering
    - Arbitrary reordering using jumps

9 Metamorphic Techniques
- Opaque predicates
  - A "conditional" that isn't
- By combining several techniques, we can achieve the desired effect
  - Metamorphism sufficient to break signature detection
  - Function of the code remains unchanged
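
The toy sketch below applies two of the techniques above (register swap and garbage `nop` insertion) to a list of assembly strings and shows that a fixed string signature taken from the original no longer matches. The code representation, the signature, and the helper names are hypothetical; the sketch does not verify functional equivalence, which for a register swap assumes both registers are otherwise free.

```python
import random
from typing import List


def swap_registers(code: List[str], r1: str, r2: str) -> List[str]:
    """Register swap: exchange every occurrence of r1 and r2."""
    tmp = "\x00"  # placeholder that never appears in the code
    return [line.replace(r1, tmp).replace(r2, r1).replace(tmp, r2)
            for line in code]


def insert_garbage(code: List[str], n: int, rng: random.Random) -> List[str]:
    """Dead code insertion: put n 'nop' instructions at random positions."""
    out = list(code)
    for _ in range(n):
        out.insert(rng.randrange(len(out) + 1), "nop")
    return out


def contains_signature(code: List[str], signature: str) -> bool:
    """Naive signature check on the assembly text."""
    return signature in "\n".join(code)


original = ["mov eax, 1", "add eax, ebx", "push eax", "call routine"]
signature = "mov eax, 1\nadd eax, ebx"  # hypothetical signature

morphed = insert_garbage(swap_registers(original, "eax", "ebx"), 2,
                         random.Random(0))
print(contains_signature(original, signature))  # True
print(contains_signature(morphed, signature))   # False
```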

10 Metamorphic Example
[Figure: original code, morphed version 1, and morphed version 2]

11 Metamorphic Viruses
- Real-world metamorphic viruses

12 Virus Construction Kits
- Construction kits: anyone can easily build (metamorphic) malware
- The first 2 are not very metamorphic
- But NGVCK is highly metamorphic
- So, we consider NGVCK here

13 AV Techniques
- Signature detection is most popular
- So, of course, virus writers want to evade signature detection
- Metamorphism can provide a strong defense against signature detection

14 HMMs
- See previous presentation

15 PHMMs
- See previous presentation

16 PHMMs
- PHMMs were designed to deal with biological sequences
  - Goal is to find evidence that sequences are related by mutation and selection
- Basic processes usually considered are
  - Substitution: subsequence replaced
  - Insertion: subsequence inserted
  - Deletion: subsequence removed

17 PHMMs and Computer Viruses
- The same basic processes can occur in metamorphic viruses
  - That is, substitution, insertion, deletion
- But we also have to deal with
  - Permutation: re-ordering of the sequence
  - Metamorphics may do lots of permuting
- Permutation can be viewed as a series of insertions/deletions
  - But "close" sequences might then be "far" apart
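
To see why permutation hurts alignment, this sketch counts the minimum number of single-opcode insertions plus deletions needed to turn one sequence into another, computed from the longest common subsequence. The opcode blocks are hypothetical. Swapping two blocks of four distinct opcodes leaves the content unchanged but costs 8 insertions/deletions, the same as replacing one block with entirely different code.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def indel_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of single-element insertions plus deletions, a -> b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


block1 = ["push", "mov", "add", "sub"]
block2 = ["xor", "call", "pop", "ret"]
original = block1 + block2
permuted = block2 + block1  # same opcodes, blocks swapped
print(indel_distance(original, permuted))  # 8
```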

18 Permutation and Alignment
- Permutations are problematic…
- How to deal with this?
  - Maybe we can pre-process sequences
  - But this adds complexity and cost
- More about this later

19 Test Data
- Virus construction kits from VX Heavens
- We generated the following viruses
  - 10 VCL32 viruses
  - 30 PS-MPC viruses
  - 200 NGVCK viruses
- Also, 40 cygwin utilities
  - These serve as "normal" files

20 NGVCK Pairwise Alignment
- Align two NGVCK opcode sequences
- This looks reasonable
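
A pairwise alignment of two opcode sequences can be computed with a global alignment algorithm such as Needleman-Wunsch. This sketch uses a simple linear gap penalty with hypothetical match/mismatch/gap scores (+2, -1, -2); the alignments in this work may have used a different substitution scoring and gap model, so treat these values as placeholders.

```python
from typing import List, Sequence, Tuple

GAP = "-"


def global_align(a: Sequence[str], b: Sequence[str],
                 match: int = 2, mismatch: int = -1, gap: int = -2
                 ) -> Tuple[int, List[str], List[str]]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with GAP marking gaps.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return F[m][n], out_a[::-1], out_b[::-1]


x = ["push", "mov", "add", "pop", "ret"]
y = ["push", "mov", "nop", "add", "pop", "ret"]
total, ax, ay = global_align(x, y)
print(total)  # 8
print(ax)     # ['push', 'mov', '-', 'add', 'pop', 'ret']
```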

21 Gap Percentages
- Recall that, with a PHMM, the more gaps, the weaker the model
- MSAs for metamorphic viruses
- But VCL32 is based on 5 files, PS-MPC on 10, and NGVCK on 20 files
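
Assuming the gap percentage is simply the fraction of all MSA entries that are gap symbols, it can be computed as below. The MSA representation (equal-length rows of opcode strings, with `-` for a gap) and the sample alignment are assumptions for this sketch.

```python
from typing import Sequence


def gap_percentage(msa: Sequence[Sequence[str]], gap: str = "-") -> float:
    """Percentage of all entries in the MSA that are gap symbols."""
    total = sum(len(row) for row in msa)
    gaps = sum(1 for row in msa for sym in row if sym == gap)
    return 100.0 * gaps / total


# Hypothetical 3-sequence MSA
msa = [
    ["push", "mov", "-",   "ret"],
    ["push", "-",   "add", "ret"],
    ["push", "mov", "add", "-"],
]
print(gap_percentage(msa))  # 25.0
```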

22 VCL32
- Using five VCL32 viruses…
  - Generate pairwise alignments
  - Generate MSA
  - Then generate PHMM
- PHMM has 1820 states
  - Can't show the whole model here
  - So, the next slides give 3 states: 126, 127, and 128
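
One common way to build a PHMM from an MSA, described by Durbin et al., is to treat columns in which more than half of the entries are gaps as insert columns and the rest as match columns. The sketch below applies that rule with the threshold as a parameter; whether this exact rule produced the VCL32 model is an assumption. In that architecture, N match columns give N match states, N+1 insert states, and N delete states.

```python
from typing import List, Sequence


def match_columns(msa: Sequence[Sequence[str]], gap: str = "-",
                  max_gap_fraction: float = 0.5) -> List[int]:
    """Indices of MSA columns modelled by match states.

    Columns where more than max_gap_fraction of the entries are gaps
    are treated as insert columns instead.
    """
    n_rows = len(msa)
    cols = []
    for c in range(len(msa[0])):
        gaps = sum(1 for row in msa if row[c] == gap)
        if gaps / n_rows <= max_gap_fraction:
            cols.append(c)
    return cols


# Hypothetical MSA: column 2 is mostly gaps, so it becomes an insert column
msa = [
    ["push", "mov", "-",   "ret"],
    ["push", "-",   "-",   "ret"],
    ["push", "mov", "add", "ret"],
]
cols = match_columns(msa)
n = len(cols)
print(cols)                                        # [0, 1, 3]
print("match/insert/delete states:", n, n + 1, n)  # 3 4 3
```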

23 VCL32 Transition Probabilities
- State transition probabilities
  - The A matrix for states 126, 127, and 128
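
Given the match columns, transition probabilities can be estimated by tracing each aligned sequence through the model and counting transitions. This sketch follows the Durbin et al. construction (Begin treated as M0, the move to End recorded as a final "M" step). As an assumption, it adds one pseudocount to every allowed transition, mirroring the add-one rule used for emissions; the exact pseudocount scheme behind the A matrix here is not stated.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

State = Tuple[str, int]  # ("M", k), ("I", k) or ("D", k)


def transition_probs(msa: Sequence[Sequence[str]], match_cols: List[int],
                     gap: str = "-") -> Dict[State, Dict[str, float]]:
    """Estimate PHMM transition probabilities from an MSA.

    Maps each source state to the probability of moving to the next
    "M", "I" or "D" state (from position L, "M" means End).
    """
    L = len(match_cols)
    match_set = set(match_cols)
    counts: Dict[State, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in msa:
        prev: State = ("M", 0)  # Begin
        k = 0                   # number of match columns passed so far
        for c, sym in enumerate(row):
            if c in match_set:
                k += 1
                cur = ("M", k) if sym != gap else ("D", k)
            elif sym != gap:
                cur = ("I", k)
            else:
                continue  # gap in an insert column: no state is visited
            counts[prev][cur[0]] += 1
            prev = cur
        counts[prev]["M"] += 1  # transition to End
    sources = ([("M", k) for k in range(L + 1)]
               + [("I", k) for k in range(L + 1)]
               + [("D", k) for k in range(1, L + 1)])
    probs: Dict[State, Dict[str, float]] = {}
    for src in sources:
        allowed = ["M", "I"] if src[1] == L else ["M", "I", "D"]
        total = sum(counts[src][t] + 1 for t in allowed)  # add-one pseudocounts
        probs[src] = {t: (counts[src][t] + 1) / total for t in allowed}
    return probs


msa = [
    ["push", "mov", "-",   "ret"],
    ["push", "-",   "-",   "ret"],
    ["push", "mov", "add", "ret"],
]
p = transition_probs(msa, [0, 1, 3])
print(p[("M", 3)])  # {'M': 0.8, 'I': 0.2}
```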

24 VCL32 Emission Probabilities
- Emission probabilities
  - The E matrix
  - States 126, 127, and 128
- Emissions only for match and insert states
- "Add-one" rule (a pseudocount of 1 added to every count) was used here
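
The add-one rule for match-state emissions can be sketched as follows: count the opcodes in each match column, add 1 to the count of every symbol in the alphabet, and normalize. The MSA, alphabet, and match columns below are hypothetical; insert-state emissions could be estimated similarly from the insert columns.

```python
from collections import Counter
from typing import Dict, List, Sequence


def match_emissions(msa: Sequence[Sequence[str]], match_cols: List[int],
                    alphabet: Sequence[str],
                    gap: str = "-") -> List[Dict[str, float]]:
    """Emission probabilities for each match state, using the add-one rule."""
    probs = []
    for c in match_cols:
        counts = Counter(row[c] for row in msa if row[c] != gap)
        total = sum(counts.values()) + len(alphabet)  # one pseudocount per symbol
        probs.append({sym: (counts[sym] + 1) / total for sym in alphabet})
    return probs


msa = [
    ["push", "mov", "-",   "ret"],
    ["push", "-",   "-",   "ret"],
    ["push", "mov", "add", "ret"],
]
alphabet = ["push", "mov", "add", "ret"]
E = match_emissions(msa, [0, 1, 3], alphabet)
print(E[1]["mov"])  # 0.5: (2 observed + 1) / (2 + 4)
```

Without the pseudocounts, any opcode never seen in a column would get probability 0, so a single unseen opcode would drive a file's score to minus infinity.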

25 Results
- Typical PHMM results for VCL32
- Can set threshold for 100% detection
  - It doesn't get any better than that!
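
Setting the threshold and measuring error rates can be sketched as below. The scores are hypothetical; the sketch assumes higher scores mean "more like the virus family" and picks the observed score that minimizes the sum of the false negative and false positive rates. Other criteria, such as requiring 100% detection, are equally reasonable.

```python
from typing import Sequence, Tuple


def error_rates(virus_scores: Sequence[float], normal_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate).

    A file is classified as a family virus when its score >= threshold.
    """
    fn = sum(1 for s in virus_scores if s < threshold)
    fp = sum(1 for s in normal_scores if s >= threshold)
    return fn / len(virus_scores), fp / len(normal_scores)


def best_threshold(virus_scores: Sequence[float],
                   normal_scores: Sequence[float]) -> float:
    """Observed score that minimizes FN rate + FP rate."""
    candidates = sorted(set(virus_scores) | set(normal_scores))
    return min(candidates,
               key=lambda t: sum(error_rates(virus_scores, normal_scores, t)))


# Hypothetical per-opcode log-likelihood scores
virus = [-2.1, -2.3, -1.9]
normal = [-5.0, -4.2, -6.1]
t = best_threshold(virus, normal)
print(t, error_rates(virus, normal, t))  # -2.3 (0.0, 0.0)
```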

26 Results
- Typical PS-MPC results using PHMM
- Again, perfect detection

27 Results
- But VCL32 and PS-MPC are easy cases
  - Not very metamorphic
  - Probably detectable using signatures
- In contrast, NGVCK is highly metamorphic
  - So, NGVCK is the important test
  - See next slides

28 Results
- Typical results for NGVCK
- Note that normal files score higher than NGVCK!
  - This is bad!

29 Pre-Processing
- For NGVCK, is there any hope?
- Can try pre-processing
  - Goal is to undo some of the effect of permutation
- Able to reduce gap percentage in MSA
  - Before, gap percentage was 88.3%
  - After, gap percentage is 44.9%
- Big improvement, but is it big enough?

30 Results
- NGVCK with pre-processing
- Much better, but not good enough
  - Error rate is still significant

31 Conclusions
- HMMs were developed in the 1960s
  - Standard machine learning technique
  - Many applications
- PHMMs are relatively recent
  - Developed for biological applications
- Here, a novel application of PHMMs
  - 100% detection for some examples…
  - …poor detection for others

32 Possible Improvements
- Improved pre-processing
  - To better account for permutation
- Local alignment
  - For example, align subroutines
- Baum-Welch re-estimation of the PHMM obtained from the MSA
- Other???

33 Last Word
- It is very trendy to apply biological analogies to information security
- On the one hand…
  - Results here provide evidence supporting the trend of looking to biological analogies
- On the other hand…
  - Results here are a "cautionary tale against applying biological analogies too literally"

34 References
- Profile hidden Markov models for metamorphic virus detection, S. Attaluri, S. McGhee and M. Stamp, Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp. 151-169
- Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, R. Durbin et al., Cambridge University Press, 1998

