PHMMs for Metamorphic Detection
Mark Stamp
Viruses
Viruses and worms are types of malware. Various definitions are used; for our purposes, “virus” is used generically. How to detect malware? Signature detection is used most often. In its simplest form, this means searching for a string of bits found in the malware. Signatures can also include wildcards, heuristics, etc.
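To make the simplest form of signature detection concrete, here is a minimal sketch of a byte-string search with single-byte wildcards. The signature bytes, the sample data, and the name `find_signature` are hypothetical and chosen only for illustration; they do not come from any real scanner or virus.

```python
from typing import Optional, Sequence

def find_signature(data: bytes, signature: Sequence[Optional[int]]) -> int:
    """Return the first offset where the signature matches, or -1.

    A signature entry of None is a wildcard that matches any single byte.
    """
    n, m = len(data), len(signature)
    for start in range(n - m + 1):
        if all(s is None or data[start + k] == s
               for k, s in enumerate(signature)):
            return start
    return -1

# Hypothetical signature: two fixed bytes around a two-byte wildcard gap.
sig = [0xE8, None, None, 0x5D, 0x81]

# Hypothetical sample containing the signature at offset 1.
sample = bytes([0x90, 0xE8, 0x12, 0x34, 0x5D, 0x81, 0xC3])

# The same bytes with one extra byte inserted inside the signature region.
morphed = bytes([0x90, 0xE8, 0x12, 0x34, 0x90, 0x5D, 0x81, 0xC3])

print(find_signature(sample, sig))   # 1
print(find_signature(morphed, sig))  # -1
```

A single inserted byte is enough to defeat this particular signature, which is the weakness that metamorphic code exploits.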
Metamorphic Viruses
Metamorphic viruses change “shape”: for each instance, the internal structure changes, but the function stays the same. If the change is sufficient, signature detection fails. In principle, metamorphic malware is among the most difficult to detect. But not too many have been seen in the wild. Why not?
Metamorphic Detection
How to detect metamorphic malware? Previous research: HMMs are effective. Train a model on opcodes extracted from viruses of the same metamorphic “family”, then determine a threshold score. To score an unknown exe, extract its opcodes and score them against the model.
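The scoring step can be sketched as follows, assuming an already trained discrete HMM and a length-normalized log-likelihood as the score. The model parameters `PI`, `A`, `B`, the symbol encoding, and `THRESHOLD` are made-up toy values, not the models from the previous research, and training (e.g. Baum-Welch) is not shown.

```python
import math

def log_likelihood_per_opcode(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | model) / len(obs)."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(pi)
    log_prob = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][symbol] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob / len(obs)

# Toy 2-state model over 3 opcode symbols (0=mov, 1=push, 2=other).
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
THRESHOLD = -1.2  # hypothetical; in practice chosen from scored test data

def is_family(obs):
    """Classify as family member if the score reaches the threshold."""
    return log_likelihood_per_opcode(obs, PI, A, B) >= THRESHOLD
```

Normalizing by sequence length lets files of different sizes be compared against one threshold.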
Profile HMM
A standard HMM does not take positional information into account. A profile HMM (PHMM) is analogous to defining an HMM at each position in a sequence, so position information is taken into account. A PHMM therefore uses more information, which might yield stronger models.
PHMMs
Will a PHMM outperform an HMM? Possible advantage of a PHMM: it uses more information, since position within the sequence is taken into account. Possible disadvantages of a PHMM: it is more complex and more costly to compute, and it might overfit the data. “More” is not always “better”.
The Plan
1. Extract opcodes from metamorphic family viruses
2. Pairwise align opcode sequences
3. Generate multiple sequence alignment (MSA) from pairwise alignments
4. Generate PHMM from MSA
5. Determine threshold, error rates
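Step 4 typically begins by deciding which MSA columns become match states. The sketch below uses a common heuristic from the PHMM literature (columns that are more than half gaps are treated as insert columns); whether this exact rule was used in this work is an assumption, and the toy MSA and the name `match_columns` are hypothetical.

```python
def match_columns(msa, gap="-"):
    """Return one boolean per MSA column: True if it is a match column.

    Assumed rule: a column is a match column unless more than half of
    its entries are gaps.
    """
    n_rows = len(msa)
    n_cols = len(msa[0])
    flags = []
    for c in range(n_cols):
        gaps = sum(row[c] == gap for row in msa)
        flags.append(2 * gaps <= n_rows)
    return flags

# Toy MSA of four aligned opcode sequences (rows have equal length).
toy_msa = [
    ["mov", "-",    "push", "call"],
    ["mov", "-",    "push", "call"],
    ["mov", "nop",  "-",    "call"],
    ["mov", "-",    "-",    "jmp"],
]
print(match_columns(toy_msa))  # [True, False, True, True]
```

The number of match columns then determines the length of the PHMM.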
Metamorphic Techniques
Morphing is usually applied at the asm level. Many techniques can be used, such as:
  Equivalent code substitution
  Register swap
  Different code, same function
  Garbage code/dead code insertion
  Code reordering
  Subroutine reordering
  Arbitrary reordering using jumps
Metamorphic Techniques
Opaque predicates: a “conditional” that isn’t. By combining several techniques, the virus writer can achieve the desired effect: metamorphism sufficient to break signature detection, while the function of the code remains unchanged.
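As a small illustration of an opaque predicate, the sketch below uses the fact that the product of two consecutive integers is always even. The function names are hypothetical, and real metamorphic engines apply such predicates at the assembly level rather than in Python.

```python
def opaque_true(x: int) -> bool:
    """Always True: x * (x + 1) is a product of consecutive integers, so it is even."""
    return (x * (x + 1)) % 2 == 0

def run(x: int) -> str:
    # Looks like a data-dependent branch, but only one side can execute.
    if opaque_true(x):
        return "real code"
    return "dead code"  # never reached; exists only to confuse analysis

print({run(x) for x in range(-5, 6)})  # {'real code'}
```

To a static analyzer that cannot prove the predicate constant, the dead branch looks like reachable code.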
Metamorphic Example
[Figure: original code, morphed version 1, morphed version 2]
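The following toy sketch is not the example from the slide; it is a hypothetical illustration of two of the techniques listed earlier, register swap and garbage code insertion, applied to a made-up instruction list. It also shows why detection here works on opcodes: a register swap leaves the opcode sequence unchanged, while garbage insertion does not.

```python
import random

# Made-up program as (opcode, operands) pairs; not from any real virus.
ORIGINAL = [("mov", "eax, 5"), ("add", "eax, ebx"),
            ("push", "eax"), ("call", "target")]

# Instructions assumed to have no effect on program state.
GARBAGE = [("nop", ""), ("xchg", "ecx, ecx"), ("mov", "edx, edx")]

def swap_registers(code, a="eax", b="esi"):
    """Rename register a to b (assumes b is unused in the original code)."""
    return [(op, args.replace(a, b)) for op, args in code]

def insert_garbage(code, rate, rng):
    """Before each instruction, insert one garbage instruction with probability rate."""
    out = []
    for instr in code:
        if rng.random() < rate:
            out.append(rng.choice(GARBAGE))
        out.append(instr)
    return out

def opcodes(code):
    """Extract the opcode sequence, dropping operands."""
    return [op for op, _ in code]

swapped = swap_registers(ORIGINAL)
morphed = insert_garbage(swapped, rate=0.5, rng=random.Random(0))
print(opcodes(ORIGINAL))
print(opcodes(morphed))
```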
Metamorphic Viruses
[Table: real-world metamorphic viruses]
Virus Construction Kits
With construction kits, anyone can easily build (metamorphic) malware. The first two kits are not very metamorphic, but NGVCK is highly metamorphic, so we consider NGVCK here.
AV Techniques
Signature detection is the most popular technique. So, of course, virus writers want to evade signature detection. Metamorphism can provide a strong defense against signature detection.
HMMs
See previous presentation.
PHMMs
See previous presentation.
PHMMs
PHMMs are designed to deal with biological sequences. The goal is to find evidence that sequences are related by mutation and selection. The basic processes usually considered are:
  Substitution: subsequence replaced
  Insertion: subsequence inserted
  Deletion: subsequence removed
PHMMs and Computer Viruses
The same basic processes (substitution, insertion, deletion) can occur in metamorphic viruses. But we also have to deal with permutation, that is, re-ordering of the sequence. Metamorphics may do lots of permuting. A permutation can be viewed as a series of insertions and deletions, but then “close” sequences might appear “far” apart.
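To see why permuted sequences look “far” apart, consider a plain edit distance (substitution, insertion, deletion, each with cost 1), used here as an assumed stand-in for alignment cost. The block labels below are hypothetical placeholders for chunks of opcodes.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

original = list("ABCDEFGH")   # eight hypothetical code blocks
permuted = list("EFGHABCD")   # the two halves swapped
one_sub = list("ABCDXFGH")    # a single block substituted

print(edit_distance(original, one_sub))
print(edit_distance(original, permuted))
```

Swapping the two halves is a single morphing step, yet the distance it produces is as large as if every block had been replaced.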
Permutation and Alignment
Permutations are problematic. How to deal with this? Maybe we can pre-process the sequences, but that adds complexity and cost. More about this later.
Test Data
Virus construction kits were obtained from VX Heavens. We generated the following viruses: 10 VCL32 viruses, 30 PS-MPC viruses, and 200 NGVCK viruses. In addition, 40 cygwin utilities serve as “normal” files.
NGVCK Pairwise Alignment
Align two NGVCK opcode sequences. [Figure: pairwise alignment of two NGVCK opcode sequences] This looks reasonable.
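A pairwise global alignment of two opcode sequences can be sketched with standard dynamic programming (Needleman-Wunsch). The scoring values below (match +1, mismatch -1, linear gap -1) are assumptions made for this sketch, not the substitution matrix or gap penalties used to produce the alignment on the slide.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1, gap_symbol="-"):
    """Needleman-Wunsch global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(gap_symbol); i -= 1
        else:
            out_a.append(gap_symbol); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]

x = ["mov", "push", "call", "pop"]
y = ["mov", "call", "pop"]
ax, ay, s = global_align(x, y)
print(ax)  # ['mov', 'push', 'call', 'pop']
print(ay)  # ['mov', '-', 'call', 'pop']
print(s)   # 2
```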
Gap Percentages
Recall that with a PHMM, the more gaps, the weaker the model. [Table: MSAs for metamorphic viruses] But note that the VCL32 MSA is based on 5 files, the PS-MPC MSA on 10, and the NGVCK MSA on 20 files.
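One plausible way to compute a gap percentage, assumed here to mean the share of gap symbols among all cells of the MSA, is sketched below. The toy MSA is hypothetical, and the exact definition behind the reported figures is not stated on the slide.

```python
def gap_percentage(msa, gap="-"):
    """Percentage of cells in the MSA that are gap symbols."""
    total = sum(len(row) for row in msa)
    gaps = sum(sym == gap for row in msa for sym in row)
    return 100.0 * gaps / total

# Toy MSA: 3 sequences, 4 columns, 3 gaps in total.
toy = [
    ["mov", "push", "-",   "call"],
    ["mov", "-",    "nop", "call"],
    ["mov", "push", "-",   "call"],
]
print(gap_percentage(toy))  # 25.0
```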
VCL32
Using five VCL32 viruses: generate pairwise alignments, generate the MSA, then generate the PHMM. The PHMM has 1820 states, so we can’t show the whole model here. The next slides give 3 states: 126, 127, and 128.
VCL32 Transition Probabilities
[Table: state transition probabilities, the A matrix for states 126, 127, and 128]
VCL32 Emission Probabilities
[Table: emission probabilities, the E matrix for states 126, 127, and 128] Emissions occur only for match and insert states. The “add-one” rule was used here.
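The “add-one” rule can be sketched for a single match column as follows: every symbol's count is incremented by one before normalizing, so no emission probability is zero. The column contents and the three-opcode alphabet are hypothetical.

```python
from collections import Counter

def add_one_emissions(column, alphabet, gap="-"):
    """Emission probabilities for one MSA column with add-one smoothing."""
    counts = Counter(sym for sym in column if sym != gap)
    total = sum(counts[s] for s in alphabet)
    denom = total + len(alphabet)  # one pseudocount per alphabet symbol
    return {s: (counts[s] + 1) / denom for s in alphabet}

column = ["mov", "mov", "push", "-"]   # one column of a toy MSA
alphabet = ["mov", "push", "call"]     # toy opcode alphabet
probs = add_one_emissions(column, alphabet)
print(probs["mov"])  # 0.5
```

Here "call" never appears in the column, yet it still receives probability 1/6 instead of 0, so an unseen opcode does not force a sequence's score to zero.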
Results
[Figure: typical PHMM results for VCL32] We can set a threshold for 100% detection. It doesn’t get any better than that!
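Setting a threshold can be sketched as below, assuming that higher scores mean “more like the virus family”. The score lists are invented for illustration and are not the results reported here; `pick_threshold` and `error_rates` are hypothetical helper names.

```python
def pick_threshold(family_scores, normal_scores):
    """Midpoint between the classes if they separate cleanly, else None."""
    lowest_family = min(family_scores)
    highest_normal = max(normal_scores)
    if lowest_family > highest_normal:
        return (lowest_family + highest_normal) / 2
    return None

def error_rates(family_scores, normal_scores, threshold):
    """Return (false_positive_rate, false_negative_rate) at the threshold."""
    fn = sum(s < threshold for s in family_scores) / len(family_scores)
    fp = sum(s >= threshold for s in normal_scores) / len(normal_scores)
    return fp, fn

# Invented scores: the two classes do not overlap.
family = [-2.0, -2.5, -3.0]
normal = [-9.0, -11.0, -12.5]
t = pick_threshold(family, normal)
print(t)                               # -6.0
print(error_rates(family, normal, t))  # (0.0, 0.0)

# Invented scores where a normal file outscores a family member.
print(pick_threshold([-2.0, -8.0], [-5.0, -9.0]))  # None
```

When the score ranges overlap, no threshold gives perfect detection, and the choice becomes a trade-off between false positives and false negatives.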
Results
[Figure: typical PS-MPC results using PHMM] Again, perfect detection.
Results
But VCL32 and PS-MPC are easy cases: they are not very metamorphic and are probably detectable using signatures. In contrast, NGVCK is highly metamorphic, so NGVCK is the important test. See next slides.
Results
[Figure: typical results for NGVCK] Note that normal files score higher than NGVCK. This is bad!
Pre-Processing
For NGVCK, is there any hope? We can try pre-processing, with the goal of undoing some of the effect of permutation. This reduces the gap percentage in the MSA from 88.3% before to 44.9% after. A big improvement, but is it big enough?
Results
[Figure: NGVCK results with pre-processing] Much better, but not good enough: the error rate is still significant.
Conclusions
HMMs were developed in the 1960s and are a standard machine learning technique with many applications. PHMMs are relatively recent and were developed for biological applications. Here, we presented a novel application of PHMMs: 100% detection for some examples, poor detection for others.
Possible Improvements
  Improved pre-processing, to better account for permutation
  Local alignment, for example, aligning subroutines
  Baum-Welch re-estimation of the PHMM obtained from the MSA
  Other?
Last Word
It is very trendy to apply biological analogies to information security. On the one hand, the results here provide evidence supporting the trend of looking to biological analogies. On the other hand, the results here are a “cautionary tale against applying biological analogies too literally”.
References
Profile hidden Markov models for metamorphic virus detection, S. Attaluri, S. McGhee and M. Stamp, Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Durbin, et al