Published by Leon Wilcox. Modified over 9 years ago.
1
PHMMs for Metamorphic Detection
Mark Stamp
2
Viruses
Viruses and worms are types of malware
  Various definitions are used
  For our purposes, “virus” is used generically
How to detect malware?
  Signature detection is used most often
  In its simplest form, search for a string of bits found in the malware
  Could also include wildcards, heuristics, etc.
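The simplest form of signature detection can be sketched as a pattern scan. This is only an illustration: it works on bytes rather than bits, the function name `find_signature` is hypothetical, and `None` is an assumed convention for a single-byte wildcard.

```python
from typing import Optional, Sequence


def find_signature(data: bytes, signature: Sequence[Optional[int]]) -> int:
    """Return the first offset where `signature` matches `data`, or -1.

    Each signature entry is a byte value, or None for a one-byte wildcard.
    """
    n, m = len(data), len(signature)
    for start in range(n - m + 1):
        if all(s is None or data[start + i] == s
               for i, s in enumerate(signature)):
            return start
    return -1


# Made-up byte string and signature, for illustration only.
sample = b"\x90\xb8\x01\x00\xcd\x21"
offset = find_signature(sample, [0xB8, None, None, 0xCD])
```

A metamorphic virus defeats this kind of scan when no fixed pattern, even with wildcards, survives from one instance to the next.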
3
Metamorphic Viruses
Metamorphic viruses change “shape”
  For each instance, internal structure changes
  But function stays the same
If the change is sufficient, signature detection fails
In principle, metamorphic malware is among the most difficult to detect
But not too many have been seen in the wild
  Why not?
4
Metamorphic Detection
How to detect metamorphic malware?
Previous research: HMMs are effective
  Train model on opcodes extracted from metamorphic “family” viruses
  Determine a threshold score
  Then, to score an unknown exe, extract opcodes and score against the model
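The scoring step can be sketched with the standard scaled forward algorithm. This assumes a trained model is already available as `pi` (initial distribution), `A` (transitions) and `B` (emissions), with opcodes mapped to integer indices. The function names and the choice of log-likelihood per opcode as the score are assumptions for illustration, not necessarily the setup used in the research cited above.

```python
import math


def score_per_opcode(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) divided by len(obs)."""
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)  # assumed nonzero, i.e. a smoothed model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob / len(obs)


def looks_like_family(obs, model, threshold):
    """Classify as a family member when the score reaches the threshold."""
    pi, A, B = model
    return score_per_opcode(obs, pi, A, B) >= threshold
```

Dividing by the sequence length keeps scores comparable between long and short files, which matters when one threshold is applied to all of them.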
5
Profile HMM
Standard HMM does not take positional information into account
Profile HMM is analogous to defining an HMM at each position in a sequence
  Position info is taken into account
So, PHMM uses more information
  This might yield stronger models
6
PHMMs
Will PHMM outperform HMM?
Possible advantage of PHMM
  Uses more information…
  …since position within sequence is taken into account
Possible disadvantages of PHMM
  More complex, more costly to compute
  Might overfit the data
  “More” is not always “better”
7
The Plan
1. Extract opcodes from metamorphic family viruses
2. Pairwise align opcode sequences
3. Generate multiple sequence alignment (MSA) from pairwise alignments
4. Generate PHMM from MSA
5. Determine threshold, error rates
8
Metamorphic Techniques
Morphing is usually applied at the asm level
Many techniques can be used, such as…
Equivalent code substitution
  Register swap
  Different code, same function
Garbage code/dead code insertion
Code reordering
  Subroutine reordering
  Arbitrary reordering using jumps
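As a toy illustration of dead code insertion, the sketch below sprinkles `nop` opcodes into an opcode list. Real morphing engines work on assembly and use far less obvious garbage. The function name, the `rate` parameter and the sample opcodes are all hypothetical.

```python
import random


def insert_dead_code(opcodes, rate=0.3, seed=None):
    """Return a copy of `opcodes` with "nop" inserted at random positions.

    `rate` must be below 1.0 so that the insertion loop terminates.
    """
    rng = random.Random(seed)
    morphed = []
    for op in opcodes:
        while rng.random() < rate:
            morphed.append("nop")  # garbage: has no effect on behavior
        morphed.append(op)
    return morphed


original = ["push", "mov", "add", "call", "ret"]
variant = insert_dead_code(original, seed=1)
```

Stripping the `nop` entries from `variant` recovers `original`, so the function is unchanged while the opcode sequence, and any signature taken from it, may differ.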
9
Metamorphic Techniques
Opaque predicates
  “Conditional” that isn’t
By combining several techniques, can achieve the desired effect
  Metamorphism sufficient to break signature detection
  Function of code remains unchanged
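An opaque predicate can be sketched in a few lines. The example is in Python rather than assembly, and both function names are made up. The condition relies on the fact that the product of two consecutive integers is always even, so the `else` branch never runs.

```python
def payload(x: int) -> int:
    """Original code."""
    return x + 1


def payload_with_opaque_predicate(x: int) -> int:
    """Same function, with a conditional that always takes one branch."""
    if (x * (x + 1)) % 2 == 0:  # always true for any integer x
        return x + 1
    else:
        return x ^ 0x5A  # dead branch: present only to change the code's shape
```

The two functions compute the same result for every integer input, yet their instruction sequences and control flow differ.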
10
Metamorphic Example
[Figure: original code, morphed version 1, morphed version 2]
11
Metamorphic Viruses
Real-world metamorphic viruses
12
Virus Construction Kits
Construction kits: anyone can easily build (metamorphic) malware
First 2 are not very metamorphic
But NGVCK is highly metamorphic
So, we consider NGVCK here
13
AV Techniques
Signature detection is most popular
So, of course, virus writers want to evade signature detection
Metamorphism can provide a strong defense against signature detection
14
HMMs
See previous presentation
15
PHMMs
See previous presentation
16
PHMMs
PHMMs are designed to deal with biological sequences
Goal is to find evidence that sequences are related by mutation and selection
Basic processes usually considered are
  Substitution: subsequence replaced
  Insertion: subsequence inserted
  Deletion: subsequence removed
17
PHMMs and Computer Viruses
The same basic processes can occur in metamorphic viruses
  That is, substitution, insertion, deletion
But also have to deal with
  Permutation: re-ordering of sequence
Metamorphics may do lots of permuting
Permutation can be viewed as a series of insertions/deletions
  But “close” sequences might be “far” apart
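A small edit-distance computation shows why permutation makes “close” sequences look “far” apart. This is a textbook Levenshtein distance with unit costs for substitution, insertion and deletion. The opcode blocks are made up, and the alignment scoring used in the actual experiments may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


block_a = ["push", "mov", "call"]
block_b = ["xor", "add", "jmp"]
original = block_a + block_b
permuted = block_b + block_a  # same code, blocks swapped

distance = edit_distance(original, permuted)  # 6, the sequence length
```

Both sequences contain exactly the same opcodes, yet the distance equals the full sequence length, as if they had nothing in common.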
18
Permutation and Alignment
Permutations are problematic…
How to deal with this?
Maybe we can pre-process sequences
  But this adds complexity and cost
More about this later
19
Test Data
Virus construction kits from VX Heavens
We generated the following viruses
  10 VCL32 viruses
  30 PS-MPC viruses
  200 NGVCK viruses
Also, 40 cygwin utilities
  These serve as “normal” files
20
NGVCK Pairwise Alignment
Align two NGVCK opcode sequences
This looks reasonable
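A pairwise global alignment of two opcode sequences can be sketched with the classic Needleman-Wunsch dynamic program. The scores here (match +1, mismatch -1, linear gap -1) and the `-` gap symbol are assumptions chosen for brevity. The experiments may well have used a substitution matrix and a different gap penalty.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


row_a, row_b, total = global_align(["mov", "add", "jmp"], ["mov", "jmp"])
```

For this made-up pair, `add` ends up aligned against a gap while `mov` and `jmp` match.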
21
Gap Percentages
Recall, with PHMM, the more gaps, the weaker the model
[Table: MSAs for metamorphic viruses]
But VCL32 is based on 5 files, PS-MPC on 10, and NGVCK on 20 files
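One plausible way to compute a gap percentage is the share of all MSA cells that hold the gap symbol. That definition, the `-` symbol and the function name are assumptions. The slides do not spell out the exact formula.

```python
def gap_percentage(msa, gap="-"):
    """Percentage of cells in the alignment that are gaps.

    `msa` is a list of equal-length rows, one aligned sequence per row.
    """
    total = sum(len(row) for row in msa)
    gaps = sum(row.count(gap) for row in msa)
    return 100.0 * gaps / total


# Made-up two-row alignment with one gap out of six cells.
toy_msa = [["mov", "-", "jmp"],
           ["mov", "add", "jmp"]]
toy_gap_pct = gap_percentage(toy_msa)
```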
22
VCL32
Using five VCL32 viruses…
  Generate pairwise alignments
  Generate MSA
  Then generate PHMM
PHMM has 1820 states
  Can’t show the whole model here
  So, next slides give 3 states: 126, 127, 128
23
VCL32 Transition Probabilities
State transition probabilities
  The A matrix for states 126, 127, 128
24
VCL32 Emission Probabilities
Emission probabilities
  The E matrix
  States 126, 127, 128
Emissions only for match, insert states
“Add-one” rule was used here
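The “add-one” rule can be illustrated for a single match state: count the symbols in one MSA column, add one pseudocount per alphabet symbol, and normalize. The function name, the `-` gap symbol and the three-opcode alphabet are hypothetical, and every symbol in the column is assumed to belong to the alphabet.

```python
from collections import Counter


def add_one_emissions(column, alphabet, gap="-"):
    """Emission probabilities for one match state, with add-one smoothing."""
    counts = Counter(sym for sym in column if sym != gap)
    total = sum(counts.values()) + len(alphabet)  # one pseudocount per symbol
    return {sym: (counts[sym] + 1) / total for sym in alphabet}


# One column of a made-up MSA: two "mov", one "add", one gap.
probs = add_one_emissions(["mov", "mov", "add", "-"], ["mov", "add", "jmp"])
```

The pseudocounts ensure that an opcode never seen in the column (here `jmp`) still gets a nonzero probability, so a single unseen opcode cannot drive a sequence's probability to zero.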
25
Results
Typical PHMM results for VCL32
Can set threshold for 100% detection
It doesn’t get any better than that!
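Choosing a threshold and measuring error rates can be sketched as follows. It assumes higher scores mean “more like the family”. The function names are hypothetical and the scores in the example are invented, not results from these experiments.

```python
def error_rates(family_scores, normal_scores, threshold):
    """Return (false_negative_rate, false_positive_rate) at `threshold`."""
    fn = sum(s < threshold for s in family_scores) / len(family_scores)
    fp = sum(s >= threshold for s in normal_scores) / len(normal_scores)
    return fn, fp


def separating_threshold(family_scores, normal_scores):
    """Midpoint between the two score sets, or None if they overlap."""
    highest_normal, lowest_family = max(normal_scores), min(family_scores)
    if highest_normal < lowest_family:
        return (highest_normal + lowest_family) / 2
    return None


# Invented scores for illustration only.
family = [-2.1, -1.8, -2.4]
normal = [-9.0, -7.5]
t = separating_threshold(family, normal)
rates = error_rates(family, normal, t)
```

When `separating_threshold` returns `None`, the score sets overlap and every threshold trades false positives against false negatives.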
26
Results
Typical PS-MPC results using PHMM
Again, perfect detection
27
Results
But VCL32 and PS-MPC are easy cases
  Not very metamorphic
  Probably detectable using signatures
In contrast, NGVCK is highly metamorphic
So, NGVCK is the important test
  See next slides
28
Results
Typical results for NGVCK
Note that normal files score higher than NGVCK!
This is bad!
29
Pre-Processing
For NGVCK, is there any hope?
Can try pre-processing
  Goal is to undo some of the effect of permutation
Able to reduce gap percentage in MSA
  Before, gap percentage was 88.3%
  After, gap percentage is 44.9%
Big improvement, but is it big enough?
30
Results
NGVCK with pre-processing
Much better, but not good enough
Error rate is still significant
31
Conclusions
HMMs developed in 1960s
  Standard machine learning technique
  Many applications
PHMMs relatively recent
  Developed for biological applications
Here, a novel application of PHMMs
  100% detection for some examples…
  …poor detection for others
32
Possible Improvements
Improved pre-processing
  To better account for permutation
Local alignment
  For example, align subroutines
Baum-Welch re-estimation of PHMM obtained from MSA
Other?
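Why local alignment might help with permutation can be sketched with a Smith-Waterman score, which rewards the best-matching region instead of forcing an end-to-end alignment. The scoring values (match +2, mismatch -1, gap -1), the function name and the opcode blocks are assumptions for illustration, not part of the experiments.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman: best local alignment score between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            s = max(0,  # start a new local alignment here
                    prev[j - 1] + (match if x == y else mismatch),
                    prev[j] + gap,
                    cur[j - 1] + gap)
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


block_a = ["push", "mov", "call"]
block_b = ["xor", "add", "jmp"]
swapped_score = local_alignment_score(block_a + block_b, block_b + block_a)
```

Even with the two blocks swapped, one whole block still aligns perfectly, so the local score stays high where a global alignment would be dominated by gaps.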
33
Last Word
Very trendy to apply biological analogies to information security
On the one hand…
  Results here provide evidence supporting the trend of looking to biological analogies
On the other hand…
  Results here are a “cautionary tale against applying biological analogies too literally”
34
References
Profile hidden Markov models for metamorphic virus detection, S. Attaluri, S. McGhee and M. Stamp, Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp. 151-169
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Durbin, et al.