PHMMs for Metamorphic Detection, Mark Stamp


Viruses
- Viruses and worms: types of malware
- Various definitions are used
- For our purposes, “virus” used generically
- How to detect malware?
- Signature detection used most often
- In simplest form, search for a string of bits found in the malware
- Could also include wildcards, heuristics, etc.
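To make the simplest form of signature detection concrete, here is a minimal sketch of a byte-string search with one-byte wildcards. The signature format (a list where `None` means "any byte"), the function name `find_signature`, and the sample bytes are all assumptions for illustration, not something taken from the slides or from any real scanner.

```python
from typing import List, Optional

# Hypothetical signature format: byte values, with None as a one-byte wildcard.
Signature = List[Optional[int]]

def find_signature(data: bytes, sig: Signature) -> int:
    """Return the first offset where sig matches data, or -1 if it never matches."""
    n, m = len(data), len(sig)
    for start in range(n - m + 1):
        # A position matches if it is a wildcard or the bytes are equal.
        if all(s is None or data[start + k] == s for k, s in enumerate(sig)):
            return start
    return -1

# Made-up signature and sample, for illustration only.
sig = [0xB8, None, None, 0xCD, 0x21]
sample = bytes([0x90, 0xB8, 0x00, 0x4C, 0xCD, 0x21])
print(find_signature(sample, sig))  # 1
```

A metamorphic generator defeats this kind of scan by ensuring that no fixed byte pattern, even one with wildcards, survives across instances.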

Metamorphic Viruses
- Metamorphic viruses change “shape”
- For each instance, internal structure changes
- But function stays the same
- If the change is sufficient, signature detection fails
- In principle, metamorphic malware among most difficult to detect
- But, not too many have been seen in the wild
- Why not?

Metamorphic Detection
- How to detect metamorphic malware?
- Previous research: HMMs are effective
- Train model on opcodes extracted from metamorphic “family” viruses
- Determine a threshold score
- Then, to score an unknown exe, extract opcodes and score against the model
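The scoring step can be sketched as follows, assuming the score is the log-likelihood per opcode computed by the scaled forward algorithm. The slides do not say which score is used, so that choice, the toy two-state model, the two-symbol opcode alphabet, and the threshold value are all assumptions for illustration.

```python
import math
from typing import List, Sequence

def log_likelihood_per_opcode(obs: Sequence[int],
                              pi: List[float],
                              A: List[List[float]],
                              B: List[List[float]]) -> float:
    """Scaled forward algorithm: log P(obs | model) / len(obs).

    obs must be non-empty; symbols are integer indices into the rows of B.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob / len(obs)

# Toy model (hypothetical values): 2 hidden states, 2 opcode symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]
THRESHOLD = -1.0  # hypothetical; in practice chosen from held-out scores

score = log_likelihood_per_opcode([0, 0, 1, 0, 1, 1], pi, A, B)
print(score, score >= THRESHOLD)
```

Dividing by the sequence length keeps scores comparable between files that have different numbers of opcodes.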

Profile HMM
- Standard HMM does not take positional information into account
- Profile HMM analogous to defining HMM at each position in a sequence
- Position info is taken into account
- So, PHMM uses more information
- This might yield stronger models

PHMMs
- Will PHMM outperform HMM?
- Possible advantage of PHMM
- Uses more information, since position within sequence is taken into account
- Possible disadvantages of PHMM
- More complex, more costly to compute
- Might overfit the data
- “More” is not always “better”

The Plan
1. Extract opcodes from metamorphic family viruses
2. Pairwise align opcode sequences
3. Generate multiple sequence alignment (MSA) from pairwise alignments
4. Generate PHMM from MSA
5. Determine threshold, error rates
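Step 2 of the plan can be illustrated with a small global alignment of two opcode sequences. This is a sketch using the textbook Needleman-Wunsch recurrence with a flat scoring scheme (match +1, mismatch -1, gap -1). The slides do not state the scoring or the alignment variant actually used, so both are assumptions, as are the opcode sequences.

```python
from typing import List, Tuple

def align(a: List[str], b: List[str],
          match: int = 1, mismatch: int = -1, gap: int = -1
          ) -> Tuple[List[str], List[str], int]:
    """Global alignment of two opcode lists; '-' marks a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]

# Hypothetical opcode sequences: the second has one inserted 'nop'.
seq1 = ["mov", "push", "call", "pop"]
seq2 = ["mov", "nop", "push", "call", "pop"]
row1, row2, total = align(seq1, seq2)
print(row1)   # ['mov', '-', 'push', 'call', 'pop']
print(row2)   # ['mov', 'nop', 'push', 'call', 'pop']
print(total)  # 3
```

The gapped rows produced by pairwise alignments like this one are the raw material for the MSA in step 3.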

Metamorphic Techniques
- Morphing usually applied at asm level
- Many techniques can be used, such as…
- Equivalent code substitution
- Register swap
- Different code, same function
- Garbage code/dead code insertion
- Code reordering
- Subroutine reordering
- Arbitrary reordering using jumps

Metamorphic Techniques
- Opaque predicates
- “Conditional” that isn’t
- By combining several techniques, can achieve desired effect
- Metamorphism sufficient to break signature detection
- Function of code remains unchanged

Metamorphic Example
- Original code
- Morphed version 1
- Morphed version 2

Metamorphic Viruses
- Real-world metamorphic viruses

Virus Construction Kits
- Construction kits: anyone can easily build (metamorphic) malware
- First 2 are not very metamorphic
- But, NGVCK is highly metamorphic
- So, we consider NGVCK here

AV Techniques
- Signature detection is most popular
- So, of course, virus writers want to evade signature detection
- Metamorphism can provide strong defense against signature detection

HMMs
- See previous presentation

PHMMs
- See previous presentation

PHMMs
- PHMMs are designed to deal with biological sequences
- Goal is to find evidence that sequences are related by mutation and selection
- Basic processes usually considered are
- Substitution: subsequence replaced
- Insertion: subsequence inserted
- Deletion: subsequence removed

PHMMs and Computer Viruses
- The same basic processes can occur in metamorphic viruses
- That is, substitution, insertion, deletion
- But also have to deal with
- Permutation: re-ordering of sequence
- Metamorphics may do lots of permuting
- Permutation can be viewed as series of insertions/deletions
- But “close” sequences might be “far” apart
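The last two points can be checked with a small edit-distance computation. The sketch below uses the standard Levenshtein distance over opcode lists, where substitution, insertion, and deletion each cost 1. The opcode blocks are made up for illustration. Swapping two three-opcode blocks leaves the code functionally "close", yet the distance is as large as if every opcode had been replaced.

```python
from typing import List

def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance: substitutions, insertions, deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + sub,  # substitute or match
                         prev[j] + 1,        # delete from a
                         cur[j - 1] + 1)     # insert into a
        prev = cur
    return prev[len(b)]

# Hypothetical opcode blocks.
block1 = ["push", "mov", "call"]
block2 = ["xor", "add", "jmp"]
original = block1 + block2
permuted = block2 + block1   # same blocks, reordered

print(edit_distance(original, original))  # 0
print(edit_distance(original, permuted))  # 6
```

An alignment-based model sees the permuted sequence as six edits away from the original, which is why heavy permutation produces gap-filled alignments.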

Permutation and Alignment
- Permutations are problematic…
- How to deal with this?
- Maybe we can pre-process sequences
- But, adds complexity and cost
- More about this later

Test Data
- Virus construction kits from VX Heavens
- We generated the following viruses
- 10 VCL32 viruses
- 30 PS-MPC viruses
- 200 NGVCK viruses
- Also, 40 cygwin utilities
- These serve as “normal” files

NGVCK Pairwise Alignment
- Align two NGVCK opcode sequences
- This looks reasonable

Gap Percentages
- Recall, with PHMM, the more gaps, the weaker the model
- MSAs for metamorphic viruses
- But, VCL32 based on 5 files, PS-MPC based on 10, NGVCK based on 20 files
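As a rough illustration of the quantity involved, the sketch below computes a gap percentage for an MSA stored as equal-length rows with "-" for gaps. It assumes the gap percentage is the share of all MSA cells that are gaps. The slides do not give the exact definition, so treat this as one plausible reading. The tiny MSA is made up.

```python
from typing import List

def gap_percentage(msa: List[List[str]], gap: str = "-") -> float:
    """Percentage of cells in the MSA that are gap symbols."""
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all MSA rows must have the same length")
    total = width * len(msa)
    gaps = sum(row.count(gap) for row in msa)
    return 100.0 * gaps / total

# Hypothetical three-row MSA of opcode sequences.
msa = [
    ["mov", "-",   "push", "call"],
    ["mov", "nop", "push", "call"],
    ["mov", "-",   "-",    "call"],
]
print(round(gap_percentage(msa), 1))  # 25.0
```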

VCL32
- Using five VCL32 viruses…
- Generate pairwise alignments
- Generate MSA
- Then generate PHMM
- PHMM has 1820 states
- Can’t show the whole model here
- So, next slides give 3 states: 126, 127, 128

VCL32 Transition Probabilities
- State transition probabilities
- The A matrix for states 126, 127, 128

VCL32 Emission Probabilities
- Emission probabilities
- The E matrix
- States 126, 127, 128
- Emissions only for match, insert states
- “Add-one” rule was used here
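The “add-one” rule can be sketched for a single MSA column as follows: count each opcode in the column, add 1 to every count (including opcodes that never appear), and normalize. The column contents and the three-opcode alphabet are hypothetical; a real model would use the full opcode alphabet.

```python
from typing import Dict, List

def add_one_emissions(column: List[str], alphabet: List[str],
                      gap: str = "-") -> Dict[str, float]:
    """Emission probabilities for one column using add-one smoothing."""
    counts = {sym: 1 for sym in alphabet}   # the "add one" pseudocount
    for sym in column:
        if sym != gap:                      # gaps emit nothing
            counts[sym] += 1
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()}

# Hypothetical match column from a four-sequence MSA.
column = ["mov", "mov", "push", "-"]
alphabet = ["mov", "push", "call"]
probs = add_one_emissions(column, alphabet)
print(probs["mov"])  # 0.5
```

Without the pseudocount, `call` would get probability zero here, and any scored sequence emitting `call` at this position would be assigned probability zero by the model.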

Results
- Typical PHMM results for VCL32
- Can set threshold for 100% detection
- It doesn’t get any better than that!
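The thresholding step can be sketched as below. It assumes that higher scores mean "more like the virus family" and that a file is flagged when its score is at or above the threshold. All score values are invented for illustration and are not results from the presentation.

```python
from typing import List, Optional, Tuple

def error_rates(family: List[float], normal: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate) at this threshold."""
    false_neg = sum(1 for s in family if s < threshold)
    false_pos = sum(1 for s in normal if s >= threshold)
    return false_neg / len(family), false_pos / len(normal)

def separating_threshold(family: List[float],
                         normal: List[float]) -> Optional[float]:
    """Midpoint between lowest family score and highest normal score.

    Returns None when the two score ranges overlap, i.e. when no threshold
    gives perfect detection.
    """
    lowest_family, highest_normal = min(family), max(normal)
    if lowest_family > highest_normal:
        return (lowest_family + highest_normal) / 2
    return None

# Hypothetical scores, for illustration only.
virus_scores = [-2.1, -1.8, -2.4]
normal_scores = [-9.5, -7.2, -11.0]

t = separating_threshold(virus_scores, normal_scores)
print(t, error_rates(virus_scores, normal_scores, t))
```

When the score ranges overlap, `separating_threshold` returns `None`, and any chosen threshold trades false positives against false negatives.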

Results
- Typical PS-MPC results using PHMM
- Again, perfect detection

Results
- But, VCL32 and PS-MPC are easy cases
- Not very metamorphic
- Probably detectable using signatures
- In contrast, NGVCK highly metamorphic
- So, NGVCK is the important test
- See next slides

Results
- Typical results for NGVCK
- Note that normal files score higher than NGVCK!
- This is bad!

Pre-Processing
- For NGVCK, is there any hope?
- Can try pre-processing
- Goal is to undo some of the effect of permutation
- Able to reduce gap percentage in MSA
- Before, gap percentage was 88.3%
- After, gap percentage is 44.9%
- Big improvement, but is it big enough?

Results
- NGVCK with pre-processing
- Much better, but not good enough
- Error rate is still significant

Conclusions
- HMMs developed in 1960s
- Standard machine learning technique
- Many applications
- PHMMs relatively recent
- Developed for biological applications
- Here, a novel application of PHMMs
- 100% detection for some examples…
- …poor detection for others

Possible Improvements
- Improved pre-processing
- To better account for permutation
- Local alignment
- For example, align subroutines
- Baum-Welch re-estimation of PHMM obtained from MSA
- Other?

Last Word
- Very trendy to apply biological analogies to information security
- On the one hand…
- Results here provide evidence supporting trend of looking to biological analogies
- On the other hand…
- Results here are “cautionary tale against applying biological analogies too literally”

References
- S. Attaluri, S. McGhee and M. Stamp, Profile hidden Markov models for metamorphic virus detection, Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp
- Durbin, et al, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids