Hunting for Metamorphic Engines
Wing Wong, Mark Stamp


In This Paper…
• Analyze metamorphic malware
  o Hacker-produced metamorphic code
• Measure similarity of software
  o Based on n-gram analysis
• Compute scores
  o Based on n-grams and
  o Based on HMMs
• This paper is a baseline for future work

Motivation
• Many virus construction kits available
  o Many can produce metamorphic code
• So anybody can create a “new” version of existing malware
  o Virtually no technical expertise required
• How “effective” is the resulting metamorphic code?
• Can we detect metamorphic malware?

Background
• Encrypted, polymorphic, metamorphic
  o Metamorphic == body polymorphic
• Metamorphic vs cloned software
  o Clone is the norm, but metamorphic could offer advantages to the good guy too…
• From the theory, we know malware detection is NP-complete
  o And metamorphic is at least as hard
  o But what about the practical situation?

Metamorphism
• Metamorphic code changes its “shape”
• Well-known examples
  o W95/Regswap
  o W32/Ghost
  o W95/Zperm
  o MetaPHOR

Metamorphism
• General techniques available
  o Insertion
  o Substitution
  o Transposition
  o Deletion
• Some easier than others
• Some more effective against certain detection strategies

Virus Construction Kits
• In this paper, we consider
  o PS-MPC (Phalcon/Skism Mass Produced Code generator)
  o G2 (Second Generation virus generator)
  o MPCGEN (Mass Produced Code GENerator)
  o NGVCK (Next Generation Virus Construction Kit)
  o VCL32 (Virus Creation Lab for Win32)

Virus Construction Kits
• Did not consider MetaPHOR
  o Difficult to work with, finicky
• All of these claim to be metamorphic
• Are they really?
  o How can we measure “metamorphism”?
• If they are highly metamorphic, can we still detect them?

Brief Review of Malware Detection
• First generation
  o Signature scanning, wildcards OK
• Second generation
  o Approximate signature scanning; e.g., ignore NOP instructions
• Code emulation
• Heuristic analysis
  o Static or dynamic, false positives…
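
The first two generations on this slide can be illustrated with a small sketch. It is not taken from the paper: the function names, the use of `None` as a wildcard, and the byte values are assumptions made for illustration. The "ignore NOP" step is deliberately naive, since it drops every 0x90 byte (the x86 NOP encoding) without disassembling, so it would also drop operand bytes that happen to equal 0x90.

```python
def matches_at(data, pos, signature):
    """True if `signature` matches `data` at `pos`; None entries are wildcards."""
    if pos + len(signature) > len(data):
        return False
    return all(s is None or data[pos + k] == s for k, s in enumerate(signature))


def scan(data, signature):
    """First-generation style scan: exact bytes plus single-byte wildcards."""
    last_start = len(data) - len(signature)
    return [pos for pos in range(last_start + 1) if matches_at(data, pos, signature)]


def scan_ignoring_nops(data, signature, nop=0x90):
    """Second-generation style scan (simplified): drop NOP bytes, then scan.

    Assumption: every byte equal to `nop` is treated as a NOP instruction.
    Returned offsets refer to the stripped buffer, not the original file.
    """
    stripped = bytes(b for b in data if b != nop)
    return scan(stripped, signature)
```

Inserting a single NOP between two signature bytes is enough to defeat `scan`, while `scan_ignoring_nops` still reports the match, which is the motivation for approximate scanning.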

Machine Learning
• Consider the following
  o Data Mining, Neural Networks, HMMs
• Data Mining
  o Malware-related previous work
  o Generic approach
• Neural Networks
  o Previous work based on byte trigrams
  o Developed and used at IBM

Hidden Markov Models
• Train HMM on metamorphic family
• Then we can score any file to see how “close” it is to the family
• What to use to train such an HMM?
  o Raw bytes in exe?
  o Disassembled code?
  o Opcode sequence?
• More on this later…

Software Similarity
• How to quantify metamorphism?
• In general, how to measure similarity of software?
• Given program 1 and program 2…
• We develop a score
  o Score of 0 means “no similarity”
  o Score of 1 means “virtually identical”

N-gram Similarity
• Given executable files X and Y
• Extract opcode sequences from each
  o Suppose X has n opcodes
  o Suppose Y has m opcodes
• How to compare the sequences?
• Many possible ways; here we use n-gram analysis
  o That is, we compare subsequences

N-gram Similarity
• Extracted opcode sequences
  o X = (x_0, x_1, …, x_{n-1}) and Y = (y_0, y_1, …, y_{m-1})
• Compare subsequences of length k
  o Then x_i, x_{i+1}, …, x_{i+k-1} matches y_j, y_{j+1}, …, y_{j+k-1} if they are the same in any order
  o For each such match, plot the point (i, j)
  o Remove any segments shorter than p points
• Then score = (x axis covered + y axis covered) / 2
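
The scoring steps on this slide can be sketched roughly as follows. This is one reading of the slide, not the authors' code, and it rests on several assumptions: a "segment" is taken to be a run of consecutive match points along a diagonal, "axis covered" is taken to be the fraction of opcode positions lying inside a kept match window, and the defaults `k=3` and `p=5` are illustrative values only.

```python
from collections import defaultdict


def ngram_similarity(x, y, k=3, p=5):
    """Similarity in [0, 1] between opcode sequences x and y."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0.0

    # Index every length-k window of y by its contents, ignoring order.
    windows_y = defaultdict(list)
    for j in range(m - k + 1):
        windows_y[tuple(sorted(y[j:j + k]))].append(j)

    # A point (i, j) is a match when the two windows hold the same opcodes.
    matches = set()
    for i in range(n - k + 1):
        for j in windows_y.get(tuple(sorted(x[i:i + k])), ()):
            matches.add((i, j))

    covered_x, covered_y = set(), set()
    for (i, j) in matches:
        if (i - 1, j - 1) in matches:
            continue  # not the start of a diagonal run
        length = 0
        while (i + length, j + length) in matches:
            length += 1
        if length < p:
            continue  # treat short runs as noise and drop them
        for t in range(length):
            covered_x.update(range(i + t, i + t + k))
            covered_y.update(range(j + t, j + t + k))

    return (len(covered_x) / n + len(covered_y) / m) / 2
```

Under these assumptions, comparing a sequence with itself gives a solid main diagonal and a score of 1, and two sequences with no common window score 0.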

N-gram Similarity Example

N-gram Similarity
• Score is between 0 and 1
• If program X is identical to program Y
  o Main diagonal is a solid line
  o And score = 1
• Minimum score is 0
• The smaller the score, the less similar the programs are

Typical N-gram Similarity
• Normal (cygwin utility) files

Typical N-gram Similarity
• NGVCK

Typical N-gram Similarity
• G2

N-gram Similarity
• Compare members of a “family” with each other

N-gram Similarity
• In graphical form…

N-gram Similarity Conclusion?
• G2 viruses are more similar to each other than expected
  o So, they are not very metamorphic
  o Ditto for most of the other generators
• But NGVCK viruses are more different from each other than expected
  o So, they are highly metamorphic
• Implication wrt signature detection?

NGVCK Similarity
• Compare NGVCK to other families…

NGVCK Similarity Conclusion?
• NGVCK viruses are very different from each other
  o Implies highly metamorphic…
  o …so, signature detection will fail
• But NGVCK viruses are even more different from normal files
  o Then what about detection?

Aside: Similar Similarity Measures to Consider?
• Given opcode sequences
  o Edit distance
  o Other sequence comparison techniques
  o Statistical measures
• Considering raw bytes
  o Statistical measures
  o Entropy and other “structural” measures
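
Two of the alternatives named on this slide are easy to sketch. The code below is an illustration only, not something evaluated in the paper: `edit_distance` is the standard Levenshtein distance applied to opcode sequences, and `byte_entropy` is the Shannon entropy of a raw byte string in bits per byte. Both function names are chosen here for illustration.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two opcode sequences."""
    prev = list(range(len(b) + 1))
    for i, op_a in enumerate(a, 1):
        cur = [i]
        for j, op_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                   # delete op_a
                cur[j - 1] + 1,                # insert op_b
                prev[j - 1] + (op_a != op_b),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Unlike the n-gram score, edit distance grows with sequence length, so it would need normalizing before it could be compared across files of different sizes.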

Hidden Markov Models
• Generic view of HMM

HMM Notation

HMM for Metamorphic Detection
• Train HMM
  o Extract opcodes from family executables
  o Append opcode sequences
  o Train a model, i.e., determine matrices
• Use trained HMM to score files
  o Given a file, extract opcode sequence
  o Score sequence against the model
  o Compare to predetermined threshold
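
The scoring half of this procedure can be sketched with the scaled forward algorithm. The sketch assumes training has already produced the model matrices (training itself, e.g. by Baum-Welch re-estimation, is not shown) and that opcodes have been mapped to integer symbol indices. The names `A`, `B`, `pi`, `score_per_opcode`, `looks_like_family`, and the threshold are illustrative assumptions; the per-opcode normalization follows the LLPO idea from the fine-points slide.

```python
import math


def log_likelihood(obs, A, B, pi):
    """Return log P(obs | model) via the scaled forward algorithm.

    obs : list of integer symbol indices, one per opcode
    A   : N x N state transition probabilities, A[i][j]
    B   : N x M observation probabilities, B[state][symbol]
    pi  : initial state distribution, length N
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n_states = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns this sequence zero probability
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def score_per_opcode(obs, A, B, pi):
    """Log likelihood per opcode, so files of different lengths are comparable."""
    return log_likelihood(obs, A, B, pi) / len(obs)


def looks_like_family(obs, A, B, pi, threshold):
    """Compare the score to a predetermined threshold (value is hypothetical)."""
    return score_per_opcode(obs, A, B, pi) >= threshold
```

Rescaling `alpha` at every step matters in practice: opcode sequences run to thousands of symbols, and the unscaled forward probabilities would underflow to zero long before the end of the file.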

HMM Scoring: Fine Points
• Score computed as log likelihood of the scored sequence
  o We normalize score to “log likelihood per opcode” (LLPO)
  o Why?
• How to quantify effectiveness?
  o ROC curves are very useful
  o Specifically, area under ROC curve (AUC)
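
On the "Why?": a raw log likelihood is a sum over the whole sequence, so longer files score lower simply because they are longer; dividing by the number of opcodes removes that length dependence. For the AUC, a minimal sketch is given below. It uses the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive (family) score exceeds a randomly chosen negative (benign) score, with ties counted as one half. The function name and the quadratic pairwise loop are choices made here for clarity, not the paper's procedure.

```python
def auc(family_scores, benign_scores):
    """Area under the ROC curve from two lists of scores.

    Assumes higher scores mean "more like the family".
    """
    if not family_scores or not benign_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for f in family_scores:
        for b in benign_scores:
            if f > b:
                wins += 1.0
            elif f == b:
                wins += 0.5
    return wins / (len(family_scores) * len(benign_scores))
```

An AUC of 1.0 means some threshold separates the two sets perfectly; 0.5 means the score is no better than guessing.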

Results
• HMM scoring for NGVCK family

HMM Scoring: Bottom Line
• Signature detection works for the metamorphic families tested, except NGVCK
• For NGVCK, we can use HMM
  o Classification is 100% when compared to normal (benign) files
  o Some misclassifications of other malware (is that good or bad?)
• Should include ROC curves, AUC, …

HMM States: 3 State Model

N-gram Score
• Can also score files using N-grams
• Randomly select NGVCK file
  o Extract its opcode sequence
• Given a file we want to score
  o Extract its opcode sequence
  o N-gram similarity to NGVCK sequence
  o Higher similarity, classify as NGVCK
  o Lower similarity, classify as “not NGVCK”
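
The classification rule on this slide reduces to a single comparison against a threshold. The sketch below keeps the similarity measure pluggable. `toy_similarity` is a stand-in (Jaccard overlap of order-insensitive opcode trigrams), not the n-gram score defined earlier in the deck, and the threshold passed to `classify` is a hypothetical value that would have to be chosen from data.

```python
def trigram_set(opcodes):
    """Set of length-3 opcode windows, each stored with its order ignored."""
    return {tuple(sorted(opcodes[i:i + 3])) for i in range(len(opcodes) - 2)}


def toy_similarity(a, b):
    """Stand-in similarity in [0, 1]: Jaccard overlap of opcode trigrams."""
    set_a, set_b = trigram_set(a), trigram_set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def classify(candidate, reference, similarity, threshold):
    """Label a file by its similarity to one reference NGVCK opcode sequence."""
    score = similarity(candidate, reference)
    label = "NGVCK" if score >= threshold else "not NGVCK"
    return label, score
```

Because the reference file is picked at random, the outcome depends on that pick; averaging the score over several reference files would be one way to reduce that dependence.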

N-gram Score Results?
• For NGVCK, obtain ideal separation
  o There exists a threshold for which…
  o …we can separate NGVCK from normal
• Surprisingly strong results
  o For such a simple similarity score
• Why does this work?
  o We come back to this at the end…
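
"Ideal separation" has a simple operational meaning: the lowest score in the family set is still above the highest score in the normal set. A minimal check, assuming higher scores mean "more like the family", is sketched below. The function name and the choice of the midpoint as the returned threshold are illustrative assumptions.

```python
def separating_threshold(family_scores, normal_scores):
    """Return a threshold that perfectly separates the two sets, or None.

    Any value strictly between the two extremes would do; the midpoint is
    returned here only as a convenient choice.
    """
    lowest_family = min(family_scores)
    highest_normal = max(normal_scores)
    if lowest_family > highest_normal:
        return (lowest_family + highest_normal) / 2
    return None
```

A return value of `None` means the score ranges overlap, so every threshold misclassifies at least one file.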

Compare to Commercial AV
• Tested the following scanners on our virus sets
  o eTrust, avast!, AVG
• These scanners detected most of the viruses from weak families
  o That is, G2, VCL32, etc.
• But none of the NGVCK viruses were detected by any of the 3 scanners

Conclusion
• HMM effective at detecting the highly metamorphic NGVCK malware family
• N-gram similarity also effective
• NGVCK not detected by commercial AV
• So, this detection improves the state of the art
• Practical considerations?

Lessons Learned?
• Why can we detect NGVCK family?
• In spite of high metamorphism, code is statistically different from normal
• “Improved” metamorphic malware?
• Metamorphism must be sufficient to evade signature detection
• But metamorphic family must be statistically similar to normal

Future Work
• Build a better metamorphic generator
  o Some progress here, but still detectable using other detection methods
  o Still need better generators…
• Develop and test other detection strategies
  o Lots of work done here too
  o But lots more to do

References
• W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology 2(3), 2006
• M. Stamp, A revealing introduction to hidden Markov models