Simple Substitution Distance and Metamorphic Detection
Gayathri Shanmugam, Richard M. Low, Mark Stamp

The Idea
- Metamorphic malware “mutates” with each infection
- Measuring software similarity is a possible means of detection
- But, how to measure similarity?
  - Much relevant previous work
- Here, a novel distance measure is considered

- We treat each metamorphic copy as if it is an “encrypted” version of a “base” virus
  - Where the “cipher” is a simple substitution
- Why simple substitution?
  - Easy to work with, and there is a fast algorithm to solve it
- Why might this work?
  - Simple substitution “cryptanalysis” tends to yield results that match family statistics
  - It accounts for file modifications similar to some common metamorphic techniques

Motivation
- Given a simple substitution ciphertext where the plaintext is English…
  - If we cryptanalyze using English language statistics, we expect a good score
  - If we cryptanalyze using, say, French language statistics, we expect a not-so-good score
- We can obtain opcode statistics for a metamorphic family
  - Using simple substitution cryptanalysis, a virus of the same family should score well…
  - …but a benign exe should not score as well
  - Assuming the statistics of these families differ

Metamorphic Techniques
- Many possible morphing strategies
- Here, we briefly consider
  - Register swapping
  - Garbage code insertion
  - Equivalent substitution
  - Transposition
  - Formal grammar mutation
- At a high level: substitution, transposition, insertion, and deletion

Register Swap
- Register swapping
  - E.g., replace the EBX register with EAX, provided EAX is not in use
- Very simple, and used in some of the first metamorphic malware
- Not very effective
  - Why not?

Garbage Insertion
- Garbage code insertion
- Two cases:
  - Dead code: inserted, but not executed
    - We can simply JMP over dead code
  - Do-nothing instructions: executed, but have no effect on the program
    - Like NOP or ADD EAX,0
- Relatively easy to implement
- Effective at breaking signature detection

Code Substitution
- Equivalent instruction substitution
  - For example, can replace SUB EAX,EAX with XOR EAX,EAX
- Does not need to be a 1-for-1 substitution
  - That is, can include insertion/deletion
- Unlimited number of substitutions
- Very effective
- Somewhat difficult to implement

Transposition
- Transposition
  - Reorder instructions that have no dependency
- For example, MOV R1,R2 followed by ADD R3,R4 can be reordered as ADD R3,R4 followed by MOV R1,R2
- Can be highly effective
- But, can be difficult to implement
  - Sometimes applied only to subroutines

Formal Grammar Mutation
- Formal grammar mutation
- View the morphing engine as a nondeterministic automaton
  - Allow transitions between any symbols
  - Apply formal grammar rules
- Obtain many variants, high variation
- Really just a formalization of other approaches, not a separate technique

Previous Work
- Easy to prove that “good” metamorphic code is immune to signature detection
  - Why?
- But, many successes detecting hacker-produced metamorphic malware…
  - HMM/PHMM/machine learning
  - Graph-based techniques
  - Statistics (chi-squared, naïve Bayes)
  - Structural entropy
  - Linear algebraic techniques

This Research
- Measure similarity using “simple substitution distance”
- We “decrypt” a suspect file using statistics from a metamorphic family
  - If the decryption is good, we classify the file as a member of that metamorphic family
  - If the decryption is poor, we classify it as NOT a member of the given metamorphic family

Simple Substitution Cipher
- Simple substitution is one of the oldest and simplest means of encryption
- A fixed key is used to substitute letters
  - For example, Caesar’s cipher: substitute the letter 3 positions ahead in the alphabet
  - In general, any permutation of the alphabet can be the key
- Simple substitution cryptanalysis?
  - Statistical analysis of the ciphertext
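Below is a minimal Python sketch of a simple substitution cipher. It assumes the text is uppercase A to Z only. A key is written as a 26-letter permutation in which plaintext letter `ALPHABET[i]` encrypts to `key[i]`, and the Caesar cipher is the special case of a shift by 3. All function names are hypothetical and chosen for illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def caesar_key(shift: int) -> str:
    """Key as a permutation: plaintext letter ALPHABET[i] encrypts to key[i]."""
    return ALPHABET[shift:] + ALPHABET[:shift]


def random_key(seed: int) -> str:
    """Any permutation of the alphabet is a valid simple substitution key."""
    letters = list(ALPHABET)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


def encrypt(plaintext: str, key: str) -> str:
    return plaintext.translate(str.maketrans(ALPHABET, key))


def decrypt(ciphertext: str, key: str) -> str:
    # Invert the substitution by mapping key letters back to the alphabet.
    return ciphertext.translate(str.maketrans(key, ALPHABET))


print(encrypt("ATTACKATDAWN", caesar_key(3)))  # DWWDFNDWGDZQ
```

There are only 26! keys. Exhaustive search is already impractical at that size, which is why the attack relies on statistics.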

Simple Substitution Cryptanalysis
- Suppose you observe the ciphertext
    PBFPVYFBQXZTYFPBFEQJHDXXQVAPTPQJKTOYQWIPBVWLXTOXBTFXQW
    AXBVCXQWAXFQJVWLEQNTOZQGGQLFXQWAKVWLXQWAEBIPBFXFQVX
    GTVJVWLBTPQWAEBFPBFHCVLXBQUFEVWLXGDPEQVPQGVPPBFTIXPFHXZH
    VFAGFOTHFEFBQUFTDHZBQPOTHXTYFTODXQHFTDPTOGHFQPBQWAQJJ
    TODXQHFOQPWTBDHHIXQVAPBFZQHCFWPFHPBFIPBQWKFABVYYDZBOT
    HPBQPQJTQOTOGHFQAPBFEQJHDXXQVAVXEBQPEFZBVFOJIWFFACFCCF
    HQWAUVWFLQHGFXVAFXQHFUFHILTTAVWAFFAWTEVOITDHFHFQAITIX
    PFHXAFQHEFZQWGFLVWPTOFFA
- Analyze frequency counts…
- Likely that ciphertext “F” represents “E”
  - And so on, at least for common letters

Simple Substitution Cryptanalysis
- Can even automate the attack
  1. Make an initial guess for the key using frequency counts
  2. Compute oldScore
  3. Modify the key by swapping adjacent elements
  4. Compute newScore
  5. If newScore > oldScore then oldScore = newScore
  6. Else unswap the elements
  7. Goto 3
- How to compute the score?
  - Number of dictionary words in the putative plaintext?
  - Much better to use English digraph statistics
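Below is a minimal sketch of the hill-climbing steps above, written under several assumptions that are not taken from the slides:
- The score is the negated sum of absolute differences between the digraph frequencies of the putative plaintext and those of a `reference` text. The reference text stands in for real English digraph statistics.
- Ties in the frequency ranking are broken alphabetically.
- The "Goto 3" loop stops after a full pass with no improvement.

The names `hill_climb`, `score`, and `by_frequency` are hypothetical.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def digraph_matrix(text: str) -> list[list[float]]:
    """26 x 26 relative digraph frequencies of an uppercase A-Z string."""
    idx = {c: i for i, c in enumerate(ALPHABET)}
    m = [[0.0] * 26 for _ in range(26)]
    pairs = max(len(text) - 1, 1)
    for a, b in zip(text, text[1:]):
        m[idx[a]][idx[b]] += 1.0 / pairs
    return m


def score(plaintext: str, expected: list[list[float]]) -> float:
    """Higher is better: negated sum of absolute digraph differences."""
    d = digraph_matrix(plaintext)
    return -sum(abs(d[i][j] - expected[i][j]) for i in range(26) for j in range(26))


def by_frequency(text: str) -> str:
    """All 26 letters, most frequent first (ties in alphabetical order)."""
    counts = Counter(text)
    return "".join(sorted(ALPHABET, key=lambda c: -counts[c]))


def hill_climb(ciphertext: str, reference: str):
    expected = digraph_matrix(reference)
    cipher_order = by_frequency(ciphertext)
    plain = list(by_frequency(reference))  # step 1: initial guess from frequencies

    def decrypt(p: list[str]) -> str:
        # Every score needs a full decryption: this is what makes the method slow.
        return ciphertext.translate(str.maketrans(cipher_order, "".join(p)))

    initial = old = score(decrypt(plain), expected)  # step 2
    improved = True
    while improved:  # assumption: stop after a full pass with no improvement
        improved = False
        for i in range(25):
            plain[i], plain[i + 1] = plain[i + 1], plain[i]  # step 3: swap adjacent
            new = score(decrypt(plain), expected)  # step 4
            if new > old:  # step 5: keep the swap
                old, improved = new, True
            else:  # step 6: unswap
                plain[i], plain[i + 1] = plain[i + 1], plain[i]
    return decrypt(plain), initial, old
```

Each accepted swap strictly raises the score, so the loop must terminate. It can still stop at a local maximum, which is why a better scoring function and swap schedule matter.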

Jackobsen’s Algorithm  Method on previous slide can be slow o Why?  Jackobsen’s algorithm uses similar idea, but fast and efficient o Ciphertext is only decrypted once o So algorithm is (essentially) independent of length of message o Then, only matrix manipulations required 16 Simple Substitution Distance

Jakobsen’s Algorithm: Swapping
- Assume the plaintext is English, 26 letters
- Let K = k1, k2, k3, …, k26 be the putative key
  - And let “|” represent “swap”
- Then we swap elements as follows:
    k1|k2, k2|k3, k3|k4, …, k25|k26
    k1|k3, k2|k4, …, k24|k26
    …
    k1|k26
- Also, we restart this swapping schedule from the beginning whenever the score improves
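Below is a small generator for the swap schedule. It assumes the ordering Jakobsen describes: all adjacent pairs first, then pairs two apart, and so on, ending with k1|k26. Indices are 0-based, so `(0, 1)` means k1|k2. The function name `swap_schedule` is hypothetical.

```python
from itertools import combinations


def swap_schedule(n: int = 26):
    """Yield (i, j) index pairs: k1|k2, k2|k3, ..., then k1|k3, k2|k4, ..., finally k1|kn."""
    for gap in range(1, n):
        for i in range(n - gap):
            yield i, i + gap


schedule = list(swap_schedule())
print(len(schedule))               # 325, i.e., 26 choose 2
print(schedule[:3], schedule[-1])  # [(0, 1), (1, 2), (2, 3)] (0, 25)
```

A pass that finds no improvement visits each of the 26 choose 2 pairs exactly once. This is why a minimum of 325 swaps is needed before the algorithm can stop.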

Jakobsen’s Algorithm: Swapping
- Minimum number of swaps is 26 choose 2, or 325
- Maximum is unbounded
- Each swap requires a score computation
- Average number of swaps? Experimentally
  - Ciphertext of length 500: average of 1050 swaps
  - Ciphertext of length 8000: average of just 630 swaps
- So, the number of swaps depends on the length of the ciphertext
  - More ciphertext, better scores, fewer swaps

Jakobsen’s Algorithm: Scoring
- Let D = {d_ij} be the digraph distribution corresponding to the putative key K
- Let E = {e_ij} be the digraph distribution of the English language
- These matrices are 26 x 26
- Compute the score as
    \mathrm{score}(K) = \sum_{i,j} |d_{ij} - e_{ij}|
  where a smaller score means a better match
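Below is a minimal sketch of the digraph matrix and the score. It assumes the score is the sum of absolute differences between corresponding entries of D and E. It also assumes E has been scaled to the same number of digraphs as D and uses the same index order. The names `digraph_counts` and `jakobsen_score` are hypothetical.

```python
def digraph_counts(text: str, alphabet: str) -> list[list[int]]:
    """d[i][j] = number of times alphabet[i] is immediately followed by alphabet[j]."""
    idx = {c: i for i, c in enumerate(alphabet)}
    d = [[0] * len(alphabet) for _ in alphabet]
    for a, b in zip(text, text[1:]):
        d[idx[a]][idx[b]] += 1
    return d


def jakobsen_score(d, e) -> float:
    """Sum of |d_ij - e_ij| over all cells; smaller means closer to the expected statistics."""
    return sum(abs(x - y) for row_d, row_e in zip(d, e) for x, y in zip(row_d, row_e))


d = digraph_counts("AABAB", "AB")  # digraphs AA, AB, BA, AB
print(d)                                    # [[1, 2], [1, 0]]
print(jakobsen_score(d, [[1, 1], [1, 1]]))  # 2
```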

Jackobsen’s Algorithm  So far, nothing fancy here o Could see all of this in a CS 265 assignment  Jackobsen’s trick: Determine new D matrix from old D without decrypting  How to do so? o It turns out that swapping elements of K swaps corresponding rows and columns of D  See example on next slides… 20 Simple Substitution Distance

Swapping Example
- To simplify, suppose a 10-letter alphabet: E, T, A, O, I, N, S, R, H, D
- Suppose you are given the ciphertext
    TNDEODRHISOADDRTEDOAHENSINEOAR
    DTTDTINDDRNEDNTTTDDISRETEEEEEAA
- Frequency counts are given by
  [Table: ciphertext letter frequency counts]
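The sketch below computes the frequency counts for the ciphertext as transcribed here. It then builds a frequency-rank putative key in which the i-th most frequent ciphertext letter decrypts to the i-th letter of E, T, A, O, I, N, S, R, H, D. Breaking ties alphabetically is an assumption. With that choice, the first 30 letters of the putative plaintext agree with the ones on the next slide.

```python
from collections import Counter

PLAIN_BY_FREQ = "ETAOINSRHD"  # the 10-letter alphabet, most frequent first
CIPHERTEXT = "TNDEODRHISOADDRTEDOAHENSINEOAR" "DTTDTINDDRNEDNTTTDDISRETEEEEEAA"

counts = Counter(CIPHERTEXT)
# Rank ciphertext letters by count; break ties alphabetically (an assumption).
cipher_by_freq = "".join(sorted(counts, key=lambda c: (-counts[c], c)))
# Putative key: the i-th most frequent ciphertext letter decrypts to PLAIN_BY_FREQ[i].
putative = CIPHERTEXT.translate(str.maketrans(cipher_by_freq, PLAIN_BY_FREQ))

print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
# [('D', 12), ('E', 11), ('T', 9), ('N', 6), ('A', 5), ('R', 5), ('I', 4), ('O', 4), ('S', 3), ('H', 2)]
print(cipher_by_freq)  # DETNARIOSH
print(putative[:30])   # AOETRENDSHRIEENATERIDTOHSOTRIN
```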

Swapping Example
- We choose the putative key K given here
  [Table: putative key K]
- The corresponding putative plaintext is
    AOETRENDSHRIEENATE
    RIDTOHSOTRINEAAEAS
    OEENOTEOAAAEESHNA
    TTTTTII
- The corresponding digraph distribution D is
  [Matrix: D]

Swapping Example
- Suppose we swap the first 2 elements of K
- Then decrypt using the new K
- And compute the digraph matrix for the new K
  [Figure: previous key K and new key K]

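Swapping Example
- Old D matrix vs. new D matrix
  [Figure: old and new D matrices]
- What do you notice?
  - The first two rows and the first two columns of D have been swapped
- So what’s the point here?
  - The new D can be obtained from the old D without decrypting again
- This is good!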
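The row-and-column property can be checked directly on the 10-letter example. The sketch below takes the following convention: `key[i]` is the ciphertext letter assumed to decrypt to `PLAIN[i]`, and the frequency-rank key breaks ties alphabetically (an assumption). It swaps the first two key elements, decrypts again, and compares the result with the old D after swapping its first two rows and columns. The helper names are hypothetical.

```python
from collections import Counter

PLAIN = "ETAOINSRHD"  # plaintext alphabet, most frequent first
CIPHERTEXT = "TNDEODRHISOADDRTEDOAHENSINEOAR" "DTTDTINDDRNEDNTTTDDISRETEEEEEAA"


def digraphs(text: str, alphabet: str) -> list[list[int]]:
    """D[i][j] = count of alphabet[i] followed by alphabet[j] in text."""
    idx = {c: i for i, c in enumerate(alphabet)}
    d = [[0] * len(alphabet) for _ in alphabet]
    for a, b in zip(text, text[1:]):
        d[idx[a]][idx[b]] += 1
    return d


def decrypt(ciphertext: str, key: str) -> str:
    """key[i] is the ciphertext letter assumed to decrypt to PLAIN[i]."""
    return ciphertext.translate(str.maketrans(key, PLAIN))


def swap_rows_and_cols(d: list[list[int]], a: int, b: int) -> list[list[int]]:
    """Copy of D with rows a, b swapped and then columns a, b swapped."""
    m = [row[:] for row in d]
    m[a], m[b] = m[b], m[a]
    for row in m:
        row[a], row[b] = row[b], row[a]
    return m


counts = Counter(CIPHERTEXT)
key = "".join(sorted(counts, key=lambda c: (-counts[c], c)))  # frequency-rank key
old_d = digraphs(decrypt(CIPHERTEXT, key), PLAIN)

new_key = key[1] + key[0] + key[2:]  # swap the first two elements of K
new_d = digraphs(decrypt(CIPHERTEXT, new_key), PLAIN)  # the slow way: decrypt again

print(new_d == swap_rows_and_cols(old_d, 0, 1))  # True: no decryption needed
```

The property holds because D[i][j] counts the ciphertext digraph key[i]key[j]. Permuting the key therefore permutes the rows and columns of D in the same way.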

Jakobsen’s Algorithm
[Figure: pseudocode for Jakobsen’s algorithm]
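Below is a compact sketch combining the pieces: a frequency-rank initial key, a single pass over the ciphertext to build D, the swap schedule with a restart on improvement, and row/column swaps in place of re-decryption. It rests on these assumptions:
- The score is the sum of absolute differences between digraph frequencies.
- Symbols are single characters.
- The ciphertext uses the same symbol set as `plain_by_freq`.

It is an illustration, not the authors' implementation, and all names are hypothetical.

```python
from collections import Counter


def digraph_freqs(seq, alphabet):
    """Relative digraph frequencies; m[i][j] is the share of pairs alphabet[i] alphabet[j]."""
    idx = {s: i for i, s in enumerate(alphabet)}
    m = [[0.0] * len(alphabet) for _ in alphabet]
    total = max(len(seq) - 1, 1)
    for a, b in zip(seq, seq[1:]):
        m[idx[a]][idx[b]] += 1.0 / total
    return m


def distance(d, e):
    """Sum of absolute differences; smaller is better."""
    return sum(abs(x - y) for rd, re_ in zip(d, e) for x, y in zip(rd, re_))


def swap_rows_cols(m, a, b):
    """In place: the effect on D of swapping key elements a and b."""
    m[a], m[b] = m[b], m[a]
    for row in m:
        row[a], row[b] = row[b], row[a]


def jakobsen(ciphertext, expected, plain_by_freq):
    """expected[i][j]: digraph frequency of plain_by_freq[i] followed by plain_by_freq[j]."""
    n = len(plain_by_freq)
    counts = Counter(ciphertext)
    # key[i] is the ciphertext symbol assumed to decrypt to plain_by_freq[i];
    # the initial key pairs symbols by frequency rank (stable sort keeps ties in input order).
    key = sorted(plain_by_freq, key=lambda s: -counts[s])
    d = digraph_freqs(ciphertext, key)  # the only pass over the ciphertext
    best = distance(d, expected)
    restart = True
    while restart:
        restart = False
        for gap in range(1, n):  # schedule: k1|k2, k2|k3, ..., then k1|k3, ...
            for i in range(n - gap):
                j = i + gap
                swap_rows_cols(d, i, j)
                score = distance(d, expected)
                if score < best:  # keep the swap and restart the schedule
                    best = score
                    key[i], key[j] = key[j], key[i]
                    restart = True
                    break
                swap_rows_cols(d, i, j)  # undo
            if restart:
                break
    table = str.maketrans("".join(key), "".join(plain_by_freq))
    return ciphertext.translate(table), best
```

Only `distance` and `swap_rows_cols` run inside the loop, and both cost O(n^2) regardless of ciphertext length. This is the sense in which the work is essentially independent of the message length.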

Proposed Similarity Score
- Extract opcode sequences from a collection of viruses
  - All viruses from the same metamorphic family
- Determine the n most common opcodes
  - Symbol n+1 is used for all “other” opcodes
- Use the resulting digraph statistics to form the matrix E = {e_ij}
  - Note that the matrix is (n+1) x (n+1)

Scoring a File
- Given an executable we want to score
- Extract its opcode sequence
- Use opcode digraph statistics to get D = {d_ij}
  - This matrix is also (n+1) x (n+1)
- Initial “key” K is chosen to match the monograph statistics of the virus family
  - The most frequent opcode in the exe maps to the most frequent opcode in the virus family, etc.
- Score based on the distance between D and E
  - “Decrypt” D and score how closely it matches E
  - Jakobsen’s algorithm is used for the “decryption”
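Below is a toy sketch of the preprocessing on the two slides above. It builds the family matrix E over the n most common opcodes plus an "other" symbol, then forms the initial "key" for an exe by matching frequency ranks, and finally builds D for the "decrypted" exe. Several choices are assumptions:
- The label `<other>` is hypothetical.
- Ranking "other" alongside the real opcodes is our own choice.
- Digraphs are counted within each file and left unnormalized.
- The opcode lists are made up, not taken from any real virus.

```python
from collections import Counter

OTHER = "<other>"  # hypothetical label for symbol n+1, "all other opcodes"


def top_opcodes(family, n):
    """The n most common opcodes across all virus files in one family."""
    counts = Counter(op for seq in family for op in seq)
    return [op for op, _ in counts.most_common(n)]


def to_symbols(seq, top):
    """Keep the top-n opcodes; collapse everything else into OTHER."""
    keep = set(top)
    return [op if op in keep else OTHER for op in seq]


def ranked(seqs, symbols):
    """Symbols sorted by descending count (stable: ties keep the given order)."""
    counts = Counter(s for seq in seqs for s in seq)
    return sorted(symbols, key=lambda s: -counts[s])


def digraph_matrix(seqs, symbols):
    """(n+1) x (n+1) digraph counts; pairs are taken within each file only."""
    idx = {s: i for i, s in enumerate(symbols)}
    m = [[0] * len(symbols) for _ in symbols]
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            m[idx[a]][idx[b]] += 1
    return m


family = [["mov", "push", "mov", "call", "add"], ["mov", "push", "pop", "mov"]]
exe = ["push", "push", "mov", "xor", "push"]

top = top_opcodes(family, 2)
symbols = top + [OTHER]
fam = [to_symbols(seq, top) for seq in family]
fam_rank = ranked(fam, symbols)
E = digraph_matrix(fam, fam_rank)

exe_sym = to_symbols(exe, top)
key = dict(zip(ranked([exe_sym], symbols), fam_rank))  # initial "key"
D = digraph_matrix([[key[s] for s in exe_sym]], fam_rank)  # "decrypted" exe

print(fam_rank)  # ['mov', '<other>', 'push']
print(key)       # {'push': 'mov', 'mov': '<other>', '<other>': 'push'}
print(D)         # [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
```

From here, D and a suitably scaled E would be passed to a Jakobsen-style swap loop. The final distance becomes the file's score.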

Example
- Suppose there are only 5 common opcodes in the family viruses (in descending frequency)
- Extract the following sequence from an exe
- The initial “key” is
- And the “decrypt” is
  [Figures: family opcodes, exe opcode sequence, initial key, and “decrypted” sequence]

Example
- Given the “decrypt”
- Form the D matrix
- After a swap…
  - And so on…
  [Figures: “decrypted” sequence, D matrix, and D matrix after a swap]

Scoring Algorithm
[Figure: pseudocode for the scoring algorithm]

Quantifying Success
- Consider these 2 scatterplots of scores
  [Figure: two scatterplots of scores]
- Which is better (and why)?

ROC Curves
- Plot the true positive rate vs. the false positive rate
  - As the “threshold” varies
- A curve nearer the 45-degree line is bad
- A curve nearer the upper-left corner is good

ROC Curves
- Use ROC curves to quantify success
- Area under the ROC curve (AUC)
  - Probability that a randomly chosen positive instance scores higher than a randomly chosen negative instance
- AUC of 1.0 implies ideal detection
- AUC of 0.5 means classification is no better than flipping a coin
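The sketch below computes the AUC in two ways that agree: as the pairwise probability in the definition above (ties counted as 1/2), and as the trapezoid area under the ROC points obtained by sweeping the threshold. The scores are made up, and the sketch assumes higher means "more virus-like". The function names are hypothetical.

```python
def auc_pairwise(pos, neg):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def roc_points(pos, neg):
    """(FPR, TPR) points, classifying 'positive' when score >= threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(q >= t for q in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


viruses = [0.9, 0.8, 0.4]  # made-up scores, higher = more virus-like
benign = [0.7, 0.3, 0.2]
print(auc_pairwise(viruses, benign))               # about 0.889 (8 of 9 pairs ordered correctly)
print(auc_trapezoid(roc_points(viruses, benign)))  # the same value, up to rounding
```

A distance score such as simple substitution distance is smaller for family members. With that kind of score, the values would be negated before applying these functions.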

Parameter Selection
- Tested the following parameters
  - Opcode matrix size
  - Scoring function
  - Normalization
  - Swapping strategy
- None significant, except matrix size
  - So we only give results for matrix size here

Opcode Matrix Size
- Obtained the following results
  [Figure: results for various opcode matrix sizes]
- So, ironically, we use a 26 x 26 matrix, the same size as the English digraph matrix

Test Data
- Tested the following metamorphic families
  - G2: known to be weak
  - NGVCK: highly metamorphic
  - MWOR: highly metamorphic and stealthy
- MWOR “padding ratios” of 0.5 to 4.0
- For G2 and NGVCK
  - 50 files tested, Cygwin utilities for benign files
- For each MWOR padding ratio
  - 100 files tested, Linux utilities for benign files
- 5-fold cross-validation in each experiment
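Below is a minimal sketch of a 5-fold split. It assumes the virus files of one family are split round-robin without shuffling. In each fold, the family matrix E would be built from the training files only, and the held-out viruses and the benign files would then be scored. The slides do not say exactly how the files were partitioned, so both the split and the file names are hypothetical.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) pairs; fold i holds items i, i+k, i+2k, ..."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


viruses = [f"virus{i:02d}" for i in range(50)]  # hypothetical file names
for train, test in k_fold_splits(viruses):
    # Build the family matrix E from `train` only, then score `test`
    # together with the benign files.
    print(len(train), len(test))  # 40 10 (printed five times)
```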

NGVCK and G2 Graphs

MWOR Score Graphs

MWOR ROC Curves

MWOR AUC Statistics

Efficiency

Conclusions
+ Simple substitution score: good results for challenging metamorphic viruses
+ Scoring is fast and efficient
+ Applicable to other types of malware
- Requires opcodes

References
- G. Shanmugam, R. M. Low, and M. Stamp, Simple substitution distance and metamorphic detection, Journal of Computer Virology and Hacking Techniques, 9(3), 2013