Detection of ASCII Malware
Parbati Kumar Manna, Dr. Sanjay Ranka, Dr. Shigang Chen
2 Internet Worm and Malware
Huge damage potential: infects hundreds of thousands of computers and costs millions of dollars in damage
Examples: Melissa, ILOVEYOU, Code Red, Nimda, Slammer, SoBig, MyDoom
Mostly uses buffer overflow
Propagation is (mostly) automatic
3 Recent Trends Shift in hacker’s mindset Malware becoming increasingly evasive and obfuscative Emergence of Zero-day worms Arrival of Script Kiddies
4 Motivation for ASCII Attacks Prevalence of servers expecting text-only input Text-based protocols Presumption of text being benign Deployment of ASCII filter for bypassing text
5 IDS Detecting ASCII Attack?
Disassembly-based IDS: all jump instructions are ASCII; higher proportion of branches; exponential disassembly cost; high processing overhead for IDS
Frequency-based IDS: PAYL evaded by ASCII worm
6 Buffer Overflow
7 Constraints of ASCII Malware
Opcode unavailability: shellcode requires binary opcodes, but here only xor, and, sub, cmp, etc. are available, so the remaining opcodes must be generated dynamically
Difficulty in encryption: no backward jump, so the same decrypter routine cannot be reused for each encrypted block
No one-to-one correspondence between ASCII and binary
8 Creation of ASCII Malware
9 Buffer Overflow using ASCII Overflowing a buffer using an ASCII string:
10 Detection of ASCII Malware
Opcode unavailability: dynamic generation of opcodes needs more ASCII instructions for each binary instruction
Difficulty in encryption: no backward jump means the decrypter block for each encrypted block must be hardcoded
Result: a long sequence of contiguous valid instructions is likely, giving a high MEL
What is this MEL?
11 Maximum Executable Length (MEL)
Indicates the maximum length of an execution path
Need to disassemble (and execute) from all possible entry points
All branching must be considered
Abstract payload execution: used for binary worms with a sled; its effectiveness has since dwindled
12 Benign Text has Low MEL
Contains characters that correspond to invalid instructions: privileged instructions (I/O), arbitrary segment selectors
More memory-accessing instructions, which may use uninitialized registers
Result: a long sequence of contiguous valid instructions is unlikely, giving a low MEL
13 Proposed Solution
Question: how long is "long"?
Find the maximum length of a valid instruction sequence in the stream
If it is long enough, the stream contains malware
14 Probabilistic Analysis
Toss a coin n times. What is the probability that the maximum distance between two consecutive heads is at most x?
Head (H) = invalid instruction (I); tail (T) = valid instruction (V)
T H T T H T T T T T H T T T
V I V V I V V V V V I V V V
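The coin-toss analogy can be sketched in Python. This is only an illustration: it assumes the per-instruction validity markers are already available (producing them from raw bytes needs a disassembler, which is not shown), and the helper name `max_valid_run` is hypothetical.

```python
def max_valid_run(flags):
    """Return the longest run of consecutive 'V' (valid / tail) markers.

    `flags` is an iterable of 'V' (valid instruction) and 'I' (invalid
    instruction) markers. Any marker other than 'V' ends the current run.
    """
    longest = 0
    current = 0
    for flag in flags:
        if flag == "V":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Sequence from the slide: T H T T H T T T T T H T T T
# mapped to validity markers: V I V V I V V V V V I V V V
slide_sequence = "V I V V I V V V V V I V V V".split()
print(max_valid_run(slide_sequence))
```

The value returned for a stream is what gets compared against the threshold.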
15 Probabilistic Analysis
n = number of coin tosses
p = probability of a head
$X_i$ = random variables for the inter-head distances
$X_{\max}$ = maximum inter-head distance
C.D.F. of $X_{\max}$: $\Pr[X_{\max} \le x] = [1 - p(1-p)^{x}]^{n}$
False positive rate: $1 - \Pr[X_{\max} \le \tau] = 1 - [1 - p(1-p)^{\tau}]^{n}$
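The two expressions on this slide translate directly into code. The sketch below simply evaluates them as written; the function names and the sample values of n and p are assumptions chosen for illustration, not figures from the experiments.

```python
def cdf_x_max(x, n, p):
    """Prob[X_max <= x] = [1 - p(1-p)^x]^n, as given on the slide."""
    return (1.0 - p * (1.0 - p) ** x) ** n


def false_positive_rate(tau, n, p):
    """Probability that a benign stream exceeds the threshold tau."""
    return 1.0 - cdf_x_max(tau, n, p)


# Hypothetical parameters: 1000 instructions, 20% of them invalid.
for tau in (20, 40, 60):
    print(tau, false_positive_rate(tau, n=1000, p=0.2))
```

For fixed n and p, the false positive rate falls as the threshold grows.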
16 Probabilistic Analysis For a fixed N = k (exactly k invalid instructions)
17 Probabilistic Analysis For all possible values of N:
18 Threshold Calculation
Known: n, p, false positive rate
Unknown: threshold τ (max inter-head distance)
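Solving the false positive expression 1 - [1 - p(1-p)^τ]^n for τ gives a closed form, sketched below. The function name `threshold`, the parameter name `fp_rate`, the rounding up to an integer, and the sample inputs are all assumptions made for this illustration.

```python
import math


def threshold(n, p, fp_rate):
    """Smallest integer tau with 1 - [1 - p(1-p)^tau]^n <= fp_rate.

    Assumes n >= 1, 0 < p < 1 and 0 < fp_rate < 1.
    """
    # Rearranged: (1-p)^tau <= (1 - (1 - fp_rate)^(1/n)) / p
    # expm1/log1p keep precision when (1 - fp_rate)^(1/n) is close to 1.
    target = -math.expm1(math.log1p(-fp_rate) / n) / p
    if target >= 1.0:
        return 0  # even tau = 0 already meets the requested rate
    # log(1-p) is negative, so dividing flips the inequality: tau >= ratio.
    return math.ceil(math.log(target) / math.log(1.0 - p))


# Hypothetical inputs, for illustration only.
print(threshold(n=1000, p=0.2, fp_rate=0.01))
print(threshold(n=2000, p=0.2, fp_rate=0.01))
print(threshold(n=1000, p=0.1, fp_rate=0.01))
```

In this sketch, raising n or lowering p pushes the returned threshold upward.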
19 Independence Assumption
χ² test contingency table: observed vs. expected counts for instruction I1 valid or invalid against instruction I2 valid or invalid
Validity of an instruction is an independent event
All the $X_i$'s are independent (while $\sum X_i = n$)
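A minimal sketch of such a χ² check follows. It assumes that I1 and I2 denote consecutive instructions and that validity is given as a list of booleans; the function name and the toy inputs are hypothetical and are not the data behind the slide's table.

```python
def chi_square_2x2(flags):
    """Chi-square statistic for independence of consecutive validity flags.

    `flags` is a sequence of booleans (True = valid instruction).
    Returns (observed 2x2 table, statistic). Rows index the first
    instruction of each pair, columns the second (0 = invalid, 1 = valid).
    """
    if len(flags) < 2:
        raise ValueError("need at least two instructions to form a pair")
    observed = [[0, 0], [0, 0]]
    for first, second in zip(flags, flags[1:]):
        observed[int(first)][int(second)] += 1
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed[i][j] - expected) ** 2 / expected
    return observed, statistic


# Toy input: strictly alternating validity is clearly NOT independent.
table, stat = chi_square_2x2([True, False] * 50)
print(table, stat)
```

With one degree of freedom, a statistic above 3.841 rejects independence at the 5% significance level.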
20 Threshold Calculation
With increasing n, we must choose a larger τ to keep the same false positive rate
21 Threshold Calculation
With decreasing p, we must choose a larger τ to keep the same false positive rate
22 Determine n E [ I ] = E [ Prefix chain length ] + E [ core instruction length ] Obtained from character frequency of input data
23 Determine p
Invalid instructions:
1. Privileged instructions
2. Wrong segment prefix selector
3. Uninitialized memory access
Only 1 and 2 can be determined on a standalone basis
24 Experimental Setup
25 Implementation
26 Experimental Setup
Benign data setup: ASCII stream captured from the live CISE network using Ethereal
Malicious data setup: existing framework used to generate ASCII worms by converting binary worms
Promising experimental results for maximum valid instruction length
Benign: all maximum values are below the threshold
Malicious: values are significantly higher than the threshold
27 Experimental Results (DAWN)
28 Experimental Results (APE-L)
29 Contrasting with APE Full content examination Threshold calculation Sled Vs. malware Exploiting text-specific properties
30 Multilevel Encryption
Encryption: binary to ASCII; decryption: ASCII to binary
Only the decrypter is visible
31 Multilevel Encryption Text 0x20 – 0x3F Text 0x40 – 0x5F Text 0x60 – 0x7E Binary
32 Questions
33 Thank you