Detection of ASCII Malware Parbati Kumar Manna Dr. Sanjay Ranka Dr. Shigang Chen.

2 Internet Worm and Malware
- Huge damage potential
  - Infects hundreds of thousands of computers
  - Costs millions of dollars in damage
  - Melissa, ILOVEYOU, Code Red, Nimda, Slammer, SoBig, MyDoom
- Mostly uses buffer overflow
- Propagation is automatic (mostly)

3 Recent Trends
- Shift in hackers' mindset
- Malware becoming increasingly evasive and obfuscated
- Emergence of zero-day worms
- Arrival of script kiddies

4 Motivation for ASCII Attacks
- Prevalence of servers expecting text-only input
- Text-based protocols
- Presumption of text being benign
- Deployment of ASCII filter for bypassing text

5 IDS Detecting ASCII Attack?
- Disassembly-based IDS
  - All jump instructions are ASCII
  - Higher proportion of branches
  - Exponential disassembly cost
  - High processing overhead for IDS
- Frequency-based IDS
  - PAYL evaded by ASCII worm

6 Buffer Overflow

7 Constraints of ASCII Malware
- Opcode unavailability
  - Shellcode requires binary opcodes
  - Here only xor, and, sub, cmp etc.
  - Must generate opcodes dynamically
- Difficulty in encryption
  - No backward jump
  - Can't use same decrypter routine for each encrypted block
  - No one-to-one correspondence between ASCII and binary
[Figure: ASCII vs. binary byte; labels: "0", "may vary"]

8 Creation of ASCII Malware

9 Buffer Overflow using ASCII
- Overflowing a buffer using an ASCII string:

10 Detection of ASCII Malware
- Opcode unavailability
  - Dynamic generation of opcodes needs more ASCII instructions for each binary instruction
- Difficulty in encryption
  - No backward jump means the decrypter block for each encrypted block must be hardcoded
- Long sequence of contiguous valid instructions likely → high MEL
- What is this MEL?

11 Maximum Executable Length
- Indicates maximum length of an execution path
  - Need to disassemble (and execute) from all possible entry points
  - All branching must be considered
- Abstract payload execution
  - Used for binary worms with sled
  - Effectiveness has since dwindled

12 Benign Text has Low MEL
- Contains characters that correspond to invalid instructions
  - Privileged instruction (I/O)
  - Arbitrary segment selector
  - More memory-accessing instructions, which may use uninitialized registers
- Long sequence of contiguous valid instructions unlikely → low MEL

13 Proposed Solution
- Find out the maximum length of valid instruction sequence
- If it is long enough, the stream contains a malware
- Question: How long is "long"?

14 Probabilistic Analysis
- Toss a coin n times
- What is the probability that the max distance between two consecutive heads is τ?
- Head (H) = Invalid instruction (I); Tail (T) = Valid instruction (V)
  T H T T H T T T T T H T T T
  V I V V I V V V V V I V V V
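The coin-toss analogy can be made concrete with a short sketch. The function name `max_valid_run` is hypothetical, and the sketch assumes that leading and trailing runs of valid instructions count as well as runs strictly between two invalid ones. It maps the slide's toss sequence to instruction labels and finds the longest run of valid instructions.

```python
def max_valid_run(labels):
    """Return the longest run of consecutive 'V' (valid) labels."""
    longest = current = 0
    for label in labels:
        if label == "V":
            current += 1
            longest = max(longest, current)
        else:
            # An invalid instruction ("I") ends the current run.
            current = 0
    return longest


# The slide's example: heads map to invalid (I), tails map to valid (V).
tosses = "T H T T H T T T T T H T T T".split()
labels = ["I" if toss == "H" else "V" for toss in tosses]
print(max_valid_run(labels))  # prints 5
```

The longest run here is the five tails between the second and third heads.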

15 Probabilistic Analysis
- n = number of coin tosses
- p = probability of a head
- X_i = random variables for the inter-head distances
- X_max = maximum inter-head distance
- C.D.F. of X_max: $\Pr[X_{\max} \le x] = [1 - p(1-p)^{x}]^{n}$
- False positive rate: $\alpha = 1 - \Pr[X_{\max} \le \tau] = 1 - [1 - p(1-p)^{\tau}]^{n}$
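As a minimal sketch, the two expressions on this slide can be evaluated directly. The function names and the sample values n = 1000 and p = 0.25 are assumptions chosen for illustration; they do not come from the presentation.

```python
def cdf_max_distance(x, n, p):
    """Prob[X_max <= x] = [1 - p(1-p)^x]^n, as stated on the slide."""
    return (1.0 - p * (1.0 - p) ** x) ** n


def false_positive_rate(tau, n, p):
    """Probability that benign input exceeds the threshold tau."""
    return 1.0 - cdf_max_distance(tau, n, p)


# Hypothetical values: 1000 instructions, 25% of them invalid.
for tau in (20, 30, 40):
    print(tau, false_positive_rate(tau, n=1000, p=0.25))
```

The printed rates shrink as the threshold grows, which is why a larger threshold gives fewer false alarms.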

16 Probabilistic Analysis For a fixed N = k (exactly k invalid instructions)

17 Probabilistic Analysis For all possible values of N:

18 Threshold Calculation
- Known: n, p, α (false positive rate)
- Unknown: τ (max inter-head distance), the threshold
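One way to obtain the unknown threshold is to invert the false positive expression 1 - [1 - p(1-p)^τ]^n for τ. The sketch below does this algebraically and rounds up to an integer. The function names and sample inputs are assumptions, and the slides do not say whether the original work solves for τ in closed form or numerically.

```python
import math


def false_positive_rate(tau, n, p):
    """1 - [1 - p(1-p)^tau]^n, the false positive rate for threshold tau."""
    return 1.0 - (1.0 - p * (1.0 - p) ** tau) ** n


def threshold(n, p, alpha):
    """Smallest integer tau whose false positive rate is at most alpha.

    Assumes n >= 1, 0 < p < 1 and 0 < alpha < 1.
    """
    # 1 - (1 - alpha)^(1/n), computed stably for large n.
    per_position = -math.expm1(math.log1p(-alpha) / n)
    # Solve p(1-p)^tau = per_position for tau.
    tau = math.log(per_position / p) / math.log1p(-p)
    return max(0, math.ceil(tau))


# Hypothetical inputs: 1000 instructions, 25% invalid, 1% false positives.
tau = threshold(n=1000, p=0.25, alpha=0.01)
print(tau, false_positive_rate(tau, 1000, 0.25))
```

Rounding up keeps the achieved rate at or below the requested α, because the rate falls as τ grows.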

19 Independence Assumption
- χ² test contingency table

                 Observed                  Expected
                 I2 valid   I2 invalid     I2 valid   I2 invalid
  I1 valid       8960       2797           8922       2835
  I1 invalid     2797       938            2835       900

- Validity of an instruction is an independent event
- All the X_i's are independent (while Σ X_i = n)
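The expected counts in the contingency table follow from the observed row and column totals, and the χ² statistic sums the squared deviations. The sketch below recomputes both from the observed counts on the slide. The function name is hypothetical. The slide does not state the resulting statistic or the significance level used, so any comparison against a critical value (for example 3.841 for one degree of freedom at the 5% level) is an assumption.

```python
def chi_square_2x2(observed):
    """Return (statistic, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns.
            exp = row_totals[i] * col_totals[j] / total
            expected_row.append(exp)
            statistic += (count - exp) ** 2 / exp
        expected.append(expected_row)
    return statistic, expected


# Observed counts from the slide (rows: I1 valid/invalid, cols: I2 valid/invalid).
observed = [[8960, 2797], [2797, 938]]
statistic, expected = chi_square_2x2(observed)
print([[round(e) for e in row] for row in expected])
print(statistic)
```

The rounded expected counts reproduce the "Expected" half of the slide's table.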

20 Threshold Calculation
- With increasing n, we must choose a larger τ to keep the same false positive rate α

21 Threshold Calculation
- With decreasing p, we must choose a larger τ to keep the same false positive rate α

22 Determine n
- E[I] = E[prefix chain length] + E[core instruction length]
- Obtained from character frequency of input data

23 Determine p
Invalid instructions:
1. Privileged instructions
2. Wrong segment prefix selector
3. Uninitialized memory access
Only 1 and 2 can be determined on a standalone basis

24 Experimental Setup

25 Implementation

26 Experimental Setup
- Benign data setup
  - ASCII stream captured from live CISE network using Ethereal
- Malicious data setup
  - Existing framework used to generate ASCII worm by converting binary worms
- Promising experimental results for max valid instruction length
  - Benign: max values all below threshold τ
  - Malicious: values significantly higher than τ

27 Experimental Results (DAWN)

28 Experimental Results (APE-L)

29 Contrasting with APE
- Full content examination
- Threshold calculation
- Sled vs. malware
- Exploiting text-specific properties

30 Multilevel Encryption
[Figure: encryption and decryption chain; labels: binary, ASCII, binary, "Only visible decrypter"]

31 Multilevel Encryption
[Figure: text ranges 0x20 – 0x3F, 0x40 – 0x5F, 0x60 – 0x7E, and binary]

32 Questions

33 Thank you
