A Model for Self-Modifying Code
Bertrand Anckaert, Matias Madou and Koen De Bosschere
8th Information Hiding Conference, July 11th 2006
Self-Modifying Code
- Problem for reverse-engineering
- Used for hiding program internals:
  - Software protection: copyright protection mechanisms, secret algorithms, …
  - Malicious intent of viruses
- Program optimization
Scope
- Focus: the malicious host paradigm (the code is known)
- Not: the malicious code paradigm
Goal
An internal representation that allows:
- construction and deconstruction
- accurate and conservative analyses and transformations
Overview
- Introduction
- Running example
- Internal representation
- Construction and deconstruction
- Accurate and conservative analyses and transformations
- Applications
Example: ISA
Assembly       Binary          Semantics
movb value to  0xc6 value to   write value to the byte at address to
inc reg        0x40 reg        increment register reg
dec reg        0x48 reg        decrement register reg
push reg       0xff reg        push register reg
jmp to         0x0c to         jump to address to (absolute)
Example: Introduction
Address  Binary     Assembly
0x0      c6 0c 08   movb 0xc 0x8
0x3      40 01      inc %ebx
0x5      c6 0c 05   movb 0xc 0x5
0x8      40 03      inc %edx
0xa      ff 02      push %ecx
0xc      48 01      dec %ebx
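A minimal Python sketch of a decoder for this ISA, checked against the running example. The register numbering (01 = %ebx, 02 = %ecx, 03 = %edx) is an assumption inferred from the example's bytes; the ISA slide does not define it. OPCODES, REGISTERS, PROGRAM, decode and disassemble are illustrative names.

```python
from __future__ import annotations

# Toy ISA: opcode -> (mnemonic, instruction length in bytes).
OPCODES = {
    0xC6: ("movb", 3),  # movb value to: write value to the byte at address to
    0x40: ("inc", 2),   # inc reg
    0x48: ("dec", 2),   # dec reg
    0xFF: ("push", 2),  # push reg
    0x0C: ("jmp", 2),   # jmp to (absolute)
}
# Assumption: register numbers inferred from the running example's encoding.
REGISTERS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}

# Initial code of the running example, addresses 0x0 to 0xd.
PROGRAM = bytes.fromhex("c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01")


def decode(code: bytes, addr: int) -> tuple[str, int]:
    """Decode the instruction at addr; return (assembly text, length in bytes)."""
    opcode = code[addr]
    if opcode not in OPCODES:
        raise ValueError(f"invalid opcode {opcode:#04x} at {addr:#x}")
    mnemonic, length = OPCODES[opcode]
    ops = code[addr + 1 : addr + length]
    if mnemonic == "movb":
        return f"movb {ops[0]:#x} {ops[1]:#x}", length
    if mnemonic == "jmp":
        return f"jmp {ops[0]:#x}", length
    return f"{mnemonic} {REGISTERS[ops[0]]}", length


def disassemble(code: bytes) -> list[tuple[int, str]]:
    """Linear sweep from address 0; adequate for the initial code, where nothing overlaps."""
    out, addr = [], 0
    while addr < len(code):
        text, length = decode(code, addr)
        out.append((addr, text))
        addr += length
    return out


for addr, text in disassemble(PROGRAM):
    print(f"{addr:#x}  {text}")
```

A linear sweep only works while instructions do not overlap; once codebytes are patched (as in the trace), decoding has to follow control flow instead.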
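Example: Trace
Code states during execution:
- Initial: movb 0xc 0x8; inc %ebx; movb 0xc 0x5; inc %edx; push %ecx; dec %ebx
- After step 1 (codebyte 0x8 set to 0c): movb 0xc 0x8; inc %ebx; movb 0xc 0x5; jmp 0x3; push %ecx; dec %ebx
- After step 3 (codebyte 0x5 set to 0c): movb 0xc 0x8; inc %ebx; jmp 0xc; jmp 0x3; push %ecx; dec %ebx (the leftover byte 05 at 0x7 is never executed)
Trace:
1) movb 0xc 0x8
2) inc %ebx
3) movb 0xc 0x5
4) jmp 0x3
5) inc %ebx
6) jmp 0xc
7) dec %ebx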
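The trace can be reproduced with a small interpreter. This is a sketch, assuming the same register numbering as in the running example's bytes (01 = %ebx, 02 = %ecx, 03 = %edx) and assuming execution halts once the program counter runs past the last codebyte (0xd); the slides do not say how the program terminates. The name run is hypothetical.

```python
from __future__ import annotations

# Toy ISA: opcode -> (mnemonic, instruction length in bytes).
OPCODES = {0xC6: ("movb", 3), 0x40: ("inc", 2), 0x48: ("dec", 2),
           0xFF: ("push", 2), 0x0C: ("jmp", 2)}
# Assumption: register numbers inferred from the running example.
REGISTERS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}
PROGRAM = bytes.fromhex("c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01")


def run(image: bytes, max_steps: int = 1000) -> tuple[list[str], dict[str, int]]:
    """Execute self-modifying code; return the trace and the final register values.

    Assumption: the program halts when the program counter leaves the code.
    """
    mem = bytearray(image)  # writable copy: movb patches the code itself
    regs = {name: 0 for name in REGISTERS.values()}
    stack: list[int] = []
    trace: list[str] = []
    pc = 0
    for _ in range(max_steps):
        if pc >= len(mem):
            return trace, regs
        mnemonic, length = OPCODES[mem[pc]]
        operand = mem[pc + 1]
        if mnemonic == "movb":
            target = mem[pc + 2]
            trace.append(f"movb {operand:#x} {target:#x}")
            mem[target] = operand  # the self-modification
            pc += length
        elif mnemonic == "jmp":
            trace.append(f"jmp {operand:#x}")
            pc = operand  # absolute jump
        else:
            reg = REGISTERS[operand]
            trace.append(f"{mnemonic} {reg}")
            if mnemonic == "inc":
                regs[reg] += 1
            elif mnemonic == "dec":
                regs[reg] -= 1
            else:  # push
                stack.append(regs[reg])
            pc += length
    raise RuntimeError("step limit reached")


trace, regs = run(PROGRAM)
for step, insn in enumerate(trace, start=1):
    print(f"{step}) {insn}")
```

The net effect is a single increment of %ebx; inc %edx and push %ecx, although present in the initial code, never run.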
Overview
- Scope
- Running example
- Internal representation
  - Superposition of CFGs
  - Codebytes
  - Codebyte conditional edges
  - Consumption of codebyte values
- Construction and deconstruction
- Analyses and transformations
- Applications
CFG for Traditional Code
- One of the most important internal representations for traditional code
- A representation of a superset of all possible executions
- Well understood how to:
  - construct and deconstruct it
  - perform accurate and conservative analyses and transformations on it
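Traditional CFG Construction for SMC
- Building a CFG in the traditional way from one snapshot of the code yields one CFG per code state. For the initial code: movb 0xc 0x8 → inc %ebx → movb 0xc 0x5 → inc %edx → push %ecx → dec %ebx
- Not conservative: such a CFG is not a superset of the executions. The initial-code CFG contains neither jmp 0x3 nor jmp 0xc, which the trace executes in steps 4 and 6
- Not accurate: the initial-code CFG contains inc %edx and push %ecx, which are never executed
- Transformations such as unreachable code elimination become unsafe: in the CFG of the code after step 1, dec %ebx is unreachable, yet it executes in step 7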
Example: Superposition of CFGs
The superposition merges the CFGs of all code states into one graph. Its nodes, by address:
0x0  movb 0xc 0x8
0x3  inc %ebx
0x5  movb 0xc 0x5 or jmp 0xc
0x8  inc %edx or jmp 0x3
0xa  push %ecx
0xc  dec %ebx
Contains CFG 1
The superposition contains the CFG of the initial code: movb 0xc 0x8 → inc %ebx → movb 0xc 0x5 → inc %edx → push %ecx → dec %ebx
Contains CFG 2
The superposition contains the CFG of the code after step 1: movb 0xc 0x8 → inc %ebx → movb 0xc 0x5 → jmp 0x3 → inc %ebx, plus push %ecx → dec %ebx
Contains CFG 3
The superposition contains the CFG of the code after step 3: movb 0xc 0x8 → inc %ebx → jmp 0xc → dec %ebx, plus jmp 0x3 → inc %ebx and push %ecx → dec %ebx
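The superposition can be computed as the union of the edge sets of the per-state CFGs. The sketch below assumes that the three code states of the trace and the instruction start addresses (STARTS) are known, as in the malicious host paradigm; text, cfg and patch are hypothetical helpers. It also checks that every control transfer of the trace is in the superposition, but not in the CFG of the initial code.

```python
from __future__ import annotations

OPCODES = {0xC6: ("movb", 3), 0x40: ("inc", 2), 0x48: ("dec", 2),
           0xFF: ("push", 2), 0x0C: ("jmp", 2)}
REGISTERS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}  # assumed encoding
STARTS = [0x0, 0x3, 0x5, 0x8, 0xA, 0xC]  # assumption: known instruction start addresses


def text(image: bytes, addr: int) -> str:
    """Assembly text of the instruction at addr in one code state."""
    mnemonic, length = OPCODES[image[addr]]
    ops = image[addr + 1 : addr + length]
    if mnemonic == "movb":
        return f"movb {ops[0]:#x} {ops[1]:#x}"
    if mnemonic == "jmp":
        return f"jmp {ops[0]:#x}"
    return f"{mnemonic} {REGISTERS[ops[0]]}"


def cfg(image: bytes) -> set:
    """Traditional CFG of one code state; nodes are (address, instruction) pairs."""
    edges = set()
    for addr in STARTS:
        mnemonic, length = OPCODES[image[addr]]
        target = image[addr + 1] if mnemonic == "jmp" else addr + length
        if target < len(image):  # running off the end halts the program
            edges.add(((addr, text(image, addr)), (target, text(image, target))))
    return edges


def patch(image: bytes, addr: int, value: int) -> bytes:
    code = bytearray(image)
    code[addr] = value
    return bytes(code)


STATE1 = bytes.fromhex("c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01")
STATE2 = patch(STATE1, 0x8, 0x0C)  # after step 1: movb 0xc 0x8
STATE3 = patch(STATE2, 0x5, 0x0C)  # after step 3: movb 0xc 0x5
SUPERPOSITION = cfg(STATE1) | cfg(STATE2) | cfg(STATE3)

# Control transfers taken by the trace, as (address, instruction) pairs.
TRACE_EDGES = {
    ((0x0, "movb 0xc 0x8"), (0x3, "inc %ebx")),
    ((0x3, "inc %ebx"), (0x5, "movb 0xc 0x5")),
    ((0x5, "movb 0xc 0x5"), (0x8, "jmp 0x3")),
    ((0x8, "jmp 0x3"), (0x3, "inc %ebx")),
    ((0x3, "inc %ebx"), (0x5, "jmp 0xc")),
    ((0x5, "jmp 0xc"), (0xC, "dec %ebx")),
}

print(len(SUPERPOSITION), "edges;", TRACE_EDGES <= SUPERPOSITION, TRACE_EDGES <= cfg(STATE1))
```

Nodes are keyed by address and instruction, so inc %ebx at 0x3 is shared by all three CFGs, while 0x5 and 0x8 each carry two alternative instructions.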
Superposition of CFGs
- Represents a superset of all possible executions
- But:
  - how do we linearize a graph with multiple outgoing/incoming fall-through paths?
  - how do we analyze which states the program can be in at a given program point?
  - …
- → Extensions
CodeByte
A codebyte consists of:
- an identifier (its address), e.g. 0x5
- its states, e.g. c6 and 0c
- its initial state, e.g. c6
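Extension 1: CodeBytes
The instructions of the superposition (movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, inc %edx, jmp 0x3, push %ecx, dec %ebx) are built from codebytes. Codebytes 0x5 and 0x8 have two states; all others have one:
Codebyte  0x0 0x1 0x2 0x3 0x4 0x5    0x6 0x7 0x8    0x9 0xa 0xb 0xc 0xd
States    c6  0c  08  40  01  c6/0c  0c  05  40/0c  03  ff  02  48  01
Instructions may share codebytes: movb 0xc 0x5 is built from 0x5 (state c6), 0x6 and 0x7, whereas jmp 0xc is built from 0x5 (state 0c) and 0x6.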
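A codebyte maps naturally onto a small record. The sketch below is one possible Python encoding, assuming a codebyte's states are its initial value plus every value some movb can write to it; CodeByte and build_codebytes are hypothetical names, not taken from the authors' tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeByte:
    address: int                                   # identifier
    initial: int                                   # initial state (value in the image)
    states: set[int] = field(default_factory=set)  # every value the byte can hold

    def __post_init__(self) -> None:
        self.states.add(self.initial)  # the initial state is always one of the states


def build_codebytes(image: bytes, writes: list[tuple[int, int]]) -> dict[int, CodeByte]:
    """One codebyte per address; writes are (value, address) pairs, as in movb value to."""
    codebytes = {addr: CodeByte(addr, value) for addr, value in enumerate(image)}
    for value, addr in writes:
        codebytes[addr].states.add(value)
    return codebytes


PROGRAM = bytes.fromhex("c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01")
CODEBYTES = build_codebytes(PROGRAM, [(0x0C, 0x8), (0x0C, 0x5)])  # the two movb's

for cb in CODEBYTES.values():
    print(f"{cb.address:#x}", " ".join(f"{s:02x}" for s in sorted(cb.states)))
```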
Extension 2: CodeByte Conditional Edges
An edge into an instruction whose codebytes can be in several states is guarded by a condition on those codebytes:
- inc %ebx → movb 0xc 0x5 if *(0x5)==c6
- inc %ebx → jmp 0xc if *(0x5)==0c
- movb 0xc 0x5 → inc %edx if *(0x8)==40
- movb 0xc 0x5 → jmp 0x3 if *(0x8)==0c
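One way to derive these labels, sketched under the assumption that the superposition's edges and the codebyte values of each instruction are already known: an edge gets a condition *(a)==v for every codebyte a of its target instruction that has more than one state, where v is the state the target needs. INSNS, EDGES, STATES, condition and label are hypothetical names.

```python
from __future__ import annotations

PROGRAM = bytes.fromhex("c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01")
WRITES = [(0x0C, 0x8), (0x0C, 0x5)]  # (value, address) written by the two movb's

# Instruction -> (start address, codebyte values it is made of).
INSNS = {
    "movb 0xc 0x8": (0x0, (0xC6, 0x0C, 0x08)),
    "inc %ebx": (0x3, (0x40, 0x01)),
    "movb 0xc 0x5": (0x5, (0xC6, 0x0C, 0x05)),
    "jmp 0xc": (0x5, (0x0C, 0x0C)),
    "inc %edx": (0x8, (0x40, 0x03)),
    "jmp 0x3": (0x8, (0x0C, 0x03)),
    "push %ecx": (0xA, (0xFF, 0x02)),
    "dec %ebx": (0xC, (0x48, 0x01)),
}
# Edges of the superposition of CFGs.
EDGES = [
    ("movb 0xc 0x8", "inc %ebx"), ("inc %ebx", "movb 0xc 0x5"),
    ("inc %ebx", "jmp 0xc"), ("movb 0xc 0x5", "inc %edx"),
    ("movb 0xc 0x5", "jmp 0x3"), ("inc %edx", "push %ecx"),
    ("jmp 0x3", "inc %ebx"), ("jmp 0xc", "dec %ebx"), ("push %ecx", "dec %ebx"),
]

# Possible states per codebyte: the initial value plus every value written to it.
STATES = {addr: {value} for addr, value in enumerate(PROGRAM)}
for value, addr in WRITES:
    STATES[addr].add(value)


def condition(target: str) -> tuple[tuple[int, int], ...]:
    """(address, value) pairs that must hold for control to enter target."""
    start, values = INSNS[target]
    return tuple((start + i, v) for i, v in enumerate(values)
                 if len(STATES[start + i]) > 1)


def label(cond: tuple[tuple[int, int], ...]) -> str:
    return " && ".join(f"*({a:#x})=={v:02x}" for a, v in cond)


CONDITIONS = {edge: condition(edge[1]) for edge in EDGES}
for (src, dst), cond in CONDITIONS.items():
    if cond:
        print(f"{src} -> {dst}: {label(cond)}")
```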
Extension 3: Consumption of CodeBytes
- A codebyte is consumed (read) when the CPU interprets it as (part of) an instruction
- Important for data analyses such as liveness analysis
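Consumption can be observed on a concrete run. The sketch below is only an illustration (a static analysis must account for all executions, not one run): it executes the running example and records every (codebyte, value) pair the CPU fetches as part of an instruction. It assumes the program halts when the program counter leaves the code; consumed and SEEN are hypothetical names.

```python
from __future__ import annotations

LENGTHS = {0xC6: 3, 0x40: 2, 0x48: 2, 0xFF: 2, 0x0C: 2}  # opcode -> instruction length
PROGRAM = bytes.fromhex("c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01")


def consumed(image: bytes, max_steps: int = 1000) -> set[tuple[int, int]]:
    """Run the program; return every (address, value) fetched as part of an instruction."""
    mem = bytearray(image)
    seen: set[tuple[int, int]] = set()
    pc = 0
    for _ in range(max_steps):
        if pc >= len(mem):  # assumed halt: control leaves the code
            return seen
        opcode = mem[pc]
        length = LENGTHS[opcode]
        seen.update((pc + i, mem[pc + i]) for i in range(length))  # codebytes read by the CPU
        if opcode == 0xC6:  # movb value to: writes code, does not consume the target
            mem[mem[pc + 2]] = mem[pc + 1]
            pc += length
        elif opcode == 0x0C:  # jmp to
            pc = mem[pc + 1]
        else:  # inc, dec, push: no effect on code or control flow
            pc += length
    raise RuntimeError("step limit reached")


SEEN = consumed(PROGRAM)
```

On this run codebyte 0x5 is consumed in both of its states (c6 and 0c), whereas the initial value 40 of codebyte 0x8 is never consumed, and codebytes 0xa and 0xb are not consumed at all.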
Traditional Code vs. Self-Modifying Code
- Traditional code: instructions do not overlap, and the code is neither self-inspecting nor self-modifying
- Traditional code is a special case of self-modifying code. The extensions can be omitted because:
  - the code can easily be linearized, as instructions do not overlap
  - targets of control transfers can be in only one state
  - the result of data analyses on code is trivial, as the code is constant
Construction
- Requires that we know:
  - the targets of control flow
  - which instructions write what where
- Not a problem in the malicious host paradigm
- In the malicious code paradigm (future work):
  - observing dynamic execution
  - static extension
Linearization
Deconstruction turns the superposition (movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, inc %edx, jmp 0x3, push %ecx, dec %ebx) back into a binary: its codebytes 0x0 to 0xd are chained and each is emitted in its initial state, giving c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01.
Example: Introduction (revisited)
The linearized binary is identical to the original program: movb 0xc 0x8; inc %ebx; movb 0xc 0x5; inc %edx; push %ecx; dec %ebx at addresses 0x0, 0x3, 0x5, 0x8, 0xa and 0xc.
Overview
- Scope
- Running example
- Internal representation
- Construction and deconstruction
- Analyses and transformations
  - Constant propagation
  - Unreachable code(byte) elimination
  - Liveness analysis
  - Loop unrolling
- Applications
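Constant Propagation
On the superposition with conditional edges, constant propagation determines which states each codebyte can be in at every program point. Because movb 0xc 0x8 executes first and nothing ever writes 40 back, *(0x8)==0c holds whenever movb 0xc 0x5 completes, so the edge guarded by *(0x8)==40 can never be taken. The conditions *(0x5)==c6 and *(0x5)==0c both remain feasible: codebyte 0x5 holds c6 on the first pass through inc %ebx and 0c on the second.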
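A minimal sketch of this propagation, assuming only codebytes 0x5 and 0x8 need tracking (they are the only ones ever written) and using sets of possible values as the lattice. An edge is followed only if its condition can hold, and the state is narrowed to the condition on the way in. The names are hypothetical; this illustrates the idea rather than reproducing the authors' implementation.

```python
from __future__ import annotations

INITIAL = {0x5: 0xC6, 0x8: 0x40}  # tracked codebytes: the only ones ever written
WRITES = {"movb 0xc 0x8": (0x8, 0x0C), "movb 0xc 0x5": (0x5, 0x0C)}
# Superposition: instruction -> [(successor, condition as (address, value) pairs)]
SUCCS = {
    "movb 0xc 0x8": [("inc %ebx", ())],
    "inc %ebx": [("movb 0xc 0x5", ((0x5, 0xC6),)), ("jmp 0xc", ((0x5, 0x0C),))],
    "movb 0xc 0x5": [("inc %edx", ((0x8, 0x40),)), ("jmp 0x3", ((0x8, 0x0C),))],
    "inc %edx": [("push %ecx", ())],
    "push %ecx": [("dec %ebx", ())],
    "jmp 0x3": [("inc %ebx", ())],
    "jmp 0xc": [("dec %ebx", ())],
    "dec %ebx": [],
}


def after(node: str, state: dict) -> dict:
    """Transfer function: a movb leaves its target codebyte with a single state."""
    out = dict(state)
    if node in WRITES:
        addr, value = WRITES[node]
        out[addr] = frozenset({value})
    return out


def propagate(entry: str) -> dict:
    """Possible states of the tracked codebytes on entry to each reachable node."""
    state_in = {entry: {a: frozenset({v}) for a, v in INITIAL.items()}}
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        out = after(node, state_in[node])
        for succ, cond in SUCCS[node]:
            if any(v not in out[a] for a, v in cond):
                continue  # the condition cannot hold here (so far)
            narrowed = {**out, **{a: frozenset({v}) for a, v in cond}}
            old = state_in.get(succ)
            new = narrowed if old is None else {a: old[a] | narrowed[a] for a in old}
            if new != old:
                state_in[succ] = new
                worklist.append(succ)
    return state_in


def infeasible_edges(state_in: dict) -> set:
    """Edges leaving reachable nodes whose condition never holds at the fixed point."""
    return {(node, succ)
            for node, state in state_in.items()
            for succ, cond in SUCCS[node]
            if any(v not in after(node, state)[a] for a, v in cond)}


STATE_IN = propagate("movb 0xc 0x8")
print(infeasible_edges(STATE_IN))
```

At the fixed point, codebyte 0x5 can be c6 or 0c on entry to inc %ebx while 0x8 is always 0c there, so the only refuted edge is movb 0xc 0x5 → inc %edx.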
Unreachable Code(Byte) Elimination
With the edge guarded by *(0x8)==40 removed, inc %edx and push %ecx become unreachable and are eliminated. Codebytes 0xa and 0xb are then no longer part of any instruction and can be removed as well; 0x8 and 0x9 remain because jmp 0x3 still uses them.
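Given the edges that survive constant propagation, unreachable code(byte) elimination amounts to a reachability pass followed by a check of which codebytes are still part of some reachable instruction. A sketch with hypothetical names (CODEBYTES, EDGES, eliminate), identifying codebytes by their addresses:

```python
from __future__ import annotations

from collections import deque

# Instruction -> codebytes (addresses) it is made of.
CODEBYTES = {
    "movb 0xc 0x8": [0x0, 0x1, 0x2], "inc %ebx": [0x3, 0x4],
    "movb 0xc 0x5": [0x5, 0x6, 0x7], "jmp 0xc": [0x5, 0x6],
    "inc %edx": [0x8, 0x9], "jmp 0x3": [0x8, 0x9],
    "push %ecx": [0xA, 0xB], "dec %ebx": [0xC, 0xD],
}
# Superposition edges minus the one guarded by *(0x8)==40.
EDGES = [
    ("movb 0xc 0x8", "inc %ebx"), ("inc %ebx", "movb 0xc 0x5"),
    ("inc %ebx", "jmp 0xc"), ("movb 0xc 0x5", "jmp 0x3"),
    ("inc %edx", "push %ecx"), ("jmp 0x3", "inc %ebx"),
    ("jmp 0xc", "dec %ebx"), ("push %ecx", "dec %ebx"),
]


def eliminate(entry: str) -> tuple[set[str], set[int]]:
    """Return the reachable instructions and the codebytes no reachable instruction uses."""
    succs: dict[str, list[str]] = {}
    for src, dst in EDGES:
        succs.setdefault(src, []).append(dst)
    reachable, queue = {entry}, deque([entry])
    while queue:  # breadth-first reachability from the entry point
        for succ in succs.get(queue.popleft(), []):
            if succ not in reachable:
                reachable.add(succ)
                queue.append(succ)
    used = {cb for insn in reachable for cb in CODEBYTES[insn]}
    all_cbs = {cb for cbs in CODEBYTES.values() for cb in cbs}
    return reachable, all_cbs - used


REACHABLE, DEAD_CODEBYTES = eliminate("movb 0xc 0x8")
print(sorted(set(CODEBYTES) - REACHABLE), sorted(hex(c) for c in DEAD_CODEBYTES))
```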
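Liveness Analysis
After unreachable code(byte) elimination, the superposition contains movb 0xc 0x8, inc %ebx, movb 0xc 0x5, jmp 0xc, jmp 0x3 and dec %ebx. Liveness analysis shows that codebyte 0x8 is dead at program entry: its initial state 40 is never consumed, because movb 0xc 0x8 overwrites it before execution can reach address 0x8.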
Idempotent Instruction Removal
Because the initial state of codebyte 0x8 is dead, it can be changed to 0c. movb 0xc 0x8 then writes a value the codebyte already holds, so the instruction is idempotent and can be removed.
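Codebyte liveness can be computed with the classic backward equations live_in(n) = use(n) ∪ (live_out(n) − def(n)), where use(n) is the set of codebytes instruction n consumes and def(n) is the codebyte a movb writes. A sketch on the graph left after unreachable code(byte) elimination, with hypothetical names; treating a movb as a full definition is assumed to be valid because each write replaces a whole codebyte.

```python
from __future__ import annotations

# Graph left after unreachable code(byte) elimination.
# instruction -> (codebytes it consumes, codebyte it writes or None, successors)
NODES = {
    "movb 0xc 0x8": ({0x0, 0x1, 0x2}, 0x8, ["inc %ebx"]),
    "inc %ebx": ({0x3, 0x4}, None, ["movb 0xc 0x5", "jmp 0xc"]),
    "movb 0xc 0x5": ({0x5, 0x6, 0x7}, 0x5, ["jmp 0x3"]),
    "jmp 0x3": ({0x8, 0x9}, None, ["inc %ebx"]),
    "jmp 0xc": ({0x5, 0x6}, None, ["dec %ebx"]),
    "dec %ebx": ({0xC, 0xD}, None, []),
}


def liveness() -> dict[str, set[int]]:
    """Iterate live_in(n) = use(n) | (live_out(n) - def(n)) to a fixed point."""
    live_in: dict[str, set[int]] = {name: set() for name in NODES}
    changed = True
    while changed:
        changed = False
        for name, (uses, written, succs) in NODES.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new = uses | (live_out - {written})
            if new != live_in[name]:
                live_in[name] = new
                changed = True
    return live_in


LIVE_IN = liveness()
print(sorted(hex(c) for c in LIVE_IN["movb 0xc 0x8"]))
```

Codebyte 0x8 is not live on entry, which is what allows its initial state to become 0c and movb 0xc 0x8 to be dropped as idempotent. Codebyte 0x5, by contrast, is live on entry: movb 0xc 0x5 consumes it in state c6 before overwriting it.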
Loop Unrolling and …
[Figure: the loop inc %ebx → movb 0xc 0x5 → jmp 0x3 from the trace (1) movb 0xc 0x8 … 7) dec %ebx) is unrolled. The copied instructions receive fresh codebytes _a to _k; the copy of movb 0xc 0x5 becomes movb 0xc _c, writing to the fresh codebyte _c (states c6 and 0c). Edge conditions in the result include *(_c)==c6, *(_c)==0c, *(0x5)==c6 and *(0x5)==0c.]
Applications
- Outlining of almost identical code snippets through one-bit modifiers
- Overlapping similar functions through diff scripts
- Significant slowdown (by a factor of 1.15 up to 3)
Almost Identical Code Snippets
Snippet 1:
0x0-0x4  68 5c 24 04 a8  push 0xa804245c
0x5      5b              pop %ebx
0x6      c3              ret
Snippet 2:
0x0-0x3  8b 5c 24 04     mov 4(%esp),%ebx
0x4-0x5  a8 5b           test 0x5b,%al
0x6      c3              ret
The two snippets differ only in codebyte 0x0 (68 versus 8b).
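Merged Code Snippets
The snippets share codebytes 0x1 to 0x6; codebyte 0x0 has two states, 68 and 8b. Each original snippet is replaced by a modifier that sets codebyte 0x0 and jumps to the shared code:
- movb 0x68 0x0; jmp 0x0 (executes push 0xa804245c; pop %ebx; ret)
- movb 0x8b 0x0; jmp 0x0 (executes mov 4(%esp),%ebx; test 0x5b,%al; ret)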
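The merge step for two equal-length snippets can be sketched as a byte diff: shared codebytes are kept once, differing ones become multi-state codebytes, and each snippet is replaced by movb instructions that set those codebytes, followed by a jump. This is an illustrative sketch with hypothetical names; it emits the modifiers as text in the slides' notation, assumes the merged code sits at address 0x0, and does not handle snippets of different lengths.

```python
from __future__ import annotations


def merge(a: bytes, b: bytes, base: int = 0x0):
    """Merge two equal-length snippets into multi-state codebytes plus two modifiers."""
    if len(a) != len(b):
        raise ValueError("sketch only handles snippets of equal length")
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    states = {base + i: {a[i], b[i]} for i in diff}  # multi-state codebytes
    modifiers = {}
    for name, snippet in (("a", a), ("b", b)):
        # set every differing codebyte to this snippet's value, then enter the shared code
        writes = [f"movb {snippet[i]:#x} {base + i:#x}" for i in diff]
        modifiers[name] = writes + [f"jmp {base:#x}"]
    return states, modifiers


SNIPPET_A = bytes.fromhex("68 5c 24 04 a8 5b c3")  # push 0xa804245c; pop %ebx; ret
SNIPPET_B = bytes.fromhex("8b 5c 24 04 a8 5b c3")  # mov 4(%esp),%ebx; test 0x5b,%al; ret
STATES, MODIFIERS = merge(SNIPPET_A, SNIPPET_B)
print(STATES, MODIFIERS)
```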
Conclusion
- Internal representation:
  - superposition of different CFGs
  - three extensions: the CodeByte data structure, CodeByte conditional edges, consumption of CodeBytes
- Allows for:
  - construction (limited) and deconstruction
  - conservative and accurate analyses and transformations (iterative)
Questions? Presentation: Tool:
Linearization
- Chains of instructions become chains of codebytes
- Codebytes c and d must be concatenated if:
  - c and d are successive codebytes in an instruction, or
  - c is the last codebyte of instruction I, d is the first codebyte of instruction J, and I and J are successive instructions in a basic block, or
  - c is the last codebyte of basic block A, d is the first codebyte of basic block B, and A and B are connected by a fall-through path
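The three rules translate into a small constraint solver: each rule demands that codebyte d directly follows codebyte c, and chains are read off by following these links. The sketch below is a hypothetical illustration, assuming the superposition is already split into basic blocks, that the constraints contain no cycle, and that codebytes are named by their original addresses for brevity. It rejects contradictory demands instead of resolving them, since the slides do not say how conflicts are handled.

```python
from __future__ import annotations


def chains(insns: dict[str, list[int]], blocks: dict[str, list[str]],
           fallthrough: list[tuple[str, str]]) -> list[list[int]]:
    """Group codebytes into chains that must be laid out contiguously.

    Assumption: the constraints contain no cycle.
    """
    succ: dict[int, int] = {}
    pred: dict[int, int] = {}

    def link(c: int, d: int) -> None:
        # c must be immediately followed by d
        if succ.get(c, d) != d or pred.get(d, c) != c:
            raise ValueError(f"conflicting layout for codebytes {c:#x} and {d:#x}")
        succ[c], pred[d] = d, c

    for cbs in insns.values():  # rule 1: within an instruction
        for c, d in zip(cbs, cbs[1:]):
            link(c, d)
    for body in blocks.values():  # rule 2: successive instructions in a block
        for i, j in zip(body, body[1:]):
            link(insns[i][-1], insns[j][0])
    for a, b in fallthrough:  # rule 3: blocks joined by a fall-through path
        link(insns[blocks[a][-1]][-1], insns[blocks[b][0]][0])

    everything = {c for cbs in insns.values() for c in cbs}
    result = []
    for start in sorted(c for c in everything if c not in pred):
        chain = [start]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        result.append(chain)
    return result


INSNS = {
    "movb 0xc 0x8": [0x0, 0x1, 0x2], "inc %ebx": [0x3, 0x4],
    "movb 0xc 0x5": [0x5, 0x6, 0x7], "jmp 0xc": [0x5, 0x6],
    "inc %edx": [0x8, 0x9], "jmp 0x3": [0x8, 0x9],
    "push %ecx": [0xA, 0xB], "dec %ebx": [0xC, 0xD],
}
BLOCKS = {
    "B1": ["movb 0xc 0x8"], "B2": ["inc %ebx"], "B3": ["movb 0xc 0x5"],
    "B4": ["jmp 0xc"], "B5": ["inc %edx", "push %ecx"], "B6": ["jmp 0x3"],
    "B7": ["dec %ebx"],
}
FALLTHROUGH = [("B1", "B2"), ("B2", "B3"), ("B2", "B4"),
               ("B3", "B5"), ("B3", "B6"), ("B5", "B7")]
INITIAL = bytes.fromhex("c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01")  # initial states

CHAINS = chains(INSNS, BLOCKS, FALLTHROUGH)
BINARY = bytes(INITIAL[c] for chain in CHAINS for c in chain)
print(CHAINS, BINARY.hex(" "))
```

For the running example this yields a single chain from 0x0 to 0xd, and emitting every codebyte in its initial state reproduces the original binary. Adding a fall-through from B1 to B4 would demand that codebyte 0x2 be followed by both 0x3 and 0x5, which the sketch reports as a conflict.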