A Model for Self-Modifying Code
Bertrand Anckaert, Matias Madou and Koen De Bosschere
8th Information Hiding Conference, July 11th 2006

Presentation transcript:


2 Self-Modifying Code
  o Problem for Reverse-Engineering
  o Used for Hiding Program Internals
      - Software Protection
          o Copyright Protection Mechanisms
          o Secret Algorithms
          o …
      - Malicious intent of viruses
  o Program Optimization

3 Scope
  Focus: malicious host paradigm
  Not: malicious code paradigm
  [Figure label: known]

4 Goal
  o Internal Representation
  o Construction and Deconstruction
  o Accurate and Conservative
  o Analyses and Transformations

5 Overview
  o Introduction
  o Running Example
  o Internal Representation
  o Construction and Deconstruction
  o Analyses and Transformations
  o Applications
  Accurate and Conservative

6 Example: ISA
  Assembly            Binary                Semantics
  movb <value> <to>   0xc6 <value> <to>     set byte at address <to> to value <value>
  inc <reg>           0x40 <reg>            increment register <reg>
  dec <reg>           0x48 <reg>            decrement register <reg>
  push <reg>          0xff <reg>            push register <reg>
  jmp <to>            0x0c <to>             jump to address <to> (absolute)

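7 Example: Introduction
  Address   Binary      Assembly
  0x0       c6 0c 08    movb 0xc 0x8
  0x3       40 01       inc %ebx
  0x5       c6 0c 05    movb 0xc 0x5
  0x8       40 03       inc %edx
  0xa       ff 02       push %ecx
  0xc       48 01       dec %ebx
  (Binary column restored from the per-address byte values on the CodeBytes slide; the transcript kept only "c6 0c c6 0c ff".)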

8 Example: Trace
  Program states:
    Initial:        movb 0xc 0x8 | inc %ebx | movb 0xc 0x5 | inc %edx | push %ecx | dec %ebx
    After step 1:   movb 0xc 0x8 | inc %ebx | movb 0xc 0x5 | jmp 0x3  | push %ecx | dec %ebx
    After step 3:   movb 0xc 0x8 | inc %ebx | jmp 0xc      | jmp 0x3  | push %ecx | dec %ebx
  Trace:
    1) movb 0xc 0x8
    2) inc %ebx
    3) movb 0xc 0x5
    4) jmp 0x3
    5) inc %ebx
    6) jmp 0xc
    7) dec %ebx
  = inc %ebx
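
The trace can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' tool. It assumes the opcode bytes from the ISA slide, the initial byte values listed on the CodeBytes slide, register numbers 01/02/03 for %ebx/%ecx/%edx (inferred from those byte values), and that execution stops once the program counter leaves the 14-byte image. The names `decode`, `run`, `REGS` and `PROGRAM` are hypothetical.

```python
# Register numbering is an assumption inferred from the byte values in the slides.
REGS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}

# Initial memory image of the running example, addresses 0x0..0xd.
PROGRAM = bytes.fromhex("c60c08" "4001" "c60c05" "4003" "ff02" "4801")


def decode(mem, pc):
    """Decode the instruction starting at pc; return (assembly text, length)."""
    op = mem[pc]
    if op == 0xC6:  # movb value to
        return f"movb {mem[pc + 1]:#x} {mem[pc + 2]:#x}", 3
    if op == 0x0C:  # jmp to (absolute)
        return f"jmp {mem[pc + 1]:#x}", 2
    names = {0x40: "inc", 0x48: "dec", 0xFF: "push"}
    return f"{names[op]} {REGS[mem[pc + 1]]}", 2


def run(program, max_steps=100):
    """Execute the image and return (trace, registers, final memory)."""
    mem = bytearray(program)
    regs = {name: 0 for name in REGS.values()}
    stack, trace, pc = [], [], 0
    # Assumption: leaving the image ends the run.
    while 0 <= pc < len(mem) and len(trace) < max_steps:
        text, length = decode(mem, pc)
        trace.append(text)
        op = mem[pc]
        if op == 0xC6:
            mem[mem[pc + 2]] = mem[pc + 1]  # the self-modifying write
            pc += length
        elif op == 0x0C:
            pc = mem[pc + 1]
        else:
            reg = REGS[mem[pc + 1]]
            if op == 0x40:
                regs[reg] += 1
            elif op == 0x48:
                regs[reg] -= 1
            else:
                stack.append(regs[reg])
            pc += length
    return trace, regs, bytes(mem)


trace, regs, final_mem = run(PROGRAM)
for step, text in enumerate(trace, start=1):
    print(f"{step}) {text}")
```

Under these assumptions the run executes seven instructions and leaves %ebx one higher than it started, which agrees with the "= inc %ebx" note on the slide.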

9 Overview
  o Scope
  o Running Example
  o Internal Representation
      - Superposition of CFGs
      - Codebytes
      - Codebyte Conditional Edges
      - Consumption of Codebyte Values
  o Construction and Deconstruction
  o Analyses and Transformations
  o Applications

10 CFG for Traditional Code
  o One of the most important internal representations for traditional code
  Well understood how to:
  o construct and deconstruct
  o accurate and conservative
  o analysis and transformations
  Conservative: representation of a superset of all possible executions

11 Traditional CFG Construction for SMC
  [Figure: a traditional CFG constructed for each successive state of the running example, with nodes labelled by the trace steps that execute them]
  Trace: 1) movb 0xc 0x8  2) inc %ebx  3) movb 0xc 0x5  4) jmp 0x3  5) inc %ebx  6) jmp 0xc  7) dec %ebx
  Annotations: not a superset; not conservative; not accurate; Unreachable Code Elimination

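12 Example: Superposition of CFGs
  [Figure: a single graph holding the instructions of all program states, with nodes labelled by trace steps]
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, dec %ebx, jmp 0xc, inc %edx, push %ecx
  Trace: 1) movb 0xc 0x8  2) inc %ebx  3) movb 0xc 0x5  4) jmp 0x3  5) inc %ebx  6) jmp 0xc  7) dec %ebx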

13 Contains CFG 1
  CFG 1: movb 0xc 0x8, inc %ebx, movb 0xc 0x5, inc %edx, push %ecx, dec %ebx
  Superposition: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx

14 Contains CFG 2
  CFG 2: inc %ebx, movb 0xc 0x5, jmp 0x3, push %ecx, dec %ebx
  Superposition: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx

15 Contains CFG 3
  CFG 3: dec %ebx, push %ecx, jmp 0x3, inc %ebx, jmp 0xc, movb 0xc 0x8
  Superposition: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx

16 Superposition of CFGs
  o Represents a superset of all possible executions
  o But:
      - how do we linearize a graph with multiple outgoing/incoming fall-through paths?
      - how do we analyze what states the program can be in at a given program point?
      - …
  Extensions

17 Overview
  o Scope
  o Running Example
  o Internal Representation
      - Superposition of CFGs
      - CodeBytes
      - CodeByte Conditional Edges
      - Consumption of CodeByte Values
  o Construction and Deconstruction
  o Analyses and Transformations
  o Applications

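18 CodeByte
  [Figure: the codebyte at address 0x5 with its states c6 and 0c]
  o identifier (address): 0x5
  o states: c6, 0c
  o initial state: c6 (the value at 0x5 in the original binary)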
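
The three parts of a codebyte (identifier, states, initial state) map naturally onto a small record type. The following is a minimal sketch under stated assumptions, not the data structure from the authors' tool: the class layout, `build_codebytes`, and the `writes` list are hypothetical, and the two writes are simply the `movb 0xc 0x8` and `movb 0xc 0x5` instructions of the running example.

```python
from dataclasses import dataclass, field


@dataclass
class CodeByte:
    address: int                                 # identifier
    initial: int                                 # initial state
    states: list = field(default_factory=list)  # every value the byte can hold

    def __post_init__(self):
        # The initial value is always one of the states.
        if self.initial not in self.states:
            self.states.insert(0, self.initial)

    def add_state(self, value):
        if value not in self.states:
            self.states.append(value)

    def is_constant(self):
        return len(self.states) == 1


def build_codebytes(image, writes):
    """Create one CodeByte per address, then record each (address, value) write."""
    codebytes = {addr: CodeByte(addr, value) for addr, value in enumerate(image)}
    for addr, value in writes:
        codebytes[addr].add_state(value)
    return codebytes


# Initial image of the running example and the two self-modifying writes.
IMAGE = bytes.fromhex("c60c08" "4001" "c60c05" "4003" "ff02" "4801")
WRITES = [(0x8, 0x0C), (0x5, 0x0C)]

codebytes = build_codebytes(IMAGE, WRITES)
for cb in codebytes.values():
    print(f"{cb.address:#x}: " + ", ".join(f"{s:02x}" for s in cb.states))
```

Only the codebytes at 0x5 and 0x8 end up with two states; all other codebytes stay constant, which is the situation traditional code is always in.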

19 Extension 1: CodeBytes
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx
  CodeBytes (address: states):
    0x0: c6        0x1: 0c   0x2: 08
    0x3: 40        0x4: 01
    0x5: c6, 0c    0x6: 0c   0x7: 05
    0x8: 40, 0c    0x9: 03
    0xa: ff        0xb: 02
    0xc: 48        0xd: 01

20 Extension 2: CodeByte Conditional Edges
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx
  CodeBytes (address: states):
    0x0: c6        0x1: 0c   0x2: 08
    0x3: 40        0x4: 01
    0x5: c6, 0c    0x6: 0c   0x7: 05
    0x8: 40, 0c    0x9: 03
    0xa: ff        0xb: 02
    0xc: 48        0xd: 01
  Conditional edges: *(0x5)==c6, *(0x5)==0c, *(0x8)==40, *(0x8)==0c
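
One way to picture conditional edges is as ordinary CFG edges that carry a guard of the form *(address)==value. The sketch below is illustrative only. The transcript does not preserve which edge each condition sits on, so the edge list is an assumption derived from the ISA encoding (0xc6 at 0x5 decodes as movb, 0x0c as jmp, and likewise for 0x8); `Edge`, `EDGES`, `WRITES`, `successors` and `walk` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    src: str
    dst: str
    cond: Optional[Tuple[int, int]]  # (address, value) meaning *(address)==value


# Assumed edge set of the superposed CFG for the running example.
EDGES = [
    Edge("movb 0xc 0x8", "inc %ebx", None),
    Edge("inc %ebx", "movb 0xc 0x5", (0x5, 0xC6)),
    Edge("inc %ebx", "jmp 0xc", (0x5, 0x0C)),
    Edge("movb 0xc 0x5", "inc %edx", (0x8, 0x40)),
    Edge("movb 0xc 0x5", "jmp 0x3", (0x8, 0x0C)),
    Edge("jmp 0x3", "inc %ebx", None),
    Edge("jmp 0xc", "dec %ebx", None),
    Edge("inc %edx", "push %ecx", None),
    Edge("push %ecx", "dec %ebx", None),
]

# Effect of the two self-modifying instructions on the codebyte values.
WRITES = {"movb 0xc 0x8": (0x8, 0x0C), "movb 0xc 0x5": (0x5, 0x0C)}


def successors(node, memory):
    """Return the targets of all edges out of node whose guard holds in memory."""
    return [
        e.dst
        for e in EDGES
        if e.src == node and (e.cond is None or memory[e.cond[0]] == e.cond[1])
    ]


def walk(entry, memory, max_steps=100):
    """Follow enabled edges from entry, applying writes along the way."""
    memory = dict(memory)
    path, node = [], entry
    while node is not None and len(path) < max_steps:
        path.append(node)
        if node in WRITES:
            addr, value = WRITES[node]
            memory[addr] = value
        enabled = successors(node, memory)
        node = enabled[0] if enabled else None
    return path


print(walk("movb 0xc 0x8", {0x5: 0xC6, 0x8: 0x40}))
```

With the initial values c6 at 0x5 and 40 at 0x8, the walk visits the same seven instructions as the trace, even though the graph also contains `inc %edx` and `push %ecx`.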

21 Extension 3: Consumption of CodeBytes
  o A codebyte is read when it is interpreted as (part of) an instruction by the CPU
  o Important for data analyses, such as liveness analysis

22 Traditional Code vs. Self-Modifying Code
  o Traditional Code
      - No Overlap
      - Not Self-Inspecting
      - Not Self-Modifying
  o Special case of self-modifying code. Extensions can be omitted because:
      - Can be easily linearized as instructions do not overlap
      - Target locations of control transfers can be in only one state
      - Result of data analyses on code is trivial as the code is constant

23 Overview
  o Scope
  o Running Example
  o Internal Representation
  o Construction and Deconstruction
  o Analyses and Transformations
  o Applications

24 Construction
  o Requires that we know:
      - Targets of control flow
      - Which instructions write what where
  o Not a problem in the malicious host paradigm
  o In the malicious code paradigm (Future Work):
      - Observing dynamic execution
      - Static extension

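25 Linearization
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, push %ecx, dec %ebx, jmp 0xc
  CodeBytes (address: states):
    0x0: c6        0x1: 0c   0x2: 08
    0x3: 40        0x4: 01
    0x5: c6, 0c    0x6: 0c   0x7: 05
    0x8: 40, 0c    0x9: 03
    0xa: ff        0xb: 02
    0xc: 48        0xd: 01
  Linearized binary (initial states in address order): c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01
  (The transcript kept only "c6 0c c6 0c ff"; the remaining bytes are restored from the codebyte values above.)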

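26 Example: Introduction
  Address   Binary      Assembly
  0x0       c6 0c 08    movb 0xc 0x8
  0x3       40 01       inc %ebx
  0x5       c6 0c 05    movb 0xc 0x5
  0x8       40 03       inc %edx
  0xa       ff 02       push %ecx
  0xc       48 01       dec %ebx
  (Binary column restored from the per-address byte values on the CodeBytes slide; the transcript kept only "c6 0c c6 0c ff".)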

27 Overview
  o Scope
  o Running Example
  o Internal Representation
  o Construction and Deconstruction
  o Analyses and Transformations
      - Constant Propagation
      - Unreachable Code(Byte) Elimination
      - Liveness Analysis
      - Loop Unrolling
  o Applications

28 Constant Propagation
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx
  CodeBytes (address: states):
    0x0: c6        0x1: 0c   0x2: 08
    0x3: 40        0x4: 01
    0x5: c6, 0c    0x6: 0c   0x7: 05
    0x8: 40, 0c    0x9: 03
    0xa: ff        0xb: 02
    0xc: 48        0xd: 01
  Conditional edges: *(0x8)==40, *(0x5)==c6, *(0x8)==0c, *(0x5)==0c

29 Unreachable Code(Byte) Elimination
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx
  CodeBytes (address: states):
    0x0: c6        0x1: 0c   0x2: 08
    0x3: 40        0x4: 01
    0x5: c6, 0c    0x6: 0c   0x7: 05
    0x8: 40, 0c    0x9: 03
    0xa: ff        0xb: 02
    0xc: 48        0xd: 01
  Conditional edges: *(0x5)==c6, *(0x8)==0c, *(0x5)==0c
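
Why the *(0x8)==40 edge can be dropped, and with it `inc %edx` and `push %ecx`, can be checked by brute force. The sketch below is not the constant propagation analysis from the talk. It swaps in a simpler technique, an exhaustive exploration of (instruction, codebyte values) pairs, which is feasible here because only two codebytes ever change. The edge list is an assumption derived from the ISA encoding, and `EDGES`, `WRITES` and `reachable` are hypothetical names.

```python
# Assumed superposed CFG: node -> list of (target, guard), where a guard
# (address, value) means *(address)==value and None means unconditional.
EDGES = {
    "movb 0xc 0x8": [("inc %ebx", None)],
    "inc %ebx": [("movb 0xc 0x5", (0x5, 0xC6)), ("jmp 0xc", (0x5, 0x0C))],
    "movb 0xc 0x5": [("inc %edx", (0x8, 0x40)), ("jmp 0x3", (0x8, 0x0C))],
    "jmp 0x3": [("inc %ebx", None)],
    "jmp 0xc": [("dec %ebx", None)],
    "inc %edx": [("push %ecx", None)],
    "push %ecx": [("dec %ebx", None)],
    "dec %ebx": [],
}

# Codebyte writes performed by the two self-modifying instructions.
WRITES = {"movb 0xc 0x8": (0x8, 0x0C), "movb 0xc 0x5": (0x5, 0x0C)}


def reachable(entry, memory):
    """Return every instruction reachable from entry with the given codebyte values."""
    seen, reached = set(), set()
    work = [(entry, tuple(sorted(memory.items())))]
    while work:
        state = work.pop()
        if state in seen:
            continue
        seen.add(state)
        node, mem_items = state
        reached.add(node)
        mem = dict(mem_items)
        if node in WRITES:
            addr, value = WRITES[node]
            mem[addr] = value
        for target, guard in EDGES[node]:
            if guard is None or mem[guard[0]] == guard[1]:
                work.append((target, tuple(sorted(mem.items()))))
    return reached


live = reachable("movb 0xc 0x8", {0x5: 0xC6, 0x8: 0x40})
print(sorted(set(EDGES) - live))
```

In this model 0x8 already holds 0c by the time control can fall through to it, so the two instructions behind the *(0x8)==40 guard are never reached. That agrees with the Liveness Analysis slide, where `inc %edx` and `push %ecx` no longer appear.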

30 Liveness Analysis
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, dec %ebx, jmp 0xc
  CodeBytes (address: states):
    0x0: c6        0x1: 0c   0x2: 08
    0x3: 40        0x4: 01
    0x5: c6, 0c    0x6: 0c   0x7: 05
    0x8: 40, 0c    0x9: 03
    0xc: 48        0xd: 01
  Conditional edges: *(0x5)==c6, *(0x8)==0c, *(0x5)==0c
  Annotation: 0x8

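31 Idempotent Instruction Removal
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, dec %ebx, jmp 0xc
  CodeBytes (address: states):
    0x0: c6        0x1: 0c   0x2: 08
    0x3: 40        0x4: 01
    0x5: c6, 0c    0x6: 0c   0x7: 05
    0x9: 03
    0xc: 48        0xd: … [remainder garbled in the transcript: "c"]
  Conditional edges: *(0x5)==c6, *(0x8)==0c, *(0x5)==0c
  Annotation: 0x8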

32 Loop Unrolling and …
  Trace: 1) movb 0xc 0x8  2) inc %ebx  3) movb 0xc 0x5  4) jmp 0x3  5) inc %ebx  6) jmp 0xc  7) dec %ebx
  [Figure: the graph after unrolling. Instructions shown: inc %ebx, jmp 0xc, dec %ebx, jmp 0xc, inc %ebx, movb 0xc 0x5, movb 0xc _c, movb 0xc 0x5, movb 0xc _c, jmp 0x3. CodeBytes shown: symbolic addresses _a to _k (for example _a: 40, _b: 01, _c: c6, 0c) next to 0x3: 40, 0x4: 01, 0x5: c6, 0c, 0x6: 0c, 0x7: 05, 0xc: 48, 0xd: 01.]
  Conditional edges: *(_c)==0c, *(_c)==c6, *(0x5)==0c, *(0x5)==c6

33 Overview
  o Scope
  o Running Example
  o Internal Representation
  o Construction and Deconstruction
  o Analyses and Transformations
  o Applications

34 Applications
  o Outlining of almost identical code snippets through one-bit modifiers
  o Overlapping similar functions through diff scripts
  o Significant slowdown (factor 1.15 up to 3)

35 Almost Identical Code Snippets
  Snippet 1: push 0xa804245c; pop %ebx; ret
    CodeBytes: 0x0: 68   0x1: 5c   0x2: 24   0x3: 04   0x4: a8   0x5: 5b   0x6: c3
  Snippet 2: mov 4(%esp),%ebx; test 0x5b,%al; ret
    CodeBytes: 0x0: 8b   0x1: 5c   0x2: 24   0x3: 04   0x4: a8   0x5: 5b   0x6: c3

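36 Merged Code Snippets
  Instructions sharing one location: push 0xa804245c; pop %ebx and mov 4(%esp),%ebx; test 0x5b,%al, followed by a common ret
  CodeBytes (address: states):
    0x0: 8b, 68   0x1: 5c   0x2: 24   0x3: 04   0x4: a8   0x5: 5b   0x6: c3
  Modifier for snippet 1: movb 0x68 0x0; jmp 0x0
  Modifier for snippet 2: movb 0x8b 0x0; jmp 0x0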
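
The merge works because the two snippets differ in a single byte. A minimal sketch of how such a difference could be turned into a list of byte patches follows. The byte strings are the codebyte values from the two snippets above; `diff_script` and `apply_script` are hypothetical helper names, not functions of the authors' tool.

```python
# Byte values of the two snippets, as listed on the slides.
SNIPPET_A = bytes.fromhex("685c2404a85bc3")  # push 0xa804245c; pop %ebx; ret
SNIPPET_B = bytes.fromhex("8b5c2404a85bc3")  # mov 4(%esp),%ebx; test 0x5b,%al; ret


def diff_script(src, dst):
    """Return the (offset, value) writes that turn src into dst."""
    if len(src) != len(dst):
        raise ValueError("this sketch only handles snippets of equal length")
    return [(i, d) for i, (s, d) in enumerate(zip(src, dst)) if s != d]


def apply_script(code, script):
    """Apply the (offset, value) writes to a copy of code."""
    patched = bytearray(code)
    for offset, value in script:
        patched[offset] = value
    return bytes(patched)


a_to_b = diff_script(SNIPPET_A, SNIPPET_B)
b_to_a = diff_script(SNIPPET_B, SNIPPET_A)
print(a_to_b, b_to_a)
```

Each write in the script corresponds to one `movb <value> <to>` instruction executed before jumping to the shared location, as in the two modifier sequences on the slide.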

37 Conclusion
  Internal Representation
  o Superposition of different CFGs
  o Three extensions
      - CodeByte data structure
      - CodeByte conditional edges
      - Consumption of CodeBytes
  Allows for:
      - Construction (limited) and Deconstruction
      - Conservative and Accurate
      - Analyses and Transformations (iterative)

Questions? Presentation: Tool:

39 Linearization
  o Chains of instructions
      - Chains of codebytes
  o Codebytes c and d must be concatenated if one of the following holds:
      - c and d are successive codebytes in an instruction
      - c is the last codebyte of instruction I, d is the first codebyte of instruction J, and I and J are successive instructions in a basic block
      - c is the last codebyte of basic block A, d is the first codebyte of basic block B, and A and B are connected by a fall-through path
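
The three concatenation rules can be applied mechanically. The sketch below collects the (c, d) pairs and then follows them to form chains. It is an illustration under stated assumptions, not the authors' algorithm: the partition of the running example into basic blocks B0 to B6 and the fall-through list are assumptions, and the chain builder expects every codebyte to have at most one successor and the pairs to contain no cycle.

```python
# Codebyte addresses occupied by each instruction of the superposed CFG.
INSTRUCTIONS = {
    "movb 0xc 0x8": [0x0, 0x1, 0x2],
    "inc %ebx": [0x3, 0x4],
    "movb 0xc 0x5": [0x5, 0x6, 0x7],
    "jmp 0xc": [0x5, 0x6],
    "inc %edx": [0x8, 0x9],
    "jmp 0x3": [0x8, 0x9],
    "push %ecx": [0xA, 0xB],
    "dec %ebx": [0xC, 0xD],
}

# Assumed basic blocks and fall-through paths between them.
BLOCKS = {
    "B0": ["movb 0xc 0x8"],
    "B1": ["inc %ebx"],
    "B2": ["movb 0xc 0x5"],
    "B3": ["jmp 0xc"],
    "B4": ["inc %edx", "push %ecx"],
    "B5": ["jmp 0x3"],
    "B6": ["dec %ebx"],
}
FALLTHROUGHS = [("B0", "B1"), ("B1", "B2"), ("B1", "B3"),
                ("B2", "B4"), ("B2", "B5"), ("B4", "B6")]


def concatenation_pairs(instructions, blocks, fallthroughs):
    pairs = set()
    for cbs in instructions.values():      # successive codebytes in an instruction
        pairs.update(zip(cbs, cbs[1:]))
    for block in blocks.values():          # successive instructions in a block
        for i, j in zip(block, block[1:]):
            pairs.add((instructions[i][-1], instructions[j][0]))
    for a, b in fallthroughs:              # blocks joined by a fall-through path
        pairs.add((instructions[blocks[a][-1]][-1], instructions[blocks[b][0]][0]))
    return pairs


def chains(pairs):
    succ = {}
    for c, d in pairs:
        if succ.setdefault(c, d) != d:
            raise ValueError(f"codebyte {c:#x} has two different successors")
    heads = sorted(set(succ) - set(succ.values()))
    result = []
    for head in heads:
        chain = [head]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        result.append(chain)
    return result


layout = chains(concatenation_pairs(INSTRUCTIONS, BLOCKS, FALLTHROUGHS))
print([[f"{c:#x}" for c in chain] for chain in layout])
```

For the running example all pairs link into one chain from 0x0 to 0xd, so the overlapping instructions share codebytes instead of being laid out twice.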

40-42 Example: Superposition of CFGs
  [Figure, repeated on three consecutive slides: the superposed graph of the running example]
  Instructions: movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, inc %edx, dec %ebx, jmp 0xc, push %ecx