Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables.

Slides:



Advertisements
Similar presentations
Practical Malware Analysis
Advertisements

Intermediate Code Generation
Run-time Environment for a Program different logical parts of a program during execution stack – automatically allocated variables (local variables, subdivided.
Machine/Assembler Language Putting It All Together Noah Mendelsohn Tufts University Web:
8. Code Generation. Generate executable code for a target machine that is a faithful representation of the semantics of the source code Depends not only.
COMP 2003: Assembly Language and Digital Logic
C Programming and Assembly Language Janakiraman V – NITK Surathkal 2 nd August 2014.
Assembly Code Verification Using Model Checking Hao XIAO Singapore University of Technology and Design.
Lecture 11 – Code Generation Eran Yahav 1 Reference: Dragon 8. MCD
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
PC hardware and x86 3/3/08 Frans Kaashoek MIT
1 ICS 51 Introductory Computer Organization Fall 2006 updated: Oct. 2, 2006.
1 Function Calls Professor Jennifer Rexford COS 217 Reading: Chapter 4 of “Programming From the Ground Up” (available online from the course Web site)
Model Checking x86 Executables with CodeSurfer/x86 and WPDS++ G. Balakrishnan 1, T. Reps 1,2, N. Kidd 1, A. Lal 1, J. Lim 1, D. Melski 2, R. Gruian 2,
Accessing parameters from the stack and calling functions.
Gogul Balakrishnan NEC Laboratories America
Run time vs. Compile time
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Addressing Modes Chapter 11 S. Dandamudi To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer,  S.
CEG 320/520: Computer Organization and Assembly Language ProgrammingIntel Assembly 1 Intel IA-32 vs Motorola
Recovery of Variables and Heap Structure in x86 Executables Gogul Balakrishnan Thomas Reps University of Wisconsin.
6.828: PC hardware and x86 Frans Kaashoek
Dr. José M. Reyes Álamo 1.  The 80x86 memory addressing modes provide flexible access to memory, allowing you to easily access ◦ Variables ◦ Arrays ◦
Code Generation Gülfem Savrun Yeniçeri CS 142 (b) 02/26/2013.
CSc 453 Runtime Environments Saumya Debray The University of Arizona Tucson.
Computer Science Detecting Memory Access Errors via Illegal Write Monitoring Ongoing Research by Emre Can Sezer.
The x86 Architecture Lecture 15 Fri, Mar 4, 2005.
Recitation 2: Outline Assembly programming Using gdb L2 practice stuff Minglong Shao Office hours: Thursdays 5-6PM Wean Hall.
Today’s topics Procedures Procedures Passing values to/from procedures Passing values to/from procedures Saving registers Saving registers Documenting.
Analyzing Memory Accesses in Obfuscated x86 Executables Michael Venable Mohamed R. Choucane Md. Enamul Karim Arun Lakhotia (Presenter) DIMVA 2005 Wien.
Carnegie Mellon 1 Odds and Ends Intro to x86-64 Memory Layout.
Microprocessors The ia32 User Instruction Set Jan 31st, 2002.
Addressing Modes Chapter 6 S. Dandamudi To be used with S. Dandamudi, “Introduction to Assembly Language Programming,” Second Edition, Springer,
CNIT 127: Exploit Development Ch 1: Before you begin.
Gogul Balakrishnan, Radu Gruian and Thomas Reps Computer Science Dept., Univ. of Wisconsin GrammaTech, Inc. April, 2005 CodeSurfer / x86 A Platform for.
Assembly Language. Symbol Table Variables.DATA var DW 0 sum DD 0 array TIMES 10 DW 0 message DB ’ Welcome ’,0 char1 DB ? Symbol Table Name Offset var.
Carnegie Mellon 1 Machine-Level Programming I: Basics Lecture, Feb. 21, 2013 These slides are from website which accompanies the.
Analyzing Stripped Device-Driver Executables Gogul Balakrishnan 1 Thomas Reps 2 1 NEC Laboratories America 2 University of Wisconsin (Work done at University.
Compiler Construction Code Generation Activation Records
Arrays. Outline 1.(Introduction) Arrays An array is a contiguous block of list of data in memory. Each element of the list must be the same type and use.
Computer Organization & Assembly Language University of Sargodha, Lahore Campus Prepared by Ali Saeed.
Analyzing Memory Accesses in x86 Executables Gogul Balakrishnan Thomas Reps University of Wisconsin.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.
Assembly Language Addressing Modes. Introduction CISC processors usually supports more addressing modes than RISC processors. –RISC processors use the.
ICS51 Introductory Computer Organization Accessing parameters from the stack and calling functions.
Recitation 3: Procedures and the Stack
Assembly function call convention
Instruction Set Architecture
Assembly language.
Data Transfers, Addressing, and Arithmetic
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Aaron Miller David Cohen Spring 2011
Introduction to Compilers Tim Teitelbaum
Assembly IA-32.
High-Level Language Interface
Assembly Language Programming V: In-line Assembly Code
Machine-Level Programming 1 Introduction
Computer Architecture and Assembly Language
Ramblr Making Reassembly Great Again
BIC 10503: COMPUTER ARCHITECTURE
Code Generation.
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Machine-Level Programming: Introduction
Multi-modules programming
X86 Assembly Review.
CSC 497/583 Advanced Topics in Computer Security
Computer Architecture and System Programming Laboratory
Computer Architecture and Assembly Language
Computer Architecture and System Programming Laboratory
Computer Architecture and System Programming Laboratory
Presentation transcript:

Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables

Motivation There has been a growing need for tools that analyze executables. It’s important to be able to decipher the behavior of worms, mobile code and virus-infected code. (w/o source) Existing tools either treat memory accesses extremely conservatively, or assume the presence of symbol-table or debugging information. – Neither approach is satisfactory. 2

Goal (1) Create an intermediate representation (IR) that is similar to the IR used in a compiler CFGs call graph used, killed, may-killed variables for CFG nodes points-to sets Why? a tool for a security analyst a general infrastructure for binary analysis 3

Codesurfer/x86 Architecture Binary Build CFGs Parse Binary IDA Pro Connector Value-set Analysis CodeSurfer Build SDG Browse Client Applications Initial estimate of memory addr. & offsets procedures and call sites malloc sites Whole-program analysis CFGs call graph used, killed, may-killed variables for CFG nodes points-to sets 4

Outline Example Challenges Value-set analysis (VSA) Performance [Future work] 5

Tutorial on x86 Instructions mov ecx, edx ; ecx = edx mov ecx, [edx] ; ecx = *edx mov [ecx], edx ;*ecx = edx lea ecx, [esp+8] ; ecx = &a[2] 6

Running Example int arrVal=0, *pArray2; int main() { int i, a[10], *p; /* Initialize pointers */ pArray2 = &a[2]; p = &a[0]; /* Initialize Array */ for( i = 0 ; i<10 ; ++i ) { *p = arrVal; p++ ; } /* Return a[2] */ return *pArray2; } ; ebx  i ; ecx  variable p 1 sub esp, 40 ;adjust stack 2 lea edx, [esp+8] ; 3 mov [4], edx ;pArray2=&a[2] 4 lea ecx, [esp] ;p=&a[0] 5 mov edx, [0] ; 6 loc_9: 7 mov [ecx], edx ;*p=arrVal 8 add ecx, 4 ;p++ 9 inc ebx ;i++ 10 cmp ebx, 10 ;i<10? 11 jl short loc_9 ; 12 mov edi, [4] ; 13 mov eax, [edi] ;return *pArray2 14 add esp, retn 7

Running Example – Address Space 0h 4h a(40 bytes) arrVal(4 bytes) pArray2(4 bytes) Global data Data local to main (Activation Record) return_address 0ffffh ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 8

Running Example – Address Space 0h Global data Data local to main (Activation Record) return_address 0ffffh No debugging information 9

Challenges No debugging/symbol-table information Explicit memory addresses need something similar to C variables a-locs Only have an initial estimate of code, data, procedures, call sites, malloc sites extend IR disassemble data, add to CFG,... similar to elaboration of CFG/call-graph in a compiler because of calls via function pointers 10

Memory-regions An abstraction of the address space Idea : group similar runtime addresses collapse the runtime ARs for each procedure one region for all global data one region for each malloc site f f … … … g f g global f g 11

Example – Memory-regions (GL,0) (GL,8) (main, -40) Region for main Global Region (main, 0) ret_main ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 12

“Need Something Similar to C Variables” A-locs locations between consecutive addresses locations between consecutive offsets Registers Use a-locs instead of variables in static analysis e.g., killed a-loc  killed variable 13

Example – A-locs (GL,0) (GL,8) (main, -40) Region for main Global Region (main, 0) ret_main ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (GL,4) [0] [4] [esp] [esp+8] (main, -32) 14

Example – A-locs (main, -40) (main, 0) ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 15

Example – A-locs (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0 ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 16

Example – A-locs edx Locals : mainv_28, mainv_20 {a[0], a[2]} Globals : mem_0, mem_4 {arrVal, pArray2} mainv_20 mem_4 ecx mainv_28 edi ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0 ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 17

Value-Set Analysis Resembles a pointer-analysis algorithm interprets pointer-manipulation operations pointer arithmetic, too Resembles a numeric-analysis algorithm over-approximate the set of values/addresses held by an a-loc range information stride information interprets arithmetic operations on sets of values/addresses 18

Value-Set Analysis Over approximate/estimate 19 (static address) (static stack-frame offset) or (register) (static address) + (offset) (static address) + (offest) C compiler global variable local variable struct array Base + (index * scale) + offset

Value-set An a-loc  a variable the address of an a-loc (memory-region, offset within the region), (rgn, offset) An a-loc  an aggregate variable addresses of elements of an a-loc (rgn, {o 1, o 2, …, o n }) Value-set = a set of such addresses {(rgn 1, {o 1, o 2, …, o n }), …, (rgn r, {o 1, o 2, …, o m })} “r” – number of regions in the program 20

Value-set - RIC Idea: approximate {o 1, …, o k } with a numeric domain {1, 3, 5, 7} represented as 2[0,3]+1 Reduced Interval Congruence (RIC) (e.g.)[0,3] == 0,1,2,3 common stride lower and upper bounds displacement Set of addresses is an r-tuple: (ric 1, …, ric r ) ric 1 : offsets in global region a set of numbers: (ric 1, , …,  ) 21

Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8 ] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx  ( , 4[0,∞]-40) ebx  (1[0,9],  ) esp  ( , -40) 2 edi  ( , -32) esp  ( , -40) 22 mainGlobal

Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx  ( , 4[0,∞]-40) 2 edi  ( , -32) A stack-smashing attack? 23

Affine-Relation Analysis Value-set domain is non-relational cannot capture relationships among a-locs Imprecise results e.g. no upper bound for ecx at loc_9 ecx  ( , 4[0,∞]-40)... loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ;... 24

Affine-Relation Analysis Obtain affine relations via static analysis Use affine relations to improve precision e.g., at loc_9 ecx=esp+(4  ebx), ebx=([0,9],  ), esp=( ,-40)... loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ;...  ecx=( ,-40)+4([0,9])  ecx=( ,4[0,9]-40)  upper bound for ecx at loc_9 ecx ebx(==i) (==pArray2) …… esp 25

Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx  ( , 4[0,9]-40) No stack-smashing attack reported 26

Affine-Relation Analysis Affine relation x 1, x 2, …, x n – a-locs a 0, a 1, …, a n – integer constants a 0 +  i=1..n (a i x i ) = 0 Idea: determine affine relations on registers use such relations to improve precision 27

Performance ProgramProceduresInstructions Value-set analysis (seconds) Affine- relations (seconds) javac363, cat(2.0.14)1233, cut(2.0.14)1294, grep(2.4.2)24516, flex(2.5.4)23923, tar( )58750, awk(3.1.0)59569,9271,0171,011 winhlp32 ( ) 1,018108,3801,7121, Table 1.Running times for VSA and Affine-relation analysis

Future Work Aggregate Structure Identification Ramalingam et al. [POPL 99] Ignore declarative information Identify fields from the access patterns Useful for improving the a-loc abstraction discovering type information 29