Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables
Motivation There has been a growing need for tools that analyze executables. It’s important to be able to decipher the behavior of worms, mobile code and virus-infected code. (w/o source) Existing tools either treat memory accesses extremely conservatively, or assume the presence of symbol-table or debugging information. – Neither approach is satisfactory. 2
Goal (1) Create an intermediate representation (IR) that is similar to the IR used in a compiler CFGs call graph used, killed, may-killed variables for CFG nodes points-to sets Why? a tool for a security analyst a general infrastructure for binary analysis 3
Codesurfer/x86 Architecture Binary Build CFGs Parse Binary IDA Pro Connector Value-set Analysis CodeSurfer Build SDG Browse Client Applications Initial estimate of memory addr. & offsets procedures and call sites malloc sites Whole-program analysis CFGs call graph used, killed, may-killed variables for CFG nodes points-to sets 4
Outline Example Challenges Value-set analysis (VSA) Performance [Future work] 5
Tutorial on x86 Instructions mov ecx, edx ; ecx = edx mov ecx, [edx] ; ecx = *edx mov [ecx], edx ;*ecx = edx lea ecx, [esp+8] ; ecx = &a[2] 6
Running Example int arrVal=0, *pArray2; int main() { int i, a[10], *p; /* Initialize pointers */ pArray2 = &a[2]; p = &a[0]; /* Initialize Array */ for( i = 0 ; i<10 ; ++i ) { *p = arrVal; p++ ; } /* Return a[2] */ return *pArray2; } ; ebx i ; ecx variable p 1 sub esp, 40 ;adjust stack 2 lea edx, [esp+8] ; 3 mov [4], edx ;pArray2=&a[2] 4 lea ecx, [esp] ;p=&a[0] 5 mov edx, [0] ; 6 loc_9: 7 mov [ecx], edx ;*p=arrVal 8 add ecx, 4 ;p++ 9 inc ebx ;i++ 10 cmp ebx, 10 ;i<10? 11 jl short loc_9 ; 12 mov edi, [4] ; 13 mov eax, [edi] ;return *pArray2 14 add esp, retn 7
Running Example – Address Space 0h 4h a(40 bytes) arrVal(4 bytes) pArray2(4 bytes) Global data Data local to main (Activation Record) return_address 0ffffh ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 8
Running Example – Address Space 0h Global data Data local to main (Activation Record) return_address 0ffffh No debugging information 9
Challenges No debugging/symbol-table information Explicit memory addresses need something similar to C variables a-locs Only have an initial estimate of code, data, procedures, call sites, malloc sites extend IR disassemble data, add to CFG,... similar to elaboration of CFG/call-graph in a compiler because of calls via function pointers 10
Memory-regions An abstraction of the address space Idea : group similar runtime addresses collapse the runtime ARs for each procedure one region for all global data one region for each malloc site f f … … … g f g global f g 11
Example – Memory-regions (GL,0) (GL,8) (main, -40) Region for main Global Region (main, 0) ret_main ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 12
“Need Something Similar to C Variables” A-locs locations between consecutive addresses locations between consecutive offsets Registers Use a-locs instead of variables in static analysis e.g., killed a-loc killed variable 13
Example – A-locs (GL,0) (GL,8) (main, -40) Region for main Global Region (main, 0) ret_main ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (GL,4) [0] [4] [esp] [esp+8] (main, -32) 14
Example – A-locs (main, -40) (main, 0) ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 15
Example – A-locs (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0 ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 16
Example – A-locs edx Locals : mainv_28, mainv_20 {a[0], a[2]} Globals : mem_0, mem_4 {arrVal, pArray2} mainv_20 mem_4 ecx mainv_28 edi ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0 ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 17
Value-Set Analysis Resembles a pointer-analysis algorithm interprets pointer-manipulation operations pointer arithmetic, too Resembles a numeric-analysis algorithm over-approximate the set of values/addresses held by an a-loc range information stride information interprets arithmetic operations on sets of values/addresses 18
Value-Set Analysis Over approximate/estimate 19 (static address) (static stack-frame offset) or (register) (static address) + (offset) (static address) + (offest) C compiler global variable local variable struct array Base + (index * scale) + offset
Value-set An a-loc a variable the address of an a-loc (memory-region, offset within the region), (rgn, offset) An a-loc an aggregate variable addresses of elements of an a-loc (rgn, {o 1, o 2, …, o n }) Value-set = a set of such addresses {(rgn 1, {o 1, o 2, …, o n }), …, (rgn r, {o 1, o 2, …, o m })} “r” – number of regions in the program 20
Value-set - RIC Idea: approximate {o 1, …, o k } with a numeric domain {1, 3, 5, 7} represented as 2[0,3]+1 Reduced Interval Congruence (RIC) (e.g.)[0,3] == 0,1,2,3 common stride lower and upper bounds displacement Set of addresses is an r-tuple: (ric 1, …, ric r ) ric 1 : offsets in global region a set of numbers: (ric 1, , …, ) 21
Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, [esp+8 ] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx ( , 4[0,∞]-40) ebx (1[0,9], ) esp ( , -40) 2 edi ( , -32) esp ( , -40) 22 mainGlobal
Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx ( , 4[0,∞]-40) 2 edi ( , -32) A stack-smashing attack? 23
Affine-Relation Analysis Value-set domain is non-relational cannot capture relationships among a-locs Imprecise results e.g. no upper bound for ecx at loc_9 ecx ( , 4[0,∞]-40)... loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ;... 24
Affine-Relation Analysis Obtain affine relations via static analysis Use affine relations to improve precision e.g., at loc_9 ecx=esp+(4 ebx), ebx=([0,9], ), esp=( ,-40)... loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ;... ecx=( ,-40)+4([0,9]) ecx=( ,4[0,9]-40) upper bound for ecx at loc_9 ecx ebx(==i) (==pArray2) …… esp 25
Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx i ; ecx variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx ( , 4[0,9]-40) No stack-smashing attack reported 26
Affine-Relation Analysis Affine relation x 1, x 2, …, x n – a-locs a 0, a 1, …, a n – integer constants a 0 + i=1..n (a i x i ) = 0 Idea: determine affine relations on registers use such relations to improve precision 27
Performance ProgramProceduresInstructions Value-set analysis (seconds) Affine- relations (seconds) javac363, cat(2.0.14)1233, cut(2.0.14)1294, grep(2.4.2)24516, flex(2.5.4)23923, tar( )58750, awk(3.1.0)59569,9271,0171,011 winhlp32 ( ) 1,018108,3801,7121, Table 1.Running times for VSA and Affine-relation analysis
Future Work Aggregate Structure Identification Ramalingam et al. [POPL 99] Ignore declarative information Identify fields from the access patterns Useful for improving the a-loc abstraction discovering type information 29