Presentation is loading. Please wait.

Presentation is loading. Please wait.

Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables.

Similar presentations


Presentation on theme: "Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables."— Presentation transcript:

1 Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables

2 Motivation There has been a growing need for tools that analyze executables. It’s important to be able to decipher the behavior of worms, mobile code and virus-infected code. (w/o source) Existing tools either treat memory accesses extremely conservatively, or assume the presence of symbol-table or debugging information. – Neither approach is satisfactory. 2

3 Goal (1) Create an intermediate representation (IR) that is similar to the IR used in a compiler CFGs call graph used, killed, may-killed variables for CFG nodes points-to sets Why? a tool for a security analyst a general infrastructure for binary analysis 3

4 Codesurfer/x86 Architecture Binary Build CFGs Parse Binary IDA Pro Connector Value-set Analysis CodeSurfer Build SDG Browse Client Applications Initial estimate of memory addr. & offsets procedures and call sites malloc sites Whole-program analysis CFGs call graph used, killed, may-killed variables for CFG nodes points-to sets 4

5 Outline Example Challenges Value-set analysis (VSA) Performance [Future work] 5

6 Tutorial on x86 Instructions mov ecx, edx ; ecx = edx mov ecx, [edx] ; ecx = *edx mov [ecx], edx ;*ecx = edx lea ecx, [esp+8] ; ecx = &a[2] 6

7 Running Example int arrVal=0, *pArray2; int main() { int i, a[10], *p; /* Initialize pointers */ pArray2 = &a[2]; p = &a[0]; /* Initialize Array */ for( i = 0 ; i<10 ; ++i ) { *p = arrVal; p++ ; } /* Return a[2] */ return *pArray2; } ; ebx  i ; ecx  variable p 1 sub esp, 40 ;adjust stack 2 lea edx, [esp+8] ; 3 mov [4], edx ;pArray2=&a[2] 4 lea ecx, [esp] ;p=&a[0] 5 mov edx, [0] ; 6 loc_9: 7 mov [ecx], edx ;*p=arrVal 8 add ecx, 4 ;p++ 9 inc ebx ;i++ 10 cmp ebx, 10 ;i<10? 11 jl short loc_9 ; 12 mov edi, [4] ; 13 mov eax, [edi] ;return *pArray2 14 add esp, 40 15 retn 7

8 Running Example – Address Space 0h 4h a(40 bytes) arrVal(4 bytes) pArray2(4 bytes) Global data Data local to main (Activation Record) return_address 0ffffh ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 8

9 Running Example – Address Space 0h Global data Data local to main (Activation Record) return_address 0ffffh No debugging information 9

10 Challenges No debugging/symbol-table information Explicit memory addresses need something similar to C variables a-locs Only have an initial estimate of code, data, procedures, call sites, malloc sites extend IR disassemble data, add to CFG,... similar to elaboration of CFG/call-graph in a compiler because of calls via function pointers 10

11 Memory-regions An abstraction of the address space Idea : group similar runtime addresses collapse the runtime ARs for each procedure one region for all global data one region for each malloc site f f … … … g f g global f g 11

12 Example – Memory-regions (GL,0) (GL,8) (main, -40) Region for main Global Region (main, 0) ret_main ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 12

13 “Need Something Similar to C Variables” A-locs locations between consecutive addresses locations between consecutive offsets Registers Use a-locs instead of variables in static analysis e.g., killed a-loc  killed variable 13

14 Example – A-locs (GL,0) (GL,8) (main, -40) Region for main Global Region (main, 0) ret_main ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (GL,4) [0] [4] [esp] [esp+8] (main, -32) 14

15 Example – A-locs (main, -40) (main, 0) ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 15

16 Example – A-locs (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0 ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 16

17 Example – A-locs edx Locals : mainv_28, mainv_20 {a[0], a[2]} Globals : mem_0, mem_4 {arrVal, pArray2} mainv_20 mem_4 ecx mainv_28 edi ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0 ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 17

18 Value-Set Analysis Resembles a pointer-analysis algorithm interprets pointer-manipulation operations pointer arithmetic, too Resembles a numeric-analysis algorithm over-approximate the set of values/addresses held by an a-loc range information stride information interprets arithmetic operations on sets of values/addresses 18

19 Value-Set Analysis Over approximate/estimate 19 (static address) (static stack-frame offset) or (register) (static address) + (offset) (static address) + (offest) C compiler global variable local variable struct array Base + (index * scale) + offset

20 Value-set An a-loc  a variable the address of an a-loc (memory-region, offset within the region), (rgn, offset) An a-loc  an aggregate variable addresses of elements of an a-loc (rgn, {o 1, o 2, …, o n }) Value-set = a set of such addresses {(rgn 1, {o 1, o 2, …, o n }), …, (rgn r, {o 1, o 2, …, o m })} “r” – number of regions in the program 20

21 Value-set - RIC Idea: approximate {o 1, …, o k } with a numeric domain {1, 3, 5, 7} represented as 2[0,3]+1 Reduced Interval Congruence (RIC) (e.g.)[0,3] == 0,1,2,3 common stride lower and upper bounds displacement Set of addresses is an r-tuple: (ric 1, …, ric r ) ric 1 : offsets in global region a set of numbers: (ric 1, , …,  ) 21

22 Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8 ] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx  ( , 4[0,∞]-40) ebx  (1[0,9],  ) esp  ( , -40) 2 edi  ( , -32) esp  ( , -40) 22 mainGlobal

23 Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx  ( , 4[0,∞]-40) 2 edi  ( , -32) A stack-smashing attack? 23

24 Affine-Relation Analysis Value-set domain is non-relational cannot capture relationships among a-locs Imprecise results e.g. no upper bound for ecx at loc_9 ecx  ( , 4[0,∞]-40)... loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ;... 24

25 Affine-Relation Analysis Obtain affine relations via static analysis Use affine relations to improve precision e.g., at loc_9 ecx=esp+(4  ebx), ebx=([0,9],  ), esp=( ,-40)... loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ;...  ecx=( ,-40)+4([0,9])  ecx=( ,4[0,9]-40)  upper bound for ecx at loc_9 ecx ebx(==i) (==pArray2) …… esp 25

26 Example – Value-set analysis (main, -40) (main, 0) (main, -32) (GL,0) (GL,8) Region for main Global Region ret_main (GL,4) mainv_28 mem_4 mem_0 mainv_20 ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn 1 21 ecx  ( , 4[0,9]-40) No stack-smashing attack reported 26

27 Affine-Relation Analysis Affine relation x 1, x 2, …, x n – a-locs a 0, a 1, …, a n – integer constants a 0 +  i=1..n (a i x i ) = 0 Idea: determine affine relations on registers use such relations to improve precision 27

28 Performance ProgramProceduresInstructions Value-set analysis (seconds) Affine- relations (seconds) javac363,5557629 cat(2.0.14)1233,892926 cut(2.0.14)1294,329742 grep(2.4.2)24516,80811775 flex(2.5.4)23923,373108295 tar(1.13.19)58750,347220156 awk(3.1.0)59569,9271,0171,011 winhlp32 (5.00.2195.2014) 1,018108,3801,7121,290 28 Table 1.Running times for VSA and Affine-relation analysis

29 Future Work Aggregate Structure Identification Ramalingam et al. [POPL 99] Ignore declarative information Identify fields from the access patterns Useful for improving the a-loc abstraction discovering type information 29


Download ppt "Gogul Balakrishnan Thomas Reps University of Wisconsin Analyzing Memory Accesses in x86 Executables."

Similar presentations


Ads by Google