Analysis of Stripped Binary Code
Laune Harris
University of Wisconsin – Madison
2 Binary code
    856c:
       d: 89e5
    856f: 83ec
        : e8ddffffff
    857b: c9
    857c: c3
    857d:
       e: 89e5
    8581: 83ec18
    858b: e8bfffffff
    8591: c9
    8592: c3
3 Binary code (with assembly)
    856c:               push %ebp
       d: 89e5          mov %esp, %ebp
    856f: 83ec          sub 8, %esp
        : e8ddffffff    call 857d
    857b: c9            leave
    857c: c3            ret
    857d:               push %ebp
       e: 89e5          mov %esp, %ebp
    8581: 83ec18        sub %eax, %ebp
    858b: e8bfffffff    call 866c
    8591: c9            leave
    8592: c3            ret
4 Binary code (with symbol info)
  main:
    856c:               push %ebp
       d: 89e5          mov %esp, %ebp
    856f: 83ec          sub 8, %esp
        : e8ddffffff    call foo
    857b: c9            leave
    857c: c3            ret
  foo:
    857d:               push %ebp
       e: 89e5          mov %esp, %ebp
    8581: 83ec18        sub %eax, %ebp
    858b: e8bfffffff    call printf
    8591: c9            leave
    8592: c3            ret
5 A lot of code is stripped
    Commercial applications (usually)
    Proprietary libraries (often)
    Viruses
    OS libraries and utilities (depends on OS and OS version)
6 Steps in symbol reconstruction
    Find and name functions
    Find function size
7 Finding functions
    Build a call graph and traverse it to find function start addresses
    Opportunistic parsing: use existing symbol names and addresses where available
    Works on a spectrum of binaries, from those with all symbols to fully stripped ones
8 Call Graph creation
  main:
    856c: push %ebp
9 Call Graph creation
  main:
    856c: push %ebp
    856d: mov %esp, %ebp
    856f: sub 8, %esp
    8572: call 857d
    857b: leave
    857c: ret
10 Call Graph creation
  main:
    856c: push %ebp
    856d: mov %esp, %ebp
    856f: sub 8, %esp
    8572: call func857d
    857b: leave
    857c: ret
  func857d:
    857d: push %ebp
11 Call Graph creation
  main:
    856c: push %ebp
    856d: mov %esp, %ebp
    856f: sub 8, %esp
    8572: call func857d
    857b: leave
    857c: ret
  func857d:
    857d: push %ebp
    857e: mov %esp, %ebp
    8581: sub %eax, %ebp
    858b: call 865e
    8591: call 866d
    8596: leave
    8597: ret
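The build-up on slides 8 to 11 can be sketched as a worklist traversal. This is a minimal illustration, not the parser described in the talk: the `PROGRAM` dictionary layout and the `build_call_graph` helper are hypothetical, the addresses and mnemonics are copied from slide 11, and each function is assumed to be straight-line code that ends at its first `ret`.

```python
from collections import deque

# Toy instruction map: address -> (mnemonic, operand, next address).
# Addresses and mnemonics mirror slide 11; the layout itself is hypothetical.
PROGRAM = {
    0x856c: ("push", "%ebp", 0x856d),
    0x856d: ("mov", "%esp, %ebp", 0x856f),
    0x856f: ("sub", "8, %esp", 0x8572),
    0x8572: ("call", 0x857d, 0x857b),
    0x857b: ("leave", None, 0x857c),
    0x857c: ("ret", None, None),
    0x857d: ("push", "%ebp", 0x857e),
    0x857e: ("mov", "%esp, %ebp", 0x8581),
    0x8581: ("sub", "%eax, %ebp", 0x858b),
    0x858b: ("call", 0x865e, 0x8591),
    0x8591: ("call", 0x866d, 0x8596),
    0x8596: ("leave", None, 0x8597),
    0x8597: ("ret", None, None),
}


def build_call_graph(program, entry, known_names=None):
    """Discover function entries reachable through direct calls.

    Returns (names, edges): names maps entry address -> name, edges maps
    entry address -> list of callee addresses in call order.
    """
    names = dict(known_names or {})           # opportunistic: keep existing symbols
    names.setdefault(entry, f"func{entry:x}")
    edges = {}
    seen = {entry}
    worklist = deque([entry])
    while worklist:
        func = worklist.popleft()
        callees = edges.setdefault(func, [])
        addr = func
        # Assumption: straight-line code; stop at 'ret' or at unknown bytes.
        while addr in program:
            mnemonic, operand, next_addr = program[addr]
            if mnemonic == "call":
                callees.append(operand)
                names.setdefault(operand, f"func{operand:x}")
                if operand not in seen:
                    seen.add(operand)
                    worklist.append(operand)
            if mnemonic == "ret":
                break
            addr = next_addr
    return names, edges


names, edges = build_call_graph(PROGRAM, 0x856c, {0x856c: "main"})
for func, callees in sorted(edges.items()):
    print(names[func], "->", [names[c] for c in callees])
```

Targets outside the listing, such as 865e and 866d, are still recorded as function entries with synthesized names, but nothing is parsed for them because no instructions are available at those addresses.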
12 Parsing Functions
    Disassemble a function's code by traversing its intra-procedural control flow graph
    The highest address reached determines the function size
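As a rough sketch of how the highest address reached can yield a function size, the example below walks a toy control flow graph. The instruction table, the `kind` labels (`fall`, `jcc`, `jmp`, `ret`), and the `function_extent` helper are all hypothetical; a real parser would obtain lengths and branch targets from a disassembler.

```python
def function_extent(instrs, entry):
    """Traverse the intra-procedural CFG starting at `entry`.

    instrs maps address -> (length, kind, target), where kind is one of
    'fall' (continues to next instruction), 'jcc' (conditional branch),
    'jmp' (unconditional branch), or 'ret' (function exit).
    Returns (size, visited): size is measured from the entry to the end
    of the highest-addressed instruction reached.
    """
    worklist = [entry]
    visited = set()
    end = entry
    while worklist:
        addr = worklist.pop()
        if addr in visited or addr not in instrs:
            continue
        visited.add(addr)
        length, kind, target = instrs[addr]
        end = max(end, addr + length)
        if kind in ("jmp", "jcc"):
            worklist.append(target)          # branch-taken successor
        if kind in ("fall", "jcc"):
            worklist.append(addr + length)   # fall-through successor
    return end - entry, visited


# Hypothetical function: a conditional branch skips ahead, and the
# highest block jumps back to the shared 'ret'.
TOY = {
    0x1000: (1, "fall", None),
    0x1001: (2, "fall", None),
    0x1003: (2, "jcc", 0x1008),
    0x1005: (2, "fall", None),
    0x1007: (1, "ret", None),
    0x1008: (3, "fall", None),
    0x100b: (2, "jmp", 0x1007),
}

size, visited = function_extent(TOY, 0x1000)
print(hex(size), len(visited))
```

Note that the `ret` at 0x1007 is not the last instruction of the function; the size comes from the block at 0x1008, which is only reachable through the conditional branch.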
13 Error Detection And Recovery
    CFG exit points are sometimes hard to identify
    Assume branches that are not obvious exits are intra-procedural
    Errors result in overestimation of function size
    Overlapping functions indicate an error
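The overlap check can be sketched as a single pass over functions sorted by start address. The `find_overlaps` helper and the sizes used here are hypothetical; the start addresses are borrowed from the earlier slides.

```python
def find_overlaps(extents):
    """Report neighbouring functions whose address ranges overlap.

    extents maps name -> (start, size). Functions are sorted by start
    address, and each one is compared with its immediate successor: if
    a function extends past the start of the next one, its size was
    probably overestimated.
    """
    ordered = sorted(extents.items(), key=lambda item: item[1][0])
    overlaps = []
    for (name_a, (start_a, size_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        if start_a + size_a > start_b:
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical sizes: 'main' ends exactly where func857d begins.
good = {"main": (0x856c, 0x11), "func857d": (0x857d, 0x1b)}
# Same functions, but main's size was overestimated by a mis-followed branch.
bad = {"main": (0x856c, 0x20), "func857d": (0x857d, 0x1b)}

print(find_overlaps(good))
print(find_overlaps(bad))
```

Comparing only adjacent pairs is enough to detect that an error exists, since a function that runs past a later neighbour also runs past its immediate one.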
14 Problems and Solutions
    Functions that are only called indirectly
      Problem: static call graph traversal does not discover these functions
      Solution: examine gaps in text space and use heuristics to find functions
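The slide does not say which heuristics are used. One plausible heuristic, assumed here purely for illustration, is to scan each gap for the common x86 prologue `push %ebp; mov %esp, %ebp` (bytes `55 89 e5`). The helper names and the sample bytes are hypothetical.

```python
PROLOGUE = bytes.fromhex("5589e5")  # push %ebp; mov %esp, %ebp


def find_gaps(text_start, text_end, extents):
    """Return [start, end) ranges of the text area not covered by any function.

    extents is an iterable of (start, size) pairs for known functions.
    """
    gaps = []
    cursor = text_start
    for start, size in sorted(extents):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, start + size)
    if cursor < text_end:
        gaps.append((cursor, text_end))
    return gaps


def candidate_entries(text, text_start, extents):
    """Scan every gap for the prologue pattern (an assumed heuristic)."""
    found = []
    for gap_start, gap_end in find_gaps(text_start, text_start + len(text), extents):
        window = text[gap_start - text_start:gap_end - text_start]
        offset = window.find(PROLOGUE)
        while offset != -1:
            found.append(gap_start + offset)
            offset = window.find(PROLOGUE, offset + 1)
    return found


# Hypothetical text area: one known function at 0x1000 (4 bytes), two
# padding bytes, an undiscovered function at 0x1006, then more padding.
TEXT = bytes.fromhex("5589e5c3" "9090" "5589e5c9c3" "9090909090")
KNOWN = [(0x1000, 4)]

print([hex(a) for a in candidate_entries(TEXT, 0x1000, KNOWN)])
```

A byte-pattern match like this can produce false positives when the same bytes occur inside data or in the middle of another instruction, so each candidate would still need to be validated by parsing from that address.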
15 Problems and Solutions cont’d
    Indirect Jumps
      Problem: need to find targets to complete CFG
      Solution: parse jump tables to find possible targets
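A minimal sketch of jump table parsing follows, assuming the table holds 32-bit little-endian absolute addresses and that it ends at the first entry that does not point into the text area. Both assumptions, along with the `parse_jump_table` helper and the sample addresses, are illustrative and not taken from the talk.

```python
import struct


def parse_jump_table(data, table_offset, text_start, text_end, max_entries=256):
    """Read consecutive 4-byte little-endian entries as possible jump targets.

    Stops at the first entry outside [text_start, text_end), at the end
    of the data, or after max_entries entries.
    """
    targets = []
    offset = table_offset
    while len(targets) < max_entries and offset + 4 <= len(data):
        (entry,) = struct.unpack_from("<I", data, offset)
        if not (text_start <= entry < text_end):
            break  # entry does not look like a code address
        targets.append(entry)
        offset += 4
    return targets


# Hypothetical read-only data: three code addresses followed by a zero word.
RODATA = struct.pack("<4I", 0x8600, 0x8610, 0x8620, 0)

print([hex(t) for t in parse_jump_table(RODATA, 0, 0x8000, 0x9000)])
```

Each recovered target would then be added as a successor of the indirect jump so that the CFG traversal can continue through it.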
16 Problems and Solutions cont’d
    Exception handling code
      Problem: creates code blocks that appear unreachable
      Solution: get block addresses from exception table
17 Test Programs
    Table columns: size (MB) unstripped, size (MB) stripped, number of functions
    Rows (values as extracted): paradyn ,676; condor_starter ,168; gimp ,329; eon ,163; om; alara; bubba
18 Evaluation
    Parse time (includes CFG creation)
      ~1.4x faster than the previous parser (with CFG)
      ~1.7x slower than the previous parser (without CFG)
    Stripped parse time
      Varies: 1.2x - 1.9x slower than unstripped
    Symbol recreation
      80% - 98% of original functions
19 Related Work
    Binary rewriters/instrumentation tools
      eel, emil, etch, goblin, leel, plto
    Disassemblers (lots available)
      IDA Pro, objdump, dumpbin, etc.
    Symbol table reconstructors
      dress, objdump-output-beautifier
20 Status
    Implemented on x86
    Ready for measurement and instrumentation
    Good start for security, but needs work
21 Future Work
    Develop more accurate heuristics to identify code in unlit areas of the binary
    Data flow analyses
    Port to other platforms
    Support unconventional function constructs
    Comprehensive comparison with other tools
    Evaluation on obfuscated code