Download presentation
Presentation is loading. Please wait.
Published byMuriel Golden Modified over 9 years ago
1
Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant Pongsin Poosankam Dawn Song UC, Berkeley
3
Motivation A popular approach for protecting applications from untrusted OS is to rely on a trusted VMM Binary translation is one of the commonly used implementation technologies in VMMs QEMU, earlier versions of VMWare, … Benefits: No need for hardware support, applicable to COTS binaries, whole system can be instrumented Unfortunately, existing binary translators unsuited for enforcing higher level properties Information flow, control-flow integrity, object-granularity memory safety, … Incur very high overheads (4x to 10x slowdown), or are simply unable to express certain properties
4
Our Approach Develop novel static analysis based methods to overcome the drawbacks of today’s techniques Robust, scalable static analysis of low-level code From different compilers, or hand-coded assembly Accurate disassembly of binary code Indirect control-flow transfers, non-standard call/return conventions, mixing of data and code, … Accurate reasoning about key properties Dynamic taint analysis
5
Robust and scalable Static analysis of low-level code
6
Static analysis of low-level code Scalability: requires modular analysis Analyze functions individually, compose results Avoids repeated analysis of same code (esp. libraries) Strength: requires accurate reasoning about variables (esp. local variables) Challenges in low-level binary code Difficult to identify parameter passing in optimized code Missing pushes, parameter passing via registers,… Difficult to distinguish local variables from other accesses Caller/callee-saved registers, stack pointer conventions, …
7
Static analysis of low-level code To solve these challenges, previous approaches make optimistic assumptions, or rely on compiler idioms often fail on optimized code and/or large programs don’t work for other compilers, or hand-written assembly Our solution: Develop a new approach that Uses systematic analysis to reduce assumptions/heuristics Accurately tracks local variables by analyzing values held in registers and on the stack
8
Stack Analysis Analyzes one function at a time Examines the use of stack to Determine parameters Number of them, whether in registers or on stack Caller- and callee-saved registers Summarize effect on parameters Preservation of SP, return to caller, changes in parameter or register contents,… ESP RETURN ADDR ƒ
9
Abstract Interpretation for Stack Analysis LATTICE : Activation Record Base_BP +[0,0] EBP push %ebp mov %esp, %ebp sub $16, %esp Base_SP +[0,0] ESP 0 Base_SP
10
Abstract Interpretation for Stack Analysis LATTICE : Activation Record Base_BP +[0,0] EBP push %ebp mov %esp, %ebp sub $16, %esp Base_SP +[-4,-4] ESP Base_BP+[0,0] 0 -4 Base_SP
11
Stack Analysis (contd) Summary for f: No change to ESP Two input parameters on stack EAX, EDX, arg1 changed as shown Others unchanged : push %ebp mov %esp, %ebp sub $16, %esp mov 8(%ebp), %eax add $3, %eax mov %eax, 8(%ebp) mov $7, -12(%ebp) mov 12(%ebp), %edx mov %edx, -8(%ebp) leave ret args locals Base_SP Base_SP+[-20,-20]
12
Stack Analysis: Preliminary results
13
Static disassembly of binary code
14
Background: Disassembly Techniques Linear sweep algorithm Start with program entry point, proceed to disassemble instructions sequentially Key assumption: all instructions appear one after the next, without any gaps Violated in most code (presence of data or padding) Recursive Traversal Algorithm After a control-flow transfer instruction (CTI), proceed to disassemble target address For conditional CTI and non-CTI, proceed to disassemble next instruction Key problems Code reached only through indirect CTIs Functions that don’t return in the usual way
15
Our Approach for Disassembly Assumption No code obfuscation Non-assumptions Function prologue and epilogue patterns Compiler idioms or (lack of) optimizations Approach Use recursive traversal Use stack analysis to compute/verify return targets Develop new analysis to determine targets of indirect control-flow transfers
16
Our Approach: Type inference Key insight: Code pointer values don’t undergo arithmetic or other transformations Implication: values assigned to code pointers must represent indirect CTI targets Achieves much better results than data flow analysis Avoids global def-use problem, which is very hard in low- level languages Compute sets C of possible code addresses and C of definite code addresses Code at addresses in C can be safely disassembled Code at addresses not in C can be safely relocated
17
Static Disassembly: Preliminary Results Analysis of disassembler on 'ls' binary AnalysisDisassembled code Reachable code not disassembled Recursive Traversal2.7%85% Compiler idioms and heuristics87%1% Function pointer analysis88%0%
18
Static Disassembly: Preliminary Results Gap in dhclient due to incomplete implementation, dealing with global arrays Application Size (KB) Disassembled code Reachable code not disassembled pdftops1497%0% chroot2685%0% chmod3987%0% cat4392%0% ls9688%0% dhclient41181%4%
19
DTA++: Improving accuracy of Dynamic Taint Analysis [NDSS 2011]
20
Under-tainting and Over-tainting Results vary based on which values are considered to depend on others:
21
Under-tainting and Over-tainting Results vary based on which values are considered to depend on others: Too few dependencies lead to under-tainting
22
Under-tainting and Over-tainting Results vary based on which values are considered to depend on others: Too many dependencies lead to over-tainting
23
Under-tainting occurs when control flow state represents (almost) all of the information in inputs Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows) Key Idea
24
Under-tainting occurs when control flow state represents (almost) all of the information in inputs Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows) Key Idea 1 char output[256]; 2 char input = next_in(); 3 long len = 0; 4 if (input == '{') { 5 output[0] = '\\'; 6 output[1] = '{'; 7 len = 2; 8 }
25
DTA++ Approach Overview Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches) Approach: find these locations in advance, and construct new taint propagation rules for them Assumption: we are given test inputs that demonstrate the under-tainting
26
Approach Details Under-tainting Detection Predicate Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow Implementation Use symbolic execution to count how many other inputs could take the same execution path as t Few or none → φ(t) = true Search for Culprit Branches Find shortest prefix of t that satisfies φ the last instruction in the prefix is the culprit Remove culprit, repeat the search to find others
27
Program Description # of Culprit Implicit Flows Detected & Fixed Time for Diagnosis WordPad, RTF10.26s MS Word 2003, RTF2431m 5.26s AbiWord, HTML114.29s AngelWriter, HTML30.63s AurelEdit, RTF10.76s VNU Editor, RTF10.34s IntelliEdit, RTF10.40s CryptEdit, RTF10.23s DTA++ Results: Diagnosis Time
28
DTA++ Results: Over-tainting
29
Summary and Future Work Develop novel static analysis based methods to overcome the drawbacks of today’s techniques Robust, scalable static analysis of low-level code Accurate disassembly of binary code Accurate reasoning about key properties Dynamic taint analysis Future work Experimentation and evaluation of stack analysis and disassembly Robust and efficient binary instrumentation for information flow and related properties Application to hostile OS defense
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.