Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant.

Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant Pongsin Poosankam Dawn Song UC, Berkeley

Motivation A popular approach for protecting applications from untrusted OS is to rely on a trusted VMM Binary translation is one of the commonly used implementation technologies in VMMs QEMU, earlier versions of VMWare, … Benefits: No need for hardware support, applicable to COTS binaries, whole system can be instrumented Unfortunately, existing binary translators unsuited for enforcing higher level properties Information flow, control-flow integrity, object-granularity memory safety, …  Incur very high overheads (4x to 10x slowdown), or are simply unable to express certain properties

Our Approach Develop novel static analysis based methods to overcome the drawbacks of today’s techniques Robust, scalable static analysis of low-level code  From different compilers, or hand-coded assembly Accurate disassembly of binary code  Indirect control-flow transfers, non-standard call/return conventions, mixing of data and code, … Accurate reasoning about key properties  Dynamic taint analysis

Robust and scalable Static analysis of low-level code

Static analysis of low-level code Scalability: requires modular analysis Analyze functions individually, compose results Avoids repeated analysis of same code (esp. libraries) Strength: requires accurate reasoning about variables (esp. local variables) Challenges in low-level binary code Difficult to identify parameter passing in optimized code Missing pushes, parameter passing via registers,… Difficult to distinguish local variables from other accesses Caller/callee-saved registers, stack pointer conventions, …

Static analysis of low-level code To solve these challenges, previous approaches make optimistic assumptions, or rely on compiler idioms often fail on optimized code and/or large programs don’t work for other compilers, or hand-written assembly Our solution: Develop a new approach that Uses systematic analysis to reduce assumptions/heuristics Accurately tracks local variables by analyzing values held in registers and on the stack

Stack Analysis Analyzes one function at a time Examines the use of stack to Determine parameters Number of them, whether in registers or on stack Caller- and callee-saved registers Summarize effect on parameters Preservation of SP, return to caller, changes in parameter or register contents,… ESP RETURN ADDR ƒ

Abstract Interpretation for Stack Analysis LATTICE : Activation Record Base_BP +[0,0] EBP push %ebp mov %esp, %ebp sub $16, %esp Base_SP +[0,0] ESP 0 Base_SP

Abstract Interpretation for Stack Analysis LATTICE : Activation Record Base_BP +[0,0] EBP push %ebp mov %esp, %ebp sub $16, %esp Base_SP +[-4,-4] ESP Base_BP+[0,0] 0 -4 Base_SP

Stack Analysis (contd) Summary for f: No change to ESP Two input parameters on stack EAX, EDX, arg1 changed as shown Others unchanged : push %ebp mov %esp, %ebp sub $16, %esp mov 8(%ebp), %eax add $3, %eax mov %eax, 8(%ebp) mov $7, -12(%ebp) mov 12(%ebp), %edx mov %edx, -8(%ebp) leave ret args locals Base_SP Base_SP+[-20,-20]

Stack Analysis: Preliminary results

Static disassembly of binary code

Background: Disassembly Techniques Linear sweep algorithm Start with program entry point, proceed to disassemble instructions sequentially Key assumption: all instructions appear one after the next, without any gaps Violated in most code (presence of data or padding) Recursive Traversal Algorithm After a control-flow transfer instruction (CTI), proceed to disassemble target address For conditional CTI and non-CTI, proceed to disassemble next instruction Key problems Code reached only through indirect CTIs Functions that don’t return in the usual way

Our Approach for Disassembly Assumption No code obfuscation Non-assumptions Function prologue and epilogue patterns Compiler idioms or (lack of) optimizations Approach Use recursive traversal Use stack analysis to compute/verify return targets Develop new analysis to determine targets of indirect control-flow transfers

Our Approach: Type inference Key insight: Code pointer values don’t undergo arithmetic or other transformations Implication: values assigned to code pointers must represent indirect CTI targets Achieves much better results than data flow analysis Avoids global def-use problem, which is very hard in low- level languages Compute sets C of possible code addresses and C of definite code addresses Code at addresses in C can be safely disassembled Code at addresses not in C can be safely relocated

Static Disassembly: Preliminary Results Analysis of disassembler on 'ls' binary AnalysisDisassembled code Reachable code not disassembled Recursive Traversal2.7%85% Compiler idioms and heuristics87%1% Function pointer analysis88%0%

Static Disassembly: Preliminary Results Gap in dhclient due to incomplete implementation, dealing with global arrays Application Size (KB) Disassembled code Reachable code not disassembled pdftops1497%0% chroot2685%0% chmod3987%0% cat4392%0% ls9688%0% dhclient41181%4%

DTA++: Improving accuracy of Dynamic Taint Analysis [NDSS 2011]

Under-tainting and Over-tainting Results vary based on which values are considered to depend on others:

Under-tainting and Over-tainting Results vary based on which values are considered to depend on others: Too few dependencies lead to under-tainting

Under-tainting and Over-tainting Results vary based on which values are considered to depend on others: Too many dependencies lead to over-tainting

Under-tainting occurs when control flow state represents (almost) all of the information in inputs Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows) Key Idea

Under-tainting occurs when control flow state represents (almost) all of the information in inputs Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows) Key Idea 1 char output[256]; 2 char input = next_in(); 3 long len = 0; 4 if (input == '{') { 5 output[0] = '\\'; 6 output[1] = '{'; 7 len = 2; 8 }

DTA++ Approach Overview Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches) Approach: find these locations in advance, and construct new taint propagation rules for them Assumption: we are given test inputs that demonstrate the under-tainting

Approach Details Under-tainting Detection Predicate Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow Implementation Use symbolic execution to count how many other inputs could take the same execution path as t Few or none → φ(t) = true Search for Culprit Branches Find shortest prefix of t that satisfies φ the last instruction in the prefix is the culprit Remove culprit, repeat the search to find others

Program Description # of Culprit Implicit Flows Detected & Fixed Time for Diagnosis WordPad, RTF10.26s MS Word 2003, RTF2431m 5.26s AbiWord, HTML114.29s AngelWriter, HTML30.63s AurelEdit, RTF10.76s VNU Editor, RTF10.34s IntelliEdit, RTF10.40s CryptEdit, RTF10.23s DTA++ Results: Diagnosis Time

DTA++ Results: Over-tainting

Summary and Future Work Develop novel static analysis based methods to overcome the drawbacks of today’s techniques Robust, scalable static analysis of low-level code Accurate disassembly of binary code Accurate reasoning about key properties Dynamic taint analysis Future work Experimentation and evaluation of stack analysis and disassembly Robust and efficient binary instrumentation for information flow and related properties Application to hostile OS defense

Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant.

Similar presentations

Presentation on theme: "Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant.

Similar presentations

Presentation on theme: "Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant."— Presentation transcript:

Similar presentations

About project

Feedback