Published by Griselda Laureen Brooks. Modified over 9 years ago.
1
Binary Analysis and Rewriting
2/9/2009
Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar (Stony Brook University)
Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song (UC Berkeley)
2
Binary Rewriting for Protecting Applications
Basic approach: instrument OS + application to enforce policies that protect an application from a hostile OS
Why binary rewriting?
  – Versatile: enforces a wide range of properties
    – Low-level: memory pages, instructions/operands, …
    – Higher-level: fine-grained (data-structure level) memory isolation, policies on callable functions and parameters, …
    – Global: information flow, control-flow integrity, …
  – Wide applicability: COTS and legacy applications available only in binary form
    – Application and all library code can be analyzed/rewritten
    – Works across programs written in many high-level languages
    – Handles low-level code written in assembly
3
Binary Rewriting Today
Relies on dynamic rewriting
  – Each basic block is rewritten just before its first execution
  – Benefit: sidesteps challenges of static rewriting, e.g., accurate disassembly
Drawbacks
  – High overheads for the problems of interest: 400% to 4000% for taint-tracking
  – Difficulty in reasoning about higher-level properties
    – Limited visibility (a single basic block) constrains the classes of properties that can be reasoned about
  – Targets a single instruction set (usually x86)
4
Our Approach
Develop novel static-analysis-based methods to overcome the drawbacks of today's techniques
Many research challenges:
  – Robust and scalable static analysis of low-level code produced by different compilers (or hand-written assembly)
  – Accurate disassembly of binary code
    – Indirect control-flow transfers, non-standard call/return conventions, mingling of data and code, …
  – Accurate reasoning about key properties
    – Dynamic taint analysis
5
Robust and scalable static analysis of low-level code
6
Static analysis of low-level code
Scalability relies on modularity
  – Analyze functions individually, compose results
  – Avoids repeated analysis of the same code (esp. libraries)
  – Strength comes from accurate treatment of local variables
Challenges in low-level binary code
  – Difficult to identify parameter passing in optimized code
    – Missing pushes, parameter passing via registers, …
  – Difficult to distinguish local variables from other accesses
    – Caller/callee-saved registers, stack pointer conventions, …
7
Static analysis of low-level code
To solve these challenges, previous approaches
  – make optimistic assumptions or rely on compiler idioms
  – often fail on optimized code and/or large programs
  – don't work for other compilers or hand-written assembly
Our solution: develop a new static analysis that
  – Uses systematic analysis to avoid assumptions/heuristics
    – Parameters, passing conventions, caller/callee-saved registers, …
  – Verifies the assumptions that it needs to make
    – Preservation of the stack pointer across calls
    – Whether a return goes back to the caller, etc.
  – Accurately tracks local variables by analyzing values held in registers and on the stack
8
Stack Analysis
Identify well-formed functions
  – Associate a scope and an activation record with each
No assumptions about
  – Parameters and return values
  – Caller and callee saves
  – Use of base pointers
[Figure labels: ESP, RETURN ADDR, ƒ]
9
Abstract Interpretation for Stack Analysis
Lattice: activation record
Code: push %ebp; mov %esp, %ebp; sub $16, %esp
State on entry:
  – EBP = Base_BP + [0,0]
  – ESP = Base_SP + [0,0]
  – Stack: offset 0 is Base_SP
10
Abstract Interpretation for Stack Analysis
Lattice: activation record
Code: push %ebp; mov %esp, %ebp; sub $16, %esp
State after push %ebp:
  – EBP = Base_BP + [0,0]
  – ESP = Base_SP + [-4,-4]
  – Stack: offset 0 is Base_SP; offset -4 holds Base_BP + [0,0]
11
Abstract Interpretation for Stack Analysis
Lattice: activation record
Code: push %ebp; mov %esp, %ebp; sub $16, %esp
State after mov %esp, %ebp:
  – EBP = Base_SP + [-4,-4]
  – ESP = Base_SP + [-4,-4]
  – Stack: offset 0 is Base_SP; offset -4 holds Base_BP + [0,0]
12
Abstract Interpretation for Stack Analysis
Lattice: activation record
Code: push %ebp; mov %esp, %ebp; sub $16, %esp
State after sub $16, %esp:
  – EBP = Base_SP + [-4,-4]
  – ESP = Base_SP - 20
  – Stack: offset 0 is Base_SP; offset -4 holds Base_BP + [0,0]
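The four states above can be reproduced with a very small abstract interpreter. The following is a minimal sketch, not the authors' implementation: it handles only the three prologue instructions, works on straight-line code (no join or widening, which a real lattice-based analysis would need), and the names `AbsVal`, `step` and `analyze` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbsVal:
    """Abstract value: a symbolic base plus an interval [lo, hi] of offsets."""
    base: str
    lo: int
    hi: int

    def shift(self, delta: int) -> "AbsVal":
        return AbsVal(self.base, self.lo + delta, self.hi + delta)

    def __str__(self) -> str:
        return f"{self.base}+[{self.lo},{self.hi}]"


def step(regs, stack, instr):
    """Apply one instruction to the abstract state; returns new (regs, stack)."""
    regs, stack = dict(regs), dict(stack)
    op, *args = instr
    if op == "push":      # push %reg: esp -= 4, then store reg at the new top
        regs["esp"] = regs["esp"].shift(-4)
        stack[regs["esp"]] = regs[args[0]]
    elif op == "mov":     # mov %src, %dst (register to register only)
        regs[args[1]] = regs[args[0]]
    elif op == "sub":     # sub $imm, %reg
        regs[args[1]] = regs[args[1]].shift(-args[0])
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return regs, stack


def analyze(code):
    """Run the instructions from the entry state used on the slides."""
    regs = {"esp": AbsVal("Base_SP", 0, 0), "ebp": AbsVal("Base_BP", 0, 0)}
    stack = {}
    for instr in code:
        regs, stack = step(regs, stack, instr)
    return regs, stack


PROLOGUE = [("push", "ebp"), ("mov", "esp", "ebp"), ("sub", 16, "esp")]
regs, stack = analyze(PROLOGUE)
print(regs["esp"])  # Base_SP+[-20,-20]
print(regs["ebp"])  # Base_SP+[-4,-4]
```

The stack is keyed by abstract address, so the slot written by `push %ebp` is found at `Base_SP+[-4,-4]` and holds `Base_BP+[0,0]`, matching the slides.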
13
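Stack Analysis (contd)
  push %ebp
  mov  %esp, %ebp
  sub  $16, %esp
  mov  8(%ebp), %eax      # load first argument
  add  $3, %eax
  mov  %eax, 8(%ebp)      # overwrite first argument
  mov  $7, -12(%ebp)      # store constant into a local
  mov  12(%ebp), %edx     # load second argument
  mov  %edx, -8(%ebp)     # copy it into a local
  leave
  ret
[Figure labels: args, locals, Base_SP, Base_SP+[-20,-20]]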
14
Function summaries from stack analysis
A summary records
  – Change in ESP as a result of executing the function
  – Number of incoming parameters
  – Changes in registers and parameters as a result of executing the function
For the function shown before:
  – ESP unchanged
  – 2 incoming arguments
  – EAX, EDX and the first parameter changed as shown before; other registers and parameters unchanged
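A rough sketch of how such a summary could be derived for the example function, once the stack analysis has established that EBP = Base_SP - 4 after the prologue. Everything here is illustrative: the tuple encoding of instructions, the `summarize` helper, and the assumption of 4-byte argument slots are not from the slides, and the real analysis avoids hard-coding such conventions.

```python
# Assumption: after `push %ebp; mov %esp, %ebp`, ebp = Base_SP - 4,
# so d(%ebp) refers to abstract address Base_SP + d - 4.
EBP_TO_BASE_SP = -4

# Body of the example function, encoded as (opcode, source, destination).
# Operands are ("reg", name), ("mem", displacement from ebp) or ("imm", value).
BODY = [
    ("mov", ("mem", 8), ("reg", "eax")),
    ("add", ("imm", 3), ("reg", "eax")),
    ("mov", ("reg", "eax"), ("mem", 8)),
    ("mov", ("imm", 7), ("mem", -12)),
    ("mov", ("mem", 12), ("reg", "edx")),
    ("mov", ("reg", "edx"), ("mem", -8)),
]


def summarize(body):
    """Collect argument count, modified registers/parameters and local slots."""
    arg_offsets, local_offsets = set(), set()
    regs_written, params_written = set(), set()
    for _op, src, dst in body:
        for kind, value in (src, dst):
            if kind == "mem":
                off = value + EBP_TO_BASE_SP   # offset relative to Base_SP
                if off > 0:
                    arg_offsets.add(off)       # above the return address
                elif off < 0:
                    local_offsets.add(off)     # below the return address
        if dst[0] == "reg":
            regs_written.add(dst[1])
        elif dst[0] == "mem" and dst[1] + EBP_TO_BASE_SP > 0:
            # Assumes 4-byte slots: offset 4 is parameter 1, offset 8 is 2, ...
            params_written.add((dst[1] + EBP_TO_BASE_SP) // 4)
    return {
        "num_args": max(arg_offsets) // 4 if arg_offsets else 0,
        "regs_written": sorted(regs_written),
        "params_written": sorted(params_written),
        "local_offsets": sorted(local_offsets),
    }


summary = summarize(BODY)
print(summary)
```

For the example this reports two incoming arguments, EAX and EDX as modified registers, and the first parameter as modified, which agrees with the summary on the slide.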
15
Analysis time
16
Static disassembly of binary code
17
Background: Disassembly Techniques
Linear sweep algorithm
  – Start at the program entry point and disassemble instructions sequentially
  – Key assumption: all instructions appear one after the next, without any gaps
  – Violated in most code (presence of data or padding)
Recursive traversal algorithm
  – After a control-flow transfer instruction (CTI), proceed to disassemble the target address
  – For conditional CTIs and non-CTIs, proceed to disassemble the next instruction
  – Key problems
    – Code reached only through indirect CTIs
    – Functions that don't return in the usual way
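The recursive traversal algorithm and its blind spot for indirect CTIs can be illustrated on a toy program. This is a sketch under stated assumptions: the instruction encoding (address mapped to opcode, size and static target), the opcode names, and the rule that calls fall through are all invented for the example, not taken from any real disassembler.

```python
# Toy program: address -> (opcode, size in bytes, static target or None).
# Addresses missing from the map stand for data or padding.
CODE = {
    0: ("call", 5, 16),
    5: ("jcc", 2, 9),       # conditional jump: target and fall-through
    7: ("nop", 1, None),
    8: ("ret", 1, None),
    9: ("ijmp", 2, None),   # indirect jump: target unknown statically
    16: ("nop", 1, None),
    17: ("ret", 1, None),
    24: ("nop", 1, None),   # reached only through the indirect jump at 9
    25: ("ret", 1, None),
}

NO_FALL_THROUGH = {"jmp", "ijmp", "ret"}


def recursive_traversal(code, entry):
    """Return the set of instruction addresses reachable by following
    direct control-flow transfers and fall-through edges from `entry`."""
    seen, worklist = set(), [entry]
    while worklist:
        addr = worklist.pop()
        if addr in seen or addr not in code:
            continue
        seen.add(addr)
        opcode, size, target = code[addr]
        if target is not None:             # direct CTI: follow its target
            worklist.append(target)
        if opcode not in NO_FALL_THROUGH:  # assumes calls return normally
            worklist.append(addr + size)
    return seen


found = recursive_traversal(CODE, 0)
print(sorted(found))               # [0, 5, 7, 8, 9, 16, 17]
print(sorted(set(CODE) - found))   # [24, 25]
```

The two instructions at 24 and 25 are exactly the "code reached only through indirect CTIs" case: without knowing the target of the jump at 9, the traversal never visits them.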
18
Our Approach for Disassembly
Assumption
  – No code obfuscation
Non-assumptions
  – Function prologue and epilogue patterns
  – Compiler idioms or (lack of) optimizations
Approach
  – Use recursive traversal
  – Use stack analysis to compute/verify return targets
  – Develop new analysis techniques to determine targets of indirect control-flow transfers
19
Our Approach: Type inference
Key insight: code pointer values don't undergo arithmetic or other transformations
  – Implication: values assigned to code pointers must represent indirect CTI targets
  – Achieves much better results than data-flow analysis
  – Avoids the global def-use problem, which is very hard in low-level languages
Compute two sets: possible code addresses and definite code addresses
  – Code at addresses in the definite set can be safely disassembled
  – Code at addresses not in the possible set can be safely relocated
20
Static Disassembly: Preliminary Results
Analysis of disassembler on 'ls' binary

Analysis                          Disassembled code   Reachable code not disassembled
Recursive traversal               2.7%                85%
Compiler idioms and heuristics    87%                 1%
Function pointer analysis         88%                 0%
21
Static Disassembly: Preliminary Results
The gap in dhclient is due to an incomplete implementation (handling of global arrays)

Application   Size (KB)   Disassembled code   Reachable code not disassembled
pdftops       14          97%                 0%
chroot        26          85%                 0%
chmod         39          87%                 0%
cat           43          92%                 0%
ls            96          88%                 0%
dhclient      411         81%                 4%
22
DTA++: Improving accuracy of Dynamic Taint Analysis
23
Under-tainting and Over-tainting
Results vary based on which values are considered to depend on others:
24
Under-tainting and Over-tainting
Results vary based on which values are considered to depend on others:
  – Too few dependencies lead to under-tainting
25
Under-tainting and Over-tainting
Results vary based on which values are considered to depend on others:
  – Too many dependencies lead to over-tainting
26
Basic Idea
Data dependencies
  – Taint propagates from the operands to the output of an operation
Control dependencies
  – Variables assigned within a conditional branch receive taint from the operands of the condition
  – Commonly omitted in DTA, leading to under-tainting
Key idea in DTA++: propagate taint only for control dependencies that would otherwise cause under-tainting (culprit implicit flows)
27
Intuition: Information Flow
Under-tainting occurs when the control-flow state represents (almost) all of the information in the inputs
28
Intuition: Information Flow
Under-tainting occurs when the control-flow state represents (almost) all of the information in the inputs
  char output[256];
  char input = next_in();
  long len = 0;
  if (input == '{') {          /* branch condition depends on the input */
      output[0] = '\\';        /* constants: no data dependency on input */
      output[1] = '{';
      len = 2;
  }
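The effect of the snippet on taint can be mimicked in a few lines. This is only an illustrative model, not a taint tracker: taint is a per-byte boolean, the `escape` function and its `track_control_deps` flag are hypothetical names, and the `else` branch (copying ordinary characters) is an addition that the slide's code does not show.

```python
def escape(ch, ch_tainted, track_control_deps):
    """Model the branch above; returns (output string, per-byte taint flags)."""
    output, taint = [], []
    if ch == "{":
        # Both bytes written here are constants, so pure data-flow
        # propagation leaves them untainted. With control-dependency
        # tracking they inherit the taint of the branch condition.
        branch_taint = ch_tainted and track_control_deps
        output += ["\\", "{"]
        taint += [branch_taint, branch_taint]
    else:
        # Assumed extra case: a direct copy is an ordinary data dependency.
        output.append(ch)
        taint.append(ch_tainted)
    return "".join(output), taint


print(escape("{", True, track_control_deps=False))  # ('\\{', [False, False])
print(escape("{", True, track_control_deps=True))   # ('\\{', [True, True])
print(escape("a", True, track_control_deps=False))  # ('a', [True])
```

With data dependencies only, the escaped output carries no taint even though it is fully determined by the tainted input, which is the under-tainting the slide describes.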
29
Offline Rule Generation
Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches)
Approach: find these locations in advance, and construct new taint propagation rules from them
Assumption: we are given test inputs that demonstrate the under-tainting
30
Architecture Overview
[Diagram labels: Offline Analysis; sample tainted input; execution trace; Conventional DTA; Extra Propagation; correct propagation information; Under-tainting Diagnosis; implicit flow branches; Rule Generation; DTA++ propagation rules; general tainted input; trace (or other analysis)]
31
Under-tainting Detection Predicate
Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow
Implementation: count how many other inputs could take the same execution path as t (using symbolic execution)
  – Few or none → φ(t) = true
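The counting idea behind φ can be sketched with brute-force enumeration in place of symbolic execution, which is feasible here only because the toy input is a single byte. The branch list, the `threshold` parameter and the function names are assumptions made for the example; the model also assumes every branch is evaluated in order regardless of earlier outcomes.

```python
# Toy program: a fixed sequence of branch conditions over one input byte.
BRANCHES = [
    lambda b: b < 0x80,        # e.g. an ASCII range check
    lambda b: b != ord(" "),   # not a space
    lambda b: b == ord("{"),   # equality test that pins down the input
]


def trace(value, length=None):
    """Outcomes of the first `length` branches for this input byte."""
    return tuple(cond(value) for cond in BRANCHES[:length])


def count_same_path(prefix):
    """Number of input bytes (including the original) following `prefix`."""
    return sum(1 for v in range(256) if trace(v, len(prefix)) == prefix)


def phi(prefix, threshold=1):
    """True when at most `threshold` inputs can take this path."""
    return count_same_path(prefix) <= threshold


t = trace(ord("{"))
for n in range(1, len(t) + 1):
    print(n, count_same_path(t[:n]), phi(t[:n]))
```

For the input `{`, 128 and then 127 inputs share the first two branch outcomes, but only one input survives the third branch, so φ first becomes true at the equality test.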
32
Search for Culprit Branches
Search through prefixes of a trace to find the shortest one satisfying φ: the last instruction in the prefix is the culprit
To minimize calls to φ, use binary search
After finding one culprit, remove it and repeat the search to find others
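The prefix search can be sketched as a standard binary search. The sketch assumes φ is monotone over prefixes (once a prefix satisfies φ, every longer prefix does too); the slides do not state this explicitly, but binary search needs it. The function name and the toy trace are hypothetical, and the "remove and repeat" step is left out.

```python
def shortest_culprit_prefix(trace, phi):
    """Return the smallest n such that phi(trace[:n]) holds, or None.

    Assumes phi is monotone: if it holds for a prefix, it holds for
    every longer prefix of the same trace.
    """
    if not trace or not phi(trace):
        return None
    lo, hi = 1, len(trace)       # invariant: phi(trace[:hi]) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if phi(trace[:mid]):
            hi = mid             # a prefix of length mid already suffices
        else:
            lo = mid + 1         # the culprit lies beyond position mid
    return lo


# Toy trace of branch labels; the stand-in predicate fires once the
# equality test on '{' is part of the prefix.
TRACE = ["is_ascii", "not_space", "eq_brace", "len_check", "flush"]
calls = []


def toy_phi(prefix):
    calls.append(len(prefix))
    return "eq_brace" in prefix


n = shortest_culprit_prefix(TRACE, toy_phi)
print(n, TRACE[n - 1])   # 3 eq_brace
print(calls)             # [5, 3, 2]
```

Only three calls to the predicate are needed for this five-element trace, which matters when each call involves symbolic execution.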
33
Experiment Setup
Subject programs are 8 Windows word-processing applications in binary form
Input tainted plain text from a virtual keyboard
Convert and save the text in RTF or HTML
  – RTF: “Taint it: {” → “Taint it \{”
  – HTML: “Taint it: <” → “Taint it: &lt;”
34
Results: Performance

Program, format        # of culprit implicit flows detected & fixed   Time for diagnosis
WordPad, RTF           1                                              0.26s
MS Word 2003, RTF      2431m 5.26s
AbiWord, HTML          114.29s
AngelWriter, HTML      3                                              0.63s
AurelEdit, RTF         1                                              0.76s
VNU Editor, RTF        1                                              0.34s
IntelliEdit, RTF       1                                              0.40s
CryptEdit, RTF         1                                              0.23s
35
Measuring Over-tainting
After saving the file, count the number of tainted bytes in system memory
  – Also counted tainted branches (in paper)
Four levels of propagation:
  – Original: vanilla DTA (has under-tainting)
  – Optimal: fix a single instruction manually
  – DTA++: targeted control-flow propagation
  – DYTAN*: indiscriminate control-flow propagation (similar to Clause et al.)
36
Over-tainting Measurements
37
Questions?
38
Related Work
  – IDA Pro
  – VSA
  – NaCl
  – TIE
  – BIRD