Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant.

Presentation transcript:

Binary Analysis and Rewriting
Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar (Stony Brook University)
Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song (UC Berkeley)

Motivation
- A popular approach for protecting applications from an untrusted OS is to rely on a trusted VMM.
- Binary translation is one of the commonly used implementation technologies in VMMs (QEMU, earlier versions of VMware, ...).
  - Benefits: no need for hardware support, applicable to COTS binaries, the whole system can be instrumented.
- Unfortunately, existing binary translators are unsuited for enforcing higher-level properties such as information flow, control-flow integrity, and object-granularity memory safety.
  - They incur very high overheads (4x to 10x slowdown) or are simply unable to express certain properties.

Our Approach
- Develop novel static-analysis-based methods to overcome the drawbacks of today's techniques.
- Robust, scalable static analysis of low-level code
  - From different compilers, or hand-coded assembly
- Accurate disassembly of binary code
  - Indirect control-flow transfers, non-standard call/return conventions, mixing of data and code, ...
- Accurate reasoning about key properties
  - Dynamic taint analysis

Robust and scalable Static analysis of low-level code

Static analysis of low-level code
- Scalability: requires modular analysis
  - Analyze functions individually, compose results
  - Avoids repeated analysis of the same code (esp. libraries)
- Strength: requires accurate reasoning about variables (esp. local variables)
- Challenges in low-level binary code
  - Difficult to identify parameter passing in optimized code: missing pushes, parameter passing via registers, ...
  - Difficult to distinguish local variables from other accesses: caller/callee-saved registers, stack pointer conventions, ...

Static analysis of low-level code
- To solve these challenges, previous approaches make optimistic assumptions or rely on compiler idioms. These approaches
  - often fail on optimized code and/or large programs
  - don't work for other compilers or hand-written assembly
- Our solution: develop a new approach that
  - Uses systematic analysis to reduce assumptions/heuristics
  - Accurately tracks local variables by analyzing values held in registers and on the stack

Stack Analysis
- Analyzes one function at a time
- Examines the use of the stack to
  - Determine parameters: number of them, whether in registers or on the stack
  - Determine caller- and callee-saved registers
  - Summarize effect on parameters: preservation of SP, return to caller, changes in parameter or register contents, ...
[Figure: stack of function ƒ, with ESP pointing at the return address]

Abstract Interpretation for Stack Analysis
Lattice: activation record
Code:
    push %ebp
    mov %esp, %ebp
    sub $16, %esp
State: EBP = Base_BP + [0,0]; ESP = Base_SP + [0,0]

Abstract Interpretation for Stack Analysis
Lattice: activation record
Code:
    push %ebp
    mov %esp, %ebp
    sub $16, %esp
State: EBP = Base_BP + [0,0]; ESP = Base_SP + [-4,-4]; the stack slot at offset -4 from Base_SP holds Base_BP + [0,0]

Stack Analysis (contd)
Code for f:
    push %ebp
    mov %esp, %ebp
    sub $16, %esp
    mov 8(%ebp), %eax
    add $3, %eax
    mov %eax, 8(%ebp)
    mov $7, -12(%ebp)
    mov 12(%ebp), %edx
    mov %edx, -8(%ebp)
    leave
    ret
Summary for f:
- No change to ESP
- Two input parameters on the stack
- EAX, EDX, arg1 changed as shown
- Others unchanged
[Figure: stack frame with args above Base_SP and locals below it, down to Base_SP + [-20,-20]]
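The summary on this slide can be reproduced with a very small abstract interpreter. What follows is a minimal sketch, not the authors' implementation: the tuple-based instruction encoding, the `analyze` function, and the use of single offsets instead of the `[lo,hi]` ranges on the slides are all assumptions made for illustration. It tracks ESP and EBP as offsets from `Base_SP` (the value of ESP at function entry) and records which stack slots above `Base_SP` are read or written.

```python
# Hypothetical mini-IR for the function f on the slide, in AT&T operand order
# (source first, destination last). ("mem", reg, disp) stands for disp(%reg).
F = [
    ("push", "ebp"),
    ("mov", "esp", "ebp"),
    ("sub", 16, "esp"),
    ("mov", ("mem", "ebp", 8), "eax"),
    ("add", 3, "eax"),
    ("mov", "eax", ("mem", "ebp", 8)),
    ("mov", 7, ("mem", "ebp", -12)),
    ("mov", ("mem", "ebp", 12), "edx"),
    ("mov", "edx", ("mem", "ebp", -8)),
    ("leave",),
    ("ret",),
]


def is_mem(op):
    return isinstance(op, tuple) and op[0] == "mem"


def analyze(code):
    # Abstract values: ("SP", k) means Base_SP + k, ("BP", 0) is the caller's
    # frame pointer, None means "unknown". Base_SP is ESP at function entry,
    # i.e. the address of the return-address slot.
    regs = {"esp": ("SP", 0), "ebp": ("BP", 0)}
    stack = {}  # offset from Base_SP -> abstract value stored there
    params_read, params_written, clobbered = set(), set(), set()
    lowest_esp, esp_at_ret = 0, None

    def slot(op):
        # Offset (from Base_SP) addressed by a memory operand, if known.
        base = regs.get(op[1])
        return base[1] + op[2] if base and base[0] == "SP" else None

    for ins in code:
        opc = ins[0]
        if opc == "push":
            off = regs["esp"][1] - 4
            regs["esp"] = ("SP", off)
            stack[off] = regs.get(ins[1])
        elif opc == "leave":
            # mov %ebp, %esp ; pop %ebp (assumes EBP is Base_SP-relative here)
            off = regs["ebp"][1]
            regs["ebp"] = stack.get(off)
            regs["esp"] = ("SP", off + 4)
        elif opc == "ret":
            esp_at_ret = regs["esp"]
        elif opc in ("sub", "add") and ins[2] == "esp":
            delta = -ins[1] if opc == "sub" else ins[1]
            regs["esp"] = ("SP", regs["esp"][1] + delta)
        else:
            src, dst = ins[1], ins[2]
            if is_mem(src):
                off = slot(src)
                if off is not None and off > 0:
                    params_read.add(off)      # slots above the return address
            if is_mem(dst):
                off = slot(dst)
                if off is not None:
                    stack[off] = regs.get(src) if isinstance(src, str) else None
                    if off > 0:
                        params_written.add(off)
            else:
                copied = opc == "mov" and isinstance(src, str)
                regs[dst] = regs.get(src) if copied else None
                if dst not in ("esp", "ebp"):
                    clobbered.add(dst)
        lowest_esp = min(lowest_esp, regs["esp"][1])

    return {
        "esp_at_ret": esp_at_ret,
        "ebp_preserved": regs["ebp"] == ("BP", 0),
        "params_read": sorted(params_read),
        "params_written": sorted(params_written),
        "clobbered": sorted(clobbered),
        "lowest_esp": lowest_esp,
    }


summary = analyze(F)
```

In this sketch, `summary["esp_at_ret"]` being `("SP", 0)` corresponds to "no change to ESP", the two offsets in `params_read` correspond to the two stack parameters, and `lowest_esp` matches the `Base_SP + [-20,-20]` label on the slide.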

Stack Analysis: Preliminary results

Static disassembly of binary code

Background: Disassembly Techniques
- Linear sweep algorithm
  - Start with the program entry point, proceed to disassemble instructions sequentially
  - Key assumption: all instructions appear one after the next, without any gaps
  - Violated in most code (presence of data or padding)
- Recursive traversal algorithm
  - After a control-flow transfer instruction (CTI), proceed to disassemble the target address
  - For conditional CTIs and non-CTIs, proceed to disassemble the next instruction
  - Key problems
    - Code reached only through indirect CTIs
    - Functions that don't return in the usual way
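The difference between the two algorithms can be seen on a toy example. This is a minimal sketch under simplifying assumptions: the `IMAGE` layout is invented, every item occupies exactly one address, and the `"data"` tag is ground truth used only to score the result (a real disassembler would not have it).

```python
# Hypothetical one-address-per-item program image.
IMAGE = {
    0: ("op",),
    1: ("jcc", 4),     # conditional jump to 4, falls through to 2
    2: ("jmp", 5),     # unconditional jump
    3: ("data",),      # padding after the unconditional jump
    4: ("op",),
    5: ("jmp*",),      # indirect jump; its target (7) is not visible statically
    6: ("data",),
    7: ("op",),        # code reached only through the indirect jump
    8: ("ret",),
}


def linear_sweep(image, entry):
    """Decode every address in sequence, assuming there are no gaps."""
    addr, decoded = entry, []
    while addr in image:
        decoded.append(addr)
        addr += 1
    return decoded


def recursive_traversal(image, entry):
    """Follow control flow: targets of CTIs plus fall-through successors."""
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in seen or addr not in image:
            continue
        seen.add(addr)
        ins = image[addr]
        kind = ins[0]
        if kind == "jmp":
            work.append(ins[1])
        elif kind == "jcc":
            work.extend([ins[1], addr + 1])
        elif kind in ("ret", "jmp*"):
            pass                      # no statically known successor
        else:
            work.append(addr + 1)
    return sorted(seen)


swept = linear_sweep(IMAGE, 0)
traversed = recursive_traversal(IMAGE, 0)
misdecoded_data = [a for a in swept if IMAGE[a][0] == "data"]
missed_code = [a for a in IMAGE if IMAGE[a][0] != "data" and a not in traversed]
```

Linear sweep decodes the two data items as if they were instructions, while recursive traversal avoids them but never reaches the code behind the indirect jump.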

Our Approach for Disassembly
- Assumption
  - No code obfuscation
- Non-assumptions
  - Function prologue and epilogue patterns
  - Compiler idioms or (lack of) optimizations
- Approach
  - Use recursive traversal
  - Use stack analysis to compute/verify return targets
  - Develop new analysis to determine targets of indirect control-flow transfers

Our Approach: Type inference
- Key insight: code pointer values don't undergo arithmetic or other transformations
- Implication: values assigned to code pointers must represent indirect CTI targets
- Achieves much better results than data flow analysis
  - Avoids the global def-use problem, which is very hard in low-level languages
- Compute two sets of addresses: possible code addresses and definite code addresses
  - Code at definite code addresses can be safely disassembled
  - Code at addresses that are not possible code addresses can be safely relocated

Static Disassembly: Preliminary Results
Analysis of disassembler on the 'ls' binary

Analysis                         Disassembled code   Reachable code not disassembled
Recursive traversal              2.7%                85%
Compiler idioms and heuristics   87%                 1%
Function pointer analysis        88%                 0%

Static Disassembly: Preliminary Results

Application   Size (KB)   Disassembled code   Reachable code not disassembled
pdftops       14          97%                 0%
chroot        26          85%                 0%
chmod         39          87%                 0%
cat           43          92%                 0%
ls            96          88%                 0%
dhclient      411         81%                 4%

The gap in dhclient is due to an incomplete implementation of the handling of global arrays.

DTA++: Improving accuracy of Dynamic Taint Analysis [NDSS 2011]

Under-tainting and Over-tainting
Results vary based on which values are considered to depend on others:
- Too few dependencies lead to under-tainting
- Too many dependencies lead to over-tainting

Key Idea
- Under-tainting occurs when control-flow state represents (almost) all of the information in the inputs.
- Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows).

Example:
    char output[256];
    char input = next_in();
    long len = 0;
    if (input == '{') {
        output[0] = '\\';
        output[1] = '{';
        len = 2;
    }
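The snippet on this slide writes only constants to `output`, so pure data-flow taint tracking leaves the output untainted even though it is fully determined by the input. The following is a minimal Python model of that situation, not DTA++ itself: the `escape_brace` function and its `propagate_control_taint` switch are hypothetical names used only to contrast the two propagation rules.

```python
def escape_brace(ch, propagate_control_taint):
    """Model of the slide's snippet; returns (output bytes, taint flag per byte)."""
    input_tainted = True            # the byte comes from untrusted input
    output, taint = [], []
    if ch == "{":
        # Under pure data-flow tracking the constants written below are
        # untainted. With the control-dependency rule they inherit the taint
        # of the branch condition, which depends on the tainted input.
        branch_taint = input_tainted and propagate_control_taint
        output += ["\\", "{"]
        taint += [branch_taint, branch_taint]
    return output, taint


data_flow_only = escape_brace("{", propagate_control_taint=False)
with_control_rule = escape_brace("{", propagate_control_taint=True)
```

With data-flow tracking alone both output bytes are untainted (under-tainting); enabling the rule at this one branch taints them without changing propagation anywhere else.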

DTA++ Approach Overview
- Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches)
- Approach: find these locations in advance, and construct new taint propagation rules for them
- Assumption: we are given test inputs that demonstrate the under-tainting

Approach Details
- Under-tainting detection predicate
  - Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow
  - Implementation: use symbolic execution to count how many other inputs could take the same execution path as t. Few or none → φ(t) = true
- Search for culprit branches
  - Find the shortest prefix of t that satisfies φ; the last instruction in the prefix is the culprit
  - Remove the culprit, repeat the search to find others
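The search described above can be sketched on a toy program with a one-byte input. This sketch makes two loud assumptions: it replaces symbolic execution with brute-force enumeration of all 256 inputs, and it models "removing" a culprit by simply hiding that branch from the traces. The names `trace`, `find_culprits`, and `max_others` are hypothetical.

```python
def trace(b):
    """Toy program: the branch outcomes observed for input byte b."""
    return [
        ("b0", b < 128),          # weak constraint: half of all inputs agree
        ("b1", b == ord("{")),    # equality test: pins the input to one value
    ]


def find_culprits(trace_fn, test_input, domain, max_others=0):
    """Repeatedly find the shortest trace prefix that (almost) no other input shares."""
    culprits = []
    while True:
        def visible(x):
            # Branches already identified as culprits are hidden from the trace.
            return [e for e in trace_fn(x) if e[0] not in culprits]

        t = visible(test_input)
        found = None
        for n in range(1, len(t) + 1):
            prefix = t[:n]
            # Stand-in for the symbolic-execution count: how many *other*
            # inputs follow the same path up to this point?
            others = sum(
                1 for x in domain
                if x != test_input and visible(x)[:n] == prefix
            )
            if others <= max_others:      # few or none: phi(prefix) holds
                found = prefix[-1][0]     # last branch in the prefix is the culprit
                break
        if found is None:
            return culprits
        culprits.append(found)


culprits = find_culprits(trace, ord("{"), range(256))
```

For the input `{`, the prefix ending at `b0` is shared by 127 other inputs, while the prefix ending at `b1` is shared by none, so `b1` is reported as the only culprit branch.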

DTA++ Results: Diagnosis Time

Program Description   # of Culprit Implicit Flows Detected & Fixed   Time for Diagnosis
WordPad, RTF          1                                              0.26s
MS Word 2003, RTF     2431m 5.26s
AbiWord, HTML         114.29s
AngelWriter, HTML     3                                              0.63s
AurelEdit, RTF        1                                              0.76s
VNU Editor, RTF       1                                              0.34s
IntelliEdit, RTF      1                                              0.40s
CryptEdit, RTF        1                                              0.23s

DTA++ Results: Over-tainting

Summary and Future Work
- Develop novel static-analysis-based methods to overcome the drawbacks of today's techniques
  - Robust, scalable static analysis of low-level code
  - Accurate disassembly of binary code
  - Accurate reasoning about key properties: dynamic taint analysis
- Future work
  - Experimentation and evaluation of stack analysis and disassembly
  - Robust and efficient binary instrumentation for information flow and related properties
  - Application to hostile OS defense