Native x86 Decompilation Using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring Edward J. Schwartz *, JongHyup Lee ✝, Maverick.

Slides:



Advertisements
Similar presentations
Code Optimization and Performance Chapter 5 CS 105 Tour of the Black Holes of Computing.
Advertisements

Semantics Static semantics Dynamic semantics attribute grammars
ByteWeight: Learning to Recognize Functions in Binary Code
A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
David Brumley Carnegie Mellon University Credit: Some slides from Ed Schwartz.
CS412/413 Introduction to Compilers Radu Rugina Lecture 37: DU Chains and SSA Form 29 Apr 02.
Chair of Software Engineering From Program slicing to Abstract Interpretation Dr. Manuel Oriol.
Assembly Code Verification Using Model Checking Hao XIAO Singapore University of Technology and Design.
Review: Software Security David Brumley Carnegie Mellon University.
SSP Re-hosting System Development: CLBM Overview and Module Recognition SSP Team Department of ECE Stevens Institute of Technology Presented by Hongbing.
Homework Any Questions?. Statements / Blocks, Section 3.1 An expression becomes a statement when it is followed by a semicolon x = 0; Braces are used.
1 Function Calls Professor Jennifer Rexford COS 217 Reading: Chapter 4 of “Programming From the Ground Up” (available online from the course Web site)
Chair of Software Engineering Fundamentals of Program Analysis Dr. Manuel Oriol.
Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San.
1 Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Lecture 25 Generating Code for Basic Blocks Topics Code Generation Readings: April 19, 2006 CSCE 531 Compiler Construction.
Overview of program analysis Mooly Sagiv html://
Compiler Construction
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Overview of program analysis Mooly Sagiv html://
1 Homework Reading –PAL, pp Machine Projects –MP2 due at start of Class 12 Labs –Continue labs with your assigned section.
Language Evaluation Criteria
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 17: Code Mining.
System/Software Testing
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
Apache Exploits
Department of Computer Science A Static Program Analyzer to increase software reuse Ramakrishnan Venkitaraman and Gopal Gupta.
University of Washington x86 Programming III The Hardware/Software Interface CSE351 Winter 2013.
Program Behavior Characterization and Clustering: An Empirical Study for Failure Clustering Danqing Zhang School of Software Engineering, Tongji University.
CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz.
Inferring Specifications to Detect Errors in Code Mana Taghdiri Presented by: Robert Seater MIT Computer Science & AI Lab.
Lecture 1 Introduction Figures from Lewis, “C# Software Solutions”, Addison Wesley Richard Gesick.
Static Program Analyses of DSP Software Systems Ramakrishnan Venkitaraman and Gopal Gupta.
Compiler Testing Lavinder Singh CSS 548 Autumn 2012.
Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
Analyzing Memory Accesses in Obfuscated x86 Executables Michael Venable Mohamed R. Choucane Md. Enamul Karim Arun Lakhotia (Presenter) DIMVA 2005 Wien.
Page 1 5/2/2007  Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk  Arnaud Venet Kestrel Technology.
Low Level Programming Lecturer: Duncan Smeed The Interface Between High-Level and Low-Level Languages.
1 Control Flow Analysis Topic today Representation and Analysis Paper (Sections 1, 2) For next class: Read Representation and Analysis Paper (Section 3)
Chapter 5 Methods 1. Motivations Method : groups statements that perform a function.  Level of abstraction (black box)  Code Reuse – no need to reinvent.
Overview of Back-end for CComp Zhaopeng Li Software Security Lab. June 8, 2009.
C++ for Engineers and Scientists, Second Edition 1 Problem Solution and Software Development Software development procedure: method for solving problems.
Correct RelocationMarch 20, 2016 Correct Relocation: Do You Trust a Mutated Binary? Drew Bernat
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.
ICS51 Introductory Computer Organization Accessing parameters from the stack and calling functions.
Reading Condition Codes (Cont.)
Machine-Level Programming 2 Control Flow
Conditional Branch Example
Homework Reading Labs PAL, pp
Recitation 2 – 2/4/01 Outline Machine Model
Authors: Khaled Abdelsalam Mohamed Amr Kamel
Machine-Level Programming 2 Control Flow
Machine-Level Programming 2 Control Flow
Machine-Level Programming III: Procedures Sept 18, 2001
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
Machine-Level Representation of Programs III
Machine-Level Programming 2 Control Flow
Homework Reading Machine Projects Labs PAL, pp
Control Flow Analysis (Chapter 7)
Homework Any Questions?.
POWERPOINT PRESENTATION
Khaled Yakdan University of Bonn Fraunhofer FKIE
Automatically Diagnosing and Repairing Error Handling Bugs in C
Chapter 15 Debugging.
Compiler Construction CS 606 Sohail Aslam Lecture 1.
Chapter 15 Debugging.
Presentation transcript:

Native x86 Decompilation Using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring Edward J. Schwartz *, JongHyup Lee ✝, Maverick Woo *, and David Brumley * Carnegie Mellon University * Korea National University of Transportation ✝

Which would you rather analyze? 8/15/13Usenix Security push %ebp mov %esp,%ebp sub $0x10,%esp movl $0x1,-0x4(%ebp) jmp 1d mov -0x4(%ebp),%eax imul 0x8(%ebp),%eax mov %eax,-0x4(%ebp) subl $0x1,0x8(%ebp) cmpl $0x1,0x8(%ebp) jg f mov -0x4(%ebp),%eax leave ret int f(int c) { int accum = 1; for (; c > 1; c--) { accum = accum * c; } return accum; } Functions Variables Types Control Flow

8/15/133 int f (int x) { int y = 1; while (x > y) { y++; } return y; Compiler Original Source int f (int a) { int v = 1; while (a > v++) {} return v; Decompiler Recovered Source Compiled Binary

Decompilers for Software Security Manual reverse-engineering – Traditional decompiler application Apply wealth of existing source-code techniques to compiled programs [Chang06] – Find bugs, vulnerabilities Heard at Usenix Security 2013, during Dowsing for Overflows – “We need source code to access the high-level control flow structure and types” 8/15/13Usenix Security 20134

Desired Properties for Security 1.Effective abstraction recovery – Abstractions improve comprehension 8/15/13Usenix Security 20135

Effective Abstraction Recovery 8/15/13Usenix Security s1; while (e1) { if (e2) { break; } s2; } s3; More Abstract s1; L1: if (e1) { goto L2; } else { goto L4; } L2: if (e2) { goto L4; } L3: s2; goto L1; L4: s3; Less Abstract

Desired Properties for Security 1.Effective abstraction recovery – Abstractions improve comprehension 2.Correctness – Buggy(Decompiled)  Buggy(Original) 8/15/13Usenix Security 20137

Correctness 8/15/ Compiler Decompiler Compiled Binary int f (int x) { int y = 1; while (x > y) { y++; } return y; Original Source int f (int a) { int v = 1; while (a > v++) {} return v; Recovered Source Are these two programs semantically equivalent?

Prior Work on Decompilation 8/15/13Usenix Security 20139

The Phoenix C Decompiler 8/15/13Usenix Security

How to build a better decompiler? Recover missing abstractions one at a time – Semantics preserving abstraction recovery Rewrite program to use abstraction Don’t change behavior of program Similar to compiler optimization passes 8/15/13Usenix Security

Semantics Preservation 8/15/13Usenix Security s1; while (e1) { if (e2) { break; } s2; } s3; s1; L1: if (e1) { goto L2; } else { goto L4; } L2: if (e2) { goto L4; } L3: s2; goto L1; L4: s3; Abstraction Recovery Are these two programs semantically equivalent?

How to build a better decompiler? Recover missing abstractions one at a time – Semantics preserving abstraction recovery Rewrite program to use abstraction Don’t change behavior of program Similar to compiler optimization passes Challenge: building semantics preserving recovery algorithms – This talk Focus on control flow structuring Empirical demonstration 8/15/13Usenix Security

Phoenix Overview 8/15/13Usenix Security CFG Recovery Type Recovery Control Flow Structuring Source-code Output int f (int x) { int y = 1; while (x > y) { y++; } return y; New in Phoenix

Control Flow Graph Recovery 8/15/13Usenix Security Vertex represents straight-line binary code Edges represents possible control-flow transitions Challenge: Where does jmp %eax go? Phoenix uses Value Set Analysis [Balakrishnan10] CFG Recovery e¬e

Type Inference on Executables (TIE) [Lee11] 8/15/13Usenix Security movl (%eax), %ebx Constraint 1: %eax is a pointer to type Constraint 2: %ebx has type Solve all constraints to find How does each instruction constrain the types?

Control Flow Structuring 8/15/13Usenix Security

Compilation ¬e e Control Flow Structuring if (e) {…;} else {…;} 8/15/13Usenix Security if (e) {…;} else {…;} ? ¬e e Control Flow Structuring

Control Flow Structuring: Don’t Reinvent the Wheel Existing algorithms – Interval analysis [Allen70] Identifies intervals or regions – Structural analysis [Sharir80] Classifies regions into more specific types Both have been used in decompilers Phoenix based on structural analysis 8/15/13Usenix Security

Structural Analysis Iteratively match patterns to CFG – Collapse matching regions Returns a skeleton: while (e) { if (e’) {…} } 8/15/13Usenix Security B2 if-then-else B3 B1 B2 B1 while

Structural Analysis Example 8/15/13Usenix Security WHILE SEQ ITE 1 SEQ 1...; while (...) { if (...) {...} else {...} };...;

Structural Analysis Property Checklist 1.Effective abstraction recovery 8/15/13Usenix Security

Structural Analysis Property Checklist 1.Effective abstraction recovery – Graceless failures for unstructured programs break, continue, and goto statements Failures cascade to large subgraphs 8/15/13Usenix Security

Unrecovered Structure 8/15/13Usenix Security This break edge prevents progress UNKNOWN SEQ s1; while (e1) { if (e2) { break; } s2; } s3; s1; L1: if (e1) { goto L2; } else { goto L4; } L2: if (e2) { goto L4; } L3: s2; goto L1; L4: s3; OriginalDecompiled Fix: New structuring algorithm featuring Iterative Refinement Fix: New structuring algorithm featuring Iterative Refinement

Remove edges that are preventing a match – Represent in decompiled source as break, goto, continue Allows structuring algorithm to make more progress 8/15/13Usenix Security

Iterative Refinement 8/15/13Usenix Security s1; while (e1) { if (e2) { break; } s2; } s3; s1; while (e1) { if (e2) { break; } s2; } s3; OriginalDecompiled BREAK SEQ 1 WHILE SEQ

Structural Analysis Property Checklist 1.Effective abstraction recovery – Graceless failures for unstructured programs break, continue, and goto s Failures cascade to large subgraphs 2.Correctness 8/15/13Usenix Security

Structural Analysis Property Checklist 1.Effective abstraction recovery – Graceless failures for unstructured programs break, continue, and goto s Failures cascade to large subgraphs 2.Correctness – Not originally intended for decompilation – Structure can be incorrect for decompilation 8/15/13Usenix Security

Natural Loop Correctness Problem 8/15/13Usenix Security x=1 y=2 x≠1 y≠2 NATURAL LOOP NATURAL LOOP y=2 x=1 while (true) { s1; if (x==1) goto L2; if (y==2) goto L1; } Non-determinism Fix: Ensure patterns are Semantics Preserving

Semantics Preservation Applies inside of control flow structuring too 8/15/13Usenix Security x=1 y=2 x≠1 y≠2 NATURAL LOOP NATURAL LOOP y=2 x=1

Phoenix Implementation and Evaluation 8/15/13Usenix Security

Readability: Phoenix Output 8/15/13Usenix Security int f (void) { int a = 42; int b = 0; while (a) { if (b) { puts("c"); break; } else { puts("d"); } a--; b++; } puts ("e"); return 0; } t_reg32 f (void) { t_reg32 v20 = 42; t_reg32 v24; for (v24 = 0; v20 != 0; v24 = v24 + 1) { if (v24 != 0) { puts ("c"); break; } puts ("d"); v20 = v20 - 1; } puts ("e"); return 0; } Original Decompiled

Large Scale Experiment Details Decompilers tested – Phoenix – Hex-Rays (industry state of the art) – Boomerang (academic state of the art) Boomerang Did not terminate in <1 hour for most programs GNU coreutils 8.17, compiled with gcc – Programs of varying complexity – Test suite 8/15/13Usenix Security

Metrics (end-to-end decompiler) 1.Effective abstraction recovery – Control flow structuring 2.Correctness 8/15/13Usenix Security

Control Flow Structure: Gotos Emitted (Fewer Better) 8/15/13Usenix Security

Control Flow Structure: Gotos Emitted (Fewer is Better) 8/15/13Usenix Security

Ideal: Correctness 8/15/ Compiler Decompiler Compiled Binary int f (int x) { int y = 1; while (x > y) { y++; } return y; int f (int a) { int v = 1; while (a > v++) {} return v; Original Source Recovered Source Are these two programs semantically equivalent?

Scalable: Testing 8/15/ Compiler Decompiler Compiled Binary int f (int x) { int y = 1; while (x > y) { y++; } return y; int f (int a) { int v = 1; while (a > v++) {} return v; Original Source Recovered Source Passes tests Is the decompiled program consistent with test requirements?

Number of Correct Utilities 8/15/13Usenix Security All Utilities 107

Correctness All known correctness errors attributed to type recovery – No known problems in control flow structuring Rare issues in TIE revealed by Phoenix stress testing – Even one type error can cause incorrectness – Undiscovered variables – Overly general type information 8/15/13Usenix Security

Conclusion Phoenix decompiler – Ultimate goal: Correct, abstract decompilation – Control-flow structuring algorithm Iterative refinement Semantics preserving schemas End-to-end correctness and abstraction recovery experiments on >100 programs – Phoenix Control flow structuring: Correctness: 50%  Correct, abstract decompilation of real programs is within reach – This paper: improving control flow structuring – Next direction: improved static type recovery 8/15/13Usenix Security

Thanks! Questions? Edward J. Schwartz 8/15/13Usenix Security

END