A Generic Approach to Automatic Deobfuscation of Executable Code Paper by Babak Yadegari, Brian Johannesmeyer, Benjamin Whitely, Saumya Debray.

Slides:



Advertisements
Similar presentations
ROP is Still Dangerous: Breaking Modern Defenses Nicholas Carlini et. al University of California, Berkeley USENIX Security 2014 Presenter: Yue Li Part.
Advertisements

Saumya Debray The University of Arizona Tucson, AZ
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
White Box and Black Box Testing Tor Stålhane. What is White Box testing White box testing is testing where we use the info available from the code of.
Ensuring Operating System Kernel Integrity with OSck By Owen S. Hofmann Alan M. Dunn Sangman Kim Indrajit Roy Emmett Witchel Kent State University College.
Linear Obfuscation to Combat Symbolic Execution Zhi Wang 1, Jiang Ming 2, Chunfu Jia 1 and Debin Gao 3 1 Nankai University 2 Pennsylvania State University.
Binary Obfuscation Using Signals Igor V. Popov ( University of Arizona)‏ Saumya K. Debray (University of Arizona)‏ Gregory R. Andrews (University of Arizona)
Chapter 2: Algorithm Discovery and Design
LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan.
1 Software Testing and Quality Assurance Lecture 30 – Testing Systems.
About the Presentations The presentations cover the objectives found in the opening of each chapter. All chapter objectives are listed in the beginning.
Machine-Independent Optimizations Ⅰ CS308 Compiler Theory1.
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
C++ fundamentals.
2  Problem Definition  Project Purpose – Building Obfuscator  Obfuscation Quality  Obfuscation Using Opaque Predicates  Future Planning.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
Introduction To C++ Programming 1.0 Basic C++ Program Structure 2.0 Program Control 3.0 Array And Structures 4.0 Function 5.0 Pointer 6.0 Secure Programming.
1 Exception and Event Handling (Based on:Concepts of Programming Languages, 8 th edition, by Robert W. Sebesta, 2007)
Exceptions. Many problems in code are handled when the code is compiled, but not all Some are impossible to catch before the program is run  Must run.
Address Obfuscation: An Efficient Approach to Combat a Broad Range of Memory Error Exploits Sandeep Bhatkar, Daniel C. DuVarney, and R. Sekar Stony Brook.
Chapter Seven Advanced Shell Programming. 2 Lesson A Developing a Fully Featured Program.
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 17: Code Mining.
Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Chapter 12: Exception Handling
CIS Computer Programming Logic
Preventing SQL Injection Attacks in Stored Procedures Alex Hertz Chris Daiello CAP6135Dr. Cliff Zou University of Central Florida March 19, 2009.
Chapter 1: A First Program Using C#. Programming Computer program – A set of instructions that tells a computer what to do – Also called software Software.
Invitation to Computer Science, Java Version, Second Edition.
Programmer's view on Computer Architecture by Istvan Haller.
KEVIN COOGAN, GEN LU, SAUMYA DEBRAY DEPARTMENT OF COMUPUTER SCIENCE UNIVERSITY OF ARIZONA 報告者:張逸文 Deobfuscation of Virtualization- Obfuscated Software.
On the Difficulty of Software-Based Attestation of Embedded Devices Claude Castelluccia Aurélien Francillon Daniele Perito INRIA Rhône-Alpes
Ether: Malware Analysis via Hardware Virtualization Extensions Author: Artem Dinaburg, Paul Royal, Monirul Sharif, Wenke Lee Presenter: Yi Yang Presenter:
CSC-682 Cryptography & Computer Security Sound and Precise Analysis of Web Applications for Injection Vulnerabilities Pompi Rotaru Based on an article.
CSc 453 Final Code Generation Saumya Debray The University of Arizona Tucson.
Branch Regulation: Low-Overhead Protection from Code Reuse Attacks.
Auther: Kevian A. Roudy and Barton P. Miller Speaker: Chun-Chih Wu Adviser: Pao, Hsing-Kuo.
Problem Solving Techniques. Compiler n Is a computer program whose purpose is to take a description of a desired program coded in a programming language.
Christopher Kruegel University of California Engin Kirda Institute Eurecom Clemens Kolbitsch Thorsten Holz Secure Systems Lab Vienna University of Technology.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development 3.
Chapter 2: A Brief History Object- Oriented Programming Presentation slides for Object-Oriented Programming by Yahya Garout KFUPM Information & Computer.
RIVERSIDE RESEARCH INSTITUTE Deobfuscator: An Automated Approach to the Identification and Removal of Code Obfuscation Eric Laspe, Reverse Engineer Jason.
Presented by Teererai Marange. According to Caliskan-Islam et al.(2015), authorship attribution using the Code Stylometry feature set is possible when.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Chapter 3 Top-Down Design with Functions Part II J. H. Wang ( 王正豪 ), Ph. D. Assistant Professor Dept. Computer Science and Information Engineering National.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Chapter 8: Aspect Oriented Programming Omar Meqdadi SE 3860 Lecture 8 Department of Computer Science and Software Engineering University of Wisconsin-Platteville.
© 2006 Pearson Addison-Wesley. All rights reserved 2-1 Chapter 2 Principles of Programming & Software Engineering.
nd Joint Workshop between Security Research Labs in JAPAN and KOREA Polymorphic Worm Detection by Instruction Distribution Kihun Lee HPC Lab., Postech.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Formal Refinement of Obfuscated Codes Hamidreza Ebtehaj 1.
Chapter 1 Software Engineering Principles. Problem analysis Requirements elicitation Software specification High- and low-level design Implementation.
Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software Paper by: James Newsome and Dawn Song.
Formal Verification. Background Information Formal verification methods based on theorem proving techniques and model­checking –To prove the absence of.
Security Attacks Tanenbaum & Bo, Modern Operating Systems:4th ed., (c) 2013 Prentice-Hall, Inc. All rights reserved.
JavaScript Introduction and Background. 2 Web languages Three formal languages HTML JavaScript CSS Three different tasks Document description Client-side.
MOPS: an Infrastructure for Examining Security Properties of Software Authors Hao Chen and David Wagner Appears in ACM Conference on Computer and Communications.
Reachability Testing of Concurrent Programs1 Reachability Testing of Concurrent Programs Richard Carver, GMU Yu Lei, UTA.
Beyond Stack Smashing: Recent Advances In Exploiting Buffer Overruns Jonathan Pincus and Brandon Baker Microsoft Researchers IEEE Security and.
Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.
Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.
Jump-Oriented Programming
CMSC 345 Defensive Programming Practices from Software Engineering 6th Edition by Ian Sommerville.
TriggerScope: Towards Detecting Logic Bombs in Android Applications
Unified Modeling Language
Chapter 9 – Real Memory Organization and Management
INFS 6225 – Object-Oriented Systems Analysis & Design
Chapter 1 Introduction(1.1)
Presentation transcript:

A Generic Approach to Automatic Deobfuscation of Executable Code Paper by Babak Yadegari, Brian Johannesmeyer, Benjamin Whitely, Saumya Debray

Background: Obfuscation  In general: obscuring semantics of a message  In code, hides nature of implementation  Malicious software uses it to:  avoid detection  prevent reverse engineering  make analysis difficult

Background: Obfuscation  Variety of approaches to obfuscating code  Can combine approaches to create layers of complexity  Commercially available obfuscation tools  Approaches mentioned in the paper:  Emulation-based obfuscation  Return-oriented programming

Background: Emulation-Based  Design a VM with an arbitrary instruction set i  Translate executable into this byte-code of i  Program interpreted by emulator for the VM  Instruction set can be perturbed for each instance of program  Can be combined with run-time unpacking  This obfuscation creates a problem because:  Execution trace only shows instructions in the emulator  Confusing memory accesses: some for program logic, some for emulator

Background: Return-oriented  Intended to bypass DEP  Gadgets: code snippets ending in return statement  Strung together via a buffer containing their addresses  Scattered code  JIT generation can add to confusion  Non-deterministic functionality from run to run

Research Problem (in general)  One of the primary ways to understand and defend against malicious software is via reverse engineering real pieces of code  Figuring out how to deobfuscate obfuscated software is thus an important problem in security

Related Solutions and Limitations  Coogan et al (reference 6): use equational reasoning to simplify emulation based execution traces  Sharif et al (reference 5): start by reverse engineering the VM, then recover logic from byte code  Lu et al (reference 9): translate ROP shellcode to non-ROP shellcode  Coogan’s solution struggles with complex control flow  Sharif’s solution has to make strong assumptions about the emulator  Lu’s solution is specific to ROP

Motivation and Research Problem v2  The drawback mentioned for Lu’s solution is shared by the others  This is the limitation that motivated the authors  Current deobfuscation techniques have to make an assumption about the obfuscation technique  Can’t handle instances of new or alternative techniques  The actual research problem addressed in the paper: How can we design a general approach to deobfuscation that works regardless of the obfuscation technique?

Contribution  The authors successfully describe and design a generic approach to deobfuscation  Implement their approach and evaluate it  Test programs obfuscated by emulation-based and return oriented programming, programs obfuscated via commercial tools

Approach  Note about threat model: assumes attacker is aware of their approach  First step to deobfuscation is determining what is semantically significant  Issue: this changes based on obfuscation technique  How to decide what is significant without making assumptions?

Approach  Solution: consider mapping of input values to output values  How does the input affect the output?  Use dynamic analysis  Collect multiple execution traces  Use info to simplify code and develop CFG

Approach  Four main steps  Identifying input and output values  Taint propagation and analysis  Code simplification  Control flow graph construction

Approach

Identifying Input to Output Flows  Steps 1 and 2 from previous slide  Step 1: Determining input and output values  Input values  values obtained from command line (arguments)  values written by a library routine, read by program instruction  Output values  defined by program instruction, read by library routine

Identifying Input to Output Flows  Step 2  Taint analysis/taint propagation  This is what actually determines the flow of input to output  i.e. how the input values eventually determine the output values  Notion of “tainted” values  Initial tainted values: input values  Values computed from tainted values become tainted  Forward and backward analysis  Special implementation

Identifying Input to Output Flows

Approach

Control Dependency Analysis  Necessary for the final steps  Notion of control-dependent instructions  Instruction j control-dependent on i if:  outcome of i determines if j is executed  Why is it necessary?  Get explicit control flow almost straight from taint analysis, but not implicit  Knowing both is important for untangling CFG

Approach

Trace Simplification  Step 3  This is where the code gets simplified  Use semantics preserving transformations  Notion of quasi-invariant locations  They mention the following transformations (non-exhaustive)  Arithmetic simplification  Indirect memory reference simplification  Data movement simplification  Dead code elimination  Control transfer simplification

Trace Simplification

Approach

Control Flow Graph Construction  Final step  Use the simplified trace T from the previous step  sequence of instructions -> sequence of basic blocks  Follow their given algorithm to add edges to graph G connecting blocks  Difficulties to handle  Code reuse causes lots of tangled connections  Not violating graph structural constraints  Dead code elimination causes algorithm to create unnecessary blocks

Control Flow Graph Construction

Evaluation  Prototype implementation  Evaluate by using edit distance algorithm to compare CFG’s  Emulation-based  Try to deobfuscate code obfuscated by four commercial tools  Single and multi-level emulation  Average similarity %84.55 for single, %83.85 for multi  Return oriented  2 sets of test programs  Simple synthetic programs, ie. factorial, bubble-sort  Average similarity %84.16  Compared results with results from Coogan’s tool  30 to 60 percent higher similarity

Evaluation

Author Mentioned Limitations  Doesn’t handle a program attempting to thwart analysis  Code coverage could be an issue  Given assumption about attacker, three ways one could attempt to weaken effectiveness  don’t obfuscate i/o, but entwine obfuscated code with i/o more deeply  add in noisy i/o operations  hide some computations

My Criticisms/Opinion  Overall, I thought the paper was good  For a complicated topic, was less of a struggle than I expected  Some reasonings/explanations felt forced though  For example, quasi-invariant locations  How steps of approach link together  Claim generic approach, but test only two obfuscation techniques  Zero to limited discussion of how CFG dissimilarity would affect potential understanding of program  What’s an acceptable threshold?  IE is a %70 similar CFG really helpful?

Questions 1. What two obfuscation techniques do the authors focus on? 2. For taint propagation, what initial values are “tainted” and how do tainted values spread? 3. What was the primary limitation of existing solutions that motivated this paper?

Questions 1. What two obfuscation techniques do the authors focus on? Emulation-based and Return-Oriented Programming 2. For taint propagation, what initial values are “tainted” and how do tainted values spread? 3. What was the primary limitation of existing solutions that motivated this paper?

Questions 1. What two obfuscation techniques do the authors focus on? Emulation-based and Return-Oriented Programming 2. For taint propagation, what initial values are “tainted” and how do tainted values spread? I.E. Input values: from command line, written from library rountine Any computations involving a tainted value becomes tainted 3. What was the primary limitation of existing solutions that motivated this paper?

Questions 1. What two obfuscation techniques do the authors focus on? 2. For taint propagation, what initial values are “tainted” and how do tainted values spread? 3. What was the primary limitation of existing solutions that motivated this paper?

Thank you!