Saumya Debray The University of Arizona Tucson, AZ 85721.

Presentation transcript:


The Problem
• Rapid analysis and understanding of malware code essential for swift response to new threats
  ‒ Malicious software is usually heavily obfuscated against analysis
• Existing approaches to reverse engineering such code are primitive
  ‒ not a lot of high-level tool support
  ‒ requires a lot of manual intervention
  ‒ slow, cumbersome, potentially error-prone
• Delays development of countermeasures

Goals
Develop automated techniques for analysis and reverse engineering of obfuscated binaries
• semantics-based
  ‒ output is functionally equivalent to, but simpler than, the input program
• generality
  ‒ should work on any obfuscation
      • even ones we haven’t thought of yet!
  ‒ should minimize assumptions about obfuscations

Challenges
• can’t make assumptions about obfuscations
  ‒ what do we leverage for deobfuscation?
  ‒ distinguishing code we care about from code we don’t
      • how do we know which instructions we care about?
• scale
  ‒ “needle in haystack”
      • no. of instructions executed increases by roughly 270x (VMProtect) to 4300x (Themida) [Lau 2008]
• anti-analysis defenses
  ‒ runtime unpacking
  ‒ anti-emulation, anti-debug checks

Our Approach
• no obfuscation-specific assumptions
  ‒ treat programs as input-to-output transformations
  ‒ use semantics-preserving transformations to simplify execution traces
• dynamic analysis to handle runtime unpacking
[Diagram: input program → taint analysis (bit-level): map flow of values from input to output → semantics-preserving transformations: simplify logic of input-to-output transformation → control flow reconstruction: reconstruct logic of simplified computation → control flow graph]
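As a rough illustration of the "map flow of values from input to output" step, the sketch below runs forward taint propagation over a toy execution trace and keeps only the instructions that consume input-derived values. The trace format, the location names, and the function `tainted_instructions` are hypothetical. The approach in the talk works at the bit level on real machine traces, which this location-level sketch does not attempt.

```python
from typing import List, Set, Tuple

# Hypothetical trace entry: (destination location, source locations).
Instr = Tuple[str, Tuple[str, ...]]

def tainted_instructions(trace: List[Instr], inputs: Set[str]) -> List[int]:
    """Return indices of trace entries that read input-derived data."""
    tainted = set(inputs)
    kept = []
    for i, (dst, srcs) in enumerate(trace):
        if any(s in tainted for s in srcs):
            tainted.add(dst)      # destination now depends on the input
            kept.append(i)
        else:
            tainted.discard(dst)  # overwritten with input-independent data
    return kept

trace = [
    ("r1", ("in",)),        # r1 = in          (payload)
    ("r9", ("vpc",)),       # r9 = vpc         (emulator bookkeeping)
    ("r2", ("r1", "r1")),   # r2 = r1 * r1     (payload)
    ("vpc", ("vpc",)),      # vpc = vpc + 1    (emulator bookkeeping)
    ("out", ("r2",)),       # out = r2         (payload)
]
print(tainted_instructions(trace, {"in"}))  # [0, 2, 4]
```

In this toy trace the two bookkeeping entries never touch input-derived data, so they drop out without any knowledge of how the emulator is built.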

Ex 1: Emulation-based Obfuscation
• examination of the code reveals only the emulator’s logic
  ‒ actual program logic embedded in byte code
• lots of “chaff” during execution
  ‒ separating emulator logic from payload logic tricky
• emulators can be nested
[Diagram: Obfuscator. Labels: input program, random seed, mutation engine, bytecode logic (data), emulator (code)]
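A toy example may help show why inspecting the code reveals only the emulator. In the sketch below, the executable logic is a generic fetch-decode-execute loop, while the payload (a factorial) exists only as data. The opcode set, register names, and `emulate` function are invented for illustration and do not correspond to any real obfuscator such as Themida. The sketch assumes a non-negative input.

```python
# Hypothetical opcodes for a toy register-based virtual machine.
LOADI, MUL, SUBI, JZ, JMP, HALT = range(6)

def emulate(bytecode, n):
    """Fetch-decode-execute loop: the only logic visible as code."""
    regs = {"acc": 0, "n": n}
    vpc = 0  # virtual program counter
    while True:
        op, a, b = bytecode[vpc]
        vpc += 1
        if op == LOADI:
            regs[a] = b
        elif op == MUL:
            regs[a] *= regs[b]
        elif op == SUBI:
            regs[a] -= b
        elif op == JZ:
            if regs[a] == 0:
                vpc = b
        elif op == JMP:
            vpc = b
        elif op == HALT:
            return regs[a]

# The payload (factorial) lives entirely in this data; assumes n >= 0.
FACTORIAL = [
    (LOADI, "acc", 1),    # 0: acc = 1
    (JZ, "n", 5),         # 1: if n == 0 goto 5
    (MUL, "acc", "n"),    # 2: acc *= n
    (SUBI, "n", 1),       # 3: n -= 1
    (JMP, None, 1),       # 4: goto 1
    (HALT, "acc", None),  # 5: return acc
]
print(emulate(FACTORIAL, 5))  # 120
```

Every executed instruction belongs to `emulate`; the factorial structure appears only in how the virtual program counter moves through the data.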

Ex 2: Return-Oriented Programs (ROP)
• Originally designed to bypass anti-code-injection defenses
  ‒ stitches together existing code fragments (“gadgets”), e.g., in system libraries
• Logic can be difficult to discern
  ‒ gadgets are typically scattered across many different functions and/or libraries
  ‒ gadgets can overlap in memory in weird ways
  ‒ control flow structures (if-else, loops, function calls) are typically implemented using non-standard idioms
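To make the "stitching" concrete, here is a toy simulation rather than real exploit code. Gadgets are modeled as Python functions keyed by made-up addresses, and the chain interleaves gadget addresses with data, as a crafted stack would. All addresses, register names, and the `run_chain` helper are hypothetical.

```python
def pop_rax(regs, stack, sp):
    regs["rax"] = stack[sp]      # pop rax
    return sp + 1

def pop_rbx(regs, stack, sp):
    regs["rbx"] = stack[sp]      # pop rbx
    return sp + 1

def add_rax_rbx(regs, stack, sp):
    regs["rax"] += regs["rbx"]   # add rax, rbx
    return sp

# Made-up addresses standing in for fragments scattered across libraries.
GADGETS = {0x1000: pop_rax, 0x2000: pop_rbx, 0x3000: add_rax_rbx}

def run_chain(stack):
    """Simulate ret-driven control flow over a crafted stack."""
    regs = {"rax": 0, "rbx": 0}
    sp = 0
    while sp < len(stack):
        addr = stack[sp]         # each gadget's trailing 'ret' pops the next address
        sp += 1
        sp = GADGETS[addr](regs, stack, sp)
    return regs

chain = [0x1000, 40, 0x2000, 2, 0x3000]
print(run_chain(chain)["rax"])  # 42
```

The computation "40 + 2" is not written anywhere as ordinary code; it exists only as the order of addresses and data on the stack.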

Example 1 (emulation-obfuscation) factorial (Themida)

Example 2 (ROP): factorial
[Figure panels: original, ROP]

Interactions between Obfuscations
Example: Unpacking + Emulation
[Diagram labels: execution trace running from input through unpack to output; instructions “tainted” as propagating values from input to output; input-to-output computation (further simplified); used to construct control flow graph]
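The last step in the diagram, building a control flow graph from the simplified trace, can be sketched in its most naive form: treat each distinct address in the trace as a node and add an edge for every pair of consecutive entries. This is only an illustration under that assumption and is not the construction used in the talk. The function name `cfg_from_trace` and the integer addresses are hypothetical.

```python
from collections import defaultdict

def cfg_from_trace(addresses):
    """Map each address in the trace to its sorted list of observed successors."""
    succs = defaultdict(set)
    for cur, nxt in zip(addresses, addresses[1:]):
        succs[cur].add(nxt)
    # Include every address, even ones with no observed successor.
    return {a: sorted(succs[a]) for a in dict.fromkeys(addresses)}

# A loop body (1, 2) executed twice, then an exit to 3.
trace = [0, 1, 2, 1, 2, 1, 3]
print(cfg_from_trace(trace))  # {0: [1], 1: [2, 3], 2: [1], 3: []}
```

A graph built this way only contains the paths that the trace actually exercised.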

Results
• Ex. 1. binary search: Themida
[Figure panels: original, obfuscated (cropped), deobfuscated]

Results
• Ex. 2. Hunatcha (drive infection code): ExeCryptor
[Figure panels: original, obfuscated (cropped), deobfuscated]

Results
• Ex. 3. Stuxnet (encryption routine): Code Virtualizer
[Figure panels: original, obfuscated (cropped), deobfuscated]

Results
• Ex. 3. fibonacci: ROP
[Figure panels: original, obfuscated, deobfuscated]

Results
• Ex. 4. Win32/Kryptik.OHY: Code Virtualizer
[Figure panels: obfuscated, deobfuscated]
[Figure annotations: multiple layers of runtime code generation; unpacking code; initial unpacker is emulation-obfuscated; the CFG shown materializes incrementally]

Results: CFG Similarity

Lessons and Issues
• Static vs. dynamic analysis
  ‒ multiple layers of runtime code generation/unpacking limit the utility of static analysis
  ‒ dynamic analysis can run into problems of scale
      • O(n^2) algorithms impractical; even O(n log n) can be problematic
      • trade memory space for execution time/complexity
      • code coverage: multi-path exploration?
• Taint propagation
  ‒ byte/word-level analyses may not be precise enough
      • we use (enhanced) bit-level taint propagation
• Simplified trace → CFG: NP-hard
  ‒ semantic considerations?
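The precision gap between byte-level and bit-level taint can be seen on a two-instruction example. The sketch below tracks taint for a single 8-bit value as a mask with one taint bit per data bit, and compares that with a byte-level rule where any tainted bit taints the whole byte. The helper names are hypothetical and the rules cover only AND-with-constant and logical shift right; the "enhanced" propagation mentioned in the talk is not specified here.

```python
def and_const_bit(taint_mask: int, const: int) -> int:
    """Bit-level: result bits where the constant is 0 are fixed, hence untainted."""
    return taint_mask & const

def shr_bit(taint_mask: int, k: int) -> int:
    """Bit-level: taint bits move right together with the data bits."""
    return taint_mask >> k

def and_const_byte(taint_mask: int, const: int) -> int:
    """Byte-level: any tainted source bit taints the entire result byte."""
    return 0xFF if taint_mask else 0x00

def shr_byte(taint_mask: int, k: int) -> int:
    """Byte-level: same coarse rule for shifts."""
    return 0xFF if taint_mask else 0x00

# x is a fully input-derived byte; (x & 0x0F) >> 4 is always 0.
x_taint = 0xFF
bit_result = shr_bit(and_const_bit(x_taint, 0x0F), 4)
byte_result = shr_byte(and_const_byte(x_taint, 0x0F), 4)
print(bit_result, byte_result)  # 0 255
```

The bit-level rules report the result as untainted, which matches the fact that it is the constant 0 for every input. The byte-level rules keep it tainted, so every later instruction that uses it would be retained as part of the input-to-output computation.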

Conclusions
• Rapid analysis and understanding of malware code essential for swift response to new threats
  ‒ need to deal with advanced code obfuscations
  ‒ obfuscation-specific solutions tend to be fragile
• We describe a semantics-based framework for automatic code deobfuscation
  ‒ no assumptions about the obfuscation(s) used
  ‒ promising results on obfuscators (e.g., Themida) not handled by prior research

Semantics-based simplification
• Quasi-invariant locations: locations that have the same value at each use.
• Our transformations (currently):
  ‒ Arithmetic simplification
      • adaptation of constant folding to execution traces
      • consider quasi-invariant locations as constants
      • controlled to avoid over-simplification
  ‒ Data movement simplification
      • use pattern-driven rules to identify and simplify data movement.
  ‒ Dead code elimination
      • need to consider implicit destinations, e.g., condition code flags.
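The arithmetic simplification can be sketched on a toy trace in which each entry records the concrete source values seen at execution time. A location is treated as quasi-invariant if every use observed the same value, and an operation whose sources are all quasi-invariant is replaced by a constant load. The trace layout and the names `quasi_invariants` and `fold` are hypothetical. In particular, excluding input-tainted locations is one assumed way to "control" over-simplification, since within a single trace the input also has the same value at every use. The talk does not say which control is actually used.

```python
import operator
from collections import defaultdict

OPS = {"add": operator.add, "mul": operator.mul, "xor": operator.xor}

def quasi_invariants(trace, tainted):
    """Untainted locations whose observed value is the same at every use."""
    seen = defaultdict(set)
    for _dst, _op, srcs, vals in trace:
        for s, v in zip(srcs, vals):
            seen[s].add(v)
    return {loc: next(iter(vs)) for loc, vs in seen.items()
            if len(vs) == 1 and loc not in tainted}

def fold(trace, tainted):
    """Replace binary operations on quasi-invariant sources with a constant load."""
    consts = quasi_invariants(trace, tainted)
    out = []
    for dst, op, srcs, vals in trace:
        if all(s in consts for s in srcs):
            a, b = vals
            out.append((dst, "const", (), (OPS[op](a, b),)))
        else:
            out.append((dst, op, srcs, vals))
    return out

# Entry layout (assumed): (dst, op, source locations, source values at use).
trace = [
    ("k", "xor", ("key", "mask"), (0x5A, 0x0F)),  # k = key ^ mask = 0x55
    ("t", "add", ("in", "k"), (7, 0x55)),         # t = in + k
    ("u", "mul", ("t", "k"), (92, 0x55)),         # u = t * k
]
folded = fold(trace, tainted={"in", "t", "u"})
print(folded[0])  # ('k', 'const', (), (85,))
```

Only the first entry is folded. Calling `fold(trace, tainted=set())` instead would also fold the two input-dependent entries into constants, which is the kind of over-simplification the slide warns about.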