iBinHunt: Binary Hunting with Inter-Procedural Control Flow Jiang Ming, Meng Pan, and Debin Gao College of Information Sciences and Technology, Penn State.

Presentation transcript:

iBinHunt: Binary Hunting with Inter-Procedural Control Flow
Jiang Ming, Meng Pan, and Debin Gao
College of Information Sciences and Technology, Penn State University
D’Crypt Pte Ltd
School of Information Systems, Singapore Management University

Introduction
Binary hunting: automatically finding semantic differences in binary programs
Need to capture semantic differences
– Differences in functionality (input-output behavior)
Syntactic differences cause false positives
– Differences in instructions
– Register allocation
– Basic-block reordering
– Variable renaming
– …

An example: gzip
Different instructions in two versions, but with the same semantics
A patch with 5 lines of code
All 75 non-empty functions are changed
Example instruction pair: xor eax, eax / and ebx, 0
1 Gzip Long File Name Buffer Overflow Vulnerability

Importance of Binary Hunting
Security applications of binary hunting
Finding security vulnerabilities with patched binaries
– “BinHunt: Automatically finding semantic differences in binary programs”, ICICS 2008
Automatic patch-based exploit (1-day exploit) generation
– “Automatic Patch-Based Exploit Generation is Possible”, IEEE S&P 2008
Software plagiarism detection
– “GPLAG: detection of software plagiarism by program dependence graph analysis”, KDD 2006
Adapting trained anomaly detectors to software patches
– “Automatically adapting a trained anomaly detector to software patches”, RAID 2009
Malware analysis
– “Polymorphic worm detection using structural information of executables”, RAID 2005
– “Large-scale malware indexing using function-call graphs”, CCS 2009
…

Challenge
Source code of binary files is not available
Function names extracted from these binary files are unreliable
A variety of obfuscation techniques
…
Latest solutions -- find similarities/differences in control flow structure rather than in binary instructions
– Resistant to “superficial” changes
– Examples: BinDiff, BinHunt, DarunGrim, SMIT

Intra-procedural control flow vs. Inter-procedural control flow
Intra-procedural control flow
– Most previous work focuses on intra-procedural control flow.
– The sub-graph isomorphism problem is NP-complete.
– Example: 96% of the non-empty functions of thttpd have fewer than 30 basic blocks.
– Graph isomorphism is practical for analyzing intra-procedural control flow.
Inter-procedural control flow
– No function boundaries
– Huge graph with a large number of nodes, where graph isomorphism is impractical
– Example: thttpd-2.25 has more than 4,300 basic blocks in total. More than 4,000 candidate matchings for a single basic block.

Function Transformation Obfuscation
Function transformation obfuscation is well studied
– Inlining functions
– Outlining functions
– Cloning functions
– Interleaving functions
Performing such obfuscation is simple and requires no intensive analysis of the binaries.
[Figure: Inlining and outlining transformations 1]
1 C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Sciences, The University of Auckland, July

Advanced control flow obfuscation
Control flow flattening
– “Protection of software-based survivability mechanisms”, DSN 2001
– “An Approach to the Obfuscation of Control-Flow of Sequential Computer Programs”, ISC 2001
Redirecting control flow with exceptions
– “Binary Obfuscation Using Signals”, USENIX Security 2007
– “binOb+: a framework for potent and stealthy binary obfuscation”, AsiaCCS 2010
Function boundary information (intra-procedural control flow) is not reliable!

Overview of iBinHunt
iBinHunt: Binary Diffing with Inter-Procedural Control Flow Graphs
iBinHunt provides practical solutions to the large number of basic block matchings
– Dynamic tainting: monitor the execution of the two binary programs under a common input and use taint analysis to record all basic blocks involved in processing the input.
– Deep taint: assign different taint tags to various parts of the input; only basic blocks from the two binary programs that are marked with the same taint tags are considered matching candidates (a reduction factor of up to 74%).
– Basic block comparison: symbolic execution is first used to represent the outputs of the basic blocks in terms of their input symbols, and a theorem prover is then used to check whether the outputs of the two basic blocks are semantically equivalent.
– Automatic input generation: increases the coverage of tainted basic blocks by automatically generating inputs that result in different execution traces.
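
The deep taint step can be illustrated with a minimal Python sketch. It is not iBinHunt's implementation: the block ids, the tag names, and the `candidate_pairs` helper are hypothetical, and the traces are hand-written rather than recorded from a real run. It only shows how requiring identical taint-tag sets shrinks the set of candidate matchings.

```python
from collections import defaultdict

def candidate_pairs(tags_a, tags_b):
    """Pair basic blocks from two traces that carry identical taint-tag sets.

    tags_a / tags_b map a (hypothetical) block id to the set of taint tags
    observed in that block during execution under a common input.
    """
    by_tags = defaultdict(list)
    for block, tags in tags_b.items():
        by_tags[frozenset(tags)].append(block)

    pairs = []
    for block, tags in tags_a.items():
        if not tags:
            continue  # untainted blocks are not candidates in this sketch
        for other in by_tags.get(frozenset(tags), []):
            pairs.append((block, other))
    return pairs

# Hypothetical traces: each tag names the part of the input that tainted the block.
old_trace = {"A1": {"method"}, "A2": {"url"}, "A3": {"url", "version"}, "A4": set()}
new_trace = {"B1": {"method"}, "B2": {"url"}, "B3": {"version"}, "B4": {"url", "version"}}

pairs = candidate_pairs(old_trace, new_trace)
all_pairs = len(old_trace) * len(new_trace)
print(pairs)
print(f"{len(pairs)} of {all_pairs} possible pairs kept as candidates")
```

Only the surviving pairs would then be handed to the more expensive semantic comparison.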

Deep taint for basic block comparison
[Figure labels: Inter-Procedural Control Flow Graphs; Deep taint execution trace; Deep Taint; Basic block comparison]

An example: thttpd
[Figure: input and its taint tag colors; dynamic execution traces with deep taint]

Basic block comparison
Symbolic execution and theorem proving
– Use symbolic execution to represent the final values of the outputs (registers and variables)
– Use a theorem prover to test whether the outputs of two basic blocks are always the same given the same inputs
Context aware
– The permutation of the outputs of equivalent basic blocks is the permutation of the inputs of their successor blocks.
Obtain the matching strength based on the result from the theorem prover
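
The equivalence check can be sketched in Python as follows. This is an illustration under loud assumptions: an exhaustive search over 8-bit register values stands in for symbolic execution plus a theorem prover, the three blocks are hand-modelled as Python functions, and `find_register_mapping` is a hypothetical helper. The register mapping it searches for plays the role of the input/output permutation mentioned above.

```python
from itertools import permutations, product

MASK = 0xFF  # 8-bit registers keep the exhaustive search small

def block_old(r):
    # xor eax, eax ; add ebx, 1
    return {"eax": r["eax"] ^ r["eax"], "ebx": (r["ebx"] + 1) & MASK}

def block_new(r):
    # and ecx, 0 ; inc edx  (same effect, different instructions and registers)
    return {"ecx": r["ecx"] & 0, "edx": (r["edx"] + 1) & MASK}

def block_changed(r):
    # xor ecx, ecx ; add edx, 2  (a genuine semantic difference)
    return {"ecx": 0, "edx": (r["edx"] + 2) & MASK}

def find_register_mapping(f, regs_f, g, regs_g):
    """Return a register mapping under which f and g agree on every input, else None."""
    for perm in permutations(regs_g):
        mapping = dict(zip(regs_f, perm))

        def agree(values):
            in_f = dict(zip(regs_f, values))
            in_g = {mapping[reg]: val for reg, val in in_f.items()}
            out_f, out_g = f(in_f), g(in_g)
            return all(out_f[reg] == out_g[mapping[reg]] for reg in regs_f)

        if all(agree(v) for v in product(range(MASK + 1), repeat=len(regs_f))):
            return mapping
    return None

same = find_register_mapping(block_old, ("eax", "ebx"), block_new, ("ecx", "edx"))
diff = find_register_mapping(block_old, ("eax", "ebx"), block_changed, ("ecx", "edx"))
print(same)
print(diff)
```

A real tool would ask a solver whether any input makes the outputs differ instead of enumerating values, which is what makes the check feasible for 32-bit registers and memory.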

Basic block matching
We need to consider two other groups of blocks when finding matched blocks.
Blocks that are not semantically equivalent but have the same taint tags
– They could very likely be the differences between the two programs that iBinHunt is trying to locate.
– E.g., BB_13232 and BB_16184 are the location of a binary difference.
Blocks that are not tainted but are on the dynamic execution trace
– Due to various reasons, including limitations of taint analysis, not directly processing program inputs (e.g., signal processing), etc.

Matching Strength
Basic blocks B1 and B2 are considered matched to one another if
– B1 and B2 have the same taint tags (possibly non-tainted) and B1 and B2 are semantically equivalent (evaluated by symbolic execution and theorem proving); or
– a predecessor of B1 and a predecessor of B2 match; or
– a successor of B1 and a successor of B2 match.
[Figure labels: predecessor; successor]
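
The matching rule can be sketched as a small classifier in Python. The label names, the block ids, and the `classify` helper are assumptions made for illustration, and the set of semantically equivalent pairs is supplied by hand instead of coming from a prover.

```python
def neighbour_match(a, b, edges_a, edges_b, matched):
    """True if some neighbour of a and some neighbour of b are already matched."""
    return any((na, nb) in matched
               for na in edges_a.get(a, ())
               for nb in edges_b.get(b, ()))

def classify(candidates, equivalent, preds_a, preds_b, succs_a, succs_b):
    """Label each candidate pair by how strongly it is supported (hypothetical labels)."""
    result = {}
    for a, b in candidates:
        if (a, b) in equivalent:
            result[(a, b)] = "equivalent"
            continue
        pred = neighbour_match(a, b, preds_a, preds_b, equivalent)
        succ = neighbour_match(a, b, succs_a, succs_b, equivalent)
        if pred and succ:
            result[(a, b)] = "both neighbours matched"
        elif pred or succ:
            result[(a, b)] = "one neighbour matched"
        else:
            result[(a, b)] = "unmatched"
    return result

# Two hypothetical straight-line traces: A1 -> A2 -> A3 -> A4 and B1 -> B2 -> B3 -> B4.
succs_a = {"A1": ["A2"], "A2": ["A3"], "A3": ["A4"]}
succs_b = {"B1": ["B2"], "B2": ["B3"], "B3": ["B4"]}
preds_a = {"A2": ["A1"], "A3": ["A2"], "A4": ["A3"]}
preds_b = {"B2": ["B1"], "B3": ["B2"], "B4": ["B3"]}

equivalent = {("A1", "B1"), ("A3", "B3")}  # assumed output of the equivalence check
candidates = [("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")]

labels = classify(candidates, equivalent, preds_a, preds_b, succs_a, succs_b)
for pair, label in labels.items():
    print(pair, label)
```

Pairs supported only by their neighbours, such as ("A2", "B2") here, are the ones worth inspecting as possible locations of a semantic difference.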

Automatic Input Generation
[Figure: Initial Input (GET index.html HTTP/1.1 Host:.) → Concrete Execution / Symbolic Execution → Symbolic Formula → Constraint Solver (STP) → New Input]
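
The input generation loop can be approximated by the toy Python sketch below. Everything in it is an assumption for illustration: the two-branch "program", the two-byte inputs, and a brute-force `solve` that stands in for a constraint solver such as STP. It shows the general idea of flipping one branch decision of a recorded path and asking for an input that follows the new path.

```python
from itertools import product

# Toy program: two nested branch conditions over a 2-byte input (hypothetical).
BRANCHES = [
    lambda x: x[0] == ord("G"),   # e.g. does the request start with 'G'?
    lambda x: x[1] > 0x7F,        # only reached when the first branch is taken
]

def run(inp):
    """Concrete execution: return the sequence of branch decisions for inp."""
    path = []
    for cond in BRANCHES:
        taken = cond(inp)
        path.append(taken)
        if not taken:
            break
    return tuple(path)

def solve(constraints):
    """Brute-force stand-in for a constraint solver: find a satisfying 2-byte input."""
    for candidate in product(range(256), repeat=2):
        if all(cond(candidate) == want for cond, want in constraints):
            return bytes(candidate)
    return None

def explore(seed):
    """Generate inputs until no new execution path is found."""
    seen, work = {}, [seed]
    while work:
        inp = work.pop()
        path = run(inp)
        if path in seen:
            continue
        seen[path] = inp
        for i in range(len(path)):
            # Keep the decisions before branch i, flip the decision at branch i.
            flipped = list(zip(BRANCHES, path[:i])) + [(BRANCHES[i], not path[i])]
            new_input = solve(flipped)
            if new_input is not None:
                work.append(new_input)
    return seen

paths = explore(b"GE")
for path, inp in paths.items():
    print(path, inp)
```

Each newly discovered path corresponds to a different execution trace, which is how additional basic blocks become tainted and available for matching.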

Evaluation
We applied iBinHunt to find semantic differences in several versions of thttpd and gzip. We evaluate two main aspects:
– Efficiency: how many basic blocks can be matched under our definition of matching strength, how many matchings are identified by deep taint, and how long it takes to find these matchings.
– Accuracy: confirm these differences by comparing them to the ground truth (program source code).
Table: different versions of thttpd and gzip (number of lines changed / total number of lines)
thttpd c / / / /7271 gzip / / /4841

Matching basic blocks
We evaluate:
– Matched basic blocks that are semantically the same;
– Matched basic blocks that are not semantically equivalent but have both a predecessor and a successor matched;
– Basic blocks that are not semantically equivalent but have either a predecessor or a successor matched;
– The time taken by input generation and deep taint.

Effectiveness of deep taint
Results show that more than 34% and 67% of the matched basic blocks in thttpd and gzip, respectively, contain the same taint tags.
– A large number of these matchings do contain the same taint tags;
– Even though many basic blocks are not tainted by our limited number of program inputs, their neighbors are tainted in most cases, and the tainted neighbors help matchings to be identified.
Table: percentage of matched basic blocks with the same taint representation
thttpd c %38.2%39.9%37.4% gzip %72.2%72.6%

Accuracy
BB_1371 from thttpd-2.19 should match with BB_1689 in thttpd-2.25, both of which deal with the “-i” argument. However, BB_1687 in thttpd-2.25 also contains the same (type of) instructions, which confuses the binary diffing tool in the matching.

Discussions
Limitations
– The power of iBinHunt is limited by imperfect basic block coverage.
– In our experiments with thttpd and gzip, some basic blocks are not covered even if we continue to generate new program inputs.
– Performance
Future work
– Further optimization of the code to improve efficiency.
– Parallelizing dynamic taint tracking.
– More in-depth binary difference analysis, in which (parts of) the programs are semantically equivalent only on a certain subset of the inputs.

Conclusion
Introduce function obfuscation attacks on existing binary diffing tools that analyze the intra-procedural control flow of programs.
Propose a novel binary diffing tool called iBinHunt, which analyzes the inter-procedural control flow.
iBinHunt makes use of a novel technique called deep taint.