1 Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation David Brumley, Juan.

Presentation transcript:

1 Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation David Brumley, Juan Caballero, Zhenkai Liang, James Newsome, and Dawn Song Carnegie Mellon University

2 Introduction Many different implementations usually exist for the same protocol –HTTP Servers: Apache, Miniweb, … Deviation — difference in how two implementations of the same protocol interpret the same input Deviations are often results of –Implementation errors –Different interpretations of the same protocol specification

3 Importance of Deviations Security applications of deviations Error detection –Deviations are good candidates for errors –No need for a complex protocol model Fingerprint generation –Inputs that trigger a deviation are natural fingerprints –Automatic fingerprint generation is important for fingerprinting tools

4 Problem Definition: Deviation Detection We focus on behavior-related deviations rather than minor output details –HTTP Status 200 vs. Status 404 We view a program as a function from the input space $I$ to the protocol state space $S$: $P : I \to S$ –Apache maps “GET /index.html” to Status 200 Given two programs $P_A$ and $P_M$ implementing the same protocol, it is easy to find an input $i$ such that $P_A(i) = P_M(i) = s$ Our goal: automatically generate an input $j$ such that $P_A(j) \neq P_M(j)$
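The definition above can be made concrete with a minimal Python sketch. The two "servers" below are hypothetical toy functions invented for illustration (they are not Apache or Miniweb), and the choice of 400 as the rejection state is an assumption. Each maps a request to a protocol state, and a deviation is simply an input on which the two states differ.

```python
from typing import Callable

# A protocol state is modelled as an HTTP status code (an assumption for this sketch).
State = int
Program = Callable[[bytes], State]


def toy_server_a(request: bytes) -> State:
    """Hypothetical strict server: the URI must start with '/'."""
    if request[:4] == b"GET " and request[4:5] == b"/":
        return 200
    return 400


def toy_server_m(request: bytes) -> State:
    """Hypothetical lenient server: the URI may start with any non-NUL byte."""
    if request[:4] == b"GET " and len(request) > 4 and request[4] != 0:
        return 200
    return 400


def is_deviation(p_a: Program, p_m: Program, request: bytes) -> bool:
    """True when the two programs reach different protocol states on the same input."""
    return p_a(request) != p_m(request)
```

With these toys, `GET /index.html` plays the role of the easy input $i$ (both reach 200), while `GET %index.html` plays the role of the sought input $j$.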

5 Problem Setting [Figure: server A and server M] Are there deviations between server A and server M? If yes, how do we find inputs that demonstrate them?

6 Naïve Solution: Random Testing [Figure: possible HTTP queries; servers A and M; Status 200]

7 Inferring Inputs [Figure: possible HTTP queries; servers A and M; symbolic input; Status 200] $(I_A \cup I_M) - (I_A \cap I_M)$

8 Our Approach INPUT: two implementations $P_A$ and $P_M$ of the same protocol 1. Create formula $f_A$ modeling how $P_A$ interprets a symbolic input, and formula $f_M$ modeling how $P_M$ interprets the same input –Symbolic formula: predicate over symbolic inputs 2. Use $f_A$ and $f_M$ to infer $(I_A \cup I_M) - (I_A \cap I_M)$ –Generate candidate deviation inputs 3. Validate candidate deviation inputs OUTPUT: generated list of inputs that make $P_A$ and $P_M$ reach different protocol states
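The set expression in step 2 is a symmetric difference. The sketch below illustrates it on a deliberately tiny input space, assuming that $I_A$ and $I_M$ denote the sets of inputs that each implementation maps to the same protocol state. Here an "input" is reduced to a single byte, and the two membership rules are made up for illustration.

```python
# Tiny input space: one byte, standing in for the first URI byte of a request.
ALL_BYTES = range(256)
SLASH = ord("/")

# Hypothetical acceptance sets for the two implementations.
I_A = {b for b in ALL_BYTES if b == SLASH}  # A accepts only '/'
I_M = {b for b in ALL_BYTES if b != 0}      # M accepts any non-NUL byte

# Inputs accepted by exactly one implementation: (I_A ∪ I_M) − (I_A ∩ I_M).
deviation_region = (I_A | I_M) - (I_A & I_M)

# Python's ^ operator on sets computes the same symmetric difference.
assert deviation_region == I_A ^ I_M
```

Enumerating the sets only works because this space has 256 elements. For real protocol inputs the sets cannot be listed, which is why the approach describes them with formulas instead.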

9 Contributions 1. A novel approach for automatically discovering deviations in binaries of a protocol –Build symbolic formulas to compare two implementations Benefits: –Faithful to implementations –No source code needed –Efficient 2. Two applications of deviations –Error detection –Fingerprint generation 3. Found errors and fingerprints in real programs

10 Talk Outline Introduction Approach Overview Evaluation Related Work Summary

11 Approach Overview 1. Formula Extraction 2. Deviation Detection 3. Validation [Figure: A and M, then symbolic formulas, then candidate deviation inputs $(I_A \cup I_M) - (I_A \cap I_M)$, then deviation inputs]

12 Key Concepts Key idea: Use a symbolic formula f to represent how a program P interprets a symbolic input i Recall: A program P is a function from the input space to the protocol state space A symbolic formula f is a predicate on symbolic inputs. –Formula f represents the inputs that make program P reach protocol state s

13 Key Concepts (Cont.) Formula f can be generated by calculating the weakest precondition from P and s To keep the formula size reasonable, our current approach generates formulas over a single program path
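For readers unfamiliar with weakest preconditions, the following is a short worked derivation using the standard Dijkstra-style rules. The slides do not spell these rules out, so this is an illustration, not the authors' exact formulation. Modelling the branch taken on the single path as an `assert` is an assumption, and the variables $x$, $y$ are hypothetical.

```latex
% Standard rules (Q is the postcondition):
\begin{align*}
wp(x := e,\; Q) &= Q[e/x] \\
wp(\mathtt{assert}\ c,\; Q) &= c \land Q \\
wp(s_1 ; s_2,\; Q) &= wp(s_1,\; wp(s_2,\; Q))
\end{align*}

% Worked example: a single path that subtracts 47 (ASCII '/') and requires zero.
\begin{align*}
wp(y := x - 47 ;\ \mathtt{assert}\ y = 0,\ \mathit{true})
  &= wp(y := x - 47,\ (y = 0) \land \mathit{true}) \\
  &= (x - 47 = 0) \\
  &= (x = 47)
\end{align*}
```

The result is a predicate over the input alone, which is the sense in which a formula f characterises the inputs that drive the program down that path.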

14 Step 1: Formula Extraction
Input to server A: GET /index.html
x86 instructions: MOV AL, [ECX]; SUB AL, '/'; JZ NEXT; ...
Intermediate Language (ILA): AL = INPUT[4]; AL = AL - '/'; ZF = (AL == 0); IF (ZF == 1) THEN JMP(NEXT)
Path condition: ZF == 1
Symbolic formula: $f_A$(INPUT) = (INPUT[4] == '/')
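One way to see how the formula falls out of the intermediate-language statements is to run a backward weakest-precondition pass over them. The sketch below does this semantically, with Python closures instead of syntactic formulas. It is a hypothetical illustration, not the BitBlaze implementation: the tuple encoding of statements, the 8-bit wrap-around on `AL`, and the treatment of the taken branch as an `assert` are all assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Pred = Callable[[State], bool]
Stmt = Tuple[Any, ...]  # ("assign", var, expr) or ("assert", cond)


def wp_assign(var: str, expr: Callable[[State], Any], post: Pred) -> Pred:
    """wp(var := expr, post): evaluate post in the state updated with var."""
    return lambda s: post({**s, var: expr(s)})


def wp_assert(cond: Pred, post: Pred) -> Pred:
    """wp(assert cond, post) = cond and post."""
    return lambda s: cond(s) and post(s)


def wp(trace: List[Stmt], post: Pred) -> Pred:
    """Fold the rules over the trace from the last statement to the first."""
    for stmt in reversed(trace):
        if stmt[0] == "assign":
            post = wp_assign(stmt[1], stmt[2], post)
        else:
            post = wp_assert(stmt[1], post)
    return post


# The single path from the slide, with the taken branch (ZF == 1) as an assert.
TRACE: List[Stmt] = [
    ("assign", "AL", lambda s: s["INPUT"][4]),
    ("assign", "AL", lambda s: (s["AL"] - ord("/")) & 0xFF),  # 8-bit register
    ("assign", "ZF", lambda s: int(s["AL"] == 0)),
    ("assert", lambda s: s["ZF"] == 1),
]

_path_pred = wp(TRACE, lambda s: True)


def f_a(request: bytes) -> bool:
    """Predicate over the input only; equivalent to INPUT[4] == '/'."""
    return _path_pred({"INPUT": request})
```

Evaluating `f_a(b"GET /index.html")` follows the path and returns `True`, while any request whose byte at index 4 is not `/` fails the assert. A real tool would emit a symbolic formula for a solver instead of an executable predicate.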

15 Step 2: Deviation Detection
Formulas from Step 1
–Server A: $f_A$(INPUT) = (INPUT[4] == '/')
–Server M: $f_M$(INPUT) = (INPUT[4] != 0)
Construct queries and solve $f_A \land \lnot f_M$ and $\lnot f_A \land f_M$
–Candidate deviation inputs ($I_M - I_A$, from $\lnot f_A \land f_M$): GET %index.html, GET Aindex.html, ...
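The two queries can be tried out with a small brute-force search standing in for a decision procedure. This is only a sketch: the talk uses the STP solver, whereas the code below enumerates the 256 possible values of the byte at index 4 of a template request. The helper name `solve` and the template are assumptions for illustration.

```python
from typing import Callable, List


def f_a(request: bytes) -> bool:
    """Server A's single-path formula from the slide."""
    return request[4] == ord("/")


def f_m(request: bytes) -> bool:
    """Server M's single-path formula from the slide."""
    return request[4] != 0


def solve(query: Callable[[bytes], bool],
          template: bytes = b"GET /index.html",
          pos: int = 4) -> List[bytes]:
    """Return every variant of `template` (byte `pos` varied) that satisfies `query`."""
    variants = (template[:pos] + bytes([b]) + template[pos + 1:] for b in range(256))
    return [v for v in variants if query(v)]


# Query 1: accepted by A's formula but not by M's.
a_not_m = solve(lambda r: f_a(r) and not f_m(r))
# Query 2: accepted by M's formula but not by A's.
m_not_a = solve(lambda r: not f_a(r) and f_m(r))
```

The first query has no solution because `/` is itself a non-NUL byte. The second yields 254 candidates, among them `GET %index.html` and `GET Aindex.html`.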

16 Step 3: Validation Problem: Multiple paths to a protocol state –Our formula is based on a single path –Candidate deviation inputs may not lead to deviations Solution: Validate candidate deviation inputs –Send candidate deviation inputs to both implementations –Compare resulting protocol states Deviation inputs GET %index.html, GET Aindex.html, …
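A sketch of why validation matters: in the toy setup below, server A has a second accepting path that the single-path formula does not cover, so one candidate from Step 2 turns out not to be a deviation. Both servers and the backslash path are hypothetical, and a real validator would send the candidates over the network instead of calling Python functions.

```python
from typing import Callable, Iterable, List


def toy_server_a(request: bytes) -> int:
    """Hypothetical server A with two paths to state 200."""
    if request[:4] != b"GET ":
        return 400
    if request[4:5] == b"/":   # path the formula was extracted from
        return 200
    if request[4:5] == b"\\":  # second path, not covered by the formula
        return 200
    return 400


def toy_server_m(request: bytes) -> int:
    """Hypothetical server M: any non-NUL first URI byte is accepted."""
    if request[:4] == b"GET " and len(request) > 4 and request[4] != 0:
        return 200
    return 400


def validate(candidates: Iterable[bytes],
             p_a: Callable[[bytes], int],
             p_m: Callable[[bytes], int]) -> List[bytes]:
    """Keep only the candidates on which the two implementations really differ."""
    return [c for c in candidates if p_a(c) != p_m(c)]


candidates = [b"GET %index.html", b"GET \\index.html", b"GET Aindex.html"]
deviations = validate(candidates, toy_server_a, toy_server_m)
```

The backslash request satisfies the Step 2 query but reaches state 200 on both servers, so it is filtered out, and only the `%` and `A` requests remain.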

17 Talk Outline Introduction Approach Overview Evaluation Related Work Summary

18 Evaluation Overview Implementation –BitBlaze binary analysis platform –Solver: STP (decision procedure) –Supports Windows and Linux binaries Evaluated text and binary protocols –Text-based protocol: HTTP »Apache 2.2.4, Miniweb 0.8.1, Savant 3.1 –Binary-based protocol: NTP »NetTime 2.0b7, NTPD

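19 Evaluation: HTTP
Input: Request for homepage, GET /index.html
| Query | Step 2: Detection | Step 3: Validation |
|---|---|---|
| $f_{Apache} \land \lnot f_{Miniweb}$ | No candidate | |
| $f_{Apache} \land \lnot f_{Savant}$ | Candidate | No deviation |
| $f_{Miniweb} \land \lnot f_{Apache}$ | Candidate | Deviation |
| $f_{Miniweb} \land \lnot f_{Savant}$ | Candidate | Deviation |
| $f_{Savant} \land \lnot f_{Apache}$ | No candidate | |
| $f_{Savant} \land \lnot f_{Miniweb}$ | No candidate | |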

20 Performance
Symbolic formula:
| Program | Time |
|---|---|
| Apache | 39.5s |
| Miniweb | 20.5s |
| Savant | 21.5s |
| NTPD | 5.37s |
| NetTime | 5.05s |
Candidate deviation inputs:
| Pair | Time |
|---|---|
| Apache & Miniweb | 21.3s |
| Apache & Savant | 11.8s |
| Savant & Miniweb | 9.0s |
| NetTime & NTPD | 0.56s |
NTP: 6 seconds to detect deviation
HTTP: 1 minute to detect deviation

21 Future Work Explore different program paths –Rudder: automatic dynamic path exploration Create multi-path formulas –The weakest precondition algorithm used in our approach can handle multiple program paths Details at

22 Related Work Symbolic execution [King76] and weakest precondition [Dijkstra76, Cohen90, Brumley07] Fuzz testing [Kaksonen01,Marquis05,Oehlert05,Xiao03] –Random and semi-random input generation –No deep analysis on how an input is used Implementation error detection –Static source code analysis [Chen02, Udrea06] and Model checking [Chaki03, Musuvathi02, Musuvathi04] »Need manually defined models Protocol fingerprint generation –Manual fingerprint generation [Comer94, Paxson97] »Need manual analysis –Automatic fingerprint generation [Caballero07] »Need semi-random input selection

23 Summary A novel approach for automatically discovering deviations in binaries –Use symbolic formulas to represent how a program interprets inputs –Solve formulas to compare two implementations –Validate generated inputs Applications of deviations –Error detection –Fingerprint generation

24 Thank you! For more information and related projects: Visit