1 Loop-Extended Symbolic Execution on Binary Programs Pongsin Poosankam ‡* Prateek Saxena * Stephen McCamant * Dawn Song * ‡ Carnegie Mellon University.

Slides:



Advertisements
Similar presentations
Cristian Cadar, Peter Boonstoppel, Dawson Engler RWset: Attacking Path Explosion in Constraint-Based Test Generation TACAS 2008, Budapest, Hungary ETAPS.
Advertisements

Binary Analysis for Botnet Reverse Engineering & Defense Dawn Song UC Berkeley.
Leonardo de Moura Microsoft Research. Z3 is a new solver developed at Microsoft Research. Development/Research driven by internal customers. Free for.
1 Symbolic Execution Kevin Wallace, CSE
Finding bugs: Analysis Techniques & Tools Symbolic Execution & Constraint Solving CS161 Computer Security Cho, Chia Yuan.
Symbolic Execution with Mixed Concrete-Symbolic Solving
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection Tielei Wang 1, Tao Wei 1, Guofei Gu 2, Wei Zou 1 1 Peking.
Symbolic execution © Marcelo d’Amorim 2010.
Overview Motivations Basic static and dynamic optimization methods ADAPT Dynamo.
IBinHunt: Binary Hunting with Inter-Procedural Control Flow Jiang Ming, Meng Pan, and Debin Gao College of Information Sciences and Technology, Penn State.
SMU SRG reading by Tey Chee Meng: Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications by David Brumley, Pongsin Poosankam,
David Brumley, Pongsin Poosankam, Dawn Song and Jiang Zheng Presented by Nimrod Partush.
Bouncer securing software by blocking bad input Miguel Castro Manuel Costa, Lidong Zhou, Lintao Zhang, and Marcus Peinado Microsoft Research.
1 FLAX: Systematic Discovery of Client-Side Validation Vulnerabilities in Rich Web Applications Pongsin Poosankam ‡* Prateek Saxena * Steve Hanna * Dawn.
Linear Obfuscation to Combat Symbolic Execution Zhi Wang 1, Jiang Ming 2, Chunfu Jia 1 and Debin Gao 3 1 Nankai University 2 Pennsylvania State University.
Chapter Four Data Types Pratt 2 Data Objects A run-time grouping of one or more pieces of data in a virtual machine a container for data it can be –system.
1 Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation David Brumley, Juan.
Symmetry-Aware Predicate Abstraction for Shared-Variable Concurrent Programs Alastair Donaldson, Alexander Kaiser, Daniel Kroening, and Thomas Wahl Computer.
Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.
Impeding Malware Analysis Using Conditional Code Obfuscation Paper by: Monirul Sharif, Andrea Lanzi, Jonathon Giffin, and Wenke Lee Conference: Network.
1 Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation David Brumley, Juan.
1 Refining Buffer Overflow Detection via Demand-Driven Path-Sensitive Analysis Wei Le and Mary Lou Soffa University of Virginia sotesty.cs.virginia.edu.
Checking and Inferring Local Non-Aliasing Alex AikenJeffrey S. Foster UC BerkeleyUMD College Park John KodumalTachio Terauchi UC Berkeley.
SOFTWARE SECURITY JORINA VAN MALSEN 1 FLAX: Systematic Discovery of Client-Side Validation Vulnerabilities in Rich Web Applications.
Precise Inter-procedural Analysis Sumit Gulwani George C. Necula using Random Interpretation presented by Kian Win Ong UC Berkeley.
Leveraging User Interactions for In-Depth Testing of Web Applications Sean McAllister, Engin Kirda, and Christopher Kruegel RAID ’08 1 Seoyeon Kang November.
1 RISE: Randomization Techniques for Software Security Dawn Song CMU Joint work with Monica Chew (UC Berkeley)
Scalable and Flexible Static Analysis of Flight-Critical Software Guillaume P. Brat Arnaud J. Venet Carnegie.
This is a work of the U.S. Government and is not subject to copyright protection in the United States. The OWASP Foundation OWASP AppSec DC October 2005.
Alleviating False Alarm Problem of Static Buffer Overflow Analysis Youil Kim
A New Fuzzing Technique for Software Vulnerability Testing IEEE CONSEG 2009 Zhiyong Wu 1 J. William Atwood 2 Xueyong Zhu 3 1,3 Network Information Center.
By David Brumley, James Newsome, Dawn Song and Hao Wang and Somesh Jha.
Dynamic Test Generation To Find Integer Bugs in x86 Binary Linux Programs David Molnar Xue Cong Li David Wagner.
CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Michael Ernst, page 1 Collaborative Learning for Security and Repair in Application Communities Performers: MIT and Determina Michael Ernst MIT Computer.
Fundamental File Structure Concepts & Managing Files of Records
BLENDED ATTACKS EXPLOITS, VULNERABILITIES AND BUFFER-OVERFLOW TECHNIQUES IN COMPUTER VIRUSES By: Eric Chien and Peter Szor Presented by: Jesus Morales.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Automatic Diagnosis and Response to Memory Corruption Vulnerabilities Authors: Jun Xu, Peng Ning, Chongkyung Kil, Yan Zhai, Chris Bookholt In ACM CCS’05.
HAMPI A Solver for String Constraints Vijay Ganesh MIT (With Adam Kiezun, Philip Guo, Pieter Hooimeijer and Mike Ernst)
Automated Whitebox Fuzz Testing (NDSS 2008) Presented by: Edmund Warner University of Central Florida April 7, 2011 David Molnar UC Berkeley
FiG: Automatic Fingerprint Generation Shobha Venkataraman Joint work with Juan Caballero, Pongsin Poosankam, Min Gyung Kang, Dawn Song & Avrim Blum Carnegie.
Computer Science Detecting Memory Access Errors via Illegal Write Monitoring Ongoing Research by Emre Can Sezer.
Automated Whitebox Fuzz Testing Network and Distributed System Security (NDSS) 2008 by Patrice Godefroid, ‏Michael Y. Levin, and ‏David Molnar Present.
Jose Sanchez 1 o Tielei Wang†, TaoWei†, Zhiqiang Lin‡, Wei Zou†. o Purdue University & Peking University o Proceedings of NDSS'09: Network and Distributed.
Automatic Diagnosis and Response to Memory Corruption Vulnerabilities Presenter: Jianyong Dai Jun Xu, Peng Ning, Chongkyung Kil, Yan Zhai, Chris Bookhot.
Deriving Input Syntactic Structure From Execution Zhiqiang Lin Xiangyu Zhang Purdue University November 11 th, 2008 The 16th ACM SIGSOFT International.
jFuzz – Java based Whitebox Fuzzing
EXE: Automatically Generating Inputs of Death Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, Dawson R. Engler 13th ACM conference on.
Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlancTodd.
A Tool for Pro-active Defense Against the Buffer Overrun Attack D. Bruschi, E. Rosti, R. Banfi Presented By: Warshavsky Alex.
A System to Generate Test Data and Symbolically Execute Programs Lori A. Clarke Presented by: Xia Cheng.
Symbolic and Concolic Execution of Programs Information Security, CS 526 Omar Chowdhury 10/7/2015Information Security, CS 5261.
Random Interpretation Sumit Gulwani UC-Berkeley. 1 Program Analysis Applications in all aspects of software development, e.g. Program correctness Compiler.
Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.
Information Security - 2. A Stack Frame. Pushed to stack on function CALL The return address is copied to the CPU Instruction Pointer when the function.
CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Chapter 10 Chapter 10 Implementing Subprograms. Implementing Subprograms  The subprogram call and return operations are together called subprogram linkage.
INTRODUCTION TO COMPUTER PROGRAMMING(IT-303) Basics.
Chapter 2 Build Your First Project A Step-by-Step Approach 2 Exploring Microsoft Visual Basic 6.0 Copyright © 1999 Prentice-Hall, Inc. By Carlotta Eaton.
@Yuan Xue Worm Attack Yuan Xue Fall 2012.
Sabrina Wilkes-Morris CSCE 548 Student Presentation
Path-Based Fault Correlations
Secure Software Development: Theory and Practice
Presented by Mahadevan Vasudevan + Microsoft , *UC-Berkeley
High Coverage Detection of Input-Related Security Faults
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
CSC-682 Advanced Computer Security
CUTE: A Concolic Unit Testing Engine for C
Presentation transcript:

1 Loop-Extended Symbolic Execution on Binary Programs Pongsin Poosankam ‡* Prateek Saxena * Stephen McCamant * Dawn Song * ‡ Carnegie Mellon University * UC Berkeley

2 Dynamic Symbolic Execution Combines concrete execution with symbolic execution Has important applications Program Testing and Analysis –Automatic test case generation (DART, PEX, EXE, KLEE, SAGE) –Given an initial test case, find a variant that executes a different path Computer Security –Vulnerability Discovery & Exploit Generation –Given an initial benign test case, find a variant that triggers a bug –Vulnerability Diagnosis & Signature Generation –Given an initial exploit for a vulnerability, find a set of conditions necessary to trigger it

3 Limitations of Previous Approach Symbolic ExecutionConcrete Execution Program Symbolic Formula Initial Input Single-Path Symbolic Execution (SPSE) Ineffective for loops!

4 Contributions of Our Work Symbolic Single-Path Reasoning Concrete Execution Symbolic Loop Reasoning Loop-Extended Symbolic Execution (LESE) Generalizes symbolic reasoning to loops SPSE LESE Applicable directly to binaries Demonstrate its effectiveness in an important security application Buffer overflow diagnosis & discovery Show scalability for practical real-world examples

5 Talk Outline Motivation Overview of LESE Technical Approach Results and Applications Conclusion & Related Work

6 Motivation: A HTTP Server Example Input void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i++) { URL [i] = input [p++]; } GET /index.html HTTP/1.1 CMD URL VERSION Calculating length Copying URL to buffer

7 Motivation: A HTTP Server Example void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i++) { ASSERT (i < 1024); URL [i] = input [p++]; } Goal: Check if the buffer can be overflowed

8 Motivation: A HTTP Server Example Symbolic Constraints GET /index.html HTTP/1.1 F url [0] ≠ ‘ ’ F url [1] ≠ ‘ ’ F url [2] ≠ ‘ ’ … F url [12]= ‘ ’ (‘/’ != ‘ ’) Concrete Constraints (‘i’ != ‘ ’) (‘n’ != ‘ ’) … (‘ ’ == ‘ ’) ‘i’ not symbolic Vanilla SPSE would try over 1000 tests before exploiting void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i++) { ASSERT (i < 1024); URL [i] = input [p++]; }

9 Talk Outline Motivation Overview of LESE Technical Approach Results and Applications Conclusion & Related Work

10 Overview LESE: Finds an exploit for the example in 1 step Key Point: Summarize loop effects Intuition: Why was ‘i’ not symbolic? –SPSE only tracks data dependencies –‘i’ was loop dependent Model loop-dependencies in addition to data dependencies Symbolic Data Dependencies Concrete Execution Symbolic Loop Dependencies SPSE LESE

11 Practical Requirements Applicable to binary code –Works for closed-sourced security-critical applications Should be simple –Scale to real-world programs Should not require manual function summaries –Need to reason about custom input processing –Hard to write summaries for certain functions (like sprintf)

12 Talk Outline Motivation Overview of LESE Technical Approach Results and Applications Conclusion & Related Work

13 Introduce a symbolic “trip count” for each loop –Symbolic variable representing the number of times a loop executes LESE has 2 steps –STEP 1: Derive relationship between program variables and trip counts »Linear Relationships –STEP 2: Relate trip counts to inputs Approach

14 Introducing Symbolic Trip Counts Introduces symbolic loop trip counts TC L1 TC L2 void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i++) { ASSERT (i < 1024); URL [i] = input [p++]; }

15 void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i ++) { ASSERT (i < 1024); URL [i] = input [p++]; } Step 1: Relating program variables to TCs Links trip counts to program variables ptr = 4 + TC L1 Symbolic Constraints urlLen = 0 + TC L1 i = -1 + TC L2 p = 4 + TC L2 (i < urlLen)

16 Inputs –Initial Concrete Test Case –A Grammar »Fields »Delimiters Implicitly models symbolic attributes for fields –Lengths of fields –Counts of repeated elements Available from off-the-shelf tools –Network application grammars in Wireshark, GAPA –Media file formats in Hachoir, GAPA –Can even be automatically inferred [CCS07,S&P09] Step 2: Relating Trip Counts to Input GET /index.html HTTP/1.1

17 Step 2: Link trip counts to input Link trip counts to the input grammar (F uri [0] ≠ ‘ ’) && (F uri [1] ≠ ‘ ’) && … (F uri [12] == ‘ ’) Len(F URL ) == TC L1 G Symbolic Constraints void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i++) { ASSERT (i < 1024); URL [i] = input [p++]; }

18 (i < urlLen) Solve using a decision procedure Link trip counts to the input grammar Symbolic Constraints ptr = 4 + TC L1 urlLen = 0 + TC L1 i = -1 + TC L1 p = 4 + TC L1 (i < urlLen) Len(F URL ) == TC L1 ASSERT (i >= 1024) SOLVE void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i++) { ASSERT (i < 1024); URL [i] = input [p++]; }

19 Solution: HTTP Server Example Solve constraints Exploit Condition Len(F URL ) > 1024 GET aaa.. (1025 times)… void process_request (char* input) { char URL [1024]; … for (ptr = 4; input [ptr] != ' '; ptr++) urlLen ++; … for (i = 0, p = 4; i < urlLen; i++) { ASSERT (i < 1024); URL [i] = input [p++]; }

20 Challenges Problems: –Identifying loop dependencies on binaries »Syntactic induction variable analysis insufficient –Capturing the inter-dependence between two loops »An induction variable of may influence trip counts of subsequent loops Our Solution –Dynamic abstract interpretation of x86 machine code –Symbolic memory model for buffer pointers only

21 Talk Outline Motivation Overview of LESE Technical Approach Results and Applications Conclusion & Related Work

22 Experimental Setup Program LESE Decision Procedure (STP) Initial Test Case No Error Candidate Exploits Validation

23 Results (I): Vulnerability Discovery On 14 benchmark applications (MIT Lincoln Labs) –Created from historic buffer overflows (BIND, sendmail, wuftp) Found 1 or more vulnerabilities in each benchmark –1 new exploit location in sendmail 7 benchmark

24 Results (II): Real-world Vulnerabilities Diagnosis and Discovery 3 Real-world Case Studies –SQL Server Resolution [Slammer Worm 2003] –GDI Windows Library [MS07-046] –Gaztek HTTP web Server Diagnosis Results –Results precise and field level Discovery Results: Found 4 buffer overflows in 6 candidates –1 new exploit location for Gaztek HTTP server

25 Results (III): Loop statistics Provides new symbolic conditions –Loop Conditions

26 Talk Outline Limitations of SPSE Overview of LESE Requirements Technical Approach Results and Applications Conclusion & Related Work

27 Applications –Finding buffer overflows [NDSS00, FSE04,CCS03,FSE03] –Signature Generation [S&P06,RAID09,SOSP07] Dynamic Symbolic Execution –Whitebox fuzzing,DART,EXE,KLEE,PEX [NDSS08,PLDI05,CCS06,ODSI08,TAP08] –Generalizations »Length Abstraction [ISSTA08] »Compositional [POPL07] »Constraint subsumption [NDSS08] »Grammar-based fuzzing [PLDI08] »Protocol-level constraints [RAID09] Related Work

28 Conclusion LESE is a generalization of SPSE –Captures effect of program inputs on loops –Summarizes the effect of loops on program variables Works for real-world Windows and Linux binaries Key enabler for several applications –Buffer overflow discovery and diagnosis »Capable of finding new bugs »Does not require manual function summaries Thanks… Any Questions, comments, or feedback?

29 Challenges(II) Problem: Scaling the symbolic analysis to real programs –Combining dynamic information with symbolic information –Our Solution: Concretization when abstract values are beyond our symbolic theory Dealing with x86 binaries –No notion of program variables, types –Aliasing –Memory layouts –Our Solution »Stack abstract interpretation »Runtime information