Dynamic Test Generation To Find Integer Bugs in x86 Binary Linux Programs
David Molnar, Xue Cong Li, David Wagner


Related work: static analysis and runtime checks
Integer bugs are a poor fit for traditional runtime and static analysis.
Static analysis: false positives.
Runtime analysis: the “benign” overflow problem. Many overflows are benign and harmless.
1. Throwing an exception in such cases prevents the application from functioning and thus causes false positives.
2. Occasionally the code intentionally relies upon overflow semantics, e.g., cryptographic code or fast hash functions.

Related work: dynamic test generation
SAGE: the most closely related work. It performs dynamic test generation with bug-seeking queries for integer overflow, underflow, and some narrowing conversion errors.
SmartFuzz (ours): looks at a wider range of narrowing conversion errors and also considers signed/unsigned conversion, which SAGE does not.

Related work: dynamic test generation (2)
KLEE: also focuses on scaling dynamic test generation, but in a different way. KLEE focuses on high code coverage for over 450 smaller programs, as measured by trace size and source lines of code. Ours focuses on a few “large” programs.
KLEE and EXE: also use integer overflow to prioritize different test cases in dynamic test generation, but they do not break out results on the number of bugs found due to this heuristic.
These previous works also do not address the problem of type inference for integer types in binary traces.

Related work: dynamic test generation (3)
IntScope: a static binary analysis tool for finding integer overflow bugs [28]. IntScope translates binaries to an intermediate representation, then checks lazily for potentially harmful integer overflows by using symbolic execution for data that flows into “taint sinks” defined by the tool, such as memory allocation functions.
SmartFuzz: in contrast, eagerly attempts to generate new test cases that cause an integer bug at the point in the program where such behavior could occur. This difference is due in part to the fact that IntScope reports errors to a programmer directly, while SmartFuzz filters test cases using a tool such as memcheck.
Finally, IntScope focuses only on integer overflow errors, while SmartFuzz also covers underflow, narrowing conversion, and signed/unsigned conversion bugs.

Related work: dynamic test generation (4)
BitBlaze [5]: also performs symbolic execution of x86 binaries, but its focus is on malware and signature generation, not on test generation.

What’s new
1. New methods for finding integer bugs in dynamic test generation, covering an extended range of integer bugs.
2. Bug bucketing with a fuzzy stack hash: techniques for reducing the amount of human time required to process test cases generated by fuzzing and for improving the quality of our error reports to developers.

Components
SmartFuzz: test case generation and bug finding.
Metafuzz: bug reporting.

Architecture
Test generation proceeds in three phases:
1. Symbolic execution.
2. Solving, to obtain new test cases.
3. Triage, to determine whether to report a bug or to score the test case for addition to the pool of unexplored test cases.

First, we add one or more seed test cases to a pool. Each test case in the pool receives a score given by the number of new basic blocks seen when running the target program on the test case.

In each iteration of test generation, we choose a high-scoring test case, execute the program on that input, and use symbolic execution to generate a set of constraints that record how each intermediate value computed by the program relates to the (tainted) inputs in the test case.
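The selection step can be sketched as follows. This is a minimal illustration, not SmartFuzz's actual data structure: the `test_case` record, its field names, and the `explored` flag are assumptions introduced here.

```c
#include <stddef.h>

/* Hypothetical record for one test case in the pool (names are illustrative). */
struct test_case {
    const char *path;     /* file holding the input */
    unsigned new_blocks;  /* score: new basic blocks seen when run on this input */
    int explored;         /* nonzero once the input has been symbolically executed */
};

/* Return the index of the highest-scoring unexplored test case,
   or -1 if every test case in the pool has already been explored. */
int pick_next(const struct test_case *pool, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (pool[i].explored)
            continue;
        if (best < 0 || pool[i].new_blocks > pool[best].new_blocks)
            best = (int)i;
    }
    return best;
}
```

A driver loop would call `pick_next`, run symbolic execution on the chosen input, mark it explored, and append any newly generated test cases with their own scores.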

For each symbolic branch, SmartFuzz adds a constraint that tries to force the program down a different path. We then query the constraint solver to see whether there exists any solution to the resulting set of constraints; if there is, the solution describes a new test case. We refer to these as coverage queries to the constraint solver.
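A toy version of a coverage query is sketched below. It assumes a one-byte input and only two kinds of branch condition, and it replaces the real constraint solver with an exhaustive search over all 256 input values; the types and the name `coverage_query` are hypothetical.

```c
#include <stdint.h>

/* One recorded symbolic branch: the condition "input OP k" and whether
   the concrete execution took it. */
enum op { OP_LT, OP_EQ };
struct branch { enum op op; uint8_t k; int taken; };

/* True if input value `in` follows branch `b` the same way the trace did. */
static int holds(const struct branch *b, uint8_t in) {
    int c = (b->op == OP_LT) ? (in < b->k) : (in == b->k);
    return c == b->taken;
}

/* Toy stand-in for the solver: look for a one-byte input that follows the
   first `flip` branches as recorded but takes branch `flip` the other way.
   Returns the new input value, or -1 if no such input exists. */
int coverage_query(const struct branch *trace, int n, int flip) {
    if (flip < 0 || flip >= n)
        return -1;
    for (int v = 0; v < 256; v++) {
        int ok = 1;
        for (int i = 0; i < flip && ok; i++)
            ok = holds(&trace[i], (uint8_t)v);
        if (ok && !holds(&trace[flip], (uint8_t)v))
            return v;
    }
    return -1;
}
```

For a trace that took `input < 100` and did not take `input == 42`, flipping the second branch yields the input 42, which drives the program down the previously unexplored side.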

Queries
Coverage queries: find new test cases that cover more basic blocks.
Bug-seeking queries: SmartFuzz also injects constraints that are satisfied if a condition causing an error or potential error is satisfied (e.g., to force an arithmetic calculation to overflow). We then query the constraint solver; a solution describes a test case likely to cause an error.
A single symbolic execution therefore leads to many coverage and bug-seeking queries to the constraint solver, which may result in many new test cases.
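As a rough illustration of a bug-seeking query, the sketch below states the overflow condition for a 32-bit unsigned multiplication and searches for an input that satisfies it. The one-byte tainted `count`, the function names, and the brute-force search in place of a constraint solver are all assumptions made for the example.

```c
#include <stdint.h>

/* Condition injected for a 32-bit unsigned multiply: true when the
   mathematical product does not fit in 32 bits. */
int mul_overflows_u32(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b > UINT32_MAX;
}

/* Toy bug-seeking query for the expression count * elem_size, where count
   is a one-byte tainted input and elem_size is a concrete value.
   Returns the smallest count that makes the multiply overflow, or -1 if
   no one-byte value does. */
int bug_seeking_query(uint32_t elem_size) {
    for (int count = 0; count < 256; count++)
        if (mul_overflows_u32((uint32_t)count, elem_size))
            return count;
    return -1;
}
```

With `elem_size` equal to 0x02000000, the smallest overflowing `count` is 128; with `elem_size` equal to 16 the query is unsatisfiable, so no test case would be generated for that operation.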

For each new test case, we determine whether it exhibits a bug. If so, we report the bug; otherwise, we add the test case to the pool for scoring and possible symbolic execution. To decide, we run Valgrind memcheck, a tool that observes concrete execution looking for common programming errors [26], on the target program with each test case. We record any test case that causes the program to crash or triggers a memcheck warning.

Integer Bugs
1. Overflow/Underflow
2. Width Conversions
3. Signed/Unsigned Conversions
4. memcpy: the length argument of memcpy is additionally monitored (unsigned type constraint).

Overflow/Underflow

    char *badalloc(int sz, int n) {
        return (char *) malloc(sz * n);
    }

If the multiplication sz * n overflows, the allocated buffer may be smaller than expected, which can lead to a buffer overflow later.
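For contrast with badalloc, one possible guarded variant is sketched below. The name `checked_alloc` and the policy of returning NULL for non-positive or overflowing sizes are assumptions for illustration, not something the slides prescribe.

```c
#include <limits.h>
#include <stdlib.h>

/* Allocate sz * n bytes, but return NULL instead of under-allocating when
   either argument is non-positive or the product would overflow int. */
char *checked_alloc(int sz, int n) {
    if (sz <= 0 || n <= 0)
        return NULL;
    if (sz > INT_MAX / n)   /* sz * n would not fit in an int */
        return NULL;
    return (char *)malloc((size_t)sz * (size_t)n);
}
```

The division-based test avoids performing the overflowing multiplication at all, which matters because signed overflow is undefined behavior in C.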

Width Conversions

    void badcpy(Int16 n, char *p, char *q) {
        UInt32 m = n;   // n is negative
        memcpy(p, q, m);
    }

Converting a value of one integral type to a wider (or narrower) integral type which has a different range of values can introduce width conversion bugs.
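The effect of the conversion in badcpy can be made concrete with the standard fixed-width types. The helper names below are hypothetical; the arithmetic itself follows the C rules for conversion to unsigned types, which reduce the value modulo 2^32 or 2^16.

```c
#include <stdint.h>

/* Same conversion as "UInt32 m = n" in badcpy: a negative 16-bit value
   becomes a very large 32-bit unsigned value. */
uint32_t widen_like_badcpy(int16_t n) {
    uint32_t m = n;
    return m;
}

/* Narrowing in the other direction silently drops the high bits. */
uint16_t narrow_to_u16(uint32_t v) {
    return (uint16_t)v;
}
```

Passing -1 to `widen_like_badcpy` yields 4294967295, so the memcpy in badcpy would be asked to copy roughly 4 GiB.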

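Signed/Unsigned Conversion

    void bad(int x, char *src, char *dst) {
        if (x > 800) {                  // What if x == -1 ?
            return;
        } else {
            copy_bytes(x, src, dst);    // copy_bytes(unsigned int x, ...
        }
    }

With x == -1 the signed comparison passes, but copy_bytes receives x as a very large unsigned value.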
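The mismatch in bad() can be checked directly. In this sketch `passes_original_check`, `as_length`, and `guarded_copy` are hypothetical helpers, and memcpy stands in for the copy_bytes routine, whose body the slides do not show.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* The check from bad(): a signed comparison against the upper bound only. */
int passes_original_check(int x) {
    return !(x > 800);
}

/* The value an unsigned int length parameter receives for a given int. */
unsigned int as_length(int x) {
    return (unsigned int)x;
}

/* One possible repair: reject negative lengths before the value is
   reinterpreted as unsigned. Returns the number of bytes copied. */
size_t guarded_copy(int x, const char *src, char *dst) {
    if (x < 0 || x > 800)
        return 0;
    memcpy(dst, src, (size_t)x);
    return (size_t)x;
}
```

Here -1 passes the original check while `as_length(-1)` equals UINT_MAX, which is exactly the kind of value a signed/unsigned bug-seeking query tries to produce.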

memcpy
SmartFuzz also adds unsigned type constraints to values used as the length argument of the memcpy function, which we can detect because we know the calling convention for x86 and we have debugging symbols for glibc.

Techniques for Finding Integer Bugs
Idea:
1. Keep track of a type for every tainted program value. (For x86 binaries, the sources of type constraints are signed and unsigned comparison operators.)
2. Use the solver to force values with type “Bottom” to equal -1. (Negative values behave differently in signed and unsigned comparisons, and so they are likely to exhibit an error if one exists.)
Type lattice: Top (Unknown) at the top, Signed and Unsigned in the middle, and Bottom (Potential Bug) at the bottom. A value reaches Bottom when it has been used as both signed and unsigned.
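The four-point type lattice can be sketched with a small bitmask encoding, where each bit records whether a signedness is still consistent with the uses seen so far. The encoding and the names below are assumptions chosen for brevity, not the representation used inside SmartFuzz.

```c
/* Bit 0: "may be signed"; bit 1: "may be unsigned".
   Top allows both, Bottom allows neither (a potential bug). */
enum int_type {
    T_BOTTOM   = 0,
    T_SIGNED   = 1,
    T_UNSIGNED = 2,
    T_TOP      = 3
};

/* Meet of two lattice elements: keep only what both constraints allow. */
enum int_type meet(enum int_type a, enum int_type b) {
    return (enum int_type)(a & b);
}

/* Refine a value's type after observing it in a comparison:
   pass nonzero for a signed comparison, zero for an unsigned one. */
enum int_type observe_comparison(enum int_type current, int is_signed_cmp) {
    return meet(current, is_signed_cmp ? T_SIGNED : T_UNSIGNED);
}
```

A value starts at `T_TOP`; after a signed comparison it becomes `T_SIGNED`, and a later unsigned use drives it to `T_BOTTOM`, at which point a bug-seeking query forcing it to -1 would be issued.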

Metafuzz: Triage and reporting at scale
Existing approach: compute a stack hash as a hash of the sequence of instruction pointers in the backtrace.
Problems:
1. It is sensitive to address space layout randomization (ASLR), because different runs of the same program may load the stack or dynamically linked libraries at different addresses, leading to different hash values for call stacks that are semantically the same.
2. A single bug might be triggered at multiple call stacks that are similar but not identical. For example, a buggy function can be called in several different places in the code. Each call site then yields a different stack hash.
3. Any slight change to the target software can change instruction pointers and thus cause the same bug to receive a different stack hash.

Fuzzy stack hash
For each frame in the call stack, take the name of the function called, the line number in source code (from debug symbol information), and the name of the object file. Then hash all of this information for the three functions at the top of the call stack. The choice of three frames is a trade-off.
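A minimal sketch of such a hash follows. The `frame` struct, the choice of 64-bit FNV-1a as the hash function, and the separator byte are assumptions; the slides specify only which fields are hashed and that three frames are used.

```c
#include <stddef.h>
#include <stdint.h>

/* Symbolic description of one call-stack frame (hypothetical layout). */
struct frame {
    const char *func;    /* name of the function called */
    const char *object;  /* name of the object file */
    unsigned line;       /* line number in source code */
};

#define FNV_PRIME 1099511628211ULL

/* Mix a NUL-terminated string plus a separator byte into an FNV-1a state. */
static uint64_t mix_str(uint64_t h, const char *s) {
    while (*s) { h ^= (unsigned char)*s++; h *= FNV_PRIME; }
    h ^= 0xff; h *= FNV_PRIME;
    return h;
}

/* Hash function name, object file, and line number for at most the three
   frames at the top of the stack; deeper frames are ignored. */
uint64_t fuzzy_stack_hash(const struct frame *stack, size_t depth) {
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    size_t n = depth < 3 ? depth : 3;
    for (size_t i = 0; i < n; i++) {
        h = mix_str(h, stack[i].func);
        h = mix_str(h, stack[i].object);
        for (int b = 0; b < 4; b++) {        /* low four bytes of the line */
            h ^= (stack[i].line >> (8 * b)) & 0xff;
            h *= FNV_PRIME;
        }
    }
    return h;
}
```

Because no instruction pointers are hashed, the result does not depend on where ASLR loads the code, and two crashes that differ only below the third frame fall into the same bucket.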

Results
Test programs: mplayer version SVN-r , ffmpeg version SVN-r16903, exiv2 version SVN-r1735, gzip version , bzip2 version 1.0.5, ImageMagick convert version , which are all widely used media and compression programs.

Size information for each test program: source lines of code, size of one of our seed files, total number of branches, x86 instructions, Valgrind IR statements, STP assert statements, and STP query statements.

The number of each type of query for each test program after a single 24-hour run.

The number of bugs found, by query type, over all test runs. The fourth column shows the number of distinct bugs found from test cases produced by the given type of query, as classified using our fuzzy stack hash.

Different bugs found by SmartFuzz and zzuf
The number of bugs, after fuzzy stack hashing, found by SmartFuzz (the number on the left in each column) and zzuf (the number on the right). We also report the cost per bug, assuming $0.10 per small compute-hour, $0.40 per large compute-hour, and 3 runs of 24 hours each per target for each tool.

Coverage metrics: the initial number of basic blocks, before testing; the number of blocks added during testing; and the percentage of blocks added. zzuf added a higher percentage of new blocks than SmartFuzz in 13 of the test runs, while SmartFuzz added a higher percentage of new blocks in 4 of the test runs.

The percentage of time spent in each of the phases of SmartFuzz.

Solver Statistics
Solver optimization (related constraint optimization): select only the constraints that involve the tainted variables in the query.
Average query size before and after related constraint optimization for each test program.

ECDF (empirical cumulative distribution function) of STP solver times over all our test runs. For about 70% of the test cases, the solver takes at most one second. The maximum solver time was about seconds.

Other Design Choices
Intermediate representation: on-the-fly.
Online constraint generation: not aware of off-line.
Memory model: concrete value -> symbolic.
Only tainted data is symbolic: small size.
Focus on fuzzing files: single-threaded programs.

Conclusion
1. New methods for finding integer bugs in dynamic test generation.
2. Reported on our experiences building the web site metafuzz.com and using it to manage test case generation at scale.
3. SmartFuzz finds bugs not found by zzuf, and vice versa.
4. Can find integer bugs without the false positives inherent to static analysis or runtime checking approaches.
5. Scales to commodity Linux media playing software.

Thanks.