KLEE: Effective Testing of Systems Programs

Slides:

Advertisements

Similar presentations

1 Verification by Model Checking. 2 Part 1 : Motivation.

Advertisements

1 Automating the Generation of Mutation Tests Mike Papadakis and Nicos Malevris Department of Informatics Athens University of Economics and Business.

Cristian Cadar, Peter Boonstoppel, Dawson Engler RWset: Attacking Path Explosion in Constraint-Based Test Generation TACAS 2008, Budapest, Hungary ETAPS.

Leonardo de Moura Microsoft Research. Z3 is a new solver developed at Microsoft Research. Development/Research driven by internal customers. Free for.

Applications of Synchronization Coverage A.Bron,E.Farchi, Y.Magid,Y.Nir,S.Ur Tehila Mayzels 1.

Saumya Debray The University of Arizona Tucson, AZ

1 Symbolic Execution Kevin Wallace, CSE

Technology from seed Automatic Equivalence Checking of UF+IA Programs Nuno Lopes and José Monteiro.

Amal Khalil & Juergen Dingel

Finding bugs: Analysis Techniques & Tools Symbolic Execution & Constraint Solving CS161 Computer Security Cho, Chia Yuan.

Masahiro Fujita Yoshihisa Kojima University of Tokyo May 2, 2008

PLDI’2005Page 1June 2005 Example (C code) int double(int x) { return 2 * x; } void test_me(int x, int y) { int z = double(x); if (z==y) { if (y == x+10)

Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.

Parallel Symbolic Execution for Structural Test Generation Matt Staats Corina Pasareanu ISSTA 2010.

Model Counting >= Symbolic Execution Willem Visser Stellenbosch University Joint work with Matt Dwyer (UNL, USA) Jaco Geldenhuys (SU, RSA) Corina Pasareanu.

1 Symbolic Execution for Model Checking and Testing Corina Păsăreanu (Kestrel) Joint work with Sarfraz Khurshid (MIT) and Willem Visser (RIACS)

TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection Tielei Wang 1, Tao Wei 1, Guofei Gu 2, Wei Zou 1 1 Peking.

Annoucements  Next labs 9 and 10 are paired for everyone. So don’t miss the lab.  There is a review session for the quiz on Monday, November 4, at 8:00.

ASE’14, 18 th September 2014 This work is supported by Microsoft Research through its PhD Scholarship Programme Microsoft is a registered trademark of.

Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.

Automatic Generation of Inputs of Death and High-Coverage Tests Presented by Yoni Leibowitz EXE & KLEE.

ECE Synthesis & Verification1 ECE 667 Spring 2011 Synthesis and Verification of Digital Systems Verification Introduction.

DART Directed Automated Random Testing Patrice Godefroid, Nils Klarlund, and Koushik Sen Syed Nabeel.

Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San.

Michael Ernst, page 1 Improving Test Suites via Operational Abstraction Michael Ernst MIT Lab for Computer Science Joint.

PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection S. Lu, P. Zhou, W. Liu, Y. Zhou, J. Torrellas University.

Additional project suggestions Finish lecturing on BDDs Survey some symbolic analysis papers Ganesh, Week 8 CS 6110, Spring 2011.

Chapter Seven Advanced Shell Programming. 2 Lesson A Developing a Fully Featured Program.

Dynamic Test Generation To Find Integer Bugs in x86 Binary Linux Programs David Molnar Xue Cong Li David Wagner.

DART: Directed Automated Random Testing Koushik Sen University of Illinois Urbana-Champaign Joint work with Patrice Godefroid and Nils Klarlund.

CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.

Automatic Generation of Inputs of Death and High-Coverage Tests Presented by Yoni Leibowitz EXE & KLEE.

Cristian Urs and Ben Riveira. Introduction The article we chose focuses on improving the performance of Genetic Algorithms by: Use of predictive models.

Automated Whitebox Fuzz Testing (NDSS 2008) Presented by: Edmund Warner University of Central Florida April 7, 2011 David Molnar UC Berkeley

Dept. of Computer and Information Sciences : University of Delaware John Cavazos Department of Computer and Information Sciences University of Delaware.

Aditya V. Nori, Sriram K. Rajamani Microsoft Research India.

Testing Testing Techniques to Design Tests. Testing:Example Problem: Find a mode and its frequency given an ordered list (array) of with one or more integer.

Replay Compilation: Improving Debuggability of a Just-in Time Complier Presenter: Jun Tao.

CSC 221: Recursion. Recursion: Definition Function that solves a problem by relying on itself to compute the correct solution for a smaller version of.

QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs By Koen Claessen, Juhn Hughes ME: Mike Izbicki.

Lazy Annotation for Program Testing and Verification Speaker: Chen-Hsuan Adonis Lin Advisor: Jie-Hong Roland Jiang November 26,

jFuzz – Java based Whitebox Fuzzing

EXE: Automatically Generating Inputs of Death Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, Dawson R. Engler 13th ACM conference on.

Learning Symbolic Interfaces of Software Components Zvonimir Rakamarić.

“Isolating Failure Causes through Test Case Generation “ Jeremias Rößler Gordon Fraser Andreas Zeller Alessandro Orso Presented by John-Paul Ore.

Finding Errors in.NET with Feedback-Directed Random Testing Carlos Pacheco (MIT) Shuvendu Lahiri (Microsoft) Thomas Ball (Microsoft) July 22, 2008.

Design - programming Cmpe 450 Fall Dynamic Analysis Software quality Design carefully from the start Simple and clean Fewer errors Finding errors.

CSV 889: Concurrent Software Verification Subodh Sharma Indian Institute of Technology Delhi Scalable Symbolic Execution: KLEE.

Symbolic and Concolic Execution of Programs Information Security, CS 526 Omar Chowdhury 10/7/2015Information Security, CS 5261.

Using Symbolic PathFinder at NASA Corina Pãsãreanu Carnegie Mellon/NASA Ames.

CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.

Combining Static and Dynamic Reasoning for Bug Detection Yannis Smaragdakis and Christoph Csallner Elnatan Reisner – April 17, 2008.

Lazy Annotation for Program Testing and Verification (Supplementary Materials) Speaker: Chen-Hsuan Adonis Lin Advisor: Jie-Hong Roland Jiang December 3,

Whole Test Suite Generation. Abstract Not all bugs lead to program crashes, and not always is there a formal specification to check the correctness of.

HW7: Due Dec 5th 23:59 1.Describe test cases to reach full path coverage of the triangle program by completing the path condition table below. Also, draw.

Random Test Generation of Unit Tests: Randoop Experience

Week 6 MondayTuesdayWednesdayThursdayFriday Testing III Reading due Group meetings Testing IVSection ZFR due ZFR demos Progress report due Readings out.

Shadow Shadow of a Doubt: Testing for Divergences Between Software Versions Hristina PalikarevaTomasz KuchtaCristian Cadar ICSE’16, 20 th May 2016 This.

Control Flow Testing Handouts

Outline of the Chapter Basic Idea Outline of Control Flow Testing

INTRODUCTION TO UNIX: The Shell Command Interface

RDE: Replay DEbugging for Diagnosing Production Site Failures

All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.

Automated Extraction of Inductive Invariants to Aid Model Checking

Automatic Test Generation SymCrete

Example (C code) int double(int x) { return 2 * x; }

GNU gcov gcov is a test coverage program running with gcc.

CUTE: A Concolic Unit Testing Engine for C

CSE 1020:Software Development

Model Checking and Its Applications

Presentation transcript:

KLEE: Effective Testing of Systems Programs Cristian Cadar Joint work with Daniel Dunbar and Dawson Engler April 16th, 2009 1

Writing Systems Code Is Hard Code complexity Tricky control flow Complex dependencies Abusive use of pointer operations Environmental dependencies Code has to anticipate all possible interactions Including malicious ones

KLEE Based on symbolic execution and constraint solving techniques [OSDI 2008, Best Paper Award] Based on symbolic execution and constraint solving techniques Automatically generates high coverage test suites Over 90% on average on ~160 user-level apps Finds deep bugs in complex systems programs Including higher-level correctness ones 3

Toy Example x =  x < 0 x < 0 x  0 x = 1234 return -x x = 1234 int bad_abs(int x) { if (x < 0) return –x; if (x == 1234) return x; } TRUE FALSE x < 0 x  0 x = 1234 return -x TRUE FALSE x = 1234 x  1234 x = -2 return -x return x test1.out x = 1234 x = 3 test2.out test3.out

Constraint Solver (STP) KLEE Architecture L V M LLVM bytecode C code x = -2 SYMBOLIC ENVIRONMENT K L E E x = 1234 x = 3 x  0 x  1234 x = 3 Constraint Solver (STP)

Outline Motivation Example and Basic Architecture Scalability Challenges Experimental Evaluation

Three Big Challenges Motivation Example and Basic Architecture Scalability Challenges Exponential number of paths Expensive constraint solving Interaction with environment Experimental Evaluation

Exponential Search Space Naïve exploration can easily get “stuck” Use search heuristics: Coverage-optimized search Select path closest to an uncovered instruction Favor paths that recently hit new code Random path search See [KLEE – OSDI’08]

Three Big Challenges Motivation Example and Basic Architecture Scalability Challenges Exponential number of paths Expensive constraint solving Interaction with environment Experimental Evaluation

Constraint Solving Dominates runtime Inherently expensive (NP-complete) Invoked at every branch Two simple and effective optimizations Eliminating irrelevant constraints Caching solutions Dramatic speedup on our benchmarks

Eliminating Irrelevant Constraints In practice, each branch usually depends on a small number of variables … if (x < 10) { } x + y > 10 z & -z = z x < 10 ?

Caching Solutions Static set of branches: lots of similar constraint sets 2  y < 100 x > 3 x + y > 10 x = 5 y = 15 2  y < 100 x + y > 10 Eliminating constraints cannot invalidate solution x = 5 y = 15 2  y < 100 x > 3 x + y > 10 x < 10 Adding constraints often does not invalidate solution x = 5 y = 15 UBTree data structure [Hoffman and Koehler, IJCAI ’99]

Dramatic Speedup Aggregated data over 73 applications Time (s) Executed instructions (normalized)

Three Big Challenges Motivation Example and Basic Architecture Scalability Challenges Exponential number of paths Expensive constraint solving Interaction with environment Experimental Evaluation

Environment: Calling Out Into OS int fd = open(“t.txt”, O_RDONLY); If all arguments are concrete, forward to OS Otherwise, provide models that can handle symbolic files Goal is to explore all possible legal interactions with the environment int fd = open(sym_str, O_RDONLY);

Environmental Modeling // actual implementation: ~50 LOC ssize_t read(int fd, void *buf, size_t count) { exe_file_t *f = get_file(fd); … memcpy(buf, f->contents + f->off, count) f->off += count; } Plain C code run by KLEE Users can extend/replace environment w/o any knowledge of KLEE internals Currently: effective support for symbolic command line arguments, files, links, pipes, ttys, environment vars

Does KLEE work? Motivation Example and Basic Architecture Scalability Challenges Evaluation Coverage results Bug finding Crosschecking

GNU Coreutils Suite Core user-level apps installed on many UNIX systems 89 stand-alone (i.e. excluding wrappers) apps (v6.10) File system management: ls, mkdir, chmod, etc. Management of system properties: hostname, printenv, etc. Text file processing : sort, wc, od, etc. … Variety of functions, different authors, intensive interaction with environment Heavily tested, mature code

Coreutils ELOC (incl. called lib) Number of applications Executable Lines of Code (ELOC)

Methodology Fully automatic runs Run KLEE one hour per utility, generate test cases Run test cases on uninstrumented version of utility Measure line coverage using gcov Coverage measurements not inflated by potential bugs in our tool

High Line Coverage (Coreutils, non-lib, 1h/utility = 89 h) Overall: 84%, Average 91%, Median 95% 16 at 100% Coverage (ELOC %) Apps sorted by KLEE coverage

Beats 15 Years of Manual Testing Avg/utility KLEE 91% Manual 68% Manual tests also check correctness KLEE coverage – Manual coverage Apps sorted by KLEE coverage – Manual coverage

Busybox Suite for Embedded Devices Overall: 91%, Average 94%, Median 98% 31 at 100% Coverage (ELOC %) Apps sorted by KLEE coverage

Busybox – KLEE vs. Manual Avg/utility KLEE 94% Manual 44% KLEE coverage – Manual coverage Apps sorted by KLEE coverage – Manual coverage

Does KLEE work? Motivation Example and Basic Architecture Scalability Challenges Evaluation Coverage results Bug finding Crosschecking

GNU Coreutils Bugs Ten crash bugs More crash bugs than approx last three years combined KLEE generates actual command lines exposing crashes

Ten command lines of death md5sum -c t1.txt mkdir -Z a b mkfifo -Z a b mknod -Z a b p seq -f %0 1 pr -e t2.txt tac -r t3.txt t3.txt paste -d\\ abcdefghijklmnopqrstuvwxyz ptx -F\\ abcdefghijklmnopqrstuvwxyz ptx x t4.txt t1.txt: \t \tMD5( t2.txt: \b\b\b\b\b\b\b\t t3.txt: \n t4.txt: A

Does KLEE work? Motivation Example and Basic Architecture Scalability Challenges Evaluation Coverage results Bug finding Crosschecking

Finding Correctness Bugs KLEE can prove asserts on a per path basis Constraints have no approximations An assert is just a branch, and KLEE proves feasibility/infeasibility of each branch it reaches If KLEE determines infeasibility of false side of assert, the assert was proven on the current path

Crosschecking Assume f(x) and f’(x) implement the same interface Make input x symbolic Run KLEE on assert(f(x) == f’(x)) For each explored path: KLEE terminates w/o error: paths are equivalent KLEE terminates w/ error: mismatch found Coreutils vs. Busybox: UNIX utilities should conform to IEEE Std.1003.1 Crosschecked pairs of Coreutils and Busybox apps Verified paths, found mismatches

Mismatches Found Input Busybox Coreutils t1.txt: a t2.txt: b tee "" <t1.txt [infinite loop] [terminates] tee - [copies once to stdout] [copies twice] comm t1.txt t2.txt [doesn’t show diff] [shows diff] cksum / "4294967295 0 /" "/: Is a directory" split / tr [duplicates input] "missing operand" [ 0 ‘‘<’’ 1 ] "binary op. expected" tail –2l [rejects] [accepts] unexpand –f split – t1.txt: a t2.txt: b (no newlines!)

Related Work Very active area of research. E.g.: EGT / EXE / KLEE [Stanford] DART [Bell Labs] CUTE [UIUC] SAGE, Pex [MSR Redmond] Vigilante [MSR Cambridge] BitScope [Berkeley/CMU] CatchConv [Berkeley] JPF [NASA Ames] KLEE Hundred distinct benchmarks Extensive coverage numbers Symbolic crosschecking Environment support

Effective Testing of Systems Programs KLEE Effective Testing of Systems Programs KLEE can effectively: Generate high coverage test suites Over 90% on average on ~160 user-level applications Find deep bugs in complex software Including higher-level correctness bugs, via crosschecking