Presentation is loading. Please wait.

Presentation is loading. Please wait.

DART and CUTE: Concolic Testing

Similar presentations


Presentation on theme: "DART and CUTE: Concolic Testing"— Presentation transcript:

1 DART and CUTE: Concolic Testing
Koushik Sen University of California, Berkeley Joint work with Gul Agha, Patrice Godefroid, Nils Klarlund, Rupak Majumdar, Darko Marinov 9/18/2018

2 Today, QA is mostly testing
“50% of my company employees are testers, and the rest spends 50% of their time testing!” Bill Gates 1995

3 Software Reliability Importance
Software is pervading every aspects of life Software everywhere Software failures were estimated to cost the US economy about $60 billion annually [NIST 2002] Improvements in software testing infrastructure may save one-third of this cost Testing accounts for an estimated 50%-80% of the cost of software development [Beizer 1990] So, let me start by talking about the motivation of the problem. So, why is software reliability important. A 2002 report from NIST claims that software failures cost US economy an estimated 60 billion dollars annually. Improvements in software testing infra-structure may save one-third of this. It is not that we are not trying to address the problem. Testing currently accounts for an estimated 50%-80% of the total software development cost. So, software developers are trying to address this problem, but our methods and tools are not good enough..

4 Big Picture Testing Safe Programming Languages and Type systems
Model Checking and Theorem Proving Runtime Monitoring Static Program Analysis Dynamic Program Analysis There are several approaches that facilitate software reliability: type systems and safe programming languages, static analysis, software model checking, runtime monitoring, testing and so on… these methods help, but they have limitations. For example, testing is mostly done ad-hoc by manually translating specifications into test cases. Safe programming languages are good at preventing many common bugs, but they are normally not as efficient as C. Model Based Software Development and Analysis

5 Big Picture Testing Dynamic Program Analysis
Safe Programming Languages and Type systems Testing Model Checking and Theorem Proving Runtime Monitoring Static Program Analysis Dynamic Program Analysis There are several approaches that facilitate software reliability: type systems and safe programming languages, static analysis, software model checking, runtime monitoring, testing and so on… these methods help, but they have limitations. For example, testing is mostly done ad-hoc by manually translating specifications into test cases. Safe programming languages are good at preventing many common bugs, but they are normally not as efficient as C. Model Based Software Development and Analysis

6 A Familiar Program: QuickSort
void quicksort (int[] a, int lo, int hi) { int i=lo, j=hi, h; int x=a[(lo+hi)/2]; // partition do { while (a[i]<x) i++; while (a[j]>x) j--; if (i<=j) { h=a[i]; a[i]=a[j]; a[j]=h; i++; j--; } } while (i<=j); // recursion if (lo<j) quicksort(a, lo, j); if (i<hi) quicksort(a, i, hi);

7 A Familiar Program: QuickSort
void quicksort (int[] a, int lo, int hi) { int i=lo, j=hi, h; int x=a[(lo+hi)/2]; // partition do { while (a[i]<x) i++; while (a[j]>x) j--; if (i<=j) { h=a[i]; a[i]=a[j]; a[j]=h; i++; j--; } } while (i<=j); // recursion if (lo<j) quicksort(a, lo, j); if (i<hi) quicksort(a, i, hi); Test QuickSort Create an array Initialize the elements of the array Execute the program on this array How much confidence do I have in this testing method? Is my test suite *Complete*? Can someone generate a small and *Complete* test suite for me?

8 Automated Test Generation
Studied since 70’s King 76, Myers 79 30 years have passed, and yet no effective solution What Happened???

9 Automated Test Generation
Studied since 70’s King 76, Myers 79 30 years have passed, and yet no effective solution What Happened??? Program-analysis techniques were expensive Automated theorem proving and constraint solving techniques were not efficient

10 Automated Test Generation
Studied since 70’s King 76, Myers 79 30 years have passed, and yet no effective solution What Happened??? Program-analysis techniques were expensive Automated theorem proving and constraint solving techniques were not efficient In the recent years we have seen remarkable progress in static program-analysis and constraint solving SLAM, BLAST, ESP, Bandera, Saturn, MAGIC

11 Automated Test Generation
Studied since 70’s King 76, Myers 79 30 years have passed, and yet no effective solution What Happened??? Program-analysis techniques were expensive Automated theorem proving and constraint solving techniques were not efficient In the recent years we have seen remarkable progress in static program-analysis and constraint solving SLAM, BLAST, ESP, Bandera, Saturn, MAGIC Question: Can we use similar techniques in Automated Testing?

12 Systematic Automated Testing
Model-Checking and Formal Methods Static Analysis Testing There are several approaches that facilitate software reliability: type systems and safe programming languages, static analysis, software model checking, runtime monitoring, testing and so on… these methods help, but they have limitations. For example, testing is mostly done ad-hoc by manually translating specifications into test cases. Safe programming languages are good at preventing many common bugs, but they are normally not as efficient as C. Constraint Solving Automated Theorem Proving

13 Concrete + Symbolic = Concolic
CUTE and DART Combine random testing (concrete execution) and symbolic testing (symbolic execution) [PLDI’05, FSE’05, FASE’06, CAV’06,ISSTA’07, ICSE’07] Concrete + Symbolic = Concolic I proposed an approach called concolic testing where we combine random testing and symbolic testing in a co-operative way. Concolic testing works for sequential programs. Let me illustrate concolic testing with the running example. Then I’ll describe its extension to concurrent programs. Initially I started working on this idea when I was in Lucent two years ago. Since then I extended the method to handle pointers, data structures, and concurrency.

14 Goal Automated Unit Testing of real-world C and Java Programs
Generate test inputs Execute unit under test on generated test inputs so that all reachable statements are executed Any assertion violation gets caught

15 Goal Automated Unit Testing of real-world C and Java Programs
Generate test inputs Execute unit under test on generated test inputs so that all reachable statements are executed Any assertion violation gets caught Our Approach: Explore all execution paths of an Unit for all possible inputs Exploring all execution paths ensure that all reachable statements are executed

16 Execution Paths of a Program
Can be seen as a binary tree with possibly infinite depth Computation tree Each node represents the execution of a “if then else” statement Each edge represents the execution of a sequence of non-conditional statements Each path in the tree represents an equivalence class of inputs 1

17 Example of Computation Tree
void test_me(int x, int y) { if(2*x==y){ if(x != y+10){ printf(“I am fine here”); } else { printf(“I should not reach here”); ERROR; } 2*x==y x!=y+10 N Y N Y ERROR

18 Concolic Testing: Finding Security and Safety Bugs
Divide by 0 Error x = 3 / i; Buffer Overflow a[i] = 4;

19 Concolic Testing: Finding Security and Safety Bugs
Key: Add Checks Automatically and Perform Concolic Testing Divide by 0 Error if (i !=0) x = 3 / i; else ERROR; Buffer Overflow if (0<=i && i < a.length) a[i] = 4; else ERROR;

20 Existing Approach I Random testing
generate random inputs execute the program on generated inputs Probability of reaching an error can be astronomically less test_me(int x){ if(x==94389){ ERROR; } Probability of hitting ERROR = 1/232

21 Existing Approach II Symbolic Execution
use symbolic values for input variables execute the program symbolically on symbolic input values collect symbolic path constraints use theorem prover to check if a branch can be taken Does not scale for large programs test_me(int x){ if((x%10)*4!=17){ ERROR; } else { } Symbolic execution will say both branches are reachable: False positive

22 Existing Approach II Symbolic Execution
use symbolic values for input variables execute the program symbolically on symbolic input values collect symbolic path constraints use theorem prover to check if a branch can be taken Does not scale for large programs test_me(int x){ if(bbox(x)!=17){ ERROR; } else { } Symbolic execution will say both branches are reachable: False positive

23 Concolic Testing Approach
int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; Random Test Driver: random value for x and y Probability of reaching ERROR is extremely low Consider testing the running example. We want to test the function testme, which has inputs x and y. As in random testing, if we generate random values for the inputs p and x, then the probability of reaching ERROR is extremely low.

24 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; x = 22, y = 7 x = x0, y = y0 In concolic testing approach we do the following. Rather than executing the program simply concretely as in random testing, we execute the program both concretely and symbolically simultaneously. To do so we create two states: a concrete state used by the normal concrete execution of the program and a symbolic state manipulated by the symbolic execution. For the concrete execution we initialize the inputs x and y with random concrete values. For the symbolic execution we initialize the inputs with the symbols x0 and y0. This arrow represents the program counter diagrammatically.

25 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; x = 22, y = 7, z = 14 x = x0, y = y0, z = 2*y0 Now we execute the program both symbolically and concretely. Note that we updated both the concrete and symbolic value of z.

26 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; 2*y0 != x0 As we execute the program we generate a symbolic constraint corresponding to the execution of each conditional statement. The conjunction of the symbolic constraints generated at the end of the execution is called path condition. For example, the conjunction of these two constraints represent the path condition. Any values of the inputs x and y that satisfy the path condition would force the program along the same path. Our next goal is to generate an input that would force the program execution along a different path. x = 22, y = 7, z = 14 x = x0, y = y0, z = 2*y0

27 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; Solve: 2*y0 == x0 Solution: x0 = 2, y0 = 1 2*y0 != x0 For that we pick a constraint, negate it, and solve the resultant path condition to generate a new input. For example here we generate this input where x = 2 and y = 1. x = 22, y = 7, z = 14 x = x0, y = y0, z = 2*y0

28 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; x = 2, y = 1 x = x0, y = y0 We repeat the process of concolic execution, but now with the newly generated input.

29 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; x = 2, y = 1, z = 2 x = x0, y = y0, z = 2*y0

30 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; 2*y0 == x0 x = 2, y = 1, z = 2 x = x0, y = y0, z = 2*y0 In this example the symbolic constraints are simple arithmetic constraints. However, in general our symbolic constraints also incorporate pointer constraints, which are simple equality and dis-equality constraints over symbolic pointer values. So we can effectively handle programs with complex data structures.

31 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; 2*y0 == x0 x0 · y0+10 Again at the end of the execution we have a path condition. x = 2, y = 1, z = 2 x = x0, y = y0, z = 2*y0

32 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; Solve: (2*y0 == x0) Æ (x0 > y0 + 10) Solution: x0 = 30, y0 = 15 2*y0 == x0 x0 · y0+10 We negate a constraint and solve the path condition to generate a new input. x = 2, y = 1, z = 2 x = x0, y = y0, z = 2*y0

33 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; x = 30, y = 15 x = x0, y = y0 Finally we generate this input.

34 Concolic Testing Approach
Concrete Execution Symbolic Execution concrete state symbolic state path condition int double (int v) { return 2*v; } void testme (int x, int y) { z = double (y); if (z == x) { if (x > y+10) { ERROR; Program Error 2*y0 == x0 x0 > y0+10 In fact this input leads the program to the error. x = 30, y = 15 x = x0, y = y0

35 Explicit Path (not State) Model Checking
Traverse all execution paths one by one to detect errors assertion violations program crash uncaught exceptions combine with valgrind to discover memory errors F T So in this approach we tried to explore all possible paths in the computation tree of the program under test using a depth first search strategy.

36 Explicit Path (not State) Model Checking
Traverse all execution paths one by one to detect errors assertion violations program crash uncaught exceptions combine with valgrind to discover memory errors F T Since we explore all paths rather than states of the program, we also call it explicit path model checking.

37 Explicit Path (not State) Model Checking
Traverse all execution paths one by one to detect errors assertion violations program crash uncaught exceptions combine with valgrind to discover memory errors F T

38 Explicit Path (not State) Model Checking
Traverse all execution paths one by one to detect errors assertion violations program crash uncaught exceptions combine with valgrind to discover memory errors F T

39 Explicit Path (not State) Model Checking
Traverse all execution paths one by one to detect errors assertion violations program crash uncaught exceptions combine with valgrind to discover memory errors F T

40 Explicit Path (not State) Model Checking
Traverse all execution paths one by one to detect errors assertion violations program crash uncaught exceptions combine with valgrind to discover memory errors F T

41 Novelty : Simultaneous Concrete and Symbolic Execution
Concrete Execution Symbolic Execution concrete state symbolic state path condition int foo (int v) { return (v*v) % 50; } void testme (int x, int y) { z = foo (y); if (z == x) { if (x > y+10) { ERROR; x = 22, y = 7 x = x0, y = y0 There is a complication here: sometimes the symbolic constraints can become very complex and cannot be solve by our constraint solver. This also happens if we have black box library functions. In such situations we simply replace some of the symbolic values by concrete values available from the concrete state. This is sound because concrete values are instantiations of symbolic values. However, we may lose completeness. Nevertheless, this way of replacing some symbolic values by concrete values helps concolic testing to scale for large programs for which symbolic testing would have otherwise failed.

42 Novelty : Simultaneous Concrete and Symbolic Execution
Concrete Execution Symbolic Execution concrete state symbolic state path condition int foo (int v) { return (v*v) % 50; } void testme (int x, int y) { z = foo (y); if (z == x) { if (x > y+10) { ERROR; Solve: (y0*y0 )%50 == x0 Don’t know how to solve! Stuck? (y0*y0)%50 !=x0 x = 22, y = 7, z = 49 x = x0, y = y0, z = (y0 *y0)%50

43 Novelty : Simultaneous Concrete and Symbolic Execution
Concrete Execution Symbolic Execution concrete state symbolic state path condition void testme (int x, int y) { z = foo (y); if (z == x) { if (x > y+10) { ERROR; } Solve: foo (y0) == x0 Don’t know how to solve! Stuck? foo (y0) !=x0 x = 22, y = 7, z = 49 x = x0, y = y0, z = foo (y0)

44 Novelty : Simultaneous Concrete and Symbolic Execution
Concrete Execution Symbolic Execution concrete state symbolic state path condition int foo (int v) { return (v*v) % 50; } void testme (int x, int y) { z = foo (y); if (z == x) { if (x > y+10) { ERROR; Solve: (y0*y0 )%50 == x0 Don’t know how to solve! Not Stuck! Use concrete state Replace y0 by 7 (sound) (y0*y0)%50 !=x0 x = 22, y = 7, z = 49 x = x0, y = y0, z = (y0 *y0)%50

45 Novelty : Simultaneous Concrete and Symbolic Execution
Concrete Execution Symbolic Execution concrete state symbolic state path condition int foo (int v) { return (v*v) % 50; } void testme (int x, int y) { z = foo (y); if (z == x) { if (x > y+10) { ERROR; Solve: 49 == x0 Solution : x0 = 49, y0 = 7 49 !=x0 x = 22, y = 7, z = 48 x = x0, y = y0, z = 49

46 Novelty : Simultaneous Concrete and Symbolic Execution
Concrete Execution Symbolic Execution concrete state symbolic state path condition int foo (int v) { return (v*v) % 50; } void testme (int x, int y) { z = foo (y); if (z == x) { if (x > y+10) { ERROR; x = 49, y = 7 x = x0, y = y0

47 Novelty : Simultaneous Concrete and Symbolic Execution
Concrete Execution Symbolic Execution concrete state symbolic state path condition int foo (int v) { return (v*v) % 50; } void testme (int x, int y) { z = foo (y); if (z == x) { if (x > y+10) { ERROR; Program Error 2*y0 == x0 x0 > y0+10 x = 49, y = 7, z = 49 x = x0, y = y0 , z = 49

48 Concolic Testing: A Middle Approach
Random Testing Symbolic Testing Concolic Testing + Complex programs +/- Somewhat efficient + High coverage + No false positive + Complex programs + Efficient - Less coverage + No false positive - Simple programs - Not efficient + High coverage - False positive In fact we can see concolic testing as a middle path between random testing and symbolic testing. Concolic testing like random testing can handle large programs. At the same time it increases the code coverage due to the underlying symbolic testing. Moreover, unlike symbolic testing we do not get false positives. In fact I’ll show some case studies which establishes this fact.

49 Implementations DART and CUTE for C programs jCUTE for Java programs
Goto for CUTE and jCUTE binaries MSR has four implementations SAGE, PEX, YOGI, Vigilante Similar tool: EXE at Stanford Easiest way to use and to develop on top of CUTE Implement concolic testing yourself

50 Testing Data Structures (joint work with Darko Marinov and Gul Agha

51 Example Random Test Driver: random memory graph reachable from p
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; Random Test Driver: random memory graph reachable from p random value for x Probability of reaching abort( ) is extremely low

52 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p=p0, x=x0 p , x=236 NULL

53 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p=p0, x=x0 p , x=236 NULL

54 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p=p0, x=x0 p , x=236 NULL

55 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 !(p0!=NULL) p=p0, x=x0 p , x=236 NULL

56 solve: x0>0 and p0NULL
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints solve: x0>0 and p0NULL x0>0 p0=NULL p=p0, x=x0 p , x=236 NULL

57 solve: x0>0 and p0NULL
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints solve: x0>0 and p0NULL x0=236, p0 NULL 634 x0>0 p0=NULL p=p0, x=x0 p , x=236 NULL

58 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p=p0, x=x0, p->v =v0, p->next=n0 p , x=236 NULL 634

59 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p=p0, x=x0, p->v =v0, p->next=n0 p , x=236 NULL 634 x0>0

60 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=236 NULL 634 p0NULL

61 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p0NULL p=p0, x=x0, p->v =v0, p->next=n0 p , x=236 NULL 634 2x0+1v0

62 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p0NULL 2x0+1v0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=236 NULL 634

63 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints solve: x0>0 and p0NULL and 2x0+1=v0 x0>0 p0NULL 2x0+1v0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=236 NULL 634

64 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints solve: x0>0 and p0NULL and 2x0+1=v0 x0=1, p0 NULL 3 x0>0 p0NULL 2x0+1v0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=236 NULL 634

65 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3

66 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3 x0>0

67 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3 p0NULL

68 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p0NULL p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3 2x0+1=v0

69 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p0NULL p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3 2x0+1=v0 n0p0

70 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p0NULL 2x0+1=v0 n0p0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3

71 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints solve: x0>0 and p0NULL and 2x0+1=v0 and n0=p0 . x0>0 p0NULL 2x0+1=v0 n0p0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3

72 CUTE Approach Concrete Execution Symbolic Execution
typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints solve: x0>0 and p0NULL and 2x0+1=v0 and n0=p0 x0=1, p0 x0>0 p0NULL 3 2x0+1=v0 n0p0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 NULL 3

73 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p , x=1 p=p0, x=x0, p->v =v0, p->next=n0 3

74 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 3 x0>0

75 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 3 p0NULL

76 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints x0>0 p0NULL p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 3 2x0+1=v0

77 p=p0, x=x0, p->v =v0, p->next=n0
CUTE Approach Concrete Execution Symbolic Execution typedef struct cell { int v; struct cell *next; } cell; int f(int v) { return 2*v + 1; } int testme(cell *p, int x) { if (x > 0) if (p != NULL) if (f(x) == p->v) if (p->next == p) abort(); return 0; concrete state symbolic state constraints Program Error x0>0 p0NULL p=p0, x=x0, p->v =v0, p->next=n0 p , x=1 3 2x0+1=v0 n0=p0

78 CUTE in a Nutshell Generate concrete inputs one by one
each input leads program along a different path

79 CUTE in a Nutshell Generate concrete inputs one by one
each input leads program along a different path On each input execute program both concretely and symbolically

80 CUTE in a Nutshell Generate concrete inputs one by one
each input leads program along a different path On each input execute program both concretely and symbolically Both cooperate with each other concrete execution guides the symbolic execution

81 CUTE in a Nutshell Generate concrete inputs one by one
each input leads program along a different path On each input execute program both concretely and symbolically Both cooperate with each other concrete execution guides the symbolic execution concrete execution enables symbolic execution to overcome incompleteness of theorem prover replace symbolic expressions by concrete values if symbolic expressions become complex resolve aliases for pointer using concrete values handle arrays naturally

82 CUTE in a Nutshell Generate concrete inputs one by one
each input leads program along a different path On each input execute program both concretely and symbolically Both cooperate with each other concrete execution guides the symbolic execution concrete execution enables symbolic execution to overcome incompleteness of theorem prover replace symbolic expressions by concrete values if symbolic expressions become complex resolve aliases for pointer using concrete values handle arrays naturally symbolic execution helps to generate concrete input for next execution increases coverage

83 Handling Pointer Casting and Arithmetic
struct foo { int i; char c; }; void * memset(void *s,char c,size_t n){ for(int i=0;i<n;i++){ ((char *)s)[i] = c; } return s; } bar(struct foo *a){ if(a && a->c==1){ memset(a,0,sizeof(struct foo)); if(a->c!= 1) { ERROR; Reasoning about dynamic data is easy Due to limitation of alias analysis “static analyzers” cannot determine that “a->c” has been rewritten BLAST would infer that the program is safe practical CUTE finds the error

84 Library Functions With Side-effects
struct foo { int i; char c; }; bar(struct foo *a){ if(a && a->c==1){ memset(a,0,sizeof(struct foo)); if(a->c== 1) { ERROR; } Reasoning about function with side-effects Sound static tool will give false alarm CUTE will not report the error because it cannot generate a concrete input that will make the program hit ERROR.

85 Underlying Random Testing Helps
1 foobar(int x, int y){ 2 if (x*x*x > 0){ if (x>0 && y==10){ ERROR; } 6 } else { if (x>0 && y==20){ ERROR; } 10 } 11 } static analysis based model-checkers would consider both branches both ERROR statements are reachable false alarm Symbolic execution gets stuck at line number 2 or warn that both ERRORs are reachable CUTE finds the only error

86 Data-structure Testing
Solving Data-structure Invariants int isSortedSlist(slist * head) { slist * cur, *tmp; int i,j; if (head == 0) return 1; i=j=0; for (cur = head; cur!=0; cur = cur->next){ i++; j=1; for (tmp = head; j<i; tmp = tmp->next){ j++; if(cur==tmp) return 0; } for (cur = head; cur->next!=0; cur = cur->next){ if(cur->i > cur->_next->i) return 0; return 1; testme(slist *L,slist *e){ CUTE_assume(isSortedSlist(L)); sglib_slist_add(&L,e); CUTE_assert(isSortedSlist(L)); }

87 Data-structure Testing
Generating Call Sequence for (i=1; i<10; i++) { CU_input(toss); CU_input(e); switch(toss){ case 2: sglib_hashed_ilist_add_if_not_member(htab,e,&m); break; case 3: sglib_hashed_ilist_delete_if_member(htab,e,&m); break; case 4: sglib_hashed_ilist_delete(htab,e); break; case 5: sglib_hashed_ilist_is_member(htab,e); break; case 6: sglib_hashed_ilist_find_member(htab,e); break; }

88 Instrumentation

89 Instrumented Program Maintains a symbolic state
maps each memory location used by the program to symbolic expressions over symbolic input variables Maintains a path constraint a sequence of constraints C1, C2,…, Ck such that Ci s are symbolic constraints over symbolic input variables C1 Æ C2 Æ … Æ Ck holds over the path currently executed by the program

90 Instrumented Program Maintains a symbolic state
maps each memory location used by the program to symbolic expressions over symbolic input variables Maintains a path constraint a sequence of constraints C1, C2,…, Ck such that Ci s are symbolic constraints over symbolic input variables C1 Æ C2 Æ … Æ Ck holds over the path currently executed by the program Arithmetic Expressions a1x1+…+anxn+c Pointer expressions x

91 Instrumented Program Maintains a symbolic state
maps each memory location used by the program to symbolic expressions over symbolic input variables Maintains a path constraint a sequence of constraints C1, C2,…, Ck such that Ci s are symbolic constraints over symbolic input variables C1 Æ C2 Æ … Æ Ck holds over the path currently executed by the program Arithmetic Expressions a1x1+…+anxn+c Pointer expressions x Arithmetic Constraints a1x1+…+anxn ¸ 0 Pointer Constraints x=y x y x=NULL x NULL

92 Constraint Solving Path constraint C1Æ C2Æ… Æ Ck Æ… Cn

93 Constraint Solving Path constraint C1Æ C2Æ… Æ Ck Æ… Cn Solve
to generate next input

94 Constraint Solving Path constraint C1Æ C2Æ… Æ Ck Æ… Cn Solve
to generate next input If Ck is pointer constraint then invoke pointer solver Ci1 Æ Ci2 Æ … Æ : Ck where ij 2 [1..k-1] and Cij is pointer constraint If Ck is arithmetic constraint then invoke linear solver Ci1Æ Ci2 Æ … Æ : Ck where ij 2 [1..k-1] and Cij is arithmetic constraint

95 Optimizations Fast un-satisfiability check C1Æ C2Æ… Æ : Ck

96 Optimizations Fast un-satisfiability check C1Æ C2Æ… Æ : Ck
Eliminate same constraints (y>2) Æ (y>2)Æ :(y==4) (y>2)Æ :(y==4) that is C1Æ C2Æ…Ci … Cj … Æ : Ck = C1Æ C2Æ… Ci … Æ : Ck if Ci = Cj

97 Optimizations Fast un-satisfiability check C1Æ C2Æ… Æ : Ck
Eliminate same constraints (y>2) Æ (y>2)Æ :(y==4) (y>2)Æ :(y==4) that is C1Æ C2Æ…Ci … Cj … Æ : Ck = C1Æ C2Æ… Ci … Æ : Ck if Ci = Cj Eliminate non-dependent constraints (x==1) Æ (y>2)Æ :(y==4)

98 SGLIB: popular library for C data-structures
Used in Xrefactory a commercial tool for refactoring C/C++ programs Found two bugs in sglib 1.0.1 reported them to authors fixed in sglib 1.0.2 Bug 1: doubly-linked list library segmentation fault occurs when a non-zero length list is concatenated with a zero-length list discovered in 140 iterations ( < 1second) Bug 2: hash-table an infinite loop in hash table is member function 193 iterations (1 second)

99 SGLIB: popular library for C data-structures
Name Run time in seconds # of Iterations # of Branches Explored % Branch Coverage # of Functions Tested # of Bugs Found Array Quick Sort 2 732 43 97.73 Array Heap Sort 4 1764 36 100.00 Linked List 570 100 96.15 12 Sorted List 1020 110 96.49 11 Doubly Linked List 3 1317 224 99.12 17 1 Hash Table 193 46 85.19 8 Red Black Tree 2629 1,000,000 242 71.18 This table gives the various statistics of testing the SGLIB library. For most of the data-structures, the runtime is less than 5 seconds. For red-black tree, we did not manage to get complete path coverage after 45 minutes. So we stopped testing it any further. In most of the cases the % branch coverage is as higher than 96%. For hashtable the branch coverage is 85% because we lost some completeness due to the presence of a complex hash function.

100 Experiments: Needham-Schroeder Protocol
Tested a C implementation of a security protocol (Needham-Schroeder) with a known attack 600 lines of code Took less than 13 seconds on a machine with 1.8GHz Xeon processor and 2 GB RAM to discover middle-man attack In contrast, a software model-checker (VeriSoft) took 8 hours

101 Testing Data-structures of CUTE itself
Unit tested several non-standard data-structures implemented for the CUTE tool cu_depend (used to determine dependency during constraint solving using graph algorithm) cu_linear (linear symbolic expressions) cu_pointer (pointer symbolic expresiions) Discovered a few memory leaks and a copule of segmentation faults these errors did not show up in other uses of CUTE for memory leaks we used CUTE in conjunction with Valgrind

102 Entire Computation Tree
Limitations Path Space of a Large Program is Huge Path Explosion Problem Entire Computation Tree

103 Entire Computation Tree Explored by Concolic Testing
Limitations Path Space of a Large Program is Huge Path Explosion Problem Entire Computation Tree Explored by Concolic Testing

104 Entire Computation Tree Explored by Concolic Testing
Limitations Path Space of a Large Program is Huge Path Explosion Problem Entire Computation Tree Explored by Concolic Testing

105 Hybrid Concolic Testing (joing work with Rupak Majumdar)

106 Limitations: A Comparative View
Concolic: Broad, shallow Random: Narrow, deep

107 Hybrid Concolic Testing
Interleave Random Testing and Concolic Testing to increase coverage Motivated by similar idea used in VLSI design validation: Ganai et al. 1999, Ho et al. 2000

108 Hybrid Concolic Testing
Interleave Random Testing and Concolic Testing to increase coverage while (not required coverage) { while (not saturation) perform random testing; Checkpoint; while (not increase in coverage) perform concolic testing; Restore; }

109 Hybrid Concolic Testing
Interleave Random Testing and Concolic Testing to increase coverage while (not required coverage) { while (not saturation) perform random testing; Checkpoint; while (not increase in coverage) perform concolic testing; Restore; } Deep, broad search Hybrid Search

110 Hybrid Concolic Testing
4x more coverage than random testing 2x more coverage than concolic testing

111 Results

112 Results

113 Model-Checking and Formal Methods Automated Theorem Proving
Summary Static Analysis Model-Checking and Formal Methods Program Verification There are several approaches that facilitate software reliability: type systems and safe programming languages, static analysis, software model checking, runtime monitoring, testing and so on… these methods help, but they have limitations. For example, testing is mostly done ad-hoc by manually translating specifications into test cases. Safe programming languages are good at preventing many common bugs, but they are normally not as efficient as C. Constraint Solving Automated Theorem Proving

114 Model-Checking and Formal Methods Automated Theorem Proving
Summary Static Analysis Model-Checking and Formal Methods Program Verification Testing There are several approaches that facilitate software reliability: type systems and safe programming languages, static analysis, software model checking, runtime monitoring, testing and so on… these methods help, but they have limitations. For example, testing is mostly done ad-hoc by manually translating specifications into test cases. Safe programming languages are good at preventing many common bugs, but they are normally not as efficient as C. Constraint Solving Automated Theorem Proving

115 References DART: Directed Automated Random Testing. Patrice Godefroid, Nils Klarlund, and Koushik Sen (PLDI 2005) CUTE: A Concolic Unit Testing Engine for C. Koushik Sen, Darko Marinov, and Gul Agha (ESEC/FSE 2005) "Dynamic Test Input Generation for Database Applications” Michael Emmi, Rupak Majumdar, and Koushik Sen, International Symposium on Software Testing and Analysis (ISSTA'07), ACM, 2007. "Hybrid Concolic Testing” Rupak Majumdar and Koushik Sen, 29th International Conference on Software Engineering (ICSE'07), IEEE, 2007. "A Race-Detection and Flipping Algorithm for Automated Testing of Multi-Threaded Programs” Koushik Sen and Gul Agha, Haifa verification conference 2006 (HVC'06), Lecture Notes in Computer Science, Springer, 2006. "CUTE and jCUTE : Concolic Unit Testing and Explicit Path Model-Checking Tools” Koushik Sen and Gul Agha, 18th International Conference on Computer Aided Verification (CAV'06), Lecture Notes in Computer Science, Springer, (Tool Paper). "Automated Systematic Testing of Open Distributed Programs” Koushik Sen and Gul Agha, International Conference on Fundamental Approaches to Software Engineering (FASE'06) (ETAPS'06 conference), Lecture Notes in Computer Science, Springer, 2006. “EXE: Automatically Generating Inputs of Death,” Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, Dawson R. Engler, 13th ACM Conference on Computer and Communications Security, 2006. “Automatically generating malicious disks using symbolic execution,” Junfeng Yang, Can Sar, Paul Twohey, Cristian Cadar, and Dawson Engler. IEEE Security and Privacy, 2006.


Download ppt "DART and CUTE: Concolic Testing"

Similar presentations


Ads by Google