Symbolic execution © Marcelo d’Amorim 2010

Goal and Input-Output
Automate test input data generation
– Input: parameterized function call
– Output: inputs such that all* paths are explored

    foo(int x, int y) {
        if (x > y) { ... } else { ... }
    }

foo($a, $b);  ->  Symbolic Execution  ->  foo(1,0); foo(0,0)

Attention!
Function foo can be arbitrarily complex
– It may use other types, call other functions, contain loops and branches, etc.
One can obtain tests with user-defined assertions

Opening the box…
foo($a, $b);  ->  Symbolic Execution (constraint generation -> path conditions -> constraint solving)  ->  foo(1,0); foo(0,0)
– A path condition is a description of a path as a function of the symbolic inputs.
– Symbolic execution explores all program paths.

    foo(int x, int y) {
        if (x > y) { ... } else { ... }
    }

Constraint generation yields the path conditions $a > $b and $a <= $b. Constraint solving turns them into the inputs foo(1,0) and foo(0,0).

Exercise
Generate the path conditions for this program.

    void bar1(int x) {
        if (x > 0) { … }
        else if (x < 0) { … }
        else { ERROR; }
    }

Exercise
Generate the path conditions for this program.

    void bar2(int x) {
        if (x > 0) {
            if (x > 10) { … }
        } else if (x < 0) {
            if (x < 2) { … }
        } else { ERROR; }
    }

Infeasible path! The path with x < 0 and x >= 2 has no solution.
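
One way to check an answer to this exercise is to write the path conditions down as predicates and ask a solver for a witness of each. The sketch below is only an illustration: the class Bar2Paths, its method names, and the search bound are assumptions made here, and the bounded search is a stand-in for a real constraint solver. Finding no witness inside the bound is not a proof of infeasibility in general, although for this program the contradictory path is easy to see by inspection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical helper: path conditions of bar2 written by hand, each checked
// with a bounded search that stands in for a constraint solver.
class Bar2Paths {
    // Search bound for the stand-in solver (an assumption of this sketch).
    static final int BOUND = 100;

    // One entry per syntactic path of bar2, in program order.
    static Map<String, IntPredicate> pathConditions() {
        Map<String, IntPredicate> pcs = new LinkedHashMap<>();
        pcs.put("x > 0 and x > 10", x -> x > 0 && x > 10);
        pcs.put("x > 0 and x <= 10", x -> x > 0 && x <= 10);
        pcs.put("x <= 0 and x < 0 and x < 2", x -> x <= 0 && x < 0 && x < 2);
        pcs.put("x <= 0 and x < 0 and x >= 2", x -> x <= 0 && x < 0 && x >= 2);
        pcs.put("x <= 0 and x >= 0", x -> x <= 0 && x >= 0);
        return pcs;
    }

    // Returns a witness in [-BOUND, BOUND], or null if none exists in that range.
    static Integer solve(IntPredicate pc) {
        for (int x = -BOUND; x <= BOUND; x++) {
            if (pc.test(x)) return x;
        }
        return null;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, IntPredicate> e : pathConditions().entrySet()) {
            Integer w = solve(e.getValue());
            System.out.println(e.getKey() + " -> " + (w == null ? "no witness" : "x = " + w));
        }
    }
}
```

Each witness is a test input that drives bar2 down the corresponding path. The fourth condition yields none.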

Exercise
Generate the path conditions for this program. Hint: ignore paths with length > 2.

    int fact(int n) {
        return (n > 0) ? n * fact(n - 1) : 1;
    }

Repeated states.
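
Because fact is recursive, the number of paths is unbounded, so a bound is needed. The sketch below assumes that "length" in the hint means the number of recursive calls on a path, which is only one possible reading, and it assumes the intended body is (n > 0) ? n * fact(n - 1) : 1. The class FactPaths and its method are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: textual path conditions of fact($n), bounded by the
// number of recursive calls on the path.
class FactPaths {
    static List<String> pathConditions(int maxCalls) {
        List<String> result = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int k = 0; k <= maxCalls; k++) {
            // Argument of the k-th nested call, expressed over the symbolic input.
            String arg = (k == 0) ? "$n" : "$n - " + k;
            // Path that stops after k recursive calls: all earlier tests
            // succeeded and the test on this argument fails.
            result.add(prefix + arg + " <= 0");
            prefix.append(arg).append(" > 0 and ");
        }
        return result;
    }

    public static void main(String[] args) {
        for (String pc : pathConditions(2)) {
            System.out.println(pc);
        }
    }
}
```

Raising maxCalls adds one longer path per step, and every new path repeats the same test on a shifted argument.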

Part 1: constraint generator
Modifies program semantics to handle symbolic state
– Stack, heap, and static area hold symbolic values
Two popular alternatives
– Instrumentation
– Modified interpreter (e.g., Java Virtual Machine)

Instrumentation
Original:

    foo(int x) {
        x = x + 1;
        if (x > 10) {
            // …
        } else {
            // …
        }
    }

Instrumented:

    foo(SymInt x) {
        x = x.add(ONE);                // types and operations
        if (x.gt(TEN).choose()) {      // choice
            // …
        } else {
            // …
        }
    }
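
To make the instrumented version concrete, here is a minimal sketch of what SymInt and choose() could look like. Only the names SymInt, add, gt, choose, ONE, and TEN come from the slide. SymBool, Explorer, InstrumentedFoo, and the replay-based exploration are assumptions introduced for illustration, and the sketch performs no feasibility check, so it would also report infeasible paths.

```java
import java.util.ArrayList;
import java.util.List;

// Symbolic integer: stores an expression over the inputs instead of a value.
class SymInt {
    final String expr;
    SymInt(String expr) { this.expr = expr; }
    static SymInt constant(int c) { return new SymInt(Integer.toString(c)); }
    SymInt add(SymInt o) { return new SymInt("(" + expr + " + " + o.expr + ")"); }
    SymBool gt(SymInt o) { return new SymBool(expr + " > " + o.expr); }
}

// Symbolic boolean (hypothetical): the result of a symbolic comparison.
class SymBool {
    final String expr;
    SymBool(String expr) { this.expr = expr; }
    // Picks a branch outcome and appends the matching constraint to the path condition.
    boolean choose() {
        boolean taken = Explorer.nextDecision();
        Explorer.pathCondition.add(taken ? expr : "!(" + expr + ")");
        return taken;
    }
}

// Hypothetical driver: explores paths by re-running the program with a
// recorded schedule of branch decisions.
class Explorer {
    static List<String> pathCondition = new ArrayList<>();
    static List<Boolean> schedule = new ArrayList<>();
    static int position;

    static boolean nextDecision() {
        if (position == schedule.size()) schedule.add(true); // unseen branch: try true first
        return schedule.get(position++);
    }

    // Runs the program once per path and returns one path condition per run.
    static List<String> explore(Runnable program) {
        List<String> paths = new ArrayList<>();
        schedule = new ArrayList<>();
        while (true) {
            pathCondition = new ArrayList<>();
            position = 0;
            program.run();
            paths.add(String.join(" and ", pathCondition));
            // Backtrack: skip trailing false decisions, then flip the last true one.
            int i = schedule.size() - 1;
            while (i >= 0 && !schedule.get(i)) i--;
            if (i < 0) return paths;
            schedule = new ArrayList<>(schedule.subList(0, i));
            schedule.add(false);
        }
    }
}

class InstrumentedFoo {
    static final SymInt ONE = SymInt.constant(1);
    static final SymInt TEN = SymInt.constant(10);

    static void foo(SymInt x) {
        x = x.add(ONE);
        if (x.gt(TEN).choose()) { /* then-branch */ } else { /* else-branch */ }
    }

    public static void main(String[] args) {
        for (String pc : Explorer.explore(() -> foo(new SymInt("$x")))) {
            System.out.println(pc);
        }
    }
}
```

Re-running the program per path keeps the sketch short. A modified interpreter would instead save and restore program state at each choice point.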

Discussion
– What would you need to modify in a JVM to run programs in symbolic execution mode?
– What are the pros and cons of an instrumentation-based solution vs. a modified JVM?

Part 2: constraint solver
Decision procedures can be used to solve simple constraints. For example:
– Integer linear arithmetic: x > y + z and z < y
Unfortunately, symbolic execution can generate complex constraints
– Undecidable, intractable, or just not handled by decision procedures
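
To show what "solving" means for the example constraint, the sketch below searches a small box of integer values for an assignment that satisfies x > y + z and z < y. This is not a decision procedure: the class BoundedSolver, the Constraint3 interface, and the bound are assumptions made for illustration, and a null result only means that no solution exists inside the box.

```java
// Hypothetical stand-in for a solver: exhaustive search over a bounded box.
class BoundedSolver {
    // A constraint over three integer variables.
    interface Constraint3 {
        boolean holds(int x, int y, int z);
    }

    // Returns {x, y, z} satisfying c with every value in [-bound, bound], or null.
    static int[] solve(Constraint3 c, int bound) {
        for (int x = -bound; x <= bound; x++)
            for (int y = -bound; y <= bound; y++)
                for (int z = -bound; z <= bound; z++)
                    if (c.holds(x, y, z)) return new int[] {x, y, z};
        return null;
    }

    public static void main(String[] args) {
        int[] s = solve((x, y, z) -> x > y + z && z < y, 5);
        System.out.println(s == null ? "no solution in range"
                                     : "x=" + s[0] + " y=" + s[1] + " z=" + s[2]);
    }
}
```

The cost grows with the cube of the bound, which is one reason real tools delegate linear constraints to dedicated decision procedures.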

Pointers for the interested
– JVM symbolic execution: AQUA and SPF
– Complex constraints: CORAL or FloPSy
Links:
– AQUA and CORAL:
– SPF: google JPF and symb project
– FloPSy: us/people/nikolait/

Objects: Lazy initialization
A symbolic object is an “unknown blob”.
– Execution refines the blob on demand
Assignment example: o.f = exp
– Variable o holds a symbolic object (the blob)
– 3 possible outcomes, depending on the blob:
   it is null
   it is a not-yet-seen object
   it is an already-seen object
Concretize the heap while making choices
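
The three outcomes can be enumerated mechanically when an uninitialized reference is first used. The sketch below assumes a single node type and represents the symbolic heap simply as the list of objects materialized so far on the current path. SymNode, LazyInit, and the "$fresh" naming scheme are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical symbolic heap object, identified only by a name.
class SymNode {
    final String name;
    SymNode(String name) { this.name = name; }
}

class LazyInit {
    // All ways to concretize an uninitialized reference, given the objects of
    // the same type already materialized on the current path.
    static List<SymNode> choices(List<SymNode> seen) {
        List<SymNode> options = new ArrayList<>();
        options.add(null);                                // 1. the reference is null
        options.add(new SymNode("$fresh" + seen.size())); // 2. a not-yet-seen object
        options.addAll(seen);                             // 3. an alias of an already-seen object
        return options;
    }

    public static void main(String[] args) {
        List<SymNode> seen = new ArrayList<>();
        seen.add(new SymNode("$a"));
        for (SymNode n : choices(seen)) {
            System.out.println(n == null ? "null" : n.name);
        }
    }
}
```

A symbolic executor would fork one path per option, so the number of choices grows with the number of objects already on the heap.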

Example

    Node root;
    add(Node n) {
        if (root == null) {
            root = n;
        } else {
            int v = root.val;
            if (v < n.val) { … }
            …
        }
    }

    BST bst = new BST();
    bst.add($a);
    bst.add($b);

Notation: Primitive fields inside the box. Reference fields outside (omission indicates null). Dashed borders indicate symbolic objects.

Example (continued)
Figure: heap after bst.add($a), with the root field of bst pointing to $a, followed by the heaps explored for bst.add($b) (nodes bst, $a, $b; fields root, left, right; values $x, $y). The paths are labeled with these path conditions:
– $a == null
– $a != null and $a.val = $x and $b.val = $y and $y < $x
– $a != null and $a.val = $x and $b.val = $y and $x = $y
– $a != null and $a.val = $x and $b.val = $y and $y > $x
One outcome in the figure is marked NPE!

Strings
Two approaches
– A string is an array of symbolic characters
– Symbolic string + special interpretation of library methods
First approach can be too expensive. Why?

    foo(String s) {
        … if (s.equals("hello")) { … } …
    }

Automata for string constraints
The second approach builds finite automata for the string constraints that library calls generate
Constraint solving = automata walk!
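
The "automata walk" can be sketched with two small DFAs, their product, and a breadth-first search for an accepted string. Everything in this block is an assumption made for illustration: the Dfa interface, the integer state encoding, and the restriction of endsWith to a single character. Substring constraints such as indexOf are not covered here.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class StringAutomata {
    // Hypothetical DFA interface with states numbered 0 .. size() - 1.
    interface Dfa {
        int start();
        int step(int state, char c);
        boolean accepting(int state);
        int size();
    }

    // $s.startsWith(p): states 0..n count matched characters, n + 1 is a dead state.
    static Dfa startsWith(final String p) {
        final int n = p.length();
        return new Dfa() {
            public int start() { return 0; }
            public int step(int s, char c) {
                if (s == n) return n;                       // prefix matched: stay accepting
                if (s < n && p.charAt(s) == c) return s + 1;
                return n + 1;                               // mismatch: dead state
            }
            public boolean accepting(int s) { return s == n; }
            public int size() { return n + 2; }
        };
    }

    // $s.endsWith(last) for a single character: state 1 iff the last char read was it.
    static Dfa endsWith(final char last) {
        return new Dfa() {
            public int start() { return 0; }
            public int step(int s, char c) { return c == last ? 1 : 0; }
            public boolean accepting(int s) { return s == 1; }
            public int size() { return 2; }
        };
    }

    // Conjunction of two constraints: product automaton, pairs encoded as a * |b| + b.
    static Dfa and(final Dfa a, final Dfa b) {
        return new Dfa() {
            public int start() { return a.start() * b.size() + b.start(); }
            public int step(int s, char c) {
                return a.step(s / b.size(), c) * b.size() + b.step(s % b.size(), c);
            }
            public boolean accepting(int s) {
                return a.accepting(s / b.size()) && b.accepting(s % b.size());
            }
            public int size() { return a.size() * b.size(); }
        };
    }

    // Solving = walking: BFS returns a shortest accepted string, or null if none exists.
    static String witness(Dfa d, char[] alphabet) {
        Map<Integer, String> reached = new HashMap<>();
        Queue<Integer> queue = new ArrayDeque<>();
        reached.put(d.start(), "");
        queue.add(d.start());
        while (!queue.isEmpty()) {
            int s = queue.remove();
            if (d.accepting(s)) return reached.get(s);
            for (char c : alphabet) {
                int t = d.step(s, c);
                if (!reached.containsKey(t)) {
                    reached.put(t, reached.get(s) + c);
                    queue.add(t);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Dfa both = and(startsWith("hi"), endsWith('.'));
        System.out.println(witness(both, new char[] {'h', 'i', '.'}));
    }
}
```

Because the automata are finite, the search always terminates, and an unsatisfiable conjunction shows up as an empty walk rather than a timeout.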

Exercise
Generate automata to characterize these constraints

    $s.startsWith("hello") and $s.indexOf("class") != -1 and $s.endsWith(".")

Concolic execution (a.k.a. dynamic symbolic execution or whitebox fuzzing)
Several problems with standard symbolic execution. In particular:
– Exploration of infeasible paths
– Symbolic arrays
– Handling of loops and recursion
– Native method calls

Concolic Execution: How it works
1. Execute the program with concrete and symbolic inputs
2. Save decisions as before, but execute a single path!
3. Solve pending decisions and go back to step 1
Can go from symbolic to concrete domain anytime during execution!
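
The three steps above can be sketched as a worklist loop. The program under test is bar1 from the earlier exercise, changed here to return a label for the path it took. The class Concolic, the branch helper, and the bounded-search solver are assumptions made for illustration. A real tool would record symbolic expressions and call a constraint solver instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

class Concolic {
    // Constraints over the input recorded along the current concrete run.
    static List<IntPredicate> trace = new ArrayList<>();

    // Records the branch condition as taken and returns its concrete outcome.
    static boolean branch(int input, IntPredicate cond) {
        boolean taken = cond.test(input);
        trace.add(taken ? cond : cond.negate());
        return taken;
    }

    // Program under test: bar1, returning a label for the executed path.
    static String bar1(int x) {
        if (branch(x, v -> v > 0)) return "positive";
        else if (branch(x, v -> v < 0)) return "negative";
        else return "ERROR";
    }

    // Stand-in solver: bounded search for a value satisfying all constraints.
    static Integer solve(List<IntPredicate> constraints) {
        for (int v = -100; v <= 100; v++) {
            boolean ok = true;
            for (IntPredicate c : constraints) ok &= c.test(v);
            if (ok) return v;
        }
        return null;
    }

    static Set<String> explore(int seed) {
        Set<String> outcomes = new LinkedHashSet<>();
        Set<Integer> tried = new LinkedHashSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.add(seed);
        while (!worklist.isEmpty()) {
            int input = worklist.remove();
            if (!tried.add(input)) continue;        // skip inputs that already ran
            trace = new ArrayList<>();
            outcomes.add(bar1(input));              // steps 1 and 2: one concrete path
            List<IntPredicate> path = trace;
            for (int i = 0; i < path.size(); i++) { // step 3: flip each saved decision
                List<IntPredicate> query = new ArrayList<>(path.subList(0, i));
                query.add(path.get(i).negate());
                Integer next = solve(query);
                if (next != null) worklist.add(next);
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(explore(5));
    }
}
```

Since every run follows a path that a concrete input actually takes, this loop never reports an infeasible path. Unsatisfiable flipped queries are simply dropped.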

Summary
– Important technique to automate testing
– Found real errors in file systems, OS, network protocols, and several data structures
– See www.coverity.com for industrial applications

What I believe is still missing
– Automation of driver and oracle generation
– Exploit natural parallelism
Figure: SYMB.EXE sends queries to several solvers (Solver, YICES, …) and receives solutions.