Advanced Concepts for/using Symbolic Execution

Presentation transcript:

Advanced Concepts for/using Symbolic Execution Willem Visser Stellenbosch University

Overview
Optimizing constraint solving: Green overview; Green usage and demos
Model counting and its uses: probabilistic symbolic execution; reliability; program understanding

Green: Reduce, Reuse and Recycle Constraints in Program Analysis Willem Visser Stellenbosch University Joint work with Jaco Geldenhuys and Matt Dwyer

What is Symbolic Execution Executing a program with symbolic inputs Collect all constraints to execute a path through code, called Path Condition Stop when Path Condition becomes infeasible Many uses Checking for errors, without running the code Solve feasible constraints to get inputs for test cases

Decision Procedures Huge advances in the last 15 years Many great tools Z3, Yices, CVC3, STP, … Satisfiability is NP-complete Worst case complexity is exponential in the size of the formula Our goal is to make these tools even better, without changing a line of code inside them!

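int m(int x, int y) {
  if (x < 0) x = -x;
  if (y < 0) y = -y;
  if (x < 10) { return 1; }
  else if (9 < y) { return -1; }
  else { return 0; }
}
Symbolic execution tree, one level per branch (program condition in brackets, followed by the symbolic constraints on the two outgoing edges):
[ x < 0 ]: X < 0 | !(X < 0)
[ y < 0 ]: Y < 0 | !(Y < 0)
[ x < 10 ]: -X < 10 | !(-X < 10) where x was negated; X < 10 | !(X < 10) otherwise
[ 9 < y ]: 9 < -Y | !(9 < -Y) where y was negated; 9 < Y | !(9 < Y) otherwise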

Don't need the complete constraint to decide feasibility
[Figure: the same tree, annotated with the full path condition along the leftmost path.]
X < 0 /\ Y < 0
X < 0 /\ Y < 0 /\ !(-X < 10)
X < 0 /\ Y < 0 /\ !(-X < 10) /\ 9 < -Y

Slicing constraints leads to the same constraints in different places
[Figure: the same tree with sliced constraints. At the [ 9 < Y ] level the slice X < 0 /\ !(-X < 10) appears in two places, as does !(X < 0) /\ !(X < 10). The leaves carry slices over Y only, e.g. Y < 0 /\ 9 < -Y.]
These two constraints are the same!

Canonization of Constraints
Canonical form: ax + by + cz + … + k {<=, =, !=} 0
Scale by -1 to transform > and >= to < and <=
Add 1 to transform < to <=
Example 1: X < 0 /\ !(-X < 10)  =>  X < 0 /\ -X >= 10  =>  X < 0 /\ X <= -10  =>  X + 1 <= 0 /\ X + 10 <= 0  =>  V0 + 1 <= 0 /\ V0 + 10 <= 0
Example 2: Y < 0 /\ 9 < -Y  =>  Y < 0 /\ Y < -9  =>  Y < 0 /\ Y + 9 < 0  =>  Y + 1 <= 0 /\ Y + 10 <= 0  =>  V0 + 1 <= 0 /\ V0 + 10 <= 0
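The rewriting steps above can be sketched in a few lines of Java. This is a minimal illustration, not Green's actual canonizer: the class `CanonizeSketch`, its `Lin` type and the `1*V0 + 1 <= 0` output format are all assumptions made for this example, and only the operators <, <=, > and >= over integers are handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CanonizeSketch {
    /** One conjunct: sum(coeff * var) + constant OP 0, with OP one of <, <=, >, >=. */
    static final class Lin {
        final TreeMap<String, Integer> coeffs = new TreeMap<>();
        final String op;
        final int constant;
        Lin(String op, int constant) { this.op = op; this.constant = constant; }
        Lin term(int coeff, String var) { coeffs.merge(var, coeff, Integer::sum); return this; }
    }

    /** Rewrites a conjunct into the form sum + k <= 0 (integer arithmetic). */
    static Lin normalize(Lin c) {
        int sign = (c.op.equals(">") || c.op.equals(">=")) ? -1 : 1; // scale by -1
        int bump = (c.op.equals("<") || c.op.equals(">")) ? 1 : 0;   // strict: add 1
        Lin n = new Lin("<=", sign * c.constant + bump);
        c.coeffs.forEach((v, a) -> n.term(sign * a, v));
        return n;
    }

    /** Prints a normalized conjunct, applying the variable renaming where one exists. */
    static String render(Lin c, Map<String, String> names) {
        StringBuilder sb = new StringBuilder();
        c.coeffs.forEach((v, a) ->
            sb.append(a).append('*').append(names.getOrDefault(v, v)).append(" + "));
        return sb.append(c.constant).append(" <= 0").toString();
    }

    /** Normalize, order the conjuncts lexicographically, then rename variables to V0, V1, ... */
    static String canonize(List<Lin> conjuncts) {
        List<Lin> norm = new ArrayList<>();
        for (Lin c : conjuncts) norm.add(normalize(c));
        norm.sort((a, b) -> render(a, Map.of()).compareTo(render(b, Map.of())));
        Map<String, String> names = new HashMap<>();
        List<String> parts = new ArrayList<>();
        for (Lin c : norm) {
            for (String v : c.coeffs.keySet()) {
                if (!names.containsKey(v)) names.put(v, "V" + names.size());
            }
            parts.add(render(c, names));
        }
        return String.join(" && ", parts);
    }
}
```

Feeding in the two slices from the slide, written as `X < 0`, `-X - 10 >= 0` and `Y < 0`, `Y + 9 < 0`, yields the same canonical string for both, which is what makes a stored result reusable.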

[Figure: the symbolic execution tree with every sliced constraint in canonical form. The first levels use V0+1 <= 0 and -V0 <= 0. The deeper levels use only four distinct constraints, each appearing in more than one place:]
V0+1 <= 0 /\ V0+10 <= 0
V0+1 <= 0 /\ -V0-9 <= 0
-V0 <= 0 /\ -V0+10 <= 0
-V0 <= 0 /\ V0-9 <= 0

What if we store the results? and reuse them to avoid recalculation

[Figure: the same tree, with each canonical constraint labeled by the number of its entry in the store.]
1: V0+1 <= 0
2: V0+1 <= 0 /\ -V0-9 <= 0
3: V0+1 <= 0 /\ V0+10 <= 0
4: -V0 <= 0
5: -V0 <= 0 /\ -V0+10 <= 0
6: -V0 <= 0 /\ V0-9 <= 0

Let's change the program!
int m(int x, int y) {
  if (x < 0) x = -x;
  if (y < 0) y = -y;
  if (x < 10) { return 1; }
  else if (10 < y) { return -1; }  // was: 9 < y
  else { return 0; }
}
Only the last 8 constraints are changed in the symbolic execution tree and 4 of them are reused.
Reusing the stored results from the first analysis eliminates 14 decision procedure calls!

Green
Reduce: slicing + canonization
Reuse: storing results
Recycle: across analyses of programs and even tools

Slicing Algorithm
PC = knownPC /\ newPC, where knownPC is known to be SAT
1. Build a constraint graph for knownPC /\ newPC: vertices are symbolic variables, with an edge between two variables if they occur in the same constraint
2. Find all variables R reachable from the variables in newPC
3. Return the conjunction of all the constraints containing variables in R
Classic symbolic execution: newPC is the last decision on the path; knownPC is all the rest
Dynamic symbolic execution: newPC is the negated conjunct; knownPC is all the other conjuncts
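A minimal Java sketch of the slicing steps follows. It computes the reachable variable set by fixed-point iteration instead of building an explicit graph, which gives the same set R. The class `SliceSketch` and its `Conjunct` type are hypothetical names for this illustration and are not part of Green.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SliceSketch {
    /** A conjunct, reduced to its text and the symbolic variables it mentions. */
    static final class Conjunct {
        final String text;
        final Set<String> vars;
        Conjunct(String text, String... vars) { this.text = text; this.vars = Set.of(vars); }
    }

    /** Returns the conjuncts of knownPC that share variables (transitively) with newPC, plus newPC. */
    static List<Conjunct> slice(List<Conjunct> knownPC, Conjunct newPC) {
        // R starts as the variables of newPC and grows until no conjunct adds a new variable.
        Set<String> reachable = new HashSet<>(newPC.vars);
        boolean grew = true;
        while (grew) {
            grew = false;
            for (Conjunct c : knownPC) {
                if (!Collections.disjoint(c.vars, reachable) && reachable.addAll(c.vars)) {
                    grew = true;
                }
            }
        }
        List<Conjunct> result = new ArrayList<>();
        for (Conjunct c : knownPC) {
            if (!Collections.disjoint(c.vars, reachable)) result.add(c);
        }
        result.add(newPC);
        return result;
    }
}
```

With knownPC = X < 0 /\ Y < 0 /\ !(-X < 10) and newPC = 9 < -Y, the slice keeps only Y < 0 and 9 < -Y, matching the example earlier in the deck.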

Factorizing Slicer
PC = C1 & C2 & … & Cn
Returns independent sub-constraints: PC = (C1 & C2) & (C3 & C4 & C5) & (… & Cn)

Three Parts to Canonization
Pre-heuristic: lexicographic reordering, e.g. X > Y vs Y < X => X > Y
Normal form: ax + by + cz + … + k {<=, =, !=} 0
Post-heuristic: 1. lexicographic order of constraints; 2. renaming based on order in constraints

NoSQL: in-memory key-value store
First hack took about 10 mins: download Redis, make, start; find a Java wrapper (Jedis); add 5 lines of code. Voilà!
Simply get("PC") and, if not found, put("PC", "T | F")
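The get/put pattern can be sketched as below. To keep the example self-contained, an in-memory `HashMap` stands in for Redis, and the solver is passed in as a plain predicate; `ResultStoreSketch` and its members are hypothetical names and this is not Green's store interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

final class ResultStoreSketch {
    // Stand-in for the key-value store: canonical constraint text -> SAT result.
    private final Map<String, Boolean> store = new HashMap<>();
    private final Predicate<String> solver;   // the expensive decision procedure call
    int solverCalls = 0;

    ResultStoreSketch(Predicate<String> solver) { this.solver = solver; }

    /** Looks the constraint up first and only calls the solver on a miss. */
    boolean isSat(String canonicalPC) {
        Boolean cached = store.get(canonicalPC);
        if (cached != null) return cached;          // hit: no solver call
        solverCalls++;
        boolean result = solver.test(canonicalPC);  // miss: solve once
        store.put(canonicalPC, result);             // and remember the answer
        return result;
    }
}
```

The key should be the sliced, canonized constraint, since that is what makes hits likely across paths, runs and programs.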

Storage is layered
Layers: localhost, colleague, offshore store
What you don't find locally, look for in other stores
Results are pushed back
New local results are pushed out

Results: Why Slice and Canonize?
Binomial Heap with all add/remove sequences of length 5; time in milliseconds
Column settings as listed on the slide: -store, +store, -canon, +canon
-slice: 95506  94739  96448  50467
+slice: 27129  27369  20410  5603

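Reuse between programs
[Figure: overlap of stored constraints between BinomialHeap, TreeMap and BinaryTree. Region counts on the slide: 155, 1, 4, 133, 38, 154. Reuse labels on the slide: "Only 3.1% reused" (next to BinomialHeap), "80.6% reused", "54.5% reused".]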

Green History First version was in support of Probabilistic Symbolic Execution (2011) Slicing constraints Reusing Latte counts within one run Made its own tool in 2012 Paper published at FSE 2012 Introduced Redis store to reuse across runs Current version at green-solver.googlecode.com Extensible pipeline of transformations

SAT Example Usage: Setup
solver = new Green();
props = new Properties();
props.setProperty("green.services", "sat");
props.setProperty("green.service.sat", "z3");
props.setProperty("green.service.sat.z3", "za.ac.sun.cs.green.service.z3.SATZ3JavaService");
config = new Configuration(solver, props);
config.configure();

SAT Example Usage: Calling
Instance green = new Instance(solver, null, Cons);
Boolean result = (Boolean) green.request("sat");

Counting Example Usage
solver = new Green();
props = new Properties();
props.setProperty("green.services", "count");
props.setProperty("green.service.count", "latte");
props.setProperty("green.service.count.latte", "za.ac.sun.cs.green.service.latte.CountLattEService");

Counting Example Usage
props.setProperty("green.services", "count");
props.setProperty("green.service.count", "(bounder latte)");
props.setProperty("green.service.count.bounder", "za.ac.sun.cs.green.service.bounder.BounderService");
props.setProperty("green.service.count.latte", "za.ac.sun.cs.green.service.latte.CountLattEService");
…
Apint result = (Apint) green.request("count");

Adding the Reusable Store solver = new Green(); props = new Properties(); props.setProperty( "green.store", "za.ac.sun.cs.green.store.redis.RedisStore"); …

What do you think this will do?
solver = new Green();
props = new Properties();
props.setProperty("green.taskmanager", "...green.taskmanager.ParallelTaskManager");
props.setProperty("green.services", "sat");
props.setProperty("green.service.sat", "choco z3");
…
Runs choco and z3 in parallel and takes the result of the first one to finish

Reporting
Every Green component keeps relevant statistics that can be accessed via a Reporter
SATChocoService:: invocationCount = 28
SATChocoService:: cacheHitCount = 0
SATChocoService:: cacheMissCount = 28
SATChocoService:: timeConsumption = 3829
SATZ3JavaService:: invocationCount = 28
SATZ3JavaService:: cacheHitCount = 0
SATZ3JavaService:: cacheMissCount = 28
SATZ3JavaService:: timeConsumption = 346

Let's try advanced features!
props.setProperty("green.services", "count");
props.setProperty("green.service.count", "(bounder (factorize (canonize latte)))");
props.setProperty("green.service.count.factorize", "...green.service.factorizer.CountFactorizerService");
props.setProperty("green.service.count.canonize", "...green.service.canonizer.SATCanonizerService");

CountFactorizerService
Splits the formula into independent parts
Count(C1 && C2 && C3 && C4 && C5) splits into Count(C1 && C2) × Count(C3) × Count(C4 && C5)
Note that each part can be found in the store
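The product rule holds because independent parts share no variables. The following Java sketch checks it by brute force on a tiny, made-up domain (each variable in 0..9). The class `FactorizedCountSketch` and the sample constraints are assumptions for illustration only; Green delegates real counting to LattE.

```java
import java.util.function.IntPredicate;

final class FactorizedCountSketch {
    /** A constraint over two integer variables. */
    interface Pred2 { boolean test(int x, int y); }

    static final int N = 10; // every variable ranges over 0..N-1 (illustrative domain)

    /** Number of (x, y) pairs satisfying the part over x and y. */
    static long count2(Pred2 p) {
        long n = 0;
        for (int x = 0; x < N; x++)
            for (int y = 0; y < N; y++)
                if (p.test(x, y)) n++;
        return n;
    }

    /** Number of z values satisfying the part over z. */
    static long count1(IntPredicate p) {
        long n = 0;
        for (int z = 0; z < N; z++)
            if (p.test(z)) n++;
        return n;
    }

    /** Number of (x, y, z) triples satisfying the whole conjunction, counted directly. */
    static long countJoint(Pred2 xy, IntPredicate zPart) {
        long n = 0;
        for (int x = 0; x < N; x++)
            for (int y = 0; y < N; y++)
                for (int z = 0; z < N; z++)
                    if (xy.test(x, y) && zPart.test(z)) n++;
        return n;
    }
}
```

For any two parts over disjoint variables, `countJoint(a, b)` equals `count2(a) * count1(b)`, so each factor can be counted, stored and reused separately.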

SATFactorizerService
Splits the formula into independent parts
SAT(C1 && C2 && C3 && C4 && C5) splits into SAT(C1 && C2) && SAT(C3) && SAT(C4 && C5)
Note that each part can be found in the store

SAT/Count CanonizerService
Pre-heuristic: lexicographic reordering, e.g. X > Y vs Y < X => X > Y
Canonical form: ax + by + cz + … + k {<=, =, !=} 0
Scale by -1 to transform > and >= to < and <=
Add 1 to transform < to <=
Post-heuristic: 1. lexicographic order of constraints; 2. renaming based on order in constraints

Example: Models
props.setProperty("green.services", "model");
props.setProperty("green.service.model", "(bounder z3)");
props.setProperty("green.service.model.z3", "...green.service.z3.ModelZ3JavaService");
Object result = green.request("model");
Map<IntVariable,Object> res = (Map<IntVariable,Object>) result;

Conclusions Please try it at green-solver.googlecode.com Lots of possible experiments to be done here We need to add CVC4 and STP We need to add support for Strings

Future Work
Extending model counting to other types
Green: reference types, strings, floats, etc.
Green: adding support for \/ efficiently
Is the number of actually occurring constraints in code "finite"?
How far can one push the Big Data idea?
Main goal now is to get as many people as possible to use Green
Ultimate goal: real-time developer feedback

The Green Framework
http://green-solver.googlecode.com
Already integrated into Symbolic PathFinder

Model Counting opens new doors in Program Analysis Willem Visser Stellenbosch University Joint work with Matt Dwyer (UNL, USA) Jaco Geldenhuys (SU, RSA) Corina Pasareanu (NASA, USA) Antonio Filieri (Stuttgart, Germany)

Saving the Whooping Crane

PC = C1 & C2 & … & Cn
Counting the solutions of PC also answers feasibility: PC is feasible exactly when its number of solutions is > 0

Resources
ISSTA 2012: Probabilistic Symbolic Execution
FSE 2012: Green: Reduce, Reuse and Recycle Constraints…
ICSE 2013: Software Reliability with Symbolic PathFinder
PLDI 2014: Compositional Solution Space Quantification for Probabilistic Software Analysis
FSE 2014 (accepted): Statistical Symbolic Execution with Informed Sampling
ASE 2014: Exact and Approximate Probabilistic Symbolic Execution for Nondeterministic Programs
Implemented in Symbolic PathFinder using LattE

In a perfect world… only linear integer constraints and only uniform distributions

Symbolic Execution
void test(int x, int y) {
  if (y == x*10) S0; else S1;
  if (x > 3 && y > 10) S2;
  S3;
}
Symbolic execution tree, starting from [ true ] test(X,Y):
[ Y=X*10 ] S0 | [ Y!=X*10 ] S1
[ X>3 & 10<Y=X*10 ] S2 | [ X>3 & 10<Y!=X*10 ] S2
[ Y=X*10 & !(X>3 & Y>10) ] S3 | [ Y!=X*10 & !(X>3 & Y>10) ] S3
Test(1,10) reaches S0,S3
Test(0,1) reaches S1,S3
Test(4,11) reaches S1,S2

Paths
[Figure: the same test(x, y) program and its symbolic execution tree, showing the paths from [ true ] to the four leaves.]

Paths and Rivers
[Figure: the same tree with statement labels removed; only the path conditions remain on the nodes.]

Almost Rivers
void test(int x, int y: 0..99) { if (y == x*10) S0; else S1; if (x > 3 && y > 10) S2; S3; }
[Figure: the tree with branch labels y=10x and x>3 & y>10, and the four leaves numbered 1 to 4:
[ Y=X*10 & !(X>3 & Y>10) ], [ X>3 & 10<Y=X*10 ], [ X>3 & 10<Y!=X*10 ], [ Y!=X*10 & !(X>3 & Y>10) ].]
Which of 1, 2, 3 or 4 is the most likely?

Rivers
[Figure: the same program (x, y in 0..99) and tree, with branch labels y=10x and x>3 & y>10.]

LattE Model Counter
http://www.math.ucdavis.edu/~latte/
Count solutions for a conjunction of linear inequalities

Rivers of Values
void test(int x, int y: 0..99) { if (y == x*10) S0; else S1; if (x > 3 && y > 10) S2; S3; }
Number of input values flowing down each edge:
[ true ]: 10^4
[ Y=X*10 ]: 10
[ Y!=X*10 ]: 9990
[ X>3 & 10<Y=X*10 ]: 6
[ Y=X*10 & !(X>3 & Y>10) ]: 4
[ X>3 & 10<Y!=X*10 ]: 8538
[ Y!=X*10 & !(X>3 & Y>10) ]: 1452

Program Understanding
[Figure: the same tree with its solution counts: 10^4 at the root; 10 and 9990 on the y=10x branch; 6, 4, 8538 and 1452 at the leaves.]

A Path Condition defines the constraints on the inputs to execute a path
How likely is a PC to be satisfied?
$\mathrm{Prob}(PC) = \frac{\#\,\text{solutions to the PC}}{\text{domain size}}$, assuming a uniform distribution of values
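For the small test(x, y) example with x and y in 0..99, the counts can be reproduced by plain enumeration. The Java sketch below uses brute force where the talk uses LattE; `PathCountSketch` and its methods are hypothetical names for this illustration.

```java
final class PathCountSketch {
    /**
     * Counts the inputs reaching each leaf of test(x, y) for x, y in 0..99.
     * Index 0: Y=X*10 & (X>3 & Y>10)      Index 1: Y=X*10 & !(X>3 & Y>10)
     * Index 2: Y!=X*10 & (X>3 & Y>10)     Index 3: Y!=X*10 & !(X>3 & Y>10)
     */
    static long[] countPaths() {
        long[] counts = new long[4];
        for (int x = 0; x <= 99; x++) {
            for (int y = 0; y <= 99; y++) {
                boolean eq = (y == x * 10);
                boolean s2 = (x > 3 && y > 10);
                counts[(eq ? 0 : 2) + (s2 ? 0 : 1)]++;
            }
        }
        return counts;
    }

    /** Probability of a path condition under a uniform input distribution. */
    static double probability(long solutions, long domainSize) {
        return (double) solutions / domainSize;
    }
}
```

The four counts are 6, 4, 8538 and 1452, which sum to the domain size of 10^4. Dividing by 10^4 gives path probabilities 0.0006, 0.0004, 0.8538 and 0.1452.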

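Conditional and Path Probabilities
$P_c = \mathrm{Prob}(c \mid PC) = \frac{\mathrm{Prob}(c \wedge PC)}{\mathrm{Prob}(PC)}$
Let $P = \mathrm{Prob}(PC)$ be the probability of the path reaching the branch. Taking the $c$ branch (conditional probability $P_c$) gives path probability $P' = P_c \times P$; taking the $\lnot c$ branch (conditional probability $1 - P_c$) gives $P'' = (1 - P_c) \times P$.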

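Probabilities
Root: 1
Branch y=10x: 0.001 (taken), 0.999 (not taken)
Branch x>3 & y>10 given Y=X*10: 0.6 (taken), 0.4 (not taken)
Branch x>3 & y>10 given Y!=X*10: 0.855 (taken), 0.145 (not taken)
Path probabilities:
[ X>3 & 10<Y=X*10 ]: 0.0006
[ Y=X*10 & !(X>3 & Y>10) ]: 0.0004
[ X>3 & 10<Y!=X*10 ]: 0.8538
[ Y!=X*10 & !(X>3 & Y>10) ]: 0.1452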

Reliability
[Figure: the same probability tree. The slide marks a total probability of 0.9996 as reliable, i.e. every path except the one with probability 0.0004.]

Reliability with Symbolic Execution
void test(int x, int y) {  // x, y in 0..99
  boolean error = false;
  if (x > 0) {
    if (y == hash(x)) error = true;
    else …
  if (x > 3 && y > 10) assert !error;
}
int hash(int x) { if (0 <= x && x <= 10) return x*10; else return 0; }
What is the reliability? Uniform distribution: 0.9908

Usage Profiles
domain{ x : 0,99; y : 0,99; };
usageProfile{ x > y : 1/10; x <= y : 9/10; };
Constraints must be disjoint and cover the complete domain
Probabilities must add to 1

Reliability with Symbolic Execution
[Same test(x, y) and hash(x) code as on the earlier reliability slide.]
Profile                     Reliability
Uniform                     0.99080
x > y : 0.1                 0.99766
y > x : 0.1                 0.98407
x > 10 & y > 10 : 0.99      0.99995
x > 10 & y > 10 : 1         1.00000

Calculate Probabilities AFTER Symbolic Execution
Usage profile UP: c1 : p1, c2 : p2, …, cn : pn
$\mathrm{Prob}(PC \mid UP) = \sum_{i=1}^{n} \mathrm{Prob}(PC \mid c_i) \times p_i$
$\mathrm{Prob}(PC \mid c_i) = \frac{\mathrm{Prob}(PC \wedge c_i)}{\mathrm{Prob}(c_i)}$
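The weighted sum can be sketched in Java for the two-variable domain x, y in 0..99, again counting by enumeration where the talk uses LattE. `UsageProfileSketch`, `Pred` and `probUnderProfile` are hypothetical names, and the sketch assumes every profile clause has at least one solution.

```java
final class UsageProfileSketch {
    /** A constraint over the two inputs. */
    interface Pred { boolean test(int x, int y); }

    /** Number of (x, y) pairs in 0..99 x 0..99 that satisfy p. */
    static long count(Pred p) {
        long n = 0;
        for (int x = 0; x <= 99; x++)
            for (int y = 0; y <= 99; y++)
                if (p.test(x, y)) n++;
        return n;
    }

    /**
     * Prob(PC | UP) = sum over i of Prob(PC | c_i) * p_i,
     * with Prob(PC | c_i) = #(PC & c_i) / #c_i inside each clause.
     */
    static double probUnderProfile(Pred pc, Pred[] clauses, double[] weights) {
        double total = 0.0;
        for (int i = 0; i < clauses.length; i++) {
            Pred ci = clauses[i];
            long both = count((x, y) -> pc.test(x, y) && ci.test(x, y));
            long ciCount = count(ci);   // assumed non-zero
            total += weights[i] * both / (double) ciCount;
        }
        return total;
    }
}
```

With the profile x > y : 1/10 and x <= y : 9/10, the path condition Y=X*10 & !(X>3 & Y>10) has its four solutions (0,0), (1,10), (2,20), (3,30) all inside the x <= y clause, so its probability is 0.9 × 4 / 5050.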

NON Looping Programs
n failure paths, m success paths
$\mathrm{Reliability}(P) = \mathrm{Prob}_S(P)$
$\mathrm{Prob}_S(P) = \sum_{i=1}^{m} \mathrm{Prob}(PC^S_i \mid UP)$, where $PC^S_i$ is the path condition of the $i$-th success path

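Looping Programs => Bounded Analysis
n failure paths, m success paths, and paths whose outcome is unknown within the bound
$\mathrm{Prob}_S(P) = \sum_{i=1}^{m} \mathrm{Prob}(PC^S_i \mid UP)$
$\mathrm{Prob}_F(P) = \sum_{j=1}^{n} \mathrm{Prob}(PC^F_j \mid UP)$
$\mathrm{Prob}_G(P) = 1 - (\mathrm{Prob}_S(P) + \mathrm{Prob}_F(P))$
$\mathrm{Reliability}(P) \ge \mathrm{Prob}_S(P)$
$\mathrm{Confidence} = 1 - \mathrm{Prob}_G(P)$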

Time for a new example

void unlikely(int x, int y, int z : 1..1000) {
  if (x <= 50) { S0 }
  else {
    if (x == 500 && y == 500 && z == 500) { assert false; }
    S1
  }
}
The assertion violation has probability 10^-9

Statistical Symbolic Execution Informed Monte Carlo Sampling of Symbolic Paths + Confidence and Error Bounds based on Bayesian Estimation Confidence = 1, i.e. exact incremental analysis

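Monte Carlo Sampling of Symbolic Paths
Step 1: Calculate the conditional probability for a branch
$P_c = \mathrm{Prob}(c \mid PC) = \frac{\mathrm{Prob}(c \wedge PC)}{\mathrm{Prob}(PC)} = \frac{\#(c \wedge PC)}{\#PC}$
The $c$ branch is taken with probability $P_c$ and the $\lnot c$ branch with probability $1 - P_c$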

Monte Carlo Sampling of Symbolic Paths
Step 2: Take a random value and pick the c or !c direction
rand = throwDice();
if (rand <= Pc) pick c; else pick !c;

[Figure: first branch of unlikely(). Root count 10^9; branch x<=50 splits it into [ X<=50 ] with 50*10^6 solutions and [ X>50 ] with 950*10^6.]
[ X>50 ] is more likely to be picked

[Figure: the unlikely() tree expanded one more level. Root 10^9; [ X<=50 ] 50*10^6; [ X>50 ] 950*10^6, split by x==500 into [ X=500 ] with 10^6 and [ X>50 & X!=500 ] with 949*10^6; y==500 is the next branch.]
After 1 sample: covered only S1
After 100 samples: will likely also cover S0
After 10^5 samples: will likely hit x==500, but the Eagles will have to reunite before hitting the violation

Informed Sampling [Draining the river]
After every path sampled, remove the path cleverly
[Figure: the unlikely() tree with counts 10^9 at the root, 50*10^6 for [ X<=50 ], 950*10^6 for [ X>50 ], 10^6 for [ X=500 ] and 949*10^6 for [ X>50 & X!=500 ].]

Informed Sample 2
[Figure: the 949*10^6 path has been drained. Root 51*10^6; [ X<=50 ] 50*10^6; [ X>50 ] 10^6; [ X=500 ] 10^6.]

Informed Sample 3
[Figure: [ X<=50 ] has been drained. Root 10^6; [ X>50 ] 10^6; [ X=500 ] 10^6; next branch y==500.]

Informed Sample 4
[Figure: root 10^6; [ X>50 ] 10^6; [ X==500 ] 10^6; branch y==500 splits into [ X,Y==500 ] with 1*10^3 and [ X==500 & Y!=500 ] with 999*10^3.]

Informed Sample 5
[Figure: root 10^3; [ X>50 ] 10^3; [ X==500 ] 10^3; [ X,Y==500 ] 10^3; branch z==500 splits into 1 and 999 for [ X,Y==500 & Z!=500 ].]

[Figure: every remaining count is 1: root 1; [ X>50 ] 1; [ X==500 ] 1; [ X,Y==500 ] 1; [ X,Y,Z==500 ] 1.]
After 6 informed samples we hit the 10^-9 event
Confidence = 1, since we explored the complete space

Cool Feature of Informed Sampling First samples the most likely paths Then the slightly less likely paths Then the even less likely paths Until you get to the very unlikely paths
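The draining idea can be sketched as follows: pick each branch with probability proportional to its remaining count, then subtract the sampled leaf's count from every node on the path so the same path cannot be drawn again. This is a simplified illustration, not the Symbolic PathFinder implementation; the class, the label strings and the prebuilt tree for unlikely() are assumptions made for the example. Because this tree has five leaves, the sketch needs at most five samples to reach the violation; the slides count six, which may reflect a different counting convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class InformedSamplingSketch {
    static final class Node {
        final String label;
        long count;                                   // solutions not yet drained below this node
        final List<Node> children = new ArrayList<>();
        Node(String label, long count) { this.label = label; this.count = count; }
        Node add(Node... kids) { for (Node k : kids) children.add(k); return this; }
    }

    /** Samples one leaf by remaining counts, drains it, and returns its label (null if empty). */
    static String sampleAndDrain(Node root, Random rnd) {
        if (root.count <= 0) return null;
        List<Node> path = new ArrayList<>();
        Node cur = root;
        path.add(cur);
        while (!cur.children.isEmpty()) {
            // pick is uniform in [0, cur.count); children partition that range by their counts
            long pick = Math.min((long) (rnd.nextDouble() * cur.count), cur.count - 1);
            Node next = null;
            for (Node child : cur.children) {
                if (pick < child.count) { next = child; break; }
                pick -= child.count;
            }
            cur = next;
            path.add(cur);
        }
        long drained = cur.count;
        for (Node n : path) n.count -= drained;       // remove the sampled path from the river
        return cur.label;
    }

    /** Counts for unlikely(x, y, z) with each input in 1..1000. */
    static Node unlikelyTree() {
        Node xy = new Node("X,Y=500", 1_000L).add(
            new Node("X,Y=500 & Z!=500", 999L), new Node("X,Y,Z=500", 1L));
        Node x500 = new Node("X=500", 1_000_000L).add(
            new Node("X=500 & Y!=500", 999_000L), xy);
        Node xGt50 = new Node("X>50", 950_000_000L).add(
            new Node("X>50 & X!=500", 949_000_000L), x500);
        return new Node("true", 1_000_000_000L).add(new Node("X<=50", 50_000_000L), xGt50);
    }

    /** Number of informed samples drawn until the target leaf is hit (-1 if never). */
    static int samplesUntil(Node root, String target, Random rnd) {
        int n = 0;
        while (true) {
            String leaf = sampleAndDrain(root, rnd);
            if (leaf == null) return -1;
            n++;
            if (leaf.equals(target)) return n;
        }
    }
}
```

Calling `samplesUntil(unlikelyTree(), "X,Y,Z=500", new Random())` returns a value between 1 and 5, whereas plain Monte Carlo sampling would need on the order of 10^9 draws to expect one hit.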

Multithreaded Informed Sampling => Symbolic Execution
Only shared structure: PC => count
Run n threads, each doing informed sampling to reach a leaf
When you update, first check if any value will become <= 0; if so, terminate and pick a new path from the top
[Figure: the test(x, y) tree with counts 10^4 at the root; 10 and 9990 on the y=10x branch; 6, 4, 8538 and 1452 at the leaves [ X>3 & 10<Y=X*10 ], [ Y=X*10 & !(X>3 & Y>10) ], [ X>3 & 10<Y!=X*10 ], [ Y!=X*10 & !(X>3 & Y>10) ].]

Multithreaded Informed Sampling => Symbolic Execution (successive snapshots of the same tree)
1. Threads T1 and T2 start sampling; the leaf counts are 8538, 1452, 6 and 4.
2. The 8538 path has been drained; T2 follows the remaining 1452 solutions of the y != 10x branch. The y = 10x branch still holds 10, split 6 / 4.
3. The y != 10x branch is fully drained; only the 10 solutions of y = 10x remain, split 6 / 4.
4. T2 and T1 take the two remaining leaves.
5. All counts are drained.

Informed Sampling as a search heuristic for Concolic execution when negating constraints pick the path with the most values flowing down it next

More Probabilistic Topics
Nondeterminism: Markov Decision Processes; finding an optimal scheduler to resolve nondeterminism
Domains that symbolic execution has trouble with: non-linear, floating point, strings
Probabilistic programming: biological/ecological models

Markov Decision Processes
public static void testMethod1(int x) {
  if (Verify.getBoolean()) {
    if (x <= 60) println("success"); else assert false;
  } else {
    if (x <= 30) …
  }
  if (x <= 55) …
}
[Figure: the corresponding MDP with nodes numbered 1 to 4. Edge labels and probabilities: X<=55 / X>55 with .55 / .45; X<=60 / X>60 with .6 / .4; X<=30 / X>30 with .3 / .7.]

Markov Decision Processes
[Same testMethod1 program and MDP figure.]
Optimal scheduler: 0 - 1 - 3

Markov Decision Processes
[Same testMethod1 program and MDP figure.]
At nondeterministic nodes take the max of the children
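The "take the max" rule can be sketched as a small recursive evaluation. The tree shape used in the usage note is hypothetical: it assumes one nondeterministic choice between a branch that succeeds with probability 0.6 and one that succeeds with probability 0.3, using numbers that appear in the figure but not necessarily its exact structure. `MdpSketch` and its factory methods are names invented for this illustration.

```java
import java.util.List;

final class MdpSketch {
    /** A node that can report the best achievable probability of success below it. */
    interface Node { double maxSuccess(); }

    /** Leaf with a fixed success probability (1.0 = success, 0.0 = assertion failure). */
    static Node leaf(double p) { return () -> p; }

    /** Probabilistic branch: weighted sum over the children. */
    static Node chance(double[] weights, List<Node> children) {
        return () -> {
            double sum = 0.0;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * children.get(i).maxSuccess();
            }
            return sum;
        };
    }

    /** Nondeterministic choice: the optimal scheduler picks the child with the highest value. */
    static Node choice(List<Node> children) {
        return () -> {
            double best = 0.0;
            for (Node c : children) best = Math.max(best, c.maxSuccess());
            return best;
        };
    }
}
```

For example, `choice(List.of(chance(new double[]{0.6, 0.4}, List.of(leaf(1.0), leaf(0.0))), chance(new double[]{0.3, 0.7}, List.of(leaf(1.0), leaf(0.0)))))` evaluates to 0.6, so the scheduler that maximizes success resolves the choice toward the first branch.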

Probabilistic Programming
FOSE Track at ICSE 2014

Example 1:
bool c1, c2;
c1 = Bernoulli(0.5);
c2 = Bernoulli(0.5);
return(c1, c2);

Example 2 (observe is assume):
bool c1, c2;
c1 = Bernoulli(0.5);
c2 = Bernoulli(0.5);
observe(c1 || c2);
return(c1, c2);

Example 3:
bool c1, c2; int count = 0;
c1 = Bernoulli(0.5);
if (c1) then count = count + 1;
c2 = Bernoulli(0.5);
if (c2) then count = count + 1;
observe(c1 || c2);
return(count);

Example 3 is shown as equal (=) to the same program with the observe replaced by a loop:
bool c1, c2; int count = 0;
c1 = Bernoulli(0.5);
if (c1) then count = count + 1;
c2 = Bernoulli(0.5);
if (c2) then count = count + 1;
while !(c1 || c2) { count = 0; … }
return(count);