From Verification to Synthesis
Sumit Gulwani, Microsoft Research, Redmond
August 2013
Marktoberdorf Summer School Lectures: Part 1
Synthesis
Goal: Synthesize a computational concept in some underlying language from user intent using some search technique.
Significance:
–Variety of computational devices, platforms, models.
  Difficult to remember/learn a new programming framework.
  Billions of non-experts have access to these!
–Enabling technology is now available.
  Better search algorithms
  Faster machines (good application for multi-cores)
State of the art: We can synthesize programs of size
This is a revolutionary capability if we target the right set of application domains and provide the right intent specification mechanism.
Dimensions in Synthesis
Language (Application)
–Programs: Straight-line programs
–Automata
–Queries
User Intent (Ambiguity)
–Logic, Natural Language
–Examples, Demonstrations/Traces
–Program
Search Technique (Algorithm)
–SAT/SMT solvers (Formal Methods)
–A*-style goal-directed search (AI)
–Version space algebras (Machine Learning)
Reference: "Dimensions in Program Synthesis", PPDP 2010, Gulwani
Compilers vs. Synthesizers
Dimension | Compilers | Synthesizers
Concept Language | Executable Program | Variety of concepts: Program, Automata, Query, Sequence
User Intent | Structured language | Variety/mixed form of constraints: logic, examples, traces
Search Technique | Syntax-directed translation (No new algorithmic insights) | Uses some kind of search (Discovers new algorithmic insights)
Outline
Part 1: From verification to synthesis
–Bitvector algorithms (PLDI 2011, ICSE 2010)
–General loopy programs (POPL 2010)
–SIMD algorithms (PPoPP 2013)
–Program inverses (PLDI 2011)
–Graph algorithms (OOPSLA 2010)
Part 2: End-user Programming (Examples & Natural Language)
–Syntactic string transformations: Flash Fill (POPL 2011)
–Semantic string transformations (VLDB 2012)
–Table layout transformations (PLDI 2011)
–Smartphone scripts (MobiSys 2013)
Part 3: Computer-aided Education
–Problem Synthesis (AAAI 2012, CHI 2013)
–Solution Synthesis (PLDI 2011, IJCAI 2013)
–Feedback Synthesis (PLDI 2013, IJCAI 2013)
–Content Authoring (CHI 2012)
From Verification to Synthesis
Application | Generating Synthesis Constraint | Solving Synthesis Constraint
Bitvector | Location variables | CEGIS + SMT
Loopy Alg. | Template-based | SMT
SIMD | Relational verification | CEGIS + Reachability value graph
Inverses | Template-based + symbolic execution | SMT
Graph Alg. | Already in constraint form! | CEGIS + Fact enumeration
From Verification to Synthesis
Application | Generating Synthesis Constraint | Solving Synthesis Constraint
Bitvector | Location variables | CEGIS + SMT
Loopy Alg. | |
SIMD | |
Inverses | |
Graph Alg. | |
Reference: Synthesis of Loop-free Programs, PLDI 2011, Gulwani, Jha, Tiwari, Venkatesan
Bitvector Algorithms
Straight-line programs that use
–Arithmetic Operators: +, -, *, /
–Logical Operators: Bitwise and/or/not, Shift left/right
Examples of Bitvector Algorithms
Turn-off rightmost 1-bit: Z & (Z-1)
Examples of Bitvector Algorithms
Turn-off rightmost contiguous sequence of 1-bits: Z & (1 + (Z | (Z-1)))
Ceil of average of two integers without overflowing: (Y|Z) - ((Y ⊕ Z) >> 1)
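The three identities above can be checked exhaustively at a small width. The following is a minimal sketch, assuming an 8-bit word (the `WIDTH` constant and all function names are illustrative, not from the slides); Python integers are unbounded, so each result is masked to emulate fixed-width arithmetic.

```python
WIDTH = 8                    # assumed word size for the sketch
MASK = (1 << WIDTH) - 1

def turn_off_rightmost_one(z):
    # Z & (Z-1)
    return z & (z - 1) & MASK

def turn_off_rightmost_run(z):
    # Z & (1 + (Z | (Z-1)))
    return z & (1 + (z | (z - 1))) & MASK

def ceil_average(y, z):
    # (Y|Z) - ((Y xor Z) >> 1); no intermediate value exceeds WIDTH bits
    return ((y | z) - ((y ^ z) >> 1)) & MASK

def reference_clear_run(z):
    # Loop-based reference: clear the lowest run of consecutive 1-bits.
    i = 0
    while i < WIDTH and not (z >> i) & 1:
        i += 1
    while i < WIDTH and (z >> i) & 1:
        z &= ~(1 << i)
        i += 1
    return z & MASK

print(bin(turn_off_rightmost_one(0b10101100)))
print(bin(turn_off_rightmost_run(0b10101100)))
print(ceil_average(200, 255))
```

The reference loop is only there to compare the closed-form expression against an obviously correct definition.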
Examples of Bitvector Algorithms
Higher order half of product of x and y
  o1 := and(x,0xFFFF); o2 := shr(x,16); o3 := and(y,0xFFFF); o4 := shr(y,16);
  o5 := mul(o1,o3); o6 := mul(o2,o3); o7 := mul(o1,o4); o8 := mul(o2,o4);
  o9 := shr(o5,16); o10 := add(o6,o9); o11 := and(o10,0xFFFF); o12 := shr(o10,16);
  o13 := add(o7,o11); o14 := shr(o13,16); o15 := add(o14,o12); res := add(o15,o8);
Round up to next highest power of 2
  o1 := sub(x,1); o2 := shr(o1,1); o3 := or(o1,o2); o4 := shr(o3,2);
  o5 := or(o3,o4); o6 := shr(o5,4); o7 := or(o5,o6); o8 := shr(o7,8);
  o9 := or(o7,o8); o10 := shr(o9,16); o11 := or(o9,o10); res := add(o11,1);
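The two straight-line programs can be transcribed directly and compared against simple reference definitions. This is a sketch assuming 32-bit unsigned words; the helper names (`and_`, `shr`, `mul_high`, `round_up_pow2`, ...) are chosen for illustration, and the last line of the second program adds 1 to `o11`, the fully smeared value.

```python
M32 = 0xFFFFFFFF  # assumed 32-bit unsigned words

def and_(a, b): return a & b
def or_(a, b):  return a | b
def shr(a, k):  return a >> k
def add(a, b):  return (a + b) & M32
def sub(a, b):  return (a - b) & M32
def mul(a, b):  return (a * b) & M32

def mul_high(x, y):
    """Upper 32 bits of the 64-bit product, using 32-bit operations only."""
    o1 = and_(x, 0xFFFF)
    o2 = shr(x, 16)
    o3 = and_(y, 0xFFFF)
    o4 = shr(y, 16)
    o5 = mul(o1, o3)
    o6 = mul(o2, o3)
    o7 = mul(o1, o4)
    o8 = mul(o2, o4)
    o9 = shr(o5, 16)
    o10 = add(o6, o9)
    o11 = and_(o10, 0xFFFF)
    o12 = shr(o10, 16)
    o13 = add(o7, o11)
    o14 = shr(o13, 16)
    o15 = add(o14, o12)
    return add(o15, o8)

def round_up_pow2(x):
    """Smallest power of two >= x, for 1 <= x <= 2**31."""
    o = sub(x, 1)
    for k in (1, 2, 4, 8, 16):   # smear the highest set bit to the right
        o = or_(o, shr(o, k))
    return add(o, 1)

print(hex(mul_high(0xFFFFFFFF, 0xFFFFFFFF)))
print(round_up_pow2(1000))
```

The loop in `round_up_pow2` is just a compact way of writing the five shift-and-or pairs of the straight-line version.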
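Problem Definition
Given:
–Specification of desired functionality: φ_spec(I, O)
–Specification of library components: φ_i(I_i, O_i) for i = 1..n
Synthesize a straight-line program
  O_π1 := f_π1(V_π1); …; O_πn := f_πn(V_πn)
where
–Each variable in V_πj is either an input I or some O_πk where k<j
–π is a permutation of 1...n
that meets the desired specification.
Verification Constraint: ∀ I, O_1, …, O_n: (φ_π1(V_π1, O_π1) ∧ … ∧ φ_πn(V_πn, O_πn)) ⇒ φ_spec(I, O_πn)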
Problem Definition: Turn-off rightmost 1 bit
–Specification of desired functionality
–Specification of library components
Synthesis Constraint
–Verification Constraint
–Synthesis Constraint
Idea # 1: Reduce Second-order Quantification in Synthesis Constraint to First Order
The permutation represents which component goes on which location (line #) and from which locations it gets its input arguments. We encode this by location variables L.
Example: Possible programs that use 2 components and their representation using Location Variables
Encoding Well-formedness of Programs
The following constraint ensures that L assignments correspond to well-formed programs.
–Consistency Constraint: Every line in the program should have at most one component.
–Acyclicity Constraint: A variable should be initialized before being used.
Encoding data-flow
The following constraint describes connections between inputs and outputs of various components.
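To make the location-variable idea concrete, the sketch below enumerates every assignment of locations for a two-component library and keeps the well-formed ones. It is an illustration under stated assumptions, not the paper's SMT encoding: the components `dec` and `and`, the 8-bit width, and the convention that the program input sits at location 0 and components occupy lines 1..n are all hypothetical choices.

```python
from itertools import product

# Hypothetical library: name -> (arity, semantics on 8-bit values).
COMPONENTS = {
    "dec": (1, lambda a: (a - 1) & 0xFF),
    "and": (2, lambda a, b: a & b),
}
NUM_INPUTS = 1  # the program input lives at location 0

def well_formed(out_loc, in_locs):
    lines = list(range(NUM_INPUTS, NUM_INPUTS + len(out_loc)))
    # Consistency: each line holds exactly one component output.
    if sorted(out_loc.values()) != lines:
        return False
    # Acyclicity: every argument is read from an earlier location.
    return all(loc < out_loc[name] for name in in_locs for loc in in_locs[name])

def enumerate_programs():
    names = list(COMPONENTS)
    total = NUM_INPUTS + len(names)
    slots = [name for name in names for _ in range(COMPONENTS[name][0])]
    for outs in product(range(NUM_INPUTS, total), repeat=len(names)):
        out_loc = dict(zip(names, outs))
        for ins in product(range(total), repeat=len(slots)):
            in_locs = {name: [] for name in names}
            for name, loc in zip(slots, ins):
                in_locs[name].append(loc)
            if well_formed(out_loc, in_locs):
                yield out_loc, in_locs

def run(program, x):
    out_loc, in_locs = program
    values = {0: x}
    for name in sorted(out_loc, key=out_loc.get):   # execute in line order
        fn = COMPONENTS[name][1]
        values[out_loc[name]] = fn(*(values[loc] for loc in in_locs[name]))
    return values[max(values)]                      # result of the last line

programs = list(enumerate_programs())
matching = [p for p in programs
            if all(run(p, z) == (z & (z - 1)) & 0xFF for z in range(256))]
print(len(programs), len(matching))
```

In the actual approach the location values are unknowns handed to an SMT solver rather than enumerated; the enumeration here only shows what a location assignment denotes.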
Idea # 2: Using CEGIS style procedure to solve the Synthesis Constraint
Synthesis constraint is of the form: ∃L ∀Y F(L,Y)
–Choose some values y1,..,yn for Y.
–Finite Synthesis Step: solve ∃L F(L,y1) ∧ … ∧ F(L,yn).
  No Solution: Failure.
  Solution L = S: go to the Verification Step.
–Verification Step: Does ∀Y F(S,Y) hold? Or, equivalently, is ∃Y ¬F(S,Y) unsatisfiable?
  No Solution: return S.
  Solution Y = y_{n+1}: add y_{n+1} to the chosen values and repeat the Finite Synthesis Step.
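The loop above can be sketched in a few lines. The version below is a toy under explicit assumptions: the candidate space (one binary operator applied to z and one unary operator of z), the 8-bit width, and all names are hypothetical, and exhaustive search stands in for the SMT solver in both steps. Here F(L, y) is the check `run(L, y) == spec(y)`.

```python
from itertools import product

WIDTH = 8
MASK = (1 << WIDTH) - 1

def spec(z):
    """Desired behaviour: clear the lowest set bit of z."""
    for i in range(WIDTH):
        if z & (1 << i):
            return z & ~(1 << i) & MASK
    return 0

UNARY = {
    "inc": lambda z: (z + 1) & MASK,
    "not": lambda z: ~z & MASK,
    "shl1": lambda z: (z << 1) & MASK,
    "shr1": lambda z: z >> 1,
    "dec": lambda z: (z - 1) & MASK,
}
BINARY = {
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
}

def run(cand, z):
    b, u = cand
    return BINARY[b](z, UNARY[u](z))

def finite_synthesis(examples):
    # Find some L with F(L, y1) and ... and F(L, yn).
    for cand in product(BINARY, UNARY):
        if all(run(cand, y) == spec(y) for y in examples):
            return cand
    return None

def verify(cand):
    # Look for a Y with not F(S, Y); None means S is correct on all inputs.
    for y in range(MASK + 1):
        if run(cand, y) != spec(y):
            return y
    return None

def cegis(initial_inputs):
    examples = list(initial_inputs)
    while True:
        cand = finite_synthesis(examples)
        if cand is None:
            return None, examples          # Failure
        cex = verify(cand)
        if cex is None:
            return cand, examples          # return S
        examples.append(cex)               # y_{n+1}

solution, used = cegis([0])
print(solution, used)
```

Each counterexample eliminates at least the current candidate, so the loop terminates on a finite candidate space.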
Experiments: Comparison with Brute-force Search
[Table: one row per benchmark program (name, lines), with Brahma iterations and time compared against AHA time]
Related Work: Program Sketching
Reference: Program Synthesis by Sketching, PhD Thesis 2008, Armando Solar-Lezama (Advisor: Ras Bodik, UC Berkeley)
Key Ideas:
–Write an arbitrary program with holes, where each hole takes values from a finite domain.
–Use CEGIS to generate SAT constraints on holes.
Cons: Not as efficient as domain-specific synthesizers.
–(On the bitvector benchmark, it times out on 9/25 tasks, and on the remaining tasks it is slower by 20x on average.)
Pros:
–A very powerful formalism that can be used to model a variety of synthesis problems.
–Sees synthesis as an interactive process.
Synthesizing Bitvector Algorithms from Examples
Reference: Oracle-Guided Component-Based Program Synthesis, ICSE 2010, Jha, Gulwani, Seshia, Tiwari
Synthesis from Logical Specifications
Turn-off rightmost contiguous string of 1-bits
Interactive Synthesis using Examples
Turn-off rightmost contiguous string of 1's
[Dialogue: the user supplies an input-output example; the tool then asks "?" for the output on a new input and the user answers, five times in total; finally the tool reports "Your program is" followed by the synthesized program]
Distinguishing Input Constraint
L is consistent with the set E of examples.
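The interaction can be sketched as a loop that keeps asking the oracle about an input on which two programs consistent with the examples E disagree. The code below is a toy under explicit assumptions: the candidate shape b1(z, u2(b2(z, u1(z)))), the operator pools, the 8-bit width, and all names are hypothetical, and brute-force search replaces the SMT query for a distinguishing input.

```python
from itertools import product

MASK = 0xFF  # assumed 8-bit words

UNARY = {
    "dec": lambda z: (z - 1) & MASK,
    "inc": lambda z: (z + 1) & MASK,
    "neg": lambda z: -z & MASK,
    "not": lambda z: ~z & MASK,
}
BINARY = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}
CANDIDATES = list(product(BINARY, UNARY, BINARY, UNARY))

def run(cand, z):
    b1, u2, b2, u1 = cand
    return BINARY[b1](z, UNARY[u2](BINARY[b2](z, UNARY[u1](z))))

def consistent(examples):
    # All candidates L that agree with every example in E.
    return [c for c in CANDIDATES if all(run(c, x) == o for x, o in examples)]

def distinguishing_input(examples):
    live = consistent(examples)
    if not live:
        return None, None
    first = live[0]
    for other in live[1:]:
        for x in range(MASK + 1):
            if run(first, x) != run(other, x):
                return first, x        # two consistent programs differ on x
    return first, None                 # all consistent programs agree

def synthesize(oracle, first_input):
    examples = [(first_input, oracle(first_input))]
    while True:
        cand, x = distinguishing_input(examples)
        if x is None:
            return cand, examples
        examples.append((x, oracle(x)))    # the "Tool: ?" / "User:" round trip

def user(z):
    # Plays the user: turn off the rightmost contiguous string of 1's.
    return z & (1 + (z | (z - 1))) & MASK

program, examples = synthesize(user, 0b10101100)
print(program, len(examples))
```

When no distinguishing input remains, every program still consistent with E computes the same function, so returning any one of them is safe within the assumed candidate space.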
Related Work: Smart Sampling
Reference: Automated Synthesis of Symbolic Instruction Encodings from I/O Samples, PLDI 2012, Patrice Godefroid, Ankur Taly
Key Idea: Generate upfront a set of distinguishing inputs for a given class of programs.
Pros: Makes the process much more efficient than distinguishing constraint generation at run-time.
Cons: Domain-specific
From Verification to Synthesis
Application | Generating Synthesis Constraint | Solving Synthesis Constraint
Bitvector | Location variables | CEGIS + SMT
Loopy Alg. | Template-based | SMT
SIMD | |
Inverses | |
Graph Alg. | |
Reference: From Program Verification to Program Synthesis, POPL 2010, Srivastava, Gulwani, Foster
Template-based Invariant Generation
Goal-directed invariant generation for verification of a Hoare triple (Pre, Program, Post)
Program: [Pre] while (c) S [Post]
VCGen produces the Verification Constraint (Second-order): ∃I ∀X
–Base Case: Pre ⇒ I
–Precision: I ∧ ¬c ⇒ Post
–Inductive Case: (I ∧ c)[S] ⇒ I
Key Idea: Reduce the second-order verification constraint to a first-order satisfiability constraint that can be solved using off-the-shelf SAT/SMT solvers
–Choose a template for I (specific color/shade in some logic).
–Convert ∀ into ∃.
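The shape of the search can be illustrated on a tiny loop. The sketch below is not the Farkas-based reduction: it fixes a hypothetical template `y = a*x + b ∧ x ≤ n + c`, enumerates small integer values for the unknowns a, b, c, and checks the three conditions by brute force over a bounded set of states, which stands in for both the ∀-to-∃ conversion and the SMT solver. The example program and all names are assumptions made for illustration.

```python
from itertools import product

# Hoare triple: [x = 0, y = 0, n >= 0]  while (x < n) { x++; y += 2 }  [y = 2n]
def pre(x, y, n):  return x == 0 and y == 0 and n >= 0
def cond(x, y, n): return x < n
def body(x, y, n): return (x + 1, y + 2, n)
def post(x, y, n): return y == 2 * n

def template(a, b, c):
    # Template for I with unknown coefficients a, b, c.
    return lambda x, y, n: y == a * x + b and x <= n + c

# Bounded state space used in place of the universal quantifier over X.
STATES = list(product(range(0, 7), range(0, 13), range(0, 6)))

def is_invariant(inv):
    for s in STATES:
        if pre(*s) and not inv(*s):
            return False                       # Base Case fails
        if inv(*s) and cond(*s) and not inv(*body(*s)):
            return False                       # Inductive Case fails
        if inv(*s) and not cond(*s) and not post(*s):
            return False                       # Precision fails
    return True

def solve(bound=2):
    coeffs = range(-bound, bound + 1)
    return [(a, b, c) for a, b, c in product(coeffs, repeat=3)
            if is_invariant(template(a, b, c))]

print(solve())
```

A bounded check like this can accept a template instance that fails on larger states; the constraint-based techniques on the slide avoid that by reasoning symbolically.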
Key Idea in reducing ∀ to ∃ for various Domains
Trick for converting ∀ to ∃ is known for the following domains:
–Linear Arithmetic: Farkas Lemma
–Linear Arithmetic + Uninterpreted Fns.: Farkas Lemma + Ackermann's Reduction
–Non-linear Arithmetic: Gröbner Basis
–Predicate Abstraction: Boolean indicator variables + Cover Algorithm (Abduction)
–Quantified Predicate Abstraction: Boolean indicator variables + More general Abduction
Template-based Invariant Generation: References
–Linear Arithmetic: Constraint-based Linear-relations analysis; Sankaranarayanan, Sipma, Manna; SAS '04. Program analysis as constraint solving; Gulwani, Srivastava, Venkatesan; PLDI '08
–Linear Arithmetic + Uninterpreted Fns.: Invariant synthesis for combined theories; Beyer, Henzinger, Majumdar, Rybalchenko; VMCAI '07
–Non-linear Arithmetic: Non-linear loop invariant generation using Gröbner bases; Sankaranarayanan, Sipma, Manna; POPL '04
–Predicate Abstraction: Constraint-based invariant inference over predicate abstraction; Gulwani, Srivastava, Venkatesan; VMCAI '09
–Quantified Predicate Abstraction: Program verification using templates over predicate abstraction; Srivastava, Gulwani; PLDI '09
Example: Bresenham's Line Drawing Algorithm
  [0<Y≤X]
  v1:=2Y-X; y:=0; x:=0;
  while (x ≤ X)
    out[x] := y;
    if (v1<0) v1:=v1+2Y;
    else v1:=v1+2(Y-X); y++;
    x++;
  return out;
  [∀k (0≤k≤X ⇒ |out[k]-(Y/X)k| ≤ ½)]
Postcondition: The best fit line shouldn't deviate more than half a pixel from the real line, i.e., |y - (Y/X)x| ≤ 1/2
Transition System Representation
  [0<Y≤X]
  v1:=2Y-X; y:=0; x:=0;
  while (x ≤ X)
    v1<0: out'=Update(out,x,y) ∧ v1'=v1+2Y ∧ y'=y ∧ x'=x+1
    v1≥0: out'=Update(out,x,y) ∧ v1'=v1+2(Y-X) ∧ y'=y+1 ∧ x'=x+1
  [∀k (0≤k≤X ⇒ |out[k]-(Y/X)k| ≤ ½)]
Or, equivalently,
  [Pre] s_entry; while (g_loop) g_body1: s_body1; g_body2: s_body2; [Post]
Where,
  g_body1: v1<0
  g_body2: v1≥0
  g_loop: x≤X
  s_entry: v1'=2Y-X ∧ y'=0 ∧ x'=0
  s_body1: out'=Update(out,x,y) ∧ v1'=v1+2Y ∧ x'=x+1 ∧ y'=y
  s_body2: out'=Update(out,x,y) ∧ v1'=v1+2(Y-X) ∧ x'=x+1 ∧ y'=y+1
Verification Constraint Generation & Solution
Verification Constraint:
  Pre ∧ s_entry ⇒ I'
  I ∧ g_loop ∧ g_body1 ∧ s_body1 ⇒ I'
  I ∧ g_loop ∧ g_body2 ∧ s_body2 ⇒ I'
  I ∧ ¬g_loop ⇒ Post
Given Pre, Post, g_loop, g_body1, g_body2, s_body1, s_body2, we can find a solution for I using constraint-based techniques.
I: 0<Y≤X ∧ v1=2(x+1)Y-(2y+1)X ∧ 2(Y-X) ≤ v1 ≤ 2Y ∧ ∀k (0≤k<x ⇒ |out[k]-(Y/X)k| ≤ ½)
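The invariant can be sanity-checked by running the loop and asserting I at every visit to the loop head, including the final one where the guard fails. This is a runtime check on concrete inputs, not a proof; the function name and the use of exact fractions are choices made for this sketch, and the quantified conjunct is checked for the entries written so far (k < x).

```python
from fractions import Fraction

def bresenham(X, Y):
    assert 0 < Y <= X                       # Pre
    slope = Fraction(Y, X)
    half = Fraction(1, 2)
    out = {}
    v1, y, x = 2 * Y - X, 0, 0              # s_entry
    while True:
        # Invariant I at the loop head.
        assert v1 == 2 * (x + 1) * Y - (2 * y + 1) * X
        assert 2 * (Y - X) <= v1 <= 2 * Y
        assert all(abs(out[k] - slope * k) <= half for k in range(x))
        if not x <= X:                      # g_loop
            break
        out[x] = y
        if v1 < 0:                          # g_body1
            v1 = v1 + 2 * Y
        else:                               # g_body2
            v1 = v1 + 2 * (Y - X)
            y += 1
        x += 1
    return [out[k] for k in range(X + 1)]

def meets_post(X, Y, out):
    # Post: every pixel is within half a pixel of the real line.
    return all(abs(out[k] - Fraction(Y, X) * k) <= Fraction(1, 2)
               for k in range(X + 1))

print(bresenham(5, 2))
```

If any of the asserted conjuncts were not inductive, some small (X, Y) pair would typically trigger an AssertionError.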
From Verification to Synthesis
Verification Constraint:
  Pre ∧ s_entry ⇒ I'
  I ∧ g_loop ∧ g_body1 ∧ s_body1 ⇒ I'
  I ∧ g_loop ∧ g_body2 ∧ s_body2 ⇒ I'
  I ∧ ¬g_loop ⇒ Post
What if we treat each g and s as unknowns like I?
We get a solution that has g_body1 = g_body2 = false.
–This doesn't correspond to a valid transition system.
–We can fix this by encoding g_body1 ∨ g_body2 = true.
We now get a solution that has g_loop = true.
–This corresponds to a non-terminating loop.
–We can fix this by encoding existence of a ranking function.
We now discover each g and s along with I.
–We have gone from Invariant Synthesis to Program Synthesis.
From Verification to Synthesis
Application | Generating Synthesis Constraint | Solving Synthesis Constraint
Bitvector | Location variables | CEGIS + SMT
Loopy Alg. | Template-based | SMT
SIMD | Relational verification | CEGIS + Reachability value graph
Inverses | |
Graph Alg. | |
Reference: From Relational Verification to SIMD Loop Synthesis, PPoPP 2013 (Best Paper Award), Barthe, Crespo, Gulwani, Kunz, Marron
Example (Exists Function)
  typedef struct { int tag; int score; } widget;
  int exists(widget* vals, int len, int t, int s) {
    for (int i = 0; i < len; ++i) {
      int tagok = vals[i].tag == t;
      int scoreok = vals[i].score > s;
      int andok = tagok & scoreok;
      if (andok) return 1;
    }
    return 0;
  }
SIMD Example (Exists Function)
  int exists_sse(widget* vals, int len, int t, int s) {
    m128i vect = [t, t, t, t];
    m128i vecs = [s, s, s, s];
    for (int i = 0; i < (len - 3); i = i + 4) {
      m128i blck1 = load_128(vals, i);        // [t_i, s_i, t_{i+1}, s_{i+1}]
      m128i blck2 = load_128(vals, i + 4);    // [t_{i+2}, s_{i+2}, t_{i+3}, s_{i+3}]
      m128i tagvs = shuffle_i32(blck1, blck2, ORDER(0,2,0,2));    // [t_i, t_{i+1}, t_{i+2}, t_{i+3}]
      m128i scorevs = shuffle_i32(blck1, blck2, ORDER(1,3,1,3));  // [s_i, s_{i+1}, s_{i+2}, s_{i+3}]
      m128i cmptag = cmpeq_i32(vect, tagvs);      // [t_i==t ? 0xF…F : 0x0, …, t_{i+3}==t ? 0xF…F : 0x0]
      m128i cmpscore = cmpgt_i32(vecs, scorevs);  // [s_i>s ? 0xF…F : 0x0, …, s_{i+3}>s ? 0xF…F : 0x0]
      m128i cmpr = and_i128(cmptag, cmpscore);    // [cmptag_0 & cmpscore_0, …, cmptag_3 & cmpscore_3]
      int match = !allzeros(cmpr);                // (cmpr_0 != 0 | cmpr_1 != 0 | cmpr_2 != 0 | cmpr_3 != 0)
      if (match) return 1;
    }
    return 0;
  }
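The lane-by-lane behaviour annotated above can be emulated with plain lists to compare the blocked loop against the scalar one. This sketch models each 128-bit register as a list of four integers and follows the lane annotations (tag equality and score > s); it is an emulation for intuition, not SSE code, and the function names are illustrative. Like the slide's loop, the blocked version only covers complete groups of four widgets, so the comparison is restricted to lengths that are multiples of 4.

```python
from itertools import product

ALL_ONES = 0xFFFFFFFF  # lane value for "true"

def exists(vals, t, s):
    for tag, score in vals:
        if (tag == t) & (score > s):
            return 1
    return 0

def exists_blocked(vals, t, s):
    vect, vecs = [t] * 4, [s] * 4
    for i in range(0, len(vals) - 3, 4):
        # Two loads of two widgets each: [tag, score, tag, score].
        blck1 = [vals[i][0], vals[i][1], vals[i + 1][0], vals[i + 1][1]]
        blck2 = [vals[i + 2][0], vals[i + 2][1], vals[i + 3][0], vals[i + 3][1]]
        # Shuffles separate tags from scores.
        tagvs = [blck1[0], blck1[2], blck2[0], blck2[2]]
        scorevs = [blck1[1], blck1[3], blck2[1], blck2[3]]
        # Lane-wise comparisons produce all-ones or zero masks.
        cmptag = [ALL_ONES if a == b else 0 for a, b in zip(tagvs, vect)]
        cmpscore = [ALL_ONES if a > b else 0 for a, b in zip(scorevs, vecs)]
        cmpr = [a & b for a, b in zip(cmptag, cmpscore)]
        if any(lane != 0 for lane in cmpr):
            return 1
    return 0

def check_equivalence(t=1, s=1):
    # Exhaustive over all 4-widget arrays with tags in {0,1} and scores in {0,1,2}.
    widgets = list(product([0, 1], [0, 1, 2]))
    return all(exists(list(v), t, s) == exists_blocked(list(v), t, s)
               for v in product(widgets, repeat=4))

print(check_equivalence())
```

Testing only establishes agreement on the enumerated inputs; the slides obtain the guarantee through relational verification instead.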
Performance Impact
Verification Methodology
  typedef struct { int tag; int score; } widget;
  int exists(widget* vals, int len, int t, int s) {
    for (int i = 0; i < len; ++i) {
      int tagok = vals[i].tag == t;
      int scoreok = vals[i].score > s;
      int andok = tagok & scoreok;
      if (andok) return 1;
    }
    return 0;
  }
Verification: Step 1 (Structural transformation of source)
  typedef struct { int tag; int score; } widget;
  int exists(widget* vals, int len, int t, int s) {
    for (int i = 0; i < len-3; i = i+4) {
      int tagok0 = vals[i].tag == t;
      int scoreok0 = vals[i].score > s;
      int andok0 = tagok0 & scoreok0;
      ...
      int tagok3 = vals[i+3].tag == t;
      int scoreok3 = vals[i+3].score > s;
      int andok3 = tagok3 & scoreok3;
      int match = andok0 | andok1 | andok2 | andok3;
      if (match) return 1;
    }
    return 0;
  }
Verification: Step 2 (Simulation relation)
Synthesis
Verification Condition: R_pre => WP(B_2, WP(B_1, R_post))
Synthesis Condition: Find B_2 such that the Hoare triple {R_pre} B_2 {R'} holds, where R' = WP(B_1, R_post).
R_pre and R_post require guessing the vectorized versions of variables in the source code. There are two kinds of vectorized variables.
–Loop invariant expressions e_1: v_2 =
–Reduction variables r_1: v_2[1] + v_2[2] + v_2[3] + v_2[4] = r_1
Other Examples
From Verification to Synthesis
Application | Generating Synthesis Constraint | Solving Synthesis Constraint
Bitvector | Location variables | CEGIS + SMT
Loopy Alg. | Template-based | SMT
SIMD | Relational verification | CEGIS + Reachability value graph
Inverses | Template-based + symbolic execution | SMT
Graph Alg. | |
Reference: Path-based Inductive Synthesis for Program Inversion, PLDI 2011, Srivastava, Gulwani, Chaudhuri, Foster
Program Inversion: Example
In-place run-length encoding: A = [1,1,1,0,0,2,2,2,2] → Encoder → A=[1,0,2], N=[3,2,4] → Decoder → A'=[1,1,1,0,0,2,2,2,2]
Encoder:
  IN(A,n); Assume (n >= 0)
  i, m := 0, 0; // parallel assignment
  while (i<n)
    r := 1;
    while (i+1<n && A[i]=A[i+1])
      r, i := r+1, i+1;
    A[m], N[m], m, i := A[i], r, m+1, i+1;
  OUT(A,N,m);
Decoder:
  IN(A,N,m)
  i', m' := 0, 0;
  while (m' < m)
    r' := N[m'];
    while (r'>0)
      r', i', A'[i'] := r'-1, i'+1, A[m'];
    m' := m'+1;
  OUT(A',i');
  assert(A'=A; i'=n);
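A direct transcription of the encoder and decoder makes the inversion property easy to test. The sketch below follows the pseudocode line by line; the function names are illustrative, the encoder works on a copy of its input so the original can be compared afterwards, and the decoder reports the decoded length i'.

```python
def encode(A):
    A = list(A)                 # copy so the caller's list stays intact
    n = len(A)
    N = [0] * n
    i, m = 0, 0
    while i < n:
        r = 1
        while i + 1 < n and A[i] == A[i + 1]:
            r, i = r + 1, i + 1
        # Parallel assignment: the right-hand side is evaluated first.
        A[m], N[m], m, i = A[i], r, m + 1, i + 1
    return A, N, m

def decode(A, N, m):
    out = []
    i2, m2 = 0, 0
    while m2 < m:
        r2 = N[m2]
        while r2 > 0:
            out.append(A[m2])   # A'[i'] := A[m']
            r2, i2 = r2 - 1, i2 + 1
        m2 += 1
    return out, i2

original = [1, 1, 1, 0, 0, 2, 2, 2, 2]
A, N, m = encode(original)
decoded, length = decode(A, N, m)
print(A[:m], N[:m], decoded == original, length)
```

The in-place write `A[m] := A[i]` is safe because m never exceeds i.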
Program Inversion as Synthesis Problem
In-place run-length encoding: A = [1,1,1,0,0,2,2,2,2] → Encoder → A=[1,0,2], N=[3,2,4] → Decoder → A'=[1,1,1,0,0,2,2,2,2]
Encoder:
  IN(A,n); Assume (n >= 0)
  i, m := 0, 0; // parallel assignment
  while (i<n)
    r := 1;
    while (i+1<n && A[i]=A[i+1])
      r, i := r+1, i+1;
    A[m], N[m], m, i := A[i], r, m+1, i+1;
  OUT(A,N,m);
Synthesis Technique
Other Program Inversion Examples
–LZ77 Compressor
–LZW Compressor
From Verification to Synthesis
Application | Generating Synthesis Constraint | Solving Synthesis Constraint
Bitvector | Location variables | CEGIS + SMT
Loopy Alg. | Template-based | SMT
SIMD | Relational verification | CEGIS + Reachability value graph
Inverses | Template-based + symbolic execution | SMT
Graph Alg. | Already in constraint form! | CEGIS + Fact enumeration
Reference: A Simple Inductive Synthesis Methodology and its Applications, OOPSLA 2010, Itzhaky, Gulwani, Immerman, Sagiv
Bipartite-ness
Synthesis Algorithm: Specification (Second order logic) → Implementation (First order logic + Transitive Closure)
Tree-ness
Synthesis Algorithm: Specification (Second order logic) → Implementation (First order logic + Transitive Closure)
Synthesis Algorithm
–Use CEGIS to generate models (where source specification and target query don't match).
–Generate the set S of clauses (from the target query language) that are satisfied by each positive model.
–Generate a minimal subset of S that rules out each negative model.
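The last two steps can be sketched on a toy version of tree-ness. Everything concrete here is an assumption for illustration: models are small undirected graphs given as (number of vertices, edge list), the candidate clauses are four hand-picked predicates rather than formulas drawn from a query language, and the positive and negative models are supplied by hand instead of being produced by CEGIS.

```python
from itertools import combinations

def connected(n, edges):
    # Depth-first search from vertex 0 (assumes n >= 1).
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for a, b in edges:
            for u, w in ((a, b), (b, a)):
                if u == v and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == n

# Hypothetical candidate clauses over graphs.
CLAUSES = {
    "edges = n - 1": lambda n, e: len(e) == n - 1,
    "connected": connected,
    "no self-loop": lambda n, e: all(a != b for a, b in e),
    "max degree <= 2": lambda n, e: all(sum(v in edge for edge in e) <= 2
                                        for v in range(n)),
}

def select_clauses(positives, negatives):
    # S: clauses satisfied by every positive model.
    S = [name for name, holds in CLAUSES.items()
         if all(holds(n, e) for n, e in positives)]
    # Smallest subset of S whose conjunction is false on every negative model.
    for size in range(len(S) + 1):
        for subset in combinations(S, size):
            if all(any(not CLAUSES[c](n, e) for c in subset)
                   for n, e in negatives):
                return S, list(subset)
    return S, None

positives = [(1, []), (3, [(0, 1), (1, 2)]), (4, [(0, 1), (0, 2), (0, 3)])]
negatives = [(3, [(0, 1), (1, 2), (0, 2)]),      # triangle
             (4, [(0, 1), (2, 3)]),              # two disjoint edges
             (4, [(0, 1), (1, 2), (0, 2)])]      # triangle plus isolated vertex
S, chosen = select_clauses(positives, negatives)
print(S, chosen)
```

Trying subsets in order of increasing size gives a minimum-size answer, which is affordable only because S is tiny here.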