The Sketching Approach to Program Synthesis

Slides:

Advertisements

Similar presentations

Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?

Advertisements

PRESENTED BY MATTHEW GRAF AND LEE MIROWITZ Linked Lists.

Greta YorshEran YahavMartin Vechev IBM Research. { ……………… …… …………………. ……………………. ………………………… } P1() Challenge: Correct and Efficient Synchronization { ……………………………

Greta YorshEran YahavMartin Vechev IBM Research. { ……………… …… …………………. ……………………. ………………………… } T1() Challenge: Correct and Efficient Synchronization { ……………………………

Introducing Formal Methods, Module 1, Version 1.1, Oct., Formal Specification and Analytical Verification L 5.

Software Engineering & Automated Deduction Willem Visser Stellenbosch University With Nikolaj Bjorner (Microsoft Research, Redmond) Natarajan Shankar (SRI.

1/20 Generalized Symbolic Execution for Model Checking and Testing Charngki PSWLAB Generalized Symbolic Execution for Model Checking and Testing.

Symbolic execution © Marcelo d’Amorim 2010.

Programming Languages Marjan Sirjani 2 2. Language Design Issues Design to Run efficiently : early languages Easy to write correctly : new languages.

Program Representations. Representing programs Goals.

Lecture 8 CS203. Implementation of Data Structures 2 In the last couple of weeks, we have covered various data structures that are implemented in the.

Semantic analysis Parsing only verifies that the program consists of tokens arranged in a syntactically-valid combination, we now move on to semantic analysis,

COMP171 Data Structure & Algorithm Tutorial 1 TA: M.Y.Chan.

C. FlanaganSAS’04: Type Inference Against Races1 Type Inference Against Races Cormac Flanagan UC Santa Cruz Stephen N. Freund Williams College.

Representing programs Goals. Representing programs Primary goals –analysis is easy and effective just a few cases to handle directly link related things.

Algorithms and Problem Solving-1 Algorithms and Problem Solving.

Object (Data and Algorithm) Analysis Cmput Lecture 5 Department of Computing Science University of Alberta ©Duane Szafron 1999 Some code in this.

Introduction to Computer Science Exam Information Unit 20.

1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.

Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.

JS Arrays, Functions, Events Week 5 INFM 603. Agenda Arrays Functions Event-Driven Programming.

Abstraction IS 101Y/CMSC 101 Computational Thinking and Design Tuesday, September 17, 2013 Carolyn Seaman University of Maryland, Baltimore County.

Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.

CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.

Imperative Programming

Comp 245 Data Structures Software Engineering. What is Software Engineering? Most students obtain the problem and immediately start coding the solution.

CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 15: Linked data structures.

Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.

Benjamin Gamble. What is Time?  Can mean many different things to a computer Dynamic Equation Variable System State 2.

Generative Programming Meets Constraint Based Synthesis Armando Solar-Lezama.

Stephen P. Carl - CS 2421 Recursion Reading : Chapter 4.

Abstraction IS 101Y/CMSC 101 Computational Thinking and Design Tuesday, September 17, 2013 Marie desJardins University of Maryland, Baltimore County.

CS 1308 Computer Literacy and The Internet Software.

Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.

Inferring Specifications to Detect Errors in Code Mana Taghdiri Presented by: Robert Seater MIT Computer Science & AI Lab.

Testing and Debugging Version 1.0. All kinds of things can go wrong when you are developing a program. The compiler discovers syntax errors in your code.

CS 363 Comparative Programming Languages Semantics.

Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.

Synthesis with the Sketch System D AY 1 Armando Solar-Lezama.

Synthesis, Analysis, and Verification Lecture 05a Lectures: Viktor Kuncak Programs with Data Structures: Assertions for Accesses. Dynamic Allocation.

Introduction to Problem Solving. Steps in Programming A Very Simplified Picture –Problem Definition & Analysis – High Level Strategy for a solution –Arriving.

Storyboard Programming Rishabh Singh and Armando Solar-Lezama.

CS703: PROJECT GUIDELINES 1. Logistics: Project Most important part of the course Teams of 1 or 2 people Expectations commensurate with size of team Deliverables.

CSE 190p wrapup Michael Ernst CSE 190p University of Washington.

Synthesis with the Sketch System D AY 2 Armando Solar-Lezama.

Basic Scheme February 8, 2007 Compound expressions Rules of evaluation Creating procedures by capturing common patterns.

Principle of Programming Lanugages 3: Compilation of statements Statements in C Assertion Hoare logic Department of Information Science and Engineering.

1/33 Basic Scheme February 8, 2007 Compound expressions Rules of evaluation Creating procedures by capturing common patterns.

CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.

CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Tevfik Bultan Lecture 4: Introduction to C: Control Flow.

LECTURE 3 Compiler Phases. COMPILER PHASES Compilation of a program proceeds through a fixed series of phases.  Each phase uses an (intermediate) form.

Efficiently Solving Computer Programming Problems Doncho Minkov Telerik Corporation Technical Trainer.

CSE 332: C++ STL iterators What is an Iterator? An iterator must be able to do 2 main things –Point to the start of a range of elements (in a container)

Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From

CPS 100e 5.1 Inheritance and Interfaces l Inheritance models an "is-a" relationship  A dog is a mammal, an ArrayList is a List, a square is a shape, …

Finding bugs with a constraint solver daniel jackson. mandana vaziri mit laboratory for computer science issta 2000.

Basic Scheme February 8, 2007 Compound expressions Rules of evaluation

Introduction to Sketching

Introduction to Sketching

Automating Induction for Solving Horn Clauses

Introduction to Problem Solving and Programming CS140: Introduction to Computing 1 8/19/13.

Lecture 7 Constraint-based Search

Functions Inputs Output

Guest Lecture by David Johnston

Templates of slides for P4 Experiments with your synthesizer

Data Structures & Algorithms

CUTE: A Concolic Unit Testing Engine for C

Predicate Abstraction

Synthesis with the Sketch System

Presentation transcript:

The Sketching Approach to Program Synthesis Armando Solar-Lezama

The Challenge Computer should help make programming easier Problem Programming requires insight and experience Computers are not that smart Interaction between programmers and tools is key

The sketching approach New programming model based on localized synthesis Let the programmer control the implementation strategy Focus the synthesizer on the low-level details Key design principle: Exploit familiar programming concepts

Sketch language basics Sketches are programs with holes write what you know use holes for the rest 2 semantic issues specifications How does SKETCH know what program you actually want? holes Constrain the set of solutions the synthesizer may consider

Specifications Idea: Use tests as specification Programmers know how to write those Non-determinism in the test improves coverage System ensures test will pass for all inputs/choices

Example Test harness for a linked list based Queue: Bound Known implementation void test(int[N] in){ ArrayQueue arrQueue = new ArrayQueue(); llQueue arrQueue = new llQueue(); for(int i=0; i<N; ++i){ if(*){ arrQueue.enqueue(in[i]); llQueue.enqueue(in[i]); }else{ assert arrQueue.empty() == llQueue.empty(); if(!arrQueue.empty()) assert arrQueue.dequeue() == llQueue.dequeue(); } Sketched implementation non-determinism Assertions

Example Test harness for a Linked List Reversal: Bound void main(int n){ assume n < N; node[N] nodes = null; list l = newList(); populateList(n, l, nodes); // write list to nodes array l = reverseSK(l); check(n, l, nodes); } Sketched implementation Assertions

Example Test harness for a Linked List Reversal: Assertions void check(int n, list l, node[N] nodes){ node cur = l.head; int i=0; while(cur != null){ assert cur == nodes[n-1-i]; cur = cur.next; i = i+1; } assert i == n; if(n > 0){ assert l.head == nodes[n-1]; assert l.tail == nodes[0]; assert l.tail.next == null; }else{ assert l.head == null; assert l.tail == null; Assertions

Defining sets of code fragments is the key to Sketching effectively Holes Holes are placeholders for the synthesizer synthesizer replaces hole with concrete code fragment fragment must come from a set defined by the user Defining sets of code fragments is the key to Sketching effectively

Integer hole Define sets of integer constants Example: Hello World of Sketching spec: int foo (int x) { return x + x; } sketch: int bar (int x) implements foo { return x * ??; } Integer Hole

Integer Holes  Sets of Expressions Expressions with ?? == sets of expressions linear expressions x*?? + y*?? polynomials x*x*?? + x*?? + ?? sets of variables ?? ? x : y

Integer Holes  Sets of Expressions Expressions with ?? == sets of expressions linear expressions x*?? + y*?? polynomials x*x*?? + x*?? + ?? sets of variables ?? ? x : y Semantically powerful but syntactically awkward Regular Expressions are a more convenient way of defining sets

Regular Expression Generators {| RegExp |} RegExp supports choice ‘|’ and optional ‘?’ can be used arbitrarily within an expression to select operands {| (x | y | z) + 1 |} to select operators {| x (+ | -) y |} to select fields {| n(.prev | .next)? |} to select arguments {| foo( x | y, z) |} Set must respect the type system all expressions in the set must type-check all must be of the same type

Least Significant Zero Bit Example: Least Significant Zero Bit 0010 0101  0000 0010 Trick: Adding 1 to a string of ones turns the next zero to a 1 i.e. 000111 + 1 = 001000 int W = 32; bit[W] isolate0 (bit[W] x) { // W: word size bit[W] ret = 0; for (int i = 0; i < W; i++) if (!x[i]) { ret[i] = 1; return ret; } } !(x + 1) & (x + 0) !(x + 1) & (x + 0xFFFF) !(x + ??) & (x + ??)  !(x + 0) & (x + 1) !(x + 0xFFFF) & (x + 1)

Integer Holes  Sets of Expressions Example: Least Significant Zero Bit 0010 0101  0000 0010 int W = 32; bit[W] isolate0 (bit[W] x) { // W: word size bit[W] ret = 0; for (int i = 0; i < W; i++) if (!x[i]) { ret[i] = 1; return ret; } } bit[W] isolateSk (bit[W] x) implements isolate0 { return !(x + ??) & (x + ??) ;

Least Significant Zero bit How did I know the solution would take the form !(x + ??) & (x + ??) . What if all you know is that the solution involves x, +, & and !. bit[W] tmp=0; {| x | tmp |} = {| (!)?((x | tmp) (& | +) (x | tmp | ??)) |}; return tmp; This is now a set of statements (and a really big one too)

More Constructs: repeat Avoid copying and pasting repeat(n){ s}  s;s;…s; each of the n copies may resolve to a distinct stmt n can be a hole too. Keep in mind: the synthesizer won’t try to minimize n n bit[W] tmp=0; repeat(??){ {| x | tmp |} = {| (!)?((x | tmp) (& | +) (x | tmp | ??)) |}; } return tmp;

Putting it together: Linked List Reverse Efficient implementation uses imperative updates and a while loop list reverse(list l){ list nl = new list(); node tmp = null; while( l ){ l } return nl; This should be a pointer comparison (nl | l) (.head | .tail)(.next)? | null

Putting it together: Linked List Reverse Efficient implementation uses imperative updates and a while loop #define LOC {| (nl | l) (.head | .tail)(.next)? | null |} #define CMP {| LOC ( == | != ) LOC |} list reverse(list l){ list nl = new list(); node tmp = null; while( CMP ){ l } return nl; This should be a series of pointer assignments possibly guarded by conditions

Putting it together: Linked List Reverse #define LOC {| (nl | l) (.head | .tail)(.next)? | null |} #define CMP {| LOC ( == | != ) LOC |} #define LHS {| (tmp | (l | nl)(.head | .tail))(.next)? |} list reverse(list l){ list nl = new list(); node tmp = null; while( CMP ){ repeat(??) if(CMP){ LHS = {| LOC | tmp |}; } } return nl; This resolves in less than 1 min

Example: remove() for a sorted, linked list The data structure: linked list of Nodes sorted by Node.key with sentries at head and tail The problem: implement a concurrent remove() method c -∞ +∞ b a |…> we want to implement a [HIGHLY?] concurrent remove() method for a set data type with the following implementation

Thinking about the problem Sequential remove (): Insight 1: for efficiency, use fine-grain locking of individual Nodes, rather than the whole list Insight 2: Keep a sliding window with two locks -∞ a b c +∞

Hand Over Hand Locking bit add(Set S, int key) { bit ret = 0; Node prev = null; Node cur = S.head; aas if(??) lock(LOC); find (S, key, prev, cur); if (key < cur.key) { prev.next = newNode (key, cur); ret = 1; } else { ret = 0; } if(??) unlock(LOC); return ret; void find (Set S, int key, ref Node prev, ref Node cur) { while (cur.key < key) { reorder{ if(COMP) lock(LOC); if(COMP) unlock(LOC); { prev = cur; cur = cur.next; } } #define COMP {| ((cur|prev)(.next)? | null) (== | !=) (cur|prev)(.next)? |} #define LOC {| (cur | prev )(.next)? |}

Defining the synthesis problem

Defining a solution to a sketch Defined in terms of a control function Control defines an integer value for each hole Values in the sketch are parameterized by i.e. program values depend on the values of holes Goal: Find a control such that all assertions succeed for all inputs

Solving sketches sequentially Syntax directed translation finds constraints int lin(int x){ return ??*x + ??; } void main(int x){ int t1 = lin(x); int t2 = lin(x+1); assert lin(0) == 1; if(x<4) assert t1 >= x*x; if(x>=3) assert t2-t1 == 3;

Solving sketches sequentially Syntax directed translation finds constraints φ(??1) = 3 φ(??2) = 1 is a valid solution. int lin(int x){ return φ(??1)*x + φ(??2); } void main(int x){ int t1 = lin(x); int t2 = lin(x+1); assert lin(0) == 1; if(x<4) assert t1 >= x*x; if(x>=3) assert t2-t1 == 3; φ(??1)*0 + φ(??2) == 1 x < 4  φ(??1)*x + φ(??2) >= x*x x >=3  φ(??1)*(x+1) - φ(??1)*x == 3

A Sketch as a constraint system Synthesis reduces to constraint satisfaction Constraints are too hard for standard techniques Universal quantification over inputs Too many inputs Too many constraints Too many holes Q(x, φ) φ. x. A E

Insight Sketches are not arbitrary constraint systems They express the high level structure of a program A small number of inputs can be enough focus on corner cases This is an inductive synthesis problem ! Q(x, φ) φ. x in E. A E where E = {x1, x2, …, xk}

Counterexample –Guided Inductive Synthesis (CEGIS) Sequential candidate implementation succeed Inductive Synthesizer Automated Validation Derive candidate implementation from concrete inputs. fail ok buggy Your verifier/checker goes here E φ. x in E. A The idea behind inductive synthesis is that rather than looking for a candidate that is correct for all inputs, we want to look for a candidate which works correctly for all inputs in a small set of observations E. If no such candidate exists, then clearly the sketch must be buggy. If we find such a candidate, then we want to know if it generalizes to all inputs, so we pass it to an automated validation procedure; a bounded-model checker in our case. If the validation succeeds, then we are done; but if it doesn’t, the validator produces something really valuable, a new observation which covers a corner-case in the sketch not covered by any of the previous observations. By adding this counterexample to our observation set, we can very quickly converge on an observation which works for all inputs. In the sequential case, we were able to search spaces of over 30000 bits in about 600 iterations. In order to extend this algorithm to handle concurrency we must address several problems. Q(x, φ) fail counterexample input observation set E

What about concurrency?

Reframing the problem E A E A φ. x. Q(x, φ) φ. t in tr(P). Q(t, φ) Sequential constraints are in terms of inputs derived from the sequential semantics of the program Concurrent constraints defined in terms of traces traces have sequential semantics Q(x, φ) φ. x. A E set of traces of the sketch Q(t, φ) φ. t in tr(P). A E We can do inductive synthesis on traces as well!

Counterexample –Guided Inductive Synthesis (CEGIS) Sequential Concurrent candidate implementation succeed Inductive Synthesizer Derive candidate implementation from concrete inputs. Inductive Synthesizer Automated Validation Derive candidate implementation from counterexample traces fail ok buggy Your verifier/checker goes here SPIN E φ. t in E. A First, we need a validation procedure which understands concurrency. In our case we use the bounded model checker SPIN. But the biggest problem is that the counterexample is no longer an input, but a trace showing a bug with a specific thread interleaving. The problem with these counterexamples is that, unlike inputs, traces are very specific to the particular candidate that produced them. So the inductive synthesizer is left with the task of producing a candidate implementation that avoids the bugs exposed by traces from several different candidate implementations. Q(t, φ) fail counterexample input counterexample trace observation set E traces of the sketch != trace of a candidate

Traces are only relevant to the program they came from Learning from traces Problem: Traces are only relevant to the program they came from Solution: Trace Projection tp ► S Desired property if P’ shares the bug exposed by tp … … tp ► S should expose the bug too allows inductive synthesis through constraint solving Trace on sketch S Trace on program P

How to do projection The key is to preserve statement ordering statements in the trace ⊆ statements in the Control flow may be an obstacle but we can get rid of it

Projection Algorithm: Key Ideas Sk[c] Sk P = Sk[0] Want a parameterized set of traces preserve order of common statements Algorithm If-Conversion of the Sketch Easier to build a trace Schedule the statements preserve order from original trace if(c){ S1; }else { S2; S3; } S4; if(c) S1; if(!c) S2; if(!c) S3; S4; S2; S3; S4; if(c) S1; if(!c) S2; if(!c) S3; S4; tp ► Sk c=0 c=1 S2; S3; S4; tP

Sample of what sketch can do Bit-level manipulations If you want to do low-level bit-vector manipulations use sketch don’t waste time doing it by hand… …but don’t expect the SAT solver to break crypto functions Integer problems very good at coming up with tricky arithmetic expressions … but only when used with an SMT solver backend Data-structures Very good at local manipulations of lists and trees We have sketched some interesting graph algorithms Concurrent objects Various implementations of locked and lock-free manipulations of lists/queues Synchronization objects such as 2-phase barrier

So are all the problems now solved? Synthesis is in its infancy Big opportunities for improvement

The big problems Scalability how do we scale to bigger programs? how do we scale to bigger holes? how do you scale to programs that require complex high-level reasoning?

The really big problems How do we scale to bigger programs? it’s not about making the solver faster although that helps it’s about achieving modularity How do we scale to bigger holes? it’s about capturing insight at the right level of abstraction What about complex reasoning? it’s about capturing domain knowledge

Some directions of new research Abstraction enhanced sketching discover facts about the solution by analyzing abstract sketch use these facts to boost scalability of CEGIS Sketching for very large programs use data-driven abstractions to model the program behavior synthesizer learns by observation Sketching for numerical control software very important domain reasoning about discrete and numerical computation in tandem

Points to take home It’s time for a revolution in programming tools Unprecedented ability to reason about programs Unprecedented access to large-scale computing resources Unprecedented challenges faced by programmers Successful tools can’t ignore the programmer programmers know too much to be replaced by machines but they sure need our help!