Programming by Sketching


Programming by Sketching Armando Solar-Lezama, Liviu Tancau, Gilad Arnold, Rastislav Bodik, Sanjit Seshia UC Berkeley, Rodric Rabbah MIT, Kemal Ebcioglu, Vijay Saraswat, Vivek Sarkar IBM

Merge sort
looks simple to code, but there is a bug

int[] mergeSort (int[] input, int n) {
    return merge( mergeSort (input[0::n/2]),
                  mergeSort (input[n/2+1::n]), n);
}

int[] merge (int[] a, int b[], int n) {
    int j=0, k=0;
    for (int i = 0; i < n; i++)
        if ( a[j] < b[k] ) {
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    return result;
}

Merge sort

int[] mergeSort (int[] input, int n) {
    return merge( mergeSort (input[0::n/2]),
                  mergeSort (input[n/2+1::n]), n);
}

int[] merge (int[] a, int b[], int n) {
    int j, k;
    for (int i = 0; i < n; i++)
        if ( j<n && ( !(k<n) || a[j] < b[k]) ) {
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    return result;
}

The sketching experience
sketch + specification (spec) → implementation (completed sketch)

So what do we mean by a sketch? A sketch is an incomplete description of the final product you want to implement. It contains the high-level features that you want your implementation to have, but it leaves it up to the system to fill in the details.
Transition: sketching, at least for now, is a domain-specific methodology.

The spec: bubble sort

int[] sort (int[] input, int n) {
    for (int i=0; i<n; ++i)
        for (int j=i+1; j<n; ++j)
            if (input[j] < input[i])
                swap(input, j, i);
    return input;
}

Merge sort: sketched

int[] mergeSort (int[] input, int n) {
    return merge( mergeSort (input[0::n/2]),
                  mergeSort (input[n/2+1::n]), n);
}

int[] merge (int[] a, int b[], int n) {
    int j, k;
    for (int i = 0; i < n; i++)
        if ( expression( ||, &&, <, !, [] ) ) {    // hole
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    return result;
}

Merge sort: synthesized

int[] mergeSort (int[] input, int n) {
    return merge( mergeSort (input[0::n/2]),
                  mergeSort (input[n/2::n]) );
}

int[] merge (int[] a, int b[], int n) {
    int j, k;
    for (int i = 0; i < n; i++)
        if ( j<n && ( !(k<n) || a[j] < b[k]) ) {
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    return result;
}
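To make the idea of filling the merge hole concrete, here is a minimal Python sketch, not the SKETCH synthesizer itself. It assumes a tiny hand-written list of candidate guards (the names CANDIDATES, satisfies_spec and fill_hole are hypothetical) and uses Python's sorted() as the executable spec. A candidate is rejected if it reads out of bounds or produces an unsorted result on any small test input.

```python
from itertools import product

# Hypothetical candidate guards standing in for expression( ||, &&, <, !, [] ).
CANDIDATES = [
    ("a[j] < b[k]",
     lambda a, b, j, k: a[j] < b[k]),
    ("j < len(a) and a[j] < b[k]",
     lambda a, b, j, k: j < len(a) and a[j] < b[k]),
    ("j < len(a) and (not (k < len(b)) or a[j] < b[k])",
     lambda a, b, j, k: j < len(a) and (not (k < len(b)) or a[j] < b[k])),
]

def merge(a, b, guard):
    """Merge two sorted lists; the guard decides which side to take from."""
    result, j, k = [], 0, 0
    for _ in range(len(a) + len(b)):
        if guard(a, b, j, k):
            result.append(a[j])
            j += 1
        else:
            result.append(b[k])
            k += 1
    return result

def sorted_lists(max_len=2, values=range(3)):
    """Enumerate all sorted lists up to max_len over a small value range."""
    for n in range(max_len + 1):
        for combo in product(values, repeat=n):
            if list(combo) == sorted(combo):
                yield list(combo)

def satisfies_spec(guard):
    """Check the candidate against the spec sorted(a + b) on all small inputs."""
    inputs = list(sorted_lists())
    for a, b in product(inputs, repeat=2):
        try:
            if merge(a, b, guard) != sorted(a + b):
                return False
        except IndexError:  # the guard or the chosen branch read out of bounds
            return False
    return True

def fill_hole():
    """Return the text of the first candidate guard that meets the spec."""
    for text, guard in CANDIDATES:
        if satisfies_spec(guard):
            return text
    return None

print(fill_hole())
```

Only the fully guarded condition survives, which mirrors the shape of the synthesized guard on the slide; the real system searches a grammar of expressions rather than a fixed list.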

Sketching: spec vs. sketch
Specification
    executable: easy to debug, serves as a prototype
    a reference implementation: simple and sequential
    written by domain experts: crypto, bio, MPEG committee
Sketched implementation
    program with holes: filled in by synthesizer
    programmer sketches strategy: machine provides details
    written by performance experts: vector wizard; cache guru

How sketching fits into autotuning
Autotuning: two methods for obtaining code variants
    optimizing compiler: transform a “spec” in various ways
    custom generator: for a specific algorithm
We seek to simplify the second approach
Scenario 1: library of variants stores resolved sketches
    as if written by hand
Scenario 2: library has unresolved, flexible sketches
    sketch works for a variety of specifications: e.g., a class of stencils

SKETCH
A language with support for sketching-based synthesis
    like C without pointers
    two simple synthesis constructs
Restricted to finite programs:
    input size known at compile time, terminates on all inputs
Most high-performance kernels are finite:
    matrix multiply: yes
    binary search tree: no
We're already working on relaxing the finiteness restriction
    (covered later in this talk)

Ex1: Isolate rightmost 0-bit. 1010 0111 → 0000 1000

bit[W] isolate0 (bit[W] x) {    // W: word size
    bit[W] ret = 0;
    for (int i = 0; i < W; i++)
        if (!x[i]) { ret[i] = 1; break; }
    return ret;
}

bit[W] isolate0Fast (bit[W] x) implements isolate0 {
    return ~x & (x+1);
}

bit[W] isolate0Sketched (bit[W] x) implements isolate0 {
    return ~(x + ??) & (x + ??);
}
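The following is a small Python illustration of what resolving the two ?? holes means, assuming W = 8 and a plain brute-force search over all constant pairs (the real synthesizer does not enumerate like this). Bit 0 is taken to be the rightmost bit, and arithmetic wraps modulo 2^W; the function names are illustrative.

```python
W = 8                    # word size assumed for this illustration
MASK = (1 << W) - 1      # keeps results within W bits

def isolate0(x):
    """Spec: return a word with only the rightmost 0-bit of x set."""
    for i in range(W):
        if not (x >> i) & 1:
            return 1 << i
    return 0             # x is all ones: no 0-bit to isolate

def sketch(x, c1, c2):
    """The sketch ~(x + ??) & (x + ??) with the holes filled by c1 and c2."""
    return ~(x + c1) & (x + c2) & MASK

def synthesize():
    """Return the first (c1, c2) that matches the spec on every W-bit input."""
    for c1 in range(1 << W):
        for c2 in range(1 << W):
            if all(sketch(x, c1, c2) == isolate0(x) for x in range(1 << W)):
                return c1, c2
    return None

print(synthesize())      # constants that turn the sketch into isolate0Fast
```

The search recovers the constants 0 and 1, that is, the hand-written ~x & (x+1).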

Programmer’s view of sketches
the ?? operator is replaced with a suitable constant, as directed by the implements clause.
    the ?? operator introduces non-determinism
    the implements clause constrains it.

Beyond synthesis of literals
Synthesizing values of ?? already very useful
    parallelization machinery: bitmasks, tables in crypto codes
    array indices: A[i+??,j+??]
We can synthesize more than constants
    semi-permutations: functions that select and shuffle bits
    polynomials: over one or more variables
    actually, arbitrary expressions, programs

Synthesizing polynomials

int spec (int x) {
    return 2*x*x*x*x + 3*x*x*x + 7*x*x + 10;
}

int p (int x) implements spec {
    return (x+1)*(x+2)*poly(3,x);
}

int poly(int n, int x) {
    if (n==0) return ??;
    else return x * poly(n-1, x) + ??;
}
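The recursive poly generator is Horner's rule with one hole per coefficient. A minimal Python sketch of that idea follows, under loudly stated assumptions: the spec here is a hypothetical cubic chosen to be divisible by (x+1), not the spec from the slide; the holes are passed explicitly as a tuple; and coefficients are searched by brute force in the range 0..3.

```python
from itertools import product

def poly(n, x, holes):
    """Horner-style generator: holes[0]*x^n + holes[1]*x^(n-1) + ... + holes[n]."""
    if n == 0:
        return holes[0]
    return x * poly(n - 1, x, holes) + holes[n]

def spec(x):
    """Hypothetical spec, equal to (x+1)*(2x^2 + 3x + 1)."""
    return 2 * x**3 + 5 * x**2 + 4 * x + 1

def p(x, holes):
    """Sketch: a known factor times a degree-2 polynomial with unknown coefficients."""
    return (x + 1) * poly(2, x, holes)

def synthesize(max_coeff=3, inputs=range(-5, 6)):
    """Brute-force the three holes so that p matches spec on the test inputs."""
    for holes in product(range(max_coeff + 1), repeat=3):
        if all(p(x, holes) == spec(x) for x in inputs):
            return holes
    return None

print(synthesize())  # coefficients of the unknown quadratic factor
```

Eleven test points are more than enough to pin down a cubic, so agreement on these inputs implies agreement everywhere for this particular example.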

Karatsuba’s multiplication
x = x1*b + x0
y = y1*b + y0
b = 2^k

x*y = b^2*x1*y1 + b*(x1*y0 + x0*y1) + x0*y0

x*y = poly(??,b) * x1*y1
    + poly(??,b) * poly(1,x1,x0,y1,y0) * poly(1,x1,x0,y1,y0)
    + poly(??,b) * x0*y0

x*y = (b^2 + b) * x1*y1
    - b * (x1 - x0)*(y1 - y0)
    + (b+1) * x0*y0
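A quick numeric check of the three-multiplication form can be written in a few lines of Python. This is only an illustration on non-negative Python integers (the function name karatsuba_step is hypothetical). It relies on the identity x1*y0 + x0*y1 = x1*y1 + x0*y0 - (x1 - x0)*(y1 - y0), so the middle product enters with a minus sign.

```python
def karatsuba_step(x, y, k):
    """One level of Karatsuba with base b = 2^k, using three multiplications."""
    b = 1 << k
    x1, x0 = divmod(x, b)          # x = x1*b + x0
    y1, y0 = divmod(y, b)          # y = y1*b + y0
    hi = x1 * y1                   # multiplication 1
    lo = x0 * y0                   # multiplication 2
    mid = (x1 - x0) * (y1 - y0)    # multiplication 3
    return (b * b + b) * hi - b * mid + (b + 1) * lo

# Compare against the built-in product on a few inputs.
for x, y in [(1234, 5678), (0, 99), (255, 255), (4096, 1)]:
    assert karatsuba_step(x, y, 6) == x * y
```

The products by powers of b are shifts in a bit-level implementation, which is why only three true multiplications remain.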

Sketch of Karatsuba

bit[N*2] k<int N>(bit[N] x, bit[N] y) implements mult {
    if (N<=1) return x*y;
    bit[N/2] x1 = x[0:N/2-1];
    bit[N/2+1] x2 = x[N/2:N-1];
    bit[N/2] y1 = y[0:N/2-1];
    bit[N/2+1] y2 = y[N/2:N-1];
    bit[2*N] t11 = x1 * y1;
    bit[2*N] t12 = poly(1, x1, x2, y1, y2) * poly(1, x1, x2, y1, y2);
    bit[2*N] t22 = x2 * y2;
    return multPolySparse<2*N>(2, N/2, t11)    // log b = N/2
         + multPolySparse<2*N>(2, N/2, t12)
         + multPolySparse<2*N>(2, N/2, t22);
}

bit[2*N] poly<int N>(int n, bit[N] x0, x1, x2, x3) {
    if (n<=0) return ??;
    else return (??*x0 + ??*x1 + ??*x2 + ??*x3) * poly<N>(n-1, x0, x1, x2, x3);
}

bit[2*N] multPolySparse<int N>(int n, int x, bit[N] y) {
    if (n<=0) return 0;
    else return y << x*?? + multPolySparse<N>(n-1, x, y);
}

Semantic view of sketches
a sketch represents a set of functions:
the ?? operator modeled as reading from an oracle

int f (int y) {
    x = ??;
    loop (x) {
        y = y + ??;
    }
    return y;
}

int f (int y, bit[][K] oracle) {
    x = oracle[0];
    loop (x) {
        y = y + oracle[1];
    }
    return y;
}

Synthesizer must find oracle satisfying f implements g
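The oracle view can be mimicked in Python by passing the hole values in as an explicit argument. In this minimal sketch the spec g(y) = y + 6 is a hypothetical choice, oracle entries are small integers rather than bit vectors, and the search is plain enumeration. It lists every oracle for which f agrees with g, which shows that the sketch denotes a set of functions and that several oracles can satisfy the same spec.

```python
from itertools import product

def g(y):
    """Hypothetical spec that the sketch must implement."""
    return y + 6

def f(y, oracle):
    """The sketch with each ?? replaced by a read from the oracle."""
    x = oracle[0]             # first hole: loop trip count
    for _ in range(x):
        y = y + oracle[1]     # second hole: increment per iteration
    return y

def find_oracles(max_value=7, inputs=range(-4, 5)):
    """Enumerate oracles (pairs of small constants) for which f matches g."""
    return [o for o in product(range(max_value + 1), repeat=2)
            if all(f(y, o) == g(y) for y in inputs)]

print(find_oracles())  # every pair whose product is 6
```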

Synthesis algorithm: overview
    translation: represent spec and sketch as circuits
    synthesis: find suitable oracle
    code generation: specialize sketch wrt oracle

Ex: Population count. 0010 0110 → 3

int pop (bit[W] x) {
    int count = 0;
    for (int i = 0; i < W; i++) {
        if (x[i]) count++;
    }
    return count;
}

[Figure: the loop translated into a circuit F(x) = count, drawn as a chain of "+ 1" adders and multiplexers (mux) that update count.]
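One way to picture the translation step is to unroll the loop into an expression tree of adders and multiplexers, then evaluate that tree. The Python below is a sketch under assumptions: the tuple-based node encoding and the names build_pop_circuit and evaluate are hypothetical, and W = 8 is chosen only for the demonstration.

```python
W = 8  # word size assumed for this illustration

def build_pop_circuit(width):
    """Unroll the pop loop: one adder and one mux per input bit."""
    count = ("const", 0)
    for i in range(width):
        incremented = ("add", count, ("const", 1))
        # mux selects count+1 when bit i is set, otherwise the old count
        count = ("mux", ("bit", i), incremented, count)
    return count

def evaluate(node, x):
    """Evaluate a circuit node on the input word x."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "bit":
        return (x >> node[1]) & 1
    if kind == "add":
        return evaluate(node[1], x) + evaluate(node[2], x)
    if kind == "mux":
        selected = node[2] if evaluate(node[1], x) else node[3]
        return evaluate(selected, x)
    raise ValueError(f"unknown node kind: {kind}")

circuit = build_pop_circuit(W)
print(evaluate(circuit, 0b00100110))  # number of 1-bits in the input
```

Because the input size is fixed, the loop disappears entirely and the result is a finite, loop-free function of the input bits, which is what the finiteness restriction buys.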

Synthesis as generalized SAT
The sketch synthesis problem is an instance of 2QBF:
    ∃ o . ∀ x . P(x) = S(x, o)
Counter-example driven solver:

    I = {}
    x = random()
    do
        I = I ∪ {x}
        c = synthesizeForSomeInputs(I)    // solve S(x1, c)=P(x1) ∧ … ∧ S(xk, c)=P(xk), where I = { x1, x2, …, xk }
        if c = nil then exit("buggy sketch")
        x = verifyForAllInputs(c)         // x: counter-example, i.e. S(x, c) ≠ P(x)
    while x != nil
    return c
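The counter-example driven loop can be sketched in Python on the isolate-rightmost-0 example. This is an illustration only: both the inductive synthesis step and the verification step are done here by exhaustive search over 8-bit values, where the actual system would use SAT solving, and the helper names are adapted from the pseudocode rather than taken from any real API.

```python
import random

W = 8
MASK = (1 << W) - 1

def P(x):
    """Spec: isolate the rightmost 0-bit of x (0 if x is all ones)."""
    for i in range(W):
        if not (x >> i) & 1:
            return 1 << i
    return 0

def S(x, c):
    """Sketch ~(x + ??) & (x + ??) with the holes given by the pair c."""
    return ~(x + c[0]) & (x + c[1]) & MASK

def synthesize_for_some_inputs(inputs):
    """Find a candidate that agrees with the spec on the inputs seen so far."""
    for c0 in range(1 << W):
        for c1 in range(1 << W):
            if all(S(x, (c0, c1)) == P(x) for x in inputs):
                return (c0, c1)
    return None

def verify_for_all_inputs(c):
    """Return a counter-example x with S(x, c) != P(x), or None if c is correct."""
    for x in range(1 << W):
        if S(x, c) != P(x):
            return x
    return None

def cegis(seed=0):
    rng = random.Random(seed)
    inputs = []
    x = rng.randrange(1 << W)
    while x is not None:
        inputs.append(x)                      # I = I ∪ {x}
        c = synthesize_for_some_inputs(inputs)
        if c is None:
            raise ValueError("buggy sketch")  # no candidate fits the observed inputs
        x = verify_for_all_inputs(c)
    return c

print(cegis())
```

Each counter-example rules out at least the current candidate, so the loop terminates; in practice only a handful of inputs are needed before the candidate generalizes.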

Case study
Implemented AES
    the modern block-cipher standard
    14 rounds: each has table lookup, permutation, GF-multiply
    a good implementation collapses each round into table lookups
Our results
    we synthesized a 32 Kbit oracle!
    synthesis time: about 1 hour
    counterexample-driven synthesizer iterated 655 times
    performance of synthesized code within 10% of hand-tuned

Finite programs
In theory, SKETCH is complete for all finite programs:
    specification can specify any finite program
    sketch can describe any implementation over given instructions
    synthesizer can resolve any sketch
In practice, SKETCH scales for small finite programs
    small finite programs: block ciphers, small kernels
    large finite: big-integer multiplication, matrix multiplication
Solution:
    synthesize for a small input size
    prove (or examine) that result of synthesis works for bigger inputs

Lossless abstraction
Problem
    does result of synthesis for a small matrix work for all matrices?
Approach
    spec, sketch have unbounded input/output
    abstract them into finite functions, with the same abstraction
    synthesize
    obtained oracle works for original sketch
Stencil kernels
    concrete: matrix A[N] → matrix B[N]
    abstract: A[e(i)], i, N → B[i]

Example: divide and conquer parallelization
Parallel algorithm: data rearrangement + parallel computation
    spec: sequential version of the program
    sketch: parallel computation
    automatically synthesized: rearranging the data (dividing the data structure)