Automating String Processing in Spreadsheets using Input-Output Examples Sumit Gulwani Microsoft Research, Redmond.

Slides:



Advertisements
Similar presentations
Algorithm Analysis Input size Time I1 T1 I2 T2 …
Advertisements

Synthesizing Number Transformations from Input-Output Examples Rishabh Singh and Sumit Gulwani.
From Verification to Synthesis Sumit Gulwani Microsoft Research, Redmond August 2013 Marktoberdorf Summer School Lectures: Part 1.
Representing Boolean Functions for Symbolic Model Checking Supratik Chakraborty IIT Bombay.
1 CS101 Introduction to Computing Lecture 17 Algorithms II.
Sumit Gulwani Microsoft Research, Redmond Dimensions in Program Synthesis ACM Symposium on Principles and Practice of Declarative.
Dimensions in Synthesis Part 2: Applications (Intelligent Tutoring Systems) Sumit Gulwani Microsoft Research, Redmond May 2012.
FlashExtract : A General Framework for Data Extraction by Examples
Fundamentals of Python: From First Programs Through Data Structures
Outline Introduction Related work on packet classification Grouper Performance Empirical Evaluation Conclusions.
Learning Semantic String Transformations from Examples Rishabh Singh and Sumit Gulwani.
Data Manipulation using Programming by Examples and Natural Language Invited Upenn April 2015 Sumit Gulwani.
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. Problem Reduction Dr. M. Sakalli Marmara Unv, Levitin ’ s notes.
Computational Complexity 1. Time Complexity 2. Space Complexity.
Program Verification as Probabilistic Inference Sumit Gulwani Nebojsa Jojic Microsoft Research, Redmond.
TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A Sumit Gulwani (MSR Redmond) Component-based Synthesis Susmit Jha.
Data Structures Introduction. What is data? (Latin) Plural of datum = something given.
Usable Synthesis Sumit Gulwani Microsoft Research, Redmond Usable Verification Workshop November 2010 MSR Redmond.
CS 206 Introduction to Computer Science II 12 / 10 / 2008 Instructor: Michael Eckmann.
VS 3 : Verification and Synthesis using SMT Solvers SMT Solvers for Program Verification Saurabh Srivastava * Sumit Gulwani ** Jeffrey S. Foster * * University.
Program Synthesis for Automating Education Sumit Gulwani Microsoft Research, Redmond.
11 -1 Chapter 11 Randomized Algorithms Randomized algorithms In a randomized algorithm (probabilistic algorithm), we make some random choices.
1 Lecture 10 – Synthesis from Examples Eran Yahav.
Dimensions in Synthesis Sumit Gulwani Microsoft Research, Redmond May 2012.
Programming by Example using Least General Generalizations Mohammad Raza, Sumit Gulwani & Natasa Milic-Frayling Microsoft Research.
Silvio Cesare Ph.D. Candidate, Deakin University.
Cultivating Research Taste (illustrated via a journey in Program Synthesis research) Programming Languages Mentoring Workshop 2015 Sumit Gulwani Microsoft.
Smten: Automatic Translation of High-level Symbolic Computations into SMT Queries Richard Uhler (MIT-CSAIL) and Nirav Dave (SRI International) CAV 2013.
Comp 249 Programming Methodology Chapter 15 Linked Data Structure - Part B Dr. Aiman Hanna Department of Computer Science & Software Engineering Concordia.
From Program Verification to Program Synthesis Saurabh Srivastava * Sumit Gulwani ♯ Jeffrey S. Foster * * University of Maryland, College Park ♯ Microsoft.
Programming by Examples Marktoberdorf Lectures August 2015 Sumit Gulwani.
End-User Programming (using Examples & Natural Language) Sumit Gulwani Microsoft Research, Redmond August 2013 Marktoberdorf Summer.
Generative Programming Meets Constraint Based Synthesis Armando Solar-Lezama.
Dimensions in Synthesis Part 3: Ambiguity (Synthesis from Examples & Keywords) Sumit Gulwani Microsoft Research, Redmond May 2012.
11 -1 Chapter 11 Randomized Algorithms Randomized Algorithms In a randomized algorithm (probabilistic algorithm), we make some random choices.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Synthesis with the Sketch System D AY 1 Armando Solar-Lezama.
Logic (continuation) Boolean Logic and Bit Operations.
Software Testing Input Space Partition Testing. 2 Input Space Coverage Four Structures for Modeling Software Graphs Logic Input Space Syntax Use cases.
Integrating high-level constructs into programming languages Language extensions to make programming more productive Underspecified programs –give assertions,
CS 460/660 Compiler Construction. Class 01 2 Why Study Compilers? Compilers are important – –Responsible for many aspects of system performance Compilers.
FlashNormalize: Programming by Examples for Text Normalization International Joint Conference on Artificial Intelligence, Buenos Aires 7/29/2015FlashNormalize1.
SNU OOPSLA Lab. 1 Great Ideas of CS with Java Part 1 WWW & Computer programming in the language Java Ch 1: The World Wide Web Ch 2: Watch out: Here comes.
Predicting a Correct Program in PBE Rishabh Singh, Microsoft Research Sumit Gulwani, Microsoft Research.
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Compositional Program Synthesis from Natural Language and Examples Mohammad Raza, Sumit Gulwani & Natasa Milic-Frayling Microsoft.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
Nikolaj Bjørner Microsoft Research DTU Winter course January 2 nd 2012 Organized by Flemming Nielson & Hanne Riis Nielson.
FlashMeta Microsoft PROSE SDK: A Framework for Inductive Program Synthesis Oleksandr Polozov University of Washington Sumit Gulwani Microsoft Research.
CES 592 Theory of Software Systems B. Ravikumar (Ravi) Office: 124 Darwin Hall.
Why do we study algorithms?. 2 First results are about bats and dolphins.
Programming by Examples Marktoberdorf Lectures August 2015 Sumit Gulwani.
Dagstuhl Seminar Oct 2015 Sumit Gulwani Applications of Inductive Programming in Data Wrangling.
1 COMS 261 Computer Science I Title: C++ Fundamentals Date: September 15, 2004 Lecture Number: 11.
Programming by Examples applied to Data Wrangling Invited SYNT July 2015 Sumit Gulwani.
Deductive Techniques for synthesis from Inductive Specifications Dagstuhl Seminar Oct 2015 Sumit Gulwani.
Sumit Gulwani Spreadsheet Programming using Examples Keynote at SEMS July 2016.
Sumit Gulwani Programming by Examples Applications, Algorithms & Ambiguity Resolution Keynote at IJCAR June 2016.
Tackling Ambiguity in PBE Rishabh Singh
Outline Core Synthesis Architecture [1 hour by Sumit]
Everything is a number Everything in a computer memory and on storages is a number. Number  Number Characters  Number by ASCII code Sounds  Number.
COMP 53 – Week Seven Big O Sorting.
Programming by Examples
Propositional Calculus: Boolean Algebra and Simplification
CO 303 Algorithm Analysis And Design Quicksort
Objective of This Course
Chapter 6 Intermediate-Code Generation
Alan Mishchenko University of California, Berkeley
Intermediate Code Generation
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Presentation transcript:

Automating String Processing in Spreadsheets using Input-Output Examples Sumit Gulwani Microsoft Research, Redmond

1 Automated Program Synthesis Deserves renewed interest today! Natural goal given that computing has become accessible, but most people are not expert programmers. Enabling technology is now available –Better search techniques AI style search techniques. logical reasoning based techniques (SAT/SMT solvers). –Faster machines (good application for multi-cores) State of the art: We can synthesize lines of code.

Our techniques can synthesize a wide variety of algorithms/programs from logic and examples. Undergraduate book algorithms (e.g., sorting, dynamic prog) –[Srivastava/Gulwani/Foster, POPL 2010] Program Inverses (e.g, deserializers from serializers) –[Srivastava/Gulwani/Chaudhuri/Foster, MSR-TR ] Graph Algorithms (e.g., bi-partiteness check) –[Itzhaky/Gulwani/Immerman/Sagiv, OOPSLA 2010] Bit-vector algorithms (e.g., turn-off rightmost one bit) –[Jha/Gulwani/Seshia/Tiwari, ICSE 2010] 2 Recent Success in Program Synthesis

End-Users Algorithm Designers Software Developers Most Useful Target Potential Consumers of Synthesis Technology Pyramid of Technology Users

Demo 4

 Language of String Programs Synthesis Algorithm Ranking Strategy Limitations 5 Outline

Guarded Expression G := Switch((b 1,e 1 ), …, (b n,e n )) String Expression e := Concatenate(f 1, …, f n ) Base Expression f := s // Constant String | SubStr(v i, p 1, p 2 ) Index Expression p := k // Constant Integer | Pos(r 1, r 2, k) // k th position in string whose left/right side matches with r 1 /r 2 Notation: SubStr2(v i,r,k) ´ SubsStr(v i,Pos( ²,r,k),Pos(r, ²,k)) –Denotes k th occurrence of regular expression r in v i 6 Language for Constructing Output Strings

7 Example Switch((b 1, e 1 ), (b 2, e 2 )), where b 1 ´ Match(v 1,NumTok,3), b 2 ´ : Match(v 1,NumTok,3), e 1 ´ Concatenate(SubStr2(v 1,NumTok,1), ConstStr(“-”), SubStr2(v 1,NumTok,2), ConstStr(“-”), SubStr2(v 1,NumTok,3)) e 2 ´ Concatenate(ConstStr(“425-”),SubStr2(v 1,NumTok,1), ConstStr(“-”),SubStr2(v 1,NumTok,2)) Format phone numbers Input v 1 Output (425)

Language of String Programs  Synthesis Algorithm Ranking Strategy Limitations 8 Outline

Reduction requires computing all solutions for each of the sub-problems: –This also allows to rank various solutions and select the highest ranked solution at the top-level. –A challenge here is to efficiently represent, compute, and manipulate huge number of such solutions. I will show three applications of this idea in the talk. –Read the paper for more tricks! 9 Key Synthesis Idea: Divide and Conquer Reduce the problem of synthesizing expressions into sub-problems of synthesizing sub-expressions.

10 Synthesizing Guarded Expression Goal: Given input-output pairs: (i 1,o 1 ), (i 2,o 2 ), (i 3,o 3 ), (i 4,o 4 ), find P such that P(i 1 )=o 1, P(i 2 )=o 2, P(i 3 )=o 3, P(i 4 )=o 4. Algorithm: 1. Learn set S 1 of string expressions s.t. 8 e in S 1, [[e]] i 1 = o 1. Similarly compute S 2, S 3, S 4. Let S = S 1 Å S 2 Å S 3 Å S 4. 2(a) If S ≠ ; then result is Switch((true,S)). Application #1: We reduce the problem of learning guarded expression P to the problem of learning string expressions for each input-output pair.

11 Example: Various choices for a String Expression Input Output Constant

Number of all possible string expressions (that can construct a given output string o 1 from a given input string i 1 ) is exponential in size of output string. –# of substrings is just quadratic in size of output string! –We use a DAG based data-structure, and it supports efficient intersection operation! 12 Synthesizing String Expressions Application #2: To represent/learn all string expressions, it suffices to represent/learn all base expressions for each substring of the output.

Various ways to extract “706” from “ ”: Chars after 1 st hyphen and before 2 nd hyphen. Substr(v 1, Pos(HyphenTok, ²,1), Pos( ²,HyphenTok,2)) Chars from 2 nd number and up to 2 nd number. Substr(v 1, Pos( ²,NumTok,2), Pos(NumTok, ²,2)) Chars from 2 nd number and before 2 nd hyphen. Substr(v 1, Pos( ²,NumTok,2), Pos( ²,HyphenTok,2)) Chars from 1 st hyphen and up to 2 nd number. Substr(v 1, Pos(HyphenTok, ²,1), Pos( ²,HyphenTok,2))  13 Example: Various choices for a SubStr Expression

The number of SubStr(v,p 1,p 2 ) expressions that can extract a given substring w from a given string v can be large! –This allows for representing and computing O(n 1 *n 2 ) choices for SubStr using size/time O(n 1 +n 2 ). 14 Synthesizing SubStr Expressions Application #3: To represent/learn all SubStr expressions, we can independently represent/learn all choices for each of the two index expressions.

15 Back to Synthesizing Guarded Expression Goal: Given input-output pairs: (i 1,o 1 ), (i 2,o 2 ), (i 3,o 3 ), (i 4,o 4 ), find P such that P(i 1 )=o 1, P(i 2 )=o 2, P(i 3 )=o 3, P(i 4 )=o 4. Algorithm: 1.Learn set S 1 of string expressions s.t. 8 e in S 1, [[e]] i 1 = o 1. Similarly compute S 2, S 3, S 4. Let S = S 1 Å S 2 Å S 3 Å S 4. 2(a). If S ≠ ; then result is Switch((true,S)). 2(b). Else find a smallest partition, say {S 1,S 2 }, {S 3,S 4 }, s.t. S 1 Å S 2 ≠ ; and S 3 Å S 4 ≠ ;. 3. Learn boolean formulas b 1, b 2 s.t. b 1 maps i 1, i 2 to true and i 3, i 4 to false. b 2 maps i 3, i 4 to true and i 1, i 2 to false. 4. Result is: Switch((b 1,S 1 Å S 2 ), (b 2,S 3 Å S 4 ))

Language of String Programs Synthesis Algorithm  Ranking Strategy Limitations 16 Outline

Prefer shorter programs. –Fewer number of conditionals. –Shorter string expression, regular expressions. Prefer programs with less number of constants. 17 Ranking Strategy

Language of String Programs Synthesis Algorithm Ranking Strategy  Limitations 18 Outline

This paper: Syntactic Manipulation of strings Extension 1: Semantic Manipulation of strings –Joint work with intern Rishabh Singh (MIT) Extension 2: Layout Manipulation of tables –Joint work with intern Bill Harris (UW-Madison) 19 Limitations and Follow-up Work

Demo 20

Problem: End-user Programming Solution: Program Synthesis with inter-disciplinary inspirations Programming Languages –Design of an expressive language that can succinctly represent string computations and is amenable to learning. Machine Learning –Version space algebra for learning straight-line code. –Boolean classification technique for learning control flow. HCI –Input-output based interaction model –Several usability features: Ranking scheme, Feedback to user, Quick Convergence, Noise tolerance. 21 Conclusion