End-User Programming (using Examples & Natural Language) Sumit Gulwani Microsoft Research, Redmond August 2013 Marktoberdorf Summer School Lectures: Part 2

Presentation transcript:

End-User Programming (using Examples & Natural Language) Sumit Gulwani Microsoft Research, Redmond August 2013 Marktoberdorf Summer School Lectures: Part 2

Potential Users of Synthesis Technology
– Students and Teachers
– End-Users
– Algorithm Designers
– Software Developers
Most useful target: End-Users.
Vision for End-users: Enable people to have (automated) personal assistants.

Generic Methodology for End User Programming
– Problem Definition: Identify a vertical domain of tasks that users struggle with.
– Domain-Specific Language (DSL): Design a DSL that can succinctly describe tasks in that domain.
– Synthesis Algorithm: Develop an algorithm that can efficiently translate intent into likely concepts in DSL.
– Machine Learning: Rank the various concepts.
– User Interface: Provide an appropriate interaction mechanism to resolve ambiguities.
Reference: CACM 2012: “Spreadsheet Data Manipulation using Examples”, Gulwani, Harris, Singh

Syntactic String Transformations (from Examples)
Flash Fill feature in Excel 2013
Reference: Automating String Processing in Spreadsheets using Input-Output Examples, POPL 2011, Gulwani

Demo

Syntactic String Transformations: Language
Guarded Expr   G := Switch((b1,e1), …, (bn,en))
Boolean Expr   b := c1 ∧ … ∧ cn
Predicate      c := Match(v_i, r, k)
Trace Expr     e := Concatenate(f1, …, fn)
Base Expr      f := s                  // Constant String
                  | SubStr(v_i, p1, p2)
Position Expr  p := k                  // Constant Integer
                  | Pos(r1, r2, k)     // k-th position in string whose left/right side matches r1/r2
Regular Expr   r := TokenSeq(T1, ..., Tn)
Notation: SubStr2(v_i, r, k) ≡ SubStr(v_i, Pos(ε, r, k), Pos(r, ε, k))
–Denotes the k-th occurrence of regular expression r in v_i
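To make the position and substring operators concrete, here is a minimal Python sketch of an evaluator for Pos, SubStr and SubStr2. It is an illustration under simplifying assumptions, not the Flash Fill implementation: a regular expression is a single named token rather than a token sequence, tokens are matched as maximal non-overlapping runs, and the token table, the function names and the sample string are all made up for this example.

```python
import re

# Hypothetical token table (an assumption; the real token set is larger).
TOKENS = {"NumTok": r"[0-9]+", "HyphenTok": r"-"}

def boundaries(s, tok):
    """Return (start positions, end positions) of the matches of tok in s.
    The token name "Eps" stands for the empty sequence and matches everywhere."""
    if tok == "Eps":
        every = set(range(len(s) + 1))
        return every, every
    matches = list(re.finditer(TOKENS[tok], s))
    return {m.start() for m in matches}, {m.end() for m in matches}

def pos(s, r1, r2, k):
    """k-th (1-based) position with a match of r1 ending there and a match of r2 starting there."""
    _, ends_of_r1 = boundaries(s, r1)
    starts_of_r2, _ = boundaries(s, r2)
    hits = sorted(ends_of_r1 & starts_of_r2)
    if not 1 <= k <= len(hits):
        raise ValueError("no such position")
    return hits[k - 1]

def substr(s, p1, p2):
    """SubStr(s, p1, p2): the characters of s between positions p1 and p2."""
    return s[p1:p2]

def substr2(s, r, k):
    """SubStr2(s, r, k) = SubStr(s, Pos(Eps, r, k), Pos(r, Eps, k)): k-th occurrence of r."""
    return substr(s, pos(s, "Eps", r, k), pos(s, r, "Eps", k))

if __name__ == "__main__":
    print(substr2("425-706-7709", "NumTok", 2))
```

Treating tokens as maximal runs is what lets SubStr2 read as "the k-th number" rather than "any digit run starting at the k-th digit position".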

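Substring Operator
Let w = SubString(s, p, p’) where p = Pos(r1, r2, k) and p’ = Pos(r1’, r2’, k’).
[Figure: string s with positions p and p’ delimiting w; w1 and w2 are the text immediately to the left and right of p, and w1’ and w2’ the text immediately to the left and right of p’.]
r1 matches w1, r2 matches w2, r1’ matches w1’, r2’ matches w2’.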

Syntactic String Transformations: Example
Format phone numbers
[Table: Input v1 / Output. The example rows were lost in extraction; only the fragment “(425)” survives.]
Switch((b1, e1), (b2, e2)), where
b1 ≡ Match(v1, NumTok, 3),
b2 ≡ ¬Match(v1, NumTok, 3),
e1 ≡ Concatenate(SubStr2(v1,NumTok,1), ConstStr(“-”), SubStr2(v1,NumTok,2), ConstStr(“-”), SubStr2(v1,NumTok,3))
e2 ≡ Concatenate(ConstStr(“425-”), SubStr2(v1,NumTok,1), ConstStr(“-”), SubStr2(v1,NumTok,2))
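As a reading aid, the Switch program on this slide can be hand-translated into a few lines of Python. This is a sketch, not synthesizer output: it assumes that Match(v1, NumTok, 3) means "v1 contains at least three numbers", the function name is hypothetical, and the sample inputs are invented because the slide's own table did not survive.

```python
import re

def format_phone(v1):
    """Hand translation of Switch((b1, e1), (b2, e2)) from the slide."""
    nums = re.findall(r"[0-9]+", v1)  # nums[k-1] plays the role of SubStr2(v1, NumTok, k)
    if len(nums) >= 3:                # b1: Match(v1, NumTok, 3)
        return nums[0] + "-" + nums[1] + "-" + nums[2]   # e1
    return "425-" + nums[0] + "-" + nums[1]              # b2 holds, so use e2

if __name__ == "__main__":
    for raw in ["(425) 706-7709", "706-7709"]:
        print(raw, "->", format_phone(raw))
```

The second branch supplies the constant area code only when the input has fewer than three numbers, which is why the guard is needed at all.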

Key Synthesis Idea: Divide and Conquer
Reduce the problem of synthesizing expressions into sub-problems of synthesizing sub-expressions.
Reduction requires computing all solutions for each of the sub-problems:
–This also makes it possible to rank the various solutions and select the highest-ranked solution at the top level.
–A challenge here is to efficiently represent, compute, and manipulate a huge number of such solutions.
Three applications of this idea in the talk.
–Read the paper for more tricks!

Synthesizing Guarded Expression
Goal: Given input-output pairs (i1,o1), (i2,o2), (i3,o3), (i4,o4), find P such that P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4.
Application #1: We reduce the problem of learning guarded expression P to the problem of learning trace expressions for each input-output pair.
Algorithm:
1. Learn set S1 of string expressions s.t. ∀ e in S1, [[e]] i1 = o1. Similarly compute S2, S3, S4. Let S = S1 ∩ S2 ∩ S3 ∩ S4.
2(a). If S ≠ ∅ then result is Switch((true,S)).

Too many choices for a Trace Expression
[Figure: an input string and an output string, with pieces of the output taken either from the input or from constants.]

Synthesizing Trace Expressions
The number of all possible trace expressions (that can construct a given output string o1 from a given input string i1) is exponential in the size of the output string.
Application #2: To represent/learn all string expressions, it suffices to represent/learn all base expressions for each substring of the output.
–The number of substrings is just quadratic in the size of the output string!
–We use a DAG-based data structure, and it supports an efficient intersection operation!
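The gap between "exponentially many programs" and "quadratically many substrings" can be seen in a small Python sketch. It is a simplification and not the paper's data structure: nodes are positions in the output, the edge (i, j) carries every base expression found for out[i:j], and input substrings are recorded by raw indices instead of sets of position expressions. All names are hypothetical.

```python
def build_dag(inp, out):
    """Map each edge (i, j) to the base expressions that produce out[i:j]."""
    n = len(out)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            w = out[i:j]
            labels = [("Const", w)]           # a constant string always works
            start = inp.find(w)
            while start != -1:                # every occurrence of w in the input
                labels.append(("SubStr", start, start + len(w)))
                start = inp.find(w, start + 1)
            edges[(i, j)] = labels
    return edges

def count_programs(edges, n):
    """Count the concatenations (paths from node 0 to node n) without enumerating them."""
    ways = [0] * (n + 1)
    ways[0] = 1
    for j in range(1, n + 1):
        ways[j] = sum(ways[i] * len(edges[(i, j)]) for i in range(j))
    return ways[n]

if __name__ == "__main__":
    out = "425-706"
    dag = build_dag("(425) 706", out)
    print(len(dag), "edges represent", count_programs(dag, len(out)), "trace expressions")
```

Even with no usable input substrings, an output of length n admits 2^(n-1) concatenations of constants, while the DAG still has only n(n+1)/2 edges.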

Too many choices for a SubStr Expression
Various ways to extract “706” from the input string v1:
–Chars after 1st hyphen and before 2nd hyphen. SubStr(v1, Pos(HyphenTok, ε, 1), Pos(ε, HyphenTok, 2))
–Chars from 2nd number and up to 2nd number. SubStr(v1, Pos(ε, NumTok, 2), Pos(NumTok, ε, 2))
–Chars from 2nd number and before 2nd hyphen. SubStr(v1, Pos(ε, NumTok, 2), Pos(ε, HyphenTok, 2))
–Chars from 1st hyphen and up to 2nd number. SubStr(v1, Pos(HyphenTok, ε, 1), Pos(NumTok, ε, 2))

Synthesizing SubStr Expressions
The number of SubStr(v, p1, p2) expressions that can extract a given substring w from a given string v can be large!
Application #3: To represent/learn all SubStr expressions, we can independently represent/learn all choices for each of the two index expressions.
–This allows for representing and computing O(n1*n2) choices for SubStr using size/time O(n1+n2).
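A minimal Python sketch of this factoring follows. The position-expression kinds ("FromStart", "FromEnd", "AfterLast") are invented for the illustration and are much cruder than Pos(r1, r2, k); only the first occurrence of w is considered. The point is the shape of the representation: two independent sets whose product is the program set.

```python
from dataclasses import dataclass

def position_exprs(v, t):
    """All (hypothetical) position expressions that evaluate to index t in v."""
    exprs = {("FromStart", t), ("FromEnd", len(v) - t)}
    if t > 0 and v.rfind(v[t - 1]) == t - 1:
        exprs.add(("AfterLast", v[t - 1]))   # right after the last occurrence of a char
    return frozenset(exprs)

@dataclass(frozen=True)
class SubStrSet:
    starts: frozenset   # n1 choices for p1
    ends: frozenset     # n2 choices for p2

    def size(self):      # number of SubStr programs represented: n1 * n2
        return len(self.starts) * len(self.ends)

    def storage(self):   # what is actually stored: n1 + n2
        return len(self.starts) + len(self.ends)

    def intersect(self, other):
        # Intersecting the two sides independently intersects the program sets.
        return SubStrSet(self.starts & other.starts, self.ends & other.ends)

def learn_substr(v, w):
    start = v.find(w)    # simplification: first occurrence only
    return SubStrSet(position_exprs(v, start), position_exprs(v, start + len(w)))

if __name__ == "__main__":
    first = learn_substr("ab-cd", "cd")
    both = first.intersect(learn_substr("xyz-cd", "cd"))
    print(first.size(), first.storage(), both.size())
```

Adding a second example keeps only the position expressions that work on both inputs, and the intersection costs time proportional to the stored sets rather than to the number of programs.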

Back to Synthesizing Guarded Expression
Goal: Given input-output pairs (i1,o1), (i2,o2), (i3,o3), (i4,o4), find P such that P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4.
Algorithm:
1. Learn set S1 of trace expressions s.t. ∀ e in S1, [[e]] i1 = o1. Similarly compute S2, S3, S4. Let S = S1 ∩ S2 ∩ S3 ∩ S4.
2(a). If S ≠ ∅ then result is Switch((true,S)).
2(b). Else find a smallest partition, say {S1,S2}, {S3,S4}, s.t. S1 ∩ S2 ≠ ∅ and S3 ∩ S4 ≠ ∅.
3. Learn boolean formulas b1, b2 s.t. b1 maps i1, i2 to true and i3, i4 to false, and b2 maps i3, i4 to true and i1, i2 to false.
4. Result is: Switch((b1, S1 ∩ S2), (b2, S3 ∩ S4))
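Steps 1 to 2(b) can be sketched in Python as follows. Two things here are assumptions rather than the talk's method: program sets are plain Python sets of opaque program names, and the partition is found greedily (each example joins the first group it is still consistent with), which does not guarantee a smallest partition. Step 3, learning the guards b1 and b2, is left out.

```python
def partition_examples(program_sets):
    """Group examples so that the program sets within each group share a program."""
    groups = []
    for index, programs in enumerate(program_sets):
        for group in groups:
            common = group["common"] & programs
            if common:                       # still consistent with this group
                group["indices"].append(index)
                group["common"] = common
                break
        else:                                # no existing group fits: open a new one
            groups.append({"indices": [index], "common": set(programs)})
    return groups

if __name__ == "__main__":
    # Hypothetical sets S1..S4 of trace expressions, named A..D for brevity.
    sets = [{"A", "B"}, {"A"}, {"C"}, {"C", "D"}]
    for g in partition_examples(sets):
        print(g["indices"], sorted(g["common"]))
```

If a single group results, that is case 2(a) and the guard is simply true; with two or more groups, each group's common set becomes one branch of the Switch.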

Ranking
General Principles
–Prefer shorter programs: fewer conditionals; shorter string expressions and regular expressions.
–Prefer programs with fewer constants.
Strategies
–Baseline: Pick any minimal-sized program using a minimal number of constants.
–Manual: Break conflicts using a weighted score of various program features.
–Machine Learning: Weights are identified using gradient descent over training data.
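The "Manual" strategy amounts to scoring each candidate by a weighted sum of its features and picking the lowest score. The Python sketch below shows that shape only; the feature names, the weights and the two candidate programs are all invented for the example, and the learned variant would fit the weights from training data instead of fixing them by hand.

```python
# Hypothetical hand-set weights: a lower total score means a preferred program.
WEIGHTS = {"conditionals": 3.0, "constants": 2.0, "length": 1.0}

def score(features):
    """Weighted sum of program features."""
    return sum(weight * features[name] for name, weight in WEIGHTS.items())

def rank(candidates):
    """Sort (program, features) pairs from most to least preferred."""
    return sorted(candidates, key=lambda item: score(item[1]))

if __name__ == "__main__":
    candidates = [
        ("Concatenate(ConstStr('706'))", {"conditionals": 0, "constants": 1, "length": 1}),
        ("SubStr2(v1, NumTok, 2)", {"conditionals": 0, "constants": 0, "length": 1}),
    ]
    print(rank(candidates)[0][0])
```

Penalizing constants is what steers the choice away from programs that merely memorize the example output.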

Experimental Comparison of various Ranking Strategies

Strategy    Average # of examples required
Baseline    4.17
Manual      2.09
Learning    1.48

Reference: Predicting a correct program in Programming by Example, Technical Report, Singh, Gulwani

Semantic String Transformations (from Examples)
Reference: Learning Semantic String Transformations from Examples, VLDB 2012, Singh, Gulwani

Demo

Semantic String Transformations: Language
[Figure: grammar of the language; only the fragment “| e_t” survives.]

Semantic String Transformations: Example
(Several cell values, including most prices, were lost in extraction and are left as they appear.)

Input v1     Input v2      Output (Price + Markup*Price)
Stroller     10/12/2010    $ *
Bib          23/12/2010    $ *3.56
Diapers      21/1/2011
Wipes        2/4/2009
Aspirator    23/2/2010

MarkupRec Table
Id     Name         Markup
S33    Stroller     30%
B56    Bib          45%
D32    Diapers      35%
W98    Wipes        40%
A46    Aspirator    30%

CostRec Table
Id     Date       Price
S33    12/2010    $
S33    11/2010    $
B56    12/2010    $3.56
D32    1/2011     $21.45
W98    4/2009     $

Semantic String Transformations: Synthesis Algorithm
Idea 1: Suppose the language consists of only select exprs.
–A reachability hyper-graph, where nodes are strings and edges are labeled with the appropriate select expression, represents the set of all programs.
–We use the same trick for synthesizing loop bodies of vectorized code [PPoPP 2013]!
Idea 2: Observe that the synthesis algorithm for syntactic transformations identifies, for each substring of the output, various expressions that can generate it.
–We now account for the possibility that a substring can also be generated by using a select expr.
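Idea 1 can be illustrated with a small reachability search in Python. This is a sketch under clear simplifications: it handles single-column lookups only, keeps one select expression per reached string instead of the whole hyper-graph of alternatives, and the function name and tuple encoding of Select are hypothetical. The rows are taken from the MarkupRec table of the example slide.

```python
MARKUP_REC = [
    {"Id": "S33", "Name": "Stroller", "Markup": "30%"},
    {"Id": "B56", "Name": "Bib", "Markup": "45%"},
]
TABLES = {"MarkupRec": MARKUP_REC}

def reachable(inputs, tables, max_depth=2):
    """Map every string reachable from the inputs to one expression that produces it."""
    found = {s: ("Input", s) for s in inputs}
    frontier = list(inputs)
    for _ in range(max_depth):
        next_frontier = []
        for known in frontier:
            for table_name, rows in tables.items():
                for row in rows:
                    for key_col, key in row.items():
                        if key != known:
                            continue
                        # The row is identified by key_col = known; read its other cells.
                        for col, value in row.items():
                            if value not in found:
                                found[value] = ("Select", col, table_name, key_col, found[known])
                                next_frontier.append(value)
        frontier = next_frontier
    return found

if __name__ == "__main__":
    print(reachable(["Stroller"], TABLES)["30%"])
```

Idea 2 then treats every reachable string as one more source of output substrings, alongside constants and substrings of the inputs.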

Semantic String Transformations: Experimental Results

Table Layout Transformations (from Examples)
Reference: Spreadsheet Table Transformations from Examples, PLDI 2011, Harris, Gulwani

Demo

Table Layout Transformations: Language

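Table Layout Transformation: Example
[Input table: one row each for Andrew, Ben and Carl, with columns Qual 1, Qual 2, Qual 3. Output table: one (name, qualification) row per non-empty cell, three for Andrew, two for Ben and two for Carl. The cell values were lost in extraction.]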

Table Layout Transformations: Synthesis Algorithm
1. For each example, generate the set of all component programs that are consistent with the output table.
–First generate filter programs and then associative programs.
2. Intersect the sets (from step 1) for the various examples.
3. Pick any subset of the resultant set (from step 2) that covers each of the output tables.
This is quite similar to how we synthesize graph algorithms [OOPSLA ‘10], where a program is also a set of sub-programs!

Table Layout Transformations: Experimental Results
Benchmark: 51 Tasks
[Charts: number of benchmark tasks by number of examples required, and number of benchmark tasks by synthesis time. The only surviving data point is 31 tasks at <1 second.]

SmartPhone Scripts (from Natural Language)
Reference: SmartSynth: Synthesizing Smartphone Automation Scripts from Natural Language, MobiSys 2013, Le, Gulwani, Su

Demo

Examples of SmartPhone Scripts
–When I receive an SMS message, reply “I am driving” to the sender.
–Take a picture, add to it the current location and upload to Facebook.
–Silent at night, but ring for important contacts.
–Speak the current weather every morning at 8am.
–Send current location to a friend via SMS.
–Turn off ringer by turning the phone down.

Google AppInventor Programming Model
When I receive an SMS message, reply “I am driving” to the sender.

SmartScript Language

Example
Natural language description: When I receive a new SMS, if the phone is connected to my car’s bluetooth, read the message content and reply to the sender “I’m driving.”
Synthesized script:
    when (number, content) := MessageReceived()
      if (IsConnectedToBTDevice(Car_BT)) then
        Speak(content);
        SendMessage(number, "I'm driving");

Synthesis Approach: Key Insights
Script = Components + Relations/Connections
–Component = API or Entity, where Entity = API return value, constant, or input
–Relation = pair
–as in synthesis of bit-vector algorithms!
Discover components & relations using NLP techniques and type-based synthesis.
–Identify likely set of components & relations using NLP engine.
–Refine components using feedback from synthesis engine.
–Infer missing relations using type-based synthesis.
–Select among multiple candidates using ranking.

Component Discovery
Map all phrases to components.
–as in FlashFill, where we map all substrings in output to corresponding programs!
We use various features to identify such a mapping and its confidence:
–Regular expressions
–Bag of words
–Phrase length
–Punctuation
–Parse tree (NLP parser)

Component Discovery: Example
When I receive a new SMS, if the phone is connected to my car’s bluetooth, read the message content and reply to the sender “I’m driving.”

Phrase                         Desired component mapping
When I receive a new SMS       MessageReceived
if the phone is connected to   IsConnectedToBTDevice
my car’s Bluetooth             Car_BT
read                           Speak
message content                MessageReceived.Text_O
reply                          SendMessage
the sender                     MessageReceived.Number_O
“I’m driving”                  "I'm driving"

Component Discovery: Example (more details)
When I receive a new SMS, if the phone is connected to my car’s bluetooth, read the message content and reply to the sender “I’m driving.”
[Table: Phrase / Initial Component Mapping. Phrases shown include “receive”, “SMS”, “When I receive a new SMS”, “if the phone is connected to”, “my car’s Bluetooth” and “reply”; the candidate components shown include MessageReceived, Received, SendMessage, IsConnectedToWifiNetwork, IsConnectedToBTDevice, Car_BT and Send. The pairing of phrases with candidates was lost in extraction.]
Component mapping is refined by feedback from synthesis engine.

Relation Discovery
Relation between components = pair
Rule-based relation discovery.
–Relative locations of components

C1                       TypeOf(C2)   TypeOf(C3)   Relations
IsConnectedToBTDevice    BT
ReadText                 Text
SendMessage              Number       Text

Missing relations are discovered using type-based synthesis.
In case of multiple high-ranked solutions, interactive Q&A can be performed with the user.

Relation Discovery: Example

Entity                      API Parameter
Car_BT                      IsConnectedToBTDevice.Text_I
MessageReceived.Text_O      Speak.Text_I
MessageReceived.Number_O    SendMessage.Number_I
“I’m driving”               SendMessage.Text_I

Relation Discovery: Interactive Q&A
Distinguishing multiple-choice questions are asked when there are multiple high-ranked alternatives.
–Similar to the idea of “distinguishing input” used in programming (of bit-vector algorithms) by example.
Question: API parameter
Multiple choices: Equally-likely type-consistent entities
Example: What do you want the phone to speak?
A. The received message content
B. “I’m driving”
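Type-based relation inference and the resulting question can be sketched in Python as below. The type names and the parameter name "IsConnectedToBTDevice.Device" are assumptions made for the illustration, and the sketch ignores the NLP evidence and ranking that the real system also uses: a parameter with exactly one type-consistent entity is connected directly, and a parameter with several becomes a multiple-choice question.

```python
# Entities and API parameters from the running example, with assumed types.
ENTITIES = {
    "Car_BT": "BT",
    "MessageReceived.Text": "Text",
    "MessageReceived.Number": "Number",
    '"I\'m driving"': "Text",
}
PARAMETERS = {
    "IsConnectedToBTDevice.Device": "BT",
    "Speak.Text": "Text",
    "SendMessage.Number": "Number",
    "SendMessage.Text": "Text",
}

def infer_relations(parameters, entities):
    """Return (resolved parameter -> entity, ambiguous parameter -> candidate entities)."""
    resolved, questions = {}, {}
    for param, param_type in parameters.items():
        matches = [e for e, entity_type in entities.items() if entity_type == param_type]
        if len(matches) == 1:
            resolved[param] = matches[0]      # unique type-consistent entity
        elif matches:
            questions[param] = matches        # ask the user to pick one
    return resolved, questions

if __name__ == "__main__":
    resolved, questions = infer_relations(PARAMETERS, ENTITIES)
    print(resolved)
    print(questions)
```

With these assumed types, both Text parameters remain ambiguous between the message content and the constant string, which is the situation the question on the slide resolves.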

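Synthesis Architecture
[Diagram: the User gives a Natural Language Description to the NLP Engine, which passes Components + their Relations to the Synthesis Engine. The Synthesis Engine returns Feedback on component mapping to the NLP Engine and produces the Desired Script. The user also receives Feedback on Description and Natural Language Q&A.]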

Results
640 English descriptions for 50 help-forum tasks (Tasker, AppInventor, TouchDevelop)

                       Only NLP features    With synthesis engine
Component Discovery    70%                  90%
Relation Discovery     76%                  100%
Overall                58%                  90%

Results: Component Discovery

Results: Relation Discovery

Results: Overall

Script Generation
After having identified components (colored text below) and relations (colored edges below), we now need to generate a script in the underlying DSL.
    when (number, content) := MessageReceived()
      if (IsConnectedToBTDevice(Car_BT)) then
        Speak(content);
        SendMessage(number, "I'm driving");
See paper for some of these interesting details!

Results: Synthesis Time