Download presentation
Presentation is loading. Please wait.
Published byNorah Norton Modified over 6 years ago
0
Programming by Examples
Sumit Gulwani Alex Polozov PLDI Tutorial June 2016
1
Outline Core Synthesis Architecture [1 hour by Sumit]
Domain-specific Languages Search methodology Ranking function PROSE Framework [1:15 hours by Alex] Concepts: VSAs, Witness Functions Live coding demo Next generation Synthesis [15 min by Sumit] Interactive Predictive Adaptive
2
The PROSE Team Allen Cypher Sumit Gulwani Ranvijay Kumar
Daniel Perelman Vu Le We are hiring interns/full-time! Alex Polozov Mohammad Raza Danny Simmons Adam Smith Abhishek Udupa
3
Machine Learning, Analytics, & Data Science Conference
4/18/2018 3:15 PM Motivation 99% of computer users cannot program! They struggle with simple repetitive tasks. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
4
Spreadsheet help forums
4
5
Typical help-forum interaction
300_w30_aniSh_c1_b w30 300_w5_aniSh_c1_b w5 =MID(B1,5,2) =MID(B1,5,2) =MID(B1,FIND(“_”,$B:$B)+1, FIND(“_”,REPLACE($B:$B,1,FIND(“_”,$B:$B), “”))-1) 5
6
Flash Fill (Excel 2013 feature) demo
“Automating string processing in spreadsheets using input-output examples”; POPL 2011; Sumit Gulwani 6
7
Data Science Class Assignment
To get Started!
8
FlashExtract Demo Ships inside two Microsoft products: Powershell
ConvertFrom-String cmdlet Custom Log, Custom Field “FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani
9
Killer Application: Data Wrangling
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM Killer Application: Data Wrangling Data is the new oil. Data scientists spend 80% time wrangling data. Extraction FlashExtract:Tabular data from log files [PLDI 2014] Tabular data from spreadsheets [PLDI 2015] Transformation FlashFill: Syntactic String Transformations [POPL 2011] Semantic String Transformations [VLDB 2012] Number Transformations [CAV 2013] Date Transformations [POPL 2016] Normalization Transformations [IJCAI 2015] Formatting FlashRelate: Table layout transformations [PLDI 2011] Repetitive editing in Powerpoint [AAAI 2014] © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
10
Programming-by-Examples Architecture
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM Programming-by-Examples Architecture Example-based Intent Program set (a sub-DSL of D) Program Synthesizer DSL D 10 © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
11
Domain-specific Language (DSL)
Balanced Expressiveness Expressive enough to cover wide range of tasks Restricted enough to enable efficient search Restricted set of operators those with small inverse sets Restricted syntactic composition of those operators Natural computation patterns Increased user understanding/confidence Enables selection between programs, editing
12
DSL for String Transformations
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM DSL for String Transformations 𝑇𝑢𝑝𝑙𝑒 𝑆𝑡𝑟𝑖𝑛𝑔 𝑥 1 ,…,𝑆𝑡𝑟𝑖𝑛𝑔 𝑥 𝑛 → 𝑆𝑡𝑟𝑖𝑛𝑔 atomic expression A := input string X := x1 | x2 | … position expression P := K | Pos(X, R1, R2, K) SubStr(X,P,P) Kth position in X whose left/right side matches with R1/R2. “Automating String Processing in Spreadsheets using Input-Output Examples”; POPL 2011; Sumit Gulwani © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
13
Substring Operator Evaluation of Expr SubStr(x,p,p’) , where p=Pos(x,r1,r2,k) & p’=Pos(x,r1’,r2’,k’) matches r1 matches r2 matches r1’ matches r2’ p p’ w x Two special cases: r1 = r2’ = 𝜖 : This describes the substring r2 = r1’ = 𝜖 : This describes the context around the substring General case is very expressive (describes substring & context) Regular exprs are simple (they describe local properties).
14
DSL for String Transformations
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM DSL for String Transformations 𝑇𝑢𝑝𝑙𝑒 𝑆𝑡𝑟𝑖𝑛𝑔 𝑥 1 ,…,𝑆𝑡𝑟𝑖𝑛𝑔 𝑥 𝑛 → 𝑆𝑡𝑟𝑖𝑛𝑔 top-level expr T := if-then-else(B,C,T) | C condition-free expr C := | A atomic expression A := | ConstantString input string X := x1 | x2 | … position expression P := K | Pos(X, R1, R2, K) Boolean expression B := … Concatenate(A,C) SubStr(X,P,P) Kth position in X whose left/right side matches with R1/R2. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
15
FlashExtract DSL 𝑆𝑡𝑟𝑖𝑛𝑔 𝑑→𝐿𝑖𝑠𝑡(𝑃𝑜𝑠𝑃𝑎𝑖𝑟) Seq expr E := Map(N, 𝜆z: S[z])
| Merge(T1, T2) some lines N := FilterByPosition(L, init, iter) | Filter(L, 𝜆z: F) line filter function F := Contains(z,r,K) | startsWith(z,r) all lines L := Split(d,”\n”) [z]) | Filter(L, 𝜆y: F[prevLine(y)]) [z] substr expr S[z] := “FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani
16
Search Methodology Goal: Set of expr of kind 𝑒 that satisfies spec 𝜙 [denoted 𝑒⊨𝜙 ] 𝑒: DSL (top-level) expression 𝜙: Conjunction of (input state 𝜎 , output value 𝑣) [denoted 𝜎⇝𝑣] “FlashMeta: A Framework for Inductive Program Synthesis”; [OOPSLA 2015] Alex Polozov, Sumit Gulwani
17
Search Methodology Goal: Set of expr of kind 𝑒 that satisfies spec 𝜙
[denoted 𝑒⊨𝜙 ] 𝑒: DSL (top-level) expression 𝜙: Conjunction of (input state 𝜎 , output value 𝑣) [denoted 𝜎⇝𝑣] Methodology: Based on divide-and-conquer style problem decomposition. 𝑒⊨𝜙 is reduced to simpler problems (over sub-expressions of e or sub-constraints of 𝜙). Top-down (as opposed to bottom-up enumerative search). “FlashMeta: A Framework for Inductive Program Synthesis”; [OOPSLA 2015] Alex Polozov, Sumit Gulwani
18
Problem Reduction Rules
𝑒⊨ 𝜙 1 ∧ 𝜙 2 = 𝐼𝑛𝑡𝑒𝑟𝑠𝑒𝑐𝑡( 𝑒⊨ 𝜙 1 , 𝑒⊨ 𝜙 2 ) An alternative strategy: 𝑒⊨ 𝜙 1 ∧ 𝜙 2 = 𝐹𝑖𝑙𝑡𝑒𝑟( 𝑒⊨ 𝜙 1 , 𝜙 2 ) Let 𝑒 be a non-terminal defined as 𝑒 ≔ 𝑒 1 | 𝑒 2 𝑒⊨𝜙 = 𝑈𝑛𝑖𝑜𝑛( 𝑒 1 ⊨𝜙 , 𝑒 2 ⊨𝜙 )
19
Problem Reduction Rules
Inverse Set: Let F be an n-ary operator. 𝐹 −1 𝑣 = 𝑢 1 ,…, 𝑢 𝑛 𝐹 𝑢 1 ,…, 𝑢 𝑛 =𝑣} 𝐶𝑜𝑛𝑐𝑎 𝑡 −1 "Abc" = { "Abc",ϵ , ("𝐴𝑏","c"), ("A","bc"), (ϵ,"Abc")} 𝐹 𝑒 1 , …,𝑒 𝑛 ⊨𝜎⇝𝑣 = 𝑈𝑛𝑖𝑜𝑛({F e 1 ⊨𝜎⇝ 𝑢 1 , …, 𝑒 𝑛 ⊨𝜎⇝ 𝑢 𝑛 | 𝑢 1 ,…, 𝑢 𝑛 ∈ 𝐹 −1 𝑣 } 𝐹 𝑆 1 , …, 𝑆 𝑛 denotes 𝐹 𝑒 1 , …,𝑒 𝑛 𝑒 1 ∈ 𝑆 1 ,…, 𝑒 𝑛 ∈ 𝑆 𝑛 } [𝐶𝑜𝑛𝑐𝑎𝑡 𝑋,𝑌 ⊨(𝜎⇝"Abc")] = Union({ 𝐶𝑜𝑛𝑐𝑎𝑡( 𝑋⊨ 𝜎⇝"Abc" , 𝑌⊨ 𝜎⇝𝜖 ), 𝐶𝑜𝑛𝑐𝑎𝑡 𝑋⊨ 𝜎⇝"Ab" , 𝑌⊨ 𝜎⇝"𝑐" , 𝐶𝑜𝑛𝑐𝑎𝑡 𝑋⊨ 𝜎⇝"A" , 𝑌⊨ 𝜎⇝"𝑏𝑐" , 𝐶𝑜𝑛𝑐𝑎𝑡 𝑋⊨ 𝜎⇝ϵ , 𝑌⊨ 𝜎⇝"𝐴𝑏𝑐" })
20
Problem Reduction Rules
Inverse Set: Let F be an n-ary operator. 𝐹 −1 𝑣 = 𝑢 1 ,…, 𝑢 𝑛 𝐹 𝑢 1 ,…, 𝑢 𝑛 =𝑣} 𝐶𝑜𝑛𝑐𝑎 𝑡 −1 "Abc" = { "Abc",ϵ , ("𝐴𝑏","c"), ("A","bc"), (ϵ,"Abc")} 𝐼𝑓𝑇ℎ𝑒𝑛𝐸𝑙𝑠 𝑒 −1 𝑣 = { 𝑡𝑟𝑢𝑒, 𝑣, ⊤ , (𝑓𝑎𝑙𝑠𝑒, ⊤, 𝑣)} 𝑆𝑢𝑏𝑆𝑡 𝑟 −1 "Ab" = …
21
Problem Reduction Rules
Conditional Inverse Set: Let F be an n-ary operator. 𝐹 −1 𝑣 𝑢 1 )= 𝑢 2 ,…, 𝑢 𝑛 𝐹 𝑢 1 ,…, 𝑢 𝑛 =𝑣} 𝑆𝑢𝑏𝑆𝑡 𝑟 −1 "Ab" "Ab cd Ab")= { 0,2 , (6,8) } 𝐹 (𝑒 1 ,…, 𝑒 𝑛 ) ⊨𝜎⇝𝑣 = let 𝐺=𝐺𝑟𝑎𝑚𝑚𝑎𝑟 𝑒 1 in let 𝐺 1 ,…, 𝐺 𝑛 =𝑃𝑎𝑟𝑡𝑖𝑡𝑖𝑜𝑛 𝐺,𝜎 in 𝑈𝑛𝑖𝑜𝑛 𝑗=1..𝑘 𝐹 𝐺, 𝐻 2 , …, 𝐻 𝑛 𝑢 1 =𝐸𝑣𝑎𝑙 𝐺 𝑗 ,𝜎 𝑢 2 , …, 𝑢 𝑛 ∈ 𝐹 −1 𝑣 𝑢 𝐻 2 = 𝑒 2 ⊨𝜎⇝ 𝑢 2 𝐻 𝑛 = 𝑒 𝑛 ⊨𝜎⇝ 𝑢 𝑛 Let 𝜎 be the state 𝑥: “𝐴𝑏 𝑐𝑑 𝐴𝑏” . 𝑆𝑢𝑏𝑆𝑡𝑟 𝑥, 𝑃 1 , 𝑃 2 ⊨ 𝜎⇝"cd" = 𝑆𝑢𝑏𝑆𝑡𝑟 𝑥, 𝑃 1 ⊨ 𝜎⇝3 , 𝑃 2 ⊨ 𝜎⇝5
22
Generalizing output values to output properties
User specification Examples: Conjunction of (input state, output value) Inductive Spec: Conjunction of (input state, output property) Search methodology (Conditional) Inverse sets: Back-propagation of values (Conditional) Witness functions: Back-propagation of properties
23
Problem Reduction list of strings T := Map(L,S) substring fn S := 𝜆y: … list of lines L := Filter(Split(d,"\n"), B) boolean fn B := 𝜆y: … FlashExtract DSL ⋈ Spec for T Spec for L Spec for S ∧
24
Problem Reduction substring expr E := SubStr(y,P1,P2) position expr P := K | Pos(y,R1,R2,K) SubStr grammar Spec for E Spec for P1 ⋈ Spec for P2 Redmond, WA Redmond, WA
25
Programming-by-Examples Architecture
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM Programming-by-Examples Architecture Example-based Intent Program set (a sub-DSL of D) Ranked Program set (a sub-DSL of D) Program Synthesizer Ranking fn DSL D 25 © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
26
Basic ranking scheme Prefer programs with simpler Kolmogorov complexity Prefer fewer constants. Prefer smaller constants. Input Output Rishabh Singh Rishabh Ben Zorn Ben 1st Word If (input = “Rishabh Singh”) then “Rishabh” else “Ben” “Rishabh”
27
Challenges with Basic ranking scheme
Prefer programs with simpler Kolmogorov complexity Prefer fewer constants. Prefer smaller constants. Input Output Rishabh Singh Singh, Rishabh Ben Zorn Zorn, Ben 2nd Word + “, ‘’ + 1st Word “Singh, Rishabh” How to select between Fewer larger constants vs. More smaller constants? Idea: Associate numeric weights with constants.
28
Challenges with Basic ranking scheme
Prefer programs with simpler Kolmogorov complexity Prefer fewer constants. Prefer smaller constants. Input Output Missing page numbers, 1993 1993 64-67, 1995 1995 1st Number from the beginning 1st Number from the end How to select between Same number of same-sized constants? Idea: Examine data features (in addition to program features)
29
Machine learning based ranking scheme
Rank score of a program: Weighted combination of various features. Weights are learned using machine learning. Program features Number of constants Size of constants Features over user data: Similarity of generated output (or even intermediate values) over various user inputs IsYear, Numeric Deviation, Number of characters IsPersonName “Predicting a correct program in Programming by Example”; [CAV 2015] Rishabh Singh, Sumit Gulwani
30
Outline Core Synthesis Architecture [1 hour by Sumit]
Domain-specific Languages Search methodology Ranking function PROSE Framework [1:15 hours by Alex] Concepts: VSAs, Witness Functions Live coding demo Next generation Synthesis [15 min by Sumit] Interactive Predictive Adaptive
31
Outline Core Synthesis Architecture [1 hour by Sumit]
Domain-specific Languages Search methodology Ranking function PROSE Framework [1:15 hours by Alex] Concepts: VSAs, Witness Functions Live coding demo Next generation Synthesis [15 min by Sumit] Interactive Predictive Adaptive
32
Programming-by-Examples Architecture
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM Programming-by-Examples Architecture Refined Intent Example based Intent Ranked Program set (a sub-DSL of D) Intended Program in D Program Synthesizer Debugging Translator Test inputs Ranking fn DSL D Intended Program in R/Python/C#/C++/… 32 © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
33
Interactive Debugging
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM Interactive Debugging Sampling inputs Asking questions Visual explanations of the synthesized program 33 © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
34
Programming-by-Examples Architecture
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM Programming-by-Examples Architecture Refined Intent Example based Intent Ranked Program set (a sub-DSL of D) Intended Program in D Program Synthesizer Debugging Incrementality Translator Test inputs Ranking fn DSL D Intended Program in R/Python/C#/C++/… 34 © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
35
Predictive Intended programs can sometimes be synthesized from just the input. Tabular data extraction, Sort, Join Can save large amount of user effort. User need not provide examples for each of tens of columns.
36
Programming-by-Examples Architecture
Machine Learning, Analytics, & Data Science Conference 4/18/2018 3:15 PM Programming-by-Examples Architecture Refined Intent Example based Intent Ranked Program set (a sub-DSL of D) Intended Program in D Program Synthesizer Debugging Incrementality Translator Test inputs Ranking fn DSL D Intended Program in R/Python/C#/C++/… Learner Interaction history 36 © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
37
Adaptive Learn from past interactions
of the same user (personalized experience). of other users in the enterprise/cloud. The synthesis sessions now require less interaction.
38
Reference [based on Marktoberdorf Summer School 2015 Lecture Notes]
“Programming by Examples (and its applications in Data Wrangling)”, In Verification and Synthesis of Correct and Secure Systems; IOS Press; 2016 [based on Marktoberdorf Summer School 2015 Lecture Notes]
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.