Published by Lesley Barber. Modified over 9 years ago.
1
Cultivating Research Taste
(illustrated via a journey in Program Synthesis research)
Programming Languages Mentoring Workshop 2015
Sumit Gulwani
Microsoft Research, Redmond
2
Dimensions in Research
Problem Definition
– Advisor’s interest and funding, Internship, Course project
– Intersection with your collaborator’s interest
– Next logical advance in your current portfolio
– Talk to potential customers, market surveys
Solution Strategy
– Develop new techniques vs. Apply existing techniques
– Cross-disciplinary
Impact
– Paper, Tool, Awards, Media
– Personal happiness
Cultivating research taste is a journey! Once you develop it, you start on another journey!
3
Program Synthesis
Goal: Synthesize a program in the underlying domain-specific language (DSL) from user intent using some search algorithm.
An old problem, but more significant today.
– Diverse computational platforms & programming languages.
– Enabling technology: Better algorithms & faster machines.
Synthesis can revolutionize end-user programming if we:
target the right set of application domains
– such as Data manipulation
allow the right intent specification mechanism
– Examples, Natural Language
tame the huge search space for real-time interaction
– Domain-specific search algorithms
PPDP 2010 [Invited talk paper]: “Dimensions in Program Synthesis”
4
Graduation Advice (2005)
George Necula, UC Berkeley
You will have too many problems to solve; you can’t pursue them all. Make thoughtful choices.
5
From Program Verification to Program Synthesis
Statement s, Precondition P, Postcondition Q
Forward dataflow analysis: From s, P, compute Q
Backward dataflow analysis: From s, Q, compute P
Program Synthesis: From P, Q, compute s
Nebojsa Jojic, MSR Redmond (2005)
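The three directions can be made concrete on a toy example. The sketch below is an illustration only, not the analyses used in the talk: it assumes a single variable x ranging over 0..7, represents the conditions P and Q as sets of states, and uses a hypothetical three-statement candidate pool for the synthesis direction.

```python
STATES = range(8)  # assumption: one variable x with values 0..7

# Hypothetical candidate statements, each modelled as a function on x.
STATEMENTS = {
    "x := x + 1": lambda x: (x + 1) % 8,
    "x := 2 * x": lambda x: (2 * x) % 8,
    "x := 0": lambda x: 0,
}

def forward(s, P):
    """From s, P compute Q: the set of states reachable from P by running s."""
    return {s(x) for x in P}

def backward(s, Q):
    """From s, Q compute P: the set of states that s maps into Q."""
    return {x for x in STATES if s(x) in Q}

def synthesize(P, Q):
    """From P, Q compute s: every candidate that maps all of P into Q."""
    return [name for name, s in STATEMENTS.items() if forward(s, P) <= Q]

P = {1, 2, 3}
Q = {2, 4, 6}
print(forward(STATEMENTS["x := x + 1"], P))   # states after x := x + 1 from P
print(backward(STATEMENTS["x := x + 1"], Q))  # states from which x := x + 1 lands in Q
print(synthesize(P, Q))                       # statements taking P into Q
```

Verification fixes s and solves for a condition; synthesis fixes both conditions and searches over s, which is why it is the harder search problem.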
6
Synthesis using SAT/SMT Constraint Solvers
Venkie, MSR Bangalore (2006)
Try using SAT solvers, which have been engineered to solve huge instances.
Program synthesis is an extremely hard combinatorial search task!
7
Initial results in program synthesis
Approach: Reduce synthesis to solving SAT/SMT constraints.
Results: Managed to synthesize a wide variety of programs from logic specs.
Bit-vector algorithms (e.g., turn-off rightmost one bit)
– [PLDI 2011, ICSE 2010]
SIMD algorithms (e.g., vectorization of CountIf)
– [PPoPP 2013]
Undergraduate book algorithms (e.g., sorting, dynamic programming)
– [POPL 2010]
Program Inverses (e.g., deserializers from serializers)
– [PLDI 2011]
Graph Algorithms (e.g., bipartiteness check)
– [OOPSLA 2010]
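To give a feel for the bit-vector example, here is a minimal sketch that finds a program for "turn off the rightmost one bit" from a reference specification. It deliberately swaps in brute-force enumeration with exhaustive 8-bit checking in place of a SAT/SMT solver, and the operand and operator pools are hypothetical choices made for illustration.

```python
from itertools import combinations

WIDTH = 8
MASK = (1 << WIDTH) - 1

def turn_off_rightmost_one(x):
    """Reference specification: clear the lowest set bit of x (0 stays 0)."""
    for i in range(WIDTH):
        if (x >> i) & 1:
            return x & ~(1 << i)
    return 0

# Hypothetical building blocks for candidate programs of the form (a) op (b).
OPERANDS = [
    ("x", lambda x: x),
    ("x - 1", lambda x: (x - 1) & MASK),
    ("x + 1", lambda x: (x + 1) & MASK),
    ("-x", lambda x: (-x) & MASK),
    ("~x", lambda x: (~x) & MASK),
]
OPS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}

def search():
    """Return every candidate that agrees with the spec on all 8-bit inputs."""
    found = []
    for (na, fa), (nb, fb) in combinations(OPERANDS, 2):
        for op_name, op in OPS.items():
            if all(op(fa(x), fb(x)) == turn_off_rightmost_one(x)
                   for x in range(1 << WIDTH)):
                found.append(f"({na}) {op_name} ({nb})")
    return found

print(search())
```

Exhaustive checking only works because the toy width is 8 bits; a constraint solver is what makes the same idea feasible at realistic bit widths and program sizes.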
8
Mid-life Awakening (2010)
Software developers
End users
Two orders of magnitude more users
10
Problem Definition: Inspired by Excel help forums
11
Typical help-forum interaction
Input                  Output
300_w5_aniSh_c1_b      w5
300_w30_aniSh_c1_b     w30
Suggested formulas:
=MID(B1,5,2)
=MID(B1,FIND("_",$B:$B)+1, FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
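The two formulas differ in how they locate the substring, which a short Python transliteration makes visible. This is a sketch for illustration: the function names are hypothetical, and Python's 0-based slicing stands in for Excel's 1-based MID.

```python
def fixed_offset(s):
    """Counterpart of =MID(B1,5,2): two characters starting at the 5th."""
    return s[4:6]

def between_underscores(s):
    """Counterpart of the longer formula: the text between the first and
    second underscore."""
    start = s.find("_") + 1          # FIND("_", ...) + 1
    length = s[start:].find("_")     # FIND("_", REPLACE(...)) - 1
    return s[start:start + length]

for cell in ("300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"):
    print(cell, fixed_offset(cell), between_underscores(cell))
```

The fixed-offset version happens to work on the first row but truncates the second to "w3", whereas the delimiter-based version handles both.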
12
Flash Fill (Excel 2013 feature)
14
Flash Fill: Domain Specific Language
Guarded Expression   G := Switch((b_1,e_1), …, (b_n,e_n))
Boolean Expression   b := c_1 ∧ … ∧ c_n
Atomic Predicate     c := Match(v_i, k, r)
Trace Expression     e := Concatenate(f_1, …, f_n)
Atomic Expression    f := s                   // Constant String
                        | SubStr(v_i, p_1, p_2)
                        | Loop(λw: e)
Index Expression     p := k                   // Constant Integer
                        | Pos(r_1, r_2, k)    // k-th position in string whose left/right side matches with r_1/r_2
Regular Expression   r := TokenSequence(T_1, …, T_n)
POPL 2011: “Automating String Processing in Spreadsheets using Input-Output Examples”; Sumit Gulwani.
15
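Substring Operator
Let w = SubString(s, p, p’) where p = Pos(r_1, r_2, k) and p’ = Pos(r_1’, r_2’, k’)
[Figure: string s with positions p and p’ delimiting w; w_1 and w_2 lie to the left and right of p, and w_1’ and w_2’ to the left and right of p’]
r_1 matches w_1, r_2 matches w_2
r_1’ matches w_1’, r_2’ matches w_2’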
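A minimal sketch of this operator follows, under several assumptions that the slide does not state: positions are 0-based, r_1 and r_2 are given as ordinary Python regular expressions (an empty pattern matches anywhere), k is 1-based, and a negative k counts from the right.

```python
import re

def pos(s, r1, r2, k):
    """k-th position t such that r1 matches a suffix of s[:t] and r2 matches
    a prefix of s[t:]."""
    if k == 0:
        raise ValueError("k must be non-zero")
    hits = [t for t in range(len(s) + 1)
            if re.search("(?:" + r1 + r")\Z", s[:t]) and re.match(r2, s[t:])]
    return hits[k - 1] if k > 0 else hits[k]

def substring(s, p, p2):
    """The part of s between positions p and p2."""
    return s[p:p2]

for s in ("300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"):
    p = pos(s, "_", "", 1)    # just after the first underscore
    p2 = pos(s, "", "_", 2)   # just before the second underscore
    print(substring(s, p, p2))
```

Because both positions are found by matching context instead of counting characters, the same program extracts "w5" and "w30" from the two help-forum strings.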
16
Syntactic String Transformations: Example
Format phone numbers
Input v_1           Output
(425)-706-7709      425-706-7709
510.220.5586        510-220-5586
235 7654            425-235-7654
745-8139            425-745-8139
Switch((b_1, e_1), (b_2, e_2)), where
b_1 ≡ Match(v_1, NumTok, 3),
b_2 ≡ ¬Match(v_1, NumTok, 3),
e_1 ≡ Concatenate(SubStr2(v_1, NumTok, 1), ConstStr(“-”), SubStr2(v_1, NumTok, 2), ConstStr(“-”), SubStr2(v_1, NumTok, 3))
e_2 ≡ Concatenate(ConstStr(“425-”), SubStr2(v_1, NumTok, 1), ConstStr(“-”), SubStr2(v_1, NumTok, 2))
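The example program can be run directly once its operators are given a concrete meaning. The sketch below assumes that NumTok denotes a maximal run of digits, that Match(v, r, k) holds when v contains at least k occurrences of r, and that SubStr2(v, r, k) returns the k-th occurrence of r in v; these readings are inferred from how the example uses them.

```python
import re

NUM_TOK = r"[0-9]+"  # assumption: NumTok is a maximal run of digits

def match(v, token, k):
    """Match(v, token, k): v contains at least k occurrences of token."""
    return len(re.findall(token, v)) >= k

def substr2(v, token, k):
    """SubStr2(v, token, k): the k-th (1-based) occurrence of token in v."""
    return re.findall(token, v)[k - 1]

def concatenate(*parts):
    return "".join(parts)

def format_phone(v1):
    if match(v1, NUM_TOK, 3):                      # b_1 selects e_1
        return concatenate(substr2(v1, NUM_TOK, 1), "-",
                           substr2(v1, NUM_TOK, 2), "-",
                           substr2(v1, NUM_TOK, 3))
    return concatenate("425-",                     # b_2 = not b_1 selects e_2
                       substr2(v1, NUM_TOK, 1), "-",
                       substr2(v1, NUM_TOK, 2))

for v in ("(425)-706-7709", "510.220.5586", "235 7654", "745-8139"):
    print(v, "->", format_phone(v))
```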
17
Flash Fill: Search Algorithm
Goal: Given input-output pairs (i_1,o_1), (i_2,o_2), (i_3,o_3), (i_4,o_4), find P such that P(i_1)=o_1, P(i_2)=o_2, P(i_3)=o_3, P(i_4)=o_4.
Algorithm:
1. Learn set S_1 of trace expressions s.t. ∀ e in S_1, [[e]] i_1 = o_1. Similarly compute S_2, S_3, S_4. Let S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
2(a). If S ≠ ∅ then result is S.
Challenge: Each S_j may have a huge number of expressions.
Key Idea: We have a DAG based data-structure that allows for succinct representation and manipulation of S_j.
18
Flash Fill: Search Algorithm
Goal: Given input-output pairs (i_1,o_1), (i_2,o_2), (i_3,o_3), (i_4,o_4), find P such that P(i_1)=o_1, P(i_2)=o_2, P(i_3)=o_3, P(i_4)=o_4.
Algorithm:
1. Learn set S_1 of trace expressions s.t. ∀ e in S_1, [[e]] i_1 = o_1. Similarly compute S_2, S_3, S_4. Let S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
2(a). If S ≠ ∅ then result is S.
2(b). Else find a smallest partition, say {S_1,S_2}, {S_3,S_4}, s.t. S_1 ∩ S_2 ≠ ∅ and S_3 ∩ S_4 ≠ ∅.
3. Learn boolean formulas b_1, b_2 s.t. b_1 maps i_1, i_2 to true, and b_2 maps i_3, i_4 to true.
4. Result is: Switch((b_1, S_1 ∩ S_2), (b_2, S_3 ∩ S_4))
Search Methodology: Reduce learning of an expression to learning of sub-expressions (Divide-and-Conquer!)
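The steps above can be sketched on the phone-number examples. This is a simplified illustration under stated assumptions, not the Flash Fill implementation: the candidate space is a tiny hand-written dictionary of hypothetical programs instead of a DAG of trace expressions, the guards come from a small hypothetical pool of Match predicates, and the partition is built greedily, which does not guarantee a smallest partition.

```python
import re

NUM_TOK = r"[0-9]+"  # assumed reading of NumTok

def nums(v):
    return re.findall(NUM_TOK, v)

# Hypothetical candidate programs standing in for trace expressions.
def three_numbers(v):
    n = nums(v)
    return "-".join(n[:3]) if len(n) >= 3 else None

def prefix_425(v):
    n = nums(v)
    return "425-" + "-".join(n[:2]) if len(n) >= 2 else None

CANDIDATES = {"three_numbers": three_numbers, "prefix_425": prefix_425,
              "identity": lambda v: v}

# Hypothetical guard pool: Match(v, NumTok, k) and its negation, k = 1..3.
GUARDS = {}
for k in (1, 2, 3):
    GUARDS[f"Match(v, NumTok, {k})"] = lambda v, k=k: len(nums(v)) >= k
    GUARDS[f"not Match(v, NumTok, {k})"] = lambda v, k=k: len(nums(v)) < k

def consistent(example):
    """Step 1: names of all candidates e with [[e]] i = o."""
    i, o = example
    return {name for name, e in CANDIDATES.items() if e(i) == o}

def partition(examples):
    """Step 2: greedily group examples whose candidate sets still intersect."""
    groups = []  # each entry: [inputs in the group, surviving candidates]
    for ex in examples:
        s = consistent(ex)
        for g in groups:
            if g[1] & s:
                g[0].append(ex[0])
                g[1] &= s
                break
        else:
            groups.append([[ex[0]], s])
    return groups

def synthesize(examples):
    groups = partition(examples)
    if len(groups) == 1:                  # step 2(a): no Switch needed
        return [("True", groups[0][1])]
    result = []
    for inputs, progs in groups:          # steps 3 and 4
        others = [i for i, _ in examples if i not in inputs]
        guard = next(name for name, b in GUARDS.items()
                     if all(b(i) for i in inputs)
                     and not any(b(i) for i in others))
        result.append((guard, progs))
    return result

EXAMPLES = [("(425)-706-7709", "425-706-7709"), ("510.220.5586", "510-220-5586"),
            ("235 7654", "425-235-7654"), ("745-8139", "425-745-8139")]
for guard, progs in synthesize(EXAMPLES):
    print(guard, "->", sorted(progs))
```

Each returned pair corresponds to one (b, S) branch of the Switch. Learning the guards separately from the branch bodies is the divide-and-conquer step: neither sub-problem needs to know how the other was solved.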
19
Ranking
General Principles
Prefer shorter programs.
– Fewer conditionals.
– Shorter string expressions and regular expressions.
Prefer programs with fewer constants.
Strategies
Baseline: Pick any minimal-sized program using a minimal number of constants.
Machine Learning: Programs are scored using a weighted combination of program features.
– Weights are learned using training data.
Rishabh Singh
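The two strategies can be contrasted in a few lines. The sketch below is an assumption-laden illustration: the three features, the weights, and the candidate programs are all hypothetical (real weights would be learned from training data), and the baseline is read as "smallest size first, then fewest constants".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    conditionals: int
    size: int
    constants: int

def baseline_pick(candidates):
    """Baseline: minimal size, ties broken by the number of constants."""
    return min(candidates, key=lambda c: (c.size, c.constants))

# Hypothetical weights; a learned ranker would fit these to training data.
WEIGHTS = {"conditionals": 3.0, "size": 1.0, "constants": 2.0}

def cost(c):
    """Weighted combination of program features (lower is better)."""
    return (WEIGHTS["conditionals"] * c.conditionals
            + WEIGHTS["size"] * c.size
            + WEIGHTS["constants"] * c.constants)

def weighted_pick(candidates):
    return min(candidates, key=cost)

CANDIDATES = [
    Candidate("all-constants", conditionals=0, size=3, constants=3),
    Candidate("substrings-only", conditionals=0, size=4, constants=0),
]
print(baseline_pick(CANDIDATES).name)
print(weighted_pick(CANDIDATES).name)
```

With these made-up numbers the baseline prefers the shorter program even though it is built entirely from constants, while the weighted score prefers the slightly longer program that generalizes from the input.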
20
Experimental Comparison of various Ranking Strategies
Strategy     Average # of examples required
Baseline     4.17
Learning     1.48
Technical Report: “Predicting a correct program in Programming by Example”; Singh, Gulwani
21
User Interaction Model
Current Flash Fill Model
– Auto-prediction avoids discoverability issue.
– User inspects output and may provide additional examples.
Show programs in any desired language (after conversion from DSL).
– Paraphrase in English.
Computer initiated interactivity
– Highlight less confident entries in the output.
– Ask directed questions based on distinguishing inputs.
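A distinguishing input can be read as an input on which two programs that agree with all examples so far produce different outputs. The sketch below illustrates that reading with two hypothetical candidate programs for the help-forum task; the unlabeled input "300_w7_aniSh_c1_b" is made up for the illustration.

```python
def fixed_offset(s):
    """Hypothetical candidate: two characters starting at offset 4."""
    return s[4:6]

def between_underscores(s):
    """Hypothetical candidate: text between the first two underscores."""
    return s.split("_")[1]

def consistent(programs, examples):
    """Programs that reproduce every example given so far."""
    return [p for p in programs if all(p(i) == o for i, o in examples)]

def distinguishing_inputs(programs, inputs):
    """Inputs on which the surviving programs disagree."""
    return [i for i in inputs if len({p(i) for p in programs}) > 1]

examples = [("300_w5_aniSh_c1_b", "w5")]
unlabeled = ["300_w7_aniSh_c1_b", "300_w30_aniSh_c1_b"]

survivors = consistent([fixed_offset, between_underscores], examples)
ask_about = distinguishing_inputs(survivors, unlabeled)
print(ask_about)
```

Both candidates survive the single example, so the system has no basis for choosing between them until the user confirms the output for the row where they disagree.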
23
Initial Success: Media articles & Blogposts
24
Broader Impact
Defined a new research trajectory, which keeps me busy with a passionate sense of purpose.
– End-user Programming using Examples and Natural Language
– Intelligent Tutoring systems
25
Conclusion
Dimensions in Research
– Problem definition, Solution strategy, Impact
Cultivating research taste is a journey
– Mine involved: “Program analysis” -> “Program synthesis” -> “Program synthesis for end-users using examples”
Once you develop it, you start a new journey
– Mine involves: having fun with cross-disciplinary research in “Frameworks for end-user programming using examples & NL” and “Intelligent Tutoring systems”
26
Backup Slides for Flash Fill Demo