Sumit Gulwani Programming by Examples Applications, Algorithms & Ambiguity Resolution Keynote at IJCAR June 2016.

Slides:



Advertisements
Similar presentations
Synthesizing Number Transformations from Input-Output Examples Rishabh Singh and Sumit Gulwani.
Advertisements

From Verification to Synthesis Sumit Gulwani Microsoft Research, Redmond August 2013 Marktoberdorf Summer School Lectures: Part 1.
MapReduce in Action Team 306 Led by Chen Lin College of Information Science and Technology.
(non-programmers with access to computers)
FlashExtract : A General Framework for Data Extraction by Examples
Learning Semantic String Transformations from Examples Rishabh Singh and Sumit Gulwani.
Data Manipulation using Programming by Examples and Natural Language Invited Upenn April 2015 Sumit Gulwani.
Combining Inductive and Analytical Learning Ch 12. in Machine Learning Tom M. Mitchell 고려대학교 자연어처리 연구실 한 경 수
Parallel Merging Advanced Algorithms & Data Structures Lecture Theme 15 Prof. Dr. Th. Ottmann Summer Semester 2006.
Usable Synthesis Sumit Gulwani Microsoft Research, Redmond Usable Verification Workshop November 2010 MSR Redmond.
Describing Syntax and Semantics
Overview of Search Engines
Programming by Example using Least General Generalizations Mohammad Raza, Sumit Gulwani & Natasa Milic-Frayling Microsoft Research.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Relational Verification to SIMD Loop Synthesis Mark Marron – IMDEA & Microsoft Research Sumit Gulwani – Microsoft Research Gilles Barthe, Juan M. Crespo,
Cultivating Research Taste (illustrated via a journey in Program Synthesis research) Programming Languages Mentoring Workshop 2015 Sumit Gulwani Microsoft.
With Microsoft Office 2007 Intermediate© 2008 Pearson Prentice Hall1 PowerPoint Presentation to Accompany GO! with Microsoft ® Office 2007 Intermediate.
© Janice Regan, CMPT 128, Jan CMPT 128 Introduction to Computing Science for Engineering Students Creating a program.
With Microsoft Access 2007 Volume 1© 2008 Pearson Prentice Hall1 PowerPoint Presentation to Accompany GO! with Microsoft ® Access 2007 Volume 1 Chapter.
Attribute Data in GIS Data in GIS are stored as features AND tabular info Tabular information can be associated with features OR Tabular data may NOT be.
Visual Sequences IQ Tests 1 Dipendra Kumar Misra (Y9201) Mukul Singh (Y9350) Tags : Search, Pattern Recognition, Logic etc Advisor : Dr. Amitabh Mukherjee.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Programming by Examples Marktoberdorf Lectures August 2015 Sumit Gulwani.
Automatically Synthesizing SQL Queries from Input-Output Examples Sai Zhang University of Washington Joint work with: Yuyin Sun.
End-User Programming (using Examples & Natural Language) Sumit Gulwani Microsoft Research, Redmond August 2013 Marktoberdorf Summer.
Dimensions in Synthesis Part 3: Ambiguity (Synthesis from Examples & Keywords) Sumit Gulwani Microsoft Research, Redmond May 2012.
PETRA – the Personal Embedded Translation and Reading Assistant Werner Winiwarter University of Vienna InSTIL/ICALL Symposium 2004 June 17-19, 2004.
Yazd University, Electrical and Computer Engineering Department Course Title: Advanced Software Engineering By: Mohammad Ali Zare Chahooki 1 Introduction.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
MapReduce Kristof Bamps Wouter Deroey. Outline Problem overview MapReduce o overview o implementation o refinements o conclusion.
How to create property volumes
Formal Methods in Invited CBSoft Sep 2015 Sumit Gulwani Data Wrangling & Education.
CMP 131 Introduction to Computer Programming Violetta Cavalli-Sforza Week 3, Lecture 1.
Joey Paquet, 2000, Lecture 2 Lexical Analysis.
Data Structures and Algorithms Dr. Tehseen Zia Assistant Professor Dept. Computer Science and IT University of Sargodha Lecture 1.
FlashNormalize: Programming by Examples for Text Normalization International Joint Conference on Artificial Intelligence, Buenos Aires 7/29/2015FlashNormalize1.
Predicting a Correct Program in PBE Rishabh Singh, Microsoft Research Sumit Gulwani, Microsoft Research.
Automating String Processing in Spreadsheets using Input-Output Examples Sumit Gulwani Microsoft Research, Redmond.
Compositional Program Synthesis from Natural Language and Examples Mohammad Raza, Sumit Gulwani & Natasa Milic-Frayling Microsoft.
HKOI Programming HKOI Training Team (Intermediate) Alan, Tam Siu Lung Unu, Tse Chi Yung.
FlashMeta Microsoft PROSE SDK: A Framework for Inductive Program Synthesis Oleksandr Polozov University of Washington Sumit Gulwani Microsoft Research.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Tevfik Bultan Lecture 4: Introduction to C: Control Flow.
Programming by Examples Marktoberdorf Lectures August 2015 Sumit Gulwani.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Dagstuhl Seminar Oct 2015 Sumit Gulwani Applications of Inductive Programming in Data Wrangling.
Programming by Examples applied to Data Wrangling Invited SYNT July 2015 Sumit Gulwani.
Deductive Techniques for synthesis from Inductive Specifications Dagstuhl Seminar Oct 2015 Sumit Gulwani.
Sumit Gulwani Spreadsheet Programming using Examples Keynote at SEMS July 2016.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 10/9/2006 Lecture 6 – String Processing.
1 Sections 3.1 – 3.2a Basic Syntax and Semantics Fundamentals of Java: AP Computer Science Essentials, 4th Edition Lambert / Osborne.
Tackling Ambiguity in PBE Rishabh Singh
Information Retrieval in Practice
Outline Core Synthesis Architecture [1 hour by Sumit]
Advanced Computer Systems
MS Access Forms, Queries, Reports Matt Martin
Programming by Examples
Lecture 2 Lexical Analysis Joey Paquet, 2000, 2002, 2012.
Usability Design Space in Programming by Examples
Wrangler: Interactive Visual Specification of Data Transformation Scripts Presented by Tifany Yung October 5, 2015.
TMC 1414 Introduction to Programming
Programming by Examples
Programming by Examples
Programming by Examples
Lecture 12: Data Wrangling
Objective of This Course
Data Warehousing and Data Mining
Slides prepared by Samkit
Agenda About Excel/Calc Spreadsheets Key Features
Evaluation of Relational Operations: Other Techniques
Jiasi Shen, Martin Rinard MIT EECS & CSAIL
Presentation transcript:

Sumit Gulwani Programming by Examples Applications, Algorithms & Ambiguity Resolution Keynote at IJCAR June 2016

Motivation 99% of computer users cannot program! They struggle with simple repetitive tasks. 1

Spreadsheet help forums 2

Typical help-forum interaction 300_w5_aniSh_c1_b  w5 =MID(B1,5,2) 300_w30_aniSh_c1_b  w30 =MID(B1,FIND(“_”,$B:$B)+1, FIND(“_”,REPLACE($B:$B,1,FIND(“_”,$B:$B), “”))-1) =MID(B1,5,2) 3

Flash Fill (Excel 2013 feature) demo “Automating string processing in spreadsheets using input-output examples”; POPL 2011; Sumit Gulwani 4

To get Started! Data Science Class Assignment 5

Ships inside two Microsoft products: 6 “FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani ConvertFrom-String cmdlet Custom Log, Custom Field FlashExtract Demo Powershell

Killer Application: Data Wrangling Data is the new oil. Data scientists spend 80% time wrangling data. Extraction FlashExtract:Tabular data from log files [PLDI 2014] Tabular data from spreadsheets [PLDI 2015] Transformation FlashFill: Syntactic String Transformations [POPL 2011] Semantic String Transformations [VLDB 2012] Number Transformations [CAV 2013] Date Transformations [POPL 2016] Normalization Transformations [IJCAI 2015] Formatting FlashRelate: Table layout transformations [PLDI 2011] Repetitive editing in Powerpoint [AAAI 2014] 7

Programming-by-Examples Architecture Example-based Intent Program set (a sub-DSL of D) DSL D Program Synthesizer 8

Balanced Expressiveness –Expressive enough to cover wide range of tasks –Restricted enough to enable efficient search Restricted set of operators –those with small inverse sets Restricted syntactic composition of those operators Natural computation patterns –Increased user understanding/confidence –Enables selection between programs, editing 9 Domain-specific Language (DSL)

Flash Fill DSL (String Transformations) Concatenate(A,C) SubStr(X,P,P) K th position in X whose left/right side matches with R 1 /R 2.

11 FlashExtract DSL (Sequence Extraction) substr expr S[z] := [z] [z]) “FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani

Programming-by-Examples Architecture Example-based Intent Program set (a sub-DSL of D) DSL D Program Synthesizer 12

13 Search Methodology “ FlashMeta: A Framework for Inductive Program Synthesis ”; [OOPSLA 2015] Alex Polozov, Sumit Gulwani

14 Problem Reduction Rules An alternative strategy:

15 Problem Reduction Rules

16 Problem Reduction Rules The notion of inverse set (and corresponding reduction rule) can be generalized to the conditional setting.

17 Generalizing output values to output properties

Programming-by-Examples Architecture Example-based Intent Program set (a sub-DSL of D) DSL D Ranking fn Program Synthesizer 18 Ranked Program set (a sub-DSL of D)

Prefer programs with simpler Kolmogorov complexity Prefer fewer constants. Prefer smaller constants. 19 Basic ranking scheme InputOutput Rishabh SinghRishabh Ben ZornBen 1 st Word If (input = “Rishabh Singh”) then “Rishabh” else “Ben” “Rishabh”

Prefer programs with simpler Kolmogorov complexity Prefer fewer constants. Prefer smaller constants. 20 Challenges with Basic ranking scheme InputOutput Rishabh SinghSingh, Rishabh Ben ZornZorn, Ben 2 nd Word + “, ‘’ + 1 st Word “Singh, Rishabh” How to select between Fewer larger constants vs. More smaller constants? Idea: Associate numeric weights with constants.

Prefer programs with simpler Kolmogorov complexity Prefer fewer constants. Prefer smaller constants. 21 Challenges with Basic ranking scheme How to select between Same number of same-sized constants? Idea: Examine data features (in addition to program features) InputOutput Missing page numbers, , st Number from the beginning 1 st Number from the end

22 Machine learning based ranking scheme Rank score of a program: Weighted combination of various features. Weights are learned using machine learning. Program features Number of constants Size of constants Features over user data: Similarity of generated output (or even intermediate values) over various user inputs IsYear, Numeric Deviation, Number of characters IsPersonName “Predicting a correct program in Programming by Example”; [CAV 2015] Rishabh Singh, Sumit Gulwani

Core Synthesis Architecture –Domain-specific Language –Search methodology –Ranking function  Next generation Synthesis –Interactive –Predictive –Adaptive 23 Outline

Programming-by-Examples Architecture Example based Intent Ranked Program set (a sub-DSL of D ) DSL D Test inputs Intended Program in D Intended Program in R/Python/C#/C++/… Translator Ranking fn Program Synthesizer Debugging Refined Intent 24 Incrementality

Interactive Debugging 25

Intended programs can sometimes be synthesized from just the input. –Tabular data extraction, Sort, Join Can save large amount of user effort. –User need not provide examples for each of tens of columns. 26 Predictive

Programming-by-Examples Architecture Example based Intent Ranked Program set (a sub-DSL of D) DSL D Test inputs Intended Program in D Intended Program in R/Python/C#/C++/… Translator Ranking fn Program Synthesizer Debugging Refined Intent Incrementality 27

Learn from past interactions –of the same user (personalized experience). –of other users in the enterprise/cloud. The synthesis sessions now require less interaction. 28 Adaptive

Programming-by-Examples Architecture Example based Intent Ranked Program set (a sub-DSL of D) DSL D Interaction history Test inputs Intended Program in D Intended Program in R/Python/C#/C++/… Translator Learner Ranking fn Program Synthesizer Debugging Refined Intent Incrementality 29

Efficient implementation of the generic search methodology. Provides a library of reduction rules. Role of synthesis designer Implement a DSL and provide reduction rules for new operators. Provide ranking strategy. Can specify tactics to resolve non-determinism in search. 30 PROSE Framework

Vu Le The PROSE Team Sumit Gulwani Daniel Perelman Danny Simmons Adam Smith Mohammad Raza Abhishek Udupa Allen Cypher Ranvijay Kumar Alex Polozov We are hiring interns/full-time!

Application to robotics Learn from usage data Probabilistic noise handling Programming using natural language 32 Future Directions

Killer application: Data wrangling –99% of end users are non-programmers. –Data scientists spend 80% time cleaning data. Algorithmic search –Domain-specific language –Deductive methodology based on back-propagation Ambiguity resolution –Ranking –Interactivity 33 Conclusion Reference: “Programming by Examples (and its applications in Data Wrangling)”, In Verification and Synthesis of Correct and Secure Systems; IOS Press; 2016 [based on Marktoberdorf Summer School 2015 Lecture Notes]