Programming by Examples Marktoberdorf Lectures August 2015 Sumit Gulwani.


Lecture 1
– Demos of Programming-by-Examples tools
– Dealing with ambiguity in example-based specification

Programming by Examples
Program synthesis: Generate a program in an underlying language from a user specification using a search algorithm.
Programming by Examples is a subfield of program synthesis.
– Specification: Examples
– Underlying languages: Lecture 2
– Search methodology: Lecture 3
An end-to-end story involves aspects from ML and HCI (Lecture 1).
Lecture 4: Soon-to-be-released SDK with academic license (given by student attendee Alex Polozov)
Lecture 5: Other specs/methodologies/applications + …
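As a rough illustration of synthesis as search, the sketch below enumerates every program in a toy language and keeps the ones consistent with the examples. The two-stage (selector, formatter) language and every name in it are assumptions made for illustration; they are not the languages or search algorithms covered in Lectures 2 and 3.

```python
from itertools import product

# Hypothetical toy language: a program is a (selector, formatter) pair.
# Inputs are assumed to be non-empty strings.
SELECTORS = {
    "first_word": lambda s: s.split()[0],
    "last_word": lambda s: s.split()[-1],
    "whole_input": lambda s: s,
}
FORMATTERS = {
    "identity": lambda s: s,
    "upper": str.upper,
    "lower": str.lower,
    "initial": lambda s: s[:1],
}


def synthesize(examples):
    """Return the names of all programs that agree with every (input, output) example."""
    consistent = []
    for (sel_name, sel), (fmt_name, fmt) in product(SELECTORS.items(), FORMATTERS.items()):
        if all(fmt(sel(inp)) == out for inp, out in examples):
            consistent.append((sel_name, fmt_name))
    return consistent
```

With the single example ("Alex Polozov", "A"), two programs survive the search (the initial of the first word, and the initial of the whole input), which is a small instance of the ambiguity that Lecture 1 addresses.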

The New Opportunity
Software developers: the traditional customer for PL technology.
End users (non-programmers with access to computers): 2 orders of magnitude more end users.
– 99% of computer users don’t know programming.
– They struggle with simple repetitive tasks.

Excel help forums

Typical help-forum interaction
300_w5_aniSh_c1_b → w5
=MID(B1,5,2)
300_w30_aniSh_c1_b → w30
=MID(B1,FIND("_",$B:$B)+1, FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
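The two formulas in this exchange differ in how well they generalize, which a short Python sketch can make concrete. The helper names `mid` and `second_field` are hypothetical; `mid` mimics Excel's 1-based MID, and `second_field` follows the longer formula by taking the text between the first and second underscore.

```python
def mid(text, start, length):
    """Excel-style MID: 1-based start position and a fixed length."""
    return text[start - 1:start - 1 + length]


def second_field(text, sep="_"):
    """Text between the first and second separator.

    Assumes the input contains at least two separators.
    """
    first = text.find(sep)
    second = text.find(sep, first + 1)
    return text[first + 1:second]
```

On the first input both helpers return "w5". On the second input the fixed-position version returns "w3" rather than "w30", which is why the forum answer had to be replaced by the longer formula.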

Flash Fill (Excel 2013 feature) demo
“Automating string processing in spreadsheets using input-output examples”; POPL 2011; Sumit Gulwani

Data Wrangling
Data is locked up in silos in various formats.
– Flexible organization for viewing but challenging to manipulate.
Wrangling workflow: Extraction, Transformation, Formatting.
Data scientists spend 80% of their time wrangling data.
PBE can enable an easier and faster data wrangling experience.

To get Started! Data Science Class Assignment

FlashExtract Demo
“FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani

FlashExtract

Table Re-formatting
Trifacta: small, guided steps
[Figure: starting table and end-goal table]
Trifacta provides a series of small transformations:
1. Split on “:” delimiter
2. Delete empty rows
3. Fill values down
4. Pivot Number on Type
From: Skills of the Agile Data Wrangler (tutorial by Hellerstein and Heer)
FlashRelate

FlashRelate Demo
“FlashRelate: Extracting Relational Data from Semi-Structured Spreadsheets Using Examples”; PLDI 2015; Barowy, Gulwani, Hart, Zorn

PBE tools for Data Manipulation
Extraction
– FlashExtract: Extract data from text files, web pages [PLDI 2014; PowerShell ConvertFrom-String cmdlet]
Transformation
– Flash Fill: Excel feature for syntactic string transformations [POPL 2011, CAV 2015]
– Semantic string transformations [VLDB 2012]
– Number transformations [CAV 2013]
– FlashNormalize: Text normalization [IJCAI 2015]
Formatting
– FlashRelate: Extract data from spreadsheets [PLDI 2015, PLDI 2011]
– FlashFormat: a PowerPoint add-in [AAAI 2014]

PBE Architecture
Example-based specification → Search algorithm → Program
Challenge 1: Ambiguous/under-specified intent may result in unintended programs.

Dealing with Ambiguity
Ranking
– Synthesize multiple programs and rank them.

Basic ranking scheme
Prefer programs with lower Kolmogorov complexity.
– Prefer fewer constants.
– Prefer smaller constants.

Input            Output
Alex Polozov     Alex
Helmut Seidl     Helmut

Candidate programs:
– 1st Word
– If (input = “Alex Polozov”) then “Alex” else “Helmut”
– “Alex”
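A minimal sketch of the basic scheme, assuming each candidate program is annotated by hand with the string constants it uses. The tuple layout and the `rank` helper are hypothetical; the ordering key encodes only the two preferences stated on the slide (fewer constants first, then smaller constants).

```python
examples = [("Alex Polozov", "Alex"), ("Helmut Seidl", "Helmut")]

# Each candidate: (description, executable form, constants appearing in the program).
candidates = [
    ("conditional",
     lambda s: "Alex" if s == "Alex Polozov" else "Helmut",
     ["Alex Polozov", "Alex", "Helmut"]),
    ("constant Alex", lambda s: "Alex", ["Alex"]),
    ("1st Word", lambda s: s.split()[0], []),
]


def rank(candidates, examples):
    """Keep candidates consistent with all examples, ordered best first."""
    consistent = [
        c for c in candidates
        if all(c[1](inp) == out for inp, out in examples)
    ]
    # Fewer constants first; ties broken by total constant length.
    return sorted(consistent, key=lambda c: (len(c[2]), sum(len(k) for k in c[2])))
```

With only the first example, all three candidates are consistent and "1st Word" ranks first because it uses no constants. The second example then eliminates the constant program outright.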

Challenges with Basic ranking scheme
Prefer programs with lower Kolmogorov complexity.
– Prefer fewer constants.
– Prefer smaller constants.

Input            Output
Alex Polozov     Polozov, Alex
Helmut Seidl     Seidl, Helmut

Candidate programs:
– 2nd Word + “, ” + 1st Word
– “Polozov, Alex”

How to select between fewer larger constants vs. more smaller constants?
Idea: Associate numeric weights with constants.

Challenges with Basic ranking scheme
Prefer programs with lower Kolmogorov complexity.
– Prefer fewer constants.
– Prefer smaller constants.

How to select between programs with the same number of same-sized constants?
Idea: Examine data features (in addition to program features).

[Table: Input/Output example beginning “Missing page numbers, ”; remaining entries lost]

Candidate programs:
– 1st Number from the beginning
– 1st Number from the end

Machine learning based ranking scheme

Machine learning based ranking scheme
The rank score of a program is a weighted combination of various features.
Features are over both the program and the user data.
Features over user data: similarity of generated output (or even intermediate values) over various user inputs
– IsYear
– Numeric Deviation
– Number of characters
– IsPersonName
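The weighted combination can be sketched as follows. The three features, their names, and the weights are all assumptions chosen for illustration; the weights here are hand-picked rather than learned, and the year pattern is a deliberately crude stand-in for an IsYear check.

```python
import re
from statistics import pstdev


def features(num_constants, outputs):
    """One program feature plus two features over the outputs a program generates."""
    lengths = [len(o) for o in outputs]
    return {
        "num_constants": float(num_constants),          # program feature
        "length_deviation": pstdev(lengths),            # spread in number of characters
        "all_years": float(all(re.fullmatch(r"(19|20)\d\d", o) for o in outputs)),
    }


# Hypothetical weights; a learned ranker would fit these from labelled benchmarks.
WEIGHTS = {"num_constants": -1.0, "length_deviation": -0.5, "all_years": 2.0}


def score(num_constants, outputs):
    """Rank score: weighted sum of the feature values (higher is better)."""
    f = features(num_constants, outputs)
    return sum(WEIGHTS[name] * value for name, value in f.items())
```

Two programs with identical program features can now be separated by their outputs: one that yields a year on every input scores higher than one whose outputs vary in shape from row to row.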

Comparison of Ranking Strategies over FlashFill Benchmarks

Strategy     Average # of examples required
Basic        4.17
Learning     1.48

“Predicting a correct program in Programming by Example”; CAV 2015; Rishabh Singh, Sumit Gulwani

FlashFill Ranking Demo

Need for a fall-back mechanism
“It's a great concept, but it can also lead to lots of bad data. I think many users will look at a few "flash filled" cells, and just assume that it worked. … Be very careful.”
“most of the extracted data will be fine. But there might be exceptions that you don't notice unless you examine the results very carefully.”

Dealing with Ambiguity
Ranking
– Synthesize multiple programs and rank them.
User Interaction Models
– Communicate actionable information to the user.

User Interaction Models for Ambiguity Resolution
Make it easy to inspect output correctness
– User can accordingly provide more examples
Show programs
– in any desired programming language; in English
– Enable effective navigation between programs
Computer-initiated interactivity (active learning)
– Highlight less confident entries in the output.
– Ask directed questions based on distinguishing inputs.
“User Interaction Models for Disambiguation in Programming by Example”; UIST 2015; Mayer, Soares, Grechkin, Le, Marron, Polozov, Singh, Zorn, Gulwani
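One way to picture "distinguishing inputs" is as the inputs on which the surviving candidate programs disagree. The sketch below is an assumption about how such inputs could be found, not the method of the UIST 2015 paper; the two candidate programs and the sample inputs are made up for illustration.

```python
import re


def first_number(s):
    """Hypothetical candidate: first number counting from the beginning."""
    return re.findall(r"\d+", s)[0]


def last_number(s):
    """Hypothetical candidate: first number counting from the end."""
    return re.findall(r"\d+", s)[-1]


def distinguishing_inputs(programs, inputs):
    """Return (input, sorted distinct outputs) for inputs where the programs disagree."""
    questions = []
    for inp in inputs:
        outputs = {p(inp) for p in programs}
        if len(outputs) > 1:
            questions.append((inp, sorted(outputs)))
    return questions
```

Each returned pair can be turned into a directed question to the user ("for this input, should the output be A or B?"), and rows that never appear in the result are ones on which all candidates agree.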

FlashExtract Demo (User Interaction Models)