Predicting a Correct Program in PBE
Rishabh Singh, Microsoft Research; Sumit Gulwani, Microsoft Research

Programming By Examples
Intuitive, natural, accessible. But: ambiguity!

Excel Forums
300_w1_aniSh_c1_b  →  w1
=MID("300_w1_aniSh_c1_b", 5, 2)

Excel Forums
300_w30_aniSh_c1_b  →  w30
=MID($B:$B, FIND("_",$B:$B)+1, FIND("_", REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
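The nested Excel formula above extracts the token between the first and second underscores. As a point of comparison, here is a minimal sketch of the same task in Python; the function name and regex are illustrative, not part of the talk:

```python
import re

def extract_second_token(s):
    """Return the token between the first and second underscores,
    e.g. "300_w30_aniSh_c1_b" -> "w30".  A hand-written stand-in for
    the nested MID/FIND/REPLACE formula shown above."""
    m = re.match(r"[^_]*_([^_]*)_", s)
    return m.group(1) if m else None
```

The point of the slide is precisely that end users should not have to write either version by hand; PBE synthesizes the program from the input-output example alone.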

FlashFill [Gulwani POPL 2011; Gulwani, Harris, Singh CACM 2012]
DSL → VSA → Heuristics → Program → Benchmarks

DSL → VSA → Ranking → Program → Benchmarks

Handling Ambiguity
  Input           Output
  Rick Rashid     Mr. Rick
  Satya Nadella   ?

Prefer non-constants?
  Input           Output
  Rick Rashid     Mr. Rick
  Satya Nadella   Ms. Satya   (wrong: the "r" was generalized as a lowercased input letter)
Prefer smaller substrings as constants

Prefer smaller constants
  Input           Output
  Satya Nadella   S. Nadella
  Bill Gates      ?
2nd word, last word, 2nd capital followed by 2nd lowercase string, …
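The two preferences on these slides (avoid constants; when a constant is needed, prefer a small one) can be sketched as a toy scoring function. The program representation and weights below are illustrative assumptions, not the talk's actual heuristics:

```python
def heuristic_score(program):
    """Toy cost for a candidate program, given as a list of atoms,
    each either ("const", string) or ("substr", spec).  Lower is better.
    Encodes the slide heuristics: penalize every constant, and penalize
    longer constants more."""
    score = 0
    for kind, payload in program:
        if kind == "const":
            score += 10 + len(payload)  # prefer non-constants; among constants, prefer smaller
    return score

# The all-constant program loses to Concat("Mr. ", 1st word):
literal = [("const", "Mr. Rick")]
general = [("const", "Mr. "), ("substr", "1st word")]
```

Such hand-tuned costs are exactly what the rest of the talk replaces with a learned ranking function.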

Machine Learning for Ranking
"With great power comes great responsibility."

Three Challenges
  1. Labelled training data
  2. Machine learning algorithm
  3. Efficient ranking algorithm

Training Data Generation
  Input           Output
  Rick Rashid     Mr. Rashid
  Satya Nadella   Mr. Nadella
  Peter Lee       Mr. Lee

Structuring the Hypothesis Space with Sharing in the Version Space
  Fixed-arity expressions  f(e1, e2, e3, e4):       set-based sharing
  Associative expressions  f(e1, f(e2, f(e3, e4))): DAG-based sharing

Ranking Function f(p)
Assume a linear function: f(p) = w1·f1 + w2·f2 + … + wk·fk
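The assumed linear form is just a weighted sum of program features. A minimal sketch, in which the feature names and weight values are made up for illustration:

```python
def rank_score(weights, features):
    """f(p) = w1*f1 + w2*f2 + ... + wk*fk, the linear ranking function
    assumed on the slide.  `features` maps feature names to their values
    for a candidate program p; `weights` holds the learned w_i."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical weights and features for one candidate program:
w  = {"num_constants": -2.0, "output_length": -0.1, "uses_regex_token": 1.5}
p1 = {"num_constants": 1, "output_length": 8, "uses_regex_token": 1}
```

The learning problem, addressed on the next slides, is choosing the weights so that correct programs score higher than incorrect ones.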

Learning To Rank
  Logistic regression: didn't work well
  Listwise approach: too strong a constraint (all relevant pages ranked over all irrelevant ones)

Training Phase
  Input           Output
  Rick Rashid     Mr. Rick
  Satya Nadella   Mr. Satya
  Peter Lee       Mr. Lee
Candidate ways to derive the "r" in "Mr.": lowercase of 1st uppercase letter, constant "r", lowercase of 2nd uppercase letter, …
Goal: find a ranking function f(p) over program features that ranks positive programs higher than negative programs.

Learn DAGs
Rick Rashid → Mr. Rashid
Satya Nadella → Mr. Satya

Intersect DAGs
Rick Rashid → Mr. Rick
Satya Nadella → Mr. Satya
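Intersection keeps only the programs consistent with every example. The real system intersects version-space DAGs; the sketch below uses a deliberately tiny explicit program space (constants, fixed character ranges, k-th-word extractors), standing in for one DAG edge:

```python
def candidate_programs(inp, out):
    """Enumerate a toy space of programs mapping inp to out.
    Each program is a hashable descriptor; in the real VSA these sets
    are represented compactly on DAG edges rather than enumerated."""
    progs = {("const", out)}                 # the literal constant program
    for k, word in enumerate(inp.split()):
        if word == out:
            progs.add(("word", k))           # extract the k-th word
    i = inp.find(out)
    while i != -1:
        progs.add(("chars", i, i + len(out)))  # fixed character range
        i = inp.find(out, i + 1)
    return progs

# Intersecting across examples discards programs that fit only one of them:
common = candidate_programs("Rick Rashid", "Rick") & \
         candidate_programs("Satya Nadella", "Satya")
```

Here the constants and position-based extractors differ between the two examples, so only the "first word" program survives the intersection.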

Assign Positive Labels
Rick Rashid → Mr. Rick
Satya Nadella → Mr. Satya

Assign Negative Labels
Rick Rashid → Mr. Rick
Satya Nadella → Mr. Satya

Rick Rashid → Mr. Rick
Satya Nadella → Mr. Satya
Learn a ranking function f(p) that ranks positive programs higher than negative programs.

Training Phase
Positive programs vs. negative programs
Rank any positive program over all negative programs.

Hierarchical Ranking
  Atomic expression:    frequency of tokens, context, neighborhood, …
  Substring expression: length of substring, input, output, constant, …
  Concat expression:    number of arguments, sum, max, min, prod

Evaluation
  175 benchmarks, train-test partition
  Baseline (Occam's razor): smallest and simplest programs

Ranking Evaluation
LearnRank learns the correct program from a single example on 79% of the benchmarks.

Efficiency of Ranking

Ranking for PBE: Machine Learning + Synthesis
  Formalization of VSA sharing
  Efficient features and algorithms
  General loss function for PBE
Thanks!