Code Search and Idiomatic Snippet Synthesis Mukund Raghothaman University of Pennsylvania (Joint work with Yi Wei and Youssef Hamadi)

“How do I match a regular expression in C#?”
EPFL Visit, April 2016

“How do I match a regular expression in C#?” (Now)
1. Ask Google / Bing / ⋯
2. Read returned web pages
3. Repeat Step 2
4. …
5. “Match.Success is what we need!”
6. …
7. Write code

“How do I match a regular expression in C#?” (Us)
1. Enter query “match regular expression”
2. Get answer:

    string pattern;
    RegexOptions options;
    var regex = new Regex(pattern, options);
    string input;
    var match = regex.Match(input);
    if (match.Success) {
        var groups = match.Groups;
    }

- Branches and loops synthesized
- Descriptive variable names

“Download file from URL”

    var wc = new WebClient();
    string address;
    string fileName;
    wc.DownloadFile(address, fileName);

- Method returns void
- Unintuitively named API classes
- Possibly uninitialized variables

SWIM: Synthesize What I Mean

SWIM: Synthesize What I Mean
- Input: API-related query (“How do I play a sound?”)
- Output: Idiomatic C# code snippet
- Requirements:
  - Speed
  - No user annotations
- We do not answer: “C# class static member initialization order”
- Or: “C# lambda”

SWIM: Synthesize What I Mean
- Input: API-related query (“How do I play a sound?”)
- Output: Idiomatic C# code snippet
- Requirements:
  - Speed
  - No user annotations

This talk: How do we build SWIM?
- Question 1: Given a natural language query, what code do we synthesize?
- Question 2: What are code idioms? How do we recognize them? How do we synthesize from them?

IntelliSense

Type Inhabitation

Visual Studio Code Snippets
Slava Agafonov,

Bing Developer Assistant

anyCode
- Synthesizes expressions; SWIM synthesizes code snippets
- Aware of developer context: local variables etc.
- Code idioms expressed as probabilistic context-free grammars
- anyCode parses the user input; SWIM uses a bag-of-words

Structured Call Sequences

Structured Call Sequences
Regex.Match(string)
Many code snippets in the corpus are similar to:

    var match = regex.Match(…);
    if (match.Success) {
        var groups = match.Groups;
        …
    }

Structured Call Sequences
Code seen:

    var match = regex.Match(…);
    if (match.Success) {
        var groups = match.Groups;
        …
    }

Corresponding structured call sequence:

    ■ := Regex.Match(string);
    if ([■.Success] get) {
        [■.Groups] get;
    }

Structured Call Sequences
Code seen:

    var dialog = new OpenFileDialog();
    dialog.Title = ...;
    dialog.InitialDirectory = ...;
    if (dialog.ShowDialog()) {
        var var1 = dialog.FileName;
    }

Structured call sequence:

    ■ := new OpenFileDialog();
    [■.Title] set;
    [■.InitialDirectory] set;
    if (■.ShowDialog()) {
        [■.FileName] get;
    }
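
One way to picture a structured call sequence as data is sketched below in Python. The talk does not show SWIM's internal representation, so the names `Call`, `Branch` and `render` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    """A single API event on the tracked object (written ■ on the slides)."""
    text: str  # e.g. "[■.Title] set" or "■ := new OpenFileDialog()"


@dataclass
class Branch:
    """An if-statement whose condition and body are themselves API events."""
    condition: str
    body: List["Node"] = field(default_factory=list)


Node = Union[Call, Branch]


def render(nodes: List[Node], indent: int = 0) -> List[str]:
    """Pretty-print a sequence in the slide notation, one event per line."""
    pad = "    " * indent
    lines: List[str] = []
    for node in nodes:
        if isinstance(node, Call):
            lines.append(f"{pad}{node.text};")
        else:
            lines.append(f"{pad}if ({node.condition}) {{")
            lines.extend(render(node.body, indent + 1))
            lines.append(f"{pad}}}")
    return lines


# The OpenFileDialog sequence from the slide above.
open_file_dialog = [
    Call("■ := new OpenFileDialog()"),
    Call("[■.Title] set"),
    Call("[■.InitialDirectory] set"),
    Branch("■.ShowDialog()", [Call("[■.FileName] get")]),
]

rendered = render(open_file_dialog)
```

Keeping branches as nested nodes, rather than flattening everything into a call list, is what lets the control structure around `ShowDialog()` survive into the synthesized snippet.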

Structured Call Sequences
- Simple imperative proto-language
- Exceptions, generics, first-class functions, anonymous classes, … not (yet) included

Structured Call Sequences: Thesis
- Capture API usage patterns
- Easy to extract and straightforward synthesis targets

Big Picture
- Question 1: Given a natural language query, what code do we synthesize?
  - Given a natural language query, which structured call sequence do we pick for synthesis?
- Question 2: What are code idioms? How do we recognize them? How do we synthesize from them?
  - Question 2.1: How do we extract structured call sequences (SCS) from the corpus?
  - Question 2.2: How do we synthesize code from SCS?

Structured Call Sequences: Extraction

Structured Call Sequences: Synthesis

    ■ := Regex.Match(string);
    if ([■.Success] get) {
        [■.Groups] get;
    }

1. How do we get a Regex object to invoke Regex.Match(string)?
2. What argument do we pass to the Regex.Match(string) method?
3. What do we name “■”?

Q1: Object Creation
- How do we get a Regex object to invoke Regex.Match(string)?
- Perform a recursive lookup!
- Use the same NLP method to find the best structured call sequence for Regex, which also happens to invoke Regex.Match(string)
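
The recursive lookup can be sketched roughly as follows. This is a toy model in Python, not SWIM's implementation: the `SEQUENCES` table, its frequencies, and the rule that any non-constructor `Type.Method` needs a receiver object are all assumptions made for illustration.

```python
# Hypothetical mined sequences, keyed by the type of the tracked object ■.
# Each entry is (corpus frequency, events); the numbers are made up.
SEQUENCES = {
    "Match": [
        (120, ["■ := Regex.Match(string)", "[■.Success] get", "[■.Groups] get"]),
    ],
    "Regex": [
        (90, ["■ := new Regex(string, RegexOptions)", "■.Match(string)"]),
        (200, ["■ := new Regex(string)", "■.Replace(string, string)"]),
    ],
}


def best_sequence(type_name, must_invoke=None):
    """Pick the most frequent sequence for a type, optionally requiring a call."""
    candidates = [
        (freq, events)
        for freq, events in SEQUENCES.get(type_name, [])
        if must_invoke is None or must_invoke in events
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


def plan(type_name, must_invoke=None):
    """Return the sequences needed, receiver objects first."""
    events = best_sequence(type_name, must_invoke)
    if events is None:
        return []
    steps = []
    first = events[0]
    # "■ := Regex.Match(string)" is treated as an instance call (an assumption),
    # so a Regex object must be created by a sequence that also calls Match.
    if ":= " in first and not first.startswith("■ := new "):
        receiver, _, method = first.split(":= ")[1].partition(".")
        steps += plan(receiver, must_invoke="■." + method)
    steps.append((type_name, events))
    return steps


steps = plan("Match")
```

In this toy data the more frequent `Regex` sequence is skipped because it never calls `Match`, which mirrors the requirement that the chosen sequence "also happens to invoke" the target method.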

Q2: Method Arguments
- What argument do we pass to the Regex.Match(string) method?
- What we did:
  - For basic types (int, double, etc.), use the value 0
  - For all other types, use null
  - Use formal name of argument to reflect intent

    var input = default(string);
    regex.Match(input);

- More intelligent solutions certainly possible
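
A minimal sketch of this placeholder rule, written in Python as a generator of C# text. The helper names `declare_argument` and `synthesize_call` are hypothetical; the sketch relies on the fact that C#'s `default(T)` is 0 for numeric basic types and null for reference types, which is how the slide's own example is written.

```python
def declare_argument(type_name: str, formal_name: str) -> str:
    """Declare a placeholder local named after the formal parameter.

    default(T) evaluates to 0 for numeric basic types and to null for
    reference types, so one template covers both cases on the slide.
    """
    return f"var {formal_name} = default({type_name});"


def synthesize_call(receiver: str, method: str, params: list) -> list:
    """Emit placeholder declarations followed by the call itself."""
    lines = [declare_argument(t, name) for t, name in params]
    args = ", ".join(name for _, name in params)
    lines.append(f"{receiver}.{method}({args});")
    return lines


# Regex.Match(string input): the formal parameter is named "input".
match_call = synthesize_call("regex", "Match", [("string", "input")])
# Regex.Match(string input, int startat) gets one placeholder per parameter.
match_at_call = synthesize_call(
    "regex", "Match", [("string", "input"), ("int", "startat")]
)
```

Naming each placeholder after the formal parameter is what makes the output read as `regex.Match(input)` rather than `regex.Match(arg0)`.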

Q3: The Variable Name Model

SWIM Tool Architecture
- GitHub code corpus mined for API usage patterns
- Query-to-API translation done using Bing clickthrough data

Ranked APIs
- Convenient hand-off point between NLP experts and synthesis experts
- Input query: “append strings”
- Ranked APIs:
    StringBuilder.Append(string)
    StringBuilder.AppendLine(string)
    …

Query-to-API Mapping

Query-to-API Mapping
Several potential ways:
- Search for matches using C# documentation [SNIFF, Chatterjee et al., 2009]
- Pass query to Bing, and look at code snippets within search results
- Clickthrough data more reliable

Clickthrough Data
“match regular expression” → https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.match

Clickthrough Data
1. https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.match → Regex.Match(string)
2. https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.match → Regex.Match(string, int)
3. …
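
The two clickthrough steps (query to documentation page, page to API signatures) could be combined into a ranked API list along these lines. This is a simple count-based sketch in Python under stated assumptions: the talk does not describe the actual statistical model, and the log entries, page keys and click counts below are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical clickthrough log: (query, clicked documentation page, clicks).
CLICKS = [
    ("match regular expression", "regex.match", 40),
    ("match regular expression", "regex.ismatch", 15),
    ("regular expression match c#", "regex.match", 25),
]

# Hypothetical page-to-API table: one documentation page can cover overloads.
PAGE_TO_APIS = {
    "regex.match": ["Regex.Match(string)", "Regex.Match(string, int)"],
    "regex.ismatch": ["Regex.IsMatch(string)"],
}


def build_index(clicks):
    """Credit each query word with the APIs of the pages users clicked."""
    index = defaultdict(Counter)
    for query, page, count in clicks:
        for word in set(query.lower().split()):
            for api in PAGE_TO_APIS[page]:
                index[word][api] += count
    return index


def rank_apis(index, query):
    """Score APIs by summing the click counts of the query's words."""
    scores = Counter()
    for word in set(query.lower().split()):
        scores.update(index.get(word, Counter()))
    return [api for api, _ in scores.most_common()]


ranked = rank_apis(build_index(CLICKS), "match regular expression")
```

Because the index is keyed by individual words, the order of words in the query plays no role in the ranking.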

Query-to-API Mapping

Picking Structured Call Sequences for Synthesis

Evaluation

Evaluation Queries
30 common API-related queries from the Bing query log:

    append strings           execute sql statement        parse xml
    append text file         generate md5 hash code       play sound
    binaryformatter          get current directory        random number
    connect to database      get files in folder          read binary file
    convert int to string    launch process               read text file
    convert string to int    load bitmap image            send mail
    copy file                load dll                     serialize xml
    create file              match regular expression     string split
    current time             open file dialog             substring
    download file from url   parse datetime from string   test file exists

Evaluation
- 10 solution snippets generated for each query
- Graded manually by a human programmer: Relevant / Irrelevant
  - Top solution relevant in 70% of the cases
  - At least one relevant solution in each case
- Variable name selection: Appropriate / Inappropriate
  - Average of 2.5 variable names required per snippet
  - 88% of chosen names marked appropriate
- Very responsive: 1.5 seconds per generated snippet

Evaluation: Oops! (1)
- Query 1: “convert string to int”
- Query 2: “convert int to string”
- Same generated snippet for both:

    var value = default(string);
    System.Convert.ToInt32(value);

- Because query-to-API translator uses bags-of-words
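
The collision is easy to reproduce. The sketch below, in Python, assumes the simplest possible bag-of-words model (lowercased whitespace tokens with counts); the helper name `bag_of_words` is hypothetical and SWIM's real tokenization may differ.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Reduce a query to word counts, discarding word order."""
    return Counter(query.lower().split())


string_to_int = bag_of_words("convert string to int")
int_to_string = bag_of_words("convert int to string")

# Both queries contain exactly the same four words, so the bags are equal
# and any model that sees only the bag cannot tell the queries apart.
same_bag = string_to_int == int_to_string
```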

Evaluation: Oops! (2)
- Query: “open file dialog”
- Filter property specifies types of files to be chosen
- Special syntax for correct values
  - For example: "Text Files (.txt)|*.txt"
- Generated snippet is unhelpful:

    var dlg = new OpenFileDialog();
    dlg.Title = null;
    dlg.InitialDirectory = null;
    dlg.Filter = null;
    dlg.FilterIndex = 0;
    if (dlg.ShowDialog()) {
        var fName = dlg.FileName;
    }

Evaluation: Oops! (2)
- Query: “open file dialog”
- Filter property specifies file types to be chosen
- Special syntax for correct values
  - For example: "Text Files (.txt)|*.txt"
- Generated snippet is unhelpful
- Similar examples: regular expressions, date-time format strings (“dd-mm-yyyy”), etc.

Evaluation: Oops! (3)
- Query: “launch process”
- First relevant snippet ranked 8th

    var startInfo = new ProcessStartInfo();
    startInfo.FileName = null;
    var process = Process.Start(startInfo);
    process.WaitForExit();

    var startInfo = new ProcessStartInfo();
    startInfo.FileName = null;
    startInfo.CreateNoWindow = false;
    startInfo.RedirectStandardOutput = false;
    startInfo.RedirectStandardError = false;
    startInfo.UseShellExecute = false;

Evaluation: Oops! (3)
- Query: “launch process”
- First relevant snippet ranked 8th
- ProcessStartInfo is ranked very highly by the query-to-API model
- If code synthesizer starts with a ProcessStartInfo object, then it will never call Process.Start()
- Can we somehow require that every ProcessStartInfo object is destined to be fed into Process.Start()?
- Joint probability distributions, perhaps?

Conclusion
- Presented SWIM, a code search tool powered by the GitHub code corpus and Bing clickthrough data

Future Work
- Open-source code corpora are a great resource for programming language researchers
  - (Traditionally used as) Benchmarks
  - Anomaly detection
  - Program synthesis
- Consciously consider statistics and uncertainty in program analysis
  - Clustering runtime values: overloaded types such as strings
  - Inferring types in dynamic languages