Recommendation Systems for Code Reuse
Tao Xie
Department of Computer Science, North Carolina State University, Raleigh, USA

Motivation
Programmers commonly reuse APIs of existing frameworks or libraries.
– Advantages: low cost and high efficiency of development
– Challenges: complexity and lack of documentation; e.g., searching for information takes nearly ¼ of developer time [metallect.com]

Example Task from Eclipse
Programming task: How to parse code in a dirty editor of Eclipse?
Query: “IEditorPart -> ICompilationUnit”
[Figure: method-invocation sequences (MIS 1 … MIS k) are extracted from open source projects 1 … N, mined into frequent MISs (FMIS 1 … FMIS n), and recommended to the programmer. MIS: method-invocation sequence; FMIS: frequent MIS.]
PARSEWeb [Thummalapenta&Xie ASE 07]

Scenario 1
While reusing APIs of existing open source frameworks or libraries, programmers often
– know what type of object they need
– but do not know how to write code for getting that object
Query: “Source -> Destination”
How to use these APIs?
Prospector [Mandelin et al. PLDI 05], XSnippet [Sahavechaphan&Claypool OOPSLA 06], PARSEWeb [Thummalapenta&Xie ASE 07]

Example Task from Eclipse
Programming task: How to parse code in a dirty editor?
Query: IEditorPart -> ICompilationUnit
Example solution from Prospector/PARSEWeb:
    IEditorPart iep = ...;
    IEditorInput editorInp = iep.getEditorInput();
    IWorkingCopyManager wcm = JavaUI.getWorkingCopyManager();
    ICompilationUnit icu = wcm.getWorkingCopy(editorInp);
Difficulties:
a. Needs an instance of IWorkingCopyManager
b. Needs to invoke a static method of JavaUI to get that instance
Prospector [Mandelin et al. PLDI 05], XSnippet [Sahavechaphan&Claypool OOPSLA 06], PARSEWeb [Thummalapenta&Xie ASE 07]

Scenario 2
While reusing APIs of existing open source frameworks or libraries, programmers often
– know what method call they need
– but do not know how to write the code before and after this method call
Query: “Method name”
How to use these APIs?
MAPO [Xie&Pei MSR 06]

Example Task from BCEL
Programming task: How to instrument the bytecode of a Java class by adding an extra method to the class?
Query: org.apache.bcel.generic.ClassGen: public void addMethod(Method m)
Example solution from MAPO:
    public void generateStubMethod(ClassGen c) {
        InstructionList il = new InstructionList();
        MethodGen m = genFromISList(il);
        m.setMaxLocals();
        m.setMaxStack();
        c.addMethod(m.getMethod());
        System.out.println("…");
        …
    }
MAPO [Xie&Pei MSR 06]

Scenario 3
While reusing APIs of existing open source frameworks or libraries, programmers often
– know the structural context, such as a class's type, its parents, its fields' types, a method's signature, and method or constructor callees
– but do not know how to write code in this context
Query: structural context
How to use these APIs?
Strathcona [Holmes et al. 05], XSnippet [Sahavechaphan&Claypool OOPSLA 06]

Example Task from HttpClient
Programming task: How to evolve a system to use a third-party library, HttpClient, for handling HTTP connections?
Query: the HttpClient and PostMethod classes
Example solution from Strathcona: [screenshot]
Strathcona [Holmes et al. 05], XSnippet [Sahavechaphan&Claypool OOPSLA 06]

Steps in Recommenders
Data collection/extraction -> Data preprocessing -> Data analysis/mining -> Result postprocessing -> Result representation

Data Collection/Extraction
From one or multiple local code repositories
– Often followed by offline analysis or mining
– Challenges: lack of relevant code examples
– Examples: Strathcona, Prospector, XSnippet
From the whole open source world, with a code search engine
– Often followed by on-the-fly analysis and mining
– Challenges: only partial code files are available
– Examples: MAPO, PARSEWeb

Exploiting a Code Search Engine
– Accepts queries containing class and/or method names
– Interacts with a code search engine such as Google Code Search to gather related code samples
– Stores the gathered code samples (source files) in a local code repository, where they are later analyzed and mined
Challenge: the gathered code samples are partial and not compilable, because code search engines retrieve individual source files instead of entire projects.
PARSEWeb [Thummalapenta&Xie ASE 07]

Available Code Search Engines
– Google Code Search
– Krugle
– Koders
– Codase
– JExamples
– etc.
Why not just use code search engines?

What Are Developers Searching For?
– 15 million queries of Windows Live Search from May
– 339 sessions related to Java programming
– API sessions (34.2%); 70 trouble-shooting sessions (20.6%)
Assieme [Hoffmann et al. UIST 07]

API-related Search Sessions
– 64.1% of the sessions contained queries that were merely descriptive and did not contain actual names of APIs, packages, types, or members.
– The remaining sessions contained API or package names (12.8%), type names (17.9%), or method names (5.1%).
– Among all these API-related sessions, 17.9% contained terms like “example”, “using”, or “sample code”.
Assieme [Hoffmann et al. UIST 07]

An Example 4-Query Session
java JSP current date java SimpleDateFormat using currentdate in jsp
Assieme [Hoffmann et al. UIST 07]

Why Not Use Web Search Engines?
Query: “parse xml java”. Problems with the returned pages:
– Contains no code examples
– Code on pages is essentially the same
– Requires installation of an external library, but no link is given
– Only compatible with new Java versions
©Raphael Hoffmann
Assieme [Hoffmann et al. UIST 07]

Code Search Engines
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class JAXPSample {
        public static void main(String[] args) {
            String filename = "sample.xml";
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder parser = factory.newDocumentBuilder();
                Document d = parser.parse(filename);
            } catch (Exception e) {
                System.err.println("Exception: " + e.getMessage());
            }
        }
    }
Code search engines index the source code of open-source projects (from compressed archive files and CVS repositories). Code is parsed, and terms in type names, variable names, etc. are weighted differently.
©Raphael Hoffmann
Assieme [Hoffmann et al. UIST 07]

Why Not Use Code Search Engines Only?
Query: “parse xml java”. Problems with the results:
– Irrelevant (an Emacs Lisp file!?)
– Code is complicated, contains no comments related to the query, and is more than 300(!) lines long
– Requires installation of an external library, but no link is given
– Code on pages is essentially the same
©Raphael Hoffmann
Assieme [Hoffmann et al. UIST 07]

Why not use code search engines only? MAPO [Xie&Pei MSR 06]

Steps in Recommenders
Data collection/extraction -> Data preprocessing -> Data analysis/mining -> Result postprocessing -> Result representation

Fact Extraction
– Whole-program analysis: applicable when the whole code base is available and compilable
– Partial-program analysis: applicable when only partial code samples are available and they are not compilable, e.g., when a code search engine is used

Analysis of Partial Code Samples
– Not all code samples contain a main method or driver code that can serve as an entry point: consider all public methods as entry points
– Deal with local method calls by inlining methods
– Deal with conditionals/loops by traversing control flow graphs
– Deal with unknown types with heuristics
PARSEWeb [Thummalapenta&Xie ASE 07]
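To make the inlining step concrete, here is a minimal Java sketch. It assumes each method of a partial source file has already been flattened into a list of call names (a simplification: real fact extraction also tracks receivers and control flow), and that any call whose name matches a method defined in the same file is local. The class `LocalCallInliner`, its helpers, and the body given to `genFromISList` are hypothetical; a recursion guard keeps recursive local calls from expanding forever.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LocalCallInliner {

    /** Expands calls to methods defined in the same file; localMethods maps name -> flattened calls. */
    static List<String> inline(String method, Map<String, List<String>> localMethods) {
        List<String> out = new ArrayList<>();
        expand(method, localMethods, new HashSet<>(), out);
        return out;
    }

    private static void expand(String method, Map<String, List<String>> localMethods,
                               Set<String> onStack, List<String> out) {
        if (!onStack.add(method)) {
            return; // recursive local call: stop instead of looping forever
        }
        for (String call : localMethods.get(method)) {
            if (localMethods.containsKey(call)) {
                expand(call, localMethods, onStack, out); // local method: splice in its body
            } else {
                out.add(call); // external API call: keep as-is
            }
        }
        onStack.remove(method);
    }

    public static void main(String[] args) {
        Map<String, List<String>> file = new LinkedHashMap<>();
        // Entry point (a public method) from the BCEL example, with simplified call names.
        file.put("generateStubMethod", List.of(
                "InstructionList.<init>", "genFromISList",
                "MethodGen.setMaxLocals", "MethodGen.setMaxStack",
                "MethodGen.getMethod", "ClassGen.addMethod"));
        // Hypothetical body of the local helper genFromISList.
        file.put("genFromISList", List.of("MethodGen.<init>"));
        System.out.println(inline("generateStubMethod", file));
    }
}
```

Each public method would be passed to `inline` in turn, since any of them may serve as an entry point.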

Type Heuristics I
Inferring fully qualified class names:
    import javax.jms.QueueSession;
    import java.util.*;
    public class test {
        public QueueSession qsObj;
        public Integer intObj;
        public Iterator iter;
        …
– The fully qualified name of QueueSession is “javax.jms.QueueSession”, inferred by looking up the import statement.
– The fully qualified name of Integer is “java.lang.Integer”, inferred by prefixing the class name with “java.lang.” and loading the class.
– The fully qualified name of “Iterator” cannot be inferred (incorporating domain knowledge of java.util helps).
PARSEWeb [Thummalapenta&Xie ASE 07]
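The lookup order in Type Heuristics I can be sketched as below: explicit single-type imports first, then `java.lang` via class loading, and only optionally the wildcard imports (the "domain knowledge" case). `FqnInference` and `inferFqn` are illustrative names, not PARSEWeb's code; `Class.forName(String, boolean, ClassLoader)` is the standard JDK call, and the `false` flag avoids running static initializers.

```java
import java.util.List;
import java.util.Optional;

public class FqnInference {

    static Optional<String> inferFqn(String simpleName, List<String> imports, boolean useWildcardKnowledge) {
        // 1. Explicit single-type import, e.g. "javax.jms.QueueSession".
        for (String imp : imports) {
            if (imp.endsWith("." + simpleName)) {
                return Optional.of(imp);
            }
        }
        // 2. Implicit java.lang: try to load the class after prefixing "java.lang.".
        if (isLoadable("java.lang." + simpleName)) {
            return Optional.of("java.lang." + simpleName);
        }
        // 3. Optional domain knowledge: probe wildcard imports such as "java.util.*".
        if (useWildcardKnowledge) {
            for (String imp : imports) {
                if (imp.endsWith(".*")) {
                    String candidate = imp.substring(0, imp.length() - 1) + simpleName;
                    if (isLoadable(candidate)) {
                        return Optional.of(candidate);
                    }
                }
            }
        }
        return Optional.empty(); // unresolved
    }

    static boolean isLoadable(String fqn) {
        try {
            Class.forName(fqn, false, FqnInference.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> imports = List.of("javax.jms.QueueSession", "java.util.*");
        System.out.println(inferFqn("QueueSession", imports, false)); // Optional[javax.jms.QueueSession]
        System.out.println(inferFqn("Integer", imports, false));      // Optional[java.lang.Integer]
        System.out.println(inferFqn("Iterator", imports, false));     // Optional.empty
        System.out.println(inferFqn("Iterator", imports, true));      // Optional[java.util.Iterator]
    }
}
```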

Type Heuristics II
– Infer the receiver type in an expression “X.Y”: look up the declaration of X among local variables and member variables; if there is none, X is a class name and Y is a static member.
– Infer the receiver type in an expression “M1().Y”: check the return type of the M1() method declaration; if it is not available locally, the receiver type cannot be inferred.
PARSEWeb [Thummalapenta&Xie ASE 07]

Type Heuristics III
– Infer the return type of a method invocation in an assignment statement such as “Queue qObj = createQueueSession()”: look up the type of the variable on the left-hand side; the return type is the same as, or a subclass of, Queue.
– Infer the return type of a method invocation in a return statement such as
    public QueueSession test() {
        ...
        return connect.createQueueSession(false, int);
    }
  by looking up the return type of the enclosing method declaration.
PARSEWeb [Thummalapenta&Xie ASE 07]

Type Heuristics IV
Infer types across multiple method invocations:
    Queue qObj = connect.m1();
    Stack sObj = connect.m1().m2();
The receiver type of m2() can be inferred by looking up the return type of m1(), which the first statement reveals to be Queue.
PARSEWeb [Thummalapenta&Xie ASE 07]
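Heuristics II to IV can share one small table of declarations and learned return types. The sketch below is an assumption-laden illustration: method names are used as keys, so overloading is ignored, and a learned type is only an upper bound because the real return type may be a subclass. `ReceiverTypeInference` and the field type `QueueConnection` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ReceiverTypeInference {

    // Declarations visible in the partial file.
    final Map<String, String> locals = new HashMap<>();
    final Map<String, String> fields = new HashMap<>();
    // Return types learned so far, keyed by method name (simplification: overloads are ignored).
    final Map<String, String> returnTypes = new HashMap<>();

    /** Heuristic II, "X.Y": locals shadow fields; otherwise X is taken to be a class name. */
    String receiverOfMemberAccess(String x) {
        if (locals.containsKey(x)) {
            return locals.get(x);
        }
        if (fields.containsKey(x)) {
            return fields.get(x);
        }
        return x; // X is a class name and Y a static member
    }

    /** Heuristic II, "M1().Y": known only if M1's return type was declared or learned. */
    Optional<String> receiverOfCallResult(String m1) {
        return Optional.ofNullable(returnTypes.get(m1));
    }

    /** Heuristic III, "T v = ...m(...)": m returns T or a subclass of T. */
    void learnFromAssignment(String method, String lhsType) {
        returnTypes.putIfAbsent(method, lhsType);
    }

    /** Heuristic III, "return ...m(...);" inside a method declared to return T. */
    void learnFromReturn(String method, String enclosingReturnType) {
        returnTypes.putIfAbsent(method, enclosingReturnType);
    }

    public static void main(String[] args) {
        ReceiverTypeInference ctx = new ReceiverTypeInference();
        ctx.fields.put("connect", "QueueConnection"); // hypothetical field type

        ctx.learnFromAssignment("m1", "Queue"); // Queue qObj = connect.m1();
        // Stack sObj = connect.m1().m2();  Heuristic IV: m2's receiver is m1's return type.
        System.out.println(ctx.receiverOfCallResult("m1").orElse("#UNKNOWN#")); // Queue
        System.out.println(ctx.receiverOfMemberAccess("connect"));             // QueueConnection
        System.out.println(ctx.receiverOfMemberAccess("JavaUI"));              // JavaUI (static access)

        ctx.learnFromReturn("createQueueSession", "QueueSession");
        System.out.println(ctx.receiverOfCallResult("createQueueSession").orElse("#UNKNOWN#")); // QueueSession
    }
}
```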

Sequence Filtering
– Remove common Java library calls
– Remove sequences that contain no query words: ClassGen and addMethod
Extracted sequence:
    InstructionList.<init>()
    genFromISList(InstructionList)
    MethodGen.setMaxLocals()
    MethodGen.setMaxStack()
    MethodGen.getMethod()
    ClassGen.addMethod(Method)
    PrintStream.println(String)
    …
Source code:
    public void generateStubMethod(ClassGen c) {
        InstructionList il = new InstructionList();
        MethodGen m = genFromISList(il);
        m.setMaxLocals();
        m.setMaxStack();
        c.addMethod(m.getMethod());
        System.out.println("…");
        …
    }
MAPO [Xie&Pei MSR 06]
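A minimal sketch of the two filtering rules on this slide, assuming calls are written as `Class.method(...)` strings and that "common Java library calls" are identified by a hand-picked set of declaring classes (`COMMON_CLASSES` is hypothetical; MAPO's actual criterion may differ). A sequence is kept only if some remaining call mentions a query word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SequenceFilter {

    // Hypothetical list of "common Java library" classes whose calls are treated as noise.
    static final Set<String> COMMON_CLASSES = Set.of("PrintStream", "String", "StringBuilder", "Object");

    static String declaringClass(String call) {
        int dot = call.indexOf('.');
        return dot < 0 ? call : call.substring(0, dot); // local helpers have no class prefix
    }

    /** Drops common library calls, then drops sequences that mention none of the query words. */
    static List<List<String>> filter(List<List<String>> sequences, List<String> queryWords) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> seq : sequences) {
            List<String> cleaned = seq.stream()
                    .filter(call -> !COMMON_CLASSES.contains(declaringClass(call)))
                    .collect(Collectors.toList());
            boolean relevant = cleaned.stream()
                    .anyMatch(call -> queryWords.stream().anyMatch(call::contains));
            if (relevant) {
                kept.add(cleaned);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> fromSlide = List.of(
                "InstructionList.<init>()", "genFromISList(InstructionList)",
                "MethodGen.setMaxLocals()", "MethodGen.setMaxStack()",
                "MethodGen.getMethod()", "ClassGen.addMethod(Method)",
                "PrintStream.println(String)");
        List<String> unrelated = List.of("StringBuilder.append(String)", "HashMap.put(Object,Object)");
        // Keeps the first sequence without its println call; drops the unrelated one.
        System.out.println(filter(List.of(fromSlide, unrelated), List.of("ClassGen", "addMethod")));
    }
}
```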

Type Signature Graph
Any path from h to w is an (h,w)-jungloid.
[Figure: type signature graph with nodes IFile, ICompilationUnit, CompilationUnit, ASTNode, IClassFile, IJavaElement, IResource, and IContainer; edges are labeled with methods such as JavaCore.createCompilationUnitFrom(), JavaCore.createClassFileFrom(), AST.parseCompilationUnit(), getResource(), and getParent(), or with “supertype”.]
Prospector [Mandelin et al. PLDI 05]
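Searching a type signature graph for a shortest (h,w)-jungloid amounts to a breadth-first search over labeled edges. The sketch below uses a few edges loosely modeled on the figure (the exact edge set is an assumption), and `JungloidSearch` is a hypothetical name. Preferring the shortest path mirrors the "shorter length" ranking heuristic; Prospector's actual ranking and its handling of downcasts are richer than this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class JungloidSearch {

    /** A labeled edge: applying `label` to a value of the source type yields a `target` value. */
    static final class Edge {
        final String label;
        final String target;

        Edge(String label, String target) {
            this.label = label;
            this.target = target;
        }
    }

    // Insertion-ordered so that ties between equally short paths are broken deterministically.
    final Map<String, List<Edge>> graph = new LinkedHashMap<>();

    void addEdge(String from, String label, String to) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(label, to));
    }

    /** BFS from h; returns the edge labels of one shortest (h,w)-jungloid, if any exists. */
    Optional<List<String>> shortestJungloid(String h, String w) {
        Map<String, String> prevType = new HashMap<>();  // type -> type it was reached from
        Map<String, String> prevLabel = new HashMap<>(); // type -> label of the edge used
        Deque<String> queue = new ArrayDeque<>();
        prevType.put(h, null);
        queue.add(h);
        while (!queue.isEmpty()) {
            String t = queue.poll();
            if (t.equals(w)) {
                List<String> path = new ArrayList<>();
                for (String cur = w; prevType.get(cur) != null; cur = prevType.get(cur)) {
                    path.add(prevLabel.get(cur));
                }
                Collections.reverse(path);
                return Optional.of(path);
            }
            for (Edge e : graph.getOrDefault(t, List.of())) {
                if (!prevType.containsKey(e.target)) {
                    prevType.put(e.target, t);
                    prevLabel.put(e.target, e.label);
                    queue.add(e.target);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        JungloidSearch g = new JungloidSearch();
        // Illustrative edges, loosely based on the slide's figure.
        g.addEdge("IFile", "JavaCore.createCompilationUnitFrom()", "ICompilationUnit");
        g.addEdge("IFile", "JavaCore.createClassFileFrom()", "IClassFile");
        g.addEdge("ICompilationUnit", "AST.parseCompilationUnit()", "CompilationUnit");
        g.addEdge("IClassFile", "AST.parseCompilationUnit()", "CompilationUnit");
        g.addEdge("CompilationUnit", "supertype", "ASTNode");
        // [JavaCore.createCompilationUnitFrom(), AST.parseCompilationUnit(), supertype]
        System.out.println(g.shortestJungloid("IFile", "ASTNode").orElse(List.of()));
    }
}
```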

Jungloids with Downcasts
    IDebugView debugger = ...;
    Viewer viewer = debugger.getViewer();
    IStructuredSelection sel = (IStructuredSelection) viewer.getSelection();
    JavaInspectExpression expr = (JavaInspectExpression) sel.getFirstElement();
[Figure: the corresponding path IDebugView -> Viewer (getViewer()) -> ISelection (getSelection()) -> IStructuredSelection (downcast) -> Object (getFirstElement()) -> JavaInspectExpression (downcast); the figure also shows a getInput() edge.]
Prospector [Mandelin et al. PLDI 05]

Steps in Recommenders
Data collection/extraction -> Data preprocessing -> Data analysis/mining -> Result postprocessing -> Result representation

Data Analysis/Mining
Some recommenders don’t use specific mining techniques to “abstract” or “generalize” common patterns, but return relevant raw code samples:
– Prospector, Strathcona, XSnippet, PARSEWeb
Data mining can be used to uncover hidden patterns:
– Association rules: CodeWeb [Michail ICSE 00]
– Frequent subsequences: MAPO [Xie&Pei MSR 06]
– Frequent partial orders: Apiator [Acharya et al. FSE 07]

Association Rules
[Figure: KApplication reuse patterns]
CodeWeb [Michail ICSE 00]
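Association rules over reuse data rest on two numbers: support (how often items co-occur) and confidence (how often the right-hand side appears given the left-hand side). The sketch computes both for a rule such as "classes that reuse KApplication also reuse KMainWindow"; the transactions are invented for illustration, and `ReuseRules` is a hypothetical name, not CodeWeb's implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReuseRules {

    /** Fraction of transactions that contain every item of `items`. */
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    /** confidence(X => Y) = support(X together with Y) / support(X). */
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        Set<String> both = new HashSet<>(x);
        both.addAll(y);
        return support(transactions, both) / support(transactions, x);
    }

    public static void main(String[] args) {
        // Hypothetical transactions: the library classes reused by each client class.
        List<Set<String>> reuse = List.of(
                Set.of("KApplication", "KMainWindow", "KAboutData"),
                Set.of("KApplication", "KMainWindow"),
                Set.of("KApplication", "KAboutData"),
                Set.of("KMainWindow"));
        Set<String> x = Set.of("KApplication");
        Set<String> y = Set.of("KMainWindow");
        System.out.println("support = " + support(reuse, Set.of("KApplication", "KMainWindow")));
        System.out.println("confidence = " + confidence(reuse, x, y));
    }
}
```

A miner would enumerate candidate rules and keep those above minimum support and confidence thresholds; the sketch only evaluates one rule.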

Frequent SubSeq/Partial Order
Consider APIs a, b, c, d, e, and f.
(a) Example code:
    #include <stdlib.h>
    void a(void), b(void), c(void), d(void), e(void), f(void); /* the APIs */
    void p(void) { b(); c(); }
    void q(void) { c(); b(); }
    void r(void) { e(); f(); }
    void s(void) { f(); e(); }
    int main(void) {
        int i, j, k; /* values unspecified: only the static traces matter */
        a();
        if (i == 1) {
            f(); e(); c();
            exit(1);
        } else {
            if (j == 1) p(); else q();
            d();
            if (k == 1) r(); else s();
        }
        return 0;
    }
Apiator [Acharya et al. FSE 07]

(b) Static program traces of the example code:
    1: a f e c
    2: a b c d e f
    3: a c b d e f
    4: a b c d f e
    5: a c b d f e
(c) Frequent sequential patterns (support 4/5):
    a b d e
    a b d f
    a c d e
    a c d f
(d) Frequent partial order R (support 4/5): a precedes b and c; b and c precede d; d precedes e and f.
Apiator [Acharya et al. FSE 07]


MAPO [Xie&Pei MSR 06] mines frequent sequential patterns such as (c); Apiator [Acharya et al. FSE 07] mines frequent partial orders such as (d).
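The support figures above can be checked mechanically. This sketch only counts support for given candidates; it does not implement MAPO's or Apiator's mining algorithms. It assumes that a trace supports a sequential pattern when the pattern is a (not necessarily contiguous) subsequence of the trace, and that it supports the partial order when every required pair appears in the right order (using first occurrences). `PatternSupport` is a hypothetical name, and the ordering pairs for R are read off traces 2 to 5.

```java
import java.util.List;

public class PatternSupport {

    /** True if `pattern` occurs in `trace` in order, not necessarily contiguously. */
    static boolean containsSubsequence(List<String> trace, List<String> pattern) {
        int i = 0;
        for (String call : trace) {
            if (i < pattern.size() && call.equals(pattern.get(i))) {
                i++;
            }
        }
        return i == pattern.size();
    }

    /** True if, for every pair {x, y}, the first x occurs before the first y. */
    static boolean satisfiesPartialOrder(List<String> trace, List<String[]> pairs) {
        for (String[] pair : pairs) {
            int x = trace.indexOf(pair[0]);
            int y = trace.indexOf(pair[1]);
            if (x < 0 || y < 0 || x > y) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> traces = List.of(
                List.of("a", "f", "e", "c"),
                List.of("a", "b", "c", "d", "e", "f"),
                List.of("a", "c", "b", "d", "e", "f"),
                List.of("a", "b", "c", "d", "f", "e"),
                List.of("a", "c", "b", "d", "f", "e"));

        List<List<String>> patterns = List.of(
                List.of("a", "b", "d", "e"), List.of("a", "b", "d", "f"),
                List.of("a", "c", "d", "e"), List.of("a", "c", "d", "f"));
        for (List<String> p : patterns) {
            long n = traces.stream().filter(t -> containsSubsequence(t, p)).count();
            System.out.println(p + ": " + n + "/" + traces.size()); // 4/5 for each pattern
        }

        // Partial order R: a < b, a < c, b < d, c < d, d < e, d < f
        List<String[]> r = List.of(
                new String[] {"a", "b"}, new String[] {"a", "c"}, new String[] {"b", "d"},
                new String[] {"c", "d"}, new String[] {"d", "e"}, new String[] {"d", "f"});
        long m = traces.stream().filter(t -> satisfiesPartialOrder(t, r)).count();
        System.out.println("R: " + m + "/" + traces.size()); // 4/5
    }
}
```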

Steps in Recommenders
Data collection/extraction -> Data preprocessing -> Data analysis/mining -> Result postprocessing -> Result representation

Result Postprocessing
When a third-party miner or learner isn’t used, this step may be considered part of the data analysis/mining step. Examples:
– Result clustering
– Result ranking
– Result filtering

Clustering and Ranking
The data analysis/mining step may produce too many candidate method sequences for a query “Source -> Destination”.
Solutions:
– Cluster similar sequences, using clustering heuristics
– Rank sequences, using ranking heuristics
PARSEWeb [Thummalapenta&Xie ASE 07]

Clustering Heuristics
– Method-invocation sequences with the same set of statements can be considered similar, even if the statements appear in a different order.
– Method-invocation sequences with minor differences, as controlled by a cluster precision value attribute, can be considered similar; e.g., sequences that differ slightly can be considered similar under a cluster precision value of one.
PARSEWeb [Thummalapenta&Xie ASE 07]
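One way to realize both clustering heuristics is to compare sequences as multisets of calls, so that order is ignored, and to treat the cluster precision value as the largest allowed number of added or removed calls. That interpretation of the precision value, the greedy strategy, and the names `SequenceClustering` and `distance` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SequenceClustering {

    /** Order-insensitive view of a sequence: call -> number of occurrences. */
    static Map<String, Integer> bag(List<String> seq) {
        Map<String, Integer> counts = new HashMap<>();
        for (String call : seq) {
            counts.merge(call, 1, Integer::sum);
        }
        return counts;
    }

    /** Number of calls to add or remove to make the two bags equal (order is ignored). */
    static int distance(List<String> a, List<String> b) {
        Map<String, Integer> ca = bag(a);
        Map<String, Integer> cb = bag(b);
        Set<String> calls = new HashSet<>(ca.keySet());
        calls.addAll(cb.keySet());
        int d = 0;
        for (String call : calls) {
            d += Math.abs(ca.getOrDefault(call, 0) - cb.getOrDefault(call, 0));
        }
        return d;
    }

    /** Greedy: each sequence joins the first cluster whose first member is within `precision`. */
    static List<List<List<String>>> cluster(List<List<String>> seqs, int precision) {
        List<List<List<String>>> clusters = new ArrayList<>();
        for (List<String> s : seqs) {
            List<List<String>> home = null;
            for (List<List<String>> c : clusters) {
                if (distance(c.get(0), s) <= precision) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(s);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<List<String>> seqs = List.of(
                List.of("a", "b"), List.of("b", "a"), List.of("a", "b", "c"), List.of("x", "y"));
        System.out.println(cluster(seqs, 0)); // [[[a, b], [b, a]], [[a, b, c]], [[x, y]]]
        System.out.println(cluster(seqs, 1)); // [[[a, b], [b, a], [a, b, c]], [[x, y]]]
    }
}
```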

Ranking Heuristics
– Heuristic 1: higher frequency -> higher rank
– Heuristic 2: shorter length -> higher rank
– Heuristic 3: fewer package boundaries -> higher rank
PARSEWeb [Thummalapenta&Xie ASE 07], Prospector [Mandelin et al. PLDI 05]
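The slide lists the three ranking heuristics without saying how they combine. The sketch below assumes a lexicographic order (frequency first, then length, then package boundaries), which Java's `Comparator` chaining expresses directly. A package boundary is counted here whenever two consecutive calls belong to classes in different packages; `SequenceRanking`, the candidate sequences, and their frequencies are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SequenceRanking {

    static final class Candidate {
        final List<String> calls; // fully qualified calls, e.g. "javax.sql.DataSource.getConnection"
        final int frequency;      // number of code samples that produced this sequence

        Candidate(List<String> calls, int frequency) {
            this.calls = calls;
            this.frequency = frequency;
        }

        int length() {
            return calls.size();
        }

        /** Counts adjacent call pairs whose declaring classes live in different packages. */
        int packageBoundaries() {
            int crossings = 0;
            for (int i = 1; i < calls.size(); i++) {
                if (!packageOf(calls.get(i)).equals(packageOf(calls.get(i - 1)))) {
                    crossings++;
                }
            }
            return crossings;
        }

        /** "a.b.C.m" -> "a.b"; assumes every call has the "package.Class.method" shape. */
        static String packageOf(String call) {
            String owner = call.substring(0, call.lastIndexOf('.'));
            int dot = owner.lastIndexOf('.');
            return dot < 0 ? "" : owner.substring(0, dot);
        }
    }

    // Assumed combination: frequency (descending), then length, then package boundaries.
    static final Comparator<Candidate> RANKING =
            Comparator.comparingInt((Candidate c) -> c.frequency).reversed()
                    .thenComparingInt(Candidate::length)
                    .thenComparingInt(Candidate::packageBoundaries);

    public static void main(String[] args) {
        Candidate shortSeq = new Candidate(List.of(
                "javax.naming.InitialContext.lookup", "javax.sql.DataSource.getConnection"), 34);
        Candidate longSeq = new Candidate(List.of(
                "javax.naming.InitialContext.lookup", "javax.sql.DataSource.getConnection",
                "java.sql.Connection.setAutoCommit"), 34);
        List<Candidate> ranked = new ArrayList<>(List.of(longSeq, shortSeq));
        ranked.sort(RANKING); // equal frequency, so the shorter sequence ranks first
        for (Candidate c : ranked) {
            System.out.println(c.frequency + " " + c.length() + " " + c.packageBoundaries() + " " + c.calls);
        }
    }
}
```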

Query Splitting
Problem: the results of code search engines may lack code samples that yield candidate method-invocation sequences, because the required method-invocation sequences are split among different source files.
Solution:
– Split the user query into multiple queries
– Compose the results for each split query
PARSEWeb [Thummalapenta&Xie ASE 07]

Query Splitting Example
1. User query: “org.eclipse.jface.viewers.IStructuredSelection -> java.io.ObjectInputStream”
   Results: none
2. Query: “java.io.ObjectInputStream”
   Results: 3. The most used immediate sources are java.io.InputStream, java.io.ByteArrayInputStream, and java.io.FileInputStream.
3. Three queries to be fired:
   “org.eclipse.jface.viewers.IStructuredSelection -> java.io.InputStream”: 1 result
   “org.eclipse.jface.viewers.IStructuredSelection -> java.io.ByteArrayInputStream”: 5 results
   “org.eclipse.jface.viewers.IStructuredSelection -> java.io.FileInputStream”: no results
PARSEWeb [Thummalapenta&Xie ASE 07]
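The splitting procedure in this example can be sketched as follows. `search` stands in for a code search engine and here simply returns the result counts shown on the slide; `QuerySplitter` is a hypothetical name, and obtaining the list of immediate sources (step 2) is left outside the sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class QuerySplitter {

    /**
     * Returns, per type, how many results were found. If the direct query fails, each
     * "source -> intermediate" query is fired; a hit is later composed with
     * "intermediate -> destination" to complete the sequence.
     */
    static Map<String, Integer> split(String source, String destination,
                                      List<String> immediateSources,
                                      Function<String, Integer> search) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        int direct = search.apply(source + "->" + destination);
        if (direct > 0) {
            hits.put(destination, direct); // no need to split
            return hits;
        }
        for (String mid : immediateSources) {
            int n = search.apply(source + "->" + mid);
            if (n > 0) {
                hits.put(mid, n);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String src = "org.eclipse.jface.viewers.IStructuredSelection";
        String dst = "java.io.ObjectInputStream";
        // Result counts copied from the slide; every other query returns 0.
        Map<String, Integer> counts = Map.of(
                src + "->java.io.InputStream", 1,
                src + "->java.io.ByteArrayInputStream", 5);
        Map<String, Integer> result = split(src, dst,
                List.of("java.io.InputStream", "java.io.ByteArrayInputStream", "java.io.FileInputStream"),
                q -> counts.getOrDefault(q, 0));
        System.out.println(result); // {java.io.InputStream=1, java.io.ByteArrayInputStream=5}
    }
}
```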

Result Filtering
– Remove sequences that contain no query words (here: ClassGen and addMethod)
– Compress consecutive calls of the same method into one, e.g., abbba -> aba
– Remove duplicate frequent sequences after the compression, e.g., aba, aba -> aba
– Reduce a sequence if it is a subsequence of another, e.g., aba, abab -> abab
MAPO [Xie&Pei MSR 06]
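The four filtering rules compose naturally into one pass. The sketch treats "subsequence" as an ordered, not necessarily contiguous subsequence, which covers the slide's aba/abab example; `ResultFilter` is a hypothetical name, and the single-letter calls mirror the slide's notation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ResultFilter {

    /** Collapses runs of the same call: a b b b a -> a b a. */
    static List<String> compress(List<String> seq) {
        List<String> out = new ArrayList<>();
        for (String call : seq) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(call)) {
                out.add(call);
            }
        }
        return out;
    }

    /** True if `small` can be obtained from `big` by deleting elements. */
    static boolean isSubsequence(List<String> small, List<String> big) {
        int i = 0;
        for (String call : big) {
            if (i < small.size() && call.equals(small.get(i))) {
                i++;
            }
        }
        return i == small.size();
    }

    static List<List<String>> filter(List<List<String>> seqs, List<String> queryWords) {
        // Rules 1-3: keep relevant sequences, compress them, and drop duplicates (the set dedups).
        Set<List<String>> unique = new LinkedHashSet<>();
        for (List<String> s : seqs) {
            boolean relevant = s.stream().anyMatch(call -> queryWords.stream().anyMatch(call::contains));
            if (relevant) {
                unique.add(compress(s));
            }
        }
        // Rule 4: drop a sequence that is a subsequence of a longer surviving one.
        List<List<String>> kept = new ArrayList<>();
        for (List<String> s : unique) {
            boolean subsumed = false;
            for (List<String> other : unique) {
                if (other.size() > s.size() && isSubsequence(s, other)) {
                    subsumed = true;
                    break;
                }
            }
            if (!subsumed) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> seqs = List.of(
                List.of("a", "b", "b", "b", "a"),
                List.of("a", "b", "a"),
                List.of("a", "b", "a", "b"),
                List.of("x", "y"));
        System.out.println(filter(seqs, List.of("a"))); // [[a, b, a, b]]
    }
}
```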

Steps in Recommenders
Data collection/extraction -> Data preprocessing -> Data analysis/mining -> Result postprocessing -> Result representation

Result Representation
Display results in the tool user interface:
– Strathcona
– XSnippet
– PARSEWeb
– MAPO
– CodeBroker
– Assieme

Strathcona
Strathcona [Holmes et al. 05]

XSnippet
XSnippet [Sahavechaphan&Claypool OOPSLA 06]

PARSEWeb
PARSEWeb [Thummalapenta&Xie ASE 07]

MAPO (new)
MAPO [Xie&Pei MSR 06]

CodeBroker
[Figure labels: comments, signature]
Information delivery that autonomously locates task-relevant and personalized components and presents them to software developers: an active repository!
CodeBroker [Ye&Fischer ICSE 01]

Assieme
A hybrid search engine that
– indexes code snippets found on web pages
– links them to required libraries and documentation
Assieme [Hoffmann et al. UIST 07]

Assieme
[Screenshot annotations: links to pages with snippets; groups pages with similar snippets; links to required libraries]
Assieme [Hoffmann et al. UIST 07]

Example Evaluations of Recommenders
– Prospector
– Strathcona
– PARSEWeb

Prospector Experiment 1 (ranking test)
Hypothesis:
– To find the desired code, the user needs to examine only the top 5 candidate jungloids.
Result:
– The desired code was in the top 5 in 17 out of 20 tasks (in the top 1 in 10 out of 20).
– The remaining three were fixable.
Methodology:
– used 20 real-world coding tasks
– collected from FAQs, newsgroups, our practice, s to us

Prospector Experiment 2 (user study)
Hypothesis:
– Prospector-equipped programmers are better at solving API programming problems than other programmers.
Methodology:
– 6 problems; each user did 3 with Prospector and 3 without
– problems formulated not to reveal the query
– sample problem: “The new Java channel IO system represents files as channels. How do I get a channel that represents a String filename?”
– somewhat sparse data (10 users)

Experiment 2 (user study): Results
Prospector shortens development time:
– some problems were solved only by Prospector users
– when both groups succeeded, Prospector users were 30% faster
Prospector may help enable reuse:
– non-Prospector users sometimes reimplemented functionality
Prospector may help avoid mistakes:
– mistakes were made when applying code found on the internet to one's own code
The authors expect even stronger results on a more robust infrastructure.

Strathcona: User Study
Two developers were assigned 4 tasks on building a plug-in for Eclipse. Neither developer knew how to implement any of the tasks at hand. The results showed that the tool can deliver relevant and useful examples to developers. They also showed that a developer can determine when the returned examples are not relevant.
[Table 2: Results from Evaluation. Columns: Useful Example, Source Viewed, Succeeded at Task; rows: Tasks 1 to 4, each with two subjects.]
Strathcona [Holmes et al. 05]

Strathcona: Performance and Scalability
As a test case for scalability, the Eclipse 3.0 source was loaded into the repository. The resulting amount of information in the repository is shown in Table 1.

Table 1: Number of Structural Relations
    Classes                      17,456
    Methods                     124,359
    Fields                       48,441
    Inheritance relations        15,187
    Object instantiations        43,923
    Call relations            1,066,838
    Total                     1,316,204

On a Pentium … MHz server with 1024 MB RAM and a Pentium … MHz repository machine with 256 MB RAM running a PostgreSQL database, the performance numbers are:
– less than 500 ms for building a structural context
– less than 300 ms for displaying the example
– 4–12 seconds server response time
Strathcona [Holmes et al. 05]

PARSEWeb Evaluations
– Real programming problems: address problems posted in developer forums
– Real projects: show that solutions recommended by PARSEWeb are
  – available in real projects
  – on average better than solutions recommended by the related tools Prospector, Strathcona, and Google Code Search

Real Programming Problems
Jakarta BCEL user forum, 2001
Problem: “How to disassemble java byte code”
Query: “Code -> Instruction”
Solution sequence (FileName: 2_RepMIStubGenerator.java, MethodName: isWriteMethod, Rank: 1, NumberOfOccurrences: 1):
    Code,getCode() ReturnType:#UNKNOWN#
    CONSTRUCTOR,InstructionList(#UNKNOWN#) ReturnType:InstructionList
    InstructionList,getInstructions() ReturnType:Instruction
Solution sample code:
    Code code = ...;
    InstructionList il = new InstructionList(code.getCode());
    Instruction[] ins = il.getInstructions();

Real Programming Problems
Dev 2 Dev Newsgroups, 2006
Problem: “how to connect db by sessionBean”
Query: javax.naming.InitialContext -> java.sql.Connection
Solution sequence (FileName: 3 AddressBean.java, MethodName: getNextUniqueKey, Rank: 1, NumberOfOccurrences: 34):
    javax.naming.InitialContext,lookup(java.lang.String) ReturnType:javax.sql.DataSource
    javax.sql.DataSource,getConnection() ReturnType:java.sql.Connection

Real Project: Logic
Source file: LogicEditor.java
Summary: PARSEWeb 8/10, Prospector 6/10, Strathcona 5/10

Comparison with Prospector
12 specific programming tasks taken from the XSnippet approach.
Summary: PARSEWeb 11/12, Prospector 7/12

Comparison with Other Tools
[Chart: percentage of tasks successfully completed by PARSEWeb, Prospector, and XSnippet]

Significance of Internal Techniques
[Chart legend: Method inline = method inlining; Post Process = sequence post-processor; Query Split = query splitter]

Questions?
T. Xie, Mining Program Source Code
Bibliography on Mining Software Engineering Data:
– What software engineering tasks can be helped by data mining?
– What kinds of software engineering data can be mined?
– How are data mining techniques used in software engineering?
– Resources: available data mining tools

Mining Partial Orders
Consider APIs a, b, c, d, e, and f.
[Figure panels: Partial Order; Partial Order with Transitive Reduction; Closed Partial Order]
The extracted scenarios are fed to a partial order miner, which mines frequent closed partial orders.
Apiator [Acharya et al. FSE 07]
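Transitive reduction keeps only the ordering edges that cannot be implied by other edges, which is what makes a mined partial order readable. A minimal sketch, assuming the order is given as an acyclic successor map (`TransitiveReduction` is a hypothetical name): an edge u -> w is dropped when w is still reachable from u through another successor.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveReduction {

    /** order.get(u) holds every v with u < v; the relation is assumed acyclic. */
    static Map<String, Set<String>> reduce(Map<String, Set<String>> order) {
        Map<String, Set<String>> reduced = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : order.entrySet()) {
            Set<String> keep = new LinkedHashSet<>();
            for (String w : e.getValue()) {
                if (!reachableIndirectly(order, e.getKey(), w)) {
                    keep.add(w); // no longer path u -> ... -> w, so this edge is essential
                }
            }
            reduced.put(e.getKey(), keep);
        }
        return reduced;
    }

    /** Depth-first search from u's other successors: is w reachable by a path of length >= 2? */
    static boolean reachableIndirectly(Map<String, Set<String>> order, String u, String w) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        for (String v : order.getOrDefault(u, Set.of())) {
            if (!v.equals(w)) {
                stack.push(v);
            }
        }
        while (!stack.isEmpty()) {
            String x = stack.pop();
            if (x.equals(w)) {
                return true;
            }
            if (seen.add(x)) {
                for (String y : order.getOrDefault(x, Set.of())) {
                    stack.push(y);
                }
            }
        }
        return false;
    }

    static Set<String> set(String... items) {
        return new LinkedHashSet<>(List.of(items));
    }

    public static void main(String[] args) {
        // Every ordered pair implied by a < {b, c} < d < {e, f}.
        Map<String, Set<String>> order = new LinkedHashMap<>();
        order.put("a", set("b", "c", "d", "e", "f"));
        order.put("b", set("d", "e", "f"));
        order.put("c", set("d", "e", "f"));
        order.put("d", set("e", "f"));
        System.out.println(reduce(order)); // {a=[b, c], b=[d], c=[d], d=[e, f]}
    }
}
```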

Example Partial Order
[Figure: a usage scenario around the XOpenDisplay API as a partial order over XOpenDisplay, XCloseDisplay, XCreateWindow, XGetWindowAttributes, XCreateGC, XSetForeground, XGetBackground, XMapWindow, XChangeWindowAttributes, XSelectInput, XGetAtomName, XFreeGC, and XNextEvent. Specifications are shown with dotted lines.]
Apiator [Acharya et al. FSE 07]