Code search & recommendation engines
A summary of some results from research
Robert Feldt
2016-11-14
Overview
  - A lot of research on code search & recommendation engines has appeared in the last 5-10 years
  - This is a general overview of some relevant results; you already know some of it
  - A recent trend is program repair based on code search (history- or semantics-driven)
Goal is to make developers more effective
We want to help them more efficiently:
  - Reuse code that already does what they need
  - Find similar code they can change
  - Find examples of how to use similar code
  - Find test code for related code
  - Adapt the code they find
  - Find similar code, docs & bugs when fixing new bugs
  - Automatically repair code (fix bugs)
Recommendation engines
  - “Recommendation systems are software applications that aim to support users in their decision-making while interacting with large information spaces. They provide information items estimated to be valuable for a software engineering task in a given context.”
  - Proactively tailor suggestions to meet developers' (& testers') current needs and preferences
  - Help find the right & relevant info when developers:
      - Lack experience
      - Can’t consider all relevant info at the same time
  - Important, basic/overview work: [Robillard2010] M. Robillard et al., “Recommendation Systems for Software Engineering”, IEEE Software, 2010.
RSSE design dimensions
Portfolio: Finding Relevant Functions and their Use
Portfolio Interface
Portfolio: Evaluation Results
  - Portfolio was better than the Google Code Search and Koders engines
      - Higher confidence, precision and gain
  - Improves ranking of functions: relevant ones appear higher in the list
  - Half of the relevant functions did not contain a keyword from the query (Q)
      - Call graph analysis is required; keyword search is not enough
  - All three of keywords, PageRank and the SAN algorithm were needed for best results
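To illustrate why call graph analysis helps, here is a minimal sketch that combines keyword matching, PageRank over a call graph and a simple spreading-activation pass. It is not Portfolio's implementation: the toy corpus, the function names, the decay factor and the 0.7/0.3 weights are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: function name -> (description, functions it calls).
CORPUS = {
    "main":       ("program entry point", ["load_image", "save_image"]),
    "load_image": ("load an image file from disk", ["decode_png"]),
    "save_image": ("save an image file to disk", ["encode_png"]),
    "decode_png": ("parse png bytes into pixels", ["inflate"]),
    "encode_png": ("serialize pixels as png bytes", ["deflate"]),
    "inflate":    ("decompress a zlib stream", []),
    "deflate":    ("compress a zlib stream", []),
}

def keyword_scores(query, corpus):
    """Fraction of query terms that occur in each function description."""
    terms = set(query.lower().split())
    return {name: len(terms & set(desc.lower().split())) / max(len(terms), 1)
            for name, (desc, _) in corpus.items()}

def pagerank(corpus, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the caller -> callee graph."""
    nodes = list(corpus)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            callees = corpus[v][1]
            # Functions that call nothing spread their rank evenly.
            targets = callees if callees else nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank

def spread_activation(corpus, seeds, decay=0.5, hops=2):
    """Pass decayed activation from keyword hits to call graph neighbours."""
    neighbours = defaultdict(set)
    for caller, (_, callees) in corpus.items():
        for callee in callees:
            neighbours[caller].add(callee)
            neighbours[callee].add(caller)
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        reached = {}
        for node, energy in frontier.items():
            for other in neighbours[node]:
                passed = energy * decay
                if passed > activation.get(other, 0.0):
                    reached[other] = max(reached.get(other, 0.0), passed)
        activation.update(reached)
        frontier = reached
    return activation

def rank(query, corpus, w_activation=0.7, w_pagerank=0.3):
    """Weighted sum of spread activation and normalized PageRank."""
    kw = keyword_scores(query, corpus)
    seeds = {name: s for name, s in kw.items() if s > 0}
    act = spread_activation(corpus, seeds)
    pr = pagerank(corpus)
    top = max(pr.values())
    scores = {name: w_activation * act.get(name, 0.0) + w_pagerank * pr[name] / top
              for name in corpus}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling rank("load image", CORPUS) scores decode_png above zero even though its description shares no term with the query, because it is called by a function that does match. This mirrors the point that keyword search alone would miss such functions.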
Active Code Search: Improve Ranking based on Feedback
  Wang, Shaowei et al., “Active Code Search: Incorporating User Feedback to Improve Code Search Relevance”, ASE ’14, 2014, Västerås, Sweden.
Active Code Search: Results
  - 10-15% better than Portfolio (PageRank + SAN)
  - They use the feedback online for a single user, but it can also be saved for future use
  - A nice way for you to get this information back from the plug-in:
      - Add a REST function: rerank_result(resultID, [(resultNum1, “NotInteresting”), (resultNum2, “Interesting”)])
      - It returns a new result list and saves the feedback to improve future searches
  - It is a general solution: regardless of how you produce result lists, feedback can improve them
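A minimal sketch of what the handler behind such a rerank_result call could look like. It uses Rocchio-style relevance feedback over bag-of-words vectors, which is a standard technique swapped in for illustration and not necessarily the method of Wang et al. The in-memory RESULTS and FEEDBACK_LOG stores, the search helper, the 0-based result positions and the alpha/beta/gamma weights are all assumptions; a real service would sit behind an HTTP endpoint and persist the feedback.

```python
import math
from collections import Counter

RESULTS = {}       # result_id -> {"query": term weights, "items": ranked documents}
FEEDBACK_LOG = []  # in-memory stand-in for persistent feedback storage

def vectorize(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query_vec, documents):
    """Order documents by cosine similarity to the query vector."""
    return sorted(documents, key=lambda d: cosine(query_vec, vectorize(d)),
                  reverse=True)

def search(result_id, query, documents):
    """Hypothetical initial search; remembers the result list under result_id."""
    query_vec = dict(vectorize(query))
    items = rank(query_vec, documents)
    RESULTS[result_id] = {"query": query_vec, "items": items}
    return items

def centroid(vectors):
    """Average term weights of a list of vectors (empty dict if none)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}

def rerank_result(result_id, feedback, alpha=1.0, beta=0.75, gamma=0.25):
    """feedback: list of (position in current result list, label) pairs."""
    entry = RESULTS[result_id]
    items = entry["items"]
    liked = centroid([vectorize(items[n]) for n, label in feedback
                      if label == "Interesting"])
    disliked = centroid([vectorize(items[n]) for n, label in feedback
                         if label == "NotInteresting"])
    refined = {}
    for term in set(entry["query"]) | set(liked) | set(disliked):
        weight = (alpha * entry["query"].get(term, 0.0)
                  + beta * liked.get(term, 0.0)
                  - gamma * disliked.get(term, 0.0))
        if weight > 0:  # negative weights are dropped
            refined[term] = weight
    FEEDBACK_LOG.append((result_id, list(feedback)))  # keep for future searches
    entry["query"] = refined
    entry["items"] = rank(refined, items)
    return entry["items"]
```

Because the handler only needs the current result list and the labels, it does not depend on how the original ranking was produced, which is what makes the approach general.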
Mining StackOverflow
  Ponzanelli et al., “Mining StackOverflow to Turn the IDE into a Self-Confident Programming Prompter”, MSR ’14, 2014.
PROMPTER: Architecture
  Ponzanelli et al., “Mining StackOverflow to Turn the IDE into a Self-Confident Programming Prompter”, MSR ’14, 2014.
Repairing bugs based on semantic code search
  - SearchRepair: semantic code search = encode/index code based on the constraints on its input-output behavior
      - Localize the defect to the buggy part of the program
      - Construct an input-output profile for the buggy code
      - Search the database for fragments with similar profiles
      - Generate a patch based on the similar code and validate it against the test suite
  - Results: repaired programs passed 97.3% of tests, while other techniques reached only 64-72% correctness
  - Quite an involved technique, since it requires constraint solving
  Ke et al., “Repairing Programs with Semantic Code Search”, ASE ’15, 2015.
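The profile, search and validate steps above can be sketched as follows. This is a heavily simplified illustration, not SearchRepair itself: it replaces constraint solving with direct execution of candidate fragments, assumes the defect has already been localized to one function, and assumes a profile made of positive examples (behavior on passing tests) and negative examples (wrong outputs on failing tests). The buggy function, the fragment database and the test suite are all hypothetical.

```python
def buggy_abs_diff(a, b):
    # Hypothetical localized defect: wrong when a < b.
    return a - b

# Hypothetical test suite: (arguments, expected output).
TESTS = [((5, 3), 2), ((3, 5), 2), ((4, 4), 0)]

# Hypothetical database of candidate fragments indexed by name.
FRAGMENTS = {
    "max":         lambda a, b: max(a, b),
    "sub":         lambda a, b: a - b,
    "sub_clamped": lambda a, b: max(a - b, 0),
    "abs_sub":     lambda a, b: abs(a - b),
    "add":         lambda a, b: a + b,
}

def build_profile(buggy, tests):
    """Split observed behavior into examples to keep and examples to avoid."""
    positive, negative = [], []
    for args, expected in tests:
        actual = buggy(*args)
        if actual == expected:
            positive.append((args, actual))   # behavior a patch must preserve
        else:
            negative.append((args, actual))   # wrong output a patch must not repeat
    return positive, negative

def matches_profile(fragment, positive, negative):
    """A fragment matches if it keeps the good behavior and avoids the bad."""
    return (all(fragment(*args) == out for args, out in positive)
            and all(fragment(*args) != out for args, out in negative))

def passes_all(fragment, tests):
    """Validation step: the candidate patch must pass the whole test suite."""
    return all(fragment(*args) == expected for args, expected in tests)

def search_repair(buggy, fragments, tests):
    """Return the names of fragments that match the profile and pass validation."""
    positive, negative = build_profile(buggy, tests)
    candidates = [name for name, frag in fragments.items()
                  if matches_profile(frag, positive, negative)]
    return [name for name in candidates if passes_all(fragments[name], tests)]
```

In this toy setup sub_clamped matches the profile but fails validation, which shows why the final check against the test suite is still needed after the search.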
Thank you!
robert.feldt@huawei.com
xiaoming.duan@huawei.com