Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and.

Presentation transcript:

Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns
David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and K. Vijay-Shanker
Presented By: Paul Heintzelman

Global Concepts
Concept assignment problem
Hybrid of structural and natural language information
Concern comprehension
Action-oriented relations between identifiers
  – Represented by the action-oriented identifier graph model (AOIG)

Why Action-Oriented Concerns
In OOP
  – Code is organized by objects
      Objects are nouns
      Objects and actions conflict
  – Code organized by objects causes actions to be scattered
Therefore, in OOP, action-oriented concerns tend to be scattered and more difficult to locate

Paper Contributions
AOIG
  – Interactive query expansion algorithm
  – A result graph construction algorithm
  – An Eclipse plug-in
Evaluation
  – Comparison of search effectiveness of tools
  – Per-task analysis
  – Comparison of user effort

AOIG

State of the Art
Search-based approaches
  – Lexical searches
      Lead to over-generalized searches
  – Information retrieval
      Does not separate verbs and objects
      Uses word frequency
Program navigation
  – Uses structural information, e.g. call, inheritance graphs...
  – Accurate but difficult to seed
Dynamic approaches
  – Requires test case to enact concept

Challenges
Map high-level concepts to queries
  – Aid user in mapping concepts
Inability to search with high precision and recall
  – Search NLP representation of concern
Understanding large result sets
  – Return results in an explorable graph

Overview of Approach
User formulates a query
  – Query must include verb-direct object pairings
User expands query
  – Recommendations based on query words and source code
Searches the AOIG
  – Interact with result graph

Independent Variables
Search Tools
  – Find-Concept
  – ELex: built-in Eclipse search
  – GES: Google Eclipse Search (modified)
Search Tasks
  – Application concept pairing
Human Subjects
  – 13 professional programmers
  – 5 grad students

Application Concept Pairing
Applications
  – 4 large open-source Java projects
      9 concepts taken from bug reports
  – 1 training application
      2 concepts

Forming the Initial Query
User generates abstract initial query
  – e.g. “automatically finish the word”
User decomposes abstract query into verb-direct object pairs
  – e.g. “finish” and “word”
Find-Concept maintains both a verb query and a direct object query
Initial query expansion
  – User is presented with alternative forms of words in both queries

Query Expansion
Iterative steps
  – Generate recommended list
      Similar semantics is weighted more heavily than similar use
      10 ranked recommendations
  – User examines recommendations
      User selects words to add to queries
      User can view a list of methods fitting the current queries
Stop when user is satisfied
  – Augment user query with get, set, execute, construct
Use AOIG to map verb-direct object pairs to source code
  – Generate result graph
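The final mapping step can be pictured with a small Python sketch. Everything here is hypothetical: the AOIG is reduced to a dictionary from verb-direct object pairs to method names, the method names are invented, and the sketch assumes the four augmentation words are added to the verb query, which the slide does not state explicitly. The real AOIG and result graph are richer than this.

```python
from itertools import product

# Hypothetical AOIG: verb-direct object pairs mapped to the methods using them.
AOIG = {
    ("complete", "word"): ["Editor.completeWord"],
    ("finish", "word"): ["AutoComplete.finishWord"],
    ("get", "word"): ["Buffer.getWord"],
    ("save", "file"): ["Buffer.saveFile"],
}

# Words named on the slide; treating them as extra verbs is an assumption.
AUGMENT_VERBS = ("get", "set", "execute", "construct")


def search(verb_query, object_query, augment=True):
    """Return the AOIG entries matched by any verb/direct-object combination."""
    verbs = set(verb_query)
    if augment:
        verbs |= set(AUGMENT_VERBS)
    results = {}
    for pair in product(sorted(verbs), sorted(object_query)):
        if pair in AOIG:
            results[pair] = AOIG[pair]
    return results


matches = search({"finish", "complete"}, {"word"})
plain_matches = search({"finish", "complete"}, {"word"}, augment=False)
```

With the toy data, the augmented query also picks up the `("get", "word")` pair, which shows how augmentation widens the result set beyond the words the user chose.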

Word Recommendation
Similar semantics
  – Stemming
      Recommends different forms of words in either list
      e.g. If “finish” is in verb-query, “finished” will be recommended
  – Synonyms
      Recommend a word if synonym exists in either list
      e.g. Recommend “complete” if “finish” is in list
Similar use
  – Recommend words that occur near words in either query
  – e.g. Recommend “word” if “complete” is in the verb query and “complete word” is in the AOIG
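The three recommendation sources above can be sketched in Python as follows. This is only an illustration under stated assumptions: the suffix-stripping stemmer, the synonym table, the AOIG pairs, and the 2-versus-1 weights are all made up for the example. The slides say only that similar semantics outweighs similar use and that 10 ranked recommendations are shown.

```python
# Toy data: every entry is a hypothetical example, not taken from the paper.
SYNONYMS = {"finish": {"complete", "end"}, "complete": {"finish"}}
AOIG_PAIRS = {("complete", "word"), ("finish", "line"), ("save", "file")}
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Crude suffix stripping; a real tool would use a proper stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def recommend(verb_query, object_query, vocabulary, limit=10):
    """Rank candidate words; semantic matches (2) outweigh similar use (1)."""
    query = set(verb_query) | set(object_query)
    query_stems = {stem(w) for w in query}
    scores = {}
    for word in vocabulary:
        if word in query:
            continue
        score = 0
        if stem(word) in query_stems:  # similar semantics: same stem
            score += 2
        if any(word in SYNONYMS.get(q, ()) for q in query):  # synonym
            score += 2
        if any((v == word and o in query) or (o == word and v in query)
               for v, o in AOIG_PAIRS):  # similar use: shares an AOIG pair
            score += 1
        if score:
            scores[word] = score
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:limit]


first = recommend({"finish"}, set(), {"finished", "complete", "line", "file", "save"})
second = recommend({"complete"}, set(), {"word", "finish"})
```

In the second call, “word” is recommended because “complete word” is in the toy AOIG, mirroring the slide's example; “finish” ranks above it because it is a synonym.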

Evolution of a Query

Result Graph

Find-Concept Process

Research Questions
Which search tool is most effective at locating concerns by forming and executing a query?
Which search tool requires the least amount of human effort to form an effective query?

Evaluation
Effectiveness
  – Use the harmonic mean of precision and recall (f-measure)
      (2 * precision * recall) / (precision + recall)
  – Result set is compared to evaluation set
      Evaluation set is 90% generated by a member unfamiliar with the work of this paper
Effort
  – Measured amount of time required to form each query
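As a worked example of the effectiveness measure, the Python sketch below computes the f-measure of a result set against an evaluation set. The method names in the two sets are invented for illustration, and returning 0.0 when there is no overlap is a convention chosen here to avoid division by zero.

```python
def f_measure(result_set, evaluation_set):
    """Harmonic mean of precision and recall for a returned result set."""
    result_set, evaluation_set = set(result_set), set(evaluation_set)
    hits = len(result_set & evaluation_set)
    if hits == 0:
        return 0.0  # convention for the sketch: no overlap scores zero
    precision = hits / len(result_set)      # fraction of results that are relevant
    recall = hits / len(evaluation_set)     # fraction of relevant items found
    return (2 * precision * recall) / (precision + recall)


# Hypothetical result and evaluation sets.
returned = {"completeWord", "finishWord", "paintWord", "saveFile"}
relevant = {"completeWord", "finishWord", "autoComplete"}
score = f_measure(returned, relevant)
```

Here precision is 2/4 and recall is 2/3, so the f-measure is 4/7, roughly 0.57.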

Experimental Setup
Training
  – Subjects are guided through the use of each tool on the two training tasks
Task setup
  – Users are presented concepts in a visual form
  – Users confirm that they understood each task

Experimental Procedure
9 tasks
18 programmers
6 groups
6 of every task-tool combination
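The numbers on this slide are mutually consistent, which a short Python sketch can check. The rotation rule `tool_for` is a hypothetical assignment invented for the check; the slides do not say how tools were actually assigned to groups. It assumes three tools, groups of three subjects, and every subject performing each task once.

```python
from collections import Counter

TOOLS = ["Find-Concept", "ELex", "GES"]
NUM_TASKS = 9
NUM_SUBJECTS = 18
NUM_GROUPS = 6
SUBJECTS_PER_GROUP = NUM_SUBJECTS // NUM_GROUPS  # 3 subjects per group


def tool_for(task, group):
    """Hypothetical rotation; the real assignment is not given in the slides."""
    return TOOLS[(task + group) % len(TOOLS)]


# Count how many subjects perform each (task, tool) combination.
counts = Counter()
for group in range(NUM_GROUPS):
    for _subject in range(SUBJECTS_PER_GROUP):
        for task in range(NUM_TASKS):
            counts[(task, tool_for(task, group))] += 1

total_queries = sum(counts.values())
```

Under this rotation each of the 27 task-tool combinations is exercised by exactly 6 subjects, for 162 queries in total, or 9 per subject.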

Results
Find-Concept vs. ELex
  – Consistently outperformed ELex
Find-Concept vs. GES
  – Outperformed GES on 4 tasks
  – Outperformed by GES on 2 tasks
      AOIG to blame?
  – Performed equally to GES on 3 tasks

Effectiveness

Effort
Human effort was very similar with all tools

Threats to Validity
The selected tasks favored one tool
  – Concerns selected from bug reports
Evaluation sets created for evaluation
  – 90% generated by member unfamiliar with work
Results may not generalize to all Java applications
  – Tested on reasonably-sized applications
Results may not generalize to all types of concepts

Conclusion
Interactive query expansion algorithm
Graph construction algorithm
Find-Concept performs well against state-of-the-art tools
All evaluated tools required similar human effort

Future Work
Create a more effective AOIGBuilder
Evaluate the effect of application’s quality and size on results
Evaluate the effect of incorporating naming conventions
Perform a study on how many tasks focus on actions
Automate query expansion

Additional Threats to Validity
Effort and effectiveness are not really independent
Relies heavily on unjustified heuristic
  – Augmenting query
Search tools are often used in conjunction with structural tools