Introduction and Overview to Mining Software Repository
Zoltan Karaszi, zkaraszi (at) kent.edu
MS/PhD seminar (cs6/89191), November 9th, 2011

Presentation transcript:


Abstract
- Based on the following survey paper: "A survey and taxonomy of approaches for mining software repositories in the context of software evolution" by Huzefa Kagdi, Michael L. Collard and Jonathan I. Maletic, 2007
- After defining MSR and giving background and the different classifications, my main goal is to give a general picture of MSR
- After showing the different MSR approaches, I will focus on one example, frequent-pattern mining, which examines the changes and evolution of software

Outline
1. Introduction
2. Dimensions of survey
3. A layered taxonomy of MSR
4. Software repository mining overview
5. Example: Frequent-pattern mining
6. Discussion and open issues
7. Concluding remarks
8. References

1. Introduction
1.1. Terms
1.2. Premise
1.3. Scope, background, history
1.4. Goals of the survey

1. Introduction
1.1. Terms
- Mining Software Repositories (MSR): a term coined to describe a broad class of investigations into the examination of software repositories
- Software Repositories (SRs): artifacts produced and archived during software evolution
- Concurrent Versions System (CVS): a free client-server revision control system that keeps track of all changes in a set of files
1.2. Premise
- Empirical and systematic investigations of repositories
- Uncover information, relationships or trends
- Shed new light on the process of software evolution and the changes that occur

1.3. Scope, background and history
- Scope
  - Surveys the literature up to June 2006
  - Specifically investigates evolutionary changes of software artifacts
- Background
  - No earlier survey covered investigations that examine the changes and evolution of software using data mining and similar techniques
- In the past
  - MSR investigations were conducted on industrial systems
  - Research efforts were limited to a few software systems
- Currently
  - Large increase in open-source software: how to manage this challenge?

1.4. Goals of the survey
- Form a basis for researchers interested in MSR to better understand the evolution of software systems
- Create a taxonomy to assist in the continued advancement of the field
- Provide a clearer understanding to support the development of tools, methods, and processes that more precisely reflect the actual nature of software evolution

2. Dimensions of survey
2.1. Information sources
2.2. Purpose
2.3. Methodology
2.4. Evaluation

2. Dimensions of the survey
2.1. Information sources
Categories of information in SRs
- Metadata about the software change: comments, user-ids, timestamps
- Differences between the versions: addition, deletion or modification
- Classification of different software versions (artifacts)
Version control and bug-tracking systems
- CVS: does not maintain explicit branch and merge points
- Subversion (more modern): builds the change-set
- Bugzilla: bug-tracking system that keeps the history of the entire lifecycle of a bug (bug report)

2.2. Purpose
Extract information and uncover relationships or trends in source code evolution
Two classes of MSR questions:
- Market-Basket Question (MBQ), formulated as:
  - If A occurs, then what else occurs on a regular basis?
- Prevalence Question (PQ), formulated as:
  - Was a particular function added/deleted/modified?
  - How many and which of the functions are reused?
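
The difference between the two question classes can be shown on a toy change history. The following is a minimal sketch, not taken from the survey: the commit data and the helper names `market_basket` and `prevalence` are hypothetical, and each commit is simplified to the set of functions it modified.

```python
from collections import Counter

# Hypothetical change history: each commit is the set of functions it modified.
commits = [
    {"parse", "lex"},
    {"parse", "lex", "emit"},
    {"emit"},
    {"parse", "lex"},
]

def market_basket(commits, entity):
    """MBQ: when `entity` changes, what else changes, and how often?"""
    co_changes = Counter()
    for changed in commits:
        if entity in changed:
            co_changes.update(changed - {entity})
    return co_changes

def prevalence(commits, entity):
    """PQ: in how many commits was `entity` modified?"""
    return sum(1 for changed in commits if entity in changed)

print(dict(market_basket(commits, "parse")))  # co-change counts for "parse"
print(prevalence(commits, "emit"))            # number of commits touching "emit"
```

An MBQ returns a ranked set of co-occurring entities, while a PQ returns a count or a yes/no answer about one entity.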

2.3. Methodology
Researchers utilize software repositories in multiple ways
- Limit the studies to the metadata directly available from the repositories → using the semantic manner, traditional
- Use directly the functionality of source code repositories (CVS commands) to get a particular version of the code → using the adopted/invented methodology
2.4. Evaluation
Assessment metrics
- Precision: how much of the information found is relevant
- Recall: how much of all of the relevant information is found
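
The two assessment metrics can be computed directly from a set of suggestions and a set of ground-truth items. This is a minimal sketch using the standard definitions; the function name `precision_recall`, the example sets, and the choice to return 0.0 for an empty set are assumptions made for illustration.

```python
def precision_recall(suggested, relevant):
    """Return (precision, recall) of `suggested` against the ground truth `relevant`."""
    suggested, relevant = set(suggested), set(relevant)
    hits = suggested & relevant
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    precision = len(hits) / len(suggested) if suggested else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: a tool suggests four entities, three of which were
# really changed; five entities were really changed in total.
p, r = precision_recall({"a", "b", "c", "x"}, {"a", "b", "c", "d", "e"})
print(p, r)  # 0.75 0.6
```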

3. Layered taxonomy of MSR approaches

All the works investigated in the survey paper:
- work on version-release histories
- work on the same level of granularity
- ask and answer very similar types of MSR questions
- analyze the information and derive conclusions within the context of software evolution
Figure: The four-layer taxonomic description [1]

4. Software repository mining overview
4.1. Metadata analysis
4.2. Static source code analysis
4.3. Source code differencing and analysis
4.4. Software metrics
4.5. Visualization
4.6. Clone-detection methods
4.7. Information-retrieval methods
4.8. Classification with supervised learning
4.9. Social network analysis

4. Software repository mining overview
4.1. Metadata analysis
- Lightweight methodology to analyze metadata
- Utilizes the metadata stored in software repositories
- Straightforward first choice because it is easily accessible (CVS log)
4.2. Static source code analysis
- Good approach to extract facts and other information from versions of a system
- Bug finding and fixing
4.3. Source code differencing and analysis
- Further extension of MSR with regard to source code changes
- Works in a more source-code-'aware' manner
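
As a rough illustration of source code differencing, two versions of a file can be compared and the changes classified as additions, deletions, or modifications. The sketch below is a line-based simplification built on Python's standard `difflib`; it is not syntax-aware, unlike the approaches the survey describes, and the function name `classify_line_changes` and the sample versions are hypothetical.

```python
import difflib

def classify_line_changes(old_lines, new_lines):
    """Count added, deleted, and modified lines between two versions."""
    counts = {"added": 0, "deleted": 0, "modified": 0}
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            counts["added"] += j2 - j1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
        elif tag == "replace":
            # Pair replaced lines as modifications; any surplus on either
            # side counts as plain additions or deletions.
            paired = min(i2 - i1, j2 - j1)
            counts["modified"] += paired
            counts["added"] += (j2 - j1) - paired
            counts["deleted"] += (i2 - i1) - paired
    return counts

# Hypothetical versions of one small source file.
old_version = ["int f() {", "  return 1;", "}"]
new_version = ["int f() {", "  return 2;", "}", "int g() { return 0; }"]
print(classify_line_changes(old_version, new_version))
```

A syntax-aware differencer would report "body of f modified, function g added" instead of raw line counts, which is the kind of extension section 4.3 refers to.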

4.4. Software metrics
- Quantitatively measure various aspects of software products and projects
- Include size, effort, cost, functionality, quality, complexity and efficiency
4.5. Visualization
Interactive visual representation of data to amplify cognition and to support software maintenance and evolution
- Very task specific
- Based on the mined data and how one separates approach categories
4.6. Clone-detection methods
- Approaches to identify both exact and near-miss clones
- Clones: source code entities with similar textual, structural and semantic composition
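
One simple way to see the difference between exact and near-miss clones is to compare token sequences after replacing identifiers and numbers with placeholders. The sketch below is only an illustration under several assumptions: the keyword set is a tiny hypothetical subset of a C-like language, the tokenizer is a crude regular expression, and the helper names `normalize` and `similarity` are not from the survey.

```python
import difflib
import re

# Assumed, deliberately tiny keyword set for a C-like language.
KEYWORDS = {"int", "return", "if", "else", "for", "while"}

def normalize(code):
    """Tokenize and replace identifiers and numbers with placeholders."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    normalized = []
    for tok in tokens:
        if tok in KEYWORDS:
            normalized.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            normalized.append("ID")
        elif tok.isdigit():
            normalized.append("NUM")
        else:
            normalized.append(tok)
    return normalized

def similarity(a, b):
    """Similarity in [0, 1] of two fragments after normalization."""
    return difflib.SequenceMatcher(a=normalize(a), b=normalize(b), autojunk=False).ratio()

first = "int sum(int a, int b) { return a + b; }"
renamed = "int add(int x, int y) { return x + y; }"
other = "int f() { return 1; }"
print(similarity(first, renamed))  # renamed copy: identical after normalization
print(similarity(first, other))    # structurally different fragment
```

Fragments that differ only in names score 1.0 here; a threshold below 1.0 would be needed to also accept fragments with small structural edits as near-miss clones.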

4.7. Information-Retrieval (IR) methods
Classification and clustering of textual units
- Applied to many software engineering problems
- Traceability, program comprehension, and software reuse
- CVS comments, textual descriptions of bug reports, and e-mails
4.8. Classification with supervised learning
- Supervised learning: a technique for creating a cause-effect function from training data
4.9. Social network analysis
- For deriving and measuring 'invisible' relationships between social entities
- To discover developer roles, contributions, and associations in software development
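
For the social network analysis in 4.9, one commonly used starting point is a graph that links developers who modified the same files. The sketch below assumes that (developer, file) pairs have already been extracted from commit metadata; the sample data and the function name `collaboration_graph` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (developer, file) pairs extracted from commit metadata.
changes = [
    ("alice", "parser.c"),
    ("bob", "parser.c"),
    ("alice", "lexer.c"),
    ("carol", "lexer.c"),
    ("bob", "lexer.c"),
]

def collaboration_graph(changes):
    """Return edge weights: number of files that both developers modified."""
    devs_by_file = defaultdict(set)
    for dev, path in changes:
        devs_by_file[path].add(dev)
    weights = defaultdict(int)
    for devs in devs_by_file.values():
        # Sort so each undirected edge has one canonical (a, b) key.
        for a, b in combinations(sorted(devs), 2):
            weights[(a, b)] += 1
    return dict(weights)

print(collaboration_graph(changes))
```

Heavier edges indicate developers whose work overlaps more, which is one kind of 'invisible' relationship that is not recorded explicitly in the repository.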

5. Example: Frequent-pattern mining
5.1. Evolutionary couplings and change predictions
5.2. Capabilities of technique
5.3. Extension of their work [33]
5.4. Evaluation
5.5. Advantages of extended ROSE

5. Example: Frequent-pattern mining
- Discovers implicit knowledge from large datasets (patterns, trends, rules)
- Encompasses IR, statistical analysis and modeling, and machine learning
- Applied to uncover software entities that frequently change together (frequent patterns)
- Can include the ordering information [34]

5.1. Evolutionary couplings and change predictions
Zimmermann et al. [15] aimed to identify co-occurring changes in a software system
- Purpose: find which changes occur together: when a source code entity (function A) is modified, which other entities (functions B and C) are also modified?
- Use:
  - ROSE (parser tool) for source code (C++, Java, Python)
  - Association-rule mining technique to determine rules of the form B => A
- Derived association rules, such as: a change to a particular 'type' definition leads to changes
  - in instances of variables of that 'type'
  - in the coupling between interface and implementation
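
A rule of the form B => A is usually judged by its support (how many transactions contain both sides) and its confidence (the fraction of transactions containing B that also contain A). The sketch below computes both for a single rule over toy transactions. It illustrates the general association-rule idea only and is not ROSE's implementation; the transaction data and the function name `rule_stats` are hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support count, confidence) for the rule antecedent => consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    support = len(with_both)
    # Assumed convention: confidence is 0.0 when the antecedent never occurs.
    confidence = support / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical transactions: entities changed together in one commit.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B"},
    {"A", "C"},
]

support, confidence = rule_stats(transactions, {"B"}, {"A"})
print(support, confidence)  # B occurs in 3 commits, 2 of which also change A
```

A recommender in this style would show only rules whose support and confidence exceed chosen thresholds when a developer changes B.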

5.2. Capabilities of technique
- Ability to identify addition, modification and deletion of syntactic entities
- Handles various programming languages and HTML documents
- Detection of hidden dependencies
Figure 1.2: Programmers who Changed this Function also Changed… [15]

5.3. Extension of their work [33]
Allows prediction of additions to and deletions from entities
ROSE was evaluated for:
- Navigation (recommendation of other affected entities)
- Closure (false suggestions for missing entities)
- Granularity (fine versus coarse)
- Maintenance (modified only)

5.4. Evaluation ('interactive power' of ROSE tool)
- Period: at least one month selected for eight open-source projects
- Prediction: based on previous versions, predict the changes that occurred during the evaluation period
- New additional measure, feedback: percentage of queries
Average precision, recall, and feedback values
- Navigation and prevention support is better at the coarse level than at the fine level of granularity
- Average feedback values: in the case of closure: 1.9%; in the case of fine and coarse granularity: 3%

5.5. Advantages of extended ROSE tool
- Needs only a few weeks of history to make suggestions
- Results can be improved by assigning higher weight to rapid renames and moves
Similar approach
Ying et al. [34]: approach for source code change prediction at the file level
- Use: association-mining technique based on FP-tree item-set mining
- Evaluated: version histories of the Mozilla and Eclipse projects

6. Discussion and open issues
7. Concluding remarks
8. References

6. Discussion and open issues
- Need to be able to perform MSR on fine-grained entities
- Standards for validation must be developed
7. Concluding remarks
- Over 80 investigations were surveyed
- A layered taxonomy was derived
MSR investigations are a promising avenue to help support and understand software evolution!

8. References
[1] Kagdi H, Collard ML, Maletic JI. A survey and taxonomy of approaches for mining software repositories in the context of software evolution. Journal of Software Maintenance and Evolution: Research and Practice (JSME) 2007; 19(2), pp.
[15] Zimmermann T, Weißgerber P, Diehl S, Zeller A. Mining version histories to guide software changes. Proceedings 26th International Conference on Software Engineering (ICSE'04). IEEE Computer Society Press: Los Alamitos CA, 2004.
[33] Zimmermann T, Zeller A, Weißgerber P, Diehl S. Mining version histories to guide software changes. IEEE Transactions on Software Engineering 2005; 31(6):429–445.
[34] Ying ATT, Murphy GC, Ng R, Chu-Carroll MC. Predicting source code changes by mining change history. IEEE Transactions on Software Engineering 2004; 30(9):574–586.

Thank you for your time!