Clustering Software Artefacts Based on Frequent common changes Presented by Haroon Malik.



Abstract Clusters of artifacts that are frequently changed together are subsystem candidates. Clusters are identified with a two-step method: (1) extract the co-change graph from the version control repository; (2) compute a layout of the co-change graph. The layout reveals the clusters of frequently co-changed artifacts.

Proposed Model High-level descriptions can be recovered from source code and other low-level information through reverse engineering. Software clustering divides software artifacts into subsystems that are as independent as possible with respect to comprehension, change, reuse, etc. A co-change graph model is proposed for clustering software systems.

Proposed Model (Cont'd) Co-change graph: an abstraction of version control repositories. The vertices of this graph are software artifacts (files or functions) and change transactions (commits, in CVS terms). Edges connect change transactions with their participating artifacts.

Proposed Model (Cont'd) Presentation: The result of the clustering is not a partition of the graph vertices but a layout of the graph vertices, that is, a position for each vertex in two- or three-dimensional space. Heavily co-changed artifacts are placed closer together; rarely co-changed artifacts are placed at larger distances. The layout is comprehensive and provides additional information: how clearly the clusters are separated, and whether an artifact lies at the center of a cluster or rather between two clusters.

Proposed Model (Cont'd) Contents: The artifacts are not just arranged in some nice way; their positions have a well-defined interpretation with respect to their common changes. Two artifacts are placed closer together the more strongly their common changes exceed what would be expected at random.

Co-Change Graph. The graph represents the common changes of artifacts in version repositories. It can be easily extracted from version repositories. This ensures that the clustering results have a clear interpretation in terms of the repository. Biases through arbitrary choices, e.g. of weight functions or of values of free parameters, are minimized.

Co-Change Graph (Cont'd) Software artifact: an entity that belongs to a software system, e.g. a package, a file, a line of code, or even a piece of documentation. Version: the state of a software artifact at a particular point in time. A version control system such as CVS stores the versions of artifacts in a central repository. Users of such systems modify local copies of the software artifacts and check in their changes to the central repository.

Co-Change Graph (Cont'd) Change transaction: a coherent sequence of check-ins of several software artifacts. Software artifacts that participate in the same change transaction are co-changed (commonly changed). The co-change graph of a given version repository is an undirected graph (V, E). The set of vertices V contains all software artifacts and all change transactions of the version repository.

Co-Change Graph (Cont'd) The set of edges E contains the undirected edge {c, a} if the artifact a was changed by the transaction c. Bipartite: the graph contains no edges that connect two change transactions or two software artifacts.
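
The definition of V and E can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a mapping from a transaction id to the artifacts it changed), the function names, and the sample file names are all hypothetical and do not come from the slides or from any tool mentioned in them.

```python
from typing import Dict, Iterable

def build_co_change_graph(transactions: Dict[str, Iterable[str]]):
    """Build a bipartite co-change graph (V, E).

    `transactions` maps a transaction id to the artifacts it changed
    (a hypothetical input format chosen for this sketch).
    Vertices are tagged tuples so that a transaction and an artifact
    with the same name cannot collide.
    """
    transaction_vertices = set()
    artifact_vertices = set()
    edges = set()
    for c, artifacts in transactions.items():
        tv = ("transaction", c)
        transaction_vertices.add(tv)
        for a in artifacts:
            av = ("artifact", a)
            artifact_vertices.add(av)
            # Undirected edge {c, a}: artifact a was changed by transaction c.
            edges.add(frozenset((tv, av)))
    return transaction_vertices, artifact_vertices, edges

def is_bipartite_by_kind(edges) -> bool:
    """True if every edge joins exactly one transaction and one artifact."""
    return all({kind for kind, _ in e} == {"transaction", "artifact"} for e in edges)

# Made-up sample data: two transactions over three files.
example = {"c1": ["Main.java", "Parser.java"], "c2": ["Parser.java", "README"]}
T, A, E = build_co_change_graph(example)
```

Because edges are only ever created between a transaction vertex and an artifact vertex, the bipartite property holds by construction; `is_bipartite_by_kind` merely makes that check explicit.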

Co-Change Graph (Cont'd) For a vertex v of a co-change graph, the number of its adjacent vertices is called the degree of v and is denoted by deg(v). For transaction vertices, the degree gives the number of artifacts that participate in the transaction. For artifact vertices, the degree gives the number of their changes.
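
As a small worked example of deg(v), the sketch below counts adjacent vertices from an edge list. The edge list and names are invented for illustration, and the code assumes each undirected edge is listed exactly once and that transaction ids and artifact names do not overlap.

```python
from collections import Counter
from typing import Iterable, Tuple

def degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Return deg(v) for every vertex that appears in an edge.

    Assumes each undirected edge appears once in `edges`.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Made-up co-change graph: (transaction, artifact) pairs.
edges = [
    ("c1", "Main.java"),
    ("c1", "Parser.java"),
    ("c2", "Parser.java"),
    ("c2", "README"),
]
deg = degrees(edges)
```

Here `deg["c1"]` is the number of artifacts in transaction c1, and `deg["Parser.java"]` is the number of transactions that changed that file.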

Weighted Co-Change Graph A weight function w assigns a real number to each edge in the set of edges E. The real number assigned to an edge expresses the importance of the corresponding change. Each edge is given the same weight.

Condensed Co-Change Graph A weighted, undirected graph (V, E, w) for a given repository, where the set of vertices V contains all software artifacts in the repository and the set of edges E contains the edge {a, a'} if the artifacts a and a' were commonly changed by a transaction.
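
A condensed graph of this kind could be derived from the transactions roughly as follows. The slide does not say how w is defined, so the sketch assumes the simplest choice, the number of transactions that changed both artifacts; that weight definition, the input format, and all names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable

def condensed_co_change_graph(transactions: Dict[str, Iterable[str]]):
    """Return (V, w) where w maps an edge {a, a'} to its weight.

    ASSUMPTION: the weight of {a, a'} is the number of transactions
    that changed both artifacts; the slides do not define w.
    """
    vertices = set()
    weights = Counter()
    for artifacts in transactions.values():
        unique = sorted(set(artifacts))  # ignore repeated check-ins of one file
        vertices.update(unique)
        for a, b in combinations(unique, 2):
            weights[frozenset((a, b))] += 1
    return vertices, weights

# Made-up sample data.
example = {
    "c1": ["Main.java", "Parser.java"],
    "c2": ["Parser.java", "README"],
    "c3": ["Main.java", "Parser.java"],
}
V, w = condensed_co_change_graph(example)
```

The edge set E is simply the set of keys of `w`; pairs of artifacts that never share a transaction do not appear in it.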

Edge-Repulsion LinLog Energy Model This model specifies what a good graph layout is. The basic idea is that, in the co-change graph, edges cause both attraction and repulsion. Every edge causes the same amount of repulsion and attraction. The model helps in creating suitable, readable layouts.
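
The slide does not state the energy function itself. The sketch below assumes the form commonly given for the edge-repulsion LinLog model: a linear attraction term summed over edges, minus a logarithmic repulsion term summed over all vertex pairs and scaled by the product of their degrees. Treat that formula, the function name, and the sample layout as assumptions; the code only evaluates the energy of a given layout and does not minimize it.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Tuple

def edge_repulsion_linlog_energy(
    positions: Dict[str, Tuple[float, ...]],
    edges: Iterable[Tuple[str, str]],
) -> float:
    """Energy of a layout under the assumed edge-repulsion LinLog form:

        U(p) = sum over edges {u,v} of ||p_u - p_v||
             - sum over vertex pairs {u,v} of deg(u) * deg(v) * ln ||p_u - p_v||

    All positions must be pairwise distinct (ln 0 is undefined).
    """
    edges = list(edges)
    deg = {v: 0 for v in positions}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    attraction = sum(math.dist(positions[u], positions[v]) for u, v in edges)
    repulsion = sum(
        deg[u] * deg[v] * math.log(math.dist(positions[u], positions[v]))
        for u, v in combinations(positions, 2)
    )
    return attraction - repulsion

# Made-up example: a path a - b - c laid out on a line with unit spacing.
layout = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0)}
energy = edge_repulsion_linlog_energy(layout, [("a", "b"), ("b", "c")])
```

A layout algorithm would search for positions that minimize this value; weighting repulsion by vertex degree is what ties the repulsion to the edges rather than to the vertices alone.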

Evaluation The software systems were chosen based on size, number of developers, project duration, and the use of artifacts in different programming languages, and based on familiarity.

Evaluation The co-change graphs were extracted at the file level. The tool cvs2cl was used to recover change transactions from the CVS repository. CrocoPat, a calculator for relations, generated the co-change graph from the transactions. Duration, total changes, and indeed all numbers were obtained with the tool StatCVS. The layout was computed automatically using the edge-repulsion LinLog energy model.

Artifacts in the CrocoPat repository

Artifacts in the Rabbit repository

Artifacts in the Blast repository

Conclusions Introduced a new method for clustering software artifacts. Defined the co-change graph as the underlying formal model. Evaluated our method on three example software systems with different types of documents and source code in several programming languages.