2001 IEEE International Conference on Software Maintenance (ICSM'01).

Slides:



Advertisements
Similar presentations
Experiments and Variables
Advertisements

Iterative Optimization and Simplification of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial.
1 An Architecture for Distributing the Computation of Software Clustering Algorithms 2001 Working Conference on Software Architecture (WICSA'01). Brian.
ARCHITECTURAL RECOVERY TO AID DETECTION OF ARCHITECTURAL DEGRADATION Joshua Garcia*, Daniel Popescu*, Chris Mattmann* †, Nenad Medvidovic*, and Yuanfang.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Clustering Categorical Data: An Approach Based on Dynamical Systems (1998) David Gibson, Jon Kleinberg, Prabhakar Raghavan VLDB Journal: Very Large Data.
Mapping Nominal Values to Numbers for Effective Visualization Presented by Matthew O. Ward Geraldine Rosario, Elke Rundensteiner, David Brown, Matthew.
Reasoning and Identifying Relevant Matches for XML Keyword Search Yi Chen Ziyang Liu, Yi Chen Arizona State University.
Paper Title Your Name CMSC 838 Presentation. CMSC 838T – Presentation Motivation u Problem paper is trying to solve  Characteristics of problem  … u.
A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts
Semantic text features from small world graphs Jure Leskovec, IJS + CMU John Shawe-Taylor, Southampton.
Three Algorithms for Nonlinear Dimensionality Reduction Haixuan Yang Group Meeting Jan. 011, 2005.
ED 4 I: Error Detection by Diverse Data and Duplicated Instructions Greg Bronevetsky.
. Approximate Inference Slides by Nir Friedman. When can we hope to approximate? Two situations: u Highly stochastic distributions “Far” evidence is discarded.
ANOVA: Factorial Designs. Experimental Design Choosing the appropriate statistic or design involves an understanding of  The number of independent variables.
The Project AH Computing. Functional Requirements  What the product must do!  Examples attractive welcome screen all options available as clickable.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
1 Using Heuristic Search Techniques to Extract Design Abstractions from Source Code The Genetic and Evolutionary Computation Conference (GECCO'02). Brian.
SHOWTIME! STATISTICAL TOOLS IN EVALUATION CORRELATION TECHNIQUE SIMPLE PREDICTION TESTS OF DIFFERENCE.
Presenter : Kuang-Jui Hsu Date : 2011/5/3(Tues.).
Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
1 A Heuristic Approach Towards Solving the Software Clustering Problem ICSM03 Brian S. Mitchell /
Feature Detection in Ajax-enabled Web Applications Natalia Negara Nikolaos Tsantalis Eleni Stroulia 1 17th European Conference on Software Maintenance.
Graphs, Puzzles, & Map Coloring
Problem Paramount to the success of your effort stated precisely address an important question advance knowledge.
Andreas Papadopoulos - [DEXA 2015] Clustering Attributed Multi-graphs with Information Ranking 26th International.
1 The Search Landscape of Graph Partitioning Problems using Coupling and Cohesion as the Clustering Criteria Brian S. Mitchell & Spiros Mancoridis
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Graduate : Yu Cheng Chen Author: Manoranjan.
Reading Report: A unified approach for assessing agreement for continuous and categorical data Yingdong Feng.
Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
Design CS 470 – Software Engineering I Sheldon X. Liang, PH.D.
1 Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms Dagstuhl – Software Architecture Brian S. Mitchell
Color Image Segmentation Mentor : Dr. Rajeev Srivastava Students: Achit Kumar Ojha Aseem Kumar Akshay Tyagi.
Cedric D. Murry APT Instructor of Applied Technology in research and development.
Energy Models for Graph Clustering Bo-Young Kim Applied Algorithm Lab, KAIST.
Anomaly Detection Carolina Ruiz Department of Computer Science WPI Slides based on Chapter 10 of “Introduction to Data Mining” textbook by Tan, Steinbach,
Prediction of Interconnect Net-Degree Distribution Based on Rent’s Rule Tao Wan and Malgorzata Chrzanowska- Jeske Department of Electrical and Computer.
Yasuhiro Hayase†, Yu Kashima‡, Yuki Manabe‡, Katsuro Inoue‡
Graph clustering to detect network modules
Learning Objectives : After completing this lesson, you should be able to: Describe key data collection methods Know key definitions: Population vs. Sample.
Scatterplots, Association, and Correlation
An Efficient Method for Computing Alignment Diagnoses
Static Analysis of Object References in RMI-based Java Software
An Efficient Algorithm for Incremental Update of Concept space
On Routine Evolution of Complex Cellular Automata
Discrete ABC Based on Similarity for GCP
CACTUS-Clustering Categorical Data Using Summaries
Jonathan W. Duggins; James Blum NC State University; UNC Wilmington
Crystallography Labeling Using Machine Learning
Hidden Markov Models Part 2: Algorithms
Example: Applying EC to the TSP Problem
Visualizing Prim’s MST Algorithm Used to Trace the Algorithm in Class
Predict Failures with Developer Networks and Social Network Analysis
Computer Programming.
MIS2502: Data Analytics Clustering and Segmentation
Objects First with Java A Practical Introduction using BlueJ
MIS2502: Data Analytics Clustering and Segmentation
Automatic Segmentation of Data Sequences
Analyzing Two Participation Strategies in an Undergraduate Course Community Francisco Gutierrez Gustavo Zurita
On Refactoring Support Based on Code Clone Dependency Relation
Objects First with Java A Practical Introduction using BlueJ
TECHNICAL PAPER PRESENTATION By: Srihitha Yerabaka
Precise Condition Synthesis for Program Repair
Clustering The process of grouping samples so that the samples are similar within each group.
Generalizing Similar Functions
Evaluating Software Clustering Algorithms
Clustering Deviance From CART Analysis and Silhouette Widths
Efficient Aggregation over Objects with Extent
Presentation transcript:

2001 IEEE International Conference on Software Maintenance (ICSM'01). Comparing the Decompositions Produced by Software Clustering Algorithms using Similarity Measurements 2001 IEEE International Conference on Software Maintenance (ICSM'01). Brian S. Mitchell & Spiros Mancoridis Math & Computer Science, Drexel University

Motivation Using module dependencies when determining the similarity between two decompositions is a good idea…

Clustering the Structure of a System (1) Given the structure of a system…

Clustering the Structure of a System (2) The goal is to partition the system structure graph into clusters… The clusters should represent the subsystems

Clustering the Structure of a System (3) But how do we know that the clustering result is good? Many ways to partition only a few are good. Exploit commonality is the small set of reasonable solution.

Ways to Evaluate Software Clustering Results… Given a software clustering result, we can: Assess it against a mental model Assess it against a benchmark standard Techniques: Subjective Opinions Similarity Measurements

Example: How “Similar” are these Decompositions? Blue Edges: Similarity still the same… M1 M2 M3 M7 Green Edges: Similarity still the same… PA Red Edges: Not as similar… M5 M6 M4 M8 Conclusions: Once we add the red edges the similarity between PA and PB decreases M1 Many ways to partition only a few are good. Exploit commonality is the small set of reasonable solution. According to MoJo and PR/ still similar... copy this slide to 12 with the above comments M2 M3 M7 PB M4 M5 M6 M8

Observations Edges are important for determining the similarity between decompositions Existing measurements don’t consider edges: Precision / Recall (similarity) MoJo (distance) Our idea: Use the edges to determine similarity Add Author names for PR and MoJo

Research Objectives Create new similarity measurements that use dependencies (edges) EdgeSim (similarity) MeCl (distance) Evaluate the new similarity measurements against MoJo & Precision/Recall Use similarity measurements to support evaluation of software clustering results (see our WCRE’01 paper)

Example: How “Similar” are these Decompositions? Add Blue Edges: PR, MoJo, MeCl & EdgeSim unchanged. M1 M2 M3 M7 Add Green Edges: PR, MoJo, MeCl & EdgeSim unchanged. PA Add Red Edges: PR, MoJo unchanged. EdgeSim, MeCl reduced. M5 M6 M4 M8 M1 Many ways to partition only a few are good. Exploit commonality is the small set of reasonable solution. According to MoJo and PR/ still similar... copy this slide to 12 with the above comments M2 M3 M7 PB M4 M5 M6 M8

Definitions M1 M3 M2 M4 Internal/Intra-Edge: Edge within a cluster internal and external edges Internal/Intra-Edge: Edge within a cluster External/Inter-Edge: Edge between two clusters

EdgeSim Example PA PB MDG a b k l f g c h a b d e i j c f g d e h e d

Step 1: Find Common Inter- and Intra-Edges EdgeSim Example MDG PA a b k l a b f g c c f g h d e i j d e h i j PB e d k l c a h Step 1: Find Common Inter- and Intra-Edges i f k b j g l

EdgeSim Example PA PB MDG Common Edge Weight 10 = = 53% k l a b f g c c f g h d e i j d e h i j PB e d k l c a Common Edge Weight h i 10 f = = 53% k b Total Edge Weight 19 j g l

MeCl Example PA PB MDG a b k l f g c h a b d e i j c f g d e h e d c i

MeCl Example (AB) PA PB A1 A2 A3 B1 B2 a b k l f g c h d e i j e d c

MeCl Example (A1 B1) U PA PB A1,1 A1 A2 A3 a b c B1 B2 a b k l f g c h d e i j B1 B2 e d c a h i PB f k b j g l

MeCl Example (A2 B1) U PA PB A1,1 A2,1 A1 A2 A3 a b f g c B1 B2 a b k h d e i j B1 B2 e d c a h i PB f k b j g l

MeCl Example (A1 B2) U PA PB A1,1 A2,1 A1,2 A1 A2 A3 a b f g c d e B1 k l c PA f g c h d e i j A1,2 d e B1 B2 e d c a h i PB f k b j g l

MeCl Example (A2 B2) U PA PB A1,1 A2,1 A1,2 A2,2 A1 A2 A3 a b f g c d k l c PA f g c h d e i j A1,2 A2,2 d e h B1 B2 e d c a h i PB f k b j g l

MeCl Example (A3 B2) U PA PB A1,1 A2,1 A1,2 A2,2 A3,2 A1 A2 A3 a b f g k l c PA f g c h d e i j A1,2 A2,2 d e h B1 B2 e d c a h i j i PB f k b A3,2 k l j g l

MeCl Example (AB) B1 PA PB B2 A1,1 A2,1 A1,2 A2,2 A3,2 A1 A2 A3 a b f g a b k l c PA f g c h d e i j A1,2 A2,2 d e h B1 B2 e d c breakdown on A_X,2 a h i j i PB f k b A3,2 k l j g l B2

Newly Introduced Inter-Edges MeCl Example (AB) B1 Newly Introduced Inter-Edges A1,1 A2,1 A1 A2 A3 a b f g a b k l c PA f g c h d e i j A1,2 A2,2 d e h B1 B2 e d c a h i j i PB f k b A3,2 k l j g l B2

MeCl Example (BA) PA PB B1,1 B1,2 B2,1 B2,2 B2,3 A1 A2 A3 a b f g c d k l c PA f g c h B2,1 B2,2 d e i j d e h B1 B2 e d c a h i i j PB f k b k l j B2,3 g l

MeCl Example (BA) A1 A2 PA PB A3 B1,1 B1,2 B2,1 B2,2 B2,3 A1 A2 A3 a f g a b k l c PA f g c h B2,1 B2,2 d e i j d e h B1 B2 e d c a h i i j PB f k b k l j B2,3 g l A3

Newly Introduced Inter-Edges MeCl Example (BA) A1 A2 Newly Introduced Inter-Edges B1,1 B1,2 A1 A2 A3 a b f g a b k l c PA f g c h B2,1 B2,2 d e i j d e h B1 B2 e d c a h i i j PB f k b k l j B2,3 g l A3

MeCl Calculation PA PB Inter-Edges Introduced MeCl(AB): ({b,e},{e,c},{g,h},{f,h}) MeCl(BA): ({e,i},{h,j},{b,f},{c,f},{h,e}) a b k l PA f g c h d e i j B1 B2 e d c maxW(MAB, MBA) MeCl= 1 - a h Total Edge Weight i PB f k b 5 j MeCl = 1 - = 73.7% g l 19

Similarity Measurement Recap B1: M1 M5 M2 M6 M3 M4 M7 M8 A2: B2: P1 P2 MoJo(P1) = MoJo(P2) = 87.5% PR(P1) = PR(P2) = P:84.6%, R:68.7%, AVGPR=76.7% Conclusion… P1 is equally similar to P2

Similarity Measurement Recap B1: M1 M5 M2 M6 M3 M4 M7 M8 A2: B2: P1 P2 EdgeSim(P1)=77.8% EdgeSim(P2)=58.3% MeCl(P1)=88.9% MeCl(P2)=66.7% Conclusion… P1 is more similar than P2

Summary: EdgeSim & MeCl Rewards clustering algorithms for preserving the edge types Penalizes clustering algorithms for changing the edge types MeCl: Rewards the clustering algorithm for creating cohesive “subclusters”

Special Modules Omnipresent Modules: “Strong” Connection to other Modules A1 A2 A3 a b k l PA f g c h Library Modules: Always used by other modules, never use other modules d e i j B1 B2 e d c a h Isomorphic Modules: Modules equally connected to other subsystems i PB f k b j g l

Special Treatment of Special Modules helps to determine the Similarity b k l PA f g c h d k i Omnipresent Modules: Removed B1 B2 Library Modules: Removed k d c OMNIPRESENT, a h i PB Isomorphic Modules: Replicated f k b g l

Similarity Evaluation Tool Clustering Algorithms Case Study Overview Similarity Evaluation Tool Similarity Analysis Source Code void main() { printf(“hello”); } Precision/ Recall MoJo Clustering Algorithms EdgeSim Clustered Result M1 M2 M3 M5 M4 M6 M7 M8 MeCl Average, Variance, etc. based on 100 clustering runs… (4950 Evaluations)

Case Study Observations All similarity measurements exhibit consistent behavior for the systems studied Removal of “special” modules improved all similarity measurements Treating isomorphic modules specially only improved similarity slightly EdgeSim and MeCl produced higher and less variable similarity values then Precision/Recall and MoJo For all systems examined: If MeCl(SA) < MeCl(SB) then MoJo(SA) < MoJo(SB), PR(SA) < PR(SB), and EdgeSim(SA) < EdgeSim(SB)

Questions Special Thanks To: AT&T Research Sun Microsystems DARPA NSF US Army