

1 Using Heuristic Search Techniques to Extract Design Abstractions from Source Code. The Genetic and Evolutionary Computation Conference (GECCO'02). Brian S. Mitchell & Spiros Mancoridis, Math & Computer Science, Drexel University.

Drexel University Software Engineering Research Group (SERG) 2 Software Clustering Background. Software clustering simplifies program maintenance and program understanding. Software clustering techniques help developers fix defects (maintenance) or add features (program understanding) to existing software systems.

3 Understanding the Software Structure. It is important to understand the software structure when fixing or extending a software system, and it is desirable to change as few of the existing modules/classes as possible. Problem 1: The structure is complex and often not documented for large systems. Problem 2: Ad hoc changes to the source code tend to deteriorate the system’s structure over time.

4 Clustering Techniques. A variety of techniques for software clustering have been studied by the reverse engineering community: source code component similarity (or dissimilarity), concept analysis, subsystem patterns, and implementation-specific information. Our clustering approach uses search algorithms.

5 Design Extraction with Bunch. [Figure: source code (for example, void main() { printf(“hello”); }) is processed by source code analysis tools (Acacia, Chava) into an MDG file. The Bunch clustering tool (Bunch GUI, clustering algorithms, clustering tools, programming API) turns the MDG file into a partitioned MDG file, which is passed to a visualization tool. The example MDG has modules M1 to M8.]

6 Step 1: Creating the MDG. Example: the MDG for Apache’s Regular Expression class library. [Figure: source code is processed by source code analysis tools (Acacia, Chava) to produce the MDG.]
1. The MDG can be generated automatically using source code analysis tools.
2. Nodes are the modules/classes; edges represent source-code relations.
3. Edge weights can be established in many ways, and different MDGs can be created depending on the types of relations considered.
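As a minimal sketch of what an MDG could look like in memory, the snippet below aggregates raw source-code relations into weighted, directed edges. The function name `build_mdg`, the choice of "number of relations" as the edge weight, and the decision to drop self-references are all assumptions made for illustration; the slides only say that weights can be established in many ways.

```python
from collections import defaultdict


def build_mdg(relations):
    """Aggregate (source, target) relations into a weighted, directed MDG.

    Assumption: the weight of an edge is the number of relations between
    the two modules, and self-references are ignored.
    """
    mdg = defaultdict(int)
    for src, dst in relations:
        if src != dst:
            mdg[(src, dst)] += 1
    return dict(mdg)


# Hypothetical relations, as a source code analysis tool might report them.
relations = [("M1", "M2"), ("M1", "M2"), ("M2", "M3"), ("M3", "M3")]
mdg = build_mdg(relations)
```

Counting relations is only one weighting scheme; a different scheme would change the MDG and may change the clustering.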

7 Software Clustering with Search Algorithms. [Figure: source code is processed by source code analysis tools (Acacia, Chava) into an MDG with modules M1 to M8. Software clustering search algorithms explore the search space, which is the set of all MDG partitions (total = 4140 partitions for this MDG), and return a “GOOD” MDG partition. The search loop shown is: bP = null; while (searching()) { p = selectNext(); if (p.isBetter(bP)) bP = p; } return bP;]

8 Software Clustering with Search Algorithms. [Figure: the same pipeline and search loop as on slide 7.] Search Algorithm Requirements: the algorithm must be able to compare one partition to another objectively. We define the Modularization Quality (MQ) measurement to meet this goal. Given partitions P1 & P2, MQ(P1) > MQ(P2) means that P1 “is better than” P2.

9 Problem: There are too many partitions of the MDG. The number of MDG partitions grows very quickly as the number of modules in the system increases.
Modules = partitions: 1 = 1, 2 = 2, 3 = 5, 4 = 15, 5 = 52, ...
The number of ways to partition n modules into k clusters, S_{n,k}, satisfies
$$S_{n,k} = \begin{cases} 1 & \text{if } k = 1 \lor k = n \\ S_{n-1,k-1} + k\,S_{n-1,k} & \text{otherwise} \end{cases}$$
A 15-module system is about the limit for performing exhaustive analysis.
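The growth can be checked directly from the recurrence. The sketch below computes S(n, k) with memoization and sums over k to get the total number of partitions of n modules; the function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to split n modules into exactly k non-empty clusters."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def partition_count(n):
    """Total number of partitions of an n-module MDG (sum over all k)."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


counts = [partition_count(n) for n in range(1, 6)]
eight_module_total = partition_count(8)
```

For the 8-module example MDG this reproduces the total of 4140 partitions quoted on the earlier slides.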

10 Our Approach to Automatic Clustering. “Treat automatic clustering as a searching problem.” Maximize an objective function that formally quantifies the “quality” of an MDG partition. We refer to the value of the objective function as the modularization quality (MQ).

11 Edge Types. With respect to each cluster, there are two different kinds of edges: intra-edges, which start and end within the same cluster, and inter-edges, which start and end in different clusters. [Figure: a cluster containing nodes a, b, and c, with edges to other clusters.]
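A short sketch of this classification, assuming the MDG is stored as a dictionary from (source, target) pairs to weights and a partition as a dictionary from node to cluster label (both representations are assumptions made for illustration):

```python
def split_edges(mdg, cluster_of):
    """Separate MDG edges into intra-edges and inter-edges for a partition."""
    intra, inter = {}, {}
    for (src, dst), weight in mdg.items():
        same_cluster = cluster_of[src] == cluster_of[dst]
        (intra if same_cluster else inter)[(src, dst)] = weight
    return intra, inter


# Hypothetical three-node MDG: a and b share a cluster, c is on its own.
example_mdg = {("a", "b"): 2, ("b", "c"): 1}
example_partition = {"a": 0, "b": 0, "c": 1}
intra_edges, inter_edges = split_edges(example_mdg, example_partition)
```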

12 Our Assumption. “Well designed software systems are organized into cohesive clusters that are loosely interconnected.” The MQ measurement design must increase as the weight of the intra-edges increases and decrease as the weight of the inter-edges increases.
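The slides do not state the MQ formula. As a sketch only, the function below assumes a cluster-factor style measure: each cluster with intra-edge weight mu and total inter-edge weight eps (edges entering or leaving it) contributes 2*mu / (2*mu + eps), and MQ is the sum over clusters. This formula is an assumption chosen because it satisfies the two design requirements above; it should not be read as the definitive Bunch implementation.

```python
def mq(mdg, cluster_of):
    """Assumed MQ: sum over clusters of 2*mu / (2*mu + eps).

    mu  = total weight of edges inside the cluster (intra-edges)
    eps = total weight of edges entering or leaving it (inter-edges)
    Clusters without intra-edges contribute 0.
    """
    intra, inter = {}, {}
    for (src, dst), weight in mdg.items():
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + weight
        else:
            inter[a] = inter.get(a, 0) + weight
            inter[b] = inter.get(b, 0) + weight
    return sum(2 * mu / (2 * mu + inter.get(c, 0)) for c, mu in intra.items())


partition = {"a": 0, "b": 0, "c": 1}
base = mq({("a", "b"): 1, ("b", "c"): 1}, partition)
heavier_intra = mq({("a", "b"): 2, ("b", "c"): 1}, partition)
heavier_inter = mq({("a", "b"): 1, ("b", "c"): 2}, partition)
```

In this small example, raising the intra-edge weight raises the score and raising the inter-edge weight lowers it, which is the behavior the slide asks for.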

13 Not All Partitions Are Created Equal. [Figure: an MDG with modules M1 to M6, shown once with a good partition and once with a bad partition.] MQ(Good Partition) > MQ(Bad Partition).

14 The Software Clustering Problem: Algorithm Objectives. “Find a good partition of the MDG.” A partition is the decomposition of a set of elements (i.e., all the nodes of the graph) into mutually disjoint clusters. A good partition is one where highly interdependent nodes are grouped in the same clusters and independent nodes are assigned to separate clusters. The better the partition, the higher the MQ.

15 Bunch Hill Climbing Clustering Algorithm. [Flowchart: generate a random decomposition of the MDG as the current partition. In each iteration step, generate the next neighbor, measure its MQ, and compare it to the best neighboring partition; if the neighbor is better, it becomes the new best neighboring partition. The best neighboring partition for the iteration becomes the current partition, and the process repeats until convergence.] A neighbor partition is created by altering the current partition slightly.
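The flowchart can be sketched as a steepest-ascent hill climber. Everything below is illustrative: the `mq` function is an assumed cluster-factor measure (the slides do not give the formula), and a "neighbor" is assumed to be the current partition with one node moved to another, possibly new, cluster.

```python
import random


def mq(mdg, cluster_of):
    """Assumed MQ: sum over clusters of 2*mu / (2*mu + eps)."""
    intra, inter = {}, {}
    for (src, dst), weight in mdg.items():
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + weight
        else:
            inter[a] = inter.get(a, 0) + weight
            inter[b] = inter.get(b, 0) + weight
    return sum(2 * mu / (2 * mu + inter.get(c, 0)) for c, mu in intra.items())


def neighbors(cluster_of):
    """Yield partitions that differ from cluster_of by moving one node."""
    labels = set(cluster_of.values())
    fresh = max(labels) + 1  # also allow moving a node into a new cluster
    for node, current in cluster_of.items():
        for target in labels | {fresh}:
            if target != current:
                moved = dict(cluster_of)
                moved[node] = target
                yield moved


def hill_climb(mdg, nodes, rng):
    """Start from a random partition and climb until no neighbor is better."""
    current = {n: rng.randrange(len(nodes)) for n in nodes}
    current_mq = mq(mdg, current)
    while True:
        best, best_mq = None, current_mq
        for candidate in neighbors(current):
            candidate_mq = mq(mdg, candidate)
            if candidate_mq > best_mq:
                best, best_mq = candidate, candidate_mq
        if best is None:  # convergence: no neighbor improves MQ
            return current, current_mq
        current, current_mq = best, best_mq


# Hypothetical MDG: two triangles joined by a single edge.
example_mdg = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1,
               ("d", "e"): 1, ("e", "f"): 1, ("f", "d"): 1, ("c", "d"): 1}
example_nodes = list("abcdef")
partition, score = hill_climb(example_mdg, example_nodes, random.Random(0))
```

The result is a local optimum for the assumed neighborhood; a different random start may converge to a different partition.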

16 Bunch Genetic Clustering Algorithm (GA). [Flowchart: generate a starting population from the MDG. In each iteration step, a crossover operation builds the next population from the current population (P1, P2, P3, ..., Pn), using random selection that favors partitions with larger MQ values. A mutation operation then mutates (alters) a small number of randomly selected partitions to give the next generation. After all generations are processed, the result is the best partition from the final population.]
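A compact sketch of this loop follows. The encoding (one cluster label per node), one-point crossover, per-gene mutation, the parameter defaults, and the `mq` formula are all assumptions made for illustration; the slides only describe the overall flow.

```python
import random


def mq(mdg, cluster_of):
    """Assumed MQ: sum over clusters of 2*mu / (2*mu + eps)."""
    intra, inter = {}, {}
    for (src, dst), weight in mdg.items():
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + weight
        else:
            inter[a] = inter.get(a, 0) + weight
            inter[b] = inter.get(b, 0) + weight
    return sum(2 * mu / (2 * mu + inter.get(c, 0)) for c, mu in intra.items())


def genetic_cluster(mdg, nodes, rng, pop_size=20, generations=50,
                    mutation_rate=0.05):
    """Evolve partitions encoded as lists of cluster labels, one per node."""
    n = len(nodes)  # needs at least two nodes for one-point crossover

    def score(genome):
        return mq(mdg, dict(zip(nodes, genome)))

    population = [[rng.randrange(n) for _ in nodes] for _ in range(pop_size)]
    for _ in range(generations):
        # Random selection that favors partitions with larger MQ values.
        weights = [score(g) + 1e-9 for g in population]
        next_population = []
        while len(next_population) < pop_size:
            parent1, parent2 = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, n)
            child = parent1[:cut] + parent2[cut:]  # one-point crossover
            for i in range(n):  # mutate a small number of genes at random
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n)
            next_population.append(child)
        population = next_population
    best = max(population, key=score)  # best partition of final population
    return dict(zip(nodes, best)), score(best)


# Hypothetical MDG: two triangles joined by a single edge.
example_mdg = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1,
               ("d", "e"): 1, ("e", "f"): 1, ("f", "d"): 1, ("c", "d"): 1}
example_nodes = list("abcdef")
partition, score = genetic_cluster(example_mdg, example_nodes, random.Random(1))
```

Because the result is taken from the final population only, as in the flowchart, a better partition seen in an earlier generation can be lost; keeping the best partition seen so far is a common variation.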

17 Clustering Example: Apache Regular Expression Library. [Figure: the MDG, a random partition, and the Bunch partition. Edge legend: < 5 relations, 5-10 relations, > 10 relations.]

18 Bunch Hill Climbing Clustering Algorithm: Extended Features. [Flowchart: the hill-climbing algorithm shown on slide 15.] Extended features of the hill-climbing algorithm: an adjustable clustering threshold and simulated annealing.
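The slides name simulated annealing without giving its rule. The sketch below assumes the standard form: a neighbor that lowers MQ by some amount is still accepted with probability exp(delta / T), and the temperature is multiplied by a cooling rate alpha after each step. The function names and this exact rule are assumptions, not a description of Bunch's code.

```python
import math
import random


def accept(delta_mq, temperature, rng):
    """Decide whether to move to a neighbor whose MQ differs by delta_mq.

    Improvements are always accepted. A worse neighbor is accepted with
    probability exp(delta_mq / temperature) (assumed standard SA rule).
    """
    if delta_mq >= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(delta_mq / temperature)


def cool(temperature, alpha):
    """Geometric cooling schedule: T(k+1) = alpha * T(k)."""
    return alpha * temperature


# Three cooling steps from T(0) = 100 with alpha = 0.90.
temperature = 100.0
for _ in range(3):
    temperature = cool(temperature, 0.90)
```

With a high initial temperature most worse moves are accepted early on, and the search behaves more like plain hill climbing as the temperature falls.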

19 Research Objectives. Investigate whether the new hill-climbing clustering features impact the clustering results and the clustering performance. Goals: provide configuration guidance to Bunch users; determine performance versus quality tradeoffs associated with different Bunch configurations; gain intuition into the search space of different systems.

20 Case Study Design. The basic test consisted of 1,050 clustering runs: 50 runs with the clustering threshold set to 0%, after which the clustering threshold was incremented by 5% and the test repeated until the clustering threshold reached 100%. The basic test was repeated 3 additional times with simulated annealing, altering the initial temperature T(0) and cooling rate α. We examined 5 systems: compiler, ispell, rcs, dot, and swing. We used the Bunch API for the case study.
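The run count follows from the design: thresholds 0%, 5%, ..., 100% give 21 settings, each run 50 times. The sketch below reproduces that arithmetic. The per-system total assumes that each system received the basic test plus the three simulated annealing variants, which the slide implies but does not state outright.

```python
# Clustering thresholds from 0% to 100% in steps of 5%.
thresholds = list(range(0, 101, 5))
runs_per_threshold = 50
basic_test_runs = len(thresholds) * runs_per_threshold

# One configuration without simulated annealing plus three with it.
configurations = 1 + 3
runs_per_system = basic_test_runs * configurations  # assumption: per system
```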

21 Case Study Results: RCS. [Figure: panels “Clustering Threshold & MQ” and “Clustering Threshold & MQ Evals.” for four configurations: no SA; T(0)=100, α=.99; T(0)=100, α=.90; T(0)=100, α=.80. The MQ of random partitions is shown for comparison.]

22 Case Study Results: Swing. [Figure: panels “Clustering Threshold & MQ” and “Clustering Threshold & MQ Evals.” for four configurations: no SA; T(0)=100, α=.99; T(0)=100, α=.90; T(0)=100, α=.80. The MQ of random partitions is shown for comparison.]

23 Case Study Results - Summary. The clustering threshold had an expected and consistent impact on the clustering runtime. The clustering threshold did not appear to have any impact on the quality of the clustering results. The hill-climbing algorithm provides some intuition into the search landscape for the systems studied. The software clustering results were always better than randomly generated clusters.

24 Case Study Results - Summary. Intuition into the search landscape. [Figure: results for compiler, ispell, rcs, dot, and swing, annotated with three labels: rare partitions, systems that converge to a consistent neighborhood, and multimodal search space.]

25 Case Study Results - Summary. Simulated annealing did not have any noticeable impact on the quality of clustering results. Simulated annealing did appear to reduce the overall runtime needed to cluster the sample systems.

26 Concluding Remarks. It was expected that increasing the clustering threshold would impact the runtime or clustering results; neither was found to be true. Simulated annealing did not improve the quality of the clustering results but did decrease the overall clustering runtime. We obtained some intuition into the search landscape of the systems studied.

27 Questions. Special thanks to: AT&T Research, Sun Microsystems, DARPA, NSF, US Army.