
Brian Mitchell - Drexel University MCS680-FCS
MCS680: Foundations Of Computer Science

Slide 1: Case Study: Automatic Techniques For Software Modularization

    int MSTWeight(int graph[][], int size)
    {
        int i, j;
        int weight = 0;
        for (i = 0; i < size; i++)
            for (j = 0; j < size; j++)
                weight += graph[i][j];
        return weight;
    }

    Running Time = 2O(1) + O(n^2) = O(n^2)

Slide 2: Introduction

This topic reinforces the concepts of set and graph theory by demonstrating a current research area
  – Algorithms for Automatic Software Modularization
This research was conducted by Drexel faculty:
  – Brian Mitchell
  – Spiros Mancoridis
  – Chris Rorres

Slide 3: Software Engineering Problem

Software maintenance is an arduous task because it is difficult to understand the intricate relationships that exist between the source code components
  – Design document is inaccurate
  – Original system architect/designer is no longer available for consultation
With no mechanism for gaining insight into the system design and structure, the software maintenance practitioner is often forced to modify the source code without a thorough understanding of the system's organization
Also, heavily used software systems change rapidly
  – Use of an "ad-hoc" maintenance approach will negatively affect the system design

Slide 4: Software Engineering Problem

Software engineers have long known of the difficulties associated with maintaining software systems whose only current documentation is the source code
This leads to decay in the design, because source code changes are made without an understanding of the system structure
  – The size of modern software systems is beyond a programmer's cognitive ability to determine the effect of a local change on the entire system
  – Changes made to the source code without an understanding of its organization usually contradict one or more aspects of the original design
The goal is to give the programmer a tool that visualizes the modularization of the system

Slide 5: Other Work In Field

Top-Down Approaches
  – Tools such as "Rigi" and "Arch" have been developed to perform a modularization of a software system
      Still requires somebody familiar with the system to provide feedback and/or set system-specific parameters
Bottom-Up Approaches
  – Software Reflection Model
      Used to capture and exploit the differences that exist between the actual source code organization and the designer's high-level model of the system's modularization
      Streamlines the learning process
  – The Orphan Adoption Problem
      Given the name of a new software resource (an orphan), this tool emits as output the name of the subsystem that has been chosen as the parent for the orphan

Slide 6: Our Automatic Modularization Tool

Implements algorithms that we developed that
  – Are fully automatic
  – Recursively generate a hierarchical view of the system organization based solely on information extracted from the source code
Fully automatic techniques are not only useful to programmers who lack familiarity with the system; the system architect can also use them to compare the documented modularization with the one created by our tool and learn from the differences

Slide 7: Software System Organization

Software systems contain a finite set of software components and a collection of relationships that govern how the software components interact with each other
Typical software components
  – Classes, Modules
  – Variables, Macros
  – Structures
Typical software relationships
  – Import
  – Export
  – Inherit
The system structure can be represented as a resource dependency graph
  – The information required to build this graph can be obtained by parsing the source code

Slide 8: Example Resource Dependency Graph: Plan9

The following resource dependency graph was automatically generated by scanning the source code of the file system of the Plan9 operating system
  – Access to source code provided by AT&T Labs

[Figure: resource dependency graph of the Plan9 file system]

Slide 9: Goals of Research

The goal of our research is to automatically partition the components of a system into clusters that maximize cohesion and minimize coupling
The clusters, once discovered, represent a higher-level abstraction of the system's organization by grouping related software components into subsystems
Each subsystem contains a collection of modules that either
  – Cooperate to perform some high-level function in the overall system
      Scanner, parser, code generator
  – Provide a set of related services that are used throughout the system
      Import library
      File manager, memory manager

Slide 10: Automatically Modularized Visualization of Plan9 OS

The following graph was derived by our clustering utility

[Figure: clustered view of the Plan9 resource dependency graph]

Formal definitions for cohesion, coupling and modularization quality must now be developed in order to illustrate our process

Slide 11: Architecture of our Clustering Environment

[Diagram: a tool pipeline. Components, in order: Source Code Modules; CIA Utility (parse source code); XREF Database; Awk Script (query, format); Clustering Engine; DOT File; DOTTY Utility; Clustered Graph. Edge labels: scan, generate, scan, generate, read, read, display.]
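The last stages of this pipeline pass a DOT file to the DOTTY viewer. As a rough illustration of what such a file can contain, the sketch below writes a clustered graph in Graphviz DOT syntax, relying on the Graphviz convention that a subgraph whose name begins with `cluster` is drawn as a box. The function name `to_dot` and the module names are hypothetical; the slides do not show the engine's actual output.

```python
def to_dot(clusters, edges):
    """Render a clustered dependency graph as Graphviz DOT text.

    clusters: dict mapping a subsystem name to a list of module names.
    edges: iterable of (source, target) module-name pairs.
    """
    lines = ["digraph modularization {"]
    for index, (name, modules) in enumerate(clusters.items()):
        # Graphviz draws a box around subgraphs whose name starts with "cluster".
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{name}";')
        for module in modules:
            lines.append(f'    "{module}";')
        lines.append("  }")
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-subsystem example.
example = to_dot(
    {"Front end": ["scanner", "parser"], "Back end": ["codegen"]},
    [("parser", "scanner"), ("codegen", "parser")],
)
print(example)
```

The printed text can be saved to a file and opened with any Graphviz viewer.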

Slide 12: Quantifying Cohesion

Cohesion is an indication of the strength of the relationships that exist between modules that are grouped into a cluster.
  – High cohesion = Strong Encapsulation.
We define cohesion (H) as a measurement of intra-edge dependencies between the components in a particular cluster.
  – Formally, the cohesion H_i of cluster i consisting of N_i components and \mu_i intra-edge dependencies is:

      $H_i = \dfrac{\mu_i}{N_i^2}$

This measurement is the fraction of the maximum possible number of intra-edge dependencies, which is N_i^2.
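A minimal sketch of the cohesion measurement, assuming H_i is the number of intra-edges divided by the stated maximum N_i^2, and assuming the graph is stored as a set of directed (source, target) pairs. The module names and edges are made up for illustration.

```python
def cohesion(cluster, edges):
    """H_i = mu_i / N_i**2 for one cluster.

    cluster: set of module names.
    edges: set of directed (source, target) pairs.
    """
    n = len(cluster)
    # mu counts the edges whose two endpoints both lie inside the cluster.
    mu = sum(1 for source, target in edges if source in cluster and target in cluster)
    return mu / (n * n)


# Hypothetical dependency edges.
edges = {("M1", "M2"), ("M2", "M3"), ("M3", "M1"), ("M3", "M4")}
print(cohesion({"M1", "M2", "M3"}, edges))  # 3 intra-edges out of 9 possible
```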

Slide 13: Quantifying Coupling

Coupling (C) is a measurement of inter-edge dependencies between the components of two distinct clusters
The coupling C_{i,j} between clusters i and j, consisting of N_i and N_j components respectively, with \varepsilon_{i,j} inter-edge dependencies is:

      $C_{i,j} = \dfrac{\varepsilon_{i,j}}{2 N_i N_j}$

This measurement is the fraction of the maximum number of inter-edge dependencies between clusters i and j. With directed edges that maximum is 2 N_i N_j, since each of the N_i N_j component pairs can be connected in either direction.
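A small sketch of the coupling measurement under one assumption: edges are directed, so the maximum number of inter-edges between two clusters is taken to be 2 N_i N_j. The edge data is hypothetical.

```python
def coupling(cluster_a, cluster_b, edges):
    """C_ij = eps_ij / (2 * N_i * N_j); edges crossing in either direction count."""
    eps = sum(
        1
        for source, target in edges
        if (source in cluster_a and target in cluster_b)
        or (source in cluster_b and target in cluster_a)
    )
    return eps / (2 * len(cluster_a) * len(cluster_b))


# Hypothetical edges: two of them cross between the clusters.
edges = {("M1", "M2"), ("M3", "M4"), ("M5", "M1"), ("M4", "M5")}
print(coupling({"M1", "M2", "M3"}, {"M4", "M5"}, edges))  # 2 of 12 possible
```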

Slide 14: Modularization Quality

Modularization Quality (MQ) is defined as the measurement of the "goodness" of a particular system modularization.
  – Specifically, the MQ of a modularization of k clusters, where H_i is the cohesion of the i-th cluster and C_{i,j} is the coupling between the i-th and j-th clusters, is:

      [equation not preserved in this transcript]

  – This measurement shows the trade-off between cohesion and coupling by
      Rewarding many small highly-cohesive clusters
      Penalizing too many inter-edges
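The MQ equation itself is missing from the transcript. The sketch below assumes one common form of the trade-off: average cohesion over the k clusters minus average coupling over the k(k-1)/2 cluster pairs, with MQ equal to the cohesion when there is a single cluster. Treat that formula, the 2 N_i N_j coupling denominator, and all names here as assumptions rather than the slide's definition.

```python
from itertools import combinations


def modularization_quality(partition, edges):
    """Average cohesion minus average coupling (assumed form of MQ).

    partition: list of disjoint sets of modules.
    edges: iterable of directed (source, target) pairs.
    """
    k = len(partition)
    where = {m: i for i, cluster in enumerate(partition) for m in cluster}
    intra = [0] * k   # intra-edge count per cluster
    inter = {}        # inter-edge count per unordered cluster pair
    for source, target in edges:
        a, b = where[source], where[target]
        if a == b:
            intra[a] += 1
        else:
            key = (min(a, b), max(a, b))
            inter[key] = inter.get(key, 0) + 1
    avg_cohesion = sum(intra[i] / len(partition[i]) ** 2 for i in range(k)) / k
    if k == 1:
        return avg_cohesion
    total_coupling = sum(
        inter.get((i, j), 0) / (2 * len(partition[i]) * len(partition[j]))
        for i, j in combinations(range(k), 2)
    )
    return avg_cohesion - total_coupling / (k * (k - 1) / 2)


# Hypothetical graph: a 3-cycle, a 2-cycle, and one edge joining them.
edges = [("M1", "M2"), ("M2", "M3"), ("M3", "M1"),
         ("M4", "M5"), ("M5", "M4"), ("M3", "M4")]
good = modularization_quality([{"M1", "M2", "M3"}, {"M4", "M5"}], edges)
lumped = modularization_quality([{"M1", "M2", "M3", "M4", "M5"}], edges)
print(good, lumped)
```

Under these assumptions the two-cluster grouping scores higher than lumping all five modules together, which matches the stated intent of rewarding small cohesive clusters.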

Slide 15: Modularization Quality Example

[Figure: eight modules grouped into three subsystems. Subsystem 1: M_1, M_2, M_3. Subsystem 2: M_4, M_5. Subsystem 3: M_6, M_7, M_8. The edges and the worked MQ value are not preserved.]

Slide 16: Partitions of a Set

We must construct a data model to represent a partition (a clustering) of a software system
Consider the source code organization for system S.
  – S = {M_1, M_2, …, M_n}
  – Let a collection \Pi = {A_1, A_2, …, A_k} be a set of non-empty subsets such that each A_i \subseteq S.
\Pi is a partition of S if:
  – The subsets are a covering of S (their union is S)
  – The subsets are mutually exclusive (A_i \cap A_j = \emptyset for i \neq j)
Each subset A_i is called a cluster of the partition
A partition of S into k non-empty clusters is called a k-partition of S
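The two conditions can be checked mechanically. A minimal sketch, assuming clusters are stored as Python sets; the function name `is_partition` is illustrative.

```python
def is_partition(clusters, modules):
    """True when clusters are non-empty, mutually exclusive, and cover modules."""
    if any(len(cluster) == 0 for cluster in clusters):
        return False
    covered = set().union(*clusters)
    # If the sizes add up to the size of the union, no element is shared.
    disjoint = sum(len(cluster) for cluster in clusters) == len(covered)
    return disjoint and covered == set(modules)


S = {"M1", "M2", "M3"}
print(is_partition([{"M1"}, {"M2", "M3"}], S))        # valid 2-partition
print(is_partition([{"M1", "M2"}, {"M2", "M3"}], S))  # M2 appears twice
```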

Slide 17: Number of k-Partitions of a Set

Let S be a set of n elements. The number of k-partitions of an n-set satisfies the recurrence equation:

      $S_{n,k} = \begin{cases} 1 & \text{if } k = 1 \text{ or } k = n \\ S_{n-1,k-1} + k\,S_{n-1,k} & \text{otherwise} \end{cases}$

The entries S_{n,k} are called Stirling numbers (of the second kind)
Stirling numbers govern the number of k-partitions of a set.
Stirling numbers grow exponentially with respect to the size of S.
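The recurrence for Stirling numbers of the second kind translates directly into a memoized function. This is a sketch using the standard recurrence S(n, k) = S(n-1, k-1) + k S(n-1, k) with S(n, 1) = S(n, n) = 1; the function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling(n, k):
    """Number of ways to split an n-element set into k non-empty clusters."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    # Element n is either alone (first term) or joins one of k clusters (second term).
    return stirling(n - 1, k - 1) + k * stirling(n - 1, k)


print([stirling(5, k) for k in range(1, 6)])  # [1, 15, 25, 10, 1]
```

Summing the row for n = 5 gives 52, the total number of partitions of a five-element set.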

Slide 18: Clustering: Optimal Solution

Algorithm
  – Let S = {M_1, M_2, …, M_n}, where each M_i is a module in the software system
  – Let G be the graph representing the relationships between the modules in S
  – Generate every partition of set S
  – Evaluate MQ for each partition
  – The partition with the largest MQ is the optimal solution
The algorithm works well for sets of up to 15 elements. Beyond that, the number of partitions becomes too large to enumerate in a reasonable timeframe
Clearly, sub-optimal techniques must be employed for large sets
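The exhaustive search can be sketched as a recursive partition generator plus a `max` over a quality function. To keep the example short, `toy_quality` below is a stand-in for MQ (it is not the slides' measure): it adds 1 for every linked pair placed in the same cluster and subtracts 1 for every unlinked pair. The links and names are hypothetical.

```python
from itertools import combinations


def all_partitions(items):
    """Yield every partition of `items` (a list) as a list of clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in all_partitions(rest):
        # Place `first` into each existing cluster in turn ...
        for i, cluster in enumerate(smaller):
            yield smaller[:i] + [[first] + cluster] + smaller[i + 1:]
        # ... or give it a cluster of its own.
        yield [[first]] + smaller


def best_partition(items, quality):
    """Exhaustive search: evaluate every partition and keep the best one."""
    return max(all_partitions(items), key=quality)


# Hypothetical undirected links between five modules.
LINKS = {frozenset(pair) for pair in ["ab", "bc", "ac", "de"]}


def toy_quality(partition):
    """Stand-in for MQ: +1 per linked pair in a cluster, -1 per unlinked pair."""
    return sum(
        1 if frozenset(pair) in LINKS else -1
        for cluster in partition
        for pair in combinations(cluster, 2)
    )


best = best_partition(list("abcde"), toy_quality)
print(best)
```

Any function that maps a partition to a number can be passed as `quality`, including an MQ implementation.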

Slide 19: How many k-partitions are there?

The following table shows the total number of partitions (over all values of k) of a system that has N modules.

  N    Partitions
  1    1
  2    2
  3    5
  4    15
  5    52
  6 and above: [values not preserved in this transcript]
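The totals in this table are the Bell numbers, so they can be regenerated rather than looked up. The sketch below uses the Bell triangle, a standard construction in which each row starts with the last entry of the previous row and each later entry is the sum of its left neighbor and the entry above that neighbor. The function name is illustrative.

```python
def partition_counts(max_n):
    """Return [B_1, ..., B_max_n], the number of partitions of an n-element set."""
    counts = []
    row = [1]
    for _ in range(max_n):
        counts.append(row[-1])          # last entry of row n is B_n
        next_row = [row[-1]]            # next row starts with that entry
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return counts


for n, count in enumerate(partition_counts(15), start=1):
    print(n, count)
```

The first five values reproduce the surviving table rows, and the rapid growth is what makes exhaustive search impractical beyond small systems.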

Slide 20: Sub-Optimal Modularization Strategy

The search space required for enumerating all possible partitions is too large for most software systems
  – We need to develop a search strategy that quickly discovers an acceptable sub-optimal clustering
Generic Sub-Optimal Algorithm
  1. Construct a resource dependency graph G that represents the relationships between the modules in S.
  2. Generate uniformly distributed random clusterings of S. We use a combinatorial algorithm to accomplish this task because our sub-optimal techniques require the generation of many random clusterings.
  3. Iteratively improve a randomly generated clustering, by measuring its MQ, until no further improvement is possible. This task is accomplished by heuristically moving modules in S between the generated clusters.
  4. Repeat this process until an acceptable sub-optimal result is determined.
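The slides do not say which combinatorial algorithm generates the random clusterings. One way to draw a partition uniformly, sketched here as an assumption, is to pick the number of clusters k with probability proportional to the Stirling number S(n, k) and then walk the Stirling recurrence backwards: the last element is alone with probability S(n-1, k-1) / S(n, k), and otherwise joins one of the k clusters of the rest at random.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling(n, k):
    """Number of partitions of an n-set into k non-empty clusters (S(0, 0) = 1)."""
    if n == k:
        return 1
    if k < 1 or k > n:
        return 0
    return stirling(n - 1, k - 1) + k * stirling(n - 1, k)


def _build(items, n, k, rng):
    """Uniformly random partition of items[:n] into exactly k clusters."""
    if n == 0:
        return []
    item = items[n - 1]
    # With probability S(n-1, k-1) / S(n, k) the last item forms its own cluster.
    if rng.randrange(stirling(n, k)) < stirling(n - 1, k - 1):
        return _build(items, n - 1, k - 1, rng) + [[item]]
    # Otherwise it joins one of the k clusters of a partition of the rest.
    clusters = _build(items, n - 1, k, rng)
    rng.choice(clusters).append(item)
    return clusters


def random_partition(items, rng=random):
    """Draw one partition of `items` uniformly from all of its partitions."""
    n = len(items)
    if n == 0:
        return []
    sizes = range(1, n + 1)
    k = rng.choices(sizes, weights=[stirling(n, s) for s in sizes])[0]
    return _build(items, n, k, rng)


print(random_partition(list("abcde"), random.Random(0)))
```

Passing a seeded `random.Random` instance makes the draws reproducible, which helps when comparing runs of a search.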

Slide 21: Neighboring Partition

We need a way to improve a partition's MQ
We define a partition NP to be a neighbor of a partition P if and only if:
  – NP is exactly the same as P except that a single element of P is in a different cluster in partition NP
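A sketch of neighbor generation, assuming a partition is stored as a frozenset of frozensets. It treats moving an element into a brand-new cluster of its own as a valid move; the slide does not say whether that case counts, so that choice is an assumption.

```python
def neighbors(partition):
    """Partitions that differ from `partition` by the placement of one element."""
    clusters = list(partition)
    found = set()
    for i, source in enumerate(clusters):
        others = clusters[:i] + clusters[i + 1:]
        for element in source:
            rest = source - {element}
            kept = [rest] if rest else []   # drop the source cluster if it empties
            # Move the element into each other existing cluster.
            for j, target in enumerate(others):
                moved = others[:j] + [target | {element}] + others[j + 1:]
                found.add(frozenset(kept + moved))
            # Or move it into a new cluster of its own.
            if rest:
                found.add(frozenset(kept + others + [frozenset([element])]))
    return found


P = frozenset({frozenset("ab"), frozenset("c")})
for neighbor in neighbors(P):
    print(sorted(sorted(cluster) for cluster in neighbor))
```

Storing results in a set removes duplicates, such as the all-singletons partition that is reachable by moving either `a` or `b` out of the two-element cluster.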

Slide 22: Generic Sub-Optimal Algorithm

Algorithm
  – Let S = {M_1, M_2, …, M_n}, where each M_i is a module in the software system
  – Let G be the graph representing the relationships between the modules in S
  – Generate a random partition P of set S
  – If possible, find a neighboring partition NP that has an improved MQ over P
  – If an improved neighboring partition is found
      Let P = NP
  – P is the sub-optimal solution
A variety of algorithms for finding sub-optimal solutions are possible, depending on how "improved" is defined

Slide 23: Steepest-Ascent Hill Climbing (SAHC) Algorithm

Algorithm
  – Let S = {M_1, M_2, …, M_n}, where each M_i is a module in the software system
  – Let G be the graph representing the relationships between the modules in S
  – Generate a random partition P of set S
  – Repeat
      Find the best neighboring partition BNP that has MQ(BNP) > MQ(P)
      If an improved BNP is found such that MQ(BNP) > MQ(P)
        – Let P = BNP
  – Until no further "improved" BNPs can be found
  – P is the sub-optimal solution
BNP may be expensive to calculate
  – All neighboring partitions of P must be examined
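A minimal sketch of steepest-ascent hill climbing. It assumes a partition is encoded as a label list (`labels[i]` is the cluster of module `i`) and takes the quality measure as a parameter. The `toy_quality` function and the `LINKS` data are hypothetical stand-ins for MQ and the dependency graph.

```python
from itertools import combinations


def steepest_ascent(labels, quality):
    """Repeatedly apply the single best one-module move until none improves."""
    labels = list(labels)
    n = len(labels)
    while True:
        current = quality(labels)
        best_gain, best_move = 0, None
        # Examine every neighbor: each module moved to each other label.
        for module in range(n):
            old = labels[module]
            for cluster in range(n):
                if cluster == old:
                    continue
                labels[module] = cluster
                gain = quality(labels) - current
                if gain > best_gain:
                    best_gain, best_move = gain, (module, cluster)
            labels[module] = old
        if best_move is None:
            return labels            # local maximum reached
        module, cluster = best_move
        labels[module] = cluster


# Hypothetical undirected links between modules 0..4.
LINKS = {(0, 1), (1, 2), (0, 2), (3, 4)}


def toy_quality(labels):
    """Stand-in for MQ: +1 per linked pair sharing a cluster, -1 per unlinked pair."""
    return sum(
        1 if (i, j) in LINKS else -1
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    )


result = steepest_ascent([0, 1, 2, 3, 4], toy_quality)
print(result, toy_quality(result))
```

Each pass evaluates all n(n-1) candidate moves, which is the cost the slide warns about.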

Slide 24: Next-Ascent Hill Climbing (NAHC) Algorithm

Algorithm
  – Let S = {M_1, M_2, …, M_n}, where each M_i is a module in the software system
  – Let G be the graph representing the relationships between the modules in S
  – Generate a random partition P of set S
  – Repeat
      Find a better neighboring partition bNP that has MQ(bNP) > MQ(P)
      If an improved bNP is found such that MQ(bNP) > MQ(P)
        – Let P = bNP
  – Until no further "improved" bNPs can be found
  – P is the sub-optimal solution
A bNP is discovered by randomly searching the set of neighboring partitions until a partition with a higher MQ is found
  – Usually, not all NPs will have to be examined
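Next-ascent hill climbing differs only in how the next partition is chosen: the first improving neighbor found in a random scan is accepted. The sketch below assumes the same label-list encoding (`labels[i]` is the cluster of module `i`) and shuffles the list of candidate moves on each pass so that the scan order is random. `toy_quality` and `LINKS` are hypothetical stand-ins for MQ and the graph.

```python
import random
from itertools import combinations


def next_ascent(labels, quality, rng):
    """Accept the first improving one-module move found in a random scan."""
    labels = list(labels)
    n = len(labels)
    moves = [(module, cluster) for module in range(n) for cluster in range(n)]
    improved = True
    while improved:
        improved = False
        current = quality(labels)
        rng.shuffle(moves)
        for module, cluster in moves:
            old = labels[module]
            if cluster == old:
                continue
            labels[module] = cluster
            if quality(labels) > current:
                improved = True      # keep the move and rescan from the new partition
                break
            labels[module] = old     # undo and try the next random move
    return labels


# Hypothetical undirected links between modules 0..4.
LINKS = {(0, 1), (1, 2), (0, 2), (3, 4)}


def toy_quality(labels):
    """Stand-in for MQ: +1 per linked pair sharing a cluster, -1 per unlinked pair."""
    return sum(
        1 if (i, j) in LINKS else -1
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    )


result = next_ascent([0, 1, 2, 3, 4], toy_quality, random.Random(1))
print(result, toy_quality(result))
```

The loop stops only after a full scan finds no improving move, so the returned partition is a local maximum of the quality function.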

Slide 25: A Genetic Algorithm Framework

Our experimentation with the SAHC and NAHC algorithms has shown that, given an initial random starting partition,
  – The algorithms will converge to a local maximum
  – However, not all initial partitions converge to an acceptable result
Therefore we must either:
  – Run the experiment many times using different initial partitions and pick the experiment that results in the largest MQ
  – Or, devise an approach that works with a population of randomly generated initial partitions and concurrently improves them until all of the initial samples converge
      The partition in the final population with the largest MQ is the sub-optimal solution
This approach lends itself to being implemented with a Genetic Algorithm

Slide 26: Genetic Algorithms

Genetic algorithms were first developed by John Holland et al. at the University of Michigan
Genetic algorithms have been applied to many problems that involve exploring large search spaces
Characteristics of GAs
  – Combine survival-of-the-fittest techniques with a structured and randomized information exchange
      Facilitates innovative algorithms that parallel the natural human selection process
GAs are more than a randomized search. They exploit historical data to speculate on new information that is expected to yield improved results

Slide 27: Genetic Search Sub-Optimal Clustering Algorithm

Algorithm
  – Let S = {M_1, M_2, …, M_n}, where each M_i is a module in the software system
  – Let G be the graph representing the relationships between the modules in S
  – Generate a population of random partitions of set S
  – Repeat
      Randomly select a percentage of partitions from the population and improve them using the SAHC or NAHC technique
      Generate a new population (from the current one) by using a biased wheel that favors partitions with larger MQ
  – Until no improvement is seen for t generations, until the population has converged, or until the maximum number of generations has been executed
  – The partition in the final generation with the largest MQ is the sub-optimal solution
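A rough sketch of the loop, assuming the "biased wheel" is roulette-wheel selection in which each partition is drawn with probability proportional to its shifted quality score. The shift, the fixed generation cap (the slide lists other stopping rules as well), and the `improve` callback that stands in for SAHC or NAHC are all assumptions of this example.

```python
import random


def biased_wheel(population, quality, rng):
    """Draw a new population of the same size, favoring higher-quality members."""
    scores = [quality(p) for p in population]
    lowest = min(scores)
    # Shift scores so every weight is positive; a quality score may be negative.
    weights = [score - lowest + 1e-9 for score in scores]
    return rng.choices(population, weights=weights, k=len(population))


def genetic_search(population, quality, improve, rng, generations=20, fraction=0.5):
    """Improve a random subset each generation, then resample with the wheel."""
    population = list(population)
    for _ in range(generations):
        count = max(1, int(fraction * len(population)))
        chosen = set(rng.sample(range(len(population)), count))
        population = [improve(p) if i in chosen else p
                      for i, p in enumerate(population)]
        population = biased_wheel(population, quality, rng)
    return max(population, key=quality)


# Toy stand-ins: a "partition" is an integer, its quality is its value,
# and "improving" it adds one, up to a ceiling of 10.
best = genetic_search([0, 1, 2, 3], quality=lambda p: p,
                      improve=lambda p: min(p + 1, 10), rng=random.Random(0))
print(best)
```

In a real run, `population` would hold random partitions, `quality` would be MQ, and `improve` would be one of the hill-climbing routines.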

Slide 28: Agglomerative Clustering

The previous algorithms discovered subsystems based on the graph that was formed by recovering the relationships that exist between the source code components
In most systems, however, we are interested in finding a hierarchy of subsystems that captures the higher-order relationships that exist in the software
Wrapping our algorithms with an agglomerative clustering engine solves this problem

Slide 29: Agglomerative Clustering Algorithm

Algorithm
  – Let S = {M_1, M_2, …, M_n}
  – Let G be the resource dependency graph
  – Let Q be a queue
  – Repeat
      Find a maximal partition (Pmax) of S using the Optimal, SAHC or NAHC algorithm
      Save partition Pmax on Q
      Now let S = {C_1, C_2, …, C_n} where each C_i is a cluster in Pmax
      Build a new graph G by treating each cluster in Pmax as a single element. Furthermore, if there is at least one edge between any two clusters in Pmax, then there is an edge between their representative nodes in G
  – Until Pmax has coalesced into a single cluster
  – Q contains a hierarchy of partitions
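The graph-rebuilding step can be sketched as follows: each cluster becomes one node, identified here by its index in the partition, and two nodes are connected when at least one module-level edge crosses between their clusters. Keeping the edge direction is an assumption of this sketch, and the module names are hypothetical.

```python
def collapse(partition, edges):
    """Build the next-level graph: one node per cluster, one edge per connected pair."""
    where = {module: index
             for index, cluster in enumerate(partition)
             for module in cluster}
    # A set keeps a single edge even when many module-level edges cross.
    return {
        (where[source], where[target])
        for source, target in edges
        if where[source] != where[target]
    }


partition = [{"scanner", "parser"}, {"codegen"}, {"files", "memory"}]
edges = [("parser", "scanner"), ("codegen", "parser"), ("codegen", "files"),
         ("parser", "memory"), ("memory", "files")]
print(collapse(partition, edges))
```

Feeding the collapsed graph back into the clustering step produces the next level of the hierarchy.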

Slide 30: Where to Get the Clustering Engine

We have implemented the clustering engines and applied them to many examples
The system can be downloaded on the Web from the Drexel University Software Engineering Research Group (SERG) homepage at:
  – [URL not preserved in this transcript]
The clustering engine was developed using the Java 1.1 programming language

Slide 31: Compiler Example

Slide 32: Boxer (Autolayout Utility) Example