Application of Graph Theory to OO Software Engineering Alexander Chatzigeorgiou, Nikolaos Tsantalis, George Stephanides Department of Applied Informatics University of Macedonia Thessaloniki, Greece WISER 2006, May 20, 2006, Shanghai, China
Motivation Application of Graph Theory to SE is not new: Planning: network diagrams (CPM, PERT); Analysis: DFDs, FSMs, Petri Nets; Design: everything is essentially a graph; Testing: McCabe's complexity measure... Graph Theory is suitable for object-oriented SE: class diagrams can be mapped directly to graphs.
System Representation
Identification of "God" classes Goal: to identify heavily loaded classes of an OO design; such "God" classes imply a poor model. Inspiration comes from the Web (HITS algorithm).
HITS [Figure: two example nodes, one with low relative importance and one with high relative importance]
Identification of "God" classes OO system: directed graph $G = (V, E)$, where classes map to vertices and associations map to edges. Each edge $(p, q)$ is annotated with an integer $m_{p,q}$, the number of discrete messages sent in the same direction, from $p$ to $q$.
Identification of "God" classes
Using theorems from Linear Algebra, authority/hub weights can be obtained by finding the principal eigenvectors of $A^T A$ and $A A^T$
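The eigenvector formulation can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `hits_scores`, the normalization to unit sum, and the example message counts are assumptions. The matrix `M` is taken to hold the message counts $m_{p,q}$, and the standard HITS convention is assumed (authorities from $M^T M$, hubs from $M M^T$).

```python
import numpy as np

def hits_scores(M):
    """Return (authority, hub) weights for a weighted adjacency matrix M.

    M[p, q] is assumed to hold the number of messages sent from class p to class q.
    """
    M = np.asarray(M, dtype=float)
    # M^T M and M M^T are symmetric, so eigh applies; it returns
    # eigenvalues in ascending order, so the last column is the principal eigenvector.
    _, auth_vecs = np.linalg.eigh(M.T @ M)
    _, hub_vecs = np.linalg.eigh(M @ M.T)
    # The sign of an eigenvector is arbitrary; take absolute values.
    authority = np.abs(auth_vecs[:, -1])
    hub = np.abs(hub_vecs[:, -1])
    # Normalizing to unit sum is a presentation choice made for this sketch.
    return authority / authority.sum(), hub / hub.sum()

# Hypothetical 4-class system: class 0 sends most of the messages.
M = [[0, 3, 2, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
authority, hub = hits_scores(M)
```

In this made-up example, class 0 receives the largest hub weight because it originates most messages, which is the kind of imbalance a "God" class would be expected to show.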
Identification of "God" classes
Clustering Goal: to partition the system into strongly communicating classes; such clusters might imply relevance of functionality and might indicate possible reusable components. Spectral graph partitioning employs the degree matrix $D$ (a diagonal matrix containing the degrees of the vertices) and the Laplacian matrix, defined as $L = D - A$, where $A$ is the adjacency matrix. The smallest eigenvalue of $L$ is always zero.
Clustering The properties of the eigenvector $x_2$ associated with the second smallest eigenvalue $\lambda_2$ of the Laplacian have been explored by M. Fiedler. Clustering a graph G into two sub-graphs according to the positive and negative entries of this Fiedler vector yields a partition that approximately minimizes the weight of the cut set.
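A single bisection step can be sketched as below. The function name `fiedler_bisect` and the example graph (two triangles joined by one edge) are illustrative assumptions; assigning zero entries to the non-negative side is also a choice made here, since the slides do not address that case.

```python
import numpy as np

def fiedler_bisect(W):
    """Split an undirected weighted graph in two using the Fiedler vector.

    W is a symmetric matrix of edge weights with a zero diagonal.
    Returns (part_a, part_b, cut_weight).
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))       # degree matrix
    L = D - W                        # Laplacian matrix
    _, vecs = np.linalg.eigh(L)      # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]             # eigenvector of the second smallest eigenvalue
    part_a = [i for i in range(len(W)) if fiedler[i] >= 0]
    part_b = [i for i in range(len(W)) if fiedler[i] < 0]
    cut_weight = sum(W[i, j] for i in part_a for j in part_b)
    return part_a, part_b, cut_weight

# Two triangles {0, 1, 2} and {3, 4, 5} connected by the single edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1
part_a, part_b, cut_weight = fiedler_bisect(W)
```

For this graph the split separates the two triangles and cuts only the bridging edge.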
Clustering [Figure: three candidate partitions of an example graph, each labeled with the weight of its cut set; the partition provided by the Fiedler vector has cut-set weight 1]
Clustering Application to OO systems: edges are undirected, and the weight of an edge is the sum of the number of messages exchanged in both directions. Partitioning is performed iteratively. When to stop? When a resulting graph is less cohesive than the parent graph.
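The iterative procedure might look like the following sketch. The slides do not say how cohesion is measured, so the `cohesion` function here (internal edge weight per vertex pair) is a hypothetical stand-in, as are the names `bisect` and `cluster` and the decision to reject a split when either child is less cohesive than its parent.

```python
import numpy as np

def cohesion(W, nodes):
    """Hypothetical cohesion measure: internal edge weight per vertex pair."""
    n = len(nodes)
    if n < 2:
        return 0.0
    sub = W[np.ix_(nodes, nodes)]
    return (sub.sum() / 2.0) / (n * (n - 1) / 2.0)

def bisect(W, nodes):
    """Split the sub-graph induced by `nodes` by the sign of its Fiedler vector."""
    sub = W[np.ix_(nodes, nodes)]
    L = np.diag(sub.sum(axis=1)) - sub
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    a = [v for v, x in zip(nodes, fiedler) if x >= 0]
    b = [v for v, x in zip(nodes, fiedler) if x < 0]
    return a, b

def cluster(M):
    """Cluster classes given directed message counts M[p, q]."""
    M = np.asarray(M, dtype=float)
    W = M + M.T                      # undirected weight: messages in both directions
    clusters, pending = [], [list(range(len(W)))]
    while pending:
        nodes = pending.pop()
        if len(nodes) < 2:
            clusters.append(nodes)
            continue
        a, b = bisect(W, nodes)
        parent = cohesion(W, nodes)
        # Stop (keep the parent) when a resulting graph is less cohesive than it.
        if not a or not b or min(cohesion(W, a), cohesion(W, b)) < parent:
            clusters.append(nodes)
        else:
            pending.extend([a, b])
    return sorted(sorted(c) for c in clusters)

# Hypothetical system: two groups of three classes that talk mostly internally.
M = np.zeros((6, 6))
for p, q in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    M[p, q] = 5
M[2, 3] = 1
result = cluster(M)
```

With a different cohesion measure the stopping point may change, so the measure should be treated as a parameter of the approach rather than a fixed part of it.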
Clustering
[Figure: clustering result for an example system, with the resulting clusters labeled DB, Logic, and GUI]
Design Pattern Detection Design Patterns (descriptions of communicating classes) form solutions to common problems. According to Parnas, software engineering deals with multi-version projects. Multiple Versions + Large Number of Components = Complicated and messy architecture. Patterns impose structure. Consequently, the identification of implemented patterns is useful for understanding an existing design and enables further improvements.
Design Pattern Detection
Classical pattern matching algorithms fail, since implemented patterns often differ from the standard representation. [Figure: System Segment 1 and System Segment 2 compared with the Pattern]
Design Pattern Detection Exploiting recent research on graph similarity [Blondel2004] it is possible to measure the degree of similarity between two vertices
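As a rough sketch of the vertex-similarity idea, the iteration of Blondel et al. can be written as $S_{k+1} = (B S_k A^T + B^T S_k A) / \lVert B S_k A^T + B^T S_k A \rVert_F$, starting from a matrix of ones and taking an even iterate. The function name `similarity_matrix`, the fixed number of rounds, and the example graph are assumptions; the Frobenius normalization used here means the scores are not on the same scale as the ones shown in the slides.

```python
import numpy as np

def similarity_matrix(A, B, rounds=25):
    """Similarity scores between the vertices of two directed graphs.

    A and B are adjacency matrices. S[i, j] is the similarity between
    vertex i of graph B and vertex j of graph A.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    S = np.ones((B.shape[0], A.shape[0]))
    # Only the even iterates converge in general, hence 2 * rounds steps.
    for _ in range(2 * rounds):
        S = B @ S @ A.T + B.T @ S @ A
        norm = np.linalg.norm(S)     # Frobenius norm
        if norm == 0:                # graphs without edges: nothing to compare
            break
        S = S / norm
    return S

# Self-similarity of the path graph 0 -> 1 -> 2.
path = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
S = similarity_matrix(path, path)
```

In a detection setting, `A` would describe a pattern and `B` a segment of the system, so that high entries of `S` point to classes that play a role similar to a pattern participant.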
Design Pattern Detection [Figure: example vertex pairs with similarity scores 1, 0, 0.5, and 1]
Design Pattern Detection [Figure: System Segment 1 and System Segment 2 compared with the Pattern]
Design Pattern Detection Experimental Results: JHotDraw v5.1 (172 classes) JRefactory (572 classes) JUnit 3.7 (99 classes)
Design Pattern Detection
Scale-Freeness of OO Systems Popular topic: investigating whether certain systems (technological, biological, social, etc.) are scale-free. A scale-free phenomenon shows up statistically in the form of a power law: for a network, the probability $P(k)$ that a node connects with $k$ other nodes is $P(k) \sim k^{-\gamma}$.
Scale-Freeness of OO Systems Naturally, research has also focused on OO systems Scale-freeness is usually graphically detected, since the relationship of P(k) vs. k, plotted on a log-log scale, appears as a line with slope -γ
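The graphical check can be approximated numerically by fitting a straight line to the degree distribution on log-log axes. The sketch below assumes a plain least-squares fit, which is only a rough estimate of the exponent; the function name `estimate_gamma` and the synthetic degree list are illustrative.

```python
from collections import Counter

import numpy as np

def estimate_gamma(degrees):
    """Estimate gamma from the slope of log P(k) versus log k."""
    counts = Counter(d for d in degrees if d > 0)   # log is undefined for k = 0
    total = sum(counts.values())
    k = np.array(sorted(counts), dtype=float)
    p = np.array([counts[int(x)] / total for x in k])
    # polyfit with degree 1 returns (slope, intercept).
    slope, _ = np.polyfit(np.log10(k), np.log10(p), 1)
    return -slope

# Synthetic degree list whose frequencies follow roughly k^-2.
degrees = [k for k in range(1, 11) for _ in range(round(1000 / k**2))]
gamma = estimate_gamma(degrees)
```

For the degrees of a real system the points rarely fall exactly on a line, so the quality of the fit should be inspected before the system is called scale-free.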
Scale-Freeness of OO Systems
Recently, in [Li2005], a structural metric has been proposed to evaluate the scale-freeness of a network. For an undirected, simple and connected graph $g = (V, E)$, the metric is $s(g) = \sum_{(i,j) \in E} d_i d_j$, where $d_i$ is the degree of vertex $i$. The metric value is maximized when high-degree nodes ("hubs") are connected to other high-degree nodes. Among all graphs having the same degree sequence, there is a graph that attains the maximum value $s_{max}$ of the metric and a graph that attains the minimum value $s_{min}$. Thus: $s_{min} \le s(g) \le s_{max}$
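The metric defined in [Li2005] is $s(g) = \sum_{(i,j) \in E} d_i d_j$, where $d_i$ is the degree of vertex $i$. A minimal sketch of its computation from an edge list follows; the function name `s_metric` and the two example graphs are illustrative, and the sketch does not attempt to construct the extreme graphs for a given degree sequence.

```python
from collections import Counter

def s_metric(edges):
    """Sum of d_i * d_j over the edges (i, j) of an undirected simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[u] * degree[v] for u, v in edges)

# Two graphs with four edges each but different degree sequences.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]   # one hub of degree 4
path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # no vertex above degree 2
s_star = s_metric(star)   # 4 edges, each contributing 4 * 1
s_path = s_metric(path)   # 1*2 + 2*2 + 2*2 + 2*1
```

Because the two examples have different degree sequences, their values are not directly comparable; a meaningful comparison relates each value to the extremes for its own degree sequence.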
Scale-Freeness of OO Systems Given such a metric, it is possible: to validate whether a given OO system is scale-free; to assess whether an optimization increases scale-freeness; to evaluate the evolution of systems in terms of scale-freeness.
Conclusions Graph Theory has been widely applied in several CS fields. It can provide a powerful "tool" for analyzing OO systems: quantification of properties and identification of structures. Graph Theory is important for CS curricula.
Thank you for your attention