Architecture Recovery

Simin Wang Advisor: Prof. Liguo Huang Southern Methodist University

Aspect of Architecture
Design decisions are made and unmade over a system’s lifetime.
At any time t, a system has only one architecture.
Prescriptive architecture (PA) captures design decisions made prior to system construction (as-designed).
Descriptive architecture (DA) describes how the system has been built (as-implemented).

Aspect of Architecture
Software decay
  Drift: introduction of design decisions into a system that are not encompassed or implied by its architectural design
  Erosion: introduction of design decisions into a system that violate its architectural design
Architectural decay
  Can exist both in the design and in the code
Software smell
  A commonly made design or implementation decision
  Negatively impacts the system’s lifecycle properties
  It is not a bug: it does not break the system
  It is a manifestation of technical debt

Architecture Recovery
The process of determining a system’s architecture from its implementation-level artifacts (source code, executable files, Java .class files, etc.).
The output is an architectural view: a structured arrangement of a system’s implementation-level artifacts under a set of criteria, or a higher-level representation.

Why Recover Architecture?
Research Maintenance Evaluation Metrics and Issues Resource (work) allocation

Evaluation

How to Recover? Walk around, look, measure? What to recover from?
Humans: unavailable, have different ideas, or are afraid to tell the truth
Documentation: not always followed
Code: reliable

Methods
ACDC, ARC, WCA, LIMBO, Bunch, ZBR

Clustering vs. Hierarchical Clustering
Clustering is the process of forming groups of items or entities such that entities within a group are similar to one another and different from those in other groups.
A hierarchical clustering method produces a classification in which small clusters of very similar entities are nested within larger clusters of less closely related entities.

ACDC Algorithm for Comprehension-Driven Clustering
Recovers components using patterns:
  Source File Pattern
  Directory Structure Pattern
  Body-header Pattern (.c and .h files in C)
  Leaf Collection Pattern (drivers)
  Support Library Pattern
  Central Dispatcher Pattern
  Subgraph Dominator Pattern, defined on the system graph G = (V, E):
    A dominator node n0 and a dominated set N = {ni} (i = 1..m)
    There is a path from n0 to every ni
    For any node v ∈ V with a path P from v to some ni, either n0 ∈ P or v ∈ N
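The two conditions of the Subgraph Dominator Pattern can be checked with plain graph reachability. The sketch below is illustrative only and is not ACDC's implementation: the function names, the adjacency-dictionary representation, and the example file names are assumptions, and ACDC's ordering and size limits on dominated sets are ignored. It returns the largest set N that satisfies both conditions for a given n0.

```python
from collections import deque


def reachable(graph, starts, blocked=frozenset()):
    """Return every node reachable from `starts` without entering `blocked`.

    The start nodes themselves are included (unless they are blocked).
    """
    seen = {s for s in starts if s not in blocked}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen and succ not in blocked:
                seen.add(succ)
                queue.append(succ)
    return seen


def dominated_set(graph, n0):
    """Largest set N such that every path into N passes through n0."""
    all_nodes = set(graph) | {s for succs in graph.values() for s in succs}
    # Condition 1: candidates must be reachable from n0.
    candidates = reachable(graph, graph.get(n0, ()), blocked={n0})
    # Condition 2: drop anything an outside node can reach while avoiding n0.
    outside = all_nodes - candidates - {n0}
    leaked = reachable(graph, outside, blocked={n0})
    return candidates - leaked


# Hypothetical dependency graph: an edge means "depends on".
example = {
    "main": {"parser", "log"},
    "parser": {"lexer", "ast"},
    "lexer": {"tokens"},
    "ast": {"tokens", "log"},
}
print(dominated_set(example, "parser"))
```

In this example `log` is excluded from the set dominated by `parser` because `main` reaches it directly, without going through `parser`.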

ACDC
Stage 1: Skeleton construction. Create a skeleton of the final decomposition by identifying subsystems using a pattern-driven approach.
Stage 2: Orphan adoption. Deal with the problem of maintaining a system’s decomposition as the system evolves.

ACDC – Skeleton Construction
1. Source file clusters.
2. Body-header conglomeration.
3. Leaf collection and support library identification.
4. Ordered and limited subgraph domination.
   Disregards any files with an out-degree larger than 20.
   Goes through all the nodes and examines whether they qualify as the dominator node of a subsystem following the subgraph dominator pattern.
   If a non-empty dominated set is discovered, ACDC creates a subsystem containing both the dominator node and the dominated set. The name of this subsystem is the name of the dominator node plus the suffix “.ss”.
   ACDC organizes the obtained subsystems so that the containment hierarchy is a tree.
   Finally, any files that were disregarded earlier are now considered again.
5. Creation of “Support.ss”. Any files that were identified as candidates for the support library pattern in step 3 are assigned to this subsystem, unless they were already assigned to some subsystem during step 4.

ACDC – Orphan Adoption
Non-clustered files are the orphans.
Attempts to place each newly introduced resource (called an orphan) in the subsystem that seems most appropriate.
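The slides do not say how "most appropriate" is decided. As a rough illustration only, the sketch below assumes one simple structural criterion: adopt the orphan into the subsystem with which it shares the most dependency edges. The function name `adopt_orphan`, the edge-list representation, and the file names are hypothetical, and this is not claimed to be ACDC's actual criterion.

```python
def adopt_orphan(orphan, edges, assignment):
    """Pick a subsystem for `orphan` by counting shared dependency edges.

    edges:      iterable of (source_file, target_file) dependencies
    assignment: dict mapping already-clustered files to subsystem names
    Returns the chosen subsystem name, or None if the orphan has no
    dependency on any clustered file.
    """
    counts = {}
    for src, dst in edges:
        if src == orphan and dst in assignment:
            neighbour = dst
        elif dst == orphan and src in assignment:
            neighbour = src
        else:
            continue
        subsystem = assignment[neighbour]
        counts[subsystem] = counts.get(subsystem, 0) + 1
    if not counts:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(counts), key=counts.get)


# Hypothetical example: new.c uses two parser files and is used by log.c.
deps = [("new.c", "parser.c"), ("new.c", "lexer.c"), ("log.c", "new.c")]
placed = {"parser.c": "parser.ss", "lexer.c": "parser.ss", "log.c": "Support.ss"}
print(adopt_orphan("new.c", deps, placed))
```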

ARC Architecture Recovery using Concerns
Recovers concerns of implementation-level entities and uses a hierarchical clustering technique to obtain architectural elements. Compute similarity measures between concerns and identify which concerns appear in a single implementation-level entity.

ARC
ARC represents a software system as a set of documents.
A document can have different topics, which are the concerns in ARC.
A topic z is a multinomial probability distribution over words w.
A document d is represented as a multinomial probability distribution over topics z.
Each implementation-level entity is treated as a document where its document-topic distribution is its feature vector.
Hierarchical clustering is performed by computing similarities between entities using the Jensen-Shannon divergence, which allows computing similarities between document-topic distributions.
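For reference, the Jensen-Shannon divergence between two document-topic distributions P and Q is the average Kullback-Leibler divergence of each from their midpoint M = (P + Q) / 2. The following is a minimal sketch of that standard definition, not ARC's code; the function names and the three-topic example vectors are made up for illustration. With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no shared topics).

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)


# Hypothetical document-topic distributions over three concerns.
entity_a = [0.7, 0.2, 0.1]
entity_b = [0.6, 0.3, 0.1]
entity_c = [0.0, 0.1, 0.9]

# A smaller divergence means the two entities address more similar concerns.
print(js_divergence(entity_a, entity_b))
print(js_divergence(entity_a, entity_c))
```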

WCA Weighted Combined Algorithm
Measures the inter-cluster distance between software entities and merges them into clusters based on this distance.
Two measures are proposed to measure the inter-cluster distance: Unbiased Ellenberg (UE) and Unbiased Ellenberg-NM (UENM).

WCA
Begins by placing each implementation-level entity in its own cluster, where a cluster represents an architectural component.
Computes the pair-wise similarity between all the clusters and then combines the two most similar clusters into a new cluster.
Repeated until all elements have been clustered or the desired number of clusters is obtained.
When two clusters are merged by WCA, a new feature vector is formed by combining the feature vectors of the two clusters.
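The merge loop described above can be sketched as follows. Two parts are assumptions rather than WCA itself: the slides do not give the UE or UENM formulas, so a weighted Jaccard similarity stands in for them, and the merged feature vector is taken to be the size-weighted average of the two vectors. All names and the example feature vectors are hypothetical.

```python
def weighted_jaccard(u, v):
    """Stand-in similarity (NOT Unbiased Ellenberg): sum of mins over sum of maxes."""
    numerator = sum(min(a, b) for a, b in zip(u, v))
    denominator = sum(max(a, b) for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def agglomerate(entities, target_clusters):
    """Merge the two most similar clusters until `target_clusters` remain.

    entities: dict mapping entity name -> feature vector (list of numbers)
    Returns a list of sets of entity names.
    """
    # Start with every entity in its own cluster.
    clusters = [({name}, list(vec)) for name, vec in entities.items()]
    while len(clusters) > max(target_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = weighted_jaccard(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        (members_i, vec_i), (members_j, vec_j) = clusters[i], clusters[j]
        n_i, n_j = len(members_i), len(members_j)
        # Assumed combination rule: size-weighted average of the two vectors.
        merged_vec = [(n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(vec_i, vec_j)]
        merged = (members_i | members_j, merged_vec)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [members for members, _ in clusters]


# Hypothetical binary features, e.g. "entity references symbol k".
features = {
    "a.c": [1, 1, 0, 0],
    "b.c": [1, 1, 1, 0],
    "x.c": [0, 0, 1, 1],
    "y.c": [0, 0, 0, 1],
}
print(agglomerate(features, 2))
```

Carrying a combined vector for each cluster means later similarities are computed between clusters directly, rather than between all pairs of their members.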

LIMBO
A hierarchical clustering algorithm that aims to make the Information Bottleneck algorithm scalable for large data sets.
Uses a mechanism called Summary Artifacts (SA) to reduce the computations needed while minimizing accuracy loss.
Uses the Information Loss (IL) measure to compute similarities between entities.

Bunch
Transforms the architecture recovery problem into an optimization problem.
An optimization function called Modularization Quality (MQ) represents the quality of a recovered architecture.
Uses hill-climbing and genetic algorithms to find a partition that maximizes MQ.
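A small sketch of the search idea, under stated assumptions. The slides do not define MQ, so the code uses one formulation commonly associated with Bunch (often called TurboMQ): each cluster with mu internal edges and eps edges crossing its boundary contributes 2*mu / (2*mu + eps), and MQ is the sum of these cluster factors. The hill climber here only moves single nodes between existing clusters and is far simpler than Bunch's own search; all names and the example graph are hypothetical.

```python
def modularization_quality(edges, assignment):
    """Sum of per-cluster factors 2*mu / (2*mu + eps) (assumed TurboMQ form)."""
    intra, inter = {}, {}
    for src, dst in edges:
        c_src, c_dst = assignment[src], assignment[dst]
        if c_src == c_dst:
            intra[c_src] = intra.get(c_src, 0) + 1
        else:
            # A crossing edge counts against both clusters it touches.
            inter[c_src] = inter.get(c_src, 0) + 1
            inter[c_dst] = inter.get(c_dst, 0) + 1
    # Clusters without internal edges contribute 0 and are skipped.
    return sum(2 * mu / (2 * mu + inter.get(c, 0)) for c, mu in intra.items())


def hill_climb(edges, assignment):
    """Move one node at a time to the cluster that most improves MQ."""
    current = dict(assignment)
    best = modularization_quality(edges, current)
    improved = True
    while improved:
        improved = False
        labels = sorted(set(current.values()))
        for node in sorted(current):
            keep = current[node]
            for label in labels:
                if label == keep:
                    continue
                current[node] = label
                score = modularization_quality(edges, current)
                if score > best + 1e-12:
                    best, keep, improved = score, label, True
            current[node] = keep  # settle on the best label found for this node
    return current, best


# Hypothetical dependency edges and a deliberately poor starting partition.
deps = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("b", "c")]
start = {"a": 0, "b": 1, "c": 1, "d": 0}
partition, mq = hill_climb(deps, start)
print(partition, mq)
```

Hill climbing stops at the first partition that no single move can improve, which may be a local optimum; that limitation is one reason to pair it with other search strategies such as genetic algorithms.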

ZBR Zone Based Recovery
Based on natural language semantics of identifiers found in the source code.
Demonstrated accuracy in recovering Java package structure but struggled with memory issues when dealing with larger systems.

References
Garcia, Joshua, Igor Ivkovic, and Nenad Medvidovic. "A comparative analysis of software architecture recovery techniques." Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on. IEEE, 2013.
Tzerpos, Vassilios, and Richard C. Holt. "ACDC: an algorithm for comprehension-driven clustering." Reverse Engineering, Proceedings. Seventh Working Conference on. IEEE, 2000.
Maqbool, Onaiza, and Haroon Babri. "Hierarchical clustering for software architecture recovery." IEEE Transactions on Software Engineering 33.11 (2007).
Lutellier, Thibaud, et al. "Comparing software architecture recovery techniques using accurate dependencies." Software Engineering (ICSE), IEEE/ACM 37th IEEE International Conference on. Vol. 2. IEEE,
