Slide 1: Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms
Dagstuhl – Software Architecture
Brian S. Mitchell
Department of Computer Science, College of Engineering
Drexel University, Philadelphia, PA, USA
Drexel University Software Engineering Research Group (SERG)
Slide 2: Understanding Large Systems is HARD
Example: RedHat Linux 7.1
  Kernel: 1,400 modules, 2.5M LOC
  System: 350K modules, 30M LOC
  Languages: > 19 (including scripting)
Manual Analysis is Tedious and Error Prone
Source Code Analysis Approaches Create Large Repositories
Software Clustering Approaches Create Abstract Representations
Slide 3: Software Clustering
[Figure: Bunch Tool]
Requires a Representation...
...A Clustering Algorithm...
...And a way to Represent Results...
Researchers Have Examined Many Different Approaches for Software Clustering
Slide 4: Search-Based Software Clustering with Bunch
Bunch Uses Metaheuristic Search Algorithms for Software Clustering
Slide 5: Bunch Example
[Figure panels: The MDG (Module Dependency Graph); The Random Start Point; The Solution]
Slide 6: Evaluating Bunch’s Results
Observation: Bunch produces similar results across repeated runs
  This is desirable, but
  This is unexpected considering the use of metaheuristic search algorithms
Some evaluation has been done
  “Good Enough” via empirical studies
  Similarity Analysis [WCRE01, ICSM01]
  Comparing to spectral clustering techniques [WCRE02]
We wanted to investigate why Bunch’s results are consistently similar
Bunch Produces A “Family” of Related Results
Slide 7: The Search Landscape
[Figure labels: MDG, Bunch Tool, Clustering Results, Search Landscape Modeler, Structural Landscape, Similarity Landscape]
Structural Landscape: What are some common properties, if any, in the MDG partitions?
Similarity Landscape: How similar are the contents of the MDG partitions?
Cluster a System Many Times, Look for Patterns in the Clustering Results that Provide Insight into the Search Space
Can Modeling the Search Space be useful for Evaluation?
Slide 8: The Structural Landscape – What do we Expect?
The Structural Landscape is Modeled using a Series of Views
MQ (Modularization Quality) vs Number of Clusters: We expect to see a relationship between MQ and the number of clusters. Both MQ and the number of clusters in the partitioned MDG should not vary widely across clustering runs.
Intra-Edge Density: We expect a good result to produce a high percentage of intra-edges (edges that start and end in the same cluster) consistently.
MQ Value: We expect repeated clustering runs to produce similar MQ results.
Number of Clusters: We expect that the number of clusters remains relatively consistent across multiple clustering runs.
Comparing Bunch’s Final Results against the Initial Randomly Partitioned MDG
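The intra-edge density and number-of-clusters views can be made concrete with a small calculation. The Python sketch below is not taken from Bunch. It assumes the MDG is a list of directed (source, target) module pairs and that a clustering result is a dictionary from module name to cluster id. The function names and the toy graph are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]       # (source module, target module); assumed MDG encoding
Partition = Dict[str, int]   # module name -> cluster id; assumed result encoding


def intra_edge_percentage(edges: List[Edge], partition: Partition) -> float:
    """Percentage of MDG edges that start and end in the same cluster."""
    if not edges:
        return 0.0
    intra = sum(1 for src, dst in edges if partition[src] == partition[dst])
    return 100.0 * intra / len(edges)


def cluster_count(partition: Partition) -> int:
    """Number of distinct clusters used by a partition."""
    return len(set(partition.values()))


# Hypothetical toy MDG with five modules and five dependencies.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]

# A scattered starting partition versus a plausible clustered result.
random_start = {"a": 0, "b": 1, "c": 2, "d": 0, "e": 1}
clustered = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

print(intra_edge_percentage(edges, random_start))  # 0.0
print(intra_edge_percentage(edges, clustered))     # 80.0
print(cluster_count(random_start), cluster_count(clustered))  # 3 2
```

Running the same two functions over every clustering run, and over each run's random starting partition, would give the per-run values that the structural views plot.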
Slide 9: The Similarity Landscape – What do we Expect?
[Figure: a cluster containing modules a, b, c; edges inside the cluster are intra-edges, edges to other clusters are inter-edges]
1. Create a counter C for each edge, initialize to zero
2. Cluster a system many times. For each run: for each edge, increment C if the edge is an intra-edge
3. After all runs, determine P, the percentage of times that each edge appears as an intra-edge
Aggregate the P values based on the level of agreement: None, Low, Medium, High
Our Expectations (figure labels): LARGE Dissimilarity, MODERATE Dissimilarity, NOT Similar, VERY Similar
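The three counting steps on the Similarity Landscape slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bunch's implementation. Each clustering run is assumed to be a dictionary from module name to cluster id, and edges are assumed to be unique (source, target) pairs. The slides do not give the cut-offs between None, Low, Medium, and High, so the `low_max` and `high_min` parameters below are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]       # (source module, target module); assumed encoding
Partition = Dict[str, int]   # module name -> cluster id; assumed encoding


def intra_edge_agreement(edges: List[Edge], runs: List[Partition]) -> Dict[Edge, float]:
    """For each edge, the percentage P of runs in which it is an intra-edge."""
    if not runs:
        raise ValueError("at least one clustering run is required")
    counts = {edge: 0 for edge in edges}              # step 1: counter C per edge
    for partition in runs:                            # step 2: every run
        for src, dst in edges:
            if partition[src] == partition[dst]:      # intra-edge in this run
                counts[(src, dst)] += 1
    return {edge: 100.0 * c / len(runs) for edge, c in counts.items()}  # step 3


def agreement_level(p: float, low_max: float = 10.0, high_min: float = 75.0) -> str:
    """Bucket P into a level. The thresholds are assumptions, not from the slides."""
    if p == 0:
        return "None"
    if p <= low_max:
        return "Low"
    if p < high_min:
        return "Medium"
    return "High"


def similarity_landscape(edges: List[Edge], runs: List[Partition]) -> Dict[str, float]:
    """Percentage of edges that fall into each agreement level."""
    agreement = intra_edge_agreement(edges, runs)
    levels = Counter(agreement_level(p) for p in agreement.values())
    return {name: 100.0 * levels[name] / len(edges)
            for name in ("None", "Low", "Medium", "High")}


# Hypothetical example: three edges, four clustering runs.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
runs = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1},
]
agreement = intra_edge_agreement(edges, runs)
landscape = similarity_landscape(edges, runs)
print(agreement)
print(landscape)
```

Cluster ids are compared only within a single run, so the count does not depend on how clusters happen to be numbered from one run to the next.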
Slide 10: Case Study
System Name   Modules   Relations   Description
Telnet             28          81   Terminal Emulator
PHP                62         191   Internet Scripting Language
Bash               92         901   Unix Terminal Environment
Lynx              148       1,745   Text-Based HTML Browser
Bunch             220         764   Software Clustering Tool
Swing             413       1,513   Standard Java User Interface Framework
Kerberos 5        558       3,793   Security Services Infrastructure
We also looked at 6 randomly generated MDGs
Slide 11: Structural Landscape (1)
The independent samples were ordered by MQ to highlight some relationships that would not be obvious otherwise.
Slide 12: Structural Landscape (2)
Slide 13: Structural Landscape (3) – Random MDGs
Slide 14: Structural Landscape (4) – Random MDGs
Slide 15: Structural Landscape - Observations
There was significant commonality across the clustering results
  Many desirable aspects
A lot of commonality between the random and open source systems
  Some additional variability in the MQ vs Number of Clusters relationship for the random MDGs
  More variability in the clustering results for the random graphs with higher edge densities
Slide 16: Similarity Landscape (1)
[Figure: agreement levels Zero, Low, Medium, High; series: Open Source Systems, Random MDGs]
Slide 17: Similarity Landscape (2)
[Figure: agreement levels Zero, Low, Medium, High; series: Open Source Systems, Random MDGs - Low, Random MDGs - High]
Slide 18: Observations – Similarity Landscape
Open Source systems exhibited expected trends
  High dissimilarity and high similarity
  Little medium similarity
Random MDGs had much higher medium similarity, and almost no high similarity
  We think that this might be due to isomorphism in the clustering results
  Why: The variability in the number of clusters with similar MQ that we observed from the structural landscape
Slide 19: Conclusions
Ideally, evaluation would be performed by comparing Bunch’s results to a benchmark
  Not possible: graph partitioning is NP-hard
Empirical feedback indicates that the results are “good enough”
Until now, no investigation had been performed into why Bunch produces consistent results
The Search Landscape model provided a lot of intuition into Bunch’s behavior
  We examined both the structural and similarity aspects of the search landscape
The Search Landscape approach seems appropriate for modeling other metaheuristic search algorithms