1 Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms Dagstuhl – Software Architecture Brian S. Mitchell


1 Modeling the Search Landscape of Metaheuristic Software Clustering Algorithms
Dagstuhl – Software Architecture
Brian S. Mitchell
Department of Computer Science, College of Engineering, Drexel University, Philadelphia, PA, USA

Drexel University Software Engineering Research Group (SERG)
2 Understanding Large Systems is HARD
Example: RedHat Linux 7.1
- Kernel: 1,400 modules, 2.5M LOC
- System: 350K modules, 30M LOC
- Languages: > 19 (including scripting)
Manual Analysis is Tedious and Error Prone
Source Code Analysis Approaches Create Large Repositories
Software Clustering Approaches Create Abstract Representations

3 Software Clustering
[Figure label: Bunch Tool]
- Requires a Representation...
- ...A Clustering Algorithm...
- ...And a way to Represent Results...
Researchers Have Examined Many Different Approaches for Software Clustering

4 Search-Based Software Clustering with Bunch
Bunch Uses Metaheuristic Search Algorithms for Software Clustering

5 Bunch Example
[Figure panels: The MDG; The Random Start Point; The Solution]

6 Evaluating Bunch’s Results
Observation: Bunch produces similar results
- This is desirable, but
- This is unexpected considering the use of metaheuristic search algorithms
Some evaluation has been done
- “Good Enough” via empirical studies
- Similarity Analysis [WCRE01, ICSM01]
- Comparing to spectral clustering techniques [WCRE02]
We were intrigued to investigate why Bunch’s results are consistently similar
Bunch Produces A “Family” of Related Results

7 The Search Landscape
[Diagram labels: MDG; Bunch Tool; Clustering Results; Search Landscape Modeler]
- Structural Landscape: What are some common properties, if any, in the MDG partitions?
- Similarity Landscape: How similar are the contents of the MDG partitions?
Cluster a System Many Times, Look for Patterns in the Clustering Results that Provide Insight into the Search Space
Can Modeling the Search Space be useful for Evaluation?

8 The Structural Landscape – What do we Expect?
The Structural Landscape is Modeled using a Series of Views
- MQ vs Number of Clusters: We expect to see a relationship between MQ and the number of clusters. Both MQ and the number of clusters in the partitioned MDG should not vary widely across clustering runs.
- Intra-Edge Density: We expect a good result to produce a high percentage of intra-edges (edges that start and end in the same cluster) consistently.
- MQ Value: We expect repeated clustering runs to produce similar MQ results.
- Number of Clusters: We expect that the number of clusters remains relatively consistent across multiple clustering runs.
Comparing Bunch’s Final Results against the Initial Random Partitioned MDG
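The two per-run measurements behind these views can be sketched in a few lines of Python. The intra-edge density follows the definition given on the slide. The slides do not define MQ, so the `mq` function below assumes the cluster-factor formulation commonly reported for Bunch (each cluster contributes 2μ / (2μ + ε), where μ counts its intra-edges and ε counts the inter-edges touching it); treat that formula, and the names `edges` and `cluster_of`, as illustrative assumptions rather than the tool's actual code.

```python
from collections import defaultdict


def intra_edge_density(edges, cluster_of):
    """Fraction of MDG edges whose two endpoints lie in the same cluster."""
    if not edges:
        return 0.0
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / len(edges)


def mq(edges, cluster_of):
    """Assumed MQ: sum over clusters of 2*mu / (2*mu + eps).

    mu  = number of edges inside the cluster (intra-edges)
    eps = number of edges entering or leaving the cluster (inter-edges)
    Clusters without intra-edges contribute 0. This formula is an
    assumption; the slides do not spell it out.
    """
    mu = defaultdict(int)
    eps = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            mu[cu] += 1
        else:
            eps[cu] += 1
            eps[cv] += 1
    # Only clusters with at least one intra-edge appear as keys of mu.
    return sum(2 * mu[c] / (2 * mu[c] + eps[c]) for c in mu)


# Hypothetical five-module MDG partitioned into two clusters.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
density = intra_edge_density(edges, cluster_of)
quality = mq(edges, cluster_of)
```

Computing both values for every clustering run, and once more for the initial random partition, yields the data points that the structural views plot.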

9 The Similarity Landscape – What do we Expect?
[Figure: a cluster containing nodes a, b and c, shown with other clusters; the legend distinguishes intra-edges from inter-edges]
1. Create a counter C for each edge, initialized to zero
2. Cluster a system many times. For each run: for each edge, increment C if the edge is an intra-edge
3. After all runs, determine P, the percentage of runs in which each edge appears as an intra-edge
Aggregate the P values based on the level of agreement: None, Low, Medium, High
Our Expectations: LARGE Dissimilarity, MODERATE Dissimilarity, NOT Similar, VERY Similar
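The three steps above can be sketched as follows. Each clustering run is represented as a dictionary from module name to cluster id, which is an assumed encoding. The slide does not give the cut-off values that separate Low, Medium and High agreement, so the `low` and `high` thresholds are hypothetical placeholders, not the values used in the study.

```python
def intra_edge_percentages(edges, runs):
    """Steps 1-3: percentage of runs in which each edge is an intra-edge."""
    if not runs:
        raise ValueError("at least one clustering run is required")
    counts = {edge: 0 for edge in edges}      # step 1: one counter per edge
    for cluster_of in runs:                   # step 2: visit every run
        for u, v in edges:
            if cluster_of[u] == cluster_of[v]:
                counts[(u, v)] += 1
    # step 3: convert the counters to percentages
    return {edge: 100.0 * c / len(runs) for edge, c in counts.items()}


def agreement_level(p, low=10.0, high=75.0):
    """Bucket a percentage; `low` and `high` are hypothetical cut-offs."""
    if p == 0:
        return "None"
    if p <= low:
        return "Low"
    if p < high:
        return "Medium"
    return "High"


def similarity_landscape(edges, runs):
    """Percentage of edges that fall into each agreement level."""
    levels = ("None", "Low", "Medium", "High")
    tally = {level: 0 for level in levels}
    for p in intra_edge_percentages(edges, runs).values():
        tally[agreement_level(p)] += 1
    return {level: 100.0 * tally[level] / len(edges) for level in levels}


# Hypothetical MDG with three edges, clustered four times.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
runs = [
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 2},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 2},
]
landscape = similarity_landscape(edges, runs)
```

In this toy input, edge (a, b) is an intra-edge in every run, (b, c) in half of them, and (c, d) in none, so the edges land in the High, Medium and None buckets respectively.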

10 Case Study

System Name    Number of Modules    Number of Relations    Description
Telnet         28                   81                     Terminal Emulator
PHP            62                   191                    Internet Scripting Language
Bash           92                   901                    Unix Terminal Environment
Lynx           148                  1,745                  Text-Based HTML Browser
Bunch          220                  764                    Software Clustering Tool
Swing          413                  1,513                  Standard Java User Interface Framework
Kerberos 5     558                  3,793                  Security Services Infrastructure

We also looked at 6 randomly generated MDGs

11 Structural Landscape (1)
The independent samples were ordered by MQ to highlight some relationships that would not be obvious otherwise.

12 Structural Landscape (2)

13 Structural Landscape (3) – Random MDGs

14 Structural Landscape (4) – Random MDGs

15 Structural Landscape - Observations
- There was significant commonality across the clustering results
- Many desirable aspects
- A lot of commonality between the random and open source systems
- Some additional variability in the MQ vs Cluster Size relationship for the random MDGs
- More variability in the clustering results for the random graphs with higher edge densities

16 Similarity Landscape (1)
[Figure: agreement levels Zero, Low, Medium, High; panels: Open Source Systems, Random MDGs]

17 Similarity Landscape (2)
[Figure: agreement levels Zero, Low, Medium, High; panels: Open Source Systems, Random MDGs - Low, Random MDGs - High]

18 Observations – Similarity Landscape
- Open Source systems exhibited expected trends
  - High dissimilarity and high similarity
  - Low medium similarity
- Random MDGs had much higher medium similarity, and almost no high similarity
- We think that this might be due to isomorphism in the clustering results
  - Why: the variability in the number of clusters with similar MQ that we observed in the structural landscape

19 Conclusions
- Ideally, evaluation can be performed by comparing Bunch’s results to a benchmark
  - Not possible: graph partitioning is NP-Hard
- Empirical feedback indicates that the results are “good enough”
- Up to this point in time, no investigation has been performed on why Bunch produces consistent results
- The Search Landscape model provided a lot of intuition into Bunch’s behavior
- We examined both the structural and similarity aspects of the search landscape
- The Search Landscape approach seems appropriate for modeling other metaheuristic search algorithms