An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams Hafeez Osman, Michel R.V. Chaudron and Peter van der Putten.

Presentation transcript:

An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams
Hafeez Osman, Michel R.V. Chaudron and Peter van der Putten
Leiden University, Leiden, the Netherlands
Chalmers University of Technology and Goteborg University, Gothenburg, Sweden
Luiz Paulo Coelho Ferreira

Introduction
Up-to-date design documentation is important.
UML models created during design are often poorly kept up to date during development and maintenance.
For legacy software, up-to-date designs are valuable for maintenance but hard to find.
The paper is partly motivated by a scenario in which new programmers join a development team.

Research Problem
The paper aims to identify classification algorithms suitable for deciding which classes should be included in a class diagram.
The authors seek an automated approach to classifying the key classes in a class diagram.

Contribution
The authors explore 9 classification algorithms for predicting the key classes that should be included in a class diagram.
They evaluate the algorithms on 9 open source systems, ranging from 59 to 903 classes.

Research Questions
RQ1: Which individual predictors are influential for the classification?
RQ2: How robust is the classification to the inclusion of categories of predictors?
RQ3: What are suitable classification algorithms in classifying key classes?

Machine Learning
Univariate Analysis: checks which predictor has the most influence.
Classification Algorithms: J48 Decision Tree, k-Nearest Neighbor, Logistic Regression, Naive Bayes, Decision Tables, Decision Stumps, Radial Basis Function Networks, Random Forests and Random Trees.
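To make one of the listed algorithms concrete, here is a minimal sketch of k-Nearest Neighbor used as a scorer for key classes. It is not the WEKA implementation used in the study: the feature vectors, the labels and the default k = 5 are illustrative assumptions, and the function name is hypothetical.

```python
import math

def knn_score(train_x, train_y, query, k=5):
    """Return the fraction of the k nearest training points labelled 1 (key class)."""
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / len(nearest)

# Hypothetical toy data: two made-up metric values per class; 1 = key class.
train_x = [(9, 8), (8, 9), (7, 7), (1, 1), (2, 1), (1, 2), (0, 1), (2, 2)]
train_y = [1, 1, 1, 0, 0, 0, 0, 0]

score_high = knn_score(train_x, train_y, (8, 8), k=3)
score_low = knn_score(train_x, train_y, (1, 1), k=5)
```

Returning a score rather than a hard label keeps the output usable for ranking classes, which is what a threshold-free measure such as AUC needs.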

Machine Learning
Evaluation Method:
For the univariate analysis, the authors used the InfoGain Attribute Evaluator (InfoGain).
Classification algorithms were evaluated by the area under the ROC curve (AUC).
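The idea behind an information gain ranking can be sketched in a few lines. This is only the underlying calculation for a predictor that has already been discretized, not WEKA's evaluator; the predictor names, bins and labels below are made-up toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(values, labels):
    """Entropy of the labels minus their entropy after splitting on the predictor."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 1 = key class, 0 = non-key class.
is_key = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical binned metrics; the bin names and values are assumptions.
dep_in_bin = ["high", "high", "high", "low", "low", "low", "low", "low"]
num_attrs_bin = ["high", "low", "high", "low", "high", "low", "high", "low"]

# Rank predictors from most to least informative.
ranking = sorted(
    [("dep_in_bin", info_gain(dep_in_bin, is_key)),
     ("num_attrs_bin", info_gain(num_attrs_bin, is_key))],
    key=lambda item: item[1],
    reverse=True,
)
```

In this toy data the first predictor separates the two groups perfectly, so its gain equals the full label entropy and it ranks first.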

Approach
Examined Predictors and Tools
Case Studies
Process

Predictors and Tools
Reverse Engineering: MagicDraw
Software Metrics: SDMetrics
Data Mining: WEKA

Case Studies
Criteria:
Open source project
Must have a forward design class diagram
50+ classes

Process

Evaluation
RQ1: Which individual predictors are influential for the classification?

Evaluation
RQ2: How robust is the classification to the inclusion of categories of predictors?

Evaluation
RQ3: What are suitable classification algorithms in classifying key classes?
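Since the algorithms are compared by AUC, a small sketch of what that number measures may help. AUC can be read as the probability that a randomly chosen key class receives a higher score than a randomly chosen non-key class. The function below computes it by direct pairwise comparison; the scores and labels are made-up toy values, not results from the study.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for five classes; 1 = key class.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.1]
result = auc(scores, labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise loop is quadratic, which is fine for a sketch but not for large data sets.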

Discussion and Future Work
Export Coupling Parameter (EC Par), Dependency In (Dep In) and Number of Operations (NumOps) were the most influential predictors.
k-NN(5) and Random Forest were the best algorithms, and they can be combined to find better solutions.
The approach was not able to produce high AUC values.
Future work could use different metrics.
Future work could also evolve the “ground truth” to be iterative or use version control mining.

Threats to Validity
The study assumed that all classes that existed in the forward designs were the important classes.
The input of the study depends on the MagicDraw CASE tool.
The study covers only 9 open source case studies.
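The first assumption can be stated as a labelling rule: a reverse engineered class counts as a key class exactly when it also appears in the forward design. The sketch below illustrates that rule under the simplifying assumption that classes are matched by name; the function and the class names are hypothetical.

```python
def label_key_classes(reverse_engineered, forward_design):
    """Label each reverse engineered class 1 if it appears in the forward design, else 0."""
    design = set(forward_design)
    return {name: int(name in design) for name in reverse_engineered}

# Hypothetical class names, for illustration only.
ground_truth = label_key_classes(
    ["Order", "OrderDAO", "StringUtil"],
    ["Order", "Customer"],
)
```

Any class that a designer left out of the forward design is labelled non-key under this rule, which is why the assumption is listed as a threat.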

Conclusion
The authors propose an approach for condensing reverse engineered class diagrams by selecting their key classes.
The paper evaluates the influential predictors for classifying key classes and compares various machine learning classification algorithms on 9 case studies.
Export Coupling Parameter, Dependency In and Number of Operations are the most influential predictors of key classes.
On these predictor sets, Random Forest and k-Nearest Neighbor provided the best results.

Questions?