UNIVERSITAS SCIENTIARUM SZEGEDIENSIS UNIVERSITY OF SZEGED D epartment of Software Engineering New Conceptual Coupling and Cohesion Metrics for Object-Oriented.

Slides:



Advertisements
Similar presentations
CLUSTERING SUPPORT FOR FAULT PREDICTION IN SOFTWARE Maria La Becca Dipartimento di Matematica e Informatica, University of Basilicata, Potenza, Italy
Advertisements

Overview Introduces a new cohesion metric called Conceptual Cohesion of Classes (C3) and uses this metric for fault prediction Compares a new cohesion.
Dimensionality Reduction PCA -- SVD
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Prediction of fault-proneness at early phase in object-oriented development Toshihiro Kamiya †, Shinji Kusumoto † and Katsuro Inoue †‡ † Osaka University.
Software Fault Prediction using Language Processing Dave Binkley Henry Field Dawn Lawrie Maurizio Pighin Loyola College in Maryland Universita’ degli Studi.
Comparison of information retrieval techniques: Latent semantic indexing (LSI) and Concept indexing (CI) Jasminka Dobša Faculty of organization and informatics,
São Paulo Advanced School of Computing (SP-ASC’10). São Paulo, Brazil, July 12-17, 2010 Looking at People Using Partial Least Squares William Robson Schwartz.
What is missing? Reasons that ideal effectiveness hard to achieve: 1. Users’ inability to describe queries precisely. 2. Document representation loses.
Nov R. McFadyen1 Metrics Fan-in/fan-out Lines of code Cyclomatic complexity* Comment percentage Length of identifiers Depth of conditional.
Latent Semantic Analysis
Hinrich Schütze and Christina Lioma
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Page 1 Building Reliable Component-based Systems Chapter 7 - Role-Based Component Engineering Chapter 7 Role-Based Component Engineering.
Analysis of CK Metrics “Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults” Yuming Zhou and Hareton Leung,
Latent Semantic Indexing via a Semi-discrete Matrix Decomposition.
1 Latent Semantic Indexing Jieping Ye Department of Computer Science & Engineering Arizona State University
The Conceptual Coupling Metrics for Object-Oriented Systems
TFIDF-space  An obvious way to combine TF-IDF: the coordinate of document in axis is given by  General form of consists of three parts: Local weight.
March R. McFadyen1 Software Metrics Software metrics help evaluate development and testing efforts needed, understandability, maintainability.
1/ 30. Problems for classical IR models Introduction & Background(LSI,SVD,..etc) Example Standard query method Analysis standard query method Seeking.
CSE 300: Software Reliability Engineering Topics covered: Software metrics and software reliability.
CSE 300: Software Reliability Engineering Topics covered: Software metrics and software reliability Software complexity and software quality.
Object Oriented Metrics XP project group – Saskia Schmitz.
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
INFO415 Approaches to System Development: Part 2
Latent Semantic Indexing Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Refactoring.
Japan Advanced Institute of Science and Technology
Paradigm Independent Software Complexity Metrics Dr. Zoltán Porkoláb Department of Programming Languages and Compilers Eötvös Loránd University, Faculty.
INF 141 COURSE SUMMARY Crista Lopes. Lecture Objective Know what you know.
SOFTWARE DESIGN.
Software Measurement & Metrics
1 OO Metrics-Sept2001 Principal Components of Orthogonal Object-Oriented Metrics Victor Laing SRS Information Services Software Assurance Technology Center.
CpSc 881: Information Retrieval. 2 Recall: Term-document matrix This matrix is the basis for computing the similarity between documents and queries. Today:
Pseudo-supervised Clustering for Text Documents Marco Maggini, Leonardo Rigutini, Marco Turchi Dipartimento di Ingegneria dell’Informazione Università.
June 5, 2006University of Trento1 Latent Semantic Indexing for the Routing Problem Doctorate course “Web Information Retrieval” PhD Student Irina Veredina.
1 Advanced Software Architecture Muhammad Bilal Bashir PhD Scholar (Computer Science) Mohammad Ali Jinnah University.
SINGULAR VALUE DECOMPOSITION (SVD)
Finding Functional Gene Relationships Using the Semantic Gene Organizer (SGO) Kevin Heinrich Master’s Defense July 16, 2004.
Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts Ramin Homayouni, Kevin Heinrich, Lai Wei, and Michael W. Berry University of Tennessee.
CSc 461/561 Information Systems Engineering Lecture 5 – Software Metrics.
Techniques for Collaboration in Text Filtering 1 Ian Soboroff Department of Computer Science and Electrical Engineering University of Maryland, Baltimore.
1 Latent Concepts and the Number Orthogonal Factors in Latent Semantic Analysis Georges Dupret
Multi-Abstraction Concern Localization Tien-Duy B. Le, Shaowei Wang, and David Lo School of Information Systems Singapore Management University 1.
Chapter 19: Interfaces and Components [Arlow and Neustadt, 2005] University of Nevada, Reno Department of Computer Science & Engineering.
1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.
Natural Language Processing Topics in Information Retrieval August, 2002.
Algebraic Techniques for Analysis of Large Discrete-Valued Datasets 
Evaluating C++ Design Pattern Miner Tools Lajos Jenő Fülöp 1, Tamás Gyovai 2 and Rudolf Ferenc 1 1 Department of Software Engineering University of Szeged,
1 CS 8803 AIAD (Spring 2008) Project Group#22 Ajay Choudhari, Avik Sinharoy, Min Zhang, Mohit Jain Smart Seek.
Image Retrieval and Ranking using L.S.I and Cross View Learning Sumit Kumar Vivek Gupta
Experience Report: System Log Analysis for Anomaly Detection
Singular Value Decomposition and its applications
A Hierarchical Model for Object-Oriented Design Quality Assessment
Document Clustering Based on Non-negative Matrix Factorization
School of Computer Science & Engineering
Unified Modeling Language
Trevor Savage, Bogdan Dit, Malcom Gethers and Denys Poshyvanyk
Chapter 19: Interfaces and Components
Predict Failures with Developer Networks and Social Network Analysis
Modeling Architectures
Chapter 19: Interfaces and Components
Chapter 19: Interfaces and Components
Block Matching for Ontologies
Interfaces and Components
Restructuring Sparse High Dimensional Data for Effective Retrieval
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Chapter 19: Interfaces and Components
Latent Semantic Analysis
Presentation transcript:

UNIVERSITAS SCIENTIARUM SZEGEDIENSIS UNIVERSITY OF SZEGED D epartment of Software Engineering New Conceptual Coupling and Cohesion Metrics for Object-Oriented Systems Béla Újházi 1, Rudolf Ferenc 1, Denys Poshyvanyk 2 and Tibor Gyimóthy 1 1 University of Szeged, Hungary 2 The College of William and Mary, USA

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Background  Coupling and cohesion measures capture the degree of interaction and relationships among source code elements  Most coupling and cohesion metrics rely on structural information, which capture relations, such as method calls or attribute usages  Structural metrics lack the ability to identify conceptual links, which specify implicit relationships encoded in identifiers and comments in the source code

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM New metrics  CCBO ■Conceptual Coupling between Object classes ■Number of classes which are conceptually related to a class  CLCOM5 ■Conceptual Lack of Cohesion on Methods ■Number of strongly connected components in a class when a class is considered as a graph whose vertices are the methods and the edges are the conceptual relations among the methods

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Conceptual relations  The identifiers and comments are parsed and transformed into a corpus of textual documents ■each document corresponds to the implementation of a method  LSI (Latent Semantic Indexing) technique takes the corpus and creates a term-by-document matrix ■captures the dispersion and co-occurrence of terms in class methods  SVD (Singular Value Decomposition) constructs a subspace, referred to as the LSI subspace ■methods from the matrix are represented as vectors in the LSI subspace  The cosine similarity between two vectors is the measure of conceptual similarity between two methods  Two methods are conceptually related if their conceptual similarity is greater than a given threshold

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Advantages  Simpler to compute  Basically programming language independent  CCBO & CLCOM5, when used in conjunction, can predict bugs nearly as precisely as the combination of 58 structural metrics available in the Columbus source code quality framework and can be effectively combined with these metrics to improve bug prediction in Mozilla source code (C++)

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Empirical evaluation  Research question #1 ■Are the new metrics, CCBO and CLCOM5, orthogonal as compared to existing structural and conceptual coupling and cohesion metrics?  PCA did not reveal new components because of the new metrics  Correlation analysis showed that ■CCBO correlated with CBO and RFC (coefficient between ) ■CLCOM5 correlated with LOC, NOI, CBO, RFC, WMC (coefficient above 0.7)  The new metrics are not othogonal, but are cheaper to compute

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Empirical evaluation  Research question #2 ■How does stemming impact accuracy of CCBO and CLCOM5 for predicting fault-proneness of classes?  Result of combining new conceptual metrics (without stemming) for predicting fault-proneness is comparable to the combination of structural metrics  Models based on conceptual metrics are able to outperform the models based on structural metrics in terms of recall (84.2% vs. 73%) and F-measure (73.9% vs. 72.4%)  Combining conceptual metrics with stemming and all the structural metrics produces the best values for accuracy (72.7%), precision (82.1%), recall (74.7%) and F-measure (74.2%)  Stemming improves the results

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Empirical evaluation  Research question #3 ■What is the optimal threshold for CCBO and CLCOM5 for predicting fault-prone classes?  CLCOM5 peak performance of 68.5% in accuracy is observed in the interval of [0.7, 0.8]  CCBO best accuracy is observed at threshold 0.05

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Empirical evaluation  Research question #4 ■Does combining CCBO and CLCOM5 with existing structural and conceptual cohesion and coupling metrics improve accuracy of predicting fault-prone classes?  Combining CCBO and CLCOM5 with structural metrics prediction models are more robust than combinations of existing conceptual metric C3 and structural metrics ■Accuracy (73% vs. 72.3%) ■Precision (81.1% vs. 80.9%) ■Recall (72.9% vs. 73.9%)

UNIVERSITY OF SZEGED D epartment of Software Engineering UNIVERSITAS SCIENTIARUM SZEGEDIENSIS SCAM Our controversial statement It is enough to analyze only source code identifiers and comments to reliably measure coupling and cohesion! Thanks for your attention!