The Conceptual Coupling Metrics for Object-Oriented Systems

Denys Poshyvanyk and Andrian Marcus, SEVERE group
22nd IEEE International Conference on Software Maintenance (ICSM 2006), Philadelphia, Pennsylvania, September 27, 2006

Motivation: concepts and classes; how classes implement and represent concepts; the semantic information carried by the source code.

Example: methods from the MySecMan class in Mozilla.

Approach: Latent Semantic Indexing (LSI). Advantages: it captures the essential semantic information through dimensionality reduction, it overcomes problems with polysemy and synonymy, and it is easy to apply to source code.

Related Work: coupling measures. Problems previously solved with LSI-based approaches: traceability link recovery; managing software artifacts; conceptual cohesion; software clustering; concept/feature location; requirements traceability; isolating concerns in requirements.

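Extracting Semantic Info: the source code is turned into a corpus in which each document is a method. Preprocessing splits identifiers into their component words, whether they are written with underscores (split_identifiers) or in camel case (SplitIdentifiers). The vector space is a term-by-document matrix, and its Singular Value Decomposition (SVD) yields the LSI subspace.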
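The extraction steps can be sketched as follows, assuming a toy corpus where each document is the raw text of one method. The helper names (`split_identifier`, `term_document_matrix`, `lsi_document_vectors`) are hypothetical, the matrix holds raw term counts (the original tooling may weight terms differently), and each document's LSI vector is taken as its row of V_k S_k from the truncated SVD.

```python
import re
import numpy as np

def split_identifier(name: str) -> list[str]:
    """Split an identifier on underscores, digits, and camelCase boundaries; lower-case the words."""
    words = []
    for chunk in re.split(r"[_\W\d]+", name):
        # "SplitIdentifiers" -> "Split", "Identifiers"; "HTTPServer" -> "HTTP", "Server"
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", chunk))
    return [w.lower() for w in words if w]

def term_document_matrix(docs: dict[str, str]) -> tuple[list[str], list[str], np.ndarray]:
    """Rows are terms, columns are documents (one per method); entries are raw term counts."""
    tokenized = {
        name: [w for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
               for w in split_identifier(ident)]
        for name, text in docs.items()
    }
    terms = sorted({w for words in tokenized.values() for w in words})
    row = {t: i for i, t in enumerate(terms)}
    names = list(docs)
    a = np.zeros((len(terms), len(names)))
    for j, name in enumerate(names):
        for w in tokenized[name]:
            a[row[w], j] += 1
    return terms, names, a

def lsi_document_vectors(a: np.ndarray, k: int) -> np.ndarray:
    """Truncated SVD A ~ U_k S_k V_k^T; returns one k-dimensional row per document (V_k S_k)."""
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))  # cannot keep more dimensions than singular values
    return vt[:k].T * s[:k]
```

For example, `term_document_matrix({"m1": "int getUserName()", "m2": "void set_user_name(String n)"})` yields eight terms, with "user" and "name" shared by both methods despite the different naming styles.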

Computing Conceptual Similarity: the conceptual similarity between two methods is the cosine between their document vectors in the LSI subspace.
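A minimal cosine helper for two equal-length document vectors (for instance, two rows of an LSI document matrix) is sketched below; returning 0.0 when either vector is all zeros is an assumption of this sketch, not something the slide specifies.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is a zero vector."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Parallel vectors give 1.0 and orthogonal vectors give 0.0; in an LSI subspace the cosine can also be negative.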

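Conceptual Coupling between Classes: the method-class conceptual similarity of a method to another class is the average of its similarities to all methods of that class; the class-class conceptual similarity averages these values over the methods of the first class. (Figure: classes A and B, three methods each, with the conceptual similarity of every method pair.) In the example, the methods of A have method-class similarities to B of 0.5, 0.4, and 0.3, so the conceptual coupling between A and B is 0.4.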
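The averaging scheme can be sketched as follows. `EXAMPLE_SIM` is a hypothetical method-by-method similarity matrix (row i = method i of A, column j = method j of B): the slide's exact pairwise values cannot be recovered from the figure, so the rows were chosen only so that their averages reproduce 0.5, 0.4, and 0.3. The function names are illustrative.

```python
from statistics import mean

# Hypothetical similarities: rows are methods of class A, columns are methods of class B.
EXAMPLE_SIM = [
    [0.6, 0.2, 0.7],
    [0.6, 0.3, 0.3],
    [0.4, 0.2, 0.3],
]

def method_class_similarity(sim_row):
    """Method-class similarity: average similarity of one method of A to every method of B."""
    return mean(sim_row)

def class_class_similarity(sim):
    """Class-class similarity: average of the method-class similarities over the methods of A."""
    return mean(method_class_similarity(row) for row in sim)
```

With `EXAMPLE_SIM`, the per-method values are 0.5, 0.4, and 0.3, and `class_class_similarity` returns 0.4 (up to floating-point rounding).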

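Maximal Conceptual Coupling: conceptual coupling based on the strongest conceptual coupling link, i.e., each method of A is represented by its highest similarity to any method of B. (Figure: the same classes A and B with their method-pair similarities.) Here the methods of A have maximal similarities to B of 0.7, 0.6, and 0.4, so the conceptual coupling between A and B is 0.56 (1.7 / 3, truncated).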
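The maximal variant changes only the per-method step, replacing the average with the strongest link. This is a sketch consistent with the slide's example (it averages over the methods of A only); `EXAMPLE_SIM` is the same kind of hypothetical matrix, with row maxima chosen to reproduce 0.7, 0.6, and 0.4, and the function names are illustrative.

```python
from statistics import mean

# Hypothetical similarities: rows are methods of class A, columns are methods of class B.
EXAMPLE_SIM = [
    [0.6, 0.2, 0.7],
    [0.6, 0.3, 0.3],
    [0.4, 0.2, 0.3],
]

def max_method_class_similarity(sim_row):
    """Strongest similarity between one method of A and any method of B."""
    return max(sim_row)

def max_class_class_similarity(sim):
    """Average, over the methods of A, of each method's strongest link into B."""
    return mean(max_method_class_similarity(row) for row in sim)
```

With `EXAMPLE_SIM`, the result is 1.7 / 3, roughly 0.567; compared with the averaging scheme, a single strongly related method pair now dominates each method's contribution.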

Are We Measuring Anything New? Compare with other coupling measures: coupling between object classes (CBO) and response for a class (RFC) [Chidamber’94]; message passing coupling (MPC) and data abstraction coupling (DAC) [Li’93]; information-flow-based coupling (ICP) [Lee’95]; and the suite of coupling measures by Briand et al.: ACAIC, OCAIC, ACMIC, and OCMIC. Tools: Columbus [Ferenc’04] and IRC2M.

Software Systems: ten open-source systems from different domains.

Principal Component Analysis: identifies groups of metrics (variables) that measure the same underlying mechanism defining coupling (a dimension). PCA procedure: collect the data, identify outliers, and perform PCA.
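The PCA procedure can be sketched with NumPy as below, with rows as classes and columns as coupling metrics. The z-score outlier rule and its threshold of 3 are assumptions, as is running PCA on the correlation matrix of the metrics; the component rotation used for the reported results is not included.

```python
import numpy as np

def drop_outliers(x: np.ndarray, z_max: float = 3.0) -> np.ndarray:
    """Hypothetical rule: drop rows (classes) with any metric more than z_max std devs from its mean."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # constant metrics would otherwise cause division by zero
    z = np.abs((x - mu) / sd)
    return x[(z <= z_max).all(axis=1)]

def pca(x: np.ndarray):
    """PCA on the metric correlation matrix: eigenvalues (descending) and loadings (metrics x components)."""
    corr = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings
```

Metrics that load heavily on the same component measure the same dimension; a metric that dominates a component of its own, as CoCC and CoCCm do, indicates a new dimension.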

PCA Results (rotated components): CoCC (conceptual coupling) and CoCCm (maximal conceptual coupling) define new dimensions (PC2 and PC6).

Discussion of the Results: conceptual similarities were computed between all pairs of classes, and the pairs with the highest conceptual coupling were selected; these classes have no direct structural dependencies.
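The selection step can be sketched as follows, assuming conceptual coupling scores are already available per class pair and that direct structural dependencies come from a separate static analysis. The function name and data shapes are hypothetical.

```python
def hidden_coupling_candidates(coupling, structural_deps, top_n=5):
    """Rank class pairs by conceptual coupling, keeping only pairs with no direct structural dependency.

    coupling: dict mapping (class_a, class_b) -> conceptual coupling score, one entry per pair
    structural_deps: set of frozensets {class_a, class_b} that are structurally dependent
    """
    ranked = sorted(coupling.items(), key=lambda item: item[1], reverse=True)
    hidden = [(a, b, score) for (a, b), score in ranked
              if frozenset((a, b)) not in structural_deps]
    return hidden[:top_n]
```

For example, with scores {("A", "B"): 0.9, ("A", "C"): 0.8} and a structural dependency between A and B, the top candidate is ("A", "C", 0.8).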

Discussion of the Results (cont.): concepts shared by the highly coupled classes. TortoiseCVS: merge and update CVS operations. WinMerge: checking out a revision of the file. The coupled classes implement related concepts and share a history of common changes.

Current & Future Work: connection to change- and fault-proneness; impact analysis; hidden dependencies and indirect coupling; aspect mining; refining canonical feature sets; concept location and clustering.