Investigating Java Classes with Formal Concept Analysis
Uri Dekel
Based on M.Sc. work at the Israeli Institute of Technology.
To appear: 10th Working Conference on Reverse Engineering (WCRE’03), and as a poster in OOPSLA’
Software Research Seminar (SSSG)

9/25/2003 Investigating Classes with FCA, Uri Dekel, Software Research Seminar

Outline
- Research goals and hypotheses
- A crash course in formal concept analysis
- Interface visualization
- Reasoning about class implementation
- Applications to code inspection
- Additional research

Goals
- Research question: “Can we exploit the data-member-based cohesion between function-methods in a class to reason about the class and discover errors?”
- Specifically:
  1. Shorten the learning curve for new class users by improving interface presentation
  2. Assist reverse engineering by visualizing structure
  3. Assist code inspection by suggesting a reading order
- Important principle: keep it simple to use and learn.

Hypothesis #1
- Data-member use is fundamental to understanding a class.
  - All possible implementations of an operation will use the same fields
  - Representation changes are rare
  - Basis for cohesion-based metrics (e.g., LCOM)
  - Analogous to global-variable-based modularization of procedural code.

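To make the link between field use and cohesion metrics concrete, here is a minimal sketch of an LCOM-style count in the Chidamber-Kemerer formulation: the number of method pairs that share no field, minus the number of pairs that share at least one, floored at zero. The class name `Lcom`, the method-to-fields table, and the sample getters are hypothetical and are not taken from the talk.

```java
import java.util.*;

/** Sketch of an LCOM-style count over a method-to-fields table (hypothetical data). */
final class Lcom {
    /** Pairs of methods sharing no field, minus pairs sharing at least one, floored at zero. */
    static int lcom(Map<String, Set<String>> fieldsUsedBy) {
        List<Set<String>> rows = new ArrayList<>(fieldsUsedBy.values());
        int disjoint = 0;
        int sharing = 0;
        for (int i = 0; i < rows.size(); i++) {
            for (int j = i + 1; j < rows.size(); j++) {
                if (Collections.disjoint(rows.get(i), rows.get(j))) {
                    disjoint++;
                } else {
                    sharing++;
                }
            }
        }
        return Math.max(disjoint - sharing, 0);
    }

    public static void main(String[] args) {
        // Three getters that each touch a different field: no pair shares anything.
        Map<String, Set<String>> structLike = new TreeMap<>();
        structLike.put("getX", Set.of("x"));
        structLike.put("getY", Set.of("y"));
        structLike.put("getZ", Set.of("z"));
        System.out.println(lcom(structLike));
    }
}
```

A struct-like class whose methods each touch a different field scores high, while adding one method that uses all fields pulls the score back toward zero.
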
Hypothesis #2
- Methods that use the same combination of fields are likely to be related.
  - e.g., get/set, add/remove, etc.
- Even more so due to the “shopping list approach”
  - Promotes complete interfaces using composite methods

Means
- Formal Concept Analysis
  - Mathematical classification technique
  - Uses a binary relation (context) between objects and attributes
    - not to be confused with OO terms
  - Produces a concept lattice (next slide)
  - Much literature on applications in various fields
- [Figure: Context of the Pnt3D class]

Formal Concept Analysis
- Input: A context $(O, A, R)$
  - $O$ is a set of objects
  - $A$ is a set of attributes
  - $R$ is a binary relation between $O$ and $A$
- Mapping: Galois connection
  - Common attributes of a set of objects $X \subseteq O$: $\sigma(X) = \{a \in A \mid \forall o \in X : (o, a) \in R\}$
  - Common objects of a set of attributes $Y \subseteq A$: $\tau(Y) = \{o \in O \mid \forall a \in Y : (o, a) \in R\}$
- Output: Concepts $(X, Y)$ s.t. $Y = \sigma(X)$ and $X = \tau(Y)$

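The two mappings of the Galois connection can be sketched in a few lines. This is an illustrative sketch only: the class `FormalContext`, its method names, and the method-by-field table for a small 3D point class are assumptions made for this example, not the contents of the Pnt3D figure.

```java
import java.util.*;

/** Minimal sketch of a formal context and its two derivation operators. */
final class FormalContext {
    // relation.get(o) is the set of attributes related to object o.
    private final Map<String, Set<String>> relation = new TreeMap<>();
    private final Set<String> attributes = new TreeSet<>();

    void relate(String object, String... attrs) {
        Set<String> row = relation.computeIfAbsent(object, k -> new TreeSet<>());
        row.addAll(Arrays.asList(attrs));
        attributes.addAll(row);
    }

    /** Attributes shared by every object in the set (all attributes if the set is empty). */
    Set<String> commonAttributes(Set<String> objects) {
        Set<String> result = new TreeSet<>(attributes);
        for (String o : objects) {
            result.retainAll(relation.getOrDefault(o, Collections.emptySet()));
        }
        return result;
    }

    /** Objects that have every attribute in the set (all objects if the set is empty). */
    Set<String> commonObjects(Set<String> attrs) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : relation.entrySet()) {
            if (e.getValue().containsAll(attrs)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** A pair is a concept iff each set is exactly the derivation of the other. */
    boolean isConcept(Set<String> objects, Set<String> attrs) {
        return commonAttributes(objects).equals(attrs)
            && commonObjects(attrs).equals(objects);
    }

    /** Hypothetical table: objects are methods, attributes are the fields they use. */
    static FormalContext samplePoint() {
        FormalContext c = new FormalContext();
        c.relate("getX", "x");
        c.relate("setX", "x");
        c.relate("getY", "y");
        c.relate("getZ", "z");
        c.relate("norm", "x", "y", "z");
        return c;
    }

    public static void main(String[] args) {
        FormalContext c = samplePoint();
        System.out.println(c.commonAttributes(Set.of("getX", "norm")));
        System.out.println(c.commonObjects(Set.of("x")));
    }
}
```

In this table, the methods `getX`, `setX`, and `norm` together with the field `x` form a concept, whereas `getX` alone with `x` does not, because other methods also use `x`.
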
Formal Concept Analysis
- A concept lattice is based upon a partial order between concepts, where each concept is a pair $(X, Y)$ of an object set and an attribute set: $(X_1, Y_1) \le (X_2, Y_2) \iff X_1 \subseteq X_2 \iff Y_2 \subseteq Y_1$
- [Figure: Concepts of the Pnt3D class]

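As a rough illustration of how concepts and their order can be derived from a context, the sketch below enumerates all attribute sets that arise as intersections of object rows, pairs each with the objects that have all of those attributes, and compares concepts by inclusion of their object sets. It is a brute-force approach suitable only for small tables; the class `ConceptLattice` and the sample data are hypothetical and do not reproduce the Pnt3D figure.

```java
import java.util.*;

/** Brute-force sketch: enumerate the concepts of a small context and compare them. */
final class ConceptLattice {
    /** A concept pairs a set of objects (extent) with the attributes they share (intent). */
    static final class Concept {
        final Set<String> extent;
        final Set<String> intent;

        Concept(Set<String> extent, Set<String> intent) {
            this.extent = extent;
            this.intent = intent;
        }

        /** this <= other iff this concept's objects are all among the other's objects. */
        boolean isSubConceptOf(Concept other) {
            return other.extent.containsAll(extent);
        }

        @Override public String toString() {
            return extent + " / " + intent;
        }
    }

    /** Intents are the full attribute set plus every intersection of object rows. */
    static List<Concept> concepts(Map<String, Set<String>> rows) {
        Set<String> all = new TreeSet<>();
        rows.values().forEach(all::addAll);
        Set<Set<String>> intents = new LinkedHashSet<>();
        intents.add(all);
        for (Set<String> row : rows.values()) {
            for (Set<String> intent : new ArrayList<>(intents)) {
                Set<String> meet = new TreeSet<>(intent);
                meet.retainAll(row);
                intents.add(meet);
            }
        }
        List<Concept> result = new ArrayList<>();
        for (Set<String> intent : intents) {
            Set<String> extent = new TreeSet<>();
            rows.forEach((object, row) -> {
                if (row.containsAll(intent)) {
                    extent.add(object);
                }
            });
            result.add(new Concept(extent, intent));
        }
        return result;
    }

    /** Hypothetical method-to-fields table for a small 3D point class. */
    static Map<String, Set<String>> samplePoint() {
        Map<String, Set<String>> rows = new TreeMap<>();
        rows.put("getX", Set.of("x"));
        rows.put("setX", Set.of("x"));
        rows.put("getY", Set.of("y"));
        rows.put("getZ", Set.of("z"));
        rows.put("norm", Set.of("x", "y", "z"));
        return rows;
    }

    public static void main(String[] args) {
        for (Concept c : concepts(samplePoint())) {
            System.out.println(c);
        }
    }
}
```

For the sample table this yields five concepts: one per field, a top concept containing every method, and a bottom concept containing only `norm`, which uses all three fields.
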
Concept Lattices
- A sparse concept lattice provides an alternate view of the tabular context and the full concept lattice
  - Each concept is a group of objects which have the same attributes
  - The attributes are the union of the attributes in that concept and in all the concepts that it dominates
- In our case, methods that use the same fields are clustered together
  - Reveals structure and asymmetries

Interface Visualization
- The lattice partitions the methods in the interface into equivalence classes
  - Similar methods are heuristically clustered together.
  - An automatic “feature categorization”
- Lattice provides multidimensional connections
  - Compare with simple lexical lists of methods
- (Note: class is “flattened” to remove inheritance details)

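The partition into equivalence classes can be sketched by grouping methods on the exact set of fields they use. The class `MethodClusters` and the sample table are hypothetical, and the sketch assumes the field-use table has already been extracted by some static analysis.

```java
import java.util.*;

/** Sketch: cluster methods that use exactly the same set of fields. */
final class MethodClusters {
    /** Maps each distinct field set to the methods that use exactly that set. */
    static Map<Set<String>, List<String>> byFieldSet(Map<String, Set<String>> fieldsUsedBy) {
        Map<Set<String>, List<String>> clusters = new LinkedHashMap<>();
        // Visit methods in name order so that each cluster is listed alphabetically.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(fieldsUsedBy).entrySet()) {
            Set<String> key = new TreeSet<>(e.getValue());
            clusters.computeIfAbsent(key, k -> new ArrayList<>()).add(e.getKey());
        }
        return clusters;
    }

    /** Hypothetical method-to-fields table for a small 3D point class. */
    static Map<String, Set<String>> samplePoint() {
        Map<String, Set<String>> rows = new HashMap<>();
        rows.put("getX", Set.of("x"));
        rows.put("setX", Set.of("x"));
        rows.put("getY", Set.of("y"));
        rows.put("norm", Set.of("x", "y", "z"));
        return rows;
    }

    public static void main(String[] args) {
        byFieldSet(samplePoint()).forEach(
            (fields, methods) -> System.out.println(fields + " -> " + methods));
    }
}
```

Compared with an alphabetical method list, this grouping places `getX` next to `setX` because both use only `x`, regardless of where their names sort.
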
Interface Visualization
- To be effective, multiple methods should appear in each concept, on average
- A lattice can have up to $n = 2^{\min(|M|,|F|)}$ concepts, where $M$ is the set of methods and $F$ the set of fields
- In a data set of circa 6000 classes:
  - In 99.5%, $n < |M| + |F|$
  - In 77.4%, $n < |M|$
- [Figure: Concepts vs. methods in Eclipse]

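The upper bound follows because a concept is determined by its method set alone and also by its field set alone, so there cannot be more concepts than subsets of the smaller of the two sets. A small sketch of the bound is below; the class `LatticeSizeBound` and the numbers used are made up for illustration and are not measurements from the data set.

```java
import java.math.BigInteger;

/** Sketch: worst-case number of concepts for a class with the given counts. */
final class LatticeSizeBound {
    /** Returns 2 raised to min(methods, fields), the theoretical maximum lattice size. */
    static BigInteger maxConcepts(int methods, int fields) {
        return BigInteger.ONE.shiftLeft(Math.min(methods, fields));
    }

    /** True if an observed concept count is below the number of methods. */
    static boolean fewerConceptsThanMethods(int concepts, int methods) {
        return concepts < methods;
    }

    public static void main(String[] args) {
        // Hypothetical class with 20 methods and 3 fields.
        System.out.println(maxConcepts(20, 3));
        System.out.println(fewerConceptsThanMethods(6, 20));
    }
}
```

When a class has few fields, the bound stays small even for wide interfaces, which is the situation in which several methods per concept can be expected.
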
Case Study
- The Molecule class from CDK
  - CDK: Chemistry Development Kit
  - Open source library of chemistry-related classes
  - Developed at the Max Planck Institute in Germany
  - Used in chemistry visualization applications
- Why the Molecule class?
  - Has a large interface (nearly 75 public members)
  - The represented entity is familiar to most people
  - Our technique revealed new errors in this class.

Case Study
- Lattice structure hints at class structure
- A lot of independent operations on the left.
  - Similar to a C struct.
- Cohesive component on the right.

Interface Visualization
- Multiple methods with similar signatures indicate possible repetition.
- Inconsistency in naming.
- Inconsistencies in return types.
- Because related methods are grouped in concepts, we can notice inconsistencies or repetitions

Investigate Implementation
- We examine fields and dependencies between concepts to understand the cohesive component
  - Collections of atoms and bonds
  - Micro-management of arrays (count field tracks available items)
  - Inconsistencies and broken invariants.

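To show what an array paired with a count field looks like, and how its invariant can break, here is a hypothetical container written for this example. It is not code from CDK; the class `AtomBag` and all of its members are assumptions. The buggy removal touches only the array, so in a field-use lattice it would land in a different concept than the correct removal, which touches both the array and the count.

```java
import java.util.Arrays;

/** Hypothetical container: an array whose used prefix is tracked by a count field. */
final class AtomBag {
    private String[] atoms = new String[2];
    private int atomCount = 0;

    /** Appends an atom, growing the array when it is full. Uses both fields. */
    void addAtom(String atom) {
        if (atomCount == atoms.length) {
            atoms = Arrays.copyOf(atoms, atoms.length * 2);
        }
        atoms[atomCount++] = atom;
    }

    /** Correct removal: shifts the tail left and updates the count. Uses both fields. */
    void removeAtom(int index) {
        if (index < 0 || index >= atomCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        System.arraycopy(atoms, index + 1, atoms, index, atomCount - index - 1);
        atoms[--atomCount] = null;
    }

    /** Deliberately buggy removal: clears the slot but never touches atomCount. */
    void removeAtomBuggy(int index) {
        atoms[index] = null;
    }

    /** Invariant: slots below atomCount are filled, slots at or above it are empty. */
    boolean invariantHolds() {
        for (int i = 0; i < atoms.length; i++) {
            if ((i < atomCount) != (atoms[i] != null)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomBag bag = new AtomBag();
        bag.addAtom("H");
        bag.addAtom("C");
        bag.removeAtomBuggy(0);
        System.out.println(bag.invariantHolds());
    }
}
```
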
Investigate Implementation
- Asymmetries are revealed by examining pairs of related concepts.

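One way such an asymmetry check might be mechanized is sketched below. It assumes that two related concepts hold parallel operations whose names differ only in a noun, such as Atom versus Bond, and reports operations present in one concept with no counterpart in the other. The name-substitution heuristic, the class `ConceptAsymmetry`, and the method names are assumptions for illustration, not the procedure used in the talk.

```java
import java.util.*;

/** Sketch: compare two related concepts and list operations missing from the second. */
final class ConceptAsymmetry {
    /**
     * Rewrites each method name of the first concept by replacing nounA with nounB,
     * then returns the rewritten names that the second concept does not contain.
     */
    static Set<String> missingCounterparts(Collection<String> methodsA, String nounA,
                                           Collection<String> methodsB, String nounB) {
        Set<String> expected = new TreeSet<>();
        for (String method : methodsA) {
            expected.add(method.replace(nounA, nounB));
        }
        expected.removeAll(methodsB);
        return expected;
    }

    public static void main(String[] args) {
        // Hypothetical concepts: one around an atom array, one around a bond array.
        List<String> atomOps = List.of("addAtom", "removeAtom", "getAtomCount");
        List<String> bondOps = List.of("addBond", "getBondCount");
        System.out.println(missingCounterparts(atomOps, "Atom", bondOps, "Bond"));
    }
}
```

A non-empty result is only a hint that deserves a look; the missing operation may be intentional.
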
Embedded Call Graph
- A concept lattice clusters methods but does not portray interactions
- Call graphs show interactions between methods, but their layout does not depend on semantics
- Embedded call graph combines the two

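A minimal sketch of the combination, under the assumption that each method has already been assigned to a concept: method-level call edges are lifted to edges between the concepts that contain the caller and the callee, so they can be drawn on top of the lattice layout. The class `EmbeddedCallGraph`, the concept labels, and the call data are hypothetical.

```java
import java.util.*;

/** Sketch: project method-level call edges onto the concepts that hold the methods. */
final class EmbeddedCallGraph {
    /**
     * For every call from a method in concept P to a method in concept Q, records
     * an edge P -> Q. Calls to methods without a concept (for example, methods of
     * other classes) are skipped. Edges with P equal to Q are calls inside a concept.
     */
    static Map<String, Set<String>> conceptEdges(Map<String, String> conceptOf,
                                                 Map<String, List<String>> calls) {
        Map<String, Set<String>> edges = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : calls.entrySet()) {
            String from = conceptOf.get(e.getKey());
            for (String callee : e.getValue()) {
                String to = conceptOf.get(callee);
                if (from != null && to != null) {
                    edges.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, String> conceptOf = new HashMap<>();
        conceptOf.put("addAtom", "atoms");
        conceptOf.put("addBond", "bonds");
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("addBond", List.of("addAtom"));
        System.out.println(conceptEdges(conceptOf, calls));
    }
}
```
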
Code Inspection
- Lattice can help us select a reading order
  - Minimize focus shifts.
  - Similar methods are read consecutively.
- We define a global order between concepts.
  - e.g., each component separately, topological ordering, read by order of layers.
- We define a local order between methods in each concept.
  - e.g., topological ordering, read by order of simplicity, etc.

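One possible combination of a global and a local order is sketched below. Globally, methods are layered by the number of fields they use, which is consistent with the lattice order because a strictly larger field set always has more elements. Methods with the same field set are kept together, and within such a group shorter methods come first as a stand-in for simplicity. The class `ReadingOrder`, the line counts, and the choice of tie-breakers are assumptions for this example.

```java
import java.util.*;

/** Sketch: derive a reading order from field use (global) and method size (local). */
final class ReadingOrder {
    static List<String> order(Map<String, Set<String>> fieldsUsedBy,
                              Map<String, Integer> linesOfCode) {
        List<String> methods = new ArrayList<>(fieldsUsedBy.keySet());
        methods.sort((a, b) -> {
            Set<String> fa = fieldsUsedBy.get(a);
            Set<String> fb = fieldsUsedBy.get(b);
            // Global order: methods using fewer fields are read first.
            int byLayer = Integer.compare(fa.size(), fb.size());
            if (byLayer != 0) {
                return byLayer;
            }
            // Keep methods with the same field set next to each other.
            int byGroup = new TreeSet<>(fa).toString().compareTo(new TreeSet<>(fb).toString());
            if (byGroup != 0) {
                return byGroup;
            }
            // Local order: shorter methods first, then by name for determinism.
            int bySize = Integer.compare(linesOfCode.getOrDefault(a, 0),
                                         linesOfCode.getOrDefault(b, 0));
            return bySize != 0 ? bySize : a.compareTo(b);
        });
        return methods;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fields = new HashMap<>();
        fields.put("norm", Set.of("x", "y", "z"));
        fields.put("setX", Set.of("x"));
        fields.put("getX", Set.of("x"));
        Map<String, Integer> loc = new HashMap<>();
        loc.put("getX", 1);
        loc.put("setX", 3);
        loc.put("norm", 2);
        System.out.println(order(fields, loc));
    }
}
```
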
Tooling Support
- Batch-mode prototype
  - Produces lattices and metrics
  - Database support for metrics and statistics research
- Interactive Eclipse plug-in prototype
  - Adds an additional view for .java files
  - Uses a simplistic external static analyzer.
  - Limited by the current 2D capabilities of Eclipse.

Research Directions
- Conduct user studies to validate methodology
  - Preliminary user studies provided good feedback
- Lattice-based metrics suite
- Application to class design in CASE tools
  - Interactive class diagram editor based on concept lattice
  - Semantics assigned by connecting methods to fields. Compare with simply adding methods to a list as in current tools.

Research Directions
- Class-wide “diffing”
  - Provide bird's-eye view of changed areas.
- Example: Differences between the original version of the “Graph” class of VGJ (Visualizing Graphs with Java) and the Technion adaptation of that class. The original appears in bold font; modifications appear in plain font.

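A class-wide diff of this kind could start from the two method-to-fields tables. The sketch below reports methods that were added, removed, or whose field use changed between two versions. The class `ContextDiff`, the marker characters, and the sample data are hypothetical and unrelated to the VGJ example.

```java
import java.util.*;

/** Sketch: compare the field use of two versions of a class, method by method. */
final class ContextDiff {
    /** Returns "+ m" for added, "- m" for removed, "~ m" for changed field use. */
    static List<String> diff(Map<String, Set<String>> before,
                             Map<String, Set<String>> after) {
        Set<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        List<String> changes = new ArrayList<>();
        for (String method : names) {
            Set<String> oldFields = before.get(method);
            Set<String> newFields = after.get(method);
            if (oldFields == null) {
                changes.add("+ " + method);
            } else if (newFields == null) {
                changes.add("- " + method);
            } else if (!oldFields.equals(newFields)) {
                changes.add("~ " + method);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> v1 = new HashMap<>();
        v1.put("addNode", Set.of("nodes"));
        v1.put("clear", Set.of("nodes"));
        Map<String, Set<String>> v2 = new HashMap<>();
        v2.put("addNode", Set.of("nodes", "nodeCount"));
        v2.put("size", Set.of("nodeCount"));
        System.out.println(diff(v1, v2));
    }
}
```

Methods marked as changed are those that would move to a different concept in the lattice of the newer version.
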
Backup Material
Graph Class