Class Analysis with Concept Lattices
Uri Dekel
Department of Computer Science
Technion, Haifa, Israel

2 Outline
- Introduction
- Formal Concept Analysis
- Stage I – Interface Analysis
- Stage II – Implementation Analysis
- Stage III – Code Inspection
- Version Comparison
- Overview of other FCA applications

3 Domain
Understanding and analyzing individual Java classes:
- Interface (black-box) analysis
  - Reducing the learning curve
  - Discovering interface problems
- Implementation (white-box) analysis
  - Understanding class structure and the role of fields
  - Discovering implementation problems
- Code review and inspection
  - Understanding the purpose of each method from its code
  - Ensuring style, quality, and correctness
  - Discovering code reuse opportunities
- Version comparison

4 Problems
- Classes can be very large and complex
  - OOP practices promote the use of many methods
  - Meyer's "shopping list approach" advocates completing the interface with "syntactic-sugar" methods
  - "Rules of software evolution": the entropy of software artifacts increases with time
- Delocalisation
- Definition order is not meaningful
- Fact: a quarter of all public methods are found in classes with more than 100 methods!

5 Research Question
Can Formal Concept Analysis (FCA) help alleviate some of these problems?
- FCA is a mathematical classification technique
  - Helps discover meaningful data in binary relations
  - Can be visualized with concept lattices
- FCA has been applied to many CS and software engineering problems:
  - Automatic modularization
  - Automatic construction and refinement of class hierarchies
  - Reverse engineering of complex systems
  - Smart component repositories

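6 Formal Concept Analysis
- Input: a context $(O, A, R)$
  - $O$ is a set of objects
  - $A$ is a set of attributes
  - $R \subseteq O \times A$ is a binary relation between $O$ and $A$
- Mapping: the Galois connection
  - Common attributes of a set of objects $X \subseteq O$: $\sigma(X) = \{a \in A \mid \forall o \in X : (o, a) \in R\}$
  - Common objects of a set of attributes $Y \subseteq A$: $\tau(Y) = \{o \in O \mid \forall a \in Y : (o, a) \in R\}$
- Output: concepts $(X, Y)$ with $X \subseteq O$ and $Y \subseteq A$ such that $\sigma(X) = Y$ and $\tau(Y) = X$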
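The definitions above can be exercised directly on a tiny context. This Python sketch uses a made-up context: objects `o1` to `o3` and attributes `a` to `c` are hypothetical. Here σ returns the attributes shared by a set of objects, and τ returns the objects having a set of attributes. Concepts are enumerated by brute force, using the fact that every intent has the form σ(X). The enumeration is exponential and only meant for illustration. It is not how an FCA tool would build a lattice.

```python
from itertools import combinations

# Hypothetical toy context (O, A, R).
O = {"o1", "o2", "o3"}
A = {"a", "b", "c"}
R = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"), ("o3", "b")}


def sigma(X):
    """Attributes shared by every object in X (all of A when X is empty)."""
    return frozenset(a for a in A if all((o, a) in R for o in X))


def tau(Y):
    """Objects that have every attribute in Y (all of O when Y is empty)."""
    return frozenset(o for o in O if all((o, a) in R for a in Y))


def concepts():
    """Return all concepts (X, Y) with sigma(X) == Y and tau(Y) == X.

    For any X, Y = sigma(X) is an intent and tau(Y) is its extent, so closing
    every subset of O finds every concept. Exponential: tiny contexts only.
    """
    found = set()
    for k in range(len(O) + 1):
        for X in combinations(sorted(O), k):
            Y = sigma(X)
            found.add((tau(Y), Y))
    return found


for extent, intent in sorted(concepts(), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

The pair returned for each subset is always a concept because σ(τ(σ(X))) = σ(X) holds for any Galois connection.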

7 FCA Example
Field-accesses context of a class:
- Objects are fields, attributes are methods
- The relation specifies which methods access each field
[Figure: the context table and the resulting concepts]
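A field-accesses context can be derived from a map of the fields each method touches. This sketch assumes such a map is already available. The `accesses` dictionary and its member names are hypothetical. The code builds the relation with fields as objects and methods as attributes, then renders it as a cross table.

```python
# Hypothetical method -> accessed-fields map for a small class.
accesses = {
    "getX": {"x"},
    "setX": {"x"},
    "getY": {"y"},
    "distance": {"x", "y"},
    "getName": {"name"},
}


def field_access_context(accesses):
    """Objects are fields, attributes are methods, and (f, m) is in the
    relation when method m accesses field f."""
    fields = sorted(set().union(*accesses.values()))
    methods = sorted(accesses)
    relation = {(f, m) for m, fs in accesses.items() for f in fs}
    return fields, methods, relation


def render_table(fields, methods, relation):
    """Render the context as a cross table: one row per field, one column per method."""
    width = max(len(f) for f in fields)
    rows = [" " * width + " | " + " ".join(methods)]
    for f in fields:
        cells = " ".join(("X" if (f, m) in relation else ".").center(len(m)) for m in methods)
        rows.append(f.ljust(width) + " | " + cells)
    return "\n".join(rows)


print(render_table(*field_access_context(accesses)))
```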

8 Concept Lattices
- Partial order: $(X_1, Y_1) \le (X_2, Y_2) \iff X_1 \subseteq X_2$ (equivalently, $Y_2 \subseteq Y_1$)
  - Defines domination between concepts
- Visualized as a concept lattice
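The order only needs the extents; the intents are ordered the other way round. As a sketch, the concepts of a small hypothetical context are listed by hand below. `covers` computes the edges a lattice drawing would show (the Hasse diagram) by keeping only comparable pairs with no concept strictly between them.

```python
# Concepts of a small hypothetical context, as (extent, intent) pairs.
C = [
    (frozenset(), frozenset({"a", "b", "c"})),            # bottom
    (frozenset({"o1"}), frozenset({"a", "b"})),
    (frozenset({"o2"}), frozenset({"b", "c"})),
    (frozenset({"o1", "o2", "o3"}), frozenset({"b"})),     # top
]


def leq(c1, c2):
    """(X1, Y1) <= (X2, Y2) iff X1 is a subset of X2."""
    return c1[0] <= c2[0]


def covers(concepts):
    """Index pairs (i, j) with concepts[i] < concepts[j] and nothing in between."""
    n = len(concepts)
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j or not leq(concepts[i], concepts[j]):
                continue
            between = any(
                k not in (i, j)
                and leq(concepts[i], concepts[k])
                and leq(concepts[k], concepts[j])
                for k in range(n)
            )
            if not between:
                edges.append((i, j))
    return edges


print(covers(C))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```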

9 Interpreting Class Lattices
- We use only sparse lattices
  - An economical but equivalent representation
  - Each object is introduced in the lowest concept that contains it
  - Each attribute is introduced in the highest concept that contains it
- Interpretation: each method uses all fields introduced in the same concept or below
- Reveals:
  - Possible restructuring
  - Asymmetry between coordinates
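The sparse labelling can be computed from the context alone. Let σ give the methods common to a set of fields and τ the fields common to a set of methods. A field f is then placed at its object concept (τ(σ({f})), σ({f})), and a method m at its attribute concept (τ({m}), σ(τ({m}))). The sketch below uses a hypothetical `accesses` map. It checks the reading rule from the slide: a method uses a field exactly when the field's concept lies at or below the method's concept.

```python
# Hypothetical field-accesses context: method -> fields it accesses.
accesses = {
    "getX": {"x"},
    "setX": {"x"},
    "getY": {"y"},
    "distance": {"x", "y"},
    "reset": {"x", "y", "name"},
}
FIELDS = frozenset().union(*accesses.values())  # objects
METHODS = frozenset(accesses)                    # attributes


def sigma(fields):
    """Methods that access every field in `fields`."""
    return frozenset(m for m in METHODS if set(fields) <= accesses[m])


def tau(methods):
    """Fields accessed by every method in `methods`."""
    return frozenset(f for f in FIELDS if all(f in accesses[m] for m in methods))


def object_concept(f):
    """Lowest concept whose extent contains field f."""
    intent = sigma({f})
    return (tau(intent), intent)


def attribute_concept(m):
    """Highest concept whose intent contains method m."""
    extent = tau({m})
    return (extent, sigma(extent))


def sparse_labels():
    """Map each labelled concept to [introduced fields, introduced methods]."""
    labels = {}
    for f in FIELDS:
        labels.setdefault(object_concept(f), [set(), set()])[0].add(f)
    for m in METHODS:
        labels.setdefault(attribute_concept(m), [set(), set()])[1].add(m)
    return labels


def uses(m, f):
    """Reading rule: m uses f iff f's concept is at or below m's concept."""
    return object_concept(f)[0] <= attribute_concept(m)[0]


for (extent, _), (fs, ms) in sparse_labels().items():
    print(sorted(extent), "introduces fields", sorted(fs), "and methods", sorted(ms))
```

In this example `name` ends up in the top concept together with `reset`, because the only method touching it uses the whole state.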

10 Field-Accesses Context
- Field usage is critical for understanding a class
  - All implementations of an operation use the same fields
  - Representation changes are rare
  - Methods that use the same combination of fields are related
- Can be calculated directly from the .class file
  - Allows some reverse engineering without source code
  - Calculated using standard static analysis
  - Currently restricted to accesses inside the class

11 Lattices vs. Tables
- The lattice and the accesses table contain exactly the same information!
- Advantages of the table:
  - It is immediately clear which fields are accessed by each method
- Advantages of the lattice:
  - Related methods appear together
  - Makes it easier to:
    - Discover what exactly each method does
    - Discover duplicate methods
    - Find inconsistencies
    - Determine the level of abstraction
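One way the lattice surfaces duplicate candidates is that methods with identical field-access sets are introduced in the same concept. This is a minimal sketch of that grouping, and the method and field names are hypothetical. Identical access sets only flag methods for comparison; they do not prove duplication.

```python
from collections import defaultdict


def same_concept_groups(accesses):
    """Group methods whose field-access sets are identical.

    Such methods are introduced in the same concept of the field-accesses
    lattice, so each group with two or more members is a candidate for a
    closer look. Matching access sets do not by themselves prove duplication.
    """
    groups = defaultdict(list)
    for method, fields in accesses.items():
        groups[frozenset(fields)].append(method)
    return {fs: sorted(ms) for fs, ms in groups.items() if len(ms) > 1}


# Hypothetical members of a container-like class.
example = {
    "size": {"count"},
    "length": {"count"},
    "get": {"items", "count"},
    "getTitle": {"title"},
}
print(same_concept_groups(example))
```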

12 Graph example
[Figure: accesses tables (only a part is visible)]

13 Graph example (cont.)

14 Class Assignment
Try to find as many problems as possible in the Molecule class. Examples:
- Duplicate methods
- Different methods that do the same thing (not composites!)
- Inconsistencies in types and names between methods
- Asymmetries in the interface
- Invariants that are violated
- Methods which do not access the fields you expect them to
Assume that:
- All methods are documented
- Some methods declare and throw exceptions

15 Zoom-in/Zoom-out Approach
Problems:
- Concept lattices can be very large
  - The number of concepts is bounded by $2^{\min(|O|, |A|)}$
  - Polynomial for most real-life contexts
  - Linear for 99.5% of classes!
- Elaborate member details are cumbersome
Solution: provide (semi-)automatic zoom-in/zoom-out tools
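The exponential bound is attained by the contranominal context. In that context, object i has every attribute except attribute i, so every attribute subset is an intent. The brute-force counter below is for tiny contexts only. It contrasts that worst case with a record-like context in which each method touches exactly one field. Both helper contexts are illustrative constructions, not measurements of real classes.

```python
from itertools import combinations


def count_concepts(objects, attributes, rel):
    """Count concepts by brute force: each intent is sigma(X) for some X."""
    def sigma(X):
        return frozenset(a for a in attributes if all((o, a) in rel for o in X))

    intents = {sigma(X) for k in range(len(objects) + 1)
               for X in combinations(objects, k)}
    return len(intents)


def contranominal(n):
    """Worst case: object i has every attribute except attribute i."""
    items = list(range(n))
    return items, items, {(o, a) for o in items for a in items if o != a}


def record_like(n):
    """Each method (attribute) touches exactly one field (object)."""
    items = list(range(n))
    return items, items, {(i, i) for i in items}


for n in range(1, 6):
    print(n, count_concepts(*contranominal(n)), count_concepts(*record_like(n)))
```

The first count doubles with each extra member (2^n). The second grows linearly, with n + 2 concepts for n ≥ 2.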

16 Running Example
The Molecule class from CDK
- CDK: Chemistry Development Kit
  - Open-source library of chemistry-related classes
  - Developed at the Max Planck Institute in Germany
  - Used in chemistry visualization applications
- Why the Molecule class?
  - It has a large interface (nearly 75 public members)
  - The represented entity is familiar to most people
- The methodology was successfully applied to other classes as well
- Our methodology revealed several new bugs and issues!

Stage I: Interface Analysis
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the universe is winning…" --Rich Cook
"There are only two industries that refer to their customers as 'users'…" --Edward Tufte

18 Interface Analysis
Purpose:
- Understand the functionality provided by the class
  - Map expectations into interface members
  - The "concept assignment" or "feature mapping" problems
- Discover problems, e.g. missing or superfluous functionality, exposed implementation details, inconsistent naming
Methodology:
- Methods are partitioned into concepts
- Heuristic for automatic feature categorization
- Zoom out and reason about the overall structure
- Zoom in and examine specific functionalities

19 Preliminaries
Mapping features to interface members requires knowing what the features are.
Tasks:
- Surmising abstraction, purpose and role
- Determining vocabulary
- Predicting mandatory and non-mandatory functionality
Information sources:
- Domain-specific knowledge
- Class environment, e.g. hierarchy, dependencies, etc.
This step is not unique to concept analysis.

20 Context Selection
- Only client-visible methods should be used
  - Public methods by default, protected methods if the client is a subclass, package-private (default) methods if the client is in the same package
- All fields are kept to ensure a correct partitioning
  - They will be removed after the lattice is constructed
- Context parameters: [Table: bold indicates our selection, Φ represents "don't care"]

21 Constructing the Lattice
- The lattice is too cluttered to grasp immediately
- We start zooming out
- Layers correspond to levels of abstraction

22 Simplifying Concepts
- We summarize the responsibilities of each concept in a quick skim over method signatures
- This process cannot be fully automated at present
- Still too cluttered!

23 Naming Concepts
- Name concepts based on their summary
- Use symbolic representations for common responsibilities

24 Horizontal Decomposition
- Remove the top and bottom concepts
- Connected components are orthogonal
- The problem with title (on the right) becomes obvious
- An abundance of trivial components implies record-like behavior
- The cohesive component requires further analysis
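A rough way to preview the horizontal decomposition without drawing the lattice is to work on the context. First, set aside methods that access every field or no field, since these sit at the top and bottom of the lattice. Then group the remaining fields and methods into connected components of the field/method graph. This is a sketch under that assumption rather than a lattice-based algorithm, and the member names are hypothetical.

```python
from collections import defaultdict


def horizontal_components(accesses):
    """Group fields and methods into independent parts of a class.

    Methods touching every field (top concept) or no field (bottom concept)
    are set aside; the rest are split into connected components of the
    bipartite field/method graph using a small union-find over fields.
    """
    all_fields = set().union(*accesses.values())
    kept = {m: fs for m, fs in accesses.items() if fs and fs != all_fields}

    parent = {f: f for f in all_fields}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for fs in kept.values():
        first, *rest = sorted(fs)
        for f in rest:
            parent[find(f)] = find(first)

    comps = defaultdict(lambda: (set(), set()))
    for f in all_fields:
        comps[find(f)][0].add(f)
    for m, fs in kept.items():
        comps[find(next(iter(fs)))][1].add(m)
    return sorted((sorted(fs), sorted(ms)) for fs, ms in comps.values())


# Hypothetical class: a title plus a 2D position, and an equals over everything.
example = {
    "getTitle": {"title"},
    "setTitle": {"title"},
    "getX": {"x"},
    "getY": {"y"},
    "move": {"x", "y"},
    "equals": {"title", "x", "y"},
}
print(horizontal_components(example))
```

A field that no remaining method touches shows up as a component without methods, which is worth inspecting like an unused field.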

25 Abstraction Lattice
- Heuristic for clustering concepts
- Concepts dominated by the same top-layer concepts belong in the same cluster
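The clustering heuristic can be sketched on concept extents alone, since domination is extent inclusion. The code assumes the input lists the extent of every concept, including the top and bottom concepts. After those two are removed, it treats the maximal remaining concepts as the top layer. The extents in the example are hypothetical.

```python
from collections import defaultdict


def abstraction_clusters(extents):
    """Cluster concepts by the top-layer concepts that dominate them.

    `extents` holds the extent (a frozenset) of every concept, top and bottom
    included; c1 is dominated by c2 when extent(c1) <= extent(c2). The top
    layer is taken to be the maximal concepts left after removing top and bottom.
    """
    top = max(extents, key=len)
    bottom = min(extents, key=len)
    rest = [e for e in extents if e not in (top, bottom)]
    layer = [e for e in rest if not any(e < other for other in rest)]
    clusters = defaultdict(list)
    for e in rest:
        key = frozenset(t for t in layer if e <= t)
        clusters[key].append(e)
    return layer, dict(clusters)


# Hypothetical extents (sets of fields) of a small lattice.
extents = [frozenset(s) for s in
           ({"x", "y", "t"}, {"x", "y"}, {"t"}, {"x"}, {"y"}, set())]
layer, clusters = abstraction_clusters(extents)
for key, members in clusters.items():
    print(sorted(map(sorted, key)), [sorted(m) for m in members])
```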

26 Match Services Against Expectations
- Functionality search order:
  - Expected mandatory features
  - Expected non-mandatory features
  - Unexpected features
- For each functionality:
  - Mark relevant clusters
  - Mark relevant concepts
  - Examine each concept
- Example: bond management

Stage II – Implementation Analysis
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." --C. A. R. Hoare

28 Implementation Analysis
Purpose:
- Understand implementation and structure
- Discover problems, e.g. redundant fields, bad naming conventions, wrongly implemented operations
Methodology:
- Code is not inspected at this stage! All information is derived from the lattice
- Zoom in:
  - Including private fields and methods
  - Listing full signatures and introducing classes
  - Embedded call graph

29 Embedded Call Graph
- Superposition of the call graph on the concept lattice
- A semantics-based call-graph layout heuristic
  - Keeps related methods together while reducing crossings
- Helps investigate relations between methods, e.g. surmise the level of abstraction or discover wrappers
- Used later for selecting an order for code inspection
- Example: ECG of Pnt3D

30 Investigate Fields
- Examine unused fields
  - Might indicate unimplemented stubs or dead structure
- Discover the roles of fields
  - Easy for trivial components
  - Harder for the cohesive one
- Investigate interdependency
- Naming quality

31 Investigate Special Methods
- Methods that (should) use the entire state should be in the top concept
  - Exceptions can indicate problems
  - Zoom in by adding declaring-class details
- Examine methods that do not use fields, e.g. to discover undeclared statics
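Both checks on this slide reduce to simple queries over the field-accesses map. In the sketch below, the set of "special" methods expected to use the entire state (`equals`, `hashCode`, `clone`) is an assumption to adapt per class. The field and method names are hypothetical. A method that accesses no fields is only a candidate for `static`, because it may still call instance methods.

```python
# Assumed set of methods expected to use the whole object state.
SPECIAL = {"equals", "hashCode", "clone"}


def check_special_methods(accesses, fields):
    """Return (special methods missing fields, field-free methods).

    `accesses` maps method names to the fields they access; `fields` is the
    full set of fields. Field-free methods are only candidates for `static`,
    since they may still call instance methods.
    """
    all_fields = set(fields)
    missing = {
        m: sorted(all_fields - set(fs))
        for m, fs in accesses.items()
        if m in SPECIAL and set(fs) != all_fields
    }
    static_candidates = sorted(m for m, fs in accesses.items() if not fs)
    return missing, static_candidates


# Hypothetical members; the field names are illustrative only.
fields = {"atoms", "bonds", "title"}
accesses = {
    "equals": {"atoms", "bonds"},
    "hashCode": {"atoms", "bonds", "title"},
    "defaultCapacity": set(),
    "getTitle": {"title"},
}
print(check_special_methods(accesses, fields))
```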

32 Investigate Other Methods
- Ensure symmetry where expected, e.g. C11 and C13, C10 and C14, C16 and C17
- Ensure methods use the expected access patterns
- Add non-public methods to the lattice

Stage III – Code Inspection
"Real programmers don't document. If it was hard to write, it should be hard to understand…" --Anonymous
"Real programmers can write assembly code in any language…" --Larry Wall

34 Code Inspection
Purpose:
- Understand functionality that is unclear after the previous stages
- Ensure quality of code and style
Methodology: select an order for effective reading
- Maximizing reading throughput
- Maximizing discovered defects
- Minimizing repetitions

35 Code Inspection Problem
The original source code order is not effective:
- Co-definitions: no incremental order
  - All class members are defined simultaneously
- Perturbations to the intended order
  - Evolution and maintenance
  - Language issues (e.g. inheritance)
  - Style issues (e.g. public before private)

36 Reading Strategy
- Organize methods into groups of related functionality and order these groups (global order)
- Order the methods inside each group (local order)
- Each concept is a group
  - Same-concept methods are similar in purpose, semantics and implementation
  - Increased prospects of understanding differences between methods and discovering redundancies and replications
  - Less infrastructure (e.g. external libraries) to memorize

37 Reading Strategy
Global order (by importance):
- Read each horizontal-decomposition (HD) component separately
  - Each represents an independent functionality
- Read concepts in ascending order of layers
  - Exploit a similar level of abstraction
- Read concepts of the same cluster together
Local order (by importance):
- Read methods in topological order
  - Use the restricted ECG
- Read methods in the same ECG component together
- Resolve equivalencies with the "simplest-first" rule
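The local order can be sketched with Python's `graphlib.TopologicalSorter`, available in Python 3.9+. The sketch assumes the intent is to read callees before their callers. When several methods are ready at once, the "simplest-first" rule picks the smallest. The call graph is assumed to be acyclic. Line counts as the simplicity measure and the method names are hypothetical choices.

```python
from graphlib import TopologicalSorter


def reading_order(calls, size):
    """Callees before callers; among ready methods, the simplest comes first.

    `calls` maps each method to the methods it calls (assumed acyclic, else
    graphlib raises CycleError); `size` is a per-method simplicity score.
    """
    ts = TopologicalSorter(calls)  # a node's predecessors are its callees
    ts.prepare()
    pool = list(ts.get_ready())
    order = []
    while pool:
        pool.sort(key=lambda m: (size.get(m, 0), m))
        m = pool.pop(0)
        order.append(m)
        ts.done(m)
        pool.extend(ts.get_ready())
    return order


# Hypothetical restricted call graph and line counts.
calls = {"addItem": ["size"], "removeItem": ["itemAt", "size"], "itemAt": [], "size": []}
lines = {"size": 1, "itemAt": 2, "addItem": 5, "removeItem": 6}
print(reading_order(calls, lines))  # ['size', 'itemAt', 'addItem', 'removeItem']
```

Picking one method at a time, rather than a whole ready batch, lets a newly unblocked simple method jump ahead of larger ones that were already waiting.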

38 Inspection Tasks
Inspection tasks customized for our reading order, in addition to "standard" inspection tasks:
- Finding duplicate services inside a concept, e.g. getDegree and getBondCount
- Identifying code-sharing opportunities, e.g. overloads of addBond
- Verifying that low-level methods are not bypassed, e.g. getBondCount, getBondAt

Version Comparison
"Zero defects: The result of shutting down a production line…" --Kelvin Throop III, "The Management Dictionary"

40 Version Comparison
- Examine an outline of the differences before the actual details
- Also useful for subclass/superclass comparisons
- Example: differences between the original version of the "Graph" class of VGJ (Visualizing Graphs with Java) and the Technion adaptation of that class. Originals appear in bold font; modifications appear in plain font.
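An outline of the differences can be produced before looking at any code by diffing the field-accesses maps of the two versions. The sketch assumes each version has already been reduced to a map from method name to accessed fields. The example names are hypothetical.

```python
def compare_versions(old, new):
    """Outline differences between two method -> accessed-fields maps.

    Returns (added methods, removed methods, changed methods), where each
    changed method maps to (fields no longer accessed, newly accessed fields).
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {
        m: (sorted(old[m] - new[m]), sorted(new[m] - old[m]))
        for m in sorted(old.keys() & new.keys())
        if old[m] != new[m]
    }
    return added, removed, changed


# Hypothetical original and adapted versions of a class.
original = {"addNode": {"nodes"}, "addEdge": {"nodes", "edges"}, "clear": {"nodes", "edges"}}
adapted = {"addNode": {"nodes"}, "addEdge": {"edges", "labels"}, "setLabel": {"labels"}}
print(compare_versions(original, adapted))
```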

Other applications of FCA

42 Hierarchy Construction
Godin and Mili (1993) classified Smalltalk classes:
- Objects: names of concrete collection classes
- Attributes: names of messages that these classes accept

43 Hierarchy Construction (cont.)
Output: a multiple-inheritance class hierarchy

44 Hierarchy Construction (cont.)
There are four types of concepts:
- Concrete concepts: introduce both attributes and objects
- Intersect concepts: introduce objects but no attributes
- Abstract concepts: introduce attributes but no objects
- Connector ("empty") concepts: do not introduce objects or attributes
  - Can be removed!
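The four kinds can be told apart from the sparse labels. A concept is concrete if it introduces both objects and attributes, intersect if it introduces only objects, abstract if only attributes, and connector if neither. The sketch builds the lattice for a few Smalltalk-like collection classes with simplified, hypothetical message sets. `IndexedBag` is invented to produce an intersect concept. This small context happens to contain no connectors.

```python
from itertools import combinations

# Simplified, hypothetical message sets; IndexedBag is invented for the example.
accepts = {
    "Set": {"add", "remove", "includes"},
    "Bag": {"add", "remove", "includes", "occurrencesOf"},
    "Array": {"at", "atPut", "includes"},
    "IndexedBag": {"add", "remove", "includes", "occurrencesOf", "at", "atPut"},
}
OBJS = sorted(accepts)
ATTRS = sorted(set().union(*accepts.values()))


def sigma(X):  # messages accepted by every class in X
    return frozenset(a for a in ATTRS if all(a in accepts[o] for o in X))


def tau(Y):  # classes accepting every message in Y
    return frozenset(o for o in OBJS if all(a in accepts[o] for a in Y))


def all_concepts():
    """Close every subset of classes (brute force, small contexts only)."""
    return {(tau(sigma(X)), sigma(X))
            for k in range(len(OBJS) + 1) for X in combinations(OBJS, k)}


def classify():
    """Label each concept by what it introduces."""
    obj_concepts = {(tau(sigma({o})), sigma({o})) for o in OBJS}
    attr_concepts = {(tau({a}), sigma(tau({a}))) for a in ATTRS}
    kinds = {}
    for c in all_concepts():
        has_o, has_a = c in obj_concepts, c in attr_concepts
        kinds[c] = ("concrete" if has_o and has_a else
                    "intersect" if has_o else
                    "abstract" if has_a else "connector")
    return kinds


for (extent, intent), kind in classify().items():
    print(kind, sorted(extent), sorted(intent))
```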

45 Hierarchy Construction (cont.)
[Figure: the hierarchy after removing connectors and naming abstract concepts]

46 Other Applications
- Modularizing legacy code
  - Objects: global variables
  - Attributes: functions
  - The lattice is horizontally decomposed, resulting in modules
- Managing component repositories
  - Objects: software components
  - Attributes: text-based properties or features
  - The lattice includes all search paths

The End
"Theory is when you know something, but it doesn't work. Practice is when something works, but you don't know why. Programming combines theory and practice: Nothing works and you don't know why…" --Anonymous