Object-Oriented Parsing and Transformation
Kenneth Baclawski, Northeastern University
Scott A. DeLoach, Air Force Institute of Technology
Mieczyslaw Kokar, Northeastern University
Jeffrey Smith, Northeastern University/Sanders
Why Formalize CASE Tools?
• Formal Methods
  – Provably correct software
  – Code generation
  – Specification refinement
  – Theorem proving
  – Specification and software composition
• CASE Tools
  – Uniform graphical interface
  – Modern SE methodologies
  – Reverse engineering
  – Large-scale development paradigm
The Problem
• Refinement is the process of transforming a specification into a more detailed specification.
• CASE tools commonly support OO analysis and design, but refinement is still based on grammars and parse trees.
Proposed Solution
• We introduce a toolkit for OO refinement and transformation.
• The toolkit also automates the generation of grammars and parsers when it is necessary to use linear (grammar-based) representations.
Examples: Web Documents
• Web Documents
  – An OO data model can be transformed automatically into an XML DTD.
  – An OO repository can be viewed as an XML document using a variety of “panoramas.”
  – The parser for the DTD can also be generated automatically.
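The first point above can be illustrated with a minimal sketch. It is not the toolkit's actual mapping: the `ClassDef` structure, the `to_dtd` function, and the rule "scalar attribute becomes an XML attribute, contained class becomes a repeatable child element" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassDef:
    """Hypothetical description of one class in an OO data model."""
    name: str
    attributes: List[str] = field(default_factory=list)  # scalar attributes
    parts: List[str] = field(default_factory=list)       # contained classes, 0..*


def to_dtd(classes: List[ClassDef]) -> str:
    """Emit one ELEMENT declaration per class and one ATTLIST per attribute."""
    lines = []
    for c in classes:
        if c.parts:
            # Each contained class becomes a repeatable child element.
            content = "(" + ", ".join(p + "*" for p in c.parts) + ")"
        else:
            content = "EMPTY"
        lines.append(f"<!ELEMENT {c.name} {content}>")
        for a in c.attributes:
            lines.append(f"<!ATTLIST {c.name} {a} CDATA #REQUIRED>")
    return "\n".join(lines)


# Illustrative model only; the class names are not taken from a real repository.
model = [
    ClassDef("Bank", parts=["Customer", "Account"]),
    ClassDef("Customer", attributes=["name"]),
    ClassDef("Account", attributes=["number", "balance"]),
]
print(to_dtd(model))
```

Containment maps cleanly onto DTD nesting; shared objects and many-to-many associations do not, which is the difficulty the later slides on grammars versus OO models describe.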
Examples: Natural Language
– Traditional NLP techniques involve a “pipeline” of linear scans of the text:
    1. Lexical scanning produces tokens.
    2. Tagging determines the part of speech of each term.
    3. Parsing determines a tree structure.
    4. Knowledge extraction maps the tree structure to a data model (usually a relational data model).
– OO transformation avoids the need to generate and parse intermediate linear representations.
Example: UML Formalization
• Formal methods can provide a foundation for specification and modeling.
• However, formal methods are regarded as difficult to learn and to use.
• Combining a CASE tool with a formal methods system would make formal methods more accessible and usable.
Theory-Based Object Model

UML Component            Meaning
sort                     collection of values
class type               structure of object and response to stimuli
class sort               all possible value representations of objects of the class
abstract class           class with no direct instances
concrete class           blueprint for instances
attribute                function that returns data values/objects; an observable class characteristic
object-valued attribute  class attribute whose sort is a set of objects
method                   function that modifies attribute values
operation                function that does not modify attribute values
axiom                    class attribute value invariant or specification of a function’s semantics
state attribute          function mapping from class to state sort
state sort               all possible states of an object
state invariant          constraint on class attribute in a given state
event                    function that invokes methods, generates events and modifies state attributes
Component Composition
An important feature of the theory-based object model is the ability to compose components using the colimit operation. The following diagram illustrates the use of the colimit for aggregation of account information for a customer of a bank.
[Diagram: colimit construction over the specifications Integer, Set, Acct-Class, Cust-Acct, Cust-Class and Bank, with three nodes marked C. Arrow labels as extracted: {E Acct, Set Acct-Class}, {E Customer, Set Cust-Class}, {E Account, Set Accounts}, {E Customer, Set Customers}, {E CA-Link, Set Cust-Acct}.]
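A rough sketch of the idea behind a colimit, restricted to the simplest case: a pushout of finite sets of sort names. The real operation works on whole theories (sorts, functions and axioms), so this only shows the "glue along the shared part" step. The function `pushout`, the reading of the arrow labels as renamings out of the Set specification, and the contents of the two component sets are all assumptions.

```python
def pushout(shared, f, g, left, right):
    """Pushout of left <-f- shared -g-> right for finite sets of names.

    Returns a set of equivalence classes over the tagged disjoint union
    of `left` and `right`, where f(s) and g(s) are identified for each s.
    """
    # Tag elements so that equal names in left and right stay distinct.
    parent = {("L", x): ("L", x) for x in left}
    parent.update({("R", y): ("R", y) for y in right})

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for s in shared:
        a, b = find(("L", f[s])), find(("R", g[s]))
        if a != b:
            parent[b] = a  # glue the two images of the shared symbol

    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(members) for members in classes.values()}


# Assumed reading of the diagram: Set has sorts E and Set, renamed into
# each component specification. The component contents are illustrative.
shared = {"E", "Set"}
f = {"E": "Acct", "Set": "Acct-Class"}
g = {"E": "Account", "Set": "Accounts"}
left = {"Acct", "Acct-Class", "Integer"}
right = {"Account", "Accounts", "CA-Link"}

for cls in sorted(pushout(shared, f, g, left, right), key=sorted):
    print(sorted(cls))
```

Sorts that are images of the same shared symbol end up in one class; everything else is carried over unchanged.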
Grammars versus OO Models
• Expressing an OO model in terms of a grammar is complex and awkward.
  – Many-to-many relationships require introducing artificial identifiers.
  – Object sharing in general requires identifiers.
  – A “focal point” must be chosen.
  – Web documents add the further complexity of choosing document boundaries.
Example
[Diagram: classes Student and Course joined by the association “takes”, with multiplicity * at each end.]
• Student as focal point: a list of students; each student has the list of courses that student is taking.
  – Is the course information replicated for each student, or is an identifier used?
  – Where does the information about the course get expressed?
• Course as focal point: a list of courses; each course has the list of students who are taking it.
  – Is the student information replicated for each course, or is an identifier used?
  – Where does the information about the student get expressed?
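The trade-off above can be sketched in a few lines. In the object model the association is just a pair of references. Once a focal point is chosen for a linear form, courses need artificial identifiers and a separate place to live. The names `Student`, `Course`, `enroll` and `linearize_by_student`, and the `c1`, `c2` identifier scheme, are hypothetical.

```python
class Student:
    def __init__(self, name):
        self.name = name
        self.courses = []   # direct object references, no identifiers


class Course:
    def __init__(self, title):
        self.title = title
        self.students = []


def enroll(student, course):
    """Record the many-to-many 'takes' association in both directions."""
    student.courses.append(course)
    course.students.append(student)


def linearize_by_student(students):
    """Student as focal point: courses are referenced by artificial ids."""
    course_ids = {}  # Course object -> generated identifier
    rows = []
    for s in students:
        refs = [course_ids.setdefault(c, f"c{len(course_ids) + 1}")
                for c in s.courses]
        rows.append({"name": s.name, "takes": refs})
    # Course details must live in a separate table keyed by the new ids.
    courses = {cid: c.title for c, cid in course_ids.items()}
    return {"courses": courses, "students": rows}


ann, bob = Student("Ann"), Student("Bob")
logic, algebra = Course("Logic"), Course("Algebra")
enroll(ann, logic)
enroll(ann, algebra)
enroll(bob, logic)
print(linearize_by_student([ann, bob]))
```

Choosing Course as the focal point would need the mirror-image function and a student identifier scheme, which is exactly the duplication of effort the slide points at.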
The Transformation Pipeline
• Refinement and transformation are usually modularized into a series of steps.
• In the grammar-based approach, each step communicates with the next using a linear representation, which requires a:
  – grammar
  – parser
  – symbol table
  – generator
Pipeline Example
Most of the effort in constructing such a pipeline is devoted to adapting to the needs of the grammar-based intermediate representations.
[Diagram: grammar-based pipeline. Labels as extracted: CASE Diagram, Intermediate Structure, Object Model Structure, Formal Methods System, Intermediate Structure, Export Format, Parse Tree, Executable Code, Object Model Language, Formal Methods Language, Programming Language, Intermediate Code.]
Simplifying the Pipeline
The nu& toolkit was introduced to simplify the transformational pipeline by specifying transformations directly on the OO data structures:
[Diagram: simplified pipeline. Labels as extracted: CASE Diagram, Intermediate Structure, Object Model Structure, Formal Methods Structure, Programming Language, Intermediate Structure.]
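A minimal sketch of what "transformations directly on the OO data structures" can look like. It does not use the nu& API: `UmlClass`, `Spec`, `class_to_spec`, `spec_to_code` and `compose` are hypothetical names. Each step takes an object and returns an object, so no grammar, parser, symbol table or generator is needed between steps; text appears only at the very end.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UmlClass:
    """Stand-in for an element of a CASE tool's object model."""
    name: str
    attributes: List[Tuple[str, str]]  # (attribute name, type name)


@dataclass
class Spec:
    """Stand-in for a formal-methods structure: one sort plus functions."""
    sort: str
    functions: List[Tuple[str, str, str]]  # (name, domain, codomain)


def class_to_spec(c: UmlClass) -> Spec:
    # Following the object-model table: a class yields a class sort, and
    # each attribute becomes a function from that sort to the value sort.
    return Spec(c.name, [(attr, c.name, typ) for attr, typ in c.attributes])


def spec_to_code(s: Spec) -> str:
    # Final step: only here is a linear (textual) representation produced.
    fields = [f"    {name} = None  # {dom} -> {cod}"
              for name, dom, cod in s.functions]
    return "\n".join([f"class {s.sort}:"] + (fields or ["    pass"]))


def compose(*steps):
    """Chain object-to-object transformations into one pipeline."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run


pipeline = compose(class_to_spec, spec_to_code)
print(pipeline(UmlClass("Account", [("balance", "Integer")])))
```

Adding a refinement step means adding one more function to `compose`; no intermediate language has to be designed for it.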
Conclusion
• Grammar-based refinement requires a great deal of unnecessary effort, which is only partly mitigated by attribute grammars and support tools.
• Direct OO refinement and transformation is much simpler and less error-prone.
• Unfortunately, this paradigm shift has yet to occur in the refinement community.
Future Directions
• Complete the formalization of UML.
• Develop nu& into a full-featured system for object-oriented refinement and transformation.
• Apply formal methods (via CASE tools) to component composition, reusable components and self-adaptive systems.