The Center for Advanced Research In Software Engineering (ARISE)
The University of Texas at Austin

Reengineering of Large-Scale Polylingual Systems

Mark Grechanik, Dewayne E. Perry, and Don Batory
Polylingual Systems

- Polylingual systems consist of interoperating programs (or COTS components) that are written in two or more languages or run on two or more platforms.
- The native type system is the type system of the host language in which a program is written.
- A program written in a host language interoperates with a program based on a Foreign Type System (FTS).

[Figure: program P_n interoperating with program P_k]
Examples of Polylingual Systems

- A C++ program and an EJB interoperate.
  [Figure: P_C++ interoperating with P_Java]
- A C# program and a Python program interoperate.
  [Figure: P_C# interoperating with P_Python]
Large-Scale Polylingual Systems

[Figure: programs P1, P2, P3, P4, …, Pn connected by arrows]

- Polylingual systems can be represented as graphs of interoperating programs: circles denote programs and arrows denote interoperating APIs.
- In a clique of n programs every pair of programs interoperates, so there are n(n-1)/2 pairwise connections and the complexity of the APIs used for interoperation is O(n^2).
- We need a scalable approach for designing, implementing, and maintaining large-scale polylingual systems!
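The O(n^2) figure can be checked with a little arithmetic. This C++ sketch assumes that each unordered pair of programs in the clique needs one dedicated interoperation API; `pairwise_apis`, `directed_pairwise_apis`, and `uniform_apis` are hypothetical helpers for illustration, and `uniform_apis` only models the single-API goal rather than any measured result.

```cpp
#include <cassert>
#include <cstddef>

// Pairwise APIs in a clique of n programs, assuming one API per
// unordered pair of programs: n(n-1)/2, which is O(n^2).
constexpr std::size_t pairwise_apis(std::size_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

// If each direction of an arrow needs its own binding, the count
// doubles to n(n-1); it is still O(n^2).
constexpr std::size_t directed_pairwise_apis(std::size_t n) {
    return 2 * pairwise_apis(n);
}

// Hypothetical model of the uniform-API approach: every program uses
// the same graph-navigation API, so the count does not grow with n.
constexpr std::size_t uniform_apis(std::size_t /*n*/) {
    return 1;
}

static_assert(pairwise_apis(4) == 6, "a clique of 4 programs has 6 pairs");
```

For n = 10 programs the pairwise count is 45 (90 if each direction needs its own binding), while the uniform count stays at 1.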
Assumptions

- Reflection is available for all platforms.
- The cost of reflection is insignificant.
- Hardware is powerful and cheap.
- The cost of network communication outweighs the cost of reflection by an order of magnitude.
- Polylingual systems are based on recursive type systems.
Core Abstraction

    int n = R["CEO"]["CTO"]["Geeks"];

Each index selects a nested element by name, starting from the reification operator R.

[Figure: object graph with the elements CEO, CFO, CTO, Test, and Geeks and the attributes Name, Bonus, and Salary; the expression above follows the path CEO, CTO, Geeks]
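A minimal C++ sketch of how bracket navigation over a reified object graph might look. `Node`, `at`, `set_int`, and `as_int` are hypothetical names, not ROOF's API. The sketch uses an explicit `as_int()` accessor instead of the slide's implicit conversion to `int`, because an implicit conversion operator on a class that also overloads `operator[]` can make string-literal subscripts ambiguous in C++.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical node of a reified object graph (not ROOF's actual API).
// Child elements are looked up by name; a node may also carry an int value.
class Node {
public:
    // Navigate to the child element `name`, creating it if it is absent
    // (the same convention as std::map::operator[]).
    Node& operator[](const std::string& name) {
        std::unique_ptr<Node>& slot = children_[name];
        if (!slot) slot = std::make_unique<Node>();
        return *slot;
    }

    // Non-creating navigation for read-only use.
    const Node& at(const std::string& name) const {
        auto it = children_.find(name);
        if (it == children_.end()) throw std::out_of_range("no element " + name);
        return *it->second;
    }

    void set_int(int v) {
        value_ = v;
        has_value_ = true;
    }

    // Explicit accessor standing in for the slide's `int n = R[...]...`.
    int as_int() const {
        if (!has_value_) throw std::logic_error("element has no int value");
        return value_;
    }

private:
    std::map<std::string, std::unique_ptr<Node>> children_;
    int value_ = 0;
    bool has_value_ = false;
};
```

After `R["CEO"]["CTO"]["Geeks"].set_int(7);`, the expression `R["CEO"]["CTO"]["Geeks"].as_int()` yields 7, while `at()` throws `std::out_of_range` for a path that does not exist.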
Operations on Reification Operators

- Copy: creates a copy of an element or attribute and adds it to its new location. All properties of the element or attribute are cloned, including all nested elements.
- Move: identical to copy, except that the original element or attribute is removed automatically once copying completes.
- Add: appends elements and attributes under a given path.
- Remove: removes elements and attributes from the given path. If a removed element contains nested elements, the entire branch of the graph under it is deleted.
- Relational: compares graphs and their elements with constants, variables, or other graphs.
- Logic set: computes set operations such as intersection, union, Cartesian product, complement, and difference.
- Composition: composes two reification operators.
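The copy, move, add, remove, and logic-set operators can be sketched over a simple element tree. This C++ sketch is an illustration under assumptions: `Element`, `Path`, and the helper functions are hypothetical, paths name elements only, and intersection is the only set operation shown. Copy clones the whole subtree so that the copy and the original evolve independently; move is literally copy followed by remove, mirroring the definition above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reified element: named attributes plus named child elements.
struct Element {
    std::map<std::string, std::string> attributes;
    std::map<std::string, std::unique_ptr<Element>> children;
};

using Path = std::vector<std::string>;  // element names from the root

// Follow `path` from `root`; throws if any step is missing.
Element& resolve(Element& root, const Path& path) {
    Element* cur = &root;
    for (const std::string& name : path) {
        auto it = cur->children.find(name);
        if (it == cur->children.end()) throw std::out_of_range("missing element " + name);
        cur = it->second.get();
    }
    return *cur;
}

// Deep clone: attributes and all nested elements are copied.
std::unique_ptr<Element> clone(const Element& e) {
    auto copy = std::make_unique<Element>();
    copy->attributes = e.attributes;
    for (const auto& [name, child] : e.children) copy->children[name] = clone(*child);
    return copy;
}

// Add: create element `name` under `parent` (kept if it already exists).
Element& add_element(Element& root, const Path& parent, const std::string& name) {
    std::unique_ptr<Element>& slot = resolve(root, parent).children[name];
    if (!slot) slot = std::make_unique<Element>();
    return *slot;
}

// Remove: delete the element at `path` together with its whole subtree.
void remove_element(Element& root, const Path& path) {
    if (path.empty()) throw std::invalid_argument("cannot remove the root");
    Path parent(path.begin(), path.end() - 1);
    resolve(root, parent).children.erase(path.back());
}

// Copy: clone the element at `from` and place it under `to_parent`.
void copy_element(Element& root, const Path& from, const Path& to_parent) {
    if (from.empty()) throw std::invalid_argument("cannot copy the root");
    std::unique_ptr<Element> dup = clone(resolve(root, from));
    resolve(root, to_parent).children[from.back()] = std::move(dup);
}

// Move: copy, then remove the original. Assumes `to_parent` is neither the
// current parent nor inside the moved subtree.
void move_element(Element& root, const Path& from, const Path& to_parent) {
    copy_element(root, from, to_parent);
    remove_element(root, from);
}

// Logic set (intersection shown): child element names shared by a and b.
std::set<std::string> common_children(const Element& a, const Element& b) {
    std::set<std::string> na, nb, out;
    for (const auto& kv : a.children) na.insert(kv.first);
    for (const auto& kv : b.children) nb.insert(kv.first);
    std::set_intersection(na.begin(), na.end(), nb.begin(), nb.end(),
                          std::inserter(out, out.begin()));
    return out;
}
```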
Our Solution: Reification Object-Oriented Framework (ROOF)

- Basic idea: each component in a polylingual system is represented as a graph of objects, and a uniform set of APIs is provided to navigate and manipulate these objects.
- We use the generality of graphs to develop a language- and platform-independent solution for polylingual systems.
- ROOF reifies objects from an FTS into the host language, so remote objects become first-class objects.
- Reification is based on reflection.
- ROOF hides the interoperability complexity that programmers have to deal with today.
Bird's-Eye View of ROOF

[Figure: the Foreign Object Reification Language (FOREL) and the Reification Object-Oriented Framework (ROOF) layered over CORBA, .NET, XML, HTML, and DBMS]
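One way to picture a uniform API over many platforms is an abstract interface that every foreign source is adapted to, so client code never depends on the concrete platform. In this C++ sketch, `ForeignSource`, `DocumentSource`, `TableSource`, and `read_value` are hypothetical; the two in-memory adapters stand in for real XML/HTML and DBMS back ends and do no parsing or I/O.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Path = std::vector<std::string>;

// Hypothetical uniform view that each foreign source is adapted to.
class ForeignSource {
public:
    virtual ~ForeignSource() = default;
    // Value reached by following `path` from the source's root.
    virtual std::string get(const Path& path) const = 0;
};

// Stand-in for a markup (XML/HTML) back end: slash-joined paths -> text.
class DocumentSource : public ForeignSource {
public:
    explicit DocumentSource(std::map<std::string, std::string> text) : text_(std::move(text)) {}
    std::string get(const Path& path) const override {
        std::string key;
        for (const std::string& step : path) key += "/" + step;
        auto it = text_.find(key);
        if (it == text_.end()) throw std::out_of_range("no node " + key);
        return it->second;
    }
private:
    std::map<std::string, std::string> text_;
};

// Stand-in for a DBMS back end: path = {table, row key, column}.
class TableSource : public ForeignSource {
public:
    using Row = std::map<std::string, std::string>;  // column -> value
    void insert(const std::string& table, const std::string& key, Row row) {
        tables_[table][key] = std::move(row);
    }
    std::string get(const Path& path) const override {
        if (path.size() != 3) throw std::invalid_argument("expected {table, key, column}");
        return tables_.at(path[0]).at(path[1]).at(path[2]);
    }
private:
    std::map<std::string, std::map<std::string, Row>> tables_;
};

// Client code depends only on the uniform interface.
std::string read_value(const ForeignSource& source, const Path& path) {
    return source.get(path);
}
```

The design choice illustrated is that adding a new platform means writing one more adapter, not one more binding per existing program.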
Reification Mechanism: HTML to C++

C++ program:

    …
    String s;
    s = R["H2"]["B"]["FONT"];
    …

[Figure: an HTML parser reads <FONT size="2"> Hello World! from an HTML document; the reification operator R maps it from HTML into the C++ program as the object graph H2, B, FONT, and the text Hello World! is assigned to s]
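The HTML example can be approximated end to end: parse markup into an element graph, then navigate it by tag names. This C++ sketch assumes well-formed markup containing only start tags, end tags, double-quoted attributes, and text; `Element` and `parse_markup` are hypothetical names, and real HTML (optional end tags, entities, comments) would need a real HTML parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reified element built from markup (illustration only).
struct Element {
    std::string tag;
    std::map<std::string, std::string> attributes;
    std::vector<std::unique_ptr<Element>> children;
    std::string text;  // text that appears directly inside this element

    // R["H2"]-style navigation: the first child element with tag `name`.
    Element& operator[](const std::string& name) {
        for (auto& child : children)
            if (child->tag == name) return *child;
        throw std::out_of_range("no child element " + name);
    }
};

// Toy parser for well-formed markup: <TAG a="v">, </TAG>, and text only.
std::unique_ptr<Element> parse_markup(const std::string& src) {
    auto root = std::make_unique<Element>();  // synthetic root, playing the role of R
    std::vector<Element*> open{root.get()};   // stack of currently open elements
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] != '<') {  // text run up to the next tag
            std::size_t end = src.find('<', i);
            if (end == std::string::npos) end = src.size();
            open.back()->text += src.substr(i, end - i);
            i = end;
            continue;
        }
        std::size_t close = src.find('>', i);
        if (close == std::string::npos) throw std::runtime_error("unterminated tag");
        std::string body = src.substr(i + 1, close - i - 1);
        i = close + 1;
        if (!body.empty() && body[0] == '/') {  // end tag must match the open element
            if (open.size() < 2 || open.back()->tag != body.substr(1))
                throw std::runtime_error("mismatched end tag </" + body.substr(1) + ">");
            open.pop_back();
            continue;
        }
        auto el = std::make_unique<Element>();
        std::size_t p = 0;
        while (p < body.size() && !std::isspace(static_cast<unsigned char>(body[p]))) ++p;
        el->tag = body.substr(0, p);
        // Attributes of the form name="value".
        std::size_t eq;
        while ((eq = body.find('=', p)) != std::string::npos) {
            std::size_t q1 = body.find('"', eq);
            std::size_t q2 = q1 == std::string::npos ? q1 : body.find('"', q1 + 1);
            if (q2 == std::string::npos) throw std::runtime_error("malformed attribute");
            std::size_t b = body.find_first_not_of(" \t\r\n", p);
            std::string name = body.substr(b, eq - b);
            while (!name.empty() && std::isspace(static_cast<unsigned char>(name.back()))) name.pop_back();
            el->attributes[name] = body.substr(q1 + 1, q2 - q1 - 1);
            p = q2 + 1;
        }
        Element* raw = el.get();
        open.back()->children.push_back(std::move(el));
        open.push_back(raw);
    }
    if (open.size() != 1) throw std::runtime_error("unclosed element");
    return root;
}
```

Navigation picks the first child with the requested tag, so for `<H2><B><FONT size="2">Hello World!</FONT></B></H2>` the expression `(*R)["H2"]["B"]["FONT"].text` returns "Hello World!" and the `size` attribute is available as "2".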
Reification Mechanism: Java to C#

C# program:

    …
    String s;
    s = R["JCls"]["GetString"];
    …

Java class, running in the Java Virtual Machine:

    class JCls {
        String GetString() {
            return new String("Hello World!");
        }
    }

[Figure: the reification operator R maps the Java class JCls from Java into the C# program as the object graph JCls, GetString; the returned string Hello World! is assigned to s]
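C++ has no runtime reflection, so this sketch of the Java-to-C# example substitutes a hand-filled registry that maps class names to method names to callables; a reflection-based bridge would populate such a table by inspecting the foreign class instead. `MethodRegistry`, `ClassProxy`, and `register_method` are hypothetical names, and the lambda in the usage is only a stand-in for `JCls.GetString()`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hand-filled stand-in for reflection: class name -> method name -> callable.
class MethodRegistry {
public:
    using Method = std::function<std::string()>;
    using MethodTable = std::map<std::string, Method>;

    void register_method(const std::string& cls, const std::string& name, Method m) {
        classes_[cls][name] = std::move(m);
    }

    // Result of R["JCls"]: indexing it by method name invokes that method.
    class ClassProxy {
    public:
        explicit ClassProxy(const MethodTable& methods) : methods_(methods) {}
        std::string operator[](const std::string& method) const {
            auto it = methods_.find(method);
            if (it == methods_.end()) throw std::out_of_range("no method " + method);
            return it->second();
        }
    private:
        const MethodTable& methods_;  // refers into the registry; must not outlive it
    };

    ClassProxy operator[](const std::string& cls) const {
        auto it = classes_.find(cls);
        if (it == classes_.end()) throw std::out_of_range("no class " + cls);
        return ClassProxy(it->second);
    }

private:
    std::map<std::string, MethodTable> classes_;
};
```

After `R.register_method("JCls", "GetString", [] { return std::string("Hello World!"); });`, the expression `R["JCls"]["GetString"]` evaluates to "Hello World!". The proxy holds a reference into the registry, so it should not outlive `R`.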
Properties of ROOF

Our solution does not introduce:
- additional type systems;
- a hard-to-learn API;
- special constraints that affect programmers' decisions to share objects.

ROOF allows programmers to:
- avoid using any naming mechanisms;
- type check foreign objects at compile time.

Other reasons
FORTRESS

- We exploit properties of FOREL-based code to recover the high-level design of polylingual systems with a high degree of automation.
- Our solution is the FOReign Types Reverse Engineering Semantic System (FORTRESS), which:
  - normalizes code to conform to the FOREL grammar;
  - analyzes FOREL-based code using program analysis techniques (control-flow analysis, CFA, and data-flow analysis, DFA);
  - infers schemas that describe FTS models and the operations executed against them.
FORTRESS Process

[Figure: normalized code passes through the compiler front end, program analysis, and schema inference, and the results reach the visualization engine and GUI]
FTS Reverse-Engineering Algorithm

1) Parse the source code and build an AST.
2) Build a control flow graph.
3) Build a data flow graph.
4) For each branch in the control flow graph do
   a) Detect reachability of statements accessing and manipulating reified types.
   b) Create schema definitions from reified types.
   c) Translate operations on reified type instances to operations on the schema definition elements.
   d) Output the schema and the operations on its instances.
5) End For

[Phases: Program Analysis, Schema Inference, Output Generation]
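Steps 4a through 4d can be sketched assuming steps 1 to 3 have already produced, for each control-flow branch, the list of reachable statements that touch reified types. In this C++ sketch, `Access`, `Branch`, `Schema`, `Result`, and `infer` are hypothetical: every element-path prefix becomes a composite type, attributes are recorded with their value types (a conflicting type raises an error), and each statement is translated into a read or write on a schema element.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One reachable statement that touches a reified type (output of steps 1-4a).
struct Access {
    std::vector<std::string> elements;  // e.g. {"CEO", "CTO"}
    std::string attribute;              // e.g. "Salary"; empty if none
    std::string value_type;             // host type of the value, e.g. "float"
    bool write;                         // assignment (true) or read (false)
};
using Branch = std::vector<Access>;  // statements along one control-flow branch

struct Schema {
    std::set<std::string> composite_types;          // "CEO", "CEO/CTO", ...
    std::map<std::string, std::string> attributes;  // "CEO/CTO@Salary" -> "float"
};

struct Result {
    Schema schema;
    std::vector<std::vector<std::string>> operations;  // one list per branch
};

Result infer(const std::vector<Branch>& branches) {
    Result r;
    for (const Branch& branch : branches) {
        std::vector<std::string> ops;
        for (const Access& a : branch) {
            // 4b: every prefix of the element path is a composite type.
            std::string path;
            for (const std::string& e : a.elements) {
                path += (path.empty() ? "" : "/") + e;
                r.schema.composite_types.insert(path);
            }
            std::string target = path;
            if (!a.attribute.empty()) {
                target += "@" + a.attribute;
                auto [it, inserted] = r.schema.attributes.emplace(target, a.value_type);
                if (!inserted && it->second != a.value_type)
                    throw std::runtime_error("conflicting types for " + target);
            }
            // 4c: translate the statement into an operation on a schema element.
            ops.push_back((a.write ? "write " : "read ") + target);
        }
        r.operations.push_back(std::move(ops));  // 4d: collected for output
    }
    return r;
}
```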
Schema Inference

    SELECT u.Name, c.Course
    FROM User u, Courses c
    WHERE u.ID = c.ID;

From this query alone we can infer:
- there are two tables, User and Courses;
- the User table has attributes Name and ID;
- the Courses table has attributes Course and ID;
- the declarations of attribute ID in both tables are the same or compatible.
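The inferences listed above can be reproduced mechanically for this narrow query shape. The C++ sketch below is a toy, not a SQL parser: `InferredSchema` and `infer_from_query` are hypothetical, it handles only `SELECT … FROM table alias, … WHERE …` with alias-qualified columns, and it uses `std::regex` (ECMAScript grammar) for extraction.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct InferredSchema {
    std::map<std::string, std::set<std::string>> tables;          // table -> attributes
    std::vector<std::pair<std::string, std::string>> compatible;  // columns whose declarations must agree
};

// Toy inference for "SELECT ... FROM t1 a1, t2 a2 WHERE ..." with
// alias-qualified columns only (no JOIN syntax, subqueries, or bare columns).
InferredSchema infer_from_query(const std::string& sql) {
    InferredSchema out;
    std::map<std::string, std::string> alias_to_table;

    // FROM clause: comma-separated "table alias" items.
    static const std::regex from_clause(R"(FROM\s+(.*?)\s*(WHERE|;|$))", std::regex::icase);
    std::smatch from;
    if (std::regex_search(sql, from, from_clause)) {
        std::stringstream items(from[1].str());
        std::string item;
        while (std::getline(items, item, ',')) {
            std::istringstream parts(item);
            std::string table, alias;
            parts >> table >> alias;
            if (table.empty()) continue;
            if (alias.empty()) alias = table;
            alias_to_table[alias] = table;
            out.tables[table];  // the table exists even if no column is seen
        }
    }

    // Every alias.column reference names an attribute of that table.
    static const std::regex column(R"((\w+)\.(\w+))");
    for (std::sregex_iterator it(sql.begin(), sql.end(), column), end; it != end; ++it) {
        auto t = alias_to_table.find((*it)[1].str());
        if (t != alias_to_table.end()) out.tables[t->second].insert((*it)[2].str());
    }

    // alias.a = alias.b implies the two declarations are the same or compatible.
    static const std::regex equality(R"((\w+)\.(\w+)\s*=\s*(\w+)\.(\w+))");
    for (std::sregex_iterator it(sql.begin(), sql.end(), equality), end; it != end; ++it) {
        const std::smatch& m = *it;
        auto l = alias_to_table.find(m[1].str());
        auto r = alias_to_table.find(m[3].str());
        if (l != alias_to_table.end() && r != alias_to_table.end())
            out.compatible.emplace_back(l->second + "." + m[2].str(), r->second + "." + m[4].str());
    }
    return out;
}
```

For the query above, the sketch yields User with {ID, Name}, Courses with {Course, ID}, and one compatibility pair, User.ID and Courses.ID.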
Schema Inference in FTSs

    ReificationOperator R;
    float var = …;
    R["CEO"]["CTO"]("Salary") = var;

What can we infer from this statement?
- The structure of a branch of the data flow.
- A composite type CEO of some FTS.
- An attribute Salary of type CTO.
- The type of this attribute (float) and the value assigned to it in this branch.
Schema Inference in FTSs

[Figure: the statement R["CEO"]["CTO"]("Salary") = var; mapped to the inferred schema CEO, CTO, Salary]
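The facts listed for `R["CEO"]["CTO"]("Salary") = var;` can be derived from the statement text plus the declared type of `var`. This C++ sketch assumes that bracketed segments name elements and the parenthesized segment names an attribute, as in the example statement; `InferredFacts` and `infer_assignment` are hypothetical, only this single statement shape is accepted, and tracking the assigned value (which needs data-flow analysis) is omitted.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

struct InferredFacts {
    std::vector<std::string> composite_types;  // outermost first, e.g. CEO, CTO
    std::string attribute;                     // e.g. Salary
    std::string attribute_owner;               // innermost composite type, e.g. CTO
    std::string attribute_type;                // declared type of the assigned variable
};

// Accepts only statements shaped like R["A"]["B"]...("attr") = variable;
// `declared` maps variable names to their declared host types.
InferredFacts infer_assignment(const std::string& stmt,
                               const std::map<std::string, std::string>& declared) {
    static const std::regex shape(
        R"re(^\s*R((?:\["\w+"\])+)\("(\w+)"\)\s*=\s*(\w+)\s*;\s*$)re");
    static const std::regex element(R"re(\["(\w+)"\])re");

    std::smatch m;
    if (!std::regex_match(stmt, m, shape))
        throw std::invalid_argument("unsupported statement shape");

    InferredFacts facts;
    const std::string path = m[1].str();  // e.g. ["CEO"]["CTO"]
    for (std::sregex_iterator it(path.begin(), path.end(), element), end; it != end; ++it)
        facts.composite_types.push_back((*it)[1].str());
    facts.attribute = m[2].str();
    facts.attribute_owner = facts.composite_types.back();
    facts.attribute_type = declared.at(m[3].str());
    return facts;
}
```

With `var` declared as `float`, the sketch reports the composite types CEO and CTO, the attribute Salary owned by CTO, and the attribute type float.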
Synergy

Program analysis and a schema inference engine form a powerful combination that lets us:
- create schemas that reflect the semistructured data the code operates on;
- relate different FTSs by analyzing a single FTS program;
- create a high-level design by relating actions to schemas rather than to variables and functions.
Output Generation

- Outputs, in a readable format, schemas describing FTSs and the instructions that manipulate instances of those schemas.

Visualization Tool

- Presents a single high-level view of FTSs.
- Models program execution and visualizes its aspects.
FORTRESS Architecture

[Figure: FORTRESS components: FOREL code, compiler front end, AST, control flow analyzer, data flow analyzer, schema inference engine, visualization driver, and GUI]
Conclusion

- We show how ROOF serves as the underlying mechanism enabling the verification of large-scale polylingual systems.
- It reduces the complexity of interoperation APIs from O(n^2) to O(1): a single uniform API.
- It provides a uniform API for graph navigation and manipulation, with precise semantics assigned to operations.
- It enables an effective reverse engineering process that eases the understanding of legacy software.
- To our knowledge, no existing solution addresses this problem.