Connecting Task to Source. Gail C. Murphy, Department of Computer Science, University of British Columbia. Includes joint work with Elisa Baniassad (University of British Columbia), David Notkin (University of Washington), and Kevin Sullivan (University of Virginia).
© G.C. Murphy 2 Once Upon a Time... Change is inevitable... ?
Overview of Talk: A Typical Estimation Task; Software Reflexion Model Technique; A Typical Reengineering Task; Conceptual Modules Technique; Partial and Approximate Techniques; Summary
A Typical Estimation Scenario: You are asked to provide, within five days, an estimate of the effort required to modify an implementation of a Unix operating system to page over a distributed network. [Figure: NetBSD kernel source code]
Software Visualization
Reverse Engineering
Boxology: model of a Unix virtual memory subsystem drawn by a domain expert
Software Reflexion Model
Software Reflexion Model Technique
State a High-Level Model: syntactic; multiple relations; “everyone has one or more”
Extract a Source Model: Use existing tools (e.g., cflow, Field) or the lightweight lexical source model extractor (Murphy/Notkin). The source model may contain multiple relations.
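A lexical extractor trades precision for speed and for tolerance of code that does not compile. The sketch below is a minimal illustration in Java, not the Murphy/Notkin extractor (whose pattern language is not reproduced here). The class name `LexicalCallExtractor` is hypothetical, and the sketch assumes C code in which function headers start at column 0, bodies end with a `}` at column 0, and any identifier followed by `(` inside a body is a call.

```java
import java.util.*;
import java.util.regex.*;

/**
 * Hypothetical lexical extractor: approximates a C "calls" relation with a
 * regular expression instead of a parser, so it also runs on code that does not compile.
 */
class LexicalCallExtractor {
    // An identifier followed (possibly after spaces) by '('.
    private static final Pattern CALL = Pattern.compile("([A-Za-z_]\\w*)\\s*\\(");
    // Keywords that look like calls to a lexical scan.
    private static final Set<String> KEYWORDS = Set.of("if", "while", "for", "switch", "return", "sizeof");

    static Set<String> extractCalls(List<String> lines) {
        Set<String> calls = new TreeSet<>();
        String current = null;  // function whose body is being scanned, if any
        for (String line : lines) {
            boolean startsWithIdent = !line.isEmpty()
                && (Character.isLetter(line.charAt(0)) || line.charAt(0) == '_');
            if (startsWithIdent && line.contains("(") && !line.trim().endsWith(";")) {
                // Assumed function header: the name is the first identifier before '('.
                Matcher header = CALL.matcher(line);
                current = header.find() ? header.group(1) : null;
                continue;
            }
            if (line.startsWith("}")) {  // a closing brace at column 0 ends the body
                current = null;
                continue;
            }
            if (current == null) continue;
            Matcher m = CALL.matcher(line);
            while (m.find()) {
                String callee = m.group(1);
                if (!KEYWORDS.contains(callee)) calls.add(current + " -> " + callee);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> src = List.of(
            "void pageout(int n)",
            "{",
            "    if (n > 0) vm_pageout_scan(n);",
            "    wakeup(&n);",
            "}");
        System.out.println(extractCalls(src));  // [pageout -> vm_pageout_scan, pageout -> wakeup]
    }
}
```

Because it never parses, a call to a function-like macro, or a call-shaped string inside a comment, is reported as an ordinary call; that approximation is the price of speed.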
State a Declarative Mapping: name source model entities using physical and logical software structure and regular expressions; the mapping is many-to-many.
Source Model Entities | High-Level Model Entities
file=pager.c | Pager
file=vm_map.* | VirtualAddressMaint.
dir=vm func=active | VMPolicy
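The map can be read as a list of rules, each a set of attribute/regular-expression conditions that a source model entity must satisfy. A minimal sketch, assuming hypothetical names (`DeclarativeMap`, `Rule`, `SourceEntity`) and full-match regular expression semantics; the dot in `pager.c` is escaped so that it matches only a literal period. Because every rule is tried against every entity, one entity can map to several high-level entities, which makes the mapping many-to-many.

```java
import java.util.*;
import java.util.regex.*;

/** Sketch of a declarative reflexion-model map (hypothetical rule format). */
class DeclarativeMap {
    /** A source model entity described by attributes such as file, dir, func. */
    record SourceEntity(Map<String, String> attrs) {}

    /** One rule: every attribute regex must match in full for the rule to apply. */
    record Rule(Map<String, Pattern> conditions, String highLevel) {
        boolean matches(SourceEntity e) {
            for (Map.Entry<String, Pattern> c : conditions.entrySet()) {
                String value = e.attrs().get(c.getKey());
                if (value == null || !c.getValue().matcher(value).matches()) return false;
            }
            return true;
        }
    }

    /** Build a rule from alternating attribute names and regular expressions. */
    static Rule rule(String highLevel, String... attrRegexPairs) {
        Map<String, Pattern> conds = new LinkedHashMap<>();
        for (int i = 0; i + 1 < attrRegexPairs.length; i += 2)
            conds.put(attrRegexPairs[i], Pattern.compile(attrRegexPairs[i + 1]));
        return new Rule(conds, highLevel);
    }

    /** Every high-level entity whose rule matches the source entity. */
    static Set<String> map(SourceEntity e, List<Rule> rules) {
        Set<String> targets = new TreeSet<>();
        for (Rule r : rules) if (r.matches(e)) targets.add(r.highLevel());
        return targets;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            rule("Pager", "file", "pager\\.c"),
            rule("VirtualAddressMaint", "file", "vm_map.*"),
            rule("VMPolicy", "dir", "vm", "func", "active"));
        SourceEntity f = new SourceEntity(Map.of("dir", "vm", "file", "vm_map.c", "func", "active"));
        System.out.println(map(f, rules));  // [VMPolicy, VirtualAddressMaint]
    }
}
```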
Investigate a Reflexion Model
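Computing a reflexion model lifts each source model edge through the map and compares the lifted edges with the high-level model: an edge present in both is a convergence, a lifted edge missing from the high-level model is a divergence, and a high-level edge with no lifted counterpart is an absence. The Java sketch below assumes the map has already been resolved to a table from each source entity to its high-level entities; the types and the example entity names are hypothetical.

```java
import java.util.*;

/** Sketch: classify lifted source edges against a high-level model (hypothetical types). */
class Reflexion {
    record Edge(String from, String to) {
        @Override public String toString() { return from + "->" + to; }
    }

    /** Convergences and divergences carry the number of source edges behind each lifted edge. */
    record Result(Map<Edge, Integer> convergences, Map<Edge, Integer> divergences, Set<Edge> absences) {}

    private static final Comparator<Edge> BY_NAME = Comparator.comparing(Edge::toString);

    static Result compute(Set<Edge> highLevel, List<Edge> sourceEdges, Map<String, Set<String>> map) {
        // Lift every source edge through the many-to-many map, counting duplicates.
        Map<Edge, Integer> lifted = new TreeMap<>(BY_NAME);
        for (Edge e : sourceEdges)
            for (String a : map.getOrDefault(e.from(), Set.of()))
                for (String b : map.getOrDefault(e.to(), Set.of()))
                    lifted.merge(new Edge(a, b), 1, Integer::sum);

        Map<Edge, Integer> conv = new TreeMap<>(BY_NAME);
        Map<Edge, Integer> div = new TreeMap<>(BY_NAME);
        lifted.forEach((edge, n) -> (highLevel.contains(edge) ? conv : div).put(edge, n));

        // High-level edges that no source edge supports.
        Set<Edge> abs = new TreeSet<>(BY_NAME);
        for (Edge e : highLevel) if (!lifted.containsKey(e)) abs.add(e);
        return new Result(conv, div, abs);
    }

    public static void main(String[] args) {
        Set<Edge> model = Set.of(new Edge("Pager", "VMPolicy"), new Edge("VMPolicy", "Pager"));
        List<Edge> calls = List.of(new Edge("pageout", "vm_active"), new Edge("pageout", "vm_active"),
                                   new Edge("vm_active", "vm_map_lookup"));
        Map<String, Set<String>> map = Map.of("pageout", Set.of("Pager"), "vm_active", Set.of("VMPolicy"),
                                              "vm_map_lookup", Set.of("VirtualAddressMaint"));
        Result r = compute(model, calls, map);
        System.out.println(r.convergences());  // {Pager->VMPolicy=2}
        System.out.println(r.divergences());   // {VMPolicy->VirtualAddressMaint=1}
        System.out.println(r.absences());      // [VMPolicy->Pager]
    }
}
```

The counts on convergences and divergences record how many source edges support each lifted edge, which can help an engineer judge whether a divergence is a single stray call or a systematic one.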
Iteration: Want to investigate the data relationships? Augment the source model, update the mapping (for example, map var=queue.*active to VMPolicy), and recompute...
Refined Reflexion Model
Experience
Excel: Experimental Reengineering. A Microsoft engineer computed Reflexion Models several times a day for four weeks: 120,000 calls and global variable references; a map file with over 1,000 entries; a high-level model with 15 entities and 96 interactions; 4 minutes to compute on a 486. Some lessons learned: map files grew larger than expected, and scale places pressure on managing the information.
Other Features: a family of reflexion model systems; parameterized by structural descriptions; incremental computation algorithms; a typed model; tagging and annotations to manage an investigation; used for a variety of tasks.
A Typical Reengineering Scenario
Reengineering Scenario... [Figure: code excerpts labeled Procedure main, Procedure sort, and Input Pipe]
Program Database: Identify variables of interest. For each variable: where is the variable declared? Where is the variable referenced? Collate the results. Repeat.
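The query loop described above amounts to filtering and grouping cross-reference facts. A minimal sketch, assuming a hypothetical program database that stores one fact per declaration or reference; the `Fact` record, the `example.c` data, and `collate` are illustrative only.

```java
import java.util.*;

/** Sketch of cross-reference queries over a hypothetical program database of facts. */
class XrefQueries {
    enum Kind { DECL, REF }
    /** One fact: a variable is declared or referenced at file:line. */
    record Fact(String variable, Kind kind, String file, int line) {}

    /** For each variable of interest, collate where it is declared and where it is referenced. */
    static Map<String, Map<Kind, List<String>>> collate(List<Fact> db, Set<String> interest) {
        Map<String, Map<Kind, List<String>>> out = new TreeMap<>();
        for (Fact f : db) {
            if (!interest.contains(f.variable())) continue;
            out.computeIfAbsent(f.variable(), v -> new EnumMap<>(Kind.class))
               .computeIfAbsent(f.kind(), k -> new ArrayList<>())
               .add(f.file() + ":" + f.line());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Fact> db = List.of(
            new Fact("ofp", Kind.DECL, "example.c", 10),
            new Fact("ofp", Kind.REF, "example.c", 42),
            new Fact("ofp", Kind.REF, "example.c", 57),
            new Fact("nfiles", Kind.DECL, "example.c", 11));
        System.out.println(collate(db, Set.of("ofp")));
        // {ofp={DECL=[example.c:10], REF=[example.c:42, example.c:57]}}
    }
}
```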
Slicer: Compute backward slices on variables in pre-identified lines of code.
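A backward slice from a statement is the set of statements it transitively depends on through data and control dependences. The sketch assumes the dependence graph has already been computed (building it is the hard part of a real slicer) and simply walks it backward; the class name and the example program are hypothetical.

```java
import java.util.*;

/** Sketch: a backward slice as reachability over a precomputed dependence graph. */
class BackwardSlice {
    /** deps.get(s) holds the statements that statement s is data- or control-dependent on. */
    static SortedSet<Integer> slice(Map<Integer, Set<Integer>> deps, int criterion) {
        SortedSet<Integer> inSlice = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>(List.of(criterion));
        while (!work.isEmpty()) {
            int s = work.pop();
            if (!inSlice.add(s)) continue;  // already in the slice
            for (int d : deps.getOrDefault(s, Set.of())) work.push(d);
        }
        return inSlice;
    }

    public static void main(String[] args) {
        // Hypothetical program, one statement per line:
        // 1 n = read();  2 sum = 0;  3 prod = 1;  4 i = 0;  5 while (i < n) {
        // 6 sum = sum + i;  7 prod = prod * 2;  8 i = i + 1; }  9 print(sum);
        Map<Integer, Set<Integer>> deps = Map.of(
            9, Set.of(2, 6),
            6, Set.of(2, 4, 5, 6, 8),
            5, Set.of(1, 4, 8),
            8, Set.of(4, 5, 8),
            7, Set.of(3, 5, 7));
        System.out.println(slice(deps, 9));  // [1, 2, 4, 5, 6, 8, 9]
    }
}
```

Statements 3 and 7, which only compute `prod`, fall outside the slice on `sum` at statement 9.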
Type Inferencer: Determine constraints on the representation of values. This can be used to identify abstract data types, detect abstraction violations, find unused variables, and determine where there are possible references to a value. The Lackwit [O’Callahan & Jackson 97] tool produces graphs summarizing how values are transmitted through a program.
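Lackwit's analysis is based on polymorphic type inference; the sketch below is a deliberately simplified, monomorphic stand-in that captures only the core idea of unification: whenever a value flows between two variables, they are forced to share a representation. It uses a union-find structure, and the class name and the variable names in the example are hypothetical.

```java
import java.util.*;

/**
 * Simplified, monomorphic sketch of unification-based representation inference:
 * two variables between which a value flows are forced into the same class.
 */
class RepresentationClasses {
    private final Map<String, String> parent = new HashMap<>();

    /** Root of v's class, with path compression; unseen variables start in their own class. */
    private String find(String v) {
        parent.putIfAbsent(v, v);
        String p = parent.get(v);
        if (!p.equals(v)) {
            p = find(p);
            parent.put(v, p);
        }
        return p;
    }

    /** Record a value flow between a and b (assignment, argument passing, return). */
    void flow(String a, String b) {
        parent.put(find(a), find(b));
    }

    boolean sameRepresentation(String a, String b) {
        return find(a).equals(find(b));
    }

    public static void main(String[] args) {
        RepresentationClasses rc = new RepresentationClasses();
        rc.flow("main.fp", "open_output.result");  // hypothetical flows
        rc.flow("open_output.result", "write_line.fp");
        rc.flow("main.count", "report.n");
        System.out.println(rc.sameRepresentation("main.fp", "write_line.fp"));  // true
        System.out.println(rc.sameRepresentation("main.fp", "main.count"));     // false
    }
}
```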
Software Reflexion Model: it is difficult to ascertain the interface of a module; there is no support for querying the source model; the comparison is syntactic.
Conceptual Module Technique
Forming a Conceptual Module: Map lines of code to a logical module. There are two ways to map the code: by specifying line numbers (individual lines, ranges, etc.) or by specifying pieces of existing logical structure (i.e., variables or procedures). Each module has a name, and formation can be iterative. For sort, we ended up including about 24 lines in the input pipe conceptual module.
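A conceptual module can be represented as a named set of line numbers, built up from ranges and from existing procedures. A minimal sketch, assuming a hypothetical table that gives the first and last line of each procedure; `ConceptualModule` and its methods are illustrative, not the tool's API.

```java
import java.util.*;

/** Sketch of forming a named conceptual module from line ranges and procedures (hypothetical API). */
class ConceptualModule {
    final String name;
    final SortedSet<Integer> lines = new TreeSet<>();

    ConceptualModule(String name) { this.name = name; }

    /** Add an inclusive range of lines; use from == to for an individual line. */
    ConceptualModule addLines(int from, int to) {
        for (int l = from; l <= to; l++) lines.add(l);
        return this;
    }

    /** Add a whole procedure, given a table of procedure name -> {first line, last line}. */
    ConceptualModule addProcedure(String proc, Map<String, int[]> extents) {
        int[] ext = extents.get(proc);
        if (ext == null) throw new IllegalArgumentException("unknown procedure: " + proc);
        return addLines(ext[0], ext[1]);
    }

    /** Formation is iterative, so lines can also be taken out again. */
    ConceptualModule removeLines(int from, int to) {
        for (int l = from; l <= to; l++) lines.remove(l);
        return this;
    }

    public static void main(String[] args) {
        Map<String, int[]> extents = Map.of("open_input", new int[] {200, 212});  // hypothetical
        ConceptualModule m = new ConceptualModule("InputPipe")
            .addLines(100, 110)
            .addProcedure("open_input", extents)
            .removeLines(105, 105);
        System.out.println(m.name + ": " + m.lines.size() + " lines");  // InputPipe: 23 lines
    }
}
```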
Interface Analysis: Local (interface) analysis summarizes how the module interacts directly with the existing code.
Input variables: sortalloc, main.ofp, main.minus, etc.
Output variables: main.mergeonly, sort.ofp, sortalloc, etc.
Local variables: main.files, main.nfiles, sort.files
Control transfers: xmalloc at sort.c 1796, fillbuf at sort.c 248, etc.
Interface Analysis... Interface analysis is straightforward. One twist is that the analysis is set up to be tolerant of the form of the source model. The source model consists of a variable dependence relation, a control transfer relation, and a procedure start relation; the variable dependence relation may be given either as use-def pairs or as uses & defs. Two-phase analysis for local variables: 1. Use-def pairs: if all uses & defs of a variable are in the module, the variable is local. 2. Uses & defs: classify each variable as input or output; promote it to local if all of its uses and defs are in the module.
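The uses & defs case (phase 2 above) can be sketched as follows, assuming a hypothetical fact format with one record per use or def of a variable at a line. A variable used in the module is first treated as an input and one defined in the module as an output; if none of its uses or defs fall outside the module, it is promoted to local.

```java
import java.util.*;

/** Sketch of interface analysis over a uses & defs source model (hypothetical fact format). */
class InterfaceAnalysis {
    enum Access { USE, DEF }
    /** One use or def of a variable at a source line. */
    record Ref(String variable, int line, Access access) {}
    record ModuleInterface(Set<String> inputs, Set<String> outputs, Set<String> locals) {}

    static ModuleInterface analyze(Set<Integer> module, List<Ref> refs) {
        Set<String> inputs = new TreeSet<>(), outputs = new TreeSet<>(), locals = new TreeSet<>();
        Set<String> escapes = new HashSet<>();  // variables with a use or def outside the module
        for (Ref r : refs) {
            if (!module.contains(r.line())) {
                escapes.add(r.variable());
                continue;
            }
            // Conservative classification: a use inside counts as input, a def inside as output.
            (r.access() == Access.USE ? inputs : outputs).add(r.variable());
        }
        // Promote to local every variable whose uses and defs all lie inside the module.
        Set<String> touched = new TreeSet<>(inputs);
        touched.addAll(outputs);
        for (String v : touched) {
            if (!escapes.contains(v)) {
                locals.add(v);
                inputs.remove(v);
                outputs.remove(v);
            }
        }
        return new ModuleInterface(inputs, outputs, locals);
    }

    public static void main(String[] args) {
        Set<Integer> module = new HashSet<>();
        for (int l = 10; l <= 20; l++) module.add(l);  // the module covers lines 10-20
        List<Ref> refs = List.of(
            new Ref("buf", 12, Access.DEF), new Ref("buf", 15, Access.USE),
            new Ref("fp", 11, Access.USE), new Ref("fp", 3, Access.DEF),
            new Ref("done", 18, Access.DEF), new Ref("done", 40, Access.USE),
            new Ref("size", 13, Access.USE), new Ref("size", 14, Access.DEF), new Ref("size", 50, Access.USE));
        ModuleInterface mi = analyze(module, refs);
        System.out.println(mi.inputs());   // [fp, size]
        System.out.println(mi.outputs());  // [done, size]
        System.out.println(mi.locals());   // [buf]
    }
}
```

As with `sortalloc` in the sort example, a variable such as `size` can come out as both an input and an output, because it is used and defined inside the module and also used outside. With use-def pairs the analysis could be sharper, since it could tell whether a particular use inside the module is reached by a definition from outside.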
Querying about Conceptual Modules: Once one or more conceptual modules are formed, the reengineer typically needs to perform queries such as: How do the conceptual modules relate to each other? How do they relate to the existing source? The tool provides both pre-coded queries and a programmable interface through which a user can code new queries.
Conceptual Module Relationships: [Figure: modules A and B related by direct def-use, indirect def-use, overlap, and contains]
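Three of these relationships can be sketched directly over modules represented as line sets plus a hypothetical list of use-def pairs: overlap (the modules share a line), containment (every line of one is in the other), and direct def-use (a definition in one module reaches a use in the other through a single pair). Indirect def-use would additionally chain pairs through intermediate lines and is omitted here; all names are illustrative.

```java
import java.util.*;

/** Sketch of relationships between conceptual modules given as sets of line numbers. */
class ModuleRelations {
    /** Hypothetical use-def pair: a definition of variable at defLine reaches a use at useLine. */
    record UseDef(String variable, int defLine, int useLine) {}

    /** A and B overlap if they share at least one line. */
    static boolean overlap(Set<Integer> a, Set<Integer> b) {
        return !Collections.disjoint(a, b);
    }

    /** A contains B if every line of B is also in A. */
    static boolean contains(Set<Integer> a, Set<Integer> b) {
        return a.containsAll(b);
    }

    /** Variables through which A has a direct def-use relationship to B. */
    static Set<String> directDefUse(Set<Integer> a, Set<Integer> b, List<UseDef> pairs) {
        Set<String> vars = new TreeSet<>();
        for (UseDef p : pairs)
            if (a.contains(p.defLine()) && b.contains(p.useLine())) vars.add(p.variable());
        return vars;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3, 4, 5), b = Set.of(4, 5, 6, 7, 8, 9), c = Set.of(6, 7);
        List<UseDef> pairs = List.of(new UseDef("n", 2, 7), new UseDef("t", 8, 3));
        System.out.println(overlap(a, b) + " " + contains(b, c));  // true true
        System.out.println(directDefUse(a, c, pairs));              // [n]
        System.out.println(directDefUse(b, a, pairs));              // [t]
    }
}
```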
Programmable Interface
    // Find the definition points common to the use-def chains of all conceptual modules.
    // Get the use-def chains for all input and local variables of the first module.
    Module first = (Module) Module.ModuleTable.elementAt(0);
    SET common = DefUse.GetFullUseDefChain(first);
    for (int i = 1; i < Module.ModuleTable.size(); i++) {
        // Get the use-def chains for the next module.
        Module current = (Module) Module.ModuleTable.elementAt(i);
        SET curr_chain = DefUse.GetFullUseDefChain(current);
        // Intersect the chains to determine common definition points.
        common = DefUse.INTERSECTION(common, curr_chain);
    }
    common.print();
Experience: SUIF and xrefdb. SUIF: tools built on SUIF, which provide use-def pairs. xrefdb: Field’s xrefdb, which provides uses & defs.
Query Context and Form: There are two parts to expressing context: identifying the region of source over which to query, and restricting the region for which results are reported. A conceptual module identifies the region, and interface analysis summarizes the local results. Form covers both input and output: some tasks require queries over grouped items, and results should be reported in terms of the task; the conceptual module structure can be used to query against the source, with results reported in terms of the target structure.
Partial and Approximate Techniques: Each of these characteristics can be an effective way to attack scale, and they can be combined to give software engineers a “smoother” means of managing source investigations. The bottom line for most development efforts is that time is money. [Figure labels: approximate, conservative]
Summary. Software Reflexion Model: “Definitely confirmed suspicions about the structure of Excel. Further, it allowed me to pinpoint the deviations.... very easy to ignore stuff that is not interesting and thereby focus on the part of Excel I want to know more about.” (Microsoft Engineer). Conceptual Module: “not only did the tool verify the independent nature of the ZDD functionality and allow me to rip out all that code, but, the process of using your tool forced me to analyze and understand the code in a way that I had not been doing and that ultimately it very quickly gave me the perspective I needed.” (Yvonne Coady)
Summary... Demonstrated the benefits of task-aware program understanding techniques; current techniques are structurally task-aware. Demonstrated a role for approximate information: the reflexion model technique makes the engineer responsible, while the conceptual module technique takes on some of that responsibility. The goal is to get to “what-if” tools that would allow engineers to leverage, cost-effectively, the connections between design and source.