Semi-Automated Software Restructuring
By Santosh K Singh Kesar
Advisor: Dr. James Fawcett
Master’s Thesis, Dept. of Electrical Engineering and Computer Science, Syracuse University
October 8, 2008
Long-Term Research Goals
Attempt to answer the questions:
  Is it possible to reliably improve the structure of large, complex software?
  If so, can that improvement be automated?
  If so, find appropriate means to implement a process for such improvement.
Specific Goals of this Research
  Find ways to reduce the size of large functions and methods by turning them into a composition of smaller functions and methods with the same behavior.
  Automate that process.
  Evaluate the results.
Software Restructuring
  Extracting new functions and methods from existing source code functions and methods.
Semi-Automated Source Code Restructuring
  Maintains the same external behavior in the restructured source code.
  New files of restructured source code are written to a different location from the original source code.
Restructuring vs. Refactoring
  Both code restructuring and refactoring are concerned with improving logical structure.
  Refactoring is a largely manual process with broader scope.
  Restructuring is automatic, but user-guided.
  Refactoring has traditionally been applied to managed source code:
    Java in Eclipse
    C# in Visual Studio
  Our restructuring works with native languages: C and C++.
Is Badly Structured Code Likely? Is there a need for the results of this research? Do experienced researchers and professional developers often create badly structured code?
Imaging Research Code

File                     Function Name              Number of lines
Weights_calculation.cpp  WeightTwoQuadrantsFactor   280
                         FactorsTwoRays             206
                         AreaWeightFactor           223
                         W_Calculate                377
                         Main                       191
                         WeightBottomFactor         164
Mlr800fs.c               Emsid2_new                 724
                                                    851
                         Ect                        3608
                         emsid3_new                 749
                         emsid4                     516
GKGFX Library, Mozilla 1.4.1
Figure legend: the smallest disk is a file; lines are dependencies; a number indicates the size of a strong component, in this case 60 mutually dependent files.
Restructuring Process
  Analysis: find feasible regions for function extraction.
  Selection: select from the feasible regions the code segments that require few parameters to be passed as function arguments.
  Code generation
Analysis
  Lexical analysis
    Tokenize input stream
    Group into analysis sequences
  Parsing
    Recognize key grammatical elements
    Store for later use
  Deeper analysis of functions
Lexical Analysis
  Tokenize
    Remove comments
    Eliminate whitespace
    Recognize key punctuators
  Form semi-expressions
    Sequences of tokens appropriate for parsing
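The two lexical steps can be illustrated with a minimal sketch. It is not the thesis's Tokenizer or Semi-Expressions module: the function names `tokenize` and `formSemiExpressions` are hypothetical, string literals and multi-character operators are not handled, and the choice of `;`, `{` and `}` as the punctuators that end a semi-expression is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tokenizer: drops whitespace and comments, keeps identifiers and
// numbers as whole tokens, and emits every other character as its own token.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (src.compare(i, 2, "//") == 0) {            // line comment
            while (i < src.size() && src[i] != '\n') ++i;
            continue;
        }
        if (src.compare(i, 2, "/*") == 0) {            // block comment
            std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? src.size() : end + 2;
            continue;
        }
        if (std::isalnum(c) || c == '_') {             // identifier or number
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
            continue;
        }
        tokens.push_back(std::string(1, src[i]));      // single-character punctuator
        ++i;
    }
    return tokens;
}

using SemiExp = std::vector<std::string>;

// Groups tokens into semi-expressions. Assumption: a semi-expression ends at
// ';', '{' or '}', and the terminator is kept as its last token.
std::vector<SemiExp> formSemiExpressions(const std::vector<std::string>& tokens) {
    std::vector<SemiExp> result;
    SemiExp current;
    for (const std::string& tok : tokens) {
        assert(!tok.empty());                          // the tokenizer never emits empty tokens
        current.push_back(tok);
        if (tok == ";" || tok == "{" || tok == "}") {
            result.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) result.push_back(current);   // trailing tokens without a terminator
    return result;
}
```

Calling `formSemiExpressions(tokenize(text))` on a source fragment yields one token sequence per statement or scope boundary, which is the unit a parser rule can then inspect.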
Our Lexical Analysis Tools Tokenizer Sample output from Tokenizer Module
Our Lexical Analysis Tools Semi-Expressions Sample output from Semi-Expressions Module
Parsing
  Recognize key grammatical elements (a very small subset of the language grammar):
    Function definitions
    Method definitions
    Data declarations
    Data manipulations
  Build parse tree
  Use tree elements to support code generation
Top Level Structure of Parse Tree
Different types of Nodes in Parse Tree
  RootObj
  FunObj
  ClassObj
  ScopeObj
  DataObject
Building Parse Tree: Building of First Three Levels
  Level 0: Root
  Level 1: Union 1, Global Function 1, Class 1, Global data 1, Global Function 2
  Level 2: Member Function 1, Member data 1
Containment Diagram of Parse Tree
Figure: top-level containment diagram of the parse tree. Elements shown: Root; Global Function 1; Class 1; Global Function 2; Member Function 1; Try and Catch scopes; collection of local DataObjects of member; collection of local DataObjects of scope.
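The containment relationship can be pictured with a small tree sketch. The slides name the node types (RootObj, FunObj, ClassObj, ScopeObj, DataObject) but not their members, so everything below (`Node`, `NodeKind`, `DataItem`, `addChild`, `countNodes`) is a hypothetical stand-in rather than the thesis's class design.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Node kinds mirroring the node types named on the slides.
// The representation itself is an assumption made for illustration.
enum class NodeKind { Root, Function, Class, Scope };

// A data declaration attached to the node that directly contains it.
struct DataItem {
    std::string type;
    std::string name;
};

struct Node {
    NodeKind kind;
    std::string name;                             // e.g. "Class 1", "try"
    std::vector<DataItem> data;                   // data declared directly in this node
    std::vector<std::unique_ptr<Node>> children;  // nodes contained in this node

    Node(NodeKind k, std::string n) : kind(k), name(std::move(n)) {}

    // Adds a contained node and returns a non-owning pointer to it.
    Node* addChild(NodeKind k, const std::string& n) {
        assert(k != NodeKind::Root);              // only the top of the tree is a root
        children.push_back(std::unique_ptr<Node>(new Node(k, n)));
        return children.back().get();
    }
};

// Counts this node and everything it contains.
std::size_t countNodes(const Node& node) {
    std::size_t total = 1;
    for (const auto& child : node.children) total += countNodes(*child);
    return total;
}
```

A root that owns global functions and classes, a class that owns member functions, and a function that owns try/catch scopes with their local data reproduces the containment levels shown in the diagram.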
Criterion #1: Line Numbers

    void source_code(int param)
    {
        int _value = param;
        std::string str = "test";
        ...... // Source code removed for brevity
        try
        {
            param++;
            if(param>5)
                param--;
        }
        catch(std::exception& ex)
        {
            std::cout<<"Exception!";
            exit(1);
        }
    }

If the source code in this section fits within the maximum line count, it satisfies criterion #1 and is thus identified as a candidate 'feasible region'.
Example: criterion #1 is the feasible region's maximum number of lines (a command-line argument).
Line number criterion for feasible regions
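The line-count check can be sketched as a simple filter over the scopes found by the parser. The `ScopeRange` type and the function names are hypothetical, and line numbers are assumed to be inclusive; only the idea of comparing a scope's length with a user-supplied maximum comes from the slide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A scope found by the parser, with inclusive source line numbers (assumed).
struct ScopeRange {
    std::string kind;  // e.g. "try", "catch", "if"
    int firstLine;
    int lastLine;
};

// Number of source lines the scope spans.
int lineCount(const ScopeRange& s) { return s.lastLine - s.firstLine + 1; }

// Keeps the scopes that fit within maxLines, the value that would come from
// the command line.
std::vector<ScopeRange> candidatesByLineCount(const std::vector<ScopeRange>& scopes,
                                              int maxLines) {
    assert(maxLines > 0);
    std::vector<ScopeRange> out;
    for (const ScopeRange& s : scopes)
        if (lineCount(s) <= maxLines) out.push_back(s);
    return out;
}
```

With a maximum of 5 lines, a `try` scope spanning lines 10 to 14 would be kept as a candidate, while a `catch` scope spanning lines 15 to 40 would be rejected.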
Top down approach

    int _value = param;
    std::string str = "test";
    str = str + " string";
    param = ++_value;
    …

Figure labels: line numbers 34, 35, 36, 37; constant bottom pointer; top pointer moving downwards.
Top down approach for determining parameters
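One way to read the top-down approach is sketched below: the bottom of the region stays fixed while the top moves down one line at a time, and at each position the variables that are used inside the region but declared above it are counted as the parameters the extracted function would need. The `Line` model and both function names are hypothetical, and the sketch ignores variables declared inside the region that are still needed after it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified model of one source line: the variables it declares and uses.
struct Line {
    std::vector<std::string> declares;
    std::vector<std::string> uses;
};

// Variables used in lines[top..bottom] without being declared earlier in that
// same range. These would have to be passed in as arguments.
std::set<std::string> requiredParameters(const std::vector<Line>& lines,
                                         std::size_t top, std::size_t bottom) {
    assert(top <= bottom && bottom < lines.size());
    std::set<std::string> declared, params;
    for (std::size_t i = top; i <= bottom; ++i) {
        for (const auto& v : lines[i].uses)
            if (!declared.count(v)) params.insert(v);
        for (const auto& v : lines[i].declares) declared.insert(v);
    }
    return params;
}

// Keeps 'bottom' fixed and moves 'top' downwards until the region needs at
// most maxParams parameters. Returns the index of the chosen top line, or -1
// if no region ending at 'bottom' qualifies.
int findTop(const std::vector<Line>& lines, std::size_t bottom, std::size_t maxParams) {
    for (std::size_t top = 0; top <= bottom; ++top)
        if (requiredParameters(lines, top, bottom).size() <= maxParams)
            return static_cast<int>(top);
    return -1;
}
```

For the four statements on the slide, the whole region needs only `param`, so with a limit of one parameter the top pointer would not have to move at all.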
Class Diagram of Parsers Class Diagram of Parsers using Utility Class
Class Diagram of ICRNode Class Diagram of ICRNode Interface
Class Diagram of RootObj
Association of DataObjects Class Relationship diagram of Parse tree Objects
Representing Node Types Class Diagram of Different Node Types
Class Diagram of DataObject
Hypothetical view of Hierarchy Stack
Figure: stack entries shown are Root, class, Function, try, catch, for, if. The stack-top pointer marks the top of the stack, which represents the current scope.
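The idea that the top of the stack represents the current scope can be sketched as follows. The class name `HierarchyStack` follows the slide title, but its interface (`enter`, `leave`, `current`, `depth`, `path`) is an assumption made for illustration, not the thesis's design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scope stack: a scope is pushed when the parser enters it and
// popped when the parser leaves it, so the top is always the current scope.
class HierarchyStack {
public:
    HierarchyStack() { scopes_.push_back("Root"); }

    void enter(const std::string& kind) { scopes_.push_back(kind); }

    void leave() {
        assert(scopes_.size() > 1);  // the root scope is never popped
        scopes_.pop_back();
    }

    const std::string& current() const { return scopes_.back(); }
    std::size_t depth() const { return scopes_.size(); }

    // Slash-separated view of the stack from bottom to top.
    std::string path() const {
        std::string p;
        for (std::size_t i = 0; i < scopes_.size(); ++i) {
            if (i) p += "/";
            p += scopes_[i];
        }
        return p;
    }

private:
    std::vector<std::string> scopes_;
};
```

A parser would call `enter` when it recognizes the start of a class, function, or control scope and `leave` at the matching closing brace, so new parse tree nodes can be attached to whatever `current()` reports.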
Class Diagram of TempContainer Figure 3.16 – Class Diagram of TempContainer
Class Diagram – feasibleRegions and newFunctions Class Relationship diagram of feasibleRegions and newFunctions
Class Diagram – FunctionParser and fileManager Class Relationship diagram of FunctionParser and fileManager
Restructuring in multiple passes
Original length of ‘testFun’ function: 120

Pass     Maximum number of parameters   Maximum number of lines   Number of lines in host method: testFun
Pass 1   3                              20                        105
Pass 2                                                            97
Pass 3                                                            92
Pass 4                                                            79
Restructuring Functions
Original source code:

    void setRootValues()
    {
        try
        {
            std::string inFile = getInputFile();
            Directory dir;
            Scanner scanr;
            scanr.doRecursiveScan(inFile);
            dir.RestoreFirstDirectory();
            if(dir.dirContainIncludes())
                scanr.setFileIncludes();
            std::vector<std::string> _files = getCompleteFiles();
            if(_files.size() > 0)
            {
                RootObj* root = new RootObj();
                std::string _type = root->_typename();
                if(_type == "")
                    _type = "pRoot";
                root->displayRootStats();
            }
        }
        catch(std::exception& ex)
        {
            std::cout<< ex.what() <<std::endl;
        }
    }
Restructured Code
Extracted function:

    void setRootValues_1()
    {
        std::string inFile = getInputFile();
        Directory dir;
        Scanner scanr;
        scanr.doRecursiveScan(inFile);
        dir.RestoreFirstDirectory();
        if(dir.dirContainIncludes())
            scanr.setFileIncludes();
    }

Host function after extraction:

    void setRootValues()
    {
        try
        {
            setRootValues_1();
            std::vector<std::string> _files = getCompleteFiles();
            if(_files.size() > 0)
            {
                RootObj* root = new RootObj();
                std::string _type = root->_typename();
                if(_type == "")
                    _type = "pRoot";
                root->displayRootStats();
            }
        }
        catch(std::exception& ex)
        {
            std::cout<< ex.what() <<std::endl;
        }
    }
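The code generation step for a parameterless extraction like the one above can be sketched as plain text manipulation. This is an illustration under stated assumptions, not the thesis's generator: `Extraction` and `extractRegion` are hypothetical names, the host body is modeled as a list of statement lines, and only the `hostName_N` naming pattern is taken from the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of extracting one region from a host function body.
struct Extraction {
    std::string newFunction;            // text of the extracted function
    std::vector<std::string> hostBody;  // host body with the region replaced by a call
};

// Moves body[first..last] into a new void function named hostName_index and
// replaces those lines in the host with a call. Assumes the region needs no
// parameters and returns nothing.
Extraction extractRegion(const std::string& hostName,
                         const std::vector<std::string>& body,
                         std::size_t first, std::size_t last, int index) {
    assert(first <= last && last < body.size());
    const std::string newName = hostName + "_" + std::to_string(index);

    Extraction out;
    out.newFunction = "void " + newName + "()\n{\n";
    for (std::size_t i = first; i <= last; ++i)
        out.newFunction += "  " + body[i] + "\n";
    out.newFunction += "}\n";

    for (std::size_t i = 0; i < first; ++i) out.hostBody.push_back(body[i]);
    out.hostBody.push_back(newName + "();");
    for (std::size_t i = last + 1; i < body.size(); ++i) out.hostBody.push_back(body[i]);
    return out;
}
```

Applying such a step repeatedly, with a new index each time, would shrink the host function pass by pass.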
Contributions
  Semi-automated software restructuring
  Parser for host language
  Representing source code structure as a parse tree
  Identification of feasible regions
  Demonstration with working code
Future Work
  Type analysis.
  Further optimization can be achieved.
  Semantic cues may help make sensible functions.
  Other things to think about, like extracting objects.
Changes to Thesis document
  Removed references to SMIRG.
  Re-formatted to match university regulations.
Demonstration
Simple code that shows:
  Parsing of functions and methods
  Correct management of header and implementation files
End of Presentation
Questions?