Semi-Automated Software Restructuring

Semi-Automated Software Restructuring
By Santosh K Singh Kesar
Advisor: Dr. James Fawcett
Master’s Thesis, Dept. of Electrical Engineering and Computer Science, Syracuse University
October 8, 2008

Long-Term Research Goals
Attempt to answer the question: is it possible to reliably improve the structure of large, complex software? If so, can that improvement be automated? If so, find appropriate means to implement a process for such improvement.

Specific Goals of this Research
- Find ways to reduce the size of large functions and methods by turning them into a composition of smaller functions and methods with the same behavior.
- Automate that process.
- Evaluate the results.

Software Restructuring
- Extracting new functions and methods from existing source code functions and methods.
- Semi-automated source code restructuring.
- Maintains the same external behavior of the restructured source code.
- Restructured source code is written to new files, in a different location from the original source code.

Restructuring vs. Refactoring
- Both code restructuring and refactoring are concerned with improving logical structure.
- Refactoring is a largely manual process with broader scope.
- Restructuring is automatic, but user-guided.
- Refactoring has traditionally been applied to managed source code: Java in Eclipse, C# in Visual Studio.
- Our restructuring works with native languages: C and C++.

Is Badly Structured Code Likely?
- Is there a need for the results of this research?
- Do experienced researchers and professional developers often create badly structured code?

Imaging Research Code

File                      Function Name               Number of Lines
Weights_calculation.cpp   WeightTwoQuadrantsFactor    280
                          FactorsTwoRays              206
                          AreaWeightFactor            223
                          W_Calculate                 377
                          Main                        191
                          WeightBottomFactor          164
Mlr800fs.c                Emsid2_new                  724
                          (name not shown)            851
                          Ect                         3608
                          emsid3_new                  749
                          emsid4                      516

GKGFX Library, Mozilla 1.4.1
[Figure: file dependency graph. The smallest disk is a single file, lines are dependencies, and a number indicates the size of a strong component, in this case 60 mutually dependent files.]
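
A "strong component" is a strongly connected component of the file dependency graph: a set of files in which every file depends, directly or transitively, on every other. The slides do not say how the components were computed; the sketch below swaps in Tarjan's algorithm and assumes files have already been numbered 0..n-1 and that graph[u] lists the files u depends on.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Tarjan's algorithm (a stand-in, not the thesis tool): returns the strongly
// connected components of a dependency graph, where graph[u] lists the files
// that file u depends on.
std::vector<std::vector<int>> strongComponents(const std::vector<std::vector<int>>& graph)
{
  const int n = static_cast<int>(graph.size());
  std::vector<int> index(n, -1), low(n, 0);
  std::vector<bool> onStack(n, false);
  std::vector<int> stack;
  std::vector<std::vector<int>> components;
  int counter = 0;

  std::function<void(int)> visit = [&](int u) {
    index[u] = low[u] = counter++;
    stack.push_back(u);
    onStack[u] = true;
    for (int v : graph[u]) {
      if (index[v] == -1) {            // unvisited: recurse
        visit(v);
        low[u] = std::min(low[u], low[v]);
      } else if (onStack[v]) {         // edge back into the current component
        low[u] = std::min(low[u], index[v]);
      }
    }
    if (low[u] == index[u]) {          // u is the root of a component
      std::vector<int> comp;
      int w;
      do {
        w = stack.back();
        stack.pop_back();
        onStack[w] = false;
        comp.push_back(w);
      } while (w != u);
      components.push_back(comp);
    }
  };

  for (int u = 0; u < n; ++u)
    if (index[u] == -1) visit(u);
  return components;
}
```

The size of the largest returned component corresponds to a number like the 60 shown in the figure. The recursive form is kept for readability; a very large graph may need an iterative version to avoid deep recursion.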

Restructuring Process
- Analysis: find feasible regions for function extraction.
- Selection: from the feasible regions, select code segments that require few parameters to be passed as function arguments.
- Code generation.
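
The selection step can be sketched as a filter plus a ranking. The slides only say that segments needing few parameters are preferred; the Candidate fields, the maxParams limit, and the tie-breaking rule (most lines first, then fewest parameters) are assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical summary of a feasible region produced by the analysis step.
struct Candidate {
  int firstLine;
  int lastLine;
  int paramCount;   // arguments the extracted function would need
};

// Selection sketch: skip candidates that need more than maxParams arguments,
// then prefer the region covering the most lines; ties go to fewer parameters.
// Returns the index of the chosen candidate, or -1 if none qualifies.
int selectRegion(const std::vector<Candidate>& cands, int maxParams)
{
  int best = -1;
  int bestLen = 0;
  for (int i = 0; i < static_cast<int>(cands.size()); ++i) {
    const Candidate& c = cands[i];
    if (c.paramCount > maxParams) continue;          // too many arguments
    int len = c.lastLine - c.firstLine + 1;
    bool better = best == -1 || len > bestLen ||
                  (len == bestLen && c.paramCount < cands[best].paramCount);
    if (better) { best = i; bestLen = len; }
  }
  return best;
}
```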

Analysis
- Lexical analysis: tokenize the input stream; group tokens into analysis sequences.
- Parsing: recognize key grammatical elements; store them for later use.
- Deeper analysis of functions.

Lexical Analysis
- Tokenize:
  - Remove comments
  - Eliminate whitespace
  - Recognize key punctuators
- Form semi-expressions: sequences of tokens appropriate for parsing
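
The tokenizing steps listed above can be sketched as a single pass over the source text. This is a simplified stand-in, not the thesis's Tokenizer module: the function name tokenize and the small set of two-character punctuators are assumptions, and character literals, raw strings, and preprocessor lines are not handled.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Simplified tokenizer sketch: drops comments and whitespace, groups
// identifiers and numbers, keeps string literals whole, and recognizes a
// few two-character punctuators (an illustrative subset).
std::vector<std::string> tokenize(const std::string& src)
{
  static const std::vector<std::string> twoChar = {
    "::", "<<", ">>", "++", "--", "->", "==", "!=", "<=", ">=", "&&", "||"};
  std::vector<std::string> tokens;
  size_t i = 0;
  while (i < src.size()) {
    char c = src[i];
    if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
    if (src.compare(i, 2, "//") == 0) {              // line comment
      while (i < src.size() && src[i] != '\n') ++i;
      continue;
    }
    if (src.compare(i, 2, "/*") == 0) {              // block comment
      size_t end = src.find("*/", i + 2);
      i = (end == std::string::npos) ? src.size() : end + 2;
      continue;
    }
    if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
      size_t start = i;
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
        ++i;
      tokens.push_back(src.substr(start, i - start));
      continue;
    }
    if (c == '"') {                                  // string literal
      size_t start = i++;
      while (i < src.size() && src[i] != '"') {
        if (src[i] == '\\') ++i;                     // skip the escaped character
        ++i;
      }
      i = std::min(i + 1, src.size());
      tokens.push_back(src.substr(start, i - start));
      continue;
    }
    std::string two = src.substr(i, 2);             // try two-char punctuators
    bool matched = false;
    for (const auto& p : twoChar)
      if (two == p) { tokens.push_back(p); i += 2; matched = true; break; }
    if (!matched) { tokens.push_back(std::string(1, c)); ++i; }
  }
  return tokens;
}
```

For example, tokenize("std::cout << x; // done") yields std, ::, cout, <<, x and ; with the comment removed.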

Our Lexical Analysis Tools: Tokenizer
[Figure: sample output from the Tokenizer module]

Our Lexical Analysis Tools: Semi-Expressions
[Figure: sample output from the Semi-Expressions module]
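
Grouping tokens into semi-expressions can be sketched as follows. The terminator set (";", "{" and "}") is an assumption about what makes a sequence "appropriate for parsing"; the sketch also tracks parenthesis depth so that the semicolons in a for-loop header do not split it.

```cpp
#include <string>
#include <vector>

// Semi-expression sketch: splits a token stream into sequences that end at
// ';', '{' or '}'. A ';' inside parentheses (e.g. a for-header) does not
// end the sequence. The terminator set is an assumption.
std::vector<std::vector<std::string>> semiExpressions(const std::vector<std::string>& tokens)
{
  std::vector<std::vector<std::string>> result;
  std::vector<std::string> current;
  int parenDepth = 0;
  for (const auto& tok : tokens) {
    current.push_back(tok);
    if (tok == "(") ++parenDepth;
    else if (tok == ")" && parenDepth > 0) --parenDepth;
    bool terminator = tok == "{" || tok == "}" || (tok == ";" && parenDepth == 0);
    if (terminator) {
      result.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) result.push_back(current);  // trailing partial sequence
  return result;
}
```

With this rule, "for ( int i = 0 ; i < n ; ++ i ) {" stays a single semi-expression, which lets a parser recognize the loop header in one step.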

Parsing
- Recognize key grammatical elements, using a very small subset of the language grammar:
  - function definitions
  - method definitions
  - data declarations
  - data manipulations
- Build a parse tree.
- Use tree elements to support code generation.

Top Level Structure of Parse Tree

Different Types of Nodes in Parse Tree
RootObj, FunObj, ClassObj, ScopeObj, DataObject
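
The five node types can be sketched as classes sharing one node interface. The slides name an ICRNode interface and these class names, but do not show their members; every method below (kind, name, add, children) and the helpers NodeBase and countKind are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node interface; only the name ICRNode comes from the slides.
struct ICRNode {
  virtual ~ICRNode() = default;
  virtual std::string kind() const = 0;
  virtual const std::string& name() const = 0;
  virtual void add(std::unique_ptr<ICRNode> child) = 0;
  virtual const std::vector<std::unique_ptr<ICRNode>>& children() const = 0;
};

// Shared storage for the concrete node types (hypothetical helper).
class NodeBase : public ICRNode {
public:
  explicit NodeBase(std::string n) : name_(std::move(n)) {}
  const std::string& name() const override { return name_; }
  void add(std::unique_ptr<ICRNode> child) override { children_.push_back(std::move(child)); }
  const std::vector<std::unique_ptr<ICRNode>>& children() const override { return children_; }
private:
  std::string name_;
  std::vector<std::unique_ptr<ICRNode>> children_;
};

struct RootObj    : NodeBase { using NodeBase::NodeBase; std::string kind() const override { return "Root"; } };
struct FunObj     : NodeBase { using NodeBase::NodeBase; std::string kind() const override { return "Function"; } };
struct ClassObj   : NodeBase { using NodeBase::NodeBase; std::string kind() const override { return "Class"; } };
struct ScopeObj   : NodeBase { using NodeBase::NodeBase; std::string kind() const override { return "Scope"; } };  // try, catch, for, if, ...
struct DataObject : NodeBase { using NodeBase::NodeBase; std::string kind() const override { return "Data"; } };

// Counts nodes of the given kind in the subtree rooted at n (including n).
int countKind(const ICRNode& n, const std::string& k)
{
  int c = (n.kind() == k) ? 1 : 0;
  for (const auto& ch : n.children()) c += countKind(*ch, k);
  return c;
}
```

Owning children through std::unique_ptr makes the root responsible for the whole tree, so no explicit cleanup pass is needed when parsing finishes.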

Building the First Three Levels of the Parse Tree
[Figure: Level 0: Root. Level 1: Union 1, Global Function 1, Class 1, Global data 1, Global Function 2. Level 2: Member Function 1, Member data 1.]

Containment Diagram of Parse Tree
[Figure: top-level containment diagram of the parse tree, showing Root, Global Function 1, Class 1, Global Function 2, Try and Catch scopes, Member Function 1, a collection of local DataObjects of a member, and a collection of local DataObjects of a scope.]

Criterion #1: Line Numbers
Criterion #1 is the feasible region's maximum number of lines, supplied as a command-line argument.

Example:

    void source_code(int param)
    {
      int _value = param;
      std::string str = "test";
      ……  // Source code removed for brevity
      try
      {
        param++;
        if(param > 5)
          param--;
      }
      catch(std::exception& ex)
      {
        std::cout << "Exception!";
        exit(1);
      }
    }

If the source code in this section spans no more than the maximum line count, it satisfies criterion #1 and is thus identified as a candidate "feasible region".
[Figure: line number criteria for feasible regions]
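
Applying criterion #1 can be sketched as enumerating contiguous runs of statements within one scope and keeping those whose line span does not exceed the maximum. The Statement and Region types, the function name, and the rule that a region must hold at least two statements are assumptions for illustration.

```cpp
#include <vector>

// Hypothetical statement record: first and last source line of one statement
// in a single scope.
struct Statement { int firstLine; int lastLine; };

// Candidate region: statements [begin, end) of that scope.
struct Region { int begin; int end; };

// Criterion #1 sketch: every contiguous run of at least two statements whose
// line span is at most maxLines (the command-line limit). Requiring two or
// more statements is an assumption.
std::vector<Region> feasibleByLines(const std::vector<Statement>& stmts, int maxLines)
{
  std::vector<Region> out;
  const int n = static_cast<int>(stmts.size());
  for (int b = 0; b < n; ++b) {
    for (int e = b + 2; e <= n; ++e) {
      int span = stmts[e - 1].lastLine - stmts[b].firstLine + 1;
      if (span > maxLines) break;   // extending the run only makes it longer
      out.push_back({b, e});
    }
  }
  return out;
}
```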

Top-Down Approach for Determining Parameters
[Figure: statements on source lines 34 to 37 (int _value = param; str = str + " string"; std::string str = "test"; str = str + " string"; param = ++_value; …), with a constant bottom pointer and a top pointer moving downwards.]
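
The top-down approach can be sketched as follows: with the bottom pointer fixed, the top pointer moves down one statement at a time, and each candidate region needs as parameters every name it uses but does not itself declare. The Stmt record with declares/uses sets is a hypothetical parser output, and the sketch ignores names declared inside the region but used after it, which a real extractor would also have to handle.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical per-statement summary produced by the parser.
struct Stmt {
  std::set<std::string> declares;  // names declared by this statement
  std::set<std::string> uses;      // names read or written by this statement
};

// Parameters needed to extract statements [top, bottom]: every name used in
// the region before (or without) being declared inside it.
std::set<std::string> regionParameters(const std::vector<Stmt>& body, size_t top, size_t bottom)
{
  std::set<std::string> declared, params;
  for (size_t i = top; i <= bottom; ++i) {
    for (const auto& u : body[i].uses)
      if (!declared.count(u)) params.insert(u);
    declared.insert(body[i].declares.begin(), body[i].declares.end());
  }
  return params;
}

// Constant bottom pointer, top pointer moving downwards: parameter count of
// each candidate region [top, bottom].
std::vector<size_t> parameterCounts(const std::vector<Stmt>& body, size_t bottom)
{
  std::vector<size_t> counts;
  for (size_t top = 0; top <= bottom; ++top)
    counts.push_back(regionParameters(body, top, bottom).size());
  return counts;
}
```

For the statements int _value = param; std::string str = "test"; str = str + " string"; param = ++_value; the counts are 1, 2, 3, 2: starting at the first statement keeps _value and str local, so only param must be passed.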

Class Diagram of Parsers
[Figure: class diagram of the parsers, using a utility class]

Class Diagram of ICRNode
[Figure: class diagram of the ICRNode interface]

Class Diagram of RootObj

Association of DataObjects
[Figure: class relationship diagram of parse tree objects]

Representing Node Types
[Figure: class diagram of the different node types]

Class Diagram of DataObject

Hypothetical View of Hierarchy Stack
[Figure: a stack holding Root, class, function, try, catch, for, and if, with if at the top. The stack top pointer marks the top of the stack, which represents the current scope.]
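
The hierarchy stack can be sketched as a vector of scope kinds whose top is the current scope: a scope is pushed when its opening brace is seen and popped at the matching closing brace. The class name HierarchyStack and its methods are hypothetical. Because try and catch are sibling blocks in real C++, this sketch closes the try scope before the catch scope opens.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Scope-tracking sketch: push a scope at its '{', pop it at the matching '}'.
// The top of the stack is the scope the parser is currently inside.
class HierarchyStack {
public:
  HierarchyStack() { scopes_.push_back("Root"); }   // the root is never popped
  void open(const std::string& kind) { scopes_.push_back(kind); }
  void close() {
    if (scopes_.size() == 1) throw std::logic_error("unbalanced '}'");
    scopes_.pop_back();
  }
  const std::string& current() const { return scopes_.back(); }  // stack top
  size_t depth() const { return scopes_.size() - 1; }            // scopes above Root
private:
  std::vector<std::string> scopes_;
};
```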

Class Diagram of TempContainer
[Figure 3.16: Class Diagram of TempContainer]

Class Diagram: feasibleRegions and newFunctions
[Figure: class relationship diagram of feasibleRegions and newFunctions]

Class Diagram: FunctionParser and fileManager
[Figure: class relationship diagram of FunctionParser and fileManager]

Restructuring in Multiple Passes
Original length of 'testFun' function: 120 lines

Pass     Maximum number of parameters   Maximum number of lines   Number of lines in host method testFun
Pass 1   3                              20                        105
Pass 2                                                            97
Pass 3                                                            92
Pass 4                                                            79

Restructuring Functions
Original source code:

    void setRootValues()
    {
      try
      {
        std::string inFile = getInputFile();
        Directory dir;
        Scanner scanr;
        scanr.doRecursiveScan(inFile);
        dir.RestoreFirstDirectory();
        if(dir.dirContainIncludes())
          scanr.setFileIncludes();
        std::vector<std::string> _files = getCompleteFiles();
        if(_files.size() > 0)
        {
          RootObj* root = new RootObj();
          std::string _type = root->_typename();
          if(_type == "")
            _type = "pRoot";
          root->displayRootStats();
        }
      }
      catch(std::exception& ex)
      {
        std::cout << ex.what() << std::endl;
      }
    }

Extracted Function and Restructured Code

    void setRootValues_1()
    {
      std::string inFile = getInputFile();
      Directory dir;
      Scanner scanr;
      scanr.doRecursiveScan(inFile);
      dir.RestoreFirstDirectory();
      if(dir.dirContainIncludes())
        scanr.setFileIncludes();
    }

    void setRootValues()
    {
      try
      {
        setRootValues_1();
        std::vector<std::string> _files = getCompleteFiles();
        if(_files.size() > 0)
        {
          RootObj* root = new RootObj();
          std::string _type = root->_typename();
          if(_type == "")
            _type = "pRoot";
          root->displayRootStats();
        }
      }
      catch(std::exception& ex)
      {
        std::cout << ex.what() << std::endl;
      }
    }
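
The code-generation step for this parameterless case can be sketched as text manipulation on the host body: the selected lines move into a new function named after the host with a numeric suffix, and a call replaces them. The Extraction struct and extractRegion function are hypothetical; the sketch does not handle parameters, return values, or writing header and implementation files.

```cpp
#include <string>
#include <vector>

// Hypothetical result of one extraction.
struct Extraction {
  std::vector<std::string> newFunction;  // definition of the extracted function
  std::vector<std::string> hostBody;     // host body with the region replaced by a call
};

// Code-generation sketch (parameterless case only): moves body lines
// [first, last] into "void <host>_<n>()" and replaces them with a call.
Extraction extractRegion(const std::vector<std::string>& body, size_t first, size_t last,
                         const std::string& host, int n)
{
  const std::string name = host + "_" + std::to_string(n);
  Extraction out;
  out.newFunction.push_back("void " + name + "()");
  out.newFunction.push_back("{");
  for (size_t i = first; i <= last; ++i) out.newFunction.push_back(body[i]);
  out.newFunction.push_back("}");

  // Reuse the indentation of the first extracted line for the call.
  const std::string& firstLine = body[first];
  std::string indent = firstLine.substr(0, firstLine.find_first_not_of(" \t"));
  for (size_t i = 0; i < body.size(); ++i) {
    if (i == first) out.hostBody.push_back(indent + name + "();");
    if (i < first || i > last) out.hostBody.push_back(body[i]);
  }
  return out;
}
```

Calling extractRegion(body, 0, 2, "setRootValues", 1) on a body whose first three lines declare and use inFile and scanr produces a function named setRootValues_1 and a host body that begins with the call setRootValues_1();.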

Contributions
- Semi-automated software restructuring
- Parser for the host language
- Representation of source code structure as a parse tree
- Identification of feasible regions
- Demonstration with working code

Future Work
- Type analysis.
- Further optimization can be achieved.
- Semantic cues may help produce sensible functions.
- Other directions to consider, such as extracting objects.

Changes to Thesis Document
- Removed references to SMIRG.
- Re-formatted to match university regulations.

Demonstration
Simple code that shows:
- parsing of functions and methods
- correct management of header and implementation files

End of Presentation
Questions?