Cross Language Clone Analysis Team 2 November 22, 2010


Cross Language Clone Analysis Team 2 November 22, 2010 10/13/2010 Presentation 7

Agenda
- Feasibility Study
- Release Plan
- Architecture
- Parsing
- CodeDOM
- Clone Analysis
- Testing
- Demonstration
- Team Collaboration
- Path Forward

Our Team
- Allen Tucker
- Patricia Bradford
- Greg Rodgers
- Brian Bentley
- Ashley Chafin
Note: add roles at end (initial allocation of effort).

Feasibility Study: Our evaluation of the project to determine the difficulty in carrying out the task.

Task Summary
- Our customers: Dr. Etzkorn and Dr. Kraft
- Customer request: a tool that will abstract programs in C++, C#, Java, and (Python or VB) to the Dagstuhl Middle Metamodel, Microsoft CodeDOM, or something similar, and detect cross-language clones.
- Areas to note: the user interface; easy comparisons of clones; visualization of clones; sub-clones; clone detection for large bodies of code

Task Summary (cont.)
- Per our task, in order to find clones across different programming languages, we will first have to convert the code from each language to a language-independent object model.
- Some language-independent object models: Dagstuhl Middle Metamodel (DMM) and Microsoft CodeDOM
- Both of these models provide a language-independent object model for representing the structure of source code.

Related Research
- Detecting clones across multiple programming languages is on the cutting edge of research.
- A preliminary version of this was done by Dr. Kraft and his students for C# and VB. They compared the Mono C# parser (written in C#) to the Mono VB parser (written in VB).
- Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59

Task Understanding: Three Step Process
- Step 1, Code Translation: Source Files -> Translator -> Common Model
- Step 2, Clone Detection: Common Model -> Inspector -> Detected Clones
- Step 3, Visualization: Detected Clones -> UI -> Clone Visualization

Task Understanding (cont.)
- Step 1, Code Translation: C#, C++, Java, VB (or Python) -> CodeDOM
- Step 2, Clone Detection: leverage current clone detection techniques and research
- Step 3, Clone Visualization: need for an intuitive user interface

Clone Detection as a Product
- Commercial product
- What are the benefits of software clone detection?
- Main goal: decrease coding errors (bugs)

Benefits
- Fact: modularity is a key characteristic in today's software world.
- Why? It allows us to divide software into a decomposed separation of concerns, and it contributes to maintainability, reusability, testability, and reliability.
- Clone detection allows us to detect common software spread across large bodies of code.
- It identifies code that is a candidate for further modularization.

Benefits (cont)
- But not all code can be cleanly decomposed.
- Crosscutting concerns are responsible for tangling and scattering (code duplication) of an implementation.
- Example: logging, scattered across unrelated functions.
- How do you manage large areas of (usually) duplicated crosscuts? Errors, changes.

Benefits (cont)
- Aspect Oriented Programming: modularize crosscuts using advice and join points. Example: Spring Framework.
- Identifying aspects (crosscuts) is a time-consuming task.
- Use clone detection to identify aspects.
- Define rule.

Benefits (cont): Summary
- What? Detect code that is a candidate for modularity. Identify crosscuts in modules. Am I a candidate for AOP?
- How? Continuous integration: generate reports every time new code is added.

Features
- Clone detection software suite: identifies, tracks, and manages software clones
- Multi-language support: C++, C#, Java

Features (cont)
- Provides complete code coverage
- Multi-application support: stand-alone; plug-in based (Eclipse); backend service (Ant task)

Features (cont)
- Extensible: built on a plug-in framework; add new languages
- Easy to navigate between clones
- Persists clones for easy retrieval

Human Factors
- Designing to meet user needs
- User-centered approach
- Need for an intuitive user interface
- Clone visualization techniques

Intellectual Property: The University of Alabama in Huntsville would own and manage any and all intellectual property associated with the research and developmental artifacts of this project.

Project and Development Issues
Fast, Good, and Cheap: choose two.
- Fast: time required to deliver products
- Good: quality of product
- Cheap: cost of designing and building

Risk Analysis
- Complexity of the problem proves more difficult than initial estimates.
- Technology to be applied is not well established or has yet to be developed.
- Unable to complete defined project scope within schedule.
- Volatile user requirements leading to redefinition of project objectives.

Project Scale-Down Factors
Our initial approach: maximize use of existing open-source components in order to reduce the project timeline. Factors that work against it:
- Instability in harvested projects.
- Lack of support (documentation, forums, etc.).
- Disjoint project code bases.
- Non-existent code bases to harvest from.

Release Plan: Release Plan and User Stories

User Story Approach
- User Stories Applied: formal approach suggested by Mike Cohn
- Template: As a (role) I want (something) so that (benefit).
- Quality attributes: Independent; Negotiable; Valuable to user or customers; Estimatable; Small; Testable

Re-tooled User Stories
- Came out with original Release Plan on 9/15/20
- Due to customer wants/needs, we had to re-tool our user stories.
- Dr. Etzkorn's main concerns: load source code and translate it to a language-independent model; analyze the translated source code for clones.
- Results from meeting: created two new user stories (see next two slides). These two user stories have been pushed to the front of our card stack.

Analysis
There are three Agile levels of planning.
The first level is the release plan: a group of stories selected because they represent a usable set of features that can be released together. A release plan is made either by selecting the stories and deciding how many iterations are needed, or by selecting a release date and seeing how much can be done by then. Release plans have no details other than a list of stories to be done by a date.
The second level is the iteration or sprint plan: a subset of the release plan stories that will be done in the very next iteration or sprint. Only one iteration plan exists at a time. With our chosen collection of important features, we can now estimate the amount of effort to implement them. The people who will do the work, namely the developers, have authority to set the estimates. The manager sets the total amount of work that can be planned for the next iteration. The customer then chooses a subset of the most important features that will fit into the next iteration. The iteration plan is often verified by breaking the stories into development tasks and estimating them in finer-grained units. At this level use cases could also be created. This greater level of detail is permissible because iterations or sprints are kept very short.
The third level is the daily plan. A daily plan isn't usually represented by any artifacts. At the daily scrum or stand-up meeting everyone announces their plan for the day and then acts on it. Even greater detail is allowed because the plan's duration is one day and no more.

CS 666 Studio I User Stories Phase I

Summary
- ~68 remaining development days
- Focus on top 3 user stories
- Focus on translation and analysis

Source Code Load & Translate (story 017, priority 1, 14 days): As an analyst I want to load and translate my source code projects so I can analyze the source for clones.

Source Code Analyze (story 018, priority 1, 14 days): As an analyst I want to analyze my source code projects so I can see the clones.

Code Clone Highlights (story 002, priority 1, 14 days): As an analyst I want the capability to have the source code associated with clones highlighted within source files so that they are easy to identify.

CS 668 Software Studio II Phase II

Summary
- ~80 development days
- Focus on next 5 user stories
- Focus on analysis capabilities

Auto-Navigate (story 013, priority 2, 7 days): As a developer I want the capability to auto-browse to the code segment associated with a clone so I do not have to manually search for it.

Visual Reports (story 003, priority 1, 21 days): As an analyst I want the capability to generate reports on clones within projects in a number of formats (e.g. html, csv, etc.) so that I can include them in presentations.

Clone Density Graph (story 014, priority 1, 21 days): As an analyst I want the capability to have a project's clone density reported in graph form so I can visually see the distribution of detected clones within a project.

Project Management (story 001, priority 10, 5 days): As an analyst I want the capability to load and manage multiple projects within the application so that I can perform analysis on them at various times without having to reload them.

Analysis Options (story 005, priority 3, 20 days): As an analyst I want the capability to view summary analysis data (e.g. clones per file, package, project, etc.) so that I can identify the distribution of clones within a project.

Follow-On Work: Future Capabilities

Project Language Auto-Detection (story 010, priority 8, 14 days): As an analyst I want the capability to have the language of a source code project auto-detected so I do not have to define it.

Clone Categorization (story 008, priority 5, 14 days): As an analyst I want the capability to have the detected clones categorized by a number of criteria (e.g. type, priority, etc.) so that work prioritization can be established.

False Positive Identification (story 004, priority 7, 14 days): As an analyst I want the capability to label a prospective clone as a false positive so that it will be ignored in analysis and reports.

Development Environment Integration (story 007, priority 4, 30 days): As a developer I want the capability to integrate the clone detection tool directly into my development environment (e.g. eclipse, netbeans, visual studio, etc.) so that I have a single application with all development tools integrated.

Project History (story 012, priority 6, 21 days): As an analyst I want the capability to see project change history (e.g. initial project, xx clones found, clone id yyy removed, project updated, xx new clones found, etc.) so I can assess the impact of code changes within a project.

Detection Updates (story 011, priority 9, 21 days): As an analyst I want the capability to update a project's associated source code and have the tool detect these changes and offer a detection re-do so I can make corrections to clones and see resolutions in action.

Interactive Help (story 015, priority 10, 21 days): As a general user I want an interactive help system with context-sensitive search so I can learn the system with ease.

Build Environment Integration (story 006, priority 10, 30 days): As a configuration manager I want the capability to integrate clone detection into an automated build environment (e.g. ant, nmake, msbuild, etc.) so that I can view reports on code projects as they are built.

Dropped User Stories: Cut By Customer

Source Code Association (story 009, priority 11, 5 days): The customer priority of 11 (normal range is 1 – 10) indicated it would be cut from scope. As an analyst I want the capability to retain or not to retain the associated source code with a project so I can reduce my project size footprint.

Current Tasks: Requirements & Models

Current Tasks' Requirements
- Requirements modeling for the first user story, "Source Code Load & Translate": load and parse C#, Java, C++ source code; translate the parsed C#, Java, C++ source code to CodeDOM; associate the CodeDOM to the original source code.
- Requirements modeling for the second user story, "Source Code Analyze": analyze CodeDOM for clones.

UML Model – Load & Parse

UML Model – Translate

UML Model – Associate

UML Model – Analyze

Architecture: Design and Architecture

Key Architecture Points
- Multi-language support
- Configurable for different platforms: stand-alone application, plug-in, backend service
- Extensible

Architecture (diagram)
- Front ends: Application User Interface, Web Interface, Eclipse Plug-in
- Core: Clone Detection Algorithms, Code Model, Service API
- Language Support (Interface): C# Service, Java Service, C++ Service, etc.

Core Unit
- Code Model: stores the code in a common format
- Application Programming Interface: used to embed clone detection in applications
- Language Service Interface: communication layer between the core and the specific language services
- Diagram components: Code Model, Clone Detection Algorithms, Core API, Language Service Interface
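
The relationship between the core and the language services can be sketched as follows. This is a minimal illustration in Python rather than the project's own language; the class names `CodeUnit`, `LanguageService`, `LineStubService`, and `Core`, the file-extension lookup, and the stub parsing are all assumptions made for the example, not the team's actual design.

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class CodeUnit:
    """Hypothetical stand-in for one translated source file in the common code model."""
    source_path: str
    language: str
    nodes: list = field(default_factory=list)


class LanguageService(ABC):
    """Hypothetical counterpart of a language service interface."""
    name: str
    extensions: tuple

    @abstractmethod
    def parse(self, source_path: str, text: str) -> CodeUnit:
        """Translate source text into the common code model."""


class LineStubService(LanguageService):
    """Stub: a real service would run a parser; this only records non-blank lines."""

    def parse(self, source_path, text):
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        return CodeUnit(source_path, self.name, lines)


class CSharpService(LineStubService):
    name = "C#"
    extensions = (".cs",)


class JavaService(LineStubService):
    name = "Java"
    extensions = (".java",)


class Core:
    """Keeps a registry of language services and dispatches files to them."""

    def __init__(self):
        self._services = {}

    def register(self, service: LanguageService) -> None:
        for extension in service.extensions:
            self._services[extension] = service

    def load(self, source_path: str, text: str) -> CodeUnit:
        extension = os.path.splitext(source_path)[1].lower()
        service = self._services.get(extension)
        if service is None:
            raise ValueError(f"no language service registered for {extension!r}")
        return service.parse(source_path, text)


core = Core()
core.register(CSharpService())
core.register(JavaService())
unit = core.load("Foo.java", "class Foo {\n}\n")
print(unit.language, len(unit.nodes))
```

Because the core only depends on the interface, supporting another language in this sketch means registering one more service; nothing in `Core` changes.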

Visual Studio Solution

Core

Core - API

Language Service

Language Service

Language Service

App Configuration

CRC Cards: Class Responsibility Collaboration Cards

Java Parser CRC. Responsibilities: parse Java source code; construct Java token tree. Collaborators: LALRParser (Gold Parser).

C# Parser CRC. Responsibilities: parse C# source code; construct C# token tree. Collaborators: LALRParser (Gold Parser).

LanguageService CRC. Responsibilities: defines standard interface for all language providers. Collaborators: ILanguageService.

JavaService CRC. Responsibilities: reads Java source code; understands Java grammar production rules; constructs CodeDOM compilation unit. Collaborators: Java Parser, CloneDetection, JavaCodeProvider, ILanguageService.

CsService CRC. Responsibilities: reads C# source code; understands C# grammar production rules; constructs CodeDOM compilation unit. Collaborators: C# Parser, CloneDetection, CsCodeProvider, ILanguageService.

CloneDetection CRC. Responsibilities: loads and manages language services; controls parsing; establishes associations between CodeDOM compilation units and source code files; compares code segments; provides bookkeeping for code segments. Collaborators: ILanguageService, CodeDomComparer, CodeDomSummary.

FileSetNode CRC. Responsibilities: manages file set tree information for a CloneProject.

ProjectNode CRC. Responsibilities: manages project tree information for a CloneProject.

SourceFileNode CRC. Responsibilities: manages source file tree information for a CloneProject.

EnabledValueConverter CRC. Responsibilities: manages enabled state for visual components bound to an object.

VisibilityValueConverter CRC. Responsibilities: manages visibility state for visual components bound to an object.

CloneProject CRC. Responsibilities: manages project information; knows the file sets associated with a project; knows the files associated with each file set; knows the name of the project; can add a file; can remove a file. Collaborators: PresentationModel, ILanguageService.

ProjectIO CRC. Responsibilities: save a CloneProject; open a CloneProject. Collaborators: CloneProject.

RecentProjectList CRC. Responsibilities: manages a list of recently viewed projects. Collaborators: CloneProject.

ProjectView CRC. Responsibilities: visual display of project tree. Collaborators: CloneProject, PresentationModel, ProjectNode, FileSetNode, SourceFileNode, ILanguageService.

App CRC. Responsibilities: startup class; manage visual theme.

MainFrame CRC. Responsibilities: manage application frame; manage user input (Save, Open, Close, Exit, Add File Set, Create New). Collaborators: PresentationModel, CloneProject, ProjectView.

PresentationModel CRC. Responsibilities: manage current project state (current project, clone detection, currently selected file). Collaborators: ICloneDetection, CloneProject.

Parsing: Our struggles and our successes.

Parsing Struggles & Successes
- We explored and conducted spikes on CSParser and CS CodeDOM Parser. They both had advantages and disadvantages. We came to the conclusion that neither of them was going to fit our needs.
- We explored and conducted a spike on GOLD Parser. We ultimately chose the GOLD Parser because it best fit our needs. It gave us a way to manage multiple language grammars with one engine.

C# Spike

C# Spike Review
- Spike objectives: CSParser familiarization; associated risks/shortfalls; project feasibility
- CSParser: a utility which parses C# source code and creates a CodeDOM tree of the code; open source; supports most language features; error handling for features not supported

C# Spike: CSParser Output

C# Spike Review (cont)
- Spike conclusion: some limitations, but they have workarounds; wrapper code needed.
- Moving on from the spike: this past iteration, we downloaded CSParser and familiarized ourselves with it further. Because several programs share the same name, we came across CS CodeDOM Parser as well.

CS Parser & CS CodeDOM Parser: the good and the bad for both
- CS Parser: good parser (parsed a lot of C# language features); no GUI (it is all command line); came with a large number of test cases; does not use CodeDOM
- CS CodeDOM Parser: general parsing; GUI; uses CodeDOM

C# Plan
- Since both programs have good and bad features, our plan is to combine them: CSParser + CS CodeDOM Parser.
- Planned combined features: good parsing; GUI; CodeDOM; test cases

GOLD Parsing System Spike

Topics To Discuss
- What is it?
- How does it work?
- What can we use it for?
- How can we extend it?

What Is GOLD? GOLD is a free parsing system that you can use to develop your own programming languages, scripting languages and interpreters. It strives to be a development tool that can be used with numerous programming languages and on multiple platforms. – www.devincook.com/goldparser

How It Works (Block Structure): Grammar -> Builder -> Compiled Grammar Table (*.cgt) -> Engine; Source Code -> Engine -> Parsed Data

How It Works (Components)
Three major components:
- Builder: reads a source grammar to construct a Compiled Grammar Table
- Compiled Grammar Table: stores LALR and DFA parse tables
- Engine: performs the actual parsing

How It Works (Process), Step 1: Write the grammar for the language being implemented (GOLD Meta-Language).
- Rules: Backus-Naur Form
- Terminals: regular expressions
- Character sets: set notation

How It Works (Process), Step 2: Analyze the grammar. LALR and DFA parse tables are constructed and saved in a Compiled Grammar Table file.

How It Works (Process), Step 3: Analyze the source text with the parser engine and construct a parse tree. The engine can be implemented in any number of programming languages.

Usage within CloneDigger
- Flow: Source Code + Compiled Grammar Table (*.cgt) -> Engine -> Parsed Data -> CodeDOM Conversion -> AST
- CodeDOM conversion: we need to write a routine to move data from the parsed tree to CodeDOM.
- Parsed data trees from the parser are stored in a consistent data structure, but are based on rules defined within the grammars.
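
A conversion routine of this kind might look like the sketch below. It is only an illustration in Python: the node classes, the rule names in `RULE_TO_KIND`, and the three target kinds are hypothetical, and a real conversion would target the CodeDOM classes and the production rules of the actual grammars.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    """Node of a grammar-specific parse tree (rule names differ per grammar)."""
    rule: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class ModelNode:
    """Node of the language-independent model (hypothetical stand-in for CodeDOM)."""
    kind: str
    name: str = ""
    children: list = field(default_factory=list)


# Hypothetical rule names; real ones come from each grammar's production rules.
RULE_TO_KIND = {
    "java": {"ClassDeclaration": "Type", "MethodDeclaration": "Method",
             "ReturnStatement": "Return"},
    "csharp": {"Class Decl": "Type", "Method Dec": "Method", "Return Stm": "Return"},
}


def convert(node: ParseNode, language: str) -> list:
    """Return the model nodes produced by this parse node and its subtree."""
    converted = [m for child in node.children for m in convert(child, language)]
    kind = RULE_TO_KIND[language].get(node.rule)
    if kind is None:
        # The rule has no counterpart in the common model (pure syntax such as
        # a body or block wrapper): splice its children into the parent.
        return converted
    return [ModelNode(kind, node.text, converted)]


java_tree = ParseNode("ClassDeclaration", "Foo", [
    ParseNode("ClassBody", children=[
        ParseNode("MethodDeclaration", "bar", [
            ParseNode("Block", children=[ParseNode("ReturnStatement")]),
        ]),
    ]),
])
csharp_tree = ParseNode("Class Decl", "Foo", [
    ParseNode("Method Dec", "bar", [ParseNode("Return Stm")]),
])

print(convert(java_tree, "java") == convert(csharp_tree, "csharp"))
```

The two input trees have different shapes and rule names because they come from different grammars, yet both convert to the same model tree, which is what makes a cross-language comparison possible.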

Task Understanding: Three Step Process
- Step 1, Code Translation: Source Files -> Translator -> Common Model. C#, C++, Java, VB (or Python) -> CodeDOM
- Step 2, Clone Detection: Common Model -> Inspector -> Detected Clones. Leverage current clone detection techniques and research.
- Step 3, Clone Visualization: Detected Clones -> UI -> Clone Visualization. Need for an intuitive user interface.

Extension and Enhancements
- Enhance grammars: update Java; update C#; define C++
- Share among other classmates with similar interests
- Share with the greater community

Grammars
What is a grammar? A set of rules of a specific kind, for forming strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
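
As a small illustration of "form only", the sketch below checks strings against a toy two-rule grammar. The grammar, the function names, and the hand-written recursive-descent recognizer are assumptions made for this example; GOLD itself works differently, generating LALR and DFA tables from the grammar.

```python
import re

# Toy grammar in BNF (hypothetical, for illustration only):
#   <Expr> ::= <Term> '+' <Expr> | <Term>
#   <Term> ::= Number | '(' <Expr> ')'
# Terminals are described by a regular expression.
TOKEN = re.compile(r"\s*(\d+|[()+])")


def lex(text):
    """Split text into tokens; raise ValueError on a character outside the alphabet."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens, pos):
    """Return the position after an <Expr> starting at pos, or -1 if none fits."""
    pos = parse_term(tokens, pos)
    if pos == -1:
        return -1
    if pos < len(tokens) and tokens[pos] == "+":
        return parse_expr(tokens, pos + 1)
    return pos


def parse_term(tokens, pos):
    """Return the position after a <Term> starting at pos, or -1 if none fits."""
    if pos >= len(tokens):
        return -1
    if tokens[pos].isdigit():
        return pos + 1
    if tokens[pos] == "(":
        pos = parse_expr(tokens, pos + 1)
        if pos != -1 and pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
    return -1


def is_valid(text):
    """True when the whole text can be derived from <Expr>."""
    try:
        tokens = lex(text)
    except ValueError:
        return False
    return parse_expr(tokens, 0) == len(tokens)


for sample in ["1 + (2 + 3)", "1 + + 2", "(1 + 2"]:
    print(repr(sample), is_valid(sample))
```

The recognizer only answers whether a string has a valid form. It never computes a sum, which mirrors the point that a grammar says nothing about meaning.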

Gold Parser Grammars
- Gold Parser uses context-free grammars that can be used to do Look-Ahead LR (LALR) parsing.
- LALR-compliant grammars that we already have: C#, Java, Visual Basic .NET

Grammar Example

C++ Grammar Issue
- Currently no LALR-compliant C++ grammar exists due to the overall complexity.
- Other C++ parsers exist, but their output format differs from that of the languages for which we already have Gold Parser grammars.
- We are still searching for C++ parsing solutions.

GOLD Parser Conclusion
- We plan to use the GOLD Parsing System.
- Tasks we have to complete: update the Java grammar; update the C# grammar; research "Define C++ grammar"; create a CodeDOM conversion to move data from the parsed tree to CodeDOM

GOLD Parsing System: GOLD Parsing, Populating CodeDOM

Topics To Discuss
- What are we doing?
- Compiled Grammar Table
- Bookkeeping
- Testing

How It Works (Block Structure): Grammar -> Builder -> Compiled Grammar Table (*.cgt) -> Engine; Source Code -> Engine -> Parsed Data

How It Works (Process): Typical output from the engine is a long nested tree.

Usage within CloneDigger
- Flow: Source Code + Compiled Grammar Table (*.cgt) -> Engine -> Parsed Data -> CodeDOM Conversion -> AST
- CodeDOM conversion: we need to write a routine to move data from the parsed tree to CodeDOM.
- Parsed data trees from the parser are stored in a consistent data structure, but are based on rules defined within the grammars.

Grammar Updates: GOLD Parser Grammar Updates

Grammar Updates
- Currently the grammars we have for the Gold Parser are outdated.
- Current Gold grammars: C# version 2.0; Java version 1.4
- Current available software versions: C# version 4.0; Java version 6

Grammar Update Issues
- Grammars for C# and Java are very complex and require a lot of work to build.
- ANTLR and Gold Parser grammars use completely different syntax.
- Positive note: other development is not halted by use of the older grammars.

Our Bookkeeping: Bookkeeping for parsing the multiple grammars

Compiled Grammar Table
- For Java, there are 359 production rules and 249 distinct symbols (terminal and non-terminal).
- For C#, there are 415 production rules and 279 distinct symbols (terminal and non-terminal).

Production Rule Dependencies

Our Grammar Bookkeeping
Since there are so many production rules, we came up with the following bookkeeping: a spreadsheet of the compiled grammar table (for each language) with each production rule indexed. This spreadsheet covers:
- various aspects of the language
- what we have/have not handled from the parser
- what we have/have not implemented into CodeDOM
- percentage complete
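
The percentage-complete columns of such a spreadsheet can be computed as in the sketch below. The `RuleStatus` record, its field names, and the four sample rows are hypothetical; they only illustrate the arithmetic, not the team's actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class RuleStatus:
    """One spreadsheet row: an indexed production rule and its progress flags."""
    index: int
    rule: str
    handled: bool      # a parser handler exists for this rule
    in_codedom: bool   # the rule is mapped into CodeDOM


def percent_complete(rows, attribute):
    """Percentage of rows whose given flag is set (0.0 for an empty sheet)."""
    if not rows:
        return 0.0
    done = sum(1 for row in rows if getattr(row, attribute))
    return 100.0 * done / len(rows)


# Made-up sample rows, not data from the project.
rows = [
    RuleStatus(0, "<Class Decl>", handled=True, in_codedom=True),
    RuleStatus(1, "<Method Dec>", handled=True, in_codedom=False),
    RuleStatus(2, "<Return Stm>", handled=True, in_codedom=False),
    RuleStatus(3, "<Switch Stm>", handled=False, in_codedom=False),
]

print(percent_complete(rows, "handled"))
print(percent_complete(rows, "in_codedom"))
```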

Our Grammar Bookkeeping

Parsing & CodeDOM Status
Parsing handlers' status:
- C# = 100% complete
- Java = 100% complete

CodeDOM: Language Independent Object Model

CodeDOM
- Document Object Model for source code
- API: [System.CodeDom]
- Only supports certain aspects of a language, since it is language agnostic. Good enough.
- What does it do? Programmatically constructs code.
- What doesn't it do? It does NOT parse.

CodeDOM Example (hierarchy)
- CodeCompileUnit
  - CodeNamespace
    - Imports
    - Types
      - Members: Event, Field, Method (Statements, Expression), Property

Clone Analysis: Clones & Dr. Kraft's Tool

Software Clones (definitions from Wikipedia)
- Duplicate code: a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity.
- Clones: sequences of duplicate code.
- There is no agreement in the research community on the exact notion of redundancy and cloning. Ira Baxter's definition of clones expresses this vagueness: "Clones are segments of code that are similar according to some definition of similarity." (Ira Baxter, 2002)

Clone Types
3 types of clones (definition of similarity):
- Type 1: an exact copy without modifications (except for whitespace and comments)
- Type 2: a syntactically identical copy; only variable, type, or function identifiers have been changed
- Type 3: a copy with further modifications; statements have been changed, reordered, added, or removed
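
The three types can be made concrete with a short sketch. It uses Python's standard `tokenize` module on Python snippets purely for illustration; the helper names, the `"ID"` placeholder, and the sample functions are assumptions of this example, and the project itself targets C#, Java, and C++ through CodeDOM.

```python
import io
import keyword
import tokenize

IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
MARKERS = {tokenize.NEWLINE: "NEWLINE", tokenize.INDENT: "INDENT",
           tokenize.DEDENT: "DEDENT"}


def token_stream(source, abstract_identifiers=False):
    """Tokens without comments or layout; optionally replace identifiers by 'ID'."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in MARKERS:
            result.append(MARKERS[tok.type])
        elif (abstract_identifiers and tok.type == tokenize.NAME
              and not keyword.iskeyword(tok.string)):
            result.append("ID")
        else:
            result.append(tok.string)
    return result


def classify(a, b):
    """Rough classification of b as a clone of a (a sketch, not a detector)."""
    if token_stream(a) == token_stream(b):
        return "Type 1"
    if token_stream(a, True) == token_stream(b, True):
        return "Type 2"
    return "Type 3 candidate or unrelated"


original = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)
# Type 1: only comments and whitespace differ.
type1 = (
    "def total(values):\n"
    "    # sum the values\n"
    "    result  =  0  # running sum\n"
    "\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)
# Type 2: identifiers renamed, structure unchanged.
type2 = (
    "def add_all(items):\n"
    "    acc = 0\n"
    "    for x in items:\n"
    "        acc += x\n"
    "    return acc\n"
)
# Type 3: a statement was added on top of the renaming.
type3 = (
    "def add_all(items):\n"
    "    acc = 0\n"
    "    for x in items:\n"
    "        if x > 0:\n"
    "            acc += x\n"
    "    return acc\n"
)

for candidate in (type1, type2, type3):
    print(classify(original, candidate))
```

Exact token comparison is enough for Types 1 and 2, but Type 3 needs a tolerance for inserted or removed statements, which is where a distance measure becomes useful.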

How Clones are Created
- Copy and paste programming (the Ctrl-C, Ctrl-V virus): the programmer selects a code fragment and copies it to another location. Sometimes these clones are modified slightly to adapt them to their new environment or purpose.
- Multiple developers (similar functionality, similar code): if a programmer sees functionality similar to what he is tasked to develop, he will reuse that code as a template and then customize the template in the pasted context.
- Plagiarism (code theft): code is simply copied without permission or attribution.

Clone Research
- Multi-language clone detection is on the cutting edge of research.
- Preliminary research: Dr. Kraft and students at UAB, for C# and VB. It utilizes the Mono parsers for C# and VB.
- Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59

Dr. Kraft Clone Analysis
- Performs comparisons of code files
- For each file, a CodeDOM tree is tokenized
- Uses the Levenshtein distance calculation: the minimum number of edits needed to transform one sequence into the other
- Distances are calculated; the distance determines the probability of a clone
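
The Levenshtein calculation over token sequences can be sketched as below. The dynamic-programming routine is the standard algorithm; the `similarity` normalization and the sample token streams are assumptions of this example and are not taken from Dr. Kraft's tool.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] is the distance between the tokens of a seen so far
    # (one fewer than in the current row) and the first j tokens of b.
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(previous[j] + 1,          # delete token_a
                               current[j - 1] + 1,       # insert token_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical normalization: 1.0 for identical sequences, lower as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical token streams, e.g. node type names read off two CodeDOM trees.
csharp_tokens = ["Method", "Param", "Assign", "Loop", "Assign", "Return"]
vb_tokens = ["Method", "Param", "Assign", "Loop", "Return"]

print(levenshtein(csharp_tokens, vb_tokens))
print(round(similarity(csharp_tokens, vb_tokens), 3))
```

Comparing every file against every other file with this routine is quadratic in both the number of files and the sequence lengths.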

Dr. Kraft Application

Limitations
- Only does file-to-file comparisons
- Does not detect clones in the same source file
- Can only detect Type 1 and some Type 2 clones
- Not very efficient (brute force)

Enhancements
- Add support for same-file clone detection
- Add support for Type 3 clone detection (requires more research)
- Provide a more efficient clone analysis algorithm

10/13/2010 Testing White Box & Black Box Testing

Testing Our Project
White Box Testing:
- Unit Testing
Black Box Testing:
- Production Rule Testing: allows us to test the robustness of our engine because we can force rule production errors.
- Regression Testing
- Automated Functional Testing

Unit Testing

Production Rule Test Input File Example

Functional Tests

Metrics Project Metrics

SLOC For Our Project
As of Nov 22, 2010, SLOC:
- CS666_Client = 1746 lines
- CS666_Core = 2653 lines
- CS666_CppParser = 155 lines
- CS666_CsParser = 3259 lines
- CS666_JavaParser = 3378 lines
- CS666_LanguageSupport = 84 lines
- CS666_UnitTests = 2162 lines
- Total = 13467 lines (including unit tests)
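As a quick cross-check, the per-project counts can be tallied. The figures below are copied from the slide as listed. They sum to 13437, which is 30 lines less than the stated total, so one figure on the slide may have been mistyped.

```python
# Per-project SLOC figures copied from the slide.
sloc = {
    "CS666_Client": 1746,
    "CS666_Core": 2653,
    "CS666_CppParser": 155,
    "CS666_CsParser": 3259,
    "CS666_JavaParser": 3378,
    "CS666_LanguageSupport": 84,
    "CS666_UnitTests": 2162,
}

total = sum(sloc.values())
without_tests = total - sloc["CS666_UnitTests"]

print("total:", total)
print("excluding unit tests:", without_tests)
```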

Demonstration Demonstration of our progress.

Demonstration
These are the things we would like to show you today:
- GUI work
- Project setup
- Save project
- Load project
- Loading of source code
- Parsing of source code
- Translation of source code

Team Collaboration Team 2 & Team 3

Team Collaboration
Due to Team 3’s team size, we have taken responsibility for gathering and sharing grammars. Team 3 has responsibility for the C++ parsing.
Both teams will:
- Use the same grammars & engines. We will both have limitations based on this. Ex: the Java grammar is based on Java 1.4 -> we are limited to using Java 1.4.
- Test the same grammars & engines. We will have two test beds.

Team Collaboration
Method of collaboration:
- Google Code project site: http://code.google.com/p/uah-studio-2010-2011/ (Team 4 team members have access to this site.)
- Meetings
- Email
What does our Google Code project contain?
- Source control for grammars & engines
- Bugs/Issues (Team 4 will have the ability to document new bugs.)
- Documents/Artifacts

Team Collaboration
Both teams met Monday (11-8-10) after class and performed the required Pair Programming.
Current Status:
- Team 2: All project source code has been made available. We are researching and working to update the Java and C# grammars.
- Team 3: Working on C++ parsing. Looking into another parser, ELSA.

Path Forward Current Status & Path Forward for Next Semester

Where we stand…
- Iteration 1: Parsing -> 85%. Completed parsing for Java & C#. No parsing for C++, but we have a foundation and design to start from.
- Iteration 2: Translation to CodeDOM -> 60%. We have the foundation and design completed. Now, it is a matter of turning the crank for the languages.
- Iteration 3: Clone Analysis -> 30%. Ported the majority of Dr. Kraft’s student project code. Started focusing on the GUI.

Task Understanding
Three Step Process:
- Step 1: Code Translation. C#, C++, Java, VB (or Python) -> CodeDOM
- Step 2: Clone Detection. Leverage current clone detection techniques and research.
- Step 3: Clone Visualization. Need for an intuitive user interface.
[Diagram labels: Source Files, Translator, Common Model, Inspector, Detected Clones, Clone Visualization UI]

Schedule

Path Forward
Our next step is to re-evaluate where we currently stand.
- Revisit Release Plan
- Pull in Software Studio I work that was not completed.
- Revisit User Stories
- Start off strong with unit tests not completed.