Cross Language Clone Analysis Team 2 November 22, 2010.

Presentation transcript:

Cross Language Clone Analysis Team 2 November 22, 2010

• Feasibility Study
• Release Plan
• Architecture
• Parsing
• CodeDOM
• Clone Analysis
• Testing
• Demonstration
• Team Collaboration
• Path Forward

• Allen Tucker
• Patricia Bradford
• Greg Rodgers
• Brian Bentley
• Ashley Chafin

Our evaluation of the project to determine the difficulty of carrying out the task.

• Our customers: Dr. Etzkorn and Dr. Kraft
• Customer request:
  ◦ A tool that abstracts programs in C++, C#, Java, and (Python or VB) to the Dagstuhl Middle Metamodel, Microsoft CodeDOM, or something similar, and detects cross-language clones.
• Areas to note:
  ◦ the user interface
  ◦ easy comparisons of clones
  ◦ visualization of clones
  ◦ sub-clones
  ◦ clone detection for large bodies of code

• Per our task, to find clones across different programming languages, we first have to convert the code from each language to a language-independent object model.
• Some language-independent object models:
  ◦ Dagstuhl Middle Metamodel (DMM)
  ◦ Microsoft CodeDOM
• Both of these models provide a language-independent object model for representing the structure of source code.

• Three-step process:
  ◦ Step 1, Code Translation: Source Files -> Translator -> Common Model
  ◦ Step 2, Clone Detection: Common Model -> Inspector -> Detected Clones
  ◦ Step 3, Visualization: Detected Clones -> UI -> Clone Visualization

• Fact: modularity is a key characteristic in today’s software world.
• Why? It allows us to decompose software into separate concerns.
  ◦ Contributes to maintainability, reusability, testability, and reliability
• Clone detection allows us to detect common software spread across large bodies of code.
  ◦ Identifies code that is a candidate for further modularization

• Clone detection software suite
  ◦ Identifies, tracks, and manages software clones
• Multi-language support
  ◦ C++
  ◦ C#
  ◦ Java

• Provides complete code coverage
• Multi-application support
  ◦ Stand-alone
  ◦ Plug-in based (Eclipse)
  ◦ Backend service (Ant task)
• Extensible
  ◦ Built on a plug-in framework
  ◦ Add new languages
• Easy to navigate between clones
• Persists clones for easy retrieval

• The complexity of the problem proves greater than initial estimates.
• The technology to be applied is not well established or has yet to be developed.
• We are unable to complete the defined project scope within the schedule.
• Volatile user requirements lead to redefinition of project objectives.

Release Plan and User Stories

• Came out with the original Release Plan on 9/15/20
• Due to customer wants and needs, we had to retool our user stories.
• Dr. Etzkorn’s main concerns:
  ◦ Load source code and translate it to a language-independent model
  ◦ Analyze the translated source code for clones
• Results from the meeting:
  ◦ Created two new user stories (see next two slides)
  ◦ These two user stories have been pushed to the front of our card stack

Phase I

Story ID: Priority: Estimate: Days
• As an analyst, I want to load and translate my source code projects so I can analyze the source for clones.

Story ID: Priority: Estimate: Days
• As an analyst, I want to analyze my source code projects so I can see the clones.

Story ID: Priority: Estimate: Days
• As an analyst, I want the source code associated with clones highlighted within source files so that the clones are easy to identify.

Requirements & Models

• Requirements modeling for the first user story, “Source Code Load & Translate”:
  ◦ Load and parse C#, Java, and C++ source code.
  ◦ Translate the parsed C#, Java, and C++ source code to CodeDOM.
  ◦ Associate the CodeDOM with the original source code.
• Requirements modeling for the second user story, “Source Code Analyze”:
  ◦ Analyze CodeDOM for clones.


Design and Architecture

• Multi-language support
• Configurable for different platforms
  ◦ Stand-alone application
  ◦ Plug-in
  ◦ Backend service
• Extensible

[Architecture diagram. Labels: C# Service, Java Service, C++ Service; Language Support (Interface); Core API; Code Model; Clone Detection Algorithms; Application User Interface; Eclipse Plug-in; Web Interface; etc.]

• Code Model
  ◦ Stores the code in a common format
• Application Programming Interface
  ◦ Used to embed clone detection in applications
• Language Service Interface
  ◦ Communication layer between the core and the specific language services

Class Responsibility Collaboration Cards

Java Parser
• Responsibilities:
  ◦ Parse Java source code
  ◦ Construct Java token tree
• Collaborators:
  ◦ LALRParser (GOLD Parser)

C# Parser
• Responsibilities:
  ◦ Parse C# source code
  ◦ Construct C# token tree
• Collaborators:
  ◦ LALRParser (GOLD Parser)

LanguageService
• Responsibilities:
  ◦ Defines the standard interface for all language providers
• Collaborators:
  ◦ ILanguageService

JavaService
• Responsibilities:
  ◦ Reads Java source code
  ◦ Understands Java grammar production rules
  ◦ Constructs the CodeDOM compilation unit
• Collaborators:
  ◦ Java Parser
  ◦ CloneDetection
  ◦ JavaCodeProvider
  ◦ ILanguageService

CsService
• Responsibilities:
  ◦ Reads C# source code
  ◦ Understands C# grammar production rules
  ◦ Constructs the CodeDOM compilation unit
• Collaborators:
  ◦ C# Parser
  ◦ CloneDetection
  ◦ CsCodeProvider
  ◦ ILanguageService

CloneDetection
• Responsibilities:
  ◦ Loads and manages language services
  ◦ Controls parsing
  ◦ Establishes associations between CodeDOM compilation units and source code files
  ◦ Compares code segments
  ◦ Provides bookkeeping for code segments
• Collaborators:
  ◦ ILanguageService
  ◦ CodeDomComparer
  ◦ CodeDomSummary
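
The plug-in relationship between the core and the language services on these cards can be sketched roughly as follows. This is a minimal illustration in Python, not the team's C# code: the method names `parse`, `register`, and `load`, the `extensions` attribute, and the dictionary standing in for a CodeDOM compilation unit are all assumptions made for the example. Only the class names come from the cards.

```python
import os
from abc import ABC, abstractmethod


class LanguageService(ABC):
    """Hypothetical stand-in for the ILanguageService interface on the cards."""

    extensions = ()  # file extensions this service handles (assumed attribute)

    @abstractmethod
    def parse(self, source: str) -> dict:
        """Return a placeholder 'compilation unit' for the given source text."""


class JavaService(LanguageService):
    extensions = (".java",)

    def parse(self, source: str) -> dict:
        # A real service would run the parser and build a CodeDOM unit here.
        return {"language": "Java", "lines": len(source.splitlines())}


class CsService(LanguageService):
    extensions = (".cs",)

    def parse(self, source: str) -> dict:
        return {"language": "C#", "lines": len(source.splitlines())}


class CloneDetection:
    """Loads language services and keeps the file-to-unit associations."""

    def __init__(self):
        self._services = {}  # extension -> service
        self.units = {}      # file name -> compilation unit

    def register(self, service: LanguageService) -> None:
        for ext in service.extensions:
            self._services[ext] = service

    def load(self, filename: str, source: str) -> dict:
        ext = os.path.splitext(filename)[1].lower()
        service = self._services.get(ext)
        if service is None:
            raise ValueError(f"no language service registered for {ext!r}")
        unit = service.parse(source)
        self.units[filename] = unit
        return unit


core = CloneDetection()
core.register(JavaService())
core.register(CsService())
unit = core.load("Account.java", "class Account {\n}\n")
```

Adding a language then means writing one more service class and registering it; the core never needs to know which grammar sits behind the interface.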

Our struggles and our successes.

• We explored and conducted spikes on CSParser and CS CodeDOM Parser.
  ◦ Both had advantages and disadvantages.
  ◦ We concluded that neither of them was going to fit our needs.
• We explored and conducted a spike on GOLD Parser.
  ◦ We ultimately chose the GOLD Parser because it best fit our needs.
• This gave us a way to manage multiple language grammars with one engine.

GOLD Parsing, Populating CodeDOM

[Diagram labels: Grammar, Compiled Grammar Table (*.cgt), Source Code, Parsed Data]

[Diagram labels: Grammar, Compiled Grammar Table (*.cgt), Source Code, Parsed Data]
• Typical output from the engine: a long nested tree

[Diagram labels: Compiled Grammar Table (*.cgt), Source Code, Parsed Data, AST, Conversion, CodeDOM]
• We need to write a routine to move data from the parsed tree to CodeDOM.
• Parsed data trees from the parser are stored in a consistent data structure, but are based on the rules defined within the grammars.
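
One common way to write such a conversion routine is a table of handlers keyed by production rule name, applied recursively over the nested tree. The sketch below is an assumption-laden illustration in Python: the `Node`, `TypeDecl`, and `Method` classes and the rule names `ClassDecl`, `MethodDecl`, `Identifier`, and `Statement` are invented for the example and do not come from the GOLD grammars or from System.CodeDom.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse-tree node: a production rule plus its children."""
    rule: str
    children: list = field(default_factory=list)
    text: str = ""


@dataclass
class Method:
    name: str
    statements: list = field(default_factory=list)


@dataclass
class TypeDecl:
    name: str
    members: list = field(default_factory=list)


def convert_class(node):
    # The class name is the first Identifier child; members are converted recursively.
    name = next(c.text for c in node.children if c.rule == "Identifier")
    members = [convert(c) for c in node.children if c.rule in HANDLERS]
    return TypeDecl(name, members)


def convert_method(node):
    name = next(c.text for c in node.children if c.rule == "Identifier")
    statements = [c.text for c in node.children if c.rule == "Statement"]
    return Method(name, statements)


# One handler per production rule that maps onto the common model.
HANDLERS = {"ClassDecl": convert_class, "MethodDecl": convert_method}


def convert(node):
    handler = HANDLERS.get(node.rule)
    if handler is None:
        raise KeyError(f"no handler for production rule {node.rule!r}")
    return handler(node)


tree = Node("ClassDecl", [
    Node("Identifier", text="Account"),
    Node("MethodDecl", [
        Node("Identifier", text="deposit"),
        Node("Statement", text="balance += amount;"),
    ]),
])
model = convert(tree)
```

Because the tree structure is the same for every grammar and only the rule names differ, each language needs its own handler table but can share the traversal.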

Bookkeeping for parsing the multiple grammars.

• Currently, the grammars we have for the GOLD Parser are outdated.
• Current GOLD grammars
  ◦ C# version 2.0
  ◦ Java version 1.4
• Currently available language versions
  ◦ C# version 4.0
  ◦ Java version 6

• Grammars for C# and Java are very complex and require a lot of work to build.
• ANTLR and GOLD Parser grammars use completely different syntax.
• Positive note: other development is not halted by the use of older grammars.

Bookkeeping for parsing the multiple grammars

• For Java, there are…
  ◦ 359 production rules
  ◦ 249 distinct symbols (terminal & non-terminal)
• For C#, there are…
  ◦ 415 production rules
  ◦ 279 distinct symbols (terminal & non-terminal)

Since there are so many production rules, we came up with the following bookkeeping:
• A spreadsheet of the compiled grammar table (for each language) with each production rule indexed.
  ◦ This spreadsheet covers:
    • various aspects of the language
    • what we have/have not handled from the parser
    • what we have/have not implemented in CodeDOM
    • percentage complete
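
The percentage-complete column of such a spreadsheet is a simple ratio over the indexed rules. The sketch below shows the idea in Python; the `RuleStatus` record, its field names, and the four sample rules are hypothetical and stand in for spreadsheet rows.

```python
from dataclasses import dataclass


@dataclass
class RuleStatus:
    """One spreadsheet row (hypothetical layout): an indexed production rule."""
    index: int
    rule: str
    handled: bool     # handled from the parser output
    in_codedom: bool  # implemented in the CodeDOM translation


def percent_complete(rows, attr):
    """Share of rows, in percent, for which the given flag is set."""
    if not rows:
        return 0.0
    done = sum(1 for row in rows if getattr(row, attr))
    return 100.0 * done / len(rows)


rows = [
    RuleStatus(0, "<Class Decl>", True, True),
    RuleStatus(1, "<Method Decl>", True, False),
    RuleStatus(2, "<If Statement>", True, False),
    RuleStatus(3, "<Switch Statement>", False, False),
]
parsing_pct = percent_complete(rows, "handled")
codedom_pct = percent_complete(rows, "in_codedom")
```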

• Parsing handlers’ status:
  ◦ C# = 100% complete
  ◦ Java = 100% complete

Language Independent Object Model

• Document Object Model for source code
• API: [System.CodeDom]
• Only supports certain aspects of a language, since it is language agnostic
  ◦ Good enough
• What does it do?
  ◦ Programmatically constructs code
• What doesn’t it do?
  ◦ Does NOT parse

• CodeCompileUnit
  ◦ CodeNamespace
    • Imports
    • Types
      • Members
        • Event
        • Field
        • Method
          • Statements
            • Expression
        • Property

Clones & Dr. Kraft’s Tool

• 3 types of clones (definition of similarity):
  ◦ Type 1: An exact copy without modifications (except for whitespace and comments)
  ◦ Type 2: A syntactically identical copy
    • Only variable, type, or function identifiers have been changed
  ◦ Type 3: A copy with further modifications
    • Statements have been changed, reordered, added, or removed
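
The first two definitions can be made concrete with a small token-based check. The Python sketch below is only an illustration under simplifying assumptions: the toy tokenizer, the short keyword list, and the function names are invented here, and only `//` line comments are stripped. Type 3 detection needs a similarity measure rather than an equality test, so the function returns `None` for anything that is not Type 1 or Type 2.

```python
import re

# Small illustrative keyword set; a real tool would use the full language grammar.
KEYWORDS = {"int", "void", "return", "if", "else", "for", "while"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(code):
    """Split code into tokens, ignoring whitespace and // line comments."""
    code = re.sub(r"//[^\n]*", "", code)
    return TOKEN.findall(code)


def normalize(toks):
    """Replace every non-keyword identifier with the placeholder ID."""
    return ["ID" if re.match(r"[A-Za-z_]", t) and t not in KEYWORDS else t
            for t in toks]


def clone_type(a, b):
    """Return 1 or 2 for Type 1 / Type 2 clones, or None otherwise."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return 1
    if normalize(ta) == normalize(tb):
        return 2
    return None


original = "int sum(int a, int b) { return a + b; }"
reformatted = "int sum(int a,int b){\n  return a + b; // add\n}"
renamed = "int total(int x, int y) { return x + y; }"
changed = "int sum(int a, int b) { int c = a + b; return c; }"
```

Here `reformatted` differs from `original` only in layout and a comment, `renamed` only in identifiers, and `changed` has an added statement, which is why it falls outside both checks.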

• Multi-language clone detection
  ◦ Cutting edge of research
• Preliminary research
  ◦ Dr. Kraft and students at UAB
    • C# and VB
• Publication
  ◦ Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008
• Utilizes Mono parsers
  ◦ C#
  ◦ VB

• Performs comparisons of code files
• For each file, a CodeDOM tree is tokenized
• Uses Levenshtein distance calculation
  ◦ The minimum number of edits needed to transform one sequence into the other
• Distances calculated
  ◦ The distance determines the probability of a clone
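
As a rough illustration of the distance calculation, the Python sketch below computes the Levenshtein distance between two token sequences with the standard dynamic-programming recurrence. The `similarity` helper, which scales the distance by the longer sequence length, is an assumption made for the example; the slides do not say how the tool turns a distance into a clone probability.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed scoring: 1.0 for identical sequences, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


left = ["Method", "Param", "Param", "Return", "BinaryOp"]
right = ["Method", "Param", "Param", "Assign", "Return", "BinaryOp"]
distance = levenshtein(left, right)
score = similarity(left, right)
```

The function works on any sequences, so the same code compares strings character by character or token lists element by element.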

• Only does file-to-file comparisons
  ◦ Does not detect clones in the same source file
• Can only detect Type 1 and some Type 2 clones
• Not very efficient (brute force)

• Add support for same-file clone detection
• Add support for Type 3 clone detection
  ◦ Requires more research
• Provide a more efficient clone analysis algorithm
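
One possible direction for the first and last items, offered only as an illustration and not as the approach the team chose, is to index fixed-size token windows in a hash table. Repeated windows then surface in a single pass, and the technique works within one file as well as across files. The function name and window size below are assumptions.

```python
from collections import defaultdict


def repeated_windows(tokens, size):
    """Map each token window of the given size to its start positions, keeping repeats only."""
    seen = defaultdict(list)
    for start in range(len(tokens) - size + 1):
        seen[tuple(tokens[start:start + size])].append(start)
    return {window: starts for window, starts in seen.items() if len(starts) > 1}


file_tokens = "a = b + c ; x = 1 ; a = b + c ;".split()
repeats = repeated_windows(file_tokens, 6)
```

This only finds exact repeats of normalized tokens, so it corresponds to Type 1 and Type 2 detection; overlapping windows are reported separately and would need merging in a real tool.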

White Box & Black Box Testing

• White box testing:
  ◦ Unit testing
• Black box testing:
  ◦ Production rule testing
    • Allows us to test the robustness of our engine because we can force rule production errors.
    • Regression testing
    • Automated
  ◦ Functional testing

Project Metrics

• As of Nov 22, 2010
• SLOC:
  ◦ CS666_Client = 1746 lines
  ◦ CS666_Core = 2653 lines
  ◦ CS666_CppParser = 155 lines
  ◦ CS666_CsParser = 3259 lines
  ◦ CS666_JavaParser = 3378 lines
  ◦ CS666_LanguageSupport = 84 lines
  ◦ CS666_UnitTests = 2162 lines
• Total = lines (including unit tests)
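
The total figure did not survive in this transcript. As a cross-check, summing the per-project counts listed above gives 13,437 lines including unit tests, or 11,275 without CS666_UnitTests. This is arithmetic on the listed values, not a number taken from the original slide.

```python
# Per-project SLOC as listed on the slide.
sloc = {
    "CS666_Client": 1746,
    "CS666_Core": 2653,
    "CS666_CppParser": 155,
    "CS666_CsParser": 3259,
    "CS666_JavaParser": 3378,
    "CS666_LanguageSupport": 84,
    "CS666_UnitTests": 2162,
}

total = sum(sloc.values())
total_without_tests = total - sloc["CS666_UnitTests"]
```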

Demonstration of our progress.

• These are the things we would like to show you today:
  ◦ GUI work
  ◦ Project setup
    • Save project
    • Load project
  ◦ Loading of source code
  ◦ Parsing of source code
  ◦ Translation of source code

Team 2 & Team 3

• Due to Team 3’s team size, we have taken responsibility for gathering and sharing grammars.
• Team 3 has responsibility for the C++ parsing.
• Both teams will…
  ◦ Use the same grammars and engines
    • We will both have limitations based on this.
    • Ex: the Java grammar is based on version 1.4, so we are limited to using Java 1.4.
  ◦ Test the same grammars and engines
    • We will have two test beds.

• Both teams met Monday ( ) after class and performed the required pair programming.
• Current status:
  ◦ Team 2
    • All project source code has been made available.
    • We are researching and working to update the Java and C# grammars.
  ◦ Team 3
    • Team 3 is working on C++ parsing.
    • Looking into another parser, ELSA.

Current Status & Path Forward for Next Semester

Where we stand…
• Iteration 1: Parsing -> 85%
  ◦ Completed parsing for Java & C#
  ◦ No parsing for C++
    • But we have a foundation and design to start from.
• Iteration 2: Translation to CodeDOM -> 60%
  ◦ We have the foundation and design completed.
  ◦ Now, it is a matter of turning the crank for the languages.
• Iteration 3: Clone Analysis -> 30%
  ◦ Ported the majority of Dr. Kraft’s student project code.
  ◦ Started focusing on the GUI

• Three-step process:
  ◦ Step 1, Code Translation: Source Files -> Translator -> Common Model
  ◦ Step 2, Clone Detection: Common Model -> Inspector -> Detected Clones
  ◦ Step 3, Visualization: Detected Clones -> UI -> Clone Visualization

Schedule

Path Forward
• Our next step is to re-evaluate where we currently stand.
  ◦ Revisit Release Plan
    • Pull in Software Studio I work that was not completed.
  ◦ Revisit User Stories
  ◦ Start off strong by finishing the unit tests that were not completed.