Cross Language Clone Analysis Team 2 February 3, 2011.


• Parsing/CodeDOM
• Clone Analysis
• Customer Meeting
• GUI Implementation
• Testing
• Current Status
• Path Forward

• Allen Tucker
• Patricia Bradford
• Greg Rodgers
• Ashley Chafin

Quick Overview
A quick overview of our project and where we currently stand.

Clone Types
• 3 Types of Clones (Definition of Similarity):
  ◦ Type 1: An exact copy without modifications (except for whitespace and comments)
  ◦ Type 2: A syntactically identical copy
    ▪ Only variable, type, or function identifiers have been changed
  ◦ Type 3: A copy with further modifications
    ▪ Statements have been changed, reordered, added, or removed

Task Understanding
• Three-Step Process
  ◦ Step 1: Code Translation
  ◦ Step 2: Clone Detection
  ◦ Step 3: Visualization
[Diagram: Source Files → Translator → Common Model → Inspector → Detected Clones → UI → Clone Visualization]

Task Understanding (cont.)
• Step 1: Code Translation
  ◦ C#, C++, Java, VB (or Python)
  ◦ CodeDOM
• Step 2: Clone Detection
  ◦ Leverage current clone detection techniques and research
• Step 3: Clone Visualization
  ◦ Need for an intuitive user interface

Dr. Kraft's Application

Limitations  Only does file-to-file comparisons ◦ Does not detect clones in same source file  Can only detect Type 1 and some Type 2 clones  Not very efficient (brute force) 9

Enhancements
• Add support for same-file clone detection
• Add support for Type 3 clone detection
  ◦ Requires more research
• Provide a more efficient clone analysis algorithm

Features  Clone Detection Software Suite ◦ Identifies ◦ Tracks ◦ Manages Software Clones  Multi-language support ◦ C++ ◦ C# ◦ Java 11

Features (cont)  Extendible ◦ Built on a Plug-in Framework ◦ Add new languages  Easy to Navigate between Clones  Persists Clones for easy Retrieval 12

Features (cont)  Provides complete code coverage  Multi-Application Support ◦ Stand-alone ◦ Plug-in based (Eclipse) ◦ Backend service (Ant task)  Extendible ◦ Built on a Plug-in Framework ◦ Add new languages  Easy to Navigate between Clones  Persists Clones for easy Retrieval 13

Risks
• The problem proves more complex than initial estimates suggested.
• The technology to be applied is not well established or has yet to be developed.
• Unable to complete the defined project scope within the schedule.
• Volatile user requirements lead to redefinition of project objectives.

Architecture
Design and Architecture

Key Architecture Points
• Multi-language support
• Configurable for different platforms
  ◦ Stand-alone application
  ◦ Plug-in
  ◦ Backend service
• Extendable

Architecture
[Diagram: front ends (Application User Interface, Eclipse Plug-in, Web Interface, etc.) sit on top of the Core (API, Code Model, Clone Detection Algorithms); the Core reaches the C#, Java, and C++ services through the Language Support interface.]

Core Unit  Code Model ◦ Stores the code in common format  Application Programming Interface ◦ Used to embed clone detection in applications  Language Service Interface ◦ Communication layer between the core and the specific language services Code Model Clone Detection Algorithms Core API Language Service Interface 18
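The split between the core and the language services can be illustrated with a small plug-in registry. This is only a sketch under stated assumptions: the project itself is a Visual Studio solution, and the names LanguageService, Core, register, load, and LineService are hypothetical stand-ins, not the team's API.

```python
import os
from abc import ABC, abstractmethod


class LanguageService(ABC):
    """Hypothetical contract that each language plug-in fulfils for the core."""

    extensions = ()  # file extensions handled, e.g. (".java",)

    @abstractmethod
    def to_code_model(self, source):
        """Translate source text into the common code model."""


class Core:
    """Hypothetical core: picks a language service by file extension."""

    def __init__(self):
        self._services = {}

    def register(self, service):
        # One service may claim several extensions.
        for ext in service.extensions:
            self._services[ext.lower()] = service

    def load(self, filename, source):
        ext = os.path.splitext(filename)[1].lower()
        if ext not in self._services:
            raise ValueError(f"no language service registered for {ext!r}")
        return self._services[ext].to_code_model(source)


class LineService(LanguageService):
    """Toy service: the 'model' is just the list of non-blank lines."""

    extensions = (".java", ".cs")

    def to_code_model(self, source):
        return [line.strip() for line in source.splitlines() if line.strip()]
```

Because the core only ever talks to the abstract interface, adding a language means registering one more service rather than changing the detection code.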

Visual Studio Solution

Core

Core - API

Language Service

App Configuration

The Algorithm

• 3 Types of Clones (Definition of Similarity):
  ◦ Type 1: An exact copy without modifications (except for whitespace and comments)
  ◦ Type 2: A syntactically identical copy
    ▪ Only variable, type, or function identifiers have been changed
  ◦ Type 3: A copy with further modifications
    ▪ Statements have been changed, reordered, added, or removed

• CodeDOM Conversion
  ◦ Use Gold Parser for conversion
• Transformation
  ◦ Transform the CodeDOM elements into a sequence of tokens
• Match Detection
  ◦ Run comparison algorithm on transformed code
• Formatting
  ◦ Clone pair/class locations of the transformed code are mapped to the original code base by line numbers and file location
• Filtering
  ◦ Clones are extracted from the source, visualized, and manually analyzed to filter out false positives
[Diagram: intermediate artifacts between the stages are Code Base, Processed Code, Transformed Code, Clones, and Clone Pairs/Classes]
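The match-detection and formatting stages can be sketched for the simplest case, exact (Type 1) matches. This is a simplified illustration, not the team's algorithm: it works on raw source lines instead of CodeDOM, strips only `//` comments (and would mishandle `//` inside a string literal), and the function names and the `window` parameter are assumptions made for the example.

```python
from collections import defaultdict


def normalized_lines(source):
    """Yield (original_line_number, text) with // comments and all whitespace removed."""
    for number, line in enumerate(source.splitlines(), start=1):
        text = "".join(line.split("//", 1)[0].split())
        if text:  # skip lines that are blank after normalization
            yield number, text


def find_exact_clones(files, window=3):
    """Group identical runs of `window` normalized lines.

    files maps a file name to its source text. Each returned group is a list of
    (file_name, first_line, last_line) locations in the ORIGINAL files.
    """
    index = defaultdict(list)
    for name, source in files.items():
        lines = list(normalized_lines(source))
        for i in range(len(lines) - window + 1):
            chunk = lines[i:i + window]
            key = "\n".join(text for _, text in chunk)
            # Remember where the chunk starts and ends in the original file.
            index[key].append((name, chunk[0][0], chunk[-1][0]))
    return [locations for locations in index.values() if len(locations) > 1]
```

Indexing chunks in a dictionary avoids comparing every file against every other file, and two locations in the same file land in the same group just like locations in different files.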

• Convert source code to CodeDOM

• Transform the CodeDOM syntax to a sequence of tokens
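A minimal sketch of this transformation, assuming a regex tokenizer over raw source text rather than a walk over CodeDOM elements: identifiers, type names, and literals collapse to the placeholder `$p`, while keywords and punctuation are kept. The keyword set and the `normalize` function are illustrative assumptions, and comments are not handled.

```python
import re

# Assumption: a small illustrative keyword set, not a full language definition.
KEYWORDS = {"for", "while", "if", "else", "return"}

TOKEN_RE = re.compile(r'''
    [A-Za-z_]\w*                       # identifiers and keywords
  | \d+(?:\.\d+)?                      # numeric literals
  | "(?:\\.|[^"\\])*"                  # string literals
  | \+\+|--|<<|>>|[=!<>]=|&&|\|\|      # multi-character operators
  | \S                                 # any other single non-space character
''', re.VERBOSE)


def normalize(source):
    """Return the token string with names and literals replaced by $p."""
    out = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalnum() or tok[0] in '_"':
            out.append("$p")  # identifier, type name, or literal
        else:
            out.append(tok)   # operator or punctuation
    return "".join(out)
```

Two fragments that differ only in names, such as `int total = count + 1;` and `long sum = n + 2;`, normalize to the same string, which is what makes Type 2 clones comparable as plain text.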

• $p$p($p$p&$p){$p$p=$p;$p$p=$p.$p();for(; $p!=$p. $p();++$p){$p<<$p<<$p<<*$p<<$p;++$p;}}
• $p$p($p$p&$p){$p$p=$p;$p$p=$p.$p();for(; $p!=$p. $p();++$p){$p $p $p<<$p;++$p;}}
• Levenshtein Distance
  ◦ Minimum number of edits needed to transform one string into the other
    ▪ Insertion
    ▪ Deletion
    ▪ Substitution
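The distance can be computed with the standard dynamic-programming recurrence. The sketch below is a generic implementation, not the project's code; the `similarity` helper, which scales the distance by the longer string's length, is an added assumption about how a threshold might be applied.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("$p<<$p<<$p;", "$p<<$p;")` is 4, because the four characters `<<$p` have to be deleted. A small distance relative to the string length suggests a Type 3 candidate.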


Parsing and conversion to CodeDOM

How It Works (Block Structure)
[Diagram: Grammar → Compiled Grammar Table (*.cgt); Compiled Grammar Table + Source Code → Parsed Data]

How It Works (Process)
[Diagram: Grammar → Compiled Grammar Table (*.cgt); Compiled Grammar Table + Source Code → Parsed Data]
• Typical output from engine: a long nested tree

Usage within CloneDigger
[Diagram: Compiled Grammar Table (*.cgt) + Source Code → Parsed Data (AST) → CodeDOM Conversion]
• Need to write a routine to move data from the parsed tree to CodeDOM
• Parsed data trees from the parser are stored in a consistent data structure, but are based on rules defined within the grammars
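One way to write such a routine is to dispatch on the grammar rule that produced each tree node, so that only the rule-to-handler table differs per language. The sketch below is hypothetical throughout: ParseNode, ModelMethod, and the rule names "MethodDeclaration", "Identifier", and "Statement" are invented for illustration and are not taken from the Gold grammars or from System.CodeDom.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    rule: str                                      # grammar rule that produced this node
    children: list = field(default_factory=list)
    text: str = ""                                 # token text for leaf nodes


@dataclass
class ModelMethod:
    """Stand-in for a method element in the common code model."""
    name: str
    statements: list = field(default_factory=list)


def convert(node, handlers):
    """Walk the tree; when a rule has a handler, let it build a model element."""
    handler = handlers.get(node.rule)
    if handler is not None:
        return [handler(node)]
    results = []
    for child in node.children:
        results.extend(convert(child, handlers))
    return results


def java_method(node):
    # Assumed node shape: one Identifier child, then Statement children.
    name = next(c.text for c in node.children if c.rule == "Identifier")
    body = [c.text for c in node.children if c.rule == "Statement"]
    return ModelMethod(name, body)


# Per-language table: grammar rule name -> conversion function.
JAVA_HANDLERS = {"MethodDeclaration": java_method}
```

The tree walk stays the same for every language; supporting another grammar means supplying another handler table.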

Grammar Updates  Currently the grammars we have for the Gold parser are out dated.  Current Gold Grammars ◦ C# version 2.0 ◦ Java version 1.4  Current available software versions ◦ C# version 4.0 ◦ Java version 6 37

• Received grammar and included it in the project.
• One parser engine == Three languages

CodeDOM  Document Object Model for Source Code  API - [System.CodeDom]  Only supports certain aspects of the language since it’s language agnostic ◦ Good Enough  What Does it Do? ◦ Programmatically Constructs Code  What Doesn’t it Do? ◦ Does NOT parse 39

CodeDOM Example  CodeCompileUnit ◦ CodeNameSpace  Imports  Types  Members  Event  Field  Method  Statements  Expression  Property 40
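The containment hierarchy can be mirrored with a few plain data classes to show how a detector might walk it. These classes are Python stand-ins invented for illustration, not the System.CodeDom API, and the sketch simplifies by attaching statements only to methods.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Member:
    kind: str                  # "event", "field", "method", or "property"
    name: str
    statements: List[str] = field(default_factory=list)  # simplification: methods only


@dataclass
class TypeDecl:
    name: str
    members: List[Member] = field(default_factory=list)


@dataclass
class Namespace:
    name: str
    imports: List[str] = field(default_factory=list)
    types: List[TypeDecl] = field(default_factory=list)


@dataclass
class CompileUnit:
    namespaces: List[Namespace] = field(default_factory=list)


def method_statements(unit):
    """Yield (qualified_method_name, statements) for every method in the unit."""
    for ns in unit.namespaces:
        for decl in ns.types:
            for member in decl.members:
                if member.kind == "method":
                    yield f"{ns.name}.{decl.name}.{member.name}", member.statements
```

Walking from the compile unit down to method statements gives the per-method sequences that the transformation step turns into tokens.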

White Box and Black Box Testing

• White Box Testing:
  ◦ Unit Testing
• Black Box Testing:
  ◦ Production Rule Testing
    ▪ Allows us to test the robustness of our engine because we can force rule production errors.
    ▪ Regression Testing
    ▪ Automated
  ◦ Functional Testing

• Current test count: 33
• Added tests to cover existing code
• All tests are passing…
  ◦ "Happy path" tests
  ◦ Will begin off-nominal tests

Where we currently stand

Where we stand…
• These estimates are only for work done this semester.
• Source Code Load & Translate
  ◦ C %
  ◦ C# - 0%
  ◦ Java - 35%
  ◦ Associate - 0%
• Source Code Analyze
  ◦ Dr. Kraft's analysis technique - 40%
  ◦ Type 1 clones - 0% (implement next iteration)
  ◦ Type 2 clones - 0%
  ◦ Type 3 clones - 0%

Where we stand…
• Project Management
  ◦ Remove "demo" GUI - 100%
  ◦ Sketches for visual design - 40%
  ◦ GUI rework - 83%
• Testing
  ◦ Baseline unit tests - 100%
  ◦ Update unit tests for this iteration - 90%
  ◦ Create/update functional tests - 75%

• As of Feb 3, 2011
• SLOC:
  ◦ CS666_Client = 2137 lines
  ◦ CS666_Core = 2695 lines
  ◦ CS666_Console = 138 lines
  ◦ CS666_CppParser = 155 lines
  ◦ CS666_CsParser = 3265 lines
  ◦ CS666_JavaParser = 3388 lines
  ◦ CS666_LanguageSupport = 84 lines
  ◦ CS666_UnitTests = 944 lines
• Total = lines (including unit tests)
- Used lcounter.exe to count SLOC
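The total can be re-derived from the per-project counts listed above. The snippet assumes the eight listed projects are the complete set that went into the slide's total; the variable names are illustrative.

```python
# Per-project SLOC figures copied from the list above.
sloc = {
    "CS666_Client": 2137,
    "CS666_Core": 2695,
    "CS666_Console": 138,
    "CS666_CppParser": 155,
    "CS666_CsParser": 3265,
    "CS666_JavaParser": 3388,
    "CS666_LanguageSupport": 84,
    "CS666_UnitTests": 944,
}

total_with_tests = sum(sloc.values())
total_without_tests = total_with_tests - sloc["CS666_UnitTests"]

print("Total including unit tests:", total_with_tests)
print("Total excluding unit tests:", total_without_tests)
```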

Path Forward for the next iteration

Schedule

Next Iteration
• Below is a list of the tasks for our next iteration:
  ◦ Parsing/CodeDOM
    ▪ C++ parsing
    ▪ Complete Java conversion to CodeDOM
  ◦ Clone Analysis
    ▪ Detecting Type 1 clones
  ◦ GUI
    ▪ Project management
    ▪ Displaying source code
    ▪ Sketches for visual design

Next Iteration
  ◦ Documentation
    ▪ User Stories, Use Cases, UML Models, Sketches
    ▪ Project management
    ▪ Displaying source code
    ▪ Displaying CodeDOM
    ▪ Displaying Type 1 clones detected
    ▪ Functional Tests
    ▪ Update schedule
  ◦ Testing
    ▪ Unit tests
    ▪ Execute functional tests