Cross Language Clone Analysis Team 2 February 3, 2011.


1 Cross Language Clone Analysis Team 2 February 3, 2011

2 Parsing/CodeDOM Clone Analysis Customer Meeting GUI Implementation Testing Current Status Path Forward 2

3 Team
• Allen Tucker
• Patricia Bradford
• Greg Rodgers
• Ashley Chafin

4 Quick Overview
A quick overview of our project and where we currently stand.

5 Clone Types
• 3 Types of Clones (Definition of Similarity):
  ◦ Type 1: An exact copy without modifications (except for whitespace and comments)
  ◦ Type 2: A syntactically identical copy
    ▪ Only variable, type, or function identifiers have been changed
  ◦ Type 3: A copy with further modifications
    ▪ Statements have been changed, reordered, added, or removed
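The three definitions can be made concrete with a small sketch. The C-like snippets and the helper names (`tokens`, `is_type1_clone`) are illustrative assumptions, not code from the project. The check covers Type 1 only: two fragments match when their tokens are identical after comments and whitespace are discarded. The comment-stripping regexes are crude and would mis-handle `//` inside string literals.

```python
import re

def tokens(code: str) -> list[str]:
    """Split C-like code into tokens after removing comments (crude sketch)."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", "", code)                     # line comments
    return re.findall(r"\w+|[^\w\s]", code)                  # words or single symbols

def is_type1_clone(a: str, b: str) -> bool:
    """Type 1: identical apart from whitespace and comments."""
    return tokens(a) == tokens(b)

original = "int sum(int a, int b) { return a + b; }"
# Type 1: only layout and comments differ.
type1 = "int sum(int a, int b) {\n    // add the operands\n    return a + b;\n}"
# Type 2: identifiers renamed, structure unchanged.
type2 = "int add(int x, int y) { return x + y; }"
# Type 3: a statement was added on top of the renaming.
type3 = "int add(int x, int y) { int r = x + y; return r; }"
```

Only `type1` passes this check against `original`; recognising `type2` needs identifier abstraction, and `type3` needs an approximate comparison.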

6 Task Understanding
• Three Step Process
  ◦ Step 1: Code Translation
  ◦ Step 2: Clone Detection
  ◦ Step 3: Visualization
[Diagram labels: Source Files, Translator, Common Model, Inspector, Detected Clones, UI, Clone Visualization]

7 Task Understanding (cont.)
• Step 1: Code Translation
  ◦ C#, C++, Java, VB (or Python)
  ◦ CodeDOM
• Step 2: Clone Detection
  ◦ Leverage current clone detection techniques and research
• Step 3: Clone Visualization
  ◦ Need for an intuitive user interface
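The three steps can be sketched as a small pipeline. Everything here is a stand-in under stated assumptions: the "common model" is just a list of stripped lines per file rather than CodeDOM, detection is a brute-force comparison of identical lines across files, and the names `translate`, `detect`, `visualize`, and `ClonePair` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClonePair:
    file_a: str
    line_a: int
    file_b: str
    line_b: int

def translate(sources: dict[str, str]) -> dict[str, list[str]]:
    """Step 1 stand-in: map each file to a common model (stripped lines)."""
    return {name: [line.strip() for line in text.splitlines()]
            for name, text in sources.items()}

def detect(model: dict[str, list[str]]) -> list[ClonePair]:
    """Step 2 stand-in: brute-force search for identical lines in different files."""
    clones = []
    names = sorted(model)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for line_a, text_a in enumerate(model[a], start=1):
                if not text_a:
                    continue  # ignore blank lines
                for line_b, text_b in enumerate(model[b], start=1):
                    if text_a == text_b:
                        clones.append(ClonePair(a, line_a, b, line_b))
    return clones

def visualize(clones: list[ClonePair]) -> str:
    """Step 3 stand-in: a plain-text report instead of a GUI."""
    return "\n".join(f"{c.file_a}:{c.line_a} <-> {c.file_b}:{c.line_b}"
                     for c in clones)

sources = {"A.java": "int x = 1;\nfoo();", "B.cs": "\nfoo();"}
report = visualize(detect(translate(sources)))
```

Keeping the three stages as separate functions mirrors the slide: a different translator or detector can be swapped in without touching the reporting step.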

8 Dr. Kraft Application 8

9 Limitations
• Only performs file-to-file comparisons
  ◦ Does not detect clones within the same source file
• Can only detect Type 1 and some Type 2 clones
• Not very efficient (brute force)

10 Enhancements
• Add Support for Same File Clone Detection
• Add Support for Type 3 Clone Detection
  ◦ Requires more Research
• Provide a more efficient clone analysis algorithm

11 Features
• Clone Detection Software Suite
  ◦ Identifies, Tracks, and Manages Software Clones
• Multi-language support
  ◦ C++
  ◦ C#
  ◦ Java

12 Features (cont)
• Extensible
  ◦ Built on a Plug-in Framework
  ◦ Add new languages
• Easy to Navigate between Clones
• Persists Clones for easy Retrieval

13 Features (cont)
• Provides complete code coverage
• Multi-Application Support
  ◦ Stand-alone
  ◦ Plug-in based (Eclipse)
  ◦ Backend service (Ant task)
• Extensible
  ◦ Built on a Plug-in Framework
  ◦ Add new languages
• Easy to Navigate between Clones
• Persists Clones for easy Retrieval

14 Risks
• Complexity of the problem proves more difficult than initial estimates.
• Technology to be applied is either not well established or has yet to be developed.
• Unable to complete the defined project scope within schedule.
• Volatile user requirements lead to redefinition of project objectives.

15 Architecture Design and Architecture 15

16 Key Architecture Points
• Multi-language support
• Configurable for different platforms
  ◦ Stand-alone application
  ◦ Plug-in
  ◦ Backend service
• Extensible

17 Architecture
[Diagram labels: C# Service, Java Service, C++ Service, Etc… Service; Language Support (Interface); Code Model, Clone Detection Algorithms, Core API; Application User Interface, Eclipse Plug-in, Web Interface]

18 Core Unit
• Code Model
  ◦ Stores the code in a common format
• Application Programming Interface
  ◦ Used to embed clone detection in applications
• Language Service Interface
  ◦ Communication layer between the core and the specific language services
[Diagram labels: Code Model, Clone Detection Algorithms, Core API, Language Service Interface]
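One way to picture the Language Service Interface is as a small contract that each language plug-in implements and the core dispatches to. This is a hedged sketch in Python rather than the project's C# code; `LanguageService`, `Core`, `register`, `load`, and `WhitespaceTokenService` are hypothetical names, and the "code model" is reduced to a token list.

```python
import os
from abc import ABC, abstractmethod

class LanguageService(ABC):
    """Hypothetical contract between the core and a language plug-in."""
    extensions: tuple[str, ...] = ()  # file extensions this service handles

    @abstractmethod
    def parse(self, source: str) -> list[str]:
        """Translate source text into the common code model (here: tokens)."""

class Core:
    """Hypothetical core: keeps a registry of services keyed by file extension."""
    def __init__(self) -> None:
        self._services: dict[str, LanguageService] = {}

    def register(self, service: LanguageService) -> None:
        for ext in service.extensions:
            self._services[ext.lower()] = service

    def load(self, filename: str, source: str) -> list[str]:
        ext = os.path.splitext(filename)[1].lower()
        service = self._services.get(ext)
        if service is None:
            raise ValueError(f"no language service registered for {ext!r}")
        return service.parse(source)

class WhitespaceTokenService(LanguageService):
    """Toy plug-in: splits on whitespace instead of really parsing Java."""
    extensions = (".java",)

    def parse(self, source: str) -> list[str]:
        return source.split()

core = Core()
core.register(WhitespaceTokenService())
model = core.load("Example.java", "int x = 1 ;")
```

Adding a language then means registering one more `LanguageService` subclass; the core and its callers stay unchanged.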

19 Visual Studio Solution

20 Core

21 Core - API

22 Language Service

23 Language Service

24 Language Service

25 App Configuration

26 The Algorithm

27
• 3 Types of Clones (Definition of Similarity):
  ◦ Type 1: An exact copy without modifications (except for whitespace and comments)
  ◦ Type 2: A syntactically identical copy
    ▪ Only variable, type, or function identifiers have been changed
  ◦ Type 3: A copy with further modifications
    ▪ Statements have been changed, reordered, added, or removed

28
• CodeDOM Conversion: Use Gold Parser for conversion
• Transformation: Transform the CodeDOM elements into a sequence of tokens
• Match Detection: Run comparison algorithm on transformed code
• Formatting: Clone pair/class locations of the transformed code are mapped to the original code base by line numbers and file location
• Filtering: Clones are extracted from the source, visualized and manually analyzed to filter out false positives
[Diagram labels for intermediate artifacts: Code Base, Processed Code, Transformed Code, Clones, Clone Pairs/Classes]

29
• Convert source code to CodeDOM

30
• Transform the CodeDOM syntax to a sequence of tokens
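A minimal sketch of this transformation, assuming a simple rule: identifiers and numeric literals collapse to the placeholder `$p`, while a small set of control keywords and all punctuation are kept. The rule set, the `KEYWORDS` list, and the function name `to_token_string` are assumptions; the sketch works on raw C-like text rather than on CodeDOM elements.

```python
import re

# Assumed keyword list; anything else that looks like a name becomes "$p".
KEYWORDS = {"for", "while", "if", "else", "return", "do",
            "switch", "case", "break", "continue"}

# A name, a number, or any single non-space symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def to_token_string(code: str) -> str:
    """Abstract identifiers and literals to '$p', keep keywords and symbols."""
    out = []
    for tok in TOKEN.findall(code):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalnum() or tok[0] == "_":
            out.append("$p")
        else:
            out.append(tok)
    return "".join(out)

loop = to_token_string("for (; it != v.end(); ++it) { out << *it; }")
```

Because names are erased, two fragments that differ only in identifiers (Type 2 clones) produce the same token string.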

31
• $p$p($p$p&$p){$p$p=$p;$p$p=$p.$p();for(; $p!=$p. $p();++$p){$p<<$p<<$p<<*$p<<$p;++$p;}}
• $p$p($p$p&$p){$p$p=$p;$p$p=$p.$p();for(; $p!=$p. $p();++$p){$p $p $p<<$p;++$p;}}
• Levenshtein Distance
  ◦ Minimum number of edits needed to transform one string into the other
    ▪ Insertion
    ▪ Deletion
    ▪ Substitution
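The Levenshtein distance can be computed with the standard dynamic-programming recurrence over two rows. The `similarity` helper, which scales the distance into the range 0 to 1, is an added assumption about how a threshold might be applied; the slides do not say how the distance is turned into a clone decision.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for nothing shared."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Applied to two token strings, a small distance relative to their length suggests a near-miss (Type 3) clone candidate.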

32 32

33 Parsing and conversion to CodeDOM

34 How It Works (Block Structure)
[Diagram labels: Grammar, Compiled Grammar Table (*.cgt), Source Code, Parsed Data]

35 How It Works (Process)
[Diagram labels: Grammar, Compiled Grammar Table (*.cgt), Source Code, Parsed Data]
• Typical output from engine: a long nested tree

36 Usage within CloneDigger
[Diagram labels: Compiled Grammar Table (*.cgt), Source Code, Parsed Data, AST, CodeDOM Conversion]
• Need to write a routine to move data from the parsed tree to CodeDOM
• Parsed data trees from the parser are stored in a consistent data structure, but are based on rules defined within the grammars
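The conversion routine amounts to a recursive walk over the nested parse tree that picks out the rules of interest. The sketch below is an assumption-heavy illustration: `ParseNode` and `MethodModel` are invented types, and the rule names `MethodDecl`, `Identifier`, and `Statement` are hypothetical, since real rule names come from whichever grammar is loaded.

```python
from dataclasses import dataclass, field

@dataclass
class ParseNode:
    """Hypothetical parse-tree node: a grammar rule name, text, and children."""
    rule: str
    text: str = ""
    children: list["ParseNode"] = field(default_factory=list)

@dataclass
class MethodModel:
    """Simplified stand-in for a CodeDOM method element."""
    name: str
    statements: list[str] = field(default_factory=list)

def collect_methods(node: ParseNode) -> list[MethodModel]:
    """Walk the tree depth-first and build one MethodModel per method rule."""
    methods = []
    if node.rule == "MethodDecl":
        name = next((c.text for c in node.children if c.rule == "Identifier"), "")
        body = [c.text for c in node.children if c.rule == "Statement"]
        methods.append(MethodModel(name, body))
    for child in node.children:
        methods.extend(collect_methods(child))
    return methods

tree = ParseNode("CompilationUnit", children=[
    ParseNode("ClassDecl", children=[
        ParseNode("MethodDecl", children=[
            ParseNode("Identifier", "greet"),
            ParseNode("Statement", "print(name);"),
        ]),
    ]),
])
methods = collect_methods(tree)
```

Since the tree shape depends on the grammar, a mapping like this has to be written per language, while the target model stays the same.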

37 Grammar Updates
• The grammars we currently have for the Gold parser are outdated.
• Current Gold Grammars
  ◦ C# version 2.0
  ◦ Java version 1.4
• Currently available language versions
  ◦ C# version 4.0
  ◦ Java version 6

38
• Received grammar and included in project.
• One parser engine == Three languages

39 CodeDOM
• Document Object Model for Source Code
• API - [System.CodeDom]
• Only supports certain aspects of the language since it’s language agnostic
  ◦ Good Enough
• What Does it Do?
  ◦ Programmatically Constructs Code
• What Doesn’t it Do?
  ◦ Does NOT parse

40 CodeDOM Example
• CodeCompileUnit
  ◦ CodeNamespace
    ▪ Imports
    ▪ Types
      ▪ Members
        ▪ Event
        ▪ Field
        ▪ Method
          ▪ Statements
            ▪ Expression
        ▪ Property
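The containment hierarchy can be mirrored with plain data classes. This is a Python stand-in, not the .NET `System.CodeDom` API: the class names are borrowed from it for readability, but the fields are simplified assumptions (statements are plain strings, and events and properties are omitted).

```python
from dataclasses import dataclass, field

@dataclass
class CodeMemberMethod:
    name: str
    statements: list[str] = field(default_factory=list)

@dataclass
class CodeMemberField:
    name: str
    type_name: str

@dataclass
class CodeTypeDeclaration:
    name: str
    members: list = field(default_factory=list)  # methods, fields, ...

@dataclass
class CodeNamespace:
    name: str
    imports: list[str] = field(default_factory=list)
    types: list[CodeTypeDeclaration] = field(default_factory=list)

@dataclass
class CodeCompileUnit:
    namespaces: list[CodeNamespace] = field(default_factory=list)

def method_names(unit: CodeCompileUnit) -> list[str]:
    """Fully qualified names of every method in the unit."""
    return [f"{ns.name}.{t.name}.{m.name}"
            for ns in unit.namespaces
            for t in ns.types
            for m in t.members
            if isinstance(m, CodeMemberMethod)]

unit = CodeCompileUnit([
    CodeNamespace("Demo", imports=["System"], types=[
        CodeTypeDeclaration("Greeter", members=[
            CodeMemberField("name", "string"),
            CodeMemberMethod("Greet", ["Console.WriteLine(name);"]),
        ]),
    ]),
])
```

A clone detector would walk a structure like this instead of raw text, which is what makes the comparison language independent.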

41 White Box and Black Box Testing 41

42
• White Box Testing:
  ◦ Unit Testing
• Black Box Testing:
  ◦ Production Rule Testing
    ▪ Allows us to test the robustness of our engine because we can force rule production errors.
    ▪ Regression Testing
    ▪ Automated
  ◦ Functional Testing

43

44

45
• Current Test Count: 33
• Added tests to cover existing code
• All tests are passing…
  ◦ “Happy Path Tests”
  ◦ Will begin off-nominal tests

46

47 Where we currently stand 47

48 Where we stand…
• These estimates are only for work done this semester.
• Source Code Load & Translate
  ◦ C++: 10%
  ◦ C#: 0%
  ◦ Java: 35%
  ◦ Associate: 0%
• Source Code Analyze
  ◦ Dr. Kraft’s analysis technique: 40%
  ◦ Type 1 clones: 0% (Implement Next Iteration)
  ◦ Type 2 clones: 0%
  ◦ Type 3 clones: 0%

49 Where we stand…
• Project Management
  ◦ Remove “demo” GUI: 100%
  ◦ Sketches for visual design: 40%
  ◦ GUI Rework: 83%
• Testing
  ◦ Baseline unit tests: 100%
  ◦ Update unit test for this iteration: 90%
  ◦ Create/Update Functional Tests: 75%

50
• As of Feb 3, 2011
• SLOC (counted with lcounter.exe):
  ◦ CS666_Client = 2137 lines
  ◦ CS666_Core = 2695 lines
  ◦ CS666_Console = 138 lines
  ◦ CS666_CppParser = 155 lines
  ◦ CS666_CsParser = 3265 lines
  ◦ CS666_JavaParser = 3388 lines
  ◦ CS666_LanguageSupport = 84 lines
  ◦ CS666_UnitTests = 944 lines
• Total = 12806 lines (including unit tests)
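As a quick arithmetic check, the per-project counts can be summed to confirm the reported total. The figures are the ones on the slide; the variable names are illustrative.

```python
# SLOC per project, as reported on the slide.
sloc = {
    "CS666_Client": 2137,
    "CS666_Core": 2695,
    "CS666_Console": 138,
    "CS666_CppParser": 155,
    "CS666_CsParser": 3265,
    "CS666_JavaParser": 3388,
    "CS666_LanguageSupport": 84,
    "CS666_UnitTests": 944,
}

total = sum(sloc.values())                          # all projects, tests included
production = total - sloc["CS666_UnitTests"]        # everything except unit tests
largest = max(sloc, key=sloc.get)                   # project with the most lines
```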

51 Path Forward for the next iteration 51

52 52 Schedule

53 Next Iteration
• Below is a list of the tasks for our next iteration:
  ◦ Parsing/CodeDOM
    ▪ C++ parsing
    ▪ Complete Java conversion to CodeDOM
  ◦ Clone Analysis
    ▪ Detecting Type 1 clones
  ◦ GUI
    ▪ Project management
    ▪ Displaying source code
    ▪ Sketches for visual design

54 Next Iteration
  ◦ Documentation
    ▪ User Stories, Use Cases, UML Models, Sketches
    ▪ Project management
    ▪ Displaying source code
    ▪ Displaying CodeDOM
    ▪ Displaying Type 1 clones detected
    ▪ Functional Tests
    ▪ Update schedule
  ◦ Testing
    ▪ Unit tests
    ▪ Execute functional tests

