Performance evaluation of plagiarism detection method based on the intermediate language
Vedran Juričić, Tereza Jurić, Marija Tkalec



Plagiarism detection method
- Method for detecting plagiarism in source code for .Net languages
  - C#
  - Visual Basic .Net
  - C++
  - …
- Identify similar code fragments
- Determine similarity between source files
- Based on intermediate language

Plagiarism detection

First:

    using System.Text;
    namespace Test {
      class Math {
        public double GetMaximum(double[] Input) {
          double result = Input[0];
          foreach (double temp in Input) {
            if (temp>result)
              result = temp; }
          return result; } } }

Second:

    using System.Text;
    namespace Test {
      class Math {
        public double GetMaximum(double[] Input) {
          double result = Input[0];
          for (int i=0;i<Input.Length;i++) {
            if (Input[i]>result)
              result = Input[i]; }
          return result; } } }

Similarity = Number of overlapping lines / Total number of lines = 6 / 9 = 66.67%
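The figure above can be reproduced with a short sketch. It assumes that lines are compared position by position after trimming whitespace, which matches the 6 / 9 count on this slide; the function name `line_similarity` is hypothetical and does not come from any of the systems discussed.

```python
def line_similarity(first: str, second: str) -> float:
    """Share of line positions whose trimmed text is identical in both files."""
    a = [line.strip() for line in first.strip().splitlines()]
    b = [line.strip() for line in second.strip().splitlines()]
    total = max(len(a), len(b))
    if total == 0:
        return 0.0
    # Count positions where both files contain exactly the same line.
    overlapping = sum(1 for x, y in zip(a, b) if x == y)
    return overlapping / total


FIRST = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
foreach (double temp in Input) {
if (temp>result)
result = temp; }
return result; } } }
"""

SECOND = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
for (int i=0;i<Input.Length;i++) {
if (Input[i]>result)
result = Input[i]; }
return result; } } }
"""

similarity = line_similarity(FIRST, SECOND)
print(f"{similarity:.2%}")  # prints 66.67%
```

Only the loop header, the condition and the assignment differ, so six of the nine lines overlap.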

But…

First:

    using System.Text;
    namespace Test {
      class Math {
        public double GetMaximum(double[] Input) {
          double result = Input[0];
          foreach (double temp in Input) {
            if (temp>result)
              result = temp; }
          return result; } } }

Second:

    using System;
    namespace OtherTest {
      class MyClass {
        public double ReturnMaximum(double[] Array) {
          double current = Array[0];
          for (int j=0;j<Array.Length;j++) {
            if (Array[j]>current)
              current = Array[j]; }
          return current; } } }

Similarity = Number of overlapping lines / Total number of lines = 0 / 9 = 0.00%

Problems
- Modification of variable names, types, constants
- Modification of class member definitions
- Line and command reordering
- …

Solution
- Detailed analysis
- Complex preprocessing
- For each supported language

Our solution
- Convert from source language to low-level language (Common Intermediate Language)
- By using existing tools
  - Compiler
  - Disassembler
- Tools exist for all .Net languages
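The compile-then-disassemble step could be scripted roughly as follows. This is a sketch under assumptions: the slides do not say which tools ILMatch invokes, so the Microsoft C# compiler (`csc`) and IL Disassembler (`ildasm`) are used here as one possible pair, and the helper names `build_commands` and `source_to_cil` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(source: Path) -> tuple[list[str], list[str]]:
    """Build the compiler and disassembler command lines for one C# file."""
    assembly = source.with_suffix(".dll")
    listing = source.with_suffix(".il")
    # csc: compile the source file into a class library.
    compile_cmd = ["csc", "/target:library", f"/out:{assembly}", str(source)]
    # ildasm: write the textual CIL listing to a file instead of opening the GUI.
    disassemble_cmd = ["ildasm", str(assembly), f"/out={listing}"]
    return compile_cmd, disassemble_cmd


def source_to_cil(source: Path) -> Path:
    """Compile and disassemble `source`; assumes both tools are on PATH."""
    compile_cmd, disassemble_cmd = build_commands(source)
    subprocess.run(compile_cmd, check=True)
    subprocess.run(disassemble_cmd, check=True)
    return source.with_suffix(".il")
```

For another .Net language only the compiler command would change; the disassembly step stays the same because every compiler emits the same intermediate language.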

Our solution

C# language:

    using System.Text;
    namespace Test {
      class Math {
        public double GetMaximum(double[] Input) {
          double result = Input[0];
          foreach (double temp in Input) {
            if (temp>result)
              result = temp;
          }
          return result;
        }
      }
    }

C# compiler

Common Intermediate Language:

    .method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
    {
      // Code size 61 (0x3d)
      .maxstack 2
      .locals init (float64 V_0, float64 V_1, float64 V_2, float64[] V_3, int32 V_4, bool V_5)
      IL_0000: nop
      IL_0001: ldarg.1
      IL_0002: ldc.i4.0
      IL_0003: ldelem.r8
      IL_0004: stloc.0
      IL_0005: nop
      IL_0006: ldarg.1
      IL_0007: stloc.3
      …..
      IL_0037: ldloc.0
      IL_0038: stloc.2
      IL_0039: br.s IL_003b
      IL_003b: ldloc.2
      IL_003c: ret
    } // end of method C::GetMaximum

Opcodes only:

    nop
    ldarg.1
    ldc.i4.0
    ldelem.r8
    stloc.0
    nop
    ldarg.1
    stloc.3
    …
    ldloc.0
    stloc.2
    br.s
    ldloc.2
    ret
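The reduction from the full listing to a bare opcode sequence can be sketched as below. This assumes that labels and operands are simply discarded, which is what the slide's last column suggests; whether ILMatch does exactly this is not stated. The name `extract_opcodes` is hypothetical, and the sample listing is an excerpt of the one above.

```python
import re

# Matches a disassembler line such as "IL_0039: br.s IL_003b" and captures
# the opcode ("br.s"); the label and any operand are discarded.
IL_LINE = re.compile(r"^\s*IL_[0-9a-fA-F]{4}:\s+(\S+)")


def extract_opcodes(listing: str) -> list[str]:
    """Return the opcode of every labelled instruction in a CIL listing."""
    opcodes = []
    for line in listing.splitlines():
        match = IL_LINE.match(line)
        if match:
            opcodes.append(match.group(1))
    return opcodes


LISTING = """\
.method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
{
  // Code size 61 (0x3d)
  .maxstack 2
  IL_0000: nop
  IL_0001: ldarg.1
  IL_0002: ldc.i4.0
  IL_0003: ldelem.r8
  IL_0004: stloc.0
  IL_0039: br.s IL_003b
  IL_003b: ldloc.2
  IL_003c: ret
}
"""

print(extract_opcodes(LISTING))
# ['nop', 'ldarg.1', 'ldc.i4.0', 'ldelem.r8', 'stloc.0', 'br.s', 'ldloc.2', 'ret']
```

Because method names, variable names and jump targets never reach the opcode sequence, renaming identifiers in the source leaves it unchanged.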

Plagiarism detection system
- Evaluate the performance
- Analyze and compare behavior to the most commonly used plagiarism detection systems:
  - MOSS
  - JPlag
  - CodeMatch

Tested systems
- MOSS
  - Developed in [...]
  - Commonly used in computer science faculties
  - Supports 26 programming languages
- JPlag
  - Developed in [...]
  - Commonly used in education
  - Supports C, C++, C# and Java

Tested systems
- CodeMatch
  - Developed in [...]
  - Commercial software
  - Supports 26 languages
- ILMatch (our system)
  - Developed in [...]
  - Supports all .Net languages (currently 59 languages)

Testing
- 6 test categories
- 50 test cases covering common code modification techniques
- Evaluation methods
  - Precision, recall
  - F-measure
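The three evaluation measures can be computed as in the sketch below. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall), since the slides do not state a weighting. The function name and the file pairs are hypothetical illustration data, not results from the evaluation.

```python
def precision_recall_f(reported: set, actual: set) -> tuple[float, float, float]:
    """Precision, recall and balanced F-measure for a set of reported pairs."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: six plagiarized pairs exist, the detector reports
# four pairs, and three of those four are correct.
actual = {("a.cs", "b.cs"), ("a.cs", "c.cs"), ("d.cs", "e.cs"),
          ("f.cs", "g.cs"), ("h.cs", "i.cs"), ("j.cs", "k.cs")}
reported = {("a.cs", "b.cs"), ("a.cs", "c.cs"), ("d.cs", "e.cs"),
            ("x.cs", "y.cs")}

precision, recall, f_measure = precision_recall_f(reported, actual)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
# prints precision=0.75 recall=0.50 F=0.60
```

A detector that flags every pair reaches perfect recall but poor precision, which is why the two are combined into a single F-measure.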

Results
[Chart: highest F-measures for MOSS, JPlag, CodeMatch and ILMatch]

Positive
- No impact
  - User comments
  - Code formatting
  - Modification of variable and class names
  - Modification of class members
  - Changing data types
- Some impact
  - Replacing expressions and loops
  - Rewriting code in a different language

Further work
- Significant impact
  - Reordering operands
  - Reordering class members
  - Adding redundant statements and variables
- Improvements in comparison algorithm