Analyzing Software Code and Execution – Plagiarism and Bug Detection Shoaib Jameel.

Slides:



Advertisements
Similar presentations
Duplicate code detection using Clone Digger Peter Bulychev Lomonosov Moscow State University CS department.
Advertisements

Chao Liu, Chen Chen, Jiawei Han, Philip S. Yu
This research is funded in part the U. S. National Science Foundation grant CCR DEET for Component-Based Software Murali Sitaraman, Durga P. Gandi.
ANALYSIS OF PROG. LANG. PROGRAM ANALYSIS Instructors: Crista Lopes Copyright © Instructors. 1.
Unification and Refactoring of Clones Giri Panamoottil Krishnan and Nikolaos Tsantalis Department of Computer Science & Software Engineering Clone images.
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Recursion CS 367 – Introduction to Data Structures.
ANTLR in SSP Xingzhong Xu Hong Man Aug Outline ANTLR Abstract Syntax Tree Code Equivalence (Code Re-hosting) Future Work.
Chair of Software Engineering From Program slicing to Abstract Interpretation Dr. Manuel Oriol.
IPOG: A General Strategy for T-Way Software Testing
Program Representations. Representing programs Goals.
Mining Graphs.
Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring.
Attack of the Clones: Detecting Cloned Applications on Android Markets Jonathan Crussell 1,2, Clint Gibler 1, and Hao Chen 1Clint GiblerHao Chen 1 University.
1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.
A Graphical Model For Simultaneous Partitioning And Labeling Philip Cowans & Martin Szummer AISTATS, Jan 2005 Cambridge.
Association Analysis (7) (Mining Graphs)
Representing programs Goals. Representing programs Primary goals –analysis is easy and effective just a few cases to handle directly link related things.
Video summarization by video structure analysis and graph optimization M. Phil 2 nd Term Presentation Lu Shi Dec 5, 2003.
7. Duplicated Code Metrics Duplicated Code Software quality
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
Comp 245 Data Structures Software Engineering. What is Software Engineering? Most students obtain the problem and immediately start coding the solution.
1 Distributed Monitoring of Peer-to-Peer Systems By Serge Abiteboul, Bogdan Marinoiu Docflow meeting, Bordeaux.
Notes for Chapter 12 Logic Programming The AI War Basic Concepts of Logic Programming Prolog Review questions.
Class Specification Implementation Graph By: Njume Njinimbam Chi-Chang Sun.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Criterion for.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Supervisor:Mr. Sayed Morteza Zaker Presentor:Fateme hadinezhad.
Invariant Based Programming in Education Tutorial, FM’08 Linda Mannila
Scientific Inquiry Mr. Wai-Pan Chan Scientific Inquiry Research & Exploratory Investigation Scientific inquiry is a way to investigate things, events.
Mining and Analysis of Control Structure Variant Clones Guo Qiao.
1 ENERGY 211 / CME 211 Lecture 26 November 19, 2008.
CMCD: Count Matrix based Code Clone Detection Yang Yuan and Yao Guo Key Laboratory of High-Confidence Software Technologies (Ministry of Education) Peking.
Cross Language Clone Analysis Team 2 October 27, 2010.
Toward an Implementation of the “Form Template Method” Refactoring Nicolas Juillerat University of Fribourg Switzerland.
Principles of programming languages 5: An operational semantics of a small subset of C Department of Information Science and Engineering Isao Sasano.
C++ Programming Language Lecture 2 Problem Analysis and Solution Representation By Ghada Al-Mashaqbeh The Hashemite University Computer Engineering Department.
Detecting Group Differences: Mining Contrast Sets Author: Stephen D. Bay Advisor: Dr. Hsu Graduate: Yan-Cheng Lin.
Do background research to decide on a project. Do background research to decide on a project. Form a hypothesis statement. Form a hypothesis statement.
CS 460/660 Compiler Construction. Class 01 2 Why Study Compilers? Compilers are important – –Responsible for many aspects of system performance Compilers.
Agenda Review: –Planar Graphs Lecture Content:  Concepts of Trees  Spanning Trees  Binary Trees Exercise.
The Software Development Process
Design - programming Cmpe 450 Fall Dynamic Analysis Software quality Design carefully from the start Simple and clean Fewer errors Finding errors.
Scientific Debugging. Errors in Software Errors are unexpected behaviors or outputs in programs As long as software is developed by humans, it will contain.
Cross Language Clone Analysis Team 2 February 3, 2011.
1 Measuring Similarity of Large Software System Based on Source Code Correspondence Tetsuo Yamamoto*, Makoto Matsushita**, Toshihiro Kamiya***, Katsuro.
Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK.
 Software Clones:( Definitions from Wikipedia) ◦ Duplicate code: a sequence of source code that occurs more than once, either within a program or across.
Cross Language Clone Analysis Team 2 February 3, 2011.
The Hashemite University Computer Engineering Department
1 The Software Development Process ► Systems analysis ► Systems design ► Implementation ► Testing ► Documentation ► Evaluation ► Maintenance.
The Scientific Method. 5 Steps 1.Question (Problem) 2.Hypothesis 3.Experiment 4.Analysis 5.Conclusion.
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
A Binary Linear Programming Formulation of the Graph Edit Distance Presented by Shihao Ji Duke University Machine Learning Group July 17, 2006 Authors:
Welcome! Simone Campanoni
Debuggers. Errors in Computer Code Errors in computer programs are commonly known as bugs. Three types of errors in computer programs –Syntax errors –Runtime.
©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.
YAHMD - Yet Another Heap Memory Debugger
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Optimization Code Optimization ©SoftMoore Consulting.
CBCD: Cloned Buggy Code Detector
Algorithms and networks
The Scientific Method.
Lecture 23 Pages : Separating Syntactic Analysis from Execution. We omit many details so you have to read the section in the book. The halting.
Algorithms and networks
IPOG: A General Strategy for T-Way Software Testing
make observations about the world around you
Data Flow Analysis Compiler Design
On Refactoring Support Based on Code Clone Dependency Relation
Presentation transcript:

Analyzing Software Code and Execution – Plagiarism and Bug Detection Shoaib Jameel

Preliminaries Plagiarism - "use or close imitation of the language and thoughts of another author and the representation of them as one's own original work.“ Plagiarism.wmv Funny quote - “When one copies from one resource it’s Plagiarism but when copies from multiple resources – Research”

So what happens when you plagiarize? In countries like the US.

In India

Let’s get to the main theme now! GPLAG – Detection of Software Plagiarism by Program Dependence Graph Analysis

Motivation It’s time consuming and labour intensive to design large softwares with multitude of lines of code. So, the easiest way is to Plagiarize ! – especially from Open Source Softwares

Review Review of Plagiarism Detection 1. String Based 2. AST-based 3. Token-based

String Based B. S. Baker. On finding duplication and near duplication in large software systems. In Proc. of 2 nd Working Conf. on Reverse Engineering, 1995.

AST-Based I. D. Baxter, A. Yahin, L. Moura, M. Sant’Anna, and L. Bier. Clone detection using abstract syntax trees. In Proc. of Int. Conf. on Software Maintenance, 1998.

Background

PDG – Program Dependence Graph A PDG is a graph representation of the source code of a procedure. Basic statements like variables, assignments, and procedure calls are represented by program vertices in a PDG.

Original and Plagiarized Code

PDG-Based Plagiarism Detection Given an original program P, and a plagiarism suspect P`, plagiarism detection tries to search for duplicate structures between P and P` in order to prove or disprove the existence of plagiarism. By representing a program as a set of PDGs, the search for duplicates are performed on PDGs.

Plagiarism as Subgraph Isomorphism The disguises are: 1. Format alteration and identifier renaming 2. Statement reordering 3. Control replacement 4. Code insertion

The mature rate γ is set based on one’s belief in what proportion of a PDG will stay untouched in plagiarism. It is 0.9 in experiments because overhauling (without errors) 10% of a PDG of reasonable size is almost equivalent to rewriting the code.

Pruning Plagiarism Search Space In order to find plagiarized PDG pairs, n ¤ m pair-wise (relaxed) subgraph isomorphism testings are needed in principle. Two kinds of filters: 1. Lossless filter 2. Lossy filter

Lossless Filter 1. PDGs smaller than an interesting size K are excluded from both G and G`. 2. Based on the definition of γ- isomorphism, a PDG pair (g, g`), g G and g’ G`, can be excluded if

Lossy Filter Vertex histogram is constructed as a summarized representation of each PDG. Similarity is measured in terms of vertex histograms between g and g`.

The Main Idea Estimate the k-dimensional multinomial distribution and then consider whether h(g`) is likely to be an observation from P g

GPLAG Algorithm

Experiment Evaluation

Efficiency of GPLAG

Core part Plagiarism

Conclusion – Some questions still remain How this implementation better than debuggers? How is this approach better than reverse engineering? Human intervention!