Source Code and Text Plagiarism Detection Strategies


1 Source Code and Text Plagiarism Detection Strategies
Maeve Paris University of Ulster

2 Summary
Distinction drawn between plagiarism of text and plagiarism of source code
Different tools and metrics have been developed for each
BUT if we consider computer programming languages as being similar to natural languages, we might be able to apply techniques from computer-assisted text analysis (CATA) to detect source code plagiarism/collusion

3 Plagiarism detection in text
Verbatim copying
Paraphrasing
Inconsistencies
Authorship attribution: CATA tools for
  Concordances
  KWIC index
Online services (plagiarism.org, WordCHECK, etc.)

4 KWIC and Frequency lists
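
The slide itself survives only as a title, so the following is a minimal sketch of what a frequency list and a KWIC (keyword in context) index look like when applied to source code rather than prose. The function names `tokenize`, `frequency_list` and `kwic`, the `width` parameter, and the choice to keep only identifier-like tokens are all assumptions made for illustration, not features of any particular CATA tool.

```python
import re
from collections import Counter

def tokenize(text):
    # Keep identifier-like words only; punctuation and operators are dropped.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)

def frequency_list(text):
    # Word types with their counts, most frequent first.
    return Counter(tokenize(text)).most_common()

def kwic(text, keyword, width=3):
    # For every occurrence of keyword, return (left context, keyword, right context),
    # where each context holds up to `width` neighbouring tokens.
    tokens = tokenize(text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# The Java statement used later in the presentation.
sample = "Dog dog = new Dog(); dog.bark();"

if __name__ == "__main__":
    for word, count in frequency_list(sample):
        print(f"{count:3d}  {word}")
    for left, key, right in kwic(sample, "dog", width=2):
        print(f"{left:>12} [{key}] {right}")
```

Matching is case-sensitive here, so `Dog` (the class) and `dog` (the variable) are counted separately; a text-oriented tool would often fold case, which is one of the choices that needs revisiting for source code.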

5 Plagiarism detection in source code
Metrics-driven
Plagiarising transformations
Lexico-structural approach
Sim, YAP, MOSS, JPLAG
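
To make the idea of a lexico-structural comparison concrete, here is a minimal sketch that is robust to two common plagiarising transformations: renaming identifiers and changing comments or layout. It is not the algorithm used by Sim, YAP, MOSS or JPLAG; it simply replaces every identifier with a placeholder and compares sets of token trigrams with the Jaccard coefficient. The keyword subset, the regular expressions and all function names are assumptions made for illustration.

```python
import re

# Small illustrative subset of Java keywords (an assumption, not the full list).
JAVA_KEYWORDS = {"class", "public", "private", "static", "void", "int",
                 "new", "return", "if", "else", "for", "while"}

# String literal, identifier/keyword, integer, or any other single character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')

def strip_comments(source):
    # Naive: does not protect comment markers that occur inside string literals.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", source)

def normalise(source):
    # Map identifiers, numbers and strings to placeholders; keep keywords and symbols.
    tokens = []
    for tok in TOKEN_RE.findall(strip_comments(source)):
        if tok.startswith('"'):
            tokens.append("STR")
        elif tok[0].isdigit():
            tokens.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in JAVA_KEYWORDS else "ID")
        else:
            tokens.append(tok)
    return tokens

def ngrams(tokens, n=3):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a, b, n=3):
    # Jaccard coefficient over normalised token n-grams, between 0.0 and 1.0.
    ga, gb = ngrams(normalise(a), n), ngrams(normalise(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

original = "int total = 0; // running sum\nfor (int i = 0; i < n; i++) { total += i; }"
disguised = "int s = 0;\nfor (int k = 0; k < limit; k++) {\n    s += k; /* add */\n}"

if __name__ == "__main__":
    print(similarity(original, disguised))
```

Because names, comments and whitespace are discarded before comparison, the two fragments above normalise to the same token sequence. The same property means that unrelated programs built from the same control structures can also score highly.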

6 Is source code free speech?
Statement in English: The conference session starts at nine.
Statement in Java: Dog dog = new Dog(); dog.bark();
Statement in Prolog: listensToMusic(yolanda):- happy(yolanda). s --> np, vp.
Syntax and lexicon in all cases

7 Authoring styles in source code
Choice of algorithm
Naming of classes and variables
Physical layout
Comments
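
Three of the four style markers listed above (naming, physical layout, comments) can be approximated with simple counts. The sketch below is one hypothetical way to build such a profile for a Java file; the feature names and the line-based comment test are assumptions, and choice of algorithm is not captured at all.

```python
import re

def is_comment_line(line):
    # Treats a line as a comment if it starts with //, /* or a continuation *.
    return line.strip().startswith(("//", "/*", "*"))

def style_profile(source):
    # Returns a dict of crude style features for one source file.
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return {}
    comments = [ln for ln in lines if is_comment_line(ln)]
    code = [ln for ln in lines if not is_comment_line(ln)]
    # Identifier-like words on code lines (keywords and trailing comments included).
    names = re.findall(r"[A-Za-z_]\w*", "\n".join(code))
    n_code = max(len(code), 1)
    n_names = max(len(names), 1)
    return {
        "comment_ratio": len(comments) / len(lines),
        "tab_indented": sum(ln.startswith("\t") for ln in code) / n_code,
        "brace_on_own_line": sum(ln.strip() == "{" for ln in code) / n_code,
        "underscore_names": sum("_" in n for n in names) / n_names,
        "mean_name_length": sum(map(len, names)) / n_names,
    }

example = "// greet\nclass Dog\n{\n\tvoid bark() { }\n}\n"

if __name__ == "__main__":
    for feature, value in style_profile(example).items():
        print(f"{feature:20s} {value:.2f}")
```

Profiles from two submissions can then be compared feature by feature; a close match is a reason to inspect the files manually, not proof of copying.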

8 Analysis
Corpus of 35 Java files
Manual inspection
JPLAG
Concordance

9 JPLAG

10 JPLAG

11 JPLAG

12 Concordance

13 Stop list
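
The stop list shown on this slide is not reproduced in the transcript. As a rough illustration of the technique, the sketch below filters a word count through a hypothetical stop list of common Java keywords, so that author-chosen names remain. The contents of `STOP_LIST` and the function name are assumptions.

```python
import re
from collections import Counter

# Hypothetical stop list: frequent Java keywords and types that say little about authorship.
STOP_LIST = {"public", "private", "class", "static", "void", "int", "new",
             "return", "if", "else", "for", "while", "String"}

def filtered_frequencies(source, stop_list=STOP_LIST):
    # Count identifier-like words, skipping anything on the stop list.
    words = re.findall(r"[A-Za-z_]\w*", source)
    return Counter(w for w in words if w not in stop_list)

snippet = "public class Dog { void bark() { int volume = 3; } }"

if __name__ == "__main__":
    for word, count in filtered_frequencies(snippet).most_common():
        print(count, word)
```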

16 Conclusions
Computer programming languages are similar to natural languages
It is possible to distinguish a programming style
CATA may help in detecting plagiarism
Focus on control structures may lead to wrong conclusions

17 Conclusions
CATA is not restricted to programs which compile
Need for tools to combine metrics-based profiles with concordancing features

