Published by Aron Singleton. Modified over 9 years ago.
1
Source Code and Text Plagiarism Detection Strategies
Maeve Paris University of Ulster
2
Summary
Distinction drawn between plagiarism of text and plagiarism of source code
Different tools and metrics have been developed for each
BUT if we consider computer programming languages as being similar to natural languages, we might be able to apply techniques from computer-assisted text analysis (CATA) to detect source code plagiarism/collusion
3
Plagiarism detection in text
Verbatim copying
Paraphrasing
Inconsistencies
Authorship attribution: CATA tools for concordances, KWIC index
Online services (plagiarism.org, WordCHECK, etc.)
4
KWIC and Frequency lists
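As a rough illustration of what this slide names, the sketch below builds a frequency list and a keyword-in-context (KWIC) index over a token stream. It is a minimal sketch, not the tool shown in the presentation: the function names `tokenize`, `frequency_list` and `kwic`, the context `width` parameter, and the tokenising regular expression are all assumptions made for the example. The sample input reuses the Java statement quoted later in the deck.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into word-like tokens, keeping case (assumed tokenising rule)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)


def frequency_list(tokens):
    """Return (token, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens, keyword, width=3):
    """Return one (left context, keyword, right context) triple per occurrence."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


sample = "Dog dog = new Dog(); dog.bark();"
tokens = tokenize(sample)

for token, count in frequency_list(tokens):
    print(f"{count:3d}  {token}")

for left, key, right in kwic(tokens, "dog", width=2):
    print(f"{left:>10} [{key}] {right}")
```

Case is preserved here because `Dog` (a class) and `dog` (a variable) are different lexical items in Java; a natural-language concordancer would usually fold case instead.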
5
Plagiarism detection in source code
Metrics-driven
Plagiarising transformations
Lexico-structural approach
Sim, Yap, MOSS, JPLAG
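To make the lexico-structural idea concrete, the sketch below normalises two program fragments so that renamed identifiers and changed literals disappear, then compares them by the overlap of their token trigrams. This is a deliberately simplified stand-in and does not reproduce the algorithms used by Sim, Yap, MOSS or JPLAG. The keyword subset, the `ID`/`NUM` placeholders, the n-gram size and the Jaccard measure are all assumptions chosen for the example.

```python
import re

# Assumed, incomplete subset of Java keywords kept verbatim during normalisation.
KEYWORDS = {"int", "for", "while", "if", "else", "return", "class", "void", "new"}

IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"


def normalise(source):
    """Turn source text into tokens, replacing identifiers with ID and numbers with NUM."""
    tokens = re.findall(IDENTIFIER + r"|\d+|\S", source)
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(IDENTIFIER, tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)  # punctuation and operators are kept as they are
    return out


def ngrams(tokens, n=3):
    """Return the set of consecutive n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard overlap of the normalised token n-grams of two fragments."""
    ga, gb = ngrams(normalise(a), n), ngrams(normalise(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


original = "int total = 0; for (int i = 0; i < n; i++) { total = total + i; }"
renamed = "int sum = 0; for (int k = 0; k < limit; k++) { sum = sum + k; }"
unrelated = "return name;"

print(similarity(original, renamed))
print(similarity(original, unrelated))
```

Because every identifier collapses to `ID`, a systematic renaming leaves the score unchanged. The same property means two short programs built from the same common control structures can score highly without any copying.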
6
Is source code free speech?
Statement in English: The conference session starts at nine.
Statement in Java: Dog dog = new Dog(); dog.bark();
Statement in Prolog: listensToMusic(yolanda) :- happy(yolanda).
s --> np, vp.
Syntax and lexicon in all cases
7
Authoring styles in source code
Choice of algorithm
Naming of classes and variables
Physical layout
Comments
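Some of these style markers can be counted mechanically. The sketch below computes a small profile for one source file covering comments, naming and physical layout; choice of algorithm is not captured. The function name `style_profile` and the four features are hypothetical choices for illustration, not metrics taken from the presentation, and the comment detection is a crude line-prefix check.

```python
import re


def is_comment(line):
    """Crude check: the line starts with a Java comment marker."""
    return line.lstrip().startswith(("//", "/*", "*"))


def style_profile(source):
    """Return a few simple style measurements for one source file."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comments = [ln for ln in lines if is_comment(ln)]
    code = [ln for ln in lines if not is_comment(ln)]
    # Word tokens in code lines; this includes keywords as well as names.
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", "\n".join(code))
    return {
        "comment_ratio": len(comments) / len(lines) if lines else 0.0,
        "mean_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "brace_on_own_line": sum(1 for ln in code if ln.strip() == "{"),
        "tab_indented_lines": sum(1 for ln in code if ln.startswith("\t")),
    }


sample = "// bark once\nclass Dog\n{\n\tvoid bark() { }\n}\n"
profile = style_profile(sample)
print(profile)
```

Two submissions with near-identical profiles are not necessarily copied; a profile like this is only a prompt for closer manual inspection.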
8
Analysis
Corpus of 35 Java files
Manual inspection
JPLAG
Concordance
9
JPLAG
10
JPLAG
11
JPLAG
12
Concordance
13
Stop list
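One plausible use of a stop list on source code is to drop language keywords before counting, so that the frequency list is dominated by the names an author chose. The sketch below assumes this reading; the contents of `JAVA_STOP_LIST` are an illustrative subset, not the stop list used in the presentation, and `content_tokens` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed stop list: a small, incomplete subset of Java reserved words.
JAVA_STOP_LIST = {
    "public", "private", "static", "void", "class", "new", "return",
    "int", "if", "else", "for", "while", "import",
}


def content_tokens(source, stop_list=JAVA_STOP_LIST):
    """Return word tokens from the source, leaving out those in the stop list."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    return [tok for tok in tokens if tok not in stop_list]


source = (
    "public class Kennel { private int dogCount; "
    "public void addDog() { dogCount = dogCount + 1; } }"
)
counts = Counter(content_tokens(source))

for token, count in counts.most_common():
    print(f"{count:3d}  {token}")
```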
16
Conclusions
Computer programming languages are similar to natural languages
It is possible to distinguish a programming style
CATA may help in detecting plagiarism
Focus on control structures may lead to wrong conclusions
17
Conclusions
CATA is not restricted to programs which compile
Need for tools to combine metrics-based profiles with concordancing features