Poon Yan Horn Jonathan 09/04/2010 School Of Computing National University of Singapore Student Submissions Integrity Diagnosis (SSID) A User-Centric Plagiarism Checking System
Plagiarism in SoC In 2000, 98 cases in the first assignment of CS1102 (Cheang, Kurnia, Lim & Oon, 2005). In 2004, 181 students admitted to committing plagiarism (Ooi & Tan, 2005).
How to Detect Plagiarism Manually…? Achieve the maximum accuracy But… time-consuming and tedious… Ask the computer to do it? Give up on some accuracy… Comparison done automatically
How to Prevent Plagiarism Factors of Plagiarism: o Fear of failure o Difficulty of work Solution: set up learning groups
SSID: A User-Centric Plagiarism Checking System Features: o Pairwise plagiarism detection o Plagiarism cluster detection But also… o Reporting suspicious cases and recording plagiarism cases o Logging system o Visuals
How Does SSID Work? o Pairwise plagiarism detection engine o Clustering engine (DBSCAN) o User interface o Database
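The clustering step can be illustrated with a minimal DBSCAN sketch over pairwise similarity scores. This is not SSID's code: the similarity values, the `eps` radius, the `min_pts` threshold, and the choice of distance as 1 - similarity are all assumptions made for illustration.

```python
from typing import List, Optional

NOISE = -1  # label for submissions that belong to no cluster


def dbscan(dist: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN on a precomputed distance matrix; returns one label per item."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    cluster = -1

    def neighbours(i: int) -> List[int]:
        # The neighbourhood includes i itself, as in the usual definition.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return [NOISE if lab is None else lab for lab in labels]


# Hypothetical similarity scores for five submissions (not real SSID output):
# submissions 0, 1 and 2 resemble each other, 3 and 4 resemble nobody.
similarity = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.2],
    [0.1, 0.1, 0.1, 0.2, 1.0],
]
distance = [[1.0 - s for s in row] for row in similarity]
labels = dbscan(distance, eps=0.3, min_pts=2)
print(labels)  # [0, 0, 0, -1, -1]
```

Submissions that share a label form one suspected plagiarism cluster; those labelled -1 are treated as unrelated to any group.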
Pairwise Comparison: Improvements Based on the Greedy-String-Tiling algorithm (Wise, 1993) 4 improvements: o Usage of indexing – JPlag (Prechelt, Malpohl, & Philippsen, 2000) o Counting of statements o Excluding skeleton code o Providing more accurate marking of tokens
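For reference, the basic Greedy-String-Tiling idea can be sketched as follows. This is the plain, unoptimised form, without the indexing or any of the other improvements listed above. The token names, the `min_match` value, and the similarity normalisation (the 2 × covered / total form used, for example, in JPlag) are assumptions for illustration, not SSID's actual implementation.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int]  # (start in a, start in b, length)


def greedy_string_tiling(a: List[str], b: List[str], min_match: int) -> List[Tile]:
    """Cover a and b with non-overlapping common token runs, longest first."""
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles: List[Tile] = []
    while True:
        max_len = min_match
        matches: List[Tile] = []
        # Find all maximal matches of unmarked tokens with the greatest length.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    matches, max_len = [(i, j, k)], k
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            break
        for i, j, k in matches:
            # Skip matches that overlap a tile placed earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for d in range(k):
                marked_a[i + d] = True
                marked_b[j + d] = True
            tiles.append((i, j, k))
    return tiles


def similarity(a: List[str], b: List[str], tiles: List[Tile]) -> float:
    """Assumed normalisation: twice the covered tokens over the total length."""
    total = len(a) + len(b)
    return 2 * sum(k for _, _, k in tiles) / total if total else 0.0


# Hypothetical token streams: the second is the first with two blocks swapped.
original = ["VAR", "ASSIGN", "LOOP", "CALL", "END"]
reordered = ["LOOP", "CALL", "END", "VAR", "ASSIGN"]
tiles = greedy_string_tiling(original, reordered, min_match=2)
print(tiles)                                   # [(2, 0, 3), (0, 3, 2)]
print(similarity(original, reordered, tiles))  # 1.0
```

Because tiles may appear in any order, reordering blocks of code does not lower the score, which is one reason tiling is a common basis for source-code comparison.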
Pairwise Comparisons: Accuracy Experiment: o 3 assignments o 28 participants o 84 plagiarised submissions o Identified attacks – 20 types F-measure = 1
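As a reminder of what the reported score means, the F-measure is the harmonic mean of precision and recall, so a value of 1 requires no false positives and no false negatives. A minimal sketch follows; the counts in the usage lines are made up for illustration and are not the experiment's data.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-measure (F1) from raw detection counts."""
    if true_pos == 0:
        return 0.0  # nothing correctly detected, so precision or recall is 0
    precision = true_pos / (true_pos + false_pos)  # share of flagged cases that are real
    recall = true_pos / (true_pos + false_neg)     # share of real cases that were flagged
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (assumed, not taken from the experiment).
perfect = f_measure(true_pos=10, false_pos=0, false_neg=0)
imperfect = f_measure(true_pos=8, false_pos=2, false_neg=2)
print(perfect)  # 1.0
print(round(imperfect, 3))
```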
Demonstration Link to SSID
Conclusion Limitations: o Can be defeated by some attacks o Not yet deployed in a real-life setting Future Work: o Approximate matching o Deployment in a real-life setting
Questions? All questions are welcome!
Thank you for your attention