Presented by Teererai Marange

According to Caliskan-Islam et al. (2015), authorship attribution using the Code Stylometry feature set remains possible after code is run through commercial obfuscation software, with no significant change in accuracy. Is this a good thing? Is it a great thing? In this presentation, I will discuss the implications of this claim, as well as its relevance to the problem of authorship attribution and to the software security field.

Given: a set of authors O; a labelled set of code samples C from those authors, where each label identifies the sample's author; and an unlabelled code sample c. Find: the author o in O who wrote c.
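The problem statement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's `ast` module stands in for the syntactic feature extractor, a nearest-profile rule with cosine similarity stands in for the classifier, and the authors "alice" and "bob" and their samples are hypothetical. It is not the feature set or classifier used by Caliskan-Islam et al.

```python
from __future__ import annotations

import ast
import math
from collections import Counter


def syntactic_features(source: str) -> Counter:
    """Count AST node types; a crude stand-in for a syntactic feature set."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(labelled: list[tuple[str, str]], unknown: str) -> str:
    """Return the author in O whose aggregated profile is closest to sample c."""
    profiles: dict[str, Counter] = {}
    for author, source in labelled:
        profiles.setdefault(author, Counter()).update(syntactic_features(source))
    target = syntactic_features(unknown)
    return max(profiles, key=lambda author: cosine(profiles[author], target))


# Hypothetical labelled set C: (author, code sample) pairs.
C = [
    ("alice", "def total(xs):\n    return sum([x * x for x in xs])\n"),
    ("alice", "def evens(xs):\n    return [x for x in xs if x % 2 == 0]\n"),
    ("bob", "def total(xs):\n    t = 0\n    for x in xs:\n        t += x * x\n    return t\n"),
    ("bob", "def evens(xs):\n    out = []\n    for x in xs:\n"
            "        if x % 2 == 0:\n            out.append(x)\n    return out\n"),
]

# Unlabelled sample c, written with explicit loops and accumulators.
c = ("def count(xs):\n    n = 0\n    for x in xs:\n"
     "        if x > 0:\n            n += 1\n    return n\n")

predicted_author = attribute(C, c)
print(predicted_author)
```

The sample c uses the same loop-and-accumulator constructs as the second hypothetical author, so its node-type profile lies closer to that author's than to the comprehension-heavy one.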

[Diagram labels: code set, owners set, unattributed code, syntactic feature set, owner set, classifier]

According to Caliskan-Islam et al. (2015), authorship attribution using the Code Stylometry feature set remains possible after code is run through commercial obfuscation software, with no significant change in accuracy. Is this a good thing?

[Diagram labels: code set, owners set, unattributed code, syntactic feature set, owner set, classifier, obfuscator]

1. Such a system would have more data, since it can also consider obfuscated software.
2. Results are not affected by tampering with lexical features and changing names in the code.
3. This also implies that hiding one’s authorship from such a system would require skill.

Machine learning algorithms generally perform better with a larger training dataset. Such a system could train on obfuscated code as well, and hence could be more accurate. It could also accept obfuscated code as the sample to attribute, which widens the set of problems it can solve.

This speaks to the robustness of the system to changes in the lexical and layout features of code. Using such a system in practice requires trust. If adding random whitespace throughout the code affected the results, that trust would go out of the window. This form of robustness is therefore a good thing.
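A small Python sketch can illustrate why syntactic features tolerate this kind of tampering. It assumes AST node-type counts as the syntactic features and the set of identifiers as a simple lexical feature. The "tampered" sample is a hand-written imitation of layout and renaming changes, not the output of a real obfuscator.

```python
import ast
from collections import Counter


def syntactic_features(source: str) -> Counter:
    """Count AST node types; names, spacing and comments play no part."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def identifiers(source: str) -> set:
    """Collect variable, function and argument names: a simple lexical feature."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.FunctionDef):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names


original = (
    "def average(values):\n"
    "    total = 0\n"
    "    for value in values:\n"
    "        total += value\n"
    "    return total / len(values)\n"
)

# Hand-written imitation of layout and renaming changes (hypothetical example).
tampered = (
    "def  a( b ) :\n"
    "\n"
    "        c=0   # noise\n"
    "\n"
    "        for d in b :\n"
    "                    c+=d\n"
    "\n"
    "        return c/len( b )\n"
)

same_syntax = syntactic_features(original) == syntactic_features(tampered)
same_names = identifiers(original) == identifiers(tampered)
print("syntactic features unchanged:", same_syntax)
print("lexical features unchanged:", same_names)
```

The whitespace, comment and renaming changes alter the lexical view of the program while leaving the node-type counts identical, which is the kind of robustness the slide describes.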

This is useful for plagiarism detection when the offender does not understand the code whose true authorship they are trying to hide. In such cases, running the code through such a system may be sufficient. But what if the offender is skilled? According to the authors, “We do not claim that our feature set resists attempts at manipulating one’s coding style. However, we do find that our syntactic feature set is impervious to off-the-shelf code obfuscators which only change layout and SOME lexical features.”
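The distinction in the quote can be illustrated with a hedged Python sketch: a deliberate rewrite of coding style keeps the behaviour but changes the syntactic features, unlike a layout-only change. As before, AST node-type counts are an assumed stand-in for the syntactic feature set, and both versions of the hypothetical `squares` function are written by hand. This is not an experiment from the paper.

```python
import ast
from collections import Counter


def syntactic_features(source: str) -> Counter:
    """Count AST node types as a rough syntactic fingerprint."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def run(source: str, data: list) -> list:
    """Execute a sample and call its `squares` function (illustration only)."""
    namespace = {}
    exec(source, namespace)
    return namespace["squares"](data)


# Two hand-written versions of the same hypothetical function.
loop_style = (
    "def squares(xs):\n"
    "    out = []\n"
    "    for x in xs:\n"
    "        out.append(x * x)\n"
    "    return out\n"
)
comprehension_style = (
    "def squares(xs):\n"
    "    return [x * x for x in xs]\n"
)

same_behaviour = run(loop_style, [1, 2, 3]) == run(comprehension_style, [1, 2, 3])

before = syntactic_features(loop_style)
after = syntactic_features(comprehension_style)
# Node types whose counts differ between the two styles.
differing = sorted(k for k in before.keys() | after.keys() if before[k] != after[k])

print("same behaviour:", same_behaviour)
print("node types that differ:", differing)
```

A skilled author who restructures code in this way changes the very features the classifier relies on, which is why robustness to obfuscators does not by itself imply robustness to deliberate style manipulation.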

By definition, obfuscation software is designed to make code difficult for humans to understand, in order to prevent reverse engineering. It is not intended to hide authorship. Anyone who uses such software to hide authorship is therefore using the wrong tool for the purpose.

However, to the author’s knowledge, no software meant to hide authorship or obfuscate coding style has been written. Determining attribution performance on code run through such software is therefore left to future work.

The authors’ claim is a good thing because:
1. Results are more robust.
2. The algorithm potentially has a larger training set.
3. A larger set of problems can potentially be solved.
4. Hiding authorship now takes skill.
However, obfuscation is not intended to hide the author (left to future research).

Any questions?
