Plagiarism Detection for Multithreaded Software Based on Thread-Aware Software Birthmarks Zhenzhou Tian MOE Key Lab for Intelligent.

1 Plagiarism Detection for Multithreaded Software Based on Thread-Aware Software Birthmarks
Zhenzhou Tian (zztian@stu.xjtu.edu.cn)
MOE Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China
2015-10-18

2 Outline
1. Introduction
2. Thread-Aware Birthmark Methods
3. Evaluation
4. Unsolved Problems & Future Work

3 Introduction
Software plagiarism has been a serious threat to the healthy development of the software industry.
- Licenses are violated for commercial interests, or unwittingly
- Weak code protection awareness
- Powerful automated code obfuscation tools
- Software is distributed in binary form

4 Introduction
A series of methods has been proposed for plagiarism detection:
- Software watermarking
  - Inserts extra data
  - “A sufficiently determined attacker will eventually be able to defeat any watermark”
- Static and dynamic software birthmarks
  - Dynamic birthmarks are more resilient to semantics-preserving code obfuscations

5 Introduction
A series of methods has been proposed for plagiarism detection:
- Software watermarking
- Static and dynamic software birthmarks
The increasingly popular trend towards multithreaded programming brings a new challenge to existing dynamic birthmark methods:
- Existing dynamic birthmarks remain optimized for sequential programs
- They neglect the effect of thread scheduling
- Two executions of a single program under the same input can be very different, rendering the existing methods ineffective

6 Introduction
Metric        DKISB   SCSSB
Cosine        0.838   0.452
Jaccard       0.551   0.369
Dice          0.678   0.51
Containment   0.735   0.477
DKISB: dynamic key instruction sequence birthmark
SCSSB: system call short sequence birthmark

7 Introduction
Contributions:
- Two thread-aware dynamic birthmarks, TW-DKISB and TW-SCSSB, are proposed to detect software plagiarism
  - Operate directly on binary executables
  - Not limited to specific operating systems or languages
  - Resilient to various automated obfuscation techniques (29 different obfuscation techniques in SandMark)

8 Introduction
Contributions:
- A prototype is implemented using the Pin instrumentation framework, and extensive experiments are conducted
- A suite of benchmarks is compiled for researchers to conduct experiments and present their findings: http://labs.xjtudlc.com/labs/benchmark.html

10 Software Birthmark
A set of characteristics extracted from a program that reflects intrinsic properties of the program and can be used to identify the program uniquely.
- Two types: static and dynamic software birthmarks
- Dynamic birthmark as defined by Myles

11 Thread-Aware Dynamic Software Birthmark
- Predetermining a thread schedule is very difficult
- Instead of enforcing a thread schedule, try to shield executions from its influence

12 Thread-Aware Dynamic Software Birthmarks
Main idea: split, then aggregate
- The execution order within each thread is relatively stable
- Project the trace on thread IDs to obtain sub-traces, and extract slice birthmarks from them
- Aggregate all slice birthmarks
- Different traces of a program under the same input yield the same slices

13 Slice Birthmark & Program Birthmark
- K-gram slice birthmarks
- SAM
- SSM
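
The split-then-aggregate idea can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the trace format (a list of `(thread_id, event)` pairs), the function names, the toy event names, and the aggregation rule (summing k-gram counts across slices). The slides do not spell out how the SA and SS models are defined, so this is not the authors' implementation.

```python
from collections import Counter, defaultdict


def project_by_thread(trace):
    """Split an interleaved trace of (thread_id, event) pairs into per-thread sub-traces."""
    slices = defaultdict(list)
    for tid, event in trace:
        slices[tid].append(event)
    return dict(slices)


def kgram_birthmark(events, k):
    """Count every window of k consecutive events in one (sub-)trace."""
    return Counter(tuple(events[i:i + k]) for i in range(len(events) - k + 1))


def slice_birthmarks(trace, k):
    """One k-gram birthmark per thread. Thread IDs are dropped on purpose,
    since they are not guaranteed to be stable across runs."""
    return [kgram_birthmark(ev, k) for ev in project_by_thread(trace).values()]


def aggregate(slices):
    """Assumed aggregation rule: sum the k-gram counts of all slices."""
    total = Counter()
    for s in slices:
        total.update(s)
    return total


# Two hypothetical interleavings of the same two threads under the same input.
run_a = [(1, "open"), (1, "read"), (2, "mmap"), (1, "close"), (2, "write"), (2, "munmap")]
run_b = [(2, "mmap"), (1, "open"), (2, "write"), (1, "read"), (2, "munmap"), (1, "close")]

k = 2
# Schedule-sensitive: k-grams over the whole interleaved trace.
whole_a = kgram_birthmark([e for _, e in run_a], k)
whole_b = kgram_birthmark([e for _, e in run_b], k)
# Schedule-insensitive: k-grams per thread, then aggregated.
tw_a = aggregate(slice_birthmarks(run_a, k))
tw_b = aggregate(slice_birthmarks(run_b, k))
```

In this toy example `whole_a` and `whole_b` share no k-gram at all, while `tw_a` and `tw_b` are identical, which is the effect the slide describes.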

14 Thread-Aware Birthmark Based Plagiarism Detection
5 main modules:
① DAM: monitoring and recording
② PP: constitute valid traces
③ BG: extract thread-aware birthmarks
④ BSC: calculate similarity scores
⑤ PD: determine detection result

16 Dynamic Analysis Module
Monitors the execution of a program using Pin
- DKISExtractor: performs dynamic taint analysis to identify and record key instructions
- SysTracer: records each execution of a system call

19 Pre-Processor & Birthmark Generator
- Pre-Processor: filters out noise and extracts valid traces
- Birthmark Generator: generates TW-DKISBs and TW-SCSSBs using the implemented SA and SS models

22 Similarity Calculator & Plagiarism Decider
Similarity Calculator: four similarity metrics
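
The formulas on this slide did not survive extraction. As a rough sketch, the four metrics named on slide 6 (Cosine, Jaccard, Dice, Containment) can be computed on k-gram frequency maps as below. These are the textbook definitions, chosen here as an assumption: cosine uses the counts, the other three use only the sets of distinct k-grams. The authors' exact definitions may differ. The two birthmarks `p` and `q` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two k-gram frequency vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Shared distinct k-grams over all distinct k-grams."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def dice(a, b):
    """Twice the shared distinct k-grams over the sum of both sizes."""
    total = len(a) + len(b)
    return 2 * len(a.keys() & b.keys()) / total if total else 0.0


def containment(a, b):
    """Fraction of a's distinct k-grams that also occur in b (asymmetric)."""
    return len(a.keys() & b.keys()) / len(a) if a else 0.0


# Hypothetical birthmarks: k-gram -> occurrence count.
p = Counter({("open", "read"): 2, ("read", "close"): 1})
q = Counter({("open", "read"): 2, ("read", "write"): 1})

scores = {
    "cosine": cosine(p, q),
    "jaccard": jaccard(p, q),
    "dice": dice(p, q),
    "containment": containment(p, q),
}
```

All four functions return a value in [0, 1], so any of them can feed the same threshold-based decision step.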

23 Similarity Calculator & Plagiarism Decider
Similarity Calculator: bipartite matching
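
The slide only names bipartite matching, so the following is a sketch under stated assumptions rather than the authors' procedure. It treats each program birthmark as a list of slice birthmarks (sets of k-grams), pairs slices one-to-one so that the summed pairwise similarity is maximal, and normalizes by the larger number of slices so that unmatched slices lower the score. The brute-force search, the normalization, and the use of Jaccard as the pairwise metric are all illustrative choices.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity of two sets of k-grams."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matching_similarity(slices_p, slices_q, sim=jaccard):
    """Maximum-weight one-to-one matching between two lists of slice birthmarks.

    Brute force over permutations, so only suitable for a handful of threads.
    `sim` is assumed to be symmetric.
    """
    if not slices_p or not slices_q:
        return 0.0
    small, large = sorted((slices_p, slices_q), key=len)
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(sim(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, score)
    # Normalize by the larger side so unmatched slices count as similarity 0.
    return best / len(large)


# Hypothetical slice birthmarks; the thread order differs between the two runs.
p = [{("a", "b"), ("b", "c")}, {("x", "y")}]
q = [{("x", "y")}, {("a", "b"), ("b", "c")}]
r = [{("x", "y")}, {("a", "b"), ("b", "d")}]

same = best_matching_similarity(p, q)
close = best_matching_similarity(p, r)
```

For programs with many threads the factorial search becomes impractical; an assignment solver such as SciPy's `linear_sum_assignment` (with `maximize=True`) solves the same problem in polynomial time.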

24 Similarity Calculator & Plagiarism Decider
Similarity Calculator
Decision Maker

26 Evaluation
A high-quality birthmark is one for which the ratio of false classifications is low for a given ɛ.
Two properties to check: resilience and credibility

27 Evaluating Resilience Property
Resilience to different compilers and optimization levels
- Similarity scores between binaries of pigz
- Statistical differences for 20 versions of pigz

28 Evaluating Resilience Property
Resilience to special obfuscation tools
- Cosine similarity between ConGzip and its 29 SandMark-obfuscated versions

29 Evaluating Resilience Property
Resilience to special obfuscation tools
- Allatori, DashO, Jshrink, ProGuard and RetroGuard
- Resilience to Allatori-Series obfuscation tools

30 Evaluating Credibility Property
Similarity between independently implemented programs
- 6 compression programs: Lbzip, lrzip, pbzip2, pigz, plzip and rar
- 5 audio players: Cmus, mocp, mp3blaster, mplayer and sox
- 10 web browsers: arora, chromium, dillo, dooble, epiphany, firefox, konqueror, luakit, midori and seaMonkey
Credibility evaluation of TW-SCSSBs using 10 web browsers

31 Comparing with Traditional Birthmarks
- Performance evaluation metric
  - By varying ɛ from 0 to 0.5, an F-Measure curve can be drawn
  - AUC: area under the F-Measure curve
- Detection criteria
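
The detection criteria and the F-Measure formula were images and are missing from this transcript. The sketch below therefore rests on assumptions: it uses the three-way rule common in the birthmark literature (similarity at least 1 - ɛ means plagiarism, at most ɛ means independent, anything else is inconclusive), computes precision and recall over the pairs judged as plagiarism, and integrates the curve with the trapezoidal rule. The labelled pairs are hypothetical.

```python
def decide(sim, eps):
    """Assumed three-way rule; at eps = 0.5 a score of exactly 0.5 counts as plagiarism."""
    if sim >= 1 - eps:
        return "plagiarism"
    if sim <= eps:
        return "independent"
    return "inconclusive"


def f_measure(pairs, eps):
    """pairs: list of (similarity, is_plagiarism) for labelled program pairs."""
    judged = [truth for sim, truth in pairs if decide(sim, eps) == "plagiarism"]
    true_positives = sum(judged)
    actual = sum(1 for _, truth in pairs if truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(judged)
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


def auc(pairs, steps=50, eps_max=0.5):
    """Trapezoidal area under the F-Measure curve for eps in [0, eps_max]."""
    xs = [eps_max * i / steps for i in range(steps + 1)]
    ys = [f_measure(pairs, x) for x in xs]
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(steps))


# Hypothetical labelled pairs: (similarity score, ground truth).
pairs = [(0.95, True), (0.80, True), (0.60, False), (0.10, False)]

f_strict = f_measure(pairs, 0.1)    # only the 0.95 pair is flagged
f_middle = f_measure(pairs, 0.25)   # both true pairs flagged, no false alarm
f_loose = f_measure(pairs, 0.5)     # the 0.60 pair becomes a false alarm
area = auc(pairs)
```

Because ɛ ranges over [0, 0.5] and the F-Measure is at most 1, the area is bounded by 0.5 under this setup; a larger area means the birthmark separates plagiarized from independent pairs over a wider range of thresholds.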

32 Comparing with Traditional Birthmarks
F-Measure curves for TW-SCSSB_SA, TW-SCSSB_SS, and SCSSB

34 Unsolved Problems & Future Work
Problems:
- Partial and library plagiarism problems
- The tool is preliminary
- The impact of K is not evaluated
Future work:
- Conduct experiments using other kinds of tools, such as shelling tools (UPX, ASProtect, etc.), and on real plagiarism cases
- Improve our method to support partial plagiarism detection
- Evaluate the effect of K on detection ability
- Develop a relatively mature tool

35 Q&A

36 Some Definitions

