Published by Lee Sherman. Modified over 9 years ago.
1
1 Plagiarism Detection for Multithreaded Software Based on Thread-Aware Software Birthmarks
Zhenzhou Tian, zztian@stu.xjtu.edu.cn
MOE Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China
2015-10-18
2
2 Outline
1. Introduction
2. Thread-Aware Birthmark Methods
3. Evaluation
4. Unsolved Problems & Future Work
3
3 Introduction
Software plagiarism is a serious threat to the healthy development of the software industry.
  Licenses are violated for commercial interests, or unwittingly
  Weak code protection awareness
  Powerful automated code obfuscation tools
  Distributed in binary form
4
4 Introduction
A series of methods have been proposed for plagiarism detection:
  Software watermarking
    Inserts extra data
    “a sufficiently determined attacker will eventually be able to defeat any watermark”
  Static and dynamic software birthmarks
    Dynamic birthmarks are more resilient to semantics-preserving code obfuscations
5
5 Introduction
A series of methods have been proposed for plagiarism detection:
  Software watermarking
  Static and dynamic software birthmarks
The increasingly popular trend towards multithreaded programming brings a new challenge to existing dynamic birthmark methods:
  Existing dynamic birthmarks remain optimized for sequential programs
  They neglect the effect of thread scheduling
  Two executions of a single program under the same input can be very different, rendering the existing methods ineffective
6
6 Introduction
Metric        DKISB   SCSSB
Cosine        0.838   0.452
Jaccard       0.551   0.369
Dice          0.678   0.51
Containment   0.735   0.477
DKISB: dynamic key instruction sequence birthmark
SCSSB: system call short sequence birthmark
7
7 Introduction
Contributions:
  Two thread-aware dynamic birthmarks, TW-DKISB and TW-SCSSB, are proposed to detect software plagiarism
    Operate directly on binary executables
    Not limited to specific operating systems and languages
    Resilient to various automated obfuscation techniques
      29 different obfuscation techniques in SandMark
8
8 Introduction
Contributions:
  A prototype is implemented using the Pin instrumentation framework, and extensive experiments are conducted
  A suite of benchmarks is compiled for researchers to conduct experiments and present their findings
    http://labs.xjtudlc.com/labs/benchmark.html
10
10 Software Birthmark
A set of characteristics extracted from a program that reflects intrinsic properties of the program and can be used to identify the program uniquely.
Two types: static and dynamic software birthmarks
Dynamic birthmark as defined by Myles
11
11 Thread-Aware Dynamic Software Birthmark
Predetermining a thread schedule is very difficult.
Instead of enforcing a thread schedule, try to shield executions from the influence of thread scheduling.
12
12 Thread-Aware Dynamic Software Birthmarks
Main idea: split, then aggregate
  The execution order within each thread is relatively stable
  Project the trace on thread IDs to obtain sub-traces, and extract a slice birthmark from each
  Aggregate all slice birthmarks
Different traces of a program under the same input yield the same slices
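The projection step can be illustrated with a minimal Python sketch. It assumes a trace is a list of (thread_id, event) pairs; that representation, the function name project_by_thread, and the sample events are illustrative assumptions, not the format used by the authors' tool.

```python
from collections import defaultdict

def project_by_thread(trace):
    """Split an interleaved trace into per-thread sub-traces (slices).

    `trace` is assumed to be a list of (thread_id, event) pairs; this
    representation is hypothetical, chosen only for illustration.
    """
    slices = defaultdict(list)
    for thread_id, event in trace:
        slices[thread_id].append(event)  # order within a thread is preserved
    return dict(slices)

# Two runs with the same per-thread behaviour but different interleavings.
run_a = [(1, "open"), (2, "read"), (1, "write"), (2, "close")]
run_b = [(2, "read"), (1, "open"), (2, "close"), (1, "write")]

slices_a = project_by_thread(run_a)
slices_b = project_by_thread(run_b)
print(slices_a == slices_b)
```

The two whole traces differ as sequences, but their per-thread slices are identical, which is the property the split step relies on.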
13
13 Slice Birthmark & Program Birthmark K-Gram Slice Birthmarks SAM SSM
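A k-gram slice birthmark can be sketched as a frequency map over the length-k windows of one thread's sub-trace. The sketch below also shows two ways of forming a program birthmark from slices: merging all slice birthmarks into one map, or keeping them as a collection. Reading "SAM" as the merged form and "SSM" as the collection form is an assumption made here, and the function names are hypothetical.

```python
from collections import Counter

def kgram_birthmark(events, k):
    """Count every window of k consecutive events in one sub-trace."""
    return Counter(tuple(events[i:i + k]) for i in range(len(events) - k + 1))

def aggregate_slices(slice_birthmarks):
    """Merge slice birthmarks into one map by summing k-gram frequencies
    (assumed here to correspond to the aggregated model)."""
    total = Counter()
    for birthmark in slice_birthmarks:
        total.update(birthmark)
    return total

# Hypothetical per-thread sub-traces of system call names.
slices = {1: ["open", "read", "open", "read"], 2: ["open", "read", "close"]}

# Collection form: one birthmark per thread, kept separate.
slice_set = [kgram_birthmark(events, 2) for events in slices.values()]
# Merged form: a single birthmark for the whole program.
merged = aggregate_slices(slice_set)
print(merged)
```

Thread IDs are deliberately dropped in both forms, since they can differ between runs and between programs.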
14
14 Thread-Aware Birthmark based Plagiarism Detection
5 main modules:
  ① DAM (Dynamic Analysis Module): monitoring and recording
  ② PP (Pre-Processor): constitute valid traces
  ③ BG (Birthmark Generator): extract thread-aware birthmarks
  ④ BSC (Similarity Calculator): calculate similarity scores
  ⑤ PD (Plagiarism Decider): determine detection result
16
16 Dynamic Analysis Module
Monitors the execution of a program using Pin
  DKISExtractor: performs dynamic taint analysis to identify and record key instructions
  SysTracer: records each executed system call
19
19 Pre-Processor & Birthmark Generator
Pre-Processor: filters out noise and extracts valid traces
Birthmark Generator: generates TW-DKISBs and TW-SCSSBs using the implemented SA and SS models
22
22 Similarity Calculator & Plagiarism Decider Similarity Calculator Four Similarity Metrics
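The four metrics named in this deck (cosine, Jaccard, Dice, containment) can be sketched over birthmarks stored as k-gram frequency maps. The formulas below are the standard textbook definitions; whether the authors apply the set-based metrics to key sets exactly this way is an assumption, and the sample birthmarks are made up.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    """Shared k-grams divided by all distinct k-grams."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

def dice(a, b):
    """Twice the shared k-grams divided by the sum of both key-set sizes."""
    total = len(set(a)) + len(set(b))
    return 2 * len(set(a) & set(b)) / total if total else 0.0

def containment(a, b):
    """Fraction of a's k-grams that also occur in b (asymmetric)."""
    return len(set(a) & set(b)) / len(set(a)) if a else 0.0

# Hypothetical birthmarks: k-gram -> frequency.
p = {("open", "read"): 2, ("read", "close"): 1}
q = {("open", "read"): 2, ("read", "write"): 1}
for metric in (cosine, jaccard, dice, containment):
    print(metric.__name__, round(metric(p, q), 3))
```

Only cosine uses the frequencies; the other three compare key sets, so they react differently when an obfuscator changes how often a k-gram occurs without changing which k-grams occur.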
23
23 Similarity Calculator & Plagiarism Decider Similarity Calculator Bipartite matching
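When a program birthmark is a collection of slice birthmarks, thread IDs cannot be used to pair slices across programs, so a matching step is needed. The sketch below finds the maximum-weight one-to-one matching by brute force over permutations, which is only practical for a handful of threads; a real implementation would likely use the Hungarian algorithm. Dividing by the larger slice count and using Jaccard as the per-slice similarity are assumptions of this sketch, not details taken from the slides.

```python
from itertools import permutations

def jaccard(a, b):
    """Jaccard similarity between two sets of k-grams."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def matching_similarity(slices_a, slices_b, sim=jaccard):
    """Best one-to-one pairing of slices, scored by a symmetric `sim`.

    Normalising by the larger slice count is an assumption: unmatched
    slices then lower the score.
    """
    if not slices_a or not slices_b:
        return 0.0
    small, large = sorted((slices_a, slices_b), key=len)
    best = max(
        sum(sim(s, large[j]) for s, j in zip(small, perm))
        for perm in permutations(range(len(large)), len(small))
    )
    return best / len(large)

# Hypothetical slice birthmarks as sets of k-grams, listed in different orders.
prog_a = [{("open", "read")}, {("send", "recv"), ("recv", "close")}]
prog_b = [{("send", "recv"), ("recv", "close")}, {("open", "read")}]
print(matching_similarity(prog_a, prog_b))
```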
24
24 Similarity Calculator & Plagiarism Decider Similarity Calculator Decision Maker
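A threshold-based decision step might look like the sketch below. The three-way rule (plagiarism above 1 - ɛ, independent below ɛ, inconclusive in between) is the convention commonly used in the birthmark literature and is assumed here, since the slide's own rule did not survive extraction; the default ɛ of 0.25 is an arbitrary illustrative value.

```python
def decide(similarity, epsilon=0.25):
    """Classify a similarity score in [0, 1] using a threshold epsilon.

    Assumed rule: score >= 1 - epsilon -> plagiarism,
                  score <= epsilon     -> independent,
                  otherwise            -> inconclusive.
    """
    if not 0.0 <= epsilon <= 0.5:
        raise ValueError("epsilon is expected to lie in [0, 0.5]")
    if similarity >= 1.0 - epsilon:
        return "plagiarism"
    if similarity <= epsilon:
        return "independent"
    return "inconclusive"

for score in (0.92, 0.50, 0.10):
    print(score, decide(score))
```

A smaller ɛ makes both verdicts stricter and widens the inconclusive band; at ɛ = 0.5 the band disappears.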
26
26 Evaluation
For a high-quality birthmark, the ratio of false classifications should be low for a given threshold ɛ.
Two properties to check: resilience and credibility
27
27 Evaluating Resilience Property
Resilience to different compilers and optimization levels
  Similarity scores between binaries of pigz
  Statistical differences for 20 versions of pigz
28
28 Evaluating Resilience Property
Resilience to special obfuscation tools
  Cosine similarity between ConGzip and its 29 SandMark-obfuscated versions
29
29 Evaluating Resilience Property
Resilience to special obfuscation tools
  Allatori, DashO, JShrink, ProGuard and RetroGuard
  Resilience to Allatori-series obfuscation tools
30
30 Evaluating Credibility Property
Similarity between independently implemented programs
  6 compression programs: lbzip, lrzip, pbzip2, pigz, plzip and rar
  5 audio players: cmus, mocp, mp3blaster, mplayer and sox
  10 web browsers: arora, chromium, dillo, dooble, epiphany, firefox, konqueror, luakit, midori and SeaMonkey
Credibility evaluation of TW-SCSSBs using 10 web browsers
31
31 Comparing with Traditional Birthmarks
Performance evaluation metric
  By varying ɛ from 0 to 0.5, an F-Measure curve can be drawn
  AUC: area under the F-Measure curve
Detection criteria
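The F-Measure curve and its AUC can be sketched as follows. This assumes a labelled set of (similarity, is_plagiarism) pairs, counts a pair as detected when its score is at least 1 - ɛ, uses the standard F-Measure (harmonic mean of precision and recall), and integrates with the trapezoidal rule. The sample scores are made up, and how the authors treat inconclusive results is not shown on the slide, so that part is an assumption.

```python
def f_measure(tp, fp, fn):
    """Standard F-Measure: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f_curve(scored_pairs, epsilons):
    """F-Measure at each epsilon; a pair is flagged when score >= 1 - epsilon."""
    curve = []
    for eps in epsilons:
        flagged = [(s >= 1 - eps, truth) for s, truth in scored_pairs]
        tp = sum(1 for hit, truth in flagged if hit and truth)
        fp = sum(1 for hit, truth in flagged if hit and not truth)
        fn = sum(1 for hit, truth in flagged if not hit and truth)
        curve.append((eps, f_measure(tp, fp, fn)))
    return curve

def auc(curve):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

# Hypothetical labelled scores: (similarity, is_plagiarism).
pairs = [(0.95, True), (0.70, True), (0.20, False), (0.60, False)]
curve = f_curve(pairs, [0.0, 0.25, 0.5])
print(curve, auc(curve))
```

Because ɛ only ranges over [0, 0.5], the AUC is at most 0.5; a larger area means the birthmark separates plagiarized from independent pairs across more threshold choices.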
32
32 Comparing with Traditional Birthmarks
F-Measure curves for TW-SCSSB (SA model), TW-SCSSB (SS model), and SCSSB
34
34 Unsolved Problems & Future Work
Problems
  Partial and library plagiarism problems
  The tool is preliminary
  The impact of K is not evaluated
Future work
  Conduct experiments using other kinds of tools, such as shelling tools (UPX, ASProtect, etc.), and on real plagiarism cases
  Improve the method to support partial plagiarism detection
  Evaluate the effect of K on detection ability
  Develop a relatively mature tool
35
35 Q&A
36
36 Some Definitions