Published by Anthony Manning. Modified over 9 years ago.
1
Hidden Markov Models for Software Piracy Detection
Shabana Kazi
Mark Stamp
HMMs for Piracy Detection 1
2
Intro
Here, we apply metamorphic analysis to software piracy detection
Very similar to techniques used in malware detection
  o But the problem is completely different
  o It has nothing to do with malware
We show that there are other applications of such techniques
3
Software Piracy
Software piracy is a major problem
  o By a 2009 estimate, $3 to $4 is lost to piracy for every $1 in software sales
Usually, piracy consists of taking software without modification
In some cases, software is modified
  o Commercial theft of intellectual property
  o Thief really doesn’t want to get caught…
4
Software Piracy
We assume software is stolen
  o And modified, making it hard to detect
  o If completely rewritten from scratch, we won’t detect it by our approach
Want to make life hard for bad guys
  o Ideally, major modifications required
How much modification is needed before we cannot reliably detect?
5
Goals
Technique applicable to any software
No special effort by developer
  o Nothing extra inserted into code
We only require access to exe file
Not a watermarking scheme
  o More like software “birthmark” analysis
Also not plagiarism detection
  o Here, want a “deeper” analysis
6
Use Case
You work for Alice’s Software Company (ASC)
  o And you develop fancy software for ASC
Trudy’s Software Company (TSC) develops suspiciously similar product
You suspect TSC of stealing your code
  o Not identical, but seems similar
What can you do?
  o We’ve got some ideas that might help…
7
Use Case
Using the technique discussed here, can easily measure code similarity
Low similarity?
  o Then no hope of proving code is stolen
High similarity?
  o Further (costly) analysis is warranted
High similarity does not prove code is stolen
  o But it is a good reason to take a closer look
8
Background
Metamorphic software
  o Metamorphic techniques (dead code, permutation, substitution)
HMM
  o Basic ideas and notation
  o The 3 problems and their solutions (discussed at a high level)
We’ve seen all of this before
9
Overview
Training and scoring
Train HMM on slightly morphed copies of given “base” software
  o Slight morphing to avoid overfitting
Score morphed copies and other files
  o Here, morphing serves to simulate modifications by attacker
Want to know how much morphing is required before detection fails
10
Metamorphic Generator
Built our own metamorphic generator
Morph based on extracted opcodes
  o Morphing consists of dead code insertion
  o Specify a dead code percentage and number of blocks to insert
Do not require that morphed code works
  o Makes detection more difficult, not easier
  o A worst-case scenario, detection-wise
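The dead code insertion described above can be sketched at the opcode level. This is a minimal illustration, not the authors' generator: the function name `insert_dead_code`, the even split of dead code across blocks, and the random choice of insertion points are all assumptions. As on the slide, nothing here tries to keep the morphed code functional.

```python
import random

def insert_dead_code(opcodes, dead_pool, percent, num_blocks, rng=None):
    """Return a morphed copy of `opcodes` with dead code inserted.

    opcodes    -- opcode sequence of the base file (list of strings)
    dead_pool  -- non-empty opcode sequence taken from some other file
    percent    -- dead code to add, as a percentage of the original length
    num_blocks -- number of contiguous blocks the dead code is split into
    (All names and the splitting policy are hypothetical.)
    """
    rng = rng or random.Random()
    total = round(len(opcodes) * percent / 100)
    if total == 0 or num_blocks <= 0:
        return list(opcodes)
    num_blocks = min(num_blocks, total)
    # Split the dead-code budget as evenly as possible across the blocks.
    sizes = [total // num_blocks + (1 if i < total % num_blocks else 0)
             for i in range(num_blocks)]
    # Pick insertion points in the original sequence and apply them from
    # right to left, so an insertion never shifts a position still to come.
    positions = sorted((rng.randrange(len(opcodes) + 1) for _ in sizes),
                       reverse=True)
    morphed = list(opcodes)
    for pos, size in zip(positions, sizes):
        # Copy a contiguous run from the pool, wrapping around if needed.
        start = rng.randrange(max(1, len(dead_pool) - size + 1))
        block = [dead_pool[(start + k) % len(dead_pool)] for k in range(size)]
        morphed[pos:pos] = block
    return morphed
```

With a 100-opcode base, `percent=10` and `num_blocks=2` would add two blocks of five opcodes each, leaving the original opcodes in their original order.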
11
Training
Given a base executable file…
Extract its opcode sequence
Generate 100 slightly morphed copies
  o Each morphed 10%, using dead code extracted from random “normal” file
Train HMM on morphed copies
  o Using 5-fold cross validation
  o Note: We train one model for each “fold”
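The 5-fold arrangement can be sketched as follows. With 100 morphed copies and 5 folds, each fold would train on 80 copies and hold out 20, which matches the 20-file match set scored per fold; the 80/20 figure is inferred from those numbers. The helper name `k_fold_splits` and the interleaved assignment of files to folds are assumptions.

```python
def k_fold_splits(items, k=5):
    """Yield (train, held_out) pairs; each item is held out exactly once.

    Items are assigned to folds in an interleaved way (item j goes to
    fold j % k). This assignment is an illustrative choice.
    """
    items = list(items)
    for i in range(k):
        held_out = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, held_out

# Hypothetical file names standing in for the 100 morphed copies.
morphed_files = [f"morphed_{n:03d}.ops" for n in range(100)]
for fold, (train, held_out) in enumerate(k_fold_splits(morphed_files)):
    # One HMM would be trained on `train`; `held_out` is scored afterwards.
    print(fold, len(train), len(held_out))
```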
12
Training
Illustration of training process
  o Slightly morphed copies of base program
13
Determine Threshold
For each of the 5 folds
  o Train HMM
  o Score 20 morphed files (match set) and 15 normal files (nomatch set)
Determine threshold based on scores
  o Threshold is highest score of normal file
  o Implies FPR = 0; equivalently, TNR = 1 (for the given “fold”)
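The threshold rule can be sketched in a few lines. The scores below are made up for illustration, and the function names are hypothetical. The sketch assumes that a higher score means a closer match to the model, and that a file is flagged only when it scores strictly above the threshold, which is what makes FPR = 0 on the nomatch set by construction.

```python
def set_threshold(nomatch_scores):
    """Threshold is the highest score of any normal (nomatch) file."""
    return max(nomatch_scores)

def is_match(score, threshold):
    """Flag a file as similar to the base only if it beats the threshold."""
    return score > threshold

def rates(match_scores, nomatch_scores, threshold):
    """Return (TPR, FPR) for one fold."""
    tpr = sum(is_match(s, threshold) for s in match_scores) / len(match_scores)
    fpr = sum(is_match(s, threshold) for s in nomatch_scores) / len(nomatch_scores)
    return tpr, fpr

# Hypothetical scores for one fold (not taken from the presentation).
match_scores = [-2.1, -2.3, -2.0, -4.9]
nomatch_scores = [-5.2, -6.8, -4.7]
threshold = set_threshold(nomatch_scores)
tpr, fpr = rates(match_scores, nomatch_scores, threshold)
```

The same `is_match` test would then be applied to tampered files to see when detection starts to fail.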
14
Setting a Threshold
Process used to set threshold
15
Experiments
Want to determine robustness
For each base file tested…
Train to obtain HMM and threshold
Morph base file at various percentages
  o Using various morphing strategies
  o Refer to this morphing as tampering
Score each tampered copy
  o Classify, based on threshold
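Scoring a sequence against a trained HMM is the first of the three HMM problems mentioned under Background, and one standard way to solve it is the forward algorithm with scaling. The sketch below follows the usual notation (`A` for state transitions, `B` for observation probabilities, `pi` for the initial distribution) and takes opcodes already mapped to integer symbols. Normalizing the log-likelihood by sequence length is an assumption made here so that files of different sizes are comparable; the slides do not state the exact score used.

```python
import math

def log_likelihood_per_opcode(obs, A, B, pi):
    """Scaled forward algorithm: return log P(obs | model) / len(obs).

    obs -- non-empty list of observation symbols (integers)
    A   -- N x N state transition matrix, A[i][j] = P(state j | state i)
    B   -- N x M observation matrix, B[i][k] = P(symbol k | state i)
    pi  -- initial state distribution of length N
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through A, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        # Rescale to avoid underflow; the log of the scales sums to log P.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob / len(obs)
```

A tampered file's score from this function would then be compared against the threshold obtained in training.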
16
Experiments
Scoring tampered files
17
Experiment Details
For each base file
  o 6 models
  o 10 tamper percentages for each
  o 100 files each
  o So, 6000 scores!
18
Experiment Details
Tested 10 base files, each data point
  o So 60,000 scores computed…
19
Experiment Details
Repeated entire experiment 6 times
  o Using different number of blocks in training phase
  o Training made little difference on scores
  o So, here we only give results where 1 block used in training phase
In total 360,000 scores computed
  o And 360 “models” generated
  o That is, 1800 HMMs (one per fold)
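The counts quoted across the Experiment Details slides can be checked with a little arithmetic. The variable names are illustrative, and reading 360 as 6 models times 10 base files times 6 repetitions is an interpretation that happens to reproduce the stated totals.

```python
models_per_base = 6      # models per base file
tamper_levels = 10       # tamper percentages per model
files_per_level = 100    # tampered files per percentage
base_files = 10          # base files tested
repetitions = 6          # whole experiment repeated 6 times
folds = 5                # one HMM per fold

scores_per_base = models_per_base * tamper_levels * files_per_level  # 6,000
scores_per_run = scores_per_base * base_files                        # 60,000
total_scores = scores_per_run * repetitions                          # 360,000
total_models = models_per_base * base_files * repetitions            # 360
total_hmms = total_models * folds                                    # 1,800
```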
20
Results: Bar Graph
21
Results: 3-d Plot
22
Conclusions
Results look very promising
  o Robust: high degree of morphing required before base file goes undetected
  o Practical: only requires exe, no special effort when developing
  o Applies to any exe, at any time
Overall, strong software “birthmark” strategy with practical implications
23
Future Work
Statistical analysis somewhat weak
  o Results may be stronger than they appear
Many other scores/combinations of scores can be tested
  o Results can only get better
Consider other morphing techniques
  o And other file types (e.g., bytecode)
  o And mitigations for 1-block morphing …
24
References
S. Kazi and M. Stamp, Hidden Markov models for software piracy detection, Information Security Journal: A Global Perspective, 22:140-149, 2013