Hidden Markov Models for Software Piracy Detection Shabana Kazi Mark Stamp HMMs for Piracy Detection 1.

Intro
• Here, we apply metamorphic analysis to software piracy detection
• Very similar to techniques used in malware detection
  o But the problem is completely different
  o It has nothing to do with malware
• We show that such techniques have other applications

Software Piracy
• Software piracy is a major problem
  o According to a 2009 estimate, $3 to $4 is lost to piracy for every $1 in software sales
• Usually, piracy consists of taking software without modification
• In some cases, the software is modified
  o Commercial theft of intellectual property
  o The thief really doesn’t want to get caught…

Software Piracy
• We assume the software is stolen
  o And modified, making it hard to detect
  o If it is completely rewritten from scratch, our approach will not detect it
• Want to make life hard for the bad guys
  o Ideally, major modifications are required
• How much modification is needed before we can no longer reliably detect it?

Goals
• Technique applicable to any software
• No special effort by the developer
  o Nothing extra inserted into the code
• We only require access to the exe file
• Not a watermarking scheme
  o More like software “birthmark” analysis
• Also not plagiarism detection
  o Here, we want a “deeper” analysis

Use Case
• You work for Alice’s Software Company (ASC)
  o And you develop fancy software for ASC
• Trudy’s Software Company (TSC) develops a suspiciously similar product
• You suspect TSC of stealing your code
  o Not identical, but it seems similar
• What can you do?
  o We’ve got some ideas that might help…

Use Case
• Using the technique discussed here, we can easily measure code similarity
• Low similarity?
  o Then there is no hope of proving the code is stolen
• High similarity?
  o Further (costly) analysis is warranted
• High similarity does not prove the code is stolen
  o But it is a good reason to take a closer look

Background
• Metamorphic software
  o Metamorphic techniques (dead code insertion, permutation, substitution)
• HMMs
  o Basic ideas and notation
  o The 3 problems and their solutions (discussed at a high level)
• We’ve seen all of this before

Overview
• Training and scoring
• Train an HMM on slightly morphed copies of a given “base” program
  o Slight morphing helps avoid overfitting
• Score morphed copies and other files
  o Here, morphing serves to simulate modifications by an attacker
• Want to know how much morphing is required before detection fails
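To make “scoring” concrete, here is a minimal Python sketch that scores an opcode sequence against a trained HMM λ = (A, B, π) using the scaled forward algorithm (the solution to the first of the 3 HMM problems). It assumes opcodes have already been mapped to integer indices and that the score is the log likelihood normalized by sequence length, as is common in this line of work. The function name hmm_log_likelihood_per_symbol is hypothetical and is not taken from the authors’ code.

```python
import math
from typing import Sequence


def hmm_log_likelihood_per_symbol(
    A: Sequence[Sequence[float]],  # N x N state transition probabilities
    B: Sequence[Sequence[float]],  # N x M observation (opcode) probabilities
    pi: Sequence[float],           # initial state distribution, length N
    obs: Sequence[int],            # opcode sequence as integer indices < M
) -> float:
    """Return log P(O | lambda) / T, computed with the scaled forward algorithm."""
    n_states = len(pi)
    T = len(obs)
    if T == 0:
        raise ValueError("observation sequence is empty")
    log_prob = 0.0
    # alpha_0(i) = pi_i * b_i(O_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for t in range(T):
        if t > 0:
            # alpha_t(i) = [sum_j alpha_{t-1}(j) * a_ji] * b_i(O_t)
            alpha = [
                sum(alpha[j] * A[j][i] for j in range(n_states)) * B[i][obs[t]]
                for i in range(n_states)
            ]
        c = sum(alpha)  # equals P(O_t | O_0..O_{t-1}, lambda) since alphas are rescaled
        if c == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(c)
    # log P(O | lambda) is the sum of the log scale factors; normalize by length
    return log_prob / T
```

Under this convention, a higher (less negative) score means the file looks more like the program the HMM was trained on, and normalizing by length lets files of different sizes be compared.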

Metamorphic Generator
• We built our own metamorphic generator
• Morphing is based on extracted opcodes
  o Morphing consists of dead code insertion
  o We specify a dead code percentage and the number of blocks to insert
• We do not require that the morphed code works
  o This makes detection more difficult, not easier
  o A worst-case scenario, detection-wise
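The slides do not show the generator itself, so the following is only a minimal Python sketch of opcode-level dead code insertion. It assumes the dead code percentage is measured relative to the base opcode count and that each block is a contiguous run of opcodes copied from a “donor” file; the function name insert_dead_code and these details are hypothetical. As on the slide, nothing is done to keep the morphed code functional.

```python
import random
from typing import List, Optional, Sequence


def insert_dead_code(
    base: Sequence[str],
    donor: Sequence[str],
    percent: float,
    num_blocks: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Insert about percent% of len(base) donor opcodes into base, split into
    num_blocks contiguous blocks at random positions (hypothetical sketch)."""
    if not donor:
        raise ValueError("donor opcode sequence is empty")
    rng = rng or random.Random()
    total = round(len(base) * percent / 100)
    if total == 0 or num_blocks <= 0:
        return list(base)
    num_blocks = min(num_blocks, total)
    # Split the dead code budget as evenly as possible among the blocks
    sizes = [total // num_blocks + (1 if i < total % num_blocks else 0)
             for i in range(num_blocks)]
    # Random insertion points in the base sequence, in increasing order
    positions = sorted(rng.randint(0, len(base)) for _ in range(num_blocks))
    morphed: List[str] = []
    prev = 0
    for pos, size in zip(positions, sizes):
        morphed.extend(base[prev:pos])
        # Copy a contiguous run of `size` donor opcodes, wrapping around if needed
        start = rng.randrange(len(donor))
        morphed.extend(donor[(start + k) % len(donor)] for k in range(size))
        prev = pos
    morphed.extend(base[prev:])
    return morphed
```

The original opcodes keep their relative order, so only the inserted blocks differ from the base sequence; with percent=10 this corresponds to the “slight” 10% morphing used for the training copies.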

Training
• Given a base executable file…
• Extract its opcode sequence
• Generate 100 slightly morphed copies
  o Each morphed by 10%, using dead code extracted from a random “normal” file
• Train an HMM on the morphed copies
  o Using 5-fold cross validation
  o Note: We train one model for each “fold”
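With 100 morphed copies and 5 folds, each fold can hold out 20 copies for scoring and train on the remaining 80, which matches the 20-file match set used when setting the threshold. This is a generic contiguous-block k-fold split in Python; the function name k_fold_splits is hypothetical, and the slides do not say how copies are assigned to folds.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 5) -> List[Tuple[List[T], List[T]]]:
    """Split items into k (train, test) pairs; each item is tested exactly once."""
    n = len(items)
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= len(items)")
    # Fold boundaries: fold f covers items[bounds[f]:bounds[f + 1]]
    bounds = [f * n // k for f in range(k + 1)]
    splits = []
    for f in range(k):
        test = list(items[bounds[f]:bounds[f + 1]])
        train = list(items[:bounds[f]]) + list(items[bounds[f + 1]:])
        splits.append((train, test))
    return splits
```

One HMM would then be trained on each fold’s training list, giving 5 models per base file and training configuration.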

Training
[Figure: illustration of the training process, using slightly morphed copies of the base program]

Determine Threshold
• For each of the 5 folds
  o Train an HMM
  o Score 20 morphed files (match set) and 15 normal files (nomatch set)
• Determine the threshold based on these scores
  o Threshold is the highest score of any normal file
  o Implies FPR = 0; equivalently, TNR = 1 (for the given “fold”)
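A minimal Python sketch of this rule follows. It assumes higher scores mean “more similar” and that a file is classified as a match only when its score is strictly greater than the threshold, which is what makes FPR = 0 on the nomatch set. The function name and the sample scores are illustrative only.

```python
from typing import Sequence, Tuple


def threshold_and_rates(
    match_scores: Sequence[float], nomatch_scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (threshold, TPR, FPR) for one fold."""
    # Threshold is the highest score of any normal (nomatch) file
    threshold = max(nomatch_scores)
    # A file is classified as a match only if its score exceeds the threshold
    tp = sum(s > threshold for s in match_scores)
    fp = sum(s > threshold for s in nomatch_scores)  # always 0 by construction
    return threshold, tp / len(match_scores), fp / len(nomatch_scores)


# Illustrative (made-up) per-opcode log-likelihood scores
thr, tpr, fpr = threshold_and_rates([-1.0, -1.2, -3.0, -0.9], [-2.5, -2.8, -4.0])
print(thr, tpr, fpr)  # -2.5 0.75 0.0
```

Because the threshold sits exactly at the best-scoring normal file, any error shows up only as missed morphed copies (lower TPR), never as false alarms on the nomatch set.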

Setting a Threshold
[Figure: process used to set the threshold]

Experiments
• Want to determine robustness
• For each base file tested…
• Train to obtain an HMM and a threshold
• Morph the base file at various percentages
  o Using various morphing strategies
  o We refer to this morphing as tampering
• Score each tampered copy
  o Classify, based on the threshold
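The experimental loop can be sketched generically in Python. Here tamper and score are placeholders supplied by the caller (for example, a dead code inserter and an HMM scoring function); the function name detection_rates, the default of 100 copies, and the “strictly above the threshold” rule are assumptions for illustration.

```python
from typing import Callable, Dict, Sequence


def detection_rates(
    base: Sequence[str],
    tamper: Callable[[Sequence[str], int], Sequence[str]],  # (opcodes, percent) -> tampered opcodes
    score: Callable[[Sequence[str]], float],                # higher score = closer to base
    threshold: float,
    percents: Sequence[int],
    copies: int = 100,
) -> Dict[int, float]:
    """For each tamper percentage, return the fraction of tampered copies
    still classified as a match (score above the threshold)."""
    rates: Dict[int, float] = {}
    for p in percents:
        detected = sum(score(tamper(base, p)) > threshold for _ in range(copies))
        rates[p] = detected / copies
    return rates
```

The tamper percentage at which the rate starts to fall indicates how much modification an attacker would need before the base file goes undetected.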

Experiments
[Figure: scoring tampered files]

Experiment Details
• For each base file
  o 6 models
  o 10 tamper percentages for each
  o 100 files each
  o So, 6000 scores!

Experiment Details
• Tested 10 base files for each data point
  o So, 60,000 scores computed…

Experiment Details
• Repeated the entire experiment 6 times
  o Using a different number of blocks in the training phase
  o Training made little difference to the scores
  o So, here we only give results where 1 block was used in the training phase
• In total, 360,000 scores computed
  o And 360 “models” generated
  o That is, 1800 HMMs (one per fold)
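The counts on these slides multiply out as follows. The variable names are hypothetical, and splitting the 360 “models” as 10 base files × 6 models × 6 repetitions is our reading of the slides rather than something they state explicitly.

```python
# Factors taken from the "Experiment Details" slides
models_per_base = 6        # "6 models" per base file
tamper_percents = 10       # tamper percentages per model
files_per_percent = 100    # tampered files per percentage
base_files = 10
repetitions = 6            # experiment repeated with different training block counts
folds = 5                  # one HMM per fold

scores_per_base = models_per_base * tamper_percents * files_per_percent
scores_per_run = scores_per_base * base_files
total_scores = scores_per_run * repetitions
# Assumed breakdown of the 360 "models" (our reading, not stated on the slides)
total_models = base_files * models_per_base * repetitions
total_hmms = total_models * folds

print(scores_per_base, scores_per_run, total_scores, total_models, total_hmms)
# 6000 60000 360000 360 1800
```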

Results: Bar Graph
[Figure: bar graph of results]

Results: 3-D Plot
[Figure: 3-D plot of results]

Conclusions
• Results look very promising
  o Robust: a high degree of morphing is required before the base file goes undetected
  o Practical: only requires the exe, with no special effort during development
  o Applies to any exe, at any time
• Overall, a strong software “birthmark” strategy with practical implications

Future Work
• Statistical analysis is somewhat weak
  o Results may be stronger than they appear
• Many other scores and combinations of scores can be tested
  o Results can only get better
• Consider other morphing techniques
  o And other file types (e.g., bytecode)
  o And mitigations for 1-block morphing …

References
• S. Kazi and M. Stamp, “Hidden Markov models for software piracy detection,” Information Security Journal: A Global Perspective, vol. 22, 2013