Experiments in Software Watermarking
Bradford P. Cuppy, B.S.
University of Evansville
Fri, Nov 8, 2002

Introduction
The problem with software piracy is that it costs billions of dollars.
Others steal code segments and deny the original author proper credit for the work accomplished.
Watermarking can serve as legal proof of ownership.
Software watermarking is a very new field, whereas video, graphic and audio watermarking are well established.
This presentation covers my watermark, the Planted Plane Cubic Tree (PPCT): experiments on it, its strengths, attacks on it, and comparisons with what others have done.

Related Research
Types of watermarking systems:
– Private (my watermark)
– Semi-private
– Public
Different types of software watermarks:
– Static data (e.g. the strings shown by running "strings netscape")
– Dynamic software
– Dynamic data
– Easter egg
– Dynamic execution trace
– Others (fingerprinting, license mark)

Attacks on Watermarks
There are three types of attacks:
– Additive: a counterfeiter adds his own watermark.
– Subtractive: the watermark is removed.
– Distortive: obfuscation, or decompiling and recompiling the code, alters the watermark enough that it is no longer recognizable.

Protecting Watermarks
According to the Palsberg paper, there are three defenses for watermarks:
– randomization
– obfuscation
– tamperproofing
These are covered in later slides.
Reference: Jens Palsberg, Sowmya Krishnaswamy, Minseok Kwon, Di Ma, Qiuyun Shao, Yi Zhang, "Experience with Software Watermarking", 2000 Annual Computer Security Applications Conference, New Orleans, Section 3.

My Software Watermark
My watermark uses a modified Planted Plane Cubic Tree (PPCT), which is based on the dynamic data structure watermark.
The watermark is essentially a binary tree with a root node.
The watermark cannot be accessed while the program runs normally; it is only accessible by running the program under the GNU Debugger (gdb).
Accessibility is controlled by setting a couple of special variables within the debugger.
The watermark can be found in a core dump.
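
The slides do not show how the debugger-set variables gate the watermark. A minimal sketch, assuming a single global gate variable: the names wm_gate, WM_MAGIC, wm_checkpoint and the magic value are all hypothetical. The watermark-building code runs only if something outside the program has written the magic value into the gate before the check executes.

```c
#include <assert.h>

/* Hypothetical magic value; the talk does not give the real one. */
#define WM_MAGIC 0x5EEDF00DUL

/* Never written by the program itself. volatile keeps the optimizer from
   caching or folding away reads of it. */
volatile unsigned long wm_gate = 0;

/* Stand-in for the routine that allocates the watermark tree. */
static unsigned wm_built = 0;

static void wm_build(void) {
    assert(wm_gate == WM_MAGIC);   /* only reachable through the open gate */
    wm_built++;
}

/* Placed inside normal program flow. Returns 1 if the gate was open and
   the watermark was built, 0 otherwise. */
int wm_checkpoint(void) {
    if (wm_gate != WM_MAGIC)
        return 0;
    wm_build();
    return 1;
}
```

With symbols present, the gate could be opened in gdb after stopping at a breakpoint in main with "set var wm_gate = 0x5EEDF00D". In a stripped binary it would have to be written by raw address, e.g. "set {unsigned long} ADDR = 0x5EEDF00D", where ADDR is the variable's address found beforehand.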

Comparison of Binary Trees
PPCT Tree
A PPCT tree is like a binary tree, except that there is a root node “R” which points to the starting node of the binary tree. The binary tree nodes point back to root “R”.
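
To make the back-pointer concrete, here is a sketch of the node layout this slide describes, with hypothetical type and field names. Because every tree node stores a pointer to R, code that lands on any node can jump to R and then walk the whole tree.

```c
#include <assert.h>
#include <stddef.h>

/* PPCT-style node as described on this slide: each tree node also keeps
   a pointer back to the root R. Names are hypothetical. */
struct ppct_node {
    struct ppct_node *left, *right;
    struct ppct_node *root;   /* back-pointer to R (NULL in R itself) */
};

/* Counts the nodes reachable through forward (left/right) pointers. */
static int count_forward(const struct ppct_node *x) {
    if (x == NULL)
        return 0;
    return 1 + count_forward(x->left) + count_forward(x->right);
}

/* From ANY node: follow the back-pointer to R, then walk forward. This is
   why the whole watermark is reachable wherever an attacker starts. */
int ppct_count_from(const struct ppct_node *x) {
    assert(x != NULL);
    const struct ppct_node *R = x->root ? x->root : x;
    return count_forward(R);
}
```

For a four-node example (R, a start node a, and its children b and c), counting from the leaf c still reports all four nodes.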

Comparisons of Binary Trees (cont.)
My Modified PPCT Tree
My modified PPCT leaves out the pointers from the binary tree nodes back to the root “R”.
Questions concerning binary trees:
– Why a binary tree? Why not another kind of tree or graph?
Binary trees based on dynamic data structure watermarks are not apparent to an attacker, since there is no distinctive output when the program is executed, even in debug mode. Other watermarks, such as the Easter egg, are apparent and can be found and destroyed.
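
The slides do not say how a watermark value is mapped onto the tree. The sketch below assumes a simple shape encoding purely for illustration: R points to a start node, and each bit of a 16-bit value (most significant first) hangs the next node on the left for 0 or on the right for 1. Only forward pointers are stored, as in the modified PPCT. The names wm_embed, wm_extract, wm_free, WM_BITS and the encoding itself are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define WM_BITS 16   /* hypothetical watermark width */

/* Modified PPCT node: forward pointers only, no back-pointer to R. */
struct wm_node {
    struct wm_node *left, *right;
};

/* Builds R plus a chain of WM_BITS + 1 nodes. A 0 bit puts the next node
   on the left, a 1 bit on the right (illustrative encoding). */
struct wm_node *wm_embed(uint16_t w) {
    struct wm_node *R = calloc(1, sizeof *R);
    struct wm_node *cur = calloc(1, sizeof *cur);
    assert(R != NULL && cur != NULL);
    R->left = cur;                          /* R points to the start node */
    for (int k = WM_BITS - 1; k >= 0; k--) {
        struct wm_node *next = calloc(1, sizeof *next);
        assert(next != NULL);
        if ((w >> k) & 1u)
            cur->right = next;
        else
            cur->left = next;
        cur = next;
    }
    return R;
}

/* Recovers w by walking down from R. Returns -1 if the shape is not a
   chain of exactly WM_BITS edges. */
long wm_extract(const struct wm_node *R) {
    const struct wm_node *cur = R ? R->left : NULL;
    long w = 0;
    int bits = 0;
    while (cur && (cur->left || cur->right)) {
        if (cur->left && cur->right)
            return -1;                      /* not the expected shape */
        w = (w << 1) | (cur->right ? 1 : 0);
        cur = cur->right ? cur->right : cur->left;
        bits++;
    }
    return (cur && bits == WM_BITS) ? w : -1;
}

/* Frees R and the chain hanging from it (each node has at most one child). */
void wm_free(struct wm_node *R) {
    while (R) {
        struct wm_node *next = R->left ? R->left : R->right;
        free(R);
        R = next;
    }
}
```

Swapping the two child pointers of any chain node flips the corresponding bit, which is the kind of distortive change a structure-based check would have to detect.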

Modified PPCT versus Original PPCT
How is my modified PPCT better than the original PPCT?
– There is no need to point back to the root node, since the tools available with Java for examining bytecode binaries are not available for C/C++. GNU software offers only the GNU Debugger (gdb) for stepping through programs.
– The watermark is found in a core dump after the code executes.
– The raw memory locations have to be set through the GNU Debugger (gdb) before executing the program.
– The public would get the binary with the symbols stripped ("-s" option).
– The modified tree is easier to build with the available functions.

Mod PPCT v. PPCT (cont.)
Is the modified PPCT easier or harder to break?
– In the original PPCT tree, the whole watermark can be found regardless of which node the cracker starts on, since every node has a pointer to the root node.
– In the modified PPCT, the cracker must start on the root node in order to manipulate the whole watermark. If he starts anywhere else, he cannot destroy the whole watermark; parts of it remain.
– The tree is smaller, since it has only forward pointers.
– The tree would be more difficult to find, since it is more compact.
– The Easter egg watermark is the opposite case: it is large and complex, which makes it an easy target, as mentioned in the Palsberg paper.
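
Two of the claims above can be checked directly in a sketch with hypothetical type names: a forward-only node is one pointer smaller than a node that also points back to R, and a walk started at a non-root node of a forward-only tree reaches only that node's subtree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: original PPCT node (with back-pointer) and
   modified PPCT node (forward pointers only). */
struct orig_node { struct orig_node *left, *right, *root; };
struct mod_node  { struct mod_node *left, *right; };

/* Bytes saved per node by dropping the back-pointer. */
size_t per_node_saving(void) {
    assert(sizeof(struct mod_node) < sizeof(struct orig_node));
    return sizeof(struct orig_node) - sizeof(struct mod_node);
}

/* With no way back up, a walk started at x sees only x's subtree. */
int mod_count_from(const struct mod_node *x) {
    if (x == NULL)
        return 0;
    return 1 + mod_count_from(x->left) + mod_count_from(x->right);
}
```

On a typical 64-bit build that is 24 versus 16 bytes per node. In a five-node tree, a walk from R sees all five nodes, while a walk from an inner node sees only its own descendants.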

Mod PPCT v. PPCT (cont.)
Why was I motivated to modify it?
– Under Java, the PPCT can easily be referenced with the available Java tools, whereas under GNU C/C++ the only tool available is the GNU Debugger (gdb).
– Equivalent tools for C/C++ would have to be written, which would require thorough knowledge of how the compiler lays out binaries.
– Under C/C++, the PPCT tree cannot be traversed due to limitations of the tools.
– Simplicity of coding.
How is the watermark discovered?
– The tree is traversed once, then a core dump is forced, and the watermark is recovered from the core dump file.
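
A minimal sketch of the discovery step, assuming a forward-only node type and hypothetical function names: walk the tree once, then call abort(). On Unix-like systems the default action for SIGABRT terminates the process and writes a core file if core dumps are enabled (e.g. with "ulimit -c unlimited"), and that file includes the heap where the nodes live.

```c
#include <assert.h>
#include <stdlib.h>

/* Forward-only watermark node (hypothetical name). */
struct wm_node { struct wm_node *left, *right; };

/* Visits every node once in preorder and returns the number visited. */
int wm_traverse(const struct wm_node *x) {
    if (x == NULL)
        return 0;
    return 1 + wm_traverse(x->left) + wm_traverse(x->right);
}

/* Traverse once, then abort() to force a core dump. Does not return. */
void wm_traverse_and_dump(const struct wm_node *R) {
    int visited = wm_traverse(R);
    assert(visited > 0);   /* there must be a tree to dump */
    (void)visited;         /* avoids an unused-variable warning under NDEBUG */
    abort();
}
```

The dump could then be opened with "gdb ./program core" (the core file's name and location depend on the system configuration), and a node examined with a command such as "x/2gx ADDR", which prints two 8-byte words, i.e. the left and right pointers on a 64-bit build, where ADDR is a node address.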

Alternative Watermarks
Which alternative watermarks have I looked at?
– Static watermark: a display line such as the one found by running "strings netscape" in Unix, which shows the copyright message “©1995 by Netscape, Inc.”. It can simply be removed with a binary hex editor, so it is very easily cracked.
– Code watermarks are easily susceptible to distortive de-watermarking attacks.
– The Easter egg watermark shows a special display after a special input sequence. For example, entering “about:mozilla” in the URL field of Netscape shows a fire-breathing dragon instead of shooting stars.
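
For contrast, a sketch of the two alternatives in C, with hypothetical function names and a made-up copyright string. The static mark is stored verbatim in the binary's read-only data, which is why a tool like strings finds it and a hex editor can overwrite it. The Easter egg is just a comparison against a special input, here the “about:mozilla” trigger mentioned above.

```c
#include <assert.h>
#include <string.h>

/* Static data watermark: a plain string that ends up verbatim in the
   binary's read-only data. The text is a made-up example. */
static const char copyright_mark[] = "(c) 2002 Example Author, all rights reserved";

const char *copyright_text(void) {
    return copyright_mark;
}

/* Easter egg watermark: a special input triggers a special display. */
int is_easter_egg(const char *input) {
    assert(input != NULL);
    return strcmp(input, "about:mozilla") == 0;
}
```

Both are easy targets for the same reason: the mark is either a recognizable string or a single, visible branch on one input.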

Placement of Watermark
Watermarking functions and subroutines are added at the source code stage; nothing is done after compilation.
The watermark subroutines are placed in various parts of the program by hand:
– Each program and its functions are unique enough that this cannot be done automatically.
– The only way this could be done automatically is if specific standards, such as naming conventions, were set and followed by the programmer when writing source code.
– The disadvantage of automated watermark insertion would be the consistency with which the watermark could then be found.
The programmer must decide how to place the watermarking subroutines:
– The functions are mixed in with the functions the program requires.
– The watermark functions are renamed so that their names look similar to the names of the program's required functions.
Watermarking subroutines need to be integrated into each program individually, and programs vary quite a bit.
In the diagram on the next slide, the yellow “watermark” word represents an access to a watermark subroutine or variable.
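
A sketch of what weaving with look-alike names might look like, assuming a simple text-processing program; every name here is hypothetical. count_words and is_comment stand for genuine program functions, while init_word_cache and update_word_cache actually build watermark nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Really a watermark node and root R, named to look like a cache. */
struct cache_ent { struct cache_ent *left, *right; };
struct cache_ent *cache_head = NULL;

/* Genuine program function: counts whitespace-separated words. */
size_t count_words(const char *line) {
    size_t n = 0;
    int in_word = 0;
    for (; *line; line++) {
        if (*line == ' ' || *line == '\t' || *line == '\n')
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}

/* Watermark routine disguised as cache setup: allocates the root once. */
void init_word_cache(void) {
    if (cache_head != NULL)
        return;
    cache_head = calloc(1, sizeof *cache_head);
    assert(cache_head != NULL);
}

/* Genuine program function. */
int is_comment(const char *line) {
    return line[0] == '#';
}

/* Watermark routine disguised as a cache update: hangs one child node on
   the left (side == 0) or right (side != 0) if that slot is still empty. */
void update_word_cache(int side) {
    assert(cache_head != NULL);
    struct cache_ent **slot = side ? &cache_head->right : &cache_head->left;
    if (*slot != NULL)
        return;
    *slot = calloc(1, sizeof **slot);
    assert(*slot != NULL);
}
```

In a real program the disguised routines would be called from scattered places in the normal control flow rather than one after another.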

Place of Watermark - Diagram

Experimental Results
I used John the Ripper v1.6 as the platform.
Early versions of the watermark were easily found with the Unix "nm" command.
Fixes included:
– stripping the symbol table from the compiled binaries ("-s" option)
– mixing watermark and non-watermark functions in the source code
– including tamperproofing
– placing the conditional statement that reveals the watermark further down in the main code, and requiring execution under a different name
The size and execution time of the watermarked and non-watermarked versions differ very little.

Strength Standards for Watermarks
There are three different protection mechanisms for software watermarks:
– Randomization
– Obfuscation
– Tamperproofing
Randomization
– Weaving the watermark into the code, defined here as mixing the watermark functions with the program functions in the source code. This makes it harder to compare watermarked and non-watermarked versions.
– My watermark was randomized by placing different watermarking subroutines in different parts of the watermarked program.

Strength (cont.)
Obfuscation
– Dynamic and static opaque predicates: conditional statements whose outcome is fixed and known to the programmer but hard for an attacker to determine; here they guard the code that shows the watermark.
– Variable splitting and merging, e.g. splitting x into x1 and x2, or merging y1 and y2 into y.
– Renaming, i.e. changing the names of variables. Renaming variables is moot, since variables are referred to by address once symbols are stripped from the binary.
– My watermark was obfuscated by using static opaque predicates and by renaming functions and variables so that they “blend” in with other parts of the program.
Tamperproofing
– The program depends on the watermark to function.
– Check a hash or CRC value of the program.
– A parent pointer, which serves as the point of origin for the binary tree.
– Check the form of the watermark, e.g. whether it is still a PPCT or has been changed into a simple binary tree.
– My program depends on the watermark in order to function.
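
A minimal sketch of the techniques listed above, under stated assumptions: the predicate, the XOR-based split, the 16-bit merge, and all function names are illustrative choices, not the ones used in the watermark. The opaque predicate relies on x(x+1) always being even. The tamperproofing check uses the standard reflected CRC-32 (polynomial 0xEDB88320); a real program would run it over its own code or data and compare against a value recorded at build time. batch_size is a hypothetical example of making normal work depend on the watermark value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Static opaque predicate: x*(x+1) is a product of consecutive integers,
   hence even; unsigned wrap-around modulo 2^32 keeps the parity. */
int opaque_true(unsigned x) {
    return (x * (x + 1u)) % 2u == 0u;
}

/* Variable split: x is stored as x1 ^ x2 and never appears whole. */
struct split_u32 { uint32_t x1, x2; };

struct split_u32 split(uint32_t x, uint32_t mask) {
    struct split_u32 s = { x ^ mask, mask };
    return s;
}

uint32_t unsplit(struct split_u32 s) { return s.x1 ^ s.x2; }

/* Variable merge: two 16-bit values y1, y2 packed into one 32-bit y. */
uint32_t merge(uint16_t y1, uint16_t y2) { return ((uint32_t)y1 << 16) | y2; }
uint16_t merged_hi(uint32_t y) { return (uint16_t)(y >> 16); }
uint16_t merged_lo(uint32_t y) { return (uint16_t)(y & 0xFFFFu); }

/* Tamperproofing: bitwise reflected CRC-32 over a byte region. */
uint32_t crc32_bytes(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Dependency on the watermark: a work parameter is derived from the
   watermark value behind an always-true predicate. */
unsigned batch_size(uint32_t wm_value) {
    unsigned b = 0;
    if (opaque_true(wm_value))          /* always taken */
        b = (unsigned)(wm_value & 0xFFu) + 1u;
    assert(b >= 1u);
    return b;
}
```

The CRC-32 of the ASCII string "123456789" is the standard check value 0xCBF43926, which is a quick way to confirm the implementation.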

Attacking my Watermark
To attack my watermark, an attacker would need to:
– Determine the type of watermark, which can be discovered only by using a debugger, setting the special variable (raw memory location), and then executing the program.
– Use a debugger or decompiler to go through the binary code. The Reverse Engineering Compiler (REC), found at URL, was used. Decompilers for GNU C/C++ are crude, and the source code they produce is very large: the original source is about 390 lines, while the decompiled source was about 26 thousand lines.
– A rough estimate of the time needed to break the watermark is more than an 8-hour day.

Comparison to Other Watermarks
My watermark was benchmarked against Dr. Collberg's watermark and Dr. Palsberg's watermark (Java Wiz).
My watermark contains all three elements: randomization, obfuscation and tamperproofing. Dr. Collberg's and Dr. Palsberg's watermarks contain only two: randomization and obfuscation. There is no tamperproofing in their watermarks.
On randomization, both Dr. Collberg and I use weaving and unusual means of accessing the watermark, whereas Dr. Palsberg uses only weaving.
On obfuscation, Dr. Collberg's watermark and mine differ quite a bit:
– Dr. Collberg's watermark uses static opaque predicates, padding, variable split/merge, renaming, method inlining/outlining, and modification of the inheritance hierarchy.
– Mine uses only static opaque predicates.
– Dr. Palsberg's uses only dynamic opaque predicates.

Conclusion
Image and audio watermarking are well established.
Software watermarking is a relatively new field with a lot of potential still to be explored.
The technology has the potential to become a cat-and-mouse game between pirates and software authors/owners.
My watermark is one of many steps toward perfecting the field of software watermarking. It does not end the research and work once and for all; it is just a stepping stone toward the never-ending, elusive goal of the ultimate software watermark.
END