Watermarking Relational Databases Rakesh Agrawal and Jerry Kiernan.

Presentation transcript:

Watermarking Relational Databases Rakesh Agrawal and Jerry Kiernan

Why Watermark Databases
- Watermark: an intentionally introduced pattern in the data
  - hard to occur by chance
  - hard to find => hard to destroy (robust against malicious attack)
- Increasing use of databases in applications beyond "behind-the-firewall data processing" involving data publication
- Data providers require technical solutions to deter data theft and assert ownership of pirated copies

Assumption
- The value of the database is significantly reduced if all of the k least significant bits of an attribute are dropped or perturbed, but it is acceptable to perturb a small number of attribute values
- Datasets from many data publishers satisfy this assumption (it is acceptable to trade off a small decrease in quality to assert ownership)
  - Tables of parametric specifications (mechanical, electrical, electronic, chemical, etc.), surveys (geological, climatic, etc.), life sciences (e.g. gene expressions)
  - Historical precedent: logarithm tables, astronomical ephemerides, H.P.
- Inappropriate dataset: online bank balances

Desiderata
- Detectability
  - Using a subset of the tuples and attributes
- Robustness
  - Updates and malicious attacks
- Incremental Updatability
  - On tuple insert/update/delete
- Imperceptibility
  - Hard to infer the presence of a watermark
- Blind System
  - Detection requires neither the original data nor the watermark
- Key-Based System
  - Algorithm is public
  - Security resides in the choice of secret key

Related Work
- Images [BGM95, HG98, M98, DR00]
- Audio [BTH96]
- Text [M94]
- Software [CT00]

Relational data is different from multimedia data

Multimedia Object
- Consists of a large number of bits, with considerable redundancy => Watermark has a large cover to hide in.
- Relative spatial/temporal positioning of various pieces of an object does not change.
- Portions of an object cannot be dropped or replaced arbitrarily without causing perceptual changes in the object.

Database Relation
- Consists of tuples, each of which represents a separate object => Watermark needs to be spread over these separate objects.
- Tuples of a relation constitute a set and there is no implied ordering between them.
- Pirate can easily drop some tuples/attributes or substitute them with tuples/attributes from other relations.

Need watermarking techniques designed to take into account the special characteristics of relational data.

Techniques
- Introduce watermarks across a fraction of the tuples in a database relation
- Detect the watermark by retrieving a subset of the tuples
- Use statistical hypothesis testing to locate the watermark even in the presence of updates to the data

Message Authentication Code
- h = H(M), where H is a hash function and M is a message
  - Given M, easy to compute h
  - Given h, hard to compute M
  - Given M, hard to find M' such that H(M) = H(M')
- MD5 and SHA are good choices for H
- A MAC is a one-way hash function that depends on a key K
- We use F(r.P) = H(K o H(K o r.P)), where r.P is the primary key of tuple r and o is concatenation
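
The nested construction F(r.P) = H(K o H(K o r.P)) can be sketched in a few lines of Python. This is only an illustration: SHA-256 is used here as a stand-in for H (the slide names MD5 and SHA without fixing a variant), and the function name `mac` and the byte encoding of the key and primary key are assumptions.

```python
import hashlib


def mac(key: bytes, primary_key: bytes) -> int:
    """Compute F(r.P) = H(K o H(K o r.P)) and return it as an integer.

    Assumption: H is SHA-256 and "o" is plain byte concatenation.
    """
    inner = hashlib.sha256(key + primary_key).digest()  # H(K o r.P)
    outer = hashlib.sha256(key + inner).digest()        # H(K o inner)
    return int.from_bytes(outer, "big")
```

The value depends on both the secret key and the primary key, so the same tuple always maps to the same number for the owner, while someone without K cannot reproduce it.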

Watermarking Algorithm
- Determine the attribute(s) to be watermarked, the gap, and the number of LSBs
- For each tuple r, compute the MAC:
  - Establish whether r falls into a gap
  - Select the attribute to be marked
  - Determine the bit position to contain the mark
  - Compute the mark's value
  - Update the attribute's value to reflect the watermark, if necessary
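
A minimal sketch of how these steps might fit together. The slides do not say how the MAC is turned into a tuple, attribute, and bit choice, so the remainder-based rule in `locate`, the parity-based mark value, the use of SHA-256, and every name below are assumptions made for illustration only.

```python
import hashlib


def locate(key, pk, gap, attrs, num_lsbs):
    """Return (attribute, bit position, mark) for a tuple, or None if it is in a gap.

    Hypothetical selection rule: successive quotients of the MAC choose the
    tuple, the attribute, and the bit, so the three choices stay independent.
    """
    inner = hashlib.sha256(key + str(pk).encode()).digest()          # H(K o r.P)
    f = int.from_bytes(hashlib.sha256(key + inner).digest(), "big")  # F(r.P)
    q, r = divmod(f, gap)
    if r != 0:
        return None                      # tuple falls into a gap: leave it alone
    q, attr_index = divmod(q, len(attrs))
    bit = q % num_lsbs
    mark = inner[-1] & 1                 # assumed mark: parity of the inner hash
    return attrs[attr_index], bit, mark


def watermark(rows, key, gap, attrs, num_lsbs):
    """Return copies of rows (dicts with a 'pk' and integer attributes) with marks set."""
    marked = []
    for row in rows:
        row = dict(row)                  # do not modify the caller's data
        loc = locate(key, row["pk"], gap, attrs, num_lsbs)
        if loc is not None:
            attr, bit, mark = loc
            # Overwrite one low-order bit; the value changes only if the bit differed.
            row[attr] = (row[attr] & ~(1 << bit)) | (mark << bit)
        marked.append(row)
    return marked
```

With `gap=1000`, roughly one tuple in a thousand is selected, and about half of the selected values already carry the right bit and stay unchanged.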

Technique
[Figure: a table with primary key PK and attributes A1 to A4, shown before and after watermarking. Annotations: PK5 not selected because it falls in a gap; bit B2 of A1 selected for PK1; value not changed because mark = 1; value changed, mark = 1.]

Without the Private Key, the Watermark is Hard to Destroy
The attacker does not know:
- Which tuple contains a mark
- Which attribute got marked
- Which bit position got marked
- The expected value of a mark

Detection Algorithm
- Locate suspicious data and extract a sample that might contain the watermark
- For each tuple r, compute the MAC:
  - If r doesn't fall into a gap, extract the mark bit value
- Count the number of successes and the number of Bernoulli trials
- Apply statistical analysis to establish the presence of the watermark
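
The detection steps might look roughly like this in Python. As with any sketch of this scheme, the selection rule in `locate`, the parity-based mark, SHA-256, and all names are hypothetical, and the test assumes that an unmarked bit matches the expected mark with probability 1/2.

```python
import hashlib
from math import comb


def locate(key, pk, gap, attrs, num_lsbs):
    """Return (attribute, bit position, expected mark), or None if the tuple is in a gap."""
    inner = hashlib.sha256(key + str(pk).encode()).digest()          # H(K o r.P)
    f = int.from_bytes(hashlib.sha256(key + inner).digest(), "big")  # F(r.P)
    q, r = divmod(f, gap)
    if r != 0:
        return None
    q, attr_index = divmod(q, len(attrs))
    return attrs[attr_index], q % num_lsbs, inner[-1] & 1


def detect(rows, key, gap, attrs, num_lsbs, alpha=0.01):
    """Return (watermark_found, matches, total) for a suspicious sample of rows."""
    total = matches = 0
    for row in rows:
        loc = locate(key, row["pk"], gap, attrs, num_lsbs)
        if loc is None:
            continue                     # tuple was never a candidate for marking
        attr, bit, mark = loc
        if attr not in row:
            continue                     # attribute missing from the sample
        total += 1                       # one Bernoulli trial
        if (row[attr] >> bit) & 1 == mark:
            matches += 1                 # one success
    # Probability of at least `matches` successes in `total` fair coin flips.
    tail = sum(comb(total, i) for i in range(matches, total + 1)) / 2 ** total
    return tail < alpha, matches, total
```

Detection needs only the secret key and the parameters, not the original relation, which is what makes the system blind.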

Extensions to the Algorithm
- Relations with no primary keys
- Null values

Evaluation
- Analysis
- Experiments
  - Forest Cover Type dataset from UCI repository

Attacks
- Bit attacks
  - Randomize, zero-out, bit flipping, rounding, translation
- Subset attack
  - Select subset of tuples and attributes
- Mix-and-match attack
  - Combine data from multiple sources
- Additive attack
  - Insert new watermark over existing watermark
- Invertibility attack
  - Counterfeit watermark
- Benign updates

Cumulative Binomial Probability Distribution

$$b(k; n, p) = \binom{n}{k} p^{k} (1-p)^{n-k}$$

$$B(k; n, p) = \sum_{i=k}^{n} b(i; n, p)$$
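
The two formulas translate directly into Python. The function names mirror the symbols on the slide; the direct float arithmetic is only suitable for moderate n (up to roughly a thousand), which is an implementation limit of this sketch and not of the method.

```python
from math import comb


def b(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def B(k: int, n: int, p: float) -> float:
    """Probability of at least k successes in n Bernoulli trials."""
    return sum(b(i, n, p) for i in range(k, n + 1))


print(B(8, 10, 0.5))  # 0.0546875, i.e. (45 + 10 + 1) / 1024
```

For example, seeing 8 or more matching bits out of 10 by pure chance has probability of about 5.5%, so 10 trials are too few to reject chance at a 0.01 significance level unless all 10 bits match (B(10; 10, 0.5) is about 0.001).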

Parameters and Defaults
- Number of tuples: 1 million
- Number of marked attributes: 1
- Number of least significant bits: 1
- Fraction of tuples marked: 1/1000
- Significance level for hypothesis test: 0.01

Proportion of correctly marked tuples required for detectability
- The proportion of correctly marked tuples needed for detectability decreases as the number of marks increases
- For 1M tuples and 10% of tuples marked, that proportion is below 51%
- Illustrates the tolerance of the watermark to updates
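
The trend can be reproduced numerically. The sketch below finds the smallest number of matching marks whose binomial tail probability drops below the significance level, assuming that an unmarked bit matches by chance with probability 1/2; the function name `min_matches` and the log-space evaluation are choices made for this illustration.

```python
from math import exp, lgamma, log


def min_matches(n: int, alpha: float = 0.01, p: float = 0.5) -> int:
    """Smallest k such that P(at least k successes in n trials) < alpha.

    Returns n + 1 if even n matches out of n would not be significant.
    """
    tail = 0.0
    for k in range(n, -1, -1):
        # log of the binomial probability b(k; n, p), evaluated in log space
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1 - p))
        tail += exp(log_pmf)
        if tail >= alpha:
            return k + 1                 # the previous k was the last significant one
    return 0


for marks in (100, 1_000, 100_000):
    print(marks, min_matches(marks) / marks)
```

With 100,000 marks (10% of 1M tuples), the required proportion comes out only slightly above one half, consistent with the "below 51%" figure on the slide.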

Proportion of correctly marked tuples needed for decreasing alpha
- The data can tolerate a large number of updates while maintaining detectability with high confidence

Excess Error in an Attack
- The attacker can be forced to make orders of magnitude more errors than the owner, making the attacker's data economically much less attractive than the owner's

Samples in Which the Watermark Could be Detected When the Attacker has Dropped Tuples
- Watermark detected in a subset of the tuples of a watermarked relation
- Selectivity gives the sample size
- Each experiment repeated 100 times
- Results show the percentage of trials in which the watermark could be detected

Samples in Which the Watermark was Detected When the Attacker has Dropped Some Attributes
- Watermark detected in a subset of the attributes and tuples of a watermarked relation
- Watermark spread across 10 attributes
- Selectivity gives the sample size
- Each experiment repeated 100 times
- Results show the percentage of trials in which the watermark could be detected

Mix-and-Match Attack
- Minimum fraction f of tuples from the watermarked relation needed for detectability
- N is the relation size
- N x f = tuples from the marked relation
- N x (1 - f) = tuples from other relations

Summary
- Provided desiderata for a system for watermarking database relations
- First watermarking algorithm for database relations
- No dependence on tuple ordering
- Robust against attacks
- Watermark can be incrementally updated
- Requires neither the original relation nor the watermark for detection

Future Work
- Watermarking extensions to handle non-numeric attributes
- New algorithms for fingerprinting to track multiple sources of piracy