Model-based Steganography Phil Sallee University of California, Davis IWDW 2003 October 20, 2003 Seoul, Korea
Outline Introduction Current methods Model-based steganography framework JPEG steganography example Results Conclusions Future Work
Steganography = "Covered" + "Writing". Cryptography: conceal message content. Steganography: conceal communication.
Steganography vs. Watermarking. Steganography: emphasis on avoiding detection; largest hidden message possible; usually fragile. Watermarking: emphasis on avoiding distortion of cover; as robust as possible; usually small hidden message.
Measurements of Interest Capacity: <message size> / <steganogram size> Embedding Efficiency: <message size> / <# changes to cover>
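The two ratios above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are hypothetical, and the sketch assumes the message size is counted in bytes for capacity and in bits for embedding efficiency.

```python
def capacity(message_bytes: int, steganogram_bytes: int) -> float:
    """Fraction of the steganogram size taken up by the hidden message."""
    return message_bytes / steganogram_bytes


def embedding_efficiency(message_bits: int, changes: int) -> float:
    """Message bits embedded per change made to the cover."""
    return message_bits / changes


# Hypothetical numbers, for illustration only (not taken from the results).
cap = capacity(6_500, 50_000)
eff = embedding_efficiency(8 * 6_500, 26_000)
print(f"capacity = {cap:.1%}, efficiency = {eff:.2f} bits per change")
```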
Current Steganography Methods
Method     Coefficient histogram   Maximum capacity   Embedding efficiency
JSteg      [plot]                  13%                2
F5         [plot]                                     1.5
Outguess   [plot]                  6.5%               1
Can we do better? What is the maximum capacity achievable before risking detection? How can we achieve this maximum capacity? At what embedding efficiency can we obtain this maximum capacity?
Model-based Steganography: Cover $x$ is an instance of a random variable $X$ distributed according to a model $P_X$. Split $x = (x_a, x_b)$. Choose $x' = (x_a, x'_b)$ to encode a message $M$ while maintaining the model statistics $P_X$.
Model-Based Steganography: Encoding
Model-Based Steganography: Decoding
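Capacity: Maximum capacity = entropy of $P_{X_b|X_a}$: $H(X_b \mid X_a = x_a) = -\sum_{x_b} P_{X_b|X_a}(x_b \mid x_a) \log_2 P_{X_b|X_a}(x_b \mid x_a)$. An entropy codec is designed to achieve the entropy limit.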
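As a rough illustration of the capacity bound, the sketch below sums the entropy of the conditional distribution of x_b given x_a over a handful of symbols. The function names and the three example distributions are hypothetical. An ideal entropy codec would embed about this many bits, while a practical codec carries some overhead.

```python
import math


def conditional_entropy_bits(probs):
    """Entropy in bits of one conditional distribution P(x_b | x_a)."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def capacity_bits(symbol_distributions):
    """Upper bound on message bits: the sum of the per-symbol entropies."""
    return sum(conditional_entropy_bits(d) for d in symbol_distributions)


# Hypothetical offset distributions for three coefficients.
dists = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
total = capacity_bits(dists)
print(f"capacity bound: {total:.3f} bits over {len(dists)} symbols")
```

A uniform offset distribution carries a full bit, a skewed one carries less, and a deterministic one carries none.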
Steganalysis: Determine the likelihood that $x_b$ is drawn from $P_{X_b|X_a}(x_b \mid x_a)$. Compute the expected message length. Decode the "message". A longer-than-expected message indicates a violation of the statistical model.
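The length comparison can be sketched as follows. This is a simplified illustration, not the talk's implementation: it approximates the decoded message length by the ideal code length, the sum of -log2 P(x_b | x_a) over the observed offsets, and compares it with the sum of the per-symbol entropies. The function names and sample data are hypothetical, and every observed symbol is assumed to have nonzero model probability.

```python
import math


def decoded_length_bits(symbols):
    """Ideal code length of the observed offsets under the model.

    `symbols` is a list of (probabilities, observed_index) pairs.
    """
    return sum(-math.log2(probs[i]) for probs, i in symbols)


def expected_length_bits(symbols):
    """Expected code length if the offsets really follow the model."""
    return sum(-sum(q * math.log2(q) for q in probs if q > 0)
               for probs, _ in symbols)


def excess_bits(symbols):
    """Positive values suggest the offsets are less likely than the model predicts."""
    return decoded_length_bits(symbols) - expected_length_bits(symbols)


# Hypothetical data: the model says offset 1 is rare, yet it is always observed.
suspicious = [([0.9, 0.1], 1)] * 10
print(f"excess: {excess_bits(suspicious):.1f} bits")
```

A threshold on the excess would still have to be chosen. For a short sequence, a moderate excess can arise by chance.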
An example: JPEG Steganography. Model: marginal statistics of DCT coefficients. Achieve maximum capacity without altering marginal statistics. Measure the capacity and embedding rate achievable. Compare results to current JPEG steganography methods F5 and Outguess.
Model: u = coefficient value; p > 1, s > 0 are fit to each coefficient type.
Model CDF: The cumulative distribution function is easy to calculate. It is used to integrate the density function over a given histogram bin.
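The model equation did not survive in these slides, so the sketch below assumes a generalized Cauchy density, P(u) = (p - 1) / (2s) * (|u/s| + 1)^(-p), which has a closed-form CDF and matches the constraints p > 1, s > 0. Treat that choice, and all the function names, as assumptions. The point is only to show how a closed-form CDF gives the probability mass of a histogram bin by subtraction.

```python
def density(u: float, p: float, s: float) -> float:
    """Assumed generalized Cauchy density with shape p > 1 and scale s > 0."""
    return (p - 1) / (2 * s) * (abs(u / s) + 1) ** (-p)


def cdf(u: float, p: float, s: float) -> float:
    """Closed-form integral of `density` from minus infinity to u."""
    tail = 0.5 * (1 + abs(u / s)) ** (1 - p)
    return tail if u <= 0 else 1 - tail


def bin_probability(lo: float, hi: float, p: float, s: float) -> float:
    """Probability mass of the histogram bin [lo, hi)."""
    return cdf(hi, p, s) - cdf(lo, p, s)


# Mass of the bin around the integer coefficient value 0, with sample parameters.
print(bin_probability(-0.5, 0.5, p=2.0, s=1.0))
```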
Fitting the Model Parameters: Parameters $p$, $s$ are fit by maximum likelihood: $\{\hat{p}, \hat{s}\} = \arg\max_{p,s} \sum_u h(u) \log P(u \mid p, s)$, where $h$ is a coefficient histogram.
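A maximum-likelihood fit to a histogram can be sketched with a plain grid search. Everything here is an assumption made for illustration: the generalized Cauchy CDF stands in for the model, bins are taken to be unit-width and centered on integer coefficient values, and the grid values and sample histogram are hypothetical. A real fit would more likely use a numerical optimizer.

```python
import math


def cdf(u, p, s):
    """Assumed generalized Cauchy CDF (shape p > 1, scale s > 0)."""
    tail = 0.5 * (1 + abs(u / s)) ** (1 - p)
    return tail if u <= 0 else 1 - tail


def log_likelihood(hist, p, s):
    """Sum of count * log(bin probability) over unit-width integer bins."""
    total = 0.0
    for u, count in hist.items():
        if count > 0:
            total += count * math.log(cdf(u + 0.5, p, s) - cdf(u - 0.5, p, s))
    return total


def fit_grid(hist, p_values, s_values):
    """Return the (p, s) pair on the grid with the highest likelihood."""
    candidates = [(p, s) for p in p_values for s in s_values]
    return max(candidates, key=lambda ps: log_likelihood(hist, *ps))


# Hypothetical histogram: coefficient value -> count.
hist = {-3: 2, -2: 5, -1: 14, 0: 40, 1: 15, 2: 6, 3: 1}
p_hat, s_hat = fit_grid(hist, [1.5, 2.0, 3.0, 4.0], [0.5, 1.0, 2.0, 3.0])
print(p_hat, s_hat)
```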
Model Fit to Histogram
Embedding, step size = 2: $x_a$ = bin group; $x_b$ = offset within the group (like an LSB); $x_b \in \{0, 1\}$.
Embedding, step size = 2 vs. step size = 3: $x_a$ = bin group; $x_b$ = offset within the group (like an LSB). With step size 2, $x_b \in \{0, 1\}$. With step size 3, $x_a$ is lower precision and there are 3 offsets per group, so $x_b \in \{0, 1, 2\}$.
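The split of a coefficient into a bin group and an offset can be sketched as below. The slides do not say how negative values and zero are handled, so this sketch makes its own assumptions: zero coefficients are skipped, and groups are formed on the magnitude so that the sign is preserved. The function names are hypothetical.

```python
def split_coefficient(u: int, step: int):
    """Split a nonzero coefficient into (sign, bin group x_a, offset x_b)."""
    if u == 0:
        raise ValueError("zero coefficients are skipped in this sketch")
    sign = 1 if u > 0 else -1
    index = abs(u) - 1  # magnitudes 1..step fall into group 0
    return sign, index // step, index % step


def join_coefficient(sign: int, group: int, offset: int, step: int) -> int:
    """Inverse of split_coefficient."""
    return sign * (group * step + offset + 1)


def embed_symbol(u: int, new_offset: int, step: int) -> int:
    """Replace the offset of u while leaving its sign and bin group untouched."""
    sign, group, _ = split_coefficient(u, step)
    return join_coefficient(sign, group, new_offset, step)


print(split_coefficient(5, step=2))          # (1, 2, 0)
print(embed_symbol(5, new_offset=1, step=2))  # 6
```

Because the bin group is never modified, the receiver recomputes the same x_a values from the steganogram and can therefore rebuild the same conditional probabilities.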
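Embedding Efficiency (binary offsets): Embedding rate $= H(p) = -(p \log_2 p + (1-p) \log_2 (1-p))$ bits per coefficient, where $p = P(x_b = 0 \mid x_a)$. Change rate $= 2p(1-p)$. Efficiency $= H(p) / (2p(1-p))$.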
Embedding Efficiency Embedding efficiency >= 2!
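The bound of 2 can be checked numerically. The sketch assumes binary offsets, an embedding rate equal to the binary entropy of p = P(x_b = 0 | x_a), and a change rate of 2p(1 - p). That change rate is the probability that the original and the newly chosen offset differ when both follow the model independently. The function names are hypothetical.

```python
import math


def embedding_rate(p: float) -> float:
    """Binary entropy in bits: message bits carried per coefficient."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def change_rate(p: float) -> float:
    """Probability that embedding flips the offset of a coefficient."""
    return 2 * p * (1 - p)


def efficiency(p: float) -> float:
    """Bits embedded per change; defined for 0 < p < 1."""
    return embedding_rate(p) / change_rate(p)


for p in (0.5, 0.7, 0.9):
    print(f"p = {p}: efficiency = {efficiency(p):.3f}")
```

The minimum is reached at p = 0.5, where each coefficient carries one bit and is changed half the time. Skewed offset distributions carry fewer bits but are changed even less often, so the ratio rises.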
Example: Each image is 47k bytes. Which contains a 6.5k-byte message?
Example: original image: 47k bytes; steganogram: 47k bytes; message: 6.46k bytes (13.7%); embedding efficiency: 2.1
Results
Image name   File size (bytes)   Message size (bytes)   Capacity   Embedding efficiency
barb         48,459              6,573                  13.56%     2.06
boat         41,192              5,185                  12.59%     2.03
bridge       55,698              7,022                  12.61%     2.07
goldhill     48,169              6,607                  13.72%     2.11
lena         37,678              4,707                  12.49%     2.16
mandrill     78,316              10,902                 13.92%
Histogram Comparison
JPEG Steganography Methods
Method        Coefficient histogram   Maximum capacity   Embedding efficiency
JSteg         [plot]                  13%                2
F5            [plot]                                     1.5
Outguess      [plot]                  6.5%               1
Model-based   [plot]                                     >2
Conclusions: Presented a unifying framework for steganography and steganalysis. The proposed method maximizes capacity while preserving a given set of statistics. Steganographic security is based on a statistical model of the cover media.
Future Work: Use extra capacity to correct additional statistics: ‘blockiness’, wavelet statistics. Improve the model: dependencies between coefficients. Embed in the wavelet domain. JPEG2000, MP3, MPEG, …
Matlab code available: http://redwood.ucdavis.edu/phil Email: sallee@cs.ucdavis.edu