Data Hiding in Image and Video, Part I: Fundamental Issues and Solutions
ECE 738 Class Presentation
By Tanaphol Thaipanich
Introduction
- Data hiding / digital watermarking: schemes that embed secondary data in digital media
- Applications: ownership protection, access control, authentication
- Data hiding can be viewed as a communication problem: the embedded data is the signal to be transmitted
Introduction
- Trade-off: embedding capacity vs. robustness
- Distortion must be kept imperceptibly small for commercial or artistic reasons
- Actual noise conditions matter: overestimating the noise wastes capacity; underestimating it corrupts the embedded bits
- Embedding capacity is unevenly distributed: the number of embeddable bits varies from location to location
Data Hiding Framework
Key Elements in a Data Hiding System
- Upper layers are built on top to obtain additional functionality
- Three key elements:
  (1) a mechanism for embedding one bit
  (2) a perceptual model to ensure imperceptibility
  (3) modulation/multiplexing techniques to convey multiple bits
Two Basic Embedding Mechanisms
- Type I: Additive Embedding
- The secondary data is added to the host signal: I1 - I0 = f(b)
- The host signal I0 is the major noise source
- Knowledge of I0 at the detector enhances detection performance
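A minimal sketch of Type I additive embedding, assuming an antipodal spread-spectrum form of f(b): a pseudo-random pattern whose sign carries the bit, scaled by an illustrative strength parameter alpha. Neither the pattern nor alpha is specified on the slide.

```python
import numpy as np

def additive_embed(host, bit, alpha=2.0, seed=0):
    """Type I additive embedding sketch: I1 = I0 + f(b).

    Here f(b) is a pseudo-random watermark whose sign carries the bit
    (antipodal signaling); alpha controls the embedding strength.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(host.shape)      # shared watermark pattern
    sign = 1.0 if bit == 1 else -1.0
    return host + alpha * sign * w, w        # watermarked signal I1 and pattern w
```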
Two Basic Embedding Mechanisms
- Type II: Relationship Enforcement Embedding
- A deterministic relationship is enforced on the marked signal: b = g(I1)
- Perceptual distortion is minimized by keeping I1 close to I0
- Detection does not require knowledge of I0; the information about b is carried entirely in I1
Comparison of Type I and Type II
- Capacity vs. robustness under blind detection
- Simplified additive model (Type I)
- For simplicity, the noise Mi is modeled as i.i.d. N(0, σM²)
- The optimal detector (minimum probability of error) uses a normalized correlation as its detection statistic
Comparison of Type I and Type II
- TN is Gaussian distributed with unit variance; its mean grows with the square root of the ratio of total watermark energy to noise power
- To minimize the probability of error, raise the ratio of total watermark energy to noise power
- How can we do that?
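A companion sketch of the normalized-correlation detector under the same assumed antipodal model. Subtracting the host when it is available illustrates why knowledge of I0 improves detection; in the blind case the host simply contributes to the noise term.

```python
import numpy as np

def correlation_detect(received, host, w, sigma):
    """Normalized-correlation detection sketch.

    If the original host is available it is subtracted first (non-blind
    detection); otherwise the host acts as part of the noise with standard
    deviation sigma. The sign of the statistic decides the bit.
    """
    d = received - host if host is not None else received
    t = np.dot(d, w) / (sigma * np.linalg.norm(w))   # detection statistic T_N
    return 1 if t > 0 else 0
```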
Comparison of Type I and Type II
- Given the same noise power:
- A higher-power watermark introduces more distortion
- A longer signal per bit lowers the embedding capacity
Comparison of Type I and Type II
- Type II: no interference from the host media, and one bit can be coded in a small number of host components, giving high capacity
- Example: odd-even embedding (see the sketch below)
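A minimal sketch of odd-even embedding, a Type II scheme: a host coefficient is quantized to an even multiple of the step Q to embed a 0 and to an odd multiple to embed a 1, and decoding only inspects the marked coefficient. The value of Q and the rounding convention are illustrative assumptions.

```python
import numpy as np

def odd_even_embed(coeff, bit, Q=8.0):
    """Quantize coeff to an even multiple of Q for bit 0, an odd multiple for bit 1."""
    k = np.round(coeff / Q)
    if int(k) % 2 != bit:                 # parity of the nearest multiple is wrong
        k += 1 if coeff >= k * Q else -1  # move to the adjacent multiple with the right parity
    return k * Q

def odd_even_decode(coeff, Q=8.0):
    """Recover the bit from the parity of the nearest multiple of Q."""
    return int(np.round(coeff / Q)) % 2
```

Decoding survives any perturbation smaller than Q/2 in magnitude, which is exactly the tolerance zone discussed on the next slide.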
Comparison of Type I and Type II
- Robustness comes from the quantization step, i.e. the tolerance zone
- Larger Q means more tolerance: noise within (-Q/2, Q/2) does not change the decoded bit
- Assuming host components within +/-Q of kQ are uniformly distributed, the embedding MSE is Q²/3
- Larger Q also means larger distortion
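A one-line derivation of that MSE figure under the slide's uniformity assumption:

```latex
% Host component x assumed uniform on (kQ - Q,\; kQ + Q) and mapped to kQ:
\mathrm{MSE} \;=\; \mathbb{E}\!\left[(x - kQ)^2\right]
\;=\; \frac{1}{2Q}\int_{-Q}^{Q} u^{2}\,du
\;=\; \frac{1}{2Q}\cdot\frac{2Q^{3}}{3}
\;=\; \frac{Q^{2}}{3}.
```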
Comparison of Type I and Type II
- Type I: excellent robustness and invisibility when the original host is available at the detector; under blind detection, using a longer watermark per bit gives more robustness but less capacity
- Type II: suited to high-data-rate data hiding applications that do not have to survive strong noise
Quantified Capacity Study
- Type I embedding
- Channel model: CICO (continuous input, continuous output)
- Additive noise: host interference and processing noise, each modeled as i.i.d. Gaussian
- Shannon channel capacity: C = W log2(1 + S/N)
- A² = power of the embedded signal
- σI² = power of the host signal
- σ² = power of the processing noise (σI² >> σ²)
- W = ½ (MTF50)
Quantified Capacity Study
- Type II embedding
- Channel model: DIDO (discrete input, discrete output)
- C_DIDO = 1 - h_p, where h_p is the binary entropy of the bit-error probability p
- h_p = p log2(1/p) + (1-p) log2(1/(1-p))
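A minimal sketch of the Type II capacity per embeddable coefficient, treating the channel as a binary symmetric channel with crossover probability p. How p relates to the noise level is not reproduced here, so p is simply an input.

```python
import numpy as np

def binary_entropy(p):
    """h_p = p*log2(1/p) + (1-p)*log2(1/(1-p)), with h_0 = h_1 = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # avoid log(0) at the endpoints
    return p * np.log2(1.0 / p) + (1 - p) * np.log2(1.0 / (1 - p))

def c_dido(p):
    """Capacity of the discrete-input discrete-output channel: 1 - h_p."""
    return 1.0 - binary_entropy(p)
```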
Capacity Comparison of Type I and Type II
- Problem setting: fix the MSE introduced by the embedding process to E² to control perceptual quality
- Type I: power of the embedded signal = E²; Gaussian processing noise with power σ²; host interference σI = 10E
- Type II: MSE = E², which fixes the quantization step (from MSE = Q²/3, Q = √3·E)
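A rough numerical sketch of this comparison. The Type I expression treats host interference plus processing noise as Gaussian noise against the embedded power E² (blind detection), and the Type II bit-error probability is approximated by the chance that Gaussian noise exceeds Q/2. Both expressions and all parameter values are my assumptions for illustration, not formulas taken from the slides.

```python
import numpy as np
from scipy.stats import norm

def type1_capacity(E, sigma, sigma_I=None, W=0.5):
    """Type I (CICO) capacity per sample under blind detection: the host
    (power sigma_I^2) and the processing noise (sigma^2) both act as noise."""
    if sigma_I is None:
        sigma_I = 10.0 * E                   # host interference sigma_I = 10E (slide setting)
    return W * np.log2(1.0 + E**2 / (sigma_I**2 + sigma**2))

def type2_capacity(E, sigma):
    """Type II (DIDO) capacity per coefficient. MSE = Q^2/3 = E^2 gives
    Q = sqrt(3)*E; a bit flips roughly when the noise exceeds Q/2."""
    Q = np.sqrt(3.0) * E
    p = np.clip(2.0 * norm.sf(Q / (2.0 * sigma)), 1e-12, 0.5)
    h = p * np.log2(1 / p) + (1 - p) * np.log2(1 / (1 - p))
    return 1.0 - h

# Example: capacities across noise levels for a fixed embedding distortion E = 1.
# Type II dominates at low noise; Type I holds up better as the noise grows.
for sigma in (0.2, 1.0, 5.0):
    print(sigma, type1_capacity(1.0, sigma), type2_capacity(1.0, sigma))
```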
Capacity Comparison of Type I and Type II
- Type I: suitable under strong noise conditions
- Type II: useful under low noise conditions
Multi-level Embedding
- Designing for a single target WNR (watermark-to-noise ratio) is risky:
- If the actual noise is stronger, no data can be extracted
- If the actual noise is weaker, embedding capacity is wasted
Multi-level Embedding
- Use two target WNR values
- A fraction α1 of the embedded data survives a WNR of x1, and all of the embedded data survives a higher WNR of x2
- Too many embedding levels cause performance degradation
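A sketch of two-level embedding that reuses the odd_even_embed helper from the earlier Type II sketch: a coarse quantization step carries the bits that must survive the stronger noise level, and a fine step carries the extra payload. The step sizes and the simple coefficient split are illustrative assumptions, not the slide's design.

```python
import numpy as np

def multilevel_embed(coeffs, robust_bits, extra_bits, Q_robust=16.0, Q_fine=4.0):
    """Two-level embedding sketch (requires odd_even_embed from the sketch above).

    The first len(robust_bits) coefficients are marked with a coarse step so those
    bits survive stronger noise; the next coefficients carry the extra bits with a
    fine step that only survives mild noise.
    """
    marked = np.asarray(coeffs, dtype=float).copy()
    n_r = len(robust_bits)
    for i, b in enumerate(robust_bits):
        marked[i] = odd_even_embed(marked[i], b, Q=Q_robust)
    for j, b in enumerate(extra_bits):
        marked[n_r + j] = odd_even_embed(marked[n_r + j], b, Q=Q_fine)
    return marked
```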
Handling Uneven Embedding Capacity
- Unevenly distributed embedding capacity comes from the non-stationary nature of perceptual sources
- Example: changes made in smooth areas are easier to perceive than changes made in textured areas
- Goal: embed as many bits as possible in each region (highest capacity), but conveying the required side information means large overhead and therefore lower capacity
Handling Uneven Embedding Capacity
- An easy way to avoid the high overhead: embed a fixed number of bits in each region, so no side information is needed; this is CER (constant embedding rate)
- Need to ensure that the fixed number of bits is small and the size of each region is large
- This results in significant waste of embedding capacity
- Does increasing the region size really help?
Handling Uneven Embedding Capacity
- Classify each coefficient as embeddable or unembeddable
- Apply a blockwise DCT (using 8x8 or 16x16 blocks)
- Compare the magnitude of each DC/AC coefficient with a perceptual threshold
- A smooth region has no embeddable coefficients
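A sketch of the embeddability test, assuming scipy's 2-D DCT and a single illustrative threshold. Real perceptual models use frequency-dependent thresholds, and only AC coefficients are checked here for simplicity.

```python
import numpy as np
from scipy.fft import dctn

def count_embeddable(block, threshold=16.0):
    """Count AC coefficients whose DCT magnitude exceeds a perceptual threshold.

    A single global threshold stands in for a frequency-dependent perceptual
    model and is an assumption for illustration; the DC term is skipped.
    """
    coeffs = dctn(block.astype(float), norm="ortho")   # blockwise 2-D DCT
    ac = np.abs(coeffs).ravel()[1:]                    # drop the DC coefficient
    return int(np.sum(ac > threshold))

def smooth_block_fraction(image, block=8, threshold=16.0):
    """Fraction of blocks with zero embeddable coefficients (smooth blocks)."""
    h, w = image.shape
    counts = [count_embeddable(image[i:i+block, j:j+block], threshold)
              for i in range(0, h - block + 1, block)
              for j in range(0, w - block + 1, block)]
    return float(np.mean(np.array(counts) == 0))
```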
Handling Uneven Embedding Capacity
- 20% of 8x8 blocks are smooth; 15% of 16x16 blocks are smooth
- Increasing the block size is therefore ineffective at reducing the number of segments with zero embeddable coefficients
- What should we do?
Backup Embedding
- Embed the same data in multiple areas
- Backup embedding with L locations uses as many coefficients per bit as increasing the block size by a factor of L; why is it better?
- Shuffling is a generalization of backup embedding
- Block size = 1, with the locations specified by a permutation function
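A minimal sketch of shuffling as a keyed random permutation applied to all coefficient positions before segmentation; the seed stands in for the shared key and the interface is illustrative.

```python
import numpy as np

def shuffle_and_segment(coeff_indices, segment_size, seed=0):
    """Shuffling sketch: permute all coefficient positions with a keyed random
    permutation, then cut the permuted order into equal segments. Embeddable
    coefficients from busy regions are thereby spread into every segment."""
    rng = np.random.default_rng(seed)            # the seed plays the role of a shared key
    order = rng.permutation(len(coeff_indices))
    permuted = np.asarray(coeff_indices)[order]
    return [permuted[k:k + segment_size]
            for k in range(0, len(permuted), segment_size)]
```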
Quick Questions
- Q1: Why is embedding capacity important?
- Q2: What is the MTF (modulation transfer function)?
- Q3: What is shuffling?
Shuffling
- Random or non-random permutation functions
- Focus on the complete random permutation: all permutations are equiprobable, each with probability 1/S!
Shuffling
- Complete random permutation
- m_r/N = the fraction of segments having r embeddable coefficients
- Consider a simple scenario: throwing balls into holes (see the sketch below)
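A sketch of the "balls in holes" calculation: under a complete random permutation, the chance that a given segment receives exactly r embeddable coefficients is hypergeometric (a segment's worth of positions drawn from the shuffled pool). This is my reconstruction of the argument, and the example parameter values are illustrative, not the slide's.

```python
from math import comb

def expected_fraction(S, q, segment_size, r):
    """E[m_r / N]: probability that a segment of `segment_size` positions, drawn
    from S total positions of which q are embeddable, contains exactly r
    embeddable coefficients (hypergeometric)."""
    return comb(q, r) * comb(S - q, segment_size - r) / comb(S, segment_size)

# Example (illustrative numbers): a 512x512 image, 8x8 segments,
# and roughly 20% embeddable coefficients overall.
S, seg = 512 * 512, 64
q = int(0.2 * S)
print(expected_fraction(S, q, seg, 0))   # expected fraction of segments with none embeddable
```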
Shuffling
Shuffling
- Before shuffling: 20% of segments have no embeddable coefficient (8x8 case)
- After shuffling: E[m0/N] = 0.002%
- Most segments now have embeddable coefficients
Shuffling
- For an image with a high fraction of embeddable coefficients, use small segments for high capacity
- For a smooth image, use large segments to ensure no segment has zero embeddable coefficients
- What does shuffling actually do? It reallocates embeddable coefficients from non-smooth regions to smooth regions
- Any drawbacks?
Practical Considerations
- Shuffling gives a very low probability of a segment having no embeddable coefficient, but the chance is not zero
- Solution: use a primary and a secondary shuffle that are significantly different from each other
- As discussed before, shuffling increases sensitivity, but that is acceptable for applications in which the benefit comes from the hidden data
Variable Embedding Rate (VER)
- Allows more data to be embedded when the average overhead is relatively small compared with the average embedding capacity per segment
- Main issue: how to convey the side information
- The side information can use the same or a different embedding mechanism
- Allocating more energy to the side information makes it more robust but reduces capacity
- Q: Is embedding the side information with the same embedding mechanism a "key in a locked box" problem?
Variable Embedding Rate (VER)
- Part of the embedded data is predetermined
- Example: the detector decodes the data using all candidate shuffles; the shuffle that decodes accurately is the one actually used
- The same idea applies to the segment size (to reduce complexity, use a primary and a secondary size)
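A sketch of that trial-decoding idea: try every candidate shuffle and keep the one whose predetermined header bits decode correctly. The function and parameter names are illustrative; decode_with_shuffle stands in for whatever extraction routine is used.

```python
def detect_side_info(received, candidate_seeds, decode_with_shuffle, sync_bits):
    """Try each candidate shuffle (identified by its seed/key) and keep the one
    whose decoded header matches the known, predetermined sync pattern."""
    for seed in candidate_seeds:
        bits = decode_with_shuffle(received, seed)
        if bits[:len(sync_bits)] == sync_bits:     # header decoded correctly
            return seed, bits[len(sync_bits):]     # actual shuffle and the payload
    return None, None
```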
Thank You
Have questions? Reach me at:
PPT:
MTF
- MTF: spatial frequency response
- MTF applies only in the horizontal direction
- MTF50, MTF10, MTF2: cut-off frequencies where the response drops to 50%, 10%, and 2%