Perceptually Based Methods for Robust Image Hashing
Ph.D. Defense
Vishal Monga
Committee Members: Prof. Ross Baldick, Prof. Brian L. Evans (Advisor), Prof. Wilson S. Geisler, Prof. Joydeep Ghosh, Prof. John E. Gilbert, Prof. Sriram Vishwanath
Communications, Networks, and Systems Area, Dept. of Electrical and Computer Engineering, The University of Texas at Austin
April 13th, 2005
Introduction
The Dichotomy of Image Hashing
Image hash
- Signal processing methods
  - Capture perceptual attributes well; yield robust and visually meaningful representations
  - Little can be said about how secure these representations are
- Cryptographic methods
  - Provably secure
  - However, do not respect the underlying structure of signals/images
Towards a joint signal processing-cryptographic approach...
Hash Example
- Hash function: projects a value from a set with a large (possibly infinite) number of members to a set with a fixed, smaller number of members
  - Irreversible
  - Provides a short, simple representation of a large digital message
- Example: sum of ASCII codes for the characters in a name, modulo N (= 7), a prime number

Database name search example
Name          Hash Value
Ghosh         1
Monga         2
Baldick       3
Vishwanath
Evans         5
Geisler
Gilbert       6
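The hashing rule in this example can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated (sum of ASCII codes modulo 7); the function name `name_hash` and the assumption that names are capitalized exactly as written on the slide are choices made here, not part of the talk.

```python
def name_hash(name: str, n: int = 7) -> int:
    """Sum the ASCII codes of the characters in `name`, then reduce modulo n."""
    return sum(ord(ch) for ch in name) % n


# Group a few names into buckets, as a database index keyed by hash value would.
buckets = {}
for person in ["Ghosh", "Baldick", "Evans", "Gilbert"]:
    buckets.setdefault(name_hash(person), []).append(person)

for value in sorted(buckets):
    print(value, buckets[value])
```

A lookup then only has to compare the query against the names in one bucket instead of against the whole database.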
Image Hashing: Motivation
- Hash functions
  - Produce a fixed-length binary string from a large digital message
  - Used in compilers, database searching, and cryptography
  - Cryptographic hash: security applications, e.g. message authentication, ensuring data integrity
- Traditional cryptographic hash
  - Not suited for multimedia: very sensitive to input, i.e. a change in one input bit changes the output dramatically
- Need for robust perceptual image hashing
  - Perceptual: based on human visual system response
  - Robust: hash values for "perceptually identical" images must be the same (with high probability)
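The sensitivity of a cryptographic hash to a single input bit can be seen with Python's standard `hashlib`. SHA-256 is used here only as a convenient stand-in for "a traditional cryptographic hash"; the talk does not name a specific function, and the 64-byte test message is arbitrary.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = bytes(range(64))                             # arbitrary 64-byte message
flipped = bytes([original[0] ^ 0x01]) + original[1:]    # same message, one bit flipped

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(flipped).digest()

print("input bits changed: ", bit_difference(original, flipped))
print("digest bits changed:", bit_difference(digest_a, digest_b), "of", 8 * len(digest_a))
```

For an image, a visually negligible change such as light compression alters far more than one bit, so the two digests are unrelated, which is why a perceptual hash is needed.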
Image Hashing: Applications
- Image database search and indexing
- Content dependent key generation for watermarking
- Robust image authentication: hash must tolerate incidental modifications yet be sensitive to content changes
[Figure: original image, JPEG-compressed image, and tampered image; labels: same hash value, different hash values, h1, h2]
Outline
- Perceptual image hashing
  - Motivation & applications
- Contribution # 1: A unified framework
  - Formal definition of desired hash properties/goals
  - Novel two-stage hashing algorithm
- Review of existing feature extraction techniques
- Contribution # 2: Robust feature extraction
- Contribution # 3: Clustering algorithms for feature vector compression
  - Randomized clustering for secure hashing
- Summary
Contribution # 1: A Unified Framework
Perceptual Hash: Desirable Properties
- Hash function takes two inputs
  - Image (class of images, e.g. natural images)
  - Secret key (key space)
- Perceptual robustness
- Fragility to visually distinct inputs
- Unpredictability

Symbol     Meaning
H(I, K)    Hash value extracted from image I using secret key K
Iident     Image identical in appearance to I
Idiff      Image clearly distinct in appearance vs. I
q          Length of hash (in bits)
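One common way to state the three properties formally is sketched below. This is an assumed formalization written with the symbols of this slide, not necessarily the exact form used in the talk; the tolerances \theta_1 and \theta_2 (small numbers between 0 and 1) and the candidate hash value v are introduced here for illustration only.

```latex
\begin{align*}
\text{Perceptual robustness:} \quad
  & \Pr\big[\, H(I, K) = H(I_{ident}, K) \,\big] \ge 1 - \theta_1 \\
\text{Fragility to distinct inputs:} \quad
  & \Pr\big[\, H(I, K) \ne H(I_{diff}, K) \,\big] \ge 1 - \theta_2 \\
\text{Unpredictability:} \quad
  & \Pr\big[\, H(I, K) = v \,\big] \approx 2^{-q}
    \quad \text{for every } v \in \{0, 1\}^{q}
\end{align*}
```

The probabilities are taken over the secret key K, so the third line says that, without the key, every q-bit hash value looks roughly equally likely.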
Hashing Framework
- Two-stage hash algorithm [Monga & Evans, 2004]
  Input image I -> extract visually robust feature vector -> intermediate hash (e.g. 1 MB) -> compression: clustering of similar feature vectors -> final hash (e.g. 128 bits)
- Feature vectors extracted from "perceptually identical" images should be close in some distance metric
Existing Techniques
Invariant Feature Extraction
- Image statistics based approaches
  - Intensity statistics: intensity histograms of image blocks [Schneider et al., 1996]; mean, variance and kurtosis of intensity values extracted from image blocks [Kailasanathan et al., 2001]
  - Statistics of wavelet coefficients [Venkatesan et al., 2000]
- Relation based approaches [Lin & Chang, 2001]
  - Invariant relationship between corresponding discrete cosine transform (DCT) coefficients in two 8 × 8 blocks
- Preserve coarse representations
  - Threshold low frequency DCT coefficients [Fridrich et al., 2001]
  - Low-res wavelet sub-bands [Mihcak & Venkatesan, 2000, 2001]
  - Singular values and vectors of sub-images [Kozat et al., 2004]
Related Work
Open Issues
- A robust feature point scheme for hashing
  - Inherent sensitivity to content-changing manipulations (useful in authentication)
  - Representation of image content robust to global and local geometric distortions
  - Exploit properties of the human visual system
- Randomized algorithms for secure image hashing
  - Quantifying the impact of randomization in enhancing hash security
  - Trade-offs with robustness/perceptual significance of the hash
Necessitates a joint signal processing-cryptographic approach
Hypercomplex or End-Stopped Cells
- Cells in the visual cortex that help in object recognition
- Respond strongly to line end-points, corners and points of high curvature [Hubel et al., 1965; Dobbins, 1989]
- End-stopped wavelet basis [Vandergheynst et al., 2000]
  - Apply First Derivative of Gaussian (FDoG) operator to detect end-points of structures identified by the Morlet wavelet
[Figure panels: synthetic L-shaped image; Morlet wavelet response; end-stopped wavelet response]
Contribution # 2: Robust Feature Extraction
Computing Wavelet Transform
- Generalize end-stopped wavelet
- Employ wavelet family
  - Scale parameter = 2, i: scale of the wavelet
  - Discretize orientation range [0, π] into M intervals, i.e. θ_k = kπ/M, k = 0, 1, ..., M − 1
- End-stopped wavelet transform
Proposed Feature Detection Method [Monga & Evans, 2004]
- Compute wavelet transform of image I at a suitably chosen scale i for several different orientations
- Significant feature selection: locations (x, y) in the image that are identified as candidate feature points satisfy
- Avoid trivial (and fragile) features: qualify a location as a final feature point if
- Randomization: partition the image into N (overlapping) random regions using a secret key K; extract features from each random region
- Perceptual quantization: quantize the feature vector based on the distribution (histogram) of image feature points to enhance robustness
Iterative Feature Extraction Algorithm [Monga & Evans, 2004]
1. Extract feature vector f of length P from image I; quantize f perceptually to obtain a binary string bf1 (increase count*)
2. Remove "weak" image geometry: compute 2-D order statistics (OS) filtering of I to produce Ios = OS(I; p, q, r)
3. Preserve "strong" image geometry: perform low-pass linear shift invariant (LSI) filtering on Ios to obtain Ilp
4. Repeat step 1 with Ilp to obtain bf2
5. IF (count = MaxIter) go to step 6.
   ELSE IF D(bf1, bf2) < ρ go to step 6.
   ELSE set I = Ilp and go to step 1.
6. Set fv(I) = bf2

MaxIter, ρ, P, and count are algorithm parameters.
* count = 0 to begin with
fv(I) denotes the quantized feature vector
D(., .): normalized Hamming distance between its arguments
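The control flow of the six steps above can be sketched as follows. This is only an illustration under stated assumptions: images are small lists of lists of numbers, the order-statistic filter is taken to be a 3 × 3 median (one possible choice of rank), the low-pass filter is a 3 × 3 box average, and `toy_features` is a hypothetical stand-in for the wavelet-based feature detector and perceptual quantizer, which are not reproduced here.

```python
def hamming(a, b):
    """Normalized Hamming distance D(., .) between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def neighborhood(img, r, c, k=1):
    """Values in the (2k+1) x (2k+1) window around (r, c), replicating edges."""
    h, w = len(img), len(img[0])
    return [img[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]
            for i in range(r - k, r + k + 1)
            for j in range(c - k, c + k + 1)]


def os_filter(img):
    """Step 2 (assumed form): order-statistic filter, here the 3 x 3 median."""
    return [[sorted(neighborhood(img, r, c))[4] for c in range(len(img[0]))]
            for r in range(len(img))]


def lowpass(img):
    """Step 3 (assumed form): linear shift-invariant 3 x 3 box average."""
    return [[sum(neighborhood(img, r, c)) / 9.0 for c in range(len(img[0]))]
            for r in range(len(img))]


def toy_features(img):
    """Hypothetical extractor: one bit per pixel, set if above the image mean."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def iterative_features(img, extract=toy_features, max_iter=10, rho=0.01):
    """Steps 1-6: iterate until the bit string stabilizes or max_iter is hit."""
    count = 0
    while True:
        bf1 = extract(img)                    # step 1
        count += 1
        i_lp = lowpass(os_filter(img))        # steps 2 and 3
        bf2 = extract(i_lp)                   # step 4
        if count == max_iter or hamming(bf1, bf2) < rho:   # step 5
            return bf2                        # step 6
        img = i_lp


# A 6 x 6 test image: dark background with a bright 3 x 3 square.
image = [[200 if 1 <= r <= 3 and 1 <= c <= 3 else 10 for c in range(6)]
         for r in range(6)]
fv = iterative_features(image)
```

The loop always terminates because `count` reaches `max_iter` even if the distance test never succeeds.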
Image Features at Algorithm Convergence
[Figure panels: original image; JPEG with quality factor of 10; additive white Gaussian noise with zero mean and σ = 10; Stirmark local geometric attack]
Quantitative Results: Feature Extraction
- Quantized feature vector comparison
  - D(fv(I), fv(Iident)) < 0.2
  - D(fv(I), fv(Idiff)) > 0.3

Table 1. Comparison of quantized feature vectors: normalized Hamming distance between quantized feature vectors of original and attacked images (columns: Lena, Bridge, Peppers)
Attack*                 Distances
JPEG, QF = 10           0.04 0.06
AWGN, σ = 20            0.03 0.02
Contrast Enhancement    0.00
Gaussian Smoothing      0.01 0.05
Median Filtering        0.07
Scaling by 50%          0.08 0.14 0.11
Rotation by 2°          0.12 0.15
Rotation by 5°          0.18 0.20 0.19
Cropping by 10%         0.13
Cropping by 20%         0.21 0.22 0.24
*Attacked images generated by Stirmark benchmark software
Comparison with other approaches
Methods compared:
- Threshold coarse wavelet coefficients (Mihcak et al., 2001)
- Preserve low freq. DCT coefficients (Fridrich et al., 2001)
- Proposed feature point detector

Attack                            Entries
JPEG, QF = 10                     YES
AWGN, σ = 20                      NO
Gaussian Smoothing
Median Filtering
Scaling 50%
Rotation 2 degrees
Cropping 10%
Cropping 20%
* Small object addition
* Tamper with facial features
YES: survives attack, i.e. features were close
*Content changing manipulations, should be detected
Clustering: Problem Statement
- Feature vector compression: goals in compressing to a final hash value
  - Significant dimensionality reduction while retaining robustness, fragility to distinct inputs, and randomization
  - Question: what is the minimum length of the final hash value (binary string) needed to meet the above goals?
- Problem statement
  - Let (li, lj) denote vectors in the metric space of feature vectors V; then it is desired with high probability that
    - if D(li, lj) < ε then C(li) = C(lj)
    - if D(li, lj) > δ then C(li) ≠ C(lj)
  - where 0 < ε < δ, and C(li), C(lj) denote the clusters to which these vectors are mapped
Clustering: Possible Compression Methods
- Error correction decoding [Venkatesan et al., 2000]
  - Applicable to binary feature vectors
  - Break the vector down into segments close to the length of codewords in a suitably chosen error correcting code
- More generally, vector quantization/clustering
  - Minimize an "average distance" to achieve compression close to the rate distortion limit: \min \sum_k \sum_{l \in S_k} P(l) D(l, c_k)
  - V: metric space of feature vectors; P(l): probability of occurrence of vector l; D(., .): distance metric defined on feature vectors; c_k: codewords/cluster centers; S_k: kth cluster
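As a small worked example of the "average distance" cost, the sketch below evaluates it for a given clustering, assuming the usual form: each vector's distance to its cluster center c_k, weighted by its probability P(l), summed over all clusters S_k. The function name, the dictionary layout, and the toy 4-bit vectors are illustrative assumptions.

```python
def hamming(a, b):
    """Normalized Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_distance_cost(clusters, centers, prob, dist=hamming):
    """Sum over clusters k and members l of P(l) * D(l, c_k)."""
    return sum(prob[l] * dist(l, centers[k])
               for k, members in clusters.items()
               for l in members)


# Three 4-bit feature vectors grouped into two clusters.
a, b, c = (0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1)
clusters = {0: [a, b], 1: [c]}
centers = {0: a, 1: c}
prob = {a: 0.5, b: 0.25, c: 0.25}

cost = average_distance_cost(clusters, centers, prob)
print(cost)  # only b is off-center: 0.25 * 0.25
```

Note that this cost says nothing about whether two far-apart vectors ended up in the same cluster, only how far members sit from their own center on average.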
Contribution # 3: Clustering Algorithms
Is Average Distance the Appropriate Cost for the Hashing Application?
- Problems with average distance VQ
  - No guarantee that "perceptually distinct" feature vectors indeed map to different clusters: no straightforward way to trade off between the two goals
  - Must decide the number of codebook vectors in advance
  - Must penalize some errors harshly, e.g. if vectors that are really close are not clustered together, or vectors that are very far apart are compressed to the same final hash value
- Define an alternate cost function for hashing
  - Develop a clustering algorithm that tries to minimize that cost
Cost Function for Feature Vector Compression
- Define joint cost matrices C1 and C2 (n × n)
  - n: total number of vectors to be clustered; C(li), C(lj) denote the clusters that these vectors are mapped to
- Exponential cost
  - Ensures that a severe penalty is incurred if feature vectors that are far apart, and hence "perceptually distinct", are clustered together
  - α > 0, Γ > 1 are algorithm parameters
Cost Function for Feature Vector Compression
- Further define S1 as
  *S2 is defined similarly
- Normalize to get the normalized cost matrices
- Then, minimize the "expected" cost
  - p(i) = p(li), p(j) = p(lj)
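To make the idea of two pairwise cost matrices concrete, here is a rough sketch. The exact entries of C1 and C2 are not given in the text above, so the forms used here are assumptions chosen only to match the qualitative description: C1 penalizes close pairs (distance below ε) that are split across clusters, more heavily the closer they are, and C2 penalizes far pairs (distance above δ) that share a cluster, growing exponentially with distance. The normalization by S1 and S2 is omitted because its form is not stated; `expected_cost` simply weights each entry by p(i) p(j).

```python
def hamming(a, b):
    """Normalized Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cost_matrices(vectors, labels, eps, delta, alpha=1.0, gamma=2.0):
    """Build n x n matrices C1, C2 under the assumed exponential forms."""
    n = len(vectors)
    c1 = [[0.0] * n for _ in range(n)]
    c2 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = hamming(vectors[i], vectors[j])
            if d < eps and labels[i] != labels[j]:
                c1[i][j] = gamma ** (-alpha * d)   # assumed: close pair split apart
            if d > delta and labels[i] == labels[j]:
                c2[i][j] = gamma ** (alpha * d)    # assumed: far pair merged
    return c1, c2


def expected_cost(c, p):
    """Sum over i, j of p(i) * p(j) * C(i, j)."""
    n = len(p)
    return sum(p[i] * p[j] * c[i][j] for i in range(n) for j in range(n))


vectors = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1)]
p = [0.5, 0.25, 0.25]

good_c1, good_c2 = cost_matrices(vectors, [0, 0, 1], eps=0.3, delta=0.6)
bad_c1, bad_c2 = cost_matrices(vectors, [0, 1, 1], eps=0.3, delta=0.6)
print(expected_cost(good_c1, p), expected_cost(good_c2, p))
print(expected_cost(bad_c1, p), expected_cost(bad_c2, p))
```

With the labeling `[0, 0, 1]` both costs are zero; with `[0, 1, 1]` the first two vectors are wrongly split and the last two wrongly merged, so both costs become positive.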
Clustering: Hardness Claims & a Good Heuristic
- Decision version of the clustering problem
  - For a fixed number of clusters k, is there a clustering with cost less than a constant?
  - k-way weighted graph cut problem: known to be NP-complete, and reduces to our clustering problem in log-space [Monga et al., 2004]
- A good heuristic?
  - Motivated by the stable roommate/spouse problem
  - Give preference to the "bully" or the strongest candidates in ordered fashion: intuitively this minimizes the grief
- Our clustering problem
  - Notion of strength is captured by the probability mass of the data point/feature vector
Basic Clustering Algorithm [Monga et al. 2004]
- Type I error:
- Type II error:
- Heuristic: select the data point associated with the highest probability mass as the cluster center
- For any (li, lj) in cluster Sk
- No errors until this stage of the algorithm
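The greedy heuristic can be sketched as below. This is a simplified illustration, not the algorithm from the cited paper: the cluster radius of ε/2 is an assumption made here (by the triangle inequality it guarantees that any two members of a cluster are less than ε apart), and every leftover point simply becomes the center of a new cluster, so the separate handling of unclustered points is not modeled.

```python
def hamming(a, b):
    """Normalized Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(vectors, probs, eps):
    """Repeatedly pick the highest-probability unclustered point as a center
    and absorb every unclustered point within eps / 2 of it (assumed radius)."""
    unclustered = set(range(len(vectors)))
    clusters = []
    while unclustered:
        # Highest probability mass wins; ties go to the lower index.
        center = max(unclustered, key=lambda i: (probs[i], -i))
        members = [i for i in sorted(unclustered)
                   if hamming(vectors[i], vectors[center]) < eps / 2]
        clusters.append((center, members))
        unclustered -= set(members)
    return clusters


vectors = [
    (0,) * 10,             # probability 0.4
    (1,) + (0,) * 9,       # one bit away from the first vector
    (1,) * 10,             # probability 0.3
    (0,) + (1,) * 9,       # one bit away from the third vector
]
probs = [0.4, 0.1, 0.3, 0.2]

result = greedy_cluster(vectors, probs, eps=0.3)
print(result)
```

Each center is at distance 0 from itself, so every pass removes at least one point and the loop terminates.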
Handling the unclustered data points
- Approach 1: assign to the cluster that incurs the minimum cost
- Approach 2: all clusters are candidates; assign to the one that minimizes a joint cost
Clustering Algorithms: Revisited
- Approach 1
  - Tries to minimize one expected cost conditioned on the other being 0
- Approach 2
  - Smoothly trades off the minimization of the two expected costs via the parameter β
  - β = ½: joint minimization
- Final hash length determined automatically!
  - Given by ⌈log2 k⌉ bits, where k is the total number of clusters
- Proposed clustering can be used to compress feature vectors in any metric space, i.e. no assumptions on the topology
Clustering: Results
- Compress binary feature vector of L = 240 bits
- ε = 0.2, δ = 0.3 (normalized Hamming distance)

Clustering Algorithm                               Cost values                     Final Hash Length
Approach 1                                         7.64 × 10^-8                    54 bits
Approach 2, β = ½                                  7.43 × 10^-9, 7.46 × 10^-10
Approach 2, β = 1                                  7.17 × 10^-9, 4.87 × 10^-9
Decoding via Reed-Muller Error Correction Codes    5.96 × 10^-4, 3.65 × 10^-5      75 bits
Average Distance VQ                                3.25 × 10^-4, 7.77 × 10^-5      60 bits

At approximately the same rate, the cost is orders of magnitude lower for the proposed clustering
Validating the Perceptual Significance
- Applied the two-stage hash algorithm to a natural image database of 100 images
  - For each image, 20 perceptually identical images were generated using the Stirmark benchmark software
  - Attacks included JPEG compression with varying quality factors, AWGN addition, geometric attacks (viz. small rotation and cropping), linear/non-linear filtering, etc.
- Results
  - Robustness: final hash values for the original and "distorted" images were the same in over 95% of cases
  - Fragility: 1 collision in all pairings (4950) of 100 images
  - In comparison, 40 collisions for traditional VQ and 25 for error correction decoding
Randomized Clustering
- Heuristic for the deterministic map
  - Select the highest probability data point as the cluster center
- Randomization scheme
  - Select cluster centers probabilistically via a randomization parameter s
  - i runs over unclustered data points
- On the role of s
  - s = 0 implies that the selection distribution is uniform, i.e. any point is selected as the cluster center with the same probability
  - s → ∞ implies deterministic clustering
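One selection rule that behaves as described at both extremes of s is to pick unclustered point i with probability proportional to p(i) raised to the power s. The sketch below assumes that rule; it also assumes all probabilities are strictly positive, and it uses a seeded `random.Random` as a hypothetical stand-in for key-dependent randomness. The names `selection_probs` and `pick_center` are illustrative.

```python
import math
import random


def selection_probs(probs, s):
    """Probability of choosing each point as center: p_i**s / sum_j p_j**s.
    Computed in log space so that large s does not underflow to 0/0."""
    logs = [s * math.log(p) for p in probs]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def pick_center(unclustered, probs, s, rng):
    """Draw one cluster center from the unclustered indices."""
    idx = list(unclustered)
    weights = selection_probs([probs[i] for i in idx], s)
    return rng.choices(idx, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.2]
uniform = selection_probs(probs, 0)        # s = 0: every point equally likely
unchanged = selection_probs(probs, 1)      # s = 1: the original probabilities
peaked = selection_probs(probs, 2000)      # large s: mass on the most likely point

rng = random.Random(1234)                  # hypothetical key-derived seed
center = pick_center([0, 1, 2], probs, 2000, rng)
```

Intermediate values of s interpolate between the two regimes, which is the knob used to trade robustness against unpredictability.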
Security Via Randomization
- Conjecture
  - Randomization makes generation of malicious inputs harder
- Adversary model
  - U: set of all possible feature vector pairs
  - E: the error set for deterministic clustering
  - Adversary has complete knowledge of feature extraction and deterministic clustering, and will contrive to generate input pairs over E
- Clustering cost computed over the error set E
  - As randomization increases, adversary achieves little success
The rest of the story...
[Figure panels: cost over the set (complement of the error set); cost over the set]
An appropriate choice of s preserves perceptual robustness while significantly enhancing security: the result of a joint crypto-signal processing approach
Uniformity of the hash distribution
- Kullback-Leibler (KL) distance of the hash distribution measured against the uniform distribution
- Hash distribution is close to uniform for s < 1000
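The KL distance from the uniform distribution can be estimated from a sample of hash values as sketched below. The function name and the use of base-2 logarithms (so the result is in bits) are choices made here; the empirical distribution is taken from simple counts, and values that never occur contribute nothing to the sum.

```python
import math
from collections import Counter


def kl_from_uniform(hashes, num_values):
    """KL divergence D(P || U) in bits between the empirical distribution P of
    `hashes` and the uniform distribution U over `num_values` possible values."""
    counts = Counter(hashes)
    n = len(hashes)
    u = 1.0 / num_values
    return sum((c / n) * math.log2((c / n) / u) for c in counts.values())


balanced = kl_from_uniform([0, 1, 2, 3], 4)      # every value equally often
degenerate = kl_from_uniform([0, 0, 0, 0], 4)    # one value only
skewed = kl_from_uniform([0, 0, 0, 1], 4)        # in between

print(balanced, degenerate, skewed)
```

A value of 0 means the sample looks uniform; the maximum, log2 of the number of possible hash values, is reached when every input maps to the same hash.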
Summary of contributions
- Two-stage hashing framework
  - Media dependent feature extraction followed by (almost) media independent clustering
- Robust feature extraction from natural images
  - Iterative feature extractor that preserves significant image geometry; features invariant under several attacks
- Algorithms for feature vector compression
  - Novel cost function for the hashing application
  - Greedy heuristic based clustering algorithms
  - Randomized clustering for secure hashing
- Image authentication under geometric attacks (not presented)
Questions and Comments!