Clustering Algorithms for Perceptual Image Hashing


1 Clustering Algorithms for Perceptual Image Hashing
IEEE Eleventh DSP Workshop, August 3rd, 2004
Vishal Monga, Arindam Banerjee, and Brian L. Evans
{vishal, abanerje,
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
Research supported by a gift from the Xerox Foundation

2 Hash Example
Hash function: projects a value from a set with a large (possibly infinite) number of members to a set with a fixed, smaller number of members
Irreversible
Provides a short, simple representation of a large digital message
Example: sum of the ASCII codes of the characters in a name, modulo N, a prime number (N = 7)

Database name search example:

Name         Hash Value
Ghosh        1
Monga        2
Baldick      3
Vishwanath
Evans        5
Geisler
Gilbert      6
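A minimal Python sketch of this toy hash; the function name is illustrative, and the exact bucket values depend on conventions (e.g., letter case) that the slide does not spell out, so they may differ slightly from the table above.

```python
def name_hash(name: str, n: int = 7) -> int:
    """Toy hash: sum of the ASCII codes of the characters, modulo a prime n."""
    return sum(ord(c) for c in name) % n

# Different names can collide on the same hash value, and the original
# name cannot be recovered from its hash (irreversibility).
for name in ["Ghosh", "Monga", "Baldick", "Vishwanath", "Evans", "Geisler", "Gilbert"]:
    print(f"{name:12s} {name_hash(name)}")
```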

3 Perceptual Hash: Desirable Properties
Perceptual robustness
Fragility to distinct inputs
Randomization: necessary in security applications to minimize vulnerability against malicious attacks

Symbol     Meaning
H(I)       Hash value extracted from image I
I_ident    Image identical in appearance to I
I_diff     Image clearly distinct in appearance vs. I
m          Length of hash (in bits)
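Using the symbols above, these properties are commonly stated along the following lines; this is a hedged sketch, and the thresholds θ1, θ2 are assumptions not shown in the transcript.

```latex
% Perceptual robustness: visually identical images should hash to the same value
\Pr\big[H(I) = H(I_{\mathrm{ident}})\big] \;\ge\; 1 - \theta_1
% Fragility to distinct inputs: visually different images should rarely collide
\Pr\big[H(I) = H(I_{\mathrm{diff}})\big] \;\le\; \theta_2
% Randomization: over the secret key, each of the 2^m hash values is (near) equally likely
\Pr\big[H(I) = v\big] \;\approx\; 2^{-m} \quad \text{for every } v \in \{0,1\}^m
```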

4 Hashing Framework
Two-stage hash algorithm: feature vector extraction, followed by compression (clustering) of the feature vectors
Goal: retain perceptual significance
Let (l_i, l_j) denote vectors in the metric space of feature vectors V, with 0 < ε < δ; the desired behavior is formalized below
Minimizing the average distance between clusters is inappropriate

Pipeline: Input Image → Feature Vector Extraction → Visually Robust Feature Vector → Compress (or cluster) Feature Vectors → Final Hash
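The desired behavior of the compression stage can be written as follows; this is a sketch consistent with the slide's text, not a verbatim reproduction of the original inequalities.

```latex
% Perceptually close feature vectors must fall in the same cluster,
% and perceptually distant ones must not (0 < epsilon < delta):
D(l_i, l_j) < \varepsilon \;\Longrightarrow\; C(l_i) = C(l_j), \qquad
D(l_i, l_j) > \delta \;\Longrightarrow\; C(l_i) \ne C(l_j).
```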

5 Cost Function for Feature Vector Compression
Define joint cost matrices C1 and C2 (n × n), where n is the total number of vectors to be clustered and C(l_i), C(l_j) denote the clusters that these vectors are mapped to
Exponential cost: ensures a severe penalty when feature vectors that are far apart ("perceptually distinct") are clustered together
α > 0, Γ > 1 are algorithm parameters
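The transcript omits the actual expressions for C1 and C2. One form consistent with the description above, offered only as an assumption, is an exponentially growing penalty applied when far-apart ("perceptually distinct") vectors end up in the same cluster:

```latex
% Hypothetical exponential cost (alpha > 0, Gamma > 1 are the algorithm parameters):
C_1(i,j) \;=\;
\begin{cases}
\alpha\,\Gamma^{\,D(l_i,\,l_j)} & \text{if } D(l_i, l_j) > \delta \text{ and } C(l_i) = C(l_j),\\[2pt]
0 & \text{otherwise,}
\end{cases}
% with C_2 penalizing the complementary error: close vectors (D < epsilon)
% that are assigned to different clusters.
```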

6 Cost Function for Feature Vector Compression
Define S1 (S2 is defined similarly) and normalize the cost matrices
Then minimize the "expected" cost, where p(i) = p(l_i) and p(j) = p(l_j); a numerical sketch follows below
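A small numerical sketch of this step, under two explicit assumptions that the transcript does not confirm: S1 (and S2) is taken to be the sum of the entries of the corresponding cost matrix, and the expectation weights entry (i, j) by p(i)·p(j).

```python
import numpy as np

def expected_cost(C1: np.ndarray, C2: np.ndarray, p: np.ndarray) -> float:
    """Normalize the joint cost matrices and return the 'expected' cost.

    Assumed (not shown in the transcript): S1 = sum of C1's entries,
    S2 = sum of C2's entries, and the expectation uses weights p(i) * p(j).
    """
    S1, S2 = C1.sum(), C2.sum()
    C1n = C1 / S1 if S1 > 0 else C1      # normalized cost matrices
    C2n = C2 / S2 if S2 > 0 else C2
    P = np.outer(p, p)                   # joint weights p(i) * p(j)
    return float((P * (C1n + C2n)).sum())
```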

7 Basic Clustering Algorithm
1. Obtain ε and δ; set k = 1. Select the data point with the highest probability mass and label it l_1
2. Form the first cluster by including all unclustered points l_j such that D(l_1, l_j) < ε/2
3. k = k + 1. Select the highest-probability data point l_k among the unclustered points such that it is sufficiently far from every cluster S, where C is the set of clusters formed up to this step (see the sketch after this list)
4. Form the k-th cluster S_k by including all unclustered points l_j such that D(l_k, l_j) < ε/2
5. Repeat steps 3-4 until no more clusters can be formed
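A greedy Python sketch of these steps. The exact distance condition in step 3 is not reproduced in the transcript; the sketch assumes, purely for illustration, that a new cluster center must lie at least 3ε/2 from every already-clustered point, which is one way to guarantee the separation noted on the next slide.

```python
def basic_clustering(points, probs, dist, eps, center_gap=None):
    """Greedy sketch of the basic clustering algorithm.

    points: list of feature vectors; probs: their probability masses;
    dist: a metric on the feature space; eps: the epsilon parameter.
    The rule for admitting a new cluster center (step 3) is an assumption:
    it must lie at least `center_gap` (default 1.5 * eps) from every point
    that is already clustered.
    """
    if center_gap is None:
        center_gap = 1.5 * eps
    unclustered = set(range(len(points)))
    clusters = []
    # Visit candidate centers in decreasing order of probability mass (steps 1 and 3).
    for c in sorted(range(len(points)), key=lambda i: probs[i], reverse=True):
        if c not in unclustered:
            continue
        clustered = [j for S in clusters for j in S]
        if clustered and min(dist(points[c], points[j]) for j in clustered) <= center_gap:
            continue  # too close to an existing cluster to start a new one
        # Steps 2 and 4: the cluster absorbs all unclustered points within eps/2.
        members = [j for j in unclustered if dist(points[c], points[j]) < eps / 2]
        clusters.append(members)
        unclustered -= set(members)
    return clusters, unclustered  # leftover points are handled by Approach 1 or 2
```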

8 Observations
For any (l_i, l_j) in cluster S_k, D(l_i, l_j) ≤ ε
No errors up to this stage of the algorithm
Each cluster is at least ε away from any other cluster (guaranteed by the selection rule in step 3)
Within each cluster, the maximum distance between any two points is at most ε (both points lie within ε/2 of the cluster center)

9 Approach 1
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster S_i, i = 1, 2, …, k, compute a distance d_i between l* and S_i, and let S(δ) = {S_i such that d_i ≤ δ}
3. IF S(δ) is empty, THEN k = k + 1 and S_k = {l*} is a cluster of its own; ELSE, for each S_i in S(δ), define a cost F(S_i) involving both S_i and its complement (all clusters in S(δ) except S_i); l* is then assigned to the cluster S* = arg min F(S_i) (see the sketch after this list)
4. Repeat steps 1 through 3 until all data points are exhausted
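A Python sketch of Approach 1's control flow. The definitions of d_i and F(S_i) are not shown in this transcript, so d_i is assumed to be the distance from l* to the closest member of S_i, and F is passed in as a caller-supplied callable rather than guessed.

```python
def approach_1(points, probs, dist, clusters, delta, cost_F):
    """Assign leftover points (Approach 1). `clusters` comes from the basic
    algorithm as a list of index lists. `cost_F(i, k, S_delta)` stands in
    for F(S_k) on the slide; its exact definition is not reproduced here.
    """
    clustered = {j for S in clusters for j in S}
    pending = [i for i in range(len(points)) if i not in clustered]
    # Step 1: visit unclustered points in decreasing order of probability mass.
    for i in sorted(pending, key=lambda j: probs[j], reverse=True):
        # Step 2: distance to each existing cluster (closest member, by assumption).
        d = [min(dist(points[i], points[j]) for j in S) for S in clusters]
        S_delta = [k for k, dk in enumerate(d) if dk <= delta]
        if not S_delta:
            clusters.append([i])            # step 3, IF branch: new singleton cluster
        else:
            best = min(S_delta, key=lambda k: cost_F(i, k, S_delta))
            clusters[best].append(i)        # step 3, ELSE branch: argmin of F
    return clusters
```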

10 Approach 2
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster S_i, i = 1, 2, …, k, define a cost F(S_i) that trades off S_i against its complement (all existing clusters except S_i) via a parameter β in [1/2, 1]; l* is then assigned to the cluster S* = arg min F(S_i) (see the sketch after this list)
3. Repeat steps 1 and 2 until all data points are exhausted
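For comparison, a sketch of Approach 2: every existing cluster is a candidate (no δ gate), and the trade-off parameter β is forwarded to a caller-supplied cost, since the slide's expression for F(S_i) is not reproduced in this transcript.

```python
def approach_2(probs, clusters, beta, cost_F):
    """Assign leftover points (Approach 2). `cost_F(i, k, beta)` stands in
    for the beta-weighted cost F(S_k); beta should lie in [0.5, 1.0].
    """
    clustered = {j for S in clusters for j in S}
    pending = [i for i in range(len(probs)) if i not in clustered]
    for i in sorted(pending, key=lambda j: probs[j], reverse=True):
        best = min(range(len(clusters)), key=lambda k: cost_F(i, k, beta))
        clusters[best].append(i)
    return clusters
```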

11 Summary
Approach 1: tries to minimize one of the two normalized expected costs conditioned on the other being 0
Approach 2: smoothly trades off the minimization of the two costs via the parameter β
β = ½ → joint minimization
β = 1 → exclusive minimization of one of the two costs
Final hash length determined automatically! Given by ⌈log2 k⌉ bits, where k is the number of clusters formed
The proposed clustering can compress feature vectors in any metric space, e.g. Euclidean, Hamming, and Levenshtein

12 Clustering Results
Compress a binary feature vector of L = 240 bits
Final hash length = 46 bits, with Approach 2, β = 1/2
The value of the cost function is orders of magnitude lower for the proposed clustering

Clustering Algorithm                                    Cost 1         Cost 2
Approach 1                                              7.64 × 10^-8   —
Approach 2, β = ½                                       7.43 × 10^-9   7.464 × 10^-10
Approach 2, β = 1                                       7.17 × 10^-9   4.87 × 10^-9
Error correction decoding [Mihcak & Venkatesan, 2000]   5.96 × 10^-4   3.65 × 10^-5

13 Conclusion & Future Work
Two-stage framework for image hashing
Feature extraction followed by feature vector compression
Second stage is media independent
Clustering algorithms for compression
Novel cost function for hashing applications
Applicable to feature vectors in any metric space
Trade-offs facilitated between robustness and fragility
Final hash length determined automatically
Future work
Randomized clustering for secure hashing
Information-theoretically secure hashing

