Clustering Algorithms for Perceptual Image Hashing

Clustering Algorithms for Perceptual Image Hashing
IEEE Eleventh DSP Workshop, August 3rd, 2004
Vishal Monga, Arindam Banerjee, and Brian L. Evans
{vishal, abanerje, bevans}@ece.utexas.edu
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering, The University of Texas at Austin
http://signal.ece.utexas.edu
Research supported by a gift from the Xerox Foundation

Hash Example
Hash function: projects a value from a set with a large (possibly infinite) number of members to a set with a fixed number of (fewer) members
- Irreversible
- Provides a short, simple representation of a large digital message
Example (database name search): sum of the ASCII codes of the characters in a name, modulo N, a prime number (here N = 7)

Name          Hash Value
Ghosh         1
Monga         2
Baldick       3
Vishwanath
Evans         5
Geisler
Gilbert       6
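A minimal Python sketch of this toy hash follows; the modulus N = 7 and the list of names come from the slide, but the slide does not specify character casing, so individual outputs may not match its table exactly.

# Toy hash from the slide: sum of the ASCII codes of a name's characters, mod N.
def name_hash(name: str, n: int = 7) -> int:
    """Map a name to one of n buckets by summing character codes modulo n."""
    return sum(ord(c) for c in name) % n

if __name__ == "__main__":
    for name in ["Ghosh", "Monga", "Baldick", "Vishwanath", "Evans", "Geisler", "Gilbert"]:
        print(f"{name:12s} -> {name_hash(name)}")

With only 7 buckets for 7 names, collisions are expected; that is the point of the example, since a hash gives a short, irreversible digest rather than a unique identifier.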

Perceptual Hash: Desirable Properties
- Perceptual robustness: an image identical in appearance to I should produce the same hash value
- Fragility to distinct inputs: an image clearly distinct in appearance from I should produce a different hash value
- Randomization: necessary in security applications to minimize vulnerability against malicious attacks

Symbol     Meaning
H(I)       Hash value extracted from image I
I_ident    Image identical in appearance to I
I_diff     Image clearly distinct in appearance vs. I
m          Length of hash (in bits)

Hashing Framework
Two-stage hash algorithm:
Input Image → Feature Vector Extraction → Visually Robust Feature Vector → Compress (or cluster) Feature Vectors → Final Hash
Goal: retain perceptual significance in the compression (clustering) stage.
Let (l_i, l_j) denote vectors in the metric space of feature vectors V, with distance D(., .), and let 0 < ε < δ. It is then desired that
  D(l_i, l_j) < ε  implies the two vectors map to the same cluster, and
  D(l_i, l_j) > δ  implies they map to different clusters.
Minimizing the average distance between clusters is inappropriate for this goal.
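The two-stage structure can be summarized as a small wiring sketch in Python. The function names extract_features and cluster_compress below are hypothetical placeholders standing in for the two stages, not the authors' implementation.

# Hypothetical sketch of the two-stage hashing framework (not the authors' code).
# Stage 1 extracts a visually robust feature vector; stage 2 compresses it by
# clustering in the feature vectors' metric space and emits a cluster index.
from typing import Callable, Sequence

def two_stage_hash(image,
                   extract_features: Callable[[object], Sequence[float]],
                   cluster_compress: Callable[[Sequence[float]], int]) -> int:
    feature_vector = extract_features(image)   # stage 1: visually robust features
    return cluster_compress(feature_vector)    # stage 2: map to a cluster index (hash)

# Example wiring with trivial stand-ins:
h = two_stage_hash([[1, 2], [3, 4]],
                   extract_features=lambda img: [sum(row) for row in img],
                   cluster_compress=lambda v: hash(tuple(v)) % 64)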

Cost Function for Feature Vector Compression
- Define joint cost matrices C1 and C2 (each n x n), where n is the total number of vectors to be clustered and C(l_i), C(l_j) denote the clusters these vectors are mapped to
- Exponential cost: ensures a severe penalty when feature vectors that are far apart ("perceptually distinct") are clustered together
- α > 0, Γ > 1 are algorithm parameters

Cost Function for Feature Vector Compression (continued)
- Define S1 as the normalizing sum for the C1 entries (S2 is defined similarly for C2)
- Normalize the cost matrices by S1 and S2, then minimize the "expected" cost, weighting each pair of vectors by its probability masses p(i) = p(l_i), p(j) = p(l_j)
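The slide's exact expressions for the cost entries, S1, S2, and the expected cost did not survive the transcript, so the Python sketch below fills in plausible forms: C1 penalizes perceptually distinct vectors (D > δ) that end up in the same cluster, C2 penalizes perceptually similar vectors (D < ε) that end up in different clusters, both with an exponential penalty, normalization, and probability weighting as described above. The exact exponents are assumptions, not the paper's definitions.

# Hedged sketch of the clustering cost (assumed functional forms, not the
# authors' code). Costs are exponential in the distance, normalized by S1 and
# S2, and averaged with the probability masses p(i), p(j).
def expected_costs(vectors, p, D, cluster_of, eps, delta, alpha=2.0, gamma=2.0):
    """Return the normalized 'expected' costs (for C1 and C2) over all pairs."""
    n = len(vectors)
    C1 = [[0.0] * n for _ in range(n)]
    C2 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = D(vectors[i], vectors[j])
            same = cluster_of(vectors[i]) == cluster_of(vectors[j])
            if d > delta and same:                       # distinct, clustered together
                C1[i][j] = alpha * gamma ** d            # assumed exponential form
            if d < eps and not same:                     # similar, separated
                C2[i][j] = alpha * gamma ** (delta - d)  # assumed exponential form
    S1 = sum(map(sum, C1)) or 1.0                        # normalizers S1, S2
    S2 = sum(map(sum, C2)) or 1.0
    e1 = sum(p[i] * p[j] * C1[i][j] / S1 for i in range(n) for j in range(n))
    e2 = sum(p[i] * p[j] * C2[i][j] / S2 for i in range(n) for j in range(n))
    return e1, e2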

Basic Clustering Algorithm
1. Obtain ε, δ; set k = 1. Select the data point with the highest probability mass and label it l1.
2. Make the first cluster by including all unclustered points lj such that D(l1, lj) < ε/2.
3. k = k + 1. Select the highest-probability data point lk among the unclustered points that is sufficiently far from every cluster S in C, the set of clusters formed up to this step.
4. Form the kth cluster Sk by including all unclustered points lj such that D(lk, lj) < ε/2.
5. Repeat steps 3-4 until no more clusters can be formed.
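A Python rendering of this procedure is sketched below (not the authors' code). The exact far-enough condition in step 3 was lost in the transcript; the sketch uses a stand-off of 2ε from every already clustered point, which is one sufficient choice for keeping clusters at least ε apart, as noted in the observations on the next slide.

# Sketch of the basic clustering algorithm. Points are picked in order of
# decreasing probability mass; each new cluster absorbs all unclustered points
# within eps/2 of its center. The "> 2 * eps" stand-off in step 3 is an assumption.
def basic_clustering(points, p, D, eps):
    """points: feature vectors, p: probability masses, D: metric, eps: similarity radius.
    Returns (clusters, remaining): clusters as index lists, plus still-unclustered indices."""
    order = sorted(range(len(points)), key=lambda i: p[i], reverse=True)
    unclustered = set(order)
    clusters = []
    for k in order:                                   # highest probability first
        if k not in unclustered:
            continue
        # step 3 condition: center must be far enough from every existing cluster
        if clusters and min(D(points[k], points[j])
                            for S in clusters for j in S) <= 2 * eps:
            continue
        new_cluster = [j for j in unclustered if D(points[k], points[j]) < eps / 2]
        clusters.append(new_cluster)
        unclustered -= set(new_cluster)
    return clusters, sorted(unclustered, key=lambda i: p[i], reverse=True)

The remaining unclustered points are handled afterwards by Approach 1 or Approach 2 below.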

Observations
- For any pair (li, lj) within a cluster Sk, D(li, lj) ≤ ε: within each cluster, the maximum distance between any two points is at most ε
- Each cluster is at least ε away from any other cluster
- Hence no errors are made up to this stage of the algorithm
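Both observations can be verified mechanically; assuming the index-list representation from the previous sketch, a small checker might look like:

# Check the two observations: each cluster has diameter at most eps, and any
# two distinct clusters are at least eps apart.
def check_observations(points, clusters, D, eps):
    for S in clusters:
        assert all(D(points[i], points[j]) <= eps for i in S for j in S), \
            "intra-cluster distance exceeds eps"
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            assert all(D(points[i], points[j]) >= eps
                       for i in clusters[a] for j in clusters[b]), \
                "two clusters closer than eps"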

Approach 1
1. Select the data point l* among the unclustered data points that has the highest probability mass.
2. For each existing cluster Si, i = 1, 2, …, k, compute a distance di from l* to Si, and let S(δ) = {Si such that di ≤ δ}.
3. IF S(δ) is empty, THEN k = k + 1 and Sk = {l*} becomes a cluster of its own; ELSE, for each Si in S(δ), define a cost F(Si) with respect to the complement of Si (all clusters in S(δ) except Si), and assign l* to the cluster S* = arg min F(Si).
4. Repeat steps 1 through 3 until all data points are exhausted.
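A hedged Python sketch follows. The slide's expressions for di and F(Si) were lost in the transcript, so the sketch assumes di is the distance from l* to the farthest member of Si (which avoids pairing l* with any member farther than δ) and scores F(Si) by a c2-style penalty against vectors in the other candidate clusters that lie within ε of l*. Both choices are assumptions consistent with the summary slide, not the paper's definitions.

# Sketch of Approach 1 (assumed details as noted above; not the authors' code).
def approach_1(points, p, D, clusters, remaining, eps, delta, gamma=2.0):
    for i in remaining:                       # highest probability mass first
        li = points[i]
        # d_k: assumed to be the farthest member of cluster S_k from l*
        dists = [max(D(li, points[j]) for j in S) for S in clusters]
        candidates = [k for k, d in enumerate(dists) if d <= delta]
        if not candidates:
            clusters.append([i])              # l* becomes a cluster of its own
            continue
        def F(k):                             # assumed c2-style cost vs. other candidate clusters
            others = [j for kk in candidates if kk != k for j in clusters[kk]]
            return sum(p[j] * gamma ** (delta - D(li, points[j]))
                       for j in others if D(li, points[j]) < eps)
        clusters[min(candidates, key=F)].append(i)
    return clusters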

Approach 2
1. Select the data point l* among the unclustered data points that has the highest probability mass.
2. For each existing cluster Si, i = 1, 2, …, k, define a cost F(Si) that combines penalties with respect to Si and its complement (all existing clusters except Si), weighted by a parameter β in [1/2, 1]. Then l* is assigned to the cluster S* = arg min F(Si).
3. Repeat steps 1 and 2 until all data points are exhausted.
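The corresponding sketch for Approach 2 reuses the same assumed penalty forms; only the β-weighted combination and the [1/2, 1] range for β come from the slide.

# Sketch of Approach 2 (assumed penalty forms; not the authors' code).
# In this sketch, beta = 1/2 weights both penalty types equally, while beta = 1
# makes the score depend only on the first term.
def approach_2(points, p, D, clusters, remaining, eps, delta, beta=0.5, gamma=2.0):
    # assumes at least one cluster already exists (e.g., from the basic stage)
    for i in remaining:                        # highest probability mass first
        li = points[i]
        def F(k):
            inside = clusters[k]
            outside = [j for kk in range(len(clusters)) if kk != k for j in clusters[kk]]
            c1 = sum(p[j] * gamma ** D(li, points[j])
                     for j in inside if D(li, points[j]) > delta)   # distinct, same cluster
            c2 = sum(p[j] * gamma ** (delta - D(li, points[j]))
                     for j in outside if D(li, points[j]) < eps)    # similar, separated
            return beta * c1 + (1.0 - beta) * c2
        clusters[min(range(len(clusters)), key=F)].append(i)
    return clusters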

Summary
- Approach 1: tries to minimize the expected cost of separating similar vectors while keeping the expected cost of clustering distinct vectors together at zero
- Approach 2: smoothly trades off the minimization of the two expected costs via the parameter β; β = ½ → joint minimization, β = 1 → exclusive minimization of one of the two costs
- Final hash length determined automatically: given by ⌈log2 k⌉ bits, where k is the number of clusters formed
- Proposed clustering can compress feature vectors in any metric space, e.g. Euclidean, Hamming, and Levenshtein
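Since the final hash simply indexes one of the k clusters, the automatic hash length is ⌈log2 k⌉ bits; a one-line helper makes this concrete (the specific binary encoding of cluster indices is not given on the slides):

import math

def hash_length_bits(num_clusters: int) -> int:
    """Number of bits needed to index one of k clusters: ceil(log2 k)."""
    return max(1, math.ceil(math.log2(num_clusters)))

print(hash_length_bits(7))   # 3 bits suffice for 7 clusters
# The 46-bit hash on the results slide corresponds to a cluster count k
# somewhere between 2**45 and 2**46.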

Clustering Results
- Compress binary feature vectors of L = 240 bits
- Final hash length = 46 bits (with Approach 2, β = 1/2)
- Value of the cost function is orders of magnitude lower for the proposed clustering than for error correction decoding [Mihcak & Venkatesan, 2000]

Clustering Algorithm                                      Value of cost function
Approach 1                                                7.64 x 10^-8
Approach 2, β = 1/2                                       7.43 x 10^-9,  7.464 x 10^-10
Approach 2, β = 1                                         7.17 x 10^-9,  4.87 x 10^-9
Error correction decoding [Mihcak & Venkatesan, 2000]     5.96 x 10^-4,  3.65 x 10^-5

Conclusion & Future Work
- Two-stage framework for image hashing: feature extraction followed by feature vector compression; the second stage is media independent
- Clustering algorithms for compression: novel cost function for hashing applications, applicable to feature vectors in any metric space, trade-offs facilitated between robustness and fragility, final hash length determined automatically
- Future work: randomized clustering for secure hashing; information-theoretically secure hashing