1 Perceptually Based Methods for Robust Image Hashing Vishal Monga Committee Members: Prof. Ross Baldick Prof. Brian L. Evans (Advisor) Prof. Wilson S.

Presentation transcript:

1 Perceptually Based Methods for Robust Image Hashing Vishal Monga Committee Members: Prof. Ross Baldick, Prof. Brian L. Evans (Advisor), Prof. Wilson S. Geisler, Prof. Joydeep Ghosh, Prof. John E. Gilbert, Prof. Sriram Vishwanath Ph.D. Qualifying Exam, Communications, Networks, and Systems Area, Dept. of Electrical and Computer Engineering, The University of Texas at Austin, April 14, 2004

2 Introduction Related work – Digital signature techniques for image authentication – Robust feature extraction from images – Open research issues Expected contributions – Framework for robust image hashing using feature points – Clustering algorithms for feature vector compression – Image authentication under geometric attacks via structure matching Conclusion Outline

3 Hash Example Introduction Hash function: projects a value from a set with a large (possibly infinite) number of members to a set with a fixed, smaller number of members in an irreversible manner – Provides a short, simple representation of a large digital message – Hash scheme: sum of the ASCII codes of the characters in a name, computed modulo N (= 7), a prime number
Database name search example:
Name          Hash Value
Ghosh         1
Monga         2
Baldick       3
Vishwanath    3
Evans         5
Geisler       5
Gilbert       6
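The hash scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming plain ASCII names with the capitalization shown in the table; the function name `ascii_sum_hash` is hypothetical and not part of the presentation.

```python
def ascii_sum_hash(name: str, modulus: int = 7) -> int:
    """Sum the ASCII codes of the characters in `name`, modulo a prime."""
    return sum(ord(ch) for ch in name) % modulus


# Hash a few of the names from the database search example.
for name in ["Ghosh", "Baldick", "Evans", "Gilbert"]:
    print(name, ascii_sum_hash(name))
```

Names that land on the same value (a collision) share a bucket, which is why a database search only needs to compare against the few entries stored under that hash value.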

4 Image Hashing: Motivation Introduction Hash functions –Fixed length binary string extracted from a message –Used in compilers, database searching, cryptography –Cryptographic hash: security applications e.g. message authentication, ensuring data integrity Traditional cryptographic hash –Not suited for multimedia: very sensitive to input, i.e. a change in one input bit changes the output dramatically Need for robust perceptual image hashing –Perceptual: based on human visual system response –Robust: hash values for “perceptually identical” images must be the same (with a high probability)
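The input sensitivity of a cryptographic hash is easy to observe. The sketch below uses SHA-256 from Python's standard `hashlib` as a stand-in (the slides do not name a specific cryptographic hash), flips one input bit, and counts how many digest bits change; the message bytes and the helper `bit_difference` are illustrative assumptions.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


message = bytearray(b"perceptually identical images")
flipped = bytearray(message)
flipped[0] ^= 0x01  # flip a single bit of the input

d1 = hashlib.sha256(bytes(message)).digest()
d2 = hashlib.sha256(bytes(flipped)).digest()
changed_bits = bit_difference(d1, d2)
print(changed_bits, "of", 8 * len(d1), "digest bits changed")
```

A JPEG-compressed copy of an image differs from the original in far more than one bit, so a digest of this kind cannot serve as a perceptual hash.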

5 Image Hashing: Motivation Introduction Applications –Image database search and indexing –Content dependent key generation for watermarking –Robust image authentication: hash must tolerate incidental modifications yet be sensitive to content changes [Figure: the original image and its JPEG compressed version share the same hash value h 1; the tampered image yields a different hash value h 2]

6 Perceptual Hash: Desirable Properties Introduction – Perceptual robustness – Fragility to distinct inputs – Randomization: necessary in security applications to minimize vulnerability against malicious attacks
Symbol    Meaning
H(I)      Hash value extracted from image I
I sim     Image identical in appearance to I
I diff    Image clearly distinct in appearance w.r.t. I
m         Length of hash (in bits)

7 Introduction Related work – Digital signature techniques for image authentication – Robust feature extraction from images – Open research issues Expected contributions – Framework for robust image hashing using feature points – Clustering algorithms for feature vector compression – Image authentication under geometric attacks via structure matching Conclusion Outline

8 Content Based Digital Signatures Related Work Goal –Authenticate image based on extracted signature Image statistics based on –Intensity histograms of image blocks [Schneider et al., 1996] –Mean, variance and kurtosis of intensity values extracted from image blocks, compared to the statistics of a reference image [Kailasanathan et al., 2001] Drawbacks –Easy to modify the image without altering its intensity histogram, so the scheme is less secure –Intensity statistics can be altered easily without significantly changing the image appearance

9 Content Based Digital Signatures… Related Work Feature point based methods –Wavelet based corner detection [Bhattacharjee et al., 1998] –Canny edge detection [Dittmann et al., 1999] –Apply public key encryption on the features to arrive at the digital signature Relation based methods [Lin & Chang 2001] –Invariant relationship between discrete cosine transform (DCT) coefficients of two different blocks Common characteristic of above methods –Work well for some attacks, viz. JPEG compression –Still sensitive to several incidental modifications that do not alter the image appearance

10 Robust Image Hashing: Method # 1 Related Work Image statistics vector from wavelet decomposition of image [Venkatesan et al., 2000] –Averages of wavelet coefficients in coarse sub-bands and variances in other sub-bands [Diagram: wavelet sub-bands (coarse, plus vertical, horizontal and diagonal frequency details) → extract statistics vector and quantize → error correction decoding → hash value]

11 Robust Image Hashing: Method # 2 Related Work Preserve magnitude of low frequency DCT coefficients [Fridrich et al., 2001] –Survives JPEG compression, linear filtering attacks –Very sensitive to geometric distortions (local & global) Randomize using a secret key K –Generate N random smooth patterns P (i), i = 1,…, N –Take vectorized dot product of low frequency DCT coefficients (in block B) with random patterns and use threshold Th to obtain N bits b i

12 Robust Image Hashing: Method # 3 Related Work Invariance of coarse wavelet coefficients [Mihcak et al., 2001] Key observation –Main geometric features of image stay invariant under small perturbations to image Hash algorithm –Threshold wavelet coefficients of DC sub-band (coarse robust features) to obtain a binary matrix –Perform filtering and re-thresholding to iteratively arrive at binary map which is then used as the hash –Iterative procedure is designed so as to preserve significant image geometry [Figure: DC sub-band of a 3-level Haar wavelet decomposition]

13 Robust Digital Signature: Method # 4 Related Work Interscale relationship of wavelet coefficients [Lu & Liao, 2003] –Magnitude difference between a parent node and its four child nodes is difficult to destroy (alter) under content-preserving manipulations –s: wavelet scale, o: orientation, 0 ≤ i, j ≤ 1 [Diagram: 2-D wavelet decomposition tree with parent node w 0,0 (x,y) and child nodes w 1,0 (2x,2y), w 1,1 (2x+1,2y), w 1,2 (2x,2y+1), w 1,3 (2x+1,2y+1)]

14 Open Issues Related Work A robust feature point scheme for hashing – Inherent sensitivity to content-changing manipulations, e.g. could be useful in authentication – Representation of image content robust to both global and local geometric distortions – Preferably use properties of the human visual system Trade-offs in image hashing – Robustness vs. Fragility, Randomness – Question: Minimum length of the final hash value (binary string) needed to meet the above goals? Randomized algorithms for secure image hashing [Slide annotations tie these issues to Contributions 1, 2 and 3]

15 Introduction Related Work – Digital signature techniques for Image Authentication – Robust feature extraction from Images – Open research issues Expected contributions – Framework for robust image hashing using feature points – Clustering algorithms for feature vector compression – Image authentication under geometric attacks via structure matching Conclusion Outline

16 Hashing Framework Expected Contribution #1 Proposed two-stage hash algorithm – Feature vectors extracted from “perceptually identical” images must be close in a distance metric [Diagram: Input Image I → Compression → Final Hash]

17 Hypercomplex or End-stopped cells – Cells in the visual cortex that help in object recognition – Respond strongly to line end-points, corners and points of high curvature [Hubel et al. 1965, Dobbins 1989] – Develop filters/kernels that capture this behavior – To maintain robustness to changes in image resolution, a wavelet based approach is needed [Figure source: “End-stopping and Image Geometry”, Dobbins, 1989]

18 End-Stopped Wavelet Basis Morlet wavelets [Antoine et al., 1996] – To detect linear (or curvilinear) structures having a specific orientation End-stopped wavelet [Vandergheynst et al., 2000] – Apply First Derivative of Gaussian (FDoG) operator to detect end-points of structures identified by Morlet wavelet Notation: x = (x, y), 2-D spatial coordinates; k 0 = (k 0, k 1 ), wave-vector of the mother wavelet; Orientation control –

19 End-Stopped Wavelets…Example Morlet Wavelet along the u-axis – Detects vertically oriented linear structures FDoG operator along frequency axis v – Applied on the Morlet wavelet to detect end-points and corners Synthetic L-shaped imageResponse of Morlet wavelet, orientation = 0 degrees Response of the end-stopped wavelet

20 Computing Wavelet Transform Generalize end-stopped wavelet Employ the wavelet family – Scale parameter = 2, i – scale of the wavelet – Discretize orientation range [0,π ] into M intervals i.e. – θ k = (k π/M ), k = 0, 1, … M - 1 Finally, the wavelet transform is given by Expected Contribution #1

21 Proposed Feature Detection Method [Monga & Evans, 2004] Expected Contribution #1
1. Compute wavelet transform at suitably chosen scale i for several different orientations
2. Significant feature selection: Locations (x,y) in the image that are identified as candidate feature points satisfy
3. Avoid trivial (and fragile) features: Qualify a location as a final feature point if
Randomization: Partition the image into N random regions using a secret key K, extract features from each random region
Probabilistic Quantization: Quantize feature vector based on distribution (histogram) of image feature points to enhance robustness

22 Iterative Feature Extraction Algorithm [Monga & Evans, 2004] Expected Contribution #1
1. Extract feature vector f of length P from image I, quantize f probabilistically to obtain a binary string b f 1 (increase count*)
2. Remove “weak” image geometry: Compute 2-D order statistics (OS) filtering of I to produce I os = OS(I;p,q,r)
3. Preserve “strong” image geometry: Perform low-pass linear shift invariant (LSI) filtering on I os to obtain I lp
4. Repeat step 1 with I lp to obtain b f 2
5. IF (count = MaxIter) go to step 6. ELSE IF D(b f 1, b f 2 ) < ρ go to step 6. ELSE set I = I lp and go to step 1.
6. Set fv(I) = b f 2
MaxIter, ρ and P are algorithm parameters. * count = 0 to begin with. fv(I) denotes the quantized feature vector. D(.,.) is the normalized Hamming distance between its arguments.
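The control flow of the six steps can be sketched as a small Python loop. This is only an illustration under stated assumptions: the feature extractor and both filters are passed in as callables, and the toy versions used here (a threshold on a 1-D signal, a 3-tap median and a 3-tap moving average) are hypothetical stand-ins for the wavelet-based feature detector, the 2-D order statistics filter and the LSI low-pass filter of the slides.

```python
from typing import Callable, List, Sequence


def normalized_hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def iterative_features(image, extract_bits: Callable, os_filter: Callable,
                       lowpass: Callable, max_iter: int = 20,
                       rho: float = 0.001) -> List[int]:
    """Follow steps 1-6: iterate until the bit strings stabilize or MaxIter is hit."""
    count = 0
    while True:
        b1 = extract_bits(image)               # step 1 (count is increased)
        count += 1
        smoothed = lowpass(os_filter(image))   # steps 2 and 3
        b2 = extract_bits(smoothed)            # step 4
        if count == max_iter or normalized_hamming(b1, b2) < rho:  # step 5
            return list(b2)                    # step 6
        image = smoothed


# Toy stand-ins on a 1-D signal (assumptions, not the slide's operators).
def median3(signal):
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [sorted(padded[i:i + 3])[1] for i in range(len(signal))]


def average3(signal):
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [sum(padded[i:i + 3]) / 3 for i in range(len(signal))]


def threshold_bits(signal, level=0.5):
    return [1 if v > level else 0 for v in signal]


toy = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
fv = iterative_features(toy, threshold_bits, median3, average3)
print(fv)
```

The `max_iter` guard mirrors the MaxIter parameter: the loop always terminates even if successive bit strings never come within ρ of each other.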

23 Preliminary Results: Feature Extraction Expected Contribution #1 [Figure: image features at algorithm convergence for the original image, JPEG (QF = 10) and AWGN (σ = 20)]

24 Preliminary Results: Feature Extraction Expected Contribution #1 Quantized Feature Vector Comparison: D(fv(I), fv(I sim )) < 0.2, D(fv(I), fv(I diff )) > 0.3
Table 1. Comparison of quantized feature vectors: normalized Hamming distance between quantized feature vectors of original and attacked images*
Columns: Attack, Lena, Bridge, Peppers
Rows: JPEG, QF = ; AWGN, σ = ; Contrast Enhancement; Gaussian Smoothing; Median Filtering; Scaling by 50%; Rotation by ; Rotation by ; Cropping by 10%; Cropping by 20%
*Attacked images generated by Stirmark benchmark software
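The two bounds quoted on this slide suggest a simple comparison rule for quantized feature vectors. The sketch below treats 0.2 and 0.3 as decision thresholds, which is an assumption made for illustration (the slide reports them as observed ranges); the function names and the "undecided" band are likewise hypothetical.

```python
def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def compare_feature_vectors(fv_ref, fv_test, same_below=0.2, distinct_above=0.3):
    """Classify a pair of quantized feature vectors by their distance."""
    d = normalized_hamming(fv_ref, fv_test)
    if d < same_below:
        return "perceptually identical"
    if d > distinct_above:
        return "distinct"
    return "undecided"


reference = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
attacked = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]    # 1 of 10 bits differs
different = [0, 1, 0, 0, 0, 0, 1, 0, 1, 0]   # 4 of 10 bits differ
print(compare_feature_vectors(reference, attacked))
print(compare_feature_vectors(reference, different))
```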

25 Preliminary Results: Feature Extraction Expected Contribution #1
Columns, in order: (1) thresholding of coarse wavelet coefficients (Mihcak et al.); (2) preserve low freq. DCT coefficients (Fridrich et al.); (3) proposed feature point detector. Rows with fewer than three entries are reproduced as extracted.
JPEG, QF = 10: YES
AWGN, σ = 20: YES / NO / YES
Gaussian Smoothing: YES
Median Filtering: YES / NO / YES
Scaling 50%: YES
Rotation 2 degrees: YES / NO / YES
Cropping 10%: YES / NO / YES
Cropping 20%: YES / NO
* Small object addition: NO / YES / NO
* Tamper with facial features: YES / NO
YES: survives attack, i.e. hash was invariant. * Content changing manipulations, should be detected.

26 Highlights Expected Contribution # 1 Framework for image hashing using feature points – Two stage hash algorithm – Any visually robust feature point detector is a good candidate to be used with the iterative algorithm Trade-offs facilitated – Robustness vs. Fragility: selecting feature points with T 1, T 2 large enough ensures that features are retained in several attacked versions of the image; otherwise they are removed easily – Robustness vs. Randomization (number of random regions): as long as N < N max, robustness is largely preserved; beyond that, random regions shrink to the extent that they do not contain significant chunks of image geometry

27 Feature Vector Compression Expected Contribution # 2 Goals in compressing to a final hash value – Cancel small perturbations between feature vectors of “perceptually identical” images – Maintain fragility to distinct inputs – Retain and/or enhance randomness properties for secure hashing Problem statement: Retain perceptual significance – Let (l i, l j ) denote vectors in the metric space of feature vectors V and 0 < ε < δ, then it is desired

28 Possible Solutions Error correction decoding [Venkatesan et al., 2000] – Applicable to binary feature vectors – Break the vector down to segments close to the length of codewords in a suitably chosen error-correcting code More generally vector quantization/clustering – Minimize an “average distance” to achieve compression close to the rate distortion limit – P(l) – probability of occurrence of vector l, D(.,.) distance metric defined on the feature vectors – c k – codewords/cluster centers, S k – k th cluster Expected Contribution # 2

29 Is Average Distance the Appropriate Cost for the Hashing Application? Problems with average distance VQ – No guarantee that “perceptually distinct” feature vectors indeed map to different clusters – no straightforward way to trade off between the two goals – Must decide number of codebook vectors in advance – Must penalize some errors harshly, e.g. if vectors that are really close are not clustered together, or vectors very far apart are compressed to the same final hash value Define alternate cost function for hashing – Develop clustering algorithm that tries to minimize that cost Expected Contribution # 2

30 Cost Function for Feature Vector Compression Define joint cost matrices C 1 and C 2 (n x n) – n: total number of vectors to be clustered; C(l i ), C(l j ) denote the clusters that these vectors are mapped to Exponential cost – Ensures that a severe penalty is incurred if feature vectors that are far apart, and hence “perceptually distinct”, are clustered together Expected Contribution # 2 α > 0, Г > 1 are algorithm parameters

31 Cost Function for Feature Vector Compression Further define S 1 as *S 2 is defined similarly Normalize to get, Then, minimize the “expected” cost – p(i) = p(l i ), p(j) = p(l j ) Expected Contribution # 2

32 Image Authentication Under Geometric Attacks Expected Contribution # 3 Basic premise – Feature points of a reference image and a geometrically attacked image are related by a suitable transformation – Affine transformation models the geometric distortion: y = Rx + t, where x = (x 1, x 2 ), y = (y 1, y 2 ), R is a 2 x 2 matrix and t is a 2 x 1 vector Hausdorff distance to compare feature points from two images [Atallah, 1983; Rote 1991] – Used in computer vision for locating objects in an image – Relatively insensitive to perturbations in feature points, can tolerate errors due to occlusion or feature detector failure

33 Image Authentication Under Geometric Attacks Hausdorff distance between point sets A and B – A = {a 1,…, a p } and B = {b 1,…, b q } where – Measures degree of mismatch between two sets Employ structure matching algorithms [Huttenlocher et al. 1993, Rucklidge 1995] – To determine G such that – Here, f r and f c denote feature point sets from reference and candidate image to be authenticated Expected Contribution # 3
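A small Python sketch can make the two ingredients concrete. The slide's own formula did not survive in this transcript, so the code assumes the standard definition of the Hausdorff distance, H(A, B) = max(h(A, B), h(B, A)) with h(A, B) = max over a in A of min over b in B of ||a − b||, together with an affine map y = Rx + t. The point sets and function names are illustrative, and no structure matching search is performed: the transformation is simply assumed known.

```python
import math


def directed_hausdorff(A, B):
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


def affine(points, R, t):
    """Apply y = R x + t to every 2-D point (R is 2 x 2, t has 2 entries)."""
    return [(R[0][0] * x + R[0][1] * y + t[0],
             R[1][0] * x + R[1][1] * y + t[1]) for x, y in points]


identity = [[1.0, 0.0], [0.0, 1.0]]
reference = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]     # hypothetical feature points
candidate = affine(reference, identity, (3.0, 4.0))     # geometric "attack": a shift

before = hausdorff(reference, candidate)
after = hausdorff(reference, affine(candidate, identity, (-3.0, -4.0)))
print(before, after)
```

Once the transformation that aligns the candidate with the reference is found, the distance collapses; a tampered image would leave a residual mismatch even after the best alignment.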

34 Conclusion & Future Work Conclusion Feature point based hashing framework Iterative feature detector that preserves significant image geometry, features invariant under several attacks Trade-offs facilitated between hash algorithm goals Algorithms for feature vector compression Novel cost function for the hashing application – Heuristic clustering algorithm(s) to minimize this cost – Randomized clustering for secure hashing Image authentication under geometric attacks Affine transformation to model geometric distortions – Hausdorff distance and structure matching algorithms to determine affine transformation and authenticate

35 Proposed Schedule Conclusion
Semester       Work Plan
Summer 2004    Perform extensive tests on the feature extraction algorithm, implement the solution to stage 1
Fall 2004      Develop and finalize the clustering algorithm for feature vector compression. Compare with other approaches, viz. error correction decoding
Spring 2005    Finalize the design and implement the scheme for image authentication under geometric attacks
Summer 2005    Implement the two-step hash algorithm
Fall 2005      Write and defend dissertation

36 Backup Slides

37 Hash: Illustrative Example Introduction Parsing in compiling a program Variable names kept in a data structure – Array of pointers, each pointer points to a linked list – Index into the array is a hash value Example: variable name “university” – Hashing scheme: sum of the ASCII codes of the characters in a variable name, computed modulo N, a prime number – Check the linked list at the array index; add the string to the linked list if it has not been previously parsed
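The symbol-table lookup described on this slide can be sketched as a hash table with chaining. This is a minimal illustration: Python lists stand in for the linked lists, the class name `SymbolTable` is hypothetical, and N = 7 is borrowed from the earlier name-hashing example because this slide only says that N is prime.

```python
class SymbolTable:
    """Array of buckets indexed by a hash value; each bucket chains names."""

    def __init__(self, size: int = 7):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # lists stand in for linked lists

    def index(self, name: str) -> int:
        """Sum of the ASCII codes of the characters, modulo the table size."""
        return sum(ord(ch) for ch in name) % self.size

    def add(self, name: str) -> bool:
        """Insert `name` unless it was already parsed; report whether it was new."""
        chain = self.buckets[self.index(name)]
        if name in chain:
            return False
        chain.append(name)
        return True


table = SymbolTable()
first_time = table.add("university")    # new variable name
second_time = table.add("university")   # already parsed
print(table.index("university"), first_time, second_time)
```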

38 End-Stopped Wavelets…Example Morlet Wavelet along the u-axis FDoG operator along frequency axis v Expected Contribution #1 Synthetic L-shaped imageResponse of Morlet wavelet, orientation = 0 degrees Response of the end-stopped wavelet spatial domain frequency domain

39 Content Changing Manipulations Feature Detection Original image Maliciously manipulated image Back

40 Algorithm Parameters Results Image Conditioning – All images resized to 512 x 512 via triangular interpolation prior to feature extraction – Intensity planes of color images were used Pixel neighborhood – Circular to detect isotropic features – Radius of 5 pixels Iterative Feature Extraction – wavelet scale, i = 3 – MaxIter = 20, ρ = 0.001, P = 128 – LSI filter: zero-phase low pass filter (11 x 11) designed using McClellan transformations – Order statistics filtering: median with 5 x 5 window

41 Experimental Results Feature Detection AWGN σ = degree rotation

42 Trade-offs Expected Contribution # 1 Perceptual robustness vs. fragility – Size of the search neighborhood: a larger neighborhood makes feature points more robust – Select feature points such that – T 1, T 2 large enough implies features are retained in several attacked versions of the image; otherwise they are removed easily Robustness vs. Randomization – As long as N < N max, robustness is largely retained; beyond that, random regions shrink to the extent that they do not contain significant chunks of image geometry

43 Relation Based Scheme: DCT coefficients Digital Signature Techniques Discrete Cosine Transform (DCT) – Typically employed on 8 x 8 blocks Digital Signature by Lin – F p, F q: DCT coefficients at the same positions in two different 8 x 8 blocks – The corresponding DCT coefficients in the compressed image [Figure: two 8 x 8 blocks p and q within an N x N image]
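The reason a relation between two coefficients can survive compression is that both are quantized with the same step, and rounding to a common grid never reverses an ordering. The sketch below checks this numerically. It is an illustration under assumptions: quantization is modeled as uniform rounding with a single step size, and the function names and test values are hypothetical rather than taken from the slides.

```python
def quantize(coefficient: float, step: float) -> float:
    """Uniform quantization: round to the nearest multiple of `step`."""
    return round(coefficient / step) * step


def relation_preserved(fp: float, fq: float, step: float) -> bool:
    """Check that quantizing both coefficients does not flip their ordering."""
    qp, qq = quantize(fp, step), quantize(fq, step)
    if fp > fq:
        return qp >= qq
    if fp < fq:
        return qp <= qq
    return qp == qq


# Sweep a grid of coefficient pairs and quantization steps.
values = [v / 4.0 for v in range(-80, 81)]
steps = [1.0, 3.0, 8.0, 16.0]
all_preserved = all(relation_preserved(a, b, s)
                    for a in values for b in values for s in steps)
print(all_preserved)
```

Note that a strict inequality can collapse into equality after quantization, so the preserved relation is "greater than or equal", not "strictly greater than".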

44 Multi-Resolution Approximations Wavelet Decomposition

45 Back

46 Examples of Perceptually Identical Images Wavelet Decomposition [Figure panels: Original Image; Contrast Enhanced; JPEG, QF = 10; 10% cropping; 3 degree rotation; 2 degree rotation]

47 Iterative Hash Algorithm Expected Contribution # 1 Extract Feature Vector Probabilistic Quantization Order Statistics Filtering Linear Shift Invariant Low pass filtering Probabilistic Quantization Extract Feature Vector Input Image D(b1, b2) < ρ

48 Probabilistic Quantization Quantization Feature Vector – f mn = m + H*n Quantization Scheme – L quantization levels – Design quantization bins [l i,l i-1 ) such that – Quantization Rule Back

49 Feature Vector Extraction Feature Detection Randomization – Partition the image into N regions using k-means segmentation – extract feature points from each region – Secret key K is used to generate initial guesses for the clusters (centroids of random regions) – Avoid very small regions since they would not yield robust image features Back

50 Preliminary Results Expected Contribution #1
Table 1. Comparison of quantized feature vectors: normalized Hamming distance between quantized feature vectors of original and attacked images
Columns: Attack; Thresholding of coarse wavelet coefficients (Mihcak et al.); Proposed feature point detector
Rows: JPEG, QF = ; AWGN, σ = ; Gaussian Smoothing; Median Filtering; Scaling 50%; Rotation 2 degrees; Cropping 10%; Cropping 20%; * Small object addition; * Tamper with facial features

51 Minimizing the Cost Clustering Algorithms Decision Version of the Clustering Problem – For a fixed number of clusters k, is there a clustering with cost less than a constant? – Shown to be NP-complete via a reduction from the k-way graph cut problem [Monga et al., 2004] Polynomial time greedy heuristic to solve the problem – Select cluster centers based on probability mass of vectors in V – minimize error probabilities in a rigorous sense – Trade-offs: Exclusive minimization of would compromise and vice-versa – Basic algorithm with variations to facilitate trade-offs

52 Basic Clustering Algorithm Clustering Algorithms
1. Obtain ε, δ, set k = 1. Select the data point associated with the highest probability mass, label it l 1
2. Make the first cluster by including all unclustered points l j such that D(l 1, l j ) < ε/2
3. k = k + 1. Select the highest probability data point l k among the unclustered points such that where S is any cluster, C – set of clusters formed till this step and
4. Form the k th cluster S k by including all unclustered points l j such that D(l k, l j ) < ε/2
5. Repeat steps 3-4 until no more clusters can be formed
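The greedy procedure can be sketched in Python as follows. One caveat matters: the selection condition in step 3 did not survive in this transcript, so the code assumes that a new center must lie at least 3ε/2 away from every point already placed in a cluster. That assumption, the function name `basic_clustering`, and the 1-D example data are all illustrative and should not be read as the authors' exact rule.

```python
def basic_clustering(points, probs, dist, eps):
    """Greedy clustering: repeatedly pick the most probable admissible center.

    Returns (clusters, leftover), where each cluster is a list of point
    indices whose first entry is the cluster center.
    """
    unclustered = set(range(len(points)))
    clusters = []
    while True:
        clustered = [i for cluster in clusters for i in cluster]
        # ASSUMED step-3 condition: at least 3*eps/2 from all clustered points.
        candidates = [i for i in sorted(unclustered)
                      if all(dist(points[i], points[j]) >= 1.5 * eps
                             for j in clustered)]
        if not candidates:
            break  # step 5: no more clusters can be formed
        center = max(candidates, key=lambda i: probs[i])
        members = [center] + [j for j in sorted(unclustered)
                              if j != center
                              and dist(points[center], points[j]) < eps / 2]
        clusters.append(members)
        unclustered -= set(members)
    return clusters, sorted(unclustered)


# Hypothetical 1-D feature values with their probability masses.
points = [0.0, 0.1, 0.4, 1.2, 2.0, 2.2, 5.0]
probs = [0.3, 0.1, 0.1, 0.1, 0.2, 0.05, 0.15]
clusters, leftover = basic_clustering(points, probs, lambda a, b: abs(a - b), eps=1.0)
print(clusters, leftover)
```

In this example the point at 1.2 ends up unclustered: it is too far from any center to join a cluster and too close to an existing cluster to start its own, which is the situation the two follow-up approaches address.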

53 Visualization of the Clustering Algorithm Clustering Algorithms

54 Observations Clustering Algorithms For any (l i, l j ) in cluster S k No errors till this stage of the algorithm – Each cluster is at least ε away from any other cluster and hence there are no errors by violating (1) – Within each cluster the maximum distance between any two points is at most ε, and because 0 < ε < δ there are no errors by violation of (2) – The data points that are left unclustered are at least 3 ε /2 away from each of the existing clusters Next – Two different approaches to handle the unclustered points

55 Input Image I Final Hash Value Hashing Framework Expected Contribution #1 Compress Features Two-stage Hash algorithm Feature Vectors extracted from “perceptually identical” images must be close in a distance metric Extract visually robust feature vector

56 Approach 1 Clustering Algorithms
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster S i, i = 1,2,…, k compute Let S(δ) = {S i such that d i ≤ δ}
3. IF S(δ) = {Φ} THEN k = k + 1 and S k = l* is a cluster of its own; ELSE for each S i in S(δ) define where denotes the complement of S i, i.e. all clusters in S(δ) except S i. Then, l* is assigned to the cluster S* = arg min F(S i )
4. Repeat steps 1 through 3 until all data points are exhausted

57 Approach 2 Clustering Algorithms
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster S i, i = 1,2,…, k define and β lies in [1/2, 1] where denotes the complement of S i, i.e. all existing clusters except S i. Then, l* is assigned to the cluster S* = arg min F(S i )
3. Repeat steps 1 and 2 until all data points are exhausted

58 Summary Clustering Algorithms Approach 1 – Tries to minimize conditioned on = 0 Approach 2 – Smoothly trades off the minimization of vs. via the parameter β – β = ½: joint minimization – β = 1: exclusive minimization of Final hash length determined automatically! – Given by bits, where k is the total number of clusters formed – Proposed clustering can be used to compress feature vectors in any metric space, e.g. Euclidean, Hamming

59 Randomized Clustering for Secure Hashing Clustering Algorithms Heuristic for the deterministic map – Select the highest probability data point amongst the unclustered data points Randomization Scheme – Normalize the probabilities of the existing unclustered data points to define a new probability mass such that where i runs over unclustered points, – Employ a uniformly distributed random variable in [0,1] (generated via a secret key) to select the data point i as a cluster center with probability

60 Randomized Clustering: Illustration Clustering Algorithms Example: s = 1 – 4 data points with probabilities 0.5, 0.25, 0.125, Key Observations – s = 0: the distribution is uniform, i.e. any point is selected as the cluster center with the same probability – s = ∞: deterministic clustering [Figure: uniform number generation to select data point]
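The randomized choice of a cluster center can be sketched as an inverse-CDF lookup driven by a uniform random number. The normalization formula is missing from this transcript, so the code assumes the selection probability of point i is p_i^s divided by the sum of p_j^s over the unclustered points, which reproduces the two limiting cases on the slide (uniform at s = 0, near-deterministic for large s). The fourth probability value, the use of a seeded `random.Random` as a stand-in for the secret key, and the function names are all assumptions for illustration.

```python
import random


def selection_probabilities(probs, s):
    """ASSUMED normalization: q_i = p_i**s / sum_j p_j**s."""
    weights = [p ** s for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def pick_center(probs, s, u):
    """Map a uniform draw u in [0, 1) to a point index by inverse-CDF lookup."""
    q = selection_probabilities(probs, s)
    cumulative = 0.0
    for i, qi in enumerate(q):
        cumulative += qi
        if u < cumulative:
            return i
    return len(q) - 1  # guard against rounding when u is very close to 1


probs = [0.5, 0.25, 0.125, 0.125]   # last value is a hypothetical fill-in
keyed_rng = random.Random(2004)      # the seed plays the role of a secret key
u = keyed_rng.random()
print(pick_center(probs, s=1, u=u))
```

With s = 1 the points are chosen in proportion to their probability mass; raising s concentrates the choice on the most probable point, trading randomness for the behavior of the deterministic heuristic.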

61 Clustering: Results Clustering Algorithms Compress binary feature vector of L = 240 bits – Final hash length = 46 bits, with Approach 2, β = 1/2 – Value of cost function is orders of magnitude lower for the proposed clustering [Table: cost function values for each clustering algorithm (Approach 1; Approach 2, β = ½; Approach 2, β = ; average distance VQ*); surviving entries: 7.43 *, 5.96 *, 10 -5] *Average distortion VQ at the same rate

62 Conclusion & Future Work Clustering Algorithms Perceptual Image Hashing via Feature Points – Extract feature points that preserve significant image geometry – Based on properties of the Human Visual System (HVS) – Robust to local and global geometric distortions Clustering Algorithms for compression – Randomized to minimize vulnerability against malicious attacks generated by an adversary – Trade-offs facilitated between robustness and randomness, fragility Future Work – Authentication under geometric attacks – Information theoretically secure hashing

63 Perceptual Image Hashing Via Feature Points Feature points are required to be invariant across “perceptually identical” images – Primary geometric features of the image are largely preserved under small perturbations [Mihcak et al., 2001] – i.e. extract feature points that preserve significant image geometry – Identify what the human eye perceives as “robust” or “invariant” geometric features Edge based detection is not suited – Has problems with high compression ratios, quantization and scaling [Zheng and Chellappa, 1993] – Human recognition performance is not impaired even when much edge information is lost [Biederman, 1987]

64 ES2 Wavelet End-stopping and image features Example Wavelets – SDoG operator on the Morlet wavelet Wavelet behavior – Produces a strong response at the center of any oriented linear stimulus of a particular length determined by σ

65 Clustering: Dependence on source distribution Clustering Algorithms Source distributions may be very “skewed” – Trivial clusters may be formed, i.e. with very low probability points included – For efficient compression, the number of clusters formed should accurately represent the statistics of the source Solution – Consider the algorithm when m clusters are formed (m < k) and i < n points are already clustered – Assign the remaining points, i.e. {i + 1, …, n}, to the existing clusters in a fashion similar to the basic algorithm – Compare the expected cost of this clustering with that of the k clusters formed by the algorithm described before; if the increase is not significant, terminate with the current number of clusters