The Capacity of Color Histogram Indexing Dong-Woei Lin NTUT CSIE
Outline Preliminary; Histogram and spatial information; Effectiveness of histogram; Histogram capacity. References: M. Stricker, The capacity of color histogram indexing, ICCVPR, 1994; R. Brunelli, Histograms analysis for image retrieval, Pattern Recognition, 2001
Preliminary 1/4 Color histogram. Incorporating spatial information: color coherence vector; correlogram (autocorrelogram). Proposed method: scalar-weighted (average distance of pixel pairs); vector-weighted (taking the angle into account)
Preliminary 2/4 Performance evaluation (for CBIR), with the relevant set determined by human subjects. Precision = |A(q) ∩ R(q)| / |A(q)|; Recall = |A(q) ∩ R(q)| / |R(q)|, where A(q) and R(q) stand for the answer set and the relevant set for query image q, respectively
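The two measures can be sketched in a few lines. This is a minimal illustration, assuming the answer set and the relevant set are given as collections of image identifiers; the function name and the sample identifiers are hypothetical.

```python
def precision_recall(answer_set, relevant_set):
    """Return (precision, recall) for a single query image q."""
    answer, relevant = set(answer_set), set(relevant_set)
    hits = len(answer & relevant)  # |A(q) ∩ R(q)|
    precision = hits / len(answer) if answer else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical image identifiers, for illustration only.
p, r = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
```

Here two of the four returned images are relevant, and two of the three relevant images are returned.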
Preliminary 3/4 Improving factor φ (for histogram-based methods). Histogram distance and similarity (based on a vector norm or on a PDF)
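As an illustration of the two families of measures, the sketch below computes norm-based distances (L1, L2) and one PDF-style distance between normalized histograms. The chi-square form used here is one common variant and is an assumption, as are all function names and the sample histograms.

```python
def normalize(hist):
    """Scale bin counts so they sum to 1 (treat the histogram as a PDF)."""
    total = sum(hist)
    return [h / total for h in hist]


def minkowski(p, q, order=1):
    """Vector-norm distance: order=1 gives L1, order=2 gives L2."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


def chi_square(p, q):
    """One common symmetric chi-square form (assumed, not taken from the slides)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Hypothetical 4-bin histograms.
h1 = normalize([6, 2, 0, 0])
h2 = normalize([2, 2, 4, 0])
d_l1 = minkowski(h1, h2, order=1)
d_l2 = minkowski(h1, h2, order=2)
d_chi = chi_square(h1, h2)
```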
Preliminary 4/4
Max.    Min.    Mean    Mean of top 10%
31.8%   13.0%   21.3%   14.5%
45.7%   15.2%   26.0%   17.0%
35.7%   12.1%   19.9%   13.1%
40.6%   14.7%   24.6%   15.9%
Histogram Space 1/2 For an image with N pixels, the histogram space ℌ is the subset of an n-dimensional vector space: ℌ = {(h_1, …, h_n) : h_i ≥ 0, h_1 + … + h_n = N}. For a given distance t, two histograms are either t-similar or t-different; identical histograms have zero distance
Histogram Space 2/2 Observations: The interval of reasonable values for t coincides with the first interval on which the distance distribution increases very rapidly. Indexing by color histograms works only if the histograms are sparse, i.e., most of the images contain only a fraction of the colors of the color space
The Capacity of Histogram Space 1/5 Definition of histogram capacity: C(ℌ, d, t) is the maximal number of pairwise t-different histograms, for an n-dimensional histogram space ℌ, a metric d, and a distance threshold t. Assumption: uniform distribution across the color space
The Capacity of Histogram Space 2/5 Theorem: C(ℌ, d, t) ≥ max_{w,l} A(n, 2l, w), with α = wt/2N ≤ l ≤ w ≤ n and l ≤ n/2. A(n, 2l, w): the maximal number of codewords in any binary code of length n with constant weight w and Hamming distance 2l
The Capacity of Histogram Space 3/5 A binary word such as (1, 1, …, 0, 0, …, 1) denotes a histogram: the word has length n (the number of bins) with exactly w 1's (the non-zero bins), and each 1 represents N/w pixels (w ≤ n). 2l: the number of bins in which two such histograms differ (l ≤ w). Example: n = 64, w = 62, so each non-zero bin holds N/62 pixels
The Capacity of Histogram Space 4/5 L1 distance between two such histograms H_1 and H_2 that differ in 2l bins: d_L1(H_1, H_2) = 2lN/w. For d_L1 ≥ t, solve for l: l ≥ wt/2N = α. For any admissible w and l, the maximum of A(.) still does not exceed C
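A small worked example may help connect the binary-word picture to the L1 distance. The sketch below assumes each non-zero bin holds exactly N/w pixels, so two words at Hamming distance 2l give histograms at L1 distance 2lN/w. The function names and the toy values (n = 6, w = 3, N = 120) are hypothetical, and rounding l up to an integer is an assumption.

```python
def word_to_histogram(word, n_pixels):
    """Spread n_pixels evenly over the bins marked with 1."""
    w = sum(word)
    return [bit * n_pixels / w for bit in word]


def l1(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def min_l(w, t, n_pixels):
    """Smallest integer l with l >= w*t / (2N), for integer inputs."""
    return -(-(w * t) // (2 * n_pixels))


N = 120
a = (1, 1, 1, 0, 0, 0)
b = (1, 0, 1, 0, 1, 0)
two_l = hamming(a, b)  # number of bins in which the words differ
dist = l1(word_to_histogram(a, N), word_to_histogram(b, N))
```

With w = 3 and N = 120 each non-zero bin holds 40 pixels, so the two words above, which differ in two bins, are 80 apart in L1.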
The Capacity of Histogram Space 5/5 Corollary, a computable lower bound on C(ℌ, d, t): for L1, l(w) = wt/2N; q: the smallest prime power such that q ≥ n
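The bound itself is not legible on this slide. The reference to a smallest prime power q suggests the Graham-Sloane lower bound for constant-weight codes, A(n, 2l, w) ≥ C(n, w) / q^(l-1); treating the corollary as that bound is an assumption, not something the slide confirms. Under that assumption, a rough sketch of the computation could look as follows. All names are hypothetical, l(w) is rounded up to an integer, and integer division makes the result slightly conservative.

```python
from math import comb


def is_prime_power(q):
    """True if q = p**k for a prime p and k >= 1."""
    if q < 2:
        return False
    for p in range(2, q + 1):
        if q % p == 0:  # p is the smallest prime factor of q
            while q % p == 0:
                q //= p
            return q == 1
    return False


def smallest_prime_power_at_least(n):
    q = max(n, 2)
    while not is_prime_power(q):
        q += 1
    return q


def capacity_lower_bound(n, n_pixels, t):
    """Assumed bound: max over w of C(n, w) / q**(l(w) - 1)."""
    q = smallest_prime_power_at_least(n)
    best = 0
    for w in range(1, n + 1):
        l = max(1, -(-(w * t) // (2 * n_pixels)))  # ceil(w*t / 2N)
        if l > w or 2 * l > n:  # keep only admissible (w, l)
            continue
        best = max(best, comb(n, w) // q ** (l - 1))
    return best
```

For n = 64 bins, q is 64 itself, since 64 = 2^6 is a prime power.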
Histogram analysis for IR Revised notation of histogram capacity: the capacity curve C is defined as the density distribution of the dissimilarity, under measure d, between the two elements of every possible histogram couple within an n-dimensional histogram space ℌ. Capacity ℒ (t) =
Histogram analysis for IR Two major differences from Stricker (1994): no distance function is defined; the difficult task of finding the "maximal number" is turned into an empirical estimate by considering all image couples within the database. The shape of C(t): an indicator of the distribution of histograms; induced by the selected dissimilarity measure. The average value of the dissimilarity represents the sparseness of histogram space ℌ
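The empirical estimate can be sketched as follows: compute the dissimilarity for every couple of histograms in the database, then bin the values into a relative-frequency curve. This is only an illustration of the idea. The binning, the choice of L1, the function names, and the toy database are assumptions, and the sketch does not reproduce the capacity formula ℒ(t). A database of m images yields m(m-1)/2 couples, e.g. 1081 for m = 47.

```python
from itertools import combinations


def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def pairwise_dissimilarities(histograms, dissimilarity):
    """Dissimilarity of every unordered couple of histograms."""
    return [dissimilarity(a, b) for a, b in combinations(histograms, 2)]


def empirical_density(values, n_bins, upper):
    """Relative frequency of the values over n_bins equal bins on [0, upper]."""
    counts = [0] * n_bins
    for v in values:
        index = min(int(v / upper * n_bins), n_bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


# Hypothetical database of 47 normalized two-bin histograms.
database = [[i / 46, 1 - i / 46] for i in range(47)]
distances = pairwise_dissimilarities(database, l1)
curve = empirical_density(distances, n_bins=10, upper=2.0)
mean_dissimilarity = sum(distances) / len(distances)
```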
Histogram analysis for IR Indexing effectiveness ℰ = Can be used to assess several descriptor-dissimilarity combinations: norm, distribution distance; chi-square, Kolmogorov-Smirnov, Kuiper; hue, luminance, edgeness…
Histogram analysis for IR Test set: 3500 images. All 64 bins; RGB space: 4x4x4. Effectiveness: Hue = 70, RGB = 64
Experiments Setup: RGB color space with 4x4x4 quantization. Targets: original image (uncompressed); DC image; scalar-weighted DC image; autocorrelogram of DC image. Test sets: x240 JPEG images; x128 JPEG images from Berkeley collections
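A minimal sketch of the two building blocks named above: a 64-bin histogram from 4x4x4 RGB quantization, and a DC-style image obtained by averaging 8x8 blocks. Averaging RGB values per block only approximates the DC coefficients of a JPEG file (which are stored per channel in YCbCr), so that step is an assumption, as are the bin ordering and all function names.

```python
def rgb_bin(r, g, b):
    """Map an 8-bit RGB triple to one of 4*4*4 = 64 bins."""
    return (r // 64) * 16 + (g // 64) * 4 + (b // 64)


def color_histogram(pixels):
    hist = [0] * 64
    for r, g, b in pixels:
        hist[rgb_bin(r, g, b)] += 1
    return hist


def dc_image(image, block=8):
    """Replace each full block x block tile by its mean color.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(0, rows - block + 1, block):
        out_row = []
        for x in range(0, cols - block + 1, block):
            cells = [image[y + dy][x + dx]
                     for dy in range(block) for dx in range(block)]
            out_row.append(tuple(sum(c[k] for c in cells) // len(cells)
                                 for k in range(3)))
        out.append(out_row)
    return out


# Hypothetical 16x16 image of a single color.
image = [[(200, 10, 70)] * 16 for _ in range(16)]
small = dc_image(image)
hist = color_histogram([px for row in small for px in row])
```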
Incorporating Spatial Info. Weight: the mean distance of all same-color pixel pairs. DC image: the mean value of each DCT block. Similarity measure for color j between images I_1 and I_2: intersection; Bhattacharyya. * For compatibility, the similarity is transformed into a dissimilarity. * Intersection is adopted only for comparison
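The pieces named on this slide can be sketched separately. The code below computes the mean distance over all same-color pixel pairs and the two similarity measures in their standard forms on normalized histograms. How the weight enters the similarity is not legible on the slide, so the sketch does not combine them. Euclidean pixel distance, the conversion 1 - s, and all names are assumptions.

```python
from itertools import combinations
from math import hypot, sqrt


def mean_pair_distance(positions):
    """Mean Euclidean distance over all pairs of pixels of one color."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        return 0.0
    total = sum(hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in pairs)
    return total / len(pairs)


def intersection(p, q):
    """Histogram intersection of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms."""
    return sum(sqrt(a * b) for a, b in zip(p, q))


def to_dissimilarity(similarity):
    """Assumed conversion for similarities in [0, 1]."""
    return 1.0 - similarity


# Hypothetical pixel positions of one color and two toy histograms.
weight = mean_pair_distance([(0, 0), (3, 4), (0, 8)])
s_int = intersection([0.5, 0.5], [0.25, 0.75])
s_bha = bhattacharyya([0.5, 0.5], [0.5, 0.5])
```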
Incorporating Spatial Info. Autocorrelogram of DC image: a table with columns Color, Dist., and Pair number; for each color i = 0, 1, … the distance j runs over 0, 1, …, d_max. p_{i,j}: the number of pixel pairs of color i at distance j
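The table p_{i,j} can be built by brute force on a small quantized image. This sketch assumes the chessboard (L∞) distance between pixels and counts each pixel paired with itself at distance 0; both choices, and the function name, are assumptions rather than details taken from the slide.

```python
from itertools import combinations_with_replacement


def autocorrelogram(labels, n_colors, d_max):
    """Return table[i][j] = number of pixel pairs of color i at distance j.

    labels is a 2-D list of color indices in range(n_colors).
    """
    positions = [[] for _ in range(n_colors)]
    for y, row in enumerate(labels):
        for x, color in enumerate(row):
            positions[color].append((x, y))

    table = [[0] * (d_max + 1) for _ in range(n_colors)]
    for color, points in enumerate(positions):
        for (x1, y1), (x2, y2) in combinations_with_replacement(points, 2):
            dist = max(abs(x1 - x2), abs(y1 - y2))  # chessboard distance (assumed)
            if dist <= d_max:
                table[color][dist] += 1
    return table


# Hypothetical 2x2 image with two colors.
table = autocorrelogram([[0, 0], [1, 0]], n_colors=2, d_max=1)
```

The pairwise loop is quadratic in the number of pixels per color, which is acceptable for small DC images but not for full-resolution ones.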
Simulation Results I Test set: nor.dat, x240 images, 1081 histogram pairs. Table columns: Type, Cap.; rows: Original, DC image, DC w. SW, Auto
Simulation Results II Test set: ber150.dat, x128 images, histogram pairs. Table columns: Type, Cap.; rows: Original, DC image, DC w. SW, Auto
Semi-conclusion For histogram capacity: Autocorrelogram > scalar-weighted > DC image > original image The shape of autocorrelogram About the representation of curve
Spatial Histogram Capacity Spatial histogram (e.g. edgeness). Assessed features: E[dist] vs. color; number of pairs vs. pair distance
Simulation Result III Table columns: Testset, Capacity; rows: nor, ber, nor47, ber150
Consistency of the Last Experiment Considering the number of samples. Table columns: Ber150 (#), Capacity
Future Work Types and properties of spatial histograms; study of spatial descriptors; correlation of spatial and color features; sufficiency of the definition of effectiveness