An Image Database Retrieval Scheme Based Upon Multivariate Analysis and Data Mining Presented by C.C. Chang Dept. of Computer Science and Information Engineering, National Chung Cheng University
Outline Introduction Image Retrieval The Proposed Scheme Based Upon PCA and Data Mining Image Feature Extraction Data Mining for Image Features Illustration Future Works Conclusions
Introduction Image database Query image Developing an efficient and effective image retrieval system that can locate desired images deep within a database has become an increasingly interesting and challenging research topic
Introduction Image retrieval systems: text-based and content-based Text-based retrieval: query by keywords Keywords: setting sun, mountain, ocean, purple, …
Introduction Content-based image retrieval: images are indexed by their content, such as color, shape, and texture features. Feature extraction methods: Histogram Neural network (NN) Support vector machine (SVM) Genetic algorithm (GA) Principal component analysis (PCA) …
The proposed scheme based upon PCA and data mining If a digital image can be transformed into a transaction database, then the association rules derived from it can serve as the image's main features, which are used to filter out the undesired database images for a query image.
Principal component analysis (PCA) Given a set of points Y1, Y2, …, and YM, where every Yi is characterized by a set of variables X1, X2, …, and XN, we want to find a direction D = (d1, d2, …, dN), where d1^2 + d2^2 + … + dN^2 = 1, such that the variance of the points projected onto D is maximized.
Principal component analysis (PCA) Algorithm of PCA Start by coding the variables X1, X2, …, XN to have zero means and unit variances. Calculate the covariance matrix C of the samples. Find the eigenvalues λ1, λ2, …, λN of C, where λi ≥ λi+1, i = 1, 2, …, N-1. Let D1, D2, …, DN denote the corresponding eigenvectors. D1 is the first principal component direction, D2 is the second principal component direction, …, and DN is the Nth principal component direction.
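Below is a minimal sketch of these steps using NumPy; the data and function names are illustrative assumptions, not part of the original slides.

```python
# A minimal sketch of the PCA algorithm above (synthetic data, illustrative names).
import numpy as np

def pca_directions(Y):
    """Y: an (M, N) array of M samples described by N variables."""
    # Step 1: code the variables to zero mean and unit variance.
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # Step 2: covariance matrix C of the coded samples.
    C = np.cov(Z, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of C, sorted so that lambda_i >= lambda_(i+1).
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    # Column i of the returned matrix is the (i+1)-th principal component direction.
    return eigvals[order], eigvecs[:, order]

# Synthetic example: 40 samples with 2 variables.
Y = np.random.default_rng(0).normal(size=(40, 2))
eigenvalues, directions = pca_directions(Y)
```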
Principal component analysis (PCA) Let A be an n*n covariance matrix. λ is an eigenvalue of A, and x is an eigenvector associated with λ, if Ax = λx = λIx, where I is the n*n identity matrix. The eigenvalues are the roots of the characteristic polynomial of the matrix A, det(A - λI) = 0.
Principal component analysis (PCA) For example, let A be a 2*2 matrix.
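The entries of the slide's 2*2 matrix are not preserved in this text, so the sketch below uses a hypothetical symmetric matrix just to show how the eigenvalues follow from the characteristic polynomial.

```python
# Hypothetical 2x2 example (the slide's actual matrix is not reproduced here).
import numpy as np

A = np.array([[5.0, 2.0],
              [2.0, 3.0]])

# Characteristic polynomial of a 2x2 matrix: lambda^2 - trace(A)*lambda + det(A).
coefficients = [1.0, -np.trace(A), np.linalg.det(A)]
eigenvalues = np.roots(coefficients)

# The same values come from NumPy's eigenvalue routine.
print(sorted(eigenvalues, reverse=True), sorted(np.linalg.eigvals(A), reverse=True))
```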
PCA For example, 40 samples with 2 variables, X1 and X2. Covariance matrix with eigenvalues λ1 = 1160.139 and λ2 = 36.780.
Principal component analysis (PCA) D1 = [0.710 0.703] D2 = [-0.703 0.710]
Image Feature Extraction -PCA Next, we shall illustrate how PCA can be used to extract features from images. There is an example image M with 10 * 10 pixels; the matrix M holds its gray-level values.
Image Feature Extraction -PCA We partition the 10 * 10 image into 5 * 5 blocks, each block containing 4 pixels, so the number of blocks (NB) is 25.
Image Feature Extraction -PCA Let A be the matrix that collects the blocks of the image, one block per row (each row holds the 4 pixel values of one block).
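A short sketch of how the blocks could be collected, assuming a row-major scan of 2 * 2 blocks so that A ends up with 25 rows and 4 columns; the image values are random stand-ins.

```python
# Collect the 2x2 blocks of a 10x10 image into matrix A (one block per row, so A is 25 x 4).
import numpy as np

def blocks_to_matrix(M, block=2):
    h, w = M.shape
    rows = []
    for r in range(0, h, block):          # scan the blocks row by row
        for c in range(0, w, block):
            rows.append(M[r:r + block, c:c + block].ravel())
    return np.array(rows, dtype=float)

M = np.random.default_rng(1).integers(0, 256, size=(10, 10))   # stand-in gray-level image
A = blocks_to_matrix(M)                                        # NB = 25 blocks
```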
Image Feature Extraction -PCA (1) Compute the covariance matrix of an image. Next, we construct a variance-covariance matrix (VCM), denoted CM, for A. Each column of A can be regarded as a variable, so the number of variables is N. Let Ck denote the variable given by the kth column of A. Given two variables Cs and Ct whose means are mean(Cs) and mean(Ct), respectively, the covariance between them is Cov(Cs, Ct) = (1/NB) Σ_{i=1..NB} (Cs,i - mean(Cs)) (Ct,i - mean(Ct)), and Var(Ck) = Cov(Ck, Ck).
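A sketch of this computation follows; dividing by NB (the population form of the covariance) is an assumption about the convention used, and the matrix A below is a random stand-in.

```python
# Variance-covariance matrix CM of A, computed directly from the covariance formula above.
import numpy as np

def covariance_matrix(A):
    NB, N = A.shape
    means = A.mean(axis=0)                    # column means, one per variable C_k
    CM = np.zeros((N, N))
    for s in range(N):
        for t in range(N):
            CM[s, t] = np.sum((A[:, s] - means[s]) * (A[:, t] - means[t])) / NB
    return CM

A = np.random.default_rng(1).integers(0, 256, size=(25, 4)).astype(float)   # stand-in blocks
CM = covariance_matrix(A)                     # 4 x 4; CM[k, k] = Var(C_k)
```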
Image Feature Extraction -PCA (1) Compute the covariance matrix of an image. This slide shows the variance-covariance matrix CM of A.
Image Feature Extraction -PCA (2) Determine eigenvalues and eigenvectors. The eigenvalues of CM are λ1 = 21860, λ2 = 1743, λ3 = 877.335, and λ4 = 393.73.
Image Feature Extraction -PCA (2) Determine eigenvalues and eigenvectors. For the eigenvalues λ1 = 21860, λ2 = 1743, λ3 = 877.335, and λ4 = 393.73, each eigenvector corresponds to an eigenvalue; therefore, there are as many eigenvectors as eigenvalues. Each eigenvector can be seen as the direction of an axis.
Image Feature Extraction -PCA (3) Form the principal components (PCs). Each block is projected onto an eigenvector; for example, 23.9 = 20 * 0.419 + 8 * 0.488 + 15 * 0.57 + 6 * 0.511.
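The projected value is simply the dot product of the block's pixel values with the eigenvector; the snippet below reproduces the 23.9 above, reading (20, 8, 15, 6) as one block and (0.419, 0.488, 0.57, 0.511) as the eigenvector components shown in the example.

```python
# Reproduce the worked example: project one block onto an eigenvector.
import numpy as np

eigvec = np.array([0.419, 0.488, 0.57, 0.511])    # eigenvector components from the example
block = np.array([20.0, 8.0, 15.0, 6.0])          # gray-level values of one block
projected = float(block @ eigvec)                 # = 23.9
print(round(projected, 3))
```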
Image Feature Extraction -PCA (4) Normalize the projected values
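The exact normalization rule is not spelled out in this text, so the sketch below shows one plausible choice, stated purely as an assumption: min-max scaling of the projected values followed by quantization into a few discrete levels.

```python
# Hedged sketch of normalizing projected values: min-max scale to [0, 1], then quantize.
import numpy as np

def normalize_projected(values, levels=(0, 2, 4, 6, 8)):
    v = np.asarray(values, dtype=float)
    scaled = (v - v.min()) / (v.max() - v.min())                      # map to [0, 1]
    idx = np.minimum((scaled * len(levels)).astype(int), len(levels) - 1)
    return np.asarray(levels)[idx]                                    # quantized levels

projected = [23.9, 41.2, 12.7, 35.5, 28.0]        # hypothetical projected block values
print(normalize_projected(projected))
```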
Principal component analysis (PCA) PCA is a popular multivariate analysis technique that can be used to extract features from images and to filter candidate images from an image database. Nevertheless, the number of candidate images offered by PCA is usually very large for a huge image database. Therefore, a data mining technique is applied to speed up retrieval and increase the accuracy rate.
Data Mining – Association Rules Candidate 1-itemsets I = {A, B, C, D} Frequent 1-itemsets Minimum Support = 3
Data Mining – Association Rules Candidate 2-itemsets I = {A, B, C, D} Frequent 2-itemsets Minimum Support = 3
Data Mining – Association Rules Minimum Confidence = 100% Frequent 2-itemsets Association Rules
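A compact sketch of the support and confidence steps in this example; the transaction database below is hypothetical, with minimum support 3 and minimum confidence 100% as above.

```python
# Frequent itemsets and association rules over hypothetical transactions (Apriori-style).
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C", "D"}, {"A", "B", "C"}, {"B", "D"}]
MIN_SUP, MIN_CONF = 3, 1.0

def support(itemset):
    return sum(itemset <= t for t in transactions)

items = {i for t in transactions for i in t}
freq1 = {frozenset([i]) for i in items if support({i}) >= MIN_SUP}     # frequent 1-itemsets
cand2 = {a | b for a, b in combinations(freq1, 2)}                     # candidate 2-itemsets
freq2 = {c for c in cand2 if support(c) >= MIN_SUP}                    # frequent 2-itemsets

# Rules X -> Y from frequent 2-itemsets, with confidence = sup(X U Y) / sup(X).
for pair in freq2:
    for x in pair:
        y = next(iter(pair - {x}))
        conf = support(pair) / support({x})
        if conf >= MIN_CONF:
            print(f"{x} -> {y}  (confidence = {conf:.0%})")
```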
Data Mining for Image Features
Data Mining for Image Features Normalized Projected Image Database (NPIDB) in the Horizontal Direction
Data Mining for Image Features Minimum Support = 3 Candidate 1-itemsets Candidate 2-itemsets Frequent 1-itemsets
Data Mining for Image Features Minimum Confidence = 75% Frequent 2-itemsets
Data Mining for Image Features Association Rules in Horizontal Direction
Data Mining for Image Features Normalized Projected Image Database (NPIDB) in the Vertical Direction
Data Mining for Image Features Association Rules in Vertical Direction
Data Mining for Image Features Normalized Projected Image Database (NPIDB) in the Diagonal Direction
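The exact way the normalized projected image is turned into transactions is not reproduced in this text; the sketch below shows one plausible reading, stated as an assumption, building one transaction per row, column, and diagonal of a hypothetical 5*5 matrix of normalized projected values.

```python
# Hedged sketch: transaction databases from a normalized projected image (NPIDB),
# scanned in the horizontal, vertical, and diagonal directions.
import numpy as np

P = np.array([[0, 2, 2, 4, 6],
              [2, 4, 4, 6, 8],
              [0, 2, 4, 4, 6],
              [2, 2, 6, 6, 8],
              [0, 4, 4, 6, 8]])                 # hypothetical 5x5 normalized projected values

horizontal = [set(row.tolist()) for row in P]                                # one transaction per row
vertical = [set(col.tolist()) for col in P.T]                                # one transaction per column
diagonal = [set(np.diagonal(P, offset=k).tolist()) for k in range(-4, 5)]    # one per diagonal
print(horizontal, vertical, diagonal, sep="\n")
```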
PCA and data mining
Illustration 450 full-color images 300 blocks for each image 4*4 pixels for a block
Illustration A query image Q The set of eigenvalues of Q is {0, 2, 4, 6, 8}
Illustration Rules of Q are as shown; the file name is “SW003.JPG.”
Future works - VQ and PCA Vector Quantization (VQ) An image is separated into a set of input vectors Each input vector is matched with a codeword of the codebook
Vector Quantization (VQ) Definition of vector quantization (VQ): a mapping Q from R^k onto Y, where Y = {y1, y2, …, yN} is a finite subset of R^k. VQ is composed of the following three parts: Codebook generation process, Encoding process, and Decoding process.
Vector Quantization (VQ) Image Index table
Vector Quantization (VQ) Codebook generation: the training images are separated into vectors (training vectors 1, …, N) to form the training set.
Vector Quantization (VQ) Codebook generation: codebook initiation, where an initial codebook (codewords 1, …, 255 in the figure) is formed from the training set.
Vector Quantization (VQ) Codebook generation: training using an iteration algorithm. For codebook Ci, each codeword has an index set of the training vectors assigned to it, e.g., (1, 2, 5, 9, 45, …), (101, 179, 201, …), (8, 27, 38, 19, 200, …), and (23, 0, 67, 198, 224, …). Compute the mean value of each index set and replace the old vectors to obtain the new codebook Ci+1.
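A sketch of this iteration (assign every training vector to its nearest codeword, then replace each codeword by the mean of its index set); the random initialization and synthetic training data are assumptions.

```python
# LBG / k-means style codebook training, following the iteration described above.
import numpy as np

def train_codebook(training_set, codebook_size=256, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Codebook initiation: pick codebook_size training vectors at random (an assumption).
    codebook = training_set[rng.choice(len(training_set), codebook_size, replace=False)]
    for _ in range(iterations):
        # Index sets: nearest codeword for every training vector (Euclidean distance).
        d = np.linalg.norm(training_set[:, None, :] - codebook[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        # Replace each old codeword by the mean of the vectors assigned to it.
        for i in range(len(codebook)):
            members = training_set[nearest == i]
            if len(members):
                codebook[i] = members.mean(axis=0)
    return codebook

training_set = np.random.default_rng(1).random((5000, 4))    # stand-in training vectors
codebook = train_codebook(training_set, codebook_size=16)
```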
Example Codebook To encode an input vector, for example, v = (150, 145, 121, 130): (1) Compute the distance between v and all vectors in the codebook: d(v, cw1) = 114.2 d(v, cw2) = 188.3 d(v, cw3) = 112.3 d(v, cw4) = 124.6 d(v, cw5) = 122.3 d(v, cw6) = 235.1 d(v, cw7) = 152.5 d(v, cw8) = 63.2 (2) So, we choose index 8 (codeword cw8) to replace the input vector v.
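A minimal sketch of this full-search encoding step; the codebook below is a random stand-in, since the slide's codebook values are not reproduced in this text.

```python
# Full-search VQ encoding: replace the input vector by the index of its nearest codeword.
import numpy as np

def encode_full_search(v, codebook):
    distances = np.linalg.norm(codebook - v, axis=1)    # d(v, cw_i) for every codeword
    return int(distances.argmin())                      # index of the nearest codeword

codebook = np.random.default_rng(0).integers(0, 256, size=(8, 4)).astype(float)  # stand-in
v = np.array([150, 145, 121, 130], dtype=float)
index = encode_full_search(v, codebook)
```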
The Encoding algorithm using PCA Codebook The covariance matrix
The Encoding algorithm using PCA From the covariance matrix, we compute D1: (0.5038, 0.4904, 0.4788, 0.5259), λ1 = 19552, D2: (-0.4915, -0.5126, 0.4293, 0.5580), λ2 = 151, D3: (-0.0294, -0.0292, 0.7658, -0.6418), λ3 = 86, and D4: (0.7098, -0.7042, -0.0108, -0.0134), λ4 = 6. D1: (0.5038, 0.4904, 0.4788, 0.5259) serves as a coordinate axis; D1 preserves 98.77% of the variance of the codewords (19552 / (19552 + 151 + 86 + 6) ≈ 98.77%).
The Encoding algorithm using PCA The new sorted codebook and the corresponding projected values of the codewords Codebook The sorted codewords The projected values D1: (0.5038, 0.4904, 0.4788, 0.5259)
The Encoding algorithm using PCA Encode an input vector v = (150, 145, 121, 130). Transform v to α = D1 * vT: α = (0.5038, 0.4904, 0.4788, 0.5259) * (150, 145, 121, 130)T = 272.98. 321.93 is the closest projected value to 272.98. For 321.93, d(v, cw’5) = 63.2; for 162.60, d(v, cw’4) = 122.3; for 382.84, d(v, cw’6) = 114.2. So, we choose cw’5 to replace v.
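A sketch of this PCA-assisted search: project the codewords and the input vector onto D1, locate the nearest projected codeword in the sorted list, and run the full distance test only on a few neighbours. The neighbourhood width (one codeword on each side) and the stand-in codebook are assumptions.

```python
# PCA-assisted VQ encoding: search only a few codewords near v's projected value on D1.
import numpy as np

def encode_with_pca(v, codebook, D1, window=1):
    proj_cw = codebook @ D1                        # projected values of all codewords
    order = np.argsort(proj_cw)                    # sorted codebook cw'_1, cw'_2, ...
    alpha = v @ D1                                 # projected value of the input vector
    pos = np.searchsorted(proj_cw[order], alpha)   # position of alpha in the sorted list
    lo, hi = max(pos - window, 0), min(pos + window + 1, len(order))
    candidates = order[lo:hi]                      # a few neighbouring codewords
    dists = np.linalg.norm(codebook[candidates] - v, axis=1)
    return int(candidates[dists.argmin()])         # index of the chosen codeword

D1 = np.array([0.5038, 0.4904, 0.4788, 0.5259])    # first principal direction from above
codebook = np.random.default_rng(0).integers(0, 256, size=(8, 4)).astype(float)  # stand-in
v = np.array([150, 145, 121, 130], dtype=float)
index = encode_with_pca(v, codebook, D1)
```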
VQ and PCA for image retrieval Association Rules:
VQ and PCA for image retrieval Association Rules: ~
Query image Projected image ~
Conclusions An efficient image retrieval scheme based upon a multivariate analysis technique and a data mining technique. PCA – extracting image features. Association rules – matching the candidate images. VQ and PCA for similar image retrieval.