1
INFORMATION REPRESENTATION AND COMPRESSION
2
Our approach at TUT: we do not yet know how to describe the locations of blocks, so let us first think about GLOBAL content description, in which locations are not considered. That is, we first look at the problem in which only block STATISTICS are considered (the CAMSHIFT illustration showed that color statistics give good results).
3
Impact of Quantization
Distribution of DCT coefficients for a typical 8x8 DCT block. We can see that the higher-frequency coefficients are small. If we use strong quantization, they will be quantized to zero.
4
Under strong quantization only the first 4x4 block of coefficients will be nonzero. This is roughly equivalent to using a 4x4 DCT transform. There is another effect too: the stronger the quantization, the smaller the number of DIFFERENT blocks. With no quantization, practically every block is different. Quantization rounds the coefficients to a limited number of values.
5
Coefficients of the 4x4 blocks
DC: zero frequency, the average light level in the block.
AC: coefficients corresponding to the different nonzero frequencies.
Quantization by QP: $[DC] = \mathrm{round}(DC/QP)$, $[AC] = \mathrm{round}(AC/QP)$.
Higher QP -> more zeros in the block.
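The quantization rule above can be tried on a small example. The following is a minimal sketch, not the authors' implementation: the 4x4 coefficient values are made up for illustration, and the helper names `quantize_block` and `count_zeros` are hypothetical.

```python
def quantize_block(block, qp):
    """Quantize every coefficient of a block as round(coefficient / QP).

    Python's round() rounds exact halves to the nearest even integer,
    which may differ slightly from the rounding used in a real codec.
    """
    return [[round(c / qp) for c in row] for row in block]


def count_zeros(block):
    """Number of coefficients that became zero."""
    return sum(1 for row in block for c in row if c == 0)


# Hypothetical 4x4 DCT block: DC in the top-left corner, AC elsewhere.
block = [
    [620, 45, -12, 4],
    [38, -20, 6, -2],
    [-10, 7, -3, 1],
    [5, -2, 1, 0],
]

low = quantize_block(block, 8)    # weak quantization
high = quantize_block(block, 32)  # strong quantization
print(count_zeros(low), count_zeros(high))
```

With the larger QP, more of the small high-frequency coefficients fall to zero, which is the effect the slide describes.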
6
Here is an illustration for a picture
QP is the quantization parameter. We see that as it increases, the number of distinct DCT patterns is strongly reduced.
7
Now we use the following idea:
Let's see what the histogram of the quantized DCT blocks looks like! For example, let's find which blocks appear most often in a picture and create a histogram of, e.g., the first 40 patterns.
8
The shape of this histogram obviously depends on the quantization. If the quantization is weak, the histogram will tend to be flat. If the quantization is strong, it will tend to have a peak.
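Building such a histogram amounts to counting identical quantized blocks and keeping the most frequent ones. Below is a minimal sketch under that reading; the function name `pattern_histogram` and the tiny 2x2 blocks are illustrative assumptions (the slides use 4x4 blocks and about 40 patterns).

```python
from collections import Counter


def pattern_histogram(blocks, k):
    """Return the k most frequent quantized blocks with their relative frequencies."""
    # Lists are not hashable, so each block is turned into a tuple of tuples.
    counts = Counter(tuple(tuple(row) for row in b) for b in blocks)
    total = sum(counts.values())
    return [(pattern, n / total) for pattern, n in counts.most_common(k)]


# Hypothetical quantized blocks from one picture.
blocks = [
    [[1, 0], [0, 0]],
    [[1, 0], [0, 0]],
    [[2, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[3, 0], [1, 0]],
]

hist = pattern_histogram(blocks, 2)
for pattern, freq in hist:
    print(pattern, freq)
```

Stronger quantization maps more blocks onto the same pattern, so the first few bins collect most of the mass and the histogram becomes peaked.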
9
Let us see an example of histograms for two pictures
Histograms of two face images
10
The database retrieval problem based on block histograms
Assume we have a database D of pictures 1, 2, ..., i, ..., j, ..., m. We take a picture and want to check whether it is in the database or whether there are similar pictures there. Example: a database of passport photographs. In our approach the similarity measure between pictures is based on their quantized block histograms. Histograms are treated as vectors and similarity is based on the following formula: $B_{i,j} = \sum_{k} |H_i(k) - H_j(k)|$, $i, j \in D$, where $H_i(k)$ is the k-th bin of the histogram of picture i.
11
The measure is the city-block measure (the sum of the absolute values of the differences between corresponding histogram bins). Its minimum value is 0, which is reached only when the two histogram vectors are identical. The closer the value is to zero, the more similar the pictures should be. Remember that blocks are quantized, so noise and irrelevant features are removed. The question is what the performance of such a scheme is, but before we can check this we need to look into the light normalization problem.
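The city-block comparison and the retrieval step can be sketched as follows. This is an illustration only: it assumes that all histograms are defined over the same ordered set of patterns, and the names `city_block` and `retrieve` are hypothetical.

```python
def city_block(h1, h2):
    """Sum of absolute bin-by-bin differences between two histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def retrieve(key, database):
    """Return (index, distance) of the database histogram closest to key."""
    distances = [city_block(key, h) for h in database]
    best = min(range(len(database)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical three-bin histograms for three stored pictures.
database = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]]
index, distance = retrieve([0.5, 0.3, 0.2], database)
print(index, distance)
```

A distance of 0 means the key histogram is identical to a stored one; small nonzero values indicate similar pictures.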
12
Light normalization problem
The values of the DCT coefficients depend on the light level. If the light level is higher, the values are higher. If we use the same quantization for two identical pictures with different light levels, the quantized blocks will be different. The light level can be normalized. First, let's calculate the average light level of a picture. For this we use the values of the DC coefficients of its blocks: $DC_{mean} = \frac{1}{N}\sum_{n=1}^{N} DC_n$, where N is the number of blocks in the picture. This gives the average light level of the picture.
13
The average light level $DC_{all}$ of a database is calculated in the same way from the $DC_{mean}$ values of its pictures. Next, the light level of each picture is rescaled by the factor $DC_{all}/DC_{mean}$. After rescaling, the coefficient values in the quantized blocks of pictures that differ only in light level will be similar.
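The normalization can be illustrated with a small sketch. The exact rescaling factor is not legible in the slide text, so the code assumes the natural choice of database mean divided by picture mean; the function names and DC values are hypothetical.

```python
def mean_dc(dc_values):
    """DC_mean: average of the DC coefficients of one picture."""
    return sum(dc_values) / len(dc_values)


def database_mean_dc(pictures):
    """DC_all: average of the per-picture DC_mean values."""
    return mean_dc([mean_dc(p) for p in pictures])


def normalize(dc_values, dc_all):
    """Rescale one picture so that its mean DC equals dc_all.

    The factor dc_all / DC_mean is an assumption, not taken from the slides.
    """
    factor = dc_all / mean_dc(dc_values)
    return [v * factor for v in dc_values]


# The same hypothetical picture at two light levels.
dark = [40, 50, 60, 50]
bright = [80, 100, 120, 100]
dc_all = database_mean_dc([dark, bright])

print(normalize(dark, dc_all))
print(normalize(bright, dc_all))
```

After rescaling, both versions of the picture have the same DC values, so the same quantization produces the same blocks.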
14
The DC coefficients problem
At high quantization levels very many blocks will have only a DC coefficient. The only information about such a block is its DC value, that is, the average light level in the block. What is of interest, however, is how the average light level changes between blocks, and we want to use this information. We therefore account for the differences between the DC values of neighbouring blocks.
15
DC differences between blocks
In a) we see a fragment of a picture in which the DC values of the blocks are shown. Each block has 8 neighbours, as shown in b). We calculate 9 differences between the neighbours (8 for the directions and 1 for the average over all directions), as shown in c). We then order the differences and form a vector from the first k of them, as shown in d) for k=4.
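One possible reading of this construction is sketched below. Several details are assumptions, since the figure is not available: each difference is taken as neighbour minus centre, the ninth value is the mean of the eight directional differences, and "ordering" is taken to mean sorting by decreasing magnitude. The function name `dc_direction_vector` is hypothetical.

```python
def dc_direction_vector(dc, r, c, k=4):
    """Build a DC difference vector for the block at row r, column c.

    dc is a 2-D list of per-block DC values; (r, c) must not lie on the border.
    """
    center = dc[r][c]
    # Eight directional differences (assumed: neighbour minus centre).
    diffs = [
        dc[r + dr][c + dcol] - center
        for dr in (-1, 0, 1)
        for dcol in (-1, 0, 1)
        if (dr, dcol) != (0, 0)
    ]
    # Ninth value: the average over all directions.
    diffs.append(sum(diffs) / len(diffs))
    # Assumed ordering: largest magnitude first.
    diffs.sort(key=abs, reverse=True)
    return diffs[:k]


# Hypothetical fragment of DC values around one block.
dc = [
    [10, 12, 9],
    [11, 10, 14],
    [10, 7, 10],
]
print(dc_direction_vector(dc, 1, 1, k=3))
```

Histograms of these vectors can then be built in the same way as for the AC block patterns.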
16
Combined histogram
A combined histogram for AC blocks and DC vectors is now formed: $H = [H_{AC}, \alpha H_{DC}]$, where α is a numerical parameter which will be optimized later. Combined histogram means that we have two distance terms to minimize, and they are summed with the weight α: $B_{i,j} = \sum_{k} |H_{AC,i}(k) - H_{AC,j}(k)| + \alpha \sum_{k} |H_{DC,i}(k) - H_{DC,j}(k)|$, $i, j \in D$.
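The weighted sum of the two city-block terms can be sketched as follows. The histogram values are made up and the function names are hypothetical; the sketch also checks that, for α ≥ 0, the weighted sum equals the city-block distance between the concatenated vectors [H_AC, α·H_DC].

```python
def city_block(h1, h2):
    """Sum of absolute bin-by-bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def combined_distance(ac_i, dc_i, ac_j, dc_j, alpha):
    """AC histogram distance plus alpha times DC-vector histogram distance."""
    return city_block(ac_i, ac_j) + alpha * city_block(dc_i, dc_j)


# Hypothetical two-bin histograms for pictures i and j.
ac_i, dc_i = [0.6, 0.4], [0.7, 0.3]
ac_j, dc_j = [0.5, 0.5], [0.3, 0.7]
alpha = 0.5

b_ij = combined_distance(ac_i, dc_i, ac_j, dc_j, alpha)

# Same result via the concatenated vectors H = [H_AC, alpha * H_DC].
h_i = ac_i + [alpha * v for v in dc_i]
h_j = ac_j + [alpha * v for v in dc_j]
print(b_ij, city_block(h_i, h_j))
```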
17
Optimization of database retrieval
The question is: how good can database retrieval based on the combined histogram be? That is, e.g., how many errors will be made? But we can also ask another question: what is the best achievable performance of this approach? Remember that we use only statistical information, but we have several parameters which can be selected:
- quantization level
- size of histograms
- parameter α for combining histograms
- size of DC difference vectors
18
Optimization procedure
We can study this problem by taking some databases and optimizing the parameters for the best retrieval. This shows what the maximum achievable performance is. We did this for face databases using the following scheme:
19
EVALUATION OF RESULTS Given a certain classification threshold, an input face image of person A may be falsely classified as person B. Suppose the target person is person A. The fraction of images of person A that are classified as other persons is called the False Rejection Rate, FRR. The fraction of images of other persons that are classified as person A is called the False Acceptance Rate, FAR.
20
Equal Error Rate From the FAR and FRR, an Equal Error Rate (EER) is obtained at the threshold where both measures take equal values. The lower the EER, the better the system's performance, since the total error rate (the sum of the FAR and the FRR) at the EER point decreases. Typical EER performance of the histogram method for two face databases
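The relation between FAR, FRR and EER can be sketched with a simple threshold sweep. This is a simplified, verification-style illustration and not the evaluation code behind the reported results: it assumes we already have distances for same-person comparisons (`genuine`) and for different-person comparisons (`impostor`), and that a comparison is accepted when its distance is at most the threshold. All names and values are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when distances <= threshold are accepted."""
    frr = sum(d > threshold for d in genuine) / len(genuine)
    far = sum(d <= threshold for d in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep the observed distances as thresholds and return (EER, threshold).

    The EER is approximated at the threshold where |FAR - FRR| is smallest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


genuine = [0.1, 0.2, 0.3, 0.6]    # same person, small distances expected
impostor = [0.5, 0.7, 0.8, 0.9]   # different persons
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)
```

Raising the threshold lowers the FRR and raises the FAR, so the two curves cross at one operating point, which is the EER.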
21
DATABASE SELECTION There are two cases:
1. Database in which there is only one (standard) picture of each person.
2. Database in which there are multiple pictures of each person (and they might be very different).
In case 2, the same person should be retrieved for any of his or her pictures, which can be difficult.
22
DATABASES SELECTED The GTF (Georgia Tech Face) database contains face images of 50 people, both male and female, each with 15 images. Most of the images were taken in two different sessions to account for variations in illumination conditions, facial expression, appearance, scale and orientation. For testing, we store the first 11 images of each person in the database and the remaining 4 images serve as key images for retrieval. Therefore, the total number of stored images is 550 and the total number of key images is 200.
23
DATABASES SELECTED The ORL (Olivetti Research Laboratory) database contains 10 different images of each of 40 persons. Images were taken at different times, with slightly varying lighting, various facial expressions (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). ORL thus has more variation among the images taken of one person. For the experiment, we store the first 6 images of each person in the database and the remaining 4 images serve as key images. Therefore, the total number of stored images is 240 and the total number of key images is 160.
24
RESULTS We present results for AC only, for DC only and for the combined histogram.
EER - ORL: AC-pattern histograms 1.25%, direction-vector histograms 3.125%, combined histogram 0.625%.
EER - GTF: 7%, 4.5%.
The best result for ORL is obtained when: QP_AC = 36, number of AC patterns = 80, QP_DC = 75, number of direction-vector patterns = 300 and α = 0.7, γ = 7.
The best result for GTF is obtained when: QP_AC = 10, number of AC patterns = 250, QP_DC = 20, number of direction-vector patterns = 400 and α = 0.9, γ = 5.
26
ANOTHER DATABASE The FERET database contains more than 10,000 images of more than 1000 individuals, taken in widely varying circumstances. The FERET images are divided into several sets, formed to match its evaluation methodology. Here we made a test based on the sets fa and fb. In both of them each face has one picture, with the picture in fb taken seconds after the corresponding picture in fa. The fa set, which has 994 images, serves as the database; the fb set, which has 992 images, is used as key images for retrieval from fa.
27
EVALUATION OF RESULTS FERET is considered a difficult database, used in the evaluation of professional applications.
EER: AC-patterns 4.6371%, direction-vectors 7.06%, combined histogram 3.43%.
The best EER result is obtained when: QP_AC = 12, number of AC patterns = 400, QP_DC = 12, number of direction-vector patterns = 400 and α = 0.5, γ = 4.
28
FERET METHODOLOGY OF EVALUATION
For FERET there is another methodology, based on calculating how many correct retrievals are obtained within n trials, n = 1, 2, 3, ...
29
FERET EVALUATION The FERET evaluation measure is called the cumulative match score.
Results for the histogram method (red) are overlaid on those of other known good methods. Rank means how many retrievals are made; a single retrieval (rank 1) is the most demanding.
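The cumulative match score can be computed as the fraction of key images whose correct identity appears among the first n retrieved candidates. The sketch below illustrates this under the assumption that each key image already has a candidate list sorted by increasing distance; the function name and the toy identities are hypothetical.

```python
def cumulative_match_score(ranked_ids, true_ids, max_rank):
    """Return a list whose entry n-1 is the score at rank n.

    ranked_ids[q] is the list of database identities for key q, best match first.
    true_ids[q] is the correct identity of key q.
    """
    scores = []
    for n in range(1, max_rank + 1):
        hits = sum(true in ranked[:n] for ranked, true in zip(ranked_ids, true_ids))
        scores.append(hits / len(true_ids))
    return scores


# Three hypothetical key images, all of person "a".
ranked_ids = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
true_ids = ["a", "a", "a"]
scores = cumulative_match_score(ranked_ids, true_ids, 3)
print(scores)
```

The score never decreases with rank, and rank 1 counts only the cases where the very first retrieval is correct.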
30
Features based on Binary Feature Vectors
For each non-border 4x4 image block, there are eight blocks surrounding it. Such a 3x3 block matrix is utilized here to generate a Binary Feature Vector (BFV). Taking the DC coefficients as an example: the nine DC coefficients within this area form a 3x3 DC coefficient matrix. By measuring and thresholding the magnitude of the differences between the non-center DCs and the central DC coefficient, a binary vector of length 8 is formed. Two different cases are considered here:
Case 1: 0 if current coefficient ≤ threshold, 1 if current coefficient > threshold
Case 2: 0 if current coefficient < threshold, 1 if current coefficient ≥ threshold
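A minimal sketch of the BFV construction follows. It assumes that the value being thresholded is the magnitude of the difference between each neighbour and the central DC coefficient, and that the neighbours are scanned row by row; the scan order, the function name and the sample values are assumptions.

```python
def binary_feature_vector(m, threshold, case=1):
    """Build the 8-bit BFV of a 3x3 coefficient matrix m.

    case=1: bit is 1 when |neighbour - centre| >  threshold
    case=2: bit is 1 when |neighbour - centre| >= threshold
    """
    center = m[1][1]
    bits = []
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue  # skip the central coefficient itself
            d = abs(m[r][c] - center)
            bits.append(int(d > threshold) if case == 1 else int(d >= threshold))
    return bits


# Hypothetical 3x3 matrix of DC coefficients.
m = [
    [10, 12, 9],
    [11, 10, 14],
    [10, 7, 10],
]
print(binary_feature_vector(m, 2, case=1))
print(binary_feature_vector(m, 2, case=2))
```

The two cases differ only for neighbours whose difference equals the threshold exactly. Each 8-bit vector can be treated as one histogram bin index, giving at most 256 bins.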
31
DC-BFV Histogram (based on DC coeff.)
AC-BFV Histogram (based on AC coeff.). Example of DC-BFV histogram
32
Performance results for the Feret database
The result is quite good, considering that the method uses statistical information only.
33
WHICH IS THE BEST METHOD?
On the FERET plot the best performance is 95%. Which method is it? It is called EIGENFACES and it is based on calculating the eigenvectors and eigenvalues of matrices.
34
Construction of Face Space
EIGENFACES Suppose a face image consists of N pixels, so it can be represented by a vector of dimension N. Let $\Gamma_1, \Gamma_2, \ldots, \Gamma_M$ be the training set of face images. The average face of these M images is given by $\Psi = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i$. Then each face $\Gamma_i$ differs from the average face by $\Phi_i = \Gamma_i - \Psi$.
35
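EIGENFACES Now the covariance matrix of the training images can be constructed: $C = AA^T$, where $A = [\Phi_1, \Phi_2, \ldots, \Phi_M]$ is the N by M matrix whose columns are the difference faces $\Phi_i$ (each training face minus the average face). The basis vectors of the face space, i.e., the eigenfaces, are then the orthogonal eigenvectors of the covariance matrix C. Since the number of training images is usually less than the number of pixels in an image, there will be only M-1, instead of N, meaningful eigenvectors.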
36
Eigenvalues, eigenvectors
$Ax = \lambda x$: x is an eigenvector of the matrix A and λ is the corresponding eigenvalue. If S is a nonsingular n x n matrix, then the matrix $B = SAS^{-1}$ has the same eigenvalues as A. An n x n matrix has n eigenvalues (counted with multiplicity).
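The claim about $B = SAS^{-1}$ follows in one line. As a short worked step (the eigenvector is written x and the eigenvalue λ, as on the slide):

```latex
Ax = \lambda x,\quad B = SAS^{-1}
\;\Longrightarrow\;
B(Sx) = SAS^{-1}Sx = SAx = \lambda\,(Sx)
```

So Sx is an eigenvector of B with the same eigenvalue λ, and Sx is nonzero because S is nonsingular.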
37
EIGENFACES Therefore, the eigenfaces are computed by first finding the eigenvectors $v_l$ ($l = 1, \ldots, M$) of the M by M matrix $L = A^T A$, where the columns of A are the difference face images $\Phi_k$. The eigenvectors $u_l$ of the matrix $C = AA^T$ are then expressed as a linear combination of the difference face images $\Phi_k$, weighted by the components $v_{lk}$ of $v_l$: $u_l = \sum_{k=1}^{M} v_{lk}\Phi_k = Av_l$. In practice, a smaller set of M' (M' < M) eigenfaces is sufficient for face identification. Hence, only the M' significant eigenvectors of L, corresponding to the M' largest eigenvalues, are selected for the eigenface computation.
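The reason the small M by M problem is enough can be written as a short derivation. The notation is an assumption following the usual eigenface formulation: A is the N by M matrix whose columns are the difference faces, and $v_l$, $\mu_l$ are an eigenvector and eigenvalue of $A^T A$.

```latex
A^{T}A\,v_l = \mu_l v_l
\;\Longrightarrow\;
AA^{T}(Av_l) = A\,(A^{T}A\,v_l) = \mu_l\,(Av_l)
```

Hence $Av_l$ is an eigenvector of the large N by N matrix $AA^T$ with the same eigenvalue, and only an M by M eigenproblem has to be solved.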
38
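Thus further data compression can be obtained
M' is determined by a threshold, $\theta_\lambda$, on the ratio of the eigenvalue sums: $\sum_{i=1}^{M'}\lambda_i \,/\, \sum_{i=1}^{M}\lambda_i \ge \theta_\lambda$, where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_M$ are the eigenvalues of L. In the training stage, the face of each known individual, $\Gamma_k$, is projected into the face space and an M'-dimensional vector, $\Omega_k$, is obtained: $\Omega_k = U^T(\Gamma_k - \Psi)$, $k = 1, \ldots, N_c$, where $U = [u_1, \ldots, u_{M'}]$ holds the selected eigenfaces, $\Psi$ is the average face and $N_c$ is the number of face classes.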
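Choosing M' from the eigenvalue ratio can be sketched as below. The sketch assumes that M' is the smallest number of leading eigenvalues whose sum reaches the given fraction of the total; the function name and the eigenvalues are hypothetical.

```python
def choose_num_eigenfaces(eigenvalues, ratio):
    """Smallest M' whose M' largest eigenvalues sum to at least ratio * total."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    running = 0.0
    for m, value in enumerate(values, start=1):
        running += value
        if running / total >= ratio:
            return m
    return len(values)


eigenvalues = [5, 3, 1, 1]  # hypothetical eigenvalues of L
print(choose_num_eigenfaces(eigenvalues, 0.8))
print(choose_num_eigenfaces(eigenvalues, 0.95))
```

A higher ratio keeps more eigenfaces and therefore gives less compression.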
39
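A distance threshold, $\theta_c$, that defines the maximum allowable distance from a face class as well as from the face space, is set up by computing half the largest distance between any two face classes: $\theta_c = \frac{1}{2}\max_{j,k}\|\Omega_j - \Omega_k\|$, $j, k = 1, \ldots, N_c$, where $\Omega_k$ is the face-space vector of class k and $N_c$ is the number of face classes. In the recognition stage, a new image, $\Gamma$, is projected into the face space to obtain a vector, $\Omega$: $\Omega = U^T(\Gamma - \Psi)$, where U is the matrix of eigenfaces and $\Psi$ is the average face. The distance of $\Omega$ to each face class is defined by $\epsilon_k^2 = \|\Omega - \Omega_k\|^2$, $k = 1, \ldots, N_c$.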
40
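For the purpose of discriminating between face images and non-face-like images, the distance, $\epsilon$, between the original image, $\Gamma$, and its reconstructed image from the eigenface space, $\Gamma_f$, is also computed: $\epsilon^2 = \|\Gamma - \Gamma_f\|^2$, where $\Gamma_f = U\Omega + \Psi$. These distances are compared with the distance threshold $\theta_c$ (half the largest distance between any two face classes) and the input image is classified by the following rules:
IF $\epsilon \ge \theta_c$ THEN input image is not a face image;
IF $\epsilon < \theta_c$ AND $\epsilon_k \ge \theta_c$ for all k THEN input image contains an unknown face;
IF $\epsilon < \theta_c$ AND $\epsilon_{k^*} = \min_k \epsilon_k < \theta_c$ THEN input image contains the face of individual $k^*$.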
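The three decision rules can be sketched as a small function. The conditions are not legible in the slide text, so the sketch assumes the standard eigenface reading: first compare the distance from the face space with the threshold, then compare the smallest class distance with the same threshold. The function name, labels and numbers are hypothetical.

```python
def classify(face_space_distance, class_distances, threshold):
    """Apply the three eigenface decision rules.

    face_space_distance: distance between the image and its reconstruction.
    class_distances: distances from the projected image to each face class.
    Returns (label, class_index); class_index is None unless a face is recognized.
    """
    if face_space_distance >= threshold:
        return "not a face", None
    nearest = min(range(len(class_distances)), key=class_distances.__getitem__)
    if class_distances[nearest] >= threshold:
        return "unknown face", None
    return "known face", nearest


print(classify(9.0, [2.0, 1.0, 4.0], 5.0))
print(classify(3.0, [6.0, 7.0, 8.0], 5.0))
print(classify(3.0, [2.0, 1.0, 4.0], 5.0))
```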
41
EXPERIMENTAL RESULTS The eigenface-based face recognition method was tested on the ORL face database. 150 images of 15 individuals were selected for the experiments.
42
EXPERIMENTAL RESULTS In the training stage, three images of each individual were used as training samples, forming a training set totalling 45 images. Figure: the average face of the training set.
43
EXPERIMENTAL RESULTS The first 15 eigenfaces corresponding to the 15 largest eigenvalues.
44
EXPERIMENTAL RESULTS Recognition rate
The recognition rate depends on the training images: when only single-view images are used for training, recognition is much worse.
45
EXPERIMENTAL RESULTS Faces with calm expressions are used in the training stage, and faces of the same individuals but with various expressions in the testing stage. Figure: training images and test images; the lower images are projections onto the face space.
46
CONCLUSIONS The eigenfaces method treats images globally; no local information is used. Compression is done at the global level. The method requires a lot of computation, but the results are good. Explanation of the good results: images are represented as combinations of "simple" images and the system is trained on them.