Presented by Li-Jen Kao July, 2005


1 Presented by Li-Jen Kao July, 2005
Fast Class Rendering Using Multiresolution Classification in Discrete Cosine Transform Domain Presented by Li-Jen Kao July, 2005 Mr. Chairman (Madam Chairman), ladies and gentlemen, I am pleased to have you all here today. My name is XXX. I am here today to present a paper titled "XXX".

2 Outline Introduction Feature Extraction Classification Scheme
Experimental Results Conclusion The outline of this presentation is as follows: First, I'll introduce the character recognition problem, which is our application. Second, I'll describe how to extract features via the Discrete Cosine Transform. Third, I'll introduce our classification scheme. Next, I'll show the experimental results. Finally, I'll summarize our approach.

3 1 Introduction Classification of objects (or patterns) into a number of predefined classes has been extensively studied in a wide variety of applications, such as: optical character recognition (OCR), speech recognition, and face recognition. We may consider the design of classification systems in terms of two subproblems: feature extraction and classification. Classification of objects (or patterns) into a number of predefined classes has been extensively studied in a wide variety of applications such as optical character recognition (OCR), speech recognition, and face recognition. These applications often involve hundreds or even thousands of classes. Generally speaking, we have to extract useful features from the objects being classified before the classification process. Thus, we may consider the design of classification systems in terms of two subproblems: 1) feature extraction and 2) classification.

4 Reliable and general features are required
Feature extraction: Features are functions of the measurements performed on a class of objects. It has not found a general solution in most applications. Our purpose is to design a general classification scheme that is less dependent on domain-specific knowledge. Reliable and general features are required. First, we briefly introduce feature extraction. Features are functions of the measurements performed on a class of objects that enable that class to be distinguished from other classes in the same general category. Before the classification process, useful features must be extracted from the objects. Although feature design largely affects classification performance, it has not found a general solution in most applications; feature design is typically subject to the experience, intuition, and/or cleverness of the designer. The purpose of this paper is to design a general classification scheme that is less dependent on domain-specific knowledge. To achieve this goal, reliable and general features are required in the feature extraction stage. Based on these observations, we are motivated to apply the statistical approach and a technique widely used in image compression, the discrete cosine transform (DCT) [1], for classification.

5 Discrete Cosine Transform (DCT)
The DCT has two important characteristics: It helps separate an image into parts of differing importance with respect to the image's visual quality. Due to the energy compacting property of the DCT, much of the signal energy tends to lie at low frequencies.

6 Four advantages in applying DCT
The features extracted by DCT are general and reliable. It can be applied to most vision-oriented applications. The amount of data to be stored can be reduced tremendously. Multiresolution classification and progressive matching can be achieved naturally. The DCT is scale-invariant and less sensitive to noise and distortion. There are four main advantages in applying the DCT as the feature extraction method: 1) The features extracted by DCT are general and reliable. It can be applied to most vision-oriented applications, such as OCR, face recognition, and image retrieval. 2) The amount of data to be stored can be reduced tremendously. Based on the energy compacting property of the DCT, most of the signal energy is preserved in a relatively small number of DCT coefficients. For instance, in the case of OCR [2], 84% of the signal energy appears in 2.8% of the DCT coefficients, which corresponds to a saving of 97.2%. 3) Multiresolution classification and progressive matching can be achieved naturally. The DCT provides a coarse-to-fine strategy: recognition starts from the coarse scale and moves to finer scales. The low-frequency DCT coefficients, i.e., the coefficients with the lowest resolution, can be used for coarse classification; the coefficients with higher resolution can be applied for fine classification. 4) The DCT is scale-invariant and less sensitive to noise and distortion. Usually, only low-frequency components are used as features for the sake of efficiency. Some details included in the higher frequency band are removed, so the algorithm is less sensitive to noise or imperfections in the object images. Further, the DCT is scale-invariant due to its energy compacting property.
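As a quick sanity check on the figures quoted above, the 2.8% of coefficients is consistent with keeping only an 8×8 block of low-frequency coefficients out of a 48×48 character image; note that the block size is an assumption here, since the slide does not state it:

```python
# Hypothetical reading of the quoted statistic: keep an 8x8
# low-frequency block out of a 48x48 coefficient matrix.
kept = 8 * 8          # 64 retained DCT coefficients
total = 48 * 48       # 2304 coefficients in all
pct_kept = 100.0 * kept / total
print(round(pct_kept, 1))          # 2.8  -> matches the quoted 2.8%
print(round(100.0 - pct_kept, 1))  # 97.2 -> the quoted saving
```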

7 Two philosophies of classification
Statistical: the measurements that describe an object are treated only formally as statistical variables, neglecting their "meaning". Structural: regards objects as compositions of structural units, usually called primitives. There are two distinct philosophies of classification in general, known as "statistical" [2], [3] and "structural" [4]. In the statistical approach, the measurements that describe an object are treated only formally as statistical variables, neglecting their "meaning". The structural approach, on the other hand, regards objects as compositions of structural units, usually called primitives. Since our main goal is to develop a general coarse classification scheme for vision-oriented applications (such as OCR, face recognition, and image retrieval), the statistical approach is more suitable than the structural approach.

8 2 Feature Extraction via DCT
The DCT coefficients C(u, v) of an N×N image x(i, j) can be defined as C(u, v) = α(u) α(v) Σ_{i=0..N−1} Σ_{j=0..N−1} x(i, j) cos[(2i+1)uπ / 2N] cos[(2j+1)vπ / 2N], for 0 ≤ u, v ≤ N−1, where α(0) = √(1/N) and α(k) = √(2/N) for k > 0. In this equation, x denotes the original image before transformation, and C represents the DCT coefficients after the transformation.
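The definition above can be transcribed directly into code. The following is a minimal reference sketch (a naive O(N^4) loop, not the fast transform used in practice):

```python
import math

def dct2(x):
    """Naive 2-D DCT-II, transcribing the definition above directly
    (O(N^4) loops; a reference sketch, not a fast transform)."""
    n = len(x)

    def alpha(k):
        # Normalization: alpha(0) = sqrt(1/N), alpha(k) = sqrt(2/N) otherwise.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    c = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (x[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            c[u][v] = alpha(u) * alpha(v) * s
    return c

# For a constant 4x4 image, all energy collapses into the DC term:
# C(0,0) = (1/sqrt(4)) * (1/sqrt(4)) * 16 * 5 = 20.
flat = [[5.0] * 4 for _ in range(4)]
coef = dct2(flat)
print(round(coef[0][0], 6))  # 20.0
```

The energy compaction discussed earlier shows up here directly: every coefficient except C(0,0) of the constant image is (numerically) zero.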

9 Figure 1. The DCT coefficients of the character image “為”.
This figure shows the DCT coefficients of a character image ("為") of size 48×48. The number of coefficients is equal to the number of pixels in the original character image. From this figure, we can see that most of the signal energy appears in the upper left corner of the DCT coefficient matrix.

10 Figure 2. Illustration of the multiresolution ability of DCT
(a) The original image of size 48×48; (b) the image reconstructed from the lowest 8×8 DCT coefficients; (c) the image reconstructed from the lowest 16×16 coefficients; (d) the image reconstructed from the lowest 32×32 coefficients. Figure 2 shows the images reconstructed from different block sizes via the IDCT. For instance, Figure 2(b) is the image reconstructed from the lowest 8×8 DCT coefficients of the original image. From this figure, it can be seen that the bigger the block size, the higher the resolution of the reconstructed image. Adding more DCT coefficients usually implies increasing the resolution level of an image. If the current resolution is not high enough to distinguish one character from the others, we have to raise the level of resolution so that the discriminating ability is improved. Based on such observations, we are motivated to devise a multiresolution classification scheme using DCT coefficients as the discriminating features.
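The coarse-to-fine behaviour in Figure 2 can be sketched in one dimension: reconstructing a smooth signal from only its lowest-frequency DCT coefficients already gives a reasonable approximation, and adding coefficients refines it. (A 1-D illustration for brevity; the 2-D case applies the transform along rows and then columns.)

```python
import math

def dct(v):
    """1-D orthonormal DCT-II."""
    n = len(v)
    return [(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
            * sum(v[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                  for i in range(n))
            for u in range(n)]

def idct(c):
    """Inverse of dct above (1-D DCT-III)."""
    n = len(c)
    return [sum((math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
                * c[u] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for i in range(n)]

signal = [math.sin(i / 4.0) for i in range(16)]
coef = dct(signal)

# Keep only the 4 lowest-frequency coefficients, zero the rest, and
# reconstruct: a coarse, low-resolution version of the signal.
coarse = idct(coef[:4] + [0.0] * 12)
err = max(abs(a - b) for a, b in zip(signal, coarse))
print("max error with 4/16 coefficients: %.4f" % err)
```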

11 3. The Proposed Classification Scheme
The ultimate goal of classification is to classify an unknown pattern x into one of M possible classes (c1, c2, …, cM). Each pattern is represented by a set of D features, viewed as a D-dimensional feature vector. Before going into the details of our approach, we first briefly review the problem and introduce the related symbols used in this paper. The ultimate goal of classification is to classify an unknown pattern x into one of M possible classes (c1, c2, …, cM). Each pattern is represented by a set of D features, viewed as a D-dimensional feature vector.

12 3.1. Our classification model
In the training mode: the feature extraction module finds the appropriate features for representing the input patterns, and the classifier is trained. In the classification mode: the trained classifier assigns the input pattern to one of the pattern classes based on the measured features. Our system operates in two modes: training and classification. In the training mode, the feature extraction module finds the appropriate features for representing the input patterns, and the classifier is trained. In the classification mode, the trained classifier assigns the input pattern to one of the pattern classes based on the measured features.

13 To alleviate the burden of the classification process, it is usually divided into two stages:
Coarse Classification Fine Classification To alleviate the burden of the classification process, it is usually divided into two stages: the coarse classification (also called preclassification) process and the fine classification process. To classify an unknown object, the coarse classification is first employed to reduce the large set of candidate objects to a smaller one. Then, the fine classification is applied to identify the class of the object.

14 Figure 3. Model for multiresolution classification

15 3.2. Coarse classification module
In the training mode: The features of each training sample are first extracted by DCT and quantized. Then the D most significant quantized DCT features of each training sample are transformed into a code, called the grid code (GC), which corresponds to a grid of the feature space partitioned by the quantization method. The training samples with the same GC are similar and can be classified into a coarse class. Therefore, the information about all possible GCs is gathered in the training mode. Our coarse classification scheme is developed as follows. In the training mode, the features of each training sample are first extracted by DCT and quantized. Then the D most significant quantized DCT features of each training sample are transformed into a code, called the grid code (GC), which corresponds to a grid of the feature space partitioned by the quantization method. Obviously, the training samples with the same GC are similar and can be classified into a coarse class. This characteristic implies that if an unknown sample has the same GC as a training sample, it may belong to the class that training sample belongs to. Therefore, the information about all possible GCs is gathered in the training mode. Each code may correspond to many classes or to none. In the classification mode, the classes with the same GC as that of the test sample are chosen as the candidates for the test sample. In the subsequent subsections, the details of the sub-modules that compose the coarse classification module are introduced.

16 In the classification mode:
The classes with the same GC as that of the test sample are chosen as the candidates for the test sample.

17 Quantization The 2-D DCT coefficient F(u,v) is quantized to F'(u,v) according to equation (4). Most of the high-frequency coefficients are quantized to zero and only the most significant coefficients are retained. Quantization is the process of reducing the number of possible values of a quantity, thereby reducing the number of bits needed to represent it. Since only significant features are needed for coarse classification, quantization is applied to obtain a reduced representation of the feature set and render the features more suitable for coarse classification. In our approach, the 2-D DCT coefficient F(u,v) is quantized to F'(u,v) according to equation (4). Assume that the quantized DCT coefficient F'(u, v) takes one of the quantized values 0, 1, …, q; in other words, the span of F(u, v) is partitioned into (q+1) intervals, each of which corresponds to a quantized value. It is obvious that most of the high-frequency coefficients will be quantized to zero and only the most significant coefficients will be retained. Thus, the dimension of the feature vector can be reduced after quantization.
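Equation (4) is not reproduced in this transcript, so the sketch below assumes a JPEG-style uniform quantizer, F'(u,v) = round(F(u,v)/Q). This form is an assumption, but it matches the behaviour described: small high-frequency coefficients map to zero, and quantized values may be negative (as the grid code transformation slide notes).

```python
def quantize(f, step):
    """Assumed uniform quantizer F'(u,v) = round(F(u,v) / step).
    The paper's actual equation (4) is not shown in this transcript,
    so this is a hedged sketch, not the paper's exact rule."""
    return int(round(f / step))

# Toy DCT coefficients, ordered roughly from low to high frequency.
coeffs = [215.0, -83.0, 12.0, 4.0, -3.0, 1.5]
step = 16.0
print([quantize(c, step) for c in coeffs])  # [13, -5, 1, 0, 0, 0]
```

Note how the small (high-frequency) coefficients all quantize to zero, which is exactly the dimensionality reduction the slide describes.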

18 3.2.2. Grid Code Transformation
After the quantization process, the D most significant quantized DCT features of sample Oi are obtained, say [qi1, qi2, …, qiD]. The significance of each DCT coefficient is decided according to the following zigzag order: F(0,0), F(0,1), F(1,0), F(2,0), F(1,1), F(0,2), F(0,3), F(1,2), F(2,1), F(3,0), F(3,1), and so on. Because the value of qij may be negative, for ease of operation we transform qij into a positive integer dij by adding a number, say kj, to qij. In this way, object Oi can be transformed into a D-digit GC. This process is called the grid code transformation (GCT). After the quantization process, the D most significant quantized DCT features of sample Oi are obtained, say [qi1, qi2, …, qiD]. In this quantized feature vector, qi1 is the most important feature, qi2 is the second most important, and so on. In our approach, the significance of each DCT coefficient is decided according to the zigzag order above, which reflects the energy compacting property: low-frequency DCT coefficients are usually more important than high-frequency ones. Because the value of qij may be negative, for ease of operation we transform qij into a positive integer dij by adding a number, say kj, to qij. kj is obtained from the statistics of the j-th component of the quantized feature vectors of all training samples; its value is set to transform the smallest value of the j-th feature to zero. In this way, object Oi can be transformed into a D-digit GC. This process is what we call the grid code transformation (GCT). From another point of view, the GCT groups patterns with high similarity into the same code; this process can thus be regarded as a coarse classification.
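The grid code transformation above can be sketched as follows, under two assumptions: the GC is represented as a tuple of the shifted digits, and the toy coefficient values and offsets k_j are hypothetical (the real offsets come from training statistics).

```python
# Zigzag order quoted above: F(0,0), F(0,1), F(1,0), F(2,0), F(1,1), ...
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1),
          (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]

def grid_code(quantized, d, offsets):
    """Map the d most significant quantized coefficients of a sample to
    its grid code. offsets[j] is the k_j that shifts the j-th feature so
    its smallest training value becomes 0. Representing the GC as a
    tuple of shifted digits is an assumed encoding."""
    digits = []
    for j in range(d):
        u, v = ZIGZAG[j]
        digits.append(quantized[u][v] + offsets[j])
    return tuple(digits)

# Toy quantized 4x4 coefficients of one sample (hypothetical values).
q = [[13, -5, 1, 0],
     [2, 0, 0, 0],
     [-1, 0, 0, 0],
     [0, 0, 0, 0]]
# Hypothetical offsets k_j derived from training statistics.
k = [0, 6, 3, 2, 1]
print(grid_code(q, 5, k))  # (13, 1, 5, 1, 1)
```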

19 3.2.3. Grid Code Sorting and Elimination
After the GCT, we obtain a list of triplets (Ti, Ci, GCi): Ti is the ID of a training sample, Ci is the ID of the class the training sample belongs to, and GCi is the grid code of the training sample. The list is then sorted in ascending order of GC. Given the GC of a test sample, we can get a list of candidate classes with the same GC for the test sample. After the GCT, we obtain a list of triplets (Ti, Ci, GCi), where Ti is the ID of a training sample, Ci is the ID of the class the training sample belongs to, and GCi is the grid code of the training sample. Each training sample is associated with a triplet. The list is then sorted in ascending order of GC, so that we can efficiently find the classes whose training samples have the GC being searched. Consequently, given the GC of a test sample, we can get a list of candidate classes with the same GC for the test sample.

20 Elimination of Redundancy
Redundancy occurs when training samples belonging to the same class have the same GC. This redundancy can be eliminated by establishing an abstract lookup table that contains only the GCs and their corresponding classes. Then, given a GC, this table can return the relevant classes very quickly via binary search. To further improve the efficiency of the searching process, redundant information must be removed. Redundancy occurs when training samples belonging to the same class have the same GC. This redundancy can be eliminated by establishing an abstract lookup table that contains only the GCs and their corresponding classes. Then, given a GC, this table can return the relevant classes very quickly via binary search.
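The sorting, redundancy elimination, and binary-search lookup described in these two slides can be sketched together (toy triplets; representing each GC as a tuple is an assumption):

```python
import bisect

# Triplets (T_i, C_i, GC_i) produced by the GCT during training.
triplets = [
    (1, "A", (3, 1)), (2, "A", (3, 1)),   # redundant: same class, same GC
    (3, "B", (3, 1)),
    (4, "B", (5, 0)),
    (5, "C", (2, 2)),
]

# Eliminate redundancy: keep each GC once, with its distinct classes.
table = {}
for _, cls, gc in triplets:
    table.setdefault(gc, set()).add(cls)

# Abstract lookup table, sorted by GC so binary search applies.
lookup = sorted((gc, sorted(classes)) for gc, classes in table.items())
keys = [gc for gc, _ in lookup]

def candidates(gc):
    """Binary-search the lookup table for a test sample's GC and
    return its candidate classes (empty list if the GC is unseen)."""
    i = bisect.bisect_left(keys, gc)
    if i < len(keys) and keys[i] == gc:
        return lookup[i][1]
    return []

print(candidates((3, 1)))  # ['A', 'B']
print(candidates((9, 9)))  # []
```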

21 3.3. The fine classification module
Progressive matching method: Adding more DCT coefficients usually implies increasing the resolution level of an image. If the current resolution is not high enough to distinguish one character from the others, we raise the level of resolution so that the discrimination power is improved. The establishment of the templates for each class: Templates are established in the DCT domain. The average DCT coefficients of size N×N are computed from the set of training samples of each class, yielding M sets of average DCT coefficients that serve as the class templates. The fine classification method applied here is the progressive matching method. The underlying philosophy of the progressive matching algorithm is the concept of multiresolution. Adding more DCT coefficients usually implies increasing the resolution level of an image. If the current resolution is not high enough to distinguish one character from the others, we raise the level of resolution so that the discrimination power is improved. In order to apply the template matching method, we have to establish the templates (or prototypes) for each class in the training mode. In our approach, these templates are established in the DCT domain. Accordingly, to establish the template for a class, the average DCT coefficients of size N×N are computed from the set of training samples of that class, assuming there is at least one training sample per class. In this way, M sets of average DCT coefficients are obtained and serve as the templates for the M classes.

22 The process is repeated until one of the stopping criteria is satisfied:
The sum of squared differences (SSD) is used as the matching criterion. The matching of x and Ti is decomposed into K iterations, each of which corresponds to matching under a block of size nk×nk. After the kth iteration, the block size is enlarged from nk×nk to nk+1×nk+1 (nk+1 = nk + d). The process is repeated until one of the stopping criteria is satisfied; the criteria are designed 1) to preserve enough signal energy in the block, and 2) to reject unqualified classes as soon as possible. Since the most significant DCT coefficients lie in the low-frequency area, the matching process can be performed step by step by adding more and more coefficients from the low-frequency area to the high-frequency area. Thus, we develop a matching algorithm that progressively measures the distance (or dissimilarity) between the unknown character x and template Ti. In the matching process, the sum of squared differences (SSD) is used as the matching criterion. The matching of x and Ti is decomposed into K iterations, each of which corresponds to matching under a block of size nk×nk. After the kth iteration, the block size is enlarged from nk×nk to nk+1×nk+1 (nk+1 = nk + d). To reject the unqualified classes in the early iterations, statistical tools are applied to calculate theoretically sound distance thresholds for the SSD under different block sizes. The details of the progressive matching method can be found in [2].
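The iteration described above can be sketched as a progressive matching loop, assuming precomputed per-block-size thresholds (the paper derives these statistically; the fixed values below are hypothetical, as are the toy templates):

```python
def ssd_block(x, t, n):
    """Sum of squared differences over the n x n low-frequency block."""
    return sum((x[u][v] - t[u][v]) ** 2 for u in range(n) for v in range(n))

def progressive_match(x, templates, sizes, thresholds):
    """Compare pattern x against each class template over growing
    low-frequency blocks, rejecting a template as soon as its SSD
    exceeds that block size's threshold. A sketch of the idea, not
    the paper's exact algorithm (see [2] for the details)."""
    alive = dict(templates)
    for n, th in zip(sizes, thresholds):
        alive = {c: t for c, t in alive.items() if ssd_block(x, t, n) <= th}
        if len(alive) <= 1:
            break
    # Among the survivors, pick the class with the smallest final SSD.
    return min(alive, key=lambda c: ssd_block(x, alive[c], sizes[-1]),
               default=None)

# Toy 4x4 DCT-coefficient templates for two classes, and a test
# pattern that is a noisy copy of class A.
tA = [[10, 2, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
tB = [[30, 8, 1, 0], [4, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
x = [[11, 2, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
winner = progressive_match(x, {"A": tA, "B": tB},
                           sizes=[2, 4], thresholds=[50.0, 60.0])
print(winner)  # A
```

Class B is already rejected at the 2×2 block, so its higher-frequency coefficients are never compared; this early rejection is what makes the matching fast.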

23 4 Experimental Results 18,600 samples (about 640 categories) were extracted from the Kin-Guan (金剛) bible. Each character image was transformed into a 48×48 bitmap. 1,000 of the samples were used for testing and the others for training. The D most significant DCT coefficients were quantized and transformed into a GC for each sample. In our application, the objects to be classified are handwritten characters in Chinese paleography. Now, let's look at the experimental results. In our experiments, there are…

24 Figure 3. Reduction and accuracy rate using our coarse classification scheme
Figure 3 shows the reduction rate and accuracy rate of the test samples under different values of D. It can be seen that the reduction rate increases as the size of the GC, i.e., D, increases. However, there appears to be a tradeoff: a good reduction rate usually sacrifices some accuracy at the same time.

25 Figure 4. Accuracy rate using both coarse and fine classification
Figure 4 shows the accuracy rate of the test samples using both the coarse and the fine classification. It can be seen that even though the accuracy rate after the coarse classification drops as D increases, the final recognition rate does not drop in the same way. This result reflects the fact that if the coarse classification process misses the corresponding class of a test sample, the class of that test sample cannot be predicted correctly afterwards.

26 6 Conclusions This paper presents a multiresolution classification scheme based on the DCT for vision-based applications. The DCT features of a pattern can be extracted progressively according to their significance. In classifying an unknown object, most of the improbable candidate classes can be eliminated at lower resolution levels. Experiments were conducted on recognizing handwritten characters in Chinese paleography and showed that our approach performs well in this application domain. Since only preliminary experiments have been made to test our approach, much work remains to improve this system. For example, since features of different types complement one another in classification performance, using different types of vision-oriented features simultaneously could improve classification accuracy.

27 Future Works Since only preliminary experiments have been made to test our approach, much work remains to improve this system. For example, since features of different types complement one another in classification performance, using different types of vision-oriented features simultaneously could improve classification accuracy.

