Download presentation
Presentation is loading. Please wait.
Published byLindsey Singleton Modified over 9 years ago
1
1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute of Technology Date:2005/12/10
2
2 Outline 1. Introduction 2. The proposed classification approach 3. Experiments and Discussions 4. Conclusions
3
3 1. Introduction Paper documents -> Computer codes OCR(Optical Character Recognition) The design of classification systems consists of two subproblems: Feature extraction Classification
4
4 Classification of objects (or patterns) into a number of predefined classes has been extensively studied in wide variety of applications such as Optical character recognition (OCR) Speech recognition Face recognition
5
5 Feature extraction Features are functions of the measurements that enable a class to be distinguished from other classes. It has not found a general solution in most applications. Our purpose is to design a general classification scheme, which is less dependent on domain-specific knowledge.
6
6 Two philosophies of classification Statistical The measurements that describe an object are treated only formally as statistical variables, neglecting their “ meaning ” Structural Regard objects as compositions of structural units, usually called primitives.
7
7 Template matching “ Template matching ” is one of the most popular techniques for visual pattern recognition. We can store a template in the computer for each distinct class of characters to be recognized, and to compare the unknown characters with the stored set to find the best matching. Suppose that the character images are two- tone – all black on white backgrounds. The matching degree can be calculated by counting their matching bits, which is known as the Hamming distance criterion. However, this kind of global Boolean template is unreliable. It is because the bits along the edge of a character image are often subject to unpredictable variations and noises.
8
8 This paper presents a template-based classification approach to recognize the characters in the rare books transcribed by ancient calligraphers. Compared to traditional approaches, which first use some feature extraction methods, like the thinning method, to extract the features of the characters and then recognize them by these features, we apply the original character images directly to achieve this goal. The system is operated in two phases: training and classification. In the training phase, we superimpose a number of training samples belonging to the same character class and calculate the fraction of time that a given bit in the character bitmap is 1 to construct the template of the character class. Then, in the classification phase, an unknown character can be recognized by finding the character class whose template is best fitted for the unknown character via the L1 (or Manhattan) distance criterion.
9
9 2. The proposed classification approach The ultimate goal of classification is to classify an unknown pattern x to one of M possible classes (c 1, c 2, …, c M ). Each pattern is represented by a set of D features, viewed as a D- dimensional feature vector.
10
10 2.1 System Architecture Figure 1. The framework of our classification approach.
11
11 2.2 Template Generation The template for a character class is the representative image for the character class. Basically, templates are generated from the training samples in the training phase. For the purpose of generating the templates that are not only representative but also easy to be matched, we developed two types of templates: the statistical templates the average templates.
12
12 Figure 2. Some training samples of character class “ 心 ”.
13
13 2.2.1 Statistical templates To generate a template that summarizes the characteristics of the training samples of the same character class, we superimpose the images of the training samples belonging to the same class to obtain the probability of each bit (or pixel) being white in the presence of that class of samples. We call this kind of templates the statistical templates. The statistical templates can be generated by the following equation:
14
14 Figure 3. The statistical template of character class “ 心 ”.
15
15 2.2.2 Average templates We intend to develop a binary template that can represent the majority of the training samples belonging to the same character class. Mathematically, the binary templates, called the average templates, can be generated by the following equation:
16
16
17
17 2.3 Template Matching 2.3.1 Distance measurement Basically, two major types of distance measure can be applied: L1 (or Manhattan) distance L2 (or Euclidean) distance
18
18 Then the L1-metric-based distance between f and can be defined as Similarly, the distance between the test character image f and the average template of class ci, can be defined as The distance between f and can also be calculated by the Hamming distance criterion, i.e., counting their matching bits, which is computationally efficient than the L1 distance criterion.
19
19 2.3.2 Decision rules To decide the expected class of f, the following decision rules can be applied: 1) Rule MSMD (Minimum Statistical Matching Distance): E(x) = arg min 1 i M { }, 2) Rule MAMD (Minimum Average Matching Distance): E(x) = arg min 1 i M { }. Therefore, the class whose template matches the unknown character image most will be regarded as the expected class of the character.
20
20 3. Experimental Results A famous handwritten rare book, Kin-Guan bible ( 金剛經 ) 18,600 samples. 640 classes. Each character image was transformed into a 48×48 bitmap. 1000 of the 18600 samples are used for testing; the others are used for training.
21
21 Figure 5. The accuracy rate of each decision rule.
22
22 4. Conclusions This paper presents a template-based classification approach for recognizing handwritten characters in Chinese paleography. Instead of extracting features from the images of the characters, our template matching method apply the original character images directly to recognize the characters. In our approach, two types of templates (i.e., the statistical templates and the average templates) are generated in the training phase, and two decision rules (i.e., MSMD and MAMD) based on the two types of templates are applied to recognize the unknown characters in the classification phase. Both decision rules are better than the PMD decision rule in terms of the accuracy rate.
23
23 Future Works In the future, some works can be done to improve this system: since features of different types complement one another in classification performance, by using features of different types simultaneously, classification accuracy could be improved; In order to alleviate the load of the character recognition, a coarse classification scheme needs to be involved in our system.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.