Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Information Management, Chihlee Institute.

Presentation transcript:

1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Information Management, Chihlee Institute of Technology Date: 2005/12/10

2 Outline 1. Introduction 2. The proposed classification approach 3. Experiments and Discussions 4. Conclusions

3 1. Introduction Paper documents -> computer codes: OCR (Optical Character Recognition). The design of a classification system consists of two subproblems: feature extraction and classification.

4 Classification of objects (or patterns) into a number of predefined classes has been extensively studied in a wide variety of applications, such as optical character recognition (OCR), speech recognition, and face recognition.

5 Feature extraction Features are functions of the measurements that enable a class to be distinguished from other classes. No general solution to feature extraction has been found for most applications. Our purpose is to design a general classification scheme that is less dependent on domain-specific knowledge.

6 Two philosophies of classification. Statistical: the measurements that describe an object are treated only formally as statistical variables, neglecting their “meaning”. Structural: objects are regarded as compositions of structural units, usually called primitives.

7 Template matching “Template matching” is one of the most popular techniques for visual pattern recognition. We can store a template in the computer for each distinct class of characters to be recognized, and compare each unknown character with the stored set to find the best match. Suppose that the character images are two-tone (all black on white backgrounds). The matching degree can then be calculated by counting their matching bits; counting the mismatching bits instead gives the Hamming distance. However, this kind of global Boolean template is unreliable, because the bits along the edge of a character image are often subject to unpredictable variations and noise.
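As a rough illustration of the bit-counting idea on this slide, the sketch below compares two tiny two-tone bitmaps. The function names and the 3×3 example bitmaps are hypothetical and chosen only for illustration; bitmaps are assumed to be equal-sized lists of rows holding 0/1 values.

```python
def matching_bits(a, b):
    """Count the positions where two equal-sized binary bitmaps agree."""
    return sum(pa == pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def hamming_distance(a, b):
    """Count the positions where two equal-sized binary bitmaps differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


# Hypothetical stored template and unknown character (1 = ink, an assumption).
template = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]
unknown = [[0, 1, 0],
           [1, 1, 0],
           [0, 1, 1]]

print(matching_bits(template, unknown))     # bits that agree
print(hamming_distance(template, unknown))  # bits that differ
```

The two counts always add up to the number of pixels, so maximizing matching bits and minimizing the Hamming distance pick the same template.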

8 This paper presents a template-based classification approach for recognizing the characters in rare books transcribed by ancient calligraphers. Traditional approaches first apply a feature extraction method, such as thinning, to the characters and then recognize them from the extracted features; we instead use the original character images directly. The system operates in two phases: training and classification. In the training phase, we superimpose a number of training samples belonging to the same character class and calculate the fraction of samples in which a given bit of the character bitmap is 1, which gives the template of that character class. In the classification phase, an unknown character is recognized by finding the character class whose template best fits it under the L1 (or Manhattan) distance criterion.

9 2. The proposed classification approach The ultimate goal of classification is to assign an unknown pattern x to one of M possible classes ($c_1, c_2, \ldots, c_M$). Each pattern is represented by a set of D features, viewed as a D-dimensional feature vector.

System Architecture Figure 1. The framework of our classification approach.

Template Generation The template for a character class is the representative image for that class. Templates are generated from the training samples in the training phase. To obtain templates that are not only representative but also easy to match, we developed two types of templates: the statistical templates and the average templates.

12 Figure 2. Some training samples of character class “心”.

Statistical templates To generate a template that summarizes the characteristics of the training samples of the same character class, we superimpose the images of the training samples belonging to that class to obtain the probability of each bit (or pixel) being white given that class of samples. We call templates of this kind statistical templates. The statistical templates can be generated by the following equation:
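The superimposing step described above can be sketched as a per-pixel average over the training bitmaps of one class. This is a minimal sketch, not the authors' implementation: the function name and the 2×2 sample data are hypothetical, and it assumes that a bit value of 1 marks the pixel state being counted.

```python
def statistical_template(samples):
    """Return, for each pixel, the fraction of samples whose bit is 1.

    `samples` is a non-empty list of equal-sized binary bitmaps
    (lists of rows of 0/1 values) that all belong to one character class.
    """
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[sum(s[r][c] for s in samples) / n for c in range(cols)]
            for r in range(rows)]


# Four hypothetical 2x2 training samples of a single class.
samples = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[0, 0], [1, 1]],
    [[1, 1], [1, 1]],
]

template = statistical_template(samples)
print(template)
```

Each entry of the result lies between 0 and 1, so the template is a grey-level image rather than a bitmap.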

14 Figure 3. The statistical template of character class “心”.

Average templates We intend to develop a binary template that can represent the majority of the training samples belonging to the same character class. Mathematically, the binary templates, called the average templates, can be generated by the following equation:
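One way to read "a binary template that represents the majority of the training samples" is a per-pixel majority vote. The sketch below follows that reading; the 0.5 threshold, the choice to send ties to 0, the function name, and the sample data are all assumptions made for illustration.

```python
def average_template(samples, threshold=0.5):
    """Binary template: a pixel is 1 when more than `threshold` of the samples set it.

    `samples` is a non-empty list of equal-sized binary bitmaps of one class.
    Ties (fraction exactly equal to the threshold) are mapped to 0 here,
    which is an arbitrary choice for this sketch.
    """
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    template = []
    for r in range(rows):
        row = []
        for c in range(cols):
            fraction = sum(s[r][c] for s in samples) / n
            row.append(1 if fraction > threshold else 0)
        template.append(row)
    return template


# Four hypothetical 2x2 training samples of a single class.
samples = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[0, 0], [1, 1]],
    [[1, 1], [1, 1]],
]

binary_template = average_template(samples)
print(binary_template)
```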

16

Template Matching Distance measurement Two major types of distance measure can be applied: the L1 (or Manhattan) distance and the L2 (or Euclidean) distance.

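18 Then the L1-metric-based distance between f and the statistical template of class $c_i$ can be defined as the sum, over all pixels, of the absolute differences between f and that template. Similarly, the distance between the test character image f and the average template of class $c_i$ can be defined as the sum of the absolute pixel differences between f and the average template. Because both f and the average template are binary, the distance between them can also be calculated by the Hamming distance criterion, i.e., by counting the bits on which they agree or differ, which is computationally more efficient than the L1 distance criterion.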
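The distance measures named on these slides can be sketched as follows. The function names and the 2×2 example values are hypothetical; the image is a binary bitmap and the template may hold either fractions (statistical template) or bits (average template).

```python
import math


def l1_distance(image, template):
    """Manhattan distance: sum of absolute per-pixel differences."""
    return sum(abs(p - t)
               for row_i, row_t in zip(image, template)
               for p, t in zip(row_i, row_t))


def l2_distance(image, template):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((p - t) ** 2
                         for row_i, row_t in zip(image, template)
                         for p, t in zip(row_i, row_t)))


f = [[1, 0], [1, 1]]                         # hypothetical test character
stat_template = [[0.75, 0.25], [0.75, 1.0]]  # hypothetical statistical template
avg_template = [[1, 0], [1, 0]]              # hypothetical average template

print(l1_distance(f, stat_template))
print(l2_distance(f, stat_template))
print(l1_distance(f, avg_template))
```

When both arguments are binary, every absolute difference is 0 or 1, so the L1 distance equals the number of differing bits, which is why bit counting can replace the arithmetic for average templates.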

Decision rules To decide the expected class of f, the following decision rules can be applied: 1) Rule MSMD (Minimum Statistical Matching Distance): $E(x) = \arg\min_{1 \le i \le M} \{\text{distance between } f \text{ and the statistical template of } c_i\}$; 2) Rule MAMD (Minimum Average Matching Distance): $E(x) = \arg\min_{1 \le i \le M} \{\text{distance between } f \text{ and the average template of } c_i\}$. The class whose template best matches the unknown character image is therefore taken as the expected class of the character.
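Both rules amount to an arg-min over the class templates, which can be sketched as below. The class labels "A" and "B", the template values, and the helper names are hypothetical; the L1 distance is used for both template types in this sketch.

```python
def l1_distance(image, template):
    """Sum of absolute per-pixel differences between image and template."""
    return sum(abs(p - t)
               for row_i, row_t in zip(image, template)
               for p, t in zip(row_i, row_t))


def classify(image, templates):
    """Return the label of the class whose template is closest to `image`.

    `templates` maps a class label to its template. Passing statistical
    templates corresponds to rule MSMD, average templates to rule MAMD.
    """
    return min(templates, key=lambda label: l1_distance(image, templates[label]))


# Hypothetical templates for two classes.
statistical_templates = {
    "A": [[0.9, 0.1], [0.8, 0.9]],
    "B": [[0.1, 0.9], [0.2, 0.1]],
}
average_templates = {
    "A": [[1, 0], [1, 1]],
    "B": [[0, 1], [0, 0]],
}

unknown = [[1, 0], [1, 0]]
print(classify(unknown, statistical_templates))  # rule MSMD
print(classify(unknown, average_templates))      # rule MAMD
```

Each classification compares the unknown image against every class template, so the cost grows linearly with the number of classes.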

20 3. Experimental Results A famous handwritten rare book, the Kin-Guan bible (金剛經): 18,600 samples, 640 classes. Each character image was transformed into a 48×48 bitmap. Part of the samples are used for testing; the others are used for training.

21 Figure 5. The accuracy rate of each decision rule.

22 4. Conclusions This paper presents a template-based classification approach for recognizing handwritten characters in Chinese paleography. Instead of extracting features from the images of the characters, our template matching method applies the original character images directly to recognize the characters. In our approach, two types of templates (i.e., the statistical templates and the average templates) are generated in the training phase, and two decision rules (i.e., MSMD and MAMD) based on the two types of templates are applied to recognize the unknown characters in the classification phase. Both decision rules achieve a higher accuracy rate than the PMD decision rule.

23 Future Works In the future, this system could be improved in two ways. Since features of different types complement one another in classification performance, using features of different types simultaneously could improve classification accuracy. In addition, to alleviate the load of character recognition, a coarse classification scheme needs to be incorporated into our system.