Non-linear Normalization to Improve Telugu OCR
Atul Negi, Chakravarthy Bhagvati, V.V. Suresh Kumar
25th June 2002, IEMCT, CDAC Pune


Slide 1: Non-linear Normalization to Improve Telugu OCR
Atul Negi, Chakravarthy Bhagvati, V.V. Suresh Kumar
Department of Computer and Information Sciences, University of Hyderabad

Slide 2: Acknowledgements
Ministry of Information Technology, New Delhi, under the project Resource Center for Indian Language Technology Solutions (Telugu)

Slide 3: Organization of Presentation
Introduction
Telugu Script
Classification By Template Matching
Complete OCR Algorithm
Nonlinear Normalization
Results
Concluding Remarks
Bibliography
Contact Information

Slide 4: Introduction
OCR Research in Indian Scripts
– Initial era. Pioneers: RMK Sinha, Deekshatulu, ISI Kolkata
– Maturity: mid-nineties. Complete systems for Bangla and Devanagari
Recent Status of OCR in Indian Scripts
– ICDAR 1999, Bangalore
– ILOCR Workshop, 2002, UoH
– Sadhana, Indian Acad. Sci., Feb. '02, Special Issue

Slide 5: Introduction: Progress of Telugu OCR
Structural approach (ref. 4)
– Moments and size of the character used
Neural networks (ref. 1)
– Connected components, training and recognition
Template matching (ref. 5)
– Connected components, templates and linear size normalization
Wavelet multi-resolution analysis (ref. 6)

Slide 6: Telugu Script
Features of Telugu
– Basic vowel sounds (Acchulu): 16 symbols
– Simple consonants (Hallulu): 36 symbols
– Vowel sounds (Matraas): 16 symbols
– Half consonants (Voththus): 30 symbols
Complexity of Character Recognition
– Characters and syllables are composed from the above symbols; about 5000 are in common use.
Reducing Complexity
– Identify the glyphs used in composition: about 400

Slide 7: A Few Telugu Characters
[Figure: sample glyphs of Achchus, Hallus, Maatras and Voththus]

Slide 8: Classification By Template Matching
Why Template Matching?
– Feature extraction effectiveness
– Dimensionality (size 32x32)
Fringe Distances (ref. 10)
– No need for blurring
– Distances pre-computed and stored
– Ease of matching
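The fringe-distance idea can be illustrated with a minimal sketch. It rests on assumptions that the slides do not state: the fringe map stores, for every pixel, the city-block distance to the nearest black pixel, and the distance between a glyph and a template is the one-directional sum of the template's map values under the glyph's black pixels. The function names `fringe_map`, `fringe_distance` and `classify` are hypothetical, and the measure defined in ref. 10 may differ in metric or symmetry.

```python
import numpy as np

def fringe_map(img):
    """Distance from every pixel to the nearest black (non-zero) pixel.

    City-block metric is an assumption made for this sketch.
    """
    img = np.asarray(img, dtype=bool)
    black = np.argwhere(img)                 # (K, 2) coordinates of black pixels
    if black.size == 0:
        raise ValueError("image has no black pixels")
    rows, cols = np.indices(img.shape)
    # Broadcast to shape (height, width, K), then keep the minimum over K.
    dist = (np.abs(rows[..., None] - black[:, 0])
            + np.abs(cols[..., None] - black[:, 1]))
    return dist.min(axis=-1)

def fringe_distance(img, template_map):
    """Sum of the template's fringe-map values under the black pixels of img."""
    return int(template_map[np.asarray(img, dtype=bool)].sum())

def classify(img, template_maps):
    """Return the key of the stored template map with the smallest distance."""
    return min(template_maps, key=lambda k: fringe_distance(img, template_maps[k]))

# Two toy 5x5 glyphs: a vertical bar and a horizontal bar.
vertical = np.zeros((5, 5), dtype=int)
vertical[:, 2] = 1
horizontal = np.zeros((5, 5), dtype=int)
horizontal[2, :] = 1

maps = {"vertical": fringe_map(vertical), "horizontal": fringe_map(horizontal)}
print(classify(vertical, maps))  # prints "vertical"
```

Because the template maps depend only on the templates, they can be computed once and stored, which is consistent with the "pre-computed and stored" point on the slide; matching a glyph then reduces to one masked sum per template.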

Slide 9: The Complete OCR Algorithm
Read an input binary image
Segment the image into words
Extract the connected components from each word
For each component
– (a) Normalize size to match stored templates
– (b) Compute fringe distance map
– (c) Compute fringe distance from all templates
– (d) Output template with smallest fringe distance
– (e) Convert template code to ISCII
Store ISCII output in a file
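The per-component loop above can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: components are taken to be 8-connected, size normalization is plain nearest-neighbour (linear) resampling, the distance measure is passed in as a parameter, and word segmentation and ISCII conversion are left out. All function names are hypothetical.

```python
from collections import deque
import numpy as np

def connected_components(img):
    """Return bounding-box crops of 8-connected black regions, left to right."""
    img = np.asarray(img, dtype=bool)
    seen = np.zeros_like(img)
    comps = []
    for r0, c0 in np.argwhere(img):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, pixels = deque([(r0, c0)]), []
        while queue:                          # breadth-first flood fill
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < img.shape[0] and 0 <= cc < img.shape[1]
                            and img[rr, cc] and not seen[rr, cc]):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        pts = np.array(pixels)
        (r_min, c_min), (r_max, c_max) = pts.min(axis=0), pts.max(axis=0)
        crop = np.zeros((r_max - r_min + 1, c_max - c_min + 1), dtype=bool)
        crop[pts[:, 0] - r_min, pts[:, 1] - c_min] = True
        comps.append((c_min, crop))
    return [crop for _, crop in sorted(comps, key=lambda t: t[0])]

def resize_nearest(img, size=32):
    """Linear size normalization by nearest-neighbour sampling (step a)."""
    img = np.asarray(img)
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]

def recognize(word_img, templates, distance, size=32):
    """For each component, return the code of the closest template (steps c, d)."""
    codes = []
    for comp in connected_components(word_img):
        glyph = resize_nearest(comp, size)
        codes.append(min(templates, key=lambda code: distance(glyph, templates[code])))
    return codes
```

In a fuller version, `distance` would be a fringe distance computed against stored template maps, and `resize_nearest` is the step that nonlinear normalization would replace.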

Slide 10: Nonlinear Normalization
Need for Normalization
– Preprocessing step to equalize size, position, inclination, etc., to ease recognition
– Necessary when recognition is by template matching
Non-Linear Normalization
– Not all parts of the character image are treated equally
– Hypothesis: differences between characters will be increased, and discrimination therefore improved

Slide 11: Nonlinear Normalization Technique
Line density equalization, analogous to histogram density equalization (ref. 13)
Generalization: Feature Density Equalization (ref. 14)
– Projection of feature density onto the horizontal and vertical axes
– Feature projection functions H(i) and V(j), for input indices i = 1, …, I and j = 1, …, J
– For each point (i, j) in the input image of size (I, J), a new position (m, n) is computed in the normalized image of size (M, N)

Slide 12: Nonlinear Normalization Technique
Feature Density Equalization
– Feature projection functions H(k) and V(l), for input indices i = 1, …, I and j = 1, …, J
– Each point (i, j) in the input image of size (I, J) is mapped to a new position (m, n) in the output image of size (M, N):
– $m = M \cdot \dfrac{\sum_{k=1}^{i} H(k)}{\sum_{k=1}^{I} H(k)}$
– $n = N \cdot \dfrac{\sum_{l=1}^{j} V(l)}{\sum_{l=1}^{J} V(l)}$
NSN by dot density uses the projections
– $H(i) = \sum_{j=1}^{J} f(i, j) + \alpha_H$
– $V(j) = \sum_{i=1}^{I} f(i, j) + \alpha_V$
where $\alpha_H$ and $\alpha_V$ are constant offsets.
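A minimal sketch of normalization by dot density follows. It assumes that f(i, j) is 1 for black pixels and 0 otherwise, that the normalizing sums run over the whole axis (k = 1, …, I and l = 1, …, J), and that the offsets added to H and V are positive constants (here 1.0, an arbitrary choice). The output is filled by backward mapping so that no output pixel is left empty; that detail, the rounding rule, and the function names are assumptions of this sketch rather than something stated on the slides.

```python
import numpy as np

def density_projections(f, alpha_h=1.0, alpha_v=1.0):
    """H(i) = sum_j f(i, j) + alpha_h and V(j) = sum_i f(i, j) + alpha_v."""
    f = np.asarray(f, dtype=float)
    return f.sum(axis=1) + alpha_h, f.sum(axis=0) + alpha_v

def coordinate_map(density, out_size):
    """Map each input index to an output index in 1..out_size (returned 0-based)."""
    cumulative = np.cumsum(density)
    positions = np.ceil(cumulative * out_size / cumulative[-1]).astype(int)
    return np.clip(positions, 1, out_size) - 1

def nonlinear_normalize(f, out_shape, alpha_h=1.0, alpha_v=1.0):
    """Resample f to out_shape = (M, N) so that dense regions are stretched."""
    f = np.asarray(f)
    M, N = out_shape
    H, V = density_projections(f, alpha_h, alpha_v)
    m_of_i = coordinate_map(H, M)            # non-decreasing, last value is M - 1
    n_of_j = coordinate_map(V, N)            # non-decreasing, last value is N - 1
    # Backward mapping: each output index takes the first input index
    # whose mapped position reaches it.
    src_i = np.searchsorted(m_of_i, np.arange(M), side="left")
    src_j = np.searchsorted(n_of_j, np.arange(N), side="left")
    return f[np.ix_(src_i, src_j)]

# A 4x4 image whose only ink is one full line at index i = 1.
f = np.zeros((4, 4), dtype=int)
f[1, :] = 1
out = nonlinear_normalize(f, (8, 8))
print(out.sum(axis=1))  # ink per output line along the first axis
```

In this toy case the dense line occupies five of the eight output lines, whereas uniform (linear) scaling would give it two. The sketch therefore shows the intended effect: regions with more ink receive more of the output grid.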

Slide 13: Example

Slide 14: Normalized Glyphs

Slide 15: Results

Slide 16: Image 1
Misclassifications: 1 (NSN, nonlinear shape normalization), 7 (linear normalization)
Total glyphs: 145
Accuracy: 99% (NSN), 95.2% (linear normalization)

Slide 17: Image 5
Misclassifications: 105 (NSN), 136 (linear normalization)
Total glyphs: 354
Accuracy: 70.3% (NSN), 61.6% (linear normalization)
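The reported percentages follow directly from the glyph and misclassification counts for Image 1 and Image 5. The short check below recomputes them; the helper name `accuracy_percent` is only for illustration.

```python
def accuracy_percent(total_glyphs, misclassified):
    """Percentage of glyphs classified correctly."""
    return 100.0 * (total_glyphs - misclassified) / total_glyphs

# (total glyphs, NSN misclassifications, linear misclassifications) from the slides
results = {"Image 1": (145, 1, 7), "Image 5": (354, 105, 136)}

for name, (total, nsn_errors, linear_errors) in results.items():
    nsn = accuracy_percent(total, nsn_errors)
    linear = accuracy_percent(total, linear_errors)
    print(f"{name}: NSN {nsn:.1f}%, linear {linear:.1f}%")
# Image 1: NSN 99.3%, linear 95.2%
# Image 5: NSN 70.3%, linear 61.6%
```

On these two images the gain from NSN is about 4 percentage points for Image 1 and about 9 for Image 5.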

Slide 18: Discussion
Why should nonlinear normalization succeed despite shape distortions?
Is this the best that we can do?
Why not use this always?

Slide 19: Concluding Remarks
Non-linear normalization appears to improve OCR accuracy (based on 1300 glyphs examined)
More experimentation with the features is required to overcome problems such as gaps
Further testing on a variety of fonts and sizes is required before the recognition improvement can be asserted with more confidence

Slide 20: Bibliography
1. M.B. Sukhswami, P. Seetharamulu, and Arun K. Pujari, “Recognition of Telugu characters using neural networks”, Int. J. of Neural Systems, 6(3):317 (1995).
2. R. Kasturi and S.N. Srihari (Eds.), Proc. Fifth International Conf. Document Analysis and Recognition, Bangalore, India, IEEE Computer Society Press, Los Alamitos, CA (1999).
3. B.B. Chaudhuri, U. Garain, and M. Mitra, “On OCR of the most popular two Indian language scripts: Devanagari and Bangla”, in Visual Text Recognition and Document Processing, Ed. N. Murshed, World Scientific Press (2000).
4. SNS Rajasekharan and B.L. Deekshatulu, “Generation and recognition of printed Telugu characters”, Computer Graphics and Image Processing, 6 (1977).
5. Atul Negi, Chakravarthy Bhagvati, and B. Krishna, “An OCR system for Telugu”, Proc. Sixth International Conf. Document Analysis and Recognition, Seattle, USA, IEEE Computer Society Press, Los Alamitos, CA (2001).
6. A.K. Pujari, C.D. Naidu, and B.C. Jinaga, “An adaptive and intelligent character recognizer for Telugu scripts using multiresolution analysis and associative memory”, Proc. Canadian Conf. on AI, Calgary, Canada, May 2002, LNCS, Springer Verlag (2002).
7. B. Krishna, “Design and implementation of a Telugu script recognition system”, Technical report, Dept. of Computer and Information Sciences, University of Hyderabad, Hyderabad, India (2000).
8. R.C. Gonzalez and R.E. Woods, Digital Image Processing, Addison-Wesley (1993).
9. O.D. Trier, A.K. Jain, and R. Taxt, “Feature extraction methods for character recognition: a survey”, Pattern Recognition, 29(4) (1996).
10. R.L. Brown, “The fringe distance measure: an easily calculated image distance measure with recognition results comparable to Gaussian blurring”, IEEE Trans. System Man and Cybernetics, 24(1) (1994).
11. K. Wong, R. Casey, and F. Wahl, “Document analysis system”, IBM J. Research and Development, 26(6) (1982).
12. G. Nagy, S. Seth, and M. Vishwanathan, “A prototype document image analysis system for technical journals”, Computer, 25(7) (1992).
13. H. Yamada, K. Yamamoto, and T. Saito, “A nonlinear normalization method for handprinted Kanji character recognition: line density equalization”, Pattern Recognition, 23(9) (1990).
14. S-W. Lee and J-S. Park, “Nonlinear shape normalization methods for the recognition of large set handwritten characters”, Pattern Recognition, 27(7) (1994).
15. V.V. Suresh Kumar, “Non-linear Normalization Techniques to Improve OCR”, Technical report, Dept. of Computer and Information Sciences, University of Hyderabad, Hyderabad, India (2002).

Slide 21: Contact Information
Atul Negi, Chakravarthy Bhagvati
Department of Computer and Information Sciences, University of Hyderabad
Hyderabad, AP, INDIA