HMM Based Handwritten Text Recognition Using Biometrical Data Acquisition Pen
Ondrej Rohlik, Pavel Mautner, Vaclav Matousek, Juergen Kempf
Department of Computer Science and Engineering, University of West Bohemia in Pilsen
Outline
- Data acquisition device: The BiSP pen
- Handwritten text recognition
- Hidden Markov models
- Experimental results
- Future work
Ondrej Rohlik, IEEE CIRA 2003, Kobe, Japan
Input Devices – Overview
- off-line (static): scanners, cameras*
- on-line (dynamic): electronic pens, digitizers, tablets, mouse*
Input Device: The BiSP Pen
- An electronic pen* is used for data acquisition
* built at the University of Applied Sciences in Regensburg, Germany
Input Device – Writing
Input Device – Signals
Handwritten Text Recognition
Objective: to convert handwritten sentences or phrases in analog form (off-line or on-line sources) into digital form (ASCII or Unicode).
- isolated character recognition (TM, DTW, NN)
- word recognition (HMMs)
- gesture recognition
Handwritten Text
- hand-printed characters
- spaced discrete characters
- cursive script words
Primitive (observation) Signal Description
- Pairs of x and y signals are transformed into a sequence of primitives
Table: primitives (observations) 1 to 4, each defined by the signal trend of x and y
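A minimal sketch of how paired x and y signals could be turned into a primitive sequence. The slide only states that four primitives are derived from the x and y signal trends; the rising/falling rule, the treatment of flat segments, and the numbering used below are assumptions made for illustration, and the function names are hypothetical.

```python
def trend(prev, curr):
    """Return 1 if the signal is rising or flat, 0 if falling (assumed convention)."""
    return 1 if curr >= prev else 0


def to_primitives(xs, ys):
    """Map paired x/y samples to a sequence of primitives numbered 1..4.

    The numbering 1 + 2*trend(x) + trend(y) is hypothetical; the original
    slide does not state which trend pair corresponds to which number.
    """
    if len(xs) != len(ys):
        raise ValueError("x and y signals must have the same length")
    primitives = []
    for i in range(1, len(xs)):
        tx = trend(xs[i - 1], xs[i])
        ty = trend(ys[i - 1], ys[i])
        primitives.append(1 + 2 * tx + ty)
    return primitives


if __name__ == "__main__":
    print(to_primitives([0, 1, 2, 1], [0, -1, 0, 1]))
```

Each output symbol describes one step between consecutive samples, so a signal of n samples yields n - 1 observations.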
Hidden Markov Models
- left-to-right model (used mostly in speech recognition)
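A left-to-right model allows each state to either stay where it is or move forward, never back. The sketch below builds such a transition matrix; the single forward step per state and the uniform `stay_prob` value are assumptions chosen for illustration, not parameters taken from the presented system.

```python
def left_to_right_transitions(n_states, stay_prob=0.5):
    """Build an n_states x n_states left-to-right transition matrix.

    Each state keeps probability stay_prob on its self-loop and passes the
    rest to the next state; the last state is absorbing.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if not 0.0 <= stay_prob <= 1.0:
        raise ValueError("stay_prob must lie in [0, 1]")
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0  # final state: nowhere further to go
        else:
            A[i][i] = stay_prob
            A[i][i + 1] = 1.0 - stay_prob
    return A


if __name__ == "__main__":
    for row in left_to_right_transitions(3, stay_prob=0.6):
        print(row)
```

All entries below the diagonal stay zero, which is what distinguishes this topology from a fully connected (ergodic) model.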
Hidden Markov Models
- Training – Baum-Welch algorithm
- Recognition – Backward algorithm
- The matrices that describe the model (A, B, π) are decomposed after training – one model for each letter
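For recognition, a model is scored by the probability that it generated the observed primitive sequence. The sketch below is a plain textbook backward algorithm for a discrete HMM, not the authors' implementation; observations are assumed to be 0-based symbol indices, and no scaling is applied, so very long sequences would underflow.

```python
def backward_likelihood(A, B, pi, obs):
    """Return P(obs | model) for a discrete HMM using the backward recursion.

    A[i][j]  transition probability from state i to state j
    B[i][k]  probability of emitting symbol k in state i
    pi[i]    initial probability of state i
    obs      non-empty list of 0-based symbol indices
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(A)
    beta = [1.0] * n  # beta at the final time step
    # Walk backwards from the last observation to the second one.
    for t in range(len(obs) - 1, 0, -1):
        beta = [
            sum(A[i][j] * B[j][obs[t]] * beta[j] for j in range(n))
            for i in range(n)
        ]
    # Close the recursion with the initial distribution and first emission.
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))


if __name__ == "__main__":
    A = [[0.5, 0.5], [0.0, 1.0]]
    B = [[0.9, 0.1], [0.2, 0.8]]
    pi = [1.0, 0.0]
    print(backward_likelihood(A, B, pi, [0, 1]))
```

In a recognizer, this score would be computed for every candidate model and the highest-scoring one selected.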
Word HMM Decomposition
Word HMM Composition
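One way a word model could be composed from per-letter models is to chain the letter models in spelling order. The sketch below is an assumption-laden illustration, not the procedure used in the presented system: each letter model is a hypothetical dict with a transition matrix "A" and an emission matrix "B", and the probability mass missing from the last state's row is treated as the exit probability into the next letter.

```python
def compose_word_model(letter_models, word):
    """Concatenate letter HMMs into one word HMM (A, B, pi).

    letter_models maps a character to {"A": ..., "B": ...}. The last row of
    each letter's A may sum to less than 1; the remainder is the assumed
    exit probability to the first state of the following letter.
    """
    if not word:
        raise ValueError("word must not be empty")
    blocks = [letter_models[ch] for ch in word]
    total = sum(len(m["A"]) for m in blocks)
    A = [[0.0] * total for _ in range(total)]
    B = []
    offset = 0
    for k, m in enumerate(blocks):
        n = len(m["A"])
        for i in range(n):
            for j in range(n):
                A[offset + i][offset + j] = m["A"][i][j]
            B.append(list(m["B"][i]))
        last = offset + n - 1
        leftover = 1.0 - sum(m["A"][n - 1])
        if k + 1 < len(blocks):
            A[last][offset + n] = leftover  # link to the next letter
        else:
            A[last][last] += leftover  # final letter: keep the row stochastic
        offset += n
    pi = [1.0] + [0.0] * (total - 1)  # always start in the first letter
    return A, B, pi


if __name__ == "__main__":
    models = {
        "a": {"A": [[0.6, 0.4], [0.0, 0.7]], "B": [[0.9, 0.1], [0.2, 0.8]]},
        "b": {"A": [[0.5]], "B": [[0.3, 0.7]]},
    }
    A, B, pi = compose_word_model(models, "ab")
    for row in A:
        print(row)
```

Composing models this way means only the letter models need training data; any vocabulary word can be assembled from them on demand.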
Experimental Results
- The method has been tested on three independent data sets of various sizes
- Limited number of letters used in our data sets: 15
- Reduced complexity of tagging the training set

Vocabulary size           1649    2198    5129
Recognition rate (%)        88      90      82
Recognition time (min)   17-26   27-49     360
Future Work
- speeding up the algorithm to achieve real-time recognition
- incorporating language models to improve the recognition rate
- special attention will be paid to signature analysis and signature verification
- application in tele-robotics and robot sensing
- robot-aided signature forging
Forgeries – Overview
a) genuine
b) zero-effort
c) unskilled
d) skilled
Example of Two Features
Class Boundaries
Signature Verification – Algorithms

Training algorithm
For each class C
    For each feature f
        For each pair of signatures Classes[C][i] and Classes[C][j]
            Compute the difference between Classes[C][i] and Classes[C][j] and add it to an extra variable Sum[f]
        Compute mean value mean[f] and variance var[f] of each feature over all pairs using the variable Sum[f]
    Compute the critical cluster coefficient using variances var[f] and weights w[f] over all features f

Classification algorithm
For class C to be verified
    For each pattern Classes[C][i]
        For each feature f
            Compute the difference and remember the least one over all patterns
    Sum up the products of least differences and weights w[f] and compare the sum with the critical cluster coefficient
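The two algorithms can be sketched for a single class as follows. The slide does not say how differences are measured or how the critical cluster coefficient is formed from the variances and weights; the absolute difference and the weighted sum of standard deviations used here are assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def train(signatures, weights):
    """Return an assumed critical cluster coefficient for one class.

    signatures: list of feature vectors of genuine signatures
    weights:    one weight per feature
    """
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("at least two training signatures are required")
    coefficient = 0.0
    for f, w in enumerate(weights):
        diffs = [abs(a[f] - b[f]) for a, b in pairs]
        mean = sum(diffs) / len(diffs)
        var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
        coefficient += w * sqrt(var)  # assumed combination rule
    return coefficient


def verify(candidate, signatures, weights, coefficient):
    """Accept the candidate if its weighted least differences stay within the coefficient."""
    score = 0.0
    for f, w in enumerate(weights):
        least = min(abs(candidate[f] - s[f]) for s in signatures)
        score += w * least
    return score <= coefficient


if __name__ == "__main__":
    genuine = [[1.0, 10.0], [2.0, 12.0], [4.0, 11.0]]
    w = [1.0, 1.0]
    c = train(genuine, w)
    print(verify([2.0, 11.0], genuine, w, c))
    print(verify([9.0, 30.0], genuine, w, c))
```

Because the least difference is taken per feature, a candidate may be matched against different training signatures for different features, as in the classification pseudocode above.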