Large Vocabulary Unconstrained Handwriting Recognition. J. Subrahmonia, Pen Technologies, IBM T. J. Watson Research Center.
Pen Technologies: pen-based interfaces in mobile computing.
Mathematical Formulation. H : handwriting evidence on the basis of which a recognizer will make its decision, H = {h1, h2, h3, h4, ..., hm}. W : word string from a large vocabulary, W = {w1, w2, w3, w4, ..., wn}. Recognizer : choose the word string that is most probable given the evidence, W* = argmax_W P(W | H).
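A minimal sketch of this decision rule over a toy vocabulary; the two scoring functions (language_model_score and handwriting_model_score) are hypothetical placeholders, not the recognizer's actual models:

# Toy decision rule W* = argmax_W P(W) * P(H | W).
# Both scoring functions below are hypothetical stand-ins.

def language_model_score(word):
    """Hypothetical source model P(W) over a tiny vocabulary."""
    return {"cat": 0.6, "cart": 0.3, "cut": 0.1}.get(word, 1e-9)

def handwriting_model_score(features, word):
    """Hypothetical channel model P(H | W); a real system would use an HMM."""
    return 1.0 / (1.0 + abs(len(features) - 3 * len(word)))

def recognize(features, vocabulary):
    """Pick the word maximizing P(W) * P(H | W)."""
    return max(vocabulary,
               key=lambda w: language_model_score(w) * handwriting_model_score(features, w))

print(recognize(features=list(range(9)), vocabulary=["cat", "cart", "cut"]))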
Mathematical Formulation: source-channel view. By Bayes' rule, P(W | H) = P(W) P(H | W) / P(H), so W* = argmax_W P(W) P(H | W), where P(W) is the source (language model) and P(H | W) is the channel (handwriting model).
Source Channel Model: WRITER -> DIGITIZER -> FEATURE EXTRACTOR -> DECODER. The writer, digitizer, and feature extractor together form the channel that turns the intended word string W into the observed evidence H; the decoder recovers W from H.
Source Channel Model components: handwriting modeling (HMMs), language modeling, and the search strategy used by the decoder.
Hidden Markov Models: start from a memoryless model; add memory to get a Markov model; hide something to get a mixture model; do both (add memory and hide something) to get a Hidden Markov Model. (Alan B. Poritz, "Hidden Markov Models: A Guided Tour", ICASSP 1988.)
Memoryless Model. COIN: Heads (1) with probability p, Tails (0) with probability 1-p. Flip the coin 10 times (IID random sequence). Sequence 1 0 1 0 0 0 1 1 1 1: Probability = p*(1-p)*p*(1-p)*(1-p)*(1-p)*p*p*p*p = p^6 * (1-p)^4.
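A small sketch of this IID computation (the function name iid_sequence_prob is mine; the sequence is the one read off the probability product above):

def iid_sequence_prob(seq, p):
    """Probability of an IID coin sequence: p for heads (1), 1 - p for tails (0)."""
    prob = 1.0
    for bit in seq:
        prob *= p if bit == 1 else (1.0 - p)
    return prob

# The sequence 1 0 1 0 0 0 1 1 1 1 has probability p^6 * (1-p)^4.
print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], p=0.7))  # 0.7**6 * 0.3**4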
Add Memory: Markov Model. 2 coins: COIN 1 => p(1) = 0.9, p(0) = 0.1; COIN 2 => p(1) = 0.1, p(0) = 0.9. Experiment: flip COIN 1 and note the outcome; after each flip, if the outcome was Heads flip COIN 1 next, else flip COIN 2; repeat. Sequence 1100: Probability = 0.9*0.9*0.1*0.9 = 0.0729. Sequence 1010: Probability = 0.9*0.1*0.1*0.1 = 0.0009.
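A sketch of the two-coin Markov model (the helper names are mine); the coin used for each flip, i.e. the state, is determined by the previous outcome:

P_HEADS = {1: 0.9,  # COIN 1: used for the first flip and after every head
           2: 0.1}  # COIN 2: used after every tail

def markov_sequence_prob(seq):
    """Probability of an output sequence under the two-coin Markov model."""
    prob, coin = 1.0, 1                # start by flipping COIN 1
    for bit in seq:
        p_heads = P_HEADS[coin]
        prob *= p_heads if bit == 1 else (1.0 - p_heads)
        coin = 1 if bit == 1 else 2    # the outcome selects the next coin (state)
    return prob

print(markov_sequence_prob([1, 1, 0, 0]))  # 0.9 * 0.9 * 0.1 * 0.9 = 0.0729
print(markov_sequence_prob([1, 0, 1, 0]))  # 0.9 * 0.1 * 0.1 * 0.1 = 0.0009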
State Sequence Representation: in this (non-hidden) Markov model, each observed output sequence corresponds to a unique state sequence.
Hide the states => Hidden Markov Model: the states s1, s2 are no longer observable; only the output sequence is seen.
Why use Hidden Markov Models instead of non-hidden? Hidden Markov Models can be smaller (fewer parameters to estimate). States may be truly hidden: the position of the hand, the positions of the articulators.
Summary of HMM Basics. We are interested in assigning probabilities p(H) to feature sequences. Memoryless model: this model has no memory of the past. Markov noticed that in some sequences the future depends on the past; he introduced the concept of a STATE, an equivalence class of the past that influences the future. Hide the states: HMM.
Hidden Markov Models: given an observed sequence H, (1) compute p(H) for decoding, (2) find the most likely state sequence for a given Markov model (Viterbi algorithm), (3) estimate the parameters of the Markov source (training).
Compute p(H). Example HMM: states s1, s2, s3 (s3 final), output alphabet {a, b}. Arcs: s1->s1 with probability 0.5, emitting a with 0.8 and b with 0.2; s1->s2 with probability 0.3, emitting a with 0.7 and b with 0.3; a null arc s1->s2 with probability 0.2; s2->s2 with probability 0.4, emitting a with 0.5 and b with 0.5; s2->s3 with probability 0.5, emitting a with 0.3 and b with 0.7; a null arc s2->s3 with probability 0.1.
Compute p(H), contd. Compute p(H) where H = a a b b. Enumerate all ways of producing h1 = a: every path through s1, s2, s3 whose first output is a, with its probability.
Compute p(H), contd. Enumerate all ways of producing h1 = a, h2 = a: extend each partial path by every arc that outputs the next symbol, multiplying the probabilities along the way.
Compute p(H): can save computation by combining paths that reach the same state after producing the same prefix of H.
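A brute-force sketch of the enumeration, using the arc probabilities read off the trellis slide below (the ARCS table and function name are mine). It sums the probability of every complete path that outputs exactly H; the trellis recursion on the following slides gets the same answer with far less work:

# Arcs of the example HMM: (source, destination, output symbol, probability).
# Output None marks a null arc (no symbol produced); emitting arcs carry
# transition probability times output probability.
ARCS = [
    ("s1", "s1", "a", 0.5 * 0.8), ("s1", "s1", "b", 0.5 * 0.2),
    ("s1", "s2", "a", 0.3 * 0.7), ("s1", "s2", "b", 0.3 * 0.3),
    ("s1", "s2", None, 0.2),
    ("s2", "s2", "a", 0.4 * 0.5), ("s2", "s2", "b", 0.4 * 0.5),
    ("s2", "s3", "a", 0.5 * 0.3), ("s2", "s3", "b", 0.5 * 0.7),
    ("s2", "s3", None, 0.1),
]

def path_probs(state, remaining):
    """Yield the probability of every path from `state` to s3 emitting `remaining`."""
    if state == "s3":
        if not remaining:
            yield 1.0
        return
    for src, dst, out, p in ARCS:
        if src != state:
            continue
        if out is None:                              # null arc: consumes no output
            for rest in path_probs(dst, remaining):
                yield p * rest
        elif remaining and out == remaining[0]:      # emitting arc: consumes one symbol
            for rest in path_probs(dst, remaining[1:]):
                yield p * rest

print(sum(path_probs("s1", "aabb")))                 # p(H) for H = a a b b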
Compute p(H): Trellis Diagram. Rows are the states s1, s2, s3; columns are the prefixes 0 (nothing), a, aa, aab, aabb. Emitting arcs carry weight (transition probability) x (output probability): s1->s1 0.5x0.8 for a and 0.5x0.2 for b; s1->s2 0.3x0.7 for a and 0.3x0.3 for b; s2->s2 0.4x0.5 for either symbol; s2->s3 0.5x0.3 for a and 0.5x0.7 for b; the null arcs s1->s2 (0.2) and s2->s3 (0.1) stay within a column.
Basic Recursion. Prob(node) = sum over predecessors of Prob(predecessor) x Prob(predecessor -> node). Boundary condition: Prob(s1, 0) = 1. Filling in the trellis column by column (for example, Prob(s1, a) = 0.4) gives p(aabb) at the final state s3.
More Formally: the Forward Algorithm. Define alpha_j(s) = probability of producing h1 ... hj and ending in state s. Then alpha_0(s1) = 1, alpha_j(s) = sum over predecessor states s' of alpha_{j-1}(s') x Prob(s' -> s, output hj), and p(H) = alpha_m(final state).
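A sketch of this forward pass over the same trellis, reusing the ARCS table from the enumeration sketch above; treating the null arcs inside each column (in forward state order) is an assumption about how the slide fills the trellis:

STATES = ["s1", "s2", "s3"]     # listed so that null arcs only go forward

def forward(observations, start="s1", final="s3"):
    """alpha[s] = total probability of producing the prefix seen so far and being in s."""
    alpha = {s: 0.0 for s in STATES}
    alpha[start] = 1.0
    for src, dst, out, p in ARCS:            # null arcs inside the initial column
        if out is None:
            alpha[dst] += alpha[src] * p
    for h in observations:
        new = {s: 0.0 for s in STATES}
        for src, dst, out, p in ARCS:        # emitting arcs consume the symbol h
            if out == h:
                new[dst] += alpha[src] * p
        for src, dst, out, p in ARCS:        # then null arcs inside the new column
            if out is None:
                new[dst] += new[src] * p
        alpha = new
    return alpha[final]

print(forward("aabb"))    # p(H), matching the brute-force enumeration above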
Find Most Likely Path for aabb (Dynamic Programming / Viterbi). MaxProb(node) = MAX over predecessors of MaxProb(predecessor) x Prob(predecessor -> node); keeping a back-pointer at each node recovers the best path. The trellis is filled exactly as in the forward pass, with max replacing sum.
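The same trellis with max in place of sum gives the Viterbi sketch below; it reuses ARCS and STATES from the sketches above and tracks the best state sequence alongside its probability:

def viterbi(observations, start="s1", final="s3"):
    """Return (probability, state sequence) of the single best path producing the observations."""
    best = {s: (0.0, []) for s in STATES}          # state -> (max prob, best path so far)
    best[start] = (1.0, [start])
    for src, dst, out, p in ARCS:                  # null arcs inside the initial column
        if out is None and best[src][0] * p > best[dst][0]:
            best[dst] = (best[src][0] * p, best[src][1] + [dst])
    for h in observations:
        new = {s: (0.0, []) for s in STATES}
        for src, dst, out, p in ARCS:              # emitting arcs consume the symbol h
            if out == h and best[src][0] * p > new[dst][0]:
                new[dst] = (best[src][0] * p, best[src][1] + [dst])
        for src, dst, out, p in ARCS:              # then null arcs inside the new column
            if out is None and new[src][0] * p > new[dst][0]:
                new[dst] = (new[src][0] * p, new[src][1] + [dst])
        best = new
    return best[final]

prob, path = viterbi("aabb")
print(prob, path)    # best path probability and its state sequence for a a b b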
Training HMM parameters. Example: a small HMM whose transition probabilities (1/3 and 1/2) are fixed and whose output probabilities p(a) and p(b) are unknown. For the training output H = abaa, p(H) is a function of p(a) and p(b); training chooses the values that make p(H) as large as possible.
Training HMM parameters. Enumerate the paths that can produce H. The posterior probability of path i is c_i = p(path i, H) / p(H). Each parameter is then re-estimated as a relative frequency: the count of the corresponding event on each path, weighted by that path's posterior c_i, divided by the total weighted count of competing events.
Keep on repeating: after 600 iterations p(H) settles at one value; starting from another initial parameter set, p(H) settles at a different value.
Training HMM parameters: the procedure converges to a local maximum; there are at least 7 local maxima; the final solution depends on the starting point; the speed of convergence depends on the starting point.
Training HMM parameters: the Forward-Backward algorithm. Improves on the path-enumeration algorithm by using the trellis, reducing the computation from exponential to linear in the length of the output sequence.
Forward Backward Algorithm. For a transition s' -> s taken at position j: count_j(s' -> s) = probability that hj is produced by that transition and the complete output is H = alpha_{j-1}(s') x Prob(s' -> s, output hj) x beta_j(s), where alpha_{j-1}(s') = probability of being in state s' and producing the output h1, ..., h(j-1), and beta_j(s) = probability of producing the output h(j+1), ..., hm starting from state s.
Forward Backward Algorithm: transition count. c(s' -> s) = sum over j of alpha_{j-1}(s') x Prob(s' -> s, output hj) x beta_j(s) / p(H); the re-estimated transition probability is this count divided by the total count of all transitions leaving s'.
Training HMM parameters. Guess initial values for all parameters; compute forward and backward pass probabilities; compute counts; re-estimate probabilities; repeat. Known as BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, E-M.
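A compact sketch of one such training iteration. For brevity it assumes the common state-emission form of a discrete HMM (emissions attached to states rather than arcs, no null arcs), a simplification of the arc-emission models on the slides; the toy numbers are illustrative only, not the example above:

import numpy as np

def forward_backward(obs, pi, A, B):
    """Forward and backward passes; A[i, j] = P(i -> j), B[i, k] = P(symbol k | state i)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta, alpha[T - 1].sum()          # p(H) = sum_i alpha_T(i)

def baum_welch_step(obs, pi, A, B):
    """One re-estimation of (pi, A, B) from expected counts; p(H) never decreases."""
    T, N = len(obs), len(pi)
    alpha, beta, pH = forward_backward(obs, pi, A, B)
    gamma = alpha * beta / pH                       # gamma[t, i] = P(state i at t | H)
    xi = np.zeros((N, N))                           # expected transition counts
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / pH
    new_pi = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0, keepdims=True).T
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, pH

# Toy run: 2 states, symbols a=0 and b=1, observation "abaa"; repeated iterations
# climb to a local maximum of p(H), which depends on the starting guess.
pi = np.array([1.0, 0.0])
A = np.array([[0.5, 0.5], [0.5, 0.5]])
B = np.array([[0.6, 0.4], [0.3, 0.7]])
obs = [0, 1, 0, 0]
for _ in range(10):
    pi, A, B, pH = baum_welch_step(obs, pi, A, B)
print(pH)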