1 Hidden Markov Modelling and Handwriting Recognition
Csink László, 2009
2 Types of Handwriting 1
1. BLOCK PRINTING
2. GUIDED CURSIVE HANDWRITING
3 Types of Handwriting 2
3. UNCONSTRAINED CURSIVE HANDWRITING
Clearly faster, but less legible, than 1 or 2.
ONLINE recognition for 3: some systems have been developed.
OFFLINE recognition for 3: much research has been done, still a lot to do.
Suen: "no simple scheme is likely to achieve high recognition and reliability rates, not to mention human performance"
4 Introduction to Hidden Markov Modelling (HMM): a simple example 1
Suppose we want to determine the average annual temperature at a specific location over a series of years, for a past era in which no measurements are available. We assume that only two kinds of years exist, hot (H) and cold (C), and we know that the probability of a cold year following a hot one is 0.3, while the probability of a cold year following a cold one is 0.6. Similar data are known for the probability of a hot year after a hot or a cold one, respectively. We assume that these probabilities are the same over the years. The data are then expressed by the following matrix:

        H    C
  H    0.7  0.3
  C    0.4  0.6

Note that the row sums of the matrix are 1 (it is a row stochastic matrix). The transition process described by this matrix is a MARKOV PROCESS, as the next state depends only on the previous one.
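To make the transition rule concrete, here is a minimal Python sketch (not part of the original slides) that stores the matrix above and samples a sequence of hot/cold years from it:

```python
import numpy as np

# State transition matrix from the slide; rows and columns ordered (H, C).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
states = ["H", "C"]

def sample_years(n_years, start_state=0, seed=0):
    """Sample a sequence of hot/cold years from the Markov chain."""
    rng = np.random.default_rng(seed)
    seq, state = [], start_state
    for _ in range(n_years):
        seq.append(states[state])
        # The next state depends only on the current one (Markov property).
        state = rng.choice(2, p=A[state])
    return seq

print(sample_years(10))      # a list of 10 'H'/'C' labels
print(A.sum(axis=1))         # both row sums are 1 (row stochastic)
```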
5 Introduction to HMM: a simple example 2
We also suppose that there is a known correlation between the size of tree growth rings and temperature. We consider only 3 different ring sizes: Small, Medium and Large. We know that in each year the following probabilistic relationship holds between the states H and C and the rings S, M and L:

        S    M    L
  H    0.1  0.4  0.5
  C    0.7  0.2  0.1

Note that the row sums of this matrix are also 1 (it is likewise a row stochastic matrix).
6 Introduction to HMM: a simple example 3
Since the past temperatures are unknown, i.e. the past states are hidden, the above model is called a Hidden Markov Model (HMM). Its components are:
A, the state transition matrix (Markov);
B, the observation matrix;
π, the initial state distribution (we assume this is also known).
A, B and π are all row stochastic.
7 Introduction to HMM: a simple example 4
Denote the rings S, M and L by 0, 1 and 2, respectively. Assume that in a four-year period we observe O=(0,1,0,2). We want to determine the most likely sequence of states of the Markov process given the observations O. There are two natural notions of "most likely":
Dynamic Programming: the most likely sequence is the one with the highest probability among all possible state sequences of length four.
HMM solution: the most likely sequence is the one that maximizes the expected number of correct states.
These two solutions do not necessarily coincide!
8 Introduction to HMM: a simple example 5 (Notations)
In the previous example:
T = 4 (length of the observation sequence)
N = 2 (number of states), M = 3 (number of observation symbols)
Q = {H, C} (the states), V = {0(=S), 1(=M), 2(=L)} (the observation symbols)
O = (0, 1, 0, 2) (the observation sequence)
A: state transition matrix, B: observation matrix, π: initial state distribution
9 State Sequence Probability
Consider a state sequence of length four, X=(x_0, x_1, x_2, x_3), with observations O=(O_0, O_1, O_2, O_3).
Denote by π_x0 the probability of starting in state x_0, by b_x0(O_0) the probability of initially observing O_0, and by a_x0,x1 the probability of transiting from state x_0 to state x_1. Then the probability of the state sequence X above is given by the product below.
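Written out from the definitions just given, the product is

P(X) = π_x0 · b_x0(O_0) · a_x0,x1 · b_x1(O_1) · a_x1,x2 · b_x2(O_2) · a_x2,x3 · b_x3(O_3)

which is exactly the pattern used in the numerical example on the next slide.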
10 Probability of Sequence (H,H,C,C)
With A and B as above and initial state distribution π = (0.6, 0.4) (the first factor below is π_H):
P(HHCC) = 0.6 (0.1) (0.7) (0.4) (0.3) (0.7) (0.6) (0.1) = 0.000212
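A quick Python check of this product, a sketch using the matrices from the earlier slides:

```python
import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])              # transitions, states (H, C) = (0, 1)
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])    # observations (S, M, L) = (0, 1, 2)
pi = np.array([0.6, 0.4])                            # initial state distribution

# P(X=HHCC, O=(S,M,S,L)) = pi_H * b_H(S) * a_HH * b_H(M) * a_HC * b_C(S) * a_CC * b_C(L)
p = pi[0] * B[0, 0] * A[0, 0] * B[0, 1] * A[0, 1] * B[1, 0] * A[1, 1] * B[1, 2]
print(p)   # about 0.000212
```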
11 Finding the Best Solution in the DP Sense
We compute the state sequence probabilities (table below) the same way as we computed P(HHCC) on the previous slide. Writing =B2/B$18 into C2 and copying the formula downwards gives the normalized probabilities.

  1   state seq. (A)   Prob. (B)   Normalized prob. (C)
  2   HHHH             0.000412    0.042787
  3   HHHC             0.000035    0.003635
  4   HHCH             0.000706    0.073320
  5   HHCC             0.000212    0.022017
  6   HCHH             0.000050    0.005193
  7   HCHC             0.000004    0.000415
  8   HCCH             0.000302    0.031364
  9   HCCC             0.000091    0.009451
  10  CHHH             0.001098    0.114031
  11  CHHC             0.000094    0.009762
  12  CHCH             0.001882    0.195451
  13  CHCC             0.000564    0.058573
  14  CCHH             0.000470    0.048811
  15  CCHC             0.000040    0.004154
  16  CCCH             0.002822    0.293073
  17  CCCC             0.000847    0.087963
  18  SUM              0.009629    1.000000

Using the EXCEL function
=INDEX(A2:A17; MATCH(MAX(B2:B17); B2:B17; 0))
[in Hungarian EXCEL: =INDEX(A2:A17; HOL.VAN(MAX(B2:B17); B2:B17; 0)) ]
we find that the sequence with the highest probability is CCCH. This gives the best solution in the Dynamic Programming (DP) sense.
12 Using the EXCEL functions MID [KÖZÉP] and SUMIF [SZUMHA] we produce four further columns (D, E, F, G) holding the 1st, 2nd, 3rd and 4th state of each sequence in the table of the previous slide. Summing the normalized probabilities over the rows where the state in column D, E, F or G is "H" gives the first row of the HMM probability matrix; the second row is computed similarly, using "C" instead of "H".

The HMM probability matrix (probability of each state at each position, given O):

          t=0      t=1      t=2      t=3
  P(H)    0.1882   0.5196   0.2288   0.8040
  P(C)    0.8118   0.4804   0.7712   0.1960

Taking the more probable state at each position gives CHCH as the best sequence in the HMM sense, which indeed differs from the DP-best CCCH, as anticipated on slide 7.
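The same results can be obtained without Excel. A brute-force Python sketch (not from the slides) covering this slide and the previous one: it enumerates all 16 state sequences, finds the DP-best one, and accumulates the per-position probabilities of the HMM sense.

```python
from itertools import product
import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 2]                       # observed ring sizes S, M, S, L

def seq_prob(X):
    """P(state sequence X and observations O)."""
    p = pi[X[0]] * B[X[0], O[0]]
    for t in range(1, len(X)):
        p *= A[X[t - 1], X[t]] * B[X[t], O[t]]
    return p

probs = {X: seq_prob(X) for X in product(range(2), repeat=len(O))}
total = sum(probs.values())                          # = P(O), about 0.009629

# DP sense: the single most probable sequence (slide 11).
best = max(probs, key=probs.get)
print("".join("HC"[s] for s in best), probs[best] / total)    # CCCH 0.293...

# HMM sense: per-position probability of being in state H (slide 12).
marg_H = [sum(p for X, p in probs.items() if X[t] == 0) / total for t in range(len(O))]
print([round(m, 4) for m in marg_H])     # [0.1882, 0.5196, 0.2288, 0.804]
```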
13 Three Problems
Problem 1. Given the model λ=(A,B,π) and a sequence of observations O, find P(O|λ). In other words, we want to determine the likelihood of the observed sequence O, given the model.
Problem 2. Given the model λ=(A,B,π) and a sequence of observations O, find an optimal state sequence for the underlying Markov process. In other words, we want to uncover the hidden part of the Hidden Markov Model.
Problem 3. Given an observation sequence O and dimensions N and M, find the model λ=(A,B,π) that maximizes the probability of O. This can be viewed as training the model to best fit the observed data.
14 Solution to Problem 1
Let λ=(A,B,π) be a given model and let O=(O_0, O_1, …, O_{T-1}) be a sequence of observations. We want to find P(O|λ). Let X=(x_0, x_1, …, x_{T-1}) be a state sequence. Then by the definition of B we have an expression for P(O|X,λ), and by the definition of π and A an expression for P(X|λ).
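Written out in the standard form, consistent with the product on slide 9, these are

P(O|X,λ) = b_x0(O_0) · b_x1(O_1) ⋯ b_x{T-1}(O_{T-1})

P(X|λ) = π_x0 · a_x0,x1 · a_x1,x2 ⋯ a_x{T-2},x{T-1}

and therefore P(O,X|λ) = P(O|X,λ) · P(X|λ).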
15 Summing over all possible state sequences we get P(O|λ) = Σ_X P(O,X|λ) = Σ_X P(O|X,λ)·P(X|λ). As the length of the state sequence and of the observation sequence is T, there are N^T terms in this sum, and each term takes about 2T multiplications, so the total work is on the order of 2T×N^T multiplications. Fortunately, there exists a much faster algorithm as well.
16 The Forward α-pass Algorithm
α_t(i) is the probability of the partial observation sequence up to time t, where q_i is the state the underlying Markov process has at time t.
Let α_0(i) = π_i b_i(O_0) for i=0,1,…,N-1.
For t=1,2,…,T-1 and i=0,1,…,N-1 compute
α_t(i) = [ Σ_{j=0}^{N-1} α_{t-1}(j) a_ji ] b_i(O_t).
Then P(O|λ) = Σ_{i=0}^{N-1} α_{T-1}(i). We have to compute α T×N times and there are about N multiplications in each α, so this method needs on the order of T×N^2 multiplications.
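A direct Python implementation of the α-pass for the tree-ring example (a sketch, without the scaling discussed on slide 25):

```python
import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])

def forward(O):
    """alpha[t, i] = P(O_0..O_t and state q_i at time t)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # alpha_0(i) = pi_i * b_i(O_0)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # sum_j alpha_{t-1}(j) a_ji, times b_i(O_t)
    return alpha

alpha = forward([0, 1, 0, 2])
print(alpha[-1].sum())    # P(O | lambda), about 0.009629, matching the brute-force sum
```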
17 Solution to Problem 2
Given the model λ=(A,B,π) and a sequence of observations O, our goal is to find the most likely state sequence, i.e. the one that maximizes the expected number of correct states. First we define the backward algorithm, the β-pass.
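Following the standard formulation (as in the Stamp tutorial cited on slide 25), the β-pass and the resulting solution are:

β_{T-1}(i) = 1 for i = 0,1,…,N-1
β_t(i) = Σ_{j=0}^{N-1} a_ij b_j(O_{t+1}) β_{t+1}(j), for t = T-2,…,1,0 and i = 0,1,…,N-1

Then γ_t(i) = α_t(i) β_t(i) / P(O|λ) is the probability of being in state q_i at time t given the observations, and the sequence that maximizes the expected number of correct states is obtained by choosing, for each t, the state q_i with the largest γ_t(i).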
18 Example (1996): HMM-based Handwritten Symbol Recognition
Input: a sequence of strokes captured during writing. A stroke is a sequence of (x,y)-coordinates corresponding to pen positions; it is the writing from pen-down to pen-up.
Slant correction: try to find a near-vertical part in each stroke and rotate the whole stroke so that this part becomes vertical.
19 Normalization of Strokes
Normalization: determine the x-length of each stroke. Denote by t10 the threshold below which 10% of the strokes fall with respect to x-length, and by t90 the threshold above which 10% of the strokes fall with respect to x-length. Then compute the average of the x-lengths of all strokes that lie between the two thresholds; denote this by x'.
Perform the same operations with respect to y-length too, and compute y'.
Then normalize all strokes to x' and y'.
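A minimal Python sketch of this step; the function and variable names are illustrative, not from the paper, and the final rescaling is one possible reading of "normalize all strokes to x' and y'":

```python
import numpy as np

def reference_length(lengths, trim=0.10):
    """Average of the lengths lying between the lower and upper `trim`
    quantiles (the x' / y' of the slide)."""
    lengths = np.asarray(lengths, dtype=float)
    lo = np.quantile(lengths, trim)          # t10: 10% of the strokes are shorter
    hi = np.quantile(lengths, 1.0 - trim)    # t90: 10% of the strokes are longer
    return lengths[(lengths >= lo) & (lengths <= hi)].mean()

def normalize_strokes(strokes):
    """strokes: list of (n_points, 2) arrays of (x, y) pen positions."""
    x_ref = reference_length([np.ptp(s[:, 0]) for s in strokes])   # x-lengths
    y_ref = reference_length([np.ptp(s[:, 1]) for s in strokes])   # y-lengths
    # Rescale every stroke by the reference x- and y-lengths.
    return [s / np.array([x_ref, y_ref]) for s in strokes]
```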
20 The Online Temporal Feature Vector
Introduce a hidden stroke between the pen-up position of a stroke and the pen-down position of the next stroke (we assume that the strokes are ordered in time).
The unified sequence of strokes and hidden strokes is resampled at equispaced points along the trajectory, retaining the temporal order. For each point, the feature vector consists of: the local position; the sine and cosine of the angle between the x-axis and the vector connecting the current point to the origin; and a flag indicating whether the point belongs to a stroke or to a hidden stroke.
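A rough sketch of the per-point feature computation described above (the names are illustrative; the exact features of the 1996 system may differ):

```python
import numpy as np

def point_features(point, is_hidden):
    """Feature vector for one resampled point: position, sin/cos of the
    angle to the origin, and a stroke / hidden-stroke flag."""
    x, y = point
    angle = np.arctan2(y, x)   # angle between the x-axis and the vector to the point
    return np.array([x, y, np.sin(angle), np.cos(angle), float(is_hidden)])
```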
21 HMM Topology
For each symbol S_i of the alphabet {S_1, S_2, …, S_K} an HMM λ_i is generated. The HMM is a left-to-right model: P(s_j|s_i) = 0 for states with j ≥ i+2.
The question is: how can we generate such an HMM? The answer is given by the solution to Problem 3.
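As an illustration, a transition matrix respecting this constraint could be initialized as below. This is only a sketch, not the construction used in the paper; it additionally allows only self-transitions and transitions to the next state, with random row-stochastic values.

```python
import numpy as np

def left_to_right_A(n_states, seed=0):
    """Row-stochastic transition matrix with P(s_j | s_i) = 0 for j >= i + 2."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        allowed = [i] if i == n_states - 1 else [i, i + 1]   # self-loop and next state
        A[i, allowed] = rng.random(len(allowed))
        A[i] /= A[i].sum()                                   # make the row stochastic
    return A

print(left_to_right_A(4))
```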
22 Solution of Problem 3
Now we want to adjust the model parameters to best fit the observations. The sizes N (number of states) and M (number of observation symbols) are fixed, but A, B and π are free; we only have to take care that they remain row stochastic. For t=0,1,…,T-2 and i,j in {0,1,…,N-1}, define γ_t(i,j) as the probability of being in state q_i at time t and transiting to state q_j at time t+1, given O and the model.
23
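In terms of the α- and β-passes, the standard expression for this quantity (following the Stamp tutorial cited on slide 25) is

γ_t(i,j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)

and summing over j recovers the γ_t(i) = α_t(i) β_t(i) / P(O|λ) used in the solution to Problem 2.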
24 The Iteration
1. First initialize λ=(A,B,π) with a best guess, or choose random values such that π_i ≈ 1/N, a_ij ≈ 1/N, b_j(k) ≈ 1/M; π, A and B must be row stochastic.
2. Compute the α- and β-passes and, from them, the quantities γ_t(i,j) and γ_t(i).
3. Re-estimate the model λ=(A,B,π) from these quantities.
4. If P(O|λ) increases, GOTO 2 (the increase may be measured against a threshold, or a maximum number of iterations may be set).
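The standard Baum–Welch re-estimation formulas behind step 3 (again following the Stamp tutorial) are

π_i = γ_0(i)
a_ij = Σ_{t=0}^{T-2} γ_t(i,j) / Σ_{t=0}^{T-2} γ_t(i)
b_j(k) = Σ_{t: O_t = k} γ_t(j) / Σ_{t=0}^{T-1} γ_t(j)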
25 Practical Considerations
Be aware of the fact that α_t(i) tends to 0 as T increases. Therefore, a direct realization of the above formulas may lead to underflow; in practice the α and β values are scaled at each step.
Details and pseudocode may be found in: http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf
26 Another Example (2004): Writer Identification Using HMM Recognizers
Writer identification is the task of determining the author of a handwriting sample from a set of writers.
Writer verification is the task of determining whether a given text has been written by a certain person.
If the text is predefined, this is text-dependent verification; otherwise it is text-independent verification.
Writer verification may be done online or offline.
It is generally believed that text-independent verification is more difficult than text-dependent verification.
27 For each writer, an individual HMM-based handwriting recognition system is trained using only data from that writer. Thus from n writers we get n different HMMs.
Given an arbitrary line of text as input, each HMM recognizer outputs a recognition result together with a recognition score.
It is assumed that (a) correctly recognized words have a higher score than incorrectly recognized words, and (b) the recognition rate on input from a writer the system was trained on is higher than on input from other writers.
The scores produced by the different HMMs can then be used to decide who has written the input text line.
28 After preprocessing (slant, skew, baseline location, height), a sliding window of one-pixel width is shifted from left to right.
The features are: the number of black pixels in the window, the center of gravity, the second-order moment, the position and contour direction of the upper- and lowermost pixels, the number of black-to-white transitions in the window, and the distance between the upper- and lowermost pixels.
Normalization may lead to a reduction of individuality; on the other hand, it supports recognition, which is important for the verification project.
For each upper- and lowercase character an individual HMM is built.
29 Related Concepts
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states – called the Viterbi path.
The Baum–Welch algorithm is used to find the unknown parameters of an HMM. It makes use of the forward-backward algorithm used above.
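For comparison with the enumeration on slides 11–12, here is a compact Viterbi implementation for the same tree-ring model (a sketch; for longer sequences one would work in log space or with scaling, cf. slide 25):

```python
import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])

def viterbi(O):
    """Most probable state sequence (DP sense) for observations O."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))            # best path probability ending in each state
    back  = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        for j in range(N):
            cand = delta[t - 1] * A[:, j]
            back[t, j]  = cand.argmax()
            delta[t, j] = cand.max() * B[j, O[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print("".join("HC"[s] for s in viterbi([0, 1, 0, 2])))   # CCCH, as on slide 11
```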
30 HMM-based Speech Recognition
Modern general-purpose speech recognition systems are generally based on Hidden Markov Models. Reason: speech can be thought of as a Markov model.
For further reference consult Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,
http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf
31 Thank you for your attention !