Published by Lester Sherman. Modified over 6 years ago.
1
Improving Chinese Handwriting Recognition by Fusing Speech Recognition
Zhang Xi-Wen, CSE, CUHK and HCI Lab., ISCAS
Good afternoon, everyone. The title of my presentation is “Improving Chinese handwriting recognition by fusing speech recognition based on a lexicon using dynamic programming”.
2
Outline
1 Chinese handwriting recognition
2 Chinese speech recognition
3 Information fusion
4 Experimental results
In this presentation, characters are first extracted from Chinese handwriting and then recognized. The speech corresponding to the handwriting is recognized by a continuous speech recognizer from Microsoft Corporation. The two resulting strings of text are fused based on a lexicon using dynamic programming. Finally, experimental results are presented that show the approach is effective and robust.
3
Handwriting Recognition
Handwriting segmentation
Character recognition
To recognize Chinese handwriting, characters are first extracted and then recognized.
4
1.1 Handwriting segmentation
Segmentation is more difficult for Chinese handwriting.
In Chinese handwriting there is no gap between two adjacent characters, unlike English handwriting, where there is a gap between two neighboring words.
5
Character extraction using histogram
A histogram of between-stroke gaps.
To improve the adaptability of character extraction, a fast approach based on the histogram of between-stroke gaps is proposed. The dimidiate threshold of the histogram of a piece of handwriting is used to extract lines of strokes. The dimidiate threshold of the histogram of a line of strokes is used to extract characters.
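The gap-based extraction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the talk does not define how the "dimidiate threshold" is computed, so the sketch substitutes a simple iterative two-class threshold on the gap values, and it assumes each stroke is given by its horizontal extent `(x_min, x_max)` within one line. The function names are hypothetical.

```python
def between_stroke_gaps(ordered_strokes):
    """Horizontal gaps between consecutive strokes (strokes sorted by x_min)."""
    gaps = []
    right = ordered_strokes[0][1]          # right edge of everything seen so far
    for x_min, x_max in ordered_strokes[1:]:
        gaps.append(max(0.0, x_min - right))  # overlapping strokes give gap 0
        right = max(right, x_max)
    return gaps


def two_class_threshold(values, max_iters=50):
    """Assumed stand-in for the 'dimidiate threshold': split values into a low
    and a high class and place the threshold midway between the class means."""
    t = (min(values) + max(values)) / 2
    for _ in range(max_iters):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            break
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < 1e-9:
            break
        t = new_t
    return t


def group_strokes(strokes):
    """Group strokes into candidate characters: a gap above the threshold
    starts a new group, a smaller gap keeps the stroke in the current group."""
    if not strokes:
        return []
    ordered = sorted(strokes)
    gaps = between_stroke_gaps(ordered)
    if not gaps:
        return [ordered]
    t = two_class_threshold(gaps)
    groups = [[ordered[0]]]
    for stroke, gap in zip(ordered[1:], gaps):
        if gap > t:
            groups.append([stroke])
        else:
            groups[-1].append(stroke)
    return groups


# Hypothetical strokes: two tight pairs and one isolated stroke.
strokes = [(0, 4), (5, 9), (20, 24), (25, 30), (42, 46)]
groups = group_strokes(strokes)
```

The same grouping could be applied to vertical extents to separate lines of strokes before characters are extracted within each line.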
6
Figure 1. Handwriting segmentation
The first line is the original handwriting. The second line is an extracted text line, which is labeled with a bounding rectangle. The third line shows the extracted characters, which are also labeled with bounding rectangles.
7
Remaining problems
A Chinese character may be mis-segmented into several characters.
Several Chinese characters may be mis-grouped as one character.
Segmentation errors inevitably result in handwriting recognition errors.
Thus, even after Chinese handwriting segmentation, it is difficult to attain completely correct character extraction.
8
1.2 Character recognition
Isolated character recognizer from HW
Many candidates
The extracted characters are recognized using an isolated character recognizer from Hang Wang Corporation (HW).
9
Figure 2. Handwriting recognition
Text recognized from the handwriting.
The ground-truth text.
Some characters are recognized incorrectly even when they are extracted correctly. To improve handwriting recognition, we fuse it with speech recognition.
10
2 Speech recognition
Chinese speech.
On-line, microphone.
Continuous speech recognizer from MS.
Speech is captured by a microphone connected to the computer, and is recognized by the continuous speech recognizer from MS while it is being captured.
11
Figure 3. Speech recognition
Text recognized from the speech corresponding to the handwriting.
The ground-truth text.
The result of speech recognition is also unsatisfactory: it contains both segmentation errors and recognition errors. Moreover, speech segmentation errors are harder to correct than handwriting segmentation errors, because speech must be played back over time and offers no text to inspect while reviewing it. Fortunately, in many cases the text recognized from a piece of Chinese handwriting and the text recognized from the corresponding speech are complementary to each other.
12
3 Text fusion
An optimization problem
Dynamic Programming
We propose to formulate the fusion of the two texts as a combinatorial optimization problem and to solve it using dynamic programming.
13
3.1 Principles
The fused text should contain more semantic information.
Construct a text with the fewest characters and the most semantic information.
We aim to construct a fused text with the fewest characters and the most semantic information, as measured by a language model. The correct Chinese characters in the two texts should be extracted, and the wrong ones should not. The extracted characters should appear in the fused text in the order that provides the most semantic information.
14
3.2 Four ways
Text recognized from the handwriting.
Text recognized from the speech corresponding to the handwriting.
Figure 4. Texts to be fused
Each character in a fused text is selected from one of the two texts. There are four ways to select or pass over a character in the two texts: 1) select the current character from the text recognized from the handwriting and move to its next character, 2) pass over the current character in the handwriting text and move to its next character, 3) select the current character from the text recognized from the speech and move to its next character, 4) pass over the current character in the speech text and move to its next character.
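To make the four ways concrete, here is a minimal sketch that turns a sequence of choices into a fused text. The integer codes (0 to 3) and the names `decode_path`, `hw_text`, and `sp_text` are assumptions made for this illustration; the talk does not state which integer stands for which way.

```python
# Assumed coding of the four ways (not specified in the talk).
TAKE_HW, SKIP_HW, TAKE_SP, SKIP_SP = 0, 1, 2, 3


def decode_path(path, hw_text, sp_text):
    """Build the fused text described by `path`.

    Each step consumes exactly one character of one text, so a valid path
    has len(hw_text) + len(sp_text) steps.
    """
    i = j = 0          # current positions in the handwriting and speech texts
    fused = []
    for op in path:
        if op in (TAKE_HW, SKIP_HW):
            if i >= len(hw_text):
                raise ValueError("path consumes too many handwriting characters")
            if op == TAKE_HW:
                fused.append(hw_text[i])
            i += 1
        elif op in (TAKE_SP, SKIP_SP):
            if j >= len(sp_text):
                raise ValueError("path consumes too many speech characters")
            if op == TAKE_SP:
                fused.append(sp_text[j])
            j += 1
        else:
            raise ValueError(f"unknown choice: {op!r}")
    if i != len(hw_text) or j != len(sp_text):
        raise ValueError("path does not consume both texts completely")
    return "".join(fused)


# Toy example with placeholder characters: keep "A" from the handwriting text,
# drop "C" from the speech text, drop "B", keep "D".
example = decode_path([TAKE_HW, SKIP_SP, SKIP_HW, TAKE_SP], "AB", "CD")
```

Under this coding, every fused text corresponds to at least one sequence of choices, which is what allows the fusion to be posed as a search over such sequences.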
15
3.3 Dynamic Programming
A directed graph.
Optimal paths.
All the different indexed strings can be represented as a directed graph whose paths correspond to all possible fused texts. The optimal fused texts correspond to the paths with maximum semantic information in the graph, which can be found using a dynamic programming algorithm.
16
Figure 5. A directed graph with N levels.
In the graph, each level has four vertices, and each vertex in a level is connected to every vertex in the succeeding level. A possible solution (i.e. a possible fused text) is a path with N vertices, written as a string of integers, each 0, 1, 2, or 3. N is the number of levels in the graph, and the integers 0 to 3 represent the four ways of choosing characters from the two texts to generate the fused text.
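A best path through a layered graph of this shape can be found with a standard level-by-level dynamic program. The sketch below is not the talk's scoring scheme: it assumes the path score is a sum of per-vertex and per-edge terms supplied by the caller (`vertex_score` and `edge_score` are hypothetical callbacks), and it does not check that a path consumes each text exactly once.

```python
def best_path(n_levels, vertex_score, edge_score, n_vertices=4):
    """Return (path, score) maximizing the sum of vertex and edge scores.

    vertex_score(level, v) and edge_score(level, u, v) are caller-supplied;
    edge_score scores the edge from vertex u at level-1 to vertex v at level.
    """
    if n_levels < 1:
        raise ValueError("the graph needs at least one level")
    # best[v]: best score of any path that ends at vertex v of the current level
    best = [vertex_score(0, v) for v in range(n_vertices)]
    back = []  # back[k][v]: predecessor of v at level k+1 on its best path
    for level in range(1, n_levels):
        new_best, pointers = [], []
        for v in range(n_vertices):
            u = max(range(n_vertices),
                    key=lambda u: best[u] + edge_score(level, u, v))
            new_best.append(best[u] + edge_score(level, u, v)
                            + vertex_score(level, v))
            pointers.append(u)
        best = new_best
        back.append(pointers)
    # Trace back from the best final vertex to recover the path.
    v = max(range(n_vertices), key=lambda v: best[v])
    total = best[v]
    path = [v]
    for pointers in reversed(back):
        v = pointers[v]
        path.append(v)
    path.reverse()
    return path, total


# Toy scores: reward vertex (level mod 4) at each level, no edge scores.
path, total = best_path(
    5,
    vertex_score=lambda level, v: 1.0 if v == level % 4 else 0.0,
    edge_score=lambda level, u, v: 0.0,
)
```

With four vertices per level this takes on the order of 16·N score evaluations, instead of enumerating all 4^N integer strings.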
17
(a) Text recognized from the handwriting.
(b) Text recognized from the speech corresponding to the handwriting.
(c) The optimal fused text corresponding to the optimal path.
(d) The ground-truth text.
Figure 6. Text fusion using DP.
Figure 6(a) shows the text recognized from the piece of handwriting shown in Figure 1(d), which has 10 characters in total. Figure 6(b) shows the text recognized from the speech corresponding to the handwriting, which has 9 characters in total. The indexed string for an optimal fused text therefore has 10 + 9 = 19 characters, and the graph has 19 levels, each with four vertices. Figure 6(c) shows the optimal fused text corresponding to the best path found in the graph. Because the ground-truth text cannot be obtained by any automatic processing, it was produced by a professional engineer and is shown in Figure 6(d). The optimal fused text obtained with the dynamic programming algorithm is very satisfactory.
18
3.4 A language model
Lexicon
Syntax
Semantics
The semantic scores of the fused texts corresponding to paths in the graph are calculated using a statistical model of the Chinese language. We have used a lexicon of thirty thousand Chinese words, each two to eleven characters long. In the future, we will incorporate higher-level information: syntax and semantics.
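As a rough illustration of lexicon-based scoring, the sketch below rates a candidate text by its best segmentation into lexicon words. The scoring rule (squared word length, with uncovered characters contributing nothing) is an assumption chosen so that longer words count for more; the talk does not give its actual formula, and the four-word lexicon here is a toy example.

```python
def lexicon_score(text, lexicon, max_word_len=11):
    """Best total score over all ways to cover `text` with lexicon words.

    Assumed scoring: a matched word of length L scores L*L; characters not
    covered by any word score 0. Words are 2 to `max_word_len` characters.
    """
    n = len(text)
    best = [0] * (n + 1)   # best[k]: best score for the prefix text[:k]
    for end in range(1, n + 1):
        best[end] = best[end - 1]          # leave text[end-1] uncovered
        for length in range(2, min(max_word_len, end) + 1):
            word = text[end - length:end]
            if word in lexicon:
                best[end] = max(best[end], best[end - length] + length * length)
    return best[n]


# Toy lexicon; the talk's lexicon has thirty thousand words.
toy_lexicon = {"手写", "识别", "手写识别", "语音"}
score_long = lexicon_score("手写识别", toy_lexicon)   # one 4-character word
score_two = lexicon_score("手写语音", toy_lexicon)    # two 2-character words
score_none = lexicon_score("手语", toy_lexicon)       # no lexicon word matches
```

Under these assumptions a candidate fused text made of real words outscores one of the same length made of unrelated characters, which is the behaviour the fusion step relies on.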
19
Lexicon
This is the lexicon. There are words, each word has two characters. There are words, each word has three characters. There are words, each word has four characters. There are 701 words with five characters.
20
There are 242 words with six characters. There are 351 words with seven characters. There are 381 words with eight characters. There are 415 words with eight characters.
21
4 Experimental results
More experimental results are shown.
22
Figure 7(a) shows a piece of handwriting. Lines of characters are extracted as shown in Figure 7(b). The characters extracted using the approach developed in this paper are shown in Figure 7(c), where three underlined characters are wrongly segmented into two characters. Figure 7(d) shows the characters identified by a professional engineer.
23
The text first recognized from the handwriting is shown in Figure 8(a) and has 24 characters in total. Figure 8(b) presents the text first recognized from the speech corresponding to the handwriting, which has 21 characters in total. In both figures, the wrongly recognized characters are marked in gray. To speed up the fusion, each line of text is treated as a separate processing unit in the proposed approach. The graphs corresponding to the fused texts are shown in Figure 8(c). The fused texts with the maximum semantic scores from the graphs are shown in Figure 8(d). That is the end of my presentation.
24
Thank you very much for your criticism, comments and suggestions! Tel: