Download presentation
Presentation is loading. Please wait.
Published byDarren McBride Modified over 9 years ago
1
Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research and Development Division, National Electronics and Computer Technology Center(NECTEC), Thailand
2
Outline Introduction Related Works The Approach Experiments Conclusions
3
Introduction The Thai-English Keyboard System, Problems and Solutions
4
The Thai-English Keyboard System 174 characters. 83 in Thai. 52 in English. 39 in Symbol. 4 characters/1 Button. 2 languages. 2 sets of characters. Use special keys to select languages and to distinguish a set of characters. ฤ ฟ EnglishThai Shift No shift A a A
5
The Problems Two annoyances for Thais to type Thai- English bilingual texts. To switch between languages by using a language switching key. To distinguish a set of characters by using a combination of shift-key and a character- key. Other multilingual users face the same problems.
6
Related Works Language Identification and Key Prediction
7
Language Identification Microsoft Word 2000. Automatic language identification. Considering only surrounding text. Problems. No key prediction. Often detect incorrect language. Difficult to switch to the proper language.
8
Key Prediction Zheng et al. Automatic word prediction. Base on lexical tree. Word class n-grams are employed. Problems. No language identification. Required large amount of memory space.
9
The Approach Overview, Language Identification, Key Prediction and Pattern Shortening
10
The System Overview There are two main processes in our system. Automatic language identification. Automatic Thai key prediction. Language? Key Input Thai Key Prediction Output Eng Thai
11
(1) (2) Language Identification
12
Thai Key Prediction (3)
13
Error-correction Rules Some character sequences often generate errors. Error patterns are collected for error-correction process. Mutual information is used to optimize the patterns. Training corpus Tri-gram prediction Error correction rules Errors from Tri-gram prediction
14
Error-collection Rule Reduction Pattern No. Error Key Sequences Correct Patterns 1.k[lkl9 2.mpklkl9 3.kkklkl9 4.lkl9
15
Pattern Shortening (4) (5)
16
The Error-correction Process Finite Automata pattern matching is applied. The longest pattern is selected. Key Input Finite Automata Terminal? No Yes Correction
17
Experiments Language Identification and Key Prediction
18
Language Identification Result Bi-Gram is employed. Using artificial corpus for testing. The first 6 characters are enough. m (number of first characters) Identification Accuracy (%) 394.27 497.06 598.16 699.1 799.11
19
Key Prediction Corpus Information 25 MB training corpus. 5 MB test corpus Training Corpus (%) Test Corpus (%) Unshift Alphabets 88.6388.95 Shift Alphabets 11.3711.05
20
Key Prediction Result Prediction Accuracy from Training Corpus (%) Prediction Accuracy from Test Corpus (%) Tri-gram Prediction 93.1192.21 Tri-gram Prediction + Error Correction 99.5399.42
21
Conclusions The Approach and Future Works
22
The Approach We have applied the tri-gram model and error-correction rules for intelligent Thai key prediction and English-Thai language identification. The experiment reports 99 percent in accuracy, which is very impressive.
23
Future Works Hopefully, this technique is applicable to other Asian languages and multilingual systems. Our future work is to apply the algorithm to mobile phones, handheld devices and multilingual input systems.
24
The End Questions and Answers
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.