Presentation is loading. Please wait.

Presentation is loading. Please wait.

Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research.

Similar presentations


Presentation on theme: "Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research."— Presentation transcript:

1 Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research and Development Division, National Electronics and Computer Technology Center(NECTEC), Thailand

2 Outline Introduction Related Works The Approach Experiments Conclusions

3 Introduction The Thai-English Keyboard System, Problems and Solutions

4 The Thai-English Keyboard System 174 characters. 83 in Thai. 52 in English. 39 in Symbol. 4 characters/1 Button. 2 languages. 2 sets of characters. Use special keys to select languages and to distinguish a set of characters. ฤ ฟ EnglishThai Shift No shift A a A

5 The Problems Two annoyances for Thais to type Thai- English bilingual texts. To switch between languages by using a language switching key. To distinguish a set of characters by using a combination of shift-key and a character- key. Other multilingual users face the same problems.

6 Related Works Language Identification and Key Prediction

7 Language Identification Microsoft Word 2000. Automatic language identification. Considering only surrounding text. Problems. No key prediction. Often detect incorrect language. Difficult to switch to the proper language.

8 Key Prediction Zheng et al. Automatic word prediction. Base on lexical tree. Word class n-grams are employed. Problems. No language identification. Required large amount of memory space.

9 The Approach Overview, Language Identification, Key Prediction and Pattern Shortening

10 The System Overview There are two main processes in our system. Automatic language identification. Automatic Thai key prediction. Language? Key Input Thai Key Prediction Output Eng Thai

11 (1) (2) Language Identification

12 Thai Key Prediction (3)

13 Error-correction Rules Some character sequences often generate errors. Error patterns are collected for error-correction process. Mutual information is used to optimize the patterns. Training corpus Tri-gram prediction Error correction rules Errors from Tri-gram prediction

14 Error-collection Rule Reduction Pattern No. Error Key Sequences Correct Patterns 1.k[lkl9 2.mpklkl9 3.kkklkl9 4.lkl9

15 Pattern Shortening (4) (5)

16 The Error-correction Process Finite Automata pattern matching is applied. The longest pattern is selected. Key Input Finite Automata Terminal? No Yes Correction

17 Experiments Language Identification and Key Prediction

18 Language Identification Result Bi-Gram is employed. Using artificial corpus for testing. The first 6 characters are enough. m (number of first characters) Identification Accuracy (%) 394.27 497.06 598.16 699.1 799.11

19 Key Prediction Corpus Information 25 MB training corpus. 5 MB test corpus Training Corpus (%) Test Corpus (%) Unshift Alphabets 88.6388.95 Shift Alphabets 11.3711.05

20 Key Prediction Result Prediction Accuracy from Training Corpus (%) Prediction Accuracy from Test Corpus (%) Tri-gram Prediction 93.1192.21 Tri-gram Prediction + Error Correction 99.5399.42

21 Conclusions The Approach and Future Works

22 The Approach We have applied the tri-gram model and error-correction rules for intelligent Thai key prediction and English-Thai language identification. The experiment reports 99 percent in accuracy, which is very impressive.

23 Future Works Hopefully, this technique is applicable to other Asian languages and multilingual systems. Our future work is to apply the algorithm to mobile phones, handheld devices and multilingual input systems.

24 The End Questions and Answers


Download ppt "Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research."

Similar presentations


Ads by Google