Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research.

Intelligent Key Prediction by N-grams and Error-correction Rules. Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti. Information Research and Development Division, National Electronics and Computer Technology Center (NECTEC), Thailand

Outline: Introduction, Related Works, The Approach, Experiments, Conclusions

Introduction The Thai-English Keyboard System, Problems and Solutions

The Thai-English Keyboard System 174 characters: 83 Thai, 52 English, 39 symbols. 4 characters per button. 2 languages. 2 sets of characters. Special keys are used to select the language and to distinguish the set of characters. Example key: English A (shift) and a (no shift); Thai ฤ (shift) and ฟ (no shift).

The Problems Two annoyances for Thais typing Thai-English bilingual texts. Switching between languages with a language-switching key. Distinguishing a set of characters with a combination of the shift key and a character key. Other multilingual users face the same problems.

Related Works Language Identification and Key Prediction

Language Identification Microsoft Word 2000. Automatic language identification. Considers only the surrounding text. Problems. No key prediction. Often detects the wrong language. Difficult to switch to the proper language.

Key Prediction Zheng et al. Automatic word prediction. Based on a lexical tree. Word-class n-grams are employed. Problems. No language identification. Requires a large amount of memory.

The Approach Overview, Language Identification, Key Prediction and Pattern Shortening

The System Overview There are two main processes in our system. Automatic language identification. Automatic Thai key prediction. Flow: key input → language? → English: output; Thai: Thai key prediction → output.

Language Identification: Equations (1) and (2)

Thai Key Prediction: Equation (3)
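The equation on this slide did not survive in the transcript, so the following is only a minimal sketch of the general idea of tri-gram key prediction, not the authors' formula. It assumes the user types without the shift key and that a character tri-gram count table decides, greedily from left to right, whether each key should yield its unshifted or shifted character. The names `train_trigram`, `predict` and `shift_map` are hypothetical, and Latin lower/upper case stands in for the two Thai character sets.

```python
from collections import Counter

def train_trigram(corpus):
    """Count character tri-grams in a correctly typed training string."""
    return Counter(zip(corpus, corpus[1:], corpus[2:]))

def predict(keys, shift_map, trigrams):
    """Greedy left-to-right choice between each key's unshifted and
    shifted character, scored by the tri-gram count of the candidate
    given the two previously predicted characters."""
    out = []
    for key in keys:
        candidates = [key]
        if key in shift_map:
            candidates.append(shift_map[key])
        context = tuple(out[-2:])
        if len(context) < 2:
            out.append(key)  # not enough context yet: keep the unshifted key
            continue
        # Ties (including unseen tri-grams) fall back to the unshifted key,
        # because max() returns the first maximal candidate.
        out.append(max(candidates, key=lambda c: trigrams[context + (c,)]))
    return "".join(out)

# Toy stand-in data: upper case plays the role of the shifted character set.
shift_map = {c: c.upper() for c in "abcdefghijklmnopqrstuvwxyz"}
trigrams = train_trigram("abC abC abd")
print(predict("abc", shift_map, trigrams))
print(predict("abd", shift_map, trigrams))
```

A greedy choice keeps the sketch short; a real system might instead search over whole sequences or also use right-hand context.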

Error-correction Rules Some character sequences often generate errors. Error patterns are collected for the error-correction process. Mutual information is used to optimize the patterns. Flow: training corpus → tri-gram prediction → errors from tri-gram prediction → error-correction rules.
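One way to picture the collection step is to compare the tri-gram predictor's output on the training corpus with the correct text and record the context around each mismatch. The sketch below assumes exactly that; the function name `collect_error_rules`, the fixed left-context width `left`, and the rule shape (predicted context paired with the correct character) are illustrative assumptions, not the rule format used by the authors. The mutual-information step is not shown.

```python
from collections import Counter

def collect_error_rules(predicted, reference, left=3):
    """For every position where the prediction differs from the reference,
    record the predicted context ending at that position together with the
    correct character, and count how often each pair occurs."""
    rules = Counter()
    for i, (p, r) in enumerate(zip(predicted, reference)):
        if p != r:
            pattern = predicted[max(0, i - left): i + 1]
            rules[(pattern, r)] += 1
    return rules

# Toy stand-in data: the predictor failed to shift the final "c" twice.
rules = collect_error_rules("abcabc", "abCabC", left=2)
print(rules)
```

Counting occurrences makes it possible to keep only patterns that fail often, which is one plausible way to limit the size of the rule set.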

Error-correction Rule Reduction
Pattern No. / Error Key Sequences / Correct Patterns
1. k[lkl9
2. mpklkl9
3. kkklkl9
4. lkl9

Pattern Shortening: Equations (4) and (5)

The Error-correction Process Finite automata pattern matching is applied. The longest pattern is selected. Flow: key input → finite automata → terminal? (no / yes → correction).
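Longest-pattern matching with a finite automaton can be sketched with a trie whose terminal states carry the correction. This is a minimal illustration under assumed names (`build_trie`, `correct`, `TERMINAL`) and an assumed rule shape (error sequence mapped to its replacement string); the slides do not specify how the authors' automaton is built.

```python
TERMINAL = None  # marker key; never collides with a one-character key

def build_trie(rules):
    """Build a trie (a deterministic automaton) from error patterns.
    Terminal states store the replacement text for the matched pattern."""
    root = {}
    for pattern, correction in rules.items():
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[TERMINAL] = correction
    return root

def correct(keys, trie):
    """Scan the input; at each position follow the automaton as far as
    possible and apply the correction of the longest pattern that ended
    in a terminal state. Unmatched characters are copied unchanged."""
    out, i = [], 0
    while i < len(keys):
        node, best_len, best_fix = trie, 0, None
        j = i
        while j < len(keys) and keys[j] in node:
            node = node[keys[j]]
            j += 1
            if TERMINAL in node:
                best_len, best_fix = j - i, node[TERMINAL]
        if best_fix is None:
            out.append(keys[i])
            i += 1
        else:
            out.append(best_fix)
            i += best_len
    return "".join(out)

# Toy stand-in rules: the longer pattern "abc" wins over its prefix "ab".
trie = build_trie({"ab": "aB", "abc": "aBC"})
print(correct("xabcx", trie))
print(correct("xabx", trie))
```

Preferring the longest match means a more specific rule overrides a shorter, more general one that shares its prefix.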

Experiments Language Identification and Key Prediction

Language Identification Result Bi-gram is employed. An artificial corpus is used for testing. The first 6 characters are enough.
m (number of first characters)    Identification accuracy (%)
3                                 94.27
4                                 97.06
5                                 98.16
6                                 99.1
7                                 99.11
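Identification from the first m characters with a bi-gram model can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: each language model is trained on key sequences as typed on the keyboard, scoring uses add-one smoothing with an assumed vocabulary size of 256, and the names `train_bigram`, `log_score` and `identify` are hypothetical. The training strings are toy stand-ins.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count single keys and adjacent key pairs in a training string."""
    return Counter(corpus), Counter(zip(corpus, corpus[1:]))

def log_score(text, model, vocab_size=256):
    """Add-one smoothed bi-gram log-probability of a key sequence."""
    unigrams, bigrams = model
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(text, text[1:])
    )

def identify(keys, models, m=6):
    """Return the language whose model best explains the first m keys."""
    prefix = keys[:m]
    return max(models, key=lambda lang: log_score(prefix, models[lang]))

# Toy stand-in corpora: English text, and a key sequence standing in for
# Thai text as it would appear when typed on a Latin-labelled keyboard.
models = {
    "eng": train_bigram("the quick brown fox jumps over the lazy dog then the cat sat on the mat"),
    "thai": train_bigram("l;ylfu8iy[ l;ylfu8jt l;ylfu"),
}
print(identify("the cat", models))
print(identify("l;ylfu8", models))
```

The parameter `m` corresponds to the number of first characters in the table above; larger values use more evidence at the cost of a later decision.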

Key Prediction Corpus Information 25 MB training corpus. 5 MB test corpus.
                     Training corpus (%)    Test corpus (%)
Unshift alphabets    88.63                  88.95
Shift alphabets      11.37                  11.05

Key Prediction Result
                                          Prediction accuracy, training corpus (%)    Prediction accuracy, test corpus (%)
Tri-gram prediction                       93.11                                       92.21
Tri-gram prediction + error correction    99.53                                       99.42

Conclusions The Approach and Future Works

The Approach We have applied the tri-gram model and error-correction rules for intelligent Thai key prediction and English-Thai language identification. In the experiments, key prediction with error correction reaches 99.42 percent accuracy on the test corpus, against 92.21 percent for tri-gram prediction alone.

Future Works We hope this technique is applicable to other Asian languages and multilingual systems. Our future work is to apply the algorithm to mobile phones, handheld devices and multilingual input systems.

The End Questions and Answers
