Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research.

Presentation transcript:

Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research and Development Division, National Electronics and Computer Technology Center (NECTEC), Thailand

Outline Introduction Related Works The Approach Experiments Conclusions

Introduction The Thai-English Keyboard System, Problems and Solutions

The Thai-English Keyboard System 174 characters: 83 Thai, 52 English, 39 symbols. 4 characters per button: 2 languages × 2 sets of characters. Special keys select the language and distinguish the set of characters. Example key: English A (shift) and a (no shift); Thai ฤ (shift) and ฟ (no shift).
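The four-characters-per-button layout can be pictured as a small lookup table. The sketch below is only an illustration: the names `KEYMAP` and `resolve` are hypothetical, and the single entry uses the example key shown on the slide.

```python
# Hypothetical layout table: one physical key -> {(language, shift): character}.
# Only the example key from the slide is filled in.
KEYMAP = {
    "a": {
        ("eng", False): "a",
        ("eng", True): "A",
        ("thai", False): "ฟ",
        ("thai", True): "ฤ",
    },
}


def resolve(key, language, shift):
    """Return the character produced by a key for a language and shift state."""
    return KEYMAP[key][language, shift]
```

Without prediction, the user has to supply both `language` and `shift` by hand for every key; the approach in this talk tries to infer them from the key sequence.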

The Problems Two annoyances for Thais typing Thai-English bilingual texts: switching between languages with a language-switching key, and distinguishing a set of characters with a combination of the shift key and a character key. Other multilingual users face the same problems.

Related Works Language Identification and Key Prediction

Language Identification Microsoft Word: automatic language identification that considers only the surrounding text. Problems: no key prediction; often detects the wrong language; difficult to switch to the proper language.

Key Prediction Zheng et al.: automatic word prediction based on a lexical tree, employing word-class n-grams. Problems: no language identification; requires a large amount of memory.

The Approach Overview, Language Identification, Key Prediction and Pattern Shortening

The System Overview There are two main processes in our system: automatic language identification and automatic Thai key prediction. [Figure: key input → language? → if English, output; if Thai, Thai key prediction → output]

Language Identification [Equations (1) and (2)]
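The equations on this slide did not survive in the transcript, so the following is only a sketch of the general idea, using the details given in the experiments (bi-grams, scored over the first m = 6 characters). It trains one character bi-gram model per language on raw key sequences and picks the language whose model gives the typed prefix the higher probability. The add-one smoothing, the `vocab_size` value, the function names and the toy training data are all assumptions, not the authors' formulation.

```python
import math
from collections import Counter


def train_bigram(samples):
    """Count key bigrams (with a start marker '^') over sample key sequences."""
    counts, context = Counter(), Counter()
    for s in samples:
        padded = "^" + s
        for a, b in zip(padded, padded[1:]):
            counts[a, b] += 1
            context[a] += 1
    return counts, context


def log_prob(keys, model, vocab_size=128):
    """Add-one smoothed bigram log-probability of a key sequence (assumed smoothing)."""
    counts, context = model
    padded = "^" + keys
    return sum(
        math.log((counts[a, b] + 1) / (context[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def identify_language(keys, models, m=6):
    """Pick the language whose bigram model best explains the first m keys."""
    prefix = keys[:m]
    return max(models, key=lambda lang: log_prob(prefix, models[lang]))


# Toy data: key sequences as typed on a QWERTY keyboard (illustrative only).
models = {
    "eng": train_bigram(["the", "this", "that", "then"]),
    "thai": train_bigram(["lkl9", "k[lkl9", "mpklkl9", "kkklkl9"]),
}
```

With these toy models, `identify_language("the", models)` selects `"eng"` and `identify_language("lkl9", models)` selects `"thai"`.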

Thai Key Prediction [Equation (3)]
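Equation (3) is not preserved in the transcript. As a rough sketch of tri-gram key prediction, the code below assumes the user types every key unshifted and the system chooses, for each key, between its unshifted and shifted character by comparing tri-gram counts given the two previously predicted characters. The greedy left-to-right decoding, the function names, and the use of lowercase and uppercase Latin letters as stand-ins for unshifted and shifted Thai characters are assumptions made for illustration.

```python
from collections import Counter


def train_trigrams(corpus):
    """Count character trigrams, padding each text with two start markers."""
    counts = Counter()
    for text in corpus:
        padded = "^^" + text
        for i in range(2, len(padded)):
            counts[padded[i - 2:i + 1]] += 1
    return counts


def predict_keys(keys, counts, shifted=str.upper):
    """Greedily choose the unshifted or shifted character for each typed key."""
    out = "^^"
    for k in keys:
        candidates = [k, shifted(k)]
        # max() keeps the first candidate on ties, so unshifted is the default.
        best = max(candidates, key=lambda c: counts[out[-2:] + c])
        out += best
    return out[2:]


# Toy corpus: uppercase stands in for a shifted character (illustrative only).
counts = train_trigrams(["aBc", "aBd", "xbc"])
```

Here `predict_keys("abc", counts)` restores the shifted middle character and returns `"aBc"`, while `predict_keys("xbc", counts)` leaves the input unchanged.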

Error-correction Rules Some character sequences often generate errors. Error patterns are collected for the error-correction process. Mutual information is used to optimize the patterns. [Figure: training corpus → tri-gram prediction → errors from tri-gram prediction → error-correction rules]
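One way to picture the collection step is to run the predictor over the training corpus and record, for every wrong character, the erroneous window together with the correct one. The sketch below does only that; the window sizes `left` and `right`, the function name, and the stand-in predictor are hypothetical, and the mutual-information optimisation mentioned on the slide is not implemented here.

```python
from collections import Counter


def collect_error_rules(corpus, unshift, predict, left=2, right=1):
    """Map (erroneous window, correct window) pairs to how often they occur."""
    rules = Counter()
    for truth in corpus:
        guess = predict(unshift(truth))
        for i, (g, t) in enumerate(zip(guess, truth)):
            if g != t:
                lo, hi = max(0, i - left), i + right + 1
                rules[guess[lo:hi], truth[lo:hi]] += 1
    return rules


# Stand-in predictor that never shifts; a tri-gram predictor would go here.
# Uppercase stands in for shifted characters (illustrative only).
rules = collect_error_rules(["aBc", "xbc"], unshift=str.lower, predict=lambda keys: keys)
```

For this toy corpus the only collected rule maps the erroneous sequence `"abc"` to the correct pattern `"aBc"`.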

Error-correction Rule Reduction [Table columns: Pattern No., Error Key Sequences, Correct Patterns. Rows as extracted, with the two pattern columns fused: 1. k[lkl9; 2. mpklkl9; 3. kkklkl9; 4. lkl9]

Pattern Shortening [Equations (4) and (5)]

The Error-correction Process Finite-automata pattern matching is applied. The longest pattern is selected. [Figure: key input → finite automata → terminal? (No / Yes → correction)]
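Longest-pattern matching with a finite automaton can be sketched with a trie whose terminal states carry the correction. This is a minimal illustration, not the authors' implementation: the names `build_trie` and `correct`, the `"$fix"` terminal marker and the toy rules are assumptions.

```python
def build_trie(rules):
    """Build a trie (a simple finite automaton) from {error pattern: correction}."""
    root = {}
    for pattern, correction in rules.items():
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node["$fix"] = correction  # marks a terminal state
    return root


def correct(text, trie):
    """Scan the text, replacing the longest matching error pattern at each position."""
    out, i = [], 0
    while i < len(text):
        node, j, best = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$fix" in node:
                best = (j, node["$fix"])  # remember the longest terminal so far
        if best:
            i, fix = best
            out.append(fix)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Toy rules: uppercase stands in for shifted characters (illustrative only).
trie = build_trie({"ab": "aB", "abc": "aBC"})
```

With these rules, `correct("xabcab", trie)` returns `"xaBCaB"`: the longer pattern `"abc"` wins over `"ab"` at the first match, and `"ab"` applies at the second.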

Experiments Language Identification and Key Prediction

Language Identification Result Bi-grams are employed. An artificial corpus is used for testing. The first 6 characters are enough. [Table: m (number of first characters) vs. identification accuracy (%)]

Key Prediction Corpus Information 25 MB training corpus. 5 MB test corpus. [Table: share of unshift and shift alphabets in the training corpus (%) and test corpus (%)]

Key Prediction Result [Table: prediction accuracy on the training corpus (%) and test corpus (%) for tri-gram prediction and for tri-gram prediction + error correction]

Conclusions The Approach and Future Works

The Approach We have applied the tri-gram model and error-correction rules to intelligent Thai key prediction and English-Thai language identification. The experiments report 99 percent accuracy.

Future Works We expect this technique to be applicable to other Asian languages and multilingual systems. Our future work is to apply the algorithm to mobile phones, handheld devices and multilingual input systems.

The End Questions and Answers