NLP in Thailand
by Asanee Kawtrakul, Kasetsart University
Outline
• Thai language features
• What we do and the problems
• The main actors
• Research model and infrastructure
• What more do we need?
Thai Language Characteristics
• Dialects and tone
• Isolating language
• Uninflected
• Monosyllabic
• No word delimiters
• Same form, several functions
• Same form, several meanings
Grammar Coverage
Word formation/recognition:
• Compound words vs. sentences
• Proper names vs. common nouns
• Loan words (transliterated foreign words) with no special orthography
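Thai text has no word delimiters, so word segmentation usually comes first, and the compound-word and proper-name problems listed here are ambiguities it has to resolve. The slides do not say which segmentation method is used. The sketch below assumes dictionary-based maximal matching, a common baseline that chooses the segmentation with the fewest lexicon words. The function name `maximal_matching`, the toy lexicon and the `UNKNOWN_PENALTY` value are hypothetical. A real system would need a large lexicon and extra handling for names and loan words.

```python
def maximal_matching(text: str, lexicon: set[str], max_word_len: int = 20) -> list[str]:
    """Segment text into the fewest lexicon words (maximal matching).

    Characters not covered by any lexicon word are emitted as single-character
    tokens, each charged a large penalty so real words are always preferred.
    Requires Python 3.9+ for the built-in generic type hints.
    """
    n = len(text)
    INF = float("inf")
    UNKNOWN_PENALTY = 1000  # hypothetical cost for an out-of-lexicon character

    # best[i] = (cost of best segmentation of text[:i], start index of its last token)
    best: list[tuple[float, int]] = [(INF, -1)] * (n + 1)
    best[0] = (0, -1)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            if best[start][0] == INF:
                continue  # prefix text[:start] cannot be segmented
            piece = text[start:end]
            if piece in lexicon:
                cost = best[start][0] + 1  # one more dictionary word
            elif end - start == 1:
                cost = best[start][0] + UNKNOWN_PENALTY  # unknown single character
            else:
                continue
            if cost < best[end][0]:
                best[end] = (cost, start)

    # Walk back through the stored split points to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    toy_lexicon = {"ไป", "หา", "หาม", "มเหสี", "เห", "สี"}  # hypothetical toy lexicon
    print(maximal_matching("ไปหามเหสี", toy_lexicon))
```

With this toy lexicon the string ไปหามเหสี can also be split as ไป|หาม|เห|สี, but maximal matching returns ไป|หา|มเหสี because it uses three words instead of four. Ties between equally short segmentations are broken only by loop order here, which is one reason statistical approaches are used alongside dictionaries.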
Dialects
• North: 18.8%
• Northeast: 34.2%
• Central: 33.7%
• South: 13.3%
Text Processing
Text Processing and Problems
Approaches:
• Statistical-based approach
• Knowledge-based approaches
Problems:
• Lack of a legal corpus
• Small corpora
• Lack of standards (POS tags, semantic concepts)
• Redundant work
Speech Processing
• Speech recognition
• Speech generation
Speech Processing and Problems
Recognition and generation:
• Not only dialect but also tone
• Isolated words, not continuous speech
• Word boundary detection
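Word boundary detection is what restricts recognition to isolated words. A textbook baseline for finding where an isolated word starts and ends is short-time energy thresholding. The slides do not describe their detector, so this is only a sketch under stated assumptions: the name `detect_word_boundaries`, the frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate), the fixed `threshold` and `min_frames` are all hypothetical. Energy alone misses weak consonants and carries no pitch information, so it says nothing about tone.

```python
def detect_word_boundaries(samples: list[float], frame_len: int = 160,
                           threshold: float = 0.01,
                           min_frames: int = 3) -> list[tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose short-time energy
    stays at or above `threshold` for at least `min_frames` frames.

    All parameter defaults are hypothetical; real systems adapt the
    threshold to the background noise level.
    """
    segments = []
    start = None  # index of the first frame of the current speech run
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored

    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        if energy >= threshold:
            if start is None:
                start = f  # speech run begins
        elif start is not None:
            # Speech run ended; keep it only if it is long enough to be a word.
            if f - start >= min_frames:
                segments.append((start * frame_len, f * frame_len))
            start = None

    # Close a run that lasts until the end of the signal.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments


if __name__ == "__main__":
    # Synthetic signal: silence, a "word", silence, another "word", silence.
    signal = [0.0] * 800 + [0.5] * 800 + [0.0] * 800 + [-0.5] * 800 + [0.0] * 160
    print(detect_word_boundaries(signal))
```

The `min_frames` check discards short bursts such as clicks, at the cost of also discarding very short syllables; choosing it is a trade-off that depends on the speech data.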
Image Processing
• Thai optical character recognition
• Handwriting recognition
OCR and Problems
• Isolated characters
The Main Actors
• Universities
• NECTEC (National Electronics and Computer Technology Center), Ministry of Science, Technology and Environment
• SIGNLP
The Main Actors
• More than 50 experienced researchers (at least 5 years of research)
• More than 100 young researchers
Financial Supporters
• National Electronics and Computer Technology Center (NECTEC)
• National Research Council of Thailand (NRCT)
• Kasetsart University Research and Development Institute (KURDI)
• Thailand Research Fund (TRF)
• etc.
Research Model and Infrastructure
• Short term: simple, but it works
• Long term: robust and very large scale
• Collaboration between end users, universities and funding agencies (including the private sector)
• Enlarge the number of researchers
What More Do We Need?
• Share resources (corpora, dictionaries, tools, etc.)
• Share experience and knowledge
• Set up a big umbrella project and distribute the workload
• Establish a research network
• Partnerships
Conclusion
• Most Thais use the Thai language.
• Thai language processing has a good future in the market IF...
• ...we have more collaborative work.
• An NLP market of half of 60 million people (about 30 million)