VNLP: An Open Source Framework for Vietnamese Natural Language Processing
Ngoc Minh Le, Bich Ngoc Do, Vi Duong Nguyen, Thi Dam Nguyen (ePi Technology)
Major tasks in Natural Language Processing
High-level applications (automatic summarization, machine translation, sentiment analysis, ...) are built on fundamental tasks (word segmentation, part-of-speech tagging, ...).
Word segmentation, part-of-speech (POS) tagging, syntactic parsing, named-entity recognition (NER) and coreference resolution are fundamental tasks in Natural Language Processing (NLP). Researchers have to carry out these tasks even though bringing them to a deliverable state costs a great deal of time and money.
Fundamental Tasks
Word segmentation, part-of-speech tagging, syntactic parsing, named-entity recognition (NER), coreference resolution
Framework for Vietnamese NLP?
English has Stanford CoreNLP, an integrated framework for English NLP. Is there an equivalent framework for Vietnamese natural language processing?
JVnTextPro
JVnTextPro provides a tokenizer and a POS tagger. Is that enough? If not, what is the solution?
Word segmentation
vnTokenizer achieves an accuracy of up to 96% to 98%. Improvements made to speed up vnTokenizer: reading XML-encoded data via SAX; tokenizing a document with an LL parser; using an automaton with default transitions.
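The automaton idea can be illustrated with dictionary-based longest matching over syllables. This is a minimal sketch, assuming a tiny made-up lexicon and plain greedy forward matching; vnTokenizer's own dictionary, ambiguity resolution and SAX/LL-parser input handling are not reproduced. The syllable trie plays the role of the automaton, and the "default transition" is modeled (as an assumption) as emitting a lone syllable when no dictionary word starts at the current position. Multi-syllable words are joined with underscores.

```python
import unicodedata
from typing import Dict, List

# Hypothetical toy lexicon; NOT vnTokenizer's dictionary.
LEXICON = ["học sinh", "sinh học", "học", "sinh", "nghiên cứu",
           "xử lý", "ngôn ngữ", "tự nhiên"]

END = "$"  # key marking that a complete dictionary word ends at this node


def _nfc(text: str) -> str:
    """Normalize Vietnamese diacritics to composed form so lookups agree."""
    return unicodedata.normalize("NFC", text)


def build_trie(words: List[str]) -> Dict:
    """Build a syllable-level trie: a deterministic automaton over syllables."""
    root: Dict = {}
    for word in words:
        node = root
        for syllable in _nfc(word).split():
            node = node.setdefault(syllable, {})
        node[END] = True
    return root


def segment(sentence: str, trie: Dict) -> List[str]:
    """Greedy forward longest matching over syllables."""
    syllables = _nfc(sentence).split()
    tokens: List[str] = []
    i = 0
    while i < len(syllables):
        node = trie
        best_end = i + 1  # default transition: a single syllable is a token
        j = i
        # Follow the automaton as far as the input allows.
        while j < len(syllables) and syllables[j] in node:
            node = node[syllables[j]]
            j += 1
            if END in node:
                best_end = j  # remember the longest complete word seen so far
        tokens.append("_".join(syllables[i:best_end]))
        i = best_end
    return tokens
```

For example, `segment("xử lý ngôn ngữ tự nhiên", build_trie(LEXICON))` yields `["xử_lý", "ngôn_ngữ", "tự_nhiên"]`. Greedy matching is fast but can mis-split ambiguous sequences such as "học sinh học", which is why real segmenters add a disambiguation step.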
Part-of-speech tagging
Accuracy of existing taggers: JVnTagger 91.3%, VnTagger 95%, VnQTag 85.57%.
Syntactic parsing
Vietnamese parsing approaches based on tree-adjoining grammar, head-driven phrase structure grammar,… have produced no deliverable software.
MaltParser: open source, language-independent, acceptable accuracy (70%).
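MaltParser reads its input in the CoNLL-X tab-separated format by default, so word-segmented, POS-tagged sentences have to be converted before parsing. A minimal sketch, assuming each sentence arrives as a list of (word, tag) pairs from the tagger; the function name `to_conllx` and the tags in the example are illustrative only. HEAD and DEPREL are left as "_" because the parser fills them in.

```python
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, POS tag) pairs


def to_conllx(sentences: List[TaggedSentence]) -> str:
    """Write tagged sentences as CoNLL-X (10 tab-separated columns per token).

    Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL.
    Unknown or parser-assigned columns are written as "_".
    """
    lines: List[str] = []
    for sentence in sentences:
        for idx, (form, tag) in enumerate(sentence, start=1):
            columns = [str(idx), form, "_", tag, tag, "_", "_", "_", "_", "_"]
            lines.append("\t".join(columns))
        lines.append("")  # a blank line terminates each sentence
    return "\n".join(lines) + "\n"
```

The same tag is copied into the coarse and fine POS columns here; a tagset with a separate coarse level would fill them differently.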
Named-entity recognition
VNLP uses a rule-based NER with two parts: a word-lookup component, called a gazetteer in GATE terminology, and a pattern-matching component, called a transducer. Accuracy: 59%.
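The two-stage design (lookup, then patterns over the lookups) can be sketched as follows. This is a minimal illustration, assuming word-segmented input tokens; the gazetteer entries, category names and the three rules are hypothetical stand-ins, written in Python rather than GATE's JAPE grammar language.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple


def _key(text: str) -> str:
    """Normalize Unicode (NFC) and case so lookups ignore those differences."""
    return unicodedata.normalize("NFC", text).lower()


# Hypothetical gazetteer entries; GATE would load these from list files.
GAZETTEER: Dict[str, str] = {
    _key(word): category
    for word, category in [
        ("Hà_Nội", "LOCATION"),
        ("Việt_Nam", "LOCATION"),
        ("ông", "PERSON_TITLE"),
        ("bà", "PERSON_TITLE"),
        ("công_ty", "ORG_TRIGGER"),
    ]
}


def gazetteer_lookup(tokens: List[str]) -> List[Optional[str]]:
    """Stage 1 (gazetteer): tag each token with its list category, or None."""
    return [GAZETTEER.get(_key(tok)) for tok in tokens]


def transduce(tokens: List[str],
              lookups: List[Optional[str]]) -> List[Tuple[str, str]]:
    """Stage 2 (transducer): apply hand-written patterns over the lookups."""
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        category = lookups[i]
        if category == "LOCATION":
            # Rule 1: a gazetteer location is a LOCATION entity.
            entities.append((tokens[i], "LOCATION"))
            i += 1
            continue
        if category in ("PERSON_TITLE", "ORG_TRIGGER"):
            # Collect the run of capitalized, non-gazetteer tokens after the trigger.
            j = i + 1
            while j < len(tokens) and tokens[j][:1].isupper() and lookups[j] is None:
                j += 1
            if j > i + 1:
                if category == "PERSON_TITLE":
                    # Rule 2: title + capitalized name -> PERSON (title excluded).
                    entities.append((" ".join(tokens[i + 1:j]), "PERSON"))
                else:
                    # Rule 3: "công_ty" + capitalized name -> ORGANIZATION.
                    entities.append((" ".join(tokens[i:j]), "ORGANIZATION"))
                i = j
                continue
        i += 1
    return entities
```

With `tokens = ["Ông", "Nguyễn_Văn_An", "làm_việc", "tại", "công_ty", "ABC", "ở", "Hà_Nội"]`, `transduce(tokens, gazetteer_lookup(tokens))` returns a PERSON, an ORGANIZATION and a LOCATION. Rule-based systems like this are transparent but only as good as their lists and patterns, which is consistent with a modest accuracy figure.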
Coreference resolution
A heuristic, rule-based approach with two components: an orthographic matcher (orthomatcher) with 17 rules, and a co-referencer that performs pronominal coreference resolution and integrates all results into coreference lists (chains).
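How the two components cooperate can be sketched in a few lines. This is a minimal sketch under stated assumptions: it implements only three illustrative orthographic rules (exact match, acronym, Vietnamese given name as last syllable), not the matcher's 17 rules, and it resolves a pronoun to the most recent PERSON chain. The pronoun list and mention format are hypothetical.

```python
from typing import List, Optional, Tuple

Mention = Tuple[str, str]  # (surface text, entity type), in document order

# Hypothetical pronoun list; only these surface forms are resolved.
PRONOUNS = {"ông ấy", "bà ấy", "anh ấy", "chị ấy"}


def acronym(text: str) -> str:
    """First letter of each syllable, upper-cased: 'Đại học Quốc gia' -> 'ĐHQG'."""
    return "".join(word[0] for word in text.split()).upper()


def ortho_match(a: str, b: str, etype: str) -> bool:
    """Three illustrative orthographic rules (not the matcher's actual 17)."""
    if a.lower() == b.lower():  # exact match, ignoring case
        return True
    for short, long in ((a, b), (b, a)):
        multi = len(long.split()) > 1
        if multi and short.isupper() and short == acronym(long):  # acronym
            return True
        # Vietnamese names put the given name last: "Nguyễn Văn An" ~ "An".
        if multi and etype == "PERSON" and short == long.split()[-1]:
            return True
    return False


def build_chains(mentions: List[Mention]) -> List[List[int]]:
    """Group mention indices into coreference chains."""
    chain_of: List[Optional[int]] = [None] * len(mentions)
    chains: List[List[int]] = []
    last_person: Optional[int] = None
    for i, (text, etype) in enumerate(mentions):
        if etype == "PRONOUN":
            # Pronominal step: attach to the most recent PERSON chain.
            if text.lower() in PRONOUNS and last_person is not None:
                chains[last_person].append(i)
                chain_of[i] = last_person
            continue
        # Orthographic step: join the first earlier chain of the same type that matches.
        target: Optional[int] = None
        for j in range(i):
            if (chain_of[j] is not None and mentions[j][1] == etype
                    and ortho_match(text, mentions[j][0], etype)):
                target = chain_of[j]
                break
        if target is None:
            target = len(chains)
            chains.append([])
        chains[target].append(i)
        chain_of[i] = target
        if etype == "PERSON":
            last_person = target
    return chains
```

For the mentions "Nguyễn Văn An", "Đại học Quốc gia", "ĐHQG", "An" and "ông ấy", this produces one chain for the person (indices 0, 3, 4) and one for the university (indices 1, 2). Choosing the nearest preceding person is a crude heuristic; gender and number agreement would be natural refinements.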
Open Source Framework for Vietnamese NLP (VNLP)
Pipeline components: Document Reset PR, VnTokenizer, Sentence splitter, VnTagger, MaltParser (syntactic parsing), Vn-NER (named-entity recognition) and Co-reference.
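The framework chains processing resources (PRs) that each add annotations to a shared document. Below is a minimal sketch of that architecture, assuming the component order shown in the list above and a GATE-like "run each PR in turn" model. The `Document` and `Pipeline` classes are hypothetical, and the tokenizer and sentence splitter are simplified stand-ins, not the real VnTokenizer or splitter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


# A processing resource (PR) reads the document and adds annotations in place.
ProcessingResource = Callable[[Document], None]


def document_reset(doc: Document) -> None:
    """Remove annotations left by an earlier run so the pipeline can be re-run."""
    doc.annotations.clear()


def whitespace_tokenizer(doc: Document) -> None:
    """Stand-in for VnTokenizer: splits on whitespace, no syllable grouping."""
    doc.annotations["Token"] = doc.text.split()


def sentence_splitter(doc: Document) -> None:
    """Group tokens into sentences, closing one after a token ending in . ! or ?"""
    sentences: List[List[str]] = []
    current: List[str] = []
    for tok in doc.annotations.get("Token", []):
        current.append(tok)
        if tok.endswith((".", "!", "?")):
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    doc.annotations["Sentence"] = sentences


class Pipeline:
    def __init__(self) -> None:
        self.stages: List[Tuple[str, ProcessingResource]] = []

    def add(self, name: str, pr: ProcessingResource) -> "Pipeline":
        self.stages.append((name, pr))
        return self

    def run(self, doc: Document) -> List[str]:
        """Run every PR in order and return the names of the stages executed."""
        executed: List[str] = []
        for name, pr in self.stages:
            pr(doc)
            executed.append(name)
        return executed


pipeline = (
    Pipeline()
    .add("Document Reset PR", document_reset)
    .add("VnTokenizer", whitespace_tokenizer)
    .add("Sentence splitter", sentence_splitter)
    # VnTagger, MaltParser, Vn-NER and Co-reference would be appended the same way.
)
```

Running `pipeline.run(Document("Tôi đi học. Trời đẹp quá!"))` leaves two sentences in `annotations["Sentence"]`. Putting the reset PR first means re-running the pipeline on the same document does not duplicate annotations.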
Applications of VNLP
Automatic synthesis and classification of webpages.
Online Reputation Management (noti5.vn): an application of sentiment analysis that collects all mentions of a brand and determines whether each expresses a positive or negative opinion.
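The brand-monitoring idea (find every mention of a brand, then decide its polarity) can be illustrated with a simple lexicon-based scorer. This is a minimal sketch, not the method used by noti5.vn: the opinion word lists and the brand name "ABC" are hypothetical, and negation (e.g. "không tốt") is deliberately not handled.

```python
import re
from typing import List, Tuple

# Hypothetical opinion lexicons.
POSITIVE = ["tốt", "tuyệt vời", "hài lòng"]
NEGATIVE = ["kém", "thất vọng", "chậm"]


def count_phrases(text: str, phrases: List[str]) -> int:
    """Count whole-word occurrences of each phrase (\\b is Unicode-aware for str)."""
    return sum(len(re.findall(rf"\b{re.escape(p)}\b", text)) for p in phrases)


def brand_opinions(posts: List[str], brand: str) -> List[Tuple[str, str]]:
    """Keep posts mentioning the brand and label each by lexicon score."""
    results: List[Tuple[str, str]] = []
    for post in posts:
        lowered = post.lower()
        if brand.lower() not in lowered:
            continue  # not a mention of this brand
        score = count_phrases(lowered, POSITIVE) - count_phrases(lowered, NEGATIVE)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        results.append((post, label))
    return results
```

Applied to a small set of posts about "ABC", posts with more positive than negative words are labeled positive, the reverse negative, and posts that mention the brand without opinion words neutral. A lexicon approach is easy to audit; a trained classifier over VNLP's tokens and tags would be the natural next step.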
PART 5 – CONCLUSION AND FUTURE WORK
Thank you for your attention!
Q & A