Published by Claribel Henderson. Modified over 9 years ago.
1
VNLP: An Open Source Framework for Vietnamese Natural Language Processing
Ngoc Minh Le, Bich Ngoc Do, Vi Duong Nguyen, Thi Dam Nguyen (ePi Technology)
2
Major tasks in Natural Language Processing
High-level applications: automatic summarization, machine translation, sentiment analysis, ...
Fundamental tasks: word segmentation, part-of-speech tagging, ...
Word segmentation, part-of-speech (POS) tagging, syntactic parsing, named-entity recognition (NER) and co-reference resolution are fundamental tasks in Natural Language Processing (NLP). Researchers have to carry out these tasks, even though bringing them to a deliverable state costs a great deal of time and money.
3
Fundamental tasks
Word segmentation, part-of-speech tagging, syntactic parsing, named-entity recognition (NER) and coreference resolution
4
Framework for Vietnamese NLP?
Stanford CoreNLP is a framework for English. Is there a comparable framework for Vietnamese natural language processing?
5
JVnTextPro
JVnTextPro offers a tokenizer and POS tagging. Is that enough? What is the solution?
6
Word segmentation
vnTokenizer, with an accuracy of 96% to 98%.
Several improvements were made to speed up vnTokenizer: reading XML-encoded data via SAX, tokenizing a document with an LL parser, and using an automaton with default transitions.
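The slide names the speed-ups without detail. As a hedged illustration of the last one, here is a minimal Python sketch of dictionary-based longest-match segmentation over a syllable automaton in which every missing edge is replaced by one default transition into a dead state. The lexicon, the underscore-joined word format and the greedy strategy are assumptions for illustration; this is not vnTokenizer's code, and vnTokenizer's own ambiguity handling is not reproduced.

```python
from typing import Dict, List


class State:
    """One state of the lexicon automaton; edges are labelled by syllables."""

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.final = False  # True if the path to this state spells a lexicon word


DEAD = State()  # sink state: target of the default transition


def build_automaton(lexicon: List[str]) -> State:
    """Build a trie-shaped automaton from words whose syllables are joined by '_'."""
    start = State()
    for word in lexicon:
        state = start
        for syllable in word.split("_"):
            state = state.edges.setdefault(syllable, State())
        state.final = True
    return start


def step(state: State, syllable: str) -> State:
    # Default transition: any syllable without an explicit edge leads to DEAD,
    # so no state needs an edge for every possible syllable.
    return state.edges.get(syllable, DEAD)


def segment(syllables: List[str], start: State) -> List[str]:
    """Greedy longest match; syllables not starting any lexicon word stand alone."""
    words: List[str] = []
    i = 0
    while i < len(syllables):
        state, longest, j = start, 1, i
        while j < len(syllables):
            state = step(state, syllables[j])
            if state is DEAD:
                break
            j += 1
            if state.final:
                longest = j - i  # remember the longest word ending here
        words.append("_".join(syllables[i:i + longest]))
        i += longest
    return words


# Hypothetical lexicon: sinh_viên (student), đại_học (university), ...
automaton = build_automaton(["sinh_viên", "đại_học", "học_sinh", "sinh"])
print(segment("sinh viên đại học giỏi".split(), automaton))
# prints ['sinh_viên', 'đại_học', 'giỏi']
```

Greedy longest match is a simple baseline and can mis-split overlapping words; it is used here only to show where the default transition saves space.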
7
Part-of-speech tagging
Accuracy: JVnTagger 91.3%, VnTagger 95%, VnQTag %
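The slide gives accuracies without stating how they were measured. Assuming the usual per-token definition for POS tagging, this sketch computes it; the tag labels (P, V, Np, N, A) and the two tiny sentences are made up for illustration.

```python
from typing import List


def tagging_accuracy(gold: List[List[str]], predicted: List[List[str]]) -> float:
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same sentences")
    correct = total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("sentence lengths differ")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    return correct / total if total else 0.0


gold = [["P", "V", "Np"], ["N", "A"]]
predicted = [["P", "V", "N"], ["N", "A"]]
print(tagging_accuracy(gold, predicted))  # prints 0.8
```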
8
Syntactic parsing: MaltParser
Approaches such as tree-adjoining grammar and head-driven phrase structure grammar have produced no deliverable software. MaltParser is open source, independent of language, and reaches an acceptable accuracy of 70%.
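MaltParser is a data-driven, transition-based dependency parser: a trained classifier chooses transitions over a stack and a buffer. One transition system it supports is Nivre's arc-eager system. The sketch below shows only that transition system, driven by a static oracle that reads a known projective gold tree instead of a classifier (its output is the kind of sequence such a classifier is trained on). The sentence and its head indices are hypothetical, and this is not MaltParser's code.

```python
from typing import Dict, List, Tuple


def arc_eager_oracle(gold_heads: Dict[int, int]) -> Tuple[List[str], Dict[int, int]]:
    """Derive arc-eager transitions that rebuild a projective gold tree.

    Tokens are numbered 1..n, 0 is an artificial ROOT, and gold_heads maps
    each token to its head. Returns the transitions and the arcs built.
    """
    stack: List[int] = [0]
    buffer: List[int] = sorted(gold_heads)
    heads: Dict[int, int] = {}
    transitions: List[str] = []
    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and gold_heads[s] == b:
            # LEFT-ARC: b becomes the head of s; s is complete, pop it.
            heads[s] = b
            stack.pop()
            transitions.append("LEFT-ARC")
        elif gold_heads[b] == s:
            # RIGHT-ARC: s becomes the head of b; push b so it can take dependents.
            heads[b] = s
            stack.append(buffer.pop(0))
            transitions.append("RIGHT-ARC")
        elif s in heads and not any(gold_heads[d] == s for d in buffer):
            # REDUCE: s already has its head and no dependents left to collect.
            stack.pop()
            transitions.append("REDUCE")
        else:
            # SHIFT: move b onto the stack to wait for its head or dependents.
            stack.append(buffer.pop(0))
            transitions.append("SHIFT")
    return transitions, heads


# "Tôi mua sách hôm_nay" (I bought books today): hypothetical gold tree.
gold = {1: 2, 2: 0, 3: 2, 4: 2}
transitions, heads = arc_eager_oracle(gold)
print(transitions)
# prints ['SHIFT', 'LEFT-ARC', 'RIGHT-ARC', 'RIGHT-ARC', 'REDUCE', 'RIGHT-ARC']
print(heads == gold)  # prints True
```

The oracle assumes a projective tree; for non-projective input it may not recover every gold arc.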
9
Named-Entity Recognition
NER uses a rule-based method consisting of two parts: a word-searching component, called a gazetteer in GATE's terminology, and a pattern-matching component, called a transducer. Accuracy: 59%.
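The two-stage design can be sketched in Python. In GATE the second stage would be written as JAPE rules; this sketch only mimics the idea. The gazetteer entries, the two rules, and the whitespace (syllable-level) tokenization are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: token sequence -> lookup type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Hà", "Nội"): "Location",
    ("Việt", "Nam"): "Location",
    ("ông",): "PersonTitle",
    ("bà",): "PersonTitle",
}
MAX_LEN = max(len(entry) for entry in GAZETTEER)

Span = Tuple[int, int, str]  # (start token, end token exclusive, type)


def gazetteer_lookup(tokens: List[str]) -> List[Span]:
    """Stage 1: longest-match search for gazetteer entries, left to right."""
    lookups: List[Span] = []
    i = 0
    while i < len(tokens):
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in GAZETTEER:
                lookups.append((i, i + length, GAZETTEER[key]))
                i += length
                break
        else:
            i += 1  # no entry starts here
    return lookups


def transduce(tokens: List[str], lookups: List[Span]) -> List[Span]:
    """Stage 2: pattern rules over the lookups (two hypothetical rules)."""
    entities: List[Span] = []
    for start, end, kind in lookups:
        if kind == "Location":
            # Rule 1: a Location lookup is a Location entity.
            entities.append((start, end, "Location"))
        elif kind == "PersonTitle":
            # Rule 2: a title followed by capitalised tokens marks a Person.
            j = end
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            if j > end:
                entities.append((end, j, "Person"))
    return entities


tokens = "ông Nguyễn Văn Nam sống ở Hà Nội".split()
for start, end, kind in transduce(tokens, gazetteer_lookup(tokens)):
    print(kind, " ".join(tokens[start:end]))
# prints:
# Person Nguyễn Văn Nam
# Location Hà Nội
```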
10
Coreference resolution
Coreference resolution follows a heuristic, rule-based approach with two components: an orthographic matcher (OrthoMatcher) with 17 rules, and a co-referencer that performs pronominal co-reference resolution and integrates everything into co-reference lists.
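The 17 rules are not listed, so this Python sketch uses two hypothetical orthographic rules plus a simple pronominal heuristic (link a pronoun to the most recent Person chain) to show how matched mentions end up in co-reference lists. Mention positions, types, and the example sentence are made up.

```python
from typing import List, Tuple

Mention = Tuple[int, str, str]  # (position, surface text, type)


def ortho_match(a: str, b: str) -> bool:
    """Two hypothetical orthographic rules: identical strings, or a one-word
    name equal to the last word of a longer name (Vietnamese given name)."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=lambda name: len(name.split()))
    return len(short.split()) == 1 and long_.split()[-1] == short


def resolve(mentions: List[Mention]) -> List[List[int]]:
    """Group mention positions into co-reference chains."""
    chains: List[List[int]] = []
    reps: List[Tuple[str, str]] = []  # (first name seen, type) for each chain
    last_person = -1                  # index of the chain of the latest Person
    for pos, text, kind in sorted(mentions):
        if kind == "Pronoun":
            # Pronominal rule: link to the most recent Person chain, if any.
            if last_person >= 0:
                chains[last_person].append(pos)
            continue
        for idx, (rep, rep_kind) in enumerate(reps):
            if rep_kind == kind and ortho_match(rep, text):
                chains[idx].append(pos)
                break
        else:
            # No orthographic match: start a new chain.
            idx = len(chains)
            chains.append([pos])
            reps.append((text, kind))
        if kind == "Person":
            last_person = idx
    return chains


mentions = [
    (0, "Nguyễn Văn Nam", "Person"),
    (5, "Hà Nội", "Location"),
    (8, "ông ấy", "Pronoun"),  # "he"
    (12, "Nam", "Person"),
]
print(resolve(mentions))  # prints [[0, 8, 12], [5]]
```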
11
Open Source Framework for Vietnamese NLP
VNLP pipeline: Document Reset PR → sentence splitter → vnTokenizer → VnTagger → MaltParser (syntactic parsing) → Vn-Ner (named-entity recognition) → co-reference
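Component names such as "Document Reset PR" follow GATE's naming of processing resources. Assuming VNLP chains its components in that style, each stage reads the annotations added by earlier stages. This Python sketch models only that architecture, with naive stand-ins for the first three stages; the Document class and every function here are hypothetical, and the real components are not implemented.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


# A processing resource reads earlier annotations and adds its own.
ProcessingResource = Callable[[Document], None]


def document_reset(doc: Document) -> None:
    """Drop annotations left by a previous run."""
    doc.annotations.clear()


def sentence_splitter(doc: Document) -> None:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    doc.annotations["Sentence"] = [
        s for s in re.split(r"(?<=[.!?])\s+", doc.text.strip()) if s
    ]


def tokenizer(doc: Document) -> None:
    """Placeholder for vnTokenizer: splits syllables and punctuation only,
    without grouping syllables into words."""
    doc.annotations["Token"] = [
        re.findall(r"\w+|[^\w\s]", s) for s in doc.annotations["Sentence"]
    ]


def run_pipeline(doc: Document, resources: List[ProcessingResource]) -> Document:
    for resource in resources:
        resource(doc)  # each stage sees everything earlier stages added
    return doc


doc = run_pipeline(Document("Tôi yêu Việt Nam. Hà Nội đẹp!"),
                   [document_reset, sentence_splitter, tokenizer])
print(doc.annotations["Sentence"])  # prints ['Tôi yêu Việt Nam.', 'Hà Nội đẹp!']
```

Further stages (tagger, parser, NER, co-reference) would be appended to the same list, each reading the annotations it depends on.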
12
Applications of VNLP: automatic synthesis and classification of webpages
Online Reputation Management (noti5.vn): an application of sentiment analysis that collects all mentions of a brand and determines whether they express positive or negative opinions.
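The slide does not say how noti5.vn scores opinions. A common baseline is lexicon-based polarity, sketched here in Python under stated assumptions: word-segmented input with syllables joined by underscores, a tiny hypothetical lexicon, and a placeholder brand name BrandX. It ignores negation (e.g. "không tốt", "not good"), which a real system would need to handle.

```python
from typing import Dict, List

# Hypothetical polarity lexicon over word-segmented Vietnamese tokens.
POSITIVE = {"tốt", "hài_lòng", "nhanh"}    # good, satisfied, fast
NEGATIVE = {"tệ", "chậm", "thất_vọng"}     # bad, slow, disappointed


def brand_opinions(sentences: List[List[str]], brand: str) -> Dict[str, int]:
    """Classify every sentence that mentions the brand by lexicon score."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for tokens in sentences:
        if brand not in tokens:
            continue  # not a mention of this brand
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


sentences = [
    ["dịch_vụ", "của", "BrandX", "rất", "tốt"],   # BrandX's service is very good
    ["BrandX", "giao_hàng", "chậm"],              # BrandX delivers slowly
    ["thời_tiết", "hôm_nay", "tốt"],              # no brand mention: skipped
    ["tôi", "dùng", "BrandX"],                    # I use BrandX
]
print(brand_opinions(sentences, "BrandX"))
# prints {'positive': 1, 'negative': 1, 'neutral': 1}
```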
13
PART 5 – CONCLUSION AND FUTURE WORK
15
Thank you for your attention!
Q & A