Natural language processing tools
Lê Đức Trọng

Crawler and parser tools
  Crawler tools
    crawler4j
    HttpClient
  Parser tools
    htmlParser
    jsoup HTML parser
    Neko HTML parser
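
To make concrete what a parser tool does with a crawled page, here is a minimal sketch that pulls links and visible text out of an HTML string. It uses Python's standard-library html.parser as a stand-in; it is not the API of jsoup, htmlParser, or NekoHTML, and the names LinkAndTextExtractor, parse_page, and SAMPLE_HTML are made up for this illustration.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects href values of <a> tags and text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    """Return (links, text) for one HTML document given as a string."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


# Hypothetical page content, standing in for what a crawler would download.
SAMPLE_HTML = """
<html><head><title>Demo</title><style>p {color: red;}</style></head>
<body><p>Hello <a href="/next.html">next page</a></p>
<script>var x = 1;</script></body></html>
"""

links, text = parse_page(SAMPLE_HTML)
print(links)
print(text)
```

A crawler would typically add the extracted links to its queue of pages to fetch, while the extracted text goes on to the NLP tools on the following slides.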

Vietnamese NLP tools
  JVnTextPro
    Sentence segmentation, sentence tokenization, word segmentation, POS tagging
  VnToolkit
    An automatic tagger for Vietnamese texts
    A tokenizer for automatic word segmentation of Vietnamese texts
    A sentence detector for automatically detecting sentences in Vietnamese texts
  VLSP Tools
    Vietnamese chunking
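
Word segmentation matters for Vietnamese because a word can span several space-separated syllables. The sketch below shows the task with a greedy longest-match strategy over a tiny lexicon. This is a generic textbook technique chosen for illustration, not the algorithm of JVnTextPro or VnToolkit; TOY_LEXICON, segment_words, and max_len are hypothetical, and joining syllables with an underscore is one common output convention.

```python
def segment_words(syllables, lexicon, max_len=4):
    """Greedy longest-match segmentation over a list of syllables.

    At each position, try the longest candidate (up to max_len syllables)
    that is in the lexicon; fall back to a single syllable otherwise.
    """
    words = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate.lower() in lexicon:
                # Join the syllables of one word with underscores.
                words.append("_".join(syllables[i:i + n]))
                i += n
                break
    return words


# Toy lexicon for illustration only; a real tool uses a large dictionary
# and a statistical model to resolve ambiguity.
TOY_LEXICON = {"học sinh", "sinh viên", "đại học", "việt nam"}

sentence = "Sinh viên đại học Việt Nam"
tokens = segment_words(sentence.split(), TOY_LEXICON)
print(tokens)
```

Greedy matching cannot resolve overlapping candidates such as "học sinh" versus "sinh học", which is one reason the tools above combine a lexicon with learned models.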

NLP toolkits
  LingPipe
    Find the names of people, organizations, or locations in news
    Automatically classify Twitter search results into categories
    Suggest correct spellings of queries
  Mallet (Machine Learning for Language Toolkit)
    Statistical natural language processing, document classification, clustering, topic modeling, information extraction
  Stanford NLP software
    Word segmentation, part-of-speech tagging, named entity recognition, chunking, parsing, classification, and coreference resolution
  NLTK
    Open source Python modules, linguistic data, and documentation for research and development in natural language processing and text analytics
  OpenNLP
    Tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution
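
Sentence segmentation and tokenization are the first steps in most of the toolkits above. The following is a rough rule-based sketch of both steps using only Python's re module. It does not call NLTK, OpenNLP, or any other listed toolkit, and the function names split_sentences and tokenize are chosen for this example; real toolkits use trained models that handle abbreviations and other hard cases.

```python
import re

# A sentence boundary: whitespace that follows ., ! or ? and precedes
# an uppercase letter. Abbreviations such as "Dr. Smith" will be split
# wrongly; this is a deliberate simplification.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# A token: a word (optionally with an apostrophe part, e.g. "don't"),
# or a single punctuation character.
_TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text):
    """Split text into sentences with a simple punctuation rule."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Split one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


text = "NLTK is written in Python. OpenNLP runs on the JVM! Which one fits?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sent, toks in zip(sentences, tokens):
    print(sent, "->", toks)
```

Later stages such as POS tagging, named entity recognition, and chunking all consume the token lists produced by this kind of step.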

Machine learning libraries
  Conditional random fields (CRF)
    CRF
  Maximum entropy (Maxent)
    OpenNLP, Mallet
  Support vector machine (SVM)
    libSVM
    svmLight
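
As a rough illustration of what a maximum entropy classifier learns, the sketch below trains binary logistic regression (the two-class special case of Maxent) on bag-of-words features with plain stochastic gradient ascent. It is a toy written for this outline, not the training procedure of OpenNLP or Mallet; the data in TRAIN and the names featurize, predict_proba, and train are invented for the example.

```python
import math
from collections import defaultdict


def featurize(text):
    """Bag-of-words counts plus a constant bias feature."""
    feats = defaultdict(float)
    for tok in text.lower().split():
        feats[tok] += 1.0
    feats["__bias__"] = 1.0
    return feats


def predict_proba(weights, feats):
    """P(label = 1 | features) under the logistic model."""
    z = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Maximize the log-likelihood with per-example gradient steps."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            # Gradient of the log-likelihood: (observed - expected) * feature.
            error = label - predict_proba(weights, feats)
            for f, v in feats.items():
                weights[f] += lr * error * v
    return weights


# Made-up training data: 1 = positive, 0 = negative.
TRAIN = [
    ("great fun movie", 1),
    ("wonderful great acting", 1),
    ("boring dull movie", 0),
    ("dull terrible acting", 0),
]

weights = train(TRAIN)
p_pos = predict_proba(weights, featurize("great fun"))
p_neg = predict_proba(weights, featurize("dull boring"))
print(p_pos > 0.5, p_neg < 0.5)
```

The listed libraries add what this sketch leaves out, such as regularization, multi-class outputs, and efficient optimizers. CRFs extend the same log-linear idea from single labels to label sequences.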