Introduction NLP Applications

Tweets are limited to 140 characters and may also contain images, 😊 emoji, #hashtags, @usernames and links. They are often grammatically ambiguous.
Application: customer service requests made through social media.

Present research: a method developed for extracting keywords from Tweets. It obtains the essential keywords by imitating human question-answering logic.

In answering a question, humans focus on the keywords. For example, in “What is your name?”, the keywords are “your name”.

NLP: Current Tools
Stanford CoreNLP [1], OpenNLP [2], NLP4J [3]
Highest token accuracy for POS tagging: NLP4J, 97.64% [4]

Tweets reduce the token accuracy of POS taggers, because Tweet text is noisy, with linguistic errors and an idiosyncratic style.
Models for POS tagging Tweets: TwitIE [5], TweetNLP [6], Twitter POS tagger for Stanford CoreNLP [7]
The token accuracy of Stanford CoreNLP is 97.32% [4]. The Twitter POS tagger for Stanford CoreNLP recorded an accuracy of 90.5% [7].

Methodology
1. Data Collection
2. Keyword Extraction
3. Implementation

Methodology: 1. Data Collection
Tweets from February and March 2016 were used. Source: the Dialog Axiata Twitter profile.
Keyword Corpus (258 words): keywords, i.e. words essential for the meaning of the sentence.
Rejected Words Corpus (64 words): rejected words, i.e. domain-specific nouns, verbs, interjections and auxiliary verbs.

Methodology: 2. Keyword Extraction
Parser 1: Stanford CoreNLP POS tagging with the Twitter model
Parser 2: Keyword matching
Parser 3: Rejected words matching

Parser 1: Stanford CoreNLP POS Tagging with the Twitter Model
The Tweet is divided into a subject (noun phrase, NP) and a predicate (verb phrase, VP); the NP and VP carry the essence of the meaning.
Kept from the NP: numbers (CD), nouns (NN, all forms) and adjectives (JJ, all forms)
Kept from the VP: verbs (VB, all forms)
Discarded from the NP: usernames, emoji, hashtags and pronouns
Discarded from the VP: adverbs, wh-adverbs and auxiliary verbs
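The tag rules above can be sketched as a filter over an already POS-tagged Tweet. This is a simplified illustration, not the original implementation: it assumes the Tweet has been tagged (for example by Stanford CoreNLP with the Twitter model) into word/tag pairs, it matches Penn Treebank tag prefixes on the flat tag sequence instead of splitting the Tweet into subject and predicate with Tregex, and it does not separate auxiliary verbs from main verbs. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Parser 1's tag rules: keep tokens whose Penn Treebank tag
// marks a number, noun, adjective or verb; all other tokens are treated as "other".
class TagClassifier {

    // One POS-tagged token, e.g. ("club", "NN").
    static final class TaggedWord {
        final String word;
        final String tag;

        TaggedWord(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // CD, NN* (NN, NNS, NNP, NNPS) and JJ* (JJ, JJR, JJS): the NP tags that are kept.
    static boolean isKeptNounPhraseTag(String tag) {
        return tag.equals("CD") || tag.startsWith("NN") || tag.startsWith("JJ");
    }

    // VB* (VB, VBD, VBG, VBN, VBP, VBZ): the VP tags that are kept. Auxiliary verbs
    // carry the same tags, so this sketch does not filter them out.
    static boolean isKeptVerbPhraseTag(String tag) {
        return tag.startsWith("VB");
    }

    // Returns the candidate keywords in their original order. Usernames (USR),
    // pronouns (PRP, PRP$), adverbs (RB*) and wh-adverbs (WRB) fall through.
    static List<TaggedWord> candidates(List<TaggedWord> tagged) {
        List<TaggedWord> kept = new ArrayList<>();
        for (TaggedWord tw : tagged) {
            if (isKeptNounPhraseTag(tw.tag) || isKeptVerbPhraseTag(tw.tag)) {
                kept.add(tw);
            }
        }
        return kept;
    }
}
```

For example, the tokens @dialoglk/USR, please/VB, club/NN, my/PRP$ and 0771111111/CD yield please, club and 0771111111. A flat prefix rule like this would also keep adjectives such as unsubscribe (JJ) and cool (JJ), whereas Fig. 2 lists them under Other, so the original parser's handling of adjectives differs from this sketch.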

Fig. 1 POS-tagged Tweet (Tregex notation)
Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Nouns: club (NN), service (NN), number (NN), 0771111111 (CD)
Verbs: please (VB)
Other: @dialoglk (USR), unsubscribe (JJ), cool (JJ), my (PRP$)
Fig. 2 Results from Parser 1

Parser 2: Keyword Matching
Input: the NPs and VPs identified by Parser 1, and the words that Parser 1 did not classify as NPs or VPs.
The Tweet is matched against a domain-specific Keyword Corpus, so that words Parser 1 did not classify as NPs or VPs can still be recovered as keywords.
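Parser 2 can be sketched as a set lookup. The sketch assumes the leftover words from Parser 1 and the Keyword Corpus are plain strings and that matching is exact but case-insensitive; the slides do not state the matching rule, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of Parser 2: words that Parser 1 did not classify as part of
// an NP or VP are looked up in a domain-specific keyword corpus and recovered if found.
class KeywordMatcher {

    private final Set<String> keywordCorpus = new HashSet<>();

    KeywordMatcher(Iterable<String> corpusWords) {
        // Store lower-cased entries so the lookup is case-insensitive (an assumption).
        for (String w : corpusWords) {
            keywordCorpus.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the leftover words that appear in the keyword corpus, in input order.
    List<String> recover(List<String> leftoverWords) {
        List<String> recovered = new ArrayList<>();
        for (String w : leftoverWords) {
            if (keywordCorpus.contains(w.toLowerCase(Locale.ROOT))) {
                recovered.add(w);
            }
        }
        return recovered;
    }
}
```

With a hypothetical corpus containing unsubscribe and cool, calling recover on the leftover words @dialoglk, unsubscribe, cool and my returns unsubscribe and cool, which corresponds to the Adjectives line that appears in Fig. 3.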

Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Nouns: club (NN), service (NN), number (NN), 0771111111 (CD)
Verbs: please (VB)
Adjectives: unsubscribe (JJ), cool (JJ)
Other: @dialoglk (USR), unsubscribe (JJ), cool (JJ), my (PRP$)
Fig. 3 Result from Parser 2

Parser 3: Rejected Words Matching
Parser 3 removes the noise from the keywords produced by Parser 2: a keyword is removed if it has a Levenshtein distance of 0 (an exact match) to a word in the Rejected Words Corpus.
Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Nouns: club (NN), service (NN), number (NN), 0771111111 (CD)
Verbs: please (VB)
Adjectives: unsubscribe (JJ), cool (JJ)
Other: @dialoglk (USR), my (PRP$)
Fig. 4 Noise removed by Parser 3
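Parser 3's rule can be sketched as follows: compute the Levenshtein distance between each keyword and every entry of the Rejected Words Corpus, and drop the keyword if any distance is within the threshold of 0. Because a distance of 0 means the two strings are identical, this threshold reduces the rule to an exact lookup. The `maxDistance` parameter is a hypothetical generalisation, and the lower-casing and the corpus contents used below are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of Parser 3: remove keywords that lie within maxDistance
// (0 in the slides) of an entry in the rejected-words corpus.
class RejectedWordsFilter {

    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute),
    // keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting all i chars of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps the keywords whose distance to every rejected word exceeds maxDistance.
    static List<String> filter(List<String> keywords, List<String> rejected, int maxDistance) {
        List<String> kept = new ArrayList<>();
        for (String k : keywords) {
            boolean isNoise = false;
            for (String r : rejected) {
                if (levenshtein(k.toLowerCase(Locale.ROOT), r.toLowerCase(Locale.ROOT)) <= maxDistance) {
                    isNoise = true;
                    break;
                }
            }
            if (!isNoise) {
                kept.add(k);
            }
        }
        return kept;
    }
}
```

With a hypothetical Rejected Words Corpus that contains please and a threshold of 0, filtering unsubscribe, cool, club, service, number, 0771111111 and please keeps every word except please, which matches the final keywords list for the example Tweet. A nonzero threshold would also catch misspellings of rejected words, at the risk of removing genuine keywords that happen to be close in spelling.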

Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Final keywords list: unsubscribe (JJ), cool (JJ), club (NN), service (NN), number (NN), 0771111111 (CD)

3. Implementation
The system was implemented in Java.
Fig. 5 GUI of the program

Evaluation Methodology
The system was evaluated using the Turing Test [8]. The criterion is for “the machine to be linguistically indistinguishable from humans” [9].

Evaluation Methodology: Design
14 new Tweets were used.
Keyword sets were generated by humans (6 categories from different fields), by the non-modified system (Sys.A) and by the modified system (Sys.B).
Human supervisors evaluated the responses.

Calculation of the test results
n: total number of Tweets
x: number of Tweets for which the machine and human answers were identical
y: number of Tweets for which the supervisor detected the answer generated by the machine
z: number of Tweets for which the supervisor could not detect the answer generated by the machine
T: total instances where the system was successful (reported as a percentage)
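The slides define the counts but not the formula for T. Assuming each Tweet falls into exactly one of x, y or z (so x + y + z = n) and that the system succeeds whenever its answer was identical to the human one or went undetected, T = (x + z) / n. This assumption reproduces the one row of the results for which all three counts survive (Computer Science Graduates, Sys.B: x = 4, y = 9, z = 1, T = 35.71%). The class name is hypothetical.

```java
// Hypothetical sketch of the per-test-case score, assuming T = (x + z) / n with
// n = x + y + z. The slides define x, y, z and n but do not state this formula.
class TuringTestScore {

    // Returns T as a percentage of the n Tweets in one test case.
    static double successRate(int x, int y, int z) {
        int n = x + y + z; // assumption: each Tweet is counted in exactly one of x, y, z
        if (n <= 0) {
            throw new IllegalArgumentException("at least one Tweet must be counted");
        }
        // Identical answers (x) and undetected machine answers (z) count as successes.
        return 100.0 * (x + z) / n;
    }
}
```

For that row, successRate(4, 9, 1) returns about 35.714, which rounds to the 35.71% reported.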

TABLE I. Summary of Turing Test results for Sys.A
Test case criteria          | Recorded x, y, z values | T
Academics                   | 14                      | 0.00%
English Language Experts    | 2, 12                   | 85.71%
Undergraduates              | 3, 9                    | 35.71%
Graduates                   | 8                       | 42.86%
Computer Science Graduates  | 4, 7                    | 71.43%
General Public              | 1                       | 92.86%

TABLE II. Summary of Turing Test results for Sys.B
Test case criteria          | Recorded x, y, z values | T
Academics                   | 3, 11                   | 78.57%
English Language Experts    | 2, 12                   | 85.71%
Undergraduates              | 5, 7                    | 50.00%
Graduates                   | 6                       | 57.14%
Computer Science Graduates  | 4, 9, 1                 | 35.71%
General Public              |                         |

Summary and Conclusions
The research modifies Stanford CoreNLP with the Twitter POS tagger model using a mix of parsers and corpora.
The modified system produced keyword sets identical to those of humans.
The enhancements increase the overall Turing Test result from 50% to 83.33%.

TABLE III. Summary of Turing Test results
System tested                 | Test cases passed | Test cases failed | Success rate of the system
System without modifications  | 3                 | 3                 | 50.00%
System with modifications     | 5                 | 1                 | 83.33%

Limitations
The only language supported is English.
The system could be evaluated with a larger population for more nuanced results.

Future Work
Use a complete domain-specific corpus to increase accuracy.
Apply the present approach to other NLP tools.

References
[1] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit,” Proc. 52nd Annu. Meet. Assoc. Comput. Linguist. Syst. Demonstr., pp. 55–60, 2014.
[2] “Welcome to Apache OpenNLP,” 2013. [Online]. Available: http://opennlp.apache.org/.
[3] “emorynlp/nlp4j: NLP tools developed by Emory University,” 2016. [Online]. Available: https://github.com/emorynlp/nlp4j.
[4] “POS Tagging (State of the art),” 2016. [Online]. Available: http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art). [Accessed: 22-Aug-2016].
[5] K. Bontcheva, L. Derczynski, A. Funk, M. A. Greenwood, D. Maynard, and N. Aswani, “TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text,” 2013.
[6] O. Owoputi, B. O’Connor, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith, “Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters,” Proc. NAACL, 2013.
[7] L. Derczynski, A. Ritter, S. Clark, and K. Bontcheva, “Twitter part-of-speech tagging for all: Overcoming sparse and noisy data,” Proc. Recent Adv. Nat. Lang. Process., pp. 198–206, Sep. 2013.
[8] A. M. Turing, “Computing Machinery and Intelligence,” Mind, vol. 59, pp. 433–460, 1950.
[9] K. LaCurts, “Criticisms of the Turing Test and Why You Should Ignore (Most of) Them,” Official Blog of MIT’s Course: Philosophy and Theoretical Computer Science, 2011. [Online]. Available: people.csail.mit.edu/katrina/papers/6893.pdf. [Accessed: 23-Jun-2016].
*Images obtained from online sources.