Introduction NLP Applications


Customer Service Requests through Social Media
Tweets: 140 characters. Also contain images, 😊 emoji, #hashtags, @usernames and links. Grammatically ambiguous.

Present Research
A method was developed for extracting keywords from Tweets. It obtains the essential keywords by imitating human question-answering logic.

In answering a question, humans focus on the keywords. Example: in "What is your name?", the keywords are "your name".

NLP - Current Tools
Stanford CoreNLP [1], OpenNLP [2], NLP4J [3]
Highest token accuracy for POS tagging: NLP4J, 97.64% [4]

Tweets affect the token accuracy of POS taggers: Twitter text is noisy, with linguistic errors and idiosyncratic style.
The token accuracy of Stanford CoreNLP is 97.32% [4]; the Twitter POS tagger for Stanford CoreNLP recorded an accuracy of 90.5% [7].
Models for POS tagging of Tweets: TwitIE [5], TweetNLP [6], Twitter POS tagger for Stanford CoreNLP [7]

Methodology
1. Data Collection
2. Keyword Extraction
3. Implementation

Methodology: Data Collection
Tweets from the Dialog Axiata Twitter profile for February and March 2016 were used.
Keyword Corpus (258 words): keywords essential for the meaning of the sentence
Rejected Words Corpus (64 words): domain-specific nouns, verbs, interjections and auxiliary verbs

2. Methodology: Keyword Extraction
Parser 1: Stanford CoreNLP POS Tagging with Twitter Model
Parser 2: Keyword Matching
Parser 3: Rejected Words Matching

Parser 1: Stanford CoreNLP POS Tagging with Twitter Model
The Tweet is divided into a Subject (Noun Phrase, NP) and a Predicate (Verb Phrase, VP). The NP and VP carry the essence of the meaning.
NP: Numbers (CD), Nouns (NN, all forms), Adjectives (JJ, all forms)
VP: Verbs (VB, all forms)
Excluded from the NP: Usernames, Emoji, Hashtags, Pronouns
Excluded from the VP: Adverbs, Wh-adverbs, Auxiliary Verbs

Fig. 1: POS Tagged Tweet (Tregex Notation)
Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Nouns: club (NN), service (NN), number (NN), 0771111111 (CD)
Verbs: please (VB)
Other: @dialoglk (USR), unsubscribe (JJ), cool (JJ), my (PRP$)
Fig. 2: Results from Parser 1
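
The grouping step of Parser 1 can be sketched as follows. This is a minimal illustration in Python, not the authors' Java implementation: it takes tokens that have already been tagged (the tags are copied by hand from the example Tweet rather than produced by Stanford CoreNLP), and the function name `split_np_vp` is hypothetical. It follows the tag lists stated for the NP and VP, so adjectives land in the NP group, whereas Fig. 2 lists the two JJ tokens under "Other".

```python
# Hypothetical helper: groups (token, tag) pairs into NP, VP and other,
# using the tag families named on the slide (Penn Treebank style tags assumed).
NP_TAG_PREFIXES = ("NN", "JJ", "CD")  # nouns, adjectives, numbers (all forms)
VP_TAG_PREFIXES = ("VB",)             # verbs (all forms)


def split_np_vp(tagged_tokens):
    """Return (np_tokens, vp_tokens, other_tokens) for a tagged Tweet."""
    np_tokens, vp_tokens, other_tokens = [], [], []
    for token, tag in tagged_tokens:
        if tag.startswith(NP_TAG_PREFIXES):
            np_tokens.append(token)
        elif tag.startswith(VP_TAG_PREFIXES):
            vp_tokens.append(token)
        else:
            # Usernames, pronouns, adverbs and so on fall through to here.
            other_tokens.append(token)
    return np_tokens, vp_tokens, other_tokens


# Tags copied by hand from the example Tweet; punctuation is left out.
tagged = [
    ("@dialoglk", "USR"), ("Please", "VB"), ("unsubscribe", "JJ"),
    ("cool", "JJ"), ("club", "NN"), ("service", "NN"),
    ("my", "PRP$"), ("number", "NN"), ("0771111111", "CD"),
]
np_tokens, vp_tokens, other_tokens = split_np_vp(tagged)
print("NP:", np_tokens)
print("VP:", vp_tokens)
print("Other:", other_tokens)
```

Matching on tag prefixes is one way to cover "all forms" of a tag family (for example NN, NNS, NNP) without listing each variant.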

Parser 2: Keyword Matching
The Tweet is matched against a Domain Specific Keywords Corpus. The words not classified as NPs and VPs are matched against the corpus, and the matches are kept alongside the NPs and VPs identified by Parser 1.

Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Nouns: club (NN), service (NN), number (NN), 0771111111 (CD)
Verbs: please (VB)
Adjectives: unsubscribe (JJ), cool (JJ)
Other: @dialoglk (USR), my (PRP$)
Fig. 3: Result from Parser 2
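
A rough sketch of the keyword matching step, in Python rather than the Java used for the actual system. The function name `match_keywords`, the four-word corpus and the case-insensitive exact comparison are all assumptions made for illustration; the real Keyword Corpus has 258 words that the slides do not list.

```python
def match_keywords(np_vp_tokens, other_tokens, keyword_corpus):
    """Keep the NP/VP tokens and add any leftover token found in the corpus.

    Assumption: a token matches when it equals a corpus entry, ignoring case.
    """
    corpus = {word.lower() for word in keyword_corpus}
    recovered = [tok for tok in other_tokens if tok.lower() in corpus]
    return list(np_vp_tokens) + recovered


# Token groups as listed in Fig. 2; the corpus below is a made-up sample.
np_vp = ["club", "service", "number", "0771111111", "Please"]
leftover = ["@dialoglk", "unsubscribe", "cool", "my"]
sample_corpus = {"unsubscribe", "cool", "club", "service"}

keywords = match_keywords(np_vp, leftover, sample_corpus)
print(keywords)
```

With this sample corpus, "unsubscribe" and "cool" are recovered from the leftover words, which mirrors how they appear as keywords in Fig. 3.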

Parser 3: Rejected Words Matching
Removes the noise from the keywords produced by Parser 2: keywords that have a Levenshtein distance of 0 to an entry in the Rejected Words Corpus are discarded.
Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Nouns: club (NN), service (NN), number (NN), 0771111111 (CD)
Verbs: please (VB)
Adjectives: unsubscribe (JJ), cool (JJ)
Other: @dialoglk (USR), my (PRP$)
Fig. 4: Noise Removed by Parser 3
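
The rejected-words step can be illustrated with a small Python sketch (the original system is in Java). The Levenshtein distance is the standard dynamic-programming edit distance; the function names, the `max_distance` parameter, the lowercasing and the two-word rejected list are assumptions for illustration, since the 64-word Rejected Words Corpus is not listed in the slides.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def remove_rejected(keywords, rejected_corpus, max_distance=0):
    """Drop keywords within max_distance of any rejected word (case ignored)."""
    rejected = [word.lower() for word in rejected_corpus]
    return [
        kw for kw in keywords
        if not any(levenshtein(kw.lower(), r) <= max_distance for r in rejected)
    ]


# Keywords as produced by Parser 2; the rejected list is a made-up sample.
candidates = ["Please", "unsubscribe", "cool", "club", "service",
              "number", "0771111111"]
final_keywords = remove_rejected(candidates, ["please", "hi"])
print(final_keywords)
```

A distance of 0 is the same as an exact string match; keeping the threshold as a parameter only shows how a looser match could be tried.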

Tweet: @dialoglk Please unsubscribe cool club service .my number 0771111111
Final Keywords List = unsubscribe (JJ), cool (JJ), club (NN), service (NN), number (NN), 0771111111 (CD)

3. Methodology: Implementation
Implemented using Java.
Fig. 5: GUI of the Program

Evaluation Methodology
The system was evaluated using the Turing Test [8], which requires “the machine to be linguistically indistinguishable from humans” [9].

Evaluation Methodology: Design
14 new Tweets were used.
Keyword sets were generated by:
Humans (6 categories from different fields)
Non-modified System (Sys.A)
Modified System (Sys.B)
Human supervisors evaluated the responses.

Calculation of the test results
n: total number of Tweets
x: instances where the Machine and Human answers were identical
y: instances where the Supervisor detected the answer generated by the Machine
z: instances where the Supervisor could not detect the answer generated by the Machine
T: total instances where the system was successful
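
The slides do not spell out how T is computed from x, y and z. One reading that agrees with the reported percentages is T = (x + z) / n, expressed as a percentage, with n = x + y + z: the system counts as successful when its answer is identical to the human one or when the supervisor cannot tell it apart. The Python sketch below uses that assumed formula; the function name is hypothetical.

```python
def turing_score(x, y, z):
    """Percentage of Tweets on which the system succeeded.

    Assumed formula: T = 100 * (x + z) / n, where n = x + y + z.
    """
    n = x + y + z
    if n == 0:
        raise ValueError("at least one Tweet is required")
    return 100.0 * (x + z) / n


# Row reported for Computer Science Graduates with Sys.B: x=4, y=9, z=1.
print(round(turing_score(4, 9, 1), 2))
```

For that row the result rounds to 35.71, the value given in the results table for n = 14 Tweets.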

TABLE I: Summary of Turing Test results for Sys.A
Columns: Test Case Criteria, x, y, z, T (some x, y, z cells are missing from this transcript)
Academics: 14; T = 0.00%
English Language Experts: 2, 12; T = 85.71%
Undergraduates: 3, 9; T = 35.71%
Graduates: 8; T = 42.86%
Computer Science Graduates: 4, 7; T = 71.43%
General Public: 1; T = 92.86%

TABLE II: Summary of Turing Test results for Sys.B
Columns: Test Case Criteria, x, y, z, T (some x, y, z cells are missing from this transcript)
Academics: 3, 11; T = 78.57%
English Language Experts: 2, 12; T = 85.71%
Undergraduates: 5, 7; T = 50.00%
Graduates: 6; T = 57.14%
Computer Science Graduates: 4, 9, 1; T = 35.71%
General Public: (values missing)

Legend: Test Case Failed / Test Case Passed

Summary and Conclusions
The research modifies the Stanford CoreNLP with Twitter POS Tagger Model using a mix of parsers and corpora.
The modified system produced keyword sets identical to those of humans.
The enhancements increase the overall Turing Test result from 50% to 83.33%.

TABLE III: Summary of Turing Test Results
System Tested | Test cases that passed | Test cases that failed | Success rate of the System
System without modifications | 3 | 3 | 50.00%
System with modifications | 5 | 1 | 83.33%
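
The overall success rates in Table III follow from the pass and fail counts over the six test case categories. A short Python check, with a hypothetical function name, assuming the rate is simply passed test cases divided by all test cases:

```python
def overall_success_rate(passed, failed):
    """Share of test cases passed, as a percentage rounded to two decimals."""
    return round(100.0 * passed / (passed + failed), 2)


print(overall_success_rate(3, 3))  # system without modifications
print(overall_success_rate(5, 1))  # system with modifications
```

The two calls give 50.0 and 83.33, matching the rates reported for the unmodified and modified systems.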

Limitations
The system could be evaluated with a larger population for more nuanced results.
The only language supported is English.

Future Work
Use a complete domain-specific corpus to increase accuracy.
The present approach could be applied to other NLP tools.

References
[1] C. D. Manning, J. Bauer, J. Finkel, S. J. Bethard, M. Surdeanu, and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit,” Proc. 52nd Annu. Meet. Assoc. Comput. Linguist. Syst. Demonstr., pp. 55–60, 2014.
[2] “Welcome to Apache OpenNLP,” 2013. [Online]. Available: http://opennlp.apache.org/.
[3] “emorynlp/nlp4j: NLP tools developed by Emory University,” 2016. [Online]. Available: https://github.com/emorynlp/nlp4j.
[4] “POS Tagging (State of the art),” 2016. [Online]. Available: http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art). [Accessed: 22-Aug-2016].
[5] K. Bontcheva, L. Derczynski, A. Funk, M. A. Greenwood, D. Maynard, and N. Aswani, “TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text,” 2013.

[6] O. Owoputi, B. O'Connor, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith, “Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters,” Proc. NAACL, 2013.
[7] L. Derczynski, A. Ritter, S. Clark, and K. Bontcheva, “Twitter part-of-speech tagging for all: Overcoming sparse and noisy data,” Proc. Recent Adv. Nat. Lang. Process., no. September, pp. 198–206, 2013.
[8] A. M. Turing, “Computing Machinery and Intelligence,” Mind, vol. 59, pp. 433–460, 1950.
[9] K. Lacurts, “Criticisms of the Turing Test and Why You Should Ignore (Most of) Them,” Official Blog of MIT’s Course: Philosophy and Theoretical Computer Science, 2011. [Online]. Available: people.csail.mit.edu/katrina/papers/6893.pdf. [Accessed: 23-Jun-2016].
*Images obtained from online sources.