Assignment Demonstration

NATURAL LANGUAGE PROCESSING - Assignment Demonstration
Group 9: Kritika Jain, Vinita Sharma, Rucha Kulkarni

Outline
Part 1: PoS-Tagger (Bigram, Trigram)
Part 2: Discriminative vs Generative Model
Part 3: Next Word Prediction
Part 4: PoS-Tagging with A*
Part 5: Parser Projection
Part 6: Yago Path Finding
Part 7: NLTK

Part 1: PoS-Tagger

PoS Tagger
Data sparsity (in trigrams): handled by smoothing, using linear interpolation of unigram, bigram and trigram probabilities.
Unknown words: handled by suffix matching (see the sketch below).
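A minimal sketch of these two fixes, assuming tag n-gram counts and word/suffix emission counts have already been collected from the training data (the weights, names and suffix length here are illustrative, not the group's actual code):

    # Illustrative interpolation weights; in practice these are tuned on held-out data.
    L1, L2, L3 = 0.1, 0.3, 0.6

    def interpolated_transition(t1, t2, t3, uni, bi, tri):
        """P(t3 | t1, t2) as a linear interpolation of unigram, bigram and trigram
        estimates; uni/bi/tri are plain dicts of tag n-gram counts."""
        total = sum(uni.values()) or 1
        p_uni = uni.get(t3, 0) / total
        p_bi = bi.get((t2, t3), 0) / uni[t2] if uni.get(t2) else 0.0
        p_tri = tri.get((t1, t2, t3), 0) / bi[(t1, t2)] if bi.get((t1, t2)) else 0.0
        return L3 * p_tri + L2 * p_bi + L1 * p_uni

    def emission(word, tag, word_tag, suffix_tag, tag_count, max_suffix=4):
        """P(word | tag); unknown words fall back to the longest known suffix."""
        if (word, tag) in word_tag:
            return word_tag[(word, tag)] / tag_count[tag]
        for k in range(min(max_suffix, len(word)), 0, -1):
            if (word[-k:], tag) in suffix_tag:
                return suffix_tag[(word[-k:], tag)] / tag_count[tag]
        return 1.0 / (tag_count[tag] + 1)  # small fallback mass for truly unseen words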

5-Fold Cross Validation (accuracy, %)
Bigrams:  Case 1: 90.55   Case 2: 90.01   Case 3: 90.54   Case 4: 91.50   Case 5: 90.26
Trigrams: Case 1: 91.318  Case 2: 91.439  Case 3: 91.617  Case 4: 91.69   Case 5: 91.5

Bigram vs Trigram (per-tag accuracy gains)
AVP: 70% → 73.5%
NPO: 70.85% → 81.09%
UNC: 14.9% → 44.64%

Handling Interjections
Interjections are often followed by "!", e.g. "hi!", "what!".
Words followed by an exclamation mark are therefore tagged as ITJ (see the sketch below).
Per-PoS accuracy for ITJ: 40% without this handling, 68.9% with it.
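A tiny sketch of this post-processing rule, assuming the tagger's output is a parallel list of tokens and tags (illustrative; the actual rule may also consult a word list):

    ITJ = "ITJ"  # C5-style interjection tag used above

    def retag_interjections(tokens, tags):
        """Retag any word that is immediately followed by '!' as an interjection."""
        out = list(tags)
        for i in range(len(tokens) - 1):
            if tokens[i + 1] == "!":
                out[i] = ITJ
        return out

    # retag_interjections(["hi", "!"], ["NN1", "PUN"]) -> ["ITJ", "PUN"]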

Statistical Comparison

Part 2: Discriminative vs Generative Model

Per-PoS Accuracy
Poor-accuracy tags (and the tags they are most often confused with):
ITJ (AJ0)
PNI (CRD)
VDI, VHI, VHN (VDB, VHB, VHD)
VVB, VVD, VVI, VVN (AJ0, NN1)
UNC (NN1)
ZZ0 (AT0)

Result (discriminative model)
Accuracy: 81.948%
F-Score: 0.8179

Discriminative vs Generative Accuracy
Discriminative (unigram) model: 81.948%
Generative (bigram) model: 90.57%

Comparison of all Tagging Algorithms


Part 3: Next Word Prediction

Tagged Vs Untagged Model

Example predictions (Sentence | tagged | untagged):
you have to write to the
I don't drink: and .
I am tired of and
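The slides do not show the prediction code; below is a minimal sketch of how the untagged variant might work, predicting the most frequent bigram continuation of the last word (the tagged variant would additionally condition the choice on PoS transition probabilities; all names are illustrative):

    from collections import defaultdict

    def train_bigrams(sentences):
        """sentences: list of token lists; returns counts[w1][w2]."""
        counts = defaultdict(lambda: defaultdict(int))
        for sent in sentences:
            toks = ["^"] + sent + ["$"]
            for w1, w2 in zip(toks, toks[1:]):
                counts[w1][w2] += 1
        return counts

    def predict_next(last_word, counts, k=3):
        """Return the k most frequent continuations of last_word."""
        cands = counts.get(last_word, {})
        return sorted(cands, key=cands.get, reverse=True)[:k]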

Part 4: PoS Tagging with A*

Description
The edge from t1 to t2 has cost -log( P(t2 | t1) * P(w2 | t2) ).
f(x) = g(x) + h(x)
g(x) = sum of edge costs from the start node (i.e. "^")
h(x) = estimated distance to the target node (i.e. "$")
Unknown words are handled by suffix matching.
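A compact sketch of this search, assuming precomputed transition and emission probability dicts; the heuristic shown is simply (remaining words) times a per-word lower bound on edge cost, which reduces to uniform-cost search when that bound is 0 (the actual heuristic used in the assignment may differ):

    import heapq
    from math import log

    def astar_tag(words, tags, trans_p, emit_p, min_edge_cost=0.0):
        """trans_p[(t1, t2)] = P(t2|t1), emit_p[(t, w)] = P(w|t).
        State = (position, previous tag); edge cost = -log(P(t2|t1) * P(w|t2))."""
        n = len(words)
        frontier = [(0.0, 0.0, 0, "^", [])]           # (f, g, position, tag, path)
        expanded = set()
        while frontier:
            f, g, i, t1, path = heapq.heappop(frontier)
            if i == n:
                return path                            # all words consumed -> reached "$"
            if (i, t1) in expanded:
                continue
            expanded.add((i, t1))
            w = words[i]
            for t2 in tags:
                p = trans_p.get((t1, t2), 1e-12) * emit_p.get((t2, w), 1e-12)
                g2 = g - log(p)
                h2 = (n - i - 1) * min_edge_cost       # h(x): remaining distance estimate
                heapq.heappush(frontier, (g2 + h2, g2, i + 1, t2, path + [t2]))
        return []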

Viterbi vs A*: PoS-Tagger Accuracy
Viterbi: 90.57% (bigram), 91.83% (trigram)
A*: 90.6%

Part 5: Parser Projection

Parser Projection
Mapping from syntactic rules of English to the corresponding rules in Hindi.
Input: the sentence to be parsed, in Hindi and English
Output: parse tree in Hindi
Steps:
1. The English sentence is parsed using the Stanford Parser.
2. The dependencies are converted into rules in Hindi (see the sketch below).
3. The Hindi sentence is mapped onto the generated tree.
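As a toy illustration of step 2, a hypothetical lookup table that rewrites or drops individual English productions, in the spirit of the examples on the following slides (the table entries are illustrative, not the group's full rule set):

    # Hypothetical mapping: English (LHS, RHS) production -> Hindi production, or None if dropped.
    RULE_MAP = {
        ("DT", "The"): None,          # determiner "the" has no Hindi counterpart
        ("DT", "a"): ("DT", "एक"),
        ("MD", "would"): None,        # modal verb modifiers become NULL
    }

    def project_rule(lhs, rhs):
        """Return the Hindi production for an English one; unknown rules are kept unchanged."""
        return RULE_MAP.get((lhs, rhs), (lhs, rhs))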

Mapping of Dependencies
Constraints to be considered in the conversion:
Inversion of a few rules.
Elimination of words like "to", "the", etc.
Mapping of phrases onto a single word, and vice versa.

Mapping of Dependencies - Examples
The train is arriving on time.
DT → The is mapped to DT → NULL


Mapping of Dependencies - Examples
I went to the farm to pick tomatoes.
TO → to is mapped to TO → NULL and TO → के लिए

Mapping of Dependencies - Examples
India is a land of various cultures.

Mapping of Dependencies - Examples
NP → JJ NN remains the same.


Mapping of Dependencies - Examples
I will be coming late.
ADVP → RB becomes ADVP → ADV P

Mapping of Dependencies - Examples
I would like to go for a walk by the beach.
Verb modifiers become NULL: MD → would becomes MD → NULL

Mapping of Dependencies - Examples
I would like to go for a walk by the beach.
DT → a becomes DT → एक

Mapping of Dependencies - Examples
What is your name?
WHNP → WP becomes WHNP → WP VP
SQ → VBZ NP becomes SQ → NP

Mapping of Dependencies - Examples
When will you come?

Mapping of Dependencies - Examples
Have you done your homework?
S → SQ becomes S → SBARQ, and then SBARQ → WHNP SQ

Mapping of Dependencies - Examples

Part 6: Yago Path Finding

Description
BFS traversal from both the source and the target word.
The two search frontiers are intersected after each expansion.
A non-empty intersection ends the search.
Backtrack through both frontiers to recover the relations on the path (see the sketch below).
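A minimal sketch of this bidirectional search, assuming the YAGO facts have been loaded into an adjacency map from each entity to its (relation, neighbour) pairs (all names are illustrative):

    from collections import deque

    def backtrack(parents, node):
        """Follow parent pointers back to the root; returns [node, ..., root] (relations omitted)."""
        path = [node]
        while parents[node] is not None:
            node, _rel = parents[node]
            path.append(node)
        return path

    def bidirectional_bfs(graph, source, target):
        """graph: dict entity -> iterable of (relation, neighbour). Returns an entity path or None."""
        if source == target:
            return [source]
        parents_s, parents_t = {source: None}, {target: None}
        q_s, q_t = deque([source]), deque([target])

        def expand(queue, parents, other):
            for _ in range(len(queue)):                 # expand one BFS level
                node = queue.popleft()
                for rel, nxt in graph.get(node, ()):
                    if nxt not in parents:
                        parents[nxt] = (node, rel)
                        if nxt in other:                # frontiers intersect: path found
                            return nxt
                        queue.append(nxt)
            return None

        while q_s and q_t:
            meet = expand(q_s, parents_s, parents_t) or expand(q_t, parents_t, parents_s)
            if meet is not None:
                return backtrack(parents_s, meet)[::-1] + backtrack(parents_t, meet)[1:]
        return None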

Example 1
Given words: Sachin Tendulkar, Lata Mangeshkar

Example 1
Sachin Tendulkar:
<wasBornIn> Mumbai
<isLeaderOf> Suthaliya
<hasFamilyName> Tendulkar
…
Lata Mangeshkar:
<linksTo> Kishore Kumar
<wasBornIn> Indore
<created> Dil to hai dil
…

Example 1 (search graph)
Nodes expanded from Sachin Tendulkar and Lata Mangeshkar: Indore, Suthaliya, Mumbai, Kishore Kumar, Dil to hai dil, Tendulkar, Madhubala; Mumbai appears in both frontiers.

Example 2
Given words: Marilyn Monroe, Atal Bihari Vajpayee

Example 2
Marilyn Monroe expands to:
<IsMarriedTo> Joe DiMaggio
<wasBornIn> Los Angeles
<actedIn> River_of_no_return
Atal Bihari Vajpayee expands to:
<LinksTo> India
<hasGender> Male
<hasWonPrize> Padma_Vibhushan

Example 2 (search graph)
Nodes expanded from Marilyn Monroe and Atal Bihari Vajpayee: Padma Vibhushan, Joe DiMaggio, Los Angeles, India, male, River_of_no_return, United States, Israel; Israel appears in both frontiers.

Part 7: NLTK

NLTK (Natural Language Toolkit)
A platform for building Python programs to work with human language data.

Category of Words
Morphological clues:
The suffixes -ment and -ness produce nouns: happy → happiness, govern → government.
The present participle of a verb ends in -ing, etc.
Syntactic clues:
An adjective in English can occur immediately before a noun, or immediately following the words "be" or "very":
a. the near window
b. The end is (very) near.
Semantic clues:
The best-known definition of a noun is semantic: "the name of a person, place or thing".
Semantic criteria for word classes are treated with suspicion, mainly because they are hard to formalize.
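The morphological clues above can be turned into a tiny suffix-based guesser; a rough sketch with plain regular expressions (the suffix list is illustrative and obviously incomplete):

    import re

    SUFFIX_RULES = [                       # illustrative suffixes only
        (r".*(ment|ness)$", "noun"),       # govern-ment, happi-ness
        (r".*ing$", "verb (present participle)"),
        (r".*ly$", "adverb"),
    ]

    def guess_category(word):
        """Guess a coarse word class from its suffix; fall back to 'unknown'."""
        for pattern, category in SUFFIX_RULES:
            if re.match(pattern, word.lower()):
                return category
        return "unknown"

    # guess_category("government") -> "noun"; guess_category("running") -> "verb (present participle)"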

Morphological Analysis
Go away! He sometimes goes to the cafe. All the cakes have gone. We went on the excursion.
Four distinct grammatical forms, all tagged as VB in some tagsets.
The Brown Corpus, however, tags them as VB (go), VBZ (goes), VBN (gone), VBG (going), VBD (went).

PoS Tagging
Various tagging models employed: Default Tagger, Regular Expression Tagger, Unigram Tagger, N-Gram Tagger
Techniques for combining models: back-off, cut-off
Data structures available: dictionaries, e.g. pos = {'furiously': 'adv', 'ideas': 'n', 'colorless': 'adj'}
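A short sketch of the back-off combination in NLTK, trained on the Brown corpus (this is the standard NLTK pattern, not necessarily the exact setup used in the assignment; the corpus may need to be downloaded first):

    import nltk
    from nltk.corpus import brown

    # nltk.download('brown') may be required on first use.
    tagged = brown.tagged_sents(categories='news')
    split = int(len(tagged) * 0.9)
    train_sents, test_sents = tagged[:split], tagged[split:]

    # Chain taggers via back-off: trigram -> bigram -> unigram -> default.
    default = nltk.DefaultTagger('NN')
    unigram = nltk.UnigramTagger(train_sents, backoff=default)
    bigram = nltk.BigramTagger(train_sents, backoff=unigram)
    trigram = nltk.TrigramTagger(train_sents, backoff=bigram)

    print(trigram.tag("colorless green ideas sleep furiously".split()))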

Chunking
NLTK provides functions for creating a grammar for the purpose of chunking.
Chunking in NLTK is bottom-up and is done on POS-tagged sentences.
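A minimal NP-chunking sketch with nltk.RegexpParser (a standard usage pattern; the one-rule grammar is illustrative, and the tokenizer/tagger models must be downloaded first):

    import nltk

    sentence = "The little yellow dog barked at the cat"
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

    # One chunk rule: an NP is an optional determiner, any number of adjectives, then a noun.
    grammar = "NP: {<DT>?<JJ>*<NN>}"
    chunker = nltk.RegexpParser(grammar)
    print(chunker.parse(tagged))   # prints the bracketed chunk tree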

Thank You