NLP

Parsing

( (S
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,)
      (ADJP (NP (CD 61) (NNS years) ) (JJ old) )
      (, ,) )
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board) )
        (PP-CLR (IN as)
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))

( (S
    (NP-SBJ (NNP Mr.) (NNP Vinken) )
    (VP (VBZ is)
      (NP-PRD
        (NP (NN chairman) )
        (PP (IN of)
          (NP
            (NP (NNP Elsevier) (NNP N.V.) )
            (, ,)
            (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) )))))
    (. .) ))
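These bracketings are plain text and straightforward to process. As an illustration (not from the original slides), here is a minimal sketch using NLTK's Tree.fromstring, which parses this bracketed format; the tree string is a simplified version of the second example above:

from nltk import Tree  # pip install nltk

# A simplified bracketing of the "Mr. Vinken is chairman" sentence.
bracketing = "(S (NP-SBJ (NNP Mr.) (NNP Vinken)) (VP (VBZ is) (NP-PRD (NN chairman))) (. .))"
tree = Tree.fromstring(bracketing)

print(tree.leaves())  # ['Mr.', 'Vinken', 'is', 'chairman', '.']
# Print every noun phrase in the tree.
for np in tree.subtrees(lambda t: t.label().startswith("NP")):
    print(np.label(), "->", " ".join(np.leaves()))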

High (verbal): join board as director
Low (nominal): is chairman of Elsevier

[Parse tree diagram: "Jane caught the butterfly with the net" — Jane/NNP, caught/VBD, the/DT, butterfly/NN form NP and VP under S; with/IN, the/DT, net/NN form a PP]

Examples:
– Lucy’s plane leaves Detroit on Monday. - high
– Jenna met Mike at the concert. - high
– This painting must cost millions of dollars. - low

High or low attachment?
– Alicia ate spaghetti from Italy.
– Alicia ate spaghetti with meatballs.
– Alicia ate spaghetti with a fork.
– Alicia ate spaghetti with Justin.
– Alicia ate spaghetti with delight.
– Alicia ate spaghetti on Friday.

High or low attachment?
– Alicia ate spaghetti from Italy. - low
– Alicia ate spaghetti with meatballs. - low
– Alicia ate spaghetti with a fork. - high
– Alicia ate spaghetti with Justin. - high
– Alicia ate spaghetti with delight. - high
– Alicia ate spaghetti on Friday. - high

Police shoot man with box cutters.

(S (NP (N Police))
   (VP (V shoot)
       (NP (N man)
           (PP (P with) (NP (N box) (N cutters))))))
(?)
(S (NP (N Police))
   (VP (V shoot)
       (NP (N man))
       (PP (P with) (NP (N box) (N cutters)))))
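The two parses differ only in which node dominates the PP: the object NP (low attachment) or the VP (high attachment). A minimal sketch, again using NLTK's Tree, that locates the PP's parent in each parse (illustrative code, not from the slides):

from nltk import Tree

low = Tree.fromstring("(S (NP (N Police)) (VP (V shoot) (NP (N man) (PP (P with) (NP (N box) (N cutters))))))")
high = Tree.fromstring("(S (NP (N Police)) (VP (V shoot) (NP (N man)) (PP (P with) (NP (N box) (N cutters)))))")

for name, tree in (("low", low), ("high", high)):
    # Find the first subtree that has a PP among its children.
    parent = next(t for t in tree.subtrees()
                  if any(isinstance(c, Tree) and c.label() == "PP" for c in t))
    print(name, "attachment: PP is a child of", parent.label())
# low attachment: PP is a child of NP
# high attachment: PP is a child of VP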

Input
– a prepositional phrase and the surrounding context
Output
– a binary label: 0 (high) or 1 (low)
In practice
– the context consists only of four words: the preposition, the verb before the preposition, the noun before the preposition, and the noun after the preposition
Example:
– join board as director
Why?

Because almost all the information needed to classify a prepositional phrase’s attachment as high or low is contained in these four features. Furthermore, using only these four-feature tuples allows for a consistent and scalable approach.

Sent #  Verb       Noun 1      Preposition  Noun 2        Class
0       join       board       as           director      V
2       named      director    of           conglomerate  N
3       caused     percentage  of           deaths        N
6       bring      attention   to           problem       V
12      led        team        of           researchers   N
16      including  three       with         cancer        N
24      imposed    ban         on           uses          N
26      made       paper       for          filters       N
28      dumped     sacks       of           material      N
28      dumped     sacks       into         bin           V
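In code, each row reduces to a (verb, noun1, preposition, noun2) tuple plus a V/N label. A minimal loader sketch, assuming one whitespace-separated record per line as in the table above (the exact format of the distributed files may differ):

def load_examples(path):
    """Load PP-attachment examples as ((verb, noun1, prep, noun2), label) pairs."""
    examples = []
    with open(path) as f:
        for line in f:
            sent_id, verb, noun1, prep, noun2, label = line.split()
            examples.append(((verb, noun1, prep, noun2), label))  # 'V' = high, 'N' = low
    return examples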

The linguistics (and psycholinguistics) literature offers competing explanations for attachment. One theory (Kimball 1973) favors the so-called right association rule: given a new phrase and two candidate attachment sites, people tend to attach the new phrase to the more recent (“rightmost” within the sentence) candidate node, resulting in low attachment. Alternatively, the minimal attachment principle (Frazier 1978) favors the attachment that adds fewer syntactic nodes to the parse tree (in this case, favoring high attachment). As the statistics show, neither of these principles alone can explain the high prevalence of both types of attachment.

Some other observations can be made by statistical analysis of a training set. The standard corpus for this sort of analysis comes from (RRR 1994) and includes 27,937 prepositional phrases extracted from the Penn Treebank (Marcus et al. 1993), divided into three groups (20,801 training, 4,039 development, and 3,097 test). This data representation assumes that additional context is only marginally more useful for classification than the four features in the table (verb, noun1, preposition, and noun2). For comparison, the sentence behind the data point “bring attention to problem” is actually “Although preliminary findings were reported more than a year ago, the latest results appear in today's New England Journal of Medicine, a forum likely to bring new attention to the problem.” It is unlikely that the information in the first ¾ of the sentence will affect the classification of the prepositional phrase “to the problem”.
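One such observation is that attachment preferences vary sharply by preposition (for example, “of” attaches low in the overwhelming majority of cases). A minimal tally sketch, reusing the hypothetical load_examples helper above; the file name "training" is an assumption:

from collections import Counter, defaultdict

# Count V (high) vs. N (low) attachments for each preposition.
by_prep = defaultdict(Counter)
for (verb, noun1, prep, noun2), label in load_examples("training"):
    by_prep[prep][label] += 1

# Show the most frequent prepositions and their attachment split.
for prep, counts in sorted(by_prep.items(), key=lambda kv: -sum(kv[1].values()))[:5]:
    total = sum(counts.values())
    print(f"{prep}: {counts['N'] / total:.0%} low, {counts['V'] / total:.0%} high ({total} examples)")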

NLP