Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19

Admin Stuff (Homework) Ex 2.6: cover every possible time expression, not just the patterns listed in the book! Contact Ani if you can't access the links on the homework page

What is Syntax?
- Study of the structure of language
- Specifically, goal is to relate surface form (e.g., interface to phonological component) to semantics (e.g., interface to semantic component)
- Morphology, phonology, semantics farmed out (mainly); the issue is word order and structure
- Representational device is tree structure

What About Chomsky?
- At the birth of formal language theory (comp sci) and formal linguistics
- Major contribution: syntax is cognitive reality
- Humans able to learn languages quickly, but not all languages → universal grammar is biological
- Goal of syntactic study: find universal principles and language-specific parameters
- Specific Chomskyan theories change regularly
- These ideas adopted by almost all contemporary syntactic theories

Types of Linguistic Activity
- Descriptive: provide an account of the syntax of a language; often good enough for NLP engineering work
- Explanatory: provide a principles-and-parameters style account of the syntax of (preferably) several languages
- Prescriptive: "prescriptive linguistics" is an oxymoron

Structure in Strings
- Some words: the, a, small, nice, big, very, boy, girl, sees, likes
- Some good sentences:
  o the boy likes a girl
  o the small girl likes the big girl
  o a very small nice boy sees a very nice boy
- Some bad sentences:
  o *the boy the girl
  o *small boy likes nice girl
- Can we find subsequences of words (constituents) which in some way behave alike?

Structure in Strings: Proposal 1
- Some words: the, a, small, nice, big, very, boy, girl, sees, likes
- Some good sentences:
  o (the) boy (likes a girl)
  o (the small) girl (likes the big girl)
  o (a very small nice) boy (sees a very nice boy)
- Some bad sentences:
  o *(the) boy (the girl)
  o *(small) boy (likes the nice girl)

Structure in Strings: Proposal 2
- Some words: the, a, small, nice, big, very, boy, girl, sees, likes
- Some good sentences:
  o (the boy) likes (a girl)
  o (the small girl) likes (the big girl)
  o (a very small nice boy) sees (a very nice boy)
- Some bad sentences:
  o *(the boy) (the girl)
  o *(small boy) likes (the nice girl)
- This is the better proposal: fewer types of constituents

More Structure in Strings: Proposal 2 (ctd)
- Some words: the, a, small, nice, big, very, boy, girl, sees, likes
- Some good sentences:
  o ((the) boy) likes ((a) girl)
  o ((the) (small) girl) likes ((the) (big) girl)
  o ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl)
- Some bad sentences:
  o *((the) boy) ((the) girl)
  o *((small) boy) likes ((the) (nice) girl)

From Substrings to Trees
(((the) boy) likes ((a) girl))
[tree diagram: likes at the root, with a subtree headed by boy (over the) and one headed by girl (over a)]

Node Labels?
- ((the) boy) likes ((a) girl)
- Constituents were deliberately chosen so each one has exactly one non-bracketed word: the head
- Group words by the distribution of the constituents they head (part-of-speech, POS):
  o Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det)
- Category of a constituent: XP, where X is the POS of its head
  o NP, S, AdjP, AdvP, DetP

Node Labels
(((the/Det) boy/N) likes/V ((a/Det) girl/N))
[labeled tree diagram: S at the root over likes, NP nodes over boy and girl, DetP nodes over the and a]
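For readers following along in code: a minimal sketch that renders a plain constituency version of this bracketing, assuming Python with nltk installed (the exact node placement is my rendering, not the slide's figure):

    # A sketch of the slide's labeled tree; assumes nltk is installed.
    # The S/NP/DetP labels follow the slide; leaf layout is my rendering.
    from nltk import Tree

    t = Tree.fromstring("(S (NP (DetP the) boy) likes (NP (DetP a) girl))")
    t.pretty_print()  # draws the bracketing as an ASCII tree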

Word Classes (= POS)
- Heads of constituents fall into distributionally defined classes
- Additional support for this definition of word classes comes from morphology

Some Points on POS Tag Sets
- Possible basic set: N, V, Adj, Adv, P, Det, Aux, Comp, Conj
- 2 supertypes: open- and closed-class
  o Open: N, V, Adj, Adv
  o Closed: P, Det, Aux, Comp, Conj
- Many subtypes:
  o eats/V → eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG
  o Subtypes reflect morphological form & syntactic function

More on POS Tag Sets: Problematic Cases
- Adjective or participle?
  o a seen event, a rarely seen event, an unseen event, an event rarely seen in Idaho, *a rarely seen in Idaho event
- Noun or adjective?
  o a child seat, *a very child seat, *this seat is child
- Preposition or particle?
  o he threw out the garbage, he threw the garbage out, he threw the garbage out the door, *he threw the garbage the door out

The Penn TreeBank POS Tag Set
- Penn Treebank: hand-annotated corpus of the Wall Street Journal, 1M words
- 46 tags
- Some particularities:
  o to/TO not disambiguated (infinitival marker vs. preposition)
  o Auxiliaries and verbs not distinguished
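To see this tag set on real data, a hedged sketch using NLTK's bundled 10% sample of the Penn Treebank (assumes nltk is installed and the treebank data has been downloaded; the sample's tag inventory also includes punctuation tags, so the count printed may differ from 46):

    # Peek at Penn Treebank tags via NLTK's WSJ sample.
    import nltk
    # nltk.download("treebank")  # run once to fetch the sample
    from nltk.corpus import treebank

    print(treebank.tagged_sents()[0][:6])            # first few (word, tag) pairs
    tags = sorted({t for _, t in treebank.tagged_words()})
    print(len(tags), "distinct tags, e.g.", tags[:8])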

Part-of-Speech Tagging
- Problem: assign POS tags to the words in a sentence, e.g., fruit flies like a banana:
  o fruit/N flies/N like/V a/Det banana/N
  o fruit/N flies/V like/P a/Det banana/N
- 2nd example:
  o the/Det flies/N like/V a/Det banana/N
- Useful for parsing, but also partial parsing/chunking, IR, etc.
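To make the ambiguity concrete, a toy sketch in Python (the lexicon below is hypothetical, listing just the tags the slide uses):

    from itertools import product

    # Hypothetical toy lexicon: each word's possible tags, per the slide.
    lexicon = {"fruit": ["N"], "flies": ["N", "V"], "like": ["V", "P"],
               "a": ["Det"], "banana": ["N"]}
    sentence = "fruit flies like a banana".split()

    # Enumerate every tag sequence the lexicon licenses; a tagger must pick one.
    for tags in product(*(lexicon[w] for w in sentence)):
        print(" ".join(f"{w}/{t}" for w, t in zip(sentence, tags)))
    # prints 4 readings, including both analyses from the slide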

Approaches to POS Tagging
- Hand-written rules
- Statistical approaches
- Machine learning of rules (e.g., decision trees or transformation-based learning)
- Role of the corpus:
  o No corpus (hand-written)
  o No machine learning (hand-written)
  o Unsupervised learning from raw data
  o Supervised learning from annotated data

Methodological Points
- When looking at a problem in NLP, need to know how to evaluate
- Possible evaluations:
  o against a naturally occurring annotated corpus (POS tagging: 96%)
  o against a hand-crafted corpus
  o against human task performance
- Need to know the baseline: how well does a simple method do? (POS tagging: 91%)
- Need to know the topline: given the evaluation, what is the best meaningful result? (POS tagging: 97%)
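The ~91% baseline figure is typically the "most frequent tag" tagger; a sketch, assuming sentences are represented as lists of (word, tag) pairs (exact accuracy depends on corpus and tag set):

    from collections import Counter, defaultdict

    # Baseline sketch: tag each word with its most frequent training tag;
    # unknown words get the overall most common tag.
    def baseline_accuracy(train, test):
        by_word = defaultdict(Counter)
        overall = Counter()
        for sent in train:
            for w, t in sent:
                by_word[w][t] += 1
                overall[t] += 1
        default = overall.most_common(1)[0][0]
        correct = total = 0
        for sent in test:
            for w, t in sent:
                guess = by_word[w].most_common(1)[0][0] if w in by_word else default
                correct += guess == t
                total += 1
        return correct / total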

Reminder: Bayes’s Law
- 10 students, of which 4 are women and 3 are smokers
- Probability that a randomly chosen student is a woman: p(w) = 0.4
- Probability that a randomly chosen student is a smoker: p(s) = 0.3
- Probability that a randomly chosen student is a female smoker: p(s,w) = 0.2
- Probability that a randomly chosen woman is a smoker: p(s|w) = 0.5, so p(s,w) = p(w) p(s|w)
- Probability that a randomly chosen smoker is a woman: p(w|s) ≈ 0.67, so p(s,w) = p(s) p(w|s)

Reminder: Bayes’s Law (end)
p(s,w) = p(s) p(w|s)
p(s,w) = p(w) p(s|w)
So:
p(s) = p(w) p(s|w) / p(w|s)
p(s|w) = p(s) p(w|s) / p(w)
(p(s) is the prior probability; p(w|s) is the likelihood)
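A quick numeric check of the student example, in Python:

    # Verify the example's numbers and Bayes's law.
    p_w, p_s, p_sw = 0.4, 0.3, 0.2         # p(woman), p(smoker), p(smoker, woman)
    p_s_given_w = p_sw / p_w               # 0.5
    p_w_given_s = p_sw / p_s               # 0.666...
    assert abs(p_sw - p_w * p_s_given_w) < 1e-9
    assert abs(p_sw - p_s * p_w_given_s) < 1e-9
    assert abs(p_s_given_w - p_s * p_w_given_s / p_w) < 1e-9   # Bayes's law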

Statistical POS Tagging
- Want to choose the most likely string of tags (T), given the string of words (W):
  W = w_1, w_2, …, w_n
  T = t_1, t_2, …, t_n
- I.e., want argmax_T p(T | W)
- Problem: sparse data

Statistical POS Tagging (ctd)
p(T|W) = p(T,W) / p(W) = p(W|T) p(T) / p(W)
argmax_T p(T|W) = argmax_T p(W|T) p(T) / p(W)
                = argmax_T p(W|T) p(T)

Statistical POS Tagging (ctd)
p(T) = p(t_1, t_2, …, t_n)
     = p(t_n | t_1, …, t_{n-1}) p(t_1, …, t_{n-1})
     = p(t_n | t_1, …, t_{n-1}) p(t_{n-1} | t_1, …, t_{n-2}) p(t_1, …, t_{n-2})
     = ∏_i p(t_i | t_1, …, t_{i-1})
     ≈ ∏_i p(t_i | t_{i-2}, t_{i-1})   ← trigram (n-gram) approximation

Statistical POS Tagging (ctd)
p(W|T) = p(w_1, w_2, …, w_n | t_1, t_2, …, t_n)
       = ∏_i p(w_i | w_1, …, w_{i-1}, t_1, t_2, …, t_n)
       ≈ ∏_i p(w_i | t_i)

Statistical POS Tagging (ctd)
argmax_T p(T|W) = argmax_T p(W|T) p(T)
               ≈ argmax_T ∏_i p(w_i | t_i) p(t_i | t_{i-2}, t_{i-1})
- Relatively easy to get data for parameter estimation (next slide)
- But: need smoothing for unseen words
- Easy to determine the argmax (Viterbi algorithm, in time linear in sentence length)
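A minimal Viterbi sketch, simplified to a bigram tag model (the slides use trigrams, which just widen the state to tag pairs); all probabilities below are made-up toy numbers, and missing events get a small floor instead of proper smoothing:

    import math

    def viterbi(words, tags, trans, emit, floor=1e-6):
        """Find argmax over tag sequences of prod_i p(t_i|t_{i-1}) p(w_i|t_i), in log space."""
        def lt(a, b): return math.log(trans.get((a, b), floor))
        def le(t, w): return math.log(emit.get((t, w), floor))
        best = [{t: lt("<s>", t) + le(t, words[0]) for t in tags}]
        back = [{}]
        for w in words[1:]:
            prev, col, ptr = best[-1], {}, {}
            for t in tags:
                col[t], ptr[t] = max((prev[t2] + lt(t2, t) + le(t, w), t2)
                                     for t2 in tags)
            best.append(col)
            back.append(ptr)
        t = max(best[-1], key=best[-1].get)   # best final tag
        seq = [t]
        for ptr in reversed(back[1:]):        # follow backpointers
            t = ptr[t]
            seq.append(t)
        return seq[::-1]

    # Hypothetical toy parameters:
    tags = ["Det", "N", "V", "P"]
    trans = {("<s>", "Det"): 0.8, ("Det", "N"): 0.9, ("N", "V"): 0.5,
             ("N", "N"): 0.3, ("V", "Det"): 0.6, ("P", "Det"): 0.8}
    emit = {("Det", "the"): 0.5, ("Det", "a"): 0.4, ("N", "flies"): 0.4,
            ("V", "flies"): 0.2, ("V", "like"): 0.4, ("P", "like"): 0.3,
            ("N", "banana"): 0.3}
    print(viterbi("the flies like a banana".split(), tags, trans, emit))
    # -> ['Det', 'N', 'V', 'Det', 'N']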

Probability Estimation for Trigram POS Tagging
Maximum-likelihood estimation:
p'(w_i | t_i) = c(w_i, t_i) / c(t_i)
p'(t_i | t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})
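A counting sketch of these estimates, again assuming sentences as lists of (word, tag) pairs (real systems add smoothing for unseen events):

    from collections import Counter

    # Maximum-likelihood estimates from a tagged corpus, per the formulas above.
    def mle_estimates(corpus):
        emit_c, tag_c, tri_c, bi_c = Counter(), Counter(), Counter(), Counter()
        for sent in corpus:
            tags = ["<s>", "<s>"] + [t for _, t in sent]
            for w, t in sent:
                emit_c[(t, w)] += 1        # c(w_i, t_i)
                tag_c[t] += 1              # c(t_i)
            for tri in zip(tags, tags[1:], tags[2:]):
                tri_c[tri] += 1            # c(t_{i-2}, t_{i-1}, t_i)
                bi_c[tri[:2]] += 1         # c(t_{i-2}, t_{i-1})
        p_emit = {(t, w): c / tag_c[t] for (t, w), c in emit_c.items()}
        p_trans = {tri: c / bi_c[tri[:2]] for tri, c in tri_c.items()}
        return p_emit, p_trans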

Statistical POS Tagging
- The method is common to many tasks in speech & NLP
- Known as the "noisy channel model"; this tagger is a Hidden Markov Model