Introduction to Syntax, with Part-of-Speech Tagging
Owen Rambow
September 17 & 19
Admin Stuff
These slides available at
o http://www.cs.columbia.edu/~rambow/teaching.html
For Eliza in the homework, you can use a tagger or chunker if you want; details at:
o http://www.cs.columbia.edu/~ani/cs4705.html
Special office hours (Ani): today after class, tomorrow at 10am in CEPSR 721
Statistical POS Tagging
Want to choose the most likely string of tags (T), given the string of words (W):
W = w_1, w_2, …, w_n
T = t_1, t_2, …, t_n
I.e., we want argmax_T p(T | W)
Problem: sparse data
Statistical POS Tagging (ctd)
By Bayes' rule:
p(T|W) = p(T,W) / p(W) = p(W|T) p(T) / p(W)
Since p(W) does not depend on T, it can be dropped from the argmax:
argmax_T p(T|W) = argmax_T p(W|T) p(T) / p(W) = argmax_T p(W|T) p(T)
Statistical POS Tagging (ctd)
p(T) = p(t_1, t_2, …, t_n)
     = p(t_n | t_1, …, t_{n-1}) p(t_1, …, t_{n-1})
     = p(t_n | t_1, …, t_{n-1}) p(t_{n-1} | t_1, …, t_{n-2}) p(t_1, …, t_{n-2})
     = ∏_i p(t_i | t_1, …, t_{i-1})
     ≈ ∏_i p(t_i | t_{i-2}, t_{i-1})   (trigram approximation; n-gram in general)
Statistical POS Tagging (ctd)
p(W|T) = p(w_1, w_2, …, w_n | t_1, t_2, …, t_n)
       = ∏_i p(w_i | w_1, …, w_{i-1}, t_1, t_2, …, t_n)
       ≈ ∏_i p(w_i | t_i)
Statistical POS Tagging (ctd)
argmax_T p(T|W) = argmax_T p(W|T) p(T)
               ≈ argmax_T ∏_i p(w_i | t_i) p(t_i | t_{i-2}, t_{i-1})
Relatively easy to get data for parameter estimation (next slide)
But: need smoothing for unseen words
Easy to determine the argmax (Viterbi algorithm, in time linear in sentence length; a sketch follows below)
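The slides leave the decoding step abstract. As a minimal sketch, here is Viterbi decoding in Python, simplified to a bigram transition model for brevity (the trigram case is the same with tag pairs as states). The function name, the "<s>" start symbol, and the probability floor standing in for real smoothing are assumptions of this sketch, not anything fixed by the slides.

from math import log

def viterbi(words, tags, trans, emit, floor=1e-12):
    # words: observed word sequence; tags: the tag inventory
    # trans[(prev_tag, tag)] = p(tag | prev_tag), with "<s>" as start symbol
    # emit[(word, tag)] = p(word | tag)
    # floor is a crude stand-in for smoothing of unseen events
    best = [{t: log(trans.get(("<s>", t), floor))
                + log(emit.get((words[0], t), floor)) for t in tags}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        best.append({})
        back.append({})
        for t in tags:
            # one max over previous tags per (position, tag) pair, so the
            # run time is linear in sentence length for a fixed tag set
            prev, score = max(
                ((p, best[i - 1][p] + log(trans.get((p, t), floor))) for p in tags),
                key=lambda x: x[1])
            best[i][t] = score + log(emit.get((w, t), floor))
            back[i][t] = prev
    # follow back-pointers from the best final tag
    t = max(best[-1], key=best[-1].get)
    seq = [t]
    for i in range(len(words) - 1, 0, -1):
        t = back[i][t]
        seq.append(t)
    return list(reversed(seq))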
Probability Estimation for Trigram POS Tagging
Maximum-likelihood estimation (a counting sketch follows below):
p'(w_i | t_i) = c(w_i, t_i) / c(t_i)
p'(t_i | t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})
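As a concrete sketch of these counts, the estimates can be read off a tagged corpus in a few lines of Python. The input format (a list of sentences, each a list of (word, tag) pairs) and the "<s>" padding symbols are assumptions of this sketch.

from collections import Counter

def mle_estimates(tagged_sents):
    # tagged_sents: list of sentences, each a list of (word, tag) pairs
    c_tag, c_word_tag = Counter(), Counter()
    c_context, c_trigram = Counter(), Counter()
    for sent in tagged_sents:
        tags = ["<s>", "<s>"] + [t for _, t in sent]  # pad left for trigram context
        for w, t in sent:
            c_tag[t] += 1
            c_word_tag[(w, t)] += 1
        for t2, t1, t in zip(tags, tags[1:], tags[2:]):
            c_context[(t2, t1)] += 1
            c_trigram[(t2, t1, t)] += 1
    # p'(w_i | t_i) = c(w_i, t_i) / c(t_i)
    emit = {(w, t): n / c_tag[t] for (w, t), n in c_word_tag.items()}
    # p'(t_i | t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})
    trans = {tri: n / c_context[tri[:2]] for tri, n in c_trigram.items()}
    return emit, trans

Note that these raw relative frequencies assign probability zero to anything unseen, which is exactly why the smoothing caveat on the previous slide matters.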
Statistical POS Tagging
Method common to many tasks in speech & NLP: the “Noisy Channel Model”, a Hidden Markov Model
Back to Syntax
(((the/Det) boy/N) likes/V ((a/Det) girl/N))
[Figure: the corresponding phrase-structure tree, with S over an NP ("the boy", DetP + N), the verb "likes", and an NP ("a girl", DetP + N)]
Phrase-structure tree:
o nonterminal symbols = constituents
o terminal symbols = words
Phrase Structure and Dependency Structure
[Figure: the phrase-structure tree for "the boy likes a girl" next to its dependency tree: likes/V governs boy/N and girl/N; boy/N governs the/Det; girl/N governs a/Det]
Types of Dependency
[Figure: dependency tree for "sometimes the very small boy likes a girl", with labeled edges: likes/V has dependents boy/N (Subj), girl/N (Obj), and sometimes/Adv (Adj(unct)); boy/N has the/Det (Fw) and small/Adj (Adj); small/Adj has very/Adv (Adj); girl/N has a/Det (Fw)]
Grammatical Relations
Types of relations between words:
o Arguments: subject, object, indirect object, prepositional object
o Adjuncts: temporal, locative, causal, manner, …
o Function words
Subcategorization
List of arguments of a word (typically, a verb), with features about realization (POS, perhaps case, verb form, etc.)
In canonical order: Subject-Object-IndObj
Examples:
o like: N-N, N-V(to-inf)
o see: N, N-N, N-N-V(inf)
Note: J&M talk about subcategorization only within the VP
Where is the VP?
[Figure: two phrase-structure trees for "the boy likes a girl": one with a flat S (NP, likes, NP all directly under S), one with a VP node grouping likes and the object NP]
Where is the VP?
Existence of the VP is a linguistic (empirical) claim, not a methodological claim
Semantic evidence???
Syntactic evidence:
o VP-fronting (and quickly clean the carpet he did!)
o VP-ellipsis (He cleaned the carpets quickly, and so did she)
o Adjuncts can appear before and after the VP, but not inside it (He often eats beans, *He eats often beans)
Note: in all right-branching structures, the issue is different again
Penn Treebank, Again
Syntactically annotated corpus (phrase structure)
The PTB is not naturally occurring data! It represents a particular linguistic theory (but a fairly “vanilla” one)
Particularities:
o Very indirect representation of grammatical relations (need for head percolation tables)
o Completely flat structure in NP (brown bag lunch, pink-and-yellow child seat)
o Flat Ss, flat VPs
Context-Free Grammars
Defined in formal language theory (computer science)
Terminals, nonterminals, start symbol, rules
A string-rewriting system: start with the start symbol, rewrite using the rules, and stop when only terminals are left
CFG: Example
Rules (a derivation sketch follows below):
o S → NP VP
o VP → V NP
o NP → Det N | AdjP NP
o AdjP → Adj | Adv AdjP
o N → boy | girl
o V → sees | likes
o Adj → big | small
o Adv → very
o Det → a | the
Generates, e.g.: the very small boy likes a girl
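To make the rewriting process concrete, here is a small self-contained Python sketch that encodes the example grammar and derives a string from S; the dict encoding and the derive helper are illustrative choices for this sketch, not any standard parser API.

import random

# The slide's grammar: nonterminal -> list of possible right-hand sides
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["Det", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N":    [["boy"], ["girl"]],
    "V":    [["sees"], ["likes"]],
    "Adj":  [["big"], ["small"]],
    "Adv":  [["very"]],
    "Det":  [["a"], ["the"]],
}

def derive(symbols, grammar=GRAMMAR):
    # Leftmost derivation: rewrite the leftmost nonterminal until only
    # terminals remain, printing each intermediate sentential form.
    while any(s in grammar for s in symbols):
        print(" ".join(symbols))
        i = next(j for j, s in enumerate(symbols) if s in grammar)
        symbols = symbols[:i] + random.choice(grammar[symbols[i]]) + symbols[i + 1:]
    print(" ".join(symbols))

derive(["S"])  # output varies; one possible final line: the boy likes a girl

Each printed line is a sentential form; the sequence of rewrites is exactly the derivation history that the next slide's phrase-structure tree records.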
Derivations of CFGs
A CFG is a string-rewriting system: we derive a string (= derived structure)
But the derivation history is represented by a phrase-structure tree (= derivation structure)!
Grammar Equivalence and Normal Form
Can have different grammars that generate the same set of strings (weak equivalence)
Can have different grammars that also assign the same set of derivation trees (strong equivalence)
Nobody Uses CFGs Only (Except Intro NLP Courses)
o All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another
o All successful parsers currently use statistics about phrase structure and about dependency
Massive Ambiguity of Syntax
For a standard sentence and a grammar with wide coverage, there are 1000s of derivations!
Example:
o The large head master told the man that he gave money and shares in a letter on Wednesday
Some Syntactic Constructions: Wh-Movement
Control
Raising