
Slide 1: Parsing with Morphological Information for Treebank Construction
Seth Kulick, University of Pennsylvania (Iceland, 5/30-6/1/07)

Slide 2: Outline
- Motivation: Parsing for Treebanking
  - Experience at Penn
  - Issues for languages with more morphology
- Parsing with Morphological Features
  - Generative parsers
  - Discriminative parsers
- Conclusions, Issues, Questions

Slide 3: Parsing for Treebanking
- Major area of NLP over the last decade: development of statistical parsers
  - Trained on treebank (sentence, tree) pairs
  - Output the most likely parse for a new sentence
  - Handling ambiguity
- Utility for treebanking
  - Quicker for an annotator to correct parser output than to create the structure from scratch (unless the parser is really bad!)

Slide 4: How Successful are Parsers?
- What are parsers used for?
  - Generating conference papers
    - Evaluation can be in isolation
  - As a step in NLP pipelines for information extraction systems, etc.
    - Harder to evaluate across environments
  - For treebank construction
    - Not a focus

Slide 5: How "Successful" are Parsers?
- (Phrase-structure) parser evaluation
  - Trained and tested on the Penn Treebank
- Based on matching brackets
  - Treebank differences can be significant
- Goal is only a "skeletal" tree
  - Doesn't test everything we need for treebank construction

Slide 6: What do We Want from a Parser?
(Example from the Penn Treebank)

(NP (NP answers)
    (SBAR (WHNP-6 that)
          (S (NP-SBJ-3 we)
             (VP 'd
                 (VP like
                     (S (NP-SBJ *-3)
                        (VP to
                            (VP have
                                (NP *T*-6)))))))))

Slide 7: What do We Want from a Parser?
(Same tree as slide 6; animation step highlighting the trace (NP *T*-6))

Slide 8: What do We Want from a Parser?
(Example from the Penn Treebank)

(NP (NP answers)
    (SBAR (WHNP that)
          (S (NP we)
             (VP 'd
                 (VP like
                     (VP to
                         (VP have)))))))

It's perfect!

Slide 9: Improved Parsing for Treebanking
- Some work on recovering empty categories (evaluation is not easy)
  - Johnson '02, Levy & Manning '04, Campbell '04
- Some work on function tags
  - Blaheta '03, Musillo & Merlo '05
- Function tags and empty categories
  - Gabbard, Kulick, Marcus '05 (still PTB-centric)

Slide 10: Experience at Penn building Treebanks
- Parser provides function tags and empty categories
  - Various Penn Treebank-style (English) treebanks
- Parser provides function tags, not empty categories
  - Arabic Treebank, historical corpora
  - But tags are significantly different for the historical corpora; no evaluation yet

Slide 11: Open Questions from Penn Experience
- Improve parsing for treebanking
  - Improve the skeletal parse (Arabic, historical)
  - How well can we recover function tags?
  - How well can we recover empty categories?
- How are these issues different for languages with more morphology?

Slide 12: Open Questions from Penn Experience
- What problems arise for parsers from languages with more inflection and freer word order?
  - Greater number of words and part-of-speech tags
  - Correctly identifying the arguments of a verb
- What frequent syntactic phenomena can the parsers not handle well?
  - Movement out of a clause

Slide 13: Parsers: Generative vs. Discriminative
- Generative
  - Computationally reasonable (but still complicated!)
  - Less flexibility in using features
  - Examples: Collins ('99), Bikel ('04)
- Discriminative
  - Computationally harder
  - Flexibility in using features
  - Examples: multilingual dependency parsers, reranking

Slide 14: Generative Parsers - Outline
- Main properties, overview of the Collins model
- Greater morphology -> tagset games
  - Spanish, Czech, Arabic
  - Gets hard to "order" the information
- Freer word order -> ?
  - Subcat frames: hasn't been done(?)
  - Problem of long-distance movement

Slide 15: Generative Parsers
- Decompose the tree into parsing decisions
- Generate the tree node-by-node, top-down
- Based on a probabilistic CFG
  - Nice computational properties
  - Horrendous independence assumptions!
- Additional annotation to get better dependencies...
- And then deal with the sparse data issues

Slide 16: PCFG Example
- Lack of sensitivity to lexical information: a bare rule such as S -> NP VP scores the same no matter what the words are
- One solution (Collins and others):
  - Add the head (word, POS) to each nonterminal
  - Argument marking and pseudo-subcategorization frames
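The lexical insensitivity on this slide can be made concrete with a tiny sketch (not from the talk; the grammar, lexicon, and probabilities are invented for illustration). A plain PCFG multiplies rule probabilities that never look at the words, so swapping the subject and object changes nothing:

```python
# Minimal PCFG sketch (illustration only): rule probabilities ignore the
# words entirely, so swapping "IBM" and "Lotus" gives an identical score.

RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("N",)): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
LEXICON = {("N", "IBM"): 0.5, ("N", "Lotus"): 0.5, ("V", "bought"): 1.0}

def tree_prob(tree):
    """Probability of a tree = product of its rule and lexical probabilities."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):            # preterminal -> word
        return LEXICON[(label, children[0])]
    p = RULES[(label, tuple(c[0] for c in children))]
    for child in children:
        p *= tree_prob(child)
    return p

t1 = ("S", ("NP", ("N", "IBM")),
           ("VP", ("V", "bought"), ("NP", ("N", "Lotus"))))
t2 = ("S", ("NP", ("N", "Lotus")),
           ("VP", ("V", "bought"), ("NP", ("N", "IBM"))))
assert tree_prob(t1) == tree_prob(t2)           # words don't matter to the model
```

This is exactly the weakness that lexicalization (the head annotation mentioned above) is meant to fix.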

Slide 17: Collins' Modified PCFG
Lexicalized tree for "Yesterday IBM bought Lotus":

(S(bought,V)
  (NP(yest,N) (N yesterday))
  (NP-A(IBM,N) (N IBM))
  (VP(bought,V)
    (V bought)
    (NP-A(Lotus,N) (N Lotus))))

- New problem: sparse data

Slide 18: Collins: Dealing with Sparse Data
- Decompose the tree: a head-centered, top-down derivation
- Independence assumption: each word has an associated sub-derivation:
  - Head projection
  - Subcategorization (careful!)
  - Placement of "modifiers" (arguments and adjuncts)
  - Lexical dependencies
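The head-centered decomposition above can be sketched in a few lines. This is a deliberately simplified caricature of the Collins model (no distance features, no real subcat bookkeeping), and every probability table entry below is an invented number, purely for illustration:

```python
# Hedged sketch of a Collins-style decomposition: generate the head child
# first, then each left/right modifier independently given (parent, head),
# with STOP symbols ending each direction.

def rule_prob(parent, head, left_mods, right_mods, p_head, p_mod):
    """P(rule) ~= P(head|parent) * prod_i P(mod_i|parent,head,dir) * STOPs."""
    p = p_head.get((head, parent), 0.0)
    for side, mods in (("L", left_mods), ("R", right_mods)):
        for m in mods + ["STOP"]:               # STOP terminates each side
            p *= p_mod.get((m, parent, head, side), 0.0)
    return p

# Toy probability tables (invented values).
p_head = {("VP", "S"): 1.0}
p_mod = {
    ("NP-A", "S", "VP", "L"): 0.6, ("NP", "S", "VP", "L"): 0.3,
    ("STOP", "S", "VP", "L"): 0.4, ("STOP", "S", "VP", "R"): 0.9,
}

# S(bought,V) -> NP(yest) NP-A(IBM) VP(bought,V), modifiers only on the left.
p = rule_prob("S", "VP", ["NP", "NP-A"], [], p_head, p_mod)
```

The point of the decomposition is that each factor conditions on a small, frequently observed context instead of the whole flattened rule.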

Slide 19: Top-down derivation
- Generate the head:
  (S(bought,V) (VP(bought,V)))
- And subcat frames for left and right:
  Left: {NP-A}   Right: {}

Slide 20: Top-down derivation
- Generate a "modifier" with its POS tag:
  (S(bought,V) (NP-A(N)) (VP(bought,V)))
- Generated as an argument, so check off the subcat entry (the verb has a subject)

Slide 21: Top-down derivation
- Generate the head word of the modifier:
  (S(bought,V) (NP-A(IBM,N)) (VP(bought,V)))
- Recursively generate the modifier's subderivation

Slide 22: Top-down derivation
- Skipping the subderivation of the subject...
  (S(bought,V) (NP-A(IBM,N) (N IBM)) (VP(bought,V)))

Slide 23: Top-down derivation
- Generate another modifier with its POS tag:
  (S(bought,V) (NP(N)) (NP-A(IBM,N) (N IBM)) (VP(bought,V)))

Slide 24: Top-down derivation
- And then the head word:
  (S(bought,V) (NP(yest,N)) (NP-A(IBM,N) (N IBM)) (VP(bought,V)))

Slide 25: Top-down derivation
- And the recursive derivation of the modifier:
  (S(bought,V) (NP(yest,N) (N yesterday)) (NP-A(IBM,N) (N IBM)) (VP(bought,V)))
- Small locality of dependencies at each step

Slide 26: Backoff models to deal with sparse data
- Lexicalization sneaks in some notion of linguistic locality
- Huge sparse data problem with rules like:
  S(bought,V) -> NP(yesterday,N) NP(IBM,N) VP(bought,V)
- Decomposed the tree into a series of tree creation/attachment decisions
  - A "history-based" model
- But there's still a sparse data problem...

Slide 27: Backoff models to deal with sparse data
- How often will we see evidence of this?
  P(IBM,n,NP | bought,v,S,VP)
- Need to back off to use just the POS tag:
  P(IBM,n,NP | v,S,VP)
- POS tags bootstrap syntactic parsing: classify words by their syntactic behavior
  (diagram: (S(bought,V) (NP-A(N)) (VP(bought,V))))

Slide 28: Tagset Games - Spanish (Cowan & Collins, 2005)
- Plural subject, singular verb:
  (S(corrio,v) (NP(gatos,n)) (VP(corrio,v)))
  - How come this isn't ruled out?
- P(gatos,n,NP | corrio,v,S,VP) =
    P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)
- Time for the backoff models...

Slide 29: Tagset Games - Spanish (Cowan & Collins, 2005)
P(gatos,n,NP | corrio,v,S,VP) =
  P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)

P1(n,NP | corrio,v,S,VP) =
  λ1,1 P1,1(n,NP | corrio,v,S,VP) +
  λ1,2 P1,2(n,NP | v,S,VP) +
  λ1,3 P1,3(n,NP | S,VP)

- If corrio was not seen (P1,1), use the evidence without lexical dependence (P1,2)

Slide 30: Tagset Games - Spanish (Cowan & Collins, 2005)
P(gatos,n,NP | corrio,v,S,VP) =
  P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)

P2(gatos | n,NP,corrio,v,S,VP) =
  λ2,1 P2,1(gatos | n,NP,corrio,v,S,VP) +
  λ2,2 P2,2(gatos | n,NP,v,S,VP) +
  λ2,3 P2,3(gatos | n)

- If corrio was not seen (P2,1), use the evidence without lexical dependence (P2,2)
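The interpolated backoff in these equations is just a weighted sum over estimates of decreasing specificity. A minimal sketch (the λ weights and probability values here are invented for illustration; the real parser estimates them from counts):

```python
# Sketch of linearly interpolated backoff: most specific estimate first.

def backoff(levels):
    """levels = [(lambda_i, p_i), ...]; lambdas should sum to 1."""
    return sum(lam * p for lam, p in levels)

# If "corrio" was never seen with this context, the lexical estimate P2,1
# contributes nothing and the tag-only estimates take over.
p2 = backoff([(0.0, 0.0),    # P2,1(gatos | n,NP,corrio,v,S,VP): unseen
              (0.7, 0.4),    # P2,2(gatos | n,NP,v,S,VP)
              (0.3, 0.5)])   # P2,3(gatos | n)
```

The slide's point follows directly: when λ2,1 is (near) zero, whatever the lexical estimate "knew" about agreement is drowned out by the coarser levels.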

Slide 31: Tagset Games - Spanish (Cowan & Collins, 2005)
P2(gatos | n,NP,corrio,v,S,VP) =
  λ2,1 P2,1(gatos | n,NP,corrio,v,S,VP) +
  λ2,2 P2,2(gatos | n,NP,v,S,VP) +
  λ2,3 P2,3(gatos | n)

- If corrio was not seen (P2,1), use the evidence without lexical dependence (P2,2)
- But P2,1 is the only probability that rules it out
  - Even worse: even if it was seen, probably not often, so P2,2 will overwhelm it

Slide 32: Tagset Games - Spanish (Cowan & Collins, 2005)
- "The impoverished model can only capture morphological restrictions through lexically-specific estimates based on extremely sparse statistics"
- But suppose the noun and verb part-of-speech tags had number information.

Slide 33: Tagset Games - Spanish (Cowan & Collins, 2005)
P1(pn,NP | corrio,sv,S,VP) =
  λ1,1 P1,1(pn,NP | corrio,sv,S,VP) +
  λ1,2 P1,2(pn,NP | sv,S,VP) +
  λ1,3 P1,3(pn,NP | S,VP)

- pn = plural noun, sv = singular verb
- P1,2 will be very low, with a high-confidence λ1,2
- They tried a variety of ways to play with the tagset.

Slide 34: Tagset Games - Spanish (Cowan & Collins, 2005)
- Scores not additive; sparse data?
- Best model, n(A,D,N,V,P)+m(V), helps with:
  - Finding subjects
  - Distinguishing infinitival and gerund VPs
  - Attaching NP and PP postmodifiers to verbs

  Model                                   Score
  Baseline                                81.0
  number(Adj,Det,Noun,Pronoun,Verb)       82.8
  mode(V)                                 82.4
  person(V)                               82.4
  number(A,D,N,P,V)+mode(V)               83.5
  number(A,D,N,P,V)+mode(V)+person(V)     83.2

Slide 35: Tagset Games - Czech (Collins, Hajic, Ramshaw, Tillmann 1999)
- Convert from dependency to phrase structure
- Baseline: use the main POS of each tag
  - NNMP1-----A- mapped to N
    (noun, masculine, plural, nominative, "affirmative" negativeness)
- Two-letter tags: main POS plus either the detailed POS (for D,J,V,X) or the case: 58 tags
- Richer tagsets -> no improvement, "presumably" because of "damage from sparse data"
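The two-letter reduction described above can be sketched directly from the slide's description, assuming the standard 15-position Prague positional tags (position 1 = main POS, position 2 = detailed POS, position 5 = case); the exact mapping here is an assumption, not the published code:

```python
# Sketch of the Collins et al. '99 two-letter Czech tag reduction
# (assumed positional layout: tag[0]=main POS, tag[1]=detailed POS,
# tag[4]=case).

def two_letter(tag):
    main, detailed, case = tag[0], tag[1], tag[4]
    second = detailed if main in "DJVX" else case
    return main + second

assert two_letter("NNMP1-----A----") == "N1"   # noun keeps its case (nom)
assert two_letter("VpYS---XR-AA---") == "Vp"   # verb keeps its detailed POS
```

The design trade-off is exactly the slide's theme: keep the one extra position most likely to matter for parsing (case for nominals, subtype for verbs) while holding the tagset down to 58 tags.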

Slide 36: Tagset Games - Arabic Treebank (Bikel, 2004), (Kulick, Gabbard, Marcus 2006)
- Lots of tags
  - Usual sparse data problem
  - Bikel: can even get new tags not seen in training
- Map them down:
  DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC -> JJ
  - The "Bies tag set": it was just a quick hack!
- (Kulick et al.) Keep the determiner at least:
  DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC -> DT+JJ
- The case endings mostly aren't really there
  - Maybe: case information to identify heads of constituents (e.g., ADJ heading an NP)
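A sketch of the two reductions on this slide. The mapping table here is a tiny, hypothetical fragment (the real Bies mapping covers many more categories); only the two example tags from the slide are guaranteed to match:

```python
# Hedged sketch: collapse a composite Arabic Treebank tag to a PTB-style
# tag, optionally keeping the determiner prefix (the Kulick et al. variant).

def reduce_tag(full_tag, keep_det=False):
    core = {"ADJ": "JJ", "NOUN": "NN", "ADV": "RB"}   # partial, illustrative
    parts = full_tag.split("+")
    has_det = parts[0] == "DET"
    # First recognizable core category wins; default NN is an assumption.
    stem = next((core[p] for p in parts if p in core), "NN")
    return ("DT+" if has_det and keep_det else "") + stem

full = "DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC"
assert reduce_tag(full) == "JJ"                    # Bies-style collapse
assert reduce_tag(full, keep_det=True) == "DT+JJ"  # keep the determiner
```

Note how both reductions simply discard the CASE_DEF_ACC suffix, which is the slide's point about the case endings mostly not being there anyway.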

Slide 37: Tagset Games - Conclusion (still the same as in Collins '99)
- More tag information without sparse data?
  - P(modifier POS | head POS)
  - The difficulty of doing this is motivation for other parsing models
- Lots of word forms without sparse data?
  - P(word form | word stem, POS tag)
- Another question: for parsing, how important is such information compared to case?
  - Where does it help disambiguate the parse?

Slide 38: What about Free Word Order?
- Mentioned as a problem in the Czech work; nothing done
- May not have mattered for evaluation?
  - If the SBJ, OBJ, etc. labels aren't scored, so what if the parser doesn't know what's what?

Slide 39: What about Free Word Order?
- "Subcat" frame between each level:
  (S(bought,V) (NP-A(N)) (VP(bought,V)))
- Obvious thing to try: integrate case assignment into the subcat frames
  - The verb requires NOM instead of NP-A, etc.
  - Alluded to in Collins 2003, not done (but see Zeman 2002 for Czech dependency parsing)
- Would it ease the problem of using other morphological info?

Slide 40: What about Free Word Order?
- Problem: "subcat" frames as implemented are near meaningless
- Independent horizontally
  - Left and right are independent
- Independent vertically, between one-level trees
  - Sisters of the VP are independent of sisters of the V
- Can this be fixed?
  - Requires some greater amount of history between one-level trees, based on head percolation

Slide 41: Long-distance movement?
- Not handled in most generative parsers
  - Exception: CCG (Hockenmaier 2003)
- Postprocessing:
  - "Who do you think John saw" -> "Who_i do you think John saw t_i"
- "Good enough", or a way to integrate it into parsing?
  - "Good enough" for languages with more long-distance movement?

Slide 42: Discriminative Parsers - Outline
- Basic idea and main properties
- Dependency parsers
  - Easier handling of morphology and free word order: how successful?
  - Long movement ("non-projective"): how successful?
- Discriminative phrase-structure parsers
  - Handling of morphology and free word order: hasn't really been tried

Slide 43: Discriminative Parsers
- Conditional P(T|S) instead of joint P(T,S)
  - Training requires reparsing the training corpus to update parameters
  - Can take a long time!
- Easier to utilize dependent features
  - Successfully used in other areas of NLP
  - How about for parsing? A computational problem

Slide 44: Discriminative Parsers
- Dependency parsing
  - Not as computationally hard, still useful
- Post-processing: parse reranking
  - Just work with the output of a k-best generative parser
- Phrase-structure parsing
  - Limited to sentences of length <= 15
  - Lots of pruning; doesn't outperform generative parsers (but can still be promising for using morphology)

Slide 45: Multi-Lingual Dependency Parsing
- CoNLL Shared Task, 2006, 2007
- High-performing system: McDonald '06
- Unlabelled parsing
  - Projective or non-projective ("long" movement)
  - Still requires factoring the parsing problem
  - Features over (head, modifier, previous modifier); within that, very flexible
- Labelled parsing
  - Postprocessing stage to add labels
  - Features not limited
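The edge-factored scoring this slide describes can be sketched as a linear model over features of a (head, modifier) pair, with morphology thrown in. The feature templates and weights below are invented for illustration; McDonald's actual templates are far richer:

```python
# Hedged sketch of McDonald-style edge scoring with morphological
# features (hypothetical templates: POS pair, number agreement,
# modifier case).

def edge_features(head, mod):
    """head/mod are dicts with 'pos' and optional morph attributes."""
    return [
        f"hp={head['pos']}|mp={mod['pos']}",
        f"hp={head['pos']}|mp={mod['pos']}|agree={head.get('num') == mod.get('num')}",
        f"mcase={mod.get('case')}|hp={head['pos']}",
    ]

def score(head, mod, weights):
    return sum(weights.get(f, 0.0) for f in edge_features(head, mod))

w = {"mcase=NOM|hp=V": 2.0, "hp=V|mp=N|agree=True": 1.0}
verb = {"pos": "V", "num": "sg"}
noun = {"pos": "N", "num": "sg", "case": "NOM"}
assert score(verb, noun, w) == 3.0   # agreement + nominative both fire
```

The flexibility the slide mentions is visible here: any property of the two tokens (case, agreement, anything in the FEATS column) can become a feature without changing the parsing algorithm.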

Slide 46: Parsing Results

  Language         Regular  Projective  No-morph
  Arabic (real)    66.9                 65.1
  Bulgarian        87.6                 87.2
  Chinese          85.9
  Czech (real)     80.2
  Danish (real)    84.8     84.1        84.0
  Dutch            79.2     74.7        77.8
  English          89.4
  German           87.3
  Japanese         90.7                 90.6
  Portuguese       86.8     87.0        85.7
  Slovene (real)   73.4     72.6        71.5
  Spanish          82.3                 80.9
  Swedish (real)   82.6
  Turkish (real)   63.2                 60.6

Slide 47: Effects of non-projectivity

  Language         Regular  Projective  No-morph
  Arabic (real)    66.9                 65.1
  Bulgarian        87.6                 87.2
  Danish (real)    84.8     84.1        84.0
  Dutch            79.2     74.7        77.8
  Japanese         90.7                 90.6
  Portuguese       86.8     87.0        85.7
  Slovene (real)   73.4     72.6        71.5
  Spanish          82.3                 80.9
  Swedish (real)   82.6
  Turkish (real)   63.2                 60.6

Slide 48: Effects of morphology

  Language         Regular  Projective  No-morph
  Arabic (real)    66.9                 65.1
  Bulgarian        87.6                 87.2
  Danish (real)    84.8     84.1        84.0
  Dutch            79.2     74.7        77.8
  Japanese         90.7                 90.6
  Portuguese       86.8     87.0        85.7
  Slovene (real)   73.4     72.6        71.5
  Spanish          82.3                 80.9
  Swedish (real)   82.6
  Turkish (real)   63.2                 60.6

Slide 49: Dependency Parsing
- Improvement with morphology
  - Effect of different types (case, inflection?)
- Improvement with freer word order?
  - Local freer word order is reflected in labeled accuracy; how does it compare?
- Improvement with non-local movement?
  - Anything is an improvement
  - Czech, all sentences: 85.2; only sentences with a non-projective dependency: 81.9 (unlabeled!)

Slide 50: Parse Reranking
- Work with the output of a k-best parser
  - Limited problem, computationally easier
  - No tree decomposition, arbitrary features
  - e.g., trigrams with the heads of the arguments of PPs
- Spanish: 83.5 to 85.1 with reranking
  - Cowan & Collins: same reranking features as for English
- Reranking incorporating morphological features?
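The reranking setup above is a small piece of machinery: score each of the k candidate parses with a linear model and take the argmax, typically with the generative model's log-probability as one feature. A minimal sketch (feature names and weights invented for illustration):

```python
# Hedged sketch of k-best reranking: each candidate is (logprob, features),
# scored linearly; the generative score is just another weighted feature.

def rerank(candidates, weights):
    """candidates: list of (logprob, feature_dict); returns the best index."""
    def score(cand):
        logprob, feats = cand
        s = weights.get("__logprob__", 1.0) * logprob
        return s + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))

# A slightly less probable parse wins because a (hypothetical) global
# feature -- the kind the base parser cannot express -- fires on it.
cands = [(-10.0, {"pp_trigram_ok": 0.0}),
         (-10.5, {"pp_trigram_ok": 1.0})]
assert rerank(cands, {"__logprob__": 1.0, "pp_trigram_ok": 2.0}) == 1
```

Adding morphological features to such a reranker, the open question on this slide, would mean nothing more than new entries in the feature dict.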

Slide 51: Discriminative Phrase-Structure Parsing
- Which discriminative parsers parse the usual set of phrase-structure sentences?
  - Ratnaparkhi '98: history-based, "local" features
  - Collins & Roark '04: incremental parsing, "global" features
  - Shen '06: Tree Adjoining Grammar-based
    - Related to the previous
    - A different approach to non-projectivity
    - But modifies trees severely
  - Others? (CCG?)

Slide 52: Incremental Parsing (Collins & Roark '04)
- Main properties
  - About the same performance as generative parsers
  - Severe pruning of the possible parses for a sentence
  - Features same as generative
  - Can use the generative score as a feature
- Possibilities:
  - Use morphological features
  - Free word order, non-local dependencies?
  - More tightly integrate generative and discriminative
    - This should be done

Slide 53: Issues, Questions, Conclusions
- Not the question: how can a parser work with a language with lots of morphology?

Slide 54: Issues, Questions, Conclusions
- The question(s):
  - What do we want the annotation to look like?
    - An independent question
  - Based on what we know of how parsers work, what will the parser have problems recovering?
    - We can (mostly) answer this
  - Where might morphological information be valuable (or not)?
    - We can speculate about this
  - What approaches should we use?
    - It depends on the above

Slide 55: Some Questions from the Talk
- Sparse data from greater morphology
  - Can the tagset game be sufficient for utilizing what is valuable?
  - How to back off from sparse word forms when there is more inflection?
- Free word order
  - Can case be integrated into subcat frames?
    - For generative (need better subcat frames)
    - For discriminative (need subcat frames?)
- Function tags / labelled dependencies
  - Good enough for what's needed?

Slide 56: Some Questions from the Talk
- Empty category recovery
  - Features to use with morphology?
- Long-distance movement
  - Can it be hacked into the generative model?
  - How adequate are dependency parsers?
  - How usable is the TAG-based approach?
  - How much do we care, for parsing?
- Is reranking a reasonable approach?
  - Does it make sense to throw morphological features in here?

Slide 57: Combining Solutions
- Generative parser as input for a discriminative one
  - Incremental parser (Collins & Roark)
  - Rerankers
- Better input for the parser via preprocessing
  - Dependency parser as input for a generative parser
  - Chunking using morphology as input for either (maybe we don't need to parse the entire sentence?)
- Tighter integration?
  - Move into discriminative mode at key points inside the generative model

Slide 58: Blah
- The following slides don't count

Slide 59: Issues for Discussion
- Sneaking discriminative info into the generative model
- Dependency parsing as a preprocessing constraint for phrase-structure parsing
- How much does morphology matter for Icelandic anyway?
- Will the treebank be phrase-structure or dependency?
- The world could really use a high-quality phrase-structure treebank with lots of morphology

Slide 60: Issues for Discussion
- Preprocessing constituent bracketing from case info
  - Do we even really need to parse the whole sentence?
- Arabic hypothetical example of using case information to identify heads before parsing
- A linguistically interesting way to order morphological info in the generative model?

Slide 61: Issues for Discussion
- And can't forget: what about morphology and function tags and empty categories?
  - How helpful will the morphology be? Examples?
  - Again, this depends on what the treebank will look like

Slide 62: Alternative Approach #2
- Hack around with the parser to allow a little "discriminative" modelling to sneak in at key points.
- Probably need to save this point for after the end.

Slide 63: Generative Parsing - Summary
- Pretty good at
  - Skeletal structure for English
  - Recovering function tags for the Penn Treebank
  - Post-processing empty category recovery
- Not good at (or unknown)
  - Integrating complex POS/morph tags
  - Long-distance movement
  - Free word order (but perhaps integrated into post-processing)

Slide 64: Another note on Function Tags
- A treebanker, faced with a function tag:
  - If correct, nothing to do
  - If incorrect, delete it and assign a new one
  - If none, add one if necessary
- Want to increase the precision to where the tags can be assumed to be correct
- Possible for some tags, e.g., SBJ
- Be wary of overall numbers

