Published by Edith Jefferson. Modified over 8 years ago.
1
Coping with Problems in Grammars Automatically Extracted from Treebanks
Carlos A. Prolo
Computer and Info. Science Dept.
University of Pennsylvania
2
Context
● The extraction of a Tree Adjoining Grammar (TAG)
● From the Penn Treebank (English WSJ corpus)
● Using Xia's extraction tool + other stuff
Focus
● Extraction problems:
– Some case studies
3
Teaser
● The business of grammar extraction from corpora is intended to produce a grammar with “full” coverage of the constructions in a language
● But we know we don't know how to model many syntactic phenomena
● So, what are we doing?
● We have to start looking, pragmatically, at the quality of the extracted grammars we produce
4
Sources of extraction problems
1 Lack of proper linguistic account
2 Treebank annotation style
3 Extraction tool/process itself
4 Unsuitability of the language model
5 Unsuitability of the grammar formalism
6 Annotation errors
7 ... and, of course, inability on the part of the grammar developers
5
Sources of extraction problems
1 Lack of proper linguistic account
2 Treebank annotation style
3 X Extraction tool/process itself
4 Unsuitability of the language model
5 Unsuitability of the grammar formalism
6 X Annotation errors
6
Lexicalized Tree Adjoining Grammar (LTAG)
[Figure: elementary trees; recoverable node labels: VP, VP*, Adv; NP, N; S, V, VP]
7
LTAG: combining trees
[Figure: elementary trees (VP, VP*, Adv; NP, N; S, V, VP) and the tree derived from them (S, NP, VP, Adv, V, NP, N, N)]
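The two LTAG combining operations, substitution and adjunction, can be sketched in a few lines. This is a minimal illustration, not the extraction tool's code: the `Node` class, the function names, and the example words ("dogs", "sleeps", "soundly") are all assumptions made for the sketch, and adjunction at the root of a tree is not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node used only for this sketch."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # set on lexical anchors
    subst: bool = False          # substitution site
    foot: bool = False           # foot node of an auxiliary tree (the * node)


def substitute(tree, initial):
    """Replace the first substitution site labelled like the root of `initial`."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == initial.label:
            tree.children[i] = initial
            return True
        if substitute(child, initial):
            return True
    return False


def _plug_foot(aux, subtree):
    """Hang `subtree` where the foot node of `aux` was."""
    for i, child in enumerate(aux.children):
        if child.foot:
            aux.children[i] = subtree
            return True
        if _plug_foot(child, subtree):
            return True
    return False


def adjoin(tree, aux):
    """Adjoin `aux` at the first non-root internal node labelled like its root."""
    for i, child in enumerate(tree.children):
        if child.label == aux.label and child.children and not child.foot:
            _plug_foot(aux, child)     # the old subtree moves under the foot
            tree.children[i] = aux     # the auxiliary tree takes its place
            return True
        if adjoin(child, aux):
            return True
    return False


def words(node):
    """Left-to-right yield of the tree."""
    if node.word is not None:
        return [node.word]
    return [w for c in node.children for w in words(c)]


# Initial tree for a verb, with an NP substitution site.
s = Node("S", [Node("NP", subst=True), Node("VP", [Node("V", word="sleeps")])])
substitute(s, Node("NP", [Node("N", word="dogs")]))
# Auxiliary tree VP -> VP* Adv, as in the figure.
adjoin(s, Node("VP", [Node("VP", foot=True), Node("Adv", word="soundly")]))
print(words(s))  # ['dogs', 'sleeps', 'soundly']
```

Substitution fills a frontier node; adjunction splices an auxiliary tree into the middle of a tree, which is what lets modifiers such as the adverb be factored out of the verb's tree.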
8
Automatic TAG extraction
[Figure courtesy of Fei Xia]
10
A few selected problem cases
1 (PTB) Extraction of Free Relatives
2 Wh percolation up
3 “Unlike Coordinated Phrases” (UCP)
4 Extraposition (Verb Subcategorization)
5 Parentheticals
6 VP topicalization
7 X (PTB) Projection of Parts-of-speech
11
Extraction of Free Relatives (problem due to PTB annotation style)
As annotated in the PTB:
(S-3 (NP-SBJ (PRP We)) (VP (VBP make) (SBAR-NOM (WHNP-1 (WP what)) (S we know how to make))))
After the change:
(S-3 (NP-SBJ (PRP We)) (VP (VBP make) (NP (NP (WP what)) (SBAR (WHNP-1 (-NONE- 0)) (S we know how to make)))))
● Problem: Free relatives are annotated as wh sentential complements, so the verb is extracted with the wrong argument category: “S (SBAR)”
● Solution: Change the free relatives to NP (the relative clause has an empty wh; “head” account, Bresnan 78)
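The re-annotation of free relatives can be sketched as a small tree rewrite applied before extraction. This is only an illustration under stated assumptions: the helper names `parse`, `show`, and `fix_free_relatives` are hypothetical, trees are held as nested lists, and only the exact label `SBAR-NOM` with a WHNP first child is handled (additional function tags or other wh categories are ignored).

```python
import re


def parse(s):
    """Parse a Penn-Treebank-style bracketing into nested lists [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                       # skip "("
        tree = [tokens[pos]]           # node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                tree.append(node())
            else:
                tree.append(tokens[pos])   # a word
                pos += 1
        pos += 1                       # skip ")"
        return tree

    return node()


def show(t):
    """Print a nested-list tree back as a bracketing."""
    if isinstance(t, str):
        return t
    return "(" + " ".join(show(c) for c in t) + ")"


def fix_free_relatives(t):
    """Rewrite (SBAR-NOM (WHNP-i ...) S) as (NP (NP ...) (SBAR (WHNP-i (-NONE- 0)) S))."""
    if isinstance(t, str):
        return t
    t = [t[0]] + [fix_free_relatives(c) for c in t[1:]]
    if (t[0] == "SBAR-NOM" and len(t) == 3
            and not isinstance(t[1], str) and t[1][0].startswith("WHNP")):
        wh, clause = t[1], t[2]
        head = ["NP"] + wh[1:]                  # the wh-word becomes the NP head
        empty_wh = [wh[0], ["-NONE-", "0"]]     # empty wh keeps the original index
        return ["NP", head, ["SBAR", empty_wh, clause]]
    return t


src = ("(S-3 (NP-SBJ (PRP We)) (VP (VBP make) "
       "(SBAR-NOM (WHNP-1 (WP what)) (S we know how to make))))")
print(show(fix_free_relatives(parse(src))))
```

On the slide's example the output matches the second bracketing, so the verb "make" is then extracted with an NP argument rather than an S (SBAR) one.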
12
Wh percolation up
(NP (NP (DT the) (NNS researchers)) (SBAR (WHNP-3 (WP who)) (S (NP-SBJ (-NONE- *T*-3)) (VP (VBD studied) (NP (DT the) (NNS workers))))))
[Figure: extracted trees; recoverable node labels: WHNP, WP; NP, NNS]
13
Wh percolation up
(SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward)))))
[Figure: extracted trees; recoverable node labels: NP, NN + NP, WP, NP*]
14
Wh percolation up
(SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward)))))
[Figure: extracted trees; recoverable node labels: NP, NN + NP, WP, NP*; WHNP, WP, WHNP*]
15
Wh percolation up
(SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward)))))
[Figure: extracted trees; recoverable node labels: NP, NN + WHNP, WP, NP*; NP, WP, NP*; WHNP, WP, WHNP*]
16
Wh percolation up
(SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward)))))
[Figure: extracted trees; recoverable node labels: NP, NN + WHNP, WP, NP*; NP, WP, NP*; WHNP, WP, WHNP*]
(Vijay-Shanker et al.)
17
Wh percolation up
(SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward)))))
[Figure: extracted trees; recoverable node labels: WHNP, WP + ?]
18
Wh percolation up
(NP-SBJ (NP The bid) (PP for Great Northern) (, ,) (SBAR (WHNP-1 (NP (DT a) (NN notice)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* appears in an advertisement)))
[Figure: extracted tree; recoverable node labels: WHNP, WP]
19
Wh percolation up
(SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward)))))
[Figure: extracted trees; recoverable node labels: WHNP, NN + WHNP, WP, WHNP*; WHNP, WHNP*, WHPP, IN, WHNP]
20
Unlike Coordinated Phrases (UCP)
(NP (UCP (NN construction) (CC and) (JJ commercial)) (NNS loans))
(VP (VB be) (UCP-PRD (NP (CD 35)) (CC or) (ADJP (JJR older))))
(VP (VB take) (NP (NN effect)) (UCP-TMP (ADVP 96 days later) (, ,) (CC or) (PP in early February)))
21
Unlike Coordinated Phrases (UCP)
(NP (UCP (NN construction) (CC and) (JJ commercial)) (NNS loans))
(VP (VB be) (UCP-PRD (NP (CD 35)) (CC or) (ADJP (JJR older))))
(VP (VB take) (NP (NN effect)) (UCP-TMP (ADVP 96 days later) (, ,) (CC or) (PP in early February)))
[Figure: extracted trees; recoverable node labels: NP, NP*, UCP, NN, CC, JJ; S, NP, VP, VB [be], UCP]
22
Unlike Coordinated Phrases (UCP)
● We give the UCP the status of an independent nonterminal, as if it had some intrinsic categorial significance
● Multiple conjuncts: it is enough for one of them to be of a distinct category to turn the entire constituent into a UCP
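The "one distinct conjunct is enough" observation can be made concrete with a small labelling function. This is a sketch of the rule as stated on the slide, not the treebank's annotation procedure: the names `SEPARATORS`, `category`, and `coordination_label` are hypothetical, and the function only looks at the labels of the coordination's children.

```python
# Labels that separate conjuncts rather than being conjuncts themselves
# (an assumption for this sketch).
SEPARATORS = {"CC", ",", ":"}


def category(label):
    """Strip function tags and indices: 'ADJP-PRD' -> 'ADJP'."""
    return label.split("-")[0]


def coordination_label(child_labels):
    """Label a coordination: the shared category, or UCP if any conjunct differs."""
    cats = {category(l) for l in child_labels if l not in SEPARATORS}
    if not cats:
        raise ValueError("no conjuncts found")
    return cats.pop() if len(cats) == 1 else "UCP"


print(coordination_label(["NP", ",", "NP", "CC", "NP"]))        # NP
print(coordination_label(["NP", ",", "NP", "CC", "ADJP-PRD"]))  # UCP
```

The second call shows the point of the slide: two of the three conjuncts agree, yet the whole constituent is still labelled UCP, so the label carries no categorial information of its own.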
23
Unlike Coordinated Phrases (UCP): as the head of a constituent
(S (NP-SBJ-1 The Series 1989 B bonds) (VP (VBP are) (VP (VBN rated) (S *-1 double-A))))
(S (NP-SBJ-1 The Series 1989 B bonds) (VP (VBP are) (UCP-PRD (ADJP-PRD (JJ uninsured)) (CC and) (VP (VBN rated) (S *-1 double-A)))))
24
Extraposition (“it” extraposition)
(S (NP-SBJ-1 (NP (PRP it)) (S (-NONE- *EXP*-2))) (VP (MD would) (ADVP-TMP (RB no) (RBR longer)) (VP (VB be) (ADJP-PRD (JJ possible)) (S-2 (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB win) (NP (NN reinstatement))))))))
[Figure: extracted tree; recoverable node labels: VP, VP*, S [win]]
25
Extraposition (relative clause)
(S (ADVP-TMP (RB Soon)) (, ,) (NP-SBJ (NP (NNS T-shirts)) (SBAR (-NONE- *ICH*-1))) (VP (VBD appeared) (PP-LOC (IN in) (NP (DT the) (NNS corridors))) (SBAR-1 (WHNP-2 (WDT that)) (S (NP-SBJ (-NONE- *T*-2)) (VP (VBD carried) (NP (NP the school 's familiar logo) (PP-LOC on the front)))))))
[Figure: extracted tree; recoverable node labels: VP, VP*, SBAR [carried]]
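Both extraposition examples rely on the same annotation device: a null element (`*EXP*-n` or `*ICH*-n`) in the canonical position, coindexed with the displaced constituent. A sketch of how such pairs might be located before deciding how to extract them follows. The function name `extraposed_pairs` and the nested-list tree encoding are assumptions for illustration, and the example tree is abbreviated from the relative-clause bracketing.

```python
import re


def extraposed_pairs(tree):
    """Map each *EXP*/*ICH* index to the label of the coindexed constituent."""
    traces, labels = set(), {}

    def walk(t):
        if isinstance(t, str):
            # A leaf: record the index if it is an extraposition null element.
            m = re.fullmatch(r"\*(EXP|ICH)\*-(\d+)", t)
            if m:
                traces.add(m.group(2))
            return
        # An internal node: record labels that end in an index, e.g. SBAR-1.
        m = re.search(r"-(\d+)$", t[0])
        if m:
            labels.setdefault(m.group(1), t[0])
        for child in t[1:]:
            walk(child)

    walk(tree)
    return {i: labels[i] for i in sorted(traces) if i in labels}


tree = ["S",
        ["NP-SBJ", ["NP", ["NNS", "T-shirts"]], ["SBAR", ["-NONE-", "*ICH*-1"]]],
        ["VP", ["VBD", "appeared"],
         ["SBAR-1", ["WHNP-2", ["WDT", "that"]],
          ["S", ["NP-SBJ", ["-NONE-", "*T*-2"]], ["VP", ["VBD", "carried"]]]]]]
print(extraposed_pairs(tree))  # {'1': 'SBAR-1'}
```

The `*T*-2` trace is deliberately ignored: it marks ordinary wh-movement inside the relative clause, not extraposition.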
26
Extraposition (Object)
(S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...)
[Figure: extracted trees; recoverable content: S, NP, VBZ, SBAR, VP [says] + S, NP, VBD, NP, VP, SBAR [told]; VP, VP*, PP [in]; VP, VP*, NP [week]]
27
Extraposition (Object)
(S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...)
[Figure: extracted trees; recoverable content: S, NP, VBZ, SBAR, VP [says] + S, NP, VBD, NP, VP, SBAR [told]; VP, VP*, PP [in]; VP, VP*, NP [week]]
Note: Chiang 2000 (sister adjunction)
28
Extraposition (Object)
(S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...)
[Figure: extracted trees; recoverable content: S, NP, VBZ, VP [says] + S, NP, VBD, NP, VP [told]; VP, VP*, PP [in]; VP, VP*, NP [week]; VP, VP*, SBAR [S compl]]
29
Extraposition (Object)
[Figure: multi-component tree sets; recoverable content: S, NP, VP, SBAR, VP with VBZ [says]; S, NP, VP, SBAR, VP, NP with VBD [told]]
Note: Multi-component TAGs (Bleam & Xia, TAG+ 2000)
30
Parentheticals (non-lexicalized trees!!)
(NP (NP the 3 billion New Zealand dollars) (PRN (-LRB- -LRB-) (NP US$ 1.76 billion *U*) (-RRB- -RRB-)))
(S (NP-SBJ The total relationship) (PRN (, ,) (SBAR-ADV as Mr. Lee sees it) (, ,)) (VP (VBZ is)...))
[Figure: extracted trees; recoverable node labels: VP, PRN, VP*, SBAR; NP, NP*, PRN, NP]
31
VP Topicalization
(S (NP-SBJ-1 investments in...) (VP (MD will) (VP (VB be) (VP (VBN excluded)))))
[Figure: Lexical Head versus Syntactic Head analyses; recoverable node labels: S, NP, VP, V, VP*, VBN; anchors: [be], [excluded]]
32
VP Topicalization
(SINV (ADVP (RB Also)) (VP-TPC-2 (VBN excluded)) (VP (MD will) (VP (VB be) (VP (-NONE- *T*-2)))) (NP-SBJ-1 investments in...))
[Figure: Lexical Head versus Syntactic Head analyses; recoverable node labels: S, NP, VP, V, VP*, VBN; anchors: [be], [excluded]]
33
Projections of Parts-of-speech
(NP (DT a) (JJR stronger) (NN argument))
(NP (DT an) (ADJP (RB even) (JJR stronger)) (NN argument))
[Figure: extracted trees; recoverable content: NP, JJR, NP* [stronger]; NP, ADJP, NP*, JJR [stronger]]
34
Projections of Parts-of-speech
(NP-SBJ-1 (NNP October) (NN weather))
(NP-SBJ-1 (NP (JJ late) (NNP October)) (NN weather))
[Figure: extracted trees; recoverable content: NP, NNP, NP* [October]; NP, NP, NP*, NNP [October]]
35
Forced Projections of Parts-of-speech
PROJECTED          PROJECTION
NN, NNP, PRP, EX   NP
JJ, JJR, JJS       ADJP
RB, RBR, RBS       ADVP
S, SINV            SBAR
SQ                 SBARQ
WP                 WHNP
WRB                ADVP
CD                 QP
QP                 NP
UH                 INTJ
LS                 LST
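The table above can be read as a lookup applied to treebank nodes before extraction, so that a bare modifier and its phrasal counterpart yield the same tree shape. The sketch below makes two assumptions that the slides do not state: that only non-head children are projected, and that the head position is supplied by the caller. The names `PROJECTION`, `project`, and `project_modifiers` are hypothetical, and each node is projected once (the CD, QP, NP chain is not followed).

```python
# The projection table from the slide, as a lookup.
PROJECTION = {
    "NN": "NP", "NNP": "NP", "PRP": "NP", "EX": "NP",
    "JJ": "ADJP", "JJR": "ADJP", "JJS": "ADJP",
    "RB": "ADVP", "RBR": "ADVP", "RBS": "ADVP",
    "S": "SBAR", "SINV": "SBAR",
    "SQ": "SBARQ",
    "WP": "WHNP",
    "WRB": "ADVP",
    "CD": "QP",
    "QP": "NP",
    "UH": "INTJ",
    "LS": "LST",
}


def project(node):
    """Wrap a nested-list node in its forced projection, if the table lists one."""
    target = PROJECTION.get(node[0])
    return [target, node] if target else node


def project_modifiers(tree, head_index):
    """Project every child except the head (head_index counts children from 0)."""
    label, children = tree[0], tree[1:]
    return [label] + [c if i == head_index else project(c)
                      for i, c in enumerate(children)]


np1 = ["NP", ["DT", "a"], ["JJR", "stronger"], ["NN", "argument"]]
print(project_modifiers(np1, head_index=2))
# ['NP', ['DT', 'a'], ['ADJP', ['JJR', 'stronger']], ['NN', 'argument']]
```

After projection, "stronger" sits under an ADJP in both "a stronger argument" and "an even stronger argument", so the two contexts no longer produce different modifier trees for the same word.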
36
Conclusion
● Full coverage of language is (currently) utopian
● Grammar extraction can/should be used to search for solutions to grammar development problems
● We presented a few selected problems in grammar extraction and discussed solutions with various degrees of acceptability (using TAGs)
● There are more and harder ones where these came from
● Question: how would these problems be handled:
– By other grammar formalisms?
– By other linguistic approaches using the TAG formalism?
37
LTAG Verb Trees
[Figure: verb tree family; recoverable node labels: S, NP, V, VP; S, NP, VBN, VP, SBAR, WHNP; NP, NP*]
38
Automatic TAG extraction
39
Figure courtesy of Fei Xia
40
Wh percolation up
(NP-SBJ (NP The bid) (PP for Great Northern) (, ,) (SBAR (WHNP-1 (NP (DT a) (NN notice)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* appears in an advertisement)))
(NP-PRD (NP (NNS hitches)) (, ,) (SBAR (RB not) (WHNP-17 (NP (DT the) (JJS least)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* was that... )
41
Extraposition
(S (NP-SBJ (NP (PRP it)) (S (-NONE- *EXP*-1))) (VP (VBZ is) (ADJP-PRD (JJ unjust)) (S-1 (NP-SBJ (-NONE- *)) (VP (TO to) (VP (VB reprove) (NP (NNP China)) (PP-PRP (IN for) (NP (PRP it))))))))