Coping with Problems in Grammars Automatically Extracted from Treebanks Carlos A. Prolo Computer and Info. Science Dept. University of Pennsylvania.


Context ● The extraction of a Tree Adjoining Grammar (TAG) ● From the Penn Treebank (English WSJ corpus) ● Using Xia's extraction tool + other stuff
Focus ● Extraction problems: – Some case studies

Teaser ● The business of grammar extraction from corpora is intended to produce a grammar with “full” coverage of the constructions in a language ● But we know that we do not know how to model many syntactic phenomena ● So, what are we doing? ● We have to start looking, pragmatically, at the quality of the extracted grammars we produce

Sources of extraction problems 1 Lack of proper linguistic account 2 Treebank annotation style 3 Extraction tool/process itself 4 Unsuitability of the language model 5 Unsuitability of the grammar formalism 6 Annotation errors 7 ... and, of course, inability on the part of the grammar developers

Sources of extraction problems 1 Lack of proper linguistic account 2 Treebank annotation style 3 X Extraction tool/process itself 4 Unsuitability of the language model 5 Unsuitability of the grammar formalism 6 X Annotation errors

Lexicalized Tree Adjoining Grammar (LTAG) [Figure: elementary trees with node labels S, NP, VP, V, N and Adv, including a foot node VP*]

LTAG: combining trees [Figure: elementary trees (node labels S, NP, VP, VP*, V, N, Adv) and the tree obtained by combining them]

Automatic TAG extraction Figure is thanks to Fei Xia

A few selected problem cases 1 (PTB) Extraction of Free Relatives 2 Wh percolation up 3 “Unlike Coordinated Phrases” (UCP) 4 Extraposition (Verb Subcategorization) 5 Parentheticals 6 VP topicalization 7 X (PTB) Projection of Parts-of-speech

Extraction of Free Relatives (problem due to PTB annotation style) PTB annotation: (S-3 (NP-SBJ (PRP We)) (VP (VBP make) (SBAR-NOM (WHNP-1 (WP what)) (S we know how to make)))) Proposed annotation: (S-3 (NP-SBJ (PRP We)) (VP (VBP make) (NP (NP (WP what)) (SBAR (WHNP-1 (-NONE- 0)) (S we know how to make))))) ● Problem: Free relatives are annotated as wh sentential complements. The verb is extracted with the wrong argument category: “S (SBAR)” ● Solution: Change the free relatives to NP (the relative clause has an empty wh: “head” account – Bresnan 78)
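The free-relative rewrite on this slide can be sketched as a small tree transformation. This is only an illustration and not the extraction tool's actual code: the nested-list tree representation and the helper names `parse`, `show` and `fix_free_relatives` are assumptions made for this example, and the pattern match (an `SBAR-NOM` with exactly a wh-phrase and a clause as children) covers only the case shown on the slide.

```python
# Trees are nested lists [label, child, ...]; words are plain strings.
# This representation and all function names are assumptions for this sketch.

def parse(text):
    """Parse one Penn Treebank style bracketing into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token              # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                      # consume the closing ")"
        return node

    return read()


def show(tree):
    """Print a nested-list tree back as a bracketing."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(show(child) for child in tree) + ")"


def fix_free_relatives(tree):
    """Rewrite (SBAR-NOM (WHxP-i ...) S) as (NP (NP ...) (SBAR (WHxP-i (-NONE- 0)) S))."""
    if isinstance(tree, str):
        return tree
    node = [tree[0]] + [fix_free_relatives(child) for child in tree[1:]]
    label = node[0]
    is_free_relative = (
        isinstance(label, str)
        and label.startswith("SBAR-NOM")
        and len(node) == 3
        and isinstance(node[1], list)
        and node[1][0].startswith("WH")
    )
    if not is_free_relative:
        return node
    wh_phrase, clause = node[1], node[2]
    head = ["NP"] + wh_phrase[1:]                  # the wh word becomes the NP head
    empty_wh = [wh_phrase[0], ["-NONE-", "0"]]     # empty wh keeps the original index
    return ["NP", head, ["SBAR", empty_wh, clause]]


ptb = ("(S-3 (NP-SBJ (PRP We)) (VP (VBP make) "
       "(SBAR-NOM (WHNP-1 (WP what)) (S we know how to make))))")
print(show(fix_free_relatives(parse(ptb))))
```

Run on the first bracketing above, the sketch should print the second one, so the verb “make” would then be extracted with an NP argument rather than an SBAR.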

Wh percolation up (NP (NP (DT the) (NNS researchers)) (SBAR (WHNP-3 (WP who)) (S (NP-SBJ (-NONE- *T*-3)) (VP (VBD studied) (NP (DT the) (NNS workers)))))) [Figure: extracted trees with node labels WHNP, WP, NP and NNS]

Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure, built up over four slides: extracted trees with node labels NP, NN, WP, NP*, WHNP and WHNP*] (Vijay-Shanker et al.)

Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: WHNP over WP, followed by “+ ?”]

Wh percolation up (NP-SBJ (NP The bid) (PP for Great Northern) (, ,) (SBAR (WHNP-1 (NP (DT a) (NN notice)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* appears in an advertisement))) [Figure: node labels WHNP and WP]

Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: extracted trees with node labels WHNP, NN, WP, WHNP*, IN and WHPP]

Unlike Coordinated Phrases (UCP) (NP (UCP (NN construction) (CC and) (JJ commercial)) (NNS loans)) (VP (VB be) (UCP-PRD (NP (CD 35)) (CC or) (ADJP (JJR older)))) (VP (VB take) (NP (NN effect)) (UCP-TMP (ADVP 96 days later) (, ,) (CC or) (PP in early February))) [Figure: extracted trees: one with node labels NP, NP*, UCP, JJ, NN and CC; one with node labels S, NP, VP, VB and UCP, anchored by [be]]

Unlike Coordinated Phrases (UCP) ● We give the UCP the status of an independent non-terminal, as if it had some intrinsic categorial significance ● Multiple conjuncts: it is enough for one of them to be of a distinct category to turn the entire constituent into a UCP
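The “one distinct conjunct is enough” rule can be written down in a few lines. This is a minimal sketch under stated assumptions, not the Treebank's annotation procedure: the function names are invented for illustration, function tags and indices are stripped by splitting on "-" and "=", and two labels count as the same category only if they are identical after stripping (so NN and NNS would count as unlike here, which may be stricter than actual annotation practice).

```python
def base_category(label):
    """Strip function tags and indices, e.g. 'ADJP-PRD' -> 'ADJP', 'NP-SBJ-1' -> 'NP'."""
    if label.startswith("-"):          # labels such as -NONE- or -LRB- are kept whole
        return label
    return label.split("-")[0].split("=")[0]


def coordination_label(conjunct_labels):
    """Return the shared category of the conjuncts, or 'UCP' if any one differs."""
    if not conjunct_labels:
        raise ValueError("a coordination needs at least one conjunct")
    categories = {base_category(label) for label in conjunct_labels}
    if len(categories) == 1:
        return categories.pop()
    return "UCP"


# Conjunct labels taken from the UCP examples in this deck.
print(coordination_label(["NN", "JJ"]))          # construction and commercial
print(coordination_label(["NP", "ADJP"]))        # 35 or older
print(coordination_label(["ADJP-PRD", "VP"]))    # uninsured and rated ...
print(coordination_label(["NP", "NP", "NP"]))    # like conjuncts keep their category
```

The sketch makes the slide's complaint concrete: the resulting label says nothing about which categories were coordinated, only that they were not all the same.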

Unlike Coordinated Phrases (UCP): as the head of a constituent (S (NP-SBJ-1 The Series 1989 B bonds) (VP (VBP are) (VP (VBN rated) (S *-1 double-A)))) (S (NP-SBJ-1 The Series 1989 B bonds) (VP (VBP are) (UCP-PRD (ADJP-PRD (JJ uninsured)) (CC and) (VP (VBN rated) (S *-1 double-A)))))

Extraposition (“it” extraposition) (S (NP-SBJ-1 (NP (PRP it)) (S (-NONE- *EXP*-2))) (VP (MD would) (ADVP-TMP (RB no) (RBR longer)) (VP (VB be) (ADJP-PRD (JJ possible)) (S-2 (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB win) (NP (NN reinstatement)))))))) [Figure: extracted tree anchored by [win], with node labels VP, VP* and S]

Extraposition (relative clause) (S (ADVP-TMP (RB Soon)) (, ,) (NP-SBJ (NP (NNS T-shirts)) (SBAR (-NONE- *ICH*-1))) (VP (VBD appeared) (PP-LOC (IN in) (NP (DT the) (NNS corridors))) (SBAR-1 (WHNP-2 (WDT that)) (S (NP-SBJ (-NONE- *T*-2)) (VP (VBD carried) (NP (NP the school 's familiar logo) (PP-LOC on the front) )))))) [Figure: extracted tree anchored by [carried], with node labels VP, VP* and SBAR]

Extraposition (Object) (S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...) [Figure: extracted trees: S, NP, VP, VBZ, SBAR anchored by [says]; S, NP, VP, VBD, NP, SBAR anchored by [told]; VP, VP*, PP anchored by [in]; VP, VP*, NP anchored by [week]] Note: Chiang 2000 (sister adjunction)

Extraposition (Object) (S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...) [Figure: extracted trees: S, NP, VBZ, VP anchored by [says]; S, NP, VBD, NP, VP anchored by [told]; VP, VP*, PP anchored by [in]; VP, VP*, NP anchored by [week]; VP, VP*, SBAR labeled [S compl]]

Extraposition (Object) [Figure: two trees, each with node labels S, NP, VP, SBAR and VP, anchored by VBZ [says] and VBD [told]] Note: Multi-component TAGs (Bleam & Xia, TAG+ 2000)

Parentheticals (non-lexicalized trees!) (NP (NP the 3 billion New Zealand dollars) (PRN (-LRB- -LRB-) (NP US$ 1.76 billion *U*) (-RRB- -RRB-))) (S (NP-SBJ The total relationship) (PRN (, ,) (SBAR-ADV as Mr. Lee sees it) (, ,)) (VP (VBZ is)...)) [Figure: extracted trees: one with node labels VP, PRN, VP* and SBAR; one with node labels NP, NP*, PRN and NP]

VP Topicalization (S (NP-SBJ-1 investments in...) (VP (MD will) (VP (VB be) (VP (VBN excluded))))) [Figure: Lexical Head versus Syntactic Head: two alternative sets of extracted trees anchored by [be] and [excluded], with node labels S, NP, VP, VP*, V and VBN]

VP Topicalization (SINV (ADVP (RB Also)) (VP-TPC-2 (VBN excluded)) (VP (MD will) (VP (VB be) (VP (-NONE- *T*-2)))) (NP-SBJ-1 investments in...)) [Figure: Lexical Head versus Syntactic Head: two alternative sets of extracted trees anchored by [be] and [excluded], with node labels S, NP, VP, VP*, V and VBN]

Projections of Parts-of-speech (NP (DT a) (JJR stronger) (NN argument)) (NP (DT an) (ADJP (RB even) (JJR stronger)) (NN argument)) [Figure: two extracted trees anchored by [stronger]: one with node labels NP, JJR and NP*; one with node labels NP, ADJP, JJR and NP*]

Projections of Parts-of-speech (NP-SBJ-1 (NNP October) (NN weather)) (NP-SBJ-1 (NP (JJ late) (NNP October)) (NN weather)) [Figure: two extracted trees anchored by [October]: one with node labels NP, NNP and NP*; one with node labels NP, NP, NNP and NP*]

Forced Projections of Parts-of-speech
PROJECTED          PROJECTION
NN, NNP, PRP, EX   NP
JJ, JJR, JJS       ADJP
RB, RBR, RBS       ADVP
S, SINV            SBAR
SQ                 SBARQ
WP                 WHNP
WRB                ADVP
CD                 QP
QP                 NP
UH                 INTJ
LS                 LST
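The projection table can be read as a lookup applied to a node before extraction. The sketch below assumes the same nested-list tree representation `[label, child, ...]` used for illustration elsewhere; the names `PROJECTION` and `force_projection` are invented for this example. It wraps a single node one level only, and it leaves open which nodes should be projected (for instance, only modifiers), since the slide does not say.

```python
# Mapping copied from the table on this slide: projected label -> forced projection.
PROJECTION = {
    "NN": "NP", "NNP": "NP", "PRP": "NP", "EX": "NP",
    "JJ": "ADJP", "JJR": "ADJP", "JJS": "ADJP",
    "RB": "ADVP", "RBR": "ADVP", "RBS": "ADVP",
    "S": "SBAR", "SINV": "SBAR",
    "SQ": "SBARQ",
    "WP": "WHNP",
    "WRB": "ADVP",
    "CD": "QP",
    "QP": "NP",
    "UH": "INTJ",
    "LS": "LST",
}


def force_projection(node):
    """Wrap a node in its forced projection; nodes not in the table are returned unchanged."""
    target = PROJECTION.get(node[0])
    if target is None:
        return node
    return [target, node]


# (JJR stronger) becomes (ADJP (JJR stronger)), matching the shape of
# the modifier in "an even stronger argument".
print(force_projection(["JJR", "stronger"]))
print(force_projection(["NNP", "October"]))
print(force_projection(["DT", "a"]))          # not in the table: unchanged
```

With the bare modifier wrapped this way, “a stronger argument” and “an even stronger argument” would both yield an ADJP modifier tree for “stronger” instead of two differently shaped trees.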

Conclusion ● Full coverage of language is (currently) utopian ● Grammar extraction can/should be used to search for solutions to grammar development problems ● We presented a few selected problems in grammar extraction and discussed solutions with various degrees of acceptability (using TAGs) ● There are more and harder ones where these came from ● Question: how would these problems be handled: – By other grammar formalisms? – By other linguistic approaches using the TAG formalism?

LTAG Verb Trees [Figure: verb-anchored elementary trees with node labels S, NP, VP, V, VBN, SBAR, WHNP and NP*]

Wh percolation up (NP-SBJ (NP The bid) (PP for Great Northern) (, ,) (SBAR (WHNP-1 (NP (DT a) (NN notice)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* appears in an advertisement))) (NP-PRD (NP (NNS hitches)) (, ,) (SBAR (RB not) (WHNP-17 (NP (DT the) (JJS least)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* was that... )

Extraposition (S (NP-SBJ (NP (PRP it)) (S (-NONE- *EXP*-1))) (VP (VBZ is) (ADJP-PRD (JJ unjust)) (S-1 (NP-SBJ (-NONE- *)) (VP (TO to) (VP (VB reprove) (NP (NNP China)) (PP-PRP (IN for) (NP (PRP it))))))))